putty-source

nhyatt/putty-source

Fork 0

mirror of https://git.tartarus.org/simon/putty.git synced 2025-07-12 00:33:53 -05:00

Commit Graph

Author	SHA1	Message	Date
Simon Tatham	965057d6d6	Change strategy for the Arm instruction setting DIT. Colin Watson reported that a build failure occurred in the AArch64 Debian build of PuTTY 0.83: gcc now defaults to enabling branch protection using AArch64 pointer authentication, if the target architecture version supports it. Debian's base supported architecture does not, but Armv8.4-A does. So when I changed the compile flags for enable_dit.c to add -march=armv8.4-a, it didn't _just_ allow me to write the 'msr dit, %0' instruction in my asm statement; it also unexpectedly turned on pointer authentication in the containing function, which caused a SIGILL when running on a pre-Armv8.4-A CPU, because although the code correctly skipped the instruction that set DIT, it was already inside enable_dit() at that point and couldn't avoid going through the unsupported 'retaa' instruction which tries to check an auth code on the return address. An obvious approach would be to add -mbranch-protection=none to the compile flags for enable_dit.c. Another approach is to leave the _compiler_ flags alone, and change the architecture in the assembler, either via a fiddly -Wa,... option or by putting a .arch directive inside the asm statement. But both have downsides. Turning off branch protection is fine for the Debian build, but has the unwanted side effect of turning it off (in that one function) even in builds targeting a later architecture which _did_ want branch protection. And changing the assembler's architecture risks changing it _down_ instead of up, again perhaps invalidating other instructions generated by the compiler (like if some later security feature is introduced that gcc also wants to turn on by default). So instead I've taken the much simpler approach of not bothering to change the target architecture at all, and instead generating the move into DIT by hardcoding its actual instruction encoding. This meant I also had to force the input value into a specific register, but I don't think that does any harm (not _even_ wasting an extra instruction in codegen). Now we should avoid interfering with any security features the compiler wants to turn on or off: all of that should be independent of the instruction I really wanted.	2025-02-15 15:57:53 +00:00
Simon Tatham	98200d1bfe	Arm: turn on PSTATE.DIT if available and needed. DIT, for 'Data-Independent Timing', is a bit you can set in the processor state on sufficiently new Arm CPUs, which promises that a long list of instructions will deliberately avoid varying their timing based on the input register values. Just what you want for keeping your constant-time crypto primitives constant-time. As far as I'm aware, no CPU has _yet_ implemented any data-dependent optimisations, so DIT is a safety precaution against them doing so in future. It would be embarrassing to be caught without it if a future CPU does do that, so we now turn on DIT in the PuTTY process state. I've put a call to the new enable_dit() function at the start of every main() and WinMain() belonging to a program that might do cryptography (even testcrypt, in case someone uses it for something!), and in case I missed one there, also added a second call at the first moment that any cryptography-using part of the code looks as if it might become active: when an instance of the SSH protocol object is configured, when the system PRNG is initialised, and when selecting any cryptographic authentication protocol in an HTTP or SOCKS proxy connection. With any luck those precautions between them should ensure it's on whenever we need it. Arm's own recommendation is that you should carefully choose the granularity at which you enable and disable DIT: there's a potential time cost to turning it on and off (I'm not sure what, but plausibly something of the order of a pipeline flush), so it's a performance hit to do it _inside_ each individual crypto function, but if CPUs start supporting significant data-dependent optimisation in future, then it will also become a noticeable performance hit to just leave it on across the whole process. So you'd like to do it somewhere in the middle: for example, you might turn on DIT once around the whole process of verifying and decrypting an SSH packet, instead of once for decryption and once for MAC. With all respect to that recommendation as a strategy for maximum performance, I'm not following it here. I turn on DIT at the start of the PuTTY process, and then leave it on. Rationale: 1. PuTTY is not otherwise a performance-critical application: it's not likely to max out your CPU for any purpose _other_ than cryptography. The most CPU-intensive non-cryptographic thing I can imagine a PuTTY process doing is the complicated computation of font rendering in the terminal, and that will normally be cached (you don't recompute each glyph from its outline and hints for every time you display it). 2. I think a bigger risk lies in accidental side channels from having DIT turned off when it should have been on. I can imagine lots of causes for that. Missing a crypto operation in some unswept corner of the code; confusing control flow (like my coroutine macros) jumping with DIT clear into the middle of a region of code that expected DIT to have been set at the beginning; having a reference counter of DIT requests and getting it out of sync. In a more sophisticated programming language, it might be possible to avoid the risk in #2 by cleverness with the type system. For example, in Rust, you could have a zero-sized type that acts as a proof token for DIT being enabled (it would be constructed by a function that also sets DIT, have a Drop implementation that clears DIT, and be !Send so you couldn't use it in a thread other than the one where DIT was set), and then you could require all the actual crypto functions to take a DitToken as an extra parameter, at zero runtime cost. Then "oops I forgot to set DIT around this piece of crypto" would become a compile error. Even so, you'd have to take some care with coroutine-structured code (what happens if a Rust async function yields while holding a DIT token?) and with nesting (if you have two DIT tokens, you don't want dropping the inner one to clear DIT while the outer one is still there to wrongly convince callees that it's set). Maybe in Rust you could get this all to work reliably. But not in C! DIT is an optional feature of the Arm architecture, so we must first test to see if it's supported. This is done the same way as we already do for the various Arm crypto accelerators: on ELF-based systems, check the appropriate bit in the 'hwcap' words in the ELF aux vector; on Mac, look for an appropriate sysctl flag. On Windows I don't know of a way to query the DIT feature, _or_ of a way to write the necessary enabling instruction in an MSVC-compatible way. I've _heard_ that it might not be necessary, because Windows might just turn on DIT unconditionally and leave it on, in an even more extreme version of my own strategy. I don't have a source for that - I heard it by word of mouth - but I _hope_ it's true, because that would suit me very well! Certainly I can't write code to enable DIT without knowing (a) how to do it, (b) how to know if it's safe. Nonetheless, I've put the enable_dit() call in all the right places in the Windows main programs as well as the Unix and cross-platform code, so that if I later find out that I _can_ put in an explicit enable of DIT in some way, I'll only have to arrange to set HAVE_ARM_DIT and compile the enable_dit() function appropriately.	2024-12-19 08:52:47 +00:00

Author

SHA1

Message

Date

Simon Tatham

965057d6d6

Change strategy for the Arm instruction setting DIT.

Colin Watson reported that a build failure occurred in the AArch64
Debian build of PuTTY 0.83:

gcc now defaults to enabling branch protection using AArch64 pointer
authentication, if the target architecture version supports it.
Debian's base supported architecture does not, but Armv8.4-A does. So
when I changed the compile flags for enable_dit.c to add
-march=armv8.4-a, it didn't _just_ allow me to write the 'msr dit, %0'
instruction in my asm statement; it also unexpectedly turned on
pointer authentication in the containing function, which caused a
SIGILL when running on a pre-Armv8.4-A CPU, because although the code
correctly skipped the instruction that set DIT, it was already inside
enable_dit() at that point and couldn't avoid going through the
unsupported 'retaa' instruction which tries to check an auth code on
the return address.

An obvious approach would be to add -mbranch-protection=none to the
compile flags for enable_dit.c. Another approach is to leave the
_compiler_ flags alone, and change the architecture in the assembler,
either via a fiddly -Wa,... option or by putting a .arch directive
inside the asm statement. But both have downsides. Turning off branch
protection is fine for the Debian build, but has the unwanted side
effect of turning it off (in that one function) even in builds
targeting a later architecture which _did_ want branch protection. And
changing the assembler's architecture risks changing it _down_ instead
of up, again perhaps invalidating other instructions generated by the
compiler (like if some later security feature is introduced that gcc
also wants to turn on by default).

So instead I've taken the much simpler approach of not bothering to
change the target architecture at all, and instead generating the move
into DIT by hardcoding its actual instruction encoding. This meant I
also had to force the input value into a specific register, but I
don't think that does any harm (not _even_ wasting an extra
instruction in codegen). Now we should avoid interfering with any
security features the compiler wants to turn on or off: all of that
should be independent of the instruction I really wanted.

2025-02-15 15:57:53 +00:00

Simon Tatham

98200d1bfe

Arm: turn on PSTATE.DIT if available and needed.

DIT, for 'Data-Independent Timing', is a bit you can set in the
processor state on sufficiently new Arm CPUs, which promises that a
long list of instructions will deliberately avoid varying their timing
based on the input register values. Just what you want for keeping
your constant-time crypto primitives constant-time.

As far as I'm aware, no CPU has _yet_ implemented any data-dependent
optimisations, so DIT is a safety precaution against them doing so in
future. It would be embarrassing to be caught without it if a future
CPU does do that, so we now turn on DIT in the PuTTY process state.

I've put a call to the new enable_dit() function at the start of every
main() and WinMain() belonging to a program that might do
cryptography (even testcrypt, in case someone uses it for something!),
and in case I missed one there, also added a second call at the first
moment that any cryptography-using part of the code looks as if it
might become active: when an instance of the SSH protocol object is
configured, when the system PRNG is initialised, and when selecting
any cryptographic authentication protocol in an HTTP or SOCKS proxy
connection. With any luck those precautions between them should ensure
it's on whenever we need it.

Arm's own recommendation is that you should carefully choose the
granularity at which you enable and disable DIT: there's a potential
time cost to turning it on and off (I'm not sure what, but plausibly
something of the order of a pipeline flush), so it's a performance hit
to do it _inside_ each individual crypto function, but if CPUs start
supporting significant data-dependent optimisation in future, then it
will also become a noticeable performance hit to just leave it on
across the whole process. So you'd like to do it somewhere in the
middle: for example, you might turn on DIT once around the whole
process of verifying and decrypting an SSH packet, instead of once for
decryption and once for MAC.

With all respect to that recommendation as a strategy for maximum
performance, I'm not following it here. I turn on DIT at the start of
the PuTTY process, and then leave it on. Rationale:

 1. PuTTY is not otherwise a performance-critical application: it's
    not likely to max out your CPU for any purpose _other_ than
    cryptography. The most CPU-intensive non-cryptographic thing I can
    imagine a PuTTY process doing is the complicated computation of
    font rendering in the terminal, and that will normally be cached
    (you don't recompute each glyph from its outline and hints for
    every time you display it).

 2. I think a bigger risk lies in accidental side channels from having
    DIT turned off when it should have been on. I can imagine lots of
    causes for that. Missing a crypto operation in some unswept corner
    of the code; confusing control flow (like my coroutine macros)
    jumping with DIT clear into the middle of a region of code that
    expected DIT to have been set at the beginning; having a reference
    counter of DIT requests and getting it out of sync.

In a more sophisticated programming language, it might be possible to
avoid the risk in #2 by cleverness with the type system. For example,
in Rust, you could have a zero-sized type that acts as a proof token
for DIT being enabled (it would be constructed by a function that also
sets DIT, have a Drop implementation that clears DIT, and be !Send so
you couldn't use it in a thread other than the one where DIT was set),
and then you could require all the actual crypto functions to take a
DitToken as an extra parameter, at zero runtime cost. Then "oops I
forgot to set DIT around this piece of crypto" would become a compile
error. Even so, you'd have to take some care with coroutine-structured
code (what happens if a Rust async function yields while holding a DIT
token?) and with nesting (if you have two DIT tokens, you don't want
dropping the inner one to clear DIT while the outer one is still there
to wrongly convince callees that it's set). Maybe in Rust you could
get this all to work reliably. But not in C!

DIT is an optional feature of the Arm architecture, so we must first
test to see if it's supported. This is done the same way as we already
do for the various Arm crypto accelerators: on ELF-based systems,
check the appropriate bit in the 'hwcap' words in the ELF aux vector;
on Mac, look for an appropriate sysctl flag.

On Windows I don't know of a way to query the DIT feature, _or_ of a
way to write the necessary enabling instruction in an MSVC-compatible
way. I've _heard_ that it might not be necessary, because Windows
might just turn on DIT unconditionally and leave it on, in an even
more extreme version of my own strategy. I don't have a source for
that - I heard it by word of mouth - but I _hope_ it's true, because
that would suit me very well! Certainly I can't write code to enable
DIT without knowing (a) how to do it, (b) how to know if it's safe.
Nonetheless, I've put the enable_dit() call in all the right places in
the Windows main programs as well as the Unix and cross-platform code,
so that if I later find out that I _can_ put in an explicit enable of
DIT in some way, I'll only have to arrange to set HAVE_ARM_DIT and
compile the enable_dit() function appropriately.

2024-12-19 08:52:47 +00:00

2 Commits