DIT, for 'Data-Independent Timing', is a bit you can set in the
processor state on sufficiently new Arm CPUs, which promises that a
long list of instructions will deliberately avoid varying their timing
based on the input register values. Just what you want for keeping
your constant-time crypto primitives constant-time.
As far as I'm aware, no CPU has _yet_ implemented any data-dependent
timing optimisations on those instructions, so DIT is a safety
precaution against them doing so in
future. It would be embarrassing to be caught without it if a future
CPU does do that, so we now turn on DIT in the PuTTY process state.
I've put a call to the new enable_dit() function at the start of every
main() and WinMain() belonging to a program that might do
cryptography (even testcrypt, in case someone uses it for something!),
and in case I missed one there, also added a second call at the first
moment that any cryptography-using part of the code looks as if it
might become active: when an instance of the SSH protocol object is
configured, when the system PRNG is initialised, and when selecting
any cryptographic authentication protocol in an HTTP or SOCKS proxy
connection. With any luck those precautions between them should ensure
it's on whenever we need it.
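To make that concrete, the calling pattern comes out something like
this (a generic sketch, not any particular PuTTY main program; setting
PSTATE.DIT when it's already set is harmless, which is what makes the
redundant later calls safe):

    /* declared in a header; defined per platform, sketched below */
    void enable_dit(void);

    int main(int argc, char **argv)
    {
        enable_dit();   /* process-wide, before anything else runs */

        /* ... argument parsing, GUI setup, eventually crypto ... */

        return 0;
    }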
Arm's own recommendation is that you should carefully choose the
granularity at which you enable and disable DIT: there's a potential
time cost to turning it on and off (I'm not sure what, but plausibly
something of the order of a pipeline flush), so it's a performance hit
to do it _inside_ each individual crypto function, but if CPUs start
supporting significant data-dependent optimisation in future, then it
will also become a noticeable performance hit to just leave it on
across the whole process. So you'd like to do it somewhere in the
middle: for example, you might turn on DIT once around the whole
process of verifying and decrypting an SSH packet, instead of once for
decryption and once for MAC.
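For illustration, that middle-granularity strategy would look
something like the sketch below. (disable_dit() is hypothetical -
nothing in this codebase ever turns the bit off again - and so are the
packet-handling functions; this is only to show the shape of Arm's
recommendation.)

    void handle_incoming_packet(struct Packet *pkt)
    {
        enable_dit();            /* one state change per packet... */
        verify_packet_mac(pkt);  /* ...covering the MAC check */
        decrypt_packet(pkt);     /* ...and the decryption */
        disable_dit();           /* ...and one change back */
    }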
With all respect to that recommendation as a strategy for maximum
performance, I'm not following it here. I turn on DIT at the start of
the PuTTY process, and then leave it on. Rationale:
1. PuTTY is not otherwise a performance-critical application: it's
not likely to max out your CPU for any purpose _other_ than
cryptography. The most CPU-intensive non-cryptographic thing I can
imagine a PuTTY process doing is the complicated computation of
font rendering in the terminal, and that will normally be cached
(you don't recompute each glyph from its outline and hints every
time you display it).
2. I think a bigger risk lies in accidental side channels from having
DIT turned off when it should have been on. I can imagine lots of
causes for that. Missing a crypto operation in some unswept corner
of the code; confusing control flow (like my coroutine macros)
jumping with DIT clear into the middle of a region of code that
expected DIT to have been set at the beginning; having a reference
counter of DIT requests and getting it out of sync.
In a more sophisticated programming language, it might be possible to
avoid the risk in #2 by cleverness with the type system. For example,
in Rust, you could have a zero-sized type that acts as a proof token
for DIT being enabled (it would be constructed by a function that also
sets DIT, have a Drop implementation that clears DIT, and be !Send so
you couldn't use it in a thread other than the one where DIT was set),
and then you could require all the actual crypto functions to take a
DitToken as an extra parameter, at zero runtime cost. Then "oops I
forgot to set DIT around this piece of crypto" would become a compile
error. Even so, you'd have to take some care with coroutine-structured
code (what happens if a Rust async function yields while holding a DIT
token?) and with nesting (if you have two DIT tokens, you don't want
dropping the inner one to clear DIT while the outer one is still there
to wrongly convince callees that it's set). Maybe in Rust you could
get this all to work reliably. But not in C!
DIT is an optional feature of the Arm architecture, so we must first
test to see if it's supported. This is done the same way as we already
do for the various Arm crypto accelerators: on ELF-based systems,
check the appropriate bit in the 'hwcap' words in the ELF aux vector;
on Mac, look for an appropriate sysctl flag.
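Concretely, the two feature tests come out something like this. (A
sketch rather than the verbatim PuTTY code, and platform_dit_available
is a made-up name; but HWCAP_DIT is the real Linux hwcap bit, and
hw.optional.arm.FEAT_DIT the real macOS sysctl.)

    #include <stdbool.h>

    #if defined __linux__
    #include <sys/auxv.h>
    #include <asm/hwcap.h>

    static bool platform_dit_available(void)
    {
        /* the kernel sets HWCAP_DIT if the CPU implements FEAT_DIT */
        return (getauxval(AT_HWCAP) & HWCAP_DIT) != 0;
    }

    #elif defined __APPLE__
    #include <sys/sysctl.h>

    static bool platform_dit_available(void)
    {
        int ret = 0;
        size_t len = sizeof(ret);
        if (sysctlbyname("hw.optional.arm.FEAT_DIT",
                         &ret, &len, NULL, 0) < 0)
            return false;   /* sysctl not known: assume no DIT */
        return ret != 0;
    }
    #endif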
On Windows I don't know of a way to query the DIT feature, _or_ of a
way to write the necessary enabling instruction in an MSVC-compatible
way. I've _heard_ that it might not be necessary, because Windows
might just turn on DIT unconditionally and leave it on, in an even
more extreme version of my own strategy. I don't have a source for
that - I heard it by word of mouth - but I _hope_ it's true, because
that would suit me very well! Certainly I can't write code to enable
DIT without knowing (a) how to do it, and (b) how to tell whether it's
safe.
Nonetheless, I've put the enable_dit() call in all the right places in
the Windows main programs as well as the Unix and cross-platform code,
so that if I later find out that I _can_ put in an explicit enable of
DIT in some way, I'll only have to arrange to set HAVE_ARM_DIT and
compile the enable_dit() function appropriately.
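For reference, the Unix-side definition is shaped roughly like this (a
sketch: the inline assembly assumes a gcc/clang-style compiler, which
is precisely what MSVC isn't). PSTATE.DIT lives at bit 24 of the DIT
system register, whose numeric encoding s3_3_c4_c2_5 is accepted even
by assemblers too old to know the name 'dit':

    #if defined HAVE_ARM_DIT
    void enable_dit(void)
    {
        if (!platform_dit_available())
            return;         /* this CPU has no FEAT_DIT */
        /* write 1 to PSTATE.DIT, i.e. bit 24 of the DIT register */
        __asm__ volatile("msr s3_3_c4_c2_5, %0"
                         :: "r"(1ul << 24));
    }
    #else
    void enable_dit(void) {}  /* elsewhere, nothing to do (yet) */
    #endif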
In the course of recent refactorings I noticed a couple of cases where
we were doing old-fashioned preallocation of a char array with some
conservative maximum size, then writing into it via *p++ or similar
and hoping we got the calculation right.
Now we have strbuf and dupcat, so we shouldn't ever have to do that.
Fixed as many cases as I could find by searching for allocations of
the form 'snewn(foo, char)'.
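A made-up example of the shape of the change, using strbuf, put_fmt,
strbuf_to_str and dupcat the way the rest of this codebase does (this
isn't one of the actual call sites):

    /* Before: guess a conservative size and hope the sum is right */
    char *describe_old(const char *host, const char *port)
    {
        char *msg = snewn(strlen(host) + strlen(port) + 20, char);
        char *p = msg;
        p += sprintf(p, "connecting to %s", host);
        sprintf(p, ":%s", port);
        return msg;
    }

    /* After: strbuf grows on demand, so there's no sum to get wrong */
    char *describe_new(const char *host, const char *port)
    {
        strbuf *sb = strbuf_new();
        put_fmt(sb, "connecting to %s:%s", host, port);
        return strbuf_to_str(sb); /* returns the string, frees the rest */
    }

    /* Or, for a pure concatenation, dupcat does the whole job */
    char *describe_newer(const char *host, const char *port)
    {
        return dupcat("connecting to ", host, ":", port);
    }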
Particularly worth a mention is the Windows GSSAPI setup code, which
used to call the Win32 registry API directly, and is much more legible
now using the windows/utils/registry.c wrappers. (But that was why I had
to enhance them in the previous commit so as to be able to open
registry keys read-only: without that, the open operation would
actually fail on this key, which is not user-writable.)
Also unix/askpass.c, which was doing a careful reallocation of its
buffer to avoid secrets being left behind in the vacated memory -
which is now just a matter of ensuring we called strbuf_new_nm().
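In sketch form (read_one_char is a made-up stand-in for however the
passphrase characters actually arrive):

    int read_one_char(void);

    strbuf *collect_passphrase(void)
    {
        /* reallocations of an _nm strbuf wipe the vacated memory, so
         * growing this one character at a time never leaves
         * passphrase fragments strewn around the heap */
        strbuf *passphrase = strbuf_new_nm();
        int c;
        while ((c = read_one_char()) != '\n')
            put_byte(passphrase, c);
        return passphrase;
    }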
If the function name (or expression) in a function call or declaration
is itself so long that even the first argument doesn't fit after it on
the same line, or if that would leave so little space that it would be
silly to try to wrap all the run-on lines into a tall thin column,
then I used to do this
    ludicrously_long_function_name
        (arg1, arg2, arg3);
and now prefer this
    ludicrously_long_function_name(
        arg1, arg2, arg3);
I picked up the habit from Python, where the latter idiom is required
by Python's syntactic significance of newlines (you can write the
former if you use a backslash-continuation, but pretty much everyone
seems to agree that that's much uglier). But I've found it works just
as well in C: it makes it more obvious that the previous line is
incomplete, it gives you a tiny bit more space to wrap the following
lines into (the old idiom indents the _third_ line one space beyond
the second), and I generally turn out to agree with the knock-on
indentation decisions made by at least Emacs if you do it in the
middle of a complex expression. Plus, of course, using the _same_
idiom between C and Python means less state-switching.
So, while I'm making annoying indentation changes in general, this
seems like a good time to dig out all the cases of the old idiom in
this code, and switch them over to the new.
My bulk indentation check also turned up a lot of cases where a run-on
function call or if statement didn't have its later lines aligned
correctly relative to the open paren.
I think most of these arise from things getting out of sync: you edit
the first line of the function call and forget to update the rest,
perhaps because you never even _saw_ the rest during a
search-and-replace. But a few didn't quite fit that pattern, in
particular an outright misleading case in unix/askpass.c where the
second line of a call was aligned neatly below the _wrong_ one of the
open parens on the opening line.
Restored as many alignments as I could easily find.
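For illustration (a made-up fragment, not the real askpass code), the
misleading shape was along these lines:

    /* misleading: the run-on line sits under the '(' of data(), so it
     * reads as though it were an argument of the inner call */
    memcpy(outbuf, data(passphrase),
                        passphrase_len);

    /* restored: aligned under the open paren it actually belongs to */
    memcpy(outbuf, data(passphrase),
           passphrase_len);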
This gets rid of all those 'win', 'ux' and 'gtk' prefixes, which made
filenames annoying to type and to tab-complete. Also, as
with my other recent renaming sprees, I've taken the opportunity to
expand and clarify some of the names so that they're not such cryptic
abbreviations.