Unlike the ones in mpint.c proper, these are not intended to respect
the constant-time guarantees. They're going to be the kind of thing
you use in key generation, which is too random to be constant-time in
any case.
I've arranged several precautions to try to make sure these functions
don't accidentally get linked into the main SSH application, only into
key generators:
- declare them in a separate header with "unsafe" in the name
- put "unsafe" in the name of every actual function
- don't even link the mpunsafe.c translation unit into PuTTY proper
- in fact, define global symbols of the same name in that file and
the SSH client code, so that there will be a link failure if we
ever try to do it by accident
The initial contents of the new source file consist of the subroutine
mp_mod_short() that previously lived in sshprime.c (and was not in
mpint.c proper precisely because it was unsafe). While I'm here, I've
turned it into mp_unsafe_mod_integer() and let it take a modulus of up
to 32 bits instead of 16.
Also added some obviously useful functions to shrink an mpint to the
smallest physical size that can hold the contained number (rather like
bn_restore_invariant in the old Bignum system), which I expect to be
using shortly.
A recent test-compile at high warning level points out that if you
define a macro with a ... at the end of the parameter list, then every
call should at least include the comma before the variadic part. That
is, if you #define MACRO(x,y,...) then you shouldn't call MACRO(1,2)
with no comma after the 2. But that's what I had done in one of my
definitions of FUNC0 in the fiddly testcrypt system.
In a similar vein, it's a mistake to use the preprocessor 'defined'
operator when it's expanded from another macro. Adjusted the setup of
BB_OK in mpint_i.h to avoid doing that.
(Neither of these has yet caused a problem in any real compile, but
best to fix them before they do.)
The number of people has been steadily increasing who read our source
code with an editor that thinks tab stops are 4 spaces apart, as
opposed to the traditional tty-derived 8 that the PuTTY code expects.
So I've been wondering for ages about just fixing it, and switching to
a spaces-only policy throughout the code. And I recently found out
about 'git blame -w', which should make this change not too disruptive
for the purposes of source-control archaeology; so perhaps now is the
time.
While I'm at it, I've also taken the opportunity to remove all the
trailing spaces from source lines (on the basis that git dislikes
them, and is the only thing that seems to have a strong opinion one
way or the other).
Apologies to anyone downstream of this code who has complicated patch
sets to rebase past this change. I don't intend it to be needed again.
The case previously conditioned on _M_IX86, where we use __int64 as
the BignumDblInt type, is actually valid on any Visual Studio target
platform at all, so it's safe to remove that condition and let it
apply to _M_ARM and _M_ARM64 as well. The only situation in which we
_shouldn't_ use that case for Visual Studio builds is when we have
something even better available, such as the x86-64 intrinsics for
add-with-carry and double-width multiply.
sshaes.c is more or less completely changed by this commit.
Firstly, I've changed the top-level structure. In the old structure,
there were three levels of indirection controlling what an encryption
function would actually do: first the ssh2_cipher vtable, then a
subsidiary set of function pointers within that to select the software
or hardware implementation, and then inside the main encryption
function, a switch on the key length to jump into the right place in
the unrolled loop of cipher rounds.
That was all a bit untidy. So now _all_ of that is done by means of
just one selection system, namely the ssh2_cipher vtable. The software
and hardware implementations of a given SSH cipher each have their own
separate vtable, e.g. ssh2_aes256_sdctr_sw and ssh2_aes256_sdctr_hw;
this allows them to have their own completely different state
structures too, and not have to try to coexist awkwardly in the same
universal AESContext with workaround code to align things correctly.
The old implementation-agnostic vtables like ssh2_aes256_sdctr still
exist, but now they're mostly empty, containing only the constructor
function, which will decide whether AES-NI is currently available and
then choose one of the other _real_ vtables to instantiate.
As well as the cleaner data representation, this also means the
vtables can have different description strings, which means the Event
Log will indicate which AES implementation is actually in use; it
means the SW and HW vtables are available for testcrypt to use
(although actually using them is left for the next commit); and in
principle it would also make it easy to support a user override for
the automatic SW/HW selection (in case anyone turns out to want one).
The AES-NI implementation has been reorganised to fit into the new
framework. One thing I've done is to de-optimise the key expansion:
instead of having a separate blazingly fast loop-unrolled key setup
function for each key length, there's now just one, which uses AES
intrinsics for the actual transformations of individual key words, but
wraps them in a common loop structure for all the key lengths which
has a clear correspondence to the cipher spec. (Sorry to throw away
your work there, Pavel, but this isn't an application where key setup
really _needs_ to be hugely fast, and I decided I prefer a version I
can understand and debug.)
The software AES implementation is also completely replaced with one
that uses a bit-sliced representation, i.e. the cipher state is split
across eight integers in such a way that each logical byte of the
state occupies a single bit in each of those integers. The S-box
lookup is done by a long string of AND and XOR operations on the eight
bits (removing the potential cache side channel from a lookup table),
and this representation allows 64 S-box lookups to be done in parallel
simply by extending those AND/XOR operations to be bitwise ones on a
whole word. So now we can perform four AES encryptions or decryptions
in parallel, at least when the cipher mode permits it (which SDCTR and
CBC decryption both do).
The result is slower than the old implementation, but (a) not by as
much as you might think - those parallel S-boxes are surprisingly
competitive with 64 separate table lookups; (b) the compensation is
that now it should run in constant time with no data-dependent control
flow or memory addressing; and (c) in any case the really fast
hardware implementation will supersede it for most users.
This makes it easy to re-test the mpint functions using different word
sizes and smoke out any more confusions between integer types in
mpint.c, by recompiling with -DBIGNUM_OVERRIDE=4 or =5 or =6 (for 16-,
32- or 64-bit respectively).
BIGNUM_OVERRIDE only lets you force the size downwards, not upwards:
of course the default behaviour is to use the largest BignumInt the
ifdefs can find a way to, and they can't magic up a bigger one just
because you tell them you'd like one.
At some point in mpint.c development I switched the main macro defined
by the ifdefs from BIGNUM_INT_BITS to the new BIGNUM_INT_BITS_BITS, so
I could loop from 0 to the latter in safe bit-shift loops that test
each bit of a shift count. But I forgot to change the comment
accordingly.
The old 'Bignum' data type is gone completely, and so is sshbn.c. In
its place is a new thing called 'mp_int', handled by an entirely new
library module mpint.c, with API differences both large and small.
The main aim of this change is that the new library should be free of
timing- and cache-related side channels. I've written the code so that
it _should_ - assuming I haven't made any mistakes - do all of its
work without either control flow or memory addressing depending on the
data words of the input numbers. (Though, being an _arbitrary_
precision library, it does have to at least depend on the sizes of the
numbers - but there's a 'formal' size that can vary separately from
the actual magnitude of the represented integer, so if you want to
keep it secret that your number is actually small, it should work fine
to have a very long mp_int and just happen to store 23 in it.) So I've
done all my conditionalisation by means of computing both answers and
doing bit-masking to swap the right one into place, and all loops over
the words of an mp_int go up to the formal size rather than the actual
size.
I haven't actually tested the constant-time property in any rigorous
way yet (I'm still considering the best way to do it). But this code
is surely at the very least a big improvement on the old version, even
if I later find a few more things to fix.
I've also completely rewritten the low-level elliptic curve arithmetic
from sshecc.c; the new ecc.c is closer to being an adjunct of mpint.c
than it is to the SSH end of the code. The new elliptic curve code
keeps all coordinates in Montgomery-multiplication transformed form to
speed up all the multiplications mod the same prime, and only converts
them back when you ask for the affine coordinates. Also, I adopted
extended coordinates for the Edwards curve implementation.
sshecc.c has also had a near-total rewrite in the course of switching
it over to the new system. While I was there, I've separated ECDSA and
EdDSA more completely - they now have separate vtables, instead of a
single vtable in which nearly every function had a big if statement in
it - and also made the externally exposed types for an ECDSA key and
an ECDH context different.
A minor new feature: since the new arithmetic code includes a modular
square root function, we can now support the compressed point
representation for the NIST curves. We seem to have been getting along
fine without that so far, but it seemed a shame not to put it in,
since it was suddenly easy.
In sshrsa.c, one major change is that I've removed the RSA blinding
step in rsa_privkey_op, in which we randomise the ciphertext before
doing the decryption. The purpose of that was to avoid timing leaks
giving away the plaintext - but the new arithmetic code should take
that in its stride in the course of also being careful enough to avoid
leaking the _private key_, which RSA blinding had no way to do
anything about in any case.
Apart from those specific points, most of the rest of the changes are
more or less mechanical, just changing type names and translating code
into the new API.