This is a sweeping change applied across the whole code base by a spot
of Emacs Lisp. Now, everywhere I declare a vtable filled with function
pointers (and the occasional const data member), all the members of
the vtable structure are initialised by name using the '.fieldname =
value' syntax introduced in C99.
We were already using this syntax for a handful of things in the new
key-generation progress report system, so it's not new to the code
base as a whole.
The advantage is that now, when a vtable only declares a subset of the
available fields, I can initialise the rest to NULL or zero just by
leaving them out. This is most dramatic in a couple of the outlying
vtables in things like psocks (which has a ConnectionLayerVtable
containing only one non-NULL method), but less dramatically, it means
that the new 'flags' field in BackendVtable can be completely left out
of every backend definition except for the SUPDUP one which defines it
to a nonzero value. Similarly, the test_for_upstream method only used
by SSH doesn't have to be mentioned in the rest of the backends;
network Plugs for listening sockets don't have to explicitly null out
'receive' and 'sent', and vice versa for 'accepting', and so on.
While I'm at it, I've normalised the declarations so they don't use
the unnecessarily verbose 'struct' keyword. Also a handful of them
weren't const; now they are.
The aim of this reorganisation is to make it easier to test all the
ciphers in PuTTY in a uniform way. It was inconvenient that there were
two separate vtable systems for the ciphers used in SSH-1 and SSH-2
with different functionality.
Now there's only one type, called ssh_cipher. But really it's the old
ssh2_cipher, just renamed: I haven't made any changes to the API on
the SSH-2 side. Instead, I've removed ssh1_cipher completely, and
adapted the SSH-1 BPP to use the SSH-2 style API.
(The relevant differences are that ssh1_cipher encapsulated both the
sending and receiving directions in one object - so now ssh1bpp has to
make a separate cipher instance per direction - and that ssh1_cipher
automatically initialised the IV to all zeroes, which ssh1bpp now has
to do by hand.)
The previous ssh1_cipher vtable for single-DES has been removed
completely, because when converted into the new API it became
identical to the SSH-2 single-DES vtable; so now there's just one
vtable for DES-CBC which works in both protocols. The other two SSH-1
ciphers each had to stay separate, because 3DES is completely
different between SSH-1 and SSH-2 (three layers of CBC structure
versus one), and Blowfish varies in endianness and key length between
the two.
(Actually, while I'm here, I've only just noticed that the SSH-1
Blowfish cipher mis-describes itself in log messages as Blowfish-128.
In fact it passes the whole of the input key buffer, which has length
SSH1_SESSION_KEY_LENGTH == 32 bytes == 256 bits. So it's actually
Blowfish-256, and has been all along!)
I noticed a few of these in the course of preparing the previous
commit. I must have been writing that idiom out by hand for _ages_
before it became totally habitual to #define it as 'lenof' in every
codebase I touch. Now I've gone through and replaced all the old
verbosity with nice terse lenofs.
This is the commit that f3295e0fb _should_ have been. Yesterday I just
added some typedefs so that I didn't have to wear out my fingers
typing 'struct' in new code, but what I ought to have done is to move
all the typedefs into defs.h with the rest, and then go through
cleaning up the legacy 'struct's all through the existing code.
But I was mostly trying to concentrate on getting the test suite
finished, so I just did the minimum. Now it's time to come back and do
it better.
Previously, lots of individual ssh2_cipheralg structures were declared
static, and only available to the rest of the code via a smaller
number of 'ssh2_ciphers' objects that wrapped them into lists. But I'm
going to want to access individual ciphers directly in the testing
system I'm currently working on, so I'm giving all those objects
external linkage and declaring them in ssh.h.
Also, I've made up an entirely new one, namely exposing MD5 as an
instance of the general ssh_hashalg abstraction, which it has no need
to be for the purposes of actually using it in SSH. But, again, this
will let me treat it the same as all the other hashes in the test
system.
No functional change, for the moment.
This makes the API more flexible, so that it's not restricted to
taking a key of precisely the length specified in the ssh2_macalg
structure. Instead, ssh2bpp looks up that length to construct the
MAC's key.
Some MACs (e.g. Poly1305) will only _work_ with a single key length.
But this way, I can run standard test vectors against MACs that can
take a variable length (e.g. everything in the HMAC family).
The old 'Bignum' data type is gone completely, and so is sshbn.c. In
its place is a new thing called 'mp_int', handled by an entirely new
library module mpint.c, with API differences both large and small.
The main aim of this change is that the new library should be free of
timing- and cache-related side channels. I've written the code so that
it _should_ - assuming I haven't made any mistakes - do all of its
work without either control flow or memory addressing depending on the
data words of the input numbers. (Though, being an _arbitrary_
precision library, it does have to at least depend on the sizes of the
numbers - but there's a 'formal' size that can vary separately from
the actual magnitude of the represented integer, so if you want to
keep it secret that your number is actually small, it should work fine
to have a very long mp_int and just happen to store 23 in it.) So I've
done all my conditionalisation by means of computing both answers and
doing bit-masking to swap the right one into place, and all loops over
the words of an mp_int go up to the formal size rather than the actual
size.
I haven't actually tested the constant-time property in any rigorous
way yet (I'm still considering the best way to do it). But this code
is surely at the very least a big improvement on the old version, even
if I later find a few more things to fix.
I've also completely rewritten the low-level elliptic curve arithmetic
from sshecc.c; the new ecc.c is closer to being an adjunct of mpint.c
than it is to the SSH end of the code. The new elliptic curve code
keeps all coordinates in Montgomery-multiplication transformed form to
speed up all the multiplications mod the same prime, and only converts
them back when you ask for the affine coordinates. Also, I adopted
extended coordinates for the Edwards curve implementation.
sshecc.c has also had a near-total rewrite in the course of switching
it over to the new system. While I was there, I've separated ECDSA and
EdDSA more completely - they now have separate vtables, instead of a
single vtable in which nearly every function had a big if statement in
it - and also made the externally exposed types for an ECDSA key and
an ECDH context different.
A minor new feature: since the new arithmetic code includes a modular
square root function, we can now support the compressed point
representation for the NIST curves. We seem to have been getting along
fine without that so far, but it seemed a shame not to put it in,
since it was suddenly easy.
In sshrsa.c, one major change is that I've removed the RSA blinding
step in rsa_privkey_op, in which we randomise the ciphertext before
doing the decryption. The purpose of that was to avoid timing leaks
giving away the plaintext - but the new arithmetic code should take
that in its stride in the course of also being careful enough to avoid
leaking the _private key_, which RSA blinding had no way to do
anything about in any case.
Apart from those specific points, most of the rest of the changes are
more or less mechanical, just changing type names and translating code
into the new API.
In commit 884a7df94 I claimed that all my trait-like vtable systems
now had the generic object type being a struct rather than a bare
vtable pointer (e.g. instead of 'Socket' being a typedef for a pointer
to a const Socket_vtable, it's a typedef for a struct _containing_ a
vtable pointer).
In fact, I missed a few. This commit converts ssh_key, ssh2_cipher and
ssh1_cipher into the same form as the rest.
The annoying int64.h is completely retired, since C99 guarantees a
64-bit integer type that you can actually treat like an ordinary
integer. Also, I've replaced the local typedefs uint32 and word32
(scattered through different parts of the crypto code) with the
standard uint32_t.
Ian Jackson points out that the Linux kernel has a macro of this name
with the same purpose, and suggests that it's a good idea to use the
same name as they do, so that at least some people reading one code
base might recognise it from the other.
I never really thought very hard about what order FROMFIELD's
parameters should go in, and therefore I'm pleasantly surprised to
find that my order agrees with the kernel's, so I don't have to
permute every call site as part of making this change :-)
This piece of tidying-up has come out particularly well in terms of
saving tedious repetition and boilerplate. I've managed to remove
three pointless methods from every MAC implementation by means of
writing them once centrally in terms of the implementation-specific
methods; another method (hmacmd5_sink) vanished because I was able to
make the interface type 'ssh2_mac' be directly usable as a BinarySink
by way of a new delegation system; and because all the method
implementations can now find their own vtable, I was even able to
merge a lot of keying and output functions that had previously
differed only in length parameters by having them look up the lengths
in whatever vtable they were passed.
This is more or less the same job as the SSH-1 case, only more
extensive, because we have a wider range of ciphers.
I'm a bit disappointed about the AES case, in particular, because I
feel as if it ought to have been possible to arrange to combine this
layer of vtable dispatch with the subsidiary one that selects between
hardware and software implementations of the underlying cipher. I may
come back later and have another try at that, in fact.
This is a cleanup I started to notice a need for during the BinarySink
work. It removes a lot of faffing about casting things to char * or
unsigned char * so that some API will accept them, even though lots of
such APIs really take a plain 'block of raw binary data' argument and
don't care what C thinks the signedness of that data might be - they
may well reinterpret it back and forth internally.
So I've tried to arrange for all the function call APIs that ought to
have a void * (or const void *) to have one, and those that need to do
pointer arithmetic on the parameter internally can cast it back at the
top of the function. That saves endless ad-hoc casts at the call
sites.
Just as I did a few commits ago with the low-level SHA_Bytes type
functions, the ssh_hash and ssh_mac abstract types now no longer have
a direct foo->bytes() update method at all. Instead, each one has a
foo->sink() function that returns a BinarySink with the same lifetime
as the hash context, and then the caller can feed data into that in
the usual way.
This lets me get rid of a couple more duplicate marshalling routines
in ssh.c: hash_string(), hash_uint32(), hash_mpint().
Mark Wooding pointed out that my comment in make1305.py was completely
wrong, and that the stated strategy for reducing a value mod 2^130-5
would not in fact completely reduce all inputs in the range - for the
most obvious reason, namely that the numbers between 2^130-5 and 2^130
would never have anything subtracted at all.
Implemented a replacement strategy which my tests suggest will do the
right thing for all numbers in the expected range that are anywhere
near an integer multiple of the modulus.
As I mentioned in the previous commit, I'm going to want PuTTY to be
able to run sensibly when compiled with 64-bit Visual Studio,
including handling bignums in 64-bit chunks for speed. Unfortunately,
64-bit VS does not provide any type we can use as BignumDblInt in that
situation (unlike 64-bit gcc and clang, which give us __uint128_t).
The only facilities it provides are compiler intrinsics to access an
add-with-carry operation and a 64x64->128 multiplication (the latter
delivering its product in two separate 64-bit output chunks).
Hence, here's a substantial rework of the bignum code to make it
implement everything in terms of _those_ primitives, rather than
depending throughout on having BignumDblInt available to use ad-hoc.
BignumDblInt does still exist, for the moment, but now it's an
internal implementation detail of sshbn.h, only declared inside a new
set of macros implementing arithmetic primitives, and not accessible
to any code outside sshbn.h (which confirms that I really did catch
all uses of it and remove them).
The resulting code is surprisingly nice-looking, actually. You'd
expect more hassle and roundabout circumlocutions when you drop down
to using a more basic set of primitive operations, but actually, in
many cases it's turned out shorter to write things in terms of the new
BignumADC and BignumMUL macros - because almost all my uses of
BignumDblInt were implementing those operations anyway, taking several
lines at a time, and now they can do each thing in just one line.
The biggest headache was Poly1305: I wasn't able to find any sensible
way to adapt the existing Python script that generates the various
per-int-size implementations of arithmetic mod 2^130-5, and so I had
to rewrite it from scratch instead, with nothing in common with the
old version beyond a handful of comments. But even that seems to have
worked out nicely: the new version has much more legible descriptions
of the high-level algorithms, by virtue of having a 'Multiprecision'
type which wraps up the division into words, and yet Multiprecision's
range analysis allows it to automatically drop out special cases such
as multiplication by 5 being much easier than multiplication by
another multi-word integer.
The revamp of key generation in commit e460f3083 made the assumption
that you could decide how many bytes of key material to generate by
converting cipher->keylen from bits to bytes. This is a good
assumption for all ciphers except DES/3DES: since the SSH DES key
setup ignores one bit in every byte of key material it's given, you
need more bytes than its keylen field would have you believe. So
currently the DES ciphers aren't being keyed correctly.
The original keylen field is used for deciding how big a DH group to
request, and on that basis I think it still makes sense to keep it
reflecting the true entropy of a cipher key. So it turns out we need
two _separate_ key length fields per cipher - one for the real
entropy, and one for the much more obvious purpose of knowing how much
data to ask for from ssh2_mkkey.
A compensatory advantage, though, is that we can now measure the
latter directly in bytes rather than bits, so we no longer have to
faff about with dividing by 8 and rounding up.
The key derivation code has been assuming (though non-critically, as
it happens) that the size of the MAC output is the same as the size of
the MAC key. That isn't even a good assumption for the HMAC family,
due to HMAC-SHA1-96 and also the bug-compatible versions of HMAC-SHA1
that only use 16 bytes of key material; so now we have an explicit
key-length field separate from the MAC-length field.
While ChaCha20 takes a 64-bit nonce, SSH-2 defines the message
sequence number to wrap at 2^32 and OpenSSH stores it in a u_int32_t,
so the upper 32 bits should always be zero. PuTTY was getting this
wrong, and either using an incorrect nonce or causing GCC to complain
about an invalid shift, depending on the size of "unsigned long". Now
I think it gets it right.
gcc and clang both provide a type called __uint128_t when compiling
for 64-bit targets, code-generated more or less similarly to the way
64-bit long longs are handled on 32-bit targets (spanning two
registers, using ADD/ADC, that sort of thing). Where this is available
(and they also provide a handy macro to make it easy to detect), we
should obviously use it, so that we can handle bignums a larger chunk
at a time and make use of the full width of the hardware's multiplier.
Preliminary benchmarking using 'testbn' suggests a factor of about 2.5
improvement.
I've added the new possibility to the ifdefs in sshbn.h, and also
re-run contrib/make1305.py to generate a set of variants of the
poly1305 arithmetic for the new size of BignumInt.
In many places I was using an 'unsigned int', or an implicit int by
virtue of writing an undecorated integer literal, where what was
really wanted was a BignumInt. In particular, this substitution breaks
in any situation where BignumInt is _larger_ than unsigned - which it
is shortly about to be.
Rather than doing arithmetic mod 2^130-5 using the general-purpose
Bignum library, which requires lots of mallocs and frees per operation
and also uses a general-purpose divide routine for each modular
reduction, we now have some dedicated routines in sshccp.c to do
arithmetic mod 2^130-5 in a more efficient way, and hopefully also
with data-independent performance.
Because PuTTY's target platforms don't all use the same size of bignum
component, I've arranged to auto-generate the arithmetic functions
using a Python script living in the 'contrib' directory. As and when
we need to support an extra BignumInt size, that script should still
be around to re-run with different arguments.
I'd rather see the cipher and MAC named separately, with a hint that
the two are linked together in some way, than see the cipher called by
a name including the MAC and the MAC init message have an ugly
'<implicit>' in it.