The revamp of key generation in commit e460f3083 made the assumption
that you could decide how many bytes of key material to generate by
converting cipher->keylen from bits to bytes. This is a good
assumption for all ciphers except DES/3DES: since the SSH DES key
setup ignores one bit in every byte of key material it's given, you
need more bytes than its keylen field would have you believe. So
currently the DES ciphers aren't being keyed correctly.
The original keylen field is used for deciding how big a DH group to
request, and on that basis I think it still makes sense to keep it
reflecting the true entropy of a cipher key. So it turns out we need
two _separate_ key length fields per cipher - one for the real
entropy, and one for the much more obvious purpose of knowing how much
data to ask for from ssh2_mkkey.
A compensatory advantage, though, is that we can now measure the
latter directly in bytes rather than bits, so we no longer have to
faff about with dividing by 8 and rounding up.
The key derivation code has been assuming (though non-critically, as
it happens) that the size of the MAC output is the same as the size of
the MAC key. That isn't even a good assumption for the HMAC family,
due to HMAC-SHA1-96 and also the bug-compatible versions of HMAC-SHA1
that only use 16 bytes of key material; so now we have an explicit
key-length field separate from the MAC-length field.
While ChaCha20 takes a 64-bit nonce, SSH-2 defines the message
sequence number to wrap at 2^32 and OpenSSH stores it in a u_int32_t,
so the upper 32 bits should always be zero. PuTTY was getting this
wrong, and either using an incorrect nonce or causing GCC to complain
about an invalid shift, depending on the size of "unsigned long". Now
I think it gets it right.
gcc and clang both provide a type called __uint128_t when compiling
for 64-bit targets, code-generated more or less similarly to the way
64-bit long longs are handled on 32-bit targets (spanning two
registers, using ADD/ADC, that sort of thing). Where this is available
(and they also provide a handy macro to make it easy to detect), we
should obviously use it, so that we can handle bignums a larger chunk
at a time and make use of the full width of the hardware's multiplier.
Preliminary benchmarking using 'testbn' suggests a factor of about 2.5
improvement.
I've added the new possibility to the ifdefs in sshbn.h, and also
re-run contrib/make1305.py to generate a set of variants of the
poly1305 arithmetic for the new size of BignumInt.
In many places I was using an 'unsigned int', or an implicit int by
virtue of writing an undecorated integer literal, where what was
really wanted was a BignumInt. In particular, this substitution breaks
in any situation where BignumInt is _larger_ than unsigned - which it
is shortly about to be.
Rather than doing arithmetic mod 2^130-5 using the general-purpose
Bignum library, which requires lots of mallocs and frees per operation
and also uses a general-purpose divide routine for each modular
reduction, we now have some dedicated routines in sshccp.c to do
arithmetic mod 2^130-5 in a more efficient way, and hopefully also
with data-independent performance.
Because PuTTY's target platforms don't all use the same size of bignum
component, I've arranged to auto-generate the arithmetic functions
using a Python script living in the 'contrib' directory. As and when
we need to support an extra BignumInt size, that script should still
be around to re-run with different arguments.
I'd rather see the cipher and MAC named separately, with a hint that
the two are linked together in some way, than see the cipher called by
a name including the MAC and the MAC init message have an ugly
'<implicit>' in it.