gcc and clang both provide a type called __uint128_t when compiling
for 64-bit targets, code-generated more or less similarly to the way
64-bit long longs are handled on 32-bit targets (spanning two
registers, using ADD/ADC, that sort of thing). Where this is available
(and they also provide a handy macro to make it easy to detect), we
should obviously use it, so that we can handle bignums a larger chunk
at a time and make use of the full width of the hardware's multiplier.
Preliminary benchmarking using 'testbn' suggests a factor of about 2.5
improvement.
I've added the new possibility to the ifdefs in sshbn.h, and also
re-run contrib/make1305.py to generate a set of variants of the
poly1305 arithmetic for the new size of BignumInt.
In many places I was using an 'unsigned int', or an implicit int by
virtue of writing an undecorated integer literal, where what was
really wanted was a BignumInt. In particular, this substitution breaks
in any situation where BignumInt is _larger_ than unsigned - which it
is shortly about to be.
Rather than doing arithmetic mod 2^130-5 using the general-purpose
Bignum library, which requires lots of mallocs and frees per operation
and also uses a general-purpose divide routine for each modular
reduction, we now have some dedicated routines in sshccp.c to do
arithmetic mod 2^130-5 in a more efficient way, and hopefully also
with data-independent performance.
Because PuTTY's target platforms don't all use the same size of bignum
component, I've arranged to auto-generate the arithmetic functions
using a Python script living in the 'contrib' directory. As and when
we need to support an extra BignumInt size, that script should still
be around to re-run with different arguments.
I'd rather see the cipher and MAC named separately, with a hint that
the two are linked together in some way, than see the cipher called by
a name including the MAC and the MAC init message have an ugly
'<implicit>' in it.