The new testcrypt system made it easy to write a tiny Python program
that does a lot of multiplications of various large sizes, run it
against versions of the testcrypt binary built with lots of different
threshold settings, and time the output by running the Python program
with PUTTY_TESTCRYPT="command time -f %U ./testcrypt".
When I tried that I found that lots of values in the 20-30 range
looked about as good as each other. 24 was an unusually low dip which
could well have just been a random outlier, but it's a nice round
number so I picked it anyway.
I wrote it for the sake of a test-system design I had in mind at the
time, but that design changed after I committed, and now I think
_even_ my upcoming test application won't need to copy MontyContexts.
So I'll remove the function now, so as not to have to pointlessly
write tests for it :-)
If it had to negate x-y to make it positive for mp_mod, but the answer
comes out as zero after that, then after re-negating it this is the
one case where we _shouldn't_ add the modulus afterwards. Result was
that, for example, mp_modsub(0, 0, 5) would return 5 instead of the
obvious 0.
It correctly masked off bits in the partial word, but then left all
higher words _unchanged_ rather than zeroing them.
Apparently its use in mp_invert_mod_2to was in restricted enough
circumstances not to cause a failure there!
My test for whether x has a square root was based on testing whether a
large power of x was congruent to 1 mod p, which is a fine test
provided x is in the multiplicative group of p, but would give a false
negative on the one possible input value that _isn't_ - namely zero.
The actual number returned from the function is fine (because that too
is a large power of the input, and when the input is 0 that's
foolproof). So I just needed to add a special case for the returned
'success' flag.
I had not considered what would happen if the input and output mp_ints
aliased each other. What happens is that because I wrote the output
starting from the low end, input words would get corrupted before I'd
finished with them. Easily fixed by reversing the order of the loop.
misc.c has always contained a combination of things that are tied
tightly into the PuTTY code base (e.g. they use the conf system, or
work with our sockets abstraction) and things that are pure standalone
utility functions like nullstrcmp() which could quite happily be
dropped into any C program without causing a link failure.
Now the latter kind of standalone utility code lives in the new source
file utils.c, whose only external dependency is on memory.c (for snew,
sfree etc), which in turn requires the user to provide an
out_of_memory() function. So it should now be much easier to link test
programs that use PuTTY's low-level functions without also pulling in
half its bulky infrastructure.
In the process, I came across a memory allocation logging system
enabled by -DMALLOC_LOG that looks long since bit-rotted; in any case
we have much more advanced tools for that kind of thing these days,
like valgrind and Leak Sanitiser, so I've just removed it rather than
trying to transplant it somewhere sensible. (We can always pull it
back out of the version control history if really necessary, but I
haven't used it in at least a decade.)
The other slightly silly thing I did was to give bufchain a function
pointer field that points to queue_idempotent_callback(), and disallow
direct setting of the 'ic' field in favour of calling
bufchain_set_callback which will fill that pointer in too. That allows
the bufchain system to live in utils.c rather than misc.c, so that
programs can use it without also having to link in the callback system
or provide an annoying stub of that function. In fact that's just
allowed me to remove stubs of that kind from PuTTYgen and Pageant!
The old 'Bignum' data type is gone completely, and so is sshbn.c. In
its place is a new thing called 'mp_int', handled by an entirely new
library module mpint.c, with API differences both large and small.
The main aim of this change is that the new library should be free of
timing- and cache-related side channels. I've written the code so that
it _should_ - assuming I haven't made any mistakes - do all of its
work without either control flow or memory addressing depending on the
data words of the input numbers. (Though, being an _arbitrary_
precision library, it does have to at least depend on the sizes of the
numbers - but there's a 'formal' size that can vary separately from
the actual magnitude of the represented integer, so if you want to
keep it secret that your number is actually small, it should work fine
to have a very long mp_int and just happen to store 23 in it.) So I've
done all my conditionalisation by means of computing both answers and
doing bit-masking to swap the right one into place, and all loops over
the words of an mp_int go up to the formal size rather than the actual
size.
I haven't actually tested the constant-time property in any rigorous
way yet (I'm still considering the best way to do it). But this code
is surely at the very least a big improvement on the old version, even
if I later find a few more things to fix.
I've also completely rewritten the low-level elliptic curve arithmetic
from sshecc.c; the new ecc.c is closer to being an adjunct of mpint.c
than it is to the SSH end of the code. The new elliptic curve code
keeps all coordinates in Montgomery-multiplication transformed form to
speed up all the multiplications mod the same prime, and only converts
them back when you ask for the affine coordinates. Also, I adopted
extended coordinates for the Edwards curve implementation.
sshecc.c has also had a near-total rewrite in the course of switching
it over to the new system. While I was there, I've separated ECDSA and
EdDSA more completely - they now have separate vtables, instead of a
single vtable in which nearly every function had a big if statement in
it - and also made the externally exposed types for an ECDSA key and
an ECDH context different.
A minor new feature: since the new arithmetic code includes a modular
square root function, we can now support the compressed point
representation for the NIST curves. We seem to have been getting along
fine without that so far, but it seemed a shame not to put it in,
since it was suddenly easy.
In sshrsa.c, one major change is that I've removed the RSA blinding
step in rsa_privkey_op, in which we randomise the ciphertext before
doing the decryption. The purpose of that was to avoid timing leaks
giving away the plaintext - but the new arithmetic code should take
that in its stride in the course of also being careful enough to avoid
leaking the _private key_, which RSA blinding had no way to do
anything about in any case.
Apart from those specific points, most of the rest of the changes are
more or less mechanical, just changing type names and translating code
into the new API.