It's now a subroutine specific to RSA key generation, because the
reworked PrimeCandidateSource system can handle the requirements of
DSA generation automatically.
The difference is that in DSA, one of the primes you generate is used
as a factor in the generation of the other, so you can just pass q as
a factor to pcs_require_residue_1, and it can get the range right by
itself. But in RSA, neither prime is generated with the other one in
mind; they're conceptually generated separately and independently,
apart from that single joint restriction on their product.
(I _could_ have added a feature to PrimeCandidateSource to specify a
range for the prime more specifically than a few initial bits. But I
didn't want to, because it would have been more complicated than doing
it this way, and also slightly less good: if you invent one prime
first and then constrain the range of the other one once you know the
first, then you're not getting an even probability distribution of the
possible _pairs_ of primes - you're privileging one over the other and
skewing the distribution.)
I've replaced the random number generation and small delta-finding
loop in primegen() with a much more elaborate system in its own source
file, with unit tests and everything.
Immediate benefits:
- fixes a theoretical possibility of overflowing the target number of
bits, if the random number was so close to the top of the range
that the addition of delta * factor pushed it over. However, this
only happened with negligible probability.
- fixes a directional bias in delta-finding. The previous code
incremented the number repeatedly until it found a value coprime to
all the right things, which meant that a prime preceded by a
particularly long sequence of numbers with tiny factors was more
likely to be chosen. Now we select candidate delta values at
random, that bias should be eliminated.
- changes the semantics of the outermost primegen() function to make
them easier to use, because now the caller specifies the 'bits' and
'firstbits' values for the actual returned prime, rather than
having to account for the factor you're multiplying it by in DSA.
DSA client code is correspondingly adjusted.
Future benefits:
- having the candidate generation in a separate function makes it
easy to reuse in alternative prime generation strategies
- the available constraints support applications such as Maurer's
algorithm for generating provable primes, or strong primes for RSA
in which both p-1 and p+1 have a large factor. So those become
things we could experiment with in future.
Unlike the ones in mpint.c proper, these are not intended to respect
the constant-time guarantees. They're going to be the kind of thing
you use in key generation, which is too random to be constant-time in
any case.
I've arranged several precautions to try to make sure these functions
don't accidentally get linked into the main SSH application, only into
key generators:
- declare them in a separate header with "unsafe" in the name
- put "unsafe" in the name of every actual function
- don't even link the mpunsafe.c translation unit into PuTTY proper
- in fact, define global symbols of the same name in that file and
the SSH client code, so that there will be a link failure if we
ever try to do it by accident
The initial contents of the new source file consist of the subroutine
mp_mod_short() that previously lived in sshprime.c (and was not in
mpint.c proper precisely because it was unsafe). While I'm here, I've
turned it into mp_unsafe_mod_integer() and let it take a modulus of up
to 32 bits instead of 16.
Also added some obviously useful functions to shrink an mpint to the
smallest physical size that can hold the contained number (rather like
bn_restore_invariant in the old Bignum system), which I expect to be
using shortly.
Mostly because I just had a neat idea about how to expose that large
mutable array without it being a mutable global variable: make it a
static in its own module, and expose only a _pointer_ to it, which is
const-qualified.
While I'm there, changed the name to something more descriptive.
The number of people has been steadily increasing who read our source
code with an editor that thinks tab stops are 4 spaces apart, as
opposed to the traditional tty-derived 8 that the PuTTY code expects.
So I've been wondering for ages about just fixing it, and switching to
a spaces-only policy throughout the code. And I recently found out
about 'git blame -w', which should make this change not too disruptive
for the purposes of source-control archaeology; so perhaps now is the
time.
While I'm at it, I've also taken the opportunity to remove all the
trailing spaces from source lines (on the basis that git dislikes
them, and is the only thing that seems to have a strong opinion one
way or the other).
Apologies to anyone downstream of this code who has complicated patch
sets to rebase past this change. I don't intend it to be needed again.
If you try to generate (say) a 2049-bit RSA key, then primegen will
try to generate a 1025-bit prime. It will do it by making a random
1024-bit mp_int (that is, one strictly _less_ than 2^1024), and then
trying to set bit 1024. But that will fail an assertion in mp_set_bit,
because the number of random bits is a multiple of BIGNUM_INT_BITS, so
an mp_int of the minimum size that can hold the random bits is not
quite big enough to hold the extra bit at the top.
Fix: change the strategy in primegen so that we allocate the mp_int
large enough to hold even the top bit, and copy in the random numbers
via mp_or_into.
There's a second bug hiding behind that one. If the key has odd size,
then the two primes are generated with different bit lengths. If the
overall key size is congruent to 1 mod (2*BIGNUM_INT_BITS), then the
two primes will be allocated as mp_ints with different numbers of
words, leading to another assertion failure in the mp_cond_swap that
sorts the primes into a consistent order.
Fix for that one: if the primes are being generated different bit
lengths, then we arrange those lengths to be already in the right
order, and replace the mp_cond_swap with an assert() that checks the
ordering is already correct.
Combined effect: now you should be able to successfully generate a
2049-bit key without assertion failures.
I carefully tested commit 801ab68ea's rewrite of invent_firstbits in
every way I could think of to ensure that I really was generating two
values whose product was at least 'minproduct'. But unfortunately the
value of 'minproduct' itself was off by a factor of two, which made
the entire system pointless!
Mostly noticed in passing while using Address / Leak Sanitiser to
check over the previous commit. One highlight here is freeing of the
previous iqmp value in rsa_verify, which was actually a potentially
sensitive leak, introduced in the mp_int rewrite (commit 25b034ee3).
Instead of repeatedly looping on the random number generator until it
comes up with two values that have a large enough product, the new
version guarantees only one use of random numbers, by first counting
up all the possible pairs of values that would work, and then
inventing a single random number that's used as an index into that
list.
I've done the selection from the list using constant-time techniques,
not particularly because I think key generation can be made CT in
general, but out of sheer habit after the last few months, and who
knows, it _might_ be useful.
While I'm at it, I've also added an option to make sure the two
firstbits values differ by at least a given value. For RSA, I set that
value to 2, guaranteeing that even if the smaller prime has a very
long string of 1 bits after the firstbits value and the larger has a
long string of 0, they'll still have a relative difference of at least
2^{-12}. Not that there was any serious chance of the primes having
randomly ended up so close together as to make the key in danger of
factoring, but it seems like a silly thing to leave out if I'm
rewriting the function anyway.
This is in preparation for a PRNG revamp which will want to have a
well defined boundary for any given request-for-randomness, so that it
can destroy the evidence afterwards. So no more looping round calling
random_byte() and then stopping when we feel like it: now you say up
front how many random bytes you want, and call random_read() which
gives you that many in one go.
Most of the call sites that had to be fixed are fairly mechanical, and
quite a few ended up more concise afterwards. A few became more
cumbersome, such as mp_random_bits, in which the new API doesn't let
me load the random bytes directly into the target integer without
triggering undefined behaviour, so instead I have to allocate a
separate temporary buffer.
The _most_ interesting call site was in the PKCS#1 v1.5 padding code
in sshrsa.c (used in SSH-1), in which you need a stream of _nonzero_
random bytes. The previous code just looped on random_byte, retrying
if it got a zero. Now I'm doing a much more interesting thing with an
mpint, essentially scaling a binary fraction repeatedly to extract a
number in the range [0,255) and then adding 1 to it.
The old 'Bignum' data type is gone completely, and so is sshbn.c. In
its place is a new thing called 'mp_int', handled by an entirely new
library module mpint.c, with API differences both large and small.
The main aim of this change is that the new library should be free of
timing- and cache-related side channels. I've written the code so that
it _should_ - assuming I haven't made any mistakes - do all of its
work without either control flow or memory addressing depending on the
data words of the input numbers. (Though, being an _arbitrary_
precision library, it does have to at least depend on the sizes of the
numbers - but there's a 'formal' size that can vary separately from
the actual magnitude of the represented integer, so if you want to
keep it secret that your number is actually small, it should work fine
to have a very long mp_int and just happen to store 23 in it.) So I've
done all my conditionalisation by means of computing both answers and
doing bit-masking to swap the right one into place, and all loops over
the words of an mp_int go up to the formal size rather than the actual
size.
I haven't actually tested the constant-time property in any rigorous
way yet (I'm still considering the best way to do it). But this code
is surely at the very least a big improvement on the old version, even
if I later find a few more things to fix.
I've also completely rewritten the low-level elliptic curve arithmetic
from sshecc.c; the new ecc.c is closer to being an adjunct of mpint.c
than it is to the SSH end of the code. The new elliptic curve code
keeps all coordinates in Montgomery-multiplication transformed form to
speed up all the multiplications mod the same prime, and only converts
them back when you ask for the affine coordinates. Also, I adopted
extended coordinates for the Edwards curve implementation.
sshecc.c has also had a near-total rewrite in the course of switching
it over to the new system. While I was there, I've separated ECDSA and
EdDSA more completely - they now have separate vtables, instead of a
single vtable in which nearly every function had a big if statement in
it - and also made the externally exposed types for an ECDSA key and
an ECDH context different.
A minor new feature: since the new arithmetic code includes a modular
square root function, we can now support the compressed point
representation for the NIST curves. We seem to have been getting along
fine without that so far, but it seemed a shame not to put it in,
since it was suddenly easy.
In sshrsa.c, one major change is that I've removed the RSA blinding
step in rsa_privkey_op, in which we randomise the ciphertext before
doing the decryption. The purpose of that was to avoid timing leaks
giving away the plaintext - but the new arithmetic code should take
that in its stride in the course of also being careful enough to avoid
leaking the _private key_, which RSA blinding had no way to do
anything about in any case.
Apart from those specific points, most of the rest of the changes are
more or less mechanical, just changing type names and translating code
into the new API.
It wasn't really doing any serious harm, but I just got tired of
having to scroll past 700 lines of pointless static data every time I
wanted to look at the actual code in the file. Now primes[] is
initialised as necessary when genprime is first called.
(Since we only use primes up to 2^16, I didn't see any point in doing
anything fancy; this is the most trivial Sieve of Eratosthenes.)
feature (make sure your prime is not congruent to Foo mod Bar)
largely ineffective. As a result, RSA keys were being generated
every so often with at least one prime congruent to 1 mod 37,
causing modinv(37, phi(n)) to divide by zero, and rightly so. I
believe this fixes `puttygen-zero-div'.
[originally from svn r3316]