We already had a function pcs_require_residue_1() which lets you ask
PrimeCandidateSource to ensure it only returns numbers congruent to 1
mod a given value. pcs_require_residue_1_mod_prime() is the same, but
it also records the number in a list of prime factors of n-1, which
can be queried later.
The idea is that if you're generating a DSA key, in which the small
prime q must divide p-1, the upcoming provable generation algorithm
will be able to recover q from the PrimeCandidateSource and use it as
part of the primality certificate, which reduces the number of bits of
extra prime factors it also has to make up.
The functions primegen() and primegen_add_progress_phase() are gone.
In their place is a small vtable system with two methods corresponding
to them, plus the usual admin of allocating and freeing contexts.
This API change is the starting point for being able to drop in
different prime generation algorithms at run time in response to user
configuration.
Having just written a comment about how it was almost inconceivably
improbable that you _wouldn't_ be successful in finding a suitable g
on the very first number you tried, I couldn't help noticing that in
fact my very next DSA key generation test had to try twice. Had I made
a mistake in my probability theory?
No, it turns out: I find g by raising consecutive numbers to the power
(p-1)/q and looking to see if they're not 1, but I start with 1
itself, which along with -1 is the only number that _can't_ work!
Save a bit of pointless effort and iterate up from 2 instead.
Thanks to Pavel Kryukov's CI for pointing this out: VS 2015 doesn't
support C99 hex floating-point literals.
(VS 2017 does, and in general we're treating this as a C99-permitted
code base these days. So I was tempted to just increase the minimum
required compiler version and leave this code as it is. But since the
use of that particular floating literal was so totally frivolous and
unnecessary, I think I'll leave that for another day when it's more
important!)
The old API was one of those horrible things I used to do when I was
young and foolish, in which you have just one function, and indicate
which of lots of things it's doing by passing in flags. It was crying
out to be replaced with a vtable.
While I'm at it, I've reworked the code on the Windows side that
decides what to do with the progress bar, so that it's based on
actually justifiable estimates of probability rather than magic
integer constants.
Since computers are generally faster now than they were at the start
of this project, I've also decided there's no longer any point in
making the fixed final part of RSA key generation bother to report
progress at all. So the progress bars are now only for the variable
part, i.e. the actual prime generations.
(This is a reapplication of commit a7bdefb39, without the Miller-Rabin
refactoring accidentally folded into it. Also this time I've added -lm
to the link options, which for some reason _didn't_ cause me a link
failure last time round. No idea why not.)
This reverts commit a7bdefb394.
I had accidentally mashed it together with another commit. I did
actually want to push both of them, but I'd rather push them
separately! So I'm backing out the combined blob, and I'll re-push
them with their proper comments and explanations.
The old API was one of those horrible things I used to do when I was
young and foolish, in which you have just one function, and indicate
which of lots of things it's doing by passing in flags. It was crying
out to be replaced with a vtable.
While I'm at it, I've reworked the code on the Windows side that
decides what to do with the progress bar, so that it's based on
actually justifiable estimates of probability rather than magic
integer constants.
Since computers are generally faster now than they were at the start
of this project, I've also decided there's no longer any point in
making the fixed final part of RSA key generation bother to report
progress at all. So the progress bars are now only for the variable
part, i.e. the actual prime generations.
The more features and options I add to PrimeCandidateSource, the more
cumbersome it will be to replicate each one in a command-line option
to the ultimate primegen() function. So I'm moving to an API in which
the client of primegen() constructs a PrimeCandidateSource themself,
and passes it in to primegen().
Also, changed the API for pcs_new() so that you don't have to pass
'firstbits' unless you really want to. The net effect is that even
though we've added flexibility, we've also simplified the call sites
of primegen() in the simple case: if you want a 1234-bit prime, you
just need to pass pcs_new(1234) as the argument to primegen, and
you're done.
The new declaration of primegen() lives in ssh_keygen.h, along with
all the types it depends on. So I've had to #include that header in a
few new files.
I've replaced the random number generation and small delta-finding
loop in primegen() with a much more elaborate system in its own source
file, with unit tests and everything.
Immediate benefits:
- fixes a theoretical possibility of overflowing the target number of
bits, if the random number was so close to the top of the range
that the addition of delta * factor pushed it over. However, this
only happened with negligible probability.
- fixes a directional bias in delta-finding. The previous code
incremented the number repeatedly until it found a value coprime to
all the right things, which meant that a prime preceded by a
particularly long sequence of numbers with tiny factors was more
likely to be chosen. Now we select candidate delta values at
random, that bias should be eliminated.
- changes the semantics of the outermost primegen() function to make
them easier to use, because now the caller specifies the 'bits' and
'firstbits' values for the actual returned prime, rather than
having to account for the factor you're multiplying it by in DSA.
DSA client code is correspondingly adjusted.
Future benefits:
- having the candidate generation in a separate function makes it
easy to reuse in alternative prime generation strategies
- the available constraints support applications such as Maurer's
algorithm for generating provable primes, or strong primes for RSA
in which both p-1 and p+1 have a large factor. So those become
things we could experiment with in future.
The number of people has been steadily increasing who read our source
code with an editor that thinks tab stops are 4 spaces apart, as
opposed to the traditional tty-derived 8 that the PuTTY code expects.
So I've been wondering for ages about just fixing it, and switching to
a spaces-only policy throughout the code. And I recently found out
about 'git blame -w', which should make this change not too disruptive
for the purposes of source-control archaeology; so perhaps now is the
time.
While I'm at it, I've also taken the opportunity to remove all the
trailing spaces from source lines (on the basis that git dislikes
them, and is the only thing that seems to have a strong opinion one
way or the other).
Apologies to anyone downstream of this code who has complicated patch
sets to rebase past this change. I don't intend it to be needed again.
Instead of repeatedly looping on the random number generator until it
comes up with two values that have a large enough product, the new
version guarantees only one use of random numbers, by first counting
up all the possible pairs of values that would work, and then
inventing a single random number that's used as an index into that
list.
I've done the selection from the list using constant-time techniques,
not particularly because I think key generation can be made CT in
general, but out of sheer habit after the last few months, and who
knows, it _might_ be useful.
While I'm at it, I've also added an option to make sure the two
firstbits values differ by at least a given value. For RSA, I set that
value to 2, guaranteeing that even if the smaller prime has a very
long string of 1 bits after the firstbits value and the larger has a
long string of 0, they'll still have a relative difference of at least
2^{-12}. Not that there was any serious chance of the primes having
randomly ended up so close together as to make the key in danger of
factoring, but it seems like a silly thing to leave out if I'm
rewriting the function anyway.
The old 'Bignum' data type is gone completely, and so is sshbn.c. In
its place is a new thing called 'mp_int', handled by an entirely new
library module mpint.c, with API differences both large and small.
The main aim of this change is that the new library should be free of
timing- and cache-related side channels. I've written the code so that
it _should_ - assuming I haven't made any mistakes - do all of its
work without either control flow or memory addressing depending on the
data words of the input numbers. (Though, being an _arbitrary_
precision library, it does have to at least depend on the sizes of the
numbers - but there's a 'formal' size that can vary separately from
the actual magnitude of the represented integer, so if you want to
keep it secret that your number is actually small, it should work fine
to have a very long mp_int and just happen to store 23 in it.) So I've
done all my conditionalisation by means of computing both answers and
doing bit-masking to swap the right one into place, and all loops over
the words of an mp_int go up to the formal size rather than the actual
size.
I haven't actually tested the constant-time property in any rigorous
way yet (I'm still considering the best way to do it). But this code
is surely at the very least a big improvement on the old version, even
if I later find a few more things to fix.
I've also completely rewritten the low-level elliptic curve arithmetic
from sshecc.c; the new ecc.c is closer to being an adjunct of mpint.c
than it is to the SSH end of the code. The new elliptic curve code
keeps all coordinates in Montgomery-multiplication transformed form to
speed up all the multiplications mod the same prime, and only converts
them back when you ask for the affine coordinates. Also, I adopted
extended coordinates for the Edwards curve implementation.
sshecc.c has also had a near-total rewrite in the course of switching
it over to the new system. While I was there, I've separated ECDSA and
EdDSA more completely - they now have separate vtables, instead of a
single vtable in which nearly every function had a big if statement in
it - and also made the externally exposed types for an ECDSA key and
an ECDH context different.
A minor new feature: since the new arithmetic code includes a modular
square root function, we can now support the compressed point
representation for the NIST curves. We seem to have been getting along
fine without that so far, but it seemed a shame not to put it in,
since it was suddenly easy.
In sshrsa.c, one major change is that I've removed the RSA blinding
step in rsa_privkey_op, in which we randomise the ciphertext before
doing the decryption. The purpose of that was to avoid timing leaks
giving away the plaintext - but the new arithmetic code should take
that in its stride in the course of also being careful enough to avoid
leaking the _private key_, which RSA blinding had no way to do
anything about in any case.
Apart from those specific points, most of the rest of the changes are
more or less mechanical, just changing type names and translating code
into the new API.
In commit 884a7df94 I claimed that all my trait-like vtable systems
now had the generic object type being a struct rather than a bare
vtable pointer (e.g. instead of 'Socket' being a typedef for a pointer
to a const Socket_vtable, it's a typedef for a struct _containing_ a
vtable pointer).
In fact, I missed a few. This commit converts ssh_key, ssh2_cipher and
ssh1_cipher into the same form as the rest.
After Pavel Kryukov pointed out that I have to put _something_ in the
'ssh_key' structure, I thought of an actually useful thing to put
there: why not make it store a pointer to the ssh_keyalg structure?
Then ssh_key becomes a classoid - or perhaps 'traitoid' is a closer
analogy - in the same style as Socket and Plug. And just like Socket
and Plug, I've also arranged a system of wrapper macros that avoid the
need to mention the 'object' whose method you're invoking twice at
each call site.
The new vtable pointer directly replaces an existing field of struct
ec_key (which was usable by several different ssh_keyalgs, so it
already had to store a pointer to the currently active one), and also
replaces the 'alg' field of the ssh2_userkey structure that wraps up a
cryptographic key with its comment field.
I've also taken the opportunity to clean things up a bit in general:
most of the methods now have new and clearer names (e.g. you'd never
know that 'newkey' made a public-only key while 'createkey' made a
public+private key pair unless you went and looked it up, but now
they're called 'new_pub' and 'new_priv' you might be in with a
chance), and I've completely removed the openssh_private_npieces field
after realising that it was duplicating information that is actually
_more_ conveniently obtained by calling the new_priv_openssh method
(formerly openssh_createkey) and throwing away the result.