If you want to generate a Sophie Germain / safe prime pair with this
code, then after you've made p, you need to test the primality of
2p+1.
The easiest way to do that is to make a PrimeCandidateSource that is
so constrained as to only be able to deliver 2p+1 as a candidate, and
then run the ordinary prime generation system. The problem is that the
prime generation loops forever, so if 2p+1 _isn't_ prime, it will just
keep testing the same number over and over again and failing the test.
To solve this, I add a 'one-shot mode' to the PrimeCandidateSource
itself, which will cause it to return NULL if asked to generate a
second candidate. Then the prime-generation loops all detect that and
return NULL in turn. However, for clients that _don't_ set a pcs into
one-shot mode, the API remains unchanged: pcs_generate will not return
until it's found a prime (by its own criteria).
This feels like a bit of a bodge, API-wise. But the other two obvious
approaches turn out more awkward.
One option is to extract the Pockle from the PrimeGenerationContext
and use that to directly test primality of 2p+1 based on p - but that
way you don't get to _probabilistically_ generate safe primes, because
that kind of PGC has no Pockle in the first place. (And you can't make
one separately, because you can't convince it that an only
probabilistically generated p is prime!)
Another option is to add a test() method to PrimeGenerationPolicy,
that sits alongside generate(). Then, having generated p, you just
_test_ 2p+1. But then in the provable case you have to explain to it
that it should use p as part of the proof, so that API would get
awkward in its own way.
So this is actually the least disruptive way to do it!
A Sophie Germain prime is a prime p such that 2p+1 is also prime. The
larger prime of the pair 2p+1 is also known as a 'safe prime', and is
the preferred kind of modulus for conventional Diffie-Hellman.
Generating these is harder work than normal prime generation. There's
not really much of a technique except to just keep generating
candidate primes p and then testing 2p+1. But what you _can_ do to
speed things up is to get the prime-candidate generator to help a bit:
it's already enforcing that no small prime divides p, and it's easy to
get it to also enforce that no small prime divides 2p+1. That check
can filter out a lot of bad candidates early, before you waste time on
the more expensive checks, so you have a better chance of success with
each number that gets as far as Miller-Rabin.
Here I add an extra setup function for PrimeCandidateSource which
enables those extra checks. After you call pcs_try_sophie_germain(),
the PCS will only deliver you numbers for which both p and 2p+1 are
free of small factors.
Conveniently checkable certificates of primality aren't a new concept.
I didn't invent them, and I wasn't the first to implement them. Given
that, I thought it might be useful to be able to independently verify
a prime generated by PuTTY's provable prime system. Then, even if you
don't trust _this_ code, you might still trust someone else's
verifier, or at least be less willing to believe that both were
colluding.
The Perl module Math::Prime::Util is the only free software I've found
that defines a specific text-file format for certificates of
primality. The MPU format (as it calls it) supports various different
methods of certifying the primality of a number (most of which, like
Pockle's, depend on having previously proved some smaller number(s) to
be prime). The system implemented by Pockle is on its list: MPU calls
it by the name "BLS5".
So this commit introduces extra stored data inside Pockle so that it
remembers not just _that_ it believes certain numbers to be prime, but
also _why_ it believed each one to be prime. Then there's an extra
method in the Pockle API to translate its internal data structures
into the text of an MPU certificate for any number it knows about.
Math::Prime::Util doesn't come with a command-line verification tool,
unfortunately; only a Perl function which you feed a string argument.
So also in this commit I add test/mpu-check.pl, which is a trivial
command-line client of that function.
At the moment, this new piece of API is only exposed via testcrypt. I
could easily put some user interface into the key generation tools
that would save a few primality certificates alongside the private
key, but I have yet to think of any good reason to do it. Mostly this
facility is intended for debugging and cross-checking of the
_algorithm_, not of any particular prime.
Most of them are now _mandatory_ P3 scripts, because I'm tired of
maintaining everything to be compatible with both versions.
The current exceptions are gdb.py (which has to live with whatever gdb
gives it), and kh2reg.py (which is actually designed for other people
to use, and some of them might still be stuck on P2 for the moment).
This enables Pageant to act as a client for OpenSSH's agent, which
nowadays refuses to respond to SSH1_AGENTC_REQUEST_RSA_IDENTITIES, or
any other SSH1_AGENTC_* message. It now treats SSH_AGENT_FAILURE in
response to either 'list identities' request the same as successfully
receiving an empty list.
We're no longer doing the delta step at all, and even if we were, it's
not really part of _that_ particular algorithm - candidate selection
is centralised between all of them.
When judging how many bits of the generated prime we can afford to
consume with factors of p-1 and still have enough last few bits to
vary to find an actual prime in the range, I started by setting
max_bits_needed to the total size of the required output number, and
then subtracting a safety margin.
But that doesn't account for the fact that some bits may _already_
have been used by prior requirements from the PrimeCandidateSource,
such as the 'firstbits' used in RSA generation, or the 160-bit factor
of p-1 used in DSA.
So now we start by initialising max_bits_needed by asking the PCS how
many bits of entropy it still has left, and making sure not to reduce
_that_ by too much. Should fix another cause of hangs during prime
generation.
(Also, while I'm here, I've tweaked one of the compiled-out
diagnostics so that it reports how many bits it _does_ have left once
it starts trying to find a prime. That should make it easier to spot
any further problems in this area.)
I generated a list of sizes in bits for factors of p-1, knowing a
range of bit sizes that I wanted their product to fall in. But I
forgot that when you multiply together two or more numbers of
particular bit sizes, you can get more than one possible bit size
back. My code was estimating at the low end of the possible range, so
sometimes it would end up with more bits of the output prime specified
than it expected, and be left without enough variable bits to actually
be able to find a prime somewhere in the remaining space.
Now when I'm planning the factor list, I compute both the min and max
sizes of the product of the factors, and abort if any part of the
possible range is outside the safe zone.
How embarrassing - this morning's triumphant push of a shiny new
public-key method managed to break the entire GUI configuration system
so that it dereferences a null pointer during setup. That's what I get
for only testing the crypto side.
settings.c generates a preference list of host-key enum values that
included HK_ED448. So then hklist_handler() in config.c tries to look
that id up in its list of names, and doesn't find one, because I
forgot to add it there. Now reinstated.
The comparison functions between an mp_int and an integer worked by
walking along the mp_int, comparing each of its words to the
corresponding word of the integer. When they ran out of mp_int, they'd
stop.
But this overlooks the possibility that they might not have run out of
_integer_ yet! If BIGNUM_INT_BITS is defined to be less than the size
of a uintmax_t, then comparing (say) the uintmax_t 0x8000000000000001
against a one-word mp_int containing 0x0001 would return equality,
because it would never get as far as spotting the high bit of the
integer.
Fixed by iterating up to the max of the number of BignumInts in the
mp_int and the number that cover a uintmax_t. That means we have to
use mp_word() instead of a direct array lookup to get the mp_int words
to compare against, since now the word indices might be out of range.
Functions like mp_copy_integer_into, mp_add_integer_into and
mp_hs_integer all take an ordinary C integer in the form of a
uintmax_t, and perform an operation between that and an mp_int. In
order to do that, they have to break it up into some number of
BignumInt, via bit shifts.
But in C, shifting by an amount equal to or greater than the width of
the type is undefined behaviour, and you risk the compiler generating
nonsense or complaining at compile time. I did various dodges in those
functions to try to avoid that, but didn't manage to use the same
idiom everywhere. Sometimes I'd leave the integer in its original form
and shift it right by increasing multiples of BIGNUM_INT_BITS;
sometimes I'd shift it down in place every time. And mostly I'd do the
conditional shift by checking against sizeof(n), but once I did it by
shifting by half the word and then the other half.
Now refactored so that there's a pair of functions to shift a
uintmax_t left or right by BIGNUM_INT_BITS in what I hope is a UB-safe
manner, and changed all the code I could find to use them.
This is standardised by RFC 8709 at SHOULD level, and for us it's not
too difficult (because we use general-purpose elliptic-curve code). So
let's be up to date for a change, and add it.
This implementation uses all the formats defined in the RFC. But we
also have to choose a wire format for the public+private key blob sent
to an agent, and since the OpenSSH agent protocol is the de facto
standard but not (yet?) handled by the IETF, OpenSSH themselves get to
say what the format for a key should or shouldn't be. So if they don't
support a particular key method, what do you do?
I checked with them, and they agreed that there's an obviously right
format for Ed448 keys, which is to do them exactly like Ed25519 except
that you have a 57-byte string everywhere Ed25519 had a 32-byte
string. So I've done that.
In the Windows GUI, all the controls that were previously named or
labelled Ed25519 are now labelled EdDSA, and when you select that
top-level key type, there's a dropdown for the specific curve (just
like for ECDSA), whose only current value is Ed25519.
In command-line PuTTYgen, you can say '-t eddsa' and give a number of
bits, just like '-t ecdsa'. You can also still say '-t ed25519', for
backwards compatibility.
Also in command-line PuTTYgen, I've reworked the error messages if you
give a number of bits that doesn't correspond to a known elliptic
curve. Now the messages are generated by consulting the list of
curves, so that that list has to be updated by hand in one fewer
place.
In Ed25519, when you hash things, you just feed them straight to
SHA-512. But in Ed448, you prefix them with a magic string before
feeding them to SHAKE256, and the magic string varies depending on
which variant of Ed448 is in use.
Add support for such a prefix, and for Ed25519, set it to the empty
string.
An EdDSA public key is transmitted as an integer just big enough to
store the curve point's y value plus one extra bit. The topmost bit of
the integer is set to indicate the parity of the x value.
In Ed25519, this works out very nicely: the curve modulus is just
under 2^255, so the y value occupies all but one bit of the topmost
byte of the integer, leaving the y parity to fit neatly in the spare
bit. But in Ed448, the curve modulus occupies a whole number of bytes
(56 of them) and the format standardises that you therefore have to
use a 57-byte string for the public key, with the highest-order
(final) byte having zero in the low 7 of its bits.
To support this, I've arranged that the Edwards-curve modes in
sshecc.c now set curve->fieldBytes to the right number of bytes to
hold curve->fieldBits+1 bits, instead of the obvious curve->fieldBits.
Doing it that way means fewer special cases everywhere else in the
code.
Also, this means changing the order of the validity check: we now wait
until after we've checked and cleared the topmost bit to see if the
rest is within range. That deals with checking the intervening 7 zero
bits.
This works more or less like the similar refactoring for Montgomery
curves in 7fa0749fcb: where we previously hardwired the clearing of 3
low bits of a private exponent, we now turn that 3 into a curve-
specific constant, so that Ed448 will be able to set it to a different
value.
With all the preparation now in place, this is more or less trivial.
We add a new curve setup function in sshecc.c, and an ssh_kex linking
to it; we add the curve parameters to the reference / test code
eccref.py, and use them to generate the list of low-order input values
that should be rejected by the sanity check on the kex output; we add
the standard test vectors from RFC 7748 in cryptsuite.py, and the
low-order values we just generated.
There's always something. I added the actual option, but forgot to
advertise it in the online help.
While I'm here, I've also allowed the word 'proven' as an alternative
spelling for 'provable', because having the approved spellings be
'probable' and 'provable' is just asking for hilarious typos.
In Windows PuTTYgen, this is selected by an extra set of radio-button
style menu options in the Key menu. In the command-line version,
there's a new --primes=provable option.
This whole system is new, so I'm not enabling it by default just yet.
I may in future, though: it's running faster than I expected (in
particular, a lot faster than any previous prototype of the same
algorithm I attempted in standalone Python).
This uses all the facilities I've been adding in previous commits. It
implements Maurer's algorithm for generating a prime together with a
Pocklington certificate of its primality, by means of recursing to
generate smaller primes to be factors of p-1 for the Pocklington
check, then doing a test Miller-Rabin iteration to quickly exclude
obvious composites, and then doing the full Pocklington check.
In my case, this means I add each prime I generate to a Pockle. So the
algorithm says: recursively generate some primes and add them to the
PrimeCandidateSource, then repeatedly get a candidate value back from
the pcs, check it with M-R, and feed it to the Pockle. If the Pockle
accepts it, then we're done (and the Pockle will then know that value
is prime when our recursive caller uses it in turn, if we have one).
A small refinement to that algorithm is that I iterate M-R until the
witness value I tried is such that it at least _might_ be a primitive
root - which is to say that M-R didn't get 1 by evaluating any power
of it smaller than n-1. That way, there's less chance of the Pockle
rejecting the witness value. And sooner or later M-R must _either_
tell me I've got a potential primitive-root witness _or_ tell me it's
shown the number to be composite.
The old system I removed in commit 79d3c1783b had both linear and
exponential phase types, but the new one only had exponential, because
at that point I'd just thrown away all the clients of the linear phase
type. But I'm going to add another one shortly, so I have to put it
back in.
pcs_get_upper_bound lets the holder of a PrimeCandidateSource ask what
is the largest value it might ever generate. pcs_get_bits_remaining
lets it ask how much extra entropy it's going to generate on top of
the requirements that have already been input into it.
Both of these will be needed by the upcoming provable-prime work to
decide what sizes of subsidiary prime to generate.
We already had a function pcs_require_residue_1() which lets you ask
PrimeCandidateSource to ensure it only returns numbers congruent to 1
mod a given value. pcs_require_residue_1_mod_prime() is the same, but
it also records the number in a list of prime factors of n-1, which
can be queried later.
The idea is that if you're generating a DSA key, in which the small
prime q must divide p-1, the upcoming provable generation algorithm
will be able to recover q from the PrimeCandidateSource and use it as
part of the primality certificate, which reduces the number of bits of
extra prime factors it also has to make up.
This implements an extended form of primality verification using
certificates based on Pocklington's theorem. You make a Pockle object,
and then try to convince it that one number after another is prime, by
means of providing it with a list of prime factors of p-1 and a
primitive root. (Or just by saying 'this prime is small enough for you
to check yourself'.)
Pocklington's theorem requires you to have factors of p-1 whose
product is at least the square root of p. I've extended that to
support factorisations only as big as the cube root, via an extension
of the theorem given in Maurer's paper on generating provable primes.
The Pockle object is more or less write-only: it has no methods for
reading out its contents. Its only output channel is the return value
when you try to insert a prime into it: if it isn't sufficiently
convinced that your prime is prime, it will return an error code. So
anything for which it returns POCKLE_OK you can be confident of.
I'm going to use this for provable prime generation. But exposing this
part of the system as an object in its own right means I can write a
set of unit tests for this specifically. My negative tests exercise
all the different ways a certification can be erroneous or inadequate;
the positive tests include proofs of primality of various primes used
in elliptic-curve crypto. The Poly1305 proof in particular is taken
from a proof in DJB's paper, which has exactly the form of a
Pocklington certificate only written in English.
The functions primegen() and primegen_add_progress_phase() are gone.
In their place is a small vtable system with two methods corresponding
to them, plus the usual admin of allocating and freeing contexts.
This API change is the starting point for being able to drop in
different prime generation algorithms at run time in response to user
configuration.
Now we don't even bother with picking an mp_int base value and a small
adjustment; we just generate a random mp_int, and if it's congruent to
anything we want to avoid, throw it away and try again.
This should cause us to select completely uniformly from the candidate
values in the available range. Previously, the delta system was
introducing small skews at the start and end of the range (values very
near there were less likely to turn up because they fell within the
delta radius of a smaller set of base values).
I was worried about doing this because I thought it would be slower,
because of having to do a big pile of 'reduce mp_int mod small thing'
every time round the loop: the virtue of the delta system is that you
can set up the residues of your base value once and then try several
deltas using only normal-sized integer operations. But now I look more
closely, we were computing _all_ the residues of the base point every
time round the loop (several thousand of them), whereas now we're very
likely to be able to throw a candidate away after only two or three if
it's divisible by one of the smallest primes, which are also the ones
most likely to get in the way. So probably it's actually _faster_ than
the old system (although, since uniformity was my main aim, I haven't
timed it, only noticed that it seems to be fast _enough_).
Having just written a comment about how it was almost inconceivably
improbable that you _wouldn't_ be successful in finding a suitable g
on the very first number you tried, I couldn't help noticing that in
fact my very next DSA key generation test had to try twice. Had I made
a mistake in my probability theory?
No, it turns out: I find g by raising consecutive numbers to the power
(p-1)/q and looking to see if they're not 1, but I start with 1
itself, which along with -1 is the only number that _can't_ work!
Save a bit of pointless effort and iterate up from 2 instead.
Thanks to Pavel Kryukov's CI for pointing this out: VS 2015 doesn't
support C99 hex floating-point literals.
(VS 2017 does, and in general we're treating this as a C99-permitted
code base these days. So I was tempted to just increase the minimum
required compiler version and leave this code as it is. But since the
use of that particular floating literal was so totally frivolous and
unnecessary, I think I'll leave that for another day when it's more
important!)
This further cleans up the prime-generation code, to the point where
the main primegen() function has almost nothing in it. Also now I'll
be able to reuse M-R as a primitive in more sophisticated alternatives
to primegen().
The old API was one of those horrible things I used to do when I was
young and foolish, in which you have just one function, and indicate
which of lots of things it's doing by passing in flags. It was crying
out to be replaced with a vtable.
While I'm at it, I've reworked the code on the Windows side that
decides what to do with the progress bar, so that it's based on
actually justifiable estimates of probability rather than magic
integer constants.
Since computers are generally faster now than they were at the start
of this project, I've also decided there's no longer any point in
making the fixed final part of RSA key generation bother to report
progress at all. So the progress bars are now only for the variable
part, i.e. the actual prime generations.
(This is a reapplication of commit a7bdefb39, without the Miller-Rabin
refactoring accidentally folded into it. Also this time I've added -lm
to the link options, which for some reason _didn't_ cause me a link
failure last time round. No idea why not.)
This reverts commit a7bdefb394.
I had accidentally mashed it together with another commit. I did
actually want to push both of them, but I'd rather push them
separately! So I'm backing out the combined blob, and I'll re-push
them with their proper comments and explanations.
The old API was one of those horrible things I used to do when I was
young and foolish, in which you have just one function, and indicate
which of lots of things it's doing by passing in flags. It was crying
out to be replaced with a vtable.
While I'm at it, I've reworked the code on the Windows side that
decides what to do with the progress bar, so that it's based on
actually justifiable estimates of probability rather than magic
integer constants.
Since computers are generally faster now than they were at the start
of this project, I've also decided there's no longer any point in
making the fixed final part of RSA key generation bother to report
progress at all. So the progress bars are now only for the variable
part, i.e. the actual prime generations.
The more features and options I add to PrimeCandidateSource, the more
cumbersome it will be to replicate each one in a command-line option
to the ultimate primegen() function. So I'm moving to an API in which
the client of primegen() constructs a PrimeCandidateSource themself,
and passes it in to primegen().
Also, changed the API for pcs_new() so that you don't have to pass
'firstbits' unless you really want to. The net effect is that even
though we've added flexibility, we've also simplified the call sites
of primegen() in the simple case: if you want a 1234-bit prime, you
just need to pass pcs_new(1234) as the argument to primegen, and
you're done.
The new declaration of primegen() lives in ssh_keygen.h, along with
all the types it depends on. So I've had to #include that header in a
few new files.
I just ran into a bug in which the testcrypt child process was cleanly
terminated, but at least one Python object was left lying around
containing the identifier of a testcrypt object that had never been
freed. On program exit, the Python reference count on that object went
to zero, the __del__ method was invoked, and childprocess.funcall
started a _new_ instance of testcrypt just so it could tell it to free
the object identifier - which, of course, the new testcrypt had never
heard of!
We can already tell the difference between a ChildProcess object which
has no subprocess because it hasn't yet been started, and one which
has no subprocess because it's terminated: the latter has exitstatus
set to something other than None. So now we enforce by assertion that
we don't ever restart the child process, and the __del__ method avoids
doing anything if the child has already finished.
When I'm writing Python using the testcrypt API, I keep finding that I
instinctively try to call vtable methods as if they were actual
methods of the object. For example, calling key.sign(msg, 0) instead
of ssh_key_sign(key, msg, 0).
So this change to the Python side of the testcrypt mechanism panders
to my inappropriate finger-macros by making them work! The idea is
that I define a set of pairs (type, prefix), such that any function
whose name begins with the prefix and whose first argument is of that
type will be automatically translated into a method on the Python
object wrapping a testcrypt value of that type. For example, any
function of the form ssh_key_foo(val_ssh_key, other args) will
automatically be exposed as a method key.foo(other args), simply
because (val_ssh_key, "ssh_key_") appears in the translation table.
This is particularly nice for the Python 3 REPL, which will let me
tab-complete the right set of method names by knowing the type I'm
trying to invoke one on. I haven't decided yet whether I want to
switch to using it throughout cryptsuite.py.
For namespace-cleanness, I've also renamed all the existing attributes
of the Python Value class wrapper so that they start with '_', to
leave the space of sensible names clear for the new OOish methods.
As a simple alias for "curve25519-sha256@libssh.org". This name is now
standardised in RFC8731 (and, since 7751657811, we have the extra
validation mandated by the RFC compared to the libssh spec); also it's
been in OpenSSH at least for ages (since 7.4, 2016-12, 0493766d56).
This expands our previous check for the public value being zero, to
take in all the values that will _become_ zero after not many steps.
The actual check at run time is done using the new is_infinite query
method for Montgomery curve points. Test cases in cryptsuite.py cover
all the dangerous values I generated via all that fiddly quartic-
solving code.
(DJB's page http://cr.yp.to/ecdh.html#validate also lists these same
constants. But working them out again for myself makes me confident I
can do it again for other similar curves, such as Curve448.)
In particular, this makes us fully compliant with RFC 7748's demand to
check we didn't generate a trivial output key, which can happen if the
other end sends any of those low-order values.
I don't actually see why this is a vital check to perform for security
purposes, for the same reason that we didn't classify the bug
'diffie-hellman-range-check' as a vulnerability: I can't really see
what the other end's incentive might be to deliberately send one of
these nonsense values (and you can't do it by accident - none of these
values is a power of the canonical base point). It's not that a DH
participant couldn't possible want to secretly expose the session
traffic - but there are plenty of more subtle (and less subtle!) ways
to do it, so you don't really gain anything by forcing them to use one
of those instead. But the RFC says to check, so we check.
This uses the new quartic-solver mod p to generate all the values in
Curve25519 that can end up at the curve identity by repeated
application of the doubling formula.
I'm going to want to use this for finding special values in elliptic
curves' ground fields.
In order to solve cubics and quartics in F_p, you have to work in
F_{p^2}, for much the same reasons that you have to be willing to use
complex numbers if you want to solve general cubics over the reals
(even if all the eventual roots turn out to be real after all). So
I've also introduced another arithmetic class to work in that kind of
field, and a shim that glues that on to the cyclic-group root finder
from the previous commit.
I'm about to want to solve quartics mod a prime, which means I'll need
to be able to take cube roots mod p as well as square roots.
This commit introduces a more general class which can take rth roots
for any prime r, and moreover, it can do it in a general cyclic group.
(You have to tell it the group's order and give it some primitives for
doing arithmetic, plus a way of iterating over the group elements that
it can use to look for a non-rth-power and roots of unity.)
That system makes it nicely easy to test, because you can give it a
cyclic group represented as the integers under _addition_, and then
you obviously know what all the right answers are. So I've also added
a unit test system checking that.
I'm about to want to expand the underlying number-theory code, so I'll
start by moving it into a file where it has room to grow without
swamping the main purpose of eccref.py.
ecc_montgomery_normalise takes a point with X and Z coordinates, and
normalises it to Z=1 by means of multiplying X by the inverse of Z and
then setting Z=1.
If you pass in a point with Z=0, representing the curve identity, then
it would be nice to still get the identity back out again afterwards.
We haven't really needed that property until now, but I'm about to
want it.
Currently, what happens is that we try to invert Z mod p; fail, but
don't notice we've failed, and come out with some nonsense value as
the inverse; multiply X by that; and then _set Z to 1_. So the output
value no longer has Z=0.
This commit changes things so that we multiply Z by the inverse we
computed. That way, if Z started off 0, it stays 0.
Also made the same change in the other two curve types, on general
principles, though I don't yet have a use for that.