An EdDSA public key is transmitted as an integer just big enough to
store the curve point's y value plus one extra bit. The topmost bit of
the integer is set to indicate the parity of the x value.
In Ed25519, this works out very nicely: the curve modulus is just
under 2^255, so the y value occupies all but one bit of the topmost
byte of the integer, leaving the y parity to fit neatly in the spare
bit. But in Ed448, the curve modulus occupies a whole number of bytes
(56 of them) and the format standardises that you therefore have to
use a 57-byte string for the public key, with the highest-order
(final) byte having zero in the low 7 of its bits.
To support this, I've arranged that the Edwards-curve modes in
sshecc.c now set curve->fieldBytes to the right number of bytes to
hold curve->fieldBits+1 bits, instead of the obvious curve->fieldBits.
Doing it that way means fewer special cases everywhere else in the
code.
Also, this means changing the order of the validity check: we now wait
until after we've checked and cleared the topmost bit to see if the
rest is within range. That deals with checking the intervening 7 zero
bits.
This works more or less like the similar refactoring for Montgomery
curves in 7fa0749fcb: where we previously hardwired the clearing of 3
low bits of a private exponent, we now turn that 3 into a curve-
specific constant, so that Ed448 will be able to set it to a different
value.
With all the preparation now in place, this is more or less trivial.
We add a new curve setup function in sshecc.c, and an ssh_kex linking
to it; we add the curve parameters to the reference / test code
eccref.py, and use them to generate the list of low-order input values
that should be rejected by the sanity check on the kex output; we add
the standard test vectors from RFC 7748 in cryptsuite.py, and the
low-order values we just generated.
There's always something. I added the actual option, but forgot to
advertise it in the online help.
While I'm here, I've also allowed the word 'proven' as an alternative
spelling for 'provable', because having the approved spellings be
'probable' and 'provable' is just asking for hilarious typos.
In Windows PuTTYgen, this is selected by an extra set of radio-button
style menu options in the Key menu. In the command-line version,
there's a new --primes=provable option.
This whole system is new, so I'm not enabling it by default just yet.
I may in future, though: it's running faster than I expected (in
particular, a lot faster than any previous prototype of the same
algorithm I attempted in standalone Python).
This uses all the facilities I've been adding in previous commits. It
implements Maurer's algorithm for generating a prime together with a
Pocklington certificate of its primality, by means of recursing to
generate smaller primes to be factors of p-1 for the Pocklington
check, then doing a test Miller-Rabin iteration to quickly exclude
obvious composites, and then doing the full Pocklington check.
In my case, this means I add each prime I generate to a Pockle. So the
algorithm says: recursively generate some primes and add them to the
PrimeCandidateSource, then repeatedly get a candidate value back from
the pcs, check it with M-R, and feed it to the Pockle. If the Pockle
accepts it, then we're done (and the Pockle will then know that value
is prime when our recursive caller uses it in turn, if we have one).
A small refinement to that algorithm is that I iterate M-R until the
witness value I tried is such that it at least _might_ be a primitive
root - which is to say that M-R didn't get 1 by evaluating any power
of it smaller than n-1. That way, there's less chance of the Pockle
rejecting the witness value. And sooner or later M-R must _either_
tell me I've got a potential primitive-root witness _or_ tell me it's
shown the number to be composite.
The old system I removed in commit 79d3c1783b had both linear and
exponential phase types, but the new one only had exponential, because
at that point I'd just thrown away all the clients of the linear phase
type. But I'm going to add another one shortly, so I have to put it
back in.
pcs_get_upper_bound lets the holder of a PrimeCandidateSource ask what
is the largest value it might ever generate. pcs_get_bits_remaining
lets it ask how much extra entropy it's going to generate on top of
the requirements that have already been input into it.
Both of these will be needed by the upcoming provable-prime work to
decide what sizes of subsidiary prime to generate.
We already had a function pcs_require_residue_1() which lets you ask
PrimeCandidateSource to ensure it only returns numbers congruent to 1
mod a given value. pcs_require_residue_1_mod_prime() is the same, but
it also records the number in a list of prime factors of n-1, which
can be queried later.
The idea is that if you're generating a DSA key, in which the small
prime q must divide p-1, the upcoming provable generation algorithm
will be able to recover q from the PrimeCandidateSource and use it as
part of the primality certificate, which reduces the number of bits of
extra prime factors it also has to make up.
This implements an extended form of primality verification using
certificates based on Pocklington's theorem. You make a Pockle object,
and then try to convince it that one number after another is prime, by
means of providing it with a list of prime factors of p-1 and a
primitive root. (Or just by saying 'this prime is small enough for you
to check yourself'.)
Pocklington's theorem requires you to have factors of p-1 whose
product is at least the square root of p. I've extended that to
support factorisations only as big as the cube root, via an extension
of the theorem given in Maurer's paper on generating provable primes.
The Pockle object is more or less write-only: it has no methods for
reading out its contents. Its only output channel is the return value
when you try to insert a prime into it: if it isn't sufficiently
convinced that your prime is prime, it will return an error code. So
anything for which it returns POCKLE_OK you can be confident of.
I'm going to use this for provable prime generation. But exposing this
part of the system as an object in its own right means I can write a
set of unit tests for this specifically. My negative tests exercise
all the different ways a certification can be erroneous or inadequate;
the positive tests include proofs of primality of various primes used
in elliptic-curve crypto. The Poly1305 proof in particular is taken
from a proof in DJB's paper, which has exactly the form of a
Pocklington certificate only written in English.
The functions primegen() and primegen_add_progress_phase() are gone.
In their place is a small vtable system with two methods corresponding
to them, plus the usual admin of allocating and freeing contexts.
This API change is the starting point for being able to drop in
different prime generation algorithms at run time in response to user
configuration.
Now we don't even bother with picking an mp_int base value and a small
adjustment; we just generate a random mp_int, and if it's congruent to
anything we want to avoid, throw it away and try again.
This should cause us to select completely uniformly from the candidate
values in the available range. Previously, the delta system was
introducing small skews at the start and end of the range (values very
near there were less likely to turn up because they fell within the
delta radius of a smaller set of base values).
I was worried about doing this because I thought it would be slower,
because of having to do a big pile of 'reduce mp_int mod small thing'
every time round the loop: the virtue of the delta system is that you
can set up the residues of your base value once and then try several
deltas using only normal-sized integer operations. But now I look more
closely, we were computing _all_ the residues of the base point every
time round the loop (several thousand of them), whereas now we're very
likely to be able to throw a candidate away after only two or three if
it's divisible by one of the smallest primes, which are also the ones
most likely to get in the way. So probably it's actually _faster_ than
the old system (although, since uniformity was my main aim, I haven't
timed it, only noticed that it seems to be fast _enough_).
Having just written a comment about how it was almost inconceivably
improbable that you _wouldn't_ be successful in finding a suitable g
on the very first number you tried, I couldn't help noticing that in
fact my very next DSA key generation test had to try twice. Had I made
a mistake in my probability theory?
No, it turns out: I find g by raising consecutive numbers to the power
(p-1)/q and looking to see if they're not 1, but I start with 1
itself, which along with -1 is the only number that _can't_ work!
Save a bit of pointless effort and iterate up from 2 instead.
Thanks to Pavel Kryukov's CI for pointing this out: VS 2015 doesn't
support C99 hex floating-point literals.
(VS 2017 does, and in general we're treating this as a C99-permitted
code base these days. So I was tempted to just increase the minimum
required compiler version and leave this code as it is. But since the
use of that particular floating literal was so totally frivolous and
unnecessary, I think I'll leave that for another day when it's more
important!)
This further cleans up the prime-generation code, to the point where
the main primegen() function has almost nothing in it. Also now I'll
be able to reuse M-R as a primitive in more sophisticated alternatives
to primegen().
The old API was one of those horrible things I used to do when I was
young and foolish, in which you have just one function, and indicate
which of lots of things it's doing by passing in flags. It was crying
out to be replaced with a vtable.
While I'm at it, I've reworked the code on the Windows side that
decides what to do with the progress bar, so that it's based on
actually justifiable estimates of probability rather than magic
integer constants.
Since computers are generally faster now than they were at the start
of this project, I've also decided there's no longer any point in
making the fixed final part of RSA key generation bother to report
progress at all. So the progress bars are now only for the variable
part, i.e. the actual prime generations.
(This is a reapplication of commit a7bdefb39, without the Miller-Rabin
refactoring accidentally folded into it. Also this time I've added -lm
to the link options, which for some reason _didn't_ cause me a link
failure last time round. No idea why not.)
This reverts commit a7bdefb394.
I had accidentally mashed it together with another commit. I did
actually want to push both of them, but I'd rather push them
separately! So I'm backing out the combined blob, and I'll re-push
them with their proper comments and explanations.
The old API was one of those horrible things I used to do when I was
young and foolish, in which you have just one function, and indicate
which of lots of things it's doing by passing in flags. It was crying
out to be replaced with a vtable.
While I'm at it, I've reworked the code on the Windows side that
decides what to do with the progress bar, so that it's based on
actually justifiable estimates of probability rather than magic
integer constants.
Since computers are generally faster now than they were at the start
of this project, I've also decided there's no longer any point in
making the fixed final part of RSA key generation bother to report
progress at all. So the progress bars are now only for the variable
part, i.e. the actual prime generations.
The more features and options I add to PrimeCandidateSource, the more
cumbersome it will be to replicate each one in a command-line option
to the ultimate primegen() function. So I'm moving to an API in which
the client of primegen() constructs a PrimeCandidateSource themself,
and passes it in to primegen().
Also, changed the API for pcs_new() so that you don't have to pass
'firstbits' unless you really want to. The net effect is that even
though we've added flexibility, we've also simplified the call sites
of primegen() in the simple case: if you want a 1234-bit prime, you
just need to pass pcs_new(1234) as the argument to primegen, and
you're done.
The new declaration of primegen() lives in ssh_keygen.h, along with
all the types it depends on. So I've had to #include that header in a
few new files.
I just ran into a bug in which the testcrypt child process was cleanly
terminated, but at least one Python object was left lying around
containing the identifier of a testcrypt object that had never been
freed. On program exit, the Python reference count on that object went
to zero, the __del__ method was invoked, and childprocess.funcall
started a _new_ instance of testcrypt just so it could tell it to free
the object identifier - which, of course, the new testcrypt had never
heard of!
We can already tell the difference between a ChildProcess object which
has no subprocess because it hasn't yet been started, and one which
has no subprocess because it's terminated: the latter has exitstatus
set to something other than None. So now we enforce by assertion that
we don't ever restart the child process, and the __del__ method avoids
doing anything if the child has already finished.
When I'm writing Python using the testcrypt API, I keep finding that I
instinctively try to call vtable methods as if they were actual
methods of the object. For example, calling key.sign(msg, 0) instead
of ssh_key_sign(key, msg, 0).
So this change to the Python side of the testcrypt mechanism panders
to my inappropriate finger-macros by making them work! The idea is
that I define a set of pairs (type, prefix), such that any function
whose name begins with the prefix and whose first argument is of that
type will be automatically translated into a method on the Python
object wrapping a testcrypt value of that type. For example, any
function of the form ssh_key_foo(val_ssh_key, other args) will
automatically be exposed as a method key.foo(other args), simply
because (val_ssh_key, "ssh_key_") appears in the translation table.
This is particularly nice for the Python 3 REPL, which will let me
tab-complete the right set of method names by knowing the type I'm
trying to invoke one on. I haven't decided yet whether I want to
switch to using it throughout cryptsuite.py.
For namespace-cleanness, I've also renamed all the existing attributes
of the Python Value class wrapper so that they start with '_', to
leave the space of sensible names clear for the new OOish methods.
As a simple alias for "curve25519-sha256@libssh.org". This name is now
standardised in RFC8731 (and, since 7751657811, we have the extra
validation mandated by the RFC compared to the libssh spec); also it's
been in OpenSSH at least for ages (since 7.4, 2016-12, 0493766d56).
This expands our previous check for the public value being zero, to
take in all the values that will _become_ zero after not many steps.
The actual check at run time is done using the new is_infinite query
method for Montgomery curve points. Test cases in cryptsuite.py cover
all the dangerous values I generated via all that fiddly quartic-
solving code.
(DJB's page http://cr.yp.to/ecdh.html#validate also lists these same
constants. But working them out again for myself makes me confident I
can do it again for other similar curves, such as Curve448.)
In particular, this makes us fully compliant with RFC 7748's demand to
check we didn't generate a trivial output key, which can happen if the
other end sends any of those low-order values.
I don't actually see why this is a vital check to perform for security
purposes, for the same reason that we didn't classify the bug
'diffie-hellman-range-check' as a vulnerability: I can't really see
what the other end's incentive might be to deliberately send one of
these nonsense values (and you can't do it by accident - none of these
values is a power of the canonical base point). It's not that a DH
participant couldn't possible want to secretly expose the session
traffic - but there are plenty of more subtle (and less subtle!) ways
to do it, so you don't really gain anything by forcing them to use one
of those instead. But the RFC says to check, so we check.
This uses the new quartic-solver mod p to generate all the values in
Curve25519 that can end up at the curve identity by repeated
application of the doubling formula.
I'm going to want to use this for finding special values in elliptic
curves' ground fields.
In order to solve cubics and quartics in F_p, you have to work in
F_{p^2}, for much the same reasons that you have to be willing to use
complex numbers if you want to solve general cubics over the reals
(even if all the eventual roots turn out to be real after all). So
I've also introduced another arithmetic class to work in that kind of
field, and a shim that glues that on to the cyclic-group root finder
from the previous commit.
I'm about to want to solve quartics mod a prime, which means I'll need
to be able to take cube roots mod p as well as square roots.
This commit introduces a more general class which can take rth roots
for any prime r, and moreover, it can do it in a general cyclic group.
(You have to tell it the group's order and give it some primitives for
doing arithmetic, plus a way of iterating over the group elements that
it can use to look for a non-rth-power and roots of unity.)
That system makes it nicely easy to test, because you can give it a
cyclic group represented as the integers under _addition_, and then
you obviously know what all the right answers are. So I've also added
a unit test system checking that.
I'm about to want to expand the underlying number-theory code, so I'll
start by moving it into a file where it has room to grow without
swamping the main purpose of eccref.py.
ecc_montgomery_normalise takes a point with X and Z coordinates, and
normalises it to Z=1 by means of multiplying X by the inverse of Z and
then setting Z=1.
If you pass in a point with Z=0, representing the curve identity, then
it would be nice to still get the identity back out again afterwards.
We haven't really needed that property until now, but I'm about to
want it.
Currently, what happens is that we try to invert Z mod p; fail, but
don't notice we've failed, and come out with some nonsense value as
the inverse; multiply X by that; and then _set Z to 1_. So the output
value no longer has Z=0.
This commit changes things so that we multiply Z by the inverse we
computed. That way, if Z started off 0, it stays 0.
Also made the same change in the other two curve types, on general
principles, though I don't yet have a use for that.
I'm getting tired of maintaining it as 2/3 compatible; 2 is on the way
out anyway and I'm losing patience. In future, if it breaks in 2, I
think I'm going to stop caring.
I tried to do an SFTP upload through connection sharing the other day
and found that pscp sent some data and then hung. Now I debug it, what
seems to have happened was that we were looping in sftp_recv() waiting
for an SFTP packet from the remote, but we didn't have any outstanding
SFTP requests that the remote was going to reply to. Checking further,
xfer_upload_ready() reported true, so we _could_ have sent something -
but the logic in the upload loop had a hole through which we managed
to get into 'waiting for a packet' state.
I think what must have happened is that xfer_upload_ready() reported
false so that we entered sftp_recv(), but then the event loop inside
sftp_recv() ran a toplevel callback that made xfer_upload_ready()
return true. So, the fix: sftp_recv() is our last-ditch fallback, and
we always try emptying our callback queue and rechecking upload_ready
before we resort to waiting for a remote packet.
This not only fixes the hang I observed: it also hugely improves the
upload speed. My guess is that the bug must have been preventing us
from filling our outgoing request pipeline a _lot_ - but I didn't
notice it until the one time the queue accidentally ended up empty,
rather than just sparse enough to make transfers slow.
Annoyingly, I actually considered this fix back when I was trying to
fix the proftpd issue mentioned in commit cd97b7e7e. I decided fixing
ssh_sendbuffer() was a better idea. In fact it would have been an even
better idea to do both! Oh well, better late than never.
This is built more or less entirely out of pieces I already had. The
SOCKS server code is provided by the dynamic forwarding code in
portfwd.c. When that accepts a connection request, it wants to talk to
an SSH ConnectionLayer, which is already a trait with interchangeable
implementations - so I just provide one of my own which only supports
the lportfwd_open() method. And that in turn returns an SshChannel
object, with a special trait implementation all of whose methods
just funnel back to an ordinary Socket.
Result: you get a Socket-to-Socket SOCKS implementation with no SSH
anywhere, and even a minimal amount of need to _pretend_ internally to
be an SSH implementation.
Additional features include the ability to log all the traffic in the
form of diagnostics to standard error, or log each direction of each
connection separately to a file, or for anything more general, to log
each direction of each connection through a pipe to a subcommand that
can filter out whatever you think are the interesting parts. Also, you
can spawn a subcommand after the SOCKS server is set up, and terminate
automatically when that subcommand does - e.g. you might use this to
wrap the execution of a single SOCKS-using program.
This is a modernisation of a diagnostic utility I've had kicking
around out-of-tree for a long time. With all of last year's
refactorings, it now becomes feasible to keep it in-tree without
needing huge amounts of scaffolding. Also, this version runs on
Windows, which is more than the old one did. (On Windows I haven't
implemented the subprocess parts, although there's no reason I
_couldn't_.)
As well as diagnostic uses, this may also be useful in some situations
as a thing to forward ports to: PuTTY doesn't currently support
reverse dynamic port forwarding (in which the remote listening port
acts as a SOCKS server), but you could get the same effect by
forwarding a remote port to a local instance of this. (Although, of
course, that's nothing you couldn't achieve using any other SOCKS
server.)
This is probably overdue; everyone else seems to have settled on it as
the preferred RSA key exponent for some time. And now that the
descendant of mp_mod_short supports moduli up to 2^32 instead of 2^16,
I can actually add it without the risk of assertion failures during
prime generation.
It's now a subroutine specific to RSA key generation, because the
reworked PrimeCandidateSource system can handle the requirements of
DSA generation automatically.
The difference is that in DSA, one of the primes you generate is used
as a factor in the generation of the other, so you can just pass q as
a factor to pcs_require_residue_1, and it can get the range right by
itself. But in RSA, neither prime is generated with the other one in
mind; they're conceptually generated separately and independently,
apart from that single joint restriction on their product.
(I _could_ have added a feature to PrimeCandidateSource to specify a
range for the prime more specifically than a few initial bits. But I
didn't want to, because it would have been more complicated than doing
it this way, and also slightly less good: if you invent one prime
first and then constrain the range of the other one once you know the
first, then you're not getting an even probability distribution of the
possible _pairs_ of primes - you're privileging one over the other and
skewing the distribution.)
I've replaced the random number generation and small delta-finding
loop in primegen() with a much more elaborate system in its own source
file, with unit tests and everything.
Immediate benefits:
- fixes a theoretical possibility of overflowing the target number of
bits, if the random number was so close to the top of the range
that the addition of delta * factor pushed it over. However, this
only happened with negligible probability.
- fixes a directional bias in delta-finding. The previous code
incremented the number repeatedly until it found a value coprime to
all the right things, which meant that a prime preceded by a
particularly long sequence of numbers with tiny factors was more
likely to be chosen. Now we select candidate delta values at
random, that bias should be eliminated.
- changes the semantics of the outermost primegen() function to make
them easier to use, because now the caller specifies the 'bits' and
'firstbits' values for the actual returned prime, rather than
having to account for the factor you're multiplying it by in DSA.
DSA client code is correspondingly adjusted.
Future benefits:
- having the candidate generation in a separate function makes it
easy to reuse in alternative prime generation strategies
- the available constraints support applications such as Maurer's
algorithm for generating provable primes, or strong primes for RSA
in which both p-1 and p+1 have a large factor. So those become
things we could experiment with in future.
This still isn't the true random generator used in the live tools:
it's deterministic, for repeatable testing. The Python side of
testcrypt can now call random_make_prng(), which will instantiate a
PRNG with the given seed. random_clear() still gets rid of it.
So I can still have some tests control the precise random numbers
received by the function under test, but for others (especially key
generation, with its uncertainty about how much randomness it will
actually use) I can just say 'here, have a seed, generate as much
stuff from that seed as you need'.
This is another application of the existing mp_bezout_into, which
needed a tweak or two to cope with the numbers not necessarily being
coprime, plus a wrapper function to deal with shared factors of 2.
It reindents the entire second half of mp_bezout_into, so the patch is
best viewed with whitespace differences ignored.
This is a third random-number generation function, with an API in
between the too-specific mp_random_bits and the too-general
mp_random_in_range. Now you can generate a value between 0 and n
without having to either make n a power of 2, or tediously allocate a
zero mp_int to be the lower limit for mp_random_in_range.
Implementation is done by sawing the existing mp_random_in_range into
two pieces and exposing the API between them.
This is a version of mp_lshift_fixed_into() which allocates the output
number, which it can do because you know the size of the original
number and are allowed to treat the shift count as non-secret.
(By contrast, mp_lshift_safe() would be a nonsensical function - if
you're trying to keep the shift count secret, you _can't_ use it as a
parameter of memory allocation! In that situation you have no choice
but to allocate memory based on a fixed upper bound.)
There was previously no safe left shift at all, which is an omission.
And rshift_safe_into was an odd thing to be missing, so while I'm
here, I've added it on the basis that it will probably be useful
sooner or later.
Unlike the ones in mpint.c proper, these are not intended to respect
the constant-time guarantees. They're going to be the kind of thing
you use in key generation, which is too random to be constant-time in
any case.
I've arranged several precautions to try to make sure these functions
don't accidentally get linked into the main SSH application, only into
key generators:
- declare them in a separate header with "unsafe" in the name
- put "unsafe" in the name of every actual function
- don't even link the mpunsafe.c translation unit into PuTTY proper
- in fact, define global symbols of the same name in that file and
the SSH client code, so that there will be a link failure if we
ever try to do it by accident
The initial contents of the new source file consist of the subroutine
mp_mod_short() that previously lived in sshprime.c (and was not in
mpint.c proper precisely because it was unsafe). While I'm here, I've
turned it into mp_unsafe_mod_integer() and let it take a modulus of up
to 32 bits instead of 16.
Also added some obviously useful functions to shrink an mpint to the
smallest physical size that can hold the contained number (rather like
bn_restore_invariant in the old Bignum system), which I expect to be
using shortly.
Mostly because I just had a neat idea about how to expose that large
mutable array without it being a mutable global variable: make it a
static in its own module, and expose only a _pointer_ to it, which is
const-qualified.
While I'm there, changed the name to something more descriptive.