This further cleans up the prime-generation code, to the point where
the main primegen() function has almost nothing in it. Also now I'll
be able to reuse M-R as a primitive in more sophisticated alternatives
to primegen().
The old API was one of those horrible things I used to do when I was
young and foolish, in which you have just one function, and indicate
which of lots of things it's doing by passing in flags. It was crying
out to be replaced with a vtable.
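As a concrete illustration of the shape change, here's a miniature
sketch in Python with invented names (the real code is C, and its
identifiers differ):

    PROG_START_PHASE, PROG_REPORT = 1, 2

    def progress(flags, value):
        # Old style: one function, with flags selecting which of
        # several jobs it's doing this time.
        if flags & PROG_START_PHASE:
            print("beginning a phase of size", value)
        if flags & PROG_REPORT:
            print("progress so far:", value)

    class ProgressReceiver:
        # New style: one method per meaningful operation, dispatched
        # through a vtable - which a Python class gives you for free.
        def start_phase(self, expected_cost):
            print("beginning a phase of size", expected_cost)
        def report(self, work_done):
            print("progress so far:", work_done)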
While I'm at it, I've reworked the code on the Windows side that
decides what to do with the progress bar, so that it's based on
actually justifiable estimates of probability rather than magic
integer constants.
Since computers are generally faster now than they were at the start
of this project, I've also decided there's no longer any point in
making the fixed final part of RSA key generation bother to report
progress at all. So the progress bars are now only for the variable
part, i.e. the actual prime generations.
(This is a reapplication of commit a7bdefb39, without the Miller-Rabin
refactoring accidentally folded into it. Also this time I've added -lm
to the link options, which for some reason _didn't_ cause me a link
failure last time round. No idea why not.)
This reverts commit a7bdefb394.
I had accidentally mashed it together with another commit. I did
actually want to push both of them, but I'd rather push them
separately! So I'm backing out the combined blob, and I'll re-push
them with their proper comments and explanations.
The old API was one of those horrible things I used to do when I was
young and foolish, in which you have just one function, and indicate
which of lots of things it's doing by passing in flags. It was crying
out to be replaced with a vtable.
While I'm at it, I've reworked the code on the Windows side that
decides what to do with the progress bar, so that it's based on
actually justifiable estimates of probability rather than magic
integer constants.
Since computers are generally faster now than they were at the start
of this project, I've also decided there's no longer any point in
making the fixed final part of RSA key generation bother to report
progress at all. So the progress bars are now only for the variable
part, i.e. the actual prime generations.
The more features and options I add to PrimeCandidateSource, the more
cumbersome it will be to replicate each one in a command-line option
to the ultimate primegen() function. So I'm moving to an API in which
the client of primegen() constructs a PrimeCandidateSource themself,
and passes it in to primegen().
Also, changed the API for pcs_new() so that you don't have to pass
'firstbits' unless you really want to. The net effect is that even
though we've added flexibility, we've also simplified the call sites
of primegen() in the simple case: if you want a 1234-bit prime, you
just need to pass pcs_new(1234) as the argument to primegen, and
you're done.
The new declaration of primegen() lives in ssh_keygen.h, along with
all the types it depends on. So I've had to #include that header in a
few new files.
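In testcrypt-flavoured Python, the call sites now look something like
this (a sketch only: the exact spellings of the bindings, and the
signature of pcs_require_residue_1, are assumed):

    # Simple case: a fresh 1234-bit prime, no other constraints.
    p = primegen(pcs_new(1234))

    # Constrained case: build the PrimeCandidateSource yourself and
    # add requirements to it before handing it over - e.g. demanding
    # p == 1 (mod q) for some existing prime q, as DSA does.
    pcs = pcs_new(1024)
    pcs_require_residue_1(pcs, q)
    p = primegen(pcs)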
I just ran into a bug in which the testcrypt child process was cleanly
terminated, but at least one Python object was left lying around
containing the identifier of a testcrypt object that had never been
freed. On program exit, the Python reference count on that object went
to zero, the __del__ method was invoked, and childprocess.funcall
started a _new_ instance of testcrypt just so it could tell it to free
the object identifier - which, of course, the new testcrypt had never
heard of!
We can already tell the difference between a ChildProcess object which
has no subprocess because it hasn't yet been started, and one which
has no subprocess because it's terminated: the latter has exitstatus
set to something other than None. So now we enforce by assertion that
we don't ever restart the child process, and the __del__ method avoids
doing anything if the child has already finished.
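A minimal sketch of the guard (the real class has more moving parts
than this):

    import subprocess

    class ChildProcess:
        def __init__(self):
            self.sp = None          # no subprocess yet: not started
            self.exitstatus = None  # still None: not terminated either

        def start(self, argv):
            # Enforce by assertion that a terminated child is never
            # silently replaced with a fresh one.
            assert self.exitstatus is None, "testcrypt already terminated"
            if self.sp is None:
                self.sp = subprocess.Popen(
                    argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    childprocess = ChildProcess()

    class Value:
        def __init__(self, ident):
            self._ident = ident

        def __del__(self):
            # If the child has already finished, this identifier means
            # nothing to any new child, so don't start one just to
            # free it.
            if childprocess.exitstatus is not None:
                return
            # ... otherwise, ask the still-running child to free
            # self._ident here.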
When I'm writing Python using the testcrypt API, I keep finding that I
instinctively try to call vtable methods as if they were actual
methods of the object. For example, calling key.sign(msg, 0) instead
of ssh_key_sign(key, msg, 0).
So this change to the Python side of the testcrypt mechanism panders
to my inappropriate finger-macros by making them work! The idea is
that I define a set of pairs (type, prefix), such that any function
whose name begins with the prefix and whose first argument is of that
type will be automatically translated into a method on the Python
object wrapping a testcrypt value of that type. For example, any
function of the form ssh_key_foo(val_ssh_key, other args) will
automatically be exposed as a method key.foo(other args), simply
because (val_ssh_key, "ssh_key_") appears in the translation table.
This is particularly nice for the Python 3 REPL, which will let me
tab-complete the right set of method names by knowing the type I'm
trying to invoke one on. I haven't decided yet whether I want to
switch to using it throughout cryptsuite.py.
For namespace-cleanness, I've also renamed all the existing attributes
of the Python Value class wrapper so that they start with '_', to
leave the space of sensible names clear for the new OOish methods.
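A cut-down sketch of the mechanism, with invented table contents:

    # Maps a testcrypt value type to the function-name prefix whose
    # functions should become methods on values of that type.
    METHOD_PREFIXES = {
        'val_ssh_key': 'ssh_key_',
        'val_mpint':   'mp_',
    }

    # Stand-in for the real table of exported testcrypt functions.
    FUNCTIONS = {
        'ssh_key_sign': lambda key, msg, flags: ('sig', key, msg, flags),
    }

    class Value:
        def __init__(self, typename, ident):
            self._typename, self._ident = typename, ident

        def __getattr__(self, name):
            # key.sign(msg, 0) becomes ssh_key_sign(key, msg, 0), if
            # the type's prefix plus the attribute name names a
            # function in the table.
            prefix = METHOD_PREFIXES.get(self._typename)
            if prefix is not None and prefix + name in FUNCTIONS:
                func = FUNCTIONS[prefix + name]
                return lambda *args: func(self, *args)
            raise AttributeError(name)

    key = Value('val_ssh_key', 17)
    print(key.sign(b'message', 0))  # same as ssh_key_sign(key, b'message', 0)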
As a simple alias for "curve25519-sha256@libssh.org". This name is now
standardised in RFC8731 (and, since 7751657811, we have the extra
validation mandated by the RFC compared to the libssh spec); also it's
been in OpenSSH at least for ages (since 7.4, 2016-12, 0493766d56).
This expands our previous check for the public value being zero, to
take in all the values that will _become_ zero after not many steps.
The actual check at run time is done using the new is_infinite query
method for Montgomery curve points. Test cases in cryptsuite.py cover
all the dangerous values I generated via all that fiddly quartic-
solving code.
(DJB's page http://cr.yp.to/ecdh.html#validate also lists these same
constants. But working them out again for myself makes me confident I
can do it again for other similar curves, such as Curve448.)
In particular, this makes us fully compliant with RFC 7748's demand to
check we didn't generate a trivial output key, which can happen if the
other end sends any of those low-order values.
I don't actually see why this is a vital check to perform for security
purposes, for the same reason that we didn't classify the bug
'diffie-hellman-range-check' as a vulnerability: I can't really see
what the other end's incentive might be to deliberately send one of
these nonsense values (and you can't do it by accident - none of these
values is a power of the canonical base point). It's not that a DH
participant couldn't possibly want to secretly expose the session
traffic - but there are plenty of more subtle (and less subtle!) ways
to do it, so you don't really gain anything by forcing them to use one
of those instead. But the RFC says to check, so we check.
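To make 'become zero after not many steps' concrete, here's a
self-contained sketch using the standard X-only Montgomery doubling
formula (an illustration, not the constant-time C):

    p = 2**255 - 19
    A24 = (486662 + 2) // 4   # Curve25519's (A+2)/4

    def xdbl(X, Z):
        # One Montgomery doubling step on the projective point (X : Z).
        t1 = (X + Z) * (X + Z) % p
        t2 = (X - Z) * (X - Z) % p
        t3 = (t1 - t2) % p                        # equals 4*X*Z
        return t1 * t2 % p, t3 * (t2 + A24 * t3) % p

    def becomes_zero_quickly(x):
        # Low-order points reach the identity (Z == 0) within three
        # doublings, since their order divides the cofactor 8.
        X, Z = x % p, 1
        for _ in range(3):
            X, Z = xdbl(X, Z)
            if Z == 0:
                return True
        return False

    print(becomes_zero_quickly(0))   # True: x = 0 gives a point of order 2
    print(becomes_zero_quickly(9))   # False: the canonical base point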
This uses the new quartic-solver mod p to generate all the values in
Curve25519 that can end up at the curve identity by repeated
application of the doubling formula.
I'm going to want to use this for finding special values in elliptic
curves' ground fields.
In order to solve cubics and quartics in F_p, you have to work in
F_{p^2}, for much the same reasons that you have to be willing to use
complex numbers if you want to solve general cubics over the reals
(even if all the eventual roots turn out to be real after all). So
I've also introduced another arithmetic class to work in that kind of
field, and a shim that glues that on to the cyclic-group root finder
from the previous commit.
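A toy version of the idea, with a tiny prime and an invented class
name:

    P, D = 11, 7   # tiny prime, and a non-square mod P: w = sqrt(D)
                   # plays the role that i plays for the reals

    class Fp2:
        """An element a + b*w of F_{P^2}, with w*w == D."""
        def __init__(self, a, b=0):
            self.a, self.b = a % P, b % P
        def __add__(self, other):
            return Fp2(self.a + other.a, self.b + other.b)
        def __mul__(self, other):
            # (a + b w)(c + e w) = (ac + be D) + (ae + bc) w
            return Fp2(self.a * other.a + D * self.b * other.b,
                       self.a * other.b + self.b * other.a)
        def __repr__(self):
            return "%d + %d w" % (self.a, self.b)

    w = Fp2(0, 1)
    print(w * w)   # 7 + 0 w: a square root of a non-square exists here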
I'm about to want to solve quartics mod a prime, which means I'll need
to be able to take cube roots mod p as well as square roots.
This commit introduces a more general class which can take rth roots
for any prime r, and moreover, it can do it in a general cyclic group.
(You have to tell it the group's order and give it some primitives for
doing arithmetic, plus a way of iterating over the group elements that
it can use to look for a non-rth-power and roots of unity.)
That system makes it nicely easy to test, because you can give it a
cyclic group represented as the integers under _addition_, and then
you obviously know what all the right answers are. So I've also added
a unit test system checking that.
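The additive-group trick looks something like this in miniature
(invented names; the real primitives bundle and root-finder live in
the new unit-test code):

    # The cyclic group (Z_n, +): 'exponentiation' is multiplication by
    # an integer, and an r-th root of x is any y with r*y == x (mod n).
    n, r = 3**4 * 5, 3   # group order divisible by r, the interesting case

    group = dict(
        order    = n,
        identity = 0,
        op       = lambda a, b: (a + b) % n,   # the group operation
        pow      = lambda a, k: (a * k) % n,   # repeated group operation
        elements = lambda: range(n),           # iteration, used to hunt
                                               # for a non-r-th-power
    )

    # Whatever root the general algorithm hands back is trivially
    # checkable against the known-right answer:
    def is_rth_root(y, x):
        return group['pow'](y, r) == x % n

    print(is_rth_root(14, 42))   # True: 3 * 14 == 42 (mod 405)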
I'm about to want to expand the underlying number-theory code, so I'll
start by moving it into a file where it has room to grow without
swamping the main purpose of eccref.py.
ecc_montgomery_normalise takes a point with X and Z coordinates, and
normalises it to Z=1 by means of multiplying X by the inverse of Z and
then setting Z=1.
If you pass in a point with Z=0, representing the curve identity, then
it would be nice to still get the identity back out again afterwards.
We haven't really needed that property until now, but I'm about to
want it.
Currently, what happens is that we try to invert Z mod p; fail, but
don't notice we've failed, and come out with some nonsense value as
the inverse; multiply X by that; and then _set Z to 1_. So the output
value no longer has Z=0.
This commit changes things so that we multiply Z by the inverse we
computed. That way, if Z started off 0, it stays 0.
Also made the same change in the other two curve types, on general
principles, though I don't yet have a use for that.
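In Python miniature, using Fermat inversion (which happens to map 0
to 0, standing in for the constant-time inverter's nonsense-for-zero
behaviour), the fix is one token:

    def normalise(X, Z, p):
        iz = pow(Z, p - 2, p)  # 'inverse' of Z: bogus when Z == 0,
                               # just as the real mp_invert's output is
        # Old, wrong for the identity:  return X * iz % p, 1
        return X * iz % p, Z * iz % p   # Z*iz == 1 normally, but if Z
                                        # started off 0, it stays 0

    print(normalise(6, 3, 11))   # (2, 1): ordinary point, x = X/Z = 2
    print(normalise(6, 0, 11))   # (0, 0): the identity still has Z == 0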
I'm getting tired of maintaining it as 2/3 compatible; 2 is on the way
out anyway and I'm losing patience. In future, if it breaks in 2, I
think I'm going to stop caring.
I tried to do an SFTP upload through connection sharing the other day
and found that pscp sent some data and then hung. Now I debug it, what
seems to have happened was that we were looping in sftp_recv() waiting
for an SFTP packet from the remote, but we didn't have any outstanding
SFTP requests that the remote was going to reply to. Checking further,
xfer_upload_ready() reported true, so we _could_ have sent something -
but the logic in the upload loop had a hole through which we managed
to get into 'waiting for a packet' state.
I think what must have happened is that xfer_upload_ready() reported
false so that we entered sftp_recv(), but then the event loop inside
sftp_recv() ran a toplevel callback that made xfer_upload_ready()
return true. So, the fix: sftp_recv() is our last-ditch fallback, and
we always try emptying our callback queue and rechecking upload_ready
before we resort to waiting for a remote packet.
This not only fixes the hang I observed: it also hugely improves the
upload speed. My guess is that the bug must have been preventing us
from filling our outgoing request pipeline a _lot_ - but I didn't
notice it until the one time the queue accidentally ended up empty,
rather than just sparse enough to make transfers slow.
Annoyingly, I actually considered this fix back when I was trying to
fix the proftpd issue mentioned in commit cd97b7e7e. I decided fixing
ssh_sendbuffer() was a better idea. In fact it would have been an even
better idea to do both! Oh well, better late than never.
This is built more or less entirely out of pieces I already had. The
SOCKS server code is provided by the dynamic forwarding code in
portfwd.c. When that accepts a connection request, it wants to talk to
an SSH ConnectionLayer, which is already a trait with interchangeable
implementations - so I just provide one of my own which only supports
the lportfwd_open() method. And that in turn returns an SshChannel
object, with a special trait implementation all of whose methods
just funnel back to an ordinary Socket.
Result: you get a Socket-to-Socket SOCKS implementation with no SSH
anywhere, and only a minimal amount of need to _pretend_ internally to
be an SSH implementation.
Additional features include the ability to log all the traffic in the
form of diagnostics to standard error, or log each direction of each
connection separately to a file, or for anything more general, to log
each direction of each connection through a pipe to a subcommand that
can filter out whatever you think are the interesting parts. Also, you
can spawn a subcommand after the SOCKS server is set up, and terminate
automatically when that subcommand does - e.g. you might use this to
wrap the execution of a single SOCKS-using program.
This is a modernisation of a diagnostic utility I've had kicking
around out-of-tree for a long time. With all of last year's
refactorings, it now becomes feasible to keep it in-tree without
needing huge amounts of scaffolding. Also, this version runs on
Windows, which is more than the old one did. (On Windows I haven't
implemented the subprocess parts, although there's no reason I
_couldn't_.)
As well as diagnostic uses, this may also be useful in some situations
as a thing to forward ports to: PuTTY doesn't currently support
reverse dynamic port forwarding (in which the remote listening port
acts as a SOCKS server), but you could get the same effect by
forwarding a remote port to a local instance of this. (Although, of
course, that's nothing you couldn't achieve using any other SOCKS
server.)
This is probably overdue; everyone else seems to have settled on it as
the preferred RSA key exponent for some time. And now that the
descendant of mp_mod_short supports moduli up to 2^32 instead of 2^16,
I can actually add it without the risk of assertion failures during
prime generation.
It's now a subroutine specific to RSA key generation, because the
reworked PrimeCandidateSource system can handle the requirements of
DSA generation automatically.
The difference is that in DSA, one of the primes you generate is used
as a factor in the generation of the other, so you can just pass q as
a factor to pcs_require_residue_1, and it can get the range right by
itself. But in RSA, neither prime is generated with the other one in
mind; they're conceptually generated separately and independently,
apart from that single joint restriction on their product.
(I _could_ have added a feature to PrimeCandidateSource to specify a
range for the prime more specifically than a few initial bits. But I
didn't want to, because it would have been more complicated than doing
it this way, and also slightly less good: if you invent one prime
first and then constrain the range of the other one once you know the
first, then you're not getting an even probability distribution of the
possible _pairs_ of primes - you're privileging one over the other and
skewing the distribution.)
I've replaced the random number generation and small delta-finding
loop in primegen() with a much more elaborate system in its own source
file, with unit tests and everything.
Immediate benefits:
- fixes a theoretical possibility of overflowing the target number of
bits, if the random number was so close to the top of the range
that the addition of delta * factor pushed it over. However, this
only happened with negligible probability.
- fixes a directional bias in delta-finding. The previous code
incremented the number repeatedly until it found a value coprime to
all the right things, which meant that a prime preceded by a
particularly long sequence of numbers with tiny factors was more
likely to be chosen. Now that we select candidate delta values at
random, that bias should be eliminated (see the sketch after these
two lists).
- changes the semantics of the outermost primegen() function to make
them easier to use, because now the caller specifies the 'bits' and
'firstbits' values for the actual returned prime, rather than
having to account for the factor you're multiplying it by in DSA.
DSA client code is correspondingly adjusted.
Future benefits:
- having the candidate generation in a separate function makes it
easy to reuse in alternative prime generation strategies
- the available constraints support applications such as Maurer's
algorithm for generating provable primes, or strong primes for RSA
in which both p-1 and p+1 have a large factor. So those become
things we could experiment with in future.
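Here's the delta-selection point in miniature (a sketch of the
strategy only, not the real PrimeCandidateSource):

    import random

    SMALL_PRIMES = [3, 5, 7, 11, 13]
    rng = random.SystemRandom()

    def random_odd(bits):
        return rng.getrandbits(bits - 1) | (1 << (bits - 1)) | 1

    def biased_candidate(bits):
        # Old shape: one random start, incremented until coprime to the
        # small primes - so a value sitting after a long run of smooth
        # numbers is disproportionately likely to be landed on.
        n = random_odd(bits)
        while not all(n % q for q in SMALL_PRIMES):
            n += 2
        return n

    def unbiased_candidate(bits):
        # New shape: each candidate drawn independently at random, so
        # every suitable value in the range is equally likely.
        while True:
            n = random_odd(bits)
            if all(n % q for q in SMALL_PRIMES):
                return n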
This still isn't the true random generator used in the live tools:
it's deterministic, for repeatable testing. The Python side of
testcrypt can now call random_make_prng(), which will instantiate a
PRNG with the given seed. random_clear() still gets rid of it.
So I can still have some tests control the precise random numbers
received by the function under test, but for others (especially key
generation, with its uncertainty about how much randomness it will
actually use) I can just say 'here, have a seed, generate as much
stuff from that seed as you need'.
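So a generation test can now take roughly this shape (a sketch: the
two random_* names are from this change, but their exact signatures,
and the rsa_generate stand-in, are assumed):

    # Inside a cryptsuite.py-style test:
    random_make_prng(b'repeatable seed for this test')  # deterministic
    key = rsa_generate(1024)  # consumes however much randomness it needs
    random_clear()            # back to having no randomness at all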
This is another application of the existing mp_bezout_into, which
needed a tweak or two to cope with the numbers not necessarily being
coprime, plus a wrapper function to deal with shared factors of 2.
It reindents the entire second half of mp_bezout_into, so the patch is
best viewed with whitespace differences ignored.
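The wrapper's job in miniature, with plain Python integers standing in
for mp_ints (and Euclid standing in for the real constant-time
mp_bezout_into core):

    def gcd(a, b):
        # Shared factors of 2 are peeled off first - the Bezout core
        # wants at least one odd operand - and shifted back in at the
        # end.
        if a == 0 or b == 0:
            return a or b
        shift = min((a & -a).bit_length(), (b & -b).bit_length()) - 1
        a, b = a >> shift, b >> shift
        while b:                        # stand-in for the Bezout core
            a, b = b, a % b
        return a << shift

    print(gcd(12, 18))   # 6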
This is a third random-number generation function, with an API in
between the too-specific mp_random_bits and the too-general
mp_random_in_range. Now you can generate a value between 0 and n
without having to either make n a power of 2, or tediously allocate a
zero mp_int to be the lower limit for mp_random_in_range.
Implementation is done by sawing the existing mp_random_in_range into
two pieces and exposing the API between them.
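The core is simple rejection sampling, sketched here with Python
integers (assuming the usual half-open convention, i.e. a result in
[0, n)):

    import random
    rng = random.SystemRandom()

    def random_upto(n):
        # Draw values of the right bit length, rejecting any >= n, so
        # the result is uniform in [0, n).
        bits = n.bit_length()
        while True:
            r = rng.getrandbits(bits)
            if r < n:
                return r

    def random_in_range(lo, hi):
        # ... and the fully general function is a thin wrapper round it.
        return lo + random_upto(hi - lo)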
This is a version of mp_lshift_fixed_into() which allocates the output
number, which it can do because you know the size of the original
number and are allowed to treat the shift count as non-secret.
(By contrast, mp_lshift_safe() would be a nonsensical function - if
you're trying to keep the shift count secret, you _can't_ use it as a
parameter of memory allocation! In that situation you have no choice
but to allocate memory based on a fixed upper bound.)
There was previously no safe left shift at all, which is an omission.
And rshift_safe_into was an odd thing to be missing, so while I'm
here, I've added it on the basis that it will probably be useful
sooner or later.
Unlike the ones in mpint.c proper, these are not intended to respect
the constant-time guarantees. They're going to be the kind of thing
you use in key generation, which is too random to be constant-time in
any case.
I've arranged several precautions to try to make sure these functions
don't accidentally get linked into the main SSH application, only into
key generators:
- declare them in a separate header with "unsafe" in the name
- put "unsafe" in the name of every actual function
- don't even link the mpunsafe.c translation unit into PuTTY proper
- in fact, define global symbols of the same name in that file and
the SSH client code, so that there will be a link failure if we
ever try to do it by accident
The initial contents of the new source file consist of the subroutine
mp_mod_short() that previously lived in sshprime.c (and was not in
mpint.c proper precisely because it was unsafe). While I'm here, I've
turned it into mp_unsafe_mod_integer() and let it take a modulus of up
to 32 bits instead of 16.
Also added some obviously useful functions to shrink an mpint to the
smallest physical size that can hold the contained number (rather like
bn_restore_invariant in the old Bignum system), which I expect to be
using shortly.
Mostly because I just had a neat idea about how to expose that large
mutable array without it being a mutable global variable: make it a
static in its own module, and expose only a _pointer_ to it, which is
const-qualified.
While I'm there, changed the name to something more descriptive.
When I implemented the puttygen --dump option recently, my aim was
that all the components related to the public key would have names
beginning with "public_", and the private components' names should
begin with "private_". Anything not beginning with either is a
_parameter of the system_, i.e. something that can safely be shared
between multiple users' key pairs without giving any of them the
ability to break another's key.
(In particular, in integer DSA, p, q and g are unprefixed, y is
labelled as public, and x as private. In principle, p,q,g are safe to
share; I think the only reason nobody bothers is that standardisation
is more difficult than generating a fresh prime every time. In
elliptic-curve DSA, the effort equation reverses, because finding a
good curve is a pain, so everybody standardises on one of a small
number of well-known ones.)
Anyway. This is all very well except that I left the 'public' prefix
off the EdDSA x and y values. Now fixed.
While looking over the code for other reasons, I happened to notice
that the internal function mp_add_masked_integer_into was using a
totally wrong condition to check whether it was about to do an
out-of-range right shift: it was comparing a shift count measured in
bits against BIGNUM_INT_BYTES.
The resulting bug hasn't shown up in the code so far, which I assume
is just because no caller is passing any RHS to mp_add_integer_into
bigger than about 1 or 2. And it doesn't show up in the test suite
because I hadn't tested those functions. Now I am testing them, and
the newly added test fails when built for 16-bit BignumInt if you back
out the actual fix in this commit.
Also spelled '-O text', this takes a public or private key as input,
and produces on standard output a dump of all the actual numbers
involved in the key: the exponent and modulus for RSA, the p,q,g,y
parameters for DSA, the affine x and y coordinates of the public
elliptic curve point for ECC keys, and all the extra bits and pieces
in the private keys too.
Partly I expect this to be useful to me for debugging: I've had to
paste key files a few too many times through base64 decoders and hex
dump tools, then manually decode SSH marshalling and paste the result
into the Python REPL to get an integer object. Now I should be able to
get _straight_ to text I can paste into Python.
But also, it's a way that other applications can use the key
generator: if you need to generate, say, an RSA key in some format I
don't support (I've recently heard of an XML-based one, for example),
then you can run 'puttygen -t rsa --dump' and have it print the
elements of a freshly generated keypair on standard output, and then
all you have to do is understand the output format.
In the previous commit I introduced the ability for PuTTY to talk to a
server speaking the bare ssh-connection protocol, and listed several
applications for that ability. But none of those applications is any
use without a server that speaks the same protocol. Until now, the
only such server has been the Unix-domain socket presented by an
upstream connection-sharing PuTTY - and we already had a way to
connect to that.
So here's the missing piece: by reusing code that already existed for
the testing SSH server Uppity, I've created a program that will speak
the bare ssh-connection protocol on its standard I/O channels. If you
want to get a shell session over any of the transports I mentioned in
the last commit, this is the program you need to run at the far end of
it.
I have yet to write the documentation, but just in case I forget, the
name stands for 'Pseudo Ssh for Untappable, Separately Authenticated
Networks'.
This is the same protocol that PuTTY's connection sharing has been
using for years, to communicate between the downstream and upstream
PuTTYs. I'm now promoting it to be a first-class member of the
protocols list: if you have a server for it, you can select it in the
GUI or on the command line, and write out a saved session that
specifies it.
This would be completely insecure if you used it as an ordinary
network protocol, of course. Not only is it non-cryptographic and wide
open to eavesdropping and hijacking, but it's not even _authenticated_
- it begins after the userauth phase of SSH. So there isn't even the
mild security theatre of entering an easy-to-eavesdrop password, as
there is with, say, Telnet.
However, that's not what I want to use it for. My aim is to use it for
various specialist and niche purposes, all of which involve speaking
it over an 8-bit-clean data channel that is already set up, secured
and authenticated by other methods. There are lots of examples of such
channels:
- a userv(1) invocation
- the console of a UML kernel
- the stdio channels into other kinds of container, such as Docker
- the 'adb shell' channel (although it seems quite hard to run a
custom binary at the far end of that)
- a pair of pipes between PuTTY and a Cygwin helper process
- and so on.
So this protocol is intended as a convenient way to get a client at
one end of any of those to run a shell session at the other end. Unlike
other approaches, it will give you all the SSH-flavoured amenities
you're already used to, like forwarding your SSH agent into the
container, or forwarding selected network ports in or out of it, or
letting it open a window on your X server, or doing SCP/SFTP style
file transfer.
Of course another way to get all those amenities would be to run an
ordinary SSH server over the same channel - but this approach avoids
having to manage a phony password or authentication key, or taking up
your CPU time with pointless crypto.
PSCP and PSFTP can only work over a protocol enough like SSH to be
able to run subsystems (or at the very least a remote command, for
old-style PSCP). Historically we've implemented this restriction by
having them not support any protocol-selection command-line options at
all, and hardwiring them to instantiate ssh_backend.
This commit regularises them to be more like the rest of the tools.
You can select a protocol using the appropriate command-line option,
provided it's a protocol in those tools' backends[] array. And the
setup code will find the BackendVtable to instantiate by the usual
method of calling backend_vt_from_proto.
Currently, this makes essentially no difference: those tools link in
be_ssh.c, which means the only supported backend is SSH. So the effect
is that now -ssh is an accepted option with no effect, instead of
being rejected. But it opens the way to add other protocols that are
SSH-like enough to run file transfer over.
I'm reusing the 'id' string from each BackendVtable as the name of its
command-line option, which means I don't need to manually implement an
option for each new protocol.
The previous 'name' field was awkwardly serving both purposes: it was
a machine-readable identifier for the backend used in the saved
session format, and it was also used in error messages when Plink
wanted to complain that it didn't support a particular backend. Now
there are two separate name fields for those purposes.
Now you can type an absolute pathname (starting with '/') into the
hostname box in Unix GUI PuTTY, or into the hostname slot on the Unix
Plink command line, and the effect will be that PuTTY makes an AF_UNIX
connection to the specified Unix-domain socket in place of a TCP/IP
connection.
I don't _yet_ know of anyone running SSH on a Unix-domain socket, but
at the very least it'll be useful to me for debugging and testing, and
I'm pretty sure there will be other specialist uses sooner or later.
Sometimes, within a switch statement, you want to declare local
variables specific to the handler for one particular case. Until now
I've mostly been writing this in the form
    switch (discriminant) {
      case SIMPLE:
        do stuff;
        break;
      case COMPLICATED:
        {
            declare variables;
            do stuff;
        }
        break;
    }
which is ugly because the two pieces of essentially similar code
appear at different indent levels, and also inconvenient because you
have less horizontal space available to write the complicated case
handler in - particularly undesirable because _complicated_ case
handlers are the ones most likely to need all the space they can get!
After encountering a rather nicer idiom in the LLVM source code, and
after a bit of hackery this morning figuring out how to persuade
Emacs's auto-indent to do what I wanted with it, I've decided to move
to an idiom in which the open brace comes right after the case
statement, and the code within it is indented the same as it would
have been without the brace. Then the whole case handler (including
the break) lives inside those braces, and you get something that looks
more like this:
    switch (discriminant) {
      case SIMPLE:
        do stuff;
        break;
      case COMPLICATED: {
        declare variables;
        do stuff;
        break;
      }
    }
This commit is a big-bang change that reformats all the complicated
case handlers I could find into the new layout. This is particularly
nice in the Pageant main function, in which almost _every_ case
handler had a bundle of variables and was long and complicated. (In
fact that's what motivated me to get round to this.) Some of the
innermost parts of the terminal escape-sequence handling are also
breathing a bit easier now the horizontal pressure on them is
relieved.
(Also, in a few cases, I was able to remove the extra braces
completely, because the only variable local to the case handler was a
loop variable which our new C99 policy allows me to move into the
initialiser clause of its for statement.)
Viewed with whitespace ignored, this is not too disruptive a change.
Downstream patches that conflict with it may need to be reapplied
using --ignore-whitespace or similar.
This links up the new re-encryption facilities to the Unix Pageant
client-mode command line. Analogously to -d and -D, 'pageant -r key-id'
re-encrypts a single key, and 'pageant -R' re-encrypts everything.