draft-kampanakis-curdle-ssh-pq-ke defines the packet names
SSH_MSG_KEX_HYBRID_INIT and SSH_MSG_KEX_HYBRID_REPLY. They have the
same numbers as ECDH_INIT and ECDH_REPLY, and don't change anything
else, so this is just a naming change. But I think it's a good one,
because the post-quantum KEMs are less symmetric than ECDH (they're
much more like Ben's RSA kex in concept, though very different in
detail), and shouldn't try to pretend they're the same kind of thing.
Also this enables logparse.pl to give a warning about the fact that
one string in each packet contains two separate keys glomphed together.
For the latter reason (and also because it's easier in my code
structure) I've also switched to using the HYBRID naming for the
existing NTRU + Curve25519 hybrid method, even though the
Internet-Draft for that one still uses the ECDH names. Sorry, but I
think it's clearer!
dh_is_gex() expects to find a 'struct dh_extra' in the 'extra' field
of the kex_alg you pass in, and won't look kindly on finding an
instance of some totally different structure type. We were being
careful about that everywhere in the GSSAPI kex code except for the
final free step.
This is like the seat-independent nonfatal(), but specifies a Seat,
which allows the GUI dialog box to have the right terminal window as
its parent (if there are multiple ones).
Changed over all the nonfatal() calls in the code base that could be
localised to a Seat, which means all the ones that come up if
something goes horribly wrong in host key storage. To make that
possible, I've added a 'seat' parameter to store_host_key(); it turns
out that all its call sites had one available already.
This is surprisingly simple, because it wasn't necessary to touch the
GSS parts at all. Nothing changes about the message formats between
integer DH and ECDH in GSS KEX, except that the mpints sent back and
forth as part of integer DH are replaced by the opaque strings used in
ECDH. So I've invented a new KEXTYPE and made it control a bunch of
small conditionals in the middle of the GSS KEX code, leaving the rest
unchanged.
None of the constructors can fail and return NULL. I think this must
have come in with a lot of other unnecessary null-pointer checks when
ECC support was first added. I've got rid of most of them since then,
but that one apparently escaped my notice.
Introduced recently by commit 42740a5455, in which I decided to
call ssh_key_cache_str() even on certified host keys. But that call
was conditional on s->hkey being non-NULL (which happens in GSS KEX)
as well as on it not being certified, and I managed to absentmindedly
remove _both_ conditions. As a result we got a null-pointer
dereference on any GSS kex.
OpenSSH, when called on to give the fingerprint of a certified public
key, will in many circumstances generate the hash of the public blob
of the _underlying_ key, rather than the hash of the full certificate.
I think the hash of the certificate is also potentially useful (if
nothing else, it provides a way to tell apart multiple certificates on
the same key). But I can also see that it's useful to be able to
recognise a key as the same one 'really' (since all certificates on
the same key share a private key, so they're unavoidably related).
So I've dealt with this by introducing an extra pair of fingerprint
types, giving the cross product of {MD5, SHA-256} x {base key only,
full certificate}. You can manually select which one you want to see
in some circumstances (notably PuTTYgen), and in others (such as
diagnostics) both fingerprints will be emitted side by side via the
new functions ssh2_double_fingerprint[_blob].
The default, following OpenSSH, is to just fingerprint the base key.
In the case where a server presents a host key signed by a different
certificate from the one you've configured, it need not _always_ be
evidence of wrongdoing. I can imagine situations in which two CAs
cover overlapping sets of things, and you don't want to blanket-trust
one of them, but you do want to connect to a specific host signed by
that one.
Accordingly, PuTTY's previous policy of unconditionally aborting the
connection if certificate validation fails (which was always intended
as a stopgap until I thought through what I wanted to replace it with)
is now replaced by fallback handling: we present the host key
fingerprint to the user and give them the option to accept and/or
cache it based on the public key itself.
This means that the certified key types have to have a representation
in the host key cache. So I've assigned each one a type id, and
generate the cache string itself by simply falling back to the base
key.
(Rationale for the latter: re-signing a public key with a different
certificate doesn't change the _private_ key, or the set of valid
signatures generated with it. So if you've been convinced for reasons
other than the certificate that a particular private key is in the
possession of $host, then proof of ownership of that private key
should be enough to convince you you're talking to $host no matter
what CA has signed the public half this week.)
We now offer to receive a given certified host key type if _either_ we
have at least one CA configured to trust that host, _or_ we have a
certified key of that type cached. (So once you've decided manually
that you trust a particular key, we can still receive that key and
authenticate the host with it, even if you later delete the CA record
that it didn't match anyway.)
One change from normal (uncertified) host key handling is that for
certified key types _all_ the host key prompts use the stronger
language, with "WARNING - POTENTIAL SECURITY BREACH!" rather than the
mild 'hmm, we haven't seen this host before'. Rationale: if you
expected this CA key and got that one, it _could_ be a bold-as-brass
MITM attempt in which someone hoped you'd accept their entire CA key.
The mild wording is only for the case where we had no previous
expectations _at all_ for the host to violate: not a CA _or_ a cached
key.
As distinct from the type of signature generated by the SSH server
itself from the host key, this lets you exclude (and by default does
exclude) the old "ssh-rsa" SHA-1 signature type from the signature of
the CA on the certificate.
In a rekey, we expect to see the same host key again, which we enforce
by comparing its cache string, which we happened to have handy. But
certified host keys don't have cache strings, so this no longer works
reliably - the 'assert(s->keystr)' fails.
(This is what I get for making a zillion short-lived test connections
and not leaving any of them running for more than 2 minutes!)
Instead, we now keep the official public blob of the host key from the
first key exchange, and compare that to the public blob of the one in
the rekey.
Now we offer the OpenSSH certificate key types in our KEXINIT host key
algorithm list, so that if the server has a certificate, they can send
it to us.
There's a new storage.h abstraction for representing a list of trusted
host CAs, and which ones are trusted to certify hosts for what
domains. This is stored outside the normal saved session data, because
the whole point of host certificates is to avoid per-host faffing.
Configuring this set of trusted CAs is done via a new GUI dialog box,
separate from the main PuTTY config box (because it modifies a single
set of settings across all saved sessions), which you can launch by
clicking a button in the 'Host keys' pane. The GUI is pretty crude for
the moment, and very much at a 'just about usable' stage right now. It
will want some polishing.
If we have no CA configured that matches the hostname, we don't offer
to receive certified host keys in the first place. So for existing
users who haven't set any of this up yet, nothing will immediately
change.
Currently, if we do offer to receive certified host keys and the
server presents one signed by a CA we don't trust, PuTTY will bomb out
unconditionally with an error, instead of offering a confirmation box.
That's an unfinished part which I plan to fix before this goes into a
release.
This is already slightly nice because it lets me separate the
Weierstrass and Montgomery code more completely, without having to
have a vtable tucked into dh->extra. But more to the point, it will
allow completely different kex methods to fit into the same framework
later.
To that end, I've moved more of the descriptive message generation
into the vtable, and also provided the constructor with a flag that
will let it do different things in client and server.
Also, following on from a previous commit, I've arranged that the new
API returns arbitrary binary data for the exchange hash, rather than
an mp_int. An upcoming implementation of this interface will want to
return an encoded string instead of an encoded mp_int.
Until now, every kex method has represented the output as an mp_int.
So we were storing it in the mp_int field s->K, and adding it to the
exchange hash and key derivation hashes via put_mp_ssh2.
But there's now going to be the first kex method that represents the
output as a string (so that it might have the top bit set, or multiple
leading zero bytes, without its length varying). So we now need to be
more general.
The most general thing it's sensible to do is to replace s->K with a
strbuf containing _already-encoded_ data to become part of the hash,
including length fields if necessary. So every existing kex method
still derives an mp_int, but then immediately puts it into that strbuf
using put_mp_ssh2 and frees it.
All the seat functions that request an interactive prompt of some kind
to the user - both the main seat_get_userpass_input and the various
confirmation dialogs for things like host keys - were using a simple
int return value, with the general semantics of 0 = "fail", 1 =
"proceed" (and in the case of seat_get_userpass_input, answers to the
prompts were provided), and -1 = "request in progress, wait for a
callback".
In this commit I change all those functions' return types to a new
struct called SeatPromptResult, whose primary field is an enum
replacing those simple integer values.
The main purpose is that the enum has not three but _four_ values: the
"fail" result has been split into 'user abort' and 'software abort'.
The distinction is that a user abort occurs as a result of an
interactive UI action, such as the user clicking 'cancel' in a dialog
box or hitting ^D or ^C at a terminal password prompt - and therefore,
there's no need to display an error message telling the user that the
interactive operation has failed, because the user already knows,
because they _did_ it. 'Software abort' is from any other cause, where
PuTTY is the first to know there was a problem, and has to tell the
user.
We already had this 'user abort' vs 'software abort' distinction in
other parts of the code - the SSH backend has separate termination
functions which protocol layers can call. But we assumed that any
failure from an interactive prompt request fell into the 'user abort'
category, which is not true. A couple of examples: if you configure a
host key fingerprint in your saved session via the SSH > Host keys
pane, and the server presents a host key that doesn't match it, then
verify_ssh_host_key would report that the user had aborted the
connection, and feel no need to tell the user what had gone wrong!
Similarly, if a password provided on the command line was not
accepted, then (after I fixed the semantics of that in the previous
commit) the same wrong handling would occur.
So now, those Seat prompt functions too can communicate whether the
user or the software originated a connection abort. And in the latter
case, we also provide an error message to present to the user. Result:
in those two example cases (and others), error messages should no
longer go missing.
Implementation note: to avoid the hassle of having the error message
in a SeatPromptResult being a dynamically allocated string (and hence,
every recipient of one must always check whether it's non-NULL and
free it on every exit path, plus being careful about copying the
struct around), I've instead arranged that the structure contains a
function pointer and a couple of parameters, so that the string form
of the message can be constructed on demand. That way, the only users
who need to free it are the ones who actually _asked_ for it in the
first place, which is a much smaller set.
(This is one of the rare occasions that I regret not having C++'s
extra features available in this code base - a unique_ptr or
shared_ptr to a string would have been just the thing here, and the
compiler would have done all the hard work for me of remembering where
to insert the frees!)
I recently encountered a paper [1] which catalogues all kinds of
things that can go wrong when one party in a discrete-log system
invents a prime and the other party chooses an exponent. In
particular, some choices of prime make it reasonable to use a short
exponent to save time, but others make that strategy very bad.
That paper is about the ElGamal encryption scheme used in OpenPGP,
which is basically integer Diffie-Hellman with one side's key being
persistent: a shared-secret integer is derived exactly as in DH, and
then it's used to communicate a message integer by simply multiplying
the shared secret by the message, mod p.
I don't _know_ that any problem of this kind arises in the SSH usage
of Diffie-Hellman: the standard integer DH groups in SSH are safe
primes, and as far as I know, the usual generation of prime moduli for
DH group exchange also picks safe primes. So the short exponents PuTTY
has been using _should_ be OK.
However, the range of imaginative other possibilities shown in that
paper make me nervous, even so! So I think I'm going to retire the
short exponent strategy, on general principles of overcaution.
This slows down 4096-bit integer DH by about a factor of 3-4 (which
would be worse if it weren't for the modpow speedup in the previous
commit). I think that's OK, because, firstly, computers are a lot
faster these days than when I originally chose to use short exponents,
and secondly, more and more implementations are now switching to
elliptic-curve DH, which is unaffected by this change (and with which
we've always been using maximum-length exponents).
[1] On the (in)security of ElGamal in OpenPGP. Luca De Feo, Bertram
Poettering, Alessandro Sorniotti. https://eprint.iacr.org/2021/923
All this Interactor business has been gradually working towards being
able to inform the user _which_ network connection is currently
presenting them with a password prompt (or whatever), in situations
where more than one of them might be, such as an SSH connection being
used as a proxy for another SSH connection when neither one has
one-touch login configured.
At some point, we have to arrange that any attempt to do a user
interaction during connection setup - be it a password prompt, a host
key confirmation dialog, or just displaying an SSH login banner -
makes it clear which host it's come from. That's going to mean calling
some kind of announcement function before doing any of those things.
But there are several of those functions in the Seat API, and calls to
them are scattered far and wide across the SSH backend. (And not even
just there - the Rlogin backend also uses seat_get_userpass_input).
How can we possibly make sure we don't forget a vital call site on
some obscure little-tested code path, and leave the user confused in
just that one case which nobody might notice for years?
Today I thought of a trick to solve that problem. We can use the C
type system to enforce it for us!
The plan is: we invent a new struct type which contains nothing but a
'Seat *'. Then, for every Seat method which does a thing that ought to
be clearly identified as relating to a particular Interactor, we
adjust the API for that function to take the new struct type where it
previously took a plain 'Seat *'. Or rather - doing less violence to
the existing code - we only need to adjust the API of the dispatch
functions inline in putty.h.
How does that help? Because the way you _get_ one of these
struct-wrapped Seat pointers is by calling interactor_announce() on
your Interactor, which will in turn call interactor_get_seat(), and
wrap the returned pointer into one of these structs.
The effect is that whenever the SSH (or Rlogin) code wants to call one
of those particular Seat methods, it _has_ to call
interactor_announce() just beforehand, which (once I finish all of
this) will make sure the user is aware of who is presenting the prompt
or banner or whatever. And you can't forget to call it, because if you
don't call it, then you just don't have a struct of the right type to
give to the Seat method you wanted to call!
(Of course, there's nothing stopping code from _deliberately_ taking a
Seat * it already has and wrapping it into the new struct. In fact
SshProxy has to do that, in order to forward these requests up the
chain of Seats. But the point is that you can't do it _by accident_,
just by forgetting to make a vital function call - when you do that,
you _know_ you're doing it on purpose.)
No functional change: the new interactor_announce() function exists,
and the type-system trick ensures it's called in all the right places,
but it doesn't actually _do_ anything yet.
Previously, checking the host key against the persistent cache managed
by the storage.h API was done as part of the seat_verify_ssh_host_key
method, i.e. separately by each Seat.
Now that check is done by verify_ssh_host_key(), which is a new
function in ssh/common.c that centralises all the parts of host key
checking that don't need an interactive prompt. It subsumes the
previous verify_ssh_manual_host_key() that checked against the Conf,
and it does the check against the storage API that each Seat was
previously doing separately. If it can't confirm or definitively
reject the host key by itself, _then_ it calls out to the Seat, once
an interactive prompt is definitely needed.
The main point of doing this is so that when SshProxy forwards a Seat
call from the proxy SSH connection to the primary Seat, it won't print
an announcement of which connection is involved unless it's actually
going to do something interactive. (Not that we're printing those
announcements _yet_ anyway, but this is a piece of groundwork that
works towards doing so.)
But while I'm at it, I've also taken the opportunity to clean things
up a bit by renaming functions sensibly. Previously we had three very
similarly named functions verify_ssh_manual_host_key(), SeatVtable's
'verify_ssh_host_key' method, and verify_host_key() in storage.h. Now
the Seat method is called 'confirm' rather than 'verify' (since its
job is now always to print an interactive prompt, so it looks more
like the other confirm_foo methods), and the storage.h function is
called check_stored_host_key(), which goes better with store_host_key
and avoids having too many functions with similar names. And the
'manual' function is subsumed into the new centralised code, so
there's now just *one* host key function with 'verify' in the name.
Several functions are reindented in this commit. Best viewed with
whitespace changes ignored.
This clears up another large pile of clutter at the top level, and in
the process, allows me to rename source files to things that don't all
have that annoying 'ssh' prefix at the top.