Initial live testing pointed out that the ssh_keyalg corresponding to
the certified version of rsa-sha2-512 was expecting to see the SSH id
string "rsa-sha2-512-cert-v01@openssh.com" at the start of the public
key blob, whereas in fact, the _key_ type identifier is still
"ssh-rsa-...", just as the key type for base rsa-sha2-512 is base
ssh-rsa.
Fixed inside openssh-certs.c, by adding a couple more strings to the
'extra' structure.
Certificate keys don't work the same as normal keys, so the rest of
the code is going to have to pay attention to whether a key is a
certificate, and if so, treat it differently and do cert-specific
stuff to it. So here's a collection of methods for that purpose.
With one exception, these methods of ssh_key are not expected to be
implemented at all in non-certificate key types: they should only ever
be called once you already know you're dealing with a certificate. So
most of the new method pointers can be left out of the ssh_keyalg
initialisers.
The exception is the base_key method, which retrieves the base key of
a certificate - the underlying one with the certificate stripped off.
It's convenient for non-certificate keys to implement this too, and
just return a pointer to themselves. So I've added an implementation
in nullkey.c doing that. (The returned pointer doesn't transfer
ownership; you have to use the new ssh_key_clone() if you want to keep
the base key after freeing the certificate key.)
The methods _only_ implemented in certificates:
Query methods to return the public key of the CA (for looking up in a
list of trusted ones), and to return the key id string (which exists
to be written into log files).
Obviously, we need a check_cert() method which will verify the CA's
actual signature, not to mention checking all the other details like
the principal and the validity period.
And there's another fiddly method for dealing with the RSA upgrade
system, called 'related_alg'. This is quite like alternate_ssh_id, in
that its job is to upgrade one key algorithm to a related one with
more modern RSA signing flags (or any other similar thing that might
later reuse the same mechanism). But where alternate_ssh_id took the
actual signing flags as an argument, this takes a pointer to the
upgraded base algorithm. So it answers the question "What is to this
key algorithm as you are to its base?" - if you call it on
opensshcert_ssh_rsa and give it ssh_rsa_sha512, it'll give you back
opensshcert_ssh_rsa_sha512.
(It's awkward to have to have another of these fiddly methods, and in
the longer term I'd like to try to clean up their proliferation a bit.
But I even more dislike the alternative of just going through
all_keyalgs looking for a cert algorithm with, say, ssh_rsa_sha512 as
the base: that approach would work fine now but it would be a lurking
time bomb for when all the -cert-v02@ methods appear one day. This
way, each certificate type can upgrade itself to the appropriately
related version. And at least related_alg is only needed if you _are_
a certificate key type - it's not adding yet another piece of
null-method boilerplate to the rest.)
This commit is groundwork for full certificate support, but doesn't
complete the job by itself. It introduces the new key types, and adds
a test in cryptsuite ensuring they work as expected, but nothing else.
If you manually construct a PPK file for one of the new key types, so
that it has a certificate in the public key field, then this commit
enables PuTTY to present that key to a server for user authentication,
either directly or via Pageant storing and using it. But I haven't yet
provided any mechanism for making such a PPK, so by itself, this isn't
much use.
Also, these new key types are not yet included in the KEXINIT host
keys list, because if they were, they'd just be treated as normal host
keys, in that you'd be asked to manually confirm the SSH fingerprint
of the certificate. I'll enable them for host keys once I add the
missing pieces.
This makes a second independent copy of an existing ssh_key, for
situations where one piece of code is going to want to keep it after
its current owner frees it.
In order to have it work on an arbitrary ssh_key, whether public-only
or a full public+private key pair, I've had to add an ssh_key query
method to ask whether a private key is known. I'm surprised I haven't
found a need for that before! But I suppose in most situations in an
SSH client you statically know which kind of key you're dealing with.
Previously, the fact that "ssh-rsa" sometimes comes with two subtypes
"rsa-sha2-256" and "rsa-sha2-512" was known to three different parts
of the code - two in userauth and one in transport. Now the knowledge
of what those ids are, which one goes with which signing flags, and
which key types have subtypes at all, is centralised into a method of
the key algorithm, and all those locations just query it.
This will enable the introduction of further key algorithms that have
a parallel upgrade system.
It's a class method rather than an object method, so it doesn't allow
keys with the same algorithm to make different choices about what
flags they support. But that's not what I wanted it for: the real
purpose is to allow one key algorithm to delegate supported_flags to
another, by having its method implementation call the one from the
delegate class.
(If only C's compile/link model permitted me to initialise a field of
one global const struct variable to be a copy of that of another, I
wouldn't need the runtime overhead of this method! But object file
formats don't let you even specify that.)
Most key algorithms support no flags at all, so they all want to use
the same implementation of this method. So I've started a file of
stubs utils/nullkey.c to contain the common stub version.
I wasn't really satisfied with the previous version, but it was
easiest to get Stein's algorithm working on polynomials by doing it
exactly how I already knew to do it for integers. But now I've
improved it in two ways.
The first improvement I got from another implementation: instead of
transforming A into A - kB for some k that makes the constant term
zero, you can scale _both_ inputs, replacing A with mA - kB for some
k,m. The advantage is that you can calculate m and k very easily, by
making each one the constant term of the other polynomial, which means
you don't need to invert something mod q in every step. (Rather like
the projective-coordinates optimisations in elliptic curves, where
instead of inverting in every step you accumulate the product of all
the factors that need to be inverted, and invert the whole product
once at the very end.)
The second improvement is to abandon my cumbersome unwinding loop that
builds up the output coefficients by reversing the steps in the
original gcd-finding loop. Instead, I do the thing you do in normal
Euclid's algorithm: keep track of the coefficients as you go through
the original loop. I had wanted to do this before, but hadn't figured
out how you could deal with dividing a coefficient by x when (unlike
the associated real value) the coefficient isn't a multiple of x. But
the answer is very simple: x is invertible in the ring we're working
in (its inverse mod x^p-x-1 is just x^{p-1}-1), so you _can_ just
divide your coefficient by x, and moreover, very easily!
Together, these changes speed up the NTRU key generation by about a
factor of 1.5. And they remove lots of complicated code as well, so
everybody wins.
This consists of DJB's 'Streamlined NTRU Prime' quantum-resistant
cryptosystem, currently in round 3 of the NIST post-quantum key
exchange competition; it's run in parallel with ordinary Curve25519,
and generates a shared secret combining the output of both systems.
(Hence, even if you don't trust this newfangled NTRU Prime thing at
all, it's at least no _less_ secure than the kex you were using
already.)
As the OpenSSH developers point out, key exchange is the most urgent
thing to make quantum-resistant, even before working quantum computers
big enough to break crypto become available, because a break of the
kex algorithm can be applied retroactively to recordings of your past
sessions. By contrast, authentication is a real-time protocol, and can
only be broken by a quantum computer if there's one available to
attack you _already_.
I've implemented both sides of the mechanism, so that PuTTY and Uppity
both support it. In my initial testing, the two sides can both
interoperate with the appropriate half of OpenSSH, and also (of
course, but it would be embarrassing to mess it up) with each other.
This is already slightly nice because it lets me separate the
Weierstrass and Montgomery code more completely, without having to
have a vtable tucked into dh->extra. But more to the point, it will
allow completely different kex methods to fit into the same framework
later.
To that end, I've moved more of the descriptive message generation
into the vtable, and also provided the constructor with a flag that
will let it do different things in client and server.
Also, following on from a previous commit, I've arranged that the new
API returns arbitrary binary data for the exchange hash, rather than
an mp_int. An upcoming implementation of this interface will want to
return an encoded string instead of an encoded mp_int.
This reallocs an existing mp_int to have a different physical size,
e.g. to make sure there's enough space at the top of it.
Trivial, but I'm a little surprised I haven't needed it until now!
Correcting a source file name in the docs just now reminded me that
I've seen a lot of outdated source file names elsewhere in the code,
due to all the reorganisation since we moved to cmake. Here's a giant
pass of trying to make them all accurate again.
I happened to notice in passing that this function doesn't have any
tests (although it will have been at least somewhat tested by the
cmdgen interop test system).
This involved writing a wrapper that passes the passphrase and salt as
ptrlens, and I decided it made more sense to make the same change to
the original function too and adjust the call sites appropriately.
I derived a test case by getting OpenSSH itself to make an encrypted
key file, and then using the inputs and output from the password hash
operation that decrypted it again.
I recently encountered a paper [1] which catalogues all kinds of
things that can go wrong when one party in a discrete-log system
invents a prime and the other party chooses an exponent. In
particular, some choices of prime make it reasonable to use a short
exponent to save time, but others make that strategy very bad.
That paper is about the ElGamal encryption scheme used in OpenPGP,
which is basically integer Diffie-Hellman with one side's key being
persistent: a shared-secret integer is derived exactly as in DH, and
then it's used to communicate a message integer by simply multiplying
the shared secret by the message, mod p.
I don't _know_ that any problem of this kind arises in the SSH usage
of Diffie-Hellman: the standard integer DH groups in SSH are safe
primes, and as far as I know, the usual generation of prime moduli for
DH group exchange also picks safe primes. So the short exponents PuTTY
has been using _should_ be OK.
However, the range of imaginative other possibilities shown in that
paper make me nervous, even so! So I think I'm going to retire the
short exponent strategy, on general principles of overcaution.
This slows down 4096-bit integer DH by about a factor of 3-4 (which
would be worse if it weren't for the modpow speedup in the previous
commit). I think that's OK, because, firstly, computers are a lot
faster these days than when I originally chose to use short exponents,
and secondly, more and more implementations are now switching to
elliptic-curve DH, which is unaffected by this change (and with which
we've always been using maximum-length exponents).
[1] On the (in)security of ElGamal in OpenPGP. Luca De Feo, Bertram
Poettering, Alessandro Sorniotti. https://eprint.iacr.org/2021/923
Instead of the basic square-and-multiply strategy which requires a
square and a multiply per exponent bit (i.e. two modular
multiplications per bit in total), we instead reduce to a square per
exponent bit and an extra multiply only every 5 bits, because the
value we're multiplying in is derived from 5 of the exponent bits at
once via a table lookup.
To avoid the obvious side-channel leakage of a literal table lookup,
we read the whole table every time, mp_selecting the right value into
the multiplication input. This isn't as slow as it sounds when the
alternative is four entire modular multiplications! In my testing,
this commit speeds up large modpows by a factor of just over 1.5, and
it still gets a clean pass from 'testsc'.
This slightly simplifies the lookup function get_dh_group(), but
mostly, the point is to make it more similar to the other lookup
functions, because I'm planning to have those autogenerated.
Now those names appear in help files, I thought it was worth giving
them a read-through and spotting any really obviously confusing or
wrong ones. Quite a few make more sense in the original context of C
than in the derived Python (e.g. 'BinarySink *bs' as a place to write
output to makes sense, but the output 'val_string bs' is less
helpful).
A couple were so confusing that I also corrected them in the original
C, notably the misuse of 'wc' for the elliptic curve point input to
ecc_weierstrass_point_copy. ('wc' in that section of the code is
normally a parameter describing a whole curve.)
marshal.h now provides a macro put_fmt() which allows you to write
arbitrary printf-formatted data to an arbitrary BinarySink.
We already had this facility for strbufs in particular, in the form of
strbuf_catf(). That was able to take advantage of knowing the inner
structure of a strbuf to minimise memory allocation (it would snprintf
directly into the strbuf's existing buffer if possible). For a general
black-box BinarySink we can't do that, so instead we dupvprintf into a
temporary buffer.
For consistency, I've removed strbuf_catf, and converted all uses of
it into the new put_fmt - and I've also added an extra vtable method
in the BinarySink API, so that put_fmt can still use strbuf_catf's
more efficient memory management when talking to a strbuf, and fall
back to the simpler strategy when that's not available.
After this change, the cmake setup now works even on Debian stretch
(oldoldstable), which runs cmake 3.7.
In order to support a version that early I had to:
- write a fallback implementation of 'add_compile_definitions' for
older cmakes, which is easy, because add_compile_definitions(FOO)
is basically just add_compile_options(-DFOO)
- stop using list(TRANSFORM) and string(JOIN), of which I had one
case each, and they were easily replaced with simple foreach loops
- stop putting OBJECT libraries in the target_link_libraries command
for executable targets, in favour of adding $<TARGET_OBJECTS:foo>
to the main sources list for the same target. That matches what I
do with library targets, so it's probably more sensible anyway.
I tried going back by another Debian release and getting this cmake
setup to work on jessie, but that runs CMake 3.0.1, and in _that_
version of cmake the target_sources command is missing, and I didn't
find any alternative way to add extra sources to a target after having
first declared it. Reorganising to cope with _that_ omission would be
too much upheaval without a very good reason.
I ran across their defining RFCs recently and noticed that each one
provides an explicit mathematical expression for the prime (since each
one is derived from the expansion of pi, with framing FFs and a
correction term to make it actually prime).
Those expressions can be re-evaluated trivially by spigot, so it seems
reasonable to add those spigot commands in comments. This also means
the comments contain citations for these primes in actual standards,
including both the hex digits and the mathematical expressions.
I've moved it from mpunsafe.c into the main mpint.c, and renamed it
mp_mod_known_integer, because now it manages to avoid leaking
information about the mp_int you give it.
It can still potentially leak information about the small _modulus_
integer - hence the word 'known' in the new function name. This won't
be a problem in any existing use of the function, because it's used
during prime generation to check divisibility by all the small primes,
and optionally also check for residue 1 mod the RSA public exponent.
But all those values are well known and not secret.
This removes one source of side-channel leakage from prime generation.
This code base has always been a bit confused about which spelling it
likes to use to refer to that signature algorithm. The SSH protocol id
is "ssh-dss". But everyone I know refers to it as the Digital
Signature _Algorithm_, not the Digital Signature _Standard_.
When I moved everything down into the crypto subdir, I took the
opportunity to rename sshdss.c to dsa.c. Now I'm doing the rest of the
job: all internal identifiers and code comments refer to DSA, and the
spelling "dss" only survives in externally visible identifiers that
have to remain constant.
(Such identifiers include the SSH protocol id, and also the string id
used to identify the key type in PuTTY's own host key cache. We can't
change the latter without causing everyone a backwards-compatibility
headache, and if we _did_ ever decide to do that, we'd surely want to
do a much more thorough job of making the cache format more sensible!)
This clears up another large pile of clutter at the top level, and in
the process, allows me to rename source files to things that don't all
have that annoying 'ssh' prefix at the top.
sshblowf.h (as was) is 100% internal to that directory. And mpint_i.h
and ecc.h are specialist enough that it's reasonable to ask clients
outside the crypto directory to include them with a subdirectory path,
to hint that it's an unusual thing to be doing.
When preparing commit fca13a17b1, I redesigned the cmake test
function at the last minute, and apparently didn't quite get all the
call sites correctly rewritten. This one still omitted some of the
argument-type keywords, and had an obsolete parameter giving an
explicit name for a sub-library, which I later decided wasn't needed.
This applies to all of AES, SHA-1, SHA-256 and SHA-512. All those
source files previously contained multiple implementations of the
algorithm, enabled or disabled by ifdefs detecting whether they would
work on a given compiler. And in order to get advanced machine
instructions like AES-NI or NEON crypto into the output file when the
compile flags hadn't enabled them, we had to do nasty stuff with
compiler-specific pragmas or attributes.
Now we can do the detection at cmake time, and enable advanced
instructions in the more sensible way, by compile-time flags. So I've
broken up each of these modules into lots of sub-pieces: a file called
(e.g.) 'foo-common.c' containing common definitions across all
implementations (such as round constants), one called 'foo-select.c'
containing the top-level vtable(s), and a separate file for each
implementation exporting just the vtable(s) for that implementation.
One advantage of this is that it depends a lot less on compiler-
specific bodgery. My particular least favourite part of the previous
setup was the part where I had to _manually_ define some Arm ACLE
feature macros before including <arm_neon.h>, so that it would define
the intrinsics I wanted. Now I'm enabling interesting architecture
features in the normal way, on the compiler command line, there's no
need for that kind of trick: the right feature macros are already
defined and <arm_neon.h> does the right thing.
Another change in this reorganisation is that I've stopped assuming
there's just one hardware implementation per platform. Previously, the
accelerated vtables were called things like sha256_hw, and varied
between FOO-NI and NEON depending on platform; and the selection code
would simply ask 'is hw available? if so, use hw, else sw'. Now, each
HW acceleration strategy names its vtable its own way, and the
selection vtable has a whole list of possibilities to iterate over
looking for a supported one. So if someone feels like writing a second
accelerated implementation of something for a given platform - for
example, I've heard you can use plain NEON to speed up AES somewhat
even without the crypto extension - then it will now have somewhere to
drop in alongside the existing ones.
Similarly to 'utils', I've moved all the stuff in the crypto
build-time library into a source directory of its own, and while I'm
at it, split up the monolithic sshauxcrypt.c into its various
unrelated parts.
This is also an opportunity to remove the annoying 'ssh' prefix from
the front of the file names, and give several of them less cryptic
names.