I must have dashed off that branch of the key reading function without
ever testing it, or I'd have noticed by now that it was looking for
the wrong string to terminate the file. Ahem.
As distinct from the type of signature generated by the SSH server
itself from the host key, this lets you exclude (and by default does
exclude) the old "ssh-rsa" SHA-1 signature type from the signature of
the CA on the certificate.
I'm about to change my mind about whether its top-level nature is
struct or union, and rather than change the key word 'union' to
'struct' at every point of use, it's nicer to just get rid of the
keyword completely. So it has a shiny new name.
This uses the test-CA code to construct a series of certificates with
various properties so as to check all the error cases of certificate
validation. It also tests the various different key types, and all the
RSA signature flags on both the certified key and the certifying one.
This is mostly intended to be invoked from cryptsuite, so that I can
make test certificates with various features to check the validation
function. But it also has a command-line interface, which currently
contains just enough features that I was able to generate a
certificate and actually make sure OpenSSH accepted it (proving that I
got the format right in this script).
You _could_ expand this script into a full production CA, with a
couple more command-line options, if you didn't mind the slightly
awkward requirement that in command-line mode it insists on doing its
signing via an SSH agent. But for the moment it's only intended for
test purposes.
Certificate keys don't work the same as normal keys, so the rest of
the code is going to have to pay attention to whether a key is a
certificate, and if so, treat it differently and do cert-specific
stuff to it. So here's a collection of methods for that purpose.
With one exception, these methods of ssh_key are not expected to be
implemented at all in non-certificate key types: they should only ever
be called once you already know you're dealing with a certificate. So
most of the new method pointers can be left out of the ssh_keyalg
initialisers.
The exception is the base_key method, which retrieves the base key of
a certificate - the underlying one with the certificate stripped off.
It's convenient for non-certificate keys to implement this too, and
just return a pointer to themselves. So I've added an implementation
in nullkey.c doing that. (The returned pointer doesn't transfer
ownership; you have to use the new ssh_key_clone() if you want to keep
the base key after freeing the certificate key.)
The methods _only_ implemented in certificates:
Query methods to return the public key of the CA (for looking up in a
list of trusted ones), and to return the key id string (which exists
to be written into log files).
Obviously, we need a check_cert() method which will verify the CA's
actual signature, not to mention checking all the other details like
the principal and the validity period.
And there's another fiddly method for dealing with the RSA upgrade
system, called 'related_alg'. This is quite like alternate_ssh_id, in
that its job is to upgrade one key algorithm to a related one with
more modern RSA signing flags (or any other similar thing that might
later reuse the same mechanism). But where alternate_ssh_id took the
actual signing flags as an argument, this takes a pointer to the
upgraded base algorithm. So it answers the question "What is to this
key algorithm as you are to its base?" - if you call it on
opensshcert_ssh_rsa and give it ssh_rsa_sha512, it'll give you back
opensshcert_ssh_rsa_sha512.
(It's awkward to have to have another of these fiddly methods, and in
the longer term I'd like to try to clean up their proliferation a bit.
But I even more dislike the alternative of just going through
all_keyalgs looking for a cert algorithm with, say, ssh_rsa_sha512 as
the base: that approach would work fine now but it would be a lurking
time bomb for when all the -cert-v02@ methods appear one day. This
way, each certificate type can upgrade itself to the appropriately
related version. And at least related_alg is only needed if you _are_
a certificate key type - it's not adding yet another piece of
null-method boilerplate to the rest.)
This commit is groundwork for full certificate support, but doesn't
complete the job by itself. It introduces the new key types, and adds
a test in cryptsuite ensuring they work as expected, but nothing else.
If you manually construct a PPK file for one of the new key types, so
that it has a certificate in the public key field, then this commit
enables PuTTY to present that key to a server for user authentication,
either directly or via Pageant storing and using it. But I haven't yet
provided any mechanism for making such a PPK, so by itself, this isn't
much use.
Also, these new key types are not yet included in the KEXINIT host
keys list, because if they were, they'd just be treated as normal host
keys, in that you'd be asked to manually confirm the SSH fingerprint
of the certificate. I'll enable them for host keys once I add the
missing pieces.
This stores its data in the same format as the existing KCT_TEXT, but
it displays differently in puttygen --dump, expecting that the data
will be full of horrible control characters, invalid UTF-8, etc.
The displayed data is of the form b64("..."), so you get a hint about
what the encoding is, and can still paste into Python by defining the
identifier 'b64' to be base64.b64decode or equivalent.
Having recently pulled it out into its own file, I think it could also
do with a bit of tidying. In this rework:
- the substructure for a single component now has a globally visible
struct tag, so you can make a variable pointing at it, saving
verbiage in every piece of code looping over a key_components
- the 'is_mp_int' flag has been replaced with a type enum, so that
more types can be added without further upheaval
- the printing loop in cmdgen.c for puttygen --dump has factored out
the initial 'name=' prefix on each line so that it isn't repeated
per component type
- the storage format for text components is now a strbuf rather than
a plain char *, which I think is generally more useful.
The function will accept a public key file or a PPK, but if it fails
to parse as any of those, the error message says "not a PuTTY SSH-2
private key", which is particularly incongruous in situations where
you're specifically _not_ after the private half of the key.
Now says "not a public key or a PuTTY SSH-2 private key".
If you already have a string (of potentially-binary data) in the form
of a ptrlen reference to somewhere else, and you want to keep a copy
somewhere, it's useful to copy it into a strbuf. But it takes a couple
of lines of faff to do that, and it's nicer to wrap that up into a
tiny helper function.
This commit adds that helper function strbuf_dup, and its non-movable
sibling strbuf_dup_nm for secret data. Also, gone through the existing
code and found a bunch of cases where this makes things less verbose.
These days, the base64 module has 'b64decode', which can tolerate a
str or a bytes as input. Switched to using that, and also, imported it
under a nice short name 'b64'.
In the process, removed the obsolete equivocation between
base64.decodebytes and base64.decodestring. That was there to cope
with Python 2 - but the assert statement right next to it has been
enforcing P3 since commit 2ec2b796ed two years ago!
This consists of DJB's 'Streamlined NTRU Prime' quantum-resistant
cryptosystem, currently in round 3 of the NIST post-quantum key
exchange competition; it's run in parallel with ordinary Curve25519,
and generates a shared secret combining the output of both systems.
(Hence, even if you don't trust this newfangled NTRU Prime thing at
all, it's at least no _less_ secure than the kex you were using
already.)
As the OpenSSH developers point out, key exchange is the most urgent
thing to make quantum-resistant, even before working quantum computers
big enough to break crypto become available, because a break of the
kex algorithm can be applied retroactively to recordings of your past
sessions. By contrast, authentication is a real-time protocol, and can
only be broken by a quantum computer if there's one available to
attack you _already_.
I've implemented both sides of the mechanism, so that PuTTY and Uppity
both support it. In my initial testing, the two sides can both
interoperate with the appropriate half of OpenSSH, and also (of
course, but it would be embarrassing to mess it up) with each other.
This is already slightly nice because it lets me separate the
Weierstrass and Montgomery code more completely, without having to
have a vtable tucked into dh->extra. But more to the point, it will
allow completely different kex methods to fit into the same framework
later.
To that end, I've moved more of the descriptive message generation
into the vtable, and also provided the constructor with a flag that
will let it do different things in client and server.
Also, following on from a previous commit, I've arranged that the new
API returns arbitrary binary data for the exchange hash, rather than
an mp_int. An upcoming implementation of this interface will want to
return an encoded string instead of an encoded mp_int.
This means if I have functions like foo_subfoo_bar and foo_baz that
both operate on a foo, the Python testcrypt system can translate both
into .bar() and .baz() methods on the object, even though they don't
start with the same prefix.
In test_primegen, we loop round retrieving random data until we find
some that will permit a successful prime generation, so that we can
log only the successful attempts, and not the failures (which don't
have to be time-safe). But this itself introduces a potential mismatch
between logs, because the simplistic RNG used in testsc will have
different control flow depending on how far through a buffer of hash
data it is at the start of a given run.
random_advance_counter() gives it a fresh buffer, so calling that at
the start of a run should normalise this out. The code to do that was
already in the middle of random_read(); I've just pulled it out into a
separately callable function.
This hasn't _actually_ caused failures in test_primegen, but I'm not
sure why not. (Perhaps just luck.) But it did cause a failure in
another test of a similar nature, so before I commit _that_ test (and
the thing it's testing), I'd better fix this.
Correcting a source file name in the docs just now reminded me that
I've seen a lot of outdated source file names elsewhere in the code,
due to all the reorganisation since we moved to cmake. Here's a giant
pass of trying to make them all accurate again.
I happened to notice in passing that this function doesn't have any
tests (although it will have been at least somewhat tested by the
cmdgen interop test system).
This involved writing a wrapper that passes the passphrase and salt as
ptrlens, and I decided it made more sense to make the same change to
the original function too and adjust the call sites appropriately.
I derived a test case by getting OpenSSH itself to make an encrypted
key file, and then using the inputs and output from the password hash
operation that decrypted it again.
The return value of term_data() is used as the return value from the
GUI-terminal versions of the Seat output method, which means backends
will take it to be the amount of standard-output data currently
buffered, and exert back-pressure on the remote peer if it gets too
big (e.g. by ceasing to extend the window in that particular SSH-2
channel).
Historically, as a comment in term_data() explained, we always just
returned 0 from that function, on the basis that we were processing
all the terminal data through our terminal emulation code immediately,
and never retained any of it in the buffer at all. If the terminal
emulation code were to start running slowly, then it would slow down
the _whole_ PuTTY system, due to single-threadedness, and
back-pressure of a sort would be exerted on the remote by it simply
failing to get round to reading from the network socket. But by the
time we got back to the top level of term_data(), we'd have finished
reading all the data we had, so it was still appropriate to return 0.
That comment is still correct if you're thinking about the limiting
factor on terminal data processing being the CPU usage in term_out().
But now that's no longer the whole story, because sometimes we leave
data in term->inbuf without having processed it: during drag-selects
in the terminal window, and (just introduced) while waiting for the
response to a pending window resize request. For both those reasons,
we _don't_ always have a buffer size of zero when we return from
term_data().
So now that hole in our buffer size management is filled in:
term_data() returns the true size of the remaining unprocessed
terminal output, so that back-pressure will be exerted if the terminal
is currently not consuming it. And when processing resumes and we
start to clear our backlog, we call backend_unthrottle to let the
backend know it can relax the back-pressure if necessary.
I recently encountered a paper [1] which catalogues all kinds of
things that can go wrong when one party in a discrete-log system
invents a prime and the other party chooses an exponent. In
particular, some choices of prime make it reasonable to use a short
exponent to save time, but others make that strategy very bad.
That paper is about the ElGamal encryption scheme used in OpenPGP,
which is basically integer Diffie-Hellman with one side's key being
persistent: a shared-secret integer is derived exactly as in DH, and
then it's used to communicate a message integer by simply multiplying
the shared secret by the message, mod p.
I don't _know_ that any problem of this kind arises in the SSH usage
of Diffie-Hellman: the standard integer DH groups in SSH are safe
primes, and as far as I know, the usual generation of prime moduli for
DH group exchange also picks safe primes. So the short exponents PuTTY
has been using _should_ be OK.
However, the range of imaginative other possibilities shown in that
paper make me nervous, even so! So I think I'm going to retire the
short exponent strategy, on general principles of overcaution.
This slows down 4096-bit integer DH by about a factor of 3-4 (which
would be worse if it weren't for the modpow speedup in the previous
commit). I think that's OK, because, firstly, computers are a lot
faster these days than when I originally chose to use short exponents,
and secondly, more and more implementations are now switching to
elliptic-curve DH, which is unaffected by this change (and with which
we've always been using maximum-length exponents).
[1] On the (in)security of ElGamal in OpenPGP. Luca De Feo, Bertram
Poettering, Alessandro Sorniotti. https://eprint.iacr.org/2021/923
It's unquestionably a test program, and I'm generally clearing those
out of the top level. I only missed it in the last clearout because I
was looking for things with 'test' in the name.
They were there to work around that annoying feature of VS's
preprocessor when it expands __VA_ARGS__ into the argument list of
another macro. But I've just thought of a workaround that I can apply
in testcrypt.c itself, so that those parens don't have to appear in
every function definition in the header file.
The trick is, instead of writing
destination_macro(__VA_ARGS__)
you instead write
JUXTAPOSE(destination_macro, (__VA_ARGS__))
where JUXTAPOSE is defined to be a macro that simply expands its two
arguments next to each other:
#define JUXTAPOSE(first, second) first second
This works because the arguments to JUXTAPOSE get macro-expanded
_before_ passing them to JUXTAPOSE itself - the same reason that the
standard tricks with STR_INNER and CAT_INNER work (as seen in defs.h
here). So this defuses the magic behaviour of commas expanded from
__VA_ARGS__, and causes the destination macro to get all its arguments
in the expected places again.
I was dubious about it to begin with, when I found that RFC 7616's
example seemed to be treating it as a 256-bit truncation of SHA-512,
and not the thing FIPS 180-4 section 6.7 specifies as "SHA-512/256"
(which also changes the initial hash state). Having failed to get a
clarifying response from the RFC authors, I had the idea this morning
of testing other HTTP clients to see what _they_ thought that hash
function meant, and then at least I could go with an existing
in-practice consensus.
There is no in-practice consensus. Firefox doesn't support that
algorithm at all (but they do support SHA-256); wget doesn't support
anything that RFC 7616 added to the original RFC 2617. But the prize
for weirdness goes to curl, which does accept the name "SHA-512-256"
and ... treats it as an alias for SHA-256!
So I think the situation among real clients is too confusing to even
try to work with, and I'm going to stop adding to it. PuTTY will
follow Firefox's policy: if a proxy server asks for SHA-256 digests
we'll happily provide them, but if they ask for SHA-512-256 we'll
refuse on the grounds that it's not clear enough what it means.
Now testcrypt has _two_ header files, that's more files than I want at
the top level, so I decided to move it.
It has a good claim to live in either 'test' or 'crypto', but in the
end I decided it wasn't quite specific enough to crypto (it already
also tests things in keygen and proxy), and also, the Python half of
the mechanism already lives in 'test', so it can live alongside that.
Having done that, it seemed silly to leave testsc and testzlib at the
top level: those have 'test' in the names as well, so they can go in
the test subdir as well.
While I'm renaming, also renamed testcrypt.h to testcrypt-func.h to
distinguish it from the new testcrypt-enum.h.
This allows the Python side of testcrypt to check in advance if a
given string is a valid element of an enumeration, and if not, cleanly
throw a Python-level exception without terminating the testcrypt
subprocess.
Should be useful in both manual use (when I'm trying something out by
hand and make a typo or misremember a spelling), and automated use (if
I make the same kind of error in cryptsuite.py then the exception dump
will make more sense).
In order to do this, the new handle_checkenum() function has to
recognise all the enumerated types by name and match them up to their
lookup functions - which is just the kind of thing that can now be
done easily be reincluding testcrypt-enum.h with different #defines.
Making a virtue of the necessity of adding parameter names to
testcrypt.h a couple of commits ago, we can now use those names to
improve diagnostics, so that if you use the wrong type in a Python
function call the error message will tell you the name as well as the
index of the offending argument.
Also, the repr() text for the function itself will now print a full
prototype (albeit in a nasty hybrid of C, Python and testcrypt.h
syntax) which shows all the parameter names. That should be handy when
trying to remember the order of arguments at the REPL prompt.
FUNC_WRAPPED is an alternative keyword to FUNC which you can use to
introduce a function specification in testcrypt.h, indicating that the
function is _not_ the one of the same name used in the main PuTTY
code, but instead a wrapper on it in testcrypt.c whose API was
reworked to be more friendly to translation into Python.
There are a lot of those wrappers already, and previously they passed
without comment in testcrypt.h, and were put into service by #defining
over the top of each name before expanding the marshalling functions.
Now, all those #defines are gone, because the use of FUNC_WRAPPED in
testcrypt.h is enough to clue in the marshalling wrapper to be
generated with a call to foo_wrapper() instead of foo().
Mostly the purpose of this is to make testcrypt.h a bit more
self-documenting: if you see FUNC_WRAPPED, you know not to be confused
by the Python and C function definitions totally failing to match.
Yesterday's commit 52ee636b09 which further extended the huge
pile of arity-specific annoying wrapper macros pushed me over the edge
and inspired me to give some harder thought to finding a way to handle
all arities at once. And this time I found one!
The new technique changes the syntax of the function specifications in
testcrypt.h. In particular, they now have to specify a _name_ for each
parameter as well as a type, because the macros generating the C
marshalling wrappers will need a structure field for each parameter
and cpp isn't flexible enough to generate names for those fields
automatically. Rather than tediously name them arg1, arg2 etc, I've
reused the names of the parameters from the prototypes or definitions
of the underlying real functions (via a one-off auto-extraction
process starting from the output of 'clang -Xclang -dump-ast' plus
some manual polishing), which means testcrypt.h is now a bit more
self-documenting.
The testcrypt.py end of the mechanism is rewritten to eat the new
format. Since it's got more complicated syntax and nested parens and
things, I've written something a bit like a separated lexer/parser
system in place of the previous crude regex matcher, which should
enforce that the whole header file really does conform to the
restricted syntax it has to fit into.
The new system uses a lot less code in testcrypt.c, but I've made up
for that by also writing a long comment explaining how it works, which
was another thing the previous system lacked! Similarly, the new
testcrypt.h has some long-overdue instructions at the top.
Spotted in passing: the cryptsuite test functions iterate 'hashname'
through all the available implementations of SHA-512 (or SHA-384), but
then, in each iteration, ignore that loop variable completely and
always test the default algorithm. So on a platform where more than
one implementation is available, we were only actually testing one of
them. Oops!
In http.c, this drops in reasonably neatly alongside the existing
support for Basic, now that we're waiting for an initial 407 response
from the proxy to tell us which auth mechanism it would prefer to use.
The rest of this patch is mostly contriving to add testcrypt support
for the function in cproxy.c that generates the complicated output
header to go in the HTTP request: you need about a dozen assorted
parameters, the actual response hash has two more hashes in its
preimage, and there's even an option to hash the username as well if
necessary. Much more complicated than CHAP (which is just plain
HMAC-MD5), so it needs testing!
Happily, RFC 7616 comes with some reasonably useful test cases, and
I've managed to transcribe them directly into cryptsuite.py and
demonstrate that my response-generator agrees with them.
End-to-end testing of the whole system was done against Squid 4.13
(specifically, the squid package in Debian bullseye, version 4.13-10).
Not sure how I hadn't needed that before! Obviously, if I have a test
program that can exercise all the prime generation systems, it should
include _all_ of them.
I had a testsc run fail because of alignment-dependent control flow
divergence in a glibc function with 'memmove' in the name, which
appears to have been an accident of different memory allocation
between two runs of the test in question.
sclog was already giving special handling to memset for the same
reason, so it's no trouble to add memmove to the same list of
functions that are treated as an opaque primitive for logging
purposes.
Thanks to Mark Wooding for explaining the method of doing this. At
first glance it seemed _obviously_ impossible to run an algorithm that
needs an iteration per factor of 2 in p-1, without a timing leak
giving away the number of factors of 2 in p-1. But it's not, because
you can do the M-R checks interleaved with each step of your whole
modular exponentiation, and they're cheap enough that you can do them
in _every_ step, even the ones where the exponent is too small for M-R
to be interested in yet, and then do bitwise masking to exclude the
spurious results from the final output.
I'm about to rewrite the Miller-Rabin testing code, so let's start by
introducing a test suite that the old version passes, and then I can
make sure the new one does too.
This generates primality certificates for numbers, in the form of
Python / testcrypt code that calls Pockle methods. It factors p-1 by
calling out to the 'yafu' utility, which is a moderately sophisticated
integer factoring tool (including ECC and quadratic sieve methods)
that runs as a standalone command-line program.
Also added a Pockle test generated as output from this script, which
verifies the primality of the three NIST curves' moduli and their
generators' orders. I already had Pockle certificates for the moduli
and orders used in EdDSA, so this completes the set, and it does it
without me having had to do a lot of manual work.
This script makes 128 connections to your SSH agent at once, and then
sends requests down them in random order to check that the agent is
correctly selecting between all its incoming sockets / named pipes /
whatever.
128 is bigger than MAXIMUM_WAIT_OBJECTS, so a successful run of this
script inside a Windows PuTTY agent-forwarding to a Pageant indicates
that both the PuTTY and the Pageant are managing to handle >64 I/O
subthreads without overloading their event loop.
Gives a quick and easy report of which HW-accelerated crypto
implementations are (a) compiled in to testcrypt, (b) actually
instantiable at testcrypt run time.
This applies to all of AES, SHA-1, SHA-256 and SHA-512. All those
source files previously contained multiple implementations of the
algorithm, enabled or disabled by ifdefs detecting whether they would
work on a given compiler. And in order to get advanced machine
instructions like AES-NI or NEON crypto into the output file when the
compile flags hadn't enabled them, we had to do nasty stuff with
compiler-specific pragmas or attributes.
Now we can do the detection at cmake time, and enable advanced
instructions in the more sensible way, by compile-time flags. So I've
broken up each of these modules into lots of sub-pieces: a file called
(e.g.) 'foo-common.c' containing common definitions across all
implementations (such as round constants), one called 'foo-select.c'
containing the top-level vtable(s), and a separate file for each
implementation exporting just the vtable(s) for that implementation.
One advantage of this is that it depends a lot less on compiler-
specific bodgery. My particular least favourite part of the previous
setup was the part where I had to _manually_ define some Arm ACLE
feature macros before including <arm_neon.h>, so that it would define
the intrinsics I wanted. Now I'm enabling interesting architecture
features in the normal way, on the compiler command line, there's no
need for that kind of trick: the right feature macros are already
defined and <arm_neon.h> does the right thing.
Another change in this reorganisation is that I've stopped assuming
there's just one hardware implementation per platform. Previously, the
accelerated vtables were called things like sha256_hw, and varied
between FOO-NI and NEON depending on platform; and the selection code
would simply ask 'is hw available? if so, use hw, else sw'. Now, each
HW acceleration strategy names its vtable its own way, and the
selection vtable has a whole list of possibilities to iterate over
looking for a supported one. So if someone feels like writing a second
accelerated implementation of something for a given platform - for
example, I've heard you can use plain NEON to speed up AES somewhat
even without the crypto extension - then it will now have somewhere to
drop in alongside the existing ones.
There's a new enumeration of fingerprint types, and you tell
ssh2_fingerprint() or ssh2_fingerprint_blob() which of them to use.
So far, this is only implemented behind the scenes, and exposed for
testcrypt to test. All the call sites of ssh2_fingerprint pass a fixed
default fptype, which is still set to the old MD5. That will change
shortly.
This commit adds the capability in principle to ppk_save_sb, by adding
a fmt_version field in the save parameters structure. As yet it's not
connected up to any user interface in PuTTYgen, but I think I'll need
to, because currently there's no way at all to convert PPK v3 back to
v2, and surely people will need to interoperate with older
installations of PuTTY, or with other PPK-consuming software.
This removes both uses of SHA-1 in the file format: it was used as the
MAC protecting the key file against tamperproofing, and also used in
the key derivation step that converted the user's passphrase to cipher
and MAC keys.
The MAC is simply upgraded from HMAC-SHA-1 to HMAC-SHA-256; it is
otherwise unchanged in how it's applied (in particular, to what data).
The key derivation is totally reworked, to be based on Argon2, which
I've just added to the code base. This should make stolen encrypted
key files more resistant to brute-force attack.
Argon2 has assorted configurable parameters for memory and CPU usage;
the new key format includes all those parameters. So there's no reason
we can't have them under user control, if a user wants to be
particularly vigorous or particularly lightweight with their own key
files. They could even switch to one of the other flavours of Argon2,
if they thought side channels were an especially large or small risk
in their particular environment. In this commit I haven't added any UI
for controlling that kind of thing, but the PPK loading function is
all set up to cope, so that can all be added in a future commit
without having to change the file format.
While I'm at it, I've also switched the CBC encryption to using a
random IV (or rather, one derived from the passphrase along with the
cipher and MAC keys). That's more like normal SSH-2 practice.
This is going to be used in the new version of the PPK file format. It
was the winner of the Password Hashing Context, which I think makes it
a reasonable choice.
Argon2 comes in three flavours: one with no data dependency in its
memory addressing, one with _deliberate_ data dependency (intended to
serialise computation, to hinder parallel brute-forcing), and a hybrid
form that starts off data-independent and then switches over to the
dependent version once the sensitive input data has been adequately
mixed around. I test all three in the test suite; the side-channel
tester can only expect Argon2i to pass; and, following the spec's
recommendation, I'll be using Argon2id for the actual key file
encryption.
No functional change: currently, the IV passed in is always zero
(except in the test suite). But this prepares to change that in a
future revision of the key file format.
I've just done a major rewrite of code structure and update policy for
most of the TermWin window-modification methods, and I wrote this test
program in the process to check that old and new versions of the
terminal still respond to all these escape sequences in the same way.
It's quite likely to come in useful again, so I'll commit it.
I had the wrong function name prefix in the method_prefixes array: the
MAC functions all begin with ssh2_mac_* instead of ssh_mac_*. As a
result, MAC objects in the Python testcrypt system didn't provide
OO-like methods such as m.update() and m.genresult(); instead you had
to say ssh2_mac_update(m, ...) and ssh2_mac_genresult(m).
The NEON support for SHA-512 acceleration looks very like SHA-256,
with a pair of chained instructions to generate a 128-bit vector
register full of message schedule, and another pair to update the hash
state based on those. But since SHA-512 is twice as big in all
dimensions, those four instructions between them only account for two
rounds of it, in place of four rounds of SHA-256.
Also, it's a tighter squeeze to fit all the data needed by those
instructions into their limited number of register operands. The NEON
SHA-256 implementation was able to keep its hash state and message
schedule stored as 128-bit vectors and then pass combinations of those
vectors directly to the instructions that did the work; for SHA-512,
in several places you have to make one of the input operands to the
main instruction by combining two halves of different vectors from
your existing state. But that operation is a quick single EXT
instruction, so no trouble.
The only other problem I've found is that clang - in particular the
version on M1 macOS, but as far as I can tell, even on current trunk -
doesn't seem to implement the NEON intrinsics for the SHA-512
extension. So I had to bodge my own versions with inline assembler in
order to get my implementation to compile under clang. Hopefully at
some point in the future the gap might be filled and I can relegate
that to a backwards-compatibility hack!
This commit adds the same kind of switching mechanism for SHA-512 that
we already had for SHA-256, SHA-1 and AES, and as with all of those,
plumbs it through to testcrypt so that you can explicitly ask for the
hardware or software version of SHA-512. So the test suite can run the
standard test vectors against both implementations in turn.
On M1 macOS, I'm testing at run time for the presence of SHA-512 by
checking a sysctl setting. You can perform the same test on the
command line by running "sysctl hw.optional.armv8_2_sha512".
As far as I can tell, on Windows there is not yet any flag to test for
this CPU feature, so for the moment, the new accelerated SHA-512 is
turned off unconditionally on Windows.
When we invent a movzx instruction as part of shift-count logging on
x86, we apparently need to set its 'translation' field to point at a
pre-existing instruction that it's logically related to. Later
versions of DynamoRIO than I was running with will complain if this
isn't done.
This occurred to me recently as a (very small) hole in the logging
strategy: if the size of an allocated memory block depended on some
secret data, it certainly would change the control flow and memory
access pattern inside malloc, but since we disable logging inside
malloc, the log file from this test suite would never see the
difference.
Easily fixed by printing the size of each block in the code that
intercepts malloc and realloc. As expected, no test actually fails as
a result of filling in this gap.
On AArch64, there are unexpectedly malloc and free functions in ld.so,
so the module-load function finds them there, wraps them, and then
misses the real versions in libc.
This allows my side-channel test system to at least _compile_ on other
architectures without failing for the lack of OP_xxx enum constants,
although it now won't log all the things it needs to be a proper test.
We keep an internal 128-bit counter that's used as part of the hash
preimages. There's no real need to import all the mp_int machinery in
order to implement that: we can do it by hand using a small fixed-size
array and a trivial use of BignumADC. This is another inter-module
dependency that's easy to remove and useful to spinoff programs.
This changes the hash preimage calculation in the PRNG, because we're
now formatting our 128-bit integer in the fixed-length representation
of 16 little-endian bytes instead of as an SSH-2 mpint. This is
harmless (perhaps even mildly beneficial, due to the length now not
depending on how long the PRNG has been running), but means I have to
update the PRNG tests as well.
The class for general rth-root finding started off as a cube-root
finder before I generalised it, and in one part of the top-level
explanatory comment, I still referred to a subgroup having index 3
rather than index r.
Also, in a later paragraph, I seem to have said 'index' several times
where I meant the concept of 'rank' I defined in the previous
paragraph.
The testcrypt protocol expects a string literal to be a concatenation
of literal bytes other than '%' and '\n', and %-escaped hex digit
pairs. But testcrypt.py was only ever using the latter format, so even
a legible ASCII string like "123" was being sent to testcrypt as the
unreadable and needlessly long "%31%32%33".
When debugging, I often arrange to save the testcrypt input stream to
a file, and sometimes I use that file as the starting point for
editing. So it is actually useful to have the protocol exchange be
legible to humans. Hence, here's a change to testcrypt.py which makes
it only use the %-escape encoding for byte values that aren't
printable ASCII.
A 'strong' prime, as defined by the Handbook of Applied Cryptography,
is a prime p such that each of p-1 and p+1 has a large prime factor,
and that the large factor q of p-1 is such that q-1 in turn _also_ has
a large prime factor.
HoAC says that making your RSA key using primes of this form defeats
some factoring algorithms - but there are other faster algorithms to
which it makes no difference. So this is probably not a useful
precaution in practice. However, it has been recommended in the past
by some official standards, and it's easy to implement given the new
general facility in PrimeCandidateSource that lets you ask for your
prime to satisfy an arbitrary modular congruence. (And HoAC also says
there's no particular reason _not_ to use strong primes.) So I provide
it as an option, just in case anyone wants to select it.
The change to the key generation algorithm is entirely in sshrsag.c,
and is neatly independent of the prime-generation system in use. If
you're using Maurer provable prime generation, then the known factor q
of p-1 can be used to help certify p, and the one for q-1 to help with
q in turn; if you switch to probabilistic prime generation then you
still get an RSA key with the right structure, except that every time
the definition says 'prime factor' you just append '(probably)'.
(The probabilistic version of this procedure is described as 'Gordon's
algorithm' in HoAC section 4.4.2.)
Since our prime-generation code contains facilities not used by the
main key generators - Sophie Germain primes, user-specified modular
congruences, and MPU certificate output - it's probably going to be
useful sooner or later to have a command-line tool to access those
facilities. So here's a simple script that glues a Python argparse
interface on to the front of it all.
It would be nice to put this in 'contrib' rather than 'test', on the
grounds that it's at least potentially useful for purposes other than
testing PuTTY during development. But it's a client of the testcrypt
system, so it can't live anywhere other than the same directory as
testcrypt.py without me first having to do a lot of faffing about with
Python module organisation. So it can live here for the moment.
Conveniently checkable certificates of primality aren't a new concept.
I didn't invent them, and I wasn't the first to implement them. Given
that, I thought it might be useful to be able to independently verify
a prime generated by PuTTY's provable prime system. Then, even if you
don't trust _this_ code, you might still trust someone else's
verifier, or at least be less willing to believe that both were
colluding.
The Perl module Math::Prime::Util is the only free software I've found
that defines a specific text-file format for certificates of
primality. The MPU format (as it calls it) supports various different
methods of certifying the primality of a number (most of which, like
Pockle's, depend on having previously proved some smaller number(s) to
be prime). The system implemented by Pockle is on its list: MPU calls
it by the name "BLS5".
So this commit introduces extra stored data inside Pockle so that it
remembers not just _that_ it believes certain numbers to be prime, but
also _why_ it believed each one to be prime. Then there's an extra
method in the Pockle API to translate its internal data structures
into the text of an MPU certificate for any number it knows about.
Math::Prime::Util doesn't come with a command-line verification tool,
unfortunately; only a Perl function which you feed a string argument.
So also in this commit I add test/mpu-check.pl, which is a trivial
command-line client of that function.
At the moment, this new piece of API is only exposed via testcrypt. I
could easily put some user interface into the key generation tools
that would save a few primality certificates alongside the private
key, but I have yet to think of any good reason to do it. Mostly this
facility is intended for debugging and cross-checking of the
_algorithm_, not of any particular prime.
Most of them are now _mandatory_ P3 scripts, because I'm tired of
maintaining everything to be compatible with both versions.
The current exceptions are gdb.py (which has to live with whatever gdb
gives it), and kh2reg.py (which is actually designed for other people
to use, and some of them might still be stuck on P2 for the moment).
The comparison functions between an mp_int and an integer worked by
walking along the mp_int, comparing each of its words to the
corresponding word of the integer. When they ran out of mp_int, they'd
stop.
But this overlooks the possibility that they might not have run out of
_integer_ yet! If BIGNUM_INT_BITS is defined to be less than the size
of a uintmax_t, then comparing (say) the uintmax_t 0x8000000000000001
against a one-word mp_int containing 0x0001 would return equality,
because it would never get as far as spotting the high bit of the
integer.
Fixed by iterating up to the max of the number of BignumInts in the
mp_int and the number that cover a uintmax_t. That means we have to
use mp_word() instead of a direct array lookup to get the mp_int words
to compare against, since now the word indices might be out of range.
This is standardised by RFC 8709 at SHOULD level, and for us it's not
too difficult (because we use general-purpose elliptic-curve code). So
let's be up to date for a change, and add it.
This implementation uses all the formats defined in the RFC. But we
also have to choose a wire format for the public+private key blob sent
to an agent, and since the OpenSSH agent protocol is the de facto
standard but not (yet?) handled by the IETF, OpenSSH themselves get to
say what the format for a key should or shouldn't be. So if they don't
support a particular key method, what do you do?
I checked with them, and they agreed that there's an obviously right
format for Ed448 keys, which is to do them exactly like Ed25519 except
that you have a 57-byte string everywhere Ed25519 had a 32-byte
string. So I've done that.
With all the preparation now in place, this is more or less trivial.
We add a new curve setup function in sshecc.c, and an ssh_kex linking
to it; we add the curve parameters to the reference / test code
eccref.py, and use them to generate the list of low-order input values
that should be rejected by the sanity check on the kex output; we add
the standard test vectors from RFC 7748 in cryptsuite.py, and the
low-order values we just generated.
This implements an extended form of primality verification using
certificates based on Pocklington's theorem. You make a Pockle object,
and then try to convince it that one number after another is prime, by
means of providing it with a list of prime factors of p-1 and a
primitive root. (Or just by saying 'this prime is small enough for you
to check yourself'.)
Pocklington's theorem requires you to have factors of p-1 whose
product is at least the square root of p. I've extended that to
support factorisations only as big as the cube root, via an extension
of the theorem given in Maurer's paper on generating provable primes.
The Pockle object is more or less write-only: it has no methods for
reading out its contents. Its only output channel is the return value
when you try to insert a prime into it: if it isn't sufficiently
convinced that your prime is prime, it will return an error code. So
anything for which it returns POCKLE_OK you can be confident of.
I'm going to use this for provable prime generation. But exposing this
part of the system as an object in its own right means I can write a
set of unit tests for this specifically. My negative tests exercise
all the different ways a certification can be erroneous or inadequate;
the positive tests include proofs of primality of various primes used
in elliptic-curve crypto. The Poly1305 proof in particular is taken
from a proof in DJB's paper, which has exactly the form of a
Pocklington certificate only written in English.
The functions primegen() and primegen_add_progress_phase() are gone.
In their place is a small vtable system with two methods corresponding
to them, plus the usual admin of allocating and freeing contexts.
This API change is the starting point for being able to drop in
different prime generation algorithms at run time in response to user
configuration.
The more features and options I add to PrimeCandidateSource, the more
cumbersome it will be to replicate each one in a command-line option
to the ultimate primegen() function. So I'm moving to an API in which
the client of primegen() constructs a PrimeCandidateSource themself,
and passes it in to primegen().
Also, changed the API for pcs_new() so that you don't have to pass
'firstbits' unless you really want to. The net effect is that even
though we've added flexibility, we've also simplified the call sites
of primegen() in the simple case: if you want a 1234-bit prime, you
just need to pass pcs_new(1234) as the argument to primegen, and
you're done.
The new declaration of primegen() lives in ssh_keygen.h, along with
all the types it depends on. So I've had to #include that header in a
few new files.
I just ran into a bug in which the testcrypt child process was cleanly
terminated, but at least one Python object was left lying around
containing the identifier of a testcrypt object that had never been
freed. On program exit, the Python reference count on that object went
to zero, the __del__ method was invoked, and childprocess.funcall
started a _new_ instance of testcrypt just so it could tell it to free
the object identifier - which, of course, the new testcrypt had never
heard of!
We can already tell the difference between a ChildProcess object which
has no subprocess because it hasn't yet been started, and one which
has no subprocess because it's terminated: the latter has exitstatus
set to something other than None. So now we enforce by assertion that
we don't ever restart the child process, and the __del__ method avoids
doing anything if the child has already finished.
When I'm writing Python using the testcrypt API, I keep finding that I
instinctively try to call vtable methods as if they were actual
methods of the object. For example, calling key.sign(msg, 0) instead
of ssh_key_sign(key, msg, 0).
So this change to the Python side of the testcrypt mechanism panders
to my inappropriate finger-macros by making them work! The idea is
that I define a set of pairs (type, prefix), such that any function
whose name begins with the prefix and whose first argument is of that
type will be automatically translated into a method on the Python
object wrapping a testcrypt value of that type. For example, any
function of the form ssh_key_foo(val_ssh_key, other args) will
automatically be exposed as a method key.foo(other args), simply
because (val_ssh_key, "ssh_key_") appears in the translation table.
This is particularly nice for the Python 3 REPL, which will let me
tab-complete the right set of method names by knowing the type I'm
trying to invoke one on. I haven't decided yet whether I want to
switch to using it throughout cryptsuite.py.
For namespace-cleanness, I've also renamed all the existing attributes
of the Python Value class wrapper so that they start with '_', to
leave the space of sensible names clear for the new OOish methods.
This expands our previous check for the public value being zero, to
take in all the values that will _become_ zero after not many steps.
The actual check at run time is done using the new is_infinite query
method for Montgomery curve points. Test cases in cryptsuite.py cover
all the dangerous values I generated via all that fiddly quartic-
solving code.
(DJB's page http://cr.yp.to/ecdh.html#validate also lists these same
constants. But working them out again for myself makes me confident I
can do it again for other similar curves, such as Curve448.)
In particular, this makes us fully compliant with RFC 7748's demand to
check we didn't generate a trivial output key, which can happen if the
other end sends any of those low-order values.
I don't actually see why this is a vital check to perform for security
purposes, for the same reason that we didn't classify the bug
'diffie-hellman-range-check' as a vulnerability: I can't really see
what the other end's incentive might be to deliberately send one of
these nonsense values (and you can't do it by accident - none of these
values is a power of the canonical base point). It's not that a DH
participant couldn't possible want to secretly expose the session
traffic - but there are plenty of more subtle (and less subtle!) ways
to do it, so you don't really gain anything by forcing them to use one
of those instead. But the RFC says to check, so we check.
This uses the new quartic-solver mod p to generate all the values in
Curve25519 that can end up at the curve identity by repeated
application of the doubling formula.
I'm going to want to use this for finding special values in elliptic
curves' ground fields.
In order to solve cubics and quartics in F_p, you have to work in
F_{p^2}, for much the same reasons that you have to be willing to use
complex numbers if you want to solve general cubics over the reals
(even if all the eventual roots turn out to be real after all). So
I've also introduced another arithmetic class to work in that kind of
field, and a shim that glues that on to the cyclic-group root finder
from the previous commit.
I'm about to want to solve quartics mod a prime, which means I'll need
to be able to take cube roots mod p as well as square roots.
This commit introduces a more general class which can take rth roots
for any prime r, and moreover, it can do it in a general cyclic group.
(You have to tell it the group's order and give it some primitives for
doing arithmetic, plus a way of iterating over the group elements that
it can use to look for a non-rth-power and roots of unity.)
That system makes it nicely easy to test, because you can give it a
cyclic group represented as the integers under _addition_, and then
you obviously know what all the right answers are. So I've also added
a unit test system checking that.
I'm about to want to expand the underlying number-theory code, so I'll
start by moving it into a file where it has room to grow without
swamping the main purpose of eccref.py.
I've replaced the random number generation and small delta-finding
loop in primegen() with a much more elaborate system in its own source
file, with unit tests and everything.
Immediate benefits:
- fixes a theoretical possibility of overflowing the target number of
bits, if the random number was so close to the top of the range
that the addition of delta * factor pushed it over. However, this
only happened with negligible probability.
- fixes a directional bias in delta-finding. The previous code
incremented the number repeatedly until it found a value coprime to
all the right things, which meant that a prime preceded by a
particularly long sequence of numbers with tiny factors was more
likely to be chosen. Now we select candidate delta values at
random, that bias should be eliminated.
- changes the semantics of the outermost primegen() function to make
them easier to use, because now the caller specifies the 'bits' and
'firstbits' values for the actual returned prime, rather than
having to account for the factor you're multiplying it by in DSA.
DSA client code is correspondingly adjusted.
Future benefits:
- having the candidate generation in a separate function makes it
easy to reuse in alternative prime generation strategies
- the available constraints support applications such as Maurer's
algorithm for generating provable primes, or strong primes for RSA
in which both p-1 and p+1 have a large factor. So those become
things we could experiment with in future.
This still isn't the true random generator used in the live tools:
it's deterministic, for repeatable testing. The Python side of
testcrypt can now call random_make_prng(), which will instantiate a
PRNG with the given seed. random_clear() still gets rid of it.
So I can still have some tests control the precise random numbers
received by the function under test, but for others (especially key
generation, with its uncertainty about how much randomness it will
actually use) I can just say 'here, have a seed, generate as much
stuff from that seed as you need'.
This is another application of the existing mp_bezout_into, which
needed a tweak or two to cope with the numbers not necessarily being
coprime, plus a wrapper function to deal with shared factors of 2.
It reindents the entire second half of mp_bezout_into, so the patch is
best viewed with whitespace differences ignored.
There was previously no safe left shift at all, which is an omission.
And rshift_safe_into was an odd thing to be missing, so while I'm
here, I've added it on the basis that it will probably be useful
sooner or later.
While looking over the code for other reasons, I happened to notice
that the internal function mp_add_masked_integer_into was using a
totally wrong condition to check whether it was about to do an
out-of-range right shift: it was comparing a shift count measured in
bits against BIGNUM_INT_BYTES.
The resulting bug hasn't shown up in the code so far, which I assume
is just because no caller is passing any RHS to mp_add_integer_into
bigger than about 1 or 2. And it doesn't show up in the test suite
because I hadn't tested those functions. Now I am testing them, and
the newly added test fails when built for 16-bit BignumInt if you back
out the actual fix in this commit.
Also spelled '-O text', this takes a public or private key as input,
and produces on standard output a dump of all the actual numbers
involved in the key: the exponent and modulus for RSA, the p,q,g,y
parameters for DSA, the affine x and y coordinates of the public
elliptic curve point for ECC keys, and all the extra bits and pieces
in the private keys too.
Partly I expect this to be useful to me for debugging: I've had to
paste key files a few too many times through base64 decoders and hex
dump tools, then manually decode SSH marshalling and paste the result
into the Python REPL to get an integer object. Now I should be able to
get _straight_ to text I can paste into Python.
But also, it's a way that other applications can use the key
generator: if you need to generate, say, an RSA key in some format I
don't support (I've recently heard of an XML-based one, for example),
then you can run 'puttygen -t rsa --dump' and have it print the
elements of a freshly generated keypair on standard output, and then
all you have to do is understand the output format.
Sometimes, within a switch statement, you want to declare local
variables specific to the handler for one particular case. Until now
I've mostly been writing this in the form
switch (discriminant) {
case SIMPLE:
do stuff;
break;
case COMPLICATED:
{
declare variables;
do stuff;
}
break;
}
which is ugly because the two pieces of essentially similar code
appear at different indent levels, and also inconvenient because you
have less horizontal space available to write the complicated case
handler in - particuarly undesirable because _complicated_ case
handlers are the ones most likely to need all the space they can get!
After encountering a rather nicer idiom in the LLVM source code, and
after a bit of hackery this morning figuring out how to persuade
Emacs's auto-indent to do what I wanted with it, I've decided to move
to an idiom in which the open brace comes right after the case
statement, and the code within it is indented the same as it would
have been without the brace. Then the whole case handler (including
the break) lives inside those braces, and you get something that looks
more like this:
switch (discriminant) {
case SIMPLE:
do stuff;
break;
case COMPLICATED: {
declare variables;
do stuff;
break;
}
}
This commit is a big-bang change that reformats all the complicated
case handlers I could find into the new layout. This is particularly
nice in the Pageant main function, in which almost _every_ case
handler had a bundle of variables and was long and complicated. (In
fact that's what motivated me to get round to this.) Some of the
innermost parts of the terminal escape-sequence handling are also
breathing a bit easier now the horizontal pressure on them is
relieved.
(Also, in a few cases, I was able to remove the extra braces
completely, because the only variable local to the case handler was a
loop variable which our new C99 policy allows me to move into the
initialiser clause of its for statement.)
Viewed with whitespace ignored, this is not too disruptive a change.
Downstream patches that conflict with it may need to be reapplied
using --ignore-whitespace or similar.
In an early draft of the new PRNG, before I decided to get rid of
random_byte() and replace it with random_read(), it was important
after generating a hash-worth of PRNG output to buffer it so as to
return it a byte at a time. So the PRNG data structure itself had to
keep a hash-sized buffer of pending output, and be able to return the
next byte from it on every random_byte() call.
But when random_read() came in, there was no need to do that any more,
because at the end of a read, the generator is re-seeded and the
remains of any generated data is deliberately thrown away. So the
pending_output buffer has no need to live in the persistent prng
object; it can be relegated to a local variable inside random_read
(and a couple of other functions that used the same buffer since it
was conveniently there).
A side effect of this is that we're no longer yielding the bytes of
each hash in reverse order, because only the previous silly code
structure made it convenient. Fortunately, of course, nothing is
depending on that - except the cryptsuite tests, which I've updated.
Well, actually, two new test programs. agenttest.py is the actual
test; it depends on agenttestgen.py which generates a collection of
test private keys, using the newly exposed testcrypt interface to our
key generation code.
In this commit I've also factored out some Python SSH marshalling code
from cryptsuite, and moved it into a module ssh.py which the agent
tests can reuse.
This adds stability tests (of the form 'make sure this behaves
tomorrow the same way it behaved today, taking on faith that the
latter was right') for all the new in-memory APIs for public and
private key load/save.
I accepted both 'int' and 'uint' as function argument types, but
hadn't previously noticed that only 'uint' is handled properly as a
return type. Now both are.
It demonstrates a successful round trip from a source integer to
ciphertext and back, and also I've hardcoded the ciphertext I got from
the first attempt so that future changes to the code won't be able to
change it without me noticing.
[SGT's note: Pavel Kryukov says he came across these while setting up
CI runs of testsc. Two of the three guarded functions are obvious
variants on ones we're already guarding; I hadn't heard of cfree
before, but apparently it's an alternate form of ordinary free() from
an obsolete standard, whose glibc man page says 'never use it'.]
The number of people has been steadily increasing who read our source
code with an editor that thinks tab stops are 4 spaces apart, as
opposed to the traditional tty-derived 8 that the PuTTY code expects.
So I've been wondering for ages about just fixing it, and switching to
a spaces-only policy throughout the code. And I recently found out
about 'git blame -w', which should make this change not too disruptive
for the purposes of source-control archaeology; so perhaps now is the
time.
While I'm at it, I've also taken the opportunity to remove all the
trailing spaces from source lines (on the basis that git dislikes
them, and is the only thing that seems to have a strong opinion one
way or the other).
Apologies to anyone downstream of this code who has complicated patch
sets to rebase past this change. I don't intend it to be needed again.