I'm about to rewrite the Miller-Rabin testing code, so let's start by
introducing a test suite that the old version passes, and then I can
make sure the new one does too.
There's a new enumeration of fingerprint types, and you tell
ssh2_fingerprint() or ssh2_fingerprint_blob() which of them to use.
So far, this is only implemented behind the scenes, and exposed for
testcrypt to test. All the call sites of ssh2_fingerprint pass a fixed
default fptype, which is still set to the old MD5. That will change
shortly.
This is going to be used in the new version of the PPK file format. It
was the winner of the Password Hashing Context, which I think makes it
a reasonable choice.
Argon2 comes in three flavours: one with no data dependency in its
memory addressing, one with _deliberate_ data dependency (intended to
serialise computation, to hinder parallel brute-forcing), and a hybrid
form that starts off data-independent and then switches over to the
dependent version once the sensitive input data has been adequately
mixed around. I test all three in the test suite; the side-channel
tester can only expect Argon2i to pass; and, following the spec's
recommendation, I'll be using Argon2id for the actual key file
encryption.
I had the wrong function name prefix in the method_prefixes array: the
MAC functions all begin with ssh2_mac_* instead of ssh_mac_*. As a
result, MAC objects in the Python testcrypt system didn't provide
OO-like methods such as m.update() and m.genresult(); instead you had
to say ssh2_mac_update(m, ...) and ssh2_mac_genresult(m).
The testcrypt protocol expects a string literal to be a concatenation
of literal bytes other than '%' and '\n', and %-escaped hex digit
pairs. But testcrypt.py was only ever using the latter format, so even
a legible ASCII string like "123" was being sent to testcrypt as the
unreadable and needlessly long "%31%32%33".
When debugging, I often arrange to save the testcrypt input stream to
a file, and sometimes I use that file as the starting point for
editing. So it is actually useful to have the protocol exchange be
legible to humans. Hence, here's a change to testcrypt.py which makes
it only use the %-escape encoding for byte values that aren't
printable ASCII.
A 'strong' prime, as defined by the Handbook of Applied Cryptography,
is a prime p such that each of p-1 and p+1 has a large prime factor,
and that the large factor q of p-1 is such that q-1 in turn _also_ has
a large prime factor.
HoAC says that making your RSA key using primes of this form defeats
some factoring algorithms - but there are other faster algorithms to
which it makes no difference. So this is probably not a useful
precaution in practice. However, it has been recommended in the past
by some official standards, and it's easy to implement given the new
general facility in PrimeCandidateSource that lets you ask for your
prime to satisfy an arbitrary modular congruence. (And HoAC also says
there's no particular reason _not_ to use strong primes.) So I provide
it as an option, just in case anyone wants to select it.
The change to the key generation algorithm is entirely in sshrsag.c,
and is neatly independent of the prime-generation system in use. If
you're using Maurer provable prime generation, then the known factor q
of p-1 can be used to help certify p, and the one for q-1 to help with
q in turn; if you switch to probabilistic prime generation then you
still get an RSA key with the right structure, except that every time
the definition says 'prime factor' you just append '(probably)'.
(The probabilistic version of this procedure is described as 'Gordon's
algorithm' in HoAC section 4.4.2.)
Most of them are now _mandatory_ P3 scripts, because I'm tired of
maintaining everything to be compatible with both versions.
The current exceptions are gdb.py (which has to live with whatever gdb
gives it), and kh2reg.py (which is actually designed for other people
to use, and some of them might still be stuck on P2 for the moment).
This implements an extended form of primality verification using
certificates based on Pocklington's theorem. You make a Pockle object,
and then try to convince it that one number after another is prime, by
means of providing it with a list of prime factors of p-1 and a
primitive root. (Or just by saying 'this prime is small enough for you
to check yourself'.)
Pocklington's theorem requires you to have factors of p-1 whose
product is at least the square root of p. I've extended that to
support factorisations only as big as the cube root, via an extension
of the theorem given in Maurer's paper on generating provable primes.
The Pockle object is more or less write-only: it has no methods for
reading out its contents. Its only output channel is the return value
when you try to insert a prime into it: if it isn't sufficiently
convinced that your prime is prime, it will return an error code. So
anything for which it returns POCKLE_OK you can be confident of.
I'm going to use this for provable prime generation. But exposing this
part of the system as an object in its own right means I can write a
set of unit tests for this specifically. My negative tests exercise
all the different ways a certification can be erroneous or inadequate;
the positive tests include proofs of primality of various primes used
in elliptic-curve crypto. The Poly1305 proof in particular is taken
from a proof in DJB's paper, which has exactly the form of a
Pocklington certificate only written in English.
The functions primegen() and primegen_add_progress_phase() are gone.
In their place is a small vtable system with two methods corresponding
to them, plus the usual admin of allocating and freeing contexts.
This API change is the starting point for being able to drop in
different prime generation algorithms at run time in response to user
configuration.
I just ran into a bug in which the testcrypt child process was cleanly
terminated, but at least one Python object was left lying around
containing the identifier of a testcrypt object that had never been
freed. On program exit, the Python reference count on that object went
to zero, the __del__ method was invoked, and childprocess.funcall
started a _new_ instance of testcrypt just so it could tell it to free
the object identifier - which, of course, the new testcrypt had never
heard of!
We can already tell the difference between a ChildProcess object which
has no subprocess because it hasn't yet been started, and one which
has no subprocess because it's terminated: the latter has exitstatus
set to something other than None. So now we enforce by assertion that
we don't ever restart the child process, and the __del__ method avoids
doing anything if the child has already finished.
When I'm writing Python using the testcrypt API, I keep finding that I
instinctively try to call vtable methods as if they were actual
methods of the object. For example, calling key.sign(msg, 0) instead
of ssh_key_sign(key, msg, 0).
So this change to the Python side of the testcrypt mechanism panders
to my inappropriate finger-macros by making them work! The idea is
that I define a set of pairs (type, prefix), such that any function
whose name begins with the prefix and whose first argument is of that
type will be automatically translated into a method on the Python
object wrapping a testcrypt value of that type. For example, any
function of the form ssh_key_foo(val_ssh_key, other args) will
automatically be exposed as a method key.foo(other args), simply
because (val_ssh_key, "ssh_key_") appears in the translation table.
This is particularly nice for the Python 3 REPL, which will let me
tab-complete the right set of method names by knowing the type I'm
trying to invoke one on. I haven't decided yet whether I want to
switch to using it throughout cryptsuite.py.
For namespace-cleanness, I've also renamed all the existing attributes
of the Python Value class wrapper so that they start with '_', to
leave the space of sensible names clear for the new OOish methods.
Also spelled '-O text', this takes a public or private key as input,
and produces on standard output a dump of all the actual numbers
involved in the key: the exponent and modulus for RSA, the p,q,g,y
parameters for DSA, the affine x and y coordinates of the public
elliptic curve point for ECC keys, and all the extra bits and pieces
in the private keys too.
Partly I expect this to be useful to me for debugging: I've had to
paste key files a few too many times through base64 decoders and hex
dump tools, then manually decode SSH marshalling and paste the result
into the Python REPL to get an integer object. Now I should be able to
get _straight_ to text I can paste into Python.
But also, it's a way that other applications can use the key
generator: if you need to generate, say, an RSA key in some format I
don't support (I've recently heard of an XML-based one, for example),
then you can run 'puttygen -t rsa --dump' and have it print the
elements of a freshly generated keypair on standard output, and then
all you have to do is understand the output format.
I accepted both 'int' and 'uint' as function argument types, but
hadn't previously noticed that only 'uint' is handled properly as a
return type. Now both are.
Previously, if the testcrypt subprocess suffered any kind of crash or
assertion failure during a run of the Python-based test system, the
effect would be that ChildProcess.read_line() would get EOF, ignore
it, and silently return the empty string. Then it would carry on doing
that for the rest of the program, leading to a long string of error
reports in tests that were nowhere near the code that actually caused
the crash.
Now ChildProcess.read_line() detects EOF and raises an exception, so
that the test suite won't heedlessly carry on trying to do things once
it's noticed that its subprocess has gone away.
This is more fiddly than it sounds, however, because of the wrinkle
that sometimes that function can be called while a Python __del__
method is asking testcrypt to free something. If that happens, the
exception can't be propagated out of the __del__ (analogously to the
rule that it's a really terrible idea for C++ destructors to throw).
So you get an annoying warning message on standard error, and then the
next command sent to testcrypt will be back in the same position.
Worse still, this can also happen if testcrypt has _already_ crashed,
because the __del__ methods will still run.
To protect against _that_, ChildProcess caches the exception after
throwing it, and then each subsequent write_line() will rethrow it.
And __del__ catches and explicitly ignores the exception (to avoid the
annoying warning if Python has to do the same).
The combined result should be that if testcrypt crashes in normal
(non-__del__) context, we should get a single exception that
terminates the run cleanly without cascade failures, and whose
backtrace localises the problem to the actual operation that caused
the crash. If testcrypt crashes in __del__, we can't quite do that
well, but we can still terminate with an exception at the next
opportunity, avoiding multiple cascade failures.
Also in this commit, I've got rid of the try-finally in
cryptsuite.py's (trivial) main program.
Now we only run the final memory-leak check if we didn't already have
some other error to report, or some other exception that terminated
the process.
Also, we wait for the subprocess to terminate before returning control
to the shell, so that any last-minute complaints from Leak Sanitiser
appear before rather than after the shell prompt comes back.
While I'm here, I've also made check_return_status tolerate the case
in which the child process never got started at all. That way, if a
failure manages to occur before even getting _that_ far, there won't
be a cascade failure from check_return_status getting confused
afterwards.
The aim of this reorganisation is to make it easier to test all the
ciphers in PuTTY in a uniform way. It was inconvenient that there were
two separate vtable systems for the ciphers used in SSH-1 and SSH-2
with different functionality.
Now there's only one type, called ssh_cipher. But really it's the old
ssh2_cipher, just renamed: I haven't made any changes to the API on
the SSH-2 side. Instead, I've removed ssh1_cipher completely, and
adapted the SSH-1 BPP to use the SSH-2 style API.
(The relevant differences are that ssh1_cipher encapsulated both the
sending and receiving directions in one object - so now ssh1bpp has to
make a separate cipher instance per direction - and that ssh1_cipher
automatically initialised the IV to all zeroes, which ssh1bpp now has
to do by hand.)
The previous ssh1_cipher vtable for single-DES has been removed
completely, because when converted into the new API it became
identical to the SSH-2 single-DES vtable; so now there's just one
vtable for DES-CBC which works in both protocols. The other two SSH-1
ciphers each had to stay separate, because 3DES is completely
different between SSH-1 and SSH-2 (three layers of CBC structure
versus one), and Blowfish varies in endianness and key length between
the two.
(Actually, while I'm here, I've only just noticed that the SSH-1
Blowfish cipher mis-describes itself in log messages as Blowfish-128.
In fact it passes the whole of the input key buffer, which has length
SSH1_SESSION_KEY_LENGTH == 32 bytes == 256 bits. So it's actually
Blowfish-256, and has been all along!)
No cipher construction function _currently_ returns NULL, but one's
about to start, so the testcrypt system will have to be able to cope.
This is the first time a function in the testcrypt API has had an
'opt' type as its return value rather than an argument. But it works
just the same in reverse: the wire protocol emits the special
identifer "NULL" when the optional return value is absent, and the
Python module catches that and rewrites it as Python 'None'.
The bulk of this commit is the changes necessary to make testcrypt
compile under Visual Studio. Unfortunately, I've had to remove my
fiddly clever uses of C99 variadic macros, because Visual Studio does
something unexpected when a variadic macro's expansion puts
__VA_ARGS__ in the argument list of a further macro invocation: the
commas don't separate further arguments. In other words, if you write
#define INNER(x,y,z) some expansion involving x, y and z
#define OUTER(...) INNER(__VA_ARGS__)
OUTER(1,2,3)
then gcc and clang will translate OUTER(1,2,3) into INNER(1,2,3) in
the obvious way, and the inner macro will be expanded with x=1, y=2
and z=3. But try this in Visual Studio, and you'll get the macro
parameter x expanding to the entire string 1,2,3 and the other two
empty (with warnings complaining that INNER didn't get the number of
arguments it expected).
It's hard to cite chapter and verse of the standard to say which of
those is _definitely_ right, though my reading leans towards the
gcc/clang behaviour. But I do know I can't depend on it in code that
has to compile under both!
So I've removed the system that allowed me to declare everything in
testcrypt.h as FUNC(ret,fn,arg,arg,arg), and now I have to use a
different macro for each arity (FUNC0, FUNC1, FUNC2 etc). Also, the
WRAPPED_NAME system is gone (because that too depended on the use of a
comma to shift macro arguments along by one), and now I put a custom C
wrapper around a function by simply re-#defining that function's own
name (and therefore the subsequent code has to be a little more
careful to _not_ pass functions' names between several macros before
stringifying them).
That's all a bit tedious, and commits me to a small amount of ongoing
annoyance because now I'll have to add an explicit argument count
every time I add something to testcrypt.h. But then again, perhaps it
will make the code less incomprehensible to someone trying to
understand it!
When testcrypt.h lists a function argument as 'opt_val_foo', it means
that the argument is optional in the sense that the C function can
take a null pointer in place of a valid foo, and so the Python wrapper
module should accept None in the corresponding argument slot from the
client code and translate it into the special string "NULL" in the
wire protocol.
This works fine at argument translation time, but the code that reads
testcrypt.h wasn't looking at it, so if you said 'opt_val_foo_suffix'
in place of 'opt_val_foo' (indicating that that argument is optional
_and_ the C function expects it in a translated form), then the
initial pass over testcrypt.h wouldn't strip the _suffix, and would
set up data structures with mismatched type names.
I've written a new standalone test program which incorporates all of
PuTTY's crypto code, including the mp_int and low-level elliptic curve
layers but also going all the way up to the implementations of the
MAC, hash, cipher, public key and kex abstractions.
The test program itself, 'testcrypt', speaks a simple line-oriented
protocol on standard I/O in which you write the name of a function
call followed by some inputs, and it gives you back a list of outputs
preceded by a line telling you how many there are. Dynamically
allocated objects are assigned string ids in the protocol, and there's
a 'free' function that tells testcrypt when it can dispose of one.
It's possible to speak that protocol by hand, but cumbersome. I've
also provided a Python module that wraps it, by running testcrypt as a
persistent subprocess and gatewaying all the function calls into
things that look reasonably natural to call from Python. The Python
module and testcrypt.c both read a carefully formatted header file
testcrypt.h which contains the name and signature of every exported
function, so it costs minimal effort to expose a given function
through this test API. In a few cases it's necessary to write a
wrapper in testcrypt.c that makes the function look more friendly, but
mostly you don't even need that. (Though that is one of the
motivations between a lot of API cleanups I've done recently!)
I considered doing Python integration in the more obvious way, by
linking parts of the PuTTY code directly into a native-code .so Python
module. I decided against it because this way is more flexible: I can
run the testcrypt program on its own, or compile it in a way that
Python wouldn't play nicely with (I bet compiling just that .so with
Leak Sanitiser wouldn't do what you wanted when Python loaded it!), or
attach a debugger to it. I can even recompile testcrypt for a
different CPU architecture (32- vs 64-bit, or even running it on a
different machine over ssh or under emulation) and still layer the
nice API on top of that via the local Python interpreter. All I need
is a bidirectional data channel.