The functions primegen() and primegen_add_progress_phase() are gone.
In their place is a small vtable system with two methods corresponding
to them, plus the usual admin of allocating and freeing contexts.
This API change is the starting point for being able to drop in
different prime generation algorithms at run time in response to user
configuration.
The old API was one of those horrible things I used to do when I was
young and foolish, in which you have just one function, and indicate
which of lots of things it's doing by passing in flags. It was crying
out to be replaced with a vtable.
While I'm at it, I've reworked the code on the Windows side that
decides what to do with the progress bar, so that it's based on
actually justifiable estimates of probability rather than magic
integer constants.
Since computers are generally faster now than they were at the start
of this project, I've also decided there's no longer any point in
making the fixed final part of RSA key generation bother to report
progress at all. So the progress bars are now only for the variable
part, i.e. the actual prime generations.
(This is a reapplication of commit a7bdefb39, without the Miller-Rabin
refactoring accidentally folded into it. Also this time I've added -lm
to the link options, which for some reason _didn't_ cause me a link
failure last time round. No idea why not.)
This reverts commit a7bdefb394.
I had accidentally mashed it together with another commit. I did
actually want to push both of them, but I'd rather push them
separately! So I'm backing out the combined blob, and I'll re-push
them with their proper comments and explanations.
The old API was one of those horrible things I used to do when I was
young and foolish, in which you have just one function, and indicate
which of lots of things it's doing by passing in flags. It was crying
out to be replaced with a vtable.
While I'm at it, I've reworked the code on the Windows side that
decides what to do with the progress bar, so that it's based on
actually justifiable estimates of probability rather than magic
integer constants.
Since computers are generally faster now than they were at the start
of this project, I've also decided there's no longer any point in
making the fixed final part of RSA key generation bother to report
progress at all. So the progress bars are now only for the variable
part, i.e. the actual prime generations.
The more features and options I add to PrimeCandidateSource, the more
cumbersome it will be to replicate each one in a command-line option
to the ultimate primegen() function. So I'm moving to an API in which
the client of primegen() constructs a PrimeCandidateSource themself,
and passes it in to primegen().
Also, changed the API for pcs_new() so that you don't have to pass
'firstbits' unless you really want to. The net effect is that even
though we've added flexibility, we've also simplified the call sites
of primegen() in the simple case: if you want a 1234-bit prime, you
just need to pass pcs_new(1234) as the argument to primegen, and
you're done.
The new declaration of primegen() lives in ssh_keygen.h, along with
all the types it depends on. So I've had to #include that header in a
few new files.
I've replaced the random number generation and small delta-finding
loop in primegen() with a much more elaborate system in its own source
file, with unit tests and everything.
Immediate benefits:
- fixes a theoretical possibility of overflowing the target number of
bits, if the random number was so close to the top of the range
that the addition of delta * factor pushed it over. However, this
only happened with negligible probability.
- fixes a directional bias in delta-finding. The previous code
incremented the number repeatedly until it found a value coprime to
all the right things, which meant that a prime preceded by a
particularly long sequence of numbers with tiny factors was more
likely to be chosen. Now we select candidate delta values at
random, that bias should be eliminated.
- changes the semantics of the outermost primegen() function to make
them easier to use, because now the caller specifies the 'bits' and
'firstbits' values for the actual returned prime, rather than
having to account for the factor you're multiplying it by in DSA.
DSA client code is correspondingly adjusted.
Future benefits:
- having the candidate generation in a separate function makes it
easy to reuse in alternative prime generation strategies
- the available constraints support applications such as Maurer's
algorithm for generating provable primes, or strong primes for RSA
in which both p-1 and p+1 have a large factor. So those become
things we could experiment with in future.
This still isn't the true random generator used in the live tools:
it's deterministic, for repeatable testing. The Python side of
testcrypt can now call random_make_prng(), which will instantiate a
PRNG with the given seed. random_clear() still gets rid of it.
So I can still have some tests control the precise random numbers
received by the function under test, but for others (especially key
generation, with its uncertainty about how much randomness it will
actually use) I can just say 'here, have a seed, generate as much
stuff from that seed as you need'.
Also spelled '-O text', this takes a public or private key as input,
and produces on standard output a dump of all the actual numbers
involved in the key: the exponent and modulus for RSA, the p,q,g,y
parameters for DSA, the affine x and y coordinates of the public
elliptic curve point for ECC keys, and all the extra bits and pieces
in the private keys too.
Partly I expect this to be useful to me for debugging: I've had to
paste key files a few too many times through base64 decoders and hex
dump tools, then manually decode SSH marshalling and paste the result
into the Python REPL to get an integer object. Now I should be able to
get _straight_ to text I can paste into Python.
But also, it's a way that other applications can use the key
generator: if you need to generate, say, an RSA key in some format I
don't support (I've recently heard of an XML-based one, for example),
then you can run 'puttygen -t rsa --dump' and have it print the
elements of a freshly generated keypair on standard output, and then
all you have to do is understand the output format.
A trawl through the code with -Wmissing-prototypes and
-Wmissing-variable-declarations turned up a lot of things that should
have been internal to a particular source file, but were accidentally
global. Keep the namespace clean by making them all static.
(Also, while I'm here, a couple of them were missing a 'const': the
ONE and ZERO arrays in sshcrcda.c, and EMPTY_WINDOW_TITLE in
terminal.c.)
A recent test-compile at high warning level points out that if you
define a macro with a ... at the end of the parameter list, then every
call should at least include the comma before the variadic part. That
is, if you #define MACRO(x,y,...) then you shouldn't call MACRO(1,2)
with no comma after the 2. But that's what I had done in one of my
definitions of FUNC0 in the fiddly testcrypt system.
In a similar vein, it's a mistake to use the preprocessor 'defined'
operator when it's expanded from another macro. Adjusted the setup of
BB_OK in mpint_i.h to avoid doing that.
(Neither of these has yet caused a problem in any real compile, but
best to fix them before they do.)
A user reports that Visual Studio 2013 and earlier have printf
implementations in their C library that don't support the 'z' modifier
to indicate that an integer argument is size_t. The 'I' modifier
apparently works in place of it.
To avoid littering ifdefs everywhere, I've invented my own inttypes.h
style macros to wrap size_t formatting directives, which are defined
to %zu and %zx normally, or %Iu and %Ix in old-VS mode. Those are in
defs.h, and they're used everywhere that a %z might otherwise get into
the Windows build.
I've added the gcc-style attribute("printf") to a lot of printf-shaped
functions in this code base that didn't have it. To make that easier,
I moved the wrapping macro into defs.h, and also enabled it if we
detect the __clang__ macro as well as __GNU__ (hence, it will be used
when building for Windows using clang-cl).
The result is that a great many format strings in the code are now
checked by the compiler, where they were previously not. This causes
build failures, which I'll fix in the next commit.
These are better than my previous approach of just assigning to
sb->len, because firstly they check by assertion that the new length
is within range, and secondly they preserve the invariant that the
byte stored in the buffer just after the length runs out is \0.
Switched to using the new functions everywhere a grep could turn up
opportunities.
They're not much use for 'real' key generation, since like all the
other randomness-using testcrypt functions, they need you to have
explicitly queued up some random data. But for generating keys for
test purposes, they have the great virtue that they deliver the key in
the internal format, where we can generate all the various public and
private blobs from it as well as the on-disk formats.
A minor change to one of the keygen functions itself: rsa_generate now
fills in the 'bits' and 'bytes' fields of the returned RSAKey, without
which it didn't actually work to try to generate a public blob from
it. (We'd never noticed before, because no previous client of
rsa_generate even tried that.)
It wasn't expanding the output strbuf to the full size of the key
modulus, so the output delivered to Python was only a part of the
mpint it should have been.
(Also, that was logically speaking a buffer overrun - we were writing
to the strbuf buffer beyond its length - although in practice I think
the _physical_ size of the buffer was large enough not to show it up
even under ASan. In any case, a buffer overrun only in the test suite,
and in a function I hadn't even got round to testing, is about the
best place to have one.)
While I'm here, I've also changed the way that the testcrypt wrapper
on rsa_ssh1_encrypt indicates failure: now we have the 'opt_'
mechanism, it can do that by returning None rather than "".
This commit switches as many ssh_hash_free / ssh_hash_new pairs as
possible to reuse the previous hash object via ssh_hash_reset. Also a
few other cleanups: use the wrapper function hash_simple() where
possible, and I've also introduced ssh_hash_digest_nondestructive()
and switched to that where possible as well.
It demonstrates a successful round trip from a source integer to
ciphertext and back, and also I've hardcoded the ciphertext I got from
the first attempt so that future changes to the code won't be able to
change it without me noticing.
The number of people has been steadily increasing who read our source
code with an editor that thinks tab stops are 4 spaces apart, as
opposed to the traditional tty-derived 8 that the PuTTY code expects.
So I've been wondering for ages about just fixing it, and switching to
a spaces-only policy throughout the code. And I recently found out
about 'git blame -w', which should make this change not too disruptive
for the purposes of source-control archaeology; so perhaps now is the
time.
While I'm at it, I've also taken the opportunity to remove all the
trailing spaces from source lines (on the basis that git dislikes
them, and is the only thing that seems to have a strong opinion one
way or the other).
Apologies to anyone downstream of this code who has complicated patch
sets to rebase past this change. I don't intend it to be needed again.
There are already three tediously similar functions that wrap a NULL
check around some existing function to return one or another kind of
pointer, and I'm about to want to add another one. Make a macro so
that it's easy to make more functions identical to these three.
The idea of these is that they centralise the common idiom along the
lines of
if (logical_array_len >= physical_array_size) {
physical_array_size = logical_array_len * 5 / 4 + 256;
array = sresize(array, physical_array_size, ElementType);
}
which happens at a zillion call sites throughout this code base, with
different random choices of the geometric factor and additive
constant, sometimes forgetting them completely, and generally doing a
lot of repeated work.
The new macro sgrowarray(array,size,n) has the semantics: here are the
array pointer and its physical size for you to modify, now please
ensure that the nth element exists, so I can write into it. And
sgrowarrayn(array,size,n,m) is the same except that it ensures that
the array has size at least n+m (so sgrowarray is just the special
case where m=1).
Now that this is a single centralised implementation that will be used
everywhere, I've also gone to more effort in the implementation, with
careful overflow checks that would have been painful to put at all the
previous call sites.
This commit also switches over every use of sresize(), apart from a
few where I really didn't think it would gain anything. A consequence
of that is that a lot of array-size variables have to have their types
changed to size_t, because the macros require that (they address-take
the size to pass to the underlying function).
The ssh_signkey vtable has grown a new method ssh_key_invalid(), which
checks whether the key is going to be usable for constructing a
signature at all. Currently the only way this can fail is if it's an
RSA key so short that there isn't room to put all the PKCS#1
formatting in the signature preimage integer, but the return value is
an arbitrary error message just in case more reasons are needed later.
This is tested separately rather than at key-creation time because of
the signature flags system: an RSA key of intermediate length could be
valid for SHA-1 signing but not for SHA-512. So really this method
should be called at the point where you've decided what sig flags you
want to use, and you're checking if _those flags_ are OK.
On the verification side, there's no need for a separate check. If
someone presents us with an RSA key so short that it's impossible to
encode a valid signature using it, then we simply regard all
signatures as invalid.
I just found I wanted to generate a prime with particular properties,
and I knew PuTTY's prime generator could manage it, so it was easier
to add this function to testcrypt for occasional manual use than to
look for another prime-generator with the same feature set!
I've wrapped the function so as to remove the three progress-
reporting parameters.
This completes the conversion begun in commit be5c0e635: now every
CBC-mode cipher has "cbc" in its name, and doesn't leave it implicit.
Hopefully this will never confuse me again!
Like the AES code before it, I've now exposed the explicit _sw and _hw
vtables for SHA-256 and SHA-1 through the testcrypt system, and now
cryptsuite will run the standard test vectors for those hashes over
both implementations, on a platform where more than one is available.
This tears out the entire previous random-pool system in sshrand.c. In
its place is a system pretty close to Ferguson and Schneier's
'Fortuna' generator, with the main difference being that I use SHA-256
instead of AES for the generation side of the system (rationale given
in comment).
The PRNG implementation lives in sshprng.c, and defines a self-
contained data type with no state stored outside the object, so you
can instantiate however many of them you like. The old sshrand.c still
exists, but in place of the previous random pool system, it's just
become a client of sshprng.c, whose job is to hold a single global
instance of the PRNG type, and manage its reference count, save file,
noise-collection timers and similar administrative business.
Advantages of this change include:
- Fortuna is designed with a more varied threat model in mind than my
old home-grown random pool. For example, after any request for
random numbers, it automatically re-seeds itself, so that if the
state of the PRNG should be leaked, it won't give enough
information to find out what past outputs _were_.
- The PRNG type can be instantiated with any hash function; the
instance used by the main tools is based on SHA-256, an improvement
on the old pool's use of SHA-1.
- The new PRNG only uses the completely standard interface to the
hash function API, instead of having to have privileged access to
the internal SHA-1 block transform function. This will make it
easier to revamp the hash code in general, and also it means that
hardware-accelerated versions of SHA-256 will automatically be used
for the PRNG as well as for everything else.
- The new PRNG can be _tested_! Because it has an actual (if not
quite explicit) specification for exactly what the output numbers
_ought_ to be derived from the hashes of, I can (and have) put
tests in cryptsuite that ensure the output really is being derived
in the way I think it is. The old pool could have been returning
any old nonsense and it would have been very hard to tell for sure.
This is in preparation for a PRNG revamp which will want to have a
well defined boundary for any given request-for-randomness, so that it
can destroy the evidence afterwards. So no more looping round calling
random_byte() and then stopping when we feel like it: now you say up
front how many random bytes you want, and call random_read() which
gives you that many in one go.
Most of the call sites that had to be fixed are fairly mechanical, and
quite a few ended up more concise afterwards. A few became more
cumbersome, such as mp_random_bits, in which the new API doesn't let
me load the random bytes directly into the target integer without
triggering undefined behaviour, so instead I have to allocate a
separate temporary buffer.
The _most_ interesting call site was in the PKCS#1 v1.5 padding code
in sshrsa.c (used in SSH-1), in which you need a stream of _nonzero_
random bytes. The previous code just looped on random_byte, retrying
if it got a zero. Now I'm doing a much more interesting thing with an
mpint, essentially scaling a binary fraction repeatedly to extract a
number in the range [0,255) and then adding 1 to it.
The single simplest request in the entire protocol - the command
'hello' which is supposed to respond 'hello, world\n' to demonstrate
to an interactive user that testcrypt has started up successfully -
was missing the trailing newline in the response. :-)
This allows me to compile testcrypt with -DDEBUG, even though it's not
linked against the usual collection of platform-specific modules that
normally provide dputs. I think the simplest possible dputs ('just
output to stderr') is actually better for testcrypt, because that
keeps it easy to compile for strange experimental platforms.
The aim of this reorganisation is to make it easier to test all the
ciphers in PuTTY in a uniform way. It was inconvenient that there were
two separate vtable systems for the ciphers used in SSH-1 and SSH-2
with different functionality.
Now there's only one type, called ssh_cipher. But really it's the old
ssh2_cipher, just renamed: I haven't made any changes to the API on
the SSH-2 side. Instead, I've removed ssh1_cipher completely, and
adapted the SSH-1 BPP to use the SSH-2 style API.
(The relevant differences are that ssh1_cipher encapsulated both the
sending and receiving directions in one object - so now ssh1bpp has to
make a separate cipher instance per direction - and that ssh1_cipher
automatically initialised the IV to all zeroes, which ssh1bpp now has
to do by hand.)
The previous ssh1_cipher vtable for single-DES has been removed
completely, because when converted into the new API it became
identical to the SSH-2 single-DES vtable; so now there's just one
vtable for DES-CBC which works in both protocols. The other two SSH-1
ciphers each had to stay separate, because 3DES is completely
different between SSH-1 and SSH-2 (three layers of CBC structure
versus one), and Blowfish varies in endianness and key length between
the two.
(Actually, while I'm here, I've only just noticed that the SSH-1
Blowfish cipher mis-describes itself in log messages as Blowfish-128.
In fact it passes the whole of the input key buffer, which has length
SSH1_SESSION_KEY_LENGTH == 32 bytes == 256 bits. So it's actually
Blowfish-256, and has been all along!)
All the things like des_encrypt_xdmauth and aes256_encrypt_pubkey are
at risk of changing their behaviour if I rewrite the underlying code,
so even if I don't have any externally verified test cases, I should
still have _something_ to keep me confident that they work the same
way today that they worked yesterday.
I remembered the existence of that module while I was changing the API
of the CRC functions. It's still quite possibly the only code in PuTTY
not written specifically _for_ PuTTY, so it definitely deserves a bit
of a test suite.
In order to expose it through the ptrlen-centric testcrypt system,
I've added some missing 'const' in the detector module itself, but
otherwise I've left the detector code as it was.
I found some that look pretty good - in particular exercising every
entry in every S-box. These will come in useful when I finish writing
a replacement for the venerable current DES implementation.
The new explicit vtables for the hardware and software implementations
are now exposed by name in the testcrypt protocol, and cryptsuite.py
runs all the AES tests separately on both.
(When hardware AES is compiled out, ssh2_cipher_new("aes128_hw") and
similar calls will return None, and cryptsuite.py will respond by
skipping those tests.)
The old names like ssh_aes128 and ssh_aes128_ctr reflect the SSH
protocol IDs, which is all very well, but I think a more important
principle is that it should be easy for me to remember which cipher
mode each one refers to. So I've renamed them so that they all end in
_cbc and _sdctr.
(I've left alone the string identifiers used by testcrypt, for the
moment. Perhaps I'll go back and change those later.)
No cipher construction function _currently_ returns NULL, but one's
about to start, so the testcrypt system will have to be able to cope.
This is the first time a function in the testcrypt API has had an
'opt' type as its return value rather than an argument. But it works
just the same in reverse: the wire protocol emits the special
identifer "NULL" when the optional return value is absent, and the
Python module catches that and rewrites it as Python 'None'.
The bulk of this commit is the changes necessary to make testcrypt
compile under Visual Studio. Unfortunately, I've had to remove my
fiddly clever uses of C99 variadic macros, because Visual Studio does
something unexpected when a variadic macro's expansion puts
__VA_ARGS__ in the argument list of a further macro invocation: the
commas don't separate further arguments. In other words, if you write
#define INNER(x,y,z) some expansion involving x, y and z
#define OUTER(...) INNER(__VA_ARGS__)
OUTER(1,2,3)
then gcc and clang will translate OUTER(1,2,3) into INNER(1,2,3) in
the obvious way, and the inner macro will be expanded with x=1, y=2
and z=3. But try this in Visual Studio, and you'll get the macro
parameter x expanding to the entire string 1,2,3 and the other two
empty (with warnings complaining that INNER didn't get the number of
arguments it expected).
It's hard to cite chapter and verse of the standard to say which of
those is _definitely_ right, though my reading leans towards the
gcc/clang behaviour. But I do know I can't depend on it in code that
has to compile under both!
So I've removed the system that allowed me to declare everything in
testcrypt.h as FUNC(ret,fn,arg,arg,arg), and now I have to use a
different macro for each arity (FUNC0, FUNC1, FUNC2 etc). Also, the
WRAPPED_NAME system is gone (because that too depended on the use of a
comma to shift macro arguments along by one), and now I put a custom C
wrapper around a function by simply re-#defining that function's own
name (and therefore the subsequent code has to be a little more
careful to _not_ pass functions' names between several macros before
stringifying them).
That's all a bit tedious, and commits me to a small amount of ongoing
annoyance because now I'll have to add an explicit argument count
every time I add something to testcrypt.h. But then again, perhaps it
will make the code less incomprehensible to someone trying to
understand it!
Every compiler that's so far seen testcrypt.c has tolerated me writing
'typedef enum ValueType ValueType' before actually saying what 'enum
ValueType' is, but one just pointed out that it's not actually legal
standard C to do that. Moved the typedef to after the enum.
The type names 'val_foo' used in testcrypt.h work on a system where a
further underscore suffix is treated as a qualifier indicating
something about how the C version of the API represents that type.
(For example, plain 'val_string' means a strbuf, but if I write
'val_string_asciz' or 'val_string_ptrlen' in testcrypt.h it will cause
the wrapper for that function to pass a char * or a ptrlen derived
from that strbuf.)
But I forgot about this when I named the type val_ssh2_cipher (and
ditto ssh1), with the effect that the testcrypt system has considered
them all along to really be called 'ssh2' and 'ssh1', and the 'cipher'
to be some irrelevant API-adaptor suffix.
This hasn't caused a bug because I didn't have any other type called
val_ssh2_something. But it was a latent one. Now renamed sensibly!
This enables me to control where testcrypt both reads its input and
writes its output. That in turn makes it convenient to run testcrypt
itself in a separate Unix terminal window from its client Python, by
making two named pipe files (say, 'i' and 'o'), running the client
with PUTTY_TESTCRYPT="cat o & cat > i" in its environment, and in
another window, running 'testcrypt -o o i'.
And that in turn makes it easy to attach gdb to testcrypt, so I can
easily debug its handling of whatever request the client sent.
This allows me to remove another diagnostic main() that I just found
lurking at the bottom of sshdes.c, which was there to allow manual
untangling of XDM-AUTHORIZATION-1 strings when debugging X forwarding.
Now you can ask the same kind of question at the interactive Python
prompt, without having to manually compile anything. For example, the
query you might previously have asked by building the sshdes test
program and running
$ ./sshdes 090a0b0c0d0e0f10 0123456789abcd
decrypt(090a0b0c0d0e0f10,0123456789abcd) = ab53fd65ae7f4ec3
encrypt(090a0b0c0d0e0f10,0123456789abcd) = 7065d20441f5abe3
you can now run using the standard testcrypt (bearing in mind that the
actual library function takes the key argument first):
$ python -i test/testcrypt.py
>>> from binascii import hexlify as H, unhexlify as U
>>> H(des_decrypt_xdmauth(U('0123456789abcd'),U('090a0b0c0d0e0f10')))
'ab53fd65ae7f4ec3'
>>> H(des_encrypt_xdmauth(U('0123456789abcd'),U('090a0b0c0d0e0f10')))
'7065d20441f5abe3'
This is the commit that f3295e0fb _should_ have been. Yesterday I just
added some typedefs so that I didn't have to wear out my fingers
typing 'struct' in new code, but what I ought to have done is to move
all the typedefs into defs.h with the rest, and then go through
cleaning up the legacy 'struct's all through the existing code.
But I was mostly trying to concentrate on getting the test suite
finished, so I just did the minimum. Now it's time to come back and do
it better.
I've written a new standalone test program which incorporates all of
PuTTY's crypto code, including the mp_int and low-level elliptic curve
layers but also going all the way up to the implementations of the
MAC, hash, cipher, public key and kex abstractions.
The test program itself, 'testcrypt', speaks a simple line-oriented
protocol on standard I/O in which you write the name of a function
call followed by some inputs, and it gives you back a list of outputs
preceded by a line telling you how many there are. Dynamically
allocated objects are assigned string ids in the protocol, and there's
a 'free' function that tells testcrypt when it can dispose of one.
It's possible to speak that protocol by hand, but cumbersome. I've
also provided a Python module that wraps it, by running testcrypt as a
persistent subprocess and gatewaying all the function calls into
things that look reasonably natural to call from Python. The Python
module and testcrypt.c both read a carefully formatted header file
testcrypt.h which contains the name and signature of every exported
function, so it costs minimal effort to expose a given function
through this test API. In a few cases it's necessary to write a
wrapper in testcrypt.c that makes the function look more friendly, but
mostly you don't even need that. (Though that is one of the
motivations between a lot of API cleanups I've done recently!)
I considered doing Python integration in the more obvious way, by
linking parts of the PuTTY code directly into a native-code .so Python
module. I decided against it because this way is more flexible: I can
run the testcrypt program on its own, or compile it in a way that
Python wouldn't play nicely with (I bet compiling just that .so with
Leak Sanitiser wouldn't do what you wanted when Python loaded it!), or
attach a debugger to it. I can even recompile testcrypt for a
different CPU architecture (32- vs 64-bit, or even running it on a
different machine over ssh or under emulation) and still layer the
nice API on top of that via the local Python interpreter. All I need
is a bidirectional data channel.