The IV-incrementing code seems to have had several bugs. It was not
propagating a carry from bit 31 to 32 of the 128-bit integer,
apparently because of having arranged the 32-bit words wrongly within
the vector register; also, when it tried to implement carry
propagation from bit 63 to 64 by checking the low 64 bits for zero, it
checked the _high_ bits instead, leading to a spurious extra addition
in the low half if the high half happened to be zero.
This must surely have been able to cause mysterious decryption
failures about once every 2^32 cipher blocks = 64Gb of data
transferred. I suppose that must be a large enough number that either
no users of the snapshot builds have encountered the problem, or the
ones who did dismissed it as computers being randomly flaky.
The revised version now passes all the same tests as the software
implementation.
This tests the CBC and SDCTR modes, in all key lengths, and in
particular includes a set of SDCTR tests designed to test the
procedure for incrementing the IV as a single 128-bit integer, by
checking propagation of the carry between every pair of words.
This makes the test suite pass if I compile with -DBIGNUM_OVERRIDE=4
to fall back to 16-bit BignumInt. In that mode, BignumInt is smaller
than 'int', which means default promotion keeps causing things to get
promoted to 'int' unexpectedly, so I had to add some casts back down.
This makes it easy to re-test the mpint functions using different word
sizes and smoke out any more confusions between integer types in
mpint.c, by recompiling with -DBIGNUM_OVERRIDE=4 or =5 or =6 (for 16-,
32- or 64-bit respectively).
BIGNUM_OVERRIDE only lets you force the size downwards, not upwards:
of course the default behaviour is to use the largest BignumInt the
ifdefs can find a way to, and they can't magic up a bigger one just
because you tell them you'd like one.
At some point in mpint.c development I switched the main macro defined
by the ifdefs from BIGNUM_INT_BITS to the new BIGNUM_INT_BITS_BITS, so
I could loop from 0 to the latter in safe bit-shift loops that test
each bit of a shift count. But I forgot to change the comment
accordingly.
This should have been done months ago in commit bf0cf984c, but I've
been indecisive about whether to keep my local dev builds in the
windows subdirectory itself or one level further down...
This enables me to control where testcrypt both reads its input and
writes its output. That in turn makes it convenient to run testcrypt
itself in a separate Unix terminal window from its client Python, by
making two named pipe files (say, 'i' and 'o'), running the client
with PUTTY_TESTCRYPT="cat o & cat > i" in its environment, and in
another window, running 'testcrypt -o o i'.
And that in turn makes it easy to attach gdb to testcrypt, so I can
easily debug its handling of whatever request the client sent.
I got the maximum shift count _completely_ wrong when trying to work
out whether each word should be compared against part of the input
uintmax_t: I measured it in bytes rather than bits _and_ applied it to
the wrong type. Ahem.
A major advantage of the new testcrypt system _not_ being written as a
native-code Python module in the usual way is that it makes it very
easy to recompile testcrypt in a non-default way, such as with -m32,
and still run the same tests via the same Python module.
But I hadn't actually _done_ that until now, and now that I do, the
test suite has picked up a couple of bugs. When computing the initial
reciprocal approximation in mp_divmod_into, I did a lot of work on
explicit uint64_t, but did it in a way that used BIGNUM_INT_BITS as
the number's bit size instead of the constant 64, and cast several
things absentmindedly to BignumInt. And because I'd only tested on a
platform where those are the same type anyway, I didn't spot it.
When I was originally designing my knockoff of Stein's algorithm, I
simplified it for my own understanding by replacing the step that
turns a into (a-b)/2 with a step that simply turned it into a-b, on
the basis that the next step would do the division by 2 in any case.
This made it easier to get my head round in the first place, and in
the initial Python prototype of the algorithm, it looked more sensible
to have two different kinds of simple step rather than one simple and
one complicated.
But actually, when it's rewritten under the constraints of time
invariance, the standard way is better, because we had to do the
computation for both kinds of step _anyway_, and this way we sometimes
make both of them useful at once instead of only ever using one.
So I've put it back to the more standard version of Stein, which is a
big improvement, because now we can run in at most 2n iterations
instead of 3n _and_ the code implementing each step is simpler. A
quick timing test suggests that modular inversion is now faster by a
factor of about 1.75.
Also, since I went to the effort of thinking up and commenting a pair
of worst-case inputs for the iteration count of Stein's algorithm, it
seems like an omission not to have made sure they were in the test
suite! Added extra tests that include 2^128-1 as a modulus and 2^127
as a value to invert.
I broke it last year in commit 4988fd410, when I made hash contexts
expose a BinarySink interface. I went round finding no end of long-
winded ways of pushing things into hash contexts, often reimplementing
some standard thing like the wire formatting of an mpint, and rewrote
them more concisely using one or two put_foo calls.
But I failed to notice that the hash preimage used in SSH-1 key
fingerprints is _not_ implementable by put_ssh1_mpint! It consists of
the two public-key integers encoded in multi-byte binary big-endian
form, but without any preceding length field at all. I must have
looked too hastily, 'recognised' it as just implementing an mpint
formatter yet again, and replaced it with put_ssh1_mpint. So SSH-1 key
fingerprints have been completely wrong in the snapshots for months.
Fixed now, and this time, added a comment to warn me in case I get the
urge to simplify the code again, and a regression test in cryptsuite.
I've moved the static method nbits up into a top-level function, so I
can use it to implement Python marshalling functions for SSH mpints.
I'm about to need one of these, and the other will surely come in
useful as well sooner or later.
This allows me to remove another diagnostic main() that I just found
lurking at the bottom of sshdes.c, which was there to allow manual
untangling of XDM-AUTHORIZATION-1 strings when debugging X forwarding.
Now you can ask the same kind of question at the interactive Python
prompt, without having to manually compile anything. For example, the
query you might previously have asked by building the sshdes test
program and running
$ ./sshdes 090a0b0c0d0e0f10 0123456789abcd
decrypt(090a0b0c0d0e0f10,0123456789abcd) = ab53fd65ae7f4ec3
encrypt(090a0b0c0d0e0f10,0123456789abcd) = 7065d20441f5abe3
you can now run using the standard testcrypt (bearing in mind that the
actual library function takes the key argument first):
$ python -i test/testcrypt.py
>>> from binascii import hexlify as H, unhexlify as U
>>> H(des_decrypt_xdmauth(U('0123456789abcd'),U('090a0b0c0d0e0f10')))
'ab53fd65ae7f4ec3'
>>> H(des_encrypt_xdmauth(U('0123456789abcd'),U('090a0b0c0d0e0f10')))
'7065d20441f5abe3'
This supersedes the '#ifdef TEST' main programs in sshsh256.c and
sshsh512.c. Now there's no need to build those test programs manually
on the rare occasion of modifying the hash implementations; instead
testcrypt is built every night and will run these test vectors.
RFC 6234 has some test vectors for HMAC-SHA-* as well, so I've
included the ones applicable to this implementation.
I noticed a few of these in the course of preparing the previous
commit. I must have been writing that idiom out by hand for _ages_
before it became totally habitual to #define it as 'lenof' in every
codebase I touch. Now I've gone through and replaced all the old
verbosity with nice terse lenofs.
This is the commit that f3295e0fb _should_ have been. Yesterday I just
added some typedefs so that I didn't have to wear out my fingers
typing 'struct' in new code, but what I ought to have done is to move
all the typedefs into defs.h with the rest, and then go through
cleaning up the legacy 'struct's all through the existing code.
But I was mostly trying to concentrate on getting the test suite
finished, so I just did the minimum. Now it's time to come back and do
it better.
PLATFORM_HAS_SMEMCLR from winstuff.h was available to misc.c via
putty.h, but the point of utils.c is not to pull in that stuff.
This is a quick bodge to unbreak the Windows build. It needs a better
answer for optionally overriding the platform-independent smemclr()
with a platform-specific implementation.
I just found a file lying around in a different source directory that
contained a test case I'd had trouble with last week, so now I've
recovered it, it ought to go in the test suite as a regression test.
After I moved parts of misc.c into utils.c, we started getting two
versions of smemclr in the Windows builds, because utils.c didn't know
to omit its one, having not included the main putty.h.
But it was deliberate that utils.c didn't include putty.h, because I
wanted it (along with the rest of testcrypt in particular) to be
portable to unusual platforms without having to port the whole of the
code base.
So I've moved into the ubiquitous defs.h just the one decision about
whether we're on a platform that will supersede utils.c's definition
of smemclr.
(Also, in the process of moving it, I've removed the clause that
disabled the Windows smemclr in winelib mode, because it looks as if
the claim that winelib doesn't have SecureZeroMemory is now out of
date.)
This is a reasonably comprehensive test that exercises basically all
the functions I rewrote at the end of last year, and it's how I found
a lot of the bugs in them that I fixed earlier today.
It's written in Python, using the unittest framework, which is
convenient because that way I can cross-check Python's own large
integers against PuTTY's.
While I'm here, I've also added a few tests of higher-level crypto
primitives such as Ed25519, AES and HMAC, when I could find official
test vectors for them. I hope to add to that collection at some point,
and also add unit tests of some of the other primitives like ECDH and
RSA KEX.
The test suite is run automatically by my top-level build script, so
that I won't be able to accidentally ship anything which regresses it.
When it's run at build time, the testcrypt binary is built using both
Address and Leak Sanitiser, so anything they don't like will also
cause a test failure.
The new testcrypt system made it easy to write a tiny Python program
that does a lot of multiplications of various large sizes, run it
against versions of the testcrypt binary built with lots of different
threshold settings, and time the output by running the Python program
with PUTTY_TESTCRYPT="command time -f %U ./testcrypt".
When I tried that I found that lots of values in the 20-30 range
looked about as good as each other. 24 was an unusually low dip which
could well have just been a random outlier, but it's a nice round
number so I picked it anyway.
I've written a new standalone test program which incorporates all of
PuTTY's crypto code, including the mp_int and low-level elliptic curve
layers but also going all the way up to the implementations of the
MAC, hash, cipher, public key and kex abstractions.
The test program itself, 'testcrypt', speaks a simple line-oriented
protocol on standard I/O in which you write the name of a function
call followed by some inputs, and it gives you back a list of outputs
preceded by a line telling you how many there are. Dynamically
allocated objects are assigned string ids in the protocol, and there's
a 'free' function that tells testcrypt when it can dispose of one.
It's possible to speak that protocol by hand, but cumbersome. I've
also provided a Python module that wraps it, by running testcrypt as a
persistent subprocess and gatewaying all the function calls into
things that look reasonably natural to call from Python. The Python
module and testcrypt.c both read a carefully formatted header file
testcrypt.h which contains the name and signature of every exported
function, so it costs minimal effort to expose a given function
through this test API. In a few cases it's necessary to write a
wrapper in testcrypt.c that makes the function look more friendly, but
mostly you don't even need that. (Though that is one of the
motivations between a lot of API cleanups I've done recently!)
I considered doing Python integration in the more obvious way, by
linking parts of the PuTTY code directly into a native-code .so Python
module. I decided against it because this way is more flexible: I can
run the testcrypt program on its own, or compile it in a way that
Python wouldn't play nicely with (I bet compiling just that .so with
Leak Sanitiser wouldn't do what you wanted when Python loaded it!), or
attach a debugger to it. I can even recompile testcrypt for a
different CPU architecture (32- vs 64-bit, or even running it on a
different machine over ssh or under emulation) and still layer the
nice API on top of that via the local Python interpreter. All I need
is a bidirectional data channel.
The __truediv__ pair makes the whole program work in Python 3 as well
as 2 (it was _so_ nearly there already!), and __int__ lets you easily
turn a ModP back into an ordinary Python integer representing its
least positive residue.
I wrote it for the sake of a test-system design I had in mind at the
time, but that design changed after I committed, and now I think
_even_ my upcoming test application won't need to copy MontyContexts.
So I'll remove the function now, so as not to have to pointlessly
write tests for it :-)
Previously, lots of individual ssh2_cipheralg structures were declared
static, and only available to the rest of the code via a smaller
number of 'ssh2_ciphers' objects that wrapped them into lists. But I'm
going to want to access individual ciphers directly in the testing
system I'm currently working on, so I'm giving all those objects
external linkage and declaring them in ssh.h.
Also, I've made up an entirely new one, namely exposing MD5 as an
instance of the general ssh_hashalg abstraction, which it has no need
to be for the purposes of actually using it in SSH. But, again, this
will let me treat it the same as all the other hashes in the test
system.
No functional change, for the moment.
I'm getting tired of typing 'struct Foo' everywhere when I could just
type 'Foo', so here's a bunch of extra typedefs that allow me to leave
off the 'struct' in various places.
I got it right in all the serious code (or else my Curve25519 key
exchange wouldn't have worked), but I wrote it down wrongly in the
comment in ecc.h, putting the coefficient b on the RHS x term rather
than the LHS y^2. Then I repeated the same error in the point
decompression function in eccref.py.
It was computing the RHS of the curve equation affinely, without
taking account of the point's Z coordinate. In other words, it would
work OK for a point you'd _only just_ imported into ecc.c which was
still represented with a denominator of 1, but it would give the wrong
answer for points coming out of computation after that.
I've moved the simple version into ecc_weierstrass_point_new_from_x,
since the only reason it was in a separate function at all was so it
could be reused by point_valid, which I now realise it can't.
If it had to negate x-y to make it positive for mp_mod, but the answer
comes out as zero after that, then after re-negating it this is the
one case where we _shouldn't_ add the modulus afterwards. Result was
that, for example, mp_modsub(0, 0, 5) would return 5 instead of the
obvious 0.
It correctly masked off bits in the partial word, but then left all
higher words _unchanged_ rather than zeroing them.
Apparently its use in mp_invert_mod_2to was in restricted enough
circumstances not to cause a failure there!
My test for whether x has a square root was based on testing whether a
large power of x was congruent to 1 mod p, which is a fine test
provided x is in the multiplicative group of p, but would give a false
negative on the one possible input value that _isn't_ - namely zero.
The actual number returned from the function is fine (because that too
is a large power of the input, and when the input is 0 that's
foolproof). So I just needed to add a special case for the returned
'success' flag.
I had not considered what would happen if the input and output mp_ints
aliased each other. What happens is that because I wrote the output
starting from the low end, input words would get corrupted before I'd
finished with them. Easily fixed by reversing the order of the loop.
ssh_rsakex_encrypt took an input (pointer, length) pair, which I've
replaced with a ptrlen; it also took an _output_ (pointer, length)
pair, and then re-computed the right length internally and enforced by
assertion that the one passed in matched it. Now it just returns a
strbuf of whatever length it computed, which saves the caller having
to compute the length at all.
Also, both ssh_rsakex_encrypt and ssh_rsakex_decrypt took their
arguments in a weird order; I think it looks more sensible to put the
RSA key first rather than last, so now they both have the common order
(key, hash, input data).
The abstract method ssh_key_sign(), and the concrete functions
ssh_rsakex_newkey() and rsa_ssh1_public_blob_len(), now each take a
ptrlen argument in place of a separate pointer and length pair.
Partly that's because I'm generally preferring ptrlens these days and
it keeps argument lists short and tidy-looking, but mostly it's
because it will make those functions easier to wrap in my upcoming
test system.
This makes the API more flexible, so that it's not restricted to
taking a key of precisely the length specified in the ssh2_macalg
structure. Instead, ssh2bpp looks up that length to construct the
MAC's key.
Some MACs (e.g. Poly1305) will only _work_ with a single key length.
But this way, I can run standard test vectors against MACs that can
take a variable length (e.g. everything in the HMAC family).
Just like put_data(), but takes a ptrlen rather than separate ptr and
len arguments, so it saves a bit of repetition at call sites. I
probably should have written this ages ago, but better late than
never; I've also converted every call site I can find that needed it.
When we're running with that cipher/MAC combination, what ssh2bpp.c
sees as separate cipher and MAC objects are actually just different
interfaces of the same object: the Poly1305 constructor function just
returns an alternate view of the cipher object it's passed, and the
free function does nothing.
But Address Sanitiser points out that if we first call
ssh2_cipher_free and then ssh2_mac_free, then although poly1305_free
would do nothing once we reached it, we can't actually reach it
without looking up the vtable in the object we've just thrown away.
Easily fixed by reversing the order of mac_free and cipher_free in the
BPP. Also I've centralised that freeing so that it doesn't have to be
repeated with the same careful ordering in the BPP cleanup function
and the functions that replace a direction of crypto.
misc.c has always contained a combination of things that are tied
tightly into the PuTTY code base (e.g. they use the conf system, or
work with our sockets abstraction) and things that are pure standalone
utility functions like nullstrcmp() which could quite happily be
dropped into any C program without causing a link failure.
Now the latter kind of standalone utility code lives in the new source
file utils.c, whose only external dependency is on memory.c (for snew,
sfree etc), which in turn requires the user to provide an
out_of_memory() function. So it should now be much easier to link test
programs that use PuTTY's low-level functions without also pulling in
half its bulky infrastructure.
In the process, I came across a memory allocation logging system
enabled by -DMALLOC_LOG that looks long since bit-rotted; in any case
we have much more advanced tools for that kind of thing these days,
like valgrind and Leak Sanitiser, so I've just removed it rather than
trying to transplant it somewhere sensible. (We can always pull it
back out of the version control history if really necessary, but I
haven't used it in at least a decade.)
The other slightly silly thing I did was to give bufchain a function
pointer field that points to queue_idempotent_callback(), and disallow
direct setting of the 'ic' field in favour of calling
bufchain_set_callback which will fill that pointer in too. That allows
the bufchain system to live in utils.c rather than misc.c, so that
programs can use it without also having to link in the callback system
or provide an annoying stub of that function. In fact that's just
allowed me to remove stubs of that kind from PuTTYgen and Pageant!
Taking a leaf out of the LLVM code base: this macro still includes an
assert(false) so that the message will show up in a typical build, but
it follows it up with a call to a function explicitly marked as no-
return.
So this ought to do a better job of convincing compilers that once a
code path hits this function it _really doesn't_ have to still faff
about with making up a bogus return value or filling in a variable
that 'might be used uninitialised' in the following code that won't be
reached anyway.
I've gone through the existing code looking for the assert(false) /
assert(0) idiom and replaced all the ones I found with the new macro,
which also meant I could remove a few pointless return statements and
variable initialisations that I'd already had to put in to placate
compiler front ends.
This removes the one remaining failure at -Wvla. (Of course, that
array isn't for a _hash_ function, so it wouldn't have been quite
appropriate to make it a static array of size MAX_HASH_LEN.)
Although C99 introduces variable length array (VLA) feature, it is
not supported by MS Visual Studio (being C++ compiler first of all).
In this commit, we add Clang and GCC options to disable VLA, so they
would be disabled in non-MSVC builds as well.