Previously, I checked by assertion that the base was less than the
modulus. There were two things wrong with this policy. Firstly, it's
perfectly _meaningful_ to want to raise a large number to a power mod
a smaller number, even if it doesn't come up often in cryptography;
secondly, I didn't do it right, because the check was based on the
formal sizes (nw fields) of the mp_ints, which meant that it was
possible to have a failure of the assertion even in the case where the
numerical value of the base _was_ less than the modulus.
In particular, this could come up in Diffie-Hellman with a fixed
group, because the fixed group modulus was decoded from an MP_LITERAL
in sshdh.c which gave a minimal value of nw, but the base was the
public value sent by the other end of the connection, which would
sometimes be sent with the leading zero byte required by the SSH-2
mpint encoding, and would cause a value of nw one larger, failing the
assertion.
Fixed by simply using mp_modmul in monty_import, replacing the
previous clever-but-restricted strategy that I wrote when I thought I
could get away without having to write a general division-based
modular reduction at all.
This completes the conversion begun in commit be5c0e635: now every
CBC-mode cipher has "cbc" in its name, and doesn't leave it implicit.
Hopefully this will never confuse me again!
In commit dfdb73e10 I accidentally renamed them from "aes128-cbc" to
"aes128" (and ditto for the other two key lengths), probably because
of the confusing names of the C-level identifiers for those vtables.
Now restored to the versions actually described in RFC 4253.
If something goes wrong part way through one of the new-key functions,
we immediately call the corresponding freekey function before
returning failure. That will test ek->privateKey for NULL, and if it's
not NULL, try to free it - so we should be sure it _is_ NULL if we
haven't put a private key in it.
Mainly this change affects the whole {GET,PUT}_??BIT_?SB_FIRST family,
which has always been a horrible set of macros for massive multiple-
expansion of its arguments. Now we're allowed to use C99 in this code
base, I can finally turn them into nice clean inline functions. As
bonus they now take their pointer argument as a void * (const-
qualified as appropriate) which means the call site doesn't have to
worry about exactly which flavour of pointer it's passing.
(That change also affects the GET_*_X11 macros in x11fwd.c, since I
was just reminded of their existence too!)
I've also converted NULLTOEMPTY, which was sitting right next to the
GET/PUT macros in misc.h and it seemed a shame to leave it out.
Those were a reasonable abbreviation when the code almost never had to
deal with little-endian numbers, but they've crept into enough places
now (e.g. the ECC formatting) that I think I'd now prefer that every
use of the integer read/write macros was clearly marked with its
endianness.
So all uses of GET_??BIT and PUT_??BIT are now qualified. The special
versions in x11fwd.c, which used variable endianness because so does
the X11 protocol, are suffixed _X11 to make that clear, and where that
pushed line lengths over 80 characters I've taken the opportunity to
name a local variable to remind me of what that extra parameter
actually does.
Pavel says there was no specific reason they avoided it, and compiling
with Debian Jessie's GCC 4.9.2 produces a binary that I've no reason
to believe won't work, although I haven't tested it on a real or
emulated CPU that supports the instructions.
The backend_unthrottle function gets called when the backlog on stdout
clears, and it's possible for that to happen _after_ the SSH backend
has terminated the connection and freed all its protocol modules (e.g.
if a protocol error occurred on the network while data was still
waiting to be written to stdout). So ssh_unthrottle should check that
ssh->cl still exists before calling any method of it.
If there was still pending output data on a NetSocket's output_data
bufchain when it was closed, then we wouldn't have freed it, on either
Unix or Windows.
uxnet.c's method for passing socket errors on to the Plug involves
setting up a toplevel callback using the NetSocket itself as the
context. Therefore, it should call delete_callbacks_for_context when
it destroys a NetSocket. For example, if _two_ socket errors manage to
occur, and the first one causes the socket to be closed, you need the
second callback to not happen, or it'll dereference the freed pointer.
The variable 'strbuf *realhost' was only initialised in the branch of
the ifdefs where IPV6 is enabled, so at NO_IPV6, it's used
uninitialised, which in my usual build configuration means a compile
failure.
We strbuf_free(ssh2blob), but forgot to null the pointer out
afterwards, which means a subsequent check for NULL believes there
wasn't a problem, and nulls out the error message pointer instead!
Some functions got confused if given one as input (particularly
mp_get_decimal, which assumed it could safely write at least one word
into the inv5 value it makes internally), and I've decided it's easier
to stop them ever being created than to teach everything to handle
them correctly. So now mp_make_sized enforces nw != 0 by assertion,
and I've added a max at any call site that looked as if it might
violate that precondition.
mp_from_hex("") could generate one of these, in particular, so now
I've fixed it, I've added a test to make sure it continues doing
something sensible.
The mp_cond_swap that sorts the key's factors into p>q order only
works if the mp_int representations of p and q have the same nw. It's
unusual but by no means illegal for an RSA key to be the product of
wildly different-length primes, so we should cope. Now we sort p and q
by using mp_min and mp_max.
In the clang-cl makefile, we run the .rc file through the preprocessor
and the actual resource compiler in two separate steps. But all the
include-file dependencies were being put on the _latter_ step, so
editing a .rc2 file didn't trigger a rebuild of the resource file.
Since the rewrite of hardware SHA support in cbbd464fd7, we've been
attempting to build with SHA-NI support on x86 with some GCC 4.x,
including Ubuntu 14.04's 4.8.x, whereas before we only tried it with
GCC 5.x and above. Revert to that.
(I think that GCC has had some support for this extension since 4.9.0 --
the "sha" attribute went in in upstream commit fc975a4090 -- and it
at least compiles with 4.9.2, but I'm assuming Pavel had a good reason
for sticking to 5+ in 5a38b293bd.)
When I reworked the support for rekeying after a certain amount of
data had been sent, I forgot the part where configuring the max data
limit to zero means 'never rekey due to data transfer volume'. So I
was incautiously checking the 'running' flag in the new
DataTransferStats to find out whether we needed to rekey, forgetting
that sometimes running=false means the transfer limit has expired, and
sometimes it means there never was one in the first place.
To fix this, I've got rid of the boolean return value from DTS_CONSUME
and turned it into an 'expired' flag in DataTransferStats, separate
from the 'running' flag. Now everything consistently checks 'expired'
to find out whether to rekey, and there's a new reset function that
reliably clears 'expired' but sets 'running' depending on whether the
size is nonzero.
(Also, while I'm at it, I've turned the DTS_CONSUME macro into an
inline function, because that's becoming my general preference now
that C99 is allowed in this code base.)
I think Pavel is right to have turned off -DDEBUG in the MinGW build
on general principles - it should never be the default option for any
build platform - but also, it was not intentional that sshdes.c
produces its hugely detailed diagnostics merely because you compile
with the very generic -DDEBUG. So now you have to say
-DDES_DIAGNOSTICS too if you really want sshdes.c's gory detail.
In the new testSSHCiphers function, I forgot to put in the check for
None that I put in all the other functions that try to explicitly
instantiate hardware-accelerated AES.
In the bitsliced implementation, the addition of 0x63 as the last
operation inside the S-box actually costs cycles during encryption -
four bitslice inversions - which can be easily eliminated by detaching
the constant, moving it forward past the ShiftRows and MixColumns
(with both of which it commutes) until it's adjacent to the next round
key addition, and then folding it into the round key during key setup.
I had this idea while I was originally writing this implementation,
but deferred actually doing it because it made all the intermediate
results harder to check against the standard test vectors. Now the
code is working and stable, this is the moment to come back and do it.
Since I apparently can't reliably keep this script working on both
flavours of Python, I think these days I'd rather it broke on 2 than
on 3 due to my inattention. So let's default to 3.
I've only just found out that it has the effect of treating the argv
words not as plain filenames, but as arguments to Perl default 'open',
i.e. if they end in | then the text before that is treated as a
command. That's not what was intended in any of these contexts!
Fortunately, in this project it only comes up in non-critical
'contrib' scripts.
Like the AES code before it, I've now exposed the explicit _sw and _hw
vtables for SHA-256 and SHA-1 through the testcrypt system, and now
cryptsuite will run the standard test vectors for those hashes over
both implementations, on a platform where more than one is available.
Similarly to the 'AES (unaccelerated)' naming scheme I added in the
AES rewrite, the hash functions that have multiple implementations now
each come with an annotation saying which one they are.
This was more tricky for hashes than for ciphers, because the
annotation for a hash has to be a separate string literal from the
base text name, so that it can propagate into the name field for each
HMAC wrapper without looking silly.
Similarly to my recent addition of NEON-accelerated AES, these new
implementations drop in alongside the SHA-NI ones, under a different
set of ifdefs. All the details of selection and detection are
essentially the same as they were for the AES code.
The new structure of those modules is along similar lines to the
recent rewrite of AES, with selection of HW vs SW implementation being
done by the main vtable instead of a subsidiary function pointer
within it, freedom for each implementation to define its state
structure however is most convenient, and space to drop in other
hardware-accelerated implementations.
I've removed the centralised test for compiler SHA-NI support in
ssh.h, and instead duplicated it between the two SHA modules, on the
grounds that once you start considering an open-ended set of hardware
accelerators, the two hashes _need_ not go together.
I've also added an extra test in cryptsuite that checks the point at
which the end-of-hash padding switches to adding an extra cipher
block. That was just because I was rewriting that padding code, was
briefly worried that I might have got an off-by-one error in that part
of it, and couldn't see any existing test that gave me confidence I
hadn't.
This tears out the entire previous random-pool system in sshrand.c. In
its place is a system pretty close to Ferguson and Schneier's
'Fortuna' generator, with the main difference being that I use SHA-256
instead of AES for the generation side of the system (rationale given
in comment).
The PRNG implementation lives in sshprng.c, and defines a self-
contained data type with no state stored outside the object, so you
can instantiate however many of them you like. The old sshrand.c still
exists, but in place of the previous random pool system, it's just
become a client of sshprng.c, whose job is to hold a single global
instance of the PRNG type, and manage its reference count, save file,
noise-collection timers and similar administrative business.
Advantages of this change include:
- Fortuna is designed with a more varied threat model in mind than my
old home-grown random pool. For example, after any request for
random numbers, it automatically re-seeds itself, so that if the
state of the PRNG should be leaked, it won't give enough
information to find out what past outputs _were_.
- The PRNG type can be instantiated with any hash function; the
instance used by the main tools is based on SHA-256, an improvement
on the old pool's use of SHA-1.
- The new PRNG only uses the completely standard interface to the
hash function API, instead of having to have privileged access to
the internal SHA-1 block transform function. This will make it
easier to revamp the hash code in general, and also it means that
hardware-accelerated versions of SHA-256 will automatically be used
for the PRNG as well as for everything else.
- The new PRNG can be _tested_! Because it has an actual (if not
quite explicit) specification for exactly what the output numbers
_ought_ to be derived from the hashes of, I can (and have) put
tests in cryptsuite that ensure the output really is being derived
in the way I think it is. The old pool could have been returning
any old nonsense and it would have been very hard to tell for sure.
The upcoming PRNG revamp will want to tell noise sources apart, so
that it can treat them all fairly. So I've added an extra parameter to
noise_ultralight and random_add_noise, which takes values in an
enumeration covering all the vague classes of entropy source I'm
collecting. In this commit, though, it's simply ignored.
This is in preparation for a PRNG revamp which will want to have a
well defined boundary for any given request-for-randomness, so that it
can destroy the evidence afterwards. So no more looping round calling
random_byte() and then stopping when we feel like it: now you say up
front how many random bytes you want, and call random_read() which
gives you that many in one go.
Most of the call sites that had to be fixed are fairly mechanical, and
quite a few ended up more concise afterwards. A few became more
cumbersome, such as mp_random_bits, in which the new API doesn't let
me load the random bytes directly into the target integer without
triggering undefined behaviour, so instead I have to allocate a
separate temporary buffer.
The _most_ interesting call site was in the PKCS#1 v1.5 padding code
in sshrsa.c (used in SSH-1), in which you need a stream of _nonzero_
random bytes. The previous code just looped on random_byte, retrying
if it got a zero. Now I'm doing a much more interesting thing with an
mpint, essentially scaling a binary fraction repeatedly to extract a
number in the range [0,255) and then adding 1 to it.
Mostly on the Unix side: there are lots of places the Windows code was
collecting noise that the corresponding Unix/GTK code wasn't bothering
to, such as mouse movements, keystrokes and various network events.
Also, both platforms had forgotten to collect noise when reading data
from a pipe to a local proxy process, even though in that
configuration that's morally equivalent to the network packet timings
that we'd normally be collecting from.
I'd forgotten that the SSH-2 BPP uses a defensive measure of
generating the MAC for successive prefixes of an incoming packet,
which means that ssh_mac_genresult needs to be nondestructive.
While I'm at it, I've also made all of hmac's hash objects exist all
the time - they're created up front, destroyed unconditionally on
free, and in between, whenever one is destroyed at all it's
immediately recreated. I think this simplifies things in general, and
in particular, creating at least one hash object immediately will come
in useful when I add selector vtables in a few commits' time.
Keeping that information alongside the hashes themselves seems more
sensible than having the HMAC code know that fact about everything it
can work with.
That flag was missing from all the CBC vtables' flags fields, because
my recent rewrite forgot to put it in. As a result the SSH_MSG_IGNORE
defence against CBC length oracle attacks was not being enabled.
Another thing pointed out by ASan: new_unix_listener takes ownership
of the SockAddr you give it, so I shouldn't have been freeing it at
the end of platform_make_x11_server().
The single simplest request in the entire protocol - the command
'hello' which is supposed to respond 'hello, world\n' to demonstrate
to an interactive user that testcrypt has started up successfully -
was missing the trailing newline in the response. :-)
This allows me to compile testcrypt with -DDEBUG, even though it's not
linked against the usual collection of platform-specific modules that
normally provide dputs. I think the simplest possible dputs ('just
output to stderr') is actually better for testcrypt, because that
keeps it easy to compile for strange experimental platforms.
This replaces all the separate HMAC-implementing wrappers in the
various source files implementing the underlying hashes.
The new HMAC code also correctly handles the case of a key longer than
the underlying hash's block length, by replacing it with its own hash.
This means I can reinstate the test vectors in RFC 6234 which exercise
that case, which I didn't add to cryptsuite before because they'd have
failed.
It also allows me to remove the ad-hoc code at the call site in
cproxy.c which turns out to have been doing the same thing - I think
that must have been the only call site where the question came up
(since MAC keys invented by the main SSH-2 BPP are always shorter than
that).
Similar to the versions in ssh_cipheralg and ssh_keyalg, this allows a
set of vtables to share function pointers while providing varying
constant data that the shared function can use to vary its behaviour.
As an initial demonstration, I've used this to recombine the four
trivial text_name methods for the HMAC-SHA1 variants. I'm about to use
it for something more sensible, though.
We were retaining a ptrlen 's->comment' into a past agent response
message, but that had been freed by the time it was actually printed
in a diagnostic. Also, agent_response_to_free was being freed twice,
because the variable 'ret' in the response-formatting code aliased it.
All the hash-specific state structures, and the functions that
directly accessed them, are now local to the source files implementing
the hashes themselves. Everywhere we previously used those types or
functions, we're now using the standard ssh_hash or ssh2_mac API.
The 'simple' functions (hmacmd5_simple, SHA_Simple etc) are now a pair
of wrappers in sshauxcrypt.c, each of which takes an algorithm
structure and can do the same conceptual thing regardless of what it
is.