In commit 0f405ae8a, I arranged to stop reading from the SSH
connection if the in_raw bufchain got too big. But in at least some
tools (this bit me just now with PSCP), nothing actually calls
ssh_check_frozen again when the bufchain clears, so it stays frozen.
Now ssh_check_frozen is non-static, and all the BPP implementations
call it whenever they consume data from ssh->in_raw.
When I added the new call to ssh_key_invalid the other day, I forgot
to avoid calling it if the key is NULL (and therefore even more
obviously invalid).
The standard says we should be checking that both r,s are in the range
[1,q-1]. Previously we were effectively reducing s mod q in the course
of inversion, and modinv() was guaranteeing never to return zero; the
remaining missing checks were benign. But the change from Bignum to
mp_int altered the error behaviour, and combined with the missing
upper bound check on s, made it possible to continue verification with
w == 0 mod q, which is a bad case.
Added a small DSA test case, including a check that none of these
types of signatures validates.
Although I've reinstated the tedious manual mouse input, I can at
least reduce the amount of it that the user is required to provide:
the new PRNG has a hard limit on the size of its seed, so once we've
generated enough entropy to fill that up, there's no point in
collecting more, even if we're generating a particularly large key.
This reverts the policy change in 6142013ab (though not the detailed
code changes - I've kept the reorganised code layout). Now the old
mouse-based manual entropy collection is once again required when
generating a public key.
Rationale: I came across Wikipedia's page on CryptGenRandom which
mentioned that it was not a true kernel-level PRNG of the /dev/random
variety, but rather a thing running in userland, no different in
principle from PuTTY's own. So I think that makes it no longer a thing
we should rely on for all our entropy, and I'm relegating it back to
being just one entropy source among many.
With this change, my new side-channel test system gets a 100% pass
rate when compiled with clang -O3 on Ubuntu 18.10. Previously, it had
three failing tests (namely the three ECC multiply functions), all due
to inconsistent control flow inside mp_cond_swap.
I admit I don't really understand whether this is really necessary or
not, so I'm playing it safe. The problem _seems_ to be that clang has
generated one version of the cond_swap loop using integer arithmetic
and another using MMX vectors, so the obvious suspect is alignment -
probably mp_cond_swap is processing an iteration of the loop up front
until its pointer is 16-byte aligned and then switching over to the
vectorised version. But on the other hand, when I experimentally tried
forcing allocations to be 16- or even 32-byte aligned, it didn't make
a difference. And I don't speak x86 vector instructions very well (in
fact barely at all), so I'm not even completely sure of whether the
code I was reading did what I thought it did; so I'm more comfortable
with simply applying brute force to get some code generation that the
automated test is genuinely happy with.
All the work I've put in in the last few months to eliminate timing
and cache side channels from PuTTY's mp_int and cipher implementations
has been on a seat-of-the-pants basis: just thinking very hard about
what kinds of language construction I think would be safe to use, and
trying not to absentmindedly leave a conditional branch or a cast to
bool somewhere vital.
Now I've got a test suite! The basic idea is that you run the same
crypto primitive multiple times, with inputs differing only in ways
that are supposed to avoid being leaked by timing or leaving evidence
in the cache; then you instrument the code so that it logs all the
control flow, memory access and a couple of other relevant things in
each of those runs, and finally, compare the logs and expect them to
be identical.
The instrumentation is done using DynamoRIO, which I found to be well
suited to this kind of work: it lets you define custom modifications
of the code in a reasonably low-effort way, and it lets you work at
both the low level of examining single instructions _and_ the higher
level of the function call ABI (so you can give things like malloc
special treatment, not to mention intercepting communications from the
program being instrumented). Build instructions are all in the comment
at the top of testsc.c.
At present, I've found this test to give a 100% pass rate using gcc
-O0 and -O3 (Ubuntu 18.10). With clang, there are a couple of
failures, which I'll fix in the next commit.
The case previously conditioned on _M_IX86, where we use __int64 as
the BignumDblInt type, is actually valid on any Visual Studio target
platform at all, so it's safe to remove that condition and let it
apply to _M_ARM and _M_ARM64 as well. The only situation in which we
_shouldn't_ use that case for Visual Studio builds is when we have
something even better available, such as the x86-64 intrinsics for
add-with-carry and double-width multiply.
There's been a FIXME comment in there for ages saying we should do
something less drastic than ssh_sw_abort(). This actually came up in
the course of testing Pageant's support for the new RSA validity
check, so I've fixed it: if Pageant won't deliver us a signature from
the private key we'd like, then we treat it the same as any other auth
method failure: shrug and move on to the next method on our list (or
even just the next key in Pageant).
The ssh_signkey vtable has grown a new method ssh_key_invalid(), which
checks whether the key is going to be usable for constructing a
signature at all. Currently the only way this can fail is if it's an
RSA key so short that there isn't room to put all the PKCS#1
formatting in the signature preimage integer, but the return value is
an arbitrary error message just in case more reasons are needed later.
This is tested separately rather than at key-creation time because of
the signature flags system: an RSA key of intermediate length could be
valid for SHA-1 signing but not for SHA-512. So really this method
should be called at the point where you've decided what sig flags you
want to use, and you're checking if _those flags_ are OK.
On the verification side, there's no need for a separate check. If
someone presents us with an RSA key so short that it's impossible to
encode a valid signature using it, then we simply regard all
signatures as invalid.
I computed hash + x*r by first computing x*r, and then using
mp_add_into to add the hash to it in the same bignum. But if the
result of x*r had been allocated an mp_int only just large enough to
contain it, then the addition of the hash might have made it overflow
and generated a bogus signature.
I've never seen that happen, and for all I know word sizes may make it
completely impossible. But it's a theoretical possibility, and easy to
fix now that I've happened to spot it in passing.
This will let my upcoming new test of memory access patterns run a
sequence of tests on different elliptic-curve data which is stored at
the same address each time.
I just found I wanted to generate a prime with particular properties,
and I knew PuTTY's prime generator could manage it, so it was easier
to add this function to testcrypt for occasional manual use than to
look for another prime-generator with the same feature set!
I've wrapped the function so as to remove the three progress-
reporting parameters.
The check for a sequence of ET with an EN after it could potentially
skip ETs all the way up to the end of the buffer and then look for an
EN in the following nonexistent array element. Now it only skips ETs
up to count-1, in the same style as the similar check in rule N1.
Change-Id: Ifdbae494a22d1b96bf49ae1bcae0efb901565f45
If term->esc_query == -1 (reflecting an escape sequence in which the
CSI is followed by a prefix character other than ?) then the ANSI
macro shouldn't shift it left by 8, because that's undefined behaviour
(although in practice I'd be very surprised if any compiler has
actually miscompiled it yet).
Multiplying it by 256 is a safe alternative which has the behaviour I
wanted.
The local put_mp_*_from_string functions in import.c now take ptrlen
(which simplifies essentially all their call sites); so does the local
function logwrite() in logging.c, and so does ssh2_fingerprint_blob.
A great many BinarySource_BARE_INIT calls are passing the two halves
of a ptrlen as separate arguments. It saves a lot of call-site faff to
have a variant of the init function that just takes the whole ptrlen
in one go.
Now that all the call sites are expecting a size_t instead of an int
length field, it's no longer particularly difficult to make it
actually return the pointer,length pair in the form of a ptrlen.
It would be nice to say that simplifies call sites because those
ptrlens can all be passed straight along to other ptrlen-consuming
functions. Actually almost none of the call sites are like that _yet_,
but this makes it possible to move them in that direction in future
(as part of my general aim to migrate ptrlen-wards as much as I can).
But also it's just nicer to keep the pointer and length together in
one variable, and not have to declare them both in advance with two
extra lines of boilerplate.
This is a general cleanup which has been overdue for some time: lots
of length fields are now the machine word type rather than the (in
practice) fixed 'int'.
Now we pass an error code in a separate dedicated parameter, instead
of overloading the length parameter so that a negative value means an
error code. This enables length to become unsigned without causing
trouble.
Now the integer output value is never negative (because the condition
that used to be signalled by setting it to -1 is now signalled by
returning false from the actual function), which frees me to make it
an unsigned type in an upcoming change.
If the SSH socket is readable, GTK will preferentially give us a
callback to read from it rather than calling its idle functions. That
means the ssh->in_raw bufchain can just keep accumulating data, and
the callback that gets the BPP to take data back off that bufchain
will never be called at all.
The solution is to use sk_set_frozen after a certain point, to stop
reading further data from the socket (and, more importantly, disable
GTK's I/O callback for that fd) until we've had a chance to process
some backlog, and then unfreeze the socket again afterwards.
Annoyingly, that means adding a _second_ 'frozen' flag to Ssh, because
the one we already had has exactly the wrong semantics - it prevents
us from _processing_ our backlog, which is the last thing we want if
the entire problem is that we need that backlog to get smaller! So now
there are two frozen flags, and a big comment explaining the
difference.
The ADDTOBUF macro and the three outbuf variables are trying to be a
strbuf, and not doing it as well as the real one.
Since c_write takes an int length parameter but outbuf->len is now a
size_t, I've also arranged to flush outbuf periodically during the
function, just in case it gets too big.
Previously, I checked by assertion that the base was less than the
modulus. There were two things wrong with this policy. Firstly, it's
perfectly _meaningful_ to want to raise a large number to a power mod
a smaller number, even if it doesn't come up often in cryptography;
secondly, I didn't do it right, because the check was based on the
formal sizes (nw fields) of the mp_ints, which meant that it was
possible to have a failure of the assertion even in the case where the
numerical value of the base _was_ less than the modulus.
In particular, this could come up in Diffie-Hellman with a fixed
group, because the fixed group modulus was decoded from an MP_LITERAL
in sshdh.c which gave a minimal value of nw, but the base was the
public value sent by the other end of the connection, which would
sometimes be sent with the leading zero byte required by the SSH-2
mpint encoding, and would cause a value of nw one larger, failing the
assertion.
Fixed by simply using mp_modmul in monty_import, replacing the
previous clever-but-restricted strategy that I wrote when I thought I
could get away without having to write a general division-based
modular reduction at all.
This completes the conversion begun in commit be5c0e635: now every
CBC-mode cipher has "cbc" in its name, and doesn't leave it implicit.
Hopefully this will never confuse me again!
In commit dfdb73e10 I accidentally renamed them from "aes128-cbc" to
"aes128" (and ditto for the other two key lengths), probably because
of the confusing names of the C-level identifiers for those vtables.
Now restored to the versions actually described in RFC 4253.
If something goes wrong part way through one of the new-key functions,
we immediately call the corresponding freekey function before
returning failure. That will test ek->privateKey for NULL, and if it's
not NULL, try to free it - so we should be sure it _is_ NULL if we
haven't put a private key in it.
Mainly this change affects the whole {GET,PUT}_??BIT_?SB_FIRST family,
which has always been a horrible set of macros for massive multiple-
expansion of its arguments. Now we're allowed to use C99 in this code
base, I can finally turn them into nice clean inline functions. As
bonus they now take their pointer argument as a void * (const-
qualified as appropriate) which means the call site doesn't have to
worry about exactly which flavour of pointer it's passing.
(That change also affects the GET_*_X11 macros in x11fwd.c, since I
was just reminded of their existence too!)
I've also converted NULLTOEMPTY, which was sitting right next to the
GET/PUT macros in misc.h and it seemed a shame to leave it out.
Those were a reasonable abbreviation when the code almost never had to
deal with little-endian numbers, but they've crept into enough places
now (e.g. the ECC formatting) that I think I'd now prefer that every
use of the integer read/write macros was clearly marked with its
endianness.
So all uses of GET_??BIT and PUT_??BIT are now qualified. The special
versions in x11fwd.c, which used variable endianness because so does
the X11 protocol, are suffixed _X11 to make that clear, and where that
pushed line lengths over 80 characters I've taken the opportunity to
name a local variable to remind me of what that extra parameter
actually does.
Pavel says there was no specific reason they avoided it, and compiling
with Debian Jessie's GCC 4.9.2 produces a binary that I've no reason
to believe won't work, although I haven't tested it on a real or
emulated CPU that supports the instructions.
The backend_unthrottle function gets called when the backlog on stdout
clears, and it's possible for that to happen _after_ the SSH backend
has terminated the connection and freed all its protocol modules (e.g.
if a protocol error occurred on the network while data was still
waiting to be written to stdout). So ssh_unthrottle should check that
ssh->cl still exists before calling any method of it.
If there was still pending output data on a NetSocket's output_data
bufchain when it was closed, then we wouldn't have freed it, on either
Unix or Windows.
uxnet.c's method for passing socket errors on to the Plug involves
setting up a toplevel callback using the NetSocket itself as the
context. Therefore, it should call delete_callbacks_for_context when
it destroys a NetSocket. For example, if _two_ socket errors manage to
occur, and the first one causes the socket to be closed, you need the
second callback to not happen, or it'll dereference the freed pointer.
The variable 'strbuf *realhost' was only initialised in the branch of
the ifdefs where IPV6 is enabled, so at NO_IPV6, it's used
uninitialised, which in my usual build configuration means a compile
failure.
We strbuf_free(ssh2blob), but forgot to null the pointer out
afterwards, which means a subsequent check for NULL believes there
wasn't a problem, and nulls out the error message pointer instead!
Some functions got confused if given one as input (particularly
mp_get_decimal, which assumed it could safely write at least one word
into the inv5 value it makes internally), and I've decided it's easier
to stop them ever being created than to teach everything to handle
them correctly. So now mp_make_sized enforces nw != 0 by assertion,
and I've added a max at any call site that looked as if it might
violate that precondition.
mp_from_hex("") could generate one of these, in particular, so now
I've fixed it, I've added a test to make sure it continues doing
something sensible.
The mp_cond_swap that sorts the key's factors into p>q order only
works if the mp_int representations of p and q have the same nw. It's
unusual but by no means illegal for an RSA key to be the product of
wildly different-length primes, so we should cope. Now we sort p and q
by using mp_min and mp_max.
In the clang-cl makefile, we run the .rc file through the preprocessor
and the actual resource compiler in two separate steps. But all the
include-file dependencies were being put on the _latter_ step, so
editing a .rc2 file didn't trigger a rebuild of the resource file.
Since the rewrite of hardware SHA support in cbbd464fd7, we've been
attempting to build with SHA-NI support on x86 with some GCC 4.x,
including Ubuntu 14.04's 4.8.x, whereas before we only tried it with
GCC 5.x and above. Revert to that.
(I think that GCC has had some support for this extension since 4.9.0 --
the "sha" attribute went in in upstream commit fc975a4090 -- and it
at least compiles with 4.9.2, but I'm assuming Pavel had a good reason
for sticking to 5+ in 5a38b293bd.)
When I reworked the support for rekeying after a certain amount of
data had been sent, I forgot the part where configuring the max data
limit to zero means 'never rekey due to data transfer volume'. So I
was incautiously checking the 'running' flag in the new
DataTransferStats to find out whether we needed to rekey, forgetting
that sometimes running=false means the transfer limit has expired, and
sometimes it means there never was one in the first place.
To fix this, I've got rid of the boolean return value from DTS_CONSUME
and turned it into an 'expired' flag in DataTransferStats, separate
from the 'running' flag. Now everything consistently checks 'expired'
to find out whether to rekey, and there's a new reset function that
reliably clears 'expired' but sets 'running' depending on whether the
size is nonzero.
(Also, while I'm at it, I've turned the DTS_CONSUME macro into an
inline function, because that's becoming my general preference now
that C99 is allowed in this code base.)