This commit adds sanitisation to PSCP and PSFTP in the same style as
I've just put it into Plink. This time, standard error is sanitised
without reference to whether it's redirected (at least unless you give
an override option), on the basis that where Plink is _sometimes_ an
SSH transport for some other protocol, PSCP and PSFTP _always_ are.
But also, the sanitiser is run over any remote filename sent by the
server, substituting ? for any control characters it finds. That
removes another avenue for the server to deliberately confuse the
display.
This commit fixes our bug 'pscp-unsanitised-server-output', aka the
two notional 'vulnerabilities' CVE-2019-6109 and CVE-2019-6110.
(Although we regard those in isolation as only bugs, not serious
vulnerabilities, because their main threat was in hiding the evidence
of a server having exploited other more serious vulns that we never
had.)
If Plink's standard output and/or standard error points at a Windows
console or a Unix tty device, and if Plink was not configured to
request a remote pty (and hence to send a terminal-type string), then
we apply the new control-character stripping facility.
The idea is to be a mild defence against malicious remote processes
sending confusing escape sequences through the standard error channel
when Plink is being used as a transport for something like git: it's
OK to have actual sensible error messages come back from the server,
but when you run a git command, you didn't really intend to give the
remote server the implicit licence to write _all over_ your local
terminal display. At the same time, in that scenario, the standard
_output_ of Plink is left completely alone, on the grounds that git
will be expecting it to be 8-bit clean. (And Plink can tell that
because it's redirected away from the console.)
For interactive login sessions using Plink, this behaviour is
disabled, on the grounds that once you've sent a terminal-type string
it's assumed that you were _expecting_ the server to use it to know
what escape sequences to send to you.
So it should be transparent for all the use cases I've so far thought
of. But in case it's not, there's a family of new command-line options
like -no-sanitise-stdout and -sanitise-stderr that you can use to
forcibly override the autodetection of whether to do it.
This all applies the same way to both Unix and Windows Plink.
This is for sanitising output that's going to be sent to a terminal,
if you don't want it to be able to send arbitrary escape sequences and
thereby (for example) move the cursor back up to existing text on the
screen and overprint it confusingly.
It works using the standard C library: we convert to a wide-character
string and back, and then use wctype.h to spot control characters in
the intermediate form. This means its idea of the conversion character
set is locale-based rather than any of our own charset library's fixed
settings - which is what you want if the aim is to protect your local
terminal (which we assume the system locale represents accurately).
This also means that the sanitiser strips things that will _act_ as
control characters when sent to the local terminal, whether or not
they were intended as control characters by a server that might have
had a different character set in mind. Since the main aim is to
protect the local terminal rather than to faithfully replicate the
server's intention, I think that's the right criterion.
It only strips control characters at the charset-independent layer,
like backspace, carriage return and the escape character: wctype.h
classifies those as control characters, but classifies as printing all
of the more Unicode-specific controls like bidirectional overrides.
But that's enough to prevent cursor repositioning, for example.
stripctrl.c comes with a test main() of its own, which I wasn't able
to fold into testcrypt and put in the test suite because of its
dependence on the system locale - it wouldn't be guaranteed to work
the same way on different test systems anyway.
A knock-on build tweak: because you can feed data into this sanitiser
in chunks of arbitrary size, including partial multibyte chars, I had
to use mbrtowc() for the decoding, and that means that in the 'old'
Win32 builds I have to link against the Visual Studio C++ library as
well as the C library, because for some reason that's where mbrtowc
lived in VS2003.
Rather like isatty() on Unix, this tells you if a raw Windows HANDLE
points at a console or not. Useful to know if your standard output or
standard error is going to be shown to a user, or redirected to
something that will make automated use of it.
There's now a stdio_sink, whose write function calls fwrite on the
given FILE *; a bufchain_sink, whose write function appends to the
given bufchain; and on Windows there's a handle_sink whose write
function writes to the given 'struct handle'. (That is, not the raw
Windows HANDLE, but our event-loop-friendly wrapper on it.)
Not yet used for anything, but they're about to be.
ssh_sftp_loop_iteration() used to return failure if no file handle was
in use for the select loop, on the basis that that means select would
just loop forever. But if there's a toplevel callback pending - in
particular, if it's going to do something like emptying ssh->in_raw
which will put an fd _back into_ the next iteration of the select loop
- then that's not a good enough reason to return permanent failure.
Just go round the loop, run the callback, and try again.
In commit 0f405ae8a, I arranged to stop reading from the SSH
connection if the in_raw bufchain got too big. But in at least some
tools (this bit me just now with PSCP), nothing actually calls
ssh_check_frozen again when the bufchain clears, so it stays frozen.
Now ssh_check_frozen is non-static, and all the BPP implementations
call it whenever they consume data from ssh->in_raw.
When I added the new call to ssh_key_invalid the other day, I forgot
to avoid calling it if the key is NULL (and therefore even more
obviously invalid).
The standard says we should be checking that both r,s are in the range
[1,q-1]. Previously we were effectively reducing s mod q in the course
of inversion, and modinv() was guaranteeing never to return zero; the
remaining missing checks were benign. But the change from Bignum to
mp_int altered the error behaviour, and combined with the missing
upper bound check on s, made it possible to continue verification with
w == 0 mod q, which is a bad case.
Added a small DSA test case, including a check that none of these
types of signatures validates.
Although I've reinstated the tedious manual mouse input, I can at
least reduce the amount of it that the user is required to provide:
the new PRNG has a hard limit on the size of its seed, so once we've
generated enough entropy to fill that up, there's no point in
collecting more, even if we're generating a particularly large key.
This reverts the policy change in 6142013ab (though not the detailed
code changes - I've kept the reorganised code layout). Now the old
mouse-based manual entropy collection is once again required when
generating a public key.
Rationale: I came across Wikipedia's page on CryptGenRandom which
mentioned that it was not a true kernel-level PRNG of the /dev/random
variety, but rather a thing running in userland, no different in
principle from PuTTY's own. So I think that makes it no longer a thing
we should rely on for all our entropy, and I'm relegating it back to
being just one entropy source among many.
With this change, my new side-channel test system gets a 100% pass
rate when compiled with clang -O3 on Ubuntu 18.10. Previously, it had
three failing tests (namely the three ECC multiply functions), all due
to inconsistent control flow inside mp_cond_swap.
I admit I don't really understand whether this is really necessary or
not, so I'm playing it safe. The problem _seems_ to be that clang has
generated one version of the cond_swap loop using integer arithmetic
and another using MMX vectors, so the obvious suspect is alignment -
probably mp_cond_swap is processing an iteration of the loop up front
until its pointer is 16-byte aligned and then switching over to the
vectorised version. But on the other hand, when I experimentally tried
forcing allocations to be 16- or even 32-byte aligned, it didn't make
a difference. And I don't speak x86 vector instructions very well (in
fact barely at all), so I'm not even completely sure of whether the
code I was reading did what I thought it did; so I'm more comfortable
with simply applying brute force to get some code generation that the
automated test is genuinely happy with.
All the work I've put in in the last few months to eliminate timing
and cache side channels from PuTTY's mp_int and cipher implementations
has been on a seat-of-the-pants basis: just thinking very hard about
what kinds of language construction I think would be safe to use, and
trying not to absentmindedly leave a conditional branch or a cast to
bool somewhere vital.
Now I've got a test suite! The basic idea is that you run the same
crypto primitive multiple times, with inputs differing only in ways
that are supposed to avoid being leaked by timing or leaving evidence
in the cache; then you instrument the code so that it logs all the
control flow, memory access and a couple of other relevant things in
each of those runs, and finally, compare the logs and expect them to
be identical.
The instrumentation is done using DynamoRIO, which I found to be well
suited to this kind of work: it lets you define custom modifications
of the code in a reasonably low-effort way, and it lets you work at
both the low level of examining single instructions _and_ the higher
level of the function call ABI (so you can give things like malloc
special treatment, not to mention intercepting communications from the
program being instrumented). Build instructions are all in the comment
at the top of testsc.c.
At present, I've found this test to give a 100% pass rate using gcc
-O0 and -O3 (Ubuntu 18.10). With clang, there are a couple of
failures, which I'll fix in the next commit.
The case previously conditioned on _M_IX86, where we use __int64 as
the BignumDblInt type, is actually valid on any Visual Studio target
platform at all, so it's safe to remove that condition and let it
apply to _M_ARM and _M_ARM64 as well. The only situation in which we
_shouldn't_ use that case for Visual Studio builds is when we have
something even better available, such as the x86-64 intrinsics for
add-with-carry and double-width multiply.
There's been a FIXME comment in there for ages saying we should do
something less drastic than ssh_sw_abort(). This actually came up in
the course of testing Pageant's support for the new RSA validity
check, so I've fixed it: if Pageant won't deliver us a signature from
the private key we'd like, then we treat it the same as any other auth
method failure: shrug and move on to the next method on our list (or
even just the next key in Pageant).
The ssh_signkey vtable has grown a new method ssh_key_invalid(), which
checks whether the key is going to be usable for constructing a
signature at all. Currently the only way this can fail is if it's an
RSA key so short that there isn't room to put all the PKCS#1
formatting in the signature preimage integer, but the return value is
an arbitrary error message just in case more reasons are needed later.
This is tested separately rather than at key-creation time because of
the signature flags system: an RSA key of intermediate length could be
valid for SHA-1 signing but not for SHA-512. So really this method
should be called at the point where you've decided what sig flags you
want to use, and you're checking if _those flags_ are OK.
On the verification side, there's no need for a separate check. If
someone presents us with an RSA key so short that it's impossible to
encode a valid signature using it, then we simply regard all
signatures as invalid.
I computed hash + x*r by first computing x*r, and then using
mp_add_into to add the hash to it in the same bignum. But if the
result of x*r had been allocated an mp_int only just large enough to
contain it, then the addition of the hash might have made it overflow
and generated a bogus signature.
I've never seen that happen, and for all I know word sizes may make it
completely impossible. But it's a theoretical possibility, and easy to
fix now that I've happened to spot it in passing.
This will let my upcoming new test of memory access patterns run a
sequence of tests on different elliptic-curve data which is stored at
the same address each time.
I just found I wanted to generate a prime with particular properties,
and I knew PuTTY's prime generator could manage it, so it was easier
to add this function to testcrypt for occasional manual use than to
look for another prime-generator with the same feature set!
I've wrapped the function so as to remove the three progress-
reporting parameters.
The check for a sequence of ET with an EN after it could potentially
skip ETs all the way up to the end of the buffer and then look for an
EN in the following nonexistent array element. Now it only skips ETs
up to count-1, in the same style as the similar check in rule N1.
Change-Id: Ifdbae494a22d1b96bf49ae1bcae0efb901565f45
If term->esc_query == -1 (reflecting an escape sequence in which the
CSI is followed by a prefix character other than ?) then the ANSI
macro shouldn't shift it left by 8, because that's undefined behaviour
(although in practice I'd be very surprised if any compiler has
actually miscompiled it yet).
Multiplying it by 256 is a safe alternative which has the behaviour I
wanted.
The local put_mp_*_from_string functions in import.c now take ptrlen
(which simplifies essentially all their call sites); so does the local
function logwrite() in logging.c, and so does ssh2_fingerprint_blob.
A great many BinarySource_BARE_INIT calls are passing the two halves
of a ptrlen as separate arguments. It saves a lot of call-site faff to
have a variant of the init function that just takes the whole ptrlen
in one go.
Now that all the call sites are expecting a size_t instead of an int
length field, it's no longer particularly difficult to make it
actually return the pointer,length pair in the form of a ptrlen.
It would be nice to say that simplifies call sites because those
ptrlens can all be passed straight along to other ptrlen-consuming
functions. Actually almost none of the call sites are like that _yet_,
but this makes it possible to move them in that direction in future
(as part of my general aim to migrate ptrlen-wards as much as I can).
But also it's just nicer to keep the pointer and length together in
one variable, and not have to declare them both in advance with two
extra lines of boilerplate.
This is a general cleanup which has been overdue for some time: lots
of length fields are now the machine word type rather than the (in
practice) fixed 'int'.
Now we pass an error code in a separate dedicated parameter, instead
of overloading the length parameter so that a negative value means an
error code. This enables length to become unsigned without causing
trouble.
Now the integer output value is never negative (because the condition
that used to be signalled by setting it to -1 is now signalled by
returning false from the actual function), which frees me to make it
an unsigned type in an upcoming change.
If the SSH socket is readable, GTK will preferentially give us a
callback to read from it rather than calling its idle functions. That
means the ssh->in_raw bufchain can just keep accumulating data, and
the callback that gets the BPP to take data back off that bufchain
will never be called at all.
The solution is to use sk_set_frozen after a certain point, to stop
reading further data from the socket (and, more importantly, disable
GTK's I/O callback for that fd) until we've had a chance to process
some backlog, and then unfreeze the socket again afterwards.
Annoyingly, that means adding a _second_ 'frozen' flag to Ssh, because
the one we already had has exactly the wrong semantics - it prevents
us from _processing_ our backlog, which is the last thing we want if
the entire problem is that we need that backlog to get smaller! So now
there are two frozen flags, and a big comment explaining the
difference.
The ADDTOBUF macro and the three outbuf variables are trying to be a
strbuf, and not doing it as well as the real one.
Since c_write takes an int length parameter but outbuf->len is now a
size_t, I've also arranged to flush outbuf periodically during the
function, just in case it gets too big.
Previously, I checked by assertion that the base was less than the
modulus. There were two things wrong with this policy. Firstly, it's
perfectly _meaningful_ to want to raise a large number to a power mod
a smaller number, even if it doesn't come up often in cryptography;
secondly, I didn't do it right, because the check was based on the
formal sizes (nw fields) of the mp_ints, which meant that it was
possible to have a failure of the assertion even in the case where the
numerical value of the base _was_ less than the modulus.
In particular, this could come up in Diffie-Hellman with a fixed
group, because the fixed group modulus was decoded from an MP_LITERAL
in sshdh.c which gave a minimal value of nw, but the base was the
public value sent by the other end of the connection, which would
sometimes be sent with the leading zero byte required by the SSH-2
mpint encoding, and would cause a value of nw one larger, failing the
assertion.
Fixed by simply using mp_modmul in monty_import, replacing the
previous clever-but-restricted strategy that I wrote when I thought I
could get away without having to write a general division-based
modular reduction at all.
This completes the conversion begun in commit be5c0e635: now every
CBC-mode cipher has "cbc" in its name, and doesn't leave it implicit.
Hopefully this will never confuse me again!
In commit dfdb73e10 I accidentally renamed them from "aes128-cbc" to
"aes128" (and ditto for the other two key lengths), probably because
of the confusing names of the C-level identifiers for those vtables.
Now restored to the versions actually described in RFC 4253.
If something goes wrong part way through one of the new-key functions,
we immediately call the corresponding freekey function before
returning failure. That will test ek->privateKey for NULL, and if it's
not NULL, try to free it - so we should be sure it _is_ NULL if we
haven't put a private key in it.
Mainly this change affects the whole {GET,PUT}_??BIT_?SB_FIRST family,
which has always been a horrible set of macros for massive multiple-
expansion of its arguments. Now we're allowed to use C99 in this code
base, I can finally turn them into nice clean inline functions. As
bonus they now take their pointer argument as a void * (const-
qualified as appropriate) which means the call site doesn't have to
worry about exactly which flavour of pointer it's passing.
(That change also affects the GET_*_X11 macros in x11fwd.c, since I
was just reminded of their existence too!)
I've also converted NULLTOEMPTY, which was sitting right next to the
GET/PUT macros in misc.h and it seemed a shame to leave it out.
Those were a reasonable abbreviation when the code almost never had to
deal with little-endian numbers, but they've crept into enough places
now (e.g. the ECC formatting) that I think I'd now prefer that every
use of the integer read/write macros was clearly marked with its
endianness.
So all uses of GET_??BIT and PUT_??BIT are now qualified. The special
versions in x11fwd.c, which used variable endianness because so does
the X11 protocol, are suffixed _X11 to make that clear, and where that
pushed line lengths over 80 characters I've taken the opportunity to
name a local variable to remind me of what that extra parameter
actually does.
Pavel says there was no specific reason they avoided it, and compiling
with Debian Jessie's GCC 4.9.2 produces a binary that I've no reason
to believe won't work, although I haven't tested it on a real or
emulated CPU that supports the instructions.
The backend_unthrottle function gets called when the backlog on stdout
clears, and it's possible for that to happen _after_ the SSH backend
has terminated the connection and freed all its protocol modules (e.g.
if a protocol error occurred on the network while data was still
waiting to be written to stdout). So ssh_unthrottle should check that
ssh->cl still exists before calling any method of it.
If there was still pending output data on a NetSocket's output_data
bufchain when it was closed, then we wouldn't have freed it, on either
Unix or Windows.
uxnet.c's method for passing socket errors on to the Plug involves
setting up a toplevel callback using the NetSocket itself as the
context. Therefore, it should call delete_callbacks_for_context when
it destroys a NetSocket. For example, if _two_ socket errors manage to
occur, and the first one causes the socket to be closed, you need the
second callback to not happen, or it'll dereference the freed pointer.
The variable 'strbuf *realhost' was only initialised in the branch of
the ifdefs where IPV6 is enabled, so at NO_IPV6, it's used
uninitialised, which in my usual build configuration means a compile
failure.