When I set up the simplified version of just the ssh-connection
protocol we use for connection sharing, I carefully documented at the
top of sshshare.c that packets in that protocol variant are limited to
just over 0x4000 bytes. And did I remember to actually honour that, by
restricting the sizes of outgoing packets when actually speaking the
bare connection protocol? I did not.
Well, better late than never. Here I introduce a packet size limit
that can be imposed by the BPP, and arrange for sshconnection.c to
take the min of that and any given channel's max packet size as sent
by the remote end. All BPPs except ssh2bpp-bare set it to the no-op
value of the largest possible uint32_t.
In normal builds this makes no difference, but in Windows builds with
the Minefield diagnostic system turned on, free()ing a Minefield-
allocated object causes a crash. Apparently I haven't tested Pageant
under Minefield for ages - only PuTTY, talking to an ordinary Pageant
I'd already started up.
The field s->username in the ssh2userauth state was sometimes a
secondary reference to s->default_username, and sometimes a
dynamically allocated string. So not freeing it can leak memory, but
freeing it (if non-null) can cause a double-free crash.
Solution: add a new field s->locally_allocated_username which is the
thing that needs freeing (if any), and make s->username into a const
char * which is always a secondary reference to _something_.
How embarrassing: we were looping back round to prompt for a username
again if s->change_username was *true*, but then the test at the top
would only have reissued the prompt if it was *false*. So that
configuration checkbox must have been broken for ages. It's a good job
most SSH servers don't reward enabling it!
It's a secondary reference to existing data (namely an argv word, if
anything at all), and never needs to be written through, so we should
mark it const, if nothing else to catch any absent-minded attempts to
free it.
This assertion was supposed to be checking for the buffer overrun
fixed by the previous commit, but because it checks the buffer index
just _after_ writing into the buffer, it would have permitted a
one-byte overrun before failing the assertion.
Commit 71e42b04a's refactoring of terminal keyboard input, in the case
where a Unicode string derived from a keystroke is translated into
some other charset to put on the wire, had allocated the output buffer
for that translation one byte too small.
If you select an entry in the saved sessions list box, but without
double-clicking to actually load it, and then you hit OK, the config-
box code will automatically load it. That just saves one click in a
common situation.
But in order to load that session, the config-box system first has to
ask the list-box control _which_ session is selected. On Windows, this
causes an assertion failure if the user has switched away from the
Session panel to some other panel of the config box, because when the
list box isn't on screen, its Windows control object is actually
destroyed.
I think a sensible answer is that we shouldn't be doing that implicit
load behaviour in any case if the list box isn't _visible_: silently
loading and launching a session someone selected a lot of UI actions
ago wasn't really the point. So now I make that behaviour only happen
when the list box (i.e. the Session panel) _is_ visible. That should
prevent the assertion failure on Windows, but the UI effect is cross-
platform, applying even on GTK where the control objects for invisible
panels persist and so the assertion failure didn't happen. I think
it's a reasonable UI change to make globally.
In order to implement it, I've had to invent a new query function so
that config.c can tell whether a given control is visible. In order to
do that on GTK, I had to give each control a pointer to the 'selparam'
structure describing its config-box pane, so that query function could
check it against the current one - and in order to do _that_, I had to
first arrange that those 'selparam' structures have stable addresses
from the moment they're first created, which meant adding a layer of
indirection so that the array of them in the top-level dlgparam
structure is now an array of _pointers_ rather than of actual structs.
(That way, realloc half way through config box creation can't
invalidate the important pointer values.)
One of those things you'd never notice if it weren't for Leak
Sanitiser happening to be turned on while you were doing something
else: freersakey() frees all the things pointed to _from_ an RSAKey
structure, but not the structure itself.
Thanks to Ralf Esswein for pointing out the slip, which should have
been corrected as part of 867e69187.
It didn't cause a test failure, because the ecdsa and eddsa structs
are currently so similar in layout (they differ only in the pointed-to
public key point structure, and on this failure path that pointer is
NULL anyway). As a result it's rather hard to add a regression test
against it failing again; I think I just have to chalk this one up to
the lack of OO-aware type checking when doing this kind of manual
vtable management in C, unfortunately.
Normally I never notice warnings in this build, because it runs inside
bob and dumps all the warnings in a part of the build log I never look
at. But I've had these fixes lying around for a while and should
commit them.
They're benign: all we need is an explicit declaration of strtoumax to
replace the one that stdlib.h doesn't provide, and a couple more of
those annoying NO_TYPECHECK modifiers on GET_WINDOWS_FUNCTION calls.
Having decided that the terminal's local echo setting shouldn't be
allowed to propagate through to termios, I think the local edit
setting shouldn't either. Also, no other terminal emulator I know
seems to implement this sequence, and if you enable it, things get
very confused in general. I think it's generally better off absent; if
somebody turns out to have been using it, then we'll at least be able
to find out what it's good for.
This sequence (ESC[12l, ESC[12h) enables and disables local echo in
the terminal. We were previously implementing it by gatewaying it
directly through to the local echo facility in the line discipline,
which in turn would pass it on to the terminal it was running in (if
it was Plink).
This seems to be at odds with how other terminals do it: they treat
SRM as its own entirely separate thing, in which the terminal
_emulator_ performs its own echoing of input keypress data,
independently of whether the Unix terminal device (or closest
equivalent) is doing the same thing or not.
Now we're doing it the same way as everyone else (or at least I think
so): the new internal terminal function that the term_keyinput pair
feed to is also implementing SRM-driven local echo as another of its
side effects. One observable effect is that SRM now doesn't interfere
with the termios settings of the terminal it's running in; another is
that the echo now only applies to real keypress data, and not
sequences auto-generated by the terminal.
The functions that previously lived in it now live in terminal.c
itself; they've been renamed term_keyinput and term_keyinputw, and
their function is to add data to the terminal's user input buffer from
a char or wchar_t string respectively.
They sit more comfortably in terminal.c anyway, because their whole
point is to translate into the character encoding that the terminal is
currently configured to use. Also, making them part of the terminal
code means they can also take care of calling term_seen_key_event(),
which simplifies most of the call sites in the GTK and Windows front
ends.
Generation of text _inside_ terminal.c, from responses to query escape
sequences, is therefore not done by calling those external entry
points: we send those responses directly to the ldisc, so that they
don't count as keypresses for all the user-facing purposes like bell
overload handling and scrollback reset. To make _that_ convenient,
I've arranged that most of the code that previously lived in
lpage_send and luni_send is now in separate translation functions, so
those can still be called from situations where you're not going to do
the default thing with the translated data.
(However, pasted data _does_ still count as close enough to a keypress
to call term_seen_key_event - but it clears the 'interactive' flag
when the data is passed on to the line discipline, which tweaks a
minor detail of control-char handling in line ending mode but mostly
just means pastes aren't interrupted.)
They're the intermediate product between preprocessing a resource file
and feeding it to the resource compiler proper, so they certainly need
cleaning.
The idea was that if we found a host key already cached for the given
algorithm, we should remove it from the tree and free it. In fact, I
forgot the 'remove from tree' step, so we freed a key that was still
linked from the tree234. Depending on luck and platform, this could
either cause a segfault, or an assertion failure on the subsequent
attempt to add the new key in place of the not-removed-after-all old
one.
If the user closes the Change Settings dialog box using the close
button provided by the window manager (or some analogous thing that
generates the same X11 event) instead of using the Cancel button
within the dialog itself, then after_change_settings_dialog() gets
called with retval < 0, which triggers an early return path in which
we forget to call unregister_dialog(), and as a result, assertions
fail all over the place the _next_ time you try to put up a Change
Settings dialog.
Also, the early return causes ctx.newconf to be memory-leaked. So
rather than just moving the unregister_dialog() call to above the
early return, a better fix is to remove the early return completely,
and simply treat retval<0 the same as retval==0: it doesn't matter
_how_ the user closed the config box without committing the changes,
it only matters that they did.
I needed to send a server-side disconnect message in a test just now,
which caused me to notice that I'd never got round to filling in
ssh_proto_error properly. Now I've done that, and added the associated
machinery for waiting for the remote EOF and winding up the SSH
connection.
The rest of the error functions are still stubs, though.
Without this missing line, if you tried to download a file in SCP mode
using the -p option, the payload of the 'T' command (file times) would
still be sitting in act->buf when we went back round the loop, so the
payload of the followup 'C' or 'D' would be appended to it, leading to
a massive misparse and a complaint about illegal file renaming.
When I start Uppity in listening mode, it's useful to have it
acknowledge that it _has_ started up in that mode, and isn't (for
example) stuck somewhere in my local wrapper script.
Coverity complained that it was wrong to use rand() in a security
context, and although in this case it's _very_ marginal, I can't
actually disagree that the choice of which light to light up to avoid
giving information about passphrase length is a security context.
So, no more rand(); instead we instantiate a shiny Fortuna PRNG
instance, seed it in more or less the usual way, and use that as an
overkill-level method of choosing which light to light up next.
(Acknowledging that this is a slightly unusual application and less
critical than most, I don't actually put the passphrase characters
themselves into the PRNG, and I don't use a random-seed file.)
It's identical in uxnoise and winnoise, being written entirely in
terms of existing cross-platform functions. Might as well centralise
it into sshrand.c.
Coverity points out that under a carefully checked compound condition,
we assign s->gss_cred_expiry to itself. :-)
Before commit 2ca0070f8 split the SSH code into separate modules, that
assignment read 'ssh->gss_cred_expiry = s->gss_cred_expiry', and the
point was that the value had to be copied out of the private state of
the transport-layer coroutine into the general state of the SSH
protocol as a whole so that ssh2_timer() (or rather, ssh2_gss_update,
called from ssh2_timer) could check it.
But now that timer function is _also_ local to the transport layer,
and shares the transport coroutine's state. So on the one hand, we
could just remove that assignment, folding the two variables into one;
on the other hand, we could reinstate the two variables and the
explicit 'commit' action that copies one into the other. The question
is, which?
Any _successful_ key exchange must have gone through that commit step
of the kex that copied the one into the other. And an unsuccessful key
exchange always aborts the whole SSH connection - there's nothing in
the SSH transport protocol that allows recovering from a failed rekey
by carrying on with the previous session keys. So if I made two
variables, then between key exchanges, they would always have the same
value anyway.
So the only effect of the separation between those variables is
_during_ a GSS key exchange: should the version of gss_cred_expiry
checked by the timer function be the new one, or the old one?
This can only make a difference if a timer goes off _during_ a rekey,
and happens to notice that the GSS credentials have expired. And
actually I think it's mildly _better_ if that checks the new expiry
time rather than the old one - otherwise, the timer routine would set
the flag indicating a need for another rekey, when actually the
currently running one was going to get the job done anyway.
So, in summary, I think conflating those two variables is actually an
improvement to the code. I did it by accident, but if I'd realised, I
would have done the same thing on purpose!
Hence, I've just removed the pointless self-assignment, with no
functional change.
If the input key_wanted field were set to an out-of-range value, the
parent structure retkey would not become NULL as a whole: instead, its
field 'key' would never be set to a non-null pointer. So I was testing
the wrong pointer.
Fortunately, this couldn't have come up, because we don't actually
have any support yet for loading the nth key from an OpenSSH new-style
key file containing more than one. So key_wanted was always set to 0
by load_openssh_new_key(), which also checked that the file's key
count was exactly 1 (guarding against the possibility that even 0
might have been an out-of-range index).
Coverity points out that in (the misnamed) psftp_main(), I allow for
the possibility that 'backend' might be null, up until the final
unconditional backend_free().
I haven't actually been able to find a way to make backend() come out
as NULL at all in that part of the code: obvious failure modes like
trying to scp to a nonexistent host trigger a connection_fatal() which
terminates the program without going through that code anyway. But I'm
not confident I tried everything, so I can't be sure there _isn't_ a
way to get NULL to that location, so let's put in the missing check
just in case.
In load_openssh_new_key, ret->keyblob is never null any more: now that
it's a strbuf instead of a bare realloc()ed string, it's at worst an
_empty_ strbuf. Secondly, as Coverity pointed out, the null pointer
check was too late to do any good in the first place - the previous
clause of the same if condition would already have dereferenced it!
In test_mac in the auxiliary testsc program, there's no actual reason
I would ever have called it with null ssh_mac pointer - it would mean
'don't test anything'! Looks as if I just copy-pasted the MAC parts of
the cipher+MAC setup code in test_cipher.
We have to use the file name we just failed to open to format an error
message _before_ freeing it, not after. If that use-after-free managed
not to cause a crash, we'd also leak the file descriptor 'outfd'.
Both spotted by Coverity (which is probably the first thing in years
to look seriously at any of the code designed for Ben's AFL exercise).
Coverity pointed out a call to sftp_pkt_free(pktin) straight after an
unconditional return statement, which is obviously silly.
Fortunately, it was benign: pktin was freed anyway by the function
being called _by_ the return statement, so the unreachability of this
free operation prevented a double-free rather than allowing a memory
leak. So the right fix is to just remove the code, rather than
arranging for it to be run.
It seems to have caused a compile error, apparently due to a mismatch
between compiler predefines (__SSE2__) and header files (emmintrin.h).
The easiest thing is to just turn off the hardware version completely,
so the rest of the code can still be scanned.
In commit 188e2525c I removed it from Winelib builds in general on the
grounds that Winelib had acquired that function. But now that I come
to try a Coverity build, I find that that is still one Winelib build
that does fail if I call SecureZeroMemory.
The structure field 'lengths' in 'struct zlib_decompress_ctx' was the
right length for the amount of data you might _sensibly_ want to put
in it, but two bytes shorter than the amount that the compressed block
header allows someone to _physically_ try to put into it. Now it has
the full maximum length.
The previous overrun could only reach two bytes beyond the end of the
array, and in every target architecture I know of, those two bytes
would have been structure padding, so it wasn't causing any real trouble.
In Deflate, both the literal/length and distance Huffman trees are
physically capable of encoding two symbol ids beyond the number that
the spec assigns any actual meaning to: a compressed block header can
specify code lengths for those two extra symbols if it wants to, in
which case those codes will be added to the Huffman tree (in
particular, will affect the encoding of everything else), but then
should not actually use those codes.
Our zlib decoder was silently ignoring the two invalid codes in the
literal/length tree, but treating the two invalid codes in the
distance tree as a fatal decoding error. That seems inconsistent. Now
we treat both as fatal decode errors.
There are already three tediously similar functions that wrap a NULL
check around some existing function to return one or another kind of
pointer, and I'm about to want to add another one. Make a macro so
that it's easy to make more functions identical to these three.
If a malformed private key (received through any channel, whether
loaded from a disk file or over the wire by Pageant) specifies either
of the modulus's prime factors p or q as 1, then when rsa_verify tries
to check that e*d is congruent to 1 mod (p-1) and mod (q-1), that
check will involve a division by zero, which in this context means
failing an assertion in mp_divmod.
We were already doing a preliminary check that neither of p and q was
actually zero. Now that check is strengthened to check that p-1 and
q-1 are not zero either.
Obviously we can't do that by inverting the hash function itself, but
if the user provides one or more host names on the command line that
they're expecting to appear in the file, we can at least compare the
stored hashes against those.
This change gives us an automatic --help option, which is always
useful for a script used very rarely. It also makes it that much
easier to add extra options.
Now most of the program consists of function and class definitions,
and the code that activates it all is localised in one place at the
bottom instead of interleaved between the definitions.
We support it in the ECC code proper these days, as of the bignum
rewrite in commit 25b034ee3. So we should support it in this auxiliary
script too, and fortunately, there's no real difficulty in doing so
because I already had some Python code kicking around in
test/eccref.py for taking modular square roots.
We went to the trouble of getting standard C99 format strings with
__USE_MINGW_ANSI_STDIO, but then used _vsnprintf, which bypasses them
and implements the non-standard Microsoft format strings, leading to
incorrect behaviour.
Ubuntu 14.04's mingw-w64 at least defines a vsnprintf() in all
circumstances, so we should use it.