The idea was that if we found a host key already cached for the given
algorithm, we should remove it from the tree and free it. In fact, I
forgot the 'remove from tree' step, so we freed a key that was still
linked from the tree234. Depending on luck and platform, this could
either cause a segfault, or an assertion failure on the subsequent
attempt to add the new key in place of the not-removed-after-all old
one.
If the user closes the Change Settings dialog box using the close
button provided by the window manager (or some analogous thing that
generates the same X11 event) instead of using the Cancel button
within the dialog itself, then after_change_settings_dialog() gets
called with retval < 0, which triggers an early return path in which
we forget to call unregister_dialog(), and as a result, assertions
fail all over the place the _next_ time you try to put up a Change
Settings dialog.
Also, the early return causes ctx.newconf to be memory-leaked. So
rather than just moving the unregister_dialog() call to above the
early return, a better fix is to remove the early return completely,
and simply treat retval<0 the same as retval==0: it doesn't matter
_how_ the user closed the config box without committing the changes,
it only matters that they did.
I needed to send a server-side disconnect message in a test just now,
which caused me to notice that I'd never got round to filling in
ssh_proto_error properly. Now I've done that, and added the associated
machinery for waiting for the remote EOF and winding up the SSH
connection.
The rest of the error functions are still stubs, though.
Without this missing line, if you tried to download a file in SCP mode
using the -p option, the payload of the 'T' command (file times) would
still be sitting in act->buf when we went back round the loop, so the
payload of the followup 'C' or 'D' would be appended to it, leading to
a massive misparse and a complaint about illegal file renaming.
When I start Uppity in listening mode, it's useful to have it
acknowledge that it _has_ started up in that mode, and isn't (for
example) stuck somewhere in my local wrapper script.
Coverity complained that it was wrong to use rand() in a security
context, and although in this case it's _very_ marginal, I can't
actually disagree that the choice of which light to light up to avoid
giving information about passphrase length is a security context.
So, no more rand(); instead we instantiate a shiny Fortuna PRNG
instance, seed it in more or less the usual way, and use that as an
overkill-level method of choosing which light to light up next.
(Acknowledging that this is a slightly unusual application and less
critical than most, I don't actually put the passphrase characters
themselves into the PRNG, and I don't use a random-seed file.)
It's identical in uxnoise and winnoise, being written entirely in
terms of existing cross-platform functions. Might as well centralise
it into sshrand.c.
Coverity points out that under a carefully checked compound condition,
we assign s->gss_cred_expiry to itself. :-)
Before commit 2ca0070f8 split the SSH code into separate modules, that
assignment read 'ssh->gss_cred_expiry = s->gss_cred_expiry', and the
point was that the value had to be copied out of the private state of
the transport-layer coroutine into the general state of the SSH
protocol as a whole so that ssh2_timer() (or rather, ssh2_gss_update,
called from ssh2_timer) could check it.
But now that timer function is _also_ local to the transport layer,
and shares the transport coroutine's state. So on the one hand, we
could just remove that assignment, folding the two variables into one;
on the other hand, we could reinstate the two variables and the
explicit 'commit' action that copies one into the other. The question
is, which?
Any _successful_ key exchange must have gone through that commit step
of the kex that copied the one into the other. And an unsuccessful key
exchange always aborts the whole SSH connection - there's nothing in
the SSH transport protocol that allows recovering from a failed rekey
by carrying on with the previous session keys. So if I made two
variables, then between key exchanges, they would always have the same
value anyway.
So the only effect of the separation between those variables is
_during_ a GSS key exchange: should the version of gss_cred_expiry
checked by the timer function be the new one, or the old one?
This can only make a difference if a timer goes off _during_ a rekey,
and happens to notice that the GSS credentials have expired. And
actually I think it's mildly _better_ if that checks the new expiry
time rather than the old one - otherwise, the timer routine would set
the flag indicating a need for another rekey, when actually the
currently running one was going to get the job done anyway.
So, in summary, I think conflating those two variables is actually an
improvement to the code. I did it by accident, but if I'd realised, I
would have done the same thing on purpose!
Hence, I've just removed the pointless self-assignment, with no
functional change.
If the input key_wanted field were set to an out-of-range value, the
parent structure retkey would not become NULL as a whole: instead, its
field 'key' would never be set to a non-null pointer. So I was testing
the wrong pointer.
Fortunately, this couldn't have come up, because we don't actually
have any support yet for loading the nth key from an OpenSSH new-style
key file containing more than one. So key_wanted was always set to 0
by load_openssh_new_key(), which also checked that the file's key
count was exactly 1 (guarding against the possibility that even 0
might have been an out-of-range index).
Coverity points out that in (the misnamed) psftp_main(), I allow for
the possibility that 'backend' might be null, up until the final
unconditional backend_free().
I haven't actually been able to find a way to make backend() come out
as NULL at all in that part of the code: obvious failure modes like
trying to scp to a nonexistent host trigger a connection_fatal() which
terminates the program without going through that code anyway. But I'm
not confident I tried everything, so I can't be sure there _isn't_ a
way to get NULL to that location, so let's put in the missing check
just in case.
In load_openssh_new_key, ret->keyblob is never null any more: now that
it's a strbuf instead of a bare realloc()ed string, it's at worst an
_empty_ strbuf. Secondly, as Coverity pointed out, the null pointer
check was too late to do any good in the first place - the previous
clause of the same if condition would already have dereferenced it!
In test_mac in the auxiliary testsc program, there's no actual reason
I would ever have called it with null ssh_mac pointer - it would mean
'don't test anything'! Looks as if I just copy-pasted the MAC parts of
the cipher+MAC setup code in test_cipher.
We have to use the file name we just failed to open to format an error
message _before_ freeing it, not after. If that use-after-free managed
not to cause a crash, we'd also leak the file descriptor 'outfd'.
Both spotted by Coverity (which is probably the first thing in years
to look seriously at any of the code designed for Ben's AFL exercise).
Coverity pointed out a call to sftp_pkt_free(pktin) straight after an
unconditional return statement, which is obviously silly.
Fortunately, it was benign: pktin was freed anyway by the function
being called _by_ the return statement, so the unreachability of this
free operation prevented a double-free rather than allowing a memory
leak. So the right fix is to just remove the code, rather than
arranging for it to be run.
It seems to have caused a compile error, apparently due to a mismatch
between compiler predefines (__SSE2__) and header files (emmintrin.h).
The easiest thing is to just turn off the hardware version completely,
so the rest of the code can still be scanned.
In commit 188e2525c I removed it from Winelib builds in general on the
grounds that Winelib had acquired that function. But now that I come
to try a Coverity build, I find that that is still one Winelib build
that does fail if I call SecureZeroMemory.
The structure field 'lengths' in 'struct zlib_decompress_ctx' was the
right length for the amount of data you might _sensibly_ want to put
in it, but two bytes shorter than the amount that the compressed block
header allows someone to _physically_ try to put into it. Now it has
the full maximum length.
The previous overrun could only reach two bytes beyond the end of the
array, and in every target architecture I know of, those two bytes
would have been structure padding, so it wasn't causing any real trouble.
In Deflate, both the literal/length and distance Huffman trees are
physically capable of encoding two symbol ids beyond the number that
the spec assigns any actual meaning to: a compressed block header can
specify code lengths for those two extra symbols if it wants to, in
which case those codes will be added to the Huffman tree (in
particular, will affect the encoding of everything else), but then
should not actually use those codes.
Our zlib decoder was silently ignoring the two invalid codes in the
literal/length tree, but treating the two invalid codes in the
distance tree as a fatal decoding error. That seems inconsistent. Now
we treat both as fatal decode errors.
There are already three tediously similar functions that wrap a NULL
check around some existing function to return one or another kind of
pointer, and I'm about to want to add another one. Make a macro so
that it's easy to make more functions identical to these three.
If a malformed private key (received through any channel, whether
loaded from a disk file or over the wire by Pageant) specifies either
of the modulus's prime factors p or q as 1, then when rsa_verify tries
to check that e*d is congruent to 1 mod (p-1) and mod (q-1), that
check will involve a division by zero, which in this context means
failing an assertion in mp_divmod.
We were already doing a preliminary check that neither of p and q was
actually zero. Now that check is strengthened to check that p-1 and
q-1 are not zero either.
Obviously we can't do that by inverting the hash function itself, but
if the user provides one or more host names on the command line that
they're expecting to appear in the file, we can at least compare the
stored hashes against those.
This change gives us an automatic --help option, which is always
useful for a script used very rarely. It also makes it that much
easier to add extra options.
Now most of the program consists of function and class definitions,
and the code that activates it all is localised in one place at the
bottom instead of interleaved between the definitions.
We support it in the ECC code proper these days, as of the bignum
rewrite in commit 25b034ee3. So we should support it in this auxiliary
script too, and fortunately, there's no real difficulty in doing so
because I already had some Python code kicking around in
test/eccref.py for taking modular square roots.
We went to the trouble of getting standard C99 format strings with
__USE_MINGW_ANSI_STDIO, but then used _vsnprintf, which bypasses them
and implements the non-standard Microsoft format strings, leading to
incorrect behaviour.
Ubuntu 14.04's mingw-w64 at least defines a vsnprintf() in all
circumstances, so we should use it.
This bug and the one in the previous commit combined to mean that when
an SSH-1 connection through Uppity is terminated, the pty backend for
its main session channel was never cleaned up. (Firstly because
ssh1_connection_free never got called, and secondly because that in
turn forgot to free its mainchan.)
The effect of that in turn was that a _subsequent_ connection to the
same Uppity (using the new listening-socket mode) would likely reuse
the same fd for its pty, and the insertions into the ptyfds tree in
uxpty.c would silently fail because an existing Pty was already
occupying them, leading to a segfault when that Pty in turn responded
to events on a pty it didn't really own and tried to call back to a
seat that didn't exist any more.
We went through the tree234 of ordinary channels freeing the thing at
the other end of each one, but of course in SSH-1 the main session
channel is not stored in that tree due to its totally different shape
in the protocol. So naturally I forgot to free _that_ one.
This exactly replicates the way it's done in SSH-2: at the start of
the connection layer we set the trust status to untrusted, and if that
reports that it didn't give any indication to the user, we fall back
to presenting an interactive anti-spoofing prompt.
I don't know how I forgot to do that in SSH-1, and even more, how we
haven't noticed for a month. We noticed the same bug in _Rlogin_
within a day of the 0.71 release, after all!
This is another case where a stale pointer bug could have arisen from
a toplevel callback going off after an object was freed.
But here, just adding delete_callbacks_for_context wouldn't help. The
actual context parameter for the callback wasn't mainchan itself; it
was a tiny separate object, allocated to hold just the parameters of
the function the callback wanted to call. So if _those_ parameters
became stale before the callback was triggered, then even
delete_callbacks_for_context wouldn't have been able to help.
Also, mainchan itself would have been freed moments after this
callback was queued, so moving it to be a callback on mainchan itself
wouldn't help.
Solution: move the callback right out to Ssh, by introducing a new
ssh_sw_abort_deferred() which is just like ssh_sw_abort but does its
main work in a toplevel callback. Then ssh.c's existing call to
delete_callbacks_for_context will clean it up if necessary.
Having explicitly _stated_ in commit 4dcc0fddf the principle that if
you ever queue a toplevel callback on a freeable object then you
should also call delete_callbacks_for_context on that object before
freeing it, I realised I'd never actually gone through and checked
methodically at every call site of queue_toplevel_callback. So I did,
and naturally, I found several missing ones.
We use a toplevel callback in the SSH-2 connection layer for checking
whether it's time to close the whole SSH session after a channel
closes. If the channel close itself, or something close enough to it,
involves a protocol error severe enough to abort the session and free
the connection layer, then that callback can fire anyway on stale
data.
The fix is the same as it always is in these situations: any object
which is ever used as the context parameter to queue_toplevel_callback
should also be passed to delete_callbacks_for_context before freeing it.
Because SSH-1 is a very niche interest these days. Mostly this affects
the public key documentation.
Also, a couple of unrelated concessions to modernity.