It's identical in uxnoise and winnoise, being written entirely in
terms of existing cross-platform functions. Might as well centralise
it into sshrand.c.
Coverity points out that under a carefully checked compound condition,
we assign s->gss_cred_expiry to itself. :-)
Before commit 2ca0070f8 split the SSH code into separate modules, that
assignment read 'ssh->gss_cred_expiry = s->gss_cred_expiry', and the
point was that the value had to be copied out of the private state of
the transport-layer coroutine into the general state of the SSH
protocol as a whole so that ssh2_timer() (or rather, ssh2_gss_update,
called from ssh2_timer) could check it.
But now that timer function is _also_ local to the transport layer,
and shares the transport coroutine's state. So on the one hand, we
could just remove that assignment, folding the two variables into one;
on the other hand, we could reinstate the two variables and the
explicit 'commit' action that copies one into the other. The question
is, which?
Any _successful_ key exchange must have gone through that commit step
of the kex that copied the one into the other. And an unsuccessful key
exchange always aborts the whole SSH connection - there's nothing in
the SSH transport protocol that allows recovering from a failed rekey
by carrying on with the previous session keys. So if I made two
variables, then between key exchanges, they would always have the same
value anyway.
So the only effect of the separation between those variables is
_during_ a GSS key exchange: should the version of gss_cred_expiry
checked by the timer function be the new one, or the old one?
This can only make a difference if a timer goes off _during_ a rekey,
and happens to notice that the GSS credentials have expired. And
actually I think it's mildly _better_ if that checks the new expiry
time rather than the old one - otherwise, the timer routine would set
the flag indicating a need for another rekey, when actually the
currently running one was going to get the job done anyway.
So, in summary, I think conflating those two variables is actually an
improvement to the code. I did it by accident, but if I'd realised, I
would have done the same thing on purpose!
Hence, I've just removed the pointless self-assignment, with no
functional change.
If the input key_wanted field were set to an out-of-range value, the
parent structure retkey would not become NULL as a whole: instead, its
field 'key' would never be set to a non-null pointer. So I was testing
the wrong pointer.
Fortunately, this couldn't have come up, because we don't actually
have any support yet for loading the nth key from an OpenSSH new-style
key file containing more than one. So key_wanted was always set to 0
by load_openssh_new_key(), which also checked that the file's key
count was exactly 1 (guarding against the possibility that even 0
might have been an out-of-range index).
Coverity points out that in (the misnamed) psftp_main(), I allow for
the possibility that 'backend' might be null, up until the final
unconditional backend_free().
I haven't actually been able to find a way to make backend() come out
as NULL at all in that part of the code: obvious failure modes like
trying to scp to a nonexistent host trigger a connection_fatal() which
terminates the program without going through that code anyway. But I'm
not confident I tried everything, so I can't be sure there _isn't_ a
way to get NULL to that location, so let's put in the missing check
just in case.
In load_openssh_new_key, ret->keyblob is never null any more: now that
it's a strbuf instead of a bare realloc()ed string, it's at worst an
_empty_ strbuf. Secondly, as Coverity pointed out, the null pointer
check was too late to do any good in the first place - the previous
clause of the same if condition would already have dereferenced it!
In test_mac in the auxiliary testsc program, there's no actual reason
I would ever have called it with null ssh_mac pointer - it would mean
'don't test anything'! Looks as if I just copy-pasted the MAC parts of
the cipher+MAC setup code in test_cipher.
We have to use the file name we just failed to open to format an error
message _before_ freeing it, not after. If that use-after-free managed
not to cause a crash, we'd also leak the file descriptor 'outfd'.
Both spotted by Coverity (which is probably the first thing in years
to look seriously at any of the code designed for Ben's AFL exercise).
Coverity pointed out a call to sftp_pkt_free(pktin) straight after an
unconditional return statement, which is obviously silly.
Fortunately, it was benign: pktin was freed anyway by the function
being called _by_ the return statement, so the unreachability of this
free operation prevented a double-free rather than allowing a memory
leak. So the right fix is to just remove the code, rather than
arranging for it to be run.
It seems to have caused a compile error, apparently due to a mismatch
between compiler predefines (__SSE2__) and header files (emmintrin.h).
The easiest thing is to just turn off the hardware version completely,
so the rest of the code can still be scanned.
In commit 188e2525c I removed it from Winelib builds in general on the
grounds that Winelib had acquired that function. But now that I come
to try a Coverity build, I find that that is still one Winelib build
that does fail if I call SecureZeroMemory.
The structure field 'lengths' in 'struct zlib_decompress_ctx' was the
right length for the amount of data you might _sensibly_ want to put
in it, but two bytes shorter than the amount that the compressed block
header allows someone to _physically_ try to put into it. Now it has
the full maximum length.
The previous overrun could only reach two bytes beyond the end of the
array, and in every target architecture I know of, those two bytes
would have been structure padding, so it wasn't causing any real trouble.
In Deflate, both the literal/length and distance Huffman trees are
physically capable of encoding two symbol ids beyond the number that
the spec assigns any actual meaning to: a compressed block header can
specify code lengths for those two extra symbols if it wants to, in
which case those codes will be added to the Huffman tree (in
particular, will affect the encoding of everything else), but then
should not actually use those codes.
Our zlib decoder was silently ignoring the two invalid codes in the
literal/length tree, but treating the two invalid codes in the
distance tree as a fatal decoding error. That seems inconsistent. Now
we treat both as fatal decode errors.
There are already three tediously similar functions that wrap a NULL
check around some existing function to return one or another kind of
pointer, and I'm about to want to add another one. Make a macro so
that it's easy to make more functions identical to these three.
If a malformed private key (received through any channel, whether
loaded from a disk file or over the wire by Pageant) specifies either
of the modulus's prime factors p or q as 1, then when rsa_verify tries
to check that e*d is congruent to 1 mod (p-1) and mod (q-1), that
check will involve a division by zero, which in this context means
failing an assertion in mp_divmod.
We were already doing a preliminary check that neither of p and q was
actually zero. Now that check is strengthened to check that p-1 and
q-1 are not zero either.
Obviously we can't do that by inverting the hash function itself, but
if the user provides one or more host names on the command line that
they're expecting to appear in the file, we can at least compare the
stored hashes against those.
This change gives us an automatic --help option, which is always
useful for a script used very rarely. It also makes it that much
easier to add extra options.
Now most of the program consists of function and class definitions,
and the code that activates it all is localised in one place at the
bottom instead of interleaved between the definitions.
We support it in the ECC code proper these days, as of the bignum
rewrite in commit 25b034ee3. So we should support it in this auxiliary
script too, and fortunately, there's no real difficulty in doing so
because I already had some Python code kicking around in
test/eccref.py for taking modular square roots.
We went to the trouble of getting standard C99 format strings with
__USE_MINGW_ANSI_STDIO, but then used _vsnprintf, which bypasses them
and implements the non-standard Microsoft format strings, leading to
incorrect behaviour.
Ubuntu 14.04's mingw-w64 at least defines a vsnprintf() in all
circumstances, so we should use it.
This bug and the one in the previous commit combined to mean that when
an SSH-1 connection through Uppity is terminated, the pty backend for
its main session channel was never cleaned up. (Firstly because
ssh1_connection_free never got called, and secondly because that in
turn forgot to free its mainchan.)
The effect of that in turn was that a _subsequent_ connection to the
same Uppity (using the new listening-socket mode) would likely reuse
the same fd for its pty, and the insertions into the ptyfds tree in
uxpty.c would silently fail because an existing Pty was already
occupying them, leading to a segfault when that Pty in turn responded
to events on a pty it didn't really own and tried to call back to a
seat that didn't exist any more.
We went through the tree234 of ordinary channels freeing the thing at
the other end of each one, but of course in SSH-1 the main session
channel is not stored in that tree due to its totally different shape
in the protocol. So naturally I forgot to free _that_ one.
This exactly replicates the way it's done in SSH-2: at the start of
the connection layer we set the trust status to untrusted, and if that
reports that it didn't give any indication to the user, we fall back
to presenting an interactive anti-spoofing prompt.
I don't know how I forgot to do that in SSH-1, and even more, how we
haven't noticed for a month. We noticed the same bug in _Rlogin_
within a day of the 0.71 release, after all!
This is another case where a stale pointer bug could have arisen from
a toplevel callback going off after an object was freed.
But here, just adding delete_callbacks_for_context wouldn't help. The
actual context parameter for the callback wasn't mainchan itself; it
was a tiny separate object, allocated to hold just the parameters of
the function the callback wanted to call. So if _those_ parameters
became stale before the callback was triggered, then even
delete_callbacks_for_context wouldn't have been able to help.
Also, mainchan itself would have been freed moments after this
callback was queued, so moving it to be a callback on mainchan itself
wouldn't help.
Solution: move the callback right out to Ssh, by introducing a new
ssh_sw_abort_deferred() which is just like ssh_sw_abort but does its
main work in a toplevel callback. Then ssh.c's existing call to
delete_callbacks_for_context will clean it up if necessary.
Having explicitly _stated_ in commit 4dcc0fddf the principle that if
you ever queue a toplevel callback on a freeable object then you
should also call delete_callbacks_for_context on that object before
freeing it, I realised I'd never actually gone through and checked
methodically at every call site of queue_toplevel_callback. So I did,
and naturally, I found several missing ones.
We use a toplevel callback in the SSH-2 connection layer for checking
whether it's time to close the whole SSH session after a channel
closes. If the channel close itself, or something close enough to it,
involves a protocol error severe enough to abort the session and free
the connection layer, then that callback can fire anyway on stale
data.
The fix is the same as it always is in these situations: any object
which is ever used as the context parameter to queue_toplevel_callback
should also be passed to delete_callbacks_for_context before freeing it.
Because SSH-1 is a very niche interest these days. Mostly this affects
the public key documentation.
Also, a couple of unrelated concessions to modernity.
Move "Response to remote title query" next to "Disable remote-controlled
window title changing"; that seems more logical, and matches the
documentation.
- Mention public key authentication
- Define and describe the "terminal window"
- Mention trust sigils
- Describe here the lack of feedback in password prompts, as well as in
the FAQ
If you try to generate (say) a 2049-bit RSA key, then primegen will
try to generate a 1025-bit prime. It will do it by making a random
1024-bit mp_int (that is, one strictly _less_ than 2^1024), and then
trying to set bit 1024. But that will fail an assertion in mp_set_bit,
because the number of random bits is a multiple of BIGNUM_INT_BITS, so
an mp_int of the minimum size that can hold the random bits is not
quite big enough to hold the extra bit at the top.
Fix: change the strategy in primegen so that we allocate the mp_int
large enough to hold even the top bit, and copy in the random numbers
via mp_or_into.
There's a second bug hiding behind that one. If the key has odd size,
then the two primes are generated with different bit lengths. If the
overall key size is congruent to 1 mod (2*BIGNUM_INT_BITS), then the
two primes will be allocated as mp_ints with different numbers of
words, leading to another assertion failure in the mp_cond_swap that
sorts the primes into a consistent order.
Fix for that one: if the primes are being generated different bit
lengths, then we arrange those lengths to be already in the right
order, and replace the mp_cond_swap with an assert() that checks the
ordering is already correct.
Combined effect: now you should be able to successfully generate a
2049-bit key without assertion failures.
I had another look just now at the mysterious AES intrinsics in
VS2017's arm_neon.h that have the wrong number of operands, and this
time I noticed that they're in one branch of an #ifdef whose other
branch looks a lot more sensible. Enabled the #define that switches to
the working branch.
The recent rewriting in both the GTK and Windows keyboard handlers
left the keypad 'Enter' key in a bad state, when no override is
enabled that causes it to generate an escape sequence.
On Windows, a series of fallbacks was causing it to generate \r
regardless of configuration, whereas in Telnet mode it should default
to generating the special Telnet new-line sequence, and in response to
ESC[20h (enabling term->cr_lf_return) it should generate \r\n.
On GTK, it wasn't generating anything _at all_, and also, I can't see
any evidence that the GTK keyboard handler had ever remembered to
implement the cr_lf_return mode.
Now Keypad Enter in non-escape-sequence mode should behave just like
Return, on both platforms.