When it calls through ocr->handler() to process the response to a
channel request, sometimes that call ends up back in the main SSH-2
authconn coroutine, and sometimes _that_ will call bomb_out(), which
closes the whole SSH connection and frees all the channels - so that
when control returns back up the call stack to
ssh2_msg_channel_response itself which continues working with the
channel it was passed, it's using freed memory and things go badly.
This is the sort of thing I'd _like_ to fix using some kind of
large-scale refactoring along the lines of moving all the actual
free() calls out into top-level callbacks, so that _any_ function
which is holding a pointer to something can rely on that pointer still
being valid after it calls a subroutine. But I haven't worked out all
the details of how that system should work, and doubtless it will turn
out to have problems of its own once I do, so here's a point fix which
simply checks if the whole SSH session has been closed (which is easy
- much easier than checking if that _channel_ structure still exists)
and fixes the immediate bug.
(I think this is the real fix for the problem reported by the user I
mention in commit f0126dd19, because I actually got the details wrong
in the log message for that previous commit: the user's SSH server
wasn't rejecting the _opening_ of the main session channel, it was
rejecting the "shell" channel request, so this code path was the one
being exercised. Still, the other bug was real too, so no harm done!)
A user reported a nonsensical assertion failure (claiming that
ssh->version != 2) which suggested that a channel had somehow outlived
its parent Ssh in the situation where the opening of the main session
channel is rejected by the server. Checking with valgrind suggested
that things start to go wrong at the point where we free the half-set-
up ssh->mainchan before having filled in its type field, so that the
switch in ssh_channel_close_local() picks an arbitrary wrong action.
I haven't reproduced the same failure the user reported, but with this
change, Unix plink is now valgrind-clean in that failure situation.
This has been a FIXME in the code for ages, because back when the main
channel was always a pty session or a program run in a pipe, there
weren't that many circumstances in which the actual CHANNEL_OPEN could
return failure, so it never seemed like a priority to get round to
pulling the error information out of the CHANNEL_OPEN_FAILURE response
message and including it in PuTTY or Plink's local error message.
However, 'plink -nc' is the real reason why this is actually
important; if you tell the SSH server to make a direct-tcpip network
connection as its main channel, then that can fail for all the usual
network-unreliability reasons, and you actually do want to know which
(did you misspell the hostname, or is the target server refusing
connections, or has network connectivity failed?). This actually bit
me today when I had such a network failure, and had to debug it by
pulling that information manually out of a packet log. Time to
eliminate that FIXME.
So I've pulled the error-extracting code out of the previous handler
for OPEN_FAILURE on non-main channels into a separate function, and
arranged to call that function if the main channel open fails too. In
the process I've made a couple of minor tweaks, e.g. if the server
sends back a reason code we haven't heard of, we say _what_ that
reason code was, and also we at least make a token effort to spot if
we see a packet other than OPEN_{CONFIRMATION,FAILURE} reaching the
main loop in response to the main channel-open.
2ce0b680c inadvertently removed this ability in trying to ensure that
everyone got the new IUTF8 mode by default; you could remove a mode from
the list in the UI, but this would just revert PuTTY to its default.
The UI and storage have been revamped; the storage format now explicitly
says when a mode is not to be sent, and the configuration UI always
shows all modes known to PuTTY; if a mode is not to be sent it now shows
up as "(don't send)" in the list.
Old saved settings are migrated so as to preserve previous removals of
longstanding modes, while automatically adding IUTF8.
(In passing, this removes a bug where pressing the 'Remove' button of
the previous UI would populate the value edit box with garbage.)
I think an agent sending a string length exceeding the buffer bounds
by less than 4 could have made PuTTY read beyond its own buffer end.
Not that I really think a hostile SSH agent is likely to be attacking
PuTTY, but it's as well to fix these things anyway!
Mostly so that we don't have to malloc contiguous space for them
inside PuTTY; since we've already got a handy constant saying how big
is too big, we might as well use it to sanity-check the contents of
our agent forwarding channels.
The previous agent-forwarding system worked by passing each complete
query received from the input to agent_query() as soon as it was
ready. So if the remote client were to pipeline multiple requests,
then Unix PuTTY (in which agent_query() works asynchronously) would
parallelise them into many _simultaneous_ connections to the real
agent - and would not track which query went out first, so that if the
real agent happened to send its replies (to what _it_ thought were
independent clients) in the wrong order, then PuTTY would serialise
the replies on to the forwarding channel in whatever order it got
them, which wouldn't be the order the remote client was expecting.
To solve this, I've done a considerable rewrite, which keeps the
request stream in a bufchain, and only removes data from the bufchain
when it has a complete request. Then, if agent_query decides to be
asynchronous, the forwarding system waits for _that_ agent response
before even trying to extract the next request's worth of data from
the bufchain.
As an added bonus (in principle), this gives agent-forwarding channels
some actual flow control for the first time ever! If a client spams us
with an endless stream of rapid requests, and never reads its
responses, then the output side of the channel will run out of window,
which causes us to stop processing requests until we have space to
send responses again, which in turn causes us to stop granting extra
window on the input side, which serves the client right.
Now, instead of returning a boolean indicating whether the query has
completed or is still pending, agent_query() returns NULL to indicate
that the query _has_ completed, and if it hasn't, it returns a pointer
to a context structure representing the pending query, so that the
latter can be used to cancel the query if (for example) you later
decide you need to free the thing its callback was using as a context.
This should fix a potential race-condition segfault if you overload an
agent forwarding channel and then close it abruptly. (Which nobody
will be doing for sensible purposes, of course! But I ran across this
while stress-testing other aspects of agent forwarding.)
From ssh2_channel_got_eof() to ssh2_msg_channel_eof(). This removes
the only SSH-2 specicifity from the former. ssh2_channel_got_eof()
can also be called from ssh2_msg_channel_close(), but that calls
ssh2_channel_check_close() already.
Nothing ever sets them to NULL, and the various paths by which the
channel types can be set to CHAN_X11 or CHAN_SOCKDATA all ensure thet
the relevant union members are non-NULL. All the removed conditionals
have been converted into assertions, just in case I'm wrong.
It's redundant with the halfopen flag and is a misuse of the channel
type field. Happily, everything that depends on CHAN_SOCKDATA_DORMANT
also checks halfopen, so removing it is trivial.
Now it disconnects if the server sends
SSH_MSG_CHANNEL_OPEN_CONFIRMATION or SSH_MSG_CHANNEL_OPEN_FAILURE for
a channel that isn't half-open. Assertions in the SSH-2 handlers for
these messages rely on this behaviour even though it's never been
enforced before.
All but one caller was doing this unconditionally. The one conditional
call was when initialising the main channel, and in consequence PuTTY
leaked a channel structure when the server refused to open the main
channel. Now it doesn't.
An opcode for this was recently published in
https://tools.ietf.org/html/draft-sgtatham-secsh-iutf8-00 .
The default setting is conditional on frontend_is_utf8(), which is
consistent with the pty back end's policy for setting the same flag
locally. Of course, users can override the setting either way in the
GUI configurer, the same as all other tty modes.
Previously, the code that marshalled tty settings into the "pty-req"
request was iterating through the subkeys stored in ssh->conf, meaning
that if a session had been saved before we gained support for a
particular tty mode, the iteration wouldn't visit that mode at all and
hence wouldn't send even the default setting for it.
Now we iterate over the array of known mode identifiers in
ssh_ttymodes[] and look each one up in ssh->conf, rather than vice
versa. This means that when we add support for a new tty mode with a
nontrivial policy for choosing its default state, we should start
using the default handler immediately, rather than bizarrely waiting
for users to save a session after the change.
Also add an assertion to do_ssh2_transport to catch this.
This bug would be highly unlikely to manifest accidentally, but I
think you could trigger it by setting the data-based rekey threshold
very low.
All calls to ssh2_add_channel_data() were followed by a call to
ssh2_try_send(), so it seems sensible to replace ssh2_add_channel_data()
with ssh2_send_channel_data(), which does both.
Specifically, don't try to unblock all channels just because we've got
something to send on the main one. It looks like the code to do that
was left over from when SSH_MSG_CHANNEL_ADJUST was handled in
do_ssh2_authconn().
The UI now only has "1" and "2" options for SSH protocol version, which
behave like the old "1 only" and "2 only" options; old
SSH-N-with-fallback settings are interpreted as SSH-N-only.
This prevents any attempt at a protocol downgrade attack.
Most users should see no difference; those poor souls who still have to
work with SSH-1 equipment now have to explicitly opt in.
If you're connecting to a new server and it _only_ provides host key
types you've configured to be below the warning threshold, it's OK to
give the standard askalg() message. But if you've newly demoted a host
key type and now reconnect to some server for which that type was the
best key you had cached, the askalg() wording isn't really appropriate
(it's not that the key we've settled on is the first type _supported
by the server_, it's that it's the first type _cached by us_), and
also it's potentially helpful to list the better algorithms so that
the user can pick one to cross-certify.
When Jacob introduced this message in d0d3c47a0, he was right to
assume that hostkey_algs[] and ssh->uncert_hostkeys[] were sorted in
the same order. Unfortunately, he became wrong less than an hour later
when I committed d06098622. Now we avoid making any such assumption.
Since we got a dynamic preference order, it's been bailing out at a
random point, and listing keys we wouldn't use.
(It would still be nice to only mention keys that we'd actually use, but
that's now quite fiddly.)
I noticed this in passing while tinkering with the hostkey_algs array:
these arrays are full of pointers-to-const, but are not also
themselves declared const, which they should have been all along.
Now we actually have enough of them to worry about, and especially
since some of the types we support are approved by organisations that
people might make their own decisions about whether to trust, it seems
worth having a config list for host keys the same way we have one for
kex types and ciphers.
To make room for this, I've created an SSH > Host Keys config panel,
and moved the existing host-key related configuration (manually
specified fingerprints) into there from the Kex panel.
I got momentarily confused between whether the special code
(TS_LOCALSTART+i) meant the ith entry in the variable
uncert_hostkeys[] array, or the ith entry in the fixed hostkey_algs[]
array. Now I think everything agrees on it being the latter.
If a server offers host key algorithms that we don't have a stored key
for, they will now appear in a submenu of the Special Commands menu.
Selecting one will force a repeat key exchange with that key, and if
it succeeds, will add the new host key to the cache. The idea is that
the new key sent by the server is protected by the crypto established
in the previous key exchange, so this is just as safe as typing some
command like 'ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub' at the
server prompt and transcribing the results manually.
This allows switching over to newer host key algorithms if the client
has begun to support them (e.g. people using PuTTY's new ECC
functionality for the first time), or if the server has acquired a new
key (e.g. due to a server OS upgrade).
At the moment, it's only available manually, for a single host key
type at a time. Automating it is potentially controversial for
security policy reasons (what if someone doesn't agree this is what
they want in their host key cache, or doesn't want to switch over to
using whichever of the keys PuTTY would now put top of the list?), for
code plumbing reasons (chaining several of these rekeys might be more
annoying than doing one at a time) and for CPU usage reasons (rekeys
are expensive), but even so, it might turn out to be a good idea in
future.
The last list we returned is now stored in the main Ssh structure
rather than being a static array in ssh_get_specials.
The main point of this is that I want to start adding more dynamic
things to it, for which I can't predict the array's max length in
advance.
But also this fixes a conceptual wrongness, in that if a process had
more than one Ssh instance in it then their specials arrays would have
taken turns occupying the old static array, and although the current
single-threaded client code in the GUI front ends wouldn't have minded
(it would have read out the contents just once immediately after
get_specials returned), it still feels as if it was a bug waiting to
happen.
ssh_pkt_getstring can return (NULL,0) if the input packet is too short
to contain a valid string.
In quite a few places we were passing the returned pointer,length pair
to a printf function with "%.*s" type format, which seems in practice
to have not been dereferencing the pointer but the C standard doesn't
actually guarantee that. In one place we were doing the same job by
hand with memcpy, and apparently that _can_ dereference the pointer in
practice (so a server could have caused a NULL-dereference crash by
sending an appropriately malformed "x11" type channel open request).
And also I spotted a logging call in the "forwarded-tcpip" channel
open handler which had forgotten the field width completely, so it was
erroneously relying on the string happening to be NUL-terminated in
the received packet.
I've tightened all of this up in general by normalising (NULL,0) to
("",0) before calling printf("%.*s"), and replacing the two even more
broken cases with the corrected version of that same idiom.