If you start up two sharing-enabled PuTTYs to the same host
simultaneously, the one that ends up being the downstream can connect
to the upstream before the upstream has provided a ConnectionLayer to
the sharestate, which means that log_downstream() will dereference
cs->parent->cl->frontend to find its Frontend and fail because cl is
NULL.
Fixed by providing a dummy initial ConnectionLayer containing nothing
but a frontend pointer, which is then replaced by the real one later.
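A minimal sketch of the placeholder idea, using simplified stand-in
types rather than the real declarations (the dummy_cl field and the
sharestate_* helper names are hypothetical):

```c
#include <stddef.h>

/* Simplified stand-ins; the real structures carry much more. */
typedef struct Frontend { int id; } Frontend;

typedef struct ConnectionLayer {
    Frontend *frontend;
} ConnectionLayer;

typedef struct ShareState {
    ConnectionLayer *cl;       /* never NULL after sharestate_init */
    ConnectionLayer dummy_cl;  /* placeholder holding only a frontend */
} ShareState;

/* Point cl at the placeholder so early log calls can find a Frontend. */
void sharestate_init(ShareState *st, Frontend *fe)
{
    st->dummy_cl.frontend = fe;
    st->cl = &st->dummy_cl;
}

/* Swap in the real ConnectionLayer once the upstream has one. */
void sharestate_set_cl(ShareState *st, ConnectionLayer *real_cl)
{
    st->cl = real_cl;
}

/* What a logging routine would do: safe before and after the swap. */
Frontend *sharestate_frontend(const ShareState *st)
{
    return st->cl->frontend;
}
```

With this shape, a downstream that connects early still finds a valid
frontend pointer via st->cl.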
Almost all the call sites were doing a cumbersome dupprintf-use-free
cycle to get a formatted message into an ErrorSocket anyway, so it
seems more sensible to give them an easier way of doing so.
The few call sites that were passing a constant string literal
_shouldn't_ have been - they'll be all the better for adding a
strerror suffix to the message they were previously giving!
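As a rough illustration of the 'easier way', here is a generic
printf-style helper that returns a freshly allocated message. The
name format_error is hypothetical and this is not the actual
ErrorSocket constructor, just the pattern it saves call sites from
repeating:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format a message into a newly malloc'ed string; caller frees it.
 * Returns NULL on formatting or allocation failure. */
char *format_error(const char *fmt, ...)
{
    va_list ap;
    int len;
    char *buf;

    /* First pass: measure the formatted length. */
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0)
        return NULL;

    buf = malloc((size_t)len + 1);
    if (!buf)
        return NULL;

    /* Second pass: write the message for real. */
    va_start(ap, fmt);
    vsnprintf(buf, (size_t)len + 1, fmt, ap);
    va_end(ap);
    return buf;
}
```

A call site that used to pass a bare literal could then pass something
like format_error("%s: open: %s", path, strerror(errno)) instead.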
Looks as if I introduced this in commit 733fcca2c, where the pointer
returned from enum_settings_start() stopped being the same thing as
the underlying 'DIR *' - I needed to retain a check for the outer
containing structure not being NULL but the DIR * being NULL inside
it.
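The shape of the check, as a sketch with a hypothetical wrapper
structure (the real one has more fields, and this assumes a POSIX
system providing <dirent.h>):

```c
#include <dirent.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure returned by the
 * enumeration-start function. */
struct settings_enum_sketch {
    DIR *dp;   /* may be NULL if the directory could not be opened */
};

/* Both levels need checking: the wrapper can exist while the
 * directory handle inside it is still NULL. */
bool settings_enum_usable(const struct settings_enum_sketch *e)
{
    return e != NULL && e->dp != NULL;
}
```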
I ran 'git push' too soon on the last commit; after a bit more thought
I realise that I didn't get the logic quite right in the case where
one direction of the connection negotiates delayed compression and the
other negotiates ordinary or no compression.
For a start, we only need to worry about temporarily delaying outgoing
packets to avoid the race condition if delayed compression applies to
_outgoing_ packets - that can be disabled in the case where delayed
compression is inbound only (though that is admittedly unlikely).
Secondly, that means that detecting USERAUTH_SUCCESS to enable
compression has to happen even if the output blockage wasn't in place.
Thirdly, if we're independently enabling delayed compression in the
two directions, we should only print an Event Log entry for the one we
actually did!
This revised version is probably more robust, although for the moment
all of this is theoretical - I haven't tested against a server
implementing unidirectional delayed compression.
The problem with OpenSSH delayed compression is that the spec has a
race condition. Compression is enabled when the server sends
USERAUTH_SUCCESS. In the server->client direction, that's fine: the
USERAUTH_SUCCESS packet is not itself compressed, and the next packet
in the same direction is. But in the client->server direction, this
specification relies on there being a moment of half-duplex in the
connection: the client can't send any outgoing packet _after_ whatever
userauth packet the USERAUTH_SUCCESS was a response to, and _before_
finding out whether the response is USERAUTH_SUCCESS or something
else. If it emitted, say, an SSH_MSG_IGNORE or initiated a rekey
(perhaps due to a timeout), then that might cross in the network with
USERAUTH_SUCCESS and the server wouldn't be able to know whether to
treat it as compressed.
My previous solution was to note the presence of delayed compression
options in the server KEXINIT, but not to negotiate them in the
initial key exchange. Instead, we conduct the userauth exchange with
compression="none", and then once userauth has concluded, we trigger
an immediate rekey in which we do accept delayed compression methods -
because of course by that time they're no different from the non-
delayed versions. And that means compression is enabled by the
bidirectional NEWKEYS exchange, which lacks that race condition.
I think OpenSSH itself gets away with this because its layer structure
is structured so as to never send any such asynchronous transport-layer
message in the middle of userauth. Ours is not. But my cunning plan is
that now that my BPP abstraction includes a queue of packets to be
sent and a callback that processes that queue on to the output raw
data bufchain, it's possible to make that callback terminate early, to
leave any dangerous transport-layer messages unsent while we wait for
a userauth response.
Specifically: if we've negotiated a delayed compression method and not
yet seen USERAUTH_SUCCESS, then ssh2_bpp_handle_output will emit all
packets from its queue up to and including the last one in the
userauth type-code range, and keep back any further ones. The idea is
that _if_ that last userauth message was one that might provoke
USERAUTH_SUCCESS, we don't want to send any difficult things after it;
if it's not (e.g. it's in the middle of some ongoing userauth process
like k-i or GSS) then the userauth layer will know that, and will emit
some further userauth packet on its own initiative which will clue us
in that it's OK to release everything up to and including that one.
(So in particular it wasn't even necessary to forbid _all_ transport-
layer packets during userauth. I could have done that by reordering
the output queue - packets in that queue haven't been assigned their
sequence numbers yet, so that would have been safe - but it's more
elegant not to have to.)
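The hold-back rule can be sketched as a function over the queued
packet types. This is an illustration only, not the real
ssh2_bpp_handle_output: sendable_prefix is a hypothetical name, and I'm
assuming the userauth type-code range is 50-79 as assigned in RFC 4250.

```c
#include <stdbool.h>
#include <stddef.h>

/* SSH-2 message numbers 50..79 are reserved for userauth. */
static bool is_userauth_type(int type)
{
    return type >= 50 && type <= 79;
}

/* Given the types of the queued packets in order, return how many
 * from the head of the queue may be sent right now.
 * holding_back is true when delayed compression applies to outgoing
 * packets and USERAUTH_SUCCESS has not been seen yet. */
size_t sendable_prefix(const int *types, size_t n, bool holding_back)
{
    size_t i, sendable = 0;

    if (!holding_back)
        return n;

    /* Release everything up to and including the last userauth
     * packet; anything after it stays queued. */
    for (i = 0; i < n; i++)
        if (is_userauth_type(types[i]))
            sendable = i + 1;
    return sendable;
}
```

So a queue of IGNORE, USERAUTH_REQUEST, IGNORE releases the first two
packets and keeps the trailing IGNORE until the situation is resolved.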
One particular case we do have to be careful about is not trying to
initiate a _rekey_ during userauth, if delayed compression is in the
offing. That's because when we start rekeying, ssh2transport stops
sending any higher-layer packets at all, to discourage servers from
trying to ignore the KEXINIT and press on regardless - you don't get
your higher-layer replies until you actually respond to the
lower-layer interrupt. But in this case, that would produce a complete
protocol deadlock: ssh2transport would send a KEXINIT, ssh2bpp would
keep it back in the queue to avoid a delayed compression race, and it
would only be released by another userauth packet, which ssh2transport
would never pass on to ssh2bpp's output queue. So instead I
defer any attempt to start a rekey until after userauth finishes
(using the existing system for starting a deferred rekey at that
moment, which was previously used for the _old_ delayed-compression
strategy, and still has to be here anyway for GSSAPI purposes).
The sshverstring quasi-frontend is passed a Frontend pointer at setup
time, so that it can generate Event Log entries containing the local
and remote version strings and the results of remote bug detection.
I'm promoting that field of sshverstring to a field of the public BPP
structure, so now all BPPs have the right to talk directly to the
frontend if they want to. This means I can move all the log messages
of the form 'Initialised so-and-so cipher/MAC/compression' down into
the BPPs themselves, where they can live exactly alongside the actual
initialisation of those primitives.
It also means BPPs will be able to log interesting things they detect
at any point in the packet stream, which is about to come in useful
for another purpose.
I haven't needed these until now, but I'm about to need to inspect the
entire contents of a packet queue before deciding whether to process
the first item on it.
I've changed the single 'vtable method' in packet queues from get(),
which returned the head of the queue and optionally popped it, to
after() which does the same but returns the item after a specified
queue node. So if you pass the special end node to after(), then it
behaves like get(), but now you can also use it to retrieve the
successor of a packet.
(Orthogonality says that you can also _pop_ the successor of a packet
by calling after() with prev != pq.end and pop == TRUE. I don't have a
use for that one yet.)
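A small sketch of the after() semantics on a circular list with a
sentinel end node. The type and function names here are hypothetical
simplifications, not the real packet queue code:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    struct Node *next, *prev;
    int value;
} Node;

typedef struct Queue {
    Node end;   /* sentinel: end.next is the head of the queue */
} Queue;

void queue_init(Queue *q)
{
    q->end.next = q->end.prev = &q->end;
}

void queue_push(Queue *q, Node *n)
{
    n->next = &q->end;
    n->prev = q->end.prev;
    n->prev->next = n;
    q->end.prev = n;
}

/* Return the node following prev, or NULL if there is none.
 * Passing &q->end as prev gives the old get() behaviour.
 * If pop is true, the returned node is also unlinked. */
Node *queue_after(Queue *q, Node *prev, bool pop)
{
    Node *n = prev->next;
    if (n == &q->end)
        return NULL;
    if (pop) {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->next = n->prev = NULL;
    }
    return n;
}
```

Walking the whole queue without popping is then a loop that starts
from the end node and keeps calling queue_after on the previous result.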
These are things where no fix was actually necessary in the code, but
the FIXME indicated that the comment itself was either in need of a
rewrite or removal.
There was a while when I hadn't decided what the name of the program
was going to be, and apparently once I did I never got round to
substituting it back in everywhere.
It's not used anywhere, but this would make it one step easier to add
a mode argument to PSFTP's mkdir command, if anyone needs it. Mostly
the point is to get rid of the FIXME comment in fxp_mkdir_send itself.
While grepping for FIXME comments I could get rid of easily, I came
across a completely unexplained one in puttytel.rc, and after a moment
of thought, realised that it was there because PuTTYtel sharing
PuTTY's manifest file means the manifest has the wrong application
name.
Of course I could do something a bit more clever involving having one
copy of the manifest file and templating it to multiple applications,
but I think it would be more pain than it's worth given that the
templating system would have to be compatible with all the makefiles
and run on Windows systems where no sensible scripting was available.
So I just do it the trivial way.
It's never set to anything but NULL at any call site, and there's been
a FIXME comment in uxucs.c for ages saying it should be removed. I
think it only existed in the first place because it was a facility
supported by the underlying Windows API function and we couldn't see a
reason _not_ to pass it through. But I'm cleaning up FIXMEs, so we
should get rid of it.
(It stood for 'default used', incidentally - as in 'did the function
at any point have to make use of the parameter providing a default
fallback character?'. Nothing to do with _defusing_ things :-)
Ian Jackson points out that the Linux kernel has a macro of this name
with the same purpose, and suggests that it's a good idea to use the
same name as they do, so that at least some people reading one code
base might recognise it from the other.
I never really thought very hard about what order FROMFIELD's
parameters should go in, and therefore I'm pleasantly surprised to
find that my order agrees with the kernel's, so I don't have to
permute every call site as part of making this change :-)
I don't actually know why this was ever here; it appeared in the very
first commit that invented Plug in the first place (7b0e08270) without
explanation. Perhaps Dave's original idea was that sometimes you'd
need those macros _not_ to be defined so that the same names could be
reused as the methods for a particular Plug instance? But I don't
think that ever actually happened, and the code base builds just fine
with those macros defined unconditionally just like all the other sets
of method macros we now have, so let's get rid of this piece of cruft
that was apparently unnecessary all along.
I think that means that _every_ one of my traitoids is now a struct
containing a vtable pointer as one of its fields (albeit sometimes the
only field), and never just a bare pointer.
Now that I'm doing that in so many of the new classes as a more
type-safe alternative to ordinary C casts, I should make sure all the
old code is also reaping the benefits. This commit converts the system
of unifont vtables in the GTK front end, and also the 'unifontsel'
structure that exposes only a few of its fields outside gtkfont.c.
All the main backend structures - Ssh, Telnet, Pty, Serial etc - now
describe structure types themselves rather than pointers to them. The
same goes for the codebase-wide trait types Socket and Plug, and the
supporting types SockAddr and Pinger.
All those things that were typedefed as pointers are older types; the
newer ones have the explicit * at the point of use, because that's
what I now seem to be preferring. But whichever one of those is
better, inconsistently using a mixture of the two styles is worse, so
let's make everything consistent.
A few types are still implicitly pointers, such as Bignum and some of
the GSSAPI types; generally this is either because they have to be
void *, or because they're typedefed differently on different
platforms and aren't always pointers at all. Can't be helped. But I've
got rid of the main ones, at least.
In mainchan_send_eof, which is the Channel method that gets called
when EOF has been received from the SSH server and is now being passed
on to the local endpoint, we decide whether or not to respond to the
server-side EOF with a client-side EOF based on application
preference. But I was doing the followup admin _outside_ that if
statement, so if the server sent EOF and we _didn't_ want to send EOF
in response, we still set the flag that said we'd sent it, and stopped
reading from standard input. Result: if you use 'plink -nc' to talk to
a remote network socket, and the server sends EOF first, Plink will
never send EOF in the other direction, because it'll stop reading from
standard input and never actually see the EOF that needs to be sent.
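The corrected control flow, as a sketch with hypothetical field names
standing in for the real channel state:

```c
#include <stdbool.h>

typedef struct MainChanSketch {
    bool reply_with_eof;   /* application preference */
    bool eof_sent;         /* have we sent our own EOF? */
    bool reading_stdin;    /* still reading local input? */
} MainChanSketch;

/* Called when the server's EOF is passed on to the local endpoint. */
void mainchan_server_eof(MainChanSketch *mc)
{
    if (mc->reply_with_eof && !mc->eof_sent) {
        /* (the real code would send the client-side EOF here) */

        /* The followup admin belongs inside the if: only when we
         * really did send EOF do we record it and stop reading. */
        mc->eof_sent = true;
        mc->reading_stdin = false;
    }
}
```

With the admin inside the conditional, a session that prefers not to
answer EOF with EOF keeps reading standard input, so it can still send
its own EOF later.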
s->want_user_input is set and unset in response to fluctuations of the
main channel's available SSH window size. But that means it can become
TRUE before a command has been successfully started, which we don't
want, because pscp.c uses backend_sendok() to determine when it's safe
to check the flag that tells it whether to speak the SFTP or SCP1
protocol. So we want to ensure we never return true from that backend
method until we know which command we're running.
GUI feedback mode was last seen in 2006 (removed in commit 33b7caa59),
so quite what a conditioned-out piece of online help text for it was
doing still around here 12 years later, I have no idea.
(Especially since it had been under #if 0 since 2001, and also since
then its containing source file had ceased to be Windows-only so it
would have been extra-wrong to reinstate it.)
Another mistake in commit 54b300f15 was to introduce a new flag
'progress_bar_displayed', when in fact we were already storing an
indication of whether a set of live transfer statistics were currently
on the display, in the form of prev_stats_len (which is also used to
make sure each stats display overwrites all of the previous one).
Removed that redundancy, and while I'm at it, renamed the new
abandon_progress_bar() to match the rest of the code's general
convention of calling that status display 'statistics' or 'transfer
statistics' rather than a 'progress bar'.
In commit 54b300f15, I managed to set the progress_bar_displayed flag
just _after_, rather than before, the call to abandon_progress_bar
that moves to the new line once the file has finished copying. So in
the case where a file is so small that the very first displaying of
the transfer statistics is already at 100% completion, the flag
wouldn't be set when abandon_progress_bar checked for it, and a
newline still wouldn't be printed.
When I made the 'overwrite or append log file?' dialog box into a
non-modal one, it exposed a bug in logging.c's handling of an
asynchronous response to askappend(): we queued all the pending log
data and wrote it out to the log file, but forgot the final fflush
that would have made sure it all actually _went_ to the log file. So
one stdio buffer's worth could still be held in the C library, to be
released the next time log data shows up.
Added the missing logflush().
A second bug in the area of clean SSH-connection closure: I was
setting the pending_close flag (formerly send_outgoing_eof) and
expecting that once the outgoing backlog was cleared it would cause a
socket closure. But of course the function that does that -
ssh_bpp_output_raw_data_callback() - will only get called if there
_is_ any outgoing backlog to be cleared! So if there was already no
backlog, I would set the pending_close flag and nothing would ever
check it again.
Fixed by manually re-queuing the callback that will check the backlog
and the pending_close flag.
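To illustrate why the re-queue matters, a toy model with hypothetical
names; the real callback and socket handling are more involved:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ConnSketch {
    size_t backlog;        /* bytes not yet written to the socket */
    bool pending_close;    /* close once the backlog is gone */
    bool callback_queued;  /* output callback scheduled to run */
    bool socket_closed;
} ConnSketch;

/* Stand-in for the output callback: drains the backlog, then
 * honours a pending close request. */
void output_callback(ConnSketch *c)
{
    c->callback_queued = false;
    c->backlog = 0;                 /* pretend everything was sent */
    if (c->pending_close)
        c->socket_closed = true;
}

void request_close(ConnSketch *c)
{
    c->pending_close = true;
    /* The fix: schedule the callback ourselves, because with an
     * empty backlog nothing else would ever cause it to run. */
    c->callback_queued = true;
}

/* Stand-in for one pass of the event loop. */
void run_pending(ConnSketch *c)
{
    if (c->callback_queued)
        output_callback(c);
}
```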
When PuTTY wants to cleanly close an SSH connection, my policy has
been to use shutdown(SHUT_WR) (or rather, sk_write_eof, which ends up
translating into that) to close just the outgoing side of the TCP
connection, and then wait for the server to acknowledge that by
closing its own end.
Mainly the purpose of doing this rather than just immediately closing
the whole socket was that I wanted to make sure any remaining outgoing
packets of ours got sent before the connection was wound up. In
particular, when we send SSH_MSG_DISCONNECT immediately before the
close, we do want that to get through.
But I now think this was a mistake, because it puts us at the mercy of
the server remembering to respond by closing the other direction of
the connection. It might absent-mindedly just continue to sit there
holding the connection open, which would be silly, but if it did
happen, we wouldn't want to sit around waiting in order to close the
client application - we'd rather abandon a socket in that state and
leave it to the OS's network layer to tell the server how silly it was
being.
So now I'm using an in-between strategy: I still wait for outgoing
data to be sent before closing the socket (so the DISCONNECT should
still go out), but once it's gone, I _do_ just close the whole thing
instead of just sending outgoing EOF.
I left this message type code out of the list in the outer switch in
ssh2_connection_filter_queue for messages with the standard handling
of an initial recipient channel id. The inner switch had a perfectly
good handler for extended data, but the outer one didn't pass the
message on to that handler, so it went back to the main coroutine and
triggered a sw_abort for an unexpected packet.
The check_termination function in ssh2connection is supposed to be
called whenever it's possible that we've run out of (a) channels, and
(b) sharing downstreams. I've been calling it on every channel close,
but apparently completely forgot to add a callback from sshshare.c
that also arranges to call it when we run out of downstreams.
Otherwise we loop round repeatedly with the event loop continuing to
report the same EOF condition on them over and over again, consuming
CPU pointlessly and probably causing other knock-on trouble too.
Without this, we don't receive EOF notifications on pipes, because gtk
uses poll rather than select, which separates those out into distinct
event types.
If you call plug_closing directly from localproxy_try_send, which can
in turn be called directly from sk_write, then the plug's
implementation of plug_closing may well free things that the caller of
sk_write expected not to have vanished.
The corresponding routine in uxnet.c pushes that call to plug_closing
into a toplevel callback, so let's do that here too.
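A sketch of the deferral pattern, with a tiny hypothetical callback
queue standing in for the real toplevel-callback mechanism:

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*callback_fn)(void *ctx);

#define MAX_CALLBACKS 16
static struct { callback_fn fn; void *ctx; } pending[MAX_CALLBACKS];
static size_t npending;

/* Schedule fn(ctx) to run later from the top of the event loop. */
bool queue_callback(callback_fn fn, void *ctx)
{
    if (npending == MAX_CALLBACKS)
        return false;
    pending[npending].fn = fn;
    pending[npending].ctx = ctx;
    npending++;
    return true;
}

/* Run everything queued so far, including anything queued meanwhile. */
void run_callbacks(void)
{
    size_t i;
    for (i = 0; i < npending; i++)
        pending[i].fn(pending[i].ctx);
    npending = 0;
}

typedef struct ProxySketch {
    bool error_pending;
    bool plug_closed;
} ProxySketch;

static void deferred_closing(void *ctx)
{
    ((ProxySketch *)ctx)->plug_closed = true;  /* plug_closing stand-in */
}

/* On a write failure, don't notify the plug directly: the caller of
 * the write may still be using objects the plug would free. */
void try_send(ProxySketch *p, bool write_failed)
{
    if (write_failed && !p->error_pending) {
        p->error_pending = true;
        queue_callback(deferred_closing, p);
    }
}
```

The plug only hears about the closure once control has returned to the
event loop, by which time the original caller has finished.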
When any BPP calls ssh_remote_error or ssh_remote_eof, it triggers an
immediate cleanup of the BPP itself - so on return from one of those
functions we should avoid going straight to the crFinish macro,
because that will write to s->crState, which no longer exists.
I carefully put a flag in the new Ssh structure so that I could tell
the difference between ssh->base_layer being NULL because it hasn't
been set up yet, and being NULL because it's been and gone and the
session is terminated. And did I check that flag in all the error
routines? I did not. Result: an early socket error, while we're still
in the verstring BPP, doesn't get reported as an error message and
doesn't cause the socket to be cleaned up.
It is useful to be able to exclude the header so that the log file
can be used for realtime input to other programs such as Kst for
plotting live data from sensors.
The call to ssh2_censor_packet for incoming packets in ssh2bpp was
passing the wrong starting position in the packet data - in
particular, not the same starting position as the adjacent call to
log_packet - so the censor couldn't parse SSH2_MSG_CHANNEL_DATA to
identify the string of session data that it should be bleeping out.
In commit 8cb68390e I managed to copy the packet contexts inaccurately
from the old implementation of ssh2_pkt_type, and listed the ECDH KEX
packets against SSH2_PKTCTX_DHGEX instead of SSH2_PKTCTX_ECDHKEX,
which led to them appearing as "unknown" in packet log files.
I reworked the code for this at the last moment while preparing the
Big Refactoring, having decided my previous design was overcomplicated
and introducing an argument parameter (commit f4fbaa1bd) would be
simpler.
I carefully checked after the rework that specials manufactured by the
code itself (e.g. SS_PING) came through OK, but apparently the one
thing I _didn't_ test after the rework was that the specials list was
actually returned correctly from ssh_get_specials to be incorporated
into the GUI.
In fact one stray if statement - redundant even if it had been right,
and testing the wrong pointer besides - managed to arrange that
when ssh->specials is NULL, it could never be overwritten by anything
non-NULL. And of course it starts off initialised to NULL. Oops.
When I separated out the transport layer into its own source file, I
also reworked the logic deciding when to rekey, and apparently that
rework introduced a braino in which I compared rekey_reason (which is
a pointer) to RK_NONE (which is a value of the enumerated type that
lives in the similarly named variable rekey_class). Oops. The result
was that after the first rekey, the loop would terminate the next time
the transport coroutine got called, because the code just before the
loop had zeroed out rekey_class but not rekey_reason. So there'd be a
rekey on every keypress, or similar.
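The distinction between the two variables, as a sketch. RK_NONE is
the name used above; the other enum values and the structure are
hypothetical simplifications:

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { RK_NONE = 0, RK_TIMEOUT, RK_DATA_LIMIT } RekeyClass;

typedef struct TransportSketch {
    RekeyClass rekey_class;    /* why we want to rekey, as an enum */
    const char *rekey_reason;  /* human-readable text for the log */
} TransportSketch;

/* Correct test: look at the enum, not the string pointer.
 * Comparing rekey_reason against RK_NONE still compiles, because
 * RK_NONE is a constant 0 and so doubles as a null pointer constant,
 * but then a stale non-NULL reason string looks like a new request. */
bool rekey_wanted(const TransportSketch *s)
{
    return s->rekey_class != RK_NONE;
}
```

After a rekey that resets rekey_class but leaves the old reason string
in place, rekey_wanted still correctly reports that nothing is pending.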
If the user presses ^C or ^D at an authentication prompt, I meant to
handle that by calling ssh_user_close, i.e. treat the closure as being
intentionally directed _by_ the user, and hence don't bother putting
up a warning box telling the user it had happened.
I got this right in ssh2userauth, but in ssh1login I mistakenly called
ssh_sw_abort instead. That's what I get for going through all the
subtly different session closures in a hurry trying to decide which of
five categories each one falls into...
When the connection layer is ready to receive user input, it sets the
flag causing ssh_ppl_want_user_input to return true. But one thing it
_didn't_ do was to check whether the user input bufchain already had
some data in it because the user had typed ahead of the session setup,
and send that input immediately if so. Now it does.
I've tried to separate out as many individually coherent changes from
this work as I could into their own commits, but here's where I run
out and have to commit the rest of this major refactoring as a
big-bang change.
Most of ssh.c is now no longer in ssh.c: all five of the main
coroutines that handle layers of the SSH-1 and SSH-2 protocols now
each have their own source file to live in, and a lot of the
supporting functions have moved into the appropriate one of those too.
The new abstraction is a vtable called 'PacketProtocolLayer', which
has an input and output packet queue. Each layer's main coroutine is
invoked from the method ssh_ppl_process_queue(), which is usually
(though not exclusively) triggered automatically when things are
pushed on the input queue. In SSH-2, the base layer is the transport
protocol, and it contains a pair of subsidiary queues by which it
passes some of its packets to the higher SSH-2 layers - first userauth
and then connection, which are peers at the same level, with the
former abdicating in favour of the latter at the appropriate moment.
SSH-1 is simpler: the whole login phase of the protocol (crypto setup
and authentication) is all in one module, and since SSH-1 has no
repeat key exchange, that setup layer abdicates in favour of the
connection phase when it's done.
ssh.c itself is now about a tenth of its old size (which all by itself
is cause for celebration!). Its main job is to set up all the layers,
hook them up to each other and to the BPP, and to funnel data back and
forth between that collection of modules and external things such as
the network and the terminal. Once it's set up a collection of packet
protocol layers, it communicates with them partly by calling methods
of the base layer (and if that's ssh2transport then it will delegate
some functionality to the corresponding methods of its higher layer),
and partly by talking directly to the connection layer no matter where
it is in the stack by means of the separate ConnectionLayer vtable
which I introduced in commit 8001dd4cb, and to which I've now added
quite a few extra methods replacing services that used to be internal
function calls within ssh.c.
(One effect of this is that the SSH-1 and SSH-2 channel storage is now
no longer shared - there are distinct struct types ssh1_channel and
ssh2_channel. That means a bit more code duplication, but on the plus
side, a lot fewer confusing conditionals in the middle of half-shared
functions, and less risk of a piece of SSH-1 escaping into SSH-2 or
vice versa, which I remember has happened at least once in the past.)
The bulk of this commit introduces the five new source files, their
common header sshppl.h and some shared supporting routines in
sshcommon.c, and rewrites nearly all of ssh.c itself. But it also
includes a couple of other changes that I couldn't separate easily
enough:
Firstly, there's a new handling for socket EOF, in which ssh.c sets an
'input_eof' flag in the BPP, and that responds by checking a flag that
tells it whether to report the EOF as an error or not. (This is the
main reason for those new BPP_READ / BPP_WAITFOR macros - they can
check the EOF flag every time the coroutine is resumed.)
Secondly, the error reporting itself is changed around again. I'd
expected to put some data fields in the public PacketProtocolLayer
structure that it could set to report errors in the same way as the
BPPs have been doing, but in the end, I decided propagating all those
data fields around was a pain and that even the BPPs shouldn't have
been doing it that way. So I've reverted to a system where everything
calls back to functions in ssh.c itself to report any connection-
ending condition. But there's a new family of those functions,
categorising the possible such conditions by semantics, and each one
has a different set of detailed effects (e.g. how rudely to close the
network connection, what exit status should be passed back to the
whole application, whether to send a disconnect message and/or display
a GUI error box).
I don't expect this to be immediately perfect: of course, the code has
been through a big upheaval, new bugs are expected, and I haven't been
able to do a full job of testing (e.g. I haven't tested every auth or
kex method). But I've checked that it _basically_ works - both SSH
protocols, all the different kinds of forwarding channel, more than
one auth method, Windows and Linux, connection sharing - and I think
it's now at the point where the easiest way to find further bugs is to
let it out into the wild and see what users can spot.
Having redesigned it a few days ago in commit 562cdd4df, I'm changing
it again, this time to fix a potential race condition on the _output_
side: the last change was intended to cope with a server sending an
asynchronous message like IGNORE immediately after enabling
compression, and this one fixes the case in which _we_ happen to
decide to send an IGNORE while a compression request is still pending.
I couldn't fix this until after the BPP was reorganised to have an
explicit output queue of packets, but now it does, I can simply defer
processing that queue on to the output raw-data bufchain if we're
waiting for a compression request to be answered. Once it is answered,
the BPP can release any pending packets.
This is a convenient place for it because it abstracts away the
difference in disconnect packet formats between SSH-1 and -2, so when
I start restructuring, I'll be able to call it even from places that
don't know which version of SSH they're running.