Somehow I managed to leave that line out in both SSH-1 and SSH-2's
functions for handling DISCONNECT, IGNORE and DEBUG, and in both
cases, only for DISCONNECT. Oops.
This is a new vtable-based abstraction which is passed to a backend in
place of Frontend, and it implements only the subset of the Frontend
functions needed by a backend. (Many other Frontend functions still
exist, notably the wide range of things called by terminal.c providing
platform-independent operations on the GUI terminal window.)
The purpose of making it a vtable is that this opens up the
possibility of creating a backend as an internal implementation detail
of some other activity, by providing just that one backend with a
custom Seat that implements the methods differently.
For example, this refactoring should make it feasible to directly
implement an SSH proxy type, aka the 'jump host' feature supported by
OpenSSH, aka 'open a secondary SSH session in MAINCHAN_DIRECT_TCP
mode, and then expose the main channel of that as the Socket for the
primary connection'. (Which of course you can already do by spawning
'plink -nc' as a separate proxy process, but this would permit it in
the _same_ process without anything getting confused.)
I've centralised a full set of stub methods in misc.c for the new
abstraction, which allows me to get rid of several annoying stubs in
the previous code. Also, while I'm here, I've moved a lot of
duplicated modalfatalbox() type functions from application main
program files into wincons.c / uxcons.c, which I think saves
duplication overall. (A minor visible effect is that the prefixes on
those console-based fatal error messages will now be more consistent
between applications.)
The variable s->e in ssh2_transport_state should never be freed by
ssh2transport itself, because it's owned by the dh_ctx, so it will be
freed by dh_cleanup.
The problem with OpenSSH delayed compression is that the spec has a
race condition. Compression is enabled when the server sends
USERAUTH_SUCCESS. In the server->client direction, that's fine: the
USERAUTH_SUCCESS packet is not itself compressed, and the next packet
in the same direction is. But in the client->server direction, this
specification relies on there being a moment of half-duplex in the
connection: the client can't send any outgoing packet _after_ whatever
userauth packet the USERAUTH_SUCCESS was a response to, and _before_
finding out whether the response is USERAUTH_SUCCESS or something
else. If it emitted, say, an SSH_MSG_IGNORE or initiated a rekey
(perhaps due to a timeout), then that might cross in the network with
USERAUTH_SUCCESS and the server wouldn't be able to know whether to
treat it as compressed.
My previous solution was to note the presence of delayed compression
options in the server KEXINIT, but not to negotiate them in the
initial key exchange. Instead, we conduct the userauth exchange with
compression="none", and then once userauth has concluded, we trigger
an immediate rekey in which we do accept delayed compression methods -
because of course by that time they're no different from the non-
delayed versions. And that means compression is enabled by the
bidirectional NEWKEYS exchange, which lacks that race condition.
I think OpenSSH itself gets away with this because its layer structure
is structure so as to never send any such asynchronous transport-layer
message in the middle of userauth. Ours is not. But my cunning plan is
that now that my BPP abstraction includes a queue of packets to be
sent and a callback that processes that queue on to the output raw
data bufchain, it's possible to make that callback terminate early, to
leave any dangerous transport-layer messages unsent while we wait for
a userauth response.
Specifically: if we've negotiated a delayed compression method and not
yet seen USERAUTH_SUCCESS, then ssh2_bpp_handle_output will emit all
packets from its queue up to and including the last one in the
userauth type-code range, and keep back any further ones. The idea is
that _if_ that last userauth message was one that might provoke
USERAUTH_SUCCESS, we don't want to send any difficult things after it;
if it's not (e.g. it's in the middle of some ongoing userauth process
like k-i or GSS) then the userauth layer will know that, and will emit
some further userauth packet on its own initiative which will clue us
in that it's OK to release everything up to and including that one.
(So in particular it wasn't even necessary to forbid _all_ transport-
layer packets during userauth. I could have done that by reordering
the output queue - packets in that queue haven't been assigned their
sequence numbers yet, so that would have been safe - but it's more
elegant not to have to.)
One particular case we do have to be careful about is not trying to
initiate a _rekey_ during userauth, if delayed compression is in the
offing. That's because when we start rekeying, ssh2transport stops
sending any higher-layer packets at all, to discourage servers from
trying to ignore the KEXINIT and press on regardless - you don't get
your higher-layer replies until you actually respond to the
lower-layer interrupt. But in this case, if ssh2transport sent a
KEXINIT, which ssh2bpp kept back in the queue to avoid a delayed
compression race and would only send if another userauth packet
followed it, which ssh2transport would never pass on to ssh2bpp's
output queue, there'd be a complete protocol deadlock. So instead I
defer any attempt to start a rekey until after userauth finishes
(using the existing system for starting a deferred rekey at that
moment, which was previously used for the _old_ delayed-compression
strategy, and still has to be here anyway for GSSAPI purposes).
The sshverstring quasi-frontend is passed a Frontend pointer at setup
time, so that it can generate Event Log entries containing the local
and remote version strings and the results of remote bug detection.
I'm promoting that field of sshverstring to a field of the public BPP
structure, so now all BPPs have the right to talk directly to the
frontend if they want to. This means I can move all the log messages
of the form 'Initialised so-and-so cipher/MAC/compression' down into
the BPPs themselves, where they can live exactly alongside the actual
initialisation of those primitives.
It also means BPPs will be able to log interesting things they detect
at any point in the packet stream, which is about to come in useful
for another purpose.
Ian Jackson points out that the Linux kernel has a macro of this name
with the same purpose, and suggests that it's a good idea to use the
same name as they do, so that at least some people reading one code
base might recognise it from the other.
I never really thought very hard about what order FROMFIELD's
parameters should go in, and therefore I'm pleasantly surprised to
find that my order agrees with the kernel's, so I don't have to
permute every call site as part of making this change :-)
When I separated out the transport layer into its own source file, I
also reworked the logic deciding when to rekey, and apparently that
rework introduced a braino in which I compared rekey_reason (which is
a pointer) to RK_NONE (which is a value of the enumerated type that
lives in the similarly named variable rekey_class). Oops. The result
was that after the first rekey, the loop would terminate the next time
the transport coroutine got called, because the code just before the
loop had zeroed out rekey_class but not rekey_reason. So there'd be a
rekey on every keypress, or similar.
I've tried to separate out as many individually coherent changes from
this work as I could into their own commits, but here's where I run
out and have to commit the rest of this major refactoring as a
big-bang change.
Most of ssh.c is now no longer in ssh.c: all five of the main
coroutines that handle layers of the SSH-1 and SSH-2 protocols now
each have their own source file to live in, and a lot of the
supporting functions have moved into the appropriate one of those too.
The new abstraction is a vtable called 'PacketProtocolLayer', which
has an input and output packet queue. Each layer's main coroutine is
invoked from the method ssh_ppl_process_queue(), which is usually
(though not exclusively) triggered automatically when things are
pushed on the input queue. In SSH-2, the base layer is the transport
protocol, and it contains a pair of subsidiary queues by which it
passes some of its packets to the higher SSH-2 layers - first userauth
and then connection, which are peers at the same level, with the
former abdicating in favour of the latter at the appropriate moment.
SSH-1 is simpler: the whole login phase of the protocol (crypto setup
and authentication) is all in one module, and since SSH-1 has no
repeat key exchange, that setup layer abdicates in favour of the
connection phase when it's done.
ssh.c itself is now about a tenth of its old size (which all by itself
is cause for celebration!). Its main job is to set up all the layers,
hook them up to each other and to the BPP, and to funnel data back and
forth between that collection of modules and external things such as
the network and the terminal. Once it's set up a collection of packet
protocol layers, it communicates with them partly by calling methods
of the base layer (and if that's ssh2transport then it will delegate
some functionality to the corresponding methods of its higher layer),
and partly by talking directly to the connection layer no matter where
it is in the stack by means of the separate ConnectionLayer vtable
which I introduced in commit 8001dd4cb, and to which I've now added
quite a few extra methods replacing services that used to be internal
function calls within ssh.c.
(One effect of this is that the SSH-1 and SSH-2 channel storage is now
no longer shared - there are distinct struct types ssh1_channel and
ssh2_channel. That means a bit more code duplication, but on the plus
side, a lot fewer confusing conditionals in the middle of half-shared
functions, and less risk of a piece of SSH-1 escaping into SSH-2 or
vice versa, which I remember has happened at least once in the past.)
The bulk of this commit introduces the five new source files, their
common header sshppl.h and some shared supporting routines in
sshcommon.c, and rewrites nearly all of ssh.c itself. But it also
includes a couple of other changes that I couldn't separate easily
enough:
Firstly, there's a new handling for socket EOF, in which ssh.c sets an
'input_eof' flag in the BPP, and that responds by checking a flag that
tells it whether to report the EOF as an error or not. (This is the
main reason for those new BPP_READ / BPP_WAITFOR macros - they can
check the EOF flag every time the coroutine is resumed.)
Secondly, the error reporting itself is changed around again. I'd
expected to put some data fields in the public PacketProtocolLayer
structure that it could set to report errors in the same way as the
BPPs have been doing, but in the end, I decided propagating all those
data fields around was a pain and that even the BPPs shouldn't have
been doing it that way. So I've reverted to a system where everything
calls back to functions in ssh.c itself to report any connection-
ending condition. But there's a new family of those functions,
categorising the possible such conditions by semantics, and each one
has a different set of detailed effects (e.g. how rudely to close the
network connection, what exit status should be passed back to the
whole application, whether to send a disconnect message and/or display
a GUI error box).
I don't expect this to be immediately perfect: of course, the code has
been through a big upheaval, new bugs are expected, and I haven't been
able to do a full job of testing (e.g. I haven't tested every auth or
kex method). But I've checked that it _basically_ works - both SSH
protocols, all the different kinds of forwarding channel, more than
one auth method, Windows and Linux, connection sharing - and I think
it's now at the point where the easiest way to find further bugs is to
let it out into the wild and see what users can spot.