A user reported a phenomenon where running 'plink -shareexists' very
early in the connection would cause the receiving upstream PuTTY to
exit cleanly with the message 'All channels closed' in the log.
That wasn't hard to track down: that happens as a result of the
connection layer callback sharing_no_more_downstreams(), which causes
the connection layer to check whether it has any channels left open,
and if not, to terminate the connection on the grounds that everything
has finished. But it's premature to draw that conclusion if the reason
no channels are open if we haven't _started_ yet! Now we have a
'started' flag which is set when we initialise mainchan, and the
'we're all done now' check will never fire before that flag is set.
But in the course of investigating that, I found a second problem in
the same area: at an even earlier stage of an SSH connection, the
connshare system doesn't _even_ have the real ConnectionLayer pointer
yet. Instead, it has a pointer to a dummy one provided by the
top-level ssh.c, which contains a NULL vtable pointer. So if it calls
sharing_no_more_downstreams on _that_ ConnectionLayer, it will
dereference NULL and crash. So I've filled in cl_dummy's vtable
pointer with a trivial vtable, containing only the one callback
sharing_no_more_downstreams, which itself is a no-op function.
Hopefully that should all be stable now.
Previously, the instant at which we send to the server a request to
enable agent forwarding (the "auth-agent-req@openssh.com" channel
request, or SSH1_CMSG_AGENT_REQUEST_FORWARDING) was also the instant
at which we set a flag indicating that we're prepared to accept
attempts from the server to open a channel to talk to the forwarded
agent. If the server attempts that when we haven't sent a forwarding
request, we treat it with suspicion, and reject it.
But it turns out that at least one SSH server does this, for what
seems to be a _somewhat_ sensible purpose, and OpenSSH accepts it. So,
on the basis that the @openssh.com domain suffix makes them the
arbiters of this part of the spec, I'm following their practice. I've
removed the 'agent_fwd_enabled' flag from both connection layer
implementations, together with the ConnectionLayer method that sets
it; now agent-forwarding CHANNEL_OPENs are gated only on the questions
of whether agent forwarding was permitted in the configuration and
whether an agent actually exists to talk to, and not also whether we
had previously sent a message to the server announcing it.
(The change to this condition is also applied in the SSH-1 agent
forwarding code, mostly for the sake of keeping things parallel where
possible. I think it doesn't actually make a difference in SSH-1,
because in SSH-1, it's not _possible_ for the server to try to open an
agent channel before the main channel is set up, due to the entirely
separate setup phase of the protocol.)
The use case is a proxy host which makes a secondary SSH connection to
a real destination host. A user has run into one of these recently,
announcing a version banner of "SSH-2.0-FudoSSH", which relies on
agent forwarding to authenticate the secondary connection. You connect
to the proxy host and authenticate with a username string of the form
"realusername#real.destination.host", and then, at the start of the
connection protocol, the server immediately opens a channel back to
your SSH agent which it uses to authenticate to the destination host.
And it delays answering any CHANNEL_OPEN requests from the client
until that's all done. For example (seen from the client's POV,
although the server's CHANNEL_OPEN may well have been _sent_ up front
rather than in response to the client's):
client: SSH2_MSG_CHANNEL_OPEN "session"
server: SSH2_MSG_CHANNEL_OPEN "auth-agent@openssh.com"
client: SSH2_MSG_CHANNEL_OPEN_CONFIRMATION to the auth-agent request
<- data is exchanged on the agent channel; proxy host uses
that signature to log in to the destination host ->
server: SSH2_MSG_CHANNEL_OPEN_CONFIRMATION to the session request
With PuTTY, this wasn't working, because at the point when the server
sends the auth-agent CHANNEL_OPEN, we had not yet had any opportunity
to send auth-agent-req (because that has to wait until we've had a
CHANNEL_OPEN_CONFIRMATION). So we were rejecting the server's
CHANNEL_OPEN, which broke this workflow:
client: SSH2_MSG_CHANNEL_OPEN "session"
server: SSH2_MSG_CHANNEL_OPEN "auth-agent@openssh.com"
client: SSH2_MSG_CHANNEL_OPEN_FAILURE to the auth-agent request
(hey, I haven't told you you can do that yet!)
server: SSH2_MSG_CHANNEL_OPEN_FAILURE to the session request
(in that case, no shell session for you!)
This is a sweeping change applied across the whole code base by a spot
of Emacs Lisp. Now, everywhere I declare a vtable filled with function
pointers (and the occasional const data member), all the members of
the vtable structure are initialised by name using the '.fieldname =
value' syntax introduced in C99.
We were already using this syntax for a handful of things in the new
key-generation progress report system, so it's not new to the code
base as a whole.
The advantage is that now, when a vtable only declares a subset of the
available fields, I can initialise the rest to NULL or zero just by
leaving them out. This is most dramatic in a couple of the outlying
vtables in things like psocks (which has a ConnectionLayerVtable
containing only one non-NULL method), but less dramatically, it means
that the new 'flags' field in BackendVtable can be completely left out
of every backend definition except for the SUPDUP one which defines it
to a nonzero value. Similarly, the test_for_upstream method only used
by SSH doesn't have to be mentioned in the rest of the backends;
network Plugs for listening sockets don't have to explicitly null out
'receive' and 'sent', and vice versa for 'accepting', and so on.
While I'm at it, I've normalised the declarations so they don't use
the unnecessarily verbose 'struct' keyword. Also a handful of them
weren't const; now they are.
Sometimes, within a switch statement, you want to declare local
variables specific to the handler for one particular case. Until now
I've mostly been writing this in the form
switch (discriminant) {
case SIMPLE:
do stuff;
break;
case COMPLICATED:
{
declare variables;
do stuff;
}
break;
}
which is ugly because the two pieces of essentially similar code
appear at different indent levels, and also inconvenient because you
have less horizontal space available to write the complicated case
handler in - particuarly undesirable because _complicated_ case
handlers are the ones most likely to need all the space they can get!
After encountering a rather nicer idiom in the LLVM source code, and
after a bit of hackery this morning figuring out how to persuade
Emacs's auto-indent to do what I wanted with it, I've decided to move
to an idiom in which the open brace comes right after the case
statement, and the code within it is indented the same as it would
have been without the brace. Then the whole case handler (including
the break) lives inside those braces, and you get something that looks
more like this:
switch (discriminant) {
case SIMPLE:
do stuff;
break;
case COMPLICATED: {
declare variables;
do stuff;
break;
}
}
This commit is a big-bang change that reformats all the complicated
case handlers I could find into the new layout. This is particularly
nice in the Pageant main function, in which almost _every_ case
handler had a bundle of variables and was long and complicated. (In
fact that's what motivated me to get round to this.) Some of the
innermost parts of the terminal escape-sequence handling are also
breathing a bit easier now the horizontal pressure on them is
relieved.
(Also, in a few cases, I was able to remove the extra braces
completely, because the only variable local to the case handler was a
loop variable which our new C99 policy allows me to move into the
initialiser clause of its for statement.)
Viewed with whitespace ignored, this is not too disruptive a change.
Downstream patches that conflict with it may need to be reapplied
using --ignore-whitespace or similar.
Ever since I reworked the SSH code to have multiple internal packet
queues, there's been a long-standing FIXME in ssh_sendbuffer() saying
that we ought to include the data buffered in those queues as part of
reporting how much data is buffered on standard input.
Recently a user reported that 'proftpd', or rather its 'mod_sftp'
add-on that implements an SFTP-only SSH server, exposes a bug related
to that missing piece of code. The xfer_upload system in sftp.c starts
by pushing SFTP write messages into the SSH code for as long as
sftp_sendbuffer() (which ends up at ssh_sendbuffer()) reports that not
too much data is buffered locally. In fact what happens is that all
those messages end up on the packet queues between SSH protocol
layers, so they're not counted by sftp_sendbuffer(), so we just keep
going until there's some other reason to stop.
Usually the reason we stop is because we've filled up the SFTP
channel's SSH-layer window, so we need the server to send us a
WINDOW_ADJUST before we're allowed to send any more data. So we return
to the main event loop and start waiting for reply packets. And when
the window is moderate (e.g. OpenSSH currently seems to present about
2MB), this isn't really noticeable.
But proftpd presents the maximum-size window of 2^32-1 bytes, and as a
result we just keep shovelling more and more packets into the internal
packet queues until PSFTP has grown to 4GB in size, and only then do
we even return to the event loop and start actually sending them down
the network. Moreover, this happens again at rekey time, because while
a rekey is in progress, ssh2transport stops emptying the queue of
outgoing packets sent by its higher layer - so, again, everything just
keeps buffering up somewhere that sftp_sendbuffer can't see it.
But this commit fixes it! Each PacketProtocolLayer now provides a
vtable method for asking how much data it currently has queued. Most
of them share a default implementation which just returns the newly
added total_size field from their pq_out; the exception is
ssh2transport, which also has to account for data queued in its higher
layer. And ssh_sendbuffer() adds that on to the quantity it already
knew about in other locations, to give a more realistic idea of the
currently buffered data.
An assortment of errors: int vs size_t confusion (probably undetected
since the big switchover in commit 0cda34c6f), some outright spurious
parameters after the format string (copy-paste errors), a particularly
silly one in pscp.c (a comma between two halves of what should have
been a single string literal), and a _missing_ format string in ssh.c
(but luckily in a context where the only text that would be wrongly
treated as a format string was error messages generated elsewhere in
PuTTY).
The number of people has been steadily increasing who read our source
code with an editor that thinks tab stops are 4 spaces apart, as
opposed to the traditional tty-derived 8 that the PuTTY code expects.
So I've been wondering for ages about just fixing it, and switching to
a spaces-only policy throughout the code. And I recently found out
about 'git blame -w', which should make this change not too disruptive
for the purposes of source-control archaeology; so perhaps now is the
time.
While I'm at it, I've also taken the opportunity to remove all the
trailing spaces from source lines (on the basis that git dislikes
them, and is the only thing that seems to have a strong opinion one
way or the other).
Apologies to anyone downstream of this code who has complicated patch
sets to rebase past this change. I don't intend it to be needed again.
When I set up the simplified version of just the ssh-connection
protocol we use for connection sharing, I carefully documented at the
top of sshshare.c that packets in that protocol variant are limited to
just over 0x4000 bytes. And did I remember to actually honour that, by
restricting the sizes of outgoing packets when actually speaking the
bare connection protocol? I did not.
Well, better late than never. Here I introduce a packet size limit
that can be imposed by the BPP, and arrange for sshconnection.c to
take the min of that and any given channel's max packet size as sent
by the remote end. All BPPs except ssh2bpp-bare set it to the no-op
value of the largest possible uint32_t.
We use a toplevel callback in the SSH-2 connection layer for checking
whether it's time to close the whole SSH session after a channel
closes. If the channel close itself, or something close enough to it,
involves a protocol error severe enough to abort the session and free
the connection layer, then that callback can fire anyway on stale
data.
The fix is the same as it always is in these situations: any object
which is ever used as the context parameter to queue_toplevel_callback
should also be passed to delete_callbacks_for_context before freeing it.
When we get an incoming forwarded X11 channel over SSH, we keep it as
an upstream channel for long enough to decide from its auth data which
downstream (if any) it's destined for. Then we do a handover which
retags the channel as a sharing one, so all further SSH messages are
passed through trivially.
But the handover function is called from chan_send, which in turn is
called from the processing of the CHANNEL_DATA message that completed
the auth exchange. So after the handover finishes, we were coming back
to the standard CHANNEL_DATA processing and calling ssh2_set_window,
which tried to dereference c->chan, which has now become NULL.
Therefore, we should check for this case after calling chan_send, and
stop doing the post-send processing if we spot it, which avoids that
segfault.
At the point when we change over the seat's trust status to untrusted
for the last time, to finish authentication, Plink will now present a
final interactive prompt saying 'Press Return to begin session'. This
is a hint that anything after that that resembles an auth prompt
should be treated with suspicion, because _PuTTY_ thinks it's finished
authenticating.
This is of course an annoying inconvenience for interactive users, so
I've tried to reduce its impact as much as I can. It doesn't happen in
GUI PuTTY at all (because the trust sigil system is used instead); it
doesn't happen if you use plink -batch (because then the user already
knows that they _never_ expect an interactive prompt); and it doesn't
happen if Plink's standard input is being redirected from anywhere
other than the terminal / console (because then it would be pointless
for the server to try to scam passphrases out of the user anyway,
since the user isn't in a position to enter one in response to a spoof
prompt). So it should only happen to people who are using Plink in a
terminal for interactive login purposes, and that's not _really_ what
I ever intended Plink to be used for (which is why it's never had any
out-of-band control UI like OpenSSH's ~ system).
If anyone _still_ doesn't like this new prompt, it can also be turned
off using the new -no-antispoof flag, if the user is willing to
knowingly assume the risk.
When ssh2_connection_filter_queue is _receiving_ messages about a
channel from the other end, it carefully checks if the channel
referred to is half-open. But we weren't exercising the same caution
before beginning to _send_ channel data, and we should, because in
that situation important fields like c->remwinsize aren't even
initialised yet.
This can come up, for example, due to typeahead in the main session
window before the server has sent OPEN_CONFIRMATION.
A great many BinarySource_BARE_INIT calls are passing the two halves
of a ptrlen as separate arguments. It saves a lot of call-site faff to
have a variant of the init function that just takes the whole ptrlen
in one go.
Now that all the call sites are expecting a size_t instead of an int
length field, it's no longer particularly difficult to make it
actually return the pointer,length pair in the form of a ptrlen.
It would be nice to say that simplifies call sites because those
ptrlens can all be passed straight along to other ptrlen-consuming
functions. Actually almost none of the call sites are like that _yet_,
but this makes it possible to move them in that direction in future
(as part of my general aim to migrate ptrlen-wards as much as I can).
But also it's just nicer to keep the pointer and length together in
one variable, and not have to declare them both in advance with two
extra lines of boilerplate.
This is a general cleanup which has been overdue for some time: lots
of length fields are now the machine word type rather than the (in
practice) fixed 'int'.
In the past, I've had a lot of macros which you call with double
parentheses, along the lines of debug(("format string", params)), so
that the inner parens protect the commas and permit the macro to treat
the whole printf-style argument list as one macro argument.
That's all very well, but it's a bit inconvenient (it doesn't leave
you any way to implement such a macro by prepending another argument
to the list), and now this code base's rules allow C99isms, I can
switch all those macros to using a single pair of parens, using the
C99 ability to say '...' in the parameter list of the #define and get
at the corresponding suffix of the arguments as __VA_ARGS__.
So I'm doing it. I've made the following printf-style macros variadic:
bpp_logevent, ppl_logevent, ppl_printf and debug.
While I'm here, I've also fixed up a collection of conditioned-out
calls to debug() in the Windows front end which were clearly expecting
a macro with a different calling syntax, because they had an integer
parameter first. If I ever have a need to condition those back in,
they should actually work now.
ssh2_channel_check_throttle should only be called on channels for
which c->chan != NULL - that is, only for channels that are not
delegated to a sharing downstream. But throttle_all_channels was
calling it for _all_ channels, so if it had the bad luck to be called
while a sharing downstream was active, ssh2_channel_check_throttle
would dereference the null c->chan for the first downstream channel it
found.
This is another cleanup I felt a need for while I was doing
boolification. If you define a function or variable in one .c file and
declare it extern in another, then nothing will check you haven't got
the types of the two declarations mismatched - so when you're
_changing_ the type, it's a pain to make sure you've caught all the
copies of it.
It's better to put all those extern declarations in header files, so
that the declaration in the header is also in scope for the
definition. Then the compiler will complain if they don't match, which
is what I want.
My normal habit these days, in new code, is to treat int and bool as
_almost_ completely separate types. I'm still willing to use C's
implicit test for zero on an integer (e.g. 'if (!blob.len)' is fine,
no need to spell it out as blob.len != 0), but generally, if a
variable is going to be conceptually a boolean, I like to declare it
bool and assign to it using 'true' or 'false' rather than 0 or 1.
PuTTY is an exception, because it predates the C99 bool, and I've
stuck to its existing coding style even when adding new code to it.
But it's been annoying me more and more, so now that I've decided C99
bool is an acceptable thing to require from our toolchain in the first
place, here's a quite thorough trawl through the source doing
'boolification'. Many variables and function parameters are now typed
as bool rather than int; many assignments of 0 or 1 to those variables
are now spelled 'true' or 'false'.
I managed this thorough conversion with the help of a custom clang
plugin that I wrote to trawl the AST and apply heuristics to point out
where things might want changing. So I've even managed to do a decent
job on parts of the code I haven't looked at in years!
To make the plugin's work easier, I pushed platform front ends
generally in the direction of using standard 'bool' in preference to
platform-specific boolean types like Windows BOOL or GTK's gboolean;
I've left the platform booleans in places they _have_ to be for the
platform APIs to work right, but variables only used by my own code
have been converted wherever I found them.
In a few places there are int values that look very like booleans in
_most_ of the places they're used, but have a rarely-used third value,
or a distinction between different nonzero values that most users
don't care about. In these cases, I've _removed_ uses of 'true' and
'false' for the return values, to emphasise that there's something
more subtle going on than a simple boolean answer:
- the 'multisel' field in dialog.h's list box structure, for which
the GTK front end in particular recognises a difference between 1
and 2 but nearly everything else treats as boolean
- the 'urgent' parameter to plug_receive, where 1 vs 2 tells you
something about the specific location of the urgent pointer, but
most clients only care about 0 vs 'something nonzero'
- the return value of wc_match, where -1 indicates a syntax error in
the wildcard.
- the return values from SSH-1 RSA-key loading functions, which use
-1 for 'wrong passphrase' and 0 for all other failures (so any
caller which already knows it's not loading an _encrypted private_
key can treat them as boolean)
- term->esc_query, and the 'query' parameter in toggle_mode in
terminal.c, which _usually_ hold 0 for ESC[123h or 1 for ESC[?123h,
but can also hold -1 for some other intervening character that we
don't support.
In a few places there's an integer that I haven't turned into a bool
even though it really _can_ only take values 0 or 1 (and, as above,
tried to make the call sites consistent in not calling those values
true and false), on the grounds that I thought it would make it more
confusing to imply that the 0 value was in some sense 'negative' or
bad and the 1 positive or good:
- the return value of plug_accepting uses the POSIXish convention of
0=success and nonzero=error; I think if I made it bool then I'd
also want to reverse its sense, and that's a job for a separate
piece of work.
- the 'screen' parameter to lineptr() in terminal.c, where 0 and 1
represent the default and alternate screens. There's no obvious
reason why one of those should be considered 'true' or 'positive'
or 'success' - they're just indices - so I've left it as int.
ssh_scp_recv had particularly confusing semantics for its previous int
return value: its call sites used '<= 0' to check for error, but it
never actually returned a negative number, just 0 or 1. Now the
function and its call sites agree that it's a bool.
In a couple of places I've renamed variables called 'ret', because I
don't like that name any more - it's unclear whether it means the
return value (in preparation) for the _containing_ function or the
return value received from a subroutine call, and occasionally I've
accidentally used the same variable for both and introduced a bug. So
where one of those got in my way, I've renamed it to 'toret' or 'retd'
(the latter short for 'returned') in line with my usual modern
practice, but I haven't done a thorough job of finding all of them.
Finally, one amusing side effect of doing this is that I've had to
separate quite a few chained assignments. It used to be perfectly fine
to write 'a = b = c = TRUE' when a,b,c were int and TRUE was just a
the 'true' defined by stdbool.h, that idiom provokes a warning from
gcc: 'suggest parentheses around assignment used as truth value'!
I think this is the full set of things that ought logically to be
boolean.
One annoyance is that quite a few radio-button controls in config.c
address Conf fields that are now bool rather than int, which means
that the shared handler function can't just access them all with
conf_{get,set}_int. Rather than back out the rigorous separation of
int and bool in conf.c itself, I've just added a similar alternative
handler function for the bool-typed ones.
This commit includes <stdbool.h> from defs.h and deletes my
traditional definitions of TRUE and FALSE, but other than that, it's a
100% mechanical search-and-replace transforming all uses of TRUE and
FALSE into the C99-standardised lowercase spellings.
No actual types are changed in this commit; that will come next. This
is just getting the noise out of the way, so that subsequent commits
can have a higher proportion of signal.
throttled_by_backlog is quite new, so it's not too surprising that I
hadn't already noticed it wasn't initialised. But the failure to null
out the final 'next' pointer in the callbacks list when it's rewritten
as a result of delete_callbacks_for_context is a great deal older, so
I'm a lot more puzzled about how it hasn't come up until now!
This server is NOT SECURE! If anyone is reading this commit message,
DO NOT DEPLOY IT IN A HOSTILE-FACING ENVIRONMENT! Its purpose is to
speak the server end of everything PuTTY speaks on the client side, so
that I can test that I haven't broken PuTTY when I reorganise its
code, even things like RSA key exchange or chained auth methods which
it's hard to find a server that speaks at all.
(For this reason, it's declared with [UT] in the Recipe file, so that
it falls into the same category as programs like testbn, which won't
be installed by 'make install'.)
Working title is 'Uppity', partly for 'Universal PuTTY Protocol
Interaction Test Yoke', but mostly because it looks quite like the
word 'PuTTY' with part of it reversed. (Apparently 'test yoke' is a
very rarely used term meaning something not altogether unlike 'test
harness', which is a bit of a stretch, but it'll do.)
It doesn't actually _support_ everything I want yet. At the moment,
it's a proof of concept only. But it has most of the machinery
present, and the parts it's missing - such as chained auth methods -
should be easy enough to add because I've built in the required
flexibility, in the form of an AuthPolicy object which can request
them if it wants to. However, the current AuthPolicy object is
entirely trivial, and will let in any user with the password "weasel".
(Another way in which this is not a production-ready server is that it
also has no interaction with the OS's authentication system. In
particular, it will not only let in any user with the same password,
but it won't even change uid - it will open shells and forwardings
under whatever user id you started it up as.)
Currently, the program can only speak the SSH protocol on its standard
I/O channels (using the new FdSocket facility), so if you want it to
listen on a network port, you'll have to run it from some kind of
separate listening program similar to inetd. For my own tests, I'm not
even doing that: I'm just having PuTTY spawn it as a local proxy
process, which also conveniently eliminates the risk of anyone hostile
connecting to it.
The bulk of the actual code reorganisation is already done by previous
commits, so this change is _mostly_ just dropping in a new set of
server-specific source files alongside the client-specific ones I
created recently. The remaining changes in the shared SSH code are
numerous, but all minor:
- a few extra parameters to BPP and PPL constructors (e.g. 'are you
in server mode?'), and pass both sets of SSH-1 protocol flags from
the login to the connection layer
- in server mode, unconditionally send our version string _before_
waiting for the remote one
- a new hook in the SSH-1 BPP to handle enabling compression in
server mode, where the message exchange works the other way round
- new code in the SSH-2 BPP to do _deferred_ compression the other
way round (the non-deferred version is still nicely symmetric)
- in the SSH-2 transport layer, some adjustments to do key derivation
either way round (swapping round the identifying letters in the
various hash preimages, and making sure to list the KEXINITs in the
right order)
- also in the SSH-2 transport layer, an if statement that controls
whether we send SERVICE_REQUEST and wait for SERVICE_ACCEPT, or
vice versa
- new ConnectionLayer methods for opening outgoing channels for X and
agent forwardings
- new functions in portfwd.c to establish listening sockets suitable
for remote-to-local port forwarding (i.e. not under the direction
of a Conf the way it's done on the client side).
ssh2connection.c now knows how to unmarshal the message formats for
all the channel requests we'll need to handle when we're the server
and a client sends them. Each one is translated into a call to a new
method in the Channel vtable, which is implemented by a trivial
'always fail' routine in every channel type we know about so far.
The vtable method underneath sshfwd_write now takes an is_stderr
parameter, and in SSH-2, this is implemented by having separate stdout
and stderr bufchains in each outgoing channel, and counting the size
of both for the purposes of measuring backlog and so forth.
To avoid making _most_ call sites more verbose, the usual macro
wrapper hasn't changed its API; it just sets is_stderr=FALSE. To use
the new feature, there's an sshfwd_write_ext macro that exposes the
extra parameter.
This is a major code reorganisation in preparation for making this
code base into one that can build an SSH server as well as a client.
(Mostly for purposes of using the server as a regression test suite
for the client, though I have some other possible uses in mind too.
However, it's currently no part of my plan to harden the server to the
point where it can sensibly be deployed in a hostile environment.)
In this preparatory commit, I've broken up the SSH-2 transport and
connection layers, and the SSH-1 connection layer, into multiple
source files, with each layer having its own header file containing
the shared type definitions. In each case, the new source file
contains code that's specific to the client side of the protocol, so
that a new file can be swapped in in its place when building the
server.
Mostly this is just a straightforward moving of code without changing
it very much, but there are a couple of actual changes in the process:
The parsing of SSH-2 global-request and channel open-messages is now
done by a new pair of functions in the client module. For channel
opens, I've invented a new union data type to be the return value from
that function, representing either failure (plus error message),
success (plus Channel instance to manage the new channel), or an
instruction to hand the channel over to a sharing downstream (plus a
pointer to the downstream in question).
Also, the tree234 of remote port forwardings in ssh2connection is now
initialised on first use by the client-specific code, so that's where
its compare function lives. The shared ssh2connection_free() still
takes responsibility for freeing it, but now has to check if it's
non-null first.
The outer shell of the ssh2_lportfwd_open method, for making a
local-to-remote port forwarding, is still centralised in
ssh2connection.c, but the part of it that actually constructs the
outgoing channel-open message has moved into the client code, because
that will have to change depending on whether the channel-open has to
have type direct-tcpip or forwarded-tcpip.
In the SSH-1 connection layer, half the filter_queue method has moved
out into the new client-specific code, but not all of it -
bidirectional channel maintenance messages are still handled
centrally. One exception is SSH_MSG_PORT_OPEN, which can be sent in
both directions, but with subtly different semantics - from server to
client, it's referring to a previously established remote forwarding
(and must be rejected if there isn't one that matches it), but from
client to server it's just a "direct-tcpip" request with no prior
context. So that one is in the client-specific module, and when I add
the server code it will have its own different handler.
Some kinds of channel, even after they've sent EOF in both directions,
still have something to do before they initiate the CLOSE mechanism
and wind up the channel completely. For example, a session channel
with a subprocess running inside it will want to be sure to send the
"exit-status" or "exit-signal" notification, even if that happens
after bidirectional EOF of the data channels.
Previously, the SSH-2 connection layer had the standard policy that
once EOF had been both sent and received, it would start the final
close procedure. There's a method chan_want_close() by which a Channel
could vary this policy in one direction, by indicating that it wanted
the close procedure to commence after EOF was sent in only one
direction. Its parameters are a pair of booleans saying whether EOF
has been sent, and whether it's been received.
Now chan_want_close can vary the policy in the other direction as
well: if it returns FALSE even when _both_ parameters are true, the
connection layer will honour that, and not send CHANNEL_CLOSE. If it
does that, the Channel is responsible for indicating when it _does_
want close later, by calling sshfwd_initiate_close.
Previously, it returned a human-readable string suitable for log
files, which tried to say something useful about the remote end of a
socket. Now it returns a whole SocketPeerInfo structure, of which that
human-friendly log string is just one field, but also some of the same
information - remote IP address and port, in particular - is provided
in machine-readable form where it's available.
Turns out that initiation of a CHANNEL_CLOSE message before both sides
have sent EOF is not only for _unclean_ closures or emergencies; it's
actually a perfectly normal thing that some channel types want to do.
(For example, a channel with a pty at the server end of it has no real
concept of sending EOF independently in both directions: when the pty
master sends EIO, the pty is no longer functioning, and you can no
longer send to it any more than you can receive.)
I've introduced a new POD struct type 'ssh_ttymodes' which stores an
encoding of everything you can specify in the "pty-req" packet or the
SSH-1 equivalent. This allows me to split up
write_ttymodes_to_packet_from_conf() into two separate functions, one
to parse all the ttymode data out of a Conf (and a Seat for fallback)
and return one of those structures, and the other to write it into an
SSH packet.
While I'm at it, I've moved the special case of terminal speeds into
the same mechanism, simplifying the call sites in both versions of the
SSH protocol.
The new master definition of all terminal modes lives in a header
file, with an ifdef around each item, so that later on I'll be able to
include it in a context that only enumerates the modes supported by
the particular target Unix platform.
This gets another big pile of logic out of ssh2connection and puts it
somewhere more central. Now the only thing left in ssh2connection is
the formatting and parsing of the various channel requests; the logic
deciding which ones to issue and what to do about them is devolved to
the Channel implementation, as it properly should be.
In my future plans, some SshChannels are going to need to be able to
ask favours from the connection layer as a whole. And an SshChannel is
inextricably tied to an instance of the connection layer, so there's
no real reason _not_ to make the pointer generally available.
Each of the new subroutines corresponds to one of the channel types
for which we know how to parse a CHANNEL_OPEN, and has a collection of
parameters corresponding to the fields of that message structure.
ssh2_connection_filter_queue now confines itself to parsing the
message, calling one of those functions, and constructing an
appropriate reply message if any.
Instead of the central code in ssh2_connection_filter_queue doing both
the job of parsing the channel request and deciding whether it's
acceptable, each Channel vtable now has a method for every channel
request type we recognise.
In the recent refactoring, when I rewrote the loop in the SSH-2
connection layer startup which tries the primary and then the fallback
command, I failed to reproduce a subtlety of the previous code, namely
that if CONF_remote_cmd2 holds the empty string, we don't even look
for CONF_ssh_subsys2. This is because no application other than pscp
will have set the latter, and looking it up when it's absent triggers
an assertion failure in conf.c.
This is a new vtable-based abstraction which is passed to a backend in
place of Frontend, and it implements only the subset of the Frontend
functions needed by a backend. (Many other Frontend functions still
exist, notably the wide range of things called by terminal.c providing
platform-independent operations on the GUI terminal window.)
The purpose of making it a vtable is that this opens up the
possibility of creating a backend as an internal implementation detail
of some other activity, by providing just that one backend with a
custom Seat that implements the methods differently.
For example, this refactoring should make it feasible to directly
implement an SSH proxy type, aka the 'jump host' feature supported by
OpenSSH, aka 'open a secondary SSH session in MAINCHAN_DIRECT_TCP
mode, and then expose the main channel of that as the Socket for the
primary connection'. (Which of course you can already do by spawning
'plink -nc' as a separate proxy process, but this would permit it in
the _same_ process without anything getting confused.)
I've centralised a full set of stub methods in misc.c for the new
abstraction, which allows me to get rid of several annoying stubs in
the previous code. Also, while I'm here, I've moved a lot of
duplicated modalfatalbox() type functions from application main
program files into wincons.c / uxcons.c, which I think saves
duplication overall. (A minor visible effect is that the prefixes on
those console-based fatal error messages will now be more consistent
between applications.)
LogContext is now the owner of the logevent() function that back ends
and so forth are constantly calling. Previously, logevent was owned by
the Frontend, which would store the message into its list for the GUI
Event Log dialog (or print it to standard error, or whatever) and then
pass it _back_ to LogContext to write to the currently open log file.
Now it's the other way round: LogContext gets the message from the
back end first, writes it to its log file if it feels so inclined, and
communicates it back to the front end.
This means that lots of parts of the back end system no longer need to
have a pointer to a full-on Frontend; the only thing they needed it
for was logging, so now they just have a LogContext (which many of
them had to have anyway, e.g. for logging SSH packets or session
traffic).
LogContext itself also doesn't get a full Frontend pointer any more:
it now talks back to the front end via a little vtable of its own
called LogPolicy, which contains the method that passes Event Log
entries through, the old askappend() function that decides whether to
truncate a pre-existing log file, and an emergency function for
printing an especially prominent message if the log file can't be
created. One minor nice effect of this is that console and GUI apps
can implement that last function subtly differently, so that Unix
console apps can write it with a plain \n instead of the \r\n
(harmless but inelegant) that the old centralised implementation
generated.
One other consequence of this is that the LogContext has to be
provided to backend_init() so that it's available to backends from the
instant of creation, rather than being provided via a separate API
call a couple of function calls later, because backends have typically
started doing things that need logging (like making network
connections) before the call to backend_provide_logctx. Fortunately,
there's no case in the whole code base where we don't already have
logctx by the time we make a backend (so I don't actually remember why
I ever delayed providing one). So that shortens the backend API by one
function, which is always nice.
While I'm tidying up, I've also moved the printf-style logeventf() and
the handy logevent_and_free() into logging.c, instead of having copies
of them scattered around other places. This has also let me remove
some stub functions from a couple of outlying applications like
Pageant. Finally, I've removed the pointless "_tag" at the end of
LogContext's official struct name.
Ian Jackson points out that the Linux kernel has a macro of this name
with the same purpose, and suggests that it's a good idea to use the
same name as they do, so that at least some people reading one code
base might recognise it from the other.
I never really thought very hard about what order FROMFIELD's
parameters should go in, and therefore I'm pleasantly surprised to
find that my order agrees with the kernel's, so I don't have to
permute every call site as part of making this change :-)
All the main backend structures - Ssh, Telnet, Pty, Serial etc - now
describe structure types themselves rather than pointers to them. The
same goes for the codebase-wide trait types Socket and Plug, and the
supporting types SockAddr and Pinger.
All those things that were typedefed as pointers are older types; the
newer ones have the explicit * at the point of use, because that's
what I now seem to be preferring. But whichever one of those is
better, inconsistently using a mixture of the two styles is worse, so
let's make everything consistent.
A few types are still implicitly pointers, such as Bignum and some of
the GSSAPI types; generally this is either because they have to be
void *, or because they're typedefed differently on different
platforms and aren't always pointers at all. Can't be helped. But I've
got rid of the main ones, at least.
In mainchan_send_eof, which is the Channel method that gets called
when EOF has been received from the SSH server and is now being passed
on to the local endpoint, we decide whether or not to respond to the
server-side EOF with a client-side EOF based on application
preference. But I was doing the followup admin _outside_ that if
statement, so if the server sent EOF and we _didn't_ want to send EOF
in response, we still set the flag that said we'd sent it, and stopped
reading from standard input. Result: if you use 'plink -nc' to talk to
a remote network socket, and the server sends EOF first, Plink will
never send EOF in the other direction, because it'll stop reading from
standard input and never actually see the EOF that needs to be sent.
s->want_user_input is set and unset in response to fluctuations of the
main channel's available SSH window size. But that means it can become
TRUE before a command has been successfully started, which we don't
want, because pscp.c uses backend_sendok() to determine when it's safe
to check the flag that tells it whether to speak the SFTP or SCP1
protocol. So we want to ensure we never return true from that backend
method until we know which command we're running.
I left this message type code out of the list in the outer switch in
ssh2_connection_filter_queue for messages with the standard handling
of an initial recipient channel id. The inner switch had a perfectly
good handler for extended data, but the outer one didn't pass the
message on to that handler, so it went back to the main coroutine and
triggered a sw_abort for an unexpected packet.
The check_termination function in ssh2connection is supposed to be
called whenever it's possible that we've run out of (a) channels, and
(b) sharing downstreams. I've been calling it on every channel close,
but apparently completely forgot to add a callback from sshshare.c
that also arranges to call it when we run out of downstreams.
When the connection layer is ready to receive user input, it sets the
flag causing ssh_ppl_want_user_input to return true. But one thing it
_didn't_ do was to check whether the user input bufchain already had
some data in it because the user had typed ahead of the session setup,
and send that input immediately if so. Now it does.
I've tried to separate out as many individually coherent changes from
this work as I could into their own commits, but here's where I run
out and have to commit the rest of this major refactoring as a
big-bang change.
Most of ssh.c is now no longer in ssh.c: all five of the main
coroutines that handle layers of the SSH-1 and SSH-2 protocols now
each have their own source file to live in, and a lot of the
supporting functions have moved into the appropriate one of those too.
The new abstraction is a vtable called 'PacketProtocolLayer', which
has an input and output packet queue. Each layer's main coroutine is
invoked from the method ssh_ppl_process_queue(), which is usually
(though not exclusively) triggered automatically when things are
pushed on the input queue. In SSH-2, the base layer is the transport
protocol, and it contains a pair of subsidiary queues by which it
passes some of its packets to the higher SSH-2 layers - first userauth
and then connection, which are peers at the same level, with the
former abdicating in favour of the latter at the appropriate moment.
SSH-1 is simpler: the whole login phase of the protocol (crypto setup
and authentication) is all in one module, and since SSH-1 has no
repeat key exchange, that setup layer abdicates in favour of the
connection phase when it's done.
ssh.c itself is now about a tenth of its old size (which all by itself
is cause for celebration!). Its main job is to set up all the layers,
hook them up to each other and to the BPP, and to funnel data back and
forth between that collection of modules and external things such as
the network and the terminal. Once it's set up a collection of packet
protocol layers, it communicates with them partly by calling methods
of the base layer (and if that's ssh2transport then it will delegate
some functionality to the corresponding methods of its higher layer),
and partly by talking directly to the connection layer no matter where
it is in the stack by means of the separate ConnectionLayer vtable
which I introduced in commit 8001dd4cb, and to which I've now added
quite a few extra methods replacing services that used to be internal
function calls within ssh.c.
(One effect of this is that the SSH-1 and SSH-2 channel storage is now
no longer shared - there are distinct struct types ssh1_channel and
ssh2_channel. That means a bit more code duplication, but on the plus
side, a lot fewer confusing conditionals in the middle of half-shared
functions, and less risk of a piece of SSH-1 escaping into SSH-2 or
vice versa, which I remember has happened at least once in the past.)
The bulk of this commit introduces the five new source files, their
common header sshppl.h and some shared supporting routines in
sshcommon.c, and rewrites nearly all of ssh.c itself. But it also
includes a couple of other changes that I couldn't separate easily
enough:
Firstly, there's a new handling for socket EOF, in which ssh.c sets an
'input_eof' flag in the BPP, and that responds by checking a flag that
tells it whether to report the EOF as an error or not. (This is the
main reason for those new BPP_READ / BPP_WAITFOR macros - they can
check the EOF flag every time the coroutine is resumed.)
Secondly, the error reporting itself is changed around again. I'd
expected to put some data fields in the public PacketProtocolLayer
structure that it could set to report errors in the same way as the
BPPs have been doing, but in the end, I decided propagating all those
data fields around was a pain and that even the BPPs shouldn't have
been doing it that way. So I've reverted to a system where everything
calls back to functions in ssh.c itself to report any connection-
ending condition. But there's a new family of those functions,
categorising the possible such conditions by semantics, and each one
has a different set of detailed effects (e.g. how rudely to close the
network connection, what exit status should be passed back to the
whole application, whether to send a disconnect message and/or display
a GUI error box).
I don't expect this to be immediately perfect: of course, the code has
been through a big upheaval, new bugs are expected, and I haven't been
able to do a full job of testing (e.g. I haven't tested every auth or
kex method). But I've checked that it _basically_ works - both SSH
protocols, all the different kinds of forwarding channel, more than
one auth method, Windows and Linux, connection sharing - and I think
it's now at the point where the easiest way to find further bugs is to
let it out into the wild and see what users can spot.