There's no reason to pass a 'void *' through and then cast it back to
a packet. All those functions are really doing is serialising a pile
of output on to an arbitrary receiver, so it's nicer to use the new
abstract BinarySink type, and then the do_mode function doesn't have
to cast it at all.
This also removes one lingering ugliness from the previous commit, in
the form of the variable 'stringstart' I introduced in ssh2_setup_pty
to work around the removal of the addstring_start system. Now that's
done the sensible way, by constructing the subsidiary terminal-modes
string in a strbuf (taking advantage of the new type-genericity
meaning I can pass that to the underlying functions just as easily as
a struct Packet), and then putting that into the final packet
afterwards.
All previous uses of them are replaced by the new put_* family.
This is a fairly mechanical transformation, which doesn't take full
advantage of the new ways things can be done more usefully. I'll come
back and clean parts of it up in separate patches later; muddling that
together with this main search-and-replace part would make (even more
of) a giant unreadable monolith.
In fact, those functions don't even exist any more. The only way to
get data into a primitive hash state is via the new put_* system. Of
course, that means put_data() is a viable replacement for every
previous call to one of the per-hash update functions - but just
mechanically doing that would have missed the opportunity to simplify
a lot of the call sites.
Lots of functions had really generic names (like 'makekey'), or names
that missed out an important concept (like 'rsakey_pubblob', which
loads a public blob from a _file_ and doesn't generate it from an
in-memory representation at all). Also, the opaque 'int order' that
distinguishes the two formats of public key blob is now a mnemonic
enumeration, and while I'm at it, rsa_ssh1_public_blob takes one of
those as an extra argument.
Added a missing pq_push_front (see commit a41eefff4), without which
the SSH_MSG_USERAUTH_FAILURE was being dropped from the userauth
packet queue before returning to the top of the loop which waits for
it to arrive.
Another piece of fallout from this morning's patch series, which I
didn't notice until I left a session running for more than an hour:
once do_ssh2_transport is told to begin a rekey, it has no way of
knowing _not_ to immediately do another one, and another, and so on.
Added a value RK_NONE to the rekey class enumeration, and set
rekey_class to that immediately after a key exchange completes. Then a
new one won't start until some code actually sets rekey_class to a
nonzero value again.
Returning from the coroutine every time we finish putting together a
packet is wrong, because it means that any further data in the
incoming queue won't get processed until something triggers another
call to the coroutine. In ssh2_rdpkt I got this right - the only
crReturn that _isn't_ due to running out of data is the special one
immediately after a NEWKEYS - but I forgot to fix it the same way in
ssh1_rdpkt.
ssh->current_user_input_fn was not set to NULL when the ssh structure
was initially set up, which meant that with sufficiently eager
typeahead it could accidentally be called while still full of garbage.
And ssh->connshare is freed in more than one place (by ssh_free and
also by do_ssh_close), but only one of those places nulls it out to
stop the other one trying to free it a second time.
This fixes further embarrassing hangs in the wake of the big
restructuring, in which we'd assign a new function pointer into
ssh->current_user_input_fn but not immediately call it to let it
process whatever might already be waiting for it in the user input
queue.
A large part of the purpose of the 20-patch series just past is to
arrange that user input is never dropped on the floor: if you type it
at a moment where the active protocol coroutine has nothing it can
usefully do with it, the default action will now be to leave it on the
user_input queue where it will eventually be picked up by some later
coroutine, or later phase of this one, that does.
And did I _test_ this feature, end to end, just once, before pushing
the giant patch series?
I did not.
It doesn't work, and the reason why it doesn't work is because various
loops that spin round alternating a crReturn with a check of the input
queue do the first crReturn _before_ the first queue check. So if
there's already data in the queue, well, it won't be _dropped_, but it
also won't be passed on immediately to where it needs to be - instead,
it will sit in the queue until you press _another_ key, at which point
a queue check will happen and all your backed-up typeahead data will
come out.
Fixed by restructuring those loops to do a queue check first. This
applies to the final loop in do_ssh2_connection, and all the little
loops during userauth that prompt for usernames, passwords,
passphrases etc via get_userpass_input.
do_ssh2_authconn is no more! In its place are two separate functions
do_ssh2_userauth and do_ssh2_connection, each handling just one of the
three main protocol components of SSH-2, and each with its own
separate packet queue feeding from pq_full via the dispatch table.
This should be entirely NFC: no bugs are fixed as a side effect of
this change. It's mostly done in the hope that further refactoring in
the future might finally be able to split up ssh.c into smaller source
files, for which it's useful to have these two routines separate,
since each one will be talking to a completely different set of
supporting machinery.
This is not as trivial as it sounds, because although
do_ssh2_transport never uses user input, its in/inlen parameter pair
were nonetheless actually doing something, because by setting inlen to
special values -1 or -2 they doubled as a way to initiate a rekey with
a stated textual reason. So _that_ use of in and inlen has to be
replaced. (Good thing too - that was a horribly ugly API!)
I've replaced them with a new pair of fields in the main ssh
structure, a textual "rekey_reason" for the Event Log and an
enumeration "rekey_class" for discrimination within the code. In
particular, that allows me to get rid of the ugly strcmp that was
checking the textual rekey reason to find out whether the rekey was
due to a GSSAPI credentials update - now there's a separate enum value
for that kind of rekey and we can test for that more sensibly.
Even with my love of verbose commit messages, I don't think there's
anything interesting to say about this commit that the previous few
similar ones haven't mentioned already. This is just more of the same
transformations.
The connection protocol didn't have to do anything too complicated
with user input - just remember to check for it and turn it into
SSH1_CMSG_STDIN_DATA - so this is a much less involved change than the
previous user input conversions.
The userauth loop has always had a rather awkward feature whereby some
of its branches have _already_ received an SSH_MSG_USERAUTH_SUCCESS or
SSH_MSG_USERAUTH_FAILURE packet (e.g. because they have to wait for a
packet that might be one of those _or_ a continuation packet of some
kind), whereas other branches go back round to the top of the loop at
the moment that they know one of those will be the next packet to
arrive. So we had a flag 's->gotit' which tells the start of the loop
whether it's already sitting on the success or failure message, or
whether the first thing it should do is to crWait for one.
Now that the packets are coming from a linked list, there's a simpler
way to handle this: the top of the userauth loop _always_ expects to
find a success/failure message as the first thing in the queue, and if
any branch of the auth code finds it's already removed such a message
from the queue, it can simply put it back on the front again before
going back round.
Just like do_ssh1_login, do_ssh2_authconn is now no longer receiving
its packets via function arguments; instead, it's pulling them off a
dedicated PacketQueue populated by particular dispatch table entries,
with the same reference-counting system to stop them being freed when
the pq_full processing function regains control.
This eliminates further ugly bombout()s from the code while waiting
for authentication responses from the user or the SSH agent.
As I mentioned in the previous commit, I've had to keep
ssh2_authconn_input separate from do_ssh2_authconn, and this time the
reason why is thoroughly annoying: it's because do_ssh2_authconn has
changed its prototype to take the Ssh structure as a void * so that it
can be called from the idempotent callback system, whereas
ssh2_authconn_input has to have the type of ssh->current_user_input_fn
which takes the more sensible pointer type 'Ssh'.
Just as I did to do_ssh1_login, I'm removing the 'in' and 'inlen'
parameters from the combined SSH-2 userauth + connection coroutine, in
favour of it reading directly from ssh->user_input, and in particular
also allowing get_userpass_input to do so on its behalf.
Similarly to the previous case, I've completely emptied the wrapper
function ssh2_authconn_input(), and again, there is a reason I can't
quite get away with removing it...
This introduces the first of several filtered PacketQueues that
receive subsets of pq_full destined for a particular coroutine. The
wrapper function ssh1_coro_wrapper_initial, whose purpose I just
removed in the previous commit, now gains the replacement purpose of
accepting a packet as a function argument and putting it on the new
queue for do_ssh1_login to handle when it's ready. That wrapper in
turn is called from the packet-type dispatch table, meaning that the
new pq_ssh1_login will be filtered down to only the packets destined
for this coroutine.
This is the point where I finally start using the reference counting
system that I added to 'struct Packet' about a dozen commits ago. The
general packet handling calls ssh_unref_packet for everything that
it's just pulled off pq_full and handed to a dispatch-table function -
so _this_ dispatch-table function, which needs the packet not to be
freed because it's going to go on to another queue and wait to be
handled there, can arrange that by incrementing its ref count.
This completes the transformation of do_ssh1_login into a function
with a trivial argument list, whose job is to read from a pair of
input queues (one for user keyboard input and one for SSH packets) and
respond by taking action directly rather than returning a value to its
caller to request action.
It also lets me get rid of a big pile of those really annoying
bombout() calls that I used to work around the old coroutine system's
inability to deal with receiving an SSH packet when the control flow
was in the middle of waiting for some other kind of input. That was
always the weakness of the coroutine structure of this code, which I
accepted as the price for the convenience of coroutines the rest of
the time - but now I think I've got the benefits without that cost :-)
The one remaining argument to do_ssh1_login is the main Ssh structure,
although I've had to turn it into a void * to make the function's type
compatible with the idempotent callback mechanism, since that will be
calling do_ssh1_login directly when its input queue needs looking at.
Now it does its post-completion work itself instead of telling the
callee to do the same. So its caller, ssh1_coro_wrapper_initial, is
now a _completely_ trivial wrapper - but I'm not taking the
opportunity to fold the two functions together completely, because the
wrapper is going to acquire a new purpose in the next commit :-)
This is the first refactoring of a major coroutine enabled by adding
the ssh->user_input queue. Now, instead of receiving a fixed block of
parameter data, do_ssh1_login reads directly from the user input
bufchain.
In particular, I can get rid of all the temporary bufchains I
constructed to pass to get_userpass_input (or rather, the ones in this
particular function), because now we can let get_userpass_input read
directly from the main user_input bufchain, and it will read only as
much data as it has an interest in, and leave the rest as type-ahead
for future prompts or the main session.
After the last few commits, neither incoming SSH packets nor incoming
user input goes through those functions any more - each of those
directions of data goes into a queue and from there to a callback
specifically processing that queue. So the centralised top-level
protocol switching functions have nothing left to switch, and can go.
This change introduces a new bufchain ssh->user_input, into which we
put every byte received via back->send() (i.e. keyboard input from the
GUI PuTTY window, data read by Plink from standard input, or outgoing
SCP/SFTP protocol data made up by the file transfer utilities).
Just like ssh->incoming_data, there's also a function pointer
ssh->current_user_input_fn which says who currently has responsibility
for pulling data back off that bufchain and processing it. So that can
be changed over when the connection moves into a different major phase.
At the moment, nothing very interesting is being done with this
bufchain: each phase of the connection has its own little function
that pulls chunks back out of it with bufchain_prefix and passes them
to the unchanged main protocol coroutines. But this is groundwork for
being able to switch over each of those coroutines in turn to read
directly from ssh->user_input, with the aim of eliminating a
collection of annoying bugs in which typed-ahead data is accidentally
discarded at an SSH phase transition.
This flag was used to indicate that ssh1_protocol (or, as of the
previous commit, ssh1_coro_wrapper) should stop passing packets to
do_ssh1_login and start passing them to do_ssh1_connection.
Now, instead of using a flag, we simply have two separate versions of
ssh1_coro_wrapper for the two phases, and indicate the change by
rewriting all the entries in the dispatch table. So now we _just_ have
a function-pointer dereference per packet, rather than one of those
and then a flag check.
After the previous two refactorings, there's no longer any need to
pass packets to ssh1_protocol or ssh2_protocol so that each one can do
its own thing with them, because now the handling is the same in both
cases: first call the general type-independent packet processing code
(if any), and then call the dispatch table entry for the packet type
(which now always exists).
In SSH-2, every possible packet type code has a non-NULL entry in the
dispatch table, even if most of them are just ssh2_msg_unimplemented.
In SSH-1, some dispatch table entries are NULL, which means that the
code processing the dispatch table has to have some SSH-1 specific
fallback logic.
Now I've put the fallback logic in a separate function, and replaced
the NULL table entries with pointers to that function, so that another
pointless difference between the SSH-1 and SSH-2 code is removed.
NFC: I'm just moving a small piece of code out into a separate
function, which does processing on incoming SSH-2 packets that is
completely independent of the packet type. (Specifically, we count up
the total amount of data so far transferred, and use it to trigger a
rekey when we get over the per-session-key data limit.)
The aim is that I'll be able to call this function from a central
location that's not SSH-2 specific, by using a function pointer that
points to this function in SSH-2 mode or is null in SSH-1 mode.
I've completely removed the top-level coroutine ssh_gotdata(), and
replaced it with a system in which ssh_receive (which is a plug
function, i.e. called directly from the network code) simply adds the
incoming data to a new bufchain called ssh->incoming_data, and then
queues an idempotent callback to ensure that whatever function is
currently responsible for the top-level handling of wire data will be
invoked in the near future.
So the decisions that ssh_gotdata was previously making are now made
by changing which function is invoked by that idempotent callback:
when we finish doing SSH greeting exchange and move on to the packet-
structured main phase of the protocol, we just change
ssh->current_incoming_data_fn and ensure that the new function gets
called to take over anything still outstanding in the queue.
This simplifies the _other_ end of the API of the rdpkt functions. In
the previous commit, they stopped returning their 'struct Packet'
directly, and instead put it on a queue; in this commit, they're no
longer receiving a (data, length) pair in their parameter list, and
instead, they're just reading from ssh->incoming_data. So now, API-
wise, they take no arguments at all except the main 'ssh' state
structure.
It's not just the rdpkt functions that needed to change, of course.
The SSH greeting handlers have also had to switch to reading from
ssh->incoming_data, and are quite substantially rewritten as a result.
(I think they look simpler in the new style, personally.)
This new bufchain takes over from the previous queued_incoming_data,
which was only used at all in cases where we throttled the entire SSH
connection. Now, data is unconditionally left on the new bufchain
whether we're throttled or not, and the only question is whether we're
currently bothering to read it; so all the special-purpose code to
read data from a bufchain and pass it to rdpkt can go away, because
rdpkt itself already knows how to do that job.
One slightly fiddly point is that we now have to defer processing of
EOF from the SSH server: if we have data already in the incoming
bufchain and then the server slams the connection shut, we want to
process the data we've got _before_ reacting to the remote EOF, just
in case that data gives us some reason to change our mind about how we
react to the EOF, or a last-minute important piece of data we might
need to log.
Each of the coroutines that parses the incoming wire data into a
stream of 'struct Packet' now delivers those packets to a PacketQueue
called ssh->pq_full (containing the full, unfiltered stream of all
packets received on the SSH connection), replacing the old API in
which each coroutine would directly return a 'struct Packet *' to its
caller, or NULL if it didn't have one ready yet.
This simplifies the function-call API of the rdpkt coroutines (they
now return void). It increases the complexity at the other end,
because we've now got a function ssh_process_pq_full (scheduled as an
idempotent callback whenever rdpkt appends anything to the queue)
which pulls things out of the queue and passes them to ssh->protocol.
But that's only a temporary complexity increase; by the time I finish
the upcoming stream of refactorings, there won't be two chained
functions there any more.
One small workaround I had to add in this commit is a flag called
'pending_newkeys', which ssh2_rdpkt sets when it's just returned an
SSH_MSG_NEWKEYS packet, and then waits for the transport layer to
process the NEWKEYS and set up the new encryption context before
processing any more wire data. This wasn't necessary before, because
the old architecture was naturally synchronous - ssh2_rdpkt would
return a NEWKEYS, which would be immediately passed to
do_ssh2_transport, which would finish processing it immediately, and
by the time ssh2_rdpkt was next called, the keys would already be in
place.
This change adds a big while loop around the whole of each rdpkt
function, so it's easiest to read it as a whitespace-ignored diff.
NFC for the moment, because the bufchain is always specially
constructed to hold exactly the same data that would have been passed
in to the function as a (pointer,length) pair. But this API change
allows get_userpass_input to express the idea that it consumed some
but not all of the data in the bufchain, which means that later on
I'll be able to point the same function at a longer-lived bufchain
containing the full stream of keyboard input and avoid dropping
keystrokes that arrive too quickly after the end of an interactive
password prompt.
The crWaitUntil macros have do-while type semantics, i.e. they always
crReturn _at least_ once, and then perhaps more times if their
termination condition is still not met. But sometimes a coroutine will
want to wait for a condition that may _already_ be true - the key
examples being non-emptiness of a bufchain or a PacketQueue, which may
already be non-empty in spite of you having just removed something
from its head.
In that situation, it's obviously more convenient not to bother with a
crReturn in the first place than to do one anyway and have to fiddle
about with toplevel callbacks to make sure we resume later. So here's
a new pair of macros crMaybeWaitUntil{,V}, which have the semantics of
while rather than do-while, i.e. they test the condition _first_ and
don't return at all if it's already met.
This is another piece of not-yet-used infrastructure, which later on
will simplify my life when I start processing PacketQueues and adding
some of their packets to other PacketQueues, because this way the code
can unref every packet removed from the source queue in the same way,
whether or not the packet is actually finished with.
This is just a linked list of 'struct Packet' with a convenience API
on the front. As yet it's unused, so ssh.c will currently not compile
with gcc -Werror unless you also add -Wno-unused-function. But all the
functions I've added here will be used in later commits in the current
patch series, so that's only a temporary condition.
Previously it was local, which _mostly_ worked, except that if the SSH
host key needed verifying via a non-modal dialog box, there could be a
crReturn in between writing it and reading it.
It's pretty tempting to suggest that because nobody has noticed this
before, SSH-1 can't be needed any more! But actually I suspect the
intervening crReturn has only appeared since the last release,
probably around November when I was messing about with GTK dialog box
modality. (I observed the problem just now on the GTK build, while
trying to check that a completely different set of changes hadn't
broken SSH-1.)
It hasn't been used since 2012, when commit 8e0ab8be5 introduced a new
method of getting the do_ssh2_authconn coroutine started, and didn't
notice that the variable we were previously using was now completely
unused.
The 2-minutely check to see whether new GSS credentials need to be
forwarded to the server is pointless if we're not even in the mode
where we _have_ forwarded a previous set.
This was made obvious by the overly verbose diagnostic fixed in the
previous commit, so it's a good thing that bug was temporarily there!
Now we don't generate that message as a side effect of the periodic
check for new GSS credentials; we only generate it as part of the much
larger slew of messages that happen during a rekey.
Commit d515e4f1a went through a lot of very different shapes before it
was finally pushed. In some of them, GSS kex had its own value in the
kex enumeration, but it was used in ssh.c but not in config.c
(because, as in the final version, it wasn't configured by the same
drag-list system as the rest of them). So we had to distinguish the
set of key exchange ids known to the program as a whole from the set
controllable in the configuration.
In the final version, GSS kex ended up even more separated from the
kex enumeration than that: the enum value KEX_GSS_SHA1_K5 isn't used
at all. Instead, GSS key exchange appears in the list at the point of
translation from the list of enum values into the list of pointers to
data structures full of kex methods.
But after all the changes, everyone involved forgot to revert the part
of the patch which split KEX_MAX in two and introduced the pointless
value KEX_GSS_SHA1_K5! Better late than never: I'm reverting it now,
to avoid confusion, and because I don't have any reason to think the
distinction will be useful for any other purpose.
The former has advantages in terms of keeping Kerberos credentials up
to date, but it also does something sufficiently weird to the usual
SSH host key system that I think it's worth making sure users have a
means of turning it off separately from the less intrusive GSS
userauth.
This is a heavily edited (by me) version of a patch originally due to
Nico Williams and Viktor Dukhovni. Their comments:
* Don't delegate credentials when rekeying unless there's a new TGT
or the old service ticket is nearly expired.
* Check for the above conditions more frequently (every two minutes
by default) and rekey when we would delegate credentials.
* Do not rekey with very short service ticket lifetimes; some GSSAPI
libraries may lose the race to use an almost expired ticket. Adjust
the timing of rekey checks to try to avoid this possibility.
My further comments:
The most interesting thing about this patch to me is that the use of
GSS key exchange causes a switch over to a completely different model
of what host keys are for. This comes from RFC 4462 section 2.1: the
basic idea is that when your session is mostly bidirectionally
authenticated by the GSSAPI exchanges happening in initial kex and
every rekey, host keys become more or less vestigial, and their
remaining purpose is to allow a rekey to happen if the requirements of
the SSH protocol demand it at an awkward moment when the GSS
credentials are not currently available (e.g. timed out and haven't
been renewed yet). As such, there's no need for host keys to be
_permanent_ or to be a reliable identifier of a particular host, and
RFC 4462 allows for the possibility that they might be purely
transient and only for this kind of emergency fallback purpose.
Therefore, once PuTTY has done a GSS key exchange, it disconnects
itself completely from the permanent host key cache functions in
storage.h, and instead switches to a _transient_ host key cache stored
in memory with the lifetime of just that SSH session. That cache is
populated with keys received from the server as a side effect of GSS
kex (via the optional SSH2_MSG_KEXGSS_HOSTKEY message), and used if
later in the session we have to fall back to a non-GSS key exchange.
However, in practice servers we've tested against do not send a host
key in that way, so we also have a fallback method of populating the
transient cache by triggering an immediate non-GSS rekey straight
after userauth (reusing the code path we also use to turn on OpenSSH
delayed encryption without the race condition).
This is a preliminary refactoring for an upcoming change which will
need to affect every use of schedule_timer to wait for the next rekey:
those calls to schedule_timer are now centralised into a function that
does an organised piece of thinking about when the next timer should
be.
A side effect of this change is that the translation from
CONF_ssh_rekey_time to an actual tick count is now better proofed
against integer overflow (just in case the user entered a completely
silly value).
In the 'SSH packets + raw data' logging mode, one of these occurs
immediately after the initial key exchange, at the point where the
transport routine releases any queued higher-layer packets that had
been waiting for KEX to complete. Of course, in the initial KEX there
are never any of those, so we do a zero-length s_write(), which is
harmless but has the side effect of a zero-length raw-data log entry.
If ssh_init encounters a synchronous error, it will call random_unref
before returning. But the Ssh structure it created will still exist,
and if the caller (sensibly) responds by freeing it, then that will
cause a second random_unref, leading to the RNG's refcount going below
zero and failing an assertion.
We never noticed this before because with only one PuTTY connection
per process it was easier to just exit(1) without bothering to clean
things up. Now, with all the multi-sessions-per-process fixes I'm
doing, this has shown up as a problem. But other front ends may
legitimately still just exit - I don't think I can sensibly enforce
_not_ doing so at this late stage - so I've had to arrange to set a
flag in the Ssh saying whether a random_unref is still pending or not.
I think these began to appear as a consequencce of replacing
fatalbox() calls with more sensible error reports: the more specific a
direction I send a report in, the greater the annoying possibility of
re-entrance when the resulting error handler starts closing stuff.
ssh1_rdpkt claimed to be handling SSH1_MSG_DEBUG and SSH1_MSG_IGNORE
packets, but in fact, the handling of those has long since been moved
into the dispatch table; those particular entries are set up in
ssh1_protocol_setup().
When it calls through ocr->handler() to process the response to a
channel request, sometimes that call ends up back in the main SSH-2
authconn coroutine, and sometimes _that_ will call bomb_out(), which
closes the whole SSH connection and frees all the channels - so that
when control returns back up the call stack to
ssh2_msg_channel_response itself which continues working with the
channel it was passed, it's using freed memory and things go badly.
This is the sort of thing I'd _like_ to fix using some kind of
large-scale refactoring along the lines of moving all the actual
free() calls out into top-level callbacks, so that _any_ function
which is holding a pointer to something can rely on that pointer still
being valid after it calls a subroutine. But I haven't worked out all
the details of how that system should work, and doubtless it will turn
out to have problems of its own once I do, so here's a point fix which
simply checks if the whole SSH session has been closed (which is easy
- much easier than checking if that _channel_ structure still exists)
and fixes the immediate bug.
(I think this is the real fix for the problem reported by the user I
mention in commit f0126dd19, because I actually got the details wrong
in the log message for that previous commit: the user's SSH server
wasn't rejecting the _opening_ of the main session channel, it was
rejecting the "shell" channel request, so this code path was the one
being exercised. Still, the other bug was real too, so no harm done!)
A user reported a nonsensical assertion failure (claiming that
ssh->version != 2) which suggested that a channel had somehow outlived
its parent Ssh in the situation where the opening of the main session
channel is rejected by the server. Checking with valgrind suggested
that things start to go wrong at the point where we free the half-set-
up ssh->mainchan before having filled in its type field, so that the
switch in ssh_channel_close_local() picks an arbitrary wrong action.
I haven't reproduced the same failure the user reported, but with this
change, Unix plink is now valgrind-clean in that failure situation.
This has been a FIXME in the code for ages, because back when the main
channel was always a pty session or a program run in a pipe, there
weren't that many circumstances in which the actual CHANNEL_OPEN could
return failure, so it never seemed like a priority to get round to
pulling the error information out of the CHANNEL_OPEN_FAILURE response
message and including it in PuTTY or Plink's local error message.
However, 'plink -nc' is the real reason why this is actually
important; if you tell the SSH server to make a direct-tcpip network
connection as its main channel, then that can fail for all the usual
network-unreliability reasons, and you actually do want to know which
(did you misspell the hostname, or is the target server refusing
connections, or has network connectivity failed?). This actually bit
me today when I had such a network failure, and had to debug it by
pulling that information manually out of a packet log. Time to
eliminate that FIXME.
So I've pulled the error-extracting code out of the previous handler
for OPEN_FAILURE on non-main channels into a separate function, and
arranged to call that function if the main channel open fails too. In
the process I've made a couple of minor tweaks, e.g. if the server
sends back a reason code we haven't heard of, we say _what_ that
reason code was, and also we at least make a token effort to spot if
we see a packet other than OPEN_{CONFIRMATION,FAILURE} reaching the
main loop in response to the main channel-open.
2ce0b680c inadvertently removed this ability in trying to ensure that
everyone got the new IUTF8 mode by default; you could remove a mode from
the list in the UI, but this would just revert PuTTY to its default.
The UI and storage have been revamped; the storage format now explicitly
says when a mode is not to be sent, and the configuration UI always
shows all modes known to PuTTY; if a mode is not to be sent it now shows
up as "(don't send)" in the list.
Old saved settings are migrated so as to preserve previous removals of
longstanding modes, while automatically adding IUTF8.
(In passing, this removes a bug where pressing the 'Remove' button of
the previous UI would populate the value edit box with garbage.)
I think an agent sending a string length exceeding the buffer bounds
by less than 4 could have made PuTTY read beyond its own buffer end.
Not that I really think a hostile SSH agent is likely to be attacking
PuTTY, but it's as well to fix these things anyway!
Mostly so that we don't have to malloc contiguous space for them
inside PuTTY; since we've already got a handy constant saying how big
is too big, we might as well use it to sanity-check the contents of
our agent forwarding channels.
The previous agent-forwarding system worked by passing each complete
query received from the input to agent_query() as soon as it was
ready. So if the remote client were to pipeline multiple requests,
then Unix PuTTY (in which agent_query() works asynchronously) would
parallelise them into many _simultaneous_ connections to the real
agent - and would not track which query went out first, so that if the
real agent happened to send its replies (to what _it_ thought were
independent clients) in the wrong order, then PuTTY would serialise
the replies on to the forwarding channel in whatever order it got
them, which wouldn't be the order the remote client was expecting.
To solve this, I've done a considerable rewrite, which keeps the
request stream in a bufchain, and only removes data from the bufchain
when it has a complete request. Then, if agent_query decides to be
asynchronous, the forwarding system waits for _that_ agent response
before even trying to extract the next request's worth of data from
the bufchain.
As an added bonus (in principle), this gives agent-forwarding channels
some actual flow control for the first time ever! If a client spams us
with an endless stream of rapid requests, and never reads its
responses, then the output side of the channel will run out of window,
which causes us to stop processing requests until we have space to
send responses again, which in turn causes us to stop granting extra
window on the input side, which serves the client right.
Now, instead of returning a boolean indicating whether the query has
completed or is still pending, agent_query() returns NULL to indicate
that the query _has_ completed, and if it hasn't, it returns a pointer
to a context structure representing the pending query, so that the
latter can be used to cancel the query if (for example) you later
decide you need to free the thing its callback was using as a context.
This should fix a potential race-condition segfault if you overload an
agent forwarding channel and then close it abruptly. (Which nobody
will be doing for sensible purposes, of course! But I ran across this
while stress-testing other aspects of agent forwarding.)
From ssh2_channel_got_eof() to ssh2_msg_channel_eof(). This removes
the only SSH-2 specicifity from the former. ssh2_channel_got_eof()
can also be called from ssh2_msg_channel_close(), but that calls
ssh2_channel_check_close() already.
Nothing ever sets them to NULL, and the various paths by which the
channel types can be set to CHAN_X11 or CHAN_SOCKDATA all ensure thet
the relevant union members are non-NULL. All the removed conditionals
have been converted into assertions, just in case I'm wrong.
It's redundant with the halfopen flag and is a misuse of the channel
type field. Happily, everything that depends on CHAN_SOCKDATA_DORMANT
also checks halfopen, so removing it is trivial.
Now it disconnects if the server sends
SSH_MSG_CHANNEL_OPEN_CONFIRMATION or SSH_MSG_CHANNEL_OPEN_FAILURE for
a channel that isn't half-open. Assertions in the SSH-2 handlers for
these messages rely on this behaviour even though it's never been
enforced before.
All but one caller was doing this unconditionally. The one conditional
call was when initialising the main channel, and in consequence PuTTY
leaked a channel structure when the server refused to open the main
channel. Now it doesn't.
An opcode for this was recently published in
https://tools.ietf.org/html/draft-sgtatham-secsh-iutf8-00 .
The default setting is conditional on frontend_is_utf8(), which is
consistent with the pty back end's policy for setting the same flag
locally. Of course, users can override the setting either way in the
GUI configurer, the same as all other tty modes.
Previously, the code that marshalled tty settings into the "pty-req"
request was iterating through the subkeys stored in ssh->conf, meaning
that if a session had been saved before we gained support for a
particular tty mode, the iteration wouldn't visit that mode at all and
hence wouldn't send even the default setting for it.
Now we iterate over the array of known mode identifiers in
ssh_ttymodes[] and look each one up in ssh->conf, rather than vice
versa. This means that when we add support for a new tty mode with a
nontrivial policy for choosing its default state, we should start
using the default handler immediately, rather than bizarrely waiting
for users to save a session after the change.
Also add an assertion to do_ssh2_transport to catch this.
This bug would be highly unlikely to manifest accidentally, but I
think you could trigger it by setting the data-based rekey threshold
very low.
All calls to ssh2_add_channel_data() were followed by a call to
ssh2_try_send(), so it seems sensible to replace ssh2_add_channel_data()
with ssh2_send_channel_data(), which does both.
Specifically, don't try to unblock all channels just because we've got
something to send on the main one. It looks like the code to do that
was left over from when SSH_MSG_CHANNEL_ADJUST was handled in
do_ssh2_authconn().
The UI now only has "1" and "2" options for SSH protocol version, which
behave like the old "1 only" and "2 only" options; old
SSH-N-with-fallback settings are interpreted as SSH-N-only.
This prevents any attempt at a protocol downgrade attack.
Most users should see no difference; those poor souls who still have to
work with SSH-1 equipment now have to explicitly opt in.
If you're connecting to a new server and it _only_ provides host key
types you've configured to be below the warning threshold, it's OK to
give the standard askalg() message. But if you've newly demoted a host
key type and now reconnect to some server for which that type was the
best key you had cached, the askalg() wording isn't really appropriate
(it's not that the key we've settled on is the first type _supported
by the server_, it's that it's the first type _cached by us_), and
also it's potentially helpful to list the better algorithms so that
the user can pick one to cross-certify.
When Jacob introduced this message in d0d3c47a0, he was right to
assume that hostkey_algs[] and ssh->uncert_hostkeys[] were sorted in
the same order. Unfortunately, he became wrong less than an hour later
when I committed d06098622. Now we avoid making any such assumption.
Since we got a dynamic preference order, it's been bailing out at a
random point, and listing keys we wouldn't use.
(It would still be nice to only mention keys that we'd actually use, but
that's now quite fiddly.)
I noticed this in passing while tinkering with the hostkey_algs array:
these arrays are full of pointers-to-const, but are not also
themselves declared const, which they should have been all along.
Now we actually have enough of them to worry about, and especially
since some of the types we support are approved by organisations that
people might make their own decisions about whether to trust, it seems
worth having a config list for host keys the same way we have one for
kex types and ciphers.
To make room for this, I've created an SSH > Host Keys config panel,
and moved the existing host-key related configuration (manually
specified fingerprints) into there from the Kex panel.
I got momentarily confused between whether the special code
(TS_LOCALSTART+i) meant the ith entry in the variable
uncert_hostkeys[] array, or the ith entry in the fixed hostkey_algs[]
array. Now I think everything agrees on it being the latter.
If a server offers host key algorithms that we don't have a stored key
for, they will now appear in a submenu of the Special Commands menu.
Selecting one will force a repeat key exchange with that key, and if
it succeeds, will add the new host key to the cache. The idea is that
the new key sent by the server is protected by the crypto established
in the previous key exchange, so this is just as safe as typing some
command like 'ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub' at the
server prompt and transcribing the results manually.
This allows switching over to newer host key algorithms if the client
has begun to support them (e.g. people using PuTTY's new ECC
functionality for the first time), or if the server has acquired a new
key (e.g. due to a server OS upgrade).
At the moment, it's only available manually, for a single host key
type at a time. Automating it is potentially controversial for
security policy reasons (what if someone doesn't agree this is what
they want in their host key cache, or doesn't want to switch over to
using whichever of the keys PuTTY would now put top of the list?), for
code plumbing reasons (chaining several of these rekeys might be more
annoying than doing one at a time) and for CPU usage reasons (rekeys
are expensive), but even so, it might turn out to be a good idea in
future.
The last list we returned is now stored in the main Ssh structure
rather than being a static array in ssh_get_specials.
The main point of this is that I want to start adding more dynamic
things to it, for which I can't predict the array's max length in
advance.
But also this fixes a conceptual wrongness, in that if a process had
more than one Ssh instance in it then their specials arrays would have
taken turns occupying the old static array, and although the current
single-threaded client code in the GUI front ends wouldn't have minded
(it would have read out the contents just once immediately after
get_specials returned), it still feels as if it was a bug waiting to
happen.
ssh_pkt_getstring can return (NULL,0) if the input packet is too short
to contain a valid string.
In quite a few places we were passing the returned pointer,length pair
to a printf function with "%.*s" type format, which seems in practice
to have not been dereferencing the pointer but the C standard doesn't
actually guarantee that. In one place we were doing the same job by
hand with memcpy, and apparently that _can_ dereference the pointer in
practice (so a server could have caused a NULL-dereference crash by
sending an appropriately malformed "x11" type channel open request).
And also I spotted a logging call in the "forwarded-tcpip" channel
open handler which had forgotten the field width completely, so it was
erroneously relying on the string happening to be NUL-terminated in
the received packet.
I've tightened all of this up in general by normalising (NULL,0) to
("",0) before calling printf("%.*s"), and replacing the two even more
broken cases with the corrected version of that same idiom.
It has three settings: on, off, and 'only until session starts'. The
idea of the last one is that if you use something like 'ssh -v' as
your proxy command, you probably wanted to see the initial SSH
connection-setup messages while you were waiting to see if the
connection would be set up successfully at all, but probably _didn't_
want a slew of diagnostics from rekeys disrupting your terminal in
mid-emacs once the session had got properly under way.
Default is off, to avoid startling people used to the old behaviour. I
wonder if I should have set it more aggressively, though.
I'm about to want to make a change to all those functions at once, and
since they're almost identical, it seemed easiest to pull them out
into a common helper. The new source file be_misc.c is intended to
contain helper code common to all network back ends (crypto and
non-crypto, in particular), and initially it contains a
backend_socket_log() function which is the common part of ssh_log(),
telnet_log(), rlogin_log() etc.
We've always had the back-end code unconditionally print 'Looking up
host' before calling name_lookup. But name_lookup doesn't always do an
actual lookup - in cases where the connection will be proxied and
we're configured to let the proxy do the DNS for us, it just calls
sk_nonamelookup to return a dummy SockAddr with the unresolved name
still in it. It's better to print a message that varies depending on
whether we're _really_ doing DNS or not, e.g. so that people can tell
the difference between DNS failure and proxy misconfiguration.
Hence, those log messages are now generated inside name_lookup(),
which takes a couple of extra parameters for the purpose - a frontend
pointer to pass to logevent(), and a reason string so that it can say
what the hostname it's (optionally) looking up is going to be used
for. (The latter is intended for possible use in logging subsidiary
lookups for port forwarding, though the moment I haven't changed
the current setup where those connection setups aren't logged in
detail - we just pass NULL in that situation.)
When we set ssh->sc{cipher,mac} to s->sc{cipher,mac}_tobe
conditionally, we should be conditionalising on the values we're
_reading_, not the ones we're about to overwrite.
Thanks to Colin Harrison for this patch.
Apparently if you maintain a branch for a long time where you only
compile with a non-default ifdef enabled, it becomes possible to not
notice a typo you left in the default branch :-)
This adds the "none" cipher and MAC, and also disables kex signure
verification and host-key checking. Since a client like this is
completely insecure, it also rewrites the client version string to
start "ISH", which should make it fail to interoperate with a real SSH
server. The server version string is still expected to begin "SSH" so
the real packet captures can be used against it.
The previous assertion failure is obviously wrong, but RFC 4253 doesn't
explicitly declare them to be a protocol error. Currently, the incoming
packet isn't logged, which might cause some confusion for log parsers.
Bug found with the help of afl-fuzz.
The previous assertion failure is obviously wrong, but RFC 4253 doesn't
explicitly declare them to be a protocol error. Currently, the incoming
packet isn't logged, which might cause some confusion for log parsers.
Bug found with the help of afl-fuzz.
This protects the Unix platform sharing code in the case where no salt
file exists yet in the connection-sharing directory, in which case
make_dirname() will want to create one by using some random bytes, and
prior to this commit, would fail an assertion because the random
number generator wasn't set up.
It would be neater to just return FALSE from ssh_test_for_upstream in
that situation - if there's no salt file, then no sharing socket can
be valid anyway - but that would involve doing more violence to the
code structure than I'm currently prepared to do for a minor elegance
gain.
A user reports that in a particular situation one of the calls to
LoadLibrary from wingss.c has unwanted side effects, and points out
that this happens even when the saved session has GSSAPI disabled. So
I've evaluated as much as possible of the condition under which we
check the results of GSS library loading, and deferred the library
loading itself until after that condition says we even care about the
results.
(cherry picked from commit 9a08d9a7c1)
A Plink invocation of the form 'plink -shareexists <session>' tests
for a currently live connection-sharing upstream for the session in
question. <session> can be any syntax you'd use with Plink to make the
actual connection (a host/port number, a bare saved session name,
-load, whatever).
I envisage this being useful for things like adaptive proxying - e.g.
if you want to connect to host A which you can't route to directly,
and you might already have a connection to either of hosts B or C
which are viable proxies, then you could write a proxy shell script
which checks whether you already have an upstream for B or C and goes
via whichever one is currently active.
Testing for the upstream's existence has to be done by actually
connecting to its socket, because on Unix the mere existence of a
Unix-domain socket file doesn't guarantee that there's a process
listening to it. So we make a test connection, and then immediately
disconnect; hence, that shows up in the upstream's event log.
This is the part of ssh.c's connect_to_host() which figures out the
host name and port number that logically identify the connection -
i.e. not necessarily where we physically connected to, but what we'll
use to look up the saved session cache, put in the window title bar,
and give to the connection sharing code to identify other connections
to share with.
I'm about to want to use it for another purpose, so it needs to be
moved out into a separate function.
If we've chosen the ChaCha20-Poly1305 option for a cipher, then that
forces the use of its associated MAC. In that situation, we should
avoid even _trying_ to figure out a MAC by examining the MAC string
from the server's KEXINIT, because we won't use the MAC selected by
that method anyway, so there's no point imposing the requirement on
servers to present a MAC we believe in just so we know it's there.
This was breaking interoperation with tinysshd, and is in violation of
OpenSSH's spec for the "chacha20-poly1305@openssh.com" cipher.
The revamp of key generation in commit e460f3083 made the assumption
that you could decide how many bytes of key material to generate by
converting cipher->keylen from bits to bytes. This is a good
assumption for all ciphers except DES/3DES: since the SSH DES key
setup ignores one bit in every byte of key material it's given, you
need more bytes than its keylen field would have you believe. So
currently the DES ciphers aren't being keyed correctly.
The original keylen field is used for deciding how big a DH group to
request, and on that basis I think it still makes sense to keep it
reflecting the true entropy of a cipher key. So it turns out we need
two _separate_ key length fields per cipher - one for the real
entropy, and one for the much more obvious purpose of knowing how much
data to ask for from ssh2_mkkey.
A compensatory advantage, though, is that we can now measure the
latter directly in bytes rather than bits, so we no longer have to
faff about with dividing by 8 and rounding up.
Tim Kosse points out that we now support some combinations of crypto
primitives which break the hardwired assumption that two blocks of
hash output from the session-key derivation algorithm are sufficient
to key every cipher and MAC in the system.
So now ssh2_mkkey is given the desired key length, and performs as
many iterations as necessary.
The key derivation code has been assuming (though non-critically, as
it happens) that the size of the MAC output is the same as the size of
the MAC key. That isn't even a good assumption for the HMAC family,
due to HMAC-SHA1-96 and also the bug-compatible versions of HMAC-SHA1
that only use 16 bytes of key material; so now we have an explicit
key-length field separate from the MAC-length field.
A user reports that in a particular situation one of the calls to
LoadLibrary from wingss.c has unwanted side effects, and points out
that this happens even when the saved session has GSSAPI disabled. So
I've evaluated as much as possible of the condition under which we
check the results of GSS library loading, and deferred the library
loading itself until after that condition says we even care about the
results.
If a sharing downstream disconnected while we were still in userauth
(probably by deliberate user action, since such a downstream would
have just been sitting there waiting for upstream to be ready for it)
then we could crash by attempting to count234(ssh->channels) before
the ssh->channels tree had been set up in the first place.
A simple null-pointer check fixes it. Thanks to Antti Seppanen for the
report.
I removed a vital line of code while fixing the merge conflicts when
cherry-picking 1eb578a488 as
26fe1e26c0, causing Diffie-Hellman key
exchange to be completely broken because the server's host key was
never constructed to verify the signature with. Reinstate it.
Assorted calls to ssh_pkt_getstring in handling the later parts of key
exchange (post-KEXINIT) were not checked for NULL afterwards, so that
a variety of badly formatted key exchange packets would cause a crash
rather than a sensible error message.
None of these is an exploitable vulnerability - the server can only
force a clean null-deref crash, not an access to actually interesting
memory.
Thanks to '3unnym00n' for pointing out one of these, causing me to
find all the rest of them too.
(cherry picked from commit 1eb578a488)
Conflicts:
ssh.c
Cherry-picker's notes: the main conflict arose because the original
commit also made fixes to the ECDH branch of the big key exchange if
statement, which doesn't exist on this branch. Also there was a minor
and purely textual conflict, when an error check was added right next
to a function call that had acquired an extra parameter on master.
The final main loop in do_ssh2_authconn will sometimes loop over all
currently open channels calling ssh2_try_send_and_unthrottle. If the
channel is a sharing one, however, that will reference fields of the
channel structure like 'remwindow', which were never initialised in
the first place (thanks, valgrind). Fix by excluding CHAN_SHARING
channels from that loop.
(cherry picked from commit 7366fde1d4)
When anyone connects to a PuTTY tool's listening socket - whether it's
a user of a local->remote port forwarding, a connection-sharing
downstream or a client of Pageant - we'd like to log as much
information as we can find out about where the connection came from.
To that end, I've implemented a function sk_peer_info() in the socket
abstraction, which returns a freeform text string as best it can (or
NULL, if it can't get anything at all) describing the thing at the
other end of the connection. For TCP connections, this is done using
getpeername() to get an IP address and port in the obvious way; for
Unix-domain sockets, we attempt SO_PEERCRED (conditionalised on some
moderately hairy autoconfery) to get the pid and owner of the peer. I
haven't implemented anything for Windows named pipes, but I will if I
hear of anything useful.
(cherry picked from commit c8f83979a3)
Conflicts:
pageant.c
Cherry-picker's notes: the conflict was because the original commit
also added a use of the same feature in the centralised Pageant code,
which doesn't exist on this branch. Also I had to remove 'const' from
the type of the second parameter to wrap_send_port_open(), since this
branch hasn't had the same extensive const-fixing as master.
PuTTY now uses the updated version of Diffie-Hellman group exchange,
except for a few old OpenSSH versions which Darren Tucker reports only
support the old version.
FIXME: this needs further work because the Bugs config panel has now
overflowed.
(cherry picked from commit 62a1bce7cb)
Assorted calls to ssh_pkt_getstring in handling the later parts of key
exchange (post-KEXINIT) were not checked for NULL afterwards, so that
a variety of badly formatted key exchange packets would cause a crash
rather than a sensible error message.
None of these is an exploitable vulnerability - the server can only
force a clean null-deref crash, not an access to actually interesting
memory.
Thanks to '3unnym00n' for pointing out one of these, causing me to
find all the rest of them too.
The final main loop in do_ssh2_authconn will sometimes loop over all
currently open channels calling ssh2_try_send_and_unthrottle. If the
channel is a sharing one, however, that will reference fields of the
channel structure like 'remwindow', which were never initialised in
the first place (thanks, valgrind). Fix by excluding CHAN_SHARING
channels from that loop.
I'd rather see the cipher and MAC named separately, with a hint that
the two are linked together in some way, than see the cipher called by
a name including the MAC and the MAC init message have an ugly
'<implicit>' in it.
This allows for sharing a bit of code, and it also means that
deduplication of KEXINIT algorithms can be done while working out the
list of algorithms rather than when constructing the KEXINIT packet
itself.
The general plan is that if PuTTY knows a host key for a server, it
should preferentially ask for the same type of key so that there's some
chance of actually getting the same key again. This should mean that
when a server (or PuTTY) adds a new host key type, PuTTY doesn't
gratuitously switch to that key type and then warn the user about an
unrecognised key.
It seems like quite an important thing to mention in the event log!
Suppose there's a bug affecting only one curve, for example? Fixed-
group Diffie-Hellman has always logged the group, but the ECDH log
message just told you the hash and not also the curve.
To implement this, I've added a 'textname' field to all elliptic
curves, whether they're used for kex or signing or both, suitable for
use in this log message and any others we might find a need for in
future.
When anyone connects to a PuTTY tool's listening socket - whether it's
a user of a local->remote port forwarding, a connection-sharing
downstream or a client of Pageant - we'd like to log as much
information as we can find out about where the connection came from.
To that end, I've implemented a function sk_peer_info() in the socket
abstraction, which returns a freeform text string as best it can (or
NULL, if it can't get anything at all) describing the thing at the
other end of the connection. For TCP connections, this is done using
getpeername() to get an IP address and port in the obvious way; for
Unix-domain sockets, we attempt SO_PEERCRED (conditionalised on some
moderately hairy autoconfery) to get the pid and owner of the peer. I
haven't implemented anything for Windows named pipes, but I will if I
hear of anything useful.
The new code remembers the contents and meaning of the outgoing KEXINIT
and uses this to drive the algorithm negotiation, rather than trying to
reconstruct what the outgoing KEXINIT probably said. This removes the
need to maintain the KEXINIT generation and parsing code precisely in
parallel.
It also fixes a bug whereby PuTTY would have selected the wrong host key
type in cases where the server gained a host key type between rekeys.
Having found a lot of unfixed constness issues in recent development,
I thought perhaps it was time to get proactive, so I compiled the
whole codebase with -Wwrite-strings. That turned up a huge load of
const problems, which I've fixed in this commit: the Unix build now
goes cleanly through with -Wwrite-strings, and the Windows build is as
close as I could get it (there are some lingering issues due to
occasional Windows API functions like AcquireCredentialsHandle not
having the right constness).
Notable fallout beyond the purely mechanical changing of types:
- the stuff saved by cmdline_save_param() is now explicitly
dupstr()ed, and freed in cmdline_run_saved.
- I couldn't make both string arguments to cmdline_process_param()
const, because it intentionally writes to one of them in the case
where it's the argument to -pw (in the vain hope of being at least
slightly friendly to 'ps'), so elsewhere I had to temporarily
dupstr() something for the sake of passing it to that function
- I had to invent a silly parallel version of const_cmp() so I could
pass const string literals in to lookup functions.
- stripslashes() in pscp.c and psftp.c has the annoying strchr nature
The ec_name_to_curve and ec_curve_to_name functions shouldn't really
have had to exist at all: whenever any part of the PuTTY codebase
starts using sshecc.c, it's starting from an ssh_signkey or ssh_kex
pointer already found by some other means. So if we make sure not to
lose that pointer, we should never need to do any string-based lookups
to find the curve we want, and conversely, when we need to know the
name of our curve or our algorithm, we should be able to look it up as
a straightforward const char * starting from the algorithm pointer.
This commit cleans things up so that that is indeed what happens. The
ssh_signkey and ssh_kex structures defined in sshecc.c now have
'extra' fields containing pointers to all the necessary stuff;
ec_name_to_curve and ec_curve_to_name have been completely removed;
struct ec_curve has a string field giving the curve's name (but only
for those curves which _have_ a name exposed in the wire protocol,
i.e. the three NIST ones); struct ec_key keeps a pointer to the
ssh_signkey it started from, and uses that to remember the algorithm
name rather than reconstructing it from the curve. And I think I've
got rid of all the ad-hockery scattered around the code that switches
on curve->fieldBits or manually constructs curve names using stuff
like sprintf("nistp%d"); the only remaining switch on fieldBits
(necessary because that's the UI for choosing a curve in PuTTYgen) is
at least centralised into one place in sshecc.c.
One user-visible result is that the format of ed25519 host keys in the
registry has changed: there's now no curve name prefix on them,
because I think it's not really right to make up a name to use. So any
early adopters who've been using snapshot PuTTY in the last week will
be inconvenienced; sorry about that.
This gives families of public key and kex functions (by which I mean
those sharing a set of methods) a place to store parameters that allow
the methods to vary depending on which exact algorithm is in use.
The ssh_kex structure already had a set of parameters specific to
Diffie-Hellman key exchange; I've moved those into sshdh.c and made
them part of the 'extra' structure for that family only, so that
unrelated kex methods don't have to faff about saying NULL,NULL,0,0.
(This required me to write an extra accessor function for ssh.c to ask
whether a DH method was group-exchange style or fixed-group style, but
that doesn't seem too silly.)
Not all of them, but the ones that don't get a 'void *key' parameter.
This means I can share methods between multiple ssh_signkey
structures, and still give those methods an easy way to find out which
public key method they're dealing with, by loading parameters from a
larger structure in which the ssh_signkey is the first element.
(In OO terms, I'm arranging that all static methods of my public key
classes get a pointer to the class vtable, to make up for not having a
pointer to the class instance.)
I haven't actually done anything with the new facility in this commit,
but it will shortly allow me to clean up the constant lookups by curve
name in the ECDSA code.
All the name strings in ssh_cipher, ssh_mac, ssh_hash, ssh_signkey
point to compile-time string literals, hence should obviously be const
char *.
Most of these const-correctness patches are just a mechanical job of
adding a 'const' in the one place you need it right now, and then
chasing the implications through the code adding further consts until
it compiles. But this one has actually shown up a bug: the 'algorithm'
output parameter in ssh2_userkey_loadpub was sometimes returning a
pointer to a string literal, and sometimes a pointer to dynamically
allocated memory, so callers were forced to either sometimes leak
memory or sometimes free a bad thing. Now it's consistently
dynamically allocated, and should be freed everywhere too.
There were ad-hoc functions for fingerprinting a bare key blob in both
cmdgen.c and pageant.c, not quite doing the same thing. Also, every
SSH-2 public key algorithm in the code base included a dedicated
fingerprint() method, which is completely pointless since SSH-2 key
fingerprints are computed in an algorithm-independent way (just hash
the standard-format public key blob), so each of those methods was
just duplicating the work of the public_blob() method with a less
general output mechanism.
Now sshpubk.c centrally provides an ssh2_fingerprint_blob() function
that does all the real work, plus an ssh2_fingerprint() function that
wraps it and deals with calling public_blob() to get something to
fingerprint. And the fingerprint() method has been completely removed
from ssh_signkey and all its implementations, and good riddance.
Obviously PuTTY can't actually do public-key authentication itself, if
you give it a public rather than private key file. But it can still
match the supplied public key file against the list of keys in the
agent, and narrow down to that. So if for some reason you're
forwarding an agent to a machine you don't want to trust with your
_private_ key file (even encrypted), you can still use the '-i' option
to select which key from the agent to use, by uploading just the
public key file to that machine.
It's just ssh_pkt_addstring_data but using strlen to get the length of
string to add, so make that explicit by having it call
ssh_pkt_addstring_data. Good compilers should be unaffected by this
change.
This is the kex protocol id "curve25519-sha256@libssh.org", so called
because it's over the prime field of order 2^255 - 19.
Arithmetic in this curve is done using the Montgomery representation,
rather than the Weierstrass representation. So 'struct ec_curve' has
grown a discriminant field and a union of subtypes.
Several of the functions in ssh2_signkey, and one or two SSH-1 key
functions too, were still taking assorted non-const buffer parameters
that had never been properly constified. Sort them all out.
This causes the initial length field of the SSH-2 binary packet to be
unencrypted (with the knock-on effect that now the packet length not
including MAC must be congruent to 4 rather than 0 mod the cipher
block size), and then the MAC is applied over the unencrypted length
field and encrypted ciphertext (prefixed by the sequence number as
usual). At the cost of exposing some information about the packet
lengths to an attacker (but rarely anything they couldn't have
inferred from the TCP headers anyway), this closes down any
possibility of a MITM using the client as a decryption oracle, unless
they can _first_ fake a correct MAC.
ETM mode is enabled by means of selecting a different MAC identifier,
all the current ones of which are constructed by appending
"-etm@openssh.com" to the name of a MAC that already existed.
We currently prefer the original SSH-2 binary packet protocol (i.e. we
list all the ETM-mode MACs last in our KEXINIT), on the grounds that
it's better tested and more analysed, so at the moment the new mode is
only activated if a server refuses to speak anything else.