If we're setting ssh->s to NULL, we ought to free the thing it
previously pointed to (having extracted the error message first).
At the very least this is a memory leak.
But in fact it's worse, because not freeing it also means not
cancelling its toplevel callbacks. And if you don't do that, then a
failure to set up an SSH connection proxied over another SSH
connection will generate two error dialog boxes in succession, the
second one from the callback that should have been cancelled here.
On Windows that callback never gets called, because we exit the whole
process before getting into the main message loop which might run the
callback. But on Unix, we do go to the main message loop (we don't
have a separate one for the error box), which causes an assertion
failure in register_dialog() when the second box finds the
DIALOG_SLOT_CONNECTION_FATAL slot already occupied.
The immediate usefulness of this is in pterm.exe: when the user uses
-e to specify a command to run in the pterm, we retrieve the command
in Unicode, store it in CONF_remote_cmd as UTF-8, and then in conpty.c
we can extract it in the same form and convert it back to Unicode to
pass losslessly to CreateProcessW. So now non-ACP Unicode works in
that part of the pterm command line.
In the previous few commits I noticed some repeated work in the form
of pointless empty implementations of Plug's log method, plus some
existing (and some new) empty cases of Socket's endpoint_info. As a
cleanup, I'm replacing as many as I can find with uses of a central
null implementation in the stubs directory.
This enables plug_log to run query methods on the socket in order to
find out useful information to log. I don't expect it's sensible to do
anything else with it.
I'm preparing to be able to ask about the other end of the connection
too, so the first step is to give this data structure a neutral name
that can refer to either. No functional change yet.
This involved a trivial merge conflict fix in terminal.c because of
the way the cherry-pick 73b41feba5 differed from its original
bdbd5f429c.
But a more significant rework was needed in windows/console.c, because
the updates to confirm_weak_* conflicted with the changes on main to
abstract out the ConsoleIO system.
The Terrapin vulnerability affects the modified binary packet protocol
used with ChaCha20+Poly1305, and also CBC-mode ciphers in ETM mode.
It's best prevented by the new strict-kex mode, but if the server
can't handle that protocol alteration, another approach is to change
PuTTY's configuration so that it will negotiate a different algorithm.
That may not be possible either (an obvious case being if the server
has been manually configured to _only_ support vulnerable modes). But
if it is possible, then it would be nice for us to detect that and
show how to do it.
That could be a hard problem in general, but the most likely cause of
it is configuring ChaCha20 to the top of the cipher list, so that it's
selected ahead of things that aren't vulnerable. And it's reasonably
easy to do just one fantasy-renegotiation, having moved ChaCha20 down
to below the warn line, and see if that sorts it out. If it does, we
can pass on that advice to the user.
This will allow it to be called in a second circumstance where we're
trying to find out whether something _would_ have worked, so that we
never want to terminate the connection.
If the KEXINIT exchange results in a vulnerable cipher mode, we now
give a warning, similar to the 'we selected a crypto primitive below
the warning threshold' one. But there's nothing we can do about it at
that point other than let the user abort the connection.
This is enabled via magic signalling keywords in the kex algorithms
list, similarly to ext-info-{c,s}. If both sides announce the
appropriate keyword, then this signals two changes to the standard SSH
protocol:
1. NEWKEYS resets packet sequence numbers: following any NEWKEYS, the
next packet sent in the same direction has sequence number zero.
2. No extraneous packets such as SSH_MSG_IGNORE are permitted during
the initial cleartext phase of the SSH protocol.
These two changes between them defeat the 'Terrapin' vulnerability,
aka CVE-2023-48795: a protocol-level exploit in which, for example, a
MITM injects a server-to-client SSH_MSG_IGNORE during the cleartext
phase, and deletes an initial segment of the server-to-client
encrypted data stream that it guesses is the right size to be the
server's SSH_MSG_EXT_INFO, so that both sides agree on the sequence
number of the _following_ server-to-client packet. In OpenSSH's
modified binary packet protocol modes this attack can go completely
undetected, and force a downgrade to (for example) SHA-1 based RSA.
(The ChaCha20/Poly1305 binary packet protocol is most vulnerable,
because it reinitialises the IV for each packet from scratch based on
the sequence number, so the keystream doesn't get out of sync.
Exploiting this in OpenSSH's ETM modes requires additional faff to
resync the keystream, and even then, the client likely sees a
corrupted SSH message at the start of the stream - but it will just
send SSH_MSG_UNIMPLEMENTED in response to that and proceed anyway. CBC
modes and standard AES SDCTR aren't vulnerable, because their MACs are
based on the plaintext rather than the ciphertext, so faking a correct
MAC on the corrupted packet requires the attacker to know what it
would decrypt to.)
This centralises the messages for weak crypto algorithms (general, and
host keys in particular, the latter including a list of all the other
available host key types) into ssh/common.c, in much the same way as
we previously did for ordinary host key warnings.
The reason is the same too: I'm about to want to vary the text in one
of those dialog boxes, so it's convenient to start by putting it
somewhere that I can modify just once.
I'm about to want to use the same code to check for something else.
It's only a handful of lines, but even so.
Also, since the string constants are mentioned several times, this
seems like a good moment to lift them out into reusable static const
ptrlens.
ssh2_scan_kexinits must check to see whether it's behaving as an SSH
client or server, in order to decide whether to look for "ext-info-s"
in the server's KEXINIT or "ext-info-c" in the client's, respectively.
This check was done by testing the pointer 'server_hostkeys' to see if
it was non-NULL. I think I must have imagined that a variable of that
name meant "the host keys we have available to offer the client, if we
are the server", as the similarly named parameter 'our_hostkeys' in
write_kexinit_lists in fact does mean. So I expected it to be non-NULL
for the server and NULL for the client, and coded accordingly.
But in fact it's used by the client: it collects host key types the
client has _seen_ from the server, in order to offer them as cross-
certification actions in the specials menu. Moreover, it's _always_
non-NULL, because in the server, it's easier to leave it present but
empty than to get rid of it.
So this code was always behaving as if it was the server, i.e. it was
looking for "ext-info-c" in the client KEXINIT. When it was in fact
the client, that test would always succeed, because we _sent_ that
KEXINIT ourselves!
But nobody ever noticed, because when we're the client, it doesn't
matter whether we saw "ext-info-c", because we don't have any reason
to send EXT_INFO from client to server. We're only concerned with
server-to-client EXT_INFO. So this embarrassing bug had no actual
effect.
Spotted by Coverity. If PuTTY is functioning as a sharing upstream,
and a new downstream mishandles the version string exchange in any way
that provokes an error message from share_receive() (such as failing
to start the greeting with the expected protocol-name string), we were
calling share_disconnect() and then going to crFinish. But
share_disconnect is capable of actually freeing the entire
ssh_sharing_connstate which contains the coroutine state - in which
case, crFinish's zeroing out of crLine is a use-after-free.
The usual pattern elsewhere in this code is to exit a coroutine with
an ordinary 'return' when you've destroyed its state structure. Switch
to doing that here.
When you send a "publickey" USERAUTH_REQUEST containing a certified
RSA key, and you want to use a SHA-2 based RSA algorithm, modern
OpenSSH expects you to send the algorithm string as
rsa-sha2-NNN-cert-v01@openssh.com. But 7.7 and earlier didn't
recognise those names, and expected the algorithm string in the
userauth request packet to be ssh-rsa-cert-v01@... and would then
follow it with an rsa-sha2-NNN signature.
OpenSSH itself has a bug workaround for its own older versions. Follow
suit.
If you specify a detached certificate, it's supposed to completely
replace any certificate that might have been embedded in the input PPK
file. But one thing wasn't working: if the key was RSA, and the server
was using new SHA-2 based RSA, and the user provided both an embedded
_and_ detached certificate, then the initial call to
ssh2_userauth_signflags would upgrade the ssh-rsa-cert-... key type to
rsa-sha2-NNN-cert-..., which ssh2_userauth_add_alg_and_publickey's
call to ssh_keyalg_related_alg would not recognise as any of the base
RSA types while trying to decide on the key algorithm string _after_
replacing the certificate.
Fixed by reverting to the the uncertified base algorithm before
calling ssh_keyalg_related_alg.
Introduced in the previous commit. The new ssh_ppl_final_output method
shouldn't be called in any of the error cleanup functions if
ssh->base_layer is NULL, which it can be if we haven't got far enough
through the connection to set up any packet protocol layers at
all. (For example, ECONNREFUSED would do it.)
This should fix the bug mentioned three commits ago: if an SSH server
sends a userauth banner and then immediately slams the connection
shut (with or without SSH_MSG_DISCONNECT), the banner message should
now be reliably printed to the user, which is important if that's
where the server put its explanation for the disconnection (e.g. "Your
account has expired").
(cherry picked from commit e8becb45b5)
No functional change: I've just pulled out into separate subroutines
the piece of code that process a USERAUTH_BANNER message and append
it to our banner bufchain, and the piece that prints the contents of
the bufchain as user output. This will enable them to be called from
additional places easily.
(cherry picked from commit 99bbbd8d32)
This is called just before closing the connection, and gives every PPL
one last chance to output anything to the user that it might have
buffered.
No functional change: all implementations so far are trivial, except
that the transport layer passes the call on to its higher
layer (because otherwise nothing would do so).
(cherry picked from commit d6e6919f69)
This should fix the bug mentioned three commits ago: if an SSH server
sends a userauth banner and then immediately slams the connection
shut (with or without SSH_MSG_DISCONNECT), the banner message should
now be reliably printed to the user, which is important if that's
where the server put its explanation for the disconnection (e.g. "Your
account has expired").
No functional change: I've just pulled out into separate subroutines
the piece of code that process a USERAUTH_BANNER message and append
it to our banner bufchain, and the piece that prints the contents of
the bufchain as user output. This will enable them to be called from
additional places easily.
This is called just before closing the connection, and gives every PPL
one last chance to output anything to the user that it might have
buffered.
No functional change: all implementations so far are trivial, except
that the transport layer passes the call on to its higher
layer (because otherwise nothing would do so).
A user reported yesterday that PuTTY can fail to print a userauth
banner message if the server sends one and then immediately slams the
connection shut. The first step to fixing this is making a convenient
way to reproduce that server behaviour.
(Apparently the real use case has to do with account expiry - the
server in question presumably doesn't have enough layer violations to
be able to put the text "Your account has expired" into an
SSH_MSG_DISCONNECT, so instead it does the next best thing and sends
it as a userauth banner immediately before disconnection.)
For the same reason that diffie-hellman-group18 goes below group16:
it's useful to _have_ it there, in case a server demands it, but under
normal circumstances it seems like overkill and a waste of CPU.
SHA-256 is not only intrinsically faster, it's also more likely to be
hardware-accelerated, so PuTTY's preference is to use that if possible
and SHA-512 only if necessary.
(cherry picked from commit 289d123fb8)
I saw a post on comp.security.ssh just now where someone had
encountered an SSH server that would _only_ speak that, which makes it
worth bothering to implement.
The totally obvious implementation works, and passes the test cases
from RFC 6234.
(cherry picked from commit b77e985513)
For the same reason that diffie-hellman-group18 goes below group16:
it's useful to _have_ it there, in case a server demands it, but under
normal circumstances it seems like overkill and a waste of CPU.
SHA-256 is not only intrinsically faster, it's also more likely to be
hardware-accelerated, so PuTTY's preference is to use that if possible
and SHA-512 only if necessary.
I saw a post on comp.security.ssh just now where someone had
encountered an SSH server that would _only_ speak that, which makes it
worth bothering to implement.
The totally obvious implementation works, and passes the test cases
from RFC 6234.
ssh->base_layer is NULL when the connection is still in its early
stages, before greetings are exchanged. If the user invokes the Change
Settings dialog in this situation, ssh_reconfig would call
ssh_ppl_reconfigure() on ssh->base_layer without checking if it was
NULL first.
(cherry picked from commit d67c13eeb8)
While writing the previous patch, I realise that walking along a
decrypted string and stopping to complain about the first mismatch you
find is an anti-pattern. If we're going to deliberately give the same
error message for various mismatches, so as not to give away which
part failed first, then we should also avoid giving away the same
information via a timing leak!
I don't think this is serious enough to warrant the full-on advisory
protocol, because XDM-AUTHORIZATION-1 is rarely used these days and
also DES-based, so there are bigger problems with it. (Plus, why on
earth is it based on encryption anyway, not a MAC?) But since I
spotted it in passing, might as well fix it.
(cherry picked from commit 8e7e3c5944)
Now the return value is a dynamically allocated string instead of a
static one, which means that the error message can include details
taken from the specific failing connection. In particular, if someone
requests an X11 authorisation protocol we don't support, we can print
its name as part of the message, which may help users debug the
problem.
One particularly important special case of this is that if the client
connection presents _no_ authorisation - which is surely by far the
most likely thing to happen by accident, e.g. if the auth file has
gone missing, or the hostname doesn't match for some reason - then we
now give a specific message "No authorisation provided", which I think
is considerably more helpful than just lumping that very common case
in with "Unsupported authorisation protocol". Even changing the latter
to "Unsupported authorisation protocol ''" is still not very sensible.
The problem in that case is not that the user has tried an exotic auth
protocol we've never heard of - it's that they've forgotten, or
failed, to provide one at all.
The error message for "XDM-AUTHORIZATION-1 data was wrong length" is
the other modified one: it now says what the wrong length _was_.
However, all other failures of X-A-1 are still kept deliberately
vague, because saying which part of the decrypted string didn't match
is an obvious information leak.
(cherry picked from commit dff4bd4d14)
I thought I'd found all of these before, but perhaps a few managed to
slip in since I last looked. The character argument to the <ctype.h>
functions must have the value of an unsigned char or EOF; passing an
ordinary char (unless you know char is unsigned on every platform the
code will ever go near) risks mistaking '\xFF' for EOF, and causing
outright undefined behaviour on byte values in the range 80-FE. Never
do it.
(cherry picked from commit a76109c586)
Like 5f3b743eb0, specifically reassure the user that taking the
add-to-cache action will not cause the CA that signed the key to be
trusted in any wider context, in the case where there was no previous
certified key cached. (I don't know why I missed this out before.)
(cherry picked from commit 9209c7ea38)
ssh->base_layer is NULL when the connection is still in its early
stages, before greetings are exchanged. If the user invokes the Change
Settings dialog in this situation, ssh_reconfig would call
ssh_ppl_reconfigure() on ssh->base_layer without checking if it was
NULL first.
While writing the previous patch, I realise that walking along a
decrypted string and stopping to complain about the first mismatch you
find is an anti-pattern. If we're going to deliberately give the same
error message for various mismatches, so as not to give away which
part failed first, then we should also avoid giving away the same
information via a timing leak!
I don't think this is serious enough to warrant the full-on advisory
protocol, because XDM-AUTHORIZATION-1 is rarely used these days and
also DES-based, so there are bigger problems with it. (Plus, why on
earth is it based on encryption anyway, not a MAC?) But since I
spotted it in passing, might as well fix it.
Now the return value is a dynamically allocated string instead of a
static one, which means that the error message can include details
taken from the specific failing connection. In particular, if someone
requests an X11 authorisation protocol we don't support, we can print
its name as part of the message, which may help users debug the
problem.
One particularly important special case of this is that if the client
connection presents _no_ authorisation - which is surely by far the
most likely thing to happen by accident, e.g. if the auth file has
gone missing, or the hostname doesn't match for some reason - then we
now give a specific message "No authorisation provided", which I think
is considerably more helpful than just lumping that very common case
in with "Unsupported authorisation protocol". Even changing the latter
to "Unsupported authorisation protocol ''" is still not very sensible.
The problem in that case is not that the user has tried an exotic auth
protocol we've never heard of - it's that they've forgotten, or
failed, to provide one at all.
The error message for "XDM-AUTHORIZATION-1 data was wrong length" is
the other modified one: it now says what the wrong length _was_.
However, all other failures of X-A-1 are still kept deliberately
vague, because saying which part of the decrypted string didn't match
is an obvious information leak.
I thought I'd found all of these before, but perhaps a few managed to
slip in since I last looked. The character argument to the <ctype.h>
functions must have the value of an unsigned char or EOF; passing an
ordinary char (unless you know char is unsigned on every platform the
code will ever go near) risks mistaking '\xFF' for EOF, and causing
outright undefined behaviour on byte values in the range 80-FE. Never
do it.
This allows you to set a flag in conio_setup() which causes the
returned ConsoleIO object to interpret all its output as UTF-8, by
translating it to UTF-16 and using WriteConsoleW to write it in
Unicode. Similarly, input is read using ReadConsoleW and decoded from
UTF-16 to UTF-8.
This flag is set to false in most places, to avoid making sudden
breaking changes. But when we're about to present a prompts_t to the
user, it's set from the new 'utf8' flag in that prompt, which in turn
is set by the userauth layer in any case where the prompts are going
to the server.
The idea is that this should be the start of a fix for the long-
standing character-set handling bug that strings transmitted during
SSH userauth (usernames, passwords, k-i prompts and responses) are all
supposed to be in UTF-8, but we've always encoded them in whatever our
input system happens to be using, and not done any tidying up on them.
We get occasional complaints about this from users whose passwords
contain characters that are encoded differently between UTF-8 and
their local encoding, but I've never got round to fixing it because
it's a large piece of engineering.
Indeed, this isn't nearly the end of it. The next step is to add UTF-8
support to all the _other_ ways of presenting a prompts_t, as best we
can.
Like the previous change to console handling, it seems very likely
that this will break someone's workflow. So there's a fallback
command-line option '-legacy-charset-handling' to revert to PuTTY's
previous behaviour.
Like 5f3b743eb0, specifically reassure the user that taking the
add-to-cache action will not cause the CA that signed the key to be
trusted in any wider context, in the case where there was no previous
certified key cached. (I don't know why I missed this out before.)
Jacob spotted that an unused -pwfile input can be accidentally used as
the answer to Plink's antispoof 'press Return to begin session'
prompt, which is unintended and confusing.
To fix that, I've made the use of a command-line password conditional
on p->to_server, the flag in a prompts_t that indicates whether the
results of the prompts are going to be sent directly to the server or
consumed locally by PuTTY. (And I've also corrected the setting of
to_server in the antispoof prompt, which was true when it should have
been false.)
A side effect of this is that -pwfile will no longer work to provide a
private-key passphrase, if you're using public-key authentication
without Pageant. This is deliberate, because if you're doing that on
purpose then Pageant is a better way to achieve the same thing (or
else just store the key unencrypted, which is no worse); but in the
case of a server that sequentially demands public-key _and_ password
authentication, the new behaviour makes -pwfile apply to the right one
of the two prompts, i.e. the actual password.
This was pointed out by another compiler warning. The 'name' parameter
of inquire_cred_by_mech is not a gss_name_t (which is the type of
GSS_C_NO_NAME); it's a gss_name_t *, because it's an _output_
parameter. We're not telling the library that we aren't _passing_ a
name: we're telling it that we don't need it to _return_ us a name. So
the appropriate null pointer representation is just NULL.
(This was harmless apart from a compiler warning, because gss_name_t
is a pointer type in turn and GSS_C_NO_NAME expands to a null pointer
anyway. It was just a wrongly-typed null pointer.)
Heimdal provides its own definitions of OIDs like GSS_C_NT_USER_NAME
in the form of macros, which conflict with our attempt to redefine
them as variables - the macro gets expanded into the middle of the
variable declaration, leaving the poor C compiler trying to parse a
non-declaration along the lines of
const_gss_OID (&__gss_c_nt_anonymous_oid_desc) = oids+5;
Easily fixed by just not redefining these at all if they're already
defined as macros. To make that easier, I've broken up the oids[]
array into individual gss_OID_desc declarations, so I can put each one
inside the appropriate ifdef.
In the process, I've removed the 'const' from the gss_OID_desc
declarations. That's on purpose! The problem is that not all
implementations of the GSSAPI headers make const_gss_OID a pointer to
a *const* gss_OID_desc; sometimes it's just a plain one and the
'const' prefix is just a comment to the user. So removing that const
prevents compiler warnings (or worse) about address-taking a const
thing and assigning it into a non-const pointer.
I mentioned recently (in commit 9e7d4c53d8) message that I'm no
longer fond of the variable name 'ret', because it's used in two quite
different contexts: it's the return value from a subroutine you just
called (e.g. 'int ret = read(fd, buf, len);' and then check for error
or EOF), or it's the value you're preparing to return from the
_containing_ routine (maybe by assigning it a default value and then
conditionally modifying it, or by starting at NULL and reallocating,
or setting it just before using the 'goto out' cleanup idiom). In the
past I've occasionally made mistakes by forgetting which meaning the
variable had, or accidentally conflating both uses.
If all else fails, I now prefer 'retd' (short for 'returned') in the
former situation, and 'toret' (obviously, the value 'to return') in
the latter case. But even better is to pick a name that actually says
something more specific about what the thing actually is.
One particular bad habit throughout this codebase is to have a set of
functions that deal with some object type (say 'Foo'), all *but one*
of which take a 'Foo *foo' parameter, but the foo_new() function
starts with 'Foo *ret = snew(Foo)'. If all the rest of them think the
canonical name for the ambient Foo is 'foo', so should foo_new()!
So here's a no-brainer start on cutting down on the uses of 'ret': I
looked for all the cases where it was being assigned the result of an
allocation, and renamed the variable to be a description of the thing
being allocated. In the case of a new() function belonging to a
family, I picked the same name as the rest of the functions in its own
family, for consistency. In other cases I picked something sensible.
One case where it _does_ make sense not to use your usual name for the
variable type is when you're cloning an existing object. In that case,
_neither_ of the Foo objects involved should be called 'foo', because
it's ambiguous! They should be named so you can see which is which. In
the two cases I found here, I've called them 'orig' and 'copy'.
As in the previous refactoring, many thanks to clang-rename for the
help.