The caller of new_connection has relinquished ownership of the
SockAddr it passes in. So the receiver of that SockAddr must remember
to free it, or else we leak memory.
(Additionally, this means SshProxy will be able to remember the
address during its run, e.g. to use in calls to its Plug. But that's
not implemented yet.)
Analogous to the bug I just fixed in xtruss: in the loop that tries to
find a reasonable port number for an X display, the sense of the
(horrible) strcmp distinguishing EADDRINUSE from other socket errors
was backwards.
It was introduced in error by commit d8fda3b6da: I had originally
intended to do test runs of prime generation by means of running every
attempt with logging enabled, and after each failed attempt, deleting
the log file and restarting it. But that was _far_ too slow, so I
abandoned that approach, and switched to the alternative method of
searching for a prime with logging turned off, and then repeating just
the final successful attempt under logging conditions.
log_discard() was the 'delete the log file' function intended for use
in the first of those strategies. It wasn't actually used in the end,
so I need not have committed it - and worse, the comment inside it
claiming it _is_ used in prime generation is needlessly confusing!
In the section about our ad-hoc trait idioms, I described a code
sample as containing a set of 'static inline' wrapper functions, which
indeed it should have done - but I forgot to put the 'inline' keyword
in the code sample itself.
I ran across their defining RFCs recently and noticed that each one
provides an explicit mathematical expression for the prime (since each
one is derived from the expansion of pi, with framing FFs and a
correction term to make it actually prime).
Those expressions can be re-evaluated trivially by spigot, so it seems
reasonable to add those spigot commands in comments. This also means
the comments contain citations for these primes in actual standards,
including both the hex digits and the mathematical expressions.
Not sure how I hadn't needed that before! Obviously, if I have a test
program that can exercise all the prime generation systems, it should
include _all_ of them.
Now that I've removed side-channel leakage from both prime candidate
generation (via mp_unsafe_mod_integer) and Miller-Rabin, the
probabilistic prime generation system in this code base is now able to
get through testsc without it detecting any source of cache or timing
side channels. So you should be able to generate an RSA key (in which
the primes themselves must be secret) in a more hostile environment
than you could previously be confident of.
This is a bit counterintuitive, because _obviously_ random prime
generation takes a variable amount of time, because it has to keep
retrying until an attempt succeeds! But that's OK as long as the
attempts are completely independent, because then any timing or cache
information leaked by a _failed_ attempt will only tell an attacker
about the numbers used in the failed attempt, and those numbers have
been thrown away, so it doesn't matter who knows them. It's only
important that the _successful_ attempt, from generating the random
candidate through to completing its verification as (probably) prime,
should be side-channel clean, because that's the attempt whose data is
actually going to be turned into a private key that needs to be kept
secret.
(In particular, this means you have to avoid the old-fashioned
strategy of generating successive prime candidates by incrementing a
starting value until you find something not divisible by any small
prime, because the number of iterations of that method would be a
timing leak. Happily, we stopped doing that last year, in commit
08a3547bc5: now every candidate integer is generated
independently, and if one fails the initial checks, we throw it away
and start completely from scratch with a fresh random value.)
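For concreteness, the overall shape of that rejection-sampling loop
looks like this (the helper names here are invented; the real code
goes through the PrimeCandidateSource machinery):

    for (;;) {
        mp_int *cand = fresh_random_odd_integer(bits);   /* hypothetical */
        if (passes_small_prime_sieve(cand) &&            /* hypothetical */
            passes_miller_rabin(cand))                   /* hypothetical */
            return cand;    /* only this attempt's data survives */
        mp_free(cand);      /* failure: throw the whole candidate away
                             * and restart from fresh randomness */
    }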
So the test harness works by repeatedly running the prime generator in
one-shot mode until an attempt succeeds, and then resetting the
random-number stream to where it was just before the successful
attempt. Then we generate the same prime number again, this time with
the sclog mechanism turned on - and then, we compare it against the
version we previously generated with the same random numbers, to make
sure they're the same. This checks that the attempts really _are_
independent, in the sense that the prime generator is a pure function
of its random input stream, and doesn't depend on state left over from
previous attempts.
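A minimal sketch of that replay logic, with an invented
snapshot/restore interface standing in for however the deterministic
test RNG is actually rewound:

    RngState before;                      /* hypothetical snapshot type */
    mp_int *p1, *p2;
    do {
        before = rng_snapshot();          /* remember the stream position */
        p1 = generate_prime_oneshot();    /* returns NULL on failure */
    } while (!p1);
    rng_restore(before);                  /* rewind to just before success */
    sclog_start();                        /* hypothetical: logging now on */
    p2 = generate_prime_oneshot();        /* replay the winning attempt */
    assert(p2 && mp_cmp_eq(p1, p2));      /* same randomness => same prime */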
I had a testsc run fail because of alignment-dependent control flow
divergence in a glibc function with 'memmove' in the name, which
appears to have been an accident of different memory allocation
between two runs of the test in question.
sclog was already giving special handling to memset for the same
reason, so it's no trouble to add memmove to the same list of
functions that are treated as an opaque primitive for logging
purposes.
Previously, it would generate a prime candidate, test it, and abort if
that candidate failed to be prime. Now it's also willing to fail
_before_ generating a prime candidate, if even the first attempt to
generate one is unsuccessful.
This doesn't affect the existing use case of pcs_set_oneshot, which is
during generation of a safe prime (as implemented by test/primegen.py
--safe), where you want to make a PrimeCandidateSource that can only
return 2p+1 for your existing prime p, and then abort if that fails
the next step of testing. In that situation, the PrimeCandidateSource
will never fail to generate its first output anyway.
But these changed semantics will become useful in another use I'm
about to find for one-shot mode.
Thanks to Mark Wooding for explaining the method of doing this. At
first glance it seemed _obviously_ impossible to run an algorithm that
needs an iteration per factor of 2 in p-1, without a timing leak
giving away the number of factors of 2 in p-1. But it's not, because
you can do the M-R checks interleaved with each step of your whole
modular exponentiation, and they're cheap enough that you can do them
in _every_ step, even the ones where the exponent is too small for M-R
to be interested in yet, and then do bitwise masking to exclude the
spurious results from the final output.
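To make the trick concrete, here's a self-contained sketch for a
64-bit n (the real code does the same thing with mp_int arithmetic
and properly constant-time comparisons; the point here is the
per-step masking):

    #include <stdbool.h>
    #include <stdint.h>

    static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
    {
        return (uint64_t)((unsigned __int128)a * b % n);
    }

    /* One M-R trial of witness a against odd n > 2. Compute a^(n-1) by
     * left-to-right square-and-multiply, and at EVERY step compare the
     * running power against 1 and n-1, masking the result so it only
     * counts at steps where the partial exponent has the form d*2^j
     * (writing n-1 = d*2^k with d odd) - the steps M-R cares about. */
    static bool mr_trial(uint64_t n, uint64_t a)
    {
        int bits = 64 - __builtin_clzll(n - 1);
        uint64_t x = 1, passed = 0;
        for (int i = bits - 1; i >= 0; i--) {
            uint64_t bit = (n - 1) >> i & 1, mask = 0 - bit;
            x = mulmod(x, x, n);
            uint64_t xa = mulmod(x, a, n);
            x = (xa & mask) | (x & ~mask);  /* constant-time select */
            /* The partial exponent is d*2^j iff the unprocessed low
             * bits of n-1 are all zero; it is exactly d iff, in
             * addition, bit i of n-1 is set (because d is odd). */
            uint64_t low_zero =
                ((n - 1) & (((uint64_t)1 << i) - 1)) == 0;
            passed |= low_zero & (i >= 1) & (x == n - 1); /* a^(d*2^j) == -1 */
            passed |= low_zero & bit & (x == 1);          /* a^d == +1 */
        }
        return passed;      /* false => n is definitely composite */
    }

(A real implementation would also replace the == comparisons with
constant-time ones, and of course repeat the trial for enough
independent witnesses.)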
I'm about to rewrite the Miller-Rabin testing code, so let's start by
introducing a test suite that the old version passes, and then I can
make sure the new one does too.
I've moved it from mpunsafe.c into the main mpint.c, and renamed it
mp_mod_known_integer, because now it manages to avoid leaking
information about the mp_int you give it.
It can still potentially leak information about the small _modulus_
integer - hence the word 'known' in the new function name. This won't
be a problem in any existing use of the function, because it's used
during prime generation to check divisibility by all the small primes,
and optionally also check for residue 1 mod the RSA public exponent.
But all those values are well known and not secret.
This removes one source of side-channel leakage from prime generation.
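The underlying operation is easy to make value-independent when only
the modulus is public. A minimal sketch of the idea, not the real
mpint code:

    #include <stddef.h>
    #include <stdint.h>

    /* Reduce an n-limb integer mod a small KNOWN modulus m. Every limb
     * is processed exactly once, with no value-dependent branches, so
     * the timing depends on the number's size and on m, but not on the
     * secret value itself. (On CPUs whose divide instruction has
     * operand-dependent latency, real constant-time code would divide
     * by a precomputed reciprocal of m instead of using '%'.) */
    uint32_t mod_known_integer(const uint32_t *limbs, size_t nlimbs,
                               uint32_t m)
    {
        uint64_t acc = 0;
        for (size_t i = nlimbs; i-- > 0;)   /* most significant first */
            acc = ((acc << 32) | limbs[i]) % m;
        return (uint32_t)acc;
    }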
In most Halibut man pages I write, I have a standard convention of
referring to another man page by wrapping the page name in \cw and the
section number in \e, leaving the parentheses un-marked-up. Apparently
I forgot in this particular collection.
When UML terminates, it kills its entire process group. The way PuTTY
invokes proxy processes, they are part of its process group. So if UML
is used directly as the proxy process, it will commit patricide on
termination.
Wrapping it in 'setsid' is overkill (it doesn't need to be part of a
separate _session_, only a separate pgrp), but it's good enough to
work around this problem, and give PuTTY the opportunity to shut down
cleanly when the UML it's talking to vanishes.
Ian Jackson recently tried to use the recipe in the psusan manpage for
talking to UML, and found that the connection was not successfully set
up, because at some point during startup, UML read the SSH greeting
(ok, the bare-ssh-connection greeting) from its input fd and threw it
away. So by the time psusan was run by the guest init process, the
greeting wasn't there to be read.
Ian's report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991958
I was also able to reproduce this locally, which makes me wonder why I
_didn't_ notice it when I originally wrote that part of the psusan man
page. It worked for me before, honest! But now it doesn't.
Anyway. The ssh verstring module already has a mode switch to decide
whether we ought to send our greeting before or after waiting for the
other side's greeting (because that decision varies between client and
server, and between SSH-1 and SSH-2). So it's easy to implement an
override that forces it to 'wait for the server greeting first'.
I've added this as yet another bug workaround flag. But unlike all the
others, it can't be autodetected from the server's version string,
because, of course, we have to act on it _before_ seeing the server's
greeting and version string! So it's a manual-only flag.
However, I've mentioned it in the UML section of the psusan man page,
since that's the place where I _know_ people are likely to need to use
this flag.
Following the same pattern as the previous one (commit 6c924ba862),
except that this time, I don't have to _set up_ the pattern in the
front-end code of presenting the current and previous key details -
just change over the actual string literals in putty.h.
But the rest is the same: new keys at the top of pgpkeys.but, old ones
relegated to the historical appendix, key ids in sign.sh switched over.
When do_paint breaks up a line of terminal text into contiguous runs
of characters to treat the same, one of the criteria it uses is, 'Does
this character even need redrawing? (or is it already displayed
correctly from the previous redraw?)' When we encounter a character
that matches its previous value, we end the previous run of
characters, so that we can skip the one we've just encountered.
That check was not taking account of the 'truecolour' field of the
termchar it was checking. So it would sometimes falsely believe the
character to be equivalent to its previously drawn value, even when in
fact it was not, and hence insert a run break, anticipating that the
previous character needed drawing and the current one did not.
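So the fix is presumably just to include that field in the
comparison, something like this (struct and helper names invented for
illustration):

    /* 'Is this cell already displayed correctly?' The test has to
     * cover every field that affects rendering - including the
     * previously forgotten truecolour one. */
    static bool cell_unchanged(const termchar *a, const termchar *b)
    {
        return a->chr == b->chr && a->attr == b->attr &&
            truecolours_equal(a->truecolour, b->truecolour);
    }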
This didn't cause a _wrong_ redraw, because there's a separate loop
further on which re-checks whether to actually draw things, which
didn't make the same error. So the character that loop #1 thought
didn't need a redraw, loop #2 knew _did_ need a redraw, and hence,
everything did get redrawn.
But by the time loop #2 is running, it's too late to change the run
boundaries. So everything does get redrawn, but in much smaller chunks
than it could have been. The net effect was that if the screen was
filled with text displayed in true colour, and you changed it to the
_same_ text in a different colour, then the whole terminal would be
redrawn in one-character increments instead of the usual behaviour of
folding together runs that can be drawn in one go.
Thanks to Bradley Smith for debugging this very confusing issue!
A user reports that if you have MIT KfW loaded, and your PuTTY session
terminates without the PuTTY process exiting, and you select 'Restart
Session' from the menu, then a crash occurs inside the Kerberos
library itself. Scuttlebutt on the Internet suggested this might be to
do with unloading and then reloading the DLL within the process
lifetime, which indeed we were doing.
Now we avoid doing that for the KfW library in particular, by keeping
a tree234 of module handles marked 'never unload this'.
This is a workaround at best, but it seems to stop the problem
happening in my own tests.
(cherry picked from commit 058e390ab5)
If you don't, they are permanently leaked. A user points out that this
is particularly bad in Pageant, with the new named-pipe-based IPC,
since it will spawn an input and output I/O thread per named pipe
connection, leading to two handles being leaked every time.
(cherry picked from commit c714dfc936)
In commit f69cf86a61, I added a call to term_update that happens
when we receive WM_VSCROLL / SB_THUMBPOSITION in the subsidiary
message loop that Windows creates during the handling of WM_SYSCOMMAND
/ SC_VSCROLL. The effect was that interactive dragging of the
scrollbar now redraws the window at every step, whereas previously it
didn't.
A user just pointed out that if you click on one of the scrollbar end
buttons and hold it down until it begins emulating key repeat, the
same bug occurs: the window isn't redrawn until you release the mouse
button and the subsidiary message loop ends.
This commit extends the previous fix to cover all of the WM_VSCROLL
subtypes, instead of just SB_THUMBPOSITION and SB_THUMBTRACK. Redraws
while holding down those scrollbar buttons now work again.
(cherry picked from commit 2029aa55c2)
This is used to notify the Seat that some data has been cleared from
the backend's outgoing data buffer. In other words, it notifies the
Seat that it might be worth calling backend_sendbuffer() again.
We've never needed this before, because until now, Seats have always
been the 'main program' part of the application, meaning they were
also in control of the event loop. So they've been able to call
backend_sendbuffer() proactively, every time they go round the event
loop, instead of having to wait for a callback.
But now, the SSH proxy is the first example of a Seat without
privileged access to the event loop, so it has no way to find out that
the backend's sendbuffer has got smaller. And without that, it can't
pass that notification on to plug_sent, to unblock in turn whatever
the proxied connection might have been waiting to send.
In fact, before this commit, sshproxy.c never called plug_sent at all.
As a result, large data uploads over an SSH jump host would hang
forever as soon as the outgoing buffer filled up for the first time:
the main backend (to which sshproxy.c was acting as a Socket) would
carefully stop filling up the buffer, and then never receive the call
to plug_sent that would cause it to start again.
The new callback is ignored everywhere except in sshproxy.c. It might
be a good idea to remove backend_sendbuffer() entirely and convert all
previous uses of it into non-empty implementations of this callback,
so that we've only got one system; but for the moment, I haven't done
that.
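In sshproxy.c, the new callback ends up doing essentially this (a
sketch: the shape follows the description above, but the details are
schematic):

    /* When our backend reports that its outgoing buffer has drained,
     * pass the notification through to the proxied connection's Plug
     * so that it can unblock and resume sending. */
    static void sshproxy_sent(Seat *seat, size_t new_sendbuffer)
    {
        SshProxy *sp = container_of(seat, SshProxy, seat);
        plug_sent(sp->plug, new_sendbuffer);
    }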
Suggested by Manfred Kaiser, who also wrote most of this patch
(although outlying parts, like documentation and SSH-1 support, are by
me).
This is a second line of defence against the kind of spoofing attacks
in which a malicious or compromised SSH server rushes the client
through the userauth phase of SSH without actually requiring any auth
inputs (passwords or signatures or whatever), and then at the start of
the connection phase it presents something like a spoof prompt,
intended to be taken for part of userauth by the user but in fact with
some more sinister purpose.
Our existing line of defence against this is the trust sigil system,
and as far as I know, that's still working. This option allows a bit of
extra defence in depth: if you don't expect your SSH server to
trivially accept authentication in the first place, then enabling this
option will cause PuTTY to disconnect if it unexpectedly does so,
without the user having to spot the presence or absence of a fiddly
little sigil anywhere.
Several types of authentication count as 'trivial'. The obvious one is
the SSH-2 "none" method, which clients always try first so that the
failure message will tell them what else they can try, and which a
server can instead accept in order to authenticate you unconditionally.
But there are two other ways to do it that we know of: one is to run
keyboard-interactive authentication and send an empty INFO_REQUEST
packet containing no actual prompts for the user, and another even
weirder one is to send USERAUTH_SUCCESS in response to the user's
preliminary *offer* of a public key (instead of sending the usual PK_OK
to request an actual signature from the key).
This new option detects all of those, by clearing the 'is_trivial_auth'
flag only when we send some kind of substantive authentication response
(be it a password, a k-i prompt response, a signature, or a GSSAPI
token). So even if there's a further path through the userauth maze we
haven't spotted, that somehow avoids sending anything substantive, this
strategy should still pick it up.
(cherry picked from commit 5f5c710cf3)
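Schematically, the strategy amounts to this (is_trivial_auth is the
flag named above; the surrounding details are invented for
illustration):

    /* Assume the worst until we see ourselves send something
     * substantive; a password, a k-i prompt response, a signature and
     * a GSSAPI token all count. */
    bool is_trivial_auth = true;

    void sent_substantive_auth_response(void) { is_trivial_auth = false; }

    /* ... and on receiving SSH2_MSG_USERAUTH_SUCCESS: */
    if (is_trivial_auth && conf_no_trivial_auth)   /* option from above */
        ssh_sw_abort(ssh, "Access granted without any authentication");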
This allows the 'no trivial auth' option introduced by the previous
commit to be tested. Uppity has grown three new options to make it
accept "none" authentication, keyboard-interactive involving no
prompts, and the perverse sending of USERAUTH_SUCCESS after a
signatureless public-key offer.
The first of those options also enables the analogue in SSH-1; the
other two have no SSH-1 analogues in the first place. (SSH-1 public
key authentication has a challenge-response structure that doesn't
contain any way to terminate the exchange early with success. And the
TIS and CryptoCard methods, which are its closest analogue of k-i,
have a fixed number of prompts, which is not 0.)
If a batch of palette changes was seen in between window updates, only
the last one would take immediate effect.
(cherry picked from commit 5677da6481)
I had manually defined the ACLE feature macro __ARM_FEATURE_CRYPTO
before including arm_neon.h, in the expectation that it would turn on
the AES, SHA-1 and SHA-256 intrinsics. But up-to-date clang has now
separated those intrinsics from each other, and guarded them by two
more specific feature macros, one for AES and one for the two SHAs. So
just defining __ARM_FEATURE_CRYPTO isn't good enough any more, and my
attempts to use crypto intrinsics in the following functions provoke a
compile error.
The fix is to define the appropriate new feature macro by hand
(leaving the old definition in place for earlier clang versions).
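Concretely, the workaround looks something like this (the macro names
are the genuine ACLE ones; which of them your clang honours depends
on its version):

    /* Older clang: one blanket macro enabled all the crypto
     * intrinsics. Newer clang splits them up, so define the specific
     * ones as well. */
    #define __ARM_FEATURE_CRYPTO 1
    #define __ARM_FEATURE_AES 1     /* AES intrinsics */
    #define __ARM_FEATURE_SHA2 1    /* SHA-1 and SHA-256 intrinsics */
    #include <arm_neon.h>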
This fix is only needed on the release branch, of course: on main,
we've already done the reorganisation that avoids the need to manually
define ACLE feature macros at all, because the accelerated crypto code
is compiled in separate objects using command-line compile flags in
the way that the toolchain normally expects.
In commit 9cc586e605 I changed the low-level key-file reading
routines like read_header and read_body so that they read from a
BinarySource via get_byte(), rather than from a FILE * via fgetc. But
I forgot that the two functions don't signal end-of-file the same way,
so testing the return value of get_byte() against EOF is pointless and
will never match, and conversely, real EOF won't be spotted unless you
also examine the error indicator in the BinarySource.
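That is, the loop has to consult the source's own error indicator,
along these lines (a sketch, with get_err standing in for however the
BinarySource reports it):

    /* Broken: get_byte() returns a byte value, never EOF. */
    while ((c = get_byte(src)) != EOF) { /* ... */ }

    /* Fixed shape: read a byte, then ask the BinarySource whether that
     * read actually ran off the end of the data. */
    while (1) {
        c = get_byte(src);
        if (get_err(src))
            break;          /* genuine end of input */
        /* ... process c ... */
    }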
As a result, a key file that ends without a trailing newline will
cause a tight loop in one of those low-level read routines.
(cherry picked from commit d008d235f3)
Since ca9cd983e1, changing colour config mid-session had no effect
(until the palette was reset for some other reason). Now it does take
effect immediately (provided that the palette has not been overridden
by an escape sequence -- a possibility that is itself new with
ca9cd983e1).
This changes the semantics of palette_reset(): the only important
parameter when doing that is whether we keep escape sequence overrides
-- there's no harm in re-fetching config and platform colours whether or
not they've changed -- so that's what the parameter becomes (with a
sense that doesn't require changing the call sites). The other part of
this change is actually remembering to trigger this when the
configuration is changed.
(cherry picked from commit 1e726c94e8)
A user pointed out that once we've identified the key algorithm from
an apparent public-key blob, we call ssh_key_new_pub on the blob data
and assume it will succeed. But there are plenty of ways it could
still fail, and ssh_key_new_pub could return NULL.
(cherry picked from commit 0c21eb4447)
I was cleaning up the 'struct handle', but not the underlying HANDLE.
As a result, any PuTTY process that makes a request to Pageant keeps
the named pipe connection open until the end of the process's
lifetime.
(cherry picked from commit 6e69223dc2)
I was checking a HANDLE against INVALID_HANDLE_VALUE to decide whether
it should be closed. But ten lines further up, I was setting it
manually to NULL to suppress the close. Oops.
(cherry picked from commit 155d8121e6)
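The bug pattern, schematically rather than as the literal code:

    HANDLE h = /* ... */;
    /* ... */
    h = NULL;                          /* meant to suppress the close */
    /* ... ten lines later ... */
    if (h != INVALID_HANDLE_VALUE)     /* but this test doesn't match, */
        CloseHandle(h);                /* so CloseHandle(NULL) happens */

The fix is simply to use the same sentinel value in both places.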
Less than 12 hours after 0.75 went out of the door, a user pointed out
that enabling the 'Use system colours' config option causes an
immediate NULL-dereference crash. The reason is that a chain of
calls from term_init() ends up calling back to the Windows
implementation of the palette_get_overrides() method, which responds
by trying to call functions on the static variable 'term' in window.c,
which won't be initialised until term_init() has returned.
Simple fix: palette_get_overrides() is now given a pointer to the
Terminal that it should be updating, because it can't find it out any
other way.
(cherry picked from commit 571fa3388d)
This generates primality certificates for numbers, in the form of
Python / testcrypt code that calls Pockle methods. It factors p-1 by
calling out to the 'yafu' utility, which is a moderately sophisticated
integer factoring tool (including ECC and quadratic sieve methods)
that runs as a standalone command-line program.
Also added a Pockle test generated as output from this script, which
verifies the primality of the three NIST curves' moduli and their
generators' orders. I already had Pockle certificates for the moduli
and orders used in EdDSA, so this completes the set, and it does it
without me having had to do a lot of manual work.
Enthusiastic copy-paste: in commit 17c57e1078 I added the same
precautionary call to ensure_handlewaits_tree_exists() everywhere,
even in functions that didn't actually need to use the tree.
This script makes 128 connections to your SSH agent at once, and then
sends requests down them in random order to check that the agent is
correctly selecting between all its incoming sockets / named pipes /
whatever.
128 is bigger than MAXIMUM_WAIT_OBJECTS, so a successful run of this
script through a Windows PuTTY forwarding its agent connections to a
Pageant indicates that both the PuTTY and the Pageant are managing to
handle >64 I/O subthreads without overloading their event loops.
Before commit 6e69223dc2, Pageant would stop working after a
certain number of PuTTYs were active at the same time. (At most about
60, but maybe fewer - see below.)
This was because of two separate bugs. The easy one, fixed in
6e69223dc2 itself, was that PuTTY left each named-pipe connection
to Pageant open for the rest of its lifetime. So the real problem was
that Pageant had too many active connections at once. (And since a
given PuTTY might make multiple connections during userauth - one to
list keys, and maybe another to actually make a signature - that was
why the number of _PuTTYs_ might vary.)
It was clearly a bug that PuTTY was leaving connections to Pageant
needlessly open. But it was _also_ a bug that Pageant couldn't handle
more than about 60 at once. In this commit, I fix that secondary bug.
The cause of the bug is that the WaitForMultipleObjects function
family in the Windows API have a limit on the number of HANDLE objects
they can select between. The limit is MAXIMUM_WAIT_OBJECTS, defined to
be 64. And handle-io.c was using a separate event object for each I/O
subthread to communicate back to the main thread, so as soon as all
those event objects (plus a handful of other HANDLEs) added up to more
than 64, we'd start passing an overlarge handle array to
WaitForMultipleObjects, and it would start not doing what we wanted.
To fix this, I've reorganised handle-io.c so that all its subthreads
share just _one_ event object to signal readiness back to the main
thread. There's now a linked list of 'struct handle' objects that are
ready to be processed, protected by a CRITICAL_SECTION. Each subthread
signals readiness by adding itself to the linked list, and setting the
event object to indicate that the list is now non-empty. When the main
thread receives the event, it iterates over the whole list processing
all the ready handles.
(Each 'struct handle' still has a separate event object for the main
thread to use to communicate _to_ the subthread. That's OK, because no
thread is ever waiting on all those events at once: each subthread
only waits on its own.)
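The shape of that mechanism, schematically (field and function names
invented, and initialisation of the lock and event elided; the real
handle-io.c has more moving parts):

    static CRITICAL_SECTION ready_lock;  /* protects ready_list */
    static struct handle *ready_list;    /* handles awaiting the main
                                          * thread's attention */
    static HANDLE ready_event;           /* ONE event shared by all
                                          * subthreads */

    /* Called on an I/O subthread when its operation completes. */
    void subthread_signal_ready(struct handle *h)
    {
        EnterCriticalSection(&ready_lock);
        h->next_ready = ready_list;
        ready_list = h;
        LeaveCriticalSection(&ready_lock);
        SetEvent(ready_event);           /* 'the list is now non-empty' */
    }

    /* Called on the main thread when ready_event fires. */
    void process_ready_handles(void)
    {
        EnterCriticalSection(&ready_lock);
        struct handle *list = ready_list;
        ready_list = NULL;
        LeaveCriticalSection(&ready_lock);
        for (; list; list = list->next_ready)
            handle_process_ready(list);  /* hypothetical per-handle hook */
    }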
The previous HT_FOREIGN system didn't really fit into this framework.
So I've moved it out into its own system. There's now a handle-wait.c
which deals with the relatively simple job of managing a list of
handles that need to be waited for, each with a callback function;
that's what communicates a list of HANDLEs to event loops, and
receives the notification when the event loop notices that one of them
has done something. And handle-io.c is now just one client of
handle-wait.c, providing a single HANDLE to the event loop, and
dealing internally with everything that needs to be done when that
handle fires.
The new top-level handle-wait.c system *still* can't deal with more
than MAXIMUM_WAIT_OBJECTS handles. At the moment, I'm reasonably
convinced it
doesn't need to: the only kind of HANDLE that any of our tools could
previously have needed to wait on more than one of was the one in
handle-io.c that I've just removed. But I've left some assertions and
a TODO comment in there just in case we need to change that in future.
Commit 0d3bb73608 inserted PROXY_SSH just before PROXY_CMD, so
that it got the numerical value that PROXY_CMD had before. But that
meant that any saved session configured as PROXY_CMD by an older build
of PuTTY would be regarded as PROXY_SSH by the new version - even
after the previous commit fixed the unconditional use of SSH proxying
regardless of the type setting.
Now PROXY_CMD has its old value back, and PROXY_SSH is inserted at the
_end_ of the enum. (Or rather, just before the extra-weird PROXY_FUZZ,
which is allowed to have an unstable numerical value because it's
never stored in a saved session at all.)
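The rule being applied, in sketch form (the identifier list is
abbreviated and not exactly the real one):

    typedef enum {
        PROXY_NONE, PROXY_SOCKS4, PROXY_SOCKS5, PROXY_HTTP, PROXY_TELNET,
        PROXY_CMD,    /* restored to its historical numerical value */
        PROXY_SSH,    /* new entries go at the END of the enum... */
        PROXY_FUZZ    /* ...except this one, which is never saved, so
                       * its value is allowed to keep floating */
    } ProxyType;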
This is all rather unsatisfactory, and makes me wish I'd got round to
reworking the saved data format to use keywords in place of integers.