Two minor memory-leak fixes on 0.74 seem not to be needed on master:
the fix in an early exit path of pageant_add_keyfile is done already
on master in a different way, and the missing sfree(fdlist) in
uxsftp.c is in code that's been completely rewritten in the uxcliloop
refactoring.
Other minor conflicts: the rework in commit b52641644905 of
ssh1login.c collided with the change from FLAG_VERBOSE to
seat_verbose(), and master and 0.74 each added an unrelated extra
field to the end of struct SshServerConfig.
This is a no-op merge, via 'git merge -s ours', which records that all
the commits up to this point on the 0.74 branch are bug fixes
cherry-picked from master, and don't need merging back to master.
From this point onwards, the 0.74 branch will contain fresh work that
_will_ need merging back to master. This preliminary non-merge allows
me to avoid needless conflicts during that process.
We received a report that if you enable Windows 10's high-contrast
mode, the text in PuTTY's installer UI becomes invisible, because it's
displayed in the system default foreground colour against a background
of the white right-hand side of our 'msidialog.bmp' image. That's fine
when the system default fg is black, but high-contrast mode flips it
to white, and now you have white on white text, oops.
Some research in the WiX bug tracker suggests that in Windows 10 you
don't actually have to use BMP files for your installer images any
more: you can use PNG, and PNGs can be transparent. However, someone
else reported that that only works in up-to-date versions of Windows.
And in fact there's no need to go that far. A more elegant answer is
to simply not cover the whole dialog box with our background image in
the first place. I've reduced the size of the background image so that
it _only_ contains the pretty picture on the left-hand side, and omits
the big white rectangle that used to sit under the text. So now the
RHS of the dialog is not covered by any image at all, which has the
same effect as it being covered with a transparent image, except that
it doesn't require transparency support from msiexec. Either way, the
background for the text ends up being the system's default dialog-box
background, in the absence of any images or controls placed on top of
it - so when the high-contrast mode is enabled, it flips to black at
the same time as the text flips to white, and everything works as it
should.
The slight snag is that the pre-cooked WiX UI dialog specifications
let you override the background image itself, but not the Width and
Height fields in the control specifications that refer to them. So if
you just try to drop in a narrow image in the most obvious way, it
gets stretched across the whole window.
But that's not a show-stopper, because we're not 100% dependent on
getting WiX to produce exactly the right output. We already have the
technology to postprocess the MSI _after_ it comes out of WiX: we're
using it to fiddle the target-platform field for the Windows on Arm
installers. So all I had to do was to turn msiplatform.py into a more
general msifixup.py, add a second option to change the width of the
dialog background image, and run it on the x86 installers as well as
the Arm ones.
Coverity points out that this function is mostly written as if it's
intended to allow for term->screen and/or term->alt_screen to be NULL,
but makes an unguarded call to find_last_nonempty_line on one of them.
I don't immediately remember _why_ I needed to deal with those
pointers being null, but it was probably a safety precaution against
swap_screen being called during setup or during reconfiguration, in
which case it seems sensible to keep it even if it's not needed in the
_current_ state of the code. So, added the missing check.
In commit 4ecc3f3c09 I did a knee-jerk fix of a macro of the form

    #define SECOND_PASS_ONLY { body; }

on the grounds that it was syntax-unsafe, so I wrapped it in the
standard do while(0):

    #define SECOND_PASS_ONLY do { body; } while (0)

But in this case, that was a bogus transformation, because the body
executed 'continue' with the intention of affecting the containing
loop (outside the macro). Moreover, ten lines above the macro
definition was a comment specifically explaining why it _couldn't_ be
wrapped in do while (0) !
Since then I've come up with an alternative break-and-continue-proof
wrapper for macros that are supposed to expand to something that's
syntactically a C statement. So I've used that instead, and while I'm
at it, fixed the neighbouring EXPECTS_ARG as well.
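
For illustration, here is a minimal sketch of one well-known idiom that
has this property. It is not necessarily the wrapper used in this
commit; STATEMENT, SKIP_ODD and sum_even are hypothetical names made
up for the example.

```c
#include <stddef.h>

/* Hypothetical wrapper: expands to a single statement that needs a
 * trailing semicolon. It contains no loop of its own, so 'break' and
 * 'continue' in the body act on the caller's enclosing loop. */
#define STATEMENT(body) if (1) { body } else ((void)0)

/* Hypothetical macro in the style of SECOND_PASS_ONLY. */
#define SKIP_ODD(x) STATEMENT(if ((x) % 2) continue;)

/* Sums the even values in arr. The 'continue' inside SKIP_ODD reaches
 * this for loop, which a do { } while (0) wrapper would prevent. */
static int sum_even(const int *arr, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        SKIP_ODD(arr[i]);
        total += arr[i];
    }
    return total;
}
```

The trailing else clause means the macro still behaves as one
statement when used as the body of an unbraced if.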
Spotted by Coverity, and well spotted indeed! How embarrassing.
udata[uindex] is a wchar_t, so if we pass it to sprintf("%d") we
should cast it to int (because who knows what primitive integer type
that might have corresponded to otherwise). I had done this in the
first of the two sprintfs that use it, but missed the second one a few
lines further on. Spotted by Coverity.
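
A minimal sketch of the pattern, assuming a hypothetical helper name
format_wchar_code (and using snprintf rather than the sprintf in the
real code):

```c
#include <stdio.h>
#include <wchar.h>

/* Formats a wide character's numeric code in decimal. The underlying
 * integer type of wchar_t varies between platforms, so cast it to int
 * to match the "%d" conversion exactly. */
static int format_wchar_code(char *buf, size_t len, wchar_t wc)
{
    return snprintf(buf, len, "%d", (int)wc);
}
```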
This mitigates CVE-2020-14002: if you're in the habit of clicking OK
to unknown host keys (the TOFU policy - trust on first use), then an
active attacker looking to exploit that policy to substitute their own
host key in your first connection to a server can use the host key
algorithm order in your KEXINIT to (not wholly reliably) detect
whether you have a key already stored for this host, and if so, abort
their attack to avoid giving themself away.
However, for users who _don't_ use the TOFU policy and instead check
new host keys out of band, the dynamic policy is more useful. So it's
provided as a configurable option.
Now, in both SSH-1 and SSH-2, we go through the whole response from
the SSH agent, parse out the public blob and comment of every key, and
stash them in a data structure to iterate through later.
Previously, we were iterating through the agent response _in situ_,
while it was still stored in the s->agent_response memory buffer in
the form the agent sent it, and had the ongoing s->asrc BinarySource
pointing at it. This led to a remotely triggerable stale-pointer bug:
as soon as we send a _second_ agent request trying to authenticate
with one of the keys, it causes s->agent_response to be freed. In
normal usage this doesn't happen, because if a server sends PK_OK (or
an RSA1 challenge) then it's going to accept our response, so we never
go back to iterating over the rest of the agent's key list. But if a
server sends PK_OK or an RSA1 challenge and _then_ rejects
authentication after we go to the effort of responding, we'll go back
to iterating over the agent's key list and cause a crash.
So now, we extract everything we need from the key-list agent
response, and by the time we're making further agent requests, we
don't need it any more.
If a malicious SSH agent were to send an RSA signature blob _longer_
than the key modulus while BUG_SSH2_RSA_PADDING was enabled, then it
could DoS the client, because the put_padding call would keep
allocating memory in 'strbuf *substr' until address space ran out.
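
As a rough sketch of the kind of guard involved: if the padding length
is computed as an unsigned difference, a signature longer than the
modulus makes it wrap round to an enormous value. That mechanism is my
assumption for the purposes of this example, and padding_len is a
hypothetical name, not a function in the code.

```c
#include <stddef.h>

/* Returns how many padding bytes are needed to bring a signature of
 * sig_len bytes up to modulus_len bytes. If the signature is already
 * at least as long as the modulus, returns 0 rather than letting the
 * size_t subtraction wrap round. */
static size_t padding_len(size_t modulus_len, size_t sig_len)
{
    return sig_len < modulus_len ? modulus_len - sig_len : 0;
}
```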
The previous prompts were part of transcripts pasted directly from a
particular historical cmd session, but that's no reason to keep them
lying around confusingly, especially since we keep regenerating some
of those transcripts outside that historical context. Replace them all
with nice simple C:\> which shouldn't confuse anyone with extraneous
detail.
This reverts commit 4634cd47f7 and
commit 43a63019f5, both of which
introduced checks at ldisc_send call sites to avoid triggering the
assertion that len != 0 inside ldisc_send. Now that assertion is gone,
it's OK to call ldisc_send without checking the buffer size.
(cherry picked from commit 2bbed67d9e)
A user reported another situation in which that assertion can fail: if
you paste text into the terminal that consists 100% of characters not
available in the CONF_line_codepage character set, then the
translation step generates the empty string as output, and that gets
passed to ldisc_send by term_paste without checking.
Previous bugs of this kind (see commits 4634cd47f7 and 43a63019f5)
were fixed by adding a check before calling ldisc_send. But in commit
4634cd47f7 I said that probably at some point the right fix would be
to remove the assertion in ldisc_send itself, so that passing len==0
becomes legal. (The assertion was there in the first place to catch
cases where len==0 was used with its obsolete special meaning of
signalling 'please update your status'.)
Well, I think it's finally time. The assertion is removed: it's now
legal again to call ldisc_send with an empty buffer, and its meaning
is no longer the archaic special thing, but the trivial one of sending
zero characters through the line discipline.
(cherry picked from commit cd3e917fd0)
wcrtomb returns a size_t, so it's silly to immediately assign it into
an int variable. Apparently running gcc with LTO enabled points this
out as an error.
This was benign as far as I can see: the obvious risk of integer
overflow could only happen if the OS wanted to convert a single wide
character into more than 2^31 bytes, and the test of the return value
against (size_t)-1 for an error check seems to work anyway in
practice, although I suspect that's only because of implementation-
defined behaviour in gcc at the point where the size_t is narrowed to
a signed int.
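
A minimal sketch of the corrected pattern, with a hypothetical helper
name encode_wchar:

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Converts one wide character to its multibyte form in buf, which
 * must have room for MB_LEN_MAX bytes. Returns the number of bytes
 * written, or (size_t)-1 on a conversion error. The result stays in a
 * size_t throughout, so the error comparison involves no narrowing. */
static size_t encode_wchar(char *buf, wchar_t wc)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);     /* initial conversion state */
    size_t ret = wcrtomb(buf, wc, &state);
    if (ret == (size_t)-1)
        return (size_t)-1;               /* unconvertible character */
    return ret;
}
```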
(cherry picked from commit 99f5fa34ab)
Again, there was a missing #include in that file which meant that the
definition of the function was never being checked against the
declaration visible to other source files.
(cherry picked from commit c5aa7fc31c)
I'd forgotten to #include "dialog.h" in that file, which meant nothing
was checking the prototypes of the stub implementations of the dlg_*
function family against the real versions. They almost all needed a
'void *dlg' parameter updating to 'dlgparam *dp', which is a change
dating from commit 3aae1f9d76 nearly two years ago. And a handful of
them also still had 'int' that should by now have become 'bool'.
(cherry picked from commit c373fe979f)
Mark Wooding points out that when running with the +ut flag, we close
pty_utmp_helper_pipe during pty backend setup, which causes the
previously forked helper process to terminate. If that termination
happens quickly enough, then the code later in pty_backend_create
won't have set up the SIGCHLD handler and its pipe yet, so when we get
to the main event loop, we'll fail to notice that subprocess waiting
to be reaped, and leave it lying around as a zombie.
An easy fix is to move the handler and pipe setup to before the code
that potentially closes pty_utmp_helper_pipe, so that there isn't a
race condition any more.
(cherry picked from commit 7ffa6ed41e)
If a terminal window closed with a popup (due to a network error,
for instance) while the mouse pointer was hidden by 'Hide mouse
pointer when typing in window', the mouse pointer could remain hidden
while over the terminal window, making it hard to navigate to the
popup.
(cherry picked from commit d9c4ce9fd8)
Colin reports that on betas of Ubuntu 20.04, Pango has switched to
getting its font metrics from HarfBuzz, and a side effect is
apparently that they're being returned in the full precision of
PANGO_SCALE fixed point.
Previously, Pango appears to have been returning values that were
always a whole number of pixels scaled by PANGO_SCALE. Moreover, it
looks as if it was rounding the font ascent and descent _up_ to a
whole number of pixels, rather than rounding to nearest. But our code
rounds to nearest, which means that now the same font gets allocated
fewer vertical pixels, which can be enough to cut off some ascenders
or descenders.
Pango already provides the macro PANGO_PIXELS_CEIL, so it's easy to
switch over to using it. This should arrange that any text that fits
within the font's stated ascent/descent measurements will also fit in
the character cell.
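
To show the difference in rounding, here is a small sketch using
stand-in functions rather than Pango's own macros. SKETCH_SCALE
mirrors the value of PANGO_SCALE (1024 units per pixel); the function
names are hypothetical, and the arithmetic assumes non-negative input.

```c
/* Stand-in for PANGO_SCALE: fixed-point units per pixel. */
#define SKETCH_SCALE 1024

/* Round a non-negative fixed-point measurement to the nearest pixel. */
static int pixels_nearest(int units)
{
    return (units + SKETCH_SCALE / 2) / SKETCH_SCALE;
}

/* Round a non-negative fixed-point measurement up to a whole pixel. */
static int pixels_ceil(int units)
{
    return (units + SKETCH_SCALE - 1) / SKETCH_SCALE;
}
```

For an ascent of 13.25 pixels (13568 units), rounding to nearest gives
13 pixels and rounding up gives 14, which is the extra row needed to
avoid clipping.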
(cherry picked from commit f9a46a9581)
If gdk_event_get_scroll_deltas() returns failure for a given
GdkEventScroll, it doesn't follow that that event has no usable
scrolling action in it at all. The fallback is to call
gdk_event_get_scroll_direction() instead, which is less precise but
still gives _something_ you can use. So in that situation, instead of
just returning false, we can fall through to the handling we use for
pre-GTK3 scroll events (which are always imprecise).
In particular, I've noticed recently that if you run GTK 3 PuTTY in
the virtual X display created by vnc4server, and connect to it using
xtightvncviewer, then scroll-wheel actions passed through from the VNC
client will cause scroll_event() to receive low-res GdkEventScroll
structures of exactly this kind. So scroll-wheel activity on the
terminal window wasn't causing a scroll in that environment, and with
this patch, it does.
(cherry picked from commit 0fd30113f1)
In the SSH-2 connection layer, an outstanding_channel_request
structure comes with a handler to be called back with the reply
packet, when the other end sends one. But sometimes it doesn't - if
the channel begins to close before the request has been replied to -
in which case the handler function is called with a NULL packet
pointer.
The common ssh2_channel_response function that handles most of the
client-side channel requests was not prepared to cope with that
pointer being null. Fixed by making it handle a null return the same
as CHANNEL_FAILURE.
(cherry picked from commit e4b6a7efd2)
That causes the config dialog to terminate with result -1, which
wasn't handled at all by the result-receiving code. So GTK PuTTY would
continue running its main loop even though it had no windows open and
wasn't ever planning to do anything.
(cherry picked from commit 4fc5d7a5f5)
A known_hosts line can have multiple comma-separated hostnames on it,
or more usually a hostname and an IP address.
In the RSA and DSA key handlers, I was making a list of the integer
parameters of the public key by using the 'map' function, and then
iterating over it once per hostname on the line. But in Python 3, the
'map' function returns an iterator, not a list, so after you've
iterated to its end once, it's empty, and iterating over it a second
time stops immediately. As a result, the registry line for the second
hostname was coming out empty.
(cherry picked from commit 143f8a2d10)
The comparison functions between an mp_int and an integer worked by
walking along the mp_int, comparing each of its words to the
corresponding word of the integer. When they ran out of mp_int, they'd
stop.
But this overlooks the possibility that they might not have run out of
_integer_ yet! If BIGNUM_INT_BITS is defined to be less than the size
of a uintmax_t, then comparing (say) the uintmax_t 0x8000000000000001
against a one-word mp_int containing 0x0001 would return equality,
because it would never get as far as spotting the high bit of the
integer.
Fixed by iterating up to the max of the number of BignumInts in the
mp_int and the number that cover a uintmax_t. That means we have to
use mp_word() instead of a direct array lookup to get the mp_int words
to compare against, since now the word indices might be out of range.
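
A simplified sketch of the fixed comparison, using a plain array of
16-bit words instead of a real mp_int. The names word_t, WORD_BITS,
get_word and words_eq_integer are hypothetical, and get_word merely
plays the role that mp_word() plays in the description above.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 16            /* hypothetical small word size */
typedef uint16_t word_t;

/* Reads word i of a little-endian multiword number, yielding zero for
 * indices past the end. */
static word_t get_word(const word_t *w, size_t nw, size_t i)
{
    return i < nw ? w[i] : 0;
}

/* Compares a multiword number with a uintmax_t for equality. The loop
 * runs to the larger of the two word counts, so high-order bits of
 * the integer are not ignored when the multiword number is short. */
static bool words_eq_integer(const word_t *w, size_t nw, uintmax_t n)
{
    const size_t int_words =
        (sizeof(uintmax_t) * CHAR_BIT + WORD_BITS - 1) / WORD_BITS;
    size_t limit = nw > int_words ? nw : int_words;
    word_t diff = 0;

    for (size_t i = 0; i < limit; i++) {
        word_t nword = 0;
        if (i < int_words)      /* keeps the shift count in range */
            nword = (word_t)(n >> (i * WORD_BITS));
        diff |= (word_t)(get_word(w, nw, i) ^ nword);
    }
    return diff == 0;
}
```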
(cherry picked from commit 289d8873ec)
Functions like mp_copy_integer_into, mp_add_integer_into and
mp_hs_integer all take an ordinary C integer in the form of a
uintmax_t, and perform an operation between that and an mp_int. In
order to do that, they have to break it up into some number of
BignumInt, via bit shifts.
But in C, shifting by an amount equal to or greater than the width of
the type is undefined behaviour, and you risk the compiler generating
nonsense or complaining at compile time. I did various dodges in those
functions to try to avoid that, but didn't manage to use the same
idiom everywhere. Sometimes I'd leave the integer in its original form
and shift it right by increasing multiples of BIGNUM_INT_BITS;
sometimes I'd shift it down in place every time. And mostly I'd do the
conditional shift by checking against sizeof(n), but once I did it by
shifting by half the word and then the other half.
Now refactored so that there's a pair of functions to shift a
uintmax_t left or right by BIGNUM_INT_BITS in what I hope is a UB-safe
manner, and changed all the code I could find to use them.
(cherry picked from commit 3ea69c290e)
While looking over the code for other reasons, I happened to notice
that the internal function mp_add_masked_integer_into was using a
totally wrong condition to check whether it was about to do an
out-of-range right shift: it was comparing a shift count measured in
bits against BIGNUM_INT_BYTES.
The resulting bug hasn't shown up in the code so far, which I assume
is just because no caller is passing any RHS to mp_add_integer_into
bigger than about 1 or 2. And it doesn't show up in the test suite
because I hadn't tested those functions. Now I am testing them, and
the newly added test fails when built for 16-bit BignumInt if you back
out the actual fix in this commit.
(cherry picked from commit 921118dbea)
This file exports several functions defined in sshserver.h, and the
declarations weren't being type-checked against the definitions.
(cherry picked from commit 37d91aabff)
I'm not really sure why that's necessary: by my understanding of the C
standard, it shouldn't be. But my observation is that when compiling
with {Address,Leak} Sanitiser enabled, pageant --askpass can somehow
manage to exit without having actually written the passphrase to its
standard output.
(cherry picked from commit c618d6baac)
On Windows, due to a copy-paste goof, the message that should have
read "Configuring n stop bits" instead ended with "data bits".
While I'm here, I've arranged that the "1 stop bit" case of that
message is in the singular. And then I've done the same thing again on
Unix, because I noticed that message was unconditionally plural too.
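
The singular/plural handling amounts to something like this sketch
(format_stop_bits is a hypothetical helper, not the name used in the
serial backends):

```c
#include <stdio.h>

/* Writes "Configuring n stop bit(s)" into buf, using the singular
 * form only when n is exactly 1. */
static int format_stop_bits(char *buf, size_t len, int n)
{
    return snprintf(buf, len, "Configuring %d stop bit%s",
                    n, n == 1 ? "" : "s");
}
```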
(cherry picked from commit bdb7b47a5e)
The sets of poll(2) events that we check in order to return SELECT_R
and SELECT_W overlap: to be precise, they have POLLERR in common. So
if an fd signals POLLERR, then pollwrap_get_fd_rwx will respond by
saying that it has both SELECT_R and SELECT_W available on it - even
if the caller had only asked for one of those.
In other words, you can get a spurious SELECT_W notification on an fd
that you never asked for SELECT_W on in the first place. This
definitely isn't what I'd meant that API to do.
In particular, if a socket in the middle of an asynchronous connect()
signals POLLERR, then Unix Plink will call select_result for it with
SELECT_R and then SELECT_W respectively. The former will notice that
it's got an error condition and call plug_closing - and _then_ the
latter will decide that it's writable and set s->connected! The plan
was to only select it for write until it was connected, but this bug
in pollwrap was defeating that plan.
Now pollwrap_get_fd_rwx should only ever return a set of rwx flags
that's a subset of the one that the client asked for via
pollwrap_add_fd_rwx.
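
In outline, the fix is a final mask against the requested set, as in
the sketch below. The SELECT_* values, the exact poll event sets and
the function name revents_to_rwx are assumptions for illustration;
the only detail taken from above is that POLLERR belongs to both the
read and write sets.

```c
#include <poll.h>

/* Hypothetical flag values for the sketch. */
#define SELECT_R 1
#define SELECT_W 2
#define SELECT_X 4

/* Translates poll(2) revents into rwx flags, then masks the result so
 * that it is always a subset of what the caller asked for. */
static int revents_to_rwx(short revents, int requested)
{
    int rwx = 0;
    if (revents & (POLLIN | POLLHUP | POLLERR))
        rwx |= SELECT_R;
    if (revents & (POLLOUT | POLLERR))
        rwx |= SELECT_W;
    if (revents & POLLPRI)
        rwx |= SELECT_X;
    return rwx & requested;
}
```

With this shape, POLLERR on an fd registered only for reading yields
SELECT_R alone.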
(cherry picked from commit 78974fce89)
Spotted by Leak Sanitiser, while I was investigating the PSFTP /
proftpd issue mentioned in the previous commit (with ASan on as
usual).
The two very similar loops that read PSFTP commands from the
interactive prompt and a batch file differed in one respect: only one
of them remembered to free the command afterwards. Now I've moved the
freeing code out into a subroutine that both loops can use.
(cherry picked from commit bf0f323fb4)
I tried to do an SFTP upload through connection sharing the other day
and found that pscp sent some data and then hung. Having debugged it,
what seems to have happened is that we were looping in sftp_recv()
waiting for an SFTP packet from the remote, but we didn't have any
outstanding SFTP requests that the remote was going to reply to.
Checking further,
xfer_upload_ready() reported true, so we _could_ have sent something -
but the logic in the upload loop had a hole through which we managed
to get into 'waiting for a packet' state.
I think what must have happened is that xfer_upload_ready() reported
false so that we entered sftp_recv(), but then the event loop inside
sftp_recv() ran a toplevel callback that made xfer_upload_ready()
return true. So, the fix: sftp_recv() is our last-ditch fallback, and
we always try emptying our callback queue and rechecking upload_ready
before we resort to waiting for a remote packet.
This not only fixes the hang I observed: it also hugely improves the
upload speed. My guess is that the bug must have been preventing us
from filling our outgoing request pipeline a _lot_ - but I didn't
notice it until the one time the queue accidentally ended up empty,
rather than just sparse enough to make transfers slow.
Annoyingly, I actually considered this fix back when I was trying to
fix the proftpd issue mentioned in commit cd97b7e7e. I decided fixing
ssh_sendbuffer() was a better idea. In fact it would have been an even
better idea to do both! Oh well, better late than never.
(cherry picked from commit 3a633bed35)
The class for general rth-root finding started off as a cube-root
finder before I generalised it, and in one part of the top-level
explanatory comment, I still referred to a subgroup having index 3
rather than index r.
Also, in a later paragraph, I seem to have said 'index' several times
where I meant the concept of 'rank' I defined in the previous
paragraph.
A user points out that in the current state of PSCP, if you have some
protocol other than SSH configured in Default Settings, then
specifying a non-saved-session hostname on the PSCP command line will
cause it to try to connect with protocol SSH but the port number from
Default Settings.
A better approach is the one used in PSFTP: we use the port number
from the saved session _if_ the protocol is also one that's known to
PSCP (i.e. SSH or bare ssh-connection), and otherwise, we reset both
to sensible values.
Now you can see exactly what pathname the backend tried to open for
the serial port, and what error code it got back from the OS when it
tried. That should help users distinguish between (for example) a
permissions problem and a typo in the filename.
Now, instead of a 'const char *' in the static data segment, error
messages returned from backend setup are dynamically allocated and
freed by the caller.
This will allow me to make the messages much more specific (including
errno values and the like). However, this commit is pure refactoring:
I've _just_ changed the allocation policy, and left all the messages
alone.
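
The new ownership convention looks roughly like the sketch below.
check_speed and its message text are hypothetical, invented only to
show the shape: NULL means success, and any non-NULL return is a
heap-allocated string that the caller must free.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical setup function. Returns NULL on success, or a
 * dynamically allocated error message which the caller frees. */
static char *check_speed(int baud)
{
    if (baud > 0)
        return NULL;

    size_t len = 64;
    char *msg = malloc(len);
    if (!msg)
        abort();       /* sketch only: treat out-of-memory as fatal */
    snprintf(msg, len, "Invalid serial speed %d", baud);
    return msg;
}
```

A caller would report the message and then call free() on it, where
previously it would have left a static string alone.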