This is preparation to allow Pageant to be able to return to its GUI
main loop in the middle of handling a request (e.g. present a dialog
box to the user related to that particular request, and wait for the
user's response). In order to do that, we need the main thread's
Windows message loop to never be blocked by a WM_COPYDATA agent
request.
So I've split Pageant's previous hidden window into two hidden
windows, each with a subset of the original roles, and created in
different threads so that they get independent message loops. The one
in the main thread receives messages relating to Pageant's system tray
icon; the one in the subthread has the identity known to (old) Pageant
clients, and receives WM_COPYDATA messages only. Each WM_COPYDATA is
handled by passing the request back to the main thread via an event
object integrated into the Pageant main loop, and then waiting for a
second event object that the main thread will signal when the answer
comes back, and not returning from the WndProc handler until the
response arrives.
Hence, if an agent request received via WM_COPYDATA requires GUI
activity, then the main thread's GUI message loop will be able to do
that in parallel with all Pageant's other activity, including other
GUI activity (like the key list dialog box) and including responding
to other requests via named pipe.
I can't stop WM_COPYDATA requests from blocking _each other_, but this
allows them not to block anything else. And named-pipe requests block
nothing at all, so as clients switch over to the new IPC, even that
blockage will become less and less common.
Those chomp operations in wincons.c and uxcons.c looked ugly, and I'm
not totally convinced they couldn't underrun the buffer by 1 byte in
weird circumstances. strbuf_chomp is neater.
UBsan pointed out another memcpy from NULL (again with length 0) in
the prompts_t system. When I looked at it, I realised that firstly
prompt_ensure_result_size was an early not-so-good implementation of
sgrowarray_nm that would benefit from being replaced with a call to
the real one, and secondly, the whole system for storing prompt
results should really have been replaced with strbufs with the no-move
option, because that's doing all the same jobs better.
So, now each prompt_t holds a strbuf in place of its previous manually
managed string. prompt_ensure_result_size is gone (the console
prompt-reading functions use strbuf_append, and everything else just
adds to the strbuf in the usual marshal.c way). New functions exist to
retrieve a prompt_t's result, either by reference or copied.
These are better than my previous approach of just assigning to
sb->len, because firstly they check by assertion that the new length
is within range, and secondly they preserve the invariant that the
byte stored in the buffer just after the length runs out is \0.
Switched to using the new functions everywhere a grep could turn up
opportunities.
In setting up the ECC tests for cmdgen, I noticed that OpenSSH and
PuTTYgen disagree on the bit length to put in a key fingerprint for an
ed25519 key: we think 255, they think 256.
On reflection, I think 255 is more accurate, which is why I bodged
get_fp() in the test suite to ignore that difference when checking our
key fingerprint against OpenSSH's. But having done that, it now seems
silly that if you unnecessarily specify a bit count at ed25519
generation time, cmdgen will insist that it be 256!
255 is now permitted everywhere an ed25519 bit count is input. 256 is
also still allowed for backwards compatibility but 255 is preferred by
the error message if you give any other value.
Now they have names that are more consistent (no more userkey_this but
that_userkey); a bit shorter; and, most importantly, all the current
functions end in _f to indicate that they deal with keys stored in
disk files. I'm about to add a second set of entry points that deal
with keys via the more general BinarySource / BinarySink interface,
which will sit alongside these with a different suffix.
As in the previous commit, this means that agent_query() is now able
to operate in an asynchronous mode, so that if Pageant takes time to
answer a request, the GUI of the PuTTY instance making the request
won't be blocked.
Also as in the previous commit, we still fall back to the WM_COPYDATA
protocol if the new named pipe protocol isn't available.
Now that Pageant runs a named-pipe server as well as a WM_COPYDATA
server, we prefer the former (if available) for agent forwarding, for
the same reasons as on Unix: it lets us establish a simple raw-data
streaming connection instead of agentf.c's complicated message
boundary detection and buffer management, and if agent connections
ever become stateful, this technique will cope.
On Windows, another advantage of this change is that forwarded agent
requests can now be asynchronous: if the agent takes time to respond
to a request for any reason, then the rest of PuTTY's GUI and SSH
connection are not blocked, and you can carry on working while the
agent is thinking about the request.
(I didn't list that as a benefit of doing the same thing for Unix in
commit ae1148267, because on Unix, agent_query() could _already_ run
asynchronously. It's only on Windows that that's new.)
This reuses all the named-pipe IPC code I set up for connection
sharing a few years ago, to set up a named pipe with a predictable
name and speak the stream-oriented SSH agent protocol over it.
In this commit, we just set up the server, and there's no code that
speaks the client end of the new IPC yet. But my plan is that clients
should switch over to using this interface if possible, because it's
generally better: it doesn't have to be handled synchronously in the
middle of a GUI event loop (either in Pageant itself _or_ in its
client), and it's a better fit to the connection-oriented nature of
forwarded agent connections (so if any features ever appear in the
agent protocol that require state within a connection, we'll now be
able to support them).
This contains most of the guts of the previously monolithic function
new_named_pipe_client(), but it directly returns the HANDLE to the
opened pipe, or a string error message on failure.
new_named_pipe_client() is now a thin veneer on top of that, which
returns a Socket * by wrapping up the HANDLE into a HandleSocket or
the error message into an ErrorSocket as appropriate.
So it's now possible to connect to a named pipe, using all our usual
infrastructure (including in particular the ownership check of the
server, to defend against spoofing attacks), without having to have a
Socket-capable event loop in progress.
Windows Plink and PSFTP had very similar implementations, and now they
share one that lives in a new file winselcli.c. I've similarly moved
GUI PuTTY's implementation out of window.c into winselgui.c, where
other GUI programs wanting to do networking will be able to access
that too.
In the spirit of centralisation, I've also taken the opportunity to
make both functions use the reasonably complete winsock_error_string()
rather than (for some historical reason) each inlining a minimal
version that reports most errors as 'unknown'.
Historically, because of the way Windows Pageant's IPC works, PuTTY's
agent forwarding has always been message-oriented. The channel
implementation in agentf.c deals with receiving a data stream from the
remote agent client and breaking it up into messages, and then it
passes each message individually to agent_query().
On Unix, this is more work than is really needed, and I've always
meant to get round to doing the more obvious thing: making an agent
forwarding channel into simply a stream-oriented proxy, passing raw
data back and forth between the SSH channel and the local AF_UNIX
socket without having to know or care about the message boundaries in
the stream.
The portfwdmgr_connect_socket() facility introduced by the previous
commit is the missing piece of infrastructure to make that possible.
Now, the agent client module provides an API that includes a callback
you can pass to portfwdmgr_connect_socket() to open a streamed agent
connection, and the agent forwarding setup function tries to use that
where possible, only falling back to the message-based agentf.c system
if it can't be done. On Windows, the new piece of agent-client API
returns failure, so we still fall back to agentf.c there.
There are two benefits to doing it this way. One is that it's just
simpler and more robust: if PuTTY isn't trying to parse the agent
connection, then it has less work to do and fewer places to introduce
bugs. The other is that it's futureproof against changes in the agent
protocol: if any kind of extension is ever introduced that requires
keeping state within a single agent connection, or that changes the
protocol itself so that agentf's message-boundary detection stops
working, then this forwarding system will still work.
When I'm declaring a local instance of some context structure type to
pass to a function which will pass it in turn to a callback, I've
tended to use a declaration of the form
struct context actx, *ctx = &actx;
so that the outermost caller can initialise the context, and/or read
out fields of it afterwards, by the same syntax 'ctx->foo' that the
callback function will be using. So you get visual consistency between
the two functions that share this context.
It only just occurred to me that there's a much neater way to declare
a context struct of this kind, which still makes 'ctx' behave like a
pointer in the owning function, and doesn't need all that weird
verbiage or a spare variable name:
struct context ctx[1];
That's much nicer! I've switched to doing that in all existing cases I
could find, and also in a couple of cases where I hadn't previously
bothered to do the previous more cumbersome idiom.
The do_select function is called with a boolean parameter indicating
whether we're supposed to start or stop paying attention to network
activity on a given socket. So if we freeze and unfreeze the socket in
mid-session because of backlog, we'll call do_select(s, false) to
freeze it, and do_select(s, true) to unfreeze it.
But the implementation of do_select in the Windows SFTP code predated
the rigorous handling of socket backlogs, so it assumed that
do_select(s, true) would only be called at initialisation time, i.e.
only once, and therefore that it was safe to use that flag as a cue to
set up the Windows event object to associate with socket activity.
Hence, every time the socket was frozen and unfrozen, we would create
a new netevent at unfreeze time, leaking the old one.
I think perhaps part of the reason why that was hard to figure out was
that the boolean parameter was called 'startup' rather than 'enable'.
To make it less confusing the next time I read this code, I've also
renamed it, and while I was at it, adjusted another related comment.
This commit switches as many ssh_hash_free / ssh_hash_new pairs as
possible to reuse the previous hash object via ssh_hash_reset. Also a
few other cleanups: use the wrapper function hash_simple() where
possible, and I've also introduced ssh_hash_digest_nondestructive()
and switched to that where possible as well.
Up until now, it's been a variadic _function_, whose argument list
consists of 'const char *' ASCIZ strings to concatenate, terminated by
one containing a null pointer. Now, that function is dupcat_fn(), and
it's wrapped by a C99 variadic _macro_ called dupcat(), which
automatically suffixes the null-pointer terminating argument.
This has three benefits. Firstly, it's just less effort at every call
site. Secondly, it protects against the risk of accidentally leaving
off the NULL, causing arbitrary words of stack memory to be
dereferenced as char pointers. And thirdly, it protects against the
more subtle risk of writing a bare 'NULL' as the terminating argument,
instead of casting it explicitly to a pointer. That last one is
necessary because C permits the macro NULL to expand to an integer
constant such as 0, so NULL by itself may not have pointer type, and
worse, it may not be marshalled in a variadic argument list in the
same way as a pointer. (For example, on a 64-bit machine it might only
occupy 32 bits. And yet, on another 64-bit platform, it might work
just fine, so that you don't notice the mistake!)
I was inspired to do this by happening to notice one of those bare
NULL terminators, and thinking I'd better check if there were any
more. Turned out there were quite a few. Now there are none.
Thanks to Patrick Stekovic for pointing out that, unlike sensible IP
stacks, Windows requires a non-default socket option to prevent a
second application from binding to a port you were already listening
on, causing some of your incoming connections to be diverted.
This replaces the previous setsockopt that enabled SO_REUSEADDR, which
I put there a long time ago in order to fix an annoying behaviour if
you used the same listening socket twice in rapid succession (e.g. for
successive PuTTYs forwarding the same port) and the second one failed
to bind the listening port because a left-over connection from the
first one was still in TIME_WAIT and causing the port number to be
marked as used.
As far as I can see, SO_EXCLUSIVEADDRUSE and SO_REUSEADDR are mutually
exclusive - if I try to set both, either way round, then setsockopt
returns failure on the second one - so if I have to set the former
then I _can't_ set the latter. And fortunately, re-testing on Windows
10, the TIME_WAIT problem that SO_REUSEADDR was supposed to solve
doesn't seem to exist any more: I deliberately tried listening on a
port that had a TIME_WAIT connection sitting on it, and it worked for
me even without SO_REUSEADDR.
(I can't remember now whether I definitely confirmed the TIME_WAIT
problem on a previous version of Windows, or whether I just assumed it
would happen on Windows in the same way as Linux, where I definitely
do remember observing it.)
While I'm changing that setsockopt call, I've also fixed its 'on'
parameter so that it's a BOOL rather than an int, in accordance with
the docs for WinSock setsockopt.
The message "Reusing a shared connection to this server" is sent to
the seat's output method during the call to ssh_init. In Windows
Plink, that output method wants to talk to the BinarySink stderr_bs
(or stdout_bs, but for this particular message, stderr). So we have to
have already set up stderr_bs by the time the backend init function is
called.
When I introduced the unreachable() macro in commit 0112936ef, I
searched the source code for assert(0) and assert(false), together
with their variant form assert(0 && "explanatory text"). But I didn't
search for assert(!"explanatory text"), which is the form I used to
use before finding that assert(0 && "text") seemed to be preferred in
other code bases.
So, here's a belated replacement of all the assert(!"stuff") macros
with further instances of unreachable().
The number of people has been steadily increasing who read our source
code with an editor that thinks tab stops are 4 spaces apart, as
opposed to the traditional tty-derived 8 that the PuTTY code expects.
So I've been wondering for ages about just fixing it, and switching to
a spaces-only policy throughout the code. And I recently found out
about 'git blame -w', which should make this change not too disruptive
for the purposes of source-control archaeology; so perhaps now is the
time.
While I'm at it, I've also taken the opportunity to remove all the
trailing spaces from source lines (on the basis that git dislikes
them, and is the only thing that seems to have a strong opinion one
way or the other).
Apologies to anyone downstream of this code who has complicated patch
sets to rebase past this change. I don't intend it to be needed again.
The RESIZE_EITHER resizing mode responds to a window resize by
changing the logical terminal size if the window is shown normally, or
by changing the font size to keep the terminal size the same if the
resize is a transition between normal and maximised state.
But a user pointed out that it's also possible for a window to receive
a WM_SIZE message while _remaining_ in maximised state, and that
PuTTY's resize logic didn't allow for that possibility. It occurs when
there's a change in the amount of available screen space for the
window to be maximised _in_: e.g. when the video resolution is
reconfigured, or when you reconnect to a Remote Desktop session using
a client window of a different size, or even when you toggle the
'Automatically hide the taskbar' option in the Windows taskbar settings.
In that situation, the right thing seems to be for PuTTY to continue
to go with the policy of changing the font size rather than the
logical terminal size. In other words, we prefer to change the font
size when the resize is _from_ maximised state, _to_ maximised state,
_or both_.
That's easily implemented by removing the check of the 'was_zoomed'
flag, in the case where we've received a WM_SIZE message with the
state SIZE_MAXIMIZED: once we know the transition is _to_ maximised
state, it doesn't matter whether or not it was also _from_ it. (But we
still set the was_zoomed flag to the most recent maximised status, so
that we can recognise transitions _out_ of maximised mode.)
Commit f2e61275f converted the integer casts in cmpforsearch to
uintptr_t from unsigned long. But it left the companion function
cmpfortree alone, presumably on the grounds that the compiler didn't
report a warning for that one.
But those two functions (cmpfortree and cmpforsearch) are used with
the same tree234, so they're supposed to implement the same sorting
criterion. And the thing they're actually comparing, namely the
Windows API typedef SOCKET, is a pointer-sized integer. So there was a
latent bug here in which cmpforsearch was comparing all 64 bits of the
pointer, while cmpfortree was only comparing the low-order 32.
Set the initial buffer size to MAX_PATH + 1 (261). Increment e->i before
the function returns instead of incrementing it in the call to
RegEnumKey.
The initial buffer size was too small to fit a subkey with a 256
characters long name plus '\0', the first call to RegEnumKey would fail
with ERROR_MORE_DATA, sgrowarray would grow the buffer, and RegEnumKey
would be called again.
However, because e->i was incremented in the first RegEnumKey call, the
second call would get the next subkey and the subkey with the long name
would be skipped.
Saving a session with a 256 characters long name would trigger this
problem. The session would be saved in the registry, but Putty would not
be able to display it in the saved sessions list.
Pageant didn't have this problem since it uses a different function to get
the saved sessions and the size of the buffer used is MAX_PATH + 1. Pageant
and Putty would display slightly different lists of saved sessions.
I've only just noticed that it doesn't do anything at all!
Almost every implementation of the Socket vtable provides a flush()
method which does nothing, optionally with a comment explaining why
it's OK to do nothing. The sole exception is the wrapper Proxy_Socket,
which implements the method during its setup phase by setting a
pending_flush flag, so that when its sub-socket is later created, it
can call sk_flush on that. But since the sub-socket's sk_flush will do
nothing, even that is completely pointless!
Source control history says that sk_flush was introduced by Dave
Hinton in 2001 (commit 7b0e08270), who was going to use it for some
purpose involving the SSL Telnet support he was working on at the
time. That SSL support was never finished, and its vestigial
declarations in network.h were removed in 2015 (commit 42334b65b). So
sk_flush is just another vestige of that abandoned work, which I
should have removed in the latter commit but overlooked.
If the agent sent a response whose length field describes an interval
of memory larger than the file-mapping object the message is supposed
to be stored in, we shouldn't return that message to the client as if
nothing is wrong. Treat that the same as a failure to receive any
response at all.
Looking over this function today, I spotted several questionable uses
of strcat to concatenate "\PUTTY.RND" to the end of a pathname,
without having checked whether the pathname had filled up the static
fixed-size buffer already.
I don't think this is exploitable (because you'd have to be in control
of the local account already to control any of the data sources used
to fill those buffers). But it's horrible anyway, of course. Now all
of those are replaced with sensible dupcats.
(This patch re-indents a lot of the function, to give variables
tighter scopes. So the diff is best viewed with whitespace ignored.)
This assertion was supposed to be checking for the buffer overrun
fixed by the previous commit, but because it checks the buffer index
just _after_ writing into the buffer, it would have permitted a
one-byte overrun before failing the assertion.
If you select an entry in the saved sessions list box, but without
double-clicking to actually load it, and then you hit OK, the config-
box code will automatically load it. That just saves one click in a
common situation.
But in order to load that session, the config-box system first has to
ask the list-box control _which_ session is selected. On Windows, this
causes an assertion failure if the user has switched away from the
Session panel to some other panel of the config box, because when the
list box isn't on screen, its Windows control object is actually
destroyed.
I think a sensible answer is that we shouldn't be doing that implicit
load behaviour in any case if the list box isn't _visible_: silently
loading and launching a session someone selected a lot of UI actions
ago wasn't really the point. So now I make that behaviour only happen
when the list box (i.e. the Session panel) _is_ visible. That should
prevent the assertion failure on Windows, but the UI effect is cross-
platform, applying even on GTK where the control objects for invisible
panels persist and so the assertion failure didn't happen. I think
it's a reasonable UI change to make globally.
In order to implement it, I've had to invent a new query function so
that config.c can tell whether a given control is visible. In order to
do that on GTK, I had to give each control a pointer to the 'selparam'
structure describing its config-box pane, so that query function could
check it against the current one - and in order to do _that_, I had to
first arrange that those 'selparam' structures have stable addresses
from the moment they're first created, which meant adding a layer of
indirection so that the array of them in the top-level dlgparam
structure is now an array of _pointers_ rather than of actual structs.
(That way, realloc half way through config box creation can't
invalidate the important pointer values.)
Normally I never notice warnings in this build, because it runs inside
bob and dumps all the warnings in a part of the build log I never look
at. But I've had these fixes lying around for a while and should
commit them.
They're benign: all we need is an explicit declaration of strtoumax to
replace the one that stdlib.h doesn't provide, and a couple more of
those annoying NO_TYPECHECK modifiers on GET_WINDOWS_FUNCTION calls.
Having decided that the terminal's local echo setting shouldn't be
allowed to propagate through to termios, I think the local edit
setting shouldn't either. Also, no other terminal emulator I know
seems to implement this sequence, and if you enable it, things get
very confused in general. I think it's generally better off absent; if
somebody turns out to have been using it, then we'll at least be able
to find out what it's good for.
The functions that previously lived in it now live in terminal.c
itself; they've been renamed term_keyinput and term_keyinputw, and
their function is to add data to the terminal's user input buffer from
a char or wchar_t string respectively.
They sit more comfortably in terminal.c anyway, because their whole
point is to translate into the character encoding that the terminal is
currently configured to use. Also, making them part of the terminal
code means they can also take care of calling term_seen_key_event(),
which simplifies most of the call sites in the GTK and Windows front
ends.
Generation of text _inside_ terminal.c, from responses to query escape
sequences, is therefore not done by calling those external entry
points: we send those responses directly to the ldisc, so that they
don't count as keypresses for all the user-facing purposes like bell
overload handling and scrollback reset. To make _that_ convenient,
I've arranged that most of the code that previously lived in
lpage_send and luni_send is now in separate translation functions, so
those can still be called from situations where you're not going to do
the default thing with the translated data.
(However, pasted data _does_ still count as close enough to a keypress
to call term_seen_key_event - but it clears the 'interactive' flag
when the data is passed on to the line discipline, which tweaks a
minor detail of control-char handling in line ending mode but mostly
just means pastes aren't interrupted.)
It's identical in uxnoise and winnoise, being written entirely in
terms of existing cross-platform functions. Might as well centralise
it into sshrand.c.
Having explicitly _stated_ in commit 4dcc0fddf the principle that if
you ever queue a toplevel callback on a freeable object then you
should also call delete_callbacks_for_context on that object before
freeing it, I realised I'd never actually gone through and checked
methodically at every call site of queue_toplevel_callback. So I did,
and naturally, I found several missing ones.
The recent rewriting in both the GTK and Windows keyboard handlers
left the keypad 'Enter' key in a bad state, when no override is
enabled that causes it to generate an escape sequence.
On Windows, a series of fallbacks was causing it to generate \r
regardless of configuration, whereas in Telnet mode it should default
to generating the special Telnet new-line sequence, and in response to
ESC[20h (enabling term->cr_lf_return) it should generate \r\n.
On GTK, it wasn't generating anything _at all_, and also, I can't see
any evidence that the GTK keyboard handler had ever remembered to
implement the cr_lf_return mode.
Now Keypad Enter in non-escape-sequence mode should behave just like
Return, on both platforms.
TranslateKey() on Windows passed all numeric-keypad key events to this
function in terminal.c, and accepted whatever it gave back. That
included the handling for the trivial case of the numeric keypad, when
Num Lock is on and application keypad mode hasn't overridden it, so
that the keypad should be returning actual digits. In that case,
format_numeric_keypad_key() itself was returning the same ASCII
character I had passed in to it as a keypad identifier, and
TranslateKey was returning that in turn as the final translation.
Unfortunately, that means that with Num Lock on, the numeric keypad
translates into what _I_ used as the logical keypad codes inside the
source code, not what the local keyboard layout thinks are the right
codes. In particular, the key I identified as keypad '.' would render
as '.' even on a German keyboard where it ought to produce ','.
Fixed by removing the fallback case in format_numeric_keypad_key()
itself, so now it returns the empty string if it didn't produce an
escape sequence as its translation. Instead, the special case is in
window.c, which checks for a zero-length output string and handles it
by falling through to the keyboard-layout specific ToUnicode code
further down TranslateKey().
On the GTK side, no change is needed here: the GTK keyboard handler
does things in the opposite order, by trying the local input method
_first_ (unless it can see a reason up front to override it), and only
calling format_numeric_keypad_key() if that didn't provide a
translation. So the fallback ASCII translation in the latter was
already not used.
Re-consider the icon in light of the font size, so that we pick the icon
whose size mostly closely matches the terminal font, rather than always
scaling the default icon.
Remove the 'winhelp-topic' IDs from the Halibut source, and from the
code. Now we have one fewer name to think of every time we add a
setting.
I've left the HELPCTX system in place, with the vague notion that it
might be a useful layer of indirection for some future help system on a
platform like Mac OS X.
(I've left the putty.hlp target in doc/Makefile, if nothing else because
this is a convenient test case for Halibut's WinHelp support. But the
resulting help file will no longer support context help.)
GetDlgItemText_alloc is often used to get passwords from text fields,
so the memory should be freed and erased properly. Otherwise parts
of passwords might leak in memory.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
In my eagerness to make sure we didn't _accidentally_ change the
seat's trust status back to trusted at any point, I forgot to do it on
purpose if a second SSH login phase is legitimately run in the same
terminal after the first session has ended.
Fill in all the function pointers for 3rd party Windows GSS DLLs, not
just some of them. These were missed out when GSS key exchange was added
in d515e4f1a3.