When the user provides a password on the PuTTY command line, via -pw
or -pwfile, the flag 'tried_once' inside cmdline_get_passwd_input() is
intended to arrange that we only try sending that password once, and
after we've sent it, we don't try again.
But this plays badly with the 'Restart Session' operation. If the
connection is lost and then restarted at user request, we _do_ want to
send that password again!
So this commit moves that static variable out into a small state
structure held by the client of cmdline_get_passwd_input. Each client
can decide how to manage that state itself.
Clients that support 'Restart Session' - i.e. just GUI PuTTY itself -
will initialise the state at the same time as instantiating the
backend, so that every time the session is restarted, we return
to (correctly) believing that we _haven't_ yet tried the password
provided on the command line.
But clients that don't support 'Restart Session' - i.e. Plink and file
transfer tools - can do the same thing that cmdline.c was doing
before: just keep the state in a static variable.
This also means that the GUI login tools will now retain the
command-line password in memory, whereas previously they'd have wiped
it out once it was used. But the other tools will still wipe and free
the password, because I've also added a 'bool restartable' flag to
cmdline_get_passwd_input to let it know when it _is_ allowed to do
that.
In the GUI tools, I don't see any way to get round that, because if
the session is restarted you _have_ to still have the password to use
again. (And you can't infer that that will never happen from the
CONF_close_on_exit setting, because that too could be changed in
mid-session.) On the other hand, I think it's not all that worrying,
because the use of either -pw or -pwfile means that a persistent copy
of your password is *already* stored somewhere, so another one isn't
too big a stretch.
(Due to the change of -pw policy in 0.77, the effect of this bug was
that an attempt to reconnect in a session set up this way would lead
to "Configured password was not accepted". In 0.76, the failure mode
was different: PuTTY would interactively prompt for the password,
having wiped it out of memory after it was used the first time round.)
In the changes around commit 420fe75552, I made the terminal
suspend output processing while it waited for a term_size() callback
in response to a resize request. Because on X11 there are unusual
circumstances in which you never receive that callback, I also added a
last-ditch 5-second timeout, so that eventually we'll resume terminal
output processing regardless.
But the timeout lives in terminal.c, in the cross-platform code. This
is pointless on Windows (where resize processing is synchronous, so we
always finish it before the timer code next gets called anyway), but I
decided it was easier to keep the whole mechanism in terminal.c in the
absence of a good reason not to.
Now I've found that reason. We _also_ generate window resizes locally
to the GTK front end, in response to the key combinations that change
the font size, and _those_ still have an asynchrony problem.
So, to begin with, I'm refactoring the request_resize system so that
now there's an explicit callback from the frontend to the terminal to
say 'Your resize request has now been processed, whether or not you've
received a term_size() call'. On Windows, this simplifies matters
greatly because we always know exactly when to call that, and don't
have to keep a 'have we called term_size() already?' flag. On GTK, the
timing complexity previously in terminal.c has moved into window.c.
No functional change (I hope). The payoff will be in the next commit.
The previous fix on pre-0.77 was non-disruptive and just enough to get
through my Coverity build (which uses winelib); but now that I look at
the rest of the Winelib build output, there are some further warnings
I should fix on main.
Most of them are more long/LONG confusion (specific to Winelib, rather
than real Windows); also, there's a multiple macro definition in
jump-list.c because Winelib defines _PROPVARIANT_INIT_DEFINED_ in
place of _PROPVARIANTINIT_DEFINED_ which we were testing for. (Bah.)
And in windows/window.c I used wcscmp without including <wchar.h>.
In spite of long vs LONG I still had to turn off one or two more
DLL-loading typechecks.
save_screenshot() returns a dynamically allocated error message in
case of failure, and Coverity complained of a memory leak when it was
ignored in putty.c.
The memory leak is trivial, because we were about to terminate the
process with an error anyway. But it's a good point that I forgot to
report the error!
Not critical enough to fix on 0.77 (where Coverity found it), but we
might as well make it look sensible on main.
In Winelib, you have to be careful not to say 'unsigned long' where
the API expects ULONG, because Winelib doesn't have the Windows LLP64
nature - its unsigned long is 64 bits, whereas ULONG is 32.
Also, my local Winelib has <dwmapi.h> (used in the new demo-screenshot
system), but doesn't contain some of the definitions inside it. So
I've expanded the cmake test of HAVE_DWMAPI_H so that it actually
checks the things we need, instead of just the existence of the
containing header.
Now we check that we can actually make an ssh_key out of it, and
moreover, that the key is of a sensible kind (i.e. not a certificate
in turn). If that's not true, we report something about the problem in
a new CTRL_TEXT below the public key input box. If the key _is_ valid,
that same text control is used to show its type, length and
fingerprint.
On Windows, I've widened the dialog box a little to make fingerprints
fit sensibly in it.
This is for cases where they're presenting information to the user
that wouldn't wrap sensibly anyway (such as an SSH key fingerprint
which is mostly all one word), and in which newlines might be
significant.
On GTK, the implementing widget is still a GtkLabel, but without the
wrap flag set, and wrapped in a GtkScrolledWindow in case the text is
too wide to fit.
On Windows, I've switched to using an edit box instead of a static
text control, making it readonly, and borderless via my existing
MakeDlgItemBorderless helper function. This doesn't get you an actual
scrollbar, but it does mean you can scroll left and right by dragging
with the mouse.
Various alignments I want to do in the host CA box have shown up
deficiencies in this system, so I've reworked it a bit.
Firstly, you can now specify more than two controls to be tied
together with an align_next_to (e.g. multiple checkboxes alongside
something else).
Secondly, as well as forcing the controls to be the same height as
each other, the layout algorithm will also move the later controls
further _downward_, so that their top y positions also line up. Until
now that hasn't been necessary, because they lined up already.
In the GTK implementation of this via the Columns class, I've renamed
'columns_force_same_height' to 'columns_align_next_to', and similarly
for some of the internal fields, since the latter change makes the
previous names a misnomer.
In the Windows implementation, I found it most convenient to set this
up by following a linked list of align_next_to fields backwards. But
it won't always be convenient to initialise them that way, so I've
also written a crude normaliser that will rewrite those links into a
canonical form. But I only call that on Windows; it's unnecessary in
GTK, where the Columns class provides plenty of per-widget extra
storage so I just keep each alignment class as a circular list.
Apparently I never re-tested that option when I revamped Pageant's
command-line option parsing in commit dc183e1649, because it's
now off by one in figuring out which argument to treat as the start of
the command to be run.
(The new code in that commit is the same shape as the old code but
with variables renamed, and that was the mistake, because in the old
code, the argument index i pointed to the -c option, whereas in the
new code, match_opt has already advanced amo.index to the next word.
So the two index variables _shouldn't_ be treated the same.)
As distinct from the type of signature generated by the SSH server
itself from the host key, this lets you exclude (and by default does
exclude) the old "ssh-rsa" SHA-1 signature type from the signature of
the CA on the certificate.
This will allow the central host_ca_new function to pre-populate the
structure with default values for the fields, so that once I add more
options to CA configuration they can take their default values when
loading a saved record from a previous PuTTY version.
In the course of polishing up this dialog box, I'm going to want it to
actually do cryptographic things (such as checking validity of a
public key blob and printing its fingerprint), which means it will
need to link against SSH utility functions.
So I've moved the dialog-box setup and handling code out of config.c
into a new file in the ssh subdirectory and in the ssh library, where
those facilities will be conveniently available.
This also means that dialog-box setup code _won't_ be linked into
PuTTYtel or pterm (on either platform), so I've added a stub source
file to provide its entry-point function in those tools. Also,
provided a const bool to indicate whether that dialog is available,
which we use to decide whether to recognise that command-line option.
Instead of an edit box together with a Browse button that pops up a
sub-dialog, this is _just_ the browse button, only now it has a
user-defined title. I'm about to want to use this for loading CA
public keys from files.
This causes PuTTY to bring up just the host CA configuration dialog
box, and shut down once that box is dismissed.
I can imagine it potentially being useful to users, but in the first
instance, I expect it to be useful to _me_, because it will greatly
streamline testing changes to the UI of that dialog!
This gets rid of that awkward STANDARD_PREFIX system in which every
branch of the old 'union control' had to repeat all the generic
fields, and then call sites had to make an arbitrary decision about
which branch to access them through.
That was the best we could do before accepting C99 features in this
code base. But now we have anonymous unions, so we don't need to put
up with that nonsense any more!
'dlgcontrol' is now a struct rather than a union, and the generic
fields common to all control types are ordinary members of the struct,
so you don't have to refer to them as ctrl->generic.foo at all, just
ctrl->foo, which saves verbiage at the point of use.
The extra per-control fields are still held in structures named after
the control type, so you'll still say ctrl->listbox.height or
whatever. But now those structures are themselves members of an
anonymous union field following the generic fields, so those
sub-structures don't have to reiterate all the standard stuff too.
While I'm here, I've promoted 'context2' from an editbox-specific
field to a generic one (it just seems silly _not_ to allow any control
to have two context fields if it needs it). Also, I had to rename the
boolean field 'tabdelay' to avoid it clashing with the subsidiary
structure field 'tabdelay', now that the former isn't generic.tabdelay
any more.
I'm about to change my mind about whether its top-level nature is
struct or union, and rather than change the key word 'union' to
'struct' at every point of use, it's nicer to just get rid of the
keyword completely. So it has a shiny new name.
I just tried to trace through the Windows version's control flow in
response to a confusing bug report, and found that the control flow
itself was so confusing I couldn't make sense of it. Why are we
choosing between getaddrinfo and gethostbyname via #ifndef NO_IPV6,
then re-converging control flow and diverging a second time to report
the error?
So I rewrote the whole thing to have completely separate sections of
code dealing with the three resolution strategies, each with its own
dedicated error reporting system. And then I checked the Unix version
and found it was about as confusing, so I rewrote that too in the same
style. Now the two are mostly the same, except for details: Unix has
an override at the top for a Unix socket pathname, Windows has to cope
with getaddrinfo maybe not being found at run time (so the other cases
aren't in the #else clause), and Windows uses the same error reporting
for both lookup functions whereas Unix has to use the appropriate
gai_strerror or hstrerror.
To begin with, Windows's own API documentation doesn't recommend using
it (for thread-safety reasons), and promises that the error codes
returned from getaddrinfo are aliases for the normal Windows error
code enumeration. So it's safe, and quite likely preferable, to just
use ordinary win_strerror instead.
But more embarrassingly, my attempt to acquire and use gai_strerror
from one or other Winsock DLL didn't even *work*! Because of course
it's a function that handles strings, which means it comes in two
variants, gai_strerrorA and gai_strerrorW, so in order to look it up
using GetProcAddress, I should have specified which I wanted. And I
didn't, so the lookup always failed.
This should improve error reporting in cases of interesting kinds of
DNS failure.
We're assigning string literals into it all over the place, so it
should have been const char * all along. No thanks to any of the
compilers that didn't point that out!
As well as eliminating the null-pointer dereference, I also now
realise that the format-translation code depended on leaving the final
translated string in 'otherstr' in order to pass the host key check
afterwards (if they match).
I've also now realised that this only applies to *SSH-1* RSA keys, so
it's even more obsolete than I thought before. Perhaps I should just
remove this code instead of spending all this effort on fixing it. But
I've done the fix now, so I'll commit it, and then maybe we can remove
it afterwards (and have a working version of it available to resurrect
if ever needed!).
A user points out that in commit 6143a50ed2, when I converted all
use of the registry to functions that return a newly allocated buffer
instead of allocating a buffer themselves beforehand, I overlooked
that one use of the old idiom was reusing the preallocated buffer as
work space.
I _hope_ nobody still needs this code - the 'old-style' host key cache
format it handles was replaced in 2000. If anyone has a PuTTY host key
cache entry that's survived 22 years without either having to be
reinitialised on a new system or changed when the machine's host key
was upgraded, they're doing better than I am!
But if it's still here, it should still work, obviously. Replaced the
reused buffer with a strbuf, which is more robust anyway.
As of df3a21d97b, this panel has two "Browse..." buttons, and is the
first to do so. On Windows, the shortcut for this file chooser button
was hardcoded to 'w', causing an assertion failure whenever switching to
the Auth panel.
I've just removed the shortcut -- there are others near enough that it's
easy to reach the "Browse..." button with Tab.
Now we offer the OpenSSH certificate key types in our KEXINIT host key
algorithm list, so that if the server has a certificate, they can send
it to us.
There's a new storage.h abstraction for representing a list of trusted
host CAs, and which ones are trusted to certify hosts for what
domains. This is stored outside the normal saved session data, because
the whole point of host certificates is to avoid per-host faffing.
Configuring this set of trusted CAs is done via a new GUI dialog box,
separate from the main PuTTY config box (because it modifies a single
set of settings across all saved sessions), which you can launch by
clicking a button in the 'Host keys' pane. The GUI is pretty crude for
the moment, and very much at a 'just about usable' stage right now. It
will want some polishing.
If we have no CA configured that matches the hostname, we don't offer
to receive certified host keys in the first place. So for existing
users who haven't set any of this up yet, nothing will immediately
change.
Currently, if we do offer to receive certified host keys and the
server presents one signed by a CA we don't trust, PuTTY will bomb out
unconditionally with an error, instead of offering a confirmation box.
That's an unfinished part which I plan to fix before this goes into a
release.
This allows you to actually use an OpenSSH user certificate for
authentication, by combining the PPK you already had with the
certificate from the CA to produce a new PPK whose public half
contains the certificate.
I don't intend that this should be the only way to do it. It's
cumbersome to have to use the key passphrase in order to re-encrypt
the modified PPK. But the flip side is that once you've done it you
have everything you need in one convenient file, and also, the
certificate itself is covered by the PPK's tamperproofing (in case you
can think of any attacks in which the admin of your file server swaps
out just the certificate for a different one on the same key). So this
is *a* useful way to do it, even if not the only useful way.
The new options to add and remove certificates are supported by both
Windows GUI PuTTYgen and cmdgen. cmdgen can also operate on pure
public keys, so you can say 'puttygen --remove-certificate
foo-cert.pub' and get back the underlying foo.pub; Windows PuTTYgen
doesn't support that mode, but only because it doesn't in general have
any support for the loaded key not being a full public+private pair.
This makes room to add more entries without the Proxy panel
overflowing. It also means we can put in a bit more explanation in
some of the more cryptic one-word names!
I'm about to want to create a second entirely different dialog box
whose contents are described using the same dialog.h API as the main
config box. So I'm starting by moving as much handler code as possible
out of GenericMainDlgProc and its callers, and into a set of reusable
subroutines.
In particular, this gets rid of the disgusting static variables that
stored all the config-box state. Now they're stored in a more sensible
struct, which lives in the new context-pointer field provided by the
reworked ShinyDialogBox.
It's self-contained enough not to really need to live in dialog.c as a
set of static functions. Also, moving it means we can isolate the
implementation details - which also makes it easy to change them.
One such change is that I've added the ability to bake a context
pointer into the dialog - unused so far, but it will be shortly.
(Also, while I'm here, renamed the functions so they sound more as if
they're adding features than working around bugs - not to mention not
imputing mental illness to the usual versions.)
All the fiddly business where you have to check that a thing exists,
make sure of its type, find its size, allocate some memory, and then
read it again properly (or, alternatively, loop round dealing with
ERROR_MORE_DATA) just doesn't belong at every call site. It's crying
out to be moved out into some separate utility functions that present
a more ergonomic API, so that the code that decides _which_ Registry
entries to read and what to do with them can concentrate on that.
So I've written a fresh set of registry API wrappers in windows/utils,
and simplified windows/storage.c as a result. The jump-list handling
code in particular is almost legible now!
Every time I've had to do this before, I've always done the three-line
dance of initialising a BinarySource and calling get_string on it.
It's long past time I wrapped that up into a convenient subroutine.
Using a new screenshot-taking module I just added in windows/utils,
these new options allow me to start up one of the tools with
demonstration window contents and automatically save a .BMP screenshot
to disk. This will allow me to keep essentially the same set of demo
images and update them easily to keep pace with the current appearance
of the real tools as PuTTY - and Windows itself - both evolve.
Once we've actually loaded a key file, the job of updating the UI
fields is now done by a subroutine update_ui_after_load(), so that I
can call it from a different context in an upcoming commit.
I got a pterm into a stuck state this morning by an accidental mouse
action. I'd intended to press Ctrl + right-click to pop up the context
menu, but I accidentally pressed down the left button first, starting
a selection drag, and then while the left button was still held down,
pressed down the right button as well, triggering the menu.
The effect was that the context menu appeared while term->selstate was
set to DRAGGING, in which state terminal output is suppressed, and
which is only unset by a mouse-button release event. But then that
release event went to the popup menu, and the terminal window never
got it. So the terminal stayed stuck forever - or rather, until I
guessed the cause and did another selection drag to reset it.
This happened to me on GTK, but once I knew how I'd done it, I found I
could reproduce the same misbehaviour on Windows by the same method.
Added a simplistic fix, on both platforms, that cancels a selection
drag if the popup menu is summoned part way through it.
This is another fallback needed on Win95, where the Win32 API
functions to convert between multibyte and wide strings exist, but
they haven't heard of the UTF-8 code page. PuTTY can't really do
without that these days.
(In particular, if a server sends a remote window-title escape
sequence while the terminal is in UTF-8 mode, then _something_ needs
to translate the UTF-8 data into Unicode for Windows to reconvert into
the character set used in window titles.)
This is a weird enough thing to be doing that I've put it under the
new #ifdef LEGACY_WINDOWS, so behaviour in the standard builds should
be unchanged.
This makes Pageant run on Win95 again. Previously (after fixing the
startup-time failures due to missing security APIs) it would go into
an uninterruptible CPU-consuming spin in the message loop when every
attempt to retrieve its messages failed because PeekMessageW doesn't
work at all on the 95 series.
Now it can be called from places other than Pageant's WinMain(). In
particular, the attempt to make a security descriptor in
lock_interprocess_mutex() is gated on it.
In return, however, I've tightened up the semantics. In normal PuTTY
builds that aren't trying to support pre-NT systems, the function
*unconditionally* returns true, on the grounds that we don't expect to
target any system that doesn't support the security APIs, and if
someone manages to contrive one anyway - or, more likely, if we some
day introduce a bug in our loading of the security API functions -
then this safety catch should make Pageant less likely to accidentally
fall back to 'never mind, just run in insecure mode'.
Turns out that PuTTY hasn't run successfully on legacy Windows since
0.66, in spite of an ongoing intention to keep it working. Among the
reasons for this is that CreateWindowExW simply fails with
ERROR_CALL_NOT_IMPLEMENTED: apparently Win95 and its ilk just didn't
have fully-Unicode windows as an option.
Fixed by resurrecting the previous code from the git history (in
particular, code removed by commit 67e5ceb9a8 was useful), and
including it as a runtime alternative.
One subtlety was that I found I had to name the A and W window classes
differently (by appending ".ansi" to the A one): apparently they
occupy the same namespace even though the names are in different
character sets, so if you somehow manage to register both classes,
they'll collide with each other without that tweak.
These are more functions that don't exist on all our supported legacy
versions of Windows, so we need to follow the same policy as
everywhere else, by trying to acquire them at run time and having a
fallback if they aren't available.
This fixes a load-time failure on versions of Windows too old to have
that function in kernel32.dll.
We use it to determine whether a file was safe to overwrite in the
context of PuTTY session logging: if it's safe, we skip the 'do you
want to overwrite or append?' dialog box.
On earlier Windows you can use FindFirstFile to get a similar effect,
so that's what we fall back to. It's not quite the same, though - if
you pass a wildcard then it will succeed when you'd rather it had
failed. But it's good enough to at least work in normal cases.
This way, anyone who needs to use the version data can quickly call
init_winver to make sure it's been set up, and not waste too much faff
redoing the actual work.
By testing on a platform slow enough to show the flicker, I happened
to notice that if your shell prompt resets the window title every time
it's displayed, this was actually resulting in a call to SetWindowText
every time, which caused the GUI to actually do work.
There's certainly no need for that! We can at least avoid bothering if
the new title is identical to the old one.
It turns out that they're still NULL at the first moment that a
SetWindowText call tries to read one of them, because they weren't
initialised at startup! Apparently SetWindowText notices that it's
been passed a null pointer, and does nothing in preference to failing,
but it's still not what I _meant_ to do.
Unlike on Unix, a Windows process's exit status is a DWORD, i.e. a
32-bit unsigned integer. And exit statuses seen in practice can range
up into the high half of that space. For example, if a process dies of
an 'illegal instruction' exception, then the exit status retrieved by
GetExitCodeProcess will be 0xC000001D == STATUS_ILLEGAL_INSTRUCTION.
If this happens to the process running inside pterm.exe, then
conpty->exitstatus will be set to a value greater than INT_MAX, which
will cause conpty_exitcode to return -1. Unfortunately, a -1 return
from conpty_exitstatus is treated by front ends as saying that the
backend process hasn't exited yet, and is still running. So pterm will
sit around after its subprocess has already terminated, contrary to
your 'close on exit' setting.
Moreover, when cmd.exe exits, it apparently passes on to its parent
process the exit status of the last subcommand it ran. So if you run a
Windows pterm containing an ordinary interactive console session, and
the last subprogram you happen to run inside that session dies of a
fatal signal, then the same thing will happen after you type 'exit' at
the command prompt.
This has been happening to me intermittently ever since I created
pterm.exe in the first place, and I guessed completely wrong about the
cause (I feared some kind of subtle race condition in pterm's use of
the process API). I've only just managed to reproduce it reliably
enough to debug, and I'm relieved to find it's much simpler than that!
The immediate fix, in any case, is to ensure we don't return -1 from
conpty_exitcode unless the process really is still running. And don't
return INT_MAX either, because that indicates 'unclean exit' in a way
that triggers 'close window only on clean exit' (and even Unix pterm
doesn't consider the primary subprocess dying of a signal to count as
unclean). So we clip all out-of-range Windows exception codes to
INT_MAX-1.
In the longer term I think it would be nice to turn exit codes into a
full 32-bit space, and move the special values completely out of it.
That would permit actually keeping the exact exception code and
passing it on to a caller who needed it. For example, if we were to
write a Windows psusan (which I could imagine being occasionally
useful), this way it would be able to return the unmodified full
Windows exit code via the "exit-status" chanreq. But I don't think we
currently have any clients needing that much detail, so that's a more
intrusive cleanup for a later date.
There's now a command-line option to make Pageant open an AF_UNIX
socket at a pathname of your choice. This allows it to act as an SSH
agent for any client program willing to use a WinSock AF_UNIX socket.
In particular, this allows WSL 1 processes to talk directly to Windows
Pageant without needing any intermediate process, because the AF_UNIX
sockets in the WSL 1 world interoperate with WinSock's ones.
(However, not WSL 2, which isn't very surprising.)
Apparently when I made Windows Pageant use the winselgui system, I
added the call that gets WSAAsyncSelect response messages sent to
Pageant's window, but I didn't add the switch case in the window
procedure that actually handles those responses. I suppose I didn't
notice at the time because no actual functionality used it - Pageant
has never yet dealt with any real (i.e. Winsock) sockets, only with
HANDLE-based named pipes, which are called 'sockets' in PuTTY's
abstraction, but not by Windows.
Most of the previous large sk_newlistener function is now an inner
function whose address-family parameter is a platform AF_FOO constant
rather than one of our own ADDRTYPE_FOO. sk_newlistener itself is a
trivial wrapper on that, which just does the initial translation from
the input ADDRTYPE_FOO into an AF_FOO.
This will make it possible to drop in alternative wrapper functions
which won't have to make up a pointless ADDRTYPE.
This macro is now an inline function, and as in the previous commit,
each possible value for the main discriminator is now a case in a
switch statement instead of tested in an interlocking set of ?:.
The code that diverges based on the address family is now in the form
of a switch statement, rather than an unwieldy series of chained ifs.
And the final call to bind() has all its arguments worked out in the
previous switch, rather than computing them at the last minute with an
equally unwieldy set of ?: operators that repeat the previous test.
This will make it easier to add more cases, and also, to keep each
case under its own ifdef without losing too much legibility.
This replaces two previous boolean fields 'resolved' and 'namedpipe',
converting them into a single three-valued enum which avoids being
able to represent the meaningless fourth possibility at all. Also, it
provides an open-ended place to add further possibilities.
The new field is very similar to the one in unix/network.c, except
that the UNIX entry for AF_UNIX sockets is missing, and in its place
is the NAMEDPIPE entry for storing the pathnames of Windows named
pipes.
Correcting a source file name in the docs just now reminded me that
I've seen a lot of outdated source file names elsewhere in the code,
due to all the reorganisation since we moved to cmake. Here's a giant
pass of trying to make them all accurate again.
After a discussion with a user recently, I investigated the Windows
native ssh.exe, and found it uses a Windows named pipe to talk to its
ssh-agent, in exactly the same way Pageant does. So if you tell
ssh.exe where to find Pageant's pipe, it can talk directly to Pageant,
and then you can have just one SSH agent.
The slight problem is that Pageant's pipe name is not stable. It's
generated using the same system as connection-sharing pipe names, and
contains a hex hash value whose preimage was fed through
CryptProtectData. And the problem with _that_ is that CryptProtectData
apparently reinitialises its seed between login sessions (though it's
stable within a login session), which I hadn't fully realised when I
reused the same pipe-name construction code.
One possibility, of course, would be to change Pageant so that it uses
a fixed pipe name. But after a bit of thought, I think I actually like
this feature, because the Windows named pipe namespace isn't
segregated into areas writable by only particular users, so anyone
using that namespace on a multiuser Windows box is potentially
vulnerable to someone else squatting on the name you wanted to use.
Using this system makes that harder, because the squatter won't be
able to predict what the name is going to be! (Unless you shut down
Pageant and start it up again within one login session - but there's
only so much we can do. And squatting is at most a DoS, because
PuTTY's named-pipe client code checks ownership of the other end of
the pipe in all cases.)
So instead I've gone for a different approach. Windows Pageant now
supports an extra command-line option to write out a snippet of
OpenSSH config file format on startup, containing an 'IdentityAgent'
directive which points at the location of its named pipe. So you can
use the 'Include' directive in your main .ssh/config to include this
extra snippet, and then ssh.exe invocations will be able to find
wherever the current Pageant has put its pipe.
This imports the following options from command-line PuTTYgen, which
all correspond to controls in Windows PuTTYgen's GUI, and let you set
the GUI controls to initial values of your choice:
-t <key type>
-b <bits>
-E <fingerprint type>
--primes <prime gen policy>
--strong-rsa
--ppk-param <KDF parameters or PPK version etc>
The idea is that if someone generates a lot of keys and has standard
non-default preferences, they can make a shortcut that passes those
preferences on the command line.
These two tools had ad-hoc command loops with similar options, and I
want to extend both (in particular, in a way that introduces options
with arguments). So I've started by throwing together some common code
to do all the tedious bits like finding option arguments wherever they
might be, throwing errors, handling "--" and so on.
Should be no functional change to the existing command-line syntax,
except that now all long options are recognised in both "-foo" and
"--foo" form.
It looks as if I broke this some time around commit cfc9023616,
when I stopped proactively calling term_size() in advance of resizing
the window. A side effect was that I also stopped calling it at all in
the case where we're _not_ resizing the window (because changing the
size of the terminal means adapting the font size to fit a different
amount of stuff in the existing window).
Fixed by moving all the new machinery inside the 'actually resize the
window' branch of the if statement, and restoring the previous
behaviour in the other branch, this time with a comment that will
hopefully stop me making the same mistake again.
Some pointing devices (e.g. gaming mice) can be set to send
mouse-movement events at an extremely high sample rate like 1kHz. This
apparently translates into Windows genuinely sending WM_MOUSEMOVE
messages at that rate. So if you're using one of those mice, then
PuTTYgen's mouse-based entropy collection system will fill its buffer
almost immediately, and give you no perceptible time to actually wave
the mouse around.
I think that in that situation, there's also likely to be a strong
correlation between the contents of successive movement events,
because human-controlled mouse movements aren't fractals - the more
you zoom in on a little piece of one, the more it starts to look more
and more like something moving in a straight line at constant speed,
because the deviations from that happen on a larger time scale than
you're seeing.
So this commit adds a rate limit, not on the _collection_ of the data
(we'll still take all the bits we can get, thanks!), but on the rate
at which we increment the _counter_ for how much entropy we think
we've collected so far. That way, the user still has to spend time
wiggling the mouse back and forth in a way that varies with muscle
motions and introduces randomness.
There's no point having a separate boolean flag. All we have to do is
remember to NULL out the strbuf point state->entropy when we free the
strbuf (which is a good idea in any case!), and then we can use the
non-NULL-ness of that pointer as the indicator that we're currently
collecting entropy.
This offloads the memory management into centralised code which is
better at it than the ad-hoc code here. Now I don't have to predict in
advance how much memory the entropy will consume, and can change that
decision on the fly.
In the previous state of the code, we first tested agent_exists() to
decide whether to be the long-running Pageant server or a short-lived
client; then, during the command-line parsing loop, we prompted for
passphrases to add keys presented on the command line (to ourself or
the server, respectively); *then* we set up the named pipe and
WM_COPYDATA receiver window to actually start functioning as a server,
if we decided that was our role.
A consequence is that if a user started up two Pageants each with an
encrypted key on the command line, there would be a race condition:
each one would decide that it was _going_ to be the server, then
prompt for a passphrase, and then try to set itself up as the server
once the passphrase is entered. So whichever one's passphrase prompt
was answered second would add its key to its own internal data
structures, then fail to set up the server's named pipe, terminate
with an error, and end up not having added its key to the _surviving_
server.
This change reorders the setup steps so that the command-line parsing
loop does not add the keys immediately; instead it merely caches the
key filenames provided. Then we make the decision about whether we're
the server, and set up both the named pipe and WM_COPYDATA window if
we are; and finally, we go back to our list of key filenames and
actually add them, either to ourself (if we're the server) or to some
other Pageant (if we're a client).
Moreover, the decision about whether to be the server is now wrapped
in an interprocess mutex similar to the one used in connection
sharing, which means that even if two or more Pageants are started up
as close to simultaneously as possible, they should avoid a race
condition and successfully manage to agree on exactly one of
themselves to be the server. For example, a user reported that this
could occur if you put shortcuts to multiple private key files in your
Windows Startup folder, so that they were all launched simultaneously
at startup.
One slightly odd behaviour that remains: if the server Pageant has to
prompt for private key passphrases at startup, then it won't actually
start _servicing_ requests from other Pageants until it's finished
dealing with its own prompts. As a result, if you do start up two
Pageants at once each with an encrypted key file on its command line,
the second one won't even manage to present its passphrase prompt
until the first one's prompt is dismissed, because it will block
waiting for the initial check of the key list. But it will get there
in the end, so that's only a cosmetic oddity.
It would be nice to arrange that Pageant GUI operations don't block
unrelated agent requests (e.g. by having the GUI and the agent code
run in separate threads). But that's a bigger problem, not specific to
startup time - the same thing happens if you interactively load a key
via Pageant's file dialog. And it would require a major reorganisation
to fix that fully, because currently the GUI code depends on being
able to call _internal_ Pageant query functions like
pageant_count_ssh2_keys() that don't work by constructing an agent
request at all.
All the seat functions that request an interactive prompt of some kind
to the user - both the main seat_get_userpass_input and the various
confirmation dialogs for things like host keys - were using a simple
int return value, with the general semantics of 0 = "fail", 1 =
"proceed" (and in the case of seat_get_userpass_input, answers to the
prompts were provided), and -1 = "request in progress, wait for a
callback".
In this commit I change all those functions' return types to a new
struct called SeatPromptResult, whose primary field is an enum
replacing those simple integer values.
The main purpose is that the enum has not three but _four_ values: the
"fail" result has been split into 'user abort' and 'software abort'.
The distinction is that a user abort occurs as a result of an
interactive UI action, such as the user clicking 'cancel' in a dialog
box or hitting ^D or ^C at a terminal password prompt - and therefore,
there's no need to display an error message telling the user that the
interactive operation has failed, because the user already knows,
because they _did_ it. 'Software abort' is from any other cause, where
PuTTY is the first to know there was a problem, and has to tell the
user.
We already had this 'user abort' vs 'software abort' distinction in
other parts of the code - the SSH backend has separate termination
functions which protocol layers can call. But we assumed that any
failure from an interactive prompt request fell into the 'user abort'
category, which is not true. A couple of examples: if you configure a
host key fingerprint in your saved session via the SSH > Host keys
pane, and the server presents a host key that doesn't match it, then
verify_ssh_host_key would report that the user had aborted the
connection, and feel no need to tell the user what had gone wrong!
Similarly, if a password provided on the command line was not
accepted, then (after I fixed the semantics of that in the previous
commit) the same wrong handling would occur.
So now, those Seat prompt functions too can communicate whether the
user or the software originated a connection abort. And in the latter
case, we also provide an error message to present to the user. Result:
in those two example cases (and others), error messages should no
longer go missing.
Implementation note: to avoid the hassle of having the error message
in a SeatPromptResult being a dynamically allocated string (and hence,
every recipient of one must always check whether it's non-NULL and
free it on every exit path, plus being careful about copying the
struct around), I've instead arranged that the structure contains a
function pointer and a couple of parameters, so that the string form
of the message can be constructed on demand. That way, the only users
who need to free it are the ones who actually _asked_ for it in the
first place, which is a much smaller set.
(This is one of the rare occasions that I regret not having C++'s
extra features available in this code base - a unique_ptr or
shared_ptr to a string would have been just the thing here, and the
compiler would have done all the hard work for me of remembering where
to insert the frees!)
This fills in the remaining gap in the interactive prompt rework of
the proxy system in general. If you used the Telnet proxy with a
command containing %user or %pass, and hadn't filled in those
variables in the PuTTY config, then proxy/telnet.c would prompt you at
run time to enter the proxy auth details. But the local proxy command,
which uses the same format_telnet_command function, would not do that.
Now it does!
I've implemented this by moving the formatting of the proxy command
into a new module proxy/local.c, shared between both the Unix and
Windows local-proxy implementations. That module implements a
DeferredSocketOpener, which constructs the proxy command (prompting
first if necessary), and once it's constructed, hands it to a
per-platform function platform_setup_local_proxy().
So each platform-specific proxy function, instead of starting a
subprocess there and then and passing its details to make_fd_socket or
make_handle_socket, now returns a _deferred_ version of one of those
sockets, with the DeferredSocketOpener being the thing in
proxy/local.c. When that calls back to platform_setup_local_proxy(),
we actually start the subprocess and pass the resulting fds/handles to
the deferred socket to un-defer it.
A side effect of the rewrite is that when proxy commands are logged in
the Event Log, they now get the same amenities as in the Telnet proxy
type: the proxy password is sanitised out, and any difficult
characters are escaped.
Previously, a setup function returning one of these socket types (such
as platform_new_connection) had to do all its setup synchronously,
because if it was going to call make_fd_socket or make_handle_socket,
it had to have the actual fds or HANDLEs ready-made. If some kind of
asynchronous operation were needed before those fds become available,
there would be no way the function could achieve it, except by
becoming a whole extra permanent Socket wrapper layer.
Now there is, because you can make an FdSocket when you don't yet have
the fds, or a HandleSocket without the HANDLEs. Instead, you provide
an instance of the new trait 'DeferredSocketOpener', which is
responsible for setting in motion whatever asynchronous setup
procedure it needs, and when that finishes, calling back to
setup_fd_socket / setup_handle_socket to provide the missing pieces.
In the meantime, the FdSocket or HandleSocket will sit there inertly,
buffering any data the client might eagerly hand it via sk_write(),
and waiting for its setup to finish. When it does finish, buffered
data will be released.
In FdSocket, this is easy enough, because we were doing our own
buffering anyway - we called the uxsel system to find out when the fds
were readable/writable, and then wrote to them from our own bufchain.
So more or less all I had to do was make the try_send function do
nothing if the setup phase wasn't finished yet.
In HandleSocket, on the other hand, we're passing all our data to the
underlying handle-io.c system, and making _that_ deferrable in the
same way would be much more painful, because that's the place where
the scary threads live. So instead I've arranged it by replacing the
whole vtable, so that a deferred HandleSocket and a normal
HandleSocket are effectively separate trait implementations that can
share their state structure. And in fact that state struct itself now
contains a big anonymous union, containing one branch to go with each
vtable.
Nothing yet uses this system, but the next commit will do so.
This will mean that platform-specific proxy types will also be able to
set themselves up as child Interactors and prompt the user
interactively for passwords and the like.
NFC: nothing uses the new parameter yet.
The return value of term_data() is used as the return value from the
GUI-terminal versions of the Seat output method, which means backends
will take it to be the amount of standard-output data currently
buffered, and exert back-pressure on the remote peer if it gets too
big (e.g. by ceasing to extend the window in that particular SSH-2
channel).
Historically, as a comment in term_data() explained, we always just
returned 0 from that function, on the basis that we were processing
all the terminal data through our terminal emulation code immediately,
and never retained any of it in the buffer at all. If the terminal
emulation code were to start running slowly, then it would slow down
the _whole_ PuTTY system, due to single-threadedness, and
back-pressure of a sort would be exerted on the remote by it simply
failing to get round to reading from the network socket. But by the
time we got back to the top level of term_data(), we'd have finished
reading all the data we had, so it was still appropriate to return 0.
That comment is still correct if you're thinking about the limiting
factor on terminal data processing being the CPU usage in term_out().
But now that's no longer the whole story, because sometimes we leave
data in term->inbuf without having processed it: during drag-selects
in the terminal window, and (just introduced) while waiting for the
response to a pending window resize request. For both those reasons,
we _don't_ always have a buffer size of zero when we return from
term_data().
So now that hole in our buffer size management is filled in:
term_data() returns the true size of the remaining unprocessed
terminal output, so that back-pressure will be exerted if the terminal
is currently not consuming it. And when processing resumes and we
start to clear our backlog, we call backend_unthrottle to let the
backend know it can relax the back-pressure if necessary.
Previously, the Windows implementation of win_request_resize would
call term_size() to tell the terminal about its new size, _before_
calling SetWindowPos to actually change the window size.
If SetWindowPos ends up not getting the exact window size it asks for,
Windows notifies the application by sending back a WM_SIZE message -
synchronously, by calling back to the window procedure from within
SetWindowPos. So after the first over-optimistic term_size(), the
WM_SIZE handler would trigger a second one, resetting the terminal
again to the _actual_ size.
This was more or less harmless, since current handling of terminal
resizes is such that no terminal data gets too confused: any lines
pushed into the scrollback by the first term_size will be pulled back
out by the second one if needed (or vice versa), and since commit
271de3e4ec, individual termlines are not eagerly truncated by
resizing them twice.
However, it's more work than we really wanted to be doing, and it
seems presumptuous to send term_size() before we know it's right! So
now we send term_size() after SetWindowPos returns, unless it already
got sent by a re-entrant call to the WM_SIZE handler _before_
SetWindowPos returned. That way, the terminal should get exactly one
term_size() call in response to win_request_resize(), whether it got
the size it was expecting or not.
This commit replaces all those fiddly little linking modules
(be_all.c, be_none.c, be_ssh.c etc) with a single source file
controlled by ifdefs, and introduces a function be_list() in
setup.cmake that makes it easy to compile a version of it appropriate
to each application.
This is a net reduction in code according to 'git diff --stat', even
though I've introduced more comments. It also gets rid of another pile
of annoying little source files in the top-level directory that didn't
deserve to take up so much room in 'ls'.
More concretely, doing this has some maintenance advantages.
Centralisation means less to maintain (e.g. n_ui_backends is worked
out once in a way that makes sense everywhere), and also, 'appname'
can now be reliably set per program. Previously, some programs got the
wrong appname due to sharing the same linking module (e.g. Plink had
appname="PuTTY"), which was a latent bug that would have manifested if
I'd wanted to reuse the same string in another context.
One thing I've changed in this rework is that Windows pterm no longer
has the ConPTY backend in its backends[]: it now has an empty one. The
special be_conpty.c module shouldn't really have been there in the
first place: it was used in the very earliest uncommitted drafts of
the ConPTY work, where I was using another method of selecting that
backend, but now that Windows pterm has a dedicated
backend_vt_from_conf() that refers to conpty_backend by name, it has
no need to live in backends[] at all, just as it doesn't have to in
Unix pterm.
I'm actually quite surprised there was only _one_ copy of each of
these standard macros in the code base, given my general habit of
casually redefining them anywhere I need them! But each one was in a
silly place. Moved them up to the top level where they're available
globally.
While I'm in the mood for cleaning up the top-level directory here:
all the 'nostuff.c' files have moved into a new 'stubs' directory, and
I broke up be_misc.c into smaller modules that can live in 'utils'.
The Telnet proxy system is not a proper network protocol - we have no
reliable way to receive communication from the proxy telling us
whether a password is even required. However, we _do_ know (a) whether
the keywords '%user' or '%pass' appeared in the format string stored
in the Conf, and (b) whether we actually had a username or a password
to substitute into them. So that's how we know whether to ask for a
username or a password: if the format string asks for them and the
Conf doesn't provide them, we prompt for them at startup.
This involved turning TelnetProxyNegotiator into a coroutine (matching
all the other proxy types, but previously, it was the only one simple
enough not to need to be one), so that it can wait until a response
arrives to that prompt. (And also, as it turned out, so that it can
wait until setup is finished before even presenting the prompt!)
It also involves having format_telnet_command grow an extra output
parameter, in the form of 'unsigned *flags', with which it can
communicate back to the caller that a username or password was wanted
but not found. The other clients of that function (the local proxy
implementations) don't use those flags, but if necessary, they could.
When I wanted to append an ordinary C string to a BinarySink, without
any prefix length field or suffix terminator, I was using the idiom
put_datapl(bs, ptrlen_from_asciz(string));
but I've finally decided that's too cumbersome, and it deserves a
shorter name. put_dataz(bs, string) now does the same thing - in fact
it's a macro expanding to exactly the above.
While I'm at it, I've also added put_datalit(), which is the same
except that it expects a C string literal (and will enforce that at
compile time, via PTRLEN_LITERAL which it calls in turn). You can use
that where possible to avoid the run-time cost of the strlen.
marshal.h now provides a macro put_fmt() which allows you to write
arbitrary printf-formatted data to an arbitrary BinarySink.
We already had this facility for strbufs in particular, in the form of
strbuf_catf(). That was able to take advantage of knowing the inner
structure of a strbuf to minimise memory allocation (it would snprintf
directly into the strbuf's existing buffer if possible). For a general
black-box BinarySink we can't do that, so instead we dupvprintf into a
temporary buffer.
For consistency, I've removed strbuf_catf, and converted all uses of
it into the new put_fmt - and I've also added an extra vtable method
in the BinarySink API, so that put_fmt can still use strbuf_catf's
more efficient memory management when talking to a strbuf, and fall
back to the simpler strategy when that's not available.
(TL;DR: to suppress redundant 'Press Return to begin session' prompts
in between hops of a jump-host configuration, in Plink.)
This new query method directly asks the Seat the question: is the same
stream of input used to provide responses to interactive login
prompts, and the session input provided after login concludes?
It's used to suppress the last-ditch anti-spoofing defence in Plink of
interactively asking 'Access granted. Press Return to begin session',
on the basis that any such spoofing attack works by confusing the user
about what's a legit login prompt before the session begins and what's
sent by the server after the main session begins - so if those two
things take input from different places, the user can't be confused.
This doesn't change the existing behaviour of Plink, which was already
suppressing the antispoof prompt in cases where its standard input was
redirected from something other than a terminal. But previously it was
doing it within the can_set_trust_status() seat query, and I've now
moved it out into a separate query function.
The reason why these need to be separate is for SshProxy, which needs
to give an unusual combination of answers when run inside Plink. For
can_set_trust_status(), it needs to return whatever the parent Seat
returns, so that all the login prompts for a string of proxy
connections in session will be antispoofed the same way. But you only
want that final 'Access granted' prompt to happen _once_, after all
the proxy connection setup phases are done, because up until then
you're still in the safe hands of PuTTY itself presenting an unbroken
sequence of legit login prompts (even if they come from a succession
of different servers). Hence, SshProxy unconditionally returns 'no' to
the query of whether it has a single mixed input stream, because
indeed, it never does - for purposes of session input it behaves like
an always-redirected Plink, no matter what kind of real Seat it ends
up sending its pre-session login prompts to.
Passing an operating-system-specific error code to plug_closing(),
such as errno or GetLastError(), was always a bit weird, given that it
generally had to be handled by cross-platform receiving code in
backends. I had the platform.h implementations #define any error
values that the cross-platform code would have to handle specially,
but that's still not a great system, because it also doesn't leave
freedom to invent error representations of my own that don't
correspond to any OS code. (For example, the ones I just removed from
proxy.h.)
So now, the OS error code is gone from the plug_closing API, and in
its place is a custom enumeration of closure types: normal, error, and
the special case BROKEN_PIPE which is the only OS error code we have
so far needed to handle specially. (All others just mean 'abandon the
connection and print the textual message'.)
Having already centralised the handling of OS error codes in the
previous commit, we've now got a convenient place to add any further
type codes for errors needing special handling: each of Unix
plug_closing_errno(), Windows plug_closing_system_error(), and Windows
plug_closing_winsock_error() can easily grow extra special cases if
need be, and each one will only have to live in one place.
Having a single plug_closing() function covering various kinds of
closure is reasonably convenient from the point of view of Plug
implementations, but it's annoying for callers, who all have to fill
in pointless NULL and 0 parameters in the cases where they're not
used.
Added some inline helper functions in network.h alongside the main
plug_closing() dispatch wrappers, so that each kind of connection
closure can present a separate API for the Socket side of the
interface, without complicating the vtable for the Plug side.
Also, added OS-specific extra helpers in the Unix and Windows
directories, which centralise the job of taking an OS error code (of
whatever kind) and translating it into its error message.
In passing, this removes the horrible ad-hoc made-up error codes in
proxy.h, which is OK, because nothing checked for them anyway, and
also I'm about to do an API change to plug_closing proper that removes
the need for them.
Previously, SSH authentication banners were displayed by calling the
ordinary seat_output function, and passing it a special value in the
SeatOutputType enumeration indicating an auth banner.
The awkwardness of this was already showing a little in SshProxy's
implementation of seat_output, where it had to check for that special
value and do totally different things for SEAT_OUTPUT_AUTH_BANNER and
everything else. Further work in that area is going to make it more
and more awkward if I keep the two output systems unified.
So let's split them up. Now, Seat has separate output() and banner()
methods, which each implementation can override differently if it
wants to.
All the 'end user' Seat implementations use the centralised
implementation function nullseat_banner_to_stderr(), which turns
banner text straight back into SEAT_OUTPUT_STDERR and passes it on to
seat_output. So I didn't have to tediously implement a boring version
of this function in GTK, Windows GUI, consoles, file transfer etc.
There are quite a few of them already, and I'm about to make another
one, so let's start with a bit of tidying up.
The CMake build organisation is unchanged: I haven't put the proxy
object files into a separate library, just moved the locations of the
source files. (Organising proxying as a library would be tricky
anyway, because of the various overrides for tools that want to avoid
cryptography.)
Now every struct that implements the Backend trait is completely
cleared before we start initialising any of its fields. This will mean
I can add new fields that default to 0 or NULL, without having to mess
around initialising them explicitly everywhere.
Previously, checking the host key against the persistent cache managed
by the storage.h API was done as part of the seat_verify_ssh_host_key
method, i.e. separately by each Seat.
Now that check is done by verify_ssh_host_key(), which is a new
function in ssh/common.c that centralises all the parts of host key
checking that don't need an interactive prompt. It subsumes the
previous verify_ssh_manual_host_key() that checked against the Conf,
and it does the check against the storage API that each Seat was
previously doing separately. If it can't confirm or definitively
reject the host key by itself, _then_ it calls out to the Seat, once
an interactive prompt is definitely needed.
The main point of doing this is so that when SshProxy forwards a Seat
call from the proxy SSH connection to the primary Seat, it won't print
an announcement of which connection is involved unless it's actually
going to do something interactive. (Not that we're printing those
announcements _yet_ anyway, but this is a piece of groundwork that
works towards doing so.)
But while I'm at it, I've also taken the opportunity to clean things
up a bit by renaming functions sensibly. Previously we had three very
similarly named functions verify_ssh_manual_host_key(), SeatVtable's
'verify_ssh_host_key' method, and verify_host_key() in storage.h. Now
the Seat method is called 'confirm' rather than 'verify' (since its
job is now always to print an interactive prompt, so it looks more
like the other confirm_foo methods), and the storage.h function is
called check_stored_host_key(), which goes better with store_host_key
and avoids having too many functions with similar names. And the
'manual' function is subsumed into the new centralised code, so
there's now just *one* host key function with 'verify' in the name.
Several functions are reindented in this commit. Best viewed with
whitespace changes ignored.
The current 'displayname' field is designed for presenting in the
config UI, so it starts with a capital letter even when it's not a
proper noun. If I want to name the backend in the middle of a
sentence, I'll need a version that starts with lowercase where
appropriate.
The old field is renamed displayname_tc, to avoid ambiguity.
It was totally unused. No implementation of the 'closing' method in a
Plug vtable was checking it for any reason at all, except for
ProxySocket which captured it from its client in order to pass on to
its server (which, perhaps after further iterations of ProxySocket,
would have ended up ignoring it similarly). And every caller of
plug_closing set it to 0 (aka false), except for the one in sshproxy.c
which passed true (but it would have made no difference to anyone).
The comment in network.h refers to a FIXME comment which was in
try_send() when that code was written (see winnet.c in commit
7b0e082700). That FIXME is long gone, replaced by a use of a
toplevel callback. So I think the aim must have been to avoid
re-entrancy when sk_write called try_send which encountered a socket
error and called back to plug_closing - but that's long since fixed by
other means now.
This is the same as the previous FUNKY_XTERM mode if you don't press
any modifier keys, but now Shift or Ctrl or Alt with function keys
adds an extra bitmap parameter. The bitmaps are the same as the ones
used by the new SHARROW_BITMAP arrow key mode.
As well as affecting the bitmap field in the escape sequence, it was
_also_ having its otherwise standard effect of prefixing Esc to the
whole sequence. It shouldn't do both.
This commit introduces a new config option for how to handle shifted
arrow keys.
In the default mode (SHARROW_APPLICATION), we do what we've always
done: Ctrl flips the arrow keys between sending their most usual
escape sequences (ESC [ A ... ESC [ D) and sending the 'application
cursor keys' sequences (ESC O A ... ESC O D). Whichever of those modes
is currently configured, Ctrl+arrow sends the other one.
In the new mode (SHARROW_BITMAP), application cursor key mode is
unaffected by any shift keys, but the default sequences acquire two
numeric arguments. The first argument is 1 (reflecting the fact that a
shifted arrow key still notionally moves just 1 character cell); the
second is the bitmap (1 for Shift) + (2 for Alt) + (4 for Ctrl),
offset by 1. (Except that if _none_ of those modifiers is pressed,
both numeric arguments are simply omitted.)
The new bitmap mode is what current xterm generates, and also what
Windows ConPTY seems to expect. If you start an ordinary Command
Prompt and launch into WSL, those are the sequences it will generate
for shifted arrow keys; conversely, if you run a Command Prompt within
a ConPTY, then these sequences for Ctrl+arrow will have the effect you
expect in cmd.exe command-line editing (going backward or forward a
word). For that reason, I enable this mode unconditionally when
launching Windows pterm.
While fixing the previous commit I noticed that window titles don't
actually _work_ properly if you change the terminal character set,
because the text accumulated in the OSC string buffer is sent to the
TermWin as raw bytes, with no indication of what character set it
should interpret them as. You might get lucky if you happened to
choose the right charset (in particular, UTF-8 is a common default),
but if you change the charset half way through a run, then there's
certainly no way the frontend will know to interpret two window titles
sent before and after the change in two different charsets.
So, now win_set_title() and win_set_icon_title() both include a
codepage parameter along with the byte string, and it's up to them to
translate the provided window title from that encoding to whatever the
local window system expects to receive.
On Windows, that's wide-string Unicode, so we can just use the
existing dup_mb_to_wc utility function. But in GTK, it's UTF-8, so I
had to write an extra utility function to encode a wide string as
UTF-8.
In the Windows Pageant message loop, we were alternating between
MsgWaitForMultipleObjects with timeout=INFINITE, and
run_toplevel_callbacks. But run_toplevel_callbacks doesn't loop until
the callback queue is empty: it just runs at least one of the
currently queued callbacks, so that some progress was made. So if two
or more callbacks were queued, we'd leave the rest in the queue and go
back into MsgWaitForMultipleObjects, which could hang indefinitely.
(A very silly workaround was available: move the mouse over the
Pageant systray icon! Every mouse event would terminate the wait and
let you get one more iteration of run_toplevel_callbacks.)
Now we do the same thing as in other main loops: if any further
callbacks are pending, then we still run MsgWaitForMultipleObjects,
but we do it with timeout=0 instead of timeout=INFINITE, so that we
won't go back to a _blocking_ sleep until all callbacks have been
serviced.
When a writable HANDLE is managed by the handle-io.c system, you ask
to send EOF on the handle by calling handle_write_eof. That waits
until all buffered data has been written, and then sends an EOF event
by simply closing the handle.
That is, of course, the only way to send an EOF signal on a handle at
all. And yet, it's a bug, because the handle_output system does not
take ownership of the handle you give it: the client of handle_output
retains ownership, keeps its own copy of the handle, and will expect
to close it itself.
In most cases, the extra close will harmlessly fail, and return
ERROR_INVALID_HANDLE (which the caller didn't notice anyway). But if
you're unlucky, in conditions of frantic handle opening and closing
(e.g. with a lot of separate named-pipe-style agent forwarding
connections being constantly set up and torn down), the handle value
might have been reused between the two closes, so that the second
CloseHandle closes an unrelated handle belonging to some other part of
the program.
We can't fix this by giving handle_output permanent ownership of the
handle, because it really _is_ necessary for copies of it to survive
elsewhere: in particular, for a bidirectional file such as a serial
port or named pipe, the reading side also needs a copy of the same
handle! And yet, we can't replace the handle_write_eof call in the
client with a direct CloseHandle, because that won't wait until
buffered output has been drained.
The solution is that the client still calls handle_write_eof to
register that it _wants_ an EOF sent; the handle_output system will
wait until it's ready, but then, instead of calling CloseHandle, it
will ask its _client_ to close the handle, by calling the provided
'sentdata' callback with the new 'close' flag set to true. And then
the client can not only close the handle, but do whatever else it
needs to do to record that that has been done.
Trying to debug a problem involving threads just now, it turned out
that the version of the diagnostics going to my debug.log was getting
data in a different order from the version going to the debug console.
Now I open and write to debug_fp by going directly to the Win32 API
instead of via a buffering userland stdio, and that seems to have
solved the problem.
Similarly to cmdgen's passphrase options, this replaces the password
on the command line with a filename to read the password out of, which
means it can't show up in 'ps' or the Windows task manager.
The jump host system ought really to be treating SSH authentication
banners as a distinct thing from the standard-error session output, so
that the former can be presented to the user in the same way as the
auth banner for the main session.
This change converts the 'bool is_stderr' parameter of seat_output()
into an enumerated type with three values. For the moment, stderr and
banners are treated the same, but the plan is for that to change.
Now that it's possible for a single invocation of PuTTY to connect to
multiple SSH servers (jump host followed by ultimate destination
host), it's rather unhelpful for host key prompts to just say "the
server". To check an unknown host key, users will need to know _which_
host it's purporting to be the key for.
Another possibility is to put a message in the terminal window
indicating which server we're currently in the SSH setup phase for.
That will certainly be what we have to end up doing for userpass
prompts that appear _in_ the terminal window. But that by itself is
still unhelpful for host key prompts in a separate dialog, because the
user would have to check both windows to get all the information they
need. Easier if the host key dialog itself tells you everything you
need to know to answer the question: is _this_ key the one you expect
for _that_ host?
The format _strings_ were previously centralised into the platform-
independent console.c, as const char arrays. Now the actual formatting
operation is centralised as well, by means of console.c providing a
function that takes all the necessary parameters and returns a
formatted piece of text for the console.
Mostly this is so that I can add extra parameters to the message with
some confidence: changing a format string in one file and two fprintf
statements in other files to match seems like the kind of situation
you wish you hadn't got into in the first place :-)
The system for handling seat_get_userpass_input has always been
structured differently between GUI PuTTY and CLI tools like Plink.
In the CLI tools, password input is read directly from the OS
terminal/console device by console_get_userpass_input; this means that
you need to ensure the same terminal input data _hasn't_ already been
consumed by the main event loop and sent on to the backend. This is
achieved by the backend_sendok() method, which tells the event loop
when the backend has finished issuing password prompts, and hence,
when it's safe to start passing standard input to backend_send().
But in the GUI tools, input generated by the terminal window has
always been sent straight to backend_send(), regardless of whether
backend_sendok() says it wants it. So the terminal-based
implementation of username and password prompts has to work by
consuming input data that had _already_ been passed to the backend -
hence, any backend that needs to do that must keep its input on a
bufchain, and pass that bufchain to seat_get_userpass_input.
It's awkward that these two totally different systems coexist in the
first place. And now that SSH proxying needs to present interactive
prompts of its own, it's clear which one should win: the CLI style is
the Right Thing. So this change reworks the GUI side of the mechanism
to be more similar: terminal data now goes into a queue in the Ldisc,
and is not sent on to the backend until the backend says it's ready
for it via backend_sendok(). So terminal-based userpass prompts can
now consume data directly from that queue during the connection setup
stage.
As a result, the 'bufchain *' parameter has vanished from all the
userpass_input functions (both the official implementations of the
Seat trait method, and term_get_userpass_input() to which some of
those implementations delegate). The only function that actually used
that bufchain, namely term_get_userpass_input(), now instead reads
from the ldisc's input queue via a couple of new Ldisc functions.
(Not _trivial_ functions, since input buffered by Ldisc can be a
mixture of raw bytes and session specials like SS_EOL! The input queue
inside Ldisc is a bufchain containing a fiddly binary encoding that
can represent an arbitrary interleaving of those things.)
This greatly simplifies the calls to seat_get_userpass_input in
backends, which now don't have to mess about with passing their own
user_input bufchain around, or toggling their want_user_input flag
back and forth to request data to put on to that bufchain.
But the flip side is that now there has to be some _other_ method for
notifying the terminal when there's more input to be consumed during
an interactive prompt, and for notifying the backend when prompt input
has finished so that it can proceed to the next stage of the protocol.
This is done by a pair of extra callbacks: when more data is put on to
Ldisc's input queue, it triggers a call to term_get_userpass_input,
and when term_get_userpass_input finishes, it calls a callback
function provided in the prompts_t.
Therefore, any use of a prompts_t which *might* be asynchronous must
fill in the latter callback when setting up the prompts_t. In SSH, the
callback is centralised into a common PPL helper function, which
reinvokes the same PPL's process_queue coroutine; in rlogin we have to
set it up ourselves.
I'm sorry for this large and sprawling patch: I tried fairly hard to
break it up into individually comprehensible sub-patches, but I just
couldn't tease out any part of it that would stand sensibly alone.
In the case where these socket types are constructed because of a
local proxy command, we do actually have a SockAddr representing the
logical host we were trying to make a connection to. So we might as
well store it in the socket implementation, and then we can include it
in the PLUGLOG_CONNECT_SUCCESS call to make the log message more
informative.
Now the non-SSH backends critically depend on it, it's important not
to forget to send it, for any socket type that's going to be used for
any of those backends. But ProxySocket, and the Unix and Windows
'socket' types wrapping pipes to local subprocesses, were not doing
so.
Some of these socket types don't have a SockAddr available to
represent the destination host. (Sometimes the concept isn't even
meaningful). Therefore, I've also expanded the semantics of
PLUGLOG_CONNECT_SUCCESS so that the addr parameter is allowed to be
NULL, and invented a noncommittal fallback version of the log message
in that situation.
I just happened to notice that just below my huge comment explaining
the two command-line splitting policies, there's a smaller one that
refers to it as '(see large comment below)'. It's not below - it's
above!
That was because the older parts of that comment had previously been
inside split_into_argv(), until I moved the explanation further up the
file to the top level. Another consequence of that was that the older
section of the comment was wrapped to a strangely narrow line width,
because it had previously been indented further right.
Folded the two comments together, and rewrapped the narrow paragraphs.
This is called by the backend to notify the Seat that the connection
has progressed to the point where the main session channel (i.e. the
thing that would typically correspond to the client's stdin/stdout)
has been successfully set up.
The only Seat that implements this method nontrivially is the one in
SshProxy, which uses it as an indication that the proxied connection
to the remote host has succeeded, and sends the
PLUGLOG_CONNECT_SUCCESS notification to its own Plug.
Hence, the only backends that need to implement it at the moment are
the two SSH-shaped backends (SSH proper and bare-connection / psusan).
For other backends, it's not always obvious what 'main session
channel' would even mean, or whether it means anything very useful; so
I've also introduced a backend flag indicating whether the backend is
expecting to call that method at all, so as not to have to spend
pointless effort on defining an arbitrary meaning for it in other
contexts.
So a lot of this patch is just introducing the new method and putting
its trivial do-nothing implementation into all the existing Seat
methods. The interesting parts happen in ssh/mainchan.c (which
actually calls it), and sshproxy.c (which does something useful in
response).
On a similar theme of separating the query operation from the
attempted change, backend_send() now no longer has the side effect of
returning the current size of the send buffer. Instead, you have to
call backend_sendbuffer() every time you want to know that.
This complicates the API in one sense (more separate functions), but
in another sense, simplifies it (each function does something
simpler). When I start putting one Seat in front of another during SSH
proxying, the latter will be more important - in particular, it means
you can find out _whether_ a seat can support changing trust status
without having to actually attempt a destructive modification.
Ian Jackson recently tried to use the recipe in the psusan manpage for
talking to UML, and found that the connection was not successfully set
up, because at some point during startup, UML read the SSH greeting
(ok, the bare-ssh-connection greeting) from its input fd and threw it
away. So by the time psusan was run by the guest init process, the
greeting wasn't there to be read.
Ian's report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991958
I was also able to reproduce this locally, which makes me wonder why I
_didn't_ notice it when I originally wrote that part of the psusan man
page. It worked for me before, honest! But now it doesn't.
Anyway. The ssh verstring module already has a mode switch to decide
whether we ought to send our greeting before or after waiting for the
other side's greeting (because that decision varies between client and
server, and between SSH-1 and SSH-2). So it's easy to implement an
override that forces it to 'wait for the server greeting first'.
I've added this as yet another bug workaround flag. But unlike all the
others, it can't be autodetected from the server's version string,
because, of course, we have to act on it _before_ seeing the server's
greeting and version string! So it's a manual-only flag.
However, I've mentioned it in the UML section of the psusan man page,
since that's the place where I _know_ people are likely to need to use
this flag.
A user reports that if you have MIT KfW loaded, and your PuTTY session
terminates without the PuTTY process exiting, and you select 'Restart
Session' from the menu, then a crash occurs inside the Kerberos
library itself. Scuttlebutt on the Internet suggested this might be to
do with unloading and then reloading the DLL within the process
lifetime, which indeed we were doing.
Now we avoid doing that for the KfW library in particular, by keeping
a tree234 of module handles marked 'never unload this'.
This is a workaround at best, but it seems to stop the problem
happening in my own tests.
If you don't, they are permanently leaked. A user points out that this
is particularly bad in Pageant, with the new named-pipe-based IPC,
since it will spawn an input and output I/O thread per named pipe
connection, leading to two handles being leaked every time.
In commit f69cf86a61, I added a call to term_update that happens
when we receive WM_VSCROLL / SB_THUMBPOSITION in the subsidiary
message loop that Windows creates during the handling of WM_SYSCOMMAND
/ SC_VSCROLL. The effect was that interactive dragging of the
scrollbar now redraws the window at every step, whereas previously it
didn't.
A user just pointed out that if you click on one of the scrollbar end
buttons and hold it down until it begins emulating key repeat, the
same bug occurs: the window isn't redrawn until you release the mouse
button and the subsidiary message loop ends.
This commit extends the previous fix to cover all of the WM_VSCROLL
subtypes, instead of just SB_THUMBPOSITION and SB_THUMBTRACK. Redraws
while holding down those scrollbar buttons now work again.
This is used to notify the Seat that some data has been cleared from
the backend's outgoing data buffer. In other words, it notifies the
Seat that it might be worth calling backend_sendbuffer() again.
We've never needed this before, because until now, Seats have always
been the 'main program' part of the application, meaning they were
also in control of the event loop. So they've been able to call
backend_sendbuffer() proactively, every time they go round the event
loop, instead of having to wait for a callback.
But now, the SSH proxy is the first example of a Seat without
privileged access to the event loop, so it has no way to find out that
the backend's sendbuffer has got smaller. And without that, it can't
pass that notification on to plug_sent, to unblock in turn whatever
the proxied connection might have been waiting to send.
In fact, before this commit, sshproxy.c never called plug_sent at all.
As a result, large data uploads over an SSH jump host would hang
forever as soon as the outgoing buffer filled up for the first time:
the main backend (to which sshproxy.c was acting as a Socket) would
carefully stop filling up the buffer, and then never receive the call
to plug_sent that would cause it to start again.
The new callback is ignored everywhere except in sshproxy.c. It might
be a good idea to remove backend_sendbuffer() entirely and convert all
previous uses of it into non-empty implementations of this callback,
so that we've only got one system; but for the moment, I haven't done
that.
Suggested by Manfred Kaiser, who also wrote most of this patch
(although outlying parts, like documentation and SSH-1 support, are by
me).
This is a second line of defence against the kind of spoofing attacks
in which a malicious or compromised SSH server rushes the client
through the userauth phase of SSH without actually requiring any auth
inputs (passwords or signatures or whatever), and then at the start of
the connection phase it presents something like a spoof prompt,
intended to be taken for part of userauth by the user but in fact with
some more sinister purpose.
Our existing line of defence against this is the trust sigil system,
and as far as I know, that's still working. This option allows a bit of
extra defence in depth: if you don't expect your SSH server to
trivially accept authentication in the first place, then enabling this
option will cause PuTTY to disconnect if it unexpectedly does so,
without the user having to spot the presence or absence of a fiddly
little sigil anywhere.
Several types of authentication count as 'trivial'. The obvious one is
the SSH-2 "none" method, which clients always try first so that the
failure message will tell them what else they can try, and which a
server can instead accept in order to authenticate you unconditionally.
But there are two other ways to do it that we know of: one is to run
keyboard-interactive authentication and send an empty INFO_REQUEST
packet containing no actual prompts for the user, and another even
weirder one is to send USERAUTH_SUCCESS in response to the user's
preliminary *offer* of a public key (instead of sending the usual PK_OK
to request an actual signature from the key).
This new option detects all of those, by clearing the 'is_trivial_auth'
flag only when we send some kind of substantive authentication response
(be it a password, a k-i prompt response, a signature, or a GSSAPI
token). So even if there's a further path through the userauth maze we
haven't spotted, that somehow avoids sending anything substantive, this
strategy should still pick it up.
Enthusiastic copy-paste: in commit 17c57e1078 I added the same
precautionary call to ensure_handlewaits_tree_exists() everywhere,
even in functions that didn't actually need to use the tree.
Before commit 6e69223dc2, Pageant would stop working after a
certain number of PuTTYs were active at the same time. (At most about
60, but maybe fewer - see below.)
This was because of two separate bugs. The easy one, fixed in
6e69223dc2 itself, was that PuTTY left each named-pipe connection
to Pageant open for the rest of its lifetime. So the real problem was
that Pageant had too many active connections at once. (And since a
given PuTTY might make multiple connections during userauth - one to
list keys, and maybe another to actually make a signature - that was
why the number of _PuTTYs_ might vary.)
It was clearly a bug that PuTTY was leaving connections to Pageant
needlessly open. But it was _also_ a bug that Pageant couldn't handle
more than about 60 at once. In this commit, I fix that secondary bug.
The cause of the bug is that the WaitForMultipleObjects function
family in the Windows API have a limit on the number of HANDLE objects
they can select between. The limit is MAXIMUM_WAIT_OBJECTS, defined to
be 64. And handle-io.c was using a separate event object for each I/O
subthread to communicate back to the main thread, so as soon as all
those event objects (plus a handful of other HANDLEs) added up to more
than 64, we'd start passing an overlarge handle array to
WaitForMultipleObjects, and it would start not doing what we wanted.
To fix this, I've reorganised handle-io.c so that all its subthreads
share just _one_ event object to signal readiness back to the main
thread. There's now a linked list of 'struct handle' objects that are
ready to be processed, protected by a CRITICAL_SECTION. Each subthread
signals readiness by adding itself to the linked list, and setting the
event object to indicate that the list is now non-empty. When the main
thread receives the event, it iterates over the whole list processing
all the ready handles.
(Each 'struct handle' still has a separate event object for the main
thread to use to communicate _to_ the subthread. That's OK, because no
thread is ever waiting on all those events at once: each subthread
only waits on its own.)
The previous HT_FOREIGN system didn't really fit into this framework.
So I've moved it out into its own system. There's now a handle-wait.c
which deals with the relatively simple job of managing a list of
handles that need to be waited for, each with a callback function;
that's what communicates a list of HANDLEs to event loops, and
receives the notification when the event loop notices that one of them
has done something. And handle-io.c is now just one client of
handle-wait.c, providing a single HANDLE to the event loop, and
dealing internally with everything that needs to be done when that
handle fires.
The new top-level handle-wait.c system *still* can't deal with more
than MAXIMUM_WAIT_OBJECTS. At the moment, I'm reasonably convinced it
doesn't need to: the only kind of HANDLE that any of our tools could
previously have needed to wait on more than one of was the one in
handle-io.c that I've just removed. But I've left some assertions and
a TODO comment in there just in case we need to change that in future.
This introduces a new entry to the radio-button list of proxy types,
in which the 'Proxy host' box is taken to be the name of an SSH server
or saved session. We make an entire subsidiary SSH connection to that
host, open a direct-tcpip channel through it, and use that as the
connection over which to run the primary network connection.
The result is basically the same as if you used a local proxy
subprocess, with a command along the lines of 'plink -batch %proxyhost
-nc %host:%port'. But it's all done in-process, by having an SshProxy
object implement the Socket trait to talk to the main connection, and
implement Seat and LogPolicy to talk to its subsidiary SSH backend.
All the refactoring in recent years has got us to the point where we
can do that without both SSH instances fighting over some global
variable or unique piece of infrastructure.
From an end user perspective, doing SSH proxying in-process like this
is a little bit easier to set up: it doesn't require you to bake the
full pathname of Plink into your saved session (or to have it on the
system PATH), and the SshProxy setup function automatically turns off
SSH features that would be inappropriate in this context, such as
additional port forwardings, or acting as a connection-sharing
upstream. And it has minor advantages like getting the Event Log for
the subsidiary connection interleaved in the main Event Log, as if it
were stderr output from a proxy subcommand, without having to
deliberately configure the subsidiary Plink into verbose mode.
However, this is an initial implementation only, and it doesn't yet
support the _big_ payoff for doing this in-process, which (I hope)
will be the ability to handle interactive prompts from the subsidiary
SSH connection via the same user interface as the primary one. For
example, you might need to answer two password prompts in succession,
or (the first time you use a session configured this way) confirm the
host keys for both proxy and destination SSH servers. Comments in the
new source file discuss some design thoughts on filling in this gap.
For the moment, if the proxy SSH connection encounters any situation
where an interactive prompt is needed, it will make the safe
assumption, the same way 'plink -batch' would do. So it's at least no
_worse_ than the existing technique of putting the proxy connection in
a subprocess.
This notifies the Seat that the entire backend session has finished
and closed its network connection - or rather, that it _might_ have
done, and that the frontend should check backend_connected() if it
wasn't planning to do so already.
The existing Seat implementations haven't needed this: the GUI ones
don't actually need to do anything specific when the network
connection goes away, and the CLI ones deal with it by being in charge
of their own event loop so that they can easily check
backend_connected() at every possible opportunity in any case. But I'm
about to introduce a new Seat implementation that does need to know
this, and doesn't have any other way to get notified of it.
Since ca9cd983e1, changing colour config mid-session had no effect
(until the palette was reset for some other reason). Now it does take
effect immediately (provided that the palette has not been overridden by
escape sequence -- this is new with ca9cd983e1).
This changes the semantics of palette_reset(): the only important
parameter when doing that is whether we keep escape sequence overrides
-- there's no harm in re-fetching config and platform colours whether or
not they've changed -- so that's what the parameter becomes (with a
sense that doesn't require changing the call sites). The other part of
this change is actually remembering to trigger this when the
configuration is changed.
I was cleaning up the 'struct handle', but not the underlying HANDLE.
As a result, any PuTTY process that makes a request to Pageant keeps
the named pipe connection open until the end of the process's
lifetime.
I was checking a HANDLE against INVALID_HANDLE_VALUE to decide whether
it should be closed. But ten lines further up, I was setting it
manually to NULL to suppress the close. Oops.
Less than 12 hours after 0.75 went out of the door, a user pointed out
that enabling the 'Use system colours' config option causes an
immediate NULL-dereference crash. The reason is because a chain of
calls from term_init() ends up calling back to the Windows
implementation of the palette_get_overrides() method, which responds
by trying to call functions on the static variable 'term' in window.c,
which won't be initialised until term_init() has returned.
Simple fix: palette_get_overrides() is now given a pointer to the
Terminal that it should be updating, because it can't find it out any
other way.
This fulfills our long-standing Mayhem-difficulty wishlist item
'win-command-prompt': this is a Windows pterm in the sense that when
you run it you get a local cmd.exe running inside a PuTTY-style window.
Advantages of this: you get the same free choice of fonts as PuTTY has
(no restriction to a strange subset of the system's available fonts);
you get the same copy-paste gestures as PuTTY (no mental gear-shifting
when you have command prompts and SSH sessions open on the same
desktop); you get scrollback with the PuTTY semantics (scrolling to
the bottom gets you to where the action is, as opposed to the way you
could accidentally find yourself 500 lines past the end of the action
in a real console).
'win-command-prompt' was at Mayhem difficulty ('Probably impossible')
basically on the grounds that with Windows's old APIs for accessing
the contents of consoles, there was no way I could find to get this to
work sensibly. What was needed to make it feasible was a major piece
of re-engineering work inside Windows itself.
But, of course, that's exactly what happened! In 2019, the new ConPTY
API arrived, which lets you create an object that behaves like a
Windows console at one end, and round the back, emits a stream of
VT-style escape sequences as the screen contents evolve, and accepts a
VT-style input stream in return which it will parse function and arrow
keys out of in the usual way.
So now it's actually _easy_ to get this to basically work. The new
backend, in conpty.c, has to do a handful of magic Windows API calls
to set up the pseudo-console and its feeder pipes and start a
subprocess running in it, a further magic call every time the PuTTY
window is resized, and detect the end of the session by watching for
the subprocess terminating. But apart from that, all it has to do is
pass data back and forth unmodified between those pipes and the
backend's associated Seat!
That said, this is new and experimental, and there will undoubtedly be
issues. One that I already know about is that you can't copy and paste
a word that has wrapped between lines without getting an annoying
newline in the middle of it. As far as I can see this is a fundamental
limitation: the ConPTY system sends the _same_ escape sequence stream
for a line that wrapped as it would send for a line that had a logical
\n at what would have been the wrap point. Probably the best we can do
to mitigate this is to adopt a different heuristic for newline elision
that's right more often than it's wrong.
For the moment, that experimental-ness is indicated by the fact that
Buildscr will build, sign and deliver a copy of pterm.exe for each
flavour of Windows, but won't include it in the .zip file or in the
installer. (In fact, that puts it in exactly the same ad-hoc category
as PuTTYtel, although for completely different reasons.)
icons/Makefile will now rebuild them, but also, as per this code
base's usual policy with Windows icons, they're committed directly in
the windows subdir.
Now they're done by putty.rc and puttytel.rc, before including
putty-common.rc2. So another user of putty-common.rc2 can disagree on
what icons to use.
This prepares the ground for a second essentially similarly-shaped
program reusing most of window.c but handling its command line and
startup differently. A couple of large parts of WinMain() to do with
backend selection and command-line handling are now subfunctions in a
separate file putty.c.
Also, our custom AppUserModelId is defined in that file, so that it
can vary with the client application.
The code to find out the location of the c:\windows\system32 directory
was already present, in load_system32_dll(). Now it's moved out into a
function of its own, so it can be called in other contexts.
Jacob spots that on Windows, current PuTTY is not compatible with
0.74, if one of them acts as a connection sharing upstream and the
other as a downstream. That's because commit 1344d4d1cd
accidentally changed the hash preimage in capi_obfuscate_string() so
that it no longer had an SSH-like string length field at the front. So
the two versions of PuTTY will expect the named pipe to have a
different pathname, and so they won't be able to find each other.
Interoperation between PuTTY versions is not the most important use
case of connection sharing - surely the typical user will invoke it by
activating the same session twice, or by using Duplicate Session. But
it was never intended to deliberately _not_ work, so let's fix it
before 0.75 goes out, so that at least the incompatible behaviour will
only ever have appeared in development snapshots.
Now that the main source file of Plink in each platform directory has
the same name, we can put centralise the main definition of the
program in the main CMakeLists.txt, and in the platform directory,
just add the few extra modules needed to clear up platform-specific
details.
The same goes for psocks. And PSCP and PSFTP could have been moved to
the top level already - I just hadn't done it in the initial setup.
This gets rid of all those annoying 'win', 'ux' and 'gtk' prefixes
which made filenames annoying to type and to tab-complete. Also, as
with my other recent renaming sprees, I've taken the opportunity to
expand and clarify some of the names so that they're not such cryptic
abbreviations.
GetFileType() takes a HANDLE, not a pathname. So passing it the
pathname of the agent named pipe would never have worked at all.
I hadn't noticed, because the only call to that function logical-ORs
its return value with that of wm_copydata_agent_exists(), and the
latter _does_ work.
So if you're running true Pageant, which presents both IPC interfaces,
then there's no problem. But if a Pageant-emulating system wanted to
present only the named-pipe version, then we wouldn't have detected
it. Now we should do.
I just discovered that they weren't happening, and the reason why is
thoroughly annoying. Details are in the long comment I've added to the
WM_VSCROLL handler in WndProc, but the short version is that when you
interactively drag the terminal window's scrollbar, a subsidiary
message loop is launched by DefWndProc, causing all our timer events
to go missing until the user lets go of the scrollbar again. So we
have to manually update the terminal window on scroll events, because
the normal system is out of action.
I assume this changed behaviour round about the big rework of terminal
updating in February. Good job I spotted it just _before_ 0.75, and
not just after!
As we do in other similar situations. (The resulting passphrase dialog
is annoyingly unsymmetric, but probably less annoying than a Help
button which does nothing, and the situation shouldn't arise with our
standard builds.)
Suggested by Jacob: if this dialog box is going to pop up
_unexpectedly_ - perhaps when people have momentarily forgotten
they're even running Pageant, or at least forgotten they added a key
encrypted,, or maybe haven't found out yet that their IT installed it
- then it could usefully come with a help button that pops up further
explanation of what the dialog box means, and from which you can find
your way to the rest of the help.
I continue to believe that there's nothing I can (or should) do about
the fact that on Windows, Pageant's async passphrase prompt dialog box
doesn't automatically get the input focus when it pops up in response
to a request received via invisible IPC.
However, one thing I can do is add some text to the box that _warns_
people about it, so that at least there's some kind of suggestion that
you should get into the habit of clicking on the passphrase prompt
before typing your passphrase into it.
(I would be less concerned about all of this if it weren't for the
fact that focus is surprisingly non-obvious on Windows 10, at least on
the machine I have here. When the window doesn't have focus, the title
bar has the same background colour, and only the text is fainter. And
perhaps more confusingly, the cursor in the edit box still flashes!
That fooled _me_ a few times to begin with.)
This code base has always been a bit confused about which spelling it
likes to use to refer to that signature algorithm. The SSH protocol id
is "ssh-dss". But everyone I know refers to it as the Digital
Signature _Algorithm_, not the Digital Signature _Standard_.
When I moved everything down into the crypto subdir, I took the
opportunity to rename sshdss.c to dsa.c. Now I'm doing the rest of the
job: all internal identifiers and code comments refer to DSA, and the
spelling "dss" only survives in externally visible identifiers that
have to remain constant.
(Such identifiers include the SSH protocol id, and also the string id
used to identify the key type in PuTTY's own host key cache. We can't
change the latter without causing everyone a backwards-compatibility
headache, and if we _did_ ever decide to do that, we'd surely want to
do a much more thorough job of making the cache format more sensible!)