Having constructed a conio object with its own 'utf8' flag, we should
be checking that flag at time of use rather than the global
conio_use_utf8 (which was already taken into account at setup time).
Otherwise we miss the whole point, which is that without the override
flag turning off UTF-8, _some_ uses of the system should use the
default code page and not UTF-8.
In commit f9e572595b I claimed that I'd removed very nearly all
the global and static variables from windows/window.c. It turns out
that this was wildly overoptimistic - I missed quite a few of them!
I'm not quite sure how I managed that; my best guess is that I used an
underpowered 'nm' command that failed to find some classes of
variable.
Some of the remaining function-scope statics were removed completely
by commit afb3dab1e9 just now. In this commit, I've swept up some
more and turn them into fields of WinGuiSeat, where they should have
been moved last September.
The (hopefully complete this time) list of remaining variables,
generated by running this rune in the Windows build directory:
nm windows/CMakeFiles/putty.dir/window.c.obj |
grep -E '^([^ ]+)? *[bBcCdDgGsS] [^\.]'
consists of the following variables which are legitimately global
across the whole process and not related to a particular window:
- 'hinst' and 'hprev', instance handles for Windows loadable modules
- 'classname' in the terminal_window_class_a() and
terminal_window_class_w() functions, which allocate a window class
reusably
- some pointers to Windows API functions retrieved via the
DECL_WINDOWS_FUNCTION / GET_WINDOWS_FUNCTION system, such as
p_AdjustWindowRectExForDpi and p_FlashWindowEx
- some pointers to Windows API functions set up by assigning them at
startup to the right one of the ANSI or Unicode version depending on
the Windows version, e.g. sw_DefWindowProc and sw_DispatchMessage
- 'unicode_window', a boolean flag set at the same time as those
sw_Foo function pointers
- 'sesslist', storing the last-retrieved version of the saved
sessions menu
- 'cursor_visible' in show_mouseptr() and 'forced_visible' in
update_mouse_pointer(), each of which tracks the cumulative number
of times that function has shown or hidden the mouse pointer, so as
to manage its effect on the global state updated by ShowCursor
- 'trust_icon', loaded from the executable's resources
- 'wgslisthead', the list of all active WinGuiSeats
- 'wm_mousewheel', the window-message id we use for mouse wheel
events
and the following which are nothing to do with our code:
- '_OptionsStorage' in __local_stdio_printf_options() and
__local_stdio_scanf_options(), which I'd never noticed before, but
apparently are internal to a standard library header.
lpDx_maybe was a pointer defined to point at either lpDx itself or
NULL, depending on whether the code decided it needed to pass the lpDx
array of per-character pixel offsets to various functions during
drawing (based in turn on whether the font was variable-pitch).
lpDx is reallocated as necessary, which means lpDx_maybe must be kept
up to date. This was achieved by resetting it to lpDx if it was
already non-NULL.
But lpDx starts out as NULL before the first reallocation, so that
this can't work - it'll be initialised to NULL even if we _did_ want
to use it, and then at the first realloc, it won't be updated!
Before the previous commit turned lpDx from a static into an automatic
variable, this would have been a rare bug affecting only the first
call to the function. Now it will happen all the time, which is
better, because we can notice and fix it.
Replaced lpDx_maybe completely with a boolean flag indicating whether
we should pass lpDx to drawing functions.
In windows/window.c, a few variables inside functions were declared as
static, with no particular purpose that I can see: they don't seem to
have any reason to persist between calls to the function. So it makes
more sense to have them be ordinary stack-allocated automatic
variables.
Static variables removed by this commit:
- 'RECT ss' in reset_window.
- 'WORD keys[3]' and 'BYTE keysb[3]' in TranslateKey.
- several (buffer, length) pairs in do_text_internal.
- keys_unicode[] in TranslateKey.
All of these variables were originally introduced in patches credited
to Robert de Bath, which means I can't even try to reconstruct my
original thought processes, because they weren't _my_ thoughts anyway.
The arrays in do_text_internal are the easiest to understand: they're
reallocated larger as necessary, and making them static means the
allocation from a previous call can be reused, saving a malloc (though
I don't think that's a good enough reason to bother, these days).
The fixed-size static arrays and RECT are harder to explain. I suspect
they might originally have been that way because of 1990s attitudes to
performance: in x86-32 it's probably marginally faster to give your
variables constant addresses than sp-relative ones, and in the 1990s
computers were much slower, so there's an argument for making things
static if you have no _need_ to make them automatic. These days, the
difference is negligible, and persistent state is much more widely
recognised as a risk!
But keys_unicode[] is by far the strangest, because there was code
that clearly _did_ expect it to persist between calls, namely three
assignments to keys_unicode[0] near the end of the function after it's
finished being used for any other purpose, and a conditioned-out set
of debug() calls at the top of the function that print its contents
before anything has yet written to it.
But as far as I can see, the persistent data in the array is otherwise
completely unused. In any call to the function, if keys_unicode is
used at all, then it's either written directly by a call to ToAsciiEx,
or else (for pre-NT platforms) converted from ToAsciiEx's output via
MultiByteToWideChar. In both cases, the integer variable 'r' indicates
how many array elements were written, and subsequent accesses only
ever read those elements. So the assignments to keys_unicode[0] at the
end of the previous call will be overwritten before anything at all
can depend on them - with the exception of those debug statements.
I don't really understand what was going on here. It's tempting to
guess that those final assignments must have once done something
useful, and the code that used them was later removed. But the source
control history doesn't bear that out: a static array of three
elements (under its original name 'keys') was introduced in commit
0d5d39064a, and then commits 953b7775b3 and 26f1085038
added the other two assignments. And as far as I can see, even as of
the original commit 0d5d39064a, the code already had the property
that there was a final assignment to keys[0] which would inevitably be
overwritten in the next call before it could affect anything.
So I'm totally confused about what those assignments were _ever_
useful for. But an email thread from the time suggests that some of
those patches were being rebased repeatedly past other work (or
rather, the much less reliable CVS analogue of rebasing), so my best
guess is that that's where the confusion crept in - perhaps in RDB's
original version of the code they did do something useful.
Regardless of that, I'm pretty convinced that persistent array can't
be doing anything useful _now_. So I'm taking it out. But if anyone
reports a bug resulting from this change, then I'll eat my words - and
with any luck the details of the bug report will give us a clue what's
going on, and then we can put back some equivalent functionality with
much better comments!
When you send a "publickey" USERAUTH_REQUEST containing a certified
RSA key, and you want to use a SHA-2 based RSA algorithm, modern
OpenSSH expects you to send the algorithm string as
rsa-sha2-NNN-cert-v01@openssh.com. But 7.7 and earlier didn't
recognise those names, and expected the algorithm string in the
userauth request packet to be ssh-rsa-cert-v01@... and would then
follow it with an rsa-sha2-NNN signature.
OpenSSH itself has a bug workaround for its own older versions. Follow
suit.
In commit d07d7d66f6 I rewrote the code that constructs RTF paste
data so that it uses a strbuf, in place of the previous ad-hoc code
that counted up the lengths of pieces of RTF in advance in order to
realloc the buffer.
But apparently I left in an entire loop whose job was to count up one
of those lengths, failing to notice that it's now completely pointless
because its output value is never needed!
Happily a clang upgrade has just improved the 'variable set but not
used' warning to the point where it can spot that. I expect previously
the variable still counted as 'used' because each increment of it used
the previous value.
(cherry picked from commit 6a27ae772c)
When the standalone version of a binary, with its help file included
as a resource, extracts that resource to write it to a disk, it could
have accidentally skipped a byte in the middle if the WriteFile call
in this loop had not managed to write the whole file in one go.
(cherry picked from commit 775d969ca8)
This reverts commit 0615767224
("Windows installer: remove explicit InstallScope setting"), albeit
with different comments.
The original change worked around a Windows security vulnerability
(CVE-2023-21800), but also resulted in a rather broken installer.
(cherry picked from commit cedeb75d59)
I checked exhaustively today and found that the only characters (even
in Unicode) that Windows's default argv splitter will recognise as
word separators are the space and tab characters. So I think it's a
mistake to use <ctype.h> functions to identify word separators; we
should use that fixed character pair, and then we know we're getting
the right ones only.
(cherry picked from commit 9adfa79767)
I thought I'd found all of these before, but perhaps a few managed to
slip in since I last looked. The character argument to the <ctype.h>
functions must have the value of an unsigned char or EOF; passing an
ordinary char (unless you know char is unsigned on every platform the
code will ever go near) risks mistaking '\xFF' for EOF, and causing
outright undefined behaviour on byte values in the range 80-FE. Never
do it.
(cherry picked from commit a76109c586)
Horizontal scroll events aren't generated by the traditional mouse
wheel, but they can be generated by trackpad gestures, though this
isn't always configured on.
The cross-platform and Windows parts of this patch is due to
Christopher Plewright; I added the GTK support.
(cherry picked from commit 819efc3c21)
From https://invisible-island.net/xterm/ctlseqs/ctlseqs.html#h3-Any-event-tracking:
Any-event mode is the same as button-event mode, except that all motion
events are reported, even if no mouse button is down. It is enabled by
specifying 1003 to DECSET.
Normally the front ends only report mouse events when buttons are
pressed, so we introduce a MA_MOVE event with MBT_NOTHING set to
indicate such a mouse movement.
(cherry picked from commit 3cfbd3df0f)
A server attempt to resize the window (for instance via DECCOLM) when
"When window is resized" was set to "Forbid resizing completely" would
cause all terminal output to be suspended, due to the resize attempt
never being acknowledged.
(There are other code paths like this, which I've fixed for
completeness, but I don't think they have any effect: the terminal
filters out resize attempts to the current size before this point, and
even if a server can get such a request through the SUPDUP protocol, the
test for that is wrong and will never fire -- this needs fixing
separately.)
(cherry picked from commit ebceb8bc94)
In commit d07d7d66f6 I rewrote the code that constructs RTF paste
data so that it uses a strbuf, in place of the previous ad-hoc code
that counted up the lengths of pieces of RTF in advance in order to
realloc the buffer.
But apparently I left in an entire loop whose job was to count up one
of those lengths, failing to notice that it's now completely pointless
because its output value is never needed!
Happily a clang upgrade has just improved the 'variable set but not
used' warning to the point where it can spot that. I expect previously
the variable still counted as 'used' because each increment of it used
the previous value.
When the standalone version of a binary, with its help file included
as a resource, extracts that resource to write it to a disk, it could
have accidentally skipped a byte in the middle if the WriteFile call
in this loop had not managed to write the whole file in one go.
This reverts commit 0615767224
("Windows installer: remove explicit InstallScope setting"), albeit
with different comments.
The original change worked around a Windows security vulnerability
(CVE-2023-21800), but also resulted in a rather broken installer.
While doing that parametrisation I noticed three strlen calls that
could obviously be replaced with one - and then I also noticed that
there were missing parens in an expression that should have
been (n+1)/2, making it n + 1/2, i.e. just n, due to integer
arithmetic.
Happily that bug meant we were _over_-allocating rather than under,
but even so, how embarrassing. Fixed.
Created in the simplest way, by parametrising the existing code using
macros.
Nothing actually uses this yet. I hope to gradually switch
command-line parsing from 'ANSI' to Unicode strings, but this isn't
the only preparation needed, so it might yet be a while.
This will contain test code and test subprograms that don't belong in
the top-level test directory due to not being cross-platform.
Initial contents are test_screenshot.c, which was already its own
source file in the windows subdir, and test_split_into_argv.c, which
I've sawn off the bottom of windows/utils/split_into_argv.c and moved
into its own source file.
I checked exhaustively today and found that the only characters (even
in Unicode) that Windows's default argv splitter will recognise as
word separators are the space and tab characters. So I think it's a
mistake to use <ctype.h> functions to identify word separators; we
should use that fixed character pair, and then we know we're getting
the right ones only.
I'm not conveniently set up to actually run it during my main build,
since that happens entirely on Linux and cross-builds the Windows
binaries. But it should at least be possible to build and run it by
hand.
I thought I'd found all of these before, but perhaps a few managed to
slip in since I last looked. The character argument to the <ctype.h>
functions must have the value of an unsigned char or EOF; passing an
ordinary char (unless you know char is unsigned on every platform the
code will ever go near) risks mistaking '\xFF' for EOF, and causing
outright undefined behaviour on byte values in the range 80-FE. Never
do it.
This has all the basic necessities to become a test of the terminal's
behaviour, in terms of how its data structures evolve as output is
sent to it, and perhaps also (by filling in the stub TermWin more
usefully) testing what it draws during updates and what it sends in
response to query sequences.
For the moment, all I've done is to set up the framework, and add one
demo test of printing some ordinary text and observing that it appears
in the data structures and the cursor has moved.
I expect that writing a full test of terminal.c will be a very big
job. But perhaps I or someone else will find time to prod it gradually
in the background of other work. In particular, when I'm _modifying_
any part of the terminal code, it would be good to add some tests for
the part I'm changing, before making the change, and check they still
work afterwards.
This continues the programme of UTF-8 support in authentication, begun
in commit f4519b6533 which arranged for console userpass prompts
to function in UTF-8 when the prompts_t asked them to. Since the new
line editing setup works properly when it _is_ in UTF-8 mode, I can
now also arrange that it puts the terminal into UTF-8 mode in the
right circumstances.
I've extended the applicability of the '-legacy-charset-handling' flag
introduced by the commit mentioned above, so that now it's not
specific to the console front end. Now you can give it to GUI PuTTY as
well, which restores the previous (wrong) behaviour of accepting
username and password prompt input in the main session's configured
character set. So if this change breaks someone's workflow, they
should be able to have it back.
I'm about to rewrite it completely, so the first thing I need to do is
to write tests for as much of the functionality as possible, so that I
can check the new implementation behaves in the same ways.
Similarly to the one I just added for FontSpec: in a cross-platform
main source file, you don't really want to mess about with
per-platform ifdefs just to initialise a 'struct unicode_data' from a
Conf. But until now, you had to, because init_ucs had a different
prototype on Windows and Unix.
I plan to use this in future test programs. But an immediate positive
effect is that it removes the only platform-dependent call from
fuzzterm.c. So now that could be built on Windows too, given only an
appropriate cmake stanza. (Not that I have much idea if it's useful to
fuzz the terminal separately on multiple platforms, but it's nice to
know that it's possible if anyone does need to.)
Constructing a FontSpec in platform-independent code is awkward,
because you can't call fontspec_new() outside the platform subdirs
(since its prototype varies per platform). But sometimes you just need
_some_ valid FontSpec, e.g. to put in a Conf that will be used in some
place where you don't actually care about font settings, such as a
purely CLI program.
Both Unix and Windows _have_ an idiom for this, but they're different,
because their FontSpec constructors have different prototypes. The
existing CLI tools have always had per-platform main source files, so
they just use the locally appropriate method of constructing a boring
don't-care FontSpec.
But if you want a _platform-independent_ main source file, such as you
might find in a test program, then that's rather awkward. Better to
have a platform-independent API for making a default FontSpec.
Now you can optionally get back an enum value indicating whether the
character was successfully decoded, or whether U+FFFD was substituted
due to some kind of problem, and if the latter, what problem.
For a start, this allows distinguishing 'real' U+FFFD (encoded
legitimately in the input) from one invented by the decoder. Also, it
allows the recipient of the decode to treat failures differently,
either by passing on a useful error report to the user (as
utf8_unknown_char now does) or by doing something special.
In particular, there are two distinct error codes for a truncated
UTF-8 encoding, depending on whether it was truncated by the end of
the input or by encountering a non-continuation byte. The former code
means that the string is not legal UTF-8 _as it is_, but doesn't rule
out it being a (bytewise) prefix of a legal UTF-8 string - so if a
client is receiving UTF-8 data a byte at a time, they can treat that
error code specially and not make it a fatal error.
I've only just noticed that the definition of CP_UTF8 as 65001 (the
Windows code page number for UTF-8) is in the main putty.h, under an
ifdef that checks whether the per-platform header file had already
defined it to something else. That's a silly way to do things! Better
that the Windows-specific definition goes _in_ the Windows platform
header, and putty.h contains no fallback. That way, anyone writing a
third separate platform directory will get an error reminding them
that they have to provide the right definition for their platform,
instead of finding out later via a runtime failure.
This allows you to set a flag in conio_setup() which causes the
returned ConsoleIO object to interpret all its output as UTF-8, by
translating it to UTF-16 and using WriteConsoleW to write it in
Unicode. Similarly, input is read using ReadConsoleW and decoded from
UTF-16 to UTF-8.
This flag is set to false in most places, to avoid making sudden
breaking changes. But when we're about to present a prompts_t to the
user, it's set from the new 'utf8' flag in that prompt, which in turn
is set by the userauth layer in any case where the prompts are going
to the server.
The idea is that this should be the start of a fix for the long-
standing character-set handling bug that strings transmitted during
SSH userauth (usernames, passwords, k-i prompts and responses) are all
supposed to be in UTF-8, but we've always encoded them in whatever our
input system happens to be using, and not done any tidying up on them.
We get occasional complaints about this from users whose passwords
contain characters that are encoded differently between UTF-8 and
their local encoding, but I've never got round to fixing it because
it's a large piece of engineering.
Indeed, this isn't nearly the end of it. The next step is to add UTF-8
support to all the _other_ ways of presenting a prompts_t, as best we
can.
Like the previous change to console handling, it seems very likely
that this will break someone's workflow. So there's a fallback
command-line option '-legacy-charset-handling' to revert to PuTTY's
previous behaviour.
Until now, the command-line PuTTY tools (PSCP, PSFTP and Plink) have
presented all the kinds of interactive prompt (password/passphrase,
host key, the assorted weak-crypto warnings, 'append to log file?') on
standard error, and read the responses from standard input.
This is unfortunate because if you're redirecting their standard
input (especially likely with Plink) then the prompt responses will
consume some of the intended session data. It would be better to
present the prompts _on the console_, even if that's not where stdin
or stderr point.
On Unix, we've been doing this for ages, by opening /dev/tty directly.
On Windows, we didn't, because I didn't know how. But I've recently
found out: you can open the magic file names CONIN$ and CONOUT$, which
will point at your actual console, if one is available.
So now, if it's possible, the command-line tools will do that. But if
the attempt to open CONIN$ and CONOUT$ fails, they'll fall back to the
old behaviour (in particular, if no console is available at all).
In order to make this happen consistently across all the prompt types,
I've introduced a new object called ConsoleIO, which holds whatever
file handles are necessary, knows whether to close them
afterwards (yes if they were obtained by opening CONFOO$, no if
they're the standard I/O handles), and presents a BinarySink API to
write to them and a custom API to read a line of text.
This seems likely to break _someone's_ workflow. So I've added an
option '-legacy-stdio-prompts' to restore the old behaviour.
In the Windows API, there are two places you can get a command line in
the form of a single unsplit string. One is via the command-line
parameter to WinMain(); the other is by calling GetCommandLine(). But
the two have different semantics: the WinMain command line string is
only the part after the program name, whereas GetCommandLine() returns
the full command line _including_ the program name.
PuTTY has never yet had to parse the full output of GetCommandLine,
but I have plans that will involve it beginning to do so. So I need to
make sure the utility function split_into_argv() can handle it.
This is not trivial because the quoting convention is different for
the program name than for everything else. In the program's normal
arguments, parsed by the C library startup code, the convention is
that backslashes are special when they appear before a double quote,
because that's how you write a literal double quote. But in the
program name, backslashes are _never_ special, because that's how
CreateProcess parses the program name at the start of the command
line, and the C library must follow suit in order to correctly
identify where the program name ends and the arguments begin.
In particular, consider a command line such as this:
"C:\Program Files\Foo\"foo.exe "hello \"world\""
The \" in the middle of the program name must be treated as a literal
backslash, followed by a non-literal double quote which matches the
one at the start of the string and causes the space in 'Program Files'
to be treated as part of the pathname. But the same \" when it appears
in the subsequent argument is treated as an escaped double quote, and
turns into a literal " in the argument string.
This commit adds support for this special initial-word handling in
split_into_argv(), via an extra boolean argument indicating whether to
turn that mode on. However, all existing call sites set the flag to
false, because the new mode isn't needed _yet_. So there should be no
functional change.
This removes one case from several of the individual tools'
command-line parsers, and moves it into a central place where it will
automatically be supported by any tool containing console.c.
In order to make that not cause a link failure, there's now a
stubs/no-console.c which GUI clients of cmdline.c must include.
Horizontal scroll events aren't generated by the traditional mouse
wheel, but they can be generated by trackpad gestures, though this
isn't always configured on.
The cross-platform and Windows parts of this patch is due to
Christopher Plewright; I added the GTK support.
From https://invisible-island.net/xterm/ctlseqs/ctlseqs.html#h3-Any-event-tracking:
Any-event mode is the same as button-event mode, except that all motion
events are reported, even if no mouse button is down. It is enabled by
specifying 1003 to DECSET.
Normally the front ends only report mouse events when buttons are
pressed, so we introduce a MA_MOVE event with MBT_NOTHING set to
indicate such a mouse movement.
This enables it to handle data that isn't presented as a
NUL-terminated string.
In particular, the NUL byte can appear _within_ the string and be
correctly translated to the NUL wide character. So I've been able to
remove the awkwardness in the test rig of having to include the
terminating NUL in every test to ensure NUL has been tested, and
instead, insert a single explicit test for it.
Similarly to the previous commit, the simplification at the (one) call
site gives me a strong feeling of 'this is what the API should have
been all along'!
Previously it output to an ordinary char buffer, and returned the
number of bytes it had written. But three out of the four call sites
immediately chucked the resulting bytes into a BinarySink anyway. The
fourth, in windows/unicode.c, really is writing into successive
locations of a fixed-size buffer - but we can make that into a
BinarySink too, using the buffer_sink added in the previous commit.
So now encode_utf8() is renamed put_utf8_char, and the call sites all
look simpler than they started out.
A server attempt to resize the window (for instance via DECCOLM) when
"When window is resized" was set to "Forbid resizing completely" would
cause all terminal output to be suspended, due to the resize attempt
never being acknowledged.
(There are other code paths like this, which I've fixed for
completeness, but I don't think they have any effect: the terminal
filters out resize attempts to the current size before this point, and
even if a server can get such a request through the SUPDUP protocol, the
test for that is wrong and will never fire -- this needs fixing
separately.)
The "Cancel" button's keyboard shortcut was accidentally removed by
f1c8298000, having only just reinstated it in a77040afa1.
(Also, fix a couple of blatantly fibbing "accelerators used" comments.)
It turns out this isn't actually necessary after all to make the
installer behave in the expected way in the default case (giving a UAC
prompt and installing systemwide). And I'm told it has undesirable
consequences in more complicated cases, which I'm not expert enough in
MSI to fully understand.
Of the three calls to queue_toplevel_callback in window.c, one of them
was still passing NULL as its context parameter, rather than the new
'wgs'. As a result, a segfault could occur during some session closures.