This is the first bug found as a direct result of writing that
terminal test program - I added some tests for things I expected to
work already, and some of them didn't, proving immediately that it was
a good idea!
If the terminal is one column wide, and you've printed a
character (hence, set the wrapnext flag), what should backspace do?
Surely it should behave like any other backspace with wrapnext set,
i.e. clear the wrapnext flag, returning the cursor's _logical_
position to the location of the most recently printed character. But
in fact it was anti-wrapping to the previous line, because I'd got the
cases in the wrong order in the if-else chain that forms the backspace
handler. So the handler for 'we're in column 0, wrapping time' was
coming before 'wrapnext is set, just clear it'.
Now wrapnext is checked _first_, before anything else. Any time we
can handle a backspace by just clearing that flag, we should.
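In sketch form, the corrected ordering looks like this (the field and
helper names are illustrative, not the exact ones in terminal.c):

    if (term->wrapnext) {
        /* The pending wrap hasn't happened yet, so backspace just
         * cancels it: the cursor logically returns to the last
         * printed character without changing column. */
        term->wrapnext = false;
    } else if (term->curs.x == 0) {
        /* Only with wrapnext ruled out do we consider anti-wrapping
         * to the end of the previous line. */
        antiwrap_to_previous_line(term);    /* hypothetical helper */
    } else {
        term->curs.x--;
    }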
Suppose an application tries to print a double-width character
starting in the rightmost column of the screen, so that we apply our
emergency fix of wrapping to the next line immediately and printing
the character in the first two columns. Suppose they then backspace
twice, taking the cursor to the RHS and then the LHS of that
character. What should happen if they backspace a third time?
Our previous behaviour was to completely ignore the unusual situation,
and do the same thing we'd do in any other backspace from column 0:
anti-wrap the cursor to the last column of the previous line, leaving
it in the empty character cell that was skipped when the DW char
couldn't be printed in it.
But I think this isn't the best response, because it breaks the
invariant that printing N columns' worth of graphic characters and
then backspacing N times should leave the cursor on the first of those
characters. If I print "a가" (for example) and then backspace three
times, I want the cursor on the a, _even_ if weird line wrapping
behaviour happened somewhere in that sequence.
(Rationale: this helps naïve terminal applications which don't even
know what the terminal width is, and aren't tracking their absolute x
position. In particular, the simplistic line-based input systems that
appear in OS kernels and our own lineedit.c will want to emit a fixed
number of backspace-space-backspace sequences to delete characters
previously entered on to the line by the user. They still need to
check the wcwidth of the characters they're emitting, so that they can
BSB twice for a DW character or 0 times for a combining one, but it
would be *hugely* more awkward for them to ask the terminal where the
cursor is so that they can take account of difficult line wraps!)
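To illustrate, such a simplistic deleter needs only wcwidth() (POSIX,
declared in <wchar.h>), never the cursor position; a minimal sketch:

    #include <stdio.h>
    #include <wchar.h>

    /* Erase the most recently entered character: emit one
     * backspace-space-backspace per terminal cell it occupied. */
    void erase_last_char(wchar_t ch)
    {
        int cells = wcwidth(ch);    /* 2 for DW, 0 for combining */
        for (int i = 0; i < cells; i++)
            fputs("\b \b", stdout);
    }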
We already have the ability to _recognise_ this situation: on a line
that was wrapped in this unusual way, we set the LATTR_WRAPPED2 line
attribute flag, to prevent the empty rightmost column from injecting
an unwanted space into copy-pastes from the terminal. Now we also use
the same flag to cause the backspace control character to do something
interesting.
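Schematically, the column-0 branch of the backspace handler now does
something like this (LATTR_WRAPPED2 is the real flag; the other names
are illustrative):

    if (term->curs.x == 0 && term->curs.y > 0) {
        termline *prev = scrlineptr(term->curs.y - 1);
        term->curs.y--;
        /* Skip the empty rightmost cell left behind by an awkward
         * double-width wrap, so that printing N columns and then
         * backspacing N times returns to the starting cell. */
        term->curs.x = (prev->lattr & LATTR_WRAPPED2)
            ? term->cols - 2 : term->cols - 1;
    }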
This was the fix that inspired me to start writing test_terminal,
because I knew it was touching a delicate area. However, in the course
of writing this fix and its tests, I encountered two (!) further bugs,
which I'll fix in followup commits!
This has all the basic necessities to become a test of the terminal's
behaviour, in terms of how its data structures evolve as output is
sent to it, and perhaps also (by filling in the stub TermWin more
usefully) testing what it draws during updates and what it sends in
response to query sequences.
For the moment, all I've done is to set up the framework, and add one
demo test of printing some ordinary text and observing that it appears
in the data structures and the cursor has moved.
I expect that writing a full test of terminal.c will be a very big
job. But perhaps I or someone else will find time to prod it gradually
in the background of other work. In particular, when I'm _modifying_
any part of the terminal code, it would be good to add some tests for
the part I'm changing, before making the change, and check they still
work afterwards.
It wasn't used for anything except an assert statement, which was
triggered by a call made via the scrlineptr() macro wrapper. I've now
moved that check into scrlineptr() itself, via a helper function that
passes on the line number of the scrlineptr() call site.
(Yes, this is introducing another modalfatalbox in terminal.c, much
like the dreaded line==NULL one that caused us so many headaches in
past decades. But the check in question was being done _already_ by
the assert in lineptr(), so this change shouldn't make it go off in
any _more_ circumstances - and now, if it does, it will at least give
us slightly more useful information about where the problem is!)
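The shape of the arrangement, roughly (a sketch rather than the exact
code):

    /* The macro captures the line number of each call site... */
    #define scrlineptr(y) scrlineptr_checked(term, y, __LINE__)

    /* ...so the helper can report it if the check fails. */
    static termline *scrlineptr_checked(Terminal *term, int y, int line)
    {
        if (y < 0 || y >= term->rows)
            modalfatalbox("scrlineptr: invalid y=%d (terminal.c:%d)",
                          y, line);
        return lineptr(y);    /* the underlying unchecked lookup */
    }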
This continues the programme of UTF-8 support in authentication, begun
in commit f4519b6533 which arranged for console userpass prompts
to function in UTF-8 when the prompts_t asked them to. Since the new
line editing setup works properly when it _is_ in UTF-8 mode, I can
now also arrange that it puts the terminal into UTF-8 mode in the
right circumstances.
I've extended the applicability of the '-legacy-charset-handling' flag
introduced by the commit mentioned above, so that now it's not
specific to the console front end. Now you can give it to GUI PuTTY as
well, which restores the previous (wrong) behaviour of accepting
username and password prompt input in the main session's configured
character set. So if this change breaks someone's workflow, they
should be able to have it back.
This takes over from both the implementation in ldisc.c and the one in
term_get_userpass_input, which were imperfectly duplicating each
other's functionality. The new version should already be more
consistent between the two than that divergent pair was, and it also
means further improvements can now be made in just one place.
In the course of this, I've restructured the inside of ldisc.c by
moving the input_queue bufchain to the other side of the translation
code in ldisc_send. Previously, ldisc_send received a string, an
optional 'dedicated key' indication (bodgily signalled by a negative
length) and an 'interactive' flag, translated that somehow into a
combination of raw backend output and specials, and saved the latter
in input_queue. Now it saves the original (string, dedicated flag,
interactive flag) data in input_queue, and doesn't do the translation
until the data is pulled back _out_ of the queue. That's because the
new line editing system expects to receive something much closer to
the original data format.
The term_get_userpass_input system is also substantially restructured.
Instead of ldisc.c handing each individual keystroke to terminal.c so
that it can do line editing on it, terminal.c now just gives the Ldisc
a pointer to its instance of the new TermLineEditor object - and then
ldisc.c can put keystrokes straight into that, in the same way it
would put them into its own TermLineEditor, without having to go via
terminal.c at all. So the term_get_userpass_input edifice is only
called back when the line editor actually delivers the answer to a
username or password prompt.
(I considered not _even_ having a separate TermLineEditor for password
prompts, and just letting ldisc.c use its own. But the problem is that
some of the behaviour differences between the two line editors are
deliberate, for example the use of ^D to signal 'abort this prompt',
and the use of Escape as an alternative line-clearing command. So
TermLineEditor has a flags word that allows ldisc and terminal to set
it up differently. Also this lets me give the two TermLineEditors a
different vtable of callback functions, which is a convenient way for
terminal.c to get notified when a prompt has been answered.)
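Conceptually, the setup looks like this (every name below is
hypothetical except TermLineEditor itself):

    typedef struct TermLineEditor TermLineEditor;

    /* Flags select the behaviour differences described above. */
    #define LE_CTRL_D_ABORTS   0x1   /* ^D aborts the prompt */
    #define LE_ESC_CLEARS      0x2   /* Escape clears the line */

    /* A vtable lets each owner hear about completed lines. */
    typedef struct TermLineEditorCallbacks {
        void (*line_complete)(TermLineEditor *le,
                              const char *line, size_t len);
    } TermLineEditorCallbacks;

    /* terminal.c's instance, for userpass prompts: */
    le = lineedit_new(term, LE_CTRL_D_ABORTS | LE_ESC_CLEARS,
                      &userpass_callbacks);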
The new line editor still passes all the tests I wrote for the old
one. But it already has a couple of important improvements, both in
the area of UTF-8 handling:
Firstly, when we display a UTF-8 character on the terminal, we check
with the terminal how many character cells it occupied, and then if
the user deletes it again from the editing buffer, we can emit the
right number of backspace-space-backspace sequences. (The old ldisc
line editor incorrectly assumed all Unicode characters had terminal
width 1, partly because its buffer was byte- rather than character-
oriented and so it was more than enough work just finding where the
character _start_ was.)
Secondly, terminal.c's userpass line editor would never emit a byte in
the 80-BF range to the terminal at all, which meant that nontrivial
UTF-8 characters always came out as U+FFFD blobs!
I'm about to rewrite it completely, so the first thing I need to do is
to write tests for as much of the functionality as possible, so that I
can check the new implementation behaves in the same ways.
For years I've been following the principle that before I'll add
auto-detection of an SSH server bug, I want the server maintainer to
have fixed the bug, so that the list of affected version numbers
triggering the workaround is complete, and to provide an incentive for
implementations to gradually converge on rightness.
*Finally*, I've got round to documenting that policy in public, for
the Feedback page!
Similarly to the one I just added for FontSpec: in a cross-platform
main source file, you don't really want to mess about with
per-platform ifdefs just to initialise a 'struct unicode_data' from a
Conf. But until now, you had to, because init_ucs had a different
prototype on Windows and Unix.
I plan to use this in future test programs. But an immediate positive
effect is that it removes the only platform-dependent call from
fuzzterm.c. So now that could be built on Windows too, given only an
appropriate cmake stanza. (Not that I have much idea if it's useful to
fuzz the terminal separately on multiple platforms, but it's nice to
know that it's possible if anyone does need to.)
Constructing a FontSpec in platform-independent code is awkward,
because you can't call fontspec_new() outside the platform subdirs
(since its prototype varies per platform). But sometimes you just need
_some_ valid FontSpec, e.g. to put in a Conf that will be used in some
place where you don't actually care about font settings, such as a
purely CLI program.
Both Unix and Windows _have_ an idiom for this, but they're different,
because their FontSpec constructors have different prototypes. The
existing CLI tools have always had per-platform main source files, so
they just use the locally appropriate method of constructing a boring
don't-care FontSpec.
But if you want a _platform-independent_ main source file, such as you
might find in a test program, then that's rather awkward. Better to
have a platform-independent API for making a default FontSpec.
Now you can optionally get back an enum value indicating whether the
character was successfully decoded, or whether U+FFFD was substituted
due to some kind of problem, and if the latter, what problem.
For a start, this allows distinguishing 'real' U+FFFD (encoded
legitimately in the input) from one invented by the decoder. Also, it
allows the recipient of the decode to treat failures differently,
either by passing on a useful error report to the user (as
utf8_unknown_char now does) or by doing something special.
In particular, there are two distinct error codes for a truncated
UTF-8 encoding, depending on whether it was truncated by the end of
the input or by encountering a non-continuation byte. The former code
means that the string is not legal UTF-8 _as it is_, but doesn't rule
out it being a (bytewise) prefix of a legal UTF-8 string - so if a
client is receiving UTF-8 data a byte at a time, they can treat that
error code specially and not make it a fatal error.
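So a byte-at-a-time consumer can do something like this (the type and
constant names here are invented for illustration, as is the exact
signature):

    DecodeUTF8Failure err;
    unsigned wc = decode_utf8(src, &err);
    switch (err) {
      case DU_SUCCESS:
        consume(wc);      /* includes a genuinely encoded U+FFFD */
        break;
      case DU_TRUNCATED_AT_END:
        /* Not fatal: what we have so far may still be a prefix of
         * a legal UTF-8 string, so wait for the next byte. */
        break;
      default:
        complain(err);    /* definitely malformed input */
    }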
A user just reported that it hasn't been there since 0.76. This turns
out to be because I put the wrong pathname on the 'zip' commands in
Buildscr (miscounted the number of ../ segments).
I would have noticed immediately, if Info-Zip had failed with an error
when it found I'd given it a nonexistent filename to add to the zip
file. But in fact it just prints a warning and proceeds to add all the
other files I specified. It looks as if it will only return a nonzero
exit status if _all_ the filenames you specified were nonexistent.
Therefore, I've rewritten the zip-creation commands so that they run
zip once per file. That way if any file is unreadable we _will_ get a
build error.
(Also, while I'm here, I took the opportunity to get rid of that ugly
ls|grep.)
I've only just noticed that the definition of CP_UTF8 as 65001 (the
Windows code page number for UTF-8) is in the main putty.h, under an
ifdef that checks whether the per-platform header file had already
defined it to something else. That's a silly way to do things! Better
that the Windows-specific definition goes _in_ the Windows platform
header, and putty.h contains no fallback. That way, anyone writing a
third separate platform directory will get an error reminding them
that they have to provide the right definition for their platform,
instead of finding out later via a runtime failure.
cmake's configure-time #defines (at least the way I use them) are
defined to 0 or 1, rather than sometimes not defined at all, so you
have to test them with plain #if rather than #ifdef or #if defined.
I _thought_ I'd caught all of those in previous fixes, but apparently
there were a couple still lurking. Oops.
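That is, with a cmakedefine01-style setup the macro is always defined,
so only a value test behaves correctly (the macro name here is just an
example):

    /* The generated header always contains a definition, e.g.
     * "#define HAVE_FOO 0" or "#define HAVE_FOO 1". */
    #if HAVE_FOO          /* correct: tests the value */
    #endif

    #ifdef HAVE_FOO       /* wrong: also true when defined to 0 */
    #endif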
You can't call sk_write_eof() twice on the same socket, because the
second one will fail an assertion. Instead, you're supposed to know
you've already sent EOF, and not try to send it again.
The call to sk_write_eof() in raw_special (triggered by pressing ^D in
GUI PuTTY, for example) sets the flag raw->sent_socket_eof in an
attempt to prevent this. But it doesn't _check_ that flag, so a second
press of ^D can reach that assertion failure.
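The fix is the obvious guard, something like:

    if (!raw->sent_socket_eof) {
        sk_write_eof(raw->s);
        raw->sent_socket_eof = true;
    }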
In ldisc's line editing mode, pressing ^U is supposed to erase the
current unsent line rather than inserting a literal ^U into the
buffer. In fact, when using a non-Telnet backend, it erases the
line *and* inserts ^U into the buffer!
This happens because it shares a case handler with three other
disruptive control characters (^C, ^\, ^Z), which all also clear the
line-editing buffer before doing their various actions. But in
non-Telnet mode, their actions become literal insertion of themselves,
so the combined effect is to erase the line and then self-insert.
I'm not 100% convinced that was what I actually meant to do with those
characters. But it _certainly_ wasn't what I meant to do with ^U, so
that one at least I should fix right now!
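Schematically, the buggy shared handler looked like this (names
invented for illustration); the fix gives ^U its own case that stops
after clearing the line:

    switch (c) {
      case CTRL_C: case CTRL_Z: case CTRL_BACKSLASH: case CTRL_U:
        clear_edit_line(ldisc);       /* all four clear the buffer */
        if (backend_is_telnet(ldisc))
            send_telnet_special(ldisc, c);
        else
            self_insert(ldisc, c);    /* oops: ^U reached this too */
        break;
    }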
When I maximised a terminal window today and then used Ctrl-< to
reduce its font size (expecting that the window size would stay the
same but more characters would be squeezed in), pterm failed the
assertion in term_resize_request_completed() that checks
term->win_resize_pending == WIN_RESIZE_AWAIT_REPLY.
This happened because in this situation request_resize_internal() was
called from within window.c rather than from within the terminal code
itself. So the terminal didn't know a resize was pending at all, and
was surprised to be told that one had finished.
request_resize_internal() already has a flag parameter to tell it
whether a given resize came from the terminal or not. On the main code
path, that flag is used to decide whether to notify the terminal. But
on the early exit path when the window is maximised, we weren't
checking the flag. An easy fix.
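In sketch form, the early-exit path gains the same check the main
path already does (names illustrative):

    if (window_is_maximised(inst)) {
        /* The window size won't change, but if the terminal asked
         * for this resize, it still expects a completion call. */
        if (from_terminal)
            term_resize_request_completed(inst->term);
        return;
    }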
I noticed today that when GTK PuTTY puts up a message box such as a
host key dialog, which calls our create_message_box function with
selectable=true (so that the host key fingerprint can be conveniently
copy-pasted), a side effect is to take the X11 PRIMARY selection away
from whoever previously had it, even though the message box isn't
actually selecting anything right now.
I don't fully understand what's going on, but it apparently has
something to do with 'select on focus' behaviour, in which tabbing
into a selectable text control automatically selects its entire
contents. That makes sense for edit boxes, but not really for this
kind of thing.
Unfortunately, GTK apparently has no per-widget configuration to turn
that off. (The closest I found is not even per _application_: it lives
in GtkSettings, whose documentation says that it's general across all
GTK apps run by a user!)
So instead I work around it by moving the gtk_label_set_selectable
call to after the focus of the new window has already been sorted out.
Ugly, but it seems to work.
This allows you to set a flag in conio_setup() which causes the
returned ConsoleIO object to interpret all its output as UTF-8, by
translating it to UTF-16 and using WriteConsoleW to write it in
Unicode. Similarly, input is read using ReadConsoleW and decoded from
UTF-16 to UTF-8.
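In outline, the output side is the usual two-call Win32 conversion
followed by a wide write (snewn/sfree are PuTTY's allocation
wrappers):

    /* Translate the UTF-8 bytes to UTF-16... */
    int wlen = MultiByteToWideChar(CP_UTF8, 0, buf, len, NULL, 0);
    wchar_t *wbuf = snewn(wlen, wchar_t);
    MultiByteToWideChar(CP_UTF8, 0, buf, len, wbuf, wlen);

    /* ...and write it to the console in Unicode. */
    DWORD written;
    WriteConsoleW(hout, wbuf, wlen, &written, NULL);
    sfree(wbuf);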
This flag is set to false in most places, to avoid making sudden
breaking changes. But when we're about to present a prompts_t to the
user, it's set from the new 'utf8' flag in that prompt, which in turn
is set by the userauth layer in any case where the prompts are going
to the server.
The idea is that this should be the start of a fix for the long-
standing character-set handling bug that strings transmitted during
SSH userauth (usernames, passwords, k-i prompts and responses) are all
supposed to be in UTF-8, but we've always encoded them in whatever our
input system happens to be using, and not done any tidying up on them.
We get occasional complaints about this from users whose passwords
contain characters that are encoded differently between UTF-8 and
their local encoding, but I've never got round to fixing it because
it's a large piece of engineering.
Indeed, this isn't nearly the end of it. The next step is to add UTF-8
support to all the _other_ ways of presenting a prompts_t, as best we
can.
Like the previous change to console handling, it seems very likely
that this will break someone's workflow. So there's a fallback
command-line option '-legacy-charset-handling' to revert to PuTTY's
previous behaviour.
Until now, the command-line PuTTY tools (PSCP, PSFTP and Plink) have
presented all the kinds of interactive prompt (password/passphrase,
host key, the assorted weak-crypto warnings, 'append to log file?') on
standard error, and read the responses from standard input.
This is unfortunate because if you're redirecting their standard
input (especially likely with Plink) then the prompt responses will
consume some of the intended session data. It would be better to
present the prompts _on the console_, even if that's not where stdin
or stderr point.
On Unix, we've been doing this for ages, by opening /dev/tty directly.
On Windows, we didn't, because I didn't know how. But I've recently
found out: you can open the magic file names CONIN$ and CONOUT$, which
will point at your actual console, if one is available.
So now, if it's possible, the command-line tools will do that. But if
the attempt to open CONIN$ and CONOUT$ fails, they'll fall back to the
old behaviour (in particular, if no console is available at all).
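The attempt itself is just CreateFile on the magic names, with a
fallback to the standard handles (a sketch):

    HANDLE hin = CreateFile("CONIN$", GENERIC_READ | GENERIC_WRITE,
                            FILE_SHARE_READ | FILE_SHARE_WRITE,
                            NULL, OPEN_EXISTING, 0, NULL);
    if (hin == INVALID_HANDLE_VALUE)
        hin = GetStdHandle(STD_INPUT_HANDLE);   /* old behaviour */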
In order to make this happen consistently across all the prompt types,
I've introduced a new object called ConsoleIO, which holds whatever
file handles are necessary, knows whether to close them
afterwards (yes if they were obtained by opening CONFOO$, no if
they're the standard I/O handles), and presents a BinarySink API to
write to them and a custom API to read a line of text.
This seems likely to break _someone's_ workflow. So I've added an
option '-legacy-stdio-prompts' to restore the old behaviour.
In the Windows API, there are two places you can get a command line in
the form of a single unsplit string. One is via the command-line
parameter to WinMain(); the other is by calling GetCommandLine(). But
the two have different semantics: the WinMain command line string is
only the part after the program name, whereas GetCommandLine() returns
the full command line _including_ the program name.
PuTTY has never yet had to parse the full output of GetCommandLine,
but I have plans that will involve it beginning to do so. So I need to
make sure the utility function split_into_argv() can handle it.
This is not trivial because the quoting convention is different for
the program name than for everything else. In the program's normal
arguments, parsed by the C library startup code, the convention is
that backslashes are special when they appear before a double quote,
because that's how you write a literal double quote. But in the
program name, backslashes are _never_ special, because that's how
CreateProcess parses the program name at the start of the command
line, and the C library must follow suit in order to correctly
identify where the program name ends and the arguments begin.
In particular, consider a command line such as this:
"C:\Program Files\Foo\"foo.exe "hello \"world\""
The \" in the middle of the program name must be treated as a literal
backslash, followed by a non-literal double quote which matches the
one at the start of the string and causes the space in 'Program Files'
to be treated as part of the pathname. But the same \" when it appears
in the subsequent argument is treated as an escaped double quote, and
turns into a literal " in the argument string.
This commit adds support for this special initial-word handling in
split_into_argv(), via an extra boolean argument indicating whether to
turn that mode on. However, all existing call sites set the flag to
false, because the new mode isn't needed _yet_. So there should be no
functional change.
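For the record, here's how that example command line parses once the
new mode is switched on (the new parameter's exact shape is
illustrative):

    /* With the program-name mode enabled, something like
     *   split_into_argv(cmdline, true, &argc, &argv, NULL);
     * splits the example above into (shown as C string literals):
     *   argv[0]: "C:\\Program Files\\Foo\\foo.exe"  (\ kept literal)
     *   argv[1]: "hello \"world\""                  (\" unescaped)
     */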
Just like burnstr(), it memsets a NUL-terminated string to all zeroes
before freeing it. The only difference is that it does it to a string
of wchar_t.
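The entire function is likely no more than this sketch (using PuTTY's
optimisation-resistant smemclr() rather than a bare memset, which a
compiler may elide before a free):

    void burnwcs(wchar_t *string)
    {
        if (string) {
            smemclr(string, wcslen(string) * sizeof(wchar_t));
            sfree(string);
        }
    }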
This removes one case from several of the individual tools'
command-line parsers, and moves it into a central place where it will
automatically be supported by any tool containing console.c.
In order to make that not cause a link failure, there's now a
stubs/no-console.c which GUI clients of cmdline.c must include.
Horizontal scroll events aren't generated by the traditional mouse
wheel, but they can be generated by trackpad gestures, though such
gestures aren't always enabled in the OS's configuration.
The cross-platform and Windows parts of this patch are due to
Christopher Plewright; I added the GTK support.
This was requested by a downstream of the code, who wanted to change
the time/space tradeoff in the terminal. I currently have no plans to
change this setting for upstream PuTTY, although there is a cmake
option for it just to make testing it easy.
To avoid sprinkling ifdefs over the whole terminal code, the strategy
is to keep the separate type 'compressed_scrollback_line', and turn it
into a typedef for a 'termline *'. So compressline() becomes almost
trivial, and decompressline() even more so.
Memory management is the fiddly part. To make this work sensibly on
both sides, I've broken up each of compressline() and decompressline()
into two versions, one of which takes ownership of (and logically
speaking frees) its input, and the other doesn't. So at call sites
where a function was followed by a free, it's now calling the
'and_free' version of the function, and where the input object was
reused afterwards, it's calling the 'no_free' version. This means that
in different branches of the #if, I can make one function call the
other or vice versa, and no call site is stuck with having to do
things in a more roundabout way than necessary.
The freeing of the _return_ value from decompressline() is handled for
us, because termlines already have a 'temporary' flag which is set
when they're returned from the decompressor, and anyone receiving a
termline from lineptr() calls unlineptr() when they're finished with
it, which will _conditionally_ free it, depending on that 'temporary'
flag. So in the new mode, 'temporary' is never set at all, and all
those unlineptr() calls do nothing.
However, we also still need to free compressed lines properly when
they're actually being thrown away (scrolled off the top of the
scrollback, or cleaned up in term_free), and for that, I've made a new
special-purpose free_compressed_line() function.
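The pattern means each #if branch can define one variant in terms of
the other; roughly (the option macro's name here is hypothetical):

    #if TERM_COMPRESS_SCROLLBACK
    /* Real compression: the no_free version does the work, and the
     * and_free version wraps it. */
    compressed_scrollback_line *compressline_and_free(termline *line)
    {
        compressed_scrollback_line *cl = compressline_no_free(line);
        freetermline(line);
        return cl;
    }
    #else
    /* compressed_scrollback_line is just a termline *, so and_free
     * is a trivial ownership handover (and no_free must copy). */
    compressed_scrollback_line *compressline_and_free(termline *line)
    {
        return line;
    }
    #endif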
A KDE user observed that if you 'dock' a GTK PuTTY window to the side
of the screen (by dragging it to the RH edge, causing it to
half-maximise over the right-hand half of the display, similarly to
Windows), and then send a terminal resize sequence, then PuTTY fails
the assertion in term_resize_request_completed() which expects that an
unacknowledged resize request was currently in flight.
When drawing_area_setup() calls term_resize_request_completed() in
response to the inst->term_resize_notification_required flag, it
resets the inst->win_resize_pending flag, but doesn't reset
inst->term_resize_notification_required. As a result, the _next_ call
to drawing_area_setup will find that flag still set, and make a
duplicate call to term_resize_request_completed, after the terminal no
longer believes it's waiting for a response to a resize request. And
in this 'docked to the right-hand side of the display' state, KDE
apparently triggers two calls to drawing_area_setup() in quick
succession, making this bug manifest.
I could fix this by clearing inst->term_resize_notification_required.
But inspecting all the other call sites, it seems clear to me that my
original intention was for inst->term_resize_notification_required to
be a flag that's only meaningful if inst->win_resize_pending is set.
So I think a better fix is to conditionalise the check in
drawing_area_setup so that we don't even check
inst->term_resize_notification_required if !inst->win_resize_pending.
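That is, the response code becomes something like:

    if (inst->win_resize_pending) {
        if (inst->term_resize_notification_required)
            term_resize_request_completed(inst->term);
        inst->win_resize_pending = false;
    }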
From https://invisible-island.net/xterm/ctlseqs/ctlseqs.html#h3-Any-event-tracking:
    Any-event mode is the same as button-event mode, except that all
    motion events are reported, even if no mouse button is down. It
    is enabled by specifying 1003 to DECSET.
Normally the front ends only report mouse events when buttons are
pressed, so we introduce a MA_MOVE event with MBT_NOTHING set to
indicate such a mouse movement.
Unlike clang, VS didn't like me using the value of one 'static const'
integer variable to compute the value of another, and complained
'initializer is not a constant'. Replaced all those variables with an
enum, which should also more reliably ensure that even an
unsophisticated compiler doesn't actually reserve data-section space
for them.
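The change is the classic idiom (constant names invented for
illustration):

    /* MSVC rejects this: 'initializer is not a constant'. */
    static const int FIRST_OFFSET = 10;
    static const int SECOND_OFFSET = FIRST_OFFSET + 4;

    /* Enumerators are integer constant expressions, and occupy
     * no data-section storage at all. */
    enum { FIRST_OFFSET = 10, SECOND_OFFSET = FIRST_OFFSET + 4 };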
A new module in 'utils' computes NFC and NFD, via a new set of data
tables generated by read_ucd.py.
The new module comes with a new test program, which can read the
NormalizationTest.txt that appears in the Unicode Character Database.
All the tests pass, as of Unicode 15.
Now I have a script I can easily re-run, there's no reason not to do
just that! This updates all of the new generated header files for the
UCD.zip that comes with Unicode 15.0.0.
I've re-run my bidi test suite against 15.0.0's file of test cases,
and confirmed they all pass.
The initial outputs were all deliberately inconsistent with each
other, so that each one exactly matched the existing table I was
trying to replace.
Now I've done that check, I can clean them up. Normalised spacing and
case to be consistent; removed pointless indentation (these are now
include files, so they don't have to be indented to the same level as
the array declaration surrounding each one's #include); added a header
comment in each autogenerated file, saying that it's autogenerated,
what it's for, and who it's used by.
The currently supported version number of Unicode is also exposed in a
header file, so that I can put it in diagnostics.
This will replace the various pieces of Perl scattered throughout the
code base in comments above long boring data tables. The idea is that
those long boring tables will move into header files in the new
'unicode' directory, and will be #included from the source files that
use the tables.
One benefit is that I won't have to page tediously past the tables to
get to the actual code I want to edit. But more importantly, it should
now become easy to update to a new version of Unicode, by re-running
just one script and committing the changed versions of all the headers
in the 'unicode' subdir.
This version of the script regenerates six Unicode-derived tables in
the existing source code in a byte-for-byte identical form. In the
next commits I'll clean it up, commit the output, and delete the
tables from their previous locations.
(One table I _haven't_ incorporated into this system is the Arabic
shaping table in bidi.c, because my attempt to regenerate it came out
not matching the original at all. That _might_ be because the table is
based on an old Unicode standard and desperately needs updating, but
it might also be because I misunderstood how it works. So I'll leave
sorting that out for another time.)
This enables it to handle data that isn't presented as a
NUL-terminated string.
In particular, the NUL byte can appear _within_ the string and be
correctly translated to the NUL wide character. So I've been able to
remove the awkwardness in the test rig of having to include the
terminating NUL in every test to ensure NUL has been tested, and
instead, insert a single explicit test for it.
Similarly to the previous commit, the simplification at the (one) call
site gives me a strong feeling of 'this is what the API should have
been all along'!
The test in question was supposed to contain the spurious UTF-8
encoding that 0xD800 would have if it were not a surrogate. But the
final continuation byte 0x80 was instead 0x00.
The test passed anyway, because ED A0 was regarded as a truncated
sequence, instead of ED A0 80 being regarded as an illegal encoding of
a surrogate, and both return the same output!
Previously it output to an ordinary char buffer, and returned the
number of bytes it had written. But three out of the four call sites
immediately chucked the resulting bytes into a BinarySink anyway. The
fourth, in windows/unicode.c, really is writing into successive
locations of a fixed-size buffer - but we can make that into a
BinarySink too, using the buffer_sink added in the previous commit.
So now encode_utf8() is renamed put_utf8_char, and the call sites all
look simpler than they started out.
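So a typical call site changes along these lines (a sketch):

    /* Before: encode into a temporary buffer, then sink the bytes. */
    char buf[6];
    size_t len = encode_utf8(buf, ch);
    put_data(bs, buf, len);

    /* After: encode straight into the BinarySink. */
    put_utf8_char(bs, ch);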