This involved a trivial merge conflict fix in terminal.c because of
the way the cherry-pick 73b41feba5 differed from its original
bdbd5f429c.
But a more significant rework was needed in windows/console.c, because
the updates to confirm_weak_* conflicted with the changes on main to
abstract out the ConsoleIO system.
This centralises the messages for weak crypto algorithms (general, and
host keys in particular, the latter including a list of all the other
available host key types) into ssh/common.c, in much the same way as
we previously did for ordinary host key warnings.
The reason is the same too: I'm about to want to vary the text in one
of those dialog boxes, so it's convenient to start by putting it
somewhere that I can modify just once.
This allows a couple more settings to be treated automatically on
save, which are more complicated on load because they still honour
older alternative save keywords.
In particular, CONF_proxy_type and CONF_remote_qtitle_action now have
explicit enum mappings. These were needed for the automated save code,
but also, I've rewritten the custom load code to use them too. This
decouples the storage format of those settings from the order of
values in the internal enum, which is generally an advantage of
specifying storage enums explicitly.
Those two settings weren't already tested by test_conf, because I
wasn't changing them in previous commits. Now I've added extra code
that does test them, and verified it works when backported to commit
b567c9b2b5 where I introduced test_conf before beginning the main
refactoring.
A setting can also be specified explicitly as not loaded and saved at
all. There were quite a few commented that way, but now there's a
machine-readable indication of it.
test_conf will now check that all these settings make sense together -
things shouldn't have a save keyword unless they use it, and should
have one if they don't, and shouldn't specify combinations of options
that conflict.
(For that reason, test_conf is now also running the consistency check
before the main test, so that a missing keyword will cause an error
message _before_ it causes a segfault, saving some debugging!)
The new ConfKeyInfo structure now includes some fields indicating how
to load and save each config option: what keyword it's stored under in
the saved settings file, and what its default value should be set to
when loading a session that doesn't mention it. (Including, of course,
loading the null session at program startup.)
So far, this only applies to the saved settings that are sufficiently
simple: a single integer, string or boolean value whose internal
format matches its storage format, or an integer value consisting of a
finite enumeration with a fixed mapping between its internal and
storage formats. Anything more difficult than that - mappings,
variable defaults, config options tied together, options that still
support a legacy save format alongside the up-to-date one, things
under #ifdef - hasn't yet been tampered with.
This allows a large amount of repetitive code in settings.c to be
deleted, and replaced by simple loops over the conf_key_info array
doing all the easy work. The remaining manual load/save code per
option is all there because it's difficult in some way.
The transitional test_conf program still passes after this upheaval.
This array is planned to be exposed more widely, and used for more
purposes than just checking the types of Conf options. In this commit
it just takes over from the two previous smaller arrays, and adds no
extra data.
Normally you don't ever want to have a Conf structure that doesn't
have an entry for every primary key, because the code that uses Conf
to get real work done will fail assertions if lookups fail. But test
programs manipulating Conf in unusual ways are a special case.
(In particular, one thing you _can_ legally do with an empty Conf is
to call load_open_settings() to populate it. That has to be legal,
because it's how a Conf gets populated in the first place, after it's
initially created empty.)
I think the only reason it currently takes a plain string is because
its interesting caller (in unix/x11.c) has just constructed a string
out of an environment variable, and it seemed like the path of least
effort not to bother wrapping it into a proper Filename. But when
Filename on Windows becomes more interesting, we'll need it to take
the full version.
We already had encode_wide_string_as_utf8, which treats the wide
string as UTF-16 or UTF-32 as appropriate to the size of wchar_t. I'm
about to need the inverse function, and was surprised that it didn't
already exist (even though enough component parts did to make it easy).
I thought I'd found all of these before, but perhaps a few managed to
slip in since I last looked. The character argument to the <ctype.h>
functions must have the value of an unsigned char or EOF; passing an
ordinary char (unless you know char is unsigned on every platform the
code will ever go near) risks mistaking '\xFF' for EOF, and causing
outright undefined behaviour on byte values in the range 80-FE. Never
do it.
(cherry picked from commit a76109c586)
I thought I'd found all of these before, but perhaps a few managed to
slip in since I last looked. The character argument to the <ctype.h>
functions must have the value of an unsigned char or EOF; passing an
ordinary char (unless you know char is unsigned on every platform the
code will ever go near) risks mistaking '\xFF' for EOF, and causing
outright undefined behaviour on byte values in the range 80-FE. Never
do it.
Now you can optionally get back an enum value indicating whether the
character was successfully decoded, or whether U+FFFD was substituted
due to some kind of problem, and if the latter, what problem.
For a start, this allows distinguishing 'real' U+FFFD (encoded
legitimately in the input) from one invented by the decoder. Also, it
allows the recipient of the decode to treat failures differently,
either by passing on a useful error report to the user (as
utf8_unknown_char now does) or by doing something special.
In particular, there are two distinct error codes for a truncated
UTF-8 encoding, depending on whether it was truncated by the end of
the input or by encountering a non-continuation byte. The former code
means that the string is not legal UTF-8 _as it is_, but doesn't rule
out it being a (bytewise) prefix of a legal UTF-8 string - so if a
client is receiving UTF-8 data a byte at a time, they can treat that
error code specially and not make it a fatal error.
This allows you to set a flag in conio_setup() which causes the
returned ConsoleIO object to interpret all its output as UTF-8, by
translating it to UTF-16 and using WriteConsoleW to write it in
Unicode. Similarly, input is read using ReadConsoleW and decoded from
UTF-16 to UTF-8.
This flag is set to false in most places, to avoid making sudden
breaking changes. But when we're about to present a prompts_t to the
user, it's set from the new 'utf8' flag in that prompt, which in turn
is set by the userauth layer in any case where the prompts are going
to the server.
The idea is that this should be the start of a fix for the long-
standing character-set handling bug that strings transmitted during
SSH userauth (usernames, passwords, k-i prompts and responses) are all
supposed to be in UTF-8, but we've always encoded them in whatever our
input system happens to be using, and not done any tidying up on them.
We get occasional complaints about this from users whose passwords
contain characters that are encoded differently between UTF-8 and
their local encoding, but I've never got round to fixing it because
it's a large piece of engineering.
Indeed, this isn't nearly the end of it. The next step is to add UTF-8
support to all the _other_ ways of presenting a prompts_t, as best we
can.
Like the previous change to console handling, it seems very likely
that this will break someone's workflow. So there's a fallback
command-line option '-legacy-charset-handling' to revert to PuTTY's
previous behaviour.
Just like burnstr(), it memsets a NUL-terminated string to all zeroes
before freeing it. The only difference is that it does it to a string
of wchar_t.
Unlike clang, VS didn't like me using the value of one 'static const'
integer variable to compute the value of another, and complained
'initializer is not a constant'. Replaced all those variables with an
enum, which should also more reliably ensure that even an
unsophisticated compiler doesn't actually reserve data-section space
for them.
A new module in 'utils' computes NFC and NFD, via a new set of data
tables generated by read_ucd.py.
The new module comes with a new test program, which can read the
NormalizationTest.txt that appears in the Unicode Character Database.
All the tests pass, as of Unicode 15.
This enables it to handle data that isn't presented as a
NUL-terminated string.
In particular, the NUL byte can appear _within_ the string and be
correctly translated to the NUL wide character. So I've been able to
remove the awkwardness in the test rig of having to include the
terminating NUL in every test to ensure NUL has been tested, and
instead, insert a single explicit test for it.
Similarly to the previous commit, the simplification at the (one) call
site gives me a strong feeling of 'this is what the API should have
been all along'!
The test in question was supposed to contain the spurious UTF-8
encoding that 0xD800 would have if it were not a surrogate. But the
final continuation character 0x80 was instead 0x00.
The test passed anyway, because ED A0 was regarded as a truncated
sequence, instead of ED A0 80 being regarded as an illegal encoding of
a surrogate, and both return the same output!
Previously it output to an ordinary char buffer, and returned the
number of bytes it had written. But three out of the four call sites
immediately chucked the resulting bytes into a BinarySink anyway. The
fourth, in windows/unicode.c, really is writing into successive
locations of a fixed-size buffer - but we can make that into a
BinarySink too, using the buffer_sink added in the previous commit.
So now encode_utf8() is renamed put_utf8_char, and the call sites all
look simpler than they started out.
This is one of marshal.c's small collection of handy BinarySink
adapters to existing kinds of thing, alongside stdio_sink and
bufchain_sink. It writes into a fixed-size buffer, discarding all
writes after the buffer fills up, and sets a flag to let you know if
it overflowed.
There was one of these in Windows Pageant a while back, under the name
'struct PageantReply' (introduced in commit b6cbad89fc, removed
again in 98538caa39 when the named-pipe revamp made it
unnecessary). This is the same idea but centralised for reusability.
I mentioned recently (in commit 9e7d4c53d8) message that I'm no
longer fond of the variable name 'ret', because it's used in two quite
different contexts: it's the return value from a subroutine you just
called (e.g. 'int ret = read(fd, buf, len);' and then check for error
or EOF), or it's the value you're preparing to return from the
_containing_ routine (maybe by assigning it a default value and then
conditionally modifying it, or by starting at NULL and reallocating,
or setting it just before using the 'goto out' cleanup idiom). In the
past I've occasionally made mistakes by forgetting which meaning the
variable had, or accidentally conflating both uses.
If all else fails, I now prefer 'retd' (short for 'returned') in the
former situation, and 'toret' (obviously, the value 'to return') in
the latter case. But even better is to pick a name that actually says
something more specific about what the thing actually is.
One particular bad habit throughout this codebase is to have a set of
functions that deal with some object type (say 'Foo'), all *but one*
of which take a 'Foo *foo' parameter, but the foo_new() function
starts with 'Foo *ret = snew(Foo)'. If all the rest of them think the
canonical name for the ambient Foo is 'foo', so should foo_new()!
So here's a no-brainer start on cutting down on the uses of 'ret': I
looked for all the cases where it was being assigned the result of an
allocation, and renamed the variable to be a description of the thing
being allocated. In the case of a new() function belonging to a
family, I picked the same name as the rest of the functions in its own
family, for consistency. In other cases I picked something sensible.
One case where it _does_ make sense not to use your usual name for the
variable type is when you're cloning an existing object. In that case,
_neither_ of the Foo objects involved should be called 'foo', because
it's ambiguous! They should be named so you can see which is which. In
the two cases I found here, I've called them 'orig' and 'copy'.
As in the previous refactoring, many thanks to clang-rename for the
help.
This is like the seat-independent nonfatal(), but specifies a Seat,
which allows the GUI dialog box to have the right terminal window as
its parent (if there are multiple ones).
Changed over all the nonfatal() calls in the code base that could be
localised to a Seat, which means all the ones that come up if
something goes horribly wrong in host key storage. To make that
possible, I've added a 'seat' parameter to store_host_key(); it turns
out that all its call sites had one available already.
Coverity spotted me copying an uninitialised variable into it, which
made me wonder how I hadn't noticed. The answer is that nothing
actually _uses_ that variable - it's written, but never read. I must
have put it in during development, thinking I was going to need it for
something, and then didn't end up using it after all.
My experimental build with clang-cl at -Wall did show up a few things
that are safe enough to fix right now. One was this list of missing
includes, which was causing a lot of -Wmissing-prototype warnings, and
is a real risk because it means the declarations in headers weren't
being type-checked against the actual function definitions.
Happily, no actual mismatches.
We already have the ability to start a subprocess and hook it up to a
Socket, for running local proxy commands. Now the same facility is
available as an auxiliary feature, so that a backend can start another
subcommand for a different purpose, and make a separate Socket to
communicate with it.
Just like the local proxy system, this facility captures the
subprocess's stderr, and passes it back to the caller via plug_log. To
make that not look silly, I had to add a system where the "proxy:"
prefix on the usual plug_log messages is reconfigurable, and when you
call platform_start_subprocess(), you get to pass the prefix you want
to use in this case.
I made a specific subdirectory 'stubs' to keep all the link-time stub
modules in, like notiming.c. And I put _one_ run-time stub in it,
namely nullplug.c. But the rest of the runtime stubs went into utils.
I think it's better to keep all the stubs together, so I've moved all
the null*.c in utils into stubs (with the exception of nullstrcmp.c,
which means the 'null' in a different sense). Also, fiddled with the
naming to be a bit more consistent, and stated in the new CMakeLists
the naming policy that distinguishes no-*.c from null-*.c.
This provides a convenient hook to be called between SSH messages, for
the crypto components to do any per-message processing like
incrementing a sequence number.
This patch fixes a few other whitespace and formatting issues which
were pointed out by the bulk-reindent or which I spotted in passing,
some involving manual editing to break lines more nicely.
I think the weirdest hunk in here is the one in windows/window.c
TranslateKey() where _half_ of an assignment statement inside an 'if'
was on the same line as the trailing paren of the if condition. No
idea at all how that one managed to happen!
I think a lot of these were inserted by a prior run through GNU indent
many years ago. I noticed in a more recent experiment that that tool
doesn't always correctly distinguish which instances of 'id * id' are
pointer variable declarations and which are multiplications, so it
spaces some of the former as if they were the latter.
My bulk indentation check also turned up a lot of cases where a run-on
function call or if statement didn't have its later lines aligned
correctly relative to the open paren.
I think this is quite easy to do by getting things out of
sync (editing the first line of the function call and forgetting to
update the rest, perhaps even because you never _saw_ the rest during
a search-replace). But a few didn't quite fit into that pattern, in
particular an outright misleading case in unix/askpass.c where the
second line of a call was aligned neatly below the _wrong_ one of the
open parens on the opening line.
Restored as many alignments as I could easily find.
The test in the Pageant list box code for whether we should display
the bit count of a key was done by checking specifically for ssh_rsa
or ssh_dsa, which of course meant that it didn't catch the certified
versions of those keys.
Now there's yet another footling ssh_keyalg method that asks the
question 'is it worth displaying the bit count?', to which RSA and DSA
answer yes, and the opensshcert family delegates to its base key type,
so that RSA and DSA certified keys also answer yes.
(This isn't the same as ssh_key_public_bits(alg, blob) >= 0. All
supported public key algorithms _can_ display a bit count if called
on. But only in RSA and DSA is it configurable, and therefore worth
bothering to print in the list box.)
Also in this commit, I've fixed a bug in the certificate
implementation of public_bits, which was passing a wrongly formatted
public blob to the underlying key. (Done by factoring out the code
from opensshcert_new_shared which constructed the _correct_ public
blob, and reusing it in public_bits to do the same job.)
The text of the host key warnings was replicated in three places: the
Windows rc file, the GTK dialog setup function, and the console.c
shared between both platforms' CLI tools. Now it lives in just one
place, namely ssh/common.c where the rest of the centralised host-key
checking is done, so it'll be easier to adjust the wording in future.
This comes with some extra automation. Paragraph wrapping is no longer
done by hand in any version of these prompts. (Previously we let GTK
do the wrapping on GTK, but on Windows the resource file contained a
bunch of pre-wrapped LTEXT lines, and console.c had pre-wrapped
terminal messages.) And the dialog heights in Windows are determined
automatically based on the amount of stuff in the window.
The main idea of all this is that it'll be easier to set up more
elaborate kinds of host key prompt that deal with certificates (if,
e.g., a server sends us a certified host key which we don't trust the
CA for). But there are side benefits of this refactoring too: each
tool now reliably inserts its own appname in the prompts, and also, on
Windows the entire prompt text is copy-pastable.
Details of implementation: there's a new type SeatDialogText which
holds a set of (type, string) pairs describing the contents of a
prompt. Type codes distinguish ordinary text paragraphs, paragraphs to
be displayed prominently (like key fingerprints), the extra-bold scary
title at the top of the 'host key changed' version of the dialog, and
the various information that lives in the subsidiary 'more info' box.
ssh/common.c constructs this, and passes it to the Seat to present the
actual prompt.
In order to deal with the different UI for answering the prompt, I've
added an extra Seat method 'prompt_descriptions' which returns some
snippets of text to interpolate into the messages. ssh/common.c calls
that while it's still constructing the text, and incorporates the
resulting snippets into the SeatDialogText.
For the moment, this refactoring only affects the host key prompts.
The warnings about outmoded crypto are still done the old-fashioned
way; they probably ought to be similarly refactored to use this new
SeatDialogText system, but it's not immediately critical for the
purpose I have right now.
Partly, this just seems more sensible, since it may well vary per
platform beyond the ability of intorptr to specify. But more
immediately it means the definition of the HELPCTX macro doesn't have
to use the P() function from dialog.h, which isn't defined in any
circumstances outside the config subsystem. And I'm about to want to
put a help context well outside that subsystem.
This replaces the previous placeholder scheme of having a list of
hostname wildcards with implicit logical-OR semantics (if any wildcard
matched then the certificate would be trusted to sign for that host).
That scheme didn't allow for exceptions within a domain ('everything
in example.com except extra-high-security-machine.example.com'), and
also had no way to specify port numbers.
In the new system, you can still write a hostname wildcard by itself
in the simple case, but now those are just atomic subexpressions in a
boolean-logic domain-specific language I've made up. So if you want
multiple wildcards, you can separate them with || in a single longer
expression, and also you can use && and ! to impose exceptions on top
of that.
Full details of the expression language are in the comment at the top
of utils/cert-expr.c. It'll need documenting properly before release,
of course.
For the sake of backwards compatibility for early adopters who've
already set up configuration in the old system, I've put in some code
that will read the old MatchHosts configuration and automatically
translate it into the equivalent boolean expression (by simply
stringing together the list of wildcards with || between them).
These make a good storage format for mostly-textual data in
configuration, if it can't afford to reserve any character as a
delimiter. Assuming very few characters need to be escaped, the space
cost is lower than base64, and also you can read it by eye.
ptrlen_contains and ptrlen_contains_only are useful for checking that
a string is composed entirely of certain characters, or avoids them.
ptrlen_end makes a pointer to the byte just past the end of the
specified string. And it can be used with make_ptrlen_startend, which
makes a ptrlen out of two pointers instead of a pointer and a length.
Instead of maintaining a single sparse table mapping Unicode to the
currently selected code page, we now maintain a collection of such
tables mapping Unicode to any code page we've so far found a need to
work with, and we add code pages to that list as necessary, and never
throw them away (since there are a limited number of them).
This means that the wc_to_mb family of functions are effectively
stateless: they no longer depend on a 'struct unicode_data'
corresponding to the current terminal settings. So I've removed that
parameter from all of them.
This fills in the missing piece of yesterday's commit a216d86106:
now wc_to_mb too should be able to handle internally-implemented
character sets, by hastily making their reverse mapping table if it
doesn't already have it.
(That was only a _latent_ bug, because the only use of wc_to_mb in the
cross-platform or Windows code _did_ want to convert to the currently
selected code page, so the old strategy worked in that case. But there
was no protection against an unworkable use of it being added later.)
When the user provides a password on the PuTTY command line, via -pw
or -pwfile, the flag 'tried_once' inside cmdline_get_passwd_input() is
intended to arrange that we only try sending that password once, and
after we've sent it, we don't try again.
But this plays badly with the 'Restart Session' operation. If the
connection is lost and then restarted at user request, we _do_ want to
send that password again!
So this commit moves that static variable out into a small state
structure held by the client of cmdline_get_passwd_input. Each client
can decide how to manage that state itself.
Clients that support 'Restart Session' - i.e. just GUI PuTTY itself -
will initialise the state at the same time as instantiating the
backend, so that every time the session is restarted, we return
to (correctly) believing that we _haven't_ yet tried the password
provided on the command line.
But clients that don't support 'Restart Session' - i.e. Plink and file
transfer tools - can do the same thing that cmdline.c was doing
before: just keep the state in a static variable.
This also means that the GUI login tools will now retain the
command-line password in memory, whereas previously they'd have wiped
it out once it was used. But the other tools will still wipe and free
the password, because I've also added a 'bool restartable' flag to
cmdline_get_passwd_input to let it know when it _is_ allowed to do
that.
In the GUI tools, I don't see any way to get round that, because if
the session is restarted you _have_ to still have the password to use
again. (And you can't infer that that will never happen from the
CONF_close_on_exit setting, because that too could be changed in
mid-session.) On the other hand, I think it's not all that worrying,
because the use of either -pw or -pwfile means that a persistent copy
of your password is *already* stored somewhere, so another one isn't
too big a stretch.
(Due to the change of -pw policy in 0.77, the effect of this bug was
that an attempt to reconnect in a session set up this way would lead
to "Configured password was not accepted". In 0.76, the failure mode
was different: PuTTY would interactively prompt for the password,
having wiped it out of memory after it was used the first time round.)
Various alignments I want to do in the host CA box have shown up
deficiencies in this system, so I've reworked it a bit.
Firstly, you can now specify more than two controls to be tied
together with an align_next_to (e.g. multiple checkboxes alongside
something else).
Secondly, as well as forcing the controls to be the same height as
each other, the layout algorithm will also move the later controls
further _downward_, so that their top y positions also line up. Until
now that hasn't been necessary, because they lined up already.
In the GTK implementation of this via the Columns class, I've renamed
'columns_force_same_height' to 'columns_align_next_to', and similarly
for some of the internal fields, since the latter change makes the
previous names a misnomer.
In the Windows implementation, I found it most convenient to set this
up by following a linked list of align_next_to fields backwards. But
it won't always be convenient to initialise them that way, so I've
also written a crude normaliser that will rewrite those links into a
canonical form. But I only call that on Windows; it's unnecessary in
GTK, where the Columns class provides plenty of per-widget extra
storage so I just keep each alignment class as a circular list.