Now you can optionally get back an enum value indicating whether the
character was successfully decoded, or whether U+FFFD was substituted
due to some kind of problem, and if the latter, what problem.
For a start, this allows distinguishing 'real' U+FFFD (encoded
legitimately in the input) from one invented by the decoder. Also, it
allows the recipient of the decode to treat failures differently,
either by passing on a useful error report to the user (as
utf8_unknown_char now does) or by doing something special.
In particular, there are two distinct error codes for a truncated
UTF-8 encoding, depending on whether it was truncated by the end of
the input or by encountering a non-continuation byte. The former code
means that the string is not legal UTF-8 _as it is_, but doesn't rule
out it being a (bytewise) prefix of a legal UTF-8 string - so if a
client is receiving UTF-8 data a byte at a time, they can treat that
error code specially and not make it a fatal error.
This allows you to set a flag in conio_setup() which causes the
returned ConsoleIO object to interpret all its output as UTF-8, by
translating it to UTF-16 and using WriteConsoleW to write it in
Unicode. Similarly, input is read using ReadConsoleW and decoded from
UTF-16 to UTF-8.
This flag is set to false in most places, to avoid making sudden
breaking changes. But when we're about to present a prompts_t to the
user, it's set from the new 'utf8' flag in that prompt, which in turn
is set by the userauth layer in any case where the prompts are going
to the server.
The idea is that this should be the start of a fix for the long-
standing character-set handling bug that strings transmitted during
SSH userauth (usernames, passwords, k-i prompts and responses) are all
supposed to be in UTF-8, but we've always encoded them in whatever our
input system happens to be using, and not done any tidying up on them.
We get occasional complaints about this from users whose passwords
contain characters that are encoded differently between UTF-8 and
their local encoding, but I've never got round to fixing it because
it's a large piece of engineering.
Indeed, this isn't nearly the end of it. The next step is to add UTF-8
support to all the _other_ ways of presenting a prompts_t, as best we
can.
Like the previous change to console handling, it seems very likely
that this will break someone's workflow. So there's a fallback
command-line option '-legacy-charset-handling' to revert to PuTTY's
previous behaviour.
Just like burnstr(), it memsets a NUL-terminated string to all zeroes
before freeing it. The only difference is that it does it to a string
of wchar_t.
Unlike clang, VS didn't like me using the value of one 'static const'
integer variable to compute the value of another, and complained
'initializer is not a constant'. Replaced all those variables with an
enum, which should also more reliably ensure that even an
unsophisticated compiler doesn't actually reserve data-section space
for them.
A new module in 'utils' computes NFC and NFD, via a new set of data
tables generated by read_ucd.py.
The new module comes with a new test program, which can read the
NormalizationTest.txt that appears in the Unicode Character Database.
All the tests pass, as of Unicode 15.
This enables it to handle data that isn't presented as a
NUL-terminated string.
In particular, the NUL byte can appear _within_ the string and be
correctly translated to the NUL wide character. So I've been able to
remove the awkwardness in the test rig of having to include the
terminating NUL in every test to ensure NUL has been tested, and
instead, insert a single explicit test for it.
Similarly to the previous commit, the simplification at the (one) call
site gives me a strong feeling of 'this is what the API should have
been all along'!
The test in question was supposed to contain the spurious UTF-8
encoding that 0xD800 would have if it were not a surrogate. But the
final continuation character 0x80 was instead 0x00.
The test passed anyway, because ED A0 was regarded as a truncated
sequence, instead of ED A0 80 being regarded as an illegal encoding of
a surrogate, and both return the same output!
Previously it output to an ordinary char buffer, and returned the
number of bytes it had written. But three out of the four call sites
immediately chucked the resulting bytes into a BinarySink anyway. The
fourth, in windows/unicode.c, really is writing into successive
locations of a fixed-size buffer - but we can make that into a
BinarySink too, using the buffer_sink added in the previous commit.
So now encode_utf8() is renamed put_utf8_char, and the call sites all
look simpler than they started out.
This is one of marshal.c's small collection of handy BinarySink
adapters to existing kinds of thing, alongside stdio_sink and
bufchain_sink. It writes into a fixed-size buffer, discarding all
writes after the buffer fills up, and sets a flag to let you know if
it overflowed.
There was one of these in Windows Pageant a while back, under the name
'struct PageantReply' (introduced in commit b6cbad89fc, removed
again in 98538caa39 when the named-pipe revamp made it
unnecessary). This is the same idea but centralised for reusability.
I mentioned recently (in commit 9e7d4c53d8) message that I'm no
longer fond of the variable name 'ret', because it's used in two quite
different contexts: it's the return value from a subroutine you just
called (e.g. 'int ret = read(fd, buf, len);' and then check for error
or EOF), or it's the value you're preparing to return from the
_containing_ routine (maybe by assigning it a default value and then
conditionally modifying it, or by starting at NULL and reallocating,
or setting it just before using the 'goto out' cleanup idiom). In the
past I've occasionally made mistakes by forgetting which meaning the
variable had, or accidentally conflating both uses.
If all else fails, I now prefer 'retd' (short for 'returned') in the
former situation, and 'toret' (obviously, the value 'to return') in
the latter case. But even better is to pick a name that actually says
something more specific about what the thing actually is.
One particular bad habit throughout this codebase is to have a set of
functions that deal with some object type (say 'Foo'), all *but one*
of which take a 'Foo *foo' parameter, but the foo_new() function
starts with 'Foo *ret = snew(Foo)'. If all the rest of them think the
canonical name for the ambient Foo is 'foo', so should foo_new()!
So here's a no-brainer start on cutting down on the uses of 'ret': I
looked for all the cases where it was being assigned the result of an
allocation, and renamed the variable to be a description of the thing
being allocated. In the case of a new() function belonging to a
family, I picked the same name as the rest of the functions in its own
family, for consistency. In other cases I picked something sensible.
One case where it _does_ make sense not to use your usual name for the
variable type is when you're cloning an existing object. In that case,
_neither_ of the Foo objects involved should be called 'foo', because
it's ambiguous! They should be named so you can see which is which. In
the two cases I found here, I've called them 'orig' and 'copy'.
As in the previous refactoring, many thanks to clang-rename for the
help.
This is like the seat-independent nonfatal(), but specifies a Seat,
which allows the GUI dialog box to have the right terminal window as
its parent (if there are multiple ones).
Changed over all the nonfatal() calls in the code base that could be
localised to a Seat, which means all the ones that come up if
something goes horribly wrong in host key storage. To make that
possible, I've added a 'seat' parameter to store_host_key(); it turns
out that all its call sites had one available already.
Coverity spotted me copying an uninitialised variable into it, which
made me wonder how I hadn't noticed. The answer is that nothing
actually _uses_ that variable - it's written, but never read. I must
have put it in during development, thinking I was going to need it for
something, and then didn't end up using it after all.
My experimental build with clang-cl at -Wall did show up a few things
that are safe enough to fix right now. One was this list of missing
includes, which was causing a lot of -Wmissing-prototype warnings, and
is a real risk because it means the declarations in headers weren't
being type-checked against the actual function definitions.
Happily, no actual mismatches.
We already have the ability to start a subprocess and hook it up to a
Socket, for running local proxy commands. Now the same facility is
available as an auxiliary feature, so that a backend can start another
subcommand for a different purpose, and make a separate Socket to
communicate with it.
Just like the local proxy system, this facility captures the
subprocess's stderr, and passes it back to the caller via plug_log. To
make that not look silly, I had to add a system where the "proxy:"
prefix on the usual plug_log messages is reconfigurable, and when you
call platform_start_subprocess(), you get to pass the prefix you want
to use in this case.
I made a specific subdirectory 'stubs' to keep all the link-time stub
modules in, like notiming.c. And I put _one_ run-time stub in it,
namely nullplug.c. But the rest of the runtime stubs went into utils.
I think it's better to keep all the stubs together, so I've moved all
the null*.c in utils into stubs (with the exception of nullstrcmp.c,
which means the 'null' in a different sense). Also, fiddled with the
naming to be a bit more consistent, and stated in the new CMakeLists
the naming policy that distinguishes no-*.c from null-*.c.
This provides a convenient hook to be called between SSH messages, for
the crypto components to do any per-message processing like
incrementing a sequence number.
This patch fixes a few other whitespace and formatting issues which
were pointed out by the bulk-reindent or which I spotted in passing,
some involving manual editing to break lines more nicely.
I think the weirdest hunk in here is the one in windows/window.c
TranslateKey() where _half_ of an assignment statement inside an 'if'
was on the same line as the trailing paren of the if condition. No
idea at all how that one managed to happen!
I think a lot of these were inserted by a prior run through GNU indent
many years ago. I noticed in a more recent experiment that that tool
doesn't always correctly distinguish which instances of 'id * id' are
pointer variable declarations and which are multiplications, so it
spaces some of the former as if they were the latter.
My bulk indentation check also turned up a lot of cases where a run-on
function call or if statement didn't have its later lines aligned
correctly relative to the open paren.
I think this is quite easy to do by getting things out of
sync (editing the first line of the function call and forgetting to
update the rest, perhaps even because you never _saw_ the rest during
a search-replace). But a few didn't quite fit into that pattern, in
particular an outright misleading case in unix/askpass.c where the
second line of a call was aligned neatly below the _wrong_ one of the
open parens on the opening line.
Restored as many alignments as I could easily find.
The test in the Pageant list box code for whether we should display
the bit count of a key was done by checking specifically for ssh_rsa
or ssh_dsa, which of course meant that it didn't catch the certified
versions of those keys.
Now there's yet another footling ssh_keyalg method that asks the
question 'is it worth displaying the bit count?', to which RSA and DSA
answer yes, and the opensshcert family delegates to its base key type,
so that RSA and DSA certified keys also answer yes.
(This isn't the same as ssh_key_public_bits(alg, blob) >= 0. All
supported public key algorithms _can_ display a bit count if called
on. But only in RSA and DSA is it configurable, and therefore worth
bothering to print in the list box.)
Also in this commit, I've fixed a bug in the certificate
implementation of public_bits, which was passing a wrongly formatted
public blob to the underlying key. (Done by factoring out the code
from opensshcert_new_shared which constructed the _correct_ public
blob, and reusing it in public_bits to do the same job.)
The text of the host key warnings was replicated in three places: the
Windows rc file, the GTK dialog setup function, and the console.c
shared between both platforms' CLI tools. Now it lives in just one
place, namely ssh/common.c where the rest of the centralised host-key
checking is done, so it'll be easier to adjust the wording in future.
This comes with some extra automation. Paragraph wrapping is no longer
done by hand in any version of these prompts. (Previously we let GTK
do the wrapping on GTK, but on Windows the resource file contained a
bunch of pre-wrapped LTEXT lines, and console.c had pre-wrapped
terminal messages.) And the dialog heights in Windows are determined
automatically based on the amount of stuff in the window.
The main idea of all this is that it'll be easier to set up more
elaborate kinds of host key prompt that deal with certificates (if,
e.g., a server sends us a certified host key which we don't trust the
CA for). But there are side benefits of this refactoring too: each
tool now reliably inserts its own appname in the prompts, and also, on
Windows the entire prompt text is copy-pastable.
Details of implementation: there's a new type SeatDialogText which
holds a set of (type, string) pairs describing the contents of a
prompt. Type codes distinguish ordinary text paragraphs, paragraphs to
be displayed prominently (like key fingerprints), the extra-bold scary
title at the top of the 'host key changed' version of the dialog, and
the various information that lives in the subsidiary 'more info' box.
ssh/common.c constructs this, and passes it to the Seat to present the
actual prompt.
In order to deal with the different UI for answering the prompt, I've
added an extra Seat method 'prompt_descriptions' which returns some
snippets of text to interpolate into the messages. ssh/common.c calls
that while it's still constructing the text, and incorporates the
resulting snippets into the SeatDialogText.
For the moment, this refactoring only affects the host key prompts.
The warnings about outmoded crypto are still done the old-fashioned
way; they probably ought to be similarly refactored to use this new
SeatDialogText system, but it's not immediately critical for the
purpose I have right now.
Partly, this just seems more sensible, since it may well vary per
platform beyond the ability of intorptr to specify. But more
immediately it means the definition of the HELPCTX macro doesn't have
to use the P() function from dialog.h, which isn't defined in any
circumstances outside the config subsystem. And I'm about to want to
put a help context well outside that subsystem.
This replaces the previous placeholder scheme of having a list of
hostname wildcards with implicit logical-OR semantics (if any wildcard
matched then the certificate would be trusted to sign for that host).
That scheme didn't allow for exceptions within a domain ('everything
in example.com except extra-high-security-machine.example.com'), and
also had no way to specify port numbers.
In the new system, you can still write a hostname wildcard by itself
in the simple case, but now those are just atomic subexpressions in a
boolean-logic domain-specific language I've made up. So if you want
multiple wildcards, you can separate them with || in a single longer
expression, and also you can use && and ! to impose exceptions on top
of that.
Full details of the expression language are in the comment at the top
of utils/cert-expr.c. It'll need documenting properly before release,
of course.
For the sake of backwards compatibility for early adopters who've
already set up configuration in the old system, I've put in some code
that will read the old MatchHosts configuration and automatically
translate it into the equivalent boolean expression (by simply
stringing together the list of wildcards with || between them).
These make a good storage format for mostly-textual data in
configuration, if it can't afford to reserve any character as a
delimiter. Assuming very few characters need to be escaped, the space
cost is lower than base64, and also you can read it by eye.
ptrlen_contains and ptrlen_contains_only are useful for checking that
a string is composed entirely of certain characters, or avoids them.
ptrlen_end makes a pointer to the byte just past the end of the
specified string. And it can be used with make_ptrlen_startend, which
makes a ptrlen out of two pointers instead of a pointer and a length.
Instead of maintaining a single sparse table mapping Unicode to the
currently selected code page, we now maintain a collection of such
tables mapping Unicode to any code page we've so far found a need to
work with, and we add code pages to that list as necessary, and never
throw them away (since there are a limited number of them).
This means that the wc_to_mb family of functions are effectively
stateless: they no longer depend on a 'struct unicode_data'
corresponding to the current terminal settings. So I've removed that
parameter from all of them.
This fills in the missing piece of yesterday's commit a216d86106:
now wc_to_mb too should be able to handle internally-implemented
character sets, by hastily making their reverse mapping table if it
doesn't already have it.
(That was only a _latent_ bug, because the only use of wc_to_mb in the
cross-platform or Windows code _did_ want to convert to the currently
selected code page, so the old strategy worked in that case. But there
was no protection against an unworkable use of it being added later.)
When the user provides a password on the PuTTY command line, via -pw
or -pwfile, the flag 'tried_once' inside cmdline_get_passwd_input() is
intended to arrange that we only try sending that password once, and
after we've sent it, we don't try again.
But this plays badly with the 'Restart Session' operation. If the
connection is lost and then restarted at user request, we _do_ want to
send that password again!
So this commit moves that static variable out into a small state
structure held by the client of cmdline_get_passwd_input. Each client
can decide how to manage that state itself.
Clients that support 'Restart Session' - i.e. just GUI PuTTY itself -
will initialise the state at the same time as instantiating the
backend, so that every time the session is restarted, we return
to (correctly) believing that we _haven't_ yet tried the password
provided on the command line.
But clients that don't support 'Restart Session' - i.e. Plink and file
transfer tools - can do the same thing that cmdline.c was doing
before: just keep the state in a static variable.
This also means that the GUI login tools will now retain the
command-line password in memory, whereas previously they'd have wiped
it out once it was used. But the other tools will still wipe and free
the password, because I've also added a 'bool restartable' flag to
cmdline_get_passwd_input to let it know when it _is_ allowed to do
that.
In the GUI tools, I don't see any way to get round that, because if
the session is restarted you _have_ to still have the password to use
again. (And you can't infer that that will never happen from the
CONF_close_on_exit setting, because that too could be changed in
mid-session.) On the other hand, I think it's not all that worrying,
because the use of either -pw or -pwfile means that a persistent copy
of your password is *already* stored somewhere, so another one isn't
too big a stretch.
(Due to the change of -pw policy in 0.77, the effect of this bug was
that an attempt to reconnect in a session set up this way would lead
to "Configured password was not accepted". In 0.76, the failure mode
was different: PuTTY would interactively prompt for the password,
having wiped it out of memory after it was used the first time round.)
Various alignments I want to do in the host CA box have shown up
deficiencies in this system, so I've reworked it a bit.
Firstly, you can now specify more than two controls to be tied
together with an align_next_to (e.g. multiple checkboxes alongside
something else).
Secondly, as well as forcing the controls to be the same height as
each other, the layout algorithm will also move the later controls
further _downward_, so that their top y positions also line up. Until
now that hasn't been necessary, because they lined up already.
In the GTK implementation of this via the Columns class, I've renamed
'columns_force_same_height' to 'columns_align_next_to', and similarly
for some of the internal fields, since the latter change makes the
previous names a misnomer.
In the Windows implementation, I found it most convenient to set this
up by following a linked list of align_next_to fields backwards. But
it won't always be convenient to initialise them that way, so I've
also written a crude normaliser that will rewrite those links into a
canonical form. But I only call that on Windows; it's unnecessary in
GTK, where the Columns class provides plenty of per-widget extra
storage so I just keep each alignment class as a circular list.
As distinct from the type of signature generated by the SSH server
itself from the host key, this lets you exclude (and by default does
exclude) the old "ssh-rsa" SHA-1 signature type from the signature of
the CA on the certificate.
This will allow the central host_ca_new function to pre-populate the
structure with default values for the fields, so that once I add more
options to CA configuration they can take their default values when
loading a saved record from a previous PuTTY version.
Now we offer the OpenSSH certificate key types in our KEXINIT host key
algorithm list, so that if the server has a certificate, they can send
it to us.
There's a new storage.h abstraction for representing a list of trusted
host CAs, and which ones are trusted to certify hosts for what
domains. This is stored outside the normal saved session data, because
the whole point of host certificates is to avoid per-host faffing.
Configuring this set of trusted CAs is done via a new GUI dialog box,
separate from the main PuTTY config box (because it modifies a single
set of settings across all saved sessions), which you can launch by
clicking a button in the 'Host keys' pane. The GUI is pretty crude for
the moment, and very much at a 'just about usable' stage right now. It
will want some polishing.
If we have no CA configured that matches the hostname, we don't offer
to receive certified host keys in the first place. So for existing
users who haven't set any of this up yet, nothing will immediately
change.
Currently, if we do offer to receive certified host keys and the
server presents one signed by a CA we don't trust, PuTTY will bomb out
unconditionally with an error, instead of offering a confirmation box.
That's an unfinished part which I plan to fix before this goes into a
release.
Certificate keys don't work the same as normal keys, so the rest of
the code is going to have to pay attention to whether a key is a
certificate, and if so, treat it differently and do cert-specific
stuff to it. So here's a collection of methods for that purpose.
With one exception, these methods of ssh_key are not expected to be
implemented at all in non-certificate key types: they should only ever
be called once you already know you're dealing with a certificate. So
most of the new method pointers can be left out of the ssh_keyalg
initialisers.
The exception is the base_key method, which retrieves the base key of
a certificate - the underlying one with the certificate stripped off.
It's convenient for non-certificate keys to implement this too, and
just return a pointer to themselves. So I've added an implementation
in nullkey.c doing that. (The returned pointer doesn't transfer
ownership; you have to use the new ssh_key_clone() if you want to keep
the base key after freeing the certificate key.)
The methods _only_ implemented in certificates:
Query methods to return the public key of the CA (for looking up in a
list of trusted ones), and to return the key id string (which exists
to be written into log files).
Obviously, we need a check_cert() method which will verify the CA's
actual signature, not to mention checking all the other details like
the principal and the validity period.
And there's another fiddly method for dealing with the RSA upgrade
system, called 'related_alg'. This is quite like alternate_ssh_id, in
that its job is to upgrade one key algorithm to a related one with
more modern RSA signing flags (or any other similar thing that might
later reuse the same mechanism). But where alternate_ssh_id took the
actual signing flags as an argument, this takes a pointer to the
upgraded base algorithm. So it answers the question "What is to this
key algorithm as you are to its base?" - if you call it on
opensshcert_ssh_rsa and give it ssh_rsa_sha512, it'll give you back
opensshcert_ssh_rsa_sha512.
(It's awkward to have to have another of these fiddly methods, and in
the longer term I'd like to try to clean up their proliferation a bit.
But I even more dislike the alternative of just going through
all_keyalgs looking for a cert algorithm with, say, ssh_rsa_sha512 as
the base: that approach would work fine now but it would be a lurking
time bomb for when all the -cert-v02@ methods appear one day. This
way, each certificate type can upgrade itself to the appropriately
related version. And at least related_alg is only needed if you _are_
a certificate key type - it's not adding yet another piece of
null-method boilerplate to the rest.)
The low-level functions to handle a single atom of base64 at a time
have been in 'utils' / misc.h for ages, but the higher-level family of
base64_encode functions that handle a whole data block were hidden
away in sshpubk.c, and there was no higher-level decode function at
all.
Now moved both into 'utils' modules and declared them in misc.h rather
than ssh.h. Also, improved the APIs: they all take ptrlen in place of
separate data and length arguments, their naming is more consistent
and more explicit (the previous base64_encode which didn't name its
destination is now base64_encode_fp), and the encode functions now
accept cpl == 0 as a special case meaning that the output base64 data
is wanted in the form of an unbroken single-line string with no
trailing \n.
This makes a second independent copy of an existing ssh_key, for
situations where one piece of code is going to want to keep it after
its current owner frees it.
In order to have it work on an arbitrary ssh_key, whether public-only
or a full public+private key pair, I've had to add an ssh_key query
method to ask whether a private key is known. I'm surprised I haven't
found a need for that before! But I suppose in most situations in an
SSH client you statically know which kind of key you're dealing with.
In this commit, I provide further functions which generate the
existing set of data types:
- key_components_add_text_pl() adds a text component, but takes a
ptrlen rather than a const char *, in case that was what you
happened to have already.
- key_components_add_uint() ends up adding an mp_int to the
structure, but takes it as input in the form of an ordinary C
integer, for the convenience of call sites which will want to do
that a lot and don't enjoy repeating the mp_int construction
boilerplate
- key_components_add_copy() takes a pointer to one of the
key_component sub-structs in an existing key_components, and copies
it into the output key_components under a new name, handling
whatever type it turns out to have.
This stores its data in the same format as the existing KCT_TEXT, but
it displays differently in puttygen --dump, expecting that the data
will be full of horrible control characters, invalid UTF-8, etc.
The displayed data is of the form b64("..."), so you get a hint about
what the encoding is, and can still paste into Python by defining the
identifier 'b64' to be base64.b64decode or equivalent.
Having recently pulled it out into its own file, I think it could also
do with a bit of tidying. In this rework:
- the substructure for a single component now has a globally visible
struct tag, so you can make a variable pointing at it, saving
verbiage in every piece of code looping over a key_components
- the 'is_mp_int' flag has been replaced with a type enum, so that
more types can be added without further upheaval
- the printing loop in cmdgen.c for puttygen --dump has factored out
the initial 'name=' prefix on each line so that it isn't repeated
per component type
- the storage format for text components is now a strbuf rather than
a plain char *, which I think is generally more useful.
Previously, the fact that "ssh-rsa" sometimes comes with two subtypes
"rsa-sha2-256" and "rsa-sha2-512" was known to three different parts
of the code - two in userauth and one in transport. Now the knowledge
of what those ids are, which one goes with which signing flags, and
which key types have subtypes at all, is centralised into a method of
the key algorithm, and all those locations just query it.
This will enable the introduction of further key algorithms that have
a parallel upgrade system.
It's a class method rather than an object method, so it doesn't allow
keys with the same algorithm to make different choices about what
flags they support. But that's not what I wanted it for: the real
purpose is to allow one key algorithm to delegate supported_flags to
another, by having its method implementation call the one from the
delegate class.
(If only C's compile/link model permitted me to initialise a field of
one global const struct variable to be a copy of that of another, I
wouldn't need the runtime overhead of this method! But object file
formats don't let you even specify that.)
Most key algorithms support no flags at all, so they all want to use
the same implementation of this method. So I've started a file of
stubs utils/nullkey.c to contain the common stub version.