The Windows version of the Filename structure now contains three
versions of the pathname, in UTF-16, UTF-8 and the system code page.
Callers can use whichever is most convenient.
All uses of filenames for actually opening files now use the UTF-16
version, which means they can tolerate 'exotic' filenames, by which I
mean those including Unicode characters outside the host system's
CP_ACP default code page.
Other uses of Filename structures inside the 'windows' subdirectory do
something appropriate, e.g. when printing a filename inside a message
box or a console message, we use the UTF-8 version of the filename
with the UTF-8 version of the appropriate API.
There are three remaining pieces to full Unicode filename support:
One is that the cross-platform code has many calls to
filename_to_str(), embodying the assumption that a file name can be
reliably converted into the unspecified current character set; those
will all need changing in some way.
Another is that write_setting_filename(), in windows/storage.c, still
saves filenames to the Registry as an ordinary REG_SZ in the system
code page. So even if an exotic filename were stored in a Conf, that
Conf couldn't round-trip via the Registry and back without corrupting
that filename by coercing it back to a string that fits in CP_ACP and
therefore doesn't represent the same file. This can't be fixed without
a compatibility break in the storage format, and I don't want to make
a minimal change in that area: if we're going to break compatibility,
then we should break it good and hard (the Nanny Ogg principle), and
devise a completely fresh storage representation that fixes as many
other legacy problems as possible at the same time. So that's my plan,
not yet started.
The final point, much more obviously, is that we're still short of
methods to _construct_ any Filename structures using a Unicode input
string! It should now work to enter one in the GUI configurer (either
by manual text input or via the file selector), but it won't
round-trip through a save and load (as discussed above), and there's
still no way to specify one on the command line (the groundwork is
laid by commit 10e1ac7752 but not yet linked up).
But this is a start.
I mentioned recently (in commit 9e7d4c53d8) message that I'm no
longer fond of the variable name 'ret', because it's used in two quite
different contexts: it's the return value from a subroutine you just
called (e.g. 'int ret = read(fd, buf, len);' and then check for error
or EOF), or it's the value you're preparing to return from the
_containing_ routine (maybe by assigning it a default value and then
conditionally modifying it, or by starting at NULL and reallocating,
or setting it just before using the 'goto out' cleanup idiom). In the
past I've occasionally made mistakes by forgetting which meaning the
variable had, or accidentally conflating both uses.
If all else fails, I now prefer 'retd' (short for 'returned') in the
former situation, and 'toret' (obviously, the value 'to return') in
the latter case. But even better is to pick a name that actually says
something more specific about what the thing actually is.
One particular bad habit throughout this codebase is to have a set of
functions that deal with some object type (say 'Foo'), all *but one*
of which take a 'Foo *foo' parameter, but the foo_new() function
starts with 'Foo *ret = snew(Foo)'. If all the rest of them think the
canonical name for the ambient Foo is 'foo', so should foo_new()!
So here's a no-brainer start on cutting down on the uses of 'ret': I
looked for all the cases where it was being assigned the result of an
allocation, and renamed the variable to be a description of the thing
being allocated. In the case of a new() function belonging to a
family, I picked the same name as the rest of the functions in its own
family, for consistency. In other cases I picked something sensible.
One case where it _does_ make sense not to use your usual name for the
variable type is when you're cloning an existing object. In that case,
_neither_ of the Foo objects involved should be called 'foo', because
it's ambiguous! They should be named so you can see which is which. In
the two cases I found here, I've called them 'orig' and 'copy'.
As in the previous refactoring, many thanks to clang-rename for the
help.
These handy wrappers on the verbose underlying Win32 registry API have
to lose some expressiveness, and one thing they lost was the ability
to open a registry key without asking for both read and write access.
This meant they couldn't be used for accessing keys not owned by the
calling user.
So far, I've only used them for accessing PuTTY's own saved data,
which means that hasn't been a problem. But I want to use them
elsewhere in an upcoming commit, so I need to fix that.
The obvious thing would be to change the meaning of the existing
'create' boolean flag so that if it's false, we also don't request
write access. The rationale would be that you're either reading or
writing, and if you're writing you want both RW access and to create
keys that don't already exist. But in fact that's not true: you do
want to set create==false and have write access in the case where
you're _deleting_ things from the key (or the whole key). So we really
do need three ways to call the wrapper function.
Rather than add another boolean field to every call site or mess about
with an 'access type' enum, I've taken an in-between route: the
underlying open_regkey_fn *function* takes a 'create' and a 'write'
flag, but at call sites, it's wrapped with a macro anyway (to append
NULL to the variadic argument list), so I've just made three macros
whose names request different access. That makes call sites marginally
_less_ verbose, while still
This is like the seat-independent nonfatal(), but specifies a Seat,
which allows the GUI dialog box to have the right terminal window as
its parent (if there are multiple ones).
Changed over all the nonfatal() calls in the code base that could be
localised to a Seat, which means all the ones that come up if
something goes horribly wrong in host key storage. To make that
possible, I've added a 'seat' parameter to store_host_key(); it turns
out that all its call sites had one available already.
If the function name (or expression) in a function call or declaration
is itself so long that even the first argument doesn't fit after it on
the same line, or if that would leave so little space that it would be
silly to try to wrap all the run-on lines into a tall thin column,
then I used to do this
ludicrously_long_function_name
(arg1, arg2, arg3);
and now prefer this
ludicrously_long_function_name(
arg1, arg2, arg3);
I picked up the habit from Python, where the latter idiom is required
by Python's syntactic significance of newlines (you can write the
former if you use a backslash-continuation, but pretty much everyone
seems to agree that that's much uglier). But I've found it works well
in C as well: it makes it more obvious that the previous line is
incomplete, it gives you a tiny bit more space to wrap the following
lines into (the old idiom indents the _third_ line one space beyond
the second), and I generally turn out to agree with the knock-on
indentation decisions made by at least Emacs if you do it in the
middle of a complex expression. Plus, of course, using the _same_
idiom between C and Python means less state-switching.
So, while I'm making annoying indentation changes in general, this
seems like a good time to dig out all the cases of the old idiom in
this code, and switch them over to the new.
My bulk indentation check also turned up a lot of cases where a run-on
function call or if statement didn't have its later lines aligned
correctly relative to the open paren.
I think this is quite easy to do by getting things out of
sync (editing the first line of the function call and forgetting to
update the rest, perhaps even because you never _saw_ the rest during
a search-replace). But a few didn't quite fit into that pattern, in
particular an outright misleading case in unix/askpass.c where the
second line of a call was aligned neatly below the _wrong_ one of the
open parens on the opening line.
Restored as many alignments as I could easily find.
This replaces the previous placeholder scheme of having a list of
hostname wildcards with implicit logical-OR semantics (if any wildcard
matched then the certificate would be trusted to sign for that host).
That scheme didn't allow for exceptions within a domain ('everything
in example.com except extra-high-security-machine.example.com'), and
also had no way to specify port numbers.
In the new system, you can still write a hostname wildcard by itself
in the simple case, but now those are just atomic subexpressions in a
boolean-logic domain-specific language I've made up. So if you want
multiple wildcards, you can separate them with || in a single longer
expression, and also you can use && and ! to impose exceptions on top
of that.
Full details of the expression language are in the comment at the top
of utils/cert-expr.c. It'll need documenting properly before release,
of course.
For the sake of backwards compatibility for early adopters who've
already set up configuration in the old system, I've put in some code
that will read the old MatchHosts configuration and automatically
translate it into the equivalent boolean expression (by simply
stringing together the list of wildcards with || between them).
As distinct from the type of signature generated by the SSH server
itself from the host key, this lets you exclude (and by default does
exclude) the old "ssh-rsa" SHA-1 signature type from the signature of
the CA on the certificate.
This will allow the central host_ca_new function to pre-populate the
structure with default values for the fields, so that once I add more
options to CA configuration they can take their default values when
loading a saved record from a previous PuTTY version.
As well as eliminating the null-pointer dereference, I also now
realise that the format-translation code depended on leaving the final
translated string in 'otherstr' in order to pass the host key check
afterwards (if they match).
I've also now realised that this only applies to *SSH-1* RSA keys, so
it's even more obsolete than I thought before. Perhaps I should just
remove this code instead of spending all this effort on fixing it. But
I've done the fix now, so I'll commit it, and then maybe we can remove
it afterwards (and have a working version of it available to resurrect
if ever needed!).
A user points out that in commit 6143a50ed2, when I converted all
use of the registry to functions that return a newly allocated buffer
instead of allocating a buffer themselves beforehand, I overlooked
that one use of the old idiom was reusing the preallocated buffer as
work space.
I _hope_ nobody still needs this code - the 'old-style' host key cache
format it handles was replaced in 2000. If anyone has a PuTTY host key
cache entry that's survived 22 years without either having to be
reinitialised on a new system or changed when the machine's host key
was upgraded, they're doing better than I am!
But if it's still here, it should still work, obviously. Replaced the
reused buffer with a strbuf, which is more robust anyway.
Now we offer the OpenSSH certificate key types in our KEXINIT host key
algorithm list, so that if the server has a certificate, they can send
it to us.
There's a new storage.h abstraction for representing a list of trusted
host CAs, and which ones are trusted to certify hosts for what
domains. This is stored outside the normal saved session data, because
the whole point of host certificates is to avoid per-host faffing.
Configuring this set of trusted CAs is done via a new GUI dialog box,
separate from the main PuTTY config box (because it modifies a single
set of settings across all saved sessions), which you can launch by
clicking a button in the 'Host keys' pane. The GUI is pretty crude for
the moment, and very much at a 'just about usable' stage right now. It
will want some polishing.
If we have no CA configured that matches the hostname, we don't offer
to receive certified host keys in the first place. So for existing
users who haven't set any of this up yet, nothing will immediately
change.
Currently, if we do offer to receive certified host keys and the
server presents one signed by a CA we don't trust, PuTTY will bomb out
unconditionally with an error, instead of offering a confirmation box.
That's an unfinished part which I plan to fix before this goes into a
release.
All the fiddly business where you have to check that a thing exists,
make sure of its type, find its size, allocate some memory, and then
read it again properly (or, alternatively, loop round dealing with
ERROR_MORE_DATA) just doesn't belong at every call site. It's crying
out to be moved out into some separate utility functions that present
a more ergonomic API, so that the code that decides _which_ Registry
entries to read and what to do with them can concentrate on that.
So I've written a fresh set of registry API wrappers in windows/utils,
and simplified windows/storage.c as a result. The jump-list handling
code in particular is almost legible now!
Correcting a source file name in the docs just now reminded me that
I've seen a lot of outdated source file names elsewhere in the code,
due to all the reorganisation since we moved to cmake. Here's a giant
pass of trying to make them all accurate again.
marshal.h now provides a macro put_fmt() which allows you to write
arbitrary printf-formatted data to an arbitrary BinarySink.
We already had this facility for strbufs in particular, in the form of
strbuf_catf(). That was able to take advantage of knowing the inner
structure of a strbuf to minimise memory allocation (it would snprintf
directly into the strbuf's existing buffer if possible). For a general
black-box BinarySink we can't do that, so instead we dupvprintf into a
temporary buffer.
For consistency, I've removed strbuf_catf, and converted all uses of
it into the new put_fmt - and I've also added an extra vtable method
in the BinarySink API, so that put_fmt can still use strbuf_catf's
more efficient memory management when talking to a strbuf, and fall
back to the simpler strategy when that's not available.
Previously, checking the host key against the persistent cache managed
by the storage.h API was done as part of the seat_verify_ssh_host_key
method, i.e. separately by each Seat.
Now that check is done by verify_ssh_host_key(), which is a new
function in ssh/common.c that centralises all the parts of host key
checking that don't need an interactive prompt. It subsumes the
previous verify_ssh_manual_host_key() that checked against the Conf,
and it does the check against the storage API that each Seat was
previously doing separately. If it can't confirm or definitively
reject the host key by itself, _then_ it calls out to the Seat, once
an interactive prompt is definitely needed.
The main point of doing this is so that when SshProxy forwards a Seat
call from the proxy SSH connection to the primary Seat, it won't print
an announcement of which connection is involved unless it's actually
going to do something interactive. (Not that we're printing those
announcements _yet_ anyway, but this is a piece of groundwork that
works towards doing so.)
But while I'm at it, I've also taken the opportunity to clean things
up a bit by renaming functions sensibly. Previously we had three very
similarly named functions verify_ssh_manual_host_key(), SeatVtable's
'verify_ssh_host_key' method, and verify_host_key() in storage.h. Now
the Seat method is called 'confirm' rather than 'verify' (since its
job is now always to print an interactive prompt, so it looks more
like the other confirm_foo methods), and the storage.h function is
called check_stored_host_key(), which goes better with store_host_key
and avoids having too many functions with similar names. And the
'manual' function is subsumed into the new centralised code, so
there's now just *one* host key function with 'verify' in the name.
Several functions are reindented in this commit. Best viewed with
whitespace changes ignored.
This gets rid of all those annoying 'win', 'ux' and 'gtk' prefixes
which made filenames annoying to type and to tab-complete. Also, as
with my other recent renaming sprees, I've taken the opportunity to
expand and clarify some of the names so that they're not such cryptic
abbreviations.