The immediate usefulness of this is in pterm.exe: when the user uses
-e to specify a command to run in the pterm, we retrieve the command
in Unicode, store it in CONF_remote_cmd as UTF-8, and then in conpty.c
we can extract it in the same form and convert it back to Unicode to
pass losslessly to CreateProcessW. So now non-ACP Unicode works in
that part of the pterm command line.
This is the pathfinding change that proves it's possible for _one_
Conf setting to become Unicode-capable.
That seems like quite a small reward for all the refactoring in the
previous patches this week! But changing over one configuration
setting is enough to get started with: once all the bugs are out of
this one, we can try switching over some more.
Changing the type to CONF_TYPE_STR_AMBI is enough by itself to make
the configuration dialog box write it into Conf as UTF-8, because
conf_editbox_handler automatically checks whether that possibility is
available. However, setting the same Conf entry from the command line
isn't automatic: I had to add code in the handler for the -l
command-line option in cmdline.c.
This commit also doesn't yet handle the _other_ way to specify a
username on the command line: including it as part of the hostname
argument via "putty user@host" or similar. That's more difficult,
because it also requires deciding what to do about UTF-8 in the actual
hostname.
(That looks as if it ought to be possible: Windows should be able to
handle looking up Unicode hostnames if you use GetAddrInfoW() in place
of getaddrinfo(). But plumbing it through everything in between
cmdline.c and windows/network.c is a bigger job than I'm prepared to
do in this proof-of-concept commit.)
This begins the process of making PuTTY more able to handle Unicode
strings as a first-class type in its configuration. One of the new
types, CONF_TYPE_UTF8, looks physically just like CONF_TYPE_STR but
the semantics are that it's definitely encoded in UTF-8, instead of
'shrug, whatever the system locale's encoding is'.
Unfortunately, we can't yet switch over any Conf items to having that
type, because our data representations in saved configuration (both on
Unix and Windows) store char strings in the system encoding. So we'll
have to change that representation at the same time, which risks
breaking backwards compatibility with old PuTTYs reading the same
configuration.
So the other new type, CONF_TYPE_STR_AMBI, is intended as a
transitional form, recording a configuration setting that _might_ be
explicitly UTF-8 or might have the legacy 'shrug, whatever' semantics,
depending on where we got it from.
My general migration plan is that first I _enable_ Unicode support in
a Conf item, by turning it into STR_AMBI; the Unicode version of the
string (if any) is saved in a new location, and a best-effort
local-charset version is saved where it's always been. That way new
PuTTY can read the Unicode version, and old PuTTY reading that
configuration will behave no worse than it would have done already.
It would be nice to think that in the far future we've migrated
everything to STR_AMBI and can move them all to mandatory UTF-8,
obsoleting the old configuration. I think it's more likely we'll never
get there. But at least _new_ Conf items, with no backwards
compatibility requirement in the first place, can be CONF_TYPE_UTF8
where appropriate.
(In conf_get_str_ambi(), I considered making it mandatory via assert()
to pass the 'utf8' output pointer as non-NULL, to defend against lazy
adaptation of existing code by just changing the function call. But in
fact I think there's a legitimate use case for not caring if the
output is UTF-8 or not, because some of the existing SSH code
currently just shoves strings like usernames directly on to the wire
whether they're in the right encoding or not; so if you want to do the
correct UTF-8 thing where possible and preserve legacy behaviour if
not, then treating both classes of string the same _is_ the right
thing to do.)
This also requires linking the Unicode support into many Unix
applications that hadn't previously needed it.
These are now specified in conf.h and filled in by automated code,
which means test_conf can make sure we didn't forget to provide them.
The default for a mapping type (not that we currently have any unsaved
ones) is expected to be empty.
Also, while adding test_conf checks, I realised I hadn't filled in the
rest of the comment in conf.h. Belatedly updated that.
This allows a couple more settings to be treated automatically on
save, which are more complicated on load because they still honour
older alternative save keywords.
In particular, CONF_proxy_type and CONF_remote_qtitle_action now have
explicit enum mappings. These were needed for the automated save code,
but also, I've rewritten the custom load code to use them too. This
decouples the storage format of those settings from the order of
values in the internal enum, which is generally an advantage of
specifying storage enums explicitly.
Those two settings weren't already tested by test_conf, because I
wasn't changing them in previous commits. Now I've added extra code
that does test them, and verified it works when backported to commit
b567c9b2b5 where I introduced test_conf before beginning the main
refactoring.
A setting can also be specified explicitly as not loaded and saved at
all. There were quite a few commented that way, but now there's a
machine-readable indication of it.
test_conf will now check that all these settings make sense together -
things shouldn't have a save keyword unless they use it, and should
have one if they don't, and shouldn't specify combinations of options
that conflict.
(For that reason, test_conf is now also running the consistency check
before the main test, so that a missing keyword will cause an error
message _before_ it causes a segfault, saving some debugging!)
This is why I wrote conf.h in the form of macros that expanded to
named structure field assignments, instead of just filling it with
named structure field assignments directly. This way, I can #include
the same file again with different macro definitions, and build up a
list of what fields were set in what config options.
This new code checks that if a config option has a default, then the
type of the default matches the declared type of the option value
itself. That's what caught the two goofs in the previous commit.
This is also the part of test_conf that I _won't_ want to delete once
I've finished with the refactoring: it can stay there forever, doing
type checking at test time that the compiler isn't doing for me at
build time.
Ahem. Of course I've been running it interactively until now, so I
never noticed that I'd forgotten to fill in that important point. But
now it's run as part of my build, it should make sure to fail if it
fails.
This aims to be a reasonably exhaustive test of what happens if you
set Conf values to various things, and then save your session, and
find out what ends up in the storage. Or vice versa.
Currently, the test program is written to match the existing
behaviour. The idea is that I can refactor the code that does the
loading and saving, and if this test still passes, I've probably done
it right.
However, in the long term, this test will be a liability: it's yet
another place you have to add every new config option. So my plan is
to get rid of it again once the refactorings I'm planning are
finished.
Or rather, I'll get rid of _that_ part of its functionality. I also
suspect I'll have added new kinds of consistency check by then, which
won't be a liability in the same way, and which I'll want to keep.