This fixes a vulnerability that compromises NIST P521 ECDSA keys when
they are used with PuTTY's existing DSA nonce generation code. The
vulnerability has been assigned the identifier CVE-2024-31497.
PuTTY has been doing its DSA signing deterministically for literally
as long as it's been doing it at all, because I didn't trust Windows's
entropy generation. Deterministic nonce generation was introduced in
commit d345ebc2a5, as part of the initial version of our DSA
signing routine. At the time, there was no standard for how to do it,
so we had to think up the details of our system ourselves, with some
help from the Cambridge University computer security group.
More than ten years later, RFC 6979 was published, recommending a
similar system for general use, naturally with all the details
different. We didn't switch over to doing it that way, because we had
a scheme in place already, and as far as I could see, the differences
were not security-critical - just the normal sort of variation you
expect when any two people design a protocol component of this kind
independently.
As far as I know, the _structure_ of our scheme is still perfectly
fine, in terms of what data gets hashed, how many times, and how the
hash output is converted into a nonce. But the weak spot is the choice
of hash function: inside our dsa_gen_k() function, we generate 512
bits of random data using SHA-512, and then reduce that to the output
range by modular reduction, regardless of what signature algorithm
we're generating a nonce for.
In the original use case, this introduced a theoretical bias (the
output size is an odd prime, which doesn't evenly divide the space of
2^512 possible inputs to the reduction), but the theory was that since
integer DSA uses a modulus prime only 160 bits long (being based on
SHA-1, at least in the form that SSH uses it), the bias would be too
small to be detectable, let alone exploitable.
Then we reused the same function for NIST-style ECDSA, when it
arrived. This is fine for the P256 curve, and even P384. But in P521,
the order of the base point is _greater_ than 2^512, so when we
generate a 512-bit number and reduce it, the reduction never makes any
difference, and our output nonces are all in the first 2^512 elements
of the range of about 2^521. So this _does_ introduce a significant
bias in the nonces, compared to the ideal of uniformly random
distribution over the whole range. And it's been recently discovered
that a bias of this kind is sufficient to expose private keys, given a
manageably small number of signatures to work from.
(Incidentally, none of this affects Ed25519. The spec for that system
includes its own idea of how you should do deterministic nonce
generation - completely different again, naturally - and we did it
that way rather than our way, so that we could use the existing test
vectors.)
The simplest fix would be to patch our existing nonce generator to use
a longer hash, or concatenate a couple of SHA-512 hashes, or something
similar. But I think a more robust approach is to switch it out
completely for what is now the standard system. The main reason why I
prefer that is that the standard system comes with test vectors, which
adds a lot of confidence that I haven't made some other mistake in
following my own design.
So here's a commit that adds an implementation of RFC 6979, and
removes the old dsa_gen_k() function. Tests are added based on the
RFC's appendix of test vectors (as many as are compatible with the
more limited API of PuTTY's crypto code, e.g. we lack support for the
NIST P192 curve, or for doing integer DSA with many different hash
functions). One existing test changes its expected outputs, namely the
one that has a sample key pair and signature for every key algorithm
we support.
While trying to get an upcoming piece of code through testsc, I had
trouble - _yet again_ - with the way that control flow diverges inside
the glibc implementations of functions like memcpy and memset,
depending on the alignment of the input blocks _above_ the alignment
guaranteed by malloc, so that doing the same sequence of malloc +
memset can lead to different control flow. (I believe this is done
either for cache performance reasons or SIMD alignment requirements,
or both: on x86, some SIMD instructions require memory alignment
beyond what malloc guarantees, which is also awkward for our x86
hardware crypto implementations.)
My previous effort to normalise this problem out of sclog's log files
worked by wrapping memset and all its synonyms that I could find. But
this weekend, that failed for me, and the reason appears to be ifuncs.
I'm aware of the great irony of committing code to a security project
with a log message saying something vague about ifuncs, on the same
weekend that it came to light that commits matching that description
were one of the methods used to smuggle a backdoor into the XZ Utils
project (CVE-2024-3094). So I'll bend over backwards to explain both
what I think is going on, and why this _isn't_ a weird ifunc-related
backdooring attempt:
When I say I 'wrap' memset, I mean I use DynamoRIO's 'drwrap' API to
arrange that the side-channel test rig calls a function of mine before
and after each call to memset. The way drwrap works is to look up the
symbol address in either the main program or a shared library; in this
case, it's a shared library, namely libc.so. Then it intercepts call
instructions with exactly that address as the target.
Unfortunately, what _actually_ happens when the main program calls
memset is more complicated. First, control goes to the PLT entry for
memset (still in the main program). In principle, that loads a GOT
entry containing the address of memset (filled in by ld.so), and jumps
to it. But in fact the GOT entry varies its value through the program;
on the first call, it points to a resolver function, whose job is to
_find out_ the address of memset. And in the version of libc.so I'm
currently running, that resolver is an STT_GNU_IFUNC indirection
function, which tests the host CPU's capabilities, and chooses an
actual implementation of memset depending on what it finds. (In my
case, it looks as if it's picking one that makes extensive use of x86
SIMD.) To avoid the overhead of doing this on every call, the returned
function pointer is then written into the main program's GOT entry for
memset, overwriting the address of the resolver function, so that the
_next_ call the main program makes through the same PLT entry will go
directly to the memset variant that was chosen.
And the problem is that, after this has happened, none of the new
control flow ever goes near the _official_ address of memset, as read
out of libc.so's dynamic symbol table by DynamoRIO. The PLT entry
isn't at that address, and neither is the particular SIMD variant that
the resolver ended up choosing. So now my wrapper on memset is never
being invoked, and memset cheerfully generates different control flow
in runs of my crypto code that testsc expects to be doing exactly the
same thing as each other, and all my tests fail spuriously.
My solution, at least for the moment, is to completely abandon the
strategy of wrapping memset. Instead, let's just make it behave the
same way every time, by forcing all the affected memory allocations to
have extra-strict alignment. I found that 64-byte alignment is not
good enough to eliminate memset-related test failures, but 128-byte
alignment is.
This would be tricky in itself, if it weren't for the fact that PuTTY
already has its own wrapper function on malloc (for various reasons),
which everything in our code already uses. So I can divert to C11's
aligned_alloc() there. That in turn is done by adding a new #ifdef to
utils/memory.c, and compiling it with that #ifdef into a new object
library that is included in testsc, superseding the standard memory.o
that would otherwise be pulled in from our 'utils' static library.
With the previous memset-compensator removed, this means testsc is now
dependent on having aligned_alloc() available. So we test for it at
cmake time, and don't build testsc at all if it can't be found. This
shouldn't bother anyone very much; aligned_alloc() is available on
_my_ testsc platform, and if anyone else is trying to run this test
suite at all, I expect it will be on something at least as new as
that.
(One awkward thing here is that we can only replace _new_ allocations
with calls to aligned_alloc(): C11 provides no aligned version of
realloc. Happily, this doesn't currently introduce any new problems in
testsc. If it does, I might have to do something even more painful in
future.)
So, why isn't this an ifunc-related backdoor attempt? Because (and you
can check all of this from the patch):
1. The memset-wrapping code exists entirely within the DynamoRIO
plugin module that lives in test/sclog. That is not used in
production, only for running the 'testsc' side-channel tester.
2. The memset-wrapping code is _removed_ by this patch, not added.
3. None of this code is dealing directly with ifuncs - only working
around the unwanted effects on my test suite from the fact that
they exist somewhere else and introduce awkward behaviour.
This takes a plain ssh_hashalg, and constructs the most natural kind
of HMAC wrapper around it, taking its key length and output length
to be the hash's output length. In other words, it converts SHA-foo
into exactly the thing usually called HMAC-SHA-foo.
It does it by constructing a new ssh2_macalg vtable, and including it
in the same memory allocation as the actual hash object. That's the
first time in PuTTY I've done it this way.
Nothing yet uses this, but a new piece of code is about to.
With Google Groups severing its connection to Usenet today[*], newsgroup
access will require what today are quite specialised tools; and even
before that (and before the recent spam floods, which will hopefully now
cease) they didn't seem super useful as a way of getting PuTTY support.
[*] https://support.google.com/groups/answer/11036538
I've just removed the whole section, since we couldn't think of an
obvious consensus replacement venue -- certainly not one where we hang
out ourselves. There's Stackmumble and places like that, but we didn't
think of anywhere concrete to point to.
This has led to some rather unsatisfactorily vague 'seek help from some
public forum' wording in places, which could perhaps use revision.
As I mentioned in the previous commit, this merge was nontrivial
enough that it wasn't a good idea for the release checklist to suggest
doing it in a hurry. And indeed, now I look at it again this morning,
there are mistakes: a memory leak of ConsoleIO on the abort path from
both affected functions, and a missing space in one of the prompt
messages. Now both fixed.
'Merge the release branch to main' shouldn't be half way down the
procedure of actually _doing_ the release. Sometimes it's not trivial!
This time, for example, major changes in windows/console.c had taken
place on main (the ConsoleIO abstraction) which conflicted with also-
major changes on the release branch (the rework of the weak-crypto
warning messages to use SeatDialogText, for the Terrapin warning).
Suddenly I realise it makes more sense to prepare the merge in
advance, and then if it's difficult you get time to think about that
and solve it at leisure.
Also, I'm on Mastodon these days, and that seems like an obviously
useful place to announce releases. Added a checklist item to draft a
release toot, and another one to send it.
Lastly, I managed to miss my own suggested template wording for MS
Store 'what's new' text, even though it was right there in the
checklist! Expanded it out into a display paragraph or two, so that
it's more obvious.
This involved a trivial merge conflict fix in terminal.c because of
the way the cherry-pick 73b41feba5 differed from its original
bdbd5f429c.
But a more significant rework was needed in windows/console.c, because
the updates to confirm_weak_* conflicted with the changes on main to
abstract out the ConsoleIO system.
These handy wrappers on the verbose underlying Win32 registry API have
to lose some expressiveness, and one thing they lost was the ability
to open a registry key without asking for both read and write access.
This meant they couldn't be used for accessing keys not owned by the
calling user.
So far, I've only used them for accessing PuTTY's own saved data,
which means that hasn't been a problem. But I want to use them
elsewhere in an upcoming commit, so I need to fix that.
The obvious thing would be to change the meaning of the existing
'create' boolean flag so that if it's false, we also don't request
write access. The rationale would be that you're either reading or
writing, and if you're writing you want both RW access and to create
keys that don't already exist. But in fact that's not true: you do
want to set create==false and have write access in the case where
you're _deleting_ things from the key (or the whole key). So we really
do need three ways to call the wrapper function.
Rather than add another boolean field to every call site or mess about
with an 'access type' enum, I've taken an in-between route: the
underlying open_regkey_fn *function* takes a 'create' and a 'write'
flag, but at call sites, it's wrapped with a macro anyway (to append
NULL to the variadic argument list), so I've just made three macros
whose names request different access. That makes call sites marginally
_less_ verbose, while still
(cherry picked from commit 7339e00f4a)
The Terrapin vulnerability affects the modified binary packet protocol
used with ChaCha20+Poly1305, and also CBC-mode ciphers in ETM mode.
It's best prevented by the new strict-kex mode, but if the server
can't handle that protocol alteration, another approach is to change
PuTTY's configuration so that it will negotiate a different algorithm.
That may not be possible either (an obvious case being if the server
has been manually configured to _only_ support vulnerable modes). But
if it is possible, then it would be nice for us to detect that and
show how to do it.
That could be a hard problem in general, but the most likely cause of
it is configuring ChaCha20 to the top of the cipher list, so that it's
selected ahead of things that aren't vulnerable. And it's reasonably
easy to do just one fantasy-renegotiation, having moved ChaCha20 down
to below the warn line, and see if that sorts it out. If it does, we
can pass on that advice to the user.
This will allow it to be called in a second circumstance where we're
trying to find out whether something _would_ have worked, so that we
never want to terminate the connection.
If the KEXINIT exchange results in a vulnerable cipher mode, we now
give a warning, similar to the 'we selected a crypto primitive below
the warning threshold' one. But there's nothing we can do about it at
that point other than let the user abort the connection.
This is enabled via magic signalling keywords in the kex algorithms
list, similarly to ext-info-{c,s}. If both sides announce the
appropriate keyword, then this signals two changes to the standard SSH
protocol:
1. NEWKEYS resets packet sequence numbers: following any NEWKEYS, the
next packet sent in the same direction has sequence number zero.
2. No extraneous packets such as SSH_MSG_IGNORE are permitted during
the initial cleartext phase of the SSH protocol.
These two changes between them defeat the 'Terrapin' vulnerability,
aka CVE-2023-48795: a protocol-level exploit in which, for example, a
MITM injects a server-to-client SSH_MSG_IGNORE during the cleartext
phase, and deletes an initial segment of the server-to-client
encrypted data stream that it guesses is the right size to be the
server's SSH_MSG_EXT_INFO, so that both sides agree on the sequence
number of the _following_ server-to-client packet. In OpenSSH's
modified binary packet protocol modes this attack can go completely
undetected, and force a downgrade to (for example) SHA-1 based RSA.
(The ChaCha20/Poly1305 binary packet protocol is most vulnerable,
because it reinitialises the IV for each packet from scratch based on
the sequence number, so the keystream doesn't get out of sync.
Exploiting this in OpenSSH's ETM modes requires additional faff to
resync the keystream, and even then, the client likely sees a
corrupted SSH message at the start of the stream - but it will just
send SSH_MSG_UNIMPLEMENTED in response to that and proceed anyway. CBC
modes and standard AES SDCTR aren't vulnerable, because their MACs are
based on the plaintext rather than the ciphertext, so faking a correct
MAC on the corrupted packet requires the attacker to know what it
would decrypt to.)
This centralises the messages for weak crypto algorithms (general, and
host keys in particular, the latter including a list of all the other
available host key types) into ssh/common.c, in much the same way as
we previously did for ordinary host key warnings.
The reason is the same too: I'm about to want to vary the text in one
of those dialog boxes, so it's convenient to start by putting it
somewhere that I can modify just once.
I'm about to want to use the same code to check for something else.
It's only a handful of lines, but even so.
Also, since the string constants are mentioned several times, this
seems like a good moment to lift them out into reusable static const
ptrlens.
ssh2_scan_kexinits must check to see whether it's behaving as an SSH
client or server, in order to decide whether to look for "ext-info-s"
in the server's KEXINIT or "ext-info-c" in the client's, respectively.
This check was done by testing the pointer 'server_hostkeys' to see if
it was non-NULL. I think I must have imagined that a variable of that
name meant "the host keys we have available to offer the client, if we
are the server", as the similarly named parameter 'our_hostkeys' in
write_kexinit_lists in fact does mean. So I expected it to be non-NULL
for the server and NULL for the client, and coded accordingly.
But in fact it's used by the client: it collects host key types the
client has _seen_ from the server, in order to offer them as cross-
certification actions in the specials menu. Moreover, it's _always_
non-NULL, because in the server, it's easier to leave it present but
empty than to get rid of it.
So this code was always behaving as if it was the server, i.e. it was
looking for "ext-info-c" in the client KEXINIT. When it was in fact
the client, that test would always succeed, because we _sent_ that
KEXINIT ourselves!
But nobody ever noticed, because when we're the client, it doesn't
matter whether we saw "ext-info-c", because we don't have any reason
to send EXT_INFO from client to server. We're only concerned with
server-to-client EXT_INFO. So this embarrassing bug had no actual
effect.
I encountered an instance of this sequence in the log files from a
clang CI build. The payload text inside the wrapper was
"bk;t=1697630539879"; I don't know what the "bk" stood for, but the
second half appears to be a timestamp in milliseconds since the Unix
epoch.
I don't think there's anything we can (or should) actually _do_ with
this sequence, but I think it's useful to at least recognise it, so
that it can be conveniently discarded.
(cherry picked from commit 7b10e34b8f)
Shortly after the previous commit I spotted another definitely missing
display update: if you send the byte 0x7F, aka 'destructive
backspace', then the display didn't update immediately.
That was two in a row, so I did an eyeball review of the whole
terminal state machine to the best of my ability. Found a couple more
borderline ones, but also, found that the entire VT52 sub-state-
machine had a blanket seen_disp_event which really _shouldn't_ have
been there, because half the VT52 sequences aren't actually display-
modifying updates.
To make this _slightly_ less error-prone, I've sunk a number of
seen_disp_update calls into subroutines that aren't the top-level
term_out(). For example, erase_lots(), scroll(), move() and
swap_screen() now all call seen_disp_update within themselves, so
their call sites don't all have to remember to.
There are probably further bugs after this upheaval, but I think it's
moving in generally the right direction.
(cherry picked from commit 6a6efd36aa)
These escape sequences immediately change the display of the line
they're invoked on, so they need to trigger a display update. But they
weren't, and I suppose we must have never noticed before due to the
complete confusion fixed in commit bdbd5f429c.
(cherry picked from commit aa1552bc82)
Just happened to jump out at me in an eyeball inspection just now. I
carefully moved all the protocol byte-value constants into a header
file with mnemonic names, but I still hard-coded SOCKS4_REPLY_VERSION
in the text of one diagnostic, and I got the wrong one of
SOCKS5_REQUEST_VERSION and SOCKS5_REPLY_VERSION at one point in the
code. Both benign (the right value was there, juste called by the
wrong name).
Also fixed some missing whitespace, in passing. (Probably the line it
was missing from had once been squashed up closer to the right margin.)
(cherry picked from commit 1cd0f1787f)
Recently I encountered a CLI tool that took tens of seconds to run,
and produced no _visible_ output, but wrote ESC[0m to the terminal a
few times during its operation. (Probably by mistake. In other modes
it does print colourful messages, so I expect a 'reset colour' call
was accidentally outside the 'if' statement containing the rest of the
diagnostic it followed. Or something along those lines.)
I noticed this because every ESC[0m reset my pterm scrollback to the
bottom, which wasn't very helpful, and was unintentional on pterm's
part (as _well_ as on the part of the tool). But I can fix pterm!
At first glance the code _looked_ sensible: terminal.c contains calls
to seen_disp_event(term) whenever terminal output does something that
requires a redraw of the terminal window. Those are also the updates
that should count as 'reset scrollback on display activity'. And
ESC[0m, along with the rest of the SGR handler, correctly contained no
such call. So how did a display update happen at all?
The code was confusingly tangled up with the code that responds to
terminal activity by resetting the phase of the blinking cursor (if
any). term_reset_cblink() was calling seen_disp_event() (when surely
it should be the other way round!), and also, term_reset_cblink() was
called whenever _any_ terminal output data arrived. That combination
meant that any byte output to the terminal at all turned out to count
as display activity, whether or not it changed the screen contents.
Additionally, the other scrollback-reset flag, 'reset scrollback on
keypress', was handled by calling seen_disp_event() from the keyboard
handler. But display events and keyboard events are supposed to be
_independent_ potential causes of scrollback resets - it doesn't make
any sense to handle one by treating it as the other!
So I've reorganised the code completely:
- the seen_disp_event *flag* is now gone. Instead, the
seen_disp_event function tests the scroll_on_disp flag, and if set,
resets the scroll position immediately and sets the general
'scrollbar needs updating' flag.
- keyboard input is handled by doing exactly the same thing except
testing the scroll_on_key flag, so the two systems are properly
independent. That code calls term_schedule_update so that the
terminal will be redrawn as a result of the scroll, but doesn't
also call seen_disp_event() for the rest of the full treatment.
- the term_update code that does the scrollbar update is much
simpler, since now it only needs to test that one flag.
- I also had to set that flag explicitly in scroll() so that the
scrollbar would still be updated as a result of the scrollback size
changing. I think that must have been happening entirely by
accident before.
- term_reset_cblink is subsumed into seen_disp_event, so that only
_substantive_ display updates cause the cursor blink phase to reset
to the start of the solid period.
Result: if programs output no-op sequences like ESC[0m, or if you
press keys that don't echo, then the cursor will carry on blinking
normally, and (if you don't also have scroll_on_key set) the
scrollback won't be reset. And the code is slightly shorter than it
was before, and hopefully more sensible too.
(However, other classes of no-op activity _will_ still cause a cursor
blink phase change and a scrollback reset, such as sending a
cursor-positioning sequence that puts the cursor in the same place it
was already - even something as simple as ^M when already at the start
of the line. It might be nice to fix that, but it's much more
difficult: you'd have to either put a complicated and error-prone test
at every seen_disp_event call site, or else expensively diff the
entire visible terminal state against how it was before. And to avoid
a nondeterministic dependency on the terminal update cooldown, that
diff would have to be done at the granularity of individual control
sequences rather than a bounded number of times a second. I'd rather
not!)
(cherry picked from commit bdbd5f429c)
(cherry-picker's note: I also had to remove the initialisation of
the removed field term->seen_disp_event from term_init(), which didn't
need doing on main because commit 74aa3cb7fb had already replaced
all the boring parts of term_init with a big memset)
A user just reported that 0.79 doesn't build out of the box on Ubuntu
14.04 (trusty), because although its gcc (4.8.4) does _support_ C99,
it doesn't enable it without a non-default -std option. The user was
able to work around the problem by defining CMAKE_C_FLAGS=-std=gnu99,
but it would have been nicer if we'd done that automatically. Setting
CMAKE_C_STANDARD causes cmake to do so.
(This isn't a regression of 0.79 over 0.78 as far as I know; the user
in question said they had last built 0.76.)
I was surprised to find Ubuntu 14.04 still in use at all, but a quick
web search revealed that its support has been extended until next
year, so fair enough, I suppose. It's also running a cmake way older
than we support, but apparently that can be worked around via
Kitware's binary tarball downloads (which do still run on 14.04).
This is a bit unsatisfactory: I'd prefer to ask for C standards
support of _at least_ C99 level, and C11 if possible. Then I could
test for the presence of C11 features via check_c_source_compiles, and
use them opportunistically (e.g. in macro definitions). But as far as
I can see, cmake has no built-in support for asking for a standards
level of 'as new as you can get, but no older than 99'. Oh well.
(In any case, the thing I'd find most useful from C11 is _Generic, and
since that's in implementation namespace, compilers can - and do -
support it in C99 mode anyway. So it's probably fine, at least for now.)
(cherry picked from commit bd27962cd9)
Experimenting with different compile flags pointed out two instances
of typedefing the same name twice (though benignly, with the same
definition as well). PsocksDataSink was typedefed a couple of lines
above its struct definition and then again _with_ its struct
definition; cliloop_continue_t was typedefed in unix/platform.h and
didn't need defining again in unix/psocks.c.
(cherry picked from commit 3d34007889)
That's two releases running I've got most of the way through the
mechanical upload processes and suddenly realised I still have a piece
of creative writing to do. A small one, but even so, there's no reason
it couldn't have been prepared a week in advance like the rest of the
announcements and changelogs. The only reason I didn't is that the
checklist didn't remind me to. Now it does.
(cherry picked from commit da550c3158)
I encountered an instance of this sequence in the log files from a
clang CI build. The payload text inside the wrapper was
"bk;t=1697630539879"; I don't know what the "bk" stood for, but the
second half appears to be a timestamp in milliseconds since the Unix
epoch.
I don't think there's anything we can (or should) actually _do_ with
this sequence, but I think it's useful to at least recognise it, so
that it can be conveniently discarded.
Shortly after the previous commit I spotted another definitely missing
display update: if you send the byte 0x7F, aka 'destructive
backspace', then the display didn't update immediately.
That was two in a row, so I did an eyeball review of the whole
terminal state machine to the best of my ability. Found a couple more
borderline ones, but also, found that the entire VT52 sub-state-
machine had a blanket seen_disp_event which really _shouldn't_ have
been there, because half the VT52 sequences aren't actually display-
modifying updates.
To make this _slightly_ less error-prone, I've sunk a number of
seen_disp_update calls into subroutines that aren't the top-level
term_out(). For example, erase_lots(), scroll(), move() and
swap_screen() now all call seen_disp_update within themselves, so
their call sites don't all have to remember to.
There are probably further bugs after this upheaval, but I think it's
moving in generally the right direction.
These escape sequences immediately change the display of the line
they're invoked on, so they need to trigger a display update. But they
weren't, and I suppose we must have never noticed before due to the
complete confusion fixed in commit bdbd5f429c.
Just happened to jump out at me in an eyeball inspection just now. I
carefully moved all the protocol byte-value constants into a header
file with mnemonic names, but I still hard-coded SOCKS4_REPLY_VERSION
in the text of one diagnostic, and I got the wrong one of
SOCKS5_REQUEST_VERSION and SOCKS5_REPLY_VERSION at one point in the
code. Both benign (the right value was there, juste called by the
wrong name).
Also fixed some missing whitespace, in passing. (Probably the line it
was missing from had once been squashed up closer to the right margin.)
These are now specified in conf.h and filled in by automated code,
which means test_conf can make sure we didn't forget to provide them.
The default for a mapping type (not that we currently have any unsaved
ones) is expected to be empty.
Also, while adding test_conf checks, I realised I hadn't filled in the
rest of the comment in conf.h. Belatedly updated that.
This allows a couple more settings to be treated automatically on
save, which are more complicated on load because they still honour
older alternative save keywords.
In particular, CONF_proxy_type and CONF_remote_qtitle_action now have
explicit enum mappings. These were needed for the automated save code,
but also, I've rewritten the custom load code to use them too. This
decouples the storage format of those settings from the order of
values in the internal enum, which is generally an advantage of
specifying storage enums explicitly.
Those two settings weren't already tested by test_conf, because I
wasn't changing them in previous commits. Now I've added extra code
that does test them, and verified it works when backported to commit
b567c9b2b5 where I introduced test_conf before beginning the main
refactoring.
A setting can also be specified explicitly as not loaded and saved at
all. There were quite a few commented that way, but now there's a
machine-readable indication of it.
test_conf will now check that all these settings make sense together -
things shouldn't have a save keyword unless they use it, and should
have one if they don't, and shouldn't specify combinations of options
that conflict.
(For that reason, test_conf is now also running the consistency check
before the main test, so that a missing keyword will cause an error
message _before_ it causes a segfault, saving some debugging!)
It's not used! The only config items that specified it are doing their
load/save in a custom way anyway, and they _don't_ use anything
resembling that enum - instead, they map the integer values in Conf to
strings in the storage format. That enum was a total lie and an
artefact of my conversion macros. Ahem.
This is why I wrote conf.h in the form of macros that expanded to
named structure field assignments, instead of just filling it with
named structure field assignments directly. This way, I can #include
the same file again with different macro definitions, and build up a
list of what fields were set in what config options.
This new code checks that if a config option has a default, then the
type of the default matches the declared type of the option value
itself. That's what caught the two goofs in the previous commit.
This is also the part of test_conf that I _won't_ want to delete once
I've finished with the refactoring: it can stay there forever, doing
type checking at test time that the compiler isn't doing for me at
build time.
I suspected there'd be one or two mistakes introduced by that
transcription in spite of the test suite, and there were!
CONF_mouseautocopy had its default labelled as int rather than bool,
because it didn't _look_ boolean to my conversion scripts or to my
eyeballs - but its default value is actually a macro that expands to
'true' or 'false' (depending on platform), so it is really.
And CONF_supdup_ascii_set, conversely, is an int which had its default
labelled as bool, which was due to my conversion scripts faithfully
transcribing the same confusion in the original code.
Ahem. Of course I've been running it interactively until now, so I
never noticed that I'd forgotten to fill in that important point. But
now it's run as part of my build, it should make sure to fail if it
fails.
The new ConfKeyInfo structure now includes some fields indicating how
to load and save each config option: what keyword it's stored under in
the saved settings file, and what its default value should be set to
when loading a session that doesn't mention it. (Including, of course,
loading the null session at program startup.)
So far, this only applies to the saved settings that are sufficiently
simple: a single integer, string or boolean value whose internal
format matches its storage format, or an integer value consisting of a
finite enumeration with a fixed mapping between its internal and
storage formats. Anything more difficult than that - mappings,
variable defaults, config options tied together, options that still
support a legacy save format alongside the up-to-date one, things
under #ifdef - hasn't yet been tampered with.
This allows a large amount of repetitive code in settings.c to be
deleted, and replaced by simple loops over the conf_key_info array
doing all the easy work. The remaining manual load/save code per
option is all there because it's difficult in some way.
The transitional test_conf program still passes after this upheaval.
This array is planned to be exposed more widely, and used for more
purposes than just checking the types of Conf options. In this commit
it just takes over from the two previous smaller arrays, and adds no
extra data.
This aims to be a reasonably exhaustive test of what happens if you
set Conf values to various things, and then save your session, and
find out what ends up in the storage. Or vice versa.
Currently, the test program is written to match the existing
behaviour. The idea is that I can refactor the code that does the
loading and saving, and if this test still passes, I've probably done
it right.
However, in the long term, this test will be a liability: it's yet
another place you have to add every new config option. So my plan is
to get rid of it again once the refactorings I'm planning are
finished.
Or rather, I'll get rid of _that_ part of its functionality. I also
suspect I'll have added new kinds of consistency check by then, which
won't be a liability in the same way, and which I'll want to keep.
FontSpec is completely different per platform; not only is the
structure type different, not only are the behind-the-scenes
implementations of copy and free functions different, but even the API
of the constructor function is different. Cross-platform code can't
construct a FontSpec at all. This makes it hard to write a
cross-platform test program that works with them.
So, as a nasty bodge, I'll allow test programs to #define
SUPERSEDE_FONTSPEC_FOR_TESTING before including putty.h. Then they can
provide their own definition of FontSpec, and they also take
responsibility for superseding all the other functions that work with
one.
Normally you don't ever want to have a Conf structure that doesn't
have an entry for every primary key, because the code that uses Conf
to get real work done will fail assertions if lookups fail. But test
programs manipulating Conf in unusual ways are a special case.
(In particular, one thing you _can_ legally do with an empty Conf is
to call load_open_settings() to populate it. That has to be legal,
because it's how a Conf gets populated in the first place, after it's
initially created empty.)
Recently I encountered a CLI tool that took tens of seconds to run,
and produced no _visible_ output, but wrote ESC[0m to the terminal a
few times during its operation. (Probably by mistake. In other modes
it does print colourful messages, so I expect a 'reset colour' call
was accidentally outside the 'if' statement containing the rest of the
diagnostic it followed. Or something along those lines.)
I noticed this because every ESC[0m reset my pterm scrollback to the
bottom, which wasn't very helpful, and was unintentional on pterm's
part (as _well_ as on the part of the tool). But I can fix pterm!
At first glance the code _looked_ sensible: terminal.c contains calls
to seen_disp_event(term) whenever terminal output does something that
requires a redraw of the terminal window. Those are also the updates
that should count as 'reset scrollback on display activity'. And
ESC[0m, along with the rest of the SGR handler, correctly contained no
such call. So how did a display update happen at all?
The code was confusingly tangled up with the code that responds to
terminal activity by resetting the phase of the blinking cursor (if
any). term_reset_cblink() was calling seen_disp_event() (when surely
it should be the other way round!), and also, term_reset_cblink() was
called whenever _any_ terminal output data arrived. That combination
meant that any byte output to the terminal at all turned out to count
as display activity, whether or not it changed the screen contents.
Additionally, the other scrollback-reset flag, 'reset scrollback on
keypress', was handled by calling seen_disp_event() from the keyboard
handler. But display events and keyboard events are supposed to be
_independent_ potential causes of scrollback resets - it doesn't make
any sense to handle one by treating it as the other!
So I've reorganised the code completely:
- the seen_disp_event *flag* is now gone. Instead, the
seen_disp_event function tests the scroll_on_disp flag, and if set,
resets the scroll position immediately and sets the general
'scrollbar needs updating' flag.
- keyboard input is handled by doing exactly the same thing except
testing the scroll_on_key flag, so the two systems are properly
independent. That code calls term_schedule_update so that the
terminal will be redrawn as a result of the scroll, but doesn't
also call seen_disp_event() for the rest of the full treatment.
- the term_update code that does the scrollbar update is much
simpler, since now it only needs to test that one flag.
- I also had to set that flag explicitly in scroll() so that the
scrollbar would still be updated as a result of the scrollback size
changing. I think that must have been happening entirely by
accident before.
- term_reset_cblink is subsumed into seen_disp_event, so that only
_substantive_ display updates cause the cursor blink phase to reset
to the start of the solid period.
Result: if programs output no-op sequences like ESC[0m, or if you
press keys that don't echo, then the cursor will carry on blinking
normally, and (if you don't also have scroll_on_key set) the
scrollback won't be reset. And the code is slightly shorter than it
was before, and hopefully more sensible too.
(However, other classes of no-op activity _will_ still cause a cursor
blink phase change and a scrollback reset, such as sending a
cursor-positioning sequence that puts the cursor in the same place it
was already - even something as simple as ^M when already at the start
of the line. It might be nice to fix that, but it's much more
difficult: you'd have to either put a complicated and error-prone test
at every seen_disp_event call site, or else expensively diff the
entire visible terminal state against how it was before. And to avoid
a nondeterministic dependency on the terminal update cooldown, that
diff would have to be done at the granularity of individual control
sequences rather than a bounded number of times a second. I'd rather
not!)
A user just reported that 0.79 doesn't build out of the box on Ubuntu
14.04 (trusty), because although its gcc (4.8.4) does _support_ C99,
it doesn't enable it without a non-default -std option. The user was
able to work around the problem by defining CMAKE_C_FLAGS=-std=gnu99,
but it would have been nicer if we'd done that automatically. Setting
CMAKE_C_STANDARD causes cmake to do so.
(This isn't a regression of 0.79 over 0.78 as far as I know; the user
in question said they had last built 0.76.)
I was surprised to find Ubuntu 14.04 still in use at all, but a quick
web search revealed that its support has been extended until next
year, so fair enough, I suppose. It's also running a cmake way older
than we support, but apparently that can be worked around via
Kitware's binary tarball downloads (which do still run on 14.04).
This is a bit unsatisfactory: I'd prefer to ask for C standards
support of _at least_ C99 level, and C11 if possible. Then I could
test for the presence of C11 features via check_c_source_compiles, and
use them opportunistically (e.g. in macro definitions). But as far as
I can see, cmake has no built-in support for asking for a standards
level of 'as new as you can get, but no older than 99'. Oh well.
(In any case, the thing I'd find most useful from C11 is _Generic, and
since that's in implementation namespace, compilers can - and do -
support it in C99 mode anyway. So it's probably fine, at least for now.)
Experimenting with different compile flags pointed out two instances
of typedefing the same name twice (though benignly, with the same
definition as well). PsocksDataSink was typedefed a couple of lines
above its struct definition and then again _with_ its struct
definition; cliloop_continue_t was typedefed in unix/platform.h and
didn't need defining again in unix/psocks.c.
That's two releases running I've got most of the way through the
mechanical upload processes and suddenly realised I still have a piece
of creative writing to do. A small one, but even so, there's no reason
it couldn't have been prepared a week in advance like the rest of the
announcements and changelogs. The only reason I didn't is that the
checklist didn't remind me to. Now it does.