There's no difficulty implementing these on either platform. Windows
has native Unicode support for its edit boxes: we can set and retrieve
the text as a wide-character string, and converting that to and from
UTF-8 is easy. And GTK has specified its edit boxes as being UTF-8 all
along, no matter what the system locale is.
This begins the process of making PuTTY more able to handle Unicode
strings as a first-class type in its configuration. One of the new
types, CONF_TYPE_UTF8, looks physically just like CONF_TYPE_STR but
the semantics are that it's definitely encoded in UTF-8, instead of
'shrug, whatever the system locale's encoding is'.
Unfortunately, we can't yet switch over any Conf items to having that
type, because our data representations in saved configuration (both on
Unix and Windows) store char strings in the system encoding. So we'll
have to change that representation at the same time, which risks
breaking backwards compatibility with old PuTTYs reading the same
configuration.
So the other new type, CONF_TYPE_STR_AMBI, is intended as a
transitional form, recording a configuration setting that _might_ be
explicitly UTF-8 or might have the legacy 'shrug, whatever' semantics,
depending on where we got it from.
My general migration plan is that first I _enable_ Unicode support in
a Conf item, by turning it into STR_AMBI; the Unicode version of the
string (if any) is saved in a new location, and a best-effort
local-charset version is saved where it's always been. That way new
PuTTY can read the Unicode version, and old PuTTY reading that
configuration will behave no worse than it would have done already.
It would be nice to think that in the far future we've migrated
everything to STR_AMBI and can move them all to mandatory UTF-8,
obsoleting the old configuration. I think it's more likely we'll never
get there. But at least _new_ Conf items, with no backwards
compatibility requirement in the first place, can be CONF_TYPE_UTF8
where appropriate.
(In conf_get_str_ambi(), I considered making it mandatory via assert()
to pass the 'utf8' output pointer as non-NULL, to defend against lazy
adaptation of existing code by just changing the function call. But in
fact I think there's a legitimate use case for not caring if the
output is UTF-8 or not, because some of the existing SSH code
currently just shoves strings like usernames directly on to the wire
whether they're in the right encoding or not; so if you want to do the
correct UTF-8 thing where possible and preserve legacy behaviour if
not, then treating both classes of string the same _is_ the right
thing to do.)
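To make the intended calling pattern concrete, here's a minimal sketch
of a call site. The prototype is assumed to be along the lines of
conf_get_str_ambi(Conf *, int key, bool *utf8), and CONF_username is
used purely as an example key; the real declarations may differ.

    bool is_utf8;
    const char *name = conf_get_str_ambi(conf, CONF_username, &is_utf8);
    if (is_utf8) {
        /* definitely UTF-8: safe to treat as Unicode */
    } else {
        /* legacy 'whatever the system locale is' semantics */
    }

    /* The legitimate NULL case described above: we don't care which
     * encoding we got, because we're shoving the string on to the wire
     * unchanged either way. */
    const char *raw = conf_get_str_ambi(conf, CONF_username, NULL);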
This also requires linking the Unicode support into many Unix
applications that hadn't previously needed it.
The previous mb_to_wc and wc_to_mb had horrible and also buggy APIs.
This commit introduces a fresh pair of functions to replace them,
which generate output by writing to a BinarySink. So it's now up to
the caller to decide whether it wants the output written to a
fixed-size buffer with overflow checking (via buffer_sink), or
dynamically allocated, or even written directly to some other output
channel.
Nothing uses the new functions yet. I plan to migrate things over in
upcoming commits.
What was wrong with the old APIs: they had that awkward undocumented
Windows-specific 'flags' parameter that I described in the previous
commit and took out of the dup_X_to_Y wrappers. But much worse, the
semantics for buffer overflow were not just undocumented but actually
inconsistent. dup_wc_to_mb() in utils assumed that the underlying
wc_to_mb would fill the buffer nearly full and return the size of data
it wrote. In fact, this was untrue in the case where wc_to_mb called
WideCharToMultiByte: that returns straight-up failure, setting the
Windows error code to ERROR_INSUFFICIENT_BUFFER. It _does_ partially
fill the output buffer, but doesn't tell you how much it wrote!
What's wrong with the new API: it's a bit awkward to write a sequence
of wchar_t in native byte order to a byte-oriented BinarySink, so
people using put_mb_to_wc directly have to do some annoying pointer
casting. But I think that's less horrible than the previous APIs.
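As a rough illustration of that awkwardness (prototypes assumed here,
not quoted from the real header): writing wide-character output through
a byte-oriented buffer_sink might look something like this, with
'codepage', 'input' and 'inputlen' standing for whatever the caller has
to hand.

    wchar_t wbuf[256];
    buffer_sink bs;

    /* The sink is byte-oriented, so it's initialised with the buffer's
     * size in bytes, and the converted wchar_t sequence lands in wbuf
     * as raw bytes in native order - hence the casting grumbled about
     * above. */
    buffer_sink_init(&bs, wbuf, sizeof(wbuf));
    put_mb_to_wc(&bs, codepage, input, inputlen);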
Another change: in the new API for wc_to_mb, defchr can be "", but not
NULL.
This #if completely replaced the actually useful version of clipme(),
so I think it must have been used early in development before the
useful version was even written.
Since then it's bit-rotted to the point where it doesn't even make
sense: the "#endif" is nested inside a while loop that the "#if 0" and
"#else" are outside, so that if the condition were changed to evaluate
true, you'd get syntactic nonsense out of the preprocessor.
And I can't see any use for it now, even if it did compile! Get rid of
it.
This parameter was undocumented, and Windows-specific: its semantics
date from before PuTTY was cross-platform, and are "Pass this flags
parameter straight through to the Win32 API's conversion functions".
So in Windows platform code you can pass flags like MB_USEGLYPHCHARS,
but in cross-platform code, you dare not pass anything nonzero at all
because the Unix frontend won't recognise it (or, likely, even
compile).
I've kept the flag for now in the underlying mb_to_wc / wc_to_mb
functions. Partly that's because there's one place in the Windows code
where the parameter _is_ used; mostly, it's because I'm about to
replace those functions anyway, so there's no point in editing all the
call sites twice.
term_do_paste and term_input_data_from_* were still taking int, which
should be size_t. Not that I expect those functions to get >2^31 bytes
of input in one go, but I noticed it while about to modify the same
functions for another reason.
When dumping out the contents of a Conf, it's useful not to have to
guess what the integer indices mean.
By putting these identifiers in a separate array in its own library
module, I should avoid them getting linked into production binaries
and taking up space, as long as conf_id() is only called from inside
debug() statements. And to enforce _that_, it isn't even declared in a
header file unless you #define DEBUG.
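A sketch of the pattern being described (the exact header arrangements
are paraphrased, not quoted):

    /* Only declared at all when DEBUG is defined, so production code
     * can't call it by accident and drag the name array into the link. */
    #ifdef DEBUG
    const char *conf_id(int key);   /* e.g. maps CONF_host to "CONF_host" */
    #endif

    /* ... and it's only ever used inside debug() statements, e.g.: */
    debug("conf key %s changed\n", conf_id(key));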
The code in the 'if (IsZoomed)' statement in reset_window() was
failing to take account of the user-configured gap between the text
and the window edge, so that the requested border was lost. Now it
does take that into account.
In this commit, this change of behaviour applies both to a normally
maximised window (with the window frame still visible round the edge)
and to a full-screen window (nothing visible on the whole monitor
except PuTTY).
I'm not 100% sure whether that's the right behaviour: perhaps the
purpose of this configurable border is to space the text away from the
window furniture, so that there's no need for it if there isn't any
furniture? But on the other hand, one thing _I_ use this border for is
to make space round the edge of a terminal window for the green border
Zoom superimposes when sharing the window. And that's a use case that
would still make sense when the window is full-screened.
This is a small refinement of my own to Marco Ricci's new mode
introduced by the previous commit. If Pageant is being run by a parent
process intending to make requests to it, then it's probably put a
pipe on Pageant's stdout, and will be reading from that pipe to
retrieve the environment setup commands. So it needs to know when it's
read enough.
Closing stdout immediately makes this as easy as possible, freeing the
parent process of the need to count lines of output (and also know how
many lines to expect): it can simply read until there's no more data.
This also means there's no need to make stdout line-buffered, of
course – the fclose will flush it anyway.
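A sketch of what the parent's reading loop can now look like, assuming
'fp' is a stdio stream on the read end of the pipe attached to
Pageant's stdout (handle_setup_line() is a made-up stand-in for
whatever the supervisor actually does with each command):

    char line[1024];
    while (fgets(line, sizeof line, fp)) {
        /* Each line is an environment setup command, e.g. something a
         * shell could eval. */
        handle_setup_line(line);
    }
    /* fgets() returning NULL means EOF: Pageant has closed its stdout,
     * so we've read every setup command without having to know in
     * advance how many lines to expect. */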
This new mode makes it easy to run Pageant as a "supervised" instance,
e.g. as part of a test harness for other programs interacting with an
SSH agent, which is the original use case. Because Pageant is then
running as a child process of the supervisor, the operating system
can tell the supervisor whether the child is still alive, without
resorting to PIDs or socket addresses, both of which can in principle
go stale and/or be recycled.
My normal usage of --debug is to run it in a terminal, where it starts
by printing its SSH_AUTH_SOCK setting for me to paste into another
terminal to run test commands, and then follows that with diagnostic
logging of the requests it's receiving.
But if you'd rather get that diagnostic information in some location
other than a terminal – say, sent to a file which you're viewing in
'less' so that you can search back and forth in it, or piped to
another machine because your test requests are going to come from
somewhere out of sight of your monitor – then you might run 'pageant
--debug' with its stdout being a pipe or a file rather than a
terminal, in which case the standard stdio policy will make it
unbuffered, and the diagnostics won't show up in a timely manner.
The one-line code change is due to Marco Ricci, who had a rather
different motivation.
Unicode 16.0.0 has changed the formatting of that file in a way that
I'm sure _they_ thought was unproblematic :-) by putting spaces around
the character class field, which the reading code wasn't prepared to
cope with.
The IDEA symmetric cipher was the standard one used to protect trad
PGP private keys, back in the days when PuTTY had its very first set
of PGP keys.
We haven't needed this option for a long time, but it didn't cause any
obvious failures, so I never spotted it and removed it from the build
script.
But it does cause a failure now, because gpg on Ubuntu 24.04 reports
'invalid option "--load-extension=idea"', suggesting that it hasn't
just forgotten about _that_ extension, it doesn't even like extensions
at all any more. Happily, we don't need it.
Jacob reports that bullseye objected to the change from
G_APPLICATION_FLAGS_NONE to G_APPLICATION_DEFAULT_FLAGS, on the
grounds that it only has the former defined. Sigh. Added a cmake
check.
These actions were already available in the Ctrl + right-click context
menu (or just right-click, if you shifted the mouse-button actions
into Windows mode). But a user might not know about Ctrl + right-click,
and only know to look in the system menu. So putting these actions in
that menu too makes them easier to find. Also, I don't really see any
reason why the two menus _should_ be different.
You could reproduce this, for example, by cutting the final line
reading "---- END SSH2 PUBLIC KEY ----" off the end of a file, and
feeding it to Unix 'puttygen -l'.
rfc4716_loadpub() was looping round on get_chomped_line() until it
found a line starting with "-" after the base64 data. But it failed to
check for the end of the file as well, so if the data was truncated,
it would just keep spinning at the end of the file.
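The corrected loop shape, sketched rather than quoted (in particular,
assume get_chomped_line() signals end of file by returning a ptrlen
with a null pointer):

    while (true) {
        ptrlen line = get_chomped_line(src);
        if (!line.ptr) {
            /* EOF before the trailer line: the file is truncated, so
             * report an error instead of spinning forever (the old bug). */
            return false;
        }
        if (line.len > 0 && ((const char *)line.ptr)[0] == '-')
            break;              /* found the "---- END ... ----" trailer */
        /* otherwise accumulate another line of base64 data ... */
    }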
Thanks to Will Bond for spotting this. Using STD_INPUT_HANDLE as an
output handle apparently works often enough that I didn't immediately
notice the mistake, but it's not what I _meant_ to do. Obviously we
should have used STD_OUTPUT_HANDLE instead.
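For the record, the corrected call is simply the output-handle variant
of the same Win32 lookup; whether the surrounding code looks exactly
like this is an assumption, and only the constant mix-up itself comes
from the report above.

    #include <windows.h>
    #include <string.h>

    const char *msg = "hello, world\r\n";
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);  /* not STD_INPUT_HANDLE */
    DWORD written;
    WriteFile(out, msg, (DWORD)strlen(msg), &written, NULL);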
I've had more than one conversation recently in which users have
mentioned finding this mode inconvenient. I don't know whether any of
them would want to turn it off completely, but it seems likely that
_somebody_ will, sooner or later. So here's an option to do that.
A user sent a transcript from a curses-based tool 'ncmpc', which
carefully disables terminal autowrap when printing a character in the
bottom right corner of the display, and then turns it back on again.
After that, it expects that sending the backspace character really
moves the cursor back a space, instead of clearing the wrapnext flag.
But in PuTTY, we set the wrapnext flag even if we're not in wrapping
mode - it just doesn't _do_ anything when the next character is sent.
But it remains set, and still affects backspace. So the display is
corrupted by this change of expectation.
(Specifically, ncmpc is printing a time display [m:ss] in the very
bottom right, so it disables wrap in order to print the final ']'.
Then the next thing it needs to do is to update the low-order digit of
the seconds field, so it sends \b as the simplest way to get to that
character. The effect on the display is that the updated seconds digit
appears where the ] was, instead of overwriting the old seconds digit.)
This is a tradeoff in desirable behaviours. The point of having a
backspace operation cancel the wrapnext flag and not actually move the
cursor is to preserve the invariant that sending 'x', backspace, 'y'
causes the y to overprint the x, even if that happens near the end of
the terminal's line length. In non-wrapping mode that invariant was
bound to break _eventually_, but with this change, it breaks one
character earlier than before. However, I think that's less bad than
breaking the expectations of curses-based full-screen applications,
especially since the _main_ need for that invariant arises from naïve
applications that don't want to have to think about the terminal width
at all - and those applications generally run in _wrapping_ mode,
where it's possible to continue the invariant across multiple lines in
any case.
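A sketch of the behavioural change, with the field names invented for
illustration: the deferred-wrap flag is now only set when auto-wrap is
actually enabled.

    if (term->curs_x == term->cols - 1) {   /* printed in the last column */
        if (term->wrap_mode)
            term->wrapnext = true;          /* defer the wrap to the next char */
        /* else: leave wrapnext clear, so a following '\b' really moves
         * the cursor left, which is what ncmpc's redraw relies on */
    }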
Python 3.12 has a new warning for backslash-character pairs that are not
valid escape sequences at the level of string literals, as opposed to in
some interior syntax such as regular expressions
(https://docs.python.org/3/whatsnew/3.12.html#other-language-changes).
Suppress it by using raw strings.
Revision 1b2f39c24b introduced guards to
use the built-in defaults in the event that the SESSKEY was null. This
was later reverted in 39c20d4819 because
(a) a null SESSKEY is precisely how the absence of a per-session
configuration file is signalled to the backend, and (b) everything could
apparently already cope with a null SESSKEY anyway.
Unfortunately, in between these, 3214563d8e
introduced new functions for handling boolean-valued settings. The
reversion didn't affect the new `gppb_raw' function, which retained the
erroneous guard against null SESSKEY. In consequence, PuTTY ignores
X resources and `-xrm' settings unless `~/.putty/sessions/Default%20Settings'
exists, causing undesirable behaviour such as starting login shells,
establishing `utmp' entries, scrolling on output, failing to scroll on
keypress, not blinking the cursor, etc.
This isn't a total disaster: touching `~/.putty/sessions/Default%20Settings'
makes the problem go away. But it seems worth fixing anyway.
Apply the obvious one-line fix.
Signed-off-by: Mark Wooding <mdw@distorted.org.uk>
The "rsa-sha2-256" and "rsa-sha2-512" algorithms, as defined by RFC
8332, differ in one detail from "ssh-rsa" in addition to the change of
hash function. They also specify that the signature integer should be
encoded using the same number of bytes as the key modulus, even if
that means giving it a leading zero byte (or even more than one).
I hadn't noticed this, and had assumed that unrelated details wouldn't
have changed. But they had. Thanks to Ilia Mirkin for pointing this
out.
Nobody has previously reported a problem, so very likely most servers
are forgiving of people making this mistake! But now it's been pointed
out, we should comply with the spec. (Especially since the new spec is
more sensible, and only historical inertia justified sticking to the
old one.)
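An illustrative helper (not PuTTY's actual code) showing the RFC 8332
encoding rule: the signature integer is emitted in exactly as many
bytes as the key modulus, gaining leading zero bytes if it happens to
be numerically short.

    #include <stddef.h>
    #include <string.h>

    /* Write 'sig' (sig_len bytes, with sig_len <= modulus_len) into
     * 'out' using exactly modulus_len bytes, left-padded with zeros. */
    static void encode_fixed_width(unsigned char *out, size_t modulus_len,
                                   const unsigned char *sig, size_t sig_len)
    {
        size_t pad = modulus_len - sig_len;
        memset(out, 0, pad);               /* leading zero byte(s), if any */
        memcpy(out + pad, sig, sig_len);
    }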
The source-code README file claims that you can start by just running
'cmake .'. But on Windows, that's not literally true. cmake will
expect to find a compiler on your path, and if it doesn't, will fail
to configure.
I'd always taken this for granted, assuming that anyone who was
reading this README and trying to compile PuTTY was already familiar
with how Windows compiler toolchains work, and only needed the parts of
the instructions that were specific to PuTTY.
But of course there's no reason that should be true. PuTTY's primary
target audience, even from before it was called PuTTY, has been Unix
expats: people who like to do nearly everything on Unix but for some
reason have a Windows machine in front of them. Such a person might
very well be familiar with _Unix_ compilation, where the C compiler is
installed in /usr/bin and on your PATH already. And then if they want
to try to tweak PuTTY, that will be the first time they have to engage
with a Windows compiler!
So we do need to at least _say_ that the user needs to start by
running vcvars32.bat or one of its friends. Even if we don't give full
details of where to find it (because I've known it to change now and
then), we at least need to remind people to go and look for it.
In the previous few commits I noticed some repeated work in the form
of pointless empty implementations of Plug's log method, plus some
existing (and some new) empty cases of Socket's endpoint_info. As a
cleanup, I'm replacing as many as I can find with uses of a central
null implementation in the stubs directory.
This is the payoff from the previous three commits. If you run
'netstat' or 'ss' or equivalent, and see multiple outgoing SSH
connections from your machine, and you want to match them up to the
instances of PuTTY you can see on your desktop, how would you do it?
On Linux you can trace each socket to an owning pid via 'ss -p', but
tracing the pid in turn to a window isn't so easy. On Windows even the
first step is hard.
Now it shouldn't be too hard, because the Event Log mentions the IP
address and ephemeral port number of the local end of a connection,
after that connection is established, if that information is
available. So now you can connect the local port numbers shown in the
'netstat' or 'ss' output with the ones in the GUI's Event Log.
(This might be useful if, for example, one connection was showing a
backlog in netstat, and you wanted to investigate the corresponding
GUI.)
This enables plug_log to run query methods on the socket in order to
find out useful information to log. I don't expect it's sensible to do
anything else with it.
The peer_info method in the Socket vtable is replaced with
endpoint_info, which takes a boolean indicating which end you're
asking about.
sk_peer_info still exists, as a wrapper on the new sk_endpoint_info.
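Schematically (type and parameter names assumed, not quoted from the
real header):

    /* New query: ask about either end of the connection. Assume the
     * convention is peer=true for the remote end, false for local. */
    SocketEndpointInfo *sk_endpoint_info(Socket *s, bool peer);

    /* Old name preserved as a thin wrapper. */
    static inline SocketEndpointInfo *sk_peer_info(Socket *s)
    {
        return sk_endpoint_info(s, true);
    }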
I'm preparing to be able to ask about the other end of the connection
too, so the first step is to give this data structure a neutral name
that can refer to either. No functional change yet.
These were deliberately thrown away in our UTF-8 decoder, with a
comment apparently introduced by RDB in the 2001 big Unicode patch.
The purpose of this character range has changed completely since then,
and now they act as modifier characters on top of U+1F3F4 to construct
a space of flags (the standard examples being those of England,
Scotland and Wales). We were failing to display those flags, and even
pasting out of the terminal didn't give back the right Unicode.
This is the only one of the newly added cases in test/utf8.txt which I
can (try to) fix unilaterally just by changing PuTTY's display code,
because it doesn't change the number of character cells occupied by
the text, only the appearance of those cells.
In this commit I make the necessary changes in terminal.c, which makes
flags start working in GTK PuTTY and pterm, but not on Windows.
The system of encoding flags in Unicode is that there's a space of 26
regional-indicator letter code points (U+1F1E6 to U+1F1FF inclusive)
corresponding to the unaccented Latin alphabet, and an adjacent pair
of those letters represents the flag associated with that two-letter
code (usually a nation, although at least one non-nation pair exists,
namely EU).
There are two plausible ways we could handle this in terminal.c:
(a) leave the regional indicators as they are in the internal data
model, so that each RI letter occupies its own character cell,
and at display time have do_paint() spot adjacent pairs of them
and send each pair to the frontend as a combined glyph.
(b) combine the pairs _in_ the internal data model, by
special-casing them in term_display_graphic_char().
This choice makes a semantic difference. What if a flag is displayed
in the terminal and something overprints one of its two character
cells? With option (a), overprinting one cell of an RI pair with a
different RI letter would change it into a different flag; with
option (b), flags behave like any other wide character, in that
overprinting one of the two cells blanks the other as a side effect.
I think we need (a), because not all terminal redraw systems
(curses-style libraries) will understand the Unicode flag glyph system
at all. So if a full-screen application genuinely wants to do a screen
redraw in which a flag changes to a different flag while keeping one
of its constituent letters the same (say, swapping between BA and CA,
or between AC and AD), then the redraw library might very well
implement that screen update by redrawing only the changed letter, and
we need not to corrupt the flag.
All of this is now implemented in terminal.c. The effect is that pairs
of RI characters are passed to the TermWin draw_text() method as if
they were a wide character with a combining mark: that is, you get a
two-character (or four-surrogate) string, with TATTR_COMBINING
indicating that it represents a single glyph, and ATTR_WIDE indicating
that that glyph occupies two character cells rather than one.
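The pair-spotting rule itself is simple; a sketch (with the loop
context and variable names invented) looks like this:

    #include <stdbool.h>

    /* Regional indicator letters occupy U+1F1E6 .. U+1F1FF inclusive. */
    static bool is_regional_indicator(unsigned int cp)
    {
        return cp >= 0x1F1E6 && cp <= 0x1F1FF;
    }

    /* ... and in the do_paint() output loop, roughly: */
    if (is_regional_indicator(ch) && is_regional_indicator(next_ch)) {
        /* Emit ch and next_ch to the frontend together as one glyph:
         * a TATTR_COMBINING sequence flagged ATTR_WIDE, so that it
         * occupies two character cells. */
    }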
In GTK, that's enough to make flag display Just Work. But on
Windows (at least the Win10 machine I have to test on), that doesn't
make flags start working all by itself. Then again, the rest of the
new emoji tests also look a bit confused on Windows. Help would be
welcome from someone who knows how Windows emoji display is supposed
to work!
These samples all come from the 'emoji' parts of Unicode, although I
use the word a bit loosely because I'm not sure that flags count (they
have their own special system). But they're all things that ought to
display via a separate font, likely in colour.
The second line of this extra test already looks correct in PuTTY:
three code points each representing an emoji, for which wcwidth()
correctly reports that they occupy 2 cells each. On GTK, the emoji
even appear in colour; on Windows they come out in black and
white. (And I don't know what I can do to fix that; the problem is not
that I don't have any emoji font installed. I do.)
The first line consists of 'simpler' emoji in the sense of being more
common, but technically more complicated, because they're ordinary
Unicode characters such as U+2764 HEAVY BLACK HEART, modified into
emoji by U+FE0F VARIATION SELECTOR-16. This goes badly because
wcwidth() measures the primary character as having width 1 (which it
would do, by itself), and the variation selector as width 0 (also not
unreasonable), but the total is 1, where you'd like it to be 2. This
is also difficult to fix, because if we unilaterally changed it then
every curses-type library would mispredict the cursor position and
produce display corruption during partial screen redraws!
The third line uses a mechanism I've only found out about recently:
U+200D ZERO WIDTH JOINER glues together two code points that would
each be a valid emoji on its own, to make a single combined one. In
this case, WOMAN + PERSONAL COMPUTER ought to combine into a woman
using a computer. Again this doesn't work in PuTTY, which knows
nothing about ZWJ. But it comes out as expected in other tools viewing
this file, such as 'gedit', or Firefox.
The fourth line shows another complex emoji case: the WOMAN code point
is followed by U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2, and
another one is followed by U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6,
in each case selecting the woman's skin tone. PuTTY mishandles that
too, because it doesn't know that those should act as modifiers (again
because wcwidth gives them width 2 rather than 0), and so each one
occupies an extra two character cells.
And the last line contains some sample flags, each of which is
obtained by writing a 2-letter code for a country or region (here GB,
UA, EU) with each Latin letter replaced by the appropriate 'regional
indicator symbol letter' from the 26-code-point range U+1F1E6 to
U+1F1FF inclusive. PuTTY doesn't know anything about those either, but
they at least occupy the right number of cells if handled naïvely, so
_that_ one might be possible to fix!
I was ambushed by both a UI change and a T&C update in Partner Center,
which I could have made less painful by spotting both a day in
advance. Add that to the list for next time.
I used this to confirm that the previous nonces generated by
dsa_gen_k() were indeed biased, and to check that the new RFC6979 ones
don't have the same problem.
Recovering the DSA nonce value is equivalent to recovering the private
key. One way round, this is well known: if you leak or reuse a nonce,
your private key is compromised. But the other direction of the
equivalence is also true - if you know the private key and have a
signed message, you can retrieve the input nonce. This is much less
obviously useful (certainly not to an attacker), but I found it
convenient for this particular test purpose, because it can operate on
the standard SSH data formats, without needing special access into the
signing algorithm to retrieve its internal variables. So I was able to
run this script unchanged against the 'before' and 'after' versions of
testcrypt, and observe the difference.
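To spell out the algebra the script relies on: the (EC)DSA signing
equation can be solved for the nonce when the private key x is known,

    \[
        s = k^{-1}\bigl(H(m) + x r\bigr) \bmod q
        \quad\Longrightarrow\quad
        k = s^{-1}\bigl(H(m) + x r\bigr) \bmod q,
    \]

and everything on the right-hand side (the signature pair (r, s), the
message hash H(m), the group order q and, in this test setting, the
private key x) is available from the standard SSH data formats for
keys and signatures.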
These tests also failed when I reran testsc, and looking at the code,
no wonder: in each test iteration, the hash object is allocated
_before_ logging begins, rather than after, so that its addresses
aren't normalised by the test suite to 'n bytes after allocation #0'.
So these tests only pass as long as all the allocations get lucky in
reusing the same address. I guess we got lucky on all previous
occasions and didn't notice until now.
Easy fix: now each iteration does alloc / do stuff / free within the
logged section.
This fixes a vulnerability that compromises NIST P521 ECDSA keys when
they are used with PuTTY's existing DSA nonce generation code. The
vulnerability has been assigned the identifier CVE-2024-31497.
PuTTY has been doing its DSA signing deterministically for literally
as long as it's been doing it at all, because I didn't trust Windows's
entropy generation. Deterministic nonce generation was introduced in
commit d345ebc2a5, as part of the initial version of our DSA
signing routine. At the time, there was no standard for how to do it,
so we had to think up the details of our system ourselves, with some
help from the Cambridge University computer security group.
More than ten years later, RFC 6979 was published, recommending a
similar system for general use, naturally with all the details
different. We didn't switch over to doing it that way, because we had
a scheme in place already, and as far as I could see, the differences
were not security-critical - just the normal sort of variation you
expect when any two people design a protocol component of this kind
independently.
As far as I know, the _structure_ of our scheme is still perfectly
fine, in terms of what data gets hashed, how many times, and how the
hash output is converted into a nonce. But the weak spot is the choice
of hash function: inside our dsa_gen_k() function, we generate 512
bits of random data using SHA-512, and then reduce that to the output
range by modular reduction, regardless of what signature algorithm
we're generating a nonce for.
In the original use case, this introduced a theoretical bias (the
output size is an odd prime, which doesn't evenly divide the space of
2^512 possible inputs to the reduction), but the theory was that since
integer DSA uses a modulus prime only 160 bits long (being based on
SHA-1, at least in the form that SSH uses it), the bias would be too
small to be detectable, let alone exploitable.
Then we reused the same function for NIST-style ECDSA, when it
arrived. This is fine for the P256 curve, and even P384. But in P521,
the order of the base point is _greater_ than 2^512, so when we
generate a 512-bit number and reduce it, the reduction never makes any
difference, and our output nonces are all in the first 2^512 elements
of the range of about 2^521. So this _does_ introduce a significant
bias in the nonces, compared to the ideal of uniformly random
distribution over the whole range. And it's been recently discovered
that a bias of this kind is sufficient to expose private keys, given a
manageably small number of signatures to work from.
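To make the arithmetic explicit:

    \[
        2^{520} < q < 2^{521}, \qquad h < 2^{512}
        \;\Longrightarrow\; k = h \bmod q = h < 2^{512},
    \]

so every nonce, viewed as a 521-bit value, has its top nine bits equal
to zero, which is exactly the sort of bias just described as
sufficient to recover the private key from a modest number of
signatures.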
(Incidentally, none of this affects Ed25519. The spec for that system
includes its own idea of how you should do deterministic nonce
generation - completely different again, naturally - and we did it
that way rather than our way, so that we could use the existing test
vectors.)
The simplest fix would be to patch our existing nonce generator to use
a longer hash, or concatenate a couple of SHA-512 hashes, or something
similar. But I think a more robust approach is to switch it out
completely for what is now the standard system. The main reason why I
prefer that is that the standard system comes with test vectors, which
adds a lot of confidence that I haven't made some other mistake in
following my own design.
So here's a commit that adds an implementation of RFC 6979, and
removes the old dsa_gen_k() function. Tests are added based on the
RFC's appendix of test vectors (as many as are compatible with the
more limited API of PuTTY's crypto code, e.g. we lack support for the
NIST P192 curve, or for doing integer DSA with many different hash
functions). One existing test changes its expected outputs, namely the
one that has a sample key pair and signature for every key algorithm
we support.
While trying to get an upcoming piece of code through testsc, I had
trouble - _yet again_ - with the way that control flow diverges inside
the glibc implementations of functions like memcpy and memset,
depending on the alignment of the input blocks _above_ the alignment
guaranteed by malloc, so that doing the same sequence of malloc +
memset can lead to different control flow. (I believe this is done
either for cache performance reasons or SIMD alignment requirements,
or both: on x86, some SIMD instructions require memory alignment
beyond what malloc guarantees, which is also awkward for our x86
hardware crypto implementations.)
My previous effort to normalise this problem out of sclog's log files
worked by wrapping memset and all its synonyms that I could find. But
this weekend, that failed for me, and the reason appears to be ifuncs.
I'm aware of the great irony of committing code to a security project
with a log message saying something vague about ifuncs, on the same
weekend that it came to light that commits matching that description
were one of the methods used to smuggle a backdoor into the XZ Utils
project (CVE-2024-3094). So I'll bend over backwards to explain both
what I think is going on, and why this _isn't_ a weird ifunc-related
backdooring attempt:
When I say I 'wrap' memset, I mean I use DynamoRIO's 'drwrap' API to
arrange that the side-channel test rig calls a function of mine before
and after each call to memset. The way drwrap works is to look up the
symbol address in either the main program or a shared library; in this
case, it's a shared library, namely libc.so. Then it intercepts call
instructions with exactly that address as the target.
Unfortunately, what _actually_ happens when the main program calls
memset is more complicated. First, control goes to the PLT entry for
memset (still in the main program). In principle, that loads a GOT
entry containing the address of memset (filled in by ld.so), and jumps
to it. But in fact the GOT entry varies its value through the program;
on the first call, it points to a resolver function, whose job is to
_find out_ the address of memset. And in the version of libc.so I'm
currently running, that resolver is an STT_GNU_IFUNC indirection
function, which tests the host CPU's capabilities, and chooses an
actual implementation of memset depending on what it finds. (In my
case, it looks as if it's picking one that makes extensive use of x86
SIMD.) To avoid the overhead of doing this on every call, the returned
function pointer is then written into the main program's GOT entry for
memset, overwriting the address of the resolver function, so that the
_next_ call the main program makes through the same PLT entry will go
directly to the memset variant that was chosen.
And the problem is that, after this has happened, none of the new
control flow ever goes near the _official_ address of memset, as read
out of libc.so's dynamic symbol table by DynamoRIO. The PLT entry
isn't at that address, and neither is the particular SIMD variant that
the resolver ended up choosing. So now my wrapper on memset is never
being invoked, and memset cheerfully generates different control flow
in runs of my crypto code that testsc expects to be doing exactly the
same thing as each other, and all my tests fail spuriously.
My solution, at least for the moment, is to completely abandon the
strategy of wrapping memset. Instead, let's just make it behave the
same way every time, by forcing all the affected memory allocations to
have extra-strict alignment. I found that 64-byte alignment is not
good enough to eliminate memset-related test failures, but 128-byte
alignment is.
This would be tricky in itself, if it weren't for the fact that PuTTY
already has its own wrapper function on malloc (for various reasons),
which everything in our code already uses. So I can divert to C11's
aligned_alloc() there. That in turn is done by adding a new #ifdef to
utils/memory.c, and compiling it with that #ifdef into a new object
library that is included in testsc, superseding the standard memory.o
that would otherwise be pulled in from our 'utils' static library.
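The shape of that change, sketched rather than quoted: the macro name
here is invented, and the value 128 is the alignment found sufficient
above.

    /* In the testsc-only build of utils/memory.c, compiled with
     * something like -DALLOCATION_ALIGNMENT=128; 'size' and 'p' belong
     * to the surrounding allocation wrapper. */
    #ifdef ALLOCATION_ALIGNMENT
        /* C11 aligned_alloc() wants the size to be a multiple of the
         * alignment, so round it up. */
        size_t rounded = (size + ALLOCATION_ALIGNMENT - 1) &
                         ~(size_t)(ALLOCATION_ALIGNMENT - 1);
        p = aligned_alloc(ALLOCATION_ALIGNMENT, rounded);
    #else
        p = malloc(size);
    #endif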
With the previous memset-compensator removed, this means testsc is now
dependent on having aligned_alloc() available. So we test for it at
cmake time, and don't build testsc at all if it can't be found. This
shouldn't bother anyone very much; aligned_alloc() is available on
_my_ testsc platform, and if anyone else is trying to run this test
suite at all, I expect it will be on something at least as new as
that.
(One awkward thing here is that we can only replace _new_ allocations
with calls to aligned_alloc(): C11 provides no aligned version of
realloc. Happily, this doesn't currently introduce any new problems in
testsc. If it does, I might have to do something even more painful in
future.)
So, why isn't this an ifunc-related backdoor attempt? Because (and you
can check all of this from the patch):
1. The memset-wrapping code exists entirely within the DynamoRIO
plugin module that lives in test/sclog. That is not used in
production, only for running the 'testsc' side-channel tester.
2. The memset-wrapping code is _removed_ by this patch, not added.
3. None of this code is dealing directly with ifuncs - only working
around the unwanted effects on my test suite from the fact that
they exist somewhere else and introduce awkward behaviour.
This takes a plain ssh_hashalg, and constructs the most natural kind
of HMAC wrapper around it, taking its key length and output length
to be the hash's output length. In other words, it converts SHA-foo
into exactly the thing usually called HMAC-SHA-foo.
It does it by constructing a new ssh2_macalg vtable, and including it
in the same memory allocation as the actual hash object. That's the
first time in PuTTY I've done it this way.
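Very roughly, and with the type and field names invented rather than
taken from the real headers, the construction looks like this:

    /* One allocation holds both the derived vtable and the pointer to
     * the hash it wraps; key length and MAC length both equal the
     * hash's output length, giving HMAC-SHA-foo from SHA-foo. */
    typedef struct hmac_from_hash {
        ssh2_macalg mac;            /* the vtable handed back to callers */
        const ssh_hashalg *hash;    /* the underlying hash algorithm */
    } hmac_from_hash;

    const ssh2_macalg *ssh2_macalg_from_hashalg(const ssh_hashalg *hash)
    {
        hmac_from_hash *ctx = snew(hmac_from_hash);
        ctx->mac = hmac_method_template;  /* hypothetical shared methods */
        ctx->mac.keylen = hash->hlen;     /* key length = hash output length */
        ctx->mac.len = hash->hlen;        /* MAC length = hash output length */
        ctx->hash = hash;
        return &ctx->mac;
    }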
Nothing yet uses this, but a new piece of code is about to.
With Google Groups severing its connection to Usenet today[*], newsgroup
access will require what today are quite specialised tools; and even
before that (and before the recent spam floods, which will hopefully now
cease) they didn't seem super useful as a way of getting PuTTY support.
[*] https://support.google.com/groups/answer/11036538
I've just removed the whole section, since we couldn't think of an
obvious consensus replacement venue -- certainly not one where we hang
out ourselves. There's Stackmumble and places like that, but we didn't
think of anywhere concrete to point to.
This has led to some rather unsatisfactorily vague 'seek help from some
public forum' wording in places, which could perhaps use revision.
As I mentioned in the previous commit, this merge was nontrivial
enough that it wasn't a good idea for the release checklist to suggest
doing it in a hurry. And indeed, now I look at it again this morning,
there are mistakes: a memory leak of ConsoleIO on the abort path from
both affected functions, and a missing space in one of the prompt
messages. Now both fixed.