This is a cherry-pick of Stefan Tauner's patch from main, but without
my followup refactoring, since the refactoring seemed to me to have a
(small but easily avoidable) chance of introducing a bug in 0.83.
The only downside of the original patch is that it contains a variable
name telling a lie: 'osc_is_apc' should really read 'this isn't an OSC
but one of APC, SOS and PM'. But we don't actually treat those three
things differently, so the functionality is fine.
(cherry picked from commit b6b95f23e5)
DIT, for 'Data-Independent Timing', is a bit you can set in the
processor state on sufficiently new Arm CPUs, which promises that a
long list of instructions will deliberately avoid varying their timing
based on the input register values. Just what you want for keeping
your constant-time crypto primitives constant-time.
As far as I'm aware, no CPU has _yet_ implemented any data-dependent
optimisations, so DIT is a safety precaution against them doing so in
future. It would be embarrassing to be caught without it if a future
CPU does do that, so we now turn on DIT in the PuTTY process state.
I've put a call to the new enable_dit() function at the start of every
main() and WinMain() belonging to a program that might do
cryptography (even testcrypt, in case someone uses it for something!),
and in case I missed one there, also added a second call at the first
moment that any cryptography-using part of the code looks as if it
might become active: when an instance of the SSH protocol object is
configured, when the system PRNG is initialised, and when selecting
any cryptographic authentication protocol in an HTTP or SOCKS proxy
connection. With any luck those precautions between them should ensure
it's on whenever we need it.
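For illustration, the enabling operation itself is tiny: a single MSR
write to the DIT field of the processor state. A minimal sketch of what
enable_dit() boils down to on AArch64 (not necessarily PuTTY's literal
code, and assuming the caller has already checked that the CPU has the
feature at all) is:

    /* Sketch only: set PSTATE.DIT. In the system-register view of DIT
     * the bit lives at position 24; assemblers too old to know the name
     * 'dit' need the generic encoding s3_3_c4_c2_5 instead. */
    static void enable_dit_sketch(void)
    {
        asm volatile("msr dit, %0" :: "r"(1ul << 24));
    }
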
Arm's own recommendation is that you should carefully choose the
granularity at which you enable and disable DIT: there's a potential
time cost to turning it on and off (I'm not sure what, but plausibly
something of the order of a pipeline flush), so it's a performance hit
to do it _inside_ each individual crypto function, but if CPUs start
supporting significant data-dependent optimisation in future, then it
will also become a noticeable performance hit to just leave it on
across the whole process. So you'd like to do it somewhere in the
middle: for example, you might turn on DIT once around the whole
process of verifying and decrypting an SSH packet, instead of once for
decryption and once for MAC.
With all respect to that recommendation as a strategy for maximum
performance, I'm not following it here. I turn on DIT at the start of
the PuTTY process, and then leave it on. Rationale:
1. PuTTY is not otherwise a performance-critical application: it's
not likely to max out your CPU for any purpose _other_ than
cryptography. The most CPU-intensive non-cryptographic thing I can
imagine a PuTTY process doing is the complicated computation of
font rendering in the terminal, and that will normally be cached
(you don't recompute each glyph from its outline and hints
every time you display it).
2. I think a bigger risk lies in accidental side channels from having
DIT turned off when it should have been on. I can imagine lots of
causes for that. Missing a crypto operation in some unswept corner
of the code; confusing control flow (like my coroutine macros)
jumping with DIT clear into the middle of a region of code that
expected DIT to have been set at the beginning; having a reference
counter of DIT requests and getting it out of sync.
In a more sophisticated programming language, it might be possible to
avoid the risk in #2 by cleverness with the type system. For example,
in Rust, you could have a zero-sized type that acts as a proof token
for DIT being enabled (it would be constructed by a function that also
sets DIT, have a Drop implementation that clears DIT, and be !Send so
you couldn't use it in a thread other than the one where DIT was set),
and then you could require all the actual crypto functions to take a
DitToken as an extra parameter, at zero runtime cost. Then "oops I
forgot to set DIT around this piece of crypto" would become a compile
error. Even so, you'd have to take some care with coroutine-structured
code (what happens if a Rust async function yields while holding a DIT
token?) and with nesting (if you have two DIT tokens, you don't want
dropping the inner one to clear DIT while the outer one is still there
to wrongly convince callees that it's set). Maybe in Rust you could
get this all to work reliably. But not in C!
DIT is an optional feature of the Arm architecture, so we must first
test to see if it's supported. This is done the same way as we already
do for the various Arm crypto accelerators: on ELF-based systems,
check the appropriate bit in the 'hwcap' words in the ELF aux vector;
on Mac, look for an appropriate sysctl flag.
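Concretely, the two detection paths look something like this (the hwcap
bit and the sysctl key are the standard names, but treat the code as a
sketch rather than the literal implementation):

    #include <stdbool.h>
    #if defined __linux__
    #include <sys/auxv.h>
    #include <asm/hwcap.h>
    static bool dit_available(void)
    {
        return (getauxval(AT_HWCAP) & HWCAP_DIT) != 0;
    }
    #elif defined __APPLE__
    #include <sys/types.h>
    #include <sys/sysctl.h>
    static bool dit_available(void)
    {
        int val = 0;
        size_t len = sizeof(val);
        return sysctlbyname("hw.optional.arm.FEAT_DIT",
                            &val, &len, NULL, 0) == 0 && val != 0;
    }
    #endif
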
On Windows I don't know of a way to query the DIT feature, _or_ of a
way to write the necessary enabling instruction in an MSVC-compatible
way. I've _heard_ that it might not be necessary, because Windows
might just turn on DIT unconditionally and leave it on, in an even
more extreme version of my own strategy. I don't have a source for
that - I heard it by word of mouth - but I _hope_ it's true, because
that would suit me very well! Certainly I can't write code to enable
DIT without knowing (a) how to do it, and (b) how to know whether it's
safe.
Nonetheless, I've put the enable_dit() call in all the right places in
the Windows main programs as well as the Unix and cross-platform code,
so that if I later find out that I _can_ put in an explicit enable of
DIT in some way, I'll only have to arrange to set HAVE_ARM_DIT and
compile the enable_dit() function appropriately.
I'm not sure why these have never bothered me before, but a test build
I just made for a completely different reason complained about them!
findtest() did a binary search using a while loop, and then used
variables set in the loop body, which gcc objected to on the grounds
that the body might have run 0 times and not initialised those
variables. Also in the same function gcc objected to the idea that
findrelpos234() might have returned NULL and not set 'index'. I think
neither of these things can actually have _happened_, but let's stop
the compiler complaining anyway.
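The pattern in question, in miniature (illustrative code, not the real
findtest()):

    /* gcc can't prove the while loop runs at least once, so it warns
     * that 'mid' and 'found' might be used uninitialised afterwards;
     * initialising them up front silences the complaint. */
    static int binsearch(const int *a, int n, int key)
    {
        int lo = 0, hi = n - 1;
        int mid = -1, found = 0;      /* initialised to placate gcc */
        while (lo <= hi) {
            mid = (lo + hi) / 2;
            if (a[mid] == key) { found = 1; break; }
            else if (a[mid] < key) lo = mid + 1;
            else hi = mid - 1;
        }
        return found ? mid : -1;
    }
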
In 0.81 and before, we put an application manifest (XML-formatted
Windows resource) into all the GUI tools on purpose, and the CLI tools
like Plink didn't have one. But in 0.82, the CLI tools do have one,
and it's a small default one we didn't write ourselves, inserted by
some combination of cmake and clang-imitating-MSVC (I haven't checked
which of those is the cause).
This appears to have happened as a side effect of a build-tools
update, not on purpose. And its effect is that Windows XP now objects
to our plink.exe, because it's very picky about manifest format (we
have an old 'xp-wont-run' bug record about that).
Since it seemed to work fine to not have a manifest at all in 0.81,
let's go back to that. We were already passing /manifest:no to inhibit
the default manifest in the GUI tools, to stop it fighting with our
custom one; now I've moved /manifest:no into the global linker flags,
so it's applied to _all_ binaries, whether we're putting our own
manifest in or not.
In protocols other than PROT_RAW, the new line editing system differed
from the old one in not considering ^M or ^J (typed using the actual
Ctrl key, so distinct from pressing Return) to mean "I've finished
editing this line, please send it". This commit reinstates that
behaviour.
It turned out that a third-party tool (namely PuTTY Connection Manager),
which automatically answers prompts for the user, was terminating them
by sending ^J in place of the Return key. We don't know why (and it's
now unmaintained), but it was. So this change should make that tool
start working again.
I exclude PROT_RAW above because in that protocol the line editing has
much weirder handling for ^M and ^J, which lineedit replicated
faithfully from the old code: either control character by itself is
treated literally (displaying as "^M" or "^J" in the terminal), but if
you type the two in sequence in that order, then the ^J deletes the ^M
from the edit buffer and enters the line, so that the sequence CR LF
acts as a newline overall. I haven't changed that behaviour here, but
I have added a regression test of it to test_lineedit.
A user reported that when a PuTTY window is resized by the
'FancyZones' tool included in Microsoft PowerToys, the terminal itself
knows the new size ('stty' showed that it had sent a correct SIGWINCH
to the SSH server), but the next invocation of the Change Settings
dialog box still has the old size entered in it, leading to confusing
behaviour when you press Apply.
Inside PuTTY, this must mean that we updated the actual terminal's
size, but didn't update the main Conf object to match it, which is
where Change Settings populates its initial dialog state from.
It looks as if this is because FancyZones resizes the window by
sending it one single WM_SIZE, without wrapping it in the
WM_ENTERSIZEMOVE and WM_EXITSIZEMOVE messages that signal the start
and end of an interactive dragging resize operation. And the update of
Conf in wm_size_resize_term was in only one branch of the if statement
that checks whether we're in an interactive resize. Now it's outside
the if, so Conf will be updated in both cases.
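Schematically, the change amounts to this (names and parameters here
are illustrative, not the literal PuTTY code):

    #include "putty.h"

    /* Whichever branch we take for interactive vs. non-interactive
     * resizing, the Conf copy of the size is updated, so Change
     * Settings starts from the terminal's real size. */
    static void record_resize(Conf *conf, int cols, int rows,
                              bool interactive)
    {
        if (interactive) {
            /* interactive drag: re-layout may wait for WM_EXITSIZEMOVE */
        } else {
            /* non-interactive, e.g. a bare WM_SIZE from FancyZones */
        }
        conf_set_int(conf, CONF_width, cols);   /* now done in both cases */
        conf_set_int(conf, CONF_height, rows);
    }
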
I was about to try to debug a window resizing issue, and it looked as
if this patch had a plausible set of diagnostics already. But in fact
when I turned on this #ifdef it failed to compile, so I'm getting rid
of it.
Perhaps there is a use for having types of diagnostic living
permanently in the source code and easy to enable in themed sets, but
if so, I think they'd be better compiled in and enabled by an option,
than compiled out and enabled by #ifdef. That way they're less likely
to rot, and also, you can ask a user to turn one on really easily and
get extra logs for whatever is bothering them!
The new (ish) "3.7...3.28" syntax means: cmake will give up with a
fatal error if you try to build with a version older than 3.7, but
also, it won't turn on any new behaviour introduced after 3.28 (which
is the cmake version in Ubuntu 24.04, where I'm currently doing both
my development and production builds).
Without this, cmake 3.31 (found on Debian sid) will give a warning at
configure time: "Compatibility with CMake < 3.10 will be removed from
a future version of CMake." I guess the point is that they're planning
to make breaking changes that arrange that you _can't_ make the same
CMakeLists work with both 3.7 and this potential newer version. So by
specifying 3.28 as the "max" version, we avoid those breaking changes
affecting us, for the moment.
Our "old distro support" policy is currently that we still want to be
able to (and indeed I actually test it before each release) build on
Debian stretch, which is still in support, albeit a very marginal
paid-LTS kind of support. So we do still need to support cmake 3.7.
This seems to be a plausible way to get that to carry on working,
while not provoking annoying warnings from cmake 3.31, or risking the
actual breaking change when it comes, whatever it is.
(Fun fact: cmake 3.7 doesn't actually _understand_ this 3.7...3.28
syntax! That syntax itself was introduced in 3.12. But the cmake
manual explains that it's harmless to earlier versions, which will
interpret the extra dots as separating additional version components,
and ignore them. :-)
A user helpfully figured this out for us after the changes to Plink's
password prompt handling had disrupted their previous workflow. So it
seems worth documenting in case anyone else needs this fix.
(I think it is a fix, not a workaround: anyone needing this option now
probably _should_ have been doing it all along, because with the old
behaviour, Plink would have been sending a password prompt to Git, and
maybe even interpreting some of Git's protocol output as a password!
-batch would have been a more sensible way to abort the connection
even before the changes.)
When running on Wayland, gdk_display_get_name() can return things like
"wayland-0" rather than valid X display names. PuTTY nonetheless
treated them as X display names, meaning that when running under
Wayland, pterm would set DISPLAY to "wayland-0" in subprocesses, and
PuTTY's X forwarding wouldn't work properly.
To fix this, places that call gdk_display_get_name() now only do so on
displays for which GDK_IS_X_DISPLAY() is true. As with
GDK_IS_X_WINDOW(), this requires some backward-compatibility for GDK
versions where everything is implicitly running on X.
To make this work usefully, pterm now also won't unset DISPLAY if it
can't get an X display name and instead will pass through whatever value
of DISPLAY it received. I think that's better behaviour anyway.
There are two separate parts of PuTTY that call gdk_display_get_name().
platform_get_x_display() in unix/putty.c is used for X forwarding, while
gtk_seat_get_x_display() in unix/window.c is used for setting DISPLAY
and recording in utmp. I've updated both of them.
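In sketch form, the guarded lookup is just this (GDK_IS_X_DISPLAY()
being the macro mentioned above, i.e. GDK's own check where it exists,
or the backward-compatibility fallback on GDK versions old enough that
every display is an X display):

    /* Return the X display name, or NULL if this isn't an X display,
     * in which case callers leave DISPLAY alone rather than setting it
     * to something like "wayland-0". */
    static const char *x_display_name_or_null(GdkDisplay *disp)
    {
        if (GDK_IS_X_DISPLAY(disp))
            return gdk_display_get_name(disp);
        return NULL;
    }
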
When the user pressed Return at the end of a line, we were calling the
TermLineEditor's receiver function once for each character in the line
buffer. A Telnet user reported from looking at packet traces that this
leads to each character being sent in its own TCP segment, which is
wasteful and silly, and a regression in 0.82 compared to 0.81.
You can see the SSH version of the phenomenon even more easily in
PuTTY's own SSH logs, without having to look at the TCP layer at all:
you get a separate SSH2_MSG_CHANNEL_DATA per character when sending a
line that you entered via local editing in the GUI terminal.
The fix in this commit makes lineedit_send_line() collect keystrokes
into a temporary bufchain and pass them on to the backend in chunks
the size of a bufchain block.
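The shape of that fix, as a sketch using PuTTY's bufchain and
backend_send interfaces (not the literal lineedit code, and the exact
signatures may differ):

    /* Hand the buffered-up line to the backend one contiguous bufchain
     * block at a time, instead of one keystroke at a time. */
    static void send_buffered_line(Backend *backend, bufchain *line)
    {
        while (bufchain_size(line) > 0) {
            ptrlen chunk = bufchain_prefix(line); /* one contiguous block */
            backend_send(backend, chunk.ptr, chunk.len);
            bufchain_consume(line, chunk.len);
        }
    }
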
This is better, but still not completely ideal: lineedit_send_line()
is often followed by a call to lineedit_send_newline(), and there's no
buffering done between _those_ functions. So you'll still see a
separate SSH message / Telnet TCP segment for the newline after the
line.
I haven't fixed that in this commit, for two reasons. First, unlike
the character-by-character sending of the line content, it's not a
regression in 0.82: previous versions also sent the newline in a
separate packet and nobody complained about that. Second, it's much
more difficult, because newlines are handled specially - in particular
by the Telnet backend, which sometimes turns them into a wire sequence
CR LF that can't be generated by passing any literal byte to
backend_send. So you'd need to violate a load of layers, or else have
multiple parts of the system buffer up output and then arrange to
release it on a toplevel callback or some such. Much more code, more
risk of bugs, and less gain.
This occurred if the SSH server closed the connection for any
reason (in practice usually a timeout, but reproducible more easily by
manually killing a test server process) while the user was in the
middle of any kind of interactive prompt-based login in the GUI PuTTY
terminal (be it simple password, k-i, private key passphrase,
whatever).
The problem was that term->userpass_state wasn't cleaned up when the
connection died, and then if you started a fresh SSH session in the
same terminal, the attempt to create a new term->userpass_state would
find there was one already there.
The simplest place to insert the missing cleanup is the call to
term_provide_backend(), because that's a terminal API function which
is already called to notify the terminal that one backend has gone
away and the next one has turned up.
(In fact, it's called twice, once to set term->backend to NULL when
the first session closes, and again when the session is restarted. I
see no harm in making the cleanup unconditional, not bothering to tell
the difference between the two cases.)
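In outline, the cleanup looks like this (simplified, and the name of
the freeing function is illustrative rather than the real one):

    /* Sketch: whenever the backend changes hands, in either direction,
     * throw away any leftover userpass state from the previous session.
     * The real term_provide_backend() does other work besides this. */
    void term_provide_backend(Terminal *term, Backend *backend)
    {
        if (term->userpass_state) {
            free_userpass_state(term->userpass_state); /* illustrative name */
            term->userpass_state = NULL;
        }
        term->backend = backend;
    }
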
This centralises into windows/utils/request_file.c all of the code
that deals with the OPENFILENAME structure, and decides centrally
whether to use the Unicode or ANSI version of that structure and its
associated APIs. Now the output of any request_file function is our
own 'Filename' abstract type, instead of a raw char or wchar_t buffer,
which means that _any_ file dialog can produce a full Unicode filename
if the user wants to select one - and yet, in the w32old build, they
all uniformly fall back to the ANSI version, which is the only one
that works at all pre-NT.
A side effect: I've turned the FILTER_FOO_FILES family of definitions
from platform-specific #defines into a reasonably sensible enum. This
didn't affect the GTK side of things, because I'd never got round to
figuring out how to filter a file dialog down to a subset of files in
GTK, and still haven't. So I've just moved the existing FIXME comment
from platform.h to dialog.c.
A user reported shortly after 0.82 was released that they were
experiencing display corruption when connecting to a PDP-10 running
ITS using PuTTY's SUPDUP backend, and that the nature of the
corruption was consistent with a missing clear-to-EOL operation.
Without the SUPDUP or ITS expertise to debug it ourselves, we are
indebted to Scott Michel for identifying where, and providing a patch.
(However, now that the patch is presented, it's obvious even to me
that a line should be cleared here! The comment in PuTTY's own code
mentions clearing the line that the cursor has moved on to, and the
same text appears in RFC 734.)
draft-kampanakis-curdle-ssh-pq-ke defines the packet names
SSH_MSG_KEX_HYBRID_INIT and SSH_MSG_KEX_HYBRID_REPLY. They have the
same numbers as ECDH_INIT and ECDH_REPLY, and don't change anything
else, so this is just a naming change. But I think it's a good one,
because the post-quantum KEMs are less symmetric than ECDH (they're
much more like Ben's RSA kex in concept, though very different in
detail), and shouldn't try to pretend they're the same kind of thing.
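In code terms the renaming is the whole of the change, because the
draft reuses the ECDH message numbers; something along these lines:

    /* Same wire numbers as the ECDH messages (RFC 5656), new names per
     * draft-kampanakis-curdle-ssh-pq-ke. */
    #define SSH2_MSG_KEX_ECDH_INIT     30
    #define SSH2_MSG_KEX_ECDH_REPLY    31
    #define SSH2_MSG_KEX_HYBRID_INIT   30
    #define SSH2_MSG_KEX_HYBRID_REPLY  31
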
Also this enables logparse.pl to give a warning about the fact that
one string in each packet contains two separate keys glomphed together.
For the latter reason (and also because it's easier in my code
structure) I've also switched to using the HYBRID naming for the
existing NTRU + Curve25519 hybrid method, even though the
Internet-Draft for that one still uses the ECDH names. Sorry, but I
think it's clearer!
As standardised by NIST in FIPS 203, this is a lattice-based
post-quantum KEM.
Very vaguely, the idea of it is that your public key is a matrix A and
vector t, and the private key is the knowledge of how to decompose t
into two vectors with all their coefficients small, one transformed by
A relative to the other. Encryption of a binary secret starts by
turning each bit into one of two maximally separated residues mod a
prime q, and then adding 'noise' based on the public key in the form
of small increments and decrements mod q, again with some of the noise
transformed by A relative to the rest. Decryption uses the knowledge
of t's decomposition to align the two sets of noise so that the
_large_ changes (which masked the secret from an eavesdropper) cancel
out, leaving only a collection of small changes to the original secret
vector. Then the vector of input bits can be recovered by assuming
that those accumulated small pieces of noise haven't concentrated in
any particular residue enough to push it more than half way to the
other of its possible starting values.
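Even more schematically, glossing over the module structure,
compression and the FO transform, and using ad-hoc notation rather
than FIPS 203's:

    keygen:   pk = (A, t)  with  t = A*s + e     (s, e small; secret is s)
    encrypt:  u = A^T*r + e1,   v = t^T*r + e2 + round(q/2)*m
    decrypt:  v - s^T*u  =  round(q/2)*m + (e^T*r + e2 - s^T*e1)
              and the bracketed noise term is small, so rounding each
              coefficient towards 0 or q/2 recovers the bits of m.
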
A weird feature of it is that decryption is not a true mathematical
inverse of encryption. The assumption that the noise doesn't get large
enough to flip any bit of the secret is only probabilistically valid,
not a hard guarantee. In other words, key agreement can fail, simply
by getting particularly unlucky with the distribution of your random
noise! However, the probability of a failure is very low - less than
2^-138 even for ML-KEM-512, and gets even smaller with the larger
variants.
An awkward feature for our purposes is that the matrix A, containing a
large number of residues mod the prime q=3329, is required to be
constructed by a process of rejection sampling, i.e. generating random
12-bit values and throwing away the out-of-range ones. That would be a
real pain for our side-channel testing system, which generally handles
rejection sampling badly (since it necessarily involves data-dependent
control flow and timing variation). Fortunately, the matrix and the
random seed it was made from are both public: the matrix seed is
transmitted as part of the public key, so it's not necessary to try to
hide it. Accordingly, I was able to get the implementation to pass
testsc by means of not varying the matrix seed between runs, which is
justified by the principle of testsc that you vary the _secrets_ to
ensure timing is independent of them - and the matrix seed isn't a
secret, so you're allowed to keep it the same.
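The operation in question, in miniature (the bit-source callback here
is a stand-in, not the real interface):

    #include <stdint.h>

    /* Keep 12-bit candidates below q = 3329, discard the rest: inherently
     * data-dependent control flow, which is what testsc dislikes. */
    static uint16_t sample_mod_q(uint16_t (*next_12_bits)(void *ctx),
                                 void *ctx)
    {
        for (;;) {
            uint16_t v = next_12_bits(ctx);
            if (v < 3329)
                return v;
        }
    }
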
The three hybrid algorithms, defined by the current Internet-Draft
draft-kampanakis-curdle-ssh-pq-ke, include one hybrid of ML-KEM-768
with Curve25519 in exactly the same way we were already hybridising
NTRU Prime with Curve25519, and two more hybrids of ML-KEM with ECDH
over a NIST curve. The former hybrid interoperates with the
implementation in OpenSSH 9.9; all three interoperate with the fork
'openssh-oqs' at github.com/open-quantum-safe/openssh, and also with
the Python library AsyncSSH.
KEX_MAX is actually the limiting factor on how small the whole PuTTY
configuration dialog box can be: when KEX_MAX increased from 10 to 11
with the introduction of NTRU, the config box got taller. Now it's
back at 10.
This adds an ssh_hashalg defining SHAKE256 with a 32-byte output, in
addition to the 114-byte output we already have.
Also, it defines a new API for using SHAKE128 and SHAKE256 in the more
general form of an extendable output function, which is to say that
you still have to put in all the input before reading any output, but
once you start reading output you can just keep going until you have
enough.
Both of these will be needed in an upcoming commit implementing ML-KEM.
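The intended usage pattern of the XOF half of this is as follows (all
the declarations here are illustrative stand-ins, not the real API):

    #include <stddef.h>

    /* Hypothetical declarations standing in for the real XOF interface: */
    typedef struct ShakeXOF ShakeXOF;
    ShakeXOF *shake256_xof_new(void);
    void shake_xof_put(ShakeXOF *sx, const void *data, size_t len);
    void shake_xof_read(ShakeXOF *sx, void *out, size_t len);
    void shake_xof_free(ShakeXOF *sx);

    /* All input first, then read output in as many pieces as you like. */
    static void derive_two_keys(const void *in, size_t inlen,
                                unsigned char *k1, unsigned char *k2)
    {
        ShakeXOF *sx = shake256_xof_new();
        shake_xof_put(sx, in, inlen);
        shake_xof_read(sx, k1, 32);   /* first 32 bytes of output */
        shake_xof_read(sx, k2, 32);   /* next 32, continuing the stream */
        shake_xof_free(sx);
    }
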
Now ntru.c contains just the NTRU business, and kex-hybrid.c contains
the system for running a post-quantum and a classical KEX and hashing
together the results. In between them is a new small vtable API for
the key encapsulation mechanisms that the post-quantum standardisation
effort seems to be settling on.
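The kind of interface I mean, sketched with illustrative member names
(not necessarily the ones in the real header):

    #include "putty.h"

    /* A KEM boils down to three operations: make a keypair, encapsulate
     * against someone's public value (producing a ciphertext plus shared
     * secret), and decapsulate a ciphertext to recover the same secret. */
    typedef struct pq_kem_vt {
        void (*keygen)(strbuf *pub_out, void **secret_out);
        void (*encaps)(ptrlen pub_in, strbuf *ct_out, strbuf *shared_out);
        bool (*decaps)(void *secret, ptrlen ct_in, strbuf *shared_out);
        const char *name;
    } pq_kem_vt;
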
aligned_alloc() is used by testsc for all its memory allocation, to
avoid false-positive timing variations that depend on memory alignment
rather than actual secret data. But I'd forgotten that aligned_alloc
requires the allocation size to be a multiple of the requested
alignment.
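The fix is just to round the size up before calling it, along these
lines (assuming a power-of-two alignment):

    #include <stdlib.h>

    /* C11 aligned_alloc() requires 'size' to be a multiple of
     * 'alignment', so round the requested size up first. */
    static void *aligned_alloc_rounded(size_t alignment, size_t size)
    {
        size_t rounded = (size + alignment - 1) & ~(alignment - 1);
        return aligned_alloc(alignment, rounded);
    }
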
This showed up when I ran testsc in dry-run mode, and my normal build
happened to be using ASan, which complains at the invalid allocation
size. But it was theoretically a problem in all builds of
testsc. (Though, as far as I'm aware, not practically; and it _only_
affected testsc.)
By putting the wrong error-type enum value in a ScanKexinitsResult, I
accidentally caused nonsense messages of the form
    Selected key exchange algorithm "foo,bar,baz" does not correspond to any supported algorithm
where "foo,bar,baz" is the full comma-separated list sent by the
server, so it's not even _an_ algorithm as the message suggests.
Now the message is the one it should have been all along:
    Couldn't agree a key exchange algorithm (available: foo,bar,baz)
I only observed this in the GTK1 build, but I don't know for sure it
can't happen in other situations, so there's no reason not to be
careful.
What seems to happen is that when the user clicks Cancel on the Change
Settings dialog box, we call gtk_widget_destroy on the window, which
emits the "destroy" signal on the window, our handler for which frees
the whole dlgparam. But _then_ GTK goes through and cleans up all the
sub-widgets of the dialog box, and some of those generate extra
events. In particular, destroying a list box is done by first deleting
all the list entries - and if one of those is selected, the list box's
selection changes, triggering an event which calls our callback that
tries to look up the control in the dlgparam we just freed.
My simple workaround is to defer actually freeing the dlgparam, via a
toplevel callback. Then it's still lying around empty while all those
random events are firing.
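A sketch of the deferral, using queue_toplevel_callback(), PuTTY's
existing mechanism for running a function later from the main loop:

    #include "putty.h"

    /* Runs from the top-level callback queue, after GTK has finished
     * emitting the stray selection-change events on the dying widgets. */
    static void dlgparam_deferred_free(void *ctx)
    {
        sfree(ctx); /* in real code, also free whatever the dlgparam owns */
    }

    /* ... and in the window's "destroy" handler, instead of freeing now:
     *         queue_toplevel_callback(dlgparam_deferred_free, dp);
     */
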
On Windows, this means they can use non-CP_ACP characters. Also it has
the side effect of cloning the filename out of the CmdlineArgList,
which makes it still valid after cmdline_arg_list_free (oops).
In commit 20f818af12 I renamed a lot of variables called 'ret',
by using clang-rename to do the heavy lifting. But clang-rename only
saw instances of the variable name that the _compiler_ saw. The ones
that never got through the preprocessor weren't renamed, and I didn't
eyeball the patch hard enough to find instances in the #else branch of
ifdefs that should also have been renamed.
Thanks to Lars Wendler for the report and the fixes.
If this occurs before cmdline_run_saved, then the latter will use its
saved pointers to arguments in the freed CmdlineArgList.
Affects uses of PuTTY without a saved session (like 'putty -ssh
foohost'), and a very small number of pterm options, in particular
-sessionlog.
This is the simplest possible fix: just remove the free completely,
so that the parsed command-line arguments leak. There's at most one
instance of them per process, so it doesn't matter.
I had allocated a string, advanced a pointer along it, and then freed
that pointer instead of the pointer to the start of the string.
I'd already applied the correct fix in tolocal() in commit
841bf321d4, but it needed applying in get_dir_list() too.
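The class of bug, in miniature:

    #include "putty.h"

    /* Keep hold of the original allocation pointer, and free that, not
     * the working pointer you've advanced along the string. */
    static void example(const char *input)
    {
        char *str = dupstr(input);   /* allocation starts here */
        char *p = str;
        while (*p && *p != '/')
            p++;                     /* p no longer points at the start */
        /* ... use p ... */
        sfree(str);                  /* free str, not p */
    }
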
This reverts the change to is_interactive() by commit 80aed96286,
which switched it to using the new conio system. Now we're back to
doing it the same way as we used to: we check if stdin is a console.
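That check has roughly this shape (not necessarily the literal code):

    #include <windows.h>
    #include <stdbool.h>

    /* "Is stdin a console?": GetConsoleMode() only succeeds on genuine
     * console handles, so redirected input makes this return false. */
    static bool stdin_is_console(void)
    {
        DWORD mode;
        HANDLE h = GetStdHandle(STD_INPUT_HANDLE);
        return h != NULL && h != INVALID_HANDLE_VALUE &&
               GetConsoleMode(h, &mode) != 0;
    }
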
The only use of is_interactive() on Windows is deciding whether to
present the console antispoof prompt. (On Unix it has an additional
use in cmdgen, for deciding whether to emit progress reports, but on
Windows that doesn't come up).
On Unix this is based on stdin being a tty, which means that a
command such as "plink host do stuff </dev/null" omits the antispoof
prompt. That's deliberate: the prompt is to defend against attacks
where the user sends interactive input to the SSH session channel
believing it to be directed at userauth, but if the input _isn't_
coming from the interactive terminal where the user is answering
userauth prompts, then they can't do that even if they are fooled.
On Windows, I think the same argument applies, now that we're reading
userauth prompts from the console in the same way as Unix. So
is_interactive() now does the analogous thing on Windows.
Conveniently, this _also_ means is_interactive() is back to exactly
how it was before the conio rewrite, which means it's one fewer thing
that can unexpectedly change and break someone's workflow. (Otherwise
I might also have wanted to change its behaviour based on
-legacy-stdio-handling, which would be extra ugly.)
In 0.81, some prompts (such as host-key prompts) went to stderr, while
others (such as username and password prompts) went to stdout.
With -legacy-stdio-prompts (or if we otherwise couldn't get console
handles), we were sending all such prompts to stdout, which might have
messed up someone's workflow if they were interacting with the
non-username/password prompts programmatically.
The fix in commit 31ab5b8e30 was incomplete. It works when you
maximise the window by pressing the maximise button in the title bar,
but not when you drag the window to the very top of the screen. In the
latter case, the resize at the instant of maximisation works right,
but then we get a WM_EXITSIZEMOVE when the drag is released,
triggering one final resize which ignored the window border again.
That was because wm_size_resize_term() was deliberately being given a
flag telling it to ignore the border: apparently at some point in the
past, ignoring the border even for a normally maximised window (not
even full-screen) was done on purpose. But for the reasons
in the previous commit, I'm changing that.
Just as with other recent changes like usernames, this allows the
remote command line to include characters outside the system code
page, encoding as UTF-8 on the wire (as the SSH protocol has wanted
all along).
When term_input_data_from_charset receives data in a specified
character set (that isn't a negative number indicating "just send
binary"), it was translating from that character set into wide-
character Unicode, but then forgetting to translate back again into
the terminal's configured character set.
Introduced by the rewrite in commit 4f756d2a4d. Affected the
answerback string sent in response to ^E, and I think potentially some
paths through term_keyinput too, although I haven't quite worked out
which ones would have been affected.
The UCS-2 translation of the access-mode string ("r", "wb" etc) was
not being freed, because the free call appeared after an unconditional
return statement.
Spotted by Coverity, although now I wonder why an ordinary compiler
warning hadn't long since pointed this one out!
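The shape of the bug, illustratively (the conversion helper is a
stand-in for whatever the real code uses):

    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    wchar_t *mode_to_ucs2(const char *mode);   /* stand-in declaration */

    static FILE *open_sketch(const wchar_t *name, const char *mode)
    {
        wchar_t *wmode = mode_to_ucs2(mode);
        FILE *fp = _wfopen(name, wmode);
        free(wmode);   /* the fix: free *before* the return; placed after
                        * it, this line is unreachable dead code */
        return fp;
    }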