DIT, for 'Data-Independent Timing', is a bit you can set in the
processor state on sufficiently new Arm CPUs, which promises that a
long list of instructions will deliberately avoid varying their timing
based on the input register values. Just what you want for keeping
your constant-time crypto primitives constant-time.
As far as I'm aware, no CPU has _yet_ implemented any data-dependent
optimisations, so DIT is a safety precaution against them doing so in
future. It would be embarrassing to be caught without it if a future
CPU does do that, so we now turn on DIT in the PuTTY process state.
I've put a call to the new enable_dit() function at the start of every
main() and WinMain() belonging to a program that might do
cryptography (even testcrypt, in case someone uses it for something!),
and in case I missed one there, also added a second call at the first
moment that any cryptography-using part of the code looks as if it
might become active: when an instance of the SSH protocol object is
configured, when the system PRNG is initialised, and when selecting
any cryptographic authentication protocol in an HTTP or SOCKS proxy
connection. With any luck those precautions between them should ensure
it's on whenever we need it.
Arm's own recommendation is that you should carefully choose the
granularity at which you enable and disable DIT: there's a potential
time cost to turning it on and off (I'm not sure what, but plausibly
something of the order of a pipeline flush), so it's a performance hit
to do it _inside_ each individual crypto function, but if CPUs start
supporting significant data-dependent optimisation in future, then it
will also become a noticeable performance hit to just leave it on
across the whole process. So you'd like to do it somewhere in the
middle: for example, you might turn on DIT once around the whole
process of verifying and decrypting an SSH packet, instead of once for
decryption and once for MAC.
With all respect to that recommendation as a strategy for maximum
performance, I'm not following it here. I turn on DIT at the start of
the PuTTY process, and then leave it on. Rationale:
1. PuTTY is not otherwise a performance-critical application: it's
not likely to max out your CPU for any purpose _other_ than
cryptography. The most CPU-intensive non-cryptographic thing I can
imagine a PuTTY process doing is the complicated computation of
font rendering in the terminal, and that will normally be cached
(you don't recompute each glyph from its outline and hints for
every time you display it).
2. I think a bigger risk lies in accidental side channels from having
DIT turned off when it should have been on. I can imagine lots of
causes for that. Missing a crypto operation in some unswept corner
of the code; confusing control flow (like my coroutine macros)
jumping with DIT clear into the middle of a region of code that
expected DIT to have been set at the beginning; having a reference
counter of DIT requests and getting it out of sync.
In a more sophisticated programming language, it might be possible to
avoid the risk in #2 by cleverness with the type system. For example,
in Rust, you could have a zero-sized type that acts as a proof token
for DIT being enabled (it would be constructed by a function that also
sets DIT, have a Drop implementation that clears DIT, and be !Send so
you couldn't use it in a thread other than the one where DIT was set),
and then you could require all the actual crypto functions to take a
DitToken as an extra parameter, at zero runtime cost. Then "oops I
forgot to set DIT around this piece of crypto" would become a compile
error. Even so, you'd have to take some care with coroutine-structured
code (what happens if a Rust async function yields while holding a DIT
token?) and with nesting (if you have two DIT tokens, you don't want
dropping the inner one to clear DIT while the outer one is still there
to wrongly convince callees that it's set). Maybe in Rust you could
get this all to work reliably. But not in C!
DIT is an optional feature of the Arm architecture, so we must first
test to see if it's supported. This is done the same way as we already
do for the various Arm crypto accelerators: on ELF-based systems,
check the appropriate bit in the 'hwcap' words in the ELF aux vector;
on Mac, look for an appropriate sysctl flag.
On Windows I don't know of a way to query the DIT feature, _or_ of a
way to write the necessary enabling instruction in an MSVC-compatible
way. I've _heard_ that it might not be necessary, because Windows
might just turn on DIT unconditionally and leave it on, in an even
more extreme version of my own strategy. I don't have a source for
that - I heard it by word of mouth - but I _hope_ it's true, because
that would suit me very well! Certainly I can't write code to enable
DIT without knowing (a) how to do it, (b) how to know if it's safe.
Nonetheless, I've put the enable_dit() call in all the right places in
the Windows main programs as well as the Unix and cross-platform code,
so that if I later find out that I _can_ put in an explicit enable of
DIT in some way, I'll only have to arrange to set HAVE_ARM_DIT and
compile the enable_dit() function appropriately.
This begins the process of enabling our Windows applications to handle
Unicode characters on their command lines which don't fit in the
system code page.
Instead of passing plain strings to cmdline_process_param, we now pass
a partially opaque and platform-specific thing called a CmdlineArg.
This has a method that extracts the argument word as a default-encoded
string, and another one that tries to extract it as UTF-8 (though it
may fail if the UTF-8 isn't available).
On Windows, the command line is now constructed by calling
split_into_argv_w on the Unicode command line returned by
GetCommandLineW(), and the UTF-8 method returns text converted
directly from that wide-character form, not going via the system code
page. So it _can_ include UTF-8 characters that wouldn't have
round-tripped via CP_ACP.
This commit introduces the abstraction and switches over the
cross-platform and Windows argv-handling code to use it, with minimal
functional change. Nothing yet tries to call cmdline_arg_get_utf8().
I say 'cross-platform and Windows' because on the Unix side there's
still a lot of use of plain old argv which I haven't converted. That
would be a much larger project, and isn't currently needed: the
_current_ aim of this abstraction is to get the right things to happen
relating to Unicode on Windows, so for code that doesn't run on
Windows anyway, it's not adding value. (Also there's a tension with
GTK, which wants to talk to standard argv and extract arguments _it_
knows about, so at the very least we'd have to let it munge argv
before importing it into this new system.)
This begins the process of making PuTTY more able to handle Unicode
strings as a first-class type in its configuration. One of the new
types, CONF_TYPE_UTF8, looks physically just like CONF_TYPE_STR but
the semantics are that it's definitely encoded in UTF-8, instead of
'shrug, whatever the system locale's encoding is'.
Unfortunately, we can't yet switch over any Conf items to having that
type, because our data representations in saved configuration (both on
Unix and Windows) store char strings in the system encoding. So we'll
have to change that representation at the same time, which risks
breaking backwards compatibility with old PuTTYs reading the same
configuration.
So the other new type, CONF_TYPE_STR_AMBI, is intended as a
transitional form, recording a configuration setting that _might_ be
explicitly UTF-8 or might have the legacy 'shrug, whatever' semantics,
depending on where we got it from.
My general migration plan is that first I _enable_ Unicode support in
a Conf item, by turning it into STR_AMBI; the Unicode version of the
string (if any) is saved in a new location, and a best-effort
local-charset version is saved where it's always been. That way new
PuTTY can read the Unicode version, and old PuTTY reading that
configuration will behave no worse than it would have done already.
It would be nice to think that in the far future we've migrated
everything to STR_AMBI and can move them all to mandatory UTF-8,
obsoleting the old configuration. I think it's more likely we'll never
get there. But at least _new_ Conf items, with no backwards
compatibility requirement in the first place, can be CONF_TYPE_UTF8
where appropriate.
(In conf_get_str_ambi(), I considered making it mandatory via assert()
to pass the 'utf8' output pointer as non-NULL, to defend against lazy
adaptation of existing code by just changing the function call. But in
fact I think there's a legitimate use case for not caring if the
output is UTF-8 or not, because some of the existing SSH code
currently just shoves strings like usernames directly on to the wire
whether they're in the right encoding or not; so if you want to do the
correct UTF-8 thing where possible and preserve legacy behaviour if
not, then treating both classes of string the same _is_ the right
thing to do.)
This also requires linking the Unicode support into many Unix
applications that hadn't previously needed it.
While trying to get an upcoming piece of code through testsc, I had
trouble - _yet again_ - with the way that control flow diverges inside
the glibc implementations of functions like memcpy and memset,
depending on the alignment of the input blocks _above_ the alignment
guaranteed by malloc, so that doing the same sequence of malloc +
memset can lead to different control flow. (I believe this is done
either for cache performance reasons or SIMD alignment requirements,
or both: on x86, some SIMD instructions require memory alignment
beyond what malloc guarantees, which is also awkward for our x86
hardware crypto implementations.)
My previous effort to normalise this problem out of sclog's log files
worked by wrapping memset and all its synonyms that I could find. But
this weekend, that failed for me, and the reason appears to be ifuncs.
I'm aware of the great irony of committing code to a security project
with a log message saying something vague about ifuncs, on the same
weekend that it came to light that commits matching that description
were one of the methods used to smuggle a backdoor into the XZ Utils
project (CVE-2024-3094). So I'll bend over backwards to explain both
what I think is going on, and why this _isn't_ a weird ifunc-related
backdooring attempt:
When I say I 'wrap' memset, I mean I use DynamoRIO's 'drwrap' API to
arrange that the side-channel test rig calls a function of mine before
and after each call to memset. The way drwrap works is to look up the
symbol address in either the main program or a shared library; in this
case, it's a shared library, namely libc.so. Then it intercepts call
instructions with exactly that address as the target.
Unfortunately, what _actually_ happens when the main program calls
memset is more complicated. First, control goes to the PLT entry for
memset (still in the main program). In principle, that loads a GOT
entry containing the address of memset (filled in by ld.so), and jumps
to it. But in fact the GOT entry varies its value through the program;
on the first call, it points to a resolver function, whose job is to
_find out_ the address of memset. And in the version of libc.so I'm
currently running, that resolver is an STT_GNU_IFUNC indirection
function, which tests the host CPU's capabilities, and chooses an
actual implementation of memset depending on what it finds. (In my
case, it looks as if it's picking one that makes extensive use of x86
SIMD.) To avoid the overhead of doing this on every call, the returned
function pointer is then written into the main program's GOT entry for
memset, overwriting the address of the resolver function, so that the
_next_ call the main program makes through the same PLT entry will go
directly to the memset variant that was chosen.
And the problem is that, after this has happened, none of the new
control flow ever goes near the _official_ address of memset, as read
out of libc.so's dynamic symbol table by DynamoRIO. The PLT entry
isn't at that address, and neither is the particular SIMD variant that
the resolver ended up choosing. So now my wrapper on memset is never
being invoked, and memset cheerfully generates different control flow
in runs of my crypto code that testsc expects to be doing exactly the
same thing as each other, and all my tests fail spuriously.
My solution, at least for the moment, is to completely abandon the
strategy of wrapping memset. Instead, let's just make it behave the
same way every time, by forcing all the affected memory allocations to
have extra-strict alignment. I found that 64-byte alignment is not
good enough to eliminate memset-related test failures, but 128-byte
alignment is.
This would be tricky in itself, if it weren't for the fact that PuTTY
already has its own wrapper function on malloc (for various reasons),
which everything in our code already uses. So I can divert to C11's
aligned_alloc() there. That in turn is done by adding a new #ifdef to
utils/memory.c, and compiling it with that #ifdef into a new object
library that is included in testsc, superseding the standard memory.o
that would otherwise be pulled in from our 'utils' static library.
With the previous memset-compensator removed, this means testsc is now
dependent on having aligned_alloc() available. So we test for it at
cmake time, and don't build testsc at all if it can't be found. This
shouldn't bother anyone very much; aligned_alloc() is available on
_my_ testsc platform, and if anyone else is trying to run this test
suite at all, I expect it will be on something at least as new as
that.
(One awkward thing here is that we can only replace _new_ allocations
with calls to aligned_alloc(): C11 provides no aligned version of
realloc. Happily, this doesn't currently introduce any new problems in
testsc. If it does, I might have to do something even more painful in
future.)
So, why isn't this an ifunc-related backdoor attempt? Because (and you
can check all of this from the patch):
1. The memset-wrapping code exists entirely within the DynamoRIO
plugin module that lives in test/sclog. That is not used in
production, only for running the 'testsc' side-channel tester.
2. The memset-wrapping code is _removed_ by this patch, not added.
3. None of this code is dealing directly with ifuncs - only working
around the unwanted effects on my test suite from the fact that
they exist somewhere else and introduce awkward behaviour.
This aims to be a reasonably exhaustive test of what happens if you
set Conf values to various things, and then save your session, and
find out what ends up in the storage. Or vice versa.
Currently, the test program is written to match the existing
behaviour. The idea is that I can refactor the code that does the
loading and saving, and if this test still passes, I've probably done
it right.
However, in the long term, this test will be a liability: it's yet
another place you have to add every new config option. So my plan is
to get rid of it again once the refactorings I'm planning are
finished.
Or rather, I'll get rid of _that_ part of its functionality. I also
suspect I'll have added new kinds of consistency check by then, which
won't be a liability in the same way, and which I'll want to keep.
If we don't have GTK enabled in the build, then lots of important
stuff never gets added to the 'guiterminal' build-time object library,
without which these terminal-using programs can't link successfully,
even though they don't actually use GTK.
I could add yet more stub functions, but I don't think that's really
necessary - it doesn't seem like a serious inconvenience that you can
only test the terminal on a platform where you can also build real
applications that include it. So I've just moved those two executable
file definitions inside the Big If that conditionalises PuTTY and
pterm themselves.
This has all the basic necessities to become a test of the terminal's
behaviour, in terms of how its data structures evolve as output is
sent to it, and perhaps also (by filling in the stub TermWin more
usefully) testing what it draws during updates and what it sends in
response to query sequences.
For the moment, all I've done is to set up the framework, and add one
demo test of printing some ordinary text and observing that it appears
in the data structures and the cursor has moved.
I expect that writing a full test of terminal.c will be a very big
job. But perhaps I or someone else will find time to prod it gradually
in the background of other work. In particular, when I'm _modifying_
any part of the terminal code, it would be good to add some tests for
the part I'm changing, before making the change, and check they still
work afterwards.
I'm about to rewrite it completely, so the first thing I need to do is
to write tests for as much of the functionality as possible, so that I
can check the new implementation behaves in the same ways.
This removes one case from several of the individual tools'
command-line parsers, and moves it into a central place where it will
automatically be supported by any tool containing console.c.
In order to make that not cause a link failure, there's now a
stubs/no-console.c which GUI clients of cmdline.c must include.
I made a specific subdirectory 'stubs' to keep all the link-time stub
modules in, like notiming.c. And I put _one_ run-time stub in it,
namely nullplug.c. But the rest of the runtime stubs went into utils.
I think it's better to keep all the stubs together, so I've moved all
the null*.c in utils into stubs (with the exception of nullstrcmp.c,
which means the 'null' in a different sense). Also, fiddled with the
naming to be a bit more consistent, and stated in the new CMakeLists
the naming policy that distinguishes no-*.c from null-*.c.
In the course of polishing up this dialog box, I'm going to want it to
actually do cryptographic things (such as checking validity of a
public key blob and printing its fingerprint), which means it will
need to link against SSH utility functions.
So I've moved the dialog-box setup and handling code out of config.c
into a new file in the ssh subdirectory and in the ssh library, where
those facilities will be conveniently available.
This also means that dialog-box setup code _won't_ be linked into
PuTTYtel or pterm (on either platform), so I've added a stub source
file to provide its entry-point function in those tools. Also,
provided a const bool to indicate whether that dialog is available,
which we use to decide whether to recognise that command-line option.
Unix Pageant is in a tricky position as a hybrid CLI/GUI application.
It has uses even in a purely CLI environment, but it won't build
without libgtk-3-dev and friends.
The solution, of course - enabled by the migration to cmake - is to
allow it to build without GTK, leaving out just the GTK askpass
functionality. That way you can still use it in any of its CLI modes,
either as a non-graphical SSH agent or as a client for an agent
elsewhere.
(You can still even use it in X lifetime mode, because its connection
to the X server is done using PuTTY's built-in X authentication and
connection setup code. It's only putting up the password prompt window
that you lose in this configuration - so you're still fine as long as
you don't try to add any encrypted keys.)
All the seat functions that request an interactive prompt of some kind
to the user - both the main seat_get_userpass_input and the various
confirmation dialogs for things like host keys - were using a simple
int return value, with the general semantics of 0 = "fail", 1 =
"proceed" (and in the case of seat_get_userpass_input, answers to the
prompts were provided), and -1 = "request in progress, wait for a
callback".
In this commit I change all those functions' return types to a new
struct called SeatPromptResult, whose primary field is an enum
replacing those simple integer values.
The main purpose is that the enum has not three but _four_ values: the
"fail" result has been split into 'user abort' and 'software abort'.
The distinction is that a user abort occurs as a result of an
interactive UI action, such as the user clicking 'cancel' in a dialog
box or hitting ^D or ^C at a terminal password prompt - and therefore,
there's no need to display an error message telling the user that the
interactive operation has failed, because the user already knows,
because they _did_ it. 'Software abort' is from any other cause, where
PuTTY is the first to know there was a problem, and has to tell the
user.
We already had this 'user abort' vs 'software abort' distinction in
other parts of the code - the SSH backend has separate termination
functions which protocol layers can call. But we assumed that any
failure from an interactive prompt request fell into the 'user abort'
category, which is not true. A couple of examples: if you configure a
host key fingerprint in your saved session via the SSH > Host keys
pane, and the server presents a host key that doesn't match it, then
verify_ssh_host_key would report that the user had aborted the
connection, and feel no need to tell the user what had gone wrong!
Similarly, if a password provided on the command line was not
accepted, then (after I fixed the semantics of that in the previous
commit) the same wrong handling would occur.
So now, those Seat prompt functions too can communicate whether the
user or the software originated a connection abort. And in the latter
case, we also provide an error message to present to the user. Result:
in those two example cases (and others), error messages should no
longer go missing.
Implementation note: to avoid the hassle of having the error message
in a SeatPromptResult being a dynamically allocated string (and hence,
every recipient of one must always check whether it's non-NULL and
free it on every exit path, plus being careful about copying the
struct around), I've instead arranged that the structure contains a
function pointer and a couple of parameters, so that the string form
of the message can be constructed on demand. That way, the only users
who need to free it are the ones who actually _asked_ for it in the
first place, which is a much smaller set.
(This is one of the rare occasions that I regret not having C++'s
extra features available in this code base - a unique_ptr or
shared_ptr to a string would have been just the thing here, and the
compiler would have done all the hard work for me of remembering where
to insert the frees!)
These alternate frontends using the GtkApplication class don't work
before GTK3, because the GtkApplication class didn't exist. In the old
mkfiles.pl system, the simplest way to prevent a build failure was to
just compile them anyway but make them reduce to a stub main(). But
now, with the new library-based code organisation, library search
order issues mean that these applications won't build at all.
Happily, with cmake, it's also easy to simply omit these binaries from
the build completely depending on our GTK version.
It's unquestionably a test program, and I'm generally clearing those
out of the top level. I only missed it in the last clearout because I
was looking for things with 'test' in the name.
This commit replaces all those fiddly little linking modules
(be_all.c, be_none.c, be_ssh.c etc) with a single source file
controlled by ifdefs, and introduces a function be_list() in
setup.cmake that makes it easy to compile a version of it appropriate
to each application.
This is a net reduction in code according to 'git diff --stat', even
though I've introduced more comments. It also gets rid of another pile
of annoying little source files in the top-level directory that didn't
deserve to take up so much room in 'ls'.
More concretely, doing this has some maintenance advantages.
Centralisation means less to maintain (e.g. n_ui_backends is worked
out once in a way that makes sense everywhere), and also, 'appname'
can now be reliably set per program. Previously, some programs got the
wrong appname due to sharing the same linking module (e.g. Plink had
appname="PuTTY"), which was a latent bug that would have manifested if
I'd wanted to reuse the same string in another context.
One thing I've changed in this rework is that Windows pterm no longer
has the ConPTY backend in its backends[]: it now has an empty one. The
special be_conpty.c module shouldn't really have been there in the
first place: it was used in the very earliest uncommitted drafts of
the ConPTY work, where I was using another method of selecting that
backend, but now that Windows pterm has a dedicated
backend_vt_from_conf() that refers to conpty_backend by name, it has
no need to live in backends[] at all, just as it doesn't have to in
Unix pterm.
While I'm in the mood for cleaning up the top-level directory here:
all the 'nostuff.c' files have moved into a new 'stubs' directory, and
I broke up be_misc.c into smaller modules that can live in 'utils'.
Now testcrypt has _two_ header files, that's more files than I want at
the top level, so I decided to move it.
It has a good claim to live in either 'test' or 'crypto', but in the
end I decided it wasn't quite specific enough to crypto (it already
also tests things in keygen and proxy), and also, the Python half of
the mechanism already lives in 'test', so it can live alongside that.
Having done that, it seemed silly to leave testsc and testzlib at the
top level: those have 'test' in the names as well, so they can go in
the test subdir as well.
While I'm renaming, also renamed testcrypt.h to testcrypt-func.h to
distinguish it from the new testcrypt-enum.h.
There are quite a few of them already, and I'm about to make another
one, so let's start with a bit of tidying up.
The CMake build organisation is unchanged: I haven't put the proxy
object files into a separate library, just moved the locations of the
source files. (Organising proxying as a library would be tricky
anyway, because of the various overrides for tools that want to avoid
cryptography.)
After this change, the cmake setup now works even on Debian stretch
(oldoldstable), which runs cmake 3.7.
In order to support a version that early I had to:
- write a fallback implementation of 'add_compile_definitions' for
older cmakes, which is easy, because add_compile_definitions(FOO)
is basically just add_compile_options(-DFOO)
- stop using list(TRANSFORM) and string(JOIN), of which I had one
case each, and they were easily replaced with simple foreach loops
- stop putting OBJECT libraries in the target_link_libraries command
for executable targets, in favour of adding $<TARGET_OBJECTS:foo>
to the main sources list for the same target. That matches what I
do with library targets, so it's probably more sensible anyway.
I tried going back by another Debian release and getting this cmake
setup to work on jessie, but that runs CMake 3.0.1, and in _that_
version of cmake the target_sources command is missing, and I didn't
find any alternative way to add extra sources to a target after having
first declared it. Reorganising to cope with _that_ omission would be
too much upheaval without a very good reason.
Now that I've removed side-channel leakage from both prime candidate
generation (via mp_unsafe_mod_integer) and Miller-Rabin, the
probabilistic prime generation system in this code base is now able to
get through testsc without it detecting any source of cache or timing
side channels. So you should be able to generate an RSA key (in which
the primes themselves must be secret) in a more hostile environment
than you could previously be confident of.
This is a bit counterintuitive, because _obviously_ random prime
generation takes a variable amount of time, because it has to keep
retrying until an attempt succeeds! But that's OK as long as the
attempts are completely independent, because then any timing or cache
information leaked by a _failed_ attempt will only tell an attacker
about the numbers used in the failed attempt, and those numbers have
been thrown away, so it doesn't matter who knows them. It's only
important that the _successful_ attempt, from generating the random
candidate through to completing its verification as (probably) prime,
should be side-channel clean, because that's the attempt whose data is
actually going to be turned into a private key that needs to be kept
secret.
(In particular, this means you have to avoid the old-fashioned
strategy of generating successive prime candidates by incrementing a
starting value until you find something not divisible by any small
prime, because the number of iterations of that method would be a
timing leak. Happily, we stopped doing that last year, in commit
08a3547bc5: now every candidate integer is generated
independently, and if one fails the initial checks, we throw it away
and start completely from scratch with a fresh random value.)
So the test harness works by repeatedly running the prime generator in
one-shot mode until an attempt succeeds, and then resetting the
random-number stream to where it was just before the successful
attempt. Then we generate the same prime number again, this time with
the sclog mechanism turned on - and then, we compare it against the
version we previously generated with the same random numbers, to make
sure they're the same. This checks that the attempts really _are_
independent, in the sense that the prime generator is a pure function
of its random input stream, and doesn't depend on state left over from
previous attempts.
This introduces a new entry to the radio-button list of proxy types,
in which the 'Proxy host' box is taken to be the name of an SSH server
or saved session. We make an entire subsidiary SSH connection to that
host, open a direct-tcpip channel through it, and use that as the
connection over which to run the primary network connection.
The result is basically the same as if you used a local proxy
subprocess, with a command along the lines of 'plink -batch %proxyhost
-nc %host:%port'. But it's all done in-process, by having an SshProxy
object implement the Socket trait to talk to the main connection, and
implement Seat and LogPolicy to talk to its subsidiary SSH backend.
All the refactoring in recent years has got us to the point where we
can do that without both SSH instances fighting over some global
variable or unique piece of infrastructure.
From an end user perspective, doing SSH proxying in-process like this
is a little bit easier to set up: it doesn't require you to bake the
full pathname of Plink into your saved session (or to have it on the
system PATH), and the SshProxy setup function automatically turns off
SSH features that would be inappropriate in this context, such as
additional port forwardings, or acting as a connection-sharing
upstream. And it has minor advantages like getting the Event Log for
the subsidiary connection interleaved in the main Event Log, as if it
were stderr output from a proxy subcommand, without having to
deliberately configure the subsidiary Plink into verbose mode.
However, this is an initial implementation only, and it doesn't yet
support the _big_ payoff for doing this in-process, which (I hope)
will be the ability to handle interactive prompts from the subsidiary
SSH connection via the same user interface as the primary one. For
example, you might need to answer two password prompts in succession,
or (the first time you use a session configured this way) confirm the
host keys for both proxy and destination SSH servers. Comments in the
new source file discuss some design thoughts on filling in this gap.
For the moment, if the proxy SSH connection encounters any situation
where an interactive prompt is needed, it will make the safe
assumption, the same way 'plink -batch' would do. So it's at least no
_worse_ than the existing technique of putting the proxy connection in
a subprocess.
Now that the main source file of Plink in each platform directory has
the same name, we can put centralise the main definition of the
program in the main CMakeLists.txt, and in the platform directory,
just add the few extra modules needed to clear up platform-specific
details.
The same goes for psocks. And PSCP and PSFTP could have been moved to
the top level already - I just hadn't done it in the initial setup.
This gets rid of all those annoying 'win', 'ux' and 'gtk' prefixes
which made filenames annoying to type and to tab-complete. Also, as
with my other recent renaming sprees, I've taken the opportunity to
expand and clarify some of the names so that they're not such cryptic
abbreviations.
It's another file that should have been subdivided into lots of tiny
separate things in the utils library - especially since for some
reason I made a completely separate 'guimisc' cmake-level library for
it when there was no need.
This clears up another large pile of clutter at the top level, and in
the process, allows me to rename source files to things that don't all
have that annoying 'ssh' prefix at the top.
add_platform_sources_to_library() is now called
add_sources_from_current_dir(), so that it will make sense when I use
it in subdirectories that aren't for a particular platform.
This new implementation uses the same optimisation-barrier technique
that I used in various places in testsc: have a no-op function, and a
volatile function pointer pointing at it, and then call through the
function pointer, so that nothing actually happens (apart from the
physical call and return) but the compiler has to assume that
_anything_ might have happened.
Doing this just after a memset enforces that the compiler can't have
thrown away the memset, because the called function might (for
example) check that all the memory really is zero and abort if not.
I've been turning this over in my mind ever since coming up with the
technique for testsc. I think it's far more robust than the previous
smemclr technique: so much so that I'm switching to using it
_everywhere_, and no longer using platform alternatives like Windows's
SecureZeroMemory().
This is a module that I'd noticed in the past was too monolithic.
There's a big pile of stub functions in uxpgnt.c that only have to be
there because the implementation of true X11 _forwarding_ (i.e.
actually managing a channel within an SSH connection), which Pageant
doesn't need, was in the same module as more general X11-related
utility functions which Pageant does need.
So I've broken up this awkward monolith. Now x11fwd.c contains only
the code that really does all go together for dealing with SSH X
forwarding: the management of an X forwarding channel (including the
vtables to make it behave as Channel at the SSH end and a Plug at the
end that connects to the local X server), and the management of
authorisation for those channels, including maintaining a tree234 of
possible auth values and verifying the one we received.
Most of the functions removed from this file have moved into the utils
subdir, and also into the utils library (i.e. further down the link
order), because they were basically just string and data processing.
One exception is x11_setup_display, which parses a display string and
returns a struct telling you everything about how to connect to it.
That talks to the networking code (it does name lookups and makes a
SockAddr), so it has to live in the network library rather than utils,
and therefore it's not in the utils subdirectory either.
The other exception is x11_get_screen_number, which it turned out
nothing called at all! Apparently the job it used to do is now done as
part of x11_setup_display. So I've just removed it completely.
Now that the new CMake build system is encouraging us to lay out the
code like a set of libraries, it seems like a good idea to make them
look more _like_ libraries, by putting things into separate modules as
far as possible.
This fixes several previous annoyances in which you had to link
against some object in order to get a function you needed, but that
object also contained other functions you didn't need which included
link-time symbol references you didn't want to have to deal with. The
usual offender was subsidiary supporting programs including misc.c for
some innocuous function and then finding they had to deal with the
requirements of buildinfo().
This big reorganisation introduces three new subdirectories called
'utils', one at the top level and one in each platform subdir. In each
case, the directory contains basically the same files that were
previously placed in the 'utils' build-time library, except that the
ones that were extremely miscellaneous (misc.c, utils.c, uxmisc.c,
winmisc.c, winmiscs.c, winutils.c) have been split up into much
smaller pieces.
This brings various concrete advantages over the previous system:
- consistent support for out-of-tree builds on all platforms
- more thorough support for Visual Studio IDE project files
- support for Ninja-based builds, which is particularly useful on
Windows where the alternative nmake has no parallel option
- a really simple set of build instructions that work the same way on
all the major platforms (look how much shorter README is!)
- better decoupling of the project configuration from the toolchain
configuration, so that my Windows cross-building doesn't need
(much) special treatment in CMakeLists.txt
- configure-time tests on Windows as well as Linux, so that a lot of
ad-hoc #ifdefs second-guessing a particular feature's presence from
the compiler version can now be replaced by tests of the feature
itself
Also some longer-term software-engineering advantages:
- other people have actually heard of CMake, so they'll be able to
produce patches to the new build setup more easily
- unlike the old mkfiles.pl, CMake is not my personal problem to
maintain
- most importantly, mkfiles.pl was just a horrible pile of
unmaintainable cruft, which even I found it painful to make changes
to or to use, and desperately needed throwing in the bin. I've
already thrown away all the variants of it I had in other projects
of mine, and was only delaying this one so we could make the 0.75
release branch first.
This change comes with a noticeable build-level restructuring. The
previous Recipe worked by compiling every object file exactly once,
and then making each executable by linking a precisely specified
subset of the same object files. But in CMake, that's not the natural
way to work - if you write the obvious command that puts the same
source file into two executable targets, CMake generates a makefile
that compiles it once per target. That can be an advantage, because it
gives you the freedom to compile it differently in each case (e.g.
with a #define telling it which program it's part of). But in a
project that has many executable targets and had carefully contrived
to _never_ need to build any module more than once, all it does is
bloat the build time pointlessly!
To avoid slowing down the build by a large factor, I've put most of
the modules of the code base into a collection of static libraries
organised vaguely thematically (SSH, other backends, crypto, network,
...). That means all those modules can still be compiled just once
each, because once each library is built it's reused unchanged for all
the executable targets.
One upside of this library-based structure is that now I don't have to
manually specify exactly which objects go into which programs any more
- it's enough to specify which libraries are needed, and the linker
will figure out the fine detail automatically. So there's less
maintenance to do in CMakeLists.txt when the source code changes.
But that reorganisation also adds fragility, because of the trad Unix
linker semantics of walking along the library list once each, so that
cyclic references between your libraries will provoke link errors. The
current setup builds successfully, but I suspect it only just manages
it.
(In particular, I've found that MinGW is the most finicky on this
score of the Windows compilers I've tried building with. So I've
included a MinGW test build in the new-look Buildscr, because
otherwise I think there'd be a significant risk of introducing
MinGW-only build failures due to library search order, which wasn't a
risk in the previous library-free build organisation.)
In the longer term I hope to be able to reduce the risk of that, via
gradual reorganisation (in particular, breaking up too-monolithic
modules, to reduce the risk of knock-on references when you included a
module for function A and it also contains function B with an
unsatisfied dependency you didn't really need). Ideally I want to
reach a state in which the libraries all have sensibly described
purposes, a clearly documented (partial) order in which they're
permitted to depend on each other, and a specification of what stubs
you have to put where if you're leaving one of them out (e.g.
nocrypto) and what callbacks you have to define in your non-library
objects to satisfy dependencies from things low in the stack (e.g.
out_of_memory()).
One thing that's gone completely missing in this migration,
unfortunately, is the unfinished MacOS port linked against Quartz GTK.
That's because it turned out that I can't currently build it myself,
on my own Mac: my previous installation of GTK had bit-rotted as a
side effect of an Xcode upgrade, and I haven't yet been able to
persuade jhbuild to make me a new one. So I can't even build the MacOS
port with the _old_ makefiles, and hence, I have no way of checking
that the new ones also work. I hope to bring that port back to life at
some point, but I don't want it to block the rest of this change.