The refactored sshaes.c gives me a convenient slot to drop in a second
hardware-accelerated AES implementation, similar to the existing one
but using Arm NEON intrinsics in place of the x86 AES-NI ones.
This needed a minor structural change, because Arm systems are often
heterogeneous, containing more than one type of CPU which won't
necessarily all support the same set of architecture features. So you
can't test at run time for the presence of AES acceleration by
querying the CPU you're running on - even if you found a way to do it,
the answer wouldn't be reliable once the OS started migrating your
process between CPUs. Instead, you have to ask the OS itself, because
only that knows about _all_ the CPUs on the system. So that means the
aes_hw_available() mechanism has to extend a tentacle into each
platform subdirectory.
The trickiest part was the nest of ifdefs that tries to detect whether
the compiler can support the necessary parts. I had successful
test-compiles on several compilers, and was able to run the code
directly on an AArch64 tablet (so I know it passes cryptsuite), but
it's likely that at least some Arm platforms won't be able to build it
because of some path through the ifdefs that I haven't been able to
test yet.
I remembered the existence of that module while I was changing the API
of the CRC functions. It's still quite possibly the only code in PuTTY
not written specifically _for_ PuTTY, so it definitely deserves a bit
of a test suite.
In order to expose it through the ptrlen-centric testcrypt system,
I've added some missing 'const' in the detector module itself, but
otherwise I've left the detector code as it was.
Finding even semi-official test vectors for this CRC implementation
was hard, because it turns out not to _quite_ match any of the well
known ones catalogued on the web. Its _polynomial_ is well known, but
the combination of details that go alongside it (starting state,
post-hashing transformation) are not quite the same as any other hash
I know of.
After trawling catalogue websites for a while I finally worked out
that SSH-1's CRC and RFC 1662's CRC are basically the same except for
different choices of starting value and final adjustment. And RFC
1662's CRC is common enough that there _are_ test vectors.
So I've renamed the previous crc32_compute function to crc32_ssh1,
reflecting that it seems to be its own thing unlike any other CRC;
implemented the RFC 1662 CRC as well, as an alternative tiny wrapper
on the inner crc32_update function; and exposed all three functions to
testcrypt. That lets me run standard test vectors _and_ directed tests
of the internal update routine, plus one check that crc32_ssh1 itself
does what I expect.
While I'm here, I've also modernised the code to use uint32_t in place
of unsigned long, and ptrlen instead of separate pointer,length
arguments. And I've removed the general primer on CRC theory from the
header comment, in favour of the more specifically useful information
about _which_ CRC this is and how it matches up to anything else out
there.
(I've bowed to inevitability and put the directed CRC tests in the
'crypt' class in cryptsuite.py. Of course this is a misnomer, since
CRC isn't cryptography, but it falls into the same category in terms
of the role it plays in SSH-1, and I didn't feel like making a new
pointedly-named 'notreallycrypt' container class just for this :-)
The bulk of this commit is the changes necessary to make testcrypt
compile under Visual Studio. Unfortunately, I've had to remove my
fiddly clever uses of C99 variadic macros, because Visual Studio does
something unexpected when a variadic macro's expansion puts
__VA_ARGS__ in the argument list of a further macro invocation: the
commas don't separate further arguments. In other words, if you write
#define INNER(x,y,z) some expansion involving x, y and z
#define OUTER(...) INNER(__VA_ARGS__)
OUTER(1,2,3)
then gcc and clang will translate OUTER(1,2,3) into INNER(1,2,3) in
the obvious way, and the inner macro will be expanded with x=1, y=2
and z=3. But try this in Visual Studio, and you'll get the macro
parameter x expanding to the entire string 1,2,3 and the other two
empty (with warnings complaining that INNER didn't get the number of
arguments it expected).
It's hard to cite chapter and verse of the standard to say which of
those is _definitely_ right, though my reading leans towards the
gcc/clang behaviour. But I do know I can't depend on it in code that
has to compile under both!
So I've removed the system that allowed me to declare everything in
testcrypt.h as FUNC(ret,fn,arg,arg,arg), and now I have to use a
different macro for each arity (FUNC0, FUNC1, FUNC2 etc). Also, the
WRAPPED_NAME system is gone (because that too depended on the use of a
comma to shift macro arguments along by one), and now I put a custom C
wrapper around a function by simply re-#defining that function's own
name (and therefore the subsequent code has to be a little more
careful to _not_ pass functions' names between several macros before
stringifying them).
That's all a bit tedious, and commits me to a small amount of ongoing
annoyance because now I'll have to add an explicit argument count
every time I add something to testcrypt.h. But then again, perhaps it
will make the code less incomprehensible to someone trying to
understand it!
That's a terrible name, but winutils.c was already taken. The new
source file is intended to be to winmisc.c as the new utils.c is to
misc.c: it contains all the parts that are basically safe to link into
_any_ Windows program (even standalone test things), without tying in
to the runtime infrastructure of the main tools, referring to any
other PuTTY source module, or introducing an extra Win32 API library
dependency.
I've written a new standalone test program which incorporates all of
PuTTY's crypto code, including the mp_int and low-level elliptic curve
layers but also going all the way up to the implementations of the
MAC, hash, cipher, public key and kex abstractions.
The test program itself, 'testcrypt', speaks a simple line-oriented
protocol on standard I/O in which you write the name of a function
call followed by some inputs, and it gives you back a list of outputs
preceded by a line telling you how many there are. Dynamically
allocated objects are assigned string ids in the protocol, and there's
a 'free' function that tells testcrypt when it can dispose of one.
It's possible to speak that protocol by hand, but cumbersome. I've
also provided a Python module that wraps it, by running testcrypt as a
persistent subprocess and gatewaying all the function calls into
things that look reasonably natural to call from Python. The Python
module and testcrypt.c both read a carefully formatted header file
testcrypt.h which contains the name and signature of every exported
function, so it costs minimal effort to expose a given function
through this test API. In a few cases it's necessary to write a
wrapper in testcrypt.c that makes the function look more friendly, but
mostly you don't even need that. (Though that is one of the
motivations between a lot of API cleanups I've done recently!)
I considered doing Python integration in the more obvious way, by
linking parts of the PuTTY code directly into a native-code .so Python
module. I decided against it because this way is more flexible: I can
run the testcrypt program on its own, or compile it in a way that
Python wouldn't play nicely with (I bet compiling just that .so with
Leak Sanitiser wouldn't do what you wanted when Python loaded it!), or
attach a debugger to it. I can even recompile testcrypt for a
different CPU architecture (32- vs 64-bit, or even running it on a
different machine over ssh or under emulation) and still layer the
nice API on top of that via the local Python interpreter. All I need
is a bidirectional data channel.
misc.c has always contained a combination of things that are tied
tightly into the PuTTY code base (e.g. they use the conf system, or
work with our sockets abstraction) and things that are pure standalone
utility functions like nullstrcmp() which could quite happily be
dropped into any C program without causing a link failure.
Now the latter kind of standalone utility code lives in the new source
file utils.c, whose only external dependency is on memory.c (for snew,
sfree etc), which in turn requires the user to provide an
out_of_memory() function. So it should now be much easier to link test
programs that use PuTTY's low-level functions without also pulling in
half its bulky infrastructure.
In the process, I came across a memory allocation logging system
enabled by -DMALLOC_LOG that looks long since bit-rotted; in any case
we have much more advanced tools for that kind of thing these days,
like valgrind and Leak Sanitiser, so I've just removed it rather than
trying to transplant it somewhere sensible. (We can always pull it
back out of the version control history if really necessary, but I
haven't used it in at least a decade.)
The other slightly silly thing I did was to give bufchain a function
pointer field that points to queue_idempotent_callback(), and disallow
direct setting of the 'ic' field in favour of calling
bufchain_set_callback which will fill that pointer in too. That allows
the bufchain system to live in utils.c rather than misc.c, so that
programs can use it without also having to link in the callback system
or provide an annoying stub of that function. In fact that's just
allowed me to remove stubs of that kind from PuTTYgen and Pageant!
The old 'Bignum' data type is gone completely, and so is sshbn.c. In
its place is a new thing called 'mp_int', handled by an entirely new
library module mpint.c, with API differences both large and small.
The main aim of this change is that the new library should be free of
timing- and cache-related side channels. I've written the code so that
it _should_ - assuming I haven't made any mistakes - do all of its
work without either control flow or memory addressing depending on the
data words of the input numbers. (Though, being an _arbitrary_
precision library, it does have to at least depend on the sizes of the
numbers - but there's a 'formal' size that can vary separately from
the actual magnitude of the represented integer, so if you want to
keep it secret that your number is actually small, it should work fine
to have a very long mp_int and just happen to store 23 in it.) So I've
done all my conditionalisation by means of computing both answers and
doing bit-masking to swap the right one into place, and all loops over
the words of an mp_int go up to the formal size rather than the actual
size.
I haven't actually tested the constant-time property in any rigorous
way yet (I'm still considering the best way to do it). But this code
is surely at the very least a big improvement on the old version, even
if I later find a few more things to fix.
I've also completely rewritten the low-level elliptic curve arithmetic
from sshecc.c; the new ecc.c is closer to being an adjunct of mpint.c
than it is to the SSH end of the code. The new elliptic curve code
keeps all coordinates in Montgomery-multiplication transformed form to
speed up all the multiplications mod the same prime, and only converts
them back when you ask for the affine coordinates. Also, I adopted
extended coordinates for the Edwards curve implementation.
sshecc.c has also had a near-total rewrite in the course of switching
it over to the new system. While I was there, I've separated ECDSA and
EdDSA more completely - they now have separate vtables, instead of a
single vtable in which nearly every function had a big if statement in
it - and also made the externally exposed types for an ECDSA key and
an ECDH context different.
A minor new feature: since the new arithmetic code includes a modular
square root function, we can now support the compressed point
representation for the NIST curves. We seem to have been getting along
fine without that so far, but it seemed a shame not to put it in,
since it was suddenly easy.
In sshrsa.c, one major change is that I've removed the RSA blinding
step in rsa_privkey_op, in which we randomise the ciphertext before
doing the decryption. The purpose of that was to avoid timing leaks
giving away the plaintext - but the new arithmetic code should take
that in its stride in the course of also being careful enough to avoid
leaking the _private key_, which RSA blinding had no way to do
anything about in any case.
Apart from those specific points, most of the rest of the changes are
more or less mechanical, just changing type names and translating code
into the new API.
Now, instead of getting the zlib test/helper program by manually
compiling a source file with unusual options, it gets built as
standard by the ordinary Makefile.
Now they live in their own file memory.c. The advantage of this is
that you can link them into a binary without also pulling in the rest
of misc.c with its various dependencies on other parts of the code,
such as conf.c.
The annoying int64.h is completely retired, since C99 guarantees a
64-bit integer type that you can actually treat like an ordinary
integer. Also, I've replaced the local typedefs uint32 and word32
(scattered through different parts of the crypto code) with the
standard uint32_t.
Like the SFTP server, this is implemented in-process rather than by
invoking a separate scp server binary.
It also uses the internal SftpServer abstraction for access to the
server's filesystem, which means that when (or if) I implement an
alternative SftpServer implementing a dummy file system for test suite
purposes, this scp server should automatically start using it too.
As a bonus, the new scpserver.c contains a large comment documenting
my understanding of the SCP protocol, which I previously didn't have
even a de-facto or post-hoc spec for. I don't claim it's authoritative
- it's all reverse-engineered from my own code and observing other
implementations in action - but at least it'll make it easier to
refresh my own memory of what's going on the next time I need to do
something to either this code or PSCP.
Unlike the traditional Unix SSH server organisation, the SFTP server
is built into the same process as all the rest of the code. sesschan.c
spots a subsystem request for "sftp", and responds to it by
instantiating an SftpServer object and swapping out its own vtable for
one that talks to it.
(I rather like the idea of an object swapping its own vtable for a
different one in the middle of its lifetime! This is one of those
tricks that would be absurdly hard to implement in a 'proper' OO
language, but when you're doing vtables by hand in C, it's no more
difficult than any other piece of ordinary pointer manipulation. As
long as the methods in both vtables expect the same physical structure
layout, it doesn't cause a problem.)
The SftpServer object doesn't deal directly with SFTP packet formats;
it implements the SFTP server logic in a more abstract way, by having
a vtable method for each SFTP request type with an appropriate
parameter list. It sends its replies by calling methods in another
vtable called SftpReplyBuilder, which in the normal case will write an
SFTP reply packet to send back to the client. So SftpServer can focus
more or less completely on the details of a particular filesystem API
- and hence, the implementation I've got lives in the unix source
directory, and works directly with file descriptors and struct stat
and the like.
(One purpose of this abstraction layer is that I may well want to
write a second dummy implementation, for test-suite purposes, with
completely controllable behaviour, and now I have a handy place to
plug it in in place of the live filesystem.)
In between sesschan's parsing of the byte stream into SFTP packets and
the SftpServer object, there's a layer in the new file sftpserver.c
which does the actual packet decoding and encoding: each request
packet is passed to that, which pulls the fields out of the request
packet and calls the appropriate method of SftpServer. It also
provides the default SftpReplyBuilder which makes the output packet.
I've moved some code out of the previous SFTP client implementation -
basic packet construction code, and in particular the BinarySink/
BinarySource marshalling fuinction for fxp_attrs - into sftpcommon.c,
so that the two directions can share as much as possible.
This server is NOT SECURE! If anyone is reading this commit message,
DO NOT DEPLOY IT IN A HOSTILE-FACING ENVIRONMENT! Its purpose is to
speak the server end of everything PuTTY speaks on the client side, so
that I can test that I haven't broken PuTTY when I reorganise its
code, even things like RSA key exchange or chained auth methods which
it's hard to find a server that speaks at all.
(For this reason, it's declared with [UT] in the Recipe file, so that
it falls into the same category as programs like testbn, which won't
be installed by 'make install'.)
Working title is 'Uppity', partly for 'Universal PuTTY Protocol
Interaction Test Yoke', but mostly because it looks quite like the
word 'PuTTY' with part of it reversed. (Apparently 'test yoke' is a
very rarely used term meaning something not altogether unlike 'test
harness', which is a bit of a stretch, but it'll do.)
It doesn't actually _support_ everything I want yet. At the moment,
it's a proof of concept only. But it has most of the machinery
present, and the parts it's missing - such as chained auth methods -
should be easy enough to add because I've built in the required
flexibility, in the form of an AuthPolicy object which can request
them if it wants to. However, the current AuthPolicy object is
entirely trivial, and will let in any user with the password "weasel".
(Another way in which this is not a production-ready server is that it
also has no interaction with the OS's authentication system. In
particular, it will not only let in any user with the same password,
but it won't even change uid - it will open shells and forwardings
under whatever user id you started it up as.)
Currently, the program can only speak the SSH protocol on its standard
I/O channels (using the new FdSocket facility), so if you want it to
listen on a network port, you'll have to run it from some kind of
separate listening program similar to inetd. For my own tests, I'm not
even doing that: I'm just having PuTTY spawn it as a local proxy
process, which also conveniently eliminates the risk of anyone hostile
connecting to it.
The bulk of the actual code reorganisation is already done by previous
commits, so this change is _mostly_ just dropping in a new set of
server-specific source files alongside the client-specific ones I
created recently. The remaining changes in the shared SSH code are
numerous, but all minor:
- a few extra parameters to BPP and PPL constructors (e.g. 'are you
in server mode?'), and pass both sets of SSH-1 protocol flags from
the login to the connection layer
- in server mode, unconditionally send our version string _before_
waiting for the remote one
- a new hook in the SSH-1 BPP to handle enabling compression in
server mode, where the message exchange works the other way round
- new code in the SSH-2 BPP to do _deferred_ compression the other
way round (the non-deferred version is still nicely symmetric)
- in the SSH-2 transport layer, some adjustments to do key derivation
either way round (swapping round the identifying letters in the
various hash preimages, and making sure to list the KEXINITs in the
right order)
- also in the SSH-2 transport layer, an if statement that controls
whether we send SERVICE_REQUEST and wait for SERVICE_ACCEPT, or
vice versa
- new ConnectionLayer methods for opening outgoing channels for X and
agent forwardings
- new functions in portfwd.c to establish listening sockets suitable
for remote-to-local port forwarding (i.e. not under the direction
of a Conf the way it's done on the client side).
The code in Pageant that sets up the Unix socket and its containing
directory now lives in a separate file, uxagentsock.c, where it will
also be callable from the upcoming new SSH server when it wants to
create a similar socket for agent forwarding.
While I'm at it, I've also added a feature to create a watchdog
subprocess that will try to clean up the socket and directory once
Pageant itself terminates, in the hope of leaving less cruft lying
around /tmp.
This is a major code reorganisation in preparation for making this
code base into one that can build an SSH server as well as a client.
(Mostly for purposes of using the server as a regression test suite
for the client, though I have some other possible uses in mind too.
However, it's currently no part of my plan to harden the server to the
point where it can sensibly be deployed in a hostile environment.)
In this preparatory commit, I've broken up the SSH-2 transport and
connection layers, and the SSH-1 connection layer, into multiple
source files, with each layer having its own header file containing
the shared type definitions. In each case, the new source file
contains code that's specific to the client side of the protocol, so
that a new file can be swapped in in its place when building the
server.
Mostly this is just a straightforward moving of code without changing
it very much, but there are a couple of actual changes in the process:
The parsing of SSH-2 global-request and channel open-messages is now
done by a new pair of functions in the client module. For channel
opens, I've invented a new union data type to be the return value from
that function, representing either failure (plus error message),
success (plus Channel instance to manage the new channel), or an
instruction to hand the channel over to a sharing downstream (plus a
pointer to the downstream in question).
Also, the tree234 of remote port forwardings in ssh2connection is now
initialised on first use by the client-specific code, so that's where
its compare function lives. The shared ssh2connection_free() still
takes responsibility for freeing it, but now has to check if it's
non-null first.
The outer shell of the ssh2_lportfwd_open method, for making a
local-to-remote port forwarding, is still centralised in
ssh2connection.c, but the part of it that actually constructs the
outgoing channel-open message has moved into the client code, because
that will have to change depending on whether the channel-open has to
have type direct-tcpip or forwarded-tcpip.
In the SSH-1 connection layer, half the filter_queue method has moved
out into the new client-specific code, but not all of it -
bidirectional channel maintenance messages are still handled
centrally. One exception is SSH_MSG_PORT_OPEN, which can be sent in
both directions, but with subtly different semantics - from server to
client, it's referring to a previously established remote forwarding
(and must be rejected if there isn't one that matches it), but from
client to server it's just a "direct-tcpip" request with no prior
context. So that one is in the client-specific module, and when I add
the server code it will have its own different handler.
The new FdSocket just takes an arbitrary pair of file descriptors to
read and write, optionally with an extra input fd providing the
standard error output from a command. uxproxy.c now just does the
forking and pipe setup, and once it's got all its fds, it hands off to
FdSocket to actually do the reading and writing.
This is very like the reorganisation I did on the Windows side in
commit 98a6a3553 (back in 2013, in preparation for named-pipe sockets
and connection sharing). The idea is that it should enable me to make
a thing that the PuTTY code base sees as a Socket, but which actually
connects to the standard I/O handles of the process it lives in.
This gets another big pile of logic out of ssh2connection and puts it
somewhere more central. Now the only thing left in ssh2connection is
the formatting and parsing of the various channel requests; the logic
deciding which ones to issue and what to do about them is devolved to
the Channel implementation, as it properly should be.
LogContext is now the owner of the logevent() function that back ends
and so forth are constantly calling. Previously, logevent was owned by
the Frontend, which would store the message into its list for the GUI
Event Log dialog (or print it to standard error, or whatever) and then
pass it _back_ to LogContext to write to the currently open log file.
Now it's the other way round: LogContext gets the message from the
back end first, writes it to its log file if it feels so inclined, and
communicates it back to the front end.
This means that lots of parts of the back end system no longer need to
have a pointer to a full-on Frontend; the only thing they needed it
for was logging, so now they just have a LogContext (which many of
them had to have anyway, e.g. for logging SSH packets or session
traffic).
LogContext itself also doesn't get a full Frontend pointer any more:
it now talks back to the front end via a little vtable of its own
called LogPolicy, which contains the method that passes Event Log
entries through, the old askappend() function that decides whether to
truncate a pre-existing log file, and an emergency function for
printing an especially prominent message if the log file can't be
created. One minor nice effect of this is that console and GUI apps
can implement that last function subtly differently, so that Unix
console apps can write it with a plain \n instead of the \r\n
(harmless but inelegant) that the old centralised implementation
generated.
One other consequence of this is that the LogContext has to be
provided to backend_init() so that it's available to backends from the
instant of creation, rather than being provided via a separate API
call a couple of function calls later, because backends have typically
started doing things that need logging (like making network
connections) before the call to backend_provide_logctx. Fortunately,
there's no case in the whole code base where we don't already have
logctx by the time we make a backend (so I don't actually remember why
I ever delayed providing one). So that shortens the backend API by one
function, which is always nice.
While I'm tidying up, I've also moved the printf-style logeventf() and
the handy logevent_and_free() into logging.c, instead of having copies
of them scattered around other places. This has also let me remove
some stub functions from a couple of outlying applications like
Pageant. Finally, I've removed the pointless "_tag" at the end of
LogContext's official struct name.
Almost all the call sites were doing a cumbersome dupprintf-use-free
cycle to get a formatted message into an ErrorSocket anyway, so it
seems more sensible to give them an easier way of doing so.
The few call sites that were passing a constant string literal
_shouldn't_ have been - they'll be all the better for adding a
strerror suffix to the message they were previously giving!
I've tried to separate out as many individually coherent changes from
this work as I could into their own commits, but here's where I run
out and have to commit the rest of this major refactoring as a
big-bang change.
Most of ssh.c is now no longer in ssh.c: all five of the main
coroutines that handle layers of the SSH-1 and SSH-2 protocols now
each have their own source file to live in, and a lot of the
supporting functions have moved into the appropriate one of those too.
The new abstraction is a vtable called 'PacketProtocolLayer', which
has an input and output packet queue. Each layer's main coroutine is
invoked from the method ssh_ppl_process_queue(), which is usually
(though not exclusively) triggered automatically when things are
pushed on the input queue. In SSH-2, the base layer is the transport
protocol, and it contains a pair of subsidiary queues by which it
passes some of its packets to the higher SSH-2 layers - first userauth
and then connection, which are peers at the same level, with the
former abdicating in favour of the latter at the appropriate moment.
SSH-1 is simpler: the whole login phase of the protocol (crypto setup
and authentication) is all in one module, and since SSH-1 has no
repeat key exchange, that setup layer abdicates in favour of the
connection phase when it's done.
ssh.c itself is now about a tenth of its old size (which all by itself
is cause for celebration!). Its main job is to set up all the layers,
hook them up to each other and to the BPP, and to funnel data back and
forth between that collection of modules and external things such as
the network and the terminal. Once it's set up a collection of packet
protocol layers, it communicates with them partly by calling methods
of the base layer (and if that's ssh2transport then it will delegate
some functionality to the corresponding methods of its higher layer),
and partly by talking directly to the connection layer no matter where
it is in the stack by means of the separate ConnectionLayer vtable
which I introduced in commit 8001dd4cb, and to which I've now added
quite a few extra methods replacing services that used to be internal
function calls within ssh.c.
(One effect of this is that the SSH-1 and SSH-2 channel storage is now
no longer shared - there are distinct struct types ssh1_channel and
ssh2_channel. That means a bit more code duplication, but on the plus
side, a lot fewer confusing conditionals in the middle of half-shared
functions, and less risk of a piece of SSH-1 escaping into SSH-2 or
vice versa, which I remember has happened at least once in the past.)
The bulk of this commit introduces the five new source files, their
common header sshppl.h and some shared supporting routines in
sshcommon.c, and rewrites nearly all of ssh.c itself. But it also
includes a couple of other changes that I couldn't separate easily
enough:
Firstly, there's a new handling for socket EOF, in which ssh.c sets an
'input_eof' flag in the BPP, and that responds by checking a flag that
tells it whether to report the EOF as an error or not. (This is the
main reason for those new BPP_READ / BPP_WAITFOR macros - they can
check the EOF flag every time the coroutine is resumed.)
Secondly, the error reporting itself is changed around again. I'd
expected to put some data fields in the public PacketProtocolLayer
structure that it could set to report errors in the same way as the
BPPs have been doing, but in the end, I decided propagating all those
data fields around was a pain and that even the BPPs shouldn't have
been doing it that way. So I've reverted to a system where everything
calls back to functions in ssh.c itself to report any connection-
ending condition. But there's a new family of those functions,
categorising the possible such conditions by semantics, and each one
has a different set of detailed effects (e.g. how rudely to close the
network connection, what exit status should be passed back to the
whole application, whether to send a disconnect message and/or display
a GUI error box).
I don't expect this to be immediately perfect: of course, the code has
been through a big upheaval, new bugs are expected, and I haven't been
able to do a full job of testing (e.g. I haven't tested every auth or
kex method). But I've checked that it _basically_ works - both SSH
protocols, all the different kinds of forwarding channel, more than
one auth method, Windows and Linux, connection sharing - and I think
it's now at the point where the easiest way to find further bugs is to
let it out into the wild and see what users can spot.
These are essentially data-structure maintenance, and it seems silly
to have them be part of the same file that manages the topmost
structure of the SSH connection.
Getting it out of the overgrown ssh.c is worthwhile in itself! But
there are other benefits of this reorganisation too.
One is that I get to remove ssh->current_incoming_data_fn, because now
_all_ incoming network data is handled by whatever the current BPP is.
So now we only indirect through the BPP, not through some other
preliminary function pointer _and_ the BPP.
Another is that all _outgoing_ network data is now handled centrally,
including our outgoing version string - which means that a hex dump of
that string now shows up in the raw-data log file, from which it was
previously conspicuous by its absence.
This piece of tidying-up has come out particularly well in terms of
saving tedious repetition and boilerplate. I've managed to remove
three pointless methods from every MAC implementation by means of
writing them once centrally in terms of the implementation-specific
methods; another method (hmacmd5_sink) vanished because I was able to
make the interface type 'ssh2_mac' be directly usable as a BinarySink
by way of a new delegation system; and because all the method
implementations can now find their own vtable, I was even able to
merge a lot of keying and output functions that had previously
differed only in length parameters by having them look up the lengths
in whatever vtable they were passed.
There's now an interface called 'Channel', which handles the local
side of an SSH connection-layer channel, in terms of knowing where to
send incoming channel data to, whether to close the channel, etc.
Channel and the previous 'struct ssh_channel' mutually refer. The
latter contains all the SSH-specific parts, and as much of the common
logic as possible: in particular, Channel doesn't have to know
anything about SSH packet formats, or which SSH protocol version is in
use, or deal with all the fiddly stuff about window sizes - with the
exception that x11fwd.c's implementation of it does have to be able to
ask for a small fixed initial window size for the bodgy system that
distinguishes upstream from downstream X forwardings.
I've taken the opportunity to move the code implementing the detailed
behaviour of agent forwarding out of ssh.c, now that all of it is on
the far side of a uniform interface. (This also means that if I later
implement agent forwarding directly to a Unix socket as an
alternative, it'll be a matter of changing just the one call to
agentf_new() that makes the Channel to plug into a forwarding.)
sshbpp.h now defines a classoid that encapsulates both directions of
an SSH binary packet protocol - that is, a system for reading a
bufchain of incoming data and turning it into a stream of PktIn, and
another system for taking a PktOut and turning it into data on an
outgoing bufchain.
The state structure in each of those files contains everything that
used to be in the 'rdpkt2_state' structure and its friends, and also
quite a lot of bits and pieces like cipher and MAC states that used to
live in the main Ssh structure.
One minor effect of this layer separation is that I've had to extend
the packet dispatch table by one, because the BPP layer can no longer
directly trigger sending of SSH_MSG_UNIMPLEMENTED for a message too
short to have a type byte. Instead, I extend the PktIn type field to
use an out-of-range value to encode that, and the easiest way to make
that trigger an UNIMPLEMENTED message is to have the dispatch table
contain an entry for it.
(That's a system that may come in useful again - I was also wondering
about inventing a fake type code to indicate network EOF, so that that
could be propagated through the layers and be handled by whichever one
currently knew best how to respond.)
I've also moved the packet-censoring code into its own pair of files,
partly because I was going to want to do that anyway sooner or later,
and mostly because it's called from the BPP code, and the SSH-2
version in particular has to be called from both the main SSH-2 BPP
and the bare unencrypted protocol used for connection sharing. While I
was at it, I took the opportunity to merge the outgoing and incoming
censor functions, so that the parts that were common between them
(e.g. CHANNEL_DATA messages look the same in both directions) didn't
need to be repeated.
In the course of reworking the socket vtable system, I noticed that
both sshshare.c and x11fwd.c independently invented the idea of a Plug
none of whose methods do anything. We don't need more than one of
those, so let's centralise the idea to somewhere it can be easily
reused.
The output routines in import.c and sshpubk.c were further horrifying
hotbeds of manual length-counting. Reworked it all so that it builds
up key file components in strbufs, and uses the now boringly standard
put_* functions to write into those strbufs.
This removes the write_* functions in import.c, which I had to hastily
rename a few commits ago when I introduced the new marshalling system
in the first place.
However, I wasn't quite able to get rid of _all_ of import.c's local
formatting functions; there are a couple still there (but now with new
BinarySink-style API) which output multiprecision integers in a couple
of different formats starting from an existing big-endian binary
representation, as opposed to starting from an internal Bignum.
Now instead of iterating through conf twice in separate functions,
once to count up the size of the serialised data and once to write it
out, I just go through once and dump it all in a strbuf.
(Of course, I could still do a two-pass count-then-allocate approach
easily enough in this system; nothing would stop me writing a
BinarySink implementation that didn't actually store any data and just
counted its size, and then I could choose at each call site whether I
preferred to do it that way.)
I've finally got tired of all the code throughout PuTTY that repeats
the same logic about how to format the SSH binary primitives like
uint32, string, mpint. We've got reasonably organised code in ssh.c
that appends things like that to 'struct Packet'; something similar in
sftp.c which repeats a lot of the work; utility functions in various
places to format an mpint to feed to one or another hash function; and
no end of totally ad-hoc stuff in functions like public key blob
formatters which actually have to _count up_ the size of data
painstakingly, then malloc exactly that much and mess about with
PUT_32BIT.
It's time to bring all of that into one place, and stop repeating
myself in error-prone ways everywhere. The new marshal.h defines a
system in which I centralise all the actual marshalling functions, and
then layer a touch of C macro trickery on top to allow me to (look as
if I) pass a wide range of different types to those functions, as long
as the target type has been set up in the right way to have a write()
function.
This commit adds the new header and source file, and sets up some
general centralised types (strbuf and the various hash-function
contexts like SHA_State), but doesn't use the new calls for anything
yet.
(I've also renamed some internal functions in import.c which were
using the same names that I've just defined macros over. That won't
last long - those functions are going to go away soon, so the changed
names are strictly temporary.)
A more or less identical piece of code to sanitise the CONF_host
string prior to session launch existed in Windows PuTTY and both
Windows and Unix Plink. It's long past time it was centralised.
While I'm here, I've added a couple of extra comments in the
centralised version, including one that - unfortunately - tries _but
fails_ to explain why a string of the form "host.name:1234" doesn't
get the suffix moved into CONF_port the way "user@host" moves the
prefix into CONF_username. Commit c1c1bc471 is the one I'm referring
to in the comment, and unfortunately it has an unexplained one-liner
log message from before I got into the habit of being usefully
verbose.
They don't do normal command-line processing, so they don't need it. A
few stray references to machinery provided in there are now satisfied
instead by a new stub module nocmdline.c.
People who use a packaging system other than jhbuild still ought to be
able to run the OS X GTK3 build, so now the gtk-mac-bundler command
finds out the locations of things by a more portable method.
(I've had this change lurking around uncommitted in a working tree for
a while, and only just found it in the course of doing other OS X-
related work. Oops.)
After a conversation this week with a user who tried to use it, it's
clear that Borland C can't build the up-to-date PuTTY without having
to make too many compromises of functionality (unsupported API
details, no 'long long' type), even above the issues that could be
worked round with extra porting ifdefs.
This too is not in the list of known DLLs on Windows 10. I don't know
of any actual viable hijacking attack based on it, which according to
my reading of MSDN (specifically, a rather vague hint in
https://msdn.microsoft.com/library/ff919712) _may_ be because we
mention the common controls assembly in our application manifest; but
better safe than sorry.
Now the entire list of remaining DLLs that PuTTY links against at load
time is a subset of the Win10 known DLLs list, so that _should_ mean
that everything we load before we've deployed our own defence
(SetDefaultDllDirectories) is defended against for us by Windows
itself.
The printing functions are split between winspool.drv and spoolss.dll
in a really weird way (who would have guessed that OpenPrinter and
ClosePrinter don't live in the same dynamic library?!), but _neither_
of those counts as a system 'known DLL', so linking against either one
of these at load time is again a potential DLL hijacking vector.
It's not on the default list of important system 'known DLLs' stored
at HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\KnownDLLs (see
https://isc.sans.edu/forums/diary/DLL+hijacking+vulnerabilities/9445/ )
which apparently makes it exempt from Windows's standard DLL hijacking
defence, i.e. if an executable links against it in the normal way then
that executable will be vulnerable to DLL hijacking from a file called
winmm.dll in the same directory as it.
The solution is to load it dynamically _after_ we've locked down our
DLL search path, which fortunately PuTTY's code base is well used to
doing already for other DLLs.
gssapi32.dll from MIT Kerberos as well as from Heimdal both load
further DLLs from their installation directories.
[SGT: I polished the original patch a bit, in particular replacing
manual memory allocation with dup_mb_to_wc. This required a Recipe
change to link miscucs.c and winucs.c into more of the tools.]
If you configure without GTK so that only the non-GUI tools get built
and installed, it makes sense to also only build and install the same
subset of the man pages.
It's obvious to the trained eye whether GTK PuTTY was compiled against
GTK2 or GTK3, but the untrained eye would probably appreciate a little
help, and even the trained eye probably can't tell GTK 3.18 from 3.19
at a glance :-)
This was very strange to write, because it's a bizarre combination of
the GNU-make-isms and rc commands of Makefile.mgw with the cl and link
commands of Makefile.vc (but also the latter thankfully doesn't need
those horrible response files).
I've added a big comment in mkfiles.pl about what the build
requirements for this makefile actually are, which _hopefully_ will be
usable by people other than me.
As documented in bug 'win-process-acl-finesse', we've had enough
assorted complaints about it breaking various non-malicious pieces of
Windows process interaction (ranging from git->plink integration to
screen readers for the vision-impaired) that I think it's more
sensible to set the process back to its default level of protection.
This precaution was never a fully effective protection anyway, due to
the race condition at process startup; the only properly effective
defence would have been to prevent malware running under the same user
ID as PuTTY in the first place, so in that sense, nothing has changed.
But people who want the arguable defence-in-depth advantage of the ACL
restriction can now turn it on with the '-restrict-acl' command-line
option, and it's up to them whether they can live with the assorted
inconveniences that come with it.
In the course of this change, I've centralised a bit more of the
restriction code into winsecur.c, to avoid repeating the error
handling in multiple places.
Now, instead of returning a boolean indicating whether the query has
completed or is still pending, agent_query() returns NULL to indicate
that the query _has_ completed, and if it hasn't, it returns a pointer
to a context structure representing the pending query, so that the
latter can be used to cancel the query if (for example) you later
decide you need to free the thing its callback was using as a context.
This should fix a potential race-condition segfault if you overload an
agent forwarding channel and then close it abruptly. (Which nobody
will be doing for sensible purposes, of course! But I ran across this
while stress-testing other aspects of agent forwarding.)
This arranges that the mechanism from the previous commit
automatically turns itself on and off depending on whether a .git
directory even exists (so it won't try to do anything in distribution
tarballs), and also arranges that it can be manually turned off by a
configure option (in case someone who _is_ building from a git
checkout finds it inconvenient for some reason I haven't thought of,
which seems quite plausible to me).
This is perhaps the more useful end of the mechanism I added in the
previous commit: now, when a developer runs a configure+make build
from a git checkout (rather than from a bob-built source tarball), the
Makefile will automatically run 'git rev-parse HEAD' and embed the
result in the binaries.
So now when I want to deploy my own bleeding-edge code for day-to-day
use on my own machine, I can easily check whether I've done it right
(e.g. did I install to the right prefix?), and also easily check
whether any given PuTTY or pterm has been restarted since I rolled out
a new version.
In order to arrange this (and in particular to force version.o to be
rebuilt when _any_ source file changes), I've had to reintroduce some
of the slightly painful Makefile nastiness that I removed in 4d8782e74
when I retired the 'manifest' system, namely having version.o depend
on a file empty.h, which in turn is trivially rebuilt by a custom make
rule whose dependencies include $(allsources). That's a bit
unfortunate, but I think acceptable: the main horribleness of the
manifest system was not that part, but the actual _manifests_, which
were there to arrange that if you modified the sources in a
distribution tarball the binaries would automatically switch to
reporting themselves as local builds rather than the version baked
into the tarball. I haven't reintroduced that part of the system: if
you check out a given git commit, modify the checked-out sources, and
build the result, the Makefile won't make any inconvenient attempts to
detect that, and the resulting build will still announce itself as the
git commit you started from.
The Windows binaries, and both Windows and Unix source archives,
output from a bob build will now include the full SHA-1 of the source
git commit in their buildinfo (hence in all the About boxes and
command-line version output).
This will be occasionally useful to me at release time (there was that
one embarrassing incident where I managed not to notice that I'd made
a release build from entirely the wrong commit), but mostly, it just
seems like an obviously useful thing to put in a general buildinfo
section now that there is one.