The new portfwdmgr_connect_socket() works basically like the existing
portfwdmgr_connect(), in that it opens an SSH forwarding channel and
gateways it to a Socket. But where portfwdmgr_connect() started from a
(hostname,port) pair and used mgr->conf to inform name lookup and
proxy settings, portfwdmgr_connect_socket() simply takes a callback
that it will call when it wants you to make a Socket for a given Plug,
and that callback can make any kind of Socket it likes.
The idea is that this way you can make port forwardings that talk to
things other than genuine TCP connections, by simply providing an
appropriate callback.
My immediate use case for this is for agent forwarding, and will
appear in the next commit. But it's easy to imagine other purposes you
might use a thing like this for, such as forwarding SSH channels to
AF_UNIX sockets in general.
This commit switches as many ssh_hash_free / ssh_hash_new pairs as
possible to reuse the previous hash object via ssh_hash_reset. Also a
few other cleanups: use the wrapper function hash_simple() where
possible, and I've also introduced ssh_hash_digest_nondestructive()
and switched to that where possible as well.
The idea is to arrange that an ssh_hash object can be reused without
having to free it and allocate a new one. So the 'final' method has
been replaced with 'digest', which does everything except the trailing
free; and there's also a new pair of methods 'reset' and 'copyfrom'
which overwrite the state of a hash with either the starting state or
a copy of another state. Meanwhile, the 'new' allocator function has
stopped performing 'reset' as a side effect; now it _just_ does the
administrative stuff (allocation, setting up vtables), and returns an
object which isn't yet ready to receive any actual data, expecting
that the caller will either reset it or copy another hash state into
it.
In particular, that means that the SHA-384 / SHA-512 pair no longer
need separate 'new' methods, because only the 'reset' part has to
change between them.
This commit makes no change to the user-facing API of wrapper
functions in ssh.h, except to add new functions which nothing yet
calls. The user-facing ssh_hash_new() calls the new and reset methods
in succession, and the copy and final methods still exist to do
new+copy and digest+free.
The code that reads an SSH1_AGENTC_ADD_RSA_IDENTITY message and parses
an RSA private key out of it now does it by calling a BinarySource
function in sshrsa.c, instead of doing inline in the Pageant message
handler. This has no functional change, except that now I can expose
that separate function in the testcrypt API, where it provides me with
a mechanism for creating a bare RSAKey structure for purposes of
testing RSA key exchange.
The number of people has been steadily increasing who read our source
code with an editor that thinks tab stops are 4 spaces apart, as
opposed to the traditional tty-derived 8 that the PuTTY code expects.
So I've been wondering for ages about just fixing it, and switching to
a spaces-only policy throughout the code. And I recently found out
about 'git blame -w', which should make this change not too disruptive
for the purposes of source-control archaeology; so perhaps now is the
time.
While I'm at it, I've also taken the opportunity to remove all the
trailing spaces from source lines (on the basis that git dislikes
them, and is the only thing that seems to have a strong opinion one
way or the other).
Apologies to anyone downstream of this code who has complicated patch
sets to rebase past this change. I don't intend it to be needed again.
This is another case where a stale pointer bug could have arisen from
a toplevel callback going off after an object was freed.
But here, just adding delete_callbacks_for_context wouldn't help. The
actual context parameter for the callback wasn't mainchan itself; it
was a tiny separate object, allocated to hold just the parameters of
the function the callback wanted to call. So if _those_ parameters
became stale before the callback was triggered, then even
delete_callbacks_for_context wouldn't have been able to help.
Also, mainchan itself would have been freed moments after this
callback was queued, so moving it to be a callback on mainchan itself
wouldn't help.
Solution: move the callback right out to Ssh, by introducing a new
ssh_sw_abort_deferred() which is just like ssh_sw_abort but does its
main work in a toplevel callback. Then ssh.c's existing call to
delete_callbacks_for_context will clean it up if necessary.
For some reason, only Visual Studio bothers to give a warning when you
write "return g()" inside a function f() when both f and g have void
return type.
(Of course it would be cleaner and more orthogonal if that was simply
legal C in the first place - but given that it's not, it would be nice
if more compilers let me know about it so I could fix it...)
They're called things like SSH_CIPHER_3DES in the SSH-1 spec, but I
don't normally let that stop me adding the disambiguating '1' in the
names I give constants inside this code base. These ones are long
overdue for some disambiguation.
This is an obviously useful test feature, since if nothing else it
will let me exercise every individual crypto primitive, even the ones
that the client-side configuration is too coarse-grained to describe
in detail (such as the difference between CBC and CTR mode versions of
the same cipher).
This gets rid of the magic constants we apply to the top and bottom
bytes of the random data to make the Curve25519 private DH value. Or
rather, one of the magic constants is completely gone (we can infer it
from curve->fieldBits), and the other is moved into the curve
structure instead of being hardwired into the private-key-inventing
function.
With this change, it will be easy to add the similar Curve448 kex
method, because it's now just a matter of adding the protocol names
and curve constants.
The idea of these is that they centralise the common idiom along the
lines of
if (logical_array_len >= physical_array_size) {
physical_array_size = logical_array_len * 5 / 4 + 256;
array = sresize(array, physical_array_size, ElementType);
}
which happens at a zillion call sites throughout this code base, with
different random choices of the geometric factor and additive
constant, sometimes forgetting them completely, and generally doing a
lot of repeated work.
The new macro sgrowarray(array,size,n) has the semantics: here are the
array pointer and its physical size for you to modify, now please
ensure that the nth element exists, so I can write into it. And
sgrowarrayn(array,size,n,m) is the same except that it ensures that
the array has size at least n+m (so sgrowarray is just the special
case where m=1).
Now that this is a single centralised implementation that will be used
everywhere, I've also gone to more effort in the implementation, with
careful overflow checks that would have been painful to put at all the
previous call sites.
This commit also switches over every use of sresize(), apart from a
few where I really didn't think it would gain anything. A consequence
of that is that a lot of array-size variables have to have their types
changed to size_t, because the macros require that (they address-take
the size to pass to the underlying function).
This replaces all the macros like ssh_key_sign() and win_draw_text()
which take an object containing a vtable pointer and do the
dereferencing to find the actual concrete method to call. Now they're
all inline functions, which means more sensible type-checking and more
comprehensible error reports when the types go wrong, and also means
that there's no risk of double-evaluating the object argument.
Instead of repeatedly looping on the random number generator until it
comes up with two values that have a large enough product, the new
version guarantees only one use of random numbers, by first counting
up all the possible pairs of values that would work, and then
inventing a single random number that's used as an index into that
list.
I've done the selection from the list using constant-time techniques,
not particularly because I think key generation can be made CT in
general, but out of sheer habit after the last few months, and who
knows, it _might_ be useful.
While I'm at it, I've also added an option to make sure the two
firstbits values differ by at least a given value. For RSA, I set that
value to 2, guaranteeing that even if the smaller prime has a very
long string of 1 bits after the firstbits value and the larger has a
long string of 0, they'll still have a relative difference of at least
2^{-12}. Not that there was any serious chance of the primes having
randomly ended up so close together as to make the key in danger of
factoring, but it seems like a silly thing to leave out if I'm
rewriting the function anyway.
In commit 0f405ae8a, I arranged to stop reading from the SSH
connection if the in_raw bufchain got too big. But in at least some
tools (this bit me just now with PSCP), nothing actually calls
ssh_check_frozen again when the bufchain clears, so it stays frozen.
Now ssh_check_frozen is non-static, and all the BPP implementations
call it whenever they consume data from ssh->in_raw.
Although I've reinstated the tedious manual mouse input, I can at
least reduce the amount of it that the user is required to provide:
the new PRNG has a hard limit on the size of its seed, so once we've
generated enough entropy to fill that up, there's no point in
collecting more, even if we're generating a particularly large key.
The ssh_signkey vtable has grown a new method ssh_key_invalid(), which
checks whether the key is going to be usable for constructing a
signature at all. Currently the only way this can fail is if it's an
RSA key so short that there isn't room to put all the PKCS#1
formatting in the signature preimage integer, but the return value is
an arbitrary error message just in case more reasons are needed later.
This is tested separately rather than at key-creation time because of
the signature flags system: an RSA key of intermediate length could be
valid for SHA-1 signing but not for SHA-512. So really this method
should be called at the point where you've decided what sig flags you
want to use, and you're checking if _those flags_ are OK.
On the verification side, there's no need for a separate check. If
someone presents us with an RSA key so short that it's impossible to
encode a valid signature using it, then we simply regard all
signatures as invalid.
The local put_mp_*_from_string functions in import.c now take ptrlen
(which simplifies essentially all their call sites); so does the local
function logwrite() in logging.c, and so does ssh2_fingerprint_blob.
This is a general cleanup which has been overdue for some time: lots
of length fields are now the machine word type rather than the (in
practice) fixed 'int'.
If the SSH socket is readable, GTK will preferentially give us a
callback to read from it rather than calling its idle functions. That
means the ssh->in_raw bufchain can just keep accumulating data, and
the callback that gets the BPP to take data back off that bufchain
will never be called at all.
The solution is to use sk_set_frozen after a certain point, to stop
reading further data from the socket (and, more importantly, disable
GTK's I/O callback for that fd) until we've had a chance to process
some backlog, and then unfreeze the socket again afterwards.
Annoyingly, that means adding a _second_ 'frozen' flag to Ssh, because
the one we already had has exactly the wrong semantics - it prevents
us from _processing_ our backlog, which is the last thing we want if
the entire problem is that we need that backlog to get smaller! So now
there are two frozen flags, and a big comment explaining the
difference.
Similarly to the 'AES (unaccelerated)' naming scheme I added in the
AES rewrite, the hash functions that have multiple implementations now
each come with an annotation saying which one they are.
This was more tricky for hashes than for ciphers, because the
annotation for a hash has to be a separate string literal from the
base text name, so that it can propagate into the name field for each
HMAC wrapper without looking silly.
Similarly to my recent addition of NEON-accelerated AES, these new
implementations drop in alongside the SHA-NI ones, under a different
set of ifdefs. All the details of selection and detection are
essentially the same as they were for the AES code.
The new structure of those modules is along similar lines to the
recent rewrite of AES, with selection of HW vs SW implementation being
done by the main vtable instead of a subsidiary function pointer
within it, freedom for each implementation to define its state
structure however is most convenient, and space to drop in other
hardware-accelerated implementations.
I've removed the centralised test for compiler SHA-NI support in
ssh.h, and instead duplicated it between the two SHA modules, on the
grounds that once you start considering an open-ended set of hardware
accelerators, the two hashes _need_ not go together.
I've also added an extra test in cryptsuite that checks the point at
which the end-of-hash padding switches to adding an extra cipher
block. That was just because I was rewriting that padding code, was
briefly worried that I might have got an off-by-one error in that part
of it, and couldn't see any existing test that gave me confidence I
hadn't.
This tears out the entire previous random-pool system in sshrand.c. In
its place is a system pretty close to Ferguson and Schneier's
'Fortuna' generator, with the main difference being that I use SHA-256
instead of AES for the generation side of the system (rationale given
in comment).
The PRNG implementation lives in sshprng.c, and defines a self-
contained data type with no state stored outside the object, so you
can instantiate however many of them you like. The old sshrand.c still
exists, but in place of the previous random pool system, it's just
become a client of sshprng.c, whose job is to hold a single global
instance of the PRNG type, and manage its reference count, save file,
noise-collection timers and similar administrative business.
Advantages of this change include:
- Fortuna is designed with a more varied threat model in mind than my
old home-grown random pool. For example, after any request for
random numbers, it automatically re-seeds itself, so that if the
state of the PRNG should be leaked, it won't give enough
information to find out what past outputs _were_.
- The PRNG type can be instantiated with any hash function; the
instance used by the main tools is based on SHA-256, an improvement
on the old pool's use of SHA-1.
- The new PRNG only uses the completely standard interface to the
hash function API, instead of having to have privileged access to
the internal SHA-1 block transform function. This will make it
easier to revamp the hash code in general, and also it means that
hardware-accelerated versions of SHA-256 will automatically be used
for the PRNG as well as for everything else.
- The new PRNG can be _tested_! Because it has an actual (if not
quite explicit) specification for exactly what the output numbers
_ought_ to be derived from the hashes of, I can (and have) put
tests in cryptsuite that ensure the output really is being derived
in the way I think it is. The old pool could have been returning
any old nonsense and it would have been very hard to tell for sure.
This is in preparation for a PRNG revamp which will want to have a
well defined boundary for any given request-for-randomness, so that it
can destroy the evidence afterwards. So no more looping round calling
random_byte() and then stopping when we feel like it: now you say up
front how many random bytes you want, and call random_read() which
gives you that many in one go.
Most of the call sites that had to be fixed are fairly mechanical, and
quite a few ended up more concise afterwards. A few became more
cumbersome, such as mp_random_bits, in which the new API doesn't let
me load the random bytes directly into the target integer without
triggering undefined behaviour, so instead I have to allocate a
separate temporary buffer.
The _most_ interesting call site was in the PKCS#1 v1.5 padding code
in sshrsa.c (used in SSH-1), in which you need a stream of _nonzero_
random bytes. The previous code just looped on random_byte, retrying
if it got a zero. Now I'm doing a much more interesting thing with an
mpint, essentially scaling a binary fraction repeatedly to extract a
number in the range [0,255) and then adding 1 to it.
Keeping that information alongside the hashes themselves seems more
sensible than having the HMAC code know that fact about everything it
can work with.
Similar to the versions in ssh_cipheralg and ssh_keyalg, this allows a
set of vtables to share function pointers while providing varying
constant data that the shared function can use to vary its behaviour.
As an initial demonstration, I've used this to recombine the four
trivial text_name methods for the HMAC-SHA1 variants. I'm about to use
it for something more sensible, though.
All the hash-specific state structures, and the functions that
directly accessed them, are now local to the source files implementing
the hashes themselves. Everywhere we previously used those types or
functions, we're now using the standard ssh_hash or ssh2_mac API.
The 'simple' functions (hmacmd5_simple, SHA_Simple etc) are now a pair
of wrappers in sshauxcrypt.c, each of which takes an algorithm
structure and can do the same conceptual thing regardless of what it
is.
The aim of this reorganisation is to make it easier to test all the
ciphers in PuTTY in a uniform way. It was inconvenient that there were
two separate vtable systems for the ciphers used in SSH-1 and SSH-2
with different functionality.
Now there's only one type, called ssh_cipher. But really it's the old
ssh2_cipher, just renamed: I haven't made any changes to the API on
the SSH-2 side. Instead, I've removed ssh1_cipher completely, and
adapted the SSH-1 BPP to use the SSH-2 style API.
(The relevant differences are that ssh1_cipher encapsulated both the
sending and receiving directions in one object - so now ssh1bpp has to
make a separate cipher instance per direction - and that ssh1_cipher
automatically initialised the IV to all zeroes, which ssh1bpp now has
to do by hand.)
The previous ssh1_cipher vtable for single-DES has been removed
completely, because when converted into the new API it became
identical to the SSH-2 single-DES vtable; so now there's just one
vtable for DES-CBC which works in both protocols. The other two SSH-1
ciphers each had to stay separate, because 3DES is completely
different between SSH-1 and SSH-2 (three layers of CBC structure
versus one), and Blowfish varies in endianness and key length between
the two.
(Actually, while I'm here, I've only just noticed that the SSH-1
Blowfish cipher mis-describes itself in log messages as Blowfish-128.
In fact it passes the whole of the input key buffer, which has length
SSH1_SESSION_KEY_LENGTH == 32 bytes == 256 bits. So it's actually
Blowfish-256, and has been all along!)
The refactored sshaes.c gives me a convenient slot to drop in a second
hardware-accelerated AES implementation, similar to the existing one
but using Arm NEON intrinsics in place of the x86 AES-NI ones.
This needed a minor structural change, because Arm systems are often
heterogeneous, containing more than one type of CPU which won't
necessarily all support the same set of architecture features. So you
can't test at run time for the presence of AES acceleration by
querying the CPU you're running on - even if you found a way to do it,
the answer wouldn't be reliable once the OS started migrating your
process between CPUs. Instead, you have to ask the OS itself, because
only that knows about _all_ the CPUs on the system. So that means the
aes_hw_available() mechanism has to extend a tentacle into each
platform subdirectory.
The trickiest part was the nest of ifdefs that tries to detect whether
the compiler can support the necessary parts. I had successful
test-compiles on several compilers, and was able to run the code
directly on an AArch64 tablet (so I know it passes cryptsuite), but
it's likely that at least some Arm platforms won't be able to build it
because of some path through the ifdefs that I haven't been able to
test yet.
I remembered the existence of that module while I was changing the API
of the CRC functions. It's still quite possibly the only code in PuTTY
not written specifically _for_ PuTTY, so it definitely deserves a bit
of a test suite.
In order to expose it through the ptrlen-centric testcrypt system,
I've added some missing 'const' in the detector module itself, but
otherwise I've left the detector code as it was.
Finding even semi-official test vectors for this CRC implementation
was hard, because it turns out not to _quite_ match any of the well
known ones catalogued on the web. Its _polynomial_ is well known, but
the combination of details that go alongside it (starting state,
post-hashing transformation) are not quite the same as any other hash
I know of.
After trawling catalogue websites for a while I finally worked out
that SSH-1's CRC and RFC 1662's CRC are basically the same except for
different choices of starting value and final adjustment. And RFC
1662's CRC is common enough that there _are_ test vectors.
So I've renamed the previous crc32_compute function to crc32_ssh1,
reflecting that it seems to be its own thing unlike any other CRC;
implemented the RFC 1662 CRC as well, as an alternative tiny wrapper
on the inner crc32_update function; and exposed all three functions to
testcrypt. That lets me run standard test vectors _and_ directed tests
of the internal update routine, plus one check that crc32_ssh1 itself
does what I expect.
While I'm here, I've also modernised the code to use uint32_t in place
of unsigned long, and ptrlen instead of separate pointer,length
arguments. And I've removed the general primer on CRC theory from the
header comment, in favour of the more specifically useful information
about _which_ CRC this is and how it matches up to anything else out
there.
(I've bowed to inevitability and put the directed CRC tests in the
'crypt' class in cryptsuite.py. Of course this is a misnomer, since
CRC isn't cryptography, but it falls into the same category in terms
of the role it plays in SSH-1, and I didn't feel like making a new
pointedly-named 'notreallycrypt' container class just for this :-)
The new explicit vtables for the hardware and software implementations
are now exposed by name in the testcrypt protocol, and cryptsuite.py
runs all the AES tests separately on both.
(When hardware AES is compiled out, ssh2_cipher_new("aes128_hw") and
similar calls will return None, and cryptsuite.py will respond by
skipping those tests.)
sshaes.c is more or less completely changed by this commit.
Firstly, I've changed the top-level structure. In the old structure,
there were three levels of indirection controlling what an encryption
function would actually do: first the ssh2_cipher vtable, then a
subsidiary set of function pointers within that to select the software
or hardware implementation, and then inside the main encryption
function, a switch on the key length to jump into the right place in
the unrolled loop of cipher rounds.
That was all a bit untidy. So now _all_ of that is done by means of
just one selection system, namely the ssh2_cipher vtable. The software
and hardware implementations of a given SSH cipher each have their own
separate vtable, e.g. ssh2_aes256_sdctr_sw and ssh2_aes256_sdctr_hw;
this allows them to have their own completely different state
structures too, and not have to try to coexist awkwardly in the same
universal AESContext with workaround code to align things correctly.
The old implementation-agnostic vtables like ssh2_aes256_sdctr still
exist, but now they're mostly empty, containing only the constructor
function, which will decide whether AES-NI is currently available and
then choose one of the other _real_ vtables to instantiate.
As well as the cleaner data representation, this also means the
vtables can have different description strings, which means the Event
Log will indicate which AES implementation is actually in use; it
means the SW and HW vtables are available for testcrypt to use
(although actually using them is left for the next commit); and in
principle it would also make it easy to support a user override for
the automatic SW/HW selection (in case anyone turns out to want one).
The AES-NI implementation has been reorganised to fit into the new
framework. One thing I've done is to de-optimise the key expansion:
instead of having a separate blazingly fast loop-unrolled key setup
function for each key length, there's now just one, which uses AES
intrinsics for the actual transformations of individual key words, but
wraps them in a common loop structure for all the key lengths which
has a clear correspondence to the cipher spec. (Sorry to throw away
your work there, Pavel, but this isn't an application where key setup
really _needs_ to be hugely fast, and I decided I prefer a version I
can understand and debug.)
The software AES implementation is also completely replaced with one
that uses a bit-sliced representation, i.e. the cipher state is split
across eight integers in such a way that each logical byte of the
state occupies a single bit in each of those integers. The S-box
lookup is done by a long string of AND and XOR operations on the eight
bits (removing the potential cache side channel from a lookup table),
and this representation allows 64 S-box lookups to be done in parallel
simply by extending those AND/XOR operations to be bitwise ones on a
whole word. So now we can perform four AES encryptions or decryptions
in parallel, at least when the cipher mode permits it (which SDCTR and
CBC decryption both do).
The result is slower than the old implementation, but (a) not by as
much as you might think - those parallel S-boxes are surprisingly
competitive with 64 separate table lookups; (b) the compensation is
that now it should run in constant time with no data-dependent control
flow or memory addressing; and (c) in any case the really fast
hardware implementation will supersede it for most users.
The old names like ssh_aes128 and ssh_aes128_ctr reflect the SSH
protocol IDs, which is all very well, but I think a more important
principle is that it should be easy for me to remember which cipher
mode each one refers to. So I've renamed them so that they all end in
_cbc and _sdctr.
(I've left alone the string identifiers used by testcrypt, for the
moment. Perhaps I'll go back and change those later.)
All access to AES throughout the code is now done via the ssh2_cipher
vtable interface. All code that previously made direct calls to the
underlying functions (for encrypting and decrypting private key files)
now does it by instantiating an ssh2_cipher.
This removes constraints on the AES module's internal structure, and
allows me to reorganise it as much as I like.
This is the commit that f3295e0fb _should_ have been. Yesterday I just
added some typedefs so that I didn't have to wear out my fingers
typing 'struct' in new code, but what I ought to have done is to move
all the typedefs into defs.h with the rest, and then go through
cleaning up the legacy 'struct's all through the existing code.
But I was mostly trying to concentrate on getting the test suite
finished, so I just did the minimum. Now it's time to come back and do
it better.
Previously, lots of individual ssh2_cipheralg structures were declared
static, and only available to the rest of the code via a smaller
number of 'ssh2_ciphers' objects that wrapped them into lists. But I'm
going to want to access individual ciphers directly in the testing
system I'm currently working on, so I'm giving all those objects
external linkage and declaring them in ssh.h.
Also, I've made up an entirely new one, namely exposing MD5 as an
instance of the general ssh_hashalg abstraction, which it has no need
to be for the purposes of actually using it in SSH. But, again, this
will let me treat it the same as all the other hashes in the test
system.
No functional change, for the moment.
I'm getting tired of typing 'struct Foo' everywhere when I could just
type 'Foo', so here's a bunch of extra typedefs that allow me to leave
off the 'struct' in various places.
ssh_rsakex_encrypt took an input (pointer, length) pair, which I've
replaced with a ptrlen; it also took an _output_ (pointer, length)
pair, and then re-computed the right length internally and enforced by
assertion that the one passed in matched it. Now it just returns a
strbuf of whatever length it computed, which saves the caller having
to compute the length at all.
Also, both ssh_rsakex_encrypt and ssh_rsakex_decrypt took their
arguments in a weird order; I think it looks more sensible to put the
RSA key first rather than last, so now they both have the common order
(key, hash, input data).
The abstract method ssh_key_sign(), and the concrete functions
ssh_rsakex_newkey() and rsa_ssh1_public_blob_len(), now each take a
ptrlen argument in place of a separate pointer and length pair.
Partly that's because I'm generally preferring ptrlens these days and
it keeps argument lists short and tidy-looking, but mostly it's
because it will make those functions easier to wrap in my upcoming
test system.
This makes the API more flexible, so that it's not restricted to
taking a key of precisely the length specified in the ssh2_macalg
structure. Instead, ssh2bpp looks up that length to construct the
MAC's key.
Some MACs (e.g. Poly1305) will only _work_ with a single key length.
But this way, I can run standard test vectors against MACs that can
take a variable length (e.g. everything in the HMAC family).
I'm about to want to use it for purposes other than KEX, so it's now
just called MAX_HASH_LEN and is supposed to be an upper bound on any
hash function we implement at all. Of course this makes no difference
to its value, because the largest hash we have is SHA-512 which
already fit inside that limit.
The macro wrapper for the MAC setkey function expanded to completely
the wrong vtable method due to a cut and paste error. And I never
noticed, because what _should_ have been its two call sites in
ssh2bpp.c were directly calling the _right_ vtable method instead.
The old 'Bignum' data type is gone completely, and so is sshbn.c. In
its place is a new thing called 'mp_int', handled by an entirely new
library module mpint.c, with API differences both large and small.
The main aim of this change is that the new library should be free of
timing- and cache-related side channels. I've written the code so that
it _should_ - assuming I haven't made any mistakes - do all of its
work without either control flow or memory addressing depending on the
data words of the input numbers. (Though, being an _arbitrary_
precision library, it does have to at least depend on the sizes of the
numbers - but there's a 'formal' size that can vary separately from
the actual magnitude of the represented integer, so if you want to
keep it secret that your number is actually small, it should work fine
to have a very long mp_int and just happen to store 23 in it.) So I've
done all my conditionalisation by means of computing both answers and
doing bit-masking to swap the right one into place, and all loops over
the words of an mp_int go up to the formal size rather than the actual
size.
I haven't actually tested the constant-time property in any rigorous
way yet (I'm still considering the best way to do it). But this code
is surely at the very least a big improvement on the old version, even
if I later find a few more things to fix.
I've also completely rewritten the low-level elliptic curve arithmetic
from sshecc.c; the new ecc.c is closer to being an adjunct of mpint.c
than it is to the SSH end of the code. The new elliptic curve code
keeps all coordinates in Montgomery-multiplication transformed form to
speed up all the multiplications mod the same prime, and only converts
them back when you ask for the affine coordinates. Also, I adopted
extended coordinates for the Edwards curve implementation.
sshecc.c has also had a near-total rewrite in the course of switching
it over to the new system. While I was there, I've separated ECDSA and
EdDSA more completely - they now have separate vtables, instead of a
single vtable in which nearly every function had a big if statement in
it - and also made the externally exposed types for an ECDSA key and
an ECDH context different.
A minor new feature: since the new arithmetic code includes a modular
square root function, we can now support the compressed point
representation for the NIST curves. We seem to have been getting along
fine without that so far, but it seemed a shame not to put it in,
since it was suddenly easy.
In sshrsa.c, one major change is that I've removed the RSA blinding
step in rsa_privkey_op, in which we randomise the ciphertext before
doing the decryption. The purpose of that was to avoid timing leaks
giving away the plaintext - but the new arithmetic code should take
that in its stride in the course of also being careful enough to avoid
leaking the _private key_, which RSA blinding had no way to do
anything about in any case.
Apart from those specific points, most of the rest of the changes are
more or less mechanical, just changing type names and translating code
into the new API.
These were both using the old-fashioned strategy of 'count up the
length first, then go back over the same data trying not to do
anything different', which these days I'm trying to replace with
strbufs.
Also, while I was in ssh.h, removed the prototype of rsasanitise()
which doesn't even exist any more.
Several pieces of old code were disposing of pieces of an RSAKey by
manually freeing them one at a time. We have a centralised
freersakey(), so we should use that instead wherever possible.
Where it wasn't possible to switch over to that, it was because we
were only freeing the private fields of the key - so I've fixed that
by cutting freersakey() down the middle and exposing the private-only
half as freersapriv().
It's just silly to have _two_ systems for traversing a string of
comma-separated protocol ids. I think the new get_commasep_word
technique for looping over the elements of a string is simpler and
more general than the old membership-testing approach, and also it's
necessary for the modern KEX untangling system (which has to be able
to loop over one string, even if it used a membership test to check
things in the other). So this commit rewrites the two remaining uses
of in_commasep_string to use get_commasep_word instead, and deletes
the former.