It correctly masked off bits in the partial word, but then left all
higher words _unchanged_ rather than zeroing them.
Apparently its use in mp_invert_mod_2to was in restricted enough
circumstances not to cause a failure there!
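In outline, the fix looks something like this (a sketch with made-up
names and an assumed mp_int layout of nw words in w[], not the exact
routine as committed):

    /* Reduce x mod 2^bits in place. The bug was masking the partial
     * word but omitting the loop that zeroes every word above it. */
    static void reduce_mod_2to(mp_int *x, size_t bits)
    {
        size_t word = bits / BIGNUM_INT_BITS, bit = bits % BIGNUM_INT_BITS;
        if (word < x->nw) {
            if (bit)
                x->w[word++] &= ((BignumInt)1 << bit) - 1; /* mask partial word */
            for (; word < x->nw; word++)
                x->w[word] = 0;                            /* zero the rest */
        }
    }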
My test for whether x has a square root was based on testing whether a
large power of x was congruent to 1 mod p, which is a fine test
provided x is in the multiplicative group of p, but would give a false
negative on the one possible input value that _isn't_ - namely zero.
The actual number returned from the function is fine (because that too
is a large power of the input, and when the input is 0 that's
foolproof). So I just needed to add a special case for the returned
'success' flag.
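The shape of the fix, as a hedged sketch (mp_cmp_eq and mp_eq_integer
are the obvious mpint.c comparison helpers; the variable names, and
the assumption that 'success' is an out-parameter, are illustrative):

    /* The returned root is already right for x == 0, so only the
     * success flag needs the special case: 0 does have a square root
     * (itself), but isn't in the multiplicative group, so it fails
     * the 'large power of x is congruent to 1' test. */
    unsigned ok = mp_cmp_eq(residue_test, one);
    ok |= mp_eq_integer(x, 0);
    *success = ok;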
I had not considered what would happen if the input and output mp_ints
aliased each other. What happens is that because I wrote the output
starting from the low end, input words would get corrupted before I'd
finished with them. Easily fixed by reversing the order of the loop.
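As a sketch of the general idea (illustrative code using a whole-word
left shift as the example, not the exact routine that was fixed):

    /* Writing the output from the high end downwards means each source
     * word is read before the destination overwrites it, so the loop
     * is safe even when r and x are the same mp_int. */
    for (size_t i = r->nw; i-- > 0 ;) {
        r->w[i] = (i >= wordshift && i - wordshift < x->nw)
            ? x->w[i - wordshift] : 0;
    }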
ssh_rsakex_encrypt took an input (pointer, length) pair, which I've
replaced with a ptrlen; it also took an _output_ (pointer, length)
pair, and then re-computed the right length internally and enforced by
assertion that the one passed in matched it. Now it just returns a
strbuf of whatever length it computed, which saves the caller having
to compute the length at all.
Also, both ssh_rsakex_encrypt and ssh_rsakex_decrypt took their
arguments in a weird order; I think it looks more sensible to put the
RSA key first rather than last, so now they both have the common order
(key, hash, input data).
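Paraphrasing the new shape (only the encrypt side is shown, and the
hash-algorithm parameter type is assumed rather than quoted):

    /* Sketch of the revised interface: key first, ptrlen in, strbuf out. */
    strbuf *ssh_rsakex_encrypt(RSAKey *key, const ssh_hashalg *h, ptrlen in);

ssh_rsakex_decrypt takes its arguments in the same (key, hash, input
data) order, and the caller of encrypt just uses whatever length the
returned strbuf turns out to have.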
The abstract method ssh_key_sign(), and the concrete functions
ssh_rsakex_newkey() and rsa_ssh1_public_blob_len(), now each take a
ptrlen argument in place of a separate pointer and length pair.
Partly that's because I'm generally preferring ptrlens these days and
it keeps argument lists short and tidy-looking, but mostly it's
because it will make those functions easier to wrap in my upcoming
test system.
This makes the API more flexible, so that it's not restricted to
taking a key of precisely the length specified in the ssh2_macalg
structure. Instead, ssh2bpp looks up that length to construct the
MAC's key.
Some MACs (e.g. Poly1305) will only _work_ with a single key length.
But this way, I can run standard test vectors against MACs that can
take a variable length (e.g. everything in the HMAC family).
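In other words, the setkey step now looks roughly like this (a sketch;
the function and field names are assumed from the surrounding MAC API):

    /* ssh2bpp derives a key of exactly the advertised length... */
    ssh2_mac_setkey(mac, make_ptrlen(keybuf, mac->vt->keylen));

    /* ...but a test harness can hand an HMAC a vector of any length. */
    ssh2_mac_setkey(mac, ptrlen_from_asciz("Jefe"));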
Just like put_data(), but takes a ptrlen rather than separate ptr and
len arguments, so it saves a bit of repetition at call sites. I
probably should have written this ages ago, but better late than
never; I've also converted every call site I can find that needed it.
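A sketch of the likely definition (the name put_datapl and the exact
macro form are assumed here, since the function isn't named above):

    #define put_datapl(bs, pl) put_data(bs, (pl).ptr, (pl).len)

    /* Call sites shrink from
     *     put_data(bs, str->s, str->len);
     * to
     *     put_datapl(bs, ptrlen_from_strbuf(str));
     */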
When we're running with that cipher/MAC combination, what ssh2bpp.c
sees as separate cipher and MAC objects are actually just different
interfaces of the same object: the Poly1305 constructor function just
returns an alternate view of the cipher object it's passed, and the
free function does nothing.
But Address Sanitiser points out that if we first call
ssh2_cipher_free and then ssh2_mac_free, then although poly1305_free
would do nothing once we reached it, we can't actually reach it
without looking up the vtable in the object we've just thrown away.
Easily fixed by reversing the order of mac_free and cipher_free in the
BPP. Also I've centralised that freeing so that it doesn't have to be
repeated with the same careful ordering in the BPP cleanup function
and the functions that replace a direction of crypto.
misc.c has always contained a combination of things that are tied
tightly into the PuTTY code base (e.g. they use the conf system, or
work with our sockets abstraction) and things that are pure standalone
utility functions like nullstrcmp() which could quite happily be
dropped into any C program without causing a link failure.
Now the latter kind of standalone utility code lives in the new source
file utils.c, whose only external dependency is on memory.c (for snew,
sfree etc), which in turn requires the user to provide an
out_of_memory() function. So it should now be much easier to link test
programs that use PuTTY's low-level functions without also pulling in
half its bulky infrastructure.
In the process, I came across a memory allocation logging system
enabled by -DMALLOC_LOG that looks long since bit-rotted; in any case
we have much more advanced tools for that kind of thing these days,
like valgrind and Leak Sanitiser, so I've just removed it rather than
trying to transplant it somewhere sensible. (We can always pull it
back out of the version control history if really necessary, but I
haven't used it in at least a decade.)
The other slightly silly thing I did was to give bufchain a function
pointer field that points to queue_idempotent_callback(), and disallow
direct setting of the 'ic' field in favour of calling
bufchain_set_callback which will fill that pointer in too. That allows
the bufchain system to live in utils.c rather than misc.c, so that
programs can use it without also having to link in the callback system
or provide an annoying stub of that function. In fact that's just
allowed me to remove stubs of that kind from PuTTYgen and Pageant!
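A sketch of how that fits together (names and exact types assumed, not
quoted from misc.h):

    /* bufchain remembers both the callback and the queueing function,
     * so utils.c itself never references queue_idempotent_callback. */
    struct bufchain_tag {
        /* ... existing fields ... */
        IdempotentCallback *ic;
        void (*queue_idempotent_callback)(IdempotentCallback *ic);
    };

    void bufchain_set_callback_inner(
        bufchain *ch, IdempotentCallback *ic,
        void (*queue_idempotent_callback)(IdempotentCallback *ic))
    {
        ch->ic = ic;
        ch->queue_idempotent_callback = queue_idempotent_callback;
    }

    /* A macro fills in the real function for programs that do link in
     * the callback system: */
    #define bufchain_set_callback(ch, ic) \
        bufchain_set_callback_inner(ch, ic, queue_idempotent_callback)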
Taking a leaf out of the LLVM code base: this macro still includes an
assert(false) so that the message will show up in a typical build, but
it follows it up with a call to a function explicitly marked as
no-return.
So this ought to do a better job of convincing compilers that once a
code path hits this function it _really doesn't_ have to still faff
about with making up a bogus return value or filling in a variable
that 'might be used uninitialised' in the following code that won't be
reached anyway.
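A sketch of the macro (names illustrative; NORETURN stands for
whatever noreturn annotation the compiler provides):

    NORETURN void unreachable_internal(const char *msg);

    #define unreachable(msg) \
        (assert(false && msg), unreachable_internal(msg))

    /* Typical use, e.g. in the default branch of a switch over an enum:
     *     default: unreachable("bad shift state");
     * and no bogus 'return 0;' is needed after it. */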
I've gone through the existing code looking for the assert(false) /
assert(0) idiom and replaced all the ones I found with the new macro,
which also meant I could remove a few pointless return statements and
variable initialisations that I'd already had to put in to placate
compiler front ends.
This removes the one remaining failure at -Wvla. (Of course, that
array isn't for a _hash_ function, so it wouldn't have been quite
appropriate to make it a static array of size MAX_HASH_LEN.)
Although C99 introduced the variable-length array (VLA) feature, it is
not supported by MS Visual Studio (which is first and foremost a C++
compiler). This commit adds the Clang and GCC options that disable
VLAs, so that they are disabled in non-MSVC builds as well.
I'm about to want to use it for purposes other than KEX, so it's now
just called MAX_HASH_LEN and is supposed to be an upper bound on any
hash function we implement at all. Of course this makes no difference
to its value, because the largest hash we have is SHA-512, which
already fits inside that limit.
If we have to abandon a Diffie-Hellman key exchange part way through
(e.g. the connection slams shut), and we haven't yet run all the
stages of the DH algorithm, then some of the mp_ints in the dh_ctx
will be NULL. So we shouldn't mp_free them without checking first.
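The fix is just the obvious guard (a sketch; the dh_ctx field names
here are illustrative):

    static void dh_cleanup(dh_ctx *ctx)
    {
        /* Only free the mp_ints this context got as far as creating. */
        if (ctx->x) mp_free(ctx->x);
        if (ctx->e) mp_free(ctx->e);
        /* ... and similarly for the rest ... */
        sfree(ctx);
    }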
The macro wrapper for the MAC setkey function expanded to completely
the wrong vtable method due to a cut and paste error. And I never
noticed, because what _should_ have been its two call sites in
ssh2bpp.c were directly calling the _right_ vtable method instead.
In commit 869ce8867 I sank an sfree(fingerprint) into an if statement
that had previously been before it - but I only put it in two of the
three branches of the if. Now added it in the third one, preventing a
minor memory leak if you use SSH-1 with the -hostkey option (and the
actual host key presented by the server matches it).
Some still-supported Linux distributions (Debian jessie and Ubuntu 14.04
at least) still use GCC 4, where C99 isn't the default.
For autoconf, we do it the autotools way. For the standalone Makefiles,
we go for -std=gnu99 rather than c99, to avoid trouble with fdopen().
The old 'Bignum' data type is gone completely, and so is sshbn.c. In
its place is a new thing called 'mp_int', handled by an entirely new
library module mpint.c, with API differences both large and small.
The main aim of this change is that the new library should be free of
timing- and cache-related side channels. I've written the code so that
it _should_ - assuming I haven't made any mistakes - do all of its
work without either control flow or memory addressing depending on the
data words of the input numbers. (Though, being an _arbitrary_
precision library, it does have to at least depend on the sizes of the
numbers - but there's a 'formal' size that can vary separately from
the actual magnitude of the represented integer, so if you want to
keep it secret that your number is actually small, it should work fine
to have a very long mp_int and just happen to store 23 in it.) So I've
done all my conditionalisation by means of computing both answers and
doing bit-masking to swap the right one into place, and all loops over
the words of an mp_int go up to the formal size rather than the actual
size.
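For illustration, the bit-masking idiom looks like this (not a
quotation from mpint.c):

    /* Constant-time selection between two words: 'which' must be 0 or 1,
     * and neither control flow nor memory addressing depends on it. */
    static inline BignumInt word_select(BignumInt a, BignumInt b, unsigned which)
    {
        BignumInt mask = -(BignumInt)which;   /* all-zeroes or all-ones */
        return (a & ~mask) | (b & mask);      /* 0 -> a, 1 -> b */
    }

and the loops that apply it run over the formal size of the mp_int,
never over just the words that happen to be nonzero.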
I haven't actually tested the constant-time property in any rigorous
way yet (I'm still considering the best way to do it). But this code
is surely at the very least a big improvement on the old version, even
if I later find a few more things to fix.
I've also completely rewritten the low-level elliptic curve arithmetic
from sshecc.c; the new ecc.c is closer to being an adjunct of mpint.c
than it is to the SSH end of the code. The new elliptic curve code
keeps all coordinates in Montgomery-multiplication transformed form to
speed up all the multiplications mod the same prime, and only converts
them back when you ask for the affine coordinates. Also, I adopted
extended coordinates for the Edwards curve implementation.
sshecc.c has also had a near-total rewrite in the course of switching
it over to the new system. While I was there, I've separated ECDSA and
EdDSA more completely - they now have separate vtables, instead of a
single vtable in which nearly every function had a big if statement in
it - and also made the externally exposed types for an ECDSA key and
an ECDH context different.
A minor new feature: since the new arithmetic code includes a modular
square root function, we can now support the compressed point
representation for the NIST curves. We seem to have been getting along
fine without that so far, but it seemed a shame not to put it in,
since it was suddenly easy.
In sshrsa.c, one major change is that I've removed the RSA blinding
step in rsa_privkey_op, in which we randomise the ciphertext before
doing the decryption. The purpose of that was to avoid timing leaks
giving away the plaintext - but the new arithmetic code should take
that in its stride in the course of also being careful enough to avoid
leaking the _private key_, which RSA blinding had no way to do
anything about in any case.
Apart from those specific points, most of the rest of the changes are
more or less mechanical, just changing type names and translating code
into the new API.
It wasn't really doing any serious harm, but I just got tired of
having to scroll past 700 lines of pointless static data every time I
wanted to look at the actual code in the file. Now primes[] is
initialised as necessary when genprime is first called.
(Since we only use primes up to 2^16, I didn't see any point in doing
anything fancy; this is the most trivial Sieve of Eratosthenes.)
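The lazy initialisation is roughly this shape (a sketch, not the code
as committed):

    #define PRIME_LIMIT 65536
    static unsigned short *primes = NULL;
    static size_t nprimes = 0;

    static void init_primes_array(void)
    {
        if (primes)
            return;                        /* already done */
        bool *composite = snewn(PRIME_LIMIT, bool);
        memset(composite, 0, PRIME_LIMIT * sizeof(bool));
        for (size_t i = 2; i < PRIME_LIMIT; i++) {
            if (composite[i])
                continue;
            nprimes++;
            for (size_t j = 2*i; j < PRIME_LIMIT; j += i)
                composite[j] = true;       /* cross off multiples */
        }
        primes = snewn(nprimes, unsigned short);
        for (size_t i = 2, j = 0; i < PRIME_LIMIT; i++)
            if (!composite[i])
                primes[j++] = i;
        sfree(composite);
    }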
These were both using the old-fashioned strategy of 'count up the
length first, then go back over the same data trying not to do
anything different', which these days I'm trying to replace with
strbufs.
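A before/after sketch of the pattern (the data here, 'strings' and
'nstrings', is hypothetical; only the shape matters):

    /* Old style: count first, then fill, and hope both passes agree. */
    size_t len = 0;
    for (size_t i = 0; i < nstrings; i++)
        len += 4 + strlen(strings[i]);
    char *buf = snewn(len, char), *p = buf;
    for (size_t i = 0; i < nstrings; i++) {
        PUT_32BIT_MSB_FIRST(p, strlen(strings[i]));
        memcpy(p + 4, strings[i], strlen(strings[i]));
        p += 4 + strlen(strings[i]);
    }

    /* New style: a single pass into a growable strbuf. */
    strbuf *sb = strbuf_new();
    for (size_t i = 0; i < nstrings; i++)
        put_stringz(sb, strings[i]);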
Also, while I was in ssh.h, removed the prototype of rsasanitise()
which doesn't even exist any more.
There was no point in rsa2_sign and rsa2_verify having mirrored
versions of the same code to construct the cleartext of the RSA
signature integer, just because one is building it and the other is
checking it. Much more sensible to have a single function that builds
it, and then rsa2_verify can compare the received integer against that
while rsa2_sign encodes it into an output integer.
Several pieces of old code were disposing of pieces of an RSAKey by
manually freeing them one at a time. We have a centralised
freersakey(), so we should use that instead wherever possible.
Where it wasn't possible to switch over to that, it was because we
were only freeing the private fields of the key - so I've fixed that
by cutting freersakey() down the middle and exposing the private-only
half as freersapriv().
I happened to run cmdgen under Leak Sanitiser, and found it was
_almost_ clean - clean enough that if I fix the last few leaks then it
might be worth running it again from time to time.
In the new modular SSH architecture, ssh2transport.c delegates the
actual KEX packet exchange to ssh2kex_coroutine, which has different
implementations for client and server. The KEX code actually in
ssh2transport.c consists of looping on the coroutine until it zeroes
out its state field in the ssh2transport state.
But if something goes wrong enough during KEX that we call
ssh_proto_error or any other fatal connection-terminating function,
then when we return to ssh2transport.c, the ssh2transport state won't
even exist for it to check that flag. Address Sanitiser pointed that
out to me recently, so here's a fix in which we set an 'aborted' flag
to tell the caller that its state has already been freed.
A user points out that the call to close_directory() in pscp.c's
rsource() function should have been inside rather than outside the if
statement that checks whether the directory handle is NULL. As a
result, any failed attempt to open a directory during a 'pscp -r'
recursive upload leads to a null-pointer dereference.
Moved the close_directory call to where it should be, and also
arranged to print the OS error code if the directory-open fails, by
also changing the API of open_directory to return an error string on
failure.
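One plausible shape for the corrected code (paraphrased; the real
rsource() differs in detail, and the exact error-reporting signature
of open_directory is assumed):

    const char *errmsg;
    DirHandle *dir = open_directory(src, &errmsg);
    if (dir == NULL) {
        tell_user(stderr, "%s: unable to open directory: %s", src, errmsg);
    } else {
        char *filename;
        while ((filename = read_filename(dir)) != NULL) {
            /* ... recurse into each entry ... */
            sfree(filename);
        }
        close_directory(dir);  /* inside the else: only close a handle we got */
    }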
In the previous commit I happened to notice a %.150s in a ppl_logevent
call, which was probably an important safety precaution a couple of
decades ago when that format string was being used for an sprintf into
a fixed-size buffer, but now it's just pointless cruft.
This commit removes all printf string formatting directives with a
compile-time fixed size, with the one exception of a %.3s used to cut
out a 3-letter month name in scpserver.c. In cases where the format
string in question was already going to an arbitrary-length function
like dupprintf or ppl_logevent, that's all I've done; in cases where
there was still a fixed-size buffer, I've replaced it with a dynamic
buffer and dupprintf.
In the past, I've had a lot of macros which you call with double
parentheses, along the lines of debug(("format string", params)), so
that the inner parens protect the commas and permit the macro to treat
the whole printf-style argument list as one macro argument.
That's all very well, but it's a bit inconvenient (it doesn't leave
you any way to implement such a macro by prepending another argument
to the list), and now this code base's rules allow C99isms, I can
switch all those macros to using a single pair of parens, using the
C99 ability to say '...' in the parameter list of the #define and get
at the corresponding suffix of the arguments as __VA_ARGS__.
So I'm doing it. I've made the following printf-style macros variadic:
bpp_logevent, ppl_logevent, ppl_printf and debug.
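The change of idiom, in sketch form (the definitions shown are
illustrative rather than quoted from the headers):

    /* Old: double parentheses, so the whole printf argument list is
     * one macro argument.  Called as debug(("%d bytes\n", n)); */
    #define debug(params) debug_printf params

    /* New: C99 variadic macro, called with ordinary single parens,
     * as debug("%d bytes\n", n); and a wrapper can now prepend an
     * extra argument inside the expansion if it needs to. */
    #define debug(...) debug_printf(__VA_ARGS__)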
While I'm here, I've also fixed up a collection of conditioned-out
calls to debug() in the Windows front end which were clearly expecting
a macro with a different calling syntax, because they had an integer
parameter first. If I ever have a need to condition those back in,
they should actually work now.
In the big boolification commit (3214563d8) I accidentally rewrote
"term->wrap == 0" as "term->wrap" instead of as "!term->wrap", so now
sending the backspace character to the terminal at the start of a line
causes the cursor to wrap round to the end of the previous line if and
only if it _shouldn't_ have done.
A long time ago, in commit 4d77b6567, I moved the generation of the
arrow-key escape sequences into a function format_arrow_key(). Mostly
the reason for that was a special purpose I had in mind at the time
which involved auto-generating the same sequences in response to
things other than a keypress, but I always thought it would be nice to
centralise a lot more of PuTTY's complicated keyboard handling in the
same way - at least the handling of the function keys and their
numerous static and dynamic config options.
In this year's general spirit of tidying up and refactoring, I think
it's finally time. So here I introduce three more centralised
functions for dealing with the numbered function keys, the small
keypad (Ins, Home, PgUp etc) and the numeric keypad. Lots of horrible
and duplicated code from the key handling functions in window.c and
gtkwin.c is now more sensibly centralised: each platform keyboard
handler concerns itself with the local format of a keyboard event and
platform-specific enumeration of key codes, and once it's decided what
the logical key press actually _is_, it hands off to the new functions
in terminal.c to generate the appropriate escape code.
Mostly this is intended to be a refactoring without functional change,
leaving the keyboard handling how it's always been. But in cases where
the Windows and GTK handlers were accidentally inconsistent, I've
fixed the inconsistency rather than carefully keeping both sides how
they were. Known consistency fixes:
- swapping the arrow keys between normal (ESC [ A) and application
(ESC O A) is now done by pressing Ctrl with them, and _not_ by
pressing Shift. That was how it was always supposed to work, and
how it's worked on GTK all along, but on Windows it's been done by
Shift as well since 2010, due to a bug at the call site of
format_arrow_key() introduced when I originally wrote that function.
- in Xterm function key mode plus application keypad mode, the /*-
keys on the numeric keypad now send ESC O {o,j,m} in place of ESC O
{Q,R,S}. That's how the Windows keyboard handler has worked all
along (it was a deliberate behaviour tweak for the Xterm-like
function key mode, because in that mode ESC O {Q,R,S} are generated
by F2-F4). But the GTK keyboard handler omitted that particular
special case and was still sending ESC O {Q,R,S} for those keys in
all application keypad modes.
- also in Xterm function key mode plus app keypad mode, we only
generate the app-keypad escape sequences if Num Lock is on; with
Num Lock off, the numeric keypad becomes arrow keys and
Home/End/etc, just as it would in non-app-keypad mode. Windows has
done this all along, but again, GTK lacked that special case.
Now $(CC) is defined to be nothing but the name of the clang-cl binary
itself, which makes it easier to drop in a different one for a special
purpose.
(I tried to use this for static analysis recently - unsuccessfully, as
yet, but I think this change will make anything else along the same
lines easier as well.)
It's just silly to have _two_ systems for traversing a string of
comma-separated protocol ids. I think the new get_commasep_word
technique for looping over the elements of a string is simpler and
more general than the old membership-testing approach, and also it's
necessary for the modern KEX untangling system (which has to be able
to loop over one string, even if it used a membership test to check
things in the other). So this commit rewrites the two remaining uses
of in_commasep_string to use get_commasep_word instead, and deletes
the former.
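The looping idiom is roughly this (the get_commasep_word signature is
paraphrased, and the list contents are just an example):

    ptrlen list = ptrlen_from_asciz("zlib,none");   /* e.g. an algorithm list */
    while (list.len > 0) {
        ptrlen word = get_commasep_word(&list);     /* peel off the next id */
        if (ptrlen_eq_string(word, "zlib")) {
            /* ... found the id we were looking for ... */
        }
    }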
If GSSAPI authentication fails because we call the GSS acquire_cred
function on the client side and find it doesn't give us anything
useful, then that authentication attempt has to terminate - but since
_we_ decided to terminate it, on the client side, the server will be
sending us neither a formal USERAUTH_FAILURE nor any other kind of
packet.
So when we go back round to the top of the auth loop, we have to avoid
_either_ assuming we're sitting on a USERAUTH_FAILURE we can parse for
its method list, _or_ waiting to receive one. Instead we just have to
push on and try the next auth method in the list from the last
USERAUTH_FAILURE we did see.
Hence, a new flag lets us suppress the usual behaviour of waiting
until we have a response packet on the queue, and then all references
to pktin after that are tested for NULL.
There are situations - or _should_ be, at any rate - in which we
terminate a userauth attempt without having received a
USERAUTH_FAILURE from the server, which means that we can't depend on
always starting a userauth loop iteration by extracting the server's
list of permitted methods from the current failure message. If there
isn't a current failure message, the best we can do is remember the
state from last time.
That's already what we do for actually deciding which methods to
attempt (we set s->can_foo from the methods string). But we should
also keep the full original version of the string, for use in error
messages.
The flag 'cross_certifying' in the SSH-2 transport layer state is now
a pointer to the host key algorithm we expect to be certifying,
instead of a plain bool. That lets me check by assertion that it's
what we expected it to be after all the complicated key exchange has
happened.
(I have no reason to think this _will_ go wrong. When we
cross-certify, the desired algorithm should be the only one we put
into our
KEXINIT host key algorithm list, so it should also be the only one we
can come out of the far end of KEXINIT having selected. But if
anything ever does go wrong with my KEXINIT handling then I'd prefer
an assertion failure to silently certifying the wrong key, and also,
this makes it clearer to static analysers - and perhaps also humans
reading the code - what we expect the situation to be.)
It hasn't been possible for canonify() to return a null pointer since
commit 094dd30d9, in 2001. But the whole of psftp.c is full of error
checking clauses that allow for the possibility that it might!
This fixes a batch of clang-analyzer warnings of the form 'you
declared / assigned this variable and then never use it'. It doesn't
fix _all_ of them - some are there so that when I add code in the
future _it_ can use the variable without me having to remember to
start setting it - but these are the ones I thought it would make the
code better instead of worse to fix.
This is mostly to make static analysers and compiler warnings a bit
happier - now they know that a call to, say, modalfatalbox() means
they don't have to worry about what the rest of the function will do.
The ad-hoc code that received data from the SCP or SFTP server
predated even not-very-modern conveniences such as bufchain, and was
quite horrible and cumbersome.
Particularly nasty was the part where ssh_scp_recv set a _global_
pointer variable to the buffer it was in the middle of writing to, and
then recursed and expected a callback to use that pointer. That caused
clang-analyzer to grumble at me, in a particular case where the output
buffer was in the ultimate caller's stack frame; even though I'm
confident the code _worked_, I can't blame clang for being unhappy!
So now we do things the modern and much simpler way: the callback when
data comes in just puts it on a bufchain, and the top-level
ssh_scp_recv repeatedly waits until data arrives in the bufchain and
then copies it to the output buffer.
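In sketch form (paraphrased from the description above; the helper
names and exact signatures are assumed):

    static bufchain received_data;

    /* Callback when data arrives from the server: just stash it. */
    static void ssh_scp_got_data(const void *data, size_t len)
    {
        bufchain_add(&received_data, data, len);
    }

    /* Top-level blocking read: wait until enough has accumulated,
     * then drain it into the caller's buffer. */
    static bool ssh_scp_recv(void *buf, size_t len)
    {
        while (bufchain_size(&received_data) < len) {
            if (!ssh_sftp_loop_iteration())  /* pump the event loop */
                return false;                /* connection went away */
        }
        bufchain_fetch_consume(&received_data, buf, len);
        return true;
    }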
'struct str' in terminal.c was an earlier and less good implementation
of the same concept as misc.h's strbuf, so I've replaced it with the
same strbuf we have everywhere. As a bonus, this means I can also use
put_uint{16,32} to save a bit of effort writing out the compressed
scrollback data.
On the decompression side, I've also switched to using BinarySource,
which has the advantage that now if the decoding goes wrong we can at
least be sure of not reading beyond the end of the buffer.
(The flip side of that is that now we _store_ the length of each
compressed line buffer, which costs a bit of memory. But I think it's
worth it for the safety and code consistency.)
uxnet.c's sk_namelookup and the sorting-key construction in
pangofont_enum_fonts() were both using s[n]printf and strncpy into
buffers that had no real need to be fixed-size; format_telnet_command
and the GTK Event Log selection-data builder were doing their own
sresize loops, but now we have strbuf they can just use that and save
redoing the same work.