In commit 9cc586e605 I changed the low-level key-file reading
routines like read_header and read_body so that they read from a
BinarySource via get_byte(), rather than from a FILE * via fgetc. But
I forgot that the two functions don't signal end-of-file the same way,
so testing the return value of get_byte() against EOF is pointless and
will never match, and conversely, real EOF won't be spotted unless you
also examine the error indicator in the BinarySource.
As a result, a key file that ends without a trailing newline will
cause a tight loop in one of those low-level read routines.
This code base has always been a bit confused about which spelling it
likes to use to refer to that signature algorithm. The SSH protocol id
is "ssh-dss". But everyone I know refers to it as the Digital
Signature _Algorithm_, not the Digital Signature _Standard_.
When I moved everything down into the crypto subdir, I took the
opportunity to rename sshdss.c to dsa.c. Now I'm doing the rest of the
job: all internal identifiers and code comments refer to DSA, and the
spelling "dss" only survives in externally visible identifiers that
have to remain constant.
(Such identifiers include the SSH protocol id, and also the string id
used to identify the key type in PuTTY's own host key cache. We can't
change the latter without causing everyone a backwards-compatibility
headache, and if we _did_ ever decide to do that, we'd surely want to
do a much more thorough job of making the cache format more sensible!)
Coverity pointed out that I'd checked if the LoadedFile was NULL, set
an error message ... and then accidentally fallen through to the
success handler anyway.
ssh2_all_fingerprints() and friends will return a small 'char **'
array, containing all the fingerprints of a key that we know how to
generate, indexed by the FingerprintType enum. The result requires
complex freeing, so there's an ssh2_free_all_fingerprints as well.
For SSH-1 RSA keys, we refuse to generate any fingerprint except the
old SSH-1 MD5 version, because there's no other fingerprint type I
know of that anyone else uses. So I've got a function that returns the
same 'char **' for an SSH-1 key, but it only fills in the MD5 slot,
and leaves the rest NULL.
As a result, I also need a dynamic function that takes a fingerprint
list and returns the id of the most preferred fingerprint type in it
_that actually exists_.
NFC: this API is introduced, but not yet used.
There's a new enumeration of fingerprint types, and you tell
ssh2_fingerprint() or ssh2_fingerprint_blob() which of them to use.
So far, this is only implemented behind the scenes, and exposed for
testcrypt to test. All the call sites of ssh2_fingerprint pass a fixed
default fptype, which is still set to the old MD5. That will change
shortly.
This commit adds the capability in principle to ppk_save_sb, by adding
a fmt_version field in the save parameters structure. As yet it's not
connected up to any user interface in PuTTYgen, but I think I'll need
to, because currently there's no way at all to convert PPK v3 back to
v2, and surely people will need to interoperate with older
installations of PuTTY, or with other PPK-consuming software.
Thanks to Pavel and his CI for pointing out what I'd forgotten: the
automated test of cmdgen.c expects that round-tripping a PPK file to
some other format and back will regenerate the identical file. Of
course, with a randomised salt in the new-look password hash, that
isn't true any more in normal usage.
Fixed by adding an option in the existing parameters structure to
provide a salt override. That shouldn't be used anywhere except
cgtest, but in cgtest, it restores the determinism we need.
Another potential (but not guaranteed) source of difference is the
automatic time-scaling of the Argon2 parameter choice. So I've turned
that off too, while I'm at it.
This removes both uses of SHA-1 in the file format: it was used as the
MAC protecting the key file against tamperproofing, and also used in
the key derivation step that converted the user's passphrase to cipher
and MAC keys.
The MAC is simply upgraded from HMAC-SHA-1 to HMAC-SHA-256; it is
otherwise unchanged in how it's applied (in particular, to what data).
The key derivation is totally reworked, to be based on Argon2, which
I've just added to the code base. This should make stolen encrypted
key files more resistant to brute-force attack.
Argon2 has assorted configurable parameters for memory and CPU usage;
the new key format includes all those parameters. So there's no reason
we can't have them under user control, if a user wants to be
particularly vigorous or particularly lightweight with their own key
files. They could even switch to one of the other flavours of Argon2,
if they thought side channels were an especially large or small risk
in their particular environment. In this commit I haven't added any UI
for controlling that kind of thing, but the PPK loading function is
all set up to cope, so that can all be added in a future commit
without having to change the file format.
While I'm at it, I've also switched the CBC encryption to using a
random IV (or rather, one derived from the passphrase along with the
cipher and MAC keys). That's more like normal SSH-2 practice.
'bool old_fmt' in ppk_load_s has now given way to a numeric version
field, which will allow it to be set to 3 in future, instead of just 1
or 2. The ad-hoc integer variable 'cipher' is replaced with a pointer
to a small struct that mentions individual details like key lengths,
to aid parametrisation.
The old ssh2_ppk_derivekey is now a larger function that derives all
three of the key components used in the private-blob protection: not
just the cipher key, but the cipher IV and the MAC key as well. The
main part of it is a switch on the file-format version, which
currently has only one case (PPK v1 and v2 don't differ in the key
derivation), but gives me a handy place to drop in a new case in a
future commit, changing the derivation of all those things.
All the key material is stored end-to-end in a single strbuf, with
ptrlens pointing into it. That makes it easy to free all in one go
later.
No functional change: currently, the IV passed in is always zero
(except in the test suite). But this prepares to change that in a
future revision of the key file format.
There's no need to have one bunch of free operations before returning
success and another version before returning error. Easier to just set
up the state in the former case so that we can fall through to the
latter.
Somebody on comp.security.ssh asked about it recently, and I decided
that storing it in a comment in the key file was not really good
enough. Also, that comment was incomplete (it listed the private key
formats for RSA and DSA but not any of the newer ECC key types, simple
as their private-key formats may be).
The information was already centralised in find_pubkey_alg, but that
had a query-based API that couldn't enumerate the key types. Now I
expose an underlying array so that it's possible to iterate over them.
Also, I'd forgotten to add the two new rsa-sha2-* algorithms to
find_pubkey_alg. That's also done as part of this commit.
This is standardised by RFC 8709 at SHOULD level, and for us it's not
too difficult (because we use general-purpose elliptic-curve code). So
let's be up to date for a change, and add it.
This implementation uses all the formats defined in the RFC. But we
also have to choose a wire format for the public+private key blob sent
to an agent, and since the OpenSSH agent protocol is the de facto
standard but not (yet?) handled by the IETF, OpenSSH themselves get to
say what the format for a key should or shouldn't be. So if they don't
support a particular key method, what do you do?
I checked with them, and they agreed that there's an obviously right
format for Ed448 keys, which is to do them exactly like Ed25519 except
that you have a 57-byte string everywhere Ed25519 had a 32-byte
string. So I've done that.
Also spelled '-O text', this takes a public or private key as input,
and produces on standard output a dump of all the actual numbers
involved in the key: the exponent and modulus for RSA, the p,q,g,y
parameters for DSA, the affine x and y coordinates of the public
elliptic curve point for ECC keys, and all the extra bits and pieces
in the private keys too.
Partly I expect this to be useful to me for debugging: I've had to
paste key files a few too many times through base64 decoders and hex
dump tools, then manually decode SSH marshalling and paste the result
into the Python REPL to get an integer object. Now I should be able to
get _straight_ to text I can paste into Python.
But also, it's a way that other applications can use the key
generator: if you need to generate, say, an RSA key in some format I
don't support (I've recently heard of an XML-based one, for example),
then you can run 'puttygen -t rsa --dump' and have it print the
elements of a freshly generated keypair on standard output, and then
all you have to do is understand the output format.
This will allow it to be used more conveniently for things other than
key files.
For the moment, the implementation still lives in sshpubk.c. Moving it
out into utils.c or misc.c would be nicer, but it has awkward
dependencies on marshal.c and the per-platform f_open function.
Perhaps another time.
This reworks the cmdgen main program so that it loads the input file
into a LoadedFile right at the start, and then every time it needs to
do something with the contents, it calls one of the API functions
taking a BinarySource instead of one taking a Filename.
The usefulness of this is that now we can read from things that aren't
regular files, and can't be rewound or reopened. In particular, the
filename "-" is now taken (per the usual convention) to mean standard
input.
So now you can pipe a public or private key file into cmdgen's
standard input and have it do something useful. For example, I was
recently experimenting with the SFTP-only SSH server that comes with
'proftpd', which keeps its authorized_keys file in RFC 4716 format
instead of the OpenSSH one-liner format, and I found I wanted to do
grep 'my-key-comment' ~/.ssh/authorized_keys | puttygen -p -
to quickly get hold of my existing public key to put in that file. But
I had to go via a temporary file to make that work, because puttygen
couldn't read from standard input. Next time, it will be able to!
I'm about to use it in cmdgen for a minor UI improvement. Also, I
expect it to be useful in the Pageant client code sooner or later.
While I'm here, I've also tweaked its UI a little so that it reports a
more precise error, and provided a version that can read from an
already open stdio stream.
These were just too footling for even me to bother splitting up into
multiple commits:
- a couple of int -> size_t changes left out of the big-bang commit
0cda34c6f
- a few 'const' added to pointer-type casts that are only going to be
read from (leaving out the const provokes a warning if the pointer
was const _before_ the cast)
- a couple of 'return' statements trying to pass the void return of
one function through to another.
- another missing (void) in a declaration in putty.h (but this one
didn't cause any knock-on confusion).
- a few tweaks to macros, to arrange that they eat a semicolon after
the macro call (extra do ... while (0) wrappers, mostly, and one
case where I had to do it another way because the macro included a
variable declaration intended to remain in scope)
- reworked key_type_to_str to stop putting an unreachable 'break'
statement after every 'return'
- removed yet another type-check of a function loaded from a Windows
system DLL
- and finally, a totally spurious semicolon right after an open brace
in mainchan.c.
A user reports that Visual Studio 2013 and earlier have printf
implementations in their C library that don't support the 'z' modifier
to indicate that an integer argument is size_t. The 'I' modifier
apparently works in place of it.
To avoid littering ifdefs everywhere, I've invented my own inttypes.h
style macros to wrap size_t formatting directives, which are defined
to %zu and %zx normally, or %Iu and %Ix in old-VS mode. Those are in
defs.h, and they're used everywhere that a %z might otherwise get into
the Windows build.
All the functions that read and write public and private keys to a
FILE * are now small wrappers on top of an underlying set of functions
that read data in the same format from a BinarySource, or write it to
a strbuf. This sets me up to deal with key files in contexts other
than on disk.
Now they have names that are more consistent (no more userkey_this but
that_userkey); a bit shorter; and, most importantly, all the current
functions end in _f to indicate that they deal with keys stored in
disk files. I'm about to add a second set of entry points that deal
with keys via the more general BinarySource / BinarySink interface,
which will sit alongside these with a different suffix.
This doesn't affect what files are _legal_: the spec said we tolerated
three kinds of line ending, and it still says we tolerate the same
three. But I noticed that we're actually outputting \n by preference,
whereas the spec said we prefer \r\n. I'd rather change the docs than
the code.
This commit switches as many ssh_hash_free / ssh_hash_new pairs as
possible to reuse the previous hash object via ssh_hash_reset. Also a
few other cleanups: use the wrapper function hash_simple() where
possible, and I've also introduced ssh_hash_digest_nondestructive()
and switched to that where possible as well.
The number of people has been steadily increasing who read our source
code with an editor that thinks tab stops are 4 spaces apart, as
opposed to the traditional tty-derived 8 that the PuTTY code expects.
So I've been wondering for ages about just fixing it, and switching to
a spaces-only policy throughout the code. And I recently found out
about 'git blame -w', which should make this change not too disruptive
for the purposes of source-control archaeology; so perhaps now is the
time.
While I'm at it, I've also taken the opportunity to remove all the
trailing spaces from source lines (on the basis that git dislikes
them, and is the only thing that seems to have a strong opinion one
way or the other).
Apologies to anyone downstream of this code who has complicated patch
sets to rebase past this change. I don't intend it to be needed again.
They're called things like SSH_CIPHER_3DES in the SSH-1 spec, but I
don't normally let that stop me adding the disambiguating '1' in the
names I give constants inside this code base. These ones are long
overdue for some disambiguation.
The _nm strategy is slower, so I don't want to just change everything
over no matter what its contents. In this pass I've tried to catch
everything that holds the _really_ sensitive things like passwords,
private keys and session keys.
I've fixed a handful of these where I found them in passing, but when
I went systematically looking, there were a lot more that I hadn't
found!
A particular highlight of this collection is the code that formats
Windows clipboard data in RTF, which was absolutely crying out for
strbuf_catf, and now it's got it.
The local put_mp_*_from_string functions in import.c now take ptrlen
(which simplifies essentially all their call sites); so does the local
function logwrite() in logging.c, and so does ssh2_fingerprint_blob.
Those were a reasonable abbreviation when the code almost never had to
deal with little-endian numbers, but they've crept into enough places
now (e.g. the ECC formatting) that I think I'd now prefer that every
use of the integer read/write macros was clearly marked with its
endianness.
So all uses of GET_??BIT and PUT_??BIT are now qualified. The special
versions in x11fwd.c, which used variable endianness because so does
the X11 protocol, are suffixed _X11 to make that clear, and where that
pushed line lengths over 80 characters I've taken the opportunity to
name a local variable to remind me of what that extra parameter
actually does.
This is in preparation for a PRNG revamp which will want to have a
well defined boundary for any given request-for-randomness, so that it
can destroy the evidence afterwards. So no more looping round calling
random_byte() and then stopping when we feel like it: now you say up
front how many random bytes you want, and call random_read() which
gives you that many in one go.
Most of the call sites that had to be fixed are fairly mechanical, and
quite a few ended up more concise afterwards. A few became more
cumbersome, such as mp_random_bits, in which the new API doesn't let
me load the random bytes directly into the target integer without
triggering undefined behaviour, so instead I have to allocate a
separate temporary buffer.
The _most_ interesting call site was in the PKCS#1 v1.5 padding code
in sshrsa.c (used in SSH-1), in which you need a stream of _nonzero_
random bytes. The previous code just looped on random_byte, retrying
if it got a zero. Now I'm doing a much more interesting thing with an
mpint, essentially scaling a binary fraction repeatedly to extract a
number in the range [0,255) and then adding 1 to it.
All the hash-specific state structures, and the functions that
directly accessed them, are now local to the source files implementing
the hashes themselves. Everywhere we previously used those types or
functions, we're now using the standard ssh_hash or ssh2_mac API.
The 'simple' functions (hmacmd5_simple, SHA_Simple etc) are now a pair
of wrappers in sshauxcrypt.c, each of which takes an algorithm
structure and can do the same conceptual thing regardless of what it
is.
This is the commit that f3295e0fb _should_ have been. Yesterday I just
added some typedefs so that I didn't have to wear out my fingers
typing 'struct' in new code, but what I ought to have done is to move
all the typedefs into defs.h with the rest, and then go through
cleaning up the legacy 'struct's all through the existing code.
But I was mostly trying to concentrate on getting the test suite
finished, so I just did the minimum. Now it's time to come back and do
it better.
Taking a leaf out of the LLVM code base: this macro still includes an
assert(false) so that the message will show up in a typical build, but
it follows it up with a call to a function explicitly marked as no-
return.
So this ought to do a better job of convincing compilers that once a
code path hits this function it _really doesn't_ have to still faff
about with making up a bogus return value or filling in a variable
that 'might be used uninitialised' in the following code that won't be
reached anyway.
I've gone through the existing code looking for the assert(false) /
assert(0) idiom and replaced all the ones I found with the new macro,
which also meant I could remove a few pointless return statements and
variable initialisations that I'd already had to put in to placate
compiler front ends.
The old 'Bignum' data type is gone completely, and so is sshbn.c. In
its place is a new thing called 'mp_int', handled by an entirely new
library module mpint.c, with API differences both large and small.
The main aim of this change is that the new library should be free of
timing- and cache-related side channels. I've written the code so that
it _should_ - assuming I haven't made any mistakes - do all of its
work without either control flow or memory addressing depending on the
data words of the input numbers. (Though, being an _arbitrary_
precision library, it does have to at least depend on the sizes of the
numbers - but there's a 'formal' size that can vary separately from
the actual magnitude of the represented integer, so if you want to
keep it secret that your number is actually small, it should work fine
to have a very long mp_int and just happen to store 23 in it.) So I've
done all my conditionalisation by means of computing both answers and
doing bit-masking to swap the right one into place, and all loops over
the words of an mp_int go up to the formal size rather than the actual
size.
I haven't actually tested the constant-time property in any rigorous
way yet (I'm still considering the best way to do it). But this code
is surely at the very least a big improvement on the old version, even
if I later find a few more things to fix.
I've also completely rewritten the low-level elliptic curve arithmetic
from sshecc.c; the new ecc.c is closer to being an adjunct of mpint.c
than it is to the SSH end of the code. The new elliptic curve code
keeps all coordinates in Montgomery-multiplication transformed form to
speed up all the multiplications mod the same prime, and only converts
them back when you ask for the affine coordinates. Also, I adopted
extended coordinates for the Edwards curve implementation.
sshecc.c has also had a near-total rewrite in the course of switching
it over to the new system. While I was there, I've separated ECDSA and
EdDSA more completely - they now have separate vtables, instead of a
single vtable in which nearly every function had a big if statement in
it - and also made the externally exposed types for an ECDSA key and
an ECDH context different.
A minor new feature: since the new arithmetic code includes a modular
square root function, we can now support the compressed point
representation for the NIST curves. We seem to have been getting along
fine without that so far, but it seemed a shame not to put it in,
since it was suddenly easy.
In sshrsa.c, one major change is that I've removed the RSA blinding
step in rsa_privkey_op, in which we randomise the ciphertext before
doing the decryption. The purpose of that was to avoid timing leaks
giving away the plaintext - but the new arithmetic code should take
that in its stride in the course of also being careful enough to avoid
leaking the _private key_, which RSA blinding had no way to do
anything about in any case.
Apart from those specific points, most of the rest of the changes are
more or less mechanical, just changing type names and translating code
into the new API.
My normal habit these days, in new code, is to treat int and bool as
_almost_ completely separate types. I'm still willing to use C's
implicit test for zero on an integer (e.g. 'if (!blob.len)' is fine,
no need to spell it out as blob.len != 0), but generally, if a
variable is going to be conceptually a boolean, I like to declare it
bool and assign to it using 'true' or 'false' rather than 0 or 1.
PuTTY is an exception, because it predates the C99 bool, and I've
stuck to its existing coding style even when adding new code to it.
But it's been annoying me more and more, so now that I've decided C99
bool is an acceptable thing to require from our toolchain in the first
place, here's a quite thorough trawl through the source doing
'boolification'. Many variables and function parameters are now typed
as bool rather than int; many assignments of 0 or 1 to those variables
are now spelled 'true' or 'false'.
I managed this thorough conversion with the help of a custom clang
plugin that I wrote to trawl the AST and apply heuristics to point out
where things might want changing. So I've even managed to do a decent
job on parts of the code I haven't looked at in years!
To make the plugin's work easier, I pushed platform front ends
generally in the direction of using standard 'bool' in preference to
platform-specific boolean types like Windows BOOL or GTK's gboolean;
I've left the platform booleans in places they _have_ to be for the
platform APIs to work right, but variables only used by my own code
have been converted wherever I found them.
In a few places there are int values that look very like booleans in
_most_ of the places they're used, but have a rarely-used third value,
or a distinction between different nonzero values that most users
don't care about. In these cases, I've _removed_ uses of 'true' and
'false' for the return values, to emphasise that there's something
more subtle going on than a simple boolean answer:
- the 'multisel' field in dialog.h's list box structure, for which
the GTK front end in particular recognises a difference between 1
and 2 but nearly everything else treats as boolean
- the 'urgent' parameter to plug_receive, where 1 vs 2 tells you
something about the specific location of the urgent pointer, but
most clients only care about 0 vs 'something nonzero'
- the return value of wc_match, where -1 indicates a syntax error in
the wildcard.
- the return values from SSH-1 RSA-key loading functions, which use
-1 for 'wrong passphrase' and 0 for all other failures (so any
caller which already knows it's not loading an _encrypted private_
key can treat them as boolean)
- term->esc_query, and the 'query' parameter in toggle_mode in
terminal.c, which _usually_ hold 0 for ESC[123h or 1 for ESC[?123h,
but can also hold -1 for some other intervening character that we
don't support.
In a few places there's an integer that I haven't turned into a bool
even though it really _can_ only take values 0 or 1 (and, as above,
tried to make the call sites consistent in not calling those values
true and false), on the grounds that I thought it would make it more
confusing to imply that the 0 value was in some sense 'negative' or
bad and the 1 positive or good:
- the return value of plug_accepting uses the POSIXish convention of
0=success and nonzero=error; I think if I made it bool then I'd
also want to reverse its sense, and that's a job for a separate
piece of work.
- the 'screen' parameter to lineptr() in terminal.c, where 0 and 1
represent the default and alternate screens. There's no obvious
reason why one of those should be considered 'true' or 'positive'
or 'success' - they're just indices - so I've left it as int.
ssh_scp_recv had particularly confusing semantics for its previous int
return value: its call sites used '<= 0' to check for error, but it
never actually returned a negative number, just 0 or 1. Now the
function and its call sites agree that it's a bool.
In a couple of places I've renamed variables called 'ret', because I
don't like that name any more - it's unclear whether it means the
return value (in preparation) for the _containing_ function or the
return value received from a subroutine call, and occasionally I've
accidentally used the same variable for both and introduced a bug. So
where one of those got in my way, I've renamed it to 'toret' or 'retd'
(the latter short for 'returned') in line with my usual modern
practice, but I haven't done a thorough job of finding all of them.
Finally, one amusing side effect of doing this is that I've had to
separate quite a few chained assignments. It used to be perfectly fine
to write 'a = b = c = TRUE' when a,b,c were int and TRUE was just a
the 'true' defined by stdbool.h, that idiom provokes a warning from
gcc: 'suggest parentheses around assignment used as truth value'!
This commit includes <stdbool.h> from defs.h and deletes my
traditional definitions of TRUE and FALSE, but other than that, it's a
100% mechanical search-and-replace transforming all uses of TRUE and
FALSE into the C99-standardised lowercase spellings.
No actual types are changed in this commit; that will come next. This
is just getting the noise out of the way, so that subsequent commits
can have a higher proportion of signal.
One to make one from a NUL-terminated string, and another to make one
from a strbuf. I've switched over all the obvious cases where I should
have been using these functions.
If the file is empty, or otherwise fails to start with a recognised
'PuTTY-User-Key-File' header line, we forgot to fill in the error
message before returning failure, leading to a null pointer
dereference.
It is to put_data what memset is to memcpy. Several places
in the code wanted it already, but not _quite_ enough for me to
have written it with the rest of the BinarySink infrastructure
originally.
After Pavel Kryukov pointed out that I have to put _something_ in the
'ssh_key' structure, I thought of an actually useful thing to put
there: why not make it store a pointer to the ssh_keyalg structure?
Then ssh_key becomes a classoid - or perhaps 'traitoid' is a closer
analogy - in the same style as Socket and Plug. And just like Socket
and Plug, I've also arranged a system of wrapper macros that avoid the
need to mention the 'object' whose method you're invoking twice at
each call site.
The new vtable pointer directly replaces an existing field of struct
ec_key (which was usable by several different ssh_keyalgs, so it
already had to store a pointer to the currently active one), and also
replaces the 'alg' field of the ssh2_userkey structure that wraps up a
cryptographic key with its comment field.
I've also taken the opportunity to clean things up a bit in general:
most of the methods now have new and clearer names (e.g. you'd never
know that 'newkey' made a public-only key while 'createkey' made a
public+private key pair unless you went and looked it up, but now
they're called 'new_pub' and 'new_priv' you might be in with a
chance), and I've completely removed the openssh_private_npieces field
after realising that it was duplicating information that is actually
_more_ conveniently obtained by calling the new_priv_openssh method
(formerly openssh_createkey) and throwing away the result.
This parameter returned a substring of the input, which was used for
two purposes. Firstly, it was used to hash the host and server keys
during the initial SSH-1 key setup phase; secondly, it was used to
check the keys in Pageant against the public key blob of a key
specified on the command line.
Unfortunately, those two purposes didn't agree! The first one needs
just the bare key modulus bytes (without even the SSH-1 mpint length
header); the second needs the entire key blob. So, actually, it seems
to have never worked in SSH-1 to say 'putty -i keyfile' and have PuTTY
find that key in Pageant and not have to ask for the passphrase to
decrypt the version on disk.
Fixed by removing that parameter completely, which simplifies all the
_other_ call sites, and replacing it by custom code in those two
places that each does the actually right thing.
Quite a few of the function pointers in the ssh_keyalg vtable now take
ptrlen arguments in place of separate pointer and length pairs.
Meanwhile, the various key types' implementations of those functions
now work by initialising a BinarySource with the input ptrlen and
using the new decode functions to walk along it.
One exception is the openssh_createkey method which reads a private
key in the wire format used by OpenSSH's SSH-2 agent protocol, which
has to consume a prefix of a larger data stream, and tell the caller
how much of that data was the private key. That function now takes an
actual BinarySource, and passes that directly to the decode functions,
so that on return the caller finds that the BinarySource's read
pointer has been advanced exactly past the private key.
This let me throw away _several_ reimplementations of mpint-reading
functions, one in each of sshrsa, sshdss.c and sshecc.c. Worse still,
they didn't all have exactly the SSH-2 semantics, because the thing in
sshrsa.c whose name suggested it was an mpint-reading function
actually tolerated the wrong number of leading zero bytes, which it
had to be able to do to cope with the "ssh-rsa" signature format which
contains a thing that isn't quite an SSH-2 mpint. Now that deviation
is clearly commented!
This wraps up a (pointer, length) pair into a convenient struct that
lets me return it by value from a function, and also pass it through
to other functions in one go.
Ideally quite a lot of this code base could be switched over to using
ptrlen in place of separate pointer and length variables or function
parameters. (In fact, in my personal ideal conception of C, the usual
string type would be of this form, and all the string.h functions
would operate on ptrlens instead of zero-terminated 'char *'.)
For the moment, I'm just introducing it to make some upcoming
refactoring less inconvenient. Bulk migration of existing code to
ptrlen is a project for another time.
Along with the type itself, I've provided a convenient system of
including the contents of a ptrlen in a printf; a constructor function
that wraps up a pointer and length so you can make a ptrlen on the fly
in mid-expression; a function to compare a ptrlen against an ordinary
C string (which I mostly expect to use with string literals); and a
function 'mkstr' to make a dynamically allocated C string out of one.
That last function replaces a function of the same name in sftp.c,
which I'm promoting to a whole-codebase facility and adjusting its
API.
During last week's work, I made a mistake in which I got the arguments
backwards in one of the key-blob-generating functions - mistakenly
swapped the 'void *' key instance with the 'BinarySink *' output
destination - and I didn't spot the mistake until run time, because in
C you can implicitly convert both to and from void * and so there was
no compile-time failure of type checking.
Now that I've introduced the FROMFIELD macro that downcasts a pointer
to one field of a structure to retrieve a pointer to the whole
structure, I think I might start using that more widely to indicate
this kind of polymorphic subtyping. So now all the public-key
functions in the struct ssh_signkey vtable handle their data instance
in the form of a pointer to a subfield of a new zero-sized structure
type 'ssh_key', which outside the key implementations indicates 'this
is some kind of key instance but it could be of any type'; they
downcast that pointer internally using FROMFIELD in place of the
previous ordinary C cast, and return one by returning &foo->sshk for
whatever foo they've just made up.
The sshk member is not at the beginning of the structure, which means
all those FROMFIELDs and &key->sshk are actually adding and
subtracting an offset. Of course I could have put the member at the
start anyway, but I had the idea that it's actually a feature _not_ to
have the two types start at the same address, because it means you
should notice earlier rather than later if you absentmindedly cast
from one to the other directly rather than by the approved method (in
particular, if you accidentally assign one through a void * and back
without even _noticing_ you perpetrated a cast). In particular, this
enforces that you can't sfree() the thing even once without realising
you should instead of called the right freekey function. (I found
several bugs by this method during initial testing, so I think it's
already proved its worth!)
While I'm here, I've also renamed the vtable structure ssh_signkey to
ssh_keyalg, because it was a confusing name anyway - it describes the
_algorithm_ for handling all keys of that type, not a specific key. So
ssh_keyalg is the collection of code, and ssh_key is one instance of
the data it handles.