I'm preparing to be able to ask about the other end of the connection
too, so the first step is to give this data structure a neutral name
that can refer to either. No functional change yet.
I mentioned recently (in commit 9e7d4c53d8) message that I'm no
longer fond of the variable name 'ret', because it's used in two quite
different contexts: it's the return value from a subroutine you just
called (e.g. 'int ret = read(fd, buf, len);' and then check for error
or EOF), or it's the value you're preparing to return from the
_containing_ routine (maybe by assigning it a default value and then
conditionally modifying it, or by starting at NULL and reallocating,
or setting it just before using the 'goto out' cleanup idiom). In the
past I've occasionally made mistakes by forgetting which meaning the
variable had, or accidentally conflating both uses.
If all else fails, I now prefer 'retd' (short for 'returned') in the
former situation, and 'toret' (obviously, the value 'to return') in
the latter case. But even better is to pick a name that actually says
something more specific about what the thing actually is.
One particular bad habit throughout this codebase is to have a set of
functions that deal with some object type (say 'Foo'), all *but one*
of which take a 'Foo *foo' parameter, but the foo_new() function
starts with 'Foo *ret = snew(Foo)'. If all the rest of them think the
canonical name for the ambient Foo is 'foo', so should foo_new()!
So here's a no-brainer start on cutting down on the uses of 'ret': I
looked for all the cases where it was being assigned the result of an
allocation, and renamed the variable to be a description of the thing
being allocated. In the case of a new() function belonging to a
family, I picked the same name as the rest of the functions in its own
family, for consistency. In other cases I picked something sensible.
One case where it _does_ make sense not to use your usual name for the
variable type is when you're cloning an existing object. In that case,
_neither_ of the Foo objects involved should be called 'foo', because
it's ambiguous! They should be named so you can see which is which. In
the two cases I found here, I've called them 'orig' and 'copy'.
As in the previous refactoring, many thanks to clang-rename for the
help.
These were indented 2 spaces _further_ than the surrounding code,
instead of 2 spaces less. My bulk-reindentation the other day didn't
detect them because apparently my Emacs configuration can make this
mistake all by itself, so it thought they were right!
Previously, we had a single data structure 'keytree' containing
records each involving a public and private key (the latter maybe in
clear, or as an encrypted key file, or both). Now, we have separate
'pubkeytree' and 'privkeytree', the former storing public keys indexed
by their full public blob (including certificate, if any), and the
latter storing private keys, indexed by the _base_ public blob
only (i.e. with no certificate included).
The effect of this is that deferred decryption interacts more sensibly
with certificates. Now, if you load certified and uncertified versions
of the same key into Pageant, or two or more differently certified
versions, then the separate public key records will all share the same
private key record, and hence, a single state of decryption. So the
first time you enter a passphrase that unlocks that private key, it
will unlock it for all public keys that share the same private half.
Conversely, re-encrypting any one of them will cause all of them to
become re-encrypted, eliminating the risk that you deliberately
re-encrypt a key you really care about and forget that another equally
valuble copy of it is still in clear.
The most subtle part of this turned out to be the question of what key
comment you present in a deferred decryption prompt. It's very
tempting to imagine that it should be the comment that goes with
whichever _public_ key was involved in the signing request that
triggered the prompt. But in fact, it _must_ be the comment that goes
with whichever version of the encrypted key file is stored in Pageant
- because what if the user chose different passphrases for their
uncertified and certified PPKs? Then the decryption prompt will have
to indicate which passphrase they should be typing, so it's vital to
present the comment that goes with the _file we're decrypting_.
(Of course, if the user has selected different passphrases for those
two PPKs but the _same_ comment, they're still going to end up
confused. But at least once they realise they've done that, they have
a workaround.)
OpenSSH, when called on to give the fingerprint of a certified public
key, will in many circumstances generate the hash of the public blob
of the _underlying_ key, rather than the hash of the full certificate.
I think the hash of the certificate is also potentially useful (if
nothing else, it provides a way to tell apart multiple certificates on
the same key). But I can also see that it's useful to be able to
recognise a key as the same one 'really' (since all certificates on
the same key share a private key, so they're unavoidably related).
So I've dealt with this by introducing an extra pair of fingerprint
types, giving the cross product of {MD5, SHA-256} x {base key only,
full certificate}. You can manually select which one you want to see
in some circumstances (notably PuTTYgen), and in others (such as
diagnostics) both fingerprints will be emitted side by side via the
new functions ssh2_double_fingerprint[_blob].
The default, following OpenSSH, is to just fingerprint the base key.
If you had a key stored encrypted in Pageant, and you launched two
PuTTY sessions both trying to generate a signature with the key, and
then didn't type the passphrase into the async passphrase prompt until
after both sessions had made requests, then only one of the requests
would be replied to once you decrypted the key. The other would sit
there waiting for a response that Pageant never got round to sending.
This would also happen in the case where you launched two PuTTY
sessions each trying to use a _different_ encrypted key in Pageant, in
which case Pageant should have presented the next GUI passphrase box
after the first one was dismissed. That didn't happen either.
This was the result of two separate bugs in the code. Firstly, when
signop_coroutine() noticed that a GUI request was already in progress,
it would crReturn without making any arrangement to be called back.
Now there's a queue of 'requests blocked on waiting for some GUI
prompt to finish'.
Secondly, if multiple requests were blocked on the same key, the code
that went over the linked list of them had a bug in which it would
unblock the _first_ list item in every iteration, instead of
unblocking all the items once each.
It's a class method rather than an object method, so it doesn't allow
keys with the same algorithm to make different choices about what
flags they support. But that's not what I wanted it for: the real
purpose is to allow one key algorithm to delegate supported_flags to
another, by having its method implementation call the one from the
delegate class.
(If only C's compile/link model permitted me to initialise a field of
one global const struct variable to be a copy of that of another, I
wouldn't need the runtime overhead of this method! But object file
formats don't let you even specify that.)
Most key algorithms support no flags at all, so they all want to use
the same implementation of this method. So I've started a file of
stubs utils/nullkey.c to contain the common stub version.
If you already have a string (of potentially-binary data) in the form
of a ptrlen reference to somewhere else, and you want to keep a copy
somewhere, it's useful to copy it into a strbuf. But it takes a couple
of lines of faff to do that, and it's nicer to wrap that up into a
tiny helper function.
This commit adds that helper function strbuf_dup, and its non-movable
sibling strbuf_dup_nm for secret data. Also, gone through the existing
code and found a bunch of cases where this makes things less verbose.
Passing an operating-system-specific error code to plug_closing(),
such as errno or GetLastError(), was always a bit weird, given that it
generally had to be handled by cross-platform receiving code in
backends. I had the platform.h implementations #define any error
values that the cross-platform code would have to handle specially,
but that's still not a great system, because it also doesn't leave
freedom to invent error representations of my own that don't
correspond to any OS code. (For example, the ones I just removed from
proxy.h.)
So now, the OS error code is gone from the plug_closing API, and in
its place is a custom enumeration of closure types: normal, error, and
the special case BROKEN_PIPE which is the only OS error code we have
so far needed to handle specially. (All others just mean 'abandon the
connection and print the textual message'.)
Having already centralised the handling of OS error codes in the
previous commit, we've now got a convenient place to add any further
type codes for errors needing special handling: each of Unix
plug_closing_errno(), Windows plug_closing_system_error(), and Windows
plug_closing_winsock_error() can easily grow extra special cases if
need be, and each one will only have to live in one place.
It was totally unused. No implementation of the 'closing' method in a
Plug vtable was checking it for any reason at all, except for
ProxySocket which captured it from its client in order to pass on to
its server (which, perhaps after further iterations of ProxySocket,
would have ended up ignoring it similarly). And every caller of
plug_closing set it to 0 (aka false), except for the one in sshproxy.c
which passed true (but it would have made no difference to anyone).
The comment in network.h refers to a FIXME comment which was in
try_send() when that code was written (see winnet.c in commit
7b0e082700). That FIXME is long gone, replaced by a use of a
toplevel callback. So I think the aim must have been to avoid
re-entrancy when sk_write called try_send which encountered a socket
error and called back to plug_closing - but that's long since fixed by
other means now.
When I create any class that implements PageantAsyncOp, I had intended
to link its pao.cr list node on to the list in the appropriate
PageantClientInfo, so that if that PageantClientInfo was destroyed
prematurely (e.g. an agent connection was abruptly closed), we could
destroy all the pending requests containing a pointer to it.
I did this linking by setting the linked-list fields in pao.cr to
point to the appropriate places in the existing list - but I didn't
modify the pointer fields _in_ the existing list to point to the new
operation at all. So nothing ever actually got linked on to any of
those queues!
Commit 8f5e9a4f8d introduced a segfault into both Windows Pageant
and Windows connection sharing upstreams when they receive an incoming
named-pipe connection. This occurs because the PlugVtables for those
incoming connections had a null pointer in the 'log' field, because
hitherto, only sockets involved with an outgoing connection expected
to receive plug_log() notifications. But I added such a notification
in make_handle_socket, forgetting that that function is used for both
outgoing and incoming named-pipe connections (among other things). So
now a Plug implementation that expects to be set up by the
plug_accepting() method on a listener may still receive
PLUGLOG_CONNECT_SUCCESS.
I could fix that by adding a parameter to make_handle_socket telling
it whether to send a notification, but that seems like more faff than
is really needed. Simpler to make a rule that says _all_ Socket types
must implement the log() method, even if only with a no-op function.
We already have a no-op implementation of log(), in nullplug.c. So
I've exposed that outside its module (in the same style as all the
nullseat functions and so on), to make it really easy to add log()
methods to PlugVtables that don't need one.
In commit d53b3bcd22 I changed the final setting of kl->broken
so that it wouldn't overwrite a 'true' value set earlier in the
function. But that means it might not be set at all, because I forgot
I now needed to initialise it to false. Ahem.
Coverity points out that if rsa_ssh1_public_blob_len sees data it
doesn't like, it returns -1 to indicate an error. But the code that
uses it to parse the SSH1_AGENT_RSA_IDENTITIES_ANSWER payload was
passing it directly to get_data() as a length field, without checking
for that case. Now we do check it, and use it to set the existing
kl->broken flag that indicates that the key list was not correctly
formatted.
Gives more helpful messages if Unix pageant ends up being a client for,
say, OpenSSH's ssh-agent, or indeed an older version of Pageant.
(Also, tweak a couple of other messages that still assumed that
pageant-as-client always talks to Pageant-as-agent.)
The protocol already allowed adding an encrypted form to a cleartext key
already held by the agent, and you might want to do so if, say, the key
happened to originally be added in cleartext-only form but you want to
be able to forget that with 'pageant -R' in future.
I've merged the two previous functions, with different return types
per SSH version, into a single one that returns the containing
PageantKey instead of pulling out one of its internal fields.
This actually fixes a bug, though it would only have come up in the
Unix Pageant debugging mode: an encrypted-only key would have
terminated the key list in the diagnostic messages, because
pageant_nth_ssh2_key would have returned pk->skey which was NULL. Now
it returns pk itself, which isn't.
We're no longer calling pageant_nth_ssh*_key or pageant_add_ssh*_key
from outside pageant.c. Remove them from pageant.h and turn them
static, so that we carry on not doing so.
The GUI loop that responded to the 'Remove Key' button in the key list
worked by actually trying to retrieve a pointer to the ssh_key for a
stored key, and then passing that back to the delete function. But
when a key is encrypted, that pointer is NULL, so we segfaulted.
Fixed by changing pageant_delete_ssh2_key() to take a numeric index in
the list instead of a key pointer.
Now Windows Pageant has two clearly distinct dialog boxes for
requesting a key passphrase: one to use synchronously when the user
has just used the 'Add Key' GUI action, and one to use asynchronously
in response to an agent client's attempt to use a key that was loaded
encrypted.
Also fixed the wording in the asynchronous box: there were two copies
of the 'enter passphrase' instruction, one from the dialog definition
in pageant.rc file and one from the cross-platform pageant.c. Now
pageant.c doesn't format a whole user-facing message any more: it
leaves that to the platform front end to do it the way it wants.
I've also added a call to SetForegroundWindow, to try to get the
passphrase prompt into the foreground. In my experience this doesn't
actually get it the keyboard focus, which I think is deliberate on
Windows's part and there's nothing I can do about it. But at least the
user should _see_ that the prompt is there, so they can focus it
themself.
Now they have '(encrypted)' or '(re-encryptable)' after them, the same
as Unix Pageant.
Mostly this just involved tinkering with the code in winpgnt.c that
makes up the entry to put in the list box. But I also had to sprinkle
a few more calls to keylist_update() into the cross-platform
pageant.c, to make sure that the key list window is proactively
updated whenever a key is decrypted, re-encrypted, or loaded in
encrypted-only form.
The callback-function API in pageant.h for key enumeration is modified
so that we pass an array of all the available fingerprints for each
key.
In Unix Pageant, that's used by the -l option to print whichever
fingerprint the user asked for. (Unfortunately, the option name -E is
already taken, so for the moment I've called it --fptype. I may
revisit that later.)
Also, when matching a key by fingerprint, we're prepared to match
against any fingerprint type we know, with disambiguating prefixes if
necessary (e.g. you can match "md5🆎12" or "sha256:Ab12". That has
to be done a bit carefully, because we match MD5 hex fingerprints
case-insensitively, but SHA256 fingerprints are case-sensitive.
There's a new enumeration of fingerprint types, and you tell
ssh2_fingerprint() or ssh2_fingerprint_blob() which of them to use.
So far, this is only implemented behind the scenes, and exposed for
testcrypt to test. All the call sites of ssh2_fingerprint pass a fixed
default fptype, which is still set to the old MD5. That will change
shortly.
The callback function to pageant_enum_keys now takes a flags
parameter, which receives the flags word from the extended key list
request, if available. (If not, then the flags word is passed as
zero.)
The only callback that uses this parameter is the one for printing
text output from 'pageant -l', which uses it to print a suffix on each
line, indicating whether the key is stored encrypted only (so it will
need a passphrase on next use), or whether it's stored both encrypted
_and_ unencrypted (so that 'pageant -R' will be able to return it to
the former state).
Now, if you have a given key stored encrypted in your agent and you
say 'pageant -a [same key]' (without -E), Pageant will notice (via the
new extended key list request) that the key is currently encrypted in
the agent, and that you're trying to add it unencrypted. In this
situation it won't abort the attempt, and will try to add the key
anyway, so that it becomes decrypted in your agent.
Now, if you send SSH2_AGENTC_ADD_IDENTITY with a cleartext private key
blob, and the agent already contains an encrypted-only version of the
same key, it will drop the cleartext version in alongside it,
effectively decrypting the key as if the passphrase had been typed.
This is an extended version of SSH2_AGENTC_REQUEST_IDENTITIES, which
augments each entry in the returned key list with an extra field
containing additional data about the key.
The initial contents of that extra field are a pair of flags
indicating whether the key is currently stored in the agent encrypted,
decrypted or both.
The idea is that this will permit a Pageant-aware client to make
decisions based on that. For a start, the output key list can mention
it to the user; also, if you try to add a key unencrypted when it's
already present, we can discriminate based on whether it's already
present _unencrypted_
Somehow it had acquired a lot of internal 2-space indentation, which
is out of step with the rest of this code base's style. Before I get
into making more changes in here, let's clean it up.
This is a sweeping change applied across the whole code base by a spot
of Emacs Lisp. Now, everywhere I declare a vtable filled with function
pointers (and the occasional const data member), all the members of
the vtable structure are initialised by name using the '.fieldname =
value' syntax introduced in C99.
We were already using this syntax for a handful of things in the new
key-generation progress report system, so it's not new to the code
base as a whole.
The advantage is that now, when a vtable only declares a subset of the
available fields, I can initialise the rest to NULL or zero just by
leaving them out. This is most dramatic in a couple of the outlying
vtables in things like psocks (which has a ConnectionLayerVtable
containing only one non-NULL method), but less dramatically, it means
that the new 'flags' field in BackendVtable can be completely left out
of every backend definition except for the SUPDUP one which defines it
to a nonzero value. Similarly, the test_for_upstream method only used
by SSH doesn't have to be mentioned in the rest of the backends;
network Plugs for listening sockets don't have to explicitly null out
'receive' and 'sent', and vice versa for 'accepting', and so on.
While I'm at it, I've normalised the declarations so they don't use
the unnecessarily verbose 'struct' keyword. Also a handful of them
weren't const; now they are.
This enables Pageant to act as a client for OpenSSH's agent, which
nowadays refuses to respond to SSH1_AGENTC_REQUEST_RSA_IDENTITIES, or
any other SSH1_AGENTC_* message. It now treats SSH_AGENT_FAILURE in
response to either 'list identities' request the same as successfully
receiving an empty list.
Sometimes, within a switch statement, you want to declare local
variables specific to the handler for one particular case. Until now
I've mostly been writing this in the form
switch (discriminant) {
case SIMPLE:
do stuff;
break;
case COMPLICATED:
{
declare variables;
do stuff;
}
break;
}
which is ugly because the two pieces of essentially similar code
appear at different indent levels, and also inconvenient because you
have less horizontal space available to write the complicated case
handler in - particuarly undesirable because _complicated_ case
handlers are the ones most likely to need all the space they can get!
After encountering a rather nicer idiom in the LLVM source code, and
after a bit of hackery this morning figuring out how to persuade
Emacs's auto-indent to do what I wanted with it, I've decided to move
to an idiom in which the open brace comes right after the case
statement, and the code within it is indented the same as it would
have been without the brace. Then the whole case handler (including
the break) lives inside those braces, and you get something that looks
more like this:
switch (discriminant) {
case SIMPLE:
do stuff;
break;
case COMPLICATED: {
declare variables;
do stuff;
break;
}
}
This commit is a big-bang change that reformats all the complicated
case handlers I could find into the new layout. This is particularly
nice in the Pageant main function, in which almost _every_ case
handler had a bundle of variables and was long and complicated. (In
fact that's what motivated me to get round to this.) Some of the
innermost parts of the terminal escape-sequence handling are also
breathing a bit easier now the horizontal pressure on them is
relieved.
(Also, in a few cases, I was able to remove the extra braces
completely, because the only variable local to the case handler was a
loop variable which our new C99 policy allows me to move into the
initialiser clause of its for statement.)
Viewed with whitespace ignored, this is not too disruptive a change.
Downstream patches that conflict with it may need to be reapplied
using --ignore-whitespace or similar.
The reencrypt-all request is unusual in its ability to be _partially_
successful. To handle this I've introduced a new return status,
PAGEANT_ACTION_WARNING. At the moment, users of this client code don't
expect it to appear on any request, and I'll make them watch for it
only in the case where I know a particular function can generate it.
These requests parallel 'delete key' and 'delete all keys', but they
work on keys which you originally uploaded in encrypted form: they
cause Pageant to delete only the _decrypted_ form of the key, so that
the next attempt to use the key will need to re-prompt for its
passphrase.
We set it when we started prompting for a passphrase, and never unset
it again when the passphrase prompt either succeeded or failed. Until
now it hasn't mattered, because the only use of the flag is to
suppress duplicate prompts, and once a key has been decrypted, we
never need to prompt for it again, duplicate or otherwise. But that's
about to change, so now this bug needs fixing.
There was a lot of ugly, repetitive, error-prone code that decoded
agent responses in raw data buffers. Now my internal client query
function is returning something that works as a BinarySource, so we
can decode agent responses using the marshal.h system like any other
SSH-formatted message in this code base.
While I'm at it, I've centralised more of the parsing of key lists
(saving repetition in pageant_add_key and pageant_enum_keys),
including merging most of the logic between SSH-1 and SSH-2. The old
functions pageant_get_keylist1 and pageant_get_keylist2 aren't exposed
in pageant.h any more, because they no longer exist in that form, and
also because nothing was using them anyway. (Windows Pageant was using
the separate pageant_nth_ssh2_key() functions that talk directly to
the core, and Unix Pageant was using the more cooked client function
pageant_enum_keys.)
Apparently a handful of calls to that particular function managed to
miss my big-bang conversion to using bool where appropriate, and were
still being called with constants 0 and 1.
A PageantSignOp for a not-yet-decrypted key was being linked on to its
key's blocked_requests queue twice, mangling the linked list integrity
and causing segfaults. Now we take care to NULL out the pointers
within the signop to indicate that it isn't currently on the queue,
and check whether it's currently linked before linking or unlinking it.
Reading draft-miller-ssh-agent-04 more carefully, I see that I missed
a few things from the extension-message spec. Firstly, there's an
extension request "query" which is supposed to list all the extensions
you support. Secondly, if you recognise an extension-request name but
are then unable to fulfill the request for some other reason, you're
supposed to return a new kind of failure message that's distinct from
SSH_AGENT_FAILURE, because for extensions, the latter is reserved for
"I don't even know what this extension name means at all".
I've fixed both of those bugs in Pageant by making a centralised map
of known extension names to an enumeration of internal ids, and an
array containing the name for each id. So we can reliably answer the
"query" extension by iterating over that array, and also use the same
array to recognise known extensions up front and give them centralised
processing (in particular, resetting the failure-message type) before
switching on the particular extension index.
This reads data from standard input, turns it into an SSH-2 sign
request, and writes the resulting signature blob to standard output.
I don't really anticipate many uses for this other than testing. But
it _is_ convenient for testing changes to Pageant itself: it lets me
ask for a signature without first having to construct a pointless SSH
session that will accept the relevant key.
This applies to both server modes ('pageant -E key.ppk [lifetime]')
and client mode ('pageant -a -E key.ppk').
I'm not completely confident that the CLI syntax is actually right
yet, but for the moment, it's enough that it _exists_. Now I don't
have to test the encrypted-key loading via manually mocked-up agent
requests.
Until now, all the functions that have to work in both the Pageant
server and a separate client process have been implemented by having
two code paths for every request, one of which marshals an agent
request and passes it to agent_query_synchronous, and the other just
calls one of the internal functions in the Pageant core.
This is already quite ugly, and it'll only get worse when I start
adding more client requests. So here's a simplification: now, there's
only one code path, and we _always_ marshal a wire-format agent
request. When we're the same process as the Pageant server, we pass it
to the actual message handler and let that decode it again, enforcing
by assertion that it's not an asynchronous operation that's going to
delay.
This patch removes a layer of indentation from many functions in the
Pageant client layer, so it's best viewed with whitespace ignored.