We already have the ability to start a subprocess and hook it up to a
Socket, for running local proxy commands. Now the same facility is
available as an auxiliary feature, so that a backend can start another
subcommand for a different purpose, and make a separate Socket to
communicate with it.
Just like the local proxy system, this facility captures the
subprocess's stderr, and passes it back to the caller via plug_log. To
make that not look silly, I had to add a system where the "proxy:"
prefix on the usual plug_log messages is reconfigurable, and when you
call platform_start_subprocess(), you get to pass the prefix you want
to use in this case.
I'm going to want to use it in an upcoming commit, because together
with 'savedhost', it forms the identification of an SSH server (at
least as far as the host key cache is concerned, and therefore it's
appropriate for other uses too).
We were already passing the hostname through for use in user-facing
prompts (not to mention the FQDN version for use in GSSAPI).
I was just about to add another ordinary edit box control, and found I
couldn't remember what went in the context2 argument to conf_editbox.
When I looked it up, I realised it was one of those horrid integer
encodings of the form '1 means this, -1 means that, less than -1 means
some parametrised property where the parameter is obtained by negating
the encoded integer'.
Those are always awkward to remember, and worse to extend. So I've
replaced the integer context2 used with conf_editbox_handler with a
pointer to a small struct type in which the types and parameters have
sensible names and are documented.
(To avoid annoying const warnings everywhere, this also meant
extending the 'intorptr' union to have a const void * branch as well
as a 'void *'. Surprised I haven't needed that before. But if I
introduce any more of these parameter structures, it'll come in useful
again.)
I made a specific subdirectory 'stubs' to keep all the link-time stub
modules in, like notiming.c. And I put _one_ run-time stub in it,
namely nullplug.c. But the rest of the runtime stubs went into utils.
I think it's better to keep all the stubs together, so I've moved all
the null*.c in utils into stubs (with the exception of nullstrcmp.c,
which means the 'null' in a different sense). Also, fiddled with the
naming to be a bit more consistent, and stated in the new CMakeLists
the naming policy that distinguishes no-*.c from null-*.c.
Like "dh-gex-sha1", this string used in session storage really covers
both SHA-256 and SHA-1 variants (since a624786333), with the former
preferred; but backward-compatibility makes it fiddly to change (and
it's mostly not visible to users).
We've occasionally had reports of SSH servers disconnecting as soon as
they receive PuTTY's KEXINIT. I think all such reports have involved
the kind of simple ROM-based SSH server software you find in small
embedded devices.
I've never been able to prove it, but I've always suspected that one
possible cause of this is simply that PuTTY's KEXINIT is _too long_,
either in number of algorithms listed or in total length (especially
given all the ones that end in @very.long.domain.name suffixes).
If I'm right about either of those being the cause, then it's just
become even more likely to happen, because of all the extra
Diffie-Hellman groups and GSSAPI algorithms we just threw into our
already-long list in the previous few commits.
A workaround I've had in mind for ages is to wait for the server's
KEXINIT, and then filter our own down to just the algorithms the
server also mentioned. Then our KEXINIT is no longer than that of the
server, and hence, presumably fits in whatever buffer it has. So I've
implemented that workaround, in anticipation of it being needed in the
near future.
(Well ... it's not _quite_ true that our KEXINIT is at most the same
length as the server. In fact I had to leave in one KEXINIT item that
won't match anything in the server's list, namely "ext-info-c" which
gates access to SHA-2 based RSA. So if we turn out to support
absolutely everything on all the server's lists, then our KEXINIT
would be a few bytes longer than the server's, even with this
workaround. But that would only cause trouble if the server's outgoing
KEXINIT was skating very close to whatever buffer size it has for the
incoming one, and I'm guessing that's not very likely.)
((Another possible cause of this kind of disconnection would be a
server that simply objects to seeing any KEXINIT string it doesn't
know how to speak. But _surely_ no such server would have survived
initial testing against any full-featured client at all!))
This is surprisingly simple, because it wasn't necessary to touch the
GSS parts at all. Nothing changes about the message formats between
integer DH and ECDH in GSS KEX, except that the mpints sent back and
forth as part of integer DH are replaced by the opaque strings used in
ECDH. So I've invented a new KEXTYPE and made it control a bunch of
small conditionals in the middle of the GSS KEX code, leaving the rest
unchanged.
None of the constructors can fail and return NULL. I think this must
have come in with a lot of other unnecessary null-pointer checks when
ECC support was first added. I've got rid of most of them since then,
but that one apparently escaped my notice.
Introduced recently by commit 42740a5455, in which I decided to
call ssh_key_cache_str() even on certified host keys. But that call
was conditional on s->hkey being non-NULL (which happens in GSS KEX)
as well as on it not being certified, and I managed to absentmindedly
remove _both_ conditions. As a result we got a null-pointer
dereference on any GSS kex.
These were introduced in 34d01e1b65 to pretty-print certificate validity
ranges. But Microsoft's C runtime took a while to catch up with C99 --
stackoverflow claims that VS2013 and earlier don't support these
specifiers -- so it's possible to end up with PuTTY executables that
misdisplay these dates. Also, the mingw-w64 toolchain's -Wformat
complains about these specifiers, at least on Debian buster, presumably
for the same reason.
Since the specifiers in question have exact pre-C99 replacements, it
seems easiest just to use those.
This was lost in the mkfiles.pl->cmake transition (c19e7215dd).
Without this, MinGW builds were providing format strings like %zu to a
version of vsnprintf that didn't support them at runtime, so you'd get
messages like "Pageant has zu SSH-2 keys". (-Wformat would have
complained about the unknown %z format specifier, but even STRICT MinGW
builds don't get those warnings, hm.)
Now the runtime version understands %zu.
I've reviewed the other compile-time definitions that were unique to the
old Makefile.mgw, and decided not to reinstate any of them:
WIN32S_COMPAT: leave it out.
This came in in bd4b8c1285. Rationale from Joris van Rantwijk in email
2000-01-24: "Use -DWIN32S_COMPAT to avoid a linking error about
SystemPowerStatus".
But that problem was solved another way within 8 months, and
WIN32S_COMPAT removed from the code, in 76746a7d61, so this wart had
been redundant since then.
_NO_OLDNAMES: decided not to add anything back for this.
This actually does nothing with the mingw-w64 fork (which seems to spell
it NO_OLDNAMES), although current versions of original-mingw do also
still spell it _NO_OLDNAMES.
They both seem to be about suppressing a behaviour where a load of
"non-ANSI" names like strdup get redirected to invoke _strdup in MS'
libraries.
Again, original rationale is from Joris van Rantwijk: "Compile and link
with -mno-cygwin (and -D_NO_OLDNAMES) to get executables that don't need
the Cygwin DLL file."
Since I don't know of any behavioural differences that this causes
(unlike vsnprintf/_vsnprintf), and it's not obviously causing trouble
for me, continue to leave things in the default state.
I ran 'ls /usr/share/doc' in a psusan session the other day and the
output hung part way through the long directory listing. This turned
out to be because the ssh-connection channel window had run out
temporarily, and PuTTY had sent a WINDOW_ADJUST extending it again,
which the connection layer acted on by calling chan_set_input_wanted
... which sesschan was ignoring with a comment saying /* I don't think
we need to do anything here */.
Well, turns out we do need to. Implemented the simplest possible
unblocking action.
I only recently found out that OpenSSH defined their own protocol IDs
for AES-GCM, defined to work the same as the standard ones except that
they fixed the semantics for how you select the linked cipher+MAC pair
during key exchange.
(RFC 5647 defines protocol ids for AES-GCM in both the cipher and MAC
namespaces, and requires that you MUST select both or neither - but
this contradicts the selection policy set out in the base SSH RFCs,
and there's no discussion of how you resolve a conflict between them!
OpenSSH's answer is to do it the same way ChaCha20-Poly1305 works,
because that will ensure the two suites don't fight.)
People do occasionally ask us for this linked cipher/MAC pair, and now
I know it's actually feasible, I've implemented it, including a pair
of vector implementations for x86 and Arm using their respective
architecture extensions for multiplying polynomials over GF(2).
Unlike ChaCha20-Poly1305, I've kept the cipher and MAC implementations
in separate objects, with an arm's-length link between them that the
MAC uses when it needs to encrypt single cipher blocks to use as the
inputs to the MAC algorithm. That enables the cipher and the MAC to be
independently selected from their hardware-accelerated versions, just
in case someone runs on a system that has polynomial multiplication
instructions but not AES acceleration, or vice versa.
There's a fourth implementation of the GCM MAC, which is a pure
software implementation of the same algorithm used in the vectorised
versions. It's too slow to use live, but I've kept it in the code for
future testing needs, and because it's a convenient place to dump my
design comments.
The vectorised implementations are fairly crude as far as optimisation
goes. I'm sure serious x86 _or_ Arm optimisation engineers would look
at them and laugh. But GCM is a fast MAC compared to HMAC-SHA-256
(indeed compared to HMAC-anything-at-all), so it should at least be
good enough to use. And we've got a working version with some tests
now, so if someone else wants to improve them, they can.
I booted my M1 Mac into macOS rather than Asahi for the first time in
a while, and discovered that an OS update seems to have added some
sysctl flags indicating the presence of the CPU extensions that I
previously knew of no way to check for! Added them checks to
arm_arch_queries.c, though I've also retained backwards compat with
the previous OS version which didn't have them at all.
This provides a convenient hook to be called between SSH messages, for
the crypto components to do any per-message processing like
incrementing a sequence number.
Not sure how I missed this! I tested ChaCha20, but not the MAC that
goes with it. Happily, it passes, so no harm done.
This also involved adding a general framework for testing MACs that
are tied to a specific cipher: we have to allocate, key and IV the
cipher before attempting to use the MAC, and free it all afterwards.
Instead of having separate subsidiary list macros for all the AES-NI
or NEON accelerated ciphers, the main list macro now contains each
individual thing conditionalised under an IF_FOO macro defined at the
top.
Makes relatively little difference in the current state of things, but
it will make it easier to do lots of differently conditionalised
single entries in a list, which will be coming up shortly.
This causes cmake to stop whinging that there isn't one. More
usefully, by specifying the LANGUAGES keyword as just C (rather than
the default of both C and CXX), the cmake configure step is sped up by
not having to faff about finding a C++ compiler.
The length test was pasted from the ordinary decrypt function, when it
should have been pasted from encrypt_length (which got this right).
I've never tried to test those functions before, so I never noticed.
In the situation where a MAC and cipher implementation are tied
together by being facets of the same underlying object (used by the
inseparable ChaCha20 + Poly1305 pair), previously we freed them by
having the cipher_free function actually do the freeing, having the
mac_free function do nothing, and taking great care to call those in
the right order. (Otherwise, the mac_free function dereferences a
no-longer-valid vtable pointer and doesn't get as far as _finding out_
that it doesn't have to do anything.)
That's a time bomb in general, and especially awkward in situations
like testcrypt where we don't get precise control over freeing order
in any case. So I've replaced that system with one in which there are
two flags in the ChaCha20-Poly1305 structure, saying whether each of
the cipher and MAC facets is currently considered to be allocated.
When the last of those flags is cleared, the object is actually freed.
So now they can be freed in either order.
These were indented 2 spaces _further_ than the surrounding code,
instead of 2 spaces less. My bulk-reindentation the other day didn't
detect them because apparently my Emacs configuration can make this
mistake all by itself, so it thought they were right!
We're now setting the help context centrally in ssh/common.c - but I
forgot to remove the _old_ assignment statements, which overwrite
whatever that asks for. Oops.
Previously, we had a single data structure 'keytree' containing
records each involving a public and private key (the latter maybe in
clear, or as an encrypted key file, or both). Now, we have separate
'pubkeytree' and 'privkeytree', the former storing public keys indexed
by their full public blob (including certificate, if any), and the
latter storing private keys, indexed by the _base_ public blob
only (i.e. with no certificate included).
The effect of this is that deferred decryption interacts more sensibly
with certificates. Now, if you load certified and uncertified versions
of the same key into Pageant, or two or more differently certified
versions, then the separate public key records will all share the same
private key record, and hence, a single state of decryption. So the
first time you enter a passphrase that unlocks that private key, it
will unlock it for all public keys that share the same private half.
Conversely, re-encrypting any one of them will cause all of them to
become re-encrypted, eliminating the risk that you deliberately
re-encrypt a key you really care about and forget that another equally
valuble copy of it is still in clear.
The most subtle part of this turned out to be the question of what key
comment you present in a deferred decryption prompt. It's very
tempting to imagine that it should be the comment that goes with
whichever _public_ key was involved in the signing request that
triggered the prompt. But in fact, it _must_ be the comment that goes
with whichever version of the encrypted key file is stored in Pageant
- because what if the user chose different passphrases for their
uncertified and certified PPKs? Then the decryption prompt will have
to indicate which passphrase they should be typing, so it's vital to
present the comment that goes with the _file we're decrypting_.
(Of course, if the user has selected different passphrases for those
two PPKs but the _same_ comment, they're still going to end up
confused. But at least once they realise they've done that, they have
a workaround.)
OpenSSH, when called on to give the fingerprint of a certified public
key, will in many circumstances generate the hash of the public blob
of the _underlying_ key, rather than the hash of the full certificate.
I think the hash of the certificate is also potentially useful (if
nothing else, it provides a way to tell apart multiple certificates on
the same key). But I can also see that it's useful to be able to
recognise a key as the same one 'really' (since all certificates on
the same key share a private key, so they're unavoidably related).
So I've dealt with this by introducing an extra pair of fingerprint
types, giving the cross product of {MD5, SHA-256} x {base key only,
full certificate}. You can manually select which one you want to see
in some circumstances (notably PuTTYgen), and in others (such as
diagnostics) both fingerprints will be emitted side by side via the
new functions ssh2_double_fingerprint[_blob].
The default, following OpenSSH, is to just fingerprint the base key.
This patch fixes a few other whitespace and formatting issues which
were pointed out by the bulk-reindent or which I spotted in passing,
some involving manual editing to break lines more nicely.
I think the weirdest hunk in here is the one in windows/window.c
TranslateKey() where _half_ of an assignment statement inside an 'if'
was on the same line as the trailing paren of the if condition. No
idea at all how that one managed to happen!
I think a lot of these were inserted by a prior run through GNU indent
many years ago. I noticed in a more recent experiment that that tool
doesn't always correctly distinguish which instances of 'id * id' are
pointer variable declarations and which are multiplications, so it
spaces some of the former as if they were the latter.
If the function name (or expression) in a function call or declaration
is itself so long that even the first argument doesn't fit after it on
the same line, or if that would leave so little space that it would be
silly to try to wrap all the run-on lines into a tall thin column,
then I used to do this
ludicrously_long_function_name
(arg1, arg2, arg3);
and now prefer this
ludicrously_long_function_name(
arg1, arg2, arg3);
I picked up the habit from Python, where the latter idiom is required
by Python's syntactic significance of newlines (you can write the
former if you use a backslash-continuation, but pretty much everyone
seems to agree that that's much uglier). But I've found it works well
in C as well: it makes it more obvious that the previous line is
incomplete, it gives you a tiny bit more space to wrap the following
lines into (the old idiom indents the _third_ line one space beyond
the second), and I generally turn out to agree with the knock-on
indentation decisions made by at least Emacs if you do it in the
middle of a complex expression. Plus, of course, using the _same_
idiom between C and Python means less state-switching.
So, while I'm making annoying indentation changes in general, this
seems like a good time to dig out all the cases of the old idiom in
this code, and switch them over to the new.
My aim has always been to have those back-dented by 2 spaces (half an
indent level) compared to the statements around them, so that in
particular switch statements have distinct alignment for the
statement, the cases and the interior code without consuming two whole
indent levels.
This patch sweeps up all the violations of that principle found by my
bulk-reindentation exercise.
My bulk indentation check also turned up a lot of cases where a run-on
function call or if statement didn't have its later lines aligned
correctly relative to the open paren.
I think this is quite easy to do by getting things out of
sync (editing the first line of the function call and forgetting to
update the rest, perhaps even because you never _saw_ the rest during
a search-replace). But a few didn't quite fit into that pattern, in
particular an outright misleading case in unix/askpass.c where the
second line of a call was aligned neatly below the _wrong_ one of the
open parens on the opening line.
Restored as many alignments as I could easily find.
In several pieces of development recently I've run across the
occasional code block in the middle of a function which suddenly
switched to 2-space indent from this code base's usual 4. I decided I
was tired of it, so I ran the whole code base through a re-indenter,
which made a huge mess, and then manually sifted out the changes that
actually made sense from that pass.
Indeed, this caught quite a few large sections with 2-space indent
level, a couple with 8, and a handful of even weirder things like 3
spaces or 12. This commit fixes them all.
The fixed tab stops that we used to use in the old LBS_HASSTRINGS list
box, and that I carefully replicated in the new owner-drawn version,
are no more! Now, every time we refresh the key list, we actually
_measure_ the maximum size of string that needs to fit into each
column, and size the columns based on that.
Now I don't have to worry any more about whether the set of algorithm
names might one day overflow the fixed column width, or whether a
particularly unlucky choice of key with lots of wide letters like M
and W in its base64-encoded SHA256 hash might do the same.
Also, the previous column sizes were pessimistic (for reason of
exactly that worry), so this change generally moves things over
towards the left of the list box - which means there's now room for
longer key comments, and more chance of the suffixes '(encrypted)' or
'(re-encryptable)' being visible on the right.
The test in the Pageant list box code for whether we should display
the bit count of a key was done by checking specifically for ssh_rsa
or ssh_dsa, which of course meant that it didn't catch the certified
versions of those keys.
Now there's yet another footling ssh_keyalg method that asks the
question 'is it worth displaying the bit count?', to which RSA and DSA
answer yes, and the opensshcert family delegates to its base key type,
so that RSA and DSA certified keys also answer yes.
(This isn't the same as ssh_key_public_bits(alg, blob) >= 0. All
supported public key algorithms _can_ display a bit count if called
on. But only in RSA and DSA is it configurable, and therefore worth
bothering to print in the list box.)
Also in this commit, I've fixed a bug in the certificate
implementation of public_bits, which was passing a wrongly formatted
public blob to the underlying key. (Done by factoring out the code
from opensshcert_new_shared which constructed the _correct_ public
blob, and reusing it in public_bits to do the same job.)
If you load a certified key into Windows Pageant, the official SSH id
for the key type is so long that it overflows its space in the list
box and overlaps the key fingerprint hash.
This commit introduces yet another footling little ssh_keyalg method
which returns a shorter human-readable description of the key type,
and uses that in the Windows Pageant list box only.
(Not in the Unix Pageant list, though, because being output to stdout,
that seems like something people are more likely to want to
machine-read, which firstly means we shouldn't change it lightly, and
secondly, if we did change it we'd want to avoid having a variable
number of spaces in the replacement key type text.)
The main key list control in the Pageant window was previously an
ordinary LBS_HASSTRINGS list box, with tab characters aligning the
various parts of the key information into different columns. This was
fragile because any mistake in the font metrics could have overflowed
a tab stop and forced the text to move on to the next one.
Now I've switched the list box into LBS_OWNERDRAWFIXED mode, which
means that in place of a string for each list item I store a struct of
my choice, and I have to draw the list-box entries myself by
responding to WM_DRAWITEM. So now I'm drawing each component of the
key information as a separate call to ExtTextOut (plus one
TabbedTextOut to put the '(encrypted)' suffix on the end), which means
that the tab stops are now guaranteed to appear where I tell them to.
No functional change, for the moment: this is pure refactoring. As
closely as I can tell, the appearance of the list box is
pixel-for-pixel what it was before this commit. But it opens the door
for two further improvements (neither one done in this commit): I can
dynamically choose the tab stop locations based on querying the text
metrics of the strings that will actually need to fit in the columns,
and also, whatever reorganisation I need to do to make certificates
fit sensibly in this list box can now be done without worrying about
breaking anything terribly fragile.
The main list box in the Pageant key list window was identified by a
numeric control id, even though pageant-rc.h has a nice meaningful
macro name for it (and pageant.c uses that).
When PuTTYgen is holding a certified key, I don't think there's any
sensible use for pasting around the full public key in authorized_keys
format, because the whole point is that what you put in
authorized_keys is 'please trust this CA' rather than the specific
key. So instead I've reused the space in the dialog box to indicate
that it's a certificate, and provide a 'more info' sub-dialog.
I'm about to want setupbigedit1 and setupbigedit2 to know the control
ids themselves, and also add more controls to the enum, and it keeps
the diffs more legible if I move the entire enum around unchanged
_first_ and then start making small changes in the middle of it.
The recently added SeatDialogText type was just what I needed to add a
method to the ssh_key vtable for dumping certificate information in a
human-readable format. It will be good for displaying in a Windows
dialog box as well as in cmdgen's text format.
This commit introduces and implements the new method, and adds a
--cert-info mode to command-line Unix PuTTYgen that uses it. The
Windows side will follow shortly.
I must have dashed off that branch of the key reading function without
ever testing it, or I'd have noticed by now that it was looking for
the wrong string to terminate the file. Ahem.
Most of the Windows-specific dialog control construction functions
were passing their string parameters as 'char *' even though they were
string literals. Apparently none of our previous giant constification
patches spotted that.
A user reports that if the Print Spooler service is disabled via
services.msc, then PuTTY can report 'Out of memory!' when you try to
open the Terminal config pane, which is the one containing the combo
box enumerating the available printers.
Apparently this is because the call to EnumPrinters failed with the
error code other than the expected ERROR_INSUFFICIENT_BUFFER, and in
the process, left garbage in the pcbNeeded output parameter. That
wouldn't be too surprising if it had simply _not written_ to that
parameter and therefore it was never initialised at all in the calling
function printer_add_enum. But in fact, printer_add_enum *does*
precautionarily initialise needed=0 before the initial call to
EnumPrinters. So EnumPrinters must have actively written one of its
*own* uninitialised variables into it!
Anyway, the obvious fix is to distinguish ERROR_INSUFFICIENT_BUFFER
from any other kind of EnumPrinters failure (in fact turning off Print
Spooler seems to lead to RPC_S_SERVER_UNAVAILABLE), and not attempt to
proceed in the case of other failures.
In commit 694d5184b7, I introduced the 'just_button' flag for
CTRL_FILESELECT controls, and in commit ddcd93ab12 I added a use
of such a control with the flag set to true.
But I forgot to set it to false everywhere else, which caused an
assertion failure when selecting the Bell pane in Windows PuTTY. Oops.