Just spotted this in eyeball review: we're about to construct our new
outgoing KEXINIT and write it into the strbuf s->outgoing_kexinit. So
we should clear that strbuf first. But in fact we were clearing
s->client_kexinit, which aliases s->outgoing_kexinit in an SSH client,
but in a server, aliases s->incoming_kexinit.
This was harmless in PuTTY (since the strbuf we cleared was the right
one anyway). And it was harmless in Uppity's initial kex (since the
strbuf we _meant_ to clear was empty anyway). But if Uppity had ever
initiated a rekey, this would have exploded messily.
I only realised this bug while writing up the feature for the
wishlist:
It's one thing _at connection startup_ to delay sending your KEXINIT
until the server has sent its: the server is very likely to send it
anyway (unless it's attempting the same workaround against us), so
probably nothing goes wrong.
But if we want to initiate a rekey, we do that _by_ sending a KEXINIT.
In that situation we can't just wait until the server sends one,
because it has no idea it's supposed to be doing so!
Happily, in that situation, we already have a KEXINIT from the server,
left over from the previous key exchange. So we can filter against
that, and still have the intended effect of not spending KEXINIT space
on algorithms the server doesn't know about.
Somehow I missed that Coverity reported that complaint about a
(theoretically) uninitialised pointer twice, against the two
platforms' console.c files. Now fixed the same way in the other one.
It complained in console_confirm_ssh_host_key that if the caller
passed in a SeatDialogText containing no SDT_PROMPT record, then
'prompt' would be uninitialised.
The answer is "don't do that, then", but fair enough that Coverity
didn't know that. Added an assertion, which should keep it happy, and
also cause better error handling if we ever _do_ make that mistake.
Coverity complained in keylist_update_callback that in one if
statement I was allowing for the possibility that alg == NULL, and in
the next, I was assuming it would always be non-null.
Right now I'm not actually convinced that _either_ check is necessary
- it would make sense in an agent _client_, where you might be talking
to an agent that knows key algorithms you don't, but this is the GUI
built into Pageant itself, so any key it can store internally ought to
have a known algorithm name.
Still, this fix is certainly _correct_ even if not optimal, and it'll
do for now.
Coverity spotted me copying an uninitialised variable into it, which
made me wonder how I hadn't noticed. The answer is that nothing
actually _uses_ that variable - it's written, but never read. I must
have put it in during development, thinking I was going to need it for
something, and then didn't end up using it after all.
Coverity points out that if we refer to cp_list[codepage - 65536], we
ought to have ensured that codepage - 65536 was _less_ than
lenof(cp_list), not just less or equal.
The protocol selector widgets were misaligned in GTK as well as on
Windows, but for a completely different reason. (I guess both bugs
must have been introduced at the same time when I reworked the system
to tolerate more than two aligned widgets in commit b5ab90143a2df7f.)
To vertically align N widgets, you have to first figure out what range
of y-coordinates they jointly occupy, and then centre each one within
that range. We were trying to do both jobs in the same pass, which
meant trying to place the first widget before finding out where the
last one will be. To do this, we were separately computing the
y-range's start and width, the former by taking max of the
y-coordinates _seen so far_, and the latter by taking max of _all_ the
widgets' heights.
This has two problems. One is that if you later find out that the
y-coordinate of the top of the range needs to be lower than you'd
previously realised, it's too late to go back and reposition the
widgets you've already placed. But that's a theoretical issue that
would only come up with more complicated column layouts than we've
actually used. (And probably more complicated than would even be
_sensible_ to use.)
The other, more immediate, problem: the y-coordinates we were using
for already-placed widgets in the set were the ones _after_ we
adjusted each one for vertical centring. So if the first widget is
short and the second taller (say, heights 20 and 30 pixels), then the
first widget will be offset downwards by 5 pixels, but the second
widget will use that offset y-coordinate as the _top_ of the range to
fit itself into, and hence, will also be 5 pixels downward from where
it should have been.
I think only the second of those problems is immediately concerning,
but it's easier to fix both at once. I've removed the y-adjustment for
vertical centring from the main layout loop, and put it in a separate
pass run after the main layout finishes.
We had carefully calculated, for each control in an aligned group, how
much _that control_ should move downwards by. But then, because I
carelessly referred to the wrong variable name, we actually moved the
wrong one - not the control we'd just calculated the offset for, but
always the _last_ one in the group, which was the one the top-level
alignment code was processing at the point we began this loop.
As a result, the dropdown list in the front-page protocol selector was
hilariously misaligned. Now it's back where it should be.
Another of this weekend's warnings pointed out that this function
contained a pattern I now regard as a cardinal sin: variables called
'ret' that aren't clear whether they've _been_ returned from a
subroutine, or whether they're _planned_ to be returned from the
containing function. Worse, psftp_main had both: two of the former
kind shadowing a case of the latter in sub-scopes.
I decided that the 'namemaker' system introduced recently in commit
fbb979aa98 was just too marginal to be sensible, and it's easier
to simply quote the full SSH id for each protocol.
Also, included an empty argument at the end of each macro invocation,
so that the variadic "..." is never completely missing.
My experimental build with clang-cl at -Wall did show up a few things
that are safe enough to fix right now. One was this list of missing
includes, which was causing a lot of -Wmissing-prototype warnings, and
is a real risk because it means the declarations in headers weren't
being type-checked against the actual function definitions.
Happily, no actual mismatches.
I was wondering what this was doing here at all when strbuf_chomp is a
better choice. The answer turned out to be 'nothing' - it wasn't even
used any more.
Now we should get warned if we do anything that breaks the new
stricter MinGW warning level, not to mention anything generating
warnings in the clang-cl builds.
Unfortunately, it gives an absolutely huge number of warnings, and it
wouldn't be feasible to fix them all without risking introducing
further bugs. Perhaps _after_ the next release branch it might be
worth looking at some of them, but I don't think fixing them right now
is viable.
I've left it on for actual gcc, though, since the MinGW build seems OK
with it.
This has been missing since the cmake transition (c19e7215dd) --
mkfiles.pl generally used at least -Wall -Werror -Wvla with GCC-like
compilers. After that, Windows STRICT gained -Wpointer-arith, but lost
-Wall and hence a lot of other warnings (such as the -Wformat I muttered
about in baea34a5b2).
My mingw-w64 build survives this (after my recent warning fixes), and
apparently an official Bob build does too.
My correspondent on the new authentication-plugin feature reports that
their plugin is not reliably receiving the final PLUGIN_AUTH_SUCCESS
message on Windows. I _think_ this is because the whole userauth layer
is being dismissed, leading to sk_close() of the Socket talking to the
plugin, before the data has actually been written to the outgoing
pipe.
This should fix it: track the Socket's backlog, and immediately after
sending that message, wait until we receive a notification that the
backlog has decreased to size 0. That stops us from terminating the
userauth layer until the message has left our process.
Apparently a nasty trick I did in one of the selector vtable macros
was not acceptable to VS, which thinks that "string" ? NULL : NULL is
not a constant expression - it can't tell that the string literal has
a non-null value _or_ that it doesn't matter whether the value is null
or not.
Redone the vtable name construction in a way that depends only on the
actual preprocessor, not on the followup C expression semantics.
Use of thread-local storage on Windows (introduced recently in
69e8d471d1) could cause a -Wattributes warning in mingw-w64 builds,
since that toolchain doesn't understand __declspec(thread).
Define a portability macro THREADLOCAL, which should resolve to
something appropriate for at least:
- MSVC, which understands the Microsoft syntax __declspec(thread);
- GCC (e.g., mingw-w64) which understands the GNU syntax __thread
(GCC only implements __declspec() to the extent of understanding the
arguments 'dllexport' and 'dllimport');
- Clang, which supports both syntaxes.
(It's possible there's some more obscure Windows toolchain which will
now hit the #error as a result of this change.)
I haven't attempted to try to detect and use the C11 syntax
'thread_local'. And this is all still Windows-only, since that's all we
need for now and it avoids opening the can of worms that is TLS on other
platforms.
(I considered delegating all this to cmake, but as well as being fiddly,
it seems even the latest versions of cmake don't know about thread-local
storage for C, as opposed to C++ (cxx_thread_local).)
These show up if you build from the Windows source archive on Unix,
which is an odd thing to be trying to do, but I managed it myself the
other day by accident :-)
.dsp and .dsw files are no longer provided in the source archive (they
went out with mkfiles.pl), so there's no need to include an exception
to treat them as binary files.
On the other hand, the source archives _do_ contain a .chm help file
and a .cur mouse pointer image, which _should_ be treated as binary.
(That was a benign omission: Info-Zip detected by itself that the
files were binary, and didn't mangle them. But it did print an
annoying warning, which this commit fixes.)
While I'm here, add .git to the list of version control subdirectories
to exclude.
If Halibut is not available to build the docs, but on the other hand
pre-built man pages already exist (e.g. because you unpacked a source
zip file with them already provided), then docs/CMakeLists.txt creates
a set of build rules that copy the pre-built man pages from the source
directory to the build directory.
However, if the source and build directories are the _same_, this
creates a set of cyclic dependencies, i.e. files which depend directly
on themselves. Some build tools (in particular 'ninja') will report
this as an error.
In that situation, the simple fix is to leave off the build rules
completely: if the man pages are already where the build will want
them to end up, there need not be any build rule to do anything about
them.
In recent months I've had two requests from different people to build
support into PuTTY for automatically handling complicated third-party
auth protocols layered on top of keyboard-interactive - the kind of
thing where you're asked to enter some auth response, and you have to
refer to some external source like a web server to find out what the
right response _is_, which is a pain to do by hand, so you'd prefer it
to be automated in the SSH client.
That seems like a reasonable thing for an end user to want, but I
didn't think it was a good idea to build support for specific
protocols of that kind directly into PuTTY, where there would no doubt
be an ever-lengthening list, and maintenance needed on all of them.
So instead, in collaboration with one of my correspondents, I've
designed and implemented a protocol to be spoken between PuTTY and a
plugin running as a subprocess. The plugin can opt to handle the
keyboard-interactive authentication loop on behalf of the user, in
which case PuTTY passes on all the INFO_REQUEST packets to it, and
lets it make up responses. It can also ask questions of the user if
necessary.
The protocol spec is provided in a documentation appendix. The entire
configuration for the end user consists of providing a full command
line to use as the subprocess.
In the contrib directory I've provided an example plugin written in
Python. It gives a set of fixed responses suitable for getting through
Uppity's made-up k-i system, because that was a reasonable thing I
already had lying around to test against. But it also provides example
code that someone else could pick up and insert their own live
response-provider into the middle of, assuming they were happy with it
being in Python.
No functional change, but I've pulled the bulk of the k-i setup and
prompting code out of ssh2_userauth_process_queue and into
subroutines, in preparation for wanting to do the same work in more
than one place in the main coroutine's control flow.
We already have the ability to start a subprocess and hook it up to a
Socket, for running local proxy commands. Now the same facility is
available as an auxiliary feature, so that a backend can start another
subcommand for a different purpose, and make a separate Socket to
communicate with it.
Just like the local proxy system, this facility captures the
subprocess's stderr, and passes it back to the caller via plug_log. To
make that not look silly, I had to add a system where the "proxy:"
prefix on the usual plug_log messages is reconfigurable, and when you
call platform_start_subprocess(), you get to pass the prefix you want
to use in this case.
I'm going to want to use it in an upcoming commit, because together
with 'savedhost', it forms the identification of an SSH server (at
least as far as the host key cache is concerned, and therefore it's
appropriate for other uses too).
We were already passing the hostname through for use in user-facing
prompts (not to mention the FQDN version for use in GSSAPI).
I was just about to add another ordinary edit box control, and found I
couldn't remember what went in the context2 argument to conf_editbox.
When I looked it up, I realised it was one of those horrid integer
encodings of the form '1 means this, -1 means that, less than -1 means
some parametrised property where the parameter is obtained by negating
the encoded integer'.
Those are always awkward to remember, and worse to extend. So I've
replaced the integer context2 used with conf_editbox_handler with a
pointer to a small struct type in which the types and parameters have
sensible names and are documented.
(To avoid annoying const warnings everywhere, this also meant
extending the 'intorptr' union to have a const void * branch as well
as a 'void *'. Surprised I haven't needed that before. But if I
introduce any more of these parameter structures, it'll come in useful
again.)
I made a specific subdirectory 'stubs' to keep all the link-time stub
modules in, like notiming.c. And I put _one_ run-time stub in it,
namely nullplug.c. But the rest of the runtime stubs went into utils.
I think it's better to keep all the stubs together, so I've moved all
the null*.c in utils into stubs (with the exception of nullstrcmp.c,
which means the 'null' in a different sense). Also, fiddled with the
naming to be a bit more consistent, and stated in the new CMakeLists
the naming policy that distinguishes no-*.c from null-*.c.
Like "dh-gex-sha1", this string used in session storage really covers
both SHA-256 and SHA-1 variants (since a624786333), with the former
preferred; but backward-compatibility makes it fiddly to change (and
it's mostly not visible to users).
We've occasionally had reports of SSH servers disconnecting as soon as
they receive PuTTY's KEXINIT. I think all such reports have involved
the kind of simple ROM-based SSH server software you find in small
embedded devices.
I've never been able to prove it, but I've always suspected that one
possible cause of this is simply that PuTTY's KEXINIT is _too long_,
either in number of algorithms listed or in total length (especially
given all the ones that end in @very.long.domain.name suffixes).
If I'm right about either of those being the cause, then it's just
become even more likely to happen, because of all the extra
Diffie-Hellman groups and GSSAPI algorithms we just threw into our
already-long list in the previous few commits.
A workaround I've had in mind for ages is to wait for the server's
KEXINIT, and then filter our own down to just the algorithms the
server also mentioned. Then our KEXINIT is no longer than that of the
server, and hence, presumably fits in whatever buffer it has. So I've
implemented that workaround, in anticipation of it being needed in the
near future.
(Well ... it's not _quite_ true that our KEXINIT is at most the same
length as the server. In fact I had to leave in one KEXINIT item that
won't match anything in the server's list, namely "ext-info-c" which
gates access to SHA-2 based RSA. So if we turn out to support
absolutely everything on all the server's lists, then our KEXINIT
would be a few bytes longer than the server's, even with this
workaround. But that would only cause trouble if the server's outgoing
KEXINIT was skating very close to whatever buffer size it has for the
incoming one, and I'm guessing that's not very likely.)
((Another possible cause of this kind of disconnection would be a
server that simply objects to seeing any KEXINIT string it doesn't
know how to speak. But _surely_ no such server would have survived
initial testing against any full-featured client at all!))
This is surprisingly simple, because it wasn't necessary to touch the
GSS parts at all. Nothing changes about the message formats between
integer DH and ECDH in GSS KEX, except that the mpints sent back and
forth as part of integer DH are replaced by the opaque strings used in
ECDH. So I've invented a new KEXTYPE and made it control a bunch of
small conditionals in the middle of the GSS KEX code, leaving the rest
unchanged.
None of the constructors can fail and return NULL. I think this must
have come in with a lot of other unnecessary null-pointer checks when
ECC support was first added. I've got rid of most of them since then,
but that one apparently escaped my notice.
Introduced recently by commit 42740a5455, in which I decided to
call ssh_key_cache_str() even on certified host keys. But that call
was conditional on s->hkey being non-NULL (which happens in GSS KEX)
as well as on it not being certified, and I managed to absentmindedly
remove _both_ conditions. As a result we got a null-pointer
dereference on any GSS kex.
These were introduced in 34d01e1b65 to pretty-print certificate validity
ranges. But Microsoft's C runtime took a while to catch up with C99 --
stackoverflow claims that VS2013 and earlier don't support these
specifiers -- so it's possible to end up with PuTTY executables that
misdisplay these dates. Also, the mingw-w64 toolchain's -Wformat
complains about these specifiers, at least on Debian buster, presumably
for the same reason.
Since the specifiers in question have exact pre-C99 replacements, it
seems easiest just to use those.
This was lost in the mkfiles.pl->cmake transition (c19e7215dd).
Without this, MinGW builds were providing format strings like %zu to a
version of vsnprintf that didn't support them at runtime, so you'd get
messages like "Pageant has zu SSH-2 keys". (-Wformat would have
complained about the unknown %z format specifier, but even STRICT MinGW
builds don't get those warnings, hm.)
Now the runtime version understands %zu.
I've reviewed the other compile-time definitions that were unique to the
old Makefile.mgw, and decided not to reinstate any of them:
WIN32S_COMPAT: leave it out.
This came in in bd4b8c1285. Rationale from Joris van Rantwijk in email
2000-01-24: "Use -DWIN32S_COMPAT to avoid a linking error about
SystemPowerStatus".
But that problem was solved another way within 8 months, and
WIN32S_COMPAT removed from the code, in 76746a7d61, so this wart had
been redundant since then.
_NO_OLDNAMES: decided not to add anything back for this.
This actually does nothing with the mingw-w64 fork (which seems to spell
it NO_OLDNAMES), although current versions of original-mingw do also
still spell it _NO_OLDNAMES.
They both seem to be about suppressing a behaviour where a load of
"non-ANSI" names like strdup get redirected to invoke _strdup in MS'
libraries.
Again, original rationale is from Joris van Rantwijk: "Compile and link
with -mno-cygwin (and -D_NO_OLDNAMES) to get executables that don't need
the Cygwin DLL file."
Since I don't know of any behavioural differences that this causes
(unlike vsnprintf/_vsnprintf), and it's not obviously causing trouble
for me, continue to leave things in the default state.
I ran 'ls /usr/share/doc' in a psusan session the other day and the
output hung part way through the long directory listing. This turned
out to be because the ssh-connection channel window had run out
temporarily, and PuTTY had sent a WINDOW_ADJUST extending it again,
which the connection layer acted on by calling chan_set_input_wanted
... which sesschan was ignoring with a comment saying /* I don't think
we need to do anything here */.
Well, turns out we do need to. Implemented the simplest possible
unblocking action.
I only recently found out that OpenSSH defined their own protocol IDs
for AES-GCM, defined to work the same as the standard ones except that
they fixed the semantics for how you select the linked cipher+MAC pair
during key exchange.
(RFC 5647 defines protocol ids for AES-GCM in both the cipher and MAC
namespaces, and requires that you MUST select both or neither - but
this contradicts the selection policy set out in the base SSH RFCs,
and there's no discussion of how you resolve a conflict between them!
OpenSSH's answer is to do it the same way ChaCha20-Poly1305 works,
because that will ensure the two suites don't fight.)
People do occasionally ask us for this linked cipher/MAC pair, and now
I know it's actually feasible, I've implemented it, including a pair
of vector implementations for x86 and Arm using their respective
architecture extensions for multiplying polynomials over GF(2).
Unlike ChaCha20-Poly1305, I've kept the cipher and MAC implementations
in separate objects, with an arm's-length link between them that the
MAC uses when it needs to encrypt single cipher blocks to use as the
inputs to the MAC algorithm. That enables the cipher and the MAC to be
independently selected from their hardware-accelerated versions, just
in case someone runs on a system that has polynomial multiplication
instructions but not AES acceleration, or vice versa.
There's a fourth implementation of the GCM MAC, which is a pure
software implementation of the same algorithm used in the vectorised
versions. It's too slow to use live, but I've kept it in the code for
future testing needs, and because it's a convenient place to dump my
design comments.
The vectorised implementations are fairly crude as far as optimisation
goes. I'm sure serious x86 _or_ Arm optimisation engineers would look
at them and laugh. But GCM is a fast MAC compared to HMAC-SHA-256
(indeed compared to HMAC-anything-at-all), so it should at least be
good enough to use. And we've got a working version with some tests
now, so if someone else wants to improve them, they can.
I booted my M1 Mac into macOS rather than Asahi for the first time in
a while, and discovered that an OS update seems to have added some
sysctl flags indicating the presence of the CPU extensions that I
previously knew of no way to check for! Added them checks to
arm_arch_queries.c, though I've also retained backwards compat with
the previous OS version which didn't have them at all.
This provides a convenient hook to be called between SSH messages, for
the crypto components to do any per-message processing like
incrementing a sequence number.
Not sure how I missed this! I tested ChaCha20, but not the MAC that
goes with it. Happily, it passes, so no harm done.
This also involved adding a general framework for testing MACs that
are tied to a specific cipher: we have to allocate, key and IV the
cipher before attempting to use the MAC, and free it all afterwards.