This stops cgtest from leaving detritus all over my git checkout.
There's a --keep option to revert to the previous behaviour, just in
case I actually want the detritus on some occasion - although in that
situation I might also need to arrange that the various intermediate
files all go by different names, because otherwise there's a good
chance that the one I cared about would already have been overwritten.
This bug applies to both the new stream-based agent forwarding, and
ordinary remote->local TCP port forwardings, because it was introduced
by the preliminary infrastructure in commit 09954a87c.
new_connection() and sk_new() accept a SockAddr *, and take ownership
of it. So it's a mistake to make an address, connect to it, and then
sk_addr_free() it: the free will decrement its reference count to
zero, and then the Socket made by the connection will be holding a
stale pointer. But that's exactly what I was doing in the version of
portfwdmgr_connect() that I rewrote in that refactoring. And then I
made the same error again in commit ae1148267 in the Unix stream-based
agent forwarding.
Now both fixed. Rather than remove the sk_addr_free() to make the code
look more like it used to, I've instead solved the problem by adding
an sk_addr_dup() at the point of making the connection. The idea is
that that should be more robust, in that it will still do the right
thing if portfwdmgr_connect_socket should later change so as not to
call its connect helper function at all.
The new Windows stream-based agent forwarding is unaffected by this
bug, because it calls new_named_pipe_client() with a pathname in
string format, without first wrapping it into a SockAddr.
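The ownership rule is easiest to see in miniature. Here's a stand-alone model of it (the refcounted types below are stand-ins of mine, not PuTTY's real SockAddr and Socket):

    #include <assert.h>
    #include <stdlib.h>

    /* Stand-in types to model the refcounting; not the real SockAddr/Socket. */
    typedef struct addr { int refcount; } addr;
    typedef struct conn { addr *remote; } conn;

    static addr *addr_new(void) {
        addr *a = malloc(sizeof *a); a->refcount = 1; return a;
    }
    static addr *addr_dup(addr *a) { a->refcount++; return a; }
    static void addr_free(addr *a) { if (--a->refcount == 0) free(a); }

    /* Like new_connection() / sk_new(): takes ownership of the address. */
    static conn *connect_to(addr *a) {
        conn *c = malloc(sizeof *c); c->remote = a; return c;
    }

    int main(void) {
        addr *a = addr_new();
        conn *c = connect_to(addr_dup(a)); /* the fix: hand over a duplicate */
        addr_free(a);                      /* so dropping our own ref is safe */
        assert(c->remote->refcount == 1);  /* the connection's ref is still live */
        addr_free(c->remote);
        free(c);
        return 0;
    }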
You can now restrict testing to a single key type (for quicker round
trips once you know what you're debugging). Also --help, on general
principles now that there's more than one option.
In setting up the ECC tests for cmdgen, I noticed that OpenSSH and
PuTTYgen disagree on the bit length to put in a key fingerprint for an
ed25519 key: we think 255, they think 256.
On reflection, I think 255 is more accurate, which is why I bodged
get_fp() in the test suite to ignore that difference when checking our
key fingerprint against OpenSSH's. But having done that, it now seems
silly that if you unnecessarily specify a bit count at ed25519
generation time, cmdgen will insist that it be 256!
255 is now permitted everywhere an ed25519 bit count is input. 256 is
still allowed too, for backwards compatibility, but if you give any
other value, the error message recommends 255.
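So the acceptance check now amounts to something like this (a sketch only, not the literal cmdgen code):

    #include <stdbool.h>
    #include <stdio.h>

    /* Sketch only: 255 is the preferred ed25519 size, and 256 is
     * tolerated for backwards compatibility. */
    static bool ed25519_bits_ok(int bits)
    {
        if (bits == 255 || bits == 256)
            return true;
        fprintf(stderr, "puttygen: invalid bits value %d for ed25519 "
                "(a key of this type is 255 bits)\n", bits);
        return false;
    }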
We've supported ECC keys for a while, but cgtest has never tested them
before. Now it does.
This wasn't quite as simple as adding two extra key types to the list.
I had to add a system of per-key-type flags in the tests to trigger
different expectations and workarounds: the new key types can't be
converted to and from ssh.com format, they behave differently from
rsa1 if you try (in that they'll get as far as asking for the
passphrase _before_ realising the key is an unsupported kind), and
also it turns out we disagree with OpenSSH ssh-keygen on the bit count
to write in the fingerprint of an ed25519 key. (We say 255, and they
say 256.)
But having fixed all those things, the tests pass.
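The flag system is roughly of this shape (the names and the flag assignments here are illustrative rather than copied from cgtest):

    /* Illustrative only: a per-key-type flag word of the kind described above. */
    #define KTF_NO_SSHCOM        0x01  /* ssh.com conversion not supported */
    #define KTF_FP_BITS_DIFFER   0x02  /* OpenSSH disagrees on fingerprint bits */

    struct test_keytype {
        const char *name;
        unsigned flags;
    };

    static const struct test_keytype test_keytypes[] = {
        {"rsa", 0},
        {"dsa", 0},
        {"ecdsa", KTF_NO_SSHCOM},
        {"ed25519", KTF_NO_SSHCOM | KTF_FP_BITS_DIFFER},
    };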
I've adjusted the cmdgen main program so that all its early returns go
via the 'goto out' idiom, and hence still pass through all the
last-minute freeing steps. That meant I had to adjust a few of the
last-minute freeing steps so they don't try to do impossible things
like freeing SSH2_WRONG_PASSPHRASE or calling a vtable method of a
null object. Also added a couple of completely missing frees, in
cmdgen itself ('outfiletmp') and in the cgtest wrapper main ('fp').
Now cgtest gets a completely clean run through Leak Sanitiser.
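For reference, the shape of the idiom is the usual one (a generic sketch, not cmdgen itself):

    #include <stdio.h>
    #include <stdlib.h>

    /* Generic sketch of the 'goto out' idiom: every early exit funnels
     * through the same last-minute freeing steps, which must tolerate
     * half-initialised state (NULL pointers and so on). */
    int process(const char *infile)
    {
        int ret = 1;                /* pessimistic default */
        char *outfiletmp = NULL;
        FILE *fp = NULL;

        fp = fopen(infile, "r");
        if (!fp)
            goto out;               /* early return, but still frees below */

        outfiletmp = malloc(256);
        if (!outfiletmp)
            goto out;

        /* the real work would go here */
        ret = 0;

      out:
        free(outfiletmp);           /* free(NULL) is a no-op */
        if (fp)
            fclose(fp);
        return ret;
    }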
The self-test mode of command-line PuTTYgen used to be compiled by
manually setting a #define, so that it would _replace_ the puttygen
binary. Therefore, it was useful to still have it behave like puttygen
if invoked with arguments, so that you didn't have to annoyingly
recompile back and forth to switch between manual and automated
testing.
But now that cgtest is built _alongside_ puttygen, there's no need for
that. If someone needs the non-test version of puttygen, it's right
there next to cgtest. So I've removed that weird special case, and
replaced it with a new command-line syntax for cgtest which supports a
-v option (which itself replaces configuration via an awkward
environment variable CGTEST_VERBOSE).
A user reports that if the ^E answerback string is configured to be
empty, then causing the answerback to be sent fails the assertion in
ldisc_send introduced in commit c269dd013.
I thought I'd caught all of the remaining cases of this in commit
4634cd47f, but apparently not.
When I took the key comments out of the RSAKey / ssh2_userkey
structures stored in pageant.c's trees and moved them into the new
containing PageantKey structure, I forgot that that would mean Windows
Pageant - which tries to read the comments _from_ those structures -
would now not be able to show comments in its list box.
Comments are small, so the easiest fix is just to duplicate them in
both places.
I think this is on balance a cleanup in its own right. But the main
purpose is that now Pageant has its own struct that wraps the RSAKey
and ssh2_userkey objects it gets from the rest of the code. So I'll be
able to put extra Pageant-specific data in that struct in future.
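So the wrapper struct now looks something like this (the field names are illustrative; the point is just the general shape, with the comment living in both places):

    typedef struct PageantKey {
        int ssh_version;            /* 1 or 2 */
        union {
            RSAKey *rkey;           /* if ssh_version == 1 */
            ssh2_userkey *skey;     /* if ssh_version == 2 */
        } u;
        char *comment;              /* duplicated here, so Windows Pageant's
                                     * list box can read it directly */
        /* room for more Pageant-specific data later */
    } PageantKey;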
Well, actually, two new test programs. agenttest.py is the actual
test; it depends on agenttestgen.py which generates a collection of
test private keys, using the newly exposed testcrypt interface to our
key generation code.
In this commit I've also factored out some Python SSH marshalling code
from cryptsuite, and moved it into a module ssh.py which the agent
tests can reuse.
This adds stability tests (of the form 'make sure this behaves
tomorrow the same way it behaved today, taking on faith that the
latter was right') for all the new in-memory APIs for public and
private key load/save.
All the functions that read and write public and private keys to a
FILE * are now small wrappers on top of an underlying set of functions
that read data in the same format from a BinarySource, or write it to
a strbuf. This sets me up to deal with key files in contexts other
than on disk.
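Each wrapper is roughly of this shape (generic names; the real functions and their exact signatures differ):

    /* The FILE *-based loader is now just a wrapper that slurps the file
     * and defers to a BinarySource-based parser, which does the real work. */
    struct mykey *load_key_from_file(FILE *fp, const char **error)
    {
        strbuf *data = strbuf_new();
        char buf[4096];
        size_t got;
        while ((got = fread(buf, 1, sizeof(buf), fp)) > 0)
            put_data(data, buf, got);                /* read the whole file */

        BinarySource src[1];
        BinarySource_BARE_INIT(src, data->u, data->len);
        struct mykey *key = load_key_from_source(src, error);

        strbuf_free(data);
        return key;
    }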
This is like PTRLEN_LITERAL, but you can use it in a _declaration_ of
a compile-time constant ptrlen, instead of a literal in expression
context. 'const ptrlen foo = PTRLEN_DECL_LITERAL("bar");'
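The distinction is that the expression form is a compound literal, which can't (in portable C) initialise an object at file scope, whereas the declaration form is a plain brace initialiser, which can. Simplified versions of the two macros, leaving out the type-checking the real ones do, would be:

    #define PTRLEN_LITERAL(str)      ((ptrlen){ str, sizeof(str)-1 })  /* expression */
    #define PTRLEN_DECL_LITERAL(str) { str, sizeof(str)-1 }            /* initialiser */

    static const ptrlen prefix = PTRLEN_DECL_LITERAL("ssh-");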
There are new functions here to get the next contiguous string of
characters from a given set (like strspn/strcspn, only for
BinarySource, and returning a ptrlen of what they skipped over). Also
we can get a line of text with the trailing newline chomped off.
Finally, I've provided a function to rewind a BinarySource to a
previous position with error checking.
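A usage sketch (I'm reconstructing the function names from memory here, so treat them as approximate):

    #include <string.h>

    static void demo(void)
    {
        const char *data = "255 ssh-ed25519 rest-of-line\n";
        BinarySource src[1];
        BinarySource_BARE_INIT(src, data, strlen(data));

        ptrlen digits = get_chars(src, "0123456789"); /* longest run from the set */
        ptrlen tail = get_nonchars(src, "\n");        /* longest run avoiding it */

        BinarySource_REWIND_TO(src, 0);               /* back to an earlier
                                                       * position, with error
                                                       * checking */
        ptrlen line = get_chomped_line(src);          /* whole line, '\n' removed */
    }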
Now the key load/save functions have names that are more consistent (no more userkey_this but
that_userkey); a bit shorter; and, most importantly, all the current
functions end in _f to indicate that they deal with keys stored in
disk files. I'm about to add a second set of entry points that deal
with keys via the more general BinarySource / BinarySink interface,
which will sit alongside these with a different suffix.
The newly exposed testcrypt key generation functions are not much use for 'real' key generation, since like all the
other randomness-using testcrypt functions, they need you to have
explicitly queued up some random data. But for generating keys for
test purposes, they have the great virtue that they deliver the key in
the internal format, where we can generate all the various public and
private blobs from it as well as the on-disk formats.
A minor change to one of the keygen functions itself: rsa_generate now
fills in the 'bits' and 'bytes' fields of the returned RSAKey, without
which it didn't actually work to try to generate a public blob from
it. (We'd never noticed before, because no previous client of
rsa_generate even tried that.)
This doesn't affect what files are _legal_: the spec said we tolerated
three kinds of line ending, and it still says we tolerate the same
three. But I noticed that we're actually outputting \n by preference,
whereas the spec said we prefer \r\n. I'd rather change the docs than
the code.
I accepted both 'int' and 'uint' as function argument types, but
hadn't previously noticed that only 'uint' is handled properly as a
return type. Now both are.
It wasn't expanding the output strbuf to the full size of the key
modulus, so the output delivered to Python was only a part of the
mpint it should have been.
(Also, that was logically speaking a buffer overrun - we were writing
to the strbuf buffer beyond its length - although in practice I think
the _physical_ size of the buffer was large enough not to show it up
even under ASan. In any case, a buffer overrun only in the test suite,
and in a function I hadn't even got round to testing, is about the
best place to have one.)
While I'm here, I've also changed the way that the testcrypt wrapper
on rsa_ssh1_encrypt indicates failure: now we have the 'opt_'
mechanism, it can do that by returning None rather than "".
As in the previous commit, this means that agent_query() is now able
to operate in an asynchronous mode, so that if Pageant takes time to
answer a request, the GUI of the PuTTY instance making the request
won't be blocked.
Also as in the previous commit, we still fall back to the WM_COPYDATA
protocol if the new named pipe protocol isn't available.
Now that Pageant runs a named-pipe server as well as a WM_COPYDATA
server, we prefer the former (if available) for agent forwarding, for
the same reasons as on Unix: it lets us establish a simple raw-data
streaming connection instead of agentf.c's complicated message
boundary detection and buffer management, and if agent connections
ever become stateful, this technique will cope.
On Windows, another advantage of this change is that forwarded agent
requests can now be asynchronous: if the agent takes time to respond
to a request for any reason, then the rest of PuTTY's GUI and SSH
connection are not blocked, and you can carry on working while the
agent is thinking about the request.
(I didn't list that as a benefit of doing the same thing for Unix in
commit ae1148267, because on Unix, agent_query() could _already_ run
asynchronously. It's only on Windows that that's new.)
This reuses all the named-pipe IPC code I set up for connection
sharing a few years ago, to set up a named pipe with a predictable
name and speak the stream-oriented SSH agent protocol over it.
In this commit, we just set up the server, and there's no code that
speaks the client end of the new IPC yet. But my plan is that clients
should switch over to using this interface if possible, because it's
generally better: it doesn't have to be handled synchronously in the
middle of a GUI event loop (either in Pageant itself _or_ in its
client), and it's a better fit to the connection-oriented nature of
forwarded agent connections (so if any features ever appear in the
agent protocol that require state within a connection, we'll now be
able to support them).
This contains most of the guts of the previously monolithic function
new_named_pipe_client(), but it directly returns the HANDLE to the
opened pipe, or a string error message on failure.
new_named_pipe_client() is now a thin veneer on top of that, which
returns a Socket * by wrapping up the HANDLE into a HandleSocket or
the error message into an ErrorSocket as appropriate.
So it's now possible to connect to a named pipe, using all our usual
infrastructure (including in particular the ownership check of the
server, to defend against spoofing attacks), without having to have a
Socket-capable event loop in progress.
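The resulting split is roughly this shape (the helper's name, the wrapper calls and the prototypes here are placeholders of mine, not the real declarations):

    HANDLE connect_to_named_pipe(const char *pipename, char **err_ret);

    Socket *new_named_pipe_client(const char *pipename, Plug *plug)
    {
        char *err = NULL;
        HANDLE h = connect_to_named_pipe(pipename, &err);
        if (h == INVALID_HANDLE_VALUE)
            return wrap_error_socket(err, plug);   /* stand-in for the
                                                    * ErrorSocket wrapper */
        return wrap_handle_socket(h, plug);        /* stand-in for the
                                                    * HandleSocket wrapper */
    }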
Windows Plink and PSFTP had very similar implementations, and now they
share one that lives in a new file winselcli.c. I've similarly moved
GUI PuTTY's implementation out of window.c into winselgui.c, where
other GUI programs wanting to do networking will be able to access
that too.
In the spirit of centralisation, I've also taken the opportunity to
make both functions use the reasonably complete winsock_error_string()
rather than (for some historical reason) each inlining a minimal
version that reports most errors as 'unknown'.
Historically, because of the way Windows Pageant's IPC works, PuTTY's
agent forwarding has always been message-oriented. The channel
implementation in agentf.c deals with receiving a data stream from the
remote agent client and breaking it up into messages, and then it
passes each message individually to agent_query().
On Unix, this is more work than is really needed, and I've always
meant to get round to doing the more obvious thing: making an agent
forwarding channel into simply a stream-oriented proxy, passing raw
data back and forth between the SSH channel and the local AF_UNIX
socket without having to know or care about the message boundaries in
the stream.
The portfwdmgr_connect_socket() facility introduced by the previous
commit is the missing piece of infrastructure to make that possible.
Now, the agent client module provides an API that includes a callback
you can pass to portfwdmgr_connect_socket() to open a streamed agent
connection, and the agent forwarding setup function tries to use that
where possible, only falling back to the message-based agentf.c system
if it can't be done. On Windows, the new piece of agent-client API
returns failure, so we still fall back to agentf.c there.
There are two benefits to doing it this way. One is that it's just
simpler and more robust: if PuTTY isn't trying to parse the agent
connection, then it has less work to do and fewer places to introduce
bugs. The other is that it's futureproof against changes in the agent
protocol: if any kind of extension is ever introduced that requires
keeping state within a single agent connection, or that changes the
protocol itself so that agentf's message-boundary detection stops
working, then this forwarding system will still work.
The new portfwdmgr_connect_socket() works basically like the existing
portfwdmgr_connect(), in that it opens an SSH forwarding channel and
gateways it to a Socket. But where portfwdmgr_connect() started from a
(hostname,port) pair and used mgr->conf to inform name lookup and
proxy settings, portfwdmgr_connect_socket() simply takes a callback
that it will call when it wants you to make a Socket for a given Plug,
and that callback can make any kind of Socket it likes.
The idea is that this way you can make port forwardings that talk to
things other than genuine TCP connections, by simply providing an
appropriate callback.
My immediate use case for this is for agent forwarding, and will
appear in the next commit. But it's easy to imagine other purposes you
might use a thing like this for, such as forwarding SSH channels to
AF_UNIX sockets in general.
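From the caller's point of view it looks something like this (the prototype and helper names are my approximation rather than a quote from the header):

    /* The callback can make any kind of Socket it likes for the Plug it's
     * handed. */
    static Socket *connect_to_local_agent(void *vctx, Plug *plug)
    {
        struct my_agent_ctx *ctx = (struct my_agent_ctx *)vctx; /* placeholder */
        return make_unix_socket_for_agent(ctx, plug);           /* placeholder */
    }

    /* ...and then, instead of handing portfwdmgr a (hostname, port) pair,
     * with mgr and ctx as set up elsewhere: */
    portfwdmgr_connect_socket(mgr, connect_to_local_agent, ctx);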
In commit 105672e32 I added the pty_backend_exit_signame() function,
which constructs the SSH wire name of the signal that terminated the
pty session process. I did it by having a sequence of inline #ifdefs
for all the translatable signal names, each one guarding the code that
expected that signal to be defined in <signal.h>.
But there was no need to write all that out longhand, because in the
preceding commit 72eca76d2, I made a central list of signal names in
sshsignals.h, so all I should have needed to do was to set up the
macros that header expects, and let _it_ do the iteration over the
locally defined subset of signal ids! So now I do that instead.
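This is the usual X-macro trick; schematically (with made-up file and macro names, and only a few signals shown, not the actual contents of sshsignals.h):

    /* In the shared list header, each translatable signal appears once,
     * already guarded by the #ifdef that used to be written out longhand: */
    #ifdef SIGHUP
    SIGNAL_DEF(HUP)
    #endif
    #ifdef SIGINT
    SIGNAL_DEF(INT)
    #endif
    #ifdef SIGTERM
    SIGNAL_DEF(TERM)
    #endif

    /* At the point of use, define the macro the header expects, and let
     * the header do the iteration: */
    #include <signal.h>

    const char *signame_for(int sig)
    {
        switch (sig) {
    #define SIGNAL_DEF(name) case SIG ## name: return #name;
    #include "signal_list.h"       /* stands in for sshsignals.h */
    #undef SIGNAL_DEF
          default:
            return NULL;
        }
    }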
If the user is scrolled back in the scrollback when a screen-swap
takes place, and if we're not configured to reset the scrollback
completely on the grounds that the swap is display activity, then we
should do the same thing we do for other kinds of display activity:
strive to keep the scroll position pointing at the same text. In this
case, that means adjusting term->disptop by the number of virtual
lines added to the scrollback to allow the main screen to be viewed
while the alt screen is active.
This improves the quality of behaviour in that corner case, but more
importantly, it should also fix a case of the dreaded line==NULL
assertion failure, which someone just reported against 0.73 when
exiting tmux (hence, switching away from the alt screen) while
scrolled back in a purely virtual scrollback buffer: the virtual
scrollback lines vanished, but disptop was still set to a negative
value, which made it out of range.
When I'm declaring a local instance of some context structure type to
pass to a function which will pass it in turn to a callback, I've
tended to use a declaration of the form
    struct context actx, *ctx = &actx;
so that the outermost caller can initialise the context, and/or read
out fields of it afterwards, by the same syntax 'ctx->foo' that the
callback function will be using. So you get visual consistency between
the two functions that share this context.
It only just occurred to me that there's a much neater way to declare
a context struct of this kind, which still makes 'ctx' behave like a
pointer in the owning function, and doesn't need all that weird
verbiage or a spare variable name:
    struct context ctx[1];
That's much nicer! I've switched to doing that in all the existing
cases I could find, and also in a couple of places where I hadn't
previously bothered with the more cumbersome idiom at all.
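For concreteness, here's the pattern with the new declaration style:

    /* 'ctx' still reads like a pointer in both functions. */
    struct countctx { int n; };

    static void count_cb(void *vctx)
    {
        struct countctx *ctx = (struct countctx *)vctx;
        ctx->n++;                     /* callback accesses ctx->foo... */
    }

    static int count_things(void (*for_each)(void (*cb)(void *), void *cbctx))
    {
        struct countctx ctx[1];       /* instead of: struct countctx actx, *ctx = &actx; */
        ctx->n = 0;                   /* ...and so does the owning function */
        for_each(count_cb, ctx);      /* the array decays to a pointer here */
        return ctx->n;
    }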
A long time ago, in commit 09f86ce7e, I introduced a separate copy of
the saved cursor position (used by the ESC 7 / ESC 8 sequences) for
the main and alternate screens. The idea was to fix mishandling of an
input sequence of the form
    ESC 7      (save cursor)
    ESC [?47h  (switch to alternate screen)
    ...
    ESC 7 ESC 8 (save and restore cursor, while in alternate screen)
    ...
    ESC [?47l  (switch back from alternate screen)
    ESC 8      (restore cursor, expecting it to match the _first_ ESC 7)
in which, before the fix, the second ESC 7 would overwrite the
position saved by the first one. So the final ESC 8 would restore the
cursor position to wherever it happened to have been saved in the
alternate screen, instead of where it was saved before switching _to_
the alternate screen.
I've recently noticed that the same bug still happens if you use the
alternative escape sequences ESC[?1047h and ESC[?1047l to switch to
the alternate screen, instead of ESC[?47h and ESC[?47l. This is
because that version of the escape sequence sets the internal flag
'keep_cur_pos' in the call to swap_screen, whose job is to arrange
that the actual cursor position doesn't change at the instant of the
switch. But the code that swaps the _saved_ cursor position in and out
is also conditioned on keep_cur_pos, so the 1047 variant of the
screen-swap sequence was bypassing that too, and behaving as if there
was just a single saved cursor position inside and outside the
alternate screen.
I don't know why I did it that way in 2006. It could have been
deliberate for some reason, or it could just have been mindless copy
and paste from the existing cursor-related swap code. But checking
with xterm now, it definitely seems to be wrong: the 1047 screen swap
preserves the _actual_ cursor position across the swap, but still has
independent _saved_ cursor positions in the two screens. So now PuTTY
does the same.
The do_select function is called with a boolean parameter indicating
whether we're supposed to start or stop paying attention to network
activity on a given socket. So if we freeze and unfreeze the socket in
mid-session because of backlog, we'll call do_select(s, false) to
freeze it, and do_select(s, true) to unfreeze it.
But the implementation of do_select in the Windows SFTP code predated
the rigorous handling of socket backlogs, so it assumed that
do_select(s, true) would only be called at initialisation time, i.e.
only once, and therefore that it was safe to use that flag as a cue to
set up the Windows event object to associate with socket activity.
Hence, every time the socket was frozen and unfrozen, we would create
a new netevent at unfreeze time, leaking the old one.
I think perhaps part of the reason why that was hard to figure out was
that the boolean parameter was called 'startup' rather than 'enable'.
To make it less confusing the next time I read this code, I've also
renamed it, and while I was at it, adjusted another related comment.
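Schematically, the fix is to stop treating the 'enable' call as a one-time setup (names and details here are illustrative, not the real winsftp code):

    #include <winsock2.h>
    #include <stdbool.h>

    static HANDLE netevent = INVALID_HANDLE_VALUE;

    const char *do_select(SOCKET skt, bool enable)
    {
        /* Create the event object once, rather than on every unfreeze. */
        if (netevent == INVALID_HANDLE_VALUE)
            netevent = CreateEvent(NULL, FALSE, FALSE, NULL);

        long events = enable ? (FD_READ | FD_WRITE | FD_OOB | FD_CLOSE) : 0;
        if (WSAEventSelect(skt, netevent, events) == SOCKET_ERROR)
            return "WSAEventSelect failed";
        return NULL;
    }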
As explained in the comment in the code, this makes it easier to map
addresses in the log files back to addresses in the code, if the
testsc image is built as a position-independent executable.
This commit switches as many ssh_hash_free / ssh_hash_new pairs as
possible over to reusing the previous hash object via ssh_hash_reset.
There are a few other cleanups too: I've used the wrapper function
hash_simple() where possible, and introduced
ssh_hash_digest_nondestructive() and switched to that where possible.
The h_outer, h_inner and h_live hash objects in the HMAC
implementation are now no longer freed and reallocated all the time.
Instead, they're reinitialised in place using the new ssh_hash_reset
and ssh_hash_copyfrom API functions.
This is partly a performance optimisation (malloc and free take time),
but also, it should fix an intermittent failure in the side-channel
test system 'testsc', which seems to be happening because of those
free/malloc pairs not happening the same way in successive runs. (In
other words, this didn't reflect a genuine side-channel leakage in the
actual crypto, only a failure of experimental control in the test.)
The idea is to arrange that an ssh_hash object can be reused without
having to free it and allocate a new one. So the 'final' method has
been replaced with 'digest', which does everything except the trailing
free; and there's also a new pair of methods 'reset' and 'copyfrom'
which overwrite the state of a hash with either the starting state or
a copy of another state. Meanwhile, the 'new' allocator function has
stopped performing 'reset' as a side effect; now it _just_ does the
administrative stuff (allocation, setting up vtables), and returns an
object which isn't yet ready to receive any actual data, expecting
that the caller will either reset it or copy another hash state into
it.
In particular, that means that the SHA-384 / SHA-512 pair no longer
need separate 'new' methods, because only the 'reset' part has to
change between them.
This commit makes no change to the user-facing API of wrapper
functions in ssh.h, except to add new functions which nothing yet
calls. The user-facing ssh_hash_new() calls the new and reset methods
in succession, and the user-facing copy and final wrappers still
exist, now implemented as new+copyfrom and digest+free respectively.
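So the low-level lifecycle now goes like this (a sketch using the names above, and assuming the usual BinarySink plumbing, i.e. put_data, for feeding data into a hash object):

    static void demo_hash_reuse(void)
    {
        unsigned char digest[32];
        ssh_hash *h = ssh_hash_new(&ssh_sha256);   /* allocate + reset in one go */
        put_data(h, "abc", 3);

        ssh_hash *h2 = ssh_hash_new(&ssh_sha256);
        ssh_hash_copyfrom(h2, h);                  /* h2 now matches h's state */

        ssh_hash_digest(h, digest);                /* emit a digest; object survives */

        ssh_hash_reset(h);                         /* reinitialise in place... */
        put_data(h, "def", 3);                     /* ...and reuse, with no free/malloc */
        ssh_hash_digest(h, digest);

        ssh_hash_free(h);
        ssh_hash_free(h2);
    }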
It demonstrates a successful round trip from a source integer to
ciphertext and back, and also I've hardcoded the ciphertext I got from
the first attempt so that future changes to the code won't be able to
change it without me noticing.