After a conversation this week with a user who tried to use it, it's
clear that Borland C can't build the up-to-date PuTTY without having
to make too many compromises of functionality (unsupported API
details, no 'long long' type), even above the issues that could be
worked round with extra porting ifdefs.
A user reports that a remote window title query, if the window title
is empty or if the option to return it is disabled, fails the
assertion in ldisc_send that I introduced as part of commit c269dd013
to catch any lingering uses of ldisc_send with length 0 that should
have turned into ldisc_echoedit_update. Added a check for len > 0
guarding that ldisc_send call, and likewise at one or two others I
noticed on my way here.
(Probably at some point I should decide that the period of smoking out
lingering old-style ldisc_send(0) calls is over, and declare it safe
to remove that assertion again and get rid of all the cumbersome
safety checks at call sites like these ones. But not quite yet.)
I've just upgraded my build process to a version of lld-link
that knows how to read .res, and I think it's a slightly more
commonly found format, so less confusing to encounter.
64-bit PuTTY should be loading gssapi64.dll, not gssapi32.dll. (In
contrast to the Windows system API DLLs, such as secur32.dll which is
also mentioned in the same source file; those keep the "32" in their
name whether we're in Win32 or Win64.)
When it calls through ocr->handler() to process the response to a
channel request, sometimes that call ends up back in the main SSH-2
authconn coroutine, and sometimes _that_ will call bomb_out(), which
closes the whole SSH connection and frees all the channels - so that
when control returns back up the call stack to
ssh2_msg_channel_response itself which continues working with the
channel it was passed, it's using freed memory and things go badly.
This is the sort of thing I'd _like_ to fix using some kind of
large-scale refactoring along the lines of moving all the actual
free() calls out into top-level callbacks, so that _any_ function
which is holding a pointer to something can rely on that pointer still
being valid after it calls a subroutine. But I haven't worked out all
the details of how that system should work, and doubtless it will turn
out to have problems of its own once I do, so here's a point fix which
simply checks if the whole SSH session has been closed (which is easy
- much easier than checking if that _channel_ structure still exists)
and fixes the immediate bug.
(I think this is the real fix for the problem reported by the user I
mention in commit f0126dd19, because I actually got the details wrong
in the log message for that previous commit: the user's SSH server
wasn't rejecting the _opening_ of the main session channel, it was
rejecting the "shell" channel request, so this code path was the one
being exercised. Still, the other bug was real too, so no harm done!)
A user reported a nonsensical assertion failure (claiming that
ssh->version != 2) which suggested that a channel had somehow outlived
its parent Ssh in the situation where the opening of the main session
channel is rejected by the server. Checking with valgrind suggested
that things start to go wrong at the point where we free the half-set-
up ssh->mainchan before having filled in its type field, so that the
switch in ssh_channel_close_local() picks an arbitrary wrong action.
I haven't reproduced the same failure the user reported, but with this
change, Unix plink is now valgrind-clean in that failure situation.
chiark's ftp server sometimes randomly refuses downloads. In the case
where this happens at postcheck time, this isn't really
release-blocking (all the files have been test-downloaded by precheck
already, so the main aim at this stage is to check that the 'latest'
symlink points to the right place, and even one or two successful
downloads are good enough to confirm that in practice). So now I can
add --no-ftp to the postcheck command line if that makes my life
easier.
This should have been part of commit ea0ab1c82; it's part of the
general revamp that we regenerate the autoconf files ourselves in a
clean directory - so we don't depend on them being present, but we
also don't depend on them being _absent_ either.
But when I made that commit my immediate priority was to get --setver
to work from a completely clean checkout, not from one already
littered with cruft, so I didn't check quite as carefully that my
changes fixed the problem in the latter case too :-)
In recent releases we've taken to making the actual release build (or
rather, candidates for it) ahead of time so that we can do some
slightly more thorough last-minute testing of the exact binaries that
we're going to release to everyone. It's time I actually wrote that
procedure down in the checklist, so that I remember what it is.
In particular, we had the idea that we should not properly GPG-sign
the release until the last moment, and use the presence of a set of
full GPG signatures as a means of distinguishing the real release
build from an RC that accidentally got out into the wild somehow. This
checklist update formalises that too, and documents the method I used
of ensuring the binaries weren't tampered with between RC building and
release signing (by making a signature on just the sha512sums). I also
include in this commit an extra command-line option to sign.sh to make
that preliminary signature step more convenient.
Previously, it demanded that your checkout was in a state where you
had run autoconf but not configure; so if you previously did have a
Makefile then you had to 'make distclean' to remove it, whereas if you
previously had no generated files at all (e.g. starting from a
completely clean checkout) then you had to run mkfiles.pl and
mkauto.sh to generate 'configure'.
This is obviously confusing, and moreover, the dependence on prior
generated files is fragile and prone to them having been generated
wrong. Adjusted the script so that it uses 'git archive' to get a
clean directory containing only the version-controlled files, and then
runs scripts itself to get that directory into the state it wants.
I listed a lot of other build options, but not that one. The release
checklist still recommends doing test builds with it, so it seems
sensible to arrange that you can tell if a build _is_ one of those or
not.
A lenof() being applied to a non-array, a couple of missing frees on
an error path, and a case in pfd_receive() where we could have gone
round a loop again after freeing the port forwarding structure it was
working on.
Like every other toolchain I've tried, my Coverity scanning build has
its share of random objections to parts of my Windows API type-
checking system. I do wonder if that bright idea was worth the hassle
- but it would probably cost all the annoyance all over again to back
out now...
Oops - I found this in an editor autosave file after pushing the
previous commit. I meant it to be part of the previous commit. Oh
well, better late than never.
Ilya Shipitsin sent me a list of errors reported by a tool 'cppcheck',
which I hadn't seen before, together with some fixes for things
already taken off that list. This change picks out all the things from
the remaining list that I could quickly identify as actual errors,
which it turns out are all format-string goofs along the lines of
using a %d with an unsigned int, or a %u with a signed int, or (in the
cases in charset/utf8.c) an actual _size_ mismatch which could in
principle have caused trouble on a big-endian target.
[psftp.c:1135] -> [psftp.c:1134]: (warning) Either the condition '!dir' is redundant or there is possible null pointer dereference: dir.
[psftp.c:1420] -> [psftp.c:1419]: (warning) Either the condition '!dir' is redundant or there is possible null pointer dereference: dir.
[unix/osxlaunch.c:133] -> [unix/osxlaunch.c:134]: (warning) Either the condition '!qhead' is redundant or there is possible null pointer dereference: qhead.
Rather than loading some from spoolss.dll, which some MSDN documentation
suggests, but which doesn't work very well in practice.
(Also, remove an unused variable.)
This has been a FIXME in the code for ages, because back when the main
channel was always a pty session or a program run in a pipe, there
weren't that many circumstances in which the actual CHANNEL_OPEN could
return failure, so it never seemed like a priority to get round to
pulling the error information out of the CHANNEL_OPEN_FAILURE response
message and including it in PuTTY or Plink's local error message.
However, 'plink -nc' is the real reason why this is actually
important; if you tell the SSH server to make a direct-tcpip network
connection as its main channel, then that can fail for all the usual
network-unreliability reasons, and you actually do want to know which
(did you misspell the hostname, or is the target server refusing
connections, or has network connectivity failed?). This actually bit
me today when I had such a network failure, and had to debug it by
pulling that information manually out of a packet log. Time to
eliminate that FIXME.
So I've pulled the error-extracting code out of the previous handler
for OPEN_FAILURE on non-main channels into a separate function, and
arranged to call that function if the main channel open fails too. In
the process I've made a couple of minor tweaks, e.g. if the server
sends back a reason code we haven't heard of, we say _what_ that
reason code was, and also we at least make a token effort to spot if
we see a packet other than OPEN_{CONFIRMATION,FAILURE} reaching the
main loop in response to the main channel-open.
Alamy Liu points out that asking for CONF_host will display the wrong
part of the configuration in the case where serial port setup fails.
The Windows front end's analogous message already got this right, but
I must have forgotten to change this one too when I introduced
conf_dest.
In particular, this means the w32 and w32old builds have
distinguishable buildinfo text, which should protect us against at
least one source of confusion when receiving bug reports.
I have a grubby method of getting this to work without Wine, which I
intend to get round to publishing just as soon as I finish deciding
what its best shape is. But I don't want to wait for that before
starting to actually use it, because this eliminates the last trace of
Windows in the PuTTY Windows builds.
This permits me to do the binary builds via cross-compilation on
Linux, which is nice from a perspective of not having my build
environment subject to the same potential pool of Windows malware that
might be interested in infecting the resulting binaries. Also, it's
quicker to build and the code generation is noticeably better.
We've been planning to do that for a while, and this installer-builder
isn't going to work anyway in the build environment I'm about to move
everything to, so this seems like the moment.
I've also been experimenting recently with running Wix on Linux under
Mono, rather than running it on Windows in the obvious way. Wix under
Mono insists on forward slashes in pathnames, and it turns out that
Wix on Windows doesn't object to them either, so I can safely change
them over unconditionally and then my installer source will work in
both modes.
I've been experimenting with using clang-cl with older versions of the
Visual Studio libraries and headers, and found that the older VS
libraries have a quirk which can cause link failure in a way that
doesn't happen with the newer one. I assume the corresponding old VS
linker must be doing some kind of special-case handling that lld isn't
mimicking.
The quirk arises because lld tries to pull in more than one of the
*crt0.obj startup objects which set things up before calling main or
WinMain (or whatever), and those objects define some of the same
symbols as each other. The fix is to explicitly ask for the right one
of those objects on the link command line, so that it's already been
loaded _before_ the linker starts searching libraries for unresolved
symbols; then the disputed symbols are never unresolved in the first
place during the library search phase.
But this means you have to pick your crt0 object differently depending
on which subsystem you're compiling for. Accordingly, here's an extra
feature in Makefile.clangcl to make that possible by means of the
right definitions on the make command line.
stdint.h is one of the set of standard C headers that has more to do
with the compiler than the library, and hence, clang provides its own
version of it, even when you're using it in clang-cl mode with Visual
Studio headers, and even when those headers are old enough not to have
a stdint.h of their own. So in clang builds, including stdint.h is
always the right way to get uintptr_t defined.
Patch due to Brian K. White; we now condition our own declaration of
DLL_DIRECTORY_COOKIE on whether the toolchain's headers had #defined
it already, rather than trying to guess the answer to that from
version-number macros.
(Apparently everything that defines DLL_DIRECTORY_COOKIE does it by
does seem to work in all my tests.)
While it's still true, the link to Winsock 2 is dead, our standard
release builds don't run on Win95 any more, and it's certainly not
frequently asked.
- I haven't heard of OpenSSH/OpenSSL mismatches being a common problem
for a long time. Specific advice about OpenSSH 3.1/3.4 seems unlikely
to be useful these days.
- "Incorrect MAC received on packet" doesn't seem to be a common
problem these days, and if anyone encounters it, the words in the
"Errors" bit of the docs seem adequate without a FAQ entry as well.
When making select_result() return void (a2fb1d9), I removed a "return"
at the end of the FD_CLOSE case, causing a fallthrough into FD_ACCEPT
with hilarious (segfaulting) consequences. Re-instate the "return".
plug_receive() and plug_closing() return 0 or 1 depending on whether
they think the main connection has closed. It is not appropriate, as
handle_gotdata and handle_socket_unfreeze did, to treat them as
returning a backlog. In fact, plugs are unusual in PuTTY in not
reporting a backlog, but just calling into the socket to freeze and
unfreeze it as required.
It's redundant with back->connected(): only the SSH backend has a
receive function that can ever return 0, and whenever ssh_receive
returns 0 it has called ssh_do_close, which will cause future calls
to ssh_connected also to return 0. Similarly, all backend closing
functions ensure that future calls to their connected function will
return 0.