The number of people has been steadily increasing who read our source
code with an editor that thinks tab stops are 4 spaces apart, as
opposed to the traditional tty-derived 8 that the PuTTY code expects.
So I've been wondering for ages about just fixing it, and switching to
a spaces-only policy throughout the code. And I recently found out
about 'git blame -w', which should make this change not too disruptive
for the purposes of source-control archaeology; so perhaps now is the
time.
While I'm at it, I've also taken the opportunity to remove all the
trailing spaces from source lines (on the basis that git dislikes
them, and is the only thing that seems to have a strong opinion one
way or the other).
Apologies to anyone downstream of this code who has complicated patch
sets to rebase past this change. I don't intend it to be needed again.
The structure field 'lengths' in 'struct zlib_decompress_ctx' was the
right length for the amount of data you might _sensibly_ want to put
in it, but two bytes shorter than the amount that the compressed block
header allows someone to _physically_ try to put into it. Now it has
the full maximum length.
The previous overrun could only reach two bytes beyond the end of the
array, and in every target architecture I know of, those two bytes
would have been structure padding, so it wasn't causing any real trouble.
In Deflate, both the literal/length and distance Huffman trees are
physically capable of encoding two symbol ids beyond the number that
the spec assigns any actual meaning to: a compressed block header can
specify code lengths for those two extra symbols if it wants to, in
which case those codes will be added to the Huffman tree (in
particular, will affect the encoding of everything else), but then
should not actually use those codes.
Our zlib decoder was silently ignoring the two invalid codes in the
literal/length tree, but treating the two invalid codes in the
distance tree as a fatal decoding error. That seems inconsistent. Now
we treat both as fatal decode errors.
The _nm strategy is slower, so I don't want to just change everything
over no matter what its contents. In this pass I've tried to catch
everything that holds the _really_ sensitive things like passwords,
private keys and session keys.
I've fixed a handful of these where I found them in passing, but when
I went systematically looking, there were a lot more that I hadn't
found!
A particular highlight of this collection is the code that formats
Windows clipboard data in RTF, which was absolutely crying out for
strbuf_catf, and now it's got it.
I noticed a few of these in the course of preparing the previous
commit. I must have been writing that idiom out by hand for _ages_
before it became totally habitual to #define it as 'lenof' in every
codebase I touch. Now I've gone through and replaced all the old
verbosity with nice terse lenofs.
This is the commit that f3295e0fb _should_ have been. Yesterday I just
added some typedefs so that I didn't have to wear out my fingers
typing 'struct' in new code, but what I ought to have done is to move
all the typedefs into defs.h with the rest, and then go through
cleaning up the legacy 'struct's all through the existing code.
But I was mostly trying to concentrate on getting the test suite
finished, so I just did the minimum. Now it's time to come back and do
it better.
Now, instead of getting the zlib test/helper program by manually
compiling a source file with unusual options, it gets built as
standard by the ordinary Makefile.
It stopped being useful in commit 20a9bd564, where I removed the only
code that called it. I've only just noticed that this part of the
mechanism is still lying around.
My normal habit these days, in new code, is to treat int and bool as
_almost_ completely separate types. I'm still willing to use C's
implicit test for zero on an integer (e.g. 'if (!blob.len)' is fine,
no need to spell it out as blob.len != 0), but generally, if a
variable is going to be conceptually a boolean, I like to declare it
bool and assign to it using 'true' or 'false' rather than 0 or 1.
PuTTY is an exception, because it predates the C99 bool, and I've
stuck to its existing coding style even when adding new code to it.
But it's been annoying me more and more, so now that I've decided C99
bool is an acceptable thing to require from our toolchain in the first
place, here's a quite thorough trawl through the source doing
'boolification'. Many variables and function parameters are now typed
as bool rather than int; many assignments of 0 or 1 to those variables
are now spelled 'true' or 'false'.
I managed this thorough conversion with the help of a custom clang
plugin that I wrote to trawl the AST and apply heuristics to point out
where things might want changing. So I've even managed to do a decent
job on parts of the code I haven't looked at in years!
To make the plugin's work easier, I pushed platform front ends
generally in the direction of using standard 'bool' in preference to
platform-specific boolean types like Windows BOOL or GTK's gboolean;
I've left the platform booleans in places they _have_ to be for the
platform APIs to work right, but variables only used by my own code
have been converted wherever I found them.
In a few places there are int values that look very like booleans in
_most_ of the places they're used, but have a rarely-used third value,
or a distinction between different nonzero values that most users
don't care about. In these cases, I've _removed_ uses of 'true' and
'false' for the return values, to emphasise that there's something
more subtle going on than a simple boolean answer:
- the 'multisel' field in dialog.h's list box structure, for which
the GTK front end in particular recognises a difference between 1
and 2 but nearly everything else treats as boolean
- the 'urgent' parameter to plug_receive, where 1 vs 2 tells you
something about the specific location of the urgent pointer, but
most clients only care about 0 vs 'something nonzero'
- the return value of wc_match, where -1 indicates a syntax error in
the wildcard.
- the return values from SSH-1 RSA-key loading functions, which use
-1 for 'wrong passphrase' and 0 for all other failures (so any
caller which already knows it's not loading an _encrypted private_
key can treat them as boolean)
- term->esc_query, and the 'query' parameter in toggle_mode in
terminal.c, which _usually_ hold 0 for ESC[123h or 1 for ESC[?123h,
but can also hold -1 for some other intervening character that we
don't support.
In a few places there's an integer that I haven't turned into a bool
even though it really _can_ only take values 0 or 1 (and, as above,
tried to make the call sites consistent in not calling those values
true and false), on the grounds that I thought it would make it more
confusing to imply that the 0 value was in some sense 'negative' or
bad and the 1 positive or good:
- the return value of plug_accepting uses the POSIXish convention of
0=success and nonzero=error; I think if I made it bool then I'd
also want to reverse its sense, and that's a job for a separate
piece of work.
- the 'screen' parameter to lineptr() in terminal.c, where 0 and 1
represent the default and alternate screens. There's no obvious
reason why one of those should be considered 'true' or 'positive'
or 'success' - they're just indices - so I've left it as int.
ssh_scp_recv had particularly confusing semantics for its previous int
return value: its call sites used '<= 0' to check for error, but it
never actually returned a negative number, just 0 or 1. Now the
function and its call sites agree that it's a bool.
In a couple of places I've renamed variables called 'ret', because I
don't like that name any more - it's unclear whether it means the
return value (in preparation) for the _containing_ function or the
return value received from a subroutine call, and occasionally I've
accidentally used the same variable for both and introduced a bug. So
where one of those got in my way, I've renamed it to 'toret' or 'retd'
(the latter short for 'returned') in line with my usual modern
practice, but I haven't done a thorough job of finding all of them.
Finally, one amusing side effect of doing this is that I've had to
separate quite a few chained assignments. It used to be perfectly fine
to write 'a = b = c = TRUE' when a,b,c were int and TRUE was just a
the 'true' defined by stdbool.h, that idiom provokes a warning from
gcc: 'suggest parentheses around assignment used as truth value'!
This commit includes <stdbool.h> from defs.h and deletes my
traditional definitions of TRUE and FALSE, but other than that, it's a
100% mechanical search-and-replace transforming all uses of TRUE and
FALSE into the C99-standardised lowercase spellings.
No actual types are changed in this commit; that will come next. This
is just getting the noise out of the way, so that subsequent commits
can have a higher proportion of signal.
Ian Jackson points out that the Linux kernel has a macro of this name
with the same purpose, and suggests that it's a good idea to use the
same name as they do, so that at least some people reading one code
base might recognise it from the other.
I never really thought very hard about what order FROMFIELD's
parameters should go in, and therefore I'm pleasantly surprised to
find that my order agrees with the kernel's, so I don't have to
permute every call site as part of making this change :-)
This was mildly fiddly because there's a single vtable structure that
implements two distinct interface types, one for compression and one
for decompression - and I have actually confused them before now
(commit d4304f1b7), so I think it's important to make them actually be
separate types!
Now when we construct a packet containing sensitive data, we just set
a field saying '... and make it take up at least this much space, to
disguise its true size', and nothing in the rest of the system worries
about that flag until ssh2bpp.c acts on it.
Also, I've changed the strategy for doing the padding. Previously, we
were following the real packet with an SSH_MSG_IGNORE to make up the
size. But that was only a partial defence: it works OK against passive
traffic analysis, but an attacker proxying the TCP stream and
dribbling it out one byte at a time could still have found out the
size of the real packet by noting when the dribbled data provoked a
response. Now I put the SSH_MSG_IGNORE _first_, which should defeat
that attack.
But that in turn doesn't work when we're doing compression, because we
can't predict the compressed sizes accurately enough to make that
strategy sensible. Fortunately, compression provides an alternative
strategy anyway: if we've got zlib turned on when we send one of these
sensitive packets, then we can pad out the compressed zlib data as
much as we like by adding empty RFC1951 blocks (effectively chaining
ZLIB_PARTIAL_FLUSHes). So both strategies should now be dribble-proof.
The return value wasn't used to indicate failure; it only indicated
whether any compression was being done at all or whether the
compression method was ssh_comp_none, and we can tell the latter just
as well by the fact that its init function returns a null context
pointer.
This removes a lot of pointless duplications of those constants.
Of course, _ideally_, I should upgrade to C99 bool throughout the code
base, replacing TRUE and FALSE with true and false and tagging
variables explicitly as bool when that's what they semantically are.
But that's a much bigger piece of work, and shouldn't block this
trivial cleanup!
This makes it clearer that it doesn't persist beyond this block, and
would have made it much more obvious that the assignment to it removed
in the previous commit was pointless.
Assignments that are overwritten shortly afterwards and never used,
and a completely unused variable. Also, the bogus array access in
testbn.c could have actually accessed one beyond the array limit
(though of course it's only in a test harness).
The symbol alphabet used for encoding ranges of backward distances in
a Deflate compressed block contains 32 symbol values, but two of them
(symbols 30 and 31) have no meaning, and hence it is an encoding error
for them to appear in a compressed block. If a compressed file did so
anyway, this decompressor would index past the end of the distcodes[]
array. Oops.
This is clearly a bug, but I don't believe it's a vulnerability. The
nonsense record we load from distcodes[] in this situation contains an
indeterminate bogus value for 'extrabits' (how many more bits to read
from the input stream to complete the backward distance) and also for
the offset to add to the backward distance after that. But neither of
these can lead to a buffer overflow: if extrabits is so big that
dctx->nbits (which is capped at 32) never exceeds it, then the
decompressor will simply swallow all further data without producing
any output, and otherwise the decompressor will consume _some_ number
of spare bits from the input, work out a backward distance and an
offset in the sliding window which will be utter nonsense and probably
out of bounds, but fortunately will then AND the offset with 0x7FFF at
the last minute, which makes it safe again. So I think the worst that
a malicious compressor can do is to cause the decompressor to generate
strange data, which of course it could do anyway if it wanted to by
sending that same data legally compressed.
[originally from svn r10278]
gcc 4.8 compiling with -O3 gives a new warning about the access to
st->pending at the top of lz77_compress, because for some reason it
thinks there's an out-of-bounds array access there (or perhaps just a
potential one, I'm not really sure which side -Warray-bounds is erring
on). Add an assertion reassuring it that st->npending can't get bigger
than the size of st->pending at the site it's complaining about, and a
second one at the site where st->npending is increased (just in case
my analysis of why it can't happen was wrong!). Also add a comment
explaining the assertions.
[originally from svn r10144]
(Since we choose to compile with -Werror, this is particularly important.)
I haven't yet checked that the resulting source actually compiles cleanly with
GCC 4, hence not marking `gcc4-warnings' as fixed just yet.
[originally from svn r7041]
middle of a PDF. So here's a modification to sshzlib.c which enables
it to be compiled into a standalone Zlib decoder if you define
ZLIB_STANDALONE. As an added bonus, it (both standalone and in
PuTTY) also validates the Zlib header, just to make sure someone
hasn't defined a new compression format.
[originally from svn r4657]
reading) in the zlib code when fed certain kinds of invalid data. As
a result, ssh.c now needs to be prepared for zlib_decompress_block
to return failure.
[originally from svn r3306]
malloc functions, which automatically cast to the same type they're
allocating the size of. Should prevent any future errors involving
mallocing the size of the wrong structure type, and will also make
life easier if we ever need to turn the PuTTY core code from real C
into C++-friendly C. I haven't touched the Mac frontend in this
checkin because I couldn't compile or test it.
[originally from svn r3014]
absent, and also (I think) all the frontend request functions (such
as request_resize) take a context pointer, so that multiple windows
can be handled sensibly. I wouldn't swear to this, but I _think_
that only leaves the Unicode stuff as the last stubborn holdout.
[originally from svn r2147]
uncompressed block at the end of each compressed packet) which we
were embarrassingly unable to deal with because we assumed every
uncompressed block contained at least one byte. Particularly silly
because I _knew_ about the existence of sync flush when I coded this
module. Arrgh. Still, now fixed.
[originally from svn r1824]
compression. This involves introducing an option to disable Zlib
compression (that is, continue to work within the Zlib format but
output an uncompressed block) for the duration of a single packet.
[originally from svn r982]
smalloc() macros and thence to the safemalloc() functions in misc.c.
This should allow me to plug in a debugging allocator and track
memory leaks and segfaults and things.
[originally from svn r818]