In both stripctrl_locale_put_wc() and stripctrl_term_put_wc(), we get
a character width from either wcwidth or term_char_width, which might
be -1 for a control character. Coverity complained that that width
value is later passed to to stripctrl_check_line_limit, which takes a
size_t parameter, i.e. it expects the width to be non-negative.
This is another bug that doesn't actually cause damage, but the
reasons why are quite fiddly:
- The only width<0 characters we let through are those that got
through stripctrl_ctrlchar_ok. (At both call sites, if that
function is unhappy then we either take an early return or else
replace the character _and_ the value of 'width' with a
substitution.)
- stripctrl_ctrlchar_ok only lets through \n, or \r if permit_cr is
set.
- stripctrl_check_line_limit ignores its width parameter if line
limiting is turned off, or if it gets \n.
- So the only way this can cause a problem is if permit_cr is set so
that \r can get through ctrlchar_ok, *and* line limiting is turned
on so that we get far enough through check_line_limit to choke on
its negative width.
- But in fact all the call sites that ever call
stripctrl_enable_line_limiting do it with a StripCtrlChars that
they got from a seat, and all the nontrivial implementations of
seat_stripctrl_new pass false for permit_cr.
So the combination of circumstances in which -1 gets implicitly
converted to size_t *and* stripctrl_check_line_limit actually pays
attention to it can never arise. But it's clearly foolish to make the
correctness of the code depend on a proof that long - now Coverity has
pointed it out, I'm not happy with it either! Fixed by substituting 0
for the width in the questionable cases.
Coverity points out that wcwidth is capable of returning a negative
number, which suggests that it's a mistake to pass its return value
unchecked to stripctrl_check_line_limit.
This shouldn't cause a problem _in principle_, because by the time
we're making that call, we should already have ruled out the kind of
dangerous control character that might provoke that return value from
wcwidth. But in practice, I couldn't absolutely guarantee that
everyone's idea of what is or is not a control character agrees in
every detail, so I think Coverity is right to urge caution.
Fixed by calling wcwidth (or its wrapper term_char_width) up front,
and ensuring that any character provoking a negative return value is
included in the 'control characters to sanitise out' branch of the
preceding logic.
Now it can optionally check that output lines don't go beyond a
certain length (measured in terminal columns, via wcwidth, rather than
bytes or characters). In this mode, lines are prefixed with a
distinctive character (namely '|'), and if a line is too long, then it
is broken and the continuation line gets a different prefix ('>').
When StripCtrlChars is targeting a terminal, it asks the terminal to
call wcwidth on its behalf, so it can be sure to use the same idea as
the real terminal about which characters are wide (i.e. depending on
the configuration of ambiguous characters).
This mode isn't yet used anywhere.
Now instead of making a StripCtrlChars just for that function call, it
uses an existing one, pointing it at the output strbuf via
stripctrl_retarget.
This adds flexibility (now you can use the same convenient string-
sanitising function with a StripCtrl configured in any way you like)
and also saves pointless setting-up and tearing-down of identical sccs
all the time.
The existing call sites in PSCP and PSFTP now use a static
StripCtrlChars instance that was made at program startup.
stripctrl_retarget() points the StripCtrlChars at a new BinarySink, to
avoid having to pointlessly throw it away and make a new one all the
time.
Since that probably means the same scc is going to be reused for
processing a fresh data stream, we also don't want any character-set
conversion state hanging over from the previous stream, so we also
reset the state in the process. Just in case it's needed,
stripctrl_reset() is also provided to do that operation on its own.
If you use the new stripctrl_new_term() to construct a StripCtrlChars
instead of the existing stripctrl_new(), then the resulting object
will align itself with the character-set configuration of the Terminal
object you point it at. (In fact, it'll reuse the same actual
translation code, courtesy of the last few refactoring commits.) So it
will interpret things as control characters precisely if that Terminal
would also have done so.
The previous locale-based sanitisation is appropriate if you're
sending the sanitised output to an OS terminal device managed outside
this process - the LC_CTYPE setting has the best chance of knowing how
that terminal device will interpret a byte stream. But I want to start
using the same sanitisation system for data intended for PuTTY's own
internal terminal emulator, in which case there's no reason why
LC_CTYPE should be expected to match that terminal's configuration,
and no reason to need it to either since we can check the internal
terminal configuration directly.
One small bodge: stripctrl_new_term() is actually a macro, which
passes in the function pointer term_translate() to the underlying real
constructor. That's just so that console-only tools can link in
stripctrl.c without acquiring a dependency on terminal.c (similarly to
how we pass random_read in to the mp_random functions).
This is for sanitising output that's going to be sent to a terminal,
if you don't want it to be able to send arbitrary escape sequences and
thereby (for example) move the cursor back up to existing text on the
screen and overprint it confusingly.
It works using the standard C library: we convert to a wide-character
string and back, and then use wctype.h to spot control characters in
the intermediate form. This means its idea of the conversion character
set is locale-based rather than any of our own charset library's fixed
settings - which is what you want if the aim is to protect your local
terminal (which we assume the system locale represents accurately).
This also means that the sanitiser strips things that will _act_ as
control characters when sent to the local terminal, whether or not
they were intended as control characters by a server that might have
had a different character set in mind. Since the main aim is to
protect the local terminal rather than to faithfully replicate the
server's intention, I think that's the right criterion.
It only strips control characters at the charset-independent layer,
like backspace, carriage return and the escape character: wctype.h
classifies those as control characters, but classifies as printing all
of the more Unicode-specific controls like bidirectional overrides.
But that's enough to prevent cursor repositioning, for example.
stripctrl.c comes with a test main() of its own, which I wasn't able
to fold into testcrypt and put in the test suite because of its
dependence on the system locale - it wouldn't be guaranteed to work
the same way on different test systems anyway.
A knock-on build tweak: because you can feed data into this sanitiser
in chunks of arbitrary size, including partial multibyte chars, I had
to use mbrtowc() for the decoding, and that means that in the 'old'
Win32 builds I have to link against the Visual Studio C++ library as
well as the C library, because for some reason that's where mbrtowc
lived in VS2003.