If you use the new stripctrl_new_term() to construct a StripCtrlChars
instead of the existing stripctrl_new(), then the resulting object
will align itself with the character-set configuration of the Terminal
object you point it at. (In fact, it'll reuse the same actual
translation code, courtesy of the last few refactoring commits.) So it
will interpret things as control characters precisely if that Terminal
would also have done so.
The previous locale-based sanitisation is appropriate if you're
sending the sanitised output to an OS terminal device managed outside
this process - the LC_CTYPE setting has the best chance of knowing how
that terminal device will interpret a byte stream. But I want to start
using the same sanitisation system for data intended for PuTTY's own
internal terminal emulator, in which case there's no reason why
LC_CTYPE should be expected to match that terminal's configuration,
and no reason to need it to either since we can check the internal
terminal configuration directly.
One small bodge: stripctrl_new_term() is actually a macro, which
passes in the function pointer term_translate() to the underlying real
constructor. That's just so that console-only tools can link in
stripctrl.c without acquiring a dependency on terminal.c (similarly to
how we pass random_read in to the mp_random functions).
This is for sanitising output that's going to be sent to a terminal,
if you don't want it to be able to send arbitrary escape sequences and
thereby (for example) move the cursor back up to existing text on the
screen and overprint it confusingly.
It works using the standard C library: we convert to a wide-character
string and back, and then use wctype.h to spot control characters in
the intermediate form. This means its idea of the conversion character
set is locale-based rather than any of our own charset library's fixed
settings - which is what you want if the aim is to protect your local
terminal (which we assume the system locale represents accurately).
This also means that the sanitiser strips things that will _act_ as
control characters when sent to the local terminal, whether or not
they were intended as control characters by a server that might have
had a different character set in mind. Since the main aim is to
protect the local terminal rather than to faithfully replicate the
server's intention, I think that's the right criterion.
It only strips control characters at the charset-independent layer,
like backspace, carriage return and the escape character: wctype.h
classifies those as control characters, but classifies as printing all
of the more Unicode-specific controls like bidirectional overrides.
But that's enough to prevent cursor repositioning, for example.
stripctrl.c comes with a test main() of its own, which I wasn't able
to fold into testcrypt and put in the test suite because of its
dependence on the system locale - it wouldn't be guaranteed to work
the same way on different test systems anyway.
A knock-on build tweak: because you can feed data into this sanitiser
in chunks of arbitrary size, including partial multibyte chars, I had
to use mbrtowc() for the decoding, and that means that in the 'old'
Win32 builds I have to link against the Visual Studio C++ library as
well as the C library, because for some reason that's where mbrtowc
lived in VS2003.