Similarly to the one I just added for FontSpec: in a cross-platform
main source file, you don't really want to mess about with
per-platform ifdefs just to initialise a 'struct unicode_data' from a
Conf. But until now, you had to, because init_ucs had a different
prototype on Windows and Unix.
I plan to use this in future test programs. But an immediate positive
effect is that it removes the only platform-dependent call from
fuzzterm.c. So now that could be built on Windows too, given only an
appropriate cmake stanza. (Not that I have much idea if it's useful to
fuzz the terminal separately on multiple platforms, but it's nice to
know that it's possible if anyone does need to.)
Instead of maintaining a single sparse table mapping Unicode to the
currently selected code page, we now maintain a collection of such
tables mapping Unicode to any code page we've so far found a need to
work with, and we add code pages to that list as necessary, and never
throw them away (since there are a limited number of them).
This means that the wc_to_mb family of functions are effectively
stateless: they no longer depend on a 'struct unicode_data'
corresponding to the current terminal settings. So I've removed that
parameter from all of them.
This fills in the missing piece of yesterday's commit a216d86106:
now wc_to_mb too should be able to handle internally-implemented
character sets, by hastily making their reverse mapping table if it
doesn't already have it.
(That was only a _latent_ bug, because the only use of wc_to_mb in the
cross-platform or Windows code _did_ want to convert to the currently
selected code page, so the old strategy worked in that case. But there
was no protection against an unworkable use of it being added later.)
Checking various implementations of these functions against each
other, I noticed by eyeball review that some of the special cases in
mb_to_wc() never check the buffer limit at all. Yikes!
Fortunately, I think there's no vulnerability, because these special
cases are ones that write out at most one wide char per multibyte
char, and at all the call sites (including dup_mb_to_wc) we allocate
that much even for the first attempt. The only exception to that is
the call in key_event() in unix/window.c, which uses a fixed-size
output buffer, but its input will always be the data generated by an X
keystroke event. So that one can only overrun the buffer if an X key
event manages to translate into more than 32 wide characters of text -
and even if that does come up in some exotic edge case, it will at
least not be happening under _enemy_ control.
This gets rid of all those annoying 'win', 'ux' and 'gtk' prefixes
which made filenames annoying to type and to tab-complete. Also, as
with my other recent renaming sprees, I've taken the opportunity to
expand and clarify some of the names so that they're not such cryptic
abbreviations.