UTF-16 support. High Unicode characters in the terminal are now
converted back into surrogates during copy and draw operations, and
the Windows drawing code takes account of that when splitting up the
UTF-16 string for display. Meanwhile, accidental uses of wchar_t have
been replaced with 32-bit integers in parts of the cross-platform code
which were expecting not to have to deal with UTF-16.
[originally from svn r9409]
easily manage, by adopting a hybrid approach to Unicode text
display. The old approach of simply calling ExtTextOutW provided
font linking without us having to lift a finger, but didn't do the
right thing when it came to bidirectional or Arabic-shaped text.
Arabeyes' replacement exact_textout() supported the latter, but
turned out to break the former (with no warning from the Windows API
documentation, so it's not their fault).
So now I've got a second wrapper layer called general_textout(),
which splits the input string into substrings based on bidi
character class. Any character liable to cause bidi or shaping
behaviour if fed straight to ExtTextOutW is instead fed through
Arabeyes' exact_textout(), but the rest is fed straight to
ExtTextOutW as it used to be.
The effect appears to be that font linking is restored for all
characters _except_ Arabic and other bidi scripts, which means in
particular that we are no longer in a state of regression over 0.57.
(0.57 would have done font linking on Arabic as well, but would also
have misbidied it, so we've merely exchanged one failure mode for
another slightly less harmful one in that situation.)
[originally from svn r6910]
incorrect. I must have written that binary search idiom a hundred
times, so it's rather embarrassing that I can't _automatically_ get
it right! This was causing all kinds of characters to be classified
as ON when they should have been various other classes.
Also while I'm here, I've added another test case to utf8.txt (a
small piece of Arabic within a predominantly L->R line), and also
supplied a means to compile minibidi.c with -DTEST_GETTYPE to
produce a command-line character class lookup tool. (Not sure what
use that'll be _other_ than debugging this precise problem, but I
don't like to throw it away now I've written it :-)
[originally from svn r5016]
searches a list of (start,end,type) tuples. This increases data size
by about 5Kb, which is a shame; but on the plus side, it boosts
performance from O(N) to O(log N). As an added bonus, the table now
covers _all_ of Unicode, not just the BMP.
[originally from svn r4964]
- rewrote the reversal loop in flipThisRun to be considerably clearer
- rewrote leastGreaterOdd and leastGreaterEven as bit-twiddling macros
- replaced malloc/free with snewn/sfree
- lost some gratuitous repeat calls of getType on the same character
And most noticeably:
- got rid of minibidi.h, since it was entirely full of minibidi.c
internals (including constant data definitions!) and wasn't used
to provide an external interface at all. Everything in it has
been folded into minibidi.c.
[originally from svn r4963]