mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-09 17:38:00 +00:00

More Unicode samples for utf8.txt, most of which fail.

These samples all come from the 'emoji' parts of Unicode, although I
use the word a bit loosely because I'm not sure that flags count (they
have their own special system). But they're all things that ought to
display via a separate font, likely in colour.

The second line of this extra test already looks correct in PuTTY:
three code points each representing an emoji, for which wcwidth()
correctly reports that they occupy 2 cells each. On GTK, the emoji
even appear in colour; on Windows they come out in black and
white. (And I don't know what I can do to fix that; the problem is not
that I don't have any emoji font installed. I do.)

The first line consists of 'simpler' emoji in the sense of being more
common, but technically more complicated, because they're ordinary
Unicode characters such as U+2764 HEAVY BLACK HEART, modified into
emoji by U+FE0F VARIATION SELECTOR-16. This goes badly because
wcwidth() measures the primary character as having width 1 (which it
would do, by itself), and the variation selector as width 0 (also not
unreasonable), but the total is 1, where you'd like it to be 2. This
is also difficult to fix, because if we unilaterally changed it then
every curses-type library would mispredict the cursor position and
produce display corruption during partial screen redraws!
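
To make the arithmetic concrete, here is a minimal sketch of the per-code-point summing described above. It does not call the real wcwidth(): approx_wcwidth is a hypothetical stand-in built on Python's unicodedata tables, and it only approximates what a typical wcwidth() reports.

```python
import unicodedata

def approx_wcwidth(ch):
    """Rough per-code-point width. An approximation for illustration,
    not PuTTY's or libc's actual wcwidth()."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters take no cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide / Fullwidth
    return 1

def naive_width(s):
    # Sum the widths code point by code point, with no knowledge of
    # variation selectors changing the presentation of the base character.
    return sum(approx_wcwidth(ch) for ch in s)

heart_vs16 = "\u2764\ufe0f"   # HEAVY BLACK HEART + VARIATION SELECTOR-16
purple_heart = "\U0001F49C"   # a dedicated emoji code point

print(naive_width(heart_vs16))    # 1, although an emoji glyph wants 2 cells
print(naive_width(purple_heart))  # 2
```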

The third line uses a mechanism I've only found out about recently:
U+200D ZERO WIDTH JOINER glues together two code points that would
each be a valid emoji on its own, to make a single combined one. In
this case, WOMAN + PERSONAL COMPUTER ought to combine into a woman
using a computer. Again this doesn't work in PuTTY, which knows
nothing about ZWJ. But it comes out as expected in other tools viewing
this file, such as 'gedit', or Firefox.
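
As an illustration only, the ZWJ sequence on the third line can be taken apart like this. The sketch just lists the code points and splits on the joiner; it says nothing about how a terminal should render the result.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# WOMAN + ZWJ + PERSONAL COMPUTER, as on the third line of the test
sequence = "\U0001F469" + ZWJ + "\U0001F4BB"

for ch in sequence:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Each piece between joiners is a valid emoji on its own.
parts = sequence.split(ZWJ)
print(len(parts))  # 2
```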

The fourth line shows another complex emoji case: the WOMAN code point
is followed by U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2, and
another one is followed by U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6,
in each case selecting the woman's skin tone. PuTTY mishandles that
too, because it doesn't know that those should act as modifiers (again
because wcwidth gives them width 2 rather than 0), and so each one
occupies an extra two character cells.
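
A rough sketch of the difference, under the loud assumption that every code point passed in is one that wcwidth would report as width 2. The names is_skin_tone_modifier and cells_needed are hypothetical, and an isolated modifier with no preceding base emoji is not handled.

```python
# The five skin-tone modifiers occupy U+1F3FB..U+1F3FF.
MODIFIER_FIRST, MODIFIER_LAST = 0x1F3FB, 0x1F3FF

def is_skin_tone_modifier(ch):
    return MODIFIER_FIRST <= ord(ch) <= MODIFIER_LAST

def cells_needed(s, modifier_aware):
    """Cells for a string of wide emoji (assumption: each code point
    would be reported as width 2 by wcwidth)."""
    total = 0
    for ch in s:
        if modifier_aware and is_skin_tone_modifier(ch):
            continue  # the modifier recolours the previous emoji, no new cells
        total += 2
    return total

woman_light = "\U0001F469\U0001F3FB"  # WOMAN + FITZPATRICK TYPE-1-2

print(cells_needed(woman_light, modifier_aware=False))  # 4: what happens now
print(cells_needed(woman_light, modifier_aware=True))   # 2: what you'd like
```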

And the last line contains some sample flags, each of which is
obtained by writing a 2-letter code for a country or region (here GB,
UA, EU) with each Latin letter replaced by the appropriate 'regional
indicator symbol letter' from the 26-code-point range U+1F1E6 to
U+1F1FF inclusive. PuTTY doesn't know anything about those either, but
they at least occupy the right number of cells if handled naïvely, so
_that_ one might be possible to fix!
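
For reference, the letter-to-indicator mapping described above can be sketched as follows. The function name flag is hypothetical, and the sketch only builds the code point sequence; whether a given pair actually renders as a flag is up to the font.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag(code):
    """Map a 2-letter uppercase code such as 'GB' to its pair of
    regional indicator symbols."""
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two uppercase Latin letters")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)

for code in ("GB", "UA", "EU"):
    points = " ".join(f"U+{ord(ch):04X}" for ch in flag(code))
    print(code, points)
```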
Simon Tatham 2024-05-06 08:58:38 +01:00
parent 6b10eaa245
commit 640c7028f8

@@ -21,3 +21,10 @@ Arabic and bidirectional text:
Mixed LTR and RTL text: جرير رضي back to LTR.
East Asian Ambiguous characters: ¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾
Emoji via U+FE0F: ❤️ ☺️ ☹️ (narrow, because wcwidth mishandles these)
Dedicated emoji: 💜 🙂 🙁 (wide and should look correct)
Combined via ZWJ: 👩‍💻 (PuTTY doesn't understand ZWJ)
Skin tone mod: 👩🏻 👩🏿 (wcwidth doesn't know those are modifiers)
Flags: 🇬🇧 🇺🇦 🇪🇺 (also too complicated)