mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-25 01:02:24 +00:00
putty-source/test/utf8.txt

Test of UTF-8 output in a terminal emulator
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
Some basic Unicode:
∮ E⋅da = Q, n → ∞, ∑ f(i) = ∏ g(i), ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β),
⊆ ℕ₀ ⊂ , ⊥ < a ≠ b ≡ c ≤ d ≪ ⇒ (A ⇔ B),
Combining characters:
STARGΛ̊TE SG-1, a = v̇ = r̈, a⃑ ⊥ b⃑
[----------------------------|------------------------]
๏ แผ่นดินฮั่นเสื่อมโทรมแสนสังเวช พระปกเกศกองบู๊กู้ขึ้นใหม่
สิบสองกษัตริย์ก่อนหน้าแลถัดไป สององค์ไซร้โง่เขลาเบาปัญญา
Wide characters with difficult wrapping:
Here we go then: コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ
Arabic and bidirectional text:
(من مجمع الزوائد ومنبع الفوائد للهيثمي ، ج 1 ، ص 74-84)
عن جرير رضي الله عنه قال قال رسول الله صلى الله عليه
وسلم: بني الاسلام على خمس شهادة ان لا اله الا الله واقام
Mixed LTR and RTL text: جرير رضي back to LTR.
East Asian Ambiguous characters: ¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾
More Unicode samples for utf8.txt, most of which fail.

These samples all come from the 'emoji' parts of Unicode, although I use the word a bit loosely because I'm not sure that flags count (they have their own special system). But they're all things that ought to display via a separate font, likely in colour.

The second line of this extra test already looks correct in PuTTY: three code points each representing an emoji, for which wcwidth() correctly reports that they occupy 2 cells each. On GTK, the emoji even appear in colour; on Windows they come out in black and white. (And I don't know what I can do to fix that; the problem is not that I don't have any emoji font installed. I do.)

The first line consists of 'simpler' emoji in the sense of being more common, but technically more complicated, because they're ordinary Unicode characters such as U+2764 HEAVY BLACK HEART, modified into emoji by U+FE0F VARIATION SELECTOR-16. This goes badly because wcwidth() measures the primary character as having width 1 (which it would do, by itself), and the variation selector as width 0 (also not unreasonable), but the total is 1, where you'd like it to be 2. This is also difficult to fix, because if we unilaterally changed it then every curses-type library would mispredict the cursor position and produce display corruption during partial screen redraws!

The third line uses a mechanism I've only found out about recently: U+200D ZERO WIDTH JOINER glues together two code points that would each be a valid emoji on its own, to make a single combined one. In this case, WOMAN + PERSONAL COMPUTER ought to combine into a woman using a computer. Again this doesn't work in PuTTY, which knows nothing about ZWJ. But it comes out as expected in other tools viewing this file, such as 'gedit', or Firefox.

The fourth line shows another complex emoji case: the WOMAN code point is followed by U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2, and another one is followed by U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6, in each case selecting the woman's skin tone. PuTTY mishandles that too, because it doesn't know that those should act as modifiers (again because wcwidth gives them width 2 rather than 0), and so each one occupies an extra two character cells.

And the last line contains some sample flags, each of which is obtained by writing a 2-letter code for a country or region (here GB, UA, EU) with each Latin letter replaced by the appropriate 'regional indicator symbol letter' from the 26-code-point range U+1F1E6 to U+1F1FF inclusive. PuTTY doesn't know anything about those either, but they at least occupy the right number of cells if handled naïvely, so _that_ one might be possible to fix!
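
The width arithmetic described above can be reproduced with a rough sketch. This is not PuTTY's code and not the real wcwidth() table: naive_cell_width() is a hypothetical stand-in that guesses a per-code-point width from Python's unicodedata module, on the assumption that combining marks and format characters take 0 cells, East Asian Wide/Fullwidth characters take 2, and everything else takes 1. Summing those per-code-point guesses shows why sequences go wrong while lone emoji and flags do not.

```python
import unicodedata

def naive_cell_width(ch: str) -> int:
    """Rough per-code-point width: an assumed stand-in for wcwidth()."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks, variation selectors, ZWJ
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide/fullwidth, which covers most dedicated emoji
    return 1

def naive_string_width(s: str) -> int:
    """Sum the per-code-point widths, as a naive terminal would."""
    return sum(naive_cell_width(ch) for ch in s)

# One sample per line of the emoji test; each is meant to fill 2 cells.
samples = {
    "heart + VS16": "\u2764\ufe0f",
    "purple heart": "\U0001F49C",
    "woman ZWJ computer": "\U0001F469\u200d\U0001F4BB",
    "woman + skin tone": "\U0001F469\U0001F3FB",
    "flag GB": "\U0001F1EC\U0001F1E7",
}

for name, text in samples.items():
    print(f"{name}: {naive_string_width(text)} cells (want 2)")
```

With reasonably recent Unicode data, the VS16 sample should come out one cell too narrow, the ZWJ and skin-tone samples two cells too wide, and the dedicated emoji and the flag pair at the wanted width of 2.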
2024-05-06 07:58:38 +00:00
Emoji via U+FE0F: ❤️ ☺️ ☹️ (narrow, because wcwidth mishandles these)
Dedicated emoji: 💜 🙂 🙁 (wide and should look correct)
Combined via ZWJ: 👩‍💻 (PuTTY doesn't understand ZWJ)
Skin tone mod: 👩🏻 👩🏿 (wcwidth doesn't know those are modifiers)
Support Unicode flag glyphs in terminal.c (works in GTK).

This is the only one of the newly added cases in test/utf8.txt which I can (try to) fix unilaterally just by changing PuTTY's display code, because it doesn't change the number of character cells occupied by the text, only the appearance of those cells.

In this commit I make the necessary changes in terminal.c, which makes flags start working in GTK PuTTY and pterm, but not on Windows.

The system of encoding flags in Unicode is that there's a space of 26 regional-indicator letter code points (U+1F1E6 to U+1F1FF inclusive) corresponding to the unaccented Latin alphabet, and an adjacent pair of those letters represents the flag associated with that two-letter code (usually a nation, although at least one non-nation pair exists, namely EU).

There are two plausible ways we could handle this in terminal.c:

 (a) leave the regional indicators as they are in the internal data model, so that each RI letter occupies its own character cell, and at display time have do_paint() spot adjacent pairs of them and send each pair to the frontend as a combined glyph.

 (b) combine the pairs _in_ the internal data model, by special-casing them in term_display_graphic_char().

This choice makes a semantic difference. What if a flag is displayed in the terminal and something overprints one of its two character cells? With option (a), overprinting one cell of an RI pair with a different RI letter would change it into a different flag; with option (b), flags behave like any other wide character, in that overprinting one of the two cells blanks the other as a side effect.

I think we need (a), because not all terminal redraw systems (curses-style libraries) will understand the Unicode flag glyph system at all.
So if a full-screen application genuinely wants to do a screen redraw in which a flag changes to a different flag while keeping one of its constituent letters the same (say, swapping between BA and CA, or between AC and AD), then the redraw library might very well implement that screen update by redrawing only the changed letter, and we must not corrupt the flag.

All of this is now implemented in terminal.c. The effect is that pairs of RI characters are passed to the TermWin draw_text() method as if they were a wide character with a combining mark: that is, you get a two-character (or four-surrogate) string, with TATTR_COMBINING indicating that it represents a single glyph, and ATTR_WIDE indicating that that glyph occupies two character cells rather than one.

In GTK, that's enough to make flag display Just Work. But on Windows (at least the Win10 machine I have to test on), that doesn't make flags start working all by itself. But then, the rest of the new emoji tests look a bit confused on Windows too. Help would be welcome from someone who knows how Windows emoji display is supposed to work!
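
Option (a) can be illustrated with a small sketch. This is not the terminal.c implementation: ri(), is_ri() and paint_runs() are hypothetical names, and the greedy left-to-right pairing is an assumption about how adjacent pairs might be spotted. Each cell holds one code point, and pairs are only formed when the row is painted, so overprinting a single cell turns one flag into another.

```python
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # regional indicator letters A..Z

def ri(letter: str) -> str:
    """Map one ASCII capital letter to its regional indicator code point."""
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("expected one ASCII capital letter")
    return chr(RI_FIRST + ord(letter) - ord("A"))

def is_ri(ch: str) -> bool:
    return len(ch) == 1 and RI_FIRST <= ord(ch) <= RI_LAST

def paint_runs(cells):
    """Turn a row of one-code-point cells into (text, cell_count) draws.

    Adjacent regional indicators are paired greedily from the left and
    emitted as one two-cell glyph; the stored cells are left untouched.
    """
    runs, i = [], 0
    while i < len(cells):
        if is_ri(cells[i]) and i + 1 < len(cells) and is_ri(cells[i + 1]):
            runs.append((cells[i] + cells[i + 1], 2))
            i += 2
        else:
            runs.append((cells[i], 1))
            i += 1
    return runs

row = [ri("B"), ri("A"), " "]   # the BA flag, then a blank cell
before = paint_runs(row)
row[0] = ri("C")                # redraw library rewrites only the first cell
after = paint_runs(row)
print(before[0], after[0])      # BA pair, then CA pair
```

Under option (b) the same one-cell overprint would instead blank the other half of the flag, which is the behaviour the argument above is trying to avoid.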
2024-05-06 10:07:12 +00:00
Flags: 🇬🇧 🇺🇦 🇪🇺 (should work in GTK 2 or better)