mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-25 01:02:24 +00:00
putty-source/test/utf8.txt


Test of UTF-8 output in a terminal emulator
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
Some basic Unicode:
∮ E⋅da = Q, n → ∞, ∑ f(i) = ∏ g(i), ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β),
ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ, ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (A ⇔ B),
Combining characters:
STARGΛ̊TE SG-1, a = v̇ = r̈, a⃑ ⊥ b⃑
[----------------------------|------------------------]
๏ แผ่นดินฮั่นเสื่อมโทรมแสนสังเวช พระปกเกศกองบู๊กู้ขึ้นใหม่
สิบสองกษัตริย์ก่อนหน้าแลถัดไป สององค์ไซร้โง่เขลาเบาปัญญา
Wide characters with difficult wrapping:
Here we go then: コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ コンニチハ
Arabic and bidirectional text:
(من مجمع الزوائد ومنبع الفوائد للهيثمي ، ج 1 ، ص 74-84)
عن جرير رضي الله عنه قال قال رسول الله صلى الله عليه
وسلم: بني الاسلام على خمس شهادة ان لا اله الا الله واقام
Mixed LTR and RTL text: جرير رضي back to LTR.
East Asian Ambiguous characters: ¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾¼½¾
More Unicode samples for utf8.txt, most of which fail.

These samples all come from the 'emoji' parts of Unicode, although I use the word a bit loosely because I'm not sure that flags count (they have their own special system). But they're all things that ought to display via a separate font, likely in colour.

The second line of this extra test already looks correct in PuTTY: three code points each representing an emoji, for which wcwidth() correctly reports that they occupy 2 cells each. On GTK, the emoji even appear in colour; on Windows they come out in black and white. (And I don't know what I can do to fix that; the problem is not that I don't have any emoji font installed. I do.)

The first line consists of 'simpler' emoji in the sense of being more common, but technically more complicated, because they're ordinary Unicode characters such as U+2764 HEAVY BLACK HEART, modified into emoji by U+FE0F VARIATION SELECTOR-16. This goes badly because wcwidth() measures the primary character as having width 1 (which it would do, by itself), and the variation selector as width 0 (also not unreasonable), but the total is 1, where you'd like it to be 2. This is also difficult to fix, because if we unilaterally changed it then every curses-type library would mispredict the cursor position and produce display corruption during partial screen redraws!

The third line uses a mechanism I've only found out about recently: U+200D ZERO WIDTH JOINER glues together two code points that would each be a valid emoji on its own, to make a single combined one. In this case, WOMAN + PERSONAL COMPUTER ought to combine into a woman using a computer. Again this doesn't work in PuTTY, which knows nothing about ZWJ. But it comes out as expected in other tools viewing this file, such as 'gedit', or Firefox.
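The width arithmetic described above can be reproduced with a small Python sketch. This is only an approximation built on Python's unicodedata tables, not PuTTY's code or any real wcwidth(); the names cell_width and naive_width are made up for this illustration, and the exact tables a given system uses may differ.

```python
import unicodedata

def cell_width(ch):
    """Approximate a wcwidth()-style width for one code point.

    Assumption: this only mimics the usual rules; it is not a real wcwidth().
    """
    # Nonspacing marks, enclosing marks and format characters take no cells.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def naive_width(text):
    """Sum the per-code-point widths, ignoring any sequence structure."""
    return sum(cell_width(ch) for ch in text)

heart_vs16 = "\u2764\ufe0f"                  # HEAVY BLACK HEART + VARIATION SELECTOR-16
purple_heart = "\U0001F49C"                  # a dedicated emoji code point
woman_zwj_pc = "\U0001F469\u200d\U0001F4BB"  # WOMAN + ZWJ + PERSONAL COMPUTER

print(naive_width(heart_vs16))    # 1 + 0: one cell, where two would be wanted
print(naive_width(purple_heart))  # two cells, matching the rendered emoji
print(naive_width(woman_zwj_pc))  # 2 + 0 + 2: two separate wide emoji
```

Summing code point by code point is exactly why the U+FE0F line comes out narrow and the ZWJ line comes out as two emoji side by side.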

The fourth line shows another complex emoji case: the WOMAN code point is followed by U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2, and another one is followed by U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6, in each case selecting the woman's skin tone. PuTTY mishandles that too, because it doesn't know that those should act as modifiers (again because wcwidth() gives them width 2 rather than 0), and so each one occupies an extra two character cells.

And the last line contains some sample flags, each of which is obtained by writing a 2-letter code for a country or region (here GB, UA, EU) with each Latin letter replaced by the appropriate 'regional indicator symbol letter' from the 26-code-point range U+1F1E6 to U+1F1FF inclusive. PuTTY doesn't know anything about those either, but they at least occupy the right number of cells if handled naïvely, so _that_ one might be possible to fix!
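The letter-to-regional-indicator mapping described above is simple enough to sketch in a few lines of Python. The helper name flag is hypothetical and exists only for this example; it performs the arithmetic on the U+1F1E6 to U+1F1FF range and does not check whether the code names a real country or region.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag(code):
    """Turn a 2-letter code such as "GB" into its regional-indicator pair."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected exactly two Latin letters")
    # Each letter A..Z maps to the code point at the same offset from U+1F1E6.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)

for code in ("GB", "UA", "EU"):
    pair = flag(code)
    print(code, " ".join(f"U+{ord(ch):04X}" for ch in pair))
# GB U+1F1EC U+1F1E7
# UA U+1F1FA U+1F1E6
# EU U+1F1EA U+1F1FA
```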
2024-05-06 07:58:38 +00:00
Emoji via U+FE0F: ❤️ ☺️ ☹️ (narrow, because wcwidth mishandles these)
Dedicated emoji: 💜 🙂 🙁 (wide and should look correct)
Combined via ZWJ: 👩‍💻 (PuTTY doesn't understand ZWJ)
Skin tone mod: 👩🏻 👩🏿 (wcwidth doesn't know those are modifiers)
Flags: 🇬🇧 🇺🇦 🇪🇺 (also too complicated)
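
For comparison, here is a rough Python sketch of a sequence-aware measure that reports the widths one would like for the five emoji lines above. It is an assumption-laden illustration, not anything PuTTY implements: sequence_width and its helpers are invented names, the rules cover only the four mechanisms exercised in this file, and a terminal adopting such rules unilaterally would disagree with curses-type libraries about the cursor position.

```python
import unicodedata

ZWJ = 0x200D   # ZERO WIDTH JOINER
VS16 = 0xFE0F  # VARIATION SELECTOR-16

def base_width(cp):
    """Per-code-point width approximated from Python's unicodedata tables."""
    ch = chr(cp)
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def is_skin_tone(cp):
    return 0x1F3FB <= cp <= 0x1F3FF  # EMOJI MODIFIER FITZPATRICK range

def is_regional(cp):
    return 0x1F1E6 <= cp <= 0x1F1FF  # regional indicator symbol letters

def sequence_width(text):
    """Hypothetical width that treats each emoji sequence as one wide glyph."""
    total = 0
    last = 0            # width of the most recent visible base character
    joined = False      # True immediately after a ZWJ
    pending_ri = False  # True after an unpaired regional indicator
    for cp in map(ord, text):
        if cp == ZWJ:
            joined = True
        elif cp == VS16:
            if last == 1:        # promote a narrow base to emoji width
                total += 1
                last = 2
        elif joined:
            joined = False       # absorbed into the preceding emoji
        elif is_skin_tone(cp) and last:
            pass                 # modifies the preceding emoji, adds no cells
        elif is_regional(cp):
            if not pending_ri:   # first half of a flag pair takes both cells
                total += 2
                last = 2
            pending_ri = not pending_ri
        else:
            pending_ri = False
            w = base_width(cp)
            total += w
            if w:
                last = w
    return total

samples = ["\u2764\ufe0f", "\U0001F49C", "\U0001F469\u200d\U0001F4BB",
           "\U0001F469\U0001F3FB", "\U0001F1EC\U0001F1E7"]
for s in samples:
    print(sequence_width(s))  # each sample measures 2 under these rules
```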