From a216d86106d40c38f05f1ffc03996be54d590aa6 Mon Sep 17 00:00:00 2001
From: Simon Tatham
Date: Tue, 31 May 2022 13:13:57 +0100
Subject: [PATCH] Windows mb_to_wc: support internal SBCSes.

A user points out that the new charset-aware window title setting
doesn't work if the configured character set is one of the entries in
cp_list[] based on a hard-coded Unicode translation table, such as the
ISO 8859 family.

That's because the Windows mb_to_wc() function assumes that the code
page it's given will always be OK to pass to the Windows API function
MultiByteToWideChar, forgetting that the internally implemented
single-byte character sets are not.

This commit adds a manual implementation of SBCS -> Unicode based on
those tables, which restores the ability to set a window title
specified in ISO 8859.

However, it's not a full fix to windows/unicode.c in general, because
wc_to_mb has a similar blind spot: it's only prepared to convert
Unicode to an internally implemented SBCS if that SBCS happens to be
the one currently set in ucsdata->line_codepage, because that's when
we've already prepared the reverse lookup table.

Probably we ought to sort that out, and arrange that it can make the
reverse lookup table if suddenly called on to do a different
conversion. But that needs more refactoring, so I haven't done it in
this commit.
---
 windows/unicode.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/windows/unicode.c b/windows/unicode.c
index 943c3c2d..c507a941 100644
--- a/windows/unicode.c
+++ b/windows/unicode.c
@@ -1240,6 +1240,35 @@ int wc_to_mb(int codepage, int flags, const wchar_t *wcstr, int wclen,
 int mb_to_wc(int codepage, int flags, const char *mbstr, int mblen,
              wchar_t *wcstr, int wclen)
 {
+    if (codepage >= 65536) {
+        /* Character set not known to Windows, so we'll have to
+         * translate it ourself */
+        size_t index = codepage - 65536;
+        if (index >= lenof(cp_list))
+            return 0;
+        const struct cp_list_item *cp = &cp_list[index];
+        if (!cp->cp_table)
+            return 0;
+
+        size_t remaining = wclen;
+        wchar_t *p = wcstr;
+        unsigned tablebase = 256 - cp->cp_size;
+
+        while (mblen > 0) {
+            mblen--;
+            unsigned c = 0xFF & *mbstr++;
+            wchar_t wc = (c < tablebase ? c : cp->cp_table[c - tablebase]);
+            if (remaining > 0) {
+                remaining--;
+                *p++ = wc;
+            } else {
+                return p - wcstr;
+            }
+        }
+
+        return p - wcstr;
+    }
+
     int ret = MultiByteToWideChar(codepage, flags, mbstr, mblen, wcstr, wclen);
     if (ret)
         return ret;
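
Illustrative note, not part of the patch: the commit message describes
a table-driven SBCS -> Unicode conversion and mentions that the reverse
direction still lacks an on-demand lookup. The standalone sketch below
shows one way both directions could look. Everything in it is an
assumption made for illustration: struct sbcs_table, toy_table, toy_cp,
sbcs_to_wc and wc_to_sbcs are hypothetical names, the four-entry table
is invented rather than taken from cp_list[], and the reverse direction
uses a plain linear search instead of the prepared reverse lookup table
that the real wc_to_mb relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for one cp_list[] entry: the table covers only
 * the top `size` byte values; bytes below 256 - size map to themselves. */
struct sbcs_table {
    const wchar_t *table;
    unsigned size;
};

/* Invented toy table for bytes 0xFC..0xFF (not a real character set). */
static const wchar_t toy_table[4] = { 0x0401, 0x0402, 0x20AC, 0x00FF };
static const struct sbcs_table toy_cp = { toy_table, 4 };

/* SBCS -> Unicode, following the same idea as the loop in the patch.
 * Returns the number of wide characters written, stopping early if the
 * output buffer fills up. */
static size_t sbcs_to_wc(const struct sbcs_table *cp,
                         const char *mbstr, size_t mblen,
                         wchar_t *wcstr, size_t wclen)
{
    assert(cp->size <= 256);
    unsigned tablebase = 256 - cp->size;
    size_t n = 0;
    while (mblen > 0 && n < wclen) {
        unsigned c = (unsigned char)*mbstr++;
        mblen--;
        wcstr[n++] = (c < tablebase ? (wchar_t)c : cp->table[c - tablebase]);
    }
    return n;
}

/* Unicode -> SBCS by linear search of the same table. Characters with
 * no mapping are replaced by defchr. Returns the number of bytes written. */
static size_t wc_to_sbcs(const struct sbcs_table *cp,
                         const wchar_t *wcstr, size_t wclen,
                         char *mbstr, size_t mblen, char defchr)
{
    assert(cp->size <= 256);
    unsigned tablebase = 256 - cp->size;
    size_t n = 0;
    while (wclen > 0 && n < mblen) {
        wchar_t wc = *wcstr++;
        wclen--;
        char out = defchr;
        if ((unsigned long)wc < tablebase) {
            out = (char)wc;            /* identity range below the table */
        } else {
            for (unsigned i = 0; i < cp->size; i++) {
                if (cp->table[i] == wc) {
                    out = (char)(unsigned char)(tablebase + i);
                    break;
                }
            }
        }
        mbstr[n++] = out;
    }
    return n;
}
```

The linear search costs up to cp->size comparisons per character, which
is likely acceptable for short strings such as a window title; a
prebuilt reverse table would be the better choice for bulk conversion.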