putty-source/utils/dup_wc_to_mb.c

/*
 * dup_wc_to_mb: memory-allocating wrapper on wc_to_mb.
 *
 * Also dup_wc_to_mb_c: the same, but the caller supplies the length of
 * the wide string, and the length of the returned string is written to
 * *outlen_p if that pointer is non-NULL.
 * (The result is still NUL-terminated, for convenience.)
 */

#include <wchar.h>

#include "putty.h"
#include "misc.h"

char *dup_wc_to_mb_c(int codepage, const wchar_t *string,
                     size_t inlen, const char *defchr, size_t *outlen_p)
{
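    /* Accumulate the converted bytes in a dynamically growing strbuf. */
    strbuf *sb = strbuf_new();
    put_wc_to_mb(sb, codepage, string, inlen, defchr);
    /* Report the output length only if the caller asked for it. */
    if (outlen_p)
        *outlen_p = sb->len;
    /* Detach the data from the strbuf and return it as a plain string. */
    return strbuf_to_str(sb);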
}

char *dup_wc_to_mb(int codepage, const wchar_t *string,
                   const char *defchr)
{
    return dup_wc_to_mb_c(codepage, string, wcslen(string), defchr, NULL);
}
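
/*
 * Illustrative sketch only, not part of PuTTY: the same allocate-and-return
 * pattern written against standard C alone. The name sketch_dup_wc_to_mb_c
 * is hypothetical, and it converts to the current C locale's multibyte
 * encoding via wcrtomb() instead of taking a codepage, so it only
 * approximates what put_wc_to_mb does. As above, the caller passes the
 * input length explicitly, the output length is reported on request, and
 * the result is NUL-terminated. Substituting defchr for unencodable
 * characters is an assumption about the intended behaviour. The caller
 * releases the result with free().
 */

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for dup_wc_to_mb_c; uses the current C locale. */
char *sketch_dup_wc_to_mb_c(const wchar_t *string, size_t inlen,
                            const char *defchr, size_t *outlen_p)
{
    /* defchr may be "", but not NULL. */
    assert(defchr != NULL);

    size_t deflen = strlen(defchr);
    /* Largest number of bytes a single input character can produce. */
    size_t step = deflen > MB_LEN_MAX ? deflen : MB_LEN_MAX;
    size_t cap = 64, len = 0;
    char *out = malloc(cap);
    mbstate_t state;

    if (!out)
        return NULL;
    memset(&state, 0, sizeof(state));

    for (size_t i = 0; i < inlen; i++) {
        /* Keep room for one more character plus the trailing NUL. */
        if (cap - len < step + 1) {
            cap = cap * 2 + step + 1;
            char *grown = realloc(out, cap);
            if (!grown) {
                free(out);
                return NULL;
            }
            out = grown;
        }

        size_t n = wcrtomb(out + len, string[i], &state);
        if (n == (size_t)-1) {
            /* Not representable in this locale: substitute defchr. */
            memcpy(out + len, defchr, deflen);
            n = deflen;
            memset(&state, 0, sizeof(state));
        }
        len += n;
    }

    out[len] = '\0';            /* NUL-terminate for convenience */
    if (outlen_p)
        *outlen_p = len;        /* report the length only on request */
    return out;
}
```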