2022-03-12 16:01:21 +00:00
/*
 * dup_wc_to_mb: memory-allocating wrapper on put_wc_to_mb.
 *
 * Also dup_wc_to_mb_c: same but you already know the length of the
 * wide string, and you get told the length of the returned string.
 * (But it's still NUL-terminated, for convenience.)
 *
 * Blame annotation for the lines above (commit of 2022-11-25 12:57:43
 * +00:00):
 *
 *   Add UTF-8 support to the new Windows ConsoleIO system.
 *
 *   This allows you to set a flag in conio_setup() which causes the
 *   returned ConsoleIO object to interpret all its output as UTF-8, by
 *   translating it to UTF-16 and using WriteConsoleW to write it in
 *   Unicode. Similarly, input is read using ReadConsoleW and decoded
 *   from UTF-16 to UTF-8.
 *
 *   This flag is set to false in most places, to avoid making sudden
 *   breaking changes. But when we're about to present a prompts_t to
 *   the user, it's set from the new 'utf8' flag in that prompt, which
 *   in turn is set by the userauth layer in any case where the prompts
 *   are going to the server.
 *
 *   The idea is that this should be the start of a fix for the
 *   long-standing character-set handling bug that strings transmitted
 *   during SSH userauth (usernames, passwords, keyboard-interactive
 *   prompts and responses) are all supposed to be in UTF-8, but we've
 *   always encoded them in whatever our input system happens to be
 *   using, and not done any tidying up on them. We get occasional
 *   complaints about this from users whose passwords contain
 *   characters that are encoded differently between UTF-8 and their
 *   local encoding, but I've never got round to fixing it because it's
 *   a large piece of engineering.
 *
 *   Indeed, this isn't nearly the end of it. The next step is to add
 *   UTF-8 support to all the _other_ ways of presenting a prompts_t,
 *   as best we can.
 *
 *   Like the previous change to console handling, it seems very likely
 *   that this will break someone's workflow. So there's a fallback
 *   command-line option '-legacy-charset-handling' to revert to
 *   PuTTY's previous behaviour.
 */

#include <wchar.h>

#include "putty.h"
#include "misc.h"

char *dup_wc_to_mb_c(int codepage, const wchar_t *string,
                     size_t inlen, const char *defchr, size_t *outlen_p)
{
    /*
     * Blame annotation for this function body (commit of 2024-09-24
     * 07:18:48 +00:00):
     *
     *   Rework Unicode conversion APIs to use a BinarySink.
     *
     *   The previous mb_to_wc and wc_to_mb had horrible and also buggy
     *   APIs. This commit introduces a fresh pair of functions to
     *   replace them, which generate output by writing to a
     *   BinarySink. So it's now up to the caller to decide whether it
     *   wants the output written to a fixed-size buffer with overflow
     *   checking (via buffer_sink), or dynamically allocated, or even
     *   written directly to some other output channel.
     *
     *   Nothing uses the new functions yet. I plan to migrate things
     *   over in upcoming commits.
     *
     *   What was wrong with the old APIs: they had that awkward
     *   undocumented Windows-specific 'flags' parameter that I
     *   described in the previous commit and took out of the
     *   dup_X_to_Y wrappers. But much worse, the semantics for buffer
     *   overflow were not just undocumented but actually inconsistent.
     *   dup_wc_to_mb() in utils assumed that the underlying wc_to_mb
     *   would fill the buffer nearly full and return the size of data
     *   it wrote. In fact, this was untrue in the case where wc_to_mb
     *   called WideCharToMultiByte: that returns straight-up failure,
     *   setting the Windows error code to ERROR_INSUFFICIENT_BUFFER.
     *   It _does_ partially fill the output buffer, but doesn't tell
     *   you how much it wrote!
     *
     *   What's wrong with the new API: it's a bit awkward to write a
     *   sequence of wchar_t in native byte order to a byte-oriented
     *   BinarySink, so people using put_mb_to_wc directly have to do
     *   some annoying pointer casting. But I think that's less
     *   horrible than the previous APIs.
     *
     *   Another change: in the new API for wc_to_mb, defchr can be "",
     *   but not NULL.
     */
    strbuf *sb = strbuf_new();
    put_wc_to_mb(sb, codepage, string, inlen, defchr);
    /* Record the length first: strbuf_to_str frees sb and returns
     * only the NUL-terminated string. */
    if (outlen_p)
        *outlen_p = sb->len;
    return strbuf_to_str(sb);
}

char *dup_wc_to_mb(int codepage, const wchar_t *string,
                   const char *defchr)
{
    return dup_wc_to_mb_c(codepage, string, wcslen(string), defchr, NULL);
}
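
The pair of wrappers above follows a common pattern: a conversion routine appends bytes to a growable sink, a length-aware wrapper returns the allocated buffer and its length, and a thin wrapper supplies wcslen for NUL-terminated input. The following is a minimal standalone sketch of that pattern, not PuTTY's implementation. Every name in it (bytebuf, bytebuf_put, put_utf8, put_wide_to_utf8, dup_wide_to_utf8_c, dup_wide_to_utf8) is hypothetical. As simplifying assumptions it hard-codes UTF-8 output instead of taking a codepage, and it substitutes U+FFFD for unencodable input instead of taking a defchr argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical growable byte buffer, standing in for a strbuf-like sink. */
typedef struct {
    char *data;
    size_t len, cap;
} bytebuf;

/* Append one byte, doubling the allocation whenever it is full. */
static void bytebuf_put(bytebuf *b, unsigned long byte)
{
    if (b->len == b->cap) {
        size_t newcap = b->cap ? b->cap * 2 : 16;
        char *p = realloc(b->data, newcap);
        if (!p)
            abort();            /* a sketch may simply give up on OOM */
        b->data = p;
        b->cap = newcap;
    }
    b->data[b->len++] = (char)(unsigned char)byte;
}

/* Encode one Unicode code point (already validated) as UTF-8. */
static void put_utf8(bytebuf *b, unsigned long cp)
{
    if (cp < 0x80) {
        bytebuf_put(b, cp);
    } else if (cp < 0x800) {
        bytebuf_put(b, 0xC0 | (cp >> 6));
        bytebuf_put(b, 0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        bytebuf_put(b, 0xE0 | (cp >> 12));
        bytebuf_put(b, 0x80 | ((cp >> 6) & 0x3F));
        bytebuf_put(b, 0x80 | (cp & 0x3F));
    } else {
        bytebuf_put(b, 0xF0 | (cp >> 18));
        bytebuf_put(b, 0x80 | ((cp >> 12) & 0x3F));
        bytebuf_put(b, 0x80 | ((cp >> 6) & 0x3F));
        bytebuf_put(b, 0x80 | (cp & 0x3F));
    }
}

/* Convert inlen wide characters, writing the result to the sink.
 * UTF-16 surrogate pairs are combined, so this also works where
 * wchar_t is 16 bits wide. Invalid input becomes U+FFFD. */
static void put_wide_to_utf8(bytebuf *b, const wchar_t *ws, size_t inlen)
{
    for (size_t i = 0; i < inlen; i++) {
        unsigned long cp = (unsigned long)ws[i];
        if (cp >= 0xD800 && cp < 0xDC00 && i + 1 < inlen) {
            unsigned long lo = (unsigned long)ws[i + 1];
            if (lo >= 0xDC00 && lo < 0xE000) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i++;
            }
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp < 0xE000))
            cp = 0xFFFD;        /* out of range or lone surrogate */
        put_utf8(b, cp);
    }
}

/* Length-aware wrapper: the caller frees the result with free().
 * The result is NUL-terminated, but *outlen_p excludes that NUL. */
char *dup_wide_to_utf8_c(const wchar_t *ws, size_t inlen, size_t *outlen_p)
{
    bytebuf b = { NULL, 0, 0 };
    assert(ws != NULL || inlen == 0);
    put_wide_to_utf8(&b, ws, inlen);
    if (outlen_p)
        *outlen_p = b.len;
    bytebuf_put(&b, 0);         /* terminator, also allocates if empty */
    return b.data;
}

/* Convenience wrapper for NUL-terminated input. */
char *dup_wide_to_utf8(const wchar_t *ws)
{
    return dup_wide_to_utf8_c(ws, wcslen(ws), NULL);
}
```

Reporting the length separately matters when the input contains embedded NUL characters: the returned buffer is still NUL-terminated, but only the reported length says where the converted data really ends.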