nhyatt/putty-source

mirror of https://git.tartarus.org/simon/putty.git synced 2025-07-01 19:42:48 -05:00

Commit Graph

Author SHA1 Message Date

Simon Tatham

f4519b6533

Add UTF-8 support to the new Windows ConsoleIO system.

This allows you to set a flag in conio_setup() which causes the
returned ConsoleIO object to interpret all its output as UTF-8, by
translating it to UTF-16 and using WriteConsoleW to write it in
Unicode. Similarly, input is read using ReadConsoleW and decoded from
UTF-16 to UTF-8.
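The output half of that translation can be sketched in portable C. This is only an illustration, not PuTTY's code: the function name utf8_to_utf16_sketch, its signature, and the choice to substitute U+FFFD for malformed input are all assumptions. On Windows, the resulting UTF-16 code units are the kind of buffer one would then hand to WriteConsoleW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from PuTTY): decode UTF-8 bytes into
 * UTF-16 code units. Returns the number of units written, or (size_t)-1
 * if the output buffer is too small. Malformed input becomes U+FFFD.
 */
size_t utf8_to_utf16_sketch(const char *in, size_t inlen,
                            uint16_t *out, size_t outcap)
{
    size_t i = 0, n = 0;
    assert(in != NULL && out != NULL);

    while (i < inlen) {
        unsigned char c = (unsigned char)in[i++];
        uint32_t cp;
        int len, k;   /* len = number of continuation bytes expected */

        if (c < 0x80)                    { cp = c;        len = 0; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; len = 1; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; len = 2; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; len = 3; }
        else                             { cp = 0xFFFD;   len = 0; }

        /* Fold in the continuation bytes (10xxxxxx). */
        for (k = 0; k < len; k++) {
            if (i < inlen && ((unsigned char)in[i] & 0xC0) == 0x80) {
                cp = (cp << 6) | ((unsigned char)in[i++] & 0x3F);
            } else {
                cp = 0xFFFD;   /* truncated or broken sequence */
                break;
            }
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if ((len == 2 && cp < 0x800) || (len == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;

        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            if (outcap - n < 2) return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (outcap - n < 1) return (size_t)-1;
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}
```

The input direction (ReadConsoleW followed by UTF-16 to UTF-8) would be the mirror image: combine surrogate pairs into one code point, then emit one to four bytes.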

This flag is set to false in most places, to avoid making sudden
breaking changes. But when we're about to present a prompts_t to the
user, it's set from the new 'utf8' flag in that prompt, which in turn
is set by the userauth layer in any case where the prompts are going
to the server.

The idea is that this should be the start of a fix for the
long-standing character-set handling bug that strings transmitted during
SSH userauth (usernames, passwords, k-i prompts and responses) are all
supposed to be in UTF-8, but we've always encoded them in whatever our
input system happens to be using, and not done any tidying up on them.
We get occasional complaints about this from users whose passwords
contain characters that are encoded differently between UTF-8 and
their local encoding, but I've never got round to fixing it because
it's a large piece of engineering.

Indeed, this isn't nearly the end of it. The next step is to add UTF-8
support to all the _other_ ways of presenting a prompts_t, as best we
can.

Like the previous change to console handling, it seems very likely
that this will break someone's workflow. So there's a fallback
command-line option '-legacy-charset-handling' to revert to PuTTY's
previous behaviour.

2022-11-26 10:49:03 +00:00

Simon Tatham

5a28658a6d

Remove uni_tbl from struct unicode_data.

Instead of maintaining a single sparse table mapping Unicode to the
currently selected code page, we now maintain a collection of such
tables mapping Unicode to any code page we've so far found a need to
work with, and we add code pages to that list as necessary, and never
throw them away (since there are a limited number of them).

This means that the wc_to_mb family of functions are effectively
stateless: they no longer depend on a 'struct unicode_data'
corresponding to the current terminal settings. So I've removed that
parameter from all of them.

This fills in the missing piece of yesterday's commit a216d86106:
now wc_to_mb too should be able to handle internally-implemented
character sets, by hastily making their reverse mapping table if it
doesn't already have it.

(That was only a _latent_ bug, because the only use of wc_to_mb in the
cross-platform or Windows code _did_ want to convert to the currently
selected code page, so the old strategy worked in that case. But there
was no protection against an unworkable use of it being added later.)
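The idea of a grow-only collection of reverse tables, built on first use, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the names (codepage_entry, get_reverse_table, wc_to_byte_sketch), the fixed cache size, the restriction to single-byte code pages, and passing the forward table in as a parameter are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CACHED_CODEPAGES 16   /* hypothetical limit for this sketch */

/* One cached reverse table per code page; entries are never freed. */
typedef struct {
    int id;             /* hypothetical code page identifier */
    int16_t *reverse;   /* 65536 entries: Unicode -> byte, or -1 */
} codepage_entry;

static codepage_entry cache[MAX_CACHED_CODEPAGES];
static size_t ncached = 0;

/*
 * Return the reverse table for a code page, building it on first use
 * from a 256-entry forward table (byte -> Unicode).
 */
static const int16_t *get_reverse_table(int id, const uint16_t *forward)
{
    size_t i;
    uint32_t u;
    int b;
    int16_t *rev;

    for (i = 0; i < ncached; i++)
        if (cache[i].id == id)
            return cache[i].reverse;   /* already built: reuse it */

    assert(ncached < MAX_CACHED_CODEPAGES);
    rev = malloc(0x10000 * sizeof *rev);
    if (!rev)
        return NULL;
    for (u = 0; u < 0x10000; u++)
        rev[u] = -1;                   /* unmapped by default */
    /* Walk downwards so the lowest byte wins for duplicate mappings. */
    for (b = 255; b >= 0; b--)
        rev[forward[b]] = (int16_t)b;

    cache[ncached].id = id;
    cache[ncached].reverse = rev;
    ncached++;
    return rev;
}

/*
 * Stateless from the caller's point of view: no terminal-settings
 * structure is needed, only the code page being converted to.
 */
int wc_to_byte_sketch(int id, const uint16_t *forward, uint16_t wc)
{
    const int16_t *rev = get_reverse_table(id, forward);
    return rev ? rev[wc] : -1;
}
```

Because the set of code pages is small and bounded, keeping every table for the life of the process trades a modest amount of memory for conversion calls that need no context argument.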

2022-06-01 09:28:25 +01:00

Simon Tatham

21f602be40

Add utility function dup_wc_to_mb.

This parallels dup_mb_to_wc, which already existed. I haven't needed
the same thing this way round yet, but I'm about to.
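The commit message does not show the function's signature, so the following is only a sketch of the general "allocate and convert" pattern using the standard C library. The name dup_wide_to_mb_sketch is hypothetical, and it converts according to the current C locale via wcstombs, which is an assumption of this sketch and not a description of how PuTTY selects the target encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: return a freshly allocated multibyte copy of a
 * wide string, converted according to the current C locale. Returns
 * NULL on allocation failure or if a character cannot be converted.
 * The caller frees the result with free().
 */
char *dup_wide_to_mb_sketch(const wchar_t *ws)
{
    size_t cap, n;
    char *out;

    assert(ws != NULL);

    /* Worst case: every wide character needs MB_CUR_MAX bytes. */
    cap = wcslen(ws) * MB_CUR_MAX + 1;
    out = malloc(cap);
    if (!out)
        return NULL;

    n = wcstombs(out, ws, cap);
    if (n == (size_t)-1) {
        free(out);      /* unconvertible character in this locale */
        return NULL;
    }
    out[n] = '\0';      /* n < cap by construction */
    return out;
}
```

Returning an owned buffer saves each caller from computing the output size, at the cost of one allocation per conversion.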

2022-03-12 18:51:21 +00:00

3 Commits