nhyatt/putty-source

mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-09 17:38:00 +00:00

Commit Graph

Author

SHA1

Message

Date

Simon Tatham

9e01de7c2b

decode_utf8: add an enumeration of failure reasons.

Now you can optionally get back an enum value indicating whether the
character was successfully decoded, or whether U+FFFD was substituted
due to some kind of problem, and if the latter, what problem.

For a start, this allows distinguishing 'real' U+FFFD (encoded
legitimately in the input) from one invented by the decoder. Also, it
allows the recipient of the decode to treat failures differently,
either by passing on a useful error report to the user (as
utf8_unknown_char now does) or by doing something special.

In particular, there are two distinct error codes for a truncated
UTF-8 encoding, depending on whether it was truncated by the end of
the input or by encountering a non-continuation byte. The former code
means that the string is not legal UTF-8 _as it is_, but doesn't rule
out it being a (bytewise) prefix of a legal UTF-8 string - so if a
client is receiving UTF-8 data a byte at a time, they can treat that
error code specially and not make it a fatal error.
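
The distinction between the two truncation codes can be illustrated with a minimal sketch. This is not PuTTY's actual decode_utf8: the enum, its member names, and the function signature below are all hypothetical, chosen only to show how a decoder might report why it substituted U+FFFD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; PuTTY's real enum may use different names. */
typedef enum {
    SKETCH_UTF8_OK,                 /* decoded successfully (even if it is U+FFFD) */
    SKETCH_UTF8_INVALID_BYTE,       /* byte that can never start a sequence */
    SKETCH_UTF8_TRUNCATED_BY_END,   /* input ran out in mid-sequence */
    SKETCH_UTF8_TRUNCATED_BY_BYTE,  /* non-continuation byte in mid-sequence */
    SKETCH_UTF8_OVERLONG,           /* longer encoding than necessary */
    SKETCH_UTF8_SURROGATE,          /* U+D800..U+DFFF */
    SKETCH_UTF8_OUT_OF_RANGE        /* above U+10FFFF */
} SketchUtf8Status;

/*
 * Decode one character starting at data[*pos], where *pos < len.
 * Advances *pos past the bytes consumed. Returns the code point, or
 * 0xFFFD on failure. If status is non-NULL, the reason is stored there.
 */
unsigned long sketch_decode_utf8(const unsigned char *data, size_t len,
                                 size_t *pos, SketchUtf8Status *status)
{
    SketchUtf8Status st = SKETCH_UTF8_OK;
    unsigned long ch = 0, min = 0;
    int more = 0;
    unsigned char b;

    assert(*pos < len);
    b = data[(*pos)++];

    if (b < 0x80) {
        ch = b;                               /* plain ASCII */
    } else if (b < 0xC0) {
        st = SKETCH_UTF8_INVALID_BYTE;        /* stray continuation byte */
    } else if (b < 0xE0) {
        ch = b & 0x1F; more = 1; min = 0x80;
    } else if (b < 0xF0) {
        ch = b & 0x0F; more = 2; min = 0x800;
    } else if (b < 0xF8) {
        ch = b & 0x07; more = 3; min = 0x10000;
    } else {
        st = SKETCH_UTF8_INVALID_BYTE;        /* 0xF8..0xFF are never valid */
    }

    while (st == SKETCH_UTF8_OK && more-- > 0) {
        if (*pos >= len) {
            st = SKETCH_UTF8_TRUNCATED_BY_END;
        } else if ((data[*pos] & 0xC0) != 0x80) {
            /* Leave the offending byte unconsumed for the next call. */
            st = SKETCH_UTF8_TRUNCATED_BY_BYTE;
        } else {
            ch = (ch << 6) | (data[(*pos)++] & 0x3F);
        }
    }

    if (st == SKETCH_UTF8_OK) {
        if (ch < min)
            st = SKETCH_UTF8_OVERLONG;
        else if (ch >= 0xD800 && ch <= 0xDFFF)
            st = SKETCH_UTF8_SURROGATE;
        else if (ch > 0x10FFFF)
            st = SKETCH_UTF8_OUT_OF_RANGE;
    }

    if (status)
        *status = st;
    return st == SKETCH_UTF8_OK ? ch : 0xFFFD;
}
```

In this sketch, a caller reading a stream a byte at a time could respond to SKETCH_UTF8_TRUNCATED_BY_END by waiting for more input and retrying from the same start position, while treating every other failure as a real error. A return value of 0xFFFD with SKETCH_UTF8_OK means the input genuinely contained U+FFFD.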

2023-02-17 17:16:54 +00:00

Simon Tatham

854d78eef3

Fix build failure on Visual Studio.

Unlike clang, VS didn't like me using the value of one 'static const'
integer variable to compute the value of another, and complained
'initializer is not a constant'. Replaced all those variables with an
enum, which should also more reliably ensure that even an
unsophisticated compiler doesn't actually reserve data-section space
for them.
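
A minimal sketch of the pattern this message describes, using hypothetical names and values rather than the ones in the actual source. In C (unlike C++), a 'static const' integer variable is not an integer constant expression, so using it to initialise another static variable is only accepted by compilers that allow it as an extension. Enumerators are true constants everywhere.

```c
#include <assert.h>

/*
 * The kind of code that reportedly failed on Visual Studio:
 *
 *     static const int SKETCH_BASE = 0x1100;
 *     static const int SKETCH_COUNT = 19;
 *     static const int SKETCH_END = SKETCH_BASE + SKETCH_COUNT;
 *
 * The last initialiser depends on the values of other variables, which
 * standard C does not treat as compile-time constants.
 */

/* Replacement: every name is an enumerator, hence a constant expression. */
enum {
    SKETCH_BASE = 0x1100,
    SKETCH_COUNT = 19,
    SKETCH_END = SKETCH_BASE + SKETCH_COUNT
};

/* Enumerators can size arrays, which const variables portably cannot. */
static int sketch_table[SKETCH_COUNT];

/* Number of entries in the table, computed from the array itself. */
int sketch_table_len(void)
{
    return (int)(sizeof(sketch_table) / sizeof(sketch_table[0]));
}

/* Report whether c lies in the half-open range [SKETCH_BASE, SKETCH_END). */
int sketch_in_range(int c)
{
    assert(SKETCH_END > SKETCH_BASE);
    return c >= SKETCH_BASE && c < SKETCH_END;
}
```

Because enumerators are not objects, there is nothing for the compiler to place in the data section, which matches the secondary benefit the message mentions.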

2022-11-11 12:42:19 +00:00

Simon Tatham

b35d23f699

Implement Unicode normalisation.

A new module in 'utils' computes NFC and NFD, via a new set of data
tables generated by read_ucd.py.

The new module comes with a new test program, which can read the
NormalizationTest.txt that appears in the Unicode Character Database.
All the tests pass, as of Unicode 15.
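
To illustrate what table-driven NFD involves, here is a toy sketch. It is not the module added by this commit: the table layout, the function names, and the handful of entries are assumptions made for illustration, whereas the real tables are generated from the Unicode Character Database and cover every character. The sketch applies canonical decomposition recursively, then reorders combining marks by canonical combining class.

```c
#include <stddef.h>

/* Toy stand-in for a generated canonical decomposition table. */
struct sketch_decomp { unsigned long ch, first, second; };
static const struct sketch_decomp sketch_decomps[] = {
    {0x00E9, 0x0065, 0x0301},  /* e-acute -> e, combining acute */
    {0x1E63, 0x0073, 0x0323},  /* s-dot-below -> s, combining dot below */
    {0x1E69, 0x1E63, 0x0307},  /* s-dot-below-and-above -> U+1E63, dot above */
};

/* Toy stand-in for a canonical combining class table (default class is 0). */
struct sketch_ccc { unsigned long ch; int cls; };
static const struct sketch_ccc sketch_cccs[] = {
    {0x0301, 230}, {0x0307, 230}, {0x0323, 220},
};

static int sketch_ccc_of(unsigned long ch)
{
    size_t i;
    for (i = 0; i < sizeof(sketch_cccs) / sizeof(sketch_cccs[0]); i++)
        if (sketch_cccs[i].ch == ch)
            return sketch_cccs[i].cls;
    return 0;
}

/* Append the full decomposition of ch at out[n]; return the new length.
 * Never writes past cap, but keeps counting so the caller can detect it. */
static size_t sketch_decompose(unsigned long ch, unsigned long *out,
                               size_t n, size_t cap)
{
    size_t i;
    for (i = 0; i < sizeof(sketch_decomps) / sizeof(sketch_decomps[0]); i++) {
        if (sketch_decomps[i].ch == ch) {
            n = sketch_decompose(sketch_decomps[i].first, out, n, cap);
            return sketch_decompose(sketch_decomps[i].second, out, n, cap);
        }
    }
    if (n < cap)
        out[n] = ch;
    return n + 1;
}

/* Write the NFD of in[0..inlen) to out. Returns the length needed; if that
 * exceeds cap, out holds only a partial, unordered result. */
size_t sketch_nfd(const unsigned long *in, size_t inlen,
                  unsigned long *out, size_t cap)
{
    size_t n = 0, i, j;

    for (i = 0; i < inlen; i++)
        n = sketch_decompose(in[i], out, n, cap);
    if (n > cap)
        return n;

    /* Canonical ordering: a stable insertion sort that moves each mark
     * left past marks of strictly higher class, never past a starter. */
    for (i = 1; i < n; i++) {
        unsigned long c = out[i];
        int cls = sketch_ccc_of(c);
        if (cls == 0)
            continue;
        for (j = i; j > 0 && sketch_ccc_of(out[j - 1]) > cls; j--)
            out[j] = out[j - 1];
        out[j] = c;
    }
    return n;
}
```

With these toy tables, U+1E69 decomposes to U+0073 U+0323 U+0307, and the differently ordered input U+0073 U+0307 U+0323 normalises to the same sequence. That is the kind of equivalence NormalizationTest.txt checks. Hangul syllables, which Unicode decomposes arithmetically instead of by table lookup, are left out of this sketch.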

2022-11-11 08:48:18 +00:00

3 Commits