1
0
mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-09 17:38:00 +00:00
Commit Graph

3 Commits

Author SHA1 Message Date
Simon Tatham
9e01de7c2b decode_utf8: add an enumeration of failure reasons.
Now you can optionally get back an enum value indicating whether the
character was successfully decoded, or whether U+FFFD was substituted
due to some kind of problem, and if the latter, what problem.

For a start, this allows distinguishing 'real' U+FFFD (encoded
legitimately in the input) from one invented by the decoder. Also, it
allows the recipient of the decode to treat failures differently,
either by passing on a useful error report to the user (as
utf8_unknown_char now does) or by doing something special.

In particular, there are two distinct error codes for a truncated
UTF-8 encoding, depending on whether it was truncated by the end of
the input or by encountering a non-continuation byte. The former code
means that the string is not legal UTF-8 _as it is_, but doesn't rule
out it being a (bytewise) prefix of a legal UTF-8 string - so if a
client is receiving UTF-8 data a byte at a time, they can treat that
error code specially and not make it a fatal error.
2023-02-17 17:16:54 +00:00
Simon Tatham
854d78eef3 Fix build failure on Visual Studio.
Unlike clang, VS didn't like me using the value of one 'static const'
integer variable to compute the value of another, and complained
'initializer is not a constant'. Replaced all those variables with an
enum, which should also more reliably ensure that even an
unsophisticated compiler doesn't actually reserve data-section space
for them.
2022-11-11 12:42:19 +00:00
Simon Tatham
b35d23f699 Implement Unicode normalisation.
A new module in 'utils' computes NFC and NFD, via a new set of data
tables generated by read_ucd.py.

The new module comes with a new test program, which can read the
NormalizationTest.txt that appears in the Unicode Character Database.
All the tests pass, as of Unicode 15.
2022-11-11 08:48:18 +00:00