putty-source

mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-09 17:38:00 +00:00

Author	SHA1	Message	Date
Simon Tatham	9e01de7c2b	decode_utf8: add an enumeration of failure reasons. Now you can optionally get back an enum value indicating whether the character was successfully decoded, or whether U+FFFD was substituted due to some kind of problem, and if the latter, what problem. For a start, this allows distinguishing 'real' U+FFFD (encoded legitimately in the input) from one invented by the decoder. Also, it allows the recipient of the decode to treat failures differently, either by passing on a useful error report to the user (as utf8_unknown_char now does) or by doing something special. In particular, there are two distinct error codes for a truncated UTF-8 encoding, depending on whether it was truncated by the end of the input or by encountering a non-continuation byte. The former code means that the string is not legal UTF-8 _as it is_, but doesn't rule out it being a (bytewise) prefix of a legal UTF-8 string - so if a client is receiving UTF-8 data a byte at a time, they can treat that error code specially and not make it a fatal error.	2023-02-17 17:16:54 +00:00
Simon Tatham	69e217d23a	Make decode_utf8() read from a BinarySource. This enables it to handle data that isn't presented as a NUL-terminated string. In particular, the NUL byte can appear _within_ the string and be correctly translated to the NUL wide character. So I've been able to remove the awkwardness in the test rig of having to include the terminating NUL in every test to ensure NUL has been tested, and instead, insert a single explicit test for it. Similarly to the previous commit, the simplification at the (one) call site gives me a strong feeling of 'this is what the API should have been all along'!	2022-11-09 19:21:02 +00:00
Simon Tatham	d89f2bfc55	Fix typo in decode_utf8 tests. The test in question was supposed to contain the spurious UTF-8 encoding that 0xD800 would have if it were not a surrogate. But the final continuation character 0x80 was instead 0x00. The test passed anyway, because ED A0 was regarded as a truncated sequence, instead of ED A0 80 being regarded as an illegal encoding of a surrogate, and both return the same output!	2022-11-09 19:21:02 +00:00
Simon Tatham	b360ea6ac1	Add a manual single-char UTF-8 decoder. This parallels encode_utf8 which we already had. Decoding is more fraught with perils than encoding, so I've also included a small test program.	2022-03-12 18:51:21 +00:00

4 Commits
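
The commit messages above describe a single-character decoder that substitutes U+FFFD on failure and reports why. The following is a minimal sketch of that idea, not PuTTY's actual code: the names `SketchUtf8Status` and `sketch_decode_utf8`, the buffer-plus-length signature (standing in for a BinarySource), and the exact set of failure codes are all assumptions made for illustration. It shows how a status enum can tell a legitimately encoded U+FFFD apart from an invented one, and how truncation by end of input can be reported separately from truncation by a non-continuation byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical failure reasons; PuTTY's real enumeration may differ. */
typedef enum {
    SKETCH_UTF8_OK,             /* character decoded successfully */
    SKETCH_UTF8_TRUNCATED_EOF,  /* sequence cut short by end of input */
    SKETCH_UTF8_TRUNCATED_BYTE, /* sequence cut short by a non-continuation byte */
    SKETCH_UTF8_BAD_LEAD,       /* stray continuation byte or invalid lead byte */
    SKETCH_UTF8_OVERLONG,       /* value encoded in more bytes than necessary */
    SKETCH_UTF8_SURROGATE,      /* encodes a value in 0xD800..0xDFFF */
    SKETCH_UTF8_OUT_OF_RANGE    /* encodes a value above 0x10FFFF */
} SketchUtf8Status;

/*
 * Decode one character from p[0..len), where len > 0.
 * Stores the number of bytes consumed in *used and the outcome in *status.
 * Returns the code point, or 0xFFFD if *status is not SKETCH_UTF8_OK.
 */
uint32_t sketch_decode_utf8(const unsigned char *p, size_t len,
                            size_t *used, SketchUtf8Status *status)
{
    assert(p != NULL && len > 0 && used != NULL && status != NULL);

    unsigned char lead = p[0];
    size_t ncont;      /* number of continuation bytes expected */
    uint32_t min, cp;  /* smallest value this length may encode; accumulator */

    *used = 1;
    if (lead < 0x80) {
        /* ASCII, including NUL, which is an ordinary character here. */
        *status = SKETCH_UTF8_OK;
        return lead;
    } else if ((lead & 0xE0) == 0xC0) {
        ncont = 1; min = 0x80; cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
        ncont = 2; min = 0x800; cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
        ncont = 3; min = 0x10000; cp = lead & 0x07;
    } else {
        *status = SKETCH_UTF8_BAD_LEAD;
        return 0xFFFD;
    }

    for (size_t i = 1; i <= ncont; i++) {
        if (i >= len) {
            /* Could still be a prefix of valid UTF-8 if more data arrives. */
            *used = len;
            *status = SKETCH_UTF8_TRUNCATED_EOF;
            return 0xFFFD;
        }
        if ((p[i] & 0xC0) != 0x80) {
            /* Leave the offending byte unconsumed for the next call. */
            *used = i;
            *status = SKETCH_UTF8_TRUNCATED_BYTE;
            return 0xFFFD;
        }
        cp = (cp << 6) | (uint32_t)(p[i] & 0x3F);
    }
    *used = ncont + 1;

    if (cp < min) {
        *status = SKETCH_UTF8_OVERLONG;
        return 0xFFFD;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        *status = SKETCH_UTF8_SURROGATE;
        return 0xFFFD;
    }
    if (cp > 0x10FFFF) {
        *status = SKETCH_UTF8_OUT_OF_RANGE;
        return 0xFFFD;
    }
    *status = SKETCH_UTF8_OK;
    return cp;
}
```

In this sketch, the bytes ED A0 80 yield the surrogate code, while ED A0 00 yields the non-continuation truncation code. Both return U+FFFD, which is why a test that compares only the decoded character cannot tell the two cases apart. A caller receiving data a byte at a time could treat `SKETCH_UTF8_TRUNCATED_EOF` as "wait for more input" rather than as a fatal error.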