putty-source

mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-09 17:38:00 +00:00

Author	SHA1	Message	Date
Simon Tatham	1441023f5a	read_ucd.py: tolerate whitespace in EastAsianWidth.txt. Unicode 16.0.0 has changed the formatting of that file in a way that I'm sure _they_ thought was unproblematic :-) by putting spaces around the character class field, which the reading code wasn't prepared to cope with.	2024-09-22 19:05:41 +01:00
Simon Tatham	d3e186e81b	Function to check a UTF-8 string for unknown characters. So we can reject things we don't know how to NFC yet.	2022-11-11 08:49:05 +00:00
Simon Tatham	b35d23f699	Implement Unicode normalisation. A new module in 'utils' computes NFC and NFD, via a new set of data tables generated by read_ucd.py. The new module comes with a new test program, which can read the NormalizationTest.txt that appears in the Unicode Character Database. All the tests pass, as of Unicode 15.	2022-11-11 08:48:18 +00:00
Simon Tatham	430af47a38	Polish the output of read_ucd.py. The initial outputs were all deliberately inconsistent with each other, so that each one exactly matched the existing table I was trying to replace. Now I've done that check, I can clean them up. Normalised spacing and case to be consistent; removed pointless indentation (these are now include files, so they don't have to be indented to the same level as the array declaration surrounding each one's #include); added a header comment in each autogenerated file, saying that it's autogenerated, what it's for, and who it's used by. The currently supported version number of Unicode is also exposed in a header file, so that I can put it in diagnostics.	2022-11-11 08:44:01 +00:00
Simon Tatham	b72c9aba28	New script to generate Unicode data tables. This will replace the various pieces of Perl scattered throughout the code base in comments above long boring data tables. The idea is that those long boring tables will move into header files in the new 'unicode' directory, and will be #included from the source files that use the tables. One benefit is that I won't have to page tediously past the tables to get to the actual code I want to edit. But more importantly, it should now become easy to update to a new version of Unicode, by re-running just one script and committing the changed versions of all the headers in the 'unicode' subdir. This version of the script regenerates six Unicode-derived tables in the existing source code in a byte-for-byte identical form. In the next commits I'll clean it up, commit the output, and delete the tables from their previous locations. (One table I _haven't_ incorporated into this system is the Arabic shaping table in bidi.c, because my attempt to regenerate it came out not matching the original at all. That _might_ be because the table is based on an old Unicode standard and desperately needs updating, but it might also be because I misunderstood how it works. So I'll leave sorting that out for another time.)	2022-11-09 19:21:02 +00:00
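
The most recent commit (1441023f5a) describes making the EastAsianWidth.txt reader tolerate spaces around the character class field. The following is a minimal sketch of that kind of whitespace-tolerant parsing, not the actual code from read_ucd.py: the function name `parse_east_asian_width` and its return shape are assumptions made for illustration. It splits off the trailing comment, splits on `;`, and strips each field, so both the older `0020;Na` layout and the Unicode 16.0.0 `0020 ; Na` layout parse the same way.

```python
def parse_east_asian_width(lines):
    """Yield (first, last, width_class) tuples from EastAsianWidth.txt-style lines.

    Hypothetical helper for illustration; not taken from read_ucd.py.
    """
    for line in lines:
        # Discard the trailing comment, then any surrounding whitespace.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line

        # Strip every field so spaces around ';' make no difference.
        fields = [field.strip() for field in data.split(";")]
        code_range, width_class = fields[0], fields[1]

        # A field is either a single code point or a 'first..last' range.
        first, _, last = code_range.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        yield start, end, width_class


sample = [
    "# EastAsianWidth sample",
    "0020;Na          # Zs         SPACE",
    "0021..0023     ; Na # Po     [3] EXCLAMATION MARK..NUMBER SIGN",
    "1100..115F     ; W  # Lo    [96] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG FILLER",
]
for entry in parse_east_asian_width(sample):
    print(entry)
```

Stripping each field after splitting, rather than matching a fixed layout, keeps the reader indifferent to future cosmetic changes in column alignment.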

5 Commits
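
Commits b35d23f699 and d3e186e81b describe computing NFC and NFD, and rejecting strings that contain characters the tables do not know about before attempting to normalise them. The sketch below illustrates the same idea using Python's standard `unicodedata` module instead of PuTTY's C module and generated tables, so the Unicode version it knows about is whatever the Python build ships. The function names `has_unassigned` and `nfc_if_known` are hypothetical, and treating general category `Cn` as "unknown" is an assumption about what such a check might look like.

```python
import unicodedata


def has_unassigned(text):
    """Return True if any character is unassigned (general category 'Cn')
    in the Unicode version bundled with this Python build."""
    return any(unicodedata.category(ch) == "Cn" for ch in text)


def nfc_if_known(text):
    """Normalise to NFC, refusing input that contains unassigned code points.

    Hypothetical helper: it mirrors the 'reject what we cannot normalise yet'
    idea from the commit message, not PuTTY's actual API.
    """
    if has_unassigned(text):
        raise ValueError("string contains code points unknown to this Unicode version")
    return unicodedata.normalize("NFC", text)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = nfc_if_known(decomposed)
print(len(decomposed), len(composed))
print(unicodedata.normalize("NFD", composed) == decomposed)
print("Tables are from Unicode", unicodedata.unidata_version)
```

Checking before normalising matters because a code point assigned in a later Unicode version may carry a decomposition or combining class that older tables cannot account for.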