1
0
mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-10 01:48:00 +00:00
putty-source/windows/utils/filename.c

103 lines
2.4 KiB
C
Raw Permalink Normal View History

/*
* Implementation of Filename for Windows.
*/
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
#include <wchar.h>
#include "putty.h"
Filename *filename_from_str(const char *str)
{
Rename 'ret' variables passed from allocation to return. I mentioned recently (in commit 9e7d4c53d80b6eb) message that I'm no longer fond of the variable name 'ret', because it's used in two quite different contexts: it's the return value from a subroutine you just called (e.g. 'int ret = read(fd, buf, len);' and then check for error or EOF), or it's the value you're preparing to return from the _containing_ routine (maybe by assigning it a default value and then conditionally modifying it, or by starting at NULL and reallocating, or setting it just before using the 'goto out' cleanup idiom). In the past I've occasionally made mistakes by forgetting which meaning the variable had, or accidentally conflating both uses. If all else fails, I now prefer 'retd' (short for 'returned') in the former situation, and 'toret' (obviously, the value 'to return') in the latter case. But even better is to pick a name that actually says something more specific about what the thing actually is. One particular bad habit throughout this codebase is to have a set of functions that deal with some object type (say 'Foo'), all *but one* of which take a 'Foo *foo' parameter, but the foo_new() function starts with 'Foo *ret = snew(Foo)'. If all the rest of them think the canonical name for the ambient Foo is 'foo', so should foo_new()! So here's a no-brainer start on cutting down on the uses of 'ret': I looked for all the cases where it was being assigned the result of an allocation, and renamed the variable to be a description of the thing being allocated. In the case of a new() function belonging to a family, I picked the same name as the rest of the functions in its own family, for consistency. In other cases I picked something sensible. One case where it _does_ make sense not to use your usual name for the variable type is when you're cloning an existing object. In that case, _neither_ of the Foo objects involved should be called 'foo', because it's ambiguous! They should be named so you can see which is which. In the two cases I found here, I've called them 'orig' and 'copy'. As in the previous refactoring, many thanks to clang-rename for the help.
2022-09-13 13:53:36 +00:00
Filename *fn = snew(Filename);
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
fn->cpath = dupstr(str);
fn->wpath = dup_mb_to_wc(DEFAULT_CODEPAGE, fn->cpath);
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
fn->utf8path = encode_wide_string_as_utf8(fn->wpath);
return fn;
}
Filename *filename_from_wstr(const wchar_t *str)
{
Filename *fn = snew(Filename);
fn->wpath = dupwcs(str);
fn->cpath = dup_wc_to_mb(DEFAULT_CODEPAGE, fn->wpath, "?");
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
fn->utf8path = encode_wide_string_as_utf8(fn->wpath);
return fn;
}
Filename *filename_from_utf8(const char *ustr)
{
Filename *fn = snew(Filename);
fn->utf8path = dupstr(ustr);
fn->wpath = decode_utf8_to_wide_string(fn->utf8path);
fn->cpath = dup_wc_to_mb(DEFAULT_CODEPAGE, fn->wpath, "?");
Rename 'ret' variables passed from allocation to return. I mentioned recently (in commit 9e7d4c53d80b6eb) message that I'm no longer fond of the variable name 'ret', because it's used in two quite different contexts: it's the return value from a subroutine you just called (e.g. 'int ret = read(fd, buf, len);' and then check for error or EOF), or it's the value you're preparing to return from the _containing_ routine (maybe by assigning it a default value and then conditionally modifying it, or by starting at NULL and reallocating, or setting it just before using the 'goto out' cleanup idiom). In the past I've occasionally made mistakes by forgetting which meaning the variable had, or accidentally conflating both uses. If all else fails, I now prefer 'retd' (short for 'returned') in the former situation, and 'toret' (obviously, the value 'to return') in the latter case. But even better is to pick a name that actually says something more specific about what the thing actually is. One particular bad habit throughout this codebase is to have a set of functions that deal with some object type (say 'Foo'), all *but one* of which take a 'Foo *foo' parameter, but the foo_new() function starts with 'Foo *ret = snew(Foo)'. If all the rest of them think the canonical name for the ambient Foo is 'foo', so should foo_new()! So here's a no-brainer start on cutting down on the uses of 'ret': I looked for all the cases where it was being assigned the result of an allocation, and renamed the variable to be a description of the thing being allocated. In the case of a new() function belonging to a family, I picked the same name as the rest of the functions in its own family, for consistency. In other cases I picked something sensible. One case where it _does_ make sense not to use your usual name for the variable type is when you're cloning an existing object. In that case, _neither_ of the Foo objects involved should be called 'foo', because it's ambiguous! They should be named so you can see which is which. In the two cases I found here, I've called them 'orig' and 'copy'. As in the previous refactoring, many thanks to clang-rename for the help.
2022-09-13 13:53:36 +00:00
return fn;
}
Filename *filename_copy(const Filename *fn)
{
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
Filename *newfn = snew(Filename);
newfn->cpath = dupstr(fn->cpath);
newfn->wpath = dupwcs(fn->wpath);
newfn->utf8path = dupstr(fn->utf8path);
return newfn;
}
const char *filename_to_str(const Filename *fn)
{
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
return fn->cpath; /* FIXME */
}
bool filename_equal(const Filename *f1, const Filename *f2)
{
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
/* wpath is primary: two filenames refer to the same file if they
* have the same wpath */
return !wcscmp(f1->wpath, f2->wpath);
}
bool filename_is_null(const Filename *fn)
{
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
return !*fn->wpath;
}
void filename_free(Filename *fn)
{
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
sfree(fn->wpath);
sfree(fn->cpath);
sfree(fn->utf8path);
sfree(fn);
}
void filename_serialise(BinarySink *bs, const Filename *f)
{
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
put_asciz(bs, f->utf8path);
}
Filename *filename_deserialise(BinarySource *src)
{
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
const char *utf8 = get_asciz(src);
return filename_from_utf8(utf8);
}
char filename_char_sanitise(char c)
{
if (strchr("<>:\"/\\|?*", c))
return '.';
return c;
}
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
FILE *f_open(const Filename *fn, const char *mode, bool isprivate)
{
#ifdef LEGACY_WINDOWS
/* Fallback for legacy pre-NT windows, where as far as I can see
* _wfopen just doesn't work at all */
init_winver();
if (osPlatformId == VER_PLATFORM_WIN32_WINDOWS ||
osPlatformId == VER_PLATFORM_WIN32s)
return fopen(fn->cpath, mode);
#endif
wchar_t *wmode = dup_mb_to_wc(DEFAULT_CODEPAGE, mode);
FILE *fp = _wfopen(fn->wpath, wmode);
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
sfree(wmode);
return fp;
Some support for wide-character filenames in Windows. The Windows version of the Filename structure now contains three versions of the pathname, in UTF-16, UTF-8 and the system code page. Callers can use whichever is most convenient. All uses of filenames for actually opening files now use the UTF-16 version, which means they can tolerate 'exotic' filenames, by which I mean those including Unicode characters outside the host system's CP_ACP default code page. Other uses of Filename structures inside the 'windows' subdirectory do something appropriate, e.g. when printing a filename inside a message box or a console message, we use the UTF-8 version of the filename with the UTF-8 version of the appropriate API. There are three remaining pieces to full Unicode filename support: One is that the cross-platform code has many calls to filename_to_str(), embodying the assumption that a file name can be reliably converted into the unspecified current character set; those will all need changing in some way. Another is that write_setting_filename(), in windows/storage.c, still saves filenames to the Registry as an ordinary REG_SZ in the system code page. So even if an exotic filename were stored in a Conf, that Conf couldn't round-trip via the Registry and back without corrupting that filename by coercing it back to a string that fits in CP_ACP and therefore doesn't represent the same file. This can't be fixed without a compatibility break in the storage format, and I don't want to make a minimal change in that area: if we're going to break compatibility, then we should break it good and hard (the Nanny Ogg principle), and devise a completely fresh storage representation that fixes as many other legacy problems as possible at the same time. So that's my plan, not yet started. The final point, much more obviously, is that we're still short of methods to _construct_ any Filename structures using a Unicode input string! It should now work to enter one in the GUI configurer (either by manual text input or via the file selector), but it won't round-trip through a save and load (as discussed above), and there's still no way to specify one on the command line (the groundwork is laid by commit 10e1ac7752de928 but not yet linked up). But this is a start.
2023-05-28 10:30:59 +00:00
}