1
0
mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-09 17:38:00 +00:00
Commit Graph

6 Commits

Author SHA1 Message Date
Simon Tatham
4f756d2a4d Rework Unicode conversion APIs to use a BinarySink.
The previous mb_to_wc and wc_to_mb had horrible and also buggy APIs.
This commit introduces a fresh pair of functions to replace them,
which generate output by writing to a BinarySink. So it's now up to
the caller to decide whether it wants the output written to a
fixed-size buffer with overflow checking (via buffer_sink), or
dynamically allocated, or even written directly to some other output
channel.

Nothing uses the new functions yet. I plan to migrate things over in
upcoming commits.

What was wrong with the old APIs: they had that awkward undocumented
Windows-specific 'flags' parameter that I described in the previous
commit and took out of the dup_X_to_Y wrappers. But much worse, the
semantics for buffer overflow were not just undocumented but actually
inconsistent. dup_wc_to_mb() in utils assumed that the underlying
wc_to_mb would fill the buffer nearly full and return the size of data
it wrote. In fact, this was untrue in the case where wc_to_mb called
WideCharToMultiByte: that returns straight-up failure, setting the
Windows error code to ERROR_INSUFFICIENT_BUFFER. It _does_ partially
fill the output buffer, but doesn't tell you how much it wrote!

What's wrong with the new API: it's a bit awkward to write a sequence
of wchar_t in native byte order to a byte-oriented BinarySink, so
people using put_mb_to_wc directly have to do some annoying pointer
casting. But I think that's less horrible than the previous APIs.

Another change: in the new API for wc_to_mb, defchr can be "", but not
NULL.
2024-09-26 11:30:07 +01:00
Simon Tatham
14203bc54f Formatting: standardise on "func(\n", not "func\n(".
If the function name (or expression) in a function call or declaration
is itself so long that even the first argument doesn't fit after it on
the same line, or if that would leave so little space that it would be
silly to try to wrap all the run-on lines into a tall thin column,
then I used to do this

    ludicrously_long_function_name
        (arg1, arg2, arg3);

and now prefer this

    ludicrously_long_function_name(
        arg1, arg2, arg3);

I picked up the habit from Python, where the latter idiom is required
by Python's syntactic significance of newlines (you can write the
former if you use a backslash-continuation, but pretty much everyone
seems to agree that that's much uglier). But I've found it works well
in C as well: it makes it more obvious that the previous line is
incomplete, it gives you a tiny bit more space to wrap the following
lines into (the old idiom indents the _third_ line one space beyond
the second), and I generally turn out to agree with the knock-on
indentation decisions made by at least Emacs if you do it in the
middle of a complex expression. Plus, of course, using the _same_
idiom between C and Python means less state-switching.

So, while I'm making annoying indentation changes in general, this
seems like a good time to dig out all the cases of the old idiom in
this code, and switch them over to the new.
2022-08-03 20:48:46 +01:00
Simon Tatham
4fa3480444 Formatting: realign run-on parenthesised stuff.
My bulk indentation check also turned up a lot of cases where a run-on
function call or if statement didn't have its later lines aligned
correctly relative to the open paren.

I think this is quite easy to do by getting things out of
sync (editing the first line of the function call and forgetting to
update the rest, perhaps even because you never _saw_ the rest during
a search-replace). But a few didn't quite fit into that pattern, in
particular an outright misleading case in unix/askpass.c where the
second line of a call was aligned neatly below the _wrong_ one of the
open parens on the opening line.

Restored as many alignments as I could easily find.
2022-08-03 20:48:46 +01:00
Simon Tatham
5a28658a6d Remove uni_tbl from struct unicode_data.
Instead of maintaining a single sparse table mapping Unicode to the
currently selected code page, we now maintain a collection of such
tables mapping Unicode to any code page we've so far found a need to
work with, and we add code pages to that list as necessary, and never
throw them away (since there are a limited number of them).

This means that the wc_to_mb family of functions are effectively
stateless: they no longer depend on a 'struct unicode_data'
corresponding to the current terminal settings. So I've removed that
parameter from all of them.

This fills in the missing piece of yesterday's commit a216d86106:
now wc_to_mb too should be able to handle internally-implemented
character sets, by hastily making their reverse mapping table if it
doesn't already have it.

(That was only a _latent_ bug, because the only use of wc_to_mb in the
cross-platform or Windows code _did_ want to convert to the currently
selected code page, so the old strategy worked in that case. But there
was no protection against an unworkable use of it being added later.)
2022-06-01 09:28:25 +01:00
Simon Tatham
be8d3974ff Generalise strbuf_catf() into put_fmt().
marshal.h now provides a macro put_fmt() which allows you to write
arbitrary printf-formatted data to an arbitrary BinarySink.

We already had this facility for strbufs in particular, in the form of
strbuf_catf(). That was able to take advantage of knowing the inner
structure of a strbuf to minimise memory allocation (it would snprintf
directly into the strbuf's existing buffer if possible). For a general
black-box BinarySink we can't do that, so instead we dupvprintf into a
temporary buffer.

For consistency, I've removed strbuf_catf, and converted all uses of
it into the new put_fmt - and I've also added an extra vtable method
in the BinarySink API, so that put_fmt can still use strbuf_catf's
more efficient memory management when talking to a strbuf, and fall
back to the simpler strategy when that's not available.
2021-11-19 11:32:47 +00:00
Simon Tatham
f39c51f9a7 Rename most of the platform source files.
This gets rid of all those annoying 'win', 'ux' and 'gtk' prefixes
which made filenames annoying to type and to tab-complete. Also, as
with my other recent renaming sprees, I've taken the opportunity to
expand and clarify some of the names so that they're not such cryptic
abbreviations.
2021-04-26 18:00:01 +01:00