putty-source

mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-09 17:38:00 +00:00

Author	SHA1	Message	Date
Simon Tatham	a7106d8eb2	Add missing initialisation of term->osc_strlen. An uninitialised value could have got as far as do_osc() via the relatively recent code path for early termination of an OSC before it's fully set up.	2022-10-23 17:45:51 +01:00
Simon Tatham	9a84a89c32	Add a batch of missing 'static's.	2022-09-03 12:02:48 +01:00
Simon Tatham	9cac27946a	Formatting: miscellaneous. This patch fixes a few other whitespace and formatting issues which were pointed out by the bulk-reindent or which I spotted in passing, some involving manual editing to break lines more nicely. I think the weirdest hunk in here is the one in windows/window.c TranslateKey() where _half_ of an assignment statement inside an 'if' was on the same line as the trailing paren of the if condition. No idea at all how that one managed to happen!	2022-08-03 20:48:46 +01:00
Simon Tatham	4b8dc56284	Formatting: remove spurious spaces in 'type * var'. I think a lot of these were inserted by a prior run through GNU indent many years ago. I noticed in a more recent experiment that that tool doesn't always correctly distinguish which instances of 'id * id' are pointer variable declarations and which are multiplications, so it spaces some of the former as if they were the latter.	2022-08-03 20:48:46 +01:00
Simon Tatham	4fa3480444	Formatting: realign run-on parenthesised stuff. My bulk indentation check also turned up a lot of cases where a run-on function call or if statement didn't have its later lines aligned correctly relative to the open paren. I think this is quite easy to do by getting things out of sync (editing the first line of the function call and forgetting to update the rest, perhaps even because you never _saw_ the rest during a search-replace). But a few didn't quite fit into that pattern, in particular an outright misleading case in unix/askpass.c where the second line of a call was aligned neatly below the _wrong_ one of the open parens on the opening line. Restored as many alignments as I could easily find.	2022-08-03 20:48:46 +01:00
Simon Tatham	3a42a09dad	Formatting: normalise back to 4-space indentation. In several pieces of development recently I've run across the occasional code block in the middle of a function which suddenly switched to 2-space indent from this code base's usual 4. I decided I was tired of it, so I ran the whole code base through a re-indenter, which made a huge mess, and then manually sifted out the changes that actually made sense from that pass. Indeed, this caught quite a few large sections with 2-space indent level, a couple with 8, and a handful of even weirder things like 3 spaces or 12. This commit fixes them all.	2022-08-03 20:48:46 +01:00
Simon Tatham	c88b6d1853	Send xterm 216+ modifiers in small-keypad key escape sequences. In the 'xterm 216+' function key mode, a function key pressed with a combination of Shift, Ctrl and Alt has its usual sequence like ESC[n~ (for some integer n) turned into ESC[n;m~ where m-1 is a 3-bit bitmap of currently pressed modifier keys. This mode now also applies to the keys on the small keypad above the arrow keys (Ins, Home, PgUp, etc.). If xterm 216+ mode is selected, those keys are modified in the same way as the function keys. As with the function keys, this doesn't guarantee that PuTTY will _receive_ any particular shifted key of this kind, and not repurpose it. Just as Alt+F4 still closes the window (at least on Windows) rather than sending a modified F4 sequence, Shift+Ins will still perform a paste action rather than sending a modified Ins sequence, Shift+PgUp will still scroll the scrollback, etc. But the keys not already used by PuTTY for other purposes should now have their modern-xterm behaviour in modern-xterm mode. Thanks to H.Merijn Brand for developing and testing a version of this patch.	2022-07-24 14:03:58 +01:00
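A minimal sketch of the ESC[n;m~ encoding described above, assuming the bitmap bits are 1 for Shift, 2 for Alt and 4 for Ctrl (the same bitmap the SHARROW_BITMAP commit further down spells out). The helper name format_xterm216_key is hypothetical and is not PuTTY's actual function; n is the key's own number, e.g. 2 for Ins or 5 for PgUp in the usual VT220-style numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the sequence for a key whose unmodified form
 * is ESC[n~. With any of Shift/Alt/Ctrl held, the xterm 216+ form is
 * ESC[n;m~ where m-1 is the modifier bitmap.
 */
static void format_xterm216_key(char *buf, size_t len, int n,
                                bool shift, bool alt, bool ctrl)
{
    assert(n > 0);
    int bitmap = (shift ? 1 : 0) | (alt ? 2 : 0) | (ctrl ? 4 : 0);
    if (bitmap == 0)
        snprintf(buf, len, "\033[%d~", n);                 /* ESC[n~ */
    else
        snprintf(buf, len, "\033[%d;%d~", n, bitmap + 1);  /* ESC[n;m~ */
}
```

For example, Ctrl+PgUp would come out as ESC[5;5~, while an unmodified PgUp stays ESC[5~.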
Simon Tatham	5a28658a6d	Remove uni_tbl from struct unicode_data. Instead of maintaining a single sparse table mapping Unicode to the currently selected code page, we now maintain a collection of such tables mapping Unicode to any code page we've so far found a need to work with, and we add code pages to that list as necessary, and never throw them away (since there are a limited number of them). This means that the wc_to_mb family of functions are effectively stateless: they no longer depend on a 'struct unicode_data' corresponding to the current terminal settings. So I've removed that parameter from all of them. This fills in the missing piece of yesterday's commit `a216d86106`: now wc_to_mb too should be able to handle internally-implemented character sets, by hastily making their reverse mapping table if it doesn't already have it. (That was only a _latent_ bug, because the only use of wc_to_mb in the cross-platform or Windows code _did_ want to convert to the currently selected code page, so the old strategy worked in that case. But there was no protection against an unworkable use of it being added later.)	2022-06-01 09:28:25 +01:00
Simon Tatham	187fea7610	Merge bidi paragraphOverride fix from 'pre-0.77'.	2022-05-24 17:49:32 +01:00
Simon Tatham	01d8561446	do_bidi: initialise paragraphOverride correctly. I'd forgotten to initialise it at all, which meant it was set to zero by the initial memset of the whole BidiContext on creation. But in our enumeration of bidi character types, zero corresponds to L (the most common left-to-right alphabetic character class), and as a value for paragraphOverride, that is not neutral. As a result, a command such as this (assuming UTF-8) echo -e '\xD7\x90\xD7\x91' would produce Hebrew aleph and beth in the correct display order (aleph on the right), but aligned to the left margin of the terminal instead of the right margin, because the overall direction of the line was taken to be forcibly overridden to "left-to-right" instead of being inferred dynamically from the line contents. do_bidi() is a tiny wrapper on the inner function that does all the real work. And the inner function has been subjected to the whole Unicode 14 bidi conformance test. So naturally, the "trivial" but untested function just outside it is where the embarrassing bug was.	2022-05-24 17:43:48 +01:00
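The pitfall here is that zero-filling a struct silently picks whichever enumerator happens to be zero. The sketch below, assuming a cut-down class list with L first and ON (other neutral) used as the "no override" value, shows the explicit initialisation after the memset. The type and function names are illustrative, not PuTTY's real BidiContext code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset of bidi classes; as in the commit, L is zero. */
typedef enum { L = 0, R, AL, EN, ON /* other neutral */ } BidiClass;

typedef struct {
    BidiClass paragraphOverride; /* forced direction, or ON for "infer it" */
} BidiCtx;

static BidiCtx *bidi_ctx_new(void)
{
    BidiCtx *ctx = malloc(sizeof *ctx);
    if (!ctx)
        return NULL;
    memset(ctx, 0, sizeof *ctx);  /* on its own this yields L: forced LTR */
    ctx->paragraphOverride = ON;  /* explicit: no override, infer from text */
    return ctx;
}
```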
Simon Tatham	4da67d8fa6	Move window resize timeouts into the GTK frontend. In the changes around commit `420fe75552`, I made the terminal suspend output processing while it waited for a term_size() callback in response to a resize request. Because on X11 there are unusual circumstances in which you never receive that callback, I also added a last-ditch 5-second timeout, so that eventually we'll resume terminal output processing regardless. But the timeout lives in terminal.c, in the cross-platform code. This is pointless on Windows (where resize processing is synchronous, so we always finish it before the timer code next gets called anyway), but I decided it was easier to keep the whole mechanism in terminal.c in the absence of a good reason not to. Now I've found that reason. We _also_ generate window resizes locally to the GTK front end, in response to the key combinations that change the font size, and _those_ still have an asynchrony problem. So, to begin with, I'm refactoring the request_resize system so that now there's an explicit callback from the frontend to the terminal to say 'Your resize request has now been processed, whether or not you've received a term_size() call'. On Windows, this simplifies matters greatly because we always know exactly when to call that, and don't have to keep a 'have we called term_size() already?' flag. On GTK, the timing complexity previously in terminal.c has moved into window.c. No functional change (I hope). The payoff will be in the next commit.	2022-05-12 18:16:56 +01:00
Simon Tatham	cc10b68d31	Allow BEL to terminate OSC sequences during setup. This is a partial cherry-pick of commit `de66b0313a` from main, which allows all the forms of OSC sequence termination to apply in the preliminary states as well as OSC_STRING. The reporting user only mentioned the case of OSC 112 BEL, and not the various forms of ST. So the former is actually known to be occurring in the wild, and is also the least complicated part of the full patch on main. Therefore I think this part is worthwhile and reasonably safe to cherry-pick to 0.77 just before a release, whereas I'd be uncomfortable making the rest of the changes at this late stage.	2022-05-12 18:01:42 +01:00
Simon Tatham	de66b0313a	Allow terminating OSC sequences during setup. A user reports that the xterm OSC 112 sequence (reset cursor colour) is sometimes sent as simply OSC 112 BEL, rather than OSC 112 ; BEL. When xterm parses this, the BEL still acts as an OSC terminator, even though it appears before the separating semicolon that shifts into the 'absorb the notional command string' state. PuTTY doesn't support that sequence at all. But currently, the way it doesn't support it is by treating the BEL completely normally, so that you get an annoying beep when a client application sends that abbreviated sequence. Now we recognise all the OSC terminator sequences even in the OSC setup termstates, as well as the final OSC_STRING state. That goes equally for BEL, ST in the form of ESC \, ST in the form of single-byte 0x9C, and ST in the UTF-8 encoding.	2022-05-11 20:24:07 +01:00
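A rough sketch of the idea: check for terminators before dispatching on the parser state, so that a BEL arriving while the numeric command is still being read (OSC 112 BEL) ends the sequence instead of beeping. The state names and the osc_scan function are hypothetical, and for brevity only BEL and ESC \ are handled; the two C1 forms of ST are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser states: reading the number, then the string. */
typedef enum { OSC_NUM, OSC_STR } OscState;

/*
 * Scan the bytes after ESC ] and return how many were consumed up to and
 * including the terminator, or 0 if the sequence is still open.
 */
static size_t osc_scan(const char *p, size_t len, int *cmd)
{
    OscState st = OSC_NUM;
    *cmd = 0;
    for (size_t i = 0; i < len; i++) {
        /* Terminators are recognised whatever state we are in. */
        if (p[i] == '\a')
            return i + 1;                                /* BEL */
        if (p[i] == '\033' && i + 1 < len && p[i + 1] == '\\')
            return i + 2;                                /* ESC \ */
        if (st == OSC_NUM) {
            if (p[i] >= '0' && p[i] <= '9')
                *cmd = *cmd * 10 + (p[i] - '0');
            else if (p[i] == ';')
                st = OSC_STR;                            /* string follows */
        }
        /* In OSC_STR a real parser would append p[i] to its buffer. */
    }
    return 0;
}
```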
Simon Tatham	bdab00341b	Cancel drag-select when the context menu pops up. I got a pterm into a stuck state this morning by an accidental mouse action. I'd intended to press Ctrl + right-click to pop up the context menu, but I accidentally pressed down the left button first, starting a selection drag, and then while the left button was still held down, pressed down the right button as well, triggering the menu. The effect was that the context menu appeared while term->selstate was set to DRAGGING, in which state terminal output is suppressed, and which is only unset by a mouse-button release event. But then that release event went to the popup menu, and the terminal window never got it. So the terminal stayed stuck forever - or rather, until I guessed the cause and did another selection drag to reset it. This happened to me on GTK, but once I knew how I'd done it, I found I could reproduce the same misbehaviour on Windows by the same method. Added a simplistic fix, on both platforms, that cancels a selection drag if the popup menu is summoned part way through it.	2022-03-29 18:06:14 +01:00
Simon Tatham	445f9de129	Fix handling of shifted SCO function keys. A user points out that this has regressed since 0.76, probably when I reorganised the keyboard control-sequence formatting into centralised helper functions in terminal.c. The SCO function keys should behave differently when you press Shift or Ctrl or both. For example, F1 should generate ESC[M bare, ESC[Y with Shift, ESC[k with Ctrl, ESC[w with Shift+Ctrl. But in fact, Shift was having no effect, so those tests would give ESC[M twice and ESC[k twice. That was because I was setting 'shift = false' for all function key types except FUNKY_XTERM_216, after modifying the derived 'index' value. But the SCO branch of the code doesn't use 'index' (it wouldn't have the right value in any case), so the sole effect was to forget about Shift. Easily fixed by disabling that branch for FUNKY_SCO too. (cherry picked from commit `aa01530488`)	2022-02-11 20:03:31 +00:00
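The four F1 examples fit a table lookup where Shift adds 12 and Ctrl adds 24 to the key index. A sketch under that assumption: the 48-character table is chosen to agree with the commit's examples, and format_sco_fkey is a hypothetical name. Note that Shift must survive into the index, which is exactly what the regression broke.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed SCO table: F1..F12, then +12 for Shift, +24 for Ctrl. */
static const char sco_codes[] =
    "MNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz@[\\]^_`{";

static void format_sco_fkey(char *buf, size_t len, int fkey,
                            bool shift, bool ctrl)
{
    assert(fkey >= 1 && fkey <= 12);
    int index = (fkey - 1) + (shift ? 12 : 0) + (ctrl ? 24 : 0);
    snprintf(buf, len, "\033[%c", sco_codes[index]);
}
```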
Simon Tatham	b7a9cdd6ee	term_get_userpass_input: missing NULL check. If term_get_userpass_input is called with term->ldisc not yet set up, then we had a special-case handler that returns an error message - but it does it via the same subroutine that returns normal results, which also turns off the prompt callback in term->ldisc! Need an extra NULL check in that subroutine. Thanks Coverity.	2022-01-29 18:25:34 +00:00
Simon Tatham	6d77541080	bidi_test: minor memory fixes. Spotted by Coverity: if you _just_ gave a filename to bidi_test, without any previous argument that set testfn to something other than NULL, the program would crash rather than giving an error message. (It's only a test program, but test programs you only run once in a blue moon are the ones that _most_ need to explain their command-line syntax to you carefully, because you've forgotten it since last time you used them!) Also, conditionalised a memcpy on the size not being 0, because it's illegal to pass a null pointer to memcpy _even_ if size==0. (That would only happen with a test case containing a zero-length string, but whatever.)	2022-01-29 18:25:34 +00:00
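The memcpy point in miniature: the C standard requires valid pointer arguments even when the size is 0, so a possibly-NULL source for an empty copy has to be guarded. copy_maybe_empty is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <string.h>

/* Skip the call entirely for n == 0, so src may legitimately be NULL. */
static void copy_maybe_empty(void *dst, const void *src, size_t n)
{
    if (n > 0)
        memcpy(dst, src, n);
}
```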
Simon Tatham	1f6fa876e3	do_bidi: remove a pointless assert. When the textlen parameter became a size_t, it became unsigned, so it stopped being useful to assert() its non-negativity. Spotted by Coverity. Harmless, but ordinary compilers have been known to emit annoying warnings about that kind of thing too, so it's worth fixing just to avoid noise.	2022-01-29 18:24:31 +00:00
Simon Tatham	a2ff884512	Richer data type for interactive prompt results. All the seat functions that request an interactive prompt of some kind to the user - both the main seat_get_userpass_input and the various confirmation dialogs for things like host keys - were using a simple int return value, with the general semantics of 0 = "fail", 1 = "proceed" (and in the case of seat_get_userpass_input, answers to the prompts were provided), and -1 = "request in progress, wait for a callback". In this commit I change all those functions' return types to a new struct called SeatPromptResult, whose primary field is an enum replacing those simple integer values. The main purpose is that the enum has not three but _four_ values: the "fail" result has been split into 'user abort' and 'software abort'. The distinction is that a user abort occurs as a result of an interactive UI action, such as the user clicking 'cancel' in a dialog box or hitting ^D or ^C at a terminal password prompt - and therefore, there's no need to display an error message telling the user that the interactive operation has failed, because the user already knows, because they _did_ it. 'Software abort' is from any other cause, where PuTTY is the first to know there was a problem, and has to tell the user. We already had this 'user abort' vs 'software abort' distinction in other parts of the code - the SSH backend has separate termination functions which protocol layers can call. But we assumed that any failure from an interactive prompt request fell into the 'user abort' category, which is not true. A couple of examples: if you configure a host key fingerprint in your saved session via the SSH > Host keys pane, and the server presents a host key that doesn't match it, then verify_ssh_host_key would report that the user had aborted the connection, and feel no need to tell the user what had gone wrong! 
Similarly, if a password provided on the command line was not accepted, then (after I fixed the semantics of that in the previous commit) the same wrong handling would occur. So now, those Seat prompt functions too can communicate whether the user or the software originated a connection abort. And in the latter case, we also provide an error message to present to the user. Result: in those two example cases (and others), error messages should no longer go missing. Implementation note: to avoid the hassle of having the error message in a SeatPromptResult being a dynamically allocated string (and hence, every recipient of one must always check whether it's non-NULL and free it on every exit path, plus being careful about copying the struct around), I've instead arranged that the structure contains a function pointer and a couple of parameters, so that the string form of the message can be constructed on demand. That way, the only users who need to free it are the ones who actually _asked_ for it in the first place, which is a much smaller set. (This is one of the rare occasions that I regret not having C++'s extra features available in this code base - a unique_ptr or shared_ptr to a string would have been just the thing here, and the compiler would have done all the hard work for me of remembering where to insert the frees!)	2021-12-28 18:08:31 +00:00
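A sketch of the result type described above, assuming four kinds and a lazily invoked message builder. Every name here (PromptResultKind, PromptResult, errfn, hostkey_mismatch_msg) is hypothetical rather than PuTTY's actual SeatPromptResult definition. The point it illustrates is ownership: the struct itself holds no heap memory, and only a caller that asks for the message must free it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    SPRK_INCOMPLETE,  /* request in progress, wait for a callback */
    SPRK_OK,          /* proceed */
    SPRK_USER_ABORT,  /* user cancelled: they already know */
    SPRK_SW_ABORT,    /* software abort: the user must be told why */
} PromptResultKind;

typedef struct {
    PromptResultKind kind;
    char *(*errfn)(const void *ctx);  /* builds the message on demand */
    const void *errctx;               /* parameter for errfn */
} PromptResult;

/* Returns a freshly allocated message (caller frees), or NULL. */
static char *prompt_result_message(PromptResult r)
{
    return (r.kind == SPRK_SW_ABORT && r.errfn) ? r.errfn(r.errctx) : NULL;
}

static char *hostkey_mismatch_msg(const void *ctx)
{
    const char *host = ctx;
    const char *fmt = "Host key for %s did not match the configured fingerprint";
    int n = snprintf(NULL, 0, fmt, host);
    char *s = malloc((size_t)n + 1);
    if (s)
        snprintf(s, (size_t)n + 1, fmt, host);
    return s;
}
```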
Simon Tatham	bc91a39670	Proper buffer management between terminal and backend. The return value of term_data() is used as the return value from the GUI-terminal versions of the Seat output method, which means backends will take it to be the amount of standard-output data currently buffered, and exert back-pressure on the remote peer if it gets too big (e.g. by ceasing to extend the window in that particular SSH-2 channel). Historically, as a comment in term_data() explained, we always just returned 0 from that function, on the basis that we were processing all the terminal data through our terminal emulation code immediately, and never retained any of it in the buffer at all. If the terminal emulation code were to start running slowly, then it would slow down the _whole_ PuTTY system, due to single-threadedness, and back-pressure of a sort would be exerted on the remote by it simply failing to get round to reading from the network socket. But by the time we got back to the top level of term_data(), we'd have finished reading all the data we had, so it was still appropriate to return 0. That comment is still correct if you're thinking about the limiting factor on terminal data processing being the CPU usage in term_out(). But now that's no longer the whole story, because sometimes we leave data in term->inbuf without having processed it: during drag-selects in the terminal window, and (just introduced) while waiting for the response to a pending window resize request. For both those reasons, we _don't_ always have a buffer size of zero when we return from term_data(). So now that hole in our buffer size management is filled in: term_data() returns the true size of the remaining unprocessed terminal output, so that back-pressure will be exerted if the terminal is currently not consuming it. And when processing resumes and we start to clear our backlog, we call backend_unthrottle to let the backend know it can relax the back-pressure if necessary.	2021-12-19 11:02:48 +00:00
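A toy model of the new contract, assuming the input buffer can be reduced to a byte count: term_data_sketch returns whatever is still unprocessed, and resuming after a pause counts a call where the real code would call backend_unthrottle. TermModel, term_data_sketch and term_resume are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t inbuf_len;      /* unprocessed terminal output */
    bool paused;           /* e.g. drag-select or pending resize */
    int unthrottle_calls;  /* stands in for backend_unthrottle() */
} TermModel;

static void term_process(TermModel *t)
{
    if (t->paused)
        return;            /* leave the data queued */
    t->inbuf_len = 0;      /* pretend we emulated all of it */
}

/* Report the true backlog so the backend can apply back-pressure. */
static size_t term_data_sketch(TermModel *t, size_t len)
{
    t->inbuf_len += len;
    term_process(t);
    return t->inbuf_len;
}

static void term_resume(TermModel *t)
{
    size_t before = t->inbuf_len;
    t->paused = false;
    term_process(t);
    if (t->inbuf_len < before)
        t->unthrottle_calls++;  /* backlog shrank: relax back-pressure */
}
```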
Simon Tatham	420fe75552	Suspend terminal output while a window resize is pending. This is the payoff from the last few commits of refactoring. It fixes the following race-condition bug in terminal application redraw: * server sends a window-resizing escape sequence * terminal requests a window resize from the front end * server sends further escape sequences to perform a redraw of some full-screen application, which assume that the window resize has occurred and the window is already its new size * terminal processes all those sequences in the context of the old window size, while the front end is still thinking * window resize completes in the front end and term_size() tells the terminal it now has its new size, but it's too late, the screen redraw has made a total mess. (Perhaps the server might even send its window resize + followup redraw all in one SSH packet, so that it's all queued in term->inbuf in one go.) As far as I can see, handling of this case has been broken more or less forever in the GTK frontend (where window resizes are inherently asynchronous due to the way X11 works, and we've never done anything to compensate for that). On Windows, where window size is changed via SetWindowPos which is synchronous, it used to work, but broke in commit `d74308e90e` (i.e. between 0.74 and 0.75), which made all the ancillary window updates run on the same delayed-action timer as ordinary text display. So, it's time to fix it, and I think now I should be able to fix it in GTK as well as on Windows. Now, as soon as we've set the term->win_resize_pending flag (in response to a resize escape sequence), the next return to the top of the main loop in term_out will terminate output processing early, leaving any further terminal data still in the term->inbuf bufchain. 
Once we get a term_size() callback from the front end telling us our new size, we reset term->win_resize_pending, which unblocks output processing again, and we also queue a toplevel callback to have another try at term_out() so that it will be unblocked promptly. To implement this I've changed term->win_resize_pending from a bool into a three-state enumeration, so that we can tell the difference between 'pending' in the sense of not yet having sent our resize request to the frontend, and in the sense of waiting for the frontend to reply. That way, a window resize from the GUI user at least won't be mistaken for the response to our resize request if it arrives in the former state. (It can still be mistaken for one in the latter case, but if the user is resizing the window at the same time as the server-side application is doing critically size-dependent redrawing, I don't think there can be any reasonable expectation of nothing going wrong.) As mentioned in the previous commit, some failure modes under X11 (in particular the window manager process getting wedged in some way) can result in no response being received to a ConfigureWindow request. In that situation, it seems to me that we really _shouldn't_ sit there waiting forever - perhaps it's technically the WM's fault and not ours, but what kind of X window are you most likely to want to use to do emergency WM repair? A terminal window, of course, so it would be exceptionally unhelpful to make any terminal window stop working completely in this situation! Hence, there's a fallback timeout in terminal.c, so that if we don't receive a response in _too_ long, we'll assume one is not forthcoming, and resume processing terminal data at the old window size. The fallback timeout is set to 5 seconds, following existing practice in libXt (DEFAULT_WM_TIMEOUT).	2021-12-19 10:54:59 +00:00
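The three-state flag can be sketched as a small state machine. The enumerator and function names below are hypothetical. The sketch shows why three states help: a term_size() that arrives before our request has even been sent is not mistaken for the reply. The fallback timeout (5 seconds in the commit) is modelled as another way out of the waiting state.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    WIN_RESIZE_NO,           /* nothing pending */
    WIN_RESIZE_NEED_SEND,    /* server asked; request not yet sent */
    WIN_RESIZE_AWAIT_REPLY,  /* sent to frontend; waiting for term_size() */
} ResizeState;

typedef struct { ResizeState resize; } ResizeTerm;

static void on_resize_sequence(ResizeTerm *t) { t->resize = WIN_RESIZE_NEED_SEND; }

/* The window-update callback actually hands the request to the frontend. */
static void on_send_request(ResizeTerm *t)
{
    if (t->resize == WIN_RESIZE_NEED_SEND)
        t->resize = WIN_RESIZE_AWAIT_REPLY;
}

/* term_size() only counts as the reply once the request has been sent. */
static void on_term_size(ResizeTerm *t)
{
    if (t->resize == WIN_RESIZE_AWAIT_REPLY)
        t->resize = WIN_RESIZE_NO;
}

/* Fallback: give up waiting and resume at the old size. */
static void on_resize_timeout(ResizeTerm *t)
{
    if (t->resize == WIN_RESIZE_AWAIT_REPLY)
        t->resize = WIN_RESIZE_NO;
}

/* term_out() stops at the top of its loop while this is true. */
static bool output_blocked(const ResizeTerm *t) { return t->resize != WIN_RESIZE_NO; }
```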
Simon Tatham	be0cea7130	Stop using a local buffer in term_out. There's no actual need to copy the data from term->inbuf into a local variable inside term_out(). We can simply store a pointer and length, and use the data _in situ_ - as long as we remember how much of it we've used, and bufchain_consume() it when the routine exits. Getting rid of that awkward and size-limited local array should marginally improve performance. But also, it opens up the possibility to suddenly suspend handling of terminal data and leave everything not yet processed in the bufchain, because now we never remove anything from the bufchain until _after_ it's been processed, so there's no need to awkwardly push the unused segment of localbuf[] back on to the front of the bufchain if we need to do that. NFC, but as usual, I have a plan to use the new capability in a followup commit.	2021-12-19 10:54:59 +00:00
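A sketch of in-situ processing, assuming a flat array stands in for the bufchain and a '!' byte stands in for a resize escape sequence. The loop reads through a pointer, counts what it used, and consumes only that much at the end, so stopping early needs no push-back. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct { char data[64]; size_t len; } Buf;  /* bufchain stand-in */

static void buf_consume(Buf *b, size_t n)
{
    memmove(b->data, b->data + n, b->len - n);
    b->len -= n;
}

typedef struct {
    Buf in;
    bool resize_pending;
    char out[64];
    size_t outlen;
} OutTerm;

static void term_out_sketch(OutTerm *t)
{
    const char *p = t->in.data;   /* use the data in place, no local copy */
    size_t used = 0;
    while (used < t->in.len) {
        if (t->resize_pending)
            break;                     /* leave the rest in the buffer */
        char c = p[used++];
        if (c == '!')
            t->resize_pending = true;  /* stands in for a resize sequence */
        else
            t->out[t->outlen++] = c;   /* stands in for terminal emulation */
    }
    buf_consume(&t->in, used);         /* remove only what was processed */
}
```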
Simon Tatham	5a54b3bf17	Factor out term_request_resize(). This tiny refactoring makes a convenient function for setting all the 'pending' flags and triggering a callback for the next window update. This saves a bit of code, but that's not really the main point (or else I'd have done the same to all the other similar things like window moves). The point is that in a future commit I'm going to want to do an extra thing on every server-controlled window resize, and this refactoring gives me a single place to put that extra action.	2021-12-19 10:54:59 +00:00
Simon Tatham	8f365e39f3	Centralise drag-select check into term_out(). This tiny refactoring replaces three identical checks at call sites, not all as well commented as each other, with a check in just one place with the best of the three comments.	2021-12-19 10:54:59 +00:00
Simon Tatham	27f00038e1	Fix trust-sigil handling when scrolling the terminal. Previously, when we scrolled the terminal, the newly exposed line at the bottom would be immediately allocated a trust status corresponding to the current state of the terminal. So if you're in trusted mode and you print a newline, then the line scrolled on at the bottom immediately gets a trust sigil, whether you subsequently print anything on it or not. Up until now, that hasn't mattered, because we always _do_ print something on it. But if you don't - if you send \r\n\r\n to deliberately leave a blank line - then it turns out that's not what we want after all, because if the screen _doesn't_ scroll, the passed-over line remains completely blank, whereas if it does scroll the blank line gets a trust sigil, which is inconsistent. Now, terminal lines newly exposed by a scroll have untrusted status, just the same as terminal lines that were present in the initial blank screen. They only become trusted if you actually print at least one character on them (whereupon check_trust_status will re-clear them just in case). And this is now independent of whether the terminal has scrolled or not.	2021-10-30 17:24:45 +01:00
Simon Tatham	b13f3d079b	New function-key mode similar to modern xterm. This is the same as the previous FUNKY_XTERM mode if you don't press any modifier keys, but now Shift or Ctrl or Alt with function keys adds an extra bitmap parameter. The bitmaps are the same as the ones used by the new SHARROW_BITMAP arrow key mode.	2021-10-23 11:31:09 +01:00
Simon Tatham	a40b581fc1	Fix Alt handling in the new shifted-arrow-key support. As well as affecting the bitmap field in the escape sequence, it was _also_ having its otherwise standard effect of prefixing Esc to the whole sequence. It shouldn't do both.	2021-10-23 10:55:54 +01:00
Simon Tatham	22911ccdcc	New config option for shifted arrow key handling. This commit introduces a new config option for how to handle shifted arrow keys. In the default mode (SHARROW_APPLICATION), we do what we've always done: Ctrl flips the arrow keys between sending their most usual escape sequences (ESC [ A ... ESC [ D) and sending the 'application cursor keys' sequences (ESC O A ... ESC O D). Whichever of those modes is currently configured, Ctrl+arrow sends the other one. In the new mode (SHARROW_BITMAP), application cursor key mode is unaffected by any shift keys, but the default sequences acquire two numeric arguments. The first argument is 1 (reflecting the fact that a shifted arrow key still notionally moves just 1 character cell); the second is the bitmap (1 for Shift) + (2 for Alt) + (4 for Ctrl), offset by 1. (Except that if _none_ of those modifiers is pressed, both numeric arguments are simply omitted.) The new bitmap mode is what current xterm generates, and also what Windows ConPTY seems to expect. If you start an ordinary Command Prompt and launch into WSL, those are the sequences it will generate for shifted arrow keys; conversely, if you run a Command Prompt within a ConPTY, then these sequences for Ctrl+arrow will have the effect you expect in cmd.exe command-line editing (going backward or forward a word). For that reason, I enable this mode unconditionally when launching Windows pterm.	2021-10-18 20:15:35 +01:00
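A sketch of the SHARROW_BITMAP sequences for the default (non-application) cursor mode, using the bitmap given above: 1 for Shift, 2 for Alt, 4 for Ctrl, offset by 1. Per the Alt fix in a40b581fc1, Alt only contributes to the bitmap and adds no ESC prefix. format_arrow_bitmap is a hypothetical name, and application cursor mode is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* dir is 'A' (up), 'B' (down), 'C' (right) or 'D' (left). */
static void format_arrow_bitmap(char *buf, size_t len, char dir,
                                bool shift, bool alt, bool ctrl)
{
    int bitmap = (shift ? 1 : 0) + (alt ? 2 : 0) + (ctrl ? 4 : 0);
    if (bitmap)
        snprintf(buf, len, "\033[1;%d%c", bitmap + 1, dir);  /* ESC[1;mX */
    else
        snprintf(buf, len, "\033[%c", dir);  /* no modifiers: omit both args */
}
```

Ctrl+Right, for instance, would send ESC[1;5C, the sequence cmd.exe's line editor treats as "forward a word".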
Simon Tatham	c35d8b8328	win_set_[icon_]title: send a codepage along with the string. While fixing the previous commit I noticed that window titles don't actually _work_ properly if you change the terminal character set, because the text accumulated in the OSC string buffer is sent to the TermWin as raw bytes, with no indication of what character set it should interpret them as. You might get lucky if you happened to choose the right charset (in particular, UTF-8 is a common default), but if you change the charset half way through a run, then there's certainly no way the frontend will know to interpret two window titles sent before and after the change in two different charsets. So, now win_set_title() and win_set_icon_title() both include a codepage parameter along with the byte string, and it's up to them to translate the provided window title from that encoding to whatever the local window system expects to receive. On Windows, that's wide-string Unicode, so we can just use the existing dup_mb_to_wc utility function. But in GTK, it's UTF-8, so I had to write an extra utility function to encode a wide string as UTF-8.	2021-10-16 14:00:46 +01:00
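A sketch of the kind of helper the GTK side needs: encoding Unicode text as UTF-8. To stay portable it takes an array of 32-bit code points rather than wchar_t (whose width differs between platforms), and replaces surrogates and out-of-range values with U+FFFD. encode_utf8 is a hypothetical name, not the utility function added in the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Encode n code points as a NUL-terminated UTF-8 string; caller frees. */
static char *encode_utf8(const uint32_t *s, size_t n)
{
    char *out = malloc(4 * n + 1), *p = out;  /* at most 4 bytes each */
    if (!out)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = s[i];
        if ((c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
            c = 0xFFFD;                       /* not encodable */
        if (c < 0x80) {
            *p++ = (char)c;
        } else if (c < 0x800) {
            *p++ = (char)(0xC0 | (c >> 6));
            *p++ = (char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            *p++ = (char)(0xE0 | (c >> 12));
            *p++ = (char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (char)(0x80 | (c & 0x3F));
        } else {
            *p++ = (char)(0xF0 | (c >> 18));
            *p++ = (char)(0x80 | ((c >> 12) & 0x3F));
            *p++ = (char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (char)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```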
Simon Tatham	4f41bc04ab	Charset-aware handling of C1 ST in OSC sequences. When the terminal is in UTF-8 mode, we accumulate UTF-8 text normally in the OSC string buffer - but the byte 0x9C is interpreted as the C1 control character String Terminator, which terminates the OSC sequence. That's not really what you want in UTF-8 mode, because 0x9C is also a perfectly normal UTF-8 continuation byte. For example, you'd expect this to set the window title to "FÜNF": echo -ne '\033]0;FÜNF\007' but in fact, by the sheer chance that Ü is encoded with a 0x9C byte (its UTF-8 encoding is 0xC3 0x9C), you get a window title consisting of "F" followed by an illegal-encoding marker, and the OSC sequence is terminated abruptly so that the trailing 'NF' is printed normally to the terminal and then the BEL generates a beep. Now, in UTF-8 mode, we only support the C1 control for ST if it appears in the form of the proper UTF-8 encoding of U+009C. So that example now 'works', at least in the sense that the terminal considers the OSC sequence to terminate where the sender expected it to terminate. Another case where we interpret 0x9C inappropriately as ST is if the terminal is in a single-byte character set in which that character is a printing one. In CP437, for example, you can't set a window title containing a pound sign, because its encoding is 0x9C. This commit by itself doesn't make those window titles _work_, in the sense of coming out looking right. It just means that the OSC sequence is not terminated at the wrong place. The actual title rendering will be fixed in the next commit.	2021-10-16 14:00:46 +01:00
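A sketch of the charset-aware test for a C1 String Terminator, assuming three kinds of charset: UTF-8, a single-byte charset where 0x9C is the C1 control, and a single-byte charset where 0x9C prints (as the pound sign does in CP437). CharsetKind and c1_st_len are hypothetical names. With this check, the 0x9C inside "Ü" no longer ends the sequence.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    CS_UTF8,          /* UTF-8: ST only as the encoding of U+009C */
    CS_SBCS_C1,       /* single-byte, 0x9C is the C1 control */
    CS_SBCS_PRINTING, /* single-byte, 0x9C is printable (CP437's pound) */
} CharsetKind;

/* Length of a C1 ST starting at p, or 0 if there isn't one. */
static size_t c1_st_len(const unsigned char *p, size_t len, CharsetKind cs)
{
    switch (cs) {
    case CS_UTF8:
        return (len >= 2 && p[0] == 0xC2 && p[1] == 0x9C) ? 2 : 0;
    case CS_SBCS_C1:
        return (len >= 1 && p[0] == 0x9C) ? 1 : 0;
    default:
        return 0;     /* 0x9C is ordinary text here */
    }
}
```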
Simon Tatham	e744071a03	Remove some unused variables. clang warned about these in the recent bidi work.	2021-10-16 12:03:39 +01:00
Simon Tatham	54930cf784	bidi.c: correct comments. I accidentally deleted the original author's name in my rewrite, which was unnecessarily unfriendly given that some of their code is still here. Also I made a thinko in my explanation of the U+00AD problem.	2021-10-10 22:55:41 +01:00
Simon Tatham	93ba74579a	Test rig for the new bidi algorithm. This standalone CLI program runs the UCD bidi tests in the form provided in Unicode 14.0.0. You can run it by just saying bidi_test --class BidiTest.txt --char BidiCharacterTest.txt assuming those two UCD files are in the current directory.	2021-10-10 15:00:30 +01:00
Simon Tatham	b8be01adca	Complete rewrite of the bidi algorithm. A user reported that PuTTY's existing bidi algorithm will generate misordered text in cases like this (assuming UTF-8): echo -e '12 A \xD7\x90\xD7\x91 B' The hex codes in the middle are the Hebrew letters aleph and beth. Appearing in the middle of a line whose primary direction is left-to-right, those two letters should appear in the opposite order, but not cause the rest of the line to move around. That is, you expect the displayed text in this situation to be 12 A <beth><aleph> B But in fact, the digits '12' were erroneously reversed, so you would actually see '21 A <beth><aleph> B'. I tried to debug the existing bidi algorithm, but it was very hard, because the Unicode bidi spec has been extensively changed since Arabeyes contributed that code, and I couldn't even reliably work out which version of the spec the code was intended to implement. I found some problems, notably that the resolution phase was running once on the whole line instead of separately on runs of characters at the same level, and also that the 'sor' and 'eor' values were being wrongly computed. But I had no way to test any fix to ensure it hadn't introduced another bug somewhere else. Unicode provides a set of conformance tests in the UCD. That was just what I wanted - but they're too up-to-date to run against the old algorithm and expect to pass! So, paradoxically, it seemed to me that the _easiest_ way to fix this bidi bug would be to bring absolutely everything up to date. But the revised bidi algorithm is significantly more complicated, so I also didn't think it would be sensible to try to gradually evolve the existing code into it. Instead, I've done a complete rewrite of my own. The new code implements the full UAX#9 rev 44 algorithm, including in particular support for the new 'directional isolate' control characters, and also special handling for matched pairs of brackets in the text (see rule N0 in the spec). 
I've managed to get it to pass the entire UCD conformance test suite, so I'm reasonably confident it's right, or at the very least a lot closer to right than the old algorithm was. So the upshot is: the test case shown at the top of this message now passes, but also, other detailed bidi handling might have changed, certainly some cases involving brackets, but perhaps also other things that were either bugs in the old algorithm or updates to the standard.	2021-10-10 15:00:30 +01:00
Simon Tatham	caa16deb1c	bidi.c: update the API. The input length field is now a size_t rather than an int, on general principles. The return value is now void (we weren't using the previous return value at all). And we now require the client to have previously allocated a BidiContext, which will allow allocated storage to be reused between runs, saving a lot of churn on malloc. (However, the current BidiContext doesn't contain anything interesting. I could have moved the existing mallocs into it, but there's no point, since I'm about to rewrite the whole thing anyway.)	2021-10-10 14:55:16 +01:00
Simon Tatham	804f32765f	Make bidi type enums into list macros. This makes it easier to create the matching array of type names in bidi_gettype.c, and eliminates the need for an assertion to check the array matched the enum. And I'm about to need to add more types, so let's start by making that trivially easy.	2021-10-10 14:55:15 +01:00
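The list-macro (X-macro) technique in miniature, assuming a cut-down set of bidi types: one list drives both the enum and the matching name array, so they cannot drift apart and no runtime assertion is needed. BIDI_TYPE_LIST and the other names are hypothetical, not the macros in PuTTY's bidi.h.

```c
#include <assert.h>
#include <string.h>

/* The single source of truth: each entry is passed to X(). */
#define BIDI_TYPE_LIST(X) \
    X(L)                  \
    X(R)                  \
    X(AL)                 \
    X(EN)                 \
    X(ON)

#define ENUM_DECL(name) name,
typedef enum { BIDI_TYPE_LIST(ENUM_DECL) N_BIDI_TYPES } BidiType;
#undef ENUM_DECL

#define NAME_STR(name) #name,
static const char *const bidi_type_names[] = { BIDI_TYPE_LIST(NAME_STR) };
#undef NAME_STR
```

Adding a type is then a one-line change to the list.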
Simon Tatham	d7548d0449	Move bidi gettype main() into its own file. That's what I've usually been doing with any main()s I find under ifdef; there's no reason this should be an exception. If we're keeping it in the code at all, we should ensure it carries on compiling. I've also created a new header file bidi.h, containing pieces of the bidi definitions shared between bidi.c and the new source file.	2021-10-10 14:53:25 +01:00
Simon Tatham	0377c689f2	Start a 'terminal' source subdirectory. This contains terminal.c, bidi.c (formerly minibidi.c), and terminal.h. I'm about to make a couple more bidi-related source files, so it seems worth starting by making a place to put them that won't be cluttering up the top level.	2021-10-10 14:37:10 +01:00

38 Commits