Lots of functions had really generic names (like 'makekey'), or names
that missed out an important concept (like 'rsakey_pubblob', which
loads a public blob from a _file_ and doesn't generate it from an
in-memory representation at all). Also, the opaque 'int order' that
distinguishes the two formats of public key blob is now a mnemonic
enumeration, and while I'm at it, rsa_ssh1_public_blob takes one of
those as an extra argument.
This seems to be a knock-on effect of my recent reworking of the SSH
code to be based around queues and callbacks. The loop iteration
function in uxsftp.c (ssh_sftp_do_select) would keep going round its
select loop until something had happened on one of its file
descriptors, and then return to the caller in the assumption that the
resulting data might have triggered whatever condition the caller was
waiting for - and if not, then the caller checks, finds nothing
interesting has happened, and resumes looping with no harm done.
But now, when something happens on an fd, it doesn't _synchronously_
trigger the follow-up condition PSFTP was waiting for (which, at
startup time, happens to be back->sendok() starting to return TRUE).
Instead, it schedules a callback, which will schedule a callback,
which ... ends up setting that flag. But by that time, the loop
function has already returned, the caller has found nothing
interesting and resumed looping, and _now_ the interesting thing
happens but it's too late because ssh_sftp_do_select will wait until
the next file descriptor activity before it next returns.
Solution: give run_toplevel_callbacks a return value which says
whether it's actually done something, and if so, return immediately in
case that was the droid the caller was looking for. As it were.
In commit 528513dde I absentmindedly replaced a write to the local
variable 'need_size' of drawing_area_setup with a write to
inst->drawing_area_setup_needed, imagining that they had the same
effect. But actually, need_size was doing two jobs and I only replaced
one of them: it was also the variable that indicated that the logical
terminal size had changed and so we had to call term_size() to make
the terminal.c data structures resize themselves appropriately. The
loss of that call also inhibited generation of SIGWINCH.
NFC for the moment, because the bufchain is always specially
constructed to hold exactly the same data that would have been passed
in to the function as a (pointer,length) pair. But this API change
allows get_userpass_input to express the idea that it consumed some
but not all of the data in the bufchain, which means that later on
I'll be able to point the same function at a longer-lived bufchain
containing the full stream of keyboard input and avoid dropping
keystrokes that arrive too quickly after the end of an interactive
password prompt.
NFC: this is a preliminary refactoring, intended to make my life
easier when I start changing around the APIs used to pass user
keyboard input around. The fewer functions even _have_ such an API,
the less I'll have to do at that point.
Changing the window's font size with Alt-< or Alt-> was not setting
any of the flags that make drawing_area_setup consider itself to have
been non-spuriously called, so the real window would enlarge without
the backing surface also doing so.
Since Pageant contains its own passphrase prompt system rather than
delegating it to another process, it's not trivial to use it in other
contexts. But having gone to the effort of coming up with my own
askpass system that (I think) does a better job at not revealing the
length of the password, I _want_ to use it in other contexts where a
GUI passphrase or password prompt is needed. Solution: an --askpass
option.
Mostly for debugging purposes, because I'm tired of having to use
'setsid' to force Pageant to select the GUI passphrase prompt when I'm
trying to fix bugs in gtkask.c. But I can also imagine situations in
which the ability to force a GUI prompt window might be useful to end
users, for example if the process does _technically_ have a
controlling terminal but it's not a user-visible one (say, in the back
end of some automation tool like expect(1)).
For symmetry, I also provide an option to force the tty prompt. That's
less obviously useful, because that's already the preferred prompt
type when both methods are available - so the only use for it would be
if you wanted to ensure that Pageant didn't _accidentally_ try to
launch a GUI prompt, and aborted with an error if it couldn't use a
tty prompt.
I've found Unix Pageant's GTK password prompt to be a bit flaky on
Ubuntu 18.04. Part of the reason for that seems to be (I _think_) that
GTK has changed its internal order of setting things up, so that you
can no longer call gtk_widget_show_now() and expect that when it
returns everything is ready to do a gdk_seat_grab. Another part is
that - completely mysteriously as far as I can see - a _failed_
gdk_seat_grab(GDK_SEAT_CAPABILITY_KEYBOARD) has the side effect of
calling gdk_window_hide on the window you gave it!
So I've done a considerable restructuring that means we no longer
attempt to do the keyboard grab synchronously in gtk_askpass_setup.
Instead, we make keyboard grab attempts during the run of gtk_main,
scheduling each one on a timer if the previous attempt fails.
This means I need a visual indication of 'not ready for you to type
anything yet', which I've arranged by filling in the three drawing
areas to mid-grey. At the point when the keyboard grab completes and
the window becomes receptive to input, they turn into the usual one
black and two white.
In GTK 3.10 and above, high-DPI support is arranged by each window
having a property called a 'scale factor', which translates logical
pixels as seen by most of the GTK API (widget and window sizes and
positions, coordinates in the "draw" event, etc) into the physical
pixels on the screen. This is handled more or less transparently,
except that one side effect is that your Cairo-based drawing code had
better be able to cope with that scaling without getting confused.
PuTTY's isn't, because we do all our serious drawing on a separate
Cairo surface we made ourselves, and then blit subrectangles of that
to the window during updates. This has two bad consequences. Firstly,
our surface has a size derived from what GTK told us the drawing area
size is, i.e. corresponding to GTK's _logical_ pixels, so when the
scale factor is (say) 2, our drawing takes place at half size and then
gets scaled up by the final blit in the draw event, making it look
blurry and unpleasant. Secondly, those final blits seem to end up
offset by half a pixel, so that a second blit over the same
subrectangle doesn't _quite_ completely wipe out the previously
blitted data - so there's a ghostly rectangle left behind everywhere
the cursor has been.
It's not that GTK doesn't _let_ you find out the scale factor; it's
just that it's in an out-of-the-way piece of API that you have to call
specially. So now we do: our backing surface is now created at a pixel
resolution matching the screen's real pixels, and we translate GTK's
scale factor into an ordinary cairo_scale() before we commence
drawing. So we still end up drawing the same text at the same size -
and this strategy also means that non-text elements like cursor
outlines and underlining will be scaled up with the screen DPI rather
than stubbornly staying one physical pixel thick - but now it's nice
and sharp at full screen resolution, and the subrectangle blits in the
draw event are back to affecting the exact set of pixels we expect
them to.
One silly consequence is that, immediately after removing the last
one, I've installed a handler for the GTK "configure-event" signal!
That's because the GTK 3 docs claim that that's how you get notified
that your scale factor has changed at run time (e.g. if you
reconfigure the scale factor of a whole monitor in the GNOME settings
dialog). Actually in practice I seem to find out via the "draw" event
before "configure" bothers to tell me, but now I've got a usefully
idempotent function for 'check whether the scale factor has changed
and sort it out if so', I don't see any harm in calling it from
anywhere it _might_ be useful.
I've been using that signal since the very first commit of this source
file, as a combined way to be notified when the size of the drawing
area changes (typically due to user window resizing actions) and also
when the drawing area is first created and available to be drawn on.
Unfortunately, testing on Ubuntu 18.04, I ran into an oddity, in which
the call to gtk_widget_show(inst->window) in new_session_window() has
the side effect of delivering a spurious configure_event on the
drawing area with size 1x46 pixels. This causes the terminal to resize
itself to 1 column wide, and the mistake isn't rectified until a
followup configure-event arrives after new_session_window returns to
the GTK main loop. But that means terminal output can occur between
those two configure events (the connection-sharing "Reusing a shared
connection to host.name" is a good example), and when it does, it gets
embarrassingly wrapped at one character per line down the left column.
I briefly tried to bodge around this by trying to heuristically guess
which configure events were real and which were spurious, but I have
no faith in that strategy continuing to work. I think a better
approach is to abandon configure-event completely, and move to a
system in which the two purposes I was using it for are handled by two
_different_ GTK signals, namely "size-allocate" (for knowing when we
get resized) and "realize" (for knowing when the drawing area
physically exists for us to start setting up Cairo or GDK machinery).
The result seems to have fixed the silly one-column wrapping bug, and
retained the ability to handle window resizes, on every GTK version I
have conveniently available to test on, including GTK 3 both before
and after these spurious configures started to happen.
GTK 3 PuTTY/pterm has always assumed that if it was compiled with
_support_ for talking to the raw X11 layer underneath GTK and GDK,
then it was entitled to expect that raw X11 layer to exist at all
times, i.e. that GDK_DISPLAY_XDISPLAY would return a meaningful X
display that it could do useful things with. So if you ran it over the
GDK Wayland backend, it would immediately segfault.
Modern GTK applications need to cope with multiple GDK backends at run
time. It's fine for GTK PuTTY to _contain_ the code to find and use
underlying X11 primitives like the display and the X window id, but it
should be prepared to find that it's running on Wayland (or something
else again!) so those functions don't return anything useful - in
which case it should degrade gracefully to the subset of functionality
that can be accessed through backend-independent GTK calls.
Accordingly, I've centralised the use of GDK_DISPLAY_XDISPLAY into a
support function get_x_display() in gtkmisc.c, which starts by
checking that there actually is one first. All previous direct uses of
GDK_*_XDISPLAY now go via that function, and check the result for NULL
afterwards. (To save faffing about calling that function too many
times, I'm also caching the display pointer in more places, and
passing it as an extra argument to various subfunctions, mostly in
gtkfont.c.)
Similarly, the get_windowid() function that retrieves the window id to
put in the environment of pterm's child process has to be prepared for
there not to be a window id.
This isn't a complete fix for all Wayland-related problems. The other
one I'm currently aware of is that the default font is "server:fixed",
which is a bad default now that it won't be available on all backends.
And I expect that further problems will show up with more testing. But
it's a start.
This is a heavily edited (by me) version of a patch originally due to
Nico Williams and Viktor Dukhovni. Their comments:
* Don't delegate credentials when rekeying unless there's a new TGT
or the old service ticket is nearly expired.
* Check for the above conditions more frequently (every two minutes
by default) and rekey when we would delegate credentials.
* Do not rekey with very short service ticket lifetimes; some GSSAPI
libraries may lose the race to use an almost expired ticket. Adjust
the timing of rekey checks to try to avoid this possibility.
My further comments:
The most interesting thing about this patch to me is that the use of
GSS key exchange causes a switch over to a completely different model
of what host keys are for. This comes from RFC 4462 section 2.1: the
basic idea is that when your session is mostly bidirectionally
authenticated by the GSSAPI exchanges happening in initial kex and
every rekey, host keys become more or less vestigial, and their
remaining purpose is to allow a rekey to happen if the requirements of
the SSH protocol demand it at an awkward moment when the GSS
credentials are not currently available (e.g. timed out and haven't
been renewed yet). As such, there's no need for host keys to be
_permanent_ or to be a reliable identifier of a particular host, and
RFC 4462 allows for the possibility that they might be purely
transient and only for this kind of emergency fallback purpose.
Therefore, once PuTTY has done a GSS key exchange, it disconnects
itself completely from the permanent host key cache functions in
storage.h, and instead switches to a _transient_ host key cache stored
in memory with the lifetime of just that SSH session. That cache is
populated with keys received from the server as a side effect of GSS
kex (via the optional SSH2_MSG_KEXGSS_HOSTKEY message), and used if
later in the session we have to fall back to a non-GSS key exchange.
However, in practice servers we've tested against do not send a host
key in that way, so we also have a fallback method of populating the
transient cache by triggering an immediate non-GSS rekey straight
after userauth (reusing the code path we also use to turn on OpenSSH
delayed encryption without the race condition).
Colin Watson reports that on pre-releases of Ubuntu 18.04, configure
events which don't actually involve a change of window size show up
annoyingly often. Our handling of configure events involves throwing
away the backing Cairo surface, making a fresh blank one, and
scheduling a top-level callback to get terminal.c to do a repaint and
populate the new surface; so a draw event before that callback occurs
causes the window contents to flicker off and on again, not to mention
wasting a lot of time.
The simplest solution is to spot spurious configures, and respond by
not throwing away the previous Cairo surface in the first place.
Except in GTK1 (which doesn't have the former), via a gtkcompat.h
workaround.
Up-to-date GTK3 has deprecated gdk_beep(), causing build failures due
to the default -Werror setting.
Looks as if I haven't retried the GTK1 build for a while, and recent
GTK frontend development has broken it. The selection revamp has
pointed out that GTK1 didn't have the accessor function
gtk_selection_data_get_selection(), the standard GdkAtom value
GDK_SELECTION_CLIPBOARD, or keysyms for alphabetic characters; and
also I had an initialisation of one of my own structure fields
(dp->selparams) accidentally not guarded by the same GTK-versioning
ifdef that controls whether or not it was defined.
Ahem. I _spotted_ this in code review, and forgot to make the change
before pushing!
Because it's legitimate for a C implementation to define 'NULL' so
that it expands to just 0, it follows that if you use NULL in a
variadic argument list where the callee will expect to extract a
pointer, you run the risk of putting an int-sized rather than
pointer-sized argument on the list and causing the consumer to get out
of sync. So you have to add an explicit cast.
The PuTTY GUIs (Unix and Windows) maintain an in-memory event log
for display to users as they request. This uses ints for tracking
eventlog size, which is subject to memory exhaustion and (given
enough heap space) overflow attacks by servers (via, e.g., constant
rekeying).
Also a bounded log is more user-friendly. It is rare to want more
than the initial logging and the logging from a few recent rekey
events.
The Windows fix has been tested using Dr. Memory as a valgrind
substitute. No errors corresponding to the affected code showed up.
The Dr. Memory results.txt was split into a file per-error and then
grep Error $(grep -l windlg *)|cut -d: -f3-|sort |uniq -c
was used to compare. Differences arose from different usage of the GUI,
but no error could be traced to the code modified in this commit.
The Unix fix has been tested using valgrind. We don't destroy the
eventlog_stuff eventlog arrays, so we can't be entirely sure that we
don't leak more than we did before, but from code inspection it looks
like we don't (and anyways, if we leaked as much as before, just without
the integer overflow, well, that's still an improvement).
Now we don't annoyingly print the 'askappend' prompt if you ask a
PuTTY tool to write its packet log to something that's not a regular
file, such as /dev/fd/1 or /dev/tty or a named pipe.
(In the case of a named pipe, another annoyance fixed by this change
is that we also don't open it for reading in the course of the
existence test.)
Apparently I haven't tried a GTK2 build since the most recent set of
GTK-related code reorganisation. Some functions that were ifdef'ed out
in GTK3 builds were now unused even in GTK2 builds (and, because they
were also declared static, caused a -Werror build failure); and the
pointless stub version of gtkapp.c was missing a stub version of a
recently added function referred to from another module.
gtk_application_set_accels_for_action() is new in Gtk 3.12, but (e.g.)
Ubuntu 14.04 LTS still ships with Gtk 3.10.
On the other hand, the function I've used instead,
gtk_application_add_accelerator(), is deprecated from Gtk 3.14 onwards,
indicating that it will disappear in some future version, so I've left
the newer code in against that day.
It actually doesn't seem to be necessary: running 'otool -L' on the
real binary in the application bundle (Pterm-bin or PuTTY-bin) lists a
lot of paths starting with "@executable_path/../Resources/", which I
take to mean that the application is already set up to automatically
load the GTK shared libraries out of its own bundle directory, without
me having to give it the extra hint of DYLD_LIBRARY_PATH.
Moreover, I just got round to upgrading my Mac to High Sierra, and now
the version of osxlaunch _with_ DYLD_LIBRARY_PATH is causing a crash
at program load time, when the libpng in the MacOS system library
directory tries to use the libz in the application bundle and finds
that it doesn't provide an entry point it was expecting
('inflateValidate'). I could try to fix that by updating the libz
version in my OS X PuTTY build environment, but that seems to me to
set a precedent of running to keep up with any further dependencies
the system libraries happen to acquire in later releases. Better to
reset DYLD_LIBRARY_PATH so that the system libpng will load the system
libz and not get confused in the first place.
I've been having intermittent segfaults in this launcher program, and
by means of the new TEST_COMPILE_ON_LINUX facility introduced by
commit eef8cac28, I ran it under valgrind which helpfully pointed out
several pointers between linked-list nodes which I'd been relying on
OS memory allocation to happen to have zeroed for me.
By default, the program still builds on Linux to a stub that just
prints 'nothing to see here'. But if you compile with
-DTEST_COMPILE_ON_LINUX, it compiles to a program that still doesn't
do anything _actually_ useful, but goes through all the same motions
that real osxlaunch would go through, until the final execv(2) fails
because of course it's not _really_ living in an application bundle
directory of the right shape.
That allows me to run all the setup code under the debugging tools I'm
most used to, in my preferred environment. (Same rationale as having
puttyapp / ptermapp build for Linux too.)
I've filled in the results of some not-entirely-conclusive
investigation into the trackpad scrolling issue, some thoughts on
resizing, and reordered the items into what currently seems the most
sensible order to me.
This still isn't complete: I also need to add the variable collections
of things like mid-session special commands and saved session names,
and also I need to try to grey out menu items when they're not
applicable. But it's a start.
Just to avoid an endless proliferation of functions too small to see,
I've arranged an enumeration of action ids and a single
app_menu_action function on the receiving end, and in gtkapp.c, a list
macro that means I at least don't have to define the tiny callback
functions and the GActionEntry records by hand and keep them in sync.
This fixes the problem I'd previously noticed, that if you don't
configure the "Command key acts as Meta" setting, then keystrokes like
Command-Q which _ought_ to function as accelerators for the
application menu bar don't.
Turns out that this was for the totally obvious reason: the keyboard
event was still being processed by gtkwin.c's key_event() and
translated via the GTK IM into ordinary keyboard input. If instead I
return FALSE from key_event on detecting that a key event has a
non-Meta-configured Command modifier, then it will go to the next-
level key-event handler inside GTK itself which implements the menu
accelerator behaviour. Another problem ticked off the OS X checklist.
That's what I get for not testing on all platforms before I push.
Forgot that, since OS X GTK mimics X11 GTK closely enough to still use
the name "CLIPBOARD" for the unique system clipboard, I had left this
code base's internal name for it as CLIP_CLIPBOARD and not the
CLIP_SYSTEM I used on Windows.
The gtkapp.c menu now has a Copy as well as Paste option; those menu
items, as well as the corresponding ones on the context menu and Copy
All, now address sets of clipboards parametrised between OS X and
ordinary GTK in unix.h. Also I've tweaked the wording of the
context-menu items to not use the X-specific terminology "CLIPBOARD"
on OS X.
I had a segfault on OS X today at Pterm.app shutdown. I wasn't able to
reproduce it in a debugger, but the cause seemed to be that
clipboard_clear called term_deselect (this was from before the patch
series that renamed that function) when inst->term was already NULL.
This must be because a clipboard_data_instance outlived its associated
inst->term, and quite likely its associated inst as well. But we can't
free those structures when a gui_data is freed, because GTK callbacks
will still depend on them; so instead we must have each gui_data keep
a list of active cdis pointing at it, and then at destruction time,
walk along the list nulling out each one's pointer to part of itself.
I've done the general clipboard revamp, and also, since I added
Ctrl-Shift-{C,V} as a new pair of UI actions for copy and paste, I've
also fulfilled the requirement that there should be some method of
non-menu-based pasting that doesn't depend on a middle mouse button or
an Ins key.
I think the list of OS X missing features is now down to details of
the OS X GTK port _itself_, as opposed to structural issues in the
general code base.
On all platforms, you can now configure which clipboard the mouse
pastes from, which clipboard Ctrl-Ins and Shift-Ins access, and which
Ctrl-Shift-C and Ctrl-Shift-V access. In each case, the options are:
- nothing at all
- a clipboard which is implicitly written by the act of mouse
selection (the PRIMARY selection on X, CLIP_LOCAL everywhere else)
- the standard clipboard written by explicit copy/paste UI actions
(CLIPBOARD on X, the unique system clipboard elsewhere).
Also, you can control whether selecting text with the mouse _also_
writes to the explicitly accessed clipboard.
The wording of the various messages changes between platforms, but the
basic UI shape is the same everywhere.
All the data fields referring to the selection in 'struct gui_data'
have been pulled out into a separate structure of which there are now
multiple instances, and I've plumbed through what should be the right
pointers and integer ids to everywhere they should go. So now the GTK
front end defines CLIP_PRIMARY and CLIP_CLIPBOARD in place of the
temporary cop-out CLIP_SYSTEM from the previous commit, and copying
and pasting can be done via either one.
The defaults should be the same as before, except that now the non-Mac
versions of the GtkApplication front ends will access CLIP_PRIMARY in
response to most actions but the 'Paste' menu item will paste from
CLIP_CLIPBOARD. (That's mostly just as a demonstration that accessing
multiple clipboards even works.)
This lays some groundwork for making PuTTY's cut and paste handling
more flexible in the area of which clipboard(s) it reads and writes,
if more than one is available on the system.
I've introduced a system of list macros which define an enumeration of
integer clipboard ids, some defined centrally in putty.h (at present
just a CLIP_NULL which never has any text in it, because that seems
like the sort of thing that will come in useful for configuring a
given copy or paste UI action to be ignored) and some defined per
platform. All the front end functions that copy and paste take a
clipboard id, and the Terminal structure is now configured at startup
to tell it which clipboard id it should paste from on a mouse click,
and which it should copy from on a selection.
However, I haven't actually added _real_ support for multiple X11
clipboards, in that the Unix front end supports a single CLIP_SYSTEM
regardless of whether it's in OS X or GTK mode. So this is currently a
NFC refactoring which does nothing but prepare the way for real
changes to come.
Previously, both the Unix and Windows front ends would respond to a
paste action by retrieving data from the system clipboard, converting
it appropriately, _storing_ it in a persistent dynamic data block
inside the front end, and then calling term_do_paste(term), which in
turn would call back to the front end via get_clip() to retrieve the
current contents of that stored data block.
But, as far as I can tell, this was a completely pointless mechanism,
because after a data block was written into this storage area, it
would be immediately used for exactly one paste, and then never
accessed again until the next paste action caused it to be freed and
replaced with a new chunk of pasted data.
So why on earth was it stored persistently at all, and why that
callback mechanism from frontend to terminal back to frontend to
retrieve it for the actual paste action? I have no idea. This change
removes the entire system and replaces it with the completely obvious
alternative: the character-set-converted version of paste data is
allocated in a _local_ variable in the frontend paste functions,
passed directly to term_do_paste which now takes (buffer,length)
parameters, and freed immediately afterwards. get_clip() is gone.
When testing the previous commit, I went to great lengths to check all
the tricky corner cases of the detailed command-line argument handling
in Plink and PuTTY, on Windows and Unix. And did I also double-check
that I had not completely broken the very simplest possible invocation
of pterm? I did not.
The call to cmdline_host_ok() in gtkmain.c was failing an assertion in
pterm, because that function only expects to have been called by a
program that has the TOOLTYPE_HOST_ARG flag set - if that flag isn't
set, the program is expected to come up with its own answer to the
question (because I wasn't sure what the right fallback answer would
be). And I forgot to conditionalise the call between PuTTY and pterm.
This is another piece of long-overdue refactoring similar to the
recent commit e3796cb77. But where that one dealt with normalisation
of stuff already stored _in_ a Conf by whatever means (including, in
particular, handling a user typing 'username@host.name' into the
Hostname box of the GUI session dialog box), this one deals with
handling argv entries and putting them into the Conf.
This isn't exactly a pure no-functional-change-at-all refactoring. On
the other hand, it isn't a full-on cleanup that completely
rationalises all the user-visible behaviour as well as the code
structure. It's somewhere in between: I've preserved all the behaviour
quirks that I could imagine a reason for having intended, but taken
the opportunity to _not_ faithfully replicate anything I thought was
clearly just a bug.
So, for example, the following inconsistency is carefully preserved:
the command 'plink -load session nextword' treats 'nextword' as a host
name if the loaded session hasn't provided a hostname already, and
otherwise treats 'nextword' as the remote command to execute on the
already-specified remote host, but the same combination of arguments
to GUI PuTTY will _always_ treat 'nextword' as a hostname, overriding
a hostname (if any) in the saved session. That makes some sense to me
because of the different shapes of the overall command lines.
On the other hand, there are two behaviour changes I know of as a
result of this commit: a third argument to GUI PuTTY (after a hostname
and port) now provokes an error message instead of being silently
ignored, and in Plink, if you combine a -P option (specifying a port
number) with the historical comma-separated protocol selection prefix
on the hostname argument (which I'd completely forgotten even existed
until this piece of work), then the -P will now override the selected
protocol's default port number, whereas previously the default port
would win. For example, 'plink -P 12345 telnet,hostname' will now
connect via Telnet to port 12345 instead of to port 23.
There may be scope for removing or rethinking some of the command-
line syntax quirks in the wake of this change. If we do decide to do
anything like that, then hopefully having it all in one place will
make it easier to remove or change things consistently across the
tools.
Now 'putty user@host' will do what you wanted on Unix the same way it
always has on Windows.
(Thanks to Geoff Winkless for pointing out this inconsistency. I've
redone his actual patch my way, but he should still be credited for
the inspiration!)
A more or less identical piece of code to sanitise the CONF_host
string prior to session launch existed in Windows PuTTY and both
Windows and Unix Plink. It's long past time it was centralised.
While I'm here, I've added a couple of extra comments in the
centralised version, including one that - unfortunately - tries _but
fails_ to explain why a string of the form "host.name:1234" doesn't
get the suffix moved into CONF_port the way "user@host" moves the
prefix into CONF_username. Commit c1c1bc471 is the one I'm referring
to in the comment, and unfortunately it has an unexplained one-liner
log message from before I got into the habit of being usefully
verbose.
Stopping dialog boxes from being modal is now done; post_main() is
defunct; nothing left in gtkwin.c does an inappropriate whole-process
termination in response to a window-level error or closure condition.
(There is still modalfatalbox(), but that's not an _inappropriate_
process termination.)
This one's in frontend_keypress(), which is supposed to close the
window on the first keypress after the session inside it terminates
(that is, if your close-on-exit settings haven't made it close already
at that point).
It looks to me as if that behaviour doesn't currently _work_, and
hasn't worked for quite a while (certainly it was broken as of 0.70,
well before I started on this weekend's refactoring), because when the
session terminates we delete inst->ldisc and that's what would
otherwise be calling frontend_keypress. I should probably decide what
to do about that at some point. But for the moment, I'm satisfied to
simply not break this functionality any worse by making it not a
process-global exit :-)
For gtkapp-based tools that will have to stop being a program-fatal
error, so I've turned it into a function called window_setup_error
(which I could in principle reuse for other problems in the long and
tortuous progress of new_session_window), and kept the original
handling in gtkmain.c's implementation of that function while gtkapp.c
does something more sensible with a message box.
Not all gtkwin-based tools use it. Only the ones with one session per
process, which parse a command line describing that session and might
reasonably want to report errors in that command line by writing to
standard error and exiting the program.
In other words, precisely the ones that link in gtkmain.c and not
gtkapp.c. So gtkmain.c is a more sensible place to put that
error-reporting function.
This was one of a handful of remaining places in gtkwin.c where exit()
is called incautiously. Of course, a failure to set up one SSH
connection should only be fatal to that connection, not the whole
process, so really we should be feeding into the connection_fatal
system.
This existed in order to avoid the various confusions that could
happen if a toplevel callback ran in the context of a subsidiary
instance of gtk_main(). Now there aren't any subsidiary gtk_main
instances any more, this mechanism is no longer needed, and I can
throw it out. It was horrible anyway.
I think these began to appear as a consequencce of replacing
fatalbox() calls with more sensible error reports: the more specific a
direction I send a report in, the greater the annoying possibility of
re-entrance when the resulting error handler starts closing stuff.
This change requires me to break up the general cleanups in
delete_inst() into two halves: one runs when the error message box is
created, and cleans up the network connection and all the stuff
associated with it, and the other runs when the error message is
dismissed and the window can actually close.
It's an incoherent concept! There should not be any such thing as an
error box that terminates the entire program but is not modal. If it's
bad enough to terminate the whole program, i.e. _all_ currently live
connections, then there's no point in permitting progress to continue
in windows other than the affected one, because all windows are
affected anyway.
So all previous uses of fatalbox() have become modalfatalbox(), except
those which looked to me as if they shouldn't have been fatal in the
first place, e.g. lingering pieces of error handling in winnet.c which
ought to have had the severity of 'give up on this particular Socket
and close it' rather than 'give up on the ENTIRE UNIVERSE'.
I've also moved it out into gtkwin.c, because it seemed easier to do
the 'find existing instance of this dialog and raise it' dance there
than to split it across source files pointlessly.
Apart from the specific benefit of non-modality, this also makes it a
lot simpler compared to the previous code! I'm not completely sure why
I wasn't using the standard gtkdlg.c message box system all along.
This fits into a new dialog-box slot (because it might have to come up
at the same time as a network prompt), and makes use of the existing
callback system in logging.c which buffers the logging data until the
user says what they want done with it.
Now it has several 'slots', each named for a particular class of
subsidiary dialog box that a session window can have at most one of,
and register_network_prompt_dialog has a more general name and takes
an enum-typed argument identifying a slot. This lets me avoid writing
a zillion annoyingly similar function pairs and corresponding snippets
of cleanup code in delete_inst.
If you close a session window with an associated SSH back end, the
back end may call back to notify_remote_exit() from ssh_free(), which
queues a new top-level callback citing the inst structure we were
about to delete.
We could fix this by introducing a special 'moribund' flag which
inhibits notify_remote_exit from queueing a callback, but far easier
is to move the delete_callbacks_for_context() call to _after_ all
subsidiary things have been cleaned up, so that any last-minute
callbacks they might schedule will be promptly unscheduled again
before they do any damage.
This follows exactly the same pattern as for verify_ssh_host_key, but
the results of the dialog box are simpler (a plain yes-no response),
so the two dialog types can share a callback.
I've switched it to using the new non-modal create_message_box, and
provided a callback function which handles the cleanup afterwards.
I had expected this to be a lot more work, because I'd imagined that
I'd have to contort the coroutines in ssh.c to give them the ability
to wait for an asynchronously delivered result from that user prompt.
But in fact that wasn't necessary, because just such a mechanism has
been sitting there unused since commit 8574822b9 in 2005, when I added
it as part of my _previous_ attempt to write an OS X front end! (The
abandoned one written in native ObjC + Cocoa.)
When I switch verify_ssh_host_key() and friends over to creating
non-modal message boxes and returning to the main loop, there will be
a risk that their parent window will need to close for some other
reason while the user hasn't answered the pending question yet. (E.g.
if the user presses the main session window's close button, which will
no longer be a prohibited UI action once the transient dialog is not
modal.)
At that point we need to get rid of the pending dialog box, both for
UI purposes (it would look silly and be confusing to leave it lying
around) and for memory management (if the user subsequently clicks OK
in such a dialog it would probably try to leave its result somewhere
stale).
So now there's a mechanism for gtkwin.c remembering what the current
'network prompt dialog' is, if any (in which category I intend to
include everything triggered from ssh.c's various reasons for asking
crypto-related questions), and cleaning it up when the struct gui_data
it belongs to goes away.
If a dialog box is destroyed by the program before the user has
pressed one of the result-delivering buttons - e.g. because the parent
window closes so the dialog is no longer relevant to anything anyway -
then dlgparam_destroy would never call the client code's provided
callback. That makes sense in terms of the callback wanting to _take
action_ based on the result of the dialog box, but it ignores the
possibility that the callback may simply need to free its own context
structure.
So now dlgparam_destroy always calls the client's callback, even if
the result it passes is negative (meaning 'the user never got round to
pressing any of the dialog-ending buttons'), and all the existing
client callbacks handle the negative-result case by doing nothing
except freeing any allocated memory they might have.
This does the bulk of the work previously done by message_box()
proper, but takes a pointer to a result-reporting callback function
identical to the one we pass to create_config_box().
The modal version of message_box() still exists and is a small wrapper
on this function, running its own subsidiary gtk_main() loop which the
result callback terminates. But now I can start switching over
individual uses of message_box() to the non-modal version, and when
that's done, remove the modal function completely.
Now, in place of a variadic argument list with four parameters per
button and a terminating NULL, it takes a pointer to a struct which in
turn contains an (array,length) pair of small per-button structures.
In the process I've renamed the function from messagebox() to
message_box(). Partly that was just because it gave me a convenient
way to search the source for calls I hadn't converted yet, but also
I've thought for a while that that missing underscore didn't really
match the rest of my naming.
NFCI. Partly this minor refactor has the virtue that we can reuse the
more common button layouts without having to type them in at multiple
places in the code (and, indeed, I've provided buttons_yn and
buttons_ok for easy reuse, and could easily provide other things like
yesnocancel any time I need them). But mostly it's because I'm about
to split up message_box into multiple functions, and this saves me the
hassle of deciding which ones to make variadic and which to pass an
actual va_list to - particularly since messagebox() used to go over
its variadic argument list twice, which always makes delegating it to
another function that much more annoying.
The last few changes between them have fixed the problem of windows
not closing properly when their sessions terminated. The problem was
really more than one problem - pterm session termination wasn't even
detected due to the missing SIGCHLD handler, window-closing wasn't
done explicitly due to exit_callback() just calling gtk_main_quit
instead of a proper gtk_widget_destroy(), and that in turn wouldn't do
quite the right thing without the g_application_{hold,release} system
which I added in gtkapp.c as part of the non-model config box rework.
Now that all of those are fixed, things seem to be working sensibly;
the OS X Pterm.app and PuTTY.app, and the ordinary X GTK ptermapp and
puttyapp too, now allow windows to be closed independently of each
other, close them automatically in the right way, and automatically
terminate the whole application when the last window is gone.
So I can clean up that TODO item, including its handwavy 'need to work
out some kind of mechanism'. Some kind of mechanism has now been
worked out, and given that there turned out to be a whole cluster of
interacting structural issues, no wonder I wasn't _quite_ sure what it
ought to be!
Now every call to do_config_box is replaced with a call to
create_config_box, which returns immediately having constructed the
new GTK window object, and is passed a callback function which it will
arrange to be called when the dialog terminates (whether by OK or by
Cancel). That callback is now what triggers the construction of a
session window after 'Open' is pressed in the initial config box, or
the actual mid-session reconfiguration action after 'Apply' is pressed
in a Change Settings box.
We were already prepared to ignore the re-selection of 'Change
Settings' from the context menu of a window that already had a Change
Settings box open (and not accidentally create a second config box for
the same window); but now we do slightly better, by finding the
existing config box and un-minimising and raising it, in case the user
had forgotten it was there.
That's a useful featurelet, but not the main purpose of this change.
The mani point, of course, is that now the multi-window GtkApplication
based front ends now don't do anything confusing to the nesting of
gtk_main() when config boxes are involved. Whether you're changing the
settings of one (or more than one) of your already-running sessions,
preparing to start up a new PuTTY connection, or both at once, we stay
in the same top-level instance of gtk_main() and all sessions' top-
level callbacks continue to run sensibly.
This has been logically necessary in principle for ages, but we got
away without it because we just exited the program. But in the multi-
window GtkApplication front ends, we can't get away with that for
ever; we need to be able to free _one_ of our 'struct gui_data'
instances and everything dangling off it (or, at least, everything
that GTK's reference counting system doesn't clean up for us), without
also doing anything global to the process in which that gui_data is
contained.
Apparently I copied that rather too literally from osxlaunch.c, where
the text about OS X and 'launcher' made more sense. The stub main in
gtkapp.c has nothing to do with launchers and OS X, so I've corrected
the wording to say that a completely different thing won't work in
completely different circumstances :-)
People who use a packaging system other than jhbuild still ought to be
able to run the OS X GTK3 build, so now the gtk-mac-bundler command
finds out the locations of things by a more portable method.
(I've had this change lurking around uncommitted in a working tree for
a while, and only just found it in the course of doing other OS X-
related work. Oops.)
Without this, the Conf objects in a session and its duplicate were
aliases of each other, which could lead to confusing semantic effects
if one of the sessions was reconfigured in mid-run, and worse still, a
crash if one session got cleaned up and called conf_free on a Conf
that the other was still using.
None of that was intentional; it was just a matter of forgetting to
clone the Conf for the duplicated session. Now we do.
Detecting that the child process in a pterm has terminated is
important for _any_ kind of pterm, so it's a mistake to put the signal
handler setup _solely_ inside the optional pty_pre_init function which
does the privileged setup and forks off a utmp watchdog process. Now
the signal handler is installed even in the GtkApplication-based
multi-window front end to pterm, meaning it will exist even on OS X.
ignore_sbar is a flag that we set while manually changing the
scrollbar settings, so that when those half-finished changes trigger
GTK event callbacks, we know to ignore them, and wait until we've
finished setting everything up before actually updating the window.
But somehow I had managed to leave the functions that actually _have
the effect_ (at least in GTK1) outside the pair of statements that set
and unset the ignore flag.
The effect was that compiling pterm for GTK1, starting it up, and
issuing a command like 'ls -l' that scrolls off the bottom of the
window would lead to the _top_ half of the ls output being visible,
and the scrollbar at the top of the scrollback rather than the bottom.
Apparently I haven't tested this compile mode in a while: I had a
couple of compile errors due to new code not properly #ifdeffed (the
true-colour mode has to be effectively disabled in the palette-based
GTK1 graphics model) and one for an unused static function
(get_monitor_geometry is only used in GTK2 and above, and with -Werror
that means I mustn't even _define_ it in GTK1).
With these changes, I still didn't get a clean compile unless I also
configured CFLAGS=-std=gnu89, due to the GTK1 headers having an
outdated set of ifdefs to figure out the compiler's semantics of
'inline'. (They seem to expect old-style gcc, which inconveniently
treats 'inline' and 'extern inline' more or less the opposite way
round from the version standardised by C99.)
My custom GTK layout class 'Columns' includes a linked list of
dynamically allocated data, and apparently I forgot to write a
destructor that frees it all when the class is deallocated, and have
never noticed until now.
While debugging some new code, I ran valgrind in leak-checking mode
and it pointed out a handful of existing memory leaks, which got in the
way of spotting any _new_ leaks I might be introducing :-)
This was one: in the case where an asynchronous agent query on Unix is
aborted, the dynamically allocated buffer holding the response was not
freed.
ATTR_REVERSE was being handled in the front ends, and was causing the
foreground and background colours to be switched. (I'm not completely
sure why I made that design decision; it might be purely historical,
but then again, it might also be because reverse video is one effect
on the fg and bg colours that must still be performed even in unusual
frontend-specific situations like display-driven monochrome mode.)
This affected both explicit reverse video enabled using SGR 7, and
also the transient reverse video arising from mouse selection. Thanks
to Markus Gans for reporting the bug in the latter, which when I
investigated it turned out to affect the former as well.
I've done this on a 'where possible' basis: in Windows paletted mode
(in case anyone is still using an old enough graphics card to need
that!) I simply haven't bothered, and will completely ignore the dim
flag.
Markus Gans points out that some applications which (not at all
unreasonably) don't trust $TERM to tell them the full capabilities of
their terminal will use the sequence "OSC 4 ; nn ; ? BEL" to ask for
the colour-palette value in position nn, and they may not particularly
care _what_ the results are but they will use them to decide whether
the right number of colour palette entries even exist.
Otherwise, moving the cursor (at least in active, filled-cell mode) on
to a true-coloured character cell causes it to vanish completely
because the cell's colours override the thing that differentiates the
cursor.
I'm not sure if any X11 monochrome visuals or Windows paletted display
modes are still around, but just in case they are, we shouldn't
attempt true colour on either kind of display.
This is a heavily rewritten version of a patch originally by Lorenz
Diener; it was tidied up somewhat by Christian Brabandt, and then
tidied up more by me. The basic idea is to add to the termchar
structure a pair of small structs encoding 24-bit RGB values, each
with a flag indicating whether it's turned on; if it is, it overrides
any other specification of fg or bg colour for that character cell.
I've added a test line to colours.txt containing a few example colours
from /usr/share/X11/rgb.txt. In fact it makes quite a good demo to run
the whole of rgb.txt through this treatment, with a command such as
perl -pe 's!^\s*(\d+)\s+(\d+)\s+(\d+).*$!\e[38;2;$1;$2;$3m$&\e[m!' rgb.txt
[unix/osxlaunch.c:133] -> [unix/osxlaunch.c:134]: (warning) Either the condition '!qhead' is redundant or there is possible null pointer dereference: qhead.
Alamy Liu points out that asking for CONF_host will display the wrong
part of the configuration in the case where serial port setup fails.
The Windows front end's analogous message already got this right, but
I must have forgotten to change this one too when I introduced
conf_dest.