putty-source

mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-09 17:38:00 +00:00

Author	SHA1	Message	Date
Simon Tatham	98200d1bfe	Arm: turn on PSTATE.DIT if available and needed. DIT, for 'Data-Independent Timing', is a bit you can set in the processor state on sufficiently new Arm CPUs, which promises that a long list of instructions will deliberately avoid varying their timing based on the input register values. Just what you want for keeping your constant-time crypto primitives constant-time. As far as I'm aware, no CPU has _yet_ implemented any data-dependent optimisations, so DIT is a safety precaution against them doing so in future. It would be embarrassing to be caught without it if a future CPU does do that, so we now turn on DIT in the PuTTY process state. I've put a call to the new enable_dit() function at the start of every main() and WinMain() belonging to a program that might do cryptography (even testcrypt, in case someone uses it for something!), and in case I missed one there, also added a second call at the first moment that any cryptography-using part of the code looks as if it might become active: when an instance of the SSH protocol object is configured, when the system PRNG is initialised, and when selecting any cryptographic authentication protocol in an HTTP or SOCKS proxy connection. With any luck those precautions between them should ensure it's on whenever we need it. Arm's own recommendation is that you should carefully choose the granularity at which you enable and disable DIT: there's a potential time cost to turning it on and off (I'm not sure what, but plausibly something of the order of a pipeline flush), so it's a performance hit to do it _inside_ each individual crypto function, but if CPUs start supporting significant data-dependent optimisation in future, then it will also become a noticeable performance hit to just leave it on across the whole process. So you'd like to do it somewhere in the middle: for example, you might turn on DIT once around the whole process of verifying and decrypting an SSH packet, instead of once for decryption and once for MAC. With all respect to that recommendation as a strategy for maximum performance, I'm not following it here. I turn on DIT at the start of the PuTTY process, and then leave it on. Rationale: 1. PuTTY is not otherwise a performance-critical application: it's not likely to max out your CPU for any purpose _other_ than cryptography. The most CPU-intensive non-cryptographic thing I can imagine a PuTTY process doing is the complicated computation of font rendering in the terminal, and that will normally be cached (you don't recompute each glyph from its outline and hints for every time you display it). 2. I think a bigger risk lies in accidental side channels from having DIT turned off when it should have been on. I can imagine lots of causes for that. Missing a crypto operation in some unswept corner of the code; confusing control flow (like my coroutine macros) jumping with DIT clear into the middle of a region of code that expected DIT to have been set at the beginning; having a reference counter of DIT requests and getting it out of sync. In a more sophisticated programming language, it might be possible to avoid the risk in #2 by cleverness with the type system. For example, in Rust, you could have a zero-sized type that acts as a proof token for DIT being enabled (it would be constructed by a function that also sets DIT, have a Drop implementation that clears DIT, and be !Send so you couldn't use it in a thread other than the one where DIT was set), and then you could require all the actual crypto functions to take a DitToken as an extra parameter, at zero runtime cost. Then "oops I forgot to set DIT around this piece of crypto" would become a compile error. Even so, you'd have to take some care with coroutine-structured code (what happens if a Rust async function yields while holding a DIT token?) and with nesting (if you have two DIT tokens, you don't want dropping the inner one to clear DIT while the outer one is still there to wrongly convince callees that it's set). Maybe in Rust you could get this all to work reliably. But not in C! DIT is an optional feature of the Arm architecture, so we must first test to see if it's supported. This is done the same way as we already do for the various Arm crypto accelerators: on ELF-based systems, check the appropriate bit in the 'hwcap' words in the ELF aux vector; on Mac, look for an appropriate sysctl flag. On Windows I don't know of a way to query the DIT feature, _or_ of a way to write the necessary enabling instruction in an MSVC-compatible way. I've _heard_ that it might not be necessary, because Windows might just turn on DIT unconditionally and leave it on, in an even more extreme version of my own strategy. I don't have a source for that - I heard it by word of mouth - but I _hope_ it's true, because that would suit me very well! Certainly I can't write code to enable DIT without knowing (a) how to do it, (b) how to know if it's safe. Nonetheless, I've put the enable_dit() call in all the right places in the Windows main programs as well as the Unix and cross-platform code, so that if I later find out that I _can_ put in an explicit enable of DIT in some way, I'll only have to arrange to set HAVE_ARM_DIT and compile the enable_dit() function appropriately.	2024-12-19 08:52:47 +00:00
Simon Tatham	109c60b3bf	Fix build failure on Debian bullseye from last commit. Jacob reports that bullseye objected to the change from G_APPLICATION_FLAGS_NONE to G_APPLICATION_DEFAULT_FLAGS, on the grounds that it only has the former defined. Sigh. Added a cmake check.	2024-09-08 19:05:45 +01:00
Simon Tatham	5f2eff2fea	Build option to disable scrollback compression. This was requested by a downstream of the code, who wanted to change the time/space tradeoff in the terminal. I currently have no plans to change this setting for upstream PuTTY, although there is a cmake option for it just to make testing it easy. To avoid sprinkling ifdefs over the whole terminal code, the strategy is to keep the separate type 'compressed_scrollback_line', and turn it into a typedef for a 'termline *'. So compressline() becomes almost trivial, and decompressline() even more so. Memory management is the fiddly part. To make this work sensibly on both sides, I've broken up each of compressline() and decompressline() into two versions, one of which takes ownership of (and logically speaking frees) its input, and the other doesn't. So at call sites where a function was followed by a free, it's now calling the 'and_free' version of the function, and where the input object was reused afterwards, it's calling the 'no_free' version. This means that in different branches of the #if, I can make one function call the other or vice versa, and no call site is stuck with having to do things in a more roundabout way than necessary. The freeing of the _return_ value from decompressline() is handled for us, because termlines already have a 'temporary' flag which is set when they're returned from the decompressor, and anyone receiving a termline from lineptr() calls unlineptr() when they're finished with it, which will _conditionally_ free it, depending on that 'temporary' flag. So in the new mode, 'temporary' is never set at all, and all those unlineptr() calls do nothing. However, we also still need to free compressed lines properly when they're actually being thrown away (scrolled off the top of the scrollback, or cleaned up in term_free), and for that, I've made a new special-purpose free_compressed_line() function.	2022-11-20 15:04:00 +00:00
Simon Tatham	2222cd104d	AES-GCM NEON: cope with missing vaddq_p128. In some compilers (I'm told clang 10, in particular), the NEON intrinsic vaddq_p128 is missing, even though its input type poly128_t is provided. vaddq_p128 is just an XOR of two vector registers, so that's easy to work around by casting to a more mundane type and back. Added a configure-time test for that intrinsic, and a workaround to be used in its absence.	2022-10-12 20:01:58 +01:00
Simon Tatham	fda41e1990	Add cmake check for whether setpgrp takes arguments. FreeBSD declares setpgrp() as taking two arguments, like Linux's setpgid(). Detect that at configure time and adjust the call in Pageant appropriately.	2022-09-18 15:08:31 +01:00
Simon Tatham	c1a2114b28	Implement AES-GCM using the @openssh.com protocol IDs. I only recently found out that OpenSSH defined their own protocol IDs for AES-GCM, defined to work the same as the standard ones except that they fixed the semantics for how you select the linked cipher+MAC pair during key exchange. (RFC 5647 defines protocol ids for AES-GCM in both the cipher and MAC namespaces, and requires that you MUST select both or neither - but this contradicts the selection policy set out in the base SSH RFCs, and there's no discussion of how you resolve a conflict between them! OpenSSH's answer is to do it the same way ChaCha20-Poly1305 works, because that will ensure the two suites don't fight.) People do occasionally ask us for this linked cipher/MAC pair, and now I know it's actually feasible, I've implemented it, including a pair of vector implementations for x86 and Arm using their respective architecture extensions for multiplying polynomials over GF(2). Unlike ChaCha20-Poly1305, I've kept the cipher and MAC implementations in separate objects, with an arm's-length link between them that the MAC uses when it needs to encrypt single cipher blocks to use as the inputs to the MAC algorithm. That enables the cipher and the MAC to be independently selected from their hardware-accelerated versions, just in case someone runs on a system that has polynomial multiplication instructions but not AES acceleration, or vice versa. There's a fourth implementation of the GCM MAC, which is a pure software implementation of the same algorithm used in the vectorised versions. It's too slow to use live, but I've kept it in the code for future testing needs, and because it's a convenient place to dump my design comments. The vectorised implementations are fairly crude as far as optimisation goes. I'm sure serious x86 _or_ Arm optimisation engineers would look at them and laugh. But GCM is a fast MAC compared to HMAC-SHA-256 (indeed compared to HMAC-anything-at-all), so it should at least be good enough to use. And we've got a working version with some tests now, so if someone else wants to improve them, they can.	2022-08-16 20:33:58 +01:00
Jacob Nevins	069b0c0caf	Merge recent misc fixes from 'pre-0.77'.	2022-05-19 10:57:35 +01:00
Jacob Nevins	92881f2066	Define OMIT_UTMP if there's no utmpx.h. Without this, the build of e.g. psusan would fail on systems without that header (such as Termux on Android). This is similar to how things were pre-cmake, but not identical. We used to treat lack of updwtmpx() as a reason to OMIT_UTMP (as of `f0dfa73982`), but usage of that function got conditionalised in `c19e7215dd`, so I haven't restored that exclusion.	2022-05-18 18:51:00 +01:00
Jacob Nevins	97b3db34b2	Add missing HAVE_SETRESGID to cmake.h. Without this, we were always falling back to the setuid()/setgid() privilege-dropping code in the utmp helper.	2022-05-18 18:47:01 +01:00
Simon Tatham	bc7e06c494	Windows tools: assorted '-demo' options. Using a new screenshot-taking module I just added in windows/utils, these new options allow me to start up one of the tools with demonstration window contents and automatically save a .BMP screenshot to disk. This will allow me to keep essentially the same set of demo images and update them easily to keep pace with the current appearance of the real tools as PuTTY - and Windows itself - both evolve.	2022-04-02 17:23:34 +01:00
Simon Tatham	dec7d7fce7	Merge demo screenshot features from 'pre-0.77'.	2022-04-02 16:51:55 +01:00
Simon Tatham	018236da29	Support AF_UNIX listening sockets on Windows. Not all Windows toolchains have this yet, so we have to put the whole lot under #ifdef.	2022-02-04 19:32:47 +00:00
Simon Tatham	6791bdc9b6	Don't #include <utmp.h> if it doesn't exist. A FreeBSD user reports that it doesn't exist there.	2021-05-13 18:40:47 +01:00
Simon Tatham	fca13a17b1	Break up crypto modules containing HW acceleration. This applies to all of AES, SHA-1, SHA-256 and SHA-512. All those source files previously contained multiple implementations of the algorithm, enabled or disabled by ifdefs detecting whether they would work on a given compiler. And in order to get advanced machine instructions like AES-NI or NEON crypto into the output file when the compile flags hadn't enabled them, we had to do nasty stuff with compiler-specific pragmas or attributes. Now we can do the detection at cmake time, and enable advanced instructions in the more sensible way, by compile-time flags. So I've broken up each of these modules into lots of sub-pieces: a file called (e.g.) 'foo-common.c' containing common definitions across all implementations (such as round constants), one called 'foo-select.c' containing the top-level vtable(s), and a separate file for each implementation exporting just the vtable(s) for that implementation. One advantage of this is that it depends a lot less on compiler- specific bodgery. My particular least favourite part of the previous setup was the part where I had to _manually_ define some Arm ACLE feature macros before including <arm_neon.h>, so that it would define the intrinsics I wanted. Now I'm enabling interesting architecture features in the normal way, on the compiler command line, there's no need for that kind of trick: the right feature macros are already defined and <arm_neon.h> does the right thing. Another change in this reorganisation is that I've stopped assuming there's just one hardware implementation per platform. Previously, the accelerated vtables were called things like sha256_hw, and varied between FOO-NI and NEON depending on platform; and the selection code would simply ask 'is hw available? if so, use hw, else sw'. Now, each HW acceleration strategy names its vtable its own way, and the selection vtable has a whole list of possibilities to iterate over looking for a supported one. So if someone feels like writing a second accelerated implementation of something for a given platform - for example, I've heard you can use plain NEON to speed up AES somewhat even without the crypto extension - then it will now have somewhere to drop in alongside the existing ones.	2021-04-21 21:55:26 +01:00
Simon Tatham	3996919f5e	Fix a few cmake configure-time checks. A couple of actual checks were missing (elf_aux_info, sysctlbyname). Several more were accidentally left out of cmake.h.in, meaning they wouldn't be propagated from cmake's variable space into the actual compilation. And a handful of checks in the C source were still using the autotools-style 'if defined' in place of the cmake-style "it's always 0 or 1" plain #if.	2021-04-17 22:26:00 +01:00
Simon Tatham	c19e7215dd	Replace mkfiles.pl with a CMake build system. This brings various concrete advantages over the previous system: - consistent support for out-of-tree builds on all platforms - more thorough support for Visual Studio IDE project files - support for Ninja-based builds, which is particularly useful on Windows where the alternative nmake has no parallel option - a really simple set of build instructions that work the same way on all the major platforms (look how much shorter README is!) - better decoupling of the project configuration from the toolchain configuration, so that my Windows cross-building doesn't need (much) special treatment in CMakeLists.txt - configure-time tests on Windows as well as Linux, so that a lot of ad-hoc #ifdefs second-guessing a particular feature's presence from the compiler version can now be replaced by tests of the feature itself Also some longer-term software-engineering advantages: - other people have actually heard of CMake, so they'll be able to produce patches to the new build setup more easily - unlike the old mkfiles.pl, CMake is not my personal problem to maintain - most importantly, mkfiles.pl was just a horrible pile of unmaintainable cruft, which even I found it painful to make changes to or to use, and desperately needed throwing in the bin. I've already thrown away all the variants of it I had in other projects of mine, and was only delaying this one so we could make the 0.75 release branch first. This change comes with a noticeable build-level restructuring. The previous Recipe worked by compiling every object file exactly once, and then making each executable by linking a precisely specified subset of the same object files. But in CMake, that's not the natural way to work - if you write the obvious command that puts the same source file into two executable targets, CMake generates a makefile that compiles it once per target. That can be an advantage, because it gives you the freedom to compile it differently in each case (e.g. with a #define telling it which program it's part of). But in a project that has many executable targets and had carefully contrived to _never_ need to build any module more than once, all it does is bloat the build time pointlessly! To avoid slowing down the build by a large factor, I've put most of the modules of the code base into a collection of static libraries organised vaguely thematically (SSH, other backends, crypto, network, ...). That means all those modules can still be compiled just once each, because once each library is built it's reused unchanged for all the executable targets. One upside of this library-based structure is that now I don't have to manually specify exactly which objects go into which programs any more - it's enough to specify which libraries are needed, and the linker will figure out the fine detail automatically. So there's less maintenance to do in CMakeLists.txt when the source code changes. But that reorganisation also adds fragility, because of the trad Unix linker semantics of walking along the library list once each, so that cyclic references between your libraries will provoke link errors. The current setup builds successfully, but I suspect it only just manages it. (In particular, I've found that MinGW is the most finicky on this score of the Windows compilers I've tried building with. So I've included a MinGW test build in the new-look Buildscr, because otherwise I think there'd be a significant risk of introducing MinGW-only build failures due to library search order, which wasn't a risk in the previous library-free build organisation.) In the longer term I hope to be able to reduce the risk of that, via gradual reorganisation (in particular, breaking up too-monolithic modules, to reduce the risk of knock-on references when you included a module for function A and it also contains function B with an unsatisfied dependency you didn't really need). Ideally I want to reach a state in which the libraries all have sensibly described purposes, a clearly documented (partial) order in which they're permitted to depend on each other, and a specification of what stubs you have to put where if you're leaving one of them out (e.g. nocrypto) and what callbacks you have to define in your non-library objects to satisfy dependencies from things low in the stack (e.g. out_of_memory()). One thing that's gone completely missing in this migration, unfortunately, is the unfinished MacOS port linked against Quartz GTK. That's because it turned out that I can't currently build it myself, on my own Mac: my previous installation of GTK had bit-rotted as a side effect of an Xcode upgrade, and I haven't yet been able to persuade jhbuild to make me a new one. So I can't even build the MacOS port with the _old_ makefiles, and hence, I have no way of checking that the new ones also work. I hope to bring that port back to life at some point, but I don't want it to block the rest of this change.	2021-04-17 13:53:02 +01:00

16 Commits