In 0.81 and before, we put an application manifest (XML-formatted
Windows resource) into all the GUI tools on purpose, and the CLI tools
like Plink didn't have one. But in 0.82, the CLI tools do have one,
and it's a small default one we didn't write ourselves, inserted by
some combination of cmake and clang-imitating-MSVC (I haven't checked
which of those is the cause).
This appears to have happened as a side effect of a build-tools
update, not on purpose. And its effect is that Windows XP now objects
to our plink.exe, because it's very picky about manifest format (we
have an old 'xp-wont-run' bug record about that).
Since it seemed to work fine to not have a manifest at all in 0.81,
let's go back to that. We were already passing /manifest:no to inhibit
the default manifest in the GUI tools, to stop it fighting with our
custom one; now I've moved /manifest:no into the global linker flags,
so it's applied to _all_ binaries, whether we're putting our own
manifest in or not.
Jacob reports that bullseye objected to the change from
G_APPLICATION_FLAGS_NONE to G_APPLICATION_DEFAULT_FLAGS, on the
grounds that it only has the former defined. Sigh. Added a cmake
check.
While trying to get an upcoming piece of code through testsc, I had
trouble - _yet again_ - with the way that control flow diverges inside
the glibc implementations of functions like memcpy and memset,
depending on the alignment of the input blocks _above_ the alignment
guaranteed by malloc, so that doing the same sequence of malloc +
memset can lead to different control flow. (I believe this is done
either for cache performance reasons or SIMD alignment requirements,
or both: on x86, some SIMD instructions require memory alignment
beyond what malloc guarantees, which is also awkward for our x86
hardware crypto implementations.)
My previous effort to normalise this problem out of sclog's log files
worked by wrapping memset and all its synonyms that I could find. But
this weekend, that failed for me, and the reason appears to be ifuncs.
I'm aware of the great irony of committing code to a security project
with a log message saying something vague about ifuncs, on the same
weekend that it came to light that commits matching that description
were one of the methods used to smuggle a backdoor into the XZ Utils
project (CVE-2024-3094). So I'll bend over backwards to explain both
what I think is going on, and why this _isn't_ a weird ifunc-related
backdooring attempt:
When I say I 'wrap' memset, I mean I use DynamoRIO's 'drwrap' API to
arrange that the side-channel test rig calls a function of mine before
and after each call to memset. The way drwrap works is to look up the
symbol address in either the main program or a shared library; in this
case, it's a shared library, namely libc.so. Then it intercepts call
instructions with exactly that address as the target.
Unfortunately, what _actually_ happens when the main program calls
memset is more complicated. First, control goes to the PLT entry for
memset (still in the main program). In principle, that loads a GOT
entry containing the address of memset (filled in by ld.so), and jumps
to it. But in fact the GOT entry varies its value through the program;
on the first call, it points to a resolver function, whose job is to
_find out_ the address of memset. And in the version of libc.so I'm
currently running, that resolver is an STT_GNU_IFUNC indirection
function, which tests the host CPU's capabilities, and chooses an
actual implementation of memset depending on what it finds. (In my
case, it looks as if it's picking one that makes extensive use of x86
SIMD.) To avoid the overhead of doing this on every call, the returned
function pointer is then written into the main program's GOT entry for
memset, overwriting the address of the resolver function, so that the
_next_ call the main program makes through the same PLT entry will go
directly to the memset variant that was chosen.
And the problem is that, after this has happened, none of the new
control flow ever goes near the _official_ address of memset, as read
out of libc.so's dynamic symbol table by DynamoRIO. The PLT entry
isn't at that address, and neither is the particular SIMD variant that
the resolver ended up choosing. So now my wrapper on memset is never
being invoked, and memset cheerfully generates different control flow
in runs of my crypto code that testsc expects to be doing exactly the
same thing as each other, and all my tests fail spuriously.
My solution, at least for the moment, is to completely abandon the
strategy of wrapping memset. Instead, let's just make it behave the
same way every time, by forcing all the affected memory allocations to
have extra-strict alignment. I found that 64-byte alignment is not
good enough to eliminate memset-related test failures, but 128-byte
alignment is.
This would be tricky in itself, if it weren't for the fact that PuTTY
already has its own wrapper function on malloc (for various reasons),
which everything in our code already uses. So I can divert to C11's
aligned_alloc() there. That in turn is done by adding a new #ifdef to
utils/memory.c, and compiling it with that #ifdef into a new object
library that is included in testsc, superseding the standard memory.o
that would otherwise be pulled in from our 'utils' static library.
With the previous memset-compensator removed, this means testsc is now
dependent on having aligned_alloc() available. So we test for it at
cmake time, and don't build testsc at all if it can't be found. This
shouldn't bother anyone very much; aligned_alloc() is available on
_my_ testsc platform, and if anyone else is trying to run this test
suite at all, I expect it will be on something at least as new as
that.
(One awkward thing here is that we can only replace _new_ allocations
with calls to aligned_alloc(): C11 provides no aligned version of
realloc. Happily, this doesn't currently introduce any new problems in
testsc. If it does, I might have to do something even more painful in
future.)
So, why isn't this an ifunc-related backdoor attempt? Because (and you
can check all of this from the patch):
1. The memset-wrapping code exists entirely within the DynamoRIO
plugin module that lives in test/sclog. That is not used in
production, only for running the 'testsc' side-channel tester.
2. The memset-wrapping code is _removed_ by this patch, not added.
3. None of this code is dealing directly with ifuncs - only working
around the unwanted effects on my test suite from the fact that
they exist somewhere else and introduce awkward behaviour.
Apparently when I started using _wfopen in commit 8bd75b85ed, the
winegcc build (which I mostly use for Coverity Scan) stopped working,
because _wfopen isn't included in any of the libraries I explicitly
had on my link line.
Rather than mess about with cmake, it's easier to just bodge it in the
winegcc wrapper script, since we had one of those already.
This was requested by a downstream of the code, who wanted to change
the time/space tradeoff in the terminal. I currently have no plans to
change this setting for upstream PuTTY, although there is a cmake
option for it just to make testing it easy.
To avoid sprinkling ifdefs over the whole terminal code, the strategy
is to keep the separate type 'compressed_scrollback_line', and turn it
into a typedef for a 'termline *'. So compressline() becomes almost
trivial, and decompressline() even more so.
Memory management is the fiddly part. To make this work sensibly on
both sides, I've broken up each of compressline() and decompressline()
into two versions, one of which takes ownership of (and logically
speaking frees) its input, and the other doesn't. So at call sites
where a function was followed by a free, it's now calling the
'and_free' version of the function, and where the input object was
reused afterwards, it's calling the 'no_free' version. This means that
in different branches of the #if, I can make one function call the
other or vice versa, and no call site is stuck with having to do
things in a more roundabout way than necessary.
The freeing of the _return_ value from decompressline() is handled for
us, because termlines already have a 'temporary' flag which is set
when they're returned from the decompressor, and anyone receiving a
termline from lineptr() calls unlineptr() when they're finished with
it, which will _conditionally_ free it, depending on that 'temporary'
flag. So in the new mode, 'temporary' is never set at all, and all
those unlineptr() calls do nothing.
However, we also still need to free compressed lines properly when
they're actually being thrown away (scrolled off the top of the
scrollback, or cleaned up in term_free), and for that, I've made a new
special-purpose free_compressed_line() function.
(cherry picked from commit 5f2eff2fea)
This was requested by a downstream of the code, who wanted to change
the time/space tradeoff in the terminal. I currently have no plans to
change this setting for upstream PuTTY, although there is a cmake
option for it just to make testing it easy.
To avoid sprinkling ifdefs over the whole terminal code, the strategy
is to keep the separate type 'compressed_scrollback_line', and turn it
into a typedef for a 'termline *'. So compressline() becomes almost
trivial, and decompressline() even more so.
Memory management is the fiddly part. To make this work sensibly on
both sides, I've broken up each of compressline() and decompressline()
into two versions, one of which takes ownership of (and logically
speaking frees) its input, and the other doesn't. So at call sites
where a function was followed by a free, it's now calling the
'and_free' version of the function, and where the input object was
reused afterwards, it's calling the 'no_free' version. This means that
in different branches of the #if, I can make one function call the
other or vice versa, and no call site is stuck with having to do
things in a more roundabout way than necessary.
The freeing of the _return_ value from decompressline() is handled for
us, because termlines already have a 'temporary' flag which is set
when they're returned from the decompressor, and anyone receiving a
termline from lineptr() calls unlineptr() when they're finished with
it, which will _conditionally_ free it, depending on that 'temporary'
flag. So in the new mode, 'temporary' is never set at all, and all
those unlineptr() calls do nothing.
However, we also still need to free compressed lines properly when
they're actually being thrown away (scrolled off the top of the
scrollback, or cleaned up in term_free), and for that, I've made a new
special-purpose free_compressed_line() function.
In some compilers (I'm told clang 10, in particular), the NEON
intrinsic vaddq_p128 is missing, even though its input type poly128_t
is provided.
vaddq_p128 is just an XOR of two vector registers, so that's easy to
work around by casting to a more mundane type and back. Added a
configure-time test for that intrinsic, and a workaround to be used in
its absence.
In commit 732ec31a17 I made the check for libX11 conditional on
GTK - but I forgot that if we're building without GTK, I should
_define_ NOT_X_WINDOWS, rather than leaving it undefined. As a result,
the build would fail on files like unix/utils/x11_ignore_error.c.
On FreeBSD, the GTK libraries aren't stored on the standard library
path, so pkg-config has to emit a -L option as well as -l options.
This worked fine during the main build, but the -L option wasn't being
passed through to check_symbol_exists() for the tests of Pango API
function availability.
FreeBSD declares setpgrp() as taking two arguments, like Linux's
setpgid(). Detect that at configure time and adjust the call in
Pageant appropriately.
If you have GTK installed on your system but want to build without it
anyway (e.g. if you're buliding a package suitable for headless
systems), it's useful to be able to explicitly instruct PuTTY's build
system not to use GTK even if it's there.
This would already work if you unilaterally set PUTTY_GTK_VERSION to
some value other than 1, 2, 3 or ANY. Added NONE as an officially
supported option, and included it in the list that cmake-gui will
present.
Also, made the check for libX11 conditional on having GTK, since
there's no need to bother with it otherwise.
On FreeBSD, I'm told, you can't configure Kerberos via pkg-config. So
we need a fallback. Here's some manual code to run krb5-config and
pick apart the result, similar to what I already did with gtk-config
for our (still not dead!) GTK 1 support.
This went missing in the migration to CMake, and broke compatibility
of the standard 32-bit builds with Windows XP. (Of course, the
'buildold' versions should still have run.)
There doesn't seem to be a convenient CMake option to configure it
cleanly, so I had to do a bodgy string-replace on the variable
containing the linker flags, which I found by source-diving in CMake.
That's fragile enough that I've also put in a check after the fact, so
that we'll find out if it ever stops working.
My experimental build with clang-cl at -Wall did show up a few things
that are safe enough to fix right now. One was this list of missing
includes, which was causing a lot of -Wmissing-prototype warnings, and
is a real risk because it means the declarations in headers weren't
being type-checked against the actual function definitions.
Happily, no actual mismatches.
Unfortunately, it gives an absolutely huge number of warnings, and it
wouldn't be feasible to fix them all without risking introducing
further bugs. Perhaps _after_ the next release branch it might be
worth looking at some of them, but I don't think fixing them right now
is viable.
I've left it on for actual gcc, though, since the MinGW build seems OK
with it.
This has been missing since the cmake transition (c19e7215dd) --
mkfiles.pl generally used at least -Wall -Werror -Wvla with GCC-like
compilers. After that, Windows STRICT gained -Wpointer-arith, but lost
-Wall and hence a lot of other warnings (such as the -Wformat I muttered
about in baea34a5b2).
My mingw-w64 build survives this (after my recent warning fixes), and
apparently an official Bob build does too.
This was lost in the mkfiles.pl->cmake transition (c19e7215dd).
Without this, MinGW builds were providing format strings like %zu to a
version of vsnprintf that didn't support them at runtime, so you'd get
messages like "Pageant has zu SSH-2 keys". (-Wformat would have
complained about the unknown %z format specifier, but even STRICT MinGW
builds don't get those warnings, hm.)
Now the runtime version understands %zu.
I've reviewed the other compile-time definitions that were unique to the
old Makefile.mgw, and decided not to reinstate any of them:
WIN32S_COMPAT: leave it out.
This came in in bd4b8c1285. Rationale from Joris van Rantwijk in email
2000-01-24: "Use -DWIN32S_COMPAT to avoid a linking error about
SystemPowerStatus".
But that problem was solved another way within 8 months, and
WIN32S_COMPAT removed from the code, in 76746a7d61, so this wart had
been redundant since then.
_NO_OLDNAMES: decided not to add anything back for this.
This actually does nothing with the mingw-w64 fork (which seems to spell
it NO_OLDNAMES), although current versions of original-mingw do also
still spell it _NO_OLDNAMES.
They both seem to be about suppressing a behaviour where a load of
"non-ANSI" names like strdup get redirected to invoke _strdup in MS'
libraries.
Again, original rationale is from Joris van Rantwijk: "Compile and link
with -mno-cygwin (and -D_NO_OLDNAMES) to get executables that don't need
the Cygwin DLL file."
Since I don't know of any behavioural differences that this causes
(unlike vsnprintf/_vsnprintf), and it's not obviously causing trouble
for me, continue to leave things in the default state.
I only recently found out that OpenSSH defined their own protocol IDs
for AES-GCM, defined to work the same as the standard ones except that
they fixed the semantics for how you select the linked cipher+MAC pair
during key exchange.
(RFC 5647 defines protocol ids for AES-GCM in both the cipher and MAC
namespaces, and requires that you MUST select both or neither - but
this contradicts the selection policy set out in the base SSH RFCs,
and there's no discussion of how you resolve a conflict between them!
OpenSSH's answer is to do it the same way ChaCha20-Poly1305 works,
because that will ensure the two suites don't fight.)
People do occasionally ask us for this linked cipher/MAC pair, and now
I know it's actually feasible, I've implemented it, including a pair
of vector implementations for x86 and Arm using their respective
architecture extensions for multiplying polynomials over GF(2).
Unlike ChaCha20-Poly1305, I've kept the cipher and MAC implementations
in separate objects, with an arm's-length link between them that the
MAC uses when it needs to encrypt single cipher blocks to use as the
inputs to the MAC algorithm. That enables the cipher and the MAC to be
independently selected from their hardware-accelerated versions, just
in case someone runs on a system that has polynomial multiplication
instructions but not AES acceleration, or vice versa.
There's a fourth implementation of the GCM MAC, which is a pure
software implementation of the same algorithm used in the vectorised
versions. It's too slow to use live, but I've kept it in the code for
future testing needs, and because it's a convenient place to dump my
design comments.
The vectorised implementations are fairly crude as far as optimisation
goes. I'm sure serious x86 _or_ Arm optimisation engineers would look
at them and laugh. But GCM is a fast MAC compared to HMAC-SHA-256
(indeed compared to HMAC-anything-at-all), so it should at least be
good enough to use. And we've got a working version with some tests
now, so if someone else wants to improve them, they can.
Without this, the build of e.g. psusan would fail on systems without
that header (such as Termux on Android).
This is similar to how things were pre-cmake, but not identical. We used
to treat lack of updwtmpx() as a reason to OMIT_UTMP (as of f0dfa73982),
but usage of that function got conditionalised in c19e7215dd, so I
haven't restored that exclusion.
In Winelib, you have to be careful not to say 'unsigned long' where
the API expects ULONG, because Winelib doesn't have the Windows LLP64
nature - its unsigned long is 64 bits, whereas ULONG is 32.
Also, my local Winelib has <dwmapi.h> (used in the new demo-screenshot
system), but doesn't contain some of the definitions inside it. So
I've expanded the cmake test of HAVE_DWMAPI_H so that it actually
checks the things we need, instead of just the existence of the
containing header.
Using a new screenshot-taking module I just added in windows/utils,
these new options allow me to start up one of the tools with
demonstration window contents and automatically save a .BMP screenshot
to disk. This will allow me to keep essentially the same set of demo
images and update them easily to keep pace with the current appearance
of the real tools as PuTTY - and Windows itself - both evolve.
Correcting a source file name in the docs just now reminded me that
I've seen a lot of outdated source file names elsewhere in the code,
due to all the reorganisation since we moved to cmake. Here's a giant
pass of trying to make them all accurate again.
These alternate frontends using the GtkApplication class don't work
before GTK3, because the GtkApplication class didn't exist. In the old
mkfiles.pl system, the simplest way to prevent a build failure was to
just compile them anyway but make them reduce to a stub main(). But
now, with the new library-based code organisation, library search
order issues mean that these applications won't build at all.
Happily, with cmake, it's also easy to simply omit these binaries from
the build completely depending on our GTK version.
This commit replaces all those fiddly little linking modules
(be_all.c, be_none.c, be_ssh.c etc) with a single source file
controlled by ifdefs, and introduces a function be_list() in
setup.cmake that makes it easy to compile a version of it appropriate
to each application.
This is a net reduction in code according to 'git diff --stat', even
though I've introduced more comments. It also gets rid of another pile
of annoying little source files in the top-level directory that didn't
deserve to take up so much room in 'ls'.
More concretely, doing this has some maintenance advantages.
Centralisation means less to maintain (e.g. n_ui_backends is worked
out once in a way that makes sense everywhere), and also, 'appname'
can now be reliably set per program. Previously, some programs got the
wrong appname due to sharing the same linking module (e.g. Plink had
appname="PuTTY"), which was a latent bug that would have manifested if
I'd wanted to reuse the same string in another context.
One thing I've changed in this rework is that Windows pterm no longer
has the ConPTY backend in its backends[]: it now has an empty one. The
special be_conpty.c module shouldn't really have been there in the
first place: it was used in the very earliest uncommitted drafts of
the ConPTY work, where I was using another method of selecting that
backend, but now that Windows pterm has a dedicated
backend_vt_from_conf() that refers to conpty_backend by name, it has
no need to live in backends[] at all, just as it doesn't have to in
Unix pterm.
After this change, the cmake setup now works even on Debian stretch
(oldoldstable), which runs cmake 3.7.
In order to support a version that early I had to:
- write a fallback implementation of 'add_compile_definitions' for
older cmakes, which is easy, because add_compile_definitions(FOO)
is basically just add_compile_options(-DFOO)
- stop using list(TRANSFORM) and string(JOIN), of which I had one
case each, and they were easily replaced with simple foreach loops
- stop putting OBJECT libraries in the target_link_libraries command
for executable targets, in favour of adding $<TARGET_OBJECTS:foo>
to the main sources list for the same target. That matches what I
do with library targets, so it's probably more sensible anyway.
I tried going back by another Debian release and getting this cmake
setup to work on jessie, but that runs CMake 3.0.1, and in _that_
version of cmake the target_sources command is missing, and I didn't
find any alternative way to add extra sources to a target after having
first declared it. Reorganising to cope with _that_ omission would be
too much upheaval without a very good reason.
For some reason, in my comment explaining which Visual Studio compile
warnings I'd suppressed and why, one of the warning numbers in the
comment totally failed to match the one in the suppression option! I
probably pasted it from some other warning in that compile, which I
fixed rather than suppressing.
This fulfills our long-standing Mayhem-difficulty wishlist item
'win-command-prompt': this is a Windows pterm in the sense that when
you run it you get a local cmd.exe running inside a PuTTY-style window.
Advantages of this: you get the same free choice of fonts as PuTTY has
(no restriction to a strange subset of the system's available fonts);
you get the same copy-paste gestures as PuTTY (no mental gear-shifting
when you have command prompts and SSH sessions open on the same
desktop); you get scrollback with the PuTTY semantics (scrolling to
the bottom gets you to where the action is, as opposed to the way you
could accidentally find yourself 500 lines past the end of the action
in a real console).
'win-command-prompt' was at Mayhem difficulty ('Probably impossible')
basically on the grounds that with Windows's old APIs for accessing
the contents of consoles, there was no way I could find to get this to
work sensibly. What was needed to make it feasible was a major piece
of re-engineering work inside Windows itself.
But, of course, that's exactly what happened! In 2019, the new ConPTY
API arrived, which lets you create an object that behaves like a
Windows console at one end, and round the back, emits a stream of
VT-style escape sequences as the screen contents evolve, and accepts a
VT-style input stream in return which it will parse function and arrow
keys out of in the usual way.
So now it's actually _easy_ to get this to basically work. The new
backend, in conpty.c, has to do a handful of magic Windows API calls
to set up the pseudo-console and its feeder pipes and start a
subprocess running in it, a further magic call every time the PuTTY
window is resized, and detect the end of the session by watching for
the subprocess terminating. But apart from that, all it has to do is
pass data back and forth unmodified between those pipes and the
backend's associated Seat!
That said, this is new and experimental, and there will undoubtedly be
issues. One that I already know about is that you can't copy and paste
a word that has wrapped between lines without getting an annoying
newline in the middle of it. As far as I can see this is a fundamental
limitation: the ConPTY system sends the _same_ escape sequence stream
for a line that wrapped as it would send for a line that had a logical
\n at what would have been the wrap point. Probably the best we can do
to mitigate this is to adopt a different heuristic for newline elision
that's right more often than it's wrong.
For the moment, that experimental-ness is indicated by the fact that
Buildscr will build, sign and deliver a copy of pterm.exe for each
flavour of Windows, but won't include it in the .zip file or in the
installer. (In fact, that puts it in exactly the same ad-hoc category
as PuTTYtel, although for completely different reasons.)
It's always the same as the cwd when the script is invoked, and by
having the script get it _from_ its own cwd, we arrange a bit of
automatic normalisation in situations where you need to invoke it with
some non-canonical path like one ending in "/.." - which I'll do in
the next commit.
When building against the Mac Homebrew installation of GTK, you find
that GTK exists, libX11 exists, but the integration between the two
(in the form of the header file gdk/gdkx.h) doesn't exist. In that
situation, we need to compile out X11 support.
doc/CMakeLists.txt now sets a variable indicating that we either have,
or can build, each individual man page. And when we call our
installed_program() function to mark a program as official enough to
put in 'make install', that function also installs the man page
similarly if it exists, and warns if not.
For the convenience of people building-and-installing from the .tar.gz
we ship, I've arranged that they can still get the man pages installed
without needing Halibut: the previous commit ensured that the prebuilt
man pages are still in the tarball, and this one arranges that if we
don't have Halibut but we do have prebuilt man pages, then we can
'build' them by copying from the prebuilt versions.
The standalone separate doc/Makefile is gone, replaced by a
CMakeLists.txt that makes 'doc' function as a subdirectory of the main
CMake build system. This auto-detects Halibut, and if it's present,
uses it to build the man pages and the various forms of the main
manual, including the Windows CHM help file in particular.
One awkward thing I had to do was to move just one config directive in
blurb.but into its own file: the one that cites a relative path to the
stylesheet file to put into the CHM. CMake builds often like to be
out-of-tree, so there's no longer a fixed relative path between the
build directory and chm.css. And Halibut has no concept of an include
path to search for files cited by other files, so I can't fix that
with an -I option on the Halibut command line. So I moved that single
config directive into its own file, and had CMake write out a custom
version of that file in the build directory citing the right path.
(Perhaps in the longer term I should fix that omission in Halibut;
out-of-tree friendliness seems like a useful feature. But even if I
do, I still need this build to work now.)
I'm about to want to embed the current git commit into a Halibut
source file, for which I'll need to add a second output mode to the
existing script that finds it out.
Now that the main source file of Plink in each platform directory has
the same name, we can put centralise the main definition of the
program in the main CMakeLists.txt, and in the platform directory,
just add the few extra modules needed to clear up platform-specific
details.
The same goes for psocks. And PSCP and PSFTP could have been moved to
the top level already - I just hadn't done it in the initial setup.
This applies to all of AES, SHA-1, SHA-256 and SHA-512. All those
source files previously contained multiple implementations of the
algorithm, enabled or disabled by ifdefs detecting whether they would
work on a given compiler. And in order to get advanced machine
instructions like AES-NI or NEON crypto into the output file when the
compile flags hadn't enabled them, we had to do nasty stuff with
compiler-specific pragmas or attributes.
Now we can do the detection at cmake time, and enable advanced
instructions in the more sensible way, by compile-time flags. So I've
broken up each of these modules into lots of sub-pieces: a file called
(e.g.) 'foo-common.c' containing common definitions across all
implementations (such as round constants), one called 'foo-select.c'
containing the top-level vtable(s), and a separate file for each
implementation exporting just the vtable(s) for that implementation.
One advantage of this is that it depends a lot less on compiler-
specific bodgery. My particular least favourite part of the previous
setup was the part where I had to _manually_ define some Arm ACLE
feature macros before including <arm_neon.h>, so that it would define
the intrinsics I wanted. Now I'm enabling interesting architecture
features in the normal way, on the compiler command line, there's no
need for that kind of trick: the right feature macros are already
defined and <arm_neon.h> does the right thing.
Another change in this reorganisation is that I've stopped assuming
there's just one hardware implementation per platform. Previously, the
accelerated vtables were called things like sha256_hw, and varied
between FOO-NI and NEON depending on platform; and the selection code
would simply ask 'is hw available? if so, use hw, else sw'. Now, each
HW acceleration strategy names its vtable its own way, and the
selection vtable has a whole list of possibilities to iterate over
looking for a supported one. So if someone feels like writing a second
accelerated implementation of something for a given platform - for
example, I've heard you can use plain NEON to speed up AES somewhat
even without the crypto extension - then it will now have somewhere to
drop in alongside the existing ones.
add_platform_sources_to_library() is now called
add_sources_from_current_dir(), so that it will make sense when I use
it in subdirectories that aren't for a particular platform.
On Windows, configure-style checks are a bit slow, so it's worth
avoiding unnecessary ones if possible. I was testing for three
different header file names that are alternatives to each other, so it
makes sense to stop as soon as we find a usable one.
This new implementation uses the same optimisation-barrier technique
that I used in various places in testsc: have a no-op function, and a
volatile function pointer pointing at it, and then call through the
function pointer, so that nothing actually happens (apart from the
physical call and return) but the compiler has to assume that
_anything_ might have happened.
Doing this just after a memset enforces that the compiler can't have
thrown away the memset, because the called function might (for
example) check that all the memory really is zero and abort if not.
I've been turning this over in my mind ever since coming up with the
technique for testsc. I think it's far more robust than the previous
smemclr technique: so much so that I'm switching to using it
_everywhere_, and no longer using platform alternatives like Windows's
SecureZeroMemory().