putty-source/unix/uxutils.c

#include "putty.h"
#include "ssh.h"

#include "uxutils.h"

#if defined __arm__ || defined __aarch64__

bool platform_aes_hw_available(void)
{
#if defined HWCAP_AES
    return getauxval(AT_HWCAP) & HWCAP_AES;
#elif defined HWCAP2_AES
    return getauxval(AT_HWCAP2) & HWCAP2_AES;
#elif defined __APPLE__
    /* M1 macOS defines no optional sysctl flag indicating presence of
     * the AES extension, which I assume to be because it's always
     * present */
    return true;
#else
    return false;
#endif
}

bool platform_sha256_hw_available(void)
{
#if defined HWCAP_SHA2
    return getauxval(AT_HWCAP) & HWCAP_SHA2;
#elif defined HWCAP2_SHA2
    return getauxval(AT_HWCAP2) & HWCAP2_SHA2;
#elif defined __APPLE__
    /* Assume always present on M1 macOS, similarly to AES */
    return true;
#else
    return false;
#endif
}

bool platform_sha1_hw_available(void)
{
#if defined HWCAP_SHA1
    return getauxval(AT_HWCAP) & HWCAP_SHA1;
#elif defined HWCAP2_SHA1
    return getauxval(AT_HWCAP2) & HWCAP2_SHA1;
#elif defined __APPLE__
    /* Assume always present on M1 macOS, similarly to AES */
    return true;
#else
    return false;
#endif
}

bool platform_sha512_hw_available(void)
{
#if defined HWCAP_SHA512
    return getauxval(AT_HWCAP) & HWCAP_SHA512;
#elif defined HWCAP2_SHA512
    return getauxval(AT_HWCAP2) & HWCAP2_SHA512;
#elif defined __APPLE__
    return test_sysctl_flag("hw.optional.armv8_2_sha512");
#else
    return false;
#endif
}

#endif /* defined __arm__ || defined __aarch64__ */
Check for auxv.h and hwcap.h before including them. uClibc-ng does not provide <sys/auxv.h>, and a non-Linux-kernel-based Unixlike system running on Arm will probably not provide <asm/hwcap.h>. Now we check for both of those headers at autoconf time, and if either one is absent, we don't do the runtime test for Arm crypto acceleration. This should only make a difference on systems where this module previously failed to compile at all. But obviously it would be nicer to find alternative ways to check for crypto acceleration on such systems; patches welcome. 2019-03-26 18:40:51 +00:00			`#include "putty.h"`
Support hardware AES on Arm platforms. The refactored sshaes.c gives me a convenient slot to drop in a second hardware-accelerated AES implementation, similar to the existing one but using Arm NEON intrinsics in place of the x86 AES-NI ones. This needed a minor structural change, because Arm systems are often heterogeneous, containing more than one type of CPU which won't necessarily all support the same set of architecture features. So you can't test at run time for the presence of AES acceleration by querying the CPU you're running on - even if you found a way to do it, the answer wouldn't be reliable once the OS started migrating your process between CPUs. Instead, you have to ask the OS itself, because only that knows about _all_ the CPUs on the system. So that means the aes_hw_available() mechanism has to extend a tentacle into each platform subdirectory. The trickiest part was the nest of ifdefs that tries to detect whether the compiler can support the necessary parts. I had successful test-compiles on several compilers, and was able to run the code directly on an AArch64 tablet (so I know it passes cryptsuite), but it's likely that at least some Arm platforms won't be able to build it because of some path through the ifdefs that I haven't been able to test yet. 2019-01-16 22:08:45 +00:00			`#include "ssh.h"`

uxutils.c: move some definitions into a header file. If the autoconf/ifdef system ends up taking the trivial branch through all the Arm-architecture ifdefs, then we define the always-fail version of getauxval as a 'static inline' function, and then (because none of our desired HWCAP_FOO values is defined at all) never call it. This leads to a compiler warning because we defined a static function and never called it - i.e. at the default -Werror, a build failure. Of course it's perfectly sensible to define a static inline function that never gets called! Header files do it all the time, and nobody is expected to ensure that if they include a header file then they take care to refer to every static inline function it defines. But if the definition is in the _source_ file rather than a header file, then clang (in particular on macOS) will give a warning. So the easy solution is to move the inline definitions of getauxval into a header file, which suppresses the warning without requiring me to faff about with further ifdefs to make the definitions conditional on at least one use. 2020-12-24 09:34:13 +00:00			`#include "uxutils.h"`
Support FreeBSD's API for querying the ELF aux vector. We use this for detecting the Arm crypto extension and using it to enable accelerated AES and/or SHA-{1,2}. Previously, I had code that called glibc's getauxval(3) function, conditioned on #ifdef __linux__. Now, instead, I do an autoconf test to query the presence of getauxval itself (so that any other system with the same API can still work), and alongside it, also check for the analogous FreeBSD libc function elf_aux_info(3). As a result, building on Arm FreeBSD now gets the accelerated-crypto autodetection. 2020-10-09 18:14:57 +00:00
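The commit messages above describe three cases: a glibc-style `getauxval`, FreeBSD's `elf_aux_info`, and an always-fail `static inline` fallback kept in a header. A minimal sketch of how such a shim could be laid out follows. The macro names `HAVE_GETAUXVAL`, `HAVE_ELF_AUX_INFO` and `HAVE_SYS_AUXV_H` are assumptions based on autoconf's usual naming, and `sketch_getauxval` is a hypothetical name, not the identifier used in `uxutils.h`.

```c
/* Sketch only: the HAVE_* macro names are assumed, not taken from PuTTY. */
#if defined HAVE_GETAUXVAL && defined HAVE_SYS_AUXV_H
/* glibc-style API: use the library function directly. */
#include <sys/auxv.h>
#define sketch_getauxval getauxval
#elif defined HAVE_ELF_AUX_INFO && defined HAVE_SYS_AUXV_H
/* FreeBSD-style API: wrap elf_aux_info() so callers see one interface. */
#include <sys/auxv.h>
static inline unsigned long sketch_getauxval(unsigned long which)
{
    unsigned long value = 0;
    if (elf_aux_info((int)which, &value, sizeof(value)) != 0)
        return 0;               /* query failed: report no features */
    return value;
}
#else
/* Always-fail fallback: no known API, so report no features at all. */
static inline unsigned long sketch_getauxval(unsigned long which)
{
    (void)which;                /* unused in this branch */
    return 0;
}
#endif
```

With none of the assumed macros defined, the fallback branch is chosen and every feature test reads as absent, which matches the "don't do the runtime test" behaviour described above.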
uxutils.c: move some definitions into a header file. 2020-12-24 09:34:13 +00:00			`#if defined __arm__ || defined __aarch64__`
Support hardware AES on Arm platforms. 2019-01-16 22:08:45 +00:00
			`bool platform_aes_hw_available(void)`
			`{`
			`#if defined HWCAP_AES`
			`return getauxval(AT_HWCAP) & HWCAP_AES;`
			`#elif defined HWCAP2_AES`
			`return getauxval(AT_HWCAP2) & HWCAP2_AES;`
uxutils.c: add special case for M1 macOS. The M1 chip in the new range of Macs includes the crypto extension that permits AES, SHA-1 and SHA-256 acceleration. But you can't find that out by querying the ELF aux vector, because macOS isn't even ELF-based at all, so there isn't an ELF aux vector, and no web search I've tried has turned up any MachO thing obviously analogous to it. Running 'sysctl -a' does show some flags indicating CPU architecture extensions, but they're more advanced ones than this. So I think we have to assume that if we're on the new M1 macOS at all, then we have the basic crypto extension available. Accordingly, I've added a special case to all the query functions that simply returns true if defined __APPLE__. 2020-12-24 10:04:08 +00:00			`#elif defined __APPLE__`
			`/* M1 macOS defines no optional sysctl flag indicating presence of`
			`* the AES extension, which I assume to be because it's always`
			`* present */`
			`return true;`
Support hardware AES on Arm platforms. 2019-01-16 22:08:45 +00:00			`#else`
			`return false;`
			`#endif`
			`}`
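The AES commit message explains that the availability question goes to the OS rather than to the CPU the process happens to be running on. One way a caller might consume that answer is sketched below. Every name here (`sketch_aes_impl`, `sketch_select_aes`, the two stand-in queries) is hypothetical and only illustrates the shape of the decision; it is not the selection code in `sshaes.c`.

```c
#include <stdbool.h>

/* Hypothetical descriptor for one AES implementation. */
typedef struct {
    const char *name;
} sketch_aes_impl;

static const sketch_aes_impl sketch_aes_hw = { "hardware (NEON)" };
static const sketch_aes_impl sketch_aes_sw = { "software" };

/* Stand-ins for platform_aes_hw_available(), so both paths can be shown. */
static bool sketch_always_true(void)  { return true; }
static bool sketch_always_false(void) { return false; }

/*
 * Pick an implementation from the OS-level answer. The query is passed in
 * as a function pointer; a real caller would pass the platform's own
 * availability function.
 */
static const sketch_aes_impl *sketch_select_aes(bool (*hw_available)(void))
{
    return hw_available() ? &sketch_aes_hw : &sketch_aes_sw;
}
```

Because the OS answers for all CPUs in the system, the result should not change when the process migrates between cores, so a caller could reasonably make this choice once.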

Support hardware SHA-256 and SHA-1 on Arm platforms. Similarly to my recent addition of NEON-accelerated AES, these new implementations drop in alongside the SHA-NI ones, under a different set of ifdefs. All the details of selection and detection are essentially the same as they were for the AES code. 2019-01-23 07:27:12 +00:00			`bool platform_sha256_hw_available(void)`
			`{`
			`#if defined HWCAP_SHA2`
			`return getauxval(AT_HWCAP) & HWCAP_SHA2;`
			`#elif defined HWCAP2_SHA2`
			`return getauxval(AT_HWCAP2) & HWCAP2_SHA2;`
uxutils.c: add special case for M1 macOS. 2020-12-24 10:04:08 +00:00			`#elif defined __APPLE__`
			`/* Assume always present on M1 macOS, similarly to AES */`
			`return true;`
Support hardware SHA-256 and SHA-1 on Arm platforms. 2019-01-23 07:27:12 +00:00			`#else`
			`return false;`
			`#endif`
			`}`

			`bool platform_sha1_hw_available(void)`
			`{`
			`#if defined HWCAP_SHA1`
			`return getauxval(AT_HWCAP) & HWCAP_SHA1;`
			`#elif defined HWCAP2_SHA1`
			`return getauxval(AT_HWCAP2) & HWCAP2_SHA1;`
uxutils.c: add special case for M1 macOS. 2020-12-24 10:04:08 +00:00			`#elif defined __APPLE__`
			`/* Assume always present on M1 macOS, similarly to AES */`
			`return true;`
Support hardware SHA-256 and SHA-1 on Arm platforms. 2019-01-23 07:27:12 +00:00			`#else`
			`return false;`
			`#endif`
			`}`

Hardware-accelerated SHA-512 on the Arm architecture. The NEON support for SHA-512 acceleration looks very like SHA-256, with a pair of chained instructions to generate a 128-bit vector register full of message schedule, and another pair to update the hash state based on those. But since SHA-512 is twice as big in all dimensions, those four instructions between them only account for two rounds of it, in place of four rounds of SHA-256. Also, it's a tighter squeeze to fit all the data needed by those instructions into their limited number of register operands. The NEON SHA-256 implementation was able to keep its hash state and message schedule stored as 128-bit vectors and then pass combinations of those vectors directly to the instructions that did the work; for SHA-512, in several places you have to make one of the input operands to the main instruction by combining two halves of different vectors from your existing state. But that operation is a quick single EXT instruction, so no trouble. The only other problem I've found is that clang - in particular the version on M1 macOS, but as far as I can tell, even on current trunk - doesn't seem to implement the NEON intrinsics for the SHA-512 extension. So I had to bodge my own versions with inline assembler in order to get my implementation to compile under clang. Hopefully at some point in the future the gap might be filled and I can relegate that to a backwards-compatibility hack! This commit adds the same kind of switching mechanism for SHA-512 that we already had for SHA-256, SHA-1 and AES, and as with all of those, plumbs it through to testcrypt so that you can explicitly ask for the hardware or software version of SHA-512. So the test suite can run the standard test vectors against both implementations in turn. On M1 macOS, I'm testing at run time for the presence of SHA-512 by checking a sysctl setting. You can perform the same test on the command line by running "sysctl hw.optional.armv8_2_sha512". 
As far as I can tell, on Windows there is not yet any flag to test for this CPU feature, so for the moment, the new accelerated SHA-512 is turned off unconditionally on Windows. 2020-12-24 11:40:15 +00:00			`bool platform_sha512_hw_available(void)`
			`{`
			`#if defined HWCAP_SHA512`
			`return getauxval(AT_HWCAP) & HWCAP_SHA512;`
			`#elif defined HWCAP2_SHA512`
			`return getauxval(AT_HWCAP2) & HWCAP2_SHA512;`
			`#elif defined __APPLE__`
			`return test_sysctl_flag("hw.optional.armv8_2_sha512");`
			`#else`
			`return false;`
			`#endif`
			`}`
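The SHA-512 commit message says the macOS test checks a sysctl setting, and the code calls `test_sysctl_flag` to do it. A minimal sketch of what such a helper could look like follows, assuming the flag is exposed as an `int` and using the standard `sysctlbyname` call. The name `sketch_test_sysctl_flag` is hypothetical, and this is not presented as the definition PuTTY uses.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Return true only if the named sysctl exists, has the size of an int,
 * and holds a nonzero value. On systems other than macOS this sketch
 * always returns false.
 */
static bool sketch_test_sysctl_flag(const char *flagname)
{
#if defined __APPLE__
    int value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname(flagname, &value, &size, NULL, 0) != 0)
        return false;           /* unknown name or query error */
    return size == sizeof(value) && value != 0;
#else
    (void)flagname;             /* no sysctl query in this branch */
    return false;
#endif
}
```

Treating an unknown flag name as "feature absent" keeps the caller on the software implementation whenever the query cannot be answered.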

Support FreeBSD's API for querying the ELF aux vector. 2020-10-09 18:14:57 +00:00			`#endif /* defined __arm__ || defined __aarch64__ */`