From 98200d1bfec13397cf81b1d5b28a1ab90962dcde Mon Sep 17 00:00:00 2001 From: Simon Tatham Date: Thu, 19 Dec 2024 08:47:08 +0000 Subject: [PATCH] Arm: turn on PSTATE.DIT if available and needed. DIT, for 'Data-Independent Timing', is a bit you can set in the processor state on sufficiently new Arm CPUs, which promises that a long list of instructions will deliberately avoid varying their timing based on the input register values. Just what you want for keeping your constant-time crypto primitives constant-time. As far as I'm aware, no CPU has _yet_ implemented any data-dependent optimisations, so DIT is a safety precaution against them doing so in future. It would be embarrassing to be caught without it if a future CPU does do that, so we now turn on DIT in the PuTTY process state. I've put a call to the new enable_dit() function at the start of every main() and WinMain() belonging to a program that might do cryptography (even testcrypt, in case someone uses it for something!), and in case I missed one there, also added a second call at the first moment that any cryptography-using part of the code looks as if it might become active: when an instance of the SSH protocol object is configured, when the system PRNG is initialised, and when selecting any cryptographic authentication protocol in an HTTP or SOCKS proxy connection. With any luck those precautions between them should ensure it's on whenever we need it. Arm's own recommendation is that you should carefully choose the granularity at which you enable and disable DIT: there's a potential time cost to turning it on and off (I'm not sure what, but plausibly something of the order of a pipeline flush), so it's a performance hit to do it _inside_ each individual crypto function, but if CPUs start supporting significant data-dependent optimisation in future, then it will also become a noticeable performance hit to just leave it on across the whole process. So you'd like to do it somewhere in the middle: for example, you might turn on DIT once around the whole process of verifying and decrypting an SSH packet, instead of once for decryption and once for MAC. With all respect to that recommendation as a strategy for maximum performance, I'm not following it here. I turn on DIT at the start of the PuTTY process, and then leave it on. Rationale: 1. PuTTY is not otherwise a performance-critical application: it's not likely to max out your CPU for any purpose _other_ than cryptography. The most CPU-intensive non-cryptographic thing I can imagine a PuTTY process doing is the complicated computation of font rendering in the terminal, and that will normally be cached (you don't recompute each glyph from its outline and hints for every time you display it). 2. I think a bigger risk lies in accidental side channels from having DIT turned off when it should have been on. I can imagine lots of causes for that. Missing a crypto operation in some unswept corner of the code; confusing control flow (like my coroutine macros) jumping with DIT clear into the middle of a region of code that expected DIT to have been set at the beginning; having a reference counter of DIT requests and getting it out of sync. In a more sophisticated programming language, it might be possible to avoid the risk in #2 by cleverness with the type system. For example, in Rust, you could have a zero-sized type that acts as a proof token for DIT being enabled (it would be constructed by a function that also sets DIT, have a Drop implementation that clears DIT, and be !Send so you couldn't use it in a thread other than the one where DIT was set), and then you could require all the actual crypto functions to take a DitToken as an extra parameter, at zero runtime cost. Then "oops I forgot to set DIT around this piece of crypto" would become a compile error. Even so, you'd have to take some care with coroutine-structured code (what happens if a Rust async function yields while holding a DIT token?) and with nesting (if you have two DIT tokens, you don't want dropping the inner one to clear DIT while the outer one is still there to wrongly convince callees that it's set). Maybe in Rust you could get this all to work reliably. But not in C! DIT is an optional feature of the Arm architecture, so we must first test to see if it's supported. This is done the same way as we already do for the various Arm crypto accelerators: on ELF-based systems, check the appropriate bit in the 'hwcap' words in the ELF aux vector; on Mac, look for an appropriate sysctl flag. On Windows I don't know of a way to query the DIT feature, _or_ of a way to write the necessary enabling instruction in an MSVC-compatible way. I've _heard_ that it might not be necessary, because Windows might just turn on DIT unconditionally and leave it on, in an even more extreme version of my own strategy. I don't have a source for that - I heard it by word of mouth - but I _hope_ it's true, because that would suit me very well! Certainly I can't write code to enable DIT without knowing (a) how to do it, (b) how to know if it's safe. Nonetheless, I've put the enable_dit() call in all the right places in the Windows main programs as well as the Unix and cross-platform code, so that if I later find out that I _can_ put in an explicit enable of DIT in some way, I'll only have to arrange to set HAVE_ARM_DIT and compile the enable_dit() function appropriately. --- cmake/cmake.h.in | 1 + cmdgen.c | 2 ++ crypto/CMakeLists.txt | 7 +++++++ crypto/enable_dit.c | 24 ++++++++++++++++++++++++ proxy/cproxy.c | 3 +++ pscp.c | 1 + psftp.c | 1 + ssh.h | 11 +++++++++++ ssh/ssh.c | 2 ++ sshrand.c | 4 +++- stubs/no-dit.c | 15 +++++++++++++++ test/testcrypt.c | 2 ++ test/testsc.c | 5 +++++ unix/CMakeLists.txt | 3 +++ unix/askpass.c | 2 ++ unix/main-gtk-application.c | 3 +++ unix/main-gtk-simple.c | 2 ++ unix/pageant.c | 2 ++ unix/plink.c | 2 ++ unix/psusan.c | 2 ++ unix/putty.c | 1 + unix/sftp.c | 1 + unix/uppity.c | 2 ++ unix/utils/arm_arch_queries.c | 15 +++++++++++++++ windows/CMakeLists.txt | 2 ++ windows/pageant.c | 1 + windows/plink.c | 1 + windows/puttygen.c | 1 + windows/sftp.c | 1 + windows/utils/arm_arch_queries.c | 8 ++++++++ windows/window.c | 1 + 31 files changed, 127 insertions(+), 1 deletion(-) create mode 100644 crypto/enable_dit.c create mode 100644 stubs/no-dit.c diff --git a/cmake/cmake.h.in b/cmake/cmake.h.in index f8fcbc58..3d36b3d0 100644 --- a/cmake/cmake.h.in +++ b/cmake/cmake.h.in @@ -60,3 +60,4 @@ #cmakedefine01 HAVE_NEON_SHA512 #cmakedefine01 HAVE_NEON_SHA512_INTRINSICS #cmakedefine01 USE_ARM64_NEON_H +#cmakedefine01 HAVE_ARM_DIT diff --git a/cmdgen.c b/cmdgen.c index 295cdd77..9883fd26 100644 --- a/cmdgen.c +++ b/cmdgen.c @@ -264,6 +264,8 @@ int main(int argc, char **argv) ppk_save_parameters params = ppk_save_default_parameters; FingerprintType fptype = SSH_FPTYPE_DEFAULT; + enable_dit(); + if (is_interactive()) progress_fp = stderr; diff --git a/crypto/CMakeLists.txt b/crypto/CMakeLists.txt index 6358569a..e57bb600 100644 --- a/crypto/CMakeLists.txt +++ b/crypto/CMakeLists.txt @@ -236,6 +236,12 @@ if(neon) endif() endif() +test_compile_with_flags(HAVE_ARM_DIT + GNU_FLAGS -march=armv8.4-a + TEST_SOURCE " + int main(void) { asm volatile(\"msr dit, %0\" :: \"r\"(1)); }" + ADD_SOURCES_IF_SUCCESSFUL enable_dit.c) + set(HAVE_AES_NI ${HAVE_AES_NI} PARENT_SCOPE) set(HAVE_SHA_NI ${HAVE_SHA_NI} PARENT_SCOPE) set(HAVE_SHAINTRIN_H ${HAVE_SHAINTRIN_H} PARENT_SCOPE) @@ -243,3 +249,4 @@ set(HAVE_NEON_CRYPTO ${HAVE_NEON_CRYPTO} PARENT_SCOPE) set(HAVE_NEON_SHA512 ${HAVE_NEON_SHA512} PARENT_SCOPE) set(HAVE_NEON_SHA512_INTRINSICS ${HAVE_NEON_SHA512_INTRINSICS} PARENT_SCOPE) set(USE_ARM64_NEON_H ${USE_ARM64_NEON_H} PARENT_SCOPE) +set(HAVE_ARM_DIT ${HAVE_ARM_DIT} PARENT_SCOPE) diff --git a/crypto/enable_dit.c b/crypto/enable_dit.c new file mode 100644 index 00000000..7c9ec4b4 --- /dev/null +++ b/crypto/enable_dit.c @@ -0,0 +1,24 @@ +/* + * Enable the PSTATE.DIT flag in AArch64, if available. + * + * This guarantees that data-processing instructions (or rather, a + * long list of specific ones) will have data-independent timing + * (hence the name). In other words, you want to turn this bit on if + * you're trying to do constant-time crypto. + * + * For maximum performance you'd want to turn this bit back off when + * doing any CPU-intensive stuff that _isn't_ cryptographic. That + * seems like a small concern in this code base, and carries the risk + * of losing track of whether it was on or not, so here we just enable + * it for the whole process. That's why there's only an enable_dit() + * function in this file and not a disable_dit() to go with it. + */ + +#include "ssh.h" + +void enable_dit(void) +{ + if (!platform_dit_available()) + return; + asm volatile("msr dit, %0" :: "r"(1)); +} diff --git a/proxy/cproxy.c b/proxy/cproxy.c index 40a2f609..4203eb5d 100644 --- a/proxy/cproxy.c +++ b/proxy/cproxy.c @@ -22,6 +22,7 @@ strbuf *chap_response(ptrlen challenge, ptrlen password) { strbuf *sb = strbuf_new_nm(); const ssh2_macalg *alg = &ssh_hmac_md5; + enable_dit(); /* just in case main() forgot */ mac_simple(alg, password, challenge, strbuf_append(sb, alg->len)); return sb; } @@ -75,6 +76,8 @@ void http_digest_response(BinarySink *bs, ptrlen username, ptrlen password, const ssh_hashalg *alg = httphashalgs[hash]; size_t hashlen = httphashlengths[hash]; + enable_dit(); /* just in case main() forgot */ + unsigned char ncbuf[4]; PUT_32BIT_MSB_FIRST(ncbuf, nonce_count); diff --git a/pscp.c b/pscp.c index e3bbf1f3..c7c4623f 100644 --- a/pscp.c +++ b/pscp.c @@ -2274,6 +2274,7 @@ int psftp_main(CmdlineArgList *arglist) bool sanitise_stderr = true; sk_init(); + enable_dit(); /* Load Default Settings before doing anything else. */ conf = conf_new(); diff --git a/psftp.c b/psftp.c index 69044694..b33e29a9 100644 --- a/psftp.c +++ b/psftp.c @@ -2803,6 +2803,7 @@ int psftp_main(CmdlineArgList *arglist) Filename *batchfile = NULL; sk_init(); + enable_dit(); userhost = user = NULL; diff --git a/ssh.h b/ssh.h index 0dd3e6c0..0d31a4bc 100644 --- a/ssh.h +++ b/ssh.h @@ -1312,6 +1312,7 @@ bool platform_pmull_neon_available(void); bool platform_sha256_neon_available(void); bool platform_sha1_neon_available(void); bool platform_sha512_neon_available(void); +bool platform_dit_available(void); /* * PuTTY version number formatted as an SSH version string. @@ -2048,3 +2049,13 @@ enum { PLUGIN_NOTYPE = 256, /* packet too short to have a type */ PLUGIN_EOF = 257 /* EOF from auth plugin */ }; + +/* + * CPU features for security + */ + +#if HAVE_ARM_DIT +void enable_dit(void); +#else +#define enable_dit() ((void)0) +#endif diff --git a/ssh/ssh.c b/ssh/ssh.c index 87b1a039..6d72ed1c 100644 --- a/ssh/ssh.c +++ b/ssh/ssh.c @@ -954,6 +954,8 @@ static char *ssh_init(const BackendVtable *vt, Seat *seat, { Ssh *ssh; + enable_dit(); /* just in case main() forgot */ + ssh = snew(Ssh); memset(ssh, 0, sizeof(Ssh)); diff --git a/sshrand.c b/sshrand.c index 7f3d6411..18720adc 100644 --- a/sshrand.c +++ b/sshrand.c @@ -93,8 +93,10 @@ void random_save_seed(void) void random_ref(void) { - if (!random_active++) + if (!random_active++) { + enable_dit(); /* just in case main() forgot */ random_create(&ssh_sha256); + } } void random_setup_custom(const ssh_hashalg *hash) diff --git a/stubs/no-dit.c b/stubs/no-dit.c new file mode 100644 index 00000000..8ba3b23d --- /dev/null +++ b/stubs/no-dit.c @@ -0,0 +1,15 @@ +/* + * Stub version of enable_dit(), included in applications like + * PuTTYtel and pterm which completely leave out the 'crypto' source + * directory. + */ + +#include "ssh.h" + +#if HAVE_ARM_DIT + +void enable_dit(void) +{ +} + +#endif diff --git a/test/testcrypt.c b/test/testcrypt.c index 328ffc47..fa3df196 100644 --- a/test/testcrypt.c +++ b/test/testcrypt.c @@ -1684,6 +1684,8 @@ int main(int argc, char **argv) const char *infile = NULL, *outfile = NULL; bool doing_opts = true; + enable_dit(); /* in case this is used as a crypto helper (Hyrum's Law) */ + while (--argc > 0) { char *p = *++argv; diff --git a/test/testsc.c b/test/testsc.c index b4950d8d..d38dd639 100644 --- a/test/testsc.c +++ b/test/testsc.c @@ -1879,6 +1879,11 @@ int main(int argc, char **argv) bool keep_outfiles = false; bool test_names_given = false; + /* One day, perhaps, if I ever get this test to work on Arm, we + * might actually _check_ DIT is enabled, and check we're sticking + * to the precise list of DIT-affected instructions */ + enable_dit(); + memset(tests_to_run, 1, sizeof(tests_to_run)); random_hash = ssh_hash_new(&ssh_sha256); diff --git a/unix/CMakeLists.txt b/unix/CMakeLists.txt index 4d8ef964..87ee6574 100644 --- a/unix/CMakeLists.txt +++ b/unix/CMakeLists.txt @@ -147,6 +147,7 @@ if(GTK_FOUND) ${CMAKE_SOURCE_DIR}/stubs/no-gss.c ${CMAKE_SOURCE_DIR}/stubs/no-ca-config.c ${CMAKE_SOURCE_DIR}/stubs/no-console.c + ${CMAKE_SOURCE_DIR}/stubs/no-dit.c ${CMAKE_SOURCE_DIR}/proxy/nosshproxy.c pty.c) be_list(pterm pterm) @@ -163,6 +164,7 @@ if(GTK_FOUND) ${CMAKE_SOURCE_DIR}/stubs/no-gss.c ${CMAKE_SOURCE_DIR}/stubs/no-ca-config.c ${CMAKE_SOURCE_DIR}/stubs/no-console.c + ${CMAKE_SOURCE_DIR}/stubs/no-dit.c ${CMAKE_SOURCE_DIR}/proxy/nosshproxy.c pty.c) be_list(ptermapp pterm) @@ -204,6 +206,7 @@ if(GTK_FOUND) ${CMAKE_SOURCE_DIR}/stubs/no-ca-config.c ${CMAKE_SOURCE_DIR}/stubs/no-console.c ${CMAKE_SOURCE_DIR}/stubs/no-rand.c + ${CMAKE_SOURCE_DIR}/stubs/no-dit.c ${CMAKE_SOURCE_DIR}/proxy/nocproxy.c ${CMAKE_SOURCE_DIR}/proxy/nosshproxy.c) be_list(puttytel PuTTYtel SERIAL OTHERBACKENDS) diff --git a/unix/askpass.c b/unix/askpass.c index a1143a8f..bdd62519 100644 --- a/unix/askpass.c +++ b/unix/askpass.c @@ -612,6 +612,8 @@ int main(int argc, char **argv) int exitcode; char *ret; + enable_dit(); /* maybe overkill, but we _are_ handling a secret */ + gtk_init(&argc, &argv); if (argc != 2) { diff --git a/unix/main-gtk-application.c b/unix/main-gtk-application.c index 3aaf6af7..cb9eeb1f 100644 --- a/unix/main-gtk-application.c +++ b/unix/main-gtk-application.c @@ -78,6 +78,7 @@ https://wiki.gnome.org/Projects/GTK%2B/OSX/Bundling has some links. #define MAY_REFER_TO_GTK_IN_HEADERS #include "putty.h" +#include "ssh.h" #include "gtkmisc.h" #include "gtkcompat.h" @@ -294,6 +295,8 @@ int main(int argc, char **argv) { int status; + enable_dit(); + /* Call the function in ux{putty,pterm}.c to do app-type * specific setup */ setup(false); /* false means we are not a one-session process */ diff --git a/unix/main-gtk-simple.c b/unix/main-gtk-simple.c index a9a897ca..f1e57012 100644 --- a/unix/main-gtk-simple.c +++ b/unix/main-gtk-simple.c @@ -31,6 +31,7 @@ #define MAY_REFER_TO_GTK_IN_HEADERS #include "putty.h" +#include "ssh.h" #include "terminal.h" #include "gtkcompat.h" #include "unifont.h" @@ -594,6 +595,7 @@ int main(int argc, char **argv) bool need_config_box; setlocale(LC_CTYPE, ""); + enable_dit(); /* Call the function in ux{putty,pterm}.c to do app-type * specific setup */ diff --git a/unix/pageant.c b/unix/pageant.c index bd6d6d7f..9ac9a640 100644 --- a/unix/pageant.c +++ b/unix/pageant.c @@ -1313,6 +1313,8 @@ int main(int argc, char **argv) const char *symlink_path = NULL; FILE *logfp = NULL; + enable_dit(); + progname = argv[0]; /* diff --git a/unix/plink.c b/unix/plink.c index b4a9749f..d5db2c2a 100644 --- a/unix/plink.c +++ b/unix/plink.c @@ -683,6 +683,8 @@ int main(int argc, char **argv) struct winsize size; const struct BackendVtable *backvt; + enable_dit(); + /* * Initialise port and protocol to sensible defaults. (These * will be overridden by more or less anything.) diff --git a/unix/psusan.c b/unix/psusan.c index 4e8693af..268dad25 100644 --- a/unix/psusan.c +++ b/unix/psusan.c @@ -348,6 +348,8 @@ int main(int argc, char **argv) Conf *conf = make_ssh_server_conf(); + enable_dit(); + memset(&ssc, 0, sizeof(ssc)); ssc.application_name = "PSUSAN"; diff --git a/unix/putty.c b/unix/putty.c index 1ccc10b6..a1dd78c5 100644 --- a/unix/putty.c +++ b/unix/putty.c @@ -87,6 +87,7 @@ const unsigned cmdline_tooltype = void setup(bool single) { sk_init(); + enable_dit(); settings_set_default_protocol(be_default_protocol); /* Find the appropriate default port. */ { diff --git a/unix/sftp.c b/unix/sftp.c index 9fb0bf6e..f2323ddd 100644 --- a/unix/sftp.c +++ b/unix/sftp.c @@ -576,6 +576,7 @@ const bool buildinfo_gtk_relevant = false; */ int main(int argc, char *argv[]) { + enable_dit(); uxsel_init(); CmdlineArgList *arglist = cmdline_arg_list_from_argv(argc, argv); return psftp_main(arglist); diff --git a/unix/uppity.c b/unix/uppity.c index 30501af7..5832ed98 100644 --- a/unix/uppity.c +++ b/unix/uppity.c @@ -650,6 +650,8 @@ int main(int argc, char **argv) struct cmdline_instance *ci = &instances[ninstances++]; init_cmdline_instance(ci); + enable_dit(); + if (argc <= 1) { /* * We're going to terminate with an error message below, diff --git a/unix/utils/arm_arch_queries.c b/unix/utils/arm_arch_queries.c index c3dc286b..190ef71e 100644 --- a/unix/utils/arm_arch_queries.c +++ b/unix/utils/arm_arch_queries.c @@ -92,6 +92,21 @@ bool platform_sha512_neon_available(void) #endif } +bool platform_dit_available(void) +{ +#if defined HWCAP_DIT + return getauxval(AT_HWCAP) & HWCAP_DIT; +#elif defined HWCAP2_DIT + return getauxval(AT_HWCAP2) & HWCAP2_DIT; +#elif defined __APPLE__ + SysctlResult res = test_sysctl_flag("hw.optional.arm.FEAT_DIT"); + /* As above, treat 'missing' as enabled */ + return res != SYSCTL_OFF; +#else + return false; +#endif +} + #else /* defined __arm__ || defined __aarch64__ */ /* diff --git a/windows/CMakeLists.txt b/windows/CMakeLists.txt index aefcb9d2..960a440d 100644 --- a/windows/CMakeLists.txt +++ b/windows/CMakeLists.txt @@ -120,6 +120,7 @@ add_executable(puttytel ${CMAKE_SOURCE_DIR}/stubs/no-ca-config.c ${CMAKE_SOURCE_DIR}/stubs/no-console.c ${CMAKE_SOURCE_DIR}/stubs/no-rand.c + ${CMAKE_SOURCE_DIR}/stubs/no-dit.c ${CMAKE_SOURCE_DIR}/proxy/nocproxy.c ${CMAKE_SOURCE_DIR}/proxy/nosshproxy.c puttytel.rc) @@ -159,6 +160,7 @@ if(HAVE_CONPTY) ${CMAKE_SOURCE_DIR}/stubs/no-ca-config.c ${CMAKE_SOURCE_DIR}/stubs/no-console.c ${CMAKE_SOURCE_DIR}/stubs/no-rand.c + ${CMAKE_SOURCE_DIR}/stubs/no-dit.c ${CMAKE_SOURCE_DIR}/proxy/nosshproxy.c pterm.rc) be_list(pterm pterm) diff --git a/windows/pageant.c b/windows/pageant.c index 202a10f8..a519926f 100644 --- a/windows/pageant.c +++ b/windows/pageant.c @@ -1520,6 +1520,7 @@ int WINAPI WinMain(HINSTANCE inst, HINSTANCE prev, LPSTR cmdline, int show) size_t nclkeys = 0, clkeysize = 0; dll_hijacking_protection(); + enable_dit(); hinst = inst; diff --git a/windows/plink.c b/windows/plink.c index 30779cef..face662f 100644 --- a/windows/plink.c +++ b/windows/plink.c @@ -296,6 +296,7 @@ int main(int argc, char **argv) const struct BackendVtable *vt; dll_hijacking_protection(); + enable_dit(); /* * Initialise port and protocol to sensible defaults. (These diff --git a/windows/puttygen.c b/windows/puttygen.c index c5115c06..8dcf666b 100644 --- a/windows/puttygen.c +++ b/windows/puttygen.c @@ -2370,6 +2370,7 @@ int WINAPI WinMain(HINSTANCE inst, HINSTANCE prev, LPSTR cmdline, int show) struct InitialParams params[1]; dll_hijacking_protection(); + enable_dit(); init_common_controls(); hinst = inst; diff --git a/windows/sftp.c b/windows/sftp.c index fb3658d7..64c1daf6 100644 --- a/windows/sftp.c +++ b/windows/sftp.c @@ -649,6 +649,7 @@ int main(int argc, char *argv[]) int ret; dll_hijacking_protection(); + enable_dit(); CmdlineArgList *arglist = cmdline_arg_list_from_GetCommandLineW(); ret = psftp_main(arglist); diff --git a/windows/utils/arm_arch_queries.c b/windows/utils/arm_arch_queries.c index b0193276..87728e10 100644 --- a/windows/utils/arm_arch_queries.c +++ b/windows/utils/arm_arch_queries.c @@ -43,3 +43,11 @@ bool platform_sha512_neon_available(void) * SHA-512 architecture extension. */ return false; } + +bool platform_dit_available(void) +{ + /* As of 2024-12-17, as far as I can tell from docs.microsoft.com, + * Windows on Arm does not yet provide a PF_ARM_V8_* flag for the + * DIT bit in PSTATE. */ + return false; +} diff --git a/windows/window.c b/windows/window.c index b2ac0adb..efc028e9 100644 --- a/windows/window.c +++ b/windows/window.c @@ -448,6 +448,7 @@ int WINAPI WinMain(HINSTANCE inst, HINSTANCE prev, LPSTR cmdline, int show) int guess_width, guess_height; dll_hijacking_protection(); + enable_dit(); hinst = inst; hprev = prev;