Mirror of https://git.tartarus.org/simon/putty.git, synced 2025-01-09 17:38:00 +00:00
Complete rewrite of the AES code.

sshaes.c is more or less completely changed by this commit.

Firstly, I've changed the top-level structure. In the old structure, there were three levels of indirection controlling what an encryption function would actually do: first the ssh2_cipher vtable, then a subsidiary set of function pointers within that to select the software or hardware implementation, and then inside the main encryption function, a switch on the key length to jump into the right place in the unrolled loop of cipher rounds.

That was all a bit untidy. So now _all_ of that is done by means of just one selection system, namely the ssh2_cipher vtable. The software and hardware implementations of a given SSH cipher each have their own separate vtable, e.g. ssh2_aes256_sdctr_sw and ssh2_aes256_sdctr_hw; this allows them to have their own completely different state structures too, and not have to try to coexist awkwardly in the same universal AESContext with workaround code to align things correctly. The old implementation-agnostic vtables like ssh2_aes256_sdctr still exist, but now they're mostly empty, containing only the constructor function, which will decide whether AES-NI is currently available and then choose one of the other _real_ vtables to instantiate.

As well as the cleaner data representation, this also means the vtables can have different description strings, which means the Event Log will indicate which AES implementation is actually in use; it means the SW and HW vtables are available for testcrypt to use (although actually using them is left for the next commit); and in principle it would also make it easy to support a user override for the automatic SW/HW selection (in case anyone turns out to want one).

The AES-NI implementation has been reorganised to fit into the new framework. One thing I've done is to de-optimise the key expansion: instead of having a separate blazingly fast loop-unrolled key setup function for each key length, there's now just one, which uses AES intrinsics for the actual transformations of individual key words, but wraps them in a common loop structure for all the key lengths which has a clear correspondence to the cipher spec. (Sorry to throw away your work there, Pavel, but this isn't an application where key setup really _needs_ to be hugely fast, and I decided I prefer a version I can understand and debug.)

The software AES implementation is also completely replaced with one that uses a bit-sliced representation, i.e. the cipher state is split across eight integers in such a way that each logical byte of the state occupies a single bit in each of those integers. The S-box lookup is done by a long string of AND and XOR operations on the eight bits (removing the potential cache side channel from a lookup table), and this representation allows 64 S-box lookups to be done in parallel simply by extending those AND/XOR operations to be bitwise ones on a whole word. So now we can perform four AES encryptions or decryptions in parallel, at least when the cipher mode permits it (which SDCTR and CBC decryption both do).

The result is slower than the old implementation, but (a) not by as much as you might think - those parallel S-boxes are surprisingly competitive with 64 separate table lookups; (b) the compensation is that now it should run in constant time with no data-dependent control flow or memory addressing; and (c) in any case the really fast hardware implementation will supersede it for most users.
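
The bit-sliced layout described in the commit message can be illustrated with a small standalone sketch. This is not the code from sshaes.c: the function names, the 64-bit word size, and the exact bit ordering are assumptions made for illustration, and the parallel operation shown is the much simpler GF(2^8) multiply-by-x step rather than the full S-box circuit. It does show the key idea, though: once 64 bytes are spread across eight words, a handful of whole-word XORs transforms all 64 bytes at once, with no table lookup and no data-dependent branch.

```c
#include <stdint.h>

#define NBYTES 64   /* 64 state bytes = four 16-byte AES blocks */

/*
 * Assumed layout (hypothetical, for illustration only):
 * bit i of slice[j] holds bit j of input byte i.
 */
static void bitslice_pack(const uint8_t in[NBYTES], uint64_t slice[8])
{
    for (int j = 0; j < 8; j++) {
        uint64_t w = 0;
        for (int i = 0; i < NBYTES; i++)
            w |= (uint64_t)((in[i] >> j) & 1) << i;
        slice[j] = w;
    }
}

/* Inverse of bitslice_pack: gather bit i of every slice into byte i. */
static void bitslice_unpack(const uint64_t slice[8], uint8_t out[NBYTES])
{
    for (int i = 0; i < NBYTES; i++) {
        uint8_t b = 0;
        for (int j = 0; j < 8; j++)
            b |= (uint8_t)(((slice[j] >> i) & 1) << j);
        out[i] = b;
    }
}

/*
 * Multiply all 64 bytes by x in GF(2^8) modulo x^8+x^4+x^3+x+1
 * (the AES polynomial). Each slice moves up one bit position, and
 * the old top bit is folded back into bits 0, 1, 3 and 4. Every
 * operation is a whole-word move or XOR, so all 64 bytes are
 * processed in parallel.
 */
static void bitslice_xtime(uint64_t s[8])
{
    uint64_t top = s[7];
    s[7] = s[6];
    s[6] = s[5];
    s[5] = s[4];
    s[4] = s[3] ^ top;
    s[3] = s[2] ^ top;
    s[2] = s[1];
    s[1] = s[0] ^ top;
    s[0] = top;
}

/* Byte-at-a-time reference version, for comparison. */
static uint8_t xtime_ref(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b >> 7) * 0x1B));
}
```

Packing 64 bytes, calling bitslice_xtime once, and unpacking should give the same result as calling xtime_ref on each byte separately. A bit-sliced S-box follows the same pattern, just with a much longer sequence of AND and XOR operations.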

parent be5c0e6356
commit dfdb73e103

mpint_i.h (23 lines changed)
@@ -187,6 +187,29 @@
 #define BIGNUM_TOP_BIT (((BignumInt)1) << (BIGNUM_INT_BITS-1))
 #define BIGNUM_INT_MASK (BIGNUM_TOP_BIT | (BIGNUM_TOP_BIT-1))
 
+/*
+ * Just occasionally, we might need a GET_nnBIT_xSB_FIRST macro to
+ * operate on whatever BignumInt is.
+ */
+#if BIGNUM_INT_BITS_BITS == 4
+#define GET_BIGNUMINT_MSB_FIRST GET_16BIT_MSB_FIRST
+#define GET_BIGNUMINT_LSB_FIRST GET_16BIT_LSB_FIRST
+#define PUT_BIGNUMINT_MSB_FIRST PUT_16BIT_MSB_FIRST
+#define PUT_BIGNUMINT_LSB_FIRST PUT_16BIT_LSB_FIRST
+#elif BIGNUM_INT_BITS_BITS == 5
+#define GET_BIGNUMINT_MSB_FIRST GET_32BIT_MSB_FIRST
+#define GET_BIGNUMINT_LSB_FIRST GET_32BIT_LSB_FIRST
+#define PUT_BIGNUMINT_MSB_FIRST PUT_32BIT_MSB_FIRST
+#define PUT_BIGNUMINT_LSB_FIRST PUT_32BIT_LSB_FIRST
+#elif BIGNUM_INT_BITS_BITS == 6
+#define GET_BIGNUMINT_MSB_FIRST GET_64BIT_MSB_FIRST
+#define GET_BIGNUMINT_LSB_FIRST GET_64BIT_LSB_FIRST
+#define PUT_BIGNUMINT_MSB_FIRST PUT_64BIT_MSB_FIRST
+#define PUT_BIGNUMINT_LSB_FIRST PUT_64BIT_LSB_FIRST
+#else
+#error Ran out of options for GET_BIGNUMINT_xSB_FIRST
+#endif
+
 /*
  * Common code across _most_ branches of the ifdef: define a set of
  * statement macros in terms of the BignumDblInt type provided. In

ssh.h (3 lines changed)
@@ -676,6 +676,9 @@ struct ssh2_cipheralg {
     const char *text_name;
     /* If set, this takes priority over other MAC. */
     const ssh2_macalg *required_mac;
+
+    /* Pointer to any extra data used by a particular implementation. */
+    const void *extra;
 };
 
 #define ssh2_cipher_new(alg) ((alg)->new(alg))
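
To make the single-vtable selection scheme from the commit message concrete, here is a minimal standalone sketch of a "selector" vtable whose constructor picks a real implementation. Everything in it is hypothetical: the cut-down cipheralg struct, the names aes256_sw, aes256_hw and select_new, the description strings, and the hw_available_flag stand-in for a CPU feature check are invented for illustration. Using the extra pointer to hold the list of candidates is one plausible use of the new field, assumed here and not confirmed by the portion of the diff shown.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct cipheralg cipheralg;

/* A cipher instance just remembers which vtable created it. */
typedef struct cipher { const cipheralg *vt; } cipher;

/* Cut-down, hypothetical analogue of struct ssh2_cipheralg. */
struct cipheralg {
    cipher *(*new)(const cipheralg *alg);
    void (*free)(cipher *c);
    const char *text_name;
    const void *extra;      /* implementation-specific data */
};

#define cipher_new(alg) ((alg)->new(alg))
#define cipher_free(c) ((c)->vt->free(c))

/* Shared trivial constructor/destructor for the two "real" vtables. */
static cipher *real_new(const cipheralg *alg)
{
    cipher *c = malloc(sizeof(*c));
    if (c)
        c->vt = alg;
    return c;
}

static void real_free(cipher *c) { free(c); }

/* Two real implementations, each with its own description string. */
static const cipheralg aes256_sw = {
    real_new, real_free, "AES-256 (software, illustrative)", NULL
};
static const cipheralg aes256_hw = {
    real_new, real_free, "AES-256 (hardware, illustrative)", NULL
};

/* Candidate list hung off the selector's 'extra' pointer (assumed). */
struct selector_extra { const cipheralg *sw, *hw; };
static const struct selector_extra aes256_choices = {
    &aes256_sw, &aes256_hw
};

/* Stand-in for a real "is AES-NI available?" probe. */
static bool hw_available_flag = false;

/* The selector's only job: choose a real vtable and instantiate it. */
static cipher *select_new(const cipheralg *alg)
{
    const struct selector_extra *ex = alg->extra;
    const cipheralg *real = hw_available_flag ? ex->hw : ex->sw;
    return cipher_new(real);
}

/* Mostly empty selector vtable: only the constructor is filled in. */
static const cipheralg aes256 = {
    select_new, NULL, "AES-256 (selector, illustrative)", &aes256_choices
};
```

A caller would write cipher_new(&aes256) and get back an instance whose vt points at either aes256_sw or aes256_hw, so later calls and the reported text_name go straight to the chosen implementation with no further indirection.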