1
0
mirror of https://git.tartarus.org/simon/putty.git synced 2025-01-10 01:48:00 +00:00
Commit Graph

73 Commits

Author SHA1 Message Date
Simon Tatham
52f296b7e2 ntru.c: fix benign paste error.
smemclr(array, ... * sizeof(*different_array)) was not what I meant to
do, even though the two arrays have the same element size.
2022-04-22 22:20:36 +01:00
Simon Tatham
9aae695c62 NTRU: speed up the polynomial inversion.
I wasn't really satisfied with the previous version, but it was
easiest to get Stein's algorithm working on polynomials by doing it
exactly how I already knew to do it for integers. But now I've
improved it in two ways.

The first improvement I got from another implementation: instead of
transforming A into A - kB for some k that makes the constant term
zero, you can scale _both_ inputs, replacing A with mA - kB for some
k,m. The advantage is that you can calculate m and k very easily, by
making each one the constant term of the other polynomial, which means
you don't need to invert something mod q in every step. (Rather like
the projective-coordinates optimisations in elliptic curves, where
instead of inverting in every step you accumulate the product of all
the factors that need to be inverted, and invert the whole product
once at the very end.)

The second improvement is to abandon my cumbersome unwinding loop that
builds up the output coefficients by reversing the steps in the
original gcd-finding loop. Instead, I do the thing you do in normal
Euclid's algorithm: keep track of the coefficients as you go through
the original loop. I had wanted to do this before, but hadn't figured
out how you could deal with dividing a coefficient by x when (unlike
the associated real value) the coefficient isn't a multiple of x. But
the answer is very simple: x is invertible in the ring we're working
in (its inverse mod x^p-x-1 is just x^{p-1}-1), so you _can_ just
divide your coefficient by x, and moreover, very easily!

Together, these changes speed up the NTRU key generation by about a
factor of 1.5. And they remove lots of complicated code as well, so
everybody wins.
2022-04-21 08:13:15 +01:00
Simon Tatham
faf1601a55 Implement OpenSSH 9.x's NTRU Prime / Curve25519 kex.
This consists of DJB's 'Streamlined NTRU Prime' quantum-resistant
cryptosystem, currently in round 3 of the NIST post-quantum key
exchange competition; it's run in parallel with ordinary Curve25519,
and generates a shared secret combining the output of both systems.

(Hence, even if you don't trust this newfangled NTRU Prime thing at
all, it's at least no _less_ secure than the kex you were using
already.)

As the OpenSSH developers point out, key exchange is the most urgent
thing to make quantum-resistant, even before working quantum computers
big enough to break crypto become available, because a break of the
kex algorithm can be applied retroactively to recordings of your past
sessions. By contrast, authentication is a real-time protocol, and can
only be broken by a quantum computer if there's one available to
attack you _already_.

I've implemented both sides of the mechanism, so that PuTTY and Uppity
both support it. In my initial testing, the two sides can both
interoperate with the appropriate half of OpenSSH, and also (of
course, but it would be embarrassing to mess it up) with each other.
2022-04-15 17:46:06 +01:00
Simon Tatham
e59ee96554 Refactor ecdh_kex into an organised vtable.
This is already slightly nice because it lets me separate the
Weierstrass and Montgomery code more completely, without having to
have a vtable tucked into dh->extra. But more to the point, it will
allow completely different kex methods to fit into the same framework
later.

To that end, I've moved more of the descriptive message generation
into the vtable, and also provided the constructor with a flag that
will let it do different things in client and server.

Also, following on from a previous commit, I've arranged that the new
API returns arbitrary binary data for the exchange hash, rather than
an mp_int. An upcoming implementation of this interface will want to
return an encoded string instead of an encoded mp_int.
2022-04-15 17:46:06 +01:00
Simon Tatham
422a89e208 Use C99 named initialisers in all ssh_kex instances.
No functional change, but this will allow me to add more fields to
that structure without breaking the existing initialisers.
2022-04-15 17:46:06 +01:00
Simon Tatham
d5af33da53 Utility function mp_resize.
This reallocs an existing mp_int to have a different physical size,
e.g. to make sure there's enough space at the top of it.

Trivial, but I'm a little surprised I haven't needed it until now!
2022-04-15 17:46:06 +01:00
Simon Tatham
5935c68288 Update source file names in comments and docs.
Correcting a source file name in the docs just now reminded me that
I've seen a lot of outdated source file names elsewhere in the code,
due to all the reorganisation since we moved to cmake. Here's a giant
pass of trying to make them all accurate again.
2022-01-22 15:51:31 +00:00
Simon Tatham
4ecb40a60d Fix a batch of typos in comments and docs. 2022-01-03 06:40:51 +00:00
Simon Tatham
831accb2a9 Expose openssh_bcrypt() to testcrypt, and test it.
I happened to notice in passing that this function doesn't have any
tests (although it will have been at least somewhat tested by the
cmdgen interop test system).

This involved writing a wrapper that passes the passphrase and salt as
ptrlens, and I decided it made more sense to make the same change to
the original function too and adjust the call sites appropriately.

I derived a test case by getting OpenSSH itself to make an encrypted
key file, and then using the inputs and output from the password hash
operation that decrypted it again.
2021-12-24 10:13:28 +00:00
Simon Tatham
cd60a602f5 Stop using short exponents for Diffie-Hellman.
I recently encountered a paper [1] which catalogues all kinds of
things that can go wrong when one party in a discrete-log system
invents a prime and the other party chooses an exponent. In
particular, some choices of prime make it reasonable to use a short
exponent to save time, but others make that strategy very bad.

That paper is about the ElGamal encryption scheme used in OpenPGP,
which is basically integer Diffie-Hellman with one side's key being
persistent: a shared-secret integer is derived exactly as in DH, and
then it's used to communicate a message integer by simply multiplying
the shared secret by the message, mod p.

I don't _know_ that any problem of this kind arises in the SSH usage
of Diffie-Hellman: the standard integer DH groups in SSH are safe
primes, and as far as I know, the usual generation of prime moduli for
DH group exchange also picks safe primes. So the short exponents PuTTY
has been using _should_ be OK.

However, the range of imaginative other possibilities shown in that
paper make me nervous, even so! So I think I'm going to retire the
short exponent strategy, on general principles of overcaution.

This slows down 4096-bit integer DH by about a factor of 3-4 (which
would be worse if it weren't for the modpow speedup in the previous
commit). I think that's OK, because, firstly, computers are a lot
faster these days than when I originally chose to use short exponents,
and secondly, more and more implementations are now switching to
elliptic-curve DH, which is unaffected by this change (and with which
we've always been using maximum-length exponents).

[1] On the (in)security of ElGamal in OpenPGP. Luca De Feo, Bertram
Poettering, Alessandro Sorniotti. https://eprint.iacr.org/2021/923
2021-11-28 12:19:34 +00:00
Simon Tatham
46fbe375bf Switch to a fixed-window strategy for monty_pow.
Instead of the basic square-and-multiply strategy which requires a
square and a multiply per exponent bit (i.e. two modular
multiplications per bit in total), we instead reduce to a square per
exponent bit and an extra multiply only every 5 bits, because the
value we're multiplying in is derived from 5 of the exponent bits at
once via a table lookup.

To avoid the obvious side-channel leakage of a literal table lookup,
we read the whole table every time, mp_selecting the right value into
the multiplication input. This isn't as slow as it sounds when the
alternative is four entire modular multiplications! In my testing,
this commit speeds up large modpows by a factor of just over 1.5, and
it still gets a clean pass from 'testsc'.
2021-11-28 12:19:34 +00:00
Simon Tatham
a434b13050 Pass diffiehellman ssh_kex objects to testcrypt.
This slightly simplifies the lookup function get_dh_group(), but
mostly, the point is to make it more similar to the other lookup
functions, because I'm planning to have those autogenerated.
2021-11-22 18:32:17 +00:00
Simon Tatham
42120dd1c5 testcrypt: adjust some function parameter names.
Now those names appear in help files, I thought it was worth giving
them a read-through and spotting any really obviously confusing or
wrong ones. Quite a few make more sense in the original context of C
than in the derived Python (e.g. 'BinarySink *bs' as a place to write
output to makes sense, but the output 'val_string bs' is less
helpful).

A couple were so confusing that I also corrected them in the original
C, notably the misuse of 'wc' for the elliptic curve point input to
ecc_weierstrass_point_copy. ('wc' in that section of the code is
normally a parameter describing a whole curve.)
2021-11-21 22:19:01 +00:00
Simon Tatham
be8d3974ff Generalise strbuf_catf() into put_fmt().
marshal.h now provides a macro put_fmt() which allows you to write
arbitrary printf-formatted data to an arbitrary BinarySink.

We already had this facility for strbufs in particular, in the form of
strbuf_catf(). That was able to take advantage of knowing the inner
structure of a strbuf to minimise memory allocation (it would snprintf
directly into the strbuf's existing buffer if possible). For a general
black-box BinarySink we can't do that, so instead we dupvprintf into a
temporary buffer.

For consistency, I've removed strbuf_catf, and converted all uses of
it into the new put_fmt - and I've also added an extra vtable method
in the BinarySink API, so that put_fmt can still use strbuf_catf's
more efficient memory management when talking to a strbuf, and fall
back to the simpler strategy when that's not available.
2021-11-19 11:32:47 +00:00
Simon Tatham
5eee8ca648 Compatibility with older versions of cmake.
After this change, the cmake setup now works even on Debian stretch
(oldoldstable), which runs cmake 3.7.

In order to support a version that early I had to:

 - write a fallback implementation of 'add_compile_definitions' for
   older cmakes, which is easy, because add_compile_definitions(FOO)
   is basically just add_compile_options(-DFOO)

 - stop using list(TRANSFORM) and string(JOIN), of which I had one
   case each, and they were easily replaced with simple foreach loops

 - stop putting OBJECT libraries in the target_link_libraries command
   for executable targets, in favour of adding $<TARGET_OBJECTS:foo>
   to the main sources list for the same target. That matches what I
   do with library targets, so it's probably more sensible anyway.

I tried going back by another Debian release and getting this cmake
setup to work on jessie, but that runs CMake 3.0.1, and in _that_
version of cmake the target_sources command is missing, and I didn't
find any alternative way to add extra sources to a target after having
first declared it. Reorganising to cope with _that_ omission would be
too much upheaval without a very good reason.
2021-10-29 18:08:18 +01:00
Simon Tatham
5bb869dd22 Add references for the Diffie-Hellman fixed primes.
I ran across their defining RFCs recently and noticed that each one
provides an explicit mathematical expression for the prime (since each
one is derived from the expansion of pi, with framing FFs and a
correction term to make it actually prime).

Those expressions can be re-evaluated trivially by spigot, so it seems
reasonable to add those spigot commands in comments. This also means
the comments contain citations for these primes in actual standards,
including both the hex digits and the mathematical expressions.
2021-08-28 10:25:09 +01:00
Simon Tatham
59409d0947 Make mp_unsafe_mod_integer not be unsafe.
I've moved it from mpunsafe.c into the main mpint.c, and renamed it
mp_mod_known_integer, because now it manages to avoid leaking
information about the mp_int you give it.

It can still potentially leak information about the small _modulus_
integer - hence the word 'known' in the new function name. This won't
be a problem in any existing use of the function, because it's used
during prime generation to check divisibility by all the small primes,
and optionally also check for residue 1 mod the RSA public exponent.
But all those values are well known and not secret.

This removes one source of side-channel leakage from prime generation.
2021-08-27 17:43:40 +01:00
Simon Tatham
1c039d0a7b Spelling: standardise on "DSA", not "DSS".
This code base has always been a bit confused about which spelling it
likes to use to refer to that signature algorithm. The SSH protocol id
is "ssh-dss". But everyone I know refers to it as the Digital
Signature _Algorithm_, not the Digital Signature _Standard_.

When I moved everything down into the crypto subdir, I took the
opportunity to rename sshdss.c to dsa.c. Now I'm doing the rest of the
job: all internal identifiers and code comments refer to DSA, and the
spelling "dss" only survives in externally visible identifiers that
have to remain constant.

(Such identifiers include the SSH protocol id, and also the string id
used to identify the key type in PuTTY's own host key cache. We can't
change the latter without causing everyone a backwards-compatibility
headache, and if we _did_ ever decide to do that, we'd surely want to
do a much more thorough job of making the cache format more sensible!)
2021-04-22 18:34:47 +01:00
Simon Tatham
83fa43497f Move the SSH implementation into its own subdirectory.
This clears up another large pile of clutter at the top level, and in
the process, allows me to rename source files to things that don't all
have that annoying 'ssh' prefix at the top.
2021-04-22 18:09:13 +01:00
Simon Tatham
66e62915d2 Move a few stray header files into the crypto subdir.
sshblowf.h (as was) is 100% internal to that directory. And mpint_i.h
and ecc.h are specialist enough that it's reasonable to ask clients
outside the crypto directory to include them with a subdirectory path,
to hint that it's an unusual thing to be doing.
2021-04-22 17:57:56 +01:00
Simon Tatham
970f374ea6 Fix the SHA-NI cmake-time check.
When preparing commit fca13a17b1, I redesigned the cmake test
function at the last minute, and apparently didn't quite get all the
call sites correctly rewritten. This one still omitted some of the
argument-type keywords, and had an obsolete parameter giving an
explicit name for a sub-library, which I later decided wasn't needed.
2021-04-22 12:42:32 +01:00
Simon Tatham
fca13a17b1 Break up crypto modules containing HW acceleration.
This applies to all of AES, SHA-1, SHA-256 and SHA-512. All those
source files previously contained multiple implementations of the
algorithm, enabled or disabled by ifdefs detecting whether they would
work on a given compiler. And in order to get advanced machine
instructions like AES-NI or NEON crypto into the output file when the
compile flags hadn't enabled them, we had to do nasty stuff with
compiler-specific pragmas or attributes.

Now we can do the detection at cmake time, and enable advanced
instructions in the more sensible way, by compile-time flags. So I've
broken up each of these modules into lots of sub-pieces: a file called
(e.g.) 'foo-common.c' containing common definitions across all
implementations (such as round constants), one called 'foo-select.c'
containing the top-level vtable(s), and a separate file for each
implementation exporting just the vtable(s) for that implementation.

One advantage of this is that it depends a lot less on compiler-
specific bodgery. My particular least favourite part of the previous
setup was the part where I had to _manually_ define some Arm ACLE
feature macros before including <arm_neon.h>, so that it would define
the intrinsics I wanted. Now I'm enabling interesting architecture
features in the normal way, on the compiler command line, there's no
need for that kind of trick: the right feature macros are already
defined and <arm_neon.h> does the right thing.

Another change in this reorganisation is that I've stopped assuming
there's just one hardware implementation per platform. Previously, the
accelerated vtables were called things like sha256_hw, and varied
between FOO-NI and NEON depending on platform; and the selection code
would simply ask 'is hw available? if so, use hw, else sw'. Now, each
HW acceleration strategy names its vtable its own way, and the
selection vtable has a whole list of possibilities to iterate over
looking for a supported one. So if someone feels like writing a second
accelerated implementation of something for a given platform - for
example, I've heard you can use plain NEON to speed up AES somewhat
even without the crypto extension - then it will now have somewhere to
drop in alongside the existing ones.
2021-04-21 21:55:26 +01:00
Simon Tatham
5b30e6f7a6 Move crypto into its own subdirectory.
Similarly to 'utils', I've moved all the stuff in the crypto
build-time library into a source directory of its own, and while I'm
at it, split up the monolithic sshauxcrypt.c into its various
unrelated parts.

This is also an opportunity to remove the annoying 'ssh' prefix from
the front of the file names, and give several of them less cryptic
names.
2021-04-21 21:55:26 +01:00