draft-ietf-sshm-ntruprime-ssh-00 asserts that it's identical to the
@openssh.com version we already implement:
'[sntrup761x25519-sha512@openssh.com] became the default key exchange
algorithm in OpenSSH during 2022. That is identical to the
"sntrup761x25519-sha512" mechanism described in this document.'
In the key generation step where we invert 3f in the field
Z_q/<x^p-x-1>, I was carefully checking for failure, on the grounds
that even a field does have _one_ non-invertible element, namely zero.
But I forgot that we'd generated f in such a way that it can't
possibly be zero. So that failure check is pointless.
(However, I've retained it in the form of an assertion.)
My bulk indentation check also turned up a lot of cases where a run-on
function call or if statement didn't have its later lines aligned
correctly relative to the open paren.
I think this is quite easy to do by getting things out of
sync (editing the first line of the function call and forgetting to
update the rest, perhaps even because you never _saw_ the rest during
a search-replace). But a few didn't quite fit into that pattern, in
particular an outright misleading case in unix/askpass.c where the
second line of a call was aligned neatly below the _wrong_ one of the
open parens on the opening line.
Restored as many alignments as I could easily find.
The polynomial Stein's algorithm in that code was adapted from the
binary Stein in mpint.c. One of the comments which originally said
'dividing by 2' should have been updated to say 'dividing by x' in the
polynomial case, and didn't.
This function has to make an array containing a specific number of
random values that are +1 or -1, and all the rest zero. The simplest
way I could think of to do it at first was to make the array with all
the zeroes at the end and then shuffle the array.
But I couldn't think of a time-safe algorithm to shuffle an array in
such a way that all orders come out equiprobable, that was better than
quadratic time. In fact I still can't think of one. (Making a random
Benes network is the best idea I've come up with: it arranges that
every output order is _possible_, and runs in O(N log N) time, but it
skews the probabilities, which makes it unacceptable.)
However, there's no need to shuffle an array in this application
anyway: we're not actually trying to generate a random _permutation_,
only a random element of (n choose w). So we can just walk linearly
along the array remembering how many nonzero elements we have yet to
output, and using an appropriately chosen random number at each step
to decide whether this will be one of them.
This isn't a significant improvement in the performance of NTRU
overall, but it satisfies my sense of rightness a little, and at least
means I don't have to have a comment in the code apologising for the
terrible algorithm any more.
I wasn't really satisfied with the previous version, but it was
easiest to get Stein's algorithm working on polynomials by doing it
exactly how I already knew to do it for integers. But now I've
improved it in two ways.
The first improvement I got from another implementation: instead of
transforming A into A - kB for some k that makes the constant term
zero, you can scale _both_ inputs, replacing A with mA - kB for some
k,m. The advantage is that you can calculate m and k very easily, by
making each one the constant term of the other polynomial, which means
you don't need to invert something mod q in every step. (Rather like
the projective-coordinates optimisations in elliptic curves, where
instead of inverting in every step you accumulate the product of all
the factors that need to be inverted, and invert the whole product
once at the very end.)
The second improvement is to abandon my cumbersome unwinding loop that
builds up the output coefficients by reversing the steps in the
original gcd-finding loop. Instead, I do the thing you do in normal
Euclid's algorithm: keep track of the coefficients as you go through
the original loop. I had wanted to do this before, but hadn't figured
out how you could deal with dividing a coefficient by x when (unlike
the associated real value) the coefficient isn't a multiple of x. But
the answer is very simple: x is invertible in the ring we're working
in (its inverse mod x^p-x-1 is just x^{p-1}-1), so you _can_ just
divide your coefficient by x, and moreover, very easily!
Together, these changes speed up the NTRU key generation by about a
factor of 1.5. And they remove lots of complicated code as well, so
everybody wins.
This consists of DJB's 'Streamlined NTRU Prime' quantum-resistant
cryptosystem, currently in round 3 of the NIST post-quantum key
exchange competition; it's run in parallel with ordinary Curve25519,
and generates a shared secret combining the output of both systems.
(Hence, even if you don't trust this newfangled NTRU Prime thing at
all, it's at least no _less_ secure than the kex you were using
already.)
As the OpenSSH developers point out, key exchange is the most urgent
thing to make quantum-resistant, even before working quantum computers
big enough to break crypto become available, because a break of the
kex algorithm can be applied retroactively to recordings of your past
sessions. By contrast, authentication is a real-time protocol, and can
only be broken by a quantum computer if there's one available to
attack you _already_.
I've implemented both sides of the mechanism, so that PuTTY and Uppity
both support it. In my initial testing, the two sides can both
interoperate with the appropriate half of OpenSSH, and also (of
course, but it would be embarrassing to mess it up) with each other.