Current software
ntruprime-20200930.sage
:
Reference implementation in Sage (Python plus math libraries) for
sntrup653
, sntrup761
(and sntrup4591761
), sntrup857
,
sntrup953
, sntrup1013
, sntrup1277
,
ntrulpr653
, ntrulpr761
(and ntrulpr4591761
), ntrulpr857
,
ntrulpr953
, ntrulpr1013
, and ntrulpr1277
.
Reference C software:
- See the
crypto_kem/sntrup{653,761,857,953,1013,1277}/ref
andcrypto_kem/ntrulpr{653,761,857,953,1013,1277}/ref
directories in the SUPERCOP benchmarking toolkit.
Optimized software:
- See the
crypto_kem/sntrup{653,761,857,953,1013,1277}/factored
andcrypto_kem/ntrulpr{653,761,857,953,1013,1277}/factored
directories in the SUPERCOP benchmarking toolkit.
Hardware implementation:
- A constant-time high-speed and a constant-time low-area hardware implementation in mixed VHDL & Verilog
for
sntrup761
is available at https://github.com/AdrianMarotzke/SNTRUP_on_FPGA. - An arbitrary-order gate-level masked implementation of the decapsulation in mixed
VHDL & Verilog for
sntrup761
is available at https://github.com/AdrianMarotzke/Masked-SNTRUP.
Archives
See also the crypto_kem/sntrup4591761
and crypto_kem/ntrulpr4591761
directories in the
SUPERCOP benchmarking toolkit.
ntruprime.sage
:
2019 reference implementation and some tests for
sntrup653
, sntrup761
(and sntrup4591761
), sntrup857
,
ntrulpr653
, ntrulpr761
(and ntrulpr4591761
), and ntrulpr857
.
sntrup4591761.sage
:
Original reference implementation
of sntrup4591761
.
ntrulpr4591761.sage
:
Original reference implementation
of ntrulpr4591761
.
ntruprime-haswell-20190712.tar.gz
:
NTRU Prime Haswell assembly-language code package.
ntruprime-20171206.tar.gz
:
Portable C reference implementations
sntrup4591761/ref
and ntrulpr4591761/ref
;
and Haswell-optimized implementations
sntrup4591761/avx
and ntrulpr4591761/avx
.
The avx
implementations include
a very fast general-purpose integer-sorting library,
avx/int32_sort.c
,
of independent interest.
ntruprime-20170815.tar.gz
.
Changes after this version:
Added NTRU LPRime.
NIST KAT compatibility (with -DKAT
).
KAT intermediate-value output (with -DKAT
).
Namespacing improvements.
Removed unused variables.
An older constant-time hardware implementation in VHDL
for sntrup653
, sntrup761
and sntrup857
is available at https://github.com/AdrianMarotzke/SNTRUP.
C software external interface
The C implementations support the
eBACS KEM interface.
This means that they provide three functions to callers,
as shown below.
All occurrences of
crypto_kem
below are actually
crypto_kem_sntrup761
for sntrup761
,
crypto_kem_ntrulpr761
for ntrulpr761
,
etc.,
or a literal crypto_kem
in NaCl-style
cryptographic libraries.
#include "crypto_kem.h"
This is not a source file; it is created automatically by SUPERCOP.
unsigned char pk[crypto_kem_PUBLICKEYBYTES];
unsigned char sk[crypto_kem_SECRETKEYBYTES];
crypto_kem_keypair(pk,sk);
Generates a random public key pk[0],...,pk[PUBLICKEYBYTES-1]
and corresponding secret key sk[0],...,sk[SECRETKEYBYTES-1]
.
Always returns 0
.
unsigned char c[crypto_kem_CIPHERTEXTBYTES];
unsigned char k[crypto_kem_BYTES];
const unsigned char pk[crypto_kem_PUBLICKEYBYTES];
crypto_kem_enc(c,k,pk);
Generates a random session key k[0],...,k[BYTES-1]
and corresponding ciphertext c[0],...,c[CIPHERTEXTBYTES-1]
given a public key pk[0],...,pk[PUBLICKEYBYTES-1]
.
Always returns 0
.
unsigned char k[crypto_kem_BYTES];
const unsigned char c[crypto_kem_CIPHERTEXTBYTES];
const unsigned char sk[crypto_kem_SECRETKEYBYTES];
crypto_kem_dec(k,c,sk);
Recovers a session key k[0],...,k[BYTES-1]
from a ciphertext c[0],...,c[CIPHERTEXTBYTES-1]
using a secret key sk[0],...,sk[SECRETKEYBYTES-1]
.
Returns 0
.
However, if the ciphertext is invalid,
returns -1
, possibly after modifying k[0],...,k[BYTES-1]
.
C software internals for sntrup4591761
and ntrulpr4591761
All of the following details are subject to change.
These function names are not exposed to callers.
Some functions are specific to ref
,
and some are specific to avx
;
some are specific to sntrup4591761
,
and some are specific to ntrulpr4591761
.
Parameters:
All of the software takes q
to be 4591
and takes p
to be 761
.
For sntrup4591761
,
the weight parameter w
is 286;
for ntrulpr4591761
,
w
is 250.
In modular computations
there is a distinction between
"freezing" integers
(reducing them to unique representatives)
and "squeezing" integers
(reducing them to save time and avoid overflow,
but not necessarily freezing).
In this software,
frozen integers modulo 3
are defined as -1
, 0
, or 1
;
frozen integers modulo q
are defined as integers between -2295
and 2295
;
frozen polynomials are defined as linear combinations of
1,x,...,x760
with frozen coefficients.
These representations
are not expected to change
and are not strictly modularized,
but sometimes they are abstracted for clarity.
Polynomials are stored as p_padded
coefficients,
where p_padded
is currently 761
for ref
and 768
for avx
.
When p_padded
is larger than 761
,
the remaining coefficients are stored as 0
.
The current software does not use or provide the p_padded
name,
but this documentation uses it.
Software clarity, software performance, and specification clarity are obviously different metrics, often producing different function-call structures.
#include "hide.h"
unsigned char c[crypto_kem_CIPHERTEXTBYTES];
unsigned char k[crypto_kem_BYTES];
const unsigned char pk[crypto_kem_PUBLICKEYBYTES];
const unsigned char r[32];
hide(c,k,pk,r);
Same as crypto_kem_enc
,
except that
hide
takes 32 explicit bytes of randomness r[0],...,r[31]
as input,
while crypto_kem_enc
takes 32 bytes from randombytes
.
The current software provides this function
only for ntrulpr4591761
.
An analogous function for sntrup4591761
would have a small polynomial r
as input
rather than a 32-byte string.
#include "int32_sort.h"
crypto_int32 x[...]
int xlen;
int32_sort(x,xlen);
Sorts x[0],...,x[xlen-1]
into order.
crypto_int32 x, y;
minmax(&x,&y);
Sorts x
and y
into order.
#include "mod3.h"
small x;
crypto_int32 i;
x = mod3_freeze(i);
Reads i
between -100000
and 100000
.
Freezes modulo 3.
Returns the result.
#include "mod3.h"
small x;
small a, b, c;
x = mod3_minusproduct(a,b,c);
Reads frozen a
, b
, c
modulo 3.
Computes a-bc
.
Freezes modulo 3.
Returns the result.
#include "mod3.h"
int c;
small x;
c = mod3_nonzero_mask(x);
Reads frozen x
modulo 3.
Returns -1
if x
is nonzero modulo 3, 0
otherwise.
#include "mod3.h"
small x;
small a, b, c;
x = mod3_plusproduct(a,b,c);
Reads frozen a
, b
, c
modulo 3.
Computes a+bc
.
Freezes modulo 3.
Returns the result.
#include "mod3.h"
small x;
small a, b;
x = mod3_product(a,b);
Reads frozen a
, b
modulo 3.
Computes ab
.
Freezes modulo 3 (which is a no-op).
Returns the result.
#include "mod3.h"
small x;
small a, b;
x = mod3_quotient(a,b);
Same as
x = mod3_product(a,mod3_reciprocal(b))
.
See mod3_product
and mod3_reciprocal
.
#include "mod3.h"
small x;
small a;
x = mod3_reciprocal(a);
Reads frozen a
modulo 3.
Computes the reciprocal modulo 3,
or 0 if the input is 0 modulo 3.
Freezes modulo 3.
(All of this is a no-op.)
Returns the result.
#include "mod3.h"
small x;
small a, b;
x = mod3_sum(a,b);
Reads frozen a
, b
modulo 3.
Computes a+b
.
Freezes modulo 3.
Returns the result.
#include "modq.h"
modq x;
crypto_int32 i;
x = modq_freeze(i);
Reads i
between -9000000
and 9000000
.
Freezes modulo q
.
Returns the result.
#include "modq.h"
modq x;
crypto_uint32 u;
x = modq_fromuint32(u);
Similar to modq_freeze
but has a wider input range.
Specifically,
reads an integer u
between 0
and 4294967295
;
subtracts 2295;
freezes modulo q
;
returns the result.
#include "modq.h"
modq x;
modq a, b, c;
x = modq_minusproduct(a,b,c);
Reads frozen a
, b
, c
modulo q
.
Computes a-bc
.
Freezes modulo q
.
Returns the result.
#include "modq.h"
int c;
modq x;
c = modq_nonzero_mask(x);
Reads frozen x
modulo q
.
Returns -1
if x
is nonzero modulo q
, 0
otherwise.
#include "modq.h"
modq x;
modq a, b, c;
x = modq_plusproduct(a,b,c);
Reads frozen a
, b
, c
modulo q
.
Computes a+bc
.
Freezes modulo q
.
Returns the result.
#include "modq.h"
modq x;
modq a, b;
x = modq_product(a,b);
Reads frozen a
, b
modulo q
.
Computes ab
.
Freezes modulo q
.
Returns the result.
#include "modq.h"
modq x;
modq a, b;
x = modq_quotient(a,b);
Same as
x = modq_product(a,modq_reciprocal(b))
.
See modq_product
and modq_reciprocal
.
#include "modq.h"
modq x;
modq a;
x = modq_reciprocal(a);
Reads frozen a
modulo q
.
Computes the reciprocal modulo q
,
or 0 if the input is 0 modulo q
.
Freezes modulo q
.
Returns the result.
#include "modq.h"
modq x;
modq a;
x = modq_square(a);
Same as
x = modq_product(a,a)
.
See modq_product
.
#include "modq.h"
modq x;
modq a, b;
x = modq_sum(a,b);
Reads frozen a
, b
modulo q
.
Computes a+b
.
Freezes modulo q
.
Returns the result.
#include "r3.h"
small h[p_padded];
const small f[p_padded];
const small g[p_padded];
r3_mult(h,f,g);
Computes a product in the ring of polynomials
modulo 3 and modulo x761−x−1.
The inputs are polynomials
f[0]
+f[1]
x+...
and
g[0]
+g[1]
x+...,
assumed to be frozen modulo 3.
The output is a polynomial
h[0]
+h[1]
x+...,
frozen modulo 3.
#include "r3.h"
small h[p_padded];
const small g[p_padded];
r3_recip(h,g);
Computes a reciprocal in the ring of polynomials
modulo 3 and modulo x761−x−1;
and returns 0.
The input is a polynomial
g[0]
+g[1]
x+...,
assumed to be frozen modulo 3.
The output is a polynomial
h[0]
+h[1]
x+...,
frozen modulo 3.
If the input is not invertible,
instead returns -1
.
The contents of h[0]
etc. are then undefined.
Speed notes: A separate function to test invertibility would be faster. Namely, check divisibility by the factors of x761−x−1 modulo 3. These factors have degrees 19, 60, and 682, producing divisibility with probabilities below 1/230, 1/295, and 1/21080. Evidently degree 682 can safely be skipped, and skipping degree 60 might also be acceptable.
Testing invertibility before calling the reciprocal function
would slightly increase the average time to generate one key,
but would significantly reduce the maximum observed time.
The following procedure would save much more time
in any application bottlenecked by key generation:
compute a batch of many invertible g
inputs,
using an invertibility test for each input;
then use Montgomery's trick for batch reciprocals.
#include "r3.h"
int c;
const small g[p_padded];
c = r3_weightw_mask(g);
Returns 0
if the input has weight exactly w
,
otherwise -1
.
The input is a polynomial
g[0]
+g[1]
x+...,
assumed to be frozen modulo 3.
#include "rq.h"
modq f[p_padded];
const unsigned char c[rq_encode_len];
rq_decode(f,c);
See rq_encode
.
#include "rq.h"
modq f[p_padded];
const unsigned char c[rq_encoderounded_len];
rq_decoderounded(f,c);
See rq_encoderounded
.
#include "rq.h"
unsigned char c[rq_encode_len];
const modq f[p_padded];
rq_encode(c,f);
See "Encoding of public keys as strings"
in the sntrup4591761
specification.
rq_encode_len
is 1218.
#include "rq.h"
unsigned char c[rq_encoderounded_len];
const modq f[p_padded];
rq_encoderounded(c,f);
See "Encoding of rounded ring elements as strings"
in the sntrup4591761
specification.
rq_encoderounded_len
is 1015;
the name rq_encoderounded_len
is not always provided by the current software.
#include "rq.h"
modq h[p_padded];
const unsigned char K[32];
rq_fromseed(h,K);
See the definition of "Generator"
for ntrulpr4591761
.
#include "rq.h"
small g[p_padded];
const modq h[p_padded];
rq_mod3(g,h);
Handles each coefficient as follows:
multiply by 3;
freeze modulo q
;
freeze modulo 3
.
The input is a polynomial
h[0]
+h[1]
x+...,
assumed to be frozen modulo q
.
The output is a polynomial
g[0]
+g[1]
x+...,
frozen modulo 3
.
Speed note:
The specification of decapsulation
multiplies the incoming c
by 3f
,
freezes modulo q
,
and then freezes modulo 3
.
The software actually
multiplies the incoming c
by f
,
and then uses rq_mod3
.
The cost of multiplying by 3 in rq_mod3
is outweighed by the benefit
of a smaller input to multiplication.
#include "rq.h"
modq h[p_padded];
const modq f[p_padded];
const small g[p_padded];
rq_mult(h,f,g);
Computes a product in the ring of polynomials
modulo q
and modulo x761−x−1.
The inputs are polynomials
f[0]
+f[1]
x+...
and
g[0]
+g[1]
x+...,
assumed to be frozen modulo q
.
Furthermore,
g
is assumed to have each coefficient
-1
, 0
, or 1
.
The output is a polynomial
h[0]
+h[1]
x+...,
frozen modulo q
.
#include "rq.h"
modq h[p_padded];
const modq g[p_padded];
rq_recip3(h,g);
Computes a reciprocal, scaled by a factor 3,
in the ring of polynomials
modulo q
and modulo x761−x−1;
and returns 0.
The input is a polynomial
g[0]
+g[1]
x+...,
assumed to be frozen modulo q
.
The output is a polynomial
h[0]
+h[1]
x+...,
frozen modulo q
.
The reciprocal of the output is 3 times the input.
If the input is not invertible,
instead returns -1
.
The contents of h[0]
etc. are then undefined.
See r3_recip
for speed notes.
An analogous invertibility test for rq_recip3
is even easier:
all of the ring elements that appear
are nonzero by construction,
and each nonzero ring element is invertible.
#include "rq.h"
unsigned char r[32];
const unsigned char c[128];
const modq ab[256];
rq_rightsubbit(r,c,ab);
See the definition of "Right"
for ntrulpr4591761
,
and the sign-bit step
in NTRU LPRime decapsulation.
rq_rightsubbit
first applies Right to c[0],...,c[127]
,
obtaining 256 intermediate coefficients.
It then subtracts the corresponding
coefficients ab[0],...,ab[255]
,
adds 4w
+1 to each coefficient,
freezes each coefficient modulo q
,
extracts the sign bit of each coefficient,
and packs these bits in little-endian form
into r[0],...,r[31]
.
#include "rq.h"
modq h[p_padded];
const modq f[p_padded];
rq_round3(h,f);
Rounds each polynomial coefficient to the nearest multiple of 3.
The input is a polynomial
f[0]
+f[1]
x+...,
assumed to be frozen modulo q
.
The output is a polynomial
h[0]
+h[1]
x+...,
where each h[i]
is f[i]
rounded to the nearest multiple of 3.
This polynomial is frozen modulo q
since q
is 1 modulo 6.
#include "rq.h"
unsigned char c[rq_encoderounded_len];
const modq f[p_padded];
rq_roundencode(c,f);
Composition of
rq_round3
and
rq_encoderounded
.
Speed note:
These are merged to save time in avx
.
#include "rq.h"
unsigned char c[128];
const modq f[256];
const unsigned char r[32];
rq_top(c,f,r);
See the definition of "Top"
for ntrulpr4591761
;
and the definition of Cj
in NTRU LPRime encapsulation.
This function computes
c[0],...,c[127]
as Top applied to 256 input coefficients.
The input coefficients are
the 256 bits of r
in little-endian form,
times (q
-1)/2,
plus f[0],...,f[255]
.
#include "small.h"
small f[p_padded];
const unsigned char c[small_encode_len];
small_decode(f,c);
See small_encode
.
#include "small.h"
unsigned char c[small_encode_len];
const small f[p_padded];
small_encode(c,f);
Encodes a frozen polynomial modulo 3
as follows.
View the polynomial in little-endian form
as 764 coefficients,
where the last 3 coefficients are always 0.
Add 1 to each coefficient,
obtaining an element of {0,1,2}.
Write each batch of 4 elements in radix 4,
obtaining a byte.
This produces 191 bytes
c[0],...,c[190]
;
small_encode_len
is 191.
#include "small.h"
small g[p_padded];
small_random(g);
Generates a uniform random element of the ring of polynomials
modulo 3 and modulo x761−x−1.
The output is a polynomial
g[0]
+g[1]
x+...,
frozen modulo 3.
Actually, the distribution is not precisely uniform.
Each coefficient
(starting with the constant coefficient)
is generated as follows:
call small_random32()
,
extract the bottom 30 bits,
multiply by 3,
shift right by 30,
and subtract 1.
A separate analysis shows that the divergence from uniformity is below 1.000002.
#include "small.h"
small f[p_padded];
small_random_weightw(f);
Generates a uniform random weight-w
element
of the ring of polynomials
modulo 3 and modulo x761−x−1.
Weight w
means that there are
exactly w
nonzero coefficients.
The output is a polynomial
f[0]
+f[1]
x+...,
frozen modulo 3.
Actually, the distribution is not precisely uniform.
This function works as follows.
Generate 761 int32
coefficients
(starting with the constant coefficient)
using small_random32()
,
For each of the first w
coefficients,
clear the bottom bit, obtaining 0 or 2 modulo 4.
For the remaining 761−w
coefficients,
set the bottom bit and clear the next bit,
obtaining 1 modulo 4.
Then sort all 761 coefficients.
Finally, for each coefficient,
extract the bottom 2 bits and subtract 1,
obtaining -1
or 0
or 1
.
A separate analysis shows that the divergence from uniformity is below 1.00027.
#include "small.h"
crypto_int32 i;
i = small_random32();
Generates and returns a uniform random int32
.
More detailed specification for test vectors:
obtains 4 bytes from randombytes
,
interpreted in little-endian form as a uint32
,
interpreted in twos-complement as an int32
.
Even more detailed specification for NIST KATs,
which do not provide chunk-independence:
randombytes
is called in chunks of 3044 bytes.
#include "small.h"
small f[p_padded];
const unsigned char k[32];
small_seeded_weightw(f,k);
Same as small_random_weightw
,
except for two differences.
First,
the initial random coefficients
are obtained as AES-256-CTR output
using key k[0],...,k[31]
rather than as randombytes
output.
Second,
these coefficients are sorted as uint32
rather than as int32
.
Equivalently,
the coefficients are sorted as int32
but first the top bit of each coefficient is flipped.
For ntrulpr4591761
,
small_random_weightw
is actually implemented as small_seeded_weightw
using k[0],...,k[31]
obtained from randombytes
.
int c;
int x, y;
c = smaller_mask(x,y);
Returns -1
if x
is smaller than y
, 0
otherwise.
Assumes that x-y
does not overflow.
#include "swap.h"
unsigned char x[...];
unsigned char y[...];
int bytes;
int c;
swap(x,y,bytes,c);
Conditionally swaps x[0],...,x[bytes-1]
with y[0],...,y[bytes-1]
:
swaps if c
is -1
,
leaves in place if c
is 0
.
Assumes that c
is -1
or 0
.
The swap
prototype uses void
pointers
so it can also be applied to other data types.
small z[...];
int len;
const small x[...];
const small y[...];
const small c;
vectormod3_minusproduct(z,len,x,y,c);
Sets z[0]
to mod3_minusproduct(x[0],y[0],c)
;
sets z[1]
to mod3_minusproduct(x[1],y[1],c)
;
...
sets z[len-1]
to mod3_minusproduct(x[len-1],y[len-1],c)
.
See mod3_minusproduct
.
small z[...];
int len;
const small x[...];
const small c;
vectormod3_product(z,len,x,c);
Sets z[0]
to mod3_product(x[0],c)
;
sets z[1]
to mod3_product(x[1],c)
;
...
sets z[len-1]
to mod3_product(x[len-1],c)
.
See mod3_product
.
small z[...];
int len;
vectormod3_shift(z,len);
Reads 0,z[0],z[1],...,z[len-2]
and stores those in z[0],z[1],...,z[len-1]
.
Assumes that len
is at least 1
.
modq z[...];
int len;
const modq x[...];
const modq y[...];
const modq c;
vectormodq_minusproduct(z,len,x,y,c);
Sets z[0]
to modq_minusproduct(x[0],y[0],c)
;
sets z[1]
to modq_minusproduct(x[1],y[1],c)
;
...
sets z[len-1]
to modq_minusproduct(x[len-1],y[len-1],c)
.
See modq_minusproduct
.
modq z[...];
int len;
const modq x[...];
const modq c;
vectormodq_product(z,len,x,c);
Sets z[0]
to modq_product(x[0],c)
;
sets z[1]
to modq_product(x[1],c)
;
...
sets z[len-1]
to modq_product(x[len-1],c)
.
See modq_product
.
modq z[...];
int len;
vectormodq_shift(z,len);
Reads 0,z[0],z[1],...,z[len-2]
and stores those in z[0],z[1],...,z[len-1]
.
Assumes that len
is at least 1
.
int c;
const unsigned char x[crypto_kem_CIPHERTEXTBYTES];
const unsigned char y[crypto_kem_CIPHERTEXTBYTES];
c = verify(x,y);
Returns 0
if x[0],...,x[CIPHERTEXTBYTES-1]
are the same as y[0],...,y[CIPHERTEXTBYTES-1]
.
Otherwise returns -1
.
Security note:
The standard memcmp
function does not take constant time.
Version: This is version 2023.02.06 of the "Software" web page.