HYDRA: A Flexible PQC Processor
Chen-Mou Cheng
National Taiwan University
November 16, 2012

Acknowledgment
• Joint work with Bo-Yin Yang (Academia Sinica) and Andy Wu

Post-quantum cryptography
• Hash-based cryptography
• Code-based cryptography
• Lattice-based cryptography
• Multivariate cryptography

Multivariate cryptography
• Composition of maps
• Public quadratic polynomials
• F1 and Fk are affine (y = Ax + b)

Step 2. Encryption    p ――――→ E ――――→ c
                           easy↑ ↓hard
Step 1. Generation    p → F1 → F2 … → Fk → c
                        ↓easy  ↓easy    easy↓
Step 3. Decryption    p ← D1 ← D2 … ← Dk ← c

Classification of multivariates
• Big-field multivariates
  – Matsumoto-Imai derivatives
  – SFLASH, HFE
• Small-field (or true) multivariates
  – Unbalanced Oil-and-Vinegar derivatives
  – Rainbow, TTS

Security of UOV
• MQ: Multivariate quadratics direct attacks
  – Gröbner bases: XL, F4/F5 families
• EIP: Extended Isomorphism of Polynomials, a.k.a. rank or linear algebra attacks
  – Low rank attack
  – High rank attack
  – Reconciliation attack
  – …

The HYDRA processor
• A scalable, programmable crypto coprocessor
• Accompanying toolchains and software libraries
• API to raise abstraction level for developing security applications
• Allowing aggressive experimentation with PKC, especially PQC

Slogans
• Cheap PKC
  – Hardware acceleration of core computation
  – Customizable for multiple vertical markets, allowing cost sharing
• Future-proof PKC
  – Algorithm agility, allowing “BIOS upgrades”
  – PQC to resist emerging quantum-computers’ attacks
• Management-free PKC
  – Lower total cost of ownership via PKC
  – Identity-based crypto ⇒ No more PKI!
• “If we build them [cheaply], they will come”

Target cryptosystems
Columns: Scheme | Low Security ($2^{80}$) | High Security ($2^{112}$, $2^{128}$)
• ECC: NIST 2K160; NIST 2K233 (112bit); NIST P192; NIST P256, Curve25519; GLS1271; Surface1271 (HEC)
• Pairings: BN (Barreto-Naehrig) 161; BN256; LD (Lopez-Dahab) $2^{271}$; LD $2^{1223}$, Beuchat $3^{509}$
• NTRU: ees251ep7; ees347ep2 (112bit) (q=2 instead of q=3); ees397ep1 (128bit)
• MQPKC: Rainbow (q=16 or 31; 24,20,20); Rainbow (q; 32, 32, 32); TTS (q=16 or 31; 24,20,20); 3HFE(731)-p; 3HFE(747)-p

ASIC prototyping of NTRU

ASIC prototyping of TTS

ASIC prototyping of Fp multiplications

The Hydra microarchitecture
[Block diagram: D$, Axpy engine, Decoder, μC, DMA, Memory bus, I$]

Design ingredients
• Axpy-style ISA for regular data movement between cache & datapath, i.e., $Y \leftarrow a \cdot X + Y$, where |a| = w, |X| = lw, |Y| = lw or (l + 1)w
• Wide & flexible vector datapath
• DMA engine to (pre-)fetch and store data to fill up vector datapath as much as possible
• General-purpose μC for complex I/O

Review: NTRU cryptosystem
• Core operation: Multiplication in $\mathbb{Z}[x]/(x^n - 1)$
• Key generation
  – Randomly choose $f$ and $g$ with small coefficients
  – Find $f_p$, $f_q$ such that $f_p f = 1 \bmod p$ and $f_q f = 1 \bmod q$
  – Public key: $h = p f_q g$
  – Private key: $f$, $f_p$
• Encryption
  – Randomly generate $r$ with coefficients in [-1,1]
  – $c = rh + m$
• Decryption
  – $a = fc$, with coefficients in [-q/2, q/2]
  – $m = a f_p$, with coefficients in [-p/2, p/2]

Multiplications in NTRU
(cyclic product in $\mathbb{Z}[x]/(x^5 - 1)$; each row is a rotation of a scaled by one coefficient of b)

          a4    a3    a2    a1    a0
  ×       b4    b3    b2    b1    b0
  ------------------------------------
        a4b0  a3b0  a2b0  a1b0  a0b0
        a3b1  a2b1  a1b1  a0b1  a4b1
        a2b2  a1b2  a0b2  a4b2  a3b2
        a1b3  a0b3  a4b3  a3b3  a2b3
  +     a0b4  a4b4  a3b4  a2b4  a1b4
  ------------------------------------
          c4    c3    c2    c1    c0

NTRU ees397ep1
• p=2, q=307, n=397
• Message $m$: 397 bits
• Ciphertext $c$: $(\mathbb{Z}_{307})^{397}$, ~397×9 bits
• Public key $h$: $\mathbb{Z}_{307}[x]/(x^{397} - 1)$, ~397×9 bits
• Private key $f$: $\mathbb{Z}_{307}[x]/(x^{397} - 1)$, ~397×9 bits
  – Contains 74 nonzero elements
• Private key $f_p$: $\mathbb{Z}_2[x]/(x^{397} - 1)$, = 397×1 bits

Review: TTS cryptosystem
• Message $z$: $\mathrm{GF}(31)^{40}$, ~200 bits
• Signature $w$: $\mathrm{GF}(31)^{64}$, ~320 bits
• Public key $P$: $\mathrm{GF}(31)^{40 \times 2080}$, ~416 Kbits
  – Bottleneck: Quadratic polynomial evaluation
• Private key: ~44244 bits
  – Bottleneck: Linear maps and system solving

Review: Elliptic curve pairing
• Core operations are finite-field arithmetic
• Bottleneck for prime fields: Modular multiplication
• Euclid’s division: $y = qn + r$, $0 \le r < n$
• Hensel’s division: $y + qn = p^k r$, $0 \le r < 2n$, $p$ prime
• Montgomery method
  – $x \mapsto p^k x \bmod n$: ring homomorphism if $(p, n) = 1$
  – Precompute $p'$, $n'$ such that $p^k p' - n n' = 1$
  – $q \leftarrow (y \bmod p^k)\, n'$
  – $q' \leftarrow (q \bmod p^k)\, n$
  – $r \leftarrow (y + q')/p^k$

Montgomery method: More details
• Problem: Given A, B, M, compute $AB \bmod M$
• Idea: Work in an isomorphic ring
  – $A \mapsto AR \bmod M$ and $B \mapsto BR \bmod M$
  – Need a way to compute $ABR \bmod M$
• Solution: the Montgomery product $(x, y) \mapsto (xy)/R \bmod M$
  – $T \leftarrow (AR \bmod M)(BR \bmod M)$
  – Can add any multiple of M, since the result is taken mod M
    • $T + xM = 0 \bmod R$, therefore $x = -M^{-1} T \bmod R$
  – $(AR, BR) \mapsto (T + (-M^{-1} T \bmod R)\, M)/R = ABR \bmod M$

Multi-precision Montgomery
• $X = (x_{n-1}\, x_{n-2} \dots x_0)$, $x_i \in \{0, \dots, 2^w - 1\}$
• $S \leftarrow 0$
• for i in 0 .. n – 1
  – $q_i \leftarrow (s_0 + a_i b_0)(-M^{-1}) \bmod 2^w$
  – $S \leftarrow (S + a_i B + q_i M)/2^w$
  – [loop invariant: $S \in \{0, \dots, M + B - 1\}$]
• [post condition: $2^{nw} S = AB + QM$]

The main Hydra ISA
• Recall: $Y \leftarrow a \cdot X + Y$
  – |a| = w, |X| = lw, |Y| = lw or (l + 1)w
• Type i (for pairing)
  – $a \in \{0, \dots, 2^w - 1\}$, $X \in \{0, \dots, 2^{lw} - 1\}$, $Y \in \{0, \dots, 2^{(l+1)w} - 1\}$
  – •,+: the usual integer multiplication and addition
• Type q (for TTS)
  – $a \in \mathbb{F}_q$, $X \in \mathbb{F}_q^l$, $Y \in \mathbb{F}_q^l$, and $q \le 2^w$
  – •,+: scalar multiplication and vector addition in l-dimensional vector spaces over $\mathbb{F}_q$

Type r Axpy instructions
• $X \in \mathbb{Z}_q^l$, $Y \in \mathbb{Z}_q^l$ such that $q \le 2^w$
• $a \in \mathbb{Z}_p^h$ such that h[lg p] ≤ 2w
[Figure: the cyclic multiplication diagram from “Multiplications in NTRU”, repeated]

Next steps
• Prototype implementation
  – Bulk of the work goes here
• SystemC-based ISA simulator
• Compiler construction
  – Possibly based on LLVM

Thank you!
• Questions or comments?
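
Supplement to the “Multi-precision Montgomery” slide: the word-serial loop on that slide can be sketched in software as follows. This is only a minimal model under stated assumptions, not the Hydra datapath: Python integers stand in for multi-word registers, the names `montgomery_multiply` and `m_neg_inv` are hypothetical, and the quotient digit is read as $q_i = (s_0 + a_i b_0)(-M^{-1}) \bmod 2^w$. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
def montgomery_multiply(A, B, M, w, n):
    """Return S with 2^(n*w) * S congruent to A*B modulo M.

    A is consumed as n words of w bits each; M must be odd.
    The result satisfies 0 <= S < M + B (the slide's loop invariant),
    so callers may still need one conditional subtraction of M.
    """
    if M % 2 == 0:
        raise ValueError("M must be odd so that M is invertible mod 2^w")
    mask = (1 << w) - 1
    # -M^{-1} mod 2^w, precomputed once per modulus (assumed helper name)
    m_neg_inv = (-pow(M, -1, 1 << w)) & mask
    b0 = B & mask
    S = 0
    for i in range(n):
        a_i = (A >> (i * w)) & mask                        # i-th word of A
        q_i = (((S & mask) + a_i * b0) * m_neg_inv) & mask  # quotient digit
        # S + a_i*B + q_i*M is divisible by 2^w by choice of q_i
        S = (S + a_i * B + q_i * M) >> w
    return S
```

Each iteration is two multiply-accumulate steps of the form Y ← a•X + Y (once with X = B, once with X = M), which is presumably why the Type i instruction takes a w-bit scalar and an lw-bit vector operand.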
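
Supplement to the “Multiplications in NTRU” and “Type r Axpy instructions” slides: the diagram there suggests that a product in $\mathbb{Z}_q[x]/(x^n - 1)$ decomposes into one axpy per coefficient of b, each applied to a rotated copy of a. The sketch below illustrates that reading. The function names `axpy_r` and `cyclic_multiply` are hypothetical, and using a single scalar per instruction is a simplification (the slide allows a to pack h elements of $\mathbb{Z}_p$).

```python
def axpy_r(a, X, Y, q):
    """Y <- a*X + Y, coefficient-wise in Z_q (simplified model of a Type r axpy)."""
    return [(a * x + y) % q for x, y in zip(X, Y)]


def cyclic_multiply(A, B, q):
    """Multiply A and B in Z_q[x]/(x^n - 1).

    A and B are coefficient lists of equal length n, index = degree.
    """
    n = len(A)
    if len(B) != n:
        raise ValueError("operands must have the same length")
    C = [0] * n
    for j, b_j in enumerate(B):
        if b_j == 0:
            continue  # a sparse operand skips whole rows of the diagram
        # x^j * A mod (x^n - 1) is A rotated by j positions
        rotated = A[n - j:] + A[:n - j]
        C = axpy_r(b_j, rotated, C, q)
    return C
```

With a sparse operand such as the ees397ep1 private key (74 nonzero coefficients out of 397), most rows are skipped, so the number of axpy steps follows the weight of the key rather than n.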
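
Supplement to the “Review: TTS cryptosystem” slide: the public-key bottleneck, quadratic polynomial evaluation over GF(31), can also be phrased as a sequence of Type q axpy steps. The sketch assumes the public key is stored as 2080 columns of 40 coefficients, one column per monomial $w_i w_j$ with $i \le j$ (64·65/2 = 2080), and ignores linear and constant terms. The column ordering and the names `axpy_q` and `evaluate_public_map` are illustrative assumptions, not the actual TTS key format.

```python
from itertools import combinations_with_replacement

Q = 31  # field size from the TTS slide


def axpy_q(a, X, Y, q=Q):
    """Y <- a*X + Y in the vector space F_q^l (model of a Type q axpy)."""
    return [(a * x + y) % q for x, y in zip(X, Y)]


def evaluate_public_map(columns, w, q=Q):
    """Evaluate a homogeneous quadratic map at the point w.

    columns[k] holds the coefficients of the k-th monomial w_i*w_j (i <= j,
    in lexicographic order) across all output polynomials.
    """
    monomials = list(combinations_with_replacement(range(len(w)), 2))
    if len(monomials) != len(columns):
        raise ValueError("need one column per quadratic monomial")
    Y = [0] * len(columns[0])
    for (i, j), column in zip(monomials, columns):
        scalar = (w[i] * w[j]) % q       # the monomial value is the axpy scalar
        Y = axpy_q(scalar, column, Y, q)
    return Y
```

For the parameters on the slide this is 2080 axpy steps on 40-element vectors per signature verification, a regular access pattern of the kind the DMA engine and wide datapath are described as targeting.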