Fast Inversion Architectures over GF(2^233) using pre-computed Exponentiation Matrices
SCD2011 - Nov. 17-18, 2011. Murcia (Spain)

Outline
- Introduction
- Inversion over binary fields GF(2^m)
- Multiplication over GF(2^m)/F(x) fields
- Fast architectures based on pre-computed matrices
- Results and conclusions

Introduction

Elliptic curve cryptography (ECC)
- ECC needs shorter keys than classical methods such as RSA for comparable security (233-bit ECC → 2048-bit RSA)
- ECC standards: GF(p), GF(2^m)
- Binary fields → more suitable for hardware implementation
- Curve point addition requires multiplication and inversion over the finite field

Crypto-processor for ECC
- Hardware acceleration
- FIPS 186-3 → GF(2^m), m = 163, 233, 283, 409, 571
- Example: GF(2^233)/(x^233 + x^74 + 1)
- Issues:
  - Multiplication → degree-232 polynomials
  - Inversion → several multiplication and/or exponentiation operations

Inversion over GF(2^m)

Fermat's Little Theorem
Given $p \in GF(2^m)/F(x)$, the inverse can be calculated using
$p^{-1} = p^{2^m - 2}$
or equivalently
$p^{-1} = \left(p^{2^{m-1} - 1}\right)^2$
In finite fields, $p^2(x) = C \cdot p(x)$, where $C$ is an $m \times m$ matrix.
One possibility (→ m-2 clock cycles):
$p^{2^n - 1} = \prod_{i=0}^{n-1} p^{2^i}$

Itoh-Tsujii Algorithm (ITA)
Defining $\alpha_k(p) = p^{2^k - 1}$ gives
$\alpha_{k+j} = (\alpha_j)^{2^k} \alpha_k = (\alpha_k)^{2^j} \alpha_j$
We have to go from $\alpha_1(p) = p$ to $\alpha_{m-1}(p) = p^{2^{m-1} - 1}$ using an addition chain.
Minimal Brauer addition chain for m-1 = 232:
U1_232 = {1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232}
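As an illustration of the ITA recurrence and the addition chain for 232, here is a minimal software sketch over GF(2^233)/(x^233 + x^74 + 1). It is only a model under stated assumptions: field elements are Python integers (bit i is the coefficient of x^i), the helper names `gf_reduce`, `gf_mul`, `gf_sqr_n` and `itoh_tsujii_inverse` are hypothetical, and repeated squaring in a loop stands in for the hardware C matrices. It does not describe the presented architecture or its cycle counts.

```python
M = 233
MASK = (1 << M) - 1
# Addition chain for m-1 = 232 taken from the slide.
CHAIN = [1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232]


def gf_reduce(r):
    """Reduce r modulo x^233 + x^74 + 1 by folding (x^233 = x^74 + 1)."""
    while r >> M:
        t = r >> M
        r = (r & MASK) ^ t ^ (t << 74)
    return r


def gf_mul(a, b):
    """Shift-and-XOR polynomial product followed by modular reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return gf_reduce(r)


def gf_sqr_n(a, n):
    """Compute a^(2^n) with n successive squarings."""
    for _ in range(n):
        a = gf_mul(a, a)
    return a


def itoh_tsujii_inverse(p):
    """Return p^-1 using alpha_{k+j} = (alpha_k)^(2^j) * alpha_j."""
    p = gf_reduce(p)
    if p == 0:
        raise ZeroDivisionError("0 has no inverse")
    alpha = {1: p}  # alpha[k] = p^(2^k - 1)
    for prev, cur in zip(CHAIN, CHAIN[1:]):
        j = cur - prev  # Brauer chain: j is an earlier chain element
        alpha[cur] = gf_mul(gf_sqr_n(alpha[prev], j), alpha[j])
    # p^-1 = (alpha_232)^2 = p^(2^233 - 2)
    return gf_mul(alpha[232], alpha[232])


if __name__ == "__main__":
    p = (1 << 53) | 0b11  # x^53 + x + 1, an arbitrary test element
    inv = itoh_tsujii_inverse(p)
    print(hex(inv), gf_mul(p, inv) == 1)
```

The chain needs ten multiplications plus the squarings between them; the architectures discussed later differ mainly in how those squaring runs are implemented.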
Inversion over GF(2^233) using generalized ITA (squarer-ITA)
[Table of addition-chain steps and exponentiation block diagram: not recoverable from the extraction]
Row 7 using an 8-C chain requires 3 clock cycles:
- $C^8$: $\alpha_{14} \rightarrow (\alpha_{14})^{2^8}$
- $C^6$: $(\alpha_{14})^{2^8} \rightarrow (\alpha_{14})^{2^{14}}$
- mulpol: $(\alpha_{14})^{2^{14}} \cdot \alpha_{14} = \alpha_{28}$
Final result: $p^{-1} = (\alpha_{232}(p))^2$

Inversion over GF(2^233) using quad-ITA [Rebeiro 2011]
$p^{2^{232} - 1} = p^{4^{116} - 1}$
$\beta_k(p) = p^{4^k - 1}$
$\beta_{k+j} = (\beta_j)^{4^k} \beta_k = (\beta_k)^{4^j} \beta_j$

Multiplication over GF(2^m)/F(x)

General scheme of the multiplier
[Figure: p (m bits) and q (m bits) → polynomial multiplier → r (2m-1 bits) → MR → s (m bits)]

Non-overlapping Karatsuba-Ofman multiplier (NOKOA) [Fan 2010]
- Classical: Area ~ $m^2$
- Karatsuba-Ofman (KOA): Area ~ $m^{\log_2 3}$
- Proposed improvement: Non-overlapping Karatsuba-Ofman (NOKOA)
[Table: Algorithm vs. Delay; entries not recoverable from the extraction]
- KOA and NOKOA are defined recursively
- Low values of m → Classical has better area requirements → Hybrid multipliers

Non-overlapping hybrid Karatsuba-Ofman multiplier (NOKOA)
[Figure: delay; synthesis results for a Virtex-5 device]
- Trade-off value: 16 bits
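To make the hybrid idea concrete, the sketch below shows a recursive Karatsuba-Ofman multiplier over GF(2) that switches to the classical method for operands of 16 bits or fewer. Note the assumptions: this is plain KOA, not the overlap-free variant of [Fan 2010]; the names `clmul_schoolbook`, `clmul_hybrid` and `THRESHOLD` are hypothetical; and the threshold simply mirrors the 16-bit trade-off value quoted above, which was obtained from hardware synthesis and says nothing about software speed.

```python
THRESHOLD = 16  # assumed switch point, mirroring the slide's trade-off value


def clmul_schoolbook(a, b):
    """Classical shift-and-XOR product of two GF(2) polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def clmul_hybrid(a, b, n):
    """Product of two GF(2) polynomials of at most n bits each.

    Uses Karatsuba-Ofman splitting above THRESHOLD bits and the
    classical method at or below it. No modular reduction is applied.
    """
    if n <= THRESHOLD:
        return clmul_schoolbook(a, b)
    h = (n + 1) // 2              # size of the low halves
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = clmul_hybrid(a0, b0, h)
    hi = clmul_hybrid(a1, b1, n - h)
    mid = clmul_hybrid(a0 ^ a1, b0 ^ b1, h)
    # Over GF(2), subtraction is XOR: a0*b1 + a1*b0 = mid + lo + hi
    return lo ^ ((mid ^ lo ^ hi) << h) ^ (hi << (2 * h))


if __name__ == "__main__":
    a = (1 << 232) | 0x1234567890ABCDEF
    b = (1 << 200) | 0xFEDCBA9876543211
    print(clmul_hybrid(a, b, 233) == clmul_schoolbook(a, b))
```

The result has up to 2m-1 bits and would still need the modular reduction stage (MR) to become a field element.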
Modular Reduction (MR)
Modular reduction matrix: $MR = [\, I_m \mid D \,]$, where $I_m$ is the $m \times m$ identity matrix and D is given by:
[Matrix D: not recoverable from the extraction]
- For GF(2^233)/(x^233 + x^74 + 1), D requires 537 gates
- Squaring: C is constructed from the even columns of D

Fast Architectures for inversion over GF(2^233)

Squarer-ITA and quad-ITA are based on chains of C (or C^2) matrices to perform the inversion.
New proposal for reducing the number of clock cycles:
- Pre-compute all the exponentiation matrices needed by the Brauer addition chain.
- Use the NOKOA hybrid multiplier to reduce the area/delay of the multiplication stages.

Pre-computed ITA (prc-ITA)

Pre-computed ITA, optimization 1 (prcop1-ITA)

Pre-computed ITA, optimization 2 (prcop2-ITA)
- C^58 is also eliminated
- Area is improved
- 3 new clock cycles are added

Pre-computed ITA, delay optimization (prcops-ITA)
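The reason exponentiation matrices can be pre-computed is that raising to the power 2^k is a linear map over GF(2), so it can be written as a single m x m binary matrix instead of a chain of k squarers. The sketch below illustrates this in software, assuming GF(2^233)/(x^233 + x^74 + 1) and integers as bit vectors. The names `build_exp_matrix` and `apply_matrix` are hypothetical, and the matrix is stored as a list of columns; this is not the hardware structure of any of the prc-ITA variants.

```python
M = 233
MASK = (1 << M) - 1


def gf_reduce(r):
    """Reduce r modulo x^233 + x^74 + 1 by folding (x^233 = x^74 + 1)."""
    while r >> M:
        t = r >> M
        r = (r & MASK) ^ t ^ (t << 74)
    return r


def gf_sqr(a):
    """Square a field element: spread bit i to position 2i, then reduce."""
    # Reading the binary digits as base-4 digits maps bit i to 4^i = 2^(2i).
    return gf_reduce(int(format(a, "b"), 4))


def gf_sqr_n(a, n):
    """Reference path: n successive squarings, i.e. a^(2^n)."""
    for _ in range(n):
        a = gf_sqr(a)
    return a


def build_exp_matrix(k):
    """Return the columns of the matrix for p -> p^(2^k).

    Column i is the image of the basis element x^i.
    """
    return [gf_sqr_n(1 << i, k) for i in range(M)]


def apply_matrix(cols, p):
    """Matrix-vector product over GF(2): XOR the columns selected by p."""
    r = 0
    i = 0
    while p:
        if p & 1:
            r ^= cols[i]
        p >>= 1
        i += 1
    return r


if __name__ == "__main__":
    p = (1 << 53) | 0b11
    c14 = build_exp_matrix(14)
    # One matrix application replaces 14 chained squarings.
    print(apply_matrix(c14, p) == gf_sqr_n(p, 14))
```

In this model, applying the matrix for 2^8 and then the one for 2^6 gives the same result as the single matrix for 2^14, which is the trade the pre-computed approach makes: fewer clock cycles in exchange for dedicated matrix logic.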
Results and conclusions

Results
[Table columns: Algorithm, Delay synt., Delay p&r, #cycles; entries not recoverable from the extraction]
- Device: Xilinx Virtex-5
- Design tools: ISE 12.2
- P is defined by: $P = 1 / (\#LUTs \cdot delay \cdot \#cycles)$

Results
- SP605 Evaluation Board (37 ns clock period)
- Agilent 16901A Logic Analyzer
p = 000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0020 0000 0000 0003
p^-1 = 159 AA8D 0D7A 064F 57FC B8BC 7056 8FC0 3E4E 2487 DE31 568C 8998 4FB2 9FDC
- Estimated computing time for inversion: 37 ns × 24 clock cycles = 888 ns

Conclusions
A set of new architectures for fast inversion over GF(2^233), based on the Itoh-Tsujii algorithm, has been presented.
Two main improvements have been proposed:
- The implementation of pre-computed exponentiation matrices, reducing the number of clock cycles required to complete the inversion algorithm
- The use of non-overlapping hybrid polynomial multipliers, reaching better area and delay figures than Karatsuba-Ofman based multipliers
Architecture optimizations have resulted in area and delay improvements, showing significant advances in performance over other solutions.

References
T. Itoh, S. Tsujii, "A fast algorithm for computing multiplicative inverses in GF(2^m) using normal bases," Inf. Comput., vol. 78, no. 3, pp. 171-177, 1988.
A. Karatsuba, "The complexity of computations," Proc. Steklov Inst. Math., vol. 211, pp. 169-183, 1995.
C. Rebeiro, S.S. Roy, D.S. Reddy, D. Mukhopadhyay, "Revisiting the Itoh-Tsujii Inversion Algorithm for FPGA Platforms," IEEE Trans. on VLSI, 2011.
H. Fan, J. Sun, M. Gu, K.Y. Lam, "Overlap-free Karatsuba-Ofman Polynomial Multiplication Algorithms," IET Information Security, vol. 4, no. 1, pp. 8-14, 2010.