Fast Architectures for inversion over GF(2^233)

Outline
Introduction
Inversion over Binary fields GF(2^m)
Multiplication over GF(2^m)/F(x) fields
Fast architectures based on pre-computed matrices
Results and conclusions
Fast Inversion Architectures over GF(2^233) using pre-computed Exponentiation Matrices
SCD2011 - Nov. 17-18, 2011. Murcia (Spain)
Introduction
Elliptic curve cryptography (ECC)
ECC provides comparable security with shorter keys than classical methods such as RSA
(233-bit ECC → 2048-bit RSA)
ECC standards: GF(p), GF(2^m)
Binary fields → more suitable for hardware implementation
Curve point addition implies multiplication and inversion over the finite
field
Introduction
Crypto-processor for ECC
Hardware acceleration
FIPS 186-3 → GF(2^m), m = 163, 233, 283, 409, 571
Example: GF(2^233)/(x^233 + x^74 + 1)
Issues:
- Multiplication → degree 232 polynomials
- Inversion → several multiplication and/or exponentiation operations
Inversion over GF(2^m)
Fermat's Little Theorem
Given $p \in GF(2^m)/F(x)$, the inverse can be calculated using:
$p^{-1} = p^{2^m - 2}$
or equivalently:
$p^{-1} = \left(p^{2^{m-1} - 1}\right)^2$
In GF(2^m), squaring is a linear operation:
$p^2(x) = C \cdot p(x)$
where C is an m × m matrix
One possibility (→ m-2 clock cycles):
$p^{2^n - 1} = \prod_{i=0}^{n-1} p^{2^i}$
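The product formula can be tried in software. The following is a minimal Python sketch, not the hardware described in these slides: it assumes the example field GF(2^233)/(x^233 + x^74 + 1), stores a polynomial as an integer bit-vector, and uses hypothetical helper names (`reduce`, `mul`, `inverse_fermat`). The loop runs m-2 times, which mirrors the m-2 cycle estimate.

```python
M = 233                      # field degree (example field from the slides)
MASK = (1 << M) - 1

def reduce(r):
    """Reduce modulo F(x) = x^233 + x^74 + 1, using x^233 = x^74 + 1."""
    while r >> M:
        hi = r >> M
        r = (r & MASK) ^ hi ^ (hi << 74)
    return r

def mul(a, b):
    """Field product: carry-less shift-and-XOR multiply, then reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return reduce(r)

def inverse_fermat(p):
    """p^(2^m - 2), built as (prod_{i=0}^{m-2} p^(2^i))^2."""
    power = p                # p^(2^0)
    prod = p
    for _ in range(M - 2):   # terms i = 1 .. m-2
        power = mul(power, power)
        prod = mul(prod, power)
    return mul(prod, prod)   # final squaring gives p^(2^m - 2)

p = (1 << 53) | 0b11         # x^53 + x + 1, an arbitrary nonzero element
assert mul(p, inverse_fermat(p)) == 1
```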
Inversion over GF(2^m)
Itoh-Tsujii Algorithm (ITA)
Defining $\alpha_k(p) = p^{2^k - 1}$, it follows that:
$\alpha_{k+j} = (\alpha_j)^{2^k} \alpha_k = (\alpha_k)^{2^j} \alpha_j$
We have to go from $\alpha_1(p) = p$ to $\alpha_{m-1}(p) = p^{2^{m-1} - 1}$
using an addition chain
Minimal Brauer addition chain for m-1 = 232:
U1_232 = {1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232}
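The chain can be walked in software to check the recurrence. This is a hedged Python sketch under the same assumptions as the example field GF(2^233)/(x^233 + x^74 + 1); the helper names (`reduce`, `mul`, `sqr_n`, `inverse_ita`) are illustrative and not taken from the slides. Each step uses $\alpha_{k+j} = (\alpha_k)^{2^j} \alpha_j$ with k the previous chain element and j the difference to the next one.

```python
M = 233
MASK = (1 << M) - 1
CHAIN = [1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232]   # chain from the slide

def reduce(r):
    """Reduce modulo x^233 + x^74 + 1."""
    while r >> M:
        hi = r >> M
        r = (r & MASK) ^ hi ^ (hi << 74)
    return r

def mul(a, b):
    """Carry-less multiply followed by modular reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return reduce(r)

def sqr_n(a, n):
    """Raise a to the power 2^n by n successive squarings."""
    for _ in range(n):
        a = mul(a, a)
    return a

def inverse_ita(p):
    """Itoh-Tsujii inversion: alpha_232 via the chain, then one squaring."""
    alpha = {1: p}
    for prev, cur in zip(CHAIN, CHAIN[1:]):
        j = cur - prev                     # j is always an earlier chain element
        alpha[cur] = mul(sqr_n(alpha[prev], j), alpha[j])
    return mul(alpha[232], alpha[232])     # p^-1 = (alpha_232)^2

p = (1 << 53) | 0b11
assert mul(p, inverse_ita(p)) == 1
```

The chain has ten steps, so this version needs ten field multiplications plus the squarings, compared with m-2 multiplications for the plain product formula.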
Inversion over GF(2^m)
Inversion over GF(2^233) using generalized ITA (squarer-ITA)
[Figure: exponentiation]
Row 7 using an 8-C chain requires 3 clock cycles:
- $C^8(\alpha_{14}) \to (\alpha_{14})^{2^8}$
- $C^6((\alpha_{14})^{2^8}) \to (\alpha_{14})^{2^{14}}$
- $mulpol((\alpha_{14})^{2^{14}}, \alpha_{14}) \to \alpha_{28}$
$p^{-1} = (\alpha_{232}(p))^2$
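The cycle count of the row 7 example can be generalized to every step of the chain. The sketch below is only a cost model inferred from that one example, not a description of the actual datapath: it assumes that one clock cycle applies at most `block` chained squarings and that each polynomial multiplication takes one further cycle. The names `step_cycles` and `chain_cycles` are hypothetical.

```python
CHAIN = [1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232]

def step_cycles(squarings, block=8):
    """Cycles for one chain step under the assumed model:
    ceil(squarings / block) passes through the squarer chain,
    plus one cycle for the multiplication."""
    return -(-squarings // block) + 1

def chain_cycles(chain, block=8):
    """Map each chain element to the cycles needed to reach it.
    The step prev -> cur requires (cur - prev) squarings."""
    return {cur: step_cycles(cur - prev, block)
            for prev, cur in zip(chain, chain[1:])}

costs = chain_cycles(CHAIN)
print(costs[28])   # 3: two exponentiation cycles (8 + 6 squarings) and one multiplication
```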
Inversion over GF(2^m)
Inversion over GF(2^233) using quad-ITA [Rebeiro 2011]
$p^{2^{232} - 1} = p^{4^{116} - 1}$
$\beta_k(p) = p^{4^k - 1}$
$\beta_{k+j} = (\beta_j)^{4^k} \beta_k = (\beta_k)^{4^j} \beta_j$
[Figure: exponentiation]
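The quad recurrence can be checked the same way as the squaring one. The following Python sketch is an illustration under stated assumptions, not the architecture of [Rebeiro 2011]: it assumes the field GF(2^233)/(x^233 + x^74 + 1), it assumes the addition chain for 116 shown in `CHAIN_116` (the first ten elements of the chain for 232), and all function names are hypothetical. It starts from $\beta_1 = p^3$ and finishes with one squaring, since $p^{-1} = (p^{2^{232} - 1})^2 = (\beta_{116})^2$.

```python
M = 233
MASK = (1 << M) - 1
CHAIN_116 = [1, 2, 3, 6, 7, 14, 28, 29, 58, 116]   # assumed chain for 116

def reduce(r):
    """Reduce modulo x^233 + x^74 + 1."""
    while r >> M:
        hi = r >> M
        r = (r & MASK) ^ hi ^ (hi << 74)
    return r

def mul(a, b):
    """Carry-less multiply followed by modular reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return reduce(r)

def quad_n(a, n):
    """Raise a to the power 4^n: n quad operations, each two squarings."""
    for _ in range(n):
        a = mul(a, a)
        a = mul(a, a)
    return a

def inverse_quad_ita(p):
    """beta_116 = p^(4^116 - 1) via the chain, then one final squaring."""
    beta = {1: mul(mul(p, p), p)}          # beta_1 = p^(4 - 1) = p^3
    for prev, cur in zip(CHAIN_116, CHAIN_116[1:]):
        j = cur - prev
        beta[cur] = mul(quad_n(beta[prev], j), beta[j])
    return mul(beta[116], beta[116])

p = (1 << 53) | 0b11
assert mul(p, inverse_quad_ita(p)) == 1
```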
Multiplication over GF(2^m)/F(x)
General scheme
p (m bits), q (m bits) → polynomial multiplier → r (2m-1 bits) → MR → s (m bits)
Multiplication over GF(2^m)/F(x)
Non-overlapping Karatsuba-Ofman multiplier (NOKOA) [Fan 2010]
Classical: Area ~ $m^2$
Karatsuba-Ofman (KOA): Area ~ $m^{\log_2 3}$
Proposed improvement: Non-overlapping Karatsuba-Ofman (NOKOA)
[Table: Algorithm vs. Delay; only the row label "Classical" is recoverable]
KOA and NOKOA are defined recursively
Low values of m → Classical has better area requirements
Hybrid multipliers
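The hybrid idea (recurse with Karatsuba-Ofman, switch to the classical method for small operands) can be sketched in Python. Note that this is the ordinary Karatsuba-Ofman recursion over GF(2)[x], not the overlap-free variant of [Fan 2010], and the 16-bit `THRESHOLD` is used here only as an assumed switch-over point. The function names are hypothetical.

```python
THRESHOLD = 16   # assumed operand size below which the classical method is used

def schoolbook(a, b):
    """Classical carry-less multiplication (shift and XOR)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def karatsuba(a, b, n):
    """Product of two GF(2) polynomials with at most n coefficients each."""
    if n <= THRESHOLD:
        return schoolbook(a, b)
    h = (n + 1) // 2                       # size of the low halves
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = karatsuba(a0, b0, h)              # a0 * b0
    hi = karatsuba(a1, b1, n - h)          # a1 * b1
    mid = karatsuba(a0 ^ a1, b0 ^ b1, h)   # (a0 + a1) * (b0 + b1)
    # Over GF(2): a0*b1 + a1*b0 = mid + lo + hi
    return lo ^ ((mid ^ lo ^ hi) << h) ^ (hi << (2 * h))

a = (1 << 232) | 0x123456789ABCDEF0123456789ABCDEF
b = (1 << 231) | 0xFEDCBA9876543210FEDCBA987654321
assert karatsuba(a, b, 233) == schoolbook(a, b)
```

Each level replaces four half-size products with three, which is where the $m^{\log_2 3}$ area growth comes from.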
Multiplication over GF(2^m)/F(x)
Non-overlapping hybrid Karatsuba-Ofman multiplier (NOKOA)
[Figure: delay; synthesis results for a Virtex-5 device]
Trade-off value: 16 bits
Multiplication over GF(2^m)/F(x)
Modular Reduction (MR)
Modular Reduction matrix:
$MR = [\, I(m) \mid D \,]$
where I(m) is the m × m identity matrix, and column j of D (j = 0, ..., m-2) holds the coefficients of $x^{m+j} \bmod F(x)$
For GF(2^233)/(x^233 + x^74 + 1), D requires 537 gates
Squaring: C is constructed from the even columns of D
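The gate count can be cross-checked by building D explicitly. The Python sketch below assumes that each 1 entry of D costs one XOR gate, and it stores every matrix column as an integer bit-vector; the names `D`, `C`, `modular_reduction` and `square` are illustrative. The squaring matrix takes column i as x^(2i) mod F(x), that is, the even-power columns of MR; with 0-based indexing these are the odd-indexed entries of the list `D`.

```python
M = 233
MASK = (1 << M) - 1

def reduce(r):
    """Reduce modulo F(x) = x^233 + x^74 + 1, using x^233 = x^74 + 1."""
    while r >> M:
        hi = r >> M
        r = (r & MASK) ^ hi ^ (hi << 74)
    return r

# Column j of D: coefficients of x^(m+j) mod F(x), for j = 0 .. m-2.
D = [reduce(1 << (M + j)) for j in range(M - 1)]
ones_in_D = sum(bin(col).count("1") for col in D)
print(ones_in_D)   # 537, the figure quoted on the slide

def modular_reduction(r):
    """s = MR * r with MR = [I | D]: the low m bits pass through,
    each high bit selects one column of D."""
    s = r & MASK
    hi = r >> M
    j = 0
    while hi:
        if hi & 1:
            s ^= D[j]
        hi >>= 1
        j += 1
    return s

# Squaring matrix C: column i is x^(2i) mod F(x), taken from I or from D.
C = [1 << (2 * i) if 2 * i < M else D[2 * i - M] for i in range(M)]

def square(p):
    """p^2 = C * p over GF(2): XOR the columns selected by the bits of p."""
    s, i = 0, 0
    while p:
        if p & 1:
            s ^= C[i]
        p >>= 1
        i += 1
    return s
```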
Fast Architectures for inversion over GF(2^233)
Squarer and Quad ITA are based on chains of C (or C^2) matrices to
perform the inversion.
New proposal for reducing the number of clock cycles:
- Pre-compute all the exponentiation matrices needed by the Brauer
addition chain.
- Use NOKOA hybrid multiplier in order to reduce the area/delay of
multiplication stages
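One way to read the pre-computation proposal is sketched below in Python. Because the map p → p^(2^k) is linear over GF(2), it can be stored as a single m × m matrix and applied in one step instead of k chained squarings. The set of exponents is derived here from the chain {1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232}; that derivation, the field polynomial x^233 + x^74 + 1 and all function names are assumptions of this sketch, and a plain shift-and-XOR product stands in for the NOKOA multiplier.

```python
M = 233
MASK = (1 << M) - 1
CHAIN = [1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232]

def reduce(r):
    """Reduce modulo x^233 + x^74 + 1."""
    while r >> M:
        hi = r >> M
        r = (r & MASK) ^ hi ^ (hi << 74)
    return r

def mul(a, b):
    """Carry-less multiply followed by modular reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return reduce(r)

def square(a):
    """Squaring moves bit i to position 2i (zero interleaving), then reduces."""
    return reduce(int("0".join(format(a, "b")), 2))

def exp_matrix(k):
    """Columns of the linear map p -> p^(2^k): column i is (x^i)^(2^k) mod F."""
    cols = []
    for i in range(M):
        c = 1 << i
        for _ in range(k):
            c = square(c)
        cols.append(c)
    return cols

def apply(cols, p):
    """Matrix-vector product over GF(2): XOR the columns selected by p's bits."""
    s, i = 0, 0
    while p:
        if p & 1:
            s ^= cols[i]
        p >>= 1
        i += 1
    return s

# Each step alpha_cur = (alpha_prev)^(2^j) * alpha_j needs the matrix for j = cur - prev.
EXPONENTS = sorted({cur - prev for prev, cur in zip(CHAIN, CHAIN[1:])})
MATRICES = {k: exp_matrix(k) for k in EXPONENTS}

def inverse_prc(p):
    """Inversion with one matrix application and one multiplication per chain step."""
    alpha = {1: p}
    for prev, cur in zip(CHAIN, CHAIN[1:]):
        j = cur - prev
        alpha[cur] = mul(apply(MATRICES[j], alpha[prev]), alpha[j])
    return apply(MATRICES[1], alpha[232])   # final squaring

p = (1 << 53) | 0b11
assert mul(p, inverse_prc(p)) == 1
```

In this sketch the chain needs seven distinct matrices, with exponents 1, 3, 7, 14, 29, 58 and 116.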
Fast Architectures for inversion over GF(2^233)
Pre-computed ITA (prc-ITA)
Fast Architectures for inversion over GF(2^233)
Pre-computed ITA, optimization 1 (prcop1-ITA)
Fast Architectures for inversion over GF(2^233)
Pre-computed ITA, optimization 2 (prcop2-ITA)
C^58 is also eliminated
Area is improved
3 new clock cycles are added
Fast Architectures for inversion over GF(2^233)
Pre-computed ITA, delay optimization (prcops-ITA)
Results and conclusions
Results
[Table columns: Algorithm, Delay (synthesis), Delay (place and route), #cycles; values not recovered]
Device: Xilinx Virtex-5
Design tools: ISE 12.2
P is defined by: P = 1/(#LUTs · delay · #cycles)
Results and conclusions
Results
SP605 Evaluation Board (37 ns clock period)
Agilent 16901A Logic Analyzer
p = 000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0020 0000 0000 0003
p^-1 = 159 AA8D 0D7A 064F 57FC B8BC 7056 8FC0 3E4E 2487 DE31 568C 8998 4FB2 9FDC
Results and conclusions
Results
Estimated computing time for inversion: 37 ns × 24 cycles = 888 ns
Results and conclusions
Conclusions
A set of new architectures for fast inversion over GF(2^233), based on
the Itoh-Tsujii algorithm, has been presented
Two main improvements have been proposed:
- The implementation of pre-computed exponentiation matrices,
reducing the number of clock cycles required to complete the
inversion algorithm
- The use of non-overlapping hybrid polynomial multipliers, reaching
better area and delay figures than Karatsuba-Ofman based
multipliers
Architecture optimizations have resulted in area and delay improvements,
showing significant advances in performance over other solutions
References
T. Itoh, S. Tsujii, “A fast algorithm for computing multiplicative inverses in
GF(2^m) using normal bases,” Inf. Comput., vol. 78, no. 3, pp. 171-177, 1988.
A. Karatsuba, “The complexity of computations,” Proc. Steklov Inst. Math.,
vol. 211, pp. 169-183, 1995.
C. Rebeiro, S.S. Roy, D.S. Reddy, D. Mukhopadhyay, “Revisiting the Itoh-Tsujii
Inversion Algorithm for FPGA Platforms,” IEEE Trans. on VLSI, 2011.
H. Fan, J. Sun, M. Gu, K.Y. Lam, “Overlap-free Karatsuba-Ofman Polynomial
Multiplication Algorithms,” IET Information Security, vol. 4, no. 1, pp. 8-14,
2010.