Instruction Set Extensions for
Public-Key Cryptography
Dipl.-Ing. Johann Großschädl
Institute for Applied Information Processing and Communications (IAIK)
Graz University of Technology (TUG)
Inffeldgasse 16a, A–8010 Graz, Austria
http://www.iaik.tugraz.at
Abstract: Public-key cryptography (PKC) is the basis for security and privacy
in distributed systems like the Internet, for e-commerce, and for virtually all
modern cryptographic protocols. Most public-key cryptosystems involve
computation-intensive arithmetic operations (e.g. exponentiation in groups or
finite fields), resulting in unacceptably long delays on embedded devices
such as mobile phones, PDAs, or smart cards. The project described in this poster
is directed towards the design of instruction-level enhancements (ISA extensions)
that improve both the performance and the energy efficiency of embedded
RISC processors when executing cryptographic workloads. We focus on the
low-level arithmetic operations used in PKC, e.g. addition, multiplication,
squaring, modular reduction, inversion, and division in multiplicative groups
or finite fields of very high order (160 to 2048 bits). The first goal of this
research project is the design, prototype implementation, and testing of a
SPARC V8-compatible processor with an instruction set extended for PKC.
The second goal is to develop and analyze sophisticated micro-architectural
enhancements for high-speed cryptography and improved security (i.e. resistance
against side-channel attacks).
Public-Key Cryptography
• Widely used in security protocols such as SSL, IPsec, …
– Asymmetric encryption, key exchange, digital signatures
• Based on a “hard” mathematical problem
– Integer factorization (IF) problem, discrete logarithm problem (DLP)
• Traditional public-key cryptosystems
– RSA is based on the IF problem; DSA and Diffie-Hellman are based on the DLP
– Exponentiation in a multiplicative group, with 1024- to 2048-bit operands
• Elliptic curve cryptography (ECC)
– DLP on an elliptic curve defined over a finite field GF(q)
– Much faster, with operand lengths between 160 and 250 bits
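The exponentiation that dominates RSA, DSA, and Diffie-Hellman is usually computed with a sequence of modular squarings and multiplications. The sketch below shows the textbook left-to-right binary ("square-and-multiply") method on Python integers; it is an illustration under that assumption, not the exponentiation routine used in this project, and the name mod_exp is hypothetical. It is also not constant-time, so it offers no protection against the side-channel attacks mentioned in the abstract.

```python
def mod_exp(base: int, exp: int, mod: int) -> int:
    """Left-to-right binary (square-and-multiply) modular exponentiation."""
    if exp < 0 or mod < 1:
        raise ValueError("exp must be >= 0 and mod >= 1")
    result = 1 % mod          # handles mod == 1 correctly
    base %= mod
    # Scan the exponent from its most significant bit downwards.
    for bit in bin(exp)[2:]:
        result = (result * result) % mod      # square for every bit
        if bit == "1":
            result = (result * base) % mod    # multiply only for 1-bits
    return result


if __name__ == "__main__":
    # Toy RSA parameters (far too small for real use): n = 61 * 53.
    n, e, d = 3233, 17, 2753
    c = mod_exp(65, e, n)
    print(c)                   # 2790
    print(mod_exp(c, d, n))    # 65
```

With real 1024- to 2048-bit moduli, each squaring and multiplication becomes a multiple-precision operation, which is why the word-level multiply/accumulate loop is the performance-critical kernel.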
Arithmetic in Finite Fields
• Performance of ECC depends on field arithmetic
– Addition, multiplication, squaring, inversion
• Prime fields GF(p) and binary extension fields GF(2^m)
– Recommended by standards bodies (IEEE, ANSI, NIST)
• Arithmetic in GF(p)
– Elements are the integers from 0 to p–1
– Addition and multiplication are performed modulo the prime p
• Arithmetic in GF(2^m)
– Elements are binary polynomials of degree up to m–1
– Arithmetic is performed modulo an irreducible polynomial p(t) of degree m
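The two kinds of field arithmetic listed above can be sketched in a few lines. This minimal Python sketch assumes that GF(p) elements are plain integers and that GF(2^m) elements are encoded as bit vectors (bit i holds the coefficient of t^i). All function names are hypothetical, inversion in GF(p) uses Fermat's little theorem purely for brevity, and the GF(2^8) check relies on the AES polynomial t^8 + t^4 + t^3 + t + 1 (0x11B) only because it is a small, well-known example, not because it is a field used in this project.

```python
# GF(p): integers 0..p-1, arithmetic modulo the prime p.
def gfp_add(a: int, b: int, p: int) -> int:
    return (a + b) % p


def gfp_mul(a: int, b: int, p: int) -> int:
    return (a * b) % p


def gfp_inv(a: int, p: int) -> int:
    # Fermat: a^(p-2) = a^(-1) mod p for prime p and a != 0 (mod p).
    return pow(a, p - 2, p)


# GF(2^m): binary polynomials of degree < m, arithmetic modulo poly.
def gf2m_add(a: int, b: int) -> int:
    # Coefficients live in GF(2), so addition is a bitwise XOR (no carries).
    return a ^ b


def gf2m_mul(a: int, b: int, poly: int, m: int) -> int:
    """Shift-and-XOR multiplication with interleaved reduction.

    Assumes a < 2**m and that poly includes the t^m term.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # add a * t^i for each set bit of b
        b >>= 1
        a <<= 1                  # a := a * t
        if (a >> m) & 1:
            a ^= poly            # reduce as soon as the degree reaches m
    return result
```

For example, gf2m_mul(0x57, 0x83, 0x11B, 8) gives 0xC1, the product worked out in the AES specification. The GF(2^m) routine never propagates a carry, which is the property a unified integer/polynomial multiplier exploits.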
Instruction Set Extensions
[Figure: processor datapath with memory (cache), register file (rs, rt, rd), ALU, and MAC unit writing the hi and lo parts of its result to the HI and LO registers]
Only six custom instructions are sufficient to accelerate the arithmetic
operations in both prime fields GF(p) and binary extension fields GF(2^m).
In multiple-precision arithmetic, the long operands are represented by arrays
of single-precision words (e.g. 32-bit unsigned integers). Multiple-precision
multiplication spends the majority of its execution time in inner loops that
perform multiply/accumulate operations, so speeding up these inner loops has
a dramatic impact on overall performance.
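To make the role of the inner loop concrete, here is a sketch of one common formulation, product-scanning (column-wise, "Comba") multiplication over 32-bit words. The poster does not say which multiplication algorithm is used, so this choice is an assumption, and the helper names are hypothetical. Python's unbounded integer stands in for a hardware accumulator wide enough to hold a column sum plus carry.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def to_words(x: int, n: int) -> list[int]:
    """Split x into n little-endian 32-bit words."""
    return [(x >> (WORD_BITS * i)) & MASK for i in range(n)]


def from_words(words: list[int]) -> int:
    """Reassemble an integer from little-endian 32-bit words."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


def mp_mul(a: list[int], b: list[int]) -> list[int]:
    """Product-scanning multiplication of two n-word operands (2n-word result)."""
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same number of words")
    result = [0] * (2 * n)
    acc = 0  # stands in for a wide hardware accumulator
    for k in range(2 * n - 1):
        # Inner loop: accumulate every word product a[i] * b[j] with i + j == k.
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]      # one multiply/accumulate (MAC) step
        result[k] = acc & MASK          # the low word of the column is final
        acc >>= WORD_BITS               # keep the carry for the next column
    result[2 * n - 1] = acc & MASK      # the remaining carry is the top word
    return result
```

For n-word operands the inner statement executes n^2 times, while everything else runs only O(n) times. That is why a single-cycle MAC instruction with a wide accumulator pays off.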
Prototype Implementation
The proposed custom instructions are executed in a unified multiply/accumulate
(MAC) unit with a “long” accumulator and two result registers (similar to MIPS32).
The MAC unit consists of a (32×16)-bit unified multiplier (for both integers and
binary polynomials) and a 72-bit accumulator.
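A behavioral sketch can show what "unified" means at the instruction level: the same multiply/accumulate either adds products with carries (INT mode) or XORs carry-less products (POLY mode). In this hypothetical Python model, only the 72-bit accumulator width is taken from the poster. The class and method names, and the mapping of LO to bits 0–31 and HI to bits 32–63 (with the top 8 bits kept as guard bits), are assumptions. The model also ignores how the (32×16)-bit multiplier sequences a full 32×32-bit product.

```python
MASK32 = 0xFFFFFFFF
ACC_BITS = 72                       # accumulator width stated on the poster
MASK_ACC = (1 << ACC_BITS) - 1


def clmul32(a: int, b: int) -> int:
    """Carry-less product of two 32-bit binary polynomials (at most 63 bits)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


class UnifiedMacModel:
    """Hypothetical behavioral model of a dual-mode MAC unit."""

    def __init__(self) -> None:
        self.acc = 0                # 72-bit accumulator

    def mac(self, rs: int, rt: int, poly_mode: bool = False) -> None:
        a, b = rs & MASK32, rt & MASK32
        if poly_mode:
            # POLY mode: no carries, partial products and accumulation use XOR.
            self.acc ^= clmul32(a, b)
        else:
            # INT mode: ordinary integer product added with carry propagation.
            self.acc = (self.acc + a * b) & MASK_ACC

    @property
    def lo(self) -> int:            # assumed mapping: bits 0..31
        return self.acc & MASK32

    @property
    def hi(self) -> int:            # assumed mapping: bits 32..63
        return (self.acc >> 32) & MASK32
```

For example, mac(3, 3) accumulates 9 in INT mode but 0b101 in POLY mode, because (t + 1)^2 = t^2 + 1 over GF(2).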
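[Figure: LEON-2 system architecture with 5-stage integer unit, instruction and data caches, FPU, coprocessor (COP), debug support unit (DSU), AMBA AHB and APB buses with controllers, UARTs, timers, I/O port, memory controller (SRAM/PROM), and 32-bit PCI interface]
The MAC unit can operate independently of, and in parallel with, the ALU.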
The MAC unit has been integrated into the LEON-2 SPARC V8 processor and
prototyped on an FPGA.

Results and Conclusions
[Table: timing results; all timings are given in clock cycles]
• Speed-up compared to conventional software: 2x for GF(p), 6x for GF(2^m)
• The proposed extensions make GF(2^m) arithmetic faster than GF(p) arithmetic
• Binary extension fields are more energy-efficient than GF(p)
• Extra hardware cost is marginal (approximately 5,500 gates)

Unified Multiplier Datapath
[Figure: dual-field adder (DFA) cell with inputs sin, pin, cin and field-select signal fsel; outputs sout and cout]
Addition of binary polynomials is simply a logical XOR operation. A dual-field
adder (DFA) is a full adder capable of performing addition both with and without
carry: in integer mode (fsel = 1) it works as a standard full adder (FA), and in
polynomial mode (fsel = 0) it reduces to XOR gates. A DFA is only slightly larger
than a conventional FA. Polynomial arithmetic can therefore be easily integrated
into the datapath of a conventional integer multiplier. A properly designed
unified multiplier composed of DFAs consumes about 30% less power in POLY mode
than in INT mode.
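The dual-field adder described above can be modeled at bit level. The poster gives no gate equations, so this sketch assumes one common formulation in which the sum output is always the XOR of the three inputs and the carry output is the usual majority function gated by fsel. The port names sin, pin, cin, fsel, sout, and cout are taken from the figure labels. The functions dfa and dual_field_add are hypothetical, and the ripple-carry chain is just a convenient way to exercise the cell, not the multiplier's actual structure.

```python
def dfa(sin: int, pin: int, cin: int, fsel: int) -> tuple[int, int]:
    """Dual-field full adder on single bits; returns (sout, cout).

    fsel = 1: standard full adder (integer mode).
    fsel = 0: carry is suppressed, so sout is a plain XOR (polynomial mode).
    """
    sout = sin ^ pin ^ cin
    majority = (sin & pin) | (sin & cin) | (pin & cin)
    cout = fsel & majority          # carry generated only in integer mode
    return sout, cout


def dual_field_add(a: int, b: int, width: int, fsel: int) -> int:
    """Ripple-carry chain of DFAs; the carry out of the top bit is dropped."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = dfa((a >> i) & 1, (b >> i) & 1, carry, fsel)
        result |= s << i
    return result
```

With 4-bit operands, dual_field_add(0b1011, 0b0110, 4, 1) yields 0b0001 (17 mod 16), whereas fsel = 0 yields 0b1101, the XOR of the two polynomials. The only extra hardware compared with a plain full adder is the AND gate on the carry path, which is consistent with the claim that a DFA is only slightly larger than an FA.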
The research illustrated on this poster has been supported by the Austrian Science Fund (FWF) under grant no. P16952-N04 (“Instruction Set Extensions for Public-Key
Cryptography”) and in part by the European Commission through the IST Program under contract IST-2002-507932 ECRYPT.