An Optimized S-Box Circuit Architecture for Low Power AES Design
IBM Japan Ltd., Tokyo Research Laboratory
Sumio Morioka and Akashi Satoh

Contents
- Background
- Power Analysis of Conventional S-Boxes
- Multi-Stage PPRM S-Box for Low-Power H/W
- Conclusion

Background
- In 2001, NIST selected Rijndael as the new symmetric-key standard cipher, AES.
- AES H/W will be integrated into various applications.
- Low power is important not only in low-end applications but also in high-end servers.
  - A 10-Gbps AES H/W chip consumes several watts in 0.13-µm CMOS.
- Power analysis and low-power architectures are needed for AES H/W.

Power Analysis Method
- Power analysis is based on timing simulation.
- Gate switching, including dynamic hazards, is evaluated.
- The estimate is considerably more accurate than static analysis.
- Flow: VHDL code -> synthesis -> netlist -> timing simulation with test vectors -> waveform -> power calculation.
- Example VHDL input (an XNOR gate):

    library ieee;
    use ieee.std_logic_1164.all;

    -- Entity renamed from XNOR: "xnor" is a reserved word in VHDL-93 and later.
    entity XNOR2 is
      port (A : in  std_logic;
            B : in  std_logic;
            Y : out std_logic);
    end XNOR2;

    architecture RTL of XNOR2 is
      signal T : std_logic;  -- intermediate XOR result
    begin
      T <= A xor B;
      Y <= not T;
    end RTL;

[Figure: analysis flow. The synthesized netlist uses cells such as DQ, BUFFER, MUX21I, XOR2, XOR3, XNOR2, and NAND2N1; binary test vectors drive the timing simulation, and the resulting waveform feeds the power calculation.]

Power Analysis in AES H/W
- 128-bit bus, 11-round loop architecture
- Table-lookup-based S-Box using SOP logic
[Figure: power breakdown of the loop datapath (data input, 2:1 selector, data register, ShiftRows, SubBytes, MixColumns, 3:1 selector, AddRoundKey, data output). SubBytes (S-Box): 75%; others: 15%; remaining block labels: 10%, 1%, 1%.]

For Low-Power AES H/W
- Reducing S-Box power is most effective.
- There are various S-Box H/W architectures. Which architecture is suitable for low power?
- Objectives
  - Investigate the performance of all conventional S-Boxes.
  - Develop a low-power S-Box architecture.

Power Analysis of Conventional S-Boxes

S-Box Definition
- Nonlinear byte substitution function
- Multiplicative inversion on GF(2^8) + affine transformation
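The S-Box definition above (multiplicative inversion on GF(2^8) followed by an affine transformation) can be sketched in software as follows. This is a minimal reference model, not the hardware architecture discussed in these slides. It assumes the AES field polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) and the affine constant 0x63 from FIPS-197, neither of which is stated on the slide. The function names are hypothetical, and inversion by exponentiation (a^254) is a software convenience, not the composite-field method.

```python
AES_POLY = 0x11B      # assumption: AES reduction polynomial from FIPS-197
AFFINE_CONST = 0x63   # assumption: AES affine constant from FIPS-197


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo AES_POLY (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # reduce when the degree reaches 8
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result, base, exp = 1, a, 254   # a^254 = a^-1 because a^255 = 1
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def affine(x: int) -> int:
    """8 x 8 XOR matrix plus constant: b_i = x_i ^ x_(i+4) ^ ... ^ x_(i+7) ^ c_i."""
    out = 0
    for i in range(8):
        bit = (x >> i) & 1
        for k in (4, 5, 6, 7):
            bit ^= (x >> ((i + k) % 8)) & 1
        bit ^= (AFFINE_CONST >> i) & 1
        out |= bit << i
    return out


def sbox(a: int) -> int:
    """AES S-Box: inversion followed by the affine transformation."""
    return affine(gf_inv(a))
```

For example, `sbox(0x53)` evaluates to `0xED`, which matches the worked example in FIPS-197.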
\[
b =
\begin{pmatrix}
1&0&0&0&1&1&1&1\\
1&1&0&0&0&1&1&1\\
1&1&1&0&0&0&1&1\\
1&1&1&1&0&0&0&1\\
1&1&1&1&1&0&0&0\\
0&1&1&1&1&1&0&0\\
0&0&1&1&1&1&1&0\\
0&0&0&1&1&1&1&1
\end{pmatrix}
a^{-1}
+
\begin{pmatrix}1\\1\\0\\0\\0\\1\\1\\0\end{pmatrix}
\]
- Affine transformation: 8 x 8 XOR matrix (vectors listed with bit 0 first)
- Inversion: complicated mathematical logic

S-Box Architectures
- GF inverter + affine
  - Various mathematical techniques can be applied
  - Rather slow
  - Compact implementation
  - S-Box and S-Box^-1 can be merged
- Direct mapping
  - Generates a black-box circuit from a truth table
  - High speed
  - Rather large
  - S-Box and S-Box^-1 cannot be merged

GF Inverter + Affine
- Apply the hierarchical structure of the composite field GF(((2^2)^2)^2).
- Map elements of GF(2^8) onto GF(((2^2)^2)^2) by an isomorphism.
- Each component is constructed using AND-XOR logic.
[Figure: composite-field S-Box. GF(2^8) input -> isomorphism -> inversion in GF(((2^2)^2)^2) -> inverse isomorphism merged with the affine transformation -> GF(2^8) output. The inverter nests sub-blocks over GF((2^2)^2), GF(2^2), and GF(2): 4-bit blocks labeled x^2, λ, and x^-1, and 2-bit blocks labeled x^2, φ, and x^-1.]

Direct Mapping: 2-Level Logic
- SOP (Sum of Products): (NOT)-AND-OR
- POS (Product of Sums): (NOT)-OR-AND
- PPRM (Positive Polarity Reed-Muller): AND-XOR
[Figure: 8-bit input -> AND matrix -> several hundred wires -> OR / XOR matrix -> 8-bit output]

Direct Mapping: Selector-Based Logic
- BDD (Binary Decision Diagram)
- Twisted BDD (to appear at ICCD 2002 in September)
[Figure: BDD and twisted BDD networks of 2:1 MUXes with inputs in0 to in7 and outputs out0 to out7]

Power vs. Gate Count
- Smaller circuits do not always consume less power.
[Figure: power (µW, 0 to 400) versus gate count (Kgates, 0 to 3) for the conventional S-Boxes; point labels: PPRM, BDD, Twisted BDD, Composite Field, POS, SOP, Table Lookup. 0.13-µm 1.5-V CMOS at 10 MHz, nominal conditions.]

Analysis
- The power consumption of an S-Box is greatly influenced by the number of dynamic hazards.
- Dynamic hazards are caused by:
  - Differences in signal arrival times at each gate
  - The propagation probability of signal transitions

Analysis: Differences in Signal Arrival Times at Each Gate
- The composite-field S-Box consumes a lot of power despite having the smallest size.
- It has many crossing and branching signal paths, so its gates switch many times.
[Figure: multiple signal paths through the x^2, λ, and x^-1 blocks]

Analysis: Propagation Probability of Signal Transitions
- An XOR gate passes a signal transition from input to output with probability 100%.
- For AND and OR gates, the probability is 50%.
- So the power for SOP is lower than for PPRM.
[Figure: XOR gate, 100% probability; AND and OR gates, 50% probability]

Multi-Stage PPRM S-Box for Low-Power H/W

Approach for Low Power S-Box
- Use a composite-field S-Box
  - Reduces the gate count
- Divide the combinational logic into multiple stages
  - Reduces the probability of signal transitions
- Adjust the signal timing between stages using 2-level logic
  - Reduces the number of dynamic hazards
[Figure: transition probability labels 0.5 and 1.0 across alternating AND arrays and XOR arrays]

Multi-Stage PPRM
- The composite-field S-Box is divided into several blocks.
- Each block is designed using PPRM logic, which is suitable for GF operations.
- Signal timing is adjusted by using 2-level (AND-XOR) logic.
[Figure: 3-stage PPRM. Top: composite-field datapath with blocks x^2, λ, x^-1, and inverse isomorphism + affine (8-bit input and output, 4-bit internal buses). Bottom: the same function as three stages of AND arrays and XOR arrays, with a 4-bit delay chain.]

Multi-Stage PPRM
- PPRM S-Boxes can be divided in many ways (1 to 6 stages).
[Figure: composite-field datapath partitioned into one XOR block and several AND-XOR blocks around the x^2, λ, φ, x^-1, and inverse isomorphism + affine blocks]

Power vs. Gate Count
- 3-stage PPRM is the most effective architecture.
- Multi-stage SOP is not suitable for GF operations.
[Figure: power (µW, 0 to 800) versus gate count (Kgates, 0 to 7) for multi-stage PPRM and multi-stage SOP, including 1-stage (normal PPRM) and 1-stage (normal SOP). 0.13-µm 1.5-V CMOS at 10 MHz, nominal conditions.]

Power vs. Gate Count
[Figure: power (µW, 0 to 400) versus gate count (Kgates, 0 to 3) comparing 3-stage PPRM with the conventional S-Boxes (PPRM, BDD, Twisted BDD, POS, Composite Field, SOP, Table Lookup). Annotation: 1/3 power with 1/2 gates. 0.13-µm 1.5-V CMOS at 10 MHz, nominal conditions.]

Conclusion
- The AES S-Box (SOP logic) consumes 75% of the power of the AES H/W.
- Dynamic hazards increase the power consumption of S-Boxes.
- A multi-stage PPRM architecture based on a composite-field S-Box was developed.
  - 3-stage PPRM vs. SOP: power = 1/3, size = 1/2
- This architecture can be applied to other S-Boxes defined over Galois fields.
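To apply the approach to another S-Box, each block first has to be expressed in PPRM (AND-XOR) form. The sketch below derives the PPRM coefficients of a single-output Boolean function from its truth table using the standard binary Reed-Muller (Möbius) transform. It is a minimal illustration under stated assumptions: the function names are hypothetical, and it is not the synthesis flow used by the authors.

```python
def pprm_coefficients(truth_table):
    """Return PPRM coefficients of a Boolean function.

    truth_table[x] is f(x) for x in range(2**n), where bit i of x is input i.
    coefficients[m] == 1 means the AND of the inputs selected by the set
    bits of m appears in the XOR sum (m == 0 is the constant term 1).
    """
    size = len(truth_table)
    if size == 0 or size & (size - 1):
        raise ValueError("truth table length must be a power of two")
    coeffs = list(truth_table)
    n = size.bit_length() - 1
    for i in range(n):                 # one butterfly pass per input variable
        step = 1 << i
        for x in range(size):
            if x & step:
                coeffs[x] ^= coeffs[x ^ step]
    return coeffs


def evaluate_pprm(coeffs, x):
    """Evaluate the AND-XOR expression described by coeffs at input x."""
    value = 0
    for m, c in enumerate(coeffs):
        if c and (m & x) == m:         # product term m is 1 when all its inputs are 1
            value ^= 1
    return value


def product_term_count(coeffs):
    """Number of AND terms feeding the XOR array (a rough size indicator)."""
    return sum(coeffs)
```

For a 2-input OR, `pprm_coefficients([0, 1, 1, 1])` returns `[0, 1, 1, 1]`, that is, x0 XOR x1 XOR (x0 AND x1). For a multi-output block such as a 4-bit inverter, the transform would be applied once per output bit.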