A Masked AES ASIC Implementation

advertisement
A Masked AES ASIC Implementation ∗
Norbert Pramstaller1 , Elisabeth Oswald1 , Stefan Mangard1 , Frank K. Gürkaynak2 , and Simon Häne2
1 Institute
for Applied Information Processing and Communications (IAIK),
Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria
{Norbert.Pramstaller, Elisabeth.Oswald, Stefan.Mangard}@iaik.at
2 Integrated Systems Laboratory (IIS) ETH Zentrum,
Swiss Federal Institute of Technology Zurich, Gloriastrasse 35, 8006 Zurich, Switzerland
{kgf, haene}@iis.ee.ethz.ch
Abstract
Introduced in 1999, differential power-analysis (DPA)
attacks pose a serious threat for cryptographic devices. Several countermeasures have been proposed during the last
years. However, none of them leads to implementations that
are provably resistant against DPA. A promising class of
DPA countermeasures is masking. In this article we discuss implementations of three existing masking schemes for
the Advanced Encryption Standard (AES). We present an
ASIC that has been implemented and manufactured. This
test chip is used to investigate the countermeasures in practice. With this test chip we have also determined the costs of
masking in terms of area and execution time. Compared to
an unmasked AES implementation the best masking scheme
shows a performance loss about 40-50%. To the best of
the authors knowledge it is the first ASIC that implements
masking for AES.
1. Introduction
The Advanced Encryption Standard (AES) [1] will be
the most widely used symmetric block cipher in the coming
years. AES seems to be resistant to state-of-the-art mathematical cryptanalysis. However, unprotected implementations of it are vulnerable to differential power-analysis
(DPA) attacks. DPA attacks are successful if the power
consumption of a cryptographic device depends on the processed data and the cipher key. DPA attacks can be mounted
efficiently in practice [2]. Several DPA countermeasures,
such as masking the intermediate values, have been proposed so far.
The masking of intermediate values is achieved by
adding a random mask to the input of the algorithm. In this
way, all intermediate values are randomized automatically.
At the end of the algorithm, the mask is removed to get the
correct result.
∗ The work described in this paper has been supported in part by the
FP6 sponsored research project SCARD (Side-Channel Analysis Resistant
Design Flow), project number IST-2002-507270, and by the FWF research
project “Investigations of Simple and Differential Power Analysis”, project
number P16110-N04.
The development and the implementation of this last approach are the main topics of this article. This article is
structured as follows: Section 2 briefly sketches the Advanced Encryption Standard and DPA attacks. In Section 3
we briefly survey three existing masking schemes for AES.
After the comparison of the masking schemes, we discuss
the ASIC implementation in Section 4. Finally, we draw
conclusions in Section 5.
2. AES and DPA
The AES algorithm is a symmetric block cipher that operates on 128-bit data blocks. AES uses a cipher key to encrypt a 128-bit data block. The length of the cipher key can
be 128 bits, 192 bits, or 256 bits—the standard defines these
three key lengths to adapt the algorithm to short-, medium-,
and long-term security requirements. Like most symmetric ciphers, AES encrypts an input data-block by applying
the same round function iteratively. The round function alters the input data-block, which is called State, by applying non-linear and linear functions. The linear functions
are AddRoundKey, ShiftRows, and MixColumns. The only
non-linear transformation is SubBytes. SubBytes is composed of a byte inversion in the finite field GF(28 ) followed
by an affine transformation.
DPA can be applied to any unprotected implementation
of any algorithm. It works because an attacker can predict
an intermediate value, which is dependent on a small part
of the secret key that occurs during the execution of the
algorithm. As the power consumption of an implementation strongly depends on processed data, there is a correlation between the power consumption and the, for example,
Hamming weight of the intermediate values. Hence, an attacker can verify the correctness of his prediction (which is
based on a small part of the secret key) by calculating this
correlation.
Consequently, a way to counteract DPA is to randomize the intermediate values that occur during the execution
of the algorithm by adding another secret value, which is
called mask. The mask is changed for every execution of
the algorithm. In this way, the attacker is unable to predict
intermediate values.
A +X
3. Masking AES
In a masked AES, a random mask is added to the input
data of the algorithm prior to encryption. At the end of the
encryption, the mask is removed to get the correct result.
In order to remove the mask, we need to keep track how
the mask is modified by the algorithm. Figure 1 shows this
basic principle.
Y
XY
Y
AY
a-1
Plaintext
X
AY +XY
X
Random Mask
Y
A-1Y-1
Y-1
XY-1
add random mask
a-1
A-1Y-1+ XY-1
Y
Cipher Key
Masked
Algorithm
Mask
Modification
Legend:
a-1
GF(28) multiplication
GF(28) addition
GF(28) inversion
A-1 + X
remove random mask
Ciphertext
Figure 1. Basic masking principle
Masking the linear AES functions is easy. Because these
functions are linear, applying them on a masked value A +
X, gives the same result as applying them first on the data A
and then on the mask X: f (A + X) = f (A) + f (X).
The SubBytes transformation is composed of a multiplicative inversion in GF(28 ) and an affine transformation.
Masking the non-linear byte inversion is tricky because
Inv(A + X) 6= Inv(A) + Inv(X). Without modification, the
result of the byte inversion is (A + X)−1 and thus it is not
possible to remove the mask at the end of the algorithm easily. Therefore, a modified byte inversion is required such
that the result of the inversion equals A−1 + X. Three approaches for a modified byte inversion [6, 7, 12] will be
discussed in the remainder of this section.
Multiplicative masking, which was presented by Akkar
et al. [6], is based on the idea that prior to the byte inversion
the additive mask X is replaced by the multiplicative mask
Y. After the byte inversion, the multiplicative mask is replaced by the additive mask again. We refer to this scheme,
which is depicted in Figure 2, as SubBytesAkkar throughout
the remainder of this article.
As it can be seen in Figure 2, SubBytesAkkar requires four multipliers, two inversions and two additions in
GF(28 ). Therefore, it is already clear that masking leads to
a noticeable increase in terms of area.
The second approach has been presented by Trichina
et al. [7]. The so-called simplified adaptive multiplicative
masking is based on the same scheme as [6] except that the
additive mask X is reused as multiplicative mask Y. This
scheme requires less operations than [6]. We refer to this
masking scheme as SubBytesTrichina.
Both approaches have a severe drawback: they are vulnerable to so-called zero-value attacks [8]. This attack is
Figure 2. Modified byte inversion SubBytesAkkar
based on the fact that multiplicative masks do not hide zero
input-values: if A = 0 then AY = 0 ∀Y .
Recently it has been shown that Trichina’s method is
even vulnerable to ordinary DPA [14].
A third approach has been presented by Oswald et al.
[12]. In contrast to [6, 7], the additive mask is never removed during the computation of the algorithm. This approach renders zero-value attacks impossible. In order to
get the desired result of the inversion, which is A−1 + X,
so-called correction terms are computed in parallel to the
inversion. The new masking scheme, referred to as SubBytesIAIK, uses composite field arithmetic for the implementation of the AES SubBytes operation [9]. The detailed
structure of SubBytesIAIK is schematically depicted in Figure 3.
3.1. Comparison of Masking Approaches
For a comparison of the proposed masking schemes,
the area-time (AT) products have been determined and are
shown in Figure 4. To have a direct comparison with unmasked implementations, a LUT-based SubBytes implementation (referred to as SubBytesLUT) and a compositefield based SubBytes implementation [9] (referred to as
SubBytes) have been implemented as well. For the implementation we have used a 0.25µ m CMOS technology.
As it can be seen in Figure 4, the masking approach
SubBytesAkkar requires most hardware resources, SubBytesTrichina the fewest. SubBytesIAIK lies in between but
it is the only one which is secure. Compared to the unmasked SubBytes implementations it is clear that masking
leads to a considerable performance loss.
A point which we have omitted so far is that we need random numbers as masks: SubBytesAkkar requires an additive
and a multiplicative mask, SubBytesTrichina uses an addi-
A+X
GF(256) to GF(16)
4
4
4
amh
aml
mh
ml
a2
dm3
c1
c2
GF (16) to GF(4)
2
a2
c31
fresh_mask
c4
c5
i6
mh
ml
a2
dm1a
dm2
dm3
c1
c2
i4
c31
c4
c5
c3
i6
i0
i5
i2
a2
c
dm1
i7
i3
i4
i7
i1
i8
i5
i2
i8
mh
i9 = dm
i9 = dm
GF(4) x GF(4)
masked inversion
ml
c31
mh
ml
fresh_mask(1..0)
a2
fresh_mask
i3
i1
2
aml
GF4 inversion
dm_inv1
dm_inv_mh
dm_inv1
dm_inv
fresh_mask
dm4
c7
i10
i13
inv_amh
i12
dm5
c8
i16
i17
i21
i18
i22
fresh_mask
dm4
c7
i23
inv_aml
4
4
GF(16) to GF(256)
8
A-1+X
inv_amh
ml
c5
mh
aml
dm_inv_mh
dm_inv_ml
fresh_mask
i11
i12
dm5
c8
i16
i17
i21
i18
i22
i20
i15
i10
i13
dm_inv_mh(0)
i19
i14
inv_amh
c1
fresh_mask
mh
amh
dm_inv_ml
fresh_mask
c5
c2
mh
ml
aml
dm_inv_mh
fresh_mask
i11
dm_inv_mh(1)
dm_inv_ml
c5
mh
c1
fresh_mask
mh
amh
dm_inv_ml
dm_inv_mh
dm_inv2
dm_inv
dm_inv(0)
fresh_mask
c5
i0
2
amh
c
c3
dm1
2
a2
c
GF(16) to GF(4)
2
c2
dm2
4
mh
dm1a
a2
c
X
4
4
a2
A+X
8
GF(256) to GF(16)
4
fresh_mask
X
8
mh(0)
c31(0)
fresh_mask
i19
i20
i14
i23
i15
inv_amh
inv_aml
2
2
GF(4) to GF(16)
4
A-1+X
Figure 3. Gate-level structure of SubBytesIAIK. Inversion in GF(16) × GF(16) (left) and in GF(4) × GF(4) (right)
Comparison of AT products for different SubBytes implementations (8bit)
55k
50k
45k
40k
area [µm2]
35k
30k
25k
20k
15k
10k
5k
0
1
2
3
4
5
6
7
8
9
critical path [ns]
10
11
12
13
14
15
Figure 4. AT products of different masking schemes
tive mask1 , and SubBytesIAIK uses additive masks. The
number of random bits which are required for the masks
clearly influences the performance as well.
SubBytesTrichina requires 20 random bits per data bit,
SubBytesAkkar 2 and SubBytesIAIK requires 1 random bit
per data bit. In other words, for the encryption of one
128-bit block, SubBytesTrichina needs 10 128-bit random
masks. Assuming that is is possible to generate a 128-bit
random mask in one clock cycle, the throughput of SubBytesTrichina is reduced further by a factor of 10 only by
the mask generation.
4. ASIC Implementation ARES
We have designed and manufactured an ASIC in order
to evaluate the different masking schemes in practice and
to determine the costs for masking in terms of area and execution time. This test chip, which we called ARES, features several masked and unmasked AES implementations.
In particular, it has a 32-bit AES (referred to as MAES32) with three different SubBytes implementations: one
unmasked SubBytes (SubBytes) and two masked SubBytes
(SubBytesAkkar and SubBytesIAIK). For comparisons with
high throughput AES implementations, ARES also features
a throughput optimized 128-bit masked AES version (referred to as MAES-128) with SubBytesIAIK only.
Additionally, the test chip contains a noise generator that
is based on a linear feedback shift register (LFSR). The
noise engine is used to investigate the impact of noise on
the DPA resistance in practice. Figure 5 shows the three
main components of ARES.
4.1. MAES-32
For the implementation of MAES-32 we paid special attention on a sequential execution order. More precisely,
all AES transformations are performed one after the other.
1 As described in [11], the mask values for SubBytesTrichina have some
restrictions because the additive mask is reused as multiplicative mask.
Figure 5. Floorplan of ARES
With this design, we expect that it will be easier to perform
DPA attacks with a low number of measurements. This is
important for us since we intend to attack the unmasked
SubBytes implementation.
4.2. MAES-128
MAES-128 is a throughput-optimized encryption engine that is used for direct comparisons with other highthroughput implementations such as Fastcore [10]. Additionally, it can be used as noise generator.
4.3. Implementation Issues of Masking
When implementing SubBytesIAIK, designers must pay
special attention at the gate structure of the masking
scheme. As can be seen in Figure 3, a so-called fresh
mask must be added to some intermediate values. This
fresh-mask is fundamental for the security of the masking
scheme. However, it is logically redundant, e.g. (A ⊕ B) ⊕
(C ⊕ A) = B ⊕ C. Therefore, during all steps of the design
flow, designers must ensure that this fresh mask is never removed during an optimization step.
The intricate computation structure of the masking
schemes causes additional problems for the designer. Typically, designers try to balance the combinational paths. Unbalanced paths have the drawback that the clock period
can not be exploited entirely. Long combinatorial paths
also increase the power consumption due to more glitches.
The combinatorial delay of the masked implementations
are very long compared to the unmasked implementations
(see Figure 4). Therefore, it seems natural to pipeline
these structures. However, pipelining SubBytesIAIK is not
straightforward. Due to the complicated structure a considerable amount of pipeline registers is required.
4.4. Noise Engine
The noise engine is based on an LFSR with 65 bits. This
LFSR generates pseudo random numbers which are used
as input to a network of buffers. The network of buffers has
been designed in such a way that the noise engine consumes
about the same amount of power as the MAES-32, if the
noise engine is clocked with half of its maximum frequency.
The noise engine has a separate clock signal with which
the amount of noise can be set. The higher the clock frequency is, the more switching activity occurs in the network
of buffers and hence the more power is consumed.
Due to the fact that the amount of noise can be controlled
via the clock frequency, a detailed analysis of the noise impact on the DPA-resistance of the chip can be performed.
5. Conclusions and Further Work
We have presented the first ASIC featuring masked AES
implementations in this article. Our ASIC consists of several masked and unmasked implementations, whose security and performance characteristics have been discussed in
this article as well.
From our research we have concluded that DPA resistant
masking schemes for AES can be defined. However, their
practical implementation shows that there is still room for
improvement with respect to their implementation cost.
In the near future we will continue our ongoing research
with the ARES chip. So far, the functional tests and scan
tests of all 10 packaged samples have been successful. We
intend to acquire power measurements, and perform DPA
attacks on the different AES implementations.
References
[1] National Institute of Standards and Technology (NIST).
FIPS-197: Advanced Encryption Standard, November
2001. Available online at http://www.itl.nist.gov/
fipspubs/.
[2] Paul C. Kocher, Joshua Jaffe, and Benjamin Jun. Differential Power Analysis. In Michael Wiener, editor, Advances in
Cryptology - CRYPTO ’99, 19th Annual International Cryptology Conference, Santa Barbara, California, USA, August
15-19, 1999, Proceedings, volume 1666 of Lecture Notes in
Computer Science, pages 388–397. Springer, 1999.
[3] Stefan Mangard. Hardware Countermeasures against DPA–
A Statistical Analysis of Their Effectiveness. In Tatsuaki
Okamoto, editor, Topics in Cryptology - CT-RSA 2004, The
Cryptographers’ Track at the RSA Conference 2004, San
Francisco, CA, USA, February 23-27, 2004, Proceedings,
volume 2964 of Lecture Notes in Computer Science, pages
222–235. Springer, 2004.
[4] Kris Tiri, Moonmoon Akmal, and Ingrid Verbauwhede. A
Dynamic and Differential CMOS Logic with Signal Independent Power Consumption to Withstand Differential
Power Analysis on Smart Cards. In 28th European SolidState Circuits Conference - ESSCIRC 2002, Firenze, Italy,
September 24-26,2002, Proceedings, pages 403–406, 2002.
[5] Kris Tiri and Ingrid Verbauwhede. Securing Encryption
Algorithms against DPA at the Logic Level: Next Generation Smart Card Technology. In Colin D. Walter, Çetin
Kaya Koç, and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES 2003, 5th International Workshop, Cologne, Germany, September 8-10, 2003,
Proceedings, volume 2779 of Lecture Notes in Computer
Science, pages 125–136. Springer, 2003.
[6] Mehdi-Laurent Akkar and Christophe Giraud. An Implementation of DES and AES, Secure against Some Attacks.
In Çetin Kaya Koç, David Naccache, and Christof Paar,
editors, Cryptographic Hardware and Embedded Systems CHES 2001, Third International Workshop, Paris, France,
May 14-16, 2001, Proceedings, volume 2162 of Lecture
Notes in Computer Science, pages 309–318. Springer, 2001.
[7] Elena Trichina, Domenico De Seta, and Lucia Germani.
Simplified Adaptive Multiplicative Masking for AES. In
Burton S. Kaliski Jr., Çetin Kaya Koç, and Christof Paar,
editors, Cryptographic Hardware and Embedded Systems CHES 2002, 4th International Workshop, Redwood Shores,
CA, USA, August 13-15, 2002, Revised Papers, volume
2523 of Lecture Notes in Computer Science, pages 187–197.
Springer, 2003.
[8] Jovan D. Golic and Christophe Tymen. Multiplicative Masking and Power Analysis of AES. In Burton S. Kaliski Jr.,
Çetin Kaya Koç, and Christof Paar, editors, Cryptographic
Hardware and Embedded Systems - CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, August 1315, 2002, Revised Papers, volume 2523 of Lecture Notes in
Computer Science, pages 198–212. Springer, 2003.
[9] Johannes Wolkerstorfer, Elisabeth Oswald, and Mario Lamberger. An ASIC Implementation of the AES Sboxes. In
Bart Preneel, editor, Topics in Cryptology - CT-RSA 2002,
The Cryptographer’s Track at the RSA Conference, 2002,
San Jose, CA, USA, February 18-22, 2002, Proceedings,
volume 2271 of Lecture Notes in Computer Science, pages
67–78. Springer, 2002.
[10] Frank K. Gürkaynak, Andreas Burg, Norbert Felber, Wolfgang Fichtner, D. Gasser, F. Hug, and Hubert Kaeslin. A
2GB/s balanced AES crypto-chip implementation. In Proceedings of the Great Lakes Symposium on VLSI, Boston
USA, 2004.
[11] Norbert Pramstaller. An AES ASIC-Implementation Resistant to Differential Power Analysis. Master’s thesis, Institute for Applied Information Processing and Communications (IAIK), Graz University of Technology, June 2004.
[12] Elisabeth Oswald, Stefan Mangard, and Norbert Pramstaller.
Secure and Efficient Masking of AES - A Mission Impossible? Cryptology ePrint Archive (http://eprint.iacr.
org/), Report 2004/134, 2004.
[13] Norbert Pramstaller, Frank K. Gürkaynak, Simon Haene,
Hubert Kaeslin, Norbert Felber, and Wolfgang Fichtner. Towards an AES Crypto-chip Resistant to Differential Power
Analysis. In Proccedings 30th European Solid-State Circuits Conference - ESSCIRC 2004, Leuven, Belgium, Proceedings - to appear, 2004.
[14] M.-L. Akkar, R. Bevan, and L. Goubin. Two Power Analysis
Attacks against One-Mask Methods. In FSE 2004 – Preproceedings, pages 308–325, 2004.
Download