Error detection and correction algorithm for AES

AALBORG UNIVERSITY COPENHAGEN
Fabrizio Di Napoli
Alexandre Noizet
Lucio Quagliozzi
Fall Semester 2009
Aalborg University Copenhagen
Communication Networks Specializing in Security
Title: Error detection and correction algorithm for AES.
Aalborg University Copenhagen
Lautrupvang 15, 2750 Ballerup, Denmark
Project Period: September – December 2009
Semester Coordinator: Birger Andersen
Semester Theme: Basic Security
Secretary: Judi Stærk Poulsen
Phone: 9940 2468
Supervisor(s): Henrik Tange, Birger Andersen
Members: Fabrizio Di Napoli, Alexandre Noizet, Lucio Quagliozzi
Copies: 3
Pages: 103
Finished: 17/12/2009
Abstract:
In the encryption process errors injected into data do not only compromise the quality of the
message, but can also be used by malicious users to perform attacks and break the secrecy of
the data. The purpose of this project is to analyze the encryption process with a focus on data
corruption and how it could be avoided. The analysis takes place with 3 main subjects:
- Review the Advanced Encryption Standard.
- Examine how errors can be injected in order to recover secret information from data.
- Inspect a way to detect and eventually correct these errors to keep the encryption process safe.
Finally different solutions are suggested for a software implementation of the error detection
and correction algorithm accompanied with performances tests. It was found that bit errors of
odd multiplicity in at most one byte in each word could be corrected with an overhead in time
below 49%.
Copyright © 2009. This report and/or appended material may not be partly or completely published or
copied without prior written approval from the authors. Neither may the contents be used for commercial
purposes without this written approval.
Table of contents
List of figures
Preface
Chapter 1
Introduction
1.1 General structure of a digital communication system
1.2 Errors introduced by the communication channel
1.3 Errors injected during the encryption phase
1.4 Overview of some encryption algorithms
1.4.1 Feistel Cipher structure
1.4.2 Data Encryption Standard
1.4.3 Triple DES
1.4.4 Advanced Encryption Standard
1.4.5 Cipher Block Modes
1.5 Some attacks on AES
1.5.1 DPA Attack
1.5.2 Cache-collision
1.5.3 Boomerang attack
1.5.4 DFA attack
1.6 Problem definition
Chapter 2
Error Detection and error correction on the AES algorithm
2.1 Algorithm process of encryption
2.2 Key Scheduling
2.3 Substitute Bytes
2.3.1 Shift Rows
2.3.2 Mix Columns
2.3.3 Add Round Key
2.4 Algorithm process of decryption
2.5 Differential Fault Analysis on AES
2.5.1 Description of the fault injection
2.5.2 Key Extraction
2.5.3 Generalization
2.5.4 Attack Complexity
2.6 The error detection and correction algorithm
2.6.1 Parity Bits and error detection
2.6.2 Parity Bytes and error correction
Chapter 3
Implementation of the error detection and error correction algorithm
3.1 Parity bit and byte computation
3.1.1 Parity Check
3.2 Error detection and correction for ShiftRows
3.2.1 Parity prediction
3.2.2 Error correction
3.3 Error detection and correction for MixColumns
3.3.1 Parity Prediction
3.2.2 Error correction
3.3 Error detection and correction for AddRoundKey
3.3.1 Parity Prediction
3.3.2 Error Correction
3.4 Error detection and correction for SubBytes
3.4.1 Parity Bit based SubBytes error detection and correction
3.4.2 Inverse SBox based error detection and correction
3.4.3 CRC based error detection and correction
Chapter 4
Error coverage and performances tests
4.1 Test Environment
4.2 Assumptions proof
4.3 ShiftRows Limits
4.4 Parity bit based SubBytes Limits
4.5 CRC based SubBytes Limits
4.6 Performances analysis
4.6.1 Time
4.6.1.1 Without error
4.6.1.2 With a single error injected
4.6.1.3 With four errors injected
4.6.2 CPU load
4.6.3 Memory usage
Chapter 5
Conclusions and future work
Appendix A
AES source code with error detection and correction
A.1 Complete Source code with parityBit based SubBytes detection/correction
A.2 CRC based SubBytes detection/correction solution
A.3 InvSBox based SubBytes detection/correction solution
Appendix B
B.1 Time performance
Parity bit based SB
Inverse based SB
CRC based SB
B.2 CPU load performance
Inverse based SB
Parity bit based SB
CRC based SB
B.3 Memory usage performance
Bibliography
List of figures
Figure 1: Basic elements of a digital communication system
Figure 2: Encryption/Decryption process
Figure 3: The Feistel Structure
Figure 4: Triple DES algorithm
Figure 5: Cipher Block Chaining Mode
Figure 6: Counter Mode structure
Figure 7: DPA Analysis
Figure 8: Data collection during AES process
Figure 9: The SubBytes Transformation
Figure 10: The Shift Row Transformation
Figure 11: The MixColumn Transformation
Figure 12: The AddRoundKey Transformation
Figure 13: AES encryption/decryption process
Figure 14: Matrix S and parity bits in green
Figure 15: Data, Checksum packet and the corresponding math expression
Figure 16: Changes to the parity bits for j-th output Word after MixColumns transformation depending on error pattern inducted and byte of the Word affected by error
Figure 17: AES’s input State with parity bits and bytes, output State and correction mask
Figure 18: Correction matrix for MixColumn transformation
Figure 19: CPU load for the Paritybit based solution
Figure 20: CPU load for the CRC based solution
Figure 21: CPU load for the InvSBox based solution
Preface
This report describes the project carried out by the students of the CNS department from the
1st of September to the 18th of December 2009.
The report is an investigation of the Advanced Encryption Standard and of attacks performed on
it by error injection. The main purpose is to study the error detection and error correction
algorithms proposed by researchers and to provide a new software implementation of them.
The main report is divided into 5 parts:
- Chapter 1 provides an introduction to communications and data security, together with
the definition of the project purpose.
- Chapter 2 describes the Advanced Encryption Standard and a Differential Fault
Analysis on it. It then presents the analysis of error detection and correction
algorithms proposed by other researchers for AES.
- Chapter 3 suggests a software implementation of the error detection and
correction algorithm to prevent DFA attacks.
- Chapter 4 contains the review of the error coverage and the performance tests on the
new software implementation.
- Chapter 5 concludes the report with an overall discussion of what we
presented and ideas for future developments.
The complete list of citations is provided at the end of this document.
The appendix includes the C code written for the algorithm and some graphical results of the
performance tests.
The source code and the pdf / doc files for this document are provided on the enclosed
CD-ROM.
Chapter 1
Introduction
The digital communications field involves the transmission of information in digital form from
a source that generates the information to one or more destinations. In this scenario, data
integrity is a main prerequisite that a communication system has to preserve: if data is
corrupted during transmission and the receiver is not able to decode the content, the
communication serves no purpose. To understand how transmitted data can be protected from
being altered by errors, some background is needed on the steps involved, the components that
act on the data, and the factors that may make the transmission unsafe. The aim of this chapter
is to analyze how the communication works and to identify some possible sources of error that
can occur during a transmission and affect the integrity of the transmitted data.
1.1 General structure of a digital communication system
The source that generates the messages to be transmitted does not always produce binary
information by itself. For this reason, the original messages produced by the source are
converted into a sequence of binary digits. To obtain an efficient representation of the
source output, the source encoding process reduces the redundancy in the original data flow as
much as possible. The sequence of binary digits from the source encoder is then passed to the
channel encoder, which introduces, in a controlled manner, some redundancy into the sequence
that the receiver can use to overcome the effects of noise and interference encountered during
transmission through the channel. The channel encoder output then goes to the digital
modulator, which serves as an interface to the communication channel: the primary purpose of
the digital modulator is to map the binary information sequence into waveform signals that can
be transmitted through the channel. This stage usually deals with the analog world.
The communication channel is the physical medium used to send the signal from the transmitter
to the receiver. It can be the air in a wireless transmission, or twisted pairs or optical
fibers in a wired communication.
At the receiving end of a digital system, the digital demodulator estimates the transmitted
data symbols from the received waveform. These symbols are then processed by the channel
decoder, which attempts to reconstruct the original information sequence using the redundancy
contained in the received data. Finally, the source decoder reconstructs the original signal
[1- Ch. 1].
In addition, both sender and receiver want their data transfers to be protected against
eavesdropping and tampering, so that nobody but them is able to understand the information sent
through the channel. For this reason, a new block is introduced into the classical basic scheme
of a digital communication system. It obscures the meaning of a piece of information by
encoding it in such a way that it can only be decoded, read and understood by the people for
whom the information is intended: this block is the encipherer/decipherer. Figure 1 illustrates
the functional diagram and the basic elements of a digital communication system as just
described.
Figure 1: Basic elements of a digital communication system
During the whole process, data encounters different factors that could compromise its
integrity and make the delivered message corrupted or different from the original one. First of
all, in signal transmission through any channel, various sources of noise and interference may
arise externally or within the system. In addition, malicious users could be interested in
intercepting the communication and substituting or corrupting the message in order to
discover the parameters (the key) used in the encryption process. The following sections
briefly discuss these scenarios and mention some solutions to avoid data corruption.
1.2 Errors introduced by the communication channel
As indicated in the preceding discussion, the communication channel is the connection
between the transmitter and the receiver. The physical channel may be a pair of wires that
carries the electrical signal, an optical fiber that carries the information on a light beam,
or simply free space over which the signal is radiated by an antenna. One common problem is
additive noise generated internally by electrical components, sometimes called thermal noise.
Noise and interference can also be generated externally by other users of the channel or by
electromagnetic fields. Because of these factors, the spectrum of the transmitted signal may be
attenuated or may contain frequencies that were not originally present. As a result, one or
more bits of the message are changed with a certain probability.
Different channel models have been defined to describe these phenomena and the error
probability they introduce: for instance the perfect channel with no errors, the useless
channel where an error always occurs, and binary symmetric channels in which each of the two
possible transmitted symbols is received in error with probability p. Each channel model
describes a situation with a different number of sent/received symbols and a different way in
which the error can occur [2 – Ch. 3]. These models helped to understand that a mechanism is
needed to recognise whether errors have occurred and, if possible, to correct them.
This is exactly what the channel encoder achieves: an input sequence of k symbols is
represented by a longer sequence of n > k symbols, adding redundancy that allows the
channel decoder to detect or possibly correct an error.
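To make the idea of controlled redundancy concrete, the following is a minimal sketch in C of one of the simplest channel codes, a (3,1) repetition code applied to one byte. It is not taken from the cited references: the type `codeword_t` and the functions `rep3_encode`, `rep3_error_detected` and `rep3_decode` are hypothetical names chosen for illustration. Each data byte (k = 8 bits) is sent as three copies (n = 24 bits); a disagreement between the copies detects an error, and a bitwise majority vote corrects it as long as at most one copy is wrong in each bit position.

```c
#include <stdint.h>

/* Hypothetical (3,1) repetition code: one data byte is sent as three copies. */
typedef struct {
    uint8_t copy[3];
} codeword_t;

/* Channel encoder: add redundancy by repeating the byte three times. */
codeword_t rep3_encode(uint8_t data)
{
    codeword_t cw = { { data, data, data } };
    return cw;
}

/* Error detection: any disagreement between the copies reveals an error. */
int rep3_error_detected(codeword_t cw)
{
    return cw.copy[0] != cw.copy[1] || cw.copy[1] != cw.copy[2];
}

/* Channel decoder: bitwise majority vote over the three copies.
 * A bit is recovered correctly if at most one copy is wrong at that position. */
uint8_t rep3_decode(codeword_t cw)
{
    uint8_t a = cw.copy[0], b = cw.copy[1], c = cw.copy[2];
    return (uint8_t)((a & b) | (a & c) | (b & c));
}
```

For instance, encoding 0xA5, flipping one bit in the second copy and then calling `rep3_decode` returns 0xA5 again, while `rep3_error_detected` reports the corruption. Practical codes reach the same goal with far less overhead than the factor of three used here.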
One of the most interesting families of codes is Reed-Solomon coding, which can detect and
correct different combinations of erroneous symbols. These codes are very efficient at
correcting error bursts, in which errors tend to be clustered in runs of more than one
consecutive erroneous bit. This is a typical situation in some transmission media such as the
wireless channel.
1.3 Errors injected during the encryption phase
Errors can also occur during the encryption process. In this process the message (plaintext)
is encoded according to a key through different logical operations on bits, in order to hide
the content of the message so that only someone who knows the key is able to understand it and
to reverse the process (Figure 2).
Figure 2: Encryption/Decryption process [3 – pg 30]
It is becoming very common that, in this phase, someone, referred to here as the hacker,
tries to break the security of the algorithm and discover its secret key. One way to do this
is to intentionally induce faults into the system and collect the correct outputs as well as
the faulty ones. The hacker then compares them and tries to retrieve the secret information
embedded inside the encryption system, which is, in almost all cases, the secret key. These
attacks are discussed in more detail later, but first we need to better understand how the
encryption process acts on data.
1.4 Overview of some encryption algorithms
All encryption algorithms are based on two general principles: substitution, in which each
element in the plaintext is mapped into another element, and transposition, in which elements
in the plaintext are rearranged [3 – Ch. 2]. The fundamental requirement is that no
information must be lost.
We can also distinguish two kinds of system according to the number of keys used: if both
sender and receiver use the same key, the system is referred to as symmetric; if instead they
each use a different key, the system is referred to as asymmetric.
Finally, we can distinguish block ciphers, which process the input one block of elements at a
time, producing an output block for each input block, from stream ciphers, which process the
input elements continuously, producing output one element at a time.
In this project we focus on the AES algorithm, which belongs to the symmetric block ciphers.
Before analysing it, we provide a short overview of the main symmetric ciphers and the most
common modes in which they are used.
1.4.1 Feistel Cipher structure
The Feistel cipher is a particular example of the general structure used by most symmetric
block ciphers.
This structure, described by Horst Feistel in 1973 [4], is shown in Figure 3: the plaintext
block is divided into two halves that pass through a number of rounds of processing and are
then recombined to produce the output ciphertext.
In each round, the right half derived from the previous round is processed, together with the
subkey Ki derived from the initial key, by a substitution function F, and the result is
combined with the left half by an XOR.
The round function has the same general structure in every round; only the subkey used
changes from round to round.
Figure 3: The Feistel Structure [5]
At the end of each round a permutation is performed by swapping the two halves of the data,
which then become the inputs of the next round. Many block ciphers, such as DES and 3DES, are
derived from this structure.
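The Feistel structure described above can be sketched in a few lines of C. This is an illustrative toy, not DES and not secure: the round function `toy_f`, the constant `FEISTEL_ROUNDS` and the function names are assumptions introduced for this example. The sketch shows the property that makes the structure attractive: decryption reuses the same routine with the subkeys taken in reverse order, and F itself never needs to be inverted.

```c
#include <stdint.h>

#define FEISTEL_ROUNDS 4

/* Toy round function F (an assumption for illustration, NOT secure). */
uint32_t toy_f(uint32_t half, uint32_t subkey)
{
    uint32_t x = half ^ subkey;
    x = (x << 5) | (x >> 27);              /* rotate left by 5 bits */
    return x * 0x9E3779B1u + subkey;
}

/* Run the network over a 64-bit block, using the subkeys in the given order. */
uint64_t feistel_run(uint64_t block, const uint32_t subkeys[FEISTEL_ROUNDS])
{
    uint32_t left = (uint32_t)(block >> 32);
    uint32_t right = (uint32_t)block;

    for (int i = 0; i < FEISTEL_ROUNDS; i++) {
        /* New right half = old left half XOR F(old right half, subkey). */
        uint32_t new_right = left ^ toy_f(right, subkeys[i]);
        left = right;                      /* swap the halves for the next round */
        right = new_right;
    }
    /* Undo the last swap so that the same routine also decrypts. */
    return ((uint64_t)right << 32) | left;
}

uint64_t feistel_encrypt(uint64_t plaintext, const uint32_t subkeys[FEISTEL_ROUNDS])
{
    return feistel_run(plaintext, subkeys);
}

/* Decryption is the same network with the subkeys in reverse order. */
uint64_t feistel_decrypt(uint64_t ciphertext, const uint32_t subkeys[FEISTEL_ROUNDS])
{
    uint32_t reversed[FEISTEL_ROUNDS];
    for (int i = 0; i < FEISTEL_ROUNDS; i++)
        reversed[i] = subkeys[FEISTEL_ROUNDS - 1 - i];
    return feistel_run(ciphertext, reversed);
}
```

Calling `feistel_decrypt` on the output of `feistel_encrypt` with the same subkey array returns the original block, whatever function is used as F.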
1.4.2 Data Encryption Standard
The Data Encryption Standard (DES) was adopted in 1977 by the National Bureau of Standards,
now NIST. The input block is 64 bits long and the key is 56 bits long. It has the Feistel
structure, with 16 rounds and, as a consequence, 16 subkeys. The decryption process is the
same as the encryption process, but the subkeys are used in reverse order.
The DES core function F performs an expansion of the right half from 32 to 48 bits, an XOR
with the round subkey, and a substitution through 8 S-boxes, each applied to a group of 6
bits. The S-box outputs are then concatenated into a new block, to which an additional
permutation is applied.
The only minor variation from the original Feistel structure is an initial permutation
applied to the input block before the first round, and its inverse applied to the output of
the last round [6].
1.4.3 Triple DES
Triple DES was standardized for use in financial applications in ANSI standard X9.17 in 1985.
It uses three keys to perform three executions of the DES algorithm (Figure 4): a DES
encryption with the first key is followed by a DES decryption with the second key, and finally
by another DES encryption with the third key. Decryption instead follows the
Decryption-Encryption-Decryption scheme with the same keys in reverse order.
Figure 4: Triple DES algorithm
The overall key size is 168 bits, but the scheme also allows the first and third keys to be
the same (K1=K3), which gives a key length of 112 bits.
1.4.4 Advanced Encryption Standard
The Advanced Encryption Standard (AES) consists of three block ciphers, AES-128, AES-192 and
AES-256, adopted from a larger collection originally published as Rijndael. Each AES cipher has
a 128-bit block size, with key sizes of 128, 192 and 256 bits, respectively. The AES ciphers
have been analyzed extensively and are now used worldwide, as was the case with their
predecessor, the Data Encryption Standard (DES) [7].
1.4.5 Cipher Block Modes
The simplest way to operate a block cipher is called electronic codebook (ECB), where each
block of plaintext is encrypted using the same key according to the chosen encryption
algorithm. As a consequence, if the same block of plaintext appears more than once in the
message, it always produces the same ciphertext. Some alternatives to ECB solve this problem.
One way is to chain the block being encrypted with the last ciphertext block obtained. This is
shown in Figure 5 and is called Cipher Block Chaining mode, where the input to the encryption
algorithm is the XOR of the current plaintext block and the previous ciphertext block. The key
used to encrypt each block is always the same, and decryption mirrors this structure: each
ciphertext block is decrypted and then XORed with the previous ciphertext block.
Figure 5: Cipher Block Chaining Mode [8]
In Counter Mode encryption [9], shown in Figure 6, the chosen algorithm and key are instead
used to encrypt a counter that is incremented by one for each block. The output is XORed with
the plaintext block to produce the ciphertext block. This mode of operation allows different
blocks of plaintext to be encrypted or decrypted at the same time. The decryption process
follows the same structure.
Figure 6: Counter Mode structure [9]
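The two modes described above can be sketched in C as follows. To keep the example self-contained, a toy invertible 64-bit block function (`toy_encrypt_block` / `toy_decrypt_block`) stands in for AES; it is an assumption made for illustration only and offers no security. The initial chaining value `iv` used for the first CBC block and the starting value `counter` are also parameters introduced for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy invertible 64-bit "block cipher" (illustration only, NOT secure). */
uint64_t toy_encrypt_block(uint64_t block, uint64_t key)
{
    uint64_t x = block ^ key;
    x = (x << 13) | (x >> 51);             /* rotate left by 13 bits */
    return x + key;
}

uint64_t toy_decrypt_block(uint64_t block, uint64_t key)
{
    uint64_t x = block - key;
    x = (x >> 13) | (x << 51);             /* rotate right by 13 bits */
    return x ^ key;
}

/* CBC: XOR each plaintext block with the previous ciphertext block
 * (the IV for the first block) before encrypting it. */
void cbc_encrypt(const uint64_t *pt, uint64_t *ct, size_t n, uint64_t key, uint64_t iv)
{
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        ct[i] = toy_encrypt_block(pt[i] ^ prev, key);
        prev = ct[i];
    }
}

/* CBC decryption: decrypt the block, then XOR with the previous ciphertext block. */
void cbc_decrypt(const uint64_t *ct, uint64_t *pt, size_t n, uint64_t key, uint64_t iv)
{
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        pt[i] = toy_decrypt_block(ct[i], key) ^ prev;
        prev = ct[i];
    }
}

/* CTR: encrypt successive counter values and XOR the result with the data.
 * The same routine encrypts and decrypts, and blocks are independent of each other. */
void ctr_crypt(const uint64_t *in, uint64_t *out, size_t n, uint64_t key, uint64_t counter)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ toy_encrypt_block(counter + i, key);
}
```

With three identical plaintext blocks, both `cbc_encrypt` and `ctr_crypt` produce ciphertext blocks that differ from each other, which is the weakness of ECB that these modes remove. Because each CTR block depends only on its own counter value, the loop in `ctr_crypt` could be executed in parallel, whereas CBC encryption is inherently sequential.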
1.5 Some attacks on AES
Now that we have an overview of encryption in a communication system, we can try to
understand how the security of data can be broken.
As one of the most widely used algorithms, AES is a constant target of malicious attacks that
aim to discover the secret key or the plaintext of an encryption process. The following
sections give a few details on some of these attacks.
1.5.1 DPA Attack
A well-known attack on the AES algorithm in counter mode is the DPA attack, which stands for
differential power analysis [10]. This attack does not require the initial value of the
counter, which could be very helpful in other attacks. The input and the output are unknown
too, so the hacker is blind at the beginning.
Figure 7: DPA Analysis [11]
The attack begins with a data collection (see Figure 7). This can only be done with direct
access to the device and its power supply. The data collection is a measurement of the power
consumption of the device, which is analyzed afterwards. Traces are recorded up to the fifth
round, even if the key is longer than 16 bytes. Figure 8 is an example of a data collection:
Figure 8: Data collection during AES process [10]
The data collection shows that it is possible to distinguish the AES algorithm steps of each
round. Figure 8 represents the power consumption of the hardware over time, and shows how
variations in power can be related to the AES encryption process.
Then, with the data collection performed, it is possible to find the 15th byte (block of
bits) of the plaintext, T15, and perform a 15-bit exhaustive search on this block. These
values are used iteratively through all AES rounds to find the subkeys (the keys of the
extended key created by the key schedule) by statistical analysis. Finally, the key schedule
is run backwards to find the key.
To perform this attack, however, the hacker needs to be close to the power source of the
device and to use measurement tools for the data collection, so it is a hard attack to perform.
Some measures to prevent these attacks can be suggested:
- The user has to control the hardware power.
- The user can disturb the measurement by adding some noise to the power consumption.
1.5.2 Cache-collision
Cache memory stores the most frequently used data to speed up access, so that the CPU does
not have to look for the data in main memory (cache hit). If the data is not located in the
cache, the CPU goes to main memory (cache miss).
The cache-collision attack exploits these two events, cache hit and cache miss, during the
encryption/decryption process. For each AES step of the encryption/decryption, cache hits and
misses can cause variations in processing time and power consumption that can be used to find
the secret keys [12].
1.5.3 Boomerang attack
This kind of attack uses local collisions in AES. The purpose is to inject an error during a
step and then to correct this error by injecting another error, creating a disturbance in the
encryption algorithm. In this way the attack allows the hacker to recover parts of the secret
keys, because he can find where the disturbances he created are located [13].
1.5.4 DFA attack
The main principle of the Differential Fault Analysis attack is to introduce differences into some intermediate data during encryption rather than into the input of the algorithm: this dramatically simplifies the differential analysis, since only a few rounds of the whole algorithm are actually attacked and have to be analyzed [16].
1.6 Problem definition
Each of the algorithms analyzed in section 1.4 was developed to address the security weaknesses of the previous one.
DES was selected as a Federal Information Processing Standard (FIPS) for the United States in 1976 and afterwards became the common encryption standard worldwide. However, DES was soon considered insecure for many applications, mostly because its 56-bit key is too small: a DES key was publicly broken in 22 hours and 15 minutes in January 1999. The first algorithm suggested to solve the problems of DES was Triple DES, which simply applies the DES cipher three times. It increases the key size of DES, protecting against brute-force attacks without requiring a completely new block cipher algorithm. Even though Triple DES avoids the problem of a small key size, it is very slow in software applications, unsuitable for limited-resource platforms, and may be affected by potential security issues related to its small block size of 64 bits. On January 2, 1997, NIST announced its intention to choose a successor to DES, the Advanced Encryption Standard (AES). In 2001, after an international competition, NIST selected an algorithm named Rijndael, developed by two Belgian researchers, as the replacement for DES [14]. When it was announced, AES was expected to become the solution that would ensure safe communications. However, the use of AES does not guarantee full security. Researchers [15] soon showed that even a single fault during encryption/decryption results in a large number of errors in the encrypted or decrypted data. Another problem is that hardware implementations of secret-key cryptosystems, including AES, are susceptible to differential fault analysis [16, 17].
Another relevant problem for AES security is that data can be accidentally corrupted during the encryption process due to high memory usage or process scheduling.
This leads to the problem statement of this project:
How can bit errors be detected and corrected during the encryption phase of AES, thereby preventing a possible DFA attempt to recover the secret key?
One of the simplest techniques would be to feed the input data to two different and independent encryption units and compare their final results. If the results are identical, it is assumed that both encryption units performed correctly; if they differ, it can be assumed that at least one unit is erroneous, and the encryption is repeated in each unit. However, this approach is clearly inefficient, because it requires more resources such as memory, processor cycles and, of course, time. In fact, the encryption and decryption hardware has to be doubled; moreover, the error is detected only after encryption/decryption has completed, which increases the overall encryption/decryption time. To protect the encryption process from attacks with a lower resource overhead compared to standard encryption, another strategy was needed. Many improvements to the AES algorithm have focused on error detection. In 2003 Bertoni et al. [18] proposed using parity bits to detect errors injected into the state during the encryption process. Their results were later improved by other authors [19, 20, 21], who proposed different algorithms. Unfortunately, in 2000 and 2003 other researchers [22, 23] showed that error detection alone is not enough to guarantee protection against data errors.
We therefore focused our attention not only on how to detect errors, but also on how to correct them in the current encryption standard. The main purpose is to make AES stronger against DFA attacks, which work by introducing errors between transformations. To describe how the algorithm has to be modified, we need to analyze in depth how each transformation in each round modifies the data being encrypted, and then how the fault induced during the DFA attack affects these data. Furthermore, we will see how the fault changes and spreads across the whole data block after the transformations.
After this analysis, we will introduce an algorithm, based on parity between round operations, that makes the encrypted data as error-free as possible. Our study concentrates on the theoretical analysis of the proposal in [24], and also provides alternatives for some steps; in a final implementation phase, we build a modified software version of AES that includes the detection and correction algorithm. We mainly attempt to determine how many and what kinds of errors the new algorithm can handle, how much AES performance is lowered by the detection/correction procedure, and how different approaches impact time, memory and CPU usage.
Chapter 2
Error Detection and error correction on the AES algorithm
The AES algorithm is a symmetric encryption algorithm adopted by NIST [25] as a standard in 2001. It replaces the DES algorithm, which uses secret keys of only 56 bits, whereas AES keys are at least 128 bits long. AES operates on 128-bit blocks of data with three possible key lengths (128, 192 or 256 bits). It works in rounds (10, 12 or 14) that apply different transformations and shifts to a 16-byte matrix (a 4x4 array of bytes) called the state. In the following description of the rounds, we assume 10 rounds of processing steps and a key length of 128 bits.
2.1 Algorithm process of encryption
After an initial AddRoundKey, each round performs the following processing steps: Substitute Bytes, Shift Rows, Mix Columns and AddRoundKey (the last round omits Mix Columns). Each step is invertible, and the inverse steps are used in the decryption process. In the following, each step is described in detail.
2.2 Key Scheduling
This first step takes the key provided as input and expands it into an array of forty-four 32-bit words using Rijndael's key schedule operations. The first 4 words are the key itself. Each word whose position is a multiple of 4 is computed by rotating the previous word, substituting its bytes through the S-box, XORing the result with a round constant Rcon, and finally XORing it with the word 4 positions earlier; each remaining word is the XOR between the previous word and the one 4 positions earlier. For a complete description of the key scheduler refer to the NIST description of the algorithm [25]. In each round, 4 words (128 bits) are used.
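The expansion just described can be sketched in Python as follows. This is only an illustrative sketch: the function names (`gf_mul`, `build_sbox`, `expand_key`) are our own, the S-box is rebuilt by brute force for brevity, and the key used is the example key of the AES specification [25], which we assume is also the key behind the round keys K9 and K10 of section 2.5.1.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """S-box: multiplicative inverse followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def expand_key(key):
    """Expand a 16-byte key into 44 words, each a list of 4 bytes."""
    sbox = build_sbox()
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]          # rotation of the previous word
            temp = [sbox[b] for b in temp]      # substitution through the S-box
            temp[0] ^= rcon                     # addition of the constant Rcon
            rcon = gf_mul(rcon, 0x02)
        # XOR with the word 4 positions earlier
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
round_key_words = expand_key(key)
print(bytes(round_key_words[40]).hex())  # first word of the last round key
```

Words 4r to 4r+3 form the round key of round r, so `round_key_words[40:44]` is the last round key.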
2.3 Substitute Bytes
This step uses a substitution box (S-box) to perform a byte-by-byte substitution of the State array. Each input byte is mapped to a new byte: its first 4 bits select the row and its last 4 bits the column of the S-box. This substitution creates a new state s’. Each S-box entry is computed by first finding the multiplicative inverse of the input byte in Rijndael's finite field. Rijndael uses the finite field of characteristic 2 with 2^8 = 256 elements, also called the Galois field GF(2^8), with the following reducing polynomial for multiplication: x^8 + x^4 + x^3 + x + 1.
After the multiplicative inverse has been found, its bits are transformed with the fixed 8x8 binary matrix of the affine transformation (see [25]). The result is then XORed with the decimal number 99 (hexadecimal 63, binary 01100011), which gives the S-box value. The transformation through a pre-computed S-Box is shown in figure 9:
Figure 9: The SubBytes Transformation [7]
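As a rough illustration of the construction described in this section, the following Python sketch builds the S-box from the multiplicative inverse and the affine transformation. The names `gf_mul` and `build_sbox` are our own; the inverse is found by exhaustive search for simplicity, and the affine step is written in the equivalent bit-rotation form rather than as a matrix product.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:       # degree 8 reached: XOR the reducing polynomial
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Return the 256-entry S-box as a list of integers."""
    sbox = []
    for x in range(256):
        # Multiplicative inverse by exhaustive search; 0 is mapped to 0.
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        # Affine transformation: XOR of the inverse with four of its rotations.
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)   # final XOR with the constant 99 (0x63)
    return sbox


SBOX = build_sbox()
print(hex(SBOX[0x04]))  # 0xf2
```

In practice the table is pre-computed once and stored, as figure 9 suggests; rebuilding it as above is only useful to check the definition.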
2.3.1 Shift Rows
This step is a simple cyclic permutation of the state rows (figure 10). If n is the row number, row n is shifted to the left by n-1 bytes, as described in the following picture.
Row 1: not modified
Row 2: shifted one to the left
Row 3: shifted two to the left
Row 4: shifted three to the left
Figure 10: The Shift Row Transformation [7]
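A minimal Python sketch of this permutation is given below. The state is assumed to be stored as a list of four rows (a representation chosen here for illustration only), and the sample values are the state after the 10th SubBytes in the worked example of section 2.5.1.

```python
def shift_rows(state):
    """Rotate row n (counted from 0) of a 4x4 state to the left by n bytes."""
    return [row[n:] + row[:n] for n, row in enumerate(state)]


state = [
    [0x0E, 0xCB, 0x3D, 0xAF],
    [0x58, 0x31, 0x32, 0x2E],
    [0xCE, 0x07, 0x7D, 0x2C],
    [0xEB, 0x5F, 0x94, 0xB5],
]
shifted = shift_rows(state)
for row in shifted:
    print(" ".join(f"{b:02X}" for b in row))
```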
2.3.2 Mix Columns
The Mix columns step affects all the bytes of each column because a transformation of the first
row diffuses in all other rows. The new value is function of all the bytes in the same column. All
bytes are combined using an invertible linear transformation. Operations are made in the
GF(28) meaning that only factors of 1 remain in the polynomial. Each column of the state is
multiplied by a 4x4 matrix and the results are each column 4x1 of the output state:
 S '0 , j   2

 
 S '1, j  1
 S '   1
 2, j  
 S'  3
 3, j  
3 1 1   S 0, j 



2 3 1  S1, j 

1 2 3  S 2 , j 

 
S
1 1 2  3, j 
Figure 11: The MixColumn Transformation
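The column product can be sketched in Python as shown below. The helper names (`xtime`, `mul3`, `mix_column`) are our own; multiplication by 02 is a left shift followed by a conditional reduction, and multiplication by 03 is obtained as 02·a XOR a. The sample column is the first word used in the example of section 2.5.1.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8)."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def mul3(a):
    """Multiply a byte by 03 in GF(2^8): 03*a = 02*a XOR a."""
    return xtime(a) ^ a


def mix_column(col):
    """Multiply one column [s0, s1, s2, s3] by the MixColumns matrix."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


mixed = mix_column([0x87, 0x6E, 0x46, 0xA6])
print(" ".join(f"{b:02X}" for b in mixed))  # 47 37 94 ED
```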
2.3.3 Add Round Key
The Add Round Key step is an XOR between the State array after the Mix Columns and the
round key generated by the key scheduler. The following picture depicts how the output state
is created:
Figure 12: The AddRoundKey Transformation [7]
2.4 Algorithm process of decryption
The decryption process is the same as the encryption process performed in reverse (the inverse transformations are applied and the expanded key is used in reverse order). Figure 13 shows the complete encryption/decryption process of the AES algorithm.
After the complete encryption process, the output of the AES block cipher is a ciphertext. This ciphertext is used as the input of the AddRoundKey step when decrypting.
Figure 13: AES encryption/decryption process [3]
2.5 Differential Fault Analysis on AES
Differential Fault Analysis exploits computational errors to find cryptographic keys. The main principle of this attack is to introduce differences into some intermediate data during encryption rather than into the input of the algorithm: this dramatically simplifies the differential analysis, since only a few rounds of the whole algorithm are actually attacked and have to be analyzed. Determining the induced difference, and forcing the difference to be of a particular type, are the only difficulties of the attack that affect its overhead.
In this part we show how differential fault analysis (DFA) works on the AES-128 encryption algorithm described above.
2.5.1 Description of the fault injection
This section is based on [26]. To perform the attack, the attacker exposes the encryption device to certain physical effects (e.g. radiation) so that he can induce a fault in some bits of a word at some intermediate stage of the encryption algorithm.
The goal of the attack is to recover the subkey of the 10th round; once the 10th subkey has been recovered, it is possible to recover the whole key.
We assume that the fault is introduced after the Shift Rows of the 9th round, changing a single byte of the state: let’s suppose, for instance, that we inject an error ε=1E in the 1st byte of the 1st word. It corresponds to the XOR between the byte of the state and the error, that is 87 ⊕ 1E = 99:
\[
\begin{bmatrix} 87 & F2 & 4D & 97 \\ 6E & 4C & 90 & EC \\ 46 & E7 & 4A & C3 \\ A6 & 8C & D8 & 95 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 99 & F2 & 4D & 97 \\ 6E & 4C & 90 & EC \\ 46 & E7 & 4A & C3 \\ A6 & 8C & D8 & 95 \end{bmatrix}
\]
However, in general, the attacker doesn’t know the differential fault ε, as the fault injection occurs with a certain probability at a random bit location. For this reason we generalize the example by denoting with S_{r,f}[x] and F_{r,f}[x] the correct and the faulty byte x of the state at round r after the function f. We can then write the faulty byte after the 9th Shift Rows as:
\[
F_{9,Sh}[1] = S_{9,Sh}[1] \oplus \varepsilon
\]
And the complete state as:
\[
F_{9,Sh} = S_{9,Sh} \oplus
\begin{bmatrix} \varepsilon & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \end{bmatrix}
\]
This state is processed through the 9th Mix Column, where the error is spread over the whole word. To explain this better, let’s consider the Mix Column operation just for the first word of the state of the example. The first byte of the first word of the Mix Column output is the result of the row-by-column product between the vector [02 03 01 01] and the first word of the state [99 6E 46 A6]^T, which also contains the erroneous byte. The result is therefore affected by the error. The same thing happens to the second byte of the first word of the output, since it is the product between [01 02 03 01] and the 1st word of the input state, and so on for the remaining bytes. The result is unchanged instead for the other words of the output, because they are computed from the 2nd, 3rd and 4th words of the input, which are not affected by errors. The result will then be as follows:
\[
\begin{bmatrix} 7B & 40 & A3 & 4C \\ 29 & D4 & 70 & 9F \\ 8A & E4 & 3A & 42 \\ CF & A5 & A6 & BC \end{bmatrix}
\]
In general:
\[
F_{9,MC} = A_0 \cdot F_{9,Sh} = A_0 \cdot \left( S_{9,Sh} \oplus
\begin{bmatrix} \varepsilon & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \end{bmatrix}
\right)
\]
In this equation, A0 is the characteristic matrix of the Mix Column operation. But the matrix multiplication is distributive, and the product between A0 and S9,Sh represents the state we should have after Mix Column if no error occurs. Then the output state of the 9th Mix Column is written as:
\[
F_{9,MC} = S_{9,MC} \oplus
\begin{bmatrix} 2 \cdot \varepsilon & 00 & 00 & 00 \\ 1 \cdot \varepsilon & 00 & 00 & 00 \\ 1 \cdot \varepsilon & 00 & 00 & 00 \\ 3 \cdot \varepsilon & 00 & 00 & 00 \end{bmatrix}
\]
Here the matrix on the right is the result of the product between A0 and the matrix containing the error byte. In our example the 1st column of that matrix is [3C 1E 1E 22]^T:
\[
\begin{bmatrix} 7B & 40 & A3 & 4C \\ 29 & D4 & 70 & 9F \\ 8A & E4 & 3A & 42 \\ CF & A5 & A6 & BC \end{bmatrix}
=
\begin{bmatrix} 47 & 40 & A3 & 4C \\ 37 & D4 & 70 & 9F \\ 94 & E4 & 3A & 42 \\ ED & A5 & A6 & BC \end{bmatrix}
\oplus
\begin{bmatrix} 3C & 00 & 00 & 00 \\ 1E & 00 & 00 & 00 \\ 1E & 00 & 00 & 00 \\ 22 & 00 & 00 & 00 \end{bmatrix}
\]
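The decomposition of the faulty Mix Column output into the correct output XORed with the transformed error can be checked numerically. The following Python sketch uses our own helper names and the values of the example (first word 87 6E 46 A6, error ε = 1E in its first byte).

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8)."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def mix_column(col):
    """Multiply one column [s0, s1, s2, s3] by the MixColumns matrix A0."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ (xtime(s1) ^ s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ (xtime(s2) ^ s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ (xtime(s3) ^ s3),
        (xtime(s0) ^ s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


correct_word = [0x87, 0x6E, 0x46, 0xA6]   # first word after the 9th Shift Rows
epsilon = 0x1E                            # injected error (example value)
error_word = [epsilon, 0x00, 0x00, 0x00]
faulty_word = [s ^ e for s, e in zip(correct_word, error_word)]

faulty_out = mix_column(faulty_word)
correct_out = mix_column(correct_word)
error_out = mix_column(error_word)        # [2*eps, 1*eps, 1*eps, 3*eps]

# Linearity: A0 * (S xor E) == (A0 * S) xor (A0 * E)
recombined = [c ^ e for c, e in zip(correct_out, error_out)]
print(faulty_out == recombined)  # True
```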
The 9th round ends with the AddRoundKey operation. Let’s call K9 the 9th round key, and suppose it is defined as:
\[
K_9 =
\begin{bmatrix} AC & 19 & 28 & 57 \\ 77 & FA & D1 & 5C \\ 66 & DC & 29 & 00 \\ F3 & 21 & 41 & 6E \end{bmatrix}
\]
Then the output will be:
\[
\begin{bmatrix} 7B & 40 & A3 & 4C \\ 29 & D4 & 70 & 9F \\ 8A & E4 & 3A & 42 \\ CF & A5 & A6 & BC \end{bmatrix}
\oplus
\begin{bmatrix} AC & 19 & 28 & 57 \\ 77 & FA & D1 & 5C \\ 66 & DC & 29 & 00 \\ F3 & 21 & 41 & 6E \end{bmatrix}
=
\begin{bmatrix} D7 & 59 & 8B & 1B \\ 5E & 2E & A1 & C3 \\ EC & 38 & 13 & 42 \\ 3C & 84 & E7 & D2 \end{bmatrix}
\]
Thanks to the associativity and commutativity of the XOR operation, and keeping in mind how the output of the Mix Column has been written, it’s easy to express the output of the Add Round Key as well as the state we should have without error XORed with an error matrix:
\[
F_{9,Ark} = F_{9,MC} \oplus K_9
= \left( S_{9,MC} \oplus
\begin{bmatrix} 2 \cdot \varepsilon & 00 & 00 & 00 \\ 1 \cdot \varepsilon & 00 & 00 & 00 \\ 1 \cdot \varepsilon & 00 & 00 & 00 \\ 3 \cdot \varepsilon & 00 & 00 & 00 \end{bmatrix}
\right) \oplus K_9
= S_{9,Ark} \oplus
\begin{bmatrix} 2 \cdot \varepsilon & 00 & 00 & 00 \\ 1 \cdot \varepsilon & 00 & 00 & 00 \\ 1 \cdot \varepsilon & 00 & 00 & 00 \\ 3 \cdot \varepsilon & 00 & 00 & 00 \end{bmatrix}
\]
Applying the S-Box transformation we obtain the output state after the 10th substitute byte operation:
\[
\begin{bmatrix} D7 & 59 & 8B & 1B \\ 5E & 2E & A1 & C3 \\ EC & 38 & 13 & 42 \\ 3C & 84 & E7 & D2 \end{bmatrix}
\xrightarrow{\text{S-BOX}}
\begin{bmatrix} 0E & CB & 3D & AF \\ 58 & 31 & 32 & 2E \\ CE & 07 & 7D & 2C \\ EB & 5F & 94 & B5 \end{bmatrix}
\]
To write a general form as a function of the correct output and the error, we have to define a differential error as the XOR between the correct output and the faulty one after the SubBytes operation; we’ll call the i-th byte of this new error matrix ε^1_i:
\[
F_{10,SB} = S_{10,SB} \oplus
\begin{bmatrix} \varepsilon^1_0 & 00 & 00 & 00 \\ \varepsilon^1_1 & 00 & 00 & 00 \\ \varepsilon^1_2 & 00 & 00 & 00 \\ \varepsilon^1_3 & 00 & 00 & 00 \end{bmatrix}
\]
Let’s go then quickly through the last operations of the algorithm. The Shift Rows will diffuse the error through all the words of the state:
\[
\begin{bmatrix} 0E & CB & 3D & AF \\ 31 & 32 & 2E & 58 \\ 7D & 2C & CE & 07 \\ B5 & EB & 5F & 94 \end{bmatrix}
\]
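Since Shift Rows only changes the order of the bytes, of both the state and the error, we can still write:
\[
F_{10,Sh} = S_{10,Sh} \oplus
\begin{bmatrix} \varepsilon^1_0 & 00 & 00 & 00 \\ 00 & 00 & 00 & \varepsilon^1_1 \\ 00 & 00 & \varepsilon^1_2 & 00 \\ 00 & \varepsilon^1_3 & 00 & 00 \end{bmatrix}
\]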
Finally, XORing this last output with the round key K10, we obtain the final output:
\[
\begin{bmatrix} 0E & CB & 3D & AF \\ 31 & 32 & 2E & 58 \\ 7D & 2C & CE & 07 \\ B5 & EB & 5F & 94 \end{bmatrix}
\oplus
\begin{bmatrix} D0 & C9 & E1 & B6 \\ 14 & EE & 3F & 63 \\ F9 & 25 & 0C & 0C \\ A8 & 89 & C8 & A6 \end{bmatrix}
=
\begin{bmatrix} DE & 02 & DC & 19 \\ 25 & DC & 11 & 3B \\ 84 & 09 & C2 & 0B \\ 1D & 62 & 97 & 32 \end{bmatrix}
\]
And, in general:
\[
F_{10,Ark} = S_{10,Ark} \oplus
\begin{bmatrix} \varepsilon^1_0 & 00 & 00 & 00 \\ 00 & 00 & 00 & \varepsilon^1_1 \\ 00 & 00 & \varepsilon^1_2 & 00 \\ 00 & \varepsilon^1_3 & 00 & 00 \end{bmatrix}
\]
2.5.2 Key Extraction
Information about the last round key can be now extracted starting from the last SubBytes
transformation.
According to what we said before, the input to the S-Box is the correct output of the 9th AddRoundKey XORed with the error. If we consider just the first word (the one on which the error acts) we can write 4 equations, one for each byte:
\[
\begin{aligned}
Sub(x_0 \oplus 02 \cdot \varepsilon) \oplus Sub(x_0) &= \varepsilon^1_0 \\
Sub(x_1 \oplus 01 \cdot \varepsilon) \oplus Sub(x_1) &= \varepsilon^1_1 \\
Sub(x_2 \oplus 01 \cdot \varepsilon) \oplus Sub(x_2) &= \varepsilon^1_2 \\
Sub(x_3 \oplus 03 \cdot \varepsilon) \oplus Sub(x_3) &= \varepsilon^1_3
\end{aligned}
\]
Where [x0 x1 x2 x3]^T represents the first word of the correct state after Add Round Key 9, ε the injected error and ε^1_i the differential error after the SubBytes. In a compact form it becomes:
\[
Sub(x_i \oplus c_i \cdot \varepsilon) \oplus Sub(x_i) = \varepsilon^1_i \qquad (1)
\]
Where x_i and ε are the unknown variables.
The SubBytes transformation can be written, as we said in section 2.3, in matrix form:
\[
Sub(x) =
\begin{cases}
a \cdot x^{-1} \oplus b & x \neq 0 \\
b & x = 0
\end{cases}
\]
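According to [16 – Ch. 3.5], proposition one, we search for the set
\[
S_{c,\varepsilon^1} = \{ \varepsilon : \exists x,\ \text{(1) holds with a particular } c \text{ and } \varepsilon^1 \}
\]
Still according to proposition one, we can make ε explicit in (1), obtaining
\[
\varepsilon = \left( c \cdot (a^{-1} \cdot \varepsilon^1) \cdot e \right)^{-1}
\]
where e varies in the set:
\[
E_1 = \{ x^2 \oplus x : x \in GF(2^8) \}
\]
\[
E_1 = \{ \text{'01'}, \ldots, \text{'1F'}, \text{'40'}, \ldots, \text{'5F'}, \ldots, \text{'A0'}, \ldots, \text{'BF'}, \text{'E0'}, \ldots, \text{'FF'} \}
\]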
For our example we need to calculate:
\[
S_{2,\text{'E7'}},\quad S_{1,\text{'51'}},\quad S_{1,\text{'47'}},\quad S_{3,\text{'99'}}
\]
The intersection S of these sets represents the set of possible committed faults:
\[
S = \left\{ \begin{array}{l}
\text{'01', '04', '13', '1E', '21', '27', '33', '3B', '48', '4D', '50', '53', '55', '5D', '64', '65',} \\
\text{'7E', '7F', '80', '83', '8D', '8F', '93', 'A7', 'A8', 'A9', 'AB', 'B3', 'B8', 'C9', 'F6'}
\end{array} \right\}
\]
Using all the possible committed faults ε in
\[
S = \bigcap_i S_{c_i,\varepsilon^1_i}
\]
we calculate for each ε^1_i the number
\[
\theta = \left( (a^{-1} \cdot \varepsilon^1_i) \cdot c \cdot \varepsilon \right)^{-1}
\]
Then we solve the equation t^2 + t = θ, and the two solutions α, β are used to get the possible values of the i-th byte of the last round key, K10[i]:
If θ≠1 there are 2 possible values:
\[
K_{10}[i] = Sub(c \cdot \varepsilon \cdot \alpha) \oplus F_{10,Ark}[i]
\quad \text{or} \quad
K_{10}[i] = Sub(c \cdot \varepsilon \cdot \beta) \oplus F_{10,Ark}[i]
\]
The index i represents the byte in which each ε^1 appears inside the state after the 10th AddRoundKey.
With those expressions we can find for our example some possible values for K10[0]:
\[
K_{10}[0] \in \left\{ \begin{array}{l}
\text{'03', '06', '09', '0C', '10', '15', '1A', '1F', '21', '24', '2B', '2E', '32', '37', '38', '3D', '43', '46',} \\
\text{'49', '4C', '50', '55', '5F', '61', '64', '6B', '6E', '72', '77', '78', '7D', '83', '86', '89', '8C', '90',} \\
\text{'95', '9A', '9F', 'A1', 'A4', 'AB', 'AE', 'B2', 'B7', 'B8', 'C3', 'C6', 'C9', 'CC', 'D0', 'D5', 'DA',} \\
\text{'DF', 'E1', 'E4', 'EB', 'EE', 'F2', 'F7', 'F8', 'FD'}
\end{array} \right\}
\]
By repeating the attack with new faults, we reduce this set until only one value remains for the byte of round key 10.
In our case, injecting also {'E1', 'B3', '16', '9E'} we obtain the exact K10[0]='D0', K10[7], K10[10] and K10[13].
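The search for candidate faults and key bytes can also be illustrated by brute force. The Python sketch below does not use the closed-form solution of proposition one: instead, it tries every ε and every x against equation (1), which is a deliberately simpler technique chosen here for illustration. All function names are our own, and the constants are the values of the worked example (differential errors E7, 51, 47, 99 and faulty ciphertext byte DE).

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """S-box: multiplicative inverse followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
COEFFS = [2, 1, 1, 3]                 # c_i: first column of the MixColumns matrix
DIFFS = [0xE7, 0x51, 0x47, 0x99]      # differential errors after SubBytes (example)
FAULTY_BYTE = 0xDE                    # faulty ciphertext byte F10,Ark[0] (example)


def solutions(c, eps, diff):
    """All x such that Sub(x xor c*eps) xor Sub(x) == diff, i.e. equation (1)."""
    shift = gf_mul(c, eps)
    return [x for x in range(256) if SBOX[x ^ shift] ^ SBOX[x] == diff]


def candidate_faults():
    """Faults eps for which equation (1) has a solution in all four bytes."""
    return [eps for eps in range(1, 256)
            if all(solutions(c, eps, d) for c, d in zip(COEFFS, DIFFS))]


def candidate_key_bytes(faults):
    """Candidate values of K10[0] compatible with the candidate faults."""
    return sorted({SBOX[x] ^ FAULTY_BYTE
                   for eps in faults
                   for x in solutions(COEFFS[0], eps, DIFFS[0])})


faults = candidate_faults()
keys = candidate_key_bytes(faults)
print(0x1E in faults, 0xD0 in keys)  # True True
```

The true fault 1E and the true key byte D0 are always among the candidates; intersecting the candidate sets obtained from several faulty ciphertexts narrows them down, as described above.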
2.5.3 Generalization
If error is injected in a byte of a word different from the first one, it will be anyway diffused
through the whole word by MixColumn. In this case, the differential fault matrix will be a
matrix with all zeros except in the column corresponding to the word in which the fault
occurred, so we can bring this case back to the previous one.
2.5.4 Attack Complexity
From the study of different published attacks, it has been noticed that recovering the secret key of AES-128 does not require a high complexity. In fact, Christophe Giraud [26] states that only 50 faulty ciphertexts are needed for a 1-faulty-bit attack, and 250 for a 1-faulty-byte one, with a success probability close to 97%. Moreover, if the attacker can choose the target byte that the error will affect, these numbers fall to 35 ciphertexts for the first kind of attack and only 31 for the second one.
A different kind of DFA attack can also be performed against the AES key scheduler, and it has been shown [27] that the time required is similar to the time needed to decrypt 2^24 blocks, which can be completed within one minute on a Pentium 4 computer.
2.6 The error detection and correction algorithm
The most probable errors injected into the AES algorithm for the purpose of fault analysis are byte errors. We therefore assume that an attacker injects faults that affect a single byte of each word, so at most four errors can be injected into the State. The error is induced between transformations, and we are not concerned with the physical type of the injected fault.
We also assume that the encryption key and the round keys are error free, and denote by E a 4 by 4 error matrix that represents the errors injected into the bytes of the State. The elements e_{i,j} of E are single bytes and represent the error mask applied to the corresponding bytes of the State.
2.6.1 Parity Bits and error detection
Our first purpose is to develop fault detection techniques. To perform this task we use the simplest error detection code, the parity code, which is capable of detecting single-bit errors and multiple-bit errors of odd multiplicity. Using a single parity bit for the whole data block is of course not enough, because it yields a fault coverage of around 50%, which is not acceptable in practice. Moreover, it would be very difficult to perform a parity prediction for all the data, since AES is a strongly non-linear algorithm and the parity bit depends on all the information bits. A more efficient implementation of the parity code, suitable for fault detection in the AES algorithm, was proposed by Bertoni et al. [18], who suggested associating a single parity bit p_{i,j} with each byte s_{i,j} of the State (Figure 14). For a given byte, the corresponding parity bit is 1 if the number of bits set to 1 in that byte is odd, and 0 otherwise:
\[
p_{i,j} = \bigoplus_{k=0}^{7} s^{(k)}_{i,j} \qquad (2)
\]
These parity bits can be arranged in a 4 x 4 matrix, with every element in one-to-one correspondence with the related element of the state. This parity matrix allows us to detect an odd number of erroneous bits in each byte, but for each round transformation it is necessary to develop a method to predict the output parity given the input state and the input parity. Such parity prediction was developed by the same researchers. We recall here the most relevant aspects of their proposal for each transformation.
Figure 14: Matrix S and parity bits in green [24]
The prediction of the output parity bits for Shiftrows is easy: since the transformation only
changes the position of elements in each row, the predicted parity bits matrix is obtained
shifting the rows of the parity bits matrix relative to the input state to this function in the same
way the Shiftrows function does on the state itself.
For AddRoundKey step the prediction of the output parity bits consists in the XOR between the
input parity matrix of the bits of the state and the parity matrix associated with the current
round key.
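The parity bit of equation (2) and the two simple predictions just described can be sketched in Python as follows. The function names and the row-list representation of the state are our own choices for illustration; the sample state and round key are taken from the example of section 2.5.1.

```python
def parity(byte):
    """Parity bit of a byte: 1 if the number of bits set to 1 is odd, else 0."""
    return bin(byte).count("1") & 1


def parity_matrix(state):
    """4x4 matrix of parity bits, one per byte of the state."""
    return [[parity(b) for b in row] for row in state]


def shift_rows(matrix):
    """Rotate row n (counted from 0) to the left by n positions."""
    return [row[n:] + row[:n] for n, row in enumerate(matrix)]


def add_round_key(state, key):
    return [[s ^ k for s, k in zip(srow, krow)] for srow, krow in zip(state, key)]


def predict_shift_rows(par):
    """Predicted parity: the parity matrix is shifted like the state itself."""
    return shift_rows(par)


def predict_add_round_key(par, key_par):
    """Predicted parity: XOR of the state parity and the round key parity."""
    return [[p ^ k for p, k in zip(prow, krow)] for prow, krow in zip(par, key_par)]


STATE = [
    [0x7B, 0x40, 0xA3, 0x4C],
    [0x29, 0xD4, 0x70, 0x9F],
    [0x8A, 0xE4, 0x3A, 0x42],
    [0xCF, 0xA5, 0xA6, 0xBC],
]
K9 = [
    [0xAC, 0x19, 0x28, 0x57],
    [0x77, 0xFA, 0xD1, 0x5C],
    [0x66, 0xDC, 0x29, 0x00],
    [0xF3, 0x21, 0x41, 0x6E],
]

par = parity_matrix(STATE)
ok_shift = predict_shift_rows(par) == parity_matrix(shift_rows(STATE))
ok_key = (predict_add_round_key(par, parity_matrix(K9))
          == parity_matrix(add_round_key(STATE, K9)))
print(ok_shift, ok_key)  # True True
```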
The prediction of the output parity bits of MixColumns instead is mathematically the most
complex and is based on the most significant bit of each byte of the state and their parity
before the transformation.
To justify this we refer to [18 – Appendix A]: if we consider A as the polynomial
\[
A = \sum_{i=0}^{7} a_i x^i
\]
then we can say that the parity bit associated with the result of 02•A is equal to:
\[
p(02 \cdot A) = a_7 \oplus p(A)
\]
where a7 is the most significant bit of the byte represented by A and p(A) is the parity bit associated with A.
As a direct consequence we can also say that the parity associated with 03•A can be calculated as:
\[
p(03 \cdot A) = p[(02 \oplus 01) \cdot A] = p(02 \cdot A \oplus 01 \cdot A) = p(02 \cdot A) \oplus p(A) = a_7 \oplus p(A) \oplus p(A) = a_7
\]
This is possible thanks to the linearity of the parity bit calculation (since it is a simple bitwise XOR).
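Both identities can be checked exhaustively over the 256 possible bytes. The short Python sketch below does so; the helper names are our own.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8) (reduction by x^8 + x^4 + x^3 + x + 1)."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def parity(byte):
    """Parity bit of a byte: 1 if the number of bits set to 1 is odd, else 0."""
    return bin(byte).count("1") & 1


def identities_hold():
    """Check p(02*A) = a7 xor p(A) and p(03*A) = a7 for every byte A."""
    for a in range(256):
        a7 = a >> 7                      # most significant bit of A
        if parity(xtime(a)) != a7 ^ parity(a):
            return False
        if parity(xtime(a) ^ a) != a7:   # 03*A = 02*A xor A
            return False
    return True


print(identities_hold())  # True
```

The first identity holds because the reduction constant 1B has an even number of bits set, so the reduction itself never changes the parity.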
If we denote by p_{r,c} the parity bit of the byte element s_{r,c} of the state and by s^{(i)}_{r,c} the i-th bit of the byte element s_{r,c}, with 0 ≤ r, c ≤ 3, the predicted parity bit for the first byte of the generic word c of the state can be calculated as:
\[
\begin{aligned}
p'_{0,c} &= p[(02 \cdot s_{0,c}) \oplus (03 \cdot s_{1,c}) \oplus (01 \cdot s_{2,c}) \oplus (01 \cdot s_{3,c})] \\
&= p(02 \cdot s_{0,c}) \oplus p(03 \cdot s_{1,c}) \oplus p(01 \cdot s_{2,c}) \oplus p(01 \cdot s_{3,c}) \\
&= s^{(7)}_{0,c} \oplus p_{0,c} \oplus s^{(7)}_{1,c} \oplus p_{2,c} \oplus p_{3,c}
\end{aligned}
\]
Predicted parity bits for the remaining bytes of the word can be found with the same procedure, obtaining in this way the complete predicted parity bit matrix:
\[
\begin{aligned}
p'_{0,c} &= p_{0,c} \oplus p_{2,c} \oplus p_{3,c} \oplus s^{(7)}_{0,c} \oplus s^{(7)}_{1,c} \\
p'_{1,c} &= p_{0,c} \oplus p_{1,c} \oplus p_{3,c} \oplus s^{(7)}_{1,c} \oplus s^{(7)}_{2,c} \\
p'_{2,c} &= p_{0,c} \oplus p_{1,c} \oplus p_{2,c} \oplus s^{(7)}_{2,c} \oplus s^{(7)}_{3,c} \\
p'_{3,c} &= p_{1,c} \oplus p_{2,c} \oplus p_{3,c} \oplus s^{(7)}_{3,c} \oplus s^{(7)}_{0,c}
\end{aligned}
\]
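The four prediction equations can be compared against the parity of the real MixColumns output. The Python sketch below is illustrative only: the names are our own, and the input parity bits are recomputed from the column for simplicity, whereas in the scheme they would be the stored parity bits.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8)."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def parity(byte):
    return bin(byte).count("1") & 1


def mix_column(col):
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ (xtime(s1) ^ s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ (xtime(s2) ^ s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ (xtime(s3) ^ s3),
        (xtime(s0) ^ s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


def predict_column_parity(col):
    """Predicted output parity bits from input parities and input MSBs."""
    p = [parity(b) for b in col]     # input parity bits (stored in the real scheme)
    m = [b >> 7 for b in col]        # most significant bit of each input byte
    return [
        p[0] ^ p[2] ^ p[3] ^ m[0] ^ m[1],
        p[0] ^ p[1] ^ p[3] ^ m[1] ^ m[2],
        p[0] ^ p[1] ^ p[2] ^ m[2] ^ m[3],
        p[1] ^ p[2] ^ p[3] ^ m[3] ^ m[0],
    ]


column = [0x87, 0x6E, 0x46, 0xA6]
predicted = predict_column_parity(column)
actual = [parity(b) for b in mix_column(column)]
print(predicted == actual)  # True
```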
Because SubBytes is non-linear, parity prediction for this transformation involves both the input parity and data from the State. Instead of using complex algorithms, Bertoni et al. proposed using a look-up table to predict parity. The S-box is usually implemented as a 256 x 8 bit memory. To generate the outgoing parity bits, an even parity bit can be stored with each data byte in the S-box memory, which then has a size of 256 x 9 bits. To detect errors, from a hardware point of view they suggest replacing the original 8-bit decoder with a 9-bit one, giving a 512 x 9 memory. If a 9-bit address with even parity is decoded, the corresponding output byte with its associated even parity bit is produced. Otherwise, a constant 9-bit word with deliberately odd parity is output. Therefore, half of the entries in the S-box are intentionally wrong. In a similar way, we propose instead to store the parity bit of each S-Box entry in a new table, and to access it whenever parity prediction is needed, in the same way the S-Box itself is accessed. However, a problem with parity bits and SubBytes is that different S-Box values have the same parity bit, with apparently no particular order or law. So if a corrupted byte enters the S-Box, the corrupted output could have the same parity bit as if no error had occurred. To better understand this, let’s look at a simple example: if the input byte is 04, the corresponding S-Box output value is F2. Let’s suppose now that an error with odd parity (we are supposing that only odd-parity errors occur) affects this byte before the transformation. If this error is EF, the value entering the S-Box will be:
\[
04 \oplus EF = EB
\]
This gives E9 as output, which has odd parity just like F2, so we can’t detect the error that has occurred.
That is why we searched for new solutions, and from this research two new ideas came up: one that uses Cyclic Redundancy Checks (CRC) and another that uses both the direct and inverse S-Boxes.
A CRC is an error-detecting code that was developed by W. Wesley Peterson and published in his 1961 paper [28]. It consists of a division using Galois finite field arithmetic, in which the quotient is discarded and the remainder is taken as the result. The remainder is always shorter than the divisor, which therefore determines how long the result can be. The definition of a particular CRC thus comes down to the definition of the divisor used. For example, the parity code (the simplest CRC) uses the two-bit divisor "11". In summary, a pattern of r+1 divisor bits has to be chosen in order to produce r check bits. These computed bits, known as the checksum, are appended to the original byte (Figure 15); after the transformation the checksum can be computed again to verify whether an error was injected during the process.
Figure 15: Data, Checksum packet and the corresponding math expression
Different kinds of CRCs are listed in [29]; for our case we chose the CRC-5-EPC code, which uses the polynomial x^5 + x^3 + 1 as divisor. This choice is a compromise between the number of errors that can be detected and the cost of the computation. In fact, the longer the divisor, the more errors can be detected, but also the more operations are required to compute the checksum. First of all we build a table that stores the checksum of each value stored in the S-box. The checksum prediction then simply consists in reading from that table the value that corresponds to the byte involved in the transformation. Doing that for each byte of the state, we obtain the predicted matrix. After the SubBytes transformation we compute the checksum of each byte of the actual state, building the actual checksum matrix. The final step is the comparison between the predicted matrix and the actual one.
Finally, it is important to underline that it is not possible to detect all the errors, since more than one value in the table has the same checksum. A particular error pattern can thus change the state byte into a value whose S-box output has the same checksum, and in this case the error is not detected.
For example, if a byte of the state is 0x0d, the output of the S-Box is 0xd7, which has 0x68 as checksum. If we now inject the error 0x76, the new byte of the state is 0x7b, producing 0x21 as output, again with 0x68 as checksum. This error will therefore not be detected by the algorithm.
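The collision in this example can be reproduced with a small bitwise CRC. The Python sketch below is not the implementation used in the project: it assumes a zero initial value and appends five zero bits before dividing by x^5 + x^3 + 1, so the numeric checksum it produces may differ from the 0x68 quoted above, which depends on such conventions. Only the equality of the two checksums is checked.

```python
POLY = 0b101001   # divisor x^5 + x^3 + 1 (CRC-5-EPC polynomial)


def crc5(byte):
    """Remainder of byte * x^5 divided by POLY over GF(2), zero initial value."""
    reg = byte << 5                       # append five zero bits
    for bit in range(12, 4, -1):          # cancel bits 12 down to 5
        if reg & (1 << bit):
            reg ^= POLY << (bit - 5)      # align the divisor with the leading 1
    return reg                            # what is left is the 5-bit remainder


sbox_output_correct = 0xD7   # S-box output for the state byte 0x0d (from the text)
sbox_output_faulty = 0x21    # S-box output for the faulty byte 0x7b (from the text)
print(crc5(sbox_output_correct) == crc5(sbox_output_faulty))  # True
```

The two outputs differ by 0xF6, which is a multiple of the divisor in GF(2) arithmetic; this is exactly the kind of error pattern a CRC cannot see.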
The other solution is instead a kind of reverse SubBytes transformation that corrects errors in the output state. It doesn’t use the bit prediction matrix, because we have seen that this prediction doesn’t detect all errors. In this solution detection and correction are performed at the same time, so we’ll discuss how it works later, in the correction section.
To describe the error detection process we refer to the parity bits before the transformation as input parity bits (we denote them simply as p_{i,j}) and assume that they are always error free. The parity bits calculated from the input parity and the State data are called predicted parity bits (p’_{i,j}). Using the predicted and actual parity bits after each transformation it’s possible to understand exactly which byte has been affected by an error. This is obvious for AddRoundKey, the first solution for SubBytes and the Shiftrows transformation, since in these transformations errors do not spread between elements of the state. For these operations it is then possible to XOR the predicted and actual parity bits: any difference between them indicates where the error happened. To have a better understanding of how it works, let’s suppose that we have computed for one of these operations the following actual and predicted parity bit matrices:
\[
\underbrace{\begin{bmatrix} 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{\text{Actual parity bits}}
\qquad
\underbrace{\begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{\text{Predicted parity bits}}
\]
The difference between the actual and predicted parity bits in the second row of the first
column means that an error occurred in the second byte of the first word.
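This comparison amounts to an element-wise XOR of the two matrices. A minimal Python sketch, with a function name of our own choosing and positions reported as (row, column) counted from 0, could look like this:

```python
def locate_errors(actual, predicted):
    """Return the (row, column) positions where the parity bits differ."""
    return [(r, c)
            for r in range(4)
            for c in range(4)
            if actual[r][c] ^ predicted[r][c]]


actual = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
predicted = [
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
print(locate_errors(actual, predicted))  # [(1, 0)]
```

Position (1, 0) is the second row of the first column, i.e. the second byte of the first word.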
In the case of the MixColumns transformation, instead, each single-byte error diffuses into the complete word in which it was injected. However, for any error exactly three of the four output parity bits associated with the output word are changed because, as shown in figure 16, each parity bit occurs in only 3 of the four equations. We can define an error pattern in the parity bits as the XOR between the predicted and the actual ones. Knowing the error pattern and whether the most significant bit of the data is erroneous, the table shown in figure 16 makes it possible to understand which byte of the word was erroneous.
Figure 16: Changes to the parity bits for the j-th output Word after MixColumns transformation depending on the error pattern induced and the byte of the Word affected by error [24]
For example, if the most significant bit of the data is error free and the predicted bits are:
\[
\begin{bmatrix} 1 & 0 & 0 & 1 \end{bmatrix}
\]
whereas the actual ones are:
\[
\begin{bmatrix} 1 & 1 & 1 & 0 \end{bmatrix}
\]
then the corresponding error pattern for the parity bits is the exclusive-OR between the predicted and the actual ones:
\[
\begin{bmatrix} 0 & 1 & 1 & 1 \end{bmatrix}
\]
Now we have got all the necessary information to detect the error. Looking at figure 16, the
corresponding byte to our case is byte s1,j .
2.6.2 Parity Bytes and error correction
In addition to parity bits, Czapski and Nikodem [24] suggested also using a single parity byte p_j
for each word W_j of the State in order to perform error correction:
\[ p_j = \bigoplus_{i=0}^{3} s_{i,j} \qquad (3) \]
As for parity bits, we use the same hypotheses and notation for actual and predicted parity
bytes. First of all we want to show that it is possible to perform the parity byte prediction for
each transformation of the AES algorithm.
We know that the ShiftRows transformation consists in rotating the rows of the state matrix. Thus
the output parity byte for the j-th Word is determined as:
\[ p'_j = s_{0,j} \oplus s_{1,(j+1) \bmod 4} \oplus s_{2,(j+2) \bmod 4} \oplus s_{3,(j+3) \bmod 4} \qquad (4) \]
Extracting s_{0,j} from (3) as a function of p_j and the other bytes, and substituting it into (4), we obtain:
\[ p'_j = p_j \oplus s_{1,j} \oplus s_{2,j} \oplus s_{3,j} \oplus s_{1,(j+1) \bmod 4} \oplus s_{2,(j+2) \bmod 4} \oplus s_{3,(j+3) \bmod 4} \]
Similarly, AddRoundKey is a linear transformation in which each byte of the State is XOR-ed
with the corresponding byte of the round key matrix. So the output parity byte can be easily
predicted according to the following formula:
\[ p'_j = (s_{0,j} \oplus k_{0,j}) \oplus (s_{1,j} \oplus k_{1,j}) \oplus (s_{2,j} \oplus k_{2,j}) \oplus (s_{3,j} \oplus k_{3,j}) = p_j \oplus pk_j \]
where pk_j is the parity byte of the j-th word of the round key matrix.
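Parity byte prediction in MixColumns is a little more complex than for the other operations.
Referring to the definition of the transformation, the output parity byte that should be
produced can be expressed as:
\[ \begin{aligned}
p'_j &= s'_{0,j} \oplus s'_{1,j} \oplus s'_{2,j} \oplus s'_{3,j} \\
     &= (02 \cdot s_{0,j} \oplus 03 \cdot s_{1,j} \oplus s_{2,j} \oplus s_{3,j}) \\
     &\quad \oplus (s_{0,j} \oplus 02 \cdot s_{1,j} \oplus 03 \cdot s_{2,j} \oplus s_{3,j}) \\
     &\quad \oplus (s_{0,j} \oplus s_{1,j} \oplus 02 \cdot s_{2,j} \oplus 03 \cdot s_{3,j}) \\
     &\quad \oplus (03 \cdot s_{0,j} \oplus s_{1,j} \oplus s_{2,j} \oplus 02 \cdot s_{3,j})
\end{aligned} \]
Grouping the terms by input byte (the XOR sum is associative and commutative, and the
multiplication distributes over it):
\[ p'_j = s_{0,j} \cdot (02 \oplus 01 \oplus 01 \oplus 03) \oplus s_{1,j} \cdot (03 \oplus 02 \oplus 01 \oplus 01) \oplus s_{2,j} \cdot (01 \oplus 03 \oplus 02 \oplus 01) \oplus s_{3,j} \cdot (01 \oplus 01 \oplus 03 \oplus 02) \]
And since 03 \oplus 02 = 01 we get:
\[ p'_j = s_{0,j} \oplus s_{1,j} \oplus s_{2,j} \oplus s_{3,j} = p_j \]
Therefore the MixColumns transformation preserves the parity byte of the word.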
Finally, SubBytes is a non-linear transformation of each byte of the state. The output parity
byte after the transformation is:
\[ p'_j = s'_{0,j} \oplus s'_{1,j} \oplus s'_{2,j} \oplus s'_{3,j} \]
where s'_{i,j} = a \cdot s_{i,j}^{-1} \oplus b according to the definition of the transformation. Thus we
obtain:
\[ p'_j = a \cdot \left( s_{0,j}^{-1} \oplus s_{1,j}^{-1} \oplus s_{2,j}^{-1} \oplus s_{3,j}^{-1} \right) \qquad (5) \]
since b \oplus b \oplus b \oplus b = 0 and the operation \cdot is distributive over \oplus. From (5) and (3) we
can write the formula to predict the parity byte we should have after the transformation:
\[ p'_j = a \cdot \left[ \left( p_j \oplus s_{1,j} \oplus s_{2,j} \oplus s_{3,j} \right)^{-1} \oplus s_{1,j}^{-1} \oplus s_{2,j}^{-1} \oplus s_{3,j}^{-1} \right] \qquad (6) \]
Using the predicted and actual parity bytes of the word, it is possible to determine the error
injected in the state.
Using the parity bits, it is also possible to know in which byte of the state this error has occurred.
On this basis we can build a correction matrix and, by simply XORing the correction matrix with
the output state of the current AES transformation, correct errors affecting an odd number of bits
in a byte of the state.
Figure 17: AES’s input State with parity bits and bytes, output State and correction mask [24]
Then, XORing the predicted and the actual parity byte, we obtain the correction mask to apply
to the erroneous byte of the state after the SubBytes transformation.
For the 3rd solution, instead, the byte prediction allows us to compare the predicted parity byte
vector computed before SubBytes with the parity byte vector of the output state. This way, we are
able to detect almost all erroneous words of the output state. Then, for each i-th row of the j-th
erroneous word, we use the inverse SBox to determine the previous byte[i][j] of the state
before the transformation. This shows whether the substituted byte[i][j] of the actual state is
the SBox value corresponding to the previous state. If the bytes are different, we
compute the SBox value of the previous state for the erroneous byte[i][j] and we inject it into the
output state.
For the other transformations we follow again [24].
Let's call \Delta p_j the error revealed by the XOR between predicted and actual parity bytes, and
s_{i,j} the output byte of a transformation; for ShiftRows it is easy to verify that we get:
\[ \Delta p_j = p'_j \oplus s_{0,j} \oplus s_{1,(j+1) \bmod 4} \oplus s_{2,(j+2) \bmod 4} \oplus s_{3,(j+3) \bmod 4} \]
Replacing p'_j with the predicted parity byte expression of ShiftRows and cancelling equal
terms, we obtain:
\[ \Delta p_j = p_j \oplus s_{0,j} \oplus s_{1,j} \oplus s_{2,j} \oplus s_{3,j} \]
Similarly, in AddRoundKey:
\[ \Delta p_j = p'_j \oplus \bigoplus_{i=0}^{3} \left( s_{i,j} \oplus k_{i,j} \right) = p_j \oplus s_{0,j} \oplus s_{1,j} \oplus s_{2,j} \oplus s_{3,j} \]
In both cases the error is equal to 0 if and only if the j-th Word is error free; otherwise it is
equal to the error e_{i,j} injected in the byte of the state. Therefore, the correction matrix ₵ is an
all-zero 4x4 matrix except for the byte where the error was detected (row r of word w), in which
we put the error pattern:
\[ c_{i,j} = \begin{cases} e_{r,w} & \text{if } i = r,\ j = w \\ 0 & \text{elsewhere} \end{cases} \]
In the case of MixColumns we have to consider that a single byte error spreads across the
whole Word. Let's consider for example a state matrix affected by an error in one byte of a
certain Word, and perform the MixColumns transformation on it by multiplying the word by
the characteristic matrix of the operation to describe the output state:
\[ \begin{pmatrix} s'_{0,j} \\ s'_{1,j} \\ s'_{2,j} \\ s'_{3,j} \end{pmatrix} =
\begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix}
\begin{pmatrix} s_{0,j} \oplus e_{0,j} \\ s_{1,j} \oplus e_{1,j} \\ s_{2,j} \oplus e_{2,j} \\ s_{3,j} \oplus e_{3,j} \end{pmatrix} \]
Since we supposed that at most one byte can be affected by an error in a word, only one of the
four e_{i,j} is different from zero, and it is easy to see that the output is the output state we
expect without error, plus an error vector e' depending on which byte has been corrupted:
\[ \begin{pmatrix} e'_{0,j} \\ e'_{1,j} \\ e'_{2,j} \\ e'_{3,j} \end{pmatrix} =
\begin{cases}
(02 \cdot e_{0,j},\ e_{0,j},\ e_{0,j},\ 03 \cdot e_{0,j})^T & \text{if } e_{0,j} \neq 0 \\
(03 \cdot e_{1,j},\ 02 \cdot e_{1,j},\ e_{1,j},\ e_{1,j})^T & \text{if } e_{1,j} \neq 0 \\
(e_{2,j},\ 03 \cdot e_{2,j},\ 02 \cdot e_{2,j},\ e_{2,j})^T & \text{if } e_{2,j} \neq 0 \\
(e_{3,j},\ e_{3,j},\ 03 \cdot e_{3,j},\ 02 \cdot e_{3,j})^T & \text{if } e_{3,j} \neq 0
\end{cases} \]
The correction matrix ₵ is then an all-zero 4-Word matrix except for the w-th word, which has
been corrupted (and which we are able to identify thanks to the parity bits); this word has the
structure of the error vector e' shown above.
Chapter 3
Implementation of the error detection and error
correction algorithm
In this chapter we describe the implementation phase: we modified a software implementation of
AES so that it can recognize whether and where an error occurred and, where possible, correct it,
based on the theoretical description of the previous chapters. We used a simple C
implementation [30] of AES, trying to keep the correction/detection part as separate as
possible from the original AES algorithm, so that future developments on more efficient AES
implementations can include this part without rewriting the whole code. Pieces of code are
also presented to make the description as understandable as possible, but we omit the
for loops and if structures that are not needed to get a global idea of the implementation. For the
complete code see Appendix A at the end of this report.
We used only the following variables from the original code:
unsigned char RoundKey[240];
unsigned char State[4][4];
The first one stores all the Nb(Nr+1) round key words computed by the key scheduler, where Nb is
the number of columns of the AES state (usually 4) and Nr is the number of rounds, which depends
on the length of the input key; the second one stores the state after each transformation. We
purposely left the two [4][4] dimensions here because they are part of the original AES code; in
the variables and functions we added, we replaced them with a “#define Nb 4”.
3.1 Parity bit and byte computation
The function getParityBit() takes as input a matrix in which the computed parity bits are going
to be stored:
void getParityBit(unsigned char bit[][Nb])
Usually this matrix is:
unsigned char ActualParityBit[Nb][Nb];
It stores the parity bits relative to the current state, but sometimes it will be:
unsigned char PredictedParityBit[Nb][Nb];
because, as we will see, for some operations these two matrices are almost the same.
First, we obtain the value of the least significant bit of the byte whose parity bit we want
by performing a bitwise AND between the i-th byte of the j-th word of the state
and a mask variable formed by all zeros except the least significant bit. If we then shift this
byte right by 1 position and apply the AND operation with the same mask, we
obtain the second least significant bit. So, repeating this for all the 8 bits of the
byte, and XORing all the resulting bits, we get the parity bit associated with the byte in question.
bit[j][i]^=((state[j][i]>>k)&Mask);
Here i and j vary from 0 to 3 to identify bytes and words of the state, and k from 0 to 7 to
identify after the AND the k-th bit of the byte.
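As a minimal, self-contained sketch of the loop described above, the parity bit of each byte can be computed by XORing its eight bits, each isolated by a shift and the 0x01 mask. The helper names parity_bit() and parity_bit_matrix() are ours and are not part of the project code; here bit[i][j] refers to row i of word j.

```c
#include <assert.h>

#define Nb 4
#define MASK 0x01   /* isolates the least significant bit */

/* Parity bit of one byte: XOR of its 8 bits (1 if the number of set bits is odd). */
unsigned char parity_bit(unsigned char byte)
{
    unsigned char bit = 0;
    for (int k = 0; k < 8; k++)
        bit ^= (byte >> k) & MASK;
    return bit;
}

/* Fills bit[i][j] with the parity bit of st[i][j] for the whole state. */
void parity_bit_matrix(unsigned char st[Nb][Nb], unsigned char bit[Nb][Nb])
{
    assert(st != 0 && bit != 0);
    for (int i = 0; i < Nb; i++)
        for (int j = 0; j < Nb; j++)
            bit[i][j] = parity_bit(st[i][j]);
}
```

For instance, 0x63 (four bits set) has parity bit 0, while 0x7c (five bits set) has parity bit 1.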
Regarding parity bytes, the function getParityByte() receives a 4-byte vector in which the
computed parity bytes are stored.
void getParityByte(unsigned char byte[])
Also in this case the vector is usually the one related to the actual state:
unsigned char ActualParityByte[Nb];
But since MixColumns doesn't change the parity bytes, we use the same function to store also
the predicted parity bytes in the vector
unsigned char PredictedParityByte[Nb];
The j-th parity byte is computed by simply XORing all the bytes in the j-th
word.
byte[j] ^= state[i][j];
Doing this for each word gives us the complete parity byte vector.
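A minimal sketch of the complete loop is given below. The function name parity_bytes() and the explicit state parameter are our own choices for a self-contained example; the project code works on the global state instead.

```c
#include <assert.h>

#define Nb 4

/* Computes the parity byte of each word (column) j as the XOR of its 4 bytes. */
void parity_bytes(unsigned char st[Nb][Nb], unsigned char byte[Nb])
{
    assert(st != 0 && byte != 0);
    for (int j = 0; j < Nb; j++) {
        byte[j] = 0x00;              /* reset before accumulating */
        for (int i = 0; i < Nb; i++)
            byte[j] ^= st[i][j];     /* st[i][j]: row i of word j */
    }
}
```

Note that the accumulator has to be cleared before the XOR loop, otherwise the result of a previous call would leak into the new parity byte.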
Two different functions are used to get the parity bits and bytes of the RoundKey in a specific
round:
void RoundKeyParityBit(int round);
void RoundKeyParityByte(int round);
The operations performed by these two functions are the same as those mentioned above, but
here we need to pass the round number to the function in order to extract the current round key
from the RoundKey vector:
RKeyParityBit[j][i]^=((RoundKey[round * Nb * 4 + i * Nb +j]>>k)&Mask);
RKeyParityByte[i] ^= (RoundKey[round * Nb * 4 + i * Nb + j]);
3.1.1 Parity Check
The function ParityBitCheck() is used after each AddRoundKey, ShiftRows and SubBytes
transformation to compare the parity bits computed on the output state of the transformation
with the parity bits predicted beforehand.
If the comparison reveals a difference, an error occurred and a correction operation is needed.
For this reason we set the error flag to 1, together with a flag inside a Boolean vector that
identifies which word has been corrupted. In a vector of integers we also store the position,
within that word, of the byte affected by the error:
if(ActualParityBit[i][j]!=PredictedParityBit[i][j])
{
error=1;
ErrWord[j]=1;
ErrByte[j]=i;
}
3.2 Error detection and correction for ShiftRows
The ShiftRows step of the AES algorithm is one of the simplest. In fact, as we have seen in the
AES description before, it just shifts the rows of the current state in this way:
Row 1: not modified
Row 2: shifted one position to the left
Row 3: shifted two positions to the left
Row 4: shifted three positions to the left
It is then possible to reuse this process to perform the parity prediction that will help to locate
errors.
3.2.1 Parity prediction
The first parity prediction to perform is the parity bit prediction. The parity bit matrix is a 4x4
matrix of binary values, each of which is the parity of the corresponding byte of the state (the
XOR of all the bits of the byte).
The first step is to compute the parity bit matrix of the current state; then we reuse the ShiftRows
process to shift the bits of this matrix and obtain the predicted parity bit matrix.
void SR_BitPrediction()
{
    unsigned char temp;
    // Start from the parity bits of the current state
    getParityBit(PredictedParityBit);
    // Rotate second row (index 1) 1 column to the left
    temp=PredictedParityBit[1][0];
    PredictedParityBit[1][0]=PredictedParityBit[1][1];
    PredictedParityBit[1][1]=PredictedParityBit[1][2];
    PredictedParityBit[1][2]=PredictedParityBit[1][3];
    PredictedParityBit[1][3]=temp;
    // Rotate third row (index 2) 2 columns to the left
    temp=PredictedParityBit[2][0];
    PredictedParityBit[2][0]=PredictedParityBit[2][2];
    PredictedParityBit[2][2]=temp;
    temp=PredictedParityBit[2][1];
    PredictedParityBit[2][1]=PredictedParityBit[2][3];
    PredictedParityBit[2][3]=temp;
    // Rotate fourth row (index 3) 3 columns to the left
    temp=PredictedParityBit[3][0];
    PredictedParityBit[3][0]=PredictedParityBit[3][3];
    PredictedParityBit[3][3]=PredictedParityBit[3][2];
    PredictedParityBit[3][2]=PredictedParityBit[3][1];
    PredictedParityBit[3][1]=temp;
}
Then we performed the Parity Byte Prediction to predict the parity byte vector after ShiftRows.
p’j= S0, j ⊕ S1, j+1 mod 4 ⊕ S2, j+2 mod 4 ⊕ S3, j+3 mod 4.
void SR_BytePrediction()
{
…
PredictedParityByte[j] =
ActualParityByte[j] ^ state[1][j] ^
state[2][j] ^ state[3][j] ^ state[1][(j+1)%4] ^
state[2][(j+2)%4] ^
state[3][(j+3)%4];
…
}
3.2.2 Error correction
The error correction for ShiftRows is performed by first locating the errors and then correcting
them. In our algorithm, we use the ParityBitCheck() function, which returns 1 if one or more errors
are detected. If 1 is returned, the correction starts; otherwise nothing happens.
void Correction()
{
…
if (ErrWord[j]==1)
{
CorrectionParityByte[j] =
PredictedParityByte[j] ^
state[0][j] ^ state[1][j] ^ state[2][j] ^ state[3][j];
state[ErrByte[j]][j]^=CorrectionParityByte[j];
}
…
getParityByte(ActualParityByte);
error=0;
}
The Boolean variable error is in the end set to 0 again to indicate that no errors are now
affecting the actual state. The same procedure for error correction will be used for
AddRoundKey and the parity bit version of SubBytes.
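The whole ShiftRows procedure (prediction, transformation, correction) can be sketched as below. This is an illustration under our own assumptions rather than the project code: the function names are hypothetical, the state is passed explicitly instead of being global, and the position of the erroneous byte (row, word) is supposed to have been located already by the parity bit check.

```c
#include <assert.h>

#define Nb 4

/* Predicts the parity byte of each output word of ShiftRows from the input state. */
void sr_byte_prediction(unsigned char st[Nb][Nb], unsigned char pred[Nb])
{
    for (int j = 0; j < Nb; j++) {
        unsigned char p = st[0][j] ^ st[1][j] ^ st[2][j] ^ st[3][j];   /* p_j */
        pred[j] = p ^ st[1][j] ^ st[2][j] ^ st[3][j]
                    ^ st[1][(j + 1) % Nb] ^ st[2][(j + 2) % Nb] ^ st[3][(j + 3) % Nb];
    }
}

/* ShiftRows: row i is rotated i positions to the left. */
void shift_rows(unsigned char st[Nb][Nb])
{
    for (int i = 1; i < Nb; i++) {
        unsigned char row[Nb];
        for (int j = 0; j < Nb; j++)
            row[j] = st[i][(j + i) % Nb];
        for (int j = 0; j < Nb; j++)
            st[i][j] = row[j];
    }
}

/* Corrects one byte of a word: the XOR between the predicted parity byte and
   the parity byte of the current word is exactly the error mask. */
void correct_byte(unsigned char st[Nb][Nb], const unsigned char pred[Nb], int row, int word)
{
    assert(row >= 0 && row < Nb && word >= 0 && word < Nb);
    unsigned char mask = pred[word] ^ st[0][word] ^ st[1][word] ^ st[2][word] ^ st[3][word];
    st[row][word] ^= mask;
}
```

The correction only restores the state if a single byte of the word is wrong; with two erroneous bytes in the same word the mask is the XOR of both errors and cannot be attributed to either byte.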
3.3 Error detection and correction for MixColumns
In order to perform the MixColumns bit prediction we need to implement the equations [18 –
Appendix A]:
Therefore, first of all we need a function to compute the most significant bit of a byte. We call
it getMostsignificantBit(). This function accepts a byte as input, shifts it right seven times and
performs an AND between the result and the byte mask 0x01. The function returns the
bool value of the most significant bit.
bool getMostsignificantBit(unsigned char data_received)
{
    bool MSB;
    // Parentheses are needed: in C, == binds tighter than &
    if(((data_received>>7)&Mask)==1)
        MSB=1;
    else
        MSB=0;
    return MSB;
}
3.3.1 Parity Prediction
The bit prediction is completed by the function MC_BitPrediction(). This function first computes
the actual parity bit matrix before the MixColumns transformation and then computes the
predicted parity bit matrix according to the previous equations:
void MC_BitPrediction()
{
    getParityBit(ActualParityBit);
    for(int i=0;i<4;i++)
    {
        PredictedParityBit[0][i]=ActualParityBit[0][i]^ActualParityBit[2][i]^ActualParityBit[3][i]
            ^getMostsignificantBit(state[0][i])^getMostsignificantBit(state[1][i]);
        PredictedParityBit[1][i]=ActualParityBit[0][i]^ActualParityBit[1][i]^ActualParityBit[3][i]
            ^getMostsignificantBit(state[1][i])^getMostsignificantBit(state[2][i]);
        PredictedParityBit[2][i]=ActualParityBit[0][i]^ActualParityBit[1][i]^ActualParityBit[2][i]
            ^getMostsignificantBit(state[2][i])^getMostsignificantBit(state[3][i]);
        PredictedParityBit[3][i]=ActualParityBit[1][i]^ActualParityBit[2][i]^ActualParityBit[3][i]
            ^getMostsignificantBit(state[3][i])^getMostsignificantBit(state[0][i]);
    }
}
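The prediction relies on two facts that can be checked numerically: the parity of 02·s equals the parity of s XOR its most significant bit (the reduction constant 0x1b has even parity), and the parity of 03·s equals the most significant bit of s. The following sketch, with hypothetical helper names and a single word passed explicitly, compares the predicted parity bits with those of the real MixColumns output.

```c
#include <assert.h>

/* Multiplication by 0x02 in GF(2^8), truncated to 8 bits. */
unsigned char xtime(unsigned char x)
{
    return (unsigned char)((x << 1) ^ ((x >> 7) * 0x1b));
}

unsigned char parity_bit(unsigned char b)
{
    b ^= b >> 4; b ^= b >> 2; b ^= b >> 1;   /* fold the 8 bits into bit 0 */
    return b & 1;
}

unsigned char msb(unsigned char b) { return (b >> 7) & 1; }

/* MixColumns on one word: out[i] = 02*s[i] ^ 03*s[i+1] ^ s[i+2] ^ s[i+3]. */
void mix_column(const unsigned char s[4], unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned char a = s[i], b = s[(i + 1) % 4], c = s[(i + 2) % 4], d = s[(i + 3) % 4];
        out[i] = (unsigned char)(xtime(a) ^ (xtime(b) ^ b) ^ c ^ d);
    }
}

/* Predicted parity bits of the output word, using only the input parity
   bits and the most significant bits of the input bytes. */
void mc_predict_parity_bits(const unsigned char s[4], unsigned char pred[4])
{
    assert(s != 0 && pred != 0);
    for (int i = 0; i < 4; i++)
        pred[i] = parity_bit(s[i]) ^ parity_bit(s[(i + 2) % 4]) ^ parity_bit(s[(i + 3) % 4])
                ^ msb(s[i]) ^ msb(s[(i + 1) % 4]);
}
```

For the word (d4, bf, 5d, 30), whose MixColumns output is (04, 66, 81, e5), both the prediction and the actual output give the parity bits (1, 0, 0, 1).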
But, as we said in section 2.4.1, the information about the predicted and actual parity bits alone
is not enough to locate errors in the MixColumns transformation, since the error spreads over the
whole word. Thus it is necessary to define another function, MC_ParityBitCheck(). This function is
basically an implementation of the table in figure 16, which makes it possible to determine which
byte in the word is erroneous.
As can be seen in the table, in order to recognize the wrong byte in the word we need two
values: the most significant bit of the byte error pattern and the parity bit error pattern of the
word. Regarding the byte error pattern, it is important to underline that we assumed that no
more than one byte is incorrect in each word. This means that by XORing
the predicted and the actual parity byte we obtain exactly the error pattern of the erroneous
byte. We call the result of this operation the correction parity byte; from this parameter
we can tell whether there is an error and which error mask affects the wrong byte.
If there is an error (that is, the correction parity byte is not 0x00), we get its most significant
bit with the getMostsignificantBit() function, obtaining the first piece of information.
if(CorrectionParityByte[i]!=0)
{
error=1;
bool msb=getMostsignificantBit(CorrectionParityByte[i]);
Regarding the parity bit error pattern, we just have to XOR the predicted and actual parity bits of
the word involved. Once this vector is known, we look for the column of the error bit pattern
matrix in which it is stored; the index of that column tells us which byte is the erroneous one.
The check is made by comparing the parity bit error pattern with each column of the error bit
pattern matrix, for each word of the state. In particular, when the most significant bit of the
byte error pattern is equal to zero, the matrix involved is the upper part of figure 16. Finally,
we store in the vector ErrByte the position of the erroneous byte for each word. Below is the code
for the case of the most significant bit equal to zero.
if(msb==false)
{
for(int k=0;k<4;k++)
{
if(PBitError_Pattern[0][i]==matrix_msb0[0][k] &&
PBitError_Pattern[1][i]==matrix_msb0[1][k] &&
PBitError_Pattern[2][i]==matrix_msb0[2][k] &&
PBitError_Pattern[3][i]==matrix_msb0[3][k])
{
ErrByte[i]=k;
}
}
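The lookup for both values of the most significant bit can be sketched as follows. The contents of the two pattern matrices are an assumption on our side: they are not copied from figure 16 but derived from the MixColumns matrix for an error of odd multiplicity in a single byte (coefficient 01 always flips the parity bit, 02 flips it only when the most significant bit of the error is 0, 03 only when it is 1). They agree with the worked example of section 2.6.1, where the pattern (0, 1, 1, 1) with an error-free most significant bit points to byte s1,j. The function name mc_locate_byte() is hypothetical.

```c
#include <assert.h>

/* Parity bit error patterns: one column per erroneous byte k, one row per
   output parity bit. Derived from the MixColumns matrix (our assumption). */
const unsigned char matrix_msb0[4][4] = {   /* MSB of the byte error is 0 */
    {1, 0, 1, 1},
    {1, 1, 0, 1},
    {1, 1, 1, 0},
    {0, 1, 1, 1}
};
const unsigned char matrix_msb1[4][4] = {   /* MSB of the byte error is 1 */
    {0, 1, 1, 1},
    {1, 0, 1, 1},
    {1, 1, 0, 1},
    {1, 1, 1, 0}
};

/* Returns the index k of the erroneous byte, or -1 if no column matches. */
int mc_locate_byte(const unsigned char pattern[4], int msb)
{
    assert(msb == 0 || msb == 1);
    const unsigned char (*m)[4] = msb ? matrix_msb1 : matrix_msb0;
    for (int k = 0; k < 4; k++)
        if (pattern[0] == m[0][k] && pattern[1] == m[1][k] &&
            pattern[2] == m[2][k] && pattern[3] == m[3][k])
            return k;
    return -1;
}
```

The same parity bit pattern maps to different bytes in the two matrices, which is why the most significant bit of the error is needed in addition to the pattern.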
3.2.2 Error correction
By means of error detection it is possible to determine which byte of the word is erroneous.
With this information it is possible to obtain a correction vector that is a function of the position
of the wrong byte in the word affected by the error. Thus each column of the correction matrix
consists of all zeros if there is no error in the corresponding word; otherwise it is a correction
vector that depends on the position of the error inside the word:
Figure 18: Correction matrix for MixColumn transformation [24]
In order to build the correction matrix we need to multiply the error mask by coefficients such as
0x02 and 0x03 in the Galois field GF(2^8). Therefore the first step for the error correction is the
implementation of multiplication in GF(2^8). This operation has been well described by NIST
in [25].
In brief, we can compute the product of a generic byte f = (b_7, b_6, ..., b_0) and the value x
(in our case 0x02) in this way:
\[ x \cdot f(x) = \begin{cases}
(b_6, b_5, b_4, b_3, b_2, b_1, b_0, 0) & \text{if } b_7 = 0 \\
(b_6, b_5, b_4, b_3, b_2, b_1, b_0, 0) \oplus (0, 0, 0, 1, 1, 0, 1, 1) & \text{if } b_7 = 1
\end{cases} \]
Then the product between 0x03 and f can be computed by XORing the byte f (which can be seen as
0x01•f) with 0x02•f.
In the AES code we used, this operation is performed by a macro, xtime:
#define xtime(x) ((x<<1) ^ (((x>>7) & 1) * 0x1b))
As one can see, x<<1 yields (b_6, b_5, b_4, b_3, b_2, b_1, b_0, 0). The second operand of the
XOR is equal to zero if the most significant bit b_7 is equal to 0, and to 0x1b otherwise. b_7 is
obtained by performing an AND between the byte shifted right seven times and the
mask 0x01.
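As a small check of the multiplication rule, the sketch below rewrites xtime as a function (our choice, so that the result is explicitly truncated to 8 bits; the macro relies on the assignment to an unsigned char for that) and adds a hypothetical helper gf_mul3() for the product by 0x03. The expected values in the usage note come from the multiplication example of the AES specification, where {57}•{02} = {ae} and {ae}•{02} = {47}.

```c
#include <assert.h>

/* Multiplication by 0x02 in GF(2^8): shift left, then reduce by 0x1b if the
   most significant bit was set. The cast drops the bit shifted out of the byte. */
unsigned char xtime(unsigned char x)
{
    return (unsigned char)((x << 1) ^ (((x >> 7) & 1) * 0x1b));
}

/* Multiplication by 0x03: 0x02*f XOR 0x01*f. */
unsigned char gf_mul3(unsigned char f)
{
    return (unsigned char)(xtime(f) ^ f);
}
```

For example, xtime(0x57) gives 0xae, xtime(0xae) gives 0x47 and gf_mul3(0x57) gives 0xf9.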
An example of the code used to build the correction matrix in the case of error located in the
first row is shown below:
if(CorrectionParityByte[i]!=0)
{
if(ErrByte[i]==0)
{
CorrectionMatrix[0][i]=xtime(CorrectionParityByte[i]);
CorrectionMatrix[1][i]=CorrectionParityByte[i];
CorrectionMatrix[2][i]=CorrectionParityByte[i];
CorrectionMatrix[3][i]=xtime(CorrectionParityByte[i])^
CorrectionParityByte[i];
}
The final step is just the XOR between the correction matrix obtained and the erroneous state
in the output of Mixcolumns transformation.
…
state[i][j]^=CorrectionMatrix[i][j];
…
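The code above covers an error located in the first row; the general case follows the error vector e' of chapter 2. A compact sketch for any row k is given below, assuming (as in the text) a single erroneous byte per word. The names mc_correction_vector() and mc_apply_correction() are ours, and xtime is written as a function for self-containment.

```c
#include <assert.h>

unsigned char xtime(unsigned char x)
{
    return (unsigned char)((x << 1) ^ (((x >> 7) & 1) * 0x1b));
}

/* Builds the correction vector for an error mask e located in byte k of the
   input word: column k of the MixColumns matrix multiplied by e. */
void mc_correction_vector(unsigned char e, int k, unsigned char corr[4])
{
    assert(k >= 0 && k < 4);
    for (int i = 0; i < 4; i++)
        corr[i] = e;                                  /* coefficient 01 */
    corr[k] = xtime(e);                               /* coefficient 02 on the diagonal */
    corr[(k + 3) % 4] = (unsigned char)(xtime(e) ^ e);/* coefficient 03 in the row above */
}

/* XORs the correction vector into word j of the state. */
void mc_apply_correction(unsigned char st[4][4], int j, const unsigned char corr[4])
{
    assert(j >= 0 && j < 4);
    for (int i = 0; i < 4; i++)
        st[i][j] ^= corr[i];
}
```

For k = 0 this reproduces the four assignments shown above: 02·e in row 0, e in rows 1 and 2, and 03·e in row 3.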
3.3 Error detection and correction for AddRoundKey
Before each AddRoundKey transformation we need to get the actual parity bit and byte of the
state by the functions described above to perform the prediction.
3.3.1 Parity Prediction
The function ARK_BitPrediction() receives as argument the current round number and
computes the predicted parity bits simply XORing the actual ones with the parity bits
computed for the round key. So first of all we need to call the RoundKeyParityBit() function
that builds the RKeyParityBit matrix containing the parity bits for the round key specified by
the number of the actual round given by argument:
RKeyParityBit[j][i]^=((RoundKey[round * Nb * 4 + i * Nb +j]>>k)&Mask);
As the round keys are all stored in one single vector, to enter this vector in the right position
and get the round key we need, we use the round number and the number of columns Nb to
go through the whole round key. Then the parity bit is computed using the Mask 0x01 in the
same way the getParityBit() function does.
Once the RKeyParityBit matrix has been computed we can get the predicted parity bits:
PredictedParityBit[j][i]=ActualParityBit[j][i]^RKeyParityBit[j][i];
The byte prediction, instead, is performed by the ARK_BytePrediction() function which, similarly
to the bit prediction function explained just before, first invokes
RoundKeyParityByte() to compute the parity byte for the round key specified by the
argument, XORing the 4 bytes of the i-th word of the round key:
RKeyParityByte[i] ^= (RoundKey[round * Nb * 4 + i * Nb + j]);
Then the predicted parity byte is the XOR between the actual parity byte and the round key
parity byte:
PredictedParityByte[j] =
ActualParityByte[j] ^ RKeyParityByte[j];
After the transformation we use the ParityBitCheck() to check if an error occurred.
3.3.2 Error Correction
Using the variable error then we perform the correction if and only if an error has been
detected. The correction is performed using the same Correction() function used for
ShiftRows.
ParityBitCheck();
if(error==1)
{
Correction();
}
else getParityByte(ActualParityByte);
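The byte prediction for AddRoundKey can be checked with a small self-contained sketch. For simplicity the round key is passed here as a 4x4 matrix with the same row/word layout as the state, which is an assumption of this example (the project code reads it from the flat RoundKey vector); the function names are hypothetical as well.

```c
#include <assert.h>

#define Nb 4

/* Predicted parity byte of output word j of AddRoundKey: p'_j = p_j ^ pk_j. */
void ark_byte_prediction(unsigned char st[Nb][Nb], unsigned char key[Nb][Nb],
                         unsigned char pred[Nb])
{
    assert(pred != 0);
    for (int j = 0; j < Nb; j++) {
        unsigned char p = 0, pk = 0;
        for (int i = 0; i < Nb; i++) {
            p  ^= st[i][j];     /* parity byte of the state word */
            pk ^= key[i][j];    /* parity byte of the round key word */
        }
        pred[j] = p ^ pk;
    }
}

/* AddRoundKey: every byte of the state is XOR-ed with the round key byte. */
void add_round_key(unsigned char st[Nb][Nb], unsigned char key[Nb][Nb])
{
    for (int i = 0; i < Nb; i++)
        for (int j = 0; j < Nb; j++)
            st[i][j] ^= key[i][j];
}
```

After an error-free AddRoundKey the parity byte of each output word equals the predicted one; a single-byte error e in word j makes them differ exactly by e, which is the mask used by Correction().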
3.4 Error detection and correction for SubBytes
The SubBytes process, as it was described in the AES algorithm definition, is a substitution of
each byte of the state with the corresponding byte of the SBox table already defined in our
code. As we found some caveats in a first implementation of the error detection and
correction for this transformation, we proposed different solutions and in the next chapter we
will discuss the advantages and drawbacks of each.
3.4.1 Parity Bit based SubBytes error detection and correction
Concerning the bit prediction matrix, we thought that, instead of using the SBox and then
calculating the parity bit for each byte of the state, it would be more efficient to
precompute an SBox parity bit table and store it in memory. This table (defined as a
vector in the code) contains the parity bit of each byte of the SBox table. This
way, the algorithm only has to look up the corresponding parity bit for each byte of the state
to compute the predicted parity bit matrix for SubBytes.
This is the resulting SBox parity table, which we access through a function getSBoxParity() that is
called exactly like the getSBox() function:
int bitSBox[256] = {
//0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
  0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, //0
  0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, //1
  0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, //2
  1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, //3
  0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, //4
  0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, //5
  1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, //6
  1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, //7
  1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, //8
  0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, //9
  1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, //A
  0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, //B
  1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, //C
  1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, //D
  0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, //E
  1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1}; //F
We just call the function with the actual state matrix to get the corresponding predicted parity
bit matrix:
void SB_BitPrediction()
{
PredictedParityBit[i][j] = getSBoxParity(state[i][j]);
}
Then, we predict the parity byte vector after SubBytes. This vector is computed considering
the output state according to the definition of SubBytes in matrix form:
\[ S'[i][j] = A \cdot \left( S^{-1}[i][j] \right) \oplus b \]
We proceed as already discussed in section 2.4.2: we first compute the multiplicative inverses of
the values required by formula (6), using the MulInv table that contains all the
multiplicative inverses of the Galois field GF(2^8). Then we multiply the result by the matrix A,
using an AND operation followed by an XOR of the resulting bits, to get the i-th bit of the j-th
parity byte. Each bit is then inserted through a left shift into PredictedParityByte[j]:
void SB_BytePrediction()
{
unsigned char temp=0x00;
for (int j=0;j<4;j++)
{
unsigned char
P1=(MulInv[(ActualParityByte[j]^state[1][j]^state[2][j]^state[3][j])])
^(MulInv[state[1][j]]^MulInv[state[2][j]]^MulInv[state[3][j]]);
PredictedParityByte[j]=0x00;
for (int i=7;i>=0;i--)
{
temp = A[i]&P1;
unsigned char bit=0x00;
for(int k=0;k<8;k++)
{
bit^=((temp>>k)&Mask);
}
PredictedParityByte[j]^=bit<<i;
}
}
}
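Formula (6) can be verified independently of the MulInv and A tables with the sketch below, which recomputes both ingredients: the multiplicative inverse by exhaustive search and the product by A through the usual rotate-and-XOR form of the AES affine map. All function names are ours, and this is a slow illustrative check rather than the table-based implementation used in the project.

```c
#include <assert.h>

/* Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
unsigned char gf_mul(unsigned char a, unsigned char b)
{
    unsigned char r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = (unsigned char)((a << 1) ^ ((a >> 7) * 0x1b));
        b >>= 1;
    }
    return r;
}

/* Multiplicative inverse by exhaustive search; 0 is mapped to 0 as in AES. */
unsigned char gf_inv(unsigned char a)
{
    if (a == 0) return 0;
    for (int x = 1; x < 256; x++)
        if (gf_mul(a, (unsigned char)x) == 1) return (unsigned char)x;
    return 0;   /* not reached for a != 0 */
}

/* Linear part of the affine map (product by the bit matrix A). */
unsigned char affine_linear(unsigned char x)
{
    unsigned char r = x;
    for (int k = 1; k <= 4; k++)
        r ^= (unsigned char)((x << k) | (x >> (8 - k)));   /* rotate left by k */
    return r;
}

unsigned char sbox(unsigned char s) { return affine_linear(gf_inv(s)) ^ 0x63; }

/* Formula (6): predicted parity byte from p_j and the bytes s1, s2, s3. */
unsigned char sb_predict_parity_byte(unsigned char p, unsigned char s1,
                                     unsigned char s2, unsigned char s3)
{
    unsigned char s0 = p ^ s1 ^ s2 ^ s3;   /* s0 recovered from the parity byte */
    assert((unsigned char)(s0 ^ s1 ^ s2 ^ s3) == p);
    return affine_linear(gf_inv(s0) ^ gf_inv(s1) ^ gf_inv(s2) ^ gf_inv(s3));
}
```

The constant b does not appear in the prediction because it is XOR-ed four times and cancels out. For the word (d4, bf, 5d, 30), with parity byte 0x06, the prediction is 0x08, which is the XOR of the four substituted bytes 48, 08, 4c and 04.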
To detect the error finally we use the ParityBitCheck() function. If the error variable is set to 1
then we invoke the Correction() function used in AddRoundKey and ShiftRows to correct it.
3.4.2 Inverse SBox based error detection and correction
In this algorithm, we don't use the parity bit matrix but directly the parity byte vector, which is
mostly error free (nevertheless, some limits can still be found).
This correction is a kind of reverse SubBytes process to correct errors. Each step of the
algorithm is described in the following:
- First, we need to store the state before the SubBytes process and the error
injection (this means that errors cannot occur everywhere) in order to use it
afterwards:
void storePreviousState()
{
for(int i=0;i<4;i++)
{
for(int j=0;j<4;j++)
{
pstate[i][j] = state[i][j];
}
}
}
- We use the predicted parity byte vector directly to verify whether an error occurred. We
compare the ActualParityByte with the PredictedParityByte. If these two
vectors differ in the j-th word, the rest of the algorithm is only
performed on this word.
- Then, for each row i of the erroneous word j, we use the function getSBoxInvert()
to see whether the substituted byte[i][j] of the actual state is the SBox value
corresponding to the previous state (pstate[i][j]).
- If the two values are the same nothing is done; otherwise the actual state receives
the corrected value.
void SB_Correction()
{
for(int j=0;j<4;j++)
{
if (ActualParityByte[j]!=PredictedParityByte[j])
{
for(int i=0; i<4; i++)
{
if(getSBoxInvert(state[i][j]) != pstate[i][j])
{
state[i][j] = getSBoxValue(pstate[i][j]);
}
}
}
}
}
3.4.3 CRC based error detection and correction
In this solution we use CRCs to implement the error detection for SubBytes. The first
thing a CRC needs is the checksum of the data. To perform the checksum prediction we defined a
table for this purpose, SBox_checksum, in which we stored the checksum of each
value of the S-Box. Each value has been computed as explained for the CRC
codes in section 2.4.1.
checksum_sbox[256] =
{
//0     1     2     3     4     5     6     7     8     9     A     B     C     D     E     F
0xb8, 0x58, 0xf8, 0x00, 0x90, 0xd0, 0x40, 0xf0, 0x38, 0x80, 0x28, 0x48, 0x68, 0x68, 0x30, 0x78, //0
0xc0, 0x30, 0x08, 0xd8, 0xf8, 0xa0, 0xc0, 0xd8, 0xe8, 0xa0, 0xd8, 0xa0, 0x50, 0x00, 0xe8, 0xe0, //1
0x18, 0xa0, 0x60, 0x30, 0xe0, 0x08, 0x80, 0x18, 0xa8, 0x80, 0x18, 0x58, 0x20, 0x58, 0xb8, 0xc0, //2
0x90, 0xb8, 0x20, 0x28, 0xb8, 0x70, 0x10, 0x88, 0x58, 0x98, 0x78, 0x40, 0xa8, 0xb0, 0x08, 0xb0, //3
0xe8, 0xb0, 0x10, 0xf0, 0x70, 0xc0, 0x68, 0x90, 0x00, 0x98, 0xe8, 0x88, 0x00, 0xc0, 0xd8, 0xe8, //4
0x80, 0xb0, 0x00, 0x70, 0xe8, 0x20, 0xc0, 0xe8, 0x50, 0x40, 0xf0, 0xd0, 0xb8, 0x60, 0x20, 0xd0, //5
0x30, 0x38, 0xb0, 0x78, 0x50, 0xe0, 0xf0, 0x68, 0x88, 0x30, 0x48, 0x90, 0x48, 0xc0, 0x98, 0xf8, //6
0xc8, 0x58, 0x98, 0x48, 0xe0, 0xd0, 0x50, 0xc8, 0xb8, 0x98, 0x10, 0x68, 0xd0, 0xe8, 0x10, 0x78, //7
0x98, 0xf8, 0x18, 0xf0, 0x78, 0xf0, 0x08, 0x88, 0x70, 0xc8, 0x10, 0x40, 0xe0, 0x30, 0x38, 0x68, //8
0x70, 0xf8, 0xa8, 0xc8, 0xa0, 0xc8, 0xa8, 0x10, 0x40, 0xb8, 0x28, 0x40, 0x80, 0xf8, 0xa0, 0x90, //9
0x08, 0x70, 0x18, 0x20, 0x70, 0xd8, 0x78, 0xb0, 0xa8, 0xf8, 0x68, 0x38, 0x28, 0xb8, 0x98, 0x48, //A
0x50, 0x88, 0x60, 0x08, 0x00, 0x20, 0x28, 0x78, 0x88, 0x90, 0x48, 0x28, 0x60, 0x80, 0x20, 0x68, //B
0x60, 0xc8, 0xf8, 0x58, 0x28, 0x48, 0xd0, 0x38, 0x60, 0x48, 0x30, 0xe0, 0x38, 0x38, 0xd8, 0x58, //C
0xa0, 0x88, 0x50, 0xa8, 0xf0, 0xc8, 0x00, 0xb0, 0xf0, 0x28, 0x10, 0xa8, 0xa0, 0x60, 0xa8, 0x18, //D
0x88, 0xb0, 0xc0, 0x50, 0x98, 0xd8, 0xc8, 0x38, 0x08, 0x60, 0x20, 0xe0, 0x50, 0x58, 0x80, 0x00, //E
0x80, 0x10, 0x90, 0x78, 0x70, 0xd0, 0xd0, 0x18, 0x18, 0x40, 0x90, 0x30, 0x40, 0xd8, 0xe0, 0x08}; //F
As one can see, the elements stored in this table are bytes, whereas the real length of the
checksum obtained from our CRC is five bits. That is because we found it easier to operate with
bytes, so we just appended three zeros to the real checksum in order to represent it as a byte.
For the checksum prediction we just need to index SBox_checksum with the same entry
used for the SBox and store the content in the prediction matrix. The main operation performed by
SB_Prediction_Checksum() is then:
checksum_matrix[i][j] = SBox_checksum(state[i][j]);
After the transformation, the detection is performed by SB_detection() that compares the
predicted checksum for the state with the actual one.
if(checksum_matrix[i][j]!=Checksum_calculation(state[i][j]))
{
error=1;
ErrWord[j]=1;
ErrByte[j]=i;
}
The actual checksum is computed using Checksum_calculation(), a function that applies the
definition of checksum for our CRC to the byte given as argument. The checksum is the
remainder of the division between the byte of the state and the characteristic divisor x^5+x^3+1
(left-aligned in a byte, it corresponds to 0xa4 in hexadecimal): to do this we XOR the byte with
the generator polynomial if the most significant bit of the byte is 1, otherwise we shift it left
by 1 position. This is repeated until 7 shifts have been performed:
while(k<7)
{
if(byte<0x80)
{
byte=byte<<1;
k++;
}
else byte^=poly_gen;
}
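Wrapped into a complete function, the loop above can be sketched as follows. The lower-case function name and the POLY_GEN constant are our own naming for a self-contained example; the loop itself mirrors the one shown above.

```c
#include <assert.h>

#define POLY_GEN 0xa4   /* x^5 + x^3 + 1 (101001b), left-aligned in a byte */

/* Checksum of one byte: 5 significant bits followed by three zero bits. */
unsigned char checksum_calculation(unsigned char byte)
{
    int k = 0;
    while (k < 7) {
        if (byte < 0x80) {              /* MSB is 0: shift left and count the shift */
            byte = (unsigned char)(byte << 1);
            k++;
        } else {                        /* MSB is 1: XOR with the divisor */
            byte ^= POLY_GEN;
        }
    }
    assert((byte & 0x07) == 0);         /* the three padding bits are always zero */
    return byte;
}
```

Applied to the first two S-Box values, 0x63 and 0x7c, this function returns 0xb8 and 0x58, which are the first two entries of the checksum table.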
Finally, the detection is completed by the SB_detection() function shown above, which simply
compares the predicted checksum matrix with the actual one.
To correct the error, if it occurred and has been detected, the correction takes place in the
same way as for AddRoundKey, ShiftRows and the parity bit based solution of SubBytes.
Chapter 4
Error coverage and performances tests
After verifying that the implementation works correctly by encrypting
several 128-bit blocks of data with the NIST “AES Known Answer Test vectors” [31], we wanted
to analyze its advantages and drawbacks.
The paper on which the implementation of this correction algorithm is based [24] states
that not all errors are corrected. In fact, the authors assume that only errors affecting an odd
number of bits (including a single bit) in a byte of the state can be corrected. The other limit is
that at most one erroneous byte can be corrected in each word (column) of the state. We already
justified the reason for this limit in chapter 2, but we wanted to verify it in our implementation
as well. Furthermore, we wanted to test the performance of our code with different numbers of
errors, focusing on time, memory and CPU usage.
4.1 Test Environment
To simulate an error injection a new function has been added to the code: it simply creates a
4x4 byte matrix representative of the error pattern to inject and performs a XOR between this
matrix and the state.
void error_injection()
{
int errormask[Nb][Nb]={{0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00}};
for (int i=0;i<4;i++)
{
for(int j=0;j<4;j++)
{
state[i][j]=state[i][j]^errormask[i][j];
}
}
}
To inject an error, we simply had to specify the error pattern and call the error_injection()
function just before the transformation we wanted to test.
The test vector used during the testing process is composed of:
Key = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Plaintext = ea 83 5c f0 04 45 33 2d 65 5d 98 ad 85 96 b0 c5
That produces as output:
Ciphertext = 76 ed 47 01 93 fe 61 e0 24 1b 64 c4 55 9f 11 2c
To analyze time and CPU load, we used the profiling tool provided by Visual Studio 2010
Premium Beta 2, and the Microsoft Performance Monitor to look at the memory usage. All the
tests were performed on a 64-bit Intel Core 2 Duo T9950 at 2.66GHz with 4GB of RAM and
Microsoft Windows 7 Professional.
4.2 Assumptions proof
The first assumption we made was that only bit errors of odd multiplicity can be detected.
We injected into the state an error of even multiplicity and, as expected, the cipher
was not able to detect it and produced a wrong output. For example, injecting the error 0x03
in the first byte of the first word of the state before the first AddRoundKey transformation, we
obtained:
65 ca 86 03 13 ad ee 66 b4 8a 57 9f 45 b8 b4 49
The second assumption was that errors can be corrected only if they fall in different words of
the state. In fact, injecting before the same transformation a single bit error 0x01 in both the
1st and the 2nd byte of the first word of the state, we again obtained a wrong ciphertext:
a7 52 cf 30 67 63 b7 02 e3 4d 37 5e 3c c2 d7 73
This confirms our assumptions once again.
4.3 ShiftRows Limits
In the ShiftRows step, the correction algorithm misses some errors if, at the moment of the
correction, two erroneous bytes lie in the same word of the state. When we first injected
errors with an error mask, we had not considered that errors injected before ShiftRows are
moved by the transformation, so that the correction may no longer work. Let us take an example
to see how this happens:

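First we inject an error mask that we assume is good because it has only one
erroneous byte in each word:
$$
\begin{bmatrix}
\mathtt{01} & \mathtt{ea} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{00} & \mathtt{b4} & \mathtt{00} \\
\mathtt{00} & \mathtt{00} & \mathtt{00} & \mathtt{c5} \\
\mathtt{00} & \mathtt{00} & \mathtt{00} & \mathtt{00}
\end{bmatrix}
$$
After the mask is XORed with the state (whatever the state looks like), the result is:
$$
\begin{bmatrix}
\mathtt{01} & \mathtt{ea} & \mathtt{90} & \mathtt{7d} \\
\mathtt{12} & \mathtt{92} & \mathtt{b4} & \mathtt{28} \\
\mathtt{ab} & \mathtt{9a} & \mathtt{6b} & \mathtt{c5} \\
\mathtt{65} & \mathtt{86} & \mathtt{4e} & \mathtt{45}
\end{bmatrix}
$$
After ShiftRows:
$$
\begin{bmatrix}
\mathtt{01} & \mathtt{ea} & \mathtt{90} & \mathtt{7d} \\
\mathtt{92} & \mathtt{b4} & \mathtt{28} & \mathtt{12} \\
\mathtt{6b} & \mathtt{c5} & \mathtt{ab} & \mathtt{9a} \\
\mathtt{45} & \mathtt{65} & \mathtt{86} & \mathtt{4e}
\end{bmatrix}
$$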
Hence three erroneous bytes appear in the same word and the correction will not work.
This example shows a limit of the correction algorithm: if errors are injected before the
ShiftRows step, the erroneous bytes have to be placed with the result of ShiftRows in mind
in order to test the algorithm. Obviously, this testing limit does not
appear when the error is injected directly after the rows have been shifted.
In fact, injecting the error pattern of the last example, the encryption process produced the
wrong ciphertext:
9d 11 e3 5b 92 ed af 15 d9 b9 09 7f b4 c9 e3 f2
4.4 Parity bit based SubBytes Limits
As already explained in section 2.4.1, different S-Box values have the same parity bit, with
apparently no particular order or law. To get an idea of the fault coverage of this solution, we
injected 9000000 random faults before the transformation and repeated the test 15 times,
building a statistical model of how many errors were successfully detected and corrected: the
percentage of detected/corrected errors was 52%, with a coefficient of variation of 1.6%. We
considered this error coverage very low, which is why we looked for other solutions.
4.5 CRC based SubBytes Limits
Similarly to the previous case, different bytes can have the same checksum: a particular
error pattern can change the state byte into a value that has the same checksum, in which
case the error is not detected. We repeated the same test as in the previous case, obtaining
a percentage of detected/corrected errors of 96.87% with a coefficient of variation of 0.36%,
which is substantially higher than for the previous solution.
4.6 Performances analysis
Once we had analyzed and verified the behavior of the algorithm, it was necessary to check how
much the detection/correction procedure lowers the AES performance, and how the
different approaches impact time, memory and CPU usage. To point out the
overheads, we compared our modified AES code with the original one. We discuss the
most relevant results here; further details can be found in appendix B.
4.6.1 Time
At the beginning we analyzed the execution time of the original AES code for a single
encryption of a 128-bit block of data. We performed 10 consecutive executions, and the
average time of a single encryption was 109µs. Then we repeated the same process for our
algorithm with each of the different implementations of SubBytes, obtaining the results
reported below. We analyze the implementations first without any error and then with error(s)
injected. The overheads were computed taking the original AES code as the baseline.
4.6.1.1 Without error
The first tests dealt with error free data:
- Paritybit based: 152µs (39% overhead)
- CRC based: 162µs (49% overhead)
- InvSBox based: 141µs (29% overhead)
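The percentages above are consistent with the relative increase over the 109µs baseline. The symbols t_modified and t_original are our notation, introduced only for this worked check:

```latex
\[
\text{overhead} = \frac{t_{\text{modified}} - t_{\text{original}}}{t_{\text{original}}},
\qquad
\frac{152 - 109}{109} \approx 0.39, \quad
\frac{162 - 109}{109} \approx 0.49, \quad
\frac{141 - 109}{109} \approx 0.29 .
\]
```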
Obviously, part of these overheads is due to AddRoundKey, ShiftRows and MixColumns, which
are common to all solutions. The differences come from the SubBytes transformation in each
solution.
These results can be explained by looking at how errors are detected in the SubBytes step.
Whereas the Paritybit and CRC based solutions need to read values from a look-up table
several times, the InvSBox implementation only uses the parity bytes. The
drawback of the latter solution is that it cannot detect the position of the erroneous byte in a
word.
We can also see an overhead difference between the first two solutions: the CRC solution
performs an additional checksum calculation after the transformation, which implies
more memory accesses than the Paritybit based one.
4.6.1.2 With a single error injected
For this second series of tests we injected a single bit error in each transformation. This
allowed us to see, on the one hand, the global overhead and, on the other hand, which
function was the slowest to detect/correct the error.
The average results are:
- Paritybit based: 40% overhead
- CRC based: 49% overhead
- InvSBox based: 31% overhead
As we can see, the overheads for the Paritybit and CRC based solutions are almost the same as
in the case without error. In the InvSBox based version, instead, the additional 2% can be
explained by the new look-up table invSBox involved in the correction process.
In all solutions, the correction of an error injected before the SubBytes
transformation is the most time consuming: it adds respectively 3%, 2% and 5% to the
overheads measured when no errors were injected.
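4.6.1.3 With four errors injected
Finally, we analyzed the overheads in the worst case, with four errors injected in each
transformation (one in each word). The error matrix injected into the state looks like:
$$
\begin{bmatrix}
\mathtt{ea} & \mathtt{f1} & \mathtt{bc} & \mathtt{a4} \\
\mathtt{00} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{00} & \mathtt{00} & \mathtt{00}
\end{bmatrix}
$$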
The average results are:
- Paritybit based: 44% overhead
- CRC based: 52% overhead
- InvSBox based: 34% overhead
We could not apply this test to the Paritybit based implementation because of its low error
coverage. Regarding the other two proposals, the most relevant difference was
noticed in the SubBytes transformation, with 5% more overhead for the InvSBox solution,
whereas for the CRC based one the overhead rises by 1%. This happens because the CRC based
solution corrects only the erroneous byte, while the InvSBox based one corrects the error by
going through all the bytes of each affected word.
4.6.2 CPU load
The CPU load tests allowed us to compare the CPU consumption of every function in all
solutions. The Visual Studio Profiler periodically collects a sample of the current process
state. Sampling is a nonintrusive, statistical approach to profiling: the more samples are
collected in a function, the more processing the function has likely performed. By default,
the Visual Studio Profiler collects one sample every 10 million CPU cycles. This way it is
possible to see which function is the heaviest in terms of CPU load. It is important to notice
that sampling collects information only when the program uses the CPU. Thus, while the process
is waiting for disk, network or any other resource, the Visual Studio Profiler does not collect
samples [32]. For this reason we had to repeat the encryption process a large number of times.
The following values are related to the whole encryption process.
Figure 19: CPU load for the Paritybit based solution
For the Paritybit based solution, the most used functions are getSBoxParity() and
getSBoxValue() (Fig.19). In fact, these two functions use load and store instructions to move
values from a look-up table to CPU registers and vice-versa.
Figure 20: CPU load for the CRC based solution
For the CRC and InvSBox based solutions the situation remains almost the same as the
previous one (Fig. 20 and 21). Finally, we can see that the SubBytes transformation requires
more CPU usage than the other ones.
Figure 21: CPU load for the InvSBox based solution
4.6.3 Memory usage
First of all, we measured the memory usage of the original AES code, which was 1308KB. All
three solutions add new variables and therefore have a higher memory usage. The differences
between the three solutions result from the different sizes of the added look-up tables. The
table below shows the memory usage of each solution.
Solution     Memory usage (KB)    Overhead
Original     1308                 -
ParityBit    1332                 1.8%
InvSbox      1352                 3.4%
CRC          1368                 4.6%
Chapter 5
Conclusions and future work
The Advanced Encryption Standard is a frequent target of attacks, since it has become the
most widely used standard around the world to secure data.
By studying the AES literature thoroughly, we pointed out the possibility of violating its
security by injecting faults into the data during processing. Even though different solutions
have already been proposed to cope with this by detecting/correcting these errors, we found no
software implementations, and we thought it would be useful to raise AES security by adding
this feature.
In this project we analyzed fault attacks and the error detection/correction algorithms that
prevent them, in order to develop different versions of fault-proof AES software. This software
uses parity bits and bytes to recognize the most common kinds of data corruption and remove
them.
We evaluated the efficiency of the proposed software in different scenarios, simulating several
types of error injection. We confirmed its capability of detecting and correcting all bit errors
of odd multiplicity that are induced in no more than four bytes of the data, with at most one
erroneous byte in each word. One problem came from the non-linearity of the SubBytes process:
some errors, of an unpredictable kind, were not detected. That is why different solutions were
implemented using CRC and Inverse SubBytes. Tests showed that these raise the coverage
percentage.
Clearly each solution has its pros and cons. Performance tests revealed that the Inverse
SubBytes based solution is the fastest and is able to correct all errors, but it requires more
memory than the Paritybit based one, whereas the CRC based solution turned out to be the
slowest and, according to the measurements of section 4.6.3, also the heaviest in terms of
memory usage. We can conclude that, with respect to the original AES code, each solution
introduces a maximum overhead close to 50% in time and 5% in memory usage.
However, some improvements are possible. The code can be optimized to reduce the overhead
further, specifically for SubBytes, which is the most CPU consuming step as shown in section
4.6.2. The error coverage could also be enhanced by improving our algorithm or by suggesting
new solutions. We actually tried to reduce the time overhead by using inline functions: with
this technique the function call is replaced by the function body during compilation, which
avoids several function calls during execution. However, applying this solution to our
algorithm did not significantly reduce the time overhead. Lastly, we also checked whether
unrolling several for loops, thus avoiding multiple loop condition checks, would lower the
time. Also in this case we did not find any relevant improvement.
Our algorithm could be further tested in a real transmission environment involving more than
one encrypted block of data, to check how it impacts the performance in a real situation.
Moreover, an important proof of security could be obtained by analyzing how the algorithm
responds to a concrete DFA attack. Finally, we want to underline that, to the best of our and
our supervisors' knowledge, this work is unique and has not been done by anybody else before.
Appendix A
AES source code with error detection and correction
A.1 Complete Source code with parityBit based SubBytes detection/correction
#include <stdio.h>
#include <stdlib.h>
#ifndef _AESENCRYPT_H_
#define _AESENCRYPT_H_
#include "aesencrypt.h"
#endif
int main(int argc, char* argv[])
{
int i;
// Key length
Nr=128;
Nk = Nr / 32;
Nr = Nk + 6;
unsigned char InKey[32] = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00};
unsigned char Plaintext[32]= {0xea,0x83,0x5c,0xf0,0x04,0x45,0x33,0x2d,
0x65,0x5d,0x98,0xad,0x85,0x96,0xb0,0xc5};
// Copy the Key and PlainText
for(i=0;i<Nk*4;i++)
{
Key[i]=InKey[i];
in[i]=Plaintext[i];
}
// The KeyExpansion routine must be called before encryption.
KeyExpansion();
Cipher();
return 0;
}
void error_injection()
{
int errormask[Nb][Nb]={{0x01,0x00,0x00,0x00},{0x00,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00}};
for (int i=0;i<4;i++){
for(int j=0;j<4;j++){
state[i][j]=state[i][j]^errormask[i][j];
}
}
}
int getSBoxValue(int num)
{
int sbox[256] =
{
//0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0x63,0x7c,0x77,0x7b,0xf2,0x6b,0x6f,0xc5,0x30,0x01,0x67,0x2b,0xfe,0xd7,0xab,0x76, //0
0xca,0x82,0xc9,0x7d,0xfa,0x59,0x47,0xf0,0xad,0xd4,0xa2,0xaf,0x9c,0xa4,0x72,0xc0, //1
0xb7,0xfd,0x93,0x26,0x36,0x3f,0xf7,0xcc,0x34,0xa5,0xe5,0xf1,0x71,0xd8,0x31,0x15, //2
0x04,0xc7,0x23,0xc3,0x18,0x96,0x05,0x9a,0x07,0x12,0x80,0xe2,0xeb,0x27,0xb2,0x75, //3
0x09,0x83,0x2c,0x1a,0x1b,0x6e,0x5a,0xa0,0x52,0x3b,0xd6,0xb3,0x29,0xe3,0x2f,0x84, //4
0x53,0xd1,0x00,0xed,0x20,0xfc,0xb1,0x5b,0x6a,0xcb,0xbe,0x39,0x4a,0x4c,0x58,0xcf, //5
0xd0,0xef,0xaa,0xfb,0x43,0x4d,0x33,0x85,0x45,0xf9,0x02,0x7f,0x50,0x3c,0x9f,0xa8, //6
0x51,0xa3,0x40,0x8f,0x92,0x9d,0x38,0xf5,0xbc,0xb6,0xda,0x21,0x10,0xff,0xf3,0xd2, //7
0xcd,0x0c,0x13,0xec,0x5f,0x97,0x44,0x17,0xc4,0xa7,0x7e,0x3d,0x64,0x5d,0x19,0x73, //8
0x60,0x81,0x4f,0xdc,0x22,0x2a,0x90,0x88,0x46,0xee,0xb8,0x14,0xde,0x5e,0x0b,0xdb, //9
0xe0,0x32,0x3a,0x0a,0x49,0x06,0x24,0x5c,0xc2,0xd3,0xac,0x62,0x91,0x95,0xe4,0x79, //A
0xe7,0xc8,0x37,0x6d,0x8d,0xd5,0x4e,0xa9,0x6c,0x56,0xf4,0xea,0x65,0x7a,0xae,0x08, //B
0xba,0x78,0x25,0x2e,0x1c,0xa6,0xb4,0xc6,0xe8,0xdd,0x74,0x1f,0x4b,0xbd,0x8b,0x8a, //C
0x70,0x3e,0xb5,0x66,0x48,0x03,0xf6,0x0e,0x61,0x35,0x57,0xb9,0x86,0xc1,0x1d,0x9e, //D
0xe1,0xf8,0x98,0x11,0x69,0xd9,0x8e,0x94,0x9b,0x1e,0x87,0xe9,0xce,0x55,0x28,0xdf, //E
0x8c,0xa1,0x89,0x0d,0xbf,0xe6,0x42,0x68,0x41,0x99,0x2d,0x0f,0xb0,0x54,0xbb,0x16}; //F
return sbox[num];
}
int getSBoxParity(int num)
{
int bitSBox[256] =
{
// 0 1 2 3 4 5 6 7 8 9 A B C D E F
0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, //0
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, //1
0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, //2
1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, //3
0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, //4
0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, //5
1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, //6
1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, //7
1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, //8
0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, //9
1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, //A
0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, //B
1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, //C
1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, //D
0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, //E
1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1}; //F
return bitSBox[num];
}
// This function produces Nb(Nr+1) round keys. The round keys are used in each
// round to encrypt the states.
void KeyExpansion()
{
int i,j;
unsigned char temp[4],k;
// The first round key is the key itself.
for(i=0;i<Nk;i++)
{
RoundKey[i*4]=Key[i*4];
RoundKey[i*4+1]=Key[i*4+1];
RoundKey[i*4+2]=Key[i*4+2];
RoundKey[i*4+3]=Key[i*4+3];
}
// All other round keys are found from the previous round keys.
while (i < (Nb * (Nr+1)))
{
for(j=0;j<4;j++)
{
temp[j]=RoundKey[(i-1) * 4 + j];
}
if (i % Nk == 0)
{
// This function rotates the 4 bytes in a word to the left once.
// [a0,a1,a2,a3] becomes [a1,a2,a3,a0]
// Function RotWord()
{
k = temp[0];
temp[0] = temp[1];
temp[1] = temp[2];
temp[2] = temp[3];
temp[3] = k;
}
// SubWord() is a function that takes a four-byte input word and
// applies the S-box to each of the four bytes to produce an output word.
// Function Subword()
{
temp[0]=getSBoxValue(temp[0]);
temp[1]=getSBoxValue(temp[1]);
temp[2]=getSBoxValue(temp[2]);
temp[3]=getSBoxValue(temp[3]);
}
temp[0] = temp[0] ^ Rcon[i/Nk];
}
else if (Nk > 6 && i % Nk == 4)
{
// Function Subword()
{
temp[0]=getSBoxValue(temp[0]);
temp[1]=getSBoxValue(temp[1]);
temp[2]=getSBoxValue(temp[2]);
temp[3]=getSBoxValue(temp[3]);
}
}
RoundKey[i*4+0] = RoundKey[(i-Nk)*4+0] ^ temp[0];
RoundKey[i*4+1] = RoundKey[(i-Nk)*4+1] ^ temp[1];
RoundKey[i*4+2] = RoundKey[(i-Nk)*4+2] ^ temp[2];
RoundKey[i*4+3] = RoundKey[(i-Nk)*4+3] ^ temp[3];
i++;
}
}
//Computes the parity bit matrix for the state
void getParityBit(unsigned char bit[][4])
{
for(int i=0;i<4;i++)
{
for(int j=0;j<4;j++)
{
bit[j][i]=0x00;
for(int k=0;k<8;k++)
{
bit[j][i]^=((state[j][i]>>k)&Mask);
}
}
}
}
//This function computes the parity bytes for the state:
void getParityByte(unsigned char byte[])
{
for(int j=0;j<4;j++)
{
byte[j]=0x00;
for(int i=0;i<4;i++)
{
byte[j] ^= state[i][j];
}
}
}
//This Function computes the parity bits for the round key:
void RoundKeyParityBit(int round)
{
for(int i=0;i<4;i++)
{
for(int j=0;j<4;j++)
{
RKeyParityBit[j][i]=0x00;
for(int k=0;k<8;k++)
{
RKeyParityBit[j][i]^=((RoundKey[round * Nb * 4 + i * Nb +
j]>>k)&Mask);
}
}
}
}
//This function computes the parity bytes for the round key:
void RoundKeyParityByte(int round)
{
for(int i=0;i<4;i++)
{
RKeyParityByte[i]=0x00;
for(int j=0;j<4;j++)
{
RKeyParityByte[i] ^= (RoundKey[round * Nb * 4 + i * Nb + j]);
}
}
}
/*This function compares actual and predicted parity bit to check if error occurred
storing also the position of the erroneous byte in the j-th word which has been corrupted*/
void ParityBitCheck()
{
getParityBit(ActualParityBit);
for(int i=0;i<4;i++)
{
for(int j=0;j<4;j++)
{
if(ActualParityBit[i][j]!=PredictedParityBit[i][j])
{
error=1;
ErrWord[j]=1;
ErrByte[j]=i;
}
}
}
}
//This Function Predicts the parity bits for the add round key:
void ARK_BitPrediction(int round)
{
RoundKeyParityBit(round);
for(int i=0;i<4;i++)
{
for(int j=0;j<4;j++)
{
PredictedParityBit[j][i]=ActualParityBit[j][i]^RKeyParityBit[j][i];
}
}
}
//This function predicts the parity byte for the add round key:
void ARK_BytePrediction(int round)
{
RoundKeyParityByte(round);
for(int j=0;j<4;j++)
{
PredictedParityByte[j] =
ActualParityByte[j] ^ RKeyParityByte[j];
}
}
// This function adds the round key to state.
// The round key is added to the state by an XOR function.
void AddRoundKey(int round)
{
int i,j;
for(i=0;i<4;i++)
{
for(j=0;j<4;j++)
{
state[j][i] ^= RoundKey[round * Nb * 4 + i * Nb + j];
}
}
}
void Correction()
{
for(int j=0;j<4;j++)
{
if (ErrWord[j]==1)
{
CorrectionParityByte[j] = PredictedParityByte[j] ^
state[0][j] ^ state[1][j] ^ state[2][j] ^ state[3][j];
state[ErrByte[j]][j]^=CorrectionParityByte[j];
}
ErrWord[j]=0;
}
getParityByte(ActualParityByte);
error=0;
}
//The predicted parity bits are the parity bit calculated for the values contained
//in the S-Box. These are contained in the bitSBox matrix in the SBoxParity function
void SB_BitPrediction()
{
for(int i=0;i<4;i++)
{
for(int j=0;j<4;j++)
{
PredictedParityBit[i][j] = getSBoxParity(state[i][j]);
}
}
}
/*The Predicted Parity Byte for SubBytes is computed considering the output
state according to the definition of SubBytes in the Matrix form:
s'[i][j] = A x (s^-1[i][j]) + b
and the first byte of it s'[0][j] as function of the actual parity byte before
the transformation*/
void SB_BytePrediction()
{
unsigned char temp=0x00;
for (int j=0;j<4;j++)
{
unsigned char P1=(MulInv[(ActualParityByte[j]^state[1][j]^
state[2][j]^state[3][j])])^(MulInv[state[1][j]]^
MulInv[state[2][j]]^MulInv[state[3][j]]);
PredictedParityByte[j]=0x00;
for (int i=7;i>=0;i--)
{
temp = A[i]&P1;
unsigned char bit=0x00;
for(int k=0;k<8;k++)
{
bit^=((temp>>k)&Mask);
}
PredictedParityByte[j]^=bit<<i;
}
}
}
// The SubBytes Function Substitutes the values in the
// state matrix with values in an S-box.
void SubBytes()
{
int i,j;
for(i=0;i<4;i++)
{
for(j=0;j<4;j++)
{
state[i][j] = getSBoxValue(state[i][j]);
}
}
}
//Predicted parity bit for shift row are obtained performing the same shifts the
//shift rows does on the state on the actual parity bit matrix
void SR_BitPrediction()
{
unsigned char temp;
getParityBit(PredictedParityBit);
// Rotate first row 1 columns to left
temp=PredictedParityBit[1][0];
PredictedParityBit[1][0]=PredictedParityBit[1][1];
PredictedParityBit[1][1]=PredictedParityBit[1][2];
PredictedParityBit[1][2]=PredictedParityBit[1][3];
PredictedParityBit[1][3]=temp;
// Rotate second row 2 columns to left
temp=PredictedParityBit[2][0];
PredictedParityBit[2][0]=PredictedParityBit[2][2];
PredictedParityBit[2][2]=temp;
temp=PredictedParityBit[2][1];
PredictedParityBit[2][1]=PredictedParityBit[2][3];
PredictedParityBit[2][3]=temp;
// Rotate third row 3 columns to left
temp=PredictedParityBit[3][0];
PredictedParityBit[3][0]=PredictedParityBit[3][3];
PredictedParityBit[3][3]=PredictedParityBit[3][2];
PredictedParityBit[3][2]=PredictedParityBit[3][1];
PredictedParityBit[3][1]=temp;
}
/*The predicted parity byte for shift rows is computed according to the shifts
the transformation performs on the state. The 1st byte of the 1st word is
expressed as function of the actual parity byte
*/
void SR_BytePrediction()
{
for(int j=0;j<4;j++)
{
PredictedParityByte[j] = ActualParityByte[j] ^ state[1][j] ^
state[2][j] ^ state[3][j] ^ state[1][(j+1)%4] ^ state[2][(j+2)%4] ^
state[3][(j+3)%4];
}
}
// The ShiftRows() function shifts the rows in the state to the left.
// Each row is shifted with different offset.
// Offset = Row number. So the first row is not shifted.
void ShiftRows()
{
unsigned char temp;
// Rotate first row 1 columns to left
temp=state[1][0];
state[1][0]=state[1][1];
state[1][1]=state[1][2];
state[1][2]=state[1][3];
state[1][3]=temp;
// Rotate second row 2 columns to left
temp=state[2][0];
state[2][0]=state[2][2];
state[2][2]=temp;
temp=state[2][1];
state[2][1]=state[2][3];
state[2][3]=temp;
// Rotate third row 3 columns to left
temp=state[3][0];
state[3][0]=state[3][3];
state[3][3]=state[3][2];
state[3][2]=state[3][1];
state[3][1]=temp;
}
//Get the most significant bit of a byte
bool getMostsignificantBit(unsigned char data_received){
bool MSB;
if(((data_received>>7)&Mask)==1) // parentheses needed: == binds tighter than &
MSB=1;
else
MSB=0;
return MSB;
}
//The parity bits prediction for MixColumn is based on the actual parity bits and
//the most significant bit of each byte of the state.
//The transformation preserves the parity byte.
void MC_BitPrediction()
{
getParityBit(ActualParityBit);
for(int i=0;i<4;i++)
{
PredictedParityBit[0][i]=ActualParityBit[0][i]^
ActualParityBit[2][i]^ActualParityBit[3][i]^
getMostsignificantBit(state[0][i])^getMostsignificantBit(state[1][i]);
PredictedParityBit[1][i]=ActualParityBit[0][i]^
ActualParityBit[1][i]^ActualParityBit[3][i]^
getMostsignificantBit(state[1][i])^getMostsignificantBit(state[2][i]);
PredictedParityBit[2][i]=ActualParityBit[0][i]^
ActualParityBit[1][i]^ActualParityBit[2][i]^
getMostsignificantBit(state[2][i])^getMostsignificantBit(state[3][i]);
PredictedParityBit[3][i]=ActualParityBit[1][i]^
ActualParityBit[2][i]^ActualParityBit[3][i]^
getMostsignificantBit(state[3][i])^getMostsignificantBit(state[0][i]);
}
}
void MC_ParityBitCheck()
{
char PBitError_Pattern[4][4];
char matrix_msb0[4][4]={{1,0,1,1},{1,1,0,1},{1,1,1,0},{0,1,1,1}};
char matrix_msb1[4][4]={{0,1,1,1},{1,0,1,1},{1,1,0,1},{1,1,1,0}};
getParityBit(ActualParityBit);
for(int i=0;i<4;i++)
{
for(int j=0;j<4;j++)
{
PBitError_Pattern[i][j]=PredictedParityBit[i][j]^
ActualParityBit[i][j];
}
}
getParityByte(ActualParityByte);
for(int i=0;i<4;i++)
{
CorrectionParityByte[i]=PredictedParityByte[i]^
ActualParityByte[i];
}
for(int i=0;i<4;i++)
{
if(CorrectionParityByte[i]!=0)
{
error=1;
bool msb=getMostsignificantBit(CorrectionParityByte[i]);
if(msb==false)
{
for(int k=0;k<4;k++)
{
if(PBitError_Pattern[0][i]==matrix_msb0[0][k] &&
PBitError_Pattern[1][i]==matrix_msb0[1][k] &&
PBitError_Pattern[2][i]==matrix_msb0[2][k] &&
PBitError_Pattern[3][i]==matrix_msb0[3][k])
{
ErrByte[i]=k;
}
}
}
else
{
for(int k=0;k<4;k++)
{
if(PBitError_Pattern[0][i]==matrix_msb1[0][k] &&
PBitError_Pattern[1][i]==matrix_msb1[1][k] &&
PBitError_Pattern[2][i]==matrix_msb1[2][k] &&
PBitError_Pattern[3][i]==matrix_msb1[3][k])
{
ErrByte[i]=k;
}
}
}
}
}
}
void MC_Correction()
{
for(int i=0;i<4;i++)
{
if(CorrectionParityByte[i]!=0)
{
if(ErrByte[i]==0)
{
CorrectionMatrix[0][i]=xtime(CorrectionParityByte[i]);
CorrectionMatrix[1][i]=CorrectionParityByte[i];
CorrectionMatrix[2][i]=CorrectionParityByte[i];
CorrectionMatrix[3][i]=xtime(CorrectionParityByte[i])^
CorrectionParityByte[i];
}
if(ErrByte[i]==1)
{
CorrectionMatrix[0][i]=xtime(CorrectionParityByte[i])^
CorrectionParityByte[i];
CorrectionMatrix[1][i]=xtime(CorrectionParityByte[i]);
CorrectionMatrix[2][i]=CorrectionParityByte[i];
CorrectionMatrix[3][i]=CorrectionParityByte[i];
}
if(ErrByte[i]==2)
{
CorrectionMatrix[0][i]=CorrectionParityByte[i];
CorrectionMatrix[1][i]=xtime(CorrectionParityByte[i])^
CorrectionParityByte[i];
CorrectionMatrix[2][i]=xtime(CorrectionParityByte[i]);
CorrectionMatrix[3][i]=CorrectionParityByte[i];
}
if(ErrByte[i]==3)
{
CorrectionMatrix[0][i]=CorrectionParityByte[i];
CorrectionMatrix[1][i]=CorrectionParityByte[i];
CorrectionMatrix[2][i]=xtime(CorrectionParityByte[i])^
CorrectionParityByte[i];
CorrectionMatrix[3][i]=xtime(CorrectionParityByte[i]);
}
}
}
for(int j=0;j<4;j++)
{
if (CorrectionParityByte[j]!=0)
{
for(int i=0;i<4;i++)
{
state[i][j]^=CorrectionMatrix[i][j];
}
}
}
getParityByte(ActualParityByte);
error=0;
}
// MixColumns function mixes the columns of the state matrix
void MixColumns()
{
int i;
unsigned char Tmp,Tm,t;
for(i=0;i<4;i++)
{
t=state[0][i];
Tmp = state[0][i] ^ state[1][i] ^ state[2][i] ^ state[3][i] ;
Tm = state[0][i] ^ state[1][i] ;
Tm = xtime(Tm);
state[0][i] ^= Tm ^ Tmp ;
Tm = state[1][i] ^ state[2][i] ;
Tm = xtime(Tm);
state[1][i] ^= Tm ^ Tmp ;
Tm = state[2][i] ^ state[3][i] ;
Tm = xtime(Tm);
state[2][i] ^= Tm ^ Tmp ;
Tm = state[3][i] ^ t ;
Tm = xtime(Tm);
state[3][i] ^= Tm ^ Tmp ;
}
}
// Cipher is the main function that encrypts the PlainText.
void Cipher()
{
int i,j,round=0;
//Copy the input PlainText to state array.
for(i=0;i<4;i++)
{
for(j=0;j<4;j++)
{
state[j][i] = in[i*4 + j];
}
}
getParityBit(ActualParityBit);
getParityByte(ActualParityByte);
ARK_BitPrediction(round);
ARK_BytePrediction(round);
// Add the First round key to the state before starting the rounds.
AddRoundKey(0);
ParityBitCheck();
if(error==1)
{
Correction();
}
else getParityByte(ActualParityByte);
// There will be Nr rounds.
// The first Nr-1 rounds are identical.
// These Nr-1 rounds are executed in the loop below.
for(round=1;round<Nr;round++)
{
SB_BitPrediction();
SB_BytePrediction();
SubBytes();
ParityBitCheck();
if(error==1)
{
Correction();
}
else getParityByte(ActualParityByte);
SR_BitPrediction();
SR_BytePrediction();
ShiftRows();
ParityBitCheck();
if (error==1)
{
Correction();
}
//For mix column predicted and actual parity byte are the same. As we need
//Predicted parity byte in the detection and correction now we store the actual
//parity byte in the predicted vector
else getParityByte(PredictedParityByte);
MC_BitPrediction();
MixColumns();
MC_ParityBitCheck();
if (error==1)
{
MC_Correction();
}
else getParityByte(ActualParityByte);
getParityBit(ActualParityBit);
ARK_BitPrediction(round);
ARK_BytePrediction(round);
AddRoundKey(round);
ParityBitCheck();
if(error==1)
{
Correction();
}
else getParityByte(ActualParityByte);
}
// The last round is given below.
// The MixColumns function is not here in the last round.
SB_BitPrediction();
SB_BytePrediction();
SubBytes();
ParityBitCheck();
if(error==1)
{
Correction();
}
else getParityByte(ActualParityByte);
SR_BitPrediction();
SR_BytePrediction();
ShiftRows();
ParityBitCheck();
if (error==1)
{
Correction();
}
else getParityByte(PredictedParityByte);
getParityBit(ActualParityBit);
ARK_BitPrediction(Nr);
ARK_BytePrediction(Nr);
AddRoundKey(Nr);
ParityBitCheck();
if(error==1)
{
Correction();
}
// The encryption process is over.
// Copy the state array to output array.
for(i=0;i<4;i++)
{
for(j=0;j<4;j++)
{
out[i*4+j]=state[j][i];
}
}
}
A.2 CRC based SubBytes detection/correction solution
New variable:
unsigned char checksum_matrix[Nb][Nb]={{0,0,0,0},{0,0,0,0},{0,0,0,0},{0,0,0,0}};
In the previous code getSBoxParity() is replaced with:
int SBox_checksum(int num)
{
int checksum_sbox[256] =
{
//0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0xb8,0x58,0xf8,0x00,0x90,0xd0,0x40,0xf0,0x38,0x80,0x28,0x48,0x68,0x68,0x30,0x78, //0
0xc0,0x30,0x08,0xd8,0xf8,0xa0,0xc0,0xd8,0xe8,0xa0,0xd8,0xa0,0x50,0x00,0xe8,0xe0, //1
0x18,0xa0,0x60,0x30,0xe0,0x08,0x80,0x18,0xa8,0x80,0x18,0x58,0x20,0x58,0xb8,0xc0, //2
0x90,0xb8,0x20,0x28,0xb8,0x70,0x10,0x88,0x58,0x98,0x78,0x40,0xa8,0xb0,0x08,0xb0, //3
0xe8,0xb0,0x10,0xf0,0x70,0xc0,0x68,0x90,0x00,0x98,0xe8,0x88,0x00,0xc0,0xd8,0xe8, //4
0x80,0xb0,0x00,0x70,0xe8,0x20,0xc0,0xe8,0x50,0x40,0xf0,0xd0,0xb8,0x60,0x20,0xd0, //5
0x30,0x38,0xb0,0x78,0x50,0xe0,0xf0,0x68,0x88,0x30,0x48,0x90,0x48,0xc0,0x98,0xf8, //6
0xc8,0x58,0x98,0x48,0xe0,0xd0,0x50,0xc8,0xb8,0x98,0x10,0x68,0xd0,0xe8,0x10,0x78, //7
0x98,0xf8,0x18,0xf0,0x78,0xf0,0x08,0x88,0x70,0xc8,0x10,0x40,0xe0,0x30,0x38,0x68, //8
0x70,0xf8,0xa8,0xc8,0xa0,0xc8,0xa8,0x10,0x40,0xb8,0x28,0x40,0x80,0xf8,0xa0,0x90, //9
0x08,0x70,0x18,0x20,0x70,0xd8,0x78,0xb0,0xa8,0xf8,0x68,0x38,0x28,0xb8,0x98,0x48, //A
0x50,0x88,0x60,0x08,0x00,0x20,0x28,0x78,0x88,0x90,0x48,0x28,0x60,0x80,0x20,0x68, //B
0x60,0xc8,0xf8,0x58,0x28,0x48,0xd0,0x38,0x60,0x48,0x30,0xe0,0x38,0x38,0xd8,0x58, //C
0xa0,0x88,0x50,0xa8,0xf0,0xc8,0x00,0xb0,0xf0,0x28,0x10,0xa8,0xa0,0x60,0xa8,0x18, //D
0x88,0xb0,0xc0,0x50,0x98,0xd8,0xc8,0x38,0x08,0x60,0x20,0xe0,0x50,0x58,0x80,0x00, //E
0x80,0x10,0x90,0x78,0x70,0xd0,0xd0,0x18,0x18,0x40,0x90,0x30,0x40,0xd8,0xe0,0x08}; //F
return checksum_sbox[num];
}
The SB_BitPrediction() function is not needed in this solution.
The following functions are used instead:
unsigned char Checksum_calculation(unsigned char byte)
{
unsigned char poly_gen=0xa4; //6 bit poly + 00
int k=0; //k is the number of shift
while(k<7){
if(byte<0x80)//shift zeros
{
byte=byte<<1;
k++;
}
else byte^=poly_gen;}
return byte;
}
void SB_Prediction_checksum()
{
for(int i=0;i<4;i++)
{
for(int j=0;j<4;j++)
{
checksum_matrix[i][j] = SBox_checksum(state[i][j]);
}
}
}
The detection uses, instead of ParityBitCheck(), this new function:
void SB_detection()
{
for(int i=0;i<4;i++)
{
for(int j=0;j<4;j++)
{
if(checksum_matrix[i][j]!=Checksum_calculation(state[i][j]))
{
error=1;
ErrWord[j]=1;
ErrByte[j]=i;
}
}
}
}
The Cipher() is then modified as follows for the SubBytes() transformation:
SB_Prediction_checksum();
SB_BytePrediction();
SubBytes();
SB_detection();
if(error==1)
{
    Correction();
}
else getParityByte(ActualParityByte);
A.3 InvSBox based SubBytes detection/correction solution
In the complete code of section A.1 getSboxParityBit() is replaced by:
int getSBoxInvert(int num)
{
int rsbox[256] =
{ 0x52,0x09,0x6a,0xd5,0x30,0x36,0xa5,0x38,0xbf,0x40,0xa3,0x9e,0x81,0xf3,0xd7,0xfb,
0x7c,0xe3,0x39,0x82,0x9b,0x2f,0xff,0x87,0x34,0x8e,0x43,0x44,0xc4,0xde,0xe9,0xcb,
0x54,0x7b,0x94,0x32,0xa6,0xc2,0x23,0x3d,0xee,0x4c,0x95,0x0b,0x42,0xfa,0xc3,0x4e,
0x08,0x2e,0xa1,0x66,0x28,0xd9,0x24,0xb2,0x76,0x5b,0xa2,0x49,0x6d,0x8b,0xd1,0x25,
0x72,0xf8,0xf6,0x64,0x86,0x68,0x98,0x16,0xd4,0xa4,0x5c,0xcc,0x5d,0x65,0xb6,0x92,
0x6c,0x70,0x48,0x50,0xfd,0xed,0xb9,0xda,0x5e,0x15,0x46,0x57,0xa7,0x8d,0x9d,0x84,
0x90,0xd8,0xab,0x00,0x8c,0xbc,0xd3,0x0a,0xf7,0xe4,0x58,0x05,0xb8,0xb3,0x45,0x06,
0xd0,0x2c,0x1e,0x8f,0xca,0x3f,0x0f,0x02,0xc1,0xaf,0xbd,0x03,0x01,0x13,0x8a,0x6b,
0x3a,0x91,0x11,0x41,0x4f,0x67,0xdc,0xea,0x97,0xf2,0xcf,0xce,0xf0,0xb4,0xe6,0x73,
0x96,0xac,0x74,0x22,0xe7,0xad,0x35,0x85,0xe2,0xf9,0x37,0xe8,0x1c,0x75,0xdf,0x6e,
0x47,0xf1,0x1a,0x71,0x1d,0x29,0xc5,0x89,0x6f,0xb7,0x62,0x0e,0xaa,0x18,0xbe,0x1b,
0xfc,0x56,0x3e,0x4b,0xc6,0xd2,0x79,0x20,0x9a,0xdb,0xc0,0xfe,0x78,0xcd,0x5a,0xf4,
0x1f,0xdd,0xa8,0x33,0x88,0x07,0xc7,0x31,0xb1,0x12,0x10,0x59,0x27,0x80,0xec,0x5f,
0x60,0x51,0x7f,0xa9,0x19,0xb5,0x4a,0x0d,0x2d,0xe5,0x7a,0x9f,0x93,0xc9,0x9c,0xef,
0xa0,0xe0,0x3b,0x4d,0xae,0x2a,0xf5,0xb0,0xc8,0xeb,0xbb,0x3c,0x83,0x53,0x99,0x61,
0x17,0x2b,0x04,0x7e,0xba,0x77,0xd6,0x26,0xe1,0x69,0x14,0x63,0x55,0x21,0x0c,0x7d};
return rsbox[num];
}
SB_BitPrediction is not needed anymore. The new functions used are:
void storePreviousState()
{
    for(int i=0;i<4;i++)
    {
        for(int j=0;j<4;j++)
        {
            pstate[i][j] = state[i][j];
        }
    }
}
void SB_Correction()
{
    for(int j=0;j<4;j++)
    {
        if (ActualParityByte[j]!=PredictedParityByte[j])
        {
            for(int i=0; i<4; i++)
            {
                if(getSBoxInvert(state[i][j]) != pstate[i][j])
                {
                    state[i][j] = getSBoxValue(pstate[i][j]);
                }
            }
        }
    }
}
A new variable is also used:
unsigned char pstate[Nb][Nb];
The Cipher() is modified as follows for SubBytes() transformation:
storePreviousState();
SB_BytePrediction();
SubBytes();
getParityBit(ActualParityBit);
getParityByte(ActualParityByte);
SB_Correction();
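The inverse-based check can be tried in isolation with the sketch below. It is a simplified illustration, not the report's code: the names build_sboxes, correct_subbytes and demo_column_repair are hypothetical, the S-box and its inverse are generated algorithmically rather than read from the 256-entry tables of getSBoxValue() and getSBoxInvert(), and the parity-byte comparison that gates each column in SB_Correction() is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t rotl8(uint8_t x, int s)
{
    return (uint8_t)((x << s) | (x >> (8 - s)));
}

/* Builds the AES S-box and its inverse without table literals. */
void build_sboxes(uint8_t sbox[256], uint8_t rsbox[256])
{
    uint8_t p = 1, q = 1;
    do {
        /* p steps through all non-zero field elements (multiply by 3) */
        p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1B : 0));
        /* q stays the multiplicative inverse of p (divide by 3) */
        q ^= (uint8_t)(q << 1);
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        if (q & 0x80) q ^= 0x09;
        /* affine transformation of the inverse */
        sbox[p] = (uint8_t)(q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3)
                              ^ rotl8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63;
    for (int i = 0; i < 256; i++) rsbox[sbox[i]] = (uint8_t)i;
}

/* After SubBytes(): any byte whose inverse S-box image differs from the
   stored input is recomputed from that input. Returns the repair count. */
int correct_subbytes(uint8_t state[4][4], uint8_t pstate[4][4],
                     const uint8_t sbox[256], const uint8_t rsbox[256])
{
    int repaired = 0;
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 4; i++)
            if (rsbox[state[i][j]] != pstate[i][j]) {
                state[i][j] = sbox[pstate[i][j]];
                repaired++;
            }
    return repaired;
}

/* Corrupts all four bytes of column 2 and repairs them; returns the count. */
int demo_column_repair(void)
{
    uint8_t sbox[256], rsbox[256], state[4][4], pstate[4][4];
    build_sboxes(sbox, rsbox);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            pstate[i][j] = (uint8_t)(16 * i + j);   /* arbitrary test input */
            state[i][j] = sbox[pstate[i][j]];       /* fault-free SubBytes */
        }
    for (int i = 0; i < 4; i++) state[i][2] ^= 0xFF; /* inject the errors */
    int repaired = correct_subbytes(state, pstate, sbox, rsbox);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            assert(state[i][j] == sbox[pstate[i][j]]);
    return repaired;
}
```

Because the S-box is a bijection, every corrupted output byte maps back to a value different from the stored input, so the comparison catches any error confined to the SubBytes() output, whatever the number of flipped bits. The price is the extra copy of the state (pstate) and one inverse lookup per checked byte.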
Appendix B
B.1 Time performance
Parity bit based SB (execution time in µs)

Execution no.   Original code   No errors
1               120             150
2               90              150
3               100             150
4               140             130
5               100             160
6               100             150
7               100             160
8               120             150
9               120             160
10              100             160
Average         109,00          152,00
ST DEV          15,24           9,19
Overhead        -               39%
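The Average, ST DEV and Overhead rows appear to be the arithmetic mean, the sample standard deviation (n-1 in the denominator) and the relative increase of the mean over that of the original code. The sketch below, written under that assumption, reproduces the figures of the table above from its ten measurements. All function and array names are hypothetical, and the square root is computed with a small Newton iteration only to keep the example free of library dependencies.

```c
#include <assert.h>
#include <stddef.h>

/* Measurements copied from the table above (execution time in microseconds). */
static const double original_us[10]  = {120, 90, 100, 140, 100, 100, 100, 120, 120, 100};
static const double no_errors_us[10] = {150, 150, 150, 130, 160, 150, 160, 150, 160, 160};

double mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += x[i];
    return sum / (double)n;
}

/* Newton iteration for the square root of a non-negative value. */
static double newton_sqrt(double v)
{
    if (v <= 0.0) return 0.0;
    double r = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 60; i++) r = 0.5 * (r + v / r);
    return r;
}

/* Sample standard deviation: squared deviations divided by n - 1. */
double sample_stdev(const double *x, size_t n)
{
    assert(n > 1);
    double m = mean(x, n), ss = 0.0;
    for (size_t i = 0; i < n; i++) ss += (x[i] - m) * (x[i] - m);
    return newton_sqrt(ss / (double)(n - 1));
}

/* Relative increase of avg over avg_reference, in percent. */
double overhead_percent(double avg, double avg_reference)
{
    return 100.0 * (avg / avg_reference - 1.0);
}
```

With these definitions, mean(original_us, 10) is 109 and mean(no_errors_us, 10) is 152, the two standard deviations come out at about 15,24 and 9,19, and overhead_percent(152, 109) is about 39,4, which matches the 39% reported.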
Parity bit based SB, single error (execution time in µs)

Execution no.   ARK       SubBytes   ShiftRows   MixColumns
1               140       150        150         150
2               150       150        150         150
3               150       160        160         160
4               150       150        150         150
5               150       150        150         150
6               150       170        150         150
7               150       150        150         160
8               150       150        150         150
9               160       170        160         160
10              150       150        150         150
Average         150,00    155,00     152,00      153,00
ST DEV          4,71      8,50       4,22        4,83
Overhead        38%       42%        39%         40%
Parity bit based SB, 4 erroneous bytes (execution time in µs)

Execution no.   ARK       SubBytes   ShiftRows   MixColumns
1               160       0          150         150
2               160       0          170         160
3               150       0          160         170
4               150       0          180         150
5               160       0          150         140
6               150       0          150         150
7               160       0          170         150
8               170       0          160         150
9               160       0          150         160
10              150       0          170         150
Average         157,00    -          161,00      153,00
ST DEV          6,75      0          11,00       8,23
Overhead        44%       -          48%         40%
Inverse based SB (execution time in µs)

Execution no.   Original code   No errors
1               120             140
2               90              140
3               100             140
4               140             150
5               100             140
6               100             140
7               100             140
8               120             140
9               120             140
10              100             140
Average         109,00          141,00
ST DEV          15,24           3,16
Overhead        -               29%
Inverse based SB, single error (execution time in µs)

Execution no.   ARK       SubBytes   ShiftRows   MixColumns
1               140       140        140         130
2               140       150        130         140
3               160       140        130         140
4               140       140        160         170
5               140       150        150         160
6               140       140        140         130
7               140       140        140         140
8               140       180        140         140
9               140       140        140         140
10              140       140        140         140
Average         142,00    146,00     141,00      143,00
ST DEV          6,32      12,64      8,76        12,51
Overhead        30%       34%        29%         31%
Inverse based SB, 4 erroneous bytes (execution time in µs)

Execution no.   ARK       SubBytes   ShiftRows   MixColumns
1               150       140        180         140
2               140       140        140         160
3               140       160        140         140
4               150       140        140         140
5               150       190        170         140
6               140       160        140         150
7               140       140        140         140
8               140       140        150         140
9               140       160        140         140
10              140       150        140         140
Average         143,00    152,00     148,00      143,00
ST DEV          4,83      16,19      14,75       6,74
Overhead        31%       39%        36%         31%
CRC based SB (execution time in µs)

Execution no.   Original code   No errors
1               120             160
2               90              160
3               100             160
4               140             150
5               100             190
6               100             160
7               100             160
8               120             160
9               120             160
10              100             160
Average         109,00          162,00
ST DEV          15,24           10,32
Overhead        -               49%
CRC based SB, single error (execution time in µs)

Execution no.   ARK       SubBytes   ShiftRows   MixColumns
1               160       160        160         160
2               170       160        170         160
3               160       160        160         150
4               160       170        170         160
5               160       160        160         160
6               170       170        160         170
7               160       170        160         160
8               150       170        140         160
9               160       160        160         160
10              160       170        170         170
Average         161,00    165,00     161,00      161,00
ST DEV          5,68      5,27       8,76        5,68
Overhead        48%       51%        48%         48%
CRC based SB, 4 erroneous bytes (execution time in µs)

Execution no.   ARK       SubBytes   ShiftRows   MixColumns
1               160       170        160         160
2               160       170        170         160
3               170       160        170         180
4               170       170        180         170
5               180       160        170         170
6               160       170        160         170
7               160       160        160         160
8               160       170        160         170
9               160       170        180         160
10              140       160        170         170
Average         162,00    166,00     168,00      167,00
ST DEV          10,32     5,16       7,88        6,74
Overhead        49%       52%        54%         53%
B.2 CPU load performance
Inverse based SB

Function name                              Inclusive samples   Exclusive samples   Inclusive samples %   Exclusive samples %
getSBoxValue(int)                          202 817             200 586             43,51                 43,03
ParityBitCheck(void)                       56 106              6 460               12,04                 1,39
SB_BytePrediction(void)                    42 636              42 636              9,15                  9,15
getParityBit(unsigned char (* const)[4])   40 845              40 845              8,76                  8,76
ARK_BitPrediction(int)                     24 298              1 675               5,21                  0,36
MC_BitPrediction(void)                     23 489              4 341               5,04                  0,93
MC_ParityBitCheck(void)                    22 238              4 097               4,77                  0,88
SR_BitPrediction(void)                     20 253              553                 4,34                  0,12
SB_Correction(void)                        12 762              5 530               2,74                  1,19
getSBoxInvert(int)                         5 939               5 867               1,27                  1,26
getSBoxValue(int)                          1 293               1 270               0,28                  0,27
MixColumns(void)                           4 343               4 343               0,93                  0,93
ARK_BytePrediction(int)                    2 314               2 314               0,50                  0,50
SR_BytePrediction(void)                    1 397               1 397               0,30                  0,30
AddRoundKey(int)                           1 238               1 238               0,27                  0,27
ShiftRows(void)                            725                 725                 0,16                  0,16
MC_Correction(void)                        526                 526                 0,11                  0,11
error_injection(void)                      499                 463                 0,11                  0,10
Correction(void)                           375                 375                 0,08                  0,08
Parity bit based SB

Function name                              Inclusive samples   Exclusive samples   Inclusive samples %   Exclusive samples %
getSBoxParity(int)                         195 112             193 142             30,74                 30,43
getSBoxValue(int)                          192 502             190 596             30,33                 30,03
ParityBitCheck(void)                       73 677              9 289               11,61                 1,46
SB_BytePrediction(void)                    44 221              44 221              6,97                  6,97
ARK_BitPrediction(int)                     21 792              1 804               3,43                  0,28
MC_ParityBitCheck(void)                    21 719              3 713               3,42                  0,58
getParityBit(unsigned char (* const)[4])   21 695              21 695              3,42                  3,42
MC_BitPrediction(void)                     21 568              4 007               3,40                  0,63
SR_BitPrediction(void)                     19 864              529                 3,13                  0,08
MixColumns(void)                           3 698               3 698               0,58                  0,58
ARK_BytePrediction(int)                    2 230               2 230               0,35                  0,35
SR_BytePrediction(void)                    1 394               1 394               0,22                  0,22
AddRoundKey(int)                           1 331               1 331               0,21                  0,21
Correction(void)                           564                 564                 0,09                  0,09
error_injection(void)                      524                 490                 0,08                  0,08
ShiftRows(void)                            508                 508                 0,08                  0,08
MC_Correction(void)                        429                 429                 0,07                  0,07
CRC based SB

Function name                              Inclusive samples   Exclusive samples   Inclusive samples %   Exclusive samples %
SBox_checksum(int)                         196 024             193 753             27,92                 27,60
getSBoxValue(int)                          193 084             191 244             27,50                 27,24
SB_detection(void)                         88 293              88 293              12,58                 12,58
ParityBitCheck(void)                       50 308              6 555               7,17                  0,93
SB_BytePrediction(void)                    41 881              41 881              5,97                  5,97
getParityBit(unsigned char (* const)[4])   22 223              22 223              3,17                  3,17
MC_ParityBitCheck(void)                    21 964              3 906               3,13                  0,56
MC_BitPrediction(void)                     21 886              4 473               3,12                  0,64
ARK_BitPrediction(int)                     21 634              1 674               3,08                  0,24
SR_BitPrediction(void)                     20 503              585                 2,92                  0,08
MixColumns(void)                           3 563               3 563               0,51                  0,51
ARK_BytePrediction(int)                    2 436               2 436               0,35                  0,35
SR_BytePrediction(void)                    1 358               1 358               0,19                  0,19
AddRoundKey(int)                           1 077               1 077               0,15                  0,15
error_injection(void)                      615                 547                 0,09                  0,08
ShiftRows(void)                            593                 593                 0,08                  0,08
MC_Correction(void)                        587                 587                 0,08                  0,08
Correction(void)                           565                 565                 0,08                  0,08
B.3 Memory usage performance

                     Memory usage (KB)   Overhead
Original code        1308                -
ParityBit based SB   1332                1,8%
InvSbox based SB     1352                3,4%
CRC based SB         1368                4,6%
Bibliography
[1] John G. Proakis (2001). “Digital Communications”, 4th edition, McGRAW-HILL International
edition.
[2] S. Benedetto, E. Biglieri, V. Castellari (1987), “Digital Transmission Theory”, Prentice Hall.
[3] W. Stallings (2007), “Network Security Essentials: Applications and Standards”, 3rd edition,
Pearson Prentice Hall.
[4] H. Feistel (May 1973), “Cryptography and computer privacy”, Scientific American.
[5] Yoshitaka
Ikeda
(2008),
Available
http://commons.wikimedia.org/wiki/File:Feistel.png.
from
the
Internet,
[6] Animal, a new interactive modeller for animations in lectures, v. 2.3.14: “DES, Data
Encryption Standard”, 2008
[7] Wikipedia, The free encyclopedia: “The Advanced Encryption Standard”, Available on the
Internet, http:/en.wikipedia.org/wiki/Advanced_Encryption_Standard.
[8] RSA Laboratories.. “What is a Block Cipher?”, Cryptography, 2007. Retrieved from
http://www.rsa.com/rsalabs/node.asp?id=2171.
[9] H. Lipmaa, P. Rogaway, D. Wagner (2000): “Comments to NIST concerning AES mode of
Operations: CTR-Mode Encryption”, In Symmetric Key Block Cipher Modes of operation
Workshop, Baltimore, Maryland, USA.
[10] J. Jaffe (2007), “A First-Order DPA Attack Against AES in Counter Mode with Unknown
Initial Counter”, in: Lecture Notes in Computer Science, Vol. 4727/2007, Springer Berlin /
Heidelberg, and presentation from the Rump Session Talk, CHES 2006.
[11] Ors, et. al., “Power Analysis Attacks: Power-Analysis attack on ASIC AES implementation”,
presented
by
Michael
Cloppert,
available
on
the
Internet:
http://www.cloppert.org/Power-Analysis_Attack_Presentation.pdf
[12] Onur Acıi (2006), Werner Schindler, and C, etin K. Ko, “Cache Based Remote Timing Attack
on the AES” in: Lecture notes in computer science, Springer Berlin / Heidelberg, Vol. 4377.
[13] A. Biryukov and D. Khovratovich (2009), “Related-key Cryptanalysis of the Full AES-192 and
AES-256”, University of Luxembourg
[14] Available on Internet: http://en.wikipedia.org/wiki/Data_Encryption_Standard
[15] Bertoni G. Breveglieri L., Koren I., Maistri P., Piuri V. (Nov. 2003): “Detecting and Locating
Faults in VLSI Implementations of the Advanced Encryption Standard”, Proc. of the 2003
94
IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 105113.
[16] P. Dusart, G. Letourneux, O. Vivolo (2002), “Differential Fault Analysis on AES”, in: Lecture
Notes in Computer Science, Vol. 2846/2003, Springer Berlin / Heidelberg.
[17] Biham E., Shamir A. (1997): “Differential Fault Analysis of Secret Key Cryptosystems”,
Advances in Cryptology - CRYPTO’97, LNCS, vol. 1294, pp.513-525, Springer-Verlag
[18] Bertoni G., Breveglieri L., Koren I., Maistri P., Piuri V. (2003): “Error analysis and detection
procedures for a hardware implementation of the advanced encryption standard”. IEEE
Trans. Comput. 52(4), 492–505.
[19] Karri R.,Wu K., Kuznetsov G., GoesselM. (2004): “Low cost concurrent error detection for
the advanced encryption standard”. In: Proceedings of the International Test Conference
2004, pp. 1242–1248.
[20] Kulikowski K.J., KarpovskyM.G., Taubin A. (2006): “Fault attack resistant cryptographic
hardware with uniform error detection”. In: Proceedings of the FDTC 2006, LNCS, vol.
4236, pp. 185–195.
[21] Yen C. -H., Wu B.-F. (2006): “Simple error detection methods for hardware
implementation of advanced encryption standard”. IEEE Trans. Comput. 55(6), 720–731.
[22] Yen S.-M., Kim S., Lim S., Moon S.(2003): “RSA speedup with Chinese reminder theorem
immune against hardware fault cryptanalysis”. IEEE Trans. Comput. 52(4), 461–472
(2003).
[23] Yen S.-M., Joye M. (2000): “Checking before output may not be enough against faultbased cryptanalysis”. IEEE Trans. Comput. 49(9), 967–970.
[24] M. Czapskii - M. Nikodem (2008): “Error detection and error correction procedures for the
advanced encryption standard”. In Springer Science+Business Media, LLC.
[25] Federal Information Processing Standards Publication 197 (2001), “Announcing the
ADVANCED ENCRYPTION STANDARD (AES)”. Available on the Internet:
http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[26] C. Giraud (2003), “DFA on AES”, in: Advanced Encryption Standard, pg. 27-41, AES 4th
international conference, AES 2004.
[27] Chien-Ning Chen, Sung-Ming Yen (2003), “Differential Fault Analysis on AES key schedule
and some countermeasures”, in: Lecture Notes in Computer Science, Vol. 2727/2003,
Springer Berlin / Heidelberg.
[28] Peterson, W. W. and Brown, D. T. (January 1961). "Cyclic Codes for Error
Detection". Proceedings of the IRE 49: 228.
[29] Wikipedia,
The
free
encyclopedia,
“Cyclic
redundancy
check”,
http://en.wikipedia.org/wiki/Cyclic_redundancy_check
95
[30] Niyaz PK , “Advanced Encryption Standard (AES) Implementation in C/C++”, available on
internet: http://www.hoozi.com/Articles/AESEncryption.htm
[31] National Institute of Standards and Technology, Computer security division, “AES Known
Answer
Test
vectors”,
available
on
the
Internet:
http://csrc.nist.gov/groups/STM/cavp/documents/aes/KAT_AES.zip
[32] H. Pulapaka, B. Vidolov (March 2008), “Find Application Bottlenecks with Visual Studio
Profiler”,
on
MSDN
magazine
issues,
available
on
the
Internet:
http://msdn.microsoft.com/en-us/magazine/cc337887.aspx
96