AALBORG UNIVERSITY COPENHAGEN

Error detection and correction algorithm for AES

Fabrizio Di Napoli
Alexandre Noizet
Lucio Quagliozzi

Fall Semester 2009

Communication Networks, Specializing in Security
Lautrupvang 15, 2750 Ballerup, Denmark

Title: Error detection and correction algorithm for AES
Project Period: September – December 2009
Semester Theme: Basic Security
Semester Coordinator: Birger Andersen
Secretary: Judi Stærk Poulsen
Phone: 9940 2468
Supervisor(s): Henrik Tange, Birger Andersen
Members: Fabrizio Di Napoli, Alexandre Noizet, Lucio Quagliozzi
Copies: 3
Pages: 103
Finished: 17/12/2009

Abstract: In the encryption process, errors injected into data not only compromise the quality of the message, but can also be used by malicious users to perform attacks and break the secrecy of the data. The purpose of this project is to analyze the encryption process with a focus on data corruption and how it could be avoided. The analysis covers 3 main subjects:
- Review the Advanced Encryption Standard.
- Examine how errors can be injected in order to recover secret information from data.
- Inspect a way to detect and eventually correct these errors to keep the encryption process safe.
Finally, different solutions are suggested for a software implementation of the error detection and correction algorithm, accompanied by performance tests. It was found that bit errors of odd multiplicity in at most one byte in each word could be corrected with a time overhead below 49%.

Copyright © 2009. This report and/or appended material may not be partly or completely published or copied without prior written approval from the authors. Neither may the contents be used for commercial purposes without this written approval.
Table of contents

List of figures
Preface
Chapter 1 Introduction
  1.1 General structure of a digital communication system
  1.2 Errors introduced by the communication channel
  1.3 Errors injected during the encryption phase
  1.4 Overview of some encryption algorithms
    1.4.1 Feistel Cipher structure
    1.4.2 Data Encryption Standard
    1.4.3 Triple DES
    1.4.4 Advanced Encryption Standard
    1.4.5 Cipher Block Modes
  1.5 Some attacks on AES
    1.5.1 DPA Attack
    1.5.2 Cache-collision
    1.5.3 Boomerang attack
    1.5.4 DFA attack
  1.6 Problem definition
Chapter 2 Error Detection and error correction on the AES algorithm
  2.1 Algorithm process of encryption
  2.2 Key Scheduling
  2.3 Substitute Bytes
    2.3.1 Shift Rows
    2.3.2 Mix Columns
    2.3.3 Add Round Key
  2.4 Algorithm process of decryption
  2.5 Differential Fault Analysis on AES
    2.5.1 Description of the fault injection
    2.5.2 Key Extraction
    2.5.3 Generalization
    2.5.4 Attack Complexity
  2.6 The error detection and correction algorithm
    2.6.1 Parity Bits and error detection
    2.6.2 Parity Bytes and error correction
Chapter 3 Implementation of the error detection and error correction algorithm
  3.1 Parity bit and byte computation
    3.1.1 Parity Check
  3.2 Error detection and correction for ShiftRows
    3.2.1 Parity prediction
    3.2.2 Error correction
  3.3 Error detection and correction for MixColumns
    3.3.1 Parity Prediction
    3.2.2 Error correction
  3.3 Error detection and correction for AddRoundKey
    3.3.1 Parity Prediction
    3.3.2 Error Correction
  3.4 Error detection and correction for SubBytes
    3.4.1 Parity Bit based SubBytes error detection and correction
    3.4.2 Inverse SBox based error detection and correction
    3.4.3 CRC based error detection and correction
Chapter 4 Error coverage and performances tests
  4.1 Test Environment
  4.2 Assumptions proof
  4.3 ShiftRows Limits
  4.4 Parity bit based SubBytes Limits
  4.5 CRC based SubBytes Limits
  4.6 Performances analysis
    4.6.1 Time
      4.6.1.1 Without error
      4.6.1.2 With a single error injected
      4.6.1.3 With four errors injected
    4.6.2 CPU load
    4.6.3 Memory usage
Chapter 5 Conclusions and future work
Appendix A AES source code with error detection and correction
  A.1 Complete Source code with parityBit based SubBytes detection/correction
  A.2 CRC based SubBytes detection/correction solution
  A.3 InvSBox based SubBytes detection/correction solution
Appendix B
  B.1 Time performance
    Parity bit based SB
    Inverse based SB
    CRC based SB
  B.2 CPU load performance
    Inverse based SB
    Parity bit based SB
    CRC based SB
  B.3 Memory usage performance
Bibliography

List of figures

Figure 1: Basic elements of a digital communication system
Figure 2: Encryption/Decryption process
Figure 3: The Feistel Structure
Figure 4: Triple DES algorithm
Figure 5: Cipher Block Chaining Mode
Figure 6: Counter Mode structure
Figure 7: DPA Analysis
Figure 8: Data collection during AES process
Figure 9: The SubBytes Transformation
Figure 10: The Shift Row Transformation
Figure 11: The MixColumn Transformation
Figure 12: The AddRoundKey Transformation
Figure 13: AES encryption/decryption process
Figure 14: Matrix S and parity bits in green
Figure 15: Data, Checksum packet and the corresponding math expression
Figure 16: Changes to the parity bits for j-th output Word after MixColumns transformation depending on error pattern inducted and byte of the Word affected by error
Figure 17: AES’s input State with parity bits and bytes, output State and correction mask
Figure 18: Correction matrix for MixColumn transformation
Figure 19: CPU load for the Paritybit based solution
Figure 20: CPU load for the CRC based solution
Figure 21: CPU load for the InvSBox based solution

Preface

This report describes the project carried out by the students of the CNS department from the 1st of September to the 18th of December 2009. The report is an investigation of the Advanced Encryption Standard and of the attacks performed on it through error injection.
The main purpose is the study of error detection and error correction algorithms proposed by researchers, and a new software implementation of them. The main report is divided into 5 parts:
- Chapter 1 provides an introduction to communications and data security, together with the definition of the project purpose.
- Chapter 2 describes the Advanced Encryption Standard and a Differential Fault Analysis on it, and then presents the analysis of error detection and correction algorithms proposed by other researchers for AES.
- Chapter 3 suggests a software implementation of the error detection and correction algorithm to prevent DFA attacks.
- Chapter 4 reviews the error coverage and performance tests on the new software implementation.
- Chapter 5 concludes the report with an overall discussion of what we presented and ideas for future developments.

The exhaustive list of citations is provided at the end of this document. The appendix includes the C code written for the algorithm and some graphical results of the performance tests. The source code and the pdf / doc files for this document are provided on the enclosed CD-ROM.

Chapter 1
Introduction

The digital communications field involves the transmission of information in digital form, from a source that generates the information to one or more destinations. In this scenario, data integrity is a main prerequisite that a communication system has to preserve: if data get corrupted during the transmission and the receiver is not able to decode the content, the communication has no reason to exist. To understand how transmitted data can be protected from being altered by errors, some background is needed on the steps involved, the components that act on the data and the factors that may make the transmission unsafe. The aim of this chapter is to analyze how the communication works and to present some possible sources of error that can occur during a transmission and affect the integrity of the transmitted data.
1.1 General structure of a digital communication system

The source that generates the messages to be transmitted does not always produce binary information by itself. For this reason, the original messages produced by the source are converted into a sequence of binary digits. To obtain an efficient representation of the source output, the source encoding process reduces the redundancy in the original data flow as much as possible. The sequence of binary digits from the source encoder is then passed to the channel encoder, which introduces, in a controlled manner, some redundancy that the receiver can use to overcome the effects of noise and interference encountered during transmission through the channel. The channel encoder output then goes to the digital modulator, which serves as an interface to the communication channel: its primary purpose is to map the binary information sequence into waveform signals that can be transmitted through the channel. This stage usually deals with the analog world.

The communication channel is the physical medium used to send the signal from the transmitter to the receiver. It can be the air in a wireless transmission, or twisted pairs or optical fibers in a wired communication. At the receiving end of a digital system, the digital demodulator estimates the transmitted data symbols from the received waveform. These symbols are then processed by the channel decoder, which attempts to reconstruct the original information sequence using the redundancy contained in the received data. Finally, the source decoder reconstructs the original signal [1 – Ch. 1].

In addition to this, both sender and receiver want their data transfers to be protected against eavesdropping and tampering, so that nobody but them is able to understand the information sent through the channel.
For this reason we introduce a new block into the classical scheme of a digital communication system. It obscures the meaning of a piece of information by encoding it in such a way that it can only be decoded, read and understood by the people for whom the information is intended: this block is the encipherer/decipherer. Figure 1 illustrates the functional diagram and the basic elements of a digital communication system as just described.

Figure 1: Basic elements of a digital communication system

During the whole process, data encounter different factors that could compromise their integrity and make the delivered message corrupted or different from the original one. First of all, in signal transmission through any channel, various sources of noise and interference may arise externally or within the system. In addition, malicious users could be interested in intercepting the communication and substituting or corrupting the message in order to discover the parameters (the key) used in the encryption process. For now, let us briefly discuss these scenarios and mention some solutions to avoid data corruption.

1.2 Errors introduced by the communication channel

As indicated in the preceding discussion, the communication channel is the connection between the transmitter and the receiver. The physical channel may be a pair of wires that carry the electrical signal, an optical fiber that carries the information on a light beam, or simply free space over which the signal is radiated by an antenna. One common problem is additive noise generated internally by electrical components, sometimes called thermal noise. Other interference or noise can also be generated externally by other users of the channel or by electromagnetic fields. Because of these factors, the spectrum of the transmitted signal may be attenuated or may contain frequencies that were not present before.
Consequently, one or more bits of the message are changed with a certain probability. Different channel models have been defined to describe these phenomena and the error probability they introduce: for instance the perfect channel, where no error occurs; the useless channel, where errors always occur; and different binary symmetric channels, in which each of the two possible transmitted symbols is received in error with probability p. Each channel model describes a situation with a different number of sent/received symbols and a different way in which errors can occur [2 – Ch. 3]. These models helped to show that a mechanism is needed to recognise whether errors occurred and, possibly, to correct them. This is exactly what the channel encoder achieves: an input sequence of k symbols is represented by a longer sequence of n>k symbols, adding redundancy that allows the channel decoder to detect and possibly correct errors. Among the most interesting codes are the Reed-Solomon codes, which can detect and correct different combinations of erroneous symbols. They are very efficient in correcting error bursts, in which errors cluster in runs of more than one consecutive erroneous bit. This is a typical situation in some transmission media, such as the wireless channel.

1.3 Errors injected during the encryption phase

Errors can also occur during the encryption process. In this process the message (plaintext) is encoded according to a key through different logical operations on bits, in order to hide the content of the message so that only someone who knows the key is able to understand it and revert the process (Figure 2).

Figure 2: Encryption/Decryption process [3 – pg 30]

It is becoming very common that, in this phase, someone, whom we will call the hacker, tries to break the security of the algorithm and discover its secret key.
One way to do this is to intentionally induce faults into the system and collect the correct outputs as well as the faulty ones. The hacker then compares them and tries to retrieve the secret information embedded inside the encryption system, which is, almost all the time, the secret key. These attacks will be discussed in more detail later, but first we need to better understand how the encryption process acts on data.

1.4 Overview of some encryption algorithms

All encryption algorithms are based on two general principles: substitution, in which each element of the plaintext is mapped into another element, and transposition, in which the elements of the plaintext are rearranged [3 – Ch. 2]. The fundamental requirement is that no information is lost. We can also distinguish two kinds of system according to the number of keys used: if both sender and receiver use the same key, the system is referred to as symmetric; if instead they each use a different key, the system is referred to as asymmetric. Finally, we can distinguish block ciphers, which process the input one block of elements at a time and produce an output block for each input block, from stream ciphers, which process the input elements continuously and produce the output one element at a time. In this project we focus on the AES algorithm, which belongs to the symmetric block ciphers. Before analysing it, we provide a short overview of the main symmetric ciphers and of the most common modes in which they are used.

1.4.1 Feistel Cipher structure

The Feistel Cipher is a particular example of the general structure used by most symmetric block ciphers. This structure, described by Horst Feistel in 1973 [4], is shown in Figure 3: the plaintext block is divided into two halves that pass through a number of rounds of processing and are then recombined to produce the output ciphertext.
In each round, the right half coming from the previous round is processed, together with the subkey Ki derived from the initial key, by a substitution function F, and the result is XORed with the left half. The round function has the same general structure in every round, but it is parameterized by the round subkey.

Figure 3: The Feistel Structure [5]

A permutation is then performed at the end of each round by switching the two halves of the data and using them as inputs of the next round. Many block ciphers, such as DES and 3DES, are derived from this structure.

1.4.2 Data Encryption Standard

The Data Encryption Standard (DES) was adopted in 1977 by the National Bureau of Standards, now NIST. Here, the input block is 64 bits long and the key is 56 bits long. It has the Feistel structure, with 16 rounds and, as a consequence, 16 subkeys. The decryption process is the same as the encryption process, but the subkeys are used in the reverse order. The DES core function F performs an expansion of the right half from 32 to 48 bits, an XOR with the round subkey, and a substitution through 8 S-boxes, each applied to a group of 6 bits. The S-box outputs are then concatenated, and an additional permutation is applied to the result. There is just a minor variation with respect to the original Feistel structure: an initial permutation is applied to the input block before the first round, and its inverse is applied to the output of the last round [6].

1.4.3 Triple DES

Triple DES was standardized for use in financial applications in ANSI standard X9.17 in 1985. It uses three keys to perform three executions of the DES algorithm (Figure 4): a DES encryption with the first key is followed by a DES decryption with the second key, and finally by another DES encryption with the third key. The decryption algorithm instead follows the Decryption-Encryption-Decryption scheme, with the same keys used in the reverse order.
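The Feistel structure and the Encryption-Decryption-Encryption composition described above can be illustrated with a minimal sketch in C. It is not DES: the 32-bit block, the 4 rounds, the round function toy_f and the key schedule toy_subkeys are hypothetical choices made only to keep the example short. The sketch shows that decryption reuses the encryption routine with the subkeys in reverse order, and how three keys are chained.

```c
#include <stdint.h>

#define ROUNDS 4  /* toy value chosen for brevity; DES uses 16 rounds */

/* Hypothetical round function F. It does not need to be invertible. */
static uint16_t toy_f(uint16_t half, uint16_t subkey)
{
    uint16_t x = (uint16_t)(half ^ subkey);
    return (uint16_t)(((x << 3) | (x >> 13)) + 0x9E37u);
}

/* Hypothetical key schedule: derives one 16-bit subkey per round. */
static void toy_subkeys(uint32_t key, uint16_t sk[ROUNDS])
{
    for (int i = 0; i < ROUNDS; i++)
        sk[i] = (uint16_t)((key >> (4 * i)) ^ (0x1111u * (uint32_t)(i + 1)));
}

/* Feistel network on a 32-bit block split into two 16-bit halves.
 * With decrypt != 0 the subkeys are simply used in reverse order. */
static uint32_t feistel(uint32_t block, uint32_t key, int decrypt)
{
    uint16_t sk[ROUNDS];
    uint16_t left = (uint16_t)(block >> 16);
    uint16_t right = (uint16_t)block;

    toy_subkeys(key, sk);
    for (int i = 0; i < ROUNDS; i++) {
        uint16_t k = sk[decrypt ? ROUNDS - 1 - i : i];
        uint16_t old_right = right;
        right = (uint16_t)(left ^ toy_f(right, k)); /* F output XOR left half */
        left = old_right;                           /* switch the two halves  */
    }
    /* Undo the last switch so that the same routine also decrypts. */
    return ((uint32_t)right << 16) | left;
}

uint32_t toy_encrypt(uint32_t block, uint32_t key) { return feistel(block, key, 0); }
uint32_t toy_decrypt(uint32_t block, uint32_t key) { return feistel(block, key, 1); }

/* Encryption-Decryption-Encryption with three keys. */
uint32_t ede_encrypt(uint32_t p, uint32_t k1, uint32_t k2, uint32_t k3)
{
    return toy_encrypt(toy_decrypt(toy_encrypt(p, k1), k2), k3);
}

/* Decryption-Encryption-Decryption with the keys in reverse order. */
uint32_t ede_decrypt(uint32_t c, uint32_t k1, uint32_t k2, uint32_t k3)
{
    return toy_decrypt(toy_encrypt(toy_decrypt(c, k3), k2), k1);
}
```

Calling ede_decrypt on the output of ede_encrypt with the same three keys returns the original block. When the first two keys are equal, the first two stages cancel out and the result is a single encryption with the third key.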
Figure 4: Triple DES algorithm

The overall key size is 168 bits, but the algorithm also allows the first and third keys to be equal (K1=K3), giving a key length of 112 bits.

1.4.4 Advanced Encryption Standard

The Advanced Encryption Standard (AES) consists of three block ciphers, AES-128, AES-192 and AES-256, adopted from a larger collection originally published as Rijndael. Each AES cipher has a 128-bit block size, with key sizes of 128, 192 and 256 bits, respectively. The AES ciphers have been analyzed extensively and are now used worldwide, as was the case with their predecessor, the Data Encryption Standard (DES) [7].

1.4.5 Cipher Block Modes

The simplest way to operate a block cipher is called electronic codebook (ECB), where each block of plaintext is encrypted with the same key according to the chosen encryption algorithm. In this mode, if the same block of plaintext appears more than once in the message, it always produces the same ciphertext. Some alternatives to ECB solve this problem. One way is to chain the block that we are going to encrypt with the last ciphertext block obtained. This is shown in Figure 5 and is called Cipher Block Chaining mode: the input to the encryption algorithm is the XOR of the current plaintext block and the previous ciphertext block. The key used to encrypt each block is always the same. Decryption mirrors this structure: each ciphertext block is decrypted and the result is XORed with the previous ciphertext block.

Figure 5: Cipher Block Chaining Mode [8]

In Counter Mode encryption [9], shown in Figure 6, the chosen algorithm and key are instead used to encrypt a counter that is incremented by one for each block. The output is XORed with the plaintext block to produce the ciphertext block. This mode of operation allows different blocks of plaintext to be encrypted/decrypted at the same time. The decryption process follows the same structure.
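The two modes described above can be sketched in C as follows. The block cipher used here (toy_encrypt / toy_decrypt on 32-bit blocks) is a hypothetical stand-in for AES, chosen only so that the example is self-contained, and the iv and counter0 parameters are assumptions: they stand for the initial value chained with the first block in chaining mode and for the starting value of the counter.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned r) { return (x << r) | (x >> (32 - r)); }
static uint32_t rotr32(uint32_t x, unsigned r) { return (x >> r) | (x << (32 - r)); }

/* Hypothetical invertible toy block cipher standing in for AES. */
static uint32_t toy_encrypt(uint32_t b, uint32_t key)
{
    for (int i = 0; i < 4; i++) {
        b ^= key;
        b = rotl32(b, 5);
        b += 0x9E3779B9u;
    }
    return b;
}

/* Inverse of toy_encrypt: the same steps undone in reverse order. */
static uint32_t toy_decrypt(uint32_t b, uint32_t key)
{
    for (int i = 0; i < 4; i++) {
        b -= 0x9E3779B9u;
        b = rotr32(b, 5);
        b ^= key;
    }
    return b;
}

/* Chaining mode: each plaintext block is XORed with the previous
 * ciphertext block (iv for the first block) before encryption. */
void cbc_encrypt(const uint32_t *pt, uint32_t *ct, size_t n, uint32_t key, uint32_t iv)
{
    uint32_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        ct[i] = toy_encrypt(pt[i] ^ prev, key);
        prev = ct[i];
    }
}

/* Chaining mode decryption: decrypt, then XOR with the previous ciphertext block. */
void cbc_decrypt(const uint32_t *ct, uint32_t *pt, size_t n, uint32_t key, uint32_t iv)
{
    uint32_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        pt[i] = toy_decrypt(ct[i], key) ^ prev;
        prev = ct[i];
    }
}

/* Counter mode: the encrypted counter is XORed with the data. The same routine
 * encrypts and decrypts, and every block is independent of the others. */
void ctr_crypt(const uint32_t *in, uint32_t *out, size_t n, uint32_t key, uint32_t counter0)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ toy_encrypt(counter0 + (uint32_t)i, key);
}
```

In counter mode two identical plaintext blocks always give different ciphertext blocks, because the cipher is a bijection and the counter values differ. The loop in ctr_crypt has no dependency between iterations, which is what makes parallel processing of the blocks possible.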
Figure 6: Counter Mode structure [9]

1.5 Some attacks on AES

Now that we have an overall picture of encryption in a communication system, we can try to understand how it is possible to break the security of data. As one of the most widely used algorithms, AES is one of the favourite targets of malicious attacks aimed at discovering the secret key or the plaintext of an encryption process. In the following we give a few details about some of these attacks.

1.5.1 DPA Attack

A well-known attack on the AES algorithm in counter mode is the DPA attack, which stands for differential power analysis [10]. This attack does not need the initial value of the counter, which could be very helpful in other attacks. The input and the output are unknown as well, so the hacker is blind at the beginning.

Figure 7: DPA Analysis [11]

The first step of this attack is a data collection (see Figure 7). This can only be done with direct access to the device and its power supply. The data collection is a measurement of the power consumed by the device, which is analyzed afterwards. The trace covers up to the fifth round, even if the key is longer than 16 bytes. Figure 8 is an example of a data collection:

Figure 8: Data collection during AES process [10]

In the data collection it is possible to distinguish the AES algorithm steps of each round. In fact, Figure 8 represents the power consumption of the hardware over time, and shows how variations of the power can be related to the AES encryption process. Once the data collection has been performed, it is possible to find the 15th byte of the plaintext, T15, and perform a 15-bit exhaustive search on this block. These values are used iteratively through all the AES rounds to find the subkeys (the keys of the extended key created by the key schedule) by statistical analysis. Finally, the key schedule is run backwards to find the key.
However, in order to perform this attack, the hacker needs to be close to the power source of the device and to use dedicated tools to perform the data collection, so it is a hard attack to carry out. Some suggestions on how to prevent these attacks can be made:
- The user has to control the hardware power.
- The user can disturb the measurement by adding some noise to the power consumption.

1.5.2 Cache-collision

Cache memory stores the most frequently used data to speed up access, so that the CPU does not have to look for the data in the main memory (cache hit). If the data is not located in the cache, the CPU goes to the main memory (cache miss). The cache-collision attack exploits these two events, cache hit and cache miss, during the encryption/decryption process. For each AES step of the encryption/decryption, cache hits and misses can cause variations in processing time and power consumption that can be used to find the secret keys [12].

1.5.3 Boomerang attack

This kind of attack uses local collisions of AES. The idea is to inject an error during a step and then to correct it by injecting another error, creating a disturbance during the encryption algorithm. In this way, the attack allows the hacker to recover parts of the secret keys, because he can find where the disturbances he created are [13].

1.5.4 DFA attack

The main principle of the Differential Fault Analysis attack is to introduce differences into some intermediate data during encryption rather than into the input of the algorithm. This simplifies the differential analysis dramatically, since only a few rounds of the whole algorithm are actually attacked and have to be analyzed [16].

1.6 Problem definition

Each of the algorithms analyzed in section 1.4 was developed because of the security shortcomings of the previous one.
Initially, DES was selected as a Federal Information Processing Standard (FIPS) for the United States in 1976 and afterwards became the common encryption standard worldwide. However, DES was soon considered insecure for many applications, mostly because its 56-bit key is too small: a DES key was publicly broken in 22 hours and 15 minutes in January 1999. The first algorithm suggested to solve the problems of DES was Triple DES, which simply applies the DES cipher three times. It increases the key size of DES, protecting against brute-force attacks without requiring a completely new block cipher. Although Triple DES avoids the problem of a small key size, it is very slow in software, unsuitable for limited-resource platforms, and may be affected by security issues related to its small block size of 64 bits. On January 2, 1997, NIST announced its wish to choose a successor to DES, to be called the Advanced Encryption Standard (AES). In 2001, after an international competition, NIST selected an algorithm named Rijndael, developed by two Belgian researchers, as a replacement for DES [14].

When announced in 1997, AES was expected to be the solution that would ensure safe communications. However, the use of AES does not guarantee full security. Researchers [15] soon showed that even a single fault during encryption/decryption results in a large number of errors in the encrypted or decrypted data. Another problem is that hardware implementations of secret-key cryptosystems, including AES, are susceptible to differential fault analysis [16, 17]. A further problem for AES security is that data can be accidentally corrupted during the encryption process because of high memory usage or process scheduling.
This leads to the problem statement of this project:

How can bit errors be detected and corrected during the encryption phase of AES, so as to prevent a possible DFA attempt to recover the secret key?

One of the simplest techniques is to feed the input data to two different and independent encryption units and to compare the final result of each unit with the other. If the results are identical, it is assumed that both encryption units performed correctly; if they differ, at least one unit is erroneous and the encryption is repeated in each unit. However, this approach is clearly not optimal, because it requires more resources such as memory, processor cycles and, of course, time. The encryption and decryption hardware has to be doubled; moreover, the error is detected only after the encryption/decryption has completed, which increases the overall encryption/decryption time. To protect the encryption process from attacks with a lower resource overhead with respect to standard encryption, another strategy was needed.

Many improvements of the AES algorithm have focused on error detection. In 2003 Bertoni et al. [18] proposed to use parity bits to detect errors injected into the state during the encryption process. Their results were recently improved by other authors [19, 20, 21], who proposed different algorithms. Unfortunately, in 2000 and 2003 other researchers [22, 23] showed that error detection alone is not enough to guarantee protection against data errors. We therefore focused our attention not only on how to detect errors, but also on how to correct them in today's encryption standard. The main purpose is to make AES stronger against DFA attacks, which act by introducing these errors between transformations.
To describe how the algorithm has to be modified, we need to analyze in depth how each transformation in each round modifies the data being encrypted, and how the fault induced during the DFA attack affects these data. Furthermore, we will see how the fault changes and spreads across the whole data block after the transformations. After this analysis, we will introduce an algorithm, based on parity between round operations, which is able to make the encrypted data as error-free as possible.

Our studies concentrated on the theoretical analysis of the proposal in [24], also providing alternatives for some steps. In a final implementation phase, we will build a modified software version of AES that includes the detection and correction algorithm. We mainly attempt to determine how many and what kind of errors the new algorithm can handle, how much the detection/correction procedure lowers AES performance, and how different approaches impact time, memory and CPU usage.

Chapter 2

Error Detection and error correction on the AES algorithm

The AES algorithm is a symmetric encryption algorithm adopted by NIST [25] as a standard in 2001. It replaces the DES algorithm, which uses secret keys of only 56 bits, whereas AES keys are 128 bits or longer. AES uses 128-bit blocks of data and three different key lengths (128, 192 or 256 bits). It works in rounds (10, 12 or 14) with different transformations and shifts of a 16-byte matrix (a 4x4 array of bytes) called the state. In the following description of the rounds, we assume 10 rounds of processing steps and a key length of 128 bits.

2.1 Algorithm process of encryption

In each round, some processing steps are performed: AddRoundKey, Substitute bytes, Shiftrows, Mix columns, and AddRoundKey again. Each step is invertible (AddRoundKey is undone by XORing the same round key again), and the inverse steps are used in the decryption process.
In the following, each step of a round is described in detail.

2.2 Key Scheduling

This first step takes the key provided as input and expands it into an array of forty-four 32-bit words using Rijndael's key schedule operations. The first 4 words are the key itself; words in positions that are multiples of 4 are calculated by applying rotations, substitutions and additions with the original key and a constant Rcon; the remaining words are calculated as the XOR between the previous word and the one 4 positions earlier. For a complete description of the key scheduler refer to the NIST description of the algorithm [25]. In each round, 4 words (128 bits) are used.

2.3 Substitute Bytes

This step uses a substitution box (S-box) to perform a byte-by-byte substitution on the State array. Each input byte is mapped to a new byte: the first 4 bits indicate the row and the last 4 the column in the S-box. This substitution creates a new state s'.

The S-box is created by first finding the multiplicative inverse of a given number in Rijndael's finite field. Rijndael uses a finite field of characteristic 2 with 256 elements, also called the Galois field $GF(2^8)$, with the following reducing polynomial for multiplication: $x^8 + x^4 + x^3 + x + 1$. After the multiplicative inverse $b$ has been found, it is transformed with the following affine transformation on its bits $b_i$ (as specified in [25]):

$$b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8}$$

Then the result is XORed with the decimal number 99 (hexadecimal 63, binary 01100011), which generates the S-box. The transformation through a pre-computed S-box is shown in figure 9:

Figure 9: The SubBytes Transformation [7]

2.3.1 Shift Rows

This step is a simple permutation of the state rows (figure 10). If n is the row number, each byte of row n is shifted left by n-1 positions, as described in the following picture.
Row 1: not modified
Row 2: shifted one to the left
Row 3: shifted two to the left
Row 4: shifted three to the left

Figure 10: The Shift Row Transformation [7]

2.3.2 Mix Columns

The Mix Columns step affects all the bytes of each column, because a change in one row diffuses into all the other rows. Each new value is a function of all the bytes in the same column. All bytes are combined using an invertible linear transformation. Operations are carried out in $GF(2^8)$, meaning that polynomial coefficients are reduced modulo 2 (only coefficients equal to 0 or 1 remain). Each column of the state is multiplied by a 4x4 matrix, and the result is the corresponding 4x1 column of the output state:

$$\begin{pmatrix} s'_{0,j} \\ s'_{1,j} \\ s'_{2,j} \\ s'_{3,j} \end{pmatrix} = \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \begin{pmatrix} s_{0,j} \\ s_{1,j} \\ s_{2,j} \\ s_{3,j} \end{pmatrix}$$

Figure 11: The MixColumn Transformation

2.3.3 Add Round Key

The Add Round Key step is an XOR between the State array after the Mix Columns and the round key generated by the key scheduler. The following picture depicts how the output state is created:

Figure 12: The AddRoundKey Transformation [7]

2.4 Algorithm process of decryption

The decryption process is the same as the encryption but performed in reverse, with the inverse transformations (the expanded key is also used in reverse order). Figure 13 shows the complete process of encryption/decryption of the AES algorithm. After the complete encryption process, the output of the AES cipher block is a ciphertext. This ciphertext is used as the input of the AddRoundKey step when decrypting.

Figure 13: AES encryption/decryption process [3]

2.5 Differential Fault Analysis on AES

Differential Fault Analysis exploits computational errors to find cryptographic keys. The main principle of this attack is to introduce differences into some intermediate data during encryption rather than into the input of the algorithm. This dramatically simplifies the differential analysis, since only a few rounds of the whole algorithm are actually attacked and have to be analyzed.
Determining the induced difference, and forcing the difference to be of a particular type, are the only difficulties of the attack that affect its overhead. In this part we show how differential fault analysis (DFA) works on the AES-128 encryption algorithm described above.

2.5.1 Description of the fault injection

This section is based on [26]. To perform the attack, the attacker exposes the encryption device to certain physical effects (e.g. radiation) so that he can induce a fault in some bits of a word at some intermediate stage of the encryption algorithm. The goal of the attack is to recover the subkey of the 10th round; once the 10th subkey has been recovered, it is possible to recover the whole key.

We assume that the fault is introduced after the Shift Rows of the 9th round and changes a single byte of the state. Suppose, for instance, that we inject an error ε = 1E into the 1st byte of the 1st word. This corresponds to the XOR between the byte of the state and the error, that is $87 \oplus 1E = 99$:

$$\begin{pmatrix} 87 & F2 & 4D & 97 \\ 6E & 4C & 90 & EC \\ 46 & E7 & 4A & C3 \\ A6 & 8C & D8 & 95 \end{pmatrix} \longrightarrow \begin{pmatrix} 99 & F2 & 4D & 97 \\ 6E & 4C & 90 & EC \\ 46 & E7 & 4A & C3 \\ A6 & 8C & D8 & 95 \end{pmatrix}$$

In general, however, the attacker does not know the differential fault ε, because the fault injection occurs with a certain probability at a random bit location. For this reason we generalize the example by denoting with $S_{r,f}[x]$ and $F_{r,f}[x]$ the correct and the faulty byte x of the state at round r after the function f. We can then write the faulty byte after the 9th Shift Rows as:

$$F_{9,Sh}[1] = S_{9,Sh}[1] \oplus \varepsilon$$

And the complete state as:

$$F_{9,Sh} = S_{9,Sh} \oplus \begin{pmatrix} \varepsilon & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \end{pmatrix}$$

This state is processed by the 9th Mix Column, where the error spreads across the whole word. To explain this better, let us consider the mix column operation just for the first word of the state of the example. The first byte of the first word of the Mix Column output is the row-by-column product between the vector [02 03 01 01] and the first word of the state, [99 6E 46 A6]^T, which contains the erroneous byte. The result is therefore affected by the error. The same happens to the second byte of the first word of the output, since it is the product between [01 02 03 01] and the 1st word of the input state, and so on for the remaining bytes. The other words of the output are unchanged, because they are computed from the 2nd, 3rd and 4th words of the input, which are not affected by errors. The result is then as follows:

$$\begin{pmatrix} 7B & 40 & A3 & 4C \\ 29 & D4 & 70 & 9F \\ 8A & E4 & 3A & 42 \\ CF & A5 & A6 & BC \end{pmatrix}$$

In general:

$$F_{9,MC} = A_0 \cdot F_{9,Sh} = A_0 \cdot \left( S_{9,Sh} \oplus \begin{pmatrix} \varepsilon & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 \end{pmatrix} \right)$$

In this equation, $A_0$ is the characteristic matrix of the mix column operation. Matrix multiplication is distributive over XOR, and the product between $A_0$ and $S_{9,Sh}$ is the state we would have after mix column if no error had occurred. The output state of the 9th mix column can therefore be written as:

$$F_{9,MC} = S_{9,MC} \oplus \begin{pmatrix} 02 \cdot \varepsilon & 00 & 00 & 00 \\ \varepsilon & 00 & 00 & 00 \\ \varepsilon & 00 & 00 & 00 \\ 03 \cdot \varepsilon & 00 & 00 & 00 \end{pmatrix}$$

Here the matrix on the right is the product between $A_0$ and the matrix containing the error byte. In our example the 1st column of that matrix is [3C 1E 1E 22]^T:

$$\begin{pmatrix} 7B & 40 & A3 & 4C \\ 29 & D4 & 70 & 9F \\ 8A & E4 & 3A & 42 \\ CF & A5 & A6 & BC \end{pmatrix} = \begin{pmatrix} 47 & 40 & A3 & 4C \\ 37 & D4 & 70 & 9F \\ 94 & E4 & 3A & 42 \\ ED & A5 & A6 & BC \end{pmatrix} \oplus \begin{pmatrix} 3C & 00 & 00 & 00 \\ 1E & 00 & 00 & 00 \\ 1E & 00 & 00 & 00 \\ 22 & 00 & 00 & 00 \end{pmatrix}$$

The 9th round ends with the AddRoundKey operation.
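The spreading of the fault through Mix Column can be checked numerically. The following is a minimal Python sketch, not part of our implementation: the helper names `xtime`, `gmul` and `mix_column` are chosen here for illustration only, and the byte values are those of the example above.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gmul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

# Characteristic matrix A0 of the Mix Column operation.
A0 = [[2, 3, 1, 1],
      [1, 2, 3, 1],
      [1, 1, 2, 3],
      [3, 1, 1, 2]]

def mix_column(word):
    """Apply Mix Column to one 4-byte word (a column of the state)."""
    return [gmul(row[0], word[0]) ^ gmul(row[1], word[1])
            ^ gmul(row[2], word[2]) ^ gmul(row[3], word[3])
            for row in A0]

correct_word = [0x87, 0x6E, 0x46, 0xA6]           # 1st word after the 9th Shift Rows
epsilon = 0x1E                                    # injected fault
faulty_word = [correct_word[0] ^ epsilon] + correct_word[1:]

out_correct = mix_column(correct_word)
out_faulty = mix_column(faulty_word)
difference = [a ^ b for a, b in zip(out_correct, out_faulty)]

print([f"{b:02X}" for b in out_faulty])   # ['7B', '29', '8A', 'CF']
print([f"{b:02X}" for b in difference])   # ['3C', '1E', '1E', '22']
```

The difference equals the first column of $A_0$ multiplied by ε, independently of the value of the correct word, which is what the general expression states.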
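Let us call $K_9$ the 9th round key, and suppose it is defined as:

$$K_9 = \begin{pmatrix} AC & 19 & 28 & 57 \\ 77 & FA & D1 & 5C \\ 66 & DC & 29 & 00 \\ F3 & 21 & 41 & 6E \end{pmatrix}$$

Then the output will be:

$$\begin{pmatrix} 7B & 40 & A3 & 4C \\ 29 & D4 & 70 & 9F \\ 8A & E4 & 3A & 42 \\ CF & A5 & A6 & BC \end{pmatrix} \oplus \begin{pmatrix} AC & 19 & 28 & 57 \\ 77 & FA & D1 & 5C \\ 66 & DC & 29 & 00 \\ F3 & 21 & 41 & 6E \end{pmatrix} = \begin{pmatrix} D7 & 59 & 8B & 1B \\ 5E & 2E & A1 & C3 \\ EC & 38 & 13 & 42 \\ 3C & 84 & E7 & D2 \end{pmatrix}$$

Thanks to the associativity and commutativity of the XOR operation, and keeping in mind how the output of the Mix Column has been written, the output of the Add Round Key can also be expressed in general as the state we would have without error, XORed with an error matrix:

$$F_{9,Ark} = F_{9,MC} \oplus K_9 = S_{9,MC} \oplus \begin{pmatrix} 02 \cdot \varepsilon & 00 & 00 & 00 \\ \varepsilon & 00 & 00 & 00 \\ \varepsilon & 00 & 00 & 00 \\ 03 \cdot \varepsilon & 00 & 00 & 00 \end{pmatrix} \oplus K_9 = S_{9,Ark} \oplus \begin{pmatrix} 02 \cdot \varepsilon & 00 & 00 & 00 \\ \varepsilon & 00 & 00 & 00 \\ \varepsilon & 00 & 00 & 00 \\ 03 \cdot \varepsilon & 00 & 00 & 00 \end{pmatrix}$$

Applying the S-Box transformation we obtain the output state after the 10th substitute byte operation:

$$\begin{pmatrix} D7 & 59 & 8B & 1B \\ 5E & 2E & A1 & C3 \\ EC & 38 & 13 & 42 \\ 3C & 84 & E7 & D2 \end{pmatrix} \xrightarrow{\text{S-BOX}} \begin{pmatrix} 0E & CB & 3D & AF \\ 58 & 31 & 32 & 2E \\ CE & 07 & 7D & 2C \\ EB & 5F & 94 & B5 \end{pmatrix}$$

To write a general form as a function of the correct output and the error, we have to define a differential error as the XOR between the correct output and the faulty one after the SubBytes operation; we call the i-th byte of this new error matrix $\varepsilon^1_i$:

$$F_{10,SB} = S_{10,SB} \oplus \begin{pmatrix} \varepsilon^1_0 & 00 & 00 & 00 \\ \varepsilon^1_1 & 00 & 00 & 00 \\ \varepsilon^1_2 & 00 & 00 & 00 \\ \varepsilon^1_3 & 00 & 00 & 00 \end{pmatrix}$$

Let us then go quickly through the last operations of the algorithm.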
The shift rows will diffuse the error through all the words of the state:

$$\begin{pmatrix} 0E & CB & 3D & AF \\ 31 & 32 & 2E & 58 \\ 7D & 2C & CE & 07 \\ B5 & EB & 5F & 94 \end{pmatrix}$$

Since it only changes the order of the bytes, of both the state and the error, we can still write:

$$F_{10,Sh} = S_{10,Sh} \oplus \begin{pmatrix} \varepsilon^1_0 & 00 & 00 & 00 \\ 00 & 00 & 00 & \varepsilon^1_1 \\ 00 & 00 & \varepsilon^1_2 & 00 \\ 00 & \varepsilon^1_3 & 00 & 00 \end{pmatrix}$$

Finally, XORing this last output with the round key $K_{10}$, we obtain the final output:

$$\begin{pmatrix} 0E & CB & 3D & AF \\ 31 & 32 & 2E & 58 \\ 7D & 2C & CE & 07 \\ B5 & EB & 5F & 94 \end{pmatrix} \oplus \begin{pmatrix} D0 & C9 & E1 & B6 \\ 14 & EE & 3F & 63 \\ F9 & 25 & 0C & 0C \\ A8 & 89 & C8 & A6 \end{pmatrix} = \begin{pmatrix} DE & 02 & DC & 19 \\ 25 & DC & 11 & 3B \\ 84 & 09 & C2 & 0B \\ 1D & 62 & 97 & 32 \end{pmatrix}$$

And, in general:

$$F_{10,Ark} = S_{10,Ark} \oplus \begin{pmatrix} \varepsilon^1_0 & 00 & 00 & 00 \\ 00 & 00 & 00 & \varepsilon^1_1 \\ 00 & 00 & \varepsilon^1_2 & 00 \\ 00 & \varepsilon^1_3 & 00 & 00 \end{pmatrix}$$

2.5.2 Key Extraction

Information about the last round key can now be extracted starting from the last SubBytes transformation. According to what we said before, the input to the S-Box is the correct output of the 9th AddRoundKey XORed with the error. If we consider just the first word (the one on which the error acts) we can write 4 equations, one for each byte:

$$\begin{aligned} Sub(x_0 \oplus 02 \cdot \varepsilon) &= Sub(x_0) \oplus \varepsilon^1_0 \\ Sub(x_1 \oplus 01 \cdot \varepsilon) &= Sub(x_1) \oplus \varepsilon^1_1 \\ Sub(x_2 \oplus 01 \cdot \varepsilon) &= Sub(x_2) \oplus \varepsilon^1_2 \\ Sub(x_3 \oplus 03 \cdot \varepsilon) &= Sub(x_3) \oplus \varepsilon^1_3 \end{aligned}$$

Here $[x_0\ x_1\ x_2\ x_3]^T$ is the first word of the correct state after Add Round Key 9, ε is the injected error and $\varepsilon^1_i$ is the differential error after the SubBytes. In compact form this becomes:

$$Sub(x_i \oplus c_i \cdot \varepsilon) = Sub(x_i) \oplus \varepsilon^1_i \qquad (1)$$

where $x_i$ and ε are the unknown variables. The SubBytes transformation can be written, as we said in section 2.1.2, in matrix form:

$$Sub(x) = \begin{cases} a \cdot x^{-1} \oplus b & \text{if } x \neq 0 \\ b & \text{if } x = 0 \end{cases}$$

According to [16 – Ch. 3.5], proposition one, we search for the set

$$S_{c,\varepsilon^1} = \{ \varepsilon : \exists x, \ (1) \text{ holds with a particular } c \text{ and } \varepsilon^1 \}$$

Still according to proposition one, we can make ε explicit in (1), obtaining

$$\varepsilon = \left( c \cdot (a^{-1} \cdot \varepsilon^1) \cdot e \right)^{-1}$$

where e varies in the set:

$$E_1 = \{ x^2 \oplus x : x \in GF(2^8) \}$$

$$E_1^* = \{ '01', \dots, '1F', '40', \dots, '5F', 'A0', \dots, 'BF', 'E0', \dots, 'FF' \}$$

For our example we need to calculate:

$$S_{2,'E7'}, \quad S_{1,'51'}, \quad S_{1,'47'}, \quad S_{3,'99'}$$

The intersection S of these sets is the set of possible committed faults:

S = { ’01’, ’04’, ’13’, ’1E’, ’21’, ’27’, ’33’, ’3B’, ’48’, ’4D’, ’50’, ’53’, ’55’, ’5D’, ’64’, ’65’, ’7E’, ’7F’, ’80’, ’83’, ’8D’, ’8F’, ’93’, ’A7’, ’A8’, ’A9’, ’AB’, ’B3’, ’B8’, ’C9’, ’F6’ }

Using all the possible committed faults ε in $S \subseteq S_{c,\varepsilon^1_i}$, we calculate for each $\varepsilon^1_i$ the number $\theta = ((a^{-1} \cdot \varepsilon^1_i) \cdot c \cdot \varepsilon)^{-1}$. Then we solve the equation $t^2 + t = \theta$, and the two solutions α, β are used to get the possible values of the i-th byte of the last round key, $K_{10}[i]$. If θ ≠ 1 there are 2 possible values:

$$K_{10}[i] = Sub(c \cdot \varepsilon \cdot \alpha) \oplus F_{10,Ark}[i] \quad \text{or} \quad K_{10}[i] = Sub(c \cdot \varepsilon \cdot \beta) \oplus F_{10,Ark}[i]$$

The index i is the byte in which each $\varepsilon^1$ appears inside the state after the 10th AddRoundKey. With these expressions we can find, for our example, the possible values for $K_{10}[0]$:

K10[0] ∈ { ’03’, ’06’, ’09’, ’0C’, ’10’, ’15’, ’1A’, ’1F’, ’21’, ’24’, ’2B’, ’2E’, ’32’, ’37’, ’38’, ’3D’, ’43’, ’46’, ’49’, ’4C’, ’50’, ’55’, ’5F’, ’61’, ’64’, ’6B’, ’6E’, ’72’, ’77’, ’78’, ’7D’, ’83’, ’86’, ’89’, ’8C’, ’90’, ’95’, ’9A’, ’9F’, ’A1’, ’A4’, ’AB’, ’AE’, ’B2’, ’B7’, ’B8’, ’C3’, ’C6’, ’C9’, ’CC’, ’D0’, ’D5’, ’DA’, ’DF’, ’E1’, ’E4’, ’EB’, ’EE’, ’F2’, ’F7’, ’F8’, ’FD’ }

By repeating the attack with new faults, we reduce this set until only one value remains for the byte of round key 10. In our case, by also injecting {’E1’, ’B3’, ’16’, ’9E’} we obtain the exact $K_{10}[0]$ = ’D0’, as well as $K_{10}[7]$, $K_{10}[10]$ and $K_{10}[13]$.
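The candidate sets above are obtained algebraically. As an illustration only, the same idea can be sketched with an exhaustive search over equation (1) instead of the closed-form solution; this brute-force variant is an assumption of this sketch, not the method of [16] or [26]. The S-box is rebuilt from its definition, the helper names are hypothetical, and the differences E7, 51, 47, 99 and the faulty output byte DE are taken from the example.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def build_sbox():
    """AES S-box: multiplicative inverse, affine transformation, XOR with 0x63."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for k in (1, 2, 3, 4):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF   # rotate left by k bits
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()
COEFF = [2, 1, 1, 3]               # c_i: first column of the Mix Column matrix
DIFF = [0xE7, 0x51, 0x47, 0x99]    # differential errors of the example
FAULTY_BYTE0 = 0xDE                # F_10,Ark[0] of the example

def solutions(c, eps, diff):
    """All x satisfying equation (1) for the given c, eps and differential error."""
    shift = gmul(c, eps)
    return [x for x in range(256) if SBOX[x ^ shift] ^ SBOX[x] == diff]

# Faults compatible with all four equations at once (intersection of the sets).
faults = {eps for eps in range(1, 256)
          if all(solutions(c, eps, d) for c, d in zip(COEFF, DIFF))}

# Candidate values of K10[0]: faulty S-box output XOR faulty ciphertext byte.
key0_candidates = {SBOX[x ^ gmul(2, eps)] ^ FAULTY_BYTE0
                   for eps in faults
                   for x in solutions(2, eps, DIFF[0])}

print(len(faults), len(key0_candidates))
print(0x1E in faults, 0xD0 in key0_candidates)   # True True
```

The injected fault 1E and the true key byte D0 are among the candidates, as expected; repeating the search with the differences produced by further faults and intersecting the resulting key sets narrows the candidates down, as described above.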
2.5.3 Generalization

If the error is injected into a byte of a word other than the first one, it is still diffused through the whole word by MixColumn. In this case, the differential fault matrix is all zeros except in the column corresponding to the word in which the fault occurred, so this case can be reduced to the previous one.

2.5.4 Attack Complexity

From the study of the different attacks performed and published, it has been noticed that recovering the secret key of AES-128 does not require a high complexity. Christophe Giraud [26] states that only 50 faulty ciphertexts are needed for a 1-faulty-bit attack, and 250 for a 1-faulty-byte one, with a probability of success close to 97%. Moreover, if the attacker can choose the target byte that the error will affect, these numbers fall to 35 ciphertexts for the first kind of attack and only 31 for the second. A different kind of DFA attack can also be performed against the AES key scheduler, and it has been proved [27] that the time required is similar to the time needed to decrypt $2^{24}$ blocks, which can be completed within one minute on a Pentium 4 computer.

2.6 The error detection and correction algorithm

The most probable errors injected into the AES algorithm for the purpose of fault analysis are byte errors. We therefore assume that an attacker injects faults that affect a single byte of the word, so at most four errors can be injected into the State. The error is induced between transformations, and we do not consider the physical type of the injected fault. We also assume that the encryption key and the round keys are error free, and we denote by E a 4 by 4 error matrix that represents the errors injected into the bytes of the State. The elements $e_{i,j}$ of E are single bytes and represent the error mask applied to the corresponding bytes of the State.
2.6.1 Parity Bits and error detection

Our first purpose is to develop fault detection techniques. To perform this task we use the simplest error detection code, the parity code, which is capable of detecting single-bit errors and multiple-bit errors of odd multiplicity. Using a single parity bit for the whole data block is of course not enough, because it gives a fault coverage of around 50%, which is not acceptable in practice. Moreover, it would be very difficult to perform a parity prediction for all the data, since AES is a strongly non-linear algorithm and the parity bit depends on all the information bits. A more efficient implementation of the parity code, suitable for fault detection in the AES algorithm, was proposed in [18], which suggested associating a single parity bit $p_{i,j}$ with each byte $s_{i,j}$ of the State (Figure 14). For a given byte, the corresponding parity bit is 1 if the number of bits set to 1 in that byte is odd, and 0 otherwise:

$$p_{i,j} = \bigoplus_{k=0}^{7} s_{i,j}^{(k)} \qquad (2)$$

These parity bits can be arranged in a 4 x 4 matrix, every element in one-to-one correspondence with the related element of the state. This parity matrix allows us to detect an odd number of erroneous bits in each byte, but for each round transformation it is necessary to develop a method to predict the output parity given the input state and the input parity. Parity bit prediction was developed by the same researchers. We recall here the most relevant aspects of their proposal for each round transformation.

Figure 14: Matrix S and parity bits in green [24]

The prediction of the output parity bits for Shiftrows is easy: since the transformation only changes the position of the elements in each row, the predicted parity bit matrix is obtained by shifting the rows of the input parity bit matrix in the same way the Shiftrows function shifts the rows of the state itself.
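As an illustration of equation (2) and of the Shiftrows prediction, the following minimal Python sketch computes the parity matrix of a state, predicts the parity matrix after Shiftrows, and locates an injected error of odd multiplicity. The function names, the sample state and the position of the injected error are arbitrary choices made for this sketch.

```python
def parity(byte):
    """Parity bit of a byte: 1 if the number of bits set to 1 is odd (equation 2)."""
    return bin(byte).count("1") & 1

def parity_matrix(state):
    """One parity bit per byte of the 4x4 state (state given as a list of rows)."""
    return [[parity(b) for b in row] for row in state]

def shift_rows(matrix):
    """Rotate row r to the left by r positions; works on states and parity matrices."""
    return [row[r:] + row[:r] for r, row in enumerate(matrix)]

state = [[0x87, 0xF2, 0x4D, 0x97],
         [0x6E, 0x4C, 0x90, 0xEC],
         [0x46, 0xE7, 0x4A, 0xC3],
         [0xA6, 0x8C, 0xD8, 0x95]]

# Prediction: shift the input parity matrix exactly as the state is shifted.
predicted = shift_rows(parity_matrix(state))
output = shift_rows(state)

# Inject a single-bit (odd multiplicity) error into one byte of the output.
faulty = [row[:] for row in output]
faulty[1][2] ^= 0x01

mismatches = [(r, c) for r in range(4) for c in range(4)
              if parity(faulty[r][c]) != predicted[r][c]]
print(mismatches)   # [(1, 2)]
```

An error mask with an even number of bits set (for instance 1E) would leave the parity of the byte unchanged and would therefore not appear in `mismatches`.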
For the AddRoundKey step, the prediction of the output parity bits consists of the XOR between the parity matrix of the input state and the parity matrix associated with the current round key.

The prediction of the output parity bits of MixColumns is mathematically the most complex, and is based on the most significant bit of each byte of the state and on their parity before the transformation. To justify this we refer to [18 – Appendix A]. If we consider A as a polynomial:

$$A = \sum_{i=0}^{7} a_i x^i$$

then the parity bit associated with the result of 02•A is equal to:

$$p(02 \cdot A) = a_7 \oplus p(A)$$

where $a_7$ is the most significant bit of the byte represented by A and p(A) is the parity bit associated with A.

As a direct consequence, the parity associated with 03•A can be calculated as:

$$p(03 \cdot A) = p[(02 \oplus 01) \cdot A] = p(02 \cdot A \oplus 01 \cdot A) = p(02 \cdot A) \oplus p(A) = a_7 \oplus p(A) \oplus p(A) = a_7$$

This is possible thanks to the linearity of the parity bit calculation (since it is a simple bitwise XOR).

If we denote by $p_{r,c}$ the parity bit of the state byte $s_{r,c}$ and by $s_{r,c}^{(i)}$ the i-th bit of the byte $s_{r,c}$, with $0 \le r, c \le 3$, the predicted parity bit for the first byte of the generic word c of the state can be calculated as:

$$\begin{aligned} p'_{0,c} &= p[(02 \cdot s_{0,c}) \oplus (03 \cdot s_{1,c}) \oplus (01 \cdot s_{2,c}) \oplus (01 \cdot s_{3,c})] \\ &= p(02 \cdot s_{0,c}) \oplus p(03 \cdot s_{1,c}) \oplus p(01 \cdot s_{2,c}) \oplus p(01 \cdot s_{3,c}) \\ &= s_{0,c}^{(7)} \oplus p_{0,c} \oplus s_{1,c}^{(7)} \oplus p_{2,c} \oplus p_{3,c} \end{aligned}$$

The predicted parity bits for the remaining bytes of the word can be found with the same procedure, which gives the complete set of predicted parity bits:

$$\begin{aligned} p'_{0,c} &= p_{0,c} \oplus p_{2,c} \oplus p_{3,c} \oplus s_{0,c}^{(7)} \oplus s_{1,c}^{(7)} \\ p'_{1,c} &= p_{0,c} \oplus p_{1,c} \oplus p_{3,c} \oplus s_{1,c}^{(7)} \oplus s_{2,c}^{(7)} \\ p'_{2,c} &= p_{0,c} \oplus p_{1,c} \oplus p_{2,c} \oplus s_{2,c}^{(7)} \oplus s_{3,c}^{(7)} \\ p'_{3,c} &= p_{1,c} \oplus p_{2,c} \oplus p_{3,c} \oplus s_{3,c}^{(7)} \oplus s_{0,c}^{(7)} \end{aligned}$$

Because the transformation is non-linear, parity prediction for SubBytes involves both the input parity and data from the State. Instead of using complex algorithms, Bertoni et al. proposed to use a look-up table to predict the parity.
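The MixColumns prediction equations derived above can be checked against a direct computation. The sketch below is only an illustration under our own naming (`predict_mix_parity` and the other helpers are hypothetical); it verifies the rule p(02•A) = a7 ⊕ p(A) for every byte and compares predicted and actual parity bits on randomly chosen words.

```python
import random

def xtime(a):
    """Multiply a byte by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def parity(byte):
    """Parity bit of a byte: 1 if the number of bits set to 1 is odd."""
    return bin(byte).count("1") & 1

def mix_column(s):
    """Mix Column on one word, using 03*A = 02*A xor A."""
    d = [xtime(b) for b in s]                  # 02 * s_i
    t = [d[i] ^ s[i] for i in range(4)]        # 03 * s_i
    return [d[0] ^ t[1] ^ s[2] ^ s[3],
            s[0] ^ d[1] ^ t[2] ^ s[3],
            s[0] ^ s[1] ^ d[2] ^ t[3],
            t[0] ^ s[1] ^ s[2] ^ d[3]]

def predict_mix_parity(s):
    """Predicted output parity bits from input parity bits and input MSBs."""
    p = [parity(b) for b in s]
    m = [b >> 7 for b in s]                    # most significant bits s^(7)
    return [p[0] ^ p[2] ^ p[3] ^ m[0] ^ m[1],
            p[0] ^ p[1] ^ p[3] ^ m[1] ^ m[2],
            p[0] ^ p[1] ^ p[2] ^ m[2] ^ m[3],
            p[1] ^ p[2] ^ p[3] ^ m[3] ^ m[0]]

# p(02*A) = a7 xor p(A) holds for every byte A.
rule_holds = all(parity(xtime(a)) == (a >> 7) ^ parity(a) for a in range(256))

# Predicted and actual parity bits agree on random error-free words.
rng = random.Random(2009)
words = [[rng.randrange(256) for _ in range(4)] for _ in range(1000)]
prediction_holds = all(predict_mix_parity(w) == [parity(b) for b in mix_column(w)]
                       for w in words)

print(rule_holds, prediction_holds)   # True True
```

The rule holds because the reduction constant 1B has an even number of bits set, so the only parity change caused by the multiplication by 02 is the loss of the most significant bit.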
In particular, we know that the S-box is usually implemented as a 256 x 8 bit memory. To generate the outgoing parity bits, an even parity bit can be stored with each data byte in the S-box memory, which will then be of size 256 x 9 bits. To detect errors, from a hardware point of view they suggest replacing the original 8-bit decoder with a 9-bit one, giving a 512 x 9 memory. If a 9-bit address with even parity is decoded, the corresponding output byte with its associated even parity bit is produced. Otherwise, a constant word of 9 bits with a deliberately odd parity is output. Therefore, half of the entries in the S-box will be intentionally wrong.

In the same way, we propose instead to store the parity bit of each S-Box entry in a new table, and to access it each time a parity prediction is needed, in the same way the S-Box is accessed. However, a problem with parity bits and SubBytes lies in the fact that different S-Box values have the same parity bit, with apparently no particular order or law. So if a corrupted byte enters the S-Box, the corrupted output could have the same parity bit as if no error had occurred. To better understand this, let us show a simple example: if the input byte is 04, the corresponding S-Box output value is F2. Suppose now that an error with odd parity (we are supposing that only odd parity errors occur) affects this byte before the transformation. If this error is EF, the value entering the S-Box will be:

$$04 \oplus EF = EB$$

Its output is E9, which also has odd parity, so we cannot detect the error that has occurred.

That is why we searched for new solutions, and from this research two new ideas came up: one that uses Cyclic Redundancy Checks (CRC) and another that uses both the direct and the inverse S-Boxes.

A CRC is an error-detecting code that was developed by W. Wesley Peterson and published in his 1961 paper [28]. It consists of a division operation using Galois finite field arithmetic.
The quotient is discarded and the result is the remainder. The length of the remainder is always less than the length of the divisor, which therefore determines how long the result can be. The definition of a particular CRC thus consists of the definition of the divisor used. For example, the parity code (the simplest CRC) uses the two-bit divisor "11". In summary, a pattern of r+1 divisor bits has to be chosen in order to produce r check bits. These computed bits, known as the checksum, are added to the original byte (Figure 15); after the transformation the checksum can be computed again to verify whether an error was injected during the process.

Figure 15: Data, Checksum packet and the corresponding math expression

Different kinds of CRCs can be found in [29], but for our case we chose the CRC-5-EPC code, which uses as divisor the polynomial $x^5 + x^3 + 1$. This choice is a compromise between the number of errors that can be detected and the cost of the computation needed: the longer the divisor, the more errors can be detected, but the more operations are required to compute the checksum. First of all, we build a table that stores the checksum of each value stored in the S-box. The checksum prediction then simply consists of reading from that table the entry corresponding to the byte involved in the transformation. Doing this for each byte of the state, we obtain the predicted matrix. After the SubBytes transformation we compute the checksum of each byte of the actual state, building the actual checksum matrix. The final step is the comparison between the predicted matrix and the actual one. Finally, it is important to underline that it is not possible to detect all the errors, since more than one value in the table has the same checksum.
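The limits of both detection schemes for SubBytes can be reproduced with a short computation. The sketch below is a minimal illustration, not our implementation: the S-box is rebuilt from its definition, `crc5` performs a plain bitwise polynomial division by $x^5 + x^3 + 1$ on the byte extended with five zero bits (the bit ordering and the alignment of the stored checksum are assumptions of this sketch), and the byte values are the ones used in the examples of this section.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox():
    """AES S-box: multiplicative inverse, affine transformation, XOR with 0x63."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for k in (1, 2, 3, 4):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF   # rotate left by k bits
        sbox.append(s ^ 0x63)
    return sbox

def parity(byte):
    """Parity bit of a byte: 1 if the number of bits set to 1 is odd."""
    return bin(byte).count("1") & 1

def crc5(byte, divisor=0b101001):
    """Remainder of (byte * x^5) divided by x^5 + x^3 + 1, computed bit by bit."""
    reg = byte << 5
    for bit in range(12, 4, -1):
        if reg & (1 << bit):
            reg ^= divisor << (bit - 5)
    return reg

SBOX = build_sbox()
PARITY_TABLE = [parity(v) for v in SBOX]   # parity bit of each S-box entry
CRC_TABLE = [crc5(v) for v in SBOX]        # checksum of each S-box entry

# Parity: input 04 hit by the odd-weight error EF gives the same output parity.
print(PARITY_TABLE[0x04], PARITY_TABLE[0x04 ^ 0xEF])   # 1 1

# CRC: input 0D hit by the error 76 gives the same checksum.
print(CRC_TABLE[0x0D] == CRC_TABLE[0x0D ^ 0x76])       # True
```

With this convention the common remainder of the two S-box outputs is 1A; the value quoted in the CRC example corresponds to the same five bits shifted left by two positions, which suggests a different alignment of the stored checksum.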
Thus a particular error pattern can change the state byte into a configuration that has the same checksum, and in this case the error is not detected. For instance, if we have 0x0d as a byte of the state, it produces as output of the S-Box the value 0xd7, which has 0x68 as checksum. Now, if we inject 0x76 as an error, the new byte of the state is 0x7b, producing 0x21 as output, with 0x68 as checksum, the same as before. This error will therefore not be detected by the algorithm.

The other solution is instead a kind of reverse SubBytes transformation that corrects errors in the output state. It does not use the bit prediction matrix, because we have seen that this prediction does not detect all errors. In this solution detection and correction are performed at the same time, so we discuss how it works later, in the correction section.

To describe the error detection process, we refer to the parity bits before the transformation as input parity bits (we denote them simply as $p_{i,j}$) and assume that they are always error free. The parity bits calculated from the input parity and the State data are called predicted parity bits ($p'_{i,j}$). Using the predicted and the actual parity bits after each transformation, it is possible to determine exactly which byte has been affected by an error. This is obvious for AddRoundKey, the first solution of SubBytes and the Shiftrows transformation, since the errors that ensue from these transformations do not spread between elements of the state. For these operations it is then possible to perform an XOR between predicted and actual parity bits: any difference between them indicates where the error happened.
To understand better how it works, suppose that we have computed for one of these operations the following actual and predicted parity bit matrices:

$$\text{Actual parity bits: } \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix} \qquad \text{Predicted parity bits: } \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$$

The difference between the actual and predicted parity bits in the second row of the first column means that an error occurred in the second byte of the first word.

In the case of the MixColumns transformation, instead, each single-byte error diffuses into the complete word in which it was injected. However, for any error exactly three out of the four output parity bits associated with the output word are changed because, as shown in figure 16, each parity bit occurs in only 3 of the four equations. We can define an error pattern in the parity bits as the XOR between the predicted and the actual ones. Knowing the error pattern and whether the most significant bit of the data is erroneous, the table shown in figure 16 makes it possible to determine which byte of the word was erroneous.

Figure 16: Changes to the parity bits for j-th output Word after MixColumns transformation depending on error pattern inducted and byte of the Word affected by error [24]

For example, suppose the most significant bit of the data is error free, the predicted bits are $(1, 0, 0, 1)^T$ and the actual ones are $(1, 1, 1, 0)^T$. Then the corresponding error pattern for the parity bits is the exclusive-OR between the predicted and the actual ones:

$$(1, 0, 0, 1)^T \oplus (1, 1, 1, 0)^T = (0, 1, 1, 1)^T$$

Now we have all the information necessary to locate the error. Looking at figure 16, the byte corresponding to our case is byte $s_{1,j}$.

2.6.2 Parity Bytes and error correction

In addition to parity bits, Czapski and Nikodem [24] suggested also using a single parity byte $p_j$ for each word $W_j$ of the State in order to perform error correction:

$$p_j = \bigoplus_{i=0}^{3} s_{i,j} \qquad (3)$$

For parity bytes we use the same hypotheses and notation as for parity bits, for both actual and predicted values.
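First of all, we want to show that it is possible to predict the parity byte for each transformation of the AES algorithm. The ShiftRows transformation consists of rotating the rows of the state matrix. Thus the output parity byte for the j-th Word is determined as:

$$p'_j = s_{0,j} \oplus s_{1,(j+1) \bmod 4} \oplus s_{2,(j+2) \bmod 4} \oplus s_{3,(j+3) \bmod 4} \qquad (4)$$

Extracting $s_{0,j}$ from (3) as a function of $p_j$ and the other bytes, we obtain:

$$p'_j = p_j \oplus s_{1,j} \oplus s_{2,j} \oplus s_{3,j} \oplus s_{1,(j+1) \bmod 4} \oplus s_{2,(j+2) \bmod 4} \oplus s_{3,(j+3) \bmod 4}$$

Similarly, AddRoundKey is a linear transformation in which each byte of the State is XOR-ed with the corresponding byte of the round key matrix. So the output parity byte can easily be predicted according to the following formula:

$$p'_j = (s_{0,j} \oplus k_{0,j}) \oplus (s_{1,j} \oplus k_{1,j}) \oplus (s_{2,j} \oplus k_{2,j}) \oplus (s_{3,j} \oplus k_{3,j}) = p_j \oplus pk_j$$

where $pk_j$ is the parity byte of the j-th word of the round key matrix.

Parity byte prediction in MixColumn requires a slightly longer derivation than the other operations. Referring to the definition of the transformation, the output parity byte that should be produced can be expressed as:

$$\begin{aligned} p'_j = s'_{0,j} \oplus s'_{1,j} \oplus s'_{2,j} \oplus s'_{3,j} = {} & (02 \cdot s_{0,j} \oplus 03 \cdot s_{1,j} \oplus s_{2,j} \oplus s_{3,j}) \\ \oplus {} & (s_{0,j} \oplus 02 \cdot s_{1,j} \oplus 03 \cdot s_{2,j} \oplus s_{3,j}) \\ \oplus {} & (s_{0,j} \oplus s_{1,j} \oplus 02 \cdot s_{2,j} \oplus 03 \cdot s_{3,j}) \\ \oplus {} & (03 \cdot s_{0,j} \oplus s_{1,j} \oplus s_{2,j} \oplus 02 \cdot s_{3,j}) \end{aligned}$$

Thanks to the associativity of the XOR sum and the distributivity of the multiplication over it:

$$p'_j = s_{0,j} \cdot (02 \oplus 01 \oplus 01 \oplus 03) \oplus s_{1,j} \cdot (03 \oplus 02 \oplus 01 \oplus 01) \oplus s_{2,j} \cdot (01 \oplus 03 \oplus 02 \oplus 01) \oplus s_{3,j} \cdot (01 \oplus 01 \oplus 03 \oplus 02)$$

And since $03 \oplus 02 = 01$ we get:

$$p'_j = s_{0,j} \oplus s_{1,j} \oplus s_{2,j} \oplus s_{3,j} = p_j$$

Therefore the MixColumns transformation preserves the parity byte of the word.

SubBytes, finally, is a non-linear transformation of each byte of the state. The output parity byte after the transformation is:

$$p'_j = s'_{0,j} \oplus s'_{1,j} \oplus s'_{2,j} \oplus s'_{3,j}$$

where $s'_{i,j} = a \cdot s_{i,j}^{-1} \oplus b$ according to the definition of the transformation. Thus we obtain:

$$p'_j = a \cdot \left( s_{0,j}^{-1} \oplus s_{1,j}^{-1} \oplus s_{2,j}^{-1} \oplus s_{3,j}^{-1} \right) \qquad (5)$$

since $b \oplus b \oplus b \oplus b = 0$ and the multiplication is distributive over ⊕.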
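Before completing the SubBytes case, the three predictions obtained so far for the linear transformations (equation (4) for ShiftRows, $p_j \oplus pk_j$ for AddRoundKey, and the unchanged parity byte for MixColumns) can be verified numerically. The following Python sketch is an illustration under our own naming; the sample state and round key are the ones of the example in section 2.5.1, used here simply as test data.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def parity_bytes(state):
    """Parity byte of each word (column) of the state, equation (3)."""
    return [state[0][j] ^ state[1][j] ^ state[2][j] ^ state[3][j] for j in range(4)]

def shift_rows(state):
    """Rotate row r to the left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def add_round_key(state, key):
    """XOR each byte of the state with the corresponding byte of the round key."""
    return [[s ^ k for s, k in zip(srow, krow)] for srow, krow in zip(state, key)]

def mix_columns(state):
    """Apply Mix Column to every word (column) of the state."""
    out = [[0] * 4 for _ in range(4)]
    for j in range(4):
        s = [state[r][j] for r in range(4)]
        d = [xtime(b) for b in s]               # 02 * s_i
        t = [d[i] ^ s[i] for i in range(4)]     # 03 * s_i
        col = [d[0] ^ t[1] ^ s[2] ^ s[3], s[0] ^ d[1] ^ t[2] ^ s[3],
               s[0] ^ s[1] ^ d[2] ^ t[3], t[0] ^ s[1] ^ s[2] ^ d[3]]
        for r in range(4):
            out[r][j] = col[r]
    return out

state = [[0x87, 0xF2, 0x4D, 0x97], [0x6E, 0x4C, 0x90, 0xEC],
         [0x46, 0xE7, 0x4A, 0xC3], [0xA6, 0x8C, 0xD8, 0x95]]
key = [[0xAC, 0x19, 0x28, 0x57], [0x77, 0xFA, 0xD1, 0x5C],
       [0x66, 0xDC, 0x29, 0x00], [0xF3, 0x21, 0x41, 0x6E]]
p = parity_bytes(state)

# ShiftRows, equation (4), written in terms of the input parity byte p_j.
pred_shift = [p[j] ^ state[1][j] ^ state[2][j] ^ state[3][j]
              ^ state[1][(j + 1) % 4] ^ state[2][(j + 2) % 4] ^ state[3][(j + 3) % 4]
              for j in range(4)]
# AddRoundKey: p_j xor pk_j.
pred_ark = [a ^ b for a, b in zip(p, parity_bytes(key))]
# MixColumns: the parity byte is preserved.
pred_mix = p

print(pred_shift == parity_bytes(shift_rows(state)),
      pred_ark == parity_bytes(add_round_key(state, key)),
      pred_mix == parity_bytes(mix_columns(state)))   # True True True
```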
From (5) and (3) we can write the formula that predicts the parity byte we should have after the transformation:

$$p'_j = a \cdot \left[ \left( p_j \oplus s_{1,j} \oplus s_{2,j} \oplus s_{3,j} \right)^{-1} \oplus s_{1,j}^{-1} \oplus s_{2,j}^{-1} \oplus s_{3,j}^{-1} \right] \qquad (6)$$

Using the predicted and actual parity bytes of the word it is possible to determine the error injected into the state. Using the parity bits it is also possible to know in which byte of the state this error occurred. On this basis we can build a correction matrix and, by simply XORing the correction matrix with the output state of the current AES transformation, correct an odd number of bit errors in a byte of the state.

Figure 17: AES's input State with parity bits and bytes, output State and correction mask [24]

Then, XOR-ing the predicted and the actual parity byte, we obtain the correction mask to apply to the erroneous byte of the state after the SubBytes transformation.

For the third solution, instead, the byte prediction allows us to compare the parity byte vector predicted before SubBytes with the parity byte vector of the output state. This way, we are able to detect almost all erroneous words of the output state. Then, for each i-th row of the j-th erroneous word, we use the inverse SBox to determine the byte[i][j] the state should have held before the transformation. This shows whether the substituted byte[i][j] of the actual state is the SBox value of the corresponding byte of the previous state. If the bytes are different, we compute the SBox value of the previous state for the erroneous byte[i][j] and write it into the output state.

For the other transformations we follow again [24].
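The SubBytes parity byte prediction of equation (6) can be checked numerically. The following is a minimal sketch, not the code of this project: the helper names (gf_mul, gf_inv, affine_linear, sub_byte, predict_subbytes_parity, subbytes_prediction_holds) are hypothetical, the inverse is found by exhaustive search instead of the MulInv table, and the linear part a of the affine map is written with byte rotations.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return r;
}

/* Multiplicative inverse by exhaustive search; 0 is mapped to 0 as in AES. */
static uint8_t gf_inv(uint8_t x) {
    if (x == 0) return 0;
    for (int y = 1; y < 256; y++)
        if (gf_mul(x, (uint8_t)y) == 1) return (uint8_t)y;
    return 0; /* not reached: every non-zero element has an inverse */
}

static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Linear part a(.) of the SubBytes affine map; the constant b = 0x63 is left out. */
static uint8_t affine_linear(uint8_t x) {
    return (uint8_t)(x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4));
}

/* SubBytes of a single byte: s' = a * s^-1 XOR b. */
static uint8_t sub_byte(uint8_t x) {
    return (uint8_t)(affine_linear(gf_inv(x)) ^ 0x63);
}

/* Equation (6): predicted output parity byte from p_j and s1, s2, s3. */
static uint8_t predict_subbytes_parity(uint8_t p, uint8_t s1, uint8_t s2, uint8_t s3) {
    uint8_t s0 = (uint8_t)(p ^ s1 ^ s2 ^ s3); /* s0 recovered from equation (3) */
    return affine_linear((uint8_t)(gf_inv(s0) ^ gf_inv(s1) ^ gf_inv(s2) ^ gf_inv(s3)));
}

/* Returns 1 when the prediction equals the XOR of the four substituted bytes. */
static int subbytes_prediction_holds(const uint8_t w[4]) {
    uint8_t p = (uint8_t)(w[0] ^ w[1] ^ w[2] ^ w[3]);
    uint8_t actual = (uint8_t)(sub_byte(w[0]) ^ sub_byte(w[1]) ^ sub_byte(w[2]) ^ sub_byte(w[3]));
    return predict_subbytes_parity(p, w[1], w[2], w[3]) == actual;
}
```

For an error-free word, subbytes_prediction_holds() returns 1 because the four copies of the constant b cancel, as stated after equation (5).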
Let us call $\Delta p_j$ the error revealed by the XOR between the predicted and the actual parity bytes, and $s_{i,j}$ the bytes of the (possibly erroneous) state processed by a transformation. For ShiftRows it is easy to verify that we get:

$$\Delta p_j = p'_j \oplus s_{0,j} \oplus s_{1,(j+1) \bmod 4} \oplus s_{2,(j+2) \bmod 4} \oplus s_{3,(j+3) \bmod 4}$$

Replacing $p'_j$ with the predicted parity byte expression for ShiftRows and cancelling equal terms, we obtain:

$$\Delta p_j = p_j \oplus s_{0,j} \oplus s_{1,j} \oplus s_{2,j} \oplus s_{3,j}$$

Similarly, for AddRoundKey:

$$\Delta p_j = p'_j \oplus \bigoplus_{i=0}^{3} \left( s_{i,j} \oplus k_{i,j} \right) = p_j \oplus s_{0,j} \oplus s_{1,j} \oplus s_{2,j} \oplus s_{3,j}$$

In both cases the error is equal to 0 if and only if the j-th word is error free; otherwise it is equal to the error $e_{i,j}$ injected into the byte of the state. Therefore, the correction matrix ₵ is an all-zero 4x4 matrix except for the byte (row r of word w) where the error was detected, in which we put the error pattern:

$$c_{i,j} = \begin{cases} e & \text{if } i = r,\ j = w \\ 0 & \text{elsewhere} \end{cases}$$

In the case of MixColumns we have to consider that a single-byte error spreads across the whole word. Consider for example a state matrix affected by an error in one byte of a certain word, and apply the MixColumns transformation to it by multiplying the word by the characteristic matrix of the operation:

$$\begin{pmatrix} s'_{0,j} \\ s'_{1,j} \\ s'_{2,j} \\ s'_{3,j} \end{pmatrix} = \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \cdot \left[ \begin{pmatrix} s_{0,j} \\ s_{1,j} \\ s_{2,j} \\ s_{3,j} \end{pmatrix} \oplus \begin{pmatrix} e_{0,j} \\ e_{1,j} \\ e_{2,j} \\ e_{3,j} \end{pmatrix} \right]$$

Since we assumed that at most one byte in a word can be affected by an error, only one of the four $e_{i,j}$ is different from zero. The output is therefore the state we expect without error, plus an error vector $e'$ that depends on which byte has been corrupted:

$$e'_{0,j} = (02 \cdot e_{0,j},\ e_{0,j},\ e_{0,j},\ 03 \cdot e_{0,j})^T \quad \text{if } e_{0,j} \neq 0$$
$$e'_{1,j} = (03 \cdot e_{1,j},\ 02 \cdot e_{1,j},\ e_{1,j},\ e_{1,j})^T \quad \text{if } e_{1,j} \neq 0$$
$$e'_{2,j} = (e_{2,j},\ 03 \cdot e_{2,j},\ 02 \cdot e_{2,j},\ e_{2,j})^T \quad \text{if } e_{2,j} \neq 0$$
$$e'_{3,j} = (e_{3,j},\ e_{3,j},\ 03 \cdot e_{3,j},\ 02 \cdot e_{3,j})^T \quad \text{if } e_{3,j} \neq 0$$

The correction matrix ₵ is then an all-zero 4-word matrix except for the w-th word, which has been corrupted (and which we are able to identify thanks to the parity bits), and this word
has the structure of the error vector e' shown above.

Chapter 3
Implementation of the error detection and error correction algorithm

In this chapter we describe the implementation phase: we modified a software implementation of AES so that it can recognize whether and where an error occurred and, where possible, correct it, based on the theoretical description given in the previous chapters. We used a simple C implementation [30] of AES, trying to keep the correction/detection part as separate as possible from the original AES algorithm, so that future work on more efficient AES implementations can include this part without rewriting the whole code. Pieces of code are presented to make the description as understandable as possible, but we omit for loops or if statements that are not needed to convey the overall idea of the implementation. For the complete code see Appendix A at the end of this report.

We used only the following variables from the original code:

    unsigned char RoundKey[240];
    unsigned char state[4][4];

The first one stores all the Nb(Nr+1) round key words computed by the key scheduler, where Nb is the number of columns of the AES state (usually 4) and Nr is the number of rounds, which depends on the length of the input key. The second one stores the state after each transformation. We left the two [4][4] dimensions here on purpose because they are part of the original AES code; in the variables and functions we added, we replaced them with "#define Nb 4".

3.1 Parity bit and byte computation

The function getParityBit() takes as input a matrix in which the computed parity bits are stored:

    void getParityBit(unsigned char bit[][Nb])

Usually this matrix is:

    unsigned char ActualParityBit[Nb][Nb];

It stores the parity bits of the current state, but sometimes it will be:

    unsigned char PredictedParityBit[Nb][Nb];

because, as we will see, for some operations these two matrices are almost identical.
First, we obtain the least significant bit of the byte whose parity bit we want by performing a bitwise AND between the i-th byte of the j-th word of the state and a mask variable that is all zeros except for the least significant bit. If we then right-shift the byte by one position and apply the AND with the same mask, we obtain the second least significant bit. Repeating this for all 8 bits of the byte and XORing the resulting bits gives the parity bit associated with the byte in question.

    bit[j][i]^=((state[j][i]>>k)&Mask);

Here i and j vary from 0 to 3 to identify bytes and words of the state, and k varies from 0 to 7 to select, after the AND, the k-th bit of the byte.

Regarding parity bytes, the function getParityByte() receives a 4-byte vector in which the computed parity bytes are stored.

    void getParityByte(unsigned char byte[])

In this case too, the vector is usually the one related to the actual state:

    unsigned char ActualParityByte[Nb];

But since MixColumns does not change the parity bytes, we use the same function to store the predicted parity bytes in the vector

    unsigned char PredictedParityByte[Nb];

The j-th parity byte is computed by simply XORing all the bytes of the j-th word.

    byte[j] ^= state[i][j];

Doing this for each word gives us the complete parity byte vector.
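The two computations described above can be put together as follows. This is a minimal, self-contained sketch rather than the code of Appendix A: the function names parity_bit, parity_bits and parity_bytes are hypothetical, and the state is passed as an argument instead of being a global variable. It assumes the convention st[row][word].

```c
#include <stdint.h>

#define Nb 4

/* Parity bit of one byte: XOR of its eight bits (1 when the number of set bits is odd). */
static uint8_t parity_bit(uint8_t b) {
    uint8_t p = 0;
    for (int k = 0; k < 8; k++)
        p ^= (uint8_t)((b >> k) & 0x01); /* mask keeps only the k-th bit */
    return p;
}

/* bits[i][j] receives the parity bit of st[i][j] (row i, word j). */
static void parity_bits(uint8_t st[4][Nb], uint8_t bits[4][Nb]) {
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < Nb; j++)
            bits[i][j] = parity_bit(st[i][j]);
}

/* bytes[j] receives the XOR of the four bytes of word (column) j, as in equation (3). */
static void parity_bytes(uint8_t st[4][Nb], uint8_t bytes[Nb]) {
    for (int j = 0; j < Nb; j++) {
        bytes[j] = 0;
        for (int i = 0; i < 4; i++)
            bytes[j] ^= st[i][j];
    }
}
```

Unlike the excerpts above, the sketch clears each output entry before accumulating, so it does not rely on the caller having zeroed the arrays.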
Two different functions are used to get the parity bits and bytes of the RoundKey in a specific round:

    void RoundKeyParityBit(int round);
    void RoundKeyParityByte(int round);

The operations performed by these two functions are the same as those described above, but here we need to pass the round number to the function so that it can extract the current round key from the RoundKey vector:

    RKeyParityBit[j][i]^=((RoundKey[round * Nb * 4 + i * Nb +j]>>k)&Mask);
    RKeyParityByte[i] ^= (RoundKey[round * Nb * 4 + i * Nb + j]);

3.1.1 Parity Check

The function ParityBitCheck() is used after each AddRoundKey, ShiftRows and SubBytes transformation to compare the parity bits computed on the output state of the transformation with the parity bits predicted beforehand. If the two differ, an error occurred and a correction is needed. For this reason we set the error flag to 1, together with a flag inside a Boolean vector that identifies which word has been corrupted. In a vector of integers we also store the position, within that word, of the byte affected by the error:

    if(ActualParityBit[i][j]!=PredictedParityBit[i][j])
    {
        error=1;
        ErrWord[j]=1;
        ErrByte[j]=i;
    }

3.2 Error detection and correction for ShiftRows

The ShiftRows step of the AES algorithm is not a difficult one. As we have seen in the AES description, it just shifts the rows of the actual state in this way:

Row 1: not modified
Row 2: shifted one position to the left
Row 3: shifted two positions to the left
Row 4: shifted three positions to the left

It is then possible to reuse this process to perform the parity prediction that will help to locate errors.

3.2.1 Parity prediction

The first parity prediction to perform is the parity bit prediction. The parity bit matrix is a 4x4 matrix of binary values, each being the parity of the corresponding state byte (the XOR of all the bits of the byte).
The first step is to compute the actual parity bit matrix; then we reuse the ShiftRows process to shift the bits of this matrix and so create the predicted parity bit matrix.

    void SR_BitPrediction()
    {
        unsigned char temp;
        getParityBit(PredictedParityBit);
        // Rotate row 1 (second row) 1 column to the left
        temp=PredictedParityBit[1][0];
        PredictedParityBit[1][0]=PredictedParityBit[1][1];
        PredictedParityBit[1][1]=PredictedParityBit[1][2];
        PredictedParityBit[1][2]=PredictedParityBit[1][3];
        PredictedParityBit[1][3]=temp;
        // Rotate row 2 (third row) 2 columns to the left
        temp=PredictedParityBit[2][0];
        PredictedParityBit[2][0]=PredictedParityBit[2][2];
        PredictedParityBit[2][2]=temp;
        temp=PredictedParityBit[2][1];
        PredictedParityBit[2][1]=PredictedParityBit[2][3];
        PredictedParityBit[2][3]=temp;
        // Rotate row 3 (fourth row) 3 columns to the left
        temp=PredictedParityBit[3][0];
        PredictedParityBit[3][0]=PredictedParityBit[3][3];
        PredictedParityBit[3][3]=PredictedParityBit[3][2];
        PredictedParityBit[3][2]=PredictedParityBit[3][1];
        PredictedParityBit[3][1]=temp;
    }

Then we perform the parity byte prediction to predict the parity byte vector after ShiftRows:

$$p'_j = s_{0,j} \oplus s_{1,(j+1) \bmod 4} \oplus s_{2,(j+2) \bmod 4} \oplus s_{3,(j+3) \bmod 4}$$

    void SR_BytePrediction()
    {
        …
        PredictedParityByte[j] = ActualParityByte[j] ^ state[1][j] ^ state[2][j] ^ state[3][j]
                                 ^ state[1][(j+1)%4] ^ state[2][(j+2)%4] ^ state[3][(j+3)%4];
        …
    }

3.2.2 Error correction

The error correction for ShiftRows first locates the errors and then corrects them. In our algorithm, we use the ParityBitCheck() function, which reports 1 if one or more errors are detected. If 1 is reported the correction starts, otherwise nothing happens.
    void Correction()
    {
        …
        if (ErrWord[j]==1)
        {
            // Predicted parity byte XOR actual bytes of the word = error mask
            CorrectionParityByte[j] = PredictedParityByte[j] ^ state[0][j] ^ state[1][j] ^ state[2][j] ^ state[3][j];
            // Apply the mask to the byte located by the parity bit check
            state[ErrByte[j]][j]^=CorrectionParityByte[j];
        }
        …
        getParityByte(ActualParityByte);
        error=0;
    }

At the end, the Boolean variable error is set to 0 again to indicate that no errors now affect the actual state. The same error correction procedure is used for AddRoundKey and for the parity bit version of SubBytes.

3.3 Error detection and correction for MixColumns

In order to perform the MixColumns bit prediction we need to implement the equations given in [18 – Appendix A]. Therefore we first need a function that computes the most significant bit of a byte. We call it getMostsignificantBit(). This function accepts a byte as input and performs an AND between the mask 0x01 and the byte shifted right seven times. The function returns the most significant bit as a bool value.

    bool getMostsignificantBit(unsigned char data_received)
    {
        bool MSB;
        // Parentheses are needed because == binds more tightly than &
        if(((data_received>>7)&Mask)==1)
            MSB=1;
        else
            MSB=0;
        return MSB;
    }

3.3.1 Parity Prediction

The bit prediction is completed by the function MC_BitPrediction().
This function computes the actual parity bit matrix before the MixColumns transformation and derives from it the predicted parity bit matrix according to the previous equations:

    void MC_BitPrediction()
    {
        getParityBit(ActualParityBit);
        for(int i=0;i<4;i++)
        {
            PredictedParityBit[0][i]=ActualParityBit[0][i]^ActualParityBit[2][i]^ActualParityBit[3][i]
                ^getMostsignificantBit(state[0][i])^getMostsignificantBit(state[1][i]);
            PredictedParityBit[1][i]=ActualParityBit[0][i]^ActualParityBit[1][i]^ActualParityBit[3][i]
                ^getMostsignificantBit(state[1][i])^getMostsignificantBit(state[2][i]);
            PredictedParityBit[2][i]=ActualParityBit[0][i]^ActualParityBit[1][i]^ActualParityBit[2][i]
                ^getMostsignificantBit(state[2][i])^getMostsignificantBit(state[3][i]);
            PredictedParityBit[3][i]=ActualParityBit[1][i]^ActualParityBit[2][i]^ActualParityBit[3][i]
                ^getMostsignificantBit(state[3][i])^getMostsignificantBit(state[0][i]);
        }
    }

But as we said in section 2.4.1, the predicted and actual parity bits alone are not enough to locate errors in the MixColumns transformation, since the error spreads across the whole word. Thus it is necessary to define another function, MC_ParityBitCheck(). This function is basically an implementation of the table in figure 16, which identifies the erroneous byte in the word. As the table shows, in order to recognize the wrong byte in the word we need two values: the most significant bit of the byte error pattern and the parity bit error pattern of the word.

Regarding the byte error pattern, it is important to underline that we assumed that no more than one byte is incorrect in each word. This means that by XORing the predicted and the actual parity byte we obtain exactly the error pattern of the erroneous byte. We call the result of this operation the correction parity byte; from this value we can tell whether there is an error and which error mask applies to the wrong byte.
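The prediction equations implemented by MC_BitPrediction() can be checked against a direct computation of MixColumns on one word. The sketch below is illustrative only: the names xtime8, parity_bit, mix_word and mc_bit_prediction_holds are hypothetical, and the function works on a single 4-byte word instead of the global state.

```c
#include <stdint.h>

/* Multiplication by 0x02 in GF(2^8), result truncated to one byte. */
static uint8_t xtime8(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x >> 7) * 0x1b));
}

static uint8_t parity_bit(uint8_t b) {
    uint8_t p = 0;
    for (int k = 0; k < 8; k++)
        p ^= (uint8_t)((b >> k) & 1);
    return p;
}

/* MixColumns applied to one word: out[i] = 02*in[i] ^ 03*in[i+1] ^ in[i+2] ^ in[i+3]. */
static void mix_word(const uint8_t in[4], uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        uint8_t a = in[i], b = in[(i + 1) % 4];
        out[i] = (uint8_t)(xtime8(a) ^ xtime8(b) ^ b ^ in[(i + 2) % 4] ^ in[(i + 3) % 4]);
    }
}

/* Returns 1 when the predicted parity bits equal the parity bits of the real output. */
static int mc_bit_prediction_holds(const uint8_t in[4]) {
    uint8_t out[4];
    mix_word(in, out);
    for (int i = 0; i < 4; i++) {
        /* Same terms as in MC_BitPrediction(): three input parity bits and two MSBs. */
        uint8_t predicted = (uint8_t)(parity_bit(in[i])
                                      ^ parity_bit(in[(i + 2) % 4])
                                      ^ parity_bit(in[(i + 3) % 4])
                                      ^ (in[i] >> 7)
                                      ^ (in[(i + 1) % 4] >> 7));
        if (predicted != parity_bit(out[i]))
            return 0;
    }
    return 1;
}
```

The most significant bits appear because multiplying by 0x02 shifts the top bit out of the byte, while the reduction constant 0x1b has even parity and therefore does not change the parity of the result.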
If there is an error (that is, the correction parity byte is not 0x00), we obtain its most significant bit with the getMostsignificantBit() function, which gives us the first piece of information.

    if(CorrectionParityByte[i]!=0)
    {
        error=1;
        bool msb=getMostsignificantBit(CorrectionParityByte[i]);
        …
    }

Regarding the parity bit error pattern, we just have to XOR the predicted and actual parity bits of the word involved. Once this vector is known, we look for the position at which it is stored inside the error bit pattern matrix; that position tells us which byte is the erroneous one. The check compares the parity bit error pattern, element by element, with each column of the error bit pattern matrix, for each word of the state. In particular, when the most significant bit of the byte error pattern is equal to zero, the matrix involved is the upper part of figure 16. Finally we store in the vector ErrByte the position of the erroneous byte for each word. Below is the code for the case in which the most significant bit is equal to zero.

    if(msb==false)
    {
        for(int k=0;k<4;k++)
        {
            if(PBitError_Pattern[0][i]==matrix_msb0[0][k] &&
               PBitError_Pattern[1][i]==matrix_msb0[1][k] &&
               PBitError_Pattern[2][i]==matrix_msb0[2][k] &&
               PBitError_Pattern[3][i]==matrix_msb0[3][k])
            {
                ErrByte[i]=k;
            }
        }
    }

3.3.2 Error correction

By means of error detection it is possible to know which byte of the word is erroneous. With this information it is possible to obtain a correction vector that is a function of the position of the wrong byte in the word affected by the error. Thus each column of the correction matrix consists of all zeros if there is no error, and otherwise of a correction vector that depends on the position of the error inside the word:

Figure 18: Correction matrix for MixColumn transformation [24]

In order to build the correction matrix we need to multiply the error mask by coefficients such as 0x02 and 0x03 in the Galois field GF(2^8).
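The whole locate-and-correct step for one MixColumns word can be sketched as follows. This is not the table-driven MC_ParityBitCheck() of this project: since the contents of figure 16 are not reproduced here, the sketch derives the parity bit pattern of every candidate byte position directly from the MixColumns matrix and compares it with the observed pattern. All names (xtime8, parity_bit, mix_word, error_vector, locate_byte, correct_mixcolumns_word) are hypothetical, the predicted parity bits are taken from a reference computation for brevity, and the error is assumed to have odd multiplicity and to affect a single byte of the word.

```c
#include <stdint.h>
#include <string.h>

static uint8_t xtime8(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x >> 7) * 0x1b));
}

static uint8_t parity_bit(uint8_t b) {
    uint8_t p = 0;
    for (int k = 0; k < 8; k++)
        p ^= (uint8_t)((b >> k) & 1);
    return p;
}

/* MixColumns applied to one word (column) of the state. */
static void mix_word(const uint8_t in[4], uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        uint8_t a = in[i], b = in[(i + 1) % 4];
        out[i] = (uint8_t)(xtime8(a) ^ xtime8(b) ^ b ^ in[(i + 2) % 4] ^ in[(i + 3) % 4]);
    }
}

/* Error vector e' at the output when error e hits input byte k (MixColumns is linear). */
static void error_vector(uint8_t e, int k, uint8_t ev[4]) {
    uint8_t in[4] = {0, 0, 0, 0};
    in[k] = e;
    mix_word(in, ev);
}

/* Returns the input byte position whose error vector matches the observed
   parity bit error pattern, or -1 if none matches. */
static int locate_byte(uint8_t e, const uint8_t pattern[4]) {
    for (int k = 0; k < 4; k++) {
        uint8_t ev[4];
        int match = 1;
        error_vector(e, k, ev);
        for (int i = 0; i < 4; i++)
            if (parity_bit(ev[i]) != pattern[i]) match = 0;
        if (match) return k;
    }
    return -1;
}

/* Injects error e into byte k of word w before MixColumns, then locates and
   corrects it on the output. Returns 1 when the corrected output is error free. */
static int correct_mixcolumns_word(const uint8_t w[4], int k, uint8_t e) {
    uint8_t good[4], faulty_in[4], out[4], pattern[4], ev[4];
    uint8_t predicted_byte = (uint8_t)(w[0] ^ w[1] ^ w[2] ^ w[3]); /* preserved by MixColumns */
    mix_word(w, good);               /* stands in for the predicted parity bits */
    memcpy(faulty_in, w, 4);
    faulty_in[k] ^= e;
    mix_word(faulty_in, out);
    /* Correction parity byte: predicted XOR actual parity byte, equal to e. */
    uint8_t corr = (uint8_t)(predicted_byte ^ out[0] ^ out[1] ^ out[2] ^ out[3]);
    for (int i = 0; i < 4; i++)
        pattern[i] = (uint8_t)(parity_bit(good[i]) ^ parity_bit(out[i]));
    int pos = locate_byte(corr, pattern);
    if (pos < 0) return 0;
    error_vector(corr, pos, ev);     /* column of the correction matrix */
    for (int i = 0; i < 4; i++)
        out[i] ^= ev[i];
    return pos == k && memcmp(out, good, 4) == 0;
}
```

For errors of even multiplicity the candidate patterns are no longer guaranteed to identify the right byte, which is consistent with the restriction to odd multiplicity stated in this report.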
Therefore the first step of the error correction is the implementation of multiplication in GF(2^8). This operation is well described by NIST in [25]. In brief, we can compute the product of a generic byte (we call it f) and the value x (in our case 0x02) in this way:

$$x \cdot f(x) = \begin{cases} (b_6, b_5, b_4, b_3, b_2, b_1, b_0, 0) & \text{if } b_7 = 0 \\ (b_6, b_5, b_4, b_3, b_2, b_1, b_0, 0) \oplus (0,0,0,1,1,0,1,1) & \text{if } b_7 = 1 \end{cases}$$

The product of 0x03 and f can then be obtained by XORing the byte f (which can be seen as 0x01•f) with 0x02•f. In the AES code we used, this operation is performed by the macro xtime:

    #define xtime(x) ((x<<1) ^ (((x>>7) & 1) * 0x1b))

As one can see, x<<1 yields $(b_6, b_5, b_4, b_3, b_2, b_1, b_0, 0)$. The second operand of the XOR is equal to zero if the most significant bit $b_7$ is equal to 0, and to 0x1b otherwise. $b_7$ is obtained by an AND between the byte shifted right seven times and the mask 0x01.

An example of the code used to build the correction matrix when the error is located in the first row is shown below:

    if(CorrectionParityByte[i]!=0)
    {
        if(ErrByte[i]==0)
        {
            CorrectionMatrix[0][i]=xtime(CorrectionParityByte[i]);
            CorrectionMatrix[1][i]=CorrectionParityByte[i];
            CorrectionMatrix[2][i]=CorrectionParityByte[i];
            CorrectionMatrix[3][i]=xtime(CorrectionParityByte[i])^CorrectionParityByte[i];
        }
        …
    }

The final step is just the XOR between the correction matrix obtained and the erroneous state at the output of the MixColumns transformation.

    …
    state[i][j]^=CorrectionMatrix[i][j];
    …

3.3 Error detection and correction for AddRoundKey

Before each AddRoundKey transformation we need to get the actual parity bits and bytes of the state, using the functions described above, in order to perform the prediction.

3.3.1 Parity Prediction

The function ARK_BitPrediction() receives the current round number as its argument and computes the predicted parity bits by simply XORing the actual ones with the parity bits computed for the round key.
So first of all we need to call the RoundKeyParityBit() function, which builds the RKeyParityBit matrix containing the parity bits of the round key selected by the round number passed as argument:

    RKeyParityBit[j][i]^=((RoundKey[round * Nb * 4 + i * Nb +j]>>k)&Mask);

As the round keys are all stored in one single vector, we use the round number and the number of columns Nb to index this vector at the right position and reach the round key we need. Then the parity bit is computed using the mask 0x01 in the same way as the getParityBit() function does. Once the RKeyParityBit matrix has been computed we can get the predicted parity bits:

    PredictedParityBit[j][i]=ActualParityBit[j][i]^RKeyParityBit[j][i];

The byte prediction is performed by the ARK_BytePrediction() function. Like the bit prediction function just described, it first invokes RoundKeyParityByte(), which computes the parity bytes of the round key selected by the argument by XORing the 4 bytes of the i-th word of the round key:

    RKeyParityByte[i] ^= (RoundKey[round * Nb * 4 + i * Nb + j]);

Then the predicted parity byte is the XOR between the actual parity byte and the round key parity byte:

    PredictedParityByte[j] = ActualParityByte[j] ^ RKeyParityByte[j];

After the transformation we use ParityBitCheck() to check whether an error occurred.

3.3.2 Error Correction

Using the variable error, we perform the correction if and only if an error has been detected. The correction uses the same Correction() function as ShiftRows.

    ParityBitCheck();
    if(error==1)
    {
        Correction();
    }
    else
        getParityByte(ActualParityByte);

3.4 Error detection and correction for SubBytes

The SubBytes process, as described in the AES algorithm definition, substitutes each byte of the state with the corresponding byte of the SBox table already defined in our code.
As we found some caveats in a first implementation of the error detection and correction for this transformation, we proposed different solutions; in the next chapter we discuss the advantages and drawbacks of each.

3.4.1 Parity Bit based SubBytes error detection and correction

Concerning the bit prediction matrix, we thought that, instead of applying the SBox and then calculating the parity bit of each byte of the state, it would be more efficient to precompute an SBox parity bit table and store it in memory. This table (defined as a vector in the code) holds the parity bit of each byte of the SBox table. This way, the algorithm only has to look up the corresponding parity bit for each byte of the state to compute the parity bit matrix for SubBytes. Below is the SBox parity table, which we wrap in a function getSBoxParity() that is called exactly like the getSBox() function:

    int bitSBox[256] = {
    //  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
        0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, //0
        0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, //1
        0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, //2
        1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, //3
        0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, //4
        0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, //5
        1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, //6
        1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, //7
        1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, //8
        0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, //9
        1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, //A
        0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, //B
        1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, //C
        1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, //D
        0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, //E
        1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1}; //F

We just call the function with the actual state matrix to get the corresponding predicted parity bit matrix:

    void SB_BitPrediction()
    {
        PredictedParityBit[i][j] = getSBoxParity(state[i][j]);
    }

Then we predict the parity byte vector after SubBytes. This vector is computed considering the output state according to the definition of SubBytes in matrix form:

$$S'[i][j] = A \cdot (S^{-1}[i][j]) \oplus b$$

We proceed as already discussed in section 2.4.2: we first compute the multiplicative inverses of the values needed by formula (6), using the MulInv table that contains all the multiplicative inverses of the Galois field GF(2^8). Then we multiply the result by the matrix A, using an AND operation and a repeated XOR to get the i-th bit of the j-th parity byte. Each bit is then inserted, through left shifts, into PredictedParityByte[j]:

    void SB_BytePrediction()
    {
        unsigned char temp=0x00;
        for (int j=0;j<4;j++)
        {
            unsigned char P1=(MulInv[(ActualParityByte[j]^state[1][j]^state[2][j]^state[3][j])])
                ^(MulInv[state[1][j]]^MulInv[state[2][j]]^MulInv[state[3][j]]);
            PredictedParityByte[j]=0x00;
            for (int i=7;i>=0;i--)
            {
                temp = A[i]&P1;          // row i of A masked with P1
                unsigned char bit=0x00;
                for(int k=0;k<8;k++)
                {
                    bit^=((temp>>k)&Mask); // parity of the masked row = bit i of A*P1
                }
                PredictedParityByte[j]^=bit<<i;
            }
        }
    }

Finally, to detect the error we use the ParityBitCheck() function. If the error variable is set to 1, we invoke the Correction() function used for AddRoundKey and ShiftRows to correct it.

3.4.2 Inverse SBox based error detection and correction

In this algorithm we do not use the parity bit matrix but directly the parity byte vector, which is mostly error free (nevertheless, some limits can still be found). This correction is a kind of reverse SubBytes process. Each step of the algorithm is described in the following:

- First, we need to store the state before the SubBytes process and the error injection (this means that errors cannot occur everywhere) in order to use it afterwards:

    void storePreviousState()
    {
        for(int i=0;i<4;i++)
        {
            for(int j=0;j<4;j++)
            {
                pstate[i][j] = state[i][j];
            }
        }
    }

- We use directly the predicted parity byte vector to verify if an error occurred.
We compare the ActualParityByte with the PredictedParityByte. If these two vectors differ in the j-th word, the rest of the algorithm is performed only on this word.
- Then, for each row i of the erroneous word j, we use the function getSBoxInvert() to check whether the substituted byte[i][j] of the actual state is the SBox value of the corresponding byte of the previous state (pstate[i][j]).
- If the two values match, nothing is done; otherwise the actual state receives the corrected value.

    void SB_Correction()
    {
        for(int j=0;j<4;j++)
        {
            if (ActualParityByte[j]!=PredictedParityByte[j])
            {
                for(int i=0; i<4; i++)
                {
                    if(getSBoxInvert(state[i][j]) != pstate[i][j])
                    {
                        state[i][j] = getSBoxValue(pstate[i][j]);
                    }
                }
            }
        }
    }

3.4.3 CRC based error detection and correction

In this other solution we use CRCs to implement the error detection for SubBytes. The first thing a CRC needs is the checksum of the data. To perform the checksum prediction we defined a table for this purpose, SBox_checksum, in which we stored the checksum of each value of the S-Box. Each value has been computed as explained for the CRC codes in section 2.4.1.
    unsigned char checksum_sbox[256] = {
    //  0     1     2     3     4     5     6     7     8     9     A     B     C     D     E     F
        0xb8, 0x58, 0xf8, 0x00, 0x90, 0xd0, 0x40, 0xf0, 0x38, 0x80, 0x28, 0x48, 0x68, 0x68, 0x30, 0x78, //0
        0xc0, 0x30, 0x08, 0xd8, 0xf8, 0xa0, 0xc0, 0xd8, 0xe8, 0xa0, 0xd8, 0xa0, 0x50, 0x00, 0xe8, 0xe0, //1
        0x18, 0xa0, 0x60, 0x30, 0xe0, 0x08, 0x80, 0x18, 0xa8, 0x80, 0x18, 0x58, 0x20, 0x58, 0xb8, 0xc0, //2
        0x90, 0xb8, 0x20, 0x28, 0xb8, 0x70, 0x10, 0x88, 0x58, 0x98, 0x78, 0x40, 0xa8, 0xb0, 0x08, 0xb0, //3
        0xe8, 0xb0, 0x10, 0xf0, 0x70, 0xc0, 0x68, 0x90, 0x00, 0x98, 0xe8, 0x88, 0x00, 0xc0, 0xd8, 0xe8, //4
        0x80, 0xb0, 0x00, 0x70, 0xe8, 0x20, 0xc0, 0xe8, 0x50, 0x40, 0xf0, 0xd0, 0xb8, 0x60, 0x20, 0xd0, //5
        0x30, 0x38, 0xb0, 0x78, 0x50, 0xe0, 0xf0, 0x68, 0x88, 0x30, 0x48, 0x90, 0x48, 0xc0, 0x98, 0xf8, //6
        0xc8, 0x58, 0x98, 0x48, 0xe0, 0xd0, 0x50, 0xc8, 0xb8, 0x98, 0x10, 0x68, 0xd0, 0xe8, 0x10, 0x78, //7
        0x98, 0xf8, 0x18, 0xf0, 0x78, 0xf0, 0x08, 0x88, 0x70, 0xc8, 0x10, 0x40, 0xe0, 0x30, 0x38, 0x68, //8
        0x70, 0xf8, 0xa8, 0xc8, 0xa0, 0xc8, 0xa8, 0x10, 0x40, 0xb8, 0x28, 0x40, 0x80, 0xf8, 0xa0, 0x90, //9
        0x08, 0x70, 0x18, 0x20, 0x70, 0xd8, 0x78, 0xb0, 0xa8, 0xf8, 0x68, 0x38, 0x28, 0xb8, 0x98, 0x48, //A
        0x50, 0x88, 0x60, 0x08, 0x00, 0x20, 0x28, 0x78, 0x88, 0x90, 0x48, 0x28, 0x60, 0x80, 0x20, 0x68, //B
        0x60, 0xc8, 0xf8, 0x58, 0x28, 0x48, 0xd0, 0x38, 0x60, 0x48, 0x30, 0xe0, 0x38, 0x38, 0xd8, 0x58, //C
        0xa0, 0x88, 0x50, 0xa8, 0xf0, 0xc8, 0x00, 0xb0, 0xf0, 0x28, 0x10, 0xa8, 0xa0, 0x60, 0xa8, 0x18, //D
        0x88, 0xb0, 0xc0, 0x50, 0x98, 0xd8, 0xc8, 0x38, 0x08, 0x60, 0x20, 0xe0, 0x50, 0x58, 0x80, 0x00, //E
        0x80, 0x10, 0x90, 0x78, 0x70, 0xd0, 0xd0, 0x18, 0x18, 0x40, 0x90, 0x30, 0x40, 0xd8, 0xe0, 0x08}; //F

As one can see, the elements stored in this table are bytes, whereas the real length of the checksum obtained from our CRC is five bits. That is because we found it easier to operate on bytes, so we just appended three zeros after the real checksum in order to represent it as a byte.
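The padded values in this table can be reproduced with a short routine. The sketch below assumes the generator $x^5 + x^3 + 1$ stored left-aligned in a byte (binary 10100100, i.e. 0xa4) and the shift-and-XOR loop with seven shifts used by this project; the names checksum5 and POLY_GEN are hypothetical.

```c
#include <stdint.h>

#define POLY_GEN 0xa4u /* x^5 + x^3 + 1 (binary 101001) left-aligned in one byte */

/* Returns the 5-bit checksum of a byte, stored in bits 7..3 with three trailing zeros. */
static uint8_t checksum5(uint8_t byte) {
    int k = 0;
    while (k < 7) {
        if (byte < 0x80) {
            /* Most significant bit is 0: move on to the next bit position. */
            byte = (uint8_t)(byte << 1);
            k++;
        } else {
            /* Most significant bit is 1: subtract (XOR) the generator. */
            byte ^= POLY_GEN;
        }
    }
    return byte;
}
```

For instance, the S-Box outputs 0x63 and 0x7c (the substitutes of 0x00 and 0x01) give 0xb8 and 0x58, which are the first two entries of the table.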
For the checksum prediction we just need to index SBox_checksum with the same index used for the SBox and store the result in the prediction matrix. The main operation performed by SB_Prediction_Checksum() is then:

    checksum_matrix[i][j] = SBox_checksum(state[i][j]);

After the transformation, the detection is performed by SB_detection(), which compares the predicted checksum of each byte of the state with the actual one.

    if(checksum_matrix[i][j]!=Checksum_calculation(state[i][j]))
    {
        error=1;
        ErrWord[j]=1;
        ErrByte[j]=i;
    }

The actual checksum is computed using Checksum_calculation(), a function that applies the definition of the checksum for our CRC to the byte passed as argument. The checksum is the remainder of the division of the byte of the state by the characteristic divisor $x^5+x^3+1$ (left-aligned in a byte, it corresponds to 0xa4 in hexadecimal). To do this we XOR the byte with the generator polynomial if the most significant bit of the byte is 1; otherwise we shift it left by 1 position. This is repeated until 7 shifts have been performed:

    while(k<7)
    {
        if(byte<0x80)
        {
            byte=byte<<1;
            k++;
        }
        else
            byte^=poly_gen;
    }

To correct the error, if it occurred and has been detected, the correction takes place in the same way as for AddRoundKey, ShiftRows and the parity bit based solution for SubBytes.

Chapter 4
Error coverage and performances tests

After verifying that the implementation works correctly by encrypting several 128-bit blocks of data with the NIST "AES Known Answer Test vectors" [31], we wanted to analyze its advantages and drawbacks. The paper on which the implementation of this correction algorithm is based [24] states that not all errors are corrected.
In fact, the authors assume that only single-bit errors and, more generally, errors affecting an odd number of bits in a byte of the state can be corrected. The other limit is that at most one erroneous byte can be corrected in each word (column) of the state. We already justified the reason for this limit in chapter 2, but we wanted to verify it in our implementation as well. Furthermore, we wanted to test the performance of our code with different numbers of errors, focusing on time, memory and CPU usage.

4.1 Test Environment

To simulate an error injection a new function has been added to the code: it simply creates a 4x4 byte matrix representing the error pattern to inject and performs a XOR between this matrix and the state.

    void error_injection()
    {
        int errormask[Nb][Nb]={{0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00},
                               {0x00,0x00,0x00,0x00},{0x00,0x00,0x00,0x00}};
        for (int i=0;i<4;i++)
        {
            for(int j=0;j<4;j++)
            {
                state[i][j]=state[i][j]^errormask[i][j];
            }
        }
    }

To inject an error we then simply had to specify the error pattern and call the error_injection() function just before the transformation we wanted to test.

The test vector used during the testing process is composed of:

Key = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Plaintext = ea 83 5c f0 04 45 33 2d 65 5d 98 ad 85 96 b0 c5

which produces as output:

Ciphertext = 76 ed 47 01 93 fe 61 e0 24 1b 64 c4 55 9f 11 2c

To analyze time and CPU load, we used the profiling tool provided by Visual Studio 2010 premium beta 2, and the Microsoft performance monitor to look at the memory usage. All the tests were performed on a 64-bit Intel Core 2 Duo T9950 at 2.66GHz with 4GB of RAM and Microsoft Windows 7 Professional.

4.2 Assumptions proof

The first assumption we made was that only bit errors of odd multiplicity can be detected. We injected an error with even multiplicity into our cipher and, as expected, the cipher was not able to detect it, producing a wrong output.
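The reason is that XORing an even number of bits into a byte leaves its parity bit unchanged. A minimal sketch, with the hypothetical helper names parity_bit and parity_detects, makes this visible:

```c
#include <stdint.h>

/* Parity bit of one byte: XOR of its eight bits. */
static uint8_t parity_bit(uint8_t b) {
    uint8_t p = 0;
    for (int k = 0; k < 8; k++)
        p ^= (uint8_t)((b >> k) & 1);
    return p;
}

/* Returns 1 when XORing the error mask e into byte s changes the parity bit,
   i.e. when a parity bit comparison would reveal the error. */
static int parity_detects(uint8_t s, uint8_t e) {
    return parity_bit(s) != parity_bit((uint8_t)(s ^ e));
}
```

With this sketch, a mask such as 0x03 (two flipped bits) is never detected, whatever the byte it is applied to, while masks such as 0x01 or 0x07 always are.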
For example, injecting the error 0x03 in the first byte of the first word of the state before the first AddRoundKey transformation, we obtained:

65 ca 86 03 13 ad ee 66 b4 8a 57 9f 45 b8 b4 49

The second assumption was that only errors in different words of the state can be corrected. In fact, injecting for the same transformation a single-bit error 0x01 in both the 1st and the 2nd byte of the first word of the state, we again obtained a wrong ciphertext:

a7 52 cf 30 67 63 b7 02 e3 4d 37 5e 3c c2 d7 73

This confirms our assumptions once again.

4.3 ShiftRows Limits

In the ShiftRows step, the correction algorithm misses some errors if, at the moment of the correction, two erroneous bytes lie in the same word of the state. When we started injecting errors with an error mask, we had not realized that errors injected before ShiftRows are moved by the transformation, so that the correction may no longer work. Let us take an example to see how this happens. First we inject an error mask that we assume is valid because it has only one erroneous byte in each word:

01 00 00 00 ea 00 00 00 b4 00 00 00 c5 00 00 00

After the XOR with the state (whatever the state looks like) the result is:

01 12 ab 65 ea 90 7d 92 b4 28 9a 6b c5 86 4e 45

After ShiftRows:

01 92 6b 45 ea 90 7d b4 28 12 c5 ab 9a 65 86 4e

Hence three erroneous bytes appear in the same word and the correction will not be effective. This example shows a limit of the testing procedure: if errors are injected before the ShiftRows process, the positions of the erroneous bytes have to be chosen with the result of the ShiftRows step in mind in order to test the algorithm. Obviously, this testing limit does not appear when the error is injected directly after the shift of each row.
Indeed, injecting the error pattern of the example in section 4.3, the encryption process produced the following ciphertext, which differs from the expected one given in section 4.1:

9d 11 e3 5b 92 ed af 15 d9 b9 09 7f b4 c9 e3 f2

4.4 Parity bit based SubBytes Limits

As already explained in section 2.4.1, different S-Box values have the same parity bit, with apparently no particular order or law. To get an idea of the fault coverage of this solution, we injected 9000000 random faults before the transformation and repeated the experiment 15 times, building a statistical model of how many errors were successfully detected and corrected: the percentage of detected/corrected errors was 52%, with a coefficient of variation of 1,6%. We considered this error coverage very low, which is why we looked for other solutions.

4.5 CRC based SubBytes Limits

Similarly to the previous case, different bytes can have the same checksum. A particular error pattern can therefore change a state byte into a value that has the same checksum, and in this case the error is not detected. We repeated the same test as in the previous case, obtaining a percentage of detected/corrected errors of 96,87% with a coefficient of variation of 0,36%, which is substantially higher than for the previous solution.

4.6 Performances analysis

Once we had analyzed and verified the behavior of the algorithm, it was necessary to check how much the AES performance is lowered by the detection/correction procedure, and how the different approaches impact time, memory and CPU usage. In order to point out the overheads, we compared our modified AES code with the original one. We discuss the most relevant results here; further details can be found in appendix B.

4.6.1 Time

At the beginning we analyzed the execution time of the original AES code for a single encryption of a 128-bit block of data. We performed 10 consecutive executions and the average time of a single encryption was 109µs.
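The averaging step, and the overhead computation used for the figures of section 4.6.1, can be sketched as follows. The helper names are ours and are not part of the AES code; the two sample arrays are the raw timings listed in appendix B.1 for the original code and for the Paritybit based solution without errors.

```c
#include <stddef.h>

/* Arithmetic mean of n timing samples (microseconds). */
static double mean_us(const double *t, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Relative overhead of a modified implementation, in percent of the baseline. */
static double overhead_percent(double baseline_us, double modified_us)
{
    return 100.0 * (modified_us - baseline_us) / baseline_us;
}

/* Raw timings from appendix B.1 (ten executions each, in microseconds). */
static const double original_us[10]  = {120, 90, 100, 140, 100, 100, 100, 120, 120, 100};
static const double paritybit_us[10] = {150, 150, 150, 130, 160, 150, 160, 150, 160, 160};
```

With these samples, mean_us() gives 109µs for the original code and 152µs for the Paritybit based solution, and overhead_percent(109, 152) is about 39%, in agreement with the values reported in appendix B.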
Then we repeated the same process for our algorithm with the different SubBytes implementations, obtaining the results below for each one. We analyze the implementations both without errors and with error(s) injected. The overheads were computed taking the original AES code as the baseline.

4.6.1.1 Without error

The first tests dealt with error free data:
- Paritybit based: 152µs (39% overhead)
- CRC based: 162µs (49% overhead)
- InvSBox based: 141µs (29% overhead)

Obviously, part of these overheads is due to AddRoundKey, ShiftRows and MixColumns; the differences come from the SubBytes transformation of each solution. These results can be explained by looking at how errors are detected in the SubBytes step. Whereas the Paritybit and CRC based solutions read values from a look-up table several times, the InvSBox implementation only uses the parity bytes. The drawback of the latter solution is that it cannot detect the position of the erroneous byte in a word. The difference in overhead between the first two solutions results from the additional checksum calculation that the CRC solution performs after the transformation, which implies more memory accesses than the Paritybit based one.

4.6.1.2 With a single error injected

For this second series of tests we injected a single bit error in each transformation. This allowed us to see, on the one hand, the global overhead and, on the other hand, which function was the slowest at detecting/correcting the error.

The average results are:
- Paritybit based: 40% overhead
- CRC based: 49% overhead
- InvSBox based: 31% overhead

As we can see, the overheads of the Paritybit and CRC based solutions are almost the same as in the case without errors. In the InvSBox based version, instead, the additional 2% can be explained by the new look-up table invSBox involved in the correction process.
In all solutions, the correction of an error injected before the SubBytes transformation is the most time consuming: it adds 3%, 2% and 5% (Paritybit, CRC and InvSBox based, respectively) to the overheads measured without errors.

4.6.1.3 With four errors injected

Finally, we analyzed the overheads in the worst case, with four errors injected in each transformation (one in each word). The injected error matrix has four non-zero bytes (ea, f1, bc and a4), one in each word of the state, and zeros elsewhere.

The average results are:
- Paritybit based: 44% overhead
- CRC based: 52% overhead
- InvSBox based: 34% overhead

We could not apply the SubBytes test to the Paritybit based implementation because of its low error coverage, so its average covers the other three transformations only (see appendix B). Regarding the other two proposals, the most relevant difference was noticed in the SubBytes transformation, with 5% more overhead than in the single error case for the InvSBox solution, whereas for the CRC based one it rises by only 1%. This happens because the CRC based solution corrects only the erroneous byte, while the InvSBox based one corrects the error by going through all the bytes of each word.

4.6.2 CPU load

The CPU load tests allowed us to compare the CPU consumption of every function in all solutions. Visual Studio Profiler collects a sample of the current process state. Sampling is a nonintrusive, statistical approach to profiling. The more samples collected in a function, the more processing the function has likely performed. By default, Visual Studio Profiler collects one sample every 10 million CPU cycles. This way it is possible to see which function is the heaviest in terms of CPU load. It is important to notice that sampling collects information only when the program uses the CPU. Thus, while your process is waiting for disk, network, or any other resource, Visual Studio Profiler does not collect samples [32]. For this reason we had to repeat the encryption process a large number of times. The values reported in figures 19 to 21 relate to the whole encryption process.
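A rough estimate shows why a single encryption is far too short for the sampling profiler. The sketch below assumes that the CPU runs at its nominal 2.66GHz for the whole run and uses the 109µs measured for the original code; the helper names and the target of 10000 samples are hypothetical values chosen for illustration.

```c
/* CPU cycles spent in a run lasting `seconds` on a core clocked at `hz`. */
static double cycles_for(double seconds, double hz)
{
    return seconds * hz;
}

/* Repetitions needed to collect about `target_samples` samples when the
   profiler takes one sample every `cycles_per_sample` CPU cycles. */
static double repetitions_needed(double target_samples,
                                 double cycles_per_sample,
                                 double cycles_per_run)
{
    return target_samples * cycles_per_sample / cycles_per_run;
}
```

Under these assumptions, cycles_for(109e-6, 2.66e9) is about 290000 cycles, that is roughly 0,03 samples per encryption at one sample every 10 million cycles. Collecting about 10000 samples would then require on the order of 345000 encryptions.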
Figure 19: CPU load for the Paritybit based solution

For the Paritybit based solution, the most used functions are getSBoxParity() and getSBoxValue() (Fig. 19). In fact, these two functions use load and store instructions to move values between a look-up table and the CPU registers.

Figure 20: CPU load for the CRC based solution

For the CRC and InvSBox based solutions the situation remains almost the same as in the previous case (Fig. 20 and 21). Finally, we can see that the SubBytes transformation requires more CPU usage than the other ones.

Figure 21: CPU load for the InvSBox based solution

4.6.3 Memory usage

First of all, we measured the memory usage of the original AES code, which was 1308KB. All three solutions add new variables and therefore have a higher memory usage. The differences between the three solutions result from the different sizes of the added look-up tables. The table below shows the memory usage of each solution.

Solution     Memory usage (KB)   Overhead
Original     1308                -
ParityBit    1332                1,8%
InvSbox      1352                3,4%
CRC          1368                4,6%

Chapter 5

Conclusions and future work

The encryption/decryption process of the Advanced Encryption Standard is often the target of attacks, since AES has become the most widely used algorithm around the world to secure data. By studying the AES literature thoroughly, we pointed out the possibility of violating its security by injecting faults into the data during their processing. Although different solutions have already been proposed to cope with this by detecting/correcting these errors, we found no software implementations, and we thought it could be useful to raise AES security by adding this feature. In this project we analyzed fault attacks and the error detection/correction algorithms that prevent them, in order to develop different versions of fault proof AES software. This software uses parity bits and bytes to recognize the most common kinds of data corruption and remove them.
We evaluated the efficiency of the proposed software in different scenarios, simulating several types of error injection. We confirmed its capability of detecting and correcting all bit errors of odd multiplicity that are induced in at most one byte of each word, that is, in not more than four bytes of the data. One problem came from the non-linearity of the SubBytes process: some unpredictable kinds of errors were not detected. That is why different solutions were implemented, using CRC and Inverse SubBytes, and tests showed that they increase the coverage percentage.

Clearly each solution has its pros and cons. Performance tests revealed that the Inverse SubBytes based solution is faster and able to correct all errors, whereas the CRC based solution turned out to be slower. Regarding memory, the measurements of section 4.6.3 show the lowest usage for the Paritybit based solution and the highest for the CRC based one. We can conclude that, with respect to the original AES code, each solution introduces a maximum overhead close to 50% in time and 5% in memory usage.

However, some improvements can be added. The code can be optimized to reduce the overhead further, specifically for SubBytes, which is the most CPU consuming step as shown in section 4.6.2. The error coverage could also be enhanced by improving our algorithm or suggesting new solutions. We actually tried to reduce the time overhead by using inline functions: with this technique the function call is replaced by the function body during compilation, which removes several function calls during the execution. Applying this solution to our algorithm, however, did not reduce the time overhead significantly. Lastly, we also verified whether unrolling several for loops, thus avoiding multiple checks of the loop condition, would lower the time. Also in this case we did not find any relevant improvement.
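The two optimizations mentioned above can be illustrated on the parity byte of one word. This is a minimal sketch with hypothetical function names, not the code that was measured: the first function keeps the loop in the style of getParityByte(), the second is unrolled by hand and marked inline.

```c
/* Parity byte of word (column) j of the state, computed with a loop. */
static unsigned char parity_byte_loop(unsigned char s[4][4], int j)
{
    unsigned char p = 0x00;
    for (int i = 0; i < 4; i++)
        p ^= s[i][j];
    return p;
}

/* Same computation, unrolled by hand and marked inline: no loop counter,
   no loop-condition checks, and the call itself may be removed by the compiler. */
static inline unsigned char parity_byte_unrolled(unsigned char s[4][4], int j)
{
    return (unsigned char)(s[0][j] ^ s[1][j] ^ s[2][j] ^ s[3][j]);
}
```

Both functions return the same value for every word. An optimizing compiler may already apply both transformations to short fixed-count loops and small functions on its own, which would be consistent with the small gain observed; this is our reading and not a measured result.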
Our algorithm could further be used in a real transmission environment involving more than one encrypted block of data, to check how it impacts the performance in a real situation. Moreover, an important proof of security could be obtained by analyzing how the algorithm responds to a concrete DFA attack. To the best of our knowledge and that of our supervisors, this work has not been done by anybody else before.

Appendix A

AES source code with error detection and correction

A.1 Complete Source code with parityBit based SubBytes detection/correction

#include <stdio.h>
#include <stdlib.h>
#ifndef _AESENCRYPT_H_
#define _AESENCRYPT_H_
#include "aesencrypt.h"
#endif

int main(int argc, char* argv[])
{
    int i;
    // Key length in bits; Nk is the number of 32-bit words in the key, Nr the number of rounds
    Nr=128;
    Nk = Nr / 32;
    Nr = Nk + 6;
    unsigned char InKey[32] = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
                               0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};
    unsigned char Plaintext[32]= {0xea,0x83,0x5c,0xf0,0x04,0x45,0x33,0x2d,
                                  0x65,0x5d,0x98,0xad,0x85,0x96,0xb0,0xc5};
    // Copy the Key and PlainText
    for(i=0;i<Nk*4;i++)
    {
        Key[i]=InKey[i];
        in[i]=Plaintext[i];
    }
    // The KeyExpansion routine must be called before encryption.
KeyExpansion(); Cipher(); return 0; } void error_injection() { int errormask[Nb][Nb]={{0x01,0x00,0x00,0x00},{0x00,0x00,0x00,0x00},{0x00,0x00,0x00 ,0x00},{0x00,0x00,0x00,0x00}}; for (int i=0;i<4;i++){ for(int j=0;j<4;j++){ state[i][j]=state[i][j]^errormask[i][j]; } } } int getSBoxValue(int num) { 66 int sbox[256] = { //0 1 2 3 4 5 6 7 8 9 A B C D E F 0x63,0x7c,0x77,0x7b,0xf2,0x6b,0x6f,0xc5,0x30,0x01,0x67,0x2b,0xfe,0xd7,0xab,0x76, //0 0xca,0x82,0xc9,0x7d,0xfa,0x59,0x47,0xf0,0xad,0xd4,0xa2,0xaf,0x9c,0xa4,0x72,0xc0, //1 0xb7,0xfd,0x93,0x26,0x36,0x3f,0xf7,0xcc,0x34,0xa5,0xe5,0xf1,0x71,0xd8,0x31,0x15, //2 0x04,0xc7,0x23,0xc3,0x18,0x96,0x05,0x9a,0x07,0x12,0x80,0xe2,0xeb,0x27,0xb2,0x75, //3 0x09,0x83,0x2c,0x1a,0x1b,0x6e,0x5a,0xa0,0x52,0x3b,0xd6,0xb3,0x29,0xe3,0x2f,0x84, //4 0x53,0xd1,0x00,0xed,0x20,0xfc,0xb1,0x5b,0x6a,0xcb,0xbe,0x39,0x4a,0x4c,0x58,0xcf, //5 0xd0,0xef,0xaa,0xfb,0x43,0x4d,0x33,0x85,0x45,0xf9,0x02,0x7f,0x50,0x3c,0x9f,0xa8, //6 0x51,0xa3,0x40,0x8f,0x92,0x9d,0x38,0xf5,0xbc,0xb6,0xda,0x21,0x10,0xff,0xf3,0xd2, //7 0xcd,0x0c,0x13,0xec,0x5f,0x97,0x44,0x17,0xc4,0xa7,0x7e,0x3d,0x64,0x5d,0x19,0x73, //8 0x60,0x81,0x4f,0xdc,0x22,0x2a,0x90,0x88,0x46,0xee,0xb8,0x14,0xde,0x5e,0x0b,0xdb, //9 0xe0,0x32,0x3a,0x0a,0x49,0x06,0x24,0x5c,0xc2,0xd3,0xac,0x62,0x91,0x95,0xe4,0x79, //A 0xe7,0xc8,0x37,0x6d,0x8d,0xd5,0x4e,0xa9,0x6c,0x56,0xf4,0xea,0x65,0x7a,0xae,0x08, //B 0xba,0x78,0x25,0x2e,0x1c,0xa6,0xb4,0xc6,0xe8,0xdd,0x74,0x1f,0x4b,0xbd,0x8b,0x8a, //C 0x70,0x3e,0xb5,0x66,0x48,0x03,0xf6,0x0e,0x61,0x35,0x57,0xb9,0x86,0xc1,0x1d,0x9e, //D 0xe1,0xf8,0x98,0x11,0x69,0xd9,0x8e,0x94,0x9b,0x1e,0x87,0xe9,0xce,0x55,0x28,0xdf, //E 0x8c,0xa1,0x89,0x0d,0xbf,0xe6,0x42,0x68,0x41,0x99,0x2d,0x0f,0xb0,0x54,0xbb,0x16}; //F return sbox[num]; } int getSBoxParity(int num) { int bitSBox[256] = { // 0 1 2 3 4 5 6 7 8 9 A B C D E F 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, //0 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, //1 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, //2 1, 1, 1, 0, 0, 0, 0, 0, 
1, 0, 1, 0, 0, 0, 0, 1, //3 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, //4 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, //5 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, //6 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, //7 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, //8 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, //9 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, //A 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, //B 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, //C 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, //D 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, //E 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1}; //F return bitSBox[num]; } // This function produces Nb(Nr+1) round keys. The round keys are used in each round to encrypt the states. void KeyExpansion() { int i,j; unsigned char temp[4],k; // The first round key is the key itself. for(i=0;i<Nk;i++) { RoundKey[i*4]=Key[i*4]; RoundKey[i*4+1]=Key[i*4+1]; RoundKey[i*4+2]=Key[i*4+2]; RoundKey[i*4+3]=Key[i*4+3]; } // All other round keys are found from the previous round keys. while (i < (Nb * (Nr+1))) { for(j=0;j<4;j++) { temp[j]=RoundKey[(i-1) * 4 + j]; } if (i % Nk == 0) { // This function rotates the 4 bytes in a word to the left once. // [a0,a1,a2,a3] becomes [a1,a2,a3,a0] 67 // Function RotWord() { k = temp[0]; temp[0] = temp[1]; temp[1] = temp[2]; temp[2] = temp[3]; temp[3] = k; } // SubWord() is a function that takes a four-byte input word and // applies the S-box to each of the four bytes to produce an output word. 
// Function Subword() { temp[0]=getSBoxValue(temp[0]); temp[1]=getSBoxValue(temp[1]); temp[2]=getSBoxValue(temp[2]); temp[3]=getSBoxValue(temp[3]); } temp[0] = temp[0] ^ Rcon[i/Nk]; } else if (Nk > 6 && i % Nk == 4) { // Function Subword() { temp[0]=getSBoxValue(temp[0]); temp[1]=getSBoxValue(temp[1]); temp[2]=getSBoxValue(temp[2]); temp[3]=getSBoxValue(temp[3]); } } RoundKey[i*4+0] = RoundKey[(i-Nk)*4+0] ^ temp[0]; RoundKey[i*4+1] = RoundKey[(i-Nk)*4+1] ^ temp[1]; RoundKey[i*4+2] = RoundKey[(i-Nk)*4+2] ^ temp[2]; RoundKey[i*4+3] = RoundKey[(i-Nk)*4+3] ^ temp[3]; i++; } } //Computes and the parity bit matrix for a state void getParityBit(unsigned char bit[][4]) { for(int i=0;i<4;i++) { for(int j=0;j<4;j++) { bit[j][i]=0x00; for(int k=0;k<8;k++) { bit[j][i]^=((state[j][i]>>k)&Mask); } } } } //This function computes the parity bytes for the state: void getParityByte(unsigned char byte[]) { for(int j=0;j<4;j++) { byte[j]=0x00; for(int i=0;i<4;i++) { byte[j] ^= state[i][j]; } } } //This Function computes the parity bits for the round key: 68 void RoundKeyParityBit(int round) { for(int i=0;i<4;i++) { for(int j=0;j<4;j++) { RKeyParityBit[j][i]=0x00; for(int k=0;k<8;k++) { RKeyParityBit[j][i]^=((RoundKey[round * Nb * 4 + i * Nb + j]>>k)&Mask); } } } } //This function computes the parity bytes for the round key: void RoundKeyParityByte(int round) { for(int i=0;i<4;i++) { RKeyParityByte[i]=0x00; for(int j=0;j<4;j++) { RKeyParityByte[i] ^= (RoundKey[round * Nb * 4 + i * Nb + j]); } } } /*This function compares actual and predicted parity bit to check if error occurred storing also the position of the erroneus byte in the j-th word which has been corrupted*/ void ParityBitCheck() { getParityBit(ActualParityBit); for(int i=0;i<4;i++) { for(int j=0;j<4;j++) { if(ActualParityBit[i][j]!=PredictedParityBit[i][j]) { error=1; ErrWord[j]=1; ErrByte[j]=i; } } } } //This Function Predicts the parity bits for the add round key: void ARK_BitPrediction(int round) { 
RoundKeyParityBit(round); for(int i=0;i<4;i++) { for(int j=0;j<4;j++) { PredictedParityBit[j][i]=ActualParityBit[j][i]^RKeyParityBit[j][i]; } } } //This function predicts the parity byte for the add round key: void ARK_BytePrediction(int round) { RoundKeyParityByte(round); for(int j=0;j<4;j++) 69 { PredictedParityByte[j] = ActualParityByte[j] ^ RKeyParityByte[j]; } } // This function adds the round key to state. // The round key is added to the state by an XOR function. void AddRoundKey(int round) { int i,j; for(i=0;i<4;i++) { for(j=0;j<4;j++) { state[j][i] ^= RoundKey[round * Nb * 4 + i * Nb + j]; } } } void Correction() { for(int j=0;j<4;j++) { if (ErrWord[j]==1) { CorrectionParityByte[j] = PredictedParityByte[j] ^ state[0][j] ^ state[1][j] ^ state[2][j] ^ state[3][j]; state[ErrByte[j]][j]^=CorrectionParityByte[j]; } ErrWord[j]=0; } getParityByte(ActualParityByte); error=0; } //The predicted parity bits are the parity bit calculated for the values contained //in the S-Box. These are contained in the bitSBox matrix in the SBoxParity function void SB_BitPrediction() { for(int i=0;i<4;i++) { for(int j=0;j<4;j++) { PredictedParityBit[i][j] = getSBoxParity(state[i][j]); } } } /*The Predicted Parity Byte for SubBytes is computed considering the output state according to the definition of SubBytes in the Matrix form: s'[i][j] = A x (s^-1[i][j]) + b and the first byte of it s'[0][j] as function of the actual parity byte before the transformation*/ void SB_BytePrediction() { unsigned char temp=0x00; for (int j=0;j<4;j++) { unsigned char P1=(MulInv[(ActualParityByte[j]^state[1][j]^ state[2][j]^state[3][j])])^(MulInv[state[1][j]]^ MulInv[state[2][j]]^MulInv[state[3][j]]); PredictedParityByte[j]=0x00; for (int i=7;i>=0;i--) { temp = A[i]&P1; unsigned char bit=0x00; 70 for(int k=0;k<8;k++) { bit^=((temp>>k)&Mask); } PredictedParityByte[j]^=bit<<i; } } } // The SubBytes Function Substitutes the values in the // state matrix with values in an S-box. 
void SubBytes() { int i,j; for(i=0;i<4;i++) { for(j=0;j<4;j++) { state[i][j] = getSBoxValue(state[i][j]); } } } //Predicted parity bit for shift row are obtained performing the same shifts the //shift rows does on the state on the actual parity bit matrix void SR_BitPrediction() { unsigned char temp; getParityBit(PredictedParityBit); // Rotate first row 1 columns to left temp=PredictedParityBit[1][0]; PredictedParityBit[1][0]=PredictedParityBit[1][1]; PredictedParityBit[1][1]=PredictedParityBit[1][2]; PredictedParityBit[1][2]=PredictedParityBit[1][3]; PredictedParityBit[1][3]=temp; // Rotate second row 2 columns to left temp=PredictedParityBit[2][0]; PredictedParityBit[2][0]=PredictedParityBit[2][2]; PredictedParityBit[2][2]=temp; temp=PredictedParityBit[2][1]; PredictedParityBit[2][1]=PredictedParityBit[2][3]; PredictedParityBit[2][3]=temp; // Rotate third row 3 columns to left temp=PredictedParityBit[3][0]; PredictedParityBit[3][0]=PredictedParityBit[3][3]; PredictedParityBit[3][3]=PredictedParityBit[3][2]; PredictedParityBit[3][2]=PredictedParityBit[3][1]; PredictedParityBit[3][1]=temp; } /*The predicted parity byte for shift rows is computed according to the shifts the transformation performs on the state. The 1st byte of the 1st word is expressed as function of the actual parity byte */ void SR_BytePrediction() { for(int j=0;j<4;j++) { PredictedParityByte[j] = ActualParityByte[j] ^ state[1][j] ^ state[2][j] ^ state[3][j] ^ state[1][(j+1)%4] ^ state[2][(j+2)%4] ^ state[3][(j+3)%4]; } 71 } // The ShiftRows() function shifts the rows in the state to the left. // Each row is shifted with different offset. // Offset = Row number. So the first row is not shifted. 
void ShiftRows()
{
    unsigned char temp;
    // Rotate row 1 one column to the left
    temp=state[1][0];
    state[1][0]=state[1][1];
    state[1][1]=state[1][2];
    state[1][2]=state[1][3];
    state[1][3]=temp;
    // Rotate row 2 two columns to the left
    temp=state[2][0];
    state[2][0]=state[2][2];
    state[2][2]=temp;
    temp=state[2][1];
    state[2][1]=state[2][3];
    state[2][3]=temp;
    // Rotate row 3 three columns to the left
    temp=state[3][0];
    state[3][0]=state[3][3];
    state[3][3]=state[3][2];
    state[3][2]=state[3][1];
    state[3][1]=temp;
}

//Get the most significant bit of a byte
bool getMostsignificantBit(unsigned char data_received)
{
    bool MSB;
    // Parentheses are needed: == binds tighter than &
    if(((data_received>>7)&Mask)==1)
        MSB=1;
    else
        MSB=0;
    return MSB;
}

//The parity bits prediction for MixColumns is based on the actual parity bits and
//the most significant bit of each byte of the state.
//The transformation preserves the parity byte.
void MC_BitPrediction()
{
    getParityBit(ActualParityBit);
    for(int i=0;i<4;i++)
    {
        PredictedParityBit[0][i]=ActualParityBit[0][i]^
            ActualParityBit[2][i]^ActualParityBit[3][i]^
            getMostsignificantBit(state[0][i])^getMostsignificantBit(state[1][i]);
        PredictedParityBit[1][i]=ActualParityBit[0][i]^
            ActualParityBit[1][i]^ActualParityBit[3][i]^
            getMostsignificantBit(state[1][i])^getMostsignificantBit(state[2][i]);
        PredictedParityBit[2][i]=ActualParityBit[0][i]^
            ActualParityBit[1][i]^ActualParityBit[2][i]^
            getMostsignificantBit(state[2][i])^getMostsignificantBit(state[3][i]);
        PredictedParityBit[3][i]=ActualParityBit[1][i]^
            ActualParityBit[2][i]^ActualParityBit[3][i]^
            getMostsignificantBit(state[3][i])^getMostsignificantBit(state[0][i]);
    }
}

void MC_ParityBitCheck()
{
    char PBitError_Pattern[4][4];
    char matrix_msb0[4][4]={{1,0,1,1},{1,1,0,1},{1,1,1,0},{0,1,1,1}};
    char matrix_msb1[4][4]={{0,1,1,1},{1,0,1,1},{1,1,0,1},{1,1,1,0}};
    getParityBit(ActualParityBit);
    // Difference between predicted and actual parity bits
    for(int i=0;i<4;i++)
    {
        for(int j=0;j<4;j++)
        {
            PBitError_Pattern[i][j]=PredictedParityBit[i][j]^
                ActualParityBit[i][j];
        }
    }
    getParityByte(ActualParityByte);
    for(int i=0;i<4;i++)
    {
CorrectionParityByte[i]=PredictedParityByte[i]^ ActualParityByte[i]; } for(int i=0;i<4;i++) { if(CorrectionParityByte[i]!=0) { error=1; bool msb=getMostsignificantBit(CorrectionParityByte[i]); if(msb==false) { for(int k=0;k<4;k++) { if(PBitError_Pattern[0][i]==matrix_msb0[0][k] && PBitError_Pattern[1][i]==matrix_msb0[1][k] && PBitError_Pattern[2][i]==matrix_msb0[2][k] && PBitError_Pattern[3][i]==matrix_msb0[3][k]) { ErrByte[i]=k; } } } else { for(int k=0;k<4;k++) { if(PBitError_Pattern[0][i]==matrix_msb1[0][k] && PBitError_Pattern[1][i]==matrix_msb1[1][k] && PBitError_Pattern[2][i]==matrix_msb1[2][k] && PBitError_Pattern[3][i]==matrix_msb1[3][k]) { ErrByte[i]=k; } } } } } } void MC_Correction() { for(int i=0;i<4;i++) { if(CorrectionParityByte[i]!=0) { if(ErrByte[i]==0) { CorrectionMatrix[0][i]=xtime(CorrectionParityByte[i]); 73 CorrectionMatrix[1][i]=CorrectionParityByte[i]; CorrectionMatrix[2][i]=CorrectionParityByte[i]; CorrectionMatrix[3][i]=xtime(CorrectionParityByte[i])^ CorrectionParityByte[i]; } if(ErrByte[i]==1) { CorrectionMatrix[0][i]=xtime(CorrectionParityByte[i])^ CorrectionParityByte[i]; CorrectionMatrix[1][i]=xtime(CorrectionParityByte[i]); CorrectionMatrix[2][i]=CorrectionParityByte[i]; CorrectionMatrix[3][i]=CorrectionParityByte[i]; } if(ErrByte[i]==2) { CorrectionMatrix[0][i]=CorrectionParityByte[i]; CorrectionMatrix[1][i]=xtime(CorrectionParityByte[i])^ CorrectionParityByte[i]; CorrectionMatrix[2][i]=xtime(CorrectionParityByte[i]); CorrectionMatrix[3][i]=CorrectionParityByte[i]; } if(ErrByte[i]==3) { CorrectionMatrix[0][i]=CorrectionParityByte[i]; CorrectionMatrix[1][i]=CorrectionParityByte[i]; CorrectionMatrix[2][i]=xtime(CorrectionParityByte[i])^ CorrectionParityByte[i]; CorrectionMatrix[3][i]=xtime(CorrectionParityByte[i]); } } } for(int j=0;j<4;j++) { if (CorrectionParityByte[j]!=0) { for(int i=0;i<4;i++) { state[i][j]^=CorrectionMatrix[i][j]; } } } getParityByte(ActualParityByte); error=0; } // MixColumns function mixes the columns of the 
state matrix void MixColumns() { int i; unsigned char Tmp,Tm,t; for(i=0;i<4;i++) { t=state[0][i]; Tmp = state[0][i] ^ state[1][i] ^ state[2][i] ^ state[3][i] ; Tm = state[0][i] ^ state[1][i] ; Tm = xtime(Tm); state[0][i] ^= Tm ^ Tmp ; Tm = state[1][i] ^ state[2][i] ; Tm = xtime(Tm); state[1][i] ^= Tm ^ Tmp ; Tm = state[2][i] ^ state[3][i] ; Tm = xtime(Tm); state[2][i] ^= Tm ^ Tmp ; Tm = state[3][i] ^ t ; Tm = xtime(Tm); state[3][i] ^= Tm ^ Tmp ; 74 } } // Cipher is the main function that encrypts the PlainText. void Cipher() { int i,j,round=0; //Copy the input PlainText to state array. for(i=0;i<4;i++) { for(j=0;j<4;j++) { state[j][i] = in[i*4 + j]; } } getParityBit(ActualParityBit); getParityByte(ActualParityByte); ARK_BitPrediction(round); ARK_BytePrediction(round); // Add the First round key to the state before starting the rounds. AddRoundKey(0); ParityBitCheck(); if(error==1) { Correction(); } else getParityByte(ActualParityByte); // There will be Nr rounds. // The first Nr-1 rounds are identical. // These Nr-1 rounds are executed in the loop below. for(round=1;round<Nr;round++) { SB_BitPrediction(); SB_BytePrediction(); SubBytes(); ParityBitCheck(); if(error==1) { Correction(); } else getParityByte(ActualParityByte); SR_BitPrediction(); SR_BytePrediction(); ShiftRows(); ParityBitCheck(); if (error==1) { Correction(); } //For mix column predicted and actual parity byte are the same. As we need //Predicted parity byte in the detection and correction now we store the actual //parity byte in the predicted vector else getParityByte(PredictedParityByte); 75 MC_BitPrediction(); MixColumns(); MC_ParityBitCheck(); if (error==1) { MC_Correction(); } else getParityByte(ActualParityByte); getParityBit(ActualParityBit); ARK_BitPrediction(round); ARK_BytePrediction(round); AddRoundKey(round); ParityBitCheck(); if(error==1) { Correction(); } else getParityByte(ActualParityByte); } // The last round is given below. // The MixColumns function is not here in the last round. 
SB_BitPrediction(); SB_BytePrediction(); SubBytes(); ParityBitCheck(); if(error==1) { Correction(); } else getParityByte(ActualParityByte); SR_BitPrediction(); SR_BytePrediction(); ShiftRows(); ParityBitCheck(); if (error==1) { Correction(); } else getParityByte(PredictedParityByte); getParityBit(ActualParityBit); ARK_BitPrediction(Nr); ARK_BytePrediction(Nr); AddRoundKey(Nr); ParityBitCheck(); if(error==1) { Correction(); } // The encryption process is over. // Copy the state array to output array. 76 for(i=0;i<4;i++) { for(j=0;j<4;j++) { out[i*4+j]=state[j][i]; } } } A.2 CRC based SubBytes detection/correction solution New variable: unsigned char checksum_matrix[Nb][Nb]={{0,0,0,0},{0,0,0,0},{0,0,0,0},{0,0,0,0}}; In the previous code getSboxParityBit() is replaced with: int SBox_checksum(int num) { int checksum_sbox[256] = { //0 1 2 3 4 5 6 7 8 9 A B C D E F 0xb8,0x58,0xf8,0x00,0x90,0xd0,0x40,0xf0,0x38,0x80,0x28,0x48,0x68,0x68,0x30,0x78, 0xc0,0x30,0x08,0xd8,0xf8,0xa0,0xc0,0xd8,0xe8,0xa0,0xd8,0xa0,0x50,0x00,0xe8,0xe0, 0x18,0xa0,0x60,0x30,0xe0,0x08,0x80,0x18,0xa8,0x80,0x18,0x58,0x20,0x58,0xb8,0xc0, 0x90,0xb8,0x20,0x28,0xb8,0x70,0x10,0x88,0x58,0x98,0x78,0x40,0xa8,0xb0,0x08,0xb0, 0xe8,0xb0,0x10,0xf0,0x70,0xc0,0x68,0x90,0x00,0x98,0xe8,0x88,0x00,0xc0,0xd8,0xe8, 0x80,0xb0,0x00,0x70,0xe8,0x20,0xc0,0xe8,0x50,0x40,0xf0,0xd0,0xb8,0x60,0x20,0xd0, 0x30,0x38,0xb0,0x78,0x50,0xe0,0xf0,0x68,0x88,0x30,0x48,0x90,0x48,0xc0,0x98,0xf8, 0xc8,0x58,0x98,0x48,0xe0,0xd0,0x50,0xc8,0xb8,0x98,0x10,0x68,0xd0,0xe8,0x10,0x78, 0x98,0xf8,0x18,0xf0,0x78,0xf0,0x08,0x88,0x70,0xc8,0x10,0x40,0xe0,0x30,0x38,0x68, 0x70,0xf8,0xa8,0xc8,0xa0,0xc8,0xa8,0x10,0x40,0xb8,0x28,0x40,0x80,0xf8,0xa0,0x90, 0x08,0x70,0x18,0x20,0x70,0xd8,0x78,0xb0,0xa8,0xf8,0x68,0x38,0x28,0xb8,0x98,0x48, 0x50,0x88,0x60,0x08,0x00,0x20,0x28,0x78,0x88,0x90,0x48,0x28,0x60,0x80,0x20,0x68, 0x60,0xc8,0xf8,0x58,0x28,0x48,0xd0,0x38,0x60,0x48,0x30,0xe0,0x38,0x38,0xd8,0x58, 
0xa0,0x88,0x50,0xa8,0xf0,0xc8,0x00,0xb0,0xf0,0x28,0x10,0xa8,0xa0,0x60,0xa8,0x18, //D
0x88,0xb0,0xc0,0x50,0x98,0xd8,0xc8,0x38,0x08,0x60,0x20,0xe0,0x50,0x58,0x80,0x00, //E
0x80,0x10,0x90,0x78,0x70,0xd0,0xd0,0x18,0x18,0x40,0x90,0x30,0x40,0xd8,0xe0,0x08}; //F
    return checksum_sbox[num];
}

The SB_BitPrediction function is not needed in this solution. Instead, the following functions are used:

unsigned char Checksum_calculation(unsigned char byte)
{
    unsigned char poly_gen=0xa4; //6 bit poly + 00
    int k=0; //k is the number of shifts
    while(k<7)
    {
        if(byte<0x80) //most significant bit is zero: shift
        {
            byte=byte<<1;
            k++;
        }
        else
            byte^=poly_gen;
    }
    return byte;
}

void SB_Prediction_checksum()
{
    for(int i=0;i<4;i++)
    {
        for(int j=0;j<4;j++)
        {
            checksum_matrix[i][j] = SBox_checksum(state[i][j]);
        }
    }
}

The detection uses, instead of ParityBitCheck(), this new function:

void SB_detection()
{
    for(int i=0;i<4;i++)
    {
        for(int j=0;j<4;j++)
        {
            if(checksum_matrix[i][j]!=Checksum_calculation(state[i][j]))
            {
                error=1;
                ErrWord[j]=1;
                ErrByte[j]=i;
            }
        }
    }
}

The Cipher() is then modified as follows for the SubBytes() transformation:

SB_Prediction_checksum();
SB_BytePrediction();
SubBytes();
SB_detection();
if(error==1)
{
    Correction();
}
else getParityByte(ActualParityByte);

A.3 InvSBox based SubBytes detection/correction solution

In the complete code of section A.1, getSBoxParity() is replaced by:

int getSBoxInvert(int num)
{
    int rsbox[256] = {
0x52,0x09,0x6a,0xd5,0x30,0x36,0xa5,0x38,0xbf,0x40,0xa3,0x9e,0x81,0xf3,0xd7,0xfb,
0x7c,0xe3,0x39,0x82,0x9b,0x2f,0xff,0x87,0x34,0x8e,0x43,0x44,0xc4,0xde,0xe9,0xcb,
0x54,0x7b,0x94,0x32,0xa6,0xc2,0x23,0x3d,0xee,0x4c,0x95,0x0b,0x42,0xfa,0xc3,0x4e,
0x08,0x2e,0xa1,0x66,0x28,0xd9,0x24,0xb2,0x76,0x5b,0xa2,0x49,0x6d,0x8b,0xd1,0x25,
0x72,0xf8,0xf6,0x64,0x86,0x68,0x98,0x16,0xd4,0xa4,0x5c,0xcc,0x5d,0x65,0xb6,0x92,
0x6c,0x70,0x48,0x50,0xfd,0xed,0xb9,0xda,0x5e,0x15,0x46,0x57,0xa7,0x8d,0x9d,0x84,
0x90,0xd8,0xab,0x00,0x8c,0xbc,0xd3,0x0a,0xf7,0xe4,0x58,0x05,0xb8,0xb3,0x45,0x06, 0xd0,0x2c,0x1e,0x8f,0xca,0x3f,0x0f,0x02,0xc1,0xaf,0xbd,0x03,0x01,0x13,0x8a,0x6b, 0x3a,0x91,0x11,0x41,0x4f,0x67,0xdc,0xea,0x97,0xf2,0xcf,0xce,0xf0,0xb4,0xe6,0x73, 0x96,0xac,0x74,0x22,0xe7,0xad,0x35,0x85,0xe2,0xf9,0x37,0xe8,0x1c,0x75,0xdf,0x6e, 0x47,0xf1,0x1a,0x71,0x1d,0x29,0xc5,0x89,0x6f,0xb7,0x62,0x0e,0xaa,0x18,0xbe,0x1b, 0xfc,0x56,0x3e,0x4b,0xc6,0xd2,0x79,0x20,0x9a,0xdb,0xc0,0xfe,0x78,0xcd,0x5a,0xf4, 0x1f,0xdd,0xa8,0x33,0x88,0x07,0xc7,0x31,0xb1,0x12,0x10,0x59,0x27,0x80,0xec,0x5f, 0x60,0x51,0x7f,0xa9,0x19,0xb5,0x4a,0x0d,0x2d,0xe5,0x7a,0x9f,0x93,0xc9,0x9c,0xef, 0xa0,0xe0,0x3b,0x4d,0xae,0x2a,0xf5,0xb0,0xc8,0xeb,0xbb,0x3c,0x83,0x53,0x99,0x61, 0x17,0x2b,0x04,0x7e,0xba,0x77,0xd6,0x26,0xe1,0x69,0x14,0x63,0x55,0x21,0x0c,0x7d}; return rsbox[num]; } SB_BitPrediction is not needed anymore. The new functions used are: void storePreviousState() { for(int i=0;i<4;i++) { for(int j=0;j<4;j++) { pstate[i][j] = state[i][j]; } } } void SB_Correction() { for(int j=0;j<4;j++) { if (ActualParityByte[j]!=PredictedParityByte[j]) { 79 for(int i=0; i<4; i++) { if(getSBoxInvert(state[i][j]) != pstate[i][j]) { state[i][j] = getSBoxValue(pstate[i][j]); } } } } } and a new variable is used: unsigned char pstate[Nb][Nb]; The Cipher() is modified as follows for SubBytes() transformation: storePreviousState(); SB_BytePrediction(); SubBytes(); getParityBit(ActualParityBit); getParityByte(ActualParityByte); SB_Correction(); 80 Appendix B B.1 Time performance Parity bit based SB Original code No errors Execution nb Execution time in µs Execution time in µs 1 120 150 2 90 150 3 100 150 4 140 130 5 100 160 6 100 150 7 100 160 8 120 150 9 120 160 10 100 160 Average 109,00 152,00 ST DEV 15,24 9,19 Overhead 39% 81 ARK SubBytes ShiftRows Mixcolumns Single Error Single Error Single Error Single Error Execution time in µs Execution time in µs Execution time in µs Execution time in µs 1 140 150 150 150 2 150 150 150 150 3 150 
160 160 160
4 150 150 150 150
5 150 150 150 150
6 150 170 150 150
7 150 150 150 160
8 150 150 150 150
9 160 170 160 160
10 150 150 150 150
Average 150,00 155,00 152,00 153,00
ST DEV 4,71 8,50 4,22 4,83
Overhead 38% 42% 39% 40%

4 bytes erroneous (execution time in µs)

| Execution nb | ARK | SubBytes | ShiftRows | MixColumns |
|---|---|---|---|---|
| 1 | 160 | 0 | 150 | 150 |
| 2 | 160 | 0 | 170 | 160 |
| 3 | 150 | 0 | 160 | 170 |
| 4 | 150 | 0 | 180 | 150 |
| 5 | 160 | 0 | 150 | 140 |
| 6 | 150 | 0 | 150 | 150 |
| 7 | 160 | 0 | 170 | 150 |
| 8 | 170 | 0 | 160 | 150 |
| 9 | 160 | 0 | 150 | 160 |
| 10 | 150 | 0 | 170 | 150 |
| Average | 157,00 | - | 161,00 | 153,00 |
| ST DEV | 6,75 | 0 | 11,00 | 8,23 |
| Overhead | 44% | - | 48% | 40% |

Inverse based SB

Original code and protected code without errors (execution time in µs)

| Execution nb | Original code | No errors |
|---|---|---|
| 1 | 120 | 140 |
| 2 | 90 | 140 |
| 3 | 100 | 140 |
| 4 | 140 | 150 |
| 5 | 100 | 140 |
| 6 | 100 | 140 |
| 7 | 100 | 140 |
| 8 | 120 | 140 |
| 9 | 120 | 140 |
| 10 | 100 | 140 |
| Average | 109,00 | 141,00 |
| ST DEV | 15,24 | 3,16 |
| Overhead | | 29% |

Single error (execution time in µs)

| Execution nb | ARK | SubBytes | ShiftRows | MixColumns |
|---|---|---|---|---|
| 1 | 140 | 140 | 140 | 130 |
| 2 | 140 | 150 | 130 | 140 |
| 3 | 160 | 140 | 130 | 140 |
| 4 | 140 | 140 | 160 | 170 |
| 5 | 140 | 150 | 150 | 160 |
| 6 | 140 | 140 | 140 | 130 |
| 7 | 140 | 140 | 140 | 140 |
| 8 | 140 | 180 | 140 | 140 |
| 9 | 140 | 140 | 140 | 140 |
| 10 | 140 | 140 | 140 | 140 |
| Average | 142,00 | 146,00 | 141,00 | 143,00 |
| ST DEV | 6,32 | 12,64 | 8,76 | 12,51 |
| Overhead | 30% | 34% | 29% | 31% |

4 bytes erroneous (execution time in µs)

| Execution nb | ARK | SubBytes | ShiftRows | MixColumns |
|---|---|---|---|---|
| 1 | 150 | 140 | 180 | 140 |
| 2 | 140 | 140 | 140 | 160 |
| 3 | 140 | 160 | 140 | 140 |
| 4 | 150 | 140 | 140 | 140 |
| 5 | 150 | 190 | 170 | 140 |
| 6 | 140 | 160 | 140 | 150 |
| 7 | 140 | 140 | 140 | 140 |
| 8 | 140 | 140 | 150 | 140 |
| 9 | 140 | 160 | 140 | 140 |
| 10 | 140 | 150 | 140 | 140 |
| Average | 143,00 | 152,00 | 148,00 | 143,00 |
| ST DEV | 4,83 | 16,19 | 14,75 | 6,74 |
| Overhead | 31% | 39% | 36% | 31% |

CRC based SB

Original code and protected code without errors (execution time in µs)

| Execution nb | Original code | No errors |
|---|---|---|
| 1 | 120 | 160 |
| 2 | 90 | 160 |
| 3 | 100 | 160 |
| 4 | 140 | 150 |
| 5 | 100 | 190 |
| 6 | 100 | 160 |
| 7 | 100 | 160 |
| 8 | 120 | 160 |
| 9 | 120 | 160 |
| 10 | 100 | 160 |
| Average | 109,00 | 162,00 |
| ST DEV | 15,24 | 10,32 |
| Overhead | | 49% |

Single error (execution time in µs)

| Execution nb | ARK | SubBytes | ShiftRows | MixColumns |
|---|---|---|---|---|
| 1 | 160 | 160 | 160 | 160 |
| 2 | 170 | 160 | 170 | 160 |
| 3 | 160 | 160 | 160 | 150 |
| 4 | 160 | 170 | 170 | 160 |
| 5 | 160 | 160 | 160 | 160 |
| 6 | 170 | 170 | 160 | 170 |
| 7 | 160 | 170 | 160 | 160 |
| 8 | 150 | 170 | 140 | 160 |
| 9 | 160 | 160 | 160 | 160 |
| 10 | 160 | 170 | 170 | 170 |
| Average | 161,00 | 165,00 | 161,00 | 161,00 |
| ST DEV | 5,68 | 5,27 | 8,76 | 5,68 |
| Overhead | 48% | 51% | 48% | 48% |

4 bytes erroneous (execution time in µs)

| Execution nb | ARK | SubBytes | ShiftRows | MixColumns |
|---|---|---|---|---|
| 1 | 160 | 170 | 160 | 160 |
| 2 | 160 | 170 | 170 | 160 |
| 3 | 170 | 160 | 170 | 180 |
| 4 | 170 | 170 | 180 | 170 |
| 5 | 180 | 160 | 170 | 170 |
| 6 | 160 | 170 | 160 | 170 |
| 7 | 160 | 160 | 160 | 160 |
| 8 | 160 | 170 | 160 | 170 |
| 9 | 160 | 170 | 180 | 160 |
| 10 | 140 | 160 | 170 | 170 |
| Average | 162,00 | 166,00 | 168,00 | 167,00 |
| ST DEV | 10,32 | 5,16 | 7,88 | 6,74 |
| Overhead | 49% | 52% | 54% | 53% |

B.2 CPU load performance

Inverse based SB

| Function Name | Inclusive Samples | Exclusive Samples | Inclusive Samples % | Exclusive Samples % |
|---|---|---|---|---|
| getSBoxValue(int) | 202 817 | 200 586 | 43,51 | 43,03 |
| ParityBitCheck(void) | 56 106 | 6 460 | 12,04 | 1,39 |
| SB_BytePrediction(void) | 42 636 | 42 636 | 9,15 | 9,15 |
| getParityBit(unsigned char (* const)[4]) | 40 845 | 40 845 | 8,76 | 8,76 |
| ARK_BitPrediction(int) | 24 298 | 1 675 | 5,21 | 0,36 |
| MC_BitPrediction(void) | 23 489 | 4 341 | 5,04 | 0,93 |
| MC_ParityBitCheck(void) | 22 238 | 4 097 | 4,77 | 0,88 |
| SR_BitPrediction(void) | 20 253 | 553 | 4,34 | 0,12 |
| SB_Correction(void) | 12 762 | 5 530 | 2,74 | 1,19 |
| getSBoxInvert(int) | 5 939 | 5 867 | 1,27 | 1,26 |
| getSBoxValue(int) | 1 293 | 1 270 | 0,28 | 0,27 |
| MixColumns(void) | 4 343 | 4 343 | 0,93 | 0,93 |
| ARK_BytePrediction(int) | 2 314 | 2 314 | 0,5 | 0,5 |
| SR_BytePrediction(void) | 1 397 | 1 397 | 0,3 | 0,3 |
| AddRoundKey(int) | 1 238 | 1 238 | 0,27 | 0,27 |
| ShiftRows(void) | 725 | 725 | 0,16 | 0,16 |
| MC_Correction(void) | 526 | 526 | 0,11 | 0,11 |
| error_injection(void) | 499 | 463 | 0,11 | 0,1 |
| Correction(void) | 375 | 375 | 0,08 | 0,08 |

Parity bit based SB

| Function Name | Inclusive Samples | Exclusive Samples | Inclusive Samples % | Exclusive Samples % |
|---|---|---|---|---|
| getSBoxParity(int) | 195 112 | 193 142 | 30,74 | 30,43 |
| getSBoxValue(int) | 192 502 | 190 596 | 30,33 | 30,03 |
| ParityBitCheck(void) | 73 677 | 9 289 | 11,61 | 1,46 |
| SB_BytePrediction(void) | 44 221 | 44 221 | 6,97 | 6,97 |
| ARK_BitPrediction(int) | 21 792 | 1 804 | 3,43 | 0,28 |
| MC_ParityBitCheck(void) | 21 719 | 3 713 | 3,42 | 0,58 |
| getParityBit(unsigned char (* const)[4]) | 21 695 | 21 695 | 3,42 | 3,42 |
| MC_BitPrediction(void) | 21 568 | 4 007 | 3,4 | 0,63 |
| SR_BitPrediction(void) | 19 864 | 529 | 3,13 | 0,08 |
| MixColumns(void) | 3 698 | 3 698 | 0,58 | 0,58 |
| ARK_BytePrediction(int) | 2 230 | 2 230 | 0,35 | 0,35 |
| SR_BytePrediction(void) | 1 394 | 1 394 | 0,22 | 0,22 |
| AddRoundKey(int) | 1 331 | 1 331 | 0,21 | 0,21 |
| Correction(void) | 564 | 564 | 0,09 | 0,09 |
| error_injection(void) | 524 | 490 | 0,08 | 0,08 |
| ShiftRows(void) | 508 | 508 | 0,08 | 0,08 |
| MC_Correction(void) | 429 | 429 | 0,07 | 0,07 |

CRC based SB

| Function Name | Inclusive Samples | Exclusive Samples | Inclusive Samples % | Exclusive Samples % |
|---|---|---|---|---|
| SBox_checksum(int) | 196 024 | 193 753 | 27,92 | 27,6 |
| getSBoxValue(int) | 193 084 | 191 244 | 27,5 | 27,24 |
| SB_detection(void) | 88 293 | 88 293 | 12,58 | 12,58 |
| ParityBitCheck(void) | 50 308 | 6 555 | 7,17 | 0,93 |
| SB_BytePrediction(void) | 41 881 | 41 881 | 5,97 | 5,97 |
| getParityBit(unsigned char (* const)[4]) | 22 223 | 22 223 | 3,17 | 3,17 |
| MC_ParityBitCheck(void) | 21 964 | 3 906 | 3,13 | 0,56 |
| MC_BitPrediction(void) | 21 886 | 4 473 | 3,12 | 0,64 |
| ARK_BitPrediction(int) | 21 634 | 1 674 | 3,08 | 0,24 |
| SR_BitPrediction(void) | 20 503 | 585 | 2,92 | 0,08 |
| MixColumns(void) | 3 563 | 3 563 | 0,51 | 0,51 |
| ARK_BytePrediction(int) | 2 436 | 2 436 | 0,35 | 0,35 |
| SR_BytePrediction(void) | 1 358 | 1 358 | 0,19 | 0,19 |
| AddRoundKey(int) | 1 077 | 1 077 | 0,15 | 0,15 |
| error_injection(void) | 615 | 547 | 0,09 | 0,08 |
| ShiftRows(void) | 593 | 593 | 0,08 | 0,08 |
| MC_Correction(void) | 587 | 587 | 0,08 | 0,08 |
| Correction(void) | 565 | 565 | 0,08 | 0,08 |

B.3 Memory usage performance

| | Memory usage (KB) | Overhead |
|---|---|---|
| Original code | 1308 | |
| ParityBit based SB | 1332 | 1,8% |
| InvSbox based SB | 1352 | 3,4% |
| CRC based SB | 1368 | 4,6% |
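The Average, ST DEV and Overhead rows of the timing tables in this appendix, and the Overhead column of the memory table, can be reproduced from the raw measurements. The sketch below is an illustration in Python, not the tooling used for the report. It assumes that ST DEV is the sample standard deviation and that overhead is the relative increase over the original code, which is consistent with the reported figures for the data used here. The helper names `overhead_percent` and `summarize` are hypothetical.

```python
from statistics import mean, stdev

# Execution times in µs copied from the "Inverse based SB" table
# (original code vs. protected code with no injected errors).
original = [120, 90, 100, 140, 100, 100, 100, 120, 120, 100]
protected = [140, 140, 140, 150, 140, 140, 140, 140, 140, 140]


def overhead_percent(baseline, measured):
    """Relative increase of `measured` over `baseline`, in percent (assumed definition)."""
    return (measured - baseline) / baseline * 100.0


def summarize(samples):
    """Return (average, sample standard deviation) of a list of timings."""
    return mean(samples), stdev(samples)


avg_orig, sd_orig = summarize(original)
avg_prot, sd_prot = summarize(protected)

print(f"original : avg={avg_orig:.2f} sd={sd_orig:.2f}")
print(f"protected: avg={avg_prot:.2f} sd={sd_prot:.2f}")
print(f"time overhead  : {overhead_percent(avg_orig, avg_prot):.0f}%")
# Memory figures in KB from the memory usage table (original code vs. InvSbox based SB).
print(f"memory overhead: {overhead_percent(1308, 1352):.1f}%")
```

For these ten runs the averages are 109,00 µs and 141,00 µs, giving a time overhead of 29%, and the memory figures give an overhead of 3,4%, matching the corresponding table entries.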