International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 9 - September 2013
ISSN: 2231-5381 http://www.ijettjournal.org Pages 3956-3961

Efficient Field Programmable Gate Array Implementation of Advanced Encryption Standard Algorithm using VHDL

Archana Garg1, Harmanjot Singh Dhaliwal2
1 Student, M.Tech, ECE, Punjabi University, Patiala.
2 Assistant Professor, ECE, Punjabi University, Patiala.

Abstract- Various algorithms have been proposed as solutions for the security of data. The AES, also known as the Rijndael algorithm, was selected as a standard by the National Institute of Standards and Technology (NIST). Encryption algorithms are used to ensure the security of transmission channels. This paper presents an efficient FPGA implementation approach for the Advanced Encryption Standard (AES) algorithm. Two different architectures of AES, named Basic AES and Fully Pipelined AES, have been designed in VHDL. The codes have been synthesized using Xilinx ISE 9.2i software for a Virtex 3 FPGA device. The Basic AES and Fully Pipelined AES designs are compared on the basis of power consumption, maximum pin delay, clock delay and Slice Flip Flops.

Keywords: AES, Fully pipelined, FPGA, VHDL

I INTRODUCTION

In today's digital world, encryption is emerging as an integral part of all communication networks and information processing systems, protecting both stored and in-transit data. Encryption is the transformation of plain data (known as plaintext) into unintelligible data (known as ciphertext) through an algorithm referred to as a cipher. The transformed data should be as close to impossible as possible to read without the appropriate knowledge (a key). The purpose of encryption is to ensure privacy by keeping information hidden from anyone for whom it is not intended, even those who have access to the encrypted data. Decryption is the reverse of encryption; it is the transformation of encrypted data back into an intelligible form. Encryption and decryption generally require the use of some secret information, referred to as a key. For some encryption mechanisms, the same key is used for both encryption and decryption; for other mechanisms, the keys used for encryption and decryption are different. The general model of encryption and decryption is shown in Figure 1.

[Figure 1: The Encryption Model]

Numerous encryption algorithms are now in common use, but the U.S. government has adopted the Advanced Encryption Standard (AES) to be used by Federal departments and agencies for protecting sensitive information. The National Institute of Standards and Technology (NIST) has published the specification of this encryption standard in the Federal Information Processing Standards (FIPS) Publication [1].

II DESCRIPTION OF AES ALGORITHM

The AES algorithm is a symmetric block cipher that can encrypt and decrypt information. Encryption converts data to an unintelligible form called cipher-text. Decryption of the cipher-text converts the data back into its original form, which is called plain-text.

A. AES ENCRYPTION

The AES algorithm operates on a 128-bit block of data and repeats a loop of transformations. One pass of the loop is called a round, and the number of rounds, Nr, is 10, 12 or 14 for a key length of 128, 192 or 256 bits respectively. The first Nr - 1 rounds are identical. The first and last rounds differ from the other rounds in that there is an additional AddRoundKey transformation at the beginning of the first round and no MixColumns transformation is performed in the last round [2].

An outline of AES encryption is given as:

SubBytes Transformation: The SubBytes transformation is a non-linear byte substitution that operates on each of the state bytes independently. It is done using a precalculated substitution table called the S-box. The S-box table contains 256 entries, one for each input value from 0 to 255, together with the corresponding output values. The method of calculating the S-box table is detailed in [3].

ShiftRows Transformation: In the ShiftRows transformation, the rows of the state are cyclically shifted to the left by different offsets. Row 0 is not shifted; row 1 is shifted one byte to the left; row 2 is shifted two bytes to the left and row 3 is shifted three bytes to the left.

MixColumns Transformation: In the MixColumns transformation, the columns of the state are considered as polynomials over GF(2^8) and multiplied modulo x^4 + 1 by a fixed polynomial c(x), given by:

c(x) = {03}x^3 + {01}x^2 + {01}x + {02}

AddRoundKey Transformation: In the AddRoundKey transformation, a Round Key is added to the State (the result of the MixColumns transformation) by a simple bitwise XOR operation. The Round Key of each round is derived from the main key using the Key Expansion algorithm [4]. For a 128-bit key, the encryption/decryption algorithm needs eleven 128-bit Round Keys, denoted Round Key[0] to Round Key[10] (the first, Round Key[0], is the main key).

B. AES DECRYPTION

Decryption is the reverse of encryption: it applies the inverse round transformations in reverse order to recover the original plaintext from an encrypted cipher-text. The round transformation of decryption uses the functions AddRoundKey, InvMixColumns, InvShiftRows and InvSubBytes successively.

AddRoundKey: AddRoundKey is its own inverse function because the XOR function is its own inverse. The round keys have to be selected in reverse order. The other transformations are described as follows.

InvShiftRows Transformation: InvShiftRows functions exactly the same as ShiftRows, only in the opposite direction. The first row is not shifted, while the second, third and fourth rows are shifted right by one, two and three bytes respectively.

InvSubBytes Transformation: The InvSubBytes transformation is done using a precalculated substitution table called the InvS-box. The InvS-box table contains 256 entries, one for each input value from 0 to 255, together with the corresponding output values.

InvMixColumns Transformation: In the InvMixColumns transformation, the polynomials of degree less than 4 over GF(2^8), whose coefficients are the elements in the columns of the state, are multiplied modulo x^4 + 1 by a fixed polynomial

d(x) = {0B}x^3 + {0D}x^2 + {09}x + {0E}

where {0B}, {0D}, {09} and {0E} denote hexadecimal values.

III FULLY PIPELINED AES ALGORITHM

Since we wish to obtain high performance with low consumption of resources, a combination of several techniques is essential. On the one hand, pipelining increases performance and takes advantage of the FPGA architecture, which supports pipelining. Pipelining is a technique that increases throughput, but it requires more resources in return.

Pipelining is an economical way of achieving temporal parallelism in computers. The idea is the same as that of an assembly line in an industrial plant. The task is divided into a sequence of subtasks, each of which is executed in a specialized hardware stage that works concurrently with the other stages of the pipeline. The stages are circuits that perform arithmetic or logic operations on the data set that flows through the pipe. They are separated by very fast registers, called latches, that store the intermediate results between stages. This allows the throughput of the system to be increased considerably. Pipelining does not reduce the run time of an individual instruction; it increases the number of instructions that are executed simultaneously and the rate at which they are started and finished [5].

IV FPGA IMPLEMENTATION OF BASIC AES ALGORITHM AND FULLY PIPELINED AES

AES is implemented as defined in the FIPS-197 document [1]. The algorithm is implemented in ECB mode.

[Figure 2: Architectural block diagram. Block labels: INPUT TEXT [128 bits], KEY [128 bits], ADDROUNDKEY, SUB BYTES, SHIFT ROWS, MIX COLS, CIPHERTEXT, SUB WORD, ROT WORD, RCON, W[i-Nk]]

[Figure 4.2: Process sequence for encryption/decryption]

The decryption process follows virtually the same order as encryption, except that the generated keys undergo another round of MixColumns before they are given to the AddRoundKey step. This flow is clearly explained in the FIPS-197 document [1].

The encryption/decryption sequence: Input data and key are fed in two blocks of 64 bits in consecutive clock cycles using the load signal. The first 64 bits of input and key are read on the rising clock edge (posedge) after the load signal goes high, and the second 64 bits of input and key are read on the rising clock edge after the load signal goes low. Hence the complete data and key are loaded only when the load signal makes a low-high-low transition (basically a pulse). The process starts once the start signal is pulsed, and the output is validated by the 'done' signal 13 clock cycles after the 'start' signal goes low. 'done' remains high until the next start cycle.

A. RESULTS AND DISCUSSIONS

Table 1 shows the summary of resources utilized by the Basic AES core for a Virtex 3 device. Out of the available 64896 Slice Flip Flops, 64896 4-input LUTs, 804 bonded IOBs, 4 GCLKs and 4 GCLKIOBs, the designed core has utilized only 906 Slice Flip Flops, 15742 4-input LUTs, 261 bonded IOBs, 1 GCLK and 1 GCLKIOB. Thus the percentage utilization of resources is 1% of Slice Flip Flops, 24% of 4-input LUTs, 32% of bonded IOBs, 25% of GCLKs and 25% of GCLKIOBs.

Table 2 shows the summary of resources utilized by the Fully Pipelined AES core for a Virtex 3 device. Out of the available 64896 Slice Flip Flops, 64896 4-input LUTs, 804 bonded IOBs, 4 GCLKs and 4 GCLKIOBs, the designed core has utilized 14057 Slice Flip Flops, 34479 4-input LUTs, 385 bonded IOBs, 1 GCLK and 1 GCLKIOB. Thus the percentage utilization of resources is 21% of Slice Flip Flops, 53% of 4-input LUTs, 47% of bonded IOBs, 25% of GCLKs and 25% of GCLKIOBs.

V CONCLUSION

In this paper two different architectures of AES, named Basic AES and Fully Pipelined AES, have been designed in VHDL. The codes have been synthesized using Xilinx ISE 9.2i software for a Virtex 3 FPGA device.

Table 3: Comparison between Basic AES and Fully Pipelined AES

Device utilization summary      Basic AES    Fully Pipelined AES
Number of Slice Flip Flops      1%           21%
Number of 4-input LUTs          24%          53%
Number of bonded IOBs           32%          47%
Number of GCLKs                 25%          25%
Number of GCLKIOBs              25%          25%
Power consumption               7 mW         367 mW
Maximum pin delay               3.843 ns     5.037 ns
Clock delay                     1.427 ns     1.407 ns
Average connection delay        868 ns       1.631 ns

The implementation of Basic AES and Fully Pipelined AES shows that the Basic AES core uses 1% of Slice Flip Flops, 24% of 4-input LUTs, 32% of bonded IOBs, 25% of GCLKs and 25% of GCLKIOBs. The core takes only 7 mW of power. The average connection delay for this design is 868 ns, the maximum pin delay is 3.843 ns and the clock delay is 1.427 ns. The Fully Pipelined AES core, in contrast, uses 21% of Slice Flip Flops, 53% of 4-input LUTs, 47% of bonded IOBs, 25% of GCLKs and 25% of GCLKIOBs. This core takes 367 mW of power. The average connection delay for this design is 1.631 ns, the maximum pin delay is 5.037 ns and the clock delay is 1.407 ns.

The Fully Pipelined architecture uses more resources and consumes more power, yet it has very high speed compared to the Basic AES core. The average connection delay has decreased from 868 ns to only 1.631 ns. The results also show that the clock delay decreases from 1.427 ns to 1.407 ns.

REFERENCES

[1] National Institute of Standards and Technology, "Federal Information Processing Standards Publication 197", 2001.

[2] Hoang Trang, Nguyen Van Loi, "An efficient FPGA implementation of the Advanced Encryption Standard algorithm", IEEE, 2012.

[3] Ahmad, N., Hasan, R., Jubadi, W.M., "Design of AES S-Box using combinational logic optimization", IEEE Symposium on Industrial Electronics & Applications (ISIEA), pp. 696-699, 2010.

[4] Daemen, J., and Rijmen, V., "The Design of Rijndael: AES - the Advanced Encryption Standard", Springer-Verlag, 2002.

[5] Oscar Perez, Yves Berviller, Camel Tanougast, Serge Weber, "Comparison of various strategies of implementation of the algorithm of encryption AES on FPGA", IEEE ISIE 2006, July 9-12, 2006.

[6] Yang Jun, Ding Jun, Li Na, Guo Yixiong, "FPGA-based design and implementation of reduced AES algorithm", IEEE, 2010.
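As a supplementary illustration of the ShiftRows and MixColumns arithmetic described in Section II, the following Python sketch models the two transformations and their inverses in software. It is not the VHDL code of the paper: the function and constant names (xtime, gf_mul, poly_mul_mod, C_POLY, D_POLY and so on) are hypothetical, the state is assumed to be stored as four rows of four bytes, and GF(2^8) multiplication is assumed to use the reduction polynomial x^8 + x^4 + x^3 + x + 1 given in FIPS-197 [1].

```python
def xtime(b):
    """Multiply a GF(2^8) element by {02}, reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def poly_mul_mod(col, coeffs):
    """Multiply two degree-3 polynomials over GF(2^8) modulo x^4 + 1.

    col[i] and coeffs[i] hold the coefficient of x^i. Because x^4 = 1
    modulo x^4 + 1, the exponent i + j simply wraps around modulo 4.
    """
    out = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            out[(i + j) % 4] ^= gf_mul(coeffs[i], col[j])
    return out


# Coefficients listed from x^0 up to x^3.
C_POLY = [0x02, 0x01, 0x01, 0x03]  # c(x) = {03}x^3 + {01}x^2 + {01}x + {02}
D_POLY = [0x0E, 0x09, 0x0D, 0x0B]  # d(x) = {0B}x^3 + {0D}x^2 + {09}x + {0E}


def mix_column(col):
    """MixColumns applied to one column of the state."""
    return poly_mul_mod(col, C_POLY)


def inv_mix_column(col):
    """InvMixColumns applied to one column of the state."""
    return poly_mul_mod(col, D_POLY)


def shift_rows(state):
    """Rotate row r of the state left by r bytes (state is a list of 4 rows)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r of the state right by r bytes."""
    return [row[-r:] + row[:-r] for r, row in enumerate(state)]


# Example column: d4 bf 5d 30 maps to 04 66 81 e5.
print([hex(v) for v in mix_column([0xD4, 0xBF, 0x5D, 0x30])])
# c(x) * d(x) = 1 modulo x^4 + 1, so InvMixColumns undoes MixColumns.
print(poly_mul_mod(C_POLY, D_POLY))  # [1, 0, 0, 0]
```

The product check at the end shows why d(x) can be used for decryption: multiplying by c(x) and then by d(x) leaves every column unchanged. In hardware the same products are usually built from XOR networks rather than loops, so this sketch only models the arithmetic, not the structure of the implemented core.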
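The utilization percentages quoted in Section IV and Table 3 can be reproduced from the raw resource counts given in the text. The short Python sketch below does this arithmetic; the dictionary and function names are hypothetical, and it assumes that the reported figures are truncated (not rounded) to whole percent, which is an assumption that happens to match all ten reported values.

```python
# Resources available on the target device, as stated in Section IV.
AVAILABLE = {
    "Slice Flip Flops": 64896,
    "4-input LUTs": 64896,
    "Bonded IOBs": 804,
    "GCLKs": 4,
    "GCLKIOBs": 4,
}

# Resources used by each core, as stated in Section IV.
USED = {
    "Basic AES": {
        "Slice Flip Flops": 906,
        "4-input LUTs": 15742,
        "Bonded IOBs": 261,
        "GCLKs": 1,
        "GCLKIOBs": 1,
    },
    "Fully Pipelined AES": {
        "Slice Flip Flops": 14057,
        "4-input LUTs": 34479,
        "Bonded IOBs": 385,
        "GCLKs": 1,
        "GCLKIOBs": 1,
    },
}


def utilization(used):
    """Return the utilization of each resource in whole percent.

    Integer division truncates toward zero; this truncation is an
    assumption about how the reported percentages were produced.
    """
    return {name: used[name] * 100 // AVAILABLE[name] for name in AVAILABLE}


for core, used in USED.items():
    print(core, utilization(used))
```

With truncation, the Basic AES core gives 1%, 24%, 32%, 25% and 25%, and the Fully Pipelined core gives 21%, 53%, 47%, 25% and 25%. Rounding to the nearest percent would instead give 22% for Slice Flip Flops and 48% for bonded IOBs in the pipelined design.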