International Journal of Engineering Trends and Technology (IJETT) – Volume 21 Number 9 – March 2015

Area-Efficient FPGA Implementation of Cryptographic SHA3-512

Nayana M S (1), Mrs. Bindu A U (2)
(1) VLSI & Embedded Systems, Dept. of ECE, SIET, Tumkur, India.
(2) Assistant Professor, Dept. of ECE, SIET, Tumkur, India.

Abstract: A Secure Hash Algorithm (SHA) produces a condensed representation of binary data. A cryptographic hash function is a deterministic process that takes an arbitrary block of data as input and produces an output of fixed size, known as the hash value. These functions were originally introduced to provide information security, integrity and authentication. In recent years there have been serious cryptanalytic attacks on several commonly used hash functions, such as MD4, MD5, SHA0 and SHA1, together with concern about the long-term security of SHA2. This culminated in the design of SHA3, based on the Keccak algorithm. This work targets the 512-bit variant and logically optimizes it for area efficiency, throughput, operating frequency and latency by integrating the Rho, Pi and Chi steps of the algorithm into a single step. SHA3 also provides stringent security properties, including preimage resistance and collision resistance. This work presents a compact design of the newly selected Secure Hash Algorithm (SHA-3), obtained by dividing the basic Keccak architecture into a padder module and a permutation module that reflect the sponge construction. The modules are designed, simulated and verified using the Xilinx ISE Design Suite 14.5 software tool and implemented on a Xilinx Spartan 6 Field Programmable Gate Array (FPGA) device.

Keywords - cryptographic hash function; SHA1, SHA2 and SHA3; Keccak algorithm; sponge construction.

I. INTRODUCTION

With the growth of the Internet and of multimedia content such as audio, video and images, the security of the transmission medium has become a major concern. The Internet makes it easy to obtain digital content.
However, this also causes several problems, such as infringement of ownership and illegal distribution of copies. The method used to address these security issues is based on cryptography, and specifically on hash functions. Cryptography is a method of storing and transmitting data in a particular form so that only those for whom it is intended can read and process it. It is one of the most useful fields in wireless communication and personal communication systems, where information security has become an increasingly important area of interest.

For a secure cryptographic portable electronic device, the selected algorithm must be trusted, time-tested and widely peer-reviewed in the global cryptographic community. Cryptographic algorithms address specific information security requirements such as data authentication (as distinct from encryption), data confidentiality and data integrity. Authentication services assure the recipient that a message comes from the source it claims to come from. Data integrity assures that information and programs are changed only in a specified and authorized manner. Data confidentiality assures that private or confidential information is not made available to unauthorized individuals.

A cryptographic hash function should be highly sensitive to the smallest change in the input message: a change in a single bit of the input should produce a large change in the output hash value. The message can be a plaintext file, software, or an executable program. The output of SHA is also called a "message digest" or "fingerprint" because it is a condensed representation of electronic data that is easy to generate for a given file. Hash algorithms are typically composed of a compression function that operates on fixed-length pieces of the input, and the process is repeated until all the input blocks are consumed.
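The sensitivity to small input changes described above can be observed with a short Python sketch. It assumes Python's standard hashlib module, whose sha3_512 function computes SHA3-512 in software; the helper names differing_bits and avalanche are illustrative only and are not part of this work.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche(msg_a: bytes, msg_b: bytes) -> int:
    """Return how many of the 512 digest bits differ between two messages."""
    digest_a = hashlib.sha3_512(msg_a).digest()
    digest_b = hashlib.sha3_512(msg_b).digest()
    return differing_bits(digest_a, digest_b)


if __name__ == "__main__":
    # The two messages differ in a single bit: '0' is 0x30 and '1' is 0x31.
    changed = avalanche(b"message 0", b"message 1")
    print(f"{changed} of 512 digest bits changed")
```

For a well-behaved hash function, roughly half of the 512 output bits are expected to change when one input bit is flipped.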
SHA-3 possesses the following important security properties:

Collision resistance: it is computationally infeasible to find two messages with the same message digest.

Pre-image resistance: it is computationally infeasible to recover a message from a given message digest.

This work presents an efficient design and implementation of the Keccak SHA-3 standard by dividing the basic architecture (Fig. 2) into smaller modules that directly reflect the sponge construction (Fig. 1), from which the algorithm can be easily generated.

FPGAs are an ideal platform for the implementation of cryptographic algorithms, because modern FPGAs are equipped with enhanced embedded resources such as BRAMs, dedicated memory controller blocks (MCBs), PLLs, global clock lines and Digital Signal Processing (DSP) blocks, in addition to LUTs and CLBs, all of which can be used to optimize an implementation.

The rest of the paper is organized as follows: Section II briefly presents the hash technology, Section III introduces the proposed architecture, Section IV presents the FPGA synthesis results and comparisons with previous works, and the conclusions are discussed in the last section.

II. HASH TECHNOLOGY

The Keccak hash function is a family of sponge functions. The sponge construction (shown in Figure 1) is a simple iterated construction for building a function F with variable-length input and arbitrary output length, based on a fixed-length transformation or permutation f operating on a fixed number b of bits. Here b is called the width.

Figure 1: Sponge construction

The sponge construction builds a function SPONGE[f, pad, r] using a fixed-length transformation or permutation f, a sponge-compliant padding rule "pad" and a bit-rate parameter r. A finite-length output can be obtained by truncating the output to its first ℓ bits. Such an instance of the sponge construction is called a sponge function.

The sponge construction operates on a state of b = r + c bits. The sum r + c determines the width of the permutation used in the sponge construction and is restricted to values in {25, 50, 100, 200, 400, 800, 1600}. The sponge construction processes the message in two phases:

Absorption: The sponge state initially consists of all zeros. The first input block of length r is XORed with the first r bits of the state, and the transformation f is applied to the state. The next input block is then XORed into the resulting state in the same way and the state is transformed again. This continues until all the input is consumed.

Squeezing: The outer part of the state (its first r bits) is iteratively returned as output blocks, interleaved with applications of the function f. The number of iterations is determined by the requested number of bits ℓ. Finally, the output is truncated to its first ℓ bits.

The c-bit inner state is never directly affected by the input blocks and is never output during the squeezing phase. The capacity c determines the attainable security level of the construction.

SHA3 supports four fixed-output-length variants, i.e. n ∈ {224, 256, 384, 512}. The four output lengths and the corresponding required capacity and rate, with the associated security levels, are listed in Table I.

Table I: Output lengths supported by SHA3.

Output length   Collision resistance   Pre-image resistance   Required capacity (c)   Required rate (r)   SHA-3 instance
n = 224         s <= 112               s <= 224               448                     1152                SHA3-224
n = 256         s <= 128               s <= 256               512                     1088                SHA3-256
n = 384         s <= 192               s <= 384               768                     832                 SHA3-384
n = 512         s <= 256               s <= 512               1024                    576                 SHA3-512

s: security strength level.

The sequential Keccak SHA3-512 architecture is shown in Figure 2. The architecture has a 128-bit data input, chosen to save extra input bits.
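As an illustration of the absorbing and squeezing phases and of the SHA3-512 parameters in Table I (r = 576, c = 1024), the following Python sketch builds the sponge around the Keccak-f[1600] permutation and compares the result with Python's hashlib. It is a software model written for readability, not the hardware architecture of this work. The function names are illustrative, and the padding used (suffix byte 0x06 and a final 0x80 bit) is the FIPS 202 rule for SHA-3, which is assumed here.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(state: bytearray) -> bytearray:
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    # Arrange the 1600 bits as a 5x5 array of 64-bit lanes, lanes[x][y].
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1  # generates the round-constant bits used by iota
    for _ in range(24):
        # theta: mix each lane with the parities of two neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def sponge(rate_bits: int, capacity_bits: int, message: bytes,
           suffix: int, out_len: int) -> bytes:
    """Absorb `message`, then squeeze `out_len` bytes (b = r + c = 1600)."""
    assert rate_bits + capacity_bits == 1600 and rate_bits % 8 == 0
    assert suffix < 0x80  # this sketch only handles short domain suffixes
    rate = rate_bits // 8
    state = bytearray(200)  # the state starts as all zeros
    # Absorbing: XOR each full r-bit block into the state, then permute.
    offset = 0
    while len(message) - offset >= rate:
        for i in range(rate):
            state[i] ^= message[offset + i]
        state = keccak_f1600(state)
        offset += rate
    # Last (possibly empty) partial block, followed by the padding.
    tail = message[offset:]
    for i, byte in enumerate(tail):
        state[i] ^= byte
    state[len(tail)] ^= suffix  # domain bits and first padding bit
    state[rate - 1] ^= 0x80     # final padding bit
    state = keccak_f1600(state)
    # Squeezing: output r-bit blocks, permuting between them if more is needed.
    output = bytearray()
    while len(output) < out_len:
        output += state[:rate]
        if len(output) < out_len:
            state = keccak_f1600(state)
    return bytes(output[:out_len])


def sha3_512_sketch(message: bytes) -> bytes:
    """SHA3-512 parameters from Table I: r = 576, c = 1024, 512-bit output."""
    return sponge(576, 1024, message, 0x06, 64)


if __name__ == "__main__":
    data = b"abc"
    print(sha3_512_sketch(data) == hashlib.sha3_512(data).digest())
```

Only the first r = 576 bits of the state are touched by the message and returned as output; the remaining c = 1024 bits are changed solely by the permutation, which is the role of the capacity described above.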
The next block is the padder block, which pads the input data with the required number of zeros to form the 1600-bit state; an inversion is then applied to each byte. The output of the padder block is forwarded to a 2 x 1 multiplexer (MUX), which drives the data into the Compression-Box (C-Box) of the architecture. Under the control signal Ctrl 1, the MUX selects the input data for the first round and the feedback data for the other twenty-three rounds.

Figure 2: The basic block diagram of Keccak SHA3-512.

When Ctrl 1 is low, the MUX selects the input data; when it is high, the MUX selects the feedback data. The padded message is copied directly to Reg_A, the 1600 bits are arranged as a 5x5 matrix of 64-bit words, and the resulting bits are forwarded to the C-Box. The C-Box is the implementation of the compression function of the SHA-3 algorithm, which comprises the theta (θ), rho (ρ), pi (π), chi (χ) and iota (ι) steps. The key feature of this design is that the rho (ρ), pi (π) and chi (χ) steps of the C-Box are implemented as a single step. This saves hardware resources and logically optimizes the design. After 24 iterations, the final output is forwarded to Reg_B for storage in order to synchronize the data path. The last component of the architecture is the truncating component, where a per-byte inversion is performed on the output bits, which are then truncated to the desired hash output length.

III. PROPOSED METHODOLOGY

The design techniques were proposed in the basic SHA-3 architecture (shown in Fig. 2) in order to achieve better time performance. To achieve the main objective of this work, i.e. low area, the basic architecture is designed using a divide-and-conquer approach. With this approach, the required algorithm can be obtained easily by generating the sponge function.

The basic architecture is divided into two modules, 1) the padder module and 2) the permutation module (shown in Figures 3 and 4 respectively), which directly reflect the sponge construction. The total area and the operating frequency of the design are compared with other SHA-3 implementations in Table II.

Figure 3: Block diagram of padder module

Padder module: The padder module consists of Reg_A, a shifter, a 2:1 mux and a 576-bit buffer, as shown in Fig. 3. The first 32 bits of the message are temporarily stored in Reg_A and forwarded to the shifter. When the control signal In_ready is high, 32 input bits are ready; when In_ready is low, all blocks of the message have been consumed. The shifter left-shifts the data by 32 bits and forwards it to the buffer. The buffer is 576 bits wide, and a new 32-bit word is consumed only when the buffer is not full. In the second round, the data in the buffer is left-shifted by 32 bits and concatenated with the new 32 input bits. The process continues until the 576-bit buffer is full, at which point the padder output is forwarded to the permutation module. The next data block is padded once the padder module receives the acknowledgement signal Ackn from the permutation module.

Figure 4: Block diagram of permutation module

Permutation module: The permutation module performs two main functions, 1) f-permutation and 2) truncation, as shown in Fig. 4. The padded 576 bits of data are extended with the remaining 1024 zeroes so that r + c = 1600. The 1600 bits are arranged as a 5x5 state array of 64-bit words. If the control signal First_round is high, the padded data is applied to the transformation block, and First_round is then immediately disabled so that no more padded data is accepted. As soon as the padded data is consumed by the permutation module, the acknowledgement signal Ackn is sent to the padder module so that it can pad the next block of data.

Transformation is the main stage of the permutation module. Each round is subdivided into five steps, i.e. theta (θ), rho (ρ) and pi (π), chi (χ) and iota (ι) [4]. The transformed data is stored temporarily in the register and goes through 24 rounds of transformation. The choice of 24 iterations reflects the trade-off between performance and safety margin made in the design, and the resulting design yields a collision-resistant hash function.

Round constants are 64-bit constant values that are substituted during the transformations, selected according to the iteration count. A counter tracks the iteration rounds, and the round constant value changes according to the count value.

The truncating block performs the squeezing operation by discarding the remaining LSBs; the 512 MSBs obtained are the final hash value of SHA-3. The truncating block becomes active only when the control signal Message_full is high, which happens if and only if all the input data blocks have been consumed. The permutation operation compresses the data such that any manipulation of the confidential files to be transmitted leads to a change in the hash value. The c bits, i.e. the 1024 zeroes, are never directly affected by the input blocks and are never output during the squeezing phase. The capacity c determines the attainable security level of the construction.

IV. IMPLEMENTATION RESULTS AND COMPARISON

The design has been implemented and verified with the Xilinx ISE Design Suite, System Edition 14.6. The target device for the implementation was a Xilinx Virtex 6. Each step of the SHA-3 design was implemented and tested as an individual module. These modules were then instantiated in the main code of the design to examine the results in detail.

Table II shows the implementation results of the above SHA-3 hash core in terms of area, frequency and throughput (TP). The maximum operating frequency achieved is 368.72 MHz with a throughput of 8.5 Gbps; the design occupies 220 CLB slices and requires 24 clock cycles to reach the final hash value.

Table II also compares the proposed design with previously reported FPGA-based hardware designs of SHA-3 in the open literature in terms of area, frequency and TP. The focus of this work is to use minimum area resources with sufficient TP. The design reported by S. Kerckhof et al. [11] uses the fewest area resources but needs 2154 clock cycles for the final hash value, which results in lower TP than the other designs.

Table II: Comparison results of SHA3-512

Implementation     Technology   Slices   Frequency (MHz)   TP (Gbps)
Proposed design    Virtex6      220      368.72            8.5
[3]                V5           240      301.02            7.224
[11]               V6           188      285               0.08
[13]               V4           2024     143               6.07
[14]               V5           1229     238.4             1.0805
[10]               V5           2573     285               5.70
[12]               V5           1197     263.16            6.32
[15]               V5           1220     -                 6.56
[2]                V5           444      265               0.07
[16]               0.13 µm      -        250               10.67
[17]               0.13 µm      -        0.1               4.4 Mbps

The TP reported by A. Akin et al. [13] and K. Gaj et al. [14] is better than that of the previous designs, but these designs require many more hardware resources. The designs reported by K. Latif et al. [12] and E. Homsirikamol et al. [15] show better TP, 6.32 and 6.56 Gbps respectively, which is still lower than that of our compact design, and they use a large number of slices. The comparison shows that our design, with a TP of 8.5 Gbps, outperforms the previously reported FPGA implementations in terms of TP.

V. CONCLUSION

This work presents a design for a compact hardware implementation of SHA3-512. The trade-off between area and throughput is well balanced, and the proposed design compares favourably with previously reported FPGA results in terms of both area and throughput. The contributing factors are the logical optimization obtained by using the divide-and-conquer technique to build the architecture, the merging of the three transforms rho, pi and chi into a single transform, and the exploitation of the parallelism available in the algorithm. This optimization reduces overall latency, which significantly enhances system performance.

REFERENCES

[1] William Stallings, "Cryptography and Network Security: Principles and Practice", 5th edition.

[2] Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche, "Keccak sponge function family main document", version 1.2, April 2009.

[3] FIPS-202, "Federal Information Processing Standards Publication FIPS-202, Secure Hash Algorithm-3 (SHA-3)", 2014.

[4] Alia Arshad, Dur-e-Shahwar Kundi and Arshad Aziz, "Compact Implementation of SHA3-512 on FPGA", Department of Electrical Engineering, National University of Sciences and Technology, Islamabad, Pakistan.

[5] G. Bertoni, J. Daemen, M. Peeters and G. Van Assche, "The KECCAK reference", version 3.0, January 2011.

[6] G. Bertoni, J. Daemen, M. Peeters and G. Van Assche, "Keccak Specifications", Submission to NIST (Round 3), January 2011.

[7] Ram Krishna Dahal, Jagdish Bhatta and Tanka Nath Dhamala, "Performance analysis of SHA-2 and SHA-3 finalists", Central Department of Computer Science & IT, Tribhuvan University, Kathmandu, Nepal.

[8] Peter Pessl and Michael Hutter, "Pushing the Limits of SHA-3 Hardware Implementations to Fit on RFID", Institute for Applied Information Processing and Communications (IAIK), Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria.

[9] Deepthi Barbara Nickolas and A. Sivasankar, "Design of FPGA Based Encryption Algorithm using KECCAK Hashing Functions", Department of ECE, Anna University: Regional Center, Madurai, Tamilnadu, India.

[10] George Provelengios (National and Kapodistrian University of Athens, Athens, Greece), Paris Kitsos (Computer Science, Hellenic Open University, Patras, Greece), Christos Koulamas (Industrial Systems Institute, Patras, Greece) and Nicolas Sklavos (KNOSSOSnet Research Group, Technological Educational Institute of Patras, Patras, Greece), "FPGA-Based Design Approaches of Keccak Hash Function".

[11] Stéphanie Kerckhof, François Durvaux, Nicolas Veyrat-Charvillon, Francesco Regazzoni, Guerric Meurice de Dormale and François-Xavier Standaert, "Compact FPGA implementations of the five SHA-3 finalists", 10th IFIP Smart Card Research and Advanced Applications 2011 (CARDIS 2011), Leuven, Belgium, pp. 217-233, September 14-16, 2011.

[12] Kashif Latif, M Muzaffar Rao, Arshad Aziz and Athar Mahboob, "Efficient hardware implementations and hardware performance evaluation of SHA-3 finalists", NIST Third SHA-3 Candidate Conference, Washington D.C., March 22-23, 2012.

[13] Abdulkadir Akin, Aydin Aysu, Onur Can Ulusel and Erkay Savas, "Efficient hardware implementations of high throughput SHA-3 candidates Keccak, Luffa and Blue Midnight Wish for single- and multi-message hashing", NIST 2nd SHA-3 Candidate Conference, Santa Barbara, August 23-24, 2010.

[14] K. Gaj, E. Homsirikamol and M. Rogawski, "Comprehensive comparison of hardware performance of fourteen round 2 SHA-3 candidates with 512-bit outputs using field programmable gate arrays", 2nd SHA-3 Candidate Conference, pp. 23-24, August 2010.

[15] E. Homsirikamol, M. Rogawski and K. Gaj, "Comparing hardware performance of round 3 SHA-3 candidates using multiple hardware architectures in Xilinx and Altera FPGAs", ECRYPT II Hash Workshop, pp. 1-15, 19-20 May 2011.

[16] Xu Guo, Meeta Srivastav, Sinan Huang, Dinesh Ganta, Michael B. Henry, Leyla Nazhandali and Patrick Schaumont, "Silicon implementation of SHA3 finalists: BLAKE, Grostl, JH, Keccak and Skein", ECRYPT II Hash Workshop 2011, Tallinn, Estonia, 19-20 May 2011.

[17] Elif Bilge Kavun and Tolga Yalcin, "A lightweight implementation of Keccak hash function for radio-frequency identification applications", Radio Frequency Identification: Security and Privacy Issues, Lecture Notes in Computer Science, 2010.

ISSN: 2231-5381 http://www.ijettjournal.org