ECE 111 Final Project
Secure Hash Algorithm SHA-1

Secure Hash Algorithm
• Goal is to compute a (practically) unique hash value for any input “message”, where a “message” can be anything.
• SHA-1 (widely used) returns a 160-bit hash value (a.k.a. message digest or strong checksum).
  – “The quick brown fox jumps over the lazy dog” → SHA-1 → 2fd4e1c6 7a2d28fc ed849ee1 bb76e739 1b93eb12 (160 bits = five 32-bit words)
  – file: avatar.avi → SHA-1 → some 160-bit value
  – file: chopin.mp3 → SHA-1 → some 160-bit value
• Just a small change, e.g. from “dog” to “cog”, will completely change the hash value:
  – “The quick brown fox jumps over the lazy dog” → SHA-1 → 2fd4e1c6 7a2d28fc ed849ee1 bb76e739 1b93eb12
  – “The quick brown fox jumps over the lazy cog” → SHA-1 → de9f2c7f d25e1b3a fad3e85a 0bd17d9b 100db4b3

Verifying File Integrity
[Figure: BigFirm™ sends goodFile to the User and publishes hash(goodFile) in the NY Times; a VIRUS may substitute badFile for goodFile.]
• A software manufacturer wants to ensure that the executable file is received by users without modification.
• It sends out the file to users and publishes its hash in the NY Times.
• The goal is integrity, not secrecy.
• Idea: given goodFile and hash(goodFile), it is very hard to find a badFile such that hash(goodFile) = hash(badFile).

Authentication with Shared Secrets
[Figure: Alice and Bob both hold SECRET; Alice sends msg, H(SECRET, msg) to Bob.]
• Alice wants to ensure that nobody modifies the message in transit (both integrity and authentication).
• Idea: given msg, it is very hard to compute H(SECRET, msg) without SECRET, and easy with SECRET.

SHA-1
• Published by NIST in the Secure Hash Standard (SHS): FIPS PUB 180 (1993) defined the original SHA, and the revision FIPS PUB 180-1 (1995) defines SHA-1.
• SHA-1 is specified as the hash algorithm in NIST's Digital Signature Standard (DSS).

General Logic
• Input message must be < 2⁶⁴ bits (not really a problem in practice)
• Message is processed in 512-bit blocks sequentially
• Message digest is 160 bits

SHA-1 Algorithm
• Step 1: Padding bits
  – A b-bit message M is padded in the following manner:
    • Add a single “1” bit to the end of M
    • Then pad the message with “0” bits until its length is congruent to 448 modulo 512 (which means
pad with 0s until the message is 64 bits short of a multiple of 512).
• Step 2: Appending length as a 64-bit unsigned integer
  – A 64-bit representation of b is appended to the result of Step 1.
  – The resulting message length is a multiple of 512 bits.
  – e.g. suppose b = 900; the padded message is 2 x 512 = 1024 bits:
      [ M: 900 bits ] [ 1 ] [ 0 0 … 0: 59 0s ] [ 900: 64 bits ]
• Step 3: Buffer initiation
  – Initialize the message digest (MD) to these five 32-bit words (hexadecimal):
      H0 = 67452301
      H1 = efcdab89
      H2 = 98badcfe
      H3 = 10325476
      H4 = c3d2e1f0
• Step 4: Processing of the message (the algorithm)
  – Divide message M into 512-bit blocks, M_0, M_1, …, M_j, …
  – Process each M_j sequentially, one after the other
  – Input:
    • W_t : a 32-bit word derived from the message block
    • K_t : a constant
    • H0, H1, H2, H3, H4 : current MD
  – Output:
    • H0, H1, H2, H3, H4 : new MD
  – At the beginning of processing each M_j, initialize (A, B, C, D, E) = (H0, H1, H2, H3, H4)
  – Then process the 512-bit block in 80 steps: 4 rounds, 20 steps each
  – Each step t (0 ≤ t ≤ 79):
    • W_t
        If t < 16, W_t = t-th 32-bit word of M_j
        If t ≥ 16, W_t = (W_{t-3} ⊕ W_{t-8} ⊕ W_{t-14} ⊕ W_{t-16}) <<< 1
      » where X <<< s denotes a circular shift of X to the left by s bits
      » and ⊕ denotes bit-wise XOR
    • K_t
        0 ≤ t ≤ 19, K_t = 5a827999
        20 ≤ t ≤ 39, K_t = 6ed9eba1
        40 ≤ t ≤ 59, K_t = 8f1bbcdc
        60 ≤ t ≤ 79, K_t = ca62c1d6
    • Define F(X, Y, Z) as follows:
        0 ≤ t ≤ 19, F(X, Y, Z) = (X ^ Y) ∨ (¬X ^ Z)
        20 ≤ t ≤ 39, F(X, Y, Z) = X ⊕ Y ⊕ Z
        40 ≤ t ≤ 59, F(X, Y, Z) = (X ^ Y) ∨ (X ^ Z) ∨ (Y ^ Z)
        60 ≤ t ≤ 79, F(X, Y, Z) = X ⊕ Y ⊕ Z
      » where ^ is bit-wise AND, ∨ is bit-wise OR, ⊕ is bit-wise XOR, and ¬ is bit-wise complement
    • Then compute (called the SHA-1 step function)
        T = (A <<< 5) + F(B, C, D) + W_t + K_t + E
      » where + denotes addition modulo 2³²
    • The values of (A, B, C, D, E) are updated as follows:
        (A, B, C, D, E) = (T, A, B <<< 30, C, D)
  – Finally, when all 80 steps have been processed for block M_j, set (additions are modulo 2³²)
      H0 = H0 + A
      H1 = H1 + B
      H2 = H2 + C
      H3 = H3 + D
      H4 = H4 + E
• Step 5: Output
  – When all M_j have been processed, the 160-bit hash of M is available in H0, H1, H2, H3, and H4

Module Interface
• Very similar to Project 2 on RLE Co-Processor
[Figure: block diagram of the SHA-1 Processor connected to a DPSRAM that stores the message.]
• SHA-1 Processor signals: Clk, nreset, Start_hash, Message_addr[31:0], Message_size[31:0], Done, Hash[159:0]
• DPSRAM interface: Port_A_clk, Port_A_addr[15:0], Port_A_we, Port_A_data_in[31:0], Port_A_data_out[31:0]
• Notes: Message_size is given in bytes. There is no need to use Port_A_data_in[31:0], because the memory is only read.

Big-Endian vs. Little-Endian
• The memory representation (as with the RLE project) is little-endian, whereas the SHA-1 algorithm uses a big-endian representation.
• For the message “The boat”:
  – little-endian would be: M[0] = “ ehT”; M[1] = “taob”;
  – big-endian would be: W[0] = “The ”; W[1] = “boat”;
• Use this function:
    function [31:0] changeEndian; // transform to big-endian
      input [31:0] value;
      changeEndian = {value[7:0], value[15:8], value[23:16], value[31:24]};
    endfunction
• Then in your Verilog code, you can do something like this:
    w[i] <= changeEndian(port_A_data_out);

Same Two Design Objectives
• Minimum delay
  – delay = clock period * number of cycles
• Minimum area*delay product
• Also “Best Design” for each design objective

Disable the use of Block Memories
• Altera Quartus II will automatically replace some registers with block memories, which are not counted in the total number of registers. If you look at the Analysis & Synthesis Summary, you will see a Total block memory bits entry. If it is non-zero, block memories were allocated.
• To provide a common basis for comparison, you should disable the use of block memories. This can be done as follows:
  – In the analysis and synthesis settings, there is an option for auto shift register replacement.
  – Turning that off will disable the use of block memories.

Possible to Achieve the Following (with Block Memories disabled)
• #ALUTs = 809, #Registers = 966, Clock Cycles = 253 cycles, and Clock Period = 8.20 ns (121.9 MHz)
• Area = #ALUTs + #Registers = 1775
• Delay = Clock Cycles * Clock Period = 2.075 us
• (Area*Delay) = 0.00368 (with Delay expressed in seconds)
• Try to get within a factor of 2x of these metrics. Try to do better!

More Information
• Here is a paper describing an FPGA implementation, which also has an excellent description of the SHA-1 algorithm.
  – http://signal.hut.fi/~kjarvine/documents/sha.pdf
  – Note: This isn’t necessarily the best implementation, just one. You can research any designs from the literature and use them, but it is up to you to come up with or find the best ones.
• Here is the IETF specification (RFC 3174).
  – http://www.ietf.org/rfc/rfc3174.txt
• Here is the Wikipedia page.
  – http://en.wikipedia.org/wiki/SHA-1
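
Software Reference Model (sketch)
A software model can help when checking the Hash[159:0] output of the hardware in simulation. The Python sketch below follows Steps 1 to 5 as described in the SHA-1 Algorithm slides. It is not part of the project specification: the names rotl, pad, sha1_reference and change_endian are hypothetical, and the model assumes byte-aligned messages (consistent with Message_size being given in bytes).

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF  # all additions are modulo 2**32


def rotl(x, s):
    """Circular left shift of the 32-bit word x by s bits (the <<< operator)."""
    return ((x << s) | (x >> (32 - s))) & MASK32


def pad(message):
    """Steps 1 and 2: append a 1 bit, then 0 bits, then the 64-bit length b."""
    b = 8 * len(message)                           # message length in bits
    padded = message + b"\x80"                     # the single 1 bit plus seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # reach 448 bits mod 512
    return padded + struct.pack(">Q", b)           # big-endian 64-bit length


def sha1_reference(message):
    """Return the SHA-1 digest of a bytes object as 40 hex characters."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]  # Step 3
    padded = pad(message)
    for offset in range(0, len(padded), 64):       # Step 4: one 512-bit block M_j
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for t in range(16, 80):                    # message schedule W_t
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + w[t] + k + e) & MASK32  # step function T
            a, b, c, d, e = temp, a, rotl(b, 30), c, d
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]
    return "".join("%08x" % word for word in h)    # Step 5


def change_endian(value):
    """Byte-swap a 32-bit word, mirroring the Verilog changeEndian function."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


if __name__ == "__main__":
    msg = b"The quick brown fox jumps over the lazy dog"
    print(sha1_reference(msg))                     # digest quoted on the first slide
    print(sha1_reference(msg) == hashlib.sha1(msg).hexdigest())
```

Comparing the model against Python's hashlib on a few messages of different lengths, including ones longer than 55 bytes so that padding spills into a second block, is a cheap way to gain confidence in it before using it to check a testbench.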