Final Project

ECE 111
Final Project
Secure Hash Algorithm
SHA-1
Secure Hash Algorithm
• Goal is to compute a fixed-size hash value for any input
“message”, where a “message” can be anything. In practice,
different messages should give different hash values.
• SHA-1 (widely used) returns a 160-bit hash value (a.k.a.
message digest or strong checksum)
SHA-1(“The quick brown fox jumps over the lazy dog”) =
2fd4e1c6 7a2d28fc ed849ee1 bb76e739 1b93eb12
(160 bits = five 32-bit words)
SHA-1(file: avatar.avi) = some 160-bit value
SHA-1(file: chopin.mp3) = some 160-bit value
• Just a small change, e.g. from “dog” to “cog”, will completely
change the hash value
SHA-1(“The quick brown fox jumps over the lazy dog”) =
2fd4e1c6 7a2d28fc ed849ee1 bb76e739 1b93eb12
SHA-1(“The quick brown fox jumps over the lazy cog”) =
de9f2c7f d25e1b3a fad3e85a 0bd17d9b 100db4b3
Verifying File Integrity
[Figure: BigFirm™ sends goodFile to the User and publishes hash(goodFile) in the NY Times; a VIRUS may replace goodFile with badFile.]
• Software manufacturer wants to ensure that the executable file is
received by users without modification …
• Sends out the file to users and publishes its hash in NY Times
• The goal is integrity, not secrecy
• Idea: given goodFile and hash(goodFile), very hard to find badFile such
that hash(goodFile)=hash(badFile)
Authentication with Shared Secrets
[Figure: Alice and Bob share SECRET; Alice sends (msg, H(SECRET,msg)) to Bob.]
• Alice wants to ensure that nobody modifies the message in transit
(both integrity and authentication)
• Idea: given msg, it is very hard to compute H(SECRET, msg)
without SECRET, and easy with SECRET
SHA-1
• Published by NIST in the Secure Hash Standard (SHS): the
original SHA appeared in FIPS PUB 180 (1993) and the revised
SHA-1 in FIPS PUB 180-1 (1995)
• SHA-1 is specified as the hash algorithm in the Digital
Signature Standard (DSS), NIST
General Logic
• Input message must be < 2^64 bits
– not really a problem
• Message is processed in 512-bit blocks sequentially
• Message digest is 160 bits
SHA-1 Algorithm
• Step 1: Padding bits
– A b-bit message M is padded in the following manner:
• Add a single “1” to the end of M
• Then pad message with “0’s” until the length of message is congruent to
448, modulo 512 (which means pad with 0’s until message is 64-bits less
than some multiple of 512).
• Step 2: Appending length as 64 bit unsigned
– A 64-bit representation of b is appended to the result of Step 1.
• The resulting message is a multiple of 512 bits
• e.g. suppose b = 900; the padded message is 2 x 512 = 1024 bits:
M (900 bits) | 1 | 0 … 0 (59 0’s) | 900 as a 64-bit value
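Because the project gives the message length in bytes (Message_size), the padding rule fixes how many 512-bit blocks must be processed: the message, one 0x80 byte (the “1” bit followed by seven 0’s) and the 8-byte length field have to fit. A minimal sketch, assuming a byte-aligned message; the module name sha1_num_blocks and its port names are hypothetical and not part of the project interface.

```verilog
// Hypothetical helper (illustrative names, not from the project spec).
// Computes how many 512-bit (64-byte) blocks the padded message occupies.
module sha1_num_blocks (
    input  wire [31:0] size_bytes,   // message length in bytes
    output wire [31:0] num_blocks    // padded length in 64-byte blocks
);
    // message + 1 byte (0x80) + 8 bytes (length), rounded up to a multiple
    // of 64 bytes: ceil((size + 9) / 64) = (size + 9 + 63) >> 6.
    // One extra bit is kept so the addition cannot overflow.
    wire [32:0] padded = {1'b0, size_bytes} + 33'd72;

    assign num_blocks = {5'b0, padded[32:6]};
endmodule
```

For example, a 55-byte message still fits in one block (55 + 1 + 8 = 64 bytes), while a 56-byte message needs two.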
SHA-1 Algorithm
• Step 3: Buffer initiation – initialize message digest (MD) to
these five 32-bit words
H0 = 67452301
H1 = efcdab89
H2 = 98badcfe
H3 = 10325476
H4 = c3d2e1f0
SHA-1 Algorithm
• Step 4: Processing of the message (the algorithm)
– Divide message M into 512-bit blocks, M0, M1, … Mj, …
– Process each Mj sequentially, one after the other
– Input:
• Wt : a 32-bit word from the message
• Kt : a constant
• H0, H1, H2, H3, H4 : current MD
– Output:
• H0, H1, H2, H3, H4 : new MD
SHA-1 Algorithm
• Step 4: Cont’d
– At the beginning of processing each Mj, initialize
(A, B, C, D, E) = (H0, H1, H2, H3, H4)
– Then 80-step processing of 512-bit blocks – 4 rounds, 20 steps each
– Each step t (0 ≤ t ≤ 79):
• Wt
If t < 16, Wt = t-th 32-bit word of Mj
If t ≥ 16, Wt = (W(t-3) ⊕ W(t-8) ⊕ W(t-14) ⊕ W(t-16)) <<< 1
» where X <<< s denotes a circular shift of X to the left by s bits
» and ⊕ denotes bit-wise XOR
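The rule for t ≥ 16 maps directly onto combinational logic. The following is a sketch only; the module name sha1_next_w and its port names are hypothetical. Note that in Verilog the ^ operator is bit-wise XOR.

```verilog
// Hypothetical combinational block (illustrative names).
// Computes W[t] for t >= 16 from four earlier message-schedule words.
module sha1_next_w (
    input  wire [31:0] w_tm3,    // W[t-3]
    input  wire [31:0] w_tm8,    // W[t-8]
    input  wire [31:0] w_tm14,   // W[t-14]
    input  wire [31:0] w_tm16,   // W[t-16]
    output wire [31:0] w_t       // W[t]
);
    // Bit-wise XOR of the four earlier words.
    wire [31:0] x = w_tm3 ^ w_tm8 ^ w_tm14 ^ w_tm16;

    // Circular left shift by 1: the MSB wraps around to bit 0.
    assign w_t = {x[30:0], x[31]};
endmodule
```

Since the oldest word needed is W[t-16], one possible design choice is to keep only the most recent 16 words rather than all 80.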
SHA-1 Algorithm
• Step 4: Cont’d
– Each step t (0 ≤ t ≤ 79):
• Kt
0 ≤ t ≤ 19, Kt = 5a827999
20 ≤ t ≤ 39, Kt = 6ed9eba1
40 ≤ t ≤ 59, Kt = 8f1bbcdc
60 ≤ t ≤ 79, Kt = ca62c1d6
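The constant selection can be written as a small lookup on the step index. This is a sketch under the assumption that t is held in a 7-bit counter; the module name sha1_k and its ports are hypothetical.

```verilog
// Hypothetical combinational block (illustrative names).
// Selects the round constant Kt from the step index t (0..79).
module sha1_k (
    input  wire [6:0]  t,   // step index, assumed to stay within 0..79
    output reg  [31:0] k    // constant Kt for this step
);
    always @(*) begin
        if      (t <= 7'd19) k = 32'h5a827999;
        else if (t <= 7'd39) k = 32'h6ed9eba1;
        else if (t <= 7'd59) k = 32'h8f1bbcdc;
        else                 k = 32'hca62c1d6;
    end
endmodule
```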
SHA-1 Algorithm
• Step 4: Cont’d
– Each step t (0 ≤ t ≤ 79):
• Define F(X, Y, Z) as follows:
0 ≤ t ≤ 19, F(X, Y, Z) = (X ^ Y) ∨ (¬X ^ Z)
20 ≤ t ≤ 39, F(X, Y, Z) = X ⊕ Y ⊕ Z
40 ≤ t ≤ 59, F(X, Y, Z) = (X ^ Y) ∨ (X ^ Z) ∨ (Y ^ Z)
60 ≤ t ≤ 79, F(X, Y, Z) = X ⊕ Y ⊕ Z
» where ^ is bit-wise AND, ∨ is bit-wise OR, ¬ is bit-wise complement, and ⊕ is bit-wise XOR
• Then compute (called the SHA-1 step function)
T = (A <<< 5) + F(B, C, D) + Wt + Kt + E
» where + denotes an addition modulo 2^32
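The step function can be sketched as one combinational block. The F cases below follow the SHA-1 specification (choice, parity, majority, parity). Beware of notation: the slides write ^ for AND, whereas in Verilog & is AND and ^ is XOR. The module name sha1_step_t and its ports are hypothetical.

```verilog
// Hypothetical combinational block (illustrative names).
// Computes T = (A <<< 5) + F(B, C, D) + Wt + Kt + E for step t.
module sha1_step_t (
    input  wire [6:0]  t,               // step index, 0..79
    input  wire [31:0] a, b, c, d, e,   // current working variables
    input  wire [31:0] w_t,             // message-schedule word Wt
    input  wire [31:0] k_t,             // round constant Kt
    output wire [31:0] t_val            // result T
);
    reg [31:0] f;

    // F(B, C, D) depends on which group of 20 steps t falls in.
    always @(*) begin
        if      (t <= 7'd19) f = (b & c) | (~b & d);           // choice
        else if (t <= 7'd39) f = b ^ c ^ d;                    // parity
        else if (t <= 7'd59) f = (b & c) | (b & d) | (c & d);  // majority
        else                 f = b ^ c ^ d;                    // parity
    end

    // All operands and the result are 32 bits wide, so carries out of
    // bit 31 are dropped, which is exactly addition modulo 2^32.
    assign t_val = {a[26:0], a[31:27]} + f + w_t + k_t + e;
endmodule
```

A five-operand adder like this is likely to sit on the critical path, so it is a natural place to look when optimizing the clock period.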
SHA-1 Algorithm
• Step 4: Cont’d
– Each step t (0 ≤ t ≤ 79):
• The values of (A, B, C, D, E) are updated as follows:
(A, B, C, D, E) = (T, A, B <<< 30, C, D)
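The tuple assignment means all five variables change at once, using the old values on the right-hand side. In Verilog, non-blocking assignments in a clocked block give this behaviour. A minimal sketch; the module name sha1_state and the load/step control signals are hypothetical.

```verilog
// Hypothetical register block (illustrative names and control signals).
// Holds A..E and applies the simultaneous per-step update.
module sha1_state (
    input  wire        clk,
    input  wire        load,                 // start of a block: copy H0..H4
    input  wire        step,                 // advance by one step
    input  wire [31:0] h0, h1, h2, h3, h4,   // current message digest
    input  wire [31:0] t_val,                // T from the step function
    output reg  [31:0] a, b, c, d, e
);
    always @(posedge clk) begin
        if (load) begin
            a <= h0; b <= h1; c <= h2; d <= h3; e <= h4;
        end else if (step) begin
            // Non-blocking assignments read the old values, so this is
            // (A, B, C, D, E) = (T, A, B <<< 30, C, D).
            a <= t_val;
            b <= a;
            c <= {b[1:0], b[31:2]};   // B <<< 30 (same as rotate right by 2)
            d <= c;
            e <= d;
        end
    end
endmodule
```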
SHA-1 Algorithm
• Step 4: Cont’d
– Finally, when all 80 steps have been processed, set
H0 = H0 + A
H1 = H1 + B
H2 = H2 + C
H3 = H3 + D
H4 = H4 + E
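The digest registers can be sketched together with their Step 3 initial values. The module name sha1_digest, the init/update control signals, and the packing of Hash as {H0, H1, H2, H3, H4} (H0 in the most significant word) are assumptions for illustration, not confirmed by the project description.

```verilog
// Hypothetical register block (illustrative names and control signals).
// Holds H0..H4, initializes them, and accumulates A..E after each block.
module sha1_digest (
    input  wire         clk,
    input  wire         init,             // start of a new message
    input  wire         update,           // all 80 steps of a block are done
    input  wire [31:0]  a, b, c, d, e,    // working variables after step 79
    output wire [159:0] hash
);
    reg [31:0] h0, h1, h2, h3, h4;

    always @(posedge clk) begin
        if (init) begin
            h0 <= 32'h67452301;
            h1 <= 32'hefcdab89;
            h2 <= 32'h98badcfe;
            h3 <= 32'h10325476;
            h4 <= 32'hc3d2e1f0;
        end else if (update) begin
            // 32-bit additions wrap around, i.e. they are modulo 2^32.
            h0 <= h0 + a;
            h1 <= h1 + b;
            h2 <= h2 + c;
            h3 <= h3 + d;
            h4 <= h4 + e;
        end
    end

    // Assumed ordering: H0 is the most significant word of the hash.
    assign hash = {h0, h1, h2, h3, h4};
endmodule
```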
SHA-1 Algorithm
• Step 5: Output
– When all Mj have been processed, the 160-bit hash of M is available in
H0, H1, H2, H3, and H4
Module Interface
• Very similar to Project 2 on RLE Co-Processor
[Block diagram: the SHA-1 Processor connected to a DPSRAM that stores the message]
– SHA-1 Processor signals: Clk, nreset, Start_hash, Message_addr[31:0], Message_size[31:0], Done, Hash[159:0]
– DPSRAM interface: Port_A_clk, Port_A_addr[15:0], Port_A_we, Port_A_data_in[31:0], Port_A_data_out[31:0]
• Notes: Message_size is given in number of bytes, and there is no need to
use Port_A_data_in[31:0] because the memory is used for reading only
Big-Endian vs. Little-Endian
• The memory representation (as with the RLE project) uses a
little-endian representation whereas the SHA1 algorithm uses
a big-endian representation.
• For message “The boat”, little-endian would be:
M[0] = “ ehT”;
M[1] = “taob”;
big-endian would be:
W[0] = “The “;
W[1] = “boat”;
Big-Endian vs. Little-Endian
• Use this function
function [31:0] changeEndian; // transform to big-endian
    input [31:0] value;
    // reverse the order of the four bytes
    changeEndian = {value[7:0], value[15:8], value[23:16], value[31:24]};
endfunction
• Then in your Verilog code, you can do something like this:
w[i] <= changeEndian(port_A_data_out);
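To check the conversion on the “The boat” example, a small simulation-only testbench can be used. This is a sketch; the module name change_endian_tb is hypothetical, and the word 32'h20656854 is the little-endian memory word “ ehT” written out as ASCII bytes.

```verilog
// Hypothetical testbench (simulation only, illustrative names).
module change_endian_tb;
    // Same function as on the slide.
    function [31:0] changeEndian;
        input [31:0] value;
        changeEndian = {value[7:0], value[15:8], value[23:16], value[31:24]};
    endfunction

    reg [31:0] m0;

    initial begin
        // M[0] = " ehT": ' ' = 0x20, 'e' = 0x65, 'h' = 0x68, 'T' = 0x54
        m0 = 32'h20656854;
        // W[0] = "The ": prints 54686520
        $display("%h", changeEndian(m0));
        $finish;
    end
endmodule
```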
Same Two Design Objectives
• Minimum delay
– delay = clock period * number of cycles
• Minimum area*delay product
• Also “Best Design” for each design objective
Disable the use of Block Memories
• Altera Quartus II will automatically replace some registers
with block memories, which are not counted in the total
number of registers. If you look at the Analysis & Synthesis
Summary, you will see a Total block memory bits entry. If it
is non-zero, it means block memories were allocated.
• To provide a common basis for comparison, you
should disable the use of block memories. This can be done
as follows:
– In the analysis and synthesis settings, there is an option for
auto shift register replacement.
– Turning that off will disable the use of block memories.
Possible to Achieve the Following
(with Block Memories disabled)
• #ALUTs = 809, #Registers = 966, Clock Cycles = 253 cycles, and
Clock Period = 8.20 ns (121.9 MHz)
• Area = #ALUTs + #Registers = 1775
• Delay = Clock Cycles * Clock Period = 2.075 us
• (Area*Delay) = 1775 * 2.075 us = 0.00368 (with delay in seconds)
• Try to get within a factor of 2x of these metrics. Try to do
better!
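The reference metrics can be reproduced with the arithmetic below, using only the figures quoted above.

```latex
\begin{align*}
\text{Area} &= 809 + 966 = 1775 \\
\text{Delay} &= 253 \times 8.20\,\text{ns} = 2074.6\,\text{ns} \approx 2.075\,\mu\text{s} \\
\text{Area} \times \text{Delay} &= 1775 \times 2.0746 \times 10^{-6}\,\text{s} \approx 3.68 \times 10^{-3}
\end{align*}
```

Being within a factor of 2x therefore means a delay of at most about 4.15 us and an area*delay product of at most about 0.00736.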
More Information
• Here is a paper describing an FPGA implementation, which
also has an excellent description of the SHA-1 algorithm.
– http://signal.hut.fi/~kjarvine/documents/sha.pdf
– Note: This isn’t necessarily the best implementation. Just one. You
can research any designs from the literature and use them, but it is up
to you to come up with or find the best ones.
• Here is the official IETF specification.
– http://www.ietf.org/rfc/rfc3174.txt
• Here is the Wikipedia page.
– http://en.wikipedia.org/wiki/SHA-1