
STATISTICAL AND
PERFORMANCE ANALYSIS OF
SHA-3 HASH CANDIDATES
Ashok V Karunakaran
Department of Computer Science
Rochester Institute of Technology
Committee Chair: Prof. Stanislaw Radziszowski.
Reader: Prof. Peter Bajorski.
Observer: Prof. Christopher Homan.
Project Abstract

Randomness - A good hash function should behave as close to a
random function as possible. Statistical tests help in determining the
randomness of a hash function and NIST has provided a series of tests in a
statistical test suite for this purpose. This tool has been used to analyze the
randomness of the final five hash functions.

Performance - It is the second most important factor in
determining a good hash function. The performance of all fourteen
Round 2 candidates was measured using Java as the programming language
on Sun platform machines for small messages.

Security - Security is the most important criterion when it comes to
hash functions. Grøstl is one of the final five candidates and its
architecture, design and security features have been studied in detail. Some
of the successful attacks on reduced versions have also been explained.
Also, the lesser known candidates, Fugue and ECHO, from Round 2 have
been studied.
Hash function
Input: String of arbitrary size.
 Output: Predetermined fixed size string.

Hash function requirements
Pre-image, second pre-image and collision
resistant.
 Collisions – When we find x and y such
that h(x) = h(y).
 Birthday paradox – Gives the generic bound on
collision attacks

◦ Among q random samples from m possible values, a collision occurs
with probability ε when q ≈ 1.17√m for ε = ½ (m = 365, q = 23).
◦ Birthday bound for an n-bit digest is 2^(n/2).
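The two figures above can be checked numerically. The following is a minimal sketch, assuming the usual approximation q ≈ √(2m ln(1/(1−ε))); the function names are illustrative and do not come from the slides.

```python
import math

def birthday_q(m: int, eps: float = 0.5) -> float:
    """Approximate number of samples from m values for collision probability eps."""
    return math.sqrt(2 * m * math.log(1 / (1 - eps)))

def collision_probability(m: int, q: int) -> float:
    """Exact probability that q uniform samples from m values contain a repeat."""
    p_distinct = 1.0
    for i in range(q):
        p_distinct *= (m - i) / m
    return 1 - p_distinct

print(round(birthday_q(365), 2))             # close to 1.17 * sqrt(365)
print(collision_probability(365, 23) > 0.5)  # 23 samples are enough for eps = 1/2
```

For an n-bit digest the same approximation gives roughly 1.17 · 2^(n/2) hash evaluations, which is where the 2^(n/2) birthday bound comes from.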
The need for a new hash function

Most commonly used hash functions are
broken
◦ Collisions in MD5 and SHA-0.
◦ Security flaws in SHA-1.

Increasing hardware power and
parallelization capabilities.
SHA-3 Competition
Organized by NIST.
 Started on Nov. 2, 2007.
 Received 64 entries.
 51 met minimum requirements.
 Round 1

◦ First candidate conference at KU Leuven,
Belgium on Feb. 25-28, 2009.
◦ 14 candidates selected for Round 2 on July 24, 2009.
Round 2 and 3

Round 2
◦ Second candidate conference at Santa
Barbara, CA on August 23-24, 2010.
◦ 5 finalists selected on Dec. 9, 2010.

Round 3/ Final Round
◦ Final conference in Spring 2012.
◦ Select a winner later in 2012.
Round 2 and 3 Candidates

BLAKE
BMW
CubeHash
ECHO
Fugue
Grøstl
Hamsi
JH
Keccak
Luffa
Shabal
SHAvite-3
SIMD
Skein
Randomness and Statistics
Hash function should behave
indistinguishably from a random function.
 Avoid finding patterns, which lead to
collisions.
 Statistical randomness tests to determine
hash function randomness.
 Pseudo-randomness is sufficient.

Statistical Tests
Motivation: Decide whether a particular
statement or claim is correct.
 Null hypothesis: The output of a hash
function is random, irrespective of the
input.
 Alternative hypothesis: The output is not
random.
 Test statistic: Computed from sample data.
Helps in deciding whether to
reject/accept the null hypothesis.

NIST Test Suite
Statistical test suite for random and
pseudo-random number generators for
cryptographic applications.
 Helpful in detecting deviations of a binary
sequence from randomness.
 Total of 15 tests.
 Ex., Frequency Test, Longest runs of ones
in a block.

P-value and Significance level
P-value is calculated from the test
statistic.
 The probability that a perfect random
number generator would have produced a
sequence less random than the sequence
that was tested.
 P-value = 1 implies perfect randomness.
 P-value = 0 implies complete non-randomness.

P-value and Significance level (cont.)

Significance level (α) denotes the
probability of Type 1 error.
◦ False positive, occurs when a statistical test
rejects a true null hypothesis.

If P-value ≥ α then the null hypothesis is
accepted.
◦ Meaning, the sequence appears to be random.

If P-value < α then the null hypothesis is
rejected.
P-value and Significance level (cont.)

For the project,
◦ α = 0.01
◦ One would expect 1 sequence in 100
sequences to be rejected.
◦ P-value ≥ 0.01 indicates that the sequence
would be considered random with a
confidence of 99%.
◦ P-value < 0.01 indicates that the sequence is
considered non-random with a confidence of
99%.
Frequency Test
Tests the proportion of zeros and ones in
the sequence.
 For a random sequence, the proportion
should be the same.
 Test Description:

◦ Convert each bit to -1 (for 0) or +1 (for 1) and then add.
S_n = X_1 + X_2 + … + X_n.
For example, if ε = 1011010101,
then n = 10 and S_n = 2.
Frequency Test (cont.)
◦ Compute the test statistic,
s_obs = |S_n| ⁄ √n.
s_obs = 2 ⁄ √10 = 0.632455
◦ Compute P-value = erfc(s_obs ⁄ √2).
P-value = erfc(0.632455 ⁄ √2) = 0.527089.
• Decision: P-value ≥ 0.01, so accept the
sequence as random.
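The worked example can be reproduced in a few lines. This is a minimal sketch of the frequency test as described on the slide, not the NIST reference implementation; the function name is illustrative.

```python
import math

def frequency_test(bits: str) -> tuple[int, float, float]:
    """Return (S_n, s_obs, P-value) for a string of '0'/'1' characters."""
    n = len(bits)
    s_n = sum(1 if b == "1" else -1 for b in bits)   # map 0 -> -1, 1 -> +1 and add
    s_obs = abs(s_n) / math.sqrt(n)
    p_value = math.erfc(s_obs / math.sqrt(2))
    return s_n, s_obs, p_value

s_n, s_obs, p = frequency_test("1011010101")
print(s_n, round(s_obs, 6), round(p, 6))
print("random" if p >= 0.01 else "non-random")
```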
Longest Run of Ones in a Block
Tests the longest run of ones within M-bit
blocks.
 It should be similar to what is expected of
a random sequence.
 Test Description:

◦ Input:
110011000001010101101100010011001110000000000010010011010101000100010011110101101000000
01101011111001100111001101101100010110010.
◦ Input length n: 128 bits.
◦ Divide the input into M-bit blocks.
M = 8.
Longest Run of Ones in a Block
(cont.)
◦ The longest run of ones in each subblock is noted
(the first four of the 16 subblocks are shown).
Subblock    Max-Run    Subblock    Max-Run
11001100    2          00010101    1
01101100    2          01001100    2
◦ Calculate the frequencies of the longest-run classes
(≤ 1, 2, 3 and ≥ 4):
ν0 = 4; ν1 = 9; ν2 = 3; ν3 = 0.
◦ Compute χ²(obs), a measure of how well
the observed longest run lengths match the
expected longest run lengths within M-bit blocks.
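The class frequencies for the 128-bit example can be recomputed directly. The sketch below assumes the four longest-run classes (≤ 1, 2, 3, ≥ 4) and the class probabilities that the NIST suite documents for M = 8. The closed form used for the P-value is an assumption that holds only for 3 degrees of freedom.

```python
import math

BITS = (
    "11001100000101010110110001001100"
    "11100000000000100100110101010001"
    "00010011110101101000000011010111"
    "11001100111001101101100010110010"
)
M = 8                                    # block length in bits
PI = [0.2148, 0.3672, 0.2305, 0.1875]    # expected class probabilities for M = 8

def longest_run(block: str) -> int:
    """Length of the longest run of '1' characters in the block."""
    return max(len(run) for run in block.split("0"))

def class_counts(bits: str) -> list[int]:
    """Count blocks whose longest run is <= 1, 2, 3, or >= 4."""
    counts = [0, 0, 0, 0]
    for i in range(0, len(bits) - M + 1, M):
        run = longest_run(bits[i:i + M])
        counts[min(max(run, 1), 4) - 1] += 1
    return counts

nu = class_counts(BITS)
n_blocks = len(BITS) // M
chi2 = sum((v - n_blocks * p) ** 2 / (n_blocks * p) for v, p in zip(nu, PI))
# Upper incomplete gamma Q(3/2, x) in closed form (3 degrees of freedom only).
x = chi2 / 2
p_value = math.erfc(math.sqrt(x)) + 2 * math.sqrt(x / math.pi) * math.exp(-x)
print(nu, round(chi2, 2), round(p_value, 2))
```

With a P-value well above 0.01, the example sequence would be accepted as random by this test.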
Inputs for the experiment

Numbers – Hash of numbers 0-3999.
◦ Tests require a length of at least 10^6 bits.
◦ For 256-bit output,
256 × 4000 = 1,024,000 bits.

KAT Inputs – 2048 hexadecimal inputs
from the official candidate
documentation.
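The "Numbers" input set can be sketched as follows. Two things here are assumptions: the slides do not say how each number was encoded (decimal ASCII is used below), and SHA-256 from Python's standard library stands in for the candidates, which the standard library does not provide.

```python
import hashlib

def hash_numbers_to_bits(count: int = 4000) -> str:
    """Hash the numbers 0..count-1 and concatenate the digests as a bit string."""
    chunks = []
    for i in range(count):
        # Assumed encoding: the decimal digits of i as ASCII bytes.
        digest = hashlib.sha256(str(i).encode("ascii")).digest()
        chunks.append("".join(f"{byte:08b}" for byte in digest))
    return "".join(chunks)

bits = hash_numbers_to_bits()
print(len(bits))   # 256 * 4000 bits, above the 10^6-bit minimum
```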
Inputs for the experiment (cont.)

From file – The NIST document on the
statistical test suite.
◦ Every 10Kb – Each input block is 10Kb long. The
first input is the first 10Kb; the second input skips
the first m = 1Kb and takes the next n = 10Kb, and so on.
◦ Every 100Kb – Each input block is 100Kb long. In
this case, 100 bytes are skipped before
each subsequent input block.

This ensures that the data blocks contain both
overlapping and non-overlapping data.
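The block selection described above amounts to a sliding window over the file. The sketch below assumes that 1Kb means 1024 bytes and that a trailing partial block is dropped; the slides state neither, and the function name is illustrative.

```python
from typing import Iterator

def sliding_blocks(data: bytes, block_size: int, step: int) -> Iterator[bytes]:
    """Yield full-length blocks of block_size bytes, advancing step bytes each time."""
    for start in range(0, len(data) - block_size + 1, step):
        yield data[start:start + block_size]

data = bytes(range(256)) * 100            # 25,600 bytes of stand-in data
blocks = list(sliding_blocks(data, block_size=10 * 1024, step=1024))
print(len(blocks), len(blocks[0]))
```

Each block would then be hashed, and the digests concatenated into the bit sequence given to the test suite.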
Output for BLAKE-256
Tests                      Numbers             KAT                 10Kb                100Kb
App. Entropy               0.531403            0.132928            0.365077            0.476437
Block Freq.                0.550332            0.999349            0.105159            0.634999
Cumulative Sums            0.324573, 0.201009  0.988702, 0.943249  0.000432, 0.001383  0.129711, 0.221312
FFT                        0.204233            0.655976            0.255107            0.617123
Frequency                  0.187412            0.765466            0.000966            0.127740
Linear Complex             0.867403            0.312439            0.551978            0.693519
Longest Run                0.095483            0.382246            0.697027            0.936944
Overlapping Template       0.099496            0.718846            0.180799            0.214866
Rank                       0.077948            0.162680            0.946797            0.843130
Output for BLAKE-256 (contd.)
Tests                      Numbers             KAT                 10Kb                100Kb
Runs                       0.753526            0.978062            0.863215            0.048920
Serial                     0.876547, 0.838931  0.252703, 0.520978  0.625307, 0.854685  0.988346, 0.986553
Universal                  0.861028            0.057151            0.382927            0.833105
Nonoverlapping Template    0.272553, 0.156433  0.748985, 0.001491  0.013372, 0.593525  0.376109, 0.329376
Random Excursions          0.560459, 0.148643  0.997930, 0.945050  0.000000, 0.000000  0.381784, 0.935452
Random Excursions Variant  0.612882, 0.582494  0.163078, 0.205123  0.000000, 0.000000  0.219435, 0.393705
Total Bits                 1024000             524288              1677056             16936192
No. of 0’s                 511333              262036              840665              8464962
No. of 1’s                 512667              262252              836391              8471230
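The Frequency row can be cross-checked against the bit counts in the last two rows, because the frequency test depends only on the numbers of zeros and ones. This is a minimal sketch with the counts copied from the BLAKE-256 output; the helper name is illustrative.

```python
import math

# (zeros, ones) per input set, copied from the BLAKE-256 output.
COUNTS = {
    "Numbers": (511333, 512667),
    "KAT": (262036, 262252),
    "10Kb": (840665, 836391),
    "100Kb": (8464962, 8471230),
}

def frequency_p_value(zeros: int, ones: int) -> float:
    """Frequency test P-value computed from the bit counts alone."""
    s_obs = abs(ones - zeros) / math.sqrt(zeros + ones)
    return math.erfc(s_obs / math.sqrt(2))

for name, (zeros, ones) in COUNTS.items():
    p = frequency_p_value(zeros, ones)
    print(name, round(p, 6), "pass" if p >= 0.01 else "fail")
```

Only the 10Kb input falls below α = 0.01 on this test, which matches the reported value of 0.000966.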
Results and Conclusions
P-values of 0.0 do not indicate failed tests but
tests that are inapplicable to the input.
 All hash functions appear random.

◦ Failed results are outliers rather than the
norm.
◦ They are not enough to classify a function as non-random.

Areas of failed tests can be explored
further.
Performance
Second most important criterion.
 Most of the work has been done with C
as the programming language.
 The following combination has not been
studied comprehensively before

◦ Language – Java
◦ Platform – Sun
◦ Message size – Small
Specification
Machine – Sun Microsystems Ultra 20.
 Config – AMD 2.2GHz processor.
 OS – SunOS 5.10 (Solaris 10).

Small messages – size < 8192 bytes.

Java code – sphlib, a library of hash function
implementations in C and Java.
Input = 1024 bytes
Candidates  256 output bits           512 output bits
            Mbytes/s    Cycles/byte   Mbytes/s    Cycles/byte
SHA-2       57.90       38            19.69       111.73
BLAKE       45.5        48.35         27.48       80.06
Grøstl      11.56       190.31        6.87        320.23
JH          8.33        264.11        8.33        264.11
Keccak      12.63       174.19        6.89        319.3
Skein       38.24       57.53         30.11       73.07
Hamsi       18.50       118.92        7.12        308.99
BMW         42.89       51.29         36.84       59.72
CubeHash    23.75       92.63         23.87       92.17
ECHO        11.24       195.73        5.75        382.61
Fugue       22.69       96.96         11.62       189.33
Luffa       33.26       66.15         18.97       115.97
Shabal      104.37      21.08         103.36      21.28
SHAvite     24.11       91.25         13.97       157.48
SIMD        12.10       181.82        0.75        2933.33
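The two measures in the table appear to be related by cycles/byte = clock frequency / throughput, with 1 Mbyte taken as 10^6 bytes and the 2.2 GHz clock from the specification. That relation is inferred from the numbers, not stated in the slides; the sketch below checks it on three entries.

```python
CLOCK_HZ = 2.2e9   # AMD 2.2 GHz processor from the specification

def cycles_per_byte(mbytes_per_s: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert throughput in Mbytes/s (assumed 10**6 bytes) to cycles per byte."""
    return clock_hz / (mbytes_per_s * 1e6)

for name, rate in [("SHA-2 (256)", 57.90), ("BLAKE (256)", 45.5), ("SIMD (512)", 0.75)]:
    print(name, round(cycles_per_byte(rate), 2))
```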
Performance and Message length
Most candidates claim better performance
than SHA-2.
 It is interesting to see how performance is affected by
message length.
 For the final five candidates, 16-byte and
4096-byte inputs were hashed.

Performance and Message length (cont.)
Columns: input size in bytes - output size in bits
Candidates  16-256      4096-256    16-512      4096-512
SHA-2       11.89       61.43       2.39        21.93
BLAKE       10.93       47.68       3.47        29.99
Grøstl      2.8         12.38       0.67        7.74
JH          1.8         8.75        1.7         8.64
Keccak      1.52        13.7        1.56        7.26
Skein       9.18        38.77       3.78        31.76
Performance and Message length (cont.)

Rate of increase in hashing speed from 16-byte to 4096-byte inputs
 Keccak-256 > SHA-256.
 Grøstl-512 > SHA-512.
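The comparison above is consistent with the table if "rate" is read as the growth in hashing speed between 16-byte and 4096-byte inputs. That reading is an assumption, since in absolute terms SHA-2 is faster in both cases. The sketch below computes the growth factors from the table values.

```python
# (16-byte figure, 4096-byte figure) copied from the message-length table.
RATES = {
    "SHA-256": (11.89, 61.43),
    "Keccak-256": (1.52, 13.7),
    "SHA-512": (2.39, 21.93),
    "Grøstl-512": (0.67, 7.74),
}

def speedup(short_rate: float, long_rate: float) -> float:
    """Factor by which the hashing speed grows from 16-byte to 4096-byte inputs."""
    return long_rate / short_rate

for name, (short_rate, long_rate) in RATES.items():
    print(name, round(speedup(short_rate, long_rate), 2))
```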
Performance and Block size

For JH, the performance remains the
same for the 256 and 512 versions.
◦ Both use a single large internal state of 1024 bits.

For BLAKE and Keccak, the 256 version is
almost twice as fast as the 512 version.
◦ The 256 version has a block size of 512 bits
whereas the 512 version has a block size of
1024 bits.
Hardware vs Software implementation

The hardware results come from "Visualizing area-time tradeoffs for SHA-3",
which covers hardware implementations of the
candidates.
Hardware vs Software implementation
Rank  Hardware    Software
1)    Keccak      Shabal
2)    CubeHash    Skein
3)    JH          BLAKE
4)    Shabal      CubeHash
5)    Skein       Luffa
6)    Fugue       SHAvite-3
7)    Luffa       Fugue
8)    BLAKE       JH
9)    Hamsi       Hamsi
10)   SHAvite-3   Keccak
11)   Grøstl      Grøstl
Hardware vs Software implementation
(cont.)

Among the final five candidates
◦ Grøstl remains last in both implementations.
◦ Keccak has the biggest difference in terms of
position.
◦ JH and BLAKE swap positions with BLAKE
performing better in software.
◦ Skein is the only one to perform reasonably well
in both.
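The observations about the final five can be read off the two rankings mechanically. This is a minimal sketch with the rankings copied from the comparison; the helper name is illustrative.

```python
HARDWARE = ["Keccak", "CubeHash", "JH", "Shabal", "Skein", "Fugue",
            "Luffa", "BLAKE", "Hamsi", "SHAvite-3", "Grøstl"]
SOFTWARE = ["Shabal", "Skein", "BLAKE", "CubeHash", "Luffa", "SHAvite-3",
            "Fugue", "JH", "Hamsi", "Keccak", "Grøstl"]
FINALISTS = ["BLAKE", "Grøstl", "JH", "Keccak", "Skein"]

def rank_shift(name: str) -> int:
    """Software rank minus hardware rank (positive means worse in software)."""
    return SOFTWARE.index(name) - HARDWARE.index(name)

for name in FINALISTS:
    print(name, HARDWARE.index(name) + 1, SOFTWARE.index(name) + 1, rank_shift(name))
```

Keccak moves nine places between the two rankings, the largest shift among the finalists, while Grøstl does not move at all.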
Security of Grøstl
One of the final five candidates.
 Developed at the Technical University of Denmark (DTU) and TU Graz.


What makes Grøstl interesting?
◦ Unlike the SHA family, it does not use block cipher
components.
◦ Based on a few individual permutations.
◦ Borrows components from AES, such as the S-box.
Hash Function Construction
• Message M is padded and split into l-bit message
blocks m_1, …, m_t.
o If the digest size n ≤ 256, l = 512; else l = 1024.
• The compression function f is applied as follows:
h_i ← f(h_{i-1}, m_i) for i = 1 to t.
The initial value h_0 = iv is predefined.
• The final value h_t is passed to the output
transformation function
H(M) = Ω(h_t)
Compression Function
• Based on two permutations P and Q.
• Defined as
f(h, m) = P(h ⊕ m) ⊕ Q(m) ⊕ h
• Design of P and Q
• Inspired by Rijndael.
• Each consists of r rounds, and each round consists of a
number of round transformations.
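The structure of the compression function can be sketched independently of the actual permutations. In the code below, `toy_p` and `toy_q` are hypothetical stand-ins, NOT the Grøstl permutations, and padding and the output transformation are omitted. Only the shape f(h, m) = P(h ⊕ m) ⊕ Q(m) ⊕ h and the chaining over blocks follow the slides.

```python
from typing import Callable

Perm = Callable[[bytes], bytes]

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def compress(h: bytes, m: bytes, p: Perm, q: Perm) -> bytes:
    """f(h, m) = P(h xor m) xor Q(m) xor h."""
    return xor(xor(p(xor(h, m)), q(m)), h)

# Toy stand-ins for P and Q: NOT the Grøstl permutations.
def toy_p(s: bytes) -> bytes:
    return s[1:] + s[:1]        # rotate bytes left by one

def toy_q(s: bytes) -> bytes:
    return s[-1:] + s[:-1]      # rotate bytes right by one

h = bytes(64)                   # stand-in for the predefined iv (l = 512 bits)
for block in (bytes(range(64)), bytes(range(64, 128))):
    h = compress(h, block, toy_p, toy_q)
print(len(h), h.hex()[:16])
```

The feed-forward of h matters: with identity permutations the three terms cancel and the output is all zeros, so all the mixing has to come from P and Q.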
Design of P and Q (cont.)
• The four round transformations
o AddRoundConstant
o SubBytes
o ShiftBytes
o MixBytes
• One round consists of the above
transformations in the following order
R = MixBytes ◦ ShiftBytes ◦ SubBytes ◦
AddRoundConstant.
Byte Sequence to State Matrix
Mapping is done in a similar way to
Rijndael.
 The 64-byte sequence 00 01 02 … 3f is
mapped to an 8×8 matrix, column by column.
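The mapping can be sketched as follows, assuming the Rijndael convention that consecutive bytes fill the matrix column by column, so that byte k lands in row k mod 8 and column k div 8. The function name is illustrative.

```python
def bytes_to_state(seq: bytes, rows: int = 8) -> list[list[int]]:
    """Map a byte sequence column by column into a matrix with the given row count."""
    cols = len(seq) // rows
    return [[seq[r + rows * c] for c in range(cols)] for r in range(rows)]

state = bytes_to_state(bytes(range(64)))
for row in state:
    print(" ".join(f"{b:02x}" for b in row))
```

The first printed row is 00 08 10 … 38 and the first column reads 00 01 … 07.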

AddRoundConstant
• Adds a round dependent constant to the
matrix.
• Transformation in round i is defined as
A ← A ⊕ C[i]
SubBytes
• Each byte in the matrix is substituted with a
corresponding value from the S-box.
• The S-box is the same as the one used in Rijndael.
• The transformation is as follows
a_{i,j} ← S(a_{i,j}), 0 ≤ i < 8, 0 ≤ j < v,
where a_{i,j} is the element in row i and column j and v is the number of columns.
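Since the S-box is the Rijndael one, it can be generated rather than tabulated: an inversion in GF(2^8) followed by a fixed affine map. The sketch below is a straightforward, unoptimized construction; real implementations use a lookup table.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the Rijndael polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inverse(x: int) -> int:
    """Multiplicative inverse computed as x^254, with 0 mapped to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result

def rotl8(x: int, k: int) -> int:
    """Rotate an 8-bit value left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF

def sbox(x: int) -> int:
    """Rijndael S-box: field inversion followed by the affine map."""
    b = gf_inverse(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

SBOX = [sbox(x) for x in range(256)]

def sub_bytes(state: list[list[int]]) -> list[list[int]]:
    """Apply the S-box to every byte of the state matrix."""
    return [[SBOX[b] for b in row] for row in state]

print(hex(SBOX[0x00]), hex(SBOX[0x01]), hex(SBOX[0x53]))
```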
ShiftBytes
• Cyclically shifts the bytes within a row to the left by a
number of positions.
• All bytes in row i are shifted σ_i
positions to the left, where
σ = [0, 1, 2, 3, 4, 5, 6, 7]
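A minimal sketch of ShiftBytes on an 8×8 state, reading σ as one rotation amount per row (row i is rotated left by σ[i] positions). The state is filled with the bytes 00 … 3f column by column for illustration; the function name is illustrative.

```python
SIGMA = [0, 1, 2, 3, 4, 5, 6, 7]

def shift_bytes(state: list[list[int]], sigma: list[int] = SIGMA) -> list[list[int]]:
    """Cyclically rotate row i of the state left by sigma[i] positions."""
    return [row[s % len(row):] + row[:s % len(row)] for row, s in zip(state, sigma)]

state = [[8 * c + r for c in range(8)] for r in range(8)]   # bytes 00..3f, column-wise
shifted = shift_bytes(state)
print([f"{b:02x}" for b in shifted[1]])
```

After the shift, the eight bytes that started in one column sit in eight different columns, which is what spreads a difference across the state before MixBytes.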
MixBytes
• Each column of the state matrix A is multiplied by a
constant 8×8 matrix B.
• The transformation is defined as
A ← B × A.
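The multiplication A ← B × A is carried out over GF(2^8), column by column. In the sketch below, the circulant matrix B = circ(02, 02, 03, 04, 05, 03, 05, 07) is quoted from memory of the Grøstl specification and should be treated as an assumption to verify; the slides do not give the matrix.

```python
# First row of the circulant matrix B (assumed; verify against the specification).
CIRC = [0x02, 0x02, 0x03, 0x04, 0x05, 0x03, 0x05, 0x07]
B = [[CIRC[(j - i) % 8] for j in range(8)] for i in range(8)]

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the Rijndael polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def mix_column(col: list[int]) -> list[int]:
    """Multiply one 8-byte column by B over GF(2^8)."""
    out = []
    for row in B:
        acc = 0
        for coeff, value in zip(row, col):
            acc ^= gf_mul(coeff, value)
        out.append(acc)
    return out

def mix_bytes(state: list[list[int]]) -> list[list[int]]:
    """Apply mix_column to every column of an 8-row state matrix."""
    width = len(state[0])
    cols = [mix_column([state[r][c] for r in range(8)]) for c in range(width)]
    return [[cols[c][r] for c in range(width)] for r in range(8)]

print([f"{b:02x}" for b in mix_column([1, 0, 0, 0, 0, 0, 0, 0])])
```

A column with a single active byte comes out with all eight bytes active, which is the k = 1 case of branch number 9.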
Output Transformation
• Defined as
Ω(x) = trunc_n(P(x) ⊕ x)
• trunc_n(x) discards all but the trailing n
bits of x.
• n is the length of the message digest.
Cryptanalysis
Differential Cryptanalysis
• There are at least 9² = 81 active S-boxes in a 4-round
differential trail.
o MixBytes ensures the branch number is 9. Meaning, a
difference in k > 0 bytes of a column will result in a
difference in at least 9-k bytes after one MixBytes
operation.
o ShiftBytes moves the bytes in one column to 8 different
columns.
• Maximum difference propagation probability of the S-box
= 2^-6.
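Combining the two figures gives a bound on the probability of any single four-round differential trail. The sketch below reads the number of active S-boxes as the square of the branch number (9² = 81) and assumes, as is usual for such estimates, that the S-box probabilities multiply independently.

```python
BRANCH_NUMBER = 9            # MixBytes branch number
MAX_DIFF_PROB_LOG2 = -6      # log2 of the S-box maximum difference propagation probability

active_sboxes = BRANCH_NUMBER ** 2                     # lower bound over four rounds
trail_prob_log2 = active_sboxes * MAX_DIFF_PROB_LOG2   # log2 of the trail probability bound
print(f"{active_sboxes} active S-boxes, trail probability <= 2^{trail_prob_log2}")
```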
Cryptanalysis (cont.)
• Linear Cryptanalysis
o Linear trails propagate similarly to differential trails.
o Maximum input-output correlation of the S-box = 2^-3.
• Integrals
o Sets of plaintexts are chosen with one part held
constant while the other part varies through all
possibilities.
o For example, an attack may choose 256 plaintexts that
have all but 8 of their bits the same, but all differ in
those 8 bits.
o Such a set has an XOR sum of 0.
o The XOR sums of the corresponding ciphertexts
provide information about the cipher’s operation.
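The 256-plaintext example can be checked directly. In the sketch below, `toy_sbox` is a hypothetical byte-wise bijection, NOT a real cipher; it only illustrates that a set which is balanced in one byte stays balanced after a bijective substitution.

```python
from functools import reduce

def xor_sum(texts: list[bytes]) -> bytes:
    """Bytewise XOR of all texts in the set."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), texts)

# 256 texts: the first byte takes every value once, the other three are constant.
texts = [bytes([v, 0xDE, 0xAD, 0x42]) for v in range(256)]
print(xor_sum(texts).hex())

# Toy byte-wise bijection (NOT a real cipher): 7 is odd, so v -> 7v + 3 mod 256 is one-to-one.
toy_sbox = [(7 * v + 3) % 256 for v in range(256)]
images = [bytes(toy_sbox[b] for b in t) for t in texts]
print(xor_sum(images).hex())
```

Both sums are zero. An integral attack tracks how many rounds of the real primitive preserve this kind of balance.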
Integrals (cont.)
 Similar to integrals on AES.
 Grøstl-256
o 2^120 texts for 6 and 7 rounds.
o The texts are balanced in every byte of input and
output.
 Grøstl-512
o 2^704 texts for 8 and 9 rounds.
o For 8 rounds, the texts are balanced in every byte of
input and output.
o For 9 rounds, every byte of input and every bit of
output is balanced.

Conclusion: Integrals cannot expose non-random
behavior in Grøstl.

Cryptanalysis (cont.)
• Algebraic Cryptanalysis
o Based on the attack on the AES S-box, which is used by Grøstl.
o One AES encryption has 200 S-box applications, which
give 8000 quadratic equations
in 1600 variables (the solution yields the
key).
o The time complexity of solving this system is
unknown.
o Grøstl-256 and Grøstl-512 have 1280 and 3584
S-box applications, respectively.
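The S-box counts follow from the structure of the compression function: two permutations, each applying the S-box to every state byte in every round. The sketch below assumes 10 rounds on a 64-byte state for Grøstl-256 and 14 rounds on a 128-byte state for Grøstl-512, which reproduces the figures above. Scaling the AES ratios of 40 equations and 8 variables per S-box to Grøstl is an extrapolation, not a figure from the slides.

```python
def sbox_applications(rounds: int, state_bytes: int, permutations: int = 2) -> int:
    """S-box applications in one compression call: P and Q, every round, every byte."""
    return permutations * rounds * state_bytes

AES_EQUATIONS_PER_SBOX = 8000 // 200    # from the AES figures
AES_VARIABLES_PER_SBOX = 1600 // 200

for name, rounds, state_bytes in [("Grøstl-256", 10, 64), ("Grøstl-512", 14, 128)]:
    n = sbox_applications(rounds, state_bytes)
    print(name, n, n * AES_EQUATIONS_PER_SBOX, n * AES_VARIABLES_PER_SBOX)
```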
Rebound Attack

Can be applied on block or permutation
based ciphers.

Consists of two phases:
◦ Inbound phase: Meet-in-the-middle (E_in) plus
exploiting the available degrees of freedom.
Rebound Attack (cont.)
◦ Outbound phase: Use the values obtained
from the inbound phase to move in the
forward (E_fw) and backward (E_bw) directions
to find collisions.

Collisions found on reduced Grøstl
◦ Grøstl-256: 4 out of 10 rounds.
◦ Grøstl-512: 5 out of 14 rounds.
Internal Differential Attack
Exploits the differential trails between
parallel computations that are not distinct
enough.
 The idea is to devise a differential path
that represents the difference between
the two computation paths rather than the differences
between the inputs.
 Grøstl has two permutations, P and Q,
which are very similar to each other.

Internal Differential Attack (cont.)
• Compute two internal states, A and B.
o A ⊕ B = Δin.
o P(A) ⊕ Q(B) = Δout.
• Collisions Found:
o Grøstl-256: 5 rounds, 2^79 computations and 2^64
memory.
o Grøstl-512: 6 rounds, 2^177 computations and 2^64
memory.
• P and Q were modified for the final round of the competition to make
them more different.
Conclusion

Frontrunners among the five
◦ Performance:
 Good: BLAKE and Skein.
 Bad: Keccak.
 Ugly: Grøstl and JH.
◦ Randomness tests: Weakest is BLAKE.
◦ Novel algorithm: Skein and Keccak.
◦ Potential Winners: Skein or Keccak.
Thank You.

Questions?