Provable Security Analysis of SHA-3 Candidates

UNIVERSITY OF NOVI SAD
DEPARTMENT OF POWER, ELECTRONICS AND TELECOMMUNICATIONS
MASTER’S THESIS
Marjan Škrobot
Promoters:
Prof. dr. ir. Bart Preneel
Prof. dr. ir. Vincent Rijmen
Supervisors:
Elena Andreeva, PhD
Bart Mennink
June, 2012
Abstract
Hash functions are fundamental cryptographic primitives that compress messages of arbitrary
length into message digests of a fixed length. They are used as a building block
in many important security applications such as digital signatures, message authentication
codes and password protection. The three main security properties of hash functions are
collision, second preimage and preimage resistance.
In 2005, a significant breakthrough was made in the cryptanalysis of hash functions. Namely,
attacks on SHA-1 and MD5 raised concerns about the security of the widely used hash
function standards. In response to this hash function crisis, the US National Institute of
Standards and Technology (NIST) announced a call for the design of a new cryptographic
hash algorithm in 2007. NIST received 64 submissions. At this moment, 5 candidates are
in the final round of the competition: BLAKE, Grøstl, JH, Keccak and Skein.
An important criterion for the evaluation of hash functions is their security. A common
technique to assess the security of hash functions is via reductionist proofs of security. Within
this provable security framework, Andreeva et al. provided a summary of all known security
reduction results in the ideal model for the 14 second round SHA-3 candidates. Furthermore,
they identified several open problems. In this thesis, we investigate the existing proof
techniques for the second preimage analysis and resolve the remaining open problems regarding the
second preimage resistance of Grøstl and Skein. More precisely, these two hash functions
are proved optimally second preimage resistant in the ideal model within the concrete
security framework of provable security. Finally, we provide an overview of the current security reduction
and performance results on the five finalists.
Acknowledgements
I would like to show my gratitude to the people without whose help and guidance the
accomplishment of this thesis would not have been possible.
In the first place I am very grateful to my supervisors Elena Andreeva and Bart Mennink
who introduced me to the field of cryptology and whose sincerity and encouragement I will
never forget. Above all, it would have been next to impossible to write this thesis without
their supervision and advice from the very beginning to the end of my work. I greatly
appreciate Bart’s positive spirit and the precious time he put into reading my thesis and giving
critical comments on it. I gratefully acknowledge Elena for introducing me to the
area of provable security, and for guiding me to the literature that sparked and sustained
my interest in cryptology. The cooperation with both of them was very important and
educational to me.
I gratefully thank Vojin Šenk and Željen Trpovski for their great support and active involvement as coordinators in the exchange process. I was privileged to have them as my
professors and I am grateful for the help they have given me.
A special word of gratitude goes to my parents, Pavle and Ruža, who have been a constant
source of emotional, moral and, of course, financial support during my postgraduate years;
this thesis would certainly not have existed without them. I would also like to thank
my family and friends for their support throughout my studies. Finally, I want to give
special thanks to my girlfriend Ljiljana for her great support and for producing the figures
used in this thesis.
Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Preliminaries
  2.1 Mathematical Background
    2.1.1 Notation
    2.1.2 Graph Theory
    2.1.3 Probability Theory
    2.1.4 Complexity Theory
  2.2 Provable Security
    2.2.1 Basic Definitions
    2.2.2 The Provable Security Paradigm
    2.2.3 Assumptions
    2.2.4 Standard and Ideal Model
    2.2.5 Complexity Theory Techniques
  2.3 Hash Functions
    2.3.1 Basic Definitions
      2.3.1.1 Merkle-Damgård Mode of Operation
      2.3.1.2 Random Oracles
    2.3.2 Security Properties
      2.3.2.1 Formal Security Notions
      2.3.2.2 Expected Security
    2.3.3 Generic Attacks Against Merkle-Damgård Mode of Operation
    2.3.4 Compression Function Building Strategies
    2.3.5 Other Modes of Operation
      2.3.5.1 Wide-pipe and Narrow-pipe Design
      2.3.5.2 HAIFA
      2.3.5.3 Sponge
    2.3.6 Establishing Security of Hash Functions
      2.3.6.1 Property Preservation
      2.3.6.2 Indifferentiability Results
      2.3.6.3 Idealized Proof Model
    2.3.7 Security Model
3 NIST’s SHA-3 Hash Function Competition
  3.1 The History of SHA Family
  3.2 SHA-3 Security Requirements and Evaluation Criteria
  3.3 The Competition Finalists
    3.3.1 BLAKE
    3.3.2 Grøstl
    3.3.3 JH
    3.3.4 Keccak
    3.3.5 Skein
  3.4 A Summary of the Existing Results
    3.4.1 Factors of Favorability
    3.4.2 A Summary of the Security and Performance Results
4 Second Preimage Resistance of Grøstl and Skein
  4.1 Security Analysis of Grøstl
    4.1.1 Assessing Second Preimage Resistance of Grøstl
    4.1.2 Proof of Security
  4.2 Security Analysis of Skein
    4.2.1 Assessing Second Preimage Resistance of Skein
    4.2.2 Proof of Security
5 Conclusions and Remarks
  5.1 Conclusions
  5.2 Summary of Contributions
  5.3 Future Research
Bibliography
A Mathematical Derivations
  A.1 Security Bound on Second Preimage of Grøstl
List of Figures
2.1 The Merkle-Damgård construction.
2.2 The HAsh Iterative FrAmework - HAIFA construction.
2.3 The sponge construction.
3.1 The BLAKE’s compression function.
3.2 The Grøstl hash function.
3.3 The JH’s compression function.
3.4 The UBI mode of operation.
4.1 The Grøstl’s compression function.
4.2 The Skein hash function.
List of Tables
3.1 A schematic summary of hardware and software results.
3.2 A schematic summary of security reduction results of the five finalists.
Chapter 1
Introduction
This thesis deals with provable security properties of cryptographic hash functions.
Cryptographic hash functions are fundamental cryptographic primitives. They are used
as a building block in many higher-level primitives in cryptography. Hash functions
compress message inputs of arbitrary length and return a hash value of fixed length. They are
employed in many practical applications such as digital signatures, message authentication
codes, password protection, pseudorandom string generation, derivation of cryptographic
keys, etc.
One of the first uses of hash functions was presented in 1976, in the famous paper by Diffie
and Hellman [DH76] on public-key cryptography, where they were proposed as a building block of
digital signatures. A practical hash function must be efficiently computable and produce uniformly
distributed outputs, but in order to protect data integrity and to provide message authentication,
hash functions must also satisfy specific security requirements. In his PhD thesis [Mer79], Merkle
defined the three main security properties of hash functions: collision, preimage and second
preimage resistance. Which of these properties are relevant depends on the application.
In practical signature schemes hash functions are used to: 1) make the signing of messages of
arbitrary length more efficient; 2) provide secure authentication. The usual way to
employ a hash function in a signature scheme is to first hash a message input $M$ and then
to sign the hashed message $H(M)$ with the secret key of the signer: $\sigma(M) = H(M)^d \bmod N$,
where $0 \le M \le N$. Later, the verifier receives the pair $(M, \sigma(M))$. This approach is known
as the hash-and-sign paradigm. An undesired event happens if an adversary finds two
distinct messages with the same hash output $H(M_1) = H(M_2)$. Such messages are called
colliding messages, and the event is a collision event. If a collision event occurs,
the adversary can trick an honest party A by first asking him to sign a harmless message
$M_1$. If the honest party A signs the message, the adversary can counterfeit the signature,
since the signature is the same for a potentially harmful message $M_2$. A similar scenario can
happen if the adversary, for a previously chosen specific message $M$, finds another message
$M'$ with the same hash output $H(M) = H(M')$. The corresponding security property is known as
second preimage resistance (or weak collision resistance). Another practical use of hash functions
is for commitment. A commitment scheme allows a prover to commit to data without
revealing it. A possible approach to creating a commitment is to apply a hash function
to the data and disclose only the hash value. Later, the prover can open the commitment by
revealing the data. The hash value is the only guarantee to a verifier, who checks its
correctness. An adversary, typically the verifier, may try to retrieve information about
the data from the commitment. The commitment scheme is broken if the adversary succeeds in
retrieving a message $M$ (the data) from a hash value $Y = H(M)$. Therefore, hash functions used
in commitment schemes need to be first preimage resistant. These examples show that the
use of an insecure hash function as a building block would endanger higher-level primitives.
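The forgery scenario described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the RSA key is a toy textbook key with hypothetical parameters, and `weak_hash` is a deliberately weak 8-bit hash (the first byte of SHA-256) chosen so that a colliding message is found almost immediately. The search fixes the harmless message and varies the harmful one, so strictly speaking it finds a second preimage, which in particular yields a colliding pair.

```python
import hashlib
from itertools import count

# Toy textbook-RSA key (hypothetical values, far too small to be secure).
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # secret exponent d (needs Python 3.8+)


def weak_hash(message: bytes) -> int:
    """Deliberately weak 8-bit hash H: the first byte of SHA-256."""
    return hashlib.sha256(message).digest()[0]


def sign(message: bytes) -> int:
    """Hash-and-sign: sigma(M) = H(M)^d mod N."""
    return pow(weak_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Accept iff sigma^e mod N equals H(M)."""
    return pow(signature, E, N) == weak_hash(message)


def find_colliding_message(harmless: bytes, harmful_prefix: bytes) -> bytes:
    """Append a counter to harmful_prefix until its hash equals H(harmless)."""
    target = weak_hash(harmless)
    for i in count():
        candidate = harmful_prefix + str(i).encode()
        if weak_hash(candidate) == target:
            return candidate


m1 = b"I agree to pay 10 euros"
m2 = find_colliding_message(m1, b"I agree to pay 10000 euros #")
signature = sign(m1)                 # the honest party signs only m1
print(verify(m2, signature))         # the same signature also verifies for m2
```

With a hash function of realistic output length the search loop would be infeasible, which is exactly what collision and second preimage resistance are meant to guarantee.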
In his proposal of a digital signature scheme, Rabin [Rab78] described an iterative hash
function based on the block cipher DES, with a message block $m_i$ used as the key. However,
this design turned out to be trivially insecure (cf. Section 2.3.1.1). A significant breakthrough
in the design of hash functions was due to Merkle [Mer90] and Damgård [Dam90],
who independently showed how to iterate a compression function such that the collision
resistance of the compression function carries over to the hash function¹. This iteration
principle, known as the Merkle-Damgård construction, is used in the most popular hash functions today.
The most prominent hash functions of the previous two decades are the MDx family
(MD5 being the most important), the SHA family, RIPEMD, HAVAL, Tiger, GOST and Whirlpool,
all of which rely on the Merkle-Damgård iterative principle. MD5 was designed by Ron
Rivest in 1991, based on his earlier hash function design MD4. The MD5 hash function
has been employed in a wide variety of security applications. In 1995, the US National
Institute of Standards and Technology (NIST) issued the Secure Hash Standard with a
specification of the SHA-1 algorithm. This algorithm has become the most widely used hash
function standard. A new algorithm, SHA-2, was published in 2001. After the breakthrough
in cryptanalysis by Wang et al. [WYY05, WY05] in 2005, security flaws were identified in
MD5 and SHA-1. Moreover, other results emerged [Jou04, Dea99, KS05, KK06] that raised
questions about the security of the Merkle-Damgård construction and of hash functions in
general. This hash function crisis initiated NIST’s ongoing hash function competition
[NIS07], with the aim of developing a new hash function standard, SHA-3. The end of the
selection process is scheduled for late 2012. NIST specified a number of requirements
that the future SHA-3 function should meet. A hash function with an $n$-bit hash value is
required to provide collision resistance of approximately $n/2$ bits, preimage resistance of
approximately $n$ bits and second preimage resistance of approximately $n - L$ bits, where
the length of the first preimage is at most $2^L$ blocks. We also point to the indifferentiability
framework introduced by Maurer et al. [MRH04]. This framework was further developed
in the context of hash functions by Coron et al. [CDMP05]. Indifferentiability is important
because it guarantees resistance against all generic attacks. The hash functions
submitted to the SHA-3 competition claim security, but only a limited number of them
are actually backed by security proofs. Many of these security results are obtainable by
means of provable security.
¹ This property is known as collision-resistance preservation.
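As a small worked example of the NIST requirements quoted above, the following sketch tabulates the approximate target security levels for a given digest size. The function name and the dictionary keys are hypothetical, introduced only for illustration; the formulas are the ones stated in the text.

```python
def sha3_security_targets(n: int, L: int) -> dict:
    """Approximate target security levels, in bits, for an n-bit hash value.

    L is the base-2 logarithm of the maximum first-preimage length in blocks.
    """
    if n <= 0 or not 0 <= L <= n:
        raise ValueError("expected n > 0 and 0 <= L <= n")
    return {
        "collision": n / 2,          # about n/2 bits
        "preimage": n,               # about n bits
        "second_preimage": n - L,    # about n - L bits
    }


print(sha3_security_targets(256, 64))
```

For example, with a 256-bit hash value and a first preimage of at most $2^{64}$ blocks, the second preimage target is about 192 bits, noticeably below the 256-bit preimage target.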
The concept of provable security was introduced by Goldwasser and Micali [GM84]. Originally, they developed it in the context of asymmetric encryption. From this preliminary
work, several lines of research emerged. Fundamentally, the goal of provable security is
to provide a mathematical guarantee that a cryptographic scheme cannot be broken by a
class of attackers in a specified mathematical model of reality. Cryptographic schemes are
usually based on some mathematical problem. Those schemes that can be proven secure
under the assumption that the underlying mathematical problem is computationally hard
are said to be secure in the standard model. Since it is usually difficult to assess security in the standard model, the ideal model is often used in practice. Within this model,
underlying cryptographic primitives are replaced by their idealized versions.
Practically, the provable security approach allows us to prove the security of a higher-level
scheme (e.g. a digital signature scheme) under some assumption on the security of the hash function. In
this context, an important line of research was initiated by Fiat and Shamir
[FS86], who suggested the random oracle methodology. Later, Bellare and Rogaway
[BR93] formally introduced the random oracle model in order to allow the design of more
practice-oriented provably secure cryptographic schemes. They depicted the random oracle
model as a “bridge between theory and practice”. Within this model, the hash function is
replaced by an ideal primitive (a random oracle). Likewise, the provable security approach
also allows us to conduct the security analysis of hash functions themselves, which can be done in
both the standard and the ideal model. The typical approach in this context is to prove a
security property of the hash function under some assumption on the security properties
of the underlying compression function. In the ideal model, adversaries have oracle access
to the ideal version of the compression function or its underlying building blocks (e.g. block
cipher or permutation(s)). During the second round of NIST’s competition, Andreeva et
al. [AMP10c] provided a summary of all known security reduction results for all 14 second
round SHA-3 candidates. Moreover, they identified open problems regarding the security
reduction results and as the main concern they indicated the lack of optimal security bounds
on the second preimage resistance. These results have been revisited in [AMPŠ12], a part
of which is based on results of the work presented in this thesis. In addition, we refer
to [ABM+12, ALM11].
Besides the aforementioned goal, another substantial aspect of the provable security approach is the introduction of security notions and their definitions. The lack of
proper definitions for the basic notions of security encouraged Rogaway and Shrimpton
[RS04] to revisit and formalize seven security notions for keyed hash functions. They also
considered all of the implications and separations among them within the provable security framework. Subsequently, Andreeva et al. [ANPS07, AMP10b] determined, by proof or
counterexample, the security property preservation² of seventeen different iterations in the
standard model.
Our Contribution
In this thesis we analyze the security of the final round candidates in the competition for the
new SHA-3 hashing algorithm. We give a concise survey of the five finalists together with
their security reductions and performance results. The main contribution of this thesis is
the analysis of the second preimage resistance of the hash function competition finalists Grøstl
and Skein. More precisely, within the concrete security framework of provable security, we
provide a lower bound on the second preimage resistance of Grøstl in the ideal permutation
model and of Skein in the ideal cipher model, and prove them both optimally second preimage
resistant.
Outline of the Thesis
In Chapter 2 we introduce the mathematical and cryptographic prerequisites for our proofs.
In Chapter 3 we present the timeline of the SHA-3 hash function competition as well as
NIST’s requirements and evaluation criteria for the SHA-3 hash function. Additionally, we
provide a brief introduction to the five finalists of the competition and their security and performance properties.
In Chapter 4 we present proofs of the second preimage resistance of the Grøstl and Skein
hash functions.
Chapter 5 offers concluding remarks, in which we discuss the obtained security results, highlight
some limitations of our approach and suggest directions for future research.
² The preservation of the seven security properties defined in [RS04].
Chapter 2
Preliminaries
In this chapter we introduce the necessary background, which includes some concepts
from mathematics as well as cryptography. In Section 2.1 we introduce the basic mathematical definitions. In Section 2.2 concepts of provable security are discussed. Section 2.3
offers an introduction to cryptographic hash functions and their security properties.
2.1 Mathematical Background
In this section we first give the mathematical notation used in our work. Then we offer a
brief summary of basic definitions from graph theory (Section 2.1.2), probability theory
(Section 2.1.3) and complexity theory (Section 2.1.4). The definitions and notation in this
section are taken from [Die10, MvOV97].
2.1.1 Notation
Let $\mathbb{N}$ denote the set of all natural numbers and $\mathbb{Z}$ the set of integers. Let $n \in \mathbb{N}$; then
$\{0,1\}^n$ denotes the set of all $n$-bit strings. We denote the set of all bit strings of arbitrary length
by $\{0,1\}^*$. The concatenation of two bit strings $x$ and $y$ is denoted by $x \| y$. The message
blocks of a message $M$ are denoted by $m_1 \| m_2 \| \dots \| m_k$, where $k$ denotes the number of
message blocks. Furthermore, $x \xleftarrow{\$} X$ denotes selecting $x$ uniformly at random from
the set $X$.
2.1.2 Graph Theory
Definition 2.1. A graph is a pair $G = (V, E)$ of sets satisfying $E \subseteq [V]^2$; thus, the elements
of $E$ are 2-element subsets of $V$. The elements of $V$ are the vertices (or nodes, or points)
of the graph $G$, the elements of $E$ are its edges (or lines).
The number of vertices of a graph $G$ is its order, written as $|G|$; its number of edges is
denoted by $\|G\|$. Two vertices $x, y$ of $G$ are adjacent (or neighbours) if $e = \{x, y\}$ is an
edge of $G$. Two edges $e \neq f$ are adjacent if they have an end in common. The vertex set
of a graph $G$ is referred to as $V(G)$, its edge set as $E(G)$. Graphs are finite or infinite
according to their order; unless otherwise stated, the graphs we consider are all finite. For
the empty graph $(\emptyset, \emptyset)$ we simply write $\emptyset$. A graph of order 0 or 1 is called trivial.
Definition 2.2. A path in narrow sense is a non-empty graph $P = (V, E)$ of the form
$$V = \{x_0, x_1, \dots, x_k\}, \qquad E = \{e_1, e_2, \dots, e_k\},$$
where the $x_i$ are all distinct and $e_i = \{x_{i-1}, x_i\}$ for all $1 \le i \le k$.
Definition 2.3. A path in wider sense¹ of length $k$ in a graph $G$ is a non-empty sequence
$$x_0 \xrightarrow{e_1} x_1 \xrightarrow{e_2} x_2 \cdots \xrightarrow{e_{k-1}} x_{k-1} \xrightarrow{e_k} x_k$$
of vertices and edges in $G$ such that $e_i = \{x_{i-1}, x_i\}$ for all $1 \le i \le k$. A path is closed if $x_0 = x_k$
and open if they are different. If the vertices in a path in wider sense are all distinct, it
defines an obvious path in narrow sense in $G$.
In a path, the first vertex $x_0$ is called the start vertex and the last vertex $x_k$ the end vertex.
These two vertices are linked by the path and are jointly called the terminal vertices of the
path; the vertices $x_1, \dots, x_{k-1}$ are the inner vertices of the path. The number of edges of a
path is its length.
Definition 2.4. A directed graph (or digraph) is a pair (V, E) of disjoint sets (of directed
graph vertices and edges) together with two maps init: E → V and ter: E → V assigning
to every edge e an initial vertex init(e) and a terminal vertex ter(e). The edge e is said to
be directed from init(e) to ter(e).
A directed graph may have several edges between the same two vertices x, y. Such edges
are called multiple edges; if they have the same direction (say from x to y), they are parallel.
If init(e) = ter(e), the edge e is called a loop.
¹ The term “walk” is used by some authors [Die10] for a path in wider sense $p$ (a path in which vertices or
edges may be repeated), while the terms “path” and “simple path” are used for what is in our work called
a path in narrow sense $P$.
Note that we use directed graphs in our work. Moreover, by the term path we refer to a path
in wider sense, and we often denote it by the natural sequence of its edges $p = (e_1, e_2, \dots, e_k)$.
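The definitions above can be made concrete with a minimal Python sketch. It assumes a directed multigraph stored as a dictionary from edge names to the pair (init(e), ter(e)), and it reads a path in wider sense in a directed graph as an edge sequence in which every edge starts where the previous one ends. Both the representation and all function names are hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, List, Tuple

# Directed multigraph: every edge name e maps to the pair (init(e), ter(e)).
Digraph = Dict[str, Tuple[Hashable, Hashable]]


def vertex_sequence(g: Digraph, p: List[str]) -> List[Hashable]:
    """Vertices x0, x1, ..., xk visited by the edge sequence p = (e1, ..., ek)."""
    return [g[p[0]][0]] + [g[e][1] for e in p]


def is_path_in_wider_sense(g: Digraph, p: List[str]) -> bool:
    """True if p is non-empty and each edge starts where the previous one ends."""
    if not p or any(e not in g for e in p):
        return False
    return all(g[a][1] == g[b][0] for a, b in zip(p, p[1:]))


def is_closed(g: Digraph, p: List[str]) -> bool:
    """True if the start vertex x0 equals the end vertex xk."""
    xs = vertex_sequence(g, p)
    return xs[0] == xs[-1]


def is_path_in_narrow_sense(g: Digraph, p: List[str]) -> bool:
    """A path in wider sense whose vertices are all distinct."""
    if not is_path_in_wider_sense(g, p):
        return False
    xs = vertex_sequence(g, p)
    return len(set(xs)) == len(xs)


# e2 and e4 are parallel edges from b to c; e5 is a loop at c.
g = {"e1": ("a", "b"), "e2": ("b", "c"), "e3": ("c", "a"),
     "e4": ("b", "c"), "e5": ("c", "c")}
print(is_path_in_wider_sense(g, ["e1", "e2", "e5", "e3"]))
print(is_path_in_narrow_sense(g, ["e1", "e4"]))
```

Identifying a path with its edge sequence, as in the text, is what lets parallel edges such as e2 and e4 give rise to different paths between the same vertices.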
2.1.3 Probability Theory
In this section we consider sample spaces with only finitely many possible outcomes. Let
the simple events of a sample space $S$ be labeled $s_1, s_2, \dots, s_n$.
Basic Definitions
Definition 2.5. An experiment is a procedure that yields one of a given set of outcomes.
The individual possible outcomes are called simple events. The set of all possible outcomes
is called the sample space.
Definition 2.6. A probability distribution $P$ on $S$ is a sequence of numbers $p_1, p_2, \dots, p_n$
that are all non-negative and sum to 1. The number $p_i$ is interpreted as the probability of
$s_i$ being the outcome of the experiment.
Definition 2.7. An event $E$ is a subset of the sample space $S$. The probability that event
$E$ occurs, denoted $P(E)$, is the sum of the probabilities $p_i$ of all simple events $s_i$ which
belong to $E$. If $s_i \in S$, $P(\{s_i\})$ is simply denoted by $P(s_i)$.
Fact 2.8. Let $E \subseteq S$ be an event.
i) $0 \le P(E) \le 1$. Furthermore, $P(S) = 1$ and $P(\emptyset) = 0$. ($\emptyset$ is the empty set.)
ii) If the outcomes in $S$ are equally likely, then $P(E) = \frac{|E|}{|S|}$.
Definition 2.9. Two events $E_1$ and $E_2$ are called mutually exclusive if $P(E_1 \cap E_2) = 0$.
That is, the occurrence of one of the two events excludes the possibility that the other
occurs.
Fact 2.10. Let $E_1$ and $E_2$ be two events.
i) If $E_1 \subseteq E_2$, then $P(E_1) \le P(E_2)$.
ii) $P(E_1 \cup E_2) + P(E_1 \cap E_2) = P(E_1) + P(E_2)$. Hence, if $E_1$ and $E_2$ are mutually exclusive,
then $P(E_1 \cup E_2) = P(E_1) + P(E_2)$.
Conditional Probability
Definition 2.11. Let $E_1$ and $E_2$ be two events with $P(E_2) > 0$. The conditional probability
of $E_1$ given $E_2$, denoted $P(E_1 \mid E_2)$, is
$$P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)}.$$
$P(E_1 \mid E_2)$ measures the probability of event $E_1$ occurring, given that $E_2$ has occurred.
Definition 2.12. Events $E_1$ and $E_2$ are independent if $P(E_1 \cap E_2) = P(E_1)P(E_2)$.
Observe that if $E_1$ and $E_2$ are independent, then $P(E_1 \mid E_2) = P(E_1)$ and $P(E_2 \mid E_1) = P(E_2)$.
That is, the occurrence of one event does not influence the likelihood of occurrence
of the other.
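The facts and definitions so far can be checked on a tiny sample space. The sketch below assumes a single roll of a fair die with equally likely outcomes, so that Fact 2.8 ii) gives every probability as a ratio of set sizes; the event names are hypothetical, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

S = frozenset(range(1, 7))           # sample space: one roll of a fair die


def P(event) -> Fraction:
    """Fact 2.8 ii): P(E) = |E| / |S| for equally likely outcomes."""
    return Fraction(len(set(event) & S), len(S))


def P_given(e1, e2) -> Fraction:
    """Definition 2.11: P(E1 | E2) = P(E1 and E2) / P(E2), for P(E2) > 0."""
    if P(e2) == 0:
        raise ValueError("conditional probability needs P(E2) > 0")
    return P(set(e1) & set(e2)) / P(e2)


even, odd, low = {2, 4, 6}, {1, 3, 5}, {1, 2}

# Fact 2.10 ii): P(E1 or E2) + P(E1 and E2) = P(E1) + P(E2).
print(P(even | low) + P(even & low) == P(even) + P(low))
# Definition 2.12: "even" and "low" are independent.
print(P(even & low) == P(even) * P(low))
# For independent events, conditioning does not change the probability.
print(P_given(even, low) == P(even))
# Definition 2.9: "even" and "odd" are mutually exclusive.
print(P(even & odd) == 0)
```

All four comparisons hold for this choice of events; replacing `low` by `{1, 2, 3}` breaks independence while leaving Fact 2.10 intact.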
Fact 2.13. (Bayes’ theorem) If $E_1$ and $E_2$ are events with $P(E_2) > 0$, then
$$P(E_1 \mid E_2) = \frac{P(E_1)\,P(E_2 \mid E_1)}{P(E_2)}.$$
2.1.4 Complexity Theory
The main goal of complexity theory is to provide mechanisms for classifying computational
problems according to the resources needed to solve them. The classification should not
depend on a particular computational model, but rather should measure the intrinsic difficulty of the problem. The resources measured may include time, storage space, random
bits, number of processors, etc., but typically the main focus is time, and sometimes space.
Basic Definitions
Definition 2.14. An algorithm is a well-defined computational procedure that takes a
variable input and halts with an output.
It is usually of interest to find the most efficient algorithm for solving a given computational
problem. The time that an algorithm takes to halt depends on the “size” of the problem
instance. Also, the unit of time used should be made precise, especially when comparing
the performance of two algorithms.
Definition 2.15. The size of the input is the total number of bits needed to represent the
input in ordinary binary notation using an appropriate encoding scheme. Occasionally, the
size of the input will be the number of items in the input.
Definition 2.16. The running time of an algorithm on a particular input is the number
of primitive operations or “steps” executed.
Often a step is taken to mean a bit operation. For some algorithms it will be more convenient
to take a step to mean something else, such as a comparison, a machine instruction, a machine
clock cycle or a modular multiplication.
Definition 2.17. The worst-case running time of an algorithm is an upper bound on the
running time for any input, expressed as a function of the input size.
Definition 2.18. The average-case running time of an algorithm is the average running
time over all inputs of a fixed size, expressed as a function of the input size.
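To illustrate the difference between the worst-case and the average-case running time, the following sketch counts steps for linear search, taking one comparison as the step. It is a simplification under stated assumptions: the inputs of size n are restricted to searches for each of the n elements of the list 0, ..., n-1, all considered equally likely, and n is at least 1. The function names are hypothetical.

```python
from fractions import Fraction


def comparisons(items, target) -> int:
    """Running time of linear search, measured in comparisons (the 'step')."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def worst_case(n: int) -> int:
    """Largest number of comparisons over all considered inputs of size n."""
    items = list(range(n))
    return max(comparisons(items, t) for t in items)


def average_case(n: int) -> Fraction:
    """Average number of comparisons over all considered inputs of size n."""
    items = list(range(n))
    return Fraction(sum(comparisons(items, t) for t in items), n)


for n in (1, 10, 100):
    print(n, worst_case(n), average_case(n))
```

Under these assumptions the worst case is n comparisons and the average case is (n + 1)/2, so both grow linearly in the input size but differ by roughly a factor of two.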
Asymptotic notation
It is often difficult to derive the exact running time of an algorithm. In such situations
one is forced to settle for approximations of the running time, and usually may only derive
the asymptotic running time. That is, one studies how the running time of the algorithm
increases as the size of the input increases without bound.
In what follows, the only functions considered are those which are defined on the positive
integers and take on real values that are always positive from some point onwards. Let f
and g be two such functions.
Definition 2.19. (order notation)
i) (asymptotic upper bound) $f(n) = O(g(n))$ if there exist a positive constant $c$ and a
positive integer $n_0$ such that $0 \le f(n) \le c\,g(n)$ for all $n \ge n_0$.
ii) (asymptotic lower bound) $f(n) = \Omega(g(n))$ if there exist a positive constant $c$ and a
positive integer $n_0$ such that $0 \le c\,g(n) \le f(n)$ for all $n \ge n_0$.
iii) (asymptotic tight bound) $f(n) = \Theta(g(n))$ if there exist positive constants $c_1$ and $c_2$ and
a positive integer $n_0$ such that $c_1 g(n) \le f(n) \le c_2 g(n)$ for all $n \ge n_0$.
iv) (o-notation) $f(n) = o(g(n))$ if for any positive constant $c > 0$ there exists a constant
$n_0 > 0$ such that $0 \le f(n) \le c\,g(n)$ for all $n \ge n_0$.
Intuitively, $f(n) = O(g(n))$ means that $f(n)$ grows no faster asymptotically than $g(n)$ to
within a constant multiple, while $f(n) = \Omega(g(n))$ means that $f(n)$ grows at least as fast
asymptotically as $g(n)$ to within a constant multiple. $f(n) = o(g(n))$ means that $g(n)$ is
an upper bound for $f(n)$ that is not asymptotically tight, or in other words, the function
$f(n)$ becomes insignificant relative to $g(n)$ as $n$ gets larger. The expression $o(1)$ is often
used to signify a function $f(n)$ whose limit as $n$ approaches $\infty$ is 0.
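As a worked example of Definition 2.19, consider the hypothetical functions $f(n) = 3n^2 + 5n$ and $g(n) = n^2$, which are not taken from the thesis and serve only to show how the witnesses $c$ and $n_0$ are chosen.

```latex
\begin{align*}
&\text{Upper bound: } 5n \le n^2 \text{ for } n \ge 5, \text{ hence }
   0 \le 3n^2 + 5n \le 4n^2 \text{ for all } n \ge 5, \\
&\qquad \text{so } f(n) = O(n^2) \text{ with } c = 4,\ n_0 = 5. \\
&\text{Lower bound: } 0 \le 3n^2 \le 3n^2 + 5n \text{ for all } n \ge 1, \\
&\qquad \text{so } f(n) = \Omega(n^2) \text{ with } c = 3,\ n_0 = 1. \\
&\text{Tight bound: } 3n^2 \le f(n) \le 4n^2 \text{ for all } n \ge 5, \\
&\qquad \text{so } f(n) = \Theta(n^2) \text{ with } c_1 = 3,\ c_2 = 4,\ n_0 = 5. \\
&\text{o-notation: } n = o(n^2), \text{ since for any } c > 0 \text{ we have }
   0 \le n \le c\,n^2 \text{ for all } n \ge n_0 = \lceil 1/c \rceil. \\
&\qquad \text{In contrast, } f(n) \ne o(n^2), \text{ because } f(n) \ge 3n^2
   \text{ violates the condition for every } c < 3.
\end{align*}
```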
2.2 Provable Security
The first part of this section focuses on the basic definitions and concepts of provable
security. Then in Section 2.2.5 we present two approaches taken from complexity theory
used to evaluate the level of security of cryptographic schemes.
2.2.1 Basic Definitions
The main goal of cryptography is to enable secure communication by using cryptographic
schemes (or protocols). A common way to design a scheme is to choose or build secure
atomic² primitives and then to design the scheme on top of them, in such a way that the
scheme can “inherit” security from these atomic primitives. By an atomic primitive we
mean either a problem which is considered to be computationally hard (e.g. the discrete
logarithm problem or the integer factorization problem) or a secure cryptographic construction such
as a block cipher, permutation or compression function. A problem that can arise in
cryptographic scheme design is that, even if a good underlying atomic primitive is used,
a poor design can result in an insecure scheme. The usual way to investigate whether a
scheme inherits the desired security properties from the underlying primitives is by means of
provable security. The idea of provable security was introduced in 1984 by Goldwasser and
Micali [GM84] in the context of asymmetric encryption. Theoreticians often point out that the
term “provable security” is somewhat misleading, because we do not actually
provide an absolute proof of security. We simply provide a reduction of the security
of the scheme to the security of some underlying atomic primitive. A term that better
reflects the essence of this approach is the reductionist approach.
2.2.2 The Provable Security Paradigm
In order to provide a security proof we need to:
1. Introduce a formal adversarial model for a concrete security goal.
2. Formally define the security notion we want to achieve in the chosen adversarial model.
3. Exhibit a security reduction which shows that the only practical way to defeat the
scheme is to break the underlying atomic primitive.
This practically means that if we find some weakness in the scheme, we also find
a weakness in the underlying atomic primitive. Conversely, if we believe that the
atomic primitive is secure, then we know that the scheme must be secure with respect
to the desired security notion. From the point of view of cryptanalysis, this implies that
the focus should be on the atomic primitive. To summarize, there are two principal aims of
the provable security approach. The first is associated with the introduction of notions and
their definitions, which in practice entails a classification of protocols and atomic primitives,
while the second is related to the actual reduction.
² In this context the term “atomic” means that the primitive in question cannot be used alone to solve a
specific cryptographic problem. Commonly, it is used as a building block of higher-level primitives.
2.2.3
Assumptions
When the provable security approach is used, one needs to be aware that proven security
does not exclude the possibility of attack. Crucially, the scheme is proven secure under a
certain assumption. If the assumption is not satisfied, the results obtained by
the proof become irrelevant. This does not have to lead to a practical attack; it only means
that the proof of security is no longer useful. This further implies that a proof of security
is more valuable when the assumption is weaker. The strength of the assumption thus gives us
a way to compare security reduction results (e.g. if two schemes
are proven secure, the one making weaker assumptions is preferable). However, it is not
always possible to compare the strength of assumptions.
2.2.4
Standard and Ideal Model
In cryptography the standard model is the model of computation in which the adversary
is only limited by the amount of time and computational power available. As we pointed
out above, cryptographic schemes are often based on complexity assumptions³. Schemes
whose security reduction is possible using only complexity assumptions are said to be secure
in the standard model. Although a proof in the standard model brings stronger security
guarantees than other techniques, it is quite difficult to complete this type of proof in practice.
Therefore, in many proofs, an ideal model is used where cryptographic primitives (e.g.
block cipher, permutation, compression function) are replaced by their idealized versions.
Probably the best known technique of this kind is the random oracle model.
³ By a complexity assumption we mean an assumption on the hardness of the underlying problem
(e.g. the discrete log problem, the integer factorization problem).
2.2.5
Complexity Theory Techniques
Asymptotic Security
In the theoretical literature, complexity theory is widely used. There one talks about
polynomial-time adversaries and negligible success probabilities. In this setting, a scheme
needs to be designed with polynomial-time algorithms. Then polynomial-time reductions
can be exhibited from the assumption on the computational hardness of the underlying
problem to an attack on the security notion. Generally speaking, by exhibiting a
polynomial-time reduction from A to B, we can show that problem B is at least as hard as A. A
polynomial security result claims that a scheme is secure for sufficiently large values of the
security parameter, without suggesting any specific values for it.
Concrete Security
The polynomial-time approach is convenient in the theoretical domain, but in practice
the preferred approach is to provide a concrete number for the security parameter.
Such a number needs to suggest, for example, how large the security parameter should be
so that a polynomial-time adversary that makes a certain number of queries to the public
algorithms of the scheme succeeds with only a small probability. This framework is called
the concrete security framework and it captures the quantitative nature of security. Another
aspect of the concrete security approach is how much of the strength of the underlying
atomic primitive is preserved in its transformation to the scheme.
Security Resistance and Attacks
In the provable security framework, attacks and security resistance complement
each other. Attacks measure the degree of insecurity while quantitative bounds measure the
degree of security. More precisely, while a proof of security provides a lower bound,
cryptographic attacks provide an upper bound on the complexity of breaking the scheme
under some assumption. When these two bounds meet, the security of the scheme with respect
to that property is determined and the bound is said to be tight.
2.3
Hash Functions
First, we briefly introduce basic definitions and notions of hash functions together with
their design strategies. Then, in Section 2.3.2, we present the main security properties
of hash functions. Generic second preimage attacks against the Merkle-Damgård mode
of operation are discussed in Section 2.3.3. In Section 2.3.4 we introduce existing design
strategies of compression functions, while in Section 2.3.5 the most significant new modes of
operation are discussed. Finally, Section 2.3.6 and Section 2.3.7 present the proof techniques
and the security model which is used later in our security analysis.
2.3.1
Basic Definitions
Definition 2.20. A hash function is a deterministic function that maps an input of finite
arbitrary size to an output of finite fixed size. Formally, $H: \{0,1\}^* \to \{0,1\}^n$.
Definition 2.21. A compression function is a deterministic function that maps an input of
finite fixed size $s$ to an output of finite fixed size $p$, where $s > p$.
The procedure that describes how a compression function should be used in order to allow
secure hashing of arbitrarily long inputs is called a mode of operation. One of the first
proposals which included an iteration of a compression function was made by Rabin [Rab78].
The advantages of this iterative approach are linear time complexity in the message size and
modest memory requirements. Later, Merkle defined the three main security notions
of hash functions in his PhD thesis [Mer79]. These basic notions of security were revisited
and formalized in a wider context in [RS04, AS11]. At this point, commonly used informal
definitions of the three main security notions are provided:
• collision resistance (Coll) - it is hard to find any two distinct inputs $M$ and $M'$
which hash to the same output, i.e., such that $H(M) = H(M')$.
• second-preimage resistance (Sec) - it is hard to find any second input which has
the same output as any specified input, i.e., given $M$, to find a second preimage
$M' \neq M$ such that $H(M) = H(M')$.
• preimage resistance (Pre) - given an output $Y$, it is hard to find any input which hashes
to that output, i.e., to find any preimage $M'$ such that $H(M') = Y$.
2.3.1.1
Merkle-Damgård Mode of Operation
As indicated previously, Rabin [Rab78] introduced an iterative hash function design based
on the DES block cipher. The algorithm goes as follows: first, a message $M$ is divided into
message blocks $M = m_1 \| m_2 \| \ldots \| m_{k-1} \| m_k$ of fixed size. Then, the hash function is
computed in an iterative manner: $h_i \leftarrow f(h_{i-1}, m_i)$, where $f(h_{i-1}, m_i) = \mathrm{DES}_{m_i}(h_{i-1})$
and $h_0 = IV$. Finally, the hash function returns the hash value $H(M) = h_k$. Later it was
shown that the use of an unspecified $IV$ leads to trivial second preimage and collision attacks:
a colliding message is found by removing the first input block and selecting $h_1$ as the $IV$.
In addition, trivial preimage attacks are possible under the assumption that the $IV$ can be
chosen by the adversary. Merkle and Damgård independently offered a solution to address
these problems [Mer90, Dam90]. Their idea was to fix a default value for the $IV$ and to use
a padding scheme with the message length appended at the end. Each of them offered a
different padding scheme. Merkle’s padding scheme emerged as the standard due to its higher
efficiency, as a smaller number of padded bits is needed in the case of large messages.
Figure 2.1. The Merkle-Damgård construction.
The Merkle-Damgård mode of operation constructs a hash function $H^f: \{0,1\}^* \to
\{0,1\}^n$ by iterating a compression function $f: \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$. Padding is
achieved by appending to the original message a single ’1’ bit followed by as many ’0’
bits as needed to complete an $m$-bit block after embedding the message length at the
end. Adding the message length in the last block and using a fixed $IV$, the so-called
strengthening, is the crucial ingredient in establishing the collision-resistance preservation
of Merkle-Damgård. This iterative design, known as the Merkle-Damgård construction, is
the most commonly used mode of operation in hash functions.
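The iteration and the strengthening described above can be illustrated with a minimal Python sketch. It is not the construction of any particular standard: the block size, the all-zero IV, the 64-bit length field and the stand-in compression function `f` (built here from SHA-256 purely for convenience) are all assumptions made for illustration.

```python
import hashlib

BLOCK = 64   # m = 512 bits per message block (assumed)
STATE = 32   # n = 256 bits of chaining value (assumed)
IV = bytes(STATE)  # fixed, publicly known initial value (assumed all-zero)

def f(h: bytes, m: bytes) -> bytes:
    """Stand-in compression function {0,1}^n x {0,1}^m -> {0,1}^n."""
    assert len(h) == STATE and len(m) == BLOCK
    return hashlib.sha256(h + m).digest()

def pad(message: bytes) -> bytes:
    """Append a '1' bit, then '0' bits, then the 64-bit message length."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"
    # Zero-fill so that the 8-byte length field ends exactly on a block boundary.
    padded += bytes((-len(padded) - 8) % BLOCK)
    return padded + bit_len.to_bytes(8, "big")

def md_hash(message: bytes) -> bytes:
    """Iterate h_i = f(h_{i-1}, m_i) starting from h_0 = IV and return h_k."""
    h = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK):
        h = f(h, padded[i:i + BLOCK])
    return h
```

Because the message length is encoded in the last block, two messages of different lengths always differ in their final padded block, which is the property the strengthening argument relies on.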
2.3.1.2
Random Oracles
The difficulty of exhibiting proofs under complexity assumptions has led cryptographers
to introduce a construction with well-understood properties which can be used whenever
a cryptographic hash function is required. The first choice for such a construction is the
random function. Fiat and Shamir [FS86] first suggested the random oracle framework,
which was later formally introduced by Bellare and Rogaway [BR93].
Definition 2.22. A random oracle is a public hash function that maps inputs of arbitrary
size to outputs of finite size, $R: \{0,1\}^* \to \{0,1\}^n$, where the outputs are drawn uniformly
at random from the range space and are accessible to all algorithms in a black-box manner.
In a reductionist security proof an underlying hash function can be replaced with the random
oracle. The random oracle model allows us to prove efficient-in-practice cryptographic
schemes secure, which sometimes can be provably impossible in the standard model. Still,
when using the random oracle model one needs to be aware that the random oracle assumption
is the strongest assumption possible for hash functions. As a consequence, the security
guarantees provided by the random oracle model are not as strong as those obtained in
the standard model. What are the advantages of this approach? Firstly, it allows building
efficient schemes. Furthermore, even though the random oracle assumption is strong,
the results obtained in the random oracle model provide valuable security guarantees (e.g.
they provably exclude certain generic attacks and indicate the absence of security flaws in the design).
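In proofs and simulations a random oracle is usually realized by lazy sampling: an output is drawn only when an input is queried for the first time, and remembered afterwards. The following Python sketch illustrates this idea; the class name `RandomOracle` and the default output length are hypothetical choices, not notation from the literature cited above.

```python
import secrets

class RandomOracle:
    """Lazily sampled random function R: {0,1}^* -> {0,1}^n."""

    def __init__(self, n_bits: int = 128):
        self.n_bytes = n_bits // 8
        self.table = {}  # remembers every answer given so far

    def query(self, message: bytes) -> bytes:
        # A fresh input gets a uniformly random output; a repeated input
        # gets the same answer as before, so R behaves like a fixed function.
        if message not in self.table:
            self.table[message] = secrets.token_bytes(self.n_bytes)
        return self.table[message]
```

From the point of view of any algorithm with black-box access, this object is indistinguishable from a function whose whole table was sampled in advance.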
2.3.2
Security Properties
In this section we formally introduce basic security notions of hash functions as well as
their expected security levels. We take notations and terminology from [And10, Bou11].
2.3.2.1
Formal Security Notions
Recall that the goal of this work is to obtain reduction results on the security of
particular hash functions. As indicated in Section 2.2, in order to make a reduction possible
we need to introduce a formal adversarial model in which the security notion of the scheme (a
hash function in this case) is defined.
The formal definitions for hash functions are so-called attack-based
definitions. Typically, attacks are defined through a game between a challenger and the
attacker, where the challenger’s task is to simulate the environment of the adversary $A$ and
to generate the secret system parameters. Usually the adversarial advantage is measured by
the success probability of the adversary. For a security property $xxx \in
\{Coll, Pre, Sec\}$ of the hash function $H$, we denote the adversarial advantage in breaking
that property by $\mathbf{Adv}^{xxx}_H(A)$. We write $\mathbf{Adv}^{xxx}_H(t)$ to denote the maximum advantage of
any adversary with time complexity at most $t$. While the length of the first preimage $M$
is $2^L$ blocks following NIST’s security requirements, throughout this thesis the length is
denoted by $\lambda$ (in bits) and $k$ (in blocks), where $\lambda/m \approx k = 2^L$.
Definition 2.23. Let $\lambda, n \in \mathbb{N}$ and let $H: \{0,1\}^* \to \{0,1\}^n$ be a hash function. Then, the
advantage of the adversary $A$ against collision is
$$\mathbf{Adv}^{Coll}_H(A) = \Pr\left[ (M, M') \xleftarrow{\$} A(\cdot) : M \neq M' \text{ and } H(M) = H(M') \right].$$
The advantage of the second preimage adversary $A$ is defined as
$$\mathbf{Adv}^{Sec[\lambda]}_H(A) = \Pr\left[ M \xleftarrow{\$} \{0,1\}^\lambda ;\ M' \xleftarrow{\$} A(M) : M \neq M' \text{ and } H(M) = H(M') \right].$$
The advantage of the preimage adversary $A$ is defined as
$$\mathbf{Adv}^{Pre}_H(A) = \Pr\left[ M \xleftarrow{\$} \{0,1\}^\lambda ;\ Y \leftarrow H(M);\ M' \xleftarrow{\$} A(Y) : H(M') = Y \right].$$
These are commonly used formal definitions for the keyless second preimage and preimage
notions. An attempt to formalize collision resistance in a similar fashion faces fundamental
difficulties. The problem lies in the fact that for any hash function there always exists an
efficient collision-finding algorithm (one that simply outputs a hard-coded colliding pair),
but we humans are simply not able to find it. One
solution to formalize collision resistance in the standard model was offered by Rogaway
[Rog06]. The main idea behind his proposal was to provide a security reduction for the case
when a hash function is used as a building block of a higher-level primitive. This reduction
means that as long as humans are not able to find a collision for the hash function, the
higher-level primitive cannot be broken by hash function collisions.
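To make the game-based style of the second preimage definition concrete, the experiment can be written as a short program. The sketch below is only an illustration under strong simplifying assumptions: `toy_hash` is a hypothetical 16-bit hash obtained by truncating SHA-256, so that the brute-force adversary terminates quickly, and $\lambda$ is fixed to 64 bits.

```python
import hashlib
import secrets

N_BITS = 16  # deliberately tiny digest so the search finishes quickly (assumed)

def toy_hash(message: bytes) -> bytes:
    """Hypothetical n-bit hash: SHA-256 truncated to N_BITS."""
    return hashlib.sha256(message).digest()[: N_BITS // 8]

def brute_force_adversary(target: bytes) -> bytes:
    """Adversary A(M): try counters until one collides with H(M)."""
    goal = toy_hash(target)
    counter = 0
    while True:
        candidate = counter.to_bytes(8, "big")
        if candidate != target and toy_hash(candidate) == goal:
            return candidate
        counter += 1

def sec_game(adversary, lam_bits: int = 64) -> bool:
    """One run of the Sec[lambda] experiment; returns True if A wins."""
    m = secrets.token_bytes(lam_bits // 8)   # M <-$ {0,1}^lambda
    m_prime = adversary(m)                   # M' <-$ A(M)
    return m != m_prime and toy_hash(m) == toy_hash(m_prime)
```

The advantage of an adversary is the probability that `sec_game` returns `True`; the brute-force adversary always wins, but it needs on the order of $2^n$ hash evaluations, which is what makes the attack infeasible for realistic $n$.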
2.3.2.2
Expected Security
Now that we have defined the relevant security notions, we need to see what security level can be expected.
We want to show security results for hash functions in general, which means that we do
not want to focus on any particular hash design. In order to achieve this, we consider a
hash function which acts as a random oracle.
Preimage and Second Preimage Resistance. It is easy to show that any adversary who
is trying to find a (second) preimage succeeds with probability at most $q/2^n$ after sending $q$
queries to the random oracle. Each query to the random oracle returns a uniformly
random output of size $n$. This implies that each query has probability $2^{-n}$ to yield a
(second) preimage. This means that when we consider a hash function as a random
oracle, the problem of finding a second preimage is just as hard as the problem of inverting
the hash function.
Collision Resistance. Results for collision resistance are a bit different due to the birthday
paradox. Intuitively, it is much easier to find any pair of inputs which hash to the
same output than to find an input which hashes to the same output as one particular input
selected beforehand. The birthday problem estimates the probability that in a set of randomly
chosen people (fewer than 365) some pair shares the same birthday, under the assumption that all
birthdays are equally probable. The probability that such a pair exists is higher than
50% if there are 23 persons in the set. If we compare the number of possible dates (365)
and the number of people required (23), we can see that 23 is roughly the square root
of 365. If we map the birthday paradox to our collision problem, such
that our range consists of $2^n$ possible values, it follows that after about $\sqrt{2^n} = 2^{n/2}$ queries to the hash
function, a collision is found with probability close to 50%. We can also look
at this problem from a different angle. If an adversary is trying to find a collision, then after sending
$q$ queries to the random oracle he knows $q(q-1)/2$ pairs, and each pair results in a collision with
probability $2^{-n}$. This implies that a collision is expected after about $2^{n/2}$ queries [Wag02].
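The two views of the birthday bound can be checked numerically. The Python sketch below computes the exact collision probability for uniformly random samples and the pair-counting estimate $q(q-1)/2 \cdot 2^{-n}$; the function names are hypothetical and chosen only for this illustration.

```python
def birthday_collision_probability(q: int, space: int) -> float:
    """Exact probability that q uniform samples from `space` values collide."""
    p_distinct = 1.0
    for i in range(q):
        p_distinct *= (space - i) / space
    return 1.0 - p_distinct

def pair_bound(q: int, n_bits: int) -> float:
    """Union bound from the text: q(q-1)/2 pairs, each colliding w.p. 2^-n."""
    return min(1.0, q * (q - 1) / 2 / 2 ** n_bits)
```

For example, `birthday_collision_probability(23, 365)` exceeds one half while the value for 22 people does not, and for $n = 16$ and $q = 2^{8}$ the exact probability stays below the pair-counting bound, as a union bound requires.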
2.3.3
Generic Attacks Against Merkle-Damgård Mode of Operation
Cryptanalysis of modes of operation has intensified significantly over the years. As a result,
several generic⁴ attacks against the Merkle-Damgård mode of operation were introduced
(e.g. the length extension attack, Joux’s multicollision attack, etc.). In our work, we are
interested in generic second preimage attacks.
We defined second preimage resistance as the security notion which captures the difficulty
of finding any second message input which has the same output as any previously specified
message input. For a long time it was thought that a Merkle-Damgård based hash function
with strengthening preserved second preimage resistance and that it took about
$2^n$ steps (queries) to find a second preimage for a secure hash function [LM92]. However,
in 1999, Dean showed in his PhD thesis [Dea99] that this security level could not be
achieved by hash functions whose compression function allowed the easy finding of fixed
points⁵. He found a way to circumvent the strengthening by finding preimages of the same
size as the target message. Surprisingly, this important result went unnoticed until
2005, when Kelsey and Schneier [KS05] generalized Dean’s attack by using the multicollision
result of Joux [Jou04]. More precisely, they introduced a generic second preimage
attack on the Merkle-Damgård hash function that requires at most approximately $2^{n-L}$
queries, where the length of the first preimage is at most $2^L$ blocks. Later, a more flexible
generic second preimage attack was described by Andreeva et al. [ABF+08], with the
same complexity as the two mentioned before. Bouillaguet and Fouque [BF09] showed
within the provable security framework that these generic second preimage attacks against
the Merkle-Damgård construction are optimal under the assumption that the compression
function is random.
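To get a feeling for the gap between $2^n$ and $2^{n-L}$, consider the following worked example. The parameters are illustrative assumptions chosen to resemble a 256-bit Merkle-Damgård hash with 512-bit blocks and messages of at most $2^{64}$ bits; lower-order terms of the actual attacks are ignored.

```latex
n = 256,\quad m = 512 = 2^{9},\quad \lambda \le 2^{64}\ \text{bits}
\;\Longrightarrow\;
k = \lambda / m \le 2^{64} / 2^{9} = 2^{55},\ \text{so } L = 55,
\\[4pt]
2^{\,n-L} = 2^{256-55} = 2^{201} \;\ll\; 2^{256} = 2^{n}.
```

Under these assumptions a generic second preimage attack on a maximum-length message would need roughly $2^{201}$ queries instead of the $2^{256}$ expected from an ideal hash function.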
2.3.4
Compression Function Building Strategies
A compression function is commonly built on top of a block cipher or a limited number of
permutations. Although block ciphers are primarily designed for encryption, they are used
as a building block of compression functions because of their well-understood properties
and design.
⁴ Attacks which are applicable to all hash functions based on a single construction design or mode
of operation are called generic attacks.
⁵ A fixed point of a function is a point that is mapped to itself by the function. In the context of hash
functions, a fixed point of a compression function $f$ is a pair $(h, m)$ such that $f(h, m) = h$.
Block Cipher Based Compression Functions
A detailed analysis of block cipher based compression functions was conducted by Preneel
et al. [PGV93]. More precisely, they analyzed the 64 most basic ways to construct a hash
function from a block cipher⁶. Furthermore, Black et al. [BRS02] proved 12 of
these 64 PGV schemes secure in an oracle model where the underlying block cipher is treated
as a random primitive. In 2009, Stam [Sta09] revisited the rate-1⁷ block cipher based hash functions
and analyzed them in a wider context. The most widely known block
cipher based compression functions are the Matyas-Meyer-Oseas (PGV1), the Miyaguchi-Preneel
(PGV3) and the Davies-Meyer (PGV5) constructions. The main drawback of this type of design is its
inefficiency.
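As an illustration of a block cipher based compression function, and of the fixed points mentioned in Section 2.3.3, the following Python sketch implements the Davies-Meyer mode $f(h, m) = E_m(h) \oplus h$ on top of a toy cipher. The 4-round Feistel cipher used here is a hypothetical stand-in with no security claim; it only serves to provide an invertible keyed permutation.

```python
import hashlib

ROUNDS = 4  # number of Feistel rounds in the toy cipher (assumed)

def _round(key: bytes, i: int, half: int) -> int:
    """Toy round function: 32 bits derived from key, round index and input."""
    data = key + bytes([i]) + half.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def encrypt(key: bytes, block: int) -> int:
    """E_key on a 64-bit block, as a balanced Feistel network."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for i in range(ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return (left << 32) | right

def decrypt(key: bytes, block: int) -> int:
    """Inverse of encrypt: undo the rounds in reverse order."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    return (left << 32) | right

def davies_meyer(h: int, m: bytes) -> int:
    """f(h, m) = E_m(h) XOR h, with the message block used as the key."""
    return encrypt(m, h) ^ h

def fixed_point(m: bytes) -> int:
    """For any block m, h = E_m^{-1}(0) satisfies f(h, m) = h."""
    return decrypt(m, 0)
```

The fixed point follows directly from the definition: with $h = E_m^{-1}(0)$ we get $f(h, m) = E_m(E_m^{-1}(0)) \oplus h = h$. This is the kind of easily found fixed point that Dean's attack exploits.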
Permutation Based Compression Functions
In order to address problems with a weak key schedule and to make compression functions more efficient, a limited number of permutations can be used instead of a block cipher. In
their paper, Black et al. [BCS05] analyzed all 2n-bit to n-bit compression functions based
on one n-bit permutation, and proved them insecure against collision and (second) preimage attacks. Later, Rogaway and Steinberger [RS08b, RS08a] together with Stam [Sta08]
extended these results to compression functions with arbitrary input and output sizes, and
an arbitrary number of underlying permutations. Moreover, they provided security bounds
which indicate the expected number of queries required to find collisions or preimages for
permutation based compression functions.
2.3.5
Other Modes of Operation
In 2004, the attacks of [WYY05, WY05] shook the confidence of the cryptographic community
in the security of the widely employed hash functions MD5 and SHA-1. This has led to an
increased interest in the field of hash functions. As a result of the research on design
strategies of hash functions, new modes of operation emerged, with different design and
security characteristics. In this section we present some of the most important modes of
operation.
⁶ These constructions are usually called PGV, which is an acronym for Preneel, Govaerts and Vandewalle.
⁷ A compression function based on a single call to a block cipher.
2.3.5.1
Wide-pipe and Narrow-pipe Design
One important aspect of hash function design is the size of the internal state with regard to
the size of final hash output. The Merkle-Damgård construction is a so-called narrow-pipe
design where the size of the internal state is the same as the size of the final hash output
($l = n$). In 2005, Lucks introduced the Wide-pipe Hash [Luc05]. The main idea behind
this design was to use an internal state of the hash function considerably larger than the hash
output ($l \gg n$). More precisely, the size of the internal state is about twice as big as the
final hash output, which is obtained by chopping at the end of the iteration. As a consequence, Lucks
was able to provide a proof that generic second preimage attacks could not be faster than
exhaustive search. A drawback of this design is its slightly higher memory
requirements. This wide-pipe strategy has been employed in several SHA-3 competition
finalists, namely Grøstl, JH, Keccak and Skein.
2.3.5.2
HAIFA
The HAsh Iterative FrAmework was introduced by Biham and Dunkelman [BD07]. HAIFA
mode is basically a modified version of Merkle-Damgård mode where slight tweaks are
employed. In order to address the problem of generic second preimage attacks against
Merkle-Damgård, the designers of HAIFA accompanied each message block in the iteration
with a counter that tracks the number of message bits hashed up to this point and a fixed optional
salt⁸. The security property preservation of the HAIFA design, among others, was investigated
by Andreeva et al. [ANPS07]. Bouillaguet and Fouque proved HAIFA to be optimally
second preimage resistant if the underlying compression function is assumed to behave like
an ideal primitive [BF09]. The HAIFA design strategy was followed by the designers of
one SHA-3 competition finalist, namely BLAKE. Consequently, security results of HAIFA for
preimage, second preimage, collision, and indifferentiability, while assuming ideality of the
underlying compression function, are applicable for the BLAKE hash function.
Figure 2.2. The HAsh Iterative FrAmework - HAIFA construction.
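The difference between HAIFA and plain Merkle-Damgård can be seen in a small Python sketch in which the compression function receives the bit counter and the salt as extra inputs. This is a simplified illustration, not the HAIFA specification: the stand-in compression function `f`, the all-zero IV, the padding rule and the way the counter is set for a block that contains only padding are assumptions.

```python
import hashlib

BLOCK = 64  # bytes per message block (assumed)

def f(h: bytes, m: bytes, bits_so_far: int, salt: bytes) -> bytes:
    """Stand-in compression function taking the HAIFA counter and salt."""
    counter = bits_so_far.to_bytes(8, "big")
    return hashlib.sha256(h + m + counter + salt).digest()

def haifa_hash(message: bytes, salt: bytes = bytes(16)) -> bytes:
    # Simplified padding (an assumption): '1' bit, zeros, 64-bit length.
    padded = message + b"\x80"
    padded += bytes((-len(padded) - 8) % BLOCK)
    padded += (8 * len(message)).to_bytes(8, "big")

    h = bytes(32)  # fixed IV (assumed all-zero)
    total_bits = 8 * len(message)
    for i in range(0, len(padded), BLOCK):
        # Counter = number of message bits hashed up to and including this block.
        bits_so_far = min(8 * (i + BLOCK), total_bits)
        h = f(h, padded[i:i + BLOCK], bits_so_far, salt)
    return h
```

Because the counter differs from block to block, the same message block processed at two different positions is fed to `f` with different inputs, which is the feature that defeats fixed-point and expandable-message based second preimage attacks.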
⁸ An input parameter of the compression function; it can be either public or secret.
2.3.5.3
Sponge
As an alternative to the Merkle-Damgård design, sponge functions were introduced by
Bertoni et al. [BDPA07]. Instead of iterating a secure compression function in order to
preserve security properties and to obtain a secure hash function, designers of sponge
functions considered a different approach where they iterate a possibly insecure compression
function a sufficient number of times to obtain a secure hash function. The internal state
iterated by sponge functions is r + c bits wide, where c is the so-called capacity. The hash
value is obtained after two phases: absorbing and squeezing. Sponge functions iteratively
“absorb” r-bit message blocks per compression function call and this process is called the
absorbing phase. Once the message is processed, the squeezing phase occurs and the first r
bits of the internal state are returned as output block in a possibly iterative manner. The
number of output blocks can be chosen by the user. The security guarantees for most
sponge-like constructions⁹ are typically based on indifferentiability results, which are
discussed in Section 3.3.3 and Section 3.3.4. The SHA-3 competition finalist based on the original
sponge function design is Keccak, while JH is regarded as a sponge-like hash function.
Figure 2.3. The sponge construction.
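The absorbing and squeezing phases can be sketched in a few lines of Python. The sketch below is an illustration only: the 128-bit state with $r = c = 64$, the byte-level padding and in particular `toy_permutation` (a few rounds of add-rotate-xor mixing) are assumptions and offer no security.

```python
import struct

RATE = 8       # r = 64 bits absorbed or squeezed per call (assumed)
CAPACITY = 8   # c = 64 bits that are never output directly (assumed)
MASK = 0xFFFFFFFF

def _rotl(x: int, s: int) -> int:
    return ((x << s) | (x >> (32 - s))) & MASK

def toy_permutation(state: bytes) -> bytes:
    """Invertible ARX mixing on four 32-bit words; NOT cryptographically secure."""
    a, b, c, d = struct.unpack("<4I", state)
    for rnd in range(8):
        a ^= rnd + 1                 # round constant, differs per round
        a = (a + b) & MASK
        d = _rotl(d ^ a, 16)
        c = (c + d) & MASK
        b = _rotl(b ^ c, 12)
        a = (a + b) & MASK
        d = _rotl(d ^ a, 8)
        c = (c + d) & MASK
        b = _rotl(b ^ c, 7)
    return struct.pack("<4I", a, b, c, d)

def sponge(message: bytes, out_len: int) -> bytes:
    # Simplified byte-level padding (an assumption): 0x01, zeros, final 0x80.
    padded = bytearray(message + b"\x01")
    padded += bytes(-len(padded) % RATE)
    padded[-1] |= 0x80

    state = bytes(RATE + CAPACITY)
    for i in range(0, len(padded), RATE):          # absorbing phase
        block = padded[i:i + RATE]
        outer = bytes(x ^ y for x, y in zip(state[:RATE], block))
        state = toy_permutation(outer + state[RATE:])

    output = b""
    while len(output) < out_len:                   # squeezing phase
        output += state[:RATE]
        state = toy_permutation(state)
    return output[:out_len]
```

Note that message blocks only ever touch the first $r$ bits of the state, and only those bits are output; the remaining $c$ bits are affected solely through the permutation.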
2.3.6
Establishing Security of Hash Functions
In this section we analyze possible techniques from the provable security aspect that can be
used to obtain security reduction results. Throughout further analysis emphasis is placed
on the second preimage resistance.
⁹ By a sponge-like hash function we mean a hash function which employs a permutation based compression function and iterates a wide internal state.
2.3.6.1
Property Preservation
In Section 2.3.1.1 we showed how a hash function should be built in order to preserve the
collision resistance from the compression function to the complete hash function. Also,
generic security is discussed in Section 2.3.3, where it is pointed out that the Merkle-Damgård
construction does not preserve second preimage resistance. Furthermore, Andreeva et
al. [ANPS07, AMP10b] analyzed, among other properties, the preservation of second preimage
resistance by various constructions. Unfortunately, only two of these constructions actually
preserve second preimage resistance, one of which is the ROX construction [ANPS07] while
the other one is BCM [AP09]. A reason why typically used constructions do not preserve
second preimage resistance is believed to be the introduction of fixed bits through
the state input by the initialization vector and possibly through the message input. Another
reason for non-preservation can be the presence of fixed padding bits. As a
consequence, the second preimage resistance of the compression function does not directly
translate to the second preimage security of a hash function based on the Merkle-Damgård
construction with final chopping.
2.3.6.2
Indifferentiability Results
In recent years, important progress in security analysis was made with the introduction
of the indifferentiability framework by Maurer et al. [MRH04]. This framework
was further developed in the context of hash functions by Coron et al. [CDMP05]. The
main principle behind this framework is as follows: in order to investigate the security of a
particular mode of operation, one can replace the underlying primitive (usually the compression
function or even an underlying building block of the compression function such as a permutation or
block cipher) with an ideal version of itself (a random function, a random permutation, an
ideal block cipher) and then compare the combination of the ideal primitive and the mode
of operation in question with the random oracle. Following this approach we can determine
whether this design is indifferentiable from a random oracle or not. A positive answer
would mean that the design behaves ideally up to a certain level. The level of resemblance
(typically expressed in the number of queries) between the concrete design and the random oracle is
regarded as an important security indicator. For us, the importance of this framework lies
in the fact that a result obtained within this framework indirectly provides bounds on
the (second) preimage and collision resistance of the hash function in question [AMP10c].
2.3.6.3
Idealized Proof Model
A reduction in the ideal model considers an information-theoretic adversary who
has only query access to the idealized underlying primitive (the compression function in this
case). Bouillaguet and Fouque [BF09] followed this approach to provide an optimal security
bound on the second preimage resistance in the ideal compression function model for the
Merkle-Damgård and HAIFA constructions. A benefit of a successfully conducted security reduction
is the guarantee that the hash function has no severe structural weaknesses, unless one
can detect a possible deviation from random behavior in the underlying compression
function. In the latter case, the security results obtained by the reduction are invalid. Also, one
needs to be aware that an ideal compression function is quite a strong assumption. In the
problematic case, when the compression function exhibits non-random behavior, the level
of modularity can be refined in order to revalidate or improve the security guarantees. In this
case, one needs to assume ideal behavior of the underlying building blocks of the compression
function (e.g. the underlying block cipher or permutation(s)). In [BCC+08], the designers
of Shabal suggested an idealized proof model to assess the collision, preimage and second
preimage resistance of Shabal. More concretely, they proved Shabal secure in the ideal
cipher model by using a graph based simulation approach. Subsequently, Fouque et
al. [FSZ09] analyzed the collision and preimage resistance of a construction identical to the
compression function of Grøstl. This analysis was performed in the ideal permutation
model. A summary of all known security reduction results for all 14 second round SHA-3
candidates in the ideal model was provided by Andreeva et al. [AMP10c]. Subsequently,
these results were revisited and updated in [AMPŠ12, ABM+12].
2.3.7
Security Model
As explained in Section 2.2, after the decision has been made on what to achieve within
the provable security framework, a formal adversarial model needs to be introduced, in which
the security notion of the scheme in question is defined. In Section 2.3.2.1 formal
definitions of the three main security notions of hash functions are provided in the standard
model. However, in the idealized proof model, where an underlying primitive of the compression
function is assumed to be ideal, these formal definitions differ slightly. Therefore, in order
to carry out a meaningful reduction we introduce the formal adversarial model which will
be used in our analysis. This setting is very similar to the analysis conducted in [BRS02,
FSZ09, AMP10c, AMPŠ12].
Let us assume that the underlying primitive of the compression function is an ideal primitive
(e.g. a random permutation, an ideal block cipher). In this model, the adversary $A$ is
a probabilistic algorithm with oracle access to a primitive sampled uniformly at random,
$P \xleftarrow{\$} Prim(H)$. The set $Prim(H)$ depends on the chosen hash function (e.g. in the case of
a permutation-based hash function $H_1$, the primitive $P$ is chosen independently and uniformly
at random from the set of all permutations $Prim(H_1)$).
We consider information-theoretic adversaries only. Hence, the adversary has unbounded
computational power and its only obstacle to succeeding in an attack is the randomness of
the query responses. The complexity is measured by the number of queries made to the
oracle. In this ideal model the adversary $A$ is allowed to make at most $q$ forward and
inverse queries to the oracle. All these queries are stored in a query history $L$ as indexed
elements. Without loss of generality, we assume that $L$ always contains the queries required
for the attack and that the adversary does not ask any oracle query to which the response
is already known. The definitions of preimage and second preimage resistance that we use in the ideal
model correspond to the everywhere¹⁰ preimage and second preimage notions of [RS04].
Definition 2.24. Let $\lambda, n \in \mathbb{N}$, let $\mathcal{Y} = \{0,1\}^n$, $\mathcal{M} = \{0,1\}^\lambda$ and let $H: \{0,1\}^* \to \{0,1\}^n$
be a hash function. Then, the advantage of the adversary $A$ against collision is
$$\mathbf{Adv}^{Col}_H(A) = \Pr\left[ (M, M') \xleftarrow{\$} A(P) : M \neq M' \text{ and } H(M) = H(M') \right].$$
The advantage of the everywhere second preimage adversary $A$ is defined as
$$\mathbf{Adv}^{eSec[\lambda]}_H(A) = \max_{M \in \mathcal{M}} \Pr\left[ P \xleftarrow{\$} Prim(H);\ M' \xleftarrow{\$} A(P) : M \neq M' \text{ and } H(M) = H(M') \right].$$
The advantage of an everywhere preimage adversary $A$ is defined as
$$\mathbf{Adv}^{ePre}_H(A) = \max_{Y \in \mathcal{Y}} \Pr\left[ P \xleftarrow{\$} Prim(H);\ M' \xleftarrow{\$} A(P) : H(M') = Y \right].$$
For $q \geq 1$ we write $\mathbf{Adv}^{xxx}_H(q) = \max\{\mathbf{Adv}^{xxx}_H(A)\}$, where the maximum is taken over all
adversaries that ask at most $q$ oracle queries and $xxx \in \{ePre, eSec, Col\}$.
Above we defined the security notions of the hash function $H$ in the formal adversarial
model. Similar definitions can be used to define the security notions of the compression
function $f$. The security analysis conducted in Chapter 3 and Chapter 4 is carried out in this
adversarial model.
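The oracle that the adversary interacts with in this model can be pictured as a lazily sampled random permutation that answers forward and inverse queries and records them in the query history. The Python sketch below illustrates this; the class name `IdealPermutation`, the small default width and the explicit budget check are hypothetical choices made for the example.

```python
import secrets

class IdealPermutation:
    """Lazily sampled random permutation on n-bit values with a query log."""

    def __init__(self, n_bits: int = 16, max_queries: int = 1000):
        self.size = 1 << n_bits
        self.max_queries = max_queries   # the query budget q
        self.fwd = {}                    # x -> P(x)
        self.inv = {}                    # y -> P^{-1}(y)
        self.history = []                # query history L, in the order asked

    def _fresh(self, used: dict) -> int:
        # Rejection sampling: uniform among the values not assigned yet.
        while True:
            v = secrets.randbelow(self.size)
            if v not in used:
                return v

    def _record(self, x: int, y: int) -> None:
        if len(self.history) >= self.max_queries:
            raise RuntimeError("query budget q exhausted")
        self.fwd[x], self.inv[y] = y, x
        self.history.append((x, y))

    def forward(self, x: int) -> int:
        """Forward query P(x); only new queries count towards the budget."""
        if x not in self.fwd:
            self._record(x, self._fresh(self.inv))
        return self.fwd[x]

    def inverse(self, y: int) -> int:
        """Inverse query P^{-1}(y); only new queries count towards the budget."""
        if y not in self.inv:
            self._record(self._fresh(self.fwd), y)
        return self.inv[y]
```

Since every answer is drawn uniformly from the values still available, the adversary learns nothing about unqueried points, so its success probability depends only on the number of entries in the history and not on its computing power.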
¹⁰ Notice that the ePre and eSec notions of [RS04] rely (w.r.t. randomness) on the key generation, while in the
keyless and ideal model setting they rely (w.r.t. randomness) on the random underlying primitive.
Chapter 3
NIST’s SHA-3 Hash Function
Competition
This chapter briefly reviews the history of the SHA family, including NIST’s
SHA-3 hash function competition. Section 3.2 presents NIST’s requirements and evaluation
criteria for the SHA-3 hash function. Additionally, Section 3.3 provides a brief introduction
to the five finalists of competition and their security and performance properties. Finally,
security and performance results are summarized in Section 3.4.
3.1 The History of SHA Family
In 1993, the US National Institute of Standards and Technology (NIST) published the first
Secure Hash Standard. Soon after publication it was withdrawn due to flaws
in the design of the Secure Hash Algorithm described in Federal Information
Processing Standards Publication (FIPS PUB) 180. That version of the Secure Hash Algorithm is commonly referred to as SHA-0. After being improved, FIPS 180-1 was published
in 1995 containing a specification of the hash function known as SHA-1. SHA-1 was
the most widely used hash function over the next decade, even though the SHA-2 standard, published in FIPS 180-2 in 2001, has better security properties than SHA-1.
SHA-2 includes a significant number of changes from its predecessor SHA-1. After a series
of attacks on SHA-1 by Wang et al. [WYY05, WY05], together with results that raised
questions about the security of the Merkle-Damgård construction [Dea99, Jou04, KS05, KK06],
NIST recommended the replacement of SHA-1 by the SHA-2 hash function family. On
November 2, 2007, NIST announced a call for the design of a new SHA-3 hashing algorithm [NIS07], similar to the development process for the Advanced Encryption Standard
(AES). The main goal of this public competition is to develop a new, secure cryptographic
hash algorithm, as a standard that can be used in generating digital signatures, message
authentication codes, and many other hash function applications. The selected algorithm
is intended to be available royalty-free worldwide. NIST defines three categories of evaluation criteria that will be used to compare candidate algorithms throughout the SHA-3
competition: 1) security, 2) cost and performance, and 3) algorithm and implementation
characteristics. The new hash algorithm will be referred to as “SHA-3”.
Sixty-four candidates, mostly from Europe and North America, were submitted to the hash
function competition by October 31, 2008. The preliminary cryptanalysis showed that fifty-one candidate algorithms met the minimum submission requirements. These candidates
were selected for the first round at the end of 2008. Later, on July 24, 2009, after public
feedback and internal reviews of the first-round candidates, NIST selected fourteen second-round candidates using the previously defined evaluation criteria. At the end of 2010, after one
year of public review, NIST announced five SHA-3 finalists: BLAKE, Grøstl, JH, Keccak,
and Skein. In order to improve their hash functions, submitters of the finalist algorithms
were allowed to make minor modifications to their algorithms and submit the final packages
to NIST by January 16, 2011. As in the previous rounds, a one-year public comment
period is planned for the finalists. NIST plans to choose a winner of the SHA-3 competition
in 2012.
3.2 SHA-3 Security Requirements and Evaluation Criteria
NIST specifies security as the competition’s most important evaluation criterion [NIS07].
Moreover, it defines security requirements which the future SHA-3 hash algorithm is
expected to fulfil. The minimum security requirements that NIST expects from a
SHA-3 hash function with hash value size n are:
1. collision resistance of approximately n/2 bits,
2. preimage resistance of approximately n bits,
3. second preimage resistance of approximately n − L bits, where the length of the first
preimage is at most $2^L$ blocks,
4. resistance to length-extension attacks,
5. any m-bit hash function specified by taking a fixed subset of the candidate function’s
output bits should meet the above requirements with m replacing n.
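The arithmetic behind requirements 1 to 3 can be sketched as follows. This is only an illustrative helper: the function name `sha3_minimum_security_bits` and the returned dictionary keys are assumptions made for this example, and n is assumed to be even, as it is for the digest sizes considered in the competition.

```python
def sha3_minimum_security_bits(n: int, L: int) -> dict:
    """Approximate minimum security levels (in bits) for an n-bit digest.

    L is the base-2 logarithm of the maximum first-preimage length in blocks,
    i.e. the first preimage has at most 2**L blocks.
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("n is assumed to be a positive even digest size")
    if not 0 <= L <= n:
        raise ValueError("L must lie between 0 and n")
    return {
        "collision": n // 2,         # requirement 1
        "preimage": n,               # requirement 2
        "second_preimage": n - L,    # requirement 3
    }


if __name__ == "__main__":
    # Example: a 256-bit digest and first preimages of up to 2^64 blocks.
    print(sha3_minimum_security_bits(256, 64))
```

Requirement 5 corresponds to calling the same helper with m in place of n for a truncated m-bit variant.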
As explained in Section 2.3.2.2 and Section 2.3.3, a standard hash function is expected to
satisfy these specified requirements. Certainly, an increase of second preimage resistance
(from approximately n − L bits up to resistance of approximately n bits) and resistance
against other attacks, such as multi-collision attacks, is seen as an advantage by NIST. Any
result that shows that the candidate hash function does not meet the specified requirements
is considered to be a serious attack. Therefore, special attention has to be directed towards
newly developed attacks. This is of great importance, especially if the level of security of
the hash function is lower than claimed by the submitter.
A good place to start security analysis is by checking the soundness of the mathematical
basis. This analysis can provide a good indication of the hash function design quality. To
select the best candidate, each submitted hash function is compared with other candidates
(of the same hash length) based on provided security results, regarding (second) preimage
resistance, collision resistance, and resistance to generic attacks. One additional security
property raised by the public during the evaluation process is the extent to which the
algorithm output is indifferentiable from a random oracle (see Section 2.3.6.2). In summary, those candidates whose preliminary security analysis raised concerns were discarded
from the competition. Similarly, designs that had not received much feedback from the
cryptographic community were also considered doubtful and were discarded as well.
3.3 The Competition Finalists
In this section we present the five finalists. Besides their main characteristics, we provide
security properties and performance results of each finalist based on earlier works by Andreeva et al. [AMP10c, ABM+12, AMPŠ12] and Turan et al. [TPB+11].
3.3.1 BLAKE
The BLAKE hash function [AHMP10] uses HAIFA as its iteration mode. BLAKE’s compression
function (see Figure 3.1) maintains a large inner state initialized with the internal state $h_{i-1}$,
the salt S, and the counter $C_i$. The compression function then iterates a series of message-dependent rounds. After these rounds, the new internal state is obtained by compressing
the inner state together with the old internal state and the salt. This internal design is a so-called local wide-pipe, inspired by Lucks’ wide-pipe design [Luc05]. The compression
algorithm used in BLAKE is a modified version of Bernstein’s stream cipher ChaCha [Ber08].
Figure 3.1. BLAKE’s compression function.
Security of BLAKE
As noted before, the security results of HAIFA (see Section 2.3.5.2) carry over to the BLAKE hash function under an idealness assumption on the compression function. Nevertheless, Andreeva et al. [ALM11] and Chang et al. [CNY11] independently showed that BLAKE’s compression function is differentiable from a random compression function after about $2^{n/4}$ queries. This implies that BLAKE’s compression function exhibits non-random behavior and, as a consequence, the HAIFA security results in the ideal compression function model are invalid for the BLAKE hash function (see Section 2.3.6.3). In order to restore BLAKE’s security guarantees, Andreeva et al. [ALM11] refined the level of modularity in the security analysis and revalidated the security results in the ideal cipher model. First, they proved optimal security bounds on the compression function, $\mathrm{Adv}^{\mathrm{Col}}_f = \Theta(q^2/2^n)$ and $\mathrm{Adv}^{\mathrm{ePre}}_f = \Theta(q/2^n)$. Due to the collision and everywhere preimage preservation of the HAIFA design, these security results carry over from BLAKE’s compression function to the BLAKE hash function. The everywhere second preimage property of BLAKE¹ was analyzed directly in the ideal cipher model and, as a result, BLAKE was proved optimally second preimage resistant, $\mathrm{Adv}^{\mathrm{eSec}}_H = \Theta(q/2^n)$. Finally, the BLAKE hash function is proved indifferentiable from a random oracle in the ideal cipher model [ALM11, CNY11].
Performance of BLAKE
The BLAKE hash function, as classified by NIST [TPB+11], is one of the top performers in software across most platforms, while in hardware its performance is labeled as average. In
constrained environments, BLAKE is described as one of the top performers in speed with relatively modest memory requirements. Moreover, BLAKE has a structure that allows flexible
designs.
¹ The everywhere second preimage property is not preserved by the HAIFA design, as shown in
[ANPS07].
3.3.2 Grøstl
The Grøstl hash function [GKM+11] uses a wide-pipe Merkle-Damgård construction with
a final transformation employed before chopping. Its compression function is based on two
AES-like, fixed and distinct permutations. All nonlinearity in the design is derived from
the AES S-box. Since the security of the compression function is not optimal, the Grøstl designers
employed a final transformation, applied before the chopping, which is believed to be one-way
and collision resistant but does not compress. The reader is referred to Section 4.1 for a detailed
description.
Figure 3.2. The Grøstl hash function.
Security of Grøstl
At the center of the Grøstl security analysis is its permutation-based compression function. Fouque et al. [FSZ09] introduced a specific 2-permutation-based construction and analyzed its collision and preimage resistance. Grøstl’s compression function is based on this particular construction. Their results allow us to claim tight security bounds on the compression function for collision resistance, $\mathrm{Adv}^{\mathrm{Col}}_f = \Theta(q^4/2^l)$, and preimage resistance, $\mathrm{Adv}^{\mathrm{ePre}}_f = \Theta(q^2/2^l)$. Following the same arguments as in the security analysis of BLAKE, optimal bounds are obtained on collision and everywhere preimage resistance for the Grøstl hash function. Furthermore, the Grøstl hash function is proven indifferentiable from a random oracle if the underlying permutations are ideal [AMP10a]. The bound on second preimage resistance of Grøstl was previously unknown. In Chapter 4 we analyze the everywhere second preimage resistance of Grøstl in the ideal permutation model and obtain the bound $\mathrm{Adv}^{\mathrm{eSec}}_H = \Theta(q/2^{n-L})$.
Performance of Grøstl
In [TPB+11], Grøstl is marked as an average performer in software across most platforms,
while in hardware Grøstl’s performance is seen as above-average. In constrained environments, Grøstl has poor performance with modest memory requirements. It can also be
noted that Grøstl has a flexible structure that allows various area trade-offs.
3.3.3 JH
The JH hash function [Wu11] is a novel design that to an extent resembles a sponge construction. It can be viewed as sponge-like because it employs a compression function based on a fixed permutation
and uses the wide-pipe Merkle-Damgård construction with final chopping as its iteration mode, where the message size is m, the hash value size is n, and the
internal state size l satisfies l = 2m ≥ 2n. The permutation P is based on the AES design.
All hash value sizes of JH use the same function; each member of the JH
family is selected by using its corresponding IV.
Figure 3.3. JH’s compression function.
Security of JH
As a consequence of the results of Black et al. [BCS05], the JH compression function is insecure in the ideal permutation model. Confirming this claim, collisions and preimages can be found for the JH compression function in one query to the permutation. Lee and Hong [LH11] proved that the JH hash function is optimally collision resistant, $\mathrm{Adv}^{\mathrm{Col}}_H = \Theta(q^2/2^n)$. Andreeva et al. [AMPŠ12] proved optimal bounds for preimage and second preimage resistance of JH for the n = 256 variant, while the bounds on preimage and second preimage resistance for the n = 512 variant are improved but still not optimal. Furthermore, the JH hash function is proven indifferentiable from a random oracle if the underlying permutation is assumed to be ideal [BMN10]. Later, Moody et al. [MPST12] improved the indifferentiability bound on JH and confirmed the (second) preimage results obtained in [AMPŠ12].
Performance of JH
In [TPB+11], JH is described as an average to above-average performer in software and
hardware, while in constrained environments JH is regarded as average in performance.
Also, JH has modest memory requirements.
3.3.4 Keccak
The Keccak hash function [BDPA11] follows the sponge construction [BDPA07], but can
also be considered as a Merkle-Damgård construction with final chopping. It uses a single
large fixed permutation. The permutation can be seen as a combination of a linear mixing
operation and a very simple nonlinear mixing operation. What is interesting regarding this
hash design is that it uses a single design for variable hash output sizes.
Security of Keccak
Similarly to JH, the compression function of Keccak is based on one permutation and the same results apply: collisions and preimages can be found for Keccak’s compression function in one query to the permutation. The sponge construction is proven indifferentiable from a random oracle if the underlying permutation is assumed to be ideal [BDPA08], and this result applies to Keccak. As noted in Section 2.3.6.2, an indifferentiability bound yields bounds on the other security properties. Following this approach, an optimal bound is obtained on collision resistance, $\mathrm{Adv}^{\mathrm{Col}}_H = \Theta(q^2/2^n)$, as well as on preimage and second preimage resistance, $\Theta(q/2^n)$, for Keccak in the ideal permutation model [AMP10c].
Performance of Keccak
The Keccak hash function is described by NIST [TPB+11] as an average performer in
software, while the hardware performance of Keccak is regarded as excellent. In constrained
environments, Keccak is below-average in performance with modest memory requirements.
Keccak is highly parallelizable due to its design.
3.3.5 Skein
The Skein hash function [BKL+10] builds on Unique Block Iteration (UBI). UBI mode
hashes an arbitrary-length string by iterating a compression function, which takes as input
an internal state, a message block, and a tweak. The compression function is based on the
Threefish tweakable block cipher in Matyas-Meyer-Oseas mode, as can be seen in Figure
3.4. The tweak encodes the number of bytes processed up to this point, the type of UBI mode, and
special flags for the first and the last block. Skein supports variable output sizes. If a single
output block is not enough, Skein runs the output transformation several times. The most
innovative parts of Skein are the Threefish block cipher and the mode of operation. The
reader is referred to Section 4.2 for a detailed description.
Figure 3.4. Hashing a three-block message using UBI mode.
Security of Skein
Due to the optimal security bounds on the compression function claimed by the submitters [BKL+09] and the property that Skein’s mode of operation preserves collision resistance and everywhere preimage resistance, optimal bounds for these two properties are obtained. Furthermore, the Skein hash function is proven indifferentiable from a random oracle if the underlying tweakable block cipher is assumed to be ideal [BKL+09]. As derived in [AMP10c], this indifferentiability result yields a bound of $O\left(\frac{q}{2^n} + \frac{q^2}{2^l}\right)$ on the second preimage resistance. This second preimage bound for Skein is optimal for the n = 256 variant, while for the n = 512 variant it is not. In Chapter 4 we improve the bound on second preimage resistance to $\mathrm{Adv}^{\mathrm{eSec}}_H = \Theta(q/2^n)$ in the ideal cipher model.
Performance of Skein
NIST [TPB+11] rated Skein’s performance in software as above-average across most platforms, particularly in 64-bit mode. In hardware, Skein’s throughput-to-area ratio is average to a little below-average. Results in constrained environments show that Skein has
above-average performance. Skein has modest memory requirements and benefits from the
pipelining used in modern processors.
3.4 A Summary of the Existing Results
In this section we provide a summary of the previously mentioned results. First, Section
3.4.1 presents the main advantages and drawbacks of the finalists recognized by NIST
[TPB+11], and then Section 3.4.2 provides a schematic summary of security and performance results.
3.4.1 Factors of Favorability
BLAKE was promoted to the final round of NIST’s SHA-3 hash function competition due
to its high security margin, good performance in software, and its simple and clear
design.
Grøstl was chosen as a finalist because of its well-understood design and solid performance,
especially in hardware. Although the security properties of Grøstl are not ideal, the
amount of cryptanalysis that has been published on Grøstl and its building blocks
provides a degree of security in this design.
JH was selected as a finalist because of its solid security properties, good all-around performance, and innovative design. As drawbacks of the JH design, NIST points to its not
well-understood compression function construction and the lack of analysis provided for this construction.
Keccak was selected by NIST for the final of competition, mainly due to its good security
properties, its high throughput and throughput-to-area ratio and the simplicity of its
design.
Skein advanced to the final, mainly due to its high security margin and speed in software.
3.4.2 A Summary of the Security and Performance Results
In Table 3.1 we briefly summarize the performance results presented in [TPB+11], in order
to provide insight into this important evaluation aspect. Let us emphasize that the
description of the performance level (high, average and low) does not imply drastically different
performance, considering that all these results are within the satisfactory range
expected by NIST.
Table 3.1. A schematic summary of hardware and software results. The first
column indicates the name of hash function selected in the final of competition,
while the next three columns describe performance results in software, hardware
and in constrained environments, respectively.
          Software   Hardware   Constrained settings
BLAKE     High       Average    High
Grøstl    Average    High       Low
JH        Average    Average    Average
Keccak    Average    High       Low
Skein     High       Low        High
As for the provable security results, the summary presented in our work is based on the classification conducted by Andreeva et al. [AMP10c, AMPŠ12]. The first of these two papers deals with provable security results of all 14 second-round SHA-3 candidates, while the second paper, like this thesis, places the emphasis on the five competition finalists. Concretely, in Table 3.2 we present all security reduction results (for the n = 256 and n = 512 variants of the SHA-3 hash function finalists) known to us. We updated the second preimage results of Grøstl and Skein with those obtained in Chapter 4, which are marked in the table with a green box. A yellow box in the table indicates problems which are still open, one of which is the lack of an optimal (second) preimage bound for the 512-bit variant of JH. Essentially, all the results are provided in the ideal permutation or ideal cipher model, which means that the assumptions are weaker than the ideal compression function assumption. If we look at the security bounds on compression functions presented in this table, we can see that collisions and (second) preimages can be found for the JH and Keccak compression functions in one query to the permutation, and as a consequence these compression functions are regarded as insecure. However, this does not invalidate the security of the JH and Keccak hash functions.
Compression function f:

          Model                     Adv^Pre_f     Adv^Coll_f    Adv^Sec_f
BLAKE     Ideal cipher E            Θ(q/2^n)      Θ(q^2/2^n)    Θ(q/2^n)
Grøstl    Ideal permutations P,Q    Θ(q^2/2^l)    Θ(q^4/2^l)    Θ(q^2/2^l)
JH        Ideal permutation P       Θ(1)          Θ(1)          Θ(1)
Keccak    Ideal permutation P       Θ(1)          Θ(1)          Θ(1)
Skein     Ideal blockcipher E       Θ(q/2^l)      Θ(q^2/2^l)    Θ(q/2^l)

Hash function H:

          Adv^Coll_H    Adv^Pre_H                          Adv^Sec_H
BLAKE     Θ(q^2/2^n)    Θ(q/2^n)                           Θ(q/2^n)
Grøstl    Θ(q^2/2^n)    Θ(q/2^n)                           Θ(q/2^(n−L)) [green]
JH        Θ(q^2/2^n)    O(q/2^n + q^2/2^(l−m)) [yellow]    O(q/2^n + q^2/2^(l−m)) [yellow]
Keccak    Θ(q^2/2^n)    Θ(q/2^n)                           Θ(q/2^n)
Skein     Θ(q^2/2^n)    Θ(q/2^n)                           Θ(q/2^n) [green]
Table 3.2. A schematic summary of the security reduction results of the five finalists. The parameters n, l, m and $2^L$ denote the hash function
output size, the internal value size, the message input size, and the length of the first preimage in message blocks, respectively. The
first column indicates the name of the hash function selected for the final of the competition, while the second column describes the underlying
assumptions. The next three columns show the security bounds on compression functions, while the last three columns summarize the
security reduction results on complete hash functions. A yellow box indicates the existence of a non-trivial upper bound which is not yet
optimal for both the 256-bit and 512-bit variants. A green box indicates the security reduction results that are proven in this thesis, while the
other results presented in this table are based on previous works [AMP10c, AMPŠ12].
Chapter 4
Second Preimage Resistance of Grøstl and Skein
This thesis is concerned with the second preimage resistance of SHA-3 candidates, namely
Grøstl and Skein. As explained in Chapter 3, an important evaluation criterion in the
competition for SHA-3 hash function is security (e.g. the possible reductions of the hash
function security to the security of its underlying building blocks). In this chapter we
provide a lower bound on second preimage resistance of Grøstl and Skein within the
concrete-security provable-security framework. The reader is referred to Section 2.3.6 and
Section 2.3.7 where the proof techniques and the security model used in this chapter are
discussed.
4.1 Security Analysis of Grøstl
As briefly presented in Section 3.3.2, Grøstl combines characteristics of the wide-pipe and Merkle-Damgård constructions and uses two distinct permutations P and Q. Let us closely observe Grøstl to see how the hash value is obtained. First, the padding function $\mathrm{pad}_G$ takes a message M of N bits and returns the padded message split into l-bit message blocks, $\mathrm{pad}_G(M) = m_1 \| m_2 \| \dots \| m_k$, whose total length is a multiple of the message block size l. Padding is achieved by appending to the original message a single ’1’ bit followed by as many ’0’ bits as needed to complete an l-bit block after embedding the 64-bit representation of the number of message blocks in the padded message. Then, Grøstl iterates the permutation-based compression function $f : \{0,1\}^l \times \{0,1\}^l \to \{0,1\}^l$. Finally, the output of the last compression call is processed by the output transformation $g(h) = P(h) \oplus h$, after which the output is shortened from l to n bits with the function $\mathrm{short}_n$.
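The padding rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reference implementation: the function name `pad_g` is hypothetical, the message is restricted to whole bytes (N a multiple of 8), the block count is encoded big-endian, and the default l = 512 is an assumed block size.

```python
def pad_g(message: bytes, l: int = 512) -> list:
    """Pad a byte-aligned message and split it into l-bit blocks.

    Appends a single '1' bit, then w zero bits, then the 64-bit block count,
    where w is the smallest value making the total length a multiple of l.
    """
    if l % 8 != 0 or l < 72:
        raise ValueError("l is assumed to be a multiple of 8 and at least 72")
    n_bits = 8 * len(message)
    w = (-n_bits - 65) % l                    # number of '0' padding bits
    block_count = (n_bits + 1 + w + 64) // l  # blocks in the padded message
    # 0x80 supplies the '1' bit and the first seven '0' bits.
    padded = (message + b"\x80" + b"\x00" * ((w - 7) // 8)
              + block_count.to_bytes(8, "big"))
    step = l // 8
    return [padded[i:i + step] for i in range(0, len(padded), step)]


if __name__ == "__main__":
    blocks = pad_g(b"abc")
    print(len(blocks), blocks[0].hex())
```

Because the block count is part of the last block, two messages whose padded forms have different numbers of blocks always differ in their final block, which is the property used in the proof of Claim 1.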
Figure 4.1. Grøstl’s compression function.
4.1.1 Assessing Second Preimage Resistance of Grøstl
A possible way to obtain a bound on the second preimage resistance of Grøstl is by using indifferentiability results. Grøstl is proven indifferentiable from a random oracle if the underlying permutations are ideal [AMP10a]. Briefly, the proved bound shows that Grøstl behaves like a random oracle up to the birthday bound, which is not enough to establish optimal second preimage resistance.
As indicated in [GKM+11], the underlying compression function of Grøstl exhibits non-ideal behavior (i.e. fixed points of the compression function can be found easily¹, and the generalised birthday collision attack is applicable to the l-bit compression function of Grøstl with a complexity of $2^{l/3}$), which makes the result of Bouillaguet and Fouque [BF09] in the ideal compression function model inapplicable. Therefore, in order to reconfirm the second preimage resistance of Grøstl we explore further. More precisely, we assume ideality of the underlying building blocks of the compression function, which in the case of Grøstl are the two permutations P and Q.
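The fixed-point observation mentioned above is easy to check experimentally. The sketch below is a toy illustration only: it assumes the compression function has the form f(h, m) = P(h ⊕ m) ⊕ Q(m) ⊕ h (as in the Grøstl specification and consistent with the edges of the graph used in Section 4.1.2), replaces P and Q by hypothetical random permutations on an 8-bit state, and all function names are chosen for this example.

```python
import random

L_BITS = 8  # toy state size (assumption); the real state is far larger


def random_permutation(rng: random.Random):
    """Return a random permutation of {0,1}^L_BITS and its inverse as tables."""
    table = list(range(2 ** L_BITS))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


def compress(P, Q, h: int, m: int) -> int:
    """Assumed form of the compression function: P(h xor m) xor Q(m) xor h."""
    return P[h ^ m] ^ Q[m] ^ h


def fixed_point(P_inv, Q, m: int) -> int:
    """Chaining value h = P^{-1}(Q(m)) xor m, which satisfies f(h, m) = h."""
    return P_inv[Q[m]] ^ m


if __name__ == "__main__":
    rng = random.Random(2012)
    P, P_inv = random_permutation(rng)
    Q, _ = random_permutation(rng)
    m = 0x5A
    h = fixed_point(P_inv, Q, m)
    print(h, compress(P, Q, h, m) == h)
```

One inverse query to P and one forward query to Q suffice for any chosen m, which is why an ideal compression function is not a suitable model here.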
4.1.2 Proof of Security
Under the assumption that P and Q are random l-bit permutations, where l is the iterated state size and n is the output size, we will prove that the advantage of the second preimage adversary is upper bounded by $O\left(\frac{(k+1)q^2}{2^l} + \frac{2q}{2^n}\right)$, where the second preimage adversary makes at most q queries and the length of the target message is at most k blocks. In this ideal model, an adversary is allowed to make both forward and inverse queries to the random permutations P and Q. All these queries are stored in the query histories $\mathcal{L}_P$ and $\mathcal{L}_Q$ as indexed elements, and their numbers are $q_2$ and $q_1$, respectively.
¹ In order to find a fixed point, we select a message m arbitrarily and then compute $h = P^{-1}(Q(m)) \oplus m$. This gives a fixed point for Grøstl’s compression function, $f(h, m) = h$.
Theorem 4.1. Let P, Q be two random l-bit permutations and let A be a computationally unbounded adversary which makes at most $q < 2^{l-1}$ queries to its oracles. Its advantage in breaking the second preimage resistance of H is upper bounded by:
\[ \mathrm{Adv}^{\mathrm{eSec}[\lambda]}_H(q) \le \frac{(k+1)q^2}{2^l} + \frac{2q}{2^n}. \]
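To get a feeling for the size of this bound, it can be evaluated exactly with rational arithmetic. The helper names below are hypothetical, and the example parameters (l = 512, n = 256, a target of about $2^{10}$ blocks and $2^{128}$ queries) are assumptions chosen purely for illustration.

```python
import math
from fractions import Fraction


def grostl_esec_bound(q: int, k: int, l: int, n: int) -> Fraction:
    """Evaluate (k+1) q^2 / 2^l + 2 q / 2^n from Theorem 4.1 exactly."""
    if not 0 <= q < 2 ** (l - 1):
        raise ValueError("Theorem 4.1 assumes q < 2^(l-1)")
    return Fraction((k + 1) * q * q, 2 ** l) + Fraction(2 * q, 2 ** n)


def log2_fraction(x: Fraction) -> float:
    """Base-2 logarithm of a positive fraction, computed without underflow."""
    return math.log2(x.numerator) - math.log2(x.denominator)


if __name__ == "__main__":
    bound = grostl_esec_bound(q=2 ** 128, k=2 ** 10, l=512, n=256)
    print(f"advantage <= 2^{log2_fraction(bound):.2f}")
```

For these assumed parameters the term $2q/2^n$ dominates, because the first term is divided by $2^l$ with l = 2n.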
Proof. We prove the theorem using a graph-based approach. To complete this proof, we introduce the graph construction setting, which is based on the definitions provided in Section 2.1.2.
The Graph Construction. We introduce two initially empty lists $\mathcal{L}_P$, $\mathcal{L}_Q$. Let us denote by $\mathcal{L}_Q = \{(\alpha_i, \beta_i)_{1 \le i \le q_1}\}$ a list such that $Q(\alpha_i) = \beta_i$ and by $\mathcal{L}_P = \{(\alpha'_j, \beta'_j)_{1 \le j \le q_2}\}$ a list such that $P(\alpha'_j) = \beta'_j$, where a tuple $(\alpha, \beta) \in \{0,1\}^l \times \{0,1\}^l$. We introduce a directed graph (V, E), initially $(\{IV\}, \emptyset)$. Any $(\alpha_i, \beta_i) \in \mathcal{L}_Q$ and $(\alpha'_j, \beta'_j) \in \mathcal{L}_P$ define an edge e between two vertices in (V, E), which we denote by
\[ \alpha_i \oplus \alpha'_j \xrightarrow{e_i} \alpha_i \oplus \alpha'_j \oplus \beta_i \oplus \beta'_j. \]
We define a path in the graph as a sequence of edges $p = (e_1, \dots, e_{k+1})$ such that for each of its edges $e_i$, where $1 \le i \le k$, the output vertex is equal to the input vertex of $e_{i+1}$. We say that two distinct paths collide if they both start with the IV vertex and both end with the same output vertex.
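The graph construction above can be mirrored directly in code. The following sketch is illustrative only: states are small integers instead of l-bit strings, the query lists are given as lists of (input, output) pairs, and the names `build_edges` and `follow_path` are hypothetical. Only compression function edges are modelled, not the output transformation edge.

```python
def build_edges(L_Q, L_P):
    """Build all edges defined by the query lists L_Q and L_P.

    Each pair of a Q-query (alpha, beta) and a P-query (alpha_p, beta_p)
    gives one edge, labelled by the message block alpha:
        alpha ^ alpha_p  -->  alpha ^ alpha_p ^ beta ^ beta_p
    """
    edges = []
    for alpha, beta in L_Q:
        for alpha_p, beta_p in L_P:
            source = alpha ^ alpha_p
            target = source ^ beta ^ beta_p
            edges.append((source, alpha, target))
    return edges


def follow_path(edges, iv, blocks):
    """Follow the path labelled by `blocks` from `iv`; None if an edge is missing."""
    lookup = {(source, label): target for source, label, target in edges}
    state = iv
    for block in blocks:
        state = lookup.get((state, block))
        if state is None:
            return None
    return state


if __name__ == "__main__":
    L_Q = [(1, 5), (2, 7)]   # toy Q-queries: Q(1) = 5, Q(2) = 7
    L_P = [(3, 6)]           # toy P-query:   P(3) = 6
    edges = build_edges(L_Q, L_P)
    print(edges, follow_path(edges, iv=2, blocks=[1, 2]))
```

The number of edges is the product of the two list lengths, which is where the factor $q_1 q_2$ in the bound on Pr[C] originates.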
Grøstl in the Graph Setting. Intuitively, an edge in (V, E) corresponds to an evaluation of the Grøstl compression function, and the number of edges is exactly $q_1 \cdot q_2$. For convenience, edges $e_i \in E$ are labeled by messages $m_i \in \{0,1\}^l$ where $m_i = \alpha_i$ and $1 \le i \le k$. A path in the graph (V, E) obtained while hashing the target message M is called the challenge path, denoted by
\[ IV \xrightarrow{m_1} h_1 \xrightarrow{m_2} h_2 \cdots \xrightarrow{m_k} h_k \xrightarrow{e_{k+1}} h_{k+1}. \]
It is necessary to emphasize that the first k internal states are l bits long, while $h_{k+1}$ (the n-bit hash value) is obtained by applying the output transformation together with the function $\mathrm{short}_n$ to the internal state $h_k$. We can conclude that a vertex in (V, E) corresponds to an internal state of the Grøstl hash function.
Let SP be the event that, as a result of adversary’s queries, a path which collides with and
differs from the challenge path is formed in the graph (V, E).
Claim 1. $\mathrm{Adv}^{\mathrm{eSec}[\lambda]}_H(q) \le \Pr[\mathrm{SP}]$
Proof. Suppose that the second preimage adversary A receives a randomly generated target message M, where $\mathrm{pad}_G(M) = m_1 \| m_2 \| \dots \| m_k$, and it outputs a message $M' \neq M$, where $\mathrm{pad}_G(M') = m'_1 \| m'_2 \| \dots \| m'_s$, such that $H^{P,Q}(M) = H^{P,Q}(M')$ for the queried oracles P and Q. The adversary A makes all of the queries necessary to compute H(M) and H(M'). We denote by $p = (m_1, m_2, \dots, m_k, e_{k+1})$ the challenge path and by $p' = (m'_1, m'_2, \dots, m'_s, e'_{s+1})$ the path obtained while hashing the message M'. We claim that paths p and p' are colliding paths.
1. If $|M| \neq |M'|$, then due to the padding function of Grøstl the inputs of the last
invocation of the compression function are not the same, $m_k \neq m'_s$, and clearly the paths p and p'
induced by the messages M and M' are distinct.
2. Otherwise, $|M| = |M'|$. Since $h_{k+1} = h'_{s+1}$, either there is a second preimage for
the output transformation or $h_k = h'_s$. If the latter case is true, either there is
a second preimage on the compression function, or $(h_{k-1}, m_k) = (h'_{k-1}, m'_k)$. This
argument repeats for the compression function. Since $|M| = |M'|$ and IV is fixed for
both evaluations, either there is a second preimage at some point, or $m_i = m'_i$ for
$1 \le i \le k$. In the latter case, M = M', which is impossible. Therefore, there exists at
least one pair $(h'_{i-1}, m'_i) \neq (h_{i-1}, m_i)$, which implies that paths p and p' are distinct.
Because M and M' collide, we have $h_{k+1} = h'_{s+1}$ and hence the paths p and p' end with
the same output vertex, which means that they collide. Therefore, finding a message that
collides with the target message is equivalent to finding a path that collides with the
challenge path. This completes the proof of Claim 1.
Claim 2. $\Pr[\mathrm{SP}] \le \frac{(k+1)q^2}{2^l} + \frac{2q}{2^n}$.
Proof. Suppose that A wins. The SP event occurs when A succeeds in connecting a path
(different from the challenge path) in the graph (V, E) from IV to the challenge path. That
connection can happen in two ways.
Let C be the event in which the connection occurs on an internal state of the challenge
path before the output transformation is applied, and let CO be the event in which the
connection occurs after the output transformation is applied.
Simulation. We simulate the execution of A and record in the lists $\mathcal{L}_P$ and $\mathcal{L}_Q$ the queries sent to the oracles P and Q, respectively. Every time A submits a new query to an oracle, it receives a uniformly distributed random value. Let
\[ IV \xrightarrow{m_1} h_1 \xrightarrow{m_2} h_2 \cdots \xrightarrow{m_k} h_k \xrightarrow{e_{k+1}} h_{k+1} \]
be the sequence of vertices crossed by the challenge path.
Case 1: If the C event occurs after the q-th query to the P and/or Q oracle, in the graph there exists a path p',
\[ IV \xrightarrow{m'_1} h'_1 \xrightarrow{m'_2} h'_2 \cdots \xrightarrow{m'_s} h'_s, \]
where $h'_s$ is equal to one of the internal states $h_i$ from the challenge path for $0 \le i \le k$. This means that the adversary has found a collision on the compression function f. More precisely, this collision is actually a second preimage of one out of k + 1 internal states for f.
Start the Simulation. Let us assume that the event C occurs after the adversary has sent a query to Q or $Q^{-1}$. Without loss of generality, we consider forward queries only. The tuple $(\hat{\alpha}, \hat{\beta})$ is generated, where $\hat{\beta}$ is a random value from a set of size at least $2^l - q_1$. The second preimage is found if the list $\mathcal{L}_P$ contains a pair $(\alpha'_j, \beta'_j)$ such that $h_i = \hat{\alpha} \oplus \alpha'_j \oplus \hat{\beta} \oplus \beta'_j$ where $0 \le i \le k$. Since $1 \le j \le q_2$, each query to Q or $Q^{-1}$ generates $q_2$ new edges. Therefore, each query has a probability $\frac{q_2 \cdot (k+1)}{2^l - q_1}$ of giving the second preimage of one out of k + 1 internal states from the challenge path. Consequently, the probability that event C occurs after the adversary asks at most $q_1$ queries to Q or $Q^{-1}$ is upper bounded by:
\[ \Pr[C]_Q \le \frac{(k+1) q_1 q_2}{2^l - q_1}. \]
Alternatively, the event C is realized after the adversary has sent a query to P or $P^{-1}$. An upper bound for this case is obtained in a similar way:
\[ \Pr[C]_P \le \frac{(k+1) q_1 q_2}{2^l - q_2}. \]
By the union bound, we obtain an upper bound on the probability that event C occurs:
\[ \Pr[C] \le \Pr[C]_Q + \Pr[C]_P \le \frac{(k+1) q_1 q_2}{2^l - q_1} + \frac{(k+1) q_1 q_2}{2^l - q_2}. \]
Case 2: A hash value $h_{k+1} = \mathrm{short}_n(P(h_k) \oplus h_k)$ is generated by applying the output transformation together with the function $\mathrm{short}_n$². The output transformation is designed on top of the permutation P. Therefore, the event CO can be realized only after the adversary has sent a query to P or $P^{-1}$. Notice that each query generates precisely one output transformation edge. If the CO event occurs, in the graph (V, E) there exists a path p',
\[ IV \xrightarrow{m'_1} h'_1 \xrightarrow{m'_2} h'_2 \cdots \xrightarrow{m'_s} h'_s \xrightarrow{e'_{s+1}} h'_{s+1}, \]
where $h'_{s+1} = h_{k+1}$ and $h'_s \neq h_k$. This implies that the adversary has found a second preimage on the output transformation for the n-bit value $h_{k+1}$.
Start the Simulation. The event CO is realized after the adversary has sent a query to P or $P^{-1}$. Without loss of generality, only a forward tuple $(\tilde{\alpha}, \tilde{\beta})$ is generated, and $\tilde{\beta}$ is a random value from a set of size at least $2^l - q_2$. The second preimage is found if $h_{k+1} = \mathrm{short}_n(\tilde{\alpha} \oplus \tilde{\beta})$. Therefore, each query to P or $P^{-1}$ has a probability of at most $\frac{2^{l-n}}{2^l - q_2}$ of giving the second preimage on the output transformation. Consequently, the probability that event CO occurs after the adversary asks at most $q_2$ queries to P or $P^{-1}$ is upper bounded by:
\[ \Pr[CO] = \Pr[CO]_P \le \frac{q_2 \cdot 2^{l-n}}{2^l - q_2}. \]
² The function $\mathrm{short}_n$ truncates the output by returning only the last n bits.
Combining all cases, we obtain an upper bound on the probability that event SP occurs:
\begin{align*}
\Pr[SP] &\le \Pr[C] + \Pr[CO] \\
&\le \frac{(k+1) q_1 q_2}{2^l - q_1} + \frac{(k+1) q_1 q_2}{2^l - q_2} + \frac{q_2 \cdot 2^{l-n}}{2^l - q_2} \\
&\le \frac{(k+1) q^2}{2^l} + \frac{q}{2^{n-1}}.
\end{align*}
Appendix A provides a detailed derivation of this inequality. This completes the proof of Claim 2.
The result for second preimage resistance of Grøstl now follows from the combination of
the two claims which completes the proof of Theorem 4.1.
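To get a feel for the Grøstl bound derived above, the following minimal sketch evaluates $(k+1)q^2/2^l + q/2^{n-1}$ exactly with rational arithmetic. The function name `groestl_sec_bound` and the sample parameters (an assumed state size l = 512, output size n = 256, and a target message of $k + 1 = 2^{10}$ blocks) are illustrative choices, not part of the proof.

```python
from fractions import Fraction


def groestl_sec_bound(q: int, k: int, l: int, n: int) -> Fraction:
    """Evaluate the upper bound (k+1)q^2/2^l + q/2^(n-1) on Pr[SP]."""
    return Fraction((k + 1) * q * q, 2**l) + Fraction(q, 2**(n - 1))


# Illustrative parameters (assumptions): l = 512, n = 256, k + 1 = 2^10 blocks.
l, n, k = 512, 256, 2**10 - 1
for log_q in (128, 200, 250):
    bound = groestl_sec_bound(2**log_q, k, l, n)
    print(f"q = 2^{log_q}: bound ~ {float(bound):.3e}")
```

The first term grows quadratically in q and linearly in the message length, so for long target messages it dominates once q approaches $2^{l/2}/\sqrt{k+1}$.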
4.2 Security Analysis of Skein
As briefly presented in Section 3.3.5, the mode of operation employed in Skein, called Unique
Block Iteration (UBI), takes as input an internal state, a message block, and a tweak. The
compression function is based on the Threefish tweakable block cipher used in Matyas-Meyer-Oseas mode. The tweak encodes the number of bytes processed so far, the type of
UBI mode, and special flags for the first and the last block. In normal hashing mode there
are three UBI invocations: one for a configuration block used to generate IV, one for
hashing the message, and one for the output transformation.
Figure 4.2. Skein in normal hashing mode.
The padding function $\mathrm{pad}_S$ takes a message $M$ of length $N$ bits and returns the padded
message split into message blocks, $\mathrm{pad}_S(M) = m_1 \| m_2 \| \ldots \| m_r$, whose total length
is a multiple of the message block size $l$. If $N$ is a multiple of 8, padding is achieved by
appending to the original message as many '0' bits as needed to complete an $l$-bit block.
Otherwise, padding is achieved by appending to the original message a single '1' bit followed
by as many '0' bits as needed to complete an $l$-bit block. Interestingly, Skein uses a block
counter included in the tweak rather than the usual strengthening. The designers claim that
the counter provides the same security as the typical padding in which the message length is
appended at the end of the message. Furthermore, the counter ensures that each message block
is hashed in a unique way. To obtain the hash value, the output of the last compression
call is processed by the output transformation, after which the output is optionally
shortened from $l$ to $n$ bits with the function $\mathrm{short}_n$.
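The padding rule described above can be sketched as follows. This is a toy illustration on bit strings, not the Skein reference implementation: the name `pad_s`, the string representation of bits, and the treatment of the empty message as a single all-zero block are assumptions made for the example.

```python
from typing import List


def pad_s(bits: str, l: int) -> List[str]:
    """Pad a bit string as described in the text and split it into l-bit blocks."""
    if any(b not in "01" for b in bits):
        raise ValueError("bits must contain only '0' and '1'")
    if len(bits) % 8 != 0:
        bits += "1"  # N is not a multiple of 8: append a single '1' bit
    bits += "0" * (-len(bits) % l)  # append '0' bits up to the block boundary
    if not bits:
        bits = "0" * l  # assumption: the empty message yields one zero block
    return [bits[i:i + l] for i in range(0, len(bits), l)]


print(pad_s("10101010", 16))  # byte-aligned input, zero-filled
print(pad_s("101", 8))        # non-aligned input, '1' then zeros
```

Note that this rule alone does not encode the message length; in Skein that role is played by the counter in the tweak, as discussed above.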
4.2.1 Assessing Second Preimage Resistance of Skein
A possible way to obtain a bound on the second preimage resistance of Skein is by using indifferentiability results. The Skein hash function is proven indifferentiable from a
random oracle if the underlying tweakable block cipher is assumed to be ideal [BKL+ 09].
Additionally, an upper bound $O\left(\frac{q}{2^n} + \frac{q^2}{2^l}\right)$ on the second preimage advantage is derived via
the indifferentiability result [AMP10c]. NIST requires SHA-3 to support the output sizes n = 224, 256,
384, and 512 bits. The existing second preimage bound gives optimal second preimage resistance as
long as $2n \le l$.
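The condition $2n \le l$ can be illustrated with a small calculation. The sketch below ignores constant factors and compares the query exponent at which the indifferentiability-based bound $q/2^n + q^2/2^l$ becomes trivial with the exponent for a bound of the form $q/2^l + q/2^n$. The function names and the list of $(l, n)$ pairs are illustrative assumptions.

```python
def log2_queries_indiff(n: int, l: int) -> float:
    """log2 of q at which q/2^n + q^2/2^l reaches about 1 (constants ignored)."""
    return min(n, l / 2)


def log2_queries_direct(n: int, l: int) -> float:
    """log2 of q at which q/2^l + q/2^n reaches about 1 (constants ignored)."""
    return min(n, l)


# Illustrative (state size l, output size n) pairs; assumptions for the example.
for l, n in [(512, 224), (512, 256), (512, 384), (512, 512), (256, 256)]:
    print(l, n, log2_queries_indiff(n, l), log2_queries_direct(n, l))
```

For pairs with $2n > l$ the first exponent drops to $l/2$, which is the gap that a direct analysis is meant to close.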
In order to prove an optimal bound on the second preimage resistance of the narrow-pipe
versions of Skein, we directly analyze the second preimage resistance of Skein in the ideal
cipher model. Our proof follows the techniques used by Bouillaguet and Fouque [BF09] for
the HAIFA construction.
4.2.2 Proof of Security
Under the assumption that E is an ideal tweakable block cipher, where $l$ is the iterated
state size and $n$ is the output size, we will prove that the advantage of the second preimage
adversary is upper bounded by $O\left(\frac{2q}{2^l} + \frac{2q}{2^n}\right)$, where the second preimage adversary makes
at most $q$ queries and the target message is at most $r$ blocks long. In this ideal model,
the adversary is allowed to make both forward and inverse queries to the oracle E. All
these queries are stored in a query history $L_E$ as indexed elements.
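The oracle access described above can be modelled by lazy sampling. The following is a minimal sketch, assuming a toy block size and Python's `random` module as the source of randomness; the class name `IdealTweakableCipher` and its methods are hypothetical and only illustrate how forward and inverse queries stay consistent and how the history $L_E$ is recorded.

```python
import random


class IdealTweakableCipher:
    """Lazily sampled random permutation of l-bit values for each (key, tweak)."""

    def __init__(self, l: int, seed: int = 0):
        self.l = l
        self.rng = random.Random(seed)
        self.fwd = {}      # (k, t) -> {x: y}
        self.inv = {}      # (k, t) -> {y: x}
        self.history = []  # query history L_E as tuples (k, x, t, y)

    def _tables(self, k, t):
        return self.fwd.setdefault((k, t), {}), self.inv.setdefault((k, t), {})

    def _fresh(self, taken):
        # Rejection sampling: draw until the value is unused for this (k, t).
        while True:
            v = self.rng.getrandbits(self.l)
            if v not in taken:
                return v

    def enc(self, k, t, x):
        """Forward query (k, x, t) -> y."""
        f, g = self._tables(k, t)
        if x not in f:
            y = self._fresh(g)
            f[x], g[y] = y, x
            self.history.append((k, x, t, y))
        return f[x]

    def dec(self, k, t, y):
        """Inverse query (k, t, y) -> x."""
        f, g = self._tables(k, t)
        if y not in g:
            x = self._fresh(f)
            f[x], g[y] = y, x
            self.history.append((k, x, t, y))
        return g[y]


E = IdealTweakableCipher(l=8)
y = E.enc(k=3, t=0, x=5)
print(E.dec(k=3, t=0, y=y) == 5, len(E.history))
```

Each fresh answer is drawn from the values not yet used under the same key and tweak, which is why the proofs count the remaining set size as at least $2^l$ minus the number of earlier queries.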
Theorem 4.2. Let E be an ideal tweakable block cipher and let A be a computationally
unbounded adversary which makes at most $q < 2^{l-1}$ queries. Its advantage in breaking the
second preimage resistance of H is upper bounded by:
\[ \mathbf{Adv}_H^{\mathrm{eSec}[\lambda]}(q) \le \frac{2q}{2^l} + \frac{2q}{2^n}. \]
Proof. The proof follows the approach used in the proof of Theorem 4.1.
The Graph Construction. Let $L_E = \{(k_i, x_i, t_i, y_i)\}_{1 \le i \le q}$ be an initially empty list such
that $y = E_k(t, x)$, where the tuple $(k, x, t, y) \in \{0,1\}^l \times \{0,1\}^l \times \{0,1\}^s \times \{0,1\}^l$. We introduce
an initially empty directed graph $(V, E)$. When the adversary A sends a forward query
$(k, x, t)$ to the oracle E, it receives a value $y$, and when A sends an inverse query $(k, t, y)$
to the oracle, it receives a value $x$. An edge $e \in E$ is formed between two vertices in $V$,
$(k, x, t, y) \xrightarrow{e} (k', x', t', y')$, if $k' = y \oplus x$. We define a path in the graph $(V, E)$ as a sequence
of vertices, which we denote by
\[ p = (k_1, x_1, t_1, y_1) \xrightarrow{e_1} \cdots \xrightarrow{e_r} (k_{r+1}, x_{r+1}, t_{r+1}, y_{r+1}). \]
We say
that two vertices $(k, x, t, y)$ and $(k', x', t', y')$ collide if $y \oplus x = y' \oplus x'$. Further, two distinct
paths collide if they both start with the same vertex and they both end with colliding
vertices.
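The edge and collision relations defined above are simple predicates on query tuples, as the following sketch shows. It is a toy model: vertices are plain tuples `(k, x, t, y)` of small integers, the helper names `edges` and `collide` are hypothetical, and the tweak-compatibility condition on edges is deliberately left out.

```python
from itertools import product


def edges(vertices):
    """Return all directed edges (u, v) with k' = y XOR x, for u = (k, x, t, y)."""
    return [(u, v) for u, v in product(vertices, repeat=2)
            if v[0] == u[3] ^ u[1]]


def collide(u, v):
    """Two vertices collide if y XOR x agrees."""
    return u[3] ^ u[1] == v[3] ^ v[1]


# Toy query history with small integers standing in for l-bit strings.
u = (1, 2, 0, 7)  # y XOR x = 5
v = (5, 3, 1, 9)  # key 5, so u -> v is an edge
z = (8, 4, 0, 1)  # y XOR x = 5 as well, so z collides with u
print(edges([u, v, z]))
print(collide(u, z), collide(u, v))
```

In the proof, a second preimage corresponds to a path from IV whose final vertex collides with a vertex of the challenge path.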
Skein in the Graph Setting. Intuitively, an edge corresponds to precisely one evaluation
of Skein's compression function. For each $i$, $1 \le i \le r$, the following holds: $m_i = x_i$, $h_{i-1} = k_i$, $t_i$ is a
tweak value of the message type (see footnote 3), and $h_i = y_i \oplus x_i$. The hash value $h_{r+1} = \mathrm{short}_n(y_{r+1} \oplus x_{r+1})$
is obtained by applying the output transformation with the final chopping to the internal
state $h_r = k_{r+1}$. In the output transformation, the tweak has the output type and the
64-bit counter is used in place of the message block input $x_{r+1}$. Without loss of generality, we
can replace the first UBI invocation for the configuration block with $IV = k_1$ and fix it as
a constant. If a path in $(V, E)$ is obtained while hashing the target message $M$, we refer to
this sequence as the challenge path. We denote by $(IV, h_1, \ldots, h_r)$ the sequence of internal
states crossed by the challenge path to obtain the hash value $h_{r+1}$.
Let SP be the event that, as a result of the adversary's queries, a path which collides with and
differs from the challenge path is formed in the graph $(V, E)$, where the overlapping tweaks
coincide with each other.
Claim 1. $\mathbf{Adv}_H^{\mathrm{eSec}[\lambda]}(q) \le \Pr[SP]$.
Proof. Suppose that the second preimage adversary A receives a randomly generated target message $M$, where $\mathrm{pad}_S(M) = m_1 \| m_2 \| \ldots \| m_r$, and outputs a message $M' \ne M$,
where $\mathrm{pad}_S(M') = m'_1 \| m'_2 \| \ldots \| m'_p$, such that $H^E(M) = H^E(M')$ for the queried oracle E.
Adversary A makes all of the queries necessary to compute $H(M)$ and $H(M')$. Let us
denote by
\[ p = (IV, x_1, t_1, y_1) \xrightarrow{e_1} \cdots \xrightarrow{e_r} (k_{r+1}, x_{r+1}, t_{r+1}, y_{r+1}) \]
the challenge path induced by message $M$, and by
\[ p' = (IV, x'_1, t'_1, y'_1) \xrightarrow{e'_1} \cdots \xrightarrow{e'_p} (k'_{p+1}, x'_{p+1}, t'_{p+1}, y'_{p+1}) \]
the path induced by message $M'$. We claim that the paths $p$ and $p'$ are colliding paths.
Footnote 3: We assume that the tweaks $t$ and $t'$ in the definition of an edge correspond to one another in terms of the bits
processed so far, the type of UBI mode, and the special flags.
1. If $|M| \ne |M'|$, then the values of the tweak entering the output transformation are
different, $t_{r+1} \ne t'_{p+1}$, and so $(k_{r+1}, x_{r+1}, t_{r+1}, y_{r+1}) \ne (k'_{p+1}, x'_{p+1}, t'_{p+1}, y'_{p+1})$ (see footnote 4).
2. Otherwise, $|M| = |M'|$. Since $h_{r+1} = h'_{r+1}$, either there is a second preimage on the
output transformation or $(h_r, t_{r+1}) = (h'_r, t'_{r+1})$. If the second statement is true,
either there is a second preimage on the compression function, where the tweak values must
be the same, $t_r = t'_r$, or $(h_{r-1}, m_r, t_r) = (h'_{r-1}, m'_r, t'_r)$. This argument repeats for each
compression function call. Since $|M| = |M'|$ and IV is fixed for both evaluations, either
there is a second preimage on the compression function at some point (for the same value
of the tweak), or $m_i = m'_i$ for $1 \le i \le r$. In the latter case, $M = M'$, which is impossible.
Therefore, there is some $i$, $1 \le i \le r$, such that $m_i \ne m'_i$, and so $(k_i, x_i, y_i) \ne (k'_i, x'_i, y'_i)$
for $t_i = t'_i$.
Since $M$ and $M'$ collide, $h_{r+1} = h'_{p+1}$, and hence $y_{r+1} \oplus x_{r+1} = y'_{p+1} \oplus x'_{p+1}$. Therefore, the
paths $p$ and $p'$ collide. This completes the proof of Claim 1.
Claim 2. $\Pr[SP] \le \frac{2q}{2^l} + \frac{2q}{2^n}$.
Proof. Suppose that A wins. As noted before, the SP event occurs when A succeeds
in connecting a path (different from the challenge path) in the graph $(V, E)$ from IV to the
challenge path, where the tweaks need to coincide. As in the case of Grøstl, the
connection can happen in two ways.
Let C be the event in which a connection occurs on an internal state of the challenge
path before the output transformation is applied, and let CO be the event in which the
connection occurs after the output transformation is applied.
Simulation. We simulate the execution of A and record in the list $L_E$ the queries sent
to the oracle E. Every time A submits a new query to the oracle, it receives a uniformly distributed random value. We denote the challenge path induced by the target message $M$
by
\[ p = (IV, x_1, t_1, y_1) \xrightarrow{e_1} \cdots \xrightarrow{e_r} (k_{r+1}, x_{r+1}, t_{r+1}, y_{r+1}). \]
Case 1: If the event C occurs after the adversary A asks at most $q$ queries to the oracle E,
the graph $(V, E)$ contains a path
\[ p' = (IV, x'_1, t'_1, y'_1) \xrightarrow{e'_1} \cdots \xrightarrow{e'_{p-1}} (k'_p, x'_p, t'_p, y'_p), \]
where the vertex $(k'_p, x'_p, t'_p, y'_p)$ collides with a vertex $(k_i, x_i, t_i, y_i)$, $1 \le i \le r$, from the challenge
path such that $t'_p = t_i$. This means that the adversary has found a collision for the tweakable
compression function f. More precisely, this collision is a second preimage of one
of the $h_i$ from the challenge path, for $1 \le i \le r$.
Footnote 4: As noted above, in the output transformation the block cipher is used in counter mode and therefore
$x_{r+1} = x'_{p+1}$, but this does not affect our proof.
Start the Simulation. Without loss of generality, let us assume that event C occurs after
the adversary has sent the $j$-th query. The tuple $(k'_j, x'_j, t'_j, y'_j)$ is generated, where $y'_j$ is a
random value from a set of size at least $2^l - j$. The only place where the path $p'$ can connect
to the challenge path is the vertex where $t'_j = t_i$, for $1 \le i \le r$. A second preimage on the
tweakable compression function is found if $y_i \oplus x_i = y'_j \oplus x'_j$. Therefore, the $j$-th query has
a probability of at most $1/(2^l - j)$ of giving this second preimage. Consequently, the probability
that event C occurs after the adversary asks at most $q$ queries to E is upper bounded by:
\[ \Pr[C] \le \sum_{j=1}^{q} \frac{1}{2^l - j} \le \frac{q}{2^l - q}. \]
Case 2: As noted before, the hash value $h_{r+1}$ is obtained by applying the output transformation with the final chopping to the internal state $h_r = k_{r+1}$ of the challenge path.
If the event CO occurs after the adversary A asks at most $q$ queries to the oracle E,
the graph $(V, E)$ contains a path
\[ p' = (IV, x'_1, t'_1, y'_1) \xrightarrow{e'_1} \cdots \xrightarrow{e'_p} (k'_{p+1}, x'_{p+1}, t'_{p+1}, y'_{p+1}), \]
where $h_{r+1} = \mathrm{short}_n(y'_{p+1} \oplus x'_{p+1})$ after the final chopping of the $l - n$ leftmost bits. This means that the
adversary has found a second preimage on the output transformation.
Start the Simulation. Let us assume that event CO occurs after the adversary has sent the $i$-th query. The tuple $(k'_i, x'_i, t'_i, y'_i)$ is generated, where $y'_i$ is a random value from a set of
size at least $2^l - i$. A second preimage on the output transformation is found if and only if
$h_{r+1} = \mathrm{short}_n(y'_i \oplus x'_i)$. Therefore, the $i$-th query has a probability of at most $\frac{2^{l-n}}{2^l - i}$ of giving
this second preimage. Consequently, the probability that event CO occurs after the adversary A
asks $q$ queries to E is upper bounded by:
\[ \Pr[CO] \le \sum_{i=1}^{q} \frac{2^{l-n}}{2^l - i} \le \frac{q \cdot 2^{l-n}}{2^l - q}. \]
Combining both cases, we obtain an upper bound on the probability that event SP occurs:
\begin{align*}
\Pr[SP] &\le \Pr[C] + \Pr[CO] \\
&\le \frac{q}{2^l - q} + \frac{q \cdot 2^{l-n}}{2^l - q} \\
&\le \frac{2q}{2^l} + \frac{2q}{2^n}.
\end{align*}
We obtain this result as in the proof for Grøstl, since for $q < 2^{l-1}$ we have $\frac{1}{2^l - q} \le \frac{2}{2^l}$. If
the final chopping is not needed ($n = l$), the results are still valid. This completes the proof
of Claim 2.
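The chain of inequalities above can be checked numerically for toy parameters. The sketch below is an illustration under assumed small values of $l$ and $n$, not part of the proof; the function name `skein_chain` is hypothetical. It evaluates the exact sums, the intermediate bound, and the final bound with rational arithmetic.

```python
from fractions import Fraction


def skein_chain(q: int, l: int, n: int):
    """Return (exact sums, intermediate bound, final bound) for Pr[C] + Pr[CO]."""
    sums = sum(Fraction(1, 2**l - j) for j in range(1, q + 1)) \
        + sum(Fraction(2**(l - n), 2**l - i) for i in range(1, q + 1))
    middle = Fraction(q, 2**l - q) + Fraction(q * 2**(l - n), 2**l - q)
    final = Fraction(2 * q, 2**l) + Fraction(2 * q, 2**n)
    return sums, middle, final


# Toy parameters (assumptions): l = 8, n = 4, and every q < 2^(l-1).
l, n = 8, 4
ok = all(s <= m <= f for s, m, f in (skein_chain(q, l, n) for q in range(1, 2**(l - 1))))
print(ok)
```

The last step relies on $q < 2^{l-1}$; for larger $q$ the inequality $\frac{1}{2^l - q} \le \frac{2}{2^l}$ no longer holds.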
The result for the second preimage resistance of Skein now follows from the combination
of the two claims which completes the proof of Theorem 4.2.
Chapter 5
Conclusions and Remarks
In this chapter, we offer a brief summary of the work done in the thesis and then discuss
its implications for future study.
5.1 Conclusions
In this thesis we considered the final round candidates in the competition for a new SHA-3
hashing algorithm within the provable security framework. To be able to carry out the
analysis, we became familiar with the provable security approach together with the state
of the art of hash functions, and more closely with the competition finalists. As shown
in Chapter 4, we provided a lower bound on the second preimage resistance of Grøstl and
Skein in the ideal model. The results obtained for Grøstl in the ideal permutation model
confirm the claim that the Merkle-Damgård iteration loses a factor linear in the message
length (in blocks) of the second preimage security in the ideal compression function model
[KS05]. Secondly, Skein's bound shows that the addition of a tweak, which makes each
compression function call unique, results in an increase of the second preimage resistance (up to
approximately n bits). In Table 3.2 we presented the existing security reduction results and
updated them with those obtained in our work. One needs to be aware of the shortcomings of the provable
security approach in the ideal model when looking at these security reduction results. Some
classes of attacks may still be possible, such as timing attacks, differential fault analysis,
and differential power analysis. Sometimes the applied proof techniques or human factors (e.g.
flaws in the proof, or a proof given in the wrong model or for the wrong problem) may affect
the accuracy of a security reduction. However, security reduction results are of great
importance since they give us a very good indication that the higher level structure has
no flaws in the design. More concretely, they show that no attack on the hash function is
possible without exploiting a weakness of the underlying idealized primitive.
5.2 Summary of Contributions
Bearing in mind the importance of valid security guarantees, we see our results as a valuable
contribution to the SHA-3 competition. The main contributions of this thesis are:
• The analysis of the second preimage resistance of the hash function competition finalists
Grøstl and Skein. Within the concrete-security provable-security framework, we
gave a lower bound on the second preimage resistance of Grøstl in the ideal permutation model and of Skein in the ideal cipher model, and proved them both optimally
second preimage resistant.
• While seeking solutions, we investigated the existing proof techniques concerning
security notions, with an emphasis on second preimage resistance.
• In addition, we gave a concise survey of the five finalists together with their security
reductions and performance results.
5.3 Future Research
In recent years, the NIST SHA-3 competition has focused the attention of the cryptographic
community and initiated broad research on the design principles and analysis of hash
functions. As a result, many new ideas emerged regarding construction designs, cryptanalysis, proof techniques, etc. Also, new directions for further research related to this topic
were identified. We now list some open problems:
• Firstly, as can be seen in Table 3.2, the provided bounds on the preimage and second
preimage resistance of JH are not optimal.
• Once we provide a reduction of the security (Col, ePre, eSec) of the hash function to
the security of some underlying atomic primitive (under the assumption that the particular underlying primitive is ideal), a more detailed analysis of that particular primitive
can be conducted with the goal of investigating its resistance to existing and new attacks.
• All security reduction results presented in this work were carried out in the ideal
model. Supporting second preimage resistance with a proof in the standard model
still remains a substantial challenge. A possible direction would be an attempt
to design a construction that is efficient in practice and has the second preimage preservation
property.
• More fundamentally, definitions and a classification of the main security properties
are still not completely understood, while new practical applications emerge with the
demand for subtle security requirements.
• There is a need to develop new methods for assessing security, as well as new
attacks and design ideas.
• Finally, the broad range of uses and the number of existing security requirements, as
well as performance requirements, make hash function design more complex. One
solution would be to partition these requirements into related groups
and to design different hash functions, each of which deals with one of these groups.
Bibliography
[ABF+ 08] Elena Andreeva, Charles Bouillaguet, Pierre-Alain Fouque, Jonathan J. Hoch,
John Kelsey, Adi Shamir, and Sébastien Zimmer. Second Preimage Attacks on
Dithered Hash Functions. In Nigel P. Smart, editor, EUROCRYPT, volume
4965 of Lecture Notes in Computer Science, pages 270–288. Springer, 2008.
[ABM+ 12] Elena Andreeva, Andrey Bogdanov, Bart Mennink, Bart Preneel, and Christian
Rechberger. On Security Arguments of the Second Round SHA-3 Candidates.
International Journal of Information Security, 11(2):103–120, 2012.
[AHMP10] Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael C.-W. Phan.
SHA-3 proposal BLAKE. Submission to NIST (Round 3), 2010.
[ALM11] Elena Andreeva, Atul Luykx, and Bart Mennink. Provable Security of BLAKE
with Non-Ideal Compression Function. IACR Cryptology ePrint Archive, Report 2011/620, 2011.
[AMP10a] Elena Andreeva, Bart Mennink, and Bart Preneel. On the Indifferentiability
of the Grøstl Hash Function. In Juan A. Garay and Roberto De Prisco, editors, SCN, volume 6280 of Lecture Notes in Computer Science, pages 88–105.
Springer, 2010.
[AMP10b] Elena Andreeva, Bart Mennink, and Bart Preneel. Security Properties of Domain Extenders for Cryptographic Hash Functions. JIPS, 6(4):453–480, 2010.
[AMP10c] Elena Andreeva, Bart Mennink, and Bart Preneel. Security Reductions of the
Second Round SHA-3 Candidates. In Mike Burmester, Gene Tsudik, Spyros S.
Magliveras, and Ivana Ilić, editors, ISC, volume 6531 of Lecture Notes in Computer Science, pages 39–53. Springer, 2010.
[AMPŠ12] Elena Andreeva, Bart Mennink, Bart Preneel, and Marjan Škrobot. Security
Analysis and Comparison of the SHA-3 Finalists BLAKE, Grøstl, JH, Keccak,
and Skein. In Aikaterini Mitrokotsa and Serge Vaudenay, editors, Progress
in Cryptology - AFRICACRYPT, volume 7374 of Lecture Notes in Computer
Science, pages 287–305. Springer, Heidelberg, 2012.
[And10] Elena Andreeva. Domain Extenders for Cryptographic Hash Functions. PhD
thesis, Katholieke Universiteit Leuven, 2010.
[ANPS07] Elena Andreeva, Gregory Neven, Bart Preneel, and Thomas Shrimpton. Seven-Property-Preserving Iterated Hashing: ROX. In Kaoru Kurosawa, editor, ASIACRYPT, volume 4833 of Lecture Notes in Computer Science, pages 130–146.
Springer, 2007.
[AP09] Elena Andreeva and Bart Preneel. A Three-Property-Secure Hash Function.
In Roberto Maria Avanzi, Liam Keliher, and Francesco Sica, editors, Selected
Areas in Cryptography, volume 5381 of Lecture Notes in Computer Science,
pages 228–244. Springer, 2009.
[AS11] Elena Andreeva and Martijn Stam. The Symbiosis between Collision and Preimage Resistance. In Liqun Chen, editor, IMA Int. Conf., volume 7089 of Lecture
Notes in Computer Science, pages 152–171. Springer, 2011.
[BCC+ 08] Emmanuel Bresson, Anne Canteaut, Benoît Chevallier-Mames, Christophe
Clavier, Thomas Fuhr, Aline Gouget, Thomas Icart, Jean-François Misarsky,
María Naya-Plasencia, Pascal Paillier, Thomas Pornin, Jean-René Reinhard,
Céline Thuillet, and Marion Videau. Shabal, a Submission to NIST’s Cryptographic Hash Algorithm Competition. Submission to NIST, 2008.
[BCS05] John Black, Martin Cochran, and Thomas Shrimpton. On the Impossibility of
Highly-Efficient Blockcipher-Based Hash Functions. In Ronald Cramer, editor,
EUROCRYPT, volume 3494 of Lecture Notes in Computer Science, pages 526–
541. Springer, 2005.
[BD07] Eli Biham and Orr Dunkelman. A Framework for Iterative Hash Functions - HAIFA. IACR Cryptology ePrint Archive, Report 2007/278, 2007.
[BDPA07] Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. Sponge
functions. ECRYPT Hash Workshop, 2007.
[BDPA08] Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. On
the Indifferentiability of the Sponge Construction. In Nigel P. Smart, editor,
EUROCRYPT, volume 4965 of Lecture Notes in Computer Science, pages 181–
197. Springer, 2008.
[BDPA11] Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. The
Keccak SHA-3 submission. Submission to NIST (Round 3), 2011.
[Ber08] Daniel J. Bernstein. ChaCha, a variant of Salsa20, 2008. http://cr.yp.to/
chacha/chacha-20080128.pdf.
[BF09] Charles Bouillaguet and Pierre-Alain Fouque. Practical Hash Functions Constructions Resistant to Generic Second Preimage Attacks Beyond the Birthday
Bound, 2009.
[BKL+ 09] Mihir Bellare, Tadayoshi Kohno, Stefan Lucks, Niels Ferguson, Bruce Schneier,
Doug Whiting, Jon Callas, and Jesse Walker. Provable Security Support for
The Skein Hash Family, 2009.
[BKL+ 10] Mihir Bellare, Tadayoshi Kohno, Stefan Lucks, Niels Ferguson, Bruce Schneier,
Doug Whiting, Jon Callas, and Jesse Walker. The Skein Hash Function Family.
Submission to NIST (Round 3), 2010.
[BMN10] Rishiraj Bhattacharyya, Avradip Mandal, and Mridul Nandi. Security Analysis of the Mode of JH Hash Function. In Seokhie Hong and Tetsu Iwata,
editors, FSE, volume 6147 of Lecture Notes in Computer Science, pages 168–
191. Springer, 2010.
[Bou11] Charles Bouillaguet. Études d'hypothèses algorithmiques et attaques de primitives cryptographiques. PhD thesis, Université Paris Diderot, 2011.
[BR93] Mihir Bellare and Phillip Rogaway. Random Oracles are Practical: A Paradigm
for Designing Efficient Protocols. In Dorothy E. Denning, Raymond Pyle, Ravi
Ganesan, Ravi S. Sandhu, and Victoria Ashby, editors, ACM Conference on
Computer and Communications Security, pages 62–73. ACM, 1993.
[BRS02] John Black, Phillip Rogaway, and Thomas Shrimpton. Black-Box Analysis
of the Block-Cipher-Based Hash-Function Constructions from PGV. In Moti
Yung, editor, CRYPTO, volume 2442 of Lecture Notes in Computer Science,
pages 320–335. Springer, 2002.
[CDMP05] Jean-Sébastien Coron, Yevgeniy Dodis, Cécile Malinaud, and Prashant Puniya.
Merkle-Damgård Revisited: How to Construct a Hash Function. In Victor
Shoup, editor, CRYPTO, volume 3621 of Lecture Notes in Computer Science,
pages 430–448. Springer, 2005.
[CNY11] Donghoon Chang, Mridul Nandi, and Moti Yung. Indifferentiability of the Hash
Algorithm BLAKE. IACR Cryptology ePrint Archive, Report 2011/623, 2011.
[Dam90] Ivan Damgård. A Design Principle for Hash Functions. In Gilles Brassard, editor, Advances in Cryptology - CRYPTO, 9th Annual International Cryptology
Conference, Santa Barbara, California, USA, August 20-24, 1989, Proceedings,
volume 435 of Lecture Notes in Computer Science, pages 416–427. Springer,
1990.
[Dea99] Richard Dean. Formal Aspects of Mobile Code Security. PhD thesis, Princeton
University, 1999.
[DH76] Whitfield Diffie and Martin E. Hellman. New Directions in Cryptography.
IEEE Transactions on Information Theory, IT-22(6):644–654, 1976.
[Die10] Reinhard Diestel. Graph Theory (Graduate Texts in Mathematics). Springer-Verlag, 2010.
[FS86] Amos Fiat and Adi Shamir. How to Prove Yourself: Practical Solutions to Identification and Signature Problems. In Andrew M. Odlyzko, editor, CRYPTO,
volume 263 of Lecture Notes in Computer Science, pages 186–194. Springer,
1986.
[FSZ09] Pierre-Alain Fouque, Jacques Stern, and Sébastien Zimmer. Cryptanalysis of
Tweaked Versions of SMASH and Reparation. In Roberto Maria Avanzi, Liam
Keliher, and Francesco Sica, editors, Selected Areas in Cryptography, volume
5381 of Lecture Notes in Computer Science, pages 228–244. Springer, 2009.
[GKM+ 11] Praveen Gauravaram, Lars R. Knudsen, Krystian Matusiewicz, Florian Mendel,
Christian Rechberger, Martin Schläffer, and Søren S. Thomsen. Grøstl – a SHA-3 candidate. Submission to NIST (Round 3), 2011.
[GM84] Shafi Goldwasser and Silvio Micali. Probabilistic Encryption. Journal of Computer and System Sciences, 28(2):270–299, 1984.
[Jou04] Antoine Joux. Multicollisions in Iterated Hash Functions. Application to
Cascaded Constructions. In Matt Franklin, editor, Advances in Cryptology -
CRYPTO, volume 3152 of Lecture Notes in Computer Science, chapter 19,
pages 99–213. Springer, Berlin, Heidelberg, 2004.
[KK06] John Kelsey and Tadayoshi Kohno. Herding Hash Functions and the Nostradamus Attack. In Serge Vaudenay, editor, EUROCRYPT, volume 4004 of
Lecture Notes in Computer Science, pages 183–200. Springer, 2006.
[KS05] John Kelsey and Bruce Schneier. Second preimages on n-bit hash functions
for much less than 2n work. In Ronald Cramer, editor, EUROCRYPT, volume
3494 of Lecture Notes in Computer Science, pages 474–490. Springer, 2005.
[LH11] Jooyoung Lee and Deukjo Hong. Collision Resistance of the JH Hash Function.
IACR Cryptology ePrint Archive, Report 2011/19, 2011.
[LM92] Xuejia Lai and James L. Massey. Hash Function Based on Block Ciphers.
In Rainer A. Rueppel, editor, EUROCRYPT, volume 658 of Lecture Notes in
Computer Science, pages 55–70. Springer, 1992.
[Luc05] Stefan Lucks. A Failure-Friendly Design Principle for Hash Functions. In
Bimal K. Roy, editor, ASIACRYPT, volume 3788 of Lecture Notes in Computer
Science, pages 474–494. Springer, 2005.
[Mer79] Ralph Merkle. Secrecy, Authentication, and Public Key Systems. PhD thesis,
UMI Research Press, 1979.
[Mer90] Ralph C. Merkle. One Way Hash Functions and DES. In Gilles Brassard, editor, Advances in Cryptology - CRYPTO, 9th Annual International Cryptology
Conference, Santa Barbara, California, USA, August 20-24, 1989, Proceedings,
volume 435 of Lecture Notes in Computer Science, pages 428–446. Springer,
1990.
[MPST12] Dustin Moody, Souradyuti Paul, and Daniel Smith-Tone. Improved Indifferentiability Security Bound for the JH Mode. In NIST’s 3rd SHA-3 Candidate
Conference 2012, 2012.
[MRH04] Ueli M. Maurer, Renato Renner, and Clemens Holenstein. Indifferentiability,
Impossibility Results on Reductions, and Applications to the Random Oracle
Methodology. In Moni Naor, editor, TCC, volume 2951 of Lecture Notes in
Computer Science, pages 21–39. Springer, 2004.
[MvOV97] Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone. Handbook of
Applied Cryptography. CRC Press, 1997.
[NIS07] NIST. Announcing Request for Candidate Algorithm Nominations for a New
Cryptographic Hash Algorithm. Technical report, NIST, 2007.
[PGV93] Bart Preneel, René Govaerts, and Joos Vandewalle. Hash functions based on
block ciphers: A synthetic approach. In Advances in Cryptology - CRYPTO,
Lecture Notes in Computer Science, pages 368–378. Springer-Verlag, 1993.
[Rab78] Michael O. Rabin. Digitalized signatures. In Foundations of Secure Computation, pages 155–166. Academic Press, 1978.
[Rog06] Phillip Rogaway. Formalizing Human Ignorance. In Phong Q. Nguyen, editor,
VIETCRYPT, volume 4341 of Lecture Notes in Computer Science, pages 211–
228. Springer, 2006.
[RS04] Phillip Rogaway and Thomas Shrimpton. Cryptographic Hash-Function Basics:
Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance. In Bimal K. Roy and Willi
Meier, editors, FSE, volume 3017 of Lecture Notes in Computer Science, pages
371–388. Springer, 2004.
[RS08a] Phillip Rogaway and John P. Steinberger. Constructing Cryptographic Hash
Functions from Fixed-Key Blockciphers. In David Wagner, editor, CRYPTO,
volume 5157 of Lecture Notes in Computer Science, pages 433–450. Springer,
2008.
[RS08b] Phillip Rogaway and John P. Steinberger. Security/Efficiency Tradeoffs for
Permutation-Based Hashing. In Nigel P. Smart, editor, EUROCRYPT, volume
4965 of Lecture Notes in Computer Science, pages 220–236. Springer, 2008.
[Sta08] Martijn Stam. Beyond Uniformity: Better Security/Efficiency Tradeoffs for
Compression Functions. In David Wagner, editor, CRYPTO, volume 5157 of
Lecture Notes in Computer Science, pages 397–412. Springer, 2008.
[Sta09] Martijn Stam. Blockcipher-Based Hashing Revisited. In Orr Dunkelman, editor,
FSE, volume 5665 of Lecture Notes in Computer Science, pages 67–83. Springer,
2009.
[TPB+ 11] Meltem Sönmez Turan, Ray Perlner, Lawrence E. Bassham, William Burr,
Donghoon Chang, Shu-jen Chang, Morris J. Dworkin, John M. Kelsey,
Souradyuti Paul, and Rene Peralta. Status Report on the Second Round of the
SHA-3 Cryptographic Hash Algorithm Competition. Technical report, NIST,
2011.
[Wag02] David Wagner. A Generalized Birthday Problem. In Moti Yung, editor,
CRYPTO, volume 2442 of Lecture Notes in Computer Science, pages 288–303.
Springer, 2002.
[Wu11] Hongjun Wu. The Hash Function JH. Submission to NIST (round 3), 2011.
[WY05] Xiaoyun Wang and Hongbo Yu. How to Break MD5 and Other Hash Functions.
In Ronald Cramer, editor, EUROCRYPT, volume 3494 of Lecture Notes in
Computer Science, pages 19–35. Springer, 2005.
[WYY05] Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu. Finding Collisions in the Full
SHA-1. In Victor Shoup, editor, CRYPTO, volume 3621 of Lecture Notes in
Computer Science, pages 17–36. Springer, 2005.
Appendix A
Mathematical Derivations
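A.1 Security Bound on Second Preimage of Grøstl
\begin{align}
\Pr[SP] &\le \Pr[C] + \Pr[CO] \tag{A.1} \\
&\le \frac{(k+1) q_1 q_2}{2^l - q_1} + \frac{(k+1) q_1 q_2}{2^l - q_2} + \frac{q_2 \cdot 2^{l-n}}{2^l - q_2} \tag{A.2} \\
&\le \frac{(k+1) q_1 q_2}{2^l - q} + \frac{(k+1) q_1 q_2}{2^l - q} + \frac{q_2 \cdot 2^{l-n}}{2^l - q} \tag{A.3} \\
&\le 2 \cdot \frac{2 (k+1) q_1 q_2}{2^l} + \frac{2 q_2 \cdot 2^{l-n}}{2^l} \tag{A.4} \\
&\le 2 \cdot \frac{(k+1) q^2}{2 \cdot 2^l} + \frac{q}{2^{n-1}} \tag{A.5} \\
&\le \frac{(k+1) q^2}{2^l} + \frac{q}{2^{n-1}} \tag{A.6}
\end{align}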
First, (A.2) restates the bounds obtained in the proof of the second preimage resistance of
Grøstl. Since $q = q_1 + q_2$, we can replace $q_1$ and $q_2$ with $q$ in the denominators, and inequality
(A.3) holds. Since for $q < 2^{l-1}$ we have $\frac{1}{2^l - q} \le \frac{2}{2^l}$, we obtain (A.4). Furthermore, we wish
to determine the maximum value of $2 q_1 q_2$. We set $x = q_2$, $q_1 = q - x$
and define the function $f_q(x) = 2(q - x)x = 2qx - 2x^2$. To find the maximum of this function,
we set its first derivative $f'_q(x) = 2q - 4x$ to zero, that is, $2q - 4x = 0$. We have that
$x = q/2 \Rightarrow f_q^{\max} = f_q(q/2) = 2(q - q/2)q/2 \Rightarrow f_q^{\max} = q^2/2$. Using this result together with $q_2 \le q$, we obtain
(A.5). Finally, we obtain the bound on the second preimage resistance of Grøstl (A.6).
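The steps (A.2) to (A.6) can be sanity-checked numerically. The following sketch is an illustration with assumed toy parameters, not a substitute for the derivation; the function name `appendix_chain` is hypothetical. It evaluates each right-hand side exactly and checks that the chain is non-decreasing whenever $q_1 + q_2 < 2^{l-1}$.

```python
from fractions import Fraction


def appendix_chain(q1: int, q2: int, k: int, l: int, n: int):
    """Return the right-hand sides of (A.2)-(A.6) as exact fractions."""
    q = q1 + q2
    c = (k + 1) * q1 * q2
    a2 = Fraction(c, 2**l - q1) + Fraction(c, 2**l - q2) \
        + Fraction(q2 * 2**(l - n), 2**l - q2)
    a3 = 2 * Fraction(c, 2**l - q) + Fraction(q2 * 2**(l - n), 2**l - q)
    a4 = 2 * Fraction(2 * c, 2**l) + Fraction(2 * q2 * 2**(l - n), 2**l)
    a5 = 2 * Fraction((k + 1) * q * q, 2 * 2**l) + Fraction(q, 2**(n - 1))
    a6 = Fraction((k + 1) * q * q, 2**l) + Fraction(q, 2**(n - 1))
    return [a2, a3, a4, a5, a6]


# Toy parameters (assumptions): l = 8, n = 4, k = 3, and q1 + q2 < 2^(l-1).
ok = True
for q1 in range(20):
    for q2 in range(20):
        chain = appendix_chain(q1, q2, k=3, l=8, n=4)
        ok = ok and all(a <= b for a, b in zip(chain, chain[1:]))
print(ok)
```

The step from (A.4) to (A.5) is tight in its first term exactly when $q_1 = q_2 = q/2$, matching the maximization of $f_q$ above.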