UNIVERSITY OF NOVI SAD
DEPARTMENT OF POWER, ELECTRONICS AND TELECOMMUNICATIONS

MASTER’S THESIS

Provable Security Analysis of SHA-3 Candidates

Marjan Škrobot

Promoters: Prof. dr. ir. Bart Preneel, Prof. dr. ir. Vincent Rijmen
Supervisors: Elena Andreeva, PhD, and Bart Mennink

June, 2012

Abstract

Hash functions are fundamental cryptographic primitives that compress messages of arbitrary length into message digests of a fixed length. They are used as a building block in many important security applications such as digital signatures, message authentication codes, password protection, etc. The three main security properties of hash functions are collision, second preimage and preimage resistance.

In 2005, a significant breakthrough was made in the cryptanalysis of hash functions. Namely, attacks on SHA-1 and MD5 raised concerns about the security of the widely used hash function standards. In response to this hash function crisis, the US National Institute of Standards and Technology (NIST) announced a call for the design of a new cryptographic hash algorithm in 2007. NIST received 64 submissions. At this moment, 5 candidates are in the final round of the competition: BLAKE, Grøstl, JH, Keccak and Skein.

An important criterion for the evaluation of hash functions is their security. A common technique to assess the security of hash functions is via reductionist proofs of security. Within this provable security framework, Andreeva et al. provided a summary of all known security reduction results in the ideal model for the 14 second round SHA-3 candidates. Furthermore, they identified several open problems.

In this thesis, we investigate the existing proof techniques for second preimage analysis and resolve the remaining open problems regarding the second preimage resistance of Grøstl and Skein. More precisely, these two hash functions are proved optimally second preimage resistant in the ideal model within the concrete security framework of provable security.
Finally, we provide an overview of the current security reduction and performance results on the five finalists.

Acknowledgements

I would like to show my gratitude to the people without whose help and guidance the accomplishment of this thesis would not have been possible.

In the first place I am very grateful to my supervisors Elena Andreeva and Bart Mennink, who introduced me to the field of cryptology and whose sincerity and encouragement I will never forget. Above all, it would have been next to impossible to write this thesis without their supervision and advice from the very beginning to the end of my work. I greatly appreciate Bart’s positive spirit and the precious time he put into reading my thesis and giving critical comments on it. I gratefully acknowledge Elena for introducing me to the area of provable security, and for guiding me to the literature that sparked and sustained my interest in cryptology. The cooperation with both of them was very important and educational to me.

I gratefully thank Vojin Šenk and Željen Trpovski for their great support and active involvement as coordinators in the exchange process. I was privileged to have them as my professors and I am grateful for the help they have given me.

A special word of gratitude goes to my parents, Pavle and Ruža, who have been a constant source of support, emotional, moral and of course financial, during my postgraduate years; this thesis would certainly not have existed without them. Also, I would like to thank my family and friends for their support throughout my studies.

Finally, I want to give special thanks to my girlfriend Ljiljana for her great support and for producing the figures used in this thesis.

Contents

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Preliminaries
  2.1 Mathematical Background
    2.1.1 Notation
    2.1.2 Graph Theory
    2.1.3 Probability Theory
    2.1.4 Complexity Theory
  2.2 Provable Security
    2.2.1 Basic Definitions
    2.2.2 The Provable Security Paradigm
    2.2.3 Assumptions
    2.2.4 Standard and Ideal Model
    2.2.5 Complexity Theory Techniques
  2.3 Hash Functions
    2.3.1 Basic Definitions
      2.3.1.1 Merkle-Damgård Mode of Operation
      2.3.1.2 Random Oracles
    2.3.2 Security Properties
      2.3.2.1 Formal Security Notions
      2.3.2.2 Expected Security
    2.3.3 Generic Attacks Against Merkle-Damgård Mode of Operation
    2.3.4 Compression Function Building Strategies
    2.3.5 Other Modes of Operation
      2.3.5.1 Wide-pipe and Narrow-pipe Design
      2.3.5.2 HAIFA
      2.3.5.3 Sponge
    2.3.6 Establishing Security of Hash Functions
      2.3.6.1 Property Preservation
      2.3.6.2 Indifferentiability Results
      2.3.6.3 Idealized Proof Model
    2.3.7 Security Model
3 NIST’s SHA-3 Hash Function Competition
  3.1 The History of SHA Family
  3.2 SHA-3 Security Requirements and Evaluation Criteria
  3.3 The Competition Finalists
    3.3.1 BLAKE
    3.3.2 Grøstl
    3.3.3 JH
    3.3.4 Keccak
    3.3.5 Skein
  3.4 A Summary of the Existing Results
    3.4.1 Factors of Favorability
    3.4.2 A Summary of the Security and Performance Results
4 Second Preimage Resistance of Grøstl and Skein
  4.1 Security Analysis of Grøstl
    4.1.1 Assessing Second Preimage Resistance of Grøstl
    4.1.2 Proof of Security
  4.2 Security Analysis of Skein
    4.2.1 Assessing Second Preimage Resistance of Skein
    4.2.2 Proof of Security
5 Conclusions and Remarks
  5.1 Conclusions
  5.2 Summary of Contributions
  5.3 Future Research
Bibliography
A Mathematical Derivations
  A.1 Security Bound on Second Preimage of Grøstl

List of Figures

2.1 The Merkle-Damgård construction.
2.2 The HAsh Iterative FrAmework - HAIFA construction.
2.3 The sponge construction.
3.1 The BLAKE’s compression function.
3.2 The Grøstl hash function.
3.3 The JH’s compression function.
3.4 The UBI mode of operation.
4.1 The Grøstl’s compression function.
4.2 The Skein hash function.

List of Tables

3.1 A schematic summary of hardware and software results.
3.2 A schematic summary of security reduction results of the five finalists.

Chapter 1

Introduction

This thesis deals with provable security properties of cryptographic hash functions. Cryptographic hash functions are fundamental cryptographic primitives. They are used as a building block in many higher-level primitives in cryptography. Hash functions compress message inputs of arbitrary length and return a hash value of fixed length. They are employed in many practical applications such as digital signatures, message authentication codes, password protection, pseudorandom string generation, derivation of cryptographic keys, etc. One of the first uses of hash functions was presented in 1976, in the famous paper by Diffie and Hellman [DH76] on public-key cryptography. They were proposed as a building block of digital signatures.
A practical hash function must be efficiently computable and uniformly distributed, but in order to protect data integrity and to provide message authentication, hash functions must also satisfy specific security requirements. In his PhD thesis [Mer79], Merkle defined the three main security properties of hash functions: collision, preimage and second preimage resistance. Which of these security properties is relevant depends on the application.

In practical signature schemes hash functions are used to: 1) make the signing of messages of arbitrary length more efficient; 2) provide secure authentication. The usual way to employ a hash function in a signature scheme is to first hash the message input $M$ and then to sign the hashed message $H(M)$ with the secret key $d$ of the signer, $\sigma(M) = H(M)^d \bmod N$, where $0 \le M \le N$. Later, the verifier receives the pair $(M, \sigma(M))$. This approach is known as the hash-and-sign paradigm. An undesired event happens if an adversary finds two distinct messages with the same hash output, $H(M_1) = H(M_2)$. Such messages are called colliding messages, and the event is a collision event. If a collision event occurs, the adversary can trick an honest party A by first asking him to sign a harmless message $M_1$. If the honest party A signs the message, the adversary can counterfeit the signature, since the signature is the same for a potentially harmful message $M_2$. A similar scenario can happen if, for a previously chosen specific message $M$, the adversary finds another message $M'$ with the same hash output, $H(M) = H(M')$. The security property that rules this out is known as second preimage resistance (or weak collision resistance).

Another practical use of hash functions is for commitment. A commitment scheme allows a prover to commit to data without revealing it. A possible approach to create a commitment is to apply a hash function to the data and disclose only the hash value. Later, the prover can open the commitment by revealing the data.
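The commit and open steps just described can be sketched in a few lines. This is only an illustration under assumptions of our own: SHA-256 from Python's standard `hashlib` stands in for the generic hash function $H$, and the function names `commit` and `open_commitment` are hypothetical. Practical commitment schemes usually also hash a random value together with the data so that short or guessable data stays hidden; that step is omitted here to match the simple approach in the text.

```python
import hashlib


def commit(data: bytes) -> bytes:
    """Commit to data by disclosing only its hash value Y = H(data)."""
    # Assumption: SHA-256 plays the role of the hash function H.
    return hashlib.sha256(data).digest()


def open_commitment(commitment: bytes, revealed: bytes) -> bool:
    """Verifier's check: recompute the hash of the revealed data and compare."""
    return hashlib.sha256(revealed).digest() == commitment


y = commit(b"sealed bid: 42")
print(len(y))                                  # 32 bytes, whatever the input length
print(open_commitment(y, b"sealed bid: 42"))   # True
print(open_commitment(y, b"sealed bid: 43"))   # False
```

A prover who could open the same commitment with two different messages would have found a collision or a second preimage for the hash function.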
The hash value is the only guarantee to a verifier, who checks its correctness. An adversary, typically the verifier, may try to retrieve information about the data from the commitment. The commitment scheme is broken if the adversary succeeds in retrieving a message $M$ (the data) from a hash value $Y = H(M)$. Therefore, hash functions used in commitment schemes need to be first preimage resistant. These examples show that the use of an insecure hash function as a building block would endanger higher-level primitives.

In his proposal of a digital signature scheme, Rabin [Rab78] described an iterative hash function based on the block cipher DES with a message block $m_i$ used as the key. However, this design turned out to be trivially insecure (cf. Section 2.3.1.1). A significant breakthrough in the design of hash functions was due to Merkle [Mer90] and Damgård [Dam90], who independently showed how to iterate a compression function so that the collision resistance of the compression function carries over to the collision resistance of the hash function.¹ This iteration principle, known as Merkle-Damgård, is used in the most popular hash functions today.

The most prominent hash functions of the previous two decades are the MDx family (of which MD5 is the most important), the SHA family, RIPEMD, HAVAL, Tiger, GOST and Whirlpool, all of which rely on the Merkle-Damgård iterative principle. MD5 was designed by Ron Rivest in 1991, based on Rivest’s earlier hash function design MD4. The MD5 hash function has been employed in a wide variety of security applications. In 1995, the US National Institute of Standards and Technology (NIST) issued the Secure Hash Standard with a specification of the SHA-1 algorithm. This algorithm has become the most widely used hash function standard. A new SHA-2 algorithm was published in 2001. After the breakthrough in cryptanalysis by Wang et al. [WYY05, WY05] in 2005, security flaws were identified in MD5 and SHA-1.
Moreover, other results emerged [Jou04, Dea99, KS05, KK06] that raised questions about the security of the Merkle-Damgård construction and of hash functions in general. This hash function crisis initiated NIST’s ongoing hash function competition [NIS07], whose aim is to develop a new hash function standard, SHA-3. The end of the selection process is scheduled for late 2012.

NIST specified a number of requirements that the future SHA-3 function should meet. A hash function with an $n$-bit hash value is required to provide collision resistance of approximately $n/2$ bits, preimage resistance of approximately $n$ bits and second preimage resistance of approximately $n - L$ bits, where the length of the first preimage is at most $2^L$ blocks. We also point to the indifferentiability framework introduced by Maurer et al. [MRH04]. This framework was further developed in the context of hash functions by Coron et al. [CDMP05]. Indifferentiability is important because it guarantees resistance against all generic attacks.

¹ This property is known as collision-resistance preservation.

The hash functions submitted to the SHA-3 competition claim security, but only a limited number of them are actually backed by security proofs. Many of these security results are obtainable by means of provable security. The concept of provable security was introduced by Goldwasser and Micali [GM84]. Originally, they developed it in the context of asymmetric encryption. From this preliminary work, several lines of research emerged. Fundamentally, the goal of provable security is to provide a mathematical guarantee that a cryptographic scheme cannot be broken by a class of attackers in a specified mathematical model of reality. Cryptographic schemes are usually based on some mathematical problem. Those schemes that can be proven secure under the assumption that the underlying mathematical problem is computationally hard are said to be secure in the standard model.
Since it is usually difficult to assess security in the standard model, the ideal model is often used in practice. Within this model, the underlying cryptographic primitives are replaced by their idealized versions.

Practically, the provable security approach allows us to prove the security of a higher-level scheme (e.g. a digital signature) under some assumption on the security of the hash function. In this context, an important line of research was initiated by Fiat and Shamir [FS86], who suggested the random oracle methodology. Later, Bellare and Rogaway [BR93] formally introduced the random oracle model in order to allow the design of more practice-oriented provably secure cryptographic schemes. They depicted the random oracle model as a “bridge between theory and practice”. Within this model, the hash function is replaced by an ideal primitive (a random oracle).

Likewise, the provable security approach also allows us to conduct the security analysis of hash functions themselves, which can be carried out in both the standard and the ideal model. A typical approach in this context is to argue a security property of the hash function under some assumption on a security property of the underlying compression function. In the ideal model, adversaries have oracle access to the ideal version of the compression function or of its underlying building blocks (e.g. a block cipher or permutation(s)).

During the second round of NIST’s competition, Andreeva et al. [AMP10c] provided a summary of all known security reduction results for all 14 second round SHA-3 candidates. Moreover, they identified open problems regarding the security reduction results, and as the main concern they indicated the lack of optimal security bounds on second preimage resistance. These results have been revisited in [AMPŠ12], a part of which is based on the results of the work presented in this thesis. In addition, we refer to [ABM+12, ALM11].

Besides the aforementioned goal, another substantial aspect of the provable security approach is associated with the introduction of notions and their definitions. The lack of proper definitions for the basic notions of security encouraged Rogaway and Shrimpton [RS04] to revisit and formalize seven security notions of keyed hash functions. They also considered all of the implications and separations among them within the provable security framework. Subsequently, Andreeva et al. [ANPS07, AMP10b] determined, by proof or counterexample, the security property preservation² of seventeen different iterations in the standard model.

Our Contribution

In this thesis we analyze the security of the final round candidates in the competition for the new SHA-3 hashing algorithm. We give a concise survey of the five finalists together with their security reductions and performance results. The main contribution of this thesis is the analysis of the second preimage resistance of the hash function competition finalists Grøstl and Skein. More precisely, within the concrete security framework of provable security, we provide a lower bound on the second preimage resistance of Grøstl in the ideal permutation model and of Skein in the ideal cipher model, and prove them both optimally second preimage resistant.

Outline of the Thesis

In Chapter 2 we introduce the mathematical and cryptographic prerequisites for our proofs. In Chapter 3 we present the timeline of the SHA-3 hash function competition as well as NIST’s requirements and evaluation criteria for the SHA-3 hash function. Additionally, we provide a brief introduction to the five finalists of the competition and their security and performance properties. In Chapter 4 we present proofs of the second preimage resistance of the Grøstl and Skein hash functions. Chapter 5 offers concluding remarks, in which we discuss the obtained security results, highlight some limitations of our approach and provide some directions for future research.
² The preservation of the seven security properties defined in [RS04].

Chapter 2

Preliminaries

In this chapter we introduce basic background knowledge, which includes concepts from mathematics as well as cryptography. In Section 2.1 we introduce the basic mathematical definitions. In Section 2.2 the concepts of provable security are discussed. Section 2.3 offers an introduction to cryptographic hash functions and their security properties.

2.1 Mathematical Background

In this section we first give the mathematical notation used in our work. Then we offer a brief summary of basic definitions from graph theory (Section 2.1.2), probability theory (Section 2.1.3) and complexity theory (Section 2.1.4). The definitions and notation for this section are taken from the literature [Die10, MvOV97].

2.1.1 Notation

Let $\mathbb{N}$ denote the set of all natural numbers and $\mathbb{Z}$ the set of integers. Let $n \in \mathbb{N}$; then $\{0,1\}^n$ denotes the set of all $n$-bit strings. We denote the set of all bit strings of arbitrary length by $\{0,1\}^*$. The concatenation of two bit strings $x$ and $y$ is denoted by $x \| y$. The message blocks of any message $M$ are denoted by $m_1 \| m_2 \| \ldots \| m_k$, where $k$ denotes the number of message blocks. Furthermore, $x \xleftarrow{\$} X$ corresponds to selecting $x$ uniformly at random from the set $X$.

2.1.2 Graph Theory

Definition 2.1. A graph is a pair $G = (V, E)$ of sets satisfying $E \subseteq [V]^2$; thus, the elements of $E$ are 2-element subsets of $V$. The elements of $V$ are the vertices (or nodes, or points) of the graph $G$, the elements of $E$ are its edges (or lines). The number of vertices of a graph $G$ is its order, written as $|G|$; its number of edges is denoted by $\|G\|$. Two vertices $x, y$ of $G$ are adjacent (or neighbours) if $e = \{x, y\}$ is an edge of $G$. Two edges $e \neq f$ are adjacent if they have an end in common. The vertex set of a graph $G$ is referred to as $V(G)$, its edge set as $E(G)$.
Graphs are finite or infinite according to their order; unless otherwise stated, the graphs we consider are all finite. For the empty graph $(\emptyset, \emptyset)$ we simply write $\emptyset$. A graph of order 0 or 1 is called trivial.

Definition 2.2. A path in narrow sense is a non-empty graph $P = (V, E)$ of the form $V = \{x_0, x_1, \ldots, x_k\}$, $E = \{e_1, e_2, \ldots, e_k\}$, where the $x_i$ are all distinct and $e_i = \{x_{i-1}, x_i\}$ for all $1 \le i \le k$.

Definition 2.3. A path in wider sense¹ of length $k$ in a graph $G$ is a non-empty sequence $x_0 \xrightarrow{e_1} x_1 \xrightarrow{e_2} x_2 \cdots \xrightarrow{e_{k-1}} x_{k-1} \xrightarrow{e_k} x_k$ of vertices and edges in $G$ such that $e_i = \{x_{i-1}, x_i\}$ for all $1 \le i \le k$. A path is closed if $x_0 = x_k$ and open if they are different. If the vertices in a path in wider sense are all distinct, it defines an obvious path in narrow sense in $G$.

In a path, the first vertex $x_0$ is called the start vertex and the last vertex $x_k$ is called the end vertex. These two vertices are linked by the path and jointly they are called the terminal vertices of the path; the vertices $x_1, \ldots, x_{k-1}$ are the inner vertices of the path. The number of edges of a path is its length.

Definition 2.4. A directed graph (or digraph) is a pair $(V, E)$ of disjoint sets (of vertices and edges) together with two maps $\mathrm{init}: E \to V$ and $\mathrm{ter}: E \to V$ assigning to every edge $e$ an initial vertex $\mathrm{init}(e)$ and a terminal vertex $\mathrm{ter}(e)$. The edge $e$ is said to be directed from $\mathrm{init}(e)$ to $\mathrm{ter}(e)$. A directed graph may have several edges between the same two vertices $x, y$. Such edges are called multiple edges; if they have the same direction (say from $x$ to $y$), they are parallel. If $\mathrm{init}(e) = \mathrm{ter}(e)$, the edge $e$ is called a loop.

¹ The term “walk” is used by some authors [Die10] for a path in wider sense $p$ (a path in which vertices or edges may be repeated), while the terms “path” and “simple path” are used for what is in our work called a path in narrow sense $P$.

Notice that we use directed graphs in our work.
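To make Definitions 2.3 and 2.4 concrete, the following minimal sketch stores a digraph as a map from edge names to the pair (init(e), ter(e)) and checks whether an edge sequence is a path in wider sense. The example graph, the names `EDGES`, `is_path` and `is_closed`, and the convention that edges are traversed in their direction are assumptions of ours, not part of the source.

```python
from typing import Dict, List, Tuple

# Digraph in the sense of Definition 2.4: every edge name maps to the pair
# (init(e), ter(e)). Naming the edges allows parallel edges and loops.
Digraph = Dict[str, Tuple[str, str]]

EDGES: Digraph = {
    "e1": ("x0", "x1"),
    "e2": ("x1", "x2"),
    "e3": ("x1", "x2"),  # parallel to e2: same initial and terminal vertex
    "e4": ("x2", "x2"),  # loop: init(e4) == ter(e4)
    "e5": ("x2", "x0"),
}


def is_path(edges: Digraph, p: List[str]) -> bool:
    """True if p = (e1, ..., ek) is a path in wider sense: a non-empty edge
    sequence in which each edge starts where the previous one ends."""
    if not p or any(e not in edges for e in p):
        return False
    return all(edges[a][1] == edges[b][0] for a, b in zip(p, p[1:]))


def is_closed(edges: Digraph, p: List[str]) -> bool:
    """A path is closed if its start vertex equals its end vertex."""
    return is_path(edges, p) and edges[p[0]][0] == edges[p[-1]][1]


print(is_path(EDGES, ["e1", "e2", "e4", "e5"]))    # True
print(is_closed(EDGES, ["e1", "e2", "e4", "e5"]))  # True
print(is_path(EDGES, ["e1", "e5"]))                # False
```

The first path repeats the vertex x2 (through the loop e4), so it is a path in wider sense but not in narrow sense.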
Moreover, by the term path we mean a path in wider sense, and we often denote it by the natural sequence of its edges $p = (e_1, e_2, \ldots, e_k)$.

2.1.3 Probability Theory

In this section we consider sample spaces with only finitely many possible outcomes. Let the simple events of a sample space $S$ be labeled $s_1, s_2, \ldots, s_n$.

Basic Definitions

Definition 2.5. An experiment is a procedure that yields one of a given set of outcomes. The individual possible outcomes are called simple events. The set of all possible outcomes is called the sample space.

Definition 2.6. A probability distribution $P$ on $S$ is a sequence of numbers $p_1, p_2, \ldots, p_n$ that are all non-negative and sum to 1. The number $p_i$ is interpreted as the probability of $s_i$ being the outcome of the experiment.

Definition 2.7. An event $E$ is a subset of the sample space $S$. The probability that event $E$ occurs, denoted $P(E)$, is the sum of the probabilities $p_i$ of all simple events $s_i$ which belong to $E$. If $s_i \in S$, $P(\{s_i\})$ is simply denoted by $P(s_i)$.

Fact 2.8. Let $E \subseteq S$ be an event.
i) $0 \le P(E) \le 1$. Furthermore, $P(S) = 1$ and $P(\emptyset) = 0$. ($\emptyset$ is the empty set.)
ii) If the outcomes in $S$ are equally likely, then $P(E) = |E| / |S|$.

Definition 2.9. Two events $E_1$ and $E_2$ are called mutually exclusive if $P(E_1 \cap E_2) = 0$. That is, the occurrence of one of the two events excludes the possibility that the other occurs.

Fact 2.10. Let $E_1$ and $E_2$ be two events.
i) If $E_1 \subseteq E_2$, then $P(E_1) \le P(E_2)$.
ii) $P(E_1 \cup E_2) + P(E_1 \cap E_2) = P(E_1) + P(E_2)$. Hence, if $E_1$ and $E_2$ are mutually exclusive, then $P(E_1 \cup E_2) = P(E_1) + P(E_2)$.

Conditional Probability

Definition 2.11. Let $E_1$ and $E_2$ be two events with $P(E_2) > 0$. The conditional probability of $E_1$ given $E_2$, denoted $P(E_1 \mid E_2)$, is
\[ P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)}. \]
$P(E_1 \mid E_2)$ measures the probability of event $E_1$ occurring, given that $E_2$ has occurred.

Definition 2.12. Events $E_1$ and $E_2$ are independent if $P(E_1 \cap E_2) = P(E_1) P(E_2)$.
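The definitions above can be checked on a small finite sample space. The sketch below is an illustration of our own (two fair dice, with the hypothetical helper names `P`, `P_given` and `independent`); it uses exact fractions so that the equalities in Definitions 2.11 and 2.12 and Fact 2.10 can be tested without rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space S: ordered outcomes of two fair dice, all equally likely,
# so P(E) = |E| / |S| as in Fact 2.8 ii).
S = frozenset(product(range(1, 7), repeat=2))


def P(event):
    """Probability of an event, i.e. of a subset of S."""
    return Fraction(len(event & S), len(S))


def P_given(e1, e2):
    """Conditional probability P(E1 | E2); requires P(E2) > 0."""
    if P(e2) == 0:
        raise ValueError("P(E2) must be positive")
    return P(e1 & e2) / P(e2)


def independent(e1, e2):
    """Independence test of Definition 2.12: P(E1 and E2) == P(E1) * P(E2)."""
    return P(e1 & e2) == P(e1) * P(e2)


first_is_six = {s for s in S if s[0] == 6}
second_is_even = {s for s in S if s[1] % 2 == 0}
sum_is_twelve = {s for s in S if sum(s) == 12}

print(P(first_is_six))                             # 1/6
print(independent(first_is_six, second_is_even))   # True
print(P_given(sum_is_twelve, first_is_six))        # 1/6
print(independent(first_is_six, sum_is_twelve))    # False
```

The last two lines show a dependent pair: knowing that the first die shows six raises the probability of a sum of twelve from 1/36 to 1/6.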
Observe that if $E_1$ and $E_2$ are independent, then $P(E_1 \mid E_2) = P(E_1)$ and $P(E_2 \mid E_1) = P(E_2)$. That is, the occurrence of one event does not influence the likelihood of occurrence of the other.

Fact 2.13. (Bayes’ theorem) If $E_1$ and $E_2$ are events with $P(E_2) > 0$, then
\[ P(E_1 \mid E_2) = \frac{P(E_1) \, P(E_2 \mid E_1)}{P(E_2)}. \]

2.1.4 Complexity Theory

The main goal of complexity theory is to provide mechanisms for classifying computational problems according to the resources needed to solve them. The classification should not depend on a particular computational model, but rather should measure the intrinsic difficulty of the problem. The resources measured may include time, storage space, random bits, number of processors, etc., but typically the main focus is time, and sometimes space.

Basic Definitions

Definition 2.14. An algorithm is a well-defined computational procedure that takes a variable input and halts with an output.

It is usually of interest to find the most efficient algorithm for solving a given computational problem. The time that an algorithm takes to halt depends on the “size” of the problem instance. Also, the unit of time used should be made precise, especially when comparing the performance of two algorithms.

Definition 2.15. The size of the input is the total number of bits needed to represent the input in ordinary binary notation using an appropriate encoding scheme. Occasionally, the size of the input will be the number of items in the input.

Definition 2.16. The running time of an algorithm on a particular input is the number of primitive operations or “steps” executed. Often a step is taken to mean a bit operation. For some algorithms it will be more convenient to take a step to mean something else, such as a comparison, a machine instruction, a machine clock cycle, a modular multiplication, etc.

Definition 2.17. The worst-case running time of an algorithm is an upper bound on the running time for any input, expressed as a function of the input size.

Definition 2.18. The average-case running time of an algorithm is the average running time over all inputs of a fixed size, expressed as a function of the input size.

Asymptotic notation

It is often difficult to derive the exact running time of an algorithm. In such situations one is forced to settle for approximations of the running time, and usually may only derive the asymptotic running time. That is, one studies how the running time of the algorithm increases as the size of the input increases without bound. In what follows, the only functions considered are those which are defined on the positive integers and take on real values that are always positive from some point onwards. Let $f$ and $g$ be two such functions.

Definition 2.19. (order notation)
i) (asymptotic upper bound) $f(n) = O(g(n))$ if there exists a positive constant $c$ and a positive integer $n_0$ such that $0 \le f(n) \le c\,g(n)$ for all $n \ge n_0$.
ii) (asymptotic lower bound) $f(n) = \Omega(g(n))$ if there exists a positive constant $c$ and a positive integer $n_0$ such that $0 \le c\,g(n) \le f(n)$ for all $n \ge n_0$.
iii) (asymptotic tight bound) $f(n) = \Theta(g(n))$ if there exist positive constants $c_1$ and $c_2$ and a positive integer $n_0$ such that $c_1 g(n) \le f(n) \le c_2 g(n)$ for all $n \ge n_0$.
iv) (o-notation) $f(n) = o(g(n))$ if for any positive constant $c > 0$ there exists a constant $n_0 > 0$ such that $0 \le f(n) \le c\,g(n)$ for all $n \ge n_0$.

Intuitively, $f(n) = O(g(n))$ means that $f(n)$ grows no faster asymptotically than $g(n)$ to within a constant multiple, while $f(n) = \Omega(g(n))$ means that $f(n)$ grows at least as fast asymptotically as $g(n)$ to within a constant multiple. $f(n) = o(g(n))$ means that $g(n)$ is an upper bound for $f(n)$ that is not asymptotically tight, or in other words, the function $f(n)$ becomes insignificant relative to $g(n)$ as $n$ gets larger.
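As a worked illustration of Definition 2.19, consider the function $f(n) = 3n^2 + 5n$. This function and the constants chosen below are our own example and do not come from the source; they only show how the constants $c$ and $n_0$ are exhibited in practice.

```latex
\begin{align*}
f(n) &= 3n^2 + 5n, \qquad g(n) = n^2. \\
\text{Upper bound: } & 3n^2 + 5n \le 4n^2 \iff 5n \le n^2 \iff n \ge 5,
  \text{ so } f(n) = O(n^2) \text{ with } c = 4,\ n_0 = 5. \\
\text{Lower bound: } & 3n^2 \le 3n^2 + 5n \text{ for all } n \ge 1,
  \text{ so } f(n) = \Omega(n^2) \text{ with } c = 3,\ n_0 = 1. \\
\text{Tight bound: } & 3n^2 \le f(n) \le 4n^2 \text{ for all } n \ge 5,
  \text{ so } f(n) = \Theta(n^2) \text{ with } c_1 = 3,\ c_2 = 4,\ n_0 = 5. \\
\text{Not tight: } & 5n \le c\,n^2 \iff n \ge 5/c \text{ for every } c > 0,
  \text{ so } 5n = o(n^2).
\end{align*}
```

The last line shows why the linear term does not affect the tight bound: relative to $n^2$ it becomes insignificant as $n$ grows.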
The expression $o(1)$ is often used to signify a function $f(n)$ whose limit as $n$ approaches $\infty$ is 0.

2.2 Provable Security

The first part of this section focuses on the basic definitions and concepts of provable security. Then in Section 2.2.5 we present two approaches taken from complexity theory that are used to evaluate the level of security of cryptographic schemes.

2.2.1 Basic Definitions

The main goal of cryptography is to enable secure communication by using cryptographic schemes (or protocols). A common way to design a scheme is to choose or build secure atomic² primitives, and then to design the scheme on top of them in such a way that the scheme can “inherit” security from these atomic primitives. By an atomic primitive we mean either a problem which is considered to be computationally hard (e.g. the discrete log problem, the integer factorization problem) or a secure cryptographic construction such as a block cipher, permutation, compression function, etc.

The problem that can arise in the design of a cryptographic scheme is that even if a good underlying atomic primitive is used, a poor design can result in an insecure scheme. The usual way to investigate whether a scheme inherits the desired security properties from the underlying primitives is by means of provable security. The idea of provable security was introduced in 1984 by Goldwasser and Micali [GM84] in the context of asymmetric encryption.

Theoreticians often point out that the term “provable security” is in some way misleading. The reason for this is that we do not actually provide an absolute proof of security. We simply provide a reduction of the security of the scheme to the security of some underlying atomic primitive. The term that better reflects the essence of this approach is the reductionist approach.

² In this context the term “atomic” means that the primitive in question cannot be used alone to solve a specific cryptographic problem. Commonly, it is used as a building block of higher-level primitives.

2.2.2 The Provable Security Paradigm

In order to provide a security proof we need to:
1. Introduce a formal adversarial model for a concrete security goal.
2. Formally define the security notion we want to achieve in the chosen adversarial model.
3. Exhibit a security reduction which shows that the only practical way to defeat the scheme is to break the underlying atomic primitive.

This practically means that if we find some weakness in the scheme, we will definitely find a weakness in the underlying atomic primitive as well. Vice versa, if we believe that the atomic primitive is secure, then we know that the scheme must be secure with respect to the desired security notion. From the point of view of cryptanalysis, this implies that its focus should be on the atomic primitive.

To summarize, there are two principal aims of the provable security approach. The first is associated with the introduction of notions and their definitions, which practically entails the classification of protocols and atomic primitives, while the second is related to the actual reduction.

2.2.3 Assumptions

When the provable security approach is used, one needs to be aware that proven security does not exclude the possibility of an attack. Crucially, the scheme is proven secure under a certain assumption. If the assumption is not satisfied, the results obtained by the proof become irrelevant. This does not necessarily lead to a practical attack; it only means that the proof of security is no longer useful. This further implies that a proof of security is more valuable when the assumption is weaker. The strength of the assumption thus gives us a way to compare security reduction results (e.g. if two schemes are proven secure, the one making weaker assumptions is preferable). However, it is not always possible to compare the strength of the assumptions.
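A reduction of the kind described in Section 2.2.2 is commonly summarized as an inequality between success probabilities. The generic form below is only an illustrative sketch: the symbols $\mathbf{Adv}$, $\mathcal{S}$, $\mathcal{P}$, $\mathcal{A}$, $\mathcal{B}$ and $\varepsilon$ are placeholders of our own and do not refer to a specific result of this thesis.

```latex
% S: the scheme, P: the underlying atomic primitive.
% A: any adversary against the security goal of S.
% B: an adversary against P, constructed from A by the reduction,
%    using resources comparable to those of A.
% epsilon: a loss term that should be small.
\[
  \mathbf{Adv}^{\mathrm{goal}}_{\mathcal{S}}(\mathcal{A})
  \;\le\;
  \mathbf{Adv}^{\mathrm{prim}}_{\mathcal{P}}(\mathcal{B}) + \varepsilon
\]
```

Read in one direction, a successful attack on the scheme yields a successful attack on the primitive; read in the other, if every adversary against the primitive has small advantage, then so does every adversary against the scheme. The inequality says nothing once the assumption on the primitive fails, which is the situation discussed in Section 2.2.3.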
2.2.4 Standard and Ideal Model

In cryptography, the standard model is the model of computation in which the adversary is limited only by the amount of time and computational power available. As pointed out above, cryptographic schemes are often based on complexity assumptions³. Schemes whose security reduction is possible using only complexity assumptions are said to be secure in the standard model. Although a proof in the standard model gives stronger security guarantees than other techniques, it is quite difficult to complete this type of proof in practice. Therefore, many proofs use an ideal model in which cryptographic primitives (e.g. block cipher, permutation, compression function) are replaced by their idealized versions. Probably the best known model of this kind is the random oracle model.

³ By a complexity assumption we mean an assumption on the hardness of the underlying problem (e.g. the discrete log problem, the integer factorization problem).

2.2.5 Complexity Theory Techniques

Asymptotic Security

In the theoretical literature, complexity theory is widely used. There one talks about polynomial-time adversaries and negligible success probabilities. In this setting, a scheme needs to be designed with polynomial-time algorithms. Polynomial-time reductions can then be exhibited from the assumed computationally hard underlying problem to an attack on the security notion. Generally speaking, by exhibiting a polynomial-time reduction from A to B, we show that problem B is at least as hard as A. A polynomial security result claims that a scheme is secure for sufficiently large values of the security parameter, without suggesting any specific values for it.

Concrete Security

The polynomial-time approach is well suited to the theoretical domain, but in practice the more desirable approach is to provide a concrete number for the security parameter.
Such a number needs to suggest, for example, how large the security parameter should be so that a polynomial-time adversary that makes a certain number of queries to the public algorithms of the scheme succeeds only with small probability. This framework is called the concrete security framework, and it captures the quantitative nature of security. Another aspect of the concrete security approach concerns how well the strength of the underlying atomic primitive is preserved in its transformation to the scheme.

Security Resistance and Attacks

In the provable security framework, attacks and security resistance complement each other. Attacks measure the degree of insecurity, while quantitative bounds measure the degree of security. More precisely, while a proof of security provides a lower bound, cryptographic attacks provide an upper bound on the complexity of breaking the scheme under some assumption. When these two bounds meet, the security of the scheme with respect to the property in question is determined and the bound is said to be tight.

2.3 Hash Functions

We first briefly introduce the basic definitions and notions of hash functions together with their design strategies. In Section 2.3.2 we then present the main security properties of hash functions. Generic second preimage attacks against the Merkle-Damgård mode of operation are discussed in Section 2.3.3. In Section 2.3.4 we introduce existing design strategies for compression functions, while in Section 2.3.5 the most significant new modes of operation are discussed. Finally, Section 2.3.6 and Section 2.3.7 present the proof techniques and the security model that are used later in our security analysis.

2.3.1 Basic Definitions

Definition 2.20. A hash function is a deterministic function that maps an input of arbitrary finite size to an output of fixed finite size. Formally, $H : \{0,1\}^* \to \{0,1\}^n$.

Definition 2.21.
A compression function is a deterministic function that maps an input of fixed finite size s to an output of fixed finite size p, where s > p.

The procedure that describes how a compression function should be used in order to allow secure hashing of arbitrarily long inputs is called a mode of operation. One of the first proposals that included an iteration of a compression function was made by Rabin [Rab78]. The advantages of this iterative approach are a time complexity that is linear in the message size and modest memory requirements. Later, Merkle defined the three main security notions of hash functions in his PhD thesis [Mer79]. These basic notions of security were revisited and formalized in a wider context in [RS04, AS11]. At this point, we give the commonly used informal definitions of the three main security notions:

• collision resistance (Coll) - it is hard to find any two distinct inputs $M$ and $M'$ which hash to the same output, i.e., such that $H(M) = H(M')$.
• second-preimage resistance (Sec) - it is hard to find any second input which has the same output as a specified input, i.e., given $M$, to find a second preimage $M' \neq M$ such that $H(M) = H(M')$.
• preimage resistance (Pre) - given an output $Y$, it is hard to find any input which hashes to that output, i.e., to find any preimage $M'$ such that $H(M') = Y$.

2.3.1.1 Merkle-Damgård Mode of Operation

As indicated previously, Rabin [Rab78] introduced an iterative hash function design based on the DES block cipher. The algorithm goes as follows: first, a message $M$ is divided into message blocks $M = m_1 \| m_2 \| \ldots \| m_{k-1} \| m_k$ of fixed size. The hash function is then computed iteratively: $h_i \leftarrow f(h_{i-1}, m_i)$, where $f(h_{i-1}, m_i) = \mathrm{DES}_{m_i}(h_{i-1})$ and $h_0 = IV$. Finally, the hash function returns the hash value $H(M) = h_k$. It was later shown that the use of an unspecified $IV$ leads to trivial second preimage and collision attacks.
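The iteration just described, and the weakness caused by a freely chosen IV, can be illustrated with a minimal sketch. The compression function f below is a hypothetical stand-in (truncated SHA-256 rather than DES), and the state size and message blocks are arbitrary assumptions chosen for readability; the sketch is not Rabin's actual scheme.

```python
import hashlib

STATE_BYTES = 8  # toy chaining-value size (assumption, far too small for real use)

def f(h, m):
    """Hypothetical compression function standing in for DES_m(h)."""
    return hashlib.sha256(h + m).digest()[:STATE_BYTES]

def iterate(iv, blocks):
    """Compute h_i <- f(h_{i-1}, m_i) starting from h_0 = iv and return h_k."""
    h = iv
    for m in blocks:
        h = f(h, m)
    return h

iv = bytes(STATE_BYTES)  # h_0, here simply all-zero bytes
blocks = [b"block-01", b"block-02", b"block-03"]
digest = iterate(iv, blocks)

# If the IV may be chosen freely, a shorter message reaches the same digest:
# drop m_1 and start the iteration from h_1 instead of h_0.
h1 = f(iv, blocks[0])
forged_blocks = blocks[1:]
forged_digest = iterate(h1, forged_blocks)
```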
A colliding message is obtained by removing the first message block and selecting $h_1$ as the $IV$. In addition, trivial preimage attacks are possible under the assumption that the $IV$ can be chosen by the adversary. Merkle and Damgård independently offered a solution to address these problems [Mer90, Dam90]. Their idea was to fix a default value for the $IV$ and to use a padding scheme with the message length appended at the end. Each of them offered a different padding scheme. Merkle’s padding scheme emerged as the standard due to its higher efficiency, as fewer padding bits are needed in the case of large messages.

Figure 2.1. The Merkle-Damgård construction.

The Merkle-Damgård mode of operation constructs a hash function $H^f : \{0,1\}^* \to \{0,1\}^n$ by iterating a compression function $f : \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$. Padding is achieved by appending to the original message a single ’1’ bit followed by as many ’0’ bits as needed to complete an m-bit block after embedding the message length at the end. Adding the message length in the last block and using a fixed $IV$, the so-called strengthening, is the crucial ingredient in establishing the collision-resistance preservation of Merkle-Damgård. This iteration design, known as the Merkle-Damgård construction, is the most commonly used mode of operation in hash functions.

2.3.1.2 Random Oracles

The difficulty of exhibiting a proof under complexity assumptions has led cryptographers to introduce a construction with well-understood properties that can be used whenever a cryptographic hash function is required. The first choice for such a construction is the random function. Fiat and Shamir [FS86] first suggested the random oracle framework, which was later formally introduced by Bellare and Rogaway [BR93].

Definition 2.22.
A random oracle is a public hash function that maps inputs of arbitrary size to outputs of finite size, $R : \{0,1\}^* \to \{0,1\}^n$, where the outputs are drawn uniformly at random from the range and the function is accessible by all algorithms in a black-box manner.

In a reductionist security proof, an underlying hash function can be replaced with the random oracle. The random oracle model allows us to prove the security of cryptographic schemes that are efficient in practice, which can sometimes be provably impossible in the standard model. Still, when using the random oracle model one needs to be aware that the random oracle assumption is the strongest assumption possible for hash functions. As a consequence, the security guarantees provided by the random oracle model are not as strong as those obtained in the standard model. What are the advantages of this approach? Firstly, it allows efficient schemes to be built. Furthermore, even though the random oracle assumption is strong, the results obtained in the random oracle model provide valuable security guarantees (e.g. provable exclusion of certain generic attacks, absence of security flaws in the design, etc.).

2.3.2 Security Properties

In this section we formally introduce the basic security notions of hash functions as well as their expected security levels. We take notation and terminology from [And10, Bou11].

2.3.2.1 Formal Security Notions

Recall that the goal of this work is to obtain reduction results on the security of particular hash functions. As indicated in Section 2.2, in order to make a reduction possible we need to introduce a formal adversarial model, and the security notion of the scheme (a hash function in this case) has to be defined in that model. The formal definitions for hash functions are so-called attack-based definitions.
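In experiments built on such attack-based definitions, the random oracle of Definition 2.22 is commonly simulated by lazy sampling: an output is drawn uniformly at random the first time an input is queried and is then repeated for every later query on the same input. The following is a minimal sketch of that idea; the class name, the byte-oriented interface and the default output length are illustrative assumptions.

```python
import os

class RandomOracle:
    """Lazily sampled random function R: {0,1}* -> {0,1}^n (n given in bytes)."""

    def __init__(self, n_bytes=32):
        self.n_bytes = n_bytes
        self.table = {}  # answers fixed so far: input -> output

    def query(self, message):
        # First query on this input: sample a fresh uniform output and store it.
        if message not in self.table:
            self.table[message] = os.urandom(self.n_bytes)
        # Repeated queries must return the same answer, as for any function.
        return self.table[message]

oracle = RandomOracle(n_bytes=4)
first = oracle.query(b"message")
second = oracle.query(b"message")
```

Both the adversary and the scheme under analysis would query the same object, which reflects the black-box access to a public function that the definition requires.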
Typically, attacks are defined through a game between a challenger and an attacker, where the challenger’s task is to simulate the environment of the adversary $\mathcal{A}$ and to generate the secret system parameters. Usually the adversarial advantage is measured by the success probability of the adversary. When analyzing a security property $xxx \in \{\mathrm{Coll}, \mathrm{Pre}, \mathrm{Sec}\}$ of the hash function $H$, we denote the adversarial advantage in breaking that property by $\mathbf{Adv}^{xxx}_{H}(\mathcal{A})$. We write $\mathbf{Adv}^{xxx}_{H}(t)$ to denote the maximum advantage of any adversary with time complexity at most $t$. Following NIST’s security requirements, the first preimage $M$ has a length of $2^L$ blocks; throughout this thesis this length is denoted by $\lambda$ (in bits) and $k$ (in blocks), where $\lambda/m \approx k = 2^L$.

Definition 2.23. Let $\lambda, n \in \mathbb{N}$ and let $H : \{0,1\}^* \to \{0,1\}^n$ be a hash function. Then, the advantage of the adversary $\mathcal{A}$ against collision is

$$\mathbf{Adv}^{\mathrm{Coll}}_{H}(\mathcal{A}) = \Pr\left[(M, M') \xleftarrow{\$} \mathcal{A}(\cdot) : M \neq M' \text{ and } H(M) = H(M')\right].$$

The advantage of the second preimage adversary $\mathcal{A}$ is defined as

$$\mathbf{Adv}^{\mathrm{Sec}[\lambda]}_{H}(\mathcal{A}) = \Pr\left[M \xleftarrow{\$} \{0,1\}^{\lambda};\ M' \xleftarrow{\$} \mathcal{A}(M) : M \neq M' \text{ and } H(M) = H(M')\right].$$

The advantage of the preimage adversary $\mathcal{A}$ is defined as

$$\mathbf{Adv}^{\mathrm{Pre}}_{H}(\mathcal{A}) = \Pr\left[M \xleftarrow{\$} \{0,1\}^{\lambda};\ Y \leftarrow H(M);\ M' \xleftarrow{\$} \mathcal{A}(Y) : H(M') = Y\right].$$

These are the commonly used formal definitions of the keyless second preimage and preimage notions. An attempt to formalize collision resistance in a similar fashion faces fundamental difficulties. The problem lies in the fact that for any hash function there always exists an efficient collision-finding algorithm, but we humans are simply not able to find it. One solution for formalizing collision resistance in the standard model was offered by Rogaway [Rog06]. The main idea behind his proposal was to provide a security reduction for the case when a hash function is used as a building block of a higher level primitive.
This reduction means that as long as humans are not able to find a collision on the hash function, the higher level primitive cannot be broken by hash function collisions.

2.3.2.2 Expected Security

Now that we have defined the relevant security notions, we need to see what their security level is. We want to show security results for hash functions in general, which means that we do not want to focus on any particular hash design. In order to achieve this, we consider a hash function which acts as a random oracle.

Preimage and Second Preimage Resistance. It is easy to show that any adversary who is trying to find a (second) preimage succeeds with probability about $q/2^n$ after sending $q$ queries to the random oracle. Each query to the random oracle returns a uniformly random output of size $n$. This implies that each query has probability $2^{-n}$ of yielding a (second) preimage. Hence, when we consider the hash function as a random oracle, the problem of finding a second preimage is just as hard as the problem of inverting the hash function.

Collision Resistance. Results for collision resistance are a bit different due to the birthday paradox. Intuitively, it is much easier to find any pair of inputs which hash to the same output than to find an input which hashes to the same output as one particular input selected beforehand. The birthday problem estimates the probability that in a set of randomly chosen people (fewer than 365) some pair shares the same birthday, under the assumption that all birthday dates are equally probable. The probability that such a pair exists is higher than 50% if there are 23 persons in the set. If we compare the number of possible dates (365) with the number of people required (23), we see that 23 is approximately proportional to the square root of 365. If we map the birthday paradox to our collision problem, with a range of $2^n$ possible values, it follows that after about $\sqrt{2^n} = 2^{n/2}$ queries to the hash function, a collision is found with constant probability (about 39% after exactly $2^{n/2}$ queries, and more than 50% after roughly $1.18 \cdot 2^{n/2}$ queries). We can also look at this problem from a different angle. If an adversary is trying to find a collision, then after sending $q$ queries to the random oracle he knows $q(q-1)/2$ pairs, and each pair results in a collision with probability $2^{-n}$. This implies that a collision is expected after about $2^{n/2}$ queries [Wag02].

2.3.3 Generic Attacks Against Merkle-Damgård Mode of Operation

Cryptanalysis of modes of operation has increased significantly over the years. As a result, several generic⁴ attacks against the Merkle-Damgård mode of operation were introduced (e.g. the length extension attack, Joux’s multicollision attack, etc.). In our work, we are interested in generic second preimage attacks. We defined second preimage resistance as the security notion which captures the difficulty of finding any second message input which has the same output as a previously specified message input. For a long time it was thought that a Merkle-Damgård based hash function with strengthening preserved second preimage resistance and that it took about $2^n$ steps (queries) to find a second preimage for a secure hash function [LM92]. However, in 1999, Dean showed in his PhD thesis [Dea99] that this security level could not be achieved by hash functions whose compression function allows fixed points⁵ to be found easily. He found a way to circumvent the strengthening by finding second preimages of the same length as the target message. Surprisingly, this important result went unnoticed until 2005, when Kelsey and Schneier [KS05] generalized Dean’s attack by using the multicollision result of Joux [Jou04]. More precisely, they introduced a generic second preimage attack on Merkle-Damgård hash functions that requires at most approximately $2^{n-L}$ queries, where the length of the first preimage is at most $2^L$ blocks. Later, a more flexible generic second preimage attack was described by Andreeva et al.
[ABF+08], with the same complexity as the two mentioned before. Bouillaguet and Fouque [BF09] showed within the provable security framework that these generic second preimage attacks against the Merkle-Damgård construction are optimal under the assumption that the compression function is random.

⁴ Attacks which are applicable to all hash functions based on a single construction design or mode of operation are called generic attacks.
⁵ A fixed point of a function is a point that is mapped to itself by the function. In the context of hash functions, a fixed point for a compression function means that $f(h, m) = h$.

2.3.4 Compression Function Building Strategies

A compression function is commonly built on top of a block cipher or a limited number of permutations. Although block ciphers are primarily designed for encryption, they are used as a building block of compression functions because of their well-understood properties and design.

Block Cipher Based Compression Functions

A detailed analysis of block cipher based compression functions was conducted by Preneel et al. [PGV93]. More precisely, they analyzed the 64 most basic ways to construct a hash function from a block cipher⁶. Furthermore, Black et al. [BRS02] proved 12 of these 64 PGV schemes secure in an oracle model in which the underlying block cipher is treated as a random primitive. In 2009, Stam [Sta09] revisited the rate-1⁷ block cipher based hash functions and analyzed them in a wider context. The most widely known block cipher based compression functions are the Matyas-Meyer-Oseas (PGV1), the Miyaguchi-Preneel (PGV3) and the Davies-Meyer (PGV5) constructions. The main drawback of this type of design is its inefficiency.

Permutation Based Compression Functions

In order to address problems with a weak key schedule and to make compression functions more efficient, a limited number of permutations can be used instead of a block cipher. In their paper, Black et al.
[BCS05] analyzed all 2n-bit to n-bit compression functions based on one n-bit permutation, and proved them insecure against collision and (second) preimage attacks. Later, Rogaway and Steinberger [RS08b, RS08a] together with Stam [Sta08] extended these results to compression functions with arbitrary input and output sizes and an arbitrary number of underlying permutations. Moreover, they provided security bounds which indicate the expected number of queries required to find collisions or preimages for permutation based compression functions.

2.3.5 Other Modes of Operation

In 2004, the attacks of [WYY05, WY05] shook the confidence of the cryptographic community in the security of the widely employed hash functions MD5 and SHA-1. This led to an increased interest in the field of hash functions. As a result of the research on design strategies for hash functions, new modes of operation emerged, with different design and security characteristics. In this section we present some of the most important modes of operation.

⁶ These constructions are usually called PGV, which is an acronym for Preneel, Govaerts and Vandewalle.
⁷ A compression function based on a single call to a block cipher.

2.3.5.1 Wide-pipe and Narrow-pipe Design

One important aspect of hash function design is the size of the internal state relative to the size of the final hash output. The Merkle-Damgård construction is a so-called narrow-pipe design, where the size of the internal state is the same as the size of the final hash output ($l = n$). In 2005, Lucks introduced the Wide-pipe Hash [Luc05]. The main idea behind this design was to use an internal state considerably larger than the hash output ($l \gg n$). More precisely, the size of the internal state is about twice as big as the final hash output, which is obtained by chopping at the end of the iteration.
As a consequence, Lucks was able to provide a proof that generic second preimage attacks could not be faster than exhaustive search. A drawback of this design is its slightly higher memory requirements. This wide-pipe strategy has been employed in several SHA-3 competition finalists, namely Grøstl, JH, Keccak and Skein.

2.3.5.2 HAIFA

The HAsh Iterative FrAmework was introduced by Biham and Dunkelman [BD07]. The HAIFA mode is basically a modified version of the Merkle-Damgård mode in which slight tweaks are employed. In order to address the problem of generic second preimage attacks against Merkle-Damgård, the designers of HAIFA accompanied each message block in the iteration with a counter that tracks the number of message bits hashed so far and a fixed optional salt⁸. The security property preservation of the HAIFA design, among others, was investigated by Andreeva et al. [ANPS07]. Bouillaguet and Fouque proved HAIFA to be optimally second preimage resistant if the underlying compression function is assumed to behave like an ideal primitive [BF09]. The HAIFA design strategy was followed by the designers of one SHA-3 competition finalist, namely BLAKE. Consequently, the security results of HAIFA for preimage, second preimage, collision, and indifferentiability, which assume ideality of the underlying compression function, are applicable to the BLAKE hash function.

Figure 2.2. The HAsh Iterative FrAmework - HAIFA construction.

⁸ An input parameter for the compression function, which can be either public or secret.

2.3.5.3 Sponge

As an alternative to the Merkle-Damgård design, sponge functions were introduced by Bertoni et al. [BDPA07].
Instead of iterating a secure compression function in order to preserve security properties and obtain a secure hash function, the designers of sponge functions took a different approach in which a possibly insecure compression function is iterated a sufficient number of times to obtain a secure hash function. The internal state iterated by sponge functions is $r + c$ bits wide, where $c$ is the so-called capacity. The hash value is obtained after two phases: absorbing and squeezing. Sponge functions iteratively “absorb” one r-bit message block per compression function call, and this process is called the absorbing phase. Once the message is processed, the squeezing phase occurs, in which the first $r$ bits of the internal state are returned as an output block, possibly in an iterative manner. The number of output blocks can be chosen by the user. The security guarantees for most sponge-like constructions⁹ are typically based on indifferentiability results, which can be seen in Section 3.3.3 and Section 3.3.4. The SHA-3 competition finalist based on the original sponge function design is Keccak, while JH is regarded as a sponge-like hash function.

Figure 2.3. The sponge construction.

⁹ By a sponge-like hash function we mean a hash function which employs a permutation based compression function and iterates a wide internal state.

2.3.6 Establishing Security of Hash Functions

In this section we analyze, from the provable security perspective, the techniques that can be used to obtain security reduction results. Throughout the analysis, emphasis is placed on second preimage resistance.

2.3.6.1 Property Preservation

In Section 2.3.1.1 we showed how a hash function should be built in order to preserve collision resistance from the compression function to the complete hash function. Also, generic security is discussed in Section 2.3.3, where it is pointed out that the Merkle-Damgård construction does not preserve second preimage resistance.
Furthermore, Andreeva et al. [ANPS07, AMP10b] analyzed, among other properties, the preservation of second preimage resistance by various constructions. Unfortunately, only two of these constructions actually preserve second preimage resistance: the ROX construction [ANPS07] and BCM [AP09]. Typically used constructions are believed to fail to preserve second preimage resistance because fixed bits are introduced through the state input by the initialization vector and possibly through the message input. Another reason for non-preservation can be the presence of fixed padding bits in the message. As a consequence, the second preimage resistance of the compression function does not directly translate to the second preimage security of a hash function based on the Merkle-Damgård construction with final chopping.

2.3.6.2 Indifferentiability Results

In recent years, important progress in security analysis was made with the introduction of the indifferentiability framework by Maurer et al. [MRH04]. This framework was further developed in the context of hash functions by Coron et al. [CDMP05]. The main principle behind this framework is as follows: in order to investigate the security of a particular mode of operation, one replaces the underlying primitive (usually the compression function, or even an underlying building block of the compression function such as a permutation or block cipher) with an ideal version of itself (a random function, a random permutation, an ideal block cipher) and then compares the combination of the ideal primitive and the mode of operation in question with the random oracle. Following this approach we can determine whether the design is indifferentiable from a random oracle or not. A positive answer means that the design behaves ideally up to a certain level.
The level of resemblance (typically expressed in the number of queries) between the concrete design and the random oracle is regarded as an important security indicator. For us, the importance of this framework lies in the fact that a result obtained within it indirectly provides bounds on the (second) preimage and collision resistance of the hash function in question [AMP10c].

2.3.6.3 Idealized Proof Model

A reduction in the ideal model considers an information-theoretic adversary who has only query access to the idealized underlying primitive (the compression function in this case). Bouillaguet and Fouque [BF09] followed this approach to provide an optimal security bound on the second preimage resistance of the Merkle-Damgård and HAIFA constructions in the ideal compression function model. A benefit of a successfully conducted security reduction is the guarantee that the hash function has no severe structural weaknesses, unless one can detect a deviation from random behavior in the underlying compression function. In the latter case, the security results obtained by the reduction are invalid. Also, one needs to be aware that an ideal compression function is quite a strong assumption. In the problematic case, when the compression function exhibits non-random behavior, the level of modularity can be refined in order to revalidate or improve the security guarantees. In this case, one needs to assume ideal behavior of the underlying building blocks of the compression function (e.g. the underlying block cipher or permutation(s)).

In [BCC+08], the designers of Shabal suggested an idealized proof model to assess the collision, preimage and second preimage resistance of Shabal. More concretely, they proved Shabal secure in the ideal cipher model by using a graph based simulation approach. Subsequently, Fouque et al. [FSZ09] analyzed the collision and preimage resistance of a construction identical to the compression function of Grøstl.
This analysis was performed in the ideal permutation model. A summary of all known security reduction results in the ideal model for all 14 second round SHA-3 candidates was provided by Andreeva et al. [AMP10c]. Subsequently, these results were revisited and updated in [AMPŠ12, ABM+12].

2.3.7 Security Model

As explained in Section 2.2, once the decision has been made on what to achieve within the provable security framework, a formal adversarial model needs to be introduced in which the security notion of the scheme in question is defined. In Section 2.3.2.1, formal definitions of the three main security notions of hash functions are provided in the standard model. However, in the idealized proof model, where an underlying primitive of the compression function is assumed to be ideal, these formal definitions differ slightly. Therefore, in order to carry out a meaningful reduction, we introduce the formal adversarial model that will be used in our analysis. This setting is very similar to that of the analyses conducted in [BRS02, FSZ09, AMP10c, AMPŠ12].

Let us assume that the underlying primitive of the compression function is an ideal primitive (e.g. a random permutation, an ideal block cipher). In this model, the adversary $\mathcal{A}$ is a probabilistic algorithm with oracle access to a primitive $P \xleftarrow{\$} \mathrm{Prim}(H)$ sampled uniformly at random. The set $\mathrm{Prim}(H)$ depends on the chosen hash function (e.g. in the case of a permutation-based hash function $H_1$, the primitive $P$ is chosen independently and uniformly at random from the set of all permutations $\mathrm{Prim}(H_1)$). We consider information-theoretic adversaries only. Hence, the adversary has unbounded computational power, and its only obstacle to succeeding in an attack is the randomness of the query responses. The complexity is measured by the number of queries made to the oracle. In this ideal model the adversary $\mathcal{A}$ is allowed to make at most $q$ forward and inverse queries to the oracle.
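The oracle access described above can be sketched as a lazily sampled random permutation that answers forward and inverse queries, records them, and enforces the budget $q$. The class name, the record format of the log and the integer encoding of n-bit strings are illustrative assumptions rather than part of the model; inputs are assumed to lie in the range $[0, 2^n)$.

```python
import secrets

class IdealPermutation:
    """Lazily sampled random permutation on n-bit values with a query log."""

    def __init__(self, n_bits, max_queries):
        self.size = 1 << n_bits
        self.max_queries = max_queries  # the query budget q
        self.fwd = {}                   # x -> P(x)
        self.inv = {}                   # y -> P^{-1}(y)
        self.history = []               # (direction, input, output) records

    def _fresh(self, taken):
        # Rejection sampling: draw uniformly until the value is still unused.
        while True:
            value = secrets.randbelow(self.size)
            if value not in taken:
                return value

    def _charge(self):
        if len(self.history) >= self.max_queries:
            raise RuntimeError("query budget q exhausted")

    def forward(self, x):
        if x in self.fwd:               # answer already known, not charged
            return self.fwd[x]
        self._charge()
        y = self._fresh(self.inv)
        self.fwd[x], self.inv[y] = y, x
        self.history.append(("fwd", x, y))
        return y

    def inverse(self, y):
        if y in self.inv:               # answer already known, not charged
            return self.inv[y]
        self._charge()
        x = self._fresh(self.fwd)
        self.fwd[x], self.inv[y] = y, x
        self.history.append(("inv", y, x))
        return x
```

Sampling each answer only when it is first requested gives the same distribution as drawing the whole permutation in advance, while keeping the memory proportional to the number of queries.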
All these queries are stored in a query history $\mathcal{L}$ as indexed elements. Without loss of generality, we assume that $\mathcal{L}$ always contains the queries required for the attack and that the adversary does not ask any oracle query to which the response is already known. The definitions of preimage and second preimage resistance that we use in the ideal model correspond to the everywhere¹⁰ preimage and second preimage notions of [RS04].

Definition 2.24. Let $\lambda, n \in \mathbb{N}$, let $\mathcal{Y} = \{0,1\}^n$, $\mathcal{M} = \{0,1\}^{\lambda}$ and let $H : \{0,1\}^* \to \{0,1\}^n$ be a hash function. Then, the advantage of the adversary $\mathcal{A}$ against collision is

$$\mathbf{Adv}^{\mathrm{Col}}_{H}(\mathcal{A}) = \Pr\left[(M, M') \xleftarrow{\$} \mathcal{A}(P) : M \neq M' \text{ and } H(M) = H(M')\right].$$

The advantage of the everywhere second preimage adversary $\mathcal{A}$ is defined as

$$\mathbf{Adv}^{\mathrm{eSec}[\lambda]}_{H}(\mathcal{A}) = \max_{M \in \mathcal{M}} \Pr\left[P \xleftarrow{\$} \mathrm{Prim}(H);\ M' \xleftarrow{\$} \mathcal{A}(P) : M \neq M' \text{ and } H(M) = H(M')\right].$$

The advantage of an everywhere preimage adversary $\mathcal{A}$ is defined as

$$\mathbf{Adv}^{\mathrm{ePre}}_{H}(\mathcal{A}) = \max_{Y \in \mathcal{Y}} \Pr\left[P \xleftarrow{\$} \mathrm{Prim}(H);\ M' \xleftarrow{\$} \mathcal{A}(P) : H(M') = Y\right].$$

For $q \geq 1$ we write $\mathbf{Adv}^{xxx}_{H}(q) = \max\{\mathbf{Adv}^{xxx}_{H}(\mathcal{A})\}$, where the maximum is taken over all adversaries that ask at most $q$ oracle queries and $xxx \in \{\mathrm{ePre}, \mathrm{eSec}, \mathrm{Col}\}$.

Above we defined the security notions of the hash function $H$ in the formal adversarial model. Similar definitions can be used to define the security notions of the compression function $f$. The security analysis conducted in Chapter 3 and Chapter 4 is carried out in this adversarial model.

¹⁰ Notice that the ePre and eSec notions of [RS04] rely (w.r.t. randomness) on the key generation, while in the keyless and ideal model setting they rely (w.r.t. randomness) on the random underlying primitive.

Chapter 3 NIST’s SHA-3 Hash Function Competition

This chapter briefly reviews the history of the SHA family, including NIST’s SHA-3 hash function competition. Section 3.2 presents NIST’s requirements and evaluation criteria for the SHA-3 hash function.
Additionally, Section 3.3 provides a brief introduction to the five finalists of the competition and their security and performance properties. Finally, security and performance results are summarized in Section 3.4.

3.1 The History of SHA Family

In 1993, the US National Institute of Standards and Technology (NIST) published the first Secure Hash Standard. Soon after publication it was withdrawn due to flaws in the design of the Secure Hash Algorithm described in the Federal Information Processing Standards Publication (FIPS PUBS) 180. That version of the Secure Hash Algorithm is commonly referred to as SHA-0. After the design was improved, FIPS 180-1 was published in 1995, containing a specification of the hash function known as SHA-1. SHA-1 remained the most widely used hash function over the following decade, even though the SHA-2 standard, published in FIPS 180-2 in 2001, has better security properties than SHA-1. SHA-2 includes a significant number of changes from its predecessor SHA-1. After a series of attacks on SHA-1 by Wang et al. [WYY05, WY05], together with results that raised questions about the security of the Merkle-Damgård construction [Dea99, Jou04, KS05, KK06], NIST recommended the replacement of SHA-1 by the SHA-2 hash function family.

On November 2, 2007, NIST announced a call for the design of a new SHA-3 hashing algorithm [NIS07], similar to the development process for the Advanced Encryption Standard (AES). The main goal of this public competition is to develop a new, secure cryptographic hash algorithm as a standard that can be used in generating digital signatures, message authentication codes, and many other hash function applications. The selected algorithm is intended to be available royalty-free worldwide.
NIST defines three categories of evaluation criteria that will be used to compare candidate algorithms throughout the SHA-3 competition: 1) security, 2) cost and performance, and 3) algorithm and implementation characteristics. The new hash algorithm will be referred to as "SHA-3".

Sixty-four candidates, mostly from Europe and North America, were submitted to the hash function competition by October 31, 2008. The preliminary review showed that fifty-one candidate algorithms met the minimum submission requirements. These candidates were selected for the first round at the end of 2008. Later, on July 24, 2009, after public feedback and internal reviews of the first-round candidates, NIST selected fourteen second-round candidates using the previously defined evaluation criteria. At the end of 2010, after one year of public review, NIST announced five SHA-3 finalists: BLAKE, Grøstl, JH, Keccak, and Skein. In order to improve their hash functions, submitters of the finalist algorithms were allowed to make minor modifications to their algorithms and submit the final packages to NIST by January 16, 2011. As in the previous rounds, a one-year public comment period is planned for the finalists. NIST plans to choose a winner of the SHA-3 competition in 2012.

3.2 SHA-3 Security Requirements and Evaluation Criteria

NIST specifies security as the most important evaluation criterion of the competition [NIS07]. Moreover, it defines security requirements which are expected to be fulfilled by the future SHA-3 hash algorithm. The minimum security requirements that NIST expects from the SHA-3 hash function of hash value size $n$ are:

1. collision resistance of approximately $n/2$ bits,
2. preimage resistance of approximately $n$ bits,
3. second preimage resistance of approximately $n - L$ bits, where the length of the first preimage is at most $2^L$ blocks,
4. resistance to length-extension attacks,
5. any $m$-bit hash function specified by taking a fixed subset of the candidate function's output bits should meet the above requirements with $m$ replacing $n$.

As explained in Section 2.3.2.2 and Section 2.3.3, a standard hash function is expected to satisfy these specified requirements. Certainly, an increase of second preimage resistance (from approximately $n - L$ bits up to approximately $n$ bits) and resistance against other attacks, such as multi-collision attacks, is seen as an advantage by NIST.

Any result that shows that a candidate hash function does not meet the specified requirements is considered to be a serious attack. Therefore, special attention has to be directed towards newly developed attacks. This is of great importance, especially if the level of security of the hash function is lower than claimed by the submitter. A good place to start the security analysis is by checking the soundness of the mathematical basis. This analysis can provide a good indication of the design quality of the hash function.

To select the best candidate, each submitted hash function is compared with other candidates (of the same hash length) based on the provided security results regarding (second) preimage resistance, collision resistance, and resistance to generic attacks. One additional security property raised by the public during the evaluation process is the extent to which the algorithm output is indifferentiable from a random oracle (see Section 2.3.6.2). In summary, those candidates whose preliminary security analysis raised concerns were discarded from the competition. Similarly, designs that had not received much feedback from the cryptographic community were also considered doubtful and were discarded, too.

3.3 The Competition Finalists

In this section we present the five finalists.
Besides their main characteristics, we provide security properties and performance results of each finalist based on earlier works by Andreeva et al. [AMP10c, ABM+12, AMPŠ12] and Turan et al. [TPB+11].

3.3.1 BLAKE

The BLAKE hash function [AHMP10] uses HAIFA as its iteration mode. BLAKE's compression function (see Figure 3.1) maintains a large inner state initialized with the internal state $h_{i-1}$, the salt $S$, and the counter $C_i$. Then the compression function iterates a series of message-dependent rounds. After these rounds, the new internal state is obtained by compressing the inner state together with the old internal state and the salt. This internal design is a so-called local wide-pipe, which is inspired by Lucks' wide-pipe design [Luc05]. The compression algorithm used in BLAKE is a modified version of Bernstein's stream cipher ChaCha [Ber08].

Figure 3.1. BLAKE's compression function.

Security of BLAKE

As noted before, the security results of HAIFA (see Section 2.3.5.2) carry over to the BLAKE hash function under an idealness assumption on the compression function. Nevertheless, Andreeva et al. [ALM11] and Chang et al. [CNY11] independently showed that BLAKE's compression function is differentiable from a random compression function after about $2^{n/4}$ queries. This implies that BLAKE's compression function has non-random behavior and, as a consequence, the HAIFA security results in the ideal compression function model are invalid for the BLAKE hash function (see Section 2.3.6.3). In order to restore BLAKE's security guarantees, Andreeva et al. [ALM11] refined the level of modularity in the security analysis and revalidated the security results in the ideal cipher model. Firstly, they proved optimal security bounds on the compression function, $\mathrm{Adv}^{Col}_{f} = \Theta(q^2/2^n)$ and $\mathrm{Adv}^{ePre}_{f} = \Theta(q/2^n)$. Due to the collision and everywhere preimage preservation of the HAIFA design, these security results carry over from BLAKE's compression function to the BLAKE hash function. The everywhere second preimage property of BLAKE¹ was directly analyzed in the ideal cipher model and, as a result, BLAKE was proved optimally second preimage resistant, $\mathrm{Adv}^{eSec}_{H} = \Theta(q/2^n)$. Finally, the BLAKE hash function is proved indifferentiable from a random oracle in the ideal cipher model [ALM11, CNY11].

¹ The everywhere second preimage property is not preserved by the HAIFA design, as shown in [ANPS07].

Performance of BLAKE

The BLAKE hash function, as classified by NIST [TPB+11], is one of the top performers in software across most platforms, while in hardware its performance is labeled as average. In constrained environments, BLAKE is described as one of the top performers in speed with relatively modest memory requirements. Moreover, BLAKE has a structure that allows flexible designs.

3.3.2 Grøstl

The Grøstl hash function [GKM+11] uses a wide-pipe Merkle-Damgård construction with a final transformation employed before chopping. Its compression function is based on two AES-like, fixed and distinct permutations. All nonlinearity in the design is derived from the AES S-box. Since the security of the compression function is not optimal, the Grøstl designers employed a final transformation which is believed to be one-way and collision resistant, but does not compress before the chopping. The reader is referred to Section 4.1 for a detailed description.

Figure 3.2. The Grøstl hash function.

Security of Grøstl

At the center of the Grøstl security analysis is its permutation based compression function. In relation to this, Fouque et al. [FSZ09] introduced a specific 2-permutation based construction and analyzed its collision and preimage resistance. Grøstl's compression function is based on this particular construction.
Their results allow us to claim tight security bounds on the compression function for collision resistance, $\mathrm{Adv}^{Col}_{f} = \Theta(q^4/2^l)$, and preimage resistance, $\mathrm{Adv}^{ePre}_{f} = \Theta(q^2/2^l)$. Following the same arguments as in the security analysis of BLAKE, optimal bounds are obtained on collision and everywhere preimage resistance for the Grøstl hash function. Furthermore, the Grøstl hash function is proven indifferentiable from a random oracle if the underlying permutations are ideal [AMP10a]. The bound on second preimage resistance of Grøstl is unknown. In Chapter 4 we analyze the everywhere second preimage resistance of Grøstl in the ideal permutation model and we obtain the bound $\mathrm{Adv}^{eSec}_{H} = \Theta(q/2^{n-L})$.

Performance of Grøstl

In [TPB+11], Grøstl is marked as an average performer in software across most platforms, while in hardware Grøstl's performance is seen as above-average. In constrained environments, Grøstl has poor performance with modest memory requirements. It can also be noted that Grøstl has a flexible structure that allows various area trade-offs.

3.3.3 JH

The JH hash function [Wu11] is a novel design that to an extent resembles a sponge construction. It can be viewed as a sponge-like construction as it employs a fixed permutation based compression function and a wide-pipe Merkle-Damgård construction with final chopping as iteration mode, where the message block size is $m$, the hash value size is $n$, and the internal state size $l$ satisfies $l = 2m \geq 2n$. The permutation $P$ is based on the AES design. Specifically, all hash value sizes of JH use the same function, and each member of the JH family is selected by using its corresponding $IV$.

Figure 3.3. JH's compression function.

Security of JH

As a consequence of the results of Black et al. [BCS05], the JH compression function is insecure in the ideal permutation model.
As a confirmation of this claim, collisions and preimages can be found for the JH compression function in one query to the permutation. Lee and Hong [LH11] proved that the JH hash function is optimally collision resistant, $\mathrm{Adv}^{Col}_{H} = \Theta(q^2/2^n)$. Andreeva et al. [AMPŠ12] proved optimal bounds for preimage and second preimage resistance of JH for the $n = 256$ variant, while the bounds on preimage and second preimage resistance for the $n = 512$ variant are improved but still not optimal. Furthermore, the JH hash function is proven indifferentiable from a random oracle if the underlying permutation is assumed to be ideal [BMN10]. Later, Moody et al. [MPST12] improved the indifferentiability bound on JH and confirmed the (second) preimage results obtained in [AMPŠ12].

Performance of JH

In [TPB+11] JH is described as an average to above-average performer in software and hardware, while in constrained environments JH is regarded as average in performance. Also, JH has modest memory requirements.

3.3.4 Keccak

The Keccak hash function [BDPA11] follows the sponge construction [BDPA07], but can also be considered as a Merkle-Damgård construction with final chopping. It uses a single large fixed permutation. The permutation can be seen as a combination of a linear mixing operation and a very simple nonlinear mixing operation. A notable feature of this hash design is that it uses a single design for variable hash output sizes.

Security of Keccak

Similarly to JH, the compression function of Keccak is based on one permutation and the same results apply. Collisions and preimages can be found for Keccak's compression function in one query to the permutation. The sponge construction is proven indifferentiable from a random oracle if the underlying permutation is assumed to be ideal [BDPA08] and this result applies to Keccak.
As noted in Section 2.3.6.2, an indifferentiability bound yields bounds on the other security properties. Following this approach, an optimal bound is obtained on collision resistance, $\mathrm{Adv}^{Col}_{H} = \Theta(q^2/2^n)$, as well as on preimage and second preimage resistance, $\Theta(q/2^n)$, for Keccak in the ideal permutation model [AMP10c].

Performance of Keccak

The Keccak hash function is described by NIST [TPB+11] as an average performer in software, while the hardware performance of Keccak is regarded as excellent. In constrained environments, Keccak is below-average in performance with modest memory requirements. Keccak is highly parallelizable due to its design.

3.3.5 Skein

The Skein hash function [BKL+10] builds on the Unique Block Iteration (UBI). UBI mode hashes an arbitrary-length string by iterating a compression function, which takes as input an internal state, a message block, and a tweak. The compression function is based on the Threefish tweakable block cipher in Matyas-Meyer-Oseas mode, as can be seen in Figure 3.4. The tweak encodes the number of bytes processed up to this point, the type of UBI mode and special flags for the first and the last block. Skein supports variable output sizes. If a single output block is not enough, Skein runs the output transformation several times. The most innovative parts of Skein are the Threefish block cipher and the mode of operation. The reader is referred to Section 4.2 for a detailed description.

Figure 3.4. Hashing a three-block message using UBI mode.

Security of Skein

Due to the optimal security bounds on the compression function claimed by the submitters [BKL+09] and the property that Skein's mode of operation preserves collision resistance and everywhere preimage resistance, optimal bounds for these two properties are obtained. Furthermore, the Skein hash function is proven indifferentiable from a random oracle if the underlying tweakable block cipher is assumed to be ideal [BKL+09].
As derived in [AMP10c], this indifferentiability result yields a bound of $O\left(\frac{q}{2^n} + \frac{q^2}{2^l}\right)$ on the second preimage resistance. This second preimage bound for Skein is optimal for the $n = 256$ variant, while for the $n = 512$ variant it is not. In Chapter 4 we improve the bound on second preimage resistance to $\mathrm{Adv}^{eSec}_{H} = \Theta(q/2^n)$ in the ideal cipher model.

Performance of Skein

NIST [TPB+11] rated Skein's performance in software as above-average across most platforms, particularly in 64-bit mode. In hardware, Skein's throughput-to-area ratio is average to a little below-average. Results in constrained environments show that Skein has above-average performance. Skein has modest memory requirements and benefits from the pipelining used in modern processors.

3.4 A Summary of the Existing Results

In this section we provide a summary of the previously mentioned results. First, Section 3.4.1 presents the main advantages and drawbacks of the finalists recognized by NIST [TPB+11]. Then, Section 3.4.2 provides a schematic summary of the security and performance results.

3.4.1 Factors of Favorability

BLAKE was promoted to the final round of NIST's SHA-3 hash function competition due to its high security margin, good performance in software, and its simple and clear design.

Grøstl was chosen as a finalist because of its well-understood design and solid performance, especially in hardware. Although the security properties of Grøstl are not ideal, the amount of cryptanalysis that has been published on Grøstl and its building blocks provides a degree of confidence in the security of this design.

JH was selected as a finalist because of its solid security properties, good all-around performance, and innovative design. As drawbacks of the JH design, NIST emphasizes the not well-understood compression function construction together with the lack of analysis provided for this construction.
Keccak was selected by NIST for the final round of the competition mainly due to its good security properties, its high throughput and throughput-to-area ratio, and the simplicity of its design.

Skein advanced to the final round mainly due to its high security margin and speed in software.

3.4.2 A Summary of the Security and Performance Results

In Table 3.1 we briefly summarize the performance results presented in [TPB+11], in order to provide insight into this important evaluation aspect. Let us emphasize that the description of the performance level (high, average and low) does not imply drastically different performance, considering that all these performance results are within the satisfactory range expected by NIST.

Table 3.1. A schematic summary of hardware and software results. The first column indicates the name of the hash function selected for the final round of the competition, while the next three columns describe performance results in software, hardware and in constrained environments, respectively.

Hash function | Software | Hardware | Constrained settings
--------------|----------|----------|---------------------
BLAKE         | High     | Average  | High
Grøstl        | Average  | High     | Low
JH            | Average  | Average  | Average
Keccak        | Average  | High     | Low
Skein         | High     | Low      | High

As for the provable security results, the summary presented in our work is based on the classification conducted by Andreeva et al. [AMP10c, AMPŠ12]. The first of these two papers deals with provable security results of all 14 second-round SHA-3 candidates, while in the second paper, as well as in this thesis, the emphasis is placed on the five competition finalists. Concretely, in Table 3.2 we present all security reduction results (for the $n = 256$ and $n = 512$ variants of the SHA-3 hash function finalists) known to us. We updated the second preimage results of Grøstl and Skein with the results obtained in Chapter 4, which are marked in the table with a green box.
A yellow box in the table is used to indicate problems which are still open, one of which is the lack of an optimal (second) preimage bound for the 512-bit variant of JH. Essentially, all the results are provided in the ideal permutation or cipher model, which means that the strength of assumptions is weakened in comparison to the ideal compression function assumption. Looking at the security bounds on compression functions presented in this table, we can see that collisions and (second) preimages can be found for the JH and Keccak compression functions in one query to the permutation and, as a consequence, these compression functions are regarded as insecure. However, this does not invalidate the security of the JH and Keccak hash functions.

Hash function | Model                   | $\mathrm{Adv}^{Coll}_{f}$ | $\mathrm{Adv}^{Pre}_{f}$ | $\mathrm{Adv}^{Sec}_{f}$ | $\mathrm{Adv}^{Coll}_{H}$ | $\mathrm{Adv}^{Pre}_{H}$ | $\mathrm{Adv}^{Sec}_{H}$
--------------|-------------------------|---------------------------|--------------------------|--------------------------|---------------------------|--------------------------|-------------------------
BLAKE         | Ideal cipher E          | $\Theta(q^2/2^n)$ | $\Theta(q/2^n)$   | $\Theta(q/2^n)$   | $\Theta(q^2/2^n)$ | $\Theta(q/2^n)$ | $\Theta(q/2^n)$
Grøstl        | Ideal permutations P, Q | $\Theta(q^4/2^l)$ | $\Theta(q^2/2^l)$ | $\Theta(q^2/2^l)$ | $\Theta(q^2/2^n)$ | $\Theta(q/2^n)$ | $\Theta(q/2^{n-L})$ [green]
JH            | Ideal permutation P     | $\Theta(1)$       | $\Theta(1)$       | $\Theta(1)$       | $\Theta(q^2/2^n)$ | $O\left(\frac{q}{2^n} + \frac{q^2}{2^{l-m}}\right)$ [yellow] | $O\left(\frac{q}{2^n} + \frac{q^2}{2^{l-m}}\right)$ [yellow]
Keccak        | Ideal permutation P     | $\Theta(1)$       | $\Theta(1)$       | $\Theta(1)$       | $\Theta(q^2/2^n)$ | $\Theta(q/2^n)$ | $\Theta(q/2^n)$
Skein         | Ideal blockcipher E     | $\Theta(q^2/2^l)$ | $\Theta(q/2^l)$   | $\Theta(q/2^l)$   | $\Theta(q^2/2^n)$ | $\Theta(q/2^n)$ | $\Theta(q/2^n)$ [green]

Table 3.2. A schematic summary of security reduction results of the five finalists. The parameters $n$, $l$, $m$ and $2^L$ denote the hash function output size, the internal value size, the message input size and the length of the first preimage in message blocks, respectively. The first column indicates the name of the hash function selected for the final round of the competition, while the second column describes the underlying assumptions. The next three columns show the security bounds on compression functions, while the last three columns summarize the security reduction results on complete hash functions. A yellow box indicates the existence of a non-trivial upper bound which is not yet optimal for both the 256-bit and 512-bit variants.
A green box indicates the security reduction results that are proven in this thesis, while the other results presented in this table are based on previous works [AMP10c, AMPŠ12].

Chapter 4

Second Preimage Resistance of Grøstl and Skein

This thesis is concerned with the second preimage resistance of SHA-3 candidates, namely Grøstl and Skein. As explained in Chapter 3, an important evaluation criterion in the competition for the SHA-3 hash function is security (e.g. the possible reductions of the hash function security to the security of its underlying building blocks). In this chapter we provide a lower bound on the second preimage resistance of Grøstl and Skein within the concrete-security provable-security framework. The reader is referred to Section 2.3.6 and Section 2.3.7, where the proof techniques and the security model used in this chapter are discussed.

4.1 Security Analysis of Grøstl

As briefly presented in Section 3.3.2, Grøstl combines characteristics of the wide-pipe and Merkle-Damgård constructions and uses two distinct permutations $P$ and $Q$. Let us closely observe Grøstl to see how the hash value is obtained. First, the padding function $\mathrm{pad}_G$ takes a message $M$ of $N$ bits and returns the padded message split into $l$-bit message blocks, $\mathrm{pad}_G(M) = m_1 \| m_2 \| \ldots \| m_k$, whose total length is a multiple of the message block size $l$. Padding is achieved by appending to the original message a single '1' bit followed by as many '0' bits as needed to complete an $l$-bit block after embedding the 64-bit representation of the number of message blocks in the padded message. Then, Grøstl iterates the permutation based compression function $f : \{0,1\}^l \times \{0,1\}^l \to \{0,1\}^l$. Finally, the output of the last compression call is processed by the output transformation $g(h) = P(h) \oplus h$, after which the output size is shortened from $l$ to $n$ bits with the function $\mathrm{short}_n$.

Figure 4.1. Grøstl's compression function.

4.1.1 Assessing Second Preimage Resistance of Grøstl

A possible way to obtain a bound on the second preimage resistance of Grøstl is by using indifferentiability results. Grøstl is proven indifferentiable from a random oracle if the underlying permutations are ideal [AMP10a]. Briefly, the proved bound shows that Grøstl behaves like a random oracle up to the birthday bound, which is not enough for achieving optimal second preimage resistance.

As indicated in [GKM+11], the underlying compression function of Grøstl exhibits non-ideal behavior (i.e. fixed points for the compression function can be found easily¹, and the generalised birthday collision attack is applicable to the $l$-bit compression function of Grøstl with a complexity of $2^{l/3}$), which makes the result of Bouillaguet and Fouque [BF09] in the ideal compression function model inapplicable. Therefore, in order to reconfirm the second preimage resistance of Grøstl we explore further. More precisely, we assume ideality of the underlying building blocks of the compression function, which in the case of Grøstl are the two permutations $P$ and $Q$.

¹ In order to find a fixed point, we select a message $m$ arbitrarily and then compute $h = P^{-1}(Q(m)) \oplus m$. This gives us a fixed point for Grøstl's compression function, $f(h, m) = h$.

4.1.2 Proof of Security

Under the assumption that $P$ and $Q$ are random $l$-bit permutations, where $l$ is the iterated state size and $n$ is the output size, we will prove that the advantage of the second preimage adversary is upper bounded by $O\left(\frac{(k+1)q^2}{2^l} + \frac{2q}{2^n}\right)$, where the second preimage adversary makes at most $q$ queries and the length of the target message is at most $k$ blocks. In this ideal model, an adversary is allowed to make both forward and inverse queries to the random permutations $P$ and $Q$.
All these queries are stored in query histories $L_P$ and $L_Q$ as indexed elements, and their numbers are $q_2$ and $q_1$, respectively.

Theorem 4.1. Let $P$, $Q$ be two random $l$-bit permutations and let $A$ be a computationally unbounded adversary which makes at most $q < 2^{l-1}$ queries to the oracles. Its advantage in breaking the second preimage resistance of $H$ is upper bounded by:

\[ \mathrm{Adv}^{eSec[\lambda]}_{H}(q) \leq \frac{(k+1)q^2}{2^l} + \frac{2q}{2^n}. \]

Proof. We prove the theorem by using a graph based approach. To complete this proof, we introduce the graph construction setting, which is based on the definitions provided in Section 2.1.2.

The Graph Construction. We introduce two initially empty lists $L_P$, $L_Q$. Let us denote by $L_Q = \{(\alpha_i, \beta_i)_{1 \leq i \leq q_1}\}$ a list such that $Q(\alpha_i) = \beta_i$ and by $L_P = \{(\alpha'_j, \beta'_j)_{1 \leq j \leq q_2}\}$ a list such that $P(\alpha'_j) = \beta'_j$, where a tuple $(\alpha, \beta) \in \{0,1\}^l \times \{0,1\}^l$. We introduce a directed graph $(V, E)$, initially $(\{IV\}, \emptyset)$. Any $(\alpha_i, \beta_i) \in L_Q$ and $(\alpha'_j, \beta'_j) \in L_P$ define an edge $e$ between two vertices in $(V, E)$, which we denote by $\alpha_i \oplus \alpha'_j \xrightarrow{e} \alpha_i \oplus \alpha'_j \oplus \beta_i \oplus \beta'_j$. We define a path in the graph as a sequence of edges $p = (e_1, \ldots, e_{k+1})$ such that for each of its edges $e_i$, where $1 \leq i \leq k$, the output vertex is equal to the input vertex of $e_{i+1}$. We say that two distinct paths collide if they both start with the $IV$ vertex and both end with the same output vertex.

Grøstl in the Graph Setting. Intuitively, an edge in $(V, E)$ corresponds to an evaluation of the Grøstl compression function, and the number of edges is exactly $q_1 \cdot q_2$. For convenience, edges $e_i \in E$ are labeled by messages $m_i \in \{0,1\}^l$, where $m_i = \alpha_i$ and $1 \leq i \leq k$. The path in the graph $(V, E)$ obtained while hashing the target message $M$ is called the challenge path, denoted by $IV \xrightarrow{m_1} h_1 \xrightarrow{m_2} h_2 \cdots \xrightarrow{m_k} h_k \xrightarrow{e_{k+1}} h_{k+1}$.
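The graph construction can be illustrated on a toy instance. The following is only a sketch under stated assumptions: the compression function is taken to be $f(h, m) = P(h \oplus m) \oplus Q(m) \oplus h$ as in the Grøstl specification [GKM+11] (this form agrees with the edge definition and with the fixed-point formula of footnote 1), the permutations are small random tables on 8-bit values rather than the real $P$ and $Q$, and the names `random_permutation`, `compress`, `build_edges` and `fixed_point` are purely illustrative.

```python
import random
from itertools import product

L_BITS = 8  # toy state size (assumption); Groestl itself uses l = 512 or 1024


def random_permutation(seed):
    """Return a toy random permutation of {0,1}^L_BITS and its inverse."""
    rng = random.Random(seed)
    table = list(range(2 ** L_BITS))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


P, P_INV = random_permutation(1)
Q, Q_INV = random_permutation(2)


def compress(h, m):
    """Assumed compression function f(h, m) = P(h xor m) xor Q(m) xor h."""
    return P[h ^ m] ^ Q[m] ^ h


def build_edges(list_q, list_p):
    """Every pair of a Q-query and a P-query defines exactly one edge."""
    edges = []
    for (a, b), (a2, b2) in product(list_q, list_p):
        source = a ^ a2            # input vertex: alpha_i xor alpha'_j
        target = a ^ a2 ^ b ^ b2   # output vertex
        edges.append((source, a, target))  # the edge is labeled by m = alpha_i
    return edges


def fixed_point(m):
    """Chaining value h with f(h, m) = h, following footnote 1."""
    return P_INV[Q[m]] ^ m


rng = random.Random(0)
list_q = [(x, Q[x]) for x in rng.sample(range(2 ** L_BITS), 5)]  # q1 = 5
list_p = [(x, P[x]) for x in rng.sample(range(2 ** L_BITS), 7)]  # q2 = 7
edges = build_edges(list_q, list_p)
```

With $q_1 = 5$ and $q_2 = 7$ the list `edges` holds $q_1 \cdot q_2 = 35$ entries, and each entry `(source, m, target)` satisfies `compress(source, m) == target`, which is the sense in which an edge corresponds to one evaluation of the compression function.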
It is necessary to emphasize that the first $k$ internal states are $l$ bits long, while $h_{k+1}$ (the $n$-bit hash value) is obtained by applying the output transformation together with the function $\mathrm{short}_n$ to the internal state $h_k$. We can conclude that a vertex in $(V, E)$ corresponds to an internal state of the Grøstl hash function.

Let SP be the event that, as a result of the adversary's queries, a path which collides with and differs from the challenge path is formed in the graph $(V, E)$.

Claim 1. $\mathrm{Adv}^{eSec[\lambda]}_{H}(q) \leq \Pr[\mathrm{SP}]$.

Proof. Suppose that the second preimage adversary $A$ receives a randomly generated target message $M$, where $\mathrm{pad}_G(M) = m_1 \| m_2 \| \ldots \| m_k$, and it outputs a message $M' \neq M$, where $\mathrm{pad}_G(M') = m'_1 \| m'_2 \| \ldots \| m'_s$, such that $H^{P,Q}(M) = H^{P,Q}(M')$ for the queried oracles $P$ and $Q$. The adversary $A$ makes all of the queries necessary to compute $H(M)$ and $H(M')$. We denote by $p = (m_1, m_2, \ldots, m_k, e_{k+1})$ the challenge path and by $p' = (m'_1, m'_2, \ldots, m'_s, e'_{s+1})$ the path obtained while hashing the message $M'$. We claim that the paths $p$ and $p'$ are colliding paths.

1. If $|M| \neq |M'|$, then due to the padding function of Grøstl, the inputs of the last invocation of the compression function are not the same, $m_k \neq m'_s$, and clearly the paths $p$ and $p'$ induced by the messages $M$ and $M'$ are distinct.

2. Otherwise, $|M| = |M'|$. Since $h_{k+1} = h'_{s+1}$, either there is a second preimage for the output transformation or $h_k = h'_s$. If the latter case is true, either there is a second preimage on the compression function, or $(h_{k-1}, m_k) = (h'_{k-1}, m'_k)$. This argument repeats for the compression function. Since $|M| = |M'|$ and $IV$ is fixed for both evaluations, either there is a second preimage at some point, or $m_i = m'_i$ for $1 \leq i \leq k$. In the latter case, $M = M'$, which is impossible. Therefore, there exists at least one pair $(h'_{i-1}, m'_i) \neq (h_{i-1}, m_i)$, which implies that the paths $p$ and $p'$ are distinct.
Because $M$ and $M'$ collide, we have $h_{k+1} = h'_{s+1}$ and hence the paths $p$ and $p'$ end with the same output vertex, which means that they collide. Therefore, finding a message that collides with the target message is equivalent to finding a path that collides with the challenge path. This completes the proof of Claim 1.

Claim 2. $\Pr[\mathrm{SP}] \leq \frac{(k+1)q^2}{2^l} + \frac{2q}{2^n}$.

Proof. Suppose that $A$ wins. The SP event occurs when $A$ succeeds in connecting a path (different from the challenge path) in the graph $(V, E)$ from $IV$ to the challenge path. That connection can happen in two ways. Let C be the event in which a connection occurs on an internal state of the challenge path before the output transformation is applied, and let CO be the event in which a connection occurs after the output transformation is applied.

Simulation. We simulate the execution of $A$, and record in the lists $L_P$ and $L_Q$ the queries sent to the oracles $P$ and $Q$, respectively. Every time $A$ submits a new query to an oracle, it receives a uniformly distributed random value. Let $IV \xrightarrow{m_1} h_1 \xrightarrow{m_2} h_2 \cdots \xrightarrow{m_k} h_k \xrightarrow{e_{k+1}} h_{k+1}$ be the sequence of vertices crossed by the challenge path.

Case 1: If the C event occurs after the $q$-th query to the $P$ and/or $Q$ oracle, there exists in the graph a path $p'$, $IV \xrightarrow{m'_1} h'_1 \xrightarrow{m'_2} h'_2 \cdots \xrightarrow{m'_s} h'_s$, where $h'_s$ is equal to one of the internal states $h_i$ of the challenge path for $0 \leq i \leq k$. This means that the adversary has found a collision on the compression function $f$. More precisely, this collision is actually a second preimage of one out of $k + 1$ internal states for $f$.

Start the Simulation. Let us assume that the event C occurs after the adversary has sent a query to $Q$ or $Q^{-1}$. Without loss of generality, we consider forward queries only. The tuple $(\hat{\alpha}, \hat{\beta})$ is generated, where $\hat{\beta}$ is a random value from a set of size at least $2^l - q_1$.
The second preimage is found if the list $L_P$ contains a pair $(\alpha'_j, \beta'_j)$ such that $h_i = \hat{\alpha} \oplus \alpha'_j \oplus \hat{\beta} \oplus \beta'_j$, where $0 \leq i \leq k$. Since $1 \leq j \leq q_2$, each query to $Q$ or $Q^{-1}$ generates $q_2$ new edges. Therefore, each query has a probability of at most $\frac{q_2 (k+1)}{2^l - q_1}$ to give a second preimage of one out of $k + 1$ internal states of the challenge path. Consequently, the probability that event C occurs after the adversary asks at most $q_1$ queries to $Q$ or $Q^{-1}$ is upper bounded by:

\[ \Pr[\mathrm{C}]_Q \leq \frac{(k+1) q_1 q_2}{2^l - q_1}. \]

Alternatively, the event C is realized after the adversary has sent a query to $P$ or $P^{-1}$. An upper bound for this case is obtained in the same way as before:

\[ \Pr[\mathrm{C}]_P \leq \frac{(k+1) q_1 q_2}{2^l - q_2}. \]

By the union bound, we obtain an upper bound on the probability that event C occurs:

\[ \Pr[\mathrm{C}] \leq \Pr[\mathrm{C}]_Q + \Pr[\mathrm{C}]_P \leq \frac{(k+1) q_1 q_2}{2^l - q_1} + \frac{(k+1) q_1 q_2}{2^l - q_2}. \]

Case 2: The hash value $h_{k+1} = \mathrm{short}_n(P(h_k) \oplus h_k)$ is generated by applying the output transformation together with the function $\mathrm{short}_n$ (see footnote 2). The output transformation is designed on top of the permutation $P$. Therefore, the event CO can be realized only after the adversary has sent a query to $P$ or $P^{-1}$. Notice that each query generates precisely one output transformation edge. If the CO event occurs, there exists in the graph $(V, E)$ a path $p'$, $IV \xrightarrow{m'_1} h'_1 \xrightarrow{m'_2} h'_2 \cdots \xrightarrow{m'_s} h'_s \xrightarrow{e'_{s+1}} h'_{s+1}$, where $h'_{s+1} = h_{k+1}$ and $h'_s \neq h_k$. This implies that the adversary has found a second preimage on the output transformation for the $n$-bit image $h_{k+1}$.

Start the Simulation. The event CO is realized after the adversary has sent a query to $P$ or $P^{-1}$. Without loss of generality, only a forward tuple $(\tilde{\alpha}, \tilde{\beta})$ is generated, where $\tilde{\beta}$ is a random value from a set of size at least $2^l - q_2$. The second preimage is found if $h_{k+1} = \mathrm{short}_n(\tilde{\alpha} \oplus \tilde{\beta})$. Therefore, each query to $P$ or $P^{-1}$ has a probability of at most $\frac{2^{l-n}}{2^l - q_2}$ to give a second preimage on the output transformation.
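The numerator $2^{l-n}$ in this per-query probability is the number of $l$-bit responses $\tilde{\beta}$ that hit the target after truncation. The sketch below checks this count by exhaustive enumeration on toy parameters. It assumes that $\mathrm{short}_n$ keeps the $n$ least significant bits of the integer encoding of the state (an encoding chosen only for illustration), and the names `short_n`, `count_hits` and `per_query_bound` are hypothetical helpers, not part of any specification.

```python
L_BITS = 10  # toy state size l (assumption)
N_BITS = 4   # toy output size n (assumption)


def short_n(x):
    """Keep the last n bits; modeled here as the n least significant bits."""
    return x & ((1 << N_BITS) - 1)


def count_hits(alpha, target):
    """Number of l-bit values beta with short_n(alpha xor beta) == target."""
    return sum(
        1 for beta in range(1 << L_BITS) if short_n(alpha ^ beta) == target
    )


def per_query_bound(q2):
    """Per-query bound 2^(l-n) / (2^l - q2) on hitting the target hash value."""
    return (1 << (L_BITS - N_BITS)) / ((1 << L_BITS) - q2)
```

For every choice of `alpha` and `target`, `count_hits` returns $2^{l-n} = 64$ in this toy setting, and `per_query_bound(0)` equals $2^{-n}$, so the bound only degrades slowly as previously answered queries remove candidate responses.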
Consequently, the probability that event CO occurs after the adversary asks at most $q_2$ queries to $P$ or $P^{-1}$ is upper bounded by:

\[ \Pr[\mathrm{CO}] = \Pr[\mathrm{CO}]_P \leq \frac{q_2 \cdot 2^{l-n}}{2^l - q_2}. \]

² The function $\mathrm{short}_n$ truncates the output by returning only the last $n$ bits.

Combining all cases, we give an upper bound on the probability that event SP occurs:

\[ \Pr[\mathrm{SP}] \leq \Pr[\mathrm{C}] + \Pr[\mathrm{CO}] \leq \frac{(k+1) q_1 q_2}{2^l - q_1} + \frac{(k+1) q_1 q_2}{2^l - q_2} + \frac{q_2 \cdot 2^{l-n}}{2^l - q_2} \leq \frac{(k+1) q^2}{2^l} + \frac{q}{2^{n-1}}. \]

In Appendix A we provide detailed mathematical support for the last inequality. This completes the proof of Claim 2.

The result for the second preimage resistance of Grøstl now follows from the combination of the two claims, which completes the proof of Theorem 4.1.

4.2 Security Analysis of Skein

As briefly presented in Section 3.3.5, the mode of operation employed in Skein, called Unique Block Iteration (UBI), takes as input an internal state, a message block, and a tweak. The compression function is based on the Threefish tweakable block cipher used in Matyas-Meyer-Oseas mode. The tweak encodes the number of bytes processed so far, the type of UBI mode and special flags for the first and the last block. In normal hashing mode there are three UBI invocations: one for a configuration block used to generate $IV$, one for hashing the message and one which represents the output transformation.

Figure 4.2. Skein in normal hashing mode.

The padding function $\mathrm{pad}_S$ takes a message $M$ of $N$ bits and returns the padded message split into message blocks, $\mathrm{pad}_S(M) = m_1 \| m_2 \| \ldots \| m_r$, whose total length is a multiple of the message block size $l$. If $N$ is a multiple of 8, padding is achieved by appending to the original message as many '0' bits as needed to complete an $l$-bit block. Otherwise, padding is achieved by appending to the original message a single '1' bit followed by as many '0' bits as needed to complete an $l$-bit block. Interestingly, Skein uses a block counter included in the tweak rather than the usual strengthening. The designers claim that the counter provides the same security as the typical padding where the message length is appended at the end of the message. Furthermore, the counter ensures that each message block is hashed in a unique way. To obtain the hash value, the output of the last compression call is processed by the output transformation, after which the output size is optionally shortened from $l$ to $n$ bits with the function $\mathrm{short}_n$.

4.2.1 Assessing Second Preimage Resistance of Skein

A possible way to obtain a bound on the second preimage resistance of Skein is by using indifferentiability results. The Skein hash function is proven indifferentiable from a random oracle if the underlying tweakable block cipher is assumed to be ideal [BKL+09]. Additionally, an upper bound of $O\left(\frac{q}{2^n} + \frac{q^2}{2^l}\right)$ on the second preimage resistance is derived via the indifferentiability [AMP10c]. NIST requires the SHA-3 hash function to support $n = 224, 256, 384, 512$. The existing second preimage bound gives optimal second preimage resistance as long as $2n \leq l$. In order to prove an optimal bound on the second preimage resistance of the narrow-pipe versions of Skein, we directly analyze the second preimage resistance of Skein in the ideal cipher model. Our proof follows the techniques used by Bouillaguet and Fouque [BF09] for the HAIFA construction.

4.2.2 Proof of Security

Under the assumption that $E$ is an ideal tweakable block cipher, where $l$ is the iterated state size and $n$ is the output size, we will prove that the advantage of the second preimage adversary is upper bounded by $O\left(\frac{2q}{2^l} + \frac{2q}{2^n}\right)$, where the second preimage adversary makes at most $q$ queries and the length of the target message is at most $r$ blocks.
In this ideal model, an adversary is allowed to make both forward and inverse queries to the oracle $E$. All these queries are stored in a query history $L_E$ as indexed elements.

Theorem 4.2. Let $E$ be an ideal tweakable block cipher and let $\mathcal{A}$ be a computationally unbounded adversary which makes at most $q < 2^{l-1}$ queries. Its advantage in breaking the second preimage resistance of $H$ is upper bounded by:

$$\mathbf{Adv}_{H}^{\mathrm{eSec}[\lambda]}(q) \le \frac{2q}{2^l} + \frac{2q}{2^n}.$$

Proof. The proof follows the approach used in the proof of Theorem 4.1.

The Graph Construction. Let $L_E = \{(k_i, x_i, t_i, y_i)_{1 \le i \le q}\}$ be an initially empty list such that $y = E_k(t, x)$, where the tuple $(k, x, t, y) \in \{0,1\}^l \times \{0,1\}^l \times \{0,1\}^s \times \{0,1\}^l$. We introduce an initially empty directed graph $(V, E)$. When the adversary $\mathcal{A}$ sends a forward query $(k, x, t)$ to the oracle $E$ it receives a value $y$, and when $\mathcal{A}$ sends an inverse query $(k, t, y)$ to the oracle it receives a value $x$. An edge $e \in E$ is formed between two vertices in $V$, $(k, x, t, y) \xrightarrow{e} (k', x', t', y')$, if $k' = y \oplus x$. We define a path in the graph $(V, E)$ as a sequence of vertices, which we denote by $p = (k_1, x_1, t_1, y_1) \xrightarrow{e_1} \cdots \xrightarrow{e_r} (k_{r+1}, x_{r+1}, t_{r+1}, y_{r+1})$. We say that two vertices $(k, x, t, y)$ and $(k', x', t', y')$ collide if $y \oplus x = y' \oplus x'$. Further, two distinct paths collide if they both start with the same vertex and they both end with colliding vertices.

Skein in the Graph Setting. Intuitively, an edge corresponds to precisely one evaluation of Skein's compression function. For each $i$ with $1 \le i \le r$ we have $m_i = x_i$, $h_{i-1} = k_i$, $t_i$ is a tweak value of the message type (footnote 3), and $h_i = y_i \oplus x_i$. The hash value $h_{r+1} = \mathrm{short}_n(y_{r+1} \oplus x_{r+1})$ is obtained by applying the output transformation, with the final chopping, to the internal state $h_r = k_{r+1}$. In the output transformation, the tweak has the output type and a 64-bit counter is used as the input $x_{r+1}$ in place of a message block.
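The graph bookkeeping described above can be illustrated with a toy model. The following Python sketch is only an illustration under stated assumptions: Threefish is replaced by a lazily sampled ideal tweakable block cipher over a tiny state (`L_BITS = 8`, chosen for readability), the tweak is reduced to the block position, and the names `ToyIdealCipher`, `is_edge`, `collide` and `challenge_path` are hypothetical helpers that do not come from the Skein specification.

```python
import random

L_BITS = 8  # toy state size l (assumption; far smaller than any real Skein state)


class ToyIdealCipher:
    """Lazily sampled ideal tweakable block cipher: every (key, tweak) pair
    selects an independent random permutation of l-bit values."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.fwd = {}      # (k, t) -> {x: y}
        self.inv = {}      # (k, t) -> {y: x}
        self.history = []  # query history L_E, entries (k, x, t, y)

    def _record(self, k, x, t, y):
        self.fwd.setdefault((k, t), {})[x] = y
        self.inv.setdefault((k, t), {})[y] = x
        self.history.append((k, x, t, y))

    def forward(self, k, x, t):
        table = self.fwd.setdefault((k, t), {})
        if x in table:
            return table[x]
        used = set(self.inv.get((k, t), {}))  # outputs already assigned
        y = self.rng.choice([v for v in range(2 ** L_BITS) if v not in used])
        self._record(k, x, t, y)
        return y

    def inverse(self, k, t, y):
        table = self.inv.setdefault((k, t), {})
        if y in table:
            return table[y]
        used = set(self.fwd.get((k, t), {}))  # inputs already assigned
        x = self.rng.choice([v for v in range(2 ** L_BITS) if v not in used])
        self._record(k, x, t, y)
        return x


def is_edge(u, v):
    """An edge u -> v exists if the key of v equals y XOR x of u."""
    _, x, _, y = u
    return v[0] == (y ^ x)


def collide(u, v):
    """Two vertices collide if their feed-forward values y XOR x agree."""
    return (u[3] ^ u[1]) == (v[3] ^ v[1])


def challenge_path(cipher, iv, blocks):
    """Chain h_i = E_{h_{i-1}}(t_i, m_i) XOR m_i and return the vertices
    (k_i, x_i, t_i, y_i) visited while hashing the given blocks."""
    path, h = [], iv
    for i, m in enumerate(blocks, start=1):
        t = i  # toy tweak: the block position only (assumption)
        y = cipher.forward(h, m, t)
        path.append((h, m, t, y))
        h = y ^ m
    return path


cipher = ToyIdealCipher(seed=1)
path = challenge_path(cipher, iv=0, blocks=[0x11, 0x22, 0x33])
print(all(is_edge(u, v) for u, v in zip(path, path[1:])))
```

Consecutive vertices of the returned path are connected by edges by construction, which mirrors the statement that each edge corresponds to one compression function call. The output transformation and the truncation are deliberately left out of this sketch.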
Without loss of generality, we can replace the first UBI invocation for the configuration block with $IV = k_1$ and fix it as a constant. If a path in $(V, E)$ is obtained while hashing the target message $M$, we refer to this sequence as the challenge path. We denote by $(IV, h_1, \ldots, h_r)$ the sequence of internal states crossed by the challenge path to obtain the hash value $h_{r+1}$. Let SP be the event that, as a result of the adversary's queries, a path which collides with and differs from the challenge path is formed in the graph $(V, E)$, where the overlapping tweaks coincide with each other.

Claim 1. $\mathbf{Adv}_{H}^{\mathrm{eSec}[\lambda]}(q) \le \Pr[\mathsf{SP}]$.

Proof. Suppose that the second preimage adversary $\mathcal{A}$ receives a randomly generated target message $M$, where $\mathrm{pad}_S(M) = m_1 \| m_2 \| \ldots \| m_r$, and outputs a message $M' \ne M$, where $\mathrm{pad}_S(M') = m'_1 \| m'_2 \| \ldots \| m'_p$, such that $H^E(M) = H^E(M')$ for the queried oracle $E$. Adversary $\mathcal{A}$ makes all of the queries necessary to compute $H(M)$ and $H(M')$. Let us denote by $p = (IV, x_1, t_1, y_1) \xrightarrow{e_1} \cdots \xrightarrow{e_r} (k_{r+1}, x_{r+1}, t_{r+1}, y_{r+1})$ the challenge path induced by message $M$, and by $p' = (IV, x'_1, t'_1, y'_1) \xrightarrow{e'_1} \cdots \xrightarrow{e'_p} (k'_{p+1}, x'_{p+1}, t'_{p+1}, y'_{p+1})$ the path induced by message $M'$. We claim that the paths $p$ and $p'$ are colliding paths.

Footnote 3. We assume that the tweaks $t$ and $t'$ in the definition of an edge correspond to one another in terms of the bits processed so far, the type of UBI mode, and the special flags.

1. If $|M| \ne |M'|$, then the values of the tweak entering the output transformation are different, $t_{r+1} \ne t'_{p+1}$, and so $(k_{r+1}, x_{r+1}, t_{r+1}, y_{r+1}) \ne (k'_{p+1}, x'_{p+1}, t'_{p+1}, y'_{p+1})$ (footnote 4).

2. Otherwise, $|M| = |M'|$. Since $h_{r+1} = h'_{r+1}$, either there is a second preimage on the output transformation or $(h_r, t_{r+1}) = (h'_r, t'_{r+1})$.
If the second statement is true, either there is a second preimage on the compression function, where the tweak values must be the same, $t_r = t'_r$, or $(h_{r-1}, m_r, t_r) = (h'_{r-1}, m'_r, t'_r)$. This argument repeats for each preceding call of the compression function. Since $|M| = |M'|$ and $IV$ is fixed for both evaluations, either there is a second preimage on the compression function at some point (for the same value of the tweak), or $m_i = m'_i$ for $1 \le i \le r$. In the latter case $M = M'$, which is impossible. Therefore, there is some $i$, $1 \le i \le r$, such that $m_i \ne m'_i$, and so $(k_i, x_i, y_i) \ne (k'_i, x'_i, y'_i)$ for $t_i = t'_i$.

Since $M$ and $M'$ collide, $h_{r+1} = h'_{p+1}$, and hence $y_{r+1} \oplus x_{r+1} = y'_{p+1} \oplus x'_{p+1}$. Therefore, the paths $p$ and $p'$ collide. This completes the proof of Claim 1.

Claim 2. $\Pr[\mathsf{SP}] \le \frac{2q}{2^l} + \frac{2q}{2^n}$.

Proof. Suppose that $\mathcal{A}$ wins. As noted before, the SP event occurs when $\mathcal{A}$ succeeds in connecting a path (different from the challenge path) in the graph $(V, E)$ from $IV$ to the challenge path, where the tweaks need to coincide. As in the case of Grøstl, the connection can happen in two ways. Let C be the event in which a connection occurs on an internal state of the challenge path before the output transformation is applied, and let CO be the event in which the connection occurs after the output transformation is applied.

Simulation. We simulate the execution of $\mathcal{A}$ and record in the list $L_E$ the queries sent to the oracle $E$. Every time $\mathcal{A}$ submits a new query to the oracle, it receives a uniformly distributed random value. We denote the challenge path induced by the target message $M$ by $p = (IV, x_1, t_1, y_1) \xrightarrow{e_1} \cdots \xrightarrow{e_r} (k_{r+1}, x_{r+1}, t_{r+1}, y_{r+1})$.

Case 1: If event C occurs after the adversary $\mathcal{A}$ asks at most $q$ queries to the oracle $E$, the graph $(V, E)$ contains a path $p' = (IV, x'_1, t'_1, y'_1) \xrightarrow{e'_1} \cdots \xrightarrow{e'_p} (k'_p, x'_p, t'_p, y'_p)$, where the vertex $(k'_p, x'_p, t'_p, y'_p)$ collides with a vertex $(k_i, x_i, t_i, y_i)$, $1 \le i \le r$, from the challenge path, such that $t'_p = t_i$.
This means that the adversary has found a collision for the tweakable compression function $f$. More precisely, this collision is a second preimage of one of the $h_i$ from the challenge path, for $1 \le i \le r$.

Footnote 4. As noted above, in the output transformation the block cipher is used in counter mode and therefore $x_{r+1} = x'_{p+1}$, but this does not affect our proof.

Start the Simulation. Without loss of generality, let us assume that event C occurs after the adversary has sent the $j$-th query. The tuple $(k'_j, x'_j, t'_j, y'_j)$ is generated, where $y'_j$ is a random value from a set of size at least $2^l - j$. Because the tweak encodes the position within the message, the only place where the path $p'$ can connect to the challenge path is the vertex where $t'_j = t_i$, for $1 \le i \le r$; each query therefore has a single target value rather than $r$ of them. A second preimage on the tweakable compression function is found if $y_i \oplus x_i = y'_j \oplus x'_j$. Therefore, the $j$-th query has probability at most $1/(2^l - j)$ of giving this second preimage. Consequently, the probability that event C occurs after the adversary asks at most $q$ queries to $E$ is upper bounded by:

$$\Pr[\mathsf{C}] \le \sum_{j=1}^{q} \frac{1}{2^l - j} \le \frac{q}{2^l - q}.$$

Case 2: As noted before, the hash value $h_{r+1}$ is obtained by applying the output transformation, with the final chopping, to the internal state $h_r = k_{r+1}$ of the challenge path. If event CO occurs after the adversary $\mathcal{A}$ asks at most $q$ queries to the oracle $E$, the graph $(V, E)$ contains a path $p' = (IV, x'_1, t'_1, y'_1) \xrightarrow{e'_1} \cdots \xrightarrow{e'_p} (k'_{p+1}, x'_{p+1}, t'_{p+1}, y'_{p+1})$, where $h_{r+1} = \mathrm{short}_n(y'_{p+1} \oplus x'_{p+1})$ after the final chopping of the $l - n$ leftmost bits. This means that the adversary has found a second preimage on the output transformation.

Start the Simulation. Let us assume that event CO occurs after the adversary has sent the $i$-th query. The tuple $(k'_i, x'_i, t'_i, y'_i)$ is generated, where $y'_i$ is a random value from a set of size at least $2^l - i$. A second preimage on the output transformation is found if and only if $h_{r+1} = \mathrm{short}_n(y'_i \oplus x'_i)$.
Therefore, the $i$-th query has probability at most $\frac{2^{l-n}}{2^l - i}$ of giving this second preimage. Consequently, the probability that event CO occurs after adversary $\mathcal{A}$ asks $q$ queries to $E$ is upper bounded by:

$$\Pr[\mathsf{CO}] \le \sum_{i=1}^{q} \frac{2^{l-n}}{2^l - i} \le \frac{q \cdot 2^{l-n}}{2^l - q}.$$

Combining both cases, we obtain an upper bound on the probability that event SP occurs:

$$\Pr[\mathsf{SP}] \le \Pr[\mathsf{C}] + \Pr[\mathsf{CO}] \le \frac{q}{2^l - q} + \frac{q \cdot 2^{l-n}}{2^l - q} \le \frac{2q}{2^l} + \frac{2q}{2^n}.$$

We obtain the last inequality as in the proof for Grøstl, since for $q < 2^{l-1}$ we have $\frac{1}{2^l - q} \le \frac{2}{2^l}$. If the final chopping is not needed ($n = l$), the result is still valid. This completes the proof of Claim 2.

The result for the second preimage resistance of Skein now follows from the combination of the two claims, which completes the proof of Theorem 4.2.

Chapter 5

Conclusions and Remarks

In this chapter, we offer a brief summary of the work done in the thesis and then discuss its implications for future study.

5.1 Conclusions

In this thesis we considered the final round candidates in the competition for a new SHA-3 hashing algorithm within the provable security framework. To be able to carry out the analysis, we became familiar with the provable security approach together with the state of the art of hash functions, and more closely with the competition finalists. As shown in Chapter 4, we provided a lower bound on the second preimage resistance of Grøstl and Skein in the ideal model. The results obtained for Grøstl in the ideal permutation model confirm the claim that the Merkle-Damgård iteration loses a factor linear in the message length (in blocks) of the second preimage security in the ideal compression function model [KS05]. Secondly, Skein's bound shows that the addition of a tweak, which makes every compression function call unique, results in an increase of the second preimage resistance (up to approximately $n$ bits).
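The difference between the two bounds can be made concrete with a small calculation. The sketch below evaluates the Grøstl bound $\frac{(k+1)q^2}{2^l} + \frac{q}{2^{n-1}}$ and the Skein bound $\frac{2q}{2^l} + \frac{2q}{2^n}$ with exact rational arithmetic and reports the smallest power-of-two query count at which each bound reaches 1/2. The function names, the threshold 1/2 and the example parameters (a 512-bit state with 256-bit output and $k + 1 = 2^{10}$ blocks for the first bound, $l = n = 256$ for the second) are assumptions chosen purely for illustration.

```python
from fractions import Fraction


def grostl_bound(q, k, l, n):
    """Bound (k+1) q^2 / 2^l + q / 2^(n-1) from the Grøstl analysis."""
    return Fraction((k + 1) * q * q, 2 ** l) + Fraction(q, 2 ** (n - 1))


def skein_bound(q, l, n):
    """Bound 2q / 2^l + 2q / 2^n from Theorem 4.2."""
    return Fraction(2 * q, 2 ** l) + Fraction(2 * q, 2 ** n)


def min_log2_queries(bound, threshold=Fraction(1, 2), max_exp=1024):
    """Smallest e such that bound(2**e) >= threshold, or None if none exists.
    This marks where the proven bound stops being informative; it is not
    a statement about an actual attack."""
    for e in range(max_exp + 1):
        if bound(2 ** e) >= threshold:
            return e
    return None


# Illustrative parameters (assumptions, see the lead-in).
print(min_log2_queries(lambda q: grostl_bound(q, 2 ** 10 - 1, 512, 256)))
print(min_log2_queries(lambda q: skein_bound(q, 256, 256)))
```

With these parameters the message-length factor $(k+1)$ makes the first bound reach the threshold a few bits earlier than the second, which has no dependence on the message length.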
In Table 3.2 we presented the existing security reduction results and updated them with those obtained in our work. One needs to be aware of the shortcomings of the provable security approach in the ideal model when looking at these security reduction results. Some classes of attacks may still be possible, such as timing attacks, differential fault analysis, and differential power analysis. Sometimes the applied proof techniques or human factors (e.g., flaws in the proof, or a proof given in the wrong model or for the wrong problem) may affect the accuracy of a security reduction. However, security reduction results are of great importance since they give a very good indication that the higher-level structure has no flaws in its design. More concretely, they show that no attack on the hash function is possible without exploiting a weakness of the underlying idealized primitive.

5.2 Summary of Contributions

Bearing in mind the importance of valid security guarantees, we see our results as a valuable contribution to the SHA-3 competition. The main contributions of this thesis are:

• The analysis of the second preimage resistance of the hash function competition finalists Grøstl and Skein. Within the concrete-security provable-security framework, we gave a lower bound on the second preimage resistance of Grøstl in the ideal permutation model and of Skein in the ideal cipher model, and proved them both optimally second preimage resistant.

• While seeking solutions, we investigated the existing proof techniques concerning security notions, with an emphasis on second preimage resistance.

• In addition, we gave a concise survey of the five finalists together with their security reductions and performance results.

5.3 Future Research

In recent years, the NIST SHA-3 competition has focused the attention of the cryptographic community and initiated broad research on the design principles and analysis of hash functions.
As a result, many new ideas emerged regarding construction designs, cryptanalysis, proof techniques, etc. New directions for further research related to this topic were also identified. We now list some open problems:

• Firstly, as can be seen in Table 3.2, the provided bounds on the preimage and second preimage resistance of JH are not optimal.

• Once the security (Col, ePre, eSec) of the hash function has been reduced to the security of some underlying atomic primitive (under the assumption that this underlying primitive is ideal), a more detailed analysis of that particular primitive can be conducted with the goal of investigating its resistance to existing and new attacks.

• All security reduction results presented in this work were carried out in the ideal model. Supporting second preimage resistance with a proof in the standard model remains a substantial challenge. A possible direction would be an attempt to design a construction that is efficient in practice and preserves second preimage resistance.

• More fundamentally, the definitions and the classification of the main security properties are still not completely understood, while new practical applications emerge with a demand for subtle security requirements.

• There is a need to develop new methods to assess security and to develop new attacks and design ideas.

• Finally, the broad range of uses and the number of existing security and performance requirements make hash function design more complex. One solution would be to group these requirements into related classes and to design different hash functions, each of which deals with one of these classes.

Bibliography

[ABF+08] Elena Andreeva, Charles Bouillaguet, Pierre-Alain Fouque, Jonathan J. Hoch, John Kelsey, Adi Shamir, and Sébastien Zimmer. Second Preimage Attacks on Dithered Hash Functions. In Nigel P.
Smart, editor, EUROCRYPT, volume 4965 of Lecture Notes in Computer Science, pages 270–288. Springer, 2008.

[ABM+12] Elena Andreeva, Andrey Bogdanov, Bart Mennink, Bart Preneel, and Christian Rechberger. On Security Arguments of the Second Round SHA-3 Candidates. International Journal of Information Security, 11(2):103–120, 2012.

[AHMP10] Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael C.-W. Phan. SHA-3 proposal BLAKE. Submission to NIST (Round 3), 2010.

[ALM11] Elena Andreeva, Atul Luykx, and Bart Mennink. Provable Security of BLAKE with Non-Ideal Compression Function. IACR Cryptology ePrint Archive, Report 2011/620, 2011.

[AMP10a] Elena Andreeva, Bart Mennink, and Bart Preneel. On the Indifferentiability of the Grøstl Hash Function. In Juan A. Garay and Roberto De Prisco, editors, SCN, volume 6280 of Lecture Notes in Computer Science, pages 88–105. Springer, 2010.

[AMP10b] Elena Andreeva, Bart Mennink, and Bart Preneel. Security Properties of Domain Extenders for Cryptographic Hash Functions. JIPS, 6(4):453–480, 2010.

[AMP10c] Elena Andreeva, Bart Mennink, and Bart Preneel. Security Reductions of the Second Round SHA-3 Candidates. In Mike Burmester, Gene Tsudik, Spyros S. Magliveras, and Ivana Ilić, editors, ISC, volume 6531 of Lecture Notes in Computer Science, pages 39–53. Springer, 2010.

[AMPŠ12] Elena Andreeva, Bart Mennink, Bart Preneel, and Marjan Škrobot. Security Analysis and Comparison of the SHA-3 Finalists BLAKE, Grøstl, JH, Keccak, and Skein. In Aikaterini Mitrokotsa and Serge Vaudenay, editors, Progress in Cryptology - AFRICACRYPT, volume 7374 of Lecture Notes in Computer Science, pages 287–305. Springer, Heidelberg, 2012.

[And10] Elena Andreeva. Domain Extenders for Cryptographic Hash Functions. PhD thesis, Katholieke Universiteit Leuven, 2010.

[ANPS07] Elena Andreeva, Gregory Neven, Bart Preneel, and Thomas Shrimpton. Seven-Property-Preserving Iterated Hashing: ROX.
In Kaoru Kurosawa, editor, ASIACRYPT, volume 4833 of Lecture Notes in Computer Science, pages 130–146. Springer, 2007.

[AP09] Elena Andreeva and Bart Preneel. A Three-Property-Secure Hash Function. In Roberto Maria Avanzi, Liam Keliher, and Francesco Sica, editors, Selected Areas in Cryptography, volume 5381 of Lecture Notes in Computer Science, pages 228–244. Springer, 2009.

[AS11] Elena Andreeva and Martijn Stam. The Symbiosis between Collision and Preimage Resistance. In Liqun Chen, editor, IMA Int. Conf., volume 7089 of Lecture Notes in Computer Science, pages 152–171. Springer, 2011.

[BCC+08] Emmanuel Bresson, Anne Canteaut, Benoît Chevallier-Mames, Christophe Clavier, Thomas Fuhr, Aline Gouget, Thomas Icart, Jean-François Misarsky, María Naya-Plasencia, Pascal Paillier, Thomas Pornin, Jean-René Reinhard, Céline Thuillet, and Marion Videau. Shabal, a Submission to NIST's Cryptographic Hash Algorithm Competition. Submission to NIST, 2008.

[BCS05] John Black, Martin Cochran, and Thomas Shrimpton. On the Impossibility of Highly-Efficient Blockcipher-Based Hash Functions. In Ronald Cramer, editor, EUROCRYPT, volume 3494 of Lecture Notes in Computer Science, pages 526–541. Springer, 2005.

[BD07] Eli Biham and Orr Dunkelman. A Framework for Iterative Hash Functions HAIFA. IACR Cryptology ePrint Archive, Report 2007/278, 2007.

[BDPA07] Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. Sponge functions. ECRYPT Hash Workshop, 2007.

[BDPA08] Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. On the Indifferentiability of the Sponge Construction. In Nigel P. Smart, editor, EUROCRYPT, volume 4965 of Lecture Notes in Computer Science, pages 181–197. Springer, 2008.

[BDPA11] Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. The Keccak SHA-3 submission. Submission to NIST (Round 3), 2011.

[Ber08] Daniel J. Bernstein. ChaCha, a variant of Salsa20, 2008. http://cr.yp.to/chacha/chacha-20080128.pdf.
[BF09] Charles Bouillaguet and Pierre-Alain Fouque. Practical Hash Functions Constructions Resistant to Generic Second Preimage Attacks Beyond the Birthday Bound, 2009.

[BKL+09] Mihir Bellare, Tadayoshi Kohno, Stefan Lucks, Niels Ferguson, Bruce Schneier, Doug Whiting, Jon Callas, and Jesse Walker. Provable Security Support for The Skein Hash Family, 2009.

[BKL+10] Mihir Bellare, Tadayoshi Kohno, Stefan Lucks, Niels Ferguson, Bruce Schneier, Doug Whiting, Jon Callas, and Jesse Walker. The Skein Hash Function Family. Submission to NIST (Round 3), 2010.

[BMN10] Rishiraj Bhattacharyya, Avradip Mandal, and Mridul Nandi. Security Analysis of the Mode of JH Hash Function. In Seokhie Hong and Tetsu Iwata, editors, FSE, volume 6147 of Lecture Notes in Computer Science, pages 168–191. Springer, 2010.

[Bou11] Charles Bouillaguet. Études d'hypothèses algorithmiques et attaques de primitives cryptographiques. PhD thesis, Université Paris Diderot, 2011.

[BR93] Mihir Bellare and Phillip Rogaway. Random Oracles are Practical: A Paradigm for Designing Efficient Protocols. In Dorothy E. Denning, Raymond Pyle, Ravi Ganesan, Ravi S. Sandhu, and Victoria Ashby, editors, ACM Conference on Computer and Communications Security, pages 62–73. ACM, 1993.

[BRS02] John Black, Phillip Rogaway, and Thomas Shrimpton. Black-Box Analysis of the Block-Cipher-Based Hash-Function Constructions from PGV. In Moti Yung, editor, CRYPTO, volume 2442 of Lecture Notes in Computer Science, pages 320–335. Springer, 2002.

[CDMP05] Jean-Sébastien Coron, Yevgeniy Dodis, Cécile Malinaud, and Prashant Puniya. Merkle-Damgård Revisited: How to Construct a Hash Function. In Victor Shoup, editor, CRYPTO, volume 3621 of Lecture Notes in Computer Science, pages 430–448. Springer, 2005.

[CNY11] Donghoon Chang, Mridul Nandi, and Moti Yung. Indifferentiability of the Hash Algorithm BLAKE. IACR Cryptology ePrint Archive, Report 2011/623, 2011.

[Dam90] Ivan Damgård. A Design Principle for Hash Functions.
In Gilles Brassard, editor, Advances in Cryptology - CRYPTO, 9th Annual International Cryptology Conference, Santa Barbara, California, USA, August 20-24, 1989, Proceedings, volume 435 of Lecture Notes in Computer Science, pages 416–427. Springer, 1990.

[Dea99] Richard Dean. Formal Aspects of Mobile Code Security. PhD thesis, Princeton University, 1999.

[DH76] Whitfield Diffie and Martin E. Hellman. New Directions in Cryptography. IEEE Transactions on Information Theory, IT-22(6):644–654, 1976.

[Die10] Reinhard Diestel. Graph Theory (Graduate Texts in Mathematics). Springer-Verlag, 2010.

[FS86] Amos Fiat and Adi Shamir. How to Prove Yourself: Practical Solutions to Identification and Signature Problems. In Andrew M. Odlyzko, editor, CRYPTO, volume 263 of Lecture Notes in Computer Science, pages 186–194. Springer, 1986.

[FSZ09] Pierre-Alain Fouque, Jacques Stern, and Sébastien Zimmer. Cryptanalysis of Tweaked Versions of SMASH and Reparation. In Roberto Maria Avanzi, Liam Keliher, and Francesco Sica, editors, Selected Areas in Cryptography, volume 5381 of Lecture Notes in Computer Science, pages 228–244. Springer, 2009.

[GKM+11] Praveen Gauravaram, Lars R. Knudsen, Krystian Matusiewicz, Florian Mendel, Christian Rechberger, Martin Schläffer, and Søren S. Thomsen. Grøstl – a SHA-3 candidate. Submission to NIST (Round 3), 2011.

[GM84] Shafi Goldwasser and Silvio Micali. Probabilistic Encryption. Journal of Computer and System Sciences, 28(2):270–299, 1984.

[Jou04] Antoine Joux. Multicollisions in Iterated Hash Functions. Application to Cascaded Constructions. In Matt Franklin, editor, Advances in Cryptology - CRYPTO, volume 3152 of Lecture Notes in Computer Science, chapter 19, pages 99–213. Springer, Berlin, Heidelberg, 2004.

[KK06] John Kelsey and Tadayoshi Kohno. Herding Hash Functions and the Nostradamus Attack. In Serge Vaudenay, editor, EUROCRYPT, volume 4004 of Lecture Notes in Computer Science, pages 183–200. Springer, 2006.
[KS05] John Kelsey and Bruce Schneier. Second preimages on n-bit hash functions for much less than $2^n$ work. In Ronald Cramer, editor, EUROCRYPT, volume 3494 of Lecture Notes in Computer Science, pages 474–490. Springer, 2005.

[LH11] Jooyoung Lee and Deukjo Hong. Collision Resistance of the JH Hash Function. IACR Cryptology ePrint Archive, Report 2011/19, 2011.

[LM92] Xuejia Lai and James L. Massey. Hash Function Based on Block Ciphers. In Rainer A. Rueppel, editor, EUROCRYPT, volume 658 of Lecture Notes in Computer Science, pages 55–70. Springer, 1992.

[Luc05] Stefan Lucks. A Failure-Friendly Design Principle for Hash Functions. In Bimal K. Roy, editor, ASIACRYPT, volume 3788 of Lecture Notes in Computer Science, pages 474–494. Springer, 2005.

[Mer79] Ralph Merkle. Secrecy, Authentication, and Public Key Systems. PhD thesis, UMI Research Press, 1979.

[Mer90] Ralph C. Merkle. One Way Hash Functions and DES. In Gilles Brassard, editor, Advances in Cryptology - CRYPTO, 9th Annual International Cryptology Conference, Santa Barbara, California, USA, August 20-24, 1989, Proceedings, volume 435 of Lecture Notes in Computer Science, pages 428–446. Springer, 1990.

[MPST12] Dustin Moody, Souradyuti Paul, and Daniel Smith-Tone. Improved Indifferentiability Security Bound for the JH Mode. In NIST's 3rd SHA-3 Candidate Conference 2012, 2012.

[MRH04] Ueli M. Maurer, Renato Renner, and Clemens Holenstein. Indifferentiability, Impossibility Results on Reductions, and Applications to the Random Oracle Methodology. In Moni Naor, editor, TCC, volume 2951 of Lecture Notes in Computer Science, pages 21–39. Springer, 2004.

[MvOV97] Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone. Handbook of Applied Cryptography. CRC Press, 1997.

[NIS07] NIST. Announcing Request for Candidate Algorithm Nominations for a New Cryptographic Hash Algorithm. Technical report, NIST, 2007.

[PGV93] Bart Preneel, René Govaerts, and Joos Vandewalle.
Hash functions based on block ciphers: A synthetic approach. In Advances in Cryptology - CRYPTO, Lecture Notes in Computer Science, pages 368–378. Springer-Verlag, 1993.

[Rab78] Michael O. Rabin. Digitalized signatures. In Foundations of Secure Computation, pages 155–166. Academic Press, 1978.

[Rog06] Phillip Rogaway. Formalizing Human Ignorance. In Phong Q. Nguyen, editor, VIETCRYPT, volume 4341 of Lecture Notes in Computer Science, pages 211–228. Springer, 2006.

[RS04] Phillip Rogaway and Thomas Shrimpton. Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance. In Bimal K. Roy and Willi Meier, editors, FSE, volume 3017 of Lecture Notes in Computer Science, pages 371–388. Springer, 2004.

[RS08a] Phillip Rogaway and John P. Steinberger. Constructing Cryptographic Hash Functions from Fixed-Key Blockciphers. In David Wagner, editor, CRYPTO, volume 5157 of Lecture Notes in Computer Science, pages 433–450. Springer, 2008.

[RS08b] Phillip Rogaway and John P. Steinberger. Security/Efficiency Tradeoffs for Permutation-Based Hashing. In Nigel P. Smart, editor, EUROCRYPT, volume 4965 of Lecture Notes in Computer Science, pages 220–236. Springer, 2008.

[Sta08] Martijn Stam. Beyond Uniformity: Better Security/Efficiency Tradeoffs for Compression Functions. In David Wagner, editor, CRYPTO, volume 5157 of Lecture Notes in Computer Science, pages 397–412. Springer, 2008.

[Sta09] Martijn Stam. Blockcipher-Based Hashing Revisited. In Orr Dunkelman, editor, FSE, volume 5665 of Lecture Notes in Computer Science, pages 67–83. Springer, 2009.

[TPB+11] Meltem Sönmez Turan, Ray Perlner, Lawrence E. Bassham, William Burr, Donghoon Chang, Shu-jen Chang, Morris J. Dworkin, John M. Kelsey, Souradyuti Paul, and Rene Peralta. Status Report on the Second Round of the SHA-3 Cryptographic Hash Algorithm Competition. Technical report, NIST, 2011.

[Wag02] David Wagner.
A Generalized Birthday Problem. In Moti Yung, editor, CRYPTO, volume 2442 of Lecture Notes in Computer Science, pages 288–303. Springer, 2002.

[Wu11] Hongjun Wu. The Hash Function JH. Submission to NIST (Round 3), 2011.

[WY05] Xiaoyun Wang and Hongbo Yu. How to Break MD5 and Other Hash Functions. In Ronald Cramer, editor, EUROCRYPT, volume 3494 of Lecture Notes in Computer Science, pages 19–35. Springer, 2005.

[WYY05] Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu. Finding Collisions in the Full SHA-1. In Victor Shoup, editor, CRYPTO, volume 3621 of Lecture Notes in Computer Science, pages 17–36. Springer, 2005.

Appendix A

Mathematical Derivations

A.1 Security Bound on Second Preimage of Grøstl

\begin{align}
\Pr[\mathsf{SP}] &\le \Pr[\mathsf{C}] + \Pr[\mathsf{CO}] \tag{A.1}\\
&\le \frac{(k+1)\, q_1 q_2}{2^l - q_1} + \frac{(k+1)\, q_1 q_2}{2^l - q_2} + \frac{q_2 \cdot 2^{l-n}}{2^l - q_2} \tag{A.2}\\
&\le \frac{(k+1)\, q_1 q_2}{2^l - q} + \frac{(k+1)\, q_1 q_2}{2^l - q} + \frac{q_2 \cdot 2^{l-n}}{2^l - q} \tag{A.3}\\
&\le 2 \cdot \frac{2(k+1)\, q_1 q_2}{2^l} + \frac{2 q_2 \cdot 2^{l-n}}{2^l} \tag{A.4}\\
&\le 2 \cdot \frac{(k+1)\, q^2}{2 \cdot 2^l} + \frac{q}{2^{n-1}} \tag{A.5}\\
&\le \frac{(k+1)\, q^2}{2^l} + \frac{q}{2^{n-1}} \tag{A.6}
\end{align}

Firstly, (A.2) collects the bounds obtained in the proof of the second preimage resistance of Grøstl. Since $q = q_1 + q_2$, we can replace $q_1$ and $q_2$ with $q$ in the denominators, and (A.3) holds. Since for $q < 2^{l-1}$ we have $\frac{1}{2^l - q} \le \frac{2}{2^l}$, we obtain (A.4). Furthermore, we wish to determine the maximum value of $2 q_1 q_2$. We set $x = q_2$, $q_1 = q - x$ and define the function $f_q(x) = 2(q - x)x = 2qx - 2x^2$. To find its maximum we set the first derivative $f'_q(x) = 2q - 4x$ to zero. This gives $x = q/2$, so $f_q^{\max} = f_q(q/2) = 2(q - q/2)\, q/2 = q^2/2$. Using this result together with $q_2 \le q$, we obtain (A.5). Finally, simplifying (A.5) gives the bound (A.6) on the second preimage resistance of Grøstl.
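As a sanity check of the chain (A.2) to (A.6), the inequalities can be verified exhaustively for small toy parameters. The sketch below uses exact rational arithmetic; the function names `chain` and `chain_holds` and the parameter values ($k = 3$, $l = 8$, $n = 4$) are assumptions made for this illustration, and such a check is of course no substitute for the derivation itself.

```python
from fractions import Fraction


def chain(q1, q2, k, l, n):
    """Return the right-hand sides of (A.2)-(A.6) for the given parameters."""
    q = q1 + q2
    N = 2 ** l
    a2 = (Fraction((k + 1) * q1 * q2, N - q1)
          + Fraction((k + 1) * q1 * q2, N - q2)
          + Fraction(q2 * 2 ** (l - n), N - q2))
    a3 = Fraction(2 * (k + 1) * q1 * q2 + q2 * 2 ** (l - n), N - q)
    a4 = 2 * Fraction(2 * (k + 1) * q1 * q2, N) + Fraction(2 * q2 * 2 ** (l - n), N)
    a5 = 2 * Fraction((k + 1) * q * q, 2 * N) + Fraction(q, 2 ** (n - 1))
    a6 = Fraction((k + 1) * q * q, N) + Fraction(q, 2 ** (n - 1))
    return [a2, a3, a4, a5, a6]


def chain_holds(k, l, n):
    """Check that each step is <= the next for all q1 + q2 = q < 2^(l-1)."""
    for q1 in range(2 ** (l - 1)):
        for q2 in range(2 ** (l - 1) - q1):
            steps = chain(q1, q2, k, l, n)
            if any(a > b for a, b in zip(steps, steps[1:])):
                return False
    return True


def max_two_q1_q2(q):
    """Maximum of f_q(x) = 2 (q - x) x over integer x in [0, q]."""
    return max(2 * (q - x) * x for x in range(q + 1))


print(chain_holds(k=3, l=8, n=4))
print(max_two_q1_q2(10) == Fraction(10 ** 2, 2))
```

For even $q$ the integer maximum of $f_q$ coincides with $q^2/2$; for odd $q$ it is slightly smaller, so the bound used in (A.5) still holds.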