Stream Ciphers and Algebraic Attack Methods

MIDN 1/C Jake Felton
United States Naval Academy

Introduction

This paper is a Capstone effort in cryptography (more particularly, cryptanalysis) at the United States Naval Academy. The intended audience, however, is much larger. I have endeavored to make the paper as accessible yet streamlined as possible. As a result, the heart of the paper (algebraic attack methods on stream ciphers based on linear feedback shift registers) is well motivated by definitions and mathematical context. A familiarity with cryptography is helpful but not necessary, so long as the reader is sufficiently fluent in mathematics generally.

Stream ciphers take a back seat to block ciphers in many of the most common usage scenarios, such as electronic data transmission over the internet. However, stream ciphers remain an important part of modern cryptography, particularly in low-power environments. Stream ciphers are often used in radio frequency identification (RFID) and mobile communication for their efficiency in hardware, simplicity, and ability to process messages of any size rather than discrete blocks. The 'traditional' stream cipher is based on a block of memory called a linear feedback shift register, along with a nonlinear keystream function. As a result, the attack method presented in this paper is applicable to a large subset of stream ciphers.

When important terms are introduced, they are written in italics until they have been properly defined. Afterwards they are written normally.

Basic Definitions

A cryptosystem is simply a collection of five objects: three sets or spaces and two rules (or functions). The three sets, taken from some alphabet $\mathcal{A}$, are the plaintext set $\mathcal{P}$, the ciphertext set $\mathcal{C}$, and the keyspace $\mathcal{K}$. The following definition, based on that in [1], formalizes this notion. Note that all other definitions in this section, unless otherwise indicated, are based on those found in [4].

Definition 1 (Cryptosystem):
• An alphabet $\mathcal{A}$ is a set of symbols.
• The finite set $\mathcal{P} \subseteq \mathcal{A}$ is the set of all possible messages, called plaintexts.
• The finite set $\mathcal{C} \subseteq \mathcal{A}$ is the set of all possible encodings, called ciphertexts.
• The finite set $\mathcal{K} \subseteq \mathcal{A}$ is the set of all possible keys, called the keyspace.
• For every key $K \in \mathcal{K}$, there is an encryption rule $e_K \in \mathcal{E}$ and a corresponding decryption rule $d_K \in \mathcal{D}$ such that:
  o $e_K : \mathcal{P} \to \mathcal{C}$
  o $d_K : \mathcal{C} \to \mathcal{P}$
  o $d_K(e_K(x)) = x$ for every plaintext $x \in \mathcal{P}$

The fundamental purpose of any cryptosystem is to enable secure, accurate, and authenticable communication between two parties, called Alice and Bob. The assumption, of course, is that there will always be an eavesdropper, Eve, attempting to compromise some element of that security. To assume the contrary would defeat the purpose of secret communication.

Eve has significant implications for the study of cryptosystems. She highlights the important relationship between the arts of making ciphers and breaking ciphers. This dynamic will resurface later in the discussion of the attack, in the form of consequences that inform further design. For now, consider some famous notions of security that help illuminate what a secure cryptosystem is.

Definition 2 (Kerckhoffs's Principle): In 1883 Auguste Kerckhoffs formulated six design principles for military ciphers in [2]. Perhaps the most notable of these stipulates that the cipher "must not require secrecy, and that it can conveniently fall into the hands of the enemy". In other words, cryptographers must assume that every object of the cipher except the secret key is public. This implies that the security of a cryptosystem must rely solely on the strength of the secret key.

*For ciphers that operate over the finite field GF(2), and with key size n, the 'strength' of the secret key is often measured in terms of the size of the keyspace, $2^n$. Indeed, this value is sometimes called the security parameter of the cipher.

Definition 3 (Perfect Secrecy): In 1949 Claude Shannon published a landmark paper that introduced many of the notions of secrecy currently used. Among these are the principles of confusion and diffusion, most often realized as some type of substitution and permutation in modern ciphers. In addition, he provided the following theorem defining perfect secrecy, taken directly from [3]:

Theorem 1 (Shannon): A necessary and sufficient condition for perfect secrecy is that

$$P\{x \mid y\} = P\{x\}$$

for all $x \in \mathcal{P}$ and $y \in \mathcal{C}$.

In other words, the probability of guessing the plaintext that corresponds to any given ciphertext is simply the probability of guessing the plaintext given nothing at all. This is a useful idea, but it does not guarantee a practical (or even secure) cipher. For instance, the One-Time Pad (OTP) is a tantalizingly simple cipher that achieves perfect secrecy. It requires Alice and Bob to agree on a secret key K that is the same length as the plaintext x they wish to encrypt (normally over GF(2)). The key K is then added to x bit-wise to produce the ciphertext y. However, Alice and Bob must first find a way to secretly agree on a K that is as long as the plaintext itself, which is no easier than secretly communicating the plaintext. This defeats the purpose of the OTP in almost all applications. A better idea is to reframe the condition of perfect secrecy in a more pragmatic way called conditional security, thereby relaxing the design criteria of ciphers.

Definition 4 (Conditional Security): A system is called conditionally secure if it can be broken in principle, but breaking it requires more computing power than a realistic adversary would have. In this case its security is measured via complexity theory.

Recall that Kerckhoffs's principle states that the security of a cryptosystem relies entirely on the strength of the secret key. We can say, then, that a cryptosystem is conditionally secure if the plaintext cannot be recovered from the ciphertext at any less cost than searching every key $K \in \mathcal{K}$.
We call this attack a brute force or exhaustive key search attack. This provides a benchmark [conditional] value of security for designers of cryptosystems and implies one obvious condition for security: the keyspace $\mathcal{K}$ must be large (recall the security parameter $2^n$ for ciphers over GF(2)). Modern computing can perform a brute force attack on keyspaces of size up to about $2^{64}$.

Stream ciphers are to the OTP what conditional security is to perfect secrecy: they capture the spirit of the OTP in a more practical way.

Definition 5 (Stream Cipher): A stream cipher is a key-dependent algorithm with internal memory that receives symbols of a message x one-by-one over the alphabet $\mathcal{A}$, and in parallel produces the ciphertext y over the same alphabet, perhaps with some delay.

The internal memory refers to an internal state described in more detail later, which differentiates stream ciphers from block ciphers according to this definition. Before providing a rigorous treatment of stream ciphers, it is helpful to consider where they live in the taxonomy of cryptographic primitives.

Definition 6 (Cryptographic Primitive): A cryptographic primitive is a fundamental (primitive) component of cryptographic systems. For instance, one cryptographic system may be made of a public-key cipher in which Alice and Bob privately agree on a secret key to be used in a symmetric-key cipher for communication, to be signed to guarantee authenticity. The primitives are the elements that make up the system as a whole. The following diagram is based on one in [4].

[Figure 1: Taxonomy of Cryptographic Primitives. Cryptographic primitives divide into unkeyed primitives (hash functions, random sequences), symmetric-key primitives (symmetric-key ciphers, comprising block ciphers and stream ciphers; pseudo-random sequences), and public-key primitives (public-key ciphers, signatures).]

Structure of Stream Ciphers

According to Definition 5, a stream cipher encrypts elements of the plaintext one-by-one.
In order to do so, the small key K must somehow generate a long keystream and combine it with the plaintext.

Definition 7 (Keystream, Keystream Generator): The keystream generated by secret key K over some alphabet $\mathcal{A}$ is a sequence of symbols from the same alphabet $\mathcal{A}$, denoted

$$z^n = z_1, z_2, \ldots, z_n, \qquad z_i \in \mathcal{A}, \quad i = 1, 2, \ldots, n$$

The object that produces the keystream is called the keystream generator (KSG). Usually (and for our purposes), $\mathcal{A} \subseteq GF(2^n)$ and $z_i \in GF(2)$. In other words, the keystream is a sequence of 0's and 1's of length n, the length of the plaintext. The usual operation for combining plaintext and keystream bits is the exclusive or (XOR), or addition modulo 2, denoted $\oplus$.

Definition 8 (Synchronous Stream Cipher): The simplest class of stream cipher is the synchronous stream cipher (SSC). It is so named because it requires that Alice and Bob keep their encryption and decryption operations synchronized on the same time t. It also means that, by itself, the SSC has no tolerance for error. According to [4], while adopting the notation in [5], an SSC is built from:

• Internal state (IS): $s_t$ denotes the value of the internal state at time t.
• Update function (UF), L (also called the connection function): takes as input the state (and perhaps the key) at time t and produces as output the state at time t+1. So the update function updates the IS with each iteration.
  at time t = 0, 1, … : $L(s_t, K) = s_{t+1}$
• Keystream function, f (also called a nonlinear filter): takes as input the state $s_t$ (and perhaps the key) and produces as output the keystream element $z_t$. In other words, f uses the internal state at time t to produce one keystream bit $z_t$.
  at time t = 0, 1, … : $f(s_t, K) = z_t$
• Output function, h: the function that combines the keystream and the plaintext, resulting in the ciphertext. Again, h is normally the XOR.
  at time t = 0, 1, … : $h(x_t, z_t) = c_t$

The goal of any stream cipher is to produce as random a keystream as possible in order to mimic the security of the OTP.
However, since truly random sequences are inherently difficult to obtain, pseudo-random sequences are used instead. Pseudo-random sequences are generated using pseudo-random number generators (PRNG), which have a wide range of applications. In essence, the stream cipher is just a PRNG with a combining operation.

Definition 9 (Pseudo-random Number Generator): A PRNG is a deterministic algorithm that produces samples from some alphabet $\mathcal{A}$ that look independent and uniformly distributed. A PRNG uses a seed as its initialization parameter, and always produces the same sequence of numbers given the same seed.

The PRNG drastically increases the practicality and scalability of stream ciphers because it reduces the key to a manageable size. Rather than attempt to secretly transmit a truly random sequence that is as long as the message, Alice and Bob can simply use any popular public-key cryptographic primitive to agree on a seed of n bits, from which the PRNG produces a pseudo-random keystream with large period.

Boolean Functions

Recall that in most cases, the alphabet for stream ciphers is over GF(2) and the keystream bits are in GF(2). The keystream function, then, is a Boolean function.

Definition 10 (Boolean Function (BF), Degree, Linear and Affine): A Boolean function f maps a binary string of length n to one binary variable.

$$f(x_1, x_2, \ldots, x_n) = y, \qquad y \in GF(2)$$

• One important property of BFs is that each BF has a unique representation as a multivariate polynomial over GF(2), called the algebraic normal form (ANF). In other words, the keystream function f is a multivariate polynomial.
• The algebraic degree of a function, d, is the number of variables in the highest order term with non-zero coefficient.
  o Ex: $\deg(x^2y^2 + xy^2 + x^3) = 4$ over $\mathbb{R}$
  o Ex: $\deg(s_1 + s_2 + s_1 s_2 + s_1 s_2 s_3 + s_4 + s_2 s_4) = 3$ over GF(2)
• A BF is affine if there exists no term of degree greater than 1 in the ANF. An affine function with zero constant term is a linear function.
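
The notions of ANF and algebraic degree can be checked mechanically for small functions. The following is a minimal sketch, not taken from [4] or [5]: it assumes a Boolean function is given as a Python callable, recovers its ANF coefficients with the standard binary Moebius transform, and reads off the degree. The helper names (`truth_table`, `anf_coefficients`, `algebraic_degree`, `example`) are illustrative only.

```python
def truth_table(f, n):
    """Evaluate f on all 2^n inputs; bit i of the index is the (i+1)-th variable."""
    return [f(*[(idx >> i) & 1 for i in range(n)]) & 1 for idx in range(2 ** n)]


def anf_coefficients(table, n):
    """Binary Moebius transform: truth table -> ANF coefficients over GF(2).

    coeffs[idx] is the coefficient of the monomial whose variables are
    the set bits of idx.
    """
    coeffs = list(table)
    for i in range(n):
        for idx in range(2 ** n):
            if idx & (1 << i):
                coeffs[idx] ^= coeffs[idx ^ (1 << i)]
    return coeffs


def algebraic_degree(f, n):
    """Number of variables in the largest monomial with non-zero coefficient."""
    coeffs = anf_coefficients(truth_table(f, n), n)
    return max((bin(idx).count("1") for idx, c in enumerate(coeffs) if c), default=0)


def example(s1, s2, s3, s4):
    # The GF(2) example from Definition 10: XOR is addition, AND is multiplication.
    return s1 ^ s2 ^ (s1 & s2) ^ (s1 & s2 & s3) ^ s4 ^ (s2 & s4)


degree_example = algebraic_degree(example, 4)               # degree of the example
degree_affine = algebraic_degree(lambda a, b: 1 ^ a ^ b, 2)  # an affine function
```

The transform is only practical for a small number of variables, since the truth table has $2^n$ entries; it is meant as a way to experiment with the definitions rather than as a tool for real ciphers.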
The concepts of nonlinear, low-degree Boolean functions are at the heart of this paper.

Linear Feedback Shift Registers

Perhaps the most common type of PRNG used in stream ciphers is the linear feedback shift register (LFSR). An LFSR of length l is a collection of l memory cells, each of which holds an element of GF(2) (usually). The state of the LFSR at time t is simply the contents of the l memory cells.

[Figure 2: General LFSR Structure]

The LFSR updates by running the current state s through a linear recurrence relation

$$s_{t+l} = c_l \cdot \sum_{i=0}^{l-1} c_i \cdot s_{t+i} \quad \text{over } GF(2)$$

The constants $c_0, c_1, \ldots, c_l$ are called the feedback coefficients, the connection at $c_l$ is called the feedback position, and the connections at $c_0, c_1, \ldots, c_{l-1}$ are called the tap positions. The new element $s_{t+l}$ moves onto one end of the LFSR, pushing all other elements over and outputting one element out the opposite side.

LFSRs are attractive because they are easy to implement, have large period, and look statistically random. However, since the recurrence relation is linear, the output sequence has low linear complexity, defined as the length of the shortest LFSR able to produce a given sequence. If we take the sequence of outputs as the keystream, the keystream will be susceptible to linear attacks. We can avoid this problem by introducing a nonlinear component between the LFSR and the keystream. This component is the keystream function f, which, as described above, is a function of the state of the LFSR.

The Problem

Consider an LFSR-based cipher with key in $GF(2^n)$. We have an internal state $s_t$ at time t, a state update function L, and a multivariate Boolean keystream function f (the nonlinear filter function). The update function L is linear by definition of the LFSR, and the keystream function f does not depend on the secret key K, but only on the state of the LFSR. In the spirit of Kerckhoffs, assume both L and f are public and only the state is secret. All of the following content is described in [5].
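
The setup just described (a secret initial state, a public linear update L, and a public nonlinear filter f) can be sketched on a toy scale. This is an illustrative sketch only: the 4-cell register, the tap pattern `TAPS`, the filter `toy_filter`, and the key `KEY` are hypothetical choices made for the example and do not correspond to any real cipher.

```python
def lfsr_update(state, taps):
    """Linear update L: drop the oldest cell and append the XOR of the tapped cells."""
    feedback = 0
    for c, s in zip(taps, state):
        feedback ^= c & s
    return state[1:] + (feedback,)


def toy_filter(state):
    """Hypothetical nonlinear filter f of degree 2 (XOR = addition, AND = product)."""
    return state[0] ^ (state[1] & state[2])


def keystream(key, taps, f, length):
    """z_t = f(L^t(K)) for t = 0, 1, ..., length - 1."""
    state = tuple(key)
    bits = []
    for _ in range(length):
        bits.append(f(state))
        state = lfsr_update(state, taps)
    return bits


def xor_with_keystream(bits, key, taps, f):
    """Output function h: XOR each message bit with the matching keystream bit."""
    z = keystream(key, taps, f, len(bits))
    return [b ^ zt for b, zt in zip(bits, z)]


# Assumed toy parameters: recurrence s_{t+4} = s_t + s_{t+1} over GF(2).
TAPS = (1, 1, 0, 0)
KEY = (1, 0, 0, 1)

plaintext = [1, 0, 1, 1, 0, 0, 1, 0]
ciphertext = xor_with_keystream(plaintext, KEY, TAPS, toy_filter)
recovered = xor_with_keystream(ciphertext, KEY, TAPS, toy_filter)
```

Because the combining operation is XOR, the same function both encrypts and decrypts, provided both parties start from the same state and stay synchronized.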
Let $(k_0, k_1, \ldots, k_{n-1})$ be the initial state at time t = 0, where $(k_0, k_1, \ldots, k_{n-1}) = K$, the secret key of length n. Then the 'zeroth' keystream bit is:

$$z_0 = f(k_0, k_1, \ldots, k_{n-1})$$

At time t = 1, the state is given by $L(k_0, k_1, \ldots, k_{n-1})$, so the keystream bit is given by:

$$z_1 = f(L(k_0, k_1, \ldots, k_{n-1}))$$

Repeating the process, the keystream is generated as follows:

$$\begin{aligned}
z_0 &= f(k_0, k_1, \ldots, k_{n-1}) \\
z_1 &= f(L(k_0, k_1, \ldots, k_{n-1})) \\
z_2 &= f(L^2(k_0, k_1, \ldots, k_{n-1})) \\
&\;\;\vdots \\
z_t &= f(L^t(k_0, k_1, \ldots, k_{n-1}))
\end{aligned}$$

*Goal: Recover the initial state, and therefore the key K, given a subset of keystream bits $z_i$.*

Once K is known, Eve can use it to generate the entire keystream using the update and keystream functions. At that point she can decrypt just like Bob. The obvious first idea is to solve the system of multivariate equations using an existing technique such as linearization or the XL algorithm described in [5].

Question 1: The number N of monomials of degree less than or equal to d is given by

$$N = \sum_{i=1}^{d} \binom{l}{i}$$

Suppose l = 128 and d = 6. Then

$$N = \sum_{i=1}^{6} \binom{128}{i} \approx \binom{128}{6} \approx 2^{32.337}$$

Using linearization and Gaussian elimination on a problem this size proves intractable. How can we solve the system of multivariate polynomials if f is of high degree?

Idea 1: Suppose d = 3. Then

$$N = \sum_{i=1}^{3} \binom{128}{i} \approx \binom{128}{3} \approx 2^{18.381}$$

If we can reduce the degree of the keystream function, then the problem falls outside the realm of conditional security. Find a low degree function that approximates f with probability close to 1. Then for 'most' of the known keystream bits the approximating function produces a low degree equation that holds. The system of equations can now be solved.

Question 2: What if no good low degree approximation of f exists?

Idea 2: The subject of this paper. Reduce the degree of f by multiplying it by a well-chosen multivariate polynomial. Then the system holds with probability 1 and we can solve the overdefined system of low degree equations.
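
The monomial counts quoted in Question 1 and Idea 1 are easy to reproduce. The short sketch below uses only Python's standard library; the function name `monomial_count` is an illustrative choice, and the sum runs over the same range (i = 1 to d) as the formula for N above.

```python
from math import comb, log2


def monomial_count(l, d):
    """Number of monomials of degree 1..d in l variables over GF(2)."""
    return sum(comb(l, i) for i in range(1, d + 1))


n_deg6 = monomial_count(128, 6)   # size of the linearized system for d = 6
n_deg3 = monomial_count(128, 3)   # size of the linearized system for d = 3

# The leading binomial term dominates each sum, which justifies the approximation.
print(f"d=6: log2(N) = {log2(n_deg6):.3f}, log2(C(128,6)) = {log2(comb(128, 6)):.3f}")
print(f"d=3: log2(N) = {log2(n_deg3):.3f}, log2(C(128,3)) = {log2(comb(128, 3)):.3f}")
```

Halving the degree from 6 to 3 shrinks the number of unknowns in the linearized system by a factor of roughly $2^{14}$, which is what moves the attack from intractable to practical.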
The term overdefined simply means that there are more equations than unknowns; that is, the number of equations is greater than n, the size of the key.

The Algebraic Attack

The Problem calls for a known subset of keystream bits $z_i$ and their positions in the keystream. The positions provide the exponents i of $L^i$, eliminating the requirement that the known keystream bits be consecutive. An outline of the attack follows. Note that the content of the following sections is based almost entirely on [5], and thus any quotes are from [5] unless specified otherwise.

0. Let s denote the state at any time. Then every known keystream bit $z_i$ produces an equation $f(s) = z_i$.
1. Multiply $f(s) = z_i$ by a well-chosen multivariate polynomial $g(s)$, such that $f(s) \cdot g(s)$ has degree d significantly lower than the degree of f.
2. Set up a system of equations $f(s)g(s) = z_i \, g(s)$ for all known keystream bits $z_i$. The unknowns of the system are the initial state bits $k_i$, which make up the secret key K.
3. If sufficiently many keystream bits are known, the system is highly overdefined and can be solved efficiently.

The methods for solving the system of multivariate equations are discussed only cursorily here. The big mystery, then, is the polynomial g. The rest of the paper will explore how to choose g and demonstrate the attack on an example.

Assumptions

• Only synchronous stream ciphers are discussed (i.e., stream ciphers "in which each state is generated from the previous state independently of the plaintext").
• Assume a binary stream cipher (i.e., the alphabet is over GF(2)).
• The update function (connection function) L is linear over GF(2) and public.
• The keystream function (nonlinear filter) f is public and not a function of K.
• Only the state is secret (recall: the goal is to find the initial state, which is K).

Scenarios

Breaking the problem down into different scenarios helps provide direction on which attack method to choose.
The following scenarios, introduced in [5], focus on the characteristics of the keystream function f, specifically the degree d. They reprise the logical progression given by the questions and ideas in the 'Problem' section above.

Scenario 1 (S1): either the keystream function f has low algebraic degree d,

Scenario 2 (S2): or f can be approximated by a low degree function with probability close to 1.

The new idea here calls for multiplying f by a well-chosen function g to produce a function $f \cdot g$ of low degree. It relaxes the condition in S1 that f be of low degree, and it provides an alternative to S2 using equations that are true with probability 1.

Scenario 3 (S3): there exists some non-zero multivariate polynomial g such that $f \cdot g$ is of low degree d.

*Note that S3 is actually a generalization of S1, where g = 1.

It is also important to note that each of these scenarios provides a route of attack that must be taken into account in future stream cipher designs. The implications of these strategies are discussed briefly after the example attack.

The Problem Redefined

Recall that the strategy to recover the key K, which is the initial state of the LFSR at time t = 0, is to create a system of multivariate nonlinear equations

$$\begin{aligned}
z_0 &= f(k_0, k_1, \ldots, k_{n-1}) \\
z_1 &= f(L(k_0, k_1, \ldots, k_{n-1})) \\
z_2 &= f(L^2(k_0, k_1, \ldots, k_{n-1})) \\
&\;\;\vdots \\
z_t &= f(L^t(k_0, k_1, \ldots, k_{n-1}))
\end{aligned}$$

where each known keystream bit $z_i$ produces one equation. Now, however, multiply both sides of each equation by the well-chosen g to produce

$$\begin{aligned}
z_0 \cdot g(k_0, \ldots, k_{n-1}) &= g(k_0, \ldots, k_{n-1}) \cdot f(k_0, \ldots, k_{n-1}) \\
z_1 \cdot g(L(k_0, \ldots, k_{n-1})) &= g(L(k_0, \ldots, k_{n-1})) \cdot f(L(k_0, \ldots, k_{n-1})) \\
z_2 \cdot g(L^2(k_0, \ldots, k_{n-1})) &= g(L^2(k_0, \ldots, k_{n-1})) \cdot f(L^2(k_0, \ldots, k_{n-1})) \\
&\;\;\vdots \\
z_t \cdot g(L^t(k_0, \ldots, k_{n-1})) &= g(L^t(k_0, \ldots, k_{n-1})) \cdot f(L^t(k_0, \ldots, k_{n-1}))
\end{aligned}$$

The problem reduces to three questions:

1. How do we solve the system once we have it?
2. How many known keystream bits are necessary?
3. How do we choose g?

Linearization

The simplest answer to the first question is the linearization technique. Recall that Gaussian elimination (or row reduction) is a simple and efficient method of solving systems of linear equations. However, systems of nonlinear equations like those we have constructed cannot be solved directly using Gaussian elimination. The idea behind the linearization technique, as the name implies, is to transform the system of nonlinear equations into a system of linear equations in order to apply Gaussian elimination. To do so, simply replace every nonlinear monomial in the equations with a new variable, treated as an independent linear unknown. Then solve the resulting linear system using Gaussian elimination.

Recall also that Gaussian elimination requires at least as many linearly independent equations as there are variables in order to determine a unique solution. This fact motivates the answer to question two. Let m be the number of known keystream bits and R the number of nonlinear equations of degree d (and on n variables) that the m keystream bits produce. We wish to find a bound for m. Note that in the attack overview, R = m. That is, each known keystream bit produces one nonlinear equation. However, finding multiple (say, k) different g functions for one f allows each keystream bit to produce k equations. In this case, R = km. As noted above, the number of monomials of degree at most d is $N \approx \binom{n}{d}$, and after linearization each monomial is an unknown. So R must be at least N:

$$R = km \geq N \quad\Longrightarrow\quad m \geq \frac{N}{k} \approx \frac{\binom{n}{d}}{k}$$

The third question is the heart of this paper. It is the new idea that makes algebraic attacks feasible and effective. How do we choose g? The answer is easiest to see using an example.

Algebraic Attack on Toyocrypt

The stream cipher Toyocrypt, a submission to the Japanese Cryptrec call for cryptographic primitives, fits the assumptions nicely. Toyocrypt uses one 128-bit LFSR with an associated linear state update function L and nonlinear filter f.
At the time of publication, Toyocrypt was assumed secure against all known attacks, but the construction of the nonlinear filter f proves susceptible to both S2 and S3. The nonlinear filter f is

$$f(s_0, \ldots, s_{127}) = s_{127} + \sum_{i=0}^{62} s_i s_{\alpha_i} + s_{10} s_{23} s_{32} s_{42} + s_1 s_2 s_9 s_{12} s_{18} s_{20} s_{23} s_{25} s_{26} s_{28} s_{33} s_{38} s_{41} s_{42} s_{51} s_{53} s_{59} + \prod_{i=0}^{62} s_i$$

with $\{\alpha_0, \ldots, \alpha_{62}\}$ some permutation of the set {63, …, 125}.

The previous attack following S2 approximates f with a multivariate function of degree 4 that agrees with f with probability $1 - 2^{-17}$. However, the attack requires $2^{92}$ CPU clocks, which takes about 9,813,705,283.5 years on a single four-core 4 GHz CPU and 9,583,696.6 years on 1024 parallel four-core 4 GHz CPUs.

The new attack scenario S3 calls for a well-chosen function g such that $f(s) \cdot g(s)$ is of low degree. Note that Toyocrypt's f function has one term of degree 63, one term of degree 17, and one term of degree 4 (disregarding the low degree terms). In addition, each of the three high degree terms is divisible by both $s_{23}$ and $s_{42}$.

Fact: In Boolean algebra, multiplication is analogous to the and operation, whose truth table is given below. In short, the and operation works just like multiplication over the reals because the only two elements are 0 and 1. Note that any variable squared reduces to that variable.

Truth Table for Boolean and
a   b   a and b
1   1   1
1   0   0
0   1   0
0   0   0

$1^2 = 1 \cdot 1 = 1$
$0^2 = 0 \cdot 0 = 0$

It follows that if $g'$ is a common factor of the high degree terms of f, multiplying f by the function $g(s) = g'(s) - 1$ produces a lower degree function $f(s) \cdot g(s)$. (Over GF(2), subtraction and addition coincide, so $g' - 1 = g' + 1$.) This is because the terms that share the common factor $g'$ are unchanged when multiplied by $g'$, since $g'^2 = g' \cdot g' = g'$, and are then subtracted off. Starting with the general equation $f(s) = z_t$, multiply both sides by $g(s) = (s_{23} - 1)$ to produce $f(s)s_{23} - f(s) = z_t(s_{23} - 1)$. In this way, every monomial divisible by $s_{23}$ will cancel out, leaving an equation of degree 3 true with probability 1.
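
The cancellation can be watched on a scaled-down function. The sketch below is not Toyocrypt's f; it assumes a hypothetical 8-variable function with the same shape (one linear term, one quadratic term, and two high degree terms sharing the factor $s_3$). Polynomials over GF(2) are represented as sets of monomials, each monomial being the set of its variable indices, so that $x^2 = x$ holds automatically.

```python
def mono(*indices):
    """A monomial is the set of its variable indices; mono() is the constant 1."""
    return frozenset(indices)


def multiply(p, q):
    """Product of two GF(2) polynomials with x*x = x.

    Multiplying monomials is set union; adding polynomials is symmetric
    difference, because equal monomials cancel modulo 2.
    """
    result = set()
    for a in p:
        for b in q:
            result ^= {a | b}
    return result


def degree(p):
    return max((len(m) for m in p), default=0)


# Hypothetical toy filter: s7 + s0*s4 + s1*s2*s3*s5 + s0*s1*s2*s3*s6.
f = {mono(7), mono(0, 4), mono(1, 2, 3, 5), mono(0, 1, 2, 3, 6)}

# g = s3 + 1, which equals s3 - 1 over GF(2).
g = {mono(3), mono()}

fg = multiply(f, g)
print("deg f =", degree(f), " deg f*g =", degree(fg))
```

Both high degree terms contain $s_3$, so multiplying them by $s_3$ leaves them unchanged and they cancel against their own copies from $f \cdot 1$. Only the low degree part, multiplied by $(s_3 + 1)$, survives.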
For Toyocrypt, multiplying f by $s_{23}$ leaves the three high degree terms unchanged, because $s_{23} s_{23} = s_{23}$:

$$f(s_0, \ldots, s_{127}) \cdot s_{23} = s_{127} s_{23} + s_{23} \cdot \sum_{i=0}^{62} s_i s_{\alpha_i} + s_{10} s_{23} s_{32} s_{42} + s_1 s_2 s_9 s_{12} s_{18} s_{20} s_{23} s_{25} s_{26} s_{28} s_{33} s_{38} s_{41} s_{42} s_{51} s_{53} s_{59} + \prod_{i=0}^{62} s_i$$

Subtracting f then cancels those three terms and leaves only the low degree part:

$$f(s_0, \ldots, s_{127}) \cdot s_{23} - f(s) = (s_{23} - 1) \cdot \left( s_{127} + \sum_{i=0}^{62} s_i s_{\alpha_i} \right)$$

Similarly, multiplying both sides by $(s_{42} - 1)$ produces another equation of degree 3. Therefore, each of the m known keystream bits produces two low degree multivariate nonlinear equations; k = 2. With d = 3, $N \approx \binom{128}{3} \approx 2^{18.38}$. Finally,

$$m \geq \frac{N}{k} = \frac{2^{18.38}}{2} = 2^{17.38}$$

This corresponds to about 20 kilobytes of keystream. Finally, assuming a proper implementation of Strassen's algorithm for Gaussian reduction (not important here) that runs 64 operations per CPU clock, the number of CPU clocks necessary is $\frac{7}{64} \cdot N^{\log_2 7} \approx 2^{49}$. Again assuming a single four-core 4 GHz CPU, the new approach requires 9.77 hours. 1024 parallel four-core 4 GHz CPUs require about 34 seconds. The low number of required keystream bits m, and the elimination of the constraint that they be consecutive, make this attack much more attractive and practical than any previous efforts.

Generalization

As noted before, the more g functions that can be found for one f, the fewer known keystream bits are required to perform the attack. The attack on Toyocrypt, and other stream ciphers like it, relies on the idea of factoring the high degree portions of the f function to reduce the degree. The structure in Toyocrypt was simple enough to notice by inspection. However, a method of systematizing the search for such g is useful. The following is from section 6 in [5].

Consider a stream cipher with n state bits whose f function only takes as input a small subset of k of those n bits. In other notation, $\{x_1, x_2, \ldots, x_k\} \subset \{s_0, s_1, \ldots, s_{n-1}\}$. Construct the multiset $C = A \cup B$ as follows.

A is the set of all possible monomials up to degree d:

$$A = \{1, x_1, x_2, \ldots, x_k, x_1 x_2, \ldots\}$$

B is the set of all products of f with elements of A:

$$B = \{f(x), f(x) \cdot x_1, f(x) \cdot x_2, \ldots, f(x) \cdot x_k, f(x) \cdot x_1 x_2, \ldots\}$$

Fact: A set of multivariate polynomials on k variables over GF(2) cannot span a space of dimension greater than $2^k$.

The set C is simply a set of multivariate polynomials in the $x_i$. Therefore, if C contains more than $2^k$ elements, some linear dependencies exist and some combinations of the elements will produce the desired polynomials g. Luckily, this is always the case.

Theorem 2 (Courtois and Meier): Let f be any Boolean function $f : GF(2)^k \to GF(2)$. Then there is a Boolean function $g \neq 0$ of degree at most $\lceil k/2 \rceil$ such that $f(x) \cdot g(x)$ is of degree at most $\lfloor k/2 \rfloor$.

Proof: We wish to show that $|C| > 2^k$. If we let A include all monomials with degree up to $\lfloor k/2 \rfloor$ and, for B, multiply f by all the monomials with degree up to $\lceil k/2 \rceil$, then:

$$|C| = |A| + |B| = \sum_{i=0}^{\lfloor k/2 \rfloor} \binom{k}{i} + \sum_{i=0}^{\lceil k/2 \rceil} \binom{k}{i} = \sum_{i=0}^{k} \binom{k}{i} + \binom{k}{\lfloor k/2 \rfloor} > 2^k$$

The third equality uses the symmetry $\binom{k}{i} = \binom{k}{k-i}$, and the final inequality holds because $\sum_{i=0}^{k} \binom{k}{i} = 2^k$. There are no linear dependencies in A alone, which means every linear dependency involves at least one element of B (either within B itself or between A and B), which ensures $g \neq 0$.

Consequences and Conclusions

Cryptanalysis and cryptosystem design are two sides of the same coin. The former imposes requirements on the latter, which in turn creates new avenues for attack in a cyclic process. In this case, the new attack scenario presented has several implications for the design of synchronous stream ciphers, in particular the keystream function.

In principle, algebraic attacks are possible whenever a Boolean function is used to generate the keystream, regardless of the method of input (irregularly clocked LFSR, multiple LFSRs, alternatives to LFSRs). However, LFSRs still offer a simple, effective way of generating a keystream. Designers must be careful, however, to ensure that 'good' Boolean combining functions are truly good.
For one, the keystream function should use a large subset of the n state bits. In other words, k should be large. More precisely, if there exists any function g such that $f \cdot g$ is of degree less than d (the degree of f), the degree of $f \cdot g$ should still be at least 6 when n = 128. This corresponds to an attack in about $2^{88}$ operations, which is currently considered safe. Whatever minimum degree we specify, Courtois and Meier's theorem implies that k should be at least double that.

Secondly, the attack on Toyocrypt implies that the function should have many varied terms of high degree to avoid common factors. Toyocrypt does have a large k value, but the small number of high degree terms and the common factors between them allow for a factoring approach that cuts the degree of the keystream equations from 63 to 3.

In addition, some stream ciphers attempt to add confusion to the encryption by using multiple filtering functions $f_i$. This scenario is analogous to the single f scenarios discussed in this paper when the $f, f \circ L, f \circ L^2, \ldots$ are considered multiple different functions.

In general, the condition imposed by this algebraic approach is that there should be no non-trivial low degree relationships between the key bits and the keystream bits.

References

1. Douglas R. Stinson: Cryptography: Theory and Practice (3rd edition), Chapman & Hall/CRC (2006), pp. 1.
2. Auguste Kerckhoffs: La cryptographie militaire, ou, Des chiffres usités en temps de guerre, Paris (1883), pp. 8. Archived copy accessed on Google Play Books and translated by Google Translate.
3. Claude E. Shannon: Communication Theory of Secrecy Systems, Bell System Technical Journal 28 (1949), pp. 656-715.
4. Alexander Maximov: Some Words on Cryptanalysis of Stream Ciphers, PhD thesis, Lund University (2006).
5. Nicolas Courtois and Willi Meier: Algebraic Attacks on Stream Ciphers with Linear Feedback, Eurocrypt 2003, Warsaw, Poland, LNCS 2656, pp. 345-359, Springer. An extended version is available at http://www.minrank.org/toyolili.pdf