Stream Ciphers and Algebraic Attack Methods MIDN 1/C Jake Felton

United States Naval Academy
Introduction
This paper is a Capstone effort in cryptography (more particularly, cryptanalysis) at the
United States Naval Academy. The intended audience, however, is much larger. I have
endeavored to make the paper as accessible yet streamlined as possible. As a result, the heart of
the paper – algebraic attack methods on stream ciphers based on linear feedback shift registers –
is well motivated by definitions and mathematical context. A familiarity with cryptography is
helpful but not necessary, so long as the reader is sufficiently fluent in mathematics generally.
Stream ciphers take a back seat to block ciphers in many of the most common usage
scenarios, such as electronic data transmission over the internet. However, stream ciphers
remain an important part of modern cryptography, particularly in low-power environments.
Stream ciphers are often used in radio frequency identification (RFID) and mobile
communication for their efficiency in hardware, simplicity, and ability to manipulate messages
of any size rather than discrete blocks. The ‘traditional’ stream cipher is based on a block of
memory called a linear feedback shift register, along with a nonlinear keystream function. As a
result, the attack method presented in this paper is applicable to a large subset of stream ciphers.
When important terms are introduced, they are written in italics until they have been
properly defined. Afterwards they are written normally.
Basic Definitions
A cryptosystem is simply a collection of five objects – three sets or spaces and two rules
(or functions). The three sets, taken from some alphabet 𝒜, are the plaintext set 𝒫, the
ciphertext set 𝒞, and the keyspace 𝒦. The following definition based on that in [1] formalizes
this notion. Note that all other definitions in this section, unless otherwise indicated, are based
on those found in [4].
Definition 1 (cryptosystem)
An alphabet 𝒜 is a set of symbols.
The finite set 𝒫 ⊆ 𝒜 is the set of all possible messages, called plaintexts.
The finite set 𝒞 ⊆ 𝒜 is the set of all possible encodings, called ciphertexts.
The finite set 𝒦 ⊆ 𝒜 is the set of all possible keys, called the keyspace.
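For every key $K \in \mathcal{K}$, there is an encryption rule $e_K \in \mathcal{E}$ and a corresponding decryption rule
$d_K \in \mathcal{D}$ such that:
• the function $e_K : \mathcal{P} \to \mathcal{C}$
• the function $d_K : \mathcal{C} \to \mathcal{P}$
• $d_K(e_K(x)) = x$ for every plaintext $x \in \mathcal{P}$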
The fundamental purpose of any cryptosystem is to enable secure, accurate, and
authenticable communication between two parties, called Alice and Bob. The assumption, of
course, is that there will always be an eavesdropper Eve attempting to compromise some element
of that security. To assume to the contrary would defeat the purpose of secret communication.
Eve has significant implications for the study of cryptosystems. She highlights the important
relationship between the arts of making ciphers and breaking ciphers. This dynamic will
resurface later in the discussion of the attack, in the form of consequences that inform further
design. For now, consider some famous notions of security that help illuminate the notion of a
secure cryptosystem.
Definition 2 (Kerckhoffs’s Principle): In 1883 Auguste Kerckhoffs formulated six design
principles for military ciphers in [2]. Perhaps the most notable of these stipulates that the cipher
“must not require secrecy, and that it can conveniently fall into the hands of the enemy”. In
other words, cryptographers must assume that every object of the cipher except the secret key is
public. This implies that the security of a cryptosystem must rely solely on the strength of the
secret key. For ciphers that operate over the finite field GF(2), and with key size n, the
‘strength’ of the secret key is often measured in terms of the size of the keyspace, $2^n$. Indeed,
this value is sometimes called the security parameter of the cipher.
Definition 3 (Perfect Secrecy): In 1949 Claude Shannon published a landmark paper that
introduced many of the notions of secrecy currently used. Among these are the principles of
confusion and diffusion, most often realized as some type of substitution and permutation in
modern ciphers. In addition, he provided the following theorem defining perfect secrecy, taken
directly from [3]:
Theorem 1 (Shannon): A necessary and sufficient condition for perfect secrecy is that
$$P\{x \mid y\} = P\{x\}$$
for all 𝑥 ∈ 𝒫 and 𝑦 ∈ 𝒞.
In other words, the probability of guessing the plaintext that corresponds to any given ciphertext
is simply the probability of guessing the plaintext given nothing at all.
This is a useful idea, but does not guarantee a practical (or even secure) cipher. For
instance, the One-Time Pad (OTP) is a tantalizingly simple cipher that achieves perfect secrecy.
It requires Alice and Bob to agree on a secret key K that is the same length as the plaintext x they
wish to encrypt (normally in GF(2)). The key K is then added to x bit-wise to produce the
ciphertext y. However, Alice and Bob must find a way to secretly agree on a K that is the same
length as the plaintext they wish to communicate secretly. This defeats the purpose of the OTP
in almost all applications.
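The bit-wise addition described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample message are assumptions made for the example, and the key is drawn with the standard library's `secrets` module.

```python
import secrets

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """XOR each plaintext byte with the key byte at the same position."""
    if len(key) != len(plaintext):
        raise ValueError("the one-time pad key must be as long as the message")
    return bytes(p ^ k for p, k in zip(plaintext, key))

# Addition over GF(2) is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt

message = b"ATTACK AT DAWN"                 # hypothetical sample plaintext
key = secrets.token_bytes(len(message))     # fresh random key, to be used once
ciphertext = otp_encrypt(message, key)
recovered = otp_decrypt(ciphertext, key)
```

The length check makes the practical obstacle visible: the key that must be shared in secret is exactly as large as the message itself.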
A better idea is to reframe the condition of perfect secrecy in a more pragmatic way
called conditional security, thereby relaxing the design criteria of ciphers.
Definition 4 (Conditional Security): A system is called conditionally secure if it can be broken
in principle, but requires more computing power than a realistic adversary would have. In this
case its security is measured via complexity theory.
Recall that Kerckhoffs’s principle states that the security of a cryptosystem relies entirely
on the strength of the secret key. We can say, then, that a cryptosystem is conditionally secure if
the plaintext cannot be recovered from the ciphertext at any less cost than searching every key
𝐾 ∈ 𝒦. We call this attack a brute force or exhaustive key search attack. This provides a
benchmark [conditional] value of security for designers of cryptosystems and implies one
obvious condition for security: the keyspace 𝒦 must be large (recall the security parameter $2^n$
for ciphers over GF(2)). Modern computing can perform a brute force attack on keyspaces of
size up to about $2^{64}$.
Just as conditional security is to perfect secrecy, so are stream ciphers to the OTP.
Stream ciphers capture the spirit of the OTP in a more practical way.
Definition 5 (Stream Cipher): A stream cipher is a key-dependent algorithm with internal
memory that receives symbols of a message x one-by-one over the alphabet 𝒜, and in parallel
produces the ciphertext y over the same alphabet, perhaps with some delay.
The internal memory refers to an internal state described in more detail later, which
differentiates stream ciphers from block ciphers according to this definition.
Before providing a rigorous treatment of stream ciphers, it is helpful to consider where
they live in the taxonomy of cryptographic primitives.
Definition 6 (Cryptographic primitive): A cryptographic primitive is a fundamental (primitive)
component of cryptographic systems. For instance, one cryptographic system may be made of a
public-key cipher in which Alice and Bob privately agree on a secret key to be used in a
symmetric-key cipher for communication, to be signed to guarantee authenticity. The primitives
are the elements that make up the system as a whole.
The following diagram is based on one in [4].
[Figure 1: Taxonomy of Cryptographic Primitives. Cryptographic primitives divide into unkeyed
primitives (hash functions, random sequences), symmetric-key primitives (symmetric-key ciphers,
comprising block ciphers and stream ciphers, and pseudo-random sequences), and public-key
primitives (public-key ciphers, signatures).]
Structure of Stream Ciphers
According to Definition 5, a stream cipher encrypts elements of the plaintext one-by-one.
In order to do so, the small key K must somehow generate a long keystream and combine it with
the plaintext.
Definition 7 (Keystream, Keystream Generator): The keystream generated by secret key K over
some alphabet 𝒜 is a sequence of symbols from the same alphabet 𝒜, denoted
$$z^n = z_1, z_2, \ldots, z_n, \qquad z_i \in \mathcal{A}, \quad i = 1, 2, \ldots, n$$
The object that produces the keystream is called the keystream generator (KSG). Usually (and
for our purposes), $\mathcal{A} \subseteq GF(2^n)$ and $z_i \in GF(2)$. In other words, the keystream is a sequence
of 0’s and 1’s of length n, the length of the plaintext. The usual operation of combining plaintext
and keystream bits is the exclusive or (XOR), or addition modulo 2, denoted ⊕.
Definition 8 (Synchronous Stream Cipher): The simplest class of stream cipher is the
synchronous stream cipher. It is so named because it requires that Alice and Bob keep their
encryption and decryption operations synchronized on the same time t. It also means that, by
itself, the SSC has no tolerance for error. According to [4] while adopting the notation in [5], an
SSC is built from:
• Internal state (IS), $s_t$, denotes the value of the internal state at time t.
• Update function (UF), L, (also called the connection function) takes as input the state
(and perhaps key) at time t and produces as output the state at time t+1. So the update
function updates the IS with each iteration. At time t = 0, 1, …
$$L(s_t, K) = s_{t+1}$$
• Keystream function, f, (also called a nonlinear filter) takes as input the state $s_t$ (and
perhaps key) and produces as output the keystream element $z_t$. In other words, f uses the
internal state at time t to produce one keystream bit $z_t$. At time t = 0, 1, …
$$f(s_t, K) = z_t$$
• Output function, h, is the function that combines the keystream and the plaintext,
resulting in the ciphertext. Again, h is normally the XOR. At time t = 0, 1, …
$$h(x_t, z_t) = c_t$$
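The four components above fit together as in the following minimal Python sketch. The 8-bit state, `toy_update`, and `toy_filter` are insecure stand-ins invented for this illustration, and the key is assumed to enter only through the initial state.

```python
from typing import Callable, Iterable, List

def ssc_process(bits: Iterable[int], state: int,
                update: Callable[[int], int],
                keystream_fn: Callable[[int], int]) -> List[int]:
    """Encrypt or decrypt a bit sequence with h = XOR.

    At each time t: z_t = f(s_t), c_t = x_t XOR z_t, s_{t+1} = L(s_t).
    """
    out = []
    for x in bits:
        z = keystream_fn(state)   # keystream function f
        out.append(x ^ z)         # output function h (XOR)
        state = update(state)     # update function L
    return out

def toy_update(s: int) -> int:
    """Hypothetical update function on an 8-bit state (illustration only)."""
    feedback = ((s >> 7) ^ (s >> 5)) & 1
    return ((s << 1) | feedback) & 0xFF

def toy_filter(s: int) -> int:
    """Hypothetical nonlinear filter: s_0 * s_3 + s_6 over GF(2)."""
    return ((s & 1) & ((s >> 3) & 1)) ^ ((s >> 6) & 1)

plaintext = [1, 0, 1, 1, 0, 0, 1, 0]
key_state = 0b10110011            # shared secret initial state (assumed)
ciphertext = ssc_process(plaintext, key_state, toy_update, toy_filter)
decrypted = ssc_process(ciphertext, key_state, toy_update, toy_filter)
```

Decryption reuses the same routine because both parties regenerate the same keystream, which is why they must stay synchronized on the same time t.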
The goal of any stream cipher is to produce as random a keystream as possible in order to
mimic the security of the OTP. However, since truly random sequences are inherently difficult
to obtain, pseudo-random sequences are used instead. Pseudo-random sequences are generated
using pseudo-random number generators (PRNGs), which have a wide range of applications. In
essence, the stream cipher is actually just a PRNG with a combining operation.
Definition 9 (Pseudo-random number generator): A PRNG is a deterministic algorithm that
produces samples from some alphabet 𝒜 that look independent and uniformly distributed. A
PRNG uses a seed as its initialization parameter, and always produces the same sequence of
numbers given the same seed.
The PRNG drastically increases the practicality and scalability of stream ciphers because it reduces
the key to a manageable size. Rather than attempt to secretly transmit a truly random sequence
that is as long as the message, Alice and Bob can simply use any popular public-key
cryptographic primitive to agree on a seed of n bits, from which the PRNG produces a
pseudo-random keystream with large period.
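The determinism in Definition 9 is easy to observe with Python's standard `random` module. This is only a sketch of the "same seed, same sequence" property: that module is not cryptographically secure, and the function name and seed value are assumptions for the example.

```python
import random

def keystream_bits(seed: int, n: int) -> list:
    """Return n pseudo-random bits from a generator initialised with `seed`."""
    rng = random.Random(seed)      # deterministic: same seed, same sequence
    return [rng.getrandbits(1) for _ in range(n)]

# Alice and Bob share only the short seed, yet derive identical keystreams.
alice = keystream_bits(seed=2024, n=16)
bob = keystream_bits(seed=2024, n=16)
```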
Boolean Functions
Recall that in most cases, the alphabet for stream ciphers is over GF(2) and the keystream
bits are in GF(2). The keystream function, then, is a Boolean Function.
Definition 10 (Boolean Function (BF), Degree, Linear and Affine):

• A Boolean Function f maps a binary string of length n to one binary variable.
$$f(x_1, x_2, \ldots, x_n) = y, \qquad y \in GF(2)$$
One important property of BFs is that each BF has a unique representation as a
multivariate polynomial over GF(2), called the algebraic normal form (ANF). In other words,
the keystream function f is a multivariate polynomial.
• The algebraic degree of a function, d, is the number of variables in the highest order term
with non-zero coefficient.
o Ex: $\deg(x^2y^2 + xy^2 + x^3) = 4$ in ℝ
o Ex: $\deg(s_1 + s_2 + s_1 s_2 + s_1 s_2 s_3 + s_4 + s_2 s_4) = 3$ in GF(2)
• A BF is affine if there exists no term of degree greater than 1 in the ANF.
• An affine function with zero constant term is a linear function.
The concepts of nonlinear, low-degree Boolean functions are at the heart of this paper.
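To make the ANF and the algebraic degree concrete, the sketch below recovers the ANF of the second example above from its truth table using the binary Möbius transform, a standard technique that is not described in the source. The function names and the encoding of monomials as bit masks are assumptions of the example.

```python
def anf_coefficients(truth_table):
    """Binary Moebius transform: truth table -> ANF coefficients.

    Index m encodes a monomial: bit j of m set means variable s_{j+1} appears.
    """
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1          # number of variables
    for j in range(n):
        for m in range(len(coeffs)):
            if m & (1 << j):
                coeffs[m] ^= coeffs[m ^ (1 << j)]
    return coeffs

def algebraic_degree(coeffs):
    """Largest number of variables in any monomial with coefficient 1."""
    return max((bin(m).count("1") for m, c in enumerate(coeffs) if c), default=0)

def example_f(x):
    """s1 + s2 + s1*s2 + s1*s2*s3 + s4 + s2*s4 over GF(2); bit j of x is s_{j+1}."""
    s1, s2, s3, s4 = ((x >> j) & 1 for j in range(4))
    return s1 ^ s2 ^ (s1 & s2) ^ (s1 & s2 & s3) ^ s4 ^ (s2 & s4)

table = [example_f(x) for x in range(16)]
coeffs = anf_coefficients(table)
monomials = [m for m, c in enumerate(coeffs) if c]   # bit masks of the ANF terms
degree = algebraic_degree(coeffs)
```

Because the ANF is unique, the transform returns exactly the six monomials used to define `example_f`, and the largest of them, $s_1 s_2 s_3$, gives the degree.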
Linear Feedback Shift Registers
Perhaps the most common type of PRNG used in stream ciphers is the linear feedback
shift register (LFSR). An LFSR of length l is a collection of l memory cells which are each filled
with an element of GF(2) (usually). The state of the LFSR at time t is simply the contents of the
l memory cells.
Figure 2: General LFSR Structure
The LFSR updates by running the current state s through a linear recurrence relation
$$s_{t+l} = c_l \cdot \sum_{i=0}^{l-1} c_i \cdot s_{t+i} \quad \text{over } GF(2)$$
The constants $c_0, c_1, \ldots, c_l$ are called the feedback coefficients, the connection at $c_l$ is called the
feedback position, and the connections at $c_0, c_1, \ldots, c_{l-1}$ are called the tap positions. The output
$s_{t+l}$ moves onto one end of the LFSR, pushing all other elements over and outputting one element
out the opposite side.
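A short Python sketch of this update rule follows. It assumes the feedback coefficient is 1, stores the state as the list $(s_t, \ldots, s_{t+l-1})$, and uses a hypothetical length-4 register with recurrence $s_{t+4} = s_t + s_{t+1}$ chosen only for illustration.

```python
def lfsr_step(state, coeffs):
    """One update: compute the new bit as the GF(2) sum of c_i * s_{t+i},
    shift it in at one end, and output the bit that falls off the other end."""
    feedback = 0
    for c, s in zip(coeffs, state):
        feedback ^= c & s
    return state[1:] + [feedback], state[0]

def lfsr_run(state, coeffs, steps):
    """Clock the register `steps` times; return the final state and the output bits."""
    outputs = []
    for _ in range(steps):
        state, out = lfsr_step(state, coeffs)
        outputs.append(out)
    return state, outputs

coeffs = [1, 1, 0, 0]     # tap coefficients c_0..c_3 (assumed example)
initial = [1, 0, 0, 0]    # any non-zero starting state
final_state, outputs = lfsr_run(initial, coeffs, 15)
```

For this particular recurrence the register returns to its starting state after 15 steps, the longest period possible for four cells, since the all-zero state can never be left.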
LFSRs are attractive because they are easy to implement, have large period, and look
statistically random. However, since the recurrence relation is linear, the output sequence has low
linear complexity, defined as the length of the shortest LFSR able to produce a given sequence. If
we take the sequence of outputs as the keystream, the keystream will be susceptible to linear
attacks. We can avoid this problem by introducing a non-linear component between the LFSR
and the keystream. This component is the keystream function f, which as we learned before is a
function on the state of the LFSR.
The Problem
Consider an LFSR-based cipher with key in $GF(2^n)$. We have an internal state $s_t$ at time t,
a state update function L, and a multivariate Boolean keystream function f (the nonlinear filter
function). The update function L is linear by definition of the LFSR, and the keystream function f
does not depend on the secret key K, but only on the state of the LFSR. In the spirit of Kerckhoffs,
assume both L and f are public and only the state is secret. All of the following content is
described in [5].
Let $(k_0, k_1, \ldots, k_{n-1})$ be the initial state at time t = 0, where $(k_0, k_1, \ldots, k_{n-1}) = K$, the
secret key of length n. Then the ‘zeroth’ keystream bit is:
$$z_0 = f(k_0, k_1, \ldots, k_{n-1})$$
At time t = 1, the state is given by $L(k_0, k_1, \ldots, k_{n-1})$ so the keystream bit is given by:
$$z_1 = f(L(k_0, k_1, \ldots, k_{n-1}))$$
Repeating the process, the keystream is generated as follows:
$$\begin{aligned}
z_0 &= f(k_0, k_1, \ldots, k_{n-1}) \\
z_1 &= f(L(k_0, k_1, \ldots, k_{n-1})) \\
z_2 &= f(L^2(k_0, k_1, \ldots, k_{n-1})) \\
&\;\;\vdots \\
z_t &= f(L^t(k_0, k_1, \ldots, k_{n-1}))
\end{aligned}$$
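One way to see why these equations stay manageable is to write the linear update as a matrix. The block below is a sketch under two assumptions not stated in the text: the feedback coefficient equals 1, and the state is written as a column vector $\mathbf{s}_t$.

```latex
% Assumptions: feedback coefficient equal to 1, and state written as the
% column vector \mathbf{s}_t = (s_t, s_{t+1}, \ldots, s_{t+n-1})^{T}.
\[
M =
\begin{pmatrix}
0      & 1      & 0      & \cdots & 0       \\
0      & 0      & 1      & \cdots & 0       \\
\vdots &        &        & \ddots & \vdots  \\
0      & 0      & 0      & \cdots & 1       \\
c_0    & c_1    & c_2    & \cdots & c_{n-1}
\end{pmatrix},
\qquad
L(\mathbf{s}_t) = M\,\mathbf{s}_t = \mathbf{s}_{t+1},
\qquad
z_t = f\!\left(M^{t} K\right).
\]
```

Every entry of $M^t K$ is a fixed GF(2)-linear combination of the key bits, so substituting it into f cannot raise the degree: each keystream equation has degree at most deg f in the unknowns $k_0, \ldots, k_{n-1}$, no matter how large t is.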
*Goal: Recover the initial state, and therefore the key K, given a subset of keystream bits $z_i$.*
Once K is known, Eve can use it to generate the entire keystream using the update and
keystream functions. At that point she can decrypt just like Bob. The obvious first idea is to
solve the system of multivariate equations using an existing technique such as linearization or
the XL algorithm discussed in [5].
Question 1: The number N of monomials of degree less than or equal to d is given by
$$N = \sum_{i=1}^{d} \binom{l}{i}$$
Suppose l = 128 and d = 6. Then
$$N = \sum_{i=1}^{6} \binom{128}{i} \approx \binom{128}{6} \approx 2^{32.337}$$
Using linearization and Gaussian elimination on a problem this size proves intractable. How can
we solve the system of multivariate polynomials if f is of high degree?
Idea 1: Suppose d = 3. Then
$$N = \sum_{i=1}^{3} \binom{128}{i} \approx \binom{128}{3} \approx 2^{18.381}$$
If we can reduce the degree of the keystream function, then solving the system becomes feasible
and the cipher is no longer conditionally secure.
Find a low degree function that approximates f with probability close to 1. Then for ‘most’ of
the known keystream bits the approximating function produces a low degree equation that holds.
The system of equations can now be solved.
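The two monomial counts can be checked directly. The following Python sketch evaluates the sum for l = 128 with d = 6 and d = 3; the function name is an assumption of the example.

```python
from math import comb, log2

def monomial_count(l: int, d: int) -> int:
    """Number of monomials of degree 1..d in l variables over GF(2)."""
    return sum(comb(l, i) for i in range(1, d + 1))

for d in (6, 3):
    exact = monomial_count(128, d)
    leading = comb(128, d)          # the dominant term of the sum
    print(f"d={d}: N = 2^{log2(exact):.3f}, leading term = 2^{log2(leading):.3f}")
```

The leading binomial dominates the sum in both cases, which is why the text approximates N by a single binomial coefficient.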
Question 2: What if no good low degree approximation of f exists?
Idea 2: The subject of this paper. Reduce the degree of f by multiplying it by a well-chosen
multivariate polynomial. Then the system holds with probability 1 and we can solve the
overdefined system of low degree equations. The term overdefined simply means that there are
more equations than unknowns, or the number of equations > 𝑛, the size of the key.
The Algebraic Attack
The Problem calls for a known subset of keystream bits $z_i$ and their positions in the
keystream. The positions provide the exponents i of $L^i$, eliminating the requirement that the
known keystream bits be consecutive. An outline of the attack follows. Note that the content of
the following sections is based almost entirely on [5], and thus any quotes are from [5] unless
specified otherwise.
0. Let s denote the state at any time. Then every known keystream bit $z_i$ produces an
equation $f(s) = z_i$.
1. Multiply $f(s) = z_i$ by a well-chosen multivariate polynomial g(s), such that the product
f(s)∙g(s) has degree d significantly lower than the degree of f.
2. Set up a system of equations $f(s)g(s) = z_i g(s)$ for all known keystream bits $z_i$. The system
of equations is on the initial state bits $k_i$, which make up the secret key K.
3. If sufficiently many keystream bits are known, the system is highly overdefined and can
be solved efficiently.
The methods for solving the system of multivariate equations are discussed only cursorily
here. The big mystery, then, is the polynomial g. The rest of the paper will explore how to
choose g and demonstrate the attack on a few examples.
Assumptions
• Only synchronous stream ciphers are discussed (i.e., stream ciphers “in which each state
is generated from the previous state independently of the plaintext”)
• Assume a binary stream cipher (i.e., the alphabet is over GF(2))
• The update function (connection function) L is linear over GF(2) and public
• The keystream function (nonlinear filter) f is public and not a function of K
• Only the state is secret (Recall: the goal is to find the initial state, which is K)
Scenarios
Breaking the problem down into different scenarios helps provide direction on which
attack method to choose. The following scenarios, introduced in [5], focus on the characteristics
of the keystream function f, specifically the degree d. They reprise the logical progression given
by the questions and ideas in the ‘Problem’ section above.
Scenario 1 (S1): either the keystream function f has low algebraic degree d
Scenario 2 (S2): or f can be approximated by a low degree function with probability close to 1.
The new idea, given as S3 below, calls for multiplying f by a well-chosen function g to produce a
function f g of low degree. It relaxes the condition in S1 that f be of low degree and provides an
alternative to S2 using equations that are true with probability 1.
Scenario 3 (S3): there exists some non-zero multivariate polynomial g such that f g is of low
degree d. Note that S3 is actually a generalization of S1, where g = 1.
It is also important to note that each of these scenarios provides a route of attack that
must be taken into account with future stream cipher designs. The implications of these
strategies are discussed briefly after the example attacks.
The Problem Redefined
Recall that the strategy to recover the key K, which is the initial state of the LFSR at time
t = 0, is to create a system of multivariate nonlinear equations
$$\begin{aligned}
z_0 &= f(k_0, k_1, \ldots, k_{n-1}) \\
z_1 &= f(L(k_0, k_1, \ldots, k_{n-1})) \\
z_2 &= f(L^2(k_0, k_1, \ldots, k_{n-1})) \\
&\;\;\vdots \\
z_t &= f(L^t(k_0, k_1, \ldots, k_{n-1}))
\end{aligned}$$
where each known keystream bit $z_i$ produces one equation. Now, however, multiply both sides
of each equation by the well-chosen g to produce
$$\begin{aligned}
z_0 \cdot g(k_0, k_1, \ldots, k_{n-1}) &= g(k_0, k_1, \ldots, k_{n-1}) \cdot f(k_0, k_1, \ldots, k_{n-1}) \\
z_1 \cdot g(L(k_0, k_1, \ldots, k_{n-1})) &= g(L(k_0, k_1, \ldots, k_{n-1})) \cdot f(L(k_0, k_1, \ldots, k_{n-1})) \\
z_2 \cdot g(L^2(k_0, k_1, \ldots, k_{n-1})) &= g(L^2(k_0, k_1, \ldots, k_{n-1})) \cdot f(L^2(k_0, k_1, \ldots, k_{n-1})) \\
&\;\;\vdots \\
z_t \cdot g(L^t(k_0, k_1, \ldots, k_{n-1})) &= g(L^t(k_0, k_1, \ldots, k_{n-1})) \cdot f(L^t(k_0, k_1, \ldots, k_{n-1}))
\end{aligned}$$
The problem reduces to three questions:
1. How do we solve the system once we have it?
2. How many known keystream bits are necessary?
3. How do we choose 𝑔?
Linearization
The simplest answer to the first question is the linearization technique. Recall that
Gaussian Elimination (or row reduction) is a simple and efficient method of solving systems of
linear equations. However, systems of nonlinear equations like those we have constructed
cannot be solved using Gaussian Elimination. The idea behind the linearization technique, as the
name implies, is to transform the system of nonlinear equations into a system of linear equations
in order to apply Gaussian elimination. To do so, simply replace every nonlinear monomial in
the equations with a new variable that is treated as linear. Then, solve the system using Gaussian
elimination.
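A small worked instance may help. The Python sketch below linearizes a hypothetical system of six quadratic equations in three unknowns over GF(2), built by hand so that $(x_1, x_2, x_3) = (1, 0, 1)$ is the solution, and then runs Gauss-Jordan elimination on the resulting linear system. The equations and function names are assumptions made for the example, and the solver does not check for inconsistent systems.

```python
def linearize(equations):
    """Give every monomial its own column; return (columns, matrix, rhs).

    An equation is (list_of_monomials, right_hand_side); a monomial is a
    tuple of variable indices, e.g. (1, 3) stands for x1*x3.
    """
    columns = sorted({m for monos, _ in equations for m in monos},
                     key=lambda m: (len(m), m))
    index = {m: j for j, m in enumerate(columns)}
    matrix = []
    for monos, _ in equations:
        row = [0] * len(columns)
        for m in monos:
            row[index[m]] ^= 1
        matrix.append(row)
    return columns, matrix, [b for _, b in equations]

def gauss_jordan_gf2(matrix, rhs):
    """Solve matrix * y = rhs over GF(2); return None if a column has no pivot."""
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    n = len(matrix[0])
    for col in range(n):
        pivot = next((r for r in range(col, len(aug)) if aug[r][col]), None)
        if pivot is None:
            return None               # not enough independent equations
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(len(aug)):
            if r != col and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]
    return [aug[j][n] for j in range(n)]

equations = [                          # hypothetical system, all sums over GF(2)
    ([(1,), (1, 2)], 1),               # x1 + x1*x2 = 1
    ([(2,), (1, 3)], 1),               # x2 + x1*x3 = 1
    ([(3,), (2, 3)], 1),               # x3 + x2*x3 = 1
    ([(1, 2), (1, 3)], 1),             # x1*x2 + x1*x3 = 1
    ([(1, 3), (2, 3)], 1),             # x1*x3 + x2*x3 = 1
    ([(1,), (1, 2), (2, 3)], 1),       # x1 + x1*x2 + x2*x3 = 1
]
columns, matrix, rhs = linearize(equations)
solution = dict(zip(columns, gauss_jordan_gf2(matrix, rhs)))
```

Three unknowns of degree at most 2 give six monomials, so six independent equations are needed here. This is the small-scale version of the requirement that the number of equations be at least the number of monomials.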
Recall also that Gaussian elimination requires at least as many equations as there are
variables. This fact motivates the answer to question two. Let m be the number of known
keystream bits and R the number of nonlinear equations of degree d (and on n variables) that the
m keystream bits produce. We wish to find a bound for m. Note that in the attack overview, 𝑅 =
𝑚. That is, each known keystream bit produces one nonlinear equation. However, finding
multiple – say, k – different 𝑔 functions for one f allows for each keystream bit to produce k
equations. In this case, 𝑅 = 𝑘𝑚.
As noted above, the number of monomials of degree ≤ d is $N \approx \binom{n}{d}$. So R must be ≥ N:
$$R = km \geq N \quad\Longrightarrow\quad m \geq \frac{N}{k} \approx \frac{\binom{n}{d}}{k}$$
The third question is the heart of this paper. It is the new idea that makes algebraic
attacks feasible and effective. How do we choose 𝑔? The answer is easiest to see using an
example.
Algebraic Attack on Toyocrypt
The stream cipher Toyocrypt – a submission to the Japanese Cryptrec call for
cryptographic primitives – fits the assumptions nicely. Toyocrypt uses one 128-bit LFSR with
an associated linear state update function L and nonlinear filter f. At the time of publication,
Toyocrypt was assumed secure against all known attacks, but the construction of the nonlinear
filter f proves susceptible to both S2 and S3. The nonlinear filter f is
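$$f(s_0, \ldots, s_{127}) = s_{127} + \sum_{i=0}^{62} s_i s_{\alpha_i} + s_{10} s_{23} s_{32} s_{42}
+ s_1 s_2 s_9 s_{12} s_{18} s_{20} s_{23} s_{25} s_{26} s_{28} s_{33} s_{38} s_{41} s_{42} s_{51} s_{53} s_{59} + \prod_{i=0}^{62} s_i$$
with $\{\alpha_0, \ldots, \alpha_{62}\}$ some permutation of the set {63, …, 125}.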
The previous attack following S2 approximates f with a multivariate function of
degree 4 that agrees with f with probability $1 - 2^{-17}$. However, the attack requires $2^{92}$ CPU
clocks, which takes about 9,813,705,283.5 years on a single four-core 4 GHz CPU and
9,583,696.6 years on 1024 parallel four-core 4 GHz CPUs.
The new attack scenario S3 calls for a well-chosen function g such that f(s) ∙ g(s) is of
low degree. Note that Toyocrypt’s f function has one term of degree 63, one term of degree 17, and
one term of degree 4 (disregarding the low degree terms). In addition, each of the three high degree
terms is divisible by $s_{23}$ and $s_{42}$.
Fact: In Boolean algebra, multiplication is analogous to the and operation, whose truth table is
given below. In short, the and operation works just like multiplication over the reals because the
only two elements are 0 and 1. Note that any variable squared reduces to that variable.
Truth Table for Boolean and
1 and 1    1
1 and 0    0
0 and 1    0
0 and 0    0
$1^2 = 1 \cdot 1 = 1$
$0^2 = 0 \cdot 0 = 0$
It follows that multiplying f by the function $g(s) = g'(s) - 1$ produces a lower degree
function f(s) ∙ g(s). This is because the terms that share the common factor g′ reduce back to
themselves and are subtracted off, since $g'^2 = g' \cdot g' = g'$.
Starting with the general equation $f(s) = z_t$, multiply both sides by $g(s) = (s_{23} - 1)$ to
produce $f(s)s_{23} - f(s) = z_t(s_{23} - 1)$. In this way, every monomial divisible by $s_{23}$ will
cancel out, leaving an equation of degree 3 true with probability 1.
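$$f(s_0, \ldots, s_{127}) \cdot s_{23} = s_{127} s_{23} + s_{23} \cdot \sum_{i=0}^{62} s_i s_{\alpha_i} + s_{10} s_{23} s_{23} s_{32} s_{42}
+ s_1 s_2 s_9 s_{12} s_{18} s_{20} s_{23} s_{23} s_{25} s_{26} s_{28} s_{33} s_{38} s_{41} s_{42} s_{51} s_{53} s_{59} + s_{23} \cdot \prod_{i=0}^{62} s_i$$
Since $s_{23}^2 = s_{23}$, the last three terms equal the three high degree terms of f and cancel in the
difference, leaving only terms of degree at most 3:
$$f(s_0, \ldots, s_{127}) \cdot s_{23} - f(s) = (s_{23} - 1) \left( s_{127} + \sum_{i=0}^{62} s_i s_{\alpha_i} \right)$$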
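The cancellation can be verified mechanically. The Python sketch below represents a polynomial over GF(2) as a set of monomials and multiplies Toyocrypt's f by $s_{23} + 1$ and by $s_{42} + 1$ (the same as subtracting 1 over GF(2)). The text does not give the permutation α, so the choice `ALPHA[i] = 63 + i` is a hypothetical placeholder; the helper names are also assumptions of the example.

```python
def mono(*indices):
    """A monomial is the frozenset of its variable indices; mono() is the constant 1."""
    return frozenset(indices)

def multiply(p, q):
    """Product of two GF(2) polynomials given as sets of monomials."""
    result = set()
    for a in p:
        for b in q:
            result ^= {a | b}   # s_i * s_i = s_i, and equal monomials cancel mod 2
    return result

def degree(p):
    return max((len(m) for m in p), default=0)

ALPHA = [63 + i for i in range(63)]   # HYPOTHETICAL permutation of {63, ..., 125}
HIGH17 = (1, 2, 9, 12, 18, 20, 23, 25, 26, 28, 33, 38, 41, 42, 51, 53, 59)

f = {mono(127)}
f ^= {mono(i, ALPHA[i]) for i in range(63)}                       # quadratic terms
f ^= {mono(10, 23, 32, 42), mono(*HIGH17), mono(*range(63))}      # high degree terms

g23 = {mono(23), mono()}              # s_23 + 1
g42 = {mono(42), mono()}              # s_42 + 1
fg23 = multiply(f, g23)
fg42 = multiply(f, g42)
print(degree(f), degree(fg23), degree(fg42))
```

The degree of each product does not depend on which permutation is used, because the quadratic terms can contribute at most degree 3 after multiplication by a single variable.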
Similarly, multiplying both sides by $(s_{42} - 1)$ produces another equation of degree 3.
Therefore, each of the m known keystream bits produces two low degree multivariate nonlinear
equations; k = 2. With d = 3, $N \approx \binom{128}{3} \approx 2^{18.38}$. Finally,
$$m \geq \frac{N}{k} = \frac{2^{18.38}}{2} = 2^{17.38}$$
This corresponds to about 20 kilobytes of known keystream. Assuming a proper implementation
of Strassen’s algorithm for Gaussian reduction (not important here) that runs 64 operations per
CPU clock, the number of CPU clocks necessary is $\frac{7}{64} \cdot N^{\log_2 7} \approx 2^{49}$. Again assuming a
single four-core 4 GHz CPU, the new approach requires 9.77 hours. 1024 parallel four-core 4
GHz CPUs require about 34 seconds. The low number of required keystream bits m, and the
elimination of the constraint that they be consecutive, make this attack much more attractive
and practical than any previous efforts.
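The running-time figures quoted for both attacks follow from simple arithmetic, reproduced in the Python sketch below. It assumes, as the text does, four cores at 4 GHz each executing one clock's worth of work per cycle, and it assumes a 365-day year, which is what matches the quoted number of years.

```python
from math import comb, log2

CLOCKS_PER_SECOND = 4 * 4e9            # four cores at 4 GHz, as assumed in the text
SECONDS_PER_YEAR = 365 * 24 * 3600     # assumption: 365-day year

def seconds(clocks: float, cpus: int = 1) -> float:
    """Wall-clock seconds needed for `clocks` CPU clocks on `cpus` identical CPUs."""
    return clocks / (CLOCKS_PER_SECOND * cpus)

N = comb(128, 3)                       # monomials of degree 3 (leading term only)
m = N / 2                              # k = 2 equations per keystream bit
keystream_kilobytes = m / 8 / 1024
strassen_clocks = 7 / 64 * N ** log2(7)    # the text rounds this estimate to 2^49

old_years = seconds(2 ** 92) / SECONDS_PER_YEAR
old_years_parallel = seconds(2 ** 92, cpus=1024) / SECONDS_PER_YEAR
new_hours = seconds(2 ** 49) / 3600
new_seconds_parallel = seconds(2 ** 49, cpus=1024)
```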
Generalization
As noted before, the more 𝑔 functions that can be found for one f, the fewer known
keystream bits required to perform the attack. The attack on Toyocrypt, and other stream ciphers
like it, relies on the idea of factoring the high-degree portions of the f function to reduce the
degree. The example using Toyocrypt was simple enough to notice by inspection. However, a
method of systematizing the search for these linear dependencies is useful. The following is
from section 6 in [5].
Consider a stream cipher with n state bits whose f function uses only a small subset of k of
those n bits. In other notation, $\{x_1, x_2, \ldots, x_k\} \subset \{s_0, s_1, \ldots, s_{n-1}\}$. Construct the
multiset $C = A \cup B$ as follows.
A is the set of all possible monomials up to degree d:
$$A = \{1, x_1, x_2, \ldots, x_k, x_1 x_2, \ldots\}$$
B is the set of all products of f with elements of A:
$$B = \{f(x), f(x) \cdot x_1, f(x) \cdot x_2, \ldots, f(x) \cdot x_k, f(x) \cdot x_1 x_2, \ldots\}$$
Fact: A set of multivariate polynomials on k variables cannot span a space of dimension greater
than $2^k$.
Every element of C is a multivariate polynomial in the $x_i$.
Therefore, if C contains more than $2^k$ elements, some linear dependencies exist and some
combinations of the elements will produce the desired polynomials g. Luckily, this is always the
case.
Theorem 2: (Courtois and Meier)
Let f be any Boolean function $f : GF(2)^k \to GF(2)$. Then there is a Boolean function
$g \neq 0$ of degree at most $\lceil k/2 \rceil$ such that $f(x) \cdot g(x)$ is of degree at most $\lfloor k/2 \rfloor$.
Proof: We wish to show that $|C| > 2^k$. If we let A include all monomials with degree up to
$\lfloor k/2 \rfloor$ and for B, multiply f by all the monomials with degree up to $\lceil k/2 \rceil$, then:
$$|C| = |A| + |B| = \sum_{i=0}^{\lfloor k/2 \rfloor} \binom{k}{i} + \sum_{i=0}^{\lceil k/2 \rceil} \binom{k}{i} = \sum_{i=0}^{k} \binom{k}{i} + \binom{k}{\lfloor k/2 \rfloor} > 2^k$$
There are no linear dependencies in A, which means all linear dependencies are in either B itself
or in both A and B, which ensures 𝑔 ≠ 0.
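For a small number of variables the dependency promised by the theorem can be found by brute force. The Python sketch below builds the truth tables of the elements of A and B, runs Gaussian elimination over GF(2) to find a combination that sums to zero, and reads off g (from the B part) and h = f·g (from the A part). The toy function and all names are assumptions of the example, and the approach is only practical for small k because every truth table has $2^k$ entries.

```python
from itertools import combinations

def monomials(k, max_deg):
    """All monomials on k variables of degree <= max_deg, as tuples of indices."""
    return [c for d in range(max_deg + 1) for c in combinations(range(k), d)]

def evaluate(poly, x):
    """Value at input x (bit i of x is variable i) of a list of monomials over GF(2)."""
    return sum(all((x >> i) & 1 for i in m) for m in poly) % 2

def table(f, m, k):
    """Truth table of f * m packed into an integer (bit x holds the value at x)."""
    return sum((f(x) & evaluate([m], x)) << x for x in range(2 ** k))

def find_dependency(vectors):
    """Indices of a non-empty subset of vectors that XOR to zero, or None."""
    basis = {}                                # top bit -> (vector, subset mask)
    for idx, v in enumerate(vectors):
        combo = 1 << idx
        while v:
            p = v.bit_length() - 1
            if p not in basis:
                basis[p] = (v, combo)
                break
            v ^= basis[p][0]
            combo ^= basis[p][1]
        else:                                 # v reduced to zero: dependency found
            return [i for i in range(len(vectors)) if (combo >> i) & 1]
    return None

def low_degree_multiple(f, k):
    """Return (g, h) as monomial lists with f*g = h, deg g <= ceil(k/2), deg h <= floor(k/2)."""
    A = monomials(k, k // 2)
    B = monomials(k, (k + 1) // 2)
    vectors = [table(lambda x: 1, m, k) for m in A] + [table(f, m, k) for m in B]
    dep = find_dependency(vectors)
    h = [A[i] for i in dep if i < len(A)]
    g = [B[i - len(A)] for i in dep if i >= len(A)]
    return g, h

def toy_f(x):
    """Hypothetical degree-4 function: x0*x1*x2*x3 + x0*x1*x2 + x3."""
    b = [(x >> i) & 1 for i in range(4)]
    return (b[0] & b[1] & b[2] & b[3]) ^ (b[0] & b[1] & b[2]) ^ b[3]

g, h = low_degree_multiple(toy_f, 4)
```

Because the monomials in A are linearly independent, any dependency found must involve at least one element of B, so the returned g is never the zero polynomial.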
Consequences and Conclusions
Cryptanalysis and cryptosystem design are two sides of the same coin. The
former imposes requirements on the latter, which in turn creates new avenues for attack in a
cyclic process. In this case, the new attack scenario presented has several implications for the
design of synchronous stream ciphers, in particular the keystream function. In principle,
algebraic attacks are possible whenever a Boolean function is used to generate the keystream,
regardless of the method of input (irregularly clocked LFSR, multiple LFSRs, alternatives to
LFSRs). However, LFSRs still offer a simple, effective way of generating a keystream.
Designers must be careful, however, to ensure that ‘good’ Boolean combining functions are truly
good.
For one, the keystream function should use a large subset of the n state bits. In other
words, k should be large. More precisely, if there exists any function g such that f ∙ g is of
degree < d (the degree of f), the degree of f ∙ g should still be at least 6 when n = 128. This
corresponds to an attack in about $2^{88}$ operations, which is currently considered safe. Whatever
minimum degree we specify, Courtois and Meier’s theorem implies that k should be at least
double that.
Secondly, the attack on Toyocrypt implies that the function should have many varied
terms of high degree to avoid common factors. Toyocrypt does have a large k value, but the
small number of high degree terms and common factor between them allows for a factoring
approach that cuts the degree of the keystream function from 63 to 3.
In addition, some stream ciphers attempt to add confusion to the encryption by using
multiple filtering functions 𝑓𝑖 . This scenario is analogous to the single f scenarios discussed in
this paper when the 𝑓, 𝑓 ∘ 𝐿, 𝑓 ∘ 𝐿2 , … are considered multiple different functions. In general, the
condition imposed by this algebraic approach is that there should be no non-trivial low degree
relationships between the key bits and the keystream bits.
References
1. Douglas R. Stinson: Cryptography: Theory and Practice (3rd edition), Chapman & Hall/CRC
(2006), p. 1.
2. Auguste Kerckhoffs: La cryptographie militaire, ou, Des chiffres usités en temps de guerre,
Paris (1883), p. 8. Archived copy accessed on Google Play Books and translated by Google
Translate.
3. Claude E. Shannon: Communication Theory of Secrecy Systems, Bell System Technical
Journal 28 (1949), pp. 656-715.
4. Alexander Maximov: Some Words on Cryptanalysis of Stream Ciphers, PhD thesis, Lund
University (2006).
5. Nicolas Courtois and Willi Meier: Algebraic Attacks on Stream Ciphers with Linear
Feedback, Eurocrypt 2003, Warsaw, Poland, LNCS 2656, pp. 345-359, Springer. An extended
version is available at http://www.minrank.org/toyolili.pdf