Application of Combinatorial Mathematics to Cryptology: A Personal Journey
Ed Dawson
Information Security Institute
Queensland University of Technology

Overview
• Introduction
• Combinatorial Structures
  – Secret Sharing Schemes
  – Latin Squares and Authentication Schemes
  – Linear Codes and Boolean Functions
• Discrete Optimisation
  – Genetic Algorithm
  – Knapsack Cipher and Genetic Algorithm
  – Boolean Functions and Discrete Optimisation
• Lessons Learned

Introduction

Areas of Application
• Combinatorial structures with special properties
  – Provide discrete structures to build cryptographic systems
• Discrete Optimisation
  – Provide methods to search large finite structures
  – Tool for designing cryptographic systems and for cryptanalysis

Combinatorial Structures
• Examples
  – Ordered and unordered block designs used in secret sharing schemes.
  – Linear codes used in stream ciphers, block ciphers, public key ciphers, authentication codes, secret sharing schemes.
  – Latin squares used in authentication schemes.
  – Primitive polynomials used in stream ciphers.

Discrete Optimisation Techniques
• Genetic Algorithm
• Hill Climbing
• Simulated Annealing
• Tabu Search

Secret Sharing Schemes

Shamir’s Secret Sharing Scheme (1979)
• Key Generation
  – Select a polynomial $f(x) = K + a_1 x + \dots + a_{t-1} x^{t-1}$ over $\mathbb{Z}_p$, where $p$ is a large prime
  – Distribute to participant $P_i$ the share $f(i)$, for $i = 1, \dots, n$
• Key Recovery
  – Any t participants can recover the key K from their shares by Lagrange interpolation
• This is a perfect t-out-of-n threshold scheme

Secret Sharing Schemes
A t-out-of-n perfect threshold scheme is a method whereby n pieces of information, called shares, of a secret key K are distributed such that:
• K can be reconstructed from knowledge of any t or more shares
• Knowledge of fewer than t shares provides no information about K

Orthogonal Arrays (Dawson, Mahmoodian, Rahilly, 1993)
• t-out-of-n perfect threshold schemes can be constructed using orthogonal arrays
• The simplest construction is Shamir’s secret sharing scheme.

Breadth of Shamir’s Secret-Sharing Scheme (Dawson and Donovan 1994)
General access control system for secret sharing using Shamir’s scheme, including:
• Democratic schemes
• Multi-level schemes

Linear Codes and Boolean Functions

Properties of Boolean Function
• Hamming Weight
  – $\mathrm{wt}_H(f)$ is the number of ones in the truth table of f
  – f(x) with n inputs is balanced if $\mathrm{wt}_H(f) = 2^{n-1}$
• Hamming Distance
  – $\mathrm{dist}_H(f, g)$ is the number of truth table positions in which f and g differ.
  – Nonlinearity, $N_f$, of f(x) is the minimum Hamming distance between f(x) and any affine function.
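
The properties above can be computed directly from a truth table. The following is a minimal sketch, not taken from the slides: the function names, the bit ordering of the truth table (x1 as the least significant index bit) and the example function f(x1, x2, x3) = x1·x2 ⊕ x3 are all illustrative assumptions. Nonlinearity is obtained from the Walsh-Hadamard transform using the standard identity $N_f = 2^{n-1} - \tfrac{1}{2}\max_w |W_f(w)|$.

```python
from typing import List


def hamming_weight(tt: List[int]) -> int:
    """Number of ones in the truth table."""
    return sum(tt)


def is_balanced(tt: List[int]) -> bool:
    """Balanced means exactly half of the 2^n outputs are one."""
    return 2 * hamming_weight(tt) == len(tt)


def hamming_distance(f: List[int], g: List[int]) -> int:
    """Number of truth table positions in which f and g differ."""
    return sum(a != b for a, b in zip(f, g))


def walsh_hadamard(tt: List[int]) -> List[int]:
    """Fast Walsh-Hadamard transform of the +1/-1 form of the truth table."""
    w = [1 - 2 * b for b in tt]          # map 0 -> +1, 1 -> -1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):    # butterfly on positions j and j + h
                x, y = w[j], w[j + h]
                w[j], w[j + h] = x + y, x - y
        h *= 2
    return w


def nonlinearity(tt: List[int]) -> int:
    """Minimum Hamming distance to any affine function."""
    return (len(tt) - max(abs(v) for v in walsh_hadamard(tt))) // 2


# Assumed example: f(x1, x2, x3) = x1*x2 XOR x3, index = x1 + 2*x2 + 4*x3.
example_tt = [((x & 1) & ((x >> 1) & 1)) ^ ((x >> 2) & 1) for x in range(8)]

if __name__ == "__main__":
    print("truth table :", example_tt)
    print("weight      :", hamming_weight(example_tt))
    print("balanced    :", is_balanced(example_tt))
    print("nonlinearity:", nonlinearity(example_tt))
```

For this example the function is balanced with nonlinearity 2, while any affine function has nonlinearity 0 under the same routine.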

Properties of Boolean Function
• Correlation
  – $c(f, g) = 1 - \dfrac{\mathrm{dist}_H(f, g)}{2^{n-1}}$
  – f(x) has correlation immunity of order m if there is zero correlation between f(x) and any linear function $L_w(x)$ with $1 \le \mathrm{wt}_H(w) \le m$

Correlation Immune Function
Table: Upper bounds on the number of balanced CI(m) Boolean functions

Construction of Correlation Immune Functions (Dawson, Wu 1997)
• Linear codes can be used to construct Boolean functions with known order of correlation immunity and nonlinearity.
• Theorem: Let $f(x) = g(xG^T)$, where g is a non-degenerate Boolean function of k variables, and G is a generating matrix of an [n, k, d] linear code. Then
  – f(x) is balanced if and only if g(y) is balanced,
  – $\mathrm{ord}(f(x)) = \mathrm{ord}(g(y))$,
  – $N_f = 2^{n-k} N_g$,
  – the correlation immunity of f(x) is at least d − 1.

Latin Squares and Authentication Schemes (Dénes and Keedwell 1992)
• Let (Q, *) denote a quasigroup where
  – Q is a set of q elements
  – * is a binary operation such that a*x = b and y*a = b each have exactly one solution
• Let a message consist of s blocks of length t

Latin Squares and Authentication Schemes
• Key Generation
  – Sender and receiver select a secret (Q, *)
• Authentication
  – $M = a_1 a_2 \dots a_m = B_1 B_2 \dots B_s$, where $B_i = a_{i1} a_{i2} \dots a_{it}$
  – $b_i = (\cdots((a_{i1} * a_{i2}) * a_{i3}) * \cdots) * a_{it}$
  – Transmit $a_1 a_2 \dots a_m \, b_1 b_2 \dots b_s$
• Verification
  – Receiver uses (Q, *) on $a_1 a_2 \dots a_m$ to verify $b_1 b_2 \dots b_s$

Attack on Authentication Scheme (Dawson, Donovan, Offer, 1996)
• Attack 1
  – Given sufficient messages and authentication tags it is possible for an attacker to recover (Q, *)
  – Attacker can then impersonate the sender
• Attack 2
  – There exist equivalent quasigroups $(Q, \circ)$ and $(Q, \bullet)$ such that $(\cdots((x_1 \circ x_2) \circ x_3) \circ \cdots) \circ x_t = (\cdots((x_1 \bullet x_2) \bullet x_3) \bullet \cdots) \bullet x_t$

Genetic Algorithm

Genetic Algorithm
• Holland circa 1975;
• modelled on an evolutionary strategy
  – reproduction incorporating mutation, and
  – survival of the fittest;
• a “pool” of solutions evolves based upon suitable mating, mutation and selection schemes;
• traditionally solutions are represented as a binary string, however
newer techniques allow for arbitrary solution structures (evolutionary programming).

Example of Operators
• Selection: parents are chosen from the current solution pool either at random, or based upon their fitness (weighted selection);
• Mating: traditional “crossover”;
• Mutation: random bit complementation, where each bit in the string is complemented with probability $p_m$, the mutation rate.

Example of Operators
1. Generate an initial pool of solutions (randomly or otherwise) and calculate the fitness of each.
2. For G iterations, using the current pool:
   (a) Select the breeding pool from the current solution pool and make pairings of parents.
   (b) Using a suitable mating function, use each pair of parents to generate a new pool of solutions.
   (c) Apply the mutation to each solution in the new pool.
   (d) Evaluate the fitness of each of the new solutions.
   (e) Based on the fitness of the solutions in the new pool and the current pool, select the solutions which will become the current pool in the next iteration.
3. Output the best solution found.

Attacks on Knapsack-Type Ciphers
Merkle-Hellman cryptosystem:
• based on an NP-hard adaptation of the subset sum problem: given a set of integers, A, and an integer B obtained by summing a subset of A, find the subset (which is unique);
• a number of exploits exist which attack the structure of the secret key (trapdoor), and these are very effective. In the Merkle-Hellman cryptosystem the secret key is a super-increasing sequence and the public key is obtained by modular multiplication with a secret constant;
• Spillman (1993) proposed a genetic algorithm to solve the subset sum problem and hence attack the knapsack cipher!

Knapsack-Type Ciphers (Clark, Dawson 1994)
Example (trivial in the extreme!):
• Public key: A = {5457, 1663, 216, 6013, 7439};
• Message: M = {1, 0, 1, 1, 0};
• Sum = 5457 + 216 + 6013 = 11686.
Spillman proposed a fitness based on how close the subset sum is to the target …
will not work since the difference in sums does not correlate with Hamming distance:
• M1 = {1, 1, 1, 1, 0}, Sum1 = 13349.
• M2 = {1, 0, 0, 0, 1}, Sum2 = 12896.
M1 differs from M in one bit but its sum is off by 1663; M2 differs from M in three bits yet its sum is off by only 1210.
This is not an exception, it is the general rule.

Knapsack-Type Ciphers
Experiment with knapsack size = 30. Fitness values lie in the range (0,1):

Knapsack-Type Ciphers
Therefore:
• there is little to no correlation between the Hamming distance and the fitness;
• since the fitness is not accurate, optimisation heuristics will not be effective;
• consider the following results averaged over 100 different sums for each knapsack size:

Knapsack-Type Ciphers
The results indicate that the genetic algorithm searches approximately one quarter of the solution space before finding the correct solution:
• this is only twice as good as exhaustive search, which would search half the solution space (on average) before finding the correct solution;
• experiments indicate that the exhaustive search is much more efficient since it doesn't suffer from the complexities of the GA.
Conclusion:
• optimisation heuristics are ineffective if there is no suitable solution assessment technique available.

Searching for Cryptographic Boolean Functions (Millan, Clark, Dawson 1998)
Overview:
• nonlinearity (distance to the closest affine function) is an important cryptographic property of Boolean functions;
• balance is another important property;
• a new technique for improving the nonlinearity of arbitrary Boolean functions, while maintaining balance, is proposed;
• this technique can be used to find “locally-maximum” (in nonlinearity) Boolean functions using a hill-climbing approach;
• the hill-climbing method can be incorporated in a genetic algorithm to find Boolean functions with even higher nonlinearity.

Improving Nonlinearity
It is possible to define:
• conditions for determining a set of pairs of truth table positions such that complementing both truth table positions in the pair will increase the nonlinearity while maintaining the balance of the function;
• an efficient technique for calculating the new Walsh-Hadamard transform (WHT) of a function modified using the above method.
Locally-maximum functions:
• functions for which such a set does not exist are locally maximum and their nonlinearity cannot be improved by complementing two of their truth table values.

Hill Climbing
This technique can be used to successively update a Boolean function's truth table until it is no longer possible to improve the nonlinearity:
1. Generate a random truth table and calculate the Walsh-Hadamard transform.
2. Determine a set of pairs of truth table positions which, upon complementation, will improve the nonlinearity of the function (using techniques described above). If the set is empty go to Step 4.
3. Select one of the elements of the set (either randomly, or using some other heuristic), and complement the corresponding truth table positions. Update the Walsh-Hadamard transform. Return to Step 2.
4. The current function is locally maximum in nonlinearity.

Using a GA to find Nonlinear Boolean Functions
Notes:
• complementing a function does not affect its nonlinearity;
• moving the functions closer to each other (by complementing one), if necessary, reduces the amount of randomness in the child and, therefore, leads to children with similar characteristics;
• since this mating operation incorporates randomness, a mutation operation is not required.

The Genetic Algorithm
1. Generate a pool of P random Boolean functions and calculate their Walsh-Hadamard transforms.
2.
For G iterations do:
   (a) Perform the mating operation on all P(P-1)/2 pairings of solutions in the current pool.
   (b) Hill climb each child function so that they are all locally maximum with respect to the technique being used.
   (c) Select the best solutions from the list of children and the current pool to form the new pool. To encourage diversity in the search, when a child has fitness equal to that of a solution in the current pool, replace that solution with the child.
3. Report the best solution(s) from the current solution pool.

Boolean Function Results
Benchmark results based upon random search of 1,000,000 functions:
• R HC = hill climbing of random functions;
• GA = genetic algorithm with mating function, no hill climbing;
• GA HC = genetic algorithm with mating function and hill climbing.
Number of functions considered by each technique before finding the benchmark:

Boolean Function Results
• best nonlinearity achieved by each technique after testing 10,000 functions

Application of GA Construction
• Design of Boolean functions for LILI stream cipher
  – LILI-128 Cipher (Millan, Simpson, Dawson 1999)
  – LILI-II Cipher (Millan, Simpson, Dawson 2001)
• Design of S-Boxes for SOBER stream cipher (Burnett, Dawson, Millan 1999)
• Design of S-Boxes for MARS block cipher (Burnett, Dawson, Millan 2001)
• Design of S-Boxes for Dragon stream cipher (Fuller, Millan, Dawson 2003)

Lessons Learned
• Combinatorial mathematics offers a powerful tool for designing and analysing cryptographic systems.
• Simplify! Simplify!
• To apply combinatorial techniques one needs to understand cryptology.
• For application of discrete optimisation make sure the correct “fitness function” is used.
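
The last lesson can be checked on the five-element knapsack example from the Clark and Dawson slides. The sketch below is illustrative only: the absolute difference between a candidate's subset sum and the target is an assumed stand-in for a "closeness" fitness, not Spillman's exact fitness formula, and the names `subset_sum`, `sum_error` and `hamming_distance` are hypothetical helpers. It prints, for every candidate message, how far its sum is from the target next to its Hamming distance from the true message.

```python
from itertools import product

# Public key and message taken from the slide example.
PUBLIC_KEY = [5457, 1663, 216, 6013, 7439]
MESSAGE = (1, 0, 1, 1, 0)


def subset_sum(bits, key=PUBLIC_KEY):
    """Sum of the key elements selected by the message bits."""
    return sum(a for a, b in zip(key, bits) if b)


def hamming_distance(u, v):
    """Number of positions in which two bit tuples differ."""
    return sum(x != y for x, y in zip(u, v))


TARGET = subset_sum(MESSAGE)


def sum_error(bits):
    """Assumed closeness measure: absolute distance of the sum from the target."""
    return abs(subset_sum(bits) - TARGET)


if __name__ == "__main__":
    # Rank every candidate by the assumed closeness measure, best first.
    candidates = sorted(product((0, 1), repeat=len(PUBLIC_KEY)), key=sum_error)
    print("candidate         sum  |sum-target|  Hamming distance")
    for bits in candidates:
        print(bits, f"{subset_sum(bits):6d}", f"{sum_error(bits):12d}",
              f"{hamming_distance(bits, MESSAGE):10d}")
```

If the fitness were a good guide, the Hamming distance column would grow as the sum error grows. The two candidates quoted on the slides already break that pattern: M1 has the larger sum error (1663) but Hamming distance 1, while M2 has the smaller sum error (1210) but Hamming distance 3.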