Application of Combinatorial
Mathematics to Cryptology: A
Personal Journey
Ed Dawson
Information Security Institute
Queensland University of Technology
1
Overview
• Introduction
• Combinatorial Structures
– Secret Sharing Schemes
– Latin Squares and Authentication Schemes
– Linear Codes and Boolean Functions
• Discrete Optimisation
– Genetic Algorithm
– Knapsack Cipher and Genetic Algorithm
– Boolean Functions and Discrete Optimisation
• Lessons Learned
2
Introduction
3
Areas of Application
• Combinatorial structures with special properties
– Provide discrete structures to build cryptographic systems
• Discrete Optimisation
– Provide methods to search large finite structures
– Tool for designing cryptographic systems and for cryptanalysis
4
Combinatorial Structures
• Examples
– Ordered and unordered block designs used in secret sharing schemes.
– Linear codes used in stream ciphers, block ciphers, public key ciphers, authentication codes, secret sharing schemes.
– Latin squares used in authentication schemes.
– Primitive polynomials used in stream ciphers.
5
Discrete Optimisation Techniques
• Genetic Algorithm
• Hill Climbing
• Simulated Annealing
• Tabu Search
6
Secret Sharing
Schemes
7
Shamir’s Secret Sharing Scheme
(1979)
• Key Generation
– Select a polynomial $f(x) = K + a_1 x + \dots + a_{t-1} x^{t-1}$ over $Z_p$, where $p$ is a large prime
– Distribute to participant $P_i$ the share $f(i)$, for $i = 1, \dots, n$
• Key Recovery
– Any t participants can recover the key K from their shares by Lagrange interpolation
• This is a perfect t-out-of-n threshold scheme
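The key generation and recovery steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names `make_shares` and `recover_secret` are hypothetical, the prime in the usage note is an arbitrary choice, and the standard `random` module stands in for a cryptographically secure generator. It assumes Python 3.8 or later for `pow(x, -1, p)`.

```python
import random

def make_shares(secret, t, n, p, rng=random):
    """Split `secret` into n shares so that any t recover it (arithmetic in Z_p)."""
    assert 0 <= secret < p and 1 <= t <= n < p
    # f(x) = K + a_1 x + ... + a_{t-1} x^{t-1}, with random coefficients in Z_p.
    # NOTE: `random` is for illustration only; real use needs a secure source.
    coeffs = [secret] + [rng.randrange(p) for _ in range(t - 1)]

    def f(x):
        return sum(c * pow(x, j, p) for j, c in enumerate(coeffs)) % p

    # Participant P_i receives the share (i, f(i)).
    return [(i, f(i)) for i in range(1, n + 1)]

def recover_secret(shares, p):
    """Lagrange interpolation of f at x = 0 over Z_p, which yields K."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret
```

For example, with the Mersenne prime `p = 2**61 - 1`, calling `make_shares(123456789, 3, 5, p)` yields five shares, and `recover_secret` applied to any three of them returns `123456789`.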
8
Secret Sharing Schemes
A t-out-of-n perfect threshold scheme is a method whereby n pieces of information, called shares, of a secret key K are distributed such that:
• K can be reconstructed from knowledge of any t or more shares
• Knowledge of fewer than t shares provides no information about K
9
Orthogonal Arrays
(Dawson, Mahmodian, Rahilly, 1993)
• t-out-of-n perfect threshold schemes can be constructed using orthogonal arrays
• The simplest construction is Shamir’s secret sharing scheme.
10
Breadth of Shamir’s
Secret-Sharing Scheme
(Dawson and Donovan 1994)
General access control system for secret sharing using Shamir’s scheme including:
• Democratic schemes
• Multi-level schemes
11
Linear Codes
and
Boolean Functions
12
Properties of Boolean Function
• Hamming Weight
– $wt_H(f)$ is the number of ones in the truth table of f
– f(x) with n inputs is balanced if $wt_H(f) = 2^{n-1}$
• Hamming Distance
– $dist_H(f, g)$ is the number of truth table positions in which f and g differ.
– Nonlinearity, $N_f$, of f(x) is the minimum Hamming distance between f(x) and any affine function.
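These definitions translate directly into code on truth tables. The sketch below is a brute-force illustration; the helper names and the bit ordering (input x1 is the least significant bit of the table index) are assumptions made for the example, not conventions taken from the slides.

```python
def hamming_weight(tt):
    """Number of ones in a 0/1 truth table."""
    return sum(tt)

def hamming_distance(f, g):
    """Number of truth table positions in which f and g differ."""
    return sum(a != b for a, b in zip(f, g))

def affine_truth_table(n, w, c):
    """Truth table of x -> w.x + c over GF(2); w is an n-bit mask, c is 0 or 1."""
    return [(bin(x & w).count("1") + c) % 2 for x in range(2 ** n)]

def nonlinearity(tt, n):
    """Minimum distance to any of the 2^(n+1) affine functions (brute force)."""
    return min(hamming_distance(tt, affine_truth_table(n, w, c))
               for w in range(2 ** n) for c in (0, 1))

# Example: f(x1, x2, x3) = x1*x2 XOR x3.
n = 3
f = [((x & 1) & ((x >> 1) & 1)) ^ ((x >> 2) & 1) for x in range(2 ** n)]
print(hamming_weight(f), nonlinearity(f, n))
```

The example function is balanced (weight 4 out of 8) and has nonlinearity 2. The brute-force search costs on the order of $2^{2n}$ operations, so it is only practical for small n.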
13
Properties of Boolean Function
• Correlation
– $c(f, g) = 1 - \frac{dist_H(f, g)}{2^{n-1}}$
– f(x) has correlation immunity of order m if there is zero correlation between f(x) and every linear function $L_w(x)$ with $1 \le wt_H(w) \le m$
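The correlation formula and the definition of correlation immunity can be checked exhaustively for small functions. The following is a sketch under the assumption that truth tables are Python lists indexed by the integer value of the input and that the mask w selects the inputs of the linear function; the function names are illustrative.

```python
def correlation(f, g):
    """c(f, g) = 1 - dist_H(f, g) / 2^(n-1) for truth tables of length 2^n."""
    dist = sum(a != b for a, b in zip(f, g))
    return 1 - dist / (len(f) // 2)

def linear_truth_table(n, w):
    """Truth table of L_w(x) = w.x over GF(2), with w given as an n-bit mask."""
    return [bin(x & w).count("1") % 2 for x in range(2 ** n)]

def correlation_immunity_order(f, n):
    """Largest m such that c(f, L_w) = 0 for every w with 1 <= wt(w) <= m."""
    m = 0
    for weight in range(1, n + 1):
        masks = [w for w in range(1, 2 ** n) if bin(w).count("1") == weight]
        if any(correlation(f, linear_truth_table(n, w)) != 0 for w in masks):
            break
        m = weight
    return m

# Example: the XOR of three inputs correlates only with the mask 111.
xor3 = [bin(x).count("1") % 2 for x in range(8)]
print(correlation_immunity_order(xor3, 3))
```

Here `xor3` has correlation immunity of order 2, while a function such as x1*x2 XOR x3 has order 0 because it correlates with the single input x3.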
14
Correlation Immune Function
Table: Upper bounds on the number of balanced CI(m) Boolean functions
15
Construction of Correlation Immune
Functions (Dawson, Wu 1997)
• Linear codes can be used to construct Boolean functions with known order of correlation immunity and nonlinearity.
• Theorem: Let $f(x) = g(xG^T)$, where g is a non-degenerate Boolean function of k variables, and G is a generating matrix of an [n,k,d] linear code. Then
– f(x) is balanced if and only if g(y) is balanced,
– ord(f(x)) = ord(g(y)),
– $N_f = 2^{n-k} N_g$,
– the correlation immunity of f(x) is at least d-1.
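The theorem can be checked numerically on a small instance. The sketch below picks, purely for illustration, a [7,3,4] simplex code and the function g(y) = y1*y2 XOR y3; neither choice comes from the slides. It builds f(x) = g(xG^T) and measures balance, nonlinearity and correlation immunity from the Walsh-Hadamard spectrum.

```python
def walsh_hadamard(tt):
    """Fast Walsh-Hadamard transform of a 0/1 truth table of length 2^n."""
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w

def nonlinearity(tt):
    return len(tt) // 2 - max(abs(v) for v in walsh_hadamard(tt)) // 2

def ci_order(tt):
    """Largest m with zero Walsh coefficient for every mask w, 1 <= wt(w) <= m."""
    spectrum = walsh_hadamard(tt)
    n = len(tt).bit_length() - 1
    lowest = min((bin(w).count("1") for w in range(1, len(tt)) if spectrum[w] != 0),
                 default=n + 1)
    return lowest - 1

def parity(v):
    return bin(v).count("1") % 2

# Rows of G for a [7,3,4] simplex code, written as 7-bit masks (illustrative choice).
G = [0b1010101, 0b0110011, 0b0001111]
n, k, d = 7, 3, 4

def g(y1, y2, y3):
    return (y1 & y2) ^ y3  # balanced and non-degenerate

g_tt = [g(y & 1, (y >> 1) & 1, (y >> 2) & 1) for y in range(2 ** k)]
# f(x) = g(x G^T): each y_i is the inner product of x with row i of G.
f_tt = [g(*(parity(x & row) for row in G)) for x in range(2 ** n)]

print(sum(f_tt), nonlinearity(f_tt), nonlinearity(g_tt), ci_order(f_tt))
```

For this instance f is balanced, its nonlinearity is $2^{7-3}$ times that of g, and its correlation immunity order is d-1 = 3, in line with the statements of the theorem.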
16
Latin Squares and
Authentication Schemes
(Denes and Keedall 1992)
• Let (Q, *) denote a quasigroup where
– Q is a set of q elements
– * is a binary operation such that a*x=b and y*a=b each have exactly one solution
• Let a message consist of s blocks of length t
17
Latin Squares and
Authentication Schemes
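• Key Generation
– Sender and receiver select secret (Q,*)
• Authentication
– $M = a_1, a_2, \dots, a_m = B_1, B_2, \dots, B_s$, where block $B_i = a_{i1} a_{i2} \dots a_{it}$
– $b_i = (\dots((a_{i1} * a_{i2}) * a_{i3}) * \dots) * a_{it}$
– Transmit $a_1 a_2 \dots a_m \; b_1 b_2 \dots b_s$
• Verification
– Receiver uses (Q,*) on $a_1 a_2 \dots a_m$ to verify $b_1 b_2 \dots b_s$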
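The scheme can be illustrated with a small Cayley table. In the sketch below the quasigroup is generated by shuffling the rows, columns and symbols of the addition table of Z_q; this generator, the function names and the parameters are assumptions for the example and are not taken from the original scheme description.

```python
import random

def random_quasigroup(q, rng):
    """Cayley table of a quasigroup: (Z_q, +) with shuffled rows, columns, symbols."""
    rows, cols, syms = (rng.sample(range(q), q) for _ in range(3))
    return [[syms[(rows[a] + cols[b]) % q] for b in range(q)] for a in range(q)]

def is_latin_square(table):
    """Every symbol appears exactly once in each row and each column."""
    q = len(table)
    full = set(range(q))
    return (all(set(row) == full for row in table)
            and all({table[a][b] for a in range(q)} == full for b in range(q)))

def tag(message, t, table):
    """Split the message into blocks of length t; b_i is the left-to-right product."""
    assert len(message) % t == 0
    tags = []
    for start in range(0, len(message), t):
        acc = message[start]
        for a in message[start + 1:start + t]:
            acc = table[acc][a]
        tags.append(acc)
    return tags

def verify(message, tags, t, table):
    """Receiver recomputes the tags with the shared secret table."""
    return tag(message, t, table) == tags
```

Because left and right multiplication in a quasigroup are bijections, changing a single symbol of a block always changes that block's tag, which `verify` then rejects.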
18
Attack on Authentication Scheme
(Dawson, Donovan, Offer, 1996)
• Attack 1
– Given sufficient messages and authentication tags it is possible for an attacker to recover (Q,*)
– Attacker can then impersonate sender
• Attack 2
– There exist equivalent quasigroups $(Q, *)$ and $(Q, \circ)$ such that $(\dots((x_1 * x_2) * x_3) * \dots) * x_t = (\dots((x_1 \circ x_2) \circ x_3) \circ \dots) \circ x_t$
19
Genetic Algorithm
20
Genetic Algorithm
• Holland circa 1975;
• modelled on an evolutionary strategy
– reproduction incorporating mutation, and
– survival of the fittest;
• a “pool” of solutions evolves based upon suitable mating, mutation and selection schemes;
• traditionally solutions are represented as a binary string, however newer techniques allow for arbitrary solution structures (evolutionary programming).
21
Example of Operators
• Selection: parents are chosen from the current solution pool either at random, or based upon their fitness (weighted selection);
• Mating: traditional “crossover”
• Mutation: random bit complementation, in which each bit in the string is complemented with mutation probability $p_m$.
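The three operators can each be written in a few lines for bit-string solutions. This is a sketch of one common variant of each: roulette-wheel selection and one-point crossover are assumed here as representative choices, and the function names are illustrative.

```python
import random

def weighted_selection(pool, fitness, rng):
    """Pick two parents with probability proportional to fitness.

    Assumes fitness values are positive; parents are drawn with replacement,
    so the same solution may be picked twice.
    """
    weights = [fitness(s) for s in pool]
    return rng.choices(pool, weights=weights, k=2)

def one_point_crossover(a, b, rng):
    """Traditional crossover: swap the tails of two equal-length bit strings."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, p_m, rng):
    """Complement each bit independently with probability p_m."""
    return [b ^ 1 if rng.random() < p_m else b for b in bits]
```

Passing an explicit `random.Random(seed)` as `rng` makes runs repeatable, which helps when comparing operator choices.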
22
Example of Operators
1. Generate an initial pool of solutions (randomly or
otherwise) and calculate the fitness of each.
2. For G iterations, using the current pool:
(a) Select the breeding pool from the current solution pool and
make pairings of parents.
(b) Using a suitable mating function, use each pair of parents to
generate a new pool of solutions.
(c) Apply the mutation to each solution in the new pool.
(d) Evaluate the fitness of each of the new solutions.
(e) Based on the fitness of the solutions in the new pool and the
current pool, select the solutions which will become the
current pool in the next iteration.
3. Output the best solution found.
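The steps above can be sketched as a single loop. The version below is a minimal illustration for bit strings that assumes random pairing, one-point crossover and an elitist replacement rule (keep the fittest of the old and new pools); these specific choices, and the function name, are assumptions rather than the settings used in the work described.

```python
import random

def genetic_algorithm(fitness, length, pool_size, generations, p_m, rng):
    # Step 1: random initial pool.
    pool = [[rng.randrange(2) for _ in range(length)] for _ in range(pool_size)]
    for _ in range(generations):  # Step 2
        # (a) Breeding pool: a random pairing of the current solutions.
        breeders = rng.sample(pool, len(pool))
        children = []
        for mum, dad in zip(breeders[0::2], breeders[1::2]):
            # (b) Mating by one-point crossover gives two children per pair.
            cut = rng.randrange(1, length)
            children += [mum[:cut] + dad[cut:], dad[:cut] + mum[cut:]]
        # (c) Mutation: complement each bit with probability p_m.
        children = [[b ^ 1 if rng.random() < p_m else b for b in c]
                    for c in children]
        # (d) + (e) Evaluate, then keep the fittest of the old and new pools.
        pool = sorted(pool + children, key=fitness, reverse=True)[:pool_size]
    # Step 3: best solution found.
    return max(pool, key=fitness)

# Toy usage: maximise the number of ones in a 20-bit string.
best = genetic_algorithm(sum, 20, 10, 50, 0.05, random.Random(3))
```

With this replacement rule the best fitness in the pool never decreases from one iteration to the next.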
23
Attacks on
Knapsack-Type Ciphers
Merkle-Hellman cryptosystem:
• based on an NP-hard adaptation of the subset sum problem: Given a set of integers, A, and an integer B obtained by summing a subset of A, find the subset (which is unique).
• a number of exploits exist which attack the structure of the secret key (trapdoor); these are very effective. In the Merkle-Hellman cryptosystem the secret key is a super-increasing sequence and the public key is obtained by modular multiplication with a secret constant;
• Spillman (1993) proposed a genetic algorithm to solve the subset sum problem and hence attack the knapsack cipher!
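The trapdoor structure described above can be sketched as follows. This is a toy illustration of the basic single-iteration Merkle-Hellman idea; the function names are hypothetical, the parameters in the usage note are a small textbook-style example, and `pow(x, -1, m)` assumes Python 3.8 or later.

```python
from math import gcd

def public_key(private_seq, multiplier, modulus):
    """Disguise a super-increasing sequence by modular multiplication."""
    # Each element must exceed the sum of all earlier elements.
    assert all(x > sum(private_seq[:i]) for i, x in enumerate(private_seq))
    assert modulus > sum(private_seq) and gcd(multiplier, modulus) == 1
    return [(multiplier * x) % modulus for x in private_seq]

def encrypt(bits, pub):
    """Ciphertext is the sum of the public weights selected by the message bits."""
    return sum(a for a, b in zip(pub, bits) if b)

def decrypt(cipher, private_seq, multiplier, modulus):
    """Undo the multiplication, then solve the easy super-increasing knapsack."""
    target = cipher * pow(multiplier, -1, modulus) % modulus
    bits = []
    for x in reversed(private_seq):
        bits.append(1 if x <= target else 0)
        target -= x * bits[-1]
    return bits[::-1]
```

For instance, with the private sequence `[2, 3, 7, 14, 30, 57, 120, 251]`, multiplier 41 and modulus 491, encrypting any 8-bit message with the public key and decrypting with the private values returns the original bits. An attacker who sees only the public key faces a general-looking subset sum instance.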
24
Knapsack-Type Ciphers
(Clark, Dawson 1994)
Example (trivial in the extreme!):
• Public key: A={5457, 1663, 216, 6013, 7439};
• Message: M={1, 0, 1, 1, 0};
• Sum=5457+216+6013=11686
Spillman proposed a fitness based on how close the subset sum is to the target. This will not work, since the difference in sums does not correlate with Hamming distance:
• M1={1, 1, 1, 1, 0}, Sum1=13349.
• M2={1, 0, 0, 0, 1}, Sum2=12896.
M2 is closer to the target sum than M1, yet it differs from M in three positions while M1 differs in only one. This is not an exception, it is the general rule.
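The numbers in this example are easy to verify. The short script below uses the absolute difference between a candidate's subset sum and the target as a stand-in for a closeness-based fitness; that is an assumption for illustration and is not Spillman's exact fitness function.

```python
A = [5457, 1663, 216, 6013, 7439]
M = [1, 0, 1, 1, 0]

def subset_sum(bits):
    """Sum of the public-key elements selected by the bit vector."""
    return sum(a for a, b in zip(A, bits) if b)

def hamming(u, v):
    """Number of positions in which two bit vectors differ."""
    return sum(x != y for x, y in zip(u, v))

target = subset_sum(M)
for cand in ([1, 1, 1, 1, 0], [1, 0, 0, 0, 1]):
    print(cand, "sum error:", abs(subset_sum(cand) - target),
          "Hamming distance:", hamming(cand, M))
```

The first candidate is one bit away from M but has a sum error of 1663; the second is three bits away yet has the smaller sum error of 1210.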
25
Knapsack-Type Ciphers
Experiment with knapsack size = 30. Fitness
values lie in the range (0,1):
26
Knapsack-Type Ciphers
Therefore:
• there is little to no correlation between the Hamming distance and the fitness;
• since the fitness is not accurate, optimisation heuristics will not be effective;
• consider the following results averaged over 100 different sums for each knapsack size:
27
Knapsack-Type Ciphers
The results indicate that the genetic algorithm searches approximately one quarter of the solution space before finding the correct solution:
• this is only twice as good as exhaustive search, which would search half the solution space (on average) before finding the correct solution;
• experiments indicate that exhaustive search is in practice much more efficient, since it does not suffer from the complexities of the GA.
Conclusion:
• optimisation heuristics are ineffective if there is no suitable solution assessment technique available.
28
Searching for Cryptographic Boolean
Functions
(Millan, Clark, Dawson 1998)
Overview:
• nonlinearity (distance to the closest affine function) is an important cryptographic property of Boolean functions;
• balance is another important property;
• a new technique for improving the nonlinearity of arbitrary Boolean functions, while maintaining balance, is proposed;
• this technique can be used to find “locally-maximum” (in nonlinearity) Boolean functions using a hill-climbing approach;
• the hill climbing method can be incorporated in a genetic algorithm to find Boolean functions with even higher nonlinearity.
29
Improving Nonlinearity
It is possible to define:
• conditions for determining a set of pairs of truth table positions such that complementing both truth table positions in the pair will increase the nonlinearity while maintaining the balance of the function;
• an efficient technique for calculating the new Walsh-Hadamard transform (WHT) of a function modified using the above method.
Locally-maximum functions:
• functions for which such a set does not exist are locally maximum and their nonlinearity cannot be improved by complementing two of their truth table values.
30
Hill Climbing
This technique can be used to successively update a Boolean function's
truth table until it is no longer possible to improve the nonlinearity:
1. Generate a random truth table and calculate the Walsh-Hadamard transform.
2. Determine a set of pairs of truth table positions which, upon complementation, will improve the nonlinearity of the function (using techniques described above). If the set is empty go to Step 4.
3. Select one of the elements of the set (either randomly, or using some other heuristic), and complement the corresponding truth table positions. Update the Walsh-Hadamard transform. Return to Step 2.
4. The current function is locally maximum in nonlinearity.
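The four steps can be sketched with a brute-force stand-in for the pair-selection conditions. The code below does not use the efficient WHT-based conditions or the incremental WHT update mentioned in the slides; instead it simply tries every pair consisting of one 0 position and one 1 position (so the weight, and hence balance, is preserved) and recomputes the transform each time. Function names are illustrative.

```python
import random

def walsh_hadamard(tt):
    """Fast Walsh-Hadamard transform of a 0/1 truth table of length 2^n."""
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w

def nonlinearity(tt):
    return len(tt) // 2 - max(abs(v) for v in walsh_hadamard(tt)) // 2

def improving_pairs(tt):
    """Pairs (i, j), tt[i] = 0 and tt[j] = 1, whose complementation raises nonlinearity."""
    base = nonlinearity(tt)
    zeros = [i for i, b in enumerate(tt) if b == 0]
    ones = [j for j, b in enumerate(tt) if b == 1]
    pairs = []
    for i in zeros:
        for j in ones:
            trial = list(tt)
            trial[i], trial[j] = 1, 0
            if nonlinearity(trial) > base:
                pairs.append((i, j))
    return pairs

def hill_climb(tt, rng):
    tt = list(tt)
    while True:
        pairs = improving_pairs(tt)   # Step 2
        if not pairs:
            return tt                 # Step 4: locally maximum
        i, j = rng.choice(pairs)      # Step 3: pick a pair at random
        tt[i], tt[j] = 1, 0

# Step 1: a random balanced truth table on n = 4 inputs.
rng = random.Random(5)
start = [0] * 8 + [1] * 8
rng.shuffle(start)
end = hill_climb(start, rng)
```

Each accepted step strictly increases the nonlinearity, so the loop always terminates. The brute-force pair search is only practical for small n, which is exactly the cost the efficient conditions are meant to avoid.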
31
Using a GA to find Nonlinear
Boolean Functions
32
Using a GA to find Nonlinear
Boolean Functions
Notes:
• complementing a function does not affect its nonlinearity;
• moving the functions closer to each other (by complementing one), if necessary, reduces the amount of randomness in the child and, therefore, leads to children with similar characteristics;
• since this mating operation incorporates randomness, a mutation operation is not required.
33
The Genetic Algorithm
1. Generate a pool of P random Boolean functions and calculate their
Walsh-Hadamard transforms.
2. For G iterations do:
(a) Perform the mating operation on all P(P-1)/2 pairings of solutions in the
current pool.
(b) Hill climb each child function so that they are all locally maximum with
respect to the technique being used.
(c) Select the best solutions from the list of children and the current pool to
form the new pool. To encourage diversity in the search, when a child
has an equal fitness to a solution in the current pool, replace it with
the child.
3. Report the best solution(s) from the current solution pool.
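The algorithm above can be sketched end to end for small n. Two parts of this sketch are assumptions rather than the published method: the mating rule (keep the bits on which the parents agree, choose the remaining bits at random, after complementing one parent if the two are more than half the table apart) is one reading of the notes on the mating operation, and the hill climber is a brute-force stand-in that recomputes the Walsh-Hadamard transform for every trial swap.

```python
import random
from itertools import combinations

def walsh_hadamard(tt):
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w

def nonlinearity(tt):
    return len(tt) // 2 - max(abs(v) for v in walsh_hadamard(tt)) // 2

def hill_climb(tt, rng):
    """Swap a 0/1 pair while that raises nonlinearity (brute-force stand-in)."""
    tt = list(tt)
    while True:
        base = nonlinearity(tt)
        better = []
        for i, j in combinations(range(len(tt)), 2):
            if tt[i] != tt[j]:
                trial = list(tt)
                trial[i], trial[j] = tt[j], tt[i]
                if nonlinearity(trial) > base:
                    better.append(trial)
        if not better:
            return tt
        tt = rng.choice(better)

def mate(f, g, rng):
    """Assumed mating rule: keep agreeing bits, randomise the rest."""
    if sum(a != b for a, b in zip(f, g)) > len(f) // 2:
        g = [b ^ 1 for b in g]  # complementing leaves nonlinearity unchanged
    return [a if a == b else rng.randrange(2) for a, b in zip(f, g)]

def genetic_search(n, pool_size, generations, rng):
    # Step 1: pool of P random Boolean functions.
    pool = [[rng.randrange(2) for _ in range(2 ** n)] for _ in range(pool_size)]
    for _ in range(generations):  # Step 2
        # (a) mate all P(P-1)/2 pairs, (b) hill climb each child.
        children = [hill_climb(mate(f, g, rng), rng)
                    for f, g in combinations(pool, 2)]
        # (c) Children are listed first and the sort is stable, so on equal
        # fitness a child displaces a member of the current pool.
        pool = sorted(children + pool, key=nonlinearity, reverse=True)[:pool_size]
    return max(pool, key=nonlinearity)  # Step 3
```

A call such as `genetic_search(4, 4, 2, random.Random(11))` runs in well under a second; larger n would need the efficient pair conditions and incremental transform updates rather than this brute-force climber.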
34
Boolean Function Results
Benchmark results based upon random search of 1000000 functions:
• R HC = hill climbing of random functions;
• GA = genetic algorithm with mating function, no hill climbing;
• GA HC = genetic algorithm with mating function and hill climbing.
Number of functions considered by each technique before finding the
benchmark:
35
Boolean Function Results
• best nonlinearity achieved by each technique
after testing 10000 functions
36
Application of GA Construction
• Design of Boolean functions for LILI stream
cipher
– LILI-128 Cipher (Millan, Simpson, Dawson 1999)
– LILI-II Cipher (Millan, Simpson, Dawson 2001)
• Design of S-Boxes for SOBER stream cipher
(Burnett, Dawson, Millan 1999)
• Design of S-Boxes for MARS block cipher
(Burnett, Dawson, Millan 2001)
• Design of S-Boxes for Dragon stream cipher
(Fuller, Millan, Dawson 2003)
37
Lessons Learned
• Combinatorial mathematics offers a
powerful tool for designing and analysing
cryptographic systems.
• Simplify! Simplify!
• To apply combinatorial techniques one
needs to understand cryptology.
• For application of discrete optimisation,
make sure the correct “fitness function” is used.
38