COSC 4P03 Week 7

Heuristic Algorithms (see ch. 5, K&S)

Backtracking algorithms are most useful for generating or enumerating all solutions. If we only want one (optimal) solution, backtracking may not be efficient: we may "waste" a lot of time before even a single solution is found, and verifying that a solution is optimal may require looking at a large part of the search tree.

Sometimes it is sufficient to find a feasible solution that is "good" (nearly optimal). In that case we don't want to spend time looking at the entire tree, and heuristic algorithms may be more suitable. Generally we take a given solution or partial solution and apply some modification to it to obtain a new solution or partial solution.

Generic Optimization Problem:
Given:
- A finite set X
- An objective function (profit) P(x) for x in X
- Feasibility functions (constraints) g_j(x), 1 ≤ j ≤ m (a solution is feasible if all of the constraints are satisfied)
Find: the maximum value of P(x) such that g_j(x) ≥ 0 for 1 ≤ j ≤ m

In constructing a heuristic we require a neighbourhood function that defines when an element is "close to" another element.

Example: N_{d0}(x) = {y in X : dist(x, y) ≤ d0} (binary vectors of length n within Hamming distance d0 of x). This neighbourhood has size
$|N_{d_0}(x)| = \sum_{i=0}^{d_0} \binom{n}{i}$

How can we find feasible solutions in the neighbourhood of a given feasible solution?
- Exhaustive search
- Randomized search (usually faster)
Output: another feasible solution, or "fail", indicating that we did not find a feasible solution in the neighbourhood. (After a randomized search we may have to try again, because a feasible solution could simply have been missed.)

Neighbourhood Search Strategies:
Given a feasible solution x, with neighbourhood N(x), and a profit function P(x):
1. Find a feasible solution y in N(x) such that P(y) is maximized; return "fail" if no feasible solution exists.
2. Find a feasible solution y in N(x) such that P(y) is maximized; if P(y) > P(x) then return y, else return "fail". (Called steepest ascent because we always try to improve on the current solution, so we "fail" if we can't improve it.)
3. Find any feasible solution y in N(x).
4. Find any feasible solution y in N(x); if P(y) > P(x) then return y, else return "fail".
Strategies 1 and 2 require an exhaustive search of N(x). Strategies 3 and 4 can use a random search of N(x).

In general a heuristic hN for improving our current solution x could be:
- a single neighbourhood search, or
- a sequence of neighbourhood searches, in which each successive solution is obtained from the previous one by a neighbourhood search.

Generic Heuristic Search Algorithm (on slides):
    c = 0;
    select a feasible solution X;
    Xbest = X;
    while(c < cmax)
    {
        Y = hN(X);
        if(Y != fail)
        {
            X = Y;
            if(P(X) > P(Xbest))
                Xbest = X;
        }
        c++;
    }
    return Xbest;
Notes:
- cmax is the desired number of iterations of hN to be performed.
- Xbest is the best solution found so far (not necessarily an optimal one).

Application of Heuristics to a Combinatorial Problem

Uniform Graph Partition Problem (UGP):
Given: a weighted graph G = (V, E) on 2n vertices (for some n)
Find: the minimum cost of a partition [X0, X1] of G, where V = X0 ∪ X1, |X0| = |X1| = n, and
$\mathrm{cost}([X_0, X_1]) = \sum_{(u,v) \in E,\ u \in X_0,\ v \in X_1} \mathrm{weight}(u,v)$
(the total weight of all edges "crossing" the partition).

Example graph and partition (on slide): X0 = {0,2,5,7}, X1 = {1,3,4,6}
Cost of this partition = weight(1,2) + weight(2,4) + weight(2,6) + weight(3,5) = 8 + 7 + 2 + 4 = 21.
[Figure on slide: the example weighted graph on vertices 0 to 7; not reproduced here.]

The set of all possible solutions is {[X0, X1] such that |X0| = |X1| = n}; we want the minimum cost from among all of these.

Neighbourhood of a given partition [X0, X1]: the set of partitions in which an element of X0 has been swapped with an element of X1.
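The cost function and the swap neighbourhood defined above can be sketched in a few lines of Python. This is only an illustration: the graph below is a small made-up example on 4 vertices (it is not the graph from the slide), and the names WEIGHTS, weight, cost and neighbourhood are chosen here for convenience.

```python
from itertools import product

# Hypothetical weighted graph on 2n = 4 vertices (NOT the graph from the slide).
# Each undirected edge is stored once as frozenset({u, v}) -> weight.
WEIGHTS = {
    frozenset({0, 1}): 3,
    frozenset({0, 2}): 1,
    frozenset({1, 2}): 4,
    frozenset({1, 3}): 2,
    frozenset({2, 3}): 5,
}


def weight(u, v):
    """Weight of edge (u, v), or 0 if the edge is absent."""
    return WEIGHTS.get(frozenset({u, v}), 0)


def cost(x0, x1):
    """Total weight of the edges with one endpoint in x0 and the other in x1."""
    return sum(weight(u, v) for u, v in product(x0, x1))


def neighbourhood(x0, x1):
    """All partitions obtained by swapping one vertex of x0 with one of x1."""
    return [
        ((x0 - {u}) | {v}, (x1 - {v}) | {u})
        for u in sorted(x0)
        for v in sorted(x1)
    ]


x0, x1 = frozenset({0, 1}), frozenset({2, 3})
print(cost(x0, x1))  # 7: edges (0,2), (1,2) and (1,3) cross the partition
for y0, y1 in neighbourhood(x0, x1):
    print(sorted(y0), sorted(y1), cost(y0, y1))
```

Since each of the n vertices in X0 can be swapped with each of the n vertices in X1, the neighbourhood contains n^2 partitions (4 in this small example).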
Example: the neighbourhood of the above partition [{0,2,5,7}, {1,3,4,6}] contains [{1,2,5,7}, {0,3,4,6}] (swap 0 & 1), [{2,3,5,7}, {0,1,4,6}] (swap 0 & 3), [{2,4,5,7}, {0,1,3,6}] (swap 0 & 4), [{2,5,6,7}, {0,1,3,4}] (swap 0 & 6), etc.

To work out the gain (change in cost) from exchanging u in X0 with v in X1, we only need to look at the edges affected by swapping u and v (all other costs stay the same):

$\mathrm{Gain} = \mathrm{cost}([X_0, X_1]) - \mathrm{cost}([(X_0 \setminus \{u\}) \cup \{v\},\ (X_1 \setminus \{v\}) \cup \{u\}])$
$= \sum_{y \in X_1 \setminus \{v\}} \mathrm{weight}(u,y) + \sum_{x \in X_0 \setminus \{u\}} \mathrm{weight}(x,v) - \sum_{y \in X_1 \setminus \{v\}} \mathrm{weight}(v,y) - \sum_{x \in X_0 \setminus \{u\}} \mathrm{weight}(x,u)$

(Note the fix from the book in the second sum. The edge (u, v) itself is left out of the sums: it crosses the partition both before and after the swap, so it does not affect the gain.)

Positive gain: the solution has improved (i.e. lower cost).
Negative gain: the solution has worsened (i.e. higher cost).

Suppose we perform an exhaustive neighbourhood search, returning the partition with the maximum positive gain; if there is none, return "fail".

Example: original partition = [{0,2,5,7}, {1,3,4,6}], cost 21. Exchanging u with v to obtain [Y0, Y1]:

    u   v   [Y0, Y1]      Gain   Cost
    0   1   1257, 0346    -27    48
    0   3   2357, 0146    -24    45
    0   4   2457, 0136    -34    55
    0   6   2567, 0134    -32    53
    2   1   0157, 2346    -16    37
    2   3   0357, 1246     +3    18

We're going to look at two design strategies for heuristic algorithms. They involve decisions about how to deal with the result of a neighbourhood search. We'll look at hill-climbing and simulated annealing. Others in the book (likely seen by many people in other courses): tabu search and genetic algorithms.

Hill-Climbing

Given an initial solution X, perform an exhaustive neighbourhood search to find Y in N(X). We must have P(Y) > P(X) for any Y in N(X) returned by the search algorithm; if no such Y exists, the search algorithm must return "fail". (So we're trying to obtain an optimal solution by improving upon each successive feasible solution we find.)

This strategy tends to find locally optimal solutions rather than a globally optimal solution: occasionally we might have to go down into a "valley" in order to eventually climb a bigger hill. The result is also very dependent on the initial solution chosen.
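To make the hill-climbing idea concrete for the UGP, here is a minimal runnable sketch in Python. It rests on several assumptions: the 6-vertex graph is made up (it is not the slide's graph), the initial partition is drawn by shuffling the vertex list rather than by unranking a random k-subset, and the function names (gain, ascend, hill_climb) are chosen here for illustration. The gain is computed from the edges at u and v only, leaving out the edge (u, v) itself.

```python
import random
from itertools import product

# Hypothetical graph on 2n = 6 vertices (NOT the slide's graph): edge -> weight.
WEIGHTS = {
    (0, 1): 4, (0, 2): 1, (0, 3): 6, (1, 2): 3, (1, 4): 2,
    (2, 5): 7, (3, 4): 5, (3, 5): 1, (4, 5): 8,
}


def weight(u, v):
    """Weight of the undirected edge (u, v), or 0 if it is absent."""
    return WEIGHTS.get((min(u, v), max(u, v)), 0)


def cost(x0, x1):
    """Total weight of the edges crossing the partition [x0, x1]."""
    return sum(weight(u, v) for u, v in product(x0, x1))


def gain(x0, x1, u, v):
    """Decrease in cost from swapping u in x0 with v in x1.

    Only edges at u or v change sides; the edge (u, v) crosses the
    partition before and after the swap, so it is left out of every sum.
    """
    return (sum(weight(u, y) for y in x1 - {v})
            + sum(weight(x, v) for x in x0 - {u})
            - sum(weight(v, y) for y in x1 - {v})
            - sum(weight(x, u) for x in x0 - {u}))


def ascend(x0, x1):
    """Steepest ascent: return the best improving swap, or None ("fail")."""
    best_gain, best_pair = 0, None
    for u in sorted(x0):
        for v in sorted(x1):
            g = gain(x0, x1, u, v)
            if g > best_gain:
                best_gain, best_pair = g, (u, v)
    if best_pair is None:
        return None
    u, v = best_pair
    return (x0 - {u}) | {v}, (x1 - {v}) | {u}


def hill_climb(vertices, cmax, rng):
    """Start from a random balanced partition and climb until no swap helps."""
    shuffled = list(vertices)
    rng.shuffle(shuffled)  # assumption: shuffle instead of k-subset unranking
    half = len(shuffled) // 2
    x0, x1 = frozenset(shuffled[:half]), frozenset(shuffled[half:])
    for _ in range(cmax):
        step = ascend(x0, x1)
        if step is None:  # local optimum reached
            break
        x0, x1 = step
    return x0, x1


x0, x1 = hill_climb(range(6), cmax=20, rng=random.Random(1))
print(sorted(x0), sorted(x1), cost(x0, x1))
```

Running this with different seeds is one way to see the dependence on the initial solution: different starting partitions may stop at different local optima.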
Example: Uniform Graph Partition Problem (UGP)

Let V be a set of 2n vertices. We wish to partition V into 2 sets of size n.

Algorithm to find an initial solution (on slide):
    SelectPartition()
    {
        r = random(0, (2n choose n) - 1);
        X0 = KSubsetLexUnrank(r, n, 2n);   // change from book
        X1 = V - X0;
    }
Note: there are (2n choose n) possible subsets of size n. We choose a random integer r in the range 0 to (2n choose n) - 1, and use the subset of rank r for X0.

Algorithm to perform the neighbourhood search:
    Ascend([X0, X1])
    {
        g = 0;   // best gain so far (need positive to improve the solution)
        for each i in X0
        {
            for each j in X1
            {
                t = gain([X0, X1], i, j);
                if(t > g)   // current best
                {
                    x = i; y = j; g = t;
                }
            }
        }
        if(g > 0)   // improved: best result (g) obtained by swapping x and y
        {
            Y0 = (X0 ∪ {y}) - {x};
            Y1 = (X1 ∪ {x}) - {y};
            fail = false;
            return ([Y0, Y1]);
        }
        else   // no improvement
        {
            fail = true;
            return ([X0, X1]);
        }
    }

Full Hill-Climbing Algorithm for UGP:
    UGP(cmax)
    {
        [X0, X1] = SelectPartition();
        for(c = 0; c < cmax; c++)   // try for at most cmax iterations, or until no improvement
        {
            [Y0, Y1] = Ascend([X0, X1]);
            if(!fail)   // use new partition and try again
            {
                X0 = Y0; X1 = Y1;
            }
            else
                return ([X0, X1]);   // couldn't improve
        }
        return ([X0, X1]);
    }

Simulated Annealing

This strategy uses a randomized neighbourhood search. At each step:
- If Y is feasible and P(Y) ≥ P(X), then X = Y (i.e. Y is now the current solution).
- Else if Y is feasible and P(Y) < P(X), then X = Y with a certain probability. This is called a downward move. It allows the algorithm to "escape" from locally optimal solutions.

The probability is determined as follows:
- We have a temperature T, initialized to some value T0 > 0.
- The value of T is decreased according to a cooling schedule. Generally, we multiply it by a constant a with 0 < a < 1.
- At each step, if P(Y) < P(X), we choose a random number r between 0 and 1, and replace X with Y if
  $r < e^{(P(Y) - P(X))/T}$

The probability of a downward move decreases as time goes on.
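A small numerical sketch may help to see how the acceptance probability shrinks under a geometric cooling schedule. The values used here (T0 = 10, a = 0.9, and a downward move with P(Y) - P(X) = -2) are purely illustrative assumptions, as are the function names.

```python
import math


def acceptance_probability(delta, temperature):
    """Probability of accepting a move whose profit change is delta = P(Y) - P(X)."""
    if delta >= 0:
        return 1.0  # moves that are at least as good are always accepted
    return math.exp(delta / temperature)


def cooling_schedule(t0, a, steps):
    """Temperatures T0, a*T0, a^2*T0, ... for the given number of steps."""
    return [t0 * a**c for c in range(steps)]


# Illustrative values only: T0 = 10, a = 0.9, downward move with P(Y) - P(X) = -2.
temperatures = cooling_schedule(t0=10.0, a=0.9, steps=50)
for c in (0, 10, 20, 30, 40):
    p = acceptance_probability(-2.0, temperatures[c])
    print(f"step {c:2d}  T = {temperatures[c]:7.4f}  accept with probability {p:.4f}")
```

With a high initial temperature almost every downward move is accepted at first, while late in the run the search behaves more and more like plain hill-climbing.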
(So as we get closer to an optimal solution, a downward move becomes less and less likely.) This is because P(Y) < P(X): when we decrease T, (P(Y) - P(X))/T is lowered (made more negative), so $e^{(P(Y) - P(X))/T}$ becomes smaller.

Notes:
- Usually we want the initial temperature to be fairly high, so that downward moves are more likely early on.
- Try different values of a (the cooling rate) to see what works best.

Simulated Annealing Algorithm (fairly generic, but with notes for UGP):
    T = T0;
    select a feasible solution X;   // UGP: as for hill-climbing
    Xbest = X;
    for(c = 0; c < cmax; c++, T *= a)
    {
        Y = hN(X);   // random feasible solution from a neighbourhood search
                     // UGP: find random values of i and j to exchange
        if(Y != fail)
        {
            if(P(Y) >= P(X))   // at least as good: always keep it
            {
                X = Y;
                if(P(X) > P(Xbest))
                    Xbest = X;
            }
            else   // worse: may keep it, according to the probability
            {
                r = random(0,1);
                if(r < exp((P(Y) - P(X))/T))
                    X = Y;
            }
        }
    }
    return Xbest;

See section 5.5 for a simulated annealing algorithm applied to the knapsack problem.

Intro to Cryptography (main reference book: Stinson, "Cryptography: Theory and Practice")

Main objective: enable two people, A (usually called Alice) and B (usually called Bob), to communicate over an insecure channel while ensuring that their opponent O (usually called Oscar) cannot understand their communication.

Terminology:
- Plaintext: the message you wish to send
- Ciphertext: the encrypted plaintext
- Key: information used to encrypt the plaintext and decrypt the ciphertext

For each key K, there is:
- an encryption rule e_K(x) to encrypt the plaintext
- a decryption rule d_K(y) to decrypt the ciphertext
Each e_K and d_K are functions such that d_K(e_K(x)) = x for every plaintext x (i.e. d_K undoes e_K).
Each e_K is one-to-one, i.e. there should be no y = e_K(x1) = e_K(x2) for x1 ≠ x2, or we couldn't decrypt y unambiguously (it could be either x1 or x2).

Desirable properties of a cryptosystem:
1. e_K and d_K should be efficiently computable.
2. An opponent should not be able to determine the key or the plaintext.

Cryptanalysis: the process of attempting to compute a key given a string of ciphertext. Once Oscar has the key, he can simply apply d_K to obtain the plaintext.

Examples of simple cryptosystems:

1. Shift cipher (also called the Caesar cipher, since it was used by Julius Caesar)
Shifts each letter in the plaintext by a set amount. There are 26 possible keys (assuming we wish to represent the English alphabet). Each letter is assigned an integer value between 0 and 25, e.g. a = 0, b = 1, ...
Given a key K (the size of the shift):
    e_K(x) = (x + K) mod 26
    d_K(y) = (y - K) mod 26

Example: K = 11
    Plaintext:   "meetatmidnight"
    Translation: 12 4 4 19 0 19 12 8 3 13 8 6 7 19
    Encryption:  23 15 15 4 11 4 23 19 14 24 19 17 18 4
    Ciphertext:  "XPPELEXTOYTRSE"
    Translation: 23 15 15 4 11 4 23 19 14 24 19 17 18 4
    Decryption:  12 4 4 19 0 19 12 8 3 13 8 6 7 19
    Plaintext:   "meetatmidnight"

Note that this cipher is not secure, since we simply need to check all possible keys, looking for a message that makes sense. On average we'll only need to check 13.

2. Substitution cipher
e_K: apply a permutation to the alphabet. Each letter in the plaintext is substituted with the corresponding letter from the permutation.
d_K: use the inverse permutation.

Example: a possible permutation for encryption:
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    X N Y A H P O G Z Q W B T S F L R C V M U E K J D I
Corresponding permutation for decryption:
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    d l r y v o h e z x w p t b g f j q n m u s k a c i
For this example, e_K(a) = X, d_K(F) = o (etc.)

Example:
    Plaintext:  "meetatnoon"
    Ciphertext: "THHMXMSFFS"

Note that the key is just one of the $26! > 4 \times 10^{26}$ possible permutations, so an exhaustive search to find it is infeasible. However, we can apply statistical methods (seen later).
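The two worked examples above can be reproduced with a short Python sketch. The function names are chosen here for illustration; the substitution key is the permutation from the example, and plaintext is assumed to consist of lower-case letters only.

```python
import string

ALPHABET = string.ascii_lowercase


def shift_encrypt(plaintext, key):
    """e_K(x) = (x + K) mod 26, with the ciphertext written in upper case."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + key) % 26] for ch in plaintext).upper()


def shift_decrypt(ciphertext, key):
    """d_K(y) = (y - K) mod 26."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - key) % 26] for ch in ciphertext.lower())


def shift_candidates(ciphertext):
    """Exhaustive key search: decrypt under every key; a reader picks the sensible one."""
    return {key: shift_decrypt(ciphertext, key) for key in range(26)}


# Substitution key from the example: a -> X, b -> N, c -> Y, ...
SUBSTITUTION_KEY = "XNYAHPOGZQWBTSFLRCVMUEKJDI"
ENCRYPT_TABLE = str.maketrans(ALPHABET, SUBSTITUTION_KEY)
DECRYPT_TABLE = str.maketrans(SUBSTITUTION_KEY, ALPHABET)  # inverse permutation


def substitution_encrypt(plaintext):
    """Replace each lower-case plaintext letter by its image under the key."""
    return plaintext.translate(ENCRYPT_TABLE)


def substitution_decrypt(ciphertext):
    """Replace each upper-case ciphertext letter by its preimage under the key."""
    return ciphertext.translate(DECRYPT_TABLE)


print(shift_encrypt("meetatmidnight", 11))     # XPPELEXTOYTRSE
print(shift_candidates("XPPELEXTOYTRSE")[11])  # meetatmidnight
print(substitution_encrypt("meetatnoon"))      # THHMXMSFFS
print(substitution_decrypt("THHMXMSFFS"))      # meetatnoon
```

Note that shift_candidates only lists the 26 candidate plaintexts; deciding which one "makes sense" is left to the reader.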
The above cryptosystems are monoalphabetic: a single alphabetic character is encrypted at a time (mapped to a unique alphabetic character).

3. Vigenere cipher (named for Blaise de Vigenere, 16th c.)
Each letter is assigned an integer value (a = 0, b = 1, etc.). Each possible key corresponds to a keyword, a string of length m. We encrypt m characters of plaintext at a time, obtaining m characters of ciphertext. This is called a polyalphabetic cipher. The plaintext is broken into pieces of length m, to which the key is added (mod 26).

Example: keyword = BROCK → key = (1,17,14,2,10)
    Plaintext:   "meetatmidnight"
    Translation: 12 4 4 19 0 19 12 8 3 13 8 6 7 19
    Encryption:  13 21 18 21 10 20 3 22 5 23 9 23 21 21
    Ciphertext:  "NVSVKUDWFXJXVV"
    Translation: 13 21 18 21 10 20 3 22 5 23 9 23 21 21
    Decryption:  12 4 4 19 0 19 12 8 3 13 8 6 7 19
    Plaintext:   "meetatmidnight"

Number of possible keywords of length m: $26^m$. For m = 5, the number of keywords is $26^5 > 1 \times 10^7$. A computer could break this using an exhaustive search. (Other cryptanalysis techniques seen later: first find m, then try to discover the keyword.)

4. Permutation cipher (also called transposition cipher)
The plaintext is broken into pieces of length m. The key is a permutation of length m. Each piece of plaintext is permuted according to the key to obtain the ciphertext. To decrypt, apply the inverse permutation to the ciphertext.

Example:
    Key:                  position  1 2 3 4 5
                          moves to  3 1 5 2 4
    Inverse permutation:  position  1 2 3 4 5
                          moves to  2 4 1 5 3
    Plaintext:  "meeta|tnoon"
    Ciphertext: "ETMAE|NOTNO"
(In this example the character in position i of each piece moves to the position given by the key, e.g. the first character "m" moves to position 3.)

There are m! permutations of length m. For m = 5, there are 120 permutations (exhaustive search possible). So you would want a long key!

Stream Ciphers

Stream ciphers are generally used when the amount of plaintext is indeterminate, e.g. WEP.
Generate a keystream z1 z2 ... A plaintext string x1 x2 ... is encrypted as ciphertext y1 y2 ... via a separate encryption rule for each element:
    y1 = e_{z1}(x1), y2 = e_{z2}(x2), ...
The keystream may repeat: a stream cipher is periodic with period d if $z_{i+d} = z_i$ for all integers i ≥ 1.
- Synchronous stream cipher: the keystream is constructed from the key, independent of the plaintext string.
- Non-synchronous stream cipher: each keystream element depends on previous plaintext or ciphertext elements.

The Vigenere cipher is a synchronous stream cipher: if key = (k1, k2, ..., km), the keystream is k1 k2 ... km k1 k2 ... km ... (periodic with period m).

Stream ciphers are often described in terms of binary alphabets (if we don't have one, we translate what we have to binary, e.g. using ASCII codes). Use binary addition for encryption and decryption:
    Encryption: e_z(x) = (x + z) mod 2
    Decryption: d_z(y) = (y + z) mod 2
Note: this is just the exclusive-or operation, which is "cheap" to implement in hardware.

Possible method to generate a synchronous keystream z1 z2 z3 ...:
- Let (k1, k2, ..., km) be a binary m-tuple (the key).
- Let z_i = k_i for 1 ≤ i ≤ m. (In other words, the keystream matches the key for the first m characters.)
- z_{m+1} = f(z_1, z_2, ..., z_m). That is, the (m+1)st character is a function of the previous m characters. This is sometimes called a feedback function.
- So in general z_{m+i} = f(z_i, z_{i+1}, ..., z_{m+i-1}) for i ≥ 1.

An example of a feedback function is the linear recurrence
$z_{i+m} = \left( \sum_{j=0}^{m-1} c_j z_{i+j} \right) \bmod 2$ for i ≥ 1,
where c_0, ..., c_{m-1} are given constants. The recurrence has degree m.

Note: we should never use (k1, k2, ..., km) = (0, 0, ..., 0), because then the keystream is all zeros and ciphertext = plaintext!

If the constants c_0, ..., c_{m-1} are chosen appropriately, we will have a keystream of period $2^m - 1$. Thus a short key (length m) gives rise to a keystream with a long period. This is desirable, since keystreams with short periods (e.g. the Vigenere cipher) are easier to cryptanalyze.

A keystream generator is essentially a pseudo-random number generator.
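Both keystream constructions described above can be sketched in Python: the Vigenere cipher as a periodic keystream added mod 26, and a linear feedback keystream combined with the plaintext bits by addition mod 2. The function names, the parameters m = 4, c = (1, 1, 0, 0), key (1, 0, 0, 0), and the sample message bits are illustrative assumptions.

```python
from itertools import cycle, islice


def vigenere_encrypt(plaintext, keyword):
    """Vigenere as a synchronous stream cipher: the keystream is the key repeated."""
    keystream = cycle(ord(k) - ord("A") for k in keyword.upper())
    return "".join(
        chr((ord(ch) - ord("a") + z) % 26 + ord("A"))
        for ch, z in zip(plaintext.lower(), keystream)
    )


def lfsr_keystream(key, coefficients):
    """Yield z_1, z_2, ... where z_{i+m} = (sum_j c_j * z_{i+j}) mod 2.

    key is (k_1, ..., k_m) and coefficients is (c_0, ..., c_{m-1}).
    """
    register = list(key)
    while True:
        yield register[0]  # the leftmost stage is the next keystream bit
        feedback = sum(c * k for c, k in zip(coefficients, register)) % 2
        register = register[1:] + [feedback]  # shift left, append the new bit


def xor_bits(bits, keystream):
    """Encrypt or decrypt: e_z(x) = d_z(x) = (x + z) mod 2."""
    return [(x + z) % 2 for x, z in zip(bits, keystream)]


print(vigenere_encrypt("meetatmidnight", "BROCK"))  # NVSVKUDWFXJXVV

# Illustrative parameters: m = 4, c = (1, 1, 0, 0), key (1, 0, 0, 0).
stream = list(islice(lfsr_keystream((1, 0, 0, 0), (1, 1, 0, 0)), 30))
print(stream[:15])
print(stream[:15] == stream[15:])  # True: the keystream repeats with period 15

message = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical plaintext bits
ciphertext = xor_bits(message, lfsr_keystream((1, 0, 0, 0), (1, 1, 0, 0)))
recovered = xor_bits(ciphertext, lfsr_keystream((1, 0, 0, 0), (1, 1, 0, 0)))
print(recovered == message)  # True: adding the same keystream twice cancels out
```

Because encryption and decryption are the same operation, sender and receiver only need to start their keystream generators from the same key.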
Example: Let m = 4 and let the keystream be generated by the linear recurrence $z_{i+4} = (z_i + z_{i+1}) \bmod 2$, i.e. c_0 = 1, c_1 = 1, c_2 = 0, c_3 = 0. Suppose the keystream is initialized to (1,0,0,0). Then we have a keystream of period 15:
    1,0,0,0,1,0,0,1,1,0,1,0,1,1,1,...

Implementation details: this can be produced efficiently in hardware via a linear feedback shift register with m stages:
- Initialize the shift register to (k1, k2, ..., km).
- k1 is the next keystream bit.
- k2, ..., km are each shifted one position to the left to become k1, ..., k(m-1).
- The new value of km is computed as $\left( \sum_{j=0}^{m-1} c_j k_{j+1} \right) \bmod 2$ (using the old values of the k_j).

[Figure: the shift register for the above example, with four stages k1, k2, k3, k4; k1 and k2 are added mod 2 and the result is fed back into k4.]

Cryptanalysis

This is the process the opponent (Oscar) uses to find the key.

Kerckhoffs' principle: we assume Oscar knows the cryptosystem being used. Cryptanalysis is obviously harder if Oscar doesn't know the cryptosystem, but this assumption allows us to evaluate the security of the cryptosystem itself, rather than relying on Oscar not knowing which cryptosystem is used.

Types of attacks on cryptosystems:
- Ciphertext-only: the opponent has a string of ciphertext (hardest for the opponent).
- Known plaintext: the opponent has a string of plaintext and its corresponding ciphertext.
- Chosen plaintext: the opponent chooses a string of plaintext and (due to temporary access to the encryption machinery) constructs the corresponding ciphertext.
- Chosen ciphertext: the opponent chooses a string of ciphertext and (due to temporary access to the decryption machinery) constructs the corresponding plaintext.

We'll be looking at ciphertext-only attacks for now, using statistics.

Relative frequencies of the 26 letters in the English language, in decreasing order:
- E (12%)
- T, A, O, I, N, S, H, R (6-9% each)
- D, L (4%)
- U, C, M, W, F, G, Y, P, B (1.5-2.8%)
- V, K, J, X, Q, Z (less than 1%)

Also look at common sequences of consecutive letters:
- Some common 2-letter sequences (digrams): TH, HE, IN, ER, AN
- Some common 3-letter sequences (trigrams): THE, ING, AND
(See full lists in the book.)
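As a first step in a ciphertext-only attack, the letter, digram and trigram statistics mentioned above can be gathered with a few lines of Python. This is only a sketch of the counting step; the function names are chosen here, and the sample is the short shift-cipher ciphertext from the earlier example, which is far too short for reliable statistics.

```python
from collections import Counter


def letter_frequencies(ciphertext):
    """Relative frequency of each letter, most common first."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    counts = Counter(letters)
    return [(letter, count / len(letters)) for letter, count in counts.most_common()]


def ngram_counts(ciphertext, n):
    """Counts of overlapping n-letter sequences (n = 2 for digrams, 3 for trigrams)."""
    letters = "".join(ch for ch in ciphertext.upper() if ch.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))


# Shift-cipher ciphertext from the earlier example (K = 11); this only shows
# the mechanics, since 14 letters are not enough for meaningful statistics.
sample = "XPPELEXTOYTRSE"
print(letter_frequencies(sample)[:3])
print(ngram_counts(sample, 2).most_common(3))
```

On a long ciphertext, the most frequent ciphertext letters would then be compared against the expected English frequencies (E first, then T, A, O, ...) to guess parts of the key.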
(Cryptanalysis of specific cryptosystems coming up...)