Combinatorial Optimization Lecture 1

1 Motivating examples

An optimization problem is a problem of the following form:

Problem 1. Given a function f : S → R where S is a set, find an element s ∈ S with f(s) maximum (or minimum).

When S = R or S = Rn we have powerful tools from calculus for solving problems like this. A combinatorial optimization problem is the special case of the above in which the set S is finite. Here techniques from calculus usually don’t carry over very well, and we need to come up with a different set of techniques to approach such problems. On the other hand, in real world optimization problems we actually often have that S is a finite set:

Example 2 (Euclidean travelling salesman). A helicopter has to visit 27 sites. The locations and distances are known; the target is to find in which order it should visit the sites in order to minimize the total distance flown.

This gives rise to a more general task called the Euclidean Traveling Salesman Problem, ETSP for short. In this problem sites (or points) v1, . . . , vn in the plane are given, and for a permutation π of {1, . . . , n} we let

L(π) = |vπ(1) − vπ(2)| + |vπ(2) − vπ(3)| + · · · + |vπ(n−1) − vπ(n)| + |vπ(n) − vπ(1)|,

where |a − b| is the distance between points a, b ∈ R2 (the last term uses the convention π(n + 1) = π(1), so that the route returns to its starting point). This can be mathematically formalized as:

TASK: ETSP
INPUT: points v1, . . . , vn ∈ R2
OUTPUT: a permutation π of {1, . . . , n} minimizing L(π)

There are a couple of natural approaches that one can take for solving this:

Brute force approach: There are only finitely many (namely n!) permutations, so the minimum can be found by checking all of them. But this is impossible in practice: checking for instance 27! permutations would take an enormous amount of time.

Nearest neighbour approach: Fix σ(1) = 1. Then pick σ(2) to minimize |vσ(1) − vσ(2)|. Then pick σ(3) to minimize |vσ(2) − vσ(3)|, and so on. This is a fast algorithm, but it doesn’t always give an optimal solution.

Remark. ETSP belongs to a large class of notoriously hard problems for which no fast or effective algorithm is known. It is generally assumed (but no proof is in sight) that in fact there is no fast algorithm for ETSP.

Example 3 (Minimum cost path problem). Given a road map with two specified locations u, v, and specified costs for travelling along each road, find the cheapest path from u to v. This is essentially the problem that your phone solves when it charts the quickest route between two places using Google Maps. Again, the set of paths from u to v is finite, so the optimal solution can be found by exhaustive search (which, as before, is too slow to do in practice). But, as we will see later in the module, unlike ETSP there is a fast algorithm solving the minimum cost path problem.

Decision problems

A second class of problems we’ll look at in the module are decision problems. These are defined as follows:

Problem 4. Consider two sets S, T with S ⊆ T. Given t ∈ T, decide if t ∈ S or not.

Here are two related examples of decision problems:

Example: decide whether a given Diophantine equation has a root in integers or not.

TASK: Diophantine equations
INPUT: a polynomial p(x1, . . . , xn) with integer coefficients
OUTPUT: YES, if the equation p = 0 has a solution with x1, . . . , xn ∈ Z (the set of integers); NO, if p = 0 has no such solution.

By a famous theorem proved by Davis, Putnam, Robinson and Matiyasevich, this problem is “undecidable”.
This means that there is no general algorithm (say, a computer program) that could decide for every Diophantine equation whether or not it has a solution in integers. Later in the module we’ll see how to mathematically define what “undecidable” means. Example: TASK: Equations with real solutions INPUT: a polynomial p(x1 , . . . , xn ) with integer coefficients 2 OUTPUT: YES, if the equation p = 0 has a solution with x1 , . . . , xn ∈ R (the set of real numbers); NO, if p = 0 has no such solution. By a theorem proved by A. Tarski, there is an algorithm that solves this problem. Tarski explicitly described such an algorithm. 2 Definition of algorithms, and running times The above examples suggest that to study combinatorial optimization problems, we should study algorithms i.e. procedures one can follow to solve the problem. Informally, we will think of an algorithm as a “procedure that takes an input and produces an output solving some task”. Much of the time we will stick with this informal notion of an algorithm and study concrete examples of algorithms for particular problems. Towards the end of the module we will study Turing Machines which are a mathematically precise way of defining algorithms (which allows us to prove theorems about them). In this lecture we will study the Nearest Neighbour algorithm for ETSP, which was described earlier. Here is a more formal description of it: Algorithm 5 (Nearest neighbour algorithm). Input: points (x1 , y1 ), . . . , (xn , yn ) ∈ R2 . Output: a permutation σ of 1, . . . , n. Procedure: Set σ(1) = 1. For i = 2, . . . , n, repeat the following: Set min = ∞. For j = 1, . . . , n, repeat the following: – Calculate d = (xσ(i−1) − xj )2 + (yσ(i−1) − yj )2 . – If d < min and j 6= σ(1), . . . , σ(i − 1), then set min = d and set σ(i) = j. Output σ. Definition 6 (Running time). Given a task, the running time of an algorithm A on an input I of this task is T (A, I) = number of steps taken by A on I. When it is clear what algorithm we are looking at, we abbreviate this to T (I). 3 This definition is currently not complete because it doesn’t define what a “step” is. Informally, a step or “operation” is one elementary operation performed by the algorithm. Throughout the module, in different contexts, it will be convenient to count steps in slightly different ways. We will refer to the particular way we count steps in a particular algorithm as the “model of computation” that the algorithm uses. The most frequent model of computation is the arithmetic model: Definition 7 (Arithmetic model). An algorithm A takes place in the arithmetic model if the input to the algorithm is a finite sequence of real numbers, and every step of the algorithm is one of the following: (1) Add/subtract two real numbers. (2) Multiply/divide two real numbers. (3) Compare two real numbers x, y (to check if x < y, x = y, or x > y). (4) Change the value of some variable. The running time T (A, I) is the number of times each of the above bullet points happens when the algorithm is run on I. For example we can analyse the nearest neighbour algorithm using this definition: Example 8 (Running time of nearest neighbour algorithm). Set σ(1) = 1. [This is 1 step of type (4)] For i = 1, . . . , n [This is n step of type (4)], repeat the following: Set min = ∞ [This is 1 step of type (4)]. For j = 2, . . . , n [This is n step of type (4)], repeat the following: – Calculate d = (xσ(i−1) − xj )2 + (yσ(i−1) − yj )2 [This is 2 subtractions, 2 multiplications, 1 addition, and 1 write. 
So 6 steps in total]. – If d < min [This is 1 step of type (3)] and j 6= σ(1), . . . , σ(i− 1) [This is i − 1 ≤ n − 1 steps of type (3)], then set min = d [This is 1 step of type (4)] and set σ(i) = j [This is 1 step of type (4)]. 4 Output σ. Adding up all the above operations (taking into account that the ones inside the “for” loops are repeated multiple times), we get the upper bound: T ((x1 , y1 ), . . . , (xn , yn )) ≤ 1+(n−1)+(n−1)(1+n+n(6+1+(n−1)+2)) = n3 + 9n2 − 8n − 1. The running time of an algorithm usually depends on what input it gets. Generally, we want to analyse how fast or slow an algorithm is. To do this, we analyse the worst case running time of the algorithm. Definition 9 (Worst case running time). For an algorithm A, we define the worst case running time as: T (n) = T (n, A) = max{T (A, I) : I is an input of length n} This definition is again not yet complete because it depends on us defining what the length of an input is. The definition of length again depends on what model of computation we use. In the arithmetic model it is defined as follows: Definition 10 (Length of input in arithmetic model). The length of an input I in the arithmetic model is the number of real numbers given in I. Returning to the Nearest Neighbour algorithm: the input is given as “I = points (x1 , y1 ), . . . , (xn , yn ) ∈ R2 ”. Thus is a sequence of 2n real numbers. Thus length(I) = 2n. Using this we get the following upper bound on the worst case running time of the nearest neighbour algorithm: T (2n) ≤ n3 + 9n2 − 8n − 1. (1) One can also analyze algorithms by their average case behaviour, and we may meet occasionally such cases in the course. 3 Asymptotic analysis of algorithms The performance of an algorithm depends on what machine or computer is used. We want to get rid of this dependence and wish to compare algorithms independently of the machine. The idea is to work with asymptotic analysis. That is, we look at how T (n) grows as n → ∞ and ignore constant terms and smaller order terms that may depend on the particular computer or the skills 5 of the programmer. For instance, if T (n) = 6n3 + 7n2 − 11n, then T (n) is asymptotically n3 , we have simply dropped lower order terms and constants. More formally: Definition 11. Assume f, g : N → R are functions. We say that f (n) = O(g(n)) if there are C > 0 and n0 ∈ N such that for all n ≥ n0 , 0 < f (n) < Cg(n). Definition 12. Assume f, g : N → R are functions. We say that f (n) = Ω(g(n)) if there are c > 0 and n0 ∈ N such that for all n ≥ n0 , f (n) > cg(n) > 0. Definition 13. Assume f, g : N → R are functions. We say that f (n) = Θ(g(n)) if both f (n) = Ω(g(n)) and f (n) = O(g(n)) hold. The last definition can be shown to be equivalent to there being constants c, C > 0 and n0 ∈ N such that for n ≥ n0 we have 0 < cg(n) < f (n) < Cg(n). Using these definitions, and our earlier bound on T (2n) for the nearest neighbour algorithm, we can establish the following: T (n) = O(n3 ) for the nearest neighbour algorithm. Proof. To get this first notice, the (1) is equivalent to T (n) ≤ (n/2)3 + 9(n/2)2 − 8(n/2) − 1 = n3 /8 + 9n2 /4 − 4n − 1. Using that n3 ≥ n2 for n ∈ N, we get that T (n) ≤ n3 /8 + 9n3 /4 = 19n3 /8. Thus, the definition of T (n) = O(n3 ) is satisfied with n0 = 1 and c = 19/8. 4 Models of computation There are four models of computation which will come up in various parts of this module: The arithmetic model. The decimal model. Comparison sorting algorithms. Turing machines. 
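As a concrete illustration of the arithmetic-model analysis above, here is a minimal Python sketch of the Nearest Neighbour algorithm (Algorithm 5). The function name and the `used` array are choices of this sketch, not part of the notes; in particular the `used` array replaces the explicit test j ≠ σ(1), . . . , σ(i − 1), so this version uses on the order of n^2 elementary operations rather than the n^3 counted above, while returning the same permutation σ.

    def nearest_neighbour(points):
        # points: a non-empty list of (x, y) pairs; returns sigma as a list of 1-based indices
        n = len(points)
        sigma = [1]                      # sigma(1) = 1, as in Algorithm 5
        used = [False] * (n + 1)
        used[1] = True
        for _ in range(2, n + 1):
            px, py = points[sigma[-1] - 1]       # current endpoint of the partial tour
            best_j, best_d = None, float("inf")  # "set min = infinity"
            for j in range(1, n + 1):
                if used[j]:                      # replaces the check j != sigma(1), ..., sigma(i-1)
                    continue
                x, y = points[j - 1]
                d = (px - x) ** 2 + (py - y) ** 2   # squared distance, as in the notes
                if d < best_d:
                    best_d, best_j = d, j
            sigma.append(best_j)
            used[best_j] = True
        return sigma

For example, nearest_neighbour([(0, 0), (5, 0), (1, 0)]) returns [1, 3, 2]: from v1 the nearest unvisited point is v3, and then only v2 remains.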
We’ve already seen the arithmetic model of computation (and this is by far the most common one we will use). We will now introduce the decimal and comparison ones. Turing machines are a mathematically rigorous definition of algorithms which will be defined towards the end of the module. 6 Definition 14 (Decimal model). An algorithm A takes place in the decimal model if the input to the algorithm is a finite sequence of one-digit integers ∈ {0, 1, . . . , 9} (and the length of the input I is the number of such one-digit integers in I), and every step of the algorithm is one of the following: (1) Add/subtract two one-digit integers. (2) Multiply/divide two one-digit integers (“divide an integer x by an integer y” means to determine integers q and r < y so that x = yq + r). (3) Compare two one-digit integers x, y (to check if x < y, x = y, or x > y). (4) Change the value of some variable. The decimal model is a bit closer to how real computers work than the arithmetic model (since manipulating arbitrary real numbers isn’t realistic). We’ll use it when studying number-theoretic algorithms for tasks like “add two integers”. Here is one example of such an algorithm: Example 15 (School addition algorithm). Input: two positive integers a = aA aA−1 . . . a1 , b = bB bB−1 . . . b1 written in decimal with with a ≥ b. Output: a + b written in decimal. Procedure: – Set c1 , . . . , cA+1 = 0. – for i = 1, ..., A, repeat the following: * Let ci+1 ci = ai + bi + ci (adding three one-digit numbers produces a number with ≤ 2 digits) – Output cA+1 cA . . . c1 This is exactly 6A+1 elementary operations, because there are A+1 write operations before the for loop, there are A iterations of the “for” loop and each iteration contains precisely 3 “write” operations (meaning an operation of type (4)) and 2 “addition” operations (we are adding ai + bi + ci . Adding ai +bi is one operation and produces a number x, with ≤ 2 digits say x = x2 x1 . Since ai , bi ≤ 9, we additionally have that x ≤ 18. Now, note that ci ≤ 1 always and so working out (ai + bi ) + ci = x + ci is just one additional operation — work out y = x1 + ci , which is a one-digit number, and then write x2 y which is exactly the number ai + bi + ci ). Since the size of the input is n = A + B, we get that the running time is T (n) = O(n). 7 It is possible to similarly analyse algorithms for subtraction, multiplication, division, comparison of integers that you learned in school. The produce the following theorem, which you can use without proof: Theorem 16. There are algorithms in the decimal model for the following: (1) addition in O(n) steps, (2) subtraction in O(n) steps, (3) comparison in O(n) steps, (4) multiplication in O(n2 ) steps, (5) division (up to the integer part of the ratio) in O(n2 ) steps. Next we define comparison algorithms. They are almost exactly the same as a the arithmetic model — except that all the steps are just comparisons. Definition 17 (Comparison-based algorithm). An algorithm A is a comparisonbased algorithm if the input to the algorithm is a finite sequence of real numbers (and the length a input I is the number of such numbers in I), and every step of the algorithm is one of the following: (1) Compare two real numbers x, y (to check if x < y, x = y, or x > y). (2) Change the value of some variable. The running time T (A, I) is the number of A on I is the number of comparisons that happen i.e. we allow the algorithm to perform as many operations of type (2) as it wants without counting it in the running time. 
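To illustrate Definition 17, here is a small Python sketch (the function and counter names are ours, not part of the definition) of a comparison-based algorithm that finds the maximum of a sequence of real numbers while counting only the comparisons; the assignments, being operations of type (2), are not charged to the running time.

    def maximum_with_comparison_count(a):
        # returns (maximum of a, number of comparisons used); a is a non-empty list of reals
        comparisons = 0
        best = a[0]                # a change of variable (type (2)): not counted
        for x in a[1:]:
            comparisons += 1       # bookkeeping only (a type (2) change, not charged)
            if x > best:           # the one type (1) comparison in this iteration
                best = x           # type (2): not counted
        return best, comparisons

On any input of length n ≥ 1 this performs exactly n − 1 comparisons, so in the comparison model its worst case running time is T(n) = n − 1 = O(n).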
The motivation for this model of algorithms is that it is a bit simpler to count operations than in the arithmetic model (since there is only one type of operation which contributes to the running time). We shall use this next week to give a formal proof of a lower bound on the running times of certain algorithms. This is in contrast to most places in the course where only upper bounds are shown on running times. We will mainly consider comparison algorithms for the following task: TASK: Sorting INPUT: a1 , a2 , . . . , an ∈ N (or in R) OUTPUT: b1 , b2 , . . . , bn the ai s in increasing order. Here’s an example of such an algorithm: 8 Example 18 (Insertion sort). Input: a1 , a2 , . . . , an ∈ N (or in R) Output: b1 , b2 , . . . , bn the ai s in increasing order. Procedure: For i = 1, . . . , n, repeat the following: Set b0 = −∞ and bi+1 = +∞. For j = 1, . . . , i, repeat the following: – If bj−1 ≤ ai ≤ bj , then insert ai between bj−1 and bj (by redefining b0j = ai and b0j+1 = bj , . . . , b0i+1 = bi ). Output b1 , . . . , bn . P This algorithm has running time exactly T (I) ≤ ni=1 2i = n(n+1) (since every round of the inner “for” loop has precisely 2 comparisons). Thus we have T (n) = O(n2 ). 9 1 Combinatorial Optimization Lecture 2 1.1 Merge sort Last week we saw “Insertion Sort”, which was an algorithm for sorting numbers into increasing order in O(n2 ) steps. Now we’ll introduce a new algorithm called “Merge Sort” which does the same thing in only O(n log n) steps. It runs as follows: Algorithm 1. Input: real numbers x1 , . . . , xn . Output: y1 , . . . , yn , which are x1 , . . . , xn sorted into increasing order. Procedure: – If n = 1, the set y1 = x1 , and we are done. – Otherwise, split {x1 , . . . , xn } into two sets Sodd {x1 , x3 , x5 , . . . } and Seven = {x2 , x4 , . . . }. – Use recursion to sort Sodd into increasing order: this gives numbers a1 , . . . , adn/2e which are x1 , x3 , x5 , . . . , but in increasing order. – Use recursion to sort Seven into increasing order: this gives numbers b1 , . . . , bbn/2c which are x2 , x4 , x6 , . . . , but in increasing order. – Merge the two lists (a1 , . . . , adn/2e ), (b1 , . . . , bbn/2c ) into a single one. This is done as follows: * For i = 1, . . . , n − 1, repeat the following: (+) Let yi = max(a1 , b1 ). Delete this element max(a1 , b1 ) from the corresponding list (i.e. if we’re deleting a1 , then redefine a01 = a2 , a02 = a3 , . . . . If we’re deleting b1 , then redefine b01 = b2 , b02 = b3 , . . . ). * When there’s only one element left, let yn equal this element. – Output y1 , . . . , yn . We’re treating this as a comparison sorting algorithm, so to compute the running time we only count comparisons. The only comparison explicitly written is the comparison at the start (to check if n = 1 or not), and the comparisons in step (+) (which occurs n − 1 times). However some comparisons happen during the recursion steps as well. Thus the total number of comparisons is given by the following recursive equation: 1 T (1) = 1. T (n) = T (dn/2e) + T (bn/2c) + n. Let’s solve this recurrance. To make things a bit neater, we’ll only do this when n is a power of 2 (and so dn/2e and bn/2c will get replaced by just “n/2”). Theorem 2. Consider the following recurrence: T (1) = 1 T (n) = 2T (n/2) + n (1) (2) For n a power of 2, the solution to this is T (n) = n(log2 n + 1) = Θ(n log n). Proof. Let n = 2k , so that log2 n = k. We’ll use what is called the recurrence tree method. This works as follows: Start with a single node labelled T (n). 
Next change the label of this node to “n”, and give it two children labelled by T (n/2): Next replace a node labelled T (n/2) by a node labelled n/2 and give it two children labelled T (n/4). Continue like this for as long as possible. See Figure 1 for the final tree we get. Formally: If there is a node labelled T (m) for m > 1, change its label to m + 1 and give it two children labelled T (m/2). If there is a node labelled T (1), change its label to 1. Otherwise, stop. The key fact is the following “throughout the process, the sum over all the nodes doesn’t change”. This is because at each step, we used (1) or (2) to change one node into a combination of nodes with he same sum. Thus to work out T (n), we can instead work out the sum of the values of all the nodes in the final tree. Label the levels of the recursion tree as level0, level1, . . . , levelk (see Figure 1). Since every node in levels 0, . . . , k − 1 has exactly 2 children, we have the following: the number of nodes in level i = 2i . (3) 2 Figure 1: The recurrence tree Let Si be the sum of the values of all the nodes on level i. Note that on all levels i 6= k, we have Si = (the number of nodes in level i)n/2i = n. (4) Thus we obtain the theorem. T (n) = the sum over all the nodes = = k X i=0 k X Si n = (k + 1)n = n(log2 n + 1) i=0 We still want to understand the case when n is not a power of 2. One way to deal with this is to first note that T (n) is increasing with n (this can be proved by induction). Also note that there is a unique p ∈ N with 2p−1 < n < 2p . Set n0 = 2p , noting that n0 ≤ 2n. This shows us that T (n) ≤ T (n0 ) = n0 (log2 n0 + 1) ≤ 2n(log2 n + 2) = O(n log n). 1.2 Lower bounds In this section we prove the following theorem that gives a lower bound on the running time of all comparison-based sorting algorithms. 3 Theorem 3. Every comparison-based sorting algorithm A has worst case running time T (A, n) = Ω(n log n). One remark about using asymptotic notation with logarithms: using the bn change of base formula loga n = log , it is easy to show that for a, b > 1, we log ba have loga n = Θ(logb n). Because of this, we often omit the logarithm base when writing equations like “T (A, n) = Ω(n log n)” — this is because the expressions T (A, n) = Ω(n ln n), T (A, n) = Ω(n log2 n), T (A, n) = Ω(n log3 n) are all equivalent to each other via the change of base formula above. In order to prove the above theorem, we need to study binary trees — structures a bit like the ones that came up in the recursion tree method above. Definition 4. A binary tree T is a directed graph T with vertex set V = (v1 , . . . , vn ) in which every vertex vi 6= v1 has one backwards edge (i.e. an edge vj vi with j < i), and in which every vertex vi has either 0 or 3 forwards edges (i.e. edges vi vj with i < j). The vertex v1 is called the root of T . The leaves of T are the vertices vi which have no forward edges coming out of them. The height of T is the maximum distance from the root to a leaf in T . A complete binary tree of height h is one in which all leaves are at distance h from the root. Figure 2: A binary tree We’ll use the notation “xi : xj ” as shorthand for the operation “compare xi to xj ‘”. The connection between binary trees and sorting algorithms is the following: Definition 5. A decision tree for sorting x1 , . . . 
, xn is a binary tree in which every vertex and edge receives one of the following labels: 4 Every non-leaf vertex is labelled by “xi : xj ” for some i, j, and its two forward edges are labelled by “xi < xj ” and “xi ≥ xj ”. Every leaf vertex is labelled by some output (i.e. by a permutation of x1 , . . . , xn ). Every comparison-based sorting algorithm A gives rise to a decision tree. To get this, first label the root by the first comparison xi : xj that the algorithm makes. Then label the children of the root by the next comparisons that the algorithm makes, and so on. The leaves are created when the algorithm says “output y1 , . . . , yn ”, in which case we label the leaf by the permutation of x1 , . . . , xn that y1 , . . . , yn are. This is best illustrated as an example: Figure 3: A decision tree for sorting 3 numbers x1 , x2 , x3 . Every comparison-based sorting algorithm gives rise to a decision tree. In fact decision trees could be thought of as a way of formally defining what a comparison based sorting algorithm is. Given some input x1 , . . . , xn we can always reach an output by starting from the root and performing all the comparisons“xi : xj ” that the tree tells you to do (and following the edge labelled by “xi < xj ” or “xi ≥ xj ”, depending on which of these is true). The running time of the algorithm is then exactly the number of edges you move through i.e. the distance between the root and the leaf. 5 To prove Theorem 3, we need to understand binary trees. This is done in the following lemmas: Lemma 6. A binary tree of height h has at most 2h leaves. Proof. Let T be a binary tree of height h that has as many leaves as possible. We need to show that T has ≤ 2h leaves. Let the vertices of T be V = (v1 , . . . , vn ). Suppose that T is not a complete binary tree. Then there is some leaf vi at distance d ≤ h − 1 from the root v1 . Build a new tree T 0 by adding two vertices vn+1 , vn+2 and edges vi vn+1 , vi vn+2 . Note that in T 0 , the vertices vn+1 , vn+2 are at distance d + 1 ≤ h from the root i.e. T 0 still has height h. But T 0 has more leaves than T , contradicting the “has as many leaves as possible” part of the definition of T . Suppose that T is a complete binary tree. Let mi be the number of vertices at distance i from v1 . We have that m0 = 1 and mi+1 = 2mi for all i. Therefore mh = 2h . But since T is a complete binary tree, mh is exactly the number of leaves, giving us what we want. We’ll also need another simple lemma. Lemma 7. For every n ∈ N ln n! ∼ n ln n, meaning that the ratio ln n!/(n ln n) tends to one as n → ∞. In particular, for sufficiently large n, we have 2n ln n > ln n! > n ln n/2. Proof. The Taylor expension of ex is 1 + x/1 + x2 /2! + . . . xn /n! + . . . which implies with x = n en > nn /n!. This shows that n! > (n/e)n for every n. Then, taking the logarithms of (n/e)n < n! < nn we obtain n(ln n − 1) < ln n! < n ln n. Thus 1 ln n! < <1 1− ln n n ln n for every n. This implies (ln n!)/(n ln n) → 1 as n → ∞; that is, ln n! ∼ n log n. Now we show that every comparision sorting algorithm A has T (A, n) = Ω(n log n). Proof of Theorem 3. Consider the decision tree T corresponding to the algorithm A run on the input x1 , . . . , xn . Note the following: 6 T has height T (n). This is because T (n) is defined to be the maximum number of comparisons made on an input I of the form (x1 , . . . , xn ). But the number of comparisons on an input I exactly equals the length of the path from the root to the leaf corresponding to I. 
Since the maximum length of such a path is the height of T we have that height(T ) = T (n). T has ≥ n! leaves. Otherwise there would be some permutation σ of x1 , . . . , xn which is not the label of any leaf of T . Then the algorithm could not possibly work correctly on all inputs, since it is possible that the numbers in the input I are in the order given by σ. Combining Lemma 6 with the first bullet point, we get that the number of leaves of T is ≤ 2T (n) . Combining this with the second bullet point tells us that 2T (n) ≥ n!. Taking logarithms gives T (n) ≥ log2 (n!) = ln(n!)/ ln(2). By Lemma 7, we know that for sufficiently large n, ln(n!) ≥ n ln n/2. Thus ln n , which shows that T (n) = for sufficiently large n we have that T (n) ≥ n2 ln 3 Ω(n log n). 1.3 Counting sort Counting Sort is another algorithm for sorting which works in time O(n). At first this seems like it should contradict Theorem ?? — but the reason there will not be a contradiction is that Counting Sort will take place in the arithmetic model (rather than be comparison based). Additionally, counting sort will make an extra assumption on the numbers x1 , . . . , xn — we will assume that they are all contained in the set [k] := {1, . . . , k} for some integer k = O(n). Algorithm 8. Input: x1 , . . . , xn ∈ [k] Output: y1 , . . . , yn , which are x1 , . . . , xn sorted into increasing order. Procedure: 1. Set c1 , . . . , ck = 0. 2. For i = 1, . . . , n, set cxi = cxi + 1. 3. Set t = 1 4. For i = 1, . . . , k, repeat the following: – For j = 1, . . . , ci , repeat the following: * Set yt = i. 7 * Set t = t + 1. 5. Output y1 , . . . , yn . The basic idea of this algorithm is: in step 2 we count the number of times each j ∈ [k] comes up in the list x1 , . . . , xn (and let cj be the number of times that j appears). Afterwards we write out c1 copies of “1”, c2 copies of 2, . . . , ck copies of “k” — which will be the sorted list. To work out the running time, we count the number of operations in each line: 1. There are k operations here. 2. There are 3n operations here. 3. There is 1 operation here. P P 4. There are a total of k+ ki=1 4ci operations here. Noting that ki=1 ci = n (since in step (2), there were exactly n times the ci s were increased), we get that there are k + 4n operations at this step. Thus in total, there are k + 3n + 1 + (k + 4n) = 2k + 7n + 1. If we additionally know that k = O(n), this gives us that T (n) = O(n). 1.4 Recurrences Determining running time often leads to recurrence equations (like the one that came up when analysing in merge sort). The recurrence tree method can be used to solve these. Here’s a fairly general theorem which covers a wide range of recurrence equations: Theorem 9. Let a, b ∈ N and b ≥ 2 and f (n) > 0 for all n. Set D = logb a. Suppose that we have positive numbers T (1), T (b), T (b2 ) . . . , defined by the following recurrence n + f (n) T (n) = aT b (1) If f (n) = O(nD−ε ) for some ε > 0, then T (n) = Θ(nD ). (2) If f (n) = Θ(nD ), then T (n) = Θ(nD log n). (3) If f (n) = Ω(nD ) and af (n/b) < cf (n) for some c < 1, then T (n) = Θ(f (n)). 8 In the theorem, f (n) is compared with nD . In cases 1 and 3, there is a polynomial gap n±ε between f (n) and nD . So the theorem does not cover all possible cases. Next we give a few examples of how the general theorem is applied. Example 1. The recurrence equation T (n) = 9T (n/3) + n is Case 1 of the general theorem as here a = 9, b = 3, so D = log3 9 = 2 and f (n) = n = O(nD−ε ). Thus T (n) = Θ(n2 ). 
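As a quick numerical sanity check of Example 1, one can evaluate the recurrence directly for powers of 3 and watch T(n)/n^2 settle towards a constant. In the short Python sketch below we additionally assume T(1) = 1; the initial value does not affect the Θ(n^2) conclusion.

    def T(n):
        # T(n) = 9 T(n/3) + n with T(1) = 1, for n a power of 3
        return 1 if n == 1 else 9 * T(n // 3) + n

    for k in range(1, 8):
        n = 3 ** k
        print(n, T(n) / n ** 2)   # ratios 1.33, 1.44, 1.48, ... tending to 3/2

This is consistent with T(n) = Θ(n^2); indeed, for this initial value the exact solution is T(n) = 3n^2/2 − n/2.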
The result would be the same even with f (n) = n1.5 or n1.95 . Example 2. The recurrence equation T (n) = 9T (n/3) + n2 is in Case 2: a = 9, b = 3, so D = log3 9 = 2 and f (n) = n2 = Θ(nD ). Consequently T (n) = Θ(n2 log n). Example 3. The recurrence equation T (n) = T (n/3) + 1 is in Case 2: a = 1, b = 3, so D = log3 1 = 0 and f (n) = 1 = Θ(n0 ). Consequently T (n) = Θ(log n). Example 4. In the recurrence equation T (n) = 3T (n/4)+n log n we have a = 3, b = 4, D = log4 3 ≈ 0.793.. and f (n) = n log n = Ω(nD+ε ). This is going to be Case 3 but we still have to check that condition af (n/b) < cf (n) holds with some c ∈ (0, 1). This is quite simple: 3 n4 log n4 < cn log n indeed with c = 34 , say. So we have T (n) = Θ(n log n). Example 5. The recurrence equation T (n) = 2T (n/2) + n log n gives a = 2, b = 2, D = 1, f (n) = n log n = Ω(nD ). Though this seems to be Case 3, the condition af (n/b) < cf (n) fails: 2(n/2) log(n/2) = nlog(n/2) < cn log n does not hold for any positive c < 1. This case is not covered by the above general theorem. Proof. We are given that n = bh for some h ∈ N. We use the recurrence tree method, this time not with a binary tree (where every internal node has 2 children), but with an a-ary tree meaning that every internal node (including the root) has a children. Here is the recursion tree: The height of the tree is h. Let’s label the levels from top to bottom as level 0, . . . , level h. There is 1 node at level 0 with value f (n), a nodes on level 1 each with value f (n/b), a2 nodes on level 2 each with value f (n/b2 ), etc, In general, on level i < h, there are ai nodes each with value f (n/bi ). On the last level (i.e. level h), there are ah nodes, all of which are leaves, and all of which have label T (1) = T (n/bh ). Note that ah = alogb n = nlogb a = nD and bD = a. 9 The sum of the values on level 1 is af (n/b), the sum of the values on level 2 is a2 f (n/b2 ), etc, the sum of the values on level h − 1 is ah−1 f (n/bh−1 ), and the values on the leaves is ah T (1) = nD T (1). Define g(n) = Ph−1thei sum of i D i=0 a f (n/b ). Then T (n) = g(n) + n T (1) which implies that T (n) = D Ω(g(n)) and T (n) = Ω(n ). We evaluate g(n) in the three cases. Case 1. Since f (n) = O(nD−ε ), we have a constant C for which f (n) ≤ CnD−ε for all n (strictly speaking the definition of f (n) = O(nD−ε ) only gives a constant C for which f (n) ≤ CnD−ε for all n ≥ n0 for some n0 . However, if we let C 0 = max(C, f (1), f (2), . . . , f (n0 )) we get that f (n) ≤ C 0 nD−ε for all n). Then f (n/bi ) ≤ C(n/bi )D−ε . Thus g(n) = h−1 X i af i=0 D−ε = Cn n bi h−1 X i=0 D−ε ≤ Cn h−1 X i=0 ai bi(D−ε) ε i h−1 X b bhε − 1 a = CnD−ε biε = CnD−ε ε a b −1 i=0 i nε − 1 1 − n−ε = CnD−ε ε = CnD ε = O(nD ). b −1 b −1 Returning to T (n), we have T (n) = g(n) + nD T (1) = O(nD ) and hence T (n) = Θ(nD ). Case 2. As in the previous case, using that f (n) = Θ(nD ), we get 10 constants c, C such that for all n we have cnD < f (n) < CnD . We now have g(n) = h−1 X i af i=0 n bi < h−1 X i aC i=0 n D bi D = Cn h−1 X 1 = CnD logb n = CnD h 0 = O(nD log n). and also g(n) = h−1 X i af i=0 n bi > h−1 X i ac i=0 n D bi = cn D h−1 X 1 = cnD logb n = cnD h = cnD loga n 0 D = Ω(n log n). Returning to T (n) we have T (n) = g(n) + nD T (1) ≥ g(n) ≥ cnD loga n = Ω(n log n) and also T (n) = g(n)+nD T (1) ≤ cnD loga n+nD T (1) = O(n log n). Thus we’ve established both T (n) = O(n log n) and T (n) = Ω(n log n), proving T (n) = Θ(f (n)). Case 3. 
As in the previous cases, using that f (n) = Ω(nD ), we get a constant C such that f (n) > CnD . The condition af (n/b) < cf (n) implies that argument shows that f (n/bi ) < (c/a)i f (n), f (n/b) < ac f (n). Repeating this n i i or in other words a f bi < c f (n). We use this last inequality when estimating g(n): g(n) = h−1 X i=0 ai f n bi < h−1 X ci f (n) = f (n) h−1 X 0 i=0 ci ≤ f (n) ∞ X ci 0 1 = f (n) = O(f (n)). 1−c We also have that g(n) ≥ f (n) for all n (since the node at level 0 always has value f (n)), and hence g(n) = Θ(f (n)). Returning to T (n) we have that T (n) = g(n)+nD T (1) ≥ g(n) = Ω(f (n)). We also have that T (n) = g(n) + nD T (1) ≤ g(n) + f (n)/C = O(f (n)). Thus we’ve established both T (n) = O(f (n)) and T (n) = Ω(f (n)), proving T (n) = Θ(f (n)). 11 Lecture 3 Graph theory, basic definitions In this section we introduce objects called graphs, which are how mathematicians study networks of objects. If you took MATH0029, then there will be a large overlap between this section and that module. First a motivational example Example 1 (Minimum cost spanning tree problem). Suppose that you are designing an electrical network in a city. There are a number of buildings which you need to connect the network by stretching wires between them. The costs of each potential wire are given by the diagram bellow: What is the cheapest way of connecting everything together? The optimal way is what’s known as a minimal cost spanning tree. In the above example, the following is the optimum: 1 Next week, we will introduce two algorithms — Jarnik’s Algorithm, and Kruskal’s Algorithm for solving the above problem. Today, we’ll set up mathematical notation for describing the problem precisely. First we define a graph — informally a graph is a “network”. Definition 2. A undirected graph G is a pair G = (V, E) where V is a finite set and E is a set of unordered pairs of elements from V . Elements of V are called vertices (or nodes), and elements of E are called edges. A directed graph D is a pair D = (V, E) where V is a finite set and E is a set of ordered pairs of elements from V . For example, we could have and undirected graph G = (V, E) with V = {a, b, c, d} and E = {{a, b}, {b, c}, {a, c}, {a, d}}. And we can have a directed graph D = (V, E) with V = {x, y, z, w} and E = {(x, y), (y, z), (z, w), (x, w)}. Writing brackets for edges can get a bit cluttered, so it’s a convention to omit them when talking about graphs i.e. we can describe the edge set of the graph as E = {ab, bc, ac, ad}. When two vertices x, y are contained in an edge xy we say “x and y are adjacent”, “x and y are incident to each other”, “x and y are connected by an edge”, “x is a neighbour of y” — these are all synonyms for the same thing. While graphs are defined in terms of sets, most of the time we draw them (and think about them) as collections of points joined by lines i.e. a picture like the following one: Most of the time we do not allow a graph to contain an edge joining a vertex to itself (i.e. there are no edges of the form {x, x}), and also we allow at most one edge between two vertices. Graphs which satisfy these are called simple graphs. Graphs which have multiple edges between the same pair of vertices are called multigraphs. Given a graph G = (V, E), we write V (G) to denote the set of vertices of G (i.e. V (G) = V ), and E(G) to denote the set of edges of G (i.e. E(G) = E). 
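To make the definitions concrete, the undirected graph from the example above (V = {a, b, c, d} and E = {ab, bc, ac, ad}) can be stored as an adjacency structure and queried for adjacency. A minimal Python sketch, with helper names of our own choosing:

    def build_graph(vertices, edges):
        # adjacency-set representation of an undirected simple graph
        adj = {v: set() for v in vertices}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)          # edges are unordered pairs, so record both directions
        return adj

    G = build_graph(["a", "b", "c", "d"],
                    [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
    print("b" in G["a"])   # True:  a and b are adjacent
    print("d" in G["b"])   # False: bd is not an edge of this graph

With this representation, the neighbourhood and degree of a vertex (defined next) are simply adj[v] and len(adj[v]).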
The order of G is the number of vertices it has , denoted v(G) := |V (G)|, while the size of G is the number of edges it has, denoted e(G) := E(G). 2 For a vertex v, the neighbourhood of v in G, denoted NG (v) is the set of vertices connected to v by an edge i.e. NG (v) := {u ∈ V (G) : vu ∈ E(G)}. The degree of vertex v ∈ V in G, is the number of edges G has containing v — in a simple graph this works out as dG (v) = |NG (v)|. The complete graph on n vertices, Kn , is when |V (Kn )| = n and E(Kn ) consists of all pairs {u, v} with distinct u, v ∈ V . The empty graph on n vertices is En where |V (En )| = n and there are no edges. Figure 1: Examples of complete graphs An important class is that of the bipartite graphs. G(V, E) is bipartite if there is a partition V = X ∪ Y with X, Y 6= ∅ (and X ∩ Y = ∅) such that every edge in E has one endpoint in X and one in Y . That is, there are no edges with both endpoints in X or in Y , edges only go between X and Y . Figure 2: An example of a bipartite graph Two more kinds of graphs which come up are the path and the cycle. A path on n vertices n, denoted Pn is defined to be the graph with V (Pn ) = {v1 , . . . , vn }, and E(Pn ) = {v1 v2 , v2 v3 , . . . , vn−1 vn }. A cycle on n vertices n, denoted Cn is defined to be the graph with V (Cn ) = {v1 , . . . , vn }, and E(Cn ) = {v1 v2 , v2 v3 , . . . , vn−1 vn , vn v1 }. 3 Figure 3: Examples of paths and cycles Sometimes a graph H is a subgraph of another graph G. This happens exactly when V (H) ⊂ V (G) and E(H) ⊂ E(G). We say that H is a spanning subgraph of G if it is a subgraph of G and V (H) = V (G). 0.1 Paths, walks, cycles, and circuits In the graph G(V, E) a walk P is an ordered sequence of vertices v0 , v1 , . . . , vk where vi ∈ V and vi−1 vi ∈ E (for all i = 1, . . . , k). The length of a walk is defined as k−1, which equals the number of edges it goes through (repetitions counted). A trail is a walk which doesn’t repeat edges, and a path is a walk which doesn’t repeat vertices or edges (i.e. a sequence v0 , v1 , . . . , vk of distinct vertices with vi−1 vi ∈ E for all i). A closed walk is a walk v0 , v1 , . . . , vk with v0 = vk . A circuit is a trail v0 , v1 , . . . , vk with v0 = vk and k ≥ 3. A cycle is a sequence v0 , v1 , . . . , vk , v0 of vertices with v0 , . . . , vk distinct, k ≥ 3, and vi−1 vi ∈ E for all i = 1, . . . , k, and also vk v0 an edge. Notice that containing a path of length n in a graph G, is exactly the same as G containing the graph Pn+1 as a subgraph (as defined in the previous section). Similarly containing a cycle of length n in a graph G, is exactly the same as G containing the graph Cn as a subgraph. Definition 3. Vertices u, v ∈ V (G) are connected in G if there is a walk P = u, v1 , . . . , vk , v in G. Notation: u ∼ v or u ∼G v. We say that the walk P connects u and v, or goes between u and v. Lemma 4. The relation u ∼ v is an equivalence relation, that is, it satisfies the following conditions. (1) u ∼ u for every u (reflexive), (2) if u ∼ v then v ∼ u (symmetric), (3) if u ∼ v and v ∼ w, then u ∼ w (transitive). 4 Proof. (1) W = u gives a walk from u to u, showing u ∼ u. (2) If W = u, v1 , v2 , . . . , vk v is a walk from u to v, then W 0 = v, vk , . . . , v2 , v1 , u is a walk from v to u. (3) If W = u, v1 , . . . , vk v is a walk from u to v and W 0 = v, x1 , . . . , xt , w is a walk from v to w, then W 00 = u, v1 , . . . , vk v, x1 , . . . , xt , w is a walk from u to w. Equivalence relations are important in mathematics. 
S An equivalence relation on a ground set V gives rise to a partition V = Vi with the property that u ∼ v iff u, v are contained in the same Vi , the sets Vi are called equivalence classes. In the case of a graph G and u ∼ v, the equivalence classes are subsets of vertices V1 , . . . , Vk such that x, y ∈ V are connected by a walk iff x, y are in the same Vi . The subgraphs with vertex set Vi and edges inherited from G are called the connected components of G. Proposition 5. If u, v ∈ V are connected by a walk in G, then they are connected by a path. Proof. Consider a walk W from u to v given by u = v1 , v2 , . . . , vk = v, and suppose that its length is as small as possible (i.e. that we choose W to have k minimal out of all possible walks from u to v). If v1 , . . . , vk are all distinct, then W is a path. So suppose otherwise, that vi = vj for some i < j. Then W 0 = v1 , . . . , vi , vj+1 , . . . , vk is a walk: to check this we need to check that consecutive vertices on W are connected by edges. These pairs are v1 v2 , v2 v3 , . . . , vi−1 vi , vi vj+1 , vj+1 vj+2 , . . . , vk−1 vk . Since vi = vj , we have that all of these are of the form vt vt+1 for some t. But for all t, vt vt+1 is an edge by the definition of W being a walk. Given en edge e = uv of a graph G, we define the subgraph G − e of G via V (G − e) = V (G) and E(G − e) = E(G) \ e. This subgraph is called G minus e. Proposition 6. An edge e = uv ∈ E lies on a circuit of G if and only if u and v are connected in G − e. Proof. If uv lies on the circuit u, v1 , . . . , vk , v, u, then u and v remain connected in G − e by the path u, v1 , . . . , vk , v. Conversely, if u and v are connected in G − e, then there is path of the form u, v1 , . . . , vk , v, and e = uv lies on the circuit u, v1 , . . . , vk , v, u. Using this we can get an analogue of Proposition 5 for cycles/circuits. 5 Proposition 7. If an edge uv ∈ E(G) is contained in a circuit, then it is also contained in a cycle. Proof. Consider some circuit u, v, v1 , . . . , vk u through uv. By Proposition 6, we have that u and v are connected in G − uv (by a walk). Using Proposition 5, u and v are connected in G − uv by a path. Let P = u, x1 , . . . , xt , v be such a path. Then C = u, x1 , . . . , xt , v is a cycle through uv in G. Definition 8. A graph is connected if every pair of its vertices are connected by a walk (note that this is equivalent to “every pair of its vertices are connected by a path”). Proposition 9. Assume G is connected and e = uv ∈ E(G) lies on a circuit in G, then G − e is connected. Proof. By Proposition 6, u and v are connected in G − e by a walk P = u, v1 , . . . , vk , v. By Proposition 5, we may assume that P is in fact a path. We have to show that every pair x, y ∈ V is connected by a walk in G − e. They are connected by a path Q in G. If Q does not contain e then Q connects x and y in G − e as well. If Q contains e = uv, say Q = x, . . . , u, v . . . , y, then the walk x, . . . , u, v1 , . . . , vk , v . . . , y connects x and y in G − e. Definition 10. A graph containing no circuit is called a forest. A connected forest is a tree. In other words, a tree is a connected graph with no circuit. Using Proposition 7, an equivalent definition is that a tree is a connected graph with no cycles. We have seen trees, namely binary trees before. Although a binary tree is directed and has a root, it is easy to see that it is in fact a tree in the above sense (if we disregard orientation). A vertex v of a tree T is called a leaf if deg v = 1. 
This is again the same meaning as before. 1 The minimum spanning tree In a graph G = (V, E) a spanning tree is a subgraph T which is a tree with V (T ) = V . Of course, if such a tree exists, then the graph is connected. Suppose that a cost Pfunction c : E → R is given. The cost of the tree T is defined as c(T ) = e∈E(T ) c(e). The minimum spanning tree problem asks to find a spanning tree with minimal cost. Formally: TASK: Minimum spanning tree (MST), INPUT: a connected graph G and a cost function c : E → R, OUTPUT: a minimum cost spanning tree T . 6 See the figures in Example 1 for an example of an input to this problem, as well as an optimal solution. Next week, we will solve MST with a fast and effective algorithm. The computational model will be the arithmetic model. The input consists of graph G with n vertices and m edges, plus a real number c(e) for each edge. The size of the input is n + m + m = n + 2m. Note that m ≤ n2 as a graph on n vertices contains at most n2 edges. This week we will, set up some lemmas that will be used when analysing the algorithms for the MST problem. Proposition 11. Every tree T with |V (T )| ≥ 2 has at least two leaves. Proof. Consider the longest path P = v0 , v1 , . . . , vk in T . Its endpoints will be leaves of T : Indeed if v0 vi is an edge for some i > 1, then v0 , . . . , vi would form a cycle. On the other hand if v0 x is an edge for some x 6∈ P , then x, v0 , v1 , . . . , vk would be a longer path. Thus v0 has no neighbours other than v1 . Similarly vk has no neighbours other than vk−1 . Note that there are trees with only two leaves: a path is always a tree and has exactly two leaves. Proposition 12. In every tree |E(T )| + 1 = |V (T )|. Proof. Induction on n = |V (T )|. The cases n = 1 and n = 2 are clear. Let us go from n − 1 → n. Let v be a leaf of T and let e be the unique edge in T incident with v. The subgraph T − v is defined, quite naturally, by V (t − v) = V (T ) \ v and E(T − v) = E(T ) \ e. We claim that T − v is a tree. Indeed, it contains no circuit and it is connected: the edge e was used only to connect v to the other vertices. Since |V (T − v)| = n − 1, the induction hypothesis says that |E(T − v)| + 1 = |V (T − v)|. Putting back v and e we get back T , so indeed, |E(T )| + 1 = |V (T )|. Next we prove the following simple but important result. Lemma 13. Assume T is a connected spanning subgraph of a graph G. Then T is a tree iff it has exactly |V (G)| − 1 edges. Proof. If T is a spanning tree of G, then |V (G)| = |V (T )| and |V (T ) = |E(T )| + 1 by Proposition 12, so indeed |E(T )| = |V (G)| − 1. For the other direction assume that T contains circuits. Then delete edges one-by-one from circuits as long as you can. The resulting graph T ∗ is a tree again because (1) it contains no circuit and (2) it remained connected (in view of Proposition 6). Thus by Proposition 12, |E(T ∗ )| = |V (T ∗ )| − 1. Further, T ∗ is a spanning subgraph of G as vertices have not been deleted. So |V (T ∗ )| = |V (G)|. Consequently |E(T ∗ )| = |V (G)| − 1. Originally we 7 had |E(T )| = |V (G)| − 1 so no edge was ever deleted. T does not contain a circuit. 8 Lecture 4 Minimum cost spanning trees Lemma 4.1 (Exchange lemma). Assume G = (V, E) is a graph and T = (V, F ) is a spanning tree in G, e = uv ∈ E \ F , and f ∈ F is on the (unique) path P connecting u and v in T . Then T ∗ = (V, F ∪ e \ f ) is a spanning tree again. Proof. First T ∗ is a spanning subgraph as V (T ∗ ) = V . 
Since T is a tree with n := |V (G)| vertices, we have |E(T )| = n − 1 (by a proposition from last week). The path P together with edge e is a cycle. As T is connected, T + e is connected as well. By another proposition from last week, T + e remains connected if f is deleted from the cycle, so T ∗ is connected. Additionally we have E(T ∗ ) = E(T ) = n − 1 (since one edge was deleted and one edge was added). We’ve showed that T ∗ has n vertices, n − 1 edges, and is connected — hence it is a tree. We need one more definition. Given a graph G(V, E) and a set A ⊂ V , the cut of A is δ(A) = {e ∈ E : one endpoint of e is in A, the other one in V \ A}. So the cut of A, δ(A) ⊂ E, is the set of edges in G that go between A and its complement. D ⊂ E is a cut if there is a proper A ⊂ V with δ(A) = D (here “proper A ⊆ V ” means that A 6= V and A 6= ∅). The following proposition tells us about how paths and cuts interact. Proposition 4.2. Let G = (V, E) be a graph, A ⊆ V , and P a path which starts in A and ends outside A. Then P contains an edge of the cut δ(A). Proof. Let P = v1 , v2 , . . . , vk . So we have v1 ∈ A, and vk 6∈ A. Let vi be the last vertex in the path with vi ∈ A (i.e. pick i = max{j : vj ∈ A} noting that the maximum exists because this is a finite, nonempty set). We have vi 6= vk (since vk = v 6∈ A), and so i ≤ k − 1. Since P is a path we have an edge vi vi+1 . By maximality of i, we have vi+1 6∈ A. Now we’ve established that vi ∈ A, vi+1 6∈ A so, by the definition of “cut”, we get that the edge vi vi+1 ∈∈ δ(A). Using this, we can prove an alternative characterization of connectedness. Proposition 4.3. G is connected iff there is no proper A ⊂ V with δ(A) = ∅. 1 Proof. We prove the statement in the following form. G is disconnected ⇐⇒ there is a proper A ⊂ V with δ(A) = ∅. “⇐ direction” if δ(A) = ∅ for some proper A ⊂ V , then there is u ∈ A and v ∈ V \ A. Suppose, for contradiction, that we have some path P from u to v in G. By Proposition 4.2, we get an edge of the path xy ∈∈ δ(A). But δ(A) = ∅, which gives a contradiction. “⇒“ direction. Suppose that G is disconnected. We have to show that there is a proper A ⊂ V with δ(A) = ∅. As G is disconnected, there are u, v ∈ V that are not connected by a path. Define now A = {x ∈ V : u and x are connected in G}. The set A is proper since u ∈ A and v ∈ / A. We claim that δ(A) = ∅, which will finish the proof. Assume that pq ∈ δ(A), then p ∈ A and q ∈ / A. Let P be the walk connecting u to p. Then the walk P q connects u and q, so q ∈ A. A contradiction. Given a graph G(V, E) we say that B ⊂ E extends to a minimum spanning tree if there is a minimum spanning tree whose edge set contains B. The following theorem is the basic tool that makes our algorithms for minimum cost spanning trees run correctly. Theorem 4.4 (Extension theorem). Let G = (V, E) be a graph and c : E → R a cost function. Assume B ⊂ E extends to a minimum spanning tree, D ⊂ E is a cut disjoint from B, and e ∈ D is an edge with minimal cost in D. Then B ∪ e also extends to a minimum spanning tree. Proof. Let T = (V, F ) be the minimum spanning tree with B ⊂ F (which exists by the assumption on B). If e ∈ F , then we are done. So assume e∈ / F . Since the tree T is connected, there is a path P in T that connects the endpoints of e, and so by Proposition 4.2 there is an edge f ∈ D ∩ P with c(f ) ≥ c(e) as e is the cheapest edge in D. By the Exchange lemma, T ∗ (V, F ∪ e \ f ) is a spanning tree, again. Its cost is c(T ∗ ) = c(T ) + c(e) − c(f ) ≤ c(T ). 
So equality holds here and T ∗ is another minimum spanning tree and B ∪ e extends to T ∗ . 5 Jarnik’s algorithm for minimum spanning tree Jarnik’s algorithm grows a tree T by adding a new vertex and edge at each iteration. Here is how it works: Choose a vertex r ∈ V , called the root. Start 2 with V (T ) = {r} and E(T ) = ∅. On each iteration, add to T a least cost edge e ∈ / E(T ) so that T + e remains a tree. Stop when no more edge can be added. Input: A connected graph G = (V, E) and a cost function c : E → R. Output: A minimum cost spanning tree T . Procedure: – Pick some arbitrary r ∈ V (T ), and set V (T ) = {r}, E(T ) = ∅. – Repeat the following: * Find the least cost edge xy ∈ E with x ∈ V (T ), y 6∈ V (T ) (so xy ∈ δ(V (T ))). If no such edge exist, output T * Update V (T ) = V (T ) ∪ {y}, E(T ) = E(T ) ∪ {xy}. Theorem 5.1. Jarnik’s algorithm always outputs a minimum cost spanning tree Proof. We show that at the start of iteration i of the loop the following hold: (i) |V (T )| = i, |E(T )| = i − 1. (ii) Edges of E(T ) are contained in V (T ). (iii) T extends to a minimum cost spanning tree. Proof. This is proved by induction on i. The initial case is i = 1, when |V (T )| = 1, |E(T )| = 0. It is clear that this T extends to a minimum cost spanning tree (since G is connected, it contains some minimum cost spanning tree T 0 = (V, E 0 ). We have E(T ) = ∅ ⊆ E 0 ). For the induction step, suppose (i), (ii), (iii) are true at iteration i ≤ n−1. Let xy be the edge found by the algorithm at iteration i, and let T 0 = (V 0 , E 0 ) with V 0 = V (T ) ∪ {y}, E 0 = E(T ) ∪ {xy} be the graph the algorithm has at iteration i + 1. We need to show that T’ satisfies (i), (ii), (iii). Note that since y 6∈ V (T ), (ii) tells us that xy 6⊆ V (T ). Since |V (T 0 )| = |V (T ) ∪ {y}| = |V (T )| + 1 = i + 1, |E(T 0 )| = |E(T ) ∪ {xy}| = |E(T )| + 1 = (i + 1) − 1, we get that (i) holds at the start of iteration i + 1. Property (ii) holds because xy ⊆ V (T ) ∪ {y} = V (T 0 ) (since x ∈ V (T )). Property (iii) holds for T 0 by the extension theorem — because we have that T extends to a minimum cost spanning tree (by (iii)), we have E(T ) disjoint from the cut δ(V (T )) (by (ii)), and xy the minimum cost edge in the cut δ(V (T )). 3 At the start of iteration n of the loop, by (i) we have that |V (T )| = n, |E(T )| = n − 1, which tells us that V (T ) = V (G). This shows us that δ(V (T )) = ∅, and hence the algorithm terminates outputting this T . By (iii), T extends to some minimum cost spanning tree T 0 , which tells us that V (T ) ⊆ V (T 0 ) and E(T ) ⊆ E(T 0 ). Since T 0 is a spanning tree of G, it has |V (T 0 )| = |V (G)| = n = |V (T )| and |E(T 0 )| = |V (G)| − 1 = n − 1 = E(T ). Hence, we have that T = T 0 i.e. that T is a minimum cost spanning tree. Running time for Jarnik’s algorithm We will give the following upper bound on the running time of Jarnik’s Algorithm. Proposition 5.2. Jarnik’s Algorithm can be run in time O(nm) on a graph with n vertices and m edges. Note that previously we only use “O(n)” notation for referring to functions of only one variable. Here there are two variables n and m, but the meaning is the same. We use f (n, m) = O(g(n, m)) to mean “there is a constant C such that for sufficiently large n, m we have f (n, m) ≤ Cg(n, m). Thus the above proposition can be rephrased as “there is a constant C such that for sufficiently large n and m, the running time of Jarnik’s Algorithm is ≤ Cnm”. Proof. 
The running time of any algorithm depends a bit on the implementation — things like how the input/output is recorded can greatly affect the running time of the algorithm. For the current proposition, we will use the most natural way of encoding the input — the graph G will be inputed as a list of vertices V (G) = {1, 2, . . . , n}, and a list of edges E(G) = {x1 y1 , . . . , xm ym }. The costs of the edges are given by a list of numbers {c1 , . . . , cm } with c(xi yi ) = ci . As the algorithm runs, we will keep track of which vertices and edges are in T . The vertices will be kept track of as follows. Let’s say that we use vertex 1 as a root at the start. We will keep a binary list T = (T1 , . . . , Tn ) of length n, with Ti = 1 if vertex i is in T and Ti = 0 if vertex i is not in T . Thus at the start of the algorithm we have T = (1, 0, . . . , 0), at the end we have T = (1, 1, . . . , 1), with one “0” being turned into a “1” at intermediate iterations. The edges of T will just be kept as a list of edges (so at the ith iteration it will be a list of length i). Input: A connected graph G = (V, E) and a cost function c : E → R. We input these as three lists V = {1, . . . , n}, E = {x1 y1 , . . . , xm ym }, and c = (c1 , . . . , cm ). 4 Output: A minimum cost spanning tree T , whose edges are given by E(T ) = {a1 b1 , . . . , an−1 bn−1 }. Procedure: 1. Set T1 = 1, T2 = 0, . . . , Tn = 0. 2. Set i = 1 3. Repeat the following: – Set min = +∞. – For j = 1tom, repeat the following: * If Txj 6= Tyj and cj < min, then update min = cj and x = xj , y = y j . – If min = +∞, then output E(T ) = {a1 b1 , . . . , an−1 bn−1 }. – Otherwise, set tx = 1, ty = 1, ai = x, bi = y, i = i + 1. At (1), there are n operations. At (2), there is 1 operation. The loop at (3) is repeated n − 1 times (since in total, exactly n − 1 edges are added to get a spanning tree). Inside the “for” loop, there are ≤ 5 operations, so the “for” loop takes ≤ 5m operations in total. Outside the for loop, there are 8 operations. Thus in total, we have ≤ n + 1 + (n − 1)(5m + 8) = 5mn + 9n − 5m − 7 = O(mn) operations. 5 Lecture 5 This week we will look at the minimum cost path problem, whose input will be a directed graph D = (V, E), together with a cost function c : E → R. Recall that a directed graph is one in which edges are ordered pairs of vertices. Most definitions we’ve introduced for undirected graphs extend naturally to directed graphs. We briefly go over the main ones for concreteness: Definition 5.1 (Directed graphs). A directed graph (sometimes called “digraph”) is a pair D = (V, E) such that V is a finite set, and E is a set of ordered pairs of distinct elements of V . Edges of directed graphs are sometimes called “arcs”. In a directed graph, we don’t allow a vertex to be joined to itself (i.e. don’t allow edges uu), and we don’t allow two copies of the same edge. However we do allow for there to be two edges between two vertices as long as they go in opposite directions (i.e. we can have two edges uv and vu). Definition 5.2 (Walks, trails, and paths in directed graphs). In the directed graph D = (V, E) a walk P is an ordered sequence of vertices v0 , v1 , . . . , vk where vi ∈ V and vi−1 vi ∈ E (for all i = 1, . . . , k). The length of a walk is defined as k − 1, which equals the number of edges it goes through (repetitions counted). A trail is a walk which doesn’t repeat edges, and a path is a walk which doesn’t repeat vertices or edges (i.e. a sequence v0 , v1 , . . . 
, vk of distinct vertices with vi−1 vi ∈ E for all i). Note that these three definitions are exactly the same as they were in the directed case (though now it is very important what order the vertices come in each edge vi−1 vi — the edges are directed from the start of the path/walk/trail to the end). We’ll use the notation that x y if there is a walk from x to y in D. As in the undirected case this is equivalent to “there is a path from x to y in D”. Definition 5.3 (Circuits and cycles in directed graphs). In a directed graph D: A closed walk is a walk v1 , . . . , vk with v1 = vk . A circuit is a trail v1 , . . . , vk , v1 and k ≥ 2. A cycle is a walk v1 , . . . , vk , v1 with v1 , . . . , vk and k ≥ 2. These are again almost identical to the undirected case. Note however a key difference in the definition of circuits/cycles — that we only insist on k ≥ 2 (rather than k ≥ 3 as we did in the undirected case). This is because in directed graphs, we think of uv and vu as two distinct edges — therefore in the directed case, the closed walk uvu doesn’t repeat edges and is a circuit (whereas in the undirected case, the closed walk uvu would be going through the edge uv = vu twice and hence not be a circuit). Minimum cost path problem The setup for this problem is that we have a directed graph with a cost function c. The target is to find the minimum cost directed path in a digraph G(V, E) between two specified vertices. Quite often in applications, the vertices represent points in space and the cost represents the distance between two points. With this interpretation the minimum cost path will just be the shortest path between two points — for this reason, this problem is often called the “shortest path problem”. We set up the problem more generally and want to find the minimum cost paths from a fixed vertex, r, called the root, to all other vertices of the graph. Note that together with G and r, aPcost function c : E → R is given. The price of the directed path P = v0 , v1 , . . . , vk is k c(P ) = 1 c(vi−1 vi ). Recall the definition that P is a directed path iff ai−1 ai ∈ E(G) for all i. TASK: Minimum cost path INPUT: Digraph G = (V, E), root r ∈ V , cost function c : E → R OUTPUT: a directed path P from r to every vertex v ∈ V of minimal cost. The following lemma is useful for understanding walks in digraphs with a cost function: Lemma 5.4. Let W be a walk from x to y in a directed graph G with cost function c, then c(W ) = c(P ) + c(C1 ) + · · · + c(Ct ) for some path P from x to y, integer t, and cycles C1 , . . . , Ct . 1 Proof. This is by induction on the number of repeated vertices in W = xv1 . . . vt y. If W has no repeated vertices, then it is a path. Otherwise vi = vj for some i < j. Without loss of generality pick such vi , vj as close together as possible i.e. such that the vertices vi , vi+1 , . . . , vj−1 are all distinct. Then C = vi vi+1 . . . vj−1 v is a cycle, while W 0 = xv1 . . . vi vj+1 . . . vt y is a walk from x to y with less repeated vertices than W . By induction c(W 0 ) = c(P )+c(C1 )+· · ·+c(Ct ) for some path P from x to y and cycles C1 , . . . , Ct . Now c(W ) = c(W 0 ) + c(C) = c(P ) + c(C1 ) + · · · + c(Ct ) + c(C) as required. One difficulty in solving the minimum cost path problem is the presence of negative circuits in the graph i.e. circuits the sum of whose weight is negative. The above lemma shows that this is the same as negative cycles/closed walks: Lemma 5.5. 
The following are equivalent in a directed graph G with cost function c:
(i) G has a negative cost closed walk.
(ii) G has a negative cost circuit.
(iii) G has a negative cost cycle.

Proof. Since cycles are circuits and circuits are closed walks, we clearly have (iii) =⇒ (ii) =⇒ (i). It remains to prove (i) =⇒ (iii). Let W be a closed walk with c(W) < 0. Let x be the first (and so also last) vertex of W. Apply Lemma 5.4 to get c(W) = c(P) + c(C1) + · · · + c(Ct) for some path P from x to x, integer t, and cycles C1, . . . , Ct. Since P is a path from x to x, it must just be P = x, giving c(P) = 0. Thus 0 > c(W) = c(C1) + · · · + c(Ct), which shows that c(Ci) < 0 for some i.

The following lemma gives a characterization of graphs without negative circuits in terms of minimum cost paths/walks.

Lemma 5.6. Suppose that we have a connected directed graph G and a cost function c : E(G) → R. The following are equivalent:
(i) G has a minimum cost walk from x to y for every x, y with x ⇝ y. Moreover there exists such a minimum cost walk which is a path.
(ii) G has no negative circuits.

Proof. For (i) =⇒ (ii): suppose that G has a minimum cost walk between every pair of vertices. Consider a circuit C = x1 x2 . . . xk x1. Define Pt to be the walk from x1 to x1 defined by Pt = x1 x2 . . . xk x1 x2 . . . xk . . . x1 x2 . . . xk x1, where the sequence repeats t times. The cost is c(Pt) = t(c(x1x2) + c(x2x3) + · · · + c(xkx1)) = tc(C). By assumption G has a minimum cost walk W from x1 to x1. Since W is minimum cost we have c(W) ≤ c(Pt) = tc(C) for all t. This can only happen if c(C) ≥ 0 (since otherwise the sequence {tc(C) : t = 0, 1, 2, . . . } would tend to −∞).

For (ii) =⇒ (i): suppose that G has no negative circuits. Let x, y be vertices with x ⇝ y. Let P be a minimum cost path from x to y (it exists because there are finitely many paths from x to y). If P is not a minimum cost walk, then there is some walk W with c(W) < c(P). Use Lemma 5.4 to get a path P′, integer t, and cycles C1, . . . , Ct for which c(W) = c(P′) + c(C1) + · · · + c(Ct). Since there are no negative circuits, we have c(Ci) ≥ 0 for all i, giving c(W) ≥ c(P′). But then c(P) > c(W) ≥ c(P′), contradicting P being a minimum cost path.

Because of the above lemma we generally only solve the minimum cost path problem on a graph which doesn’t have negative circuits. Our basic goal this week is to find an algorithm that will do the following:
Input: a directed graph G, a cost function c : E(G) → R with no negative circuits, and a vertex r for which r ⇝ y for all y ∈ V(G).
Output: for all y, a path Py going from r to y of minimum cost.

Potentials and predecessor maps

Our basic approach to this rests on the following observation: assume that we have an r to v directed path of cost fv, and an r to w dipath of cost fw. If fw > fv + c(vw), then there is another r to w path which is cheaper than fw, namely the r to v path of cost fv appended with the arc vw. This path is of cost fv + c(vw).

Definition 5.7. A feasible potential is a function f : V → R (or, equivalently, an assignment of a number fv to every v ∈ V) such that fr = 0 and fw ≤ fv + c(vw) for every vw ∈ E. (∗)

Lemma 5.8. Assume f is a feasible potential and P is a directed path from r to v. Then c(P) ≥ fv.

Proof. As P = v0, v1, . . . , vk with v0 = r and vk = v, we have c(P) = Σ_{i=0}^{k−1} c(vi vi+1) ≥ Σ_{i=0}^{k−1} (fvi+1 − fvi) = fvk − fv0 = fv.

Corollary 5.9.
If f is a feasible potential and c(P ) ≤ fv for some r − v path P , then P is a minimum cost r − v path (and in fact we have c(P ) = fv ). Thus to solve the minimum cost path problem it is sufficient to find a feasible potential f and a collection of paths {Pv : v ∈ V (D)}, with the property that Pv goes from the root r to v and satisfies c(Pv ) = fv . Our algorithms will present their outputs in a more efficient way — they will produce a feasible potential, and something called a predecessor function. Definition 5.10. A predecessor map on a directed graph D = (V, E) with root r, is a function p : V \ {r} → V , such that for each v ∈ V \ {r}, we have p(v)v ∈ E and the set of such edges {p(v)v : v ∈ V \ {r}} contains no (directed) cycles. Given a vertex v, one can define a sequence of predecessors v, p(v), p(p(v)) . . . . This sequence cannot be infinite (since otherwise it would repeat some vertices and contain a cycle), and so must terminate. The only way it can terminate is if some p(p(. . . p(v) . . . )) equals the root r (since the root is the only vertex with no predecessor). Thus any predecessor function defines a collection of paths from r to all the vertices of D. Our algorithms for the minimum cost path problem will output a feasible potential f and a predecessor function satisfying fv = fp(v) + c(p(v)v) for all v 6= r. Lemma 5.11. Let D = (V, E) be a directed graph, C : E → R a cost function, and r ∈ V a root. Suppose that we have a feasible potential f and a predecessor function satisfying fv = fp(v) + c(p(v)v) for all v 6= r. Then for each vertex v, the sequence v, p(v), p(p(v)) . . . gives a minimum cost path from r to v. Proof. As we saw above, the sequence v, p(v), p(p(v)) . . . written in reverse gives some path Pv : r = v1 , . . . , vk = v (i.e. with p(vi ) = vi−1 for all i). We have c(Pv ) = c(v1 v2 ) + c(v2 v3 ) · · · + c(vk−1 vk ) = c(p(v2 )v2 ) + c(p(v3 )v3 ) + · · · + c(p(vk )vk ). Using the equation fv = fp(v) + c(p(v)v) repeatedly gives c(p(v2 )v2 ) = fv2 − fp(v2 ) = fv2 − fr = fv2 c(p(v3 )v3 ) = fv3 − fp(v3 ) = fv3 − fv2 c(p(v4 )v4 ) = fv4 − fp(v4 ) = fv4 − fv3 .. . c(p(vk )vk ) = fvk − fp(vk ) = fvk − fvk−1 Adding up all of these gives c(Pv ) = fvk = fv . By Corollary 5.9, Pv gives a minimum cost path from r to v. 3 Ford’s algorithm Next we describe Ford’s algorithm for the minimum cost path problem. The target is to find a feasible potential and the corresponding predecessor map. Input: A directed graph D = (V, E), a cost function c : E → R, and a root r ∈ V . Output: If G has no negative circuits and r y for every y ∈ V , then we output a feasible potential f and a predecessor map p satisfying Lemma 5.11. Procedure: – Set fr = 0, and fv = +∞ for all other vertices. – Set p(v) =“undefined” for all v. – Repeat the following: * Check if there is an edge xy with fy > fx + c(xy). * If there is such an edge, then update fy = fx + c(xy) and p(y) = x. * If there is no such edge, then output f, p. One big difference between Ford’s Algorithm and previous algorithms that we’ve looked at is that Ford’s Algorithm is not guaranteed to terminate. In fact, when the graph has negative circuits, then usually Ford’s Algorithm will not terminate. What we’ll aim to prove is that when there are no negative circuits, then Ford’s Algorithm does terminate, and the potential at each vertex gives the cost of the minimimum cost path from the root to that vertex. First notice the following observation. Observation 5.12. 
For each vertex y, as Ford’s algorithm runs, the potential fy only decreases, never increases. This is true simply because the algorithm has no mechanism for increasing the potential of a vertex. The following is also immediate:

Observation 5.13. If Ford’s algorithm terminates then for every edge xy we have fy ≤ fx + c(xy).

Observation 5.14. For y ≠ r, at any point of the algorithm, we have fy finite ⇐⇒ p(y) is defined.

This is true just because fy and p(y) are only ever updated together — if we change one of these from their initial value, then we change the other too. The following lemma gives us important information about what happens in Ford’s algorithm while it runs.

Lemma 5.15 (Running of Ford’s Algorithm). At step k of Ford’s algorithm, the following are true for all vertices y.
(i) fy ≥ fp(y) + c(p(y)y) for all y ≠ r with fy < ∞.
(ii) If fy < ∞, then fy equals the cost of some walk from r to y.
(iii) If there are no negative circuits, then there is no cycle v1 v2 . . . vk v1 with vi = p(vi+1) for all i (and vk = p(v1)).

Proof. (i) If fy = ∞, then there is nothing to check, so suppose that fy is finite. Then the last edge checked ending at y must have been p(y)y (since this is the only way p(y) could have been set to its value). At this step, fy was set to fp(y) + c(p(y)y), i.e. at that moment we had fy = fp(y) + c(p(y)y). Following this, fy hasn’t changed, while fp(y) only decreased (by Observation 5.12), i.e. we have fy ≥ fp(y) + c(p(y)y).

(ii) This is proved by induction on k. In the initial case k = 0, the only vertex with fx < ∞ is the root r which has fr = 0. Notice that the walk W = r is a cost 0 walk from r to r — showing that the initial case is true. Suppose that at step k − 1, there is a walk of cost fv from r to v for all v with fv < ∞. Let xy be the edge that was checked at step k. If we had fy ≤ fx + c(xy), then nothing changes in the graph (and so the claim is true by induction), so suppose that fy > fx + c(xy). Then at this step we set fy = fx + c(xy). By induction, there is a walk W from r to x of cost c(W) = fx. Adding the vertex y at the end of this walk gives a new walk whose cost is c(W) + c(xy) = fx + c(xy) = fy. This proves the induction step.

(iii) Let H(t) be the directed graph consisting of the edges p(x)x at time t. Suppose for contradiction that some H(t) contains a cycle. Let t be the first timestep when H(t) contains a cycle, i.e. suppose that H(t − 1) does not contain any cycles. Let xy be the edge that was checked at time t. Since the graph changed from H(t−1) to H(t), we must have fy(t−1) > fx(t−1) + c(xy) and fy(t) = fx(t−1) + c(xy) = fx(t) + c(xy). Also H(t) contains a cycle C, which wasn’t present in H(t−1), and so C must contain the edge xy. Let C = v1 v2 . . . vk v1 where vk = x, v1 = y. All edges of C other than vk v1 = xy were already present in H(t−1), so by part (i) (applied at time t−1) we have fvi+1(t−1) ≥ fvi(t−1) + c(vi vi+1) for all i = 1, . . . , k−1. We also have fv1(t−1) > fvk(t−1) + c(vk v1) (this is just fy(t−1) > fx(t−1) + c(xy)). Adding up all these inequalities gives Σ_{i=1}^{k} fvi(t−1) > Σ_{i=1}^{k} fvi(t−1) + c(v1v2) + · · · + c(vk−1vk) + c(vkv1), i.e. c(C) < 0.

An immediate corollary is the following:

Corollary 5.16. If there are no negative circuits, then the potential of the root r, fr, never changes (i.e. we have fr = 0 throughout the algorithm).

Proof. Suppose that fr ≠ 0 at some point. Since potentials only ever decrease, this means that fr < 0 at some point. By Lemma 5.15, this gives us a walk W from r to r with c(W) < 0. But this is a negative cost closed walk, so by Lemma 5.5, there would be negative circuits too.
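Before turning to correctness and termination, here is a minimal Python sketch of Ford's algorithm exactly as stated above. The list-of-triples representation of the edges and costs is just an illustrative choice, not something fixed by the lecture notes.

import math

def ford(vertices, edges, r):
    # edges is a list of triples (x, y, cost) representing the arcs xy with cost c(xy)
    f = {v: math.inf for v in vertices}   # potentials: f_v = +infinity initially ...
    p = {v: None for v in vertices}       # predecessor map: None means "undefined"
    f[r] = 0                              # ... except f_r = 0
    while True:
        for (x, y, cost) in edges:
            if f[x] + cost < f[y]:        # an edge xy with f_y > f_x + c(xy)
                f[y] = f[x] + cost
                p[y] = x
                break
        else:                             # no such edge exists: output f and p
            return f, p

Note that, just as discussed above, nothing in this sketch guarantees termination: on a graph with a negative circuit the while loop can run forever.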
Using the above, we can prove that when Ford’s algorithm terminates, then it gives the correct answer.

Lemma 5.17 (Correctness of Ford’s Algorithm). Let G be a graph with no negative circuits and r ⇝ y for all y ∈ V. If Ford’s algorithm terminates on G, then the following are true for all vertices y ≠ r.
(i) fy < ∞.
(ii) fy = fp(y) + c(p(y)y).
(iii) f is a feasible potential.
(iv) The sequence y, p(y), p(p(y)) . . . defines a path P from r to y. This path is a minimum cost path from r to y and has c(P) = fy.

Proof. (i) Suppose that fy = ∞, and consider some path P : r = v1, . . . , vk = y. Let vi be the first vertex on this path with fvi = ∞. Since fr is finite, we have that i ≥ 2, and so fvi−1 < ∞. But then ∞ = fvi > fvi−1 + c(vi−1vi), contradicting Observation 5.13.

(ii) For the algorithm to terminate we must have fy ≤ fx + c(xy) for all edges xy. Using this with x = p(y) we have fy ≤ fp(y) + c(p(y)y). But from Lemma 5.15, we also have fy ≥ fp(y) + c(p(y)y), which gives fy = fp(y) + c(p(y)y).

(iii) This is just a combination of Observation 5.13 and Corollary 5.16.

(iv) Parts (i) – (iii) together with part (iii) of Lemma 5.15 allow us to apply Lemma 5.11, which tells us that y, p(y), p(p(y)) . . . defines a minimum cost path P from r to y.

Next we go on to show that the algorithm does indeed terminate as long as there are no negative circuits. This is easy to show when the costs of edges are all integers.

Lemma 5.18 (Termination of Ford’s Algorithm, when costs are integers). Let G be a graph with no negative circuits such that all the costs are integers. Then for any r, running Ford’s algorithm with root r will terminate after finitely many steps.

Proof. Suppose that the algorithm never terminates. Then there is some vertex y whose potential keeps decreasing, i.e. there is an infinite sequence of steps t1 < t2 < . . . such that fy(t1) > fy(t2) > fy(t3) > . . . . But by Lemma 5.15, fy(ti) is the cost of some walk in G and so an integer. So fy(t1) > fy(t2) > fy(t3) > . . . is a sequence of decreasing integers and so tends to −∞. But by Lemma 5.15, fy(ti) is always bounded below by the cost of a minimum cost walk, and so cannot tend to −∞, a contradiction.

For the full result, we need to use Lemma 5.4.

Lemma 5.19 (Termination of Ford’s Algorithm). Let G be a graph with no negative circuits. Then for any r, running Ford’s algorithm with root r will terminate after finitely many steps.

Proof. Suppose that the algorithm never terminates. Then there is some vertex y whose potential keeps decreasing, i.e. there is an infinite sequence of steps t1 < t2 < . . . such that fy(t1) > fy(t2) > fy(t3) > . . . . By Lemma 5.15 (ii), we get an infinite sequence of walks W1, W2, W3, . . . from r to y with c(Wi) = fy(ti). By Lemma 5.4, c(Wi) = c(Pi) + c(Ci,1) + · · · + c(Ci,si) for some path Pi and cycles Ci,1, . . . , Ci,si. Without loss of generality, we can suppose that none of these cycles have zero cost (otherwise just remove them from the sum), and so c(Ci,j) > 0 always. There are two cases:

The sequence {si} is bounded above. Since there are then only finitely many possible collections consisting of a path from r to y together with a bounded number of cycles, by the Pigeonhole Principle, for some i ≠ j we must have Pi = Pj and Ci,t = Cj,t for all t. But then c(Wi) = c(Wj), contradicting c(Wj) = fy(tj) < fy(ti) = c(Wi).

The sequence {si} is not bounded above. Let m be the minimum non-zero cost of a cycle in G, and let L be the minimum cost of a path from r to y (both exist since there are only finitely many cycles and paths). We have c(Wi) ≥ L + m·si. Since si is unbounded, for some i, L + m·si > fy(t1). But then c(Wi) ≥ L + m·si > fy(t1) > fy(ti) = c(Wi), a contradiction.
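To illustrate Lemma 5.11, the predecessor map returned by the ford() sketch above can be unwound into explicit minimum cost paths. The small graph below is made-up data, used only to show the calling convention.

def path_from_predecessors(p, r, v):
    # follow v, p(v), p(p(v)), ... back to the root r, then reverse (as in Lemma 5.11)
    path = [v]
    while path[-1] != r:
        path.append(p[path[-1]])
    return list(reversed(path))

V = ["r", "a", "b"]
E = [("r", "a", 2), ("a", "b", -1), ("r", "b", 3)]   # no negative circuits
f, p = ford(V, E, "r")
print(f["b"])                                   # 1, the minimum cost of an r-b path
print(path_from_predecessors(p, "r", "b"))      # ['r', 'a', 'b']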
Lecture 6

Maximum flows

Consider the following informal problem: you have a graph/network and want to transport something from point r to point s (e.g. it could be a road network and you want to transfer goods between two cities. Or it could be a network of pipes and you want to pump oil between two locations). What is the most efficient way to route the flow through the network? Here “efficient” could mean “the way that allows you to transfer the most material from r to s per hour”. This is called the maximum flow problem, and is what we’ll look at this week. Let’s try to model this mathematically. The input will look as follows:

Input: Directed graph G = (V, E), capacity function c : E → R+, and two vertices r (the source) and s (the sink).

[Figure: an example network with source r, sink s, intermediate vertices a, b, c, d, e, and the capacity of each edge written next to it.]

Here the capacity function encodes how much material can pass through an edge at any time. The output that we will search for is a flow — a function x : E → R, which tells us how much material we should put through every edge. The flow should satisfy two properties:

Conservation law: at every vertex v ≠ r, s the inflow equals the outflow, i.e. Σ_{z∈N−(v)} x(zv) = Σ_{y∈N+(v)} x(vy).

Feasibility: for every edge, 0 ≤ x(uv) ≤ c(uv).

If a function x : E → R satisfies both of the above, we call it a “feasible flow”. Here N−(v) denotes the in-neighbourhood of v, i.e. N−(v) := {z ∈ V : zv ∈ E}, while N+(v) denotes the out-neighbourhood of v, i.e. N+(v) := {y ∈ V : vy ∈ E}. An example of a feasible flow for the above diagram would be to let x(rb) = 1, x(be) = 1, x(ra) = 1, x(ae) = 1, x(es) = 2, and x = 0 on all other edges.

The conservation law can be stated more concisely as “the net flow at every vertex other than the source and the sink equals zero”. Here the net flow at a vertex is defined as fx(v) := Σ_{z∈N−(v)} x(zv) − Σ_{y∈N+(v)} x(vy). We also define the inflow at a vertex v as fx^−(v) = Σ_{z∈N−(v)} x(zv) and the outflow at a vertex v as Σ_{y∈N+(v)} x(vy). The function we will try to maximize is the total flow — this is defined as the net flow at the sink, fx := fx(s) = Σ_{z∈N−(s)} x(zs) − Σ_{y∈N+(s)} x(sy). Now we can formally state the maximum flow problem:

Task 4.1 (Maximum flow).
Input: Directed graph G = (V, E), capacity function c : E → R+, and two vertices r (the source) and s (the sink).
Output: A feasible flow x : E → R with fx as large as possible.

This task is called Maximum Flow or MaxFlow and is often encountered in practice, for instance when one wants to push through a pipe system as much oil as possible. Another typical example is a road system where the capacity of each road is known and cars want to travel on the system and the question is how many cars can be used on the system. Or in an electric network the electrons run through the wires, entering the network at point r and leaving it at point s. The capacity of each wire is known and one wants to know the maximum number of electrons that one can push through the network in one unit of time.

Cuts in directed graphs

It turns out that the MaxFlow problem is closely related to something called the MinCut problem — the problem of finding the smallest cut in a directed graph. We next explain this problem. First, we introduce a convenient notation: for a set of edges F we write x(F) = Σ_{e∈F} x(e) and also c(F) = Σ_{e∈F} c(e).

Definition 4.2. Given R ⊂ V the set δ(R) = {vw ∈ E : v ∈ R, w ∉ R} is called a cut, or the cut of R. We say that δ(R) is an r-s cut if r ∈ R and s ∉ R.

This is the directed analogue of a cut in an undirected graph. We write R̄ = V \ R for the complement of R.
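As a quick illustration of these definitions, here is a small Python sketch that checks the conservation law and feasibility of a flow and computes the capacity of a cut. The dictionary encoding of edges and the particular capacities below are made up for illustration (the diagram above is only schematic); they are not part of the notes.

def net_flow(x, v):
    # f_x(v): inflow minus outflow at v
    inflow = sum(val for (a, b), val in x.items() if b == v)
    outflow = sum(val for (a, b), val in x.items() if a == v)
    return inflow - outflow

def is_feasible_flow(x, c, r, s, vertices):
    capacity_ok = all(0 <= x.get(e, 0) <= c[e] for e in c)               # feasibility
    conservation_ok = all(net_flow(x, v) == 0 for v in vertices if v not in (r, s))
    return capacity_ok and conservation_ok

def cut_capacity(c, R):
    # c(delta(R)): total capacity of the edges leaving R
    return sum(val for (u, v), val in c.items() if u in R and v not in R)

c = {("r", "a"): 1, ("r", "b"): 3, ("a", "e"): 1, ("b", "e"): 2, ("e", "s"): 7}
x = {("r", "a"): 1, ("r", "b"): 1, ("a", "e"): 1, ("b", "e"): 1, ("e", "s"): 2}
print(is_feasible_flow(x, c, "r", "s", {"r", "a", "b", "e", "s"}))   # True
print(net_flow(x, "s"))                                              # total flow f_x = 2
print(cut_capacity(c, {"r"}))                                        # c(delta({r})) = 4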
Note that for such an r-s cut δ(R), and a flow x satisfying the conservation law, Σ_{v∈R} fx(v) = fx(r) and Σ_{v∈R̄} fx(v) = fx(s) (since by definition of “satisfying the conservation law”, every term in these sums equals zero except fx(r) and fx(s)).

Lemma 4.3. For every feasible r − s flow x, and for every r − s cut δ(R), fx = x(δ(R)) − x(δ(R̄)).

Proof. fx = fx(s) = Σ_{v∈R̄} fx(v) = Σ_{v∈R̄, wv∈E} x(wv) − Σ_{v∈R̄, vu∈E} x(vu) = x(δ(R)) − x(δ(R̄)), since every edge with both endpoints in R̄ appears once in each sum and so cancels, every edge of δ(R) appears only in the first sum, and every edge of δ(R̄) appears only in the second sum.

We get the following corollary:

Corollary 4.4. For every feasible r − s flow x, and for every r − s cut δ(R), fx ≤ c(δ(R)).

Proof. x(δ(R)) ≤ c(δ(R)) by the capacity constraint and x(δ(R̄)) is always non-negative. So by the previous lemma, we get fx = x(δ(R)) − x(δ(R̄)) ≤ x(δ(R)) ≤ c(δ(R)).

This has the following important implication. If you find a feasible r − s flow x and an r − s cut δ(R) with fx = c(δ(R)), then x is a maximum flow. The MaxFlow task is solved without any further work. In 1956 Ford and Fulkerson proved the so-called Maxflow-Mincut theorem showing that, quite generally, the maximum flow equals the capacity of the minimum cut.

Theorem 4.5 (Max flow-min cut). If there is a maximum flow, then max{fx : x is a feasible flow} = min{c(δ(R)) : δ(R) is an r − s cut}.

Here is the basic idea of the proof. If there is an r − s dipath P with x(e) < c(e) for all e on P, then one can increase the flow value by a positive amount, namely, by min{c(e) − x(e) : e ∈ P}. But this simple idea does not quite work, we need a modification. If there is an undirected r − s path P such that x(e) < c(e) on forward arcs and x(e) > 0 on backward arcs, then one can increase the flow value again.

Definition 4.6. Let G be a digraph, c a capacity function, r, s ∈ V(G) a source and sink, and x be a feasible flow. Let P : r = v1, v2, . . . , vk be a sequence of distinct vertices. We say that P is an x-incrementing path if for each i = 1, . . . , k − 1, we have one of:
(1) vi vi+1 ∈ E and c(vi vi+1) > x(vi vi+1). These are called “forwards edges”.
(2) vi+1 vi ∈ E and x(vi+1 vi) > 0. These are called “backwards edges”.
If vk = s, then we call P an x-augmenting path.

Lemma 4.7. If there is an x-augmenting path, then the flow x is not maximum.

Proof. Let P be an x-augmenting path P : r = v1, v2, . . . , vk = s. Let ε be the minimum of min over forward edges vi vi+1 of (c(vi vi+1) − x(vi vi+1)) and min over backwards edges vi+1 vi of x(vi+1 vi). Note that ε exists and is > 0 (since it is the minimum of a finite set of numbers all of which are positive). Now define a new flow x′ with x′(vi vi+1) = x(vi vi+1) + ε for forwards edges, x′(vi+1 vi) = x(vi+1 vi) − ε for backwards edges, and x′(e) = x(e) for all other edges. The following claim will show that x′ has greater total flow than x (and hence show that x was not maximum, proving the lemma).

Claim 4.8. x′ is a feasible flow of total flow fx′ = fx + ε.

Proof. To see the conservation law for x′ consider some vertex v ≠ r, s. Since x satisfied the conservation law, we have fx(v) = Σ_{u∈N−(v)} x(uv) − Σ_{u∈N+(v)} x(vu) = 0. If v ∉ P, then x′(e) = x(e) for all edges through v, so fx′(v) = fx(v) = 0. So suppose that v = vi for some i ≠ 1, k. Then there are two edges of the path through vi (namely one with vertices {vi−1, vi}, and one with vertices {vi, vi+1}). There are several cases depending on whether they are forwards/backwards edges.

If vi−1 vi and vi vi+1 are forwards edges, then x′(vi−1 vi) = x(vi−1 vi) + ε and x′(vi vi+1) = x(vi vi+1) + ε, giving fx′(vi) = fx(vi) − x(vi−1 vi) + x′(vi−1 vi) + x(vi vi+1) − x′(vi vi+1) = fx(vi) + ε − ε = fx(vi) = 0.
If vi vi−1 and vi+1 vi are backwards edges, then x′(vi vi−1) = x(vi vi−1) − ε and x′(vi+1 vi) = x(vi+1 vi) − ε, giving fx′(vi) = fx(vi) + x(vi vi−1) − x′(vi vi−1) − x(vi+1 vi) + x′(vi+1 vi) = fx(vi) + ε − ε = fx(vi) = 0.

If vi−1 vi is a forwards edge and vi+1 vi is a backwards edge, then x′(vi−1 vi) = x(vi−1 vi) + ε and x′(vi+1 vi) = x(vi+1 vi) − ε, giving fx′(vi) = fx(vi) − x(vi−1 vi) + x′(vi−1 vi) − x(vi+1 vi) + x′(vi+1 vi) = fx(vi) + ε − ε = fx(vi) = 0.

If vi vi−1 is a backwards edge and vi vi+1 is a forwards edge, then x′(vi vi−1) = x(vi vi−1) − ε and x′(vi vi+1) = x(vi vi+1) + ε, giving fx′(vi) = fx(vi) + x(vi vi−1) − x′(vi vi−1) + x(vi vi+1) − x′(vi vi+1) = fx(vi) + ε − ε = fx(vi) = 0.

To see that x′(uv) ≥ 0 for all edges uv, note that the only edges whose flow decreased are backwards edges, where it went from x(uv) ≥ ε to x′(uv) = x(uv) − ε ≥ 0 (and so is still non-negative). Similarly, to see that x′(uv) ≤ c(uv) for all edges uv, note that the only edges whose flow increased are forwards edges, where it went from x(uv) ≤ c(uv) − ε to x′(uv) = x(uv) + ε ≤ c(uv).

Finally, to work out the total flow: consider the sink s. It is contained in precisely one edge e of P (the one with vertices {vk, vk−1}). If e is a forward edge, then the inflow at s increases by ε. If e is a backwards edge, then the outflow at s decreases by ε. In either case, the net flow at s increases by ε, giving the result.

Proof of Max-Flow/Min-Cut Theorem. From Corollary 4.4, we have max{fx : x is a feasible flow} ≤ min{c(δ(R)) : δ(R) is an r − s cut}. It remains to prove the “≥” direction. Consider a maximum flow x, i.e. a feasible flow x with fx = max{fx : x is a feasible flow}. Let R = {v ∈ V : there is an r to v x-incrementing path}. We have that r ∈ R (since P = r satisfies the definition of “x-incrementing”). We also have s ∉ R (since there is no x-augmenting path, as otherwise we have a contradiction to x being a maximum flow from Lemma 4.7).

Consider some edge uv ∈ δ(R). Then u ∈ R and v ∉ R. By definition of R, we have an x-incrementing path P : r = v1, . . . , vk, u. Also by definition of R we must have v1, . . . , vk ∈ R (since a subpath of an incrementing path is an incrementing path). If x(uv) < c(uv), then the path P′ : r = v1, . . . , vk, u, v is also an incrementing path. This would give v ∈ R, contradicting “v ∉ R”. So in fact we know that x(uv) = c(uv) for all uv ∈ δ(R).

Consider some edge uv ∈ δ(R̄). Then u ∉ R and v ∈ R. By definition of R, we have an x-incrementing path P : r = v1, . . . , vk, v. Also by definition of R we must have v1, . . . , vk ∈ R (since a subpath of an incrementing path is an incrementing path). If x(uv) > 0, then the path P′ : r = v1, . . . , vk, v, u is also an incrementing path. This would give u ∈ R, contradicting “u ∉ R”. So in fact we know that x(uv) = 0 for all uv ∈ δ(R̄).

Summarizing the last two paragraphs — we’ve shown that x(δ(R)) = c(δ(R)) and x(δ(R̄)) = 0. By Lemma 4.3, we get fx = x(δ(R)) − x(δ(R̄)) = c(δ(R)), i.e. we have found a cut δ(R) with c(δ(R)) = fx = max{fx : x is a feasible flow}. This shows that max{fx : x is a feasible flow} ≥ min{c(δ(R)) : δ(R) is an r − s cut} as required.

Examining the above proof, we see that it also yields the following two statements (which are each essentially equivalent to the MaxFlow-MinCut Theorem).

Corollary 4.9. A feasible flow is maximal iff there is no augmenting path.

Corollary 4.10. Suppose x is a feasible r − s flow and δ(R) is an r − s cut.
Then x is a maximal flow and δ(R) is a minimal cut if and only if x(e) = c(e) for every e ∈ δ(R) and x(e) = 0 for every e ∈ δ(R̄).

Ford-Fulkerson Algorithm

Using the corollaries we can check whether in the example the flow x of value 3 (the sum of the three path-flows) is maximal or not. Is there an x-augmenting path with this flow x? Yes, there is, namely the path r, c, b, a, d, s where ba is a backward arc and all other arcs are forward. This suggests the following algorithm (called the “Ford-Fulkerson Algorithm”) for solving the maximum flow problem:

Input: Directed graph G = (V, E), capacity function c : E → R+, and two vertices r (the source) and s (the sink).
Output: A maximum flow x : E → R.
Procedure:
– Start with x(uv) = 0 for all uv.
– Repeat the following:
(∗) Find an x-augmenting path P : r = v1, v2, . . . , vk = s (if there is one).
- If there is no x-augmenting path, output x.
- Otherwise, let ε be the minimum of min over forward edges vi vi+1 of (c(vi vi+1) − x(vi vi+1)) and min over backwards edges vi+1 vi of x(vi+1 vi).
- For all forwards edges, update x(vi vi+1) = x(vi vi+1) + ε.
- For all backwards edges, update x(vi+1 vi) = x(vi+1 vi) − ε.

This algorithm is somewhat informally stated (in particular it is unclear how to perform step (∗)) — however, we shall analyse it as stated to keep things simple. From the previous section, it is easy to see that if the algorithm terminates, then its output x is a maximum flow: indeed when it terminates we know that there are no x-augmenting paths, and so Corollary 4.9 tells us that x is maximum. It is less clear that the algorithm terminates at all though. We’ll prove that it does under the additional assumption that all capacities c(uv) are integers.

Theorem 4.11. Suppose that all capacities are integers. Then the Ford-Fulkerson algorithm terminates and outputs a flow x with x(uv) an integer for all edges uv.

Proof. We’ll first establish:

(+) Throughout the Ford-Fulkerson algorithm x(uv) is an integer for all edges uv.

Let xt be the flow at iteration t of (∗). We’ll show that xt(uv) is always an integer by induction on t. For t = 0, we have x0(uv) = 0 ∈ Z. Suppose that xt(uv) ∈ Z for some t. Consider the number ε we define at this iteration. We defined ε to be “the minimum of min over forward edges vi vi+1 of (c(vi vi+1) − xt(vi vi+1)) and min over backwards edges vi+1 vi of xt(vi+1 vi)”. Note that all numbers involved here (i.e. c(vi vi+1), xt(vi vi+1), xt(vi+1 vi)) are integers. Thus ε is an arithmetic combination of integers — and hence an integer itself. At step t + 1, we have xt+1(uv) ∈ {xt(uv), xt(uv) + ε, xt(uv) − ε}, and so we get that xt+1(uv) ∈ Z, completing the proof of (+).

Next note that the sequence of total flows fxt is strictly increasing (since each is constructed by increasing/decreasing values along an augmenting path as in the proof of Lemma 4.7). But fxt is bounded above (e.g. by the sum of the capacities of all the edges in G). Thus we have that {fxt} is a bounded sequence of strictly increasing integers — which means it is a finite sequence, i.e. the algorithm terminates at some point.

The above proof has the important corollary that if all capacities are integers, then there exists a maximum flow in which the flow along every edge is an integer. In real world optimization problems we often have the phenomenon that we want all variables of the output to be whole numbers (e.g. we want a maximum flow along a road network, we don’t want to send half a car along some road).
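Step (∗) above is left unspecified; one standard way to make it concrete (an implementation choice, not part of the analysis above) is to search for an x-augmenting path by breadth-first search over forward and backward edges. Here is a Python sketch under that assumption, run on made-up integer capacities so that, as in Theorem 4.11, the flow stays integral.

from collections import deque

def ford_fulkerson(c, r, s):
    # c is a dict of capacities keyed by directed edges (u, v); returns a maximum flow x
    x = {e: 0 for e in c}
    while True:
        pred = {r: None}                      # breadth-first search for an x-augmenting path
        queue = deque([r])
        while queue and s not in pred:
            u = queue.popleft()
            for (a, b) in c:
                if a == u and b not in pred and x[(a, b)] < c[(a, b)]:   # forward edge
                    pred[b] = (a, b, "forward"); queue.append(b)
                if b == u and a not in pred and x[(a, b)] > 0:           # backward edge
                    pred[a] = (a, b, "backward"); queue.append(a)
        if s not in pred:                     # no augmenting path: x is maximum (Cor. 4.9)
            return x
        path, v = [], s                       # trace the path back from s and find epsilon
        while pred[v] is not None:
            a, b, kind = pred[v]
            path.append((a, b, kind))
            v = a if kind == "forward" else b
        eps = min(c[(a, b)] - x[(a, b)] if kind == "forward" else x[(a, b)]
                  for (a, b, kind) in path)
        for (a, b, kind) in path:
            x[(a, b)] += eps if kind == "forward" else -eps

caps = {("r", "a"): 1, ("r", "b"): 3, ("a", "s"): 2, ("b", "s"): 1}
print(ford_fulkerson(caps, "r", "s"))   # an integral maximum flow of total value 2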
Modelling problems as flow problems is a general method for guaranteeing the solution doesn’t involve any fractions. 7 Lecture 7 Matchings and covers in graphs Consider the following informal problem: given a collection of employers and job applicants, find an “optimal” pairing between the employers and the applicants. “Optimal” can mean different things in different context — it could mean simply allocating as many people jobs as possible. If the applicants give a ranking of what jobs they prefer the most, “optimal” could also mean taking into account their rankings in some way. We’ll first consider the case when applicants simply give a list of which jobs they would be happy with and which they wouldn’t — and our goal is to allocate as many applicants to jobs as possible. How do we model this mathematically. The setting will be a bipartite graph G — this is defined as a graph whose vertex set is V = A ∪ B for disjoint sets A, B and where all edges are of the form ab with a ∈ A and b ∈ B. The sets A, B are called the “parts” or “bipartition classes” of the bipartite graph. In our application, we would think of A as representing the applicants and B as the jobs — with edges ab representing when applicant a applies to job b. The structure that we look for in bipartite graphs is a matching: Definition 6.1. Let G = (V, E) be an (undirected) graph. M ⊂ E is a matching if e, f ∈ M implies e ∩ f = ∅. i.e. a matching is a collection of disjoint edges. Our goal is to understand the following problem Task 6.2 (Maximum matching). INPUT: graph G(V, E) OUTPUT: a matching M ⊂ E with as many edges as possible. This problem is closely connected to another one — the minimum cover problem. Definition 6.3. Let G = (V, E) be an (undirected) graph.C ⊂ V is called a cover if C ∩ e 6= ∅ for every e ∈ E. That is, a cover meets every edge of G. We’ll use e(M ) to denote the number of edges in a matching and v(C) to denote the number of vertices in a cover. A cover contains at least one vertex from every edge. This implies the following basic relationship between the above two definitions. 1 Fact 6.4. If M is a matching and C is a cover, then e(M ) ≤ v(C). Proof. Since C is a cover, each edge of M contains (at least) one vertex of C. Since the edges of M are disjoint, we get at least e(M ) vertices in C. We have seen around the MaxFlow-MinCut theorem how beneficial is when the maximum of one task equals the minimum of another. It is not true that the optima of the above tasks coincide in general. But in the case of bipartite graph they do coincide. Theorem 6.5 (König’s Theorem). In a bipartite graph G max{|M | : M is a matching } = min{|C| : C is a cover }. This again has the important implication that if M is a matching and C is a cover in a bipartite graph and |M | = |C|, then M is a maximum matching and C is a minimal cover. There are several proofs. We give one that uses the MaxFlow-MinCut theorem. Proof of König’s Theorem. Assume V = P ∪ Q is the bipartition of G. We define a directed graph G∗ as follows: V (G∗ ) = V (G) ∪ {r, s} and E(G∗ ) = {rp : p ∈ P } ∪ {qs : q ∈ Q} ∪ {pq ∈ E : p ∈ P, q ∈ Q}. We define capacities as well: u(rp) = u(qs) = 1 for every p ∈ P and q ∈ Q, and u(pq) = ∞ for every pq ∈ E(G∗ ). This is a network. Since the capacities are integers, we know that the maximum flow is an integral flow. We remark that if x is an integral and feasible flow in this network, then x(e) = 0 or 1 (∀e ∈ E(G∗ )). Claim 6.6. Let x be a maximum flow in G∗ and M a maximum matching in G then fx = |M |. Proof. 
Let x be a maximum feasible flow, and recall that from the FordFulkerson algorithm, we may assume that x is an integral flow. Define a subset of edges M ⊂ E by pq ∈ M if x(pq) = 1 (and pq ∈ / M if x(pq) = 0). Then M is a matching in G: there cannot be two edges pq 0 , pq 00 ∈ M from p ∈ P , since by flow conservation p would need to have inflow 2 in G∗ which is impossible (since vertices in P only have one edge of capacity 1 entering them). By the same argument, there cannot be two edges p0 q, p00 q ∈ M into q ∈ Q. Thus M is a matching. For each edge pq ∈ M , the edge qs must have flow 1 (by flow conservation). This gives that the total flow fx = |M | Conversely, assume that M is a matching on G. We define a flow x by the following rules 2 • x(pq) = 1 if pq ∈ M , • x(rp) = 1 if p is contained in an edge in M , • x(qs) = 1 if q is contained in an edge in M , • x(e) = 0 in all other cases. Then x is an integral and feasible flow with fx = |M |. Claim 6.7. Let δ(R) be a minimum r-s cut in G∗ and C a minimum vertex cover of G. Then c(δ(R)) = |C| Proof. Let δ(R) be a minimum r-s cut. Then R = {r} ∪ X for some X ⊂ V (G). There is no arc from X ∩ P to Q \ X as such an arc has infinite capacity and our δ(R) is finite. Then there is no edge between X ∩ P and Q \ X in G. Define C 0 = (P \ X) ∪ (Q ∩ X). It follows that C 0 is a cover in G, and so we have proved |C| ≤ c(δ(R)). For the other direction, consider a minimum vertex cover C of G. Let R0 = {r} ∪ (P \ C) ∪ (Q ∩ C) and consider the cut δ(R0 ). Since C is a vertex cover, P there are no P edges pq with p ∈ P \ C and q 6∈ Q ∩ C. Thus c(R0 ) = p∈P \R0 c(rp) + q∈R0 c(qs) = (|P | − |P \ C|) + (|Q ∩ C|) = |C|. By the MaxFlow-MinCut theorem, we have that the size of a maximum flow in G∗ equals the capacity of a minimum cut in G∗ . By the first claim the size of a maximum flow in G∗ equals the size of a maximum matching in G, while by the second claim, the capacity of a minimum cut in G∗ equals the size of a minimum vertex cover in G. Thus the size of a maximum matching in G equals the size of a minimum vertex cover, as required. We can now effectively solve the two tasks Max matching and Min cover in bipartite graphs. To do this, set up the network G∗ as in the proof of König’s theorem, and solve it by the Ford-Fulkerson algorithm. We’ll now look at another theorem for finding matchings in graphs. Definition 6.8. Suppose G = (V, E) is a graph and A ⊂ V . The set of neighbours of A, N (A) is defined as N (A) = {u ∈ V : there is v ∈ A such that uv ∈ E}. The following theorem gives another characterization of matchings in bipartite graphs. Theorem 6.9 (Hall Theorem). In a bipartite graph G = (V, E) with bipartition classes A and B there is a matching of size |A| if and only if (∗ ) |N (S)| ≥ |S| for every S ⊂ A. 3 Proof. For the “only if” direction: If there is a matching M of size |A|, then condition (*) holds. Indeed, if S ⊂ A, then M contains, for every a ∈ S, an edge ab with a unique b ∈ B, and for distinct as the bs are also distinct. So N (S) also contains b, one vertex for every a ∈ S. For the other direction we note that A is always a cover in G. Another form of König’s theorem says that there is a matching of size |A| iff there is no cover of size < |A|. Assume now, contrary to the statement of the theorem, that • there is a cover C ⊂ A ∪ B of size < |A|, and • condition (*) holds. Observe that there is no edge between A \ C and B \ C. We have |A| = |C ∩ A| + |A \ C| > |C| = |C ∩ A| + |C ∩ B|, and so |A \ C| > |C ∩ B|. 
Yet N (A \ C) ⊂ C ∩ B and then |N (A \ C)| ≤ |C ∩ B| < |A \ C| contradicting (*) when S = A \ C. 7 Stable matchings Suppose that we have a set of n job applicants A and a set of n employers B. We want to find the “best” pairing between the applicants and the employers. What does “best” mean here? Each applicant gives their ranking of the employers in order of preference and each employer gives a ranking of the applicants in order of preference. Mathematically a “preference ordering” of A/B just means an ordering <σ of A/B with the earlier applicants/employers in the ordering being the more desirable ones (e.g a preference ordering of B gives a labeling B = {b1 , . . . , bn } so that b1 < b2 < · · · < bn ). Definition 7.1. A preference profile consists of two sets A, B of the same size, and also two sets of preference orderings {<a : a ∈ A} and {<b : b ∈ B}. We think of the preference ordering <a as giving an applicant a’s preferences between the employers in B, and we think of <b as giving an employer’s preference between the applicants in A. Definition 7.2. Suppose we have a preference profile with sets A, B of size n, and orderings {<a : a ∈ A} and {<b : b ∈ B}. A stable matching between A and B is a pairing (a1 , b1 ), . . . , (an , bn ) of A to B such that there is no i, j with bj <ai bi and ai <bj aj (in words — we don’t have that ai prefers bj to their current partner and bj prefers ai to their current partner). 4 So we have the following Task 7.3 (Stable mataching). INPUT: sets B and G and their preferences OUTPUT: a stable matching (b1 , g1 ), (b2 , g2 ), . . . , (bn , gn ). There are n! matchings altogether. Is there one among them that is stable? If so, how to find one? These questions are answered by the following theorem which is due to Gale and Shapley. Theorem 7.4. There always a stable matching in a preference profile. Proof. The proof goes by the so called propose and reject algorithm that is described as follows. The algorithm gradually builds a matching M . During this algorithm everybody is either unmatched or matched. Initially everybody is unmatched. When (ai , bj ) ∈ M , we say that “bj is the partner of ai ” and “ai is the partner of bj ”. Throughout the algorithm at various points ai might “propose” to a bj — at this point bj might accept (making ai and bj matched), or reject (keeping ai and bj unmatched). Formally the algorithms runs as follows: Initially, set M = ∅. Repeat the following: – If all ai are matched, then terminate outputting M . – Otherwise arbitrarily pick some unmatched ai . – If ai has already proposed to all b ∈ B, then terminate, outputting M. – Otherwise let bj ∈ B be an element to which ai hasn’t proposed yet, who ai prefers the most (i.e. with bj as low as possible in <ai ). – ai proposed to bj . (1) If bj is unmatched, then bj accepts the proposal. Add (ai , bj ) to M . (2) If bj is matched to ak with ai <bj ak , then bj accepts the proposal. Add (ai , bj ) to M and remove (ak , bj ) from M (so ak is now unmatched). (3) If bj is matched to ak with ak <bj ai , then bj rejects the proposal (so nothing changes). 5 Now we show that the Gale-Shapley algorithm always produces a stable matching. We prove a series of claims: Claim 7.5. Once b ∈ B becomes matched, they can never become unmatched. Proof. This is true since there is simply no mechanism in the algorithm to unmatch b ∈ B. Claim 7.6. 
After b ∈ B becomes matched, the partners of b will keep getting better and better (formally if (ai , b) ∈ M at some point in the algorithm and (aj , b) ∈ M at a later point in the algorithm, then aj <b ai ). Proof. The partner of b can only change in step (2), when it changes from ak to ai with ai <bj ak . Claim 7.7. The algorithm terminates after at most n2 proposals. Proof. It’s impossible for some ai ∈ A to propose to the same bj ∈ B twice (just because the algorithm always picks bj to be someone who ai hasn’t proposed to yet). Thus the most proposals that can happen is |A × B| = n2 . Claim 7.8. At the end of the algorithm, everyone is matched. Proof. We prove this by contradiction. So assume a ∈ A has no partner at termination. Then there is a unmatched b ∈ B as well. If we terminated, then at some point a proposed to b. By Claim 7.5, b must have been unmatched at this point. But then b must have accepted the proposal (which contradicts b being unmatched at the end of the algorithm). Claim 7.9. At the end of the algorithm, M is a stable matching. Proof. We prove this by contradiction again. Assume that in the final matching the pairs are labelled M = {(a1 , b1 ), . . . , (an , bn )}. If M is unstable, then for some i, j, we have that bj <ai bi and ai <bj aj . Note that ai must have proposed to bj at some point (since bj <ai bi , we have that ai proposes to bj before ai proposes to bi ). It’s impossible that bj rejected ai : indeed this could only happen if bj was paired with ak with ak <bj ai . But then, by Claim 7, we have that aj ≤bj ak <bj ai , contradicting “ai <bj aj ”. So bj accepted ai s proposal. Then, by Claim 7, we have that aj ≤bj ai , which again contradicts “ai <bj aj ”. 6 Lecture 8 Turing Machines In the remainder of the module we will focus on formalizing the idea of an algorithm in order to give mathematically precise versions of statements like “a problem can be solved in polynomial time using an algorithm” and “a problem cannot be generally solved by an algorithm”. We will do this by introducing objects called “Turing machines” — these are mathematical objects which model an algorithm, and will be more rigorously defined than the decimal/arithmetic models we’ve looked at so far. Turing machines were invented by Alan Turing in 1936, even before computers existed. We first give an informal description. • There is a finite alphabet Σ = {1, 2, 3, . . . , a, b, . . . , +, − . . . , ∗} including the blank symbol ∗. • There is a processor which is always in some state m ∈ M where M is a finite set (of all possible states). M contains a starting state and a halting state. These states are the “computer program” which determines how the Turing machine acts. • There is a 2-way infinite tape, with cells containing letters from the alphabet Σ, like in the figure below, ... * * 7 2 c a t 9 0 * * ... initially this tape contains the input. At any moment, all but finitely many cells are blank (meaning that their value is “∗”). Formally, the tape is a function Z → Σ with all except finitely many values equal to ∗. • There is a read and write head, called r/w head for short, that can read, erase, and write symbols from Σ on a given cell of the tape. At any moment of the computation the Turing machine is in some position (m, t) where m ∈ M is the state in which the processor is, and t is the symbol on the tape that the r/w head sees on the tape. 
The next position (m′, t′, ±1) of the Turing machine is completely determined by (m, t): the symbol t on the present cell is replaced by t′, m′ ∈ M is the next state of the processor, and the r/w head moves to the next cell on the right or left depending on whether the value is +1 or −1. (One could write left or right instead of ±1.) Mathematically, these concepts are formally described as follows:

Alphabet: an alphabet is a finite set Σ containing a special symbol “blank” ∗ ∈ Σ.

Infinite tape: this is a function Z → Σ with all except finitely many symbols equal to the blank symbol “∗”.

Turing Machine: A Turing Machine is a finite collection of states M.
– State: a state m ∈ M is a function m : Σ → M × Σ × {+1, −1}. The state with m(t) = (m′, t′, σ) is interpreted as “if the machine sees t while in state m, then write the symbol t′, switch to state m′, and then move right/left along the tape, depending on the value of σ”.
– Starting state: There is a special state mstart ∈ M called the “starting state” representing the state in which the Turing machine starts the computation.
– Halting states: There is a halting state mhalt ∈ M representing when the Turing machine stops the computation (sometimes we allow for several halting states in M).

Inputs: an input is some initial configuration of the tape. Usually we will work with inputs which are strings. A string is an input s : Z → Σ where, for some k, positions s(1), . . . , s(k) are non-blank, and all other positions are blank.

To run a Turing machine on some input s : Z → Σ, do the following: Put the r/w head on position 0. Set the state of the machine to state mstart. Perform the following repeatedly: If the machine is in a non-halting state m, and is in position i on the tape, which contains symbol t, then we have m(t) = (m′, t′, σ) for some state m′ ∈ M, symbol t′ ∈ Σ, and σ ∈ {+1, −1}. Do the following:
– Write t′ in position i of the tape.
– Switch the machine to state m′.
– Move to position i + σ on the tape.
If the machine is in a halting state mhalt, then stop the computation. The output of the computation is the final configuration of the tape.

One important remark is that the definition of a Turing machine does not enforce that the machine ever actually halts. It is entirely possible to design machines which run for ever without entering a halting state. Let’s look at some examples.

8.0.1 Example: erasing a string

The following Turing machine simply erases the input and replaces everything with the blank symbol. It will work with the alphabet Σ = {a, b, . . . , z, ∗}. The input s is given as s = s1 s2 . . . sk. It is written on the tape with s1 in position 1, . . . , sk in position k, and all other entries blank ∗. The states of the Turing machine are the following:

M = {mstart, merase, mhalt}
mstart(t) = (merase, ∗, +1) for all t
merase(t) = (merase, ∗, +1) for all t ≠ ∗
merase(∗) = (mhalt, ∗, +1)

Here is an example of running this machine on a sample input (with brackets marking the position of the r/w head):

. . . ∗ [∗] d o g ∗ ∗ . . .   state: mstart
. . . ∗ ∗ [d] o g ∗ ∗ . . .   state: merase
. . . ∗ ∗ ∗ [o] g ∗ ∗ . . .   state: merase
. . . ∗ ∗ ∗ ∗ [g] ∗ ∗ . . .   state: merase
. . . ∗ ∗ ∗ ∗ ∗ [∗] ∗ . . .   state: merase
. . . ∗ ∗ ∗ ∗ ∗ ∗ [∗] . . .   state: mhalt

8.0.2 Example: deciding if a number is even/odd

The following Turing machine decides whether n ∈ N is even or odd. The alphabet is Σ = {0, 1, . . . , 9, ∗}. The input n is given as n = n1 n2 . . . nk. It is written on the tape with n1 in position 1, . . .
, nk in position k, and all other entries blank ∗. The states of the Turing machine are the following:

M = {mstart, mmoveright, mread, m^EVEN_halt, m^ODD_halt}
mstart(t) = (mmoveright, t, +1) for all t
mmoveright(t) = (mmoveright, t, +1) for all t ≠ ∗
mmoveright(∗) = (mread, ∗, −1)
mread(t) = (m^EVEN_halt, ∗, −1) for t = 0, 2, 4, 6, 8, ∗
mread(t) = (m^ODD_halt, ∗, −1) for t = 1, 3, 5, 7, 9

Here is what the TM does under these rules. First it moves right and switches to the state mmoveright. Then it keeps moving to the right until it sees a blank ∗, and immediately switches to the state mread and moves one digit to the left. Now the machine is reading the very last digit of the number, so depending on whether it is odd or even, it halts outputting “odd” or “even”. Here we give an example of this Turing machine running on the input “n = 234”. We mark the position of the r/w head with brackets at each step.

. . . ∗ [∗] 2 3 4 ∗ ∗ . . .   state: mstart
. . . ∗ ∗ [2] 3 4 ∗ ∗ . . .   state: mmoveright
. . . ∗ ∗ 2 [3] 4 ∗ ∗ . . .   state: mmoveright
. . . ∗ ∗ 2 3 [4] ∗ ∗ . . .   state: mmoveright
. . . ∗ ∗ 2 3 4 [∗] ∗ . . .   state: mmoveright
. . . ∗ ∗ 2 3 [4] ∗ ∗ . . .   state: mread
. . . ∗ ∗ 2 [3] ∗ ∗ ∗ . . .   state: m^EVEN_halt

Remark: the way we define Turing machines, for every non-halting state and every t ∈ Σ we must specify a command of the form (m′, t′, σ). Some of these commands may end up being unnecessary from the point of view of the problem which the Turing machine is solving. For example, in the above example, the fact that state mread moves the r/w head to the left doesn’t do anything useful. Additionally, specifying a behaviour for mread(∗) is unnecessary since the only way that the machine could be in state mread with ∗ written on the tape is if initially the tape was entirely blank. However to properly specify a Turing machine you have to say what every state does for every possible symbol of the alphabet.

Church’s thesis

At first glance Turing machines seem like a very awkward way of defining computations. And they are — actually designing a Turing machine for solving any non-trivial task is highly impractical. However the point of Turing machines is not convenience, but rather being able to define algorithms in a 100% mathematically precise way. This allows mathematicians to study questions like “what sorts of problems can one design an algorithm to solve” and “how fast can one design an algorithm for solving a particular task”. It is natural to ask whether some more advanced model of computation (like the arithmetic model or decimal model) is “better” in the sense that there are tasks that can be performed by those models and not by Turing machines. Church postulated that the answer is “no”:

Church’s Thesis: Any reasonable notion of computation is equivalent to what is computable by a Turing machine.

This is not a formal mathematical statement, in the sense that it doesn’t define what a “model of computation” is. Special cases of it can be formalized (e.g. it is possible to prove that for any task solved by an algorithm in the decimal model, one can build a Turing machine to solve the same task). There have been many other notions of computation studied. Some of these are modifications of Turing machines (like ones where there are several tapes instead of one, other movements of the head than just one step right or left, etc.), while others are more conceptually different, like the arithmetic and decimal models. But for all the ones considered so far, it has been possible to prove that they are essentially equivalent in power to Turing machines.
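As a small aside (not part of the notes), one direction of this kind of equivalence is easy to see in practice: the table-of-states formalism above can be executed directly by a short program in an ordinary programming language. The dictionary encoding below is just an illustrative choice; the machine shown is the even/odd example from above.

def run_tm(states, start, halting, tape_input):
    # states: dict mapping a state name to a dict symbol -> (next state, written symbol, move)
    tape = dict(enumerate(tape_input, start=1))    # the input string occupies positions 1..k
    pos, state = 0, start
    while state not in halting:
        symbol = tape.get(pos, "*")                # unwritten cells hold the blank symbol *
        next_state, written, move = states[state][symbol]
        tape[pos] = written
        state, pos = next_state, pos + move
    return state, tape

digits = "0123456789"
states = {
    "start":     {t: ("moveright", t, +1) for t in digits + "*"},
    "moveright": {**{t: ("moveright", t, +1) for t in digits}, "*": ("read", "*", -1)},
    "read":      {**{t: ("EVEN_halt", "*", -1) for t in "02468*"},
                  **{t: ("ODD_halt", "*", -1) for t in "13579"}},
}
print(run_tm(states, "start", {"EVEN_halt", "ODD_halt"}, "234")[0])   # EVEN_halt

Of course this only shows that a conventional program can simulate a Turing machine; the harder converse direction, simulating other models of computation by Turing machines, is what formalized special cases of Church's thesis address.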
Thus, despite the apparently arbitrary nature of their definition, the idea that “algorithm = Turing machine” turns out to be reasonable.

9 Decision problems

A decision problem asks whether a given mathematical object has a certain property or not. Examples: decide whether a given n ∈ N is a power of 2 or not; whether n ∈ N is prime or not; whether a given bipartite graph has a matching of size ≥ k or not; whether a given digraph G with r, s ∈ V has a directed path from r to s or not; given k ∈ N and a digraph G with r, s ∈ V, whether it has a directed r − s path of length ≤ k or not; whether a given polynomial p(x1, . . . , xk) ∈ Z[x1, . . . , xk] has a root (x1, . . . , xk) ∈ Zk or not; whether a given polynomial p(x1, . . . , xk) ∈ Z[x1, . . . , xk] has a root (x1, . . . , xk) ∈ Rk or not.

We want an algorithm (that is, a Turing machine) that decides, for every given n or bipartite graph G and k ∈ N, etc., whether it has the property in question or not. In general we need an encoding scheme for the problem that encodes the objects in question (n ∈ N, digraphs, pairs (G, k) where G is a bipartite graph and k ∈ N, polynomials etc.) such that their codes can be inputs of a suitable Turing machine. To define decision problems formally: recall that every Turing machine works with an alphabet Σ. The set of all strings over the alphabet Σ0 = Σ \ {∗} is denoted by Σ+_0. A language L is defined to be any set of strings, i.e. any subset of Σ+_0. A decision problem is then defined as:

Definition 9.1. A decision problem is a pair of languages DYES ⊆ D. An algorithm T solves the decision problem D if for any I ∈ D, running T on I halts in state m^YES_halt if I ∈ DYES and halts in state m^NO_halt otherwise.

Note that we do not precisely say what “algorithm” means here. It could be a Turing machine, but could also be an algorithm in the arithmetic or decimal model. Here are some examples.

DECISION PROBLEM: Even
ALPHABET: Σ = {0, 1, 2, 3, . . . , 9, ∗}
INPUT: a string s representing a natural number, i.e. s = s1 . . . sk with s1 ≠ 0
OUTPUT: YES if, and only if, s is even.
Here D = {s1 . . . sk : si ∈ Σ \ {∗} and s1 ≠ 0} and DYES = {s1 . . . sk ∈ D : s1 . . . sk is even}. We’ve seen a Turing machine which solves this decision problem in Example 8.0.2.

DECISION PROBLEM: Prime
ALPHABET: Σ = {0, 1, 2, 3, . . . , 9, ∗}
INPUT: a string s representing a natural number, i.e. s = s1 . . . sk with s1 ≠ 0
OUTPUT: YES if, and only if, s is a prime.
Here D = {s1 . . . sk : si ∈ Σ \ {∗} and s1 ≠ 0} and DYES = {s1 . . . sk ∈ D : s1 . . . sk is prime}.

DECISION PROBLEM: Connected graph
ALPHABET: Σ = {0, 1, 2, 3, . . . , 9, V, E, =, (, ), {, }, ∗}
INPUT: graph G = (V, E)
OUTPUT: YES iff G is connected.
To represent this as a decision problem we need to encode the input using the alphabet Σ. There are many reasonable conventions we can come up with for doing this. One is to simply write out the vertices and edges of the graph as you would in text. So D consists of all strings of the form “V, =, {, v, 1, . . . , v, n, }, E, =, {, . . . , },”. Here the symbols between the commas are what’s written on the tape. So first we write a sequence of vertices and then we write a sequence of edges. For example “V, =, {, v, 1, v, 2, v, 3, }, E, =, {, (, v, 1, v, 2, ), (, v, 2, v, 3, ), },” is how we would write the input “G = (V, E) with V = {v1, v2, v3}, E = {v1 v2, v2 v3}”. Under this encoding DYES consists of strings of the above form which correspond to a connected graph.

DECISION PROBLEM: Empty graph
ALPHABET: Σ = {0, 1, 2, 3, . . .
, 9, V, E, =, (, ), {, }, ∗}
INPUT: graph G = (V, E), i.e. a string of the form “V, =, {, v, 1, . . . , v, n, }, E, =, {, . . . , },”
OUTPUT: YES iff G has no edges.
Here we can actually write down a Turing machine which solves this decision problem. In this example DYES consists of strings of the form “V, =, {, v, 1, . . . , v, n, }, E, =, {, . . . , },” where there is nothing entered between the final braces, e.g. “V, =, {, v, 1, v, 2, v, 3, }, E, =, {, },”. Equivalently, inputs of this form are ones where there is no “v” following the “E”. To test for this we can construct a Turing machine which reads the input until it sees “E”, then sees if there is a “v” following that. The following works:

M = {mstart, m1, m^YES_halt, m^NO_halt}
mstart(t) = (mstart, t, +1) if t ≠ E
mstart(E) = (m1, E, +1)
m1(t) = (m1, t, +1) if t ≠ v, }
m1(v) = (m^NO_halt, v, +1)
m1(}) = (m^YES_halt, }, +1)

Note that in all the above examples the exact way that we encode the input is rather arbitrary — one could write down countless equivalent ways of writing the same input. The point is that the formalism of Turing machines and decision problems gives us some mathematically precise way of encoding problems. Once we can do this we can meaningfully ask questions like “can all decision problems be solved by a Turing machine”, “how many steps might a Turing machine need to solve a decision problem with an input of size n” etc.

Satisfiability

Next we describe an important decision problem called satisfiability, SAT for short. It involves Boolean variables x1, x2, . . . , xm that can take only two possible values: True and False. A literal z associated with the Boolean variable x is either z = x or z = ¬x, the negation of x, meaning simply that ¬x is True iff x is False. A clause C is some literals connected with ‘or’, that is C = z1 ∨ z2 ∨ · · · ∨ zk. Finally, a Boolean expression Φ = C1 ∧ C2 ∧ · · · ∧ Cd is a conjunction, or rather is in conjunctive normal form, a CNF for short, if each Ci is a clause. It is a result in elementary logic that every Boolean expression can be written as a CNF, but we don’t need this.

A truth assignment is a function V : {x1, . . . , xm} → {T, F} where T = True and F = False. So V just assigns a True or False value to each Boolean variable x1, . . . , xm. This truth assignment extends to literals as V(zj) = T iff either zj = xi and V(xi) = T, or zj = ¬xi and V(xi) = F. It extends further to clauses naturally as V(C) = V(z1 ∨ z2 ∨ · · · ∨ zk) = T iff V(zj) = T for at least one zj, and extends further to CNFs as V(Φ) = V(C1 ∧ C2 ∧ · · · ∧ Cd) = T iff V(Ci) = T for all Ci. A CNF Φ is satisfiable if there is a truth assignment V such that V(Φ) = T. Here come the decision problem Satisfiability and one of its variants, called 3-SAT.

DP: SAT
INPUT: a CNF Φ
OUTPUT: YES iff Φ is satisfiable.

DP: 3-SAT
INPUT: a CNF Φ where every clause has at most 3 literals
OUTPUT: YES iff Φ is satisfiable.

The size of the input in this case is the total number of literals appearing in the CNF, counted with multiplicities. Here are some examples. Let C = x1 ∨ ¬x2 and define Φ = C. Then Φ is satisfiable with e.g. “x1 = T and x2 = T”. Next let C1 = x1 ∨ ¬x2 ∨ ¬x3 and C2 = x2 ∨ ¬x3 and Φ = C1 ∧ C2. Again, Φ is satisfiable with “x1 = T, x2 = T, x3 = F”. Finally consider the CNF formula (x1 ∨ x2 ∨ x3) ∧ (¬x1) ∧ (¬x2) ∧ (¬x3).
This is not satisfiable since for (x1 ∨ x2 ∨ x3 ) to be true one of x1 , x2 , or x3 needs to be true, which stops (¬x1 ) ∧ (¬x2 ) ∧ (¬x3 ) from being true. Understanding whether there exists an efficient algorithm for solving the SAT decision problem is actually one of the most important open problems in mathematics and computer science. It is the essence of the “P vs NP” problem which is one of the Millenium problems for the solution of which the Clay Institute offers a prize of 1 million dollars. In future weeks, we’ll what the “P vs NP” problem is and how SAT features in it. Running time of Turing machines It is quite easy to understand how many steps a Turing machine takes on an input — this is just the number of times the r/w head moves. To study running times, we also need a concept of the size of an input. Recall that the input to a Turing machine is a string, that is, a finite sequence of symbols from the alphabet Σ0 = Σ \ {∗}, written one after the other. A string x is then x = w1 w2 . . . wk with each wi ∈ Σ0 , (and k ∈ N is arbitrary). The length |x| of this string x is |x| = k. A Turing machine takes a string x ∈ Σ+ 0 as the input. The running time of the Turing machine M on the string x ∈ Σ+ 0 is defined as: TM (x) = number of steps M takes on x. Here “step” means a move of the r/w head. The running time is defined to equal infinity when M does not halt on x. For a language L, we can also formally define the worst case performance of a Turing machine M as TM (L, n) = max{TM (x) : x ∈ L and |x| ≤ n}. When the language L is just the set of all strings, we abbreviate TM (n) = TM (Σ+ 0 , n). The use of max instead of sup is justified as there are only finitely many strings of size at most n. We say that M runs in polynomial time on a language L if there is an integer d with TM (L, n) = O(nd ). This is equivalent to there existing a polynomial p with TM (L, n) ≤ p(n) for all n. If L is simply the set of all strings then we just say “M runs in polynomial time”. Definition 9.2. We say that a decision problem DY ES ⊆ D can be solved in polynomial time if there exists a Turing machine M with TM (D, n) = O(nd ) 9 for some d and which correctly solves the decision problem (in the sense of Definition 9.1). If a decision problem X can be solved in polynomial time, then we say that “X is in P”, written “X ∈ P”. A version of Church’s thesis is true for running times too. When solving exercises you can use the following without proof: Proposition 9.3. Let X be a decision problem. The following are equivalent. X can be solved in polynomial time with a Turing machine X can be solved in polynomial time in the arithmetic model. X can be solved in polynomial time in the decimal model. Proving a result like this is quite long and tedious — one needs to construct Turing machines for doing the elementary operations in the decimal/arithmetic models, and then use these to show that any algorithm in the decimal/arithmetic models can be transformed into a Turing machine. 10 Lecture 9 Decidability and the Halting Problem Recall the definition of decision problems. Definition 9.1. A decision problem is a pair of languages DY ES ⊆ D. An algorithm T solves the decision problem D if for any I ∈ D runES O ning T on I halts in state mYhalt if I ∈ DY ES and halts in state mN halt otherwise. A very natural question is whether every decision problem can be solved by an algorithm. This was actually Turing’s original motivation for defining Turing machines and so formalising the concept of an algorithm. 
Lecture 9

Decidability and the Halting Problem

Recall the definition of decision problems.

Definition 9.1. A decision problem is a pair of languages D_YES ⊆ D. An algorithm T solves the decision problem if, for any I ∈ D, running T on I halts in state m^YES_halt if I ∈ D_YES and halts in state m^NO_halt otherwise.

A very natural question is whether every decision problem can be solved by an algorithm. This was actually Turing's original motivation for defining Turing machines and so formalising the concept of an algorithm. It turns out that the answer is "no" — there are problems which cannot be solved by an algorithm. Such problems are called "undecidable".

Theorem 9.2 (Turing). There exists a decision problem which cannot be solved by any Turing Machine.

This theorem basically says that there are decision problems that cannot be solved by any algorithm, i.e. no matter how fast/clever your computer program is, there are questions that it simply cannot answer in general. In order to prove this theorem we need to come up with a decision problem that we can prove is undecidable. This decision problem is easy to describe — it is called the Halting Problem. Informally, the input to the Halting Problem is a pair (M, x), where M is a Turing Machine and x is a string, with the output being YES ⇐⇒ M halts on x. Formally this is defined as follows:

Decision problem: Halting Problem
Alphabet Σ = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0, a, A, b, B, . . . , z, Z, =, (, ), {, }, +, −, ,, :, ∗}.
Input: A Turing Machine M running on Σ (eg M = {mstart, mone, . . . }) and an input string x ∈ Σstring (eg x = elephant02). We write this on the tape as:

{mstart, mone, · · · : mstart(1) = (b, mone, −1), . . . , }elephant02        (1)

In other words, we first write an open bracket "{", then we write out the states of M (where we have a convention that we only use letters for labelling these), then we write a colon ":", then write out what each state does on each alphabet letter, then write a closed bracket "}", and finally write the string x. There is one technicality here — how do we write out what a state does on the blank symbol ∗? The natural thing is to write "mstart(∗) = (∗, mone, −1)" on the tape — however we can't do this since we want the input to be a string (and the definition of a string doesn't allow blank entries between non-blank entries). We remedy this by writing "mstart(blank) = (blank, mone, −1)" on the tape instead of "mstart(∗) = (∗, mone, −1)".

D = { strings from Σ of the form (1) }
D_YES = { strings from Σ of the form (1) such that M halts on x }

This decision problem cannot be solved by any algorithm.

Theorem 9.3. There is no Turing machine working on alphabet Σ which solves the Halting Problem.

Note that the above theorem doesn't mean that we can never solve a particular instance of the Halting Problem (for some particularly simple Turing Machines it is possible to figure out exactly when they halt). Instead, what it is saying is that whatever algorithm you try to design for solving the Halting Problem, there will always be some input you could feed into the algorithm where it won't give the correct answer.

Before proving the above theorem we need an auxiliary lemma, which constructs a particular Turing machine for an (apparently) unrelated task. The following lemma builds a Turing machine which takes any string as an input and duplicates it, so that it is now written twice.

Lemma 9.4. There exists a Turing machine M_DUPLICATE which takes a string a1, . . . , an as input, and whose output is the string a1, . . . , an, a1, . . . , an (written on positions 1, 2, . . . , 2n of the tape). Additionally, M_DUPLICATE always halts in position 0 in the state m_halt for such an input.

Proof. Let the alphabet be Σ = {∗, t1, . . . , tk}. We'll build a Turing machine M. The states of M are M = {m_start, m_read, m^{t1}_copy1, m^{t1}_copy2, m^{t1}_return1, m^{t1}_return2, . . . , m^{tk}_copy1, m^{tk}_copy2, m^{tk}_return1, m^{tk}_return2, m_halt}. Note that this is finitely many states (since the alphabet is finite).
The actions of each state are defined as follows:

m_start(t) = (m_read, t, +1) for all t
m_read(ti) = (m^{ti}_copy1, ∗, +1) for all ti
m_read(∗) = (m_halt, ∗, +1)
m^{ti}_copy1(tj) = (m^{ti}_copy1, tj, +1) for all tj
m^{ti}_copy1(∗) = (m^{ti}_copy2, ∗, +1)
m^{ti}_copy2(tj) = (m^{ti}_copy2, tj, +1) for all tj
m^{ti}_copy2(∗) = (m^{ti}_return1, ti, −1)
m^{ti}_return1(tj) = (m^{ti}_return1, tj, −1) for all tj
m^{ti}_return1(∗) = (m^{ti}_return2, ∗, −1)
m^{ti}_return2(tj) = (m^{ti}_return2, tj, −1) for all tj
m^{ti}_return2(∗) = (m_read, ti, +1)

Running this on an input a1, . . . , an has output a1, . . . , an, ∗, a1, . . . , an. In order to produce a Turing machine for the lemma, we need to combine this with another Turing machine S whose effect is shifting a string left by one position (in order to erase the ∗). We can use the following S = {s_start, s_read, s^{t1}_write, . . . , s^{tk}_write, s_return1, s_return2, s_halt} to do this:

s_start(t) = (s_read, t, +1) for all t
s_read(ti) = (s^{ti}_write, ∗, −1) for all ti
s_read(∗) = (s_return1, ∗, +1)
s^{ti}_write(t) = (s_start, ti, +1) for all t
s_return1(ti) = (s_return1, ti, −1) for all ti
s_return1(∗) = (s_return2, ∗, −1)
s_return2(x) = (s_halt, x, +1) for all x

Now we can produce a Turing machine M_DUPLICATE satisfying the lemma by combining M and S: first run M, and replace the state m_halt by s_read so that once M terminates, S is run. The effect is that running M_DUPLICATE on an input a1, . . . , an has output a1, . . . , an, a1, . . . , an.

We can now prove Turing's theorem.

Proof of Theorem 9.3. Suppose for contradiction that there is a Turing machine M_solve which can solve the Halting Problem. Then M_solve satisfies the following:

(a) For any Turing Machine M and string x, if M halts on x, then running M_solve on input (M, x) halts in state m^YES_halt.
(b) For any Turing Machine M and string x, if M doesn't halt on x, then running M_solve on input (M, x) halts in state m^NO_halt.

Now we will design a new Turing machine N which takes a string y as an input. It will run as follows:

Phase 1: First N runs M_DUPLICATE so that yy is written on the tape.
Phase 2: Then N runs M_solve with the following modification: the state m^YES_halt is replaced with the state m_loop which is defined by "m_loop(t) = (m_loop, t, +1) for all t".

To define N formally, we need to specify all its states. We will do this by combining the states of M_solve and M_DUPLICATE suitably. First suppose that the states of M_solve are labelled by lowercase letters, while the states of M_DUPLICATE are labelled by uppercase letters (so eg the starting/halting states of M_solve are called m_start, m^YES_halt, m^NO_halt, while the starting/halting states of M_DUPLICATE are called m_START, m_HALT). In particular, this implies that the states of M_solve and M_DUPLICATE all have distinct labels. Now define N as follows:

N = (M_DUPLICATE \ {m_HALT}) ∪ (M_solve \ {m_start, m^YES_halt}) ∪ {m_loop, m_switchphase}

where m_loop(t) = (m_loop, t, +1) for all t and m_switchphase(t) = m_start(t) for all t. Additionally, we make the following alterations.
– Replace all mentions of m^YES_halt by m_loop everywhere.
– Replace all mentions of m_start by m_switchphase everywhere.
– Replace all mentions of m_HALT by m_switchphase everywhere.

The effect of this is that N runs exactly as described above — first it runs M_DUPLICATE, then instead of halting it switches to state m_switchphase and starts running M_solve, and afterwards, instead of ever switching to state m^YES_halt, it switches to state m_loop.

Claim 9.5.
N halts in state m^NO_halt on input string y ⇐⇒ M_solve halts in state m^NO_halt on input yy.

Proof. By construction, when N is run on y, it reaches m_switchphase with yy written on the tape and the r/w head on position 0. After this, N exactly copies the machine M_solve — thus N reaches m^NO_halt ⇐⇒ M_solve reaches m^NO_halt.

Claim 9.6. Let M be a Turing Machine written as a string on the tape in the form (1). Then N halts in state m^NO_halt on input string M ⇐⇒ M doesn't halt on input string M.

Proof. Note that the string MM is of the form (1), where the machine is M and the input string x is also M. By Claim 9.5, we have that N halts in state m^NO_halt on input string M ⇐⇒ M_solve halts in state m^NO_halt on input MM. However, by the assumption that M_solve solves the Halting Problem, this happens exactly when M does not halt on input string M.

Note that it's possible to run the Turing machine N on the input which is N itself (written as a string in the form (1)). There are two cases depending on whether doing this halts or not.

Case 1: Suppose that N halts on input string N. Then N must halt in state m^NO_halt (since this is the only halting state in N by construction). By the "⇒" part of Claim 9.6, this tells us that N doesn't halt on input string N, which is a contradiction.

Case 2: Suppose that N doesn't halt on input string N. By the "⇐" part of Claim 9.6, we get that N halts in state m^NO_halt on input string N, which is a contradiction.

You may think that the decision problem in the proof of Theorem 9.2 is highly artificial. However, it is possible to prove that more natural decision problems are undecidable as well. For example, given a multivariate polynomial p(x1, . . . , xk) with integer coefficients, determining whether p(x1, . . . , xk) = 0 has an integer solution is also undecidable. Proofs of results like this are based on reducing such problems to the Halting Problem, i.e. proving that they can be solved by an algorithm if, and only if, the Halting Problem can be solved by an algorithm.
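The same diagonal argument is often phrased with computer programs instead of Turing machines. Purely as an informal illustration (this is not part of the proof above, and by Theorem 9.3 the function halts below cannot actually be implemented), the machine N corresponds to the following Python sketch; all names here are ours.

# Hypothetical: suppose halts(program_source, input_string) always correctly
# answered whether the given program halts on the given input.
def halts(program_source, input_string):
    """Cannot actually exist, by Theorem 9.3."""
    ...

# The analogue of the machine N from the proof: duplicate the input, run the
# supposed halting-tester on the pair, and do the opposite of what it predicts.
def N(program_source):
    if halts(program_source, program_source):   # M_solve run on the pair (M, M)
        while True:                              # the looping state m_loop
            pass
    else:
        return                                   # halting, like m_NO_halt

# Running N on its own source code recreates Cases 1 and 2: N halts on N
# exactly when halts(...) says it does not, a contradiction either way.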
10 NP

We've already mentioned the "P vs NP" problem and stated it as the question of whether SAT ∈ P. This isn't the usual phrasing of the problem — it is normally phrased as "P ≠ NP". In the remainder of the module we'll define what NP is and explain why the two formulations of the problem are equivalent.

On a basic level, NP is a set of decision problems (just like P is a set of decision problems). The definition however is substantially more complex.

Definition 10.1. Let Σ be an alphabet, and suppose we have a decision problem D_YES ⊆ D ⊆ Σstring. An algorithm M is a polynomial time certifier for the decision problem D_YES ⊆ D if there is a polynomial q so that:
(1) For all x ∈ D_YES, there exists a string y of length ≤ q(|x|) such that M outputs YES on input (x, y).
(2) For all x ∈ D \ D_YES and all strings y of length ≤ q(|x|), we have that M outputs NO on input (x, y).
(3) M runs in polynomial time (i.e. for any x ∈ D, y ∈ Σstring, T_M(x, y) = O((|x| + |y|)^d) for some d ∈ N).

The big new thing in this definition is that M has two parts to its input — x and y. The x ∈ D is just some instance of the decision problem which we are trying to solve. On the other hand, y ∈ Σstring is just an arbitrary string — and it is a lot more mysterious what it is doing in the definition. One way to think about y is as a "hint" towards the solution of the decision problem. So informally, we think of a decision problem (D, D_YES) as being in NP if there is an algorithm M which can efficiently solve (D, D_YES) when also receiving a hint for how to solve it. We define a new class, NP, of decision problems.

Definition 10.2. NP is the set of decision problems for which there exists a polynomial time certifier.

The P vs NP problem is stated as "are there any decision problems which are in NP, but not in P?". Previously we stated this problem as "decide whether SAT ∈ P or not". We'll eventually show that the two formulations are equivalent.

Let's look at a concrete example. Let COMPOSITE be the decision problem whose input is an n-digit positive integer x and whose output is YES if, and only if, x is a composite number.

Theorem 10.3. COMPOSITE ∈ NP.

Proof. We'll build a polynomial time certifier M for COMPOSITE which runs in the decimal model. Formally it works as follows:
(1) Input: the input for M is a pair (x, y) where x is an integer, whereas y is an arbitrary string.
(2) Divide x by y, i.e. find integers m, r with x = my + r where 0 ≤ r < y. This is done using Theorem 16 from week 1's lecture notes.
(3) If r ≠ 0, y = 1, or m = 1, then output NO.
(4) If r = 0, y ≠ 1, and m ≠ 1, then output YES.

We'll check the definition of "polynomial time certifier" with q(n) = n. Let x be an input to COMPOSITE of size n, i.e. x is an n-digit number. We need to show three things:
– Let x be a YES instance of COMPOSITE. Then x has a non-trivial factorization x = ab where x − 1 ≥ a, b ≥ 2. Let y = a. Since a ≤ x, we have |y| ≤ |x| = q(|x|) always. When we run M on (x, y), we divide x by y to get x = my + r = ma + r = ab for 0 ≤ r < y = a, which implies that r = 0 and m = b. This shows that M outputs YES on (x, y).
– Let x be a NO instance of COMPOSITE (i.e. x is a prime number), and let y be a string of size ≤ n. When we run M on (x, y), the output is always NO. Indeed, we divide x by y to get x = my + r for some 0 ≤ r < y. Note that we cannot have m, y ≥ 2 and r = 0 (since x is prime), and so we must output NO.
– M runs in polynomial time O((|x| + |y|)^2) as a consequence of Theorem 16 from week 1.

The above proof illustrates the common strategy used in essentially all proofs of decision problems being in NP. Such decision problems can always be phrased in the form "x ∈ D_YES ⇐⇒ there exists some object z" — e.g. in the above example, x ∈ D_YES ⇐⇒ there exists an integer factor of x other than 1 and x. Other examples include "x is a YES instance of SAT ⇐⇒ there exists a T/F assignment to the variables making x true" or "G is a YES instance of CONNECTED ⇐⇒ there exists a spanning tree of G". To build a polynomial time certifier for a decision problem, write an algorithm which checks whether y is an example of the object z. Afterwards, the proof proceeds almost exactly like the above theorem.
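As an illustration of the proof above, here is a sketch of the certifier M in Python, working directly with integers rather than decimal strings (the function name and this simplification are ours, not part of the formal decimal-model argument).

def composite_certifier(x, y):
    """Certifier for COMPOSITE: x is the instance, y is the proposed hint
    (a candidate divisor).  Mirrors steps (1)-(4) of the proof above."""
    if not isinstance(y, int) or y <= 0:
        return "NO"                      # y does not even name a positive integer
    m, r = divmod(x, y)                  # step (2): x = m*y + r with 0 <= r < y
    if r != 0 or y == 1 or m == 1:       # step (3)
        return "NO"
    return "YES"                         # step (4): y is a non-trivial divisor

# x = 91 is composite; the hint y = 7 certifies this, while a wrong hint is
# simply rejected (a hint never causes a false YES).
print(composite_certifier(91, 7))    # YES
print(composite_certifier(91, 2))    # NO
print(composite_certifier(97, 5))    # NO  (97 is prime: every hint gives NO)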
Here is another, more complicated example.

Theorem 10.4. SAT is in NP.

Proof. SAT can be defined as a decision problem as follows:
Alphabet Σ = {∗, x, 0, 1, . . . , 9, ¬, ∧, ∨, (, ), =, T, F}.
D = strings of the form "(x1 ∨ ¬x2) ∧ (¬x3 ∨ x1)"
D_YES = strings of the above form for which there exists a satisfying assignment.

We define an algorithm A, whose input is a pair (x, y) with x ∈ D, y ∈ Σstring. This time the algorithm will be in the arithmetic model:
(i) First check whether y is an assignment of T/F values to the variables of x, i.e. whether y is of the form "x1 = T/F, x2 = T/F, . . . , xn = T/F" (for some choice of T/F in each case), with x1, . . . , xn the variables appearing in x. If y is not of this form, then output NO.
(ii) Next check whether x is satisfied by the T/F assignments given by y. To do this, go through the clauses of x and check that each clause has at least one literal which is satisfied by y. If x is satisfied by y, then output YES; otherwise output NO.

Let x be an input to SAT of size n, i.e. x is a CNF formula of length n. Notice that the number of variables of x is ≤ n. We check the three parts of the definition of "polynomial time certifier" with the polynomial q(n) = n:
– Let x ∈ D_YES. Then there exists some assignment of T/F to the variables of x making the whole expression true. Let y be the string as in (i) corresponding to this assignment. Since the number of variables of x is at most n, we have |y| ≤ n always. It is immediate from the definition of A that A outputs YES on (x, y).
– Let x ∈ D \ D_YES, and let y ∈ Σstring be any string. When we run A on (x, y), the output is always NO. Indeed, if y is not an assignment of T/F values to the variables of x, then we output NO at step (i). If y is an assignment of T/F values to the variables of x, then we output NO at step (ii), because (by the assumption that x ∉ D_YES) there is no satisfying assignment for x.
– For any input y, A runs in polynomial time on (x, y). To see this, notice that step (i) takes O(|x| + |y|) steps (first read y from left to right, checking that it is of the form x1 = T/F, x2 = T/F, . . . , xn = T/F — this is O(|y|) operations; then read x from left to right, checking that for each variable xi that appears in x we have i ≤ n — this is O(|x|) operations). Step (ii) takes O(|x|) steps (first replace each xi by T/F based on what y says it should be; then check whether there is any clause of the form (F ∨ F ∨ · · · ∨ F); if there is such a clause, output NO, otherwise output YES).

We end with establishing a containment between P and NP.

Lemma 10.5. P ⊆ NP.

Proof. Let (D_YES, D) ∈ P. We need to show that (D_YES, D) ∈ NP also, i.e. construct a polynomial time certifier B for (D_YES, D). This is done as follows. Since (D_YES, D) ∈ P, there is an algorithm A which solves (D_YES, D) in polynomial time (i.e. T_A(x) = O(|x|^d) for some d). Define another algorithm B whose input is (x, y), and which just runs A on x, while completely ignoring y. Note that by this definition, for all x, y, the output/running-time of B on (x, y) equals the output/running-time of A on x.

Claim 10.6. B is a polynomial time certifier for (D_YES, D).

Proof. We check the 3 parts of the definition with the polynomial q(n) = 0.
– Let x ∈ D_YES. Choose y = ∅ ∈ Σstring (i.e. y is the empty string). Then B(x, y) = A(x) = YES.
– Let x ∈ D \ D_YES, and let y ∈ Σstring be an arbitrary string. Then B(x, y) = A(x) = NO.
– For every x, y, we have T_B(x, y) = T_A(x) = O(|x|^d) = O((|x| + |y|)^d).

Lecture 10

11 Polynomial time reduction

Polynomial time algorithms are considered to be "fast", whereas algorithms which are not polynomial time are considered to be "slow". Almost all the algorithms we have considered in this module run in polynomial time. It is of great theoretical and practical interest to understand which decision problems are in P and which are not. One motivation for showing that decision problems are not in P comes from cryptography: when encrypting something, it is desirable to provably know there is no polynomial time algorithm to decrypt your cypher. Unfortunately, proving that a decision problem is not in P is extremely difficult. One problem that is not in P is of course the Halting Problem.
We know that there is no algorithm at all to decide the Halting Problem, so, in particular, there is no polynomial time algorithm. Beyond the Halting Problem, there are few decision problems we know of which are provably outside P. Recall the P vs NP open problem.

Problem 11.1 (P vs NP problem). Show that SAT ∉ P.

The focus on SAT here may seem rather arbitrary. The basic reason why SAT is important is that it turns out that many other important problems are "polynomial time reducible" to SAT. Informally, this means that if there is a polynomial time algorithm for SAT, then we would also have a polynomial time algorithm for many other problems. The following definition formalizes the idea of "polynomial time reducible".

Definition 11.2. One decision problem (A_YES, A) is polynomial-time reducible to another decision problem (B_YES, B) if there is a polynomial time algorithm T so that:
– For any input i ∈ A we have T(i) ∈ B.
– T(i) ∈ B_YES ⇐⇒ i ∈ A_YES.

We write X ≤p Y to mean "problem X is polynomial time reducible to problem Y". This means that if Y is solvable in polynomial time, then so is X. Or, if X is not solvable in polynomial time, then neither is Y.

Note that the relation ≤p is reflexive (that is, X ≤p X for every X). It is also transitive, meaning that X ≤p Y and Y ≤p Z imply X ≤p Z. Thus ≤p is a preorder (a reflexive and transitive relation) on the collection of all decision problems.

Let's look at an example of polynomial time reduction. Consider the following decision problem.

Decision problem: IndepSet
Input: a graph G = (V, E) and an integer m
Output: YES iff G has a subset of m vertices containing no edges.

A set of vertices in a graph with no edges between them is called an independent set. We can give a reduction of SAT to IndepSet.

Proposition 11.3. SAT ≤p IndepSet.

Proof. For a polynomial time reduction, for every input Φ to SAT we have to construct an input (G, k) to IndepSet by a polynomial time (in the size of Φ) algorithm, such that the answer to Φ is YES iff the answer to (G, k) is YES. Here Φ = C1 ∧ C2 ∧ · · · ∧ Cm, where each Ci is a clause with ki literals, that is, Ci = z1 ∨ z2 ∨ · · · ∨ zki, and each zj here is either a Boolean variable x or its negation. The size of Φ is Θ(k1 + k2 + · · · + km).

Given such a Φ, we first construct the graph G. It will have k1 + k2 + · · · + km vertices, each corresponding to one literal in one clause. Two vertices of G form an edge if they come from the same clause, or if the corresponding literals are negations of each other. Thus G consists of m complete subgraphs K1, . . . , Km (one corresponding to each clause) plus edges connecting vertices of the type x and ¬x. From the input Φ to SAT we have constructed the input (G, m) to IndepSet. This construction takes polynomial time in k1 + k2 + · · · + km. For example, for Φ = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (¬x1 ∨ x3 ∨ ¬x4) ∧ (x2 ∨ ¬x3 ∨ x4), the graph G consists of three triangles (one for each clause) together with an edge joining each vertex labelled xi to each vertex labelled ¬xi.

If Φ is satisfiable then G has an independent set of size m: fix a satisfying assignment and choose, in each complete subgraph Kj, a vertex whose literal is made true by that assignment. This set has size m, and it is independent: its vertices lie in different clauses, and two literals that are both true cannot be negations of each other. Conversely, if G has an independent set U of size m, then Φ is satisfiable: U contains exactly one vertex from every complete subgraph Kj, and we set xi true if a vertex corresponding to xi is in U, and set it false if a vertex corresponding to ¬xi is in U (this is well defined since vertices labelled xi and ¬xi are adjacent, so U cannot contain both; variables not represented in U can be set arbitrarily). This shows that Φ is satisfiable iff G contains an independent set of size m, exactly what we wanted.
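Here is a sketch of this reduction as code, with CNFs again given as lists of signed-integer clauses as in the earlier sketch; the choice of representing vertices as (clause index, position) pairs is ours and is made only for this illustration.

def sat_to_indepset(cnf):
    """Build the IndepSet instance (G, m) from a CNF, following the proof above.
    Vertices are pairs (clause_index, position); m is the number of clauses."""
    vertices = [(i, j) for i, clause in enumerate(cnf) for j in range(len(clause))]
    edges = set()
    for u in vertices:
        for v in vertices:
            if u >= v:
                continue                      # consider each unordered pair once
            lit_u = cnf[u[0]][u[1]]
            lit_v = cnf[v[0]][v[1]]
            same_clause = (u[0] == v[0])
            complementary = (lit_u == -lit_v)  # x_i together with ¬x_i
            if same_clause or complementary:
                edges.add((u, v))
    return vertices, edges, len(cnf)

# The example above: three clauses, so we ask for an independent set of size 3.
phi = [[1, -2, -3], [-1, 3, -4], [2, -3, 4]]
V, E, m = sat_to_indepset(phi)
print(len(V), len(E), m)   # 9 vertices, 14 edges, m = 3

The double loop over vertices makes the running time quadratic in the total number of literals, which is polynomial in the size of Φ, as the proof requires.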
12 NP-completeness

The following definition is quite central in the study of algorithms.

Definition 12.1. A decision problem X ∈ NP is NP-complete if Y ≤p X for every Y ∈ NP.

Informally, this definition means that an NP-complete problem is the "hardest" problem in the class NP: every other problem can be polynomial-time reduced to it. A priori it is not at all obvious that NP-complete decision problems even exist. A breakthrough theorem proved independently by Cook and Levin shows that SAT is NP-complete.

Theorem 12.2 (Cook-Levin). SAT is NP-complete.

The famous "P vs NP" problem asks whether there are decision problems which are in NP but aren't in P. Using the above theorem, we see this is equivalent to the question of whether SAT is in P or not. Indeed, if SAT is in P, then using the definition of NP-complete we get a polynomial time algorithm for solving every decision problem in NP. On the other hand, if SAT is not in P, then (using the fact that SAT ∈ NP) we obtain that P ≠ NP.

We now give the proof of the Cook-Levin Theorem. This proof is non-examinable.

Proof. Let D_YES ⊆ D be a decision problem in NP. To prove the theorem, we need to find a polynomial time reduction of this decision problem to SAT. First, let's recall what we know about NP. From the definition, we know that there is a Turing machine M and polynomials p, q which satisfy the definition of "polynomial time certifier" for D_YES ⊆ D. We'll prove the following lemma which, when applied to the Turing machine M, will imply the theorem.

Lemma 12.3. There is a function f : {Turing machines} × {length-n inputs} → {CNF formulas} so that f(M, x) is satisfiable if, and only if, there exists some string y of length ≤ q(n) with M(x, y) = YES and M halting in p(n) steps on (x, y). Additionally, there is a polynomial time algorithm which finds f(M, x) for each M, x.

The theorem is immediate from the lemma. We need to give a polynomial time reduction of D_YES ⊆ D to SAT. This means a polynomial time algorithm which takes an instance of D as an input, and gives an input of SAT as an output. The algorithm is simply: given an instance x ∈ D, find f(M, x) (using the polynomial time algorithm from Lemma 12.3). Note that M is fixed here (it is the Turing machine which comes from the definition of the decision problem D_YES ⊆ D being in NP). From Lemma 12.3, we know that f(M, x) is satisfiable if, and only if, there exists some string y of length ≤ q(n) with M(x, y) = YES and M halting in p(n) steps on (x, y). From the definition of "M is a polynomial time certifier for D_YES ⊆ D", we know that for each x, there exists some string y of length ≤ q(n) with M(x, y) = YES and M halting in p(n) steps on (x, y) if, and only if, x ∈ D_YES. Combining these, we obtain that f(M, x) is satisfiable if, and only if, x ∈ D_YES, i.e. we have verified the definition of "polynomial time reduction of D_YES ⊆ D to SAT". It remains to prove the lemma.

Proof of Lemma 12.3. We have a Turing machine M and input x. We want to construct a CNF formula f(M, x). The basic idea is to write down a bunch of clauses which "model" the running of a Turing machine. First we need to define the variables which the CNF formula will be built out of. Let M have m states and the alphabet Σ have s + 1 symbols. It will be convenient to think of the symbols of Σ as numbers, i.e. Σ = {0, 1, . . . , s} with 0 being the blank. The variables of the CNF formula f(M, x) will be:
– Qi,j, which represents whether M is in state j at step i.
– Si,j,k, which represents whether position j of the tape at step i contains symbol k.
– Ti,j, which represents whether the r/w head is on position j at step i.

We will now write down a long list of clauses which all encode some particular aspect of the running of a Turing machine. They will be grouped under 10 "rules".

Rule 1: at each step i, the machine is in at least one state. This is encoded by the clause Qi,1 ∨ Qi,2 ∨ · · · ∨ Qi,m. By joining all these clauses using "∧" we get a CNF formula which encodes "at all steps, the machine is in at least one state".

Rule 2: at each step i, the machine is in at most one state. First notice that the Boolean formula ¬(Qi,j ∧ Qi,k) encodes "at step i the machine is not simultaneously in states j and k". This is logically equivalent to the OR statement ¬Qi,j ∨ ¬Qi,k. By joining all these clauses using "∧" for all i and j ≠ k, we get a CNF formula which encodes "at all steps, the machine is in at most one state". Combining this with Rule 1 (using ∧), we can encode "at all steps, the machine is in exactly one state".

Rule 3: at each step i, position j on the tape contains at least one symbol. This is encoded by the clause Si,j,0 ∨ Si,j,1 ∨ · · · ∨ Si,j,s. By joining all these clauses using "∧" we get a CNF formula which encodes "at all steps and all positions, the tape contains at least one symbol".

Rule 4: at each step i, position j on the tape contains at most one symbol. First notice that the Boolean formula ¬(Si,j,a ∧ Si,j,b) encodes "at step i, position j on the tape doesn't simultaneously contain symbols a and b". This is logically equivalent to the OR statement ¬Si,j,a ∨ ¬Si,j,b. By joining all these clauses using "∧" for all i, j and a ≠ b, we get a CNF formula which encodes "at all steps, in all positions there is at most one symbol". Combining this with Rule 3 (using ∧), we can encode "at all steps, in all positions, there is precisely one symbol".

Rule 5: at each step i, the r/w head is in at least one position. This is encoded by the clause Ti,0 ∨ Ti,1 ∨ · · · ∨ Ti,n ∨ Ti,−1 ∨ · · · ∨ Ti,−n. By joining all these clauses using "∧" we get a CNF formula which encodes "at all steps the r/w head is in at least one position".

Rule 6: at each step i, the r/w head is in at most one position. First notice that the Boolean formula ¬(Ti,a ∧ Ti,b) encodes "at step i, the r/w head is not simultaneously in positions a and b". This is logically equivalent to the OR statement ¬Ti,a ∨ ¬Ti,b. By joining all these clauses using "∧" for all i and a ≠ b, we get a CNF formula which encodes "at all steps, the r/w head is in at most one position". Combining this with Rule 5 (using ∧), we can encode "at all steps, the r/w head is in precisely one position".

Rule 7: at step 0, the r/w head is at position 0 and the machine is in state 1. This is encoded by the CNF formula T0,0 ∧ Q0,1.

Rule 8: if at step i the machine is at position a, sees symbol b, and is in state c, where mc(b) = (md, e, f), then on the next step it writes e in position a, moves to position a + f, and switches to state md. This is the "main" rule, which encodes the fact that the Turing machine acts like a Turing machine. To encode it we use the fact that the logical symbol for implication "=⇒" can be used to encode if/then statements. We encode it using the Boolean formula Ti,a ∧ Si,a,b ∧ Qi,c =⇒ Ti+1,a+f ∧ Si+1,a,e ∧ Qi+1,d. Using the either/or form of "implies", this is logically equivalent to the CNF formula (¬Ti,a ∨ ¬Si,a,b ∨ ¬Qi,c ∨ Si+1,a,e) ∧ (¬Ti,a ∨ ¬Si,a,b ∨ ¬Qi,c ∨ Ti+1,a+f) ∧ (¬Ti,a ∨ ¬Si,a,b ∨ ¬Qi,c ∨ Qi+1,d).
By joining all these formulas using "∧" for all i, a, b, c, d, e, f, we get a single CNF formula which encodes Rule 8 always holding.

Rule 9: at step 0, the tape has the string x written on it; otherwise the tape is blank, except for the q(n) positions immediately to the right of x (where y can be written). Let x = x1 . . . xn, where each xi ∈ {1, . . . , s} is a symbol from the alphabet. Then the CNF formula S0,1,x1 ∧ S0,2,x2 ∧ · · · ∧ S0,n,xn encodes "x is written on the tape between positions 1 and n". Similarly, the CNF formula S0,0,0 ∧ S0,−1,0 ∧ · · · ∧ S0,−p(n),0 ∧ S0,n+q(n),0 ∧ S0,n+q(n)+1,0 ∧ · · · ∧ S0,n+p(n),0 encodes "all other entries are blank, except possibly the q(n) entries immediately to the right of x". Combining these two CNF formulas using ∧ encodes Rule 9.

Rule 10: the machine halts with output YES. Let m^YES_halt be the kth state of the machine. Then Rule 10 can be encoded by Q1,k ∨ Q2,k ∨ · · · ∨ Qp(n),k.

Define the CNF formula f(M, x) as the combination of all the CNF formulae from Rules 1 – 10 using "∧". First, note that given M, x, the formula f(M, x) can be calculated in polynomial time. To see this, first observe that f(M, x) has length O(p(n)^3) (go through each of Rules 1 – 10 and check that O(p(n)^3) is an upper bound on the length of the CNF formulae defined in each rule). Thus f(M, x) can be calculated in time O(p(n)^3), simply by going through Rules 1 – 10 one by one and writing out the formulae involved in each rule. From the definition of Rules 1 – 10, f(M, x) has a satisfying assignment if, and only if, there is some string y of length ≤ q(n) which can be written after x such that running M on (x, y) halts in m^YES_halt (since the rules exactly encode the running of a Turing machine). This concludes the proof of the lemma.
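To make the construction a little more concrete, here is a small sketch generating the clauses of Rules 1 and 2 in the signed-integer CNF format used in the earlier sketches; the particular variable numbering state_var(i, j) is ours, chosen only for this illustration (any injective numbering works).

def state_var(i, j, num_states):
    """Number the variable Q_{i,j} ('the machine is in state j at step i')."""
    return i * num_states + j            # a positive integer for each pair, since j >= 1

def rules_1_and_2(num_steps, num_states):
    """Clauses saying: at every step the machine is in exactly one state."""
    clauses = []
    for i in range(num_steps + 1):
        # Rule 1: at least one state at step i.
        clauses.append([state_var(i, j, num_states) for j in range(1, num_states + 1)])
        # Rule 2: at most one state at step i (one clause per pair j < k).
        for j in range(1, num_states + 1):
            for k in range(j + 1, num_states + 1):
                clauses.append([-state_var(i, j, num_states), -state_var(i, k, num_states)])
    return clauses

# For a machine with 3 states run for p(n) = 2 steps this already gives
# 3 "at least one" clauses and 9 "at most one" clauses.
print(len(rules_1_and_2(2, 3)))   # 12

The other rules are generated in the same mechanical way, which is exactly why f(M, x) can be written out in time polynomial in p(n).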