Fundamentals of Informatics
Lecture 14: Intractability and NP-completeness
Bas Luttik

Algorithms
A complete description of an algorithm consists of three parts:
1. the algorithm
2. a proof of the algorithm’s correctness
3. a derivation of the algorithm’s running time
Should we really care about the running time of an algorithm? Couldn’t we just improve our hardware to compensate for the inefficiency of an algorithm?

Towers of Hanoi
Rules of the game:
1. Rings can only be moved one at a time.
2. Rings may not be placed on top of smaller rings.

Towers of Hanoi (running time analysis)
Hanoi(n,x,y,z)   // move n rings from peg x to peg y using peg z; cost T(n)
  if n = 1                        // O(1)
    then move ring from x to y    // O(1)
  else
    Hanoi(n-1,x,z,y)              // T(n-1): move n-1 rings from x to z
    move ring from x to y         // O(1)
    Hanoi(n-1,z,y,x)              // T(n-1): move n-1 rings from z to y
Solving the recurrence T(n) = 2T(n-1) + O(1) yields T(n) = O(2^n)! A more efficient solution does not exist!

Running times
How long does it take to run an algorithm on a computer capable of a million instructions per second?

Input length n:   10          20              50            100            200
polynomial
  n^2             1/10000 s   1/2500 s        1/400 s       1/100 s        1/25 s
  n^5             1/10 s      3.2 s           5.2 min       2.8 h          3.7 d
exponential
  2^n             1/1000 s    1 s             35.7 y        4 x 10^14 c    1 x 10^45 c
  n^n             2.8 h       3.3 x 10^12 y   1 x 10^70 c   1 x 10^185 c   1 x 10^445 c
(s = seconds, min = minutes, h = hours, d = days, y = years, c = centuries)

polynomial-time algorithm: an algorithm with running time O(n^c) for some constant c.
P: the class of all decision problems for which there exists a polynomial-time algorithm that solves them.

Tractable versus intractable problems
Tractable: A problem is tractable if there exists a polynomial-time algorithm for it (i.e., if it is in the class P).
Intractable: A problem is intractable if there does not exist a polynomial-time algorithm for it.
Unsolvable: A problem is unsolvable if there does not exist an algorithm for it (not even an inefficient one).
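The Hanoi procedure from the running-time analysis earlier in this lecture can be run directly to observe the exponential growth. Below is a minimal Python sketch; the names hanoi and count_moves and the peg labels "x", "y", "z" are illustrative choices, not taken from the slides. It records every ring move and compares the number of moves with 2^n - 1.

```python
def hanoi(n, source, target, spare, moves):
    """Append to `moves` the ring moves that transfer n rings from source to target."""
    if n == 1:
        moves.append((source, target))
    else:
        hanoi(n - 1, source, spare, target, moves)  # park n-1 rings on the spare peg
        moves.append((source, target))              # move the largest ring
        hanoi(n - 1, spare, target, source, moves)  # put the n-1 rings back on top


def count_moves(n):
    """Number of single-ring moves needed for n rings."""
    moves = []
    hanoi(n, "x", "y", "z", moves)
    return len(moves)


for n in (1, 2, 3, 10):
    # The two columns agree: the move count is exactly 2^n - 1.
    print(n, count_moves(n), 2**n - 1)
```

At a million moves per second, 50 rings would need about 35.7 years, in line with the 2^n row of the running-times table.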
Two decision problems about graphs
Problem 1: shortest path problem (known to be in P)
Input: a road map of cities, with distances attached to road segments, two designated cities A and B, and an integer k.
Output: ‘Yes’ if it is possible to take a trip from A to B of length ≤ k, and ‘No’ if such a trip is impossible.
Problem 2: travelling salesman problem (unknown whether it is in P)
Input: a road map of cities, with distances attached to road segments, two designated cities A and B, and an integer k.
Output: ‘Yes’ if it is possible to take a trip from A to B of length ≤ k which passes through all the cities, and ‘No’ if such a trip is impossible.

Travelling Salesman decision problem
travelling salesman decision problem
Input: A road map with n locations (one of the locations is the depot) connected by road segments, with distances attached to the road segments, and an integer k.
Output: Yes if there exists a route of distance at most k that starts and ends at the depot and visits all other locations on the map exactly once. No otherwise.
A solution to the travelling salesman problem is a list l_0,…,l_n of n+1 locations such that
1. the depot is both the first (l_0) and the last (l_n) location in the list,
2. all the other locations occur exactly once in the list, and
3. the sum of the distances between successive locations in the list is at most k.
Note that a polynomial-time algorithm can verify whether a candidate list l_0,…,l_n is a solution.
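The three conditions on a candidate list l_0,…,l_n listed above can be checked mechanically in a single pass over the list. Below is a minimal Python sketch under stated assumptions: the function name is_tsp_solution, the representation of the road map as a dictionary dist keyed by unordered pairs of locations, and the rule that successive locations must be joined directly by a road segment are choices made for this illustration, not part of the slides.

```python
def is_tsp_solution(route, locations, depot, dist, k):
    """Check the three conditions for a candidate list l_0, ..., l_n.

    dist maps frozenset({a, b}) to the length of the road segment between
    a and b (assumed representation); a missing key means there is no road.
    """
    n = len(set(locations))
    # Condition 1: n+1 entries, with the depot first and last.
    if len(route) != n + 1 or route[0] != depot or route[-1] != depot:
        return False
    # Condition 2: the first n entries are n distinct locations covering the map,
    # so every location other than the depot occurs exactly once.
    if set(route[:-1]) != set(locations):
        return False
    # Condition 3: the total distance along the route is at most k.
    total = 0
    for a, b in zip(route, route[1:]):
        segment = dist.get(frozenset((a, b)))
        if segment is None:  # no direct road between successive locations
            return False
        total += segment
    return total <= k


# A small hypothetical road map with depot D.
locations = ["D", "A", "B", "C"]
dist = {
    frozenset(("D", "A")): 2,
    frozenset(("A", "B")): 3,
    frozenset(("B", "C")): 4,
    frozenset(("C", "D")): 1,
    frozenset(("A", "C")): 10,
}
print(is_tsp_solution(["D", "A", "B", "C", "D"], locations, "D", dist, 10))
print(is_tsp_solution(["D", "A", "B", "C", "D"], locations, "D", dist, 9))
```

The check performs one dictionary lookup per pair of successive locations, so its running time grows linearly with the length of the list, which is well within polynomial time.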
A solution to the travelling salesman decision problem is a candidate list l_0,…,l_n of n+1 locations satisfying the three conditions above; these conditions can be verified with a polynomial-time algorithm.
A naïve algorithm for solving the travelling salesman decision problem searches exhaustively through all candidate solutions.
The difficulty is: there are n! = n·(n-1)·(n-2)·…·3·2·1 such candidate lists.
So: searching for a solution is hard, but verifying a solution is easy.

Problems in NP
The travelling salesman problem has the following characteristics:
1. Given some candidate solution, it can be verified in polynomial time whether it is a correct solution.
2. There are too many candidate solutions to allow an efficient algorithm that exhaustively searches for a correct solution among them.
A candidate solution to a problem will be called a certificate for the problem.
NP: the class of all decision problems for which there exists a suitable notion of certificate and a polynomial-time algorithm to verify whether a certificate is an actual solution to the problem.
(NP stands for Non-deterministic Polynomial time.)

Boolean formula satisfiability (SAT)
A boolean formula consists of 0/1-valued variables and operators AND, OR, NOT with the following interpretation:

x   y   x AND y   x OR y   NOT x
0   0   0         0        1
0   1   0         1        1
1   0   0         1        0
1   1   1         1        0

Examples:
1. x AND (NOT x)
2. (((NOT x) OR y) AND (x AND (NOT z)))
A boolean formula is satisfiable if there exists an assignment of 0/1-values to its variables such that the formula evaluates to 1.
boolean satisfiability problem
Input: A boolean formula.
Output: Yes if the formula is satisfiable. No otherwise.
A solution to the boolean satisfiability problem is a satisfying assignment; for a formula with n variables there are 2^n candidate assignments.
A certificate for the boolean satisfiability problem is an assignment; verification boils down to checking whether the assignment satisfies the formula.
So: the boolean satisfiability problem is in NP.
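The gap between verifying and searching can be made concrete for boolean satisfiability. Below is a minimal Python sketch; the encoding of formulas as nested tuples and the names evaluate, variables and satisfying_assignment are assumptions made for this illustration. Evaluating one assignment takes time proportional to the size of the formula, whereas the exhaustive search may have to try all 2^n assignments.

```python
from itertools import product


def evaluate(formula, assignment):
    """Verify a certificate: evaluate the formula (0 or 1) under one assignment.

    A formula is a variable name, ("NOT", f), ("AND", f, g) or ("OR", f, g).
    """
    if isinstance(formula, str):
        return assignment[formula]
    op = formula[0]
    if op == "NOT":
        return 1 - evaluate(formula[1], assignment)
    left = evaluate(formula[1], assignment)
    right = evaluate(formula[2], assignment)
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    raise ValueError("unknown operator: " + str(op))


def variables(formula):
    """Collect the variable names that occur in the formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(sub) for sub in formula[1:]))


def satisfying_assignment(formula):
    """Naive search: try all 2^n assignments; return one that works, or None."""
    names = sorted(variables(formula))
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        if evaluate(formula, assignment) == 1:
            return assignment
    return None


# The two example formulas from the slide.
example1 = ("AND", "x", ("NOT", "x"))
example2 = ("AND", ("OR", ("NOT", "x"), "y"), ("AND", "x", ("NOT", "z")))
print(satisfying_assignment(example1))  # None: example 1 is not satisfiable
print(satisfying_assignment(example2))  # {'x': 1, 'y': 1, 'z': 0}
```

The search loop is the only exponential part; replacing it with something fundamentally faster is exactly what nobody knows how to do.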
Subset-sum
subset-sum problem
Input: A finite set S of positive integers and a target number t.
Output: Yes if S has a subset whose elements sum exactly to t. No otherwise.
Example: If S is the set
{1, 2, 7, 14, 49, 98, 343, 686, 2409, 2793, 16808, 17206, 117705, 117993}
and t = 138457, then the subset {1, 2, 7, 98, 343, 686, 2409, 17206, 117705} is a solution.
A candidate solution is a subset of S. There are 2^|S| candidate solutions (where |S| is the number of elements of S).
A certificate for the subset-sum problem is a subset of S; verification boils down to checking that the numbers in the subset add up to t.
So: the subset-sum problem is in NP.

Hamiltonian-cycle decision problem
A hamiltonian cycle in an undirected graph is a path in the graph that starts and ends with the same vertex and visits every other vertex exactly once.
hamiltonian-cycle problem
Input: An undirected graph.
Output: Yes if the graph has a hamiltonian cycle. No otherwise.
If the input graph has n vertices, then a candidate solution is any permutation of these n vertices with the first vertex in the permutation added at the end. There are n! candidate solutions.
A certificate for the hamiltonian-cycle problem is a permutation of the vertices of the graph with the first vertex added at the end. Verification boils down to checking whether the resulting list of vertices corresponds to a path in the graph.
So: the hamiltonian-cycle problem is in NP.

NP-complete problems
A problem is in the class NP if there exists a polynomial-time algorithm to verify whether a certificate is an actual solution to the problem.
A problem is NP-hard if the existence of a polynomial-time algorithm to solve the problem implies the existence of polynomial-time algorithms for all the other problems in NP.
A problem is NP-complete if it is NP-hard and in NP.
Theorem: The boolean satisfiability problem is NP-complete.
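The certificate checks described above for the subset-sum and hamiltonian-cycle problems are short enough to write out. Below is a minimal Python sketch; the function names, the representation of a graph as a vertex list plus an edge list, and the guard for graphs with fewer than three vertices (assuming simple graphs) are choices made for this illustration.

```python
def verify_subset_sum(S, t, subset):
    """Certificate check: subset is drawn from S and its elements sum to t."""
    chosen = set(subset)
    return chosen <= set(S) and sum(chosen) == t


def verify_hamiltonian_cycle(vertices, edges, cycle):
    """Certificate check: cycle is a permutation of the vertices with the first
    vertex repeated at the end, and successive vertices are joined by an edge."""
    n = len(set(vertices))
    if n < 3:  # assumption: simple graphs, which need at least 3 vertices for a cycle
        return False
    if len(cycle) != n + 1 or cycle[0] != cycle[-1]:
        return False
    if set(cycle[:-1]) != set(vertices):
        return False
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((a, b)) in edge_set for a, b in zip(cycle, cycle[1:]))


# The subset-sum example from the slide.
S = {1, 2, 7, 14, 49, 98, 343, 686, 2409, 2793, 16808, 17206, 117705, 117993}
t = 138457
certificate = {1, 2, 7, 98, 343, 686, 2409, 17206, 117705}
print(verify_subset_sum(S, t, certificate))  # True

# A hypothetical square graph a-b-c-d-a.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(verify_hamiltonian_cycle(vertices, edges, ["a", "b", "c", "d", "a"]))  # True
print(verify_hamiltonian_cycle(vertices, edges, ["a", "c", "b", "d", "a"]))  # False
```

Both checks make a single pass over the certificate, so they run in polynomial time even though the number of candidate certificates (2^|S| and n!, respectively) is enormous.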
[Photos: Stephen Cook, Leonid Levin]

Reduction revisited
[Diagram: Problem A → efficient reduction → Problem B]
An efficient reduction from decision Problem A to decision Problem B consists of two parts:
1. an efficient general method (algorithm) for transforming every question of problem A into a question of problem B
2. an argument that the B-answer to every transformed A-question can be interpreted as a (correct) answer to the original A-question. (Sometimes B-answers need to be negated.)
Note that if we have an efficient solution for decision problem B and an efficient reduction from A to B, then we can use them to solve problem A efficiently.
Roughly, if there exists an efficient reduction from problem A to problem B, then problem A cannot be fundamentally harder than problem B.

Reduction revisited
[Diagram: Problem A → polynomial-time reduction → Problem B]
A polynomial-time reduction from decision Problem A to decision Problem B consists of two parts:
1. a polynomial-time algorithm for transforming every question of problem A into a question of problem B
2. an argument that the B-answer to every transformed A-question can be interpreted as a (correct) answer to the original A-question. (Sometimes B-answers need to be negated.)
Note that if we have a polynomial-time algorithm for decision problem B and a polynomial-time reduction from A to B, then we can use them to construct a polynomial-time algorithm for decision problem A (see next slide).

Polynomial-time reduction
[Diagram: input x to A → polynomial-time reduction algorithm from A to B → input y to B → polynomial-time algorithm for B → yes/no. The whole pipeline is a polynomial-time algorithm for A.]
Consider a polynomial-time reduction algorithm from A to B that converts every input x for A to an input y for B in such a way that:
1. If the answer in B for y is yes, then the answer in A for x is yes.
2. If the answer in B for y is no, then the answer in A for x is no.
If the reduction algorithm runs in O(n^c) on inputs of length n and the algorithm for B runs in O(m^d) on inputs of length m, then the resulting algorithm for A runs in O(n^c + n^(cd)).
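The bound on the running time of the combined algorithm follows from the observation that the reduction cannot produce an output longer than its own running time. A short derivation, under the usual assumption (not stated on the slide) that writing each symbol of y costs at least one step; the symbol T_A for the running time of the resulting algorithm for A is introduced here for illustration:

```latex
\begin{align*}
|x| &= n, \qquad m = |y| = O(n^c)
  && \text{(the reduction runs for } O(n^c) \text{ steps, so it writes at most that many symbols)} \\
T_A(n) &= \underbrace{O(n^c)}_{\text{compute } y \text{ from } x}
        + \underbrace{O(m^d)}_{\text{run the algorithm for } B \text{ on } y} \\
       &= O(n^c) + O\big((n^c)^d\big) \\
       &= O\big(n^c + n^{cd}\big).
\end{align*}
```

Since c and d are constants, cd is a constant too, so the resulting algorithm for A is again a polynomial-time algorithm.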
Composing polynomial-time reductions
[Diagram: input z to C → poly.-time reduction alg. from C to A → input x to A → poly.-time reduction alg. from A to B → input y to B → poly.-time alg. for B → yes/no. The two reduction algorithms together form a polynomial-time reduction algorithm from C to B; the whole pipeline is a polynomial-time algorithm for C.]
If A is NP-hard, and there is a polynomial-time reduction algorithm from A to B, then from every problem C in NP there exists a polynomial-time reduction algorithm to B (compose the reduction from C to A with the reduction from A to B).
Therefore: If A is NP-hard, and there exists a polynomial-time reduction algorithm from A to B, then B is NP-hard too!
So: to show that B is NP-hard, it suffices to reduce just one NP-hard problem to B (instead of all NP problems)!

The mother problem
NP-complete: boolean formula satisfiability
Theorem: The boolean satisfiability problem is NP-complete. (Stephen Cook, Leonid Levin)

Subset-sum
subset-sum problem
Input: A finite set S of positive integers and a target number t.
Output: Yes if S has a subset whose elements sum exactly to t. No otherwise.
A certificate for the subset-sum problem is a subset of S; verification boils down to checking that the numbers in the subset add up to t.
So: the subset-sum problem is in NP.
The book explains the details of an intricate polynomial-time reduction from (a variant of) boolean satisfiability to the subset-sum problem.
So: the subset-sum problem is NP-hard.
Conclusion: the subset-sum problem is NP-complete.

Hamiltonian-cycle
hamiltonian-cycle problem
Input: An undirected graph.
Output: Yes if the graph has a hamiltonian cycle. No otherwise.
A certificate for the hamiltonian-cycle problem is some permutation of the list of vertices of the graph, with the first vertex of the list repeated at the end. Verification amounts to checking that the list is indeed a path in the graph.
So: the hamiltonian-cycle problem is in NP.
The book presents a polynomial-time reduction (in several steps) from (a variant of) boolean satisfiability to the hamiltonian-cycle problem.
So: the hamiltonian-cycle problem is NP-hard.
Conclusion: the hamiltonian-cycle problem is NP-complete.

A family tree of reductions
NP-complete:
boolean formula satisfiability
  hamiltonian-cycle
  subset-sum

Reducing hamiltonian-cycle to TSP
A hamiltonian cycle in an undirected graph is a path in the graph that starts and ends with the same vertex and visits every other vertex exactly once.
hamiltonian-cycle problem
Input: An undirected graph.
Output: Yes if the graph has a hamiltonian cycle. No otherwise.
To reduce the hamiltonian-cycle problem to the travelling salesman problem (TSP), assign a weight of 0 to all edges of the graph, and then add all missing edges (the grey edges in the figure) with a weight of 1. Then the original graph has a hamiltonian cycle if, and only if, the resulting graph has a TSP-route of distance 0 (that is, take k = 0): a route of distance 0 can only use original edges.

Travelling Salesman (TSP)
travelling salesman decision problem
Input: A road map with n locations (one of the locations is the depot) connected by road segments, with distances attached to the road segments, and an integer k.
Output: Yes if there exists a route of distance at most k that starts and ends at the depot and visits all other locations on the map exactly once. No otherwise.
A certificate for TSP is a list of n+1 locations that starts and ends with the depot.
Verification amounts to checking that every location is in the list and that the sum of the distances is less than or equal to k.
So: TSP is in NP.
We have sketched a polynomial-time reduction from the hamiltonian-cycle problem to TSP (see also the book).
So: TSP is NP-hard.
Conclusion: TSP is NP-complete.

A family tree of reductions
NP-complete:
boolean formula satisfiability
  hamiltonian-cycle
    travelling salesman
  subset-sum

Partition
partition problem
Input: A finite set S of positive integers.
Output: Yes if S can be partitioned into S1 and S2 such that the sum of the elements in S1 is equal to the sum of the elements in S2. No otherwise.

Reducing subset-sum to partition
subset-sum problem
Input: A finite set S of positive integers and a target number t.
Output: Yes if S has a subset whose elements sum exactly to t. No otherwise.
Let S, t be some input to the subset-sum problem.
Let z be the sum of the elements of S.
Choose y such that y > t+z and y > 2z; add y-t and y-z+t (both > z!) to S to obtain S’.
On the one hand, if S has a subset with sum t, then S’ can be partitioned: that subset together with y-t sums to y, and the remaining elements of S together with y-z+t sum to (z-t)+(y-z+t) = y.
On the other hand, if S’ can be partitioned, then one of the subsets contains y-t and the other contains y-z+t (together they sum to 2y-z > y, so they cannot be on the same side). Since the sum of the entire set S’ is (y-z+t)+z+(y-t) = 2y, each side sums to y, so the side containing y-t also contains a subset of S with sum y-(y-t) = t.
So S has a subset with sum t if, and only if, S’ can be partitioned.
S’ can be obtained from S in polynomial time.

Partition
partition problem
Input: A finite set S of positive integers.
Output: Yes if S can be partitioned into S1 and S2 such that the sum of the elements in S1 is equal to the sum of the elements in S2. No otherwise.
A certificate for the partition problem consists of two disjoint subsets S1 and S2 of S such that the union of S1 and S2 is S. Verifying whether a certificate is a solution amounts to computing the sums of the elements of S1 and S2 and checking whether the sums are equal.
So: the partition problem is in NP.
We have presented a polynomial-time reduction from subset-sum to partition.
So: the partition problem is NP-hard.
Conclusion: the partition problem is NP-complete.

A family tree of reductions
NP-complete:
boolean formula satisfiability
  hamiltonian-cycle
    travelling salesman
  subset-sum
    partition

Limits of computation
For a class of important decision problems (the NP-complete problems), it is unknown whether they are tractable or intractable.
[Diagram: Tractable | NP-complete | Intractable | Unsolvable]
Note: they are all tractable or all intractable!
A decision problem is
tractable if there exists a polynomial-time algorithm that solves it,
intractable if there exists an algorithm that solves it, but not a polynomial-time algorithm,
unsolvable if there does not exist an algorithm that solves it.

The holy grail

Multiplication
(each number is written in four blocks of digits)
  36746043666799590428244633799 62795263227915816434308764267 60322838157396665112792333734 17143396810270092798736308917
× 33478071698956898786044169848 21269081770479498371376856891 24313889828837938780022876147 11652531743087737814467999489
= 1230186684530117755130494958384962720772853569595334792197 3224521517264005072636575187452021997864693899564749427740 6384592519255732630345373154826850791702612214291346167042 9214311602221240479274737794080665351419597459856902143413

Factorization
  36746043666799590428244633799 62795263227915816434308764267 60322838157396665112792333734 17143396810270092798736308917
× 33478071698956898786044169848 21269081770479498371376856891 24313889828837938780022876147 11652531743087737814467999489
= 1230186684530117755130494958384962720772853569595334792197 3224521517264005072636575187452021997864693899564749427740 6384592519255732630345373154826850791702612214291346167042 9214311602221240479274737794080665351419597459856902143413

Factorization
That factorization is hard is actually a good thing: it is at the heart of public-key cryptography!
Factorization is actually in NP. (What’s the certificate?)
So, public-key cryptography is in trouble in the (unlikely) case that P = NP.
It is unknown whether factorization is NP-hard.
Factorization currently essentially involves exhaustively searching the enormous search space of candidate divisors. But it may happen that an alternative to searching is found.
For the related problem of primality testing (which, at first sight, also involves searching), an alternative to searching has been found, and it led to a polynomial-time algorithm!

Material
Chapter 10 discusses many NP-complete problems, and presents reductions between them. (Refer to chapter 9 again for an explanation of public-key cryptography and RSA.)
Deadline: January 15, 2016