Hard Problems

15-211 Fundamental Data Structures and Algorithms
Vahe Poladian
Slides courtesy of Michael Maxim
April 11, 2003

Announcements

Homework 5 is due on Tuesday, April 15th.
Don’t forget, taxes are due then too.

Intro to hard problems

A note about today’s lecture topic:
The subject of hard problems is deep and involves many elegant but also difficult concepts.
There are several whole courses devoted to this subject, at both the 400 and 800 level.
We will give just the briefest overview today.

Easy vs. hard problems

We have studied a lot of problems in this course.
Sorting and searching.
Graph traversal and spanning.
Etc.

Easy (i.e., tractable) problems

These problems are tractable, meaning they have upper bounds that are polynomial-time or better.
Sorting in O(N log N) or O(N^2).
Union-find in nearly O(1) amortized time per operation.
Dijkstra’s algorithm in O(|E| log |V|).
Etc.

Polynomial time

We say that these problems can be solved in polynomial time because their upper bounds are functions that are themselves bounded by some polynomial.
Some upper bounds, such as O(2^N), are exponential and have no polynomial bound.

Hard (i.e., intractable) problems

Some problems have no known polynomial-time solution, only exponential-time solutions (or worse).
Such problems are called intractable.
Examples of intractable graph problems:
3-coloring.
Traveling salesman problem.
Hamiltonian cycle.
Prime factorization (a special case; not a graph problem).

The 3-coloring problem

Suppose we color the vertices of an undirected graph.
We have a k-coloring of the graph if no more than k colors are used and any two adjacent vertices have different colors.
The k-coloring problem: Does a given graph have a k-coloring?
For k>2, all known algorithms take time exponential in |V| in the worst case.

3-Color Example

[Figure: a colored graph in which no adjacent nodes have the same color.]

A bad 3-Coloring

[Figure: a coloring that violates the rule.]

The traveling salesman problem

Suppose we are a traveling salesman who has to visit every city exactly once.
We want to minimize our travel costs.
The TSP problem: Given a complete weighted graph and an integer k, is there a simple cycle that visits all vertices and has total cost at most k?
Like 3-coloring, all known exact algorithms take time exponential in |V| in the worst case.

TSP Example

[Figure: a complete weighted graph on nodes 1, 2, 3, 4 with edge weights 1 through 6.]
Starting at 1 and traveling to each node, does a visitation order exist that has total weight less than 14?
Yes: 1,2,3,4,1

The Hamiltonian cycle problem

Given an undirected graph, is there a cycle that visits every vertex exactly once?
This is not asking for just any cycle; a DFS or BFS would find one of those!
The cycle must visit every vertex in the graph.

The factoring problem (15-251)

Suppose we want to find the prime factors of a given integer.
Ex: Given 15, find 3 and 5.
Best known algorithms are worse than O(e^(d^(1/3))), where d is the number of digits in the binary form of the given integer.
The difficulty of this problem is key for many encryption algorithms, including the popular RSA algorithm.

Hard problems really are hard!

[Plot: growth of 10N, 100 log N, 5 N^2, N^3, and 2^N for N from 1 to 11; vertical axis from 0 to 2500.]

So much for 1.5GHz…

Doligez showed that 10 of the world’s fastest computers, working together for 2 months, can factor a 130-bit number.
Finding a 3-coloring or solving TSP for a graph with 500 nodes can require on the order of 2^500 operations.
That is more operations than seconds since the Big Bang! Or atoms in the known universe.

Many hard problems

There are many well-known intractable problems.
Questions:
What do we mean, exactly, when we say “problem”?
How do we prove that a problem is intractable?
Just because we haven’t found a good algorithm for a problem, isn’t it possible that one might get invented some day?

Complexity Theory

Unfortunately, we won’t be able to answer these questions completely.
This is a topic of Computer Science called complexity theory.
See the 15-453 course for an intro!
But we give a brief overview here.
Before we get to this, we might ask whether there are unsolvable problems. After that, we will look at precise ways to classify problems.

The Millennium Problems

One Hundred Billion Dollar Problem
(Yes, that’s 10^11)

Fresh Off the Press …

Associated Press, April 10, 2003, 11:59 pm EST, For Immediate Release.
“One Hundred Billion Dollars offered for solving the Halting problem by a former mobster committed to scientific advancement.”

Prize Donor

“I will personally deliver the prize, in cash, to the first person who solves the HALTING problem,” said the mobster-turned-philanthropist from his submarine.

The Halting Problem

Given a program P, does P always terminate when executed?
Surely since we can compile and run programs, we can also figure out whether or not they terminate!
Or can we?

The Halting Problem, cont’d

An informal argument:
Suppose the halting problem is solvable and we can write a Java program HALT that takes any other Java program P as input.
HALT(P) = true if P always terminates, and false if P might loop forever.

CONFUSE

Let us introduce a special program, called CONFUSE.
Here is the specification of CONFUSE:

    If (HALT(CONFUSE) = true)
        Loop forever
    Else
        Halt

The Halting Problem, cont’d

Java code for CONFUSE.java:

    public static void main (String[] args) {
        if (HALT("CONFUSE.java")) {
            while (true) { ; }  // loop forever
        } else {
            return;
        }
    }

What happens when we run CONFUSE?

The Halting Problem, cont’d

Two possibilities:
CONFUSE halts. Then HALT(CONFUSE) is true, but the definition of CONFUSE says that CONFUSE should then loop forever!
CONFUSE loops. Then HALT(CONFUSE) is false, but the definition of CONFUSE says that CONFUSE should then halt!
A contradiction.
Therefore, our initial assumption that a program HALT could exist was not valid.

Historical Tidbit

“Dr. Evil himself developed CONFUSE during his years as a graduate student.” - source unknown

Undecidable problems

Indeed, some problems are so hard that they are impossible to solve!
Such problems are called undecidable problems.
The halting problem is one of the most famous of these problems.

Self-Reference in proofs

Such “self reference” exposes the impossibility of solving the halting problem.
“There is a male barber in town who shaves every man in town who does not shave himself.” Does the barber shave himself?
A more formal, and very powerful, proof technique for this is called diagonalization.
Not covered in this course.

Decision Procedures

What is a “problem”?

Throughout all of this, we have been relying on our intuition about what a “problem” is.
But can we be more precise?

Decision problems

One class of problem is the decision problem.
Decision problems involve finding the correct yes/no answer.
Examples:
Does graph G have a 3-coloring?
Given weighted graph G and integer k, is there a simple cycle through all vertices of cost at most k?
Does integer n have two factors a>1 and b>1?

Decision procedure

An algorithm that solves a decision problem is called a decision procedure.

Search problems

Another class of problem is the search problem.
Search problems involve finding an example solution, if one exists.
Examples:
Find a 3-coloring for graph G.
Find a simple cycle through all vertices of cost at most k in the weighted graph G.
Find two factors a>1 and b>1 for integer n.

Search = decision

What is surprising is that search problems and decision problems are essentially equivalent to each other!
In other words, given a solution to a search problem, one can solve the corresponding decision problem, and vice versa.
This is fairly subtle, and unfortunately beyond the scope of this course…

Problems

Because search and decision problems are essentially equivalent, we don’t normally have to say “decision” or “search” when talking about problems.
Also, it is always good enough to have a decision procedure, as this will also give a solution to the corresponding search problem.
Think about how this would work for 3-coloring a graph…
(FOOD FOR THOUGHT: Give me the conversion between decision and search for 3-coloring)

NP Problems

Hard problems

As we mentioned earlier, some problems, like 3-coloring, TSP, and factoring, have no known polynomial-time solution.
However, many of these problems have a very interesting property:
A solution to their search problem can be verified in polynomial time.

Verifying solutions

Examples:
If you are given a colored graph, it is easy to check, in polynomial time, whether or not it is a 3-coloring.
If you are given a path through a weighted graph and an integer k, it is easy to check whether the path is a simple cycle through all vertices and whether the cost is at most k.
If you are given two integers a and b, it is easy to check that a>1 and b>1 and a*b=n.

Oracles

A slightly twisted way to view these problems is as follows:
Suppose we have a special computer that has a direct network connection to a Very Powerful Computer.
Any time that a program needs to make a guess, it can consult with the Very Powerful Computer and get the right answer (instantly).
Technically speaking, this is referred to as an oracle.

An Oracle

[Figure]

Oracles, cont’d

Examples:
To find a 3-coloring, simply ask the Oracle the color of each vertex.
To find a cheap cycle through the graph, simply ask for the order in which the vertices should be visited.
To factor a number n, ask the Oracle for one of the factors a.

Nondeterministic computation

A computer with an Oracle gets the solution by consulting the Oracle, and then verifies, in polynomial time, that the solution is correct.
In some sense, this is equivalent to trying all possibilities in parallel.
Note that 3-color, TSP, and factoring can all be solved in polynomial time on such a computer.

The class NP

The class of problems for which a witness (a potential solution) can be verified in polynomial time is called NP (nondeterministic polynomial time).
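
The first verification example above (checking a colored graph) can be sketched in a few lines. This is only an illustrative sketch, not code from the course: the class name ColoringVerifier, the edge-list representation, and the convention that colors are the integers 0..k-1 are all assumptions made here. The check makes one pass over the vertices and one over the edges, so it runs in time polynomial (in fact linear) in the size of the graph.

```java
class ColoringVerifier {
    /**
     * Checks a witness for the k-coloring problem.
     * edges[i] = {u, v} is an undirected edge (hypothetical representation);
     * colors[v] is the color claimed for vertex v, expected in 0..k-1.
     */
    public static boolean isValidColoring(int[][] edges, int[] colors, int k) {
        // No more than k colors may be used.
        for (int c : colors) {
            if (c < 0 || c >= k) {
                return false;
            }
        }
        // Any two adjacent vertices must have different colors.
        for (int[] e : edges) {
            if (colors[e[0]] == colors[e[1]]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] triangle = {{0, 1}, {1, 2}, {0, 2}};
        System.out.println(isValidColoring(triangle, new int[]{0, 1, 2}, 3)); // true
        System.out.println(isValidColoring(triangle, new int[]{0, 1, 1}, 3)); // false
    }
}
```

Note that the verifier says nothing about how the coloring was found; that is the part the Oracle supplies.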
Example NP problems

3-color, TSP, Hamiltonian cycle and factoring are all NP problems.

Reducibility of problems

There is a very strong sense in which 3-color, TSP, and Hamiltonian are all equivalent problems.
Because of this, if you can come up with a polynomial-time solution to any one of these problems, then you will be able to solve all of them in polynomial time.

Reducibility of problems

The standard way of proving this type of equivalence is by reducing one problem to another.
This is done by showing that it is possible to solve one problem by converting it to another (in polynomial time) and then converting the solution back again.

Reducibility

Suppose we can solve TSP in polynomial time. To solve Hamiltonian:
Convert the instance of Hamiltonian to an instance of TSP.
Solve TSP.
Convert the solution to TSP back to a solution to Hamiltonian.
If the conversions can be done in polynomial time, then the entire process is polynomial.
Thus, TSP is at least as hard as Hamiltonian.

Reducing Hamiltonian to TSP

Given G = (V, E), we are looking for a Hamiltonian cycle.
Consider G’ = (V, E’), a complete graph on V, such that cost(u, v) in G’ is 0 if (u, v) is in E, and 1 otherwise.
Solving Hamiltonian on G is reduced to solving TSP on G’ with k = 0.

Reducing Hamiltonian … (2)

Clearly, the reduction is polynomial: it takes O(|V|^2) time.
If the answer to the corresponding instance of TSP is yes, the same cycle uses only edges of E and so satisfies Hamiltonian.
If the answer to the corresponding instance of TSP is no, the instance of Hamiltonian does not have a solution.

Reducing Hamiltonian … (3)

Remember: We showed how to reduce the Hamiltonian cycle problem to TSP!

Reducibility, cont’d

It turns out that 3-color, TSP, and Hamiltonian can all be reduced to each other.
In other words, a polynomial-time solution to any of them would mean a polynomial-time solution to all.
Factoring is “easier” in the sense that factoring can be reduced to 3-color, TSP, or Hamiltonian, but no reduction in the other direction is known.
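
The reduction from Hamiltonian to TSP described above can be sketched as follows. This is a hedged illustration, not the course's code: the class and method names are made up here, vertices are assumed to be numbered 0..n-1 with n >= 3, and the TSP decision procedure is a brute-force stand-in (exponential time) for the hypothetical polynomial-time TSP solver that the argument assumes. Only the reduce step needs to be polynomial, and it is: it fills an n-by-n matrix.

```java
class HamiltonianToTsp {
    /** Builds the cost matrix of the complete graph G': 0 for edges of G, 1 otherwise. */
    public static int[][] reduce(int n, int[][] edges) {
        int[][] cost = new int[n][n];
        for (int u = 0; u < n; u++) {
            for (int v = 0; v < n; v++) {
                cost[u][v] = (u == v) ? 0 : 1;
            }
        }
        for (int[] e : edges) {
            cost[e[0]][e[1]] = 0;
            cost[e[1]][e[0]] = 0;
        }
        return cost;
    }

    /** Stand-in TSP decision procedure (brute force): is there a tour of cost at most k? */
    public static boolean tspDecision(int[][] cost, int k) {
        boolean[] used = new boolean[cost.length];
        used[0] = true; // fix vertex 0 as the start of every tour
        return extend(cost, k, used, 0, 1, 0);
    }

    // Tries every way of extending the partial tour that currently ends at 'last'.
    private static boolean extend(int[][] cost, int k, boolean[] used,
                                  int last, int count, int soFar) {
        int n = cost.length;
        if (count == n) {
            return soFar + cost[last][0] <= k; // close the cycle back to vertex 0
        }
        for (int v = 1; v < n; v++) {
            if (!used[v]) {
                used[v] = true;
                boolean ok = extend(cost, k, used, v, count + 1, soFar + cost[last][v]);
                used[v] = false;
                if (ok) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Hamiltonian cycle on G is answered by TSP on G' with k = 0. */
    public static boolean hasHamiltonianCycle(int n, int[][] edges) {
        return tspDecision(reduce(n, edges), 0);
    }

    public static void main(String[] args) {
        int[][] square = {{0, 1}, {1, 2}, {2, 3}, {3, 0}};
        int[][] star = {{0, 1}, {0, 2}, {0, 3}};
        System.out.println(hasHamiltonianCycle(4, square)); // true
        System.out.println(hasHamiltonianCycle(4, star));   // false
    }
}
```

A tour of cost 0 in G’ can only use cost-0 edges, which are exactly the edges of G, so the answer carries back without any further conversion.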
Oracles and Reducibility

Another way to think about reductions is using the oracle concept.
Given an oracle for a problem X, to show that a problem Y can be reduced to X, we have to show that we can build an oracle for Y in polynomial time, using the oracle for X.

The hardest NP problems

Let’s imagine the space of all NP problems.
[Figure: a region labeled NP.]

Hardest NP problems

A subset of these are the “hardest”.
A polynomial-time solution to one would give a polynomial-time solution to all NP problems.

NP-complete problems

These “hardest” problems are called the NP-complete problems.
3-color, TSP, and Hamiltonian are all NP-complete.
An NP-complete problem also has the property that all problems in NP are reducible to that problem.

Proving NP-complete: how?

To prove that a problem X in NP is NP-complete, you must reduce a problem already known to be NP-complete to X.
So in the parlance of reductions, we then know X is at least as hard as any of the already known NP-complete problems.
This raises the question: before we knew of any NP-complete problems, how did we prove one NP-complete?

Proving NP-completeness

Make sure to keep the direction straight!
We want to reduce from a known NP-complete problem to the problem we wish to prove NP-complete.
Reducing a known NP-complete problem to another problem X means X is at least as hard as the NP-complete problem.
But since the NP-complete problems are the hardest problems in NP, an X that is itself in NP cannot be harder, so X must also be NP-complete.

The first NP-complete problem

The first problem that was discovered to be NP-complete was the satisfiability problem.
By Cook in 1971.
Given a boolean expression E with variables, is there an assignment of truth values to the variables that makes E true?
An interesting proof… See 15-251, 15-463.

Quiz Break

Is Hamiltonian NP-Complete?

Assume the following are given:
TSP is NP-complete.
Hamiltonian can be reduced to TSP.
Is this sufficient to claim that Hamiltonian is NP-complete?

Is Hamiltonian NP-Complete?
In order to deduce that Hamiltonian is NP-complete, we need a reduction the other way around: from a known NP-complete problem, such as TSP, to Hamiltonian!
Thus, no, not sufficient.

Questions, Questions,…

Is there a polynomial-time algorithm for the NP-complete problems?
In other words, is a computer with an oracle more powerful than a “normal” computer?
The answer: We don’t know!

P=NP?

[Figure: diagram with regions labeled NP-complete, NP, and P.]
Many people suspect that the picture looks like this, but no proof has been found.

Dealing with hard problems

Approximation algorithms
Heuristics
Others…

TSP Approximation

Pick a random visitation order of all nodes in the graph to start out.
Pick two random nodes in the visitation order and switch their order.
If this results in a tour of weight less than the previous best, accept the new ordering; otherwise keep the previous one.
Loop around by picking two random nodes again.

TSP Approximation

If we loop around enough times, we may hit upon the optimal solution to TSP, but this is not guaranteed.
Unlike general solutions to hard problems, approximations cannot be easily reduced to approximations for other hard problems.
Another reason they are so hard…

Summary

Most practical algorithms have polynomial-time upper bounds.
Problems with such algorithms are tractable problems.
But many practical problems have no known polynomial-time algorithms.
Such problems are intractable.
Some problems are even undecidable.
An example is the halting problem.

Summary, cont’d

An interesting class of intractable problems is the class of NP problems with no known polynomial-time algorithm (NP minus P, if P and NP differ).
Solutions to NP search problems can be verified in polynomial time.
Equivalently, NP problems can be solved in polynomial time by a nondeterministic computer.
Some problems are NP-complete.
Other problems can be shown to be NP-complete by reduction.

What you need to know

You are not expected to prove that a problem is NP-complete.
However, you are expected to understand what is meant by undecidable, NP, P, and NP-complete.
You should be able to reason about a problem to determine whether it is in NP.
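
For reference, the random-swap heuristic from the TSP Approximation slides can be sketched as follows. This is a minimal sketch under stated assumptions rather than the course's implementation: the class name TspSwapHeuristic, the cost-matrix representation, the fixed iteration count, and the example weights are all hypothetical choices made here.

```java
import java.util.Random;

class TspSwapHeuristic {
    /** Total weight of the cycle that visits the nodes in the given order and returns to the start. */
    public static int tourCost(int[][] cost, int[] order) {
        int total = 0;
        for (int i = 0; i < order.length; i++) {
            total += cost[order[i]][order[(i + 1) % order.length]];
        }
        return total;
    }

    /** A random visitation order of nodes 0..n-1 (Fisher-Yates shuffle). */
    public static int[] randomOrder(int n, Random rng) {
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            swap(order, i, rng.nextInt(i + 1));
        }
        return order;
    }

    /** Repeatedly swaps two random positions, keeping a swap only if the tour gets cheaper. */
    public static int[] improve(int[][] cost, int[] start, int iterations, Random rng) {
        int[] best = start.clone();
        int bestCost = tourCost(cost, best);
        for (int it = 0; it < iterations; it++) {
            int i = rng.nextInt(best.length);
            int j = rng.nextInt(best.length);
            swap(best, i, j);
            int candidate = tourCost(cost, best);
            if (candidate < bestCost) {
                bestCost = candidate; // accept the new ordering
            } else {
                swap(best, i, j);     // undo: keep the previous best
            }
        }
        return best;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        // Hypothetical symmetric weights on 4 nodes, for illustration only.
        int[][] cost = {{0, 1, 4, 3}, {1, 0, 2, 5}, {4, 2, 0, 6}, {3, 5, 6, 0}};
        Random rng = new Random();
        int[] tour = improve(cost, randomOrder(4, rng), 1000, rng);
        System.out.println(java.util.Arrays.toString(tour) + " cost " + tourCost(cost, tour));
    }
}
```

Because a swap is kept only when it lowers the cost, the result is never worse than the starting order, but the search can get stuck in a tour that no single swap improves, which is why optimality is not guaranteed.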