SHEN’S CLASS NOTES

Chapter 34 NP-Completeness

34.1 Basic Concepts

After long study, people have found that some problems can be solved efficiently, while others are so difficult that only exponential-time algorithms are known for them, or no algorithm is known at all. In order to study the intrinsic complexity of computational problems, we classify problems into classes based on their difficulty.

If a problem can be solved in time that is a polynomial function of n, where n is the input size, we say that the problem belongs to class P. Class P problems are called tractable because they can be solved in polynomial time. A problem that requires super-polynomial time is called intractable.

There is a class of problems whose tractability is not yet known. We call this class NP-complete (NPC). A problem in the NPC class is considered hard because nobody has found a polynomial-time algorithm for it so far. However, it has not been proved that such problems cannot be solved in polynomial time.

To study these problems, we introduce the NP class. A problem belongs to the NP class if it can be solved by an NP algorithm in polynomial time. An NP algorithm is an algorithm that runs on a powerful but hypothetical computing model called the non-deterministic machine model. All NPC problems can be solved by an NP algorithm in polynomial time. Moreover, it has been proved that if any NPC problem can be solved by a (deterministic) polynomial-time algorithm, then all NP problems can be solved in polynomial time too. This is why we use the name NP-Complete for such a problem. On the other hand, if any NPC problem is proved to be intractable, then all NPC problems are intractable.

Encodings

The complexity of a problem is closely related to the input size, which depends on the encoding of the problem. For example, suppose we want to encode the number 99.
Its decimal, binary, and unary representations are:

    (99)_10         (uses 2 digits)
    (1100011)_2     (uses 7 digits)
    (111…1)_1       (uses 99 digits)

So we need to say a few words about encodings. Two encodings e1 and e2 are called polynomially related for a problem I if there are polynomial-time computable functions f and g such that f(e1(i)) = e2(i) and g(e2(i)) = e1(i) for every problem instance i. Obviously, if encodings e1 and e2 are polynomially related for problem I, then I can be solved in polynomial time under encoding e1 if and only if it can be solved in polynomial time under encoding e2.

Almost all encoding methods are polynomially related, except "expensive" encodings such as unary encoding. Suppose we use binary encoding for a number N; then we need about log_2 N bits. If we use a base-b system (b ≥ 2) to represent N, we need about log_b N digits. However, log_2 N = (log_2 b)(log_b N) = k log_b N, where k = log_2 b is a constant. Therefore, using base-2 or base-b encoding changes the input size only by a constant factor. In general, if an encoding e uses an alphabet of b symbols, we can always encode each symbol with ⌈log_2 b⌉ bits over the two symbols {0, 1}. Thus, any such encoding can be translated into a binary encoding without affecting the complexity. We can assume that any "reasonable" encoding, in particular binary encoding, can be used in our discussion of NP-Completeness theory.

Decision Problems vs. Optimization Problems

Many problems are optimization problems, for which we wish to find the best solution. Examples are: find a shortest path between two vertices of a graph, find the largest compatible set of activities, find the MST of a graph. The NP-completeness theory does not apply directly to optimization problems. It is based on "decision problems."

Definition 1 Any problem for which the answer is either yes or no is called a decision problem.
Although the discussion of NP-Completeness is restricted to decision problems, it can usually be applied indirectly to optimization problems. For example, finding a path from vertex u to vertex v with the minimum number of edges is an optimization problem. We can cast this problem as a decision problem as follows:

Given a graph G(V, E) and two vertices u, v ∈ V, does there exist a path from u to v with distance k or less?

We denote the encoding of this problem by <G, u, v, k>. If this decision problem can be solved by an algorithm A(G, u, v, k), then the optimization problem can be solved in the following way (assume u ≠ v):

Shortest-path(G, u, v)
    k ← 1
    while k ≤ |V| − 1 and A(G, u, v, k) = ‘no’
        do k ← k + 1
    if k ≤ |V| − 1
        then return k
        else return ∞        // there is no path from u to v
End

Obviously, the above algorithm is polynomial if A(G, u, v, k) is a polynomial algorithm, because a shortest path has at most |V| − 1 edges and so A is called at most |V| − 1 times. This algorithm finds the length k of the shortest path; it does not actually produce the path. However, it is not hard to design a simple algorithm that produces the path. We leave this to students. From now on, we only discuss decision problems unless specified otherwise.

Polynomial reductions

Definition 2 Let A and B be two problems. We say that A polynomially reduces to B, denoted by A ≤_p B, if there is a procedure, called a reduction algorithm, that transforms any instance α of A into an instance β of B with the following characteristics:
(1) The transformation takes polynomial time.
(2) The answer for α is yes if and only if the answer for β is yes.

Obviously, if A ≤_p B and B is polynomial-time solvable, then A is also polynomial-time solvable.

A Formal-Language Framework

Since formal languages are a powerful tool in establishing NPC theory, we review here some basic notions about them.

Definition 3 An alphabet Σ is a finite set of symbols.

Examples are: Σ = {0, 1}, Σ = {a, b, c}, Σ = {a, b, …, z}.

Definition 4 A language L over Σ is any set of strings made up of symbols from Σ.
Suppose Σ = {0, 1}. Then L = {10, 11, 101, 111, 1011, …} is the language that contains the binary representations of all prime numbers.

The special symbols ε and ∅ are used to represent the empty string and the empty language respectively. Moreover, Σ* is used to represent the set of all strings over Σ; for Σ = {0, 1} this is the set of all binary strings. That is, Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, …}. Every language L over Σ is a subset of Σ*, and Σ* itself is also a language.

Definition 5 Let L1 and L2 be two languages. Their concatenation is the language defined by L1 L2 = {x1 x2 | x1 ∈ L1, x2 ∈ L2}.

For example,
    L1 = {10, 1100, 111000, …} = {1^n 0^n | n ≥ 1},
    L2 = {01, 0011, 000111, …} = {0^n 1^n | n ≥ 1},
    L1 L2 = {1001, 100011, …} = {1^n 0^(n+m) 1^m | m, n ≥ 1}.

Definition 6 Let L be a language. Its complement and Kleene star are the languages defined by:
    L̄ = Σ* − L, and
    L* = {ε} ∪ L ∪ L^2 ∪ L^3 ∪ …

Let Q be a decision problem, x be an instance of the problem, and Q(x) be the answer to the instance x. We use Q(x) = 1 and Q(x) = 0 to represent the answers yes and no respectively. Then the problem Q can be characterized by the following language:

    L = {x ∈ Σ* | Q(x) = 1}.

Definition 7 An algorithm A is said to accept a string x ∈ {0, 1}* if, given input x, the algorithm outputs A(x) = 1. The algorithm A is said to reject a string x if A(x) = 0.

Definition 8 Given an algorithm A, we define the language accepted by A to be the set of strings accepted by A: L = {x | x ∈ {0, 1}* and A(x) = 1}.

Definition 9 A language L is decided by an algorithm A if every string in L is accepted by A and every string not in L is rejected by A.

Now we are ready to define class P.

Class P

The class P is the set of decision problems that can be solved (decided) in polynomial time.

    P = {L | L ⊆ {0, 1}* and there exists an algorithm A that decides L in polynomial time}.

The following theorem shows that if a language can be accepted in polynomial time, then it can also be decided in polynomial time.
So, to prove that a language is in class P, we only need to show that there is a polynomial-time algorithm that accepts it.

Theorem 34.2 P = {L | L is accepted by a polynomial time algorithm}.

Proof. We need only show that if L is accepted by a polynomial-time algorithm A, then we can find an algorithm A’ that decides L in polynomial time. Assume algorithm A accepts L in time O(n^k) for a fixed k. This means that, for any string x of size n in L, algorithm A produces A(x) = 1 within T = c·n^k steps, where c is a constant. Now we design an algorithm A’ in this way: let A’ simulate the actions of A on an input x until A stops or reaches T steps. Then A’ checks the result of A. If A has accepted x, A’ accepts x by outputting 1. If A has not accepted x within T steps, then x cannot be in L, so A’ rejects x by outputting 0. Obviously, A’ correctly decides L in polynomial time.

34.2 Polynomial Time Verification

Suppose we are given a decision problem and additional information which proves that the answer to the decision problem is yes. Can we design a polynomial algorithm to verify that this information indeed proves that the answer is yes? If we can, then this algorithm is called a polynomial (time) verification algorithm. For example, if we are given the problem <G, u, v, k> and also a path from u to v with distance k or less, then we can easily design a polynomial algorithm that verifies that the path is indeed a path in the graph from u to v and that its length is k or less.

The additional information is called a certificate. The length of the certificate must also be a polynomial function of the input size. We assume that if the instance of the problem has a "no" answer, then no certificate can exist. In this case, the verification algorithm can either give a "no" answer or give no answer.
In other words, the verification algorithm is only responsible for the cases where the instance of the problem has a "yes" answer and a correct, polynomially long certificate is provided. Usually, when we cannot solve a hard problem in polynomial time, we try to design a polynomial verification algorithm for it. We call a polynomial verification algorithm an NP-algorithm. In the following, we look at another example.

Hamiltonian Cycles

A Hamiltonian cycle of a graph G(V, E) is a simple cycle that goes through every vertex in V exactly once. A graph that has a Hamiltonian cycle is called a Hamiltonian graph. The Hamiltonian cycle problem is a decision problem that asks whether a given graph G has a Hamiltonian cycle or not. In terms of formal languages, this problem corresponds to the following language.

    HAM-CYCLE = {<G> | G is a Hamiltonian graph}.

This is a difficult problem, so we consider the verification algorithm. Suppose we are given a graph G and a sequence p of vertices. We want a polynomial algorithm that verifies whether p represents a Hamiltonian cycle of G. Obviously, this verification problem can easily be solved in O(n^2) time:

HAM-CYCLE(G(V, E), p)
    Check that every vertex in p belongs to the set V.
    Check that the starting and ending vertices of p are identical.
    Check that every other vertex in p occurs exactly once.
    Check that every vertex in G occurs in p.
    Check that every two adjacent vertices u, v in p are also adjacent in G.
    If all the above steps pass, return yes; otherwise return no.
End

Definition 10 A verification algorithm A(x, y) is a two-argument algorithm that takes an instance x of problem Q and a certificate y of x. This algorithm produces A(x, y) = 1 if y proves Q(x) = 1.

Note. The verification algorithm only needs to be responsible for those certificates y that prove Q(x) = 1. If Q(x) = 0, the algorithm is allowed to output 0, produce nothing, or run forever, but it must never output 1.
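The HAM-CYCLE verification steps above can be sketched in Python. This is only an illustrative sketch under stated assumptions: the function name verify_ham_cycle, the representation of G as a vertex list plus a list of undirected edge pairs, and the convention that p repeats its starting vertex at the end are choices made here, not part of the notes.

```python
def verify_ham_cycle(vertices, edges, p):
    """Return True if the vertex sequence p is a Hamiltonian cycle of the
    undirected graph (vertices, edges). Assumed convention: p lists the
    start vertex again at the end, so it has |V| + 1 entries."""
    vertex_set = set(vertices)
    # Store every undirected edge in both orientations for O(1) lookups.
    edge_set = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}

    # A simple cycle needs at least 3 vertices, and p must have |V| + 1 entries.
    if len(vertex_set) < 3 or len(p) != len(vertex_set) + 1:
        return False
    # Every vertex in p belongs to V.
    if any(v not in vertex_set for v in p):
        return False
    # The starting and ending vertices are identical.
    if p[0] != p[-1]:
        return False
    # p[:-1] has |V| entries; if they cover V, each vertex occurs exactly once.
    if set(p[:-1]) != vertex_set:
        return False
    # Every two consecutive vertices in p are adjacent in G.
    return all((p[i], p[i + 1]) in edge_set for i in range(len(p) - 1))
```

With hashed sets, this sketch runs in time linear in |V| + |E|, comfortably within the O(n^2) bound stated above. Note that it returns False, never True, for any sequence that is not a Hamiltonian cycle.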
The class NP

A language L belongs to class NP if and only if there exist a two-input polynomial-time algorithm A and a constant c such that

    L = {x ∈ {0, 1}* | there exists a certificate y with |y| = O(|x|^c) such that A(x, y) = 1}.

We say that algorithm A verifies language L in polynomial time.

Obviously, HAM-CYCLE ∈ NP.

Note that P ⊆ NP, because if L ∈ P, then L can be accepted by an algorithm A in polynomial time. So a certificate exists for any x ∈ L, and A itself can be used as a verification algorithm. For example, the problem <G, u, v, k> is in P. If the answer for an instance is yes, then a path from u to v with distance k or less exists, and such a path can serve as a certificate.

Also, note that a certificate is not unique. Any string that can be used to prove that the answer is yes can be used as a certificate. In particular, for a language in P, x itself can be used as a certificate, since the verification algorithm can simply decide x.

Example 1 The set partition problem takes as input a set S of numbers. The question is whether the numbers can be partitioned into two sets, A and Ā = S − A, such that

    Σ_{x∈A} x = Σ_{x∈Ā} x.

Show that the set-partition problem is in NP.

Solution: Let the certificate y be a subset Y of the set S. The verification algorithm A takes the following steps:
(1) check that every number in Y is a number in S;
(2) compute the sum of all numbers in the set Y;
(3) compute the set S − Y;
(4) compute the sum of all numbers in the set S − Y;
(5) compare the two sums obtained in (2) and (4). If they are identical, return yes.

34.3 NP-Completeness and Reducibility

Definition 11 A language L1 is polynomial-time reducible to a language L2, written L1 ≤_p L2, if there exists a polynomial-time computable function f : {0, 1}* → {0, 1}* such that for all x ∈ {0, 1}*, x ∈ L1 if and only if f(x) ∈ L2.

The function f is called the reduction function. A polynomial-time algorithm F that computes f is called a reduction algorithm. Figure 34-1 illustrates the reduction function f.
[Fig. 34-1: The reduction function f from {0,1}* to {0,1}*, mapping strings in L1 to strings in L2.]

Note that the reduction function need not be one-to-one or onto. It may be a many-to-one function, and some instances in L2 may be left unmapped.

Lemma 34.3 If L1, L2 ⊆ {0, 1}* are languages such that L1 ≤_p L2, then L2 ∈ P implies L1 ∈ P.

Proof. Let A2 be a polynomial-time algorithm that decides L2, and let F be a polynomial-time reduction algorithm that computes the reduction function f. We shall show how to design a polynomial-time algorithm A1 that decides L1. Fig. 34-2 illustrates the design of A1.

[Fig. 34-2: Algorithm A1. The input x is transformed by F into f(x), which is passed to A2; "yes, f(x) ∈ L2" gives "yes, x ∈ L1", and "no, f(x) ∉ L2" gives "no, x ∉ L1".]

Algorithm A1(x)
    call algorithm F to transform x into f(x)
    call algorithm A2(f(x)) to test if f(x) ∈ L2
    if A2(f(x)) = 1              // this means f(x) ∈ L2
        then return 1            // x ∈ L1
        else return 0            // x ∉ L1
End

Obviously, the algorithm correctly decides L1. Its running time is polynomial because F runs in polynomial time, so the length of f(x) is polynomial in |x|, and A2 runs in time polynomial in the length of f(x).

The class NPC

Definition 12 A language L ⊆ {0, 1}* is called NP-Complete if the following two conditions hold:
(1) L ∈ NP, and
(2) L’ ≤_p L for every L’ ∈ NP.

If a language L satisfies condition (2), but not necessarily (1), then we say that L is NP-hard.

From the definition, any NP-Complete problem is also an NP-hard problem.

Definition 13 The set of all NP-Complete problems is called the NP-Complete class or the NPC class. That is, NPC = {L | L is NP-Complete}.

Theorem 34.4 If any NP-Complete problem is polynomial-time solvable, then P = NP. Equivalently, if any problem in NP is not polynomial-time solvable, then no NP-Complete problem is polynomial-time solvable.

Proof. Suppose L ∈ P and also L ∈ NPC. By the definition of NP-Completeness, L’ ≤_p L for every L’ ∈ NP. From Lemma 34.3, we then have L’ ∈ P. So NP ⊆ P, and since P ⊆ NP, we get P = NP. The second statement is the contrapositive of the first, so it is true also.

So far it is not known whether P = NP or P ≠ NP, although most people believe P ≠ NP.
This is the most famous open problem in computer science. If P = NP, then every language in P other than ∅ and {0, 1}* is NP-Complete. Otherwise, P ≠ NP, NPC ⊂ NP, and P ∩ NPC = ∅, as illustrated by Fig. 34-3.

[Fig. 34-3: The relationship among P, NP, and NPC when P ≠ NP: P and NPC are disjoint subsets of NP.]

Circuit Satisfiability

We will show that the NPC class is not empty by showing that the circuit satisfiability problem is NP-Complete.

Definition 14 A Boolean combinational circuit composed of AND, OR, and NOT gates is satisfiable if a set of input values can be found such that the output of the circuit is 1.

Example 2 Fig. 34-4 shows two circuits; one is satisfiable and the other is not.

[Fig. 34-4: Two circuits on inputs x1, x2, x3. (a) A satisfiable circuit. (b) An unsatisfiable circuit.]

Suppose a combinational circuit C is encoded as a binary sequence <C>. Then the circuit satisfiability problem corresponds to the following language:

    CIRCUIT-SAT = {<C> | C is a satisfiable circuit}.

Lemma 34.5 CIRCUIT-SAT ∈ NP.

Proof. We design a two-input polynomial-time algorithm A that verifies CIRCUIT-SAT. One input is the circuit C and the other is a certificate corresponding to an assignment of Boolean values to the wires of C. The algorithm A is constructed as follows. For each logic gate, it checks that the value provided by the certificate for the output wire is correctly computed from the values on the input wires. Then, if the final output of the entire circuit is 1, the algorithm outputs 1. Otherwise A outputs 0. When the circuit is satisfiable, a certificate exists and its length is linear in the size of the circuit. The time to verify is linear in the number of gates and wires. Thus A is a polynomial-time algorithm. Therefore, CIRCUIT-SAT ∈ NP.

Lemma 34.6 CIRCUIT-SAT is NP-hard.

Proof. Omitted.

Theorem 34.7 CIRCUIT-SAT ∈ NPC.

Proof. This is obtained directly from Lemmas 34.5 and 34.6.

34.4 NP-Completeness Proofs

Lemma 34.8 If L is a language such that L’ ≤_p L for some L’ ∈ NPC, then L is NP-hard. Moreover, if L ∈ NP, then L ∈ NPC.

Proof.
Since L’ is NP-Complete, for any L’’ ∈ NP we have L’’ ≤_p L’. Because L’ ≤_p L, we have L’’ ≤_p L by transitivity. (See Exercise 34.3-2.) Therefore, L is NP-hard. Moreover, if L ∈ NP, then L ∈ NPC by definition.

A Method for Proving that a Language L is NP-Complete

Based on Lemma 34.8, we often use the following steps to prove that a language L is NP-Complete.
(1) Prove L ∈ NP.
(2) Select a known NP-Complete language L’.
(3) Describe an algorithm F that transforms every instance x ∈ {0, 1}* of L’ into an instance f(x) of L.
(4) Prove that x ∈ L’ if and only if f(x) ∈ L for all x ∈ {0, 1}*.
(5) Prove that the algorithm F runs in polynomial time.

In the following, we study an NPC problem.

Formula Satisfiability

We define the formula satisfiability problem in terms of the language SAT. An instance of SAT is a Boolean formula φ which consists of:
(1) n Boolean variables: x1, x2, …, xn;
(2) m Boolean connectives. Each connective has one or two inputs and one output. Possible connectives are: ∧ (AND), ∨ (OR), ¬ (NOT), → (implication), ↔ (if and only if);
(3) Parentheses, used to define the order of the connectives. We assume there are no redundant parentheses.

A truth assignment for a formula φ is a set of values for the variables of φ. A satisfying assignment is a truth assignment such that φ = 1. A formula is satisfiable if it has a satisfying assignment.

    SAT = {<φ> | φ is a satisfiable Boolean formula}.

Example 3
φ = ((x1 → x2) ∨ ¬((¬x1 ↔ x3) ∨ x4)) ∧ ¬x2 is satisfiable. A satisfying assignment is <x1 = 0, x2 = 0, x3 = 1, x4 = 1>:

    φ = ((0 → 0) ∨ ¬((¬0 ↔ 1) ∨ 1)) ∧ ¬0
      = (1 ∨ ¬(1 ∨ 1)) ∧ 1
      = (1 ∨ 0) ∧ 1
      = 1.

Theorem 34.9 SAT ∈ NPC.

Proof. We first prove that SAT ∈ NP. Given a certificate consisting of a satisfying assignment for a formula φ, the verifying algorithm simply replaces each variable in the formula with its corresponding value and then evaluates the expression. This can be done in polynomial time. So, SAT ∈ NP.

Now we prove that SAT is NP-hard by showing that CIRCUIT-SAT ≤_p SAT. We will show how to transform a circuit into a formula.
The transformation takes the following steps:
(1) Create n variables x1, x2, …, xn for the n input lines of the circuit.
(2) Let m be the number of gates in the circuit. Create a new variable x_(n+i) for the output wire of gate i, 1 ≤ i ≤ m.
(3) For gate i, create a simple formula fi that establishes an "if and only if" relation between its input variables and its output variable, 1 ≤ i ≤ m. Specifically,
    (3.1) if gate i is a NOT gate and the input variable is xj, then fi = (x_(n+i) ↔ ¬xj);
    (3.2) if gate i is an OR gate and the input variables are xr, x_(r+1), …, xj, then fi = (x_(n+i) ↔ (xr ∨ x_(r+1) ∨ … ∨ xj));
    (3.3) if gate i is an AND gate and the input variables are xr, x_(r+1), …, xj, then fi = (x_(n+i) ↔ (xr ∧ x_(r+1) ∧ … ∧ xj)).
(4) Let x_(n+m) be the variable corresponding to the output wire of the circuit. Then the formula is φ = x_(n+m) ∧ f1 ∧ f2 ∧ … ∧ fm.

Fig. 34-5 shows an example.

[Fig. 34-5: A circuit with inputs x1, x2, x3 and seven gates, where the output wire of gate i is x_(3+i) and x10 is the output of the circuit.]

    f1 = (x4 ↔ ¬x3),
    f2 = (x5 ↔ (x1 ∨ x2)),
    f3 = (x6 ↔ ¬x4),
    f4 = (x7 ↔ (x1 ∧ x2 ∧ x4)),
    f5 = (x8 ↔ (x5 ∨ x6)),
    f6 = (x9 ↔ (x6 ∨ x7)),
    f7 = (x10 ↔ (x7 ∧ x8 ∧ x9)).

    φ = x10 ∧ (x4 ↔ ¬x3) ∧ (x5 ↔ (x1 ∨ x2)) ∧ (x6 ↔ ¬x4) ∧ (x7 ↔ (x1 ∧ x2 ∧ x4)) ∧ (x8 ↔ (x5 ∨ x6)) ∧ (x9 ↔ (x6 ∨ x7)) ∧ (x10 ↔ (x7 ∧ x8 ∧ x9)).

Now we prove that the circuit is satisfiable if and only if φ is satisfiable.

(1) Suppose the circuit is satisfiable. Let the values x1, x2, …, xn satisfy the circuit. Then we can use the same values for the variables x1, x2, …, xn in the formula φ. Moreover, we use the value output by gate i for the variable x_(n+i) in φ. Because each formula fi correctly defines the function of gate i, the value of each fi will be 1, and x_(n+m) = 1 because the circuit outputs 1. So, if we evaluate φ, we get 1, which means that φ is satisfiable.

(2) Suppose φ is satisfiable. Let a set of values of x1, x2, …, xn, x_(n+1), …, x_(n+m) satisfy φ. Then we can use the same values of x1, x2, …, xn as the input values to the circuit. Because φ = 1, each formula fi must equal one, and x_(n+m) = 1 as well.
Because fi correctly defines the function of gate i, the value on the output wire of gate i must equal the value of x_(n+i). In particular, the value of the output of the circuit is equal to x_(n+m), which is equal to one. Therefore, the circuit is satisfiable.

Obviously, the transformation takes linear time. Thus, CIRCUIT-SAT ≤_p SAT, which completes the proof that SAT ∈ NPC.

3-SAT

3-SAT is a short name for the 3-CNF satisfiability problem.

Definition 15 A literal in a Boolean formula is an occurrence of a variable x or its negation ¬x.

Definition 16 A Boolean formula is in conjunctive normal form (CNF) if it is expressed as an AND of clauses, where a clause is the OR of one or more literals.

Definition 17 A 3-CNF is a CNF in which each clause has exactly three distinct literals.

Example 4 The following formula is a 3-CNF.
    φ = (x1 ∨ ¬x1 ∨ ¬x2) ∧ (x3 ∨ x2 ∨ x4) ∧ (¬x1 ∨ ¬x3 ∨ ¬x4).

Theorem 34.10 3-SAT ∈ NPC.

Proof. Omitted.

34.5 NP-Complete Problems

In this section, we study several of the most well-known NPC problems. We will learn some proof skills and techniques from these examples. We prove that a new problem is NPC by polynomially reducing a known NPC problem to this new problem. Fig. 34-6 shows the NPC problems that will be studied and, for each reduction, which problem is reduced to which. The first two reductions have been discussed.

[Fig. 34-6: The structure of the reductions among CIRCUIT-SAT, SAT, 3-SAT, SUBSET-SUM, CLIQUE, VERTEX-COVER, HAM-CYCLE, and TSP.]

The Clique Problem

A clique in an undirected graph G(V, E) is a subset V’ ⊆ V of vertices such that every two of them are adjacent. So, a clique is a complete subgraph of G. The clique problem is to find a clique of maximum size. This is an optimization problem. A corresponding decision problem is to decide whether graph G has a clique of size k. This problem can be defined as the following language.

    CLIQUE = {<G, k> | G is a graph with a clique of size k}.

Theorem 34.11 CLIQUE ∈ NPC.

Proof.
First, we prove that CLIQUE ∈ NP. Given a certificate that consists of k vertices, it is easy to check whether the k vertices form a k-clique. Checking whether two vertices are adjacent needs at most O(n) time, where n is the input size, by scanning the input once. So, the verification can be done in O(k^2 n) time.

Now, we prove that CLIQUE is NP-hard by proving 3-SAT ≤_p CLIQUE. Let φ = C1 ∧ C2 ∧ … ∧ Ck be the input for the 3-SAT problem, where each Cr is a clause with three literals. Let Cr = (l_1^r ∨ l_2^r ∨ l_3^r), 1 ≤ r ≤ k. We construct a graph G(V, E) from φ. The vertex set V contains 3k vertices, one triple for each clause:

    V = {v_1^r, v_2^r, v_3^r | 1 ≤ r ≤ k}.

For edges, (v_i^r, v_j^s) ∈ E if the following two conditions hold:
(1) r ≠ s
(2) l_i^r ≠ ¬l_j^s

The first condition means that v_i^r and v_j^s are in different triples. The second means that the corresponding literals are not complements of each other. Fig. 34-7 shows the graph constructed from the formula φ = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (¬x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x2 ∨ x3).

[Fig. 34-7: The graph G constructed from φ, with one triple of vertices for each of the clauses C1 = (x1 ∨ ¬x2 ∨ ¬x3), C2 = (¬x1 ∨ x2 ∨ x3), and C3 = (x1 ∨ x2 ∨ x3).]

We will show that φ is satisfiable if and only if the constructed graph G(V, E) has a k-clique.

Suppose φ has a satisfying assignment. Then each clause Cr has at least one literal l_i^r = 1. Its corresponding vertex in G is v_i^r. Selecting one such literal from each clause, we get k corresponding vertices in G. Among these k vertices, any two are adjacent because they belong to different triples and their corresponding literals are not complements of each other. This is because a literal and its complement cannot both be equal to 1. Therefore, these k vertices form a k-clique.

Now, suppose G has a clique V’ of size k. Then any two vertices in V’ must belong to different triples, so V’ contains exactly one vertex from each triple. We assign one to the k corresponding literals in φ. That is, we assign l_i^r = 1 if v_i^r ∈ V’. Obviously, if v_i^r ∈ V’, then any vertex u corresponding to the complement of l_i^r is not in V’, because (u, v_i^r) ∉ E.
Thus, this assignment does not run the risk that both a variable and its negation are assigned one. After this, we assign 0 to the literals that are negations of the assigned literals. If there are other variables not yet assigned, we arbitrarily assign each of them one and its negation zero. This assignment satisfies the formula φ because every clause contains a literal that is assigned one. Because the construction of the graph takes polynomial time, 3-SAT ≤_p CLIQUE, which proves CLIQUE ∈ NPC.

The Vertex-Cover Problem

A vertex cover of a graph G(V, E) is a vertex subset V’ ⊆ V such that if (u, v) ∈ E, then u ∈ V’ or v ∈ V’ or both. The vertex cover problem is an optimization problem: find a vertex cover of minimum size. Its decision problem can be defined by the following language:

    VERTEX-COVER = {<G, k> | G has a vertex cover of size k}.

Theorem 34.12 VERTEX-COVER ∈ NPC.

Proof. We prove VERTEX-COVER ∈ NP first. Let the certificate be a set of vertices V’ ⊆ V. The verification algorithm checks that the following are true:
(1) |V’| = k.
(2) For every edge (u, v) ∈ E, either u ∈ V’ or v ∈ V’.
Obviously, this verification can be done in polynomial time.

Now, we prove that VERTEX-COVER is NP-hard by showing CLIQUE ≤_p VERTEX-COVER. Let G(V, E) be the graph for the CLIQUE problem. We construct a new graph G’ for the VERTEX-COVER problem. The construction of G’ is easy: it is the complement graph of G. That is, G’ = Ḡ = (V, E’), where E’ = {(u, v) | u, v ∈ V, u ≠ v, and (u, v) ∉ E}.

[Fig. 34-8: (a) A graph G on the vertices u, v, w, x, y, z. (b) Its complement Ḡ.]

Let |V| = n and k’ = n − k. We shall show that G has a k-clique if and only if Ḡ has a vertex cover of size k’.

Suppose G has a k-clique V’ ⊆ V. We claim that V − V’ is a vertex cover of Ḡ. To see this, look at any edge (u, v) ∈ E’. Obviously, (u, v) ∉ E. So u and v cannot both belong to the clique V’, and at least one of them must belong to V − V’. So, V − V’ is a vertex cover of Ḡ with size |V − V’| = n − k = k’.

Conversely, suppose Ḡ has a vertex cover V’ ⊆ V, where |V’| = n − k = k’. Then, for any u, v ∈ V, if (u, v) ∈ E’, then u ∈ V’ or v ∈ V’ or both.
This implies that if u ∉ V’ and v ∉ V’ (with u ≠ v), then (u, v) ∉ E’, that is, (u, v) ∈ E. Therefore, V − V’ is a clique of G with size |V − V’| = n − k’ = k.

Thus, we have proved CLIQUE ≤_p VERTEX-COVER. So, VERTEX-COVER ∈ NPC.

The Hamiltonian Cycle Problem

We have defined this problem before. Now we state its NP-Completeness.

Theorem 34.13 The Hamiltonian Cycle problem is NP-Complete.

Proof. The proof is given in the book. Because it is too lengthy, we omit it here.

The Traveling-Salesman Problem

A traveling salesman wishes to make a tour, visiting each city exactly once and returning to the starting city. Suppose there is a direct connection between any two cities, so finding such a tour is easy. The problem is that there is a cost associated with each connection, and the traveling salesman wants to minimize the total cost. We formalize this optimization problem in graph terminology as follows:

Given a weighted complete graph G(V, E), find a Hamiltonian cycle whose total weight (cost) is minimized.

A corresponding decision problem can be defined as:

Given a weighted complete graph G(V, E) and a number k, does G have a Hamiltonian cycle whose total weight is k or less?

We assume all weights are integers. We can also define this problem by the following language:

    TSP = {<G, c, k> | G(V, E) is a complete graph, c is a function V × V → Z, k ∈ Z, and G has a Hamiltonian cycle with cost at most k}.

Theorem 34.14 TSP ∈ NPC.

Proof. It is easy to see that TSP ∈ NP: the tour itself serves as a certificate. We will show that TSP is NP-hard by showing HAM-CYCLE ≤_p TSP. Let G(V, E) be an instance of HAM-CYCLE. We construct an instance of TSP as follows. The instance for TSP is a graph G’(V’, E’), where V’ = V and G’ is a complete graph. So, E’ = {(i, j) | i, j ∈ V and i ≠ j}. The weight (cost) of each edge is defined in this way:

    c(i, j) = 0 if (i, j) ∈ E,
    c(i, j) = 1 if (i, j) ∉ E.

Then, <G’, c, 0> is the instance for TSP. This reduction takes polynomial time.
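The construction of the TSP instance from a HAM-CYCLE instance can be sketched as follows. This is a minimal sketch under stated assumptions: the function names ham_cycle_to_tsp and tour_cost, and the representation of the cost function c as a Python dictionary keyed by ordered vertex pairs, are illustrative choices and not part of the notes.

```python
from itertools import combinations

def ham_cycle_to_tsp(vertices, edges):
    """Build the TSP instance (V', c, k) from a HAM-CYCLE instance (V, E):
    cost 0 on pairs that are edges of G, cost 1 on all other pairs, k = 0."""
    edge_set = {frozenset(e) for e in edges}
    cost = {}
    # G' is complete, so every unordered pair of distinct vertices gets a cost.
    for i, j in combinations(vertices, 2):
        c = 0 if frozenset((i, j)) in edge_set else 1
        cost[(i, j)] = c
        cost[(j, i)] = c
    return list(vertices), cost, 0

def tour_cost(tour, cost):
    """Total cost of a closed tour given as a vertex list (start not repeated)."""
    n = len(tour)
    return sum(cost[(tour[i], tour[(i + 1) % n])] for i in range(n))
```

The loop visits each of the |V|(|V| − 1)/2 vertex pairs once, which matches the claim that the reduction takes polynomial time.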
Now it is straightforward to see that G has a Hamiltonian cycle if and only if G’ has a salesman tour of total cost 0, because a tour of cost 0 can use only edges that belong to E. Therefore TSP ∈ NPC.

The Subset-Sum Problem

In the subset-sum problem, we are given a finite set S ⊂ N and a target number t ∈ N. We ask whether there is a subset S’ ⊆ S whose elements sum to t. For example, if S = {1, 2, 7, 8, 14} and t = 15, then S’ = {7, 8} is a solution.

Formally, we can define

    SUBSET-SUM = {<S, t> | there exists a subset S’ ⊆ S such that Σ_{s∈S’} s = t}.

Theorem 34.15 SUBSET-SUM ∈ NPC.

Proof. First we prove that SUBSET-SUM ∈ NP. Let the certificate be a subset S’ of S; then checking Σ_{s∈S’} s = t can easily be done in polynomial time.

Now we prove that SUBSET-SUM is NP-hard by showing 3-SAT ≤_p SUBSET-SUM. Let the formula φ be the input to the 3-SAT problem. We will construct an instance <S, t> for SUBSET-SUM. Without loss of generality, we assume:
(1) No clause contains both x and ¬x. This is because such a clause is always true and can be deleted.
(2) Each variable appears in at least one clause.

Suppose φ has n variables x1, x2, …, xn and k clauses C1, C2, …, Ck. The instance <S, t> will have 2(n + k) decimal numbers in the set S, two for each variable and two for each clause. Each number has (n+k) digits, labeled by the n variables and k clauses as illustrated by Fig. 34-8. The number t is also an (n+k)-digit number.

    x1  x2  …  xn  C1  C2  …  Ck

    Fig. 34-8 The structure of the (n+k) digits.

Specifically, we do the following.

(1) For each variable xi, generate two numbers vi and vi’, one for xi itself and the other for its complement ¬xi.
The (n+k) digits of vi are determined as follows: the digit under xi is 1. If xi appears in Cj, then the digit under Cj is 1. All other digits are 0.
The (n+k) digits of vi’ are determined as follows: the digit under xi is 1. If ¬xi appears in Cj, then the digit under Cj is 1. All other digits are 0.

(2) For each clause Cj, generate two numbers sj and sj’.
The number sj has a zero in every digit except the digit under Cj, which is 1. The number sj’ has a zero in every digit except the digit under Cj, which is 2.

(3) The number t has a one in each of the first n digits, corresponding to the n variables x1, x2, …, xn. It has a 4 in each of the last k digits, corresponding to the k clauses C1, C2, …, Ck.

Example 5 Figure 34-9 shows how the number t and the set S of 14 numbers are generated from the formula φ = C1 ∧ C2 ∧ C3 ∧ C4, where
    C1 = (x1 ∨ ¬x2 ∨ ¬x3)
    C2 = (¬x1 ∨ ¬x2 ∨ ¬x3)
    C3 = (¬x1 ∨ ¬x2 ∨ x3)
    C4 = (x1 ∨ x2 ∨ x3)

            x1  x2  x3  C1  C2  C3  C4
    v1   =   1   0   0   1   0   0   1
    v1’  =   1   0   0   0   1   1   0
    v2   =   0   1   0   0   0   0   1
    v2’  =   0   1   0   1   1   1   0
    v3   =   0   0   1   0   0   1   1
    v3’  =   0   0   1   1   1   0   0
    s1   =   0   0   0   1   0   0   0
    s1’  =   0   0   0   2   0   0   0
    s2   =   0   0   0   0   1   0   0
    s2’  =   0   0   0   0   2   0   0
    s3   =   0   0   0   0   0   1   0
    s3’  =   0   0   0   0   0   2   0
    s4   =   0   0   0   0   0   0   1
    s4’  =   0   0   0   0   0   0   2
    t    =   1   1   1   4   4   4   4

    Fig. 34-9

Obviously, this construction of <S, t> takes polynomial time. Now, we show that φ is satisfiable if and only if <S, t> has a yes answer.

(1) Suppose φ has a satisfying assignment. We select numbers from the set S as follows. Check each xi. If xi = 1, we include the number vi in the set S’; otherwise we include vi’. After the n variables have been checked, we add up the numbers selected so far in S’. Let this sum be r. From the construction, it is easy to see that the number r has a one in each of the first n digits. For example, in Example 5, φ has the satisfying assignment x1 = 0, x2 = 0, x3 = 1. So, v1’, v2’, and v3 are selected, and r = 1111231. The number r ≠ t yet. We notice that each of the last k digits in r must be 1, 2, or 3. This is because in every clause there is at least one literal, but at most three literals, equal to one, and we have selected exactly those numbers whose corresponding literals are equal to one.

Now, we check each of the last k digits in the number r. If the digit under Cj is one, we include the numbers sj and sj’ in the set S’.
If it is two, we include the number sj’ in the set S’. If it is three, we include the number sj in the set S’. Now, the sum of all numbers in the set S’ is equal to t. This is because adding the number sj to r increases the digit under Cj by one without changing the other digits; adding the number sj’ to r increases the digit under Cj by two; and adding both sj and sj’ increases the digit under Cj by three. Therefore, the way we select the numbers makes each of the last k digits equal to 4 in the sum of all numbers in S’, and the instance we constructed has a yes answer. In Fig. 34-9, the shaded rows (v1’, v2’, v3, s1, s1’, s2’, s3, s4, s4’) are the numbers included in the set S’. Obviously, the sum of these numbers equals t.

(2) Suppose the instance <S, t> we have constructed has a yes answer. That is, there is a subset S’ ⊆ S such that Σ_{s∈S’} s = t. We will show a satisfying assignment for the formula φ. Note first that no carries occur when numbers in S are added, because the digits in any one position sum to at most 6 over all of S. From the construction of <S, t>, S’ must include either vi or vi’, but not both, so that the sum has a one in each of the first n digits. We assign xi = 1 if vi ∈ S’, and xi = 0 otherwise, 1 ≤ i ≤ n. Now, we show that this assignment satisfies φ. The sum t has a 4 in each of the last k digits, corresponding to the k clauses Cj, 1 ≤ j ≤ k, while sj and sj’ together contribute at most 3 to the digit under Cj. So S’ must include some vi or vi’ whose literal appears in Cj. This means that some literal in Cj is assigned one. Therefore, every Cj, 1 ≤ j ≤ k, is satisfied, and hence φ is satisfied too.

End of Chapter 34.
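As a closing illustration, the construction of <S, t> used in the proof of Theorem 34.15 can be sketched in Python. This is only a sketch under stated assumptions: the function name three_sat_to_subset_sum, the encoding of a literal as a signed integer (+i for xi, -i for its negation), and the dictionary of named numbers are conventions chosen here, not part of the notes.

```python
def three_sat_to_subset_sum(n, clauses):
    """Build (numbers, t) from a 3-CNF formula with variables x1..xn.
    Assumed convention: each clause is a tuple of nonzero integers,
    +i standing for x_i and -i for NOT x_i."""
    k = len(clauses)
    width = n + k  # digit positions: x1..xn, then C1..Ck

    def make_number(digits):
        # digits maps a position (0 = leftmost, under x1) to a decimal digit.
        return sum(d * 10 ** (width - 1 - pos) for pos, d in digits.items())

    numbers = {}
    for i in range(1, n + 1):
        # v_i is built from the literal x_i, v_i' from its negation.
        for literal, name in ((i, f"v{i}"), (-i, f"v{i}'")):
            digits = {i - 1: 1}
            for j, clause in enumerate(clauses):
                if literal in clause:
                    digits[n + j] = 1
            numbers[name] = make_number(digits)
    for j in range(k):
        numbers[f"s{j + 1}"] = make_number({n + j: 1})
        numbers[f"s{j + 1}'"] = make_number({n + j: 2})

    target = {pos: 1 for pos in range(n)}
    target.update({n + j: 4 for j in range(k)})
    return numbers, make_number(target)
```

Applied to the formula of Example 5, written as [(1, -2, -3), (-1, -2, -3), (-1, -2, 3), (1, 2, 3)] with n = 3, the sketch should reproduce the rows of Fig. 34-9, and the numbers named v1', v2', v3, s1, s1', s2', s3, s4, s4' add up to the target.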