SAT-CNF Is NP-complete*

Rod Howell
Kansas State University
November 9, 2000

* Copyright © 2000, Rod Howell. This paper may be copied or printed in its entirety for use in conjunction with CIS 775, Analysis of Algorithms, at Kansas State University. Otherwise, no portion of this paper may be reproduced in any form or by any electronic or mechanical means without permission in writing from Rod Howell.

The purpose of this paper is to give a detailed presentation of an NP-completeness proof using the definition of NP given by Brassard and Bratley [1] and the following definition of NP-hardness:

Definition 1 A decision problem $Y$ is NP-hard if for every $X \in \mathrm{NP}$, $X \le_m^p Y$.

We focus on the following two problems:

Satisfiability (SAT):
Input: A formula $f$ over boolean variables with operators $\wedge$, $\vee$, and $\neg$.
Question: Is there an assignment of boolean values to the variables in $f$ such that $f$ is true?

Conjunctive Normal Form Satisfiability (SAT-CNF):
Input: A formula $f$ in conjunctive normal form (CNF), i.e., of the form

\[ \bigwedge_{i=1}^{m} \bigvee_{j=1}^{k_i} \alpha_{ij}, \]

where each $\alpha_{ij}$ is either a boolean variable or the negation of a boolean variable.
Question: Is there an assignment of boolean values to the variables in $f$ such that $f$ is true?

We will show that SAT-CNF is NP-complete (this fact is originally due to Cook [2]). We assume the following theorem, also due to Cook [2]:

Theorem 1 SAT is NP-hard.

We first show that SAT $\in$ NP. Because SAT-CNF is a special case of SAT, it will follow that SAT-CNF $\in$ NP.

Theorem 2 SAT $\in$ NP.

Proof: Our proof space $Q$ will be the set of finite sequences of boolean values. Let $\Phi$ be the set of all boolean formulas, and let $V$ be the set of all boolean variables. We define a partial function $g : \Phi \times \mathbb{Z}^+ \rightharpoonup V$ so that $g(f, k)$ is the variable in $f$ whose first occurrence is $k$th among the first occurrences of all variables in $f$; if $f$ contains fewer than $k$ variables, $g(f, k)$ is undefined. We then define $F \subseteq \mathrm{SAT} \times Q$ so that $\langle f, \langle b_1, b_2, \ldots, b_m \rangle \rangle \in F$ iff

• $f$ contains exactly $m$ variables; and
• the assignment of the boolean value $b_k$ to the variable $g(f, k)$ for $1 \le k \le m$ is a satisfying assignment for $f$.

Thus, for each $f \in$ SAT, there is a $q \in Q$ no longer than $f$ such that $\langle f, q \rangle \in F$. We will now show that we can decide whether $\langle f, q \rangle \in F$ in $O(n^2)$ time, where $n$ is the sum of the lengths of $f$ and $q$. Our algorithm proceeds as follows:

1. We first construct a list of all the variables in $f$, ordered by the position of their first occurrence. As we construct this list, we record the position in this list of each occurrence of each variable in $f$. This can clearly be done in $O(n^2)$ time.
2. We then verify that the length of $q$ is exactly the number of variables in $f$. This can clearly be done in $O(1)$ time if $q$ is stored in an array.
3. We then verify that $f$ is satisfied by assigning the $k$th variable in the variable list the $k$th boolean value in $q$, for all $k$. We accomplish this by a straightforward recursive evaluation of $f$, looking up the value of each variable using its recorded position in the variable list. Assuming the sequence $q$ is stored as an array, this evaluation can be done in $O(n)$ time.

Corollary 1 SAT-CNF $\in$ NP.

Corollary 2 SAT is NP-complete.

Our proof of NP-hardness will involve a reduction from SAT to SAT-CNF. This reduction will consist of two steps. First, we will convert the given boolean formula to an equivalent formula in which negation is applied only to variables, not to arbitrary subexpressions. We will then construct from the resulting formula a formula in CNF that is satisfiable iff the original formula is satisfiable. The CNF formula will not, in fact, be equivalent to the original formula, because conversion to an equivalent CNF formula can result in an exponentially larger formula, and hence uses exponential time in the worst case. We begin by showing how negations can be moved to the variables.
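Before turning to that construction, the three-step verification procedure from the proof of Theorem 2 can be sketched in Python. This is only an illustration under assumptions that are not part of the paper: formulas are encoded as nested tuples, and the names `variables_in_order`, `evaluate`, and `verify` are hypothetical. A dictionary stands in for the recorded list positions used in step 3.

```python
# Hypothetical formula encoding (an assumption, not from the paper):
#   ("var", name), ("not", f), ("and", f, g), ("or", f, g)

def variables_in_order(f, seen=None):
    """Step 1: list the variables of f, ordered by first occurrence."""
    if seen is None:
        seen = []
    if f[0] == "var":
        # Linear scan of the list; over the whole formula this is O(n^2).
        if f[1] not in seen:
            seen.append(f[1])
    else:
        for sub in f[1:]:
            variables_in_order(sub, seen)
    return seen

def evaluate(f, value):
    """Step 3: recursive evaluation of f under the mapping `value`."""
    op = f[0]
    if op == "var":
        return value[f[1]]
    if op == "not":
        return not evaluate(f[1], value)
    if op == "and":
        return evaluate(f[1], value) and evaluate(f[2], value)
    if op == "or":
        return evaluate(f[1], value) or evaluate(f[2], value)
    raise ValueError(f"unknown operator: {op!r}")

def verify(f, q):
    """Decide whether the pair <f, q> is in F."""
    order = variables_in_order(f)
    # Step 2: q must have exactly one boolean per variable of f.
    if len(q) != len(order):
        return False
    # The k-th variable (by first occurrence) receives the k-th value of q.
    return evaluate(f, dict(zip(order, q)))
```

For example, with `f` encoding $(x \vee \neg y) \wedge y$, the certificate `[True, True]` is accepted, while `[False, True]` and the too-short `[True]` are rejected.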
Lemma 1 There is a linear-time algorithm to convert arbitrary boolean formulas to equivalent boolean formulas in which negation is applied only to variables.

Proof: Our algorithm uses the following laws of boolean algebra:

\begin{align}
\neg(f \vee g) &= \neg f \wedge \neg g; \tag{1} \\
\neg(f \wedge g) &= \neg f \vee \neg g; \text{ and} \tag{2} \\
\neg\neg f &= f. \tag{3}
\end{align}

We can apply one of these three laws to any subexpression $\neg f$, where $f$ is not a variable, to obtain an equivalent subexpression in which all negated subexpressions are strictly shorter than $f$. A straightforward recursive implementation of this strategy produces an equivalent formula of the proper form in linear time.

We now define a literal to be either a variable or the negation of a variable. Using this definition, we can describe our reduction from SAT to SAT-CNF.

Theorem 3 SAT-CNF is NP-hard.

Proof: We will show that SAT $\le_m^p$ SAT-CNF. From Theorem 1, it will follow that SAT-CNF is NP-hard.

Let $f$ be an arbitrary boolean formula. Our algorithm first converts $f$ to an equivalent formula $f'$ in which negations are applied only to variables. From Lemma 1, this can be done in linear time. Furthermore, because the conversion can be done in linear time, the length of $f'$ is linear in the length of $f$.

For the next step of our algorithm, we assume the existence of a polynomial-time algorithm to generate new unique variables. In particular, after the algorithm is initialized, we can call it arbitrarily many times, and each time it will return a variable different from any variable in $f'$ and any variable it had previously returned. It is not hard to design such an algorithm that returns $n$ variables in a time polynomial in $n$ and the length of $f'$.

We now describe a recursive algorithm that takes a formula $\varphi$ in which negations are applied only to variables, and produces a formula $\varphi'$ in CNF that is satisfiable iff $\varphi$ is satisfiable. Let $V$ be the set of variables in $\varphi$ and $V'$ be the set of variables in $\varphi'$.
Specifically, $\varphi'$ will be satisfiable by an assignment $g' : V' \to \{\text{true}, \text{false}\}$ iff $\varphi$ is satisfiable by the assignment $g : V \to \{\text{true}, \text{false}\}$, where $g(v) = g'(v)$ for all $v \in V$.

The base case occurs when $\varphi$ is a literal. In this case, the algorithm simply returns $\varphi$. Otherwise, there are two cases.

Case 1: $\varphi = \varphi_1 \wedge \varphi_2$. We first recursively compute $\varphi_1'$ and $\varphi_2'$. We then return $\varphi' = \varphi_1' \wedge \varphi_2'$. Because $\varphi_1'$ and $\varphi_2'$ are both in CNF, $\varphi'$ is in CNF.

Suppose $\varphi$ is satisfied by some assignment $g$. Then $g$ must satisfy both $\varphi_1$ and $\varphi_2$. Let $V_1'$ be the set of variables in either $\varphi_1'$ or $V$, and let $V_2'$ be the set of variables in either $\varphi_2'$ or $V$. Then $V' = V_1' \cup V_2'$, and $V = V_1' \cap V_2'$. There must be assignments $g_1' : V_1' \to \{\text{true}, \text{false}\}$ satisfying $\varphi_1'$ and $g_2' : V_2' \to \{\text{true}, \text{false}\}$ satisfying $\varphi_2'$ such that for all $v \in V$, $g(v) = g_1'(v) = g_2'(v)$. Let $g' : V' \to \{\text{true}, \text{false}\}$ be defined as

\[ g'(v) = \begin{cases} g_1'(v) & \text{if } v \in V_1' \\ g_2'(v) & \text{if } v \in V_2'. \end{cases} \]

Clearly, $g'$ satisfies $\varphi'$.

Now suppose $\varphi'$ is satisfied by some assignment $g'$. Then $g'$ must satisfy both $\varphi_1'$ and $\varphi_2'$; hence, it also satisfies both $\varphi_1$ and $\varphi_2$.

Case 2: $\varphi = \varphi_1 \vee \varphi_2$. We first recursively compute $\varphi_1'$ and $\varphi_2'$, then generate a new variable $x$. We then change each conjunct $c$ of $\varphi_1'$ to $x \vee c$, and change each conjunct $c$ of $\varphi_2'$ to $\neg x \vee c$. We return $\varphi'$, the conjunction of the resulting two formulas. Again, because both $\varphi_1'$ and $\varphi_2'$ are in CNF, $\varphi'$ is in CNF.

Suppose $\varphi$ is satisfied by some assignment $g$. Then $g$ must satisfy at least one of $\varphi_1$ or $\varphi_2$. Assume $g$ satisfies $\varphi_1$; the other case is handled symmetrically. Define $V_1'$ and $V_2'$ as in Case 1. Then $V' = V_1' \cup V_2' \cup \{x\}$ and $V = V_1' \cap V_2'$. There must be an assignment $g_1' : V_1' \to \{\text{true}, \text{false}\}$ satisfying $\varphi_1'$ such that for all $v \in V$, $g(v) = g_1'(v)$. Let $g' : V' \to \{\text{true}, \text{false}\}$ be defined as

\[ g'(v) = \begin{cases} g_1'(v) & \text{if } v \in V_1' \\ \text{false} & \text{if } v = x \\ \text{true} & \text{otherwise.} \end{cases} \]

Because $g'(x) = \text{false}$, every conjunct $\neg x \vee c$ derived from $\varphi_2'$ is satisfied by $g'$; and because each conjunct of $\varphi_1'$ is satisfied by $g_1'$, every conjunct $x \vee c$ derived from $\varphi_1'$ is satisfied as well. Hence each conjunct of $\varphi'$ is satisfied by $g'$.
Now suppose $\varphi'$ is satisfied by some assignment $g'$. Assume $g'(x) = \text{false}$; the other case is handled symmetrically. Then each conjunct of $\varphi'$ that was derived from $\varphi_1'$ must contain a literal other than $x$ that is true under $g'$; furthermore, each of these true literals must contain variables from $V_1'$. It follows that $\varphi_1'$ is satisfied by $g'$; hence, $\varphi_1$ and $\varphi$ are satisfied by $g'$.

In order to complete the proof, we must show that the entire algorithm operates in polynomial time. In order to facilitate this, we will first show that the formula $\varphi'$ produced by the recursive algorithm described above contains no more conjuncts than $\varphi$ has literals, and that each conjunct contains no more literals than the number of $\vee$'s in $\varphi$, plus 1.

The bound on the number of conjuncts follows immediately from the fact that only the base case introduces new conjuncts, and it introduces one for each literal. The bound on the number of literals in each conjunct follows from the fact that only the base case and Case 2 increase the size of a conjunct. The base case creates a new conjunct (effectively increasing from 0 literals to 1 literal), and Case 2 adds 1 literal to existing conjuncts. Because Case 2 is called once for each $\vee$ in $\varphi$, the total number of literals in any conjunct in $\varphi'$ is at most the number of $\vee$'s in $\varphi$ plus 1.

The recursive algorithm can be implemented to represent $\varphi'$ as a linked list with head and tail pointers. Each element in the list represents a conjunct. Each conjunct in turn is represented by a linked list of literals. Using this representation, it is easily seen that if we ignore the time needed to generate new variables, the recursive algorithm can be implemented to run in time linear in the size of $\varphi'$, which is polynomial in the size of $f$. Because the total number of new variables needed is polynomial in the size of $f$, they can also be generated in polynomial time. Therefore, the entire algorithm runs in time polynomial in the size of $f$.
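The two steps of the reduction (Lemma 1 followed by the recursive construction in the proof of Theorem 3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: formulas are encoded as nested tuples, a CNF formula is a Python list of clauses rather than the linked lists described above, and the names `to_nnf`, `to_cnf`, `reduce_sat_to_sat_cnf`, and `cnf_satisfiable` are hypothetical. New variables are named `_x0`, `_x1`, ..., which assumes no variable of the input formula uses that prefix. The brute-force `cnf_satisfiable` takes exponential time and is included only to check small examples.

```python
from itertools import count, product

# Hypothetical formula encoding (an assumption, not from the paper):
#   ("var", name), ("not", f), ("and", f, g), ("or", f, g)

def to_nnf(f, negate=False):
    """Lemma 1: push negations down to the variables using laws (1)-(3)."""
    op = f[0]
    if op == "var":
        return ("not", f) if negate else f
    if op == "not":
        # Entering a negation flips the pending negation (law 3 when doubled).
        return to_nnf(f[1], not negate)
    if negate:
        # Laws (1) and (2): a pending negation swaps "and" with "or".
        op = "or" if op == "and" else "and"
    return (op, to_nnf(f[1], negate), to_nnf(f[2], negate))

def to_cnf(phi, fresh):
    """Recursive construction of Theorem 3 for phi with negations on variables.

    Returns a list of clauses; each clause is a list of (name, polarity)
    literals, where polarity False means the variable is negated.
    """
    op = phi[0]
    if op == "var":                      # base case: positive literal
        return [[(phi[1], True)]]
    if op == "not":                      # base case: negated variable
        return [[(phi[1][1], False)]]
    left = to_cnf(phi[1], fresh)
    right = to_cnf(phi[2], fresh)
    if op == "and":                      # Case 1: concatenate the conjuncts
        return left + right
    x = next(fresh)                      # Case 2: one new variable per "or"
    return ([[(x, True)] + c for c in left]
            + [[(x, False)] + c for c in right])

def reduce_sat_to_sat_cnf(f):
    """Map a SAT instance to a SAT-CNF instance that is satisfiable iff f is."""
    fresh = (f"_x{i}" for i in count())  # assumes "_x" names are unused in f
    return to_cnf(to_nnf(f), fresh)

def cnf_satisfiable(clauses):
    """Exponential-time check by enumeration, for small test inputs only."""
    names = sorted({v for c in clauses for v, _ in c})
    for bits in product([False, True], repeat=len(names)):
        value = dict(zip(names, bits))
        if all(any(value[v] == pol for v, pol in c) for c in clauses):
            return True
    return False
```

For instance, applying `reduce_sat_to_sat_cnf` to an encoding of $(x \wedge y) \vee \neg(x \vee y)$ yields four clauses of two literals each, which agrees with the bounds above: the formula with negations moved to the variables has four literals and one $\vee$.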
Corollary 3 SAT-CNF is NP-complete.

References

[1] Gilles Brassard and Paul Bratley. Fundamentals of Algorithmics. Prentice Hall, 1996.
[2] Stephen A. Cook. The complexity of theorem proving procedures. In Proc. Third Annual ACM Symposium on the Theory of Computing, pages 151–158, 1971.