SAT-CNF Is NP-complete∗
Rod Howell
Kansas State University
November 9, 2000
The purpose of this paper is to give a detailed presentation of an NP-completeness
proof using the definition of NP given by Brassard and Bratley
[1] and the following definition of NP-hardness:
Definition 1 A decision problem Y is NP-hard if for every X ∈ NP,
X ≤ᵖₘ Y.
We focus on the following two problems:
Satisfiability (SAT):
Input: A formula f over boolean variables with operators ∧, ∨, and ¬.
Question: Is there an assignment of boolean values to the variables in f
such that f is true?
Conjunctive Normal Form Satisfiability (SAT-CNF):
Input: A formula f in conjunctive normal form (CNF), i.e., of the form

$$
\bigwedge_{i=1}^{m} \left( \bigvee_{j=1}^{k_i} \alpha_{ij} \right),
$$

where each αᵢⱼ is either a boolean variable or the negation of a boolean
variable.
Question: Is there an assignment of boolean values to the variables in f
such that f is true?
We will show that SAT-CNF is NP-complete (this fact is originally due to Cook
[2]). We assume the following theorem, also due to Cook [2]:
Theorem 1 SAT is NP-hard.
∗ Copyright © 2000, Rod Howell. This paper may be copied or printed in its entirety for use
in conjunction with CIS 775, Analysis of Algorithms, at Kansas State University. Otherwise,
no portion of this paper may be reproduced in any form or by any electronic or mechanical
means without permission in writing from Rod Howell.
We first show that SAT ∈ NP. Because SAT-CNF is a special case of SAT, it
will follow that SAT-CNF ∈ NP.
Theorem 2 SAT ∈ NP.
Proof: Our proof space Q will be the set of finite sequences of boolean values.
Let Φ be the set of all boolean formulas, and let V be the set of all boolean
variables. We define a partial function g : Φ × ℤ⁺ ⇀ V so that g(f, k) is the
variable in f whose first occurrence is kth among the first occurrences of all
variables in f; if f contains fewer than k variables, g(f, k) is undefined. We
then define F ⊆ SAT × Q so that ⟨f, ⟨b₁, b₂, …, bₘ⟩⟩ ∈ F iff
• f contains exactly m variables; and
• the assignment of the boolean value bₖ to the variable g(f, k) for 1 ≤ k ≤ m
is a satisfying assignment for f.
Thus, for each f ∈ SAT, there is a q ∈ Q no longer than f such that ⟨f, q⟩ ∈ F.
We will now show that we can decide whether ⟨f, q⟩ ∈ F in O(n²) time, where
n is the sum of the lengths of f and q. Our algorithm proceeds as follows:
1. We first construct a list of all the variables in f, ordered by the position
of their first occurrence. As we construct this list, we record the position
in this list of each occurrence of each variable in f. This can clearly be
done in O(n²) time.
2. We then verify that the length of q is exactly the number of variables in
f. This can clearly be done in O(1) time if q is stored in an array.
3. We then verify that f is satisfied by assigning the kth variable in the
variable list the kth boolean value in q, for all k. We accomplish this
by a straightforward recursive evaluation of f, looking up the value of
each variable using its recorded position in the variable list. Assuming the
sequence q is stored as an array, this evaluation can be done in O(n) time.
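The three steps above can be sketched in Python. This is only an illustration under stated assumptions: the nested-tuple encoding of formulas and the names first_occurrence_index, evaluate, and verify are hypothetical and do not come from the paper.

```python
# Hypothetical formula encoding (an assumption, not from the paper):
#   ("var", name) | ("not", f) | ("and", f, g) | ("or", f, g)

def first_occurrence_index(f, index=None):
    """Step 1: map each variable of f to its rank by first occurrence."""
    if index is None:
        index = {}
    if f[0] == "var":
        # Only the first occurrence assigns a rank; later ones are no-ops.
        index.setdefault(f[1], len(index))
    else:
        for sub in f[1:]:          # left-to-right, matching textual order
            first_occurrence_index(sub, index)
    return index


def evaluate(f, index, q):
    """Step 3: recursively evaluate f, reading variable values from q."""
    op = f[0]
    if op == "var":
        return q[index[f[1]]]
    if op == "not":
        return not evaluate(f[1], index, q)
    if op == "and":
        return evaluate(f[1], index, q) and evaluate(f[2], index, q)
    if op == "or":
        return evaluate(f[1], index, q) or evaluate(f[2], index, q)
    raise ValueError(f"unknown operator: {op!r}")


def verify(f, q):
    """Decide whether the pair (f, q) belongs to F."""
    index = first_occurrence_index(f)
    if len(q) != len(index):       # Step 2: q must have one value per variable
        return False
    return evaluate(f, index, q)


# (a ∨ ¬b) ∧ b, whose variables in first-occurrence order are a, b
f = ("and", ("or", ("var", "a"), ("not", ("var", "b"))), ("var", "b"))
print(verify(f, [True, True]))
```

The sketch records positions in a hash table rather than by scanning a list, which stays within the O(n²) bound claimed in step 1.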
Corollary 1 SAT-CNF ∈ NP.
Corollary 2 SAT is NP-complete.
Our proof of NP-hardness will involve a reduction from SAT to SAT-CNF. This
reduction will consist of two steps. First, we will convert the given boolean
formula to an equivalent formula in which negation is applied only to variables,
not to arbitrary subexpressions. We will then construct from the resulting
formula a formula in CNF that is satisfiable iff the original formula is satisfiable.
The CNF formula will not, in fact, be equivalent to the original formula, because
conversion to an equivalent CNF formula can produce an exponentially larger
formula, and hence can require exponential time in the worst case.
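A standard illustration of this blow-up, added here as an example and not taken from the original text, is a disjunction of n two-literal conjunctions. Distributing ∨ over ∧ yields an equivalent CNF formula with one conjunct for each way of choosing aᵢ or bᵢ from every disjunct, that is, 2ⁿ conjuncts:

```latex
\[
\bigvee_{i=1}^{n} (a_i \wedge b_i)
\;=\;
\bigwedge_{(c_1, \dots, c_n) \,\in\, \{a_1, b_1\} \times \cdots \times \{a_n, b_n\}}
(c_1 \vee c_2 \vee \cdots \vee c_n)
\]
```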
We begin by showing how negations can be moved to the variables.
Lemma 1 There is a linear-time algorithm to convert arbitrary boolean formulas to equivalent boolean formulas in which negation is applied only to variables.
Proof: Our algorithm uses the following laws of boolean algebra:
    ¬(f ∨ g) = ¬f ∧ ¬g;        (1)
    ¬(f ∧ g) = ¬f ∨ ¬g; and    (2)
    ¬¬f = f.                   (3)
We can apply one of these three laws to any subexpression ¬f, where f is not a
variable, to obtain an equivalent subexpression in which all negated
subexpressions are strictly shorter than f. A straightforward recursive implementation of
this strategy produces an equivalent formula of the proper form in linear time.
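One possible recursive implementation of this strategy is sketched below. It assumes a hypothetical nested-tuple encoding of formulas, and the function name push_negations is illustrative only. Instead of rewriting ¬f in place, it carries a flag recording whether the current subformula sits under an odd number of negations, which applies laws (1) to (3) implicitly.

```python
# Hypothetical formula encoding (an assumption, not from the paper):
#   ("var", name) | ("not", f) | ("and", f, g) | ("or", f, g)

def push_negations(f, negate=False):
    """Return a formula equivalent to f (or to ¬f when negate is True)
    in which "not" is applied only to variables."""
    op = f[0]
    if op == "var":
        return ("not", f) if negate else f
    if op == "not":
        # Flipping the flag cancels double negations, as in law (3).
        return push_negations(f[1], not negate)
    left = push_negations(f[1], negate)
    right = push_negations(f[2], negate)
    if negate:
        # Laws (1) and (2): a negated ∨ becomes ∧ and vice versa.
        op = "or" if op == "and" else "and"
    return (op, left, right)


a, b, c = ("var", "a"), ("var", "b"), ("var", "c")
# ¬(a ∨ ¬(b ∧ c)) becomes ¬a ∧ (b ∧ c)
print(push_negations(("not", ("or", a, ("not", ("and", b, c))))))
```

Each node of the input is visited exactly once, which is consistent with the linear-time claim of Lemma 1. Very deeply nested formulas would exceed Python's default recursion limit.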
We now define a literal to be either a variable or the negation of a variable.
Using this definition, we can describe our reduction from SAT to SAT-CNF.
Theorem 3 SAT-CNF is NP-hard.
Proof: We will show that SAT ≤ᵖₘ SAT-CNF. From Theorem 1, it will follow
that SAT-CNF is NP-hard.
Let f be an arbitrary boolean formula. Our algorithm first converts f to an
equivalent formula f′ in which negations are applied only to variables. From
Lemma 1, this can be done in linear time. Furthermore, because the conversion
can be done in linear time, the length of f′ is linear in the length of f.
For the next step of our algorithm, we assume the existence of a polynomial-time
algorithm to generate new unique variables. In particular, after the algorithm
is initialized, we can call it arbitrarily many times, and each time it will return
a variable different from any variable in f′ and any variable it had previously
returned. It is not hard to design such an algorithm that returns n variables in
time polynomial in n and the length of f′.
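One simple way to realize such a generator is sketched below, under the assumption that variables are represented by strings. The name fresh_variables and the prefix "x" are hypothetical choices, not taken from the paper.

```python
import itertools


def fresh_variables(used, prefix="x"):
    """Yield names that are not in `used` and have not been yielded before."""
    used = set(used)               # initialization: the variables of the formula
    for i in itertools.count():
        name = f"{prefix}{i}"
        if name not in used:       # skip names that clash with the formula
            yield name


gen = fresh_variables({"x0", "x2", "a"})
print([next(gen) for _ in range(3)])
```

Because the counter only increases, no name is produced twice, and at most len(used) candidates are ever skipped.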
We now describe a recursive algorithm that takes a formula φ in which negations
are applied only to variables, and produces a formula φ′ in CNF that is satisfiable
iff φ is satisfiable. Let V be the set of variables in φ and V′ be the set of variables
in φ′. Specifically, φ′ will be satisfiable by an assignment g′ : V′ → {true, false}
iff φ is satisfiable by the assignment g : V → {true, false}, where g(v) = g′(v)
for all v ∈ V.
The base case occurs when φ is a literal. In this case, the algorithm simply
returns φ. Otherwise, there are two cases.
Case 1: φ = φ₁ ∧ φ₂. We first recursively compute φ₁′ and φ₂′. We then return
φ′ = φ₁′ ∧ φ₂′. Because φ₁′ and φ₂′ are both in CNF, φ′ is in CNF.
Suppose φ is satisfied by some assignment g. Then g must satisfy both
φ₁ and φ₂. Let V₁′ be the set of variables in either φ₁′ or V, and let
V₂′ be the set of variables in either φ₂′ or V. Then V′ = V₁′ ∪ V₂′, and
V = V₁′ ∩ V₂′. By the inductive hypothesis, there must be assignments
g₁′ : V₁′ → {true, false} satisfying φ₁′ and g₂′ : V₂′ → {true, false}
satisfying φ₂′ such that for all v ∈ V, g(v) = g₁′(v) = g₂′(v).
Let g′ : V′ → {true, false} be defined as

$$
g'(v) =
\begin{cases}
g_1'(v) & \text{if } v \in V_1' \\
g_2'(v) & \text{if } v \in V_2'.
\end{cases}
$$

This is well defined because g₁′ and g₂′ agree on V = V₁′ ∩ V₂′.
Clearly, g′ satisfies φ′.
Now suppose φ′ is satisfied by some assignment g′. Then g′ must satisfy
both φ₁′ and φ₂′; hence, it also satisfies both φ₁ and φ₂.
Case 2: φ = φ₁ ∨ φ₂. We first recursively compute φ₁′ and φ₂′, then generate
a new variable x. We then change each conjunct c of φ₁′ to x ∨ c, and
change each conjunct c of φ₂′ to ¬x ∨ c. We return φ′, the conjunction of
the resulting two formulas. Again, because both φ₁′ and φ₂′ are in CNF, φ′
is in CNF.
Suppose φ is satisfied by some assignment g. Then g must satisfy at
least one of φ₁ or φ₂. Assume g satisfies φ₁; the other case is handled
symmetrically. Define V₁′ and V₂′ as in Case 1. Then V′ = V₁′ ∪ V₂′ ∪ {x}
and V = V₁′ ∩ V₂′. There must be an assignment g₁′ : V₁′ → {true, false}
satisfying φ₁′ such that for all v ∈ V, g(v) = g₁′(v). Let g′ : V′ →
{true, false} be defined as

$$
g'(v) =
\begin{cases}
g_1'(v) & \text{if } v \in V_1' \\
\text{false} & \text{if } v = x \\
\text{true} & \text{otherwise.}
\end{cases}
$$

Because g′(x) = false, each conjunct ¬x ∨ c derived from φ₂′ is satisfied by g′;
and because each conjunct of φ₁′ is satisfied by g₁′, each conjunct x ∨ c
derived from φ₁′ is satisfied by g′. Hence each conjunct of φ′ is satisfied by g′.
Now suppose φ′ is satisfied by some assignment g′. Assume g′(x) = false;
the other case is handled symmetrically. Then each conjunct of φ′ that was
derived from φ₁′ must contain a literal that is true under g′; furthermore,
because x is false under g′, each of these true literals must be a literal of
the original conjunct of φ₁′, over a variable from V₁′. It follows that
φ₁′ is satisfied by g′; hence, φ₁ and φ are satisfied by g′.
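The recursive construction (base case, Case 1, and Case 2) can be sketched in Python as follows. The encodings are assumptions made for illustration: formulas are nested tuples, a CNF formula is a list of conjuncts, each conjunct is a list of (variable, polarity) pairs, and new variables are drawn from a hypothetical fresh_names iterator whose prefix is assumed not to occur in the input.

```python
import itertools

# Hypothetical formula encoding (an assumption, not from the paper):
#   ("var", name) | ("not", ("var", name)) | ("and", f, g) | ("or", f, g)


def fresh_names(prefix="_t"):
    """Assumes no variable of the input formula starts with `prefix`."""
    return (f"{prefix}{i}" for i in itertools.count())


def to_cnf(phi, fresh):
    """Return a CNF formula, as a list of conjuncts, that is satisfiable
    iff phi is. phi must have negation applied only to variables."""
    op = phi[0]
    if op == "var":                       # base case: positive literal
        return [[(phi[1], True)]]
    if op == "not":                       # base case: negative literal
        return [[(phi[1][1], False)]]
    left = to_cnf(phi[1], fresh)
    right = to_cnf(phi[2], fresh)
    if op == "and":                       # Case 1: conjoin the two results
        return left + right
    x = next(fresh)                       # Case 2: one new variable per ∨
    return ([[(x, True)] + c for c in left] +
            [[(x, False)] + c for c in right])


a, b, c = ("var", "a"), ("var", "b"), ("var", "c")
cnf = to_cnf(("or", ("and", a, b), c), fresh_names())
print(cnf)   # conjuncts for (x ∨ a), (x ∨ b), (¬x ∨ c), with x = "_t0"
```

Python list concatenation copies its operands, so this sketch favors clarity over the running time analyzed in the proof.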
In order to complete the proof, we must show that the entire algorithm operates
in polynomial time. To facilitate this, we will first show that the formula
φ′ produced by the recursive algorithm described above contains no more
conjuncts than φ has literals, and that each conjunct contains no more literals than
the number of ∨’s in φ, plus 1. The bound on the number of conjuncts follows
immediately from the fact that only the base case introduces new conjuncts,
and it introduces one for each literal. The bound on the number of literals in
each conjunct follows from the fact that only the base case and Case 2 increase
the size of a conjunct. The base case creates a new conjunct (effectively
increasing from 0 literals to 1 literal), and Case 2 adds 1 literal to existing conjuncts.
Because Case 2 is called once for each ∨ in φ, the total number of literals in any
conjunct in φ′ is at most the number of ∨’s in φ plus 1.
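As a small check of these bounds, consider the formula φ = (a ∧ b) ∨ c, an example chosen here for illustration rather than taken from the original proof. Applying Case 1 to a ∧ b and then Case 2 with a new variable x gives:

```latex
\begin{align*}
\varphi    &= (a \wedge b) \vee c, \\
\varphi_1' &= a \wedge b, \qquad \varphi_2' = c, \\
\varphi'   &= (x \vee a) \wedge (x \vee b) \wedge (\neg x \vee c).
\end{align*}
```

Here φ has three literals and one ∨, while φ′ has three conjuncts with two literals each, so both bounds hold with equality.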
The recursive algorithm can be implemented to represent φ′ as a linked list
with head and tail pointers. Each element in the list represents a conjunct.
Each conjunct in turn is represented by a linked list of literals. Using this
representation, it is easily seen that if we ignore the time needed to generate
new variables, the recursive algorithm can be implemented to run in time linear
in the size of φ′, which is polynomial in the size of f. Because the total number
of new variables needed is polynomial in the size of f, they can also be generated
in polynomial time. Therefore, the entire algorithm runs in time polynomial in
the size of f.
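A minimal sketch of a linked list with head and tail pointers is given below, to show why this representation supports the constant-time operations the argument relies on: splicing two conjunct lists together (Case 1) and adding a literal to the front of a conjunct (Case 2). The class and method names are hypothetical.

```python
class Node:
    """One cell of a singly linked list."""
    __slots__ = ("value", "next")

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    """Singly linked list with head and tail pointers."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        """Add value at the end in O(1) time."""
        node = Node(value)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def prepend(self, value):
        """Add value at the front in O(1) time."""
        node = Node(value)
        node.next = self.head
        self.head = node
        if self.tail is None:
            self.tail = node

    def extend(self, other):
        """Splice other onto the end in O(1) time; other becomes empty."""
        if other.head is None:
            return
        if self.tail is None:
            self.head = other.head
        else:
            self.tail.next = other.head
        self.tail = other.tail
        other.head = other.tail = None

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


left, right = LinkedList(), LinkedList()
left.append(1)
left.append(2)
right.append(3)
left.extend(right)      # constant-time concatenation via the tail pointer
left.prepend(0)
print(list(left))
```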
Corollary 3 SAT-CNF is NP-complete.
References
[1] Gilles Brassard and Paul Bratley. Fundamentals of Algorithmics. Prentice
Hall, 1996.
[2] Stephen A. Cook. The complexity of theorem-proving procedures. In Proc. Third
Annual ACM Symposium on Theory of Computing, pages 151–158, 1971.