NP-Complete Problems

•Polynomial time vs. exponential time
–Polynomial time: O(n^k), where n is the input size of our problem (e.g., the number of nodes in a graph, the length of strings, etc.) and k is a constant (e.g., k = 1, 2, 3, etc.).
–Exponential time: e.g., 2^n or n^n.
n    | 2^n
2    | 4
10   | 1024
20   | ≈ 1 million
30   | ≈ 1000 million
Suppose our computer can solve a problem of size k (i.e., perform 2^k operations) in an hour/week/month. If a new computer is 1024 = 2^10 times faster than ours, then in the same time it can solve a problem of size only k+10, since 1024 · 2^k = 2^(k+10). The improvement is very small.
• Faster hardware is of little use for solving problems that require exponential running time.
• Exponential running time is therefore considered “not efficient”.
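The arithmetic above can be checked with a short Python sketch. It assumes, for illustration only, that an exponential algorithm performs exactly 2^n operations and a quadratic one exactly n^2. The function names are hypothetical and do not come from the original.

```python
import math


def max_size_exponential(budget: int) -> int:
    """Largest n with 2**n <= budget (algorithm doing exactly 2**n operations)."""
    return budget.bit_length() - 1


def max_size_quadratic(budget: int) -> int:
    """Largest n with n**2 <= budget (algorithm doing exactly n**2 operations)."""
    return math.isqrt(budget)


# The table above: 2**n for n = 2, 10, 20, 30.
table = {n: 2 ** n for n in (2, 10, 20, 30)}

# Old machine: a budget of 2**40 operations; the new machine is 1024 = 2**10 times faster.
old_budget = 2 ** 40
new_budget = 1024 * old_budget
gain_exponential = max_size_exponential(new_budget) - max_size_exponential(old_budget)
ratio_quadratic = max_size_quadratic(new_budget) / max_size_quadratic(old_budget)

print(table)             # {2: 4, 10: 1024, 20: 1048576, 30: 1073741824}
print(gain_exponential)  # 10: the solvable size only grows from k to k + 10
print(ratio_quadratic)   # 32.0: an O(n^2) algorithm handles 32 times larger inputs
```

The quadratic case is included for contrast. With a polynomial running time, the same 1024× speed-up multiplies the solvable input size by a constant factor. With an exponential running time, it only adds a constant.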
Story
• All algorithms we have studied so far are polynomial
time algorithms.
• Fact: no one has yet found a polynomial time algorithm for some famous problems (e.g., Hamilton Circuit, longest simple path, Steiner trees).
• Question: Do there exist polynomial time algorithms
for those famous problems?
• Answer: Nobody knows.
Story
•Research topic: Prove that polynomial time
algorithms do not exist for those famous problems, e.g.,
Hamilton circuit problem.
•You could win the Turing Award if you can give the proof.
•In order to answer the above question, people defined two classes of problems: class P and class NP.
•To answer whether P = NP, a rich area, NP-completeness theory, was developed.
Class P and Class NP
•Class P contains those problems that are solvable in
polynomial time.
–They are problems that can be solved in O(n^k) time, where n is the input size and k is a constant.
•Class NP consists of those problems that are verifiable in polynomial time.
•What we mean here is that if we were somehow given
a solution, then we can verify that the solution is
correct in time polynomial in the input size to the
problem.
•Example: Hamilton Circuit: given an ordering (v1, v2, …, vn) of the n distinct vertices, we can test whether (vi, vi+1) is an edge in G for i = 1, 2, …, n-1 and whether (vn, v1) is an edge in G in O(n) time (polynomial in the input size).
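The verification step for Hamilton Circuit can be sketched in Python. This sketch assumes vertices are labeled 0..n-1 and edges are undirected pairs. The function name is hypothetical. It first builds a hash set of edges in O(|E|) time, then makes the n edge lookups described above.

```python
def verify_hamilton_circuit(n, edges, order):
    """Check a proposed Hamilton circuit (the certificate) in polynomial time.

    n: number of vertices, labeled 0..n-1 (an assumption of this sketch)
    edges: iterable of undirected edges (u, v)
    order: the claimed circuit as a list of the n vertices (v1, ..., vn)
    """
    edge_set = {frozenset(e) for e in edges}
    # The certificate must list each of the n vertices exactly once.
    if len(order) != n or set(order) != set(range(n)):
        return False
    # Test (v_i, v_{i+1}) for i = 1..n-1, plus the closing edge (v_n, v_1).
    return all(
        frozenset((order[i], order[(i + 1) % n])) in edge_set for i in range(n)
    )


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(verify_hamilton_circuit(4, square, [0, 1, 2, 3]))  # True
print(verify_hamilton_circuit(4, square, [0, 2, 1, 3]))  # False: (0, 2) is not an edge
```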
Class P and Class NP
•By definition, P ⊆ NP: if a problem can be solved in polynomial time, then a given solution can also be verified in polynomial time.
•If we can design a polynomial time algorithm for problem A, then problem A is in P.
•However, if we have not been able to design a polynomial time algorithm for problem A, then there are two possibilities:
1. a polynomial time algorithm does not exist for problem A, or
2. we are not smart enough to find one.
Open problem: P = NP?
Clay Mathematics Institute $1 million prize.
Polynomial-Time Reductions
Suppose we have a black box (an algorithm) that can solve instances of a problem X: if we give it an instance of X as input, then in a single step the black box returns the correct answer.
Question:
Can arbitrary instances of problem Y be solved using a polynomial number of standard computational steps, plus a polynomial number of calls to the black box that solves problem X?
If yes, then Y is polynomial-time reducible to X.
NP-Complete
• A problem X is NP-complete if it is in NP and any
problem Y in NP has a polynomial time reduction to X.
– it is among the hardest problems in NP:
if an NP-complete problem can be solved in polynomial time, then every problem in class NP can be solved in polynomial time.
•The first NPC problem is the Satisfiability problem.
–Proved by Cook in 1971, who received the Turing Award for this work.
Boolean formula
• A boolean formula f(x1, x2, …, xn), where the xi are boolean variables (each either 0 or 1), is built from boolean variables and the boolean operations AND, OR and NOT.
• Clause: variables and their negations connected with the OR operation, e.g., (x1 OR NOT x2 OR x5).
• Conjunctive normal form (CNF) of a boolean formula: m clauses connected with the AND operation.
Example:
(x1 OR NOT x2) AND (x1 OR NOT x3 OR x6)
AND (x2 OR x6) AND (NOT x3 OR x5).
–Here we have four clauses.
Satisfiability problem
• Input: conjunctive normal form with n variables, x1,
x2, …, xn.
• Problem: find an assignment of x1, x2, …, xn (setting
each xi to be 0 or 1) such that the formula is true (satisfied).
• Example: conjunctive normal form is
(x1 OR NOT x2) AND (NOT x1 OR x3).
• The formula is true for assignment
x1=1, x2=0, x3=1.
Note: for n boolean variables, there are 2^n assignments.
•For any given assignment, testing whether formula = 1 can be done in polynomial time.
•Finding an assignment that makes formula = 1 is hard: no polynomial time algorithm is known.
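The gap between checking and finding can be made concrete with a minimal Python sketch. It assumes a DIMACS-style encoding in which a clause is a list of integers, with +i meaning xi and -i meaning NOT xi. The encoding and function names are choices made for this sketch. `satisfies` runs in time linear in the formula size. `brute_force_sat` may try all 2^n assignments.

```python
from itertools import product


def satisfies(formula, assignment):
    """Polynomial-time check: every clause contains at least one true literal."""
    return all(
        any(assignment[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)
        for clause in formula
    )


def brute_force_sat(formula, n):
    """Try all 2**n assignments of x1..xn; exponential time in the worst case."""
    for bits in product((0, 1), repeat=n):
        assignment = dict(zip(range(1, n + 1), bits))
        if satisfies(formula, assignment):
            return assignment
    return None  # no satisfying assignment exists


# (x1 OR NOT x2) AND (NOT x1 OR x3)
f = [[1, -2], [-1, 3]]
print(satisfies(f, {1: 1, 2: 0, 3: 1}))  # True
print(brute_force_sat([[1], [-1]], 1))   # None: x1 AND NOT x1 is unsatisfiable
```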
The First NP-complete Problem
• Theorem: Satisfiability problem is NP-complete.
–It is the first NP-complete problem.
–S. A. Cook in 1971 http://en.wikipedia.org/wiki/Stephen_Cook
–Won the Turing Award for his work.
• Significance:
–If Satisfiability problem can be solved in polynomial time,
then ALL problems in class NP can be solved in polynomial
time.
–If you want to solve PNP, then you should work on NPC
problems such as satisfiability problem.
–We can use the first NPC problem, Satisfiability problem, to
show that other problems are also NP-complete.
How to show that a problem is NPC?
•To show that problem A is NP-complete, we can
–First show that problem A is in NP (a given solution can be verified in polynomial time).
–Find a problem B that has already been proved to be NP-complete.
–Show that if problem A can be solved in polynomial time, then problem B can also be solved in polynomial time.
That is, give a polynomial time reduction from B to A.
Remarks: Since problem B is NP-complete, it is among the hardest problems in class NP; problem A is at least as hard, so A is also NP-complete.
Hamilton circuit and Longest Simple Path
•Hamilton circuit: a circuit that uses every vertex of the graph exactly once, except for the last vertex, which duplicates the first vertex.
•The Hamilton circuit problem was shown to be NP-complete.
•Longest Simple Path:
•Input: a graph with node set V = {v1, v2, ..., vn}, the distance d(vi, vj) between vi and vj, and two nodes u and v.
•Output: a longest simple path from u to v.
Theorem 2: The longest simple path problem is NP-complete.
Theorem 2: The longest simple path (LSP)
problem is NP-complete.
Proof:
Hamilton Circuit Problem (HC): Given a graph G=(V, E), find a Hamilton
Circuit.
We want to show that if we can solve the longest simple path problem in
polynomial time, then we can also solve the Hamilton circuit problem in
polynomial time.
Design a polynomial time algorithm to solve HC by using an algorithm for LSP.
Step 0: Set the length of each edge in G to 1.
Step 1: for each edge (u, v) ∈ E do
find a longest simple path P from u to v in G.
Step 2:
if the length of P is n-1, then adding edge (u, v) to P gives a Hamilton circuit in G.
Step 3: if no Hamilton circuit is found for any edge (u, v), then
print “no Hamilton circuit exists”.
Conclusion:
•If LSP can be solved in polynomial time, then HC can also be solved in polynomial time, since the algorithm above makes only |E| calls to LSP plus polynomial extra work.
•Since HC was proved to be NP-complete (and LSP is in NP, because a given path can be checked in polynomial time), LSP is also NP-complete.
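The reduction can be sketched in Python with the LSP algorithm passed in as a black box. The function names and the black box's signature are hypothetical. Vertices are assumed to be labeled 0..n-1. So that the sketch runs, the black box is a brute-force stand-in, which takes exponential time. The reduction itself makes one black-box call per edge, as in Steps 1 to 3.

```python
from itertools import permutations


def longest_simple_path(n, adj, u, v):
    """Hypothetical LSP black box; here a brute-force stand-in (exponential time).

    With every edge length set to 1 (Step 0), a path's length is its number of
    edges, so trying the most intermediate vertices first finds a longest path.
    """
    others = [w for w in range(n) if w not in (u, v)]
    for k in range(len(others), -1, -1):
        for middle in permutations(others, k):
            path = (u, *middle, v)
            if all(path[i + 1] in adj[path[i]] for i in range(len(path) - 1)):
                return list(path)
    return None


def hamilton_circuit_via_lsp(n, edges, lsp=longest_simple_path):
    """Steps 0-3 above: one call to the LSP black box per edge (u, v)."""
    adj = {w: set() for w in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    for u, v in edges:                                    # Step 1
        path = lsp(n, adj, u, v)
        if path is not None and len(path) - 1 == n - 1:   # Step 2
            return path + [u]           # close the circuit with edge (v, u)
    return None                         # Step 3: no Hamilton circuit exists


print(hamilton_circuit_via_lsp(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # [0, 3, 2, 1, 0]
print(hamilton_circuit_via_lsp(3, [(0, 1), (1, 2)]))                  # None
```

If a polynomial time `lsp` were plugged in, `hamilton_circuit_via_lsp` would also run in polynomial time. That is exactly the argument of the proof.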
Some basic NP-complete problems
•3-Satisfiability: each clause contains at most three variables or their negations.
•Vertex Cover: Given a graph G=(V, E), find a subset V’ of V such that for each edge (u, v) in E, at least one of u and v is in V’, and the size of V’ is minimized.
•Hamilton Circuit: (definition was given before)
•History (chain of reductions): Satisfiability → 3-Satisfiability → Vertex Cover → Hamilton Circuit.
•Those proofs are very hard.
Approximation Algorithms
•Concepts
•Knapsack
•Steiner Minimum Tree
•TSP
•Vertex Cover
Concepts of Approximation Algorithms
Optimization Problem:
The solution of the problem is associated with a cost
(value).
We want to maximize the cost or minimize the cost.
Minimum spanning tree and shortest path are
optimization problems.
Euler circuit problem is NOT an optimization
problem. (it is a decision problem.)
Approximation Algorithm
An algorithm A is an approximation algorithm if, given any instance I, it finds a candidate solution s(I).
How good is an approximation algorithm?
We use performance ratio to measure the
quality of an approximation algorithm.
Performance ratio
For a minimization problem, the performance ratio of algorithm A is defined as a number r such that for any instance I of the problem,
$$\frac{A(I)}{OPT(I)} \le r \quad (r \ge 1),$$
where OPT(I) is the value of the optimal solution for instance I and A(I) is the value of the solution returned by algorithm A on instance I.
Performance ratio
For a maximization problem, the performance ratio of algorithm A is defined as a number r such that for any instance I of the problem,
$$\frac{OPT(I)}{A(I)} \le r \quad (r \ge 1),$$
where OPT(I) is the value of the optimal solution for instance I and A(I) is the value of the solution returned by algorithm A on instance I.
Simplified Knapsack Problem
Given a finite set U of items, a size s(u) ∈ Z+ for each u ∈ U, and a capacity B ≥ max{s(u) : u ∈ U}, find a subset U' ⊆ U such that
$$\sum_{u \in U'} s(u) \le B$$
and such that the above summation is as large as possible. (It is NP-hard.)
Ratio-2 Algorithm
1. Sort the u's by s(u) in increasing order.
2. Repeatedly select the smallest remaining u until no more u can be added.
3. Compare the total size of the selected items with the size of the largest item, and select the larger one.
Theorem: The algorithm has performance
ratio 2.
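A minimal Python sketch of the three steps, with a hypothetical function name. It assumes all sizes are positive integers and B ≥ max s(u), as in the problem statement. Once the smallest remaining item no longer fits, no larger item can fit, so the greedy loop can stop early.

```python
def knapsack_ratio2(sizes, B):
    """Ratio-2 algorithm for the simplified knapsack problem.

    Assumes every size is a positive integer and B >= max(sizes).
    Returns the list of chosen sizes.
    """
    # Steps 1-2: take items in increasing size order while they still fit.
    chosen, total = [], 0
    for s in sorted(sizes):
        if total + s > B:
            break  # every remaining item is at least as large, so none fits
        chosen.append(s)
        total += s
    # Step 3: compare with the single largest item and keep the larger.
    largest = max(sizes)
    return chosen if total >= largest else [largest]


print(knapsack_ratio2([1, 2, 3, 9], 10))  # [9]: the greedy total 6 is smaller than 9
```

On this instance the optimum is 1 + 9 = 10, so the algorithm's 9 is well within a factor of 2.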
Proof
• Case 1: the total of selected items ≥ 0.5B. Since OPT ≤ B, the ratio is at most 2 (got it!).
• Case 2: the total of selected items < 0.5B.
– No remaining item is left: we selected all items, which is optimal.
– There are some remaining items: the size of the smallest remaining item is > 0.5B. (Otherwise, we could have added it.)
• Then the largest item has size > 0.5B ≥ 0.5 OPT, so selecting the largest item gives ratio 2.
The 0-1 Knapsack problem:
• The 0-1 knapsack problem:
• N items, where the i-th item is worth vi dollars and weighs wi pounds.
– vi and wi are integers.
• A thief can carry at most W (integer) pounds.
• Goal: take as valuable a load as possible.
– An item cannot be divided into pieces.
• The fractional knapsack problem:
– The same setting, but the thief can take fractions of items.
Ratio-2 Algorithm
1. Delete the items i with wi > W.
2. Sort the items in decreasing order of vi/wi.
3. Select the first k items, item 1, item 2, …, item k, such that
$w_1+w_2+\dots+w_k \le W$ and $w_1+w_2+\dots+w_k+w_{k+1} > W$.
4. Compare $v_{k+1}$ with $v_1+v_2+\dots+v_k$ and select the larger one.
Theorem: The algorithm has performance ratio 2.
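The four steps can be sketched in Python as follows. The function name is hypothetical. The sketch assumes every weight is a positive integer, so that vi/wi is defined, and that items are given as (value, weight) pairs. If every remaining item fits, there is no item k+1, and taking everything is optimal.

```python
def knapsack01_ratio2(items, W):
    """Ratio-2 algorithm for 0-1 knapsack; items are (value, weight) pairs.

    Assumes all weights are positive integers.
    """
    # Step 1: drop items that cannot fit on their own.
    items = [(v, w) for v, w in items if w <= W]
    # Step 2: sort by value per pound, best first.
    items.sort(key=lambda it: it[0] / it[1], reverse=True)
    # Step 3: take the longest prefix that fits; item k+1 is the first that does not.
    prefix, weight = [], 0
    for v, w in items:
        if weight + w > W:
            # Step 4: keep the more valuable of the prefix and item k+1 alone.
            return prefix if sum(p[0] for p in prefix) >= v else [(v, w)]
        prefix.append((v, w))
        weight += w
    return prefix  # all items fit, which is optimal


print(knapsack01_ratio2([(60, 10), (100, 20), (120, 30)], 50))  # [(60, 10), (100, 20)]
print(knapsack01_ratio2([(1, 1), (10, 10)], 10))                # [(10, 10)]
```

The second call shows why Step 4 is needed. The greedy prefix alone would keep only the item worth 1 dollar, while item k+1 by itself is worth 10.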
Proof of ratio 2
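• C(opt): the cost of an optimum solution.
• C(fopt): the optimal cost of the fractional version (on the items kept in Step 1).
1. C(opt) ≤ C(fopt), since every 0-1 solution is also a fractional solution.
2. $v_1+v_2+\dots+v_k+v_{k+1} \ge C(\mathrm{fopt})$, since the optimal fractional solution takes items 1, …, k completely plus a fraction of item k+1.
3. So, either $v_1+v_2+\dots+v_k \ge 0.5\,C(\mathrm{fopt}) \ge 0.5\,C(\mathrm{opt})$
or
$v_{k+1} \ge 0.5\,C(\mathrm{fopt}) \ge 0.5\,C(\mathrm{opt})$.
• Since the algorithm chooses the larger of $v_1+v_2+\dots+v_k$ and $v_{k+1}$,
the cost of the solution obtained by the algorithm is at least $0.5\,C(\mathrm{fopt}) \ge 0.5\,C(\mathrm{opt})$.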
Steiner Minimum Tree
Steiner minimum tree in the plane
• Input: a set of points R (regular points) in the plane.
• Output: a tree with smallest weight which contains
all the nodes in R.
• Weight: the weight of an edge connecting two points (x1, y1) and (x2, y2) in the plane is the Euclidean distance
$$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}.$$
Example: Dark points are regular points.
Triangle inequality
This is the key to our approximation algorithm.
For any three points a, b, c in the plane, we have:
dist(a, c) ≤ dist(a, b) + dist(b, c).
Examples:
[Figure: a triangle on the points a, b, c with side lengths 3, 4 and 5]
Approximation algorithm
(Steiner minimum tree in the plane)
Compute a minimum spanning tree for R as
the approximation solution for the Steiner
minimum tree problem.
How good is the algorithm (in terms of the quality of its solutions)?
Theorem: The performance ratio of the
approximation algorithm is 2.
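The approximation algorithm only needs an MST of the complete Euclidean graph on R. The Python sketch below uses Prim's algorithm in O(n^2) time, one possible choice that the original does not prescribe. The function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def euclidean_mst(points):
    """Prim's algorithm on the complete graph over points, O(n^2) time.

    Returns the MST edges as index pairs (parent, child).
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the outside point that is cheapest to connect.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Update connection costs of the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def tree_weight(points, edges):
    return sum(math.dist(points[a], points[b]) for a, b in edges)


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
mst = euclidean_mst(square)
print(tree_weight(square, mst))  # 3.0
```

For the unit square, the Steiner minimum tree has length 1 + √3 ≈ 2.732, which uses two Steiner points. The MST's 3.0 is therefore well within the factor 2 proved below.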
Proof
We want to show that for any instance (input)
I, A(I)/OPT(I) ≤ r (r≥1), where A(I) is the
cost of the solution obtained from our
spanning tree algorithm, and OPT(I) is the
cost of an optimal solution.
• Assume that T is an optimal solution for instance I. Consider a traversal of T.
[Figure: a traversal of T, with nodes labeled 1 to 10]
• Each edge in T is visited at most twice. Thus, the total weight of the traversal is at most twice the weight of T, i.e.,
w(traversal) ≤ 2w(T) = 2OPT(I).   .........(1)
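• Based on the traversal, we can get a spanning tree ST for R as follows: directly connect two nodes in R that are consecutive in the order in which the traversal visits them.
[Figure: the spanning tree ST obtained from the traversal, with nodes labeled 1 to 10]
From the triangle inequality, each direct connection is no longer than the part of the traversal between its two endpoints, so
w(ST) ≤ w(traversal) ≤ 2OPT(I).   ..........(2)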
• Inequality (2) says that the cost of the spanning tree ST is at most twice the cost of an optimal solution.
• So, if we can compute ST, then we can get a solution
with cost≤2OPT(I).
(Great! But finding ST may also be very hard, since ST is
obtained from the optimal solution T, which we do not know.)
• We can find a minimum spanning tree MST for R in
polynomial time.
• By definition of MST, w(MST) ≤w(ST) ≤2OPT(I).
• Therefore, the performance ratio is 2.
Story
• The method has been known for a long time. The performance ratio was conjectured to be 2/√3 ≈ 1.1547 (1968).
• Du and Hwang (1990) proved that the conjecture is true.
Graph Steiner minimum tree
• Input: a graph G=(V,E), a weight w(e) for
each e∈E, and a subset R⊂V.
• Output: a tree with minimum weight which
contains all the nodes in R.
• The nodes in R are called regular points.
Note that the Steiner minimum tree may contain some nodes in V-R; the nodes in V-R are called Steiner points.
Example: Let G be the graph shown in Figure a, and R={a,b,c}. The Steiner minimum tree is T={(a,d),(b,d),(c,d)}, shown in Figure b.
[Figure a: the graph G on the nodes a, b, c, d. Figure b: the Steiner minimum tree T, which uses the Steiner point d.]
Theorem: Graph Steiner minimum tree problem is NP-complete.
Approximation algorithm
(Graph Steiner minimum tree)
1. For each pair of nodes u and v in R, compute the shortest path from u to v and assign its cost as the length of edge (u, v). (This gives a complete graph on R.)
2. Compute a minimum spanning tree for the
modified complete graph.
3. Include the nodes in the shortest paths
used.
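A Python sketch of Steps 1 to 3, under these assumptions: G is connected, edge weights are non-negative, and the graph is given as an adjacency map `node -> [(neighbor, weight), ...]`. It uses Dijkstra for Step 1 and Prim for Step 2. These choices and all function names are hypothetical. The example graph in the usage lines is also hypothetical, because the weights in Figure a are not recoverable here.

```python
import heapq


def dijkstra(adj, src):
    """Shortest-path distances and predecessors from src (non-negative weights)."""
    dist, prev = {src: 0}, {src: None}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def steiner_tree_approx(adj, R):
    """Steps 1-3 above; returns the graph edges used, as frozensets. Assumes G is connected."""
    R = list(R)
    sp = {r: dijkstra(adj, r) for r in R}  # Step 1: shortest paths from each regular point
    # Step 2: Prim's MST on the complete graph over R with shortest-path lengths.
    in_tree, mst = {R[0]}, []
    while len(in_tree) < len(R):
        u, v = min(((a, b) for a in in_tree for b in R if b not in in_tree),
                   key=lambda e: sp[e[0]][0][e[1]])
        in_tree.add(v)
        mst.append((u, v))
    # Step 3: replace each MST edge (u, v) by the graph edges of its shortest path.
    tree = set()
    for u, v in mst:
        prev, x = sp[u][1], v
        while prev[x] is not None:
            tree.add(frozenset((prev[x], x)))
            x = prev[x]
    return tree


# Hypothetical example: a, b, c are each joined to d by weight 1 and to each other by weight 2.
edges = [("a", "d", 1), ("b", "d", 1), ("c", "d", 1), ("a", "b", 2), ("b", "c", 2), ("a", "c", 2)]
adj = {x: [] for x in "abcd"}
for x, y, w in edges:
    adj[x].append((y, w))
    adj[y].append((x, w))
weight = {frozenset((x, y)): w for x, y, w in edges}
tree = steiner_tree_approx(adj, ["a", "b", "c"])
print(sum(weight[e] for e in tree))  # 4, versus the optimum 3 through the Steiner point d
```

Collecting edges in a set means a graph edge shared by two shortest paths is counted once. So the returned weight never exceeds the MST weight in the modified complete graph.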
Theorem: The performance ratio of this
algorithm is 2.
Proof:
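We only have to prove that the triangle inequality holds for the shortest-path distances dist(·,·) computed in Step 1; the rest of the proof is the same as for the Steiner minimum tree in the plane. Suppose
dist(a,c) > dist(a,b) + dist(b,c) ......(3)
Then going from a to c along the path a→b→c (a shortest path from a to b followed by a shortest path from b to c) would be shorter than the shortest path from a to c, a contradiction.
Thus, (3) is impossible.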
Example II-1
[Figure: The given graph, with nodes a, b, c, d, e, f, g and weighted edges]
Example II-2
[Figure: Modified complete graph on the regular points a, b, c, d; each edge is labeled with the intermediate nodes of its shortest path and its length]
Example II-3
[Figure: The minimum spanning tree of the modified complete graph]
Example II-4
[Figure: The approximate Steiner tree]
Approximation Algorithm for TSP
with triangle inequality
• Given n cities (points in a plane), find a shortest tour that visits each city exactly once and returns to the starting city.
• Assumption: the triangle inequality holds. That is,
d (a, c) ≤ d (a, b) + d (b, c).
• This condition is reasonable, for example,
whenever the cities are points in the plane and the
distance between two points is the Euclidean
distance.
• Theorem: TSP with triangle inequality is also NP-hard.
Ratio 2 Algorithm
Algorithm A:
1. Compute a minimum spanning tree (Figure a).
2. Visit all the cities by traversing twice
around the tree. This visits some cities
more than once. (Figure b)
3. Shortcut the tour by going directly to the
next unvisited city. (Figure c)
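Algorithm A can be sketched in Python. Walking twice around the tree and skipping already-visited cities visits the cities in depth-first preorder, so the sketch builds an MST with Prim's algorithm and then outputs a preorder walk from city 0. The names are hypothetical, and the cities are assumed to be points in the plane with Euclidean distances.

```python
import math


def tsp_twice_around_tree(points):
    """MST + preorder walk; the preorder visit order is exactly the
    twice-around-the-tree walk with repeated cities shortcut."""
    n = len(points)
    # Step 1: Prim's MST, recording each city's children.
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and math.dist(points[u], points[v]) < best[v]:
                best[v], parent[v] = math.dist(points[u], points[v]), u
    # Steps 2-3: depth-first (preorder) traversal from city 0, then return home.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [0]


def tour_length(points, tour):
    return sum(math.dist(points[a], points[b]) for a, b in zip(tour, tour[1:]))


cities = [(0, 0), (1, 0), (1, 1), (0, 1)]
tour = tsp_twice_around_tree(cities)
print(tour, tour_length(cities, tour))  # [0, 1, 2, 3, 0] 4.0
```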
Example:
[Figure: (a) A spanning tree; (b) Twice around the tree; (c) A tour with shortcuts]
Proof of Ratio 2
1. The cost of a minimum spanning tree T, cost(T), is not greater than opt(TSP), the cost of an optimal tour. (Why? A spanning tree has n-1 edges and a tour has n edges. Deleting one edge from an optimal tour gives a spanning tree, and the minimum spanning tree has the smallest cost among all spanning trees.)
2. Going twice around the tree costs exactly 2×cost(T), and by the triangle inequality, shortcutting never increases the cost. So the tour produced by our algorithm costs at most 2×cost(T), and thus at most 2×opt(TSP).
Center Selection Problem
Problem: Given a set of points V in the plane (or some other metric space), find k points c1, c2, …, ck such that for each v in V,
$$\min_{i=1,\dots,k} d(v, c_i) \le d$$
and d is minimized.
Farthest-point clustering algorithm
Step 1: arbitrarily select a point in V as c1.
Step 2: let i=2.
Step 3: pick a point ci from V –{c1, c2, …, ci-1}
to maximize min {|c1ci|, |c2ci|,…,|ci-1 ci|}.
Step 4: i=i+1;
Step 5: repeat Steps 3 and 4 until i > k (that is, until k centers have been chosen).
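A Python sketch of farthest-point clustering, with a hypothetical function name. It assumes points in the plane with Euclidean distance and takes the first point as the arbitrary c1. It keeps, for every point, the distance to its nearest chosen center, so each round of Step 3 takes O(n) time. Already-chosen centers have distance 0, so Step 3 effectively picks from V - {c1, …, ci-1}.

```python
import math


def farthest_point_centers(points, k):
    """Farthest-point clustering; returns (centers, radius).

    radius = max over v in V of the distance from v to its nearest center.
    """
    centers = [points[0]]                        # Step 1: arbitrary first center
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:                      # Steps 2-5
        i = max(range(len(points)), key=lambda j: nearest[j])  # Step 3
        centers.append(points[i])
        nearest = [min(nearest[j], math.dist(points[j], points[i]))
                   for j in range(len(points))]
    return centers, max(nearest)


pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(farthest_point_centers(pts, 2))  # ([(0, 0), (11, 0)], 1.0)
```

Here the optimal radius is 0.5, using centers (0.5, 0) and (10.5, 0), so the result 1.0 matches the factor 2 proved below.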
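Theorem: The farthest-point clustering algorithm has ratio 2.
Proof: For each chosen point c_i, let $\delta_i = \min\{|c_1c_i|, |c_2c_i|, \dots, |c_{i-1}c_i|\}$, and let c_{k+1} be a point in V that maximizes $\delta_{k+1} = \min\{|c_1c_{k+1}|, \dots, |c_kc_{k+1}|\}$.
We have $\delta_i \le \delta_{i-1}$ for any i (c_i was already a candidate when c_{i-1} was chosen, and adding centers can only decrease distances to the nearest center).
Among the k+1 points c_1, …, c_{k+1}, two of them, say c_i and c_j (i > j), must be in the same group of an optimal solution. By the triangle inequality through that group's center, $\delta_i \le |c_jc_i| \le 2\,opt$.
Thus, $\delta_{k+1} \le \delta_i \le 2\,opt$.
For any v in V, by the definition of $\delta_{k+1}$, $\min\{|c_1v|, |c_2v|, \dots, |c_kv|\} \le \delta_{k+1}$.
So the algorithm has ratio 2.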
Vertex Cover Problem
• Given a graph G=(V, E), find V'⊆V with
minimum number of vertices such that for
each edge (u, v)∈E at least one of u and v
is in V’.
• V' is called a vertex cover.
• The problem is NP-hard.
• A ratio-2 algorithm exists for the vertex cover problem.
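The slides do not say which ratio-2 algorithm is meant. One standard choice, shown here as an assumption, adds both endpoints of every edge that is still uncovered. The edges picked this way share no endpoints (they form a maximal matching). Any vertex cover must contain at least one endpoint of each of them, so the result is at most twice the optimum. The function name is hypothetical.

```python
def vertex_cover_ratio2(edges):
    """Take both endpoints of each edge that is not yet covered (a maximal matching)."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


path = [(0, 1), (1, 2), (2, 3)]
print(vertex_cover_ratio2(path))  # {0, 1, 2, 3}; an optimal cover is {1, 2}
```

On this path the bound is tight: the algorithm returns 4 vertices, while the optimum uses 2.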