APPROXIMATION ALGORITHMS LecA/1
Introduction & Combinatorial (Problem Specific) Algorithms

Most optimization problems, in theory and in practice, are NP-hard. Let OPT be the value of an optimal solution to such a problem. NP-hardness means that there is no poly-time algorithm to compute OPT, unless P=NP.

Of course, in practice, such problems still need to be solved somehow. Practitioners routinely address this issue with so-called heuristics. These are efficient algorithms that produce some solution of value, say, APPROX, without any performance guarantee as to how APPROX compares to OPT, but with some general belief that APPROX is “good enough” for the particular application.

The theory of Approximation Algorithms provides performance guarantees for such heuristics:

For a minimization problem with optimal solution of value OPT, we say that an algorithm which outputs a solution of value APPROX achieves approximation factor $\alpha$ if and only if, for all inputs,
    $\mathrm{APPROX} \le \alpha \cdot \mathrm{OPT}$

For a maximization problem with optimal solution of value OPT, we say that an algorithm which outputs a solution of value APPROX achieves approximation factor $\alpha$ if and only if, for all inputs,
    $\mathrm{APPROX} \ge \alpha \cdot \mathrm{OPT}$

Realize: For minimization NP-hard problems $\alpha \ge 1$.
Realize: For maximization NP-hard problems $\alpha \le 1$.

NOTE: The difficulty is that such performance guarantees make a statement (lower bound or upper bound) about OPT, which, as we said above, is, in general, NP-hard to compute.

NOTE: The purpose of Approx. Algs is not simply to analyze and justify the heuristics in use. The real value of Approx. Algs is that, in our effort to establish formal proofs concerning the performance of various heuristics, and in our effort to improve this performance, we discover new algorithmic techniques (which, for example, practitioners routinely use as further heuristics). We may also discover inherent limits on how far performance guarantees can be improved, i.e. new complexity lower bounds.
VERTEX COVER, Factor 2 Approximation LecA/2

Input: G(V,E), undirected and unweighted.
Output: Minimum cardinality vertex cover, i.e. $S \subseteq V$ such that every edge in E has at least one endpoint in S.

APPROX:
Greedily find a maximal matching M of G(V,E).
Output both endpoints of every edge in M.

THEOREM: $\mathrm{APPROX} \le 2 \cdot \mathrm{OPT}$.

PROOF: The output is a vertex cover: an edge with neither endpoint in the output could be added to M, contradicting maximality. Since M is a matching, no two edges in M share a common endpoint (vertex). However, every edge in M is an edge of the graph, hence must be covered by a distinct vertex of any vertex cover. Therefore $|M| \le \mathrm{OPT}$. But $\mathrm{APPROX} = 2|M|$. Therefore
    $\mathrm{APPROX} = 2|M| \le 2 \cdot \mathrm{OPT}$

NOTE: The factor 2 is tight (for the particular approximation algorithm). Example achieving factor 2: [figure omitted]

NOTE: The above algorithm is not a factor 2 approximation if the vertices are weighted and we are looking for a minimum weight vertex cover.

Weighted MAXCUT, Factor 2 Approximation LecA/3

Input: G(V,E,W), undirected, with symmetric costs $w_{uv} = w_{vu}$ and triangular inequality $w_{uv} \le w_{ux} + w_{xv}$.
Output: Bipartition of V (i.e. a cut), maximizing the total cost of edges with endpoints in different partition classes (i.e. crossing the cut).

APPROX:
For v=1 to |V|, assume cl(v)=0 w.p. ½ and cl(v)=1 w.p. ½.
For v=1 to |V| do
    $W = \sum_{xy \in E} w_{xy} \Pr[\mathrm{cl}(x) \ne \mathrm{cl}(y) \mid \mathrm{cl}(z),\ z < v]$
    $W_0 = \sum_{xy \in E} w_{xy} \Pr[\mathrm{cl}(x) \ne \mathrm{cl}(y) \mid \mathrm{cl}(z),\ z < v \ \&\ \mathrm{cl}(v) = 0]$
    $W_1 = \sum_{xy \in E} w_{xy} \Pr[\mathrm{cl}(x) \ne \mathrm{cl}(y) \mid \mathrm{cl}(z),\ z < v \ \&\ \mathrm{cl}(v) = 1]$
    If $W_0 \ge W_1$ then set cl(v)=0, else set cl(v)=1.

THEOREM: $\mathrm{APPROX} \ge \frac{1}{2} \cdot \mathrm{OPT}$.

PROOF: Initially, W is the expectation of the total weight of the edges crossing a cut, under independently random assignment of vertices to partition classes 0 and 1, each w.p. ½. Thus, initially,
    $W = \sum_{xy \in E} \frac{1}{2} \cdot w_{xy} = \frac{1}{2} \sum_{xy \in E} w_{xy} \ge \frac{1}{2} \cdot \mathrm{OPT}$
Throughout the algorithm the following invariant holds: At least one of $W_0$ and $W_1$ is at least as large as W. This is because $W = \frac{1}{2} W_0 + \frac{1}{2} W_1$. Hence W never decreases as the vertices are fixed one by one. Thus, in the end,
    $\mathrm{APPROX} = \sum_{x \in V : \mathrm{cl}(x) = 0} \ \sum_{y \in V : \mathrm{cl}(y) = 1} w_{xy} = W \ge \frac{1}{2} \cdot \mathrm{OPT}$
where W denotes its final value and the sum ranges over edges xy of E.

Metric TSP, Factor 2 Approximation LecA/4

Input: G(V,E,W), undirected, with symmetric costs $w_{uv} = w_{vu}$ and triangular inequality $w_{uv} \le w_{ux} + w_{xv}$.
Output: Minimum cost Hamilton cycle of G(V,E,W) (i.e. a cycle going through each vertex exactly once).

APPROX:
Find a min cost spanning tree of G(V,E,W), say, of cost MST.
Construct a cycle $T_0$ going through all vertices according to the min cost spanning tree, traversing each tree edge twice (each vertex may be visited more than once).
Construct a Hamilton cycle $T$ from $T_0$ by taking “shortcuts” (repeated shortcuts, if necessary).

THEOREM: $\mathrm{APPROX} \le 2 \cdot \mathrm{OPT}$.

PROOF: Below, $|T|$ and $|T_0|$ denote the costs of $T$ and $T_0$. OPT is a cycle, hence it can be viewed as a spanning tree plus one edge, hence $\mathrm{MST} \le \mathrm{OPT}$. By construction, $|T_0| = 2 \cdot \mathrm{MST}$. By triangular inequality (repeated applications if shortcuts involve more than two edges), $|T| \le |T_0|$. Combining the above:
    $\mathrm{APPROX} = |T| \le |T_0| = 2 \cdot \mathrm{MST} \le 2 \cdot \mathrm{OPT}$

Metric TSP, Factor 2 Approximation LecA/5

NOTE: The factor 2 is tight (for the particular approximation algorithm). Example achieving factor 2: complete graph on 2n+1 nodes. Blue edges cost 1, black edges cost 2.
[Figure omitted. Surviving annotations, with relation symbols lost: OPT n 1, MST n, T0 2n, T 2n - 1, APPROX 2n 1]

Metric TSP, Factor 3/2 Approximation LecA/6

Input: G(V,E,W), undirected, with symmetric costs $w_{uv} = w_{vu}$ and triangular inequality $w_{uv} \le w_{ux} + w_{xv}$.
Output: Minimum cost Hamilton cycle of G(V,E,W) (i.e. a cycle going through each vertex exactly once).

APPROX:
Find a min cost spanning tree of G(V,E,W), say, of cost MST. (Realize that MST has an even number of vertices that have odd degree.)
Find a mincost perfect matching $M$ between all odd-degree vertices of MST. Notice that $\mathrm{MST} \cup M$ is an Eulerian graph (i.e. all vertices have even degree). Also realize that $M$ may include edges of the MST.
Construct an Euler Tour/Cycle $T_0$ for $\mathrm{MST} \cup M$ (notice that each vertex may be visited more than once).
Construct a Hamilton cycle $T$ from $T_0$ by taking “shortcuts” (repeated shortcuts, if necessary).

TSP, Factor 3/2 Approximation LecA/7

THEOREM: $\mathrm{APPROX} \le \frac{3}{2} \cdot \mathrm{OPT}$.

PROOF: Below, $|T|$, $|T_0|$ and $|M|$ denote costs. OPT is a cycle, hence it can be viewed as a spanning tree plus one edge, hence $\mathrm{MST} \le \mathrm{OPT}$.

Consider a Hamilton Tour of cost OPT, and let A,B,C,D,E,F be the vertices that have odd degree in the MST, in the order in which they appear in the Hamilton Tour of cost OPT.
[Figure: the vertices A, B, C, D, E, F placed in this order around the optimal tour]

Clearly {AB, CD, EF} and {BC, DE, FA} are perfect matchings. Also, by triangular inequality (writing AB for the cost $w_{AB}$),
    $AB + BC + CD + DE + EF + FA \le \mathrm{OPT}$
so at least one of the two matchings costs at most OPT/2. Now assume, without loss of generality, that $AB + CD + EF \le \mathrm{OPT}/2$. Since $|M| \le AB + CD + EF$ (because $M$ is a mincost matching over A,B,C,D,E,F), it follows that $|M| \le \mathrm{OPT}/2$.

By construction, the cost of the Euler Tour is $|T_0| = \mathrm{MST} + |M|$.

Finally, by triangular inequality (repeated applications if shortcuts involve more than two edges), $|T| \le |T_0|$.

Combining the above:
    $\mathrm{APPROX} = |T| \le |T_0| = \mathrm{MST} + |M| \le \mathrm{OPT} + \frac{1}{2} \cdot \mathrm{OPT} = \frac{3}{2} \cdot \mathrm{OPT}$
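
The steps of the factor 3/2 algorithm (spanning tree, matching on the odd-degree vertices, Euler tour, shortcuts) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the lecture's own implementation: the input is assumed to be a full symmetric cost matrix `w` satisfying the triangular inequality, all function names (`mst_edges`, `min_perfect_matching`, `euler_tour`, `christofides`, `tour_cost`) are hypothetical, and the mincost perfect matching is found by an exponential-time bitmask search that stands in for a polynomial-time matching algorithm, so it is only suitable for small instances.

```python
from functools import lru_cache


def mst_edges(w):
    """Prim's algorithm on the complete graph given by cost matrix w."""
    n = len(w)
    in_tree = [False] * n
    best = [(float("inf"), -1)] * n  # (cost, parent) of cheapest link into the tree
    best[0] = (0, -1)
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][1], u))
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v][0]:
                best[v] = (w[u][v], u)
    return edges


def min_perfect_matching(nodes, w):
    """Exact mincost perfect matching on `nodes` (assumption: small, even count).

    Exponential bitmask search, used here as a stand-in for a poly-time algorithm.
    """
    k = len(nodes)

    @lru_cache(maxsize=None)
    def solve(mask):
        if mask == 0:
            return 0, ()
        i = next(b for b in range(k) if mask >> b & 1)  # lowest unmatched node
        best = None
        for j in range(i + 1, k):
            if mask >> j & 1:
                cost, pairs = solve(mask & ~(1 << i) & ~(1 << j))
                cand = (cost + w[nodes[i]][nodes[j]], pairs + ((nodes[i], nodes[j]),))
                if best is None or cand[0] < best[0]:
                    best = cand
        return best

    return list(solve((1 << k) - 1)[1])


def euler_tour(n, edges, start=0):
    """Hierholzer's algorithm on a connected multigraph with all degrees even."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    used = [False] * len(edges)
    stack, tour = [start], []
    while stack:
        u = stack[-1]
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()  # discard edges already traversed from the other side
        if adj[u]:
            v, idx = adj[u].pop()
            used[idx] = True
            stack.append(v)
        else:
            tour.append(stack.pop())
    return tour


def christofides(w):
    """Return a Hamilton cycle as a list of vertices in visiting order."""
    n = len(w)
    tree = mst_edges(w)
    degree = [0] * n
    for u, v in tree:
        degree[u] += 1
        degree[v] += 1
    odd = [v for v in range(n) if degree[v] % 2 == 1]
    matching = min_perfect_matching(odd, w)
    walk = euler_tour(n, tree + matching)  # T0: Euler tour of MST plus M
    seen, cycle = set(), []
    for v in walk:  # shortcuts: skip vertices that were already visited
        if v not in seen:
            seen.add(v)
            cycle.append(v)
    return cycle


def tour_cost(cycle, w):
    """Cost of the closed tour that visits `cycle` in order and returns to the start."""
    return sum(w[cycle[i]][cycle[(i + 1) % len(cycle)]] for i in range(len(cycle)))
```

For instance, `w` can be built from points in the plane with Manhattan distances, which satisfy the triangular inequality; `tour_cost(christofides(w), w)` then gives the value APPROX that the theorem bounds by 3/2 of OPT.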