Richard M. Karp
Harvard University August 29, 2011
• Exact solution methods: exponential running time in worst case.
• Polynomial-time approximation algorithms for optimization problems. Approximation ratios are usually unrealistically high.
• Parametrized complexity: polynomial-time complexity for instances with fixed parameter, but dependence on parameter is usually adverse.
• In probabilistic analysis, problem instances are drawn from simple probability distributions. Often one can prove excellent performance on the average. However, the probability distributions may not correspond to real-life instances.
• Heuristics are often “unreasonably effective,” for reasons not well understood.
• We seek systematic methods for tuning heuristics and validating them by empirical testing on training sets of representative instances.
• Large traveling-salesman problems can be solved by quick tour construction methods, local improvement methods or cutting plane methods.
• Local improvement methods find near-optimal solutions to graph bisection problems.
• Huge satisfiability problems are routinely solved rapidly by branch-and-bound methods.
• The greedy set cover algorithm typically gives solutions within a few percent of optimal.
• Set of constraints defined implicitly by a generation algorithm rather than by an explicit list.
-- Linear and convex programming: equivalence of separation and optimization
-- Integer programming: cutting-plane methods
-- Linear programming: column generation
• Ground set V
• For every v in V, a positive weight c(v).
• C*: collection of subsets of V (circuits)
• Goal: Find a set of minimum weight that hits every set in C*
• Equivalent to set cover problem
• NP-hard and hard to approximate within ratio o(log |C*|).
• Greedy algorithm achieves approximation ratio O(log |C*|):
Repeat: choose the element v in V that minimizes the ratio of c(v) to the number of remaining sets hit by v; delete the sets hit by v.
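The greedy rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `greedy_hitting_set`, the dictionary representation of weights, and the alphabetical tie-breaking are all assumptions made for the example.

```python
def greedy_hitting_set(weights, circuits):
    """Greedy weighted hitting set (illustrative sketch).

    weights:  dict mapping each element to a positive weight c(v)
    circuits: iterable of non-empty sets of elements
    Returns the chosen elements in the order they were picked.
    """
    remaining = [set(c) for c in circuits]
    if any(not c for c in remaining):
        raise ValueError("an empty circuit cannot be hit")
    chosen = []
    while remaining:
        # Count how many still-unhit circuits each element would hit.
        hits = {}
        for c in remaining:
            for v in c:
                hits[v] = hits.get(v, 0) + 1
        # Pick the element with the smallest weight per circuit hit
        # (ties broken by name; an assumption, the slides do not say).
        best = min(hits, key=lambda v: (weights[v] / hits[v], str(v)))
        chosen.append(best)
        # Delete the circuits hit by the chosen element.
        remaining = [c for c in remaining if best not in c]
    return chosen


weights = {"a": 1.0, "b": 1.0, "c": 3.0}
circuits = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
print(greedy_hitting_set(weights, circuits))  # ['a', 'b']
```

The counts are recomputed from scratch in every round, which keeps the sketch short; a serious implementation would maintain them incrementally.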
• Greedy algorithm gives good approximate solutions.
• CPLEX integer programming algorithm often gives optimal solutions rapidly.
• The collection of circuits C* has a compact implicit description.
• There is a polynomial-time separation oracle which, given a subset H of the ground set, either determines that H is a hitting set or produces a circuit that H does not hit.
Example: in the feedback vertex set problem, the separation oracle produces the vertex set of a shortest cycle in the subgraph induced by V\H.
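As an illustration of such an oracle, here is a sketch for feedback vertex set in a digraph. It runs a breadth-first search from every surviving vertex and keeps the shortest cycle found. The function name `shortest_cycle_oracle` and the adjacency-dictionary input are assumptions for the example; the talk does not say how its oracle was implemented.

```python
from collections import deque


def shortest_cycle_oracle(adj, H):
    """Separation oracle sketch for feedback vertex set in a digraph.

    adj: dict mapping each vertex to an iterable of its successors
    H:   candidate hitting set (vertices to delete)
    Returns None if H hits every cycle, otherwise the vertex set of a
    shortest cycle in the subgraph induced by the vertices outside H.
    """
    H = set(H)
    best = None
    for s in adj:
        if s in H:
            continue
        # BFS from s; the first edge found that leads back to s closes
        # a shortest cycle through s.
        parent = {s: None}
        queue = deque([s])
        last = None
        while queue and last is None:
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w in H:
                    continue
                if w == s:
                    last = u
                    break
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
        if last is not None:
            cycle = []
            while last is not None:
                cycle.append(last)
                last = parent[last]
            if best is None or len(cycle) < len(best):
                best = cycle
    return None if best is None else set(best)


adj = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
print(shortest_cycle_oracle(adj, set()))   # {4, 5}
print(shortest_cycle_oracle(adj, {4}))     # {1, 2, 3}
print(shortest_cycle_oracle(adj, {1, 4}))  # None
```

One BFS per vertex costs O(|V|(|V| + |E|)) in total, which is polynomial as the oracle requires.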
• Feedback vertex set in a graph or digraph: vertex sets of cycles
• Feedback edge set in a digraph: edge sets of cycles
• Max cut: edge sets of odd cycles
• Steiner tree: edge sets of cycles that partition the required vertices
• Maximum 2-sat: minimal contradictory sets of 2-element clauses
• Intersection of k matroids: circuits of each matroid
• Maximal feasible subset of a set of linear inequalities: minimal infeasible subsets
Repeat until a feasible hitting set H is found:
(1) Given C, a subset of C*, find a minimum-weight hitting set H for C.
(2) Using the separation oracle, find a minimum-cardinality circuit c not hit by H.
(3) Add c to C.
Return H.
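The loop above can be sketched as follows. This is a toy version under stated assumptions: the exact solver is a brute-force search standing in for an integer-programming solver such as CPLEX, and the oracle is built from an explicit hidden list of circuits purely for demonstration. The names `min_weight_hitting_set`, `implicit_hitting_set` and `make_list_oracle` are hypothetical.

```python
from itertools import combinations


def min_weight_hitting_set(weights, circuits):
    """Exact solver by exhaustive search (stand-in for an IP solver)."""
    elements = sorted(weights)
    best, best_cost = None, float("inf")
    for r in range(len(elements) + 1):
        for subset in combinations(elements, r):
            chosen = set(subset)
            cost = sum(weights[v] for v in chosen)
            if cost < best_cost and all(chosen & c for c in circuits):
                best, best_cost = chosen, cost
    return best


def implicit_hitting_set(weights, oracle):
    """oracle(H) returns None if H hits every circuit in C*,
    otherwise a circuit that H does not hit."""
    C = []
    while True:
        H = min_weight_hitting_set(weights, C)  # step (1)
        c = oracle(H)                           # step (2)
        if c is None:
            return H, C   # H is optimal for C and feasible for C*
        C.append(frozenset(c))                  # step (3)


def make_list_oracle(hidden):
    """Toy oracle: scans an explicit list and returns a smallest missed circuit."""
    def oracle(H):
        missed = [c for c in hidden if not (set(c) & H)]
        return min(missed, key=len) if missed else None
    return oracle


weights = {"a": 1, "b": 1, "c": 1, "d": 1}
hidden = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
H, C = implicit_hitting_set(weights, make_list_oracle(hidden))
print(sorted(H), len(C))  # ['a', 'c'] 3
```

The sketch returns the generated circuits alongside the hitting set so that the number of oracle calls can be inspected. Because H is a minimum-weight hitting set for a subset of C*, its weight is a lower bound on the optimum, so the first feasible H is optimal.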
Input: C, a set of circuits, and H, a hitting set for C.
Repeat until H hits every circuit in C*: find a circuit c not hit by H and choose an element x in c; add c to C and add x to H.
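A minimal sketch of this subroutine follows. The slides do not say how the element x is chosen from c, so the `choose` parameter (defaulting to the smallest element) is an assumption, as are the function names and the explicit-list oracle used for the demonstration.

```python
def find_circuits(C, H, oracle, choose=min):
    """Circuit-finding subroutine (illustrative sketch).

    C:      list of circuits known so far
    H:      a hitting set for C
    oracle: oracle(H) returns None if H hits every circuit in C*,
            otherwise a circuit not hit by H
    choose: rule for picking x from a missed circuit (an assumption)
    Returns the enlarged circuit list and the enlarged, now feasible, H.
    """
    C = list(C)
    H = set(H)
    while True:
        c = oracle(H)
        if c is None:
            return C, H
        C.append(frozenset(c))   # add c to C
        H.add(choose(c))         # add x to H, so c is now hit


def make_list_oracle(hidden):
    """Toy oracle over an explicit list; returns the first missed circuit."""
    def oracle(H):
        for c in hidden:
            if not (set(c) & H):
                return c
        return None
    return oracle


hidden = [{1, 2}, {2, 3}, {4, 5}]
C, H = find_circuits([], set(), make_list_oracle(hidden))
print(sorted(H), len(C))  # [1, 2, 4] 3
```

Each pass adds one element to H, so the loop makes at most |V| oracle calls before H becomes feasible.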
• Input: set of circuits C and hitting set H for C
(1) Execute the circuit-finding subroutine.
(2) Repeat until k iterations yield no circuits: construct a greedy hitting set H for C and execute the circuit-finding subroutine.
(3) Using CPLEX, construct an optimal hitting set H for C.
If H is infeasible, go to (1)
Return H.
• Number of circuits generated, number of calls to solver, running time of generator.
• Highly similar sequences in two genomes constitute an anchor pair. The individual sequences are called anchors.
• A genome is a linearly ordered sequence of anchors.
• An alignment is a matrix with a row for each genome, and an assignment of each anchor to a column, respecting the linear orders.
• An anchor pair is synchronized if its two anchors lie in the same column.
• Goal: maximize the sum of the weights of the synchronized anchor pairs.
• The 2-genome problem is equivalent to the maximum-weight increasing subsequence problem and is solvable in time O(n log n), where n is the cardinality of the ground set.
• The k-genome problem can be solved in time O(n^k) by dynamic programming.
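For the 2-genome case, one way to reach O(n log n) is a Fenwick tree that stores prefix maxima. The sketch below assumes the anchor pairs have been sorted by position in the first genome, that `values` holds their positions in the second genome, and that all weights are positive. The function name and this input encoding are assumptions for the example, not the method used in the talk.

```python
def max_weight_increasing_subsequence(values, weights):
    """Maximum total weight of a strictly increasing subsequence of values.

    Runs in O(n log n) using a Fenwick tree over the ranks of the values.
    Assumes positive weights.
    """
    rank = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    tree = [0] * (len(rank) + 1)  # tree[i] covers a block of ranks ending at i

    def query(i):
        """Best weight of a subsequence whose last value has rank <= i."""
        best = 0
        while i > 0:
            best = max(best, tree[i])
            i -= i & -i
        return best

    def update(i, value):
        """Record a subsequence of weight `value` ending at rank i."""
        while i < len(tree):
            tree[i] = max(tree[i], value)
            i += i & -i

    answer = 0
    for v, w in zip(values, weights):
        r = rank[v]
        total = query(r - 1) + w  # extend the best subsequence ending below v
        update(r, total)
        answer = max(answer, total)
    return answer


# Positions in genome 2 of four anchor pairs, listed in genome-1 order.
print(max_weight_increasing_subsequence([3, 1, 2, 4], [5, 1, 1, 1]))  # 6
```

In the example the heavy pair at position 3 followed by the pair at position 4 (weight 5 + 1) beats the longer chain 1, 2, 4 (weight 3).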
• Ground set: anchor pairs
• Goal: delete a minimum-weight set of anchor pairs such that the remaining anchor pairs can be simultaneously synchronized.
• Directed edge (u,v): u precedes v.
• Undirected edge (u,v): u and v are an anchor pair.
• Mixed cycle: a cycle of directed and undirected edges that contains at least one directed edge.
• An edge must be deleted from the set of undirected edges of each mixed cycle (Kececioglu).
• Run the generic implicit hitting set algorithm, with the anchor pairs as elements and the undirected edge sets of mixed cycles as circuits.
• Separation oracles: given a putative hitting set H, search for a mixed cycle in the graph induced by the edges not in H.
Two methods:
(1) a variant of depth-first search;
(2) attempt to align the remaining edges until blocked by the occurrence of a mixed cycle.
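The feasibility half of such an oracle can be sketched by contracting the undirected edges with union-find and then attempting a topological sort of the contracted graph: the sort succeeds exactly when no mixed cycle remains. This is an illustrative check only, under the assumption that each genome is given as a list of anchors in order. It reports whether a mixed cycle exists but does not extract one, which a full separation oracle would also have to do. The function name `has_mixed_cycle` is hypothetical.

```python
from collections import defaultdict, deque


def has_mixed_cycle(genomes, pairs):
    """Check for a mixed cycle (illustrative sketch).

    genomes: list of lists of anchors, each in its linear order
    pairs:   iterable of (u, v) anchor pairs that have not been deleted
    Returns True if the mixed graph contains a mixed cycle.
    """
    parent = {a: a for g in genomes for a in g}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Contract every undirected edge: synchronized anchors share a column.
    for u, v in pairs:
        parent[find(u)] = find(v)

    # Directed edges between consecutive anchors, lifted to the columns.
    succ = defaultdict(set)
    indeg = defaultdict(int)
    for g in genomes:
        for u, v in zip(g, g[1:]):
            cu, cv = find(u), find(v)
            if cu == cv:
                return True  # two anchors of one genome forced into one column
            if cv not in succ[cu]:
                succ[cu].add(cv)
                indeg[cv] += 1

    # Kahn's algorithm: the columns can be ordered iff the graph is acyclic.
    nodes = {find(a) for a in list(parent)}
    queue = deque(n for n in nodes if indeg[n] == 0)
    seen = 0
    while queue:
        n = queue.popleft()
        seen += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return seen < len(nodes)


genomes = [["a1", "a2"], ["b1", "b2"]]
print(has_mixed_cycle(genomes, [("a1", "b1"), ("a2", "b2")]))  # False
print(has_mixed_cycle(genomes, [("a1", "b2"), ("a2", "b1")]))  # True
```

When the check returns False, the order in which Kahn's algorithm removes the columns is itself a valid alignment, which is in the spirit of method (2).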
Time (sec.)     # solved    # edges
0 to 0.01       1311        (1; 52; 399)
0.01 to 0.1     764         (20; 203; 549)
0.1 to 1        1086        (26; 450; 1837)
1 to 10         632         (44; 1104; 4645)
10 to 60        151         (65; 1351; 12313)
60 to 600       75          (103; 1136; 14690)
600 to 3600     36          (166; 1236; 13916)
• Within the general algorithmic strategy there are many possible choices of the separation oracle, greedy algorithm, version of CPLEX, parameter settings, etc.
By tuning these choices on a training set of real-world examples we improved the performance by a factor of several hundred.
• This is joint work with Erick Moreno-Centeno.