Knuth Prize Lecture 2010

• David S. Johnson

• AT&T Labs - Research

© 2010 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.

1975

Don Knuth, Mike Garey, David Johnson

From M. R. Garey, R. L. Graham, D. S. Johnson, and D. E. Knuth, “Complexity Results for Bandwidth Minimization,” SIAM J. Appl. Math. 34:3 (1978), 477-495.

Bob Tarjan, Mike Garey, David Johnson

1980’s

Peter Shor

Ed Coffman

Ron Graham

Mihalis Yannakakis

Dick Karp

Christos Papadimitriou

Endre Szemeredi

Laci Lovasz

Acknowledgments

• Role Models: Collaboration in Action

– Mike Fischer and Albert Meyer

• Collaborators “Down the Hall”

– Mike Garey, Ron Graham, Ed Coffman, Mihalis Yannakakis, Bob Tarjan, Peter Shor

• Honorary “Down the Hall” Collaborators

– Christos Papadimitriou, Tom Leighton, Richard Weber, Claire Mathieu

• Experimental Inspirations and Collaborators

– Jon Bentley, Shen Lin & Brian Kernighan, Lyle & Cathy McGeoch, David Applegate

Approximation Algorithms in Theory and Practice

David S. Johnson

AT&T Labs – Research

Knuth Prize Lecture

June 7, 2010

The Lost Cartoon

Impact?

Kanellakis Theory and Practice Award

1996: Public Key Cryptography (Adleman, Diffie, Hellman, Merkle, Rivest, and Shamir)

1997: Data Compression (Lempel and Ziv)

1998: Model Checking (Bryant, Clarke, Emerson, and McMillan)

1999: Splay Trees (Sleator and Tarjan)

2000: Polynomial-Time Interior Point LP Methods (Karmarkar)

2001: Shotgun Genome Sequencing (Myers)

2002: Constrained Channel Coding (Franaszek)

2003: Randomized Primality Tests (Miller, Rabin, Solovay, and Strassen)

2004: AdaBoost Machine Learning Algorithm (Freund and Schapire)

2005: Formal Verification of Reactive Systems (Holzmann, Kurshan, Vardi, and Wolper)

2006: Logic Synthesis and Simulation of Electronic Systems (Brayton)

2007: Gröbner Bases as a Tool in Computer Algebra (Buchberger)

2008: Support Vector Machines (Cortes and Vapnik)

2009: Practice-Oriented Provable Security (Bellare and Rogaway)

Coping with NP-Completeness at AT&T

Part I. The Traveling Salesman Problem

• TSP Applications (Bell Labs):

– “Laser Logic” (programming FPGAs)

– Circuit Board Construction

– Circuit Board Inspection

• Algorithms Used:

– Double Spanning Tree? (worst-case ratio = 2; see the sketch below)

– Nearest Insertion? (worst-case ratio = 2)

– Christofides? (worst-case ratio = 1.5)

• Answer: None of the Above
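For reference, the double spanning tree heuristic in the list above is only a few lines: compute a minimum spanning tree and output the cities in preorder; under the triangle inequality the resulting tour is at most twice optimal. A minimal illustrative sketch (not the Bell Labs code):

```python
# Sketch of the double-spanning-tree heuristic (worst-case ratio 2 under
# the triangle inequality): Prim's MST plus a preorder walk of the tree.
import math
import random

def double_spanning_tree_tour(points):
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])

    # Prim's algorithm: parent[] records the MST as a tree rooted at city 0.
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [0] * n
    best[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if u != 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v], parent[v] = dist(u, v), u

    # Preorder walk of the MST gives the tour (repeated vertices shortcut).
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

if __name__ == "__main__":
    # Example: a random 100-city Euclidean instance in the unit square.
    pts = [(random.random(), random.random()) for _ in range(100)]
    t = double_spanning_tree_tour(pts)
    length = sum(math.dist(pts[t[i]], pts[t[(i + 1) % len(t)]])
                 for i in range(len(t)))
    print(f"tour length: {length:.3f}")
```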

Testbed: Random Euclidean Instances

[Plots of sample instances and tours for N = 10, 100, 1000, and 10000]

Lin-Kernighan [Johnson-McGeoch Implementation]

1.5% off optimal

1,000,000 cities in 8 minutes at 500 MHz

Iterated Lin-Kernighan [Johnson-McGeoch Implementation]

0.4% off optimal

100,000 cities in 75 minutes at 500 MHz

Concorde Branch-and-Cut Optimization [Applegate-Bixby-Chvatal-Cook]

Optimum

1,000 cities in median time 5 minutes at 2.66 GHz

Running times (in seconds) for 10,000 Concorde runs on random 1000-city planar Euclidean instances (2.66 GHz Intel Xeon processor in a dual-processor PC, purchased late 2002).

Range: 7.1 seconds to 38.3 hours

For more on the state-of-the-TSP-art, see http://www2.research.att.com/~dsj/chtsp/index.html/ [DIMACS TSP Challenge] and http://www.tsp.gatech.edu/ [Concorde, with instances].

Coping with NP-Completeness at AT&T

Part II. Bin Packing

Coping with NP-Completeness at AT&T

Part III. Access Network Design

[Applegate, Archer, Johnson, Merritt, Phillips, …]

• Problem:

In “out of region” areas, AT&T does not always have direct fiber connections to our business customers, and hence spends a lot of money to lease lines to reach them. Can we save money by laying our own fiber?

• Tradeoff:

Capital cost of fiber installation versus monthly cost savings from dropping leases.

• Our Task:

Identify the most profitable clusters of customers to fiber up.

• Key Observation: This can be modeled as a Prize Collecting Steiner Tree problem, with Prize = Lease Savings and Cost = Annualized Capital Cost.

• The Goemans-Williamson primal-dual approximation PCST algorithm should be applicable.

Unfortunate Details

• Although the Goemans-Williamson algorithm has a worst-case ratio of 2, this is for the objective function “Edge Cost + Amount of Prize Foregone” (see the formula below), which isn’t really the correct one here.

• Edge costs are capital dollars, prizes are expense dollars, and the two are not strictly comparable.

• We don’t have accurate estimates of costs.
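For reference (this is the standard penalty formulation, not taken from the slides), the objective for which the factor-2 guarantee is stated is the cost of the tree built plus the prizes of the customers it fails to span:

```latex
\min_{T\ \text{a tree in}\ G}\;\; \sum_{e \in E(T)} c_e \;+\; \sum_{v \in V \setminus V(T)} \pi_v
```

The point of the bullets above is that this combined sum is not the quantity the business case actually optimizes.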

Fortunate Details

• By using various multipliers on the prize values, we can generate a range of possible clusters, ranking them, for instance, by the number of years until cumulative lease savings equals capital cost (see the sketch below).

• Each cluster can itself yield more options if we consider peeling off the least profitable leaves.

• Planners can then take our top suggestions and validate them by obtaining accurate cost estimates.
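A minimal sketch of the multiplier sweep just described, assuming a black-box PCST solver. The function solve_pcst, the argument names, and the payback ranking are illustrative placeholders, not the production planning tool:

```python
# Sketch of the prize-multiplier sweep: rescale the prizes, solve each
# rescaled Prize-Collecting Steiner Tree instance, and rank the resulting
# clusters by payback time (years until cumulative lease savings equal
# the capital cost of the fiber build).  `solve_pcst` is a stand-in for
# whatever PCST code is available (e.g., a Goemans-Williamson implementation).

def sweep_multipliers(graph, edge_cost, annual_savings, solve_pcst,
                      multipliers=(0.5, 1.0, 2.0, 4.0, 8.0)):
    candidates = []
    for lam in multipliers:
        prizes = {v: lam * s for v, s in annual_savings.items()}
        tree = solve_pcst(graph, edge_cost, prizes)       # returns a set of edges
        spanned = {v for e in tree for v in e}            # vertices reached by fiber
        capital = sum(edge_cost[e] for e in tree)
        savings = sum(annual_savings.get(v, 0.0) for v in spanned)
        if savings > 0:
            candidates.append((capital / savings, lam, tree))   # payback in years
    candidates.sort()                                      # smallest payback first
    return candidates
```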

Coping with NP-Completeness at AT&T

Part IV. The More Typical Approaches

• Adapt a metaheuristic search-based approach (local search, genetic algorithms, tabu search, GRASP, etc.)

• Model as a mixed integer program and use CPLEX, either to solve the MIP if the instance is sufficiently small (often the case), or to solve the LP, which we then round.

Final AT&T Example:

[Breslau, Diakonikolas, Duffield, Gu, Hajiaghayi, Karloff, Johnson, Resende, Sen]

• Special case of the “Cover-by-Pairs” problem [Hassin & Segev, 2005]:

• Given a set A of items, a set C of “cover objects”, and a set T ⊆ A×C×C, find a minimum-size subset C’ ⊆ C such that for all a ∈ A, there exist (not necessarily distinct) c, c’ ∈ C’ such that (a,c,c’) ∈ T.

• Here we are given a graph G = (V,E), with both A and C being subsets of V.

[Figure: example network, with cover objects (potential content locations) and items (customers for content), showing which pairs do and do not cover an item]

(a,c,c’) ∈ T iff no vertex b ≠ a is on both a shortest path from a to c and a shortest path from a to c’.

What Theory Tells Us

• Our special case is at least as hard to approximate as Cover-by-Pairs.

• Cover-by-Pairs is at least as hard to approximate as Label Cover.

• Assuming NP ⊄ DTIME(n^{polylog(n)}), no polynomial-time approximation algorithm for Label Cover can be guaranteed to find a solution that is within a ratio of 2^{log^{1-ε} n} of optimal, for any ε > 0.

What Practice Tells Us

Algorithms we tried:

– CPLEX applied to the integer programming formulation of the corresponding Cover-by-Pairs instance

– Greedy algorithm for the Cover-by-Pairs instance (see the sketch after this list)

– Genetic algorithm for the Cover-by-Pairs instance

– Graph-based “Double Hitting Set” algorithm (HH) that puts together solutions to two specially constructed hitting-set instances, with Greedy algorithm cleanup
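For concreteness, here is a minimal sketch of one natural greedy rule for Cover-by-Pairs (my reading of “greedy”, not necessarily the exact variant used in the study): repeatedly add the cover object that, together with the objects already chosen, newly covers the most items. Recall that c and c’ need not be distinct, so a single object can cover an item on its own.

```python
# Greedy heuristic for Cover-by-Pairs: items A, cover objects C, and
# triples T (a subset of A x C x C).  Repeatedly add the cover object
# that covers the largest number of still-uncovered items together with
# the objects already chosen.

def greedy_cover_by_pairs(A, C, T):
    # covers[a] = set of unordered pairs {c, c'} that cover item a
    covers = {a: set() for a in A}
    for (a, c, cp) in T:
        covers[a].add(frozenset((c, cp)))

    chosen, uncovered = set(), set(A)
    while uncovered:
        def gain(c):
            # items newly covered if c were added to the chosen set
            return sum(1 for a in uncovered
                       if any(pair <= chosen | {c} for pair in covers[a]))
        best = max(C, key=gain)
        if gain(best) == 0:
            break                      # remaining items cannot be covered
        chosen.add(best)
        uncovered = {a for a in uncovered
                     if not any(pair <= chosen for pair in covers[a])}
    return chosen
```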

Instance Testbed

• Actual ISP networks with 100 to 1,000 routers (vertices)

• Synthetic wide-area-network structures with 26 to 556 routers, generated using the Georgia Tech Internet Topology Models package

Computational Results

• CPLEX could, in reasonable time, find optimal integer solutions to instances with |A|,|C| < 150, but its running time was clearly growing exponentially.

• The Double Hitting Set and Genetic algorithms typically found solutions of size no more than 1.05·OPT (where “OPT” is the maximum of the true optimum, where known, and a lower bound equal to the optimal solution value for the second hitting-set instance the Double Hitting Set algorithm considers).

• Only for the largest ISP instance did the results degrade (HH was 46% above the lower bound).

• But is this a degradation of the algorithm, or of the quality of our lower bound?

• And does it matter? The solution was still far better than the naïve solution and well worth obtaining.

Lessons Learned

• Real-world instances were not as worst-case or asymptotic as our theory is.

• Champion algorithms from the theory world could be outclassed by ad hoc algorithms with much worse (or unknown) worst-case behavior.

• Some algorithms and ideas from the theory world have been successfully applied, often to purposes for which they were not originally designed.

• Algorithms from the Operations Research and Metaheuristic communities have perhaps had more real-world impact on coping with NP-hardness than those from TCS.

How to Have More “Real-World” Impact

(Today’s Sermon, with Illustrations)

1. Study problems people might actually want to solve.

2. Study the algorithms people actually use (or might consider using).

3. Design for users, not adversaries.

4. Complement worst-case results with “realistic” average case results.

5. Implement and experiment.

Study the algorithms people actually use

• Bin packing, greedy set covering, graph coloring (DSJ, 1973)

• 2-Opt algorithm for the TSP (Chandra, Karloff, & Tovey, 1999; see the sketch below)

• K-means clustering (Arthur & Vassilvitskii, 2006, 2007)

• Smoothed analysis of linear programming (Spielman & Teng, 2001)
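Since 2-Opt appears in the list above, here is a minimal sketch of the algorithm being analyzed: repeatedly replace two tour edges by two shorter ones (reversing the intervening segment) until no improving exchange remains. Illustrative Python; serious implementations add neighbor lists, don't-look bits, and a real tour data structure.

```python
import math

def two_opt(points, tour):
    """Improve a tour by repeatedly applying 2-opt moves (segment
    reversals) until no improving move is found."""
    d = lambda a, b: math.dist(points[a], points[b])
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            # when i == 0, stop one position early so the two edges are disjoint
            for j in range(i + 2, n if i > 0 else n - 1):
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                # Replace edges (a,b) and (c,e) by (a,c) and (b,e).
                if d(a, c) + d(b, e) < d(a, b) + d(c, e) - 1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```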

Still Open

• When (and why) do metaheuristic approaches work well?

• Ditto for belief propagation algorithms, etc.

• Many other questions.

Design for users, not adversaries

• Some of our most effective techniques for minimizing worst-case behavior essentially guarantee poor performance in practice

– Rounding

– Metric Embedding

On-Line Bin Packing

• For any on-line algorithm A, R∞(A) ≥ 1.540

• First Fit: asymptotic worst-case ratio R∞(FF) = 1.7

• Harmonic Algorithm: R∞(H) = 1.691…

• Richey’s Harmonic+1 Algorithm: R∞(H+) ~ 1.59

• The rounding up of sizes used in the latter two algorithms guarantees wasted space in bins, and First Fit substantially outperforms them in practice and on average (for a variety of distributions); see the sketch below.
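For reference, First Fit itself is only a few lines; a minimal sketch (First Fit Decreasing is the same routine applied to the items sorted in decreasing order):

```python
def first_fit(items, capacity=1.0):
    """Place each item in the first open bin it fits in, opening a new
    bin when necessary; returns the list of bin contents."""
    gaps = []          # gaps[i] = remaining capacity of bin i
    packing = []       # packing[i] = items placed in bin i
    for x in items:
        for i, gap in enumerate(gaps):
            if x <= gap + 1e-12:
                gaps[i] -= x
                packing[i].append(x)
                break
        else:
            gaps.append(capacity - x)
            packing.append([x])
    return packing

# First Fit Decreasing: first_fit(sorted(items, reverse=True))
```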

Complement worst-case results with “realistic” average case results

Drawbacks of standard average-case analysis:

• Results for just one distribution tell a very narrow story

• Many distributions studied are chosen for ease of analysis rather than for modelling reality

– too unstructured

– too much independence

TSP

• Reasonably good: Random points in the unit square -- here geometry imposes structure, yielding reasonably good surrogates for real-world geometric instances.

• Not so Good: Random distance matrices (each edge length chosen uniformly and independently from [0,1]); see the generator sketch below.
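Both distributions are trivial to generate, which is part of their appeal; a minimal sketch:

```python
import random

def random_euclidean_instance(n):
    """n points chosen uniformly in the unit square (the 'reasonably good'
    testbed: geometry imposes structure)."""
    return [(random.random(), random.random()) for _ in range(n)]

def random_distance_matrix(n):
    """Symmetric matrix with each off-diagonal entry uniform in [0,1],
    chosen independently -- no triangle inequality, little structure."""
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = random.random()
    return d
```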

Bin Packing

• The classical distribution first studied has item sizes chosen independently from the uniform distribution on [0,1]. It yields great and surprising theory:

– Theorem [Shor]: For an n-item instance, the expected number of bins in the packing constructed by Best Fit is n/2 + Θ(n^{1/2} log^{3/4} n).

• However, for this distribution, bin packing is essentially reduced to a matching problem.

More Meaningful Bin Packing Distributions

• Choosing sizes from [0,a], a < 1, captures a wider range of packing behavior.

– Theorem [Johnson, Leighton, Shor, & Weber]: First Fit Decreasing has O(1) waste for 0 < a ≤ ½, and Θ(n^{1/3}) waste for ½ < a < 1.

• Even more generally, one can consider “discrete distributions”, where item sizes are restricted to a fixed set of integers, each having its own probability.

Some additional questionable distributions *

• Random instances of Satisfiability

• G(n,p) random graphs

• G(n,p) random graphs with planted subgraphs

*Questionable for practice only -- Very interesting for theory.

Implement and Experiment

• Given the frequent disconnect between theory and practical performance, the best way to get people to use your algorithm is to provide experimental evidence that it performs well, and, better yet, to provide an implementation itself.

• Side benefit: Experiments can also drive new theory, suggesting new questions and algorithms.

TSP

• Jon Bentley’s experiments with the 2-Opt algorithm identified the linked-list representation of the current tour as a bottleneck.

• This led to defining a “tour data structure” with flip, successor, predecessor, and betweenness as operations/queries (see the sketch below).

• For this we obtained a cell-probe lower bound and a near-optimum solution (a representation based on splay trees) [Fredman, Johnson, McGeoch, & Ostheimer, 1995].
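To make the interface concrete, here is the naive array-based version of such a tour data structure (my illustration, not the paper's representation): successor, predecessor, and between are constant time, but flip costs time proportional to the segment reversed, which is exactly what the splay-tree representation avoids.

```python
# Naive array-based "tour data structure" supporting next, prev, between,
# and flip.  Here next/prev/between are O(1) and flip is O(n); the splay-
# tree representation cited above makes all four polylogarithmic.

class ArrayTour:
    def __init__(self, order):
        self.order = list(order)                        # position -> city
        self.pos = {c: i for i, c in enumerate(order)}  # city -> position
        self.n = len(order)

    def next(self, c):
        return self.order[(self.pos[c] + 1) % self.n]

    def prev(self, c):
        return self.order[(self.pos[c] - 1) % self.n]

    def between(self, a, b, c):
        """True iff b is met strictly between a and c when traversing the
        tour in the successor direction from a (a, b, c assumed distinct)."""
        i, j, k = self.pos[a], self.pos[b], self.pos[c]
        return 0 < (j - i) % self.n < (k - i) % self.n

    def flip(self, a, b):
        """Reverse the tour segment from a to b (inclusive)."""
        i, j = self.pos[a], self.pos[b]
        seg = [self.order[(i + t) % self.n]
               for t in range(((j - i) % self.n) + 1)]
        for t, city in enumerate(reversed(seg)):
            p = (i + t) % self.n
            self.order[p] = city
            self.pos[city] = p
```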

Bin Packing

• The above two average-case results would never even have been conjectured were it not for experimental results that suggested them.

• The observation that First Fit works well on the U[0,1] distribution because it approximately solves a matching problem led to the invention of the “Sum-of-Squares” on-line bin packing algorithm (sketched below).

• Experimental analysis of this algorithm led to a variant which has essentially optimal average-case performance for ALL discrete distributions. [Csirik, Johnson, Kenyon, Orlin, Shor, & Weber, 2006]
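A minimal sketch of the Sum-of-Squares rule, as I would render it from the published description (not the experimental code): item sizes are integers, bins have capacity B, and each arriving item goes wherever it minimizes the sum over levels h of N(h)², where N(h) counts the open bins currently filled to level h (a completely filled bin is closed and leaves the sum).

```python
# Sum-of-Squares (SS) on-line bin packing for discrete item sizes.
# N[h] = number of open (partially filled) bins with content level h,
# 1 <= h <= B-1.  Each item is placed so as to minimize the resulting
# sum over h of N[h]^2; bins filled to exactly B close and leave the sum.

def sum_of_squares_pack(items, B):
    N = [0] * B                    # index 0 unused; levels 1 .. B-1
    bins_used = 0
    for s in items:
        # Change in the sum of squares if we open a new bin for this item.
        best_h = 0                                  # 0 encodes "new bin"
        best_delta = (2 * N[s] + 1) if s < B else 0
        # Change if we instead add the item to an open bin at level h.
        for h in range(1, B):
            if N[h] == 0 or h + s > B:
                continue
            delta = 1 - 2 * N[h]                    # N[h] drops by one
            if h + s < B:
                delta += 2 * N[h + s] + 1           # N[h+s] grows by one
            if delta < best_delta:
                best_h, best_delta = h, delta
        # Apply the chosen placement.
        if best_h == 0:
            bins_used += 1
            if s < B:
                N[s] += 1
        else:
            N[best_h] -= 1
            if best_h + s < B:
                N[best_h + s] += 1
    return bins_used
```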

Final Thoughts

• Implementation and experiments are certainly not for everyone (or every algorithm).

• Some experience in this area does, however, help to put our theoretical work in perspective, as does knowing more about how our algorithms perform in practice and why.

Last Wishes

• More examples of impact (or lack thereof).

• Suggestions for future Kanellakis Prize nominees.

• Suggestions of new problem domains for future DIMACS Implementation Challenges.

• Questions?