Michael Trick
Carnegie Mellon University
Workshop on Modeling and Reformulation, CP 2004
Provide a perspective on what makes a “good” integer programming formulation for a problem
Give examples on automatic versus manual reformulation of problems
Outline some challenges in the automatic reformulation of integer programs (and perhaps constraint programs?)
Quick review of key concepts in integer programming
Two models
Truck-route contracting
Traveling Tournament Problem
General Comments
Minimize cx (linear objective)
Subject to
Ax = b (linear constraints)
l <= x <= u
some or all of x_j integral (makes things hard!)
x: variables
Must put in that form!
Seems limiting, but 50 years of experience gives “tricks of the trade”
Many formulations for same problem
Variables x, y both binary (0-1) variables
Formulate requirement that x can be 1 only if y is 1
Formulation 1: x ≤ y; x, y ∈ {0,1}
Formulation 2: x ≤ 20y; x, y ∈ {0,1}
Are they different? Do we care which we use?
From a modeling point of view, they are the same: they both correctly model the given requirement
From an algorithmic point of view, they may be different, depending on algorithm used
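A minimal sketch of the difference, assuming only the two formulations above: both admit exactly the same binary points, but once y is allowed to be fractional (as in a linear relaxation) they allow very different values of x. The helper names (`feasible_1`, `feasible_2`, `max_x`) are illustrative, not from the talk.

```python
from fractions import Fraction

def feasible_1(x, y):
    """Formulation 1: x <= y."""
    return x <= y

def feasible_2(x, y):
    """Formulation 2: x <= 20y."""
    return x <= 20 * y

# Over binary values the two formulations accept exactly the same points.
binary_points = [(x, y) for x in (0, 1) for y in (0, 1)]
points_1 = [p for p in binary_points if feasible_1(*p)]
points_2 = [p for p in binary_points if feasible_2(*p)]
same_integer_points = points_1 == points_2

def max_x(y, coefficient):
    """Largest x in [0, 1] with x <= coefficient * y (relaxed, fractional y)."""
    return min(Fraction(1), coefficient * y)

# In the relaxation, a tiny fractional y behaves very differently.
y = Fraction(1, 20)
tight = max_x(y, 1)    # formulation 1 keeps x as small as y
loose = max_x(y, 20)   # formulation 2 already lets x reach 1

print(same_integer_points, tight, loose)
```

With y = 1/20, formulation 1 limits x to 1/20 while formulation 2 lets x reach 1, which is why an algorithm based on the relaxation may treat the two differently.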
Most common method is some form of branch and bound
Use linear relaxation to bound objective value
Branch on fractional values in linear relaxation solution
Stop branching when subproblem is
Infeasible
Integer
Fathomed (cannot be better than best found so far)
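The steps above can be sketched on a small 0-1 knapsack problem. This is an illustrative toy, not what a commercial solver does: it maximizes rather than minimizes, the linear relaxation is solved by the greedy fractional rule (valid for a single knapsack constraint with positive weights), and the function names and data are assumptions.

```python
def lp_bound(values, weights, capacity, fixed):
    """Linear relaxation of a 0-1 knapsack with some variables fixed.

    Returns (bound, index of the fractional variable or None),
    or None when the fixed variables already exceed the capacity.
    """
    cap = capacity - sum(weights[i] for i, v in fixed.items() if v == 1)
    if cap < 0:
        return None
    total = sum(values[i] for i, v in fixed.items() if v == 1)
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if weights[i] <= cap:          # item fits entirely
            cap -= weights[i]
            total += values[i]
        elif cap > 0:                  # item fits only fractionally
            return total + values[i] * cap / weights[i], i
        else:                          # nothing left to fill
            break
    return total, None

def branch_and_bound(values, weights, capacity):
    best, nodes = 0, 0
    stack = [{}]                       # each subproblem is a dict of fixed variables
    while stack:
        fixed = stack.pop()
        nodes += 1
        result = lp_bound(values, weights, capacity, fixed)
        if result is None:
            continue                   # stop: infeasible
        bound, frac = result
        if bound <= best:
            continue                   # stop: fathomed
        if frac is None:
            best = bound               # stop: relaxation solution is integer
            continue
        stack.append({**fixed, frac: 0})   # branch on the fractional variable
        stack.append({**fixed, frac: 1})
    return best, nodes

best, nodes = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
print(best, nodes)
```

The tighter the relaxation bound, the earlier the fathoming test `bound <= best` fires, which is the point of the slides that follow.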
If linear relaxation is very different from integer program then
Choose wrong variables to branch on
Fathoming will be done less often
Formulation gives convex hull of feasible integer points
[Figure: regions allowed by x ≤ y and by x ≤ 20y in the linear relaxation, plotted on x and y axes]
Use formulations with good linear relaxations!
This guideline is quite misleading!
Other formulation issues (avoiding symmetry, keeping problem size down, scaling, etc.) will not be covered here
Real application
Highly simplified version (which shows everything I learned)
TRUCK DATA (D: Departure Time, A: Arrival Time, $: Cost, C: Capacity)
Trucks between A and B:
D: 8, A: 12, $150, C: 100
D: 9, A: 1, $250, C: 80
D: 10, A: 2, $200, C: 125
Sample Package
Size: 10
Time Available: 9
Time Needed: 2
Problem: Purchase trucks sufficient to move all packages on time
Variables:
y(i) = 1 if truck i purchased, 0 else
x(j,i) = 1 if package j on truck i, 0 else
Objective: Minimize truck costs
Constraints:
Packages fit on assigned truck
Use only paid for trucks
Every package on some truck
No partial trucks or package splitting
model "Transportation Planning"
uses "mmxprs"

declarations
  TRUCKS = 1..10
  PACKAGES = 1..20
  NUM_PACKAGE = 20   ! number of packages; used in (2) but not declared on the original slide
  capacity: array(TRUCKS) of real
  size: array(PACKAGES) of real
  cost: array(TRUCKS) of real
  can_use: array(PACKAGES,TRUCKS) of real
  x: array(PACKAGES,TRUCKS) of mpvar
  y: array(TRUCKS) of mpvar
end-declarations

capacity := [100,200,100,200,100,200,100,200,100,200]
size := [17,21,54,45,87,34,23,45,12,43,
         54,39,31,26,75,48,16,32,45,55]
cost := [1,1.8,1,1.8,1,1.8,1,1.8,1,1.8]
can_use := [0-1 matrix whether package can go on truck]

Total := sum(i in TRUCKS) cost(i)*y(i)

forall(i in TRUCKS) sum(j in PACKAGES) size(j)*x(j,i) <= capacity(i)   ! (1) packages fit
forall(i in TRUCKS) sum(j in PACKAGES) x(j,i) <= NUM_PACKAGE*y(i)      ! (2) use only paid for trucks
forall(j in PACKAGES) sum(i in TRUCKS) can_use(j,i)*x(j,i) = 1         ! (3) every package on truck
forall(i in TRUCKS) y(i) is_binary                                     ! (4) no partial trucks
forall(i in TRUCKS, j in PACKAGES) x(j,i) is_binary                    ! (5) no package splitting

minimize(Total)
end-model
Every integer programmer will immediately spot the improvement:
forall(i in TRUCKS) sum(j in PACKAGES) x(j,i) <= NUM_PACKAGE*y(i) ! (2) use only paid for trucks
should be replaced with
forall(i in TRUCKS, j in PACKAGES) x(j,i) <= y(i) ! (2') tighter formulation
which we saw as “tighter” (though bigger)
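A small sketch of why (2') is tighter than (2), for a single truck with 20 packages. The fractional point used here is a made-up illustration, not output from any solver: it is accepted by the aggregated constraint but rejected by the disaggregated ones.

```python
from fractions import Fraction

NUM_PACKAGE = 20

def satisfies_aggregated(x_col, y):
    """Constraint (2) for one truck: sum_j x(j,i) <= NUM_PACKAGE * y(i)."""
    return sum(x_col) <= NUM_PACKAGE * y

def satisfies_disaggregated(x_col, y):
    """Constraints (2') for one truck: x(j,i) <= y(i) for every package j."""
    return all(x <= y for x in x_col)

# One package fully on the truck, but only 1/20 of the truck paid for.
x_col = [1] + [0] * (NUM_PACKAGE - 1)
y = Fraction(1, 20)

weak_ok = satisfies_aggregated(x_col, y)       # 1 <= 20 * (1/20)
strong_ok = satisfies_disaggregated(x_col, y)  # 1 <= 1/20 fails

print(weak_ok, strong_ok)
```

Summing the 20 constraints of (2') gives (2), so every point allowed by (2') is also allowed by (2), but not the other way around.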
Integer programmers are good at spotting opportunities:
forall(i in TRUCKS) sum(j in PACKAGES) size(j)*x(j,i) <= capacity(i) ! (1) Packages fit
Can be strengthened with
forall(i in TRUCKS) sum(j in PACKAGES) size(j)*x(j,i) <= capacity(i)*y(i) ! (1') Packages fit
Weak Formulation: 11.2 sec, 31,825 nodes
Strong Formulation: 22.1 sec, 50,631 nodes
XPRESSMP (ILOG’s CPLEX will work the same) “knows” about this form of tightening.
It will do it automatically
In fact, it will do it “better”, only including constraints that the linear relaxation points to as relevant
Automatic reformulation trumps manual reformulation in this case!
If you use a naïve code that doesn’t understand this, then the tightened formulation is critical:
Weak formulation: Unsolved after 3600 seconds (gap is 1.22 – 8.4)
Strong formulation: 1851 seconds, 2.4 million nodes
But who would use such a code for real work?
Consider the constraint
sum(i in TRUCKS) capacity(i)*y(i) >= sum(j in PACKAGES) size(j) ! (6) Have sufficient capacity
Such a constraint does not tighten the formulation (it is a linear combination of existing constraints): fundamental mantra says don’t add.
Solution time with (6) added: 0.1 seconds, 1 node
XPRESS (and other sophisticated codes) knows a lot about “knapsack” constraints and does automatic tightening on those
Can’t identify the knapsack constraint on its own, but once the user identifies it, can tighten (a lot!).
Standard tightening methods applied by the user make things slower
Creative addition of constraint that does not appear to tighten relaxation makes things much faster
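To see what constraint (6) offers once integrality of y is taken into account, here is a rough sketch using the capacity, size, and cost data from the model above. It looks at (6) in isolation (ignoring can_use and the packing constraints) and is not a description of what XPRESS does internally; the variable names are illustrative.

```python
from itertools import product

capacity = [100, 200, 100, 200, 100, 200, 100, 200, 100, 200]
size = [17, 21, 54, 45, 87, 34, 23, 45, 12, 43,
        54, 39, 31, 26, 75, 48, 16, 32, 45, 55]
cost = [1, 1.8, 1, 1.8, 1, 1.8, 1, 1.8, 1, 1.8]

demand = sum(size)  # right-hand side of (6)

# Bound from (6) with fractional y: buy capacity in order of cost per unit.
lp_bound = 0.0
remaining = float(demand)
for cap, c in sorted(zip(capacity, cost), key=lambda t: t[1] / t[0]):
    take = min(1.0, remaining / cap)   # fraction of this truck purchased
    lp_bound += c * take
    remaining -= cap * take
    if remaining <= 0:
        break

# Bound from (6) with binary y: enumerate all 2^10 truck subsets.
ip_bound = min(
    sum(c for c, yi in zip(cost, y) if yi)
    for y in product((0, 1), repeat=len(capacity))
    if sum(cap for cap, yi in zip(capacity, y) if yi) >= demand
)

print(demand, round(lp_bound, 3), round(ip_bound, 3))
```

The total package size is 802, so the fractional bound is about 7.218 while the binary bound is 8.2: the single knapsack row carries information that the relaxation alone does not expose.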
Given an n by n distance matrix D = [d(i,j)] and an integer k, find a double round robin (every team plays at every other team) schedule such that:
The total distance traveled by the teams is minimized
(teams are assumed to start at home and must return home at the end of the tournament), and
No team is away more than k consecutive games, or home more than k consecutive games.
(The instances that follow add the constraint that if i is at j in slot t, then j is not at i in slot t+1.)
NL6: Six teams from the National League of (American) Major League Baseball.
Distances:
0 745 665 929 605 521
745 0 80 337 1090 315
665 80 0 380 1020 257
929 337 380 0 1380 408
605 1090 1020 1380 0 1010
521 315 257 408 1010 0
k is 3
Distance: 23916 (Easton May 7, 1999)
Slot ATL NYM PHI MON FLA PIT
0 FLA @PIT @MON PHI @ATL NYM
1 NYM @ATL FLA @PIT @PHI MON
2 PIT @FLA MON @PHI NYM @ATL
3 @PHI MON ATL @NYM PIT @FLA
4 @MON FLA @PIT ATL @NYM PHI
5 @PIT @PHI NYM FLA @MON ATL
6 PHI @MON @ATL NYM @PIT FLA
7 MON PIT @FLA @ATL PHI @NYM
8 @NYM ATL PIT @FLA MON @PHI
9 @FLA PHI @NYM PIT ATL @MON
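The schedule above can be checked against the problem definition with a short script. This sketch assumes the rows and columns of the distance matrix follow the same team order as the schedule header (ATL, NYM, PHI, MON, FLA, PIT); the function names are illustrative.

```python
TEAMS = ["ATL", "NYM", "PHI", "MON", "FLA", "PIT"]

# Distance matrix, assumed to be in the same team order as TEAMS.
DIST = [
    [0, 745, 665, 929, 605, 521],
    [745, 0, 80, 337, 1090, 315],
    [665, 80, 0, 380, 1020, 257],
    [929, 337, 380, 0, 1380, 408],
    [605, 1090, 1020, 1380, 0, 1010],
    [521, 315, 257, 408, 1010, 0],
]

# One row per slot, one column per team; "@X" means the team plays away at X.
ROWS = [line.split() for line in """
FLA @PIT @MON PHI @ATL NYM
NYM @ATL FLA @PIT @PHI MON
PIT @FLA MON @PHI NYM @ATL
@PHI MON ATL @NYM PIT @FLA
@MON FLA @PIT ATL @NYM PHI
@PIT @PHI NYM FLA @MON ATL
PHI @MON @ATL NYM @PIT FLA
MON PIT @FLA @ATL PHI @NYM
@NYM ATL PIT @FLA MON @PHI
@FLA PHI @NYM PIT ATL @MON
""".strip().splitlines()]

def travel(team):
    """Distance for one team: start at home, visit each venue, return home."""
    here, total = team, 0
    for row in ROWS:
        entry = row[team]
        there = TEAMS.index(entry[1:]) if entry.startswith("@") else team
        total += DIST[here][there]
        here = there
    return total + DIST[here][team]

def longest_run(team):
    """Longest stretch of consecutive home games or consecutive away games."""
    best = run = 0
    prev = None
    for row in ROWS:
        away = row[team].startswith("@")
        run = run + 1 if away == prev else 1
        prev = away
        best = max(best, run)
    return best

total = sum(travel(t) for t in range(len(TEAMS)))
max_run = max(longest_run(t) for t in range(len(TEAMS)))
print(total, max_run)
```

Under that ordering assumption the total comes to 23916, matching the reported distance, and no team has more than k = 3 consecutive home or away games.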
NL12. 12 teams
Feasible Solution: 143655 (Rottembourg and Laburthe May
2001), 138850 (Larichi, Lapierre, and Laporte July 8 2002),
125803 (Cardemil, July 2 2002), 119990 (Dorrepaal July 16,
2002), 119012 (Zhang, August 19 2002), 118955 (Cardemil,
November 1 2002), 114153 (Van Hentenryck January 14, 2003),
113090 (Van Hentenryck February 26, 2003), 112800 (Van
Hentenryck June 26, 2003), 112684 (Langford February 16,
2004), 112549 (Langford February 27, 2004), 112298 (Langford
March 12, 2004), 111248 (Van Hentenryck May 13, 2004).
Lower Bound: 107483 (Waalewign August 2001)
Straightforward formulation is possible:
plays(i,j,t) = 1 if i at j in slot t
Need auxiliary variables:
location(i,j,t) = 1 if i in location j in slot t
follows(i,j,k,t) = 1 if i travels from j to k after slot t
Rest of formulation in paper (pages 9 and 10 in proceedings)
Result is a mess
N=6
After 1800 seconds gap is 5434 – 25650 (optimal is 23,916)
Anything XPRESS is doing is not helping enough!
• Sample Variables:
[Figure: sample variables laid out across time slots; X1, X2, X3 are away trips (such as @NY @MON, @MON @PHI, @NY) and Y1, Y2 are home stands (H H)]
One thing per time: X1 + X2 + Y1 + Y2 ≤ 1
No Away followed by Away: X1 + X3 ≤ 1
Rest of formulation is straightforward (in proceedings, looking more complicated than it needs to)
Result: initial relaxation (for n=6) 21,624.7
Optimal: 4136 seconds, 66,000 nodes
Stronger: X1 + X2 + X3 + Y2 ≤ 1
[Figure: the away trips X1, X2, X3 and home stand Y2 shown against the time slots they occupy]
Initial relaxation same, solution time a little longer
What happened: “Strengthening” is a type of clique inequality, known by XPRESS
Without clique inequalities: unsolved after more than 36,000 seconds
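A generic sketch of what a clique inequality buys, using three hypothetical binary variables that conflict pairwise (these are not the X and Y variables of the model above). The pairwise constraints admit a fractional point that the single clique inequality cuts off, while no binary point is lost.

```python
from fractions import Fraction
from itertools import combinations, product

def satisfies_pairwise(point):
    """x_i + x_j <= 1 for every pair of mutually conflicting variables."""
    return all(point[i] + point[j] <= 1
               for i, j in combinations(range(len(point)), 2))

def satisfies_clique(point):
    """Single clique inequality: sum of all conflicting variables <= 1."""
    return sum(point) <= 1

half = Fraction(1, 2)
fractional_point = (half, half, half)

pairwise_ok = satisfies_pairwise(fractional_point)  # each pair sums to exactly 1
clique_ok = satisfies_clique(fractional_point)      # total is 3/2, so cut off

# No binary point is lost: both descriptions agree on every 0-1 vector.
same_binary_points = all(
    satisfies_pairwise(p) == satisfies_clique(p)
    for p in product((0, 1), repeat=3)
)

print(pairwise_ok, clique_ok, same_binary_points)
```

Because the integer points are unchanged and only fractional points are removed, a solver that detects such cliques can add these inequalities safely.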
Initial formulation almost hopeless
Manual reformulation needed to redefine variables
Then, automatic reformulation can improve results tremendously
What is the role of manual versus automatic reformulation?
Model 1: manual needed to identify hidden constraint
Model 2: manual needed to redefine the variables
Is this an ever-moving line, or are some aspects intrinsically difficult to determine?
How can software be developed to better
Do automatic reformulation
Provide flexibility to experiment with different reformulations/reformulation levels
Introduction to Integer Programming (by Bob Bosch and me) and this talk will be at http://mat.tepper.cmu.edu/trick
XPRESSMP and ILOG’s OPL Studio provide great software to experiment with