Guiding Combinatorial Optimization
with UCT
Ashish Sabharwal and Horst Samulowitz
IBM Watson Research Center
(presented by Raghuram Ramanujan)
MCTS Workshop at ICAPS-2011
June 12, 2011
© 2011 IBM Corporation
MCTS and Combinatorial Search

Monte Carlo Tree Search (MCTS): widely used in a variety of domains in AI

Upper Confidence bounds on Trees (UCT): a form of MCTS, especially successful in
two-agent game tree search, e.g., Go, Kriegspiel, Mancala, General Game Playing

Based on single-agent tree search: one multi-armed bandit at each node of a tree
 goal: find the most “rewarding” root-to-leaf path in the tree
[Figure: graph coloring]

Combinatorial Search

A discrete search space, e.g., {0,1}^N or {R, G, B}^N

A “feasible” subspace of interest: typically defined indirectly by a finite set of constraints

Goal: find a solution – an element of the discrete space that satisfies all constraints
 If a utility function / objective function given: find an optimal solution

E.g., Boolean Satisfiability (SAT), Graph Coloring (COL), Constraint Satisfaction
Problems (CSPs), Constraint Optimization, Integer Programming (IP)
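
To make the definition concrete, here is a minimal sketch of graph coloring as combinatorial search: the space is {R, G, B}^N, the constraints are "adjacent vertices differ", and a solution is any assignment that satisfies all of them. The function name and the example graph are illustrative assumptions, and the exhaustive enumeration is only meant to show the problem shape, not a practical solver.

```python
from itertools import product

def find_coloring(num_vertices, edges, colors=("R", "G", "B")):
    """Return the first assignment in colors^N that satisfies every
    edge constraint (endpoints get different colors), or None."""
    for assignment in product(colors, repeat=num_vertices):
        if all(assignment[u] != assignment[v] for u, v in edges):
            return assignment
    return None

# Hypothetical instance: a 4-cycle with one chord (3-colorable).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
solution = find_coloring(4, edges)

# Hypothetical infeasible instance: the complete graph on 4 vertices.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
no_solution = find_coloring(4, k4)
```

The enumeration visits up to 3^N assignments, which is why guidance on where to search first matters.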
Can MCTS/UCT inspired techniques be used to improve the performance of
combinatorial search algorithms?
Mixed Integer Programming (MIP) :
A Challenging but Promising Opportunity

MIP: linear inequality constraints, continuous & discrete variables

Typically with a linear (or quadratic) objective function

NP-hard; highly useful, with several academic and commercial solvers available
MIP search appears much more suitable than, e.g., SAT for applying UCT!

Opportunity for applying UCT


MIP solvers such as IBM ILOG’s CPLEX, Gurobi, etc.:

maintain a “frontier” of open nodes, exploring them with a
combination of best-first search, “diving” to the bottom of the tree, etc.

rely on spending substantial effort per node, e.g., computing LP relaxation to
obtain a bound on the objective value in the subtree: an estimate of the true value
In contrast, state-of-the-art SAT solvers not easily adapted to UCT:

are based on enhancements to basic depth-first search traversal

rely on processing nodes extremely fast (~ 2000-5000 per second)
Can we improve CPLEX by letting UCT decide search tree exploration order?
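
As an illustration of why an LP relaxation gives a bound on the true value in a subtree, consider 0/1 knapsack, a small maximization problem used here only as a stand-in for a MIP node. Its LP relaxation (items may be taken fractionally) is solved exactly by the greedy value-per-weight rule, so no LP solver is needed. The function names and the data are assumptions made for this sketch.

```python
from itertools import product

def lp_relaxation_bound(values, weights, capacity):
    """Optimal value of the LP relaxation of 0/1 knapsack, where each
    item may be taken fractionally. Weights are assumed positive."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    bound, remaining = 0.0, float(capacity)
    for i in order:
        take = min(1.0, remaining / weights[i])  # fraction of item i
        bound += take * values[i]
        remaining -= take * weights[i]
        if remaining <= 0:
            break
    return bound

def integer_optimum(values, weights, capacity):
    """True optimum by enumerating {0,1}^N (tiny instances only)."""
    best = 0
    for x in product((0, 1), repeat=len(values)):
        if sum(w * xi for w, xi in zip(weights, x)) <= capacity:
            best = max(best, sum(v * xi for v, xi in zip(values, x)))
    return best

# Hypothetical data.
values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
bound = lp_relaxation_bound(values, weights, capacity)
exact = integer_optimum(values, weights, capacity)
```

For a maximization problem the relaxation can only overestimate, so the bound is one-sided: it is an optimistic estimate of what the subtree can deliver.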
Mixed Integer Programming (MIP) :
A Challenging but Promising Opportunity

Challenges and Differences from the “usual” setup for UCT

Biggest success of UCT so far: two-agent game tree search, rather than single-agent

Random playouts are costly to implement in MIP search

Unlike game tree search, too costly to create a full UCT tree at each node

Exploitation isn’t very meaningful after true value of a node is revealed:
no reason to repeatedly visit that node even if it is optimal

LP relaxation – available for “free”, provides a guaranteed bound on the true value
 averaging backups may not be the best strategy!

Highly optimized commercial MIP solvers such as CPLEX very hard to improve upon!

Implementation: no easy access to CPLEX’s internal data structures; must maintain
our own “shadow tree” for exploring UCT strategies – additional overhead
Main Finding:
Guidance near the top of the tree can improve performance across a variety of instances!
How does Search in CPLEX (roughly) work?

CPLEX explores the search tree by alternating between two operations:
I. Node Selection: Select the next open search node to continue search on: CPLEX selects the node with the best estimate E
II. Branching: Select the next variable to branch on (assume binary branching)
[Figure: search tree. The root node (estimate E0) branches on x into nodes with estimates E1 and E2; these branch on y and z; a deeper node branches on v. Nodes carry estimates E0 to E8.]
- Node Selection: Initially only one node that can be selected
- Branching: Select variable x
- Node Selection: Select node with estimate E1
- Branching: Select variable y
- Node Selection: Select node with estimate E2
- Branching: Select variable z
- Node Selection: Select node with estimate E5
- Branching: Select variable v
CPLEX open nodes and corresponding quality estimate E of the underlying sub-tree
(e.g., LP objective value)
CPLEX closed nodes
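
The alternation between node selection and branching can be sketched as a best-first loop over a frontier of open nodes. This is a simplified picture, not CPLEX's actual implementation: the `Frontier` class, the `branch` callback and the example tree are all hypothetical, and a maximization problem is assumed so that a larger estimate is better.

```python
import heapq

class Frontier:
    """Open nodes ordered by estimate; pop_best returns the highest."""
    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so node payloads are never compared

    def push(self, estimate, node):
        heapq.heappush(self._heap, (-estimate, self._counter, node))
        self._counter += 1

    def pop_best(self):
        neg_estimate, _, node = heapq.heappop(self._heap)
        return -neg_estimate, node

    def __len__(self):
        return len(self._heap)

def best_first(root, root_estimate, branch, max_nodes):
    """branch(node) returns (estimate, child) pairs; empty means no children."""
    frontier = Frontier()
    frontier.push(root_estimate, root)
    order = []
    while frontier and len(order) < max_nodes:
        _, node = frontier.pop_best()       # I.  node selection
        order.append(node)
        for estimate, child in branch(node):  # II. branching
            frontier.push(estimate, child)
    return order

# Hypothetical tree: node name -> list of (estimate, child name).
tree = {"root": [(9.0, "a"), (7.0, "b")],
        "a": [(6.0, "c"), (8.0, "d")],
        "b": [], "c": [], "d": []}
order = best_first("root", 10.0, lambda n: tree[n], max_nodes=10)
```

Note that the loop only ever holds the frontier; the tree itself is never stored, which matches the remark that the search tree exists only implicitly.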
Guiding Node Selection in CPLEX with UCT

Node Selection with UCT

Idea: expand nodes in the order in which UCT would expand them

Traverse search tree from root to a current leaf node (i.e., “open” node) while
at each node selecting the child that has the highest UCT score s.

UCT score s: Combines estimate of the “quality” of a node (the same CPLEX
uses) with how often this node has been visited already
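
The root-to-leaf traversal described above can be sketched as follows. The `Node` class and `placeholder_score` are assumptions for illustration only; in particular, `placeholder_score` is a stand-in that merely combines an estimate with a visit count and is not the score used in the talk.

```python
class Node:
    """Minimal tree node for illustration (hypothetical structure)."""
    def __init__(self, name, estimate, visits=0):
        self.name = name
        self.estimate = estimate
        self.visits = visits
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

def placeholder_score(node, bonus=2.0):
    # Stand-in score: quality estimate plus a bonus that shrinks as the
    # node is visited more often. NOT the formula from the slides.
    return node.estimate + bonus / (1 + node.visits)

def select_leaf(root, score=placeholder_score):
    """Descend from the root, always taking the highest-scoring child,
    until reaching a node without children (an open node)."""
    node = root
    while node.children:
        node = max(node.children, key=score)
    return node

root = Node("root", 10.0, visits=3)
a = root.add_child(Node("a", 9.0, visits=3))
b = root.add_child(Node("b", 8.0, visits=0))
a.add_child(Node("a1", 8.5))
a.add_child(Node("a2", 7.0))

picked = select_leaf(root)                             # exploration wins
greedy = select_leaf(root, score=lambda n: n.estimate)  # pure exploitation
```

With the visit bonus, the unvisited node "b" is picked even though "a" has the better estimate; scoring by estimate alone descends through "a" instead.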


Tree Update Phase

Goal: Balance Exploration / Exploitation in CPLEX search
When node selection reaches a leaf node,

compute its quality estimate (e.g., objective value of LP relaxation) and
propagate it upwards towards the root

branch on this node using the default variable/value selection of CPLEX

Update rule / backup operator: max of the two children (no averaging!), if
maximization problem; min if minimization

Result: estimate at each node N along this leaf-to-root path equals the best
value seen in the entire sub-tree under N
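
A minimal sketch of the max-style backup, under the assumption that each ancestor's estimate is recomputed as the best estimate among its children while walking from the leaf to the root. The `Node` class and the numbers are hypothetical, and whether propagation stops early when nothing changes is left out as an implementation choice.

```python
class Node:
    """Minimal tree node with parent links (hypothetical structure)."""
    def __init__(self, estimate, parent=None):
        self.estimate = estimate
        self.visits = 0
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def backup(leaf, maximize=True):
    """Walk from leaf to root, counting the visit at every node and
    setting each ancestor's estimate to the best of its children
    (max for maximization, min for minimization; no averaging)."""
    best = max if maximize else min
    node = leaf
    node.visits += 1
    while node.parent is not None:
        node = node.parent
        node.visits += 1
        node.estimate = best(child.estimate for child in node.children)

# Hypothetical estimates for a maximization problem.
root = Node(10.0)
n1, n2 = Node(9.0, root), Node(7.0, root)
n3, n4 = Node(6.0, n1), Node(8.0, n1)
backup(n4)
```

After the call, n1 holds max(6, 8) = 8 and the root holds max(8, 7) = 8; an averaging backup would instead have pulled n1 down to 7.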
Guiding Search in CPLEX with UCT
 Node Selection

Node Selection is now guided by UCT scores (as illustrated below)
UCT score is based on the estimate E and the number of visits to a search node

In order to employ UCT, one needs to maintain a shadow tree of CPLEX's search tree
 CPLEX maintains just a frontier of open nodes; the underlying search tree only exists implicitly
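
One way to picture the shadow tree is as a dictionary that mirrors every branching event reported by the solver. The sketch below is an assumption about how such bookkeeping could look: `ShadowTree`, `record_branch` and the node ids are hypothetical and do not correspond to any CPLEX API. Treating "no children yet" as "open" is a simplification that ignores pruned or infeasible nodes.

```python
class ShadowTree:
    """Explicit copy of the solver's implicit search tree (illustrative)."""
    def __init__(self, root_id, root_estimate):
        self.root_id = root_id
        self.nodes = {root_id: self._make(root_estimate, None)}

    @staticmethod
    def _make(estimate, parent_id):
        return {"estimate": estimate, "visits": 0,
                "parent": parent_id, "children": []}

    def record_branch(self, parent_id, children):
        """children: (child_id, estimate) pairs created by branching."""
        for child_id, estimate in children:
            self.nodes[child_id] = self._make(estimate, parent_id)
            self.nodes[parent_id]["children"].append(child_id)

    def open_nodes(self):
        """Nodes that have not been branched on yet."""
        return [nid for nid, n in self.nodes.items() if not n["children"]]

    def depth(self, node_id):
        d = 0
        while self.nodes[node_id]["parent"] is not None:
            node_id = self.nodes[node_id]["parent"]
            d += 1
        return d

# Hypothetical sequence of branching events.
shadow = ShadowTree(root_id=0, root_estimate=10.0)
shadow.record_branch(0, [(1, 9.0), (2, 7.0)])
shadow.record_branch(1, [(3, 6.0), (4, 8.0)])
```

The extra dictionary is exactly the additional overhead mentioned earlier: the solver keeps only the frontier, so parents and visit counts have to be stored separately.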
[Figure: the same search tree, with each node annotated by its estimate (E0 to E8) and its visit count (#visits0 to #visits8).]
- Node Selection: Initially only one node that can be selected
- Branching: Select variable x
- Node Selection: Select node with highest UCT score based on E1 and #visits1
- Branching: Select variable y
- Node Selection: Select node with highest UCT score based on E2 and #visits2
…
CPLEX open nodes and corresponding quality estimate E of the underlying sub-tree
(e.g., LP objective value)
CPLEX closed nodes
Guiding Search in CPLEX with UCT
 Tree Update Phase
 After selecting a node N and branching on a variable, two child nodes N_left and N_right will be
created with their corresponding estimates E_left and E_right
 When propagating estimates upwards, we only consider the best estimate (i.e., no averaging)
 Update using the “backup operator”
[Figure: search tree. The root node (estimate E0) branches on x into nodes with estimates E1 and E2; the E1 node branches on y into nodes with estimates E3 and E4.]
- Propagate max(E1, E2), here E1, to E0
- Propagate max(E3, E4), here E4, to E1, as long as new estimates improve the current best estimate at a node on the path to the root. E.g., only if E4 improves on E0 is the new estimate propagated to the node labeled E0. However, visit counts are updated for each node on the path to the root.
CPLEX open nodes and corresponding quality estimate E of the underlying sub-tree
(e.g., LP objective value)
CPLEX closed nodes
UCT Score: “Epsilon Greedy” Variant of UCB1

UCT Score computation:
N = tree node under consideration
P = parent of N
Exploration constant: balances exploration and exploitation (0.7 in experiments)
ε = theoretically a number that decreases in inverse proportion to visits(N) (a constant set to 0.01 in experiments)

Fast and accurate enough for our purposes, compared to the standard
UCB1 formula
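
For reference, the standard UCB1 score that the variant is compared against can be written as the mean reward of a node plus sqrt(2 ln(visits(P)) / visits(N)). The sketch below shows only this standard formula, assuming rewards scaled to [0, 1]; it is not the epsilon-greedy variant used in the experiments, and the function name is an assumption.

```python
import math

def ucb1(mean_reward, visits, parent_visits):
    """Standard UCB1: exploitation term plus an exploration term that
    grows with the parent's visits and shrinks with the node's own."""
    if visits == 0:
        return math.inf  # unvisited nodes are tried first
    return mean_reward + math.sqrt(2.0 * math.log(parent_visits) / visits)

# Same mean reward, different visit counts under a parent seen 10 times.
rarely_visited = ucb1(0.5, visits=2, parent_visits=10)
often_visited = ucb1(0.5, visits=5, parent_visits=10)
```

The logarithm and square root are evaluated for every child at every step of a descent, which is the cost a cheaper variant avoids.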
Experimental Evaluation



Starting with 1,024 publicly available MIP instances, we removed:

All instances solved by default CPLEX within 10 seconds (too easy)

All instances not solved by default CPLEX within 900 seconds (too hard)
Experimental Evaluation is based on the 170 remaining instances

Spanning a variety of domains

Experimentation not limited to any particular instance family (e.g., TSP
instances, set covering, etc.)
Experiments were conducted on:

Intel Xeon CPU E5410, 2.33GHz with 8 cores, and 32GB of memory

OS: Ubuntu

Only a single run per machine, since multiple CPLEX processes on one machine
can (and often do!) interfere with each other
Experimental Evaluation: Solvers


Default CPLEX

Uses various strategies, including a combination of best-first node selection and
depth-first “diving” to reach a leaf node from each best node

Highly optimized; very challenging to beat by a large margin across a large
variety of problem domains
CPLEX with node selection guided by UCT


Best results when guidance limited to the top 5 levels of the tree;
then revert to the default node selection of CPLEX
Other standard exploration schemes

Best-first

Breadth-first

Depth-first
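
Limiting UCT guidance to the top levels of the tree could look roughly like the sketch below. How the handover to the default strategy actually happens is not specified here, so this is an assumption: the descent follows the score for at most `max_uct_depth` levels and then falls back to the best estimate among open nodes in the reached subtree. The `Node` class, the score and the tree are hypothetical.

```python
class Node:
    """Minimal tree node for illustration (hypothetical structure)."""
    def __init__(self, name, estimate, visits=0, parent=None):
        self.name, self.estimate, self.visits = name, estimate, visits
        self.children = []
        if parent is not None:
            parent.children.append(self)

def select_with_depth_limit(root, score, max_uct_depth=5):
    # Phase 1: score-guided descent, restricted to the top levels.
    node, depth = root, 0
    while node.children and depth < max_uct_depth:
        node = max(node.children, key=score)
        depth += 1
    # Phase 2: stand-in for the default rule, best estimate among the
    # open nodes below the point where guidance stopped.
    stack, leaves = [node], []
    while stack:
        current = stack.pop()
        if current.children:
            stack.extend(current.children)
        else:
            leaves.append(current)
    return max(leaves, key=lambda n: n.estimate)

root = Node("root", 10.0)
a = Node("a", 5.0, visits=0, parent=root)
b = Node("b", 9.0, visits=10, parent=root)
a1 = Node("a1", 3.0, parent=a)
a2 = Node("a2", 4.0, parent=a)
Node("a1x", 1.0, parent=a1)
Node("a1y", 2.0, parent=a1)

score = lambda n: n.estimate - n.visits  # hypothetical stand-in score
guided = select_with_depth_limit(root, score, max_uct_depth=1)
unguided = select_with_depth_limit(root, score, max_uct_depth=0)
```

With one guided level the search is steered into the rarely visited subtree under "a"; with no guided levels it simply takes the open node with the best estimate.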
Preliminary Experimental Results
[ timeout: 600 sec ]
Promising performance:
 UCT guidance results in the fewest instances timing out (8)
 Fastest on 39 instances
 Lowest average runtime (albeit only by a few seconds)
Preliminary Experimental Results
Pairwise performance measure (timeout: 600 sec) :
 how often does the row solver outperform the column solver?
 e.g., UCT guidance outperforms default CPLEX on 64 instances;
52 times vice versa
Promising performance:
 UCT guidance outperforms default CPLEX and other natural alternatives
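
The pairwise measure can be computed from per-instance runtimes as sketched below. It assumes that "outperforms" means a strictly lower runtime on the same instance, so ties (for example, both solvers timing out) count for neither side; that definition, the function name and the runtimes are assumptions, not data from the experiments.

```python
def pairwise_wins(runtimes):
    """runtimes: solver name -> list of runtimes in a common instance
    order. Returns wins[row][col] = number of instances on which the
    row solver is strictly faster than the column solver."""
    solvers = list(runtimes)
    return {
        row: {
            col: sum(a < b for a, b in zip(runtimes[row], runtimes[col]))
            for col in solvers if col != row
        }
        for row in solvers
    }

# Hypothetical runtimes in seconds; 600.0 marks a timeout.
runtimes = {
    "uct":     [12.0, 600.0, 40.0],
    "default": [15.0, 600.0, 30.0],
}
wins = pairwise_wins(runtimes)
```

Because ties are dropped, the two entries for a pair of solvers need not add up to the number of instances.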
Conclusion



Explored the use of MCTS/UCT in a combinatorial search setting

Specifically, for mixed integer programming (MIP) search, with CPLEX

Typical “random playouts” very costly but LP relaxation objective value serves
as a good estimate – a guaranteed one-sided bound!

Max-style update rule performs better here than the usual averaging backups
Guiding combinatorial search with UCT holds promise!

Improving performance of highly optimized MIP solvers across a variety of
problem domains is a huge challenge

UCT-inspired guidance for node selection shows promise

Most benefit when UCT used only near the top of the search tree
Further exploration along these lines appears fruitful, e.g.:

using UCT for variable or value selection (rather than node selection)

building a “full” UCT tree at each search tree node before branching