KITPC08 Tutorial: Combinatorial Problems I

Combinatorial Problems I: Finding Solutions
Ashish Sabharwal
Cornell University
March 3, 2008
2nd Asian-Pacific School on Statistical Physics and Interdisciplinary Applications
KITPC/ITP-CAS, Beijing, China
Cross-fertilization of ideas for the study and design of Intelligent Systems,
drawing on Computer Science, Engineering, Mathematics, Operations Research,
Physics, Economics, and Cognitive Science
Research part of Cornell’s Intelligent Information Systems Institute (IISI)
Director: Carla Gomes
Combinatorial Problems
Examples
• Routing: Given a partially connected network
on N nodes, find the shortest path between X and Y
• Traveling Salesperson Problem (TSP): Given a
partially connected network on N nodes, find a path
that visits every node of the network exactly once
[much harder!!]
• Scheduling: Given N tasks with earliest start times, completion
deadlines, and set of M machines on which they can execute, schedule
them so that they all finish by their deadlines
Problem Instance, Algorithm
• Specific instantiation of the problem
• E.g. three instances for the routing problem with N=8 nodes:
• Objective: a single, generic algorithm for the problem that can solve
any instance of that problem
A sequence of steps, a “recipe”
Measuring the Effectiveness of Algorithms
• Capture scaling with input size N, rather than runtime on specific
instances
• The most common notion in Computer Science is worst-case
complexity: What is the longest time (or number of steps) the
algorithm might take on any input of size N?
Perhaps only N steps, 100 N+5 N: linear time, O(N)
Maybe N^2 steps, or N^2 + 4 N + 6: quadratic, O(N^2)
Maybe N^3 + 1000 log N: cubic, O(N^3)
…
Maybe 2^N, or 2^N + N^1000: exponential, O(2^N)
Polynomial vs. Exponential Complexity
Polynomial time: “tractable”, can
hope to solve very large problems
with enough computing power
E.g. known routing / shortest
path algorithms [O(N^3)]
Exponential time: quickly run into
scalability issues as N increases
E.g. best known algorithms for TSP
[Plot: exponential versus polynomial growth]
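To make the gap concrete, the following sketch tabulates N^3 against 2^N for a few input sizes. The function name `steps_table` and the chosen values of N are illustrative assumptions, not part of the original slides.

```python
def steps_table(sizes):
    """Return (N, N**3, 2**N) for each N, as exact integers."""
    return [(n, n ** 3, 2 ** n) for n in sizes]


if __name__ == "__main__":
    # Cubic growth stays manageable; exponential growth does not.
    for n, cubic, expo in steps_table([10, 20, 50, 100]):
        print(f"N={n:>3}  N^3={cubic:>9}  2^N={expo:.3e}")
```

Already at N=100 the exponential column exceeds the cubic one by many orders of magnitude.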
Are some problems inherently harder than
others?
A large amount of work on answering this question:
computational complexity theory
Computational Complexity Hierarchy (from Hard to Easy)
EXP
  EXP-complete: games like Go, …
PSPACE
  PSPACE-complete: QBF, adversarial planning, chess (bounded), …
P^#P
  #P-complete/hard: #SAT, sampling, probabilistic inference, …
PH
NP
  NP-complete: SAT, scheduling, graph coloring, puzzles, …
P
  P-complete: circuit-value, …
  In P: sorting, shortest path, …
Note: widely believed hierarchy; know P≠EXP for sure
NP-Completeness
• P : class of problems for which a solution can be found in poly time
e.g. can find a shortest path in poly time
• NP: class of problems for which a solution can be verified in poly time
e.g. can’t find a TSP solution in poly time (as far as we know)
but, given a candidate solution (a “witness”)
can verify the correctness of the witness in poly time
“N”: non-deterministic, with the power of “guessing”
“P”: polynomial time
• NP-complete: the “hardest” problems within NP
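A small sketch of what "verify a witness in poly time" means, using the slide's description of TSP as a path that visits every node exactly once. The function name `is_valid_tour`, the edge-list representation, and the node numbering 0..n-1 are assumptions made for illustration.

```python
def is_valid_tour(edges, n, path):
    """Check in polynomial time that `path` visits all n nodes exactly once
    and only travels along edges of the (undirected) network."""
    if sorted(path) != list(range(n)):
        return False  # some node is missing or repeated
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in edge_set for u, v in zip(path, path[1:]))


if __name__ == "__main__":
    square = [(0, 1), (1, 2), (2, 3), (0, 3)]
    print(is_valid_tour(square, 4, [0, 1, 2, 3]))
    print(is_valid_tour(square, 4, [0, 2, 1, 3]))
```

Checking a proposed path is cheap; the hard part, as far as we know, is finding one.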
NP-Completeness
One of the biggest discoveries in Computer Science:
All NP-complete problems are equally hard!
[worst-case complexity]
• An algorithm for any one NP-complete problem can be used to solve
any other NP-complete problem with only a polynomial overhead!
• There are catalogues of 10,000’s of such problems
e.g. “Boolean satisfiability” or SAT, TSP, scheduling, (bounded)
planning, chip verification, 0-1 integer programming, graph coloring,
logical inference, …
[Similarly for PSPACE-complete, #P-complete, etc.]
Can one design a single algorithm that can
efficiently solve thousands of different problems
of interest?
The Quest for Machine Reasoning
A cornerstone of Artificial Intelligence
Objective: Develop foundations and technology to enable
effective, practical, large-scale automated reasoning.
Machine Reasoning (1960-90s)
Computational complexity of reasoning
appears to severely limit real-world
applications
Current reasoning technology
Revisiting the challenge:
Significant progress with new
ideas / tools for dealing with
complexity (scale-up),
uncertainty, and multi-agent
reasoning
General Automated Reasoning
Problem instance → Model Generator (Encoder) → General Inference Engine → Solution
• Problem instance: domain-specific, e.g. logistics, chess, planning, scheduling, ...
• General Inference Engine: generic, applicable to all domains within range of modeling language
• Research objective: better reasoning and modeling technology
• Impact: faster solutions in several domains
Reasoning Complexity
• EXPONENTIAL COMPLEXITY: INHERENT
A^N worst case
N = No. of Variables/Objects, A = Object states
• TIME/SPACE: Granularity, Object states
• Current implementations trade time with soundness
Simple Example: Knowledge Base
Variables (binary)
X1 = email_received
X2 = in_meeting
X3 = urgent
X4 = respond_to_email
X5 = near_deadline
X6 = postpone
X7 = air_ticket_info_request
X8 = travel_request
X9 = info_request
Rules:
1. X1 & (not X2) & X3 → X4
2. X2 → not X4
3. X5 → X3 or X6
4. X7 → X8
5. X8 → X9
6. X8 → X5
7. X6 → not X9
Question: Given X1 = true, X2 = false, X7 = true, what is X4?
Answer Development: Inference Chain (search for rules to apply)
Step 1: X7 → X8 (rule 4)
Step 2: X8 → X5 (rule 6)
Step 3: X5 → X3 or X6 (rule 3)
M: branch point. For N variables, 2^N cases drive complexity!
Case A: X6 = true
Step 4: X6 → not X9 (rule 7)
Step 5: not X9 → not X8 (contrapositive of rule 5)
Step 6: Contradiction (X8 was derived in Step 1)
Backtrack to M
Case B: X3 = true
Step 7: X1 & (not X2) & X3 → X4, so X4 = true (rule 1)
Check Contradictions
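The inference chain can be cross-checked by brute force: enumerate every assignment of the nine variables that agrees with the given facts and satisfies rules 1 to 7, then look at X4. This is a minimal sketch, assuming the rules read as implications (X7 implies X8, and so on); the helper names `implies`, `rules_hold`, and `models` are made up for this example.

```python
from itertools import product


def implies(p, q):
    return (not p) or q


def rules_hold(x):
    """x maps variable index 1..9 to a bool; checks rules 1-7."""
    return all([
        implies(x[1] and not x[2] and x[3], x[4]),  # rule 1
        implies(x[2], not x[4]),                    # rule 2
        implies(x[5], x[3] or x[6]),                # rule 3
        implies(x[7], x[8]),                        # rule 4
        implies(x[8], x[9]),                        # rule 5
        implies(x[8], x[5]),                        # rule 6
        implies(x[6], not x[9]),                    # rule 7
    ])


def models(given):
    """All assignments consistent with the `given` facts and the seven rules."""
    free = [i for i in range(1, 10) if i not in given]
    found = []
    for values in product([False, True], repeat=len(free)):
        x = dict(given)
        x.update(zip(free, values))
        if rules_hold(x):
            found.append(x)
    return found


if __name__ == "__main__":
    ms = models({1: True, 2: False, 7: True})
    print(len(ms), {m[4] for m in ms})
```

With six free variables this checks only 2^6 = 64 cases, but the count doubles with every additional variable, which is the point of the slide.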
Exponential Complexity Growth:
The Challenge of Complex Domains
[Figure: case complexity (log scale) plotted against number of variables and number of rules (constraints).
Note: rough estimates, for propositional reasoning.
Domains shown (variables, rules): Car repair diagnosis (100, 200); Deep space mission control (10K, 50K); Chess, 20 steps deep (20K, 100K); Military Logistics (100K, 450K); VLSI Verification (0.5M, 1M); War Gaming (1M, 5M).
Case-complexity axis marks: 10^30, 10^47, 10^3010, 10^6020, 10^150,500, 10^301,020.
Reference points: No. of atoms on the earth; seconds until heat death of sun; protein folding calculation (petaflop-year).]
[Credit: Kumar, DARPA; Cited in Computer World magazine]
Progress in Last 15 Years
Focus: Combinatorial Search Spaces
Specifically, the Boolean satisfiability problem, SAT
Significant progress since the 1990’s.
How much?
• Problem size: We went from 100 variables, 200 constraints (early
90’s) to 1,000,000 vars. and 5,000,000 constraints in 15 years.
Search space: from 10^15 to 10^300,000.
[Aside: “one can encode quite a bit in 1M variables.”]
• Tools: 50+ competitive SAT solvers available
Overview of the state of the art:
Plenary talk at IJCAI-05 (Selman); Discrete App. Math. article (Kautz-Selman ’06)
How Large are the Problems?
A bounded model checking problem:
SAT Encoding
(automatically generated from problem specification)
i.e., ((not x1) or x7)
((not x1) or x6)
etc.
x1, x2, x3, etc. are our Boolean variables
(to be set to True or False)
Should x1 be set to False??
10 Pages Later:
…
i.e., (x177 or x169 or x161 or x153 …
x33 or x25 or x17 or x9 or x1 or (not x185))
clauses / constraints are getting more interesting…
Note x1 …
4,000 Pages Later:
…
Finally, 15,000 Pages Later:
Search space of truth assignments:
Current SAT solvers solve this instance in
under 30 seconds!
SAT Solver Progress
Solvers have continually improved over time
Instance          Posit '94   Grasp '96   Sato '98   Chaff '01
ssa2670-136       40.66s      1.20s       0.95s      0.02s
bf1355-638        1805.21s    0.11s       0.04s      0.01s
pret150_25        >3000s      0.21s       0.09s      0.01s
dubois100         >3000s      11.85s      0.08s      0.01s
aim200-2_0-no-1   >3000s      0.01s       < 0.01s    < 0.01s
2dlx_..._bug005   >3000s      >3000s      >3000s     2.90s
c6288             >3000s      >3000s      >3000s     >3000s
Source: Marques-Silva 2002
How do SAT Solvers Keep Improving?
From academically interesting to practically relevant.
We now have regular SAT solver competitions.
(Germany ’89, Dimacs ’93, China ’96, SAT-02, SAT-03, …, SAT-07)
E.g. at SAT-2006 (Seattle, Aug ’06):
• 35+ solvers submitted, most of them open source
• 500+ industrial benchmarks
• 50,000+ benchmark instances available on the www
This constant improvement in SAT solvers is the key to making, e.g.,
SAT-based planning very successful.
Current Automated Reasoning Tools
Most-successful fully automated methods:
based on Boolean Satisfiability (SAT) / Propositional Reasoning
– Problems modeled as rules / constraints over Boolean variables
– “SAT solver” used as the inference engine
Applications: single-agent search
• AI planning
SATPLAN-06, fastest optimal planner;
ICAPS-06 competition (Kautz & Selman ’06)
• Verification – hardware and software
Major groups at Intel, IBM, Microsoft, and universities
such as CMU, Cornell, and Princeton.
SAT has become the dominant technology.
• Many other domains: Test pattern generation, Scheduling,
Optimal Control, Protocol Design, Routers, Multi-agent systems,
E-Commerce (E-auctions and electronic trading agents), etc.
Recall: General Automated Reasoning
Problem instance (domain-specific, e.g. logistics, chess, planning, scheduling, ...)
→ Model Generator (Encoder)
→ General Inference Engine (generic, applicable to all domains within range of modeling language)
→ Solution
Research objective: better reasoning and modeling technology
Impact: faster solutions in several domains
Automated Reasoning with SAT
• A simple but useful modeling language:
Boolean formulas
• Corresponding inference engine:
Satisfiability or SAT algorithm
(e.g. complete search, local search, message passing)
• Numerous applications:
hardware and software verification, planning,
scheduling, e-commerce, circuit design,
open problems in algebra, …
Boolean Logic
Defined over Boolean (binary) variables a, b, c, …
Each of these can be True (1, T) or False (0, F)
Variables connected together with logic operators: and, or, not (denoted ¬)
E.g. ((c and ¬d) or f) is True iff
either c is True and d is False, or f is True
Fact: All other Boolean logic operators can be expressed with and, or, not
E.g. (a → b) same as (¬a or b)
Boolean formula, e.g. F = (a or b) and ¬(a and (b or c))
(Truth) Assignment: any setting of the variables to True or False
Satisfying assignment: assignment where the formula evaluates to True
E.g. F has 3 satisfying assignments: (0,1,0), (0,1,1), (1,0,0)
Boolean Logic: Example
F = (a or b) and ¬(a and (b or c))
Note: True often written as 1, False as 0
• There are 2^3 = 8 possible truth assignments to a, b, c
– (a=0,b=1,c=0), representing (a=False, b=True, c=False)
– (a=0,b=0,c=1)
– …
• Exactly 3 truth assignments satisfy F
– (a=0,b=1,c=0)
– (a=0,b=1,c=1)
– (a=1,b=0,c=0)
Truth Table for F
a  b  c  F
0  0  0  0
0  0  1  0
0  1  0  1
0  1  1  1
1  0  0  1
1  0  1  0
1  1  0  0
1  1  1  0
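The truth table can be regenerated mechanically. The sketch below assumes the formula reads F = (a or b) and not (a and (b or c)), which is the reading consistent with the three satisfying assignments listed on the slide; the helper names `truth_table` and `satisfying` are illustrative.

```python
from itertools import product


def F(a, b, c):
    # Assumed reading of the slide's formula (negation restored).
    return (a or b) and not (a and (b or c))


def truth_table(formula, num_vars):
    """List (assignment, value) rows in the usual 000..111 order."""
    return [(bits, int(bool(formula(*bits))))
            for bits in product([0, 1], repeat=num_vars)]


def satisfying(formula, num_vars):
    """Assignments for which the formula evaluates to True."""
    return [bits for bits, value in truth_table(formula, num_vars) if value]


if __name__ == "__main__":
    for bits, value in truth_table(F, 3):
        print(*bits, value)
```

This works for any small formula, but the table has 2^N rows, so it stops being practical very quickly.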
Boolean Logic: Expressivity
All discrete single-agent search problems can be
cast as a Boolean formula
Variables a, b, c, … often represent “states” of the
system, “events”, “actions”, etc.
(more on this later, using Planning as an example)
Very general encoding language. E.g. can handle
• Numbers (k-bit binary representation)
• Floating-point numbers
• Arithmetic operators like +, x, exp(), log()
• …
SAT encodings (generated automatically from high
level languages) routinely used in domains like
planning, scheduling, verification, e-commerce,
network design, …
Recall Example:
Variables
X1 = email_received (“event”)
X2 = in_meeting (“state”)
X3 = urgent
X4 = respond_to_email (“action”)
X5 = near_deadline
X6 = postpone
X7 = air_ticket_info_request
X8 = travel_request
X9 = info_request
Rules (each one a “constraint”):
1. X1 & (not X2) & X3 → X4
2. X2 → not X4
3. X5 → X3 or X6
4. X7 → X8
5. X8 → X9
6. X8 → X5
7. X6 → not X9
Boolean Logic: Standard Representations
Each problem constraint typically specified as (a set of) clauses:
E.g. (a or b), (c or d or f), (a or c or d), …
clauses (only “or”, “not”)
Formula in conjunctive normal form, or CNF: a conjunction of clauses
E.g.
F = (a or b) and ¬(a and (b or c)) changes to
F_CNF = (a or b) and (¬a or ¬b) and (b or ¬c)
Alternative [useful for QBF]: specify each constraint as a term (only “and”, “not”):
E.g. (a and d), (b and a and f), (b and d and e), …
Formula in disjunctive normal form, or DNF: a disjunction of terms
E.g. F_DNF = (¬a and b) or (a and ¬b and ¬c)
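A common machine representation of CNF, similar in spirit to what SAT solvers read, is a list of clauses where each clause is a list of signed integers. The sketch below encodes one CNF and one DNF form of the running example and checks that they agree on all eight assignments. The specific clauses assume the negations are placed as F_CNF = (a or b) and (¬a or ¬b) and (b or ¬c) and F_DNF = (¬a and b) or (a and ¬b and ¬c); the variable numbering a=1, b=2, c=3 and all function names are assumptions for this example.

```python
from itertools import product

# A negative number is a negated literal: -2 means "not b".
F_CNF = [[1, 2], [-1, -2], [2, -3]]
F_DNF = [[-1, 2], [1, -2, -3]]


def lit_true(lit, assignment):
    """assignment[i-1] is the 0/1 value of variable i."""
    value = assignment[abs(lit) - 1]
    return bool(value) if lit > 0 else not value


def eval_cnf(clauses, assignment):
    # Every clause needs at least one true literal.
    return all(any(lit_true(l, assignment) for l in c) for c in clauses)


def eval_dnf(terms, assignment):
    # At least one term needs all of its literals true.
    return any(all(lit_true(l, assignment) for l in t) for t in terms)


def equivalent(num_vars):
    return all(eval_cnf(F_CNF, bits) == eval_dnf(F_DNF, bits)
               for bits in product([0, 1], repeat=num_vars))


if __name__ == "__main__":
    print(equivalent(3))
```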
Boolean Satisfiability Testing
The Boolean Satisfiability Problem, or SAT:
Given a Boolean formula F,
• find a satisfying assignment for F
• or prove that no such assignment exists.
• A wide range of applications
• Relatively easy to test for small formulas (e.g. with a Truth Table)
• However, very quickly becomes hard to solve
– Search space grows exponentially with formula size
(more on this next)
SAT technology has been very successful in taming this exponential blow up!
SAT Search Space
[Figure: binary search tree over partial assignments. Root: all vars free; each level fixes one more variable (a 1st, another, a 3rd, a 4th) to True or False; leaves are labeled True or False.]
SAT Problem: Find a path to a True leaf node.
For N Boolean variables, the raw search space is of size 2^N
• Grows very quickly with N
• Brute-force exhaustive search unrealistic without efficient heuristics, etc.
k-CNF, 3-CNF
k-CNF: all clauses have k literals
• 1-CNF SAT: trivial
• 2-CNF SAT: solvable in O(N^2) time
• 3-CNF SAT: NP-complete
• 4-CNF SAT: NP-complete
• …
[N = num. of variables]
Note: Any Boolean formula can be converted into CNF,
with or without extra variables (without extra variables ⇒ size increase)
Worst-Case Complexity
SAT is an NP-complete problem
• Worst-case believed to be exponential
(roughly 2^N for N variables)
• 10,000+ problems in CS are NP-complete (e.g. planning, scheduling,
protein folding, reasoning)
• P vs. NP: $1M Clay Prize
However, real-world instances are usually
not pathological and can often be solved
very quickly with the latest technology!
Typical-case complexity provides a more
detailed understanding and a more
positive picture.
[Plot: exponential versus polynomial growth]
Exponential Complexity Growth
Planning (single-agent):
find the right sequence of actions
HARD: 10 actions, 10! ≈ 3.6 x 10^6 possible plans
Contingency planning (multi-agent):
actions may or may not
produce the desired effect!
[Figure: tree of contingency branches, with labels “1 out of 10”, “2 out of 8”, “4 out of 9”, …]
REALLY HARD: 10 x 9^2 x 8^4 x 7^8 x … x 2^256 ≈
10^224 possible contingency plans!
[Plot: exponential versus polynomial growth]
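The two counts can be checked with a few lines of arithmetic. This sketch assumes the product runs over 10, 9, …, 2 choices with the exponent doubling at each step (1, 2, 4, …, 256), as the visible terms suggest; the function name is made up for the example.

```python
import math


def log10_contingency_plans(num_actions=10):
    """log10 of 10^1 * 9^2 * 8^4 * ... * 2^256 (exponent doubles each step)."""
    total = 0.0
    exponent = 1
    for choices in range(num_actions, 1, -1):
        total += exponent * math.log10(choices)
        exponent *= 2
    return total


if __name__ == "__main__":
    print(math.factorial(10))              # sequential plans over 10 actions
    print(log10_contingency_plans())       # exponent of 10 for contingency plans
```

Working in logarithms avoids building the full 225-digit product, although Python integers could hold it.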
Typical-Case Complexity
A key hardness parameter for k-SAT: the ratio of clauses to variables
[Figure: axis of clause-to-variable ratio, with labels “Delete Constraints” and “Add Constraints” pointing in opposite directions]
Problems that are not critically constrained tend to be much easier in practice
than the relatively few critically constrained ones
[Mitchell, Selman, and Levesque ’92; Kirkpatrick and Selman – Science ’94]
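The effect of the clause-to-variable ratio can be seen even on toy instances. The sketch below generates random 3-SAT formulas at a given ratio and measures, by brute force, what fraction are satisfiable. The generator (three distinct variables per clause, each negated with probability 1/2), the tiny N, and all function names are assumptions chosen to keep the example fast; it is not the experimental setup of the cited papers.

```python
import random
from itertools import product


def random_3sat(num_vars, num_clauses, rng):
    """Each clause: 3 distinct variables, each negated with probability 1/2."""
    formula = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula


def satisfiable(formula, num_vars):
    """Brute force over all 2^N assignments (only sensible for tiny N)."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in formula):
            return True
    return False


def sat_fraction(num_vars, ratio, trials, seed=0):
    """Fraction of random instances at this clause/variable ratio that are SAT."""
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    hits = sum(satisfiable(random_3sat(num_vars, num_clauses, rng), num_vars)
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for ratio in [1, 2, 3, 4, 5, 6, 8]:
        print(ratio, sat_fraction(10, ratio, 20))
```

Under-constrained instances (low ratio) are almost always satisfiable and over-constrained ones almost never; the interesting region lies in between.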
Typical-Case Complexity
Random 3-SAT as of 2004
[Figure: random 3-SAT around the phase transition, marking the regions reached by linear time algs., Random Walk, DP, DP’, GSAT, Walksat, and SP]
SAT solvers continually getting close to tackling problems in the hardest region!
SP (survey propagation) now handles 1,000,000 variables
very near the phase transition region
Tractable Sub-Structure Can Dominate and
Drastically Reduce Solution Cost!
2+p-SAT model: mix 2-SAT (tractable) and 3-SAT (intractable) clauses
> 40% 3-SAT: exponential scaling
≤ 40% 3-SAT: linear scaling!
[Plot: median runtime versus number of variables]
(Monasson, Selman et al. – Nature ’99; Achlioptas ’00)
How are other NP-complete problems
translated into SAT instances?
“SAT encoding”
SAT Encoding Example: Planning Domain
Planning Problem → Propositional CNF formula
by axiom schemas
Logistics planning: think of a number of trucks and planes that need to transport a
bunch of packages from their origin to their destination
Discrete time, modeled by integers
• state predicates: indexed by time at which they hold
E.g. at_location(x,loc,i), free(x,i+1), route(cityA,cityB,i)
• action predicates: indexed by time at which action begins
E.g. fly(cityA,cityB,i), pickup(x,loc,i), drive_truck(loc1,loc2,i)
– each action takes 1 time step
– many actions may occur at the same step
Encoding Rules
• Actions imply preconditions and effects
fly(x,y,i) → at(x,i) and route(x,y,i) and at(y,i+1)
• Conflicting actions cannot occur at same time (A deletes a precondition of B)
fly(x,y,i) and y≠z → not fly(x,z,i)
• If something changes, an action must have caused it
(Explanatory Frame Axioms)
at(x,i) and not at(x,i+1) → ∃y . route(x,y) and fly(x,y,i)
• Initial and final states hold
at(NY,0) and ... and at(LA,9) and ...
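Instantiating an axiom schema means generating one set of clauses per choice of objects and time step. The sketch below does this for the first rule only, reading it as fly(x,y,i) implies at(x,i) and route(x,y,i) and at(y,i+1). The literal representation (a sign paired with an atom tuple), the function name `fly_axioms`, and the city list are assumptions for illustration, not the encoding used by any particular planner.

```python
from itertools import permutations


def fly_axioms(cities, horizon):
    """Clauses for fly(x,y,i) -> at(x,i) and route(x,y,i) and at(y,i+1).

    Each implication A -> (B and C and D) becomes three clauses
    (not A or B), (not A or C), (not A or D).
    A literal is a pair (sign, atom); sign False means negated.
    """
    clauses = []
    for i in range(horizon):
        for x, y in permutations(cities, 2):
            fly = ("fly", x, y, i)
            for atom in [("at", x, i), ("route", x, y, i), ("at", y, i + 1)]:
                clauses.append([(False, fly), (True, atom)])
    return clauses


if __name__ == "__main__":
    for clause in fly_axioms(["NY", "LA"], 1):
        print(clause)
```

The clause count grows with the number of object pairs times the number of time steps, which is why real planning encodings run to thousands of pages.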
Using SAT Solvers for Planning
Modeling and Solving a Planning Problem
[Diagram: problem description in high level language → axiom schemas (manual);
axiom schemas + length → instantiate → instantiated propositional clauses → SAT engine(s) → satisfying model;
satisfying model + mapping → interpret → plan (fully automatic)]
Planning Benchmark Complexity
Logistics domain – a complex, highly-parallel transportation domain
E.g. logistics.d problem:
o 2,165 possible actions per time slot
o optimal solution contains 74 distinct actions over 14 time slots
(out of 5 x 10^46 possible sequential plans of length 14)
Satplan [Selman et al.] approach is currently fastest optimal planning approach.
Winner ICAPS-05 & ICAPS-06 international planning competitions.
Solution Approaches to SAT
Solving SAT: Systematic Search
One possibility: enumerate all truth assignments one-by-one,
test whether any satisfies F
– Note: testing is easy!
– But too many truth assignments (e.g. for N=1000 variables, have
2^1000 ≈ 10^300 truth assignments)
[Figure: all 2^N assignments listed in order: 00000000, 00000001, 00000010, 00000011, ……, 11111111]
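A minimal sketch of the enumerate-and-test approach, assuming clauses are given as lists of nonzero integers where -i stands for "not variable i". The function name `brute_force_sat` is illustrative.

```python
from itertools import product


def brute_force_sat(clauses, num_vars):
    """Try all 2^N assignments in the order 00..0, 00..1, ...

    Returns the first satisfying assignment as a tuple of 0/1 values,
    or None if the formula is unsatisfiable.
    """
    for bits in product([0, 1], repeat=num_vars):
        # Testing one assignment is easy: one pass over the clauses.
        if all(any(bits[abs(l) - 1] == (1 if l > 0 else 0) for l in c)
               for c in clauses):
            return bits
    return None


if __name__ == "__main__":
    print(brute_force_sat([[1, 2], [-1, -2], [2, -3]], 3))
```

Each individual test is cheap; the cost comes entirely from the number of assignments to try.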
Solving SAT: Systematic Search
Smarter approach: the “DPLL” procedure [1960’s]
(Davis, Putnam, Logemann, Loveland)
1. Assign values to variables one at a time (“partial” assignments)
2. Simplify F
3. If contradiction (i.e. some clause becomes False), “backtrack”, flip
last unflipped variable’s value, and continue search
• Extended with many new techniques: 100’s of research papers,
yearly conference on SAT
e.g., extremely efficient data-structures (representation),
randomization, restarts, learning “reasons” of failure
• Provides proof of unsatisfiability if F is unsat. [“complete method”]
• Forms the basis of dozens of very effective SAT solvers!
e.g. minisat, zchaff, relsat, rsat, … (open source, available on the www)
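The three steps (assign, simplify, backtrack on contradiction) can be sketched as a short recursive procedure. This is a bare-bones illustration only: it branches on the first literal of the first clause and omits unit propagation, clause learning, restarts, and the data structures that make real solvers fast. Clauses are lists of nonzero integers (-i means "not variable i"); the function names are assumptions.

```python
def simplify(clauses, lit):
    """Set `lit` to True: drop satisfied clauses, remove the opposite literal."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue                      # clause already satisfied
        out.append([l for l in clause if l != -lit])
    return out


def dpll(clauses, assignment=()):
    """Return a tuple of literals made True that satisfies all clauses,
    or None if no such assignment exists. Unmentioned variables are free."""
    if not clauses:
        return assignment                 # every clause satisfied
    if any(len(c) == 0 for c in clauses):
        return None                       # some clause became False
    lit = clauses[0][0]
    for choice in (lit, -lit):            # try a value, then flip it
        result = dpll(simplify(clauses, choice), assignment + (choice,))
        if result is not None:
            return result
    return None                           # both values fail: backtrack further


if __name__ == "__main__":
    print(dpll([[1, 2], [-1, -2], [2, -3]]))
    print(dpll([[1], [-1]]))
```

Because both values of every branching variable are eventually tried, a None result is a proof that the formula is unsatisfiable.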
Solving SAT: Local Search
• Search space: all 2^N truth assignments for F
• Goal: starting from an initial truth assignment A_0, compute assignments
A_1, A_2, …, A_s such that A_s is a satisfying assignment for F
• A_(i+1) is computed by a “local transformation” to A_i
(on the slide, a green bit “flips” to a red bit at each step)
e.g. A_1 = 000110111
A_2 = 001110111
A_3 = 001110101
A_4 = 101110101
…
A_s = 111010000 (solution found!)
• No proof of unsatisfiability if F is unsat. [“incomplete method”]
• Several SAT solvers based on this approach, e.g. Walksat
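A simplified local search in the spirit of Walksat, as a sketch: start from a random assignment and repeatedly flip one variable from some unsatisfied clause, choosing either at random or greedily. The parameter names (`max_flips`, `noise`, `seed`), their default values, and the exact greedy rule are assumptions for this example, not the published algorithm's details. Clauses are non-empty lists of nonzero integers.

```python
import random


def satisfied(clause, assign):
    return any(assign[abs(l)] == (l > 0) for l in clause)


def walksat(clauses, num_vars, max_flips=10_000, noise=0.5, seed=0):
    """Return a satisfying {var: bool} dict, or None if none was found."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}

    def cost(v):
        # Number of unsatisfied clauses if variable v were flipped.
        assign[v] = not assign[v]
        n = sum(not satisfied(c, assign) for c in clauses)
        assign[v] = not assign[v]
        return n

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c, assign)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)
        if rng.random() < noise:
            var = abs(rng.choice(clause))                 # random-walk move
        else:
            var = min((abs(l) for l in clause), key=cost)  # greedy move
        assign[var] = not assign[var]                      # the "bit flip"
    return assign if all(satisfied(c, assign) for c in clauses) else None


if __name__ == "__main__":
    print(walksat([[1, 2], [-1, -2], [2, -3]], 3))
```

A None result only means the flip budget ran out; it says nothing about whether the formula is unsatisfiable.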
Solving SAT: Decimation
• “Search” space: all 2^N truth assignments for F
• Goal: attempt to construct a solution in “one-shot” by very carefully
setting one variable at a time
• Survey Inspired Decimation:
– Estimate certain “marginal probabilities” of each variable being True, False,
or ‘undecided’ in each solution cluster using Survey Propagation
– Fix the variable that is the most biased to its preferred value
– Simplify F and repeat
• A method rarely used by computer scientists
• But it has had tremendous success in the physics community on
random k-SAT; can easily solve random instances with 1M+ variables!
• No searching for solution
• No proof of unsatisfiability [“incomplete method”]
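The decimation loop itself (estimate marginals, fix the most biased variable, simplify, repeat) can be illustrated on a toy formula. Note the substitution: instead of Survey Propagation, this sketch computes exact marginals by enumerating all solutions, which is only feasible for a handful of variables and has none of the scaling behavior of the real method. Function names and the clause format (lists of nonzero integers) are assumptions.

```python
from itertools import product


def solutions(clauses, num_vars, fixed):
    """All full assignments (tuples of bools) that extend `fixed` and satisfy F."""
    found = []
    for bits in product([False, True], repeat=num_vars):
        if any(bits[v - 1] != val for v, val in fixed.items()):
            continue
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            found.append(bits)
    return found


def decimate(clauses, num_vars):
    """Fix the most biased free variable to its preferred value, then repeat.
    Returns {var: bool}, or None if the formula has no solutions."""
    fixed = {}
    while len(fixed) < num_vars:
        sols = solutions(clauses, num_vars, fixed)   # stand-in for SP estimates
        if not sols:
            return None
        best = None
        for v in range(1, num_vars + 1):
            if v in fixed:
                continue
            p_true = sum(s[v - 1] for s in sols) / len(sols)
            bias = abs(p_true - 0.5)
            if best is None or bias > best[0]:
                best = (bias, v, p_true >= 0.5)
        fixed[best[1]] = best[2]     # restricting to `fixed` plays the role of "simplify F"
    return fixed


if __name__ == "__main__":
    print(decimate([[1, 2], [-1, -2], [2, -3]], 3))
```

With exact marginals every fixing step keeps at least one solution alive; with estimated marginals, as in survey inspired decimation, a wrong fixing cannot be undone, which is why the method is incomplete.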
The Next Two Lectures
• Problems beyond SAT / searching for a single solution
• #P-complete: count the number of solutions of a SAT instance
• #P-hard: sample a solution uniformly at random for a SAT instance
• PSPACE-complete: quantified Boolean formula (QBF)
Thank you for attending!
Slides: http://www.cs.cornell.edu/~sabhar/tutorials/kitpc08-combinatorial-problems-I.ppt
Ashish Sabharwal : http://www.cs.cornell.edu/~sabhar
Bart Selman : http://www.cs.cornell.edu/selman