Heuristic Search CMSC 100 Tuesday, November 4, 2008 Prof. Marie desJardins

Summary of Topics
 What is heuristic search?
 Examples of search problems
 Search methods
 Uninformed search
 Informed search
 Local search
 Game trees
Building Goal-Based
Intelligent Agents
To build a goal-based agent we need to answer the
following questions:
 What is the goal to be achieved?
 What are the actions?
 What relevant information is necessary to encode
in order to describe the state of the world,
describe the available transitions, and solve the
problem?
[Diagram: Initial state → Actions → Goal state]
Representing States
 What information is necessary to encode about the world to
sufficiently describe all relevant aspects to solving the goal?
 That is, what knowledge needs to be represented in a state
description to adequately describe the current state or situation of
the world?
 The size of a problem is usually described in terms of the
number of states that are possible.
 Tic-Tac-Toe has about 3^9 states.
 Checkers has about 10^40 states.
 Rubik’s Cube has about 10^19 states.
 Chess has about 10^120 states in a typical game.
Real-world Search Problems
 Route finding
 Touring (traveling salesman)
 Logistics
 VLSI layout
 Robot navigation
 Learning
8-Puzzle
Given an initial configuration of 8 numbered tiles on a
3 x 3 board, move the tiles into a desired goal
configuration of the tiles.
8-Puzzle Encoding
 State: 3 x 3 array configuration of the tiles on the board.
 4 Operators: Move Blank Square Left, Right, Up or Down.
 This is a more efficient encoding of the operators than one in which
each of four possible moves for each of the 8 distinct tiles is used.
 Initial State: A particular configuration of the board.
 Goal: A particular configuration of the board.
 What does the state space look like?
Missionaries and Cannibals
There are 3 missionaries, 3 cannibals, and 1
boat that can carry up to two people, all on
one side of a river.
 Goal: Move all the missionaries and
cannibals across the river.
 Constraint: Missionaries can never be
outnumbered by cannibals on either side of
river; otherwise, the missionaries are killed.
 State: Configuration of missionaries and
cannibals and boat on each side of river.
 Operators: Move boat containing some set
of occupants across the river (in either
direction) to the other side.
 What’s the solution??
Missionaries and Cannibals Solution
Step                                 Near side    Far side
0  Initial setup:                    MMMCCC B     -
1  Two cannibals cross over:         MMMC         B CC
2  One comes back:                   MMMCC B      C
3  Two cannibals go over again:      MMM          B CCC
4  One comes back:                   MMMC B       CC
5  Two missionaries cross:           MC           B MMCC
6  A missionary & cannibal return:   MMCC B       MC
7  Two missionaries cross again:     CC           B MMMC
8  A cannibal returns:               CCC B        MMM
9  Two cannibals cross:              C            B MMMCC
10 One returns:                      CC B         MMMC
11 And brings over the third:        -            B MMMCCC
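The solution above can be found mechanically. The following is a minimal sketch, assuming a state encoding that is not in the slides: (missionaries on the near side, cannibals on the near side, boat on the near side?). The function name `solve_missionaries_cannibals` and its parameters are hypothetical. It uses breadth-first search, so it returns a plan with the fewest crossings.

```python
from collections import deque

def solve_missionaries_cannibals(total=3, boat_capacity=2):
    """Breadth-first search over states (m, c, boat_on_near_side)."""
    start, goal = (total, total, True), (0, 0, False)

    def is_safe(m, c):
        # On each bank, missionaries must be absent or not outnumbered.
        near_ok = m == 0 or m >= c
        far_ok = (total - m) == 0 or (total - m) >= (total - c)
        return near_ok and far_ok

    def successors(state):
        m, c, near = state
        sign = -1 if near else 1          # leaving the near side removes people from it
        avail_m = m if near else total - m
        avail_c = c if near else total - c
        for dm in range(boat_capacity + 1):
            for dc in range(boat_capacity + 1 - dm):
                if dm + dc == 0 or dm > avail_m or dc > avail_c:
                    continue              # boat needs at least one available passenger
                nm, nc = m + sign * dm, c + sign * dc
                if is_safe(nm, nc):
                    yield (nm, nc, not near)

    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:         # skip states already reached
                parent[nxt] = state
                frontier.append(nxt)
    return None

path = solve_missionaries_cannibals()
print(len(path) - 1)   # number of crossings: 11
```

The 11 crossings match the length of the hand-worked solution, although the search may pick a different but equally short sequence of moves.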
Water Jug Problem
Given a full 5-gallon jug and an empty 2-gallon jug, the goal is to
fill the 2-gallon jug with exactly one gallon of water.
 Possible actions:
 Empty the 5-gallon jug (pour contents down the drain)
 Empty the 2-gallon jug
 Pour the contents of the 2-gallon jug into the 5-gallon jug (only if there
is enough room)
 Fill the 2-gallon jug from the 5-gallon jug
Case 1: at least 2 gallons in the 5-gallon jug
Case 2: less than 2 gallons in the 5-gallon jug
 What are the states?
 What are the state transitions?
 What does the state space look like?
Water Jug Problem
Given a full 5-gallon jug and an empty 2-gallon jug, the goal is to
fill the 2-gallon jug with exactly one gallon of water.
 State = (x,y), where x is the # of gallons of water in the
5-gallon jug and y is the # of gallons in the 2-gallon jug
 Initial State = (5,0)
 Goal State = (*,1), where * means any amount
Operator table
Name        Cond.   Transition        Effect
Empty5      –       (x,y)→(0,y)       Empty 5-gal. jug
Empty2      –       (x,y)→(x,0)       Empty 2-gal. jug
2to5        x≤3     (x,2)→(x+2,0)     Pour 2-gal. into 5-gal.
5to2        x≥2     (x,0)→(x-2,2)     Pour 5-gal. into 2-gal.
5to2part    y<2     (1,y)→(0,y+1)     Pour partial 5-gal. into 2-gal.
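The operator table translates almost directly into code. This is a sketch, assuming that the left-hand side of each transition is part of the precondition (for example, 2to5 applies only when y = 2). The names `OPERATORS` and `solve_water_jug` are hypothetical.

```python
from collections import deque

# (name, applicability test, transition), read off the operator table.
OPERATORS = [
    ("Empty5",   lambda x, y: True,              lambda x, y: (0, y)),
    ("Empty2",   lambda x, y: True,              lambda x, y: (x, 0)),
    ("2to5",     lambda x, y: y == 2 and x <= 3, lambda x, y: (x + 2, 0)),
    ("5to2",     lambda x, y: y == 0 and x >= 2, lambda x, y: (x - 2, 2)),
    ("5to2part", lambda x, y: x == 1 and y < 2,  lambda x, y: (0, y + 1)),
]

def solve_water_jug(start=(5, 0)):
    """Breadth-first search; returns the operator names of a shortest plan."""
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state[1] == 1:                      # goal (*, 1)
            plan = []
            while parent[state] is not None:
                state, name = parent[state]
                plan.append(name)
            return plan[::-1]
        for name, applicable, move in OPERATORS:
            if applicable(*state):
                nxt = move(*state)
                if nxt not in parent:
                    parent[nxt] = (state, name)
                    frontier.append(nxt)
    return None

plan = solve_water_jug()
print(plan)   # ['5to2', 'Empty2', '5to2', 'Empty2', '5to2part']
```

The plan visits the states (5,0), (3,2), (3,0), (1,2), (1,0), (0,1).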
Water Jug State Space
[Figure: grid of all states (x, y); arrows between states are labeled Empty5, Empty2, 2to5, 5to2, 5to2part]
5, 2    5, 1    5, 0
4, 2    4, 1    4, 0
3, 2    3, 1    3, 0
2, 2    2, 1    2, 0
1, 2    1, 1    1, 0
0, 2    0, 1    0, 0
Water Jug Solution
[Figure: the same state grid with the solution path marked]
5, 2    5, 1    5, 0
4, 2    4, 1    4, 0
3, 2    3, 1    3, 0
2, 2    2, 1    2, 0
1, 2    1, 1    1, 0
0, 2    0, 1    0, 0
The 8-Queens Problem
• Place eight queens on a chessboard such that no queen attacks any other
• What are the states and operators?
• What does the state space look like?
Solution Cost
 A solution is a sequence of operators that is associated
with a path in a state space from a start node to a goal
node.
 The cost of a solution is the sum of the arc costs on the
solution path.
 If all arcs have the same (unit) cost, then the solution cost is just the
length of the solution (number of steps / state transitions)
Evaluating search strategies
 Completeness
 Guarantees finding a solution whenever one exists
 Time complexity
 How long (worst or average case) does it take to find a solution?
Usually measured in terms of the number of nodes expanded
 Space complexity
 How much space is used by the algorithm? Usually measured in
terms of the maximum size of the “nodes” list during the search
 Optimality/Admissibility
 If a solution is found, is it guaranteed to be an optimal one? That is,
is it the one with minimum cost?
Types of Search Methods
 Uninformed search strategies
 Also known as “blind search,” uninformed search strategies use no
information about the likely “direction” of the goal node(s)
 Variations on “generate and test” or “trial and error” approach
 Uninformed search methods: breadth-first, depth-first, uniform-cost
 Informed search strategies
 Also known as “heuristic search,” informed search strategies use
information about the domain to (try to) (usually) head in the general
direction of the goal node(s)
 Informed search methods: greedy search, A, A*
 Local search strategies
 Pick a starting solution (that might not be very good) and incrementally try
to improve it
 Local search methods: hill-climbing, genetic algorithms
 Game trees
 Search strategies for situations where you have an opponent who gets to
make some of the moves
 Try to pick moves that will let you win most of the time by “looking ahead” to
see what your opponent might do
Uninformed Search
A Simple Search Space
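[Figure: directed graph with start node S and goal node G; arc costs as follows]
S → A: 3     S → B: 1     S → C: 8
A → D: 3     A → E: 7     A → G: 15
B → G: 20    C → G: 5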
Depth-First (DFS)
 Enqueue nodes on the nodes list in LIFO (last-in, first-out) order. That
is, the nodes list is used as a stack.
 May not terminate without a “depth bound,” i.e., cutting off
search below a fixed depth D (“depth-limited search”)
 Not complete (with or without cycle detection, and with or
without a cutoff depth)
 Exponential time, O(b^d), but only linear space, O(bd)
 Can find long solutions quickly if lucky (and short solutions
slowly if unlucky!)
 When search hits a dead end, can only back up one level at a
time, even if the “problem” occurs because of a bad operator
choice near the top of the tree.
Depth-First Search Solution
Expanded node    Nodes list
                 { S0 }
S0               { A3 B1 C8 }
A3               { D6 E10 G18 B1 C8 }
D6               { E10 G18 B1 C8 }
E10              { G18 B1 C8 }
G18              { B1 C8 }
Solution path found is S A G, cost 18
Number of nodes expanded (including goal node) = 5
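The trace above can be reproduced with a short sketch. The arc costs in `GRAPH` are an assumption inferred from the path costs in the traces (A3, D6, G18, and so on), and the function name is hypothetical. There is no cycle detection, which is safe only because this small graph has no cycles.

```python
# Arc costs inferred from the traces: S-A 3, S-B 1, S-C 8, A-D 3, A-E 7, A-G 15, B-G 20, C-G 5.
GRAPH = {
    "S": [("A", 3), ("B", 1), ("C", 8)],
    "A": [("D", 3), ("E", 7), ("G", 15)],
    "B": [("G", 20)],
    "C": [("G", 5)],
    "D": [], "E": [], "G": [],
}

def depth_first_search(graph, start, goal):
    nodes = [(start, 0, [start])]          # (state, path cost, path)
    expanded = []
    while nodes:
        state, g, path = nodes.pop(0)      # take the front of the nodes list
        expanded.append(state)
        if state == goal:
            return path, g, expanded
        children = [(c, g + w, path + [c]) for c, w in graph[state]]
        nodes = children + nodes           # LIFO: newest nodes go to the front
    return None, None, expanded

path, cost, expanded = depth_first_search(GRAPH, "S", "G")
print(path, cost, expanded)
# ['S', 'A', 'G'] 18 ['S', 'A', 'D', 'E', 'G']
```

Changing `children + nodes` to `nodes + children` turns the same loop into breadth-first search.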
Breadth-First
 Enqueue nodes on the nodes list in FIFO (first-in, first-out) order.
 Complete
 Optimal (i.e., admissible) if all operators have the same cost.
Otherwise, not optimal but finds solution with shortest path length.
 Exponential time and space complexity, O(b^d), where d is the depth
of the solution and b is the branching factor (i.e., number of children)
at each node.
 Will take a long time to find solutions with a large number of steps
because it must look at all shorter length possibilities first.
 A complete search tree of depth d where each non-leaf node has b
children has a total of 1 + b + b^2 + ... + b^d = (b^(d+1) - 1)/(b - 1) nodes
 For a complete search tree of depth 12, where every node at depths 0, ...,
11 has 10 children and every node at depth 12 has 0 children, there are 1
+ 10 + 100 + 1000 + ... + 10^12 = (10^13 - 1)/9 = O(10^12) nodes in the
complete search tree. If BFS expands 1000 nodes/sec and each node
uses 100 bytes of storage, then BFS will take 35 years to run in the worst
case, and it will use 111 terabytes of memory!
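The arithmetic behind the 35 years and 111 terabytes can be checked directly. This is a small sketch assuming a 365-day year and 1 terabyte = 10^12 bytes; the variable names are illustrative.

```python
b, d = 10, 12                                  # branching factor and depth from the example

nodes = (b ** (d + 1) - 1) // (b - 1)          # closed form of 1 + b + ... + b^d
assert nodes == sum(b ** i for i in range(d + 1))

seconds = nodes / 1000                         # 1000 nodes expanded per second
years = seconds / (365 * 24 * 3600)
terabytes = nodes * 100 / 10 ** 12             # 100 bytes per node

print(nodes, round(years, 1), round(terabytes))
# 1111111111111 35.2 111
```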
Breadth-First Search Solution
Expanded node    Nodes list
                 { S0 }
S0               { A3 B1 C8 }
A3               { B1 C8 D6 E10 G18 }
B1               { C8 D6 E10 G18 G21 }
C8               { D6 E10 G18 G21 G13 }
D6               { E10 G18 G21 G13 }
E10              { G18 G21 G13 }
G18              { G21 G13 }
Solution path found is S A G , cost 18
Number of nodes expanded (including goal node) = 7
Uniform Cost Search
 Enqueue nodes by path cost. That is, let priority =
cost of the path from the start node to the current
node n. Sort nodes by increasing value of cost (try
low-cost nodes first)
 Called “Dijkstra’s Algorithm” in the algorithms
literature; similar to “Branch and Bound Algorithm”
from operations research
 Complete
 Optimal/Admissible
 Exponential time and space complexity, O(b^d)
Uniform-Cost Search Solution
Expanded node    Nodes list
                 { S0 }
S0               { B1 A3 C8 }
B1               { A3 C8 G21 }
A3               { D6 C8 E10 G18 G21 }
D6               { C8 E10 G18 G21 }
C8               { E10 G13 G18 G21 }
E10              { G13 G18 G21 }
G13              { G18 G21 }
Solution path found is S C G, cost 13
Number of nodes expanded (including goal node) = 7
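A priority queue keyed on path cost gives the same trace. This is a sketch using Python's `heapq`; the arc costs in `GRAPH` are an assumption inferred from the path costs shown in the traces, and the function name is hypothetical.

```python
import heapq

# Arc costs inferred from the traces (assumption).
GRAPH = {
    "S": [("A", 3), ("B", 1), ("C", 8)],
    "A": [("D", 3), ("E", 7), ("G", 15)],
    "B": [("G", 20)],
    "C": [("G", 5)],
    "D": [], "E": [], "G": [],
}

def uniform_cost_search(graph, start, goal):
    frontier = [(0, start, [start])]       # entries are (path cost g, state, path)
    expanded = []
    while frontier:
        g, state, path = heapq.heappop(frontier)   # cheapest path first
        expanded.append(state)
        if state == goal:
            return path, g, expanded
        for child, cost in graph[state]:
            heapq.heappush(frontier, (g + cost, child, path + [child]))
    return None, None, expanded

path, cost, expanded = uniform_cost_search(GRAPH, "S", "G")
print(path, cost, expanded)
# ['S', 'C', 'G'] 13 ['S', 'B', 'A', 'D', 'C', 'E', 'G']
```

The goal test happens when a node is removed from the queue, not when it is generated. That is why G21 and G18 wait on the list until the cheaper G13 is found.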
Comparing Performance
 Depth-First Search:
 Expanded nodes: S A D E G
 Solution found: S A G (cost 18)
 Breadth-First Search:
 Expanded nodes: S A B C D E G
 Solution found: S A G (cost 18)
 Uniform-Cost Search:
 Expanded nodes: S B A D C E G
 Solution found: S C G (cost 13)
This is the only uninformed search method that worries about
costs.
Holy Grail Search
Expanded node    Nodes list
                 { S0 }
S0               { C8 A3 B1 }
C8               { G13 A3 B1 }
G13              { A3 B1 }
Solution path found is S C G, cost 13 (optimal)
Number of nodes expanded (including goal node) = 3
(as few as possible!)
If only we knew where we were headed…
Informed Search
What’s a Heuristic?
Webster's Revised Unabridged Dictionary (1913) (web1913)
Heuristic \Heu*ris"tic\, a. [Gr. ? to discover.] Serving to discover or find
out.
The Free On-line Dictionary of Computing (15Feb98)
heuristic 1. <programming> A rule of thumb, simplification or educated
guess that reduces or limits the search for solutions in domains that
are difficult and poorly understood. Unlike algorithms, heuristics do not
guarantee feasible solutions and are often used with no theoretical
guarantee. 2. <algorithm> approximation algorithm.
From WordNet (r) 1.6
heuristic adj 1: (computer science) relating to or using a heuristic rule 2:
of or relating to a general formulation that serves to guide investigation
[ant: algorithmic] n : a commonsense rule (or set of rules) intended to
increase the probability of solving some problem [syn: heuristic rule,
heuristic program]
Informed Search: Use What You Know!
 Add domain-specific information to select the best path
along which to continue searching
 Define a heuristic function, h(n), that estimates the
“goodness” of a node n.
 Most often, h(n) = estimated cost (or distance) of minimal
cost path from n to a goal state.
 The heuristic function is an estimate, based on domain-specific
information that is computable from the current
state description, of how close we are to a goal
Heuristic Functions
 All domain knowledge used in the search is encoded
in the heuristic function h.
 Heuristic search is an example of a “weak method”
because of the limited way that domain-specific
information is used to solve the problem.
 Examples:
 Missionaries and Cannibals: Number of people on starting
river bank
 8-puzzle: Number of tiles out of place
 8-puzzle: Sum of distances each tile is from its goal position
 In general:
 h(n) ≥ 0 for all nodes n
 h(n) = 0 implies that n is a goal node
 h(n) = infinity implies that n is a dead end from which a goal
cannot be reached
Example
n     g(n)   h(n)   f(n)   h*(n)
S     0      8      8      13
A     3      8      11     15
B     1      4      5      20
C     8      3      11     5
D     6      ∞      ∞      ∞
E     10     ∞      ∞      ∞
G     13     0      13     0
 g(n) is the (lowest observed) cost from the start node to n
 h(n) is the estimated cost from n to the goal node
 f(n) is the heuristic value (f(n) = g(n) + h(n), estimated total cost
from start to goal through n)
 h*(n) is the (hypothetical) perfect heuristic
 Since h(n) ≤ h*(n) for all n, h is admissible
 Optimal path = S C G with cost 13
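The table values can be checked with a few lines. This is a sketch under two assumptions: the arc costs in `GRAPH` are inferred from the search traces, and the dead-end nodes D and E are given h = ∞. The helper `true_cost_to_goal` is a hypothetical name.

```python
import math

GRAPH = {
    "S": [("A", 3), ("B", 1), ("C", 8)],
    "A": [("D", 3), ("E", 7), ("G", 15)],
    "B": [("G", 20)],
    "C": [("G", 5)],
    "D": [], "E": [], "G": [],
}
G_COST = {"S": 0, "A": 3, "B": 1, "C": 8, "D": 6, "E": 10, "G": 13}
H = {"S": 8, "A": 8, "B": 4, "C": 3, "D": math.inf, "E": math.inf, "G": 0}

def true_cost_to_goal(graph, node, goal="G"):
    """h*(n): cheapest remaining cost to the goal, or infinity at a dead end."""
    if node == goal:
        return 0
    return min(
        (cost + true_cost_to_goal(graph, child, goal) for child, cost in graph[node]),
        default=math.inf,
    )

H_STAR = {n: true_cost_to_goal(GRAPH, n) for n in GRAPH}
F = {n: G_COST[n] + H[n] for n in GRAPH}           # f(n) = g(n) + h(n)
admissible = all(H[n] <= H_STAR[n] for n in GRAPH)

print(H_STAR["S"], F["B"], admissible)   # 13 5 True
```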
Greedy Search
 Use as an evaluation function f(n) = h(n),
sorting nodes by increasing values of f
 Selects node to expand believed to be
closest (hence “greedy”) to a goal node
(i.e., select node with smallest f value)
 Not complete
 Not admissible, as in the example.
Assuming all arc costs are 1, then greedy
search will find goal g, which has a solution
cost of 5, while the optimal solution is the
path to goal g2 with cost 3
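[Figure: tree rooted at a (h=2) with unit arc costs. One branch runs a → b → c → d → e → g, where g is a goal (h=0). The other runs a → h → i → g2, where g2 is a goal (h=0). Node h has h=4; the other labeled interior nodes have h=1.]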
Greedy Search
f(n) = h(n)
node expanded    nodes list
                 { S8 }
S                { C3 B4 A8 }
C                { G0 B4 A8 }
G                { B4 A8 }
 Solution path found is S C G, 3 nodes expanded.
 See how fast the search is!! But it is not always optimal.
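Greedy search only needs the nodes list ordered by h. This is a sketch; the successor lists in `GRAPH` and the h values in `H` are taken from the running example (the h = ∞ entries for D and E are an assumption), and the function name is hypothetical.

```python
import heapq
import math

GRAPH = {
    "S": ["A", "B", "C"],
    "A": ["D", "E", "G"],
    "B": ["G"],
    "C": ["G"],
    "D": [], "E": [], "G": [],
}
H = {"S": 8, "A": 8, "B": 4, "C": 3, "D": math.inf, "E": math.inf, "G": 0}

def greedy_search(graph, h, start, goal):
    frontier = [(h[start], start, [start])]    # ordered by f(n) = h(n) only
    expanded = []
    while frontier:
        _, state, path = heapq.heappop(frontier)
        expanded.append(state)
        if state == goal:
            return path, expanded
        for child in graph[state]:
            heapq.heappush(frontier, (h[child], child, path + [child]))
    return None, expanded

path, expanded = greedy_search(GRAPH, H, "S", "G")
print(path, expanded)   # ['S', 'C', 'G'] ['S', 'C', 'G']
```

Arc costs never enter the ordering. Here the path found happens to be the optimal one, but nothing guarantees that in general.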
Algorithm A
 Use as an evaluation function
f(n) = g(n) + h(n)
 g(n) = minimal-cost path from the start state to state n.
 The g(n) term adds a “breadth-first” component to the
evaluation function.
 Ranks nodes on search frontier by estimated cost of
solution from start node through the given node to
goal.
 Not complete if h(n) can equal infinity.
 Not admissible.
Algorithm A*
 Algorithm A with constraint that h(n) ≤ h*(n)
 h*(n) = true cost of the minimal cost path from n to a
goal.
 h is admissible when h(n) ≤ h*(n) holds.
 Using an admissible heuristic guarantees that the first
solution found will be an optimal one.
 A* is complete whenever the branching factor is finite,
and every operator has a fixed positive cost
 A* is admissible
A* Search
f(n) = g(n) + h(n)
node exp.    nodes list
             { S8 }
S            { B5 A11 C11 }
B            { A11 C11 G21 }
A            { C11 G18 G21 D∞ E∞ }
C            { G13 G18 G21 D∞ E∞ }
G            { G18 G21 D∞ E∞ }
 Solution path found is S C G, 5 nodes expanded.
 Still pretty fast. And optimal, too.
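The A* trace can be reproduced by ordering the nodes list on f = g + h. This is a sketch: the arc costs are an assumption inferred from the traces, D and E are given h = ∞, and ties on f are broken by the smaller g, which is a choice of this sketch rather than something the slides specify.

```python
import heapq
import math

GRAPH = {
    "S": [("A", 3), ("B", 1), ("C", 8)],
    "A": [("D", 3), ("E", 7), ("G", 15)],
    "B": [("G", 20)],
    "C": [("G", 5)],
    "D": [], "E": [], "G": [],
}
H = {"S": 8, "A": 8, "B": 4, "C": 3, "D": math.inf, "E": math.inf, "G": 0}

def a_star(graph, h, start, goal):
    frontier = [(h[start], 0, start, [start])]     # entries are (f, g, state, path)
    expanded = []
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        expanded.append(state)
        if state == goal:
            return path, g, expanded
        for child, cost in graph[state]:
            g_child = g + cost
            heapq.heappush(
                frontier, (g_child + h[child], g_child, child, path + [child])
            )
    return None, None, expanded

path, cost, expanded = a_star(GRAPH, H, "S", "G")
print(path, cost, expanded)
# ['S', 'C', 'G'] 13 ['S', 'B', 'A', 'C', 'G']
```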
Dealing with Hard Problems
 For large problems, A* often requires too much space.
 Two variations conserve memory: IDA* and SMA*
 IDA* -- iterative deepening A* -- uses successive iteration
with growing limits on f:
 A* but don’t consider any node n where f(n) >10
 A* but don’t consider any node n where f(n) >20
 A* but don’t consider any node n where f(n) >30, ...
 SMA* -- Simplified Memory-Bounded A*
 uses a queue of restricted size to limit memory use.
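The following sketch shows the IDA* idea on the running example. It differs from the fixed steps of 10 listed above: it uses the common variant in which each new limit is the smallest f value that exceeded the previous limit. Arc costs and the h = ∞ entries are assumptions carried over from the example, and `ida_star` is a hypothetical name.

```python
import math

GRAPH = {
    "S": [("A", 3), ("B", 1), ("C", 8)],
    "A": [("D", 3), ("E", 7), ("G", 15)],
    "B": [("G", 20)],
    "C": [("G", 5)],
    "D": [], "E": [], "G": [],
}
H = {"S": 8, "A": 8, "B": 4, "C": 3, "D": math.inf, "E": math.inf, "G": 0}

def ida_star(graph, h, start, goal):
    def search(path, g, limit):
        node = path[-1]
        f = g + h[node]
        if f > limit:
            return None, f                 # report the f value that broke the limit
        if node == goal:
            return (path, g), f
        next_limit = math.inf
        for child, cost in graph[node]:
            found, t = search(path + [child], g + cost, limit)
            if found:
                return found, t
            next_limit = min(next_limit, t)
        return None, next_limit

    limit = h[start]
    limits_tried = []
    while limit < math.inf:
        limits_tried.append(limit)
        found, limit = search([start], 0, limit)   # depth-first, bounded by f
        if found:
            return found[0], found[1], limits_tried
    return None, None, limits_tried

path, cost, limits = ida_star(GRAPH, H, "S", "G")
print(path, cost, limits)   # ['S', 'C', 'G'] 13 [8, 11, 13]
```

Each iteration is a depth-first search, so memory stays linear in the depth of the path. The price is that shallow nodes are expanded again on every iteration.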
Local Search
Local Search
 Another approach to search involves starting with
an initial guess at a solution and gradually
improving it until it is a legal solution or the best
that can be found.
 Also known as “incremental improvement” search
 Some examples:
 Hill climbing
 Genetic algorithms
Hill Climbing on a Surface of States
Height Defined by
Evaluation
Function
Hill-Climbing Search
 If there exists a successor s for the current state n
such that
 h(s) < h(n)
 h(s) ≤ h(t) for all the successors t of n,
 then move from n to s. Otherwise, halt at n.
 Looks one step ahead to determine if any successor
is better than the current state; if there is, move to the
best successor.
 Similar to Greedy search in that it uses h, but does
not allow backtracking or jumping to an alternative
path since it doesn’t “remember” where it has been.
 Not complete since the search will terminate at "local
minima," "plateaus," and "ridges."
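The rule above can be sketched for the 8-puzzle with h = number of tiles out of place (to be minimized). The tuple board encoding with 0 for the blank, the goal layout, and the function names are assumptions made for illustration.

```python
GOAL = (1, 2, 3,
        8, 0, 4,
        7, 6, 5)      # 0 marks the blank square

def tiles_out_of_place(board):
    return sum(1 for tile, target in zip(board, GOAL) if tile != 0 and tile != target)

def successors(board):
    i = board.index(0)
    row, col = divmod(i, 3)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # move blank up, down, left, right
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            j = 3 * r + c
            nxt = list(board)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            result.append(tuple(nxt))
    return result

def hill_climb(board, h=tiles_out_of_place):
    path = [board]
    while True:
        best = min(successors(board), key=h)
        if h(best) >= h(board):
            return path            # no strictly better successor: halt here
        board = best
        path.append(board)

easy = hill_climb((1, 2, 3, 0, 8, 4, 7, 6, 5))
stuck = hill_climb((2, 8, 3, 1, 6, 4, 7, 0, 5))
print(easy[-1] == GOAL, tiles_out_of_place(stuck[-1]))   # True 3
```

The second start state takes one improving step and then halts with three tiles still out of place, because every successor is equal or worse. That is a plateau, as the last bullet describes.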
Hill Climbing Example
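[Figure: hill-climbing path on the 8-puzzle; _ marks the blank. Successors that are not chosen carry values between -5 and -2.]
start   2 8 3 / 1 6 4 / 7 _ 5    h = -4
        2 8 3 / 1 _ 4 / 7 6 5    h = -3
        2 _ 3 / 1 8 4 / 7 6 5    h = -3
        _ 2 3 / 1 8 4 / 7 6 5    h = -2
        1 2 3 / _ 8 4 / 7 6 5    h = -1
goal    1 2 3 / 8 _ 4 / 7 6 5    h = 0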
f(n) = -(number of tiles out of place)
Drawbacks of Hill Climbing
 Problems:
 Local Maxima: peaks that aren’t the highest point in the
space
 Plateaus: the space has a broad flat region that gives
the search algorithm no direction (random walk)
 Ridges: flat like a plateau, but with dropoffs to the sides;
steps to the North, East, South and West may go down,
but a step to the NW may go up.
 Remedies:
 Random restart
 Problem reformulation
 Some problem spaces are great for hill climbing and others
are terrible.
Example of a Local Optimum
[Figure: a start board and its successor boards, with evaluation values of -3 and -4, illustrating a local optimum; the goal board 1 2 3 / 8 _ 4 / 7 6 5 has value 0.]
Genetic Algorithms
 Start with k random states (the initial population)
 New states are generated by “mutating” a single state or
“reproducing” (combining) two parent states (selected
according to their fitness)
 Encoding used for the “genome” of an individual strongly
affects the behavior of the search
 Genetic algorithms / genetic programming are a large and
active area of research
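A very small genetic algorithm for the 8-queens problem might look like the sketch below. Every design choice here is an assumption made for illustration: the genome (one row number per column), the fitness (non-attacking pairs), single-point crossover, one-gene mutation, keeping the best state each generation, and all parameter values.

```python
import random

def conflicts(state):
    """Number of attacking queen pairs; state[c] is the row of the queen in column c."""
    n = len(state)
    return sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if state[a] == state[b] or abs(state[a] - state[b]) == b - a
    )

def genetic_search(n=8, k=20, generations=200, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    max_pairs = n * (n - 1) // 2

    def fitness(state):
        return max_pairs - conflicts(state)            # higher is better

    population = [tuple(rng.randrange(n) for _ in range(n)) for _ in range(k)]
    history = []                                       # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if conflicts(population[0]) == 0:
            break
        weights = [fitness(s) + 1 for s in population] # fitter parents are picked more often
        children = [population[0]]                     # keep the best state unchanged
        while len(children) < k:
            mom, dad = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, n)
            child = list(mom[:cut] + dad[cut:])        # "reproduce" two parents
            if rng.random() < mutation_rate:
                child[rng.randrange(n)] = rng.randrange(n)   # "mutate" one gene
            children.append(tuple(child))
        population = children
    return max(population, key=fitness), history

best, history = genetic_search()
print(best, conflicts(best))
```

Because the best state is always carried over, the best fitness never decreases from one generation to the next. Whether a conflict-free board is found depends on the seed and the parameters.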
Summary: Informed Search
 Best-first search is general search where the minimum-cost nodes (according
to some measure) are expanded first.
 Greedy search uses minimal estimated cost h(n) to the goal state as measure.
This reduces the search time, but the algorithm is neither complete nor optimal.
 A* search combines uniform-cost search and greedy search: f(n) = g(n) + h(n).
A* handles state repetitions and h(n) never overestimates.
 A* is complete and optimal, but space complexity is high.
 The time complexity depends on the quality of the heuristic function.
 Hill-climbing algorithms keep only a single state in memory, but can get
stuck on local optima.
 Genetic algorithms can search a large space by modeling biological
evolution.
Game Playing
Why Study Games?
 Clear criteria for success
 Offer an opportunity to study problems involving {hostile,
adversarial, competing} agents.
 Historical reasons
 Fun
 Interesting, hard problems which require minimal “initial
structure”
 Games often define very large search spaces
 Chess has about 35^100 nodes in its search tree and 10^40 legal states
State of the Art
 How good are computer game players?
 Chess:
Deep Blue beat Garry Kasparov in 1997
Garry Kasparov vs. Deep Junior (Feb 2003): tie!
Kasparov vs. X3D Fritz (November 2003): tie!
http://www.cnn.com/2003/TECH/fun.games/11/19/kasparov.chess.ap/
 Checkers: Chinook (an AI program with a very large endgame
database) is the world champion (checkers is “solved”!)
 Go: Computer players are decent, at best
 Bridge: “Expert-level” computer players exist (but no world
champions yet!)
 Good places to learn more:
 http://www.cs.ualberta.ca/~games/
 http://www.cs.unimass.nl/icga
Chinook
 Chinook is the World Man-Machine Checkers
Champion, developed by researchers at the University
of Alberta.
 It earned this title by competing in human tournaments,
winning the right to play for the (human) world
championship, and eventually defeating the best
players in the world.
 Visit http://www.cs.ualberta.ca/~chinook/ to play a
version of Chinook over the Internet.
 The developers have fully analyzed the game of checkers: perfect
play by both sides leads to a draw, so Chinook can provably never
lose
 “One Jump Ahead: Challenging Human Supremacy in
Checkers” Jonathan Schaeffer, University of Alberta
(496 pages, Springer. $34.95, 1998).
Ratings of Human and Computer Chess Champions
Typical Game Setting
 2-person game
 Players alternate moves
 Zero-sum: one player’s loss is the other’s gain
 Perfect information: both players have access to
complete information about the state of the game. No
information is hidden from either player.
 No chance (e.g., using dice) involved
 Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim,
Othello
 Not: Bridge, Solitaire, Backgammon, ...
How to Play a Game
 A way to play such a game is to:
 Consider all the legal moves you can make
 Compute the new position resulting from each move
 Evaluate each resulting position and determine which is best
 Make that move
 Wait for your opponent to move and repeat
 Key problems are:
 Representing the “board”
 Generating all legal next boards
 Evaluating a position
Evaluation Function
 Evaluation function or static evaluator is used to
evaluate the “goodness” of a game position.
 Contrast with heuristic search where the evaluation function
was a non-negative estimate of the cost from the start node
to a goal and passing through the given node
 The zero-sum assumption allows us to use a single
evaluation function to describe the goodness of a
board with respect to both players.
 f(n) >> 0: position n good for me and bad for you
 f(n) << 0: position n bad for me and good for you
 f(n) near 0: position n is a neutral position
 f(n) = +infinity: win for me
 f(n) = -infinity: win for you
Evaluation Function Examples
 Example of an evaluation function for Tic-Tac-Toe:
f(n) = [# of 3-lengths open for me] - [# of 3-lengths open for you]
where a 3-length is a complete row, column, or diagonal
 Alan Turing’s function for chess
 f(n) = w(n)/b(n) where w(n) = sum of the point value of white’s pieces
and b(n) = sum of black’s
 Most evaluation functions are specified as a weighted sum of
position features:
f(n) = w1*feat1(n) + w2*feat2(n) + ... + wk*featk(n)
 Example features for chess are piece count, piece placement,
squares controlled, etc.
 Deep Blue had over 8000 features in its evaluation function
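The Tic-Tac-Toe evaluation function above is easy to compute. This is a sketch assuming a board stored as a 9-character string in row-major order with "." for empty squares, and reading "open for me" as "contains none of the opponent's marks". The names are hypothetical.

```python
# The eight 3-lengths: rows, columns, and diagonals, as index triples.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]

def open_lines(board, player):
    """Count the 3-lengths that contain none of the opponent's marks."""
    opponent = "O" if player == "X" else "X"
    return sum(1 for line in LINES if all(board[i] != opponent for i in line))

def evaluate(board, me="X"):
    you = "O" if me == "X" else "X"
    return open_lines(board, me) - open_lines(board, you)

print(evaluate("....X...."))   # X alone in the center: 8 - 4 = 4
print(evaluate("O...X...."))   # O then takes a corner: 5 - 4 = 1
```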
Game Trees
 Problem spaces for typical games are
represented as trees
 Root node represents the current
board configuration; player must decide
the best single move to make next
 Static evaluator function rates a
board position. f(board) = real number,
with f>0 good for white (me) and f<0 good for black
(you)
 Arcs represent the possible legal
moves for a player
 If it is my turn to move, then the root is labeled a "MAX" node;
otherwise it is labeled a "MIN" node, indicating my opponent's turn.
 Each level of the tree has nodes that are all MAX or all MIN; nodes at
level i are of the opposite kind from those at level i+1
Minimax Procedure
 Create start node as a MAX node with current board
configuration
 Expand nodes down to some depth (a.k.a. ply) of lookahead in
the game
 Apply the evaluation function at each of the leaf nodes
 “Back up” values for each of the non-leaf nodes until a value is
computed for the root node
 At MIN nodes, the backed-up value is the minimum of the values
associated with its children.
 At MAX nodes, the backed-up value is the maximum of the values
associated with its children.
 Pick the operator associated with the child node whose backed-up value determined the value at the root
Minimax Algorithm
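[Figure: two-ply game tree. The leaves have static evaluator values 2, 7, 1, 8. The two MIN nodes back up the values 2 and 1. The MAX root backs up the value 2, and the move leading to the MIN node with value 2 is the move selected by minimax.]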
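The backing-up step is a short recursion. This is a sketch in which a game tree is written as nested lists whose leaves are static evaluator values; that representation and the function names are assumptions. The tree uses the leaf values 2, 7, 1, 8 from the example.

```python
def minimax(node, maximizing=True):
    """Back up values: a node is either a leaf value or a list of child nodes."""
    if not isinstance(node, list):
        return node                         # leaf: static evaluator value
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def best_move(children):
    """Index of the root move whose backed-up value is largest, plus all values."""
    values = [minimax(child, maximizing=False) for child in children]
    return max(range(len(children)), key=values.__getitem__), values

tree = [[2, 7], [1, 8]]                     # MAX root, two MIN children
print(minimax(tree))                        # 2
print(best_move(tree))                      # (0, [2, 1])
```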
Partial Game Tree for Tic-Tac-Toe
• f(n) = +1 if the position is a win for X.
• f(n) = -1 if the position is a win for O.
• f(n) = 0 if the position is a draw.