Note 5: Games

Notes adapted from lecture notes for CMSC 421 by B.J. Dorr
Game Playing
Outline
• What are games?
• Optimal decisions in games
– Which strategy leads to success?
• α-β pruning
• Games of imperfect information
• Games that include an element of chance
What are games, and why study them?
• Games are a form of multi-agent environment
– What do other agents do and how do they affect our
success?
– Cooperative vs. competitive multi-agent environments.
– Competitive multi-agent environments give rise to
adversarial problems a.k.a. games
• Why study games?
– Fun; historically entertaining
– Interesting subject of study because they are hard
– Easy to represent, and agents are restricted to a small
number of actions
Relation of Games to Search
• Search – no adversary
– Solution is (heuristic) method for finding goal
– Heuristics and CSP techniques can find optimal solution
– Evaluation function: estimate of cost from start to goal through
given node
– Examples: path planning, scheduling activities
• Games – adversary
– Solution is strategy (strategy specifies move for every
possible opponent reply).
– Time limits force an approximate solution
– Evaluation function: evaluate “goodness” of
game position
– Examples: chess, checkers, Othello, backgammon
Types of Games
• Deterministic, perfect information: chess, checkers, go, othello
• Chance, perfect information: backgammon, monopoly
• Deterministic, imperfect information: battleships, blind tic-tac-toe
• Chance, imperfect information: bridge, poker, scrabble
Game setup
• Two players: MAX and MIN
• MAX moves first and they take turns until the game
is over. The winner gets a reward, the loser a penalty.
• Games as search:
– Initial state: e.g. board configuration of chess
– Successor function: list of (move,state) pairs specifying
legal moves.
– Terminal test: Is the game finished?
– Utility function: Gives numerical value of terminal states.
E.g. win (+1), lose (-1) and draw (0) in tic-tac-toe (next)
• MAX uses search tree to determine next move.
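As a concrete sketch of this formulation, here is a minimal Python interface (the Game class and method names are illustrative, not from the notes):

# Illustrative interface: a game formulated as a search problem.
class Game:
    def initial_state(self):
        """E.g. the starting board configuration of chess."""
        raise NotImplementedError

    def successors(self, state):
        """Yield (move, state) pairs, one per legal move."""
        raise NotImplementedError

    def terminal_test(self, state):
        """Return True if the game is finished in this state."""
        raise NotImplementedError

    def utility(self, state):
        """Numeric value of a terminal state for MAX,
        e.g. win = +1, lose = -1, draw = 0 in tic-tac-toe."""
        raise NotImplementedError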
Partial Game Tree for Tic-Tac-Toe
Optimal strategies
• Find the contingent strategy for MAX assuming an
infallible MIN opponent.
• Assumption: both players play optimally!
• Given a game tree, the optimal strategy can be
determined by using the minimax value of each
node:
MINIMAX-VALUE(n) =
  UTILITY(n)                                   if n is a terminal node
  max_{s ∈ successors(n)} MINIMAX-VALUE(s)     if n is a MAX node
  min_{s ∈ successors(n)} MINIMAX-VALUE(s)     if n is a MIN node
Two-Ply Game Tree
• MAX nodes
• MIN nodes
• Terminal nodes – utility values for MAX
• Other nodes – minimax values
• MAX’s best move at root – a1 – leads to the
successor with the highest minimax value
• MIN’s best reply – b1 – leads to the
successor with the lowest minimax value
What if MIN does not play optimally?
• Definition of optimal play for MAX assumes
MIN plays optimally: maximizes worst-case
outcome for MAX.
• But if MIN does not play optimally, MAX will
do even better. [Can be proved.]
Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
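The same algorithm as runnable Python against the Game interface sketched earlier (a sketch; successors is assumed to yield (move, state) pairs):

def minimax_decision(game, state):
    # Choose the move whose successor state has the highest minimax value.
    return max(game.successors(state),
               key=lambda pair: min_value(game, pair[1]))[0]

def max_value(game, state):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('-inf')
    for _, s in game.successors(state):
        v = max(v, min_value(game, s))
    return v

def min_value(game, state):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('inf')
    for _, s in game.successors(state):
        v = min(v, max_value(game, s))
    return v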
Properties of Minimax

Criterion    Minimax
Complete?    Yes (if the tree is finite)
Time         O(b^m)
Space        O(bm) (depth-first exploration)
Optimal?     Yes (against an optimal opponent)

(b: branching factor, m: maximum depth of the tree)
Tic-Tac-Toe
• Depth bound 2
• Breadth-first search until all nodes at level 2
are generated
• Apply evaluation function to positions at
these nodes
Tic-Tac-Toe
• Evaluation function e(p) of a position p
– If p is not a winning position for either player:
e(p) = (number of complete rows, columns, or
diagonals that are still open for MAX) − (number
of complete rows, columns, or diagonals that
are still open for MIN)
– If p is a win for MAX: e(p) = +∞
– If p is a win for MIN: e(p) = −∞
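A possible implementation of e(p) as a Python sketch; the 9-square board encoding ('x' for MAX, 'o' for MIN, None for empty) is our assumption, not from the notes:

import math

# The eight complete lines of a tic-tac-toe board (rows, columns, diagonals).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def e(p):
    # Wins are scored plus or minus infinity.
    for a, b, c in LINES:
        if p[a] is not None and p[a] == p[b] == p[c]:
            return math.inf if p[a] == 'x' else -math.inf
    # Otherwise: lines still open for MAX minus lines still open for MIN.
    open_for_max = sum(all(p[i] != 'o' for i in line) for line in LINES)
    open_for_min = sum(all(p[i] != 'x' for i in line) for line in LINES)
    return open_for_max - open_for_min

# Example: on the empty board every line is open for both players, so e(p) = 8 - 8 = 0.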
Tic-Tac-Toe - First stage

[Figure: depth-2 search tree for MAX's first move. Frontier positions are scored with e(p), e.g. 6-5=1, 5-6=-1, 4-5=-1; the backed-up values (1, -1, -2) make the center square MAX's best first move.]
Tic-Tac-Toe - Second stage

[Figure: depth-2 search tree for MAX's second move. Frontier positions are again scored with e(p), e.g. 4-2=2, 3-3=0, 5-2=3; the backed-up values determine MAX's next move.]
Multiplayer games
• Some games allow more than two players
• The single minimax value becomes a vector of values, one per player
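A hedged sketch of the vector idea in Python: each node backs up a tuple of utilities, one per player, and the player to move picks the successor maximizing its own component (utility_vector and to_move are assumed helpers, not from the notes):

def vector_value(game, state):
    # Returns a tuple of utilities, one entry per player.
    if game.terminal_test(state):
        return game.utility_vector(state)      # assumed helper
    player = game.to_move(state)               # assumed helper: index of the player to move
    return max((vector_value(game, s) for _, s in game.successors(state)),
               key=lambda vec: vec[player])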
Problem of minimax search
• The number of game states is exponential in the
number of moves.
– Solution: do not examine every node
– ==> Alpha-beta pruning
• Alpha = value of the best choice found so far at any
choice point along the path for MAX
• Beta = value of the best choice found so far at any
choice point along the path for MIN
• Revisit example …
Tic-Tac-Toe - First stage revisited

[Figure: the first-stage tree again. Node A and its successors back up the value -1, so the start node's alpha value is -1. Node B's first successor C has static value -1, so node B's beta value is -1 and search below B can be cut off.]
Alpha-beta pruning
• Search progresses in a depth-first manner
• Whenever a tip node is generated, its static
evaluation is computed
• Whenever a position can be given a backed-up
value, this value is computed
• Node A and all its successors have been generated
– Backed-up value: -1
• Node B and its successors have not yet been
generated
• Now we know that the backed-up value of the start
node is bounded from below by -1
Alpha-beta pruning
• Depending on the backed-up values of the
other successors of the start node, the final
backed-up value of the start node may be
greater than -1, but it cannot be less
• This lower bound is the alpha value of the start
node
Alpha-beta pruning
• Depth-first search proceeds – node B and its
first successor node C are generated
• Node C is given a static value of -1
• The backed-up value of node B is bounded from
above by -1
• This upper bound on node B is its beta value
• Therefore search below node B can be discontinued
– Node B will not turn out to be preferable to node
A
Alpha-beta pruning
• Reduction in search effort is achieved by
keeping track of bounds on backed-up values
• As successors of a node are given backed-up
values, the bounds can be revised
– Alpha values of MAX nodes can never
decrease
– Beta values of MIN nodes can never increase
Alpha-beta pruning
• Therefore search can be discontinued
– Below any MIN node having a beta value less
than or equal to the alpha value of any of its
MAX node ancestors
• The final backed-up value of this MIN node can then
be set to its beta value
– Below any MAX node having an alpha value
greater than or equal to the beta value of any of
its MIN node ancestors
• The final backed-up value of this MAX node can then
be set to its alpha value
Alpha-beta pruning
• During search, alpha and beta values are
computed as follows:
– The alpha value of a MAX node is set equal to
the current largest final backed-up value of its
successors
– The beta value of a MIN node is set equal
to the current smallest final backed-up value of
its successors
Alpha-Beta Example

• Do depth-first search until the first leaf; every node carries a range
[α, β] of possible values, initially [-∞, +∞].
• The first MIN node's range narrows to [-∞, 3] after its first leaf;
once all its successors are seen it becomes [3, 3], so the root's
range becomes [3, +∞].
• The second MIN node's first leaf narrows it to [-∞, 2]. This node is
worse for MAX (its value can never exceed 2 < 3), so its remaining
successors are pruned.
• The third MIN node narrows from [-∞, 14] to [-∞, 5] and finally
[2, 2]; the root's range meanwhile goes [3, 14], [3, 5], [3, 3].
• The root's final value is [3, 3]: MAX's best move leads to the first
successor, with minimax value 3.
Alpha-Beta Algorithm
function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v
Alpha-Beta Algorithm
function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
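The same search as runnable Python, again a sketch against the illustrative Game interface from earlier:

def alpha_beta_search(game, state):
    best_move, alpha = None, float('-inf')
    for move, s in game.successors(state):
        v = ab_min_value(game, s, alpha, float('inf'))
        if v > alpha:
            alpha, best_move = v, move
    return best_move

def ab_max_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('-inf')
    for _, s in game.successors(state):
        v = max(v, ab_min_value(game, s, alpha, beta))
        if v >= beta:
            return v                  # beta cut-off: a MIN ancestor avoids this node
        alpha = max(alpha, v)
    return v

def ab_min_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('inf')
    for _, s in game.successors(state):
        v = min(v, ab_max_value(game, s, alpha, beta))
        if v <= alpha:
            return v                  # alpha cut-off: a MAX ancestor avoids this node
        beta = min(beta, v)
    return v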
General alpha-beta pruning
• Consider a node n somewhere in the tree
• If player has a better choice at
– the parent node of n
– or any choice point further up
• then n will never be reached in
actual play
• Hence when enough is known about n, it can be
pruned
Final Comments about Alpha-Beta Pruning
• Pruning does not affect the final result
• Entire subtrees can be pruned
• Good move ordering improves the effectiveness of
pruning
• With good ordering, alpha-beta can search roughly twice
as deep as minimax in the same amount of time
• Repeated states are again possible
– Store them in memory: a transposition table
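A minimal transposition-table sketch in Python (position_key is an assumed helper returning a hashable encoding of the board; max_value is from the minimax sketch earlier):

transposition = {}

def minimax_with_table(game, state):
    key = game.position_key(state)    # assumed helper
    if key not in transposition:
        transposition[key] = max_value(game, state)
    return transposition[key]

# Caveat: with alpha-beta, a stored entry must also record the search depth
# and whether the value is exact or only a bound, since values computed
# under a narrowed [alpha, beta] window are bounds, not exact values.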
Games that include chance
• Backgammon: dice rolls are modeled as chance nodes
• Possible moves for a dice roll of 6-5: (5-10,5-11),
(5-11,19-24), (5-10,10-16) and (5-11,11-16)
• [1,1] and [6,6] have chance 1/36; every other roll has
chance 1/18
• Cannot calculate a definite minimax value, only an
expected value
Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                              if n is a terminal node
  max_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)       if n is a MAX node
  min_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)       if n is a MIN node
  Σ_{s ∈ successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s)  if n is a chance node

These equations can be backed up recursively
all the way to the root of the game tree.
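A sketch of this recursion in Python; node_type and chance_outcomes are assumed helpers that classify a node and enumerate its (probability, state) dice outcomes:

def expected_minimax(game, state):
    if game.terminal_test(state):
        return game.utility(state)
    kind = game.node_type(state)      # assumed helper: 'max', 'min', or 'chance'
    if kind == 'max':
        return max(expected_minimax(game, s) for _, s in game.successors(state))
    if kind == 'min':
        return min(expected_minimax(game, s) for _, s in game.successors(state))
    # Chance node: probability-weighted average over dice outcomes.
    return sum(p * expected_minimax(game, s)
               for p, s in game.chance_outcomes(state))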
Position evaluation with chance nodes

[Figure: two game trees identical except for the scale of the leaf values; in the left tree move A1 wins, in the right tree A2 wins.]

• The outcome of the evaluation may therefore change when
leaf values are scaled differently.
• Behavior is preserved only by a positive linear
transformation of EVAL.
Discussion
• Examine the section on state-of-the-art game programs yourself
• Minimax assumes the right-hand tree is better than the
left-hand one, yet …
– Better: return a probability distribution over possible values
– But that is an expensive calculation
Discussion
• Utility of node expansion
– Only expand those nodes which lead to significantly
better moves
• Both suggestions require meta-reasoning
Summary
• Games are fun (and dangerous)
• They illustrate several important points about
AI
– Perfection is unattainable -> must approximate
– It is a good idea to think about what to think about
– Uncertainty constrains the assignment of values
to states
• Games are to AI as grand prix racing is to
automobile design.