Game Trees

Von Neumann (minimax theorem)
John McCarthy (alpha-beta pruning)
Chaturanga, India (~550 AD) (proto-chess)
Claude Shannon (finite look-ahead)
Donald Knuth (alpha-beta analysis)
Wilmer McLean: "The war began in my front yard and ended in my front parlor."
Deep Thought: Chess would be easy but for the pesky opponent.
Search: If I do A, then I will be in S; then if I do B, I will get to S'.
Game Search: If I do A, then I will be in S; then my opponent gets to do B, and I will be forced to S'. Then I get to do C, ...
Kriegspiel (blindfold chess)
Snakes & Ladders is perfect information with chance.
Think of the utter boringness of deterministic snakes and ladders. Not that normal snakes-and-ladders has any real scope for showing your thinking power: your only action is dictated by the dice, so the dice can play it as a solitaire; at most they need your hand.
Searching Tic Tac Toe using Minimax
A game is considered solved if it can be shown that the MAX player has a winning (or at least non-losing) strategy. This means that the backed-up value at the root of the full minimax tree is positive (or at least non-negative, for a non-losing strategy).
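Below is a minimal minimax sketch for tic-tac-toe (the board encoding, a tuple of nine cells, and all helper names are illustrative assumptions, not from the slides). The backed-up value at the empty board comes out 0, i.e., Max has a non-losing strategy: tic-tac-toe is solved as a draw.

```python
# Illustrative sketch: full-tree minimax for tic-tac-toe.
# Board: tuple of 9 cells, each 'X' (Max), 'O' (Min), or None.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, to_move):
    """Backed-up value of `board`: +1 win for Max, -1 loss, 0 draw."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0                                # board full: draw
    nxt = 'O' if to_move == 'X' else 'X'
    values = [minimax(board[:i] + (to_move,) + board[i + 1:], nxt)
              for i in moves]
    return max(values) if to_move == 'X' else min(values)

# Value of the full tree from the empty board (slow but fine for 3x3):
print(minimax((None,) * 9, 'X'))                # -> 0 (non-losing for Max)
```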
[Figure: alpha-beta pruning example tree; interior nodes annotated with bounds <= 2, <= 5, <= 14, and a cutoff below the node bounded at 2]
• Whenever a node gets its "true" value, its parent's bound gets updated.
• When all children of a node have been evaluated (or a cutoff occurs below that node), the current bound of that node is its true value.
• Two types of cutoffs (a sketch follows):
  – If a min node n has bound <= k, and a max ancestor of n, say m, has a bound >= j, then a cutoff occurs as long as j >= k.
  – If a max node n has bound >= k, and a min ancestor of n, say m, has a bound <= j, then a cutoff occurs as long as j <= k.
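A minimal alpha-beta sketch implementing both cutoff rules (the helper callables `children(node)` and `evaluate(node)` are assumptions): alpha carries the best bound of any max ancestor and beta the best bound of any min ancestor, and we prune when the bounds cross.

```python
import math

def alpha_beta(node, depth, alpha, beta, maximizing, children, evaluate):
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alpha_beta(child, depth - 1, alpha, beta,
                                          False, children, evaluate))
            alpha = max(alpha, value)      # max node raises its lower bound
            if alpha >= beta:              # a min ancestor's bound <= alpha: cut
                break
        return value
    value = math.inf
    for child in kids:
        value = min(value, alpha_beta(child, depth - 1, alpha, beta,
                                      True, children, evaluate))
        beta = min(beta, value)            # min node lowers its upper bound
        if beta <= alpha:                  # a max ancestor's bound >= beta: cut
            break
    return value

# Typical root call: alpha_beta(root, d, -math.inf, math.inf, True, ...)
```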
Another alpha-beta example (order nodes in terms of their static eval values)
Evaluation Functions: TicTacToe

Eval(s) =
  +infty   if s is a win for Max
  -infty   if s is a loss for Max
  0        if s is a draw
  (# rows/cols/diags open for Max) - (# rows/cols/diags open for Min)   otherwise
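A sketch of this evaluation function in Python (same illustrative board encoding as the minimax sketch above; a line counts as "open" for a player if it contains no opponent pieces):

```python
import math

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def evaluate(board):
    """Open-lines evaluation; board is a tuple of 'X', 'O', or None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return math.inf if board[a] == 'X' else -math.inf
    if all(cell is not None for cell in board):
        return 0                                   # draw
    def open_for(player):                          # lines with no opponent piece
        return sum(1 for line in LINES
                   if all(board[i] in (player, None) for i in line))
    return open_for('X') - open_for('O')
```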
What depth should we go to?
-- The deeper the better (but why?)
Should we go to uniform depth?
-- Go deeper in branches where the game is in flux (backed-up values are changing fast) [called "quiescence"; a sketch follows]
Can we avoid the horizon effect?
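One way to implement the quiescence idea, sketched under assumptions (`children`, `evaluate`, and both thresholds are illustrative choices): keep extending past the nominal depth limit while the static value is still changing fast, with a hard floor so the extension terminates.

```python
QUIESCENCE_THRESHOLD = 2   # assumed: how much swing counts as "in flux"
MAX_EXTENSION = 4          # assumed hard floor so extensions terminate

def quiescent_value(node, depth, maximizing, children, evaluate):
    kids = children(node)
    if not kids:
        return evaluate(node)                 # terminal: value is exact
    if depth <= 0:
        static = evaluate(node)
        if depth <= -MAX_EXTENSION:
            return static                     # hard cutoff
        child_vals = [evaluate(c) for c in kids]
        best = max(child_vals) if maximizing else min(child_vals)
        if abs(best - static) < QUIESCENCE_THRESHOLD:
            return static                     # quiet position: trust the eval
        # value still in flux: fall through and search one level deeper
    vals = [quiescent_value(c, depth - 1, not maximizing, children, evaluate)
            for c in kids]
    return max(vals) if maximizing else min(vals)
```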
Depth Cutoff and Online Search
• Until now we considered mostly "all or nothing" computations
  – The computation takes the time it takes, and only at the end will it give any answer
• When the agent has to make decisions online, it needs flexibility in the time it can devote to "thinking" ("deliberation scheduling")
  – Can't do that with all-or-nothing computations; we need flexible or anytime computations
• Depth-limited minimax is an example of an anytime computation (a sketch follows)
  – Pick a small depth limit. Do the analysis w.r.t. that tree. Decide the best move. Keep it as a backup. If you have more time, go deeper and get a better move.
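A sketch of that anytime loop as iterative deepening over the depth limit (`legal_moves`, `result`, `minimax_value`, and the time budget are all assumed helpers, not from the slides):

```python
import time

def anytime_decision(state, budget_seconds, legal_moves, result, minimax_value):
    """Deepen while time remains; always keep the last completed answer."""
    deadline = time.monotonic() + budget_seconds
    best_move = None
    depth = 1
    while time.monotonic() < deadline:
        # Finish the analysis at this depth, then keep its move as backup.
        # (The budget is only checked between iterations, so the last one
        # may overrun; a real implementation would also check the clock
        # inside the search.)
        best_move = max(legal_moves(state),
                        key=lambda m: minimax_value(result(state, m), depth))
        depth += 1
    return best_move        # best move found when time runs out
```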
Online search is not guaranteed to be optimal. The agent may not even survive unless the world is ergodic (non-zero probability of reaching any state from any other state).
Why is "deeper" better?
• Possible reasons
  – Taking mins/maxes of the evaluation values of the leaf nodes improves their collective accuracy
  – Going deeper makes the agent notice "traps", thus significantly improving the evaluation accuracy
    • All evaluation functions first check for termination states before computing the non-terminal evaluation
If this is indeed the case, then we should remember the backed-up values for game positions, since they are better than straight evaluations (just as human weight lifters refuse to compete against cranes).
Uncertain Actions & Games Against Nature
[can generalize to have action costs C(a,s)]
If the transition matrix Mij (the probability that doing an action in state i leads to state j) is not known a priori, then we have a reinforcement learning scenario.
[Figure: expectimax tree for the 4x3 grid world, rooted at state (3,2). Each action's outcomes, e.g. (3,3), (4,2), (3,1), occur with probabilities 0.8/0.1/0.1; leaf values shown are the immediate rewards (-1 at the terminal (4,2), -0.04 elsewhere).]
Leaf node values have been set to their immediate rewards. We can do better if we set them to an estimate of their expected value.
This is a game against nature, and nature decides which outcome of each action will occur. How do you think it will decide?
• I am the chosen one: so nature will decide the course that is most beneficial to me [Max-Max]
• I am the loser: so nature will decide the course that is least beneficial to me [Min-Max]
• I am a rationalist: nature is oblivious of me and does what it does, so I do "expectation analysis" (the three backups are sketched below)
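The three attitudes, written as backup rules over an action's outcomes (a sketch; the helper yielding (probability, value) pairs and the numbers below are assumptions for illustration):

```python
def optimist_backup(outcomes):     # "chosen one": Max-Max
    return max(v for _, v in outcomes)

def pessimist_backup(outcomes):    # "loser": Min-Max
    return min(v for _, v in outcomes)

def expectation_backup(outcomes):  # "rationalist": expectimax
    return sum(p * v for p, v in outcomes)

# e.g. an action with outcome probabilities 0.8/0.1/0.1
# (the outcome values here are made up for illustration):
outcomes = [(0.8, 0.5), (0.1, -1.0), (0.1, 0.2)]
print(optimist_backup(outcomes),     # 0.5
      pessimist_backup(outcomes),    # -1.0
      expectation_backup(outcomes))  # ~0.32
```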
Real Time Dynamic Programming (RTDP)
• Interleave "search" and "execution" (Real Time Dynamic Programming)
• Do limited-depth analysis based on reachability to find the value of a state (and thereby the best action you should be doing, which is the action that is sending you the best value)
• The values of the leaf nodes are set to their immediate rewards
  – Alternatively, to some admissible estimate of the value function (h*)
• If all the leaf nodes are terminal nodes, then the backed-up value will be the true optimal value; otherwise, it is an approximation (a sketch follows)
For leaf nodes, can use R(s) or some heuristic value h(s)
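A sketch of the limited-depth RTDP backup (all names are assumptions: `actions(s)`, `transitions(s, a)` returning (probability, next state) pairs, rewards `R(s)`, leaf estimate `h(s)`):

```python
def value(s, depth, actions, transitions, R, h, is_terminal):
    if is_terminal(s):
        return R(s)                  # terminal leaf: true value
    if depth == 0:
        return h(s)                  # non-terminal leaf: R(s) or admissible h(s)
    return R(s) + max(q_value(s, a, depth, actions, transitions, R, h,
                              is_terminal)
                      for a in actions(s))

def q_value(s, a, depth, actions, transitions, R, h, is_terminal):
    # Expected value of doing a in s (the "expectation analysis" backup)
    return sum(p * value(s2, depth - 1, actions, transitions, R, h, is_terminal)
               for p, s2 in transitions(s, a))

def best_action(s, depth, actions, transitions, R, h, is_terminal):
    # Execute this action, observe the state that actually results,
    # and search again from there (interleaving search and execution).
    return max(actions(s), key=lambda a: q_value(s, a, depth, actions,
                                                 transitions, R, h, is_terminal))
```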
If you are a perpetual optimist, then V2 = max(V3, V4).
If you have deterministic actions, then RTDP becomes RTA* (if you use h(.) to evaluate leaves).
The expected-value computation is fine if you are maximizing "expected" return. But what if you are risk-averse (and think "nature" is out to get you)? Then V2 = min(V3, V4).
RTA* (RTDP with deterministic actions and leaves evaluated by f(.))
[Figure: RTA* example tree rooted at S, expanding nodes n, m, and k; leaves carry g/h/f annotations (e.g. g=1, h=2, f=3 and g=2, h=3, f=5), and one branch backs up a value of infinity.]
-- Grow the tree to depth d
-- Apply the f-evaluation to the leaf nodes
-- Propagate f-values up to the parent nodes: f(parent) = min(f(children))
(a sketch follows)
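A sketch of that RTA* step (illustrative names; `children(s)` returns (edge cost, child) pairs and `h` is the heuristic, so f = g + h at the leaves):

```python
import math

def rta_star_value(s, depth, g, children, h, is_goal):
    if is_goal(s):
        return g                                 # goal reached: f = g
    if depth == 0:
        return g + h(s)                          # f-evaluation at the leaves
    kids = children(s)
    if not kids:
        return math.inf                          # dead end
    # f(parent) = min over the children's backed-up f-values
    return min(rta_star_value(c, depth - 1, g + cost, children, h, is_goal)
               for cost, c in kids)

def rta_star_step(s, depth, children, h, is_goal):
    # Move to the child whose subtree backs up the lowest f-value.
    cost, child = min(children(s),
                      key=lambda ck: rta_star_value(ck[1], depth - 1, ck[0],
                                                    children, h, is_goal))
    return child
```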
RTA* is a special case of RTDP.
-- It is useful for acting in deterministic, dynamic worlds,
-- while RTDP is useful for acting in stochastic, dynamic worlds.
LRTA*: can store backed-up values for states (and they will be better heuristics); a sketch follows.
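A sketch of that LRTA* idea (illustrative; `children(s)` returns (edge cost, child) pairs, `h` is the base heuristic, and `H` is the table of learned values):

```python
def lrta_star_step(s, H, children, h):
    """One LRTA* move: update H[s] with the backed-up value, then move."""
    def h_est(state):
        return H.get(state, h(state))     # learned value if stored, else h(state)
    # One-step lookahead f-values through each child
    f = {child: cost + h_est(child) for cost, child in children(s)}
    best_child = min(f, key=f.get)
    H[s] = f[best_child]                  # stored value: a better heuristic for s
    return best_child
```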
End of Game Trees
Game Playing (Adversarial Search)
• Perfect play
  – Do minimax on the complete game tree
• Alpha-beta pruning (a neat idea that is the bane of many a CSE471 student)
• Resource limits
  – Do limited-depth lookahead
  – Apply evaluation functions at the leaf nodes
  – Do minimax
• Miscellaneous
  – Games of chance
  – Status of computer games
Multi-player Games
Everyone maximizes their own utility.
-- How does this compare to 2-player games? (Max's utility is the negative of Min's; a max-n sketch follows)
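With more than two players, each node backs up the utility vector of the child that maximizes the component belonging to the player to move (the "max-n" backup; the names below are illustrative). With two players and u_Min = -u_Max, this reduces to ordinary minimax:

```python
def max_n(node, player, n_players, children, utilities):
    """Back up the utility vector favored by the player to move."""
    kids = children(node)
    if not kids:
        return utilities(node)            # terminal: one utility per player
    nxt = (player + 1) % n_players
    return max((max_n(c, nxt, n_players, children, utilities) for c in kids),
               key=lambda vec: vec[player])
```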
Expecti-Max
[figure-only slide]