Chapters 8, 9, 10, 11

Performance analysis of
Alpha-Beta Pruning



Since alpha-beta pruning performs a minimax search
while pruning much of the tree, its effect is to allow a
deeper search with the same amount of computation.
The question: how much does alpha-beta improve
performance?
The best way to characterize it is the asymptotic effective
branching factor:
 - the dth root of the number of nodes generated in a search to depth d,
   in the limit of large d
 - this is approximately the number of nodes generated at depth d divided
   by the number of nodes generated at depth d-1.

Performance analysis of
Alpha-Beta Pruning


The efficiency of alpha-beta pruning depends upon the
order in which nodes are encountered at the search
frontier.
Thus, we consider 3 different cases:
 - best case
 - average case
 - worst case: the algorithm doesn't perform any cutoffs at all
Example of alpha-beta worst case

[Figure: game tree with MAX at the root and MIN at the next level, evaluated from left to right, in which alpha-beta performs no cutoffs.]
Lower Bound for Minimax
Algorithms



We consider a lower bound on the number of leaf nodes
that must be examined by any minimax algorithm.
A correct minimax algorithm is guaranteed to return the minimax
value v of the root node of a game tree.
Verifying that the minimax value = v is equivalent to verifying that
value ≥ v and value ≤ v.
Any correct minimax algorithm must therefore explore:
 - a strategy for Max
 - a strategy for Min
Strategies for Min and Max

Strategy for Max (verifies value ≥ v; it doesn't matter what Min does):
a subtree containing
 - one child of each Max node
 - all b children of each Min node

Strategy for Min (verifies value ≤ v; it doesn't matter what Max does):
a subtree containing
 - one child of each Min node
 - all b children of each Max node
Example
[Figure: example game tree marking a strategy for Max, a strategy for Min, and the nodes that belong to both (mixed).]
Lower Bound for Minimax
Algorithms - Analysis
Assume:
 - uniform branching factor of b
 - uniform depth of d levels
 - Max is to move at the root.

Strategy for Max:
 - d is even: $b^{d/2}$ leaf nodes
 - d is odd: $b^{\lfloor d/2 \rfloor}$ leaf nodes

Strategy for Min:
 - d is even: $b^{d/2}$ leaf nodes
 - d is odd: $b^{\lceil d/2 \rceil}$ leaf nodes
Lower Bound for Minimax
Algorithms - Analysis
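Total number of leaf nodes in the two strategies:
 - d is even: $b^{d/2} + b^{d/2}$
 - d is odd: $b^{\lfloor d/2 \rfloor} + b^{\lceil d/2 \rceil}$
In both cases this can be written as $b^{\lfloor d/2 \rfloor} + b^{\lceil d/2 \rceil}$.
Note: there is a single leaf node common to both strategies.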
Lower Bound for Minimax
Algorithms - Analysis
$b^{\lceil d/2 \rceil} + b^{\lfloor d/2 \rfloor} - 1 = O(b^{d/2})$
This is the number of leaf nodes that must be
examined by any minimax algorithm.
This is the lower bound of the time complexity.
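The bound above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names `min_leaves` and `effective_branching_factor` are chosen here for illustration only.

```python
def min_leaves(b: int, d: int) -> int:
    """Lower bound on leaves examined: b^ceil(d/2) + b^floor(d/2) - 1."""
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1


def effective_branching_factor(leaves: int, d: int) -> float:
    """The dth root of the number of leaf nodes."""
    return leaves ** (1.0 / d)


if __name__ == "__main__":
    b = 10
    for d in (2, 4, 8, 16):
        leaves = min_leaves(b, d)
        # Compare with the b^d leaves of a full minimax search.
        print(d, leaves, b ** d, round(effective_branching_factor(leaves, d), 3))
```

As d grows, the printed effective branching factor approaches the square root of b, which is what the $O(b^{d/2})$ bound suggests.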
Minimax value of game trees



The most natural definition for the average case is
that the leaf nodes are randomly ordered.
Heuristic node ordering would violate this
assumption.
Average case performance is not a prediction of its
performance in practice
Minimax value of game trees

We study the minimax value of the root in the average case of randomly
ordered frontier nodes.
Special case:
 - leaf nodes are actual terminal positions
 - leaf nodes have the exact values WIN or LOSS
Most general case:
 - arbitrary leaf node values
WIN-LOSS Trees analytic model






uniform branching factor b
uniform depth d
Max is to move at the root
depth d is even
terminal nodes labeled WIN with probability P0
terminal nodes labeled LOSS with probability 1 - P0
Example: Board Splitting



Two players: Vertical and Horizontal.
A square sheet of graph paper, with $b^{d/2}$ squares on
each side.
Each square is labeled V with probability P0 and
H with probability 1 - P0.

Vertical's turn: divides the board vertically
into b equal slices, discarding all but one of
them.
Board Splitting


Horizontal's turn: divides the board
horizontally into b equal slices, discarding
all but one of them.
Result: the letter in the only square left
indicates the winner.
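The rules above can be turned into a small exhaustive solver. This is a sketch under stated assumptions: Vertical moves first and plays the role of Max, the board is a list of rows of "V"/"H" strings, and the helper names `random_board` and `vertical_wins` are hypothetical.

```python
import random


def random_board(b, d, p0, seed=0):
    """Square board with b^(d/2) squares per side; each square is V with probability p0."""
    rng = random.Random(seed)
    side = b ** (d // 2)
    return [["V" if rng.random() < p0 else "H" for _ in range(side)]
            for _ in range(side)]


def vertical_wins(board, b, vertical_to_move=True):
    """True if Vertical can force the last remaining square to be a V."""
    rows, cols = len(board), len(board[0])
    if rows == 1 and cols == 1:
        return board[0][0] == "V"
    if vertical_to_move:
        # Vertical keeps one of b vertical slices (groups of columns).
        w = cols // b
        slices = [[row[i * w:(i + 1) * w] for row in board] for i in range(b)]
        return any(vertical_wins(s, b, False) for s in slices)
    # Horizontal keeps one of b horizontal slices (groups of rows).
    h = rows // b
    slices = [board[i * h:(i + 1) * h] for i in range(b)]
    return all(vertical_wins(s, b, True) for s in slices)


if __name__ == "__main__":
    board = random_board(b=2, d=4, p0=0.7, seed=1)
    print(vertical_wins(board, 2))
```

Vertical's move is an "any" over slices and Horizontal's is an "all", which mirrors the Max and Min levels of the corresponding win-loss tree.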
Complexity of WIN-LOSS Tree
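$P_n$: the probability that Max can force a win, given that Max is to
move and there are 2n moves in the tree.
$Q_n$: the probability that Max can force a win, given that Min is to
move and there are 2n - 1 moves in the tree.
[Figure: a Max node with probability $P_n$ above a Min node with probability $Q_n$, whose children are Max nodes with probability $P_{n-1}$.]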
WIN-LOSS Trees
$Q_n = P_{n-1}^b$
$1 - P_n = (1 - Q_n)^b$
$1 - P_n = (1 - P_{n-1}^b)^b$
$P_n = 1 - (1 - P_{n-1}^b)^b$
This is the probability that a Max node at any
higher level in the tree will be a win for Max.
WIN-LOSS Trees
For a node where Min is to move to be a win for Max,
all of its children must be wins for Max.
 - The probability that all b children of a node are wins for Max is
   $Q_n = (P_{n-1})^b$.

For a node where Max is to move to be a loss for Max,
all of its children must be losses for Max.
 - The probability that a node is a loss for Max is 1 minus the
   probability that it is a win for Max, so $1 - P_n = (1 - Q_n)^b$.

WIN-LOSS Trees
[Figure: graph showing iterations of the function $f(x) = 1 - (1 - x^2)^2$, with both axes running from 0.00 to 1.00. The crossover point determines the probability of a win for Max.]
WIN-LOSS Trees
If the probability of a win for MAX at the leaves is greater
than the crossover point, then a large enough game tree
is almost certainly a forced win for MAX!
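The crossover behavior can be observed by iterating the recurrence $P_n = 1 - (1 - P_{n-1}^b)^b$ directly. This is an illustrative sketch; the function name `win_probability` and the sample values of P0 are assumptions made here, not values from the slides.

```python
def win_probability(p0: float, b: int, n: int) -> float:
    """Probability that a Max node 2n levels above the leaves is a forced win."""
    p = p0
    for _ in range(n):
        q = p ** b                 # Min node: all b children must be wins for Max
        p = 1.0 - (1.0 - q) ** b   # Max node: not all b children are losses
    return p


if __name__ == "__main__":
    for p0 in (0.5, 0.6, 0.65, 0.7):
        print(p0, [round(win_probability(p0, 2, n), 4) for n in (1, 2, 5, 10)])
```

For b = 2, starting values below roughly 0.618 are driven toward 0 and starting values above it are driven toward 1 after only a few iterations.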
WIN-LOSS Trees


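Let $\xi_b$ be the fixed point of the iteration (the crossover point).
For b = 2,
$\xi_2 = \frac{\sqrt{5} - 1}{2} \approx 0.618$
This value is also known as the "golden ratio" (more precisely, its reciprocal, since 1/1.618 ≈ 0.618).
The probability that the root of a game tree is a forced win for
Max:
$\lim_{n \to \infty} P_n(P_0) = \begin{cases} 0 & \text{if } P_0 < \xi_b \\ \xi_b & \text{if } P_0 = \xi_b \\ 1 & \text{if } P_0 > \xi_b \end{cases}$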

Even though wins and losses
are chosen randomly at the leaf
nodes, we can predict the win-loss
status of the root of a
sufficiently deep minimax tree
with almost certainty, simply by
knowing the probability of a win
at the leaves!
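The crossover point itself can be found numerically as the interior fixed point of $x = 1 - (1 - x^b)^b$. The sketch below uses plain bisection; the function name `crossover_point` and the bracketing interval are assumptions made for this illustration.

```python
def crossover_point(b: int, iterations: int = 100) -> float:
    """Interior fixed point of x = 1 - (1 - x^b)^b, found by bisection."""
    def g(x: float) -> float:
        return 1.0 - (1.0 - x ** b) ** b - x

    lo, hi = 1e-6, 1.0 - 1e-6      # g(lo) < 0 and g(hi) > 0 for b >= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid               # fixed point lies to the right
        else:
            hi = mid               # fixed point lies to the left
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for b in (2, 3, 5, 10):
        print(b, round(crossover_point(b), 4))
```

For b = 2 the result agrees with $(\sqrt{5} - 1)/2$.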
Minimax convergence theorem

We now generalize our result to the case of leaf nodes
with arbitrary numerical values. We adopt the following
model:

 - uniform branching factor b
 - uniform depth d
 - leaves are assigned random numeric values, drawn from a
   common probability distribution function $F_{v_0}(v) = P(v_0 \le v)$
 - $v_0$ is a particular node value chosen from this distribution
Minimax convergence theorem

Let's determine the probability distribution of the
minimax value of the root of a tree
 - in the limit of large depth
 - as a function of the probability distribution of the
   leaf values
 - writing the distribution of the minimax values 2n levels
   above the leaves as $F_{v_n}(v)$
Minimax convergence theorem
For any value of v:
 - a leaf node is a win for Max if its value > v
 - a leaf node is a win for Min if its value ≤ v

The minimax value of a Max node will be greater than v
if at least one of its children has a minimax value greater than v.

The minimax value of a Min node will be greater than v
if all of its children have a minimax value greater than v.

The minimax values propagate up the tree the same way as in win-loss trees.
Minimax convergence theorem


In a win-loss tree, $P_n$ is the probability that a Max node 2n
levels above the leaves is a forced win for Max.
In a general game tree:
$P(v_n > v) = 1 - P(v_n \le v) = 1 - F_{v_n}(v)$
The theorem:
$\lim_{n \to \infty} F_{v_n}(v) = \begin{cases} 0 & \text{if } F_{v_0}(v) < 1 - \xi_b \\ 1 - \xi_b & \text{if } F_{v_0}(v) = 1 - \xi_b \\ 1 & \text{if } F_{v_0}(v) > 1 - \xi_b \end{cases}$

Minimax convergence theorem
The Meaning:

The probability distribution function of the minimax value of the root is
zero up to a particular value of v (v*).
In a b-ary tree, v* satisfies:
$F_{v_0}(v^*) = 1 - \xi_b$
Beyond v*, the probability distribution function of the
minimax value of the root is 1.
The probability density function of the step distribution
function $F_{v_n}(v)$ is an impulse at v*. All the probability mass
is concentrated at v*.
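As an illustration of the step behavior, the sketch below assumes leaf values drawn uniformly from [0, 1], so that $F_{v_0}(v) = v$, and reuses the win-loss recurrence with $P_0 = 1 - v$. The uniform distribution and the function name `root_cdf` are assumptions for this example only.

```python
def root_cdf(v: float, b: int, n: int) -> float:
    """F_{v_n}(v) for uniform(0, 1) leaves, 2n levels above the leaves."""
    p = 1.0 - v                        # P(leaf value > v) = 1 - F_{v_0}(v)
    for _ in range(n):
        p = 1.0 - (1.0 - p ** b) ** b  # same recurrence as in win-loss trees
    return 1.0 - p                     # back to a distribution function


if __name__ == "__main__":
    # For b = 2 the step is expected near v* = 1 - 0.618 = 0.382.
    for v in (0.30, 0.35, 0.38, 0.39, 0.45):
        print(v, round(root_cdf(v, 2, 30), 6))
```

Values of v on either side of v* give a distribution function that is essentially 0 or 1, which is the step described above.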
Conclusion for a minimax tree
We can predict exactly what the minimax value of the
root of the tree will be.

Given:
 - arbitrary terminal values chosen independently from
   the same distribution
 - the limit of large depth
Example (application)
Goal:
 - a fuse that will burn out after a specific time.
Problem:
 - we only have fuses that have the same broad distribution of
   burn-out times.
Solution:
 - we connect two fuses in parallel
 - the burn-out time of the whole circuit will be the maximum
   of the burn-out times of the individual fuses
 - the circuit will remain closed until both fuses burn out

The burn-out time of the entire circuit is the minimax
value of the burn-out times of the individual fuses.
Average-case time complexity Win-Loss game tree
We assume the previous model.
Assume that $P_0 < \xi_b$. According to the minimax
convergence theorem, at sufficiently high levels of the
tree, all nodes are losses for Max and wins for Min.

Max node:
 - all the children must be examined
 - all the children will be a loss for Max
Min node:
 - only the first child must be examined
 - it will be a loss for Max (a win for Min)
Average-case time complexity Win-Loss game tree
If we follow any path from the root, we will branch:
 - only one way at the alternating Min levels
 - b ways at every other level
The asymptotic number of leaf nodes examined in the limit of large
depth is $O(b^{d/2})$,
which gives an effective branching factor of $b^{1/2}$.
Average-case time complexity Win-Loss game tree
Now assume that $P_0 > \xi_b$.
In this case, at sufficiently
high levels of the tree, all nodes are wins for Max and
losses for Min.

As in the case of all losses for Max, this also results in an effective branching factor of $b^{1/2}$.
Finally, assume that $P_0 = \xi_b$ (an extremely rare case).
Pearl shows that the effective branching factor is
$\frac{\xi_b}{1 - \xi_b} \approx b^{3/4}$
Average-case time complexity Trees with arbitrary terminal values

There are two possibilities for choosing the leaf values:
 - a discrete distribution: only a finite number of distinct values.
    - At sufficiently high levels, the minimax values of all nodes will be equal
    - Alpha-beta pruning will realize its best performance
 - a continuous distribution: a segment of the real number line.
    - The probability that any node takes on any particular value is zero
    - Pearl shows that the effective branching factor is $\frac{\xi_b}{1 - \xi_b} \approx b^{3/4}$
Introduction

We generalize the two-player perfect-information
algorithms to the case of non-cooperative perfect-information games with more than two players.

No coalitions between players.

Examples:
 - Chinese Checkers with 6 players.
 - Othello extended by having different colored
   pieces for each player.
Maxn ($\max^n$) Algorithm
Assumptions:
 - the players alternate moves
 - each player tries to maximize his/her own return
 - each player is indifferent to the returns of the others.
Maxn Algorithm

At frontier nodes, the evaluation function returns an
n-tuple of values:
(player 1, player 2, player 3, ..., player n)

For example:
 - Othello: return the number of pieces for each player.
Maxn Algorithm
The value of each interior node where
player i is to move
is the entire n-tuple of the child for which the ith
component is maximum.
Maxn Algorithm - Example
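[Figure: three-player maxn tree. The root (player 1) has value (7,3,6). Its two children (player 2) have values (1,7,2) and (7,3,6). The four player-3 nodes have values (1,7,2), (6,5,4), (7,3,6), (3,1,8). Leaves, left to right: (2,8,1), (1,7,2), (5,6,3), (6,5,4), (8,5,4), (7,3,6), (4,2,7), (3,1,8).]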
Maxn Algorithm

Formal notations:







 - M(x): static heuristic value of node x
 - M(x,p): backed-up maxn value of node x, given that player p is to move
 - $M_i(x,p)$: the component of M(x,p) that corresponds to the return for
   player i
 - $M(x_i,p')$: the child value for which $M_p(x_i,p')$ is maximum over all children $x_i$ of node x,
   where p' is the player that follows player p,
   with ties broken in favor of the leftmost node.
Recursive definition of the maxn value of a node:
$M(x,p) = \begin{cases} M(x) & \text{if } x \text{ is a frontier node} \\ M(x_i,p') & \text{otherwise} \end{cases}$
Maxn Algorithm

Minimax can be viewed as a special case of maxn,
when
 - n = 2,
 - the evaluation function returns (x, -x).

Luckhardt & Irani observed:
 - at nodes where player i is to move, only the ith component
   of the children need be evaluated.
 - It may be no less expensive to compute all components.


Without assumptions on values of components,
pruning of branches is impossible (with more than 2
players).
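The recursive maxn definition can be sketched in a few lines. In this illustration, which is not the original authors' code, interior nodes are Python lists, frontier nodes are tuples, players are numbered from 0, and the tree is one reading of the leaf tuples shown in the maxn example figure (binary branching is an assumption).

```python
def maxn(node, player, n_players):
    """Return the maxn value (an n-tuple) of `node` with `player` to move."""
    if isinstance(node, tuple):        # frontier node: static n-tuple
        return node
    next_player = (player + 1) % n_players
    best = None
    for child in node:
        value = maxn(child, next_player, n_players)
        # Strict '>' keeps the leftmost child on ties.
        if best is None or value[player] > best[player]:
            best = value
    return best


# Leaf tuples from the three-player example; tree shape assumed binary.
tree = [
    [[(2, 8, 1), (1, 7, 2)], [(5, 6, 3), (6, 5, 4)]],
    [[(8, 5, 4), (7, 3, 6)], [(4, 2, 7), (3, 1, 8)]],
]

if __name__ == "__main__":
    print(maxn(tree, 0, 3))  # (7, 3, 6)
```

With n = 2 and leaf tuples of the form (x, -x), the same function reproduces ordinary minimax.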
Alpha-Beta in multi-Player Games

Tree pruning is possible when:
 - there is an upper bound on the sum of all
   components of a tuple
 - there is a lower bound on the value of each component.

For example:
 - Othello: no player can have fewer than zero pieces, and
   the total number of pieces on the board is equal for all
   nodes at the same level.
Immediate Pruning



Player i is to move, and in one child the ith component
equals the upper bound on the sum of all components.
Obviously, any other child can be pruned.
This is equivalent to situations in the two-player case
when a child of a Max node has a value of $\infty$, or a child
of a Min node has a value of $-\infty$, indicating a won
position for the corresponding player.
Shallow Pruning
[Figure: shallow pruning example for three players, with every tuple summing to 9. The root (player 1) has value (3,3,3); leaf tuples shown include (3,3,3), (4,2,3), (3,1,5), (1,7,1), and (1,6,2).]
Shallow algorithm
Shallow(Node, Player, Bound)
    IF Node is terminal, RETURN static value
    Best = Shallow(first Child, next Player, Sum)
    FOR each remaining Child
        IF Best[Player] >= Bound, RETURN Best
        Current = Shallow(Child, next Player, Sum - Best[Player])
        IF Current[Player] > Best[Player], Best = Current
    RETURN Best
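A runnable rendering of the pseudocode above might look as follows. This is a sketch under stated assumptions: interior nodes are lists, terminal nodes are tuples, players are numbered from 0, `total` plays the role of Sum, and the `counter` argument and the small example tree are hypothetical additions used only to show that a leaf is skipped.

```python
def shallow(node, player, bound, n_players, total, counter):
    """Maxn with shallow pruning; counter[0] counts evaluated leaves."""
    if isinstance(node, tuple):            # terminal node: static value
        counter[0] += 1
        return node
    next_player = (player + 1) % n_players
    best = shallow(node[0], next_player, total, n_players, total, counter)
    for child in node[1:]:
        if best[player] >= bound:          # parent cannot prefer this node
            return best
        current = shallow(child, next_player, total - best[player],
                          n_players, total, counter)
        if current[player] > best[player]:
            best = current
    return best


# Hypothetical three-player tree; every tuple sums to 9, components >= 0.
tree = [
    [(3, 3, 3), (4, 2, 3)],
    [(1, 7, 1), (5, 2, 2)],
]

if __name__ == "__main__":
    count = [0]
    print(shallow(tree, 0, 9, 3, 9, count), count[0])
```

In this tree the last leaf is never evaluated: once player 2 is known to get at least 7 in the second subtree, player 1 can get at most 2 there, which is worse than the 3 already available.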
Failure of deep pruning

In a 2-player game, alpha-beta allows deep
pruning - pruning a node based on bounds
inherited from its great-grandparents, or
other distant ancestor.
Deep pruning does not generalize to more
than 2 players!
Failure of Deep Pruning - Example
[Figure: three-player game tree containing a deep leaf whose value is either (2,3,4) or (3,0,6); other tuples shown include (5,2,2), (2,2,5), and (6,1,2).]
Optimality of shallow pruning

Theorem:
 - Every directional algorithm that computes the maxn
   value of a game tree with more than 2 players must
   evaluate every terminal node evaluated by shallow
   pruning under the same ordering.

Steps of proof:
 - The formal proof is by induction on the height of
   the tree and generalizes the result to an arbitrary
   number of players greater than 2.
Optimality of shallow pruning

Try to see a “zipper” effect in the sense that
the original order of the “teeth” (nodes) at the
bottom determines the order of the teeth at
the top, even though no individual tooth can
move very far.
Minimax and Pathology



So far, we have considered the time and space
complexities of minimax search.
We now turn our attention to the quality of the
decisions it makes.
Since alpha-beta makes exactly the same
decisions as minimax search, the question is the
decision quality of minimax.
Exact Terminal Values



If the leaf nodes of the tree are evaluated exactly, then
minimax makes optimal moves against an opponent
who plays perfectly.
But decision quality is not optimal against an
imperfect opponent, who can make mistakes.
Example of such a situation:
 - two moves are available:
    - one may lead easily and immediately to a loss
    - the other requires a long sequence of moves and a great
      deal of skill from the opponent to force the loss.

Minimax has no preference for one option over the other.
Exact Terminal Values - Example


Against an infallible opponent, it does not matter
what move is made.
Against an opponent who can make mistakes, it is
far preferable to choose the move that requires the
most skill on the part of opponent, increasing the
chances of an error by the opponent - and hence a
win by the player to move.
Minimaxing of Heuristic values


With the exception of the endgame, the values
associated with most nodes in a game are heuristic
values returned by the static evaluation.
In an attempt to reduce the error, the heuristic values are
minimaxed up the tree as if they were exact values.
Minimaxing of Heuristic values - Example

Consider a Max node with two children:
 - x, y: the static heuristic values of the child nodes
 - m = max(x,y): the minimax value of the parent node
Assume that:
 - the true value of each child node is a random variable
   uniformly distributed from 0 to 1.
 - the variables are independent.

The most natural way to estimate their values would be
their expected values, which are 1/2.
Minimaxing of Heuristic values - Example

Since the minimax value of the parent node depends
on the values of its children, m becomes a random
variable as well.
Minimaxing the estimates gives m = max(x,y) = max(1/2, 1/2) = 1/2.

A better estimate of the value of the parent would be its
expected value, that is, the expected value of the maximum of x
and y.
Note: The expected value of the maximum of two values is not the
same as the maximum of their expected values!
Minimaxing of Heuristic values - Proof
(PDF - probability distribution function, pdf - probability density function)
Since x, y ~ U(0,1):
$PDF(x) = P(x_0 \le x) = x$
$PDF(y) = P(y_0 \le y) = y$
$m = \max(x, y)$
$PDF(m) = P(m_0 \le m) = P(\max(x, y) \le m) = P(x \le m \text{ and } y \le m) = P(x \le m) \cdot P(y \le m) = m \cdot m = m^2$
$pdf(m) = (m^2)' = 2m$
$E(m) = \int_0^1 m \cdot pdf(m)\,dm = \int_0^1 2m^2\,dm = \frac{2}{3} m^3 \Big|_0^1 = \frac{2}{3}$
Minimaxing of Heuristic values



Thus, the expected value of the maximum of two
random variables chosen independently from the uniform
distribution from 0 to 1 is 2/3, while the maximum of their
expected values is only 1/2.
The essential error of minimax is to take the
maximum/minimum of the expected values instead of computing
the expected value of the maximum/minimum.
As the search goes deeper, the minimax values accumulate
more and more error.
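The gap between the two quantities can be checked numerically. The sketch below approximates the expected value of max(x, y) for independent uniform(0, 1) variables with a deterministic midpoint grid; the function name `expected_max_uniform` and the grid size are choices made for this illustration.

```python
def expected_max_uniform(n: int = 200) -> float:
    """Midpoint-rule estimate of E[max(x, y)] for x, y ~ U(0, 1), independent."""
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        for j in range(n):
            y = (j + 0.5) / n
            total += max(x, y)
    return total / (n * n)


if __name__ == "__main__":
    max_of_expectations = max(0.5, 0.5)   # what minimaxing the estimates gives
    expectation_of_max = expected_max_uniform()
    print(max_of_expectations, round(expectation_of_max, 4))
```

The estimate comes out close to 2/3, noticeably above the 1/2 obtained by taking the maximum of the two expected values.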
Minimaxing of Heuristic values

Why not compute the exact value of the root of a
game tree?

This requires the exact distribution function of all of our
leaf nodes, which we don’t know in a real game.

To do the above calculation, we assumed that the values
of the child nodes were independent of one another,
which is unlikely to be true in a real game.

Even if we had the exact distribution and they were
independent, the distributions of interior nodes become
increasingly complex functions and can be calculated
exactly only in small trees.
Game tree pathology



The above error in minimaxing gives rise to an effect
known as game-tree pathology.
In the board-splitting game tree, the decision quality of
minimax as a function of search depth increases with
increasing depth, up to a point. Beyond a certain depth,
the percentage of optimal moves made by minimax
searching deeper is less than for minimax searching to a
shallower depth.
The meaning:
The error propagation due to minimaxing overcomes the
additional information derived from searching deeper.
Game tree pathology



In real games, searching deeper almost never results
in poorer overall quality of play.
The puzzle then is to determine what it is about board
splitting that causes it to be pathological, unlike real
games.
There are several possible explanations.
Game tree pathology - Possible
Reasons
Real games | Board splitting | Possible explanation?
The accuracy of the evaluation function increases as it gets closer to the endgame, and that effect overcomes the error due to minimaxing on more levels | The accuracy of the evaluation function doesn't increase | Unlikely. Shown by Pearl that in order to overcome the minimaxing error, the accuracy of the evaluation would have to increase by at least 50% with each level.
All terminal nodes are not at the same depth | Uniform depth | Probably. Shown by Pearl that in real games there are terminal positions ("traps") which have exact evaluations and increase the overall accuracy of the evaluation function.
Sibling values are dependent | Sibling values are independent | Probably. Shown by Nau.
The branching factor is not uniform | The uniform branching factor assumption is made | Probably.
Game tree pathology

We can remove any one of the assumptions of board
splitting, such as:
 - uniform branching factor
 - uniform depth
 - independence of sibling values
and the pathology disappears.
It is difficult to argue convincingly that any one of those
factors is the cause of pathology.
The real virtue of game-tree pathology is to remind us
that minimaxing of uncertain values is statistically
misguided.
Learning Two Player Evaluation
Functions



We turn our attention to the problem of learning
heuristic functions for two-player games.
The most obvious, and still most commonly
used technique, is hand-coding by a human
expert.
Examples:
 - Deep Blue (chess)
 - Chinook (checkers).
Samuel’s Checker Player



Arthur Samuel's checkers program was written in the
1950s.
In 1962, running on an IBM 7094, the machine
defeated R. W. Nealy, a future Connecticut state
checkers champion.
It was one of the first machine learning programs,
introducing a number of different learning
techniques.
Samuel’s Checker Player

Rote Learning
 - When a minimax value is computed for a position,
   that position is stored along with its value.
 - If the same position is encountered again, the value
   can simply be returned.
 - Due to memory constraints, not all of the generated board
   positions can be stored, and Samuel used a set of
   criteria for determining which positions to actually
   store.
Samuel’s Checker Player

Learning the evaluation function
 - The static evaluation of a node is compared with the
   backed-up minimax value from a lookahead search.
 - If the heuristic evaluation function were perfect, the static
   value of a node would be equal to the backed-up
   value based on a lookahead search applying the
   same evaluation to the frontier nodes.
 - If there is a difference between the values, the
   heuristic evaluation function should be modified.
Samuel’s Checker Player

Selecting terms
 - Samuel's program could select which terms to
   actually use, from a library of possible terms.
 - In addition to material, these terms attempted to
   measure the following board features:
    - center control
    - advancement of the pieces
    - mobility
 - The program computed the correlation between the
   values of these different features and the overall
   evaluation score. If the correlation of a particular
   feature dropped below a certain level, the feature
   was replaced by another.
Linear Regression


Samuel's method for modifying the weights was
somewhat ad hoc.
We shall describe a more principled way of
performing this task.
Linear Regression

Consider a checkers evaluation function based just on
material, of the form:
$c_p p_w + c_k k_w - c_p p_b - c_k k_b$,
where
 - $p_w$/$p_b$: number of single white/black pieces on the
   board
 - $k_w$/$k_b$: number of white/black kings on the board
 - $c_p$/$c_k$: coefficient (weight) assigned to a single
   piece/king

Since the game is symmetric with respect to white and
black, we assign the same pieces the same weights.
Linear Regression


An individual evaluation function is represented by a particular set
of values for $c_p$ and $c_k$.
We represent all such functions in a two-dimensional
space with $c_p$ on one axis and $c_k$ on the other.

Our task: to find the best point in this space.
We start with an initial approximation of the relative
weights and use the equation to compute the static
heuristic value of the current state.
Linear Regression

From this game state:
 - We perform a lookahead search as deeply as
   our computational resources allow.
 - At the frontier, our current evaluation function
   is applied to the leaf nodes, and these values
   are minimaxed back up to the root to
   determine a backed-up value b for the
   position.
 - In general, this value will not equal the static
   value of the node.
Linear Regression

Consider the equation:
$b = c_p p_w + c_k k_w - c_p p_b - c_k k_b$

It represents the set of all possible weight
combinations for which the static value of this
particular position will equal its backed-up minimax
value from the given depth.
This is the equation of a line, based on only
a single game state.
Each state produces another line.
Linear Regression




In general these lines will intersect one another, but
not all at the same point.
Given a set of such lines, we can determine the point
which most nearly approximates their mutual
intersection.
Standard mathematical techniques such as linear
regression can be applied to solve this problem.
The best intersection corresponds to a new point in $c_p$-$c_k$ space, and hence a different evaluation function.
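One way to find the point that best fits a set of such lines is ordinary least squares. The sketch below solves the 2x2 normal equations directly; the function name `fit_weights`, the sample format (piece difference, king difference, backed-up value), and the sample positions are all hypothetical and are not data from the experiments described here.

```python
def fit_weights(samples):
    """Least-squares fit of (c_p, c_k) to b = c_p*(p_w - p_b) + c_k*(k_w - k_b).

    samples: list of (dp, dk, b) with dp = p_w - p_b and dk = k_w - k_b.
    """
    s_pp = sum(dp * dp for dp, dk, b in samples)
    s_pk = sum(dp * dk for dp, dk, b in samples)
    s_kk = sum(dk * dk for dp, dk, b in samples)
    s_pb = sum(dp * b for dp, dk, b in samples)
    s_kb = sum(dk * b for dp, dk, b in samples)
    det = s_pp * s_kk - s_pk * s_pk
    if det == 0:
        raise ValueError("positions do not determine both weights")
    c_p = (s_pb * s_kk - s_pk * s_kb) / det   # Cramer's rule on the normal equations
    c_k = (s_pp * s_kb - s_pk * s_pb) / det
    return c_p, c_k


# Hypothetical positions whose backed-up values are consistent with c_p = 1, c_k = 3.
positions = [(1, 0, 1), (0, 1, 3), (2, -1, -1), (1, 1, 4)]

if __name__ == "__main__":
    print(fit_weights(positions))
```

With noisy backed-up values the lines no longer meet in a single point, and the same function returns the weights that minimize the sum of squared differences between static and backed-up values.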
Linear Regression



The entire process used a particular approximation
of the evaluation function, which was applied to the
leaf nodes of each minimax search.
The new function must be viewed as simply a
different, and hopefully better, approximation.
To get an even better function, we must repeat the
entire process, applying the new
approximation to the leaf nodes to get yet another
approximation.
Linear Regression

There are two loops in this process.
 - The inner loop uses a particular evaluation function
   to derive a new approximation.
 - The outer loop iterates this process over multiple
   approximations.

Hopefully it will eventually converge to a particular
function or a small neighborhood of such functions.
Experiments with Chess


One test of these ideas was the task of learning the
relative weights of the different chess pieces.
The evaluation function was based purely on
material, with five parameters - the weights of:
 - queens
 - rooks
 - bishops
 - knights
 - pawns
Experiments with Chess



Initially all weights were set to one.
The lookahead search was limited to two levels
deep, and linear regression was used to derive each
successive approximation to the evaluation function.
Surprisingly, the values eventually converged to a
fixed point.
Experiments with Chess - Result


Piece | Value learned by the program | Classical weight from the chess literature
queen | 8 | 9
rook | 4 | 5
bishop | 4 | 3
knight | 3 | 3
pawn | 1 | 1

These are not the same, but they are close, and at least the
order of the pieces is correct.
Bear in mind that this experiment was performed with a
purely material evaluation function and only a two-level
lookahead!