Game Trees and Minimax

Minimax and Alpha-Beta
Mike Maxim
17 Apr 03
15-211 Spring 2003
Announcements

Homework 6 has been released!
  Due May 1st, 11:59 PM, so get going!
  Today’s lecture is very pertinent to this assignment.
Quiz postponed to Tuesday (4/22).
Initial Questions

How do we get a program to play games well?
Could I be unstoppable with a computer with massive computation power?
How does Kasparov stay with programs like Deep Junior?
A little more precise…

What sort of “games” do you mean?
Strategy/board-type games:
  Chess
  Othello
  Go
  Tic-Tac-Toe
Let’s look at how we might go about playing Tic-Tac-Toe.
Consider this position
We are playing X, and it is now our turn.
Let’s write out all possibilities
Each number labels the position that results from one of our legal moves.
Now let’s look at their options
Here we are looking at all of the opponent responses
to the first possible move we could make.
Now let’s look at their options
Opponent options after our second
possibility. Not good again…
Now let’s look at their options
Struggling…
More interesting case
Now they don’t have a way to win on their next
move. So now we have to consider our responses to
their responses.
Our options
We have a win for any move they make.
So the original position in purple is an X win.
Finishing it up…
They win again if we take our fifth move.
Summary of the Analysis
So which move should we make? ;-)
Looking closer at the process

Traverse the “game tree”:
  Enumerate all possible moves at each node. The children of that node are the positions that result from making each move. A leaf is a position that is won or drawn for some side.
  Assume that we pick the best move for us, and the opponent picks the best move for him (the one that causes the most damage to us).
  Pick the move that maximizes the minimum amount of success for our side.
This process is known as the Minimax algorithm.
Maximizing Success

In Tic-Tac-Toe there are only three possible outcomes: Win, Tie, Lose.
So the point is: if you have a move that leads to a Win, make it. If you have no such move, make the move that gives the Tie. If not even that exists, then it doesn’t matter what you do.
When can we use Minimax?

Game properties required for Minimax:
  Two players
  No chance (no coin flipping)
  Perfect information
    No hidden cards or hidden Chess pieces…
Non-Minimax Games

Poker (or any game that involves bluffing or somehow outwitting your opponent)
Arcade games…
Example

The game of Nim:
  Players alternate turns.
  On each turn, a player removes some number of pennies from one of the stacks.
  The player who takes the last penny wins.
Example

Let’s start with a simple configuration of Nim and use Minimax to select a move.
Our initial configuration consists of three piles, with 1, 2, and 3 pennies in each pile.
We can represent this configuration compactly by writing it as (1,2,3). Each position in this list represents the number of pennies in that stack. Order does not matter (I can just rearrange the stacks).
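As a concrete sketch (the Nim class and its methods here are illustrative, not code provided with the assignment), one way to represent such a configuration in Java and enumerate the positions reachable in one move:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A Nim position as a sorted list of pile sizes, e.g. (1,2,3).
// Sorting the piles makes (1,2,3) and (3,1,2) the same position, since order does not matter.
class Nim {
    final List<Integer> piles;

    Nim(List<Integer> piles) {
        this.piles = new ArrayList<>(piles);
        this.piles.removeIf(p -> p == 0);        // drop empty piles
        Collections.sort(this.piles);
    }

    // All positions reachable in one move: take 1..k pennies from a pile of size k.
    List<Nim> successors() {
        List<Nim> result = new ArrayList<>();
        for (int i = 0; i < piles.size(); i++) {
            for (int take = 1; take <= piles.get(i); take++) {
                List<Integer> next = new ArrayList<>(piles);
                next.set(i, next.get(i) - take);
                result.add(new Nim(next));
            }
        }
        return result;
    }

    boolean isOver() { return piles.isEmpty(); }  // the last penny has been taken

    @Override public String toString() { return piles.toString(); }
}

For example, new Nim(List.of(1, 2, 3)).successors() yields exactly the six positions drawn on the next slide.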
Drawing the Game Tree

The first thing we need to take care of is drawing the game tree.
From (1,2,3), our six possible moves lead to:
  (2,3)   (1,1,3)   (1,2,2)   (1,3)   (1,1,2)   (1,2)
One level of the tree. Whose move is it now?
Drawing the Game Tree

[Full game-tree diagram omitted: below the root (1,2,3) the levels alternate Us, Them, Us, Them, Us, Them, expanding every position until some side takes the last penny; each leaf is marked win or loss.]
Some notes on the tree

Each level of the tree is called a ply. Our current tree is 6-ply deep.
To get the outcome at the root of the tree, we start at the bottom and work our way up:
  If it is our turn (the level is labeled Us), we pick the maximum outcome from the children.
  If it is the opponent’s turn (the level is labeled Them), we pick the minimum outcome from the children.
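This back-up rule can be written directly as code. Below is a minimal sketch using the hypothetical Nim class from the earlier sketch; seen from the player to move, the max and min steps collapse into one rule: a position is a win exactly when some move leads to a position that is a loss for the opponent.

// Returns true if the player to move can force a win from position p (normal-play Nim).
static boolean playerToMoveWins(Nim p) {
    if (p.isOver())
        return false;                 // the previous player took the last penny, so we have lost
    for (Nim child : p.successors()) {
        if (!playerToMoveWins(child)) // max step: one move that leaves the opponent losing is enough
            return true;
    }
    return false;                     // min step: every move hands the opponent a win
}

Calling playerToMoveWins(new Nim(List.of(1, 2, 3))) reproduces the outcome derived on the next slides.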
Analyzing the Tree

[Annotated game-tree diagram omitted: backing the win/loss labels up from the leaves marks all six of our moves from (1,2,3), namely (2,3), (1,1,3), (1,2,2), (1,3), (1,1,2), and (1,2), as losses, so the root (1,2,3) is labeled a loss.]
What did we just find out?

We lose no matter what we do.
  If our opponent plays as he should, there is nothing we can do.
  Keep in mind we didn’t really use any “strategy” here. We just enumerated all the “lines” the game could progress down.
* It turns out that in Nim there is a special trick that can tell you immediately whether you have a win (see the note below), but that is not true for most strategy games.
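For the curious (not needed for the analysis above): the trick being alluded to is almost certainly the nim-sum. Under normal play, a Nim position is a loss for the player to move exactly when the XOR of the pile sizes is zero, and 1 ^ 2 ^ 3 == 0, which agrees with the tree analysis. A one-line check, as a sketch:

// The "special trick" for Nim: the player to move wins iff the XOR of the pile sizes is nonzero.
static boolean playerToMoveWinsFast(int... piles) {
    int nimSum = 0;
    for (int p : piles) nimSum ^= p;
    return nimSum != 0;               // 1 ^ 2 ^ 3 == 0, so (1,2,3) is a loss for the player to move
}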
What did we just find out?

Minimax gives us a mechanism to play perfectly. If we can build up the entire game tree, all we need to do is follow the procedure we just did to get the optimal move.
Why can’t we do this for Chess or Othello (or any reasonably complex game)?
Othello

There are O(3^64) possible positions in Othello, roughly 3.4 × 10^30 (note this is probably way too high). Each square can hold a black piece, a white piece, or be empty, and there are 64 squares.
We cannot build this game tree in full; it is just too big. We need a way to approximate the bottom part of it, so we don’t have to build the whole thing.
Heuristics

A heuristic is an approximation that is typically fast and is used to aid in optimization problems.
In this context, heuristics are used to “rate” board positions based on local information.
For example, in Chess I can “rate” a position by examining who has more pieces. The difference between Black’s and White’s pieces would be the score of the position.
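As a sketch of that piece-counting idea (the board encoding, the treatment of kings, and the 1/3/3/5/9 piece values are illustrative assumptions, not from the slides):

// A minimal material-count evaluator. Positive scores favor White, negative favor Black.
// Uppercase letters are White pieces, lowercase are Black; anything else is an empty square.
static int materialScore(char[][] board) {
    int score = 0;
    for (char[] row : board) {
        for (char sq : row) {
            int value;
            switch (Character.toUpperCase(sq)) {
                case 'P': value = 1; break;               // pawn
                case 'N': case 'B': value = 3; break;     // knight, bishop
                case 'R': value = 5; break;               // rook
                case 'Q': value = 9; break;               // queen
                default:  value = 0;                      // empty square or king
            }
            score += Character.isUpperCase(sq) ? value : -value;
        }
    }
    return score;
}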
Heuristics and Minimax

We want a strategy that will let us cut off the game tree at a certain maximum ply.
At the bottom nodes of the tree, we apply the heuristic function to those positions.
Now instead of just Win, Loss, or Tie, we have a score.
For a level of the tree that is Us, we want the move that yields the position with the highest score. A Them level entails that we want the child with the lowest score.
Heuristics and Minimax

When dealing with game trees, the heuristic function is generally referred to as the evaluation function, or the static evaluator.
The static evaluator takes in a board position and gives it a score.
The higher the score, the better the position is for you; the lower, the better it is for the opponent.
Implementing Minimax

The most important thing to note with Minimax is that while we can visualize the process as building a tree, when we implement the algorithm in code we never actually build an explicit tree. The “tree” in the implementation lives on the call stack as a result of “tree-like” recursive calls. This can be difficult to conceptualize at first, but think about it for a little bit and it should make some sense.
Pseudo Code

// Returns the Minimax value of position b, searching `depth` plies deep.
// Assumes a Board with Evaluate(), legalMoves(), and move() methods.
int Minimax(Board b, boolean myTurn, int depth) {
    if (depth == 0)
        return b.Evaluate();                      // Heuristic at the search horizon
    int best = myTurn ? Integer.MIN_VALUE : Integer.MAX_VALUE;
    for (Move m : b.legalMoves()) {               // one recursive call per legal move
        int value = Minimax(b.move(m), !myTurn, depth - 1);
        best = myTurn ? Math.max(best, value)     // our turn: take the maximum
                      : Math.min(best, value);    // their turn: take the minimum
    }
    return best;
}
It is clear from this code that we don’t use an explicit tree structure.
However, the pattern of recursive calls forms a tree on the call stack.
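One practical note, as a sketch against the same assumed Board/Move interface: Minimax as written returns only a value, so at the root you also need to remember which move produced the best value.

// Root-level search: returns the move with the best Minimax value (null if no legal moves).
static Move bestMove(Board b, int depth) {
    Move best = null;
    int bestValue = Integer.MIN_VALUE;
    for (Move m : b.legalMoves()) {
        int value = Minimax(b.move(m), false, depth - 1);   // the opponent moves next
        if (best == null || value > bestValue) {
            bestValue = value;
            best = m;
        }
    }
    return best;
}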
Real Minimax Example

[Worked diagram omitted: a 4-ply tree with alternating Max and Min levels. The evaluation function is applied to the leaves (values such as 10, 2, -5, 16, 12, 7, and -80), and backing those values up with max and min gives 10 at the root.]
How fast?

Minimax as described is pretty slow, even for a modest depth.
It is basically a brute-force search.
What is the running time?
  Each position has some average number of legal moves b (the branching factor), and we search d levels deep. So the running time is O(b^d).
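For a rough sense of scale (illustrative numbers, not from the slides): with an average of b ≈ 30 legal moves per position and a search depth of d = 8 plies, that is 30^8 ≈ 6.6 × 10^11 positions to examine.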
Can we speed this up?

Let us observe the following game tree.
[Small diagram omitted: a Max root whose left Min child backs up the value 2 from its leaves (2 and 7), and whose right Min child has so far revealed a leaf worth 1.]
What do we know about the root?
What do we know about the root’s right child?
Pruning

It is clear from this little example that in Minimax we sometimes do extra work: we evaluate nodes whose values have no impact on the rest of the search.
There is a way we can “prune” off sections of the game tree if we know that they are irrelevant to the outcome.
Alpha Beta Pruning

Idea: track a “window” of expectations. Use two variables:
  α – the best score found so far at a max node: it only increases.
    At a child min node: the parent wants the max. To affect the parent’s current α, our β cannot drop below α.
    If β ever gets below α: stop searching further subtrees of that child. They do not matter!
  β – the best score found so far at a min node: it only decreases.
    At a child max node: the parent wants the min. To affect the parent’s current β, our α cannot get above the parent’s β.
    If α ever gets bigger than β: stop searching further subtrees of that child. They do not matter!
Start the process with an infinite window (α = -∞, β = +∞).
Pseudo Code

// Alpha Beta search: returns the same root value as Minimax, but prunes
// subtrees that cannot affect the result. Same assumed Board interface as before.
int AlphaBeta(Board b, boolean myTurn, int depth, int alpha, int beta) {
    if (depth == 0)
        return b.Evaluate();                      // Heuristic at the search horizon
    if (myTurn) {                                 // max node: raise alpha
        for (Move m : b.legalMoves()) {
            alpha = Math.max(alpha, AlphaBeta(b.move(m), false, depth - 1, alpha, beta));
            if (alpha >= beta) break;             // cutoff: remaining moves cannot matter
        }
        return alpha;
    } else {                                      // min node: lower beta
        for (Move m : b.legalMoves()) {
            beta = Math.min(beta, AlphaBeta(b.move(m), true, depth - 1, alpha, beta));
            if (alpha >= beta) break;             // cutoff: remaining moves cannot matter
        }
        return beta;
    }
}
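At the root, the search starts with the infinite window described above; as a usage sketch, a Java caller might use the extreme int values as stand-ins for minus and plus infinity:

// Root call (board and maxDepth are whatever the caller has in hand).
int score = AlphaBeta(board, true, maxDepth, Integer.MIN_VALUE, Integer.MAX_VALUE);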
Alpha Beta Example

[Two worked diagrams omitted: the tree from the Real Minimax Example searched with Alpha Beta. Once the first subtree establishes α = 10 at the root, later subtrees are cut off as soon as their backed-up β drops below α (the “α > β!” annotation), without examining their remaining children. The root value is still 10.]
Quiz Break
Alpha Beta Pruning

Does Alpha Beta ever return a different root value than Minimax?
  No! Alpha Beta does the same thing Minimax does, except that it is able to detect parts of the tree that make no difference. Because it can detect this, it doesn’t evaluate them.
What is the speedup?
  In the best case the Alpha Beta search tree has O(b^(d/2)) nodes, i.e. the square root of the number of nodes in the regular Minimax tree.
  The speedup is greatly dependent on the order in which you consider moves at each node. Why?
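Using the same illustrative numbers as before (b ≈ 30, d = 8): ideal ordering shrinks the tree from about 30^8 ≈ 6.6 × 10^11 nodes to about 30^4 ≈ 8.1 × 10^5. That best case is only reached when the best move is searched first at each node, which is why move ordering matters so much.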
Transposition Tables

Another way to speed up basic Minimax is to use memoization to build a transposition table.
We construct a table (hash table) that stores a board position and relevant information about that position:
  The value for the node
  The best move at the position (useful for move ordering!)
  Whether the stored value is an upper bound, a lower bound, or an exact value
If during the search we encounter a position that already exists in the table, we can take advantage of that information at the current node.
  Note that we cannot always just return the score from a transposition-table hit. Sometimes we may only have a bound in the table (if we had a beta cutoff at that node). You must be extremely careful when using these tables with Alpha Beta!
  The most useful thing you get from the table is the best move, which aids greatly in ordering.
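A minimal sketch of what such a table might look like (the entry fields and the use of a 64-bit position hash, e.g. a Zobrist hash, are assumptions for illustration):

import java.util.HashMap;
import java.util.Map;

// One stored record per position.
class TTEntry {
    static final int EXACT = 0, LOWER_BOUND = 1, UPPER_BOUND = 2;
    int value;        // score found for the position
    int depth;        // search depth at which the score was computed
    int flag;         // EXACT, LOWER_BOUND, or UPPER_BOUND
    Move bestMove;    // best move found here: feeds move ordering on later visits
}

class TranspositionTable {
    private final Map<Long, TTEntry> table = new HashMap<>();

    TTEntry probe(long positionHash)          { return table.get(positionHash); }
    void store(long positionHash, TTEntry e)  { table.put(positionHash, e); }
}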
Other Optimizations

History and Killer Heuristics: track which moves have a tendency to cause beta cutoffs. Order these moves at the beginning of the move list, since they have shown they are capable of being good.
Null Move: skip your turn at a node (referred to as making a “null move”) and do a reduced-depth search on the resulting position. If you get a good score back, you can prune the node. (This works extremely well in Chess, although you will have to be very careful if you decide to use it in Othello programs.)
Aspiration Window: instead of starting with an infinite window, start with a narrower “aspiration window” to get more cutoffs (see the sketch after this list).
Fast Performance: often the best game-playing programs are the ones that search the deepest (hence all good programs being called “Deep”). In order to search deep, your program will have to be as efficient as possible. This includes optimizing your move generation and evaluation routines, among others.
Many references online (mostly dealing with Chess) cover more advanced optimizations.
  E.g. http://www.seanet.com/~brucemo/topics/topics.htm
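A minimal sketch of the aspiration-window idea above (the window width and the re-search policy are illustrative assumptions; it reuses the AlphaBeta routine from the pseudocode slide):

// Search with a narrow window around the previous score; if the true score falls
// outside that window, the result is only a bound, so re-search with a full window.
static int aspirationSearch(Board b, int depth, int previousScore) {
    final int WINDOW = 50;                        // illustrative width, in evaluation units
    int alpha = previousScore - WINDOW;
    int beta  = previousScore + WINDOW;
    int score = AlphaBeta(b, true, depth, alpha, beta);
    if (score <= alpha || score >= beta)          // fell outside the window
        score = AlphaBeta(b, true, depth, Integer.MIN_VALUE, Integer.MAX_VALUE);
    return score;
}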
Iterative Deepening

In real life, players often have time limits on the amount of time they get per move: they need to produce a move within, say, 2 minutes. How do we get Minimax to search as deeply as possible within the time limit?
Start out with a 1-ply search and get a move. This should go very fast (well within the time limit). We put the move we get back into a “best move area”. Now we do a 2-ply search and replace the current best move with the new move just calculated. We continue to increase the depth of the search tree until something forcibly shuts us down (the time limit). When we get killed, however, we have a move ready in the best-move area from our earlier, lower-ply searches. Because we just keep increasing the depth, we don’t have to hardcode a maximum depth: the maximum depth adjusts itself to our time limit!
Iterative deepening also combines well with move ordering and transposition tables: the best move found at the previous, shallower depth is tried first at the next depth. A sketch follows.
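A minimal sketch of that loop, using the hypothetical bestMove helper from earlier. Note that this version only checks the clock between iterations; a real program would abort the in-progress search when time runs out.

// Repeatedly deepen the search until the time limit expires; always keep the
// last fully computed move in the "best move area".
static Move iterativeDeepening(Board b, long timeLimitMillis) {
    long deadline = System.currentTimeMillis() + timeLimitMillis;
    Move best = null;                             // the "best move area"
    for (int depth = 1; System.currentTimeMillis() < deadline; depth++) {
        best = bestMove(b, depth);                // replace with the deeper result
    }
    return best;
}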
Initial Questions

How do we get a program to play games well?
Could I be unstoppable with a computer with massive computation power?
How does Kasparov stay with programs like Deep Junior?
So how does Kasparov win?

Even the best Chess grandmasters say they only look 4 or 5 moves ahead each turn.
Deep Junior looks about 18-25 moves ahead. How does it lose!?
Kasparov has an unbelievable evaluation function. He is able to assess strategic advantages much better than programs can (although this is getting less true).
The moral: the evaluation function plays a large role in how well your program can play.
Summary

The Minimax algorithm provides a way to build and analyze a game tree.
Often it is impossible to build the entire game tree, so we need a heuristic approximation.
Alpha Beta and other optimizations provide techniques for greatly enhancing the performance of game-playing programs.
Good Luck in the Tournament!