
Adversarial Search
Chapter 6
Outline



Optimal decisions
α-β pruning
Imperfect, real-time decisions
Games as search problems

Games have long engaged the intellectual faculties of humans.
Board games such as chess are the classic example.
Game playing is an appealing target for AI:
the state of the game is easy to represent, and
the actions are restricted.




This makes game playing an idealization of a
world in which hostile agents act so as to diminish
one's well-being.
Early researchers chose chess for several reasons.
A chess-playing computer is an existence proof of a
machine doing something thought to require
intelligence.
It is easy to represent the game as a search through a
space of possible game positions.




The presence of an opponent makes the decision
problem more complicated.
It introduces uncertainty: a player cannot know in advance
which move the opponent will make.
So all game-playing programs must deal with
contingency problems.



The complexity of games introduces a completely
new kind of uncertainty that we have not seen so far.
It arises not because information is missing,
but because one does not have time to calculate the exact
consequences of any move.
Instead, one has to make a best guess based on
past experience and act before being sure which action
is best.





Games usually have time limits.
Game-playing research has therefore spawned a number of
interesting ideas on how to make the best use of time to reach good
decisions.
We first discuss how to find the theoretically best move.
Then we look at techniques for choosing a good move when time is
limited.
Pruning allows us to ignore portions of the search tree that
make no difference to the final choice.
Heuristic evaluation functions allow us to approximate the true utility of a state
without doing a complete search.
Perfect Decisions in Two-Person Games



Consider the general case of a game with two players, whom
we will call MAX and MIN.
MAX moves first, and then they take turns moving
until the game is over.
At the end of the game, points are awarded to the
winning player.
Formal definition of a game as a search problem




The initial state, which includes the board position
and an indication of whose move it is.
A set of operators, which define the legal moves
that a player can make.
A terminal test, which determines when the game
is over. States where the game has ended are called
terminal states.
A utility function (also called a payoff function),
which gives a numeric value for the outcome of a
game. In chess, the outcome is a win, loss, or draw,
which we can represent by the values +1, -1, or 0.
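The four components above can be sketched as a small Python class. This is only an illustration under stated assumptions: the game (a take-away game in which players alternately remove 1 to 3 stones and whoever takes the last stone wins) and every name used (`State`, `TakeAwayGame`, `moves`, `result`, and so on) are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    stones: int    # the "board position": stones left in the pile
    to_move: str   # whose move it is: "MAX" or "MIN"


class TakeAwayGame:
    """Hypothetical game: remove 1-3 stones; taking the last stone wins."""

    def initial_state(self, stones=5):
        # Initial state: position plus an indication of whose move it is.
        return State(stones, "MAX")

    def moves(self, state):
        # Operators: the legal moves available in this state.
        return [k for k in (1, 2, 3) if k <= state.stones]

    def result(self, state, move):
        # Apply a move and hand the turn to the other player.
        other = "MIN" if state.to_move == "MAX" else "MAX"
        return State(state.stones - move, other)

    def terminal_test(self, state):
        # The game is over when the pile is empty.
        return state.stones == 0

    def utility(self, state):
        # Value of a terminal state from MAX's point of view.
        # If MAX is "to move" on an empty pile, MIN took the last stone.
        return -1 if state.to_move == "MAX" else +1
```

This toy game has no draws, so the utility is only ever +1 or -1; a chess-like game would also return 0.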
Perfect Decisions in Two-Person Games




In a normal search problem, MAX would only have to search for a sequence of moves
that leads to a winning terminal state, and then
go ahead and make the first move in the sequence.
Unfortunately, MIN has something to say about it.
MAX therefore must find a strategy that will lead to
a winning terminal state regardless of what MIN
does, where the strategy includes the correct move
for MAX for each possible move by MIN.
We will begin by showing how to find the optimal (or
rational) strategy.
Game tree (2-player,
deterministic, turns)



The figure shows part of the search tree for the game of Tic-Tac-Toe.
From the initial state, MAX has a choice of nine possible moves. Play
alternates between MAX placing x's and MIN placing o's until we
reach leaf nodes corresponding to terminal states: states where
one player has three in a row or all the squares are filled.
The number on each leaf node indicates the utility value of the
terminal state from the point of view of MAX; high values are
assumed to be good for MAX and bad for MIN. It is MAX's job
to use the search tree to determine the best move.
Game tree




Even a simple game like Tic-Tac-Toe is too
complex to show the whole search tree,
so we will switch to the absolutely trivial
game in the figure.
The possible moves for MAX are labelled A1,
A2, and A3. The possible replies to A1 for MIN
are A11, A12, A13, and so on. This particular
game ends after one move each by MAX and
MIN.
Minimax Algorithm


The minimax algorithm is designed to determine the
optimal strategy for MAX, and thus to decide what
the best first move is.
The algorithm consists of five steps:

Generate the whole game tree, all the way down to
the terminal states.

Apply the utility function to each terminal state to get
its value.

Use the utility of the terminal states to determine the
utility of the nodes one level higher up in the search
tree.



Continue backing up the values from the leaf nodes
toward the root, one layer at a time.
Eventually, the backed-up values reach the top of
the tree; at that point, MAX chooses the move that
leads to the highest value. At the topmost node of the figure,
MAX has a choice of three moves that will lead to
states with utility 3, 2, and 2, respectively.
Thus, MAX's best opening move is A1. This is called
the minimax decision, because it maximizes the
utility under the assumption that the opponent will
play perfectly to minimize it.
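The backing-up procedure can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own pseudocode: the tree is written as nested dictionaries, the function names are hypothetical, and the leaf utilities are assumed values chosen so that the backed-up values of A1, A2, and A3 come out as 3, 2, and 2, matching the text.

```python
def minimax_value(node, maximizing):
    """Backed-up value of a node.

    Leaves are plain numbers (utilities from MAX's point of view);
    internal nodes are dicts mapping move labels to child nodes.
    """
    if not isinstance(node, dict):
        return node  # terminal state: apply the utility directly
    values = [minimax_value(child, not maximizing) for child in node.values()]
    return max(values) if maximizing else min(values)


def minimax_decision(tree):
    """Move at the root (MAX to play) with the highest backed-up value."""
    return max(tree, key=lambda move: minimax_value(tree[move], maximizing=False))


# Assumed leaf utilities for the trivial two-ply game (illustrative only).
TREE = {
    "A1": {"A11": 3, "A12": 12, "A13": 8},
    "A2": {"A21": 2, "A22": 4, "A23": 6},
    "A3": {"A31": 14, "A32": 5, "A33": 2},
}

print(minimax_decision(TREE))  # prints "A1"
```

MIN picks the smallest leaf under each of MAX's moves, and MAX then picks the move whose minimum is largest.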
Minimax

Perfect play for deterministic games


Idea: choose move to position with highest minimax
value
= best achievable payoff against best play



E.g., 2-ply game:
Properties of minimax

Complete? Yes (if tree is finite)


Optimal? Yes (against an optimal opponent)




Time complexity? O(b^m)
(b is the number of legal moves at each point and m is the maximum depth of the tree)
Space complexity? O(bm), i.e. linear in b × m (depth-first exploration)


For chess, b ≈ 35, m ≈ 100 for "reasonable" games
→ exact solution completely infeasible
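To get a feel for why an exact solution is out of reach, the size of b^m for the chess figures above can be computed directly. A small sketch (the variable names are arbitrary):

```python
import math

b, m = 35, 100                    # branching factor and depth from the text
nodes = b ** m                    # exact integer; Python handles big ints
digits = len(str(nodes))          # number of decimal digits in b^m
exponent = m * math.log10(b)      # b^m written as a power of ten

print(f"35^100 has {digits} digits (about 10^{exponent:.1f})")
```

The result is on the order of 10^154 nodes, far beyond what any machine could enumerate.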
α-β pruning example
Properties of α-β

Pruning does not affect final result


Good move ordering improves effectiveness of
pruning


With "perfect ordering," time complexity = O(b^(m/2))
→ doubles depth of search

A simple example of the value of reasoning about
which computations are relevant (a form of
metareasoning)
Why is it called α-β?

α is the value of the
best (i.e., highest-value) choice found
so far at any choice
point along the path
for MAX.
If v is worse than α,
MAX will avoid it
→ prune that branch.
β is defined similarly for MIN.
The α-β algorithm
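A minimal Python sketch of α-β search follows. It is an illustration under assumptions rather than the slides' pseudocode: the tree is a nested list whose leaves are utilities for MAX, the leaf values are assumed for the example, and the `visited` list exists only to show which leaves are actually examined.

```python
import math


def alphabeta(node, alpha, beta, maximizing, visited):
    """Minimax value of node, skipping branches that cannot change the result."""
    if not isinstance(node, list):
        visited.append(node)      # record every leaf that gets evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            if value >= beta:     # MIN already has a better option elsewhere
                return value
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        if value <= alpha:        # MAX already has a better option elsewhere
            return value
        beta = min(beta, value)
    return value


# Assumed two-ply example tree: three MAX moves, three MIN replies each.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
best = alphabeta(TREE, -math.inf, math.inf, True, visited)
print(best, visited)
```

On this tree the search returns 3, the same value as plain minimax, but looks at only 7 of the 9 leaves: once the second MIN node shows a 2, which is worse than α = 3, its remaining replies are pruned.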
Resource limits
Suppose we have 100 secs, explore 10^4
nodes/sec
→ 10^6 nodes per move
Standard approach:

cutoff test:
e.g., depth limit (perhaps add quiescence search)
Evaluation functions


For chess, typically linear weighted sum of features
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
e.g., w1 = 9 with
f1(s) = (number of white queens) – (number of
black queens), etc.
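A linear weighted evaluation of this kind can be sketched as follows. Only the queen weight of 9 comes from the text; the other weights are the conventional material values and are an assumption here, as are the function name and the dictionary-based piece counts.

```python
# Feature weights: only "Q": 9 is from the text; the rest are the
# conventional material values, assumed for illustration.
WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}


def material_eval(white_counts, black_counts):
    """Eval(s) = sum_i w_i * f_i(s), with f_i = (white count) - (black count)."""
    return sum(
        weight * (white_counts.get(piece, 0) - black_counts.get(piece, 0))
        for piece, weight in WEIGHTS.items()
    )


# Hypothetical position: White has a queen, Black has a knight instead.
white = {"Q": 1, "R": 2, "P": 8}
black = {"R": 2, "N": 1, "P": 8}
print(material_eval(white, black))  # prints 6
```

Positive values favour White and negative values favour Black; a real program would add positional features alongside material.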
Cutting off search
MinimaxCutoff is identical to MinimaxValue
except:
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
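The two substitutions can be seen in a short Python sketch. Everything here is an illustrative assumption: the game is a hypothetical take-away game (remove 1 to 3 stones, taking the last stone wins), a state is a `(stones, max_to_move)` tuple, and the evaluation function simply returns 0 for any unfinished position.

```python
def cutoff(state, depth, limit):
    """Cutoff? replaces Terminal?: stop at game end or at the depth limit."""
    stones, _ = state
    return stones == 0 or depth >= limit


def evaluate(state):
    """Eval replaces Utility: exact at terminal states, a guess elsewhere."""
    stones, max_to_move = state
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if max_to_move else 1
    return 0  # placeholder heuristic: position treated as unclear


def minimax_cutoff(state, depth, limit):
    """Depth-limited minimax value of state, from MAX's point of view."""
    if cutoff(state, depth, limit):
        return evaluate(state)
    stones, max_to_move = state
    children = [(stones - k, not max_to_move) for k in (1, 2, 3) if k <= stones]
    values = [minimax_cutoff(child, depth + 1, limit) for child in children]
    return max(values) if max_to_move else min(values)


print(minimax_cutoff((5, True), 0, 2))   # shallow search: estimate only
print(minimax_cutoff((5, True), 0, 10))  # deep enough to reach every terminal
```

With a limit of 2 plies the search returns 0 (unclear), while the full search returns +1 (a forced win for MAX). This shows how the quality of Eval and the depth of the cutoff together decide how good the move choice is.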
Does it work in practice?
b^m = 10^6, b = 35 → m = 4
4-ply lookahead is a hopeless chess player!
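The arithmetic behind the 4-ply figure can be checked directly. A small sketch using the numbers from the text (100 seconds, 10^4 nodes per second, b = 35); the variable names are arbitrary:

```python
import math

seconds = 100
nodes_per_second = 10 ** 4
budget = seconds * nodes_per_second      # 10^6 nodes per move

b = 35                                   # approximate branching factor of chess
m = math.log(budget) / math.log(b)       # solve b^m = budget for m

print(f"budget = {budget} nodes, reachable depth m = {m:.2f} ply")
print(f"35^3 = {35 ** 3}, 35^4 = {35 ** 4}")
```

The budget falls between 35^3 and 35^4, so plain minimax gets through roughly four plies per move.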
Deterministic games in
practice

Checkers: Chinook ended the 40-year reign of human world
champion Marion Tinsley in 1994. Used a precomputed
endgame database defining perfect play for all positions
involving 8 or fewer pieces on the board, a total of 444 billion
positions.



Chess: Deep Blue defeated human world champion Garry
Kasparov in a six-game match in 1997. Deep Blue searches 200
million positions per second, uses very sophisticated evaluation,
and undisclosed methods for extending some lines of search up
to 40 ply.


Othello: human champions refuse to compete against
computers, which are too good.


Go: human champions refuse to compete against computers,
Summary

Games are fun to work on!


They illustrate several important points
about AI



perfection is unattainable → must
approximate
good idea to think about what to think
about