Game Theory

GAME THEORY
Heng Ji
jih@rpi.edu
Feb 09/12, 2016
Outline
• Game Categories and Strategies
• MinMax
• Alpha-Beta Pruning
• Other Game Strategies
• Real-world Applications
GAMES AND ADVERSARIAL SEARCH
Game playing
• Rich tradition of creating game-playing programs in AI
• Many similarities to search
• Most of the games studied
• have two players,
• are zero-sum: what one player wins, the other loses
• have perfect information: the entire state of the game is known to both players
at all times
• E.g., tic-tac-toe, checkers, chess, Go, backgammon, …
• Will focus on these for now
• Recently more interest in other games
• Esp. games without perfect information; e.g., poker
• Need probability theory, game theory for such games
Why study games?
• Games are a traditional hallmark of intelligence
• Games are easy to formalize
• Games can be a good model of real-world
competitive activities
• Military confrontations, negotiation, auctions, etc.
Types of game environments

                                              Deterministic         Stochastic
Perfect information (fully observable)        Chess, checkers, go   Backgammon, monopoly
Imperfect information (partially observable)  Battleships           Scrabble, poker, bridge
Game theory
• Game theory deals with systems of interacting
agents where the outcome for an agent depends
on the actions of all the other agents
• Applied in sociology, politics, economics, biology, and,
of course, AI
• Agent design: determining the best strategy for
a rational agent in a given game
• Mechanism design: how to set the rules of the
game to ensure a desirable outcome
http://www.economist.com/node/21527025
Simultaneous single-move games
• Players must choose their actions at the same time, without
knowing what the others will do
• Form of partial observability
Normal form representation:

                    Player 2
Player 1     0,0     1,-1    -1,1
            -1,1     0,0     1,-1
             1,-1   -1,1     0,0

Payoff matrix (row player’s utility is listed first)
Is this a zero-sum game?
Husband-Wife Dilemma
• A couple was kidnapped, and only one of them could survive. They agreed to both show rock.
• But the husband won, because he showed scissors while the wife showed paper.
• Interpretation
  • The husband wanted the wife to win, while the wife wanted herself to win
  • The husband wanted the wife to win, while the wife guessed the husband would show scissors, so she showed paper to let him win
  • The husband guessed the wife would show paper, so he showed scissors because he wanted to win
Prisoner’s dilemma
• Two criminals have been arrested and the police visit them separately
• If one player testifies against the other and the other refuses, the one who testified goes free and the one who refused gets a 10-year sentence
• If both players testify against each other, they each get a 5-year sentence
• If both refuse to testify, they each get a 1-year sentence

                   Bob: Testify   Bob: Refuse
Alice: Testify        -5,-5          0,-10
Alice: Refuse         -10,0          -1,-1

(Alice’s utility is listed first)
Prisoner’s dilemma
• Alice’s reasoning:
• Suppose Bob testifies. Then I get
5 years if I testify and 10 years if
I refuse. So I should testify.
• Suppose Bob refuses. Then I go free if
I testify, and get 1 year if
I refuse. So I should testify.
• Dominant strategy: A strategy
whose outcome is better for the
player regardless of the strategy
chosen by the other player
                   Bob: Testify   Bob: Refuse
Alice: Testify        -5,-5          0,-10
Alice: Refuse         -10,0          -1,-1
Prisoner’s dilemma
• Nash equilibrium: A pair of
strategies such that no player can get
a bigger payoff by switching
strategies, provided the other player
sticks with the same strategy
                   Bob: Testify   Bob: Refuse
Alice: Testify        -5,-5          0,-10
Alice: Refuse         -10,0          -1,-1
• (Testify, testify) is a dominant strategy
equilibrium
• Pareto optimal outcome: It is
impossible to make one of the
players better off without making
another one worse off
• In a non-zero-sum game, a Nash
equilibrium is not necessarily Pareto
optimal!
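For small matrices, these definitions can be checked by brute force. Here is a minimal sketch (not from the slides) that finds the pure-strategy Nash equilibria of the Prisoner’s dilemma payoffs above:

```python
# Pure-strategy Nash equilibrium finder for a 2-player normal-form game.
# Payoffs are (Alice's utility, Bob's utility), indexed by their actions.
payoffs = {
    ('testify', 'testify'): (-5, -5),
    ('testify', 'refuse'):  (0, -10),
    ('refuse',  'testify'): (-10, 0),
    ('refuse',  'refuse'):  (-1, -1),
}
actions = ['testify', 'refuse']

def is_nash(a, b):
    """(a, b) is Nash if neither player gains by deviating unilaterally."""
    ua, ub = payoffs[(a, b)]
    return (all(payoffs[(a2, b)][0] <= ua for a2 in actions) and
            all(payoffs[(a, b2)][1] <= ub for b2 in actions))

print([(a, b) for a in actions for b in actions if is_nash(a, b)])
# -> [('testify', 'testify')]: the dominant strategy equilibrium,
#    even though ('refuse', 'refuse') would be better for both players.
```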
Prisoner’s dilemma in real life
• Price war
• Arms race
• Steroid use
• Pollution control

              Cooperate             Defect
Cooperate     Win – win             Lose big – win big
Defect        Win big – lose big    Lose – lose

http://en.wikipedia.org/wiki/Prisoner’s_dilemma
Is there any reasonable way to get a
better answer?
• Superrationality (Douglas Hofstadter)
• Assume that the answer to a symmetric problem will be
the same for both players
• Maximize the payoff to each player while considering
only identical strategies
• Not a conventional model in game theory
Stag hunt

                  Hunter 1: Stag   Hunter 1: Hare
Hunter 2: Stag         2,2              1,0
Hunter 2: Hare         0,1              1,1
• Is there a dominant strategy for either player?
• Is there a Nash equilibrium?
• (Stag, stag) and (hare, hare)
• Model for cooperative activity
Prisoner’s dilemma vs. stag hunt

Prisoner’s dilemma:
              Cooperate             Defect
Cooperate     Win – win             Lose big – win big
Defect        Win big – lose big    Lose – lose
Players can gain by defecting unilaterally

Stag hunt:
              Cooperate             Defect
Cooperate     Win big – win big     Lose – win
Defect        Win – lose            Win – win
Players lose by defecting unilaterally
Game of Chicken

                      Player 2: Straight   Player 2: Chicken
Player 1: Straight        -10,-10               1,-1
Player 1: Chicken          -1,1                 0,0

(Player 1’s utility is listed first)
• Is there a dominant strategy for either player?
• Is there a Nash equilibrium?
(Straight, chicken) or (chicken, straight)
• Anti-coordination game: it is mutually beneficial for the two players to
choose different strategies
• Model of escalated conflict in humans and animals
(hawk-dove game)
• How are the players to decide what to do?
• Pre-commitment or threats
• Different roles: the “hawk” is the territory owner and the “dove” is the intruder, or
vice versa
http://en.wikipedia.org/wiki/Game_of_chicken
Some multi-player games
• The diner’s dilemma
• A group of people go out to eat and agree to split the bill equally.
Each has a choice of ordering a cheap dish or an expensive dish
(the utility of the expensive dish is higher than that of the cheap
dish, but not enough for you to want to pay the difference)
• What would you do?
• Nash equilibrium is for everybody to get the expensive dish
Let’s play a game
• Pick a number between 0 and 100
• If your number is closest to 2/3 of the average of all numbers, then you win the game.
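One way to see where this game leads: no rational guess should exceed 2/3 of the largest possible average, and applying that argument repeatedly drives the only equilibrium guess to 0. A small sketch of the elimination process:

```python
# Iterated elimination of dominated guesses in the 2/3-of-average game.
# Round 1: the average is at most 100, so no guess above 2/3 * 100.
# Round 2: then the average is at most 66.7, so no guess above 44.4, ...
bound = 100.0
for round_num in range(1, 11):
    bound *= 2 / 3
    print(f"After round {round_num}: no rational guess above {bound:.2f}")
# The bound shrinks toward 0, the Nash equilibrium of the game.
```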
Outline
• Game Categories and Strategies
• MinMax
• Alpha-Beta Pruning
• Other Game Strategies
• Real-world Applications
Game playing algorithms today
• Computers are better than humans
• Checkers: solved in 2007
• Chess: IBM Deep Blue defeated Kasparov in 1997
• Computers are competitive with top human players
• Backgammon: TD-Gammon system used reinforcement
learning to learn a good evaluation function
• Bridge: top systems use Monte Carlo simulation and
alpha-beta search
• Computers are not competitive
• Go: branching factor 361. Existing systems use Monte
Carlo simulation and pattern databases
Origins of game playing algorithms
• Ernst Zermelo (1912): Minimax algorithm
• Claude Shannon (1949): chess playing with
evaluation function, quiescence search, selective
search (paper)
• John McCarthy (1956): Alpha-beta search
• Arthur Samuel (1956): checkers program that
learns its own evaluation function by playing
against itself
Review: Games
• Efficiency of alpha-beta pruning
• Evaluation functions
• Horizon effect
• Quiescence search
• Additional techniques for improving efficiency
• Stochastic games, partially observable games
Games vs. search problems
• "Unpredictable" opponent → specifying a move for every possible opponent reply
• Time limits → unlikely to find goal, must approximate
Alternating two-player zero-sum games
• Players take turns
• Each game outcome or terminal state has a
utility for each player (e.g., 1 for win, 0 for loss)
• The sum of both players’ utilities is a constant
Games vs. single-agent search
• We don’t know how the opponent will act
• The solution is not a fixed sequence of actions from start
state to goal state, but a strategy or policy (a mapping from
state to best move in that state)
• Efficiency is critical to playing well
• The time to make a move is limited
• The branching factor, search depth, and number of terminal
configurations are huge
• In chess, branching factor ≈ 35 and depth ≈ 100, giving a search tree of about 10^154 nodes
• This rules out searching all the way to the end of the game
Two Player Games
• Competitive rather than cooperative
• One player loses, one player wins
• Zero sum game
• One player wins what the other one loses
• See game theory for the mathematics
• Getting an agent to play a game
• Boils down to how it plays each move
• Express this as a search problem
• Cannot backtrack once a move has been made (episodic)
Game tree (2-player, deterministic, turns)
• A game of tic-tac-toe between two players, “max” and “min”
http://xkcd.com/832/
“Sum to 2” game
• Player 1 moves, then player 2, finally player 1 again
• Move = 0 or 1
• Player 1 wins if and only if all moves together sum to 2
[Game tree: player 1 chooses 0 or 1, then player 2, then player 1 again; each leaf is +1 if the three moves sum to 2 and -1 otherwise. Player 1’s utility is in the leaves; player 2’s utility is the negative of this.]
Backward induction (aka. minimax)
• From leaves upward, analyze best decision for player at node,
give node a value
• Once we know values, easy to find optimal action (choose best value)
[The same game tree annotated with backed-up values: each player 1 node takes the maximum of its children, each player 2 node the minimum; the root receives the value 1, so player 1 can guarantee a win.]
Modified game
• From leaves upward, analyze best decision for player at node,
give node a value
[The same tree with modified leaf utilities (values such as -8, -5, -3, -2, 4, 6, 7); backward induction now gives the root the value 6.]
A recursive implementation
• Value(state)
• If state is terminal, return its value
• If (player(state) = player 1)
• v := -infinity
• For each action
• v := max(v, Value(successor(state, action)))
• Return v
• Else
• v := infinity
• For each action
• v := min(v, Value(successor(state, action)))
• Return v
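As a concrete illustration (a sketch, not from the slides), the pseudocode above can be turned into runnable Python and applied to the "sum to 2" game, with a state represented as the tuple of moves made so far:

```python
# Minimax / backward induction for the "sum to 2" game.
# Player 1 moves on turns 0 and 2, player 2 on turn 1; each move is 0 or 1.
# Player 1 wins (+1) iff the three moves sum to exactly 2, else loses (-1).

def value(state):
    if len(state) == 3:                      # terminal: all moves made
        return 1 if sum(state) == 2 else -1  # utility for player 1
    if len(state) % 2 == 0:                  # player 1's turn: maximize
        return max(value(state + (move,)) for move in (0, 1))
    else:                                    # player 2's turn: minimize
        return min(value(state + (move,)) for move in (0, 1))

print(value(()))  # -> 1: player 1 can guarantee a win (by playing 1 first)
```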
Outline
• Game Categories and Strategies
• MinMax
• Alpha-Beta Pruning
• Other Game Strategies
• Real-world Applications
Review: Games
• What is a zero-sum game?
• What’s the optimal strategy for a player in a zero-sum
game?
• How do you compute this strategy?
Minimax
• Perfect play for deterministic games
• Idea: choose move to position with highest minimax value = best achievable payoff against best play
• E.g., 2-ply game:
Minimax algorithm
Properties of minimax
• Complete? Yes (if tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)
• For chess, b ≈ 35, m ≈ 100 for "reasonable" games → exact solution completely infeasible
A more abstract game tree
[Figure: a two-ply game tree; the terminal utilities (for MAX) back up to MIN-node values of 3, 2, and 2, giving the root MAX node the minimax value 3.]
• Minimax value of a node: the utility (for MAX) of being in the
corresponding state, assuming perfect play on both sides
• Minimax strategy: Choose the move that gives the best worst-case payoff
Computing the minimax value of a state
• Minimax(state) =
  – Utility(state) if state is terminal
  – max Minimax(successors(state)) if player = MAX
  – min Minimax(successors(state)) if player = MIN
Computing the minimax value of a state
• The minimax strategy is optimal against an optimal opponent
• If the opponent is sub-optimal, the utility can only be higher
• A different strategy may work better for a sub-optimal opponent, but it will necessarily be worse against an optimal opponent
More general games
[Figure: a three-player game tree with utility triples such as 4,3,2 / 1,5,2 / 7,4,1 / 7,7,1 at the nodes.]
• More than two players, non-zero-sum
• Utilities are now tuples
• Each player maximizes their own utility at each node
• Utilities get propagated (backed up) from children to parents
Do we need to see all the leaves?
• Do we need to see the value of the question mark here?
[Game tree for a modified game in which one leaf value is hidden ("?"): the values already examined determine the optimal decision, so the hidden leaf never needs to be evaluated.]
Do we need to see all the leaves?
• Do we need to see the values of the question marks here?
[The same tree with two leaf values hidden ("?"): again, the remaining leaf values already determine the decision.]
Alpha-beta pruning
• Pruning = cutting off parts of the search tree (because you
realize you don’t need to look at them)
• When we considered A* we also pruned large parts of the search tree
• Maintain alpha = value of the best option for player 1 along the
path so far
• Beta = value of the best option for player 2 along the path so far
Pruning on beta
• Beta at node v is -1
• We know the value of node v is going to be at least 4, so the -1
route will be preferred
• No need to explore this node further
[Figure: node v is a player 1 node beneath a player 2 choice; its first child already guarantees at least 4, while player 2 can get -1 elsewhere, so player 2 will never choose v and its remaining children are pruned.]
Pruning on alpha
• Alpha at node w is 6
• We know the value of node w is going to be at most -1, so
the 6 route will be preferred
• No need to explore this node further
[Figure: node w is a player 2 node beneath a player 1 choice; its first child already shows its value is at most -1, while player 1 can get 6 elsewhere, so player 1 will never choose w and its remaining children are pruned.]
Alpha-beta pruning
• It is possible to compute the exact minimax decision without expanding every node in the game tree
[Figure: a step-by-step alpha-beta walkthrough on the two-ply example tree; after the second MIN node’s first leaf (2) is seen, its remaining leaves are pruned, and the root’s minimax value of 3 is found without expanding every node.]
Alpha-beta pruning
• α is the value of the best choice for the MAX player found so far at any choice point above n
• We want to compute the MIN-value at n
• As we loop over n’s children, the MIN-value decreases
• If it drops below α, MAX will never take this branch, so we can ignore n’s remaining children
• Analogously, β is the value of the lowest-utility choice found so far for the MIN player
[Figure: alternating MAX and MIN levels, with n a MIN node deep in the tree.]
Alpha-beta pruning
• Pruning does not affect final result
• Amount of pruning depends on move ordering
• Should start with the “best” moves (highest-value for MAX
or lowest-value for MIN)
• For chess, can try captures first, then threats, then
forward moves, then backward moves
• Can also try to remember “killer moves” from other
branches of the tree
• With perfect ordering, the time to find the best move is reduced to O(b^(m/2)) from O(b^m)
• Depth of search is effectively doubled
Evaluation function
• Cut off search at a certain depth and compute the value of an
evaluation function for a state instead of its minimax value
• The evaluation function may be thought of as the probability of winning
from a given state or the expected value of that state
• A common evaluation function is a weighted sum of features:
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
• For chess, wk may be the material value of a piece (pawn = 1,
knight = 3, rook = 5, queen = 9) and fk(s) may be the advantage in terms
of that piece
• Evaluation functions may be learned from game databases or
by having the program play many games against itself
Evaluation functions
• For chess, typically linear weighted sum of features
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
• e.g., w1 = 9 with
f1(s) = (number of white queens) – (number of black
queens), etc.
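A minimal sketch of such a weighted material evaluation (the piece encoding is an assumption, not from the slides):

```python
# Weighted linear evaluation: Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s).
# Each feature fk is (number of white pieces - number of black pieces)
# for one piece type, weighted by the usual material values wk.
WEIGHTS = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def material_eval(white_counts, black_counts):
    """Positive scores favor White; counts are dicts like {'pawn': 5}."""
    return sum(w * (white_counts.get(p, 0) - black_counts.get(p, 0))
               for p, w in WEIGHTS.items())

# Position from the "Example Chess Score" slide later in the deck:
# Black has 5 pawns, 1 bishop, 2 rooks (18 points); White has 5 pawns
# and 1 rook (10 points), so from White's point of view Eval = -8.
print(material_eval({'pawn': 5, 'rook': 1},
                    {'pawn': 5, 'bishop': 1, 'rook': 2}))  # -> -8
```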
Cutting off search
• Horizon effect: you may incorrectly estimate the
value of a state by overlooking an event that is
just beyond the depth limit
• For example, a damaging move by the opponent that
can be delayed but not avoided
• Possible remedies
• Quiescence search: do not cut off search at positions
that are unstable – for example, are you about to lose
an important piece?
• Singular extension: a strong move that should be tried
when the normal depth limit is reached
Cutting off search
MinimaxCutoff is identical to MinimaxValue except:
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
Does it work in practice?
• b^m = 10^6 with b = 35 gives m ≈ 4
• 4-ply lookahead is a hopeless chess player!
• 4-ply ≈ human novice
• 8-ply ≈ typical PC, human master
• 12-ply ≈ Deep Blue, Kasparov
Properties of α-β
• Pruning does not affect final result
• Good move ordering improves effectiveness of pruning
• With "perfect ordering," time complexity = O(b^(m/2)) → doubles depth of search
• A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
Modifying recursive implementation to do
alpha-beta pruning
• Value(state, alpha, beta)
• If state is terminal, return its value
• If (player(state) = player 1)
• v := -infinity
• For each action
• v := max(v, Value(successor(state, action), alpha, beta))
• If v >= beta, return v
• alpha := max(alpha, v)
• Return v
• Else
• v := infinity
• For each action
• v := min(v, Value(successor(state, action), alpha, beta))
• If v <= alpha, return v
• beta := min(beta, v)
• Return v
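Applied to the "sum to 2" game from earlier, a sketch like the following shows the pruning in action (the node counter is only for illustration):

```python
# Alpha-beta on the "sum to 2" game, counting visited nodes to show that
# pruning skips subtrees that plain minimax would explore.
visited = 0

def ab_value(state, alpha=float('-inf'), beta=float('inf')):
    global visited
    visited += 1
    if len(state) == 3:                       # terminal
        return 1 if sum(state) == 2 else -1
    if len(state) % 2 == 0:                   # player 1: maximize
        v = float('-inf')
        for move in (0, 1):
            v = max(v, ab_value(state + (move,), alpha, beta))
            if v >= beta:                     # player 2 would avoid this branch
                return v
            alpha = max(alpha, v)
        return v
    else:                                     # player 2: minimize
        v = float('inf')
        for move in (0, 1):
            v = min(v, ab_value(state + (move,), alpha, beta))
            if v <= alpha:                    # player 1 would avoid this branch
                return v
            beta = min(beta, v)
        return v

print(ab_value(()), visited)  # -> 1 13: same value, 13 of the 15 nodes visited
```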
The α-β algorithm
Benefits of alpha-beta pruning
• Without pruning, need to examine O(b^m) nodes
• With pruning, depends on which nodes we consider first
• If we choose a random successor, need to examine O(b^(3m/4)) nodes
• If we manage to choose the best successor first, need to examine O(b^(m/2)) nodes
• Practical heuristics for choosing next successor to
consider get quite close to this
• Can effectively look twice as deep!
• Difference between reasonable and expert play
Repeated states
• As in search, multiple sequences of moves
may lead to the same state
• Again, can keep track of previously seen
states (usually called a transposition table in
this context)
• May not want to keep track of all previously seen
states…
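A sketch of the idea as simple memoization (the state key below is specific to the "sum to 2" game; real engines hash the position):

```python
from functools import lru_cache

# Transposition table via memoization: key each position by what actually
# matters (here, the number of moves made and their running sum), so move
# sequences like 0-then-1 and 1-then-0 are evaluated only once.
@lru_cache(maxsize=None)
def value(n_moves, total):
    if n_moves == 3:                          # terminal
        return 1 if total == 2 else -1
    vals = [value(n_moves + 1, total + m) for m in (0, 1)]
    return max(vals) if n_moves % 2 == 0 else min(vals)

print(value(0, 0))                  # -> 1, as before
print(value.cache_info().currsize)  # -> 10 positions cached vs. 15 tree nodes
```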
Using evaluation functions
• Most games are too big to solve even with alpha-
beta pruning
• Solution: Only look ahead to limited depth
(nonterminal nodes)
• Evaluate nodes at depth cutoff by a heuristic
(aka. evaluation function)
• E.g., chess:
• Material value: queen worth 9 points, rook 5, bishop 3,
knight 3, pawn 1
• Heuristic: difference between players’ material values
Resource limits
Suppose we have 100 secs, explore 10^4 nodes/sec
→ 10^6 nodes per move
Standard approach:
• cutoff test:
e.g., depth limit (perhaps add quiescence search)
• evaluation function
= estimated desirability of position
Outline
• Game Categories and Strategies
• MinMax
• Alpha-Beta Pruning
• Other Game Strategies
• Real-world Applications
Review: Minimax
• Minimax(state) =
  – Utility(state) if state is terminal
  – max Minimax(successors(state)) if player = MAX
  – min Minimax(successors(state)) if player = MIN
[Figure: the two-ply example tree with MIN values 3, 2, 2 and root value 3.]
(Our) Basis of Game Playing:
Search for best move every time
[Diagram: from the initial board state, search for move 1; the opponent moves, leading to board states 2 and 3; from there, search for move 3; the opponent moves again, leading to board states 4 and 5; and so on.]
Lookahead Search
• If I played this move
• Then they might play that move
• Then I could do that move
• And they would probably do that move
• Or they might play that move
• Then I could do that move
• And they would play that move
• Or I could play that move
• And they would do that move
• If I played this move…
Lookahead Search (best moves)
• If I played this move
• Then their best move would be
• Then my best move would be
• Then their best move would be
• Or another good move for them is…
• Then my best move would be
• Etc.
Minimax Search
• Like children sharing a cake
• Underlying assumption
• Opponent acts rationally
• Each player moves in such a way as to
• Maximise their final winnings, minimise their losses
• i.e., play the best move at the time
• Method:
• Calculate the guaranteed final scores for each move
• Assuming the opponent will try to minimise that score
• Choose move that maximises this guaranteed score
Example Trivial Game
• Deal four playing cards out, face up
• Player 1 chooses one, player 2 chooses one
• Player 1 chooses another, player 2 chooses another
• And the winner is….
• Add the cards up
• The player with the highest even number
• Scores that amount (in pounds sterling from opponent)
For Trivial Games
• Draw the entire search space
• Put the scores associated with each final board state at
the ends of the paths
• Move the scores from the ends of the paths to the starts of
the paths
• Whenever there is a choice use minimax assumption
• This guarantees the scores you can get
• Choose the path with the best score at the top
• Take the first move on this path as the next move
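A sketch of this procedure on the four-card game above, under one reading of its rules (each player’s two cards are summed; only an even total counts, and the higher even total is won from the opponent):

```python
# Minimax over the full search space of the four-card game.
def score(cards):
    s = sum(cards)
    return s if s % 2 == 0 else 0      # only an even total scores

def utility(p1_cards, p2_cards):       # pounds won by player 1
    s1, s2 = score(p1_cards), score(p2_cards)
    return s1 if s1 > s2 else (-s2 if s2 > s1 else 0)

def best_value(remaining, p1, p2):
    if not remaining:                  # all four cards taken
        return utility(p1, p2)
    turn_p1 = len(p1) == len(p2)       # players alternate picks
    vals = (best_value(remaining - {c},
                       p1 | {c} if turn_p1 else p1,
                       p2 if turn_p1 else p2 | {c}) for c in remaining)
    return max(vals) if turn_p1 else min(vals)

print(best_value(frozenset({2, 3, 5, 8}), frozenset(), frozenset()))
# -> 0: with these example cards, neither player can force a positive score
```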
Entire Search Space
Moving the scores from
the bottom to the top
Moving a score
when there’s a choice
• Use minimax assumption
• Rational choice for the player below the number you’re moving
Choosing the best move
For Real Games
• Search space is too large
• So we cannot draw (search) the entire space
• For example: chess has branching factor of ~35
• Suppose our agent searches 1000 board states per second
• And has a time limit of 150 seconds
• So can search 150,000 positions per move
• This is only three or four ply look ahead
• Because 35^3 = 42,875 and 35^4 = 1,500,625
• Average humans can look ahead six to eight ply
Cutoff Search
• Must use a heuristic search
• Use an evaluation function
• Estimate the guaranteed score from a board state
• Draw search space to a certain depth
• Depth chosen to limit the time taken
• Put the estimated values at the end of paths
• Propagate them to the top as before
• Question:
• Is this a uniform path cost, greedy or A* search?
Evaluation Functions
• Must be able to differentiate between
• Good and bad board states
• Exact values not important
• Ideally, the function would return the true score
• For goal states
• Example in chess
• Weighted linear function
• Weights:
• Pawn=1, knight=bishop=3, rook=5, queen=9
Example Chess Score
• Black has:
• 5 pawns, 1 bishop, 2 rooks
• Score = 1*(5)+3*(1)+5*(2)
= 5+3+10 = 18
• White has:
• 5 pawns, 1 rook
• Score = 1*(5)+5*(1)
= 5 + 5 = 10
Overall scores for this board state:
black = 18-10 = 8
white = 10-18 = -8
Evaluation Function for our Game
• Evaluation after the first move
• Count zero if it’s odd, take the number if it’s even
• The evaluation function here would choose 10
• But this would be disastrous for the player
Problems with Evaluation Functions
• Horizon problem
• Agent cannot see far enough into search space
• Potentially disastrous board position after seemingly good one
• Possible solution
• Reduce the number of initial moves to look at
• Allows you to look further into the search space
• Non-quiescent search
• Exhibits big swings in the evaluation function
• E.g., when taking pieces in chess
• Solution: advance search past non-quiescent part
Pruning
• Want to visit as many board states as possible
• Want to avoid whole branches (prune them)
• Because they can’t possibly lead to a good score
• Example: having your queen taken in chess
• (Queen sacrifices often very good tactic, though)
• Alpha-beta pruning
• Can be used for entire search or cutoff search
• Recognize that a branch cannot produce better score
• Than a node you have already evaluated
Alpha-Beta Pruning for Player 1
1. Given a node N which can be chosen by player one,
then if there is another node, X, along any path, such
that (a) X can be chosen by player two (b) X is on a
higher level than N and (c) X has been shown to
guarantee a worse score for player one than N, then
the parent of N can be pruned.
2. Given a node N which can be chosen by player two,
then if there is a node X along any path such that (a)
player one can choose X (b) X is on a higher level
than N and (c) X has been shown to guarantee a
better score for player one than N, then the parent of
N can be pruned.
Example of Alpha-Beta Pruning
[Figure: an example tree with alternating player 1 and player 2 levels]
• Depth first search a good idea here
• See notes for explanation
Expectimax Search
• Going to draw tree and move values as before
• Whenever there is a random event
• Add an extra node for each possible outcome which will change the
board states possible after the event
• E.g., six extra nodes if each roll of die affects state
• Work out all possible board states from chance node
• When moving score values up through a chance node
• Multiply the value by the probability of the event happening
• Add the products together
• This gives you the expected value coming through the chance node
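A tiny numeric sketch of that backup rule (toy values, not from the slides):

```python
# Expectimax backup at a chance node: weight each outcome's value by its
# probability and sum. Example: with probability 1/6 (a six is rolled)
# the resulting position is worth -3 to us; otherwise it is worth +2.
outcomes = [(1 / 6, -3.0), (5 / 6, 2.0)]   # (probability, value) pairs
chance_value = sum(p * v for p, v in outcomes)
print(chance_value)                        # -> 1.1666...

# A maximizing player choosing between this chance node and a safe
# move worth 1.0 prefers the chance node, since 1.17 > 1.0.
print(max(chance_value, 1.0))
```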
More interesting
(but still trivial) game
• Deal four cards face up
• Player 1 chooses a card
• Player 2 throws a die
• If it’s a six, player 2 chooses a card, swaps it with player 1’s and
keeps player 1’s card
• If it’s not a six, player 2 just chooses a card
• Player 1 chooses next card
• Player 2 takes the last card
Expectimax Diagram
Expectimax Calculations
Games Played by Computer
• Games played perfectly:
• Connect four, noughts & crosses (tic-tac-toe)
• Best move pre-calculated for each board state
• Small number of possible board states
• Games played well:
• Chess, draughts (checkers), backgammon
• Scrabble, tetris (using ANNs)
• Games played badly:
• Go, bridge, soccer
Philosophical Questions
• Q1. Is how computers play chess
• More fundamental than how people play chess?
• In science, simple & effective techniques are valued
• Minimax cutoff search is simple and effective
• But this is seen by some as stupid and “non-AI”
• Drew McDermott:
• "Saying Deep Blue doesn't really think about chess is like saying an
airplane doesn't really fly because it doesn't flap its wings”
• Q2. If aliens came to Earth and challenged us to chess…
• Would you send Deep Blue or Kasparov into battle?
Additional techniques
• Transposition table to store previously expanded
states
• Forward pruning to avoid considering all possible
moves
• Lookup tables for opening moves and endgames
Chess playing systems
• Baseline system: 200 million node evaluations per move (3 min), minimax with a decent evaluation function and quiescence search
  • 5-ply ≈ human novice
• Add alpha-beta pruning
  • 10-ply ≈ typical PC, experienced player
• Deep Blue: 30 billion evaluations per move, singular extensions, evaluation function with 8000 features, large databases of opening and endgame moves
  • 14-ply ≈ Garry Kasparov
• Recent state of the art (Hydra): 36 billion evaluations per second, advanced pruning techniques
  • 18-ply ≈ better than any human alive?
Games with Chance
• Many more interesting games
• Have an element of chance
• Brought in by throwing a die, tossing a coin
• Example: backgammon
• See Gerry Tesauro’s TD-Gammon program
• In these cases
• We can no longer calculate guaranteed scores
• We can only calculate expected scores
• Using probability to guide us
Games of chance
• How to incorporate dice throwing into the game tree?
Games of chance
• Expectiminimax: for chance nodes, average
values weighted by the probability of each outcome
• Nasty branching factor, defining evaluation functions and
pruning algorithms more difficult
• Monte Carlo simulation: when you get to a chance
node, simulate a large number of games with
random dice rolls and use win percentage as
evaluation function
• Can work well for games like Backgammon
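A minimal sketch of the Monte Carlo idea (the toy race game below is an assumption for illustration, not from the slides):

```python
import random

# Monte Carlo evaluation: play many random games from a state and use the
# win fraction as the state's value. Toy game: players alternately roll a
# die and add it to their own total; the first to reach 15 wins.
def rollout(my_total, opp_total, my_turn):
    while True:
        if my_turn:
            my_total += random.randint(1, 6)
            if my_total >= 15:
                return 1                 # we win this playout
        else:
            opp_total += random.randint(1, 6)
            if opp_total >= 15:
                return 0                 # we lose this playout
        my_turn = not my_turn

def mc_eval(my_total, opp_total, my_turn, n=10000):
    wins = sum(rollout(my_total, opp_total, my_turn) for _ in range(n))
    return wins / n                      # estimated win probability

print(mc_eval(12, 5, True))   # close to 1: far ahead and on the move
print(mc_eval(5, 12, False))  # close to 0: opponent is about to win
```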
Partially observable games
• Card games like bridge and poker
• Monte Carlo simulation: deal all the cards
randomly in the beginning and pretend the game
is fully observable
• “Averaging over clairvoyance”
• Problem: this strategy does not account for bluffing,
information gathering, etc.
Outline
• Game Categories and Strategies
• MinMax
• Alpha-Beta Pruning
• Other Game Strategies
• Real-world Applications
Deterministic games in practice
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.
• Othello: human champions refuse to compete against computers, which are too good.
• Go: human champions refuse to compete against computers, which are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.
Mechanism design
(inverse game theory)
• Assuming that agents pick rational strategies, how
should we design the game to achieve a socially
desirable outcome?
• We have multiple agents and a center that collects
their choices and determines the outcome
Auctions
• Goals
• Maximize revenue to the seller
• Efficiency: make sure the buyer who values the goods
the most gets them
• Minimize transaction costs for buyers and sellers
Ascending-bid auction
• What’s the optimal strategy for a buyer?
• Keep bidding until the current bid value exceeds your private value, then drop out
• Usually revenue-maximizing and efficient, unless
the reserve price is set too low or too high
• Disadvantages
• Collusion
• Lack of competition
• Has high communication costs
Sealed-bid auction
• Each buyer makes a single bid and communicates it to the
auctioneer, but not to the other bidders
• Simpler communication
• More complicated decision-making: the strategy of a buyer depends on
what they believe about the other buyers
• Not necessarily efficient
• Sealed-bid second-price auction: the winner pays the price
of the second-highest bid
• Let V be your private value and B be the highest bid by any other buyer
• If V > B, your optimal strategy is to bid above B – in particular, bid V
• If V < B, your optimal strategy is to bid below B – in particular, bid V
• Therefore, your dominant strategy is to bid V
• This is a truth-revealing mechanism
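A quick simulation sketch of this argument (hypothetical values):

```python
# Sealed-bid second-price (Vickrey) auction: the winner pays the
# second-highest bid. Your utility from bidding `bid` with private
# value V, when the highest competing bid is B:
def utility(V, bid, B):
    return V - B if bid > B else 0   # win and pay B, or lose and pay nothing

# Whatever B turns out to be, bidding your true value V = 10 does as
# well as the best of every other bid from 0 to 20:
V = 10
for B in [5, 9, 10, 11, 15]:
    truthful = utility(V, V, B)
    best_alternative = max(utility(V, bid, B) for bid in range(21))
    print(B, truthful, best_alternative)  # truthful always matches the max
```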
Dollar auction
• A dollar bill is being auctioned off. It goes to the highest
bidder, but the second-highest bidder also has to pay
• Player 1 bids 1 cent
• Player 2 bids 2 cents
• …
• Player 2 bids 98 cents
• Player 1 bids 99 cents
• If Player 2 passes, he loses 98 cents, if he bids $1, he might still come out even
• So Player 2 bids $1
• Now, if Player 1 passes, he loses 99 cents, if he bids $1.01, he only loses 1 cent
• …
• What went wrong?
• When figuring out the expected utility of a bid, a rational player should
take into account the future course of the game
• How about Player 1 starts by bidding 99 cents?
Game theory issues
• Is it applicable to real life?
• Humans are not always rational
• Utilities may not always be known
• Other assumptions made by the game-theoretic model may
not hold
• Political difficulties may prevent theoretically optimal
mechanisms from being implemented
• Could it be more applicable to AI than to real life?
• Computing equilibria in complicated games is difficult
• Relationship between Nash equilibrium and rational decision
making is subtle
The state of the art for some games
• Chess:
• 1997: IBM Deep Blue defeats Kasparov
• … there is still debate about whether computers are really better
• Checkers:
• Computer world champion since 1994
• … there was still debate about whether computers are really better…
• until 2007: checkers solved optimally by computer
• Go:
• Computers still not very good
• Branching factor really high
• Some recent progress
• Poker:
• Competitive with top humans in some 2-player games
• 3+ player case much less well-understood
Is this of any value to society?
• Some of the techniques developed for games
have found applications in other domains
• Especially “adversarial” settings
• Real-world strategic situations are usually not
two-player, perfect-information, zero-sum, …
• But game theory does not need any of those
• Example application: security scheduling at
airports
Summary
• Games are fun to work on!
• They illustrate several important points about AI
• Perfection is unattainable → must approximate
• Good idea to think about what to think about
Assignment 2