GAME THEORY
Heng Ji
jih@rpi.edu
Feb 09/12, 2016

Outline
• Game Categories and Strategies
• Minimax
• Alpha-Beta Pruning
• Other Game Strategies
• Real-world Applications

GAMES AND ADVERSARIAL SEARCH

Game playing
• Rich tradition of creating game-playing programs in AI
• Many similarities to search
• Most of the games studied
  • have two players,
  • are zero-sum: what one player wins, the other loses
  • have perfect information: the entire state of the game is known to both players at all times
• E.g., tic-tac-toe, checkers, chess, Go, backgammon, …
• Will focus on these for now
• Recently more interest in other games
  • Esp. games without perfect information, e.g., poker
  • Need probability theory and game theory for such games

Why study games?
• Games are a traditional hallmark of intelligence
• Games are easy to formalize
• Games can be a good model of real-world competitive activities
  • Military confrontations, negotiation, auctions, etc.

Types of game environments

                                                  Deterministic          Stochastic
  Perfect information (fully observable)          Chess, checkers, Go    Backgammon, Monopoly
  Imperfect information (partially observable)    Battleships            Scrabble, poker, bridge

Game theory
• Game theory deals with systems of interacting agents where the outcome for an agent depends on the actions of all the other agents
• Applied in sociology, politics, economics, biology, and, of course, AI
• Agent design: determining the best strategy for a rational agent in a given game
• Mechanism design: how to set the rules of the game to ensure a desirable outcome
http://www.economist.com/node/21527025

Simultaneous single-move games
• Players must choose their actions at the same time, without knowing what the others will do
  • A form of partial observability
• Normal form representation: a payoff matrix (the row player's utility is listed first)

              Player 2
  Player 1    0,0     1,-1    -1,1
              -1,1    0,0     1,-1
              1,-1    -1,1    0,0

• Is this a zero-sum game?

Husband-Wife Dilemma
• A couple was kidnapped, and only one of them can survive. They agreed to both show rock.
• But the husband won because he showed scissors while the wife showed paper.
• Interpretations
  • The husband wanted his wife to win, while the wife wanted herself to win
  • The husband wanted his wife to win, while the wife guessed the husband would show scissors, so she showed paper to let him win
  • The husband guessed the wife would show paper, so he showed scissors because he wanted to win

Prisoner's dilemma
• Two criminals have been arrested and the police visit them separately
• If one player testifies against the other and the other refuses, the one who testified goes free and the one who refused gets a 10-year sentence
• If both players testify against each other, they each get a 5-year sentence
• If both refuse to testify, they each get a 1-year sentence

                  Alice: Testify    Alice: Refuse
  Bob: Testify    -5,-5             -10,0
  Bob: Refuse     0,-10             -1,-1

Prisoner's dilemma
• Alice's reasoning:
  • Suppose Bob testifies. Then I get 5 years if I testify and 10 years if I refuse. So I should testify.
  • Suppose Bob refuses. Then I go free if I testify and get 1 year if I refuse. So I should testify.
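Alice's case analysis can be checked mechanically. Below is a minimal Python sketch (the dictionary encoding and function name are illustrative, not from the slides) that computes Alice's best response to each of Bob's actions from the payoff matrix above:

  # Payoffs from the matrix above, keyed by (Bob's action, Alice's action);
  # each cell is (Alice's utility, Bob's utility).
  payoff = {
      ("testify", "testify"): (-5, -5),
      ("testify", "refuse"): (-10, 0),
      ("refuse", "testify"): (0, -10),
      ("refuse", "refuse"): (-1, -1),
  }

  def alice_best_response(bob_action):
      # Pick the action that maximizes Alice's utility, holding Bob's action fixed
      return max(["testify", "refuse"],
                 key=lambda a: payoff[(bob_action, a)][0])

  for bob in ["testify", "refuse"]:
      print(bob, "->", alice_best_response(bob))   # "testify" both times

Because "testify" is Alice's best response no matter what Bob does, it is what the next slide defines as a dominant strategy.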
• Dominant strategy: a strategy whose outcome is better for the player regardless of the strategy chosen by the other player

                  Alice: Testify    Alice: Refuse
  Bob: Testify    -5,-5             -10,0
  Bob: Refuse     0,-10             -1,-1

Prisoner's dilemma
• Nash equilibrium: a pair of strategies such that no player can get a bigger payoff by switching strategies, provided the other player sticks with the same strategy
• (Testify, testify) is a dominant strategy equilibrium
• Pareto optimal outcome: it is impossible to make one of the players better off without making another one worse off
• In a non-zero-sum game, a Nash equilibrium is not necessarily Pareto optimal!

Prisoner's dilemma in real life
• Price wars
• Arms races
• Steroid use
• Pollution control

               Cooperate             Defect
  Cooperate    Win – win             Win big – lose big
  Defect       Lose big – win big    Lose – lose

http://en.wikipedia.org/wiki/Prisoner's_dilemma

Is there any reasonable way to get a better answer?
• Superrationality (Douglas Hofstadter)
  • Assume that the answer to a symmetric problem will be the same for both players
  • Maximize the payoff to each player while considering only identical strategies
• Not a conventional model in game theory

Stag hunt

                    Hunter 1: Stag    Hunter 1: Hare
  Hunter 2: Stag    2,2               1,0
  Hunter 2: Hare    0,1               1,1

• Is there a dominant strategy for either player?
• Is there a Nash equilibrium?
  • (Stag, stag) and (hare, hare)
• A model for cooperative activity

Prisoner's dilemma vs. stag hunt

Prisoner's dilemma – players can gain by defecting unilaterally:

               Cooperate             Defect
  Cooperate    Win – win             Win big – lose big
  Defect       Lose big – win big    Lose – lose

Stag hunt – players lose by defecting unilaterally:

               Cooperate             Defect
  Cooperate    Win big – win big     Win – lose
  Defect       Lose – win            Win – win

Game of Chicken
• Payoff matrix (the row player's utility is listed first):

                        Player 2: Chicken    Player 2: Straight
  Player 1: Chicken     0,0                  -1,1
  Player 1: Straight    1,-1                 -10,-10

• Is there a dominant strategy for either player?
• Is there a Nash equilibrium?
  • (Straight, chicken) or (chicken, straight)
• Anti-coordination game: it is mutually beneficial for the two players to choose different strategies
  • A model of escalated conflict in humans and animals (the hawk-dove game)
• How are the players to decide what to do?
  • Pre-commitment or threats
  • Different roles: the "hawk" is the territory owner and the "dove" is the intruder, or vice versa
http://en.wikipedia.org/wiki/Game_of_chicken

Some multi-player games
• The diner's dilemma
  • A group of people go out to eat and agree to split the bill equally. Each has a choice of ordering a cheap dish or an expensive dish (the utility of the expensive dish is higher than that of the cheap dish, but not enough for you to want to pay the difference)
  • What would you do?
  • The Nash equilibrium is for everybody to get the expensive dish

Let's play a game
• Pick a number between 0 and 100
• If your number is closest to 2/3 of the average of all numbers, then you win the game
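This guessing game rewards reasoning about the other players' reasoning. A small sketch (assuming, purely for illustration, that everyone starts from a naive guess and best-responds to the previous round) shows the iterated logic driving guesses toward 0, the game's Nash equilibrium:

  guess = 50.0
  for k in range(1, 16):
      # If everyone guessed `guess` last round, the average is `guess`,
      # so the best response this round is 2/3 of that.
      guess = (2 / 3) * guess
      print(k, round(guess, 3))
  # The guesses shrink toward 0, the equilibrium where nobody can do better.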
Outline
• Game Categories and Strategies
• Minimax
• Alpha-Beta Pruning
• Other Game Strategies
• Real-world Applications

Game playing algorithms today
• Computers are better than humans
  • Checkers: solved in 2007
  • Chess: IBM Deep Blue defeated Kasparov in 1997
• Computers are competitive with top human players
  • Backgammon: the TD-Gammon system used reinforcement learning to learn a good evaluation function
  • Bridge: top systems use Monte Carlo simulation and alpha-beta search
• Computers are not competitive
  • Go: branching factor 361; existing systems use Monte Carlo simulation and pattern databases

Origins of game playing algorithms
• Ernst Zermelo (1912): minimax algorithm
• Claude Shannon (1949): chess playing with an evaluation function, quiescence search, selective search
• John McCarthy (1956): alpha-beta search
• Arthur Samuel (1956): a checkers program that learns its own evaluation function by playing against itself

Review: Games
• Efficiency of alpha-beta pruning
• Evaluation functions
• Horizon effect
• Quiescence search
• Additional techniques for improving efficiency
• Stochastic games, partially observable games

Games vs. search problems
• "Unpredictable" opponent → the solution is a strategy specifying a move for every possible opponent reply
• Time limits → unlikely to find the goal, must approximate
• Alternating two-player zero-sum games
  • Players take turns
  • Each game outcome or terminal state has a utility for each player (e.g., 1 for a win, 0 for a loss)
  • The sum of both players' utilities is a constant

Games vs. single-agent search
• We don't know how the opponent will act
  • The solution is not a fixed sequence of actions from start state to goal state, but a strategy or policy (a mapping from state to best move in that state)
• Efficiency is critical to playing well
  • The time to make a move is limited
  • The branching factor, search depth, and number of terminal configurations are huge
    • In chess, branching factor ≈ 35 and depth ≈ 100, giving a search tree of about 10^154 nodes
  • This rules out searching all the way to the end of the game

Two Player Games
• Competitive rather than cooperative
  • One player loses, one player wins
• Zero-sum game
  • One player wins what the other one loses
  • See game theory for the mathematics
• Getting an agent to play a game
  • Boils down to how it plays each move
  • Express this as a search problem
  • Cannot backtrack once a move has been made (episodic)

Game tree (2-player, deterministic, turns)
• A game of tic-tac-toe between two players, "max" and "min"
http://xkcd.com/832/

"Sum to 2" game
• Player 1 moves, then player 2, finally player 1 again
• Each move = 0 or 1
• Player 1 wins if and only if all moves together sum to 2
• [Game tree: the players alternate choosing 0 or 1; player 1's utility (1 or -1) is in the leaves, and player 2's utility is the negative of this]
Backward induction (aka minimax)
• From the leaves upward, analyze the best decision for the player at each node and give the node a value
• Once we know the values, it is easy to find the optimal action (choose the best value)
• [The "sum to 2" game tree with backed-up values at every node; the root value is 1]

Modified game
• From the leaves upward, analyze the best decision for the player at each node and give the node a value
• [The same tree shape with different leaf utilities; backward induction gives the root value 6]

A recursive implementation

  Value(state):
      if state is terminal, return its value
      if player(state) = player 1:
          v := -infinity
          for each action:
              v := max(v, Value(successor(state, action)))
          return v
      else:
          v := infinity
          for each action:
              v := min(v, Value(successor(state, action)))
          return v

Outline
• Game Categories and Strategies
• Minimax
• Alpha-Beta Pruning
• Other Game Strategies
• Real-world Applications

Review: Games
• What is a zero-sum game?
• What's the optimal strategy for a player in a zero-sum game?
• How do you compute this strategy?

Minimax
• Perfect play for deterministic games
• Idea: choose the move to the position with the highest minimax value = the best achievable payoff against best play
• E.g., a 2-ply game

Minimax algorithm

Properties of minimax
• Complete? Yes (if the tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)
• For chess, b ≈ 35 and m ≈ 100 for "reasonable" games → an exact solution is completely infeasible

A more abstract game tree
• [A two-ply game tree with terminal utilities (for MAX); the three MIN nodes have values 3, 3, and 2, and the root value is 3]
• Minimax value of a node: the utility (for MAX) of being in the corresponding state, assuming perfect play on both sides
• Minimax strategy: choose the move that gives the best worst-case payoff

Computing the minimax value of a state
• Minimax(state) =
    Utility(state) if state is terminal
    max over Minimax(successors(state)) if player = MAX
    min over Minimax(successors(state)) if player = MIN
• The minimax strategy is optimal against an optimal opponent
• If the opponent is sub-optimal, the utility can only be higher
• A different strategy may work better for a sub-optimal opponent, but it will necessarily be worse against an optimal opponent

More general games
• More than two players, non-zero-sum
• Utilities are now tuples (e.g., 4,3,2 or 1,5,2 or 7,4,1)
• Each player maximizes their own utility at each node
• Utilities get propagated (backed up) from children to parents

Do we need to see all the leaves?
• Do we need to see the value of the question mark here?
  • [Game tree with one leaf value hidden]
• Do we need to see the values of the question marks here?
  • [Game tree with two leaf values hidden]
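Before turning to pruning, here is the recursive implementation above as runnable Python, applied to the "sum to 2" game (a minimal sketch; the tuple encoding of game trees is an illustrative assumption, not from the slides):

  # A node is either a number (terminal utility for player 1) or a tuple
  # of child subtrees, one per available move (0 or 1).
  def value(state, player1_to_move):
      if isinstance(state, (int, float)):   # terminal: return its value
          return state
      child_values = [value(c, not player1_to_move) for c in state]
      # Player 1 maximizes; player 2 minimizes
      return max(child_values) if player1_to_move else min(child_values)

  # "Sum to 2": three alternating 0/1 moves; player 1 gets +1 iff they sum to 2
  def build_sum_game(moves_left, total):
      if moves_left == 0:
          return 1 if total == 2 else -1
      return tuple(build_sum_game(moves_left - 1, total + m) for m in (0, 1))

  print(value(build_sum_game(3, 0), True))   # 1: player 1 can force a win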
Alpha-beta pruning
• Pruning = cutting off parts of the search tree (because you realize you don't need to look at them)
• When we considered A*, we also pruned large parts of the search tree
• Maintain alpha = the value of the best option for player 1 along the path so far
• Beta = the value of the best option for player 2 along the path so far

Pruning on beta
• Beta at node v is -1
• We know the value of node v is going to be at least 4, so the -1 route will be preferred
• No need to explore this node further
• [Game tree illustrating the beta cutoff at node v]

Pruning on alpha
• Alpha at node w is 6
• We know the value of node w is going to be at most -1, so the 6 route will be preferred
• No need to explore this node further
• [Game tree illustrating the alpha cutoff at node w]

Alpha-beta pruning
• It is possible to compute the exact minimax decision without expanding every node in the game tree
• [Worked example on the two-ply tree from before: expanding leaves left to right, the root value 3 is found while skipping some leaves of the branches whose values are at most 2]

Alpha-beta pruning
• α is the value of the best choice for the MAX player found so far at any choice point above a node n
• We want to compute the MIN-value at n
• As we loop over n's children, the MIN-value decreases
• If it drops below α, MAX will never take this branch, so we can ignore n's remaining children
• Analogously, β is the value of the lowest-utility choice found so far for the MIN player

Alpha-beta pruning
• Pruning does not affect the final result
• The amount of pruning depends on move ordering
  • Should start with the "best" moves (highest-value for MAX or lowest-value for MIN)
  • For chess, can try captures first, then threats, then forward moves, then backward moves
  • Can also try to remember "killer moves" from other branches of the tree
• With perfect ordering, the time to find the best move is reduced to O(b^(m/2)) from O(b^m)
  • The depth of search is effectively doubled

Evaluation function
• Cut off search at a certain depth and compute the value of an evaluation function for a state instead of its minimax value
• The evaluation function may be thought of as the probability of winning from a given state, or the expected value of that state
• A common evaluation function is a weighted sum of features:
    Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
• For chess, wk may be the material value of a piece (pawn = 1, knight = 3, rook = 5, queen = 9) and fk(s) may be the advantage in terms of that piece
• Evaluation functions may be learned from game databases or by having the program play many games against itself

Evaluation functions
• For chess, typically a linear weighted sum of features:
    Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
• E.g., w1 = 9 with f1(s) = (number of white queens) – (number of black queens), etc.
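A minimal Python sketch of such a material-based evaluation (the piece-count representation is an illustrative assumption):

  # Material weights from the slides: pawn 1, knight 3, bishop 3, rook 5, queen 9
  WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

  def material_eval(white_counts, black_counts):
      # Eval(s) = sum_k w_k * f_k(s), where f_k(s) is the white-minus-black
      # count of piece type k; positive values favor White
      return sum(w * (white_counts.get(p, 0) - black_counts.get(p, 0))
                 for p, w in WEIGHTS.items())

  # White: 5 pawns and a rook; Black: 5 pawns, a bishop, and two rooks
  print(material_eval({"pawn": 5, "rook": 1},
                      {"pawn": 5, "bishop": 1, "rook": 2}))   # -8: Black is ahead

This is the same arithmetic as the worked chess score later in the lecture (18 vs. 10, a difference of 8 in Black's favor).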
Cutting off search
• Horizon effect: you may incorrectly estimate the value of a state by overlooking an event that is just beyond the depth limit
  • For example, a damaging move by the opponent that can be delayed but not avoided
• Possible remedies
  • Quiescence search: do not cut off search at positions that are unstable – for example, are you about to lose an important piece?
  • Singular extension: a strong move that should be tried when the normal depth limit is reached

Cutting off search
• MinimaxCutoff is identical to MinimaxValue except:
  1. Terminal? is replaced by Cutoff?
  2. Utility is replaced by Eval
• Does it work in practice?
  • b^m = 10^6 and b = 35 give m ≈ 4, and 4-ply lookahead is a hopeless chess player!
  • 4-ply ≈ human novice
  • 8-ply ≈ typical PC, human master
  • 12-ply ≈ Deep Blue, Kasparov

Properties of α-β
• Pruning does not affect the final result
• Good move ordering improves the effectiveness of pruning
• With "perfect ordering," time complexity = O(b^(m/2)) → doubles the depth of search
• A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)

Modifying the recursive implementation to do alpha-beta pruning

  Value(state, alpha, beta):
      if state is terminal, return its value
      if player(state) = player 1:
          v := -infinity
          for each action:
              v := max(v, Value(successor(state, action), alpha, beta))
              if v >= beta, return v
              alpha := max(alpha, v)
          return v
      else:
          v := infinity
          for each action:
              v := min(v, Value(successor(state, action), alpha, beta))
              if v <= alpha, return v
              beta := min(beta, v)
          return v

The α-β algorithm

Benefits of alpha-beta pruning
• Without pruning, we need to examine O(b^m) nodes
• With pruning, it depends on which nodes we consider first
  • If we choose a random successor, we need to examine O(b^(3m/4)) nodes
  • If we manage to choose the best successor first, we need to examine O(b^(m/2)) nodes
  • Practical heuristics for choosing the next successor to consider get quite close to this
• Can effectively look twice as deep!
  • The difference between reasonable and expert play

Repeated states
• As in search, multiple sequences of moves may lead to the same state
• Again, we can keep track of previously seen states (usually called a transposition table in this context)
• May not want to keep track of all previously seen states…
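Here is the alpha-beta modification above as a runnable Python sketch, reusing the illustrative tuple-tree encoding from the earlier minimax example (the two-ply leaf values follow the standard textbook example and are partly assumed):

  import math

  def ab_value(state, player1_to_move, alpha=-math.inf, beta=math.inf):
      if isinstance(state, (int, float)):   # terminal: return its value
          return state
      if player1_to_move:
          v = -math.inf
          for child in state:
              v = max(v, ab_value(child, False, alpha, beta))
              if v >= beta:                 # player 2 would avoid this branch anyway
                  return v
              alpha = max(alpha, v)
          return v
      else:
          v = math.inf
          for child in state:
              v = min(v, ab_value(child, True, alpha, beta))
              if v <= alpha:                # player 1 would avoid this branch anyway
                  return v
              beta = min(beta, v)
          return v

  # Root MAX over three MIN nodes; the second branch is cut off after its
  # first leaf (2 <= alpha = 3), without ever looking at the 4 and the 6.
  tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
  print(ab_value(tree, True))   # 3, the same answer plain minimax gives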
Using evaluation functions
• Most games are too big to solve even with alpha-beta pruning
• Solution: only look ahead to a limited depth (nonterminal nodes)
• Evaluate nodes at the depth cutoff with a heuristic (aka an evaluation function)
• E.g., chess:
  • Material value: a queen is worth 9 points, a rook 5, a bishop 3, a knight 3, a pawn 1
  • Heuristic: the difference between the players' material values

Resource limits
• Suppose we have 100 secs and explore 10^4 nodes/sec → 10^6 nodes per move
• Standard approach:
  • Cutoff test: e.g., a depth limit (perhaps with quiescence search added)
  • Evaluation function = estimated desirability of a position

Outline
• Game Categories and Strategies
• Minimax
• Alpha-Beta Pruning
• Other Game Strategies
• Real-world Applications

Review: Minimax
• Minimax(state) =
    Utility(state) if state is terminal
    max over Minimax(successors(state)) if player = MAX
    min over Minimax(successors(state)) if player = MIN

(Our) Basis of Game Playing: Search for the best move every time
• [Diagram: from the initial board state, search for move 1; the opponent moves, giving board state 2; search for move 3; the opponent moves again, giving board states 4, 5, …]

Lookahead Search
• If I played this move
  • Then they might play that move
    • Then I could do that move
    • And they would probably do that move
  • Or they might play that move
    • Then I could do that move
    • And they would play that move
• Or I could play that move
  • And they would do that move
  • If I played this move…

Lookahead Search (best moves)
• If I played this move
  • Then their best move would be
    • Then my best move would be
    • Then their best move would be
  • Or another good move for them is…
    • Then my best move would be
• Etc.

Minimax Search
• Like children sharing a cake
• Underlying assumption: the opponent acts rationally
• Each player moves in such a way as to
  • Maximise their final winnings, minimise their losses
  • i.e., play the best move at the time
• Method:
  • Calculate the guaranteed final scores for each move, assuming the opponent will try to minimise that score
  • Choose the move that maximises this guaranteed score

Example Trivial Game
• Deal four playing cards out, face up
• Player 1 chooses one, player 2 chooses one
• Player 1 chooses another, player 2 chooses another
• And the winner is…
  • Add the cards up
  • The player with the highest even number scores that amount (in pounds sterling, from the opponent)

For Trivial Games
• Draw the entire search space
• Put the scores associated with each final board state at the ends of the paths
• Move the scores from the ends of the paths to the starts of the paths
  • Whenever there is a choice, use the minimax assumption
  • This guarantees the scores you can get
• Choose the path with the best score at the top
• Take the first move on this path as the next move
• [Figures: the entire search space; moving the scores from the bottom to the top; moving a score when there's a choice (use the minimax assumption – the rational choice for the player below the number you're moving); choosing the best move]

For Real Games
• The search space is too large, so we cannot search the entire space
• For example, chess has a branching factor of ~35
• Suppose our agent searches 1000 board states per second and has a time limit of 150 seconds
  • So it can search 150,000 positions per move
• This is only three- or four-ply lookahead, because 35^3 = 42,875 and 35^4 = 1,500,625
• Average humans can look ahead six to eight ply

Cutoff Search
• Must use a heuristic search
  • Use an evaluation function: estimate the guaranteed score from a board state
• Draw the search space to a certain depth
  • Depth chosen to limit the time taken
• Put the estimated values at the ends of the paths
• Propagate them to the top as before
• Question: is this a uniform path cost, greedy, or A* search?
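A minimal sketch of cutoff search in Python, reusing the illustrative tuple trees (the averaging evaluation function is an assumption purely for demonstration): at the depth limit, a heuristic estimate replaces the exact backed-up value.

  def cutoff_value(state, player1_to_move, depth, cutoff, evaluate):
      # MinimaxCutoff: Terminal? becomes Cutoff?, Utility becomes Eval
      if isinstance(state, (int, float)):   # true terminal: exact value
          return state
      if depth >= cutoff:                   # depth limit: heuristic estimate
          return evaluate(state)
      vals = [cutoff_value(c, not player1_to_move, depth + 1, cutoff, evaluate)
              for c in state]
      return max(vals) if player1_to_move else min(vals)

  def naive_eval(state):
      # Illustrative heuristic: the average of all leaf values under a node
      if isinstance(state, (int, float)):
          return state
      vals = [naive_eval(c) for c in state]
      return sum(vals) / len(vals)

  tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
  print(cutoff_value(tree, True, 0, 1, naive_eval))   # 7.67, vs. the exact value 3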
Evaluation Functions
• Must be able to differentiate between good and bad board states
• Exact values are not important
• Ideally, the function would return the true score for goal states
• Example in chess: a weighted linear function
  • Weights: pawn = 1, knight = bishop = 3, rook = 5, queen = 9

Example Chess Score
• Black has: 5 pawns, 1 bishop, 2 rooks
  • Score = 1*(5) + 3*(1) + 5*(2) = 5 + 3 + 10 = 18
• White has: 5 pawns, 1 rook
  • Score = 1*(5) + 5*(1) = 5 + 5 = 10
• Overall scores for this board state:
  • Black = 18 – 10 = 8
  • White = 10 – 18 = -8

Evaluation Function for our Game
• Evaluation after the first move: count zero if the total is odd, take the number if it's even
• The evaluation function here would choose 10 – but this would be disastrous for the player

Problems with Evaluation Functions
• Horizon problem
  • The agent cannot see far enough into the search space
  • A potentially disastrous board position can follow a seemingly good one
  • Possible solution: reduce the number of initial moves to look at, which allows you to look further into the search space
• Non-quiescent search
  • Exhibits big swings in the evaluation function, e.g., when taking pieces in chess
  • Solution: advance the search past the non-quiescent part

Pruning
• Want to visit as many board states as possible
• Want to avoid whole branches (prune them) because they can't possibly lead to a good score
  • Example: having your queen taken in chess (queen sacrifices are often a very good tactic, though)
• Alpha-beta pruning
  • Can be used for an entire search or a cutoff search
  • Recognize that a branch cannot produce a better score than a node you have already evaluated

Alpha-Beta Pruning for Player 1
1. Given a node N which can be chosen by player one, if there is another node X along any path such that (a) X can be chosen by player two, (b) X is on a higher level than N, and (c) X has been shown to guarantee a worse score for player one than N, then the parent of N can be pruned.
2. Given a node N which can be chosen by player two, if there is a node X along any path such that (a) X can be chosen by player one, (b) X is on a higher level than N, and (c) X has been shown to guarantee a better score for player one than N, then the parent of N can be pruned.
Example of Alpha-Beta Pruning
• [Worked figure: a tree alternating player 1 and player 2 levels]
• Depth-first search is a good idea here
• See the notes for an explanation

Expectimax Search
• Draw the tree and move values up as before
• Whenever there is a random event, add an extra node for each possible outcome that changes the board states possible after the event
  • E.g., six extra nodes if each roll of a die affects the state
• Work out all possible board states from the chance node
• When moving score values up through a chance node:
  • Multiply each value by the probability of the event happening
  • Add the products together
• This gives you the expected value coming through the chance node

More interesting (but still trivial) game
• Deal four cards face up
• Player 1 chooses a card
• Player 2 throws a die
  • If it's a six, player 2 chooses a card, swaps it with player 1's, and keeps player 1's card
  • If it's not a six, player 2 just chooses a card
• Player 1 chooses the next card
• Player 2 takes the last card
• [Figures: expectimax diagram and expectimax calculations for this game]

Games Played by Computer
• Games played perfectly:
  • Connect Four, noughts & crosses (tic-tac-toe)
  • The best move is pre-calculated for each board state
  • Small number of possible board states
• Games played well:
  • Chess, draughts (checkers), backgammon
  • Scrabble, Tetris (using ANNs)
• Games played badly:
  • Go, bridge, soccer

Philosophical Questions
• Q1. Is how computers play chess more fundamental than how people play chess?
  • In science, simple & effective techniques are valued
  • Minimax cutoff search is simple and effective
  • But this is seen by some as stupid and "non-AI"
  • Drew McDermott: "Saying Deep Blue doesn't really think about chess is like saying an airplane doesn't really fly because it doesn't flap its wings"
• Q2. If aliens came to Earth and challenged us to chess, would you send Deep Blue or Kasparov into battle?

Additional techniques
• Transposition table to store previously expanded states
• Forward pruning to avoid considering all possible moves
• Lookup tables for opening moves and endgames

Chess playing systems
• Baseline system: 200 million node evaluations per move (3 min), minimax with a decent evaluation function and quiescence search
  • 5-ply ≈ human novice
• Add alpha-beta pruning
  • 10-ply ≈ typical PC, experienced player
• Deep Blue: 30 billion evaluations per move, singular extensions, an evaluation function with 8000 features, large databases of opening and endgame moves
  • 14-ply ≈ Garry Kasparov
• More recent state of the art (Hydra): 36 billion evaluations per second, advanced pruning techniques
  • 18-ply ≈ better than any human alive?

Games with Chance
• Many more interesting games have an element of chance, brought in by throwing a die or tossing a coin
• Example: backgammon
  • See Gerry Tesauro's TD-Gammon program
• In these cases we can no longer calculate guaranteed scores
  • We can only calculate expected scores, using probability to guide us
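The chance-node rule described above is a one-line change to minimax: players still maximize or minimize, but a chance node backs up a probability-weighted sum. A minimal Python sketch (the node encoding and the example's values and probabilities are illustrative assumptions):

  # A node is a number (terminal value), ("max"|"min", children),
  # or ("chance", [(probability, child), ...]).
  def expectimax(node):
      if isinstance(node, (int, float)):
          return node
      kind, children = node
      if kind == "max":
          return max(expectimax(c) for c in children)
      if kind == "min":
          return min(expectimax(c) for c in children)
      # Chance node: multiply each value by its probability and add the products
      return sum(p * expectimax(c) for p, c in children)

  # A die throw: with probability 1/6 the opponent swaps cards (worth -3 to us),
  # with probability 5/6 they just choose a card (worth 4 to us)
  tree = ("max", [("chance", [(1/6, -3), (5/6, 4)]), 2])
  print(expectimax(tree))   # max(1/6 * -3 + 5/6 * 4, 2) = 2.8333...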
Games of chance
• How do we incorporate dice throwing into the game tree?
• Expectiminimax: for chance nodes, average the values weighted by the probability of each outcome
• Nasty branching factor; defining evaluation functions and pruning algorithms is more difficult
• Monte Carlo simulation: when you get to a chance node, simulate a large number of games with random dice rolls and use the win percentage as the evaluation function
  • Can work well for games like backgammon

Partially observable games
• Card games like bridge and poker
• Monte Carlo simulation: deal all the cards randomly in the beginning and pretend the game is fully observable
  • "Averaging over clairvoyance"
  • Problem: this strategy does not account for bluffing, information gathering, etc.

Outline
• Game Categories and Strategies
• Minimax
• Alpha-Beta Pruning
• Other Game Strategies
• Real-world Applications

Deterministic games in practice
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second and uses a very sophisticated evaluation function, plus undisclosed methods for extending some lines of search up to 40 ply.
• Othello: human champions refuse to compete against computers, who are too good.
• Go: human champions refuse to compete against computers, who are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Mechanism design (inverse game theory)
• Assuming that agents pick rational strategies, how should we design the game to achieve a socially desirable outcome?
• We have multiple agents and a center that collects their choices and determines the outcome

Auctions
• Goals
  • Maximize revenue for the seller
  • Efficiency: make sure the buyer who values the goods the most gets them
  • Minimize transaction costs for buyers and sellers

Ascending-bid auction
• What's the optimal strategy for a buyer?
  • Bid until the current bid value exceeds your private value
• Usually revenue-maximizing and efficient, unless the reserve price is set too low or too high
• Disadvantages
  • Collusion
  • Lack of competition
  • High communication costs

Sealed-bid auction
• Each buyer makes a single bid and communicates it to the auctioneer, but not to the other bidders
• Simpler communication
• More complicated decision-making: the strategy of a buyer depends on what they believe about the other buyers
• Not necessarily efficient
• Sealed-bid second-price auction: the winner pays the price of the second-highest bid
  • Let V be your private value and B be the highest bid by any other buyer
  • If V > B, your optimal strategy is to bid above B – in particular, bid V
  • If V < B, your optimal strategy is to bid below B – in particular, bid V
  • Therefore, your dominant strategy is to bid V
  • This is a truth-revealing mechanism
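The truth-revealing property can be checked numerically. Here is a small sketch (the auction setup, bid values, and random-opponent model are illustrative assumptions) comparing a bidder's average utility from underbidding, truthful bidding, and overbidding in a second-price auction:

  import random

  def second_price_utility(my_bid, my_value, other_bids):
      # Win iff my bid is the highest; the winner pays the highest competing bid
      best_other = max(other_bids)
      if my_bid > best_other:
          return my_value - best_other
      return 0.0

  random.seed(0)
  my_value = 0.6
  for my_bid in (0.3, 0.6, 0.9):   # underbid, truthful, overbid
      total = sum(second_price_utility(my_bid, my_value,
                                       [random.random() for _ in range(3)])
                  for _ in range(100_000))
      print(my_bid, total / 100_000)
  # Bidding the true value (0.6) yields the highest average utility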
Dollar auction
• A dollar bill is being auctioned off. It goes to the highest bidder, but the second-highest bidder also has to pay.
• Player 1 bids 1 cent
• Player 2 bids 2 cents
• …
• Player 2 bids 98 cents
• Player 1 bids 99 cents
• If Player 2 passes, he loses 98 cents; if he bids $1, he might still come out even
  • So Player 2 bids $1
• Now, if Player 1 passes, he loses 99 cents; if he bids $1.01, he only loses 1 cent
• …
• What went wrong?
  • When figuring out the expected utility of a bid, a rational player should take into account the future course of the game
  • How about Player 1 starting by bidding 99 cents?

Game theory issues
• Is it applicable to real life?
  • Humans are not always rational
  • Utilities may not always be known
  • Other assumptions made by the game-theoretic model may not hold
  • Political difficulties may prevent theoretically optimal mechanisms from being implemented
• Could it be more applicable to AI than to real life?
  • Computing equilibria in complicated games is difficult
  • The relationship between Nash equilibrium and rational decision making is subtle

The state of the art for some games
• Chess:
  • 1997: IBM Deep Blue defeats Kasparov
  • … there is still debate about whether computers are really better
• Checkers:
  • Computer world champion since 1994
  • … there was still debate about whether computers are really better…
  • until 2007: checkers was solved optimally by computer
• Go:
  • Computers are still not very good
  • The branching factor is really high
  • Some recent progress
• Poker:
  • Competitive with top humans in some 2-player games
  • The 3+ player case is much less well understood

Is this of any value to society?
• Some of the techniques developed for games have found applications in other domains
  • Especially "adversarial" settings
• Real-world strategic situations are usually not two-player, perfect-information, zero-sum, …
  • But game theory does not need any of those
• Example application: security scheduling at airports

Summary
• Games are fun to work on!
• They illustrate several important points about AI
  • Perfection is unattainable → must approximate
  • It's a good idea to think about what to think about

Assignment 2