Notes adapted from lecture notes for CMSC 421 by B.J. Dorr

Game Playing

Outline
• What are games?
• Optimal decisions in games – which strategy leads to success?
• Alpha-beta pruning
• Games of imperfect information
• Games that include an element of chance

What are games, and why study them?
• Games are a form of multi-agent environment
  – What do other agents do, and how do they affect our success?
  – Cooperative vs. competitive multi-agent environments
  – Competitive multi-agent environments give rise to adversarial problems, a.k.a. games
• Why study games?
  – Fun; historically entertaining
  – An interesting subject of study because they are hard
  – Easy to represent, and agents are restricted to a small number of actions

Relation of Games to Search
• Search – no adversary
  – Solution is a (heuristic) method for finding a goal
  – Heuristics and CSP techniques can find an optimal solution
  – Evaluation function: estimate of the cost from start to goal through a given node
  – Examples: path planning, scheduling activities
• Games – adversary
  – Solution is a strategy (a strategy specifies a move for every possible opponent reply)
  – Time limits force an approximate solution
  – Evaluation function: evaluates the "goodness" of a game position
  – Examples: chess, checkers, Othello, backgammon

Types of Games

Game Setup
• Two players: MAX and MIN
• MAX moves first, and they take turns until the game is over. The winner gets an award, the loser a penalty.
• Games as search:
  – Initial state: e.g., the board configuration in chess
  – Successor function: a list of (move, state) pairs specifying legal moves
  – Terminal test: is the game finished?
  – Utility function: gives a numerical value for terminal states, e.g., win (+1), lose (−1), and draw (0) in tic-tac-toe
• MAX uses a search tree to determine its next move.

Partial Game Tree for Tic-Tac-Toe

Optimal Strategies
• Find the contingent strategy for MAX assuming an infallible MIN opponent.
• Assumption: both players play optimally!
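The "games as search" formulation above (initial state, successor function, terminal test, utility function) can be sketched for tic-tac-toe. This is a minimal illustration, not the lecture's code; the string board encoding and helper `winner` are assumptions made to keep it self-contained.

```python
# Games-as-search formulation for tic-tac-toe (sketch).
# Board: a 9-character string, "." = empty; MAX plays "X", MIN plays "O".

def initial_state():
    # Empty 3x3 board; MAX ("X") moves first, as in the slides.
    return ("." * 9, "X")

def successors(state):
    # List of (move, state) pairs for every legal move.
    board, player = state
    nxt = "O" if player == "X" else "X"
    return [(i, (board[:i] + player + board[i + 1:], nxt))
            for i, c in enumerate(board) if c == "."]

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(board):
    # Helper (an assumption of this sketch): returns "X", "O", or None.
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal_test(state):
    board, _ = state
    return winner(board) is not None or "." not in board

def utility(state):
    # Win (+1), lose (-1), draw (0), from MAX's point of view.
    w = winner(state[0])
    return {"X": 1, "O": -1}.get(w, 0)
```

From the initial state there are nine legal moves, and the game is not yet terminal, so MAX must search to choose among them.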
• Given a game tree, the optimal strategy can be determined from the minimax value of each node:

  MINIMAX-VALUE(n) =
    UTILITY(n)                                        if n is a terminal node
    max over s in SUCCESSORS(n) of MINIMAX-VALUE(s)   if n is a MAX node
    min over s in SUCCESSORS(n) of MINIMAX-VALUE(s)   if n is a MIN node

Two-Ply Game Tree
• MAX nodes
• MIN nodes
• Terminal nodes – utility values for MAX
• Other nodes – minimax values
• MAX's best move at the root is a1 – it leads to the successor with the highest minimax value
• MIN's best reply is b1 – it leads to the successor with the lowest minimax value

What if MIN does not play optimally?
• The definition of optimal play for MAX assumes MIN plays optimally: it maximizes the worst-case outcome for MAX.
• But if MIN does not play optimally, MAX will do even better. [This can be proved.]

Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v

Properties of Minimax
• Complete? Yes
• Time: O(b^m)
• Space: O(bm)
• Optimal?
Yes (against an optimal opponent)

Tic-Tac-Toe
• Depth bound 2
• Breadth-first search until all nodes at level 2 are generated
• Apply the evaluation function to the positions at these nodes

Tic-Tac-Toe
• Evaluation function e(p) of a position p:
  – If p is not a winning position for either player:
    e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) − (number of complete rows, columns, or diagonals that are still open for MIN)
  – If p is a win for MAX: e(p) = +∞
  – If p is a win for MIN: e(p) = −∞

Tic-Tac-Toe – First Stage
[Figure: search tree for MAX's first move; the positions at depth 2 are labeled with e(p) values such as 6−4=2, 5−4=1, 4−6=−2, and 6−6=0, and MAX moves toward the position backed up with value 1.]

Tic-Tac-Toe – Second Stage
[Figure: search tree for MAX's second move; the frontier positions carry e(p) values such as 4−2=2, 3−3=0, and 5−2=3, and MAX again moves toward the position backed up with value 1.]

Multiplayer Games
• Some games allow more than two players
• Single minimax values become vectors

Problem with Minimax Search
• The number of game states is exponential in the number of moves.
– Solution: do not examine every node ==> alpha-beta pruning
• Alpha = the value of the best choice found so far at any choice point along the path for MAX
• Beta = the value of the best choice found so far at any choice point along the path for MIN
• Revisit the example …

Tic-Tac-Toe – First Stage (revisited)
[Figure: the first-stage tree again, with alpha value −1 at the start node A, beta value −1 at node B, and node C statically valued 4−5=−1.]

Alpha-Beta Pruning
• Search progresses in a depth-first manner
• Whenever a tip node is generated, its static evaluation is computed
• Whenever a position can be given a backed-up value, that value is computed
• Node A and all of its successors have been generated
  – Backed-up value: −1
• Node B and its successors have not yet been generated
• Now we know that the backed-up value of the start node is bounded from below by −1

Alpha-Beta Pruning
• Depending on the backed-up values of the other successors of the start node, the final backed-up value of the start node may turn out to be greater than −1, but it cannot be less
• This lower bound is the alpha value of the start node

Alpha-Beta Pruning
• Depth-first search proceeds – node B and its first successor, node C, are generated
• Node C is given a static value of −1
• The backed-up value of node B is therefore bounded from above by −1
• This upper bound on node B is its beta value
• Therefore, discontinue the search below node B: node B
will not turn out to be preferable to node A

Alpha-Beta Pruning
• The reduction in search effort is achieved by keeping track of bounds on backed-up values
• As the successors of a node are given backed-up values, these bounds can be revised:
  – Alpha values of MAX nodes can never decrease
  – Beta values of MIN nodes can never increase

Alpha-Beta Pruning
• Therefore, search can be discontinued:
  – Below any MIN node whose beta value is less than or equal to the alpha value of any of its MAX-node ancestors
    • The final backed-up value of this MIN node can then be set to its beta value
  – Below any MAX node whose alpha value is greater than or equal to the beta value of any of its MIN-node ancestors
    • The final backed-up value of this MAX node can then be set to its alpha value

Alpha-Beta Pruning
• During search, alpha and beta values are computed as follows:
  – The alpha value of a MAX node is set equal to the current largest final backed-up value of its successors
  – The beta value of a MIN node is set equal to the current smallest final backed-up value of its successors

Alpha-Beta Example
• Do depth-first search until the first leaf
• Range of possible values at the root: [−∞, +∞]
[Figure sequence: as leaves are examined, the root's interval narrows from [−∞, +∞] to [−∞, 3], then [3, +∞], [3, 14], [3, 5], and finally [3, 3]; the second MIN node's interval becomes [−∞, 2] – this node is worse for MAX – so its remaining successors are pruned.]

Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in
SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

Alpha-Beta Algorithm

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v

General Alpha-Beta Pruning
• Consider a node n somewhere in the tree
• If the player has a better choice at
  – the parent node of n,
  – or at any choice point further up,
  then n will never be reached in actual play.
• Hence, once enough is known about n, it can be pruned.

Final Comments about Alpha-Beta Pruning
• Pruning does not affect the final result
• Entire subtrees can be pruned
• Good move ordering improves the effectiveness of pruning
• Alpha-beta pruning can look ahead roughly twice as far as minimax in the same amount of time
• Repeated states are again possible
  – Store them in memory: a transposition table

Games That Include Chance
• Possible moves in the backgammon position shown: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16), and (5-11, 11-16)
• Chance nodes represent the dice rolls: [1,1] and [6,6] each have chance 1/36; all other rolls have chance 1/18
• We cannot calculate a definite minimax value, only an expected value

Expected Minimax Value

  EXPECTED-MINIMAX-VALUE(n) =
    UTILITY(n)                                                       if n is a terminal node
    max over s in SUCCESSORS(n) of EXPECTED-MINIMAX-VALUE(s)         if n is a MAX node
    min over s in SUCCESSORS(n) of EXPECTED-MINIMAX-VALUE(s)         if n is a MIN node
    sum over s in SUCCESSORS(n) of P(s) · EXPECTED-MINIMAX-VALUE(s)  if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.

Position Evaluation with Chance Nodes
• In the left tree, move A1 wins; in the right tree, move A2 wins
• The outcome of the evaluation can therefore change when its values are rescaled
• Behavior is preserved only by a positive linear transformation of EVAL.
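The EXPECTED-MINIMAX-VALUE recurrence above can be sketched as a short recursive function. The tagged-tuple tree encoding is an assumption of this sketch, not the lecture's notation; the probabilities mirror the dice-roll example.

```python
# Sketch of EXPECTED-MINIMAX-VALUE over a hand-built tree.
# Node encoding (an assumption of this sketch):
#   ("leaf", utility)
#   ("max", [child, ...]) / ("min", [child, ...])
#   ("chance", [(probability, child), ...])

def expectiminimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]                                    # UTILITY(n)
    if kind == "max":
        return max(expectiminimax(c) for c in node[1])    # MAX node
    if kind == "min":
        return min(expectiminimax(c) for c in node[1])    # MIN node
    if kind == "chance":                                  # chance node:
        return sum(p * expectiminimax(c) for p, c in node[1])  # sum P(s)*value
    raise ValueError(f"unknown node kind: {kind}")

# One chance node with probabilities 1/36 and 35/36 (as in the dice
# example), each outcome followed by a MAX choice:
tree = ("chance", [(1 / 36, ("max", [("leaf", 4), ("leaf", 2)])),
                   (35 / 36, ("max", [("leaf", 1), ("leaf", 0)]))])
# EXPECTED-MINIMAX-VALUE(tree) = (1/36)*4 + (35/36)*1
```

Note that, as the position-evaluation slide warns, the result depends on the magnitudes of the leaf values, not just their order, so only positive linear rescalings of the utilities leave the preferred move unchanged.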
Discussion
• Examine the section on state-of-the-art game programs yourself
• Minimax assumes the right tree is better than the left, yet …
  – One alternative: return a probability distribution over possible values
  – But this is an expensive calculation
• Utility of node expansion:
  – Expand only those nodes that lead to significantly better moves
• Both suggestions require meta-reasoning

Summary
• Games are fun (and dangerous)
• They illustrate several important points about AI:
  – Perfection is unattainable -> approximate
  – It is a good idea to think about what to think about
  – Uncertainty constrains the assignment of values to states
• Games are to AI as Grand Prix racing is to automobile design.
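As a closing sketch, the ALPHA-BETA pseudocode from the earlier slides can be rendered in Python. Passing `successors`, `terminal_test`, and `utility` as parameters is an assumption made to keep the sketch game-independent; the example tree is the two-ply tree from the slides.

```python
import math

def alpha_beta_search(state, successors, terminal_test, utility):
    # ALPHA-BETA-SEARCH: return the action whose MIN-VALUE is maximal.
    best_a, best_v = None, -math.inf
    for a, s in successors(state):
        v = min_value(s, best_v, math.inf, successors, terminal_test, utility)
        if v > best_v:
            best_a, best_v = a, v
    return best_a

def max_value(state, alpha, beta, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _, s in successors(state):
        v = max(v, min_value(s, alpha, beta, successors, terminal_test, utility))
        if v >= beta:            # prune: the MIN ancestor will never allow this
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _, s in successors(state):
        v = min(v, max_value(s, alpha, beta, successors, terminal_test, utility))
        if v <= alpha:           # prune: the MAX ancestor will never allow this
            return v
        beta = min(beta, v)
    return v

# The two-ply tree from the slides: terminal states are ints,
# internal states are lists of (action, successor-state) pairs.
two_ply = [("a1", [("b1", 3), ("b2", 12), ("b3", 8)]),
           ("a2", [("c1", 2), ("c2", 4), ("c3", 6)]),
           ("a3", [("d1", 14), ("d2", 5), ("d3", 2)])]

best = alpha_beta_search(two_ply,
                         successors=lambda s: s if isinstance(s, list) else [],
                         terminal_test=lambda s: isinstance(s, int),
                         utility=lambda s: s)
# best == "a1", matching MAX's best move in the two-ply example
```

The three MIN children back up values 3, 2, and 2; once the second child's first leaf (2) falls below alpha = 3, the rest of that subtree is cut off, illustrating the pruning conditions stated on the slides.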