Games, Theory and Application

Jaap van den Herik and Jeroen Donkers
Institute for Knowledge and Agent Technology (IKAT)
Department of Computer Science, Universiteit Maastricht, The Netherlands
SOFSEM 2004, Prague, Hotel VZ Merin
Sunday, January 25, 8.30-10.00h

Contents
• Games in Artificial-Intelligence Research
• Game-tree search principles
• Using opponent models
• Opponent-Model search
• Gains and risks
• Implementing OM search: complexities
• Past future of computer chess

Sofsem'04

Games, such as Chess, Checkers, and Go, are an attractive pastime and scientifically interesting

Chess
• Much research has been performed in Computer Chess
• Deep Blue (IBM) defeated the world champion Kasparov in 1997
• A Micro Computer better than the world champion?

World Champion Programs
Program        Year   Location
KAISSA         1974   Stockholm
CHESS          1977   Toronto
BELLE          1980   Linz
CRAY BLITZ     1983   New York
CRAY BLITZ     1986   Cologne
DEEP THOUGHT   1989   Edmonton
REBEL          1992   Madrid
FRITZ          1995   Hong Kong
SHREDDER       1999   Paderborn
JUNIOR         2002   Maastricht
SHREDDER       2003   Graz
?              2004   Ramat-Gan

International Draughts
• Buggy is the best draughts program
• Humans are still better than the computer, but the margin is small
• Challenge: more knowledge in the program

Go
• Computer Go programs are weak
• Problem: recognition of patterns
• Top Go programs: Go4++, Many Faces of Go, GnuGo, and Handtalk

Awari
• A Mancala game (pebble-hole game)
• The game is a draw
• Solved by John Romein and Henri Bal (VU Amsterdam), 2002
• All positions (900 billion) computed in 51 hours on a cluster computer of 144 1-GHz processors with 72 GB RAM
• Proven that even the best computer programs still make many errors in their games

Scrabble
• Maven beats every human opponent
• Author: Brian Sheppard
• Ph.D. (UM): “Towards Perfect Play of Scrabble” (02)
• Maven can play Scrabble in any language

Overview
• Solved: Connect-four, Qubic, Go-Moku, Renju, Kalah, Awari, Nine men’s morris
• Super human: Checkers (8x8), Gipf, Othello, Scrabble, Backgammon, Lines of Action, Bao
• World champion: Chess, Draughts (10x10)
• Grand master: Amazons, Bridge, Chinese Chess, Hex, Poker
• Amateur: Go, Arimaa, Shogi

Computer Olympiad
• Initiative of David Levy (1989)
• Since 1989 there have been 8 olympiads: 4x London, 3x Maastricht, 1x Graz
• Goal:
  • Finding the best computer program for each game
  • Connecting programmers / researchers of different games
• Computers play each other in competition
• Demonstrations:
  • Man versus Machine
  • Man + Machine versus Man + Machine

Computer versus Computer

Computer Olympiad
• Last time in Graz, 2003
• Also World Championship Computer Chess and World Championship Backgammon
• 80 participants in several categories
• Competitions in the olympiad’s history: Abalone, Awari, Amazons, Backgammon, Bao, Bridge, Checkers, Chess, Chinese Chess, Dots and Boxes, Draughts, Gipf, Go-Moku, 19x19 Go, 9x9 Go, Hex, Lines of Action, Poker, Renju, Roshambo, Scrabble, and Shogi

UM Programs on the Computer Olympiads
• Gipfted - Gipf
• MIA - LOA
• Anky - Amazons
• Bao 1.0
• Magog - Go

Computer Game-playing
• Can computers beat humans in board games like Chess, Checkers, Go, Bao?
• This is one of the first tasks of Artificial Intelligence (Shannon 1950)
• Successes obtained in Chess (Deep Blue), Checkers, Othello, Draughts, Backgammon, Scrabble...

Computer Game-Playing
• Sizes of game trees:
  • Nim-5: 28 nodes
  • Tic-Tac-Toe: 10^5 nodes
  • Checkers: 10^31 nodes
  • Chess: 10^123 nodes
  • Go: 10^360 nodes
• In practice it is intractable to find a solution with minimax, so use heuristic search

Three Techniques
• Minimax Search
• α-β Search
• Opponent Modelling

Game-playing
• Domain: two-player zero-sum game with perfect information (Zermelo, 1913)
• Task: find the best response to the opponent’s moves, provided that the opponent does the same
• Solution: Minimax procedure (von Neumann, 1928)

Minimax Search
Nim-5: on their turn, players remove 1, 2, or 3 matches. The player who takes the last match wins the game.

Minimax Search
[Figure: complete game tree of Nim-5; edges are labelled with the number of matches taken, leaves carry pay-offs +1 or -1]

Minimax Search
[Figure: the same Nim-5 tree with the pay-offs backed up to the root ("minimaxing")]

Pruning
• You do not need the total solution to play (well): only the move at the root is needed
• Methods exist to find this move without a need for a total solution: pruning
  • Alpha-beta search (Knuth & Moore ‘75)
  • Proof-number search (Allis et al. ‘94), etc.
• Playing is solving a sequence of game trees

Heuristic Search
[Figure: truncated game tree with heuristic values between -1 and +1 at the leaves]
• Truncate the game tree (depth + selection)
• Use a (static heuristic) evaluation function at the leaves to replace pay-offs
• Minimax on the reduced game tree

Heuristic Search
• The approach works very well in Chess, but...
• Is solving a sequence of reduced games the best way to win a game?
• Heuristic values are used instead of pay-offs
• Additional information in the tree is unused
• The opponent is not taken into account
• We aim at the last item: opponents

Minimax
[Figure: small minimax example tree with root value 3]

α-β algorithm
[Figure: the same example tree searched with α-β, showing a β-pruning]

The strength of α-β
[Figure: larger example tree; more than a thousand prunings]

The importance of α-β algorithm
[Figure: example tree showing a β-pruning]

The Possibilities Of Chess
THE NUMBER OF DIFFERENT, REACHABLE POSITIONS IN CHESS IS (CHINCHALKAR): 10^46

A Clever Algorithm (α-β)
REDUCES THE NUMBER OF POSSIBILITIES, N, TO ITS SQUARE ROOT. FOR N = 10^46 THIS SAVES ALL BUT ONE POSITION IN 10^23, WHICH IS MORE THAN 99.99999999999999999999%

A Calculation
NUMBER OF POSSIBILITIES:     10^46
SAVINGS BY α-β ALGORITHM:    10^23
1000 PARALLEL PROCESSORS:    10^3
POSITIONS PER SECOND:        10^9
LEADS TO:                    10^(23-12) = 10^11 SECONDS
A CENTURY IS:                10^9 SECONDS
SOLVING CHESS:               10^2 CENTURIES
SO 100 CENTURIES OR 10,000 YEARS

Using opponent’s strategy (NIM-5)
[Figure: Nim-5 game tree searched under the assumption “Player 1 never takes 3”]

Using opponent’s strategy
• Well-known Tic-Tac-Toe strategy:
  R1: make 3-in-a-row if possible
  R2: prevent opponent’s 3-in-a-row if possible
  H1: occupy central square if possible
  H2: occupy corner square if possible
• One player knows that the opponent uses this strategy

Opponent-Model search
Iida, vd Herik, Uiterwijk, Herschberg (1993)
Carmel and Markovitch (1993)
• The opponent’s evaluation function is known (unilateral: the opponent uses minimax)
• This is the opponent model
• It is used to predict the opponent’s moves
• The best response is determined using the own evaluation function

OM Search
• Procedure:
  • At opponent’s nodes: use minimax (alpha-beta) to predict the opponent’s move. Return the own value of the chosen move
  • At own nodes: Return (the move with) the highest value
  • At leaves: Return the own evaluation
• Implementation: one-pass or probing

OM Search Equations

OM Search Example
[Figure: example tree with the own value (V0, v0) and the opponent’s value (Vop, vop) at each node]

OM Search Algorithm (probing)

Risks and Gains in OM search
• Gain: difference between the predicted move and the minimax move
• Risk: difference between the move we expect and the move the opponent really selects
• Prediction of the opponent is important, and depends on the quality of the opponent model.

Additional risk
• Even if the prediction is correct, there are traps for OM search
• Let P be the set of positions that the opponent selects, v0 be our evaluation function and vop the opponent’s function.
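
The OM search procedure and the notions of gain and risk above can be illustrated with a minimal Python sketch of the probing variant. The Node class, the function names, and the example values are illustrative assumptions and do not come from the slides. Both evaluations are expressed from MAX's point of view, so the opponent is assumed to minimise vop, and ties at MIN nodes are broken in favour of the first child.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Game-tree node (hypothetical layout); only leaves use v0 and vop."""
    children: List["Node"] = field(default_factory=list)
    v0: float = 0.0   # our own evaluation of a leaf
    vop: float = 0.0  # opponent's evaluation of a leaf, according to the model


def minimax(node: Node, maximizing: bool, attr: str) -> float:
    """Plain minimax over the leaf attribute named by attr ('v0' or 'vop')."""
    if not node.children:
        return getattr(node, attr)
    values = [minimax(c, not maximizing, attr) for c in node.children]
    return max(values) if maximizing else min(values)


def om_search(node: Node, own_turn: bool) -> float:
    """Probing OM search: our value of node, given the opponent model."""
    if not node.children:
        return node.v0  # at leaves: return the own evaluation
    if own_turn:
        # At own (MAX) nodes: return the highest OM value among the children.
        return max(om_search(c, False) for c in node.children)
    # At opponent (MIN) nodes: probe each child with minimax on vop to
    # predict the opponent's move, then return OUR value of that move.
    predicted = min(node.children, key=lambda c: minimax(c, True, "vop"))
    return om_search(predicted, True)


# Assumed example: a MAX root with two MIN children, each with two leaves.
left = Node(children=[Node(v0=6, vop=6), Node(v0=7, vop=7)])
right = Node(children=[Node(v0=8, vop=1), Node(v0=2, vop=2)])
root = Node(children=[left, right])

om_value = om_search(root, True)        # 8: the opponent is expected to pick the vop=1 leaf
mmx_value = minimax(root, True, "v0")   # 6: plain minimax on our own evaluation
print(om_value, mmx_value)
```

In this assumed tree, OM search prefers the right branch (value 8) over the minimax choice (value 6), a gain of 2 if the model is correct. If the opponent instead minimises v0 on the right branch, we obtain 2 rather than 6, which is the risk.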

Four types of error
• V0 overestimates a position in P (bad)
• V0 underestimates a position in P
• Vop underestimates a position that enters P (good for us)
• Vop overestimates a position in P

Base case
[Figure: MAX-MIN-MAX example tree in which V0 = Vop at every leaf. Legend: V0: our evaluation; Vop: opponent’s evaluation; mmx: minimax value; obt: obtained value]

Type-I error
[Figure: the base-case tree with a type-I error; same legend]

Type-I error vanished
[Figure: the base-case tree in which the type-I error has vanished; same legend]

Type-II error
[Figure: the base-case tree with a type-II error; same legend]

Type-III error
[Figure: the base-case tree with a type-III error; same legend]

Type-IV error
[Figure: the base-case tree with a type-IV error; same legend]

Pruning in OM Search
• Pruning at MAX nodes is not possible in OM search; only pruning at MIN nodes can take place (Iida et al., 1993).
• Analogous to α-β search, the pruning version is called β-pruning OM search

Pruning Example
[Figure: example tree with nodes labelled a to y, showing β-pruning in OM search]

Two Implementations
• One-pass:
  • visit every node at most once
  • back up both your own and the opponent’s value
• Probing:
  • At MIN nodes use α-β search to predict the opponent’s move
  • back up only one value

One-Pass β-pruning OM search

Probing β-pruning OM search

Best-case time complexity

Best-case time complexity
C_OM = Subtree A
C’_OM = Subtree B

Best-case time complexity
• The time complexity for the one-pass version is equal to that of the theoretical best
• The time complexity for the probing version is different:
  • First determine the number of leaf evaluations needed:
  • Then find the number of probes needed:

Best-case time complexity
In the best case, pruning in the probes is not changed by β = v+1: all probes take the same number of evaluations as with a fully open window.

Best-case time complexity

Time complexities compared
[Figure: time complexities compared for branching factors 4, 8, 12, 16, 20]

Average-case time complexity
[Figure: average-case time complexity, measured on random game trees]

Probabilistic Opponent-model Search
Donkers, vd Herik, Uiterwijk (2000)
• More elaborate opponent model:
  • The opponent uses a mixed strategy: n different evaluation functions (opponent types) plus a probability distribution
  • It is assumed to approximate the true opponent’s strategy
  • The own evaluation function is one of the opponent types

PrOM Search
• Procedure:
  • At opponent’s nodes: use minimax (alpha-beta) to predict the opponent’s move for all opponent types separately.
    Return the expected own value of the chosen moves
  • At own nodes: Return (the move with) the highest value
  • At leaves: Return the own evaluation
• Implementation: one-pass or probing

PrOM Search Equations

PrOM Search Example
[Figure: example tree with two opponent types, Pr(0) = 0.3 and Pr(1) = 0.7; the root value is v0 = 0.3 x 6 + 0.7 x 8 = 7.4]

PrOM Search Algorithm

How did chess players envision the future of non-human chess players?
- Euwe
- Donner
- De Groot
- Timman
- Sosonko
- Böhm
Question: Do you think that a computer will be able to play at grandmaster level?

Euwe (June 10, 1980):
“I don’t believe so. I am almost convinced of that, too.”

Timman (June 22, 1980):
“If you would express it in Elo points, then I believe that a computer will not be able to obtain more than ...... let’s say ...... 2500 Elo points.”

Sosonko (August 26, 1980):
“I haven’t thought about that so much, but I believe that it will certainly not be harmful for the human chess community. It will become more interesting. I am convinced that computer chess will play an increasingly important role. It will be very interesting, I think.”

Donner (April 13, 1981):
“But the computer cannot play chess at all and will never be able to play the game, at least not the first two thousand years, (...) What it does now has nothing to do with chess.”

De Groot (June 19, 1981):
“No, certainly not. As a matter of fact, I believe it will not even be possible that it can reach a stable master level. I believe it is possible that the computer can reach a master level, but not that it is a stable master.”

Contributions from Science
• COMPUTERS PLAY STRONGER THAN HUMANS.
• COMPUTERS CANNOT SOLVE CHESS.
• COMPUTERS ENABLE AN ALTERNATIVE FORM OF GAME EXPERIENCE.

Conclusions
1. Chess is a frontrunner among the games
2. Kasparov’s defeat has become a victory for brute force in combination with knowledge and opponent modelling

The Inevitable Final Conclusion
TECHNOLOGY, AND IN PARTICULAR OPPONENT MODELLING, MAKES THE BEST BETTER, BUT DOES NOT LEAVE THE WEAKER PLAYERS WITH EMPTY HANDS SINCE THEIR LEVEL WILL INCREASE TOO.