Monte Carlo Tree Search: Insights and Applications
BCS Real AI Event
Simon Lucas
Game Intelligence Group, University of Essex

Outline
• General machine intelligence: the ingredients
• Monte Carlo Tree Search
  – A quick overview and tutorial
• Example application: Mapello
  – Note: Game AI is Real AI!
• Example test problem: Physical TSP
• Results of open competitions
• Challenges and future directions

General Machine Intelligence: the ingredients
• Evolution
• Reinforcement learning
• Function approximation
  – Neural nets, N-Tuples, etc.
• Selective search / sample-based planning / Monte Carlo Tree Search

Conventional Game Tree Search
• Minimax with alpha-beta pruning, transposition tables
• Works well when:
  – A good heuristic value function is known
  – The branching factor is modest
• E.g. Chess: Deep Blue, Rybka
  – Super-human on a smartphone!
• But the tree grows exponentially with search depth

Go
• Much tougher for computers
• High branching factor
• No good heuristic value function
• MCTS to the rescue!

"Although progress has been steady, it will take many decades of research and development before world-championship-calibre go programs exist." (Jonathan Schaeffer, 2001)

Monte Carlo Tree Search (MCTS)
• Upper Confidence bounds for Trees (UCT)
• Further reading: Browne et al., TCIAIG 2012

Attractive Features
• Anytime
• Scalable
  – Tackles complex games and planning problems better than before
  – May improve logarithmically with increased CPU
• No need for a heuristic function
  – Though usually better with one
• Next we'll look at:
  – General MCTS
  – UCT in particular

MCTS: the main idea
• Tree policy: choose which node to expand (not necessarily a leaf of the tree)
• Default (simulation) policy: random playout until the end of the game

MCTS Algorithm
• Decompose into six parts:
  – MCTS main algorithm
  – Tree policy
    • Expand
    • Best Child (UCT formula)
  – Default policy
  – Back-propagate
• We'll run through these, then show demos (a complete minimal sketch follows the walk-through)

MCTS Main Algorithm
• BestChild simply picks the best child node of the root according to some criterion, e.g. best mean value
• In our pseudo-code BestChild is called from TreePolicy and from MctsSearch, but different versions can be used
  – E.g. the final selection can be the max-value child or the most frequently visited one

TreePolicy
• Note that the node selected for expansion does not need to be a leaf of the tree
• But it must have at least one untried action

Expand
• Pick one of the node's untried actions, apply it, and add the resulting child node to the tree

Best Child (UCT)
• This is the standard UCT equation, used inside the tree:
    UCT = X̄_j + c * sqrt(2 * ln(n) / n_j)
  where X̄_j is the mean reward of child j, n_j its visit count, and n the visit count of its parent
• Higher values of c lead to more exploration
• Other terms can be added, and usually are
  – More on this later

DefaultPolicy
• Each time a new node is added to the tree, the default policy rolls out from the current state until a terminal state of the game is reached
• The standard is to do this uniformly at random
  – But better performance may be obtained by biasing the rollouts with knowledge

Backup
• Note that v is the new node added to the tree by the tree policy
• Back up the rollout value from the added node, through its ancestors, to the root
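To make the walk-through concrete, here is a minimal Python sketch of the whole loop, following the standard formulation surveyed by Browne et al. (TCIAIG 2012). It is a sketch under stated assumptions, not the speaker's actual code: the GameState interface, the function names, and the default exploration constant c are all illustrative, and for two-player games the rollout reward would additionally need to be interpreted from the perspective of the player to move at each node.

```python
import math
import random

class GameState:
    """Minimal problem interface (cf. the 'implement the State interface'
    slide later on); method names are illustrative assumptions."""
    def get_actions(self):
        raise NotImplementedError  # legal actions in this state
    def apply(self, action):
        raise NotImplementedError  # return the successor state
    def is_terminal(self):
        raise NotImplementedError
    def reward(self):
        raise NotImplementedError  # value of a terminal state

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action                      # move that led here
        self.children = []
        self.untried = list(state.get_actions())  # expansion candidates
        self.visits = 0
        self.total = 0.0                          # sum of rollout rewards

def mcts_search(root_state, iterations=1000, c=1.0):
    """Main algorithm: select/expand, roll out, back up; repeat.
    Final selection here is the most-visited child; the max-mean-value
    child is the other common criterion."""
    root = Node(root_state)
    for _ in range(iterations):
        node = tree_policy(root, c)          # selection + expansion
        reward = default_policy(node.state)  # random rollout
        backup(node, reward)
    return max(root.children, key=lambda ch: ch.visits).action

def tree_policy(node, c):
    # Descend via UCT, stopping at any node (not necessarily a leaf)
    # that still has an untried action, and expand it.
    while not node.state.is_terminal():
        if node.untried:
            return expand(node)
        node = best_child(node, c)
    return node

def expand(node):
    action = node.untried.pop(random.randrange(len(node.untried)))
    child = Node(node.state.apply(action), parent=node, action=action)
    node.children.append(child)
    return child

def best_child(node, c):
    # Standard UCT: mean reward plus exploration bonus (higher c explores more).
    return max(node.children,
               key=lambda ch: ch.total / ch.visits
               + c * math.sqrt(2.0 * math.log(node.visits) / ch.visits))

def default_policy(state):
    # Uniformly random playout to a terminal state.
    while not state.is_terminal():
        state = state.apply(random.choice(state.get_actions()))
    return state.reward()

def backup(node, reward):
    # Propagate the rollout value from the new node up to the root.
    while node is not None:
        node.visits += 1
        node.total += reward
        node = node.parent
```

Given a concrete GameState subclass, a call like mcts_search(state, iterations=5000) returns the action to play, and the loop can be stopped after any number of iterations, which is what makes the algorithm anytime.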
MCTS Builds Asymmetric Trees (demo)

All Moves As First (AMAF) and Rapid Action Value Estimates (RAVE)
• Additional term in the UCT equation:
  – Treat actions/moves the same independently of where they occur in the move sequence
  – Typically blended with the node's own UCT value, with the AMAF estimate weighted heavily at first and fading as the visit count grows

Using MCTS for a new problem: implement the State interface
• The GameState class in the sketch above illustrates the kind of interface required

Example Application: Mapello

Othello
• Each move you must pincer one or more opponent counters between the counter you place and an existing counter of your own colour
• Pincered counters are flipped to your own colour
• The winner is the player with the most pieces at the end

Basics of Good Game Design
• Simple rules
• Balance
• Sense of drama
• The outcome should not be obvious

Othello Example (from http://radagast.se/othello/Help/strategy.html)
• Board sequence (figures not reproduced): White leads at -58 mid-game, yet Black goes on to win with a score of 16

Mapello
• Take the counter-flipping drama of Othello
• Apply it to novel situations
  – Obstacles
  – Power-ups (e.g. triple square score)
  – Large maps with power-plays, e.g. line fill
• Novel games
  – Allow users to design maps that they are expert in
  – The map design is part of the game
• Research bonus: a large set of games to experiment with

Example Initial Maps
• (map images not reproduced, including an "Or how about this?" variant)

Need Rapidly Smart AI
• Give players a challenging game
  – Even when the game map can be new each time
• Obvious, easy-to-apply approaches:
  – TD Learning
  – Monte Carlo Tree Search (MCTS)
  – Combinations of these …
    • E.g. Silver et al., ICML 2008
    • Robles et al., CIG 2011

MCTS (see Browne et al., TCIAIG 2012)
• Simple algorithm
• Anytime
• No need for a heuristic value function
• Exploration/exploitation balance
• Works well across a range of problems

Demo
• TDL learns reasonable weights rapidly
• How well will this play at 1 ply versus limited roll-out MCTS?

For Strong Play …
• Combine MCTS, TDL and N-Tuples

Where to play / buy
• Coming to Android (November 2012)
• Nestorgames (http://www.nestorgames.com)

MCTS in Real-Time Games: PTSP
• Hard to get long-term planning without good heuristics
• Optimal TSP order != PTSP order

MCTS: Challenges and Future Directions
• Better handling of problems with continuous action spaces
  – Some work already done on this
• Better understanding of how to handle real-time problems
  – Use of approximations and macro-actions (see the sketch below)
• Stochastic and partially observable problems / games of incomplete and imperfect information
• Hybridisation:
  – with evolution
  – with other tree search algorithms
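On macro-actions: for real-time problems such as the PTSP, one common approximation is to let each tree-level action stand for a fixed number of repetitions of a primitive action, which shrinks the effective search depth. The wrapper below is a minimal sketch of that idea, building on the hypothetical GameState interface from the earlier sketch; the class name and the repetition length T are assumptions for illustration, not a method described in the talk.

```python
class MacroActionState(GameState):
    """Wraps a real-time GameState so that one tree action equals
    T repetitions of a primitive action (illustrative sketch)."""

    def __init__(self, inner, T=10):
        self.inner = inner  # the underlying primitive-action state
        self.T = T          # primitive steps per macro-action

    def get_actions(self):
        return self.inner.get_actions()

    def apply(self, action):
        # Repeat the chosen primitive action for T ticks (or until the
        # episode ends), so the tree plans over a shorter horizon.
        state = self.inner
        for _ in range(self.T):
            if state.is_terminal():
                break
            state = state.apply(action)
        return MacroActionState(state, self.T)

    def is_terminal(self):
        return self.inner.is_terminal()

    def reward(self):
        return self.inner.reward()
```

With T = 10, a tree only 40 levels deep already spans 400 primitive time steps, which is the kind of planning horizon a real-time problem like the PTSP demands.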
Conclusions
• MCTS: a major new approach to AI
• Works well across a range of problems
  – Good performance even with vanilla UCT
  – Best performance requires tuning and heuristics
  – Sometimes the UCT formula is modified or discarded
• Can be used in conjunction with RL
  – Self-tuning
• And with evolution
  – E.g. evolving macro-actions

Further reading and links
• http://ptsp-game.net/
• http://www.pacman-vs-ghosts.net/