From: AAAI Technical Report SS-98-04. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved. Integration of Different ReasoningModesin a Go Playing and Learning System Tristan Cazenave LIP6,Universit6Pierre et MarieCurie 4, placeJussieu 75252 PARISCEDEX05, FRANCE Tristan.Cazenave@lip6. fr Abstract Integrating multiple reasoningmodeis useful in complex domainslike the gameof Go.Goplayers use various forms of reasoningduringa game.Reasoning at the tactical level is completelydifferent fi’omreasoningat the strategic level. Choosing a plan requires a different formof reasoningthan knowinghowto execute a plan. This paper gives examples of the integration of these reasoningmodesinto a single system. Rule-basedreasoning, Constraint-basedreasoning andCase-based reasoningare usedin this hierarchicalorder¯ Constraint-basedreasoninguses the results of Rule-based reasoning, and Case-basedreasoning uses the results of Constraint-basedreasoningand Rule-basedreasoning. Introduction Integrating multiple reasoning modesis useful in complex domainslike the gameof Go. Go players use various types of reasoning during a game.Reasoningat the tactical level is completelydifferent fromreasoningat the strategic level. Choosinga plan requires a different type of reasoning than knowinghowto execute a plan. This paper gives examples of the integration of these reasoning modesinto a single system. This work has somesimilarities with the work by Epstein and Gelfand [Epstein and Gelfand 1996]. The first section describes computer Go. The second section showshowrules are used and learned in our system at the tactical level. The third section describes some constraints to choose a plan at the strategic level. The fourth section provide a way to use Case-BasedReasoning to choose the more appropriate moveto follow a plan. The last section gives the results of our computerGosystem. Computer complex task. Robson [Robson 1983] proved that Go generalized to NxNboards is exponential in time. More concretely, Van den Herik [Van den Herik 1991] and Allis [Allis 1994] use complexity measuresof different gamesto compare them. They defme the whole game tree complexity A. Considering the average length of actual ,games L and average branching factor B, we have A = B~. The statespace complexity of a game is defmed as the number of legal gamepositions reachable from the initial position of the game. In Go, L~150 and B~250 hence the game tree ¯ complextty A~10360. Go state space compleyaty, bounded by 3361~e10172,and gametree complexity are far larger than those of any other perfect-information game. Moreover,a position is takes time to evaluate, on the contrary of chess where positions can be evaluated very fast. This makesGo very difficult to program. Computer Go has been recognized as a challenge for Artificial Intelligence [Selman 1996]. Go The game of Go Gowasdevelopedthree to four millennia ago in China; it is the oldest and one of the most popular board gamein the world. Like chess, it is a deterministic, perfect information, zero-sumgameof strategy between two players. In spite of the simplicity of its rules, playing the gameof Gois a very Figure 1 The board is madeof 19 vertical lines and 19 horizontal lines which cut themselves into 361 intersections. At the beginning the board is empty. Eachplayer (Black or White) moves alternatively in adding one stone on an empty intersection. Twoadjacent stones of the same color are 91 connected and they are part of the same string. For example, the white stones of Figure 1 marked with A are connected and are part of the same string. Emptyadjacent intersections of a string are the liberties of the string. The string of four markedwhite stones of Figure 1 has eight liberties. Whena movefills the last liberty of a string, this string is removed from the board. The repetitions of positions are forbidden. According to the possibility of being captured or not, the strings maybe dead or alive. A player controls an intersection either whenhe has an alive stone on it, either when the intersection is empty but adjacent to alive stones. The aim of the gameis to control more intersections than the opponent. The gameends when the two players pass. In spite of the simplicity of the rules, a Goplayer uses a lot of concepts to understanda position and to play a move. This paragraph briefly showssomeintuitive definitions of these concepts. At the lower level, a player looks at the safety of the strings in performing look-ahead. Whena string has enoughliberties, the string is said to be safe. A player also checks if an intersection is controlled by one player or not. An eye is a small enclosed area, Figure 2 gives an exampleof an eye on intersection A. In this figure, B is one of the four diagonal intersections of A. When searching to makean eye, it is importantto control diagonal intersections. Figure 2 A virtual connection is a spatial configuration that enables to connect strings whatever the opponent plays. Figure 3 gives an exampleof a ’Bamboojoin’. If the white player plays at A, black plays at B and connects its stones. If white plays at B, then black at A connects. The four stones are virtually connected. -A-B- Go, the best Go playing programs rely on a knowledge intensive approach. They are generally divided in two modules: ¯ A tactical module that develops narrow and deep search trees. Each tree is related to the achievement of a goal of the gameof Go. ¯ A strategic module that chooses the moveto play accordingto the results of the tactical module. Strategic reasoning is concernedwith groups of stones. A group of stones is a set of stones of the samecolor, each stone can be connectedto each other. Different types of reasoning are required in these modules. The tactical module uses rules to decide what movesto try in the search trees. The strategic modulehas to choose a plan and to execute it. A good way to choose a plan is to use constraints on the groups calculated by the tactical module. Choosing a movethat executes the plan can be done by comparingthe present situation with cases previously encountered in games. Rule based reasoning Rule based reasoning is used in the tactical moduleof the system. The rules are used to decide what movesto try in a search tree. These rules are automatically created by an Explanation Based Learning system named Introspect [Cazenave 1996]. Introspect is an introspective learning system [Cox 1996], such systems have been formalized in [Mitchell 1986] [Laird 1986] [Dejong 1986] and they have received attention more recently in [Ram& Leake 1995]. The rules learned by Introspect enable to consider only between 1 and 5 movesout of the 250 possible moveson a board. They exponentially decrease the size and time of the brute force search tree. This enables our Go program to took 60 moves ahead in some tactical positions. The formalism used to represent these rules is f’n’st order predicate logic. The rules are learned by Introspect, only given the rules of the gamein predicate logic. Example of a (simple) learned rule used to find connectionsbetweenstrings of stones : Figure 3 Connect( S1 $2 1 friend ) :- Color ( S1 friend ), Color friend ), Liberty( I S 1 ), Liberty( I $2), Move ( I friend Using these tactical results, a Go player starts its strategic reasoning with the use of groups. A group is a complexconcept for humanplayers. It maybe either a set of intersections that are virtually connected,either a set of intersections that gather the sameproperties. A group has a status. Astatus is deador alive and it is derived fromother intuitive conceptslike influence, fight, circling, life-base. The reader does not need explanations of these concepts to understandthe following sections. This rule tells that if an intersection I is a liberty of strings S1 and S2 that are friend strings, playing a black stone at I enables to achieve the goal Connectbetweenthe twostrings. The target concepts of the Explanation Based Learning module are the tactical subgoals of the game of Go : Removea string, Makea string alive, Connecttwo strings, Disconnect two strings, Makean eye and Removean eye. Each of these target concepts is defined using rules in predicate logic. For example the target concept for the tactical goal RemoveString is defined using this rule: Differentlevels In a Goprogram As it is impossibleto search the entire tree for the gameof 92 gemoveString ( S I friend ) :- Color( S enemy), Move enemy),NumberOfLibertiesBeforeMove ( S 1 ), Liberty S ), LegalMove ( I enemy Thousands of rules are created by usingthe rules of the gameto specializethe tactical goals. Figure4 Theexamplelearned rule is learned by explaining why the movemarkedA in the Figure4 connectsthe two black strings. Theinitial target conceptdefiningthe Connectgoal is: ConnectedAfterMove ( S 1 $2 ) :- Color( S 1 C ), Color C ), ElementOfARerMove ( I S 1 ), ElementOfAfterMove IS2). Therules used to specialize the target conceptin this exampleare: ElementOfAfterMove ( I S ) :- Liberty( I S ), Color( S Move ( I C ). Connect ( S1 $2 I friend ):- Move( I friend ConnectedAfterMove ( S 1 $2). Notethat there are different predicates to describethe boardafter the moveandthe boardbeforethe move.This is to preventside effects to happen,andto avoidincomplete explanations. At eachnodeof the prooftree, learnedrules are usedto select useful movesto try. Knowledge about the movesto try in the searchtrees are representedusingpredicatelogic rules becausethese rules represent theoremsabout the movesusefulor necessaryto try andthe movesnot to try. Constraint based reasoning Figure5 Theconstraints are about groups.Groupsare constructed usingthe results of the tactical module.For example,each point of territory is the result of a prooftree. Theprooftree is developedfor proving that a string S that belongsto group G can be connectedto the intersection I if Friend plays fast. If no opponentstring can connectto the same intersection I, then this intersectionis a territory of the groupGthat containsthe string S. In the Figure5, wegive a boardwherethe examplerule with the consWaintsapplies. ThegroupG2is markedwith 2, andthe groupsG1with 1. Thepoints of territory of the group G2are markedwith little black points. There are morethan forty points of territory for the white group, mainlyon the upper left side of the board. G2has five points of territory andonly oneeye. Theconstraints of the example rule are verified, so the goalSave( G2) is active. Usingconstraints is the obviouswayto describe that groups are unsafe under somecritical threshold of the numbersrepresentingtheir properties. Groupshavea lot of numericalproperties that are related to their safety, so constraints enable to express easily knowledgeabout the safety of groups. Case-based Reasoning Constraintscan be usedin gamesto choosea plan [Nigro& Cazenave1996]. They are used in the Go program to chooseplansat the strategic level. Forexample : Save (G2):- Neighbor ( G2 G1 ), Territory ( G2 Territory( G1 ), Territory( G2) + PotentialTerritory ( / 2 < 9, NumberOfEyes ( G2) < Thisconstrainttells that it is interestingto savegroup(32 if it has a neighboringgroupG1,and G2has less territory than GI, and if the sumof the territory of G2and of the potentialterritory of G2dividedbytwois less than9, andif the number of eyes of G2is less than 2. 93 Oncea plan has been chosen, the programhas to choose howto applythe plan. This is the nextpart of the strategic level. Tacticalgoals andstrategic plansare calculatedon a set of typical positions. It providesa set of cases with associated moves.Themovescan be the movesto play or the moves not to play. There are different degrees of similarity betweenthe predicates describinggroups. For example,the conditions ’Territory ( G) = 9’ and ’Territory (G) = 10’ are similar. Whereasthe conditions ’NumberOfEyes ( G) = and ’NumberOfEyes ( G) = 2’ are very dissimilar. Eachmoveis associated to the goal it achieves. The movesthat achievethe plans chosenby the constraints are selected. Each plan has a value, a movecan achieve multiple plans. Thevalue of each moveis the sumof the value of the plans the moveachieves. The movewith the highestvalueis played. participants. It finished 6 out of 40 participants. Thefive fLrSt programsare commercial programsthat haverequired a lot of person*yearsof work. It has outperformedother commercialsystems that have required more than 10 person*yearsof work. Conclusion Figure6 In the Figure6, the situation of the groupsmarkedwith 1 is very similar to the situation of the groupmarkedwith 2 in Figure5. Group1 in Figure6 has 3 points of territory and only one eye, whereasthe neighboring enemygroup has muchmoreterritory. Thesolution remembered to save the group1 in Figure6 is to play the movethat enablesto maketwoeyes andtherefore to live (preventingforeverthe opponentto removethe group from the board). According to the stored movemarkedA in Figure 6, the systemwill chooseto play the movemarkedAin Figure5. Case-Based Reasoningenables imprecisionin the use of knowledge.Strategy is naturally imprecise. Allowingto define similarity with somereference groupsenables to have a concise and general representation of strategic knowledge. Results The Goprogramplays a movein 10 secondson a Pentium 133 MHz,for each moveit proves about 450 tactical theorems,each theoremrequires between4 and 600 nodes in a searchtree to be proved,at eachnodeof eachtree, the rules learned by Introspect are called to fred the useful movesto try. Introspect has learned thousandsof tactical rules. All the learned rules are compiledinto a 1 000 000 lines C++program.Thestrategic level choosesplans using constraints on someproperties of the groups. Thegroups andtheir propertiesare built usingthe results of the tactical level. When strategic plans are chosen,movesrelated to the plans are chosenusing informationabout previously seen similarsituations. Gogol competedin the international computer Go tournamentheld during IJCAI97together with 40 other Wehaveshownhowto integrate multiple reasoning modes in a complexdomainthat requires different forms of reasoning.Rule-based reasoningis usedat the tactical level in our Goprogramto select the useful movesto try when searching. Constraint-basedreasoning is used to select interesting plans accordingto constraints. Oncethe plans are chosen, Case-basedreasoning is used to select the movesthat enable the plans to work. Each movehas a value that is the sumof the values of the plans the move achieves. The resulting Go programhas goodresults in international competitions(best non-commercial program). This approachcombiningmultiple types of reasoningcan also be used in other domainsthat are complexenoughto require different kind of knowledgeto use knowledge [Pitrat 1990]. Alesson learned fromapplyingmultimodalreasoningto a very complex task like the gameof Go,is that in complex domains,as weneed a lot of knowledge,using multiple reasoningmodesis appropriatebecausethere are different types of knowledge.Eachtype of knowledge is suited to a particular reasoningmode.Theproblemis to split a system into modules, and to choose a reasoning modefor each module.In our application, wechoseto separateour system in three modules:A theoremprover that uses rules and predicate logic for exact computations.Amodulebased on constraints to choose plans according to predefmed thresholds. A Case-BasedReasoningmodulethat enables imprecisionin the recognition of howmucha moveenables to achievea strategic goalthat cannotbe exactlyforeseen. HumanGo players also use different reasoning mode whenstudying a position. They search very fast when readingtactical sequencesof moves,using complexlearned patterns to choose the movesto try. They have a less rigorous reasoningmodewhenthey think strategically. As we have shownwith the gameof Go, we believe that the ability to switchbetweenreasoningmodesis necessaryto havegoodperformancesin manycomplexcognitive tasks. References Allis, L. V. 1994. Searching for Solutions in Gamesan Artificial Intelligence. Ph.D. diss., Vrije Universitat Amsterdam, Maastricht. Cazenave, T. 1996. Syst6med’Apprentissage par AutoObservation. Application au Jeu de Go. Ph.D. diss., Universit6Paris6. Cox, M. T. 1996. Introspective Multistrategy Learning: Constructive a Learning Strategy UnderReasoningFailure. Ph.D. diss., Georgia Institute of Technology, College of Computing,Atlanta. Dejong, G. and Mooney, R. 1986. Explanation Based Learning : an alternative view. MachineLearning 1 (2). Epstein, S. L. and Gelfand J. J. 1996. Pattern-based learning and spatially-oriented concept formation in a multi-agent, decision-making expert. Computational Intelligence 12 (1): 199-221. J.; Rosenbloom,P. and Newell A. 1986. Chunkingin SOAR: An Anatomy of a General Learning Mechanism. MachineLearning 1 (1). Laird, Mitchell, T. M.; Keller, R. M. and Kedar-Kabelli S. T. 1986. Explanation-based Generalization : A unifying view. MachineLearning 1 (1), 1986. Nigro, J.-M. and Cazenave, T. 1996. Constraint-based explanations in games. In Proceedings of IPMU’96, Granada, Spain. Pitrat, J. 1990. Mitaconnaissance- Futur de l’Intelligence Artificielle. Herm6s,Paris. Ram, A. and Leake, D. 1995. Goal-Driven Learning. Cambridge, MA,MITPress/Bradford Books. Robson, J. M. 1983. The Complexityof Go. In Proceedings IFIP, 413-417. Selman, B.; Brooks, R. A.; Dean, T.; Horvitz, E.; Mitchell, T. M.; Nilsson, N. J. 1996. Challenge Problems for Artificial Intelligence. In Proceedings AAAI-96,13401345. Vanden Herik, H. J.; Allis, L. V.; Herschberg, I. S. 1991. Which GamesWill Survive ? Heuristic Programming in Artificial Intelligence 2, the Second ComputerOlympiad (eds. D. N. L. Levy and D. F. Beal), pp. 232-243. Ellis Horwood. ISBN0-13-382615-5. 1991. 95