Integration of Different Reasoning Modes in ... System

advertisement
From: AAAI Technical Report SS-98-04. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.
Integration of Different ReasoningModesin a Go Playing and Learning
System
Tristan
Cazenave
LIP6,Universit6Pierre et MarieCurie
4, placeJussieu
75252 PARISCEDEX05, FRANCE
Tristan.Cazenave@lip6.
fr
Abstract
Integrating multiple reasoningmodeis useful in complex
domainslike the gameof Go.Goplayers use various forms
of reasoningduringa game.Reasoning
at the tactical level is
completelydifferent fi’omreasoningat the strategic level.
Choosing
a plan requires a different formof reasoningthan
knowinghowto execute a plan. This paper gives examples
of the integration of these reasoningmodesinto a single
system. Rule-basedreasoning, Constraint-basedreasoning
andCase-based
reasoningare usedin this hierarchicalorder¯
Constraint-basedreasoninguses the results of Rule-based
reasoning, and Case-basedreasoning uses the results of
Constraint-basedreasoningand Rule-basedreasoning.
Introduction
Integrating multiple reasoning modesis useful in complex
domainslike the gameof Go. Go players use various types
of reasoning during a game.Reasoningat the tactical level
is completelydifferent fromreasoningat the strategic level.
Choosinga plan requires a different type of reasoning than
knowinghowto execute a plan. This paper gives examples
of the integration of these reasoning modesinto a single
system. This work has somesimilarities with the work by
Epstein and Gelfand [Epstein and Gelfand 1996].
The first section describes computer Go. The second
section showshowrules are used and learned in our system
at the tactical level. The third section describes some
constraints to choose a plan at the strategic level. The
fourth section provide a way to use Case-BasedReasoning
to choose the more appropriate moveto follow a plan. The
last section gives the results of our computerGosystem.
Computer
complex task. Robson [Robson 1983] proved that Go
generalized to NxNboards is exponential in time. More
concretely, Van den Herik [Van den Herik 1991] and Allis
[Allis 1994] use complexity measuresof different gamesto
compare them. They defme the whole game tree complexity
A. Considering the average length of actual ,games L and
average branching factor B, we have A = B~. The statespace complexity of a game is defmed as the number of
legal gamepositions reachable from the initial position of
the game. In Go, L~150 and B~250 hence the game tree
¯
complextty A~10360. Go state space compleyaty, bounded by
3361~e10172,and gametree complexity are far larger than
those of any other perfect-information game. Moreover,a
position is takes time to evaluate, on the contrary of chess
where positions can be evaluated very fast. This makesGo
very difficult
to program. Computer Go has been
recognized as a challenge for Artificial Intelligence
[Selman 1996].
Go
The game of Go
Gowasdevelopedthree to four millennia ago in China; it is
the oldest and one of the most popular board gamein the
world. Like chess, it is a deterministic, perfect information,
zero-sumgameof strategy between two players. In spite of
the simplicity of its rules, playing the gameof Gois a very
Figure 1
The board is madeof 19 vertical lines and 19 horizontal
lines which cut themselves into 361 intersections. At the
beginning the board is empty. Eachplayer (Black or White)
moves alternatively
in adding one stone on an empty
intersection. Twoadjacent stones of the same color are
91
connected and they are part of the same string. For
example, the white stones of Figure 1 marked with A are
connected and are part of the same string. Emptyadjacent
intersections of a string are the liberties of the string. The
string of four markedwhite stones of Figure 1 has eight
liberties. Whena movefills the last liberty of a string, this
string is removed from the board. The repetitions of
positions are forbidden. According to the possibility of
being captured or not, the strings maybe dead or alive. A
player controls an intersection either whenhe has an alive
stone on it, either when the intersection is empty but
adjacent to alive stones. The aim of the gameis to control
more intersections than the opponent. The gameends when
the two players pass.
In spite of the simplicity of the rules, a Goplayer uses a
lot of concepts to understanda position and to play a move.
This paragraph briefly showssomeintuitive definitions of
these concepts. At the lower level, a player looks at the
safety of the strings in performing look-ahead. Whena
string has enoughliberties, the string is said to be safe. A
player also checks if an intersection is controlled by one
player or not. An eye is a small enclosed area, Figure 2
gives an exampleof an eye on intersection A. In this figure,
B is one of the four diagonal intersections of A. When
searching to makean eye, it is importantto control diagonal
intersections.
Figure 2
A virtual connection is a spatial configuration that
enables to connect strings whatever the opponent plays.
Figure 3 gives an exampleof a ’Bamboojoin’. If the white
player plays at A, black plays at B and connects its stones.
If white plays at B, then black at A connects. The four
stones are virtually connected.
-A-B-
Go, the best Go playing programs rely on a knowledge
intensive approach. They are generally divided in two
modules:
¯
A tactical module that develops narrow and deep
search trees. Each tree is related to the achievement
of a goal of the gameof Go.
¯
A strategic module that chooses the moveto play
accordingto the results of the tactical module.
Strategic reasoning is concernedwith groups of stones. A
group of stones is a set of stones of the samecolor, each
stone can be connectedto each other.
Different types of reasoning are required in these
modules. The tactical module uses rules to decide what
movesto try in the search trees. The strategic modulehas to
choose a plan and to execute it. A good way to choose a
plan is to use constraints on the groups calculated by the
tactical module. Choosing a movethat executes the plan
can be done by comparingthe present situation with cases
previously encountered in games.
Rule based reasoning
Rule based reasoning is used in the tactical moduleof the
system. The rules are used to decide what movesto try in a
search tree. These rules are automatically created by an
Explanation Based Learning system named Introspect
[Cazenave 1996]. Introspect is an introspective learning
system [Cox 1996], such systems have been formalized in
[Mitchell 1986] [Laird 1986] [Dejong 1986] and they have
received attention more recently in [Ram& Leake 1995].
The rules learned by Introspect enable to consider only
between 1 and 5 movesout of the 250 possible moveson a
board. They exponentially decrease the size and time of the
brute force search tree. This enables our Go program to
took 60 moves ahead in some tactical positions. The
formalism used to represent these rules is f’n’st order
predicate logic. The rules are learned by Introspect, only
given the rules of the gamein predicate logic.
Example of a (simple) learned rule used to find
connectionsbetweenstrings of stones :
Figure 3
Connect( S1 $2 1 friend ) :- Color ( S1 friend ), Color
friend ), Liberty( I S 1 ), Liberty( I $2), Move
( I friend
Using these tactical results, a Go player starts its
strategic reasoning with the use of groups. A group is a
complexconcept for humanplayers. It maybe either a set
of intersections that are virtually connected,either a set of
intersections that gather the sameproperties. A group has a
status. Astatus is deador alive and it is derived fromother
intuitive conceptslike influence, fight, circling, life-base.
The reader does not need explanations of these concepts to
understandthe following sections.
This rule tells that if an intersection I is a liberty of
strings S1 and S2 that are friend strings, playing a black
stone at I enables to achieve the goal Connectbetweenthe
twostrings.
The target concepts of the Explanation Based Learning
module are the tactical subgoals of the game of Go :
Removea string, Makea string alive, Connecttwo strings,
Disconnect two strings, Makean eye and Removean eye.
Each of these target concepts is defined using rules in
predicate logic. For example the target concept for the
tactical goal RemoveString
is defined using this rule:
Differentlevels In a Goprogram
As it is impossibleto search the entire tree for the gameof
92
gemoveString
( S I friend ) :- Color( S enemy), Move
enemy),NumberOfLibertiesBeforeMove
( S 1 ), Liberty
S ), LegalMove
( I enemy
Thousands
of rules are created by usingthe rules of the
gameto specializethe tactical goals.
Figure4
Theexamplelearned rule is learned by explaining why
the movemarkedA in the Figure4 connectsthe two black
strings. Theinitial target conceptdefiningthe Connectgoal
is:
ConnectedAfterMove
( S 1 $2 ) :- Color( S 1 C ), Color
C ), ElementOfARerMove
( I S 1 ), ElementOfAfterMove
IS2).
Therules used to specialize the target conceptin this
exampleare:
ElementOfAfterMove
( I S ) :- Liberty( I S ), Color( S
Move
( I C ).
Connect ( S1 $2 I friend ):- Move( I friend
ConnectedAfterMove
( S 1 $2).
Notethat there are different predicates to describethe
boardafter the moveandthe boardbeforethe move.This is
to preventside effects to happen,andto avoidincomplete
explanations.
At eachnodeof the prooftree, learnedrules are usedto
select useful movesto try. Knowledge
about the movesto
try in the searchtrees are representedusingpredicatelogic
rules becausethese rules represent theoremsabout the
movesusefulor necessaryto try andthe movesnot to try.
Constraint based reasoning
Figure5
Theconstraints are about groups.Groupsare constructed
usingthe results of the tactical module.For example,each
point of territory is the result of a prooftree. Theprooftree
is developedfor proving that a string S that belongsto
group G can be connectedto the intersection I if Friend
plays fast. If no opponentstring can connectto the same
intersection I, then this intersectionis a territory of the
groupGthat containsthe string S.
In the Figure5, wegive a boardwherethe examplerule
with the consWaintsapplies. ThegroupG2is markedwith
2, andthe groupsG1with 1. Thepoints of territory of the
group G2are markedwith little black points. There are
morethan forty points of territory for the white group,
mainlyon the upper left side of the board. G2has five
points of territory andonly oneeye. Theconstraints of the
example
rule are verified, so the goalSave( G2) is active.
Usingconstraints is the obviouswayto describe that
groups are unsafe under somecritical threshold of the
numbersrepresentingtheir properties. Groupshavea lot of
numericalproperties that are related to their safety, so
constraints enable to express easily knowledgeabout the
safety of groups.
Case-based Reasoning
Constraintscan be usedin gamesto choosea plan [Nigro&
Cazenave1996]. They are used in the Go program to
chooseplansat the strategic level. Forexample
:
Save (G2):- Neighbor ( G2 G1 ), Territory ( G2
Territory( G1 ), Territory( G2) + PotentialTerritory
(
/ 2 < 9, NumberOfEyes
( G2) <
Thisconstrainttells that it is interestingto savegroup(32
if it has a neighboringgroupG1,and G2has less territory
than GI, and if the sumof the territory of G2and of the
potentialterritory of G2dividedbytwois less than9, andif
the number
of eyes of G2is less than 2.
93
Oncea plan has been chosen, the programhas to choose
howto applythe plan. This is the nextpart of the strategic
level. Tacticalgoals andstrategic plansare calculatedon a
set of typical positions. It providesa set of cases with
associated moves.Themovescan be the movesto play or
the moves
not to play.
There are different degrees of similarity betweenthe
predicates describinggroups. For example,the conditions
’Territory ( G) = 9’ and ’Territory (G) = 10’ are
similar. Whereasthe conditions ’NumberOfEyes
( G) =
and ’NumberOfEyes
( G) = 2’ are very dissimilar.
Eachmoveis associated to the goal it achieves. The
movesthat achievethe plans chosenby the constraints are
selected. Each plan has a value, a movecan achieve
multiple plans. Thevalue of each moveis the sumof the
value of the plans the moveachieves. The movewith the
highestvalueis played.
participants. It finished 6 out of 40 participants. Thefive
fLrSt programsare commercial
programsthat haverequired
a lot of person*yearsof work. It has outperformedother
commercialsystems that have required more than 10
person*yearsof work.
Conclusion
Figure6
In the Figure6, the situation of the groupsmarkedwith 1
is very similar to the situation of the groupmarkedwith 2
in Figure5. Group1 in Figure6 has 3 points of territory
and only one eye, whereasthe neighboring enemygroup
has muchmoreterritory. Thesolution remembered
to save
the group1 in Figure6 is to play the movethat enablesto
maketwoeyes andtherefore to live (preventingforeverthe
opponentto removethe group from the board). According
to the stored movemarkedA in Figure 6, the systemwill
chooseto play the movemarkedAin Figure5.
Case-Based
Reasoningenables imprecisionin the use of
knowledge.Strategy is naturally imprecise. Allowingto
define similarity with somereference groupsenables to
have a concise and general representation of strategic
knowledge.
Results
The Goprogramplays a movein 10 secondson a Pentium
133 MHz,for each moveit proves about 450 tactical
theorems,each theoremrequires between4 and 600 nodes
in a searchtree to be proved,at eachnodeof eachtree, the
rules learned by Introspect are called to fred the useful
movesto try. Introspect has learned thousandsof tactical
rules. All the learned rules are compiledinto a 1 000 000
lines C++program.Thestrategic level choosesplans using
constraints on someproperties of the groups. Thegroups
andtheir propertiesare built usingthe results of the tactical
level. When
strategic plans are chosen,movesrelated to the
plans are chosenusing informationabout previously seen
similarsituations.
Gogol competedin the international computer Go
tournamentheld during IJCAI97together with 40 other
Wehaveshownhowto integrate multiple reasoning modes
in a complexdomainthat requires different forms of
reasoning.Rule-based
reasoningis usedat the tactical level
in our Goprogramto select the useful movesto try when
searching. Constraint-basedreasoning is used to select
interesting plans accordingto constraints. Oncethe plans
are chosen, Case-basedreasoning is used to select the
movesthat enable the plans to work. Each movehas a
value that is the sumof the values of the plans the move
achieves. The resulting Go programhas goodresults in
international competitions(best non-commercial
program).
This approachcombiningmultiple types of reasoningcan
also be used in other domainsthat are complexenoughto
require different kind of knowledgeto use knowledge
[Pitrat 1990].
Alesson learned fromapplyingmultimodalreasoningto
a very complex
task like the gameof Go,is that in complex
domains,as weneed a lot of knowledge,using multiple
reasoningmodesis appropriatebecausethere are different
types of knowledge.Eachtype of knowledge
is suited to a
particular reasoningmode.Theproblemis to split a system
into modules, and to choose a reasoning modefor each
module.In our application, wechoseto separateour system
in three modules:A theoremprover that uses rules and
predicate logic for exact computations.Amodulebased on
constraints to choose plans according to predefmed
thresholds. A Case-BasedReasoningmodulethat enables
imprecisionin the recognition of howmucha moveenables
to achievea strategic goalthat cannotbe exactlyforeseen.
HumanGo players also use different reasoning mode
whenstudying a position. They search very fast when
readingtactical sequencesof moves,using complexlearned
patterns to choose the movesto try. They have a less
rigorous reasoningmodewhenthey think strategically. As
we have shownwith the gameof Go, we believe that the
ability to switchbetweenreasoningmodesis necessaryto
havegoodperformancesin manycomplexcognitive tasks.
References
Allis, L. V. 1994. Searching for Solutions in Gamesan
Artificial Intelligence. Ph.D. diss., Vrije Universitat
Amsterdam,
Maastricht.
Cazenave, T. 1996. Syst6med’Apprentissage par AutoObservation. Application au Jeu de Go. Ph.D. diss.,
Universit6Paris6.
Cox, M. T. 1996. Introspective Multistrategy Learning:
Constructive a Learning Strategy UnderReasoningFailure.
Ph.D. diss., Georgia Institute of Technology, College of
Computing,Atlanta.
Dejong, G. and Mooney, R. 1986. Explanation Based
Learning : an alternative view. MachineLearning 1 (2).
Epstein, S. L. and Gelfand J. J. 1996. Pattern-based
learning and spatially-oriented
concept formation in a
multi-agent, decision-making expert.
Computational
Intelligence 12 (1): 199-221.
J.; Rosenbloom,P. and Newell A. 1986. Chunkingin
SOAR: An Anatomy of a General Learning Mechanism.
MachineLearning 1 (1).
Laird,
Mitchell, T. M.; Keller, R. M. and Kedar-Kabelli S. T.
1986. Explanation-based Generalization : A unifying view.
MachineLearning 1 (1), 1986.
Nigro, J.-M. and Cazenave, T. 1996. Constraint-based
explanations in games. In Proceedings of IPMU’96,
Granada, Spain.
Pitrat, J. 1990. Mitaconnaissance- Futur de l’Intelligence
Artificielle. Herm6s,Paris.
Ram, A. and Leake, D. 1995. Goal-Driven Learning.
Cambridge, MA,MITPress/Bradford Books.
Robson, J. M. 1983. The Complexityof Go. In Proceedings
IFIP, 413-417.
Selman, B.; Brooks, R. A.; Dean, T.; Horvitz, E.; Mitchell,
T. M.; Nilsson, N. J. 1996. Challenge Problems for
Artificial Intelligence. In Proceedings AAAI-96,13401345.
Vanden Herik, H. J.; Allis, L. V.; Herschberg, I. S. 1991.
Which GamesWill Survive ? Heuristic Programming in
Artificial Intelligence 2, the Second ComputerOlympiad
(eds. D. N. L. Levy and D. F. Beal), pp. 232-243. Ellis
Horwood. ISBN0-13-382615-5. 1991.
95
Download