Continuous Case-Based Reasoning

advertisement
From: AAAI Technical Report WS-93-01. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.
Continuous Case-Based Reasoning
Ashwin Ramand Juan Carlos Santamaria
College of Computing
GeorgiaInstitute of Technology
Atlanta, Georgia30332-0280
{ashwin, carlos}@cc,gatech, edu
Abstract
Case-basedreasoning systems have traditionallybeen
used to perform high-level reasoning in problemdomains that can be adequately described using discrete, symbolic representations. However,manyrealworld problem domains, such as autonomousrobotic
navigation, are better characterized using continuous
representations. Such problem domainsalso require
continuous performance,such as continuous sensorimotor interaction with the environment, and continuous ~ptation and learning during the performance
task. Weintroduce a new methodfor continuous casebased reasoning, and discuss howit can be applied to
the dynamicselection, modification, and acquisition
of robot behaviors in autonomousnavigation systems.
Weconclude with a general discussion of case-based
reasoning issues addressed by this work.
1
Introduction
Case-based reasoning systems have traditionally been used to
perform high-level reasoning in problem domainsthat can be
adequately described using discrete, symbolicrepresentations.
For example, CHEFuses case-based planning to create recipes
[Hammond,1989], AQUA
uses case-based explanation to understand newspaper stories [Ram, 1993b], HYPO
uses casebased interpretation for legal argumentation[Ashley &Rissland, 1987], MEDIATOR
uses case-based problem solving for
dispute resolution [Kolodner, Simpson, & Sycara, 1985], and
PRODIGY
uses case-based reasoning in the form of derivational analogy for high-level robot planning [Veloso &Carbonell, 1991].
In our research, we have been investigating the problem of
performanceand learning in continuous, real-time problemdomains, such as autonomousrobotic navigation. Our method,
called continuous case-based reasoning shares manyof the
fundamental assumptions of what might be called "discrete"
case-based reasoning in symbolic problem domains} Learning is integrated with performance. Performanceis guided by
previous experience. Newproblems are solved by retrieving
cases and adapting them. Newcases are learned by evaluating proposed solutions and testing themon a real or simulated
world. However,continuous problem domainsrequire different
underlying representations and place additional constraints on
1Wedo not eschewsymbolicrepresentations; rather, the issue is
the continuous,time-varyingnature of any proposedrepresentations,
whethersymbolic,numeric,or otherwise.
86
the problem solving process. For example, consider the problem of driving a car on a highway.Car driving experiences can
vary from one another in infinitely manyways. The speed of a
car might be 55 mphin one experience and 54 mphin another.
Withina given episode, the speed of the car might continuously
vary, both infinitesimally from momentto moment,and significantly from, say, the highwayto an exit ramp. The problem
solving and learning process must operate continuously; there
is no time to stop and think, nor a logical point in the processat
which to do so.
Such problem domains are "continuous" in three senses.
First, they require continuous representations. For example,
a robotic navigation task requires representations of continuous
perceptual and motorcontrol information. Second,they require
continuous performance. For example, driving a car requires
continuous action. Often, problem-solving performance is incremental of necessity because of limited knowledgeavailable
to the reasoningsystemand/or becauseof the unpredictabilityof
the environment;the systemcan at best execute the "best" shortterm actions available to it and then re-evaluate its progress. A
robot, for example,maynot knowwhereobstacles lie until it actually encounters them. Third, these problemdomainsrequire
continuous adaptation and learning. As the problems encountered becomemorevaried and difficult, it becomesnecessary to
use fine-grained, detailed knowledgein an incremental manner
to act, and to rely on continuousfeedbackfrom the environment
to adapt actions and learn from experiences.
Case-based reasoning in such problemdomainsrequires significant enhancementsto the basic case-based reasoning methods used in discrete, symbolicreasoning systems. Whenare two
experiences different enoughto warrant consideration as independent cases? Whatis the scope of a single case? Is the entire
car trip from one’s houseto the grocery store a single case that
can be used to guide and improve one’s driving performance
in future situations? Howshould "continuous cases" be represented? Howcan they be used to guide performance? Howare
they learned and modified through experience? And howcan
this performanceand learning be integrated into a continuous,
on-line, real-time process?
In this paper, we provide an answerto these questions based
on our research into a robot navigation task. The proposed
methodsare fully implementedin a computersystem which uses
reactive control for its performanceelement, and case-based
reasoning for continuous adaptation of the performanceelement
and for continuouslearning through experience. Wediscuss the
relevant technical details of the system, and we conclude with
the contdbt~tions of our research and their implication for the
design of case-based reasoning systems in general.
2 The robot navigation
task
Autonomous
robotic navigation is defined as the task of finding
a path along whicha robot can movesafely from a source point
to a destination point in a obstacle-ridden terrain, and executing the actions to carry out the movementin a real or simulated world. Several methodshave been proposedfor this task,
ranging from high-level planning methodsto reactive methods.
High-level planning methods use extensive world knowledge
about available actions and their consequencesto formulate a
detailed plan before the actions are actually executed in the
world (e.g., [Fikes, Hart, &Nilsson, 1972; Georgeff, 1987;
Maes, 1990; Sacerdoti, 1977]). Considerablehigh-level knowledge is also neededto learn from planning experiences (e.g.,
[Hammond,1989; Minton, 1988; Mostow& Bhatnagar, 1987;
Segre, 1988]). Situated or reactive control methods,in contrast,
perform no planning; instead, a simple sensory representation
of the environmentis used to select the next action that should
be performed [Arkin, 1989; Brooks, 1986; Kaelbling, 1986;
Payton, 1986]. Actions are represented as simple behaviors,
whichcan be selected and executedrapidly, often in real-time.
These methods can cope with unknownand dynamic environmentalconfigurations, but only those that lie within the scopeof
predetermined behaviors. Furthermore, such methods cannot
modify or improve their behaviors through experience, since
they do not have any predictive capability that could account
for future consequencesof their actions, nor a higher-level forrealism in which to represent and reason about the knowledge
necessaryfor such analysis.
Wehave developed a self-improving navigation system that
uses reactive control for fast performance, augmentedwith a
continuous case-based reasoning methodthat allow the system
to adapt to novel environments and to learn from its experiences. The system autonomouslyand progressively constructs
representational structures that aid the navigation task by supplying the predictive capability that standard reactive systems
lack. The representations are constructed using a hybrid casebased and reinforcement learning method without extensive
high-level reasoning. The system is very robust and can perform successfully in (and learn from)novel environments,yet
comparesfavorably with traditional reactive methodsin terms
of speed and performance. A further advantage of the method
is that the system designers do not need to foresee and represent all the possibilities that might occur since the system
develops its own"understanding"of the world and its actions.
Throughexperience, the systemis’able to adapt to, and perform
well in, a wide range of environmentswithout any user intervention or supervisory input. This is a primary characteristic
that autonomousagents must have to interact with real-world
environments.
3 Technicaldetails
Our self-improvingrobotic navigation systemconsists of a navigation module,which uses schema-basedreactive control methods, and an on-line adaptation and learning module,whichuses
case-based reasoning and reinforcement learning methods. The
navigation moduleis responsible for movingthe robot through
the environmentfrom the starting location to the desired goal
location while avoidingobstacles along the way. The adaptation
and learning modulehas two responsibilities. The adaptation
sub-moduleperforms on-line adaptation of the reactive control parametersto get the best performancefrom the navigation
module. The adaptation is based on recommendations from
cases that capture and modelthe interaction of the systemwith
87
RobotAgent
I~ Learning and l
I .1 A~,~ I--I
"°*"
I[I
later,
I--
I1"-"
~
Environment
Figure1: Systemarchitecture
its environment.Withsuch a model, the system is able to predict future consequencesof its actions and act accordingly. The
learning sub-modulemonitors the progress of the system and
incrementally modifies the case representations throughexperience. Figure 1 showsthe functional architecture of the system.
3.1 Schema-basedreactive control
The reactive control moduleis based on the AuRA
architecture
[Arkin, 1989], and consists of a set of motorschemasthat represent the individual motorbehaviors available to the system.
Each schemareacts to sensory information from the environment, and producesa velocity vector representing the direction
and speed at whichthe robot is to movegiven current environmental conditions. For example, the schema AVOID-STATICOBSTACLE
directs the system to moveitself awayfrom detected
obstacles, and the associated schemaparameter Obstacle-Gain
determinesthe magnitudeof the repulsive potential field generated by the obstacles perceived by the system. Thevelocity vectors producedby all the schemasare then combinedto produce
a potential field that directs the actual movement
of the robot.
Simple behaviors, such as wandering, obstacle avoidance, and
goal following, can combineto produce complexemergent behaviors in a particular environment.Different emergentbehaviors can be obtained by modifying the simple behaviors. A
detailed description of schema-basedreactive control methods
can be found in Arkin [1989].
3.2 Behavior selection and modification
Our first attempt at building a case-based reactive navigation
system focussed on the issue of using case-based reasoning to
guide reactive control. A central issue here is the nature of the
guidance: At what grain size should the reactive control module
represent its behaviors, and what kind of "advice" should the
case-based reasoning moduleprovide to the reactive control
module?
In order to achieve more robust robotic control, we advocate
the use of sets of behaviors, called behavior assemblages,to
represent appropriate collections of cooperating behaviors for
complexenvironments, and of behavior adaptation to adapt and
fine-tune existing behaviors dynamicallyin novel environments
[Ram,Arkin, Moorman,&Clark, 1992]. There are two types of
behavior adaptations that might be considered. Oneoption is to
have the system modifyits current behavior based on immediate
past experience. Whileuseful, this is only a local response to
the problem.A moreglobal solution is to havethe systemselect
completely new assemblagesof behaviors based on the current
environmentin whichit finds itself. A robust systemshould be
able to learn about and adapt to its environmentdynamicallyin
both these ways.
Figure2: Typicalnavigational
behaviors
of differenttuningsof the
reactivecon~olmodule.
Thefigureonthe left showsthe non-learning
system
withhighobstacleavoidance
andlowgoalattraction. Onthe
right, the learningsystem
haslowered
obstacleavoidance
andincreased
goalattraction,allowing
it to "squeeze"
through
the obstaclesandthen
takea relativelydirectpathto thegoal.
the aboveexample,supposethat the robot is in a very cluttered
environmentand is employinga conservative assemblageof
motorbehaviors.It then breaksout of the obstaclesandenters
a large openfield (analogousto movingfroma forested area
into a meadow).
If only local changeswereallowed,the robot
wouldeventually adjust to the newenvironment.However,by
allowinga global changeto take place, the systemneedsonly
to realize that it is in a radicallynewenvironment
andto select
a newassemblage
of motorbehaviors, onebetter suited to the
newsurroundings.Interestingly, case-basedreasoningis used
to realize this typeof modification
as well.
Assemblages
of behaviorsare representedas cases, or standard scenarios known
to the system,that can be used to guide
performance
in novelsituations. Asin a traditional case-based
reasoningsystem,a case is used to proposea plan or a solution (here, a behaviorassemblage)to the problem(here,
current environmentalconfiguration). However,our method
differs fromthe traditional use of case-basedreasoningin an
importantrespect. Acase in our systemis also usedto propose
a set of behavioradaptations,rather than merelythe behaviors
themselves.
Thisallowsthe systemto use different strategies in
different situations. For example,the systemmightuse a"cautious" strategy in a crowdedenvironment
by graduallyslowing
downandallowingitself to get closer to the surrounding
obstacles. In order to permitthis, strategies suggestboundarieson
behavioralparametersrather than precise values for these parameters. Casesare used both to suggestbehaviorassemblages
as wellas to performdynamic
(on-line) adaptationof the parameters of behaviorassemblages
withinthe suggestedboundaries.
Theknowledge
required for both kinds of suggestionsis stored
in a case, in contrastwithtraditional case-based
reasoningsystemsin whichcases are used only to suggestsolutions, and a
separatelibrary of adaptationrules is usedto adapta solution
to fit the current problem.Furtherdetails can be foundin Ram,
Arkin, Moorman
& Clark [1992].
3.3 Caserepresentation
WhileACBARR
demonstratedthe feasibility of on-line, casebasedreasoningin systemsrequiring continuous,real-time response,it relied on a fixed library of cases that werehandcoded into the system. The system could adapt to novel
environments--an
importantkind of learning--but it could not
improveits ownadaptationbehaviorthroughexperience.Since
the knowledge
required for behavioradaptation is stored in
cases, weturnedour attention to the problemof learningcases
through experience. WebuiltSINS
(Self-ImprovingNavigation
System),whichis similar to the ACBARR
systembut can learn
and modifyits owncases throughexperience.Therepresentation of the casesin SINSis different andis designedto support
learning, but the underlyingideas behindthe twosystemsare
very similar.
Sincewedid not wantto rely on hand-coded,high-level domainknowledge,
the representationsusedby SINSto modelits
interaction withthe environment
are initially under-constrained
andgenetic;they containvery little usefulinformationfor the
navigationtask. As the systeminteracts with the environment,
the learningmodulegraduallymodifiesthe content of the representations until they becomeuseful andprovidereliable informationfor adaptingthe navigationsystemto the particular
environmentat hand.
Thelearning and reactive modulesfunction in an integrated
manner.Thelearning moduleis alwaystrying to find a better
modelof the interaction of the systemwith its environment
so that it can tune the reactive moduleto performits function
Differentcombinations
of schemaparametersproducedifferent behaviorsto be exhibitedby the system(see figure 2). This
allowsthe systemto interact successfullyin different environmentalconfigurationsrequitingdifferent navigational"strategies" [Clark, Arldn, & Ram,1992; Moorman
& Ram,1992].
Traditionally, parametersare fixed and determinedahead of
time by the systemdesigner. However,on-line selection and
modificationof the appropriateparametersbasedon the current
environmentcan enhancenavigational performance.Wetested
this idea by building the ACBARR
(A Case-BAsedReactive
Robotic)systemand evaluatingit qualitatively andquantitatively throughextensivesimulationstudies usinga variety of
different environments
andseveral different performance
mettics (see [Ram,Arkin, Moorman,
&Clark, 1992] for details).
Theexperimentsshowthat ACBARR
is very robust, performing
well in novelenvironments.
Additionally,it is able to navigate
throughseveral "hard" environments,such as box canyons,in
whichtraditional reactive systemswouldperformpoorly.
In the ACBARR
system, we incorporated both behavior
adaptationandbehaviorswitchinginto a reactivecontrol framework.At the local level, this is accomplished
by allowingthe
systemto ~pt its current behaviorin order to build "momentum".If somethingis workingwell, the systemcontinuesdoing
it andtries doingit a little bit harder;conversely,if thingsare
not proceedingwell, the systemattemptssomething
a little different. Thistechniqueallowsthe systemto fine tuneits current
behaviorpatterns to the exact environment
in whichit finds
itself. For example,if the robot has beenin an openarea for a
period of time andhas not encountered
anyobstacles, it picks
up speedanddoesnot worryas muchaboutobstacles. If, on the
otherhand,it is in a clutteredarea, it lowersits speedandtreats
obstacles moreseriously. For behavior-based
reactive systems,
this translates into altering the schemagains and parameters
continuously,providedthe systemhas a methodfor determining
the appropriate modifications. ACBARR
uses case-basedreasoningto retrieve behaviormodificationrules. Theserules are
then used incrementallyto alter the gain andparametervalues
basedon current environmental
conditionsandpast successes.
Theother methodfor behavior modification in ACBARR
is
at a moreglobal level. If the systemis currentlyacting under
the control of an assemblage
of behaviorswhichare no longer
suited to the current environment,it selects a newassemblage
based on whatthe environmentis nowlike. Continuingwith
88
better. Thereactive moduleprovidesfeedbackto the learning
moduleso it can build a better modelof this interaction. The
behaviorof the systemis thenthe result of an equilibriumpoint
establishedby the learningmodule
whichis trying to refine the
modeland the environmentwhichis complexand dynamicin
nature. This equilibriummayshift andneedto be re-established
if the environment
changesdrastically; however,the modelis
geneticenoughat anypoint to be able to deal with a very wide
range of environments.
Thereactive modulein SINScan be adaptedto exhibit many
different behaviors. SINSimprovesits performance
by learning howandwhento tune the reactive module.In this way,the
systemcan use the appropriate behaviorin each environmental configurationencountered.Thelearning module,therefore,
Ca$o
P~toomam
mustlearn about anddiscriminate betweendifferent environOon~alkm
P, apt~mt~10n
ments,andassociatewith eachthe appropriate~aptationsto be
performedon the motorschemas.This requires a representational schemeto modelthe interaction betweenthe systemand
the environment.However,
to ensurethat the systemdoes not
get boggeddownin extensivehigh-level reasoning,the knowledgerepresentedin the modelmustbe basedon perceptualand
motorinformation
easily availableat the reactivelevel.
Figure 3: Samplerepresentations showingthe time history of analog
SINSuses a modelconsisting of associations betweenthe
sensoryinputs (e.g., Obstacle-Density)
andschemaparameters values representing perceived inputs and schemaparameters. Each
graph in the case (below) is matchedagainst the corresponding graph
values(e.g., Obstacle-Gain,associatedwith the AVOID-STATICin the current environment(above) to determinethe best match, after
OBSTACLE
schema).Eachset of associations is representedas
whichthe remainingpart of the case is used to guide navigation (shown
a case. Sensoryinputs providesinformationabout the config- as dashedlines).
uration of the environment,and is obtainedfromthe system’s
sensors. Schema
parameterinformationspecifies howto adapt
the reactive modulein the environments
to whichthe case is
accordingly, while simultaneouslyupdating its owncases to
applicable.Eachtype of informationis representedas a vector reflect the observedresults of the system’sactions in various
of analog values. Eachanalog value correspondsto a quan- situations.
titative variable (a sensoryinput or a schemaparameter)at
Themethodis based on a combinationof ideas from casespecifictime. Avectorrepresentsthe trend or recenthistoryof a
based
reasoningand learning, whichdeals with the issue of
variable. Acase modelsan association betweensensoryinputs
using
past
experiencesto deal with andlearn fromnovelsituand schemaparametersby groupingtheir respective vectors
ations, and fromreinforcementlearning, whichdeals with the
together. Figure3 showan exampleof this representation.
basedon
Thisrepresentationhas three essential properties.First, the issue of updatingthe content of system’sknowledge
feedback
from
the
environment
(see
[Sutton,
1992]).
However,
representationis capableof capturinga widerangeof possible in traditional case-basedplanningsystems(e.g., [Hammond,
associations betweenof sensory inputs and schemaparame- 1989])learningandadaptationrequires a detailed modelof the
ters. Second,it permitscontinuousprogressiverefinementof
domain.This is exactlywhatreactive planningsystemsare trythe associations.Finally, the representationcapturestrendsor
ing to avoid. Earlier attemptsto combine
reactivecontrol with
patterns of input andoutputvalues over time. This allowsthe
classical planningsystems(e.g., [Chien,Gervasio,& DeJong,
systemto detect patterns over larger time windows
rather than
1991])or explanation-based
learningsystems(e.g., [Mitchell,
havingto makea decision basedonly on instantaneousvalues
1990]) also relied on deep reasoningand weretypically too
of perceptualinputs.
slowfor the fast, reflexivebehaviorrequiredin reactivecontrol
3.4 Case learning
systems. Unlikethese approaches,our methoddoes not fall
back
on slownon-reactivetechniquesfor improvingreactive
Thecase-basedreasoningand learning modulecreates, mainrains andappliesthe caserepresentations
usedfor on-lineadap- control.
tation of the reactivemodule.Themainobjectiveof the learning
Eachcase represents an observedregularity betweena parmethodis to construct a modelof the continuoussensorimotor ticular environmental
configurationandthe effects of different
interactionof the systemwithits environment,
that is, a mapping actions, and prescribes the values of the schemaparameters
fromsensoryinputs to appropriatebehavioral(schema)param- that are mostappropriate (as far as the systemknowsbased
eters. This modelallowsthe adaptationmoduleto continuously on its previousexperience)for that environment.Thelearncontrol the behaviorof the navigationmoduleby selecting and ing moduleperformsthe followingtasks in a cyclic manner:
adaptingschemaparametersin different environments.
Tolearn (1) perceiveandrepresentthe currentenvironment;
(2) retrieve
a mapping
in this context is to detect anddiscriminateamong a case whosesensory input vector represents an environment
different environment
configurations,and to identify the ap- mostsimilar to the current environment;
(3) adapt the schema
propriate schemaparametervalues to be used by the reactive parametervalues in use by the reactive control moduleby inmodule,in a dynamicand on-line manner.This meansthat,
stalling the values recommended
by schemaparametervectors
as the systemis navigating, the learning moduleis perceiv- of the case; and(4) learn newassociationsand/oradaptexisting
ing the environment,detecting an environmentconfiguration, associationsrepresentedin the caseto reflect anynewinformaand modifyingthe schemaparametersof the reactive module tion gainedthroughthe use of the case in the newsituation to
89
enhance
the reliability of their predictions.
Theperceivestep builds a set of vectors representingthe
sensoryinput, whichare then matchedagainst the corresponding vectorsof the cases in the system’smemory
in the retrieve
step. Thecase similarity metric is basedon the meansquared
difference betweeneach of the vector values of the case over
a trending window,and the vector values of the environment.
Thebest matchwindow
is calculated usinga reverse sweepover
the timeaxis similarto a convolution
processto findthe relative
positionthat matchesbest. Thebest matchingcase is handedto
the adaptstep, whichselects the schemaparametervalues from
the ease andmodifiesthe corresponding
valuesof the reactive
behaviorscurrently in use usinga reinforcementformulawhich
uses the case similarity metric as a scalar reward. Thusthe
actual adaptationsperformeddependon the goodnessof match
betweenthe case and the environment.
Finally,the learnstep usesstatistical information
aboutprior
applicationsof the case to determinewhetherinformationfrom
the current application of the case shouldbe used to modify
this case, or whethera newcase shouldbe created. Thevectors
encodedin the cases are adaptedusing a reinforcementformula
in whicha relative similaritymeasure
is usedas a scalar reward
or reinforcement
signal. Therelative similarity measurequantifies howsimilar the current environment
configurationis to
the environment
configurationencodedby the case relative to
howsimilar the environment
has beenin previousutilizations
of the case. Intuitively, if case matchesthe current situation
better thanprevioussituationsit wasusedin, it is likelythat the
situation involvesthe veryregularities that the case is beginning to capture;thus, it is worthwhile
modifying
the casein the
directionof the currentsituation. Alternatively,if the matchis
not quite as good,the case shouldnot be modifiedbecausethat
will take it awayfromthe regularityit wasconvergingtowards.
Finally,if the currentsituationis a verybadfit to the case, it
makesmoresense to create a newcase to represent whatis
probablya newclass of situations.
Adetailed descriptionof eachstep wouldrequire morespace
than is available in this paper (see Ram&Santamaria[1993]
for details). Here,wenote that since the reinforcement
formula
is basedon a relative similarity measure,the overall effect of
the learningprocessis to causethe cases to convergeon stable
associations betweenenvironmentconfigurations and schema
parameters.Stable associations represent regularities in the
worldthat havebeenidentified by the systemthroughexperience, and providethe predictive powernecessaryto navigate
in future situations. Theassumption
behindthis methodis that
the interaction betweenthe systemandthe environment
can be
characterizedby a finite set of causalpatterns or associations
betweenthe sensory inputs and the actions performedby the
system. The methodallows the system to learn these causal
patterns andto use themto modifyits actions by updatingits
schemaparametersas appropriate.
Themethodspresentedabovehavebeenevaluatedusing extensivesimulationsacrossa variety of different typesof environment,performance
criteria, and systemconfigurations. We
measuredthe qualitative and quantitative improvement
in the
navigationperformance
of the SINSsystem,and systematically
evaluatedthe effects of various designdecisionson the performaneeof the system.Theresults showthe efficacy of the
methodsacross a widerange of qualitative metrics, such as
flexibility of the systemandability to deal with difficult environmentalconfigurations,and quantitativemetrics that measure performance,such as the number
of navigationalproblems
90
solvedsuccessfullyandthe optimalityof the paths foundfor
these problems.Details of the experimentscan be found in
Ram& Santamaria [1993].
4
Discussion
Continuouscase-basedreasoningis a variation of traditional
case-basedreasoningthat can be used to performcontinuous
tasks. Theunderlyingsteps in the methodare similar, namely,
problemcharacterization,caseretrieval, adaptationandexecution, and learning. The ACBARR
and SINSsystems perform
a kind of case-basedplanning, and in that respect are similar to CHEF[Hammond,
1989]. However,there are several
interesting differencesdueto the continuousnatureof the domain,andto the on-line nature of the performance
andlearning tasks. Ourapproachis also similar to Kopeikina,Brandau, &Lemmon’s
[1988]use of case-basedreasoningfor realtime control. Theirsystem,thoughnot intendedfor robotics,
is designedto handlethe special issues of time-constrained
processingand the needto represent cases that evolve over
time. Theysuggest a systemthat performsthe learning task
in batch modeduring off peak hours. In contrast, our approachcombines
the learningcapabilities of case-basedreasoning withthe on-line, real-timeaspectsof reactive control. In
this respect,our researchis also differentfromearlier attempts
to combinereactive control with other types of higher-level
reasoning systems(e.g., [Chien, Gervasio, &DeJong,1991;
Mitchell, 1990]), whichtypically require the systemto "stop
and think." In our systems,the perception-actiontask andthe
adaptation-learning
task are integratedin a tightly knit cycle,
similar to the "anytimelearning" approachof Grefenstette &
Ramsey[1992].
Ourapproachcombinescontinuousrepresentations, continuous performance,and continuouslearning into a single integrated framework.There are three basic assumptionsthat
underliethis approach.First, the interaction betweenthe reasoningsystemandthe environment
is causal andconsistent. By
causal wemeanthat the sameactions executedunderthe same
environmentalconditions wouldresult in the sameoutcomes
(or similar outcomes,but suchvariations are muchslowerthan
the execution cycle of the system). Byconsistent we mean
that small changesin the executedactions underthe sameenvironmentalconditions wouldresult in small changesin the
outcomes.This guaranteesthat the systemcan use past experience to guideperformance
in similar situations in the future
andhopeto obtain the sameresults fromits actions. Thesecond assumptionunderlyingour approachis that the system’s
experiencesare likely to be typical of future experiences,and
that the systemwill usually encounterproblemsituations that
are similar to thosethat it alreadyhas experiencewith. These
assumptionsare also common
to manytraditional case-based
reasoningsystems(see, for example,the "typicality assumption" of Ram[1993b]), although they are not alwaysstated
explicitly or in this exactmanner.
Thethird assumption
is, perhaps,uniqueto continuouscasebased reasoning. In our current work, we haveassumedthat
the problemdomaincan be represented quantitatively. This
is required by the semanticconcepts and operations in our
method.In particular, our methodrequires a formalandwelldefinedsimilarity metricto judgethe similarity of twosituations, whichis usedto determinethe direction andmagnitude
of the necessaryadaptations. Whilecase-basedreasoningin
non-continuousdomainsalso requires a similarity metric for
partial matching,mostsuchmetricsusedin existingsystemsare
not fine-grainedenoughto determinethe degreeof similarity
betweencontinuouscasesor to place situations that couldvary
continuously
andinfinitesimallyfromeachother on a similarity
scale. This assumptioncould be relaxed if adequatesymbolic
representationsandsimilarity metricscouldbe developed,but
moreresearchis neededinto this issue.
Ourapproachto continuouscase-basedreasoningintroduces
several innovationsto the basic case-basedreasoningparadigm.
Dueto the similarity in the assumptionsandmethods,weconjecture that manyof these innovationswouldbe useful in traditional case-basedreasoningsystemsas well. Let us discussthe
underlyingcase-basedreasoningissues in moredetail.
4.1 Continuous cases
Ourmethodsrepresent a novel approachto case-basedreasoning for a newkindof task, onethat requirescontinuous,on-line
performance.Thesystemmustcontinuouslyevaluate its performance,and continueto adaptthe solution or seek a newone
basedon the current environment.
Furthermore,this evaluation
mustbe doneusing the simpleperceptualfeatures available to
a reactivecontrol system,unlike the complex
thematicfeatures
or abstract worldmodelsusedto retrieve cases andadaptation
strategies in manycase-basedreasoningsystems. Case-based
reasoningin such task domainsrequires a continuousrepresentationof cases, in termsof the availablefeatures, that represent the time courseof these features over suitably chosen
time windows.Theserepresentations are learned andmodified
continuouslywhilesimultaneously
being usedto guideaction.
4.2 Abstract cases
In a traditional, symbolictask domain,a case representsan actual experienceor an abstraction of one. For example,cases
in CHEF
[Hammond,
1989] represent actual recipes created by
the program.In a continuousdomain,however,an actual experienceconsistsof the timehistories of real parametricvalues
of perceptualand control parameters.For example,a navigational experiencemightinvolvea starting location of (2.34 m,
1.23 m)ona two-dimensional
grid, anda destinationlocationof
(6.71 m, 10.98m)on that samegrid. Theexperiencemightconsist of the valuesof perceptualparameterssuchas the current
position of the robot (an x-y coordinatepair of real numbers),
numberof obstacles sensed in the immediatevicinity of the
robot(an integer),or the distancein metersto the nearestobstacle, andcontrol parameterssuchas the minimum
safe distance
of approachto an obstacle(in meters),the speed,of the robot
(in metersper second),or the current headingof the robot (in
radians). Theseparametersvary continuouslyover time; thus
the completeexperienceis representedby time graphsof each
of theseparameters.
Clearly,it is not feasibleto store the entire
time history of eachof these parametersfor each problemthat
2the robotsolves,nor is it usefulto doso.
ThusSINSlearns andstores someabstraction of the actual
experience.Onemightargue that CHEF
actually does the same,
since an actual cookingexperiencewouldinvolve perceptual
input (suchas lookingat the fryingpanandjudgingwhetherthe
ingredients havebrownedenough)and control output (such
moving
the spatula to stir the ingredientsin the pot). In CHEF,
the experienceshavealreadybeenabstracted by the programmer
and represented in symbolicform. Oneof the open issues
in our research is the automaticexWactionof such symbolic
ZHowever,
if the rangeof allowablevariationsof perceptualand
control parametersis bounded
by the nature of the tusk, sucha
"memory-based
approach"
maybe in fact be feasible[Atkeson,1990].
91
representationsfromthe actual continuousexperiencesof the
system(e.g., [Kuipers&Byun,1988]).
4.3 Virtual eases
Arelated, but different, problemwith continuoustask domains
is that an experiencecandiffer fromanotherin infinitely many
ways,anddifferencescan rangefromsignificant to infinitesimal. Onlya tiny fraction of all possibleexperienceswill actually be undergoneby the system. However,the powerof a
case-basedreasoningsystemcomesfromits cases, and so it
is desirableto havea representativelibrary of casesthat will
coverthe rangeof experiencesthe systemis likely to encounter.
Weintroduce the idea of a virtual case, whichrepresents a
representativeexperiencethat the systemcould well havehad
but mayor maynot actually have had. Rather than trying to
remember
all the details (downto the grain-size definedby the
programmer)
of eachexperience, or someabstraction of these
details, SINScombinespast cases and present experiencesto
create a virtual experience.This is similar to the abstraction
processdescribedearlier, with the differencethat a virtual case
doesnot representan abstract or generalizeddescriptionof an
actual experiencebut rather a virtual experiencederivedfrom
a combinationof several actual experiences.Thenotion of a
virtual case mightbe useful in symboliccase-basedreasoning
systemsas well. To take a simpleexample,AQUA
learns about
and updates existing cases based on newexperiences [Ram,
1993b].This processresults in hypothesesthat mayor maynot
be "true" of a single experience,but are useful andplausible.
If used in future reasoning,these hypothesescould be viewed
as virtual cases. SINS’svirtual cases are moresophisticated
since theyare continuouslyrefined throughuse; the refinement
algorithmsare also different, as discussedbelow.
4.4 Twotypes of behaviormodification
Asdiscussedearlier, our systemsuse case-basedreasoningto
suggestglobalmodifications(behaviorselection) as well as
suggest morelocal modifications(behavior adaptation). The
knowledge
requiredfor both kindsof suggestionsare stored in
a case, in contrast with traditional case-basedreasoningsystemsin whichcases are usedonly to suggestsolutions, anda
separatelibrary of adaptationrules is usedto adapta solutionto
fit the current problem.In manyproblemdomains,even noncontinuousones, planning(or other types of problem-solving)
cannot be performedseparately from plan execution. In such
situations, case-basedreasoningcan be usedto proposea plan
(or solution)as wellas continuously
refine it duringexecution.
Anotherdifference of interest is that cases in ACBARR
and
SINSproposemodifications,not directly of the plan or trajectory, but of the reactive planneritself whichthenresult in
modificationsto the proposedtrajectories.
4.5 Twotypes of adaptation
Traditional case-basedreasoningsystemsretrieve cases and
adapt the solutions proposedby those cases in order to provide newsolutions to newproblemsat hand. However,
in order
to build virtual cases, our systemsalso needto adaptthe cases
themselvesin response to newexperiences. This is similar
to the incrementalcase modificationprocess in AQUA
[Ram,
1993b]in that, in additionto usinga case to deal with a new
situation, the systemcan use aa experienceto learn moreabout
its existingcases.
In our problemdomain,cases represent environmental
regularities that havebeenidentifiedby the systemthroughits experience. Theseprovidethe predictive powernecessaryto navigate in future situations. Casesare usedfor behavioradaptation;
in standardcase-basedreasoningterms, this can be viewedas
tered worlds, worlds with box canyons, and so on, without
the processof using the recommendations
providedby the case any reconfiguration. In this paper, wehavefocussed on the
to _n~_nt the solutionscurrentlybeingpursuedbythe system.In
case-basedreasoningaspects of our work; robot control isaddition, cases themselvescan be adaptedthroughexperience; sues are discussed in [Ram,Arkin, Moorman,
& Clark, 1992;
this canbe viewedas a processof discoveryin whichthe system Ram& Santamaria, 1993].
developsa modelof the worldaroundit. Thesystem"explores"
the searchspaceas its case representationstraverse this space 5 Conclusions
andfind good"niches" representing regularities. Theformer Wehavepresented a novel methodfor augmentingthe perforprocesscouldbe viewedas a processof"solution adaptation," manceof a reactive control systemthat combinescase-based
andthe latter as oneof "casemodification."
reasoningfor on-line parameteradaptation andreinforcement
Theparticular methodof case modificationusedin our system learning for on-line case learning andadaptation. Themethod
is similar to that of Sutton[1990],whosesystemuses a trialis fully implemented
andhas beenevaluatedthroughextensive
and-error reinforcementlearning strategy to developa world simulations.
modeland to plan optimal routes using the evolving world
Thepowerof the methodderives fromits ability to capture
model.Unlikethis system, however,our systemdoes not need
common
environmentalconfigurations, and regularities in the
to be trained on the sameworldmanytimes, nor are the results
interaction
betweenthe environment
andthe system,throughan
of its learningspecificto a particularworld,initial location,or
on-line,
adaptive
process.
The
method
addsconsiderablyto the
destinationlocation. In general, wehypothesizethat case modperformance
and
flexibility
of
the
underlying
reactive control
itieation maybe useful in other types of case-basedreasoning
system
because
it
allows
the
system
to
select
and
utilize differsystemsas well, although different methodsmayneed to be
ent
behaviors
(i.e.,
different
sets
of
schema
parameter
values)
developedto performthe modifications. AQUA’s
incremenas
appropriate
for
the
particular
situation
at
hand.
SINS
can be
tal case modification,for example,can be viewedas such an
characterized
as
performing
a
kind
of
constructive
representaextensionto SWALE’s
solution adaptation[Schank,1986].
Differentcriteria mayalso needto be developed
for deciding tional changein whichit constructshigher-levelrepresentations
(cases) of system-environment
interactions fromlow-levelsenwhento modifya case to fit the newexperience,whento learn
sorimotor
representations
[Ram,
1993a].
a newease to represent the newexperience, and whento use
In
SINS,
the
perception-action
task and the adaptationthe case for solution adaptationbut withoutmodifying
it. Our
systemuses a relative similarity measureto identify potential learningtask are integratedin a tightly knit cycle, similar to
regularities. Intuitively, if casematchesthe currentsituation the "anytimelearning" approachof [Grefenstette & Ramsey,
1992]. Perceptionand action are required so that the system
betterthanprevious
situationsit wasusedin, it is likelythat the
can
exploreits environment
anddetect regularities; they also,
situationinvolvesthe veryregularitiesthat the caseis beginning
of course, formthe basis of the underlyingperformance
task,
to capture; thus, it is worthwhilemodifyingthe case in the
that of navigation. Adaptationand learning are required to
directionof the currentsituation. Alternatively,if the matchis
generalizethese regularities andprovidepredictivesuggestions
not quite as good,the case shouldnot be modifiedbecausethat
will take it awayfromthe regularityit wasconverging
towards. basedon prior experience. Both tasks occur simultaneously,
progressively improvingthe performanceof the systemwhile
Finally,if the currentsituationis a verybadfit to the case, it
allowingit to carry out its performance
task withoutneedingto
makesmoresense to create anewcase to represent what is
"stop
and
think."
probablya newclass of situations.
Thereare still several unresolvedissues in our research.
4.6 On-line real.time response
Whilewehave beenable to determineappropriate time winUnliketraditional case-basedreasoningsystemswhichrely on dowsfor SINSthroughsimulationstudies, the size or extent
deepreasoningand analysis (e.g., [Hammond,
1989]), and un- of the cases neededto represent extendedexperiencesin conlike other machinelearning augmentations
to reactive control tinuous domainsis still an openissue [Kolodner,1993]. Fursystemswhichfall backon non-reactivereasoning(e.g., [Chien, thermore,the retrieval processis very expensiveandlimits the
Gervasio, & DeJong,1991]), our methodallows the system number
of cases that the systemcan handlewithoutdeteriorating
to continueto performreactively with very little performance the overallnavigationalperformance,
leadingto a kindof utility
overheadas compared
to a"pure"reactive control system.Even problem[Minton,1988]. Ourcurrent solution to this problem
if real-timeresponseis not required,however,
continuouscase- is to place an upperboundon the numberof cases allowedin
basedreasoningcould still be used in problemdomainswhich the system.Abetter solution wouldbe to developa methodfor
are inherently continuousand require continuousrepresenta- organization of cases in memory;
however,conventionalmemtions.
ory organizationschemesused in case-basedreasoningsystems
4.7 Adaptivereactive control
(see [Kolodner,1993])assumestructured, nominalinformation
Ourresearchalso contributesto reactivecontrolfor autonomous rather than continuous,time-varying,analoginformationof the
kind usedin our cases.
robots in the followingways.One,weproposea methodfor the
use of assemblages
of behaviorstailored to particular environAnotheropenissue is that of the nature of the regularities
mentaldemands,rather than of single or multipleindependent capturedin the system’scases. WhileSINS’cases do enhance
behaviors. Two,our systemscan select and adapt these beits performance,
they are not easy to interpret. Interpretation
haviors dynamicallywithout relying on the user to manually is desirable, not only for the purposeof obtainingof a deeper
programthe correct behavioralparametersfor each navigation understanding
of the methods,but also for possibleintegration
problem.Three, the knowledgerequired for behaviorselecof higher-levelreasoningandlearningmethodsinto the system.
tion and modificationis automaticallyacquiredthroughexpe- For example,instead of guessinginitial schemaparametervalrience using multiple learning methods.Finally, our system ues or modifying
themincrementallythroughtrial anderror, an
exhibits considerableflexibility over multiple domains.For explanation-basedmoduleworkingon top of the case adaptaexample,it performswell in unclutteredworlds,highly cluttion modulecould providebetter suggestionsfor these values,
92
thus speeding up the search process of finding the best schema
parametervalues associated with a particular environmentsituation. This requires a symbolic understanding of the knowledge
represented in the system’s cases.
Despite these limitations, SINS is a complete and autonomousself-improving navigation system, which can interact
with its environmentwithout user input and without any preprogrammed"domain knowledge"other than that implicit in
its reactive control schemas.Asit performsits task, it builds
a library of experiences that help it enhanceits performance.
Since the systemis alwayslearning, it can cope with major environmentalchanges as well as line tune its navigation module
in static and specific environmentsituations.
Kuipers, BJ. &Byun, Y-T. [1988]. A robust, qualitative method
for robot spatial learning. In Proceedingsof the Seventh
National Conferenceon Artificial Intelligence, 774-779,St.
Paul, MN.
Maes, P. [1990]. Situated agents can have goals. Robotics and
AutonomousSystems, 6:49-70.
Minton, S. [1988]. Learning effective search control knowledge: An explanation-based
approach. PhD thesis,
Carnegie-MellonUniversity, ComputerScience Department,
Pittsburgh, PA. Technical Report CMU-CS-88-133.
Mitchell, T.M. [1990]. Becomingincreasingly reactive. In
Proceedingsof the Eighth National Conferenceon Artificial
Intelligence, 1051-1058, Boston, MA.
Moorman,
K. & Ram, A. [1992]. A case-based approach to
References
reactive control for autonomousrobots. In Proceedings of
Arkin, R.C. [1989]. Motor schema-based mobile robot navthe AAAI Fall Symposiumon Al for Real-World Autonomous
igation. The International Journal of Robotics Research,
Mobile Robots, Cambridge, MA.
8(4):92-112.
Mostow,J. &Bhatnagar, N. [1987]. FAILSAFE,--A
floor planAshley, K. &Rissland, E. [1987]. Compareand contrast, a test
ner that uses EBGto learn from its failures. In Proceedings
of expertise. In Proceedingsof the Sixth NationalConference
of the TenthInternational Joint Conferenceon Artificial Inon Artificial Intelligence, 273-284.
telligence, 249-255,Milan, Italy.
Atkeson, C.G. [1990]. Memory-basedlearning in intelligent
Payton, D. [1986]. Anarchitecture for reflexive autonomous
control systems. In Proceedings of the AmericanControl
vehicle control. In Proceedingsof the IEEEConferenceon
Conference, San Diego, CA.
Robotics and Automation, pp. 1838-1845.
Brooks, R. [1986]. A robust layered control system for a mo- Ram, A. & Santamaria, J.C. [1993]. A multistrategy casebile robot. IEEEJournal of Robotics and Automation, RAbased and reinforcement learning approachto self-improving
2(1): 14-23.
reactive control systems for autonomousrobotic navigation.
Chien, S.A., Gervasio, M.T., &DeJong, G.E [1991]. On beIn R.S. Michalski & O. Tecuci (eds.), Proceedings of the
Second International Workshopon Multistrategy Learning,
comingdecreasingly reactive: Learning to deliberate minireally. In Proceedingsof the Eighth International Workshop
Harpers Ferry, WV.
on MachineLearning, 288-292, Chicago, IL.
Ram, A., Arkin, R.C., Moorman,K., & Clark, RJ. [1992].
Case-based reactive navigation: A case-based methodfor
Clark, RJ., Arkin, R.C. &Ram, A. [1992]. Learning momenon-line selection and adaptation of reactive control parametum: On-line performance enhancement for reactive systers in autonomousrobotic systems. Technical Report GITtems. In Proceedings of the IEEEInternational Conference
CC-92/57,College of Computing,Georgia Institute of Techon Robotics and Automation, 111-116, Nice, France, 1992.
nology, Atlanta, GA.
Fikes, R.E., Hart, P.E. & Nilsson, NJ. [1972]. Learning and
Ram,
A. [1993a]. Creative conceptual change. In Proceedings
executing generalized robot plans. Artificial Intelligence,
of
the Fifteenth AnnualConferenceof the Cognitive Science
3:251-288.
Society, Boulder, CO.
Georgeff, M. [1987]. Planning. Annual Review of Computer
Ram,A. [1993b]. Indexing, elaboration & refinement: IncreScience, 2:359-400.
mental learning of explanatory cases. MachineLearning,
Grefenstette, J.J. &Ramsey,C.L. [1992]. An approach to any10:201-248,in press.
time learning. In D. Sleeman&P. Edwards(eds.), Machine
Sacerdoti,
E.D. [1977]. A structure for plans and behavior.
Learning: Proceedings of the Ninth International ConferElsevier,
NewYork.
ence, 189-195, Aberdeen, Scotland.
Schank, R.C. [1986]. Explanation Patterns: Understanding
Hammond,KJ. [1989]. Case-Based Planning: Viewing PlanMechanically and Creatively. LawrenceErlbaum, HiUsdale,
ning as a MemoryTask. Perspectives in Artificial IntelliNJ.
gence. AcademicPress, Boston, MA,1989.
Segre, A.M. [1988]. Machine Learning of Robot Assembly
Kaelbling, L. [1986]. Anarchitecture for intelligent reactive
Plans. Kluwer Academic, Norwell, MA.
systems. Technical Note 400, SRI International, October.
Sutton, R.S. [1990]. Integrated architectures for learning, planKolodner, J.L., Simpson,R.L., &Sycara, K. [1985]. A process
ning, and reacting based on approximating dynamic promodel of case-based reasoning in problem solving. In A.
gramming.In Proceedingsof the Seventh International ConJoshi, editor, Proceedingsof the Ninth International Joint
ference on MachineLearning, 216-224, Austin, TX.
Conferenceon Artificial lntelligence, 284-290,Los Angeles,
Sutton, R.S., [1992], editor. MachineLearning, 8(3/4), Special
CA.
issue on reinforcementlearning.
Koiodner, J.L. [1993]. Case-based reasoning. MorganKaufVeloso, M. & Carbonell, J.G. [1991]. Automatingcase generamann,San Mateo, CA, in press.
tion, storage, and retrieval in Prodigy.In R.S. Michalski&G.
Kopeikina, L., Brandau, R., & Lemmon,A. [1988]. CaseTecuci (eds.), Proceedingsof the First International Workbased reasoning for continuouscontrol. In Proceedingsof a
shop On Multistrategy Learning, 363-377, Harpers Ferry,
Workshopon Case-Based Reasoning, 250-259, Clearwater
WV.
Beach, FL.
93
Download