From: AAAI Technical Report WS-93-01. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved. Continuous Case-Based Reasoning Ashwin Ramand Juan Carlos Santamaria College of Computing GeorgiaInstitute of Technology Atlanta, Georgia30332-0280 {ashwin, carlos}@cc,gatech, edu Abstract Case-basedreasoning systems have traditionallybeen used to perform high-level reasoning in problemdomains that can be adequately described using discrete, symbolic representations. However,manyrealworld problem domains, such as autonomousrobotic navigation, are better characterized using continuous representations. Such problem domainsalso require continuous performance,such as continuous sensorimotor interaction with the environment, and continuous ~ptation and learning during the performance task. Weintroduce a new methodfor continuous casebased reasoning, and discuss howit can be applied to the dynamicselection, modification, and acquisition of robot behaviors in autonomousnavigation systems. Weconclude with a general discussion of case-based reasoning issues addressed by this work. 1 Introduction Case-based reasoning systems have traditionally been used to perform high-level reasoning in problem domainsthat can be adequately described using discrete, symbolicrepresentations. For example, CHEFuses case-based planning to create recipes [Hammond,1989], AQUA uses case-based explanation to understand newspaper stories [Ram, 1993b], HYPO uses casebased interpretation for legal argumentation[Ashley &Rissland, 1987], MEDIATOR uses case-based problem solving for dispute resolution [Kolodner, Simpson, & Sycara, 1985], and PRODIGY uses case-based reasoning in the form of derivational analogy for high-level robot planning [Veloso &Carbonell, 1991]. In our research, we have been investigating the problem of performanceand learning in continuous, real-time problemdomains, such as autonomousrobotic navigation. Our method, called continuous case-based reasoning shares manyof the fundamental assumptions of what might be called "discrete" case-based reasoning in symbolic problem domains} Learning is integrated with performance. Performanceis guided by previous experience. Newproblems are solved by retrieving cases and adapting them. Newcases are learned by evaluating proposed solutions and testing themon a real or simulated world. However,continuous problem domainsrequire different underlying representations and place additional constraints on 1Wedo not eschewsymbolicrepresentations; rather, the issue is the continuous,time-varyingnature of any proposedrepresentations, whethersymbolic,numeric,or otherwise. 86 the problem solving process. For example, consider the problem of driving a car on a highway.Car driving experiences can vary from one another in infinitely manyways. The speed of a car might be 55 mphin one experience and 54 mphin another. Withina given episode, the speed of the car might continuously vary, both infinitesimally from momentto moment,and significantly from, say, the highwayto an exit ramp. The problem solving and learning process must operate continuously; there is no time to stop and think, nor a logical point in the processat which to do so. Such problem domains are "continuous" in three senses. First, they require continuous representations. For example, a robotic navigation task requires representations of continuous perceptual and motorcontrol information. Second,they require continuous performance. For example, driving a car requires continuous action. Often, problem-solving performance is incremental of necessity because of limited knowledgeavailable to the reasoningsystemand/or becauseof the unpredictabilityof the environment;the systemcan at best execute the "best" shortterm actions available to it and then re-evaluate its progress. A robot, for example,maynot knowwhereobstacles lie until it actually encounters them. Third, these problemdomainsrequire continuous adaptation and learning. As the problems encountered becomemorevaried and difficult, it becomesnecessary to use fine-grained, detailed knowledgein an incremental manner to act, and to rely on continuousfeedbackfrom the environment to adapt actions and learn from experiences. Case-based reasoning in such problemdomainsrequires significant enhancementsto the basic case-based reasoning methods used in discrete, symbolicreasoning systems. Whenare two experiences different enoughto warrant consideration as independent cases? Whatis the scope of a single case? Is the entire car trip from one’s houseto the grocery store a single case that can be used to guide and improve one’s driving performance in future situations? Howshould "continuous cases" be represented? Howcan they be used to guide performance? Howare they learned and modified through experience? And howcan this performanceand learning be integrated into a continuous, on-line, real-time process? In this paper, we provide an answerto these questions based on our research into a robot navigation task. The proposed methodsare fully implementedin a computersystem which uses reactive control for its performanceelement, and case-based reasoning for continuous adaptation of the performanceelement and for continuouslearning through experience. Wediscuss the relevant technical details of the system, and we conclude with the contdbt~tions of our research and their implication for the design of case-based reasoning systems in general. 2 The robot navigation task Autonomous robotic navigation is defined as the task of finding a path along whicha robot can movesafely from a source point to a destination point in a obstacle-ridden terrain, and executing the actions to carry out the movementin a real or simulated world. Several methodshave been proposedfor this task, ranging from high-level planning methodsto reactive methods. High-level planning methods use extensive world knowledge about available actions and their consequencesto formulate a detailed plan before the actions are actually executed in the world (e.g., [Fikes, Hart, &Nilsson, 1972; Georgeff, 1987; Maes, 1990; Sacerdoti, 1977]). Considerablehigh-level knowledge is also neededto learn from planning experiences (e.g., [Hammond,1989; Minton, 1988; Mostow& Bhatnagar, 1987; Segre, 1988]). Situated or reactive control methods,in contrast, perform no planning; instead, a simple sensory representation of the environmentis used to select the next action that should be performed [Arkin, 1989; Brooks, 1986; Kaelbling, 1986; Payton, 1986]. Actions are represented as simple behaviors, whichcan be selected and executedrapidly, often in real-time. These methods can cope with unknownand dynamic environmentalconfigurations, but only those that lie within the scopeof predetermined behaviors. Furthermore, such methods cannot modify or improve their behaviors through experience, since they do not have any predictive capability that could account for future consequencesof their actions, nor a higher-level forrealism in which to represent and reason about the knowledge necessaryfor such analysis. Wehave developed a self-improving navigation system that uses reactive control for fast performance, augmentedwith a continuous case-based reasoning methodthat allow the system to adapt to novel environments and to learn from its experiences. The system autonomouslyand progressively constructs representational structures that aid the navigation task by supplying the predictive capability that standard reactive systems lack. The representations are constructed using a hybrid casebased and reinforcement learning method without extensive high-level reasoning. The system is very robust and can perform successfully in (and learn from)novel environments,yet comparesfavorably with traditional reactive methodsin terms of speed and performance. A further advantage of the method is that the system designers do not need to foresee and represent all the possibilities that might occur since the system develops its own"understanding"of the world and its actions. Throughexperience, the systemis’able to adapt to, and perform well in, a wide range of environmentswithout any user intervention or supervisory input. This is a primary characteristic that autonomousagents must have to interact with real-world environments. 3 Technicaldetails Our self-improvingrobotic navigation systemconsists of a navigation module,which uses schema-basedreactive control methods, and an on-line adaptation and learning module,whichuses case-based reasoning and reinforcement learning methods. The navigation moduleis responsible for movingthe robot through the environmentfrom the starting location to the desired goal location while avoidingobstacles along the way. The adaptation and learning modulehas two responsibilities. The adaptation sub-moduleperforms on-line adaptation of the reactive control parametersto get the best performancefrom the navigation module. The adaptation is based on recommendations from cases that capture and modelthe interaction of the systemwith 87 RobotAgent I~ Learning and l I .1 A~,~ I--I "°*" I[I later, I-- I1"-" ~ Environment Figure1: Systemarchitecture its environment.Withsuch a model, the system is able to predict future consequencesof its actions and act accordingly. The learning sub-modulemonitors the progress of the system and incrementally modifies the case representations throughexperience. Figure 1 showsthe functional architecture of the system. 3.1 Schema-basedreactive control The reactive control moduleis based on the AuRA architecture [Arkin, 1989], and consists of a set of motorschemasthat represent the individual motorbehaviors available to the system. Each schemareacts to sensory information from the environment, and producesa velocity vector representing the direction and speed at whichthe robot is to movegiven current environmental conditions. For example, the schema AVOID-STATICOBSTACLE directs the system to moveitself awayfrom detected obstacles, and the associated schemaparameter Obstacle-Gain determinesthe magnitudeof the repulsive potential field generated by the obstacles perceived by the system. Thevelocity vectors producedby all the schemasare then combinedto produce a potential field that directs the actual movement of the robot. Simple behaviors, such as wandering, obstacle avoidance, and goal following, can combineto produce complexemergent behaviors in a particular environment.Different emergentbehaviors can be obtained by modifying the simple behaviors. A detailed description of schema-basedreactive control methods can be found in Arkin [1989]. 3.2 Behavior selection and modification Our first attempt at building a case-based reactive navigation system focussed on the issue of using case-based reasoning to guide reactive control. A central issue here is the nature of the guidance: At what grain size should the reactive control module represent its behaviors, and what kind of "advice" should the case-based reasoning moduleprovide to the reactive control module? In order to achieve more robust robotic control, we advocate the use of sets of behaviors, called behavior assemblages,to represent appropriate collections of cooperating behaviors for complexenvironments, and of behavior adaptation to adapt and fine-tune existing behaviors dynamicallyin novel environments [Ram,Arkin, Moorman,&Clark, 1992]. There are two types of behavior adaptations that might be considered. Oneoption is to have the system modifyits current behavior based on immediate past experience. Whileuseful, this is only a local response to the problem.A moreglobal solution is to havethe systemselect completely new assemblagesof behaviors based on the current environmentin whichit finds itself. A robust systemshould be able to learn about and adapt to its environmentdynamicallyin both these ways. Figure2: Typicalnavigational behaviors of differenttuningsof the reactivecon~olmodule. Thefigureonthe left showsthe non-learning system withhighobstacleavoidance andlowgoalattraction. Onthe right, the learningsystem haslowered obstacleavoidance andincreased goalattraction,allowing it to "squeeze" through the obstaclesandthen takea relativelydirectpathto thegoal. the aboveexample,supposethat the robot is in a very cluttered environmentand is employinga conservative assemblageof motorbehaviors.It then breaksout of the obstaclesandenters a large openfield (analogousto movingfroma forested area into a meadow). If only local changeswereallowed,the robot wouldeventually adjust to the newenvironment.However,by allowinga global changeto take place, the systemneedsonly to realize that it is in a radicallynewenvironment andto select a newassemblage of motorbehaviors, onebetter suited to the newsurroundings.Interestingly, case-basedreasoningis used to realize this typeof modification as well. Assemblages of behaviorsare representedas cases, or standard scenarios known to the system,that can be used to guide performance in novelsituations. Asin a traditional case-based reasoningsystem,a case is used to proposea plan or a solution (here, a behaviorassemblage)to the problem(here, current environmentalconfiguration). However,our method differs fromthe traditional use of case-basedreasoningin an importantrespect. Acase in our systemis also usedto propose a set of behavioradaptations,rather than merelythe behaviors themselves. Thisallowsthe systemto use different strategies in different situations. For example,the systemmightuse a"cautious" strategy in a crowdedenvironment by graduallyslowing downandallowingitself to get closer to the surrounding obstacles. In order to permitthis, strategies suggestboundarieson behavioralparametersrather than precise values for these parameters. Casesare used both to suggestbehaviorassemblages as wellas to performdynamic (on-line) adaptationof the parameters of behaviorassemblages withinthe suggestedboundaries. Theknowledge required for both kinds of suggestionsis stored in a case, in contrastwithtraditional case-based reasoningsystemsin whichcases are used only to suggestsolutions, and a separatelibrary of adaptationrules is usedto adapta solution to fit the current problem.Furtherdetails can be foundin Ram, Arkin, Moorman & Clark [1992]. 3.3 Caserepresentation WhileACBARR demonstratedthe feasibility of on-line, casebasedreasoningin systemsrequiring continuous,real-time response,it relied on a fixed library of cases that werehandcoded into the system. The system could adapt to novel environments--an importantkind of learning--but it could not improveits ownadaptationbehaviorthroughexperience.Since the knowledge required for behavioradaptation is stored in cases, weturnedour attention to the problemof learningcases through experience. WebuiltSINS (Self-ImprovingNavigation System),whichis similar to the ACBARR systembut can learn and modifyits owncases throughexperience.Therepresentation of the casesin SINSis different andis designedto support learning, but the underlyingideas behindthe twosystemsare very similar. Sincewedid not wantto rely on hand-coded,high-level domainknowledge, the representationsusedby SINSto modelits interaction withthe environment are initially under-constrained andgenetic;they containvery little usefulinformationfor the navigationtask. As the systeminteracts with the environment, the learningmodulegraduallymodifiesthe content of the representations until they becomeuseful andprovidereliable informationfor adaptingthe navigationsystemto the particular environmentat hand. Thelearning and reactive modulesfunction in an integrated manner.Thelearning moduleis alwaystrying to find a better modelof the interaction of the systemwith its environment so that it can tune the reactive moduleto performits function Differentcombinations of schemaparametersproducedifferent behaviorsto be exhibitedby the system(see figure 2). This allowsthe systemto interact successfullyin different environmentalconfigurationsrequitingdifferent navigational"strategies" [Clark, Arldn, & Ram,1992; Moorman & Ram,1992]. Traditionally, parametersare fixed and determinedahead of time by the systemdesigner. However,on-line selection and modificationof the appropriateparametersbasedon the current environmentcan enhancenavigational performance.Wetested this idea by building the ACBARR (A Case-BAsedReactive Robotic)systemand evaluatingit qualitatively andquantitatively throughextensivesimulationstudies usinga variety of different environments andseveral different performance mettics (see [Ram,Arkin, Moorman, &Clark, 1992] for details). Theexperimentsshowthat ACBARR is very robust, performing well in novelenvironments. Additionally,it is able to navigate throughseveral "hard" environments,such as box canyons,in whichtraditional reactive systemswouldperformpoorly. In the ACBARR system, we incorporated both behavior adaptationandbehaviorswitchinginto a reactivecontrol framework.At the local level, this is accomplished by allowingthe systemto ~pt its current behaviorin order to build "momentum".If somethingis workingwell, the systemcontinuesdoing it andtries doingit a little bit harder;conversely,if thingsare not proceedingwell, the systemattemptssomething a little different. Thistechniqueallowsthe systemto fine tuneits current behaviorpatterns to the exact environment in whichit finds itself. For example,if the robot has beenin an openarea for a period of time andhas not encountered anyobstacles, it picks up speedanddoesnot worryas muchaboutobstacles. If, on the otherhand,it is in a clutteredarea, it lowersits speedandtreats obstacles moreseriously. For behavior-based reactive systems, this translates into altering the schemagains and parameters continuously,providedthe systemhas a methodfor determining the appropriate modifications. ACBARR uses case-basedreasoningto retrieve behaviormodificationrules. Theserules are then used incrementallyto alter the gain andparametervalues basedon current environmental conditionsandpast successes. Theother methodfor behavior modification in ACBARR is at a moreglobal level. If the systemis currentlyacting under the control of an assemblage of behaviorswhichare no longer suited to the current environment,it selects a newassemblage based on whatthe environmentis nowlike. Continuingwith 88 better. Thereactive moduleprovidesfeedbackto the learning moduleso it can build a better modelof this interaction. The behaviorof the systemis thenthe result of an equilibriumpoint establishedby the learningmodule whichis trying to refine the modeland the environmentwhichis complexand dynamicin nature. This equilibriummayshift andneedto be re-established if the environment changesdrastically; however,the modelis geneticenoughat anypoint to be able to deal with a very wide range of environments. Thereactive modulein SINScan be adaptedto exhibit many different behaviors. SINSimprovesits performance by learning howandwhento tune the reactive module.In this way,the systemcan use the appropriate behaviorin each environmental configurationencountered.Thelearning module,therefore, Ca$o P~toomam mustlearn about anddiscriminate betweendifferent environOon~alkm P, apt~mt~10n ments,andassociatewith eachthe appropriate~aptationsto be performedon the motorschemas.This requires a representational schemeto modelthe interaction betweenthe systemand the environment.However, to ensurethat the systemdoes not get boggeddownin extensivehigh-level reasoning,the knowledgerepresentedin the modelmustbe basedon perceptualand motorinformation easily availableat the reactivelevel. Figure 3: Samplerepresentations showingthe time history of analog SINSuses a modelconsisting of associations betweenthe sensoryinputs (e.g., Obstacle-Density) andschemaparameters values representing perceived inputs and schemaparameters. Each graph in the case (below) is matchedagainst the corresponding graph values(e.g., Obstacle-Gain,associatedwith the AVOID-STATICin the current environment(above) to determinethe best match, after OBSTACLE schema).Eachset of associations is representedas whichthe remainingpart of the case is used to guide navigation (shown a case. Sensoryinputs providesinformationabout the config- as dashedlines). uration of the environment,and is obtainedfromthe system’s sensors. Schema parameterinformationspecifies howto adapt the reactive modulein the environments to whichthe case is accordingly, while simultaneouslyupdating its owncases to applicable.Eachtype of informationis representedas a vector reflect the observedresults of the system’sactions in various of analog values. Eachanalog value correspondsto a quan- situations. titative variable (a sensoryinput or a schemaparameter)at Themethodis based on a combinationof ideas from casespecifictime. Avectorrepresentsthe trend or recenthistoryof a based reasoningand learning, whichdeals with the issue of variable. Acase modelsan association betweensensoryinputs using past experiencesto deal with andlearn fromnovelsituand schemaparametersby groupingtheir respective vectors ations, and fromreinforcementlearning, whichdeals with the together. Figure3 showan exampleof this representation. basedon Thisrepresentationhas three essential properties.First, the issue of updatingthe content of system’sknowledge feedback from the environment (see [Sutton, 1992]). However, representationis capableof capturinga widerangeof possible in traditional case-basedplanningsystems(e.g., [Hammond, associations betweenof sensory inputs and schemaparame- 1989])learningandadaptationrequires a detailed modelof the ters. Second,it permitscontinuousprogressiverefinementof domain.This is exactlywhatreactive planningsystemsare trythe associations.Finally, the representationcapturestrendsor ing to avoid. Earlier attemptsto combine reactivecontrol with patterns of input andoutputvalues over time. This allowsthe classical planningsystems(e.g., [Chien,Gervasio,& DeJong, systemto detect patterns over larger time windows rather than 1991])or explanation-based learningsystems(e.g., [Mitchell, havingto makea decision basedonly on instantaneousvalues 1990]) also relied on deep reasoningand weretypically too of perceptualinputs. slowfor the fast, reflexivebehaviorrequiredin reactivecontrol 3.4 Case learning systems. Unlikethese approaches,our methoddoes not fall back on slownon-reactivetechniquesfor improvingreactive Thecase-basedreasoningand learning modulecreates, mainrains andappliesthe caserepresentations usedfor on-lineadap- control. tation of the reactivemodule.Themainobjectiveof the learning Eachcase represents an observedregularity betweena parmethodis to construct a modelof the continuoussensorimotor ticular environmental configurationandthe effects of different interactionof the systemwithits environment, that is, a mapping actions, and prescribes the values of the schemaparameters fromsensoryinputs to appropriatebehavioral(schema)param- that are mostappropriate (as far as the systemknowsbased eters. This modelallowsthe adaptationmoduleto continuously on its previousexperience)for that environment.Thelearncontrol the behaviorof the navigationmoduleby selecting and ing moduleperformsthe followingtasks in a cyclic manner: adaptingschemaparametersin different environments. Tolearn (1) perceiveandrepresentthe currentenvironment; (2) retrieve a mapping in this context is to detect anddiscriminateamong a case whosesensory input vector represents an environment different environment configurations,and to identify the ap- mostsimilar to the current environment; (3) adapt the schema propriate schemaparametervalues to be used by the reactive parametervalues in use by the reactive control moduleby inmodule,in a dynamicand on-line manner.This meansthat, stalling the values recommended by schemaparametervectors as the systemis navigating, the learning moduleis perceiv- of the case; and(4) learn newassociationsand/oradaptexisting ing the environment,detecting an environmentconfiguration, associationsrepresentedin the caseto reflect anynewinformaand modifyingthe schemaparametersof the reactive module tion gainedthroughthe use of the case in the newsituation to 89 enhance the reliability of their predictions. Theperceivestep builds a set of vectors representingthe sensoryinput, whichare then matchedagainst the corresponding vectorsof the cases in the system’smemory in the retrieve step. Thecase similarity metric is basedon the meansquared difference betweeneach of the vector values of the case over a trending window,and the vector values of the environment. Thebest matchwindow is calculated usinga reverse sweepover the timeaxis similarto a convolution processto findthe relative positionthat matchesbest. Thebest matchingcase is handedto the adaptstep, whichselects the schemaparametervalues from the ease andmodifiesthe corresponding valuesof the reactive behaviorscurrently in use usinga reinforcementformulawhich uses the case similarity metric as a scalar reward. Thusthe actual adaptationsperformeddependon the goodnessof match betweenthe case and the environment. Finally,the learnstep usesstatistical information aboutprior applicationsof the case to determinewhetherinformationfrom the current application of the case shouldbe used to modify this case, or whethera newcase shouldbe created. Thevectors encodedin the cases are adaptedusing a reinforcementformula in whicha relative similaritymeasure is usedas a scalar reward or reinforcement signal. Therelative similarity measurequantifies howsimilar the current environment configurationis to the environment configurationencodedby the case relative to howsimilar the environment has beenin previousutilizations of the case. Intuitively, if case matchesthe current situation better thanprevioussituationsit wasusedin, it is likelythat the situation involvesthe veryregularities that the case is beginning to capture;thus, it is worthwhile modifying the casein the directionof the currentsituation. Alternatively,if the matchis not quite as good,the case shouldnot be modifiedbecausethat will take it awayfromthe regularityit wasconvergingtowards. Finally,if the currentsituationis a verybadfit to the case, it makesmoresense to create a newcase to represent whatis probablya newclass of situations. Adetailed descriptionof eachstep wouldrequire morespace than is available in this paper (see Ram&Santamaria[1993] for details). Here,wenote that since the reinforcement formula is basedon a relative similarity measure,the overall effect of the learningprocessis to causethe cases to convergeon stable associations betweenenvironmentconfigurations and schema parameters.Stable associations represent regularities in the worldthat havebeenidentified by the systemthroughexperience, and providethe predictive powernecessaryto navigate in future situations. Theassumption behindthis methodis that the interaction betweenthe systemandthe environment can be characterizedby a finite set of causalpatterns or associations betweenthe sensory inputs and the actions performedby the system. The methodallows the system to learn these causal patterns andto use themto modifyits actions by updatingits schemaparametersas appropriate. Themethodspresentedabovehavebeenevaluatedusing extensivesimulationsacrossa variety of different typesof environment,performance criteria, and systemconfigurations. We measuredthe qualitative and quantitative improvement in the navigationperformance of the SINSsystem,and systematically evaluatedthe effects of various designdecisionson the performaneeof the system.Theresults showthe efficacy of the methodsacross a widerange of qualitative metrics, such as flexibility of the systemandability to deal with difficult environmentalconfigurations,and quantitativemetrics that measure performance,such as the number of navigationalproblems 90 solvedsuccessfullyandthe optimalityof the paths foundfor these problems.Details of the experimentscan be found in Ram& Santamaria [1993]. 4 Discussion Continuouscase-basedreasoningis a variation of traditional case-basedreasoningthat can be used to performcontinuous tasks. Theunderlyingsteps in the methodare similar, namely, problemcharacterization,caseretrieval, adaptationandexecution, and learning. The ACBARR and SINSsystems perform a kind of case-basedplanning, and in that respect are similar to CHEF[Hammond, 1989]. However,there are several interesting differencesdueto the continuousnatureof the domain,andto the on-line nature of the performance andlearning tasks. Ourapproachis also similar to Kopeikina,Brandau, &Lemmon’s [1988]use of case-basedreasoningfor realtime control. Theirsystem,thoughnot intendedfor robotics, is designedto handlethe special issues of time-constrained processingand the needto represent cases that evolve over time. Theysuggest a systemthat performsthe learning task in batch modeduring off peak hours. In contrast, our approachcombines the learningcapabilities of case-basedreasoning withthe on-line, real-timeaspectsof reactive control. In this respect,our researchis also differentfromearlier attempts to combinereactive control with other types of higher-level reasoning systems(e.g., [Chien, Gervasio, &DeJong,1991; Mitchell, 1990]), whichtypically require the systemto "stop and think." In our systems,the perception-actiontask andthe adaptation-learning task are integratedin a tightly knit cycle, similar to the "anytimelearning" approachof Grefenstette & Ramsey[1992]. Ourapproachcombinescontinuousrepresentations, continuous performance,and continuouslearning into a single integrated framework.There are three basic assumptionsthat underliethis approach.First, the interaction betweenthe reasoningsystemandthe environment is causal andconsistent. By causal wemeanthat the sameactions executedunderthe same environmentalconditions wouldresult in the sameoutcomes (or similar outcomes,but suchvariations are muchslowerthan the execution cycle of the system). Byconsistent we mean that small changesin the executedactions underthe sameenvironmentalconditions wouldresult in small changesin the outcomes.This guaranteesthat the systemcan use past experience to guideperformance in similar situations in the future andhopeto obtain the sameresults fromits actions. Thesecond assumptionunderlyingour approachis that the system’s experiencesare likely to be typical of future experiences,and that the systemwill usually encounterproblemsituations that are similar to thosethat it alreadyhas experiencewith. These assumptionsare also common to manytraditional case-based reasoningsystems(see, for example,the "typicality assumption" of Ram[1993b]), although they are not alwaysstated explicitly or in this exactmanner. Thethird assumption is, perhaps,uniqueto continuouscasebased reasoning. In our current work, we haveassumedthat the problemdomaincan be represented quantitatively. This is required by the semanticconcepts and operations in our method.In particular, our methodrequires a formalandwelldefinedsimilarity metricto judgethe similarity of twosituations, whichis usedto determinethe direction andmagnitude of the necessaryadaptations. Whilecase-basedreasoningin non-continuousdomainsalso requires a similarity metric for partial matching,mostsuchmetricsusedin existingsystemsare not fine-grainedenoughto determinethe degreeof similarity betweencontinuouscasesor to place situations that couldvary continuously andinfinitesimallyfromeachother on a similarity scale. This assumptioncould be relaxed if adequatesymbolic representationsandsimilarity metricscouldbe developed,but moreresearchis neededinto this issue. Ourapproachto continuouscase-basedreasoningintroduces several innovationsto the basic case-basedreasoningparadigm. Dueto the similarity in the assumptionsandmethods,weconjecture that manyof these innovationswouldbe useful in traditional case-basedreasoningsystemsas well. Let us discussthe underlyingcase-basedreasoningissues in moredetail. 4.1 Continuous cases Ourmethodsrepresent a novel approachto case-basedreasoning for a newkindof task, onethat requirescontinuous,on-line performance.Thesystemmustcontinuouslyevaluate its performance,and continueto adaptthe solution or seek a newone basedon the current environment. Furthermore,this evaluation mustbe doneusing the simpleperceptualfeatures available to a reactivecontrol system,unlike the complex thematicfeatures or abstract worldmodelsusedto retrieve cases andadaptation strategies in manycase-basedreasoningsystems. Case-based reasoningin such task domainsrequires a continuousrepresentationof cases, in termsof the availablefeatures, that represent the time courseof these features over suitably chosen time windows.Theserepresentations are learned andmodified continuouslywhilesimultaneously being usedto guideaction. 4.2 Abstract cases In a traditional, symbolictask domain,a case representsan actual experienceor an abstraction of one. For example,cases in CHEF [Hammond, 1989] represent actual recipes created by the program.In a continuousdomain,however,an actual experienceconsistsof the timehistories of real parametricvalues of perceptualand control parameters.For example,a navigational experiencemightinvolvea starting location of (2.34 m, 1.23 m)ona two-dimensional grid, anda destinationlocationof (6.71 m, 10.98m)on that samegrid. Theexperiencemightconsist of the valuesof perceptualparameterssuchas the current position of the robot (an x-y coordinatepair of real numbers), numberof obstacles sensed in the immediatevicinity of the robot(an integer),or the distancein metersto the nearestobstacle, andcontrol parameterssuchas the minimum safe distance of approachto an obstacle(in meters),the speed,of the robot (in metersper second),or the current headingof the robot (in radians). Theseparametersvary continuouslyover time; thus the completeexperienceis representedby time graphsof each of theseparameters. Clearly,it is not feasibleto store the entire time history of eachof these parametersfor each problemthat 2the robotsolves,nor is it usefulto doso. ThusSINSlearns andstores someabstraction of the actual experience.Onemightargue that CHEF actually does the same, since an actual cookingexperiencewouldinvolve perceptual input (suchas lookingat the fryingpanandjudgingwhetherthe ingredients havebrownedenough)and control output (such moving the spatula to stir the ingredientsin the pot). In CHEF, the experienceshavealreadybeenabstracted by the programmer and represented in symbolicform. Oneof the open issues in our research is the automaticexWactionof such symbolic ZHowever, if the rangeof allowablevariationsof perceptualand control parametersis bounded by the nature of the tusk, sucha "memory-based approach" maybe in fact be feasible[Atkeson,1990]. 91 representationsfromthe actual continuousexperiencesof the system(e.g., [Kuipers&Byun,1988]). 4.3 Virtual eases Arelated, but different, problemwith continuoustask domains is that an experiencecandiffer fromanotherin infinitely many ways,anddifferencescan rangefromsignificant to infinitesimal. Onlya tiny fraction of all possibleexperienceswill actually be undergoneby the system. However,the powerof a case-basedreasoningsystemcomesfromits cases, and so it is desirableto havea representativelibrary of casesthat will coverthe rangeof experiencesthe systemis likely to encounter. Weintroduce the idea of a virtual case, whichrepresents a representativeexperiencethat the systemcould well havehad but mayor maynot actually have had. Rather than trying to remember all the details (downto the grain-size definedby the programmer) of eachexperience, or someabstraction of these details, SINScombinespast cases and present experiencesto create a virtual experience.This is similar to the abstraction processdescribedearlier, with the differencethat a virtual case doesnot representan abstract or generalizeddescriptionof an actual experiencebut rather a virtual experiencederivedfrom a combinationof several actual experiences.Thenotion of a virtual case mightbe useful in symboliccase-basedreasoning systemsas well. To take a simpleexample,AQUA learns about and updates existing cases based on newexperiences [Ram, 1993b].This processresults in hypothesesthat mayor maynot be "true" of a single experience,but are useful andplausible. If used in future reasoning,these hypothesescould be viewed as virtual cases. SINS’svirtual cases are moresophisticated since theyare continuouslyrefined throughuse; the refinement algorithmsare also different, as discussedbelow. 4.4 Twotypes of behaviormodification Asdiscussedearlier, our systemsuse case-basedreasoningto suggestglobalmodifications(behaviorselection) as well as suggest morelocal modifications(behavior adaptation). The knowledge requiredfor both kindsof suggestionsare stored in a case, in contrast with traditional case-basedreasoningsystemsin whichcases are usedonly to suggestsolutions, anda separatelibrary of adaptationrules is usedto adapta solutionto fit the current problem.In manyproblemdomains,even noncontinuousones, planning(or other types of problem-solving) cannot be performedseparately from plan execution. In such situations, case-basedreasoningcan be usedto proposea plan (or solution)as wellas continuously refine it duringexecution. Anotherdifference of interest is that cases in ACBARR and SINSproposemodifications,not directly of the plan or trajectory, but of the reactive planneritself whichthenresult in modificationsto the proposedtrajectories. 4.5 Twotypes of adaptation Traditional case-basedreasoningsystemsretrieve cases and adapt the solutions proposedby those cases in order to provide newsolutions to newproblemsat hand. However, in order to build virtual cases, our systemsalso needto adaptthe cases themselvesin response to newexperiences. This is similar to the incrementalcase modificationprocess in AQUA [Ram, 1993b]in that, in additionto usinga case to deal with a new situation, the systemcan use aa experienceto learn moreabout its existingcases. In our problemdomain,cases represent environmental regularities that havebeenidentifiedby the systemthroughits experience. Theseprovidethe predictive powernecessaryto navigate in future situations. Casesare usedfor behavioradaptation; in standardcase-basedreasoningterms, this can be viewedas tered worlds, worlds with box canyons, and so on, without the processof using the recommendations providedby the case any reconfiguration. In this paper, wehavefocussed on the to _n~_nt the solutionscurrentlybeingpursuedbythe system.In case-basedreasoningaspects of our work; robot control isaddition, cases themselvescan be adaptedthroughexperience; sues are discussed in [Ram,Arkin, Moorman, & Clark, 1992; this canbe viewedas a processof discoveryin whichthe system Ram& Santamaria, 1993]. developsa modelof the worldaroundit. Thesystem"explores" the searchspaceas its case representationstraverse this space 5 Conclusions andfind good"niches" representing regularities. Theformer Wehavepresented a novel methodfor augmentingthe perforprocesscouldbe viewedas a processof"solution adaptation," manceof a reactive control systemthat combinescase-based andthe latter as oneof "casemodification." reasoningfor on-line parameteradaptation andreinforcement Theparticular methodof case modificationusedin our system learning for on-line case learning andadaptation. Themethod is similar to that of Sutton[1990],whosesystemuses a trialis fully implemented andhas beenevaluatedthroughextensive and-error reinforcementlearning strategy to developa world simulations. modeland to plan optimal routes using the evolving world Thepowerof the methodderives fromits ability to capture model.Unlikethis system, however,our systemdoes not need common environmentalconfigurations, and regularities in the to be trained on the sameworldmanytimes, nor are the results interaction betweenthe environment andthe system,throughan of its learningspecificto a particularworld,initial location,or on-line, adaptive process. The method addsconsiderablyto the destinationlocation. In general, wehypothesizethat case modperformance and flexibility of the underlying reactive control itieation maybe useful in other types of case-basedreasoning system because it allows the system to select and utilize differsystemsas well, although different methodsmayneed to be ent behaviors (i.e., different sets of schema parameter values) developedto performthe modifications. AQUA’s incremenas appropriate for the particular situation at hand. SINS can be tal case modification,for example,can be viewedas such an characterized as performing a kind of constructive representaextensionto SWALE’s solution adaptation[Schank,1986]. Differentcriteria mayalso needto be developed for deciding tional changein whichit constructshigher-levelrepresentations (cases) of system-environment interactions fromlow-levelsenwhento modifya case to fit the newexperience,whento learn sorimotor representations [Ram, 1993a]. a newease to represent the newexperience, and whento use In SINS, the perception-action task and the adaptationthe case for solution adaptationbut withoutmodifying it. Our systemuses a relative similarity measureto identify potential learningtask are integratedin a tightly knit cycle, similar to regularities. Intuitively, if casematchesthe currentsituation the "anytimelearning" approachof [Grefenstette & Ramsey, 1992]. Perceptionand action are required so that the system betterthanprevious situationsit wasusedin, it is likelythat the can exploreits environment anddetect regularities; they also, situationinvolvesthe veryregularitiesthat the caseis beginning of course, formthe basis of the underlyingperformance task, to capture; thus, it is worthwhilemodifyingthe case in the that of navigation. Adaptationand learning are required to directionof the currentsituation. Alternatively,if the matchis generalizethese regularities andprovidepredictivesuggestions not quite as good,the case shouldnot be modifiedbecausethat will take it awayfromthe regularityit wasconverging towards. basedon prior experience. Both tasks occur simultaneously, progressively improvingthe performanceof the systemwhile Finally,if the currentsituationis a verybadfit to the case, it allowingit to carry out its performance task withoutneedingto makesmoresense to create anewcase to represent what is "stop and think." probablya newclass of situations. Thereare still several unresolvedissues in our research. 4.6 On-line real.time response Whilewehave beenable to determineappropriate time winUnliketraditional case-basedreasoningsystemswhichrely on dowsfor SINSthroughsimulationstudies, the size or extent deepreasoningand analysis (e.g., [Hammond, 1989]), and un- of the cases neededto represent extendedexperiencesin conlike other machinelearning augmentations to reactive control tinuous domainsis still an openissue [Kolodner,1993]. Fursystemswhichfall backon non-reactivereasoning(e.g., [Chien, thermore,the retrieval processis very expensiveandlimits the Gervasio, & DeJong,1991]), our methodallows the system number of cases that the systemcan handlewithoutdeteriorating to continueto performreactively with very little performance the overallnavigationalperformance, leadingto a kindof utility overheadas compared to a"pure"reactive control system.Even problem[Minton,1988]. Ourcurrent solution to this problem if real-timeresponseis not required,however, continuouscase- is to place an upperboundon the numberof cases allowedin basedreasoningcould still be used in problemdomainswhich the system.Abetter solution wouldbe to developa methodfor are inherently continuousand require continuousrepresenta- organization of cases in memory; however,conventionalmemtions. ory organizationschemesused in case-basedreasoningsystems 4.7 Adaptivereactive control (see [Kolodner,1993])assumestructured, nominalinformation Ourresearchalso contributesto reactivecontrolfor autonomous rather than continuous,time-varying,analoginformationof the kind usedin our cases. robots in the followingways.One,weproposea methodfor the use of assemblages of behaviorstailored to particular environAnotheropenissue is that of the nature of the regularities mentaldemands,rather than of single or multipleindependent capturedin the system’scases. WhileSINS’cases do enhance behaviors. Two,our systemscan select and adapt these beits performance, they are not easy to interpret. Interpretation haviors dynamicallywithout relying on the user to manually is desirable, not only for the purposeof obtainingof a deeper programthe correct behavioralparametersfor each navigation understanding of the methods,but also for possibleintegration problem.Three, the knowledgerequired for behaviorselecof higher-levelreasoningandlearningmethodsinto the system. tion and modificationis automaticallyacquiredthroughexpe- For example,instead of guessinginitial schemaparametervalrience using multiple learning methods.Finally, our system ues or modifying themincrementallythroughtrial anderror, an exhibits considerableflexibility over multiple domains.For explanation-basedmoduleworkingon top of the case adaptaexample,it performswell in unclutteredworlds,highly cluttion modulecould providebetter suggestionsfor these values, 92 thus speeding up the search process of finding the best schema parametervalues associated with a particular environmentsituation. This requires a symbolic understanding of the knowledge represented in the system’s cases. Despite these limitations, SINS is a complete and autonomousself-improving navigation system, which can interact with its environmentwithout user input and without any preprogrammed"domain knowledge"other than that implicit in its reactive control schemas.Asit performsits task, it builds a library of experiences that help it enhanceits performance. Since the systemis alwayslearning, it can cope with major environmentalchanges as well as line tune its navigation module in static and specific environmentsituations. Kuipers, BJ. &Byun, Y-T. [1988]. A robust, qualitative method for robot spatial learning. In Proceedingsof the Seventh National Conferenceon Artificial Intelligence, 774-779,St. Paul, MN. Maes, P. [1990]. Situated agents can have goals. Robotics and AutonomousSystems, 6:49-70. Minton, S. [1988]. Learning effective search control knowledge: An explanation-based approach. PhD thesis, Carnegie-MellonUniversity, ComputerScience Department, Pittsburgh, PA. Technical Report CMU-CS-88-133. Mitchell, T.M. [1990]. Becomingincreasingly reactive. In Proceedingsof the Eighth National Conferenceon Artificial Intelligence, 1051-1058, Boston, MA. Moorman, K. & Ram, A. [1992]. A case-based approach to References reactive control for autonomousrobots. In Proceedings of Arkin, R.C. [1989]. Motor schema-based mobile robot navthe AAAI Fall Symposiumon Al for Real-World Autonomous igation. The International Journal of Robotics Research, Mobile Robots, Cambridge, MA. 8(4):92-112. Mostow,J. &Bhatnagar, N. [1987]. FAILSAFE,--A floor planAshley, K. &Rissland, E. [1987]. Compareand contrast, a test ner that uses EBGto learn from its failures. In Proceedings of expertise. In Proceedingsof the Sixth NationalConference of the TenthInternational Joint Conferenceon Artificial Inon Artificial Intelligence, 273-284. telligence, 249-255,Milan, Italy. Atkeson, C.G. [1990]. Memory-basedlearning in intelligent Payton, D. [1986]. Anarchitecture for reflexive autonomous control systems. In Proceedings of the AmericanControl vehicle control. In Proceedingsof the IEEEConferenceon Conference, San Diego, CA. Robotics and Automation, pp. 1838-1845. Brooks, R. [1986]. A robust layered control system for a mo- Ram, A. & Santamaria, J.C. [1993]. A multistrategy casebile robot. IEEEJournal of Robotics and Automation, RAbased and reinforcement learning approachto self-improving 2(1): 14-23. reactive control systems for autonomousrobotic navigation. Chien, S.A., Gervasio, M.T., &DeJong, G.E [1991]. On beIn R.S. Michalski & O. Tecuci (eds.), Proceedings of the Second International Workshopon Multistrategy Learning, comingdecreasingly reactive: Learning to deliberate minireally. In Proceedingsof the Eighth International Workshop Harpers Ferry, WV. on MachineLearning, 288-292, Chicago, IL. Ram, A., Arkin, R.C., Moorman,K., & Clark, RJ. [1992]. Case-based reactive navigation: A case-based methodfor Clark, RJ., Arkin, R.C. &Ram, A. [1992]. Learning momenon-line selection and adaptation of reactive control parametum: On-line performance enhancement for reactive systers in autonomousrobotic systems. Technical Report GITtems. In Proceedings of the IEEEInternational Conference CC-92/57,College of Computing,Georgia Institute of Techon Robotics and Automation, 111-116, Nice, France, 1992. nology, Atlanta, GA. Fikes, R.E., Hart, P.E. & Nilsson, NJ. [1972]. Learning and Ram, A. [1993a]. Creative conceptual change. In Proceedings executing generalized robot plans. Artificial Intelligence, of the Fifteenth AnnualConferenceof the Cognitive Science 3:251-288. Society, Boulder, CO. Georgeff, M. [1987]. Planning. Annual Review of Computer Ram,A. [1993b]. Indexing, elaboration & refinement: IncreScience, 2:359-400. mental learning of explanatory cases. MachineLearning, Grefenstette, J.J. &Ramsey,C.L. [1992]. An approach to any10:201-248,in press. time learning. In D. Sleeman&P. Edwards(eds.), Machine Sacerdoti, E.D. [1977]. A structure for plans and behavior. Learning: Proceedings of the Ninth International ConferElsevier, NewYork. ence, 189-195, Aberdeen, Scotland. Schank, R.C. [1986]. Explanation Patterns: Understanding Hammond,KJ. [1989]. Case-Based Planning: Viewing PlanMechanically and Creatively. LawrenceErlbaum, HiUsdale, ning as a MemoryTask. Perspectives in Artificial IntelliNJ. gence. AcademicPress, Boston, MA,1989. Segre, A.M. [1988]. Machine Learning of Robot Assembly Kaelbling, L. [1986]. Anarchitecture for intelligent reactive Plans. Kluwer Academic, Norwell, MA. systems. Technical Note 400, SRI International, October. Sutton, R.S. [1990]. Integrated architectures for learning, planKolodner, J.L., Simpson,R.L., &Sycara, K. [1985]. A process ning, and reacting based on approximating dynamic promodel of case-based reasoning in problem solving. In A. gramming.In Proceedingsof the Seventh International ConJoshi, editor, Proceedingsof the Ninth International Joint ference on MachineLearning, 216-224, Austin, TX. Conferenceon Artificial lntelligence, 284-290,Los Angeles, Sutton, R.S., [1992], editor. MachineLearning, 8(3/4), Special CA. issue on reinforcementlearning. Koiodner, J.L. [1993]. Case-based reasoning. MorganKaufVeloso, M. & Carbonell, J.G. [1991]. Automatingcase generamann,San Mateo, CA, in press. tion, storage, and retrieval in Prodigy.In R.S. Michalski&G. Kopeikina, L., Brandau, R., & Lemmon,A. [1988]. CaseTecuci (eds.), Proceedingsof the First International Workbased reasoning for continuouscontrol. In Proceedingsof a shop On Multistrategy Learning, 363-377, Harpers Ferry, Workshopon Case-Based Reasoning, 250-259, Clearwater WV. Beach, FL. 93