From: AAAI Technical Report SS-95-05. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.

Modeling Case-based Planning for Repairing Reasoning Failures

Susan Fox and David B. Leake
Computer Science Department, Indiana University, Bloomington, IN 47405
{sfox, leake}@cs.indiana.edu

Abstract

One application of models of reasoning behavior is to allow a reasoner to introspectively detect and repair failures of its own reasoning process. We address the issues of the transferability of such models versus the specificity of the knowledge in them, the kinds of knowledge needed for self-modeling and how that knowledge is structured, and the evaluation of introspective reasoning systems. We present the ROBBIE system, which implements a model of its planning processes to improve the planner in response to reasoning failures. We show how ROBBIE's hierarchical model balances model generality with access to implementation-specific details, and discuss the qualitative and quantitative measures we have used for evaluating its introspective component.

Introduction

Many motivations underlie current interest in introspective reasoning and learning. From a functional perspective, introspective reasoning has the potential benefit of allowing the reasoner to refine its own reasoning methods, expanding its capabilities over time and adapting its reasoning to respond effectively to novel circumstances. In complex domains it is difficult or impossible to predict all the knowledge and reasoning methods the system will need ahead of time. A system which can learn new knowledge and new reasoning methods should be able to perform better under those circumstances. From a more general perspective, development of a model for this task will help us to understand and evaluate reasoning behavior and the knowledge needed to capture it.

In order to learn about its reasoning methods, a system must be able to detect opportunities to learn, which are defined in our system by places where expectations about ideal system performance fail (Leake, 1992; Krulwich, Birnbaum, & Collins, 1992; Hammond, 1989; Ram, 1989; Schank, 1986; Riesbeck, 1981). When actual performance differs from expected ideal performance, the system learns by assigning blame for the failure and repairing the flaw in the underlying system. All these tasks require knowledge about how the system reasons, and what the expected results of that reasoning are.

There are several different recent approaches to the task of introspective reasoning: RAPTER (Freed & Collins, 1994a, 1994b) uses expectations about a reactive planning task to diagnose and repair failures, Meta-AQUA (Ram & Cox, 1994) maintains a set of templates for reasoning failures with applicable repairs to apply to failed reasoning traces, Autognostic (Stroulia & Goel, 1994) uses a Structure-Behavior-Function model of its own reasoning to find learning opportunities, and IULIAN (Oehlmann, Edwards, & Sleeman, 1994, 1995) uses questions about its own reasoning and knowledge to re-index its memory and to regulate its processing. Our approach, ROBBIE (Fox & Leake, 1994), models the desired behavior of its underlying case-based planning component as a set of expectations about the behavior of the system during the planning process. ROBBIE monitors the reasoning of its underlying system, comparing its performance to a model of the "ideal" performance of the case-based reasoning process, as first proposed by Birnbaum et al. (1991). The model contains expectations about each portion of the system's reasoning processes.
These expectations, assertions that would hold for an ideal CBR system, are organized by the component of the system they refer to, their level of specificity, and their relations to other expectations. The questions of what expectations are required, at what levels of abstraction, and how they relate to each other lie at the heart of this work.

In this paper we focus on a few issues of importance to systems which use introspective reasoning for self-improvement. In particular we consider the tradeoff between creating a general, transferable model and creating a model with sufficient detail to guide precise diagnosis and repairs, and we consider the issue of evaluating introspective learning as a methodology and in terms of specific uses.

Generality vs. Specificity: In order to facilitate the application of a self-modeling framework to many different systems, we must keep the model as general as possible and use mechanisms independent of both the implemented system and the particular task. At the same time, detailed descriptions of the underlying mechanisms and domain are needed in order for the self-model to determine concrete repairs. In ROBBIE, we propose an approach to introspective learning that strikes a balance between the desired generality and the needed specificity, and which has other benefits of its own in simplifying access to the model. The mechanisms in ROBBIE which manipulate its introspective model are independent of ROBBIE's domain and underlying system, providing a few simple means of communication between the introspective reasoner and the underlying system. The vocabulary in which the model is represented is designed to describe data and reasoning tasks without being specific to a particular implementation. The model structure preserves as much generality as possible by maintaining a hierarchy of assertions (expectations) which keeps task- and implementation-specific details separate from generalities that might be more transferable to other tasks and domains (Fox & Leake, 1994).

Evaluating the method: Evaluation of AI systems is important to verify that the claims made about their performance actually hold. Up to this point little concrete evaluation has been attempted for introspective reasoning systems; we will discuss possible means for evaluating such systems and describe how we have begun to evaluate ROBBIE.

In order to fully analyze ROBBIE's performance, we must develop criteria for judging how good or "useful" its method is; we must justify the effort expended both in terms of what we may learn about modeling mental states and in terms of the tangible benefits of designing such a system.

By analyzing ROBBIE's approach, we can learn something of the knowledge needs of systems for doing introspective diagnosis and repair, and of how that knowledge should be structured. For example, expectations at multiple levels of abstraction seem to make the modeling, as well as the transferring of goals, more tractable. Making fine discriminations among the kinds of relationships between expectations seems to improve the focus of assigning blame when a failure does occur.

One practical justification for using introspective reasoning is the potential for improved performance; to support such a claim we must determine, quantitatively and qualitatively, to what extent performance has improved. Potential evaluation methods should provide some measure of the magnitude of improvement introspective reasoning produces: one possible evaluation method is to compare the performance of the bottom-level system alone with that of the system as a whole. In addition, we should define more qualitative methods, such as learning the "right" new reasoning, or producing "better" output results.

We first describe the ROBBIE system in detail and present an example of the sort of introspective learning ROBBIE performs. Then we will consider the issues described above to see how ROBBIE fits in and what we can conclude.

The ROBBIE system

The ROBBIE system (Re-Organization of Behavior By Introspective Evaluation) is, at the most basic task level, a planning system which interacts with a user and a simulated world to generate and execute plans for that world. That "performance" task is performed by a case-based planner (Hammond, 1989; Alterman, 1986; Kolodner, 1993), combined with a simple reactive-style execution system (Firby, 1989). Overarching the performance task is the task of learning introspectively about the planning and execution process itself, which is done using model-based reasoning about the system's own reasoning process (Birnbaum et al., 1991; Collins, Birnbaum, Krulwich, & Freed, 1993; Birnbaum, Collins, Freed, & Krulwich, 1990). This higher-level task is performed by a separate component which interacts with the planner (see Figure 1).

[Figure 1: ROBBIE architecture. The introspective reasoner and the case-based planning and execution components interact with a world simulator.]

Presented with a starting location (usually the current location of the simulated robot) and a goal location to reach, ROBBIE's case-based component retrieves the most similar matching solution in memory. Similarity is initially judged by a naive method comparing the geographic "closeness" of the starting and goal locations in the current situation to those in the solutions in memory. ROBBIE can learn new features to use in assessing similarity. The solution retrieved from memory is adapted by trying to map the actual starting and ending locations onto the retrieved ones. The resulting plan is executed by the reactive planning component, taking each high-level plan step as a goal to be reached. This execution provides an evaluation of the quality of the adapted plan.

During the plan generation and execution process, the introspective component monitors the reasoning of the case-based and reactive components for discrepancies between its expectations and the actual results. ROBBIE uses a model of the underlying planning process to provide expectations about its performance. The model is a structured set of assertions about the ideal behavior of the case-based planner (Birnbaum et al., 1991; Fox & Leake, 1994). During the monitoring process, only those assertions relevant to the current portion of the reasoning task need be considered. In diagnosing a discovered failure, the entire model may be reconsidered, as a problem might not be discovered until well after it was introduced (i.e., retrieval of a bad case might not produce an explicit failure until plan execution).

The failures ROBBIE may detect include both catastrophes, in which the planner incorrectly solves a problem or cannot reach a solution, and hidden failures, which involve inefficient processing or successful but non-optimal solutions. For example, ROBBIE expects that it will know and use all the relevant features of a problem to retrieve the best old solution. This assertion could be violated, yet a solution still be possible from the less-than-optimal retrieved case.

When a discrepancy is discovered, the network of related assertions is reconsidered, drawing from a trace of the reasoning so far and those portions of the model reachable from the original failed assertion. Through this process the system will determine the root cause and possible repair for the noticed failure. For the failure above (that it will know and use all relevant features), ROBBIE might discover, in storing the solution gained from a poorly retrieved case, that the case retrieved was not the best one. The introspective reasoner can work back from that noticed failure to the deeper cause: the lack of a relevant feature. ROBBIE can alter the features used in retrieval to include one that would have distinguished the "real" best solution. The example below addresses this problem in more detail.

The planner may be suspended while a repair is found and implemented, or it may be permitted to continue until more information becomes available to the introspective reasoner. After a repair has been implemented, the planner may continue from the point where a problem was observed or may be reset to a prior point in the reasoning task from which the system can proceed normally.

ROBBIE's self-model

The introspective reasoning model is used to monitor the system's reasoning processes, and to diagnose and repair failures that occur when the assertions of ideal reasoning performance fail to be true of the actual reasoning performance. The assertions describe expectations about the reasoning processes for each component of the planning system; Figure 2 shows a portion of the current model for ROBBIE, with assertions described in English. In this section we will describe what assertions the model contains, how they are structured, what that means for the assertions in Figure 2, and the benefits gained by a hierarchical model.

Assertions in the model

The model must provide expectations for the reasoning processes of each component of the planner. The case-based planning system consists of components which perform specific parts of the CBR task: Anticipator, Retriever, Adaptor, Executor, and Storer. The Anticipator takes an initial problem description and creates an index to compare to the cases in memory. The Retriever uses that index to select the most similar solution in memory, the Adaptor changes the old solution to match the new problem, the Executor evaluates the solution by executing it, and the Storer adds the new solution to memory for future use.
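To make the division of labor concrete, the sketch below shows one plausible way the five components just listed could be threaded together with an introspective monitor that checks the model's assertions after each stage. It is a minimal illustration under our own assumptions; the class, function, and stage names are hypothetical and not ROBBIE's actual code.

```python
from dataclasses import dataclass
from typing import Any, Callable

# A minimal sketch (not ROBBIE's actual code) of the CBR cycle described
# above, with an introspective monitor checking expectations after each
# stage.  All names here are hypothetical.

@dataclass
class Assertion:
    """One expectation about the ideal behavior of a single component."""
    name: str
    component: str
    holds: Callable[[dict], bool]

@dataclass
class Model:
    assertions: list

    def check(self, component: str, state: dict) -> list:
        """Return the assertions for this component that fail on the current state."""
        return [a for a in self.assertions
                if a.component == component and not a.holds(state)]

def run_cbr_cycle(problem: Any, stages: dict, model: Model) -> dict:
    """Run Anticipator -> Retriever -> Adaptor -> Executor -> Storer,
    recording any expectation failures the monitor notices along the way."""
    state = {"problem": problem, "failures": []}
    for name in ("anticipator", "retriever", "adaptor", "executor", "storer"):
        state[name] = stages[name](state)                # perform the stage
        state["failures"] += model.check(name, state)    # monitor expectations
    return state

# Toy usage: a degenerate planner whose retriever returns nothing, violating
# the abstract expectation that retrieval always finds some matching case.
model = Model([Assertion("Retriever will find a case", "retriever",
                         lambda s: s.get("retriever") is not None)])
stages = {name: (lambda s: None)
          for name in ("anticipator", "retriever", "adaptor", "executor", "storer")}
print(run_cbr_cycle("get from L4 to L2", stages, model)["failures"])
```

In ROBBIE itself, monitoring consults only the assertions relevant to the current component; the rest of the model comes into play only when a detected failure has to be diagnosed.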
The assertions in the model describe the components at different levels of specificity. At the abstract level are assertions much like the description given above. High-level assertions provide a trace of the overall flow of control and information through the planner, without using any details specific to ROBBIE. At lower levels, assertions refer to specific aspects of ROBBIE's implementation: the algorithms used for doing retrieval, adaptation, execution, and so forth. Lower-level assertions often have repairs associated with them, because they can refer to actual parts of ROBBIE which can be altered.

Several components of the planner are implemented as case-based systems themselves, sharing the same memory and retrieval mechanisms as the planner as a whole. For example, anticipation is viewed as a process of selecting and applying cases which specify features to be added to the problem description. Because of the re-use of the case-based mechanisms for more than one purpose, the details of the model are simplified for those case-based components; the model of CBR as a whole provides expectations for each of them, as well as for the planner.

Structure of the model

The assertions are structured by the component to which an assertion refers, the level of specificity of the assertion, and by connections to other related assertions. Dividing assertions into groups by their components facilitates monitoring the reasoning processes for deviations; the only assertions which must be monitored refer to the current component of processing. Assertions which belong to a particular component are also likely to be closely related to each other.

Assertions are arranged hierarchically depending on how specific they are to ROBBIE's implementation. A separation by hierarchy simplifies the task of updating the model when things change, and of transferring portions of the model to new underlying systems. In addition, it separates different ways of thinking about the reasoning task: the abstract levels link components together and describe how information and control pass between them, while low-level assertions describe portions of particular components and the specific information needs and algorithms for them.

Each assertion is linked to the other assertions which are related to it. These links guide the introspective reasoner in explaining and repairing a detected failure by focusing on the most fruitful portions of the model. There are four kinds of links, which the introspective reasoner treats differently during the search for the deep cause of a failure: an abstraction link connects a low-level assertion to its high-level counterpart, a specification link is symmetric to the abstraction link, a sequence link connects two assertions (at the same level of specificity) when one assertion refers to an earlier part of the reasoning process, and a co-occurs link connects two assertions which tend to fail or succeed together. These classes of links between assertions are preliminary; we expect to refine the classes as the model is completed.

Sample of the model

Figure 2 represents a portion of the model ROBBIE uses, showing a subset of the assertions for two components of the case-based planner.

[Figure 2: Sample assertions. Retriever, abstract level: "Retriever will find a case" (seq) "Retriever will output a valid case". Adaptor, abstract level: "Adaptor will get an adaptable case" (seq) "Adaptation will succeed"; mid-level: "Adaptor will produce a complete case"; specific level: "Adaptor will complete in less than N steps"; spec/abstr links connect the levels.]

Assertions are grouped, first by component, and then by level of specificity. The number of levels depends on the component in question; this figure shows three levels for the Adaptor: abstract, mid-level, and specific. Assertions which are grouped by specificity and component are considered together during the monitoring process. Assertions are connected together by several different kinds of links; three appear in Figure 2: "seq," "spec," and "abstr." The "seq" links encode the order in which events occur in the underlying system; "spec" and "abstr" links are symmetric and link assertions at one level to corresponding specifications or abstractions at another level. Assertions are written in English for convenience; the actual assertions use a limited vocabulary in predicate calculus.
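As an illustration of how a fragment of Figure 2 might be encoded, the sketch below gives each assertion a component, a level of specificity, and typed links ("seq", "spec", "abstr", "co-occurs"). The Python representation and names are our own illustrative assumptions; ROBBIE's actual assertions are written in a limited predicate-calculus vocabulary, not Python.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of a fragment of Figure 2.  Each assertion records
# the component it describes, its level of specificity, and typed links to
# related assertions ("seq", "spec", "abstr", "co-occurs").

@dataclass
class Assertion:
    name: str
    component: str   # e.g. "retriever" or "adaptor"
    level: str       # "abstract", "mid-level", or "specific"
    links: dict = field(default_factory=lambda: {"seq": [], "spec": [],
                                                 "abstr": [], "co-occurs": []})

    def link(self, kind: str, other: "Assertion") -> None:
        self.links[kind].append(other)

# Abstract Retriever assertions, connected by a sequence link.
find_case  = Assertion("Retriever will find a case", "retriever", "abstract")
valid_case = Assertion("Retriever will output a valid case", "retriever", "abstract")
find_case.link("seq", valid_case)

# Adaptor assertions at three levels, with spec/abstr links between levels.
adaptable = Assertion("Adaptor will get an adaptable case", "adaptor", "abstract")
succeeds  = Assertion("Adaptation will succeed", "adaptor", "abstract")
complete  = Assertion("Adaptor will produce a complete case", "adaptor", "mid-level")
bounded   = Assertion("Adaptor will complete in less than N steps", "adaptor", "specific")

valid_case.link("seq", adaptable)   # sequence links may cross components
adaptable.link("seq", succeeds)
succeeds.link("spec", bounded)      # implementation-specific notion of "success"
succeeds.link("spec", complete)
bounded.link("abstr", succeeds)
complete.link("abstr", succeeds)
```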
The first component in Figure 2 is the Retriever, for which only two abstract assertions appear. In the complete model, these assertions have links to other abstract and specific assertions, omitted here to simplify the example. The first assertion states that the retriever will always find some matching case. The "seq" link indicates that the next assertion comes later in the retrieval process: it states that the final result of retrieval will be the right kind of case. A memory might contain different kinds of memory structures (as ROBBIE's does); this asserts that the Retriever will find a plan, and not (for instance) an adaptation strategy, if it is looking for a plan.

Sequence links often connect assertions from two different components at the abstract level. The next assertion in sequence is an Adaptor assertion, that the Adaptor will be given an adaptable case ("adaptable" would be defined by specific assertions not included here). It is followed by an assertion stating that adaptation will succeed in producing some answer in a limited amount of time. This assertion is linked to a specification that describes, in details specific to ROBBIE's implementation, exactly how to judge "success" of the adaptor. The last abstract assertion is, as for the Retriever, concerned with the correct output for the component; it is linked to a mid-level assertion which defines "executable" in terms of "complete", and would have more specific assertions below it.

Benefits of a hierarchical model

A hierarchical model such as ROBBIE uses provides two advantages over an approach using just general or specific expectations. First, for knowledge re-use, we can encapsulate those parts of the knowledge which would apply across different systems and keep that part of the model's knowledge when doing the transfer. Equally important, however, is the value of both kinds of expectations in monitoring and repairing the underlying system. High-level assertions provide an overview of the planner's process, and allow us to connect the functioning of one component with another at the "right" level of abstraction: we don't need to trace each specific step from storing the solution back to creating the index; the abstract assertions provide access to other components at a general flow-of-control level of description. In searching for the root cause of a failure, we can use the high-level assertions to select appropriate components to consider and, from there, the appropriate specific assertions for that component. Without the lower-level assertions which describe the actual processing of the system, it becomes nearly impossible to detect failures or to specify good repairs for them. We therefore must design a model which incorporates both levels of description; ROBBIE's hierarchical and component-oriented model is one such design.

There are still many unanswered or incompletely-answered questions about ROBBIE's approach. ROBBIE's current model is incomplete, incorporating a fraction of the assertions we expect to need and having very few repairs at its disposal. Our immediate task is to expand the model and repairs; to do this we must determine to a finer degree what knowledge is required. We must consider how many levels of abstraction in the model hierarchy are useful for ROBBIE. We must also catalog more completely the kinds of links between assertions, as we see what effect the current divisions have on the model's processing. Ideally we would test the model structure under fire by using it to implement introspective reasoning for a different underlying system.
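The search strategy described under "Benefits of a hierarchical model" (moving from a failed assertion to related assertions, using abstract assertions to reach other components at the flow-of-control level before descending to their specifics) could look roughly like the following sketch. The traversal order, the priority given to link kinds, and the function names are assumptions made for illustration, not ROBBIE's actual diagnosis algorithm.

```python
from collections import deque

# A rough sketch (our assumption, not ROBBIE's algorithm) of blame assignment
# over a model of linked assertions.  Each assertion is a dict of the form:
#   {"name": str, "level": str, "holds": callable(trace) -> bool,
#    "links": {"abstr": [...], "spec": [...], "seq": [...], "co-occurs": [...]}}

def find_root_cause(failed_assertion, trace):
    """Work back from a noticed failure toward a deeper cause by re-testing
    related assertions against the reasoning trace.  Abstraction links are
    queued first, so the search tends to reach other components at the
    general flow-of-control level before descending into their specifics."""
    link_order = ("abstr", "seq", "co-occurs", "spec")
    queue = deque([failed_assertion])
    seen = {id(failed_assertion)}
    root_cause = failed_assertion
    while queue:
        assertion = queue.popleft()
        if not assertion["holds"](trace):
            # A failing specific assertion is a better repair target than an
            # abstract one, since it refers to an alterable part of the system.
            if assertion["level"] == "specific":
                root_cause = assertion
        for kind in link_order:
            for neighbor in assertion["links"].get(kind, []):
                if id(neighbor) not in seen:
                    seen.add(id(neighbor))
                    queue.append(neighbor)
    return root_cause
```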
Example: learning new index features

To make the discussion more concrete, let us consider a case in which ROBBIE alters the set of features used to index its memory. ROBBIE's underlying task is to create and execute plans for navigating city streets in a simulated world as a pedestrian. The system has access to previous routes it has taken and to a map of the world which does not include dynamically changing details. Such details at the present time include traffic lights, against which the system must not cross and which break down, and street closings.

The case-based process must measure the similarity between the goal index and the indices of cases in memory to select the case which is easiest to adapt into a new solution; ROBBIE originally selects cases based on how similar the starting and ending locations are to those in memory. Such an index, while it seems an obvious approach, is not sufficient, as the following example will make clear.

Figure 3 shows a portion of the world map relevant to this problem.

[Figure 3: Map of the simulated world, showing Birch Street and locations L1 through L4, with the wasted step of the adapted plan marked.]

ROBBIE has in memory plan A, which describes how to travel from location L1 to location L2, and plan B, which describes how to get from location L2 to location L3. Figure 4 shows the steps of each plan.

[Figure 4: Plans in memory.
Plan A: Turn south; Move south to south side of Birch; Turn west; Move west to L2.
Plan B: Turn east; Move east to L3.]

The current task is to get from location L4 to location L2. Using the geographic closeness of starting and ending locations alone to judge similarity, plan A appears to be the closest because it shares the same ending location (ROBBIE's retrieval criteria do not include knowledge about reversals of known routes, so plan B does not look similar at all). Plan A is selected and adapted to create plan C (dashed line in Figure 3 and in Figure 5). During this process, the introspective reasoner monitors the system's behavior but detects nothing wrong. When the plan is executed, however, the wasted plan steps will be eliminated: the goal of the first two steps in plan C is to be on the south side of Birch, which is already true, so the steps will be skipped (see Figure 5).

[Figure 5: Plan C before and after execution.
Before execution: Turn south; Move south to south side of Birch; Turn west; Move west to L2.
After execution: Turn west; Move west to L2.]

When the resulting plan is stored into memory, an introspective failure is detected: an assertion in the model is that the final solution stored will have plan steps which are more similar to the retrieved case than to any other case in memory. In comparing the final plan C to cases A and B in memory, it is clear that plan B has the more similar solution.

In explaining the cause of this assertion failure, ROBBIE reconsiders related assertions in the model, moving up in the hierarchy of assertions to the general assertion that "retrieval will operate successfully." It will consider high-level assertions prior to the general one, such as "the index will select the closest case." That high-level assertion belongs to the Anticipator component; ROBBIE will also move downward from high-level to more specific assertions, including "the index will include all the relevant features to retrieve the closest case." In re-evaluating the last assertion in the context of the failure, the system discovers a feature of the cases it had not used before: that each involves moving straight along an east/west street. This shows that the assertion "the index will include all the relevant features to retrieve the closest case" failed. The assertion suggests a repair: add "moves straight on east/west street" to the features used in indexing cases, and re-index memory to include the new feature.
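The repair just described amounts to extending the indexing vocabulary and re-indexing every case in memory. The sketch below illustrates that idea on the plans from Figure 4; the feature names, the similarity measure, and the dictionary-based case representation are assumptions made for the example rather than ROBBIE's actual retrieval criteria.

```python
# Illustrative sketch of feature-based indexing and of the repair that adds a
# new index feature and re-indexes memory.  Features and the similarity
# measure are assumptions for the example, not ROBBIE's actual criteria.

def make_index(case, extractors):
    return {name: f(case) for name, f in extractors.items()}

def similarity(index_a, index_b):
    """Fraction of shared index features with equal values."""
    shared = [k for k in index_a if k in index_b]
    return sum(index_a[k] == index_b[k] for k in shared) / len(shared)

def reindex(memory, extractors):
    """Recompute every stored case's index after the vocabulary changes."""
    for case in memory:
        case["index"] = make_index(case, extractors)

# Plans A and B from Figure 4, plus the new problem (L4 to L2 along Birch).
memory = [
    {"name": "A", "start": "L1", "end": "L2", "straight_ew": False},
    {"name": "B", "start": "L2", "end": "L3", "straight_ew": True},
]
problem = {"start": "L4", "end": "L2", "straight_ew": True}

# Original indexing vocabulary: starting and ending locations only.
extractors = {"start": lambda c: c["start"], "end": lambda c: c["end"]}
reindex(memory, extractors)
probe = make_index(problem, extractors)
print([similarity(probe, c["index"]) for c in memory])  # plan B looks dissimilar

# The repair suggested by the failed assertion: add the feature and re-index.
extractors["straight_ew"] = lambda c: c["straight_ew"]
reindex(memory, extractors)
probe = make_index(problem, extractors)
print([similarity(probe, c["index"]) for c in memory])  # plan B now shares a feature
```

Before the repair, plan B shares no index features with the new problem; after the feature is added and memory is re-indexed, it does, which is the kind of shift in retrieval behavior the example depends on.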
In the future, any problem which involves moving straight along an east/west street will be indexed by the new feature, and will match most closely other cases which also include that feature in their index. Once the introspective reasoner has evaluated and repaired the problem, processing continues normally. Notice that the failure in question here is not a catastrophic one, but it does represent wasted effort on the part of the planner, effort that would otherwise be repeated and compounded in the future.

The situation above is an example of ROBBIE's introspective learning for a single goal. The ramifications of learning a new feature will only become clear over a sequence of goals. In order to study the improvement introspective reasoning provides for ROBBIE, we ran a set of experiments which presented ROBBIE with twenty-six sequences of goals, executing each sequence with and without introspective reasoning. One sequence was carefully designed to be easy for ROBBIE to handle; the other sequences were randomly perturbed versions of the first. We measured the number of problems ROBBIE successfully handled for each sequence, and found that in almost every case ROBBIE could handle more problems with introspective learning than without (in one anomalous case the overall performance was so poor that introspective learning could provide no benefit at all). We also measured the percentage of cases in memory which were considered during the retrieval process, over the sequence of retrievals made in solving the sequence of goals: the percentage considered when introspective reasoning was used dropped significantly below the percentage considered without introspective reasoning. ROBBIE, using introspective reasoning to re-index its memory, considered fewer irrelevant cases at the same time as it improved its overall success rate.
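The two measures just reported (problems handled per goal sequence, and the fraction of memory examined during retrieval) are simple tallies. The sketch below shows one hypothetical way such bookkeeping could be organized; run_sequence is an assumed stand-in for running the planner on one sequence of goals with or without introspective learning, and is not part of ROBBIE.

```python
# Hypothetical bookkeeping for the comparison described above.

def summarize(results, memory_size):
    """results: list of (solved, cases_examined) pairs, one per goal."""
    solved = sum(1 for ok, _ in results if ok)
    examined = sum(n for _, n in results) / (len(results) * memory_size)
    return {"problems_solved": solved, "fraction_of_memory_examined": examined}

def compare(sequences, run_sequence, memory_size):
    """Tally both measures for each sequence, with and without introspection."""
    report = []
    for goals in sequences:
        report.append({
            "with_introspection":
                summarize(run_sequence(goals, introspect=True), memory_size),
            "without_introspection":
                summarize(run_sequence(goals, introspect=False), memory_size),
        })
    return report

# Toy usage with a fake runner standing in for the planner.
def fake_run(goals, introspect):
    return [(True, 3 if introspect else 7) for _ in goals]

print(compare([["g1", "g2", "g3"]], fake_run, memory_size=10))
```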
Ramifications to general issues

We have now described the ROBBIE system in some detail; we must come back to the issues alluded to briefly above. We will discuss the tradeoff between the generality, and hence transferability, of a self-model framework and the specificity of details the model needs to accurately detect and repair failures. We will also discuss means for evaluating the benefit of learning about reasoning methods. We will describe our attempts to address these issues with the ROBBIE system, sketch our conclusions, and describe how ROBBIE relates to other work in this area.

Generality vs. Specificity

Ideally one could develop a framework for reasoning about mental processes that could be transferred with minor changes to provide self-models for a wide array of underlying systems and tasks (vision, planning, etc.) and for a wide variety of modeling tasks (modeling others' reasoning, explaining reasoning behavior, analyzing its own actions, etc.). While we must admit that such a universal framework is, at least now, out of reach, it is certainly possible to share higher-level insights about mental reasoning, and to develop specific frameworks for more limited tasks and domains. There will be commonalities among the kinds of knowledge, and the useful forms for representing that knowledge, needed to reason about mental actions. Beyond that, it seems reasonable to expect more concrete sharing of model forms and knowledge within a particular kind of self-modeling task.

In developing models of introspection, we will be torn between our desires for transferable models and the reality that a model must include a great deal of system-specific knowledge. Developing approaches that maintain the generality of the model as much as possible means focusing on separating details from the functioning of the model, keeping the mechanisms and vocabulary used as independent as possible, and emphasizing the kinds of knowledge needed. Specifying classes of knowledge and useful organizations of that knowledge for describing mental actions will provide the largest gain across modeling tasks.

The problem of integrating a general approach to self-modeling with the details needed to use the model has been one we have tried to address with ROBBIE from the beginning. We designed general mechanisms for monitoring the underlying reasoning and accessing the declarative model which depend in no way on the contents of that model. We are developing a general vocabulary for describing the assertions in the model to complete the generality of the mechanisms. Within this framework a model may be constructed for a very different system sharing little in common with the implemented one. Keeping a hierarchy of assertions, and organizing them by component, allows substitution of pieces of the model for a new system without requiring a completely new model. For example, a CBR system could keep the upper tiers of the model for each component similar to ROBBIE's, adding only new lower-level details, or a variation on ROBBIE which used a different adaptation mechanism could substitute new assertions for that component alone.

Of perhaps greater importance in terms of transferability is what we now understand about the kinds of knowledge and the model structure required for this task. In developing a model for this system we also develop a template for what to include in models of other systems; ROBBIE's model demonstrates the value of incorporating multiple levels of knowledge about reasoning tasks. The ROBBIE system's diagnosis capabilities were improved by having high-level knowledge that provided a general flow of control and information, along with specific details about the system's operation (tied into that higher level). Considering high-level assertions when assigning blame leads the system to consideration of other assertions distant in terms of the reasoning trace but close in terms of the flow of control. The system should more easily trace the reasoning behavior from a detected failure back to the original cause. In a similar way, distinguishing different kinds of relationships between pieces of knowledge focuses the model on the most relevant pieces; in ROBBIE, the model includes specification and abstraction links, links that indicate the sequence of reasoning, and causal links that connect assertions likely to fail or succeed together. The model could choose to follow specification links when trying to determine a repair, or could avoid testing assertions which are specifications of a high-level assertion that has not failed. A model without distinct connections between assertions could not as accurately gauge which assertions are relevant under a given set of circumstances.

Many other systems have also approached the problem of generality of mechanism and transferability. Cox & Freed (1994) identify knowledge about how general and specific knowledge combines as a key element for a self-reasoning system. Freed's RAPTER (Freed & Collins, 1994b) uses a general set of representations for expectations and repairs, and a general mechanism to manipulate them, while the content of its representations is specific to the RAPTER system. Stroulia's Autognostic (Stroulia & Goel, 1994) applies an existing kind of model (used for modeling physical machines) to implement a self-model, and successfully applied the model and mechanisms to two independent systems (Kritik2 (Stroulia & Goel, 1992) and Router (Goel, Callantine, Shankar, & Chandrasekaran, 1991)). Meta-AQUA (Ram & Cox, 1994) uses abstract descriptions of reasoning traces that might arise under any similar reasoning/explanation task.

Evaluating self-modeling systems

It is often problematic in AI to explain exactly what a given system has accomplished besides showing that some implementation is possible. It is important to demonstrate the advantages of any learning system in terms of the breadth of problems it can solve and the applicability of its ideas in general. At this point, attempts to evaluate introspective reasoning have been limited; we have, however, made an effort to evaluate ROBBIE's mechanisms and performance.

We must determine when using a self-model provides a benefit, and how to demonstrate the extent of that benefit. That benefit may be in advancing our knowledge of what self-reasoning entails and the ramifications for mental modeling in general. The benefit may also lie on the practical side: systems with the power to improve their own mechanisms should solve more problems, solve problems more effectively, produce better solutions, and respond more flexibly to novel situations than their non-introspective counterparts. The expense of modeling reasoning behavior makes evaluating its success as a practical tool of particular importance.

How to measure the performance of an introspective learning system is itself a difficult question and may depend on the system; possible measures include the breadth and number of problems solved that were impossible previously, the speed and efficiency of the reasoning process and the solutions produced, and many others. Many systems which use a model of reasoning, including ROBBIE, are two-level systems which make a relatively firm distinction between the reasoning being modeled and the reasoning used to do the modeling; one possible evaluation method is to compare the performance of the bottom-level system with that of the system as a whole. As a qualitative evaluation we can ask if a system like ROBBIE detects the "right" failures, assigns blame correctly, and repairs the system the "right" way. Other work has been less explicit about concrete means of evaluating systems. Cox (1995) has described classes of reasoning behavior and failures that people experience, and that systems which model reasoning behavior should address; that set provides a qualitative guide for judging models of reasoning. Autognostic (Stroulia & Goel, 1994) provides another kind of evaluation by directly proving the applicability of its model to different underlying systems.

We have begun evaluating ROBBIE using a practically-oriented criterion: the addition of introspective reasoning should produce quantitative as well as qualitative improvements in the performance of the overall system. We are in the process of performing extensive experiments to test ROBBIE's performance over long sequences of problems. By collecting statistics on the success of the system with and without introspective learning, we can quantify its effect. Some tentative and preliminary results are in (Fox & Leake, 1994). We have completed one set of experiments (described above) which used the number of successful cases over a sequence and the percentage of cases in memory considered during retrieval to reveal differences in ROBBIE's performance with and without introspective reasoning. Initial results of that experiment are encouraging.

Focusing too heavily on quantitative measures may overlook some important features of introspective reasoning; it is difficult to quantify the quality of a solution, or the elegance of the reasoning that created it. We must be aware of and seek out those more qualitative benefits as well. We may find objective measures of solution quality through common sense in some domains, through comparisons with human-created solutions, or through surveys eliciting quality judgements. Elegance of reasoning is an even more subjective issue, but by similar methods some objective judgement can be reached.

Conclusions

The ROBBIE system, while still incomplete, addresses several important issues for modeling reasoning behavior, and introspective learning in particular. Our conclusions about the structural requirements of ROBBIE's model should be applicable to a general model of reasoning, and the approach to preserving the re-usability of the model may also provide pointers for future work on the transfer of reasoning knowledge. ROBBIE benefited from using multiple levels of knowledge to focus on the most relevant portions of the model; determining what the important levels of knowledge are and how multiple levels affect reasoning models may be beneficial to a wide range of models of reasoning behavior.

In developing a model of the reasoning process, we must strike a balance between the generality and transferability of the model and the specific knowledge required to detect and specify repairs. The ROBBIE system uses a hierarchical model accessed by system-independent mechanisms in order to find that balance. To achieve generality, mechanisms for introspective reasoning should work with any set of assertions needed for a system, requiring a general vocabulary or a framework for a vocabulary. Separating assertions which make statements about the general kind of underlying system (here, case-based reasoning) from those that refer to implementation or knowledge details of the specific system makes it easier to convert the model to apply to a new but similar system. We claim that a model must include knowledge about the reasoning process at multiple levels of abstraction: a high-level description tied to lower-level details. Doing so helps us to keep the generality of the model and also allows us to use the model at the "right" level for diagnosis, by using abstract descriptions to trace the general flow of control and knowledge rather than plodding through every detailed step of the reasoning process. Using the right level of abstraction may focus the diagnosis on promising areas of the model while avoiding unnecessary or unpromising details.

We claim that the problem of evaluating models of reasoning behavior must be addressed because of the potential expense of such models.
We can choose to evaluate a system in terms of its benefit as a model of mental actions: what we learn about possible model structures and knowledge needs provides one kind of justification, as does the extent to which a model covers the scope of introspective reasoning for the task. We may also evaluate the practical benefits of using a model of reasoning behavior. For the purposes of a system using the model for self-repair, we may judge the quality of the overall system compared to one without learning, or use a qualitative gauge of the repairs made to the system. To judge the quality of the overall system, various measures might be proposed: breadth of problems solved, quality of solutions, speed of processing, and so forth. We have begun this process by trying to find some quantitative measures of ROBBIE's improvement when introspective learning is enabled, while also considering ways to express qualitative measures such as "improving in the right way" or "learning from the right failures."

An issue at the heart of self-modeling systems is the question of what kinds of knowledge are required for the system to perform its tasks, and how that knowledge is to be represented. While this is an ongoing research issue, we have proposed a structure for self-modeling that allows for flexibility of application and is designed to allow for transfer of some part of the model to new applications.

Acknowledgements

This work is supported in part by the National Science Foundation under Grant No. IRI-9409348.

References

Alterman, R. (1986). An adaptive planner. In Proceedings of the Fifth National Conference on Artificial Intelligence, pp. 65-69, Philadelphia, PA. AAAI.

Birnbaum, L., Collins, G., Brand, M., Freed, M., Krulwich, B., & Pryor, L. (1991). A model-based approach to the construction of adaptive case-based planning systems. In Bareiss, R. (Ed.), Proceedings of the Case-Based Reasoning Workshop, pp. 215-224, San Mateo. DARPA, Morgan Kaufmann.

Birnbaum, L., Collins, G., Freed, M., & Krulwich, B. (1990). Model-based diagnosis of planning failures. In Proceedings of the Eighth National Conference on Artificial Intelligence, pp. 318-323, Boston, MA. AAAI.

Collins, G., Birnbaum, L., Krulwich, B., & Freed, M. (1993). The role of self-models in learning to plan. In Foundations of Knowledge Acquisition: Machine Learning, pp. 83-116. Kluwer Academic Publishers.

Cox, M. (1995). Representing mental events (or the lack thereof). In Proceedings of the 1995 AAAI Spring Symposium on Representing Mental States and Mechanisms. (In press.)

Cox, M. & Freed, M. (1994). Using knowledge of cognitive behavior to learn from failure. In Proceedings of the Seventh International Conference on Systems Research, Informatics and Cybernetics, pp. 142-147, Baden-Baden, Germany.

Firby, R. J. (1989). Adaptive Execution in Complex Dynamic Worlds. Ph.D. thesis, Yale University, Computer Science Department. Technical Report 672.

Fox, S. & Leake, D. (1994). Using introspective reasoning to guide index refinement in case-based reasoning. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pp. 324-329, Atlanta, GA. Lawrence Erlbaum Associates.

Freed, M. & Collins, G. (1994a). Adapting routines to improve task coordination. In Proceedings of the 1994 Conference on AI Planning Systems, pp. 255-259.

Freed, M. & Collins, G. (1994b). Learning to prevent task interactions. In desJardins, M. & Ram, A. (Eds.), Proceedings of the 1994 AAAI Spring Symposium on Goal-driven Learning, pp. 28-35. AAAI Press.

Goel, A., Callantine, T., Shankar, M., & Chandrasekaran, B. (1991). Representation, organization, and use of topographic models of physical spaces for route planning. In Proceedings of the Seventh IEEE Conference on AI Applications, pp. 308-314. IEEE Computer Society Press.

Hammond, K. (1989). Case-Based Planning: Viewing Planning as a Memory Task. Academic Press, San Diego.

Kolodner, J. (1993). Case-Based Reasoning. Morgan Kaufmann, San Mateo, CA.

Krulwich, B., Birnbaum, L., & Collins, G. (1992). Learning several lessons from one experience. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pp. 242-247, Bloomington, IN. Cognitive Science Society.
Leake, D. (1992). Evaluating Explanations: A Content Theory. Lawrence Erlbaum Associates, Hillsdale, NJ.

Oehlmann, R., Edwards, P., & Sleeman, D. (1994). Changing the viewpoint: re-indexing by introspective questioning. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pp. 675-680. Lawrence Erlbaum Associates.

Oehlmann, R., Edwards, P., & Sleeman, D. (1995). Introspection planning: representing metacognitive experience. In Proceedings of the 1995 AAAI Spring Symposium on Representing Mental States and Mechanisms. (In press.)

Ram, A. (1989). Question-driven understanding: An integrated theory of story understanding, memory and learning. Ph.D. thesis, Yale University, New Haven, CT. Computer Science Department Technical Report 710.

Ram, A. & Cox, M. (1994). Introspective reasoning using meta-explanations for multistrategy learning. In Michalski, R. & Tecuci, G. (Eds.), Machine Learning: A Multistrategy Approach, Vol. IV, pp. 349-377. Morgan Kaufmann.

Riesbeck, C. (1981). Failure-driven reminding for incremental learning. In Proceedings of the Seventh International Joint Conference on Artificial Intelligence, pp. 115-120, Vancouver, B.C. IJCAI.

Schank, R. (1986). Explanation Patterns: Understanding Mechanically and Creatively. Lawrence Erlbaum Associates, Hillsdale, NJ.

Stroulia, E. & Goel, A. (1992). Generic teleological mechanisms and their use in case adaptation. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pp. 319-324, Bloomington, IN. Cognitive Science Society.

Stroulia, E. & Goel, A. (1994). Task structures: what to learn?. In desJardins, M. & Ram, A. (Eds.), Proceedings of the 1994 AAAI Spring Symposium on Goal-driven Learning, pp. 112-121. AAAI Press.