From: AIPS 1994 Proceedings. Copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Learning Planning Operators by Observation and Practice

Xuemei Wang*
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
wxm+@cs.cmu.edu

Abstract

The work described in this paper addresses learning planning operators by observing expert agents and subsequent knowledge refinement in a learning-by-doing paradigm. The observations of the expert agent consist of: 1) the sequence of actions being executed, 2) the state in which each action is executed, and 3) the state resulting from the execution of each action. Planning operators are learned from these observation sequences in an incremental fashion utilizing a conservative specific-to-general inductive generalization process. In order to refine the new operators to make them correct and complete, the system uses the new operators to solve practice problems, analyzing and learning from the execution traces of the resulting solutions or execution failures. We describe techniques for planning and plan repair with incorrect and incomplete domain knowledge, and for operator refinement through a process which integrates planning, execution, and plan repair. Our learning method is implemented on top of the PRODIGY architecture (Carbonell, Knoblock, & Minton 1990; Carbonell et al. 1992) and is demonstrated in the extended-strips domain (Minton 1988) and a subset of the process planning domain (Gil 1991).

The Challenge

There is considerable interest in learning for planning systems. Most of this work concentrates on learning search control knowledge (Fikes, Hart, & Nilsson 1972; Mitchell, Utgoff, & Banerji 1983; Minton 1988; Veloso 1992). Previous work on acquiring planning operators refines incomplete operators by active experimentation (Gil 1992). This paper describes a framework for learning planning operators by observing expert agents and subsequent knowledge refinement in a learning-by-doing paradigm (Anzai & Simon 1979).

*This research is sponsored by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant number F33615-93-1-1330. Views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing official policies or endorsements, either expressed or implied, of Wright Laboratory or the United States Government.

The learning system is given at the outset the description language for the domain, which includes the types of objects and the predicates that describe states and operators. The observations of an expert agent consist of: 1) the sequence of actions being executed, 2) the state in which each action is executed, and 3) the state resulting from the execution of each action. Our planning system uses STRIPS-like operators (Fikes & Nilsson 1971), and the goal of this work is to learn the preconditions and the effects of the operators. Operators are learned from these observation sequences in an incremental fashion utilizing a conservative specific-to-general inductive generalization process. In order to refine the new operators to make them correct and complete, the system uses them to solve practice problems, analyzing and learning from the execution traces of the resulting solutions. Learning by observation and by practice provides a new point in the space of approaches for problem solving and learning.
There are many difficulties in such an integration of learning and planning, to name a few: there are many irrelevant features in the state when the operators are applied; the planner needs to plan with the newly acquired, possibly incomplete and incorrect operators; plan repair is necessary when the initial plan cannot be completed because of some unmet necessary preconditions in the operators in the plan; and planning and execution must be interleaved and integrated. This paper describes our approach to addressing these problems. Techniques for planning and plan repair with partially incorrect and incomplete domain knowledge are developed, and operators are refined during an integration of planning, execution, and plan repair. The PRODIGY architecture (Carbonell, Knoblock, & Minton 1990; Carbonell et al. 1992) is the test-bed for our learning method.

Our learning method applies when the domains can be modelled with discrete actions, the states are observable, and the description language for representing the states and the operators (i.e. the set of predicates) is known. In many cases, such as in the process planning domain (Gil 1991), learning the operator representations is a difficult and important task. Our method can significantly reduce the effort of knowledge engineering in such cases.¹

¹Note that the term "observation" refers to a form of input to the learning system. It can come from visual or audio input, it can be input to the system typed in by a person, or it can be read in from a case library.

Learning Planning Operators - the Architecture

The architecture for learning by observation and practice in planning includes the observation module, the learning module, the planning module, and the plan execution module. Figure 1 illustrates the functionality of each module and the information transferred among these modules.

Figure 1: Overview of the data-flow among the different modules in the learning system. The observation module provides the learning module with observations of the expert agent acting in the environment. The learning module formulates and refines operators in the domain, and these operators are used by the planning module. The planning module is essentially the PRODIGY planner modified to use some heuristics for planning with incomplete and incorrect operators as well as for plan repair; it generates tentative plans to solve practice problems. The plan execution module executes the plans, providing the learning module with the execution traces, and passing the plan failures to the planning module for plan repair.
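To make the data flow of Figure 1 concrete, the sketch below shows one plausible way to wire the four modules together in Python. It is only an illustration of the loop described in this section: the module objects, their method names (segment, learn_from_observation, infer_subgoal_structure, plan, execute_and_repair), and the episode representation are assumptions of ours, not the actual PRODIGY interfaces.

    # Hypothetical top-level loop for learning by observation and practice.
    # Module objects and method names are illustrative assumptions.
    def learn_by_observation_and_practice(expert_episodes, practice_problems,
                                          observer, learner, planner, executor):
        # Observation phase: each episode is a sequence of observations,
        # i.e. (action, pre-state, post-state) triples.
        for episode in expert_episodes:
            for observation in observer.segment(episode):
                learner.learn_from_observation(observation)
            learner.infer_subgoal_structure(episode)  # subgoal-ordering heuristics

        # Practice phase: plan with the learned (possibly incorrect) operators,
        # then execute and repair, feeding execution traces back to the learner.
        for problem in practice_problems:
            tentative_plan = planner.plan(problem, learner.operators)
            executor.execute_and_repair(tentative_plan, problem, planner, learner)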
This work makes the following assumptions:
1. The operators and the states are deterministic. If the same operator with the same bindings is applied in the same states, the resulting states are the same.
2. Noise-free sensors. There are no errors in the states.
3. The preconditions of operators are conjunctive predicates.
4. Everything in the state is observable.
5. There are no conditional effects.

The Observation Module

The observation module monitors the execution traces of the expert agents. An observation of the other agent consists of the name of the operator being executed, the state before the operator is executed, called the pre-state, and the state after the operator is executed, called the post-state. Computing the difference between the post-state and the pre-state, we obtain a delta-state. An episode is a sequence of observations.

The Learning Module

The learning module formulates and refines the representations of the operators in the domain from the observations of the expert agent and the execution traces of the system's own actions. Operators are learned in an incremental fashion utilizing a conservative specific-to-general inductive generalization process and a set of domain-independent heuristics. At any point of time during learning, any learned operator always has all the correct preconditions, but it may also have some unnecessary preconditions. These unnecessary preconditions can be removed with more observations and practice. The learning module also learns heuristics for subgoal ordering from the observed episodes of the expert agent.

Learning Operators from Observations

Given an observation, if the action is observed for the first time, we create the corresponding operator such that its precondition is the parameterized pre-state, and its effect is the parameterized delta-state. The type for each variable in the operator is the most specific type in the type hierarchy that the corresponding parameterized object has². This is the most conservative constant-to-variable generalization. Generalizing the observation in Figure 2 yields the operator shown in Figure 3.

²Objects that do not change from problem to problem are treated as constants and are not parameterized, such as the object ROBOT in the extended-strips domain.

    Pre-state:   (connects dr13 rm1 rm3) (connects dr13 rm3 rm1)
                 (dr-to-rm dr13 rm1) (dr-to-rm dr13 rm3)
                 (inroom robot rm1) (inroom box1 rm3) (pushable box1)
                 (dr-closed dr13) (unlocked dr13) (arm-empty)
    Delta-state: add: (next-to robot dr13)

Figure 2: An observation of the state before and after the application of the operator GOTO-DR.

    (operator goto-dr
      (preconds ((<v1> door) (<v2> object) (<v3> room) (<v4> room))
        (and (inroom robot <v4>)
             (connects <v1> <v3> <v4>)
             (connects <v1> <v4> <v3>)
             (dr-to-rm <v1> <v3>)
             (dr-to-rm <v1> <v4>)
             (unlocked <v1>)
             (dr-closed <v1>)
             (arm-empty)
             (inroom <v2> <v3>)
             (pushable <v2>)))
      (effects nil
        ((add (next-to robot <v1>)))))

Figure 3: Learned operator GOTO-DR when the observation in Figure 2 is given to the learning module.

Given a new observation, when the system already has an operator representation, the preconditions are generalized by removing facts that are not present in the pre-state of the new observation³, and the effects are augmented by adding facts that are in the delta-state of the observation.

³A precondition is removed iff the predicate of the precondition is not present in the new pre-state.
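A minimal Python sketch of this create-then-generalize step, assuming states are represented as sets of ground literals (tuples). The parameterization of constants into typed variables is omitted for brevity, and the class and function names are ours, not the system's.

    # Sketch of conservative specific-to-general operator learning from
    # observations. States are sets of ground literals such as
    # ("inroom", "robot", "rm1"). Parameterization into typed variables is omitted.
    class LearnedOperator:
        def __init__(self, name, preconds, add_effects, del_effects):
            self.name = name
            self.preconds = set(preconds)        # may include unnecessary preconditions
            self.add_effects = set(add_effects)
            self.del_effects = set(del_effects)
            self.necessary_preconds = set()      # filled in during practice

    def learn_from_observation(operators, action, pre_state, post_state):
        adds = post_state - pre_state            # delta-state: added literals
        dels = pre_state - post_state            # delta-state: deleted literals
        if action not in operators:
            # First observation: preconditions are the (parameterized) pre-state,
            # effects are the (parameterized) delta-state.
            operators[action] = LearnedOperator(action, pre_state, adds, dels)
        else:
            op = operators[action]
            op.preconds &= pre_state             # drop facts absent from the new pre-state
            op.add_effects |= adds               # augment effects with the new delta-state
            op.del_effects |= dels
        return operators[action]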
For example, given the observation in Figure 4, the operator GOTO-DR is refined: (dr-closed <v1>) is removed from the preconditions, and the effects become:

    (effects ((<v5> object))
      ((add (next-to robot <v1>)) (del (next-to robot <v5>))))

    Pre-state:   (connects dr12 rm2 rm1) (connects dr12 rm1 rm2)
                 (dr-to-rm dr12 rm1) (dr-to-rm dr12 rm2)
                 (next-to robot a) (dr-open dr12) (unlocked dr12)
                 (inroom robot rm1) (inroom c rm2) (inroom b rm2)
                 (inroom a rm1) (pushable c) (pushable b) (arm-empty)
    Delta-state: add: (next-to robot dr12)  del: (next-to robot a)

Figure 4: The second observation of the operator GOTO-DR.

Refining Operators with Practice

Because of the incorrectness and incompleteness of the learned operators, the plans produced by the planning module may be incorrect. This will be explained in a later section. While executing these plans in the environment, one of the following two outcomes is possible for each operator:

(1) The operator is executed if the state changes predicted by the effects of the operator happen after the operator execution. In this case, the execution is treated as an observation, and the preconditions and the effects of the operator are modified as described in the previous section.

(2) The operator is un-executed if the state does not change as predicted because of unmet necessary preconditions of the operator in the environment. When an operator is un-executed, the learning module computes the unmet preconditions of the operator. If there is only one such precondition, we mark this precondition as a necessary precondition of the operator. Otherwise, none of the preconditions is marked as necessary because of insufficient information. When all the preconditions of an operator are marked as necessary, the system has learned the correct representation of the operator.

Learning Negated Preconditions

The learned operators are used to solve practice problems. During practice, if an operator is un-executed even when all the preconditions of the operator are satisfied, the learning module discovers that some negated preconditions are missing. The negations of all the predicates that are true in the current state when the operator is un-executed, but are not true in the observations of this operator, are conjectured as potential negated preconditions and are added to the preconditions of the operator. Some of the added negated preconditions may be unnecessary, and they will be removed in the same way as all other preconditions with more observations and practice.
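The two practice outcomes and the negated-precondition conjecture can be sketched as follows, building on the LearnedOperator sketch above; the function name, the ("not", ...) encoding of negated literals, and the bookkeeping of past pre-states are our assumptions rather than the system's actual implementation.

    # Sketch of operator refinement from practice. bound_preconds are the
    # operator's current preconditions instantiated with the attempted bindings;
    # past_pre_states are the pre-states of earlier observations of this operator.
    def refine_after_execution_attempt(op, bound_preconds, pre_state, post_state,
                                       executed, past_pre_states, operators):
        if executed:
            # Case (1): treat the successful execution as one more observation.
            learn_from_observation(operators, op.name, pre_state, post_state)
            return

        # Case (2): the operator is un-executed.
        unmet = {p for p in bound_preconds if p not in pre_state}
        if len(unmet) == 1:
            # A single unmet precondition must be a necessary precondition.
            op.necessary_preconds.add(unmet.pop())
        elif not unmet:
            # All known preconditions held, yet execution failed: conjecture
            # negated preconditions from literals true now but never true in
            # earlier observations of this operator.
            seen = set().union(*past_pre_states) if past_pre_states else set()
            for literal in pre_state - seen:
                op.preconds.add(("not",) + literal)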
Inferring Subgoal Structure of the Observed Agent

When the system observes an episode, it assumes that the observed agent is achieving its own top-level goals by carrying out the observed actions, based on the "rational agent" assumption (Newell 1982). Inferring what the top-level goals are and how the preconditions of each operator in the observations are achieved provides the system with heuristics for planning. For example, the system can prefer to subgoal on the preconditions of an operator that were subgoaled on by the observed agent.

The system infers how the preconditions of each operator are achieved by creating an Inferred Subgoal Graph (ISG). An ISG is a directed acyclic graph (DAG) whose nodes are either operators or goals. An edge in an ISG connects an operator to the goal it achieves, or connects a goal to the operator that requires it as a precondition. The root of an ISG is the virtual operator *finish*, which is connected to the partially inferred goals.

The system initializes the ISG by setting its root to the virtual operator *finish*. The last operator in the episode must be achieving some top-level goals. Since everything in the delta-state of the observation can be a potential top-level goal, and since it is difficult to distinguish the true top-level goals from subgoals or side effects, the complete delta-state is presumed to be part of the top-level goals. A node is created for these goals and is connected to the root of the ISG, i.e. to the operator *finish*. A node is also created for the operator in the observation, and is connected to the top-level goals it achieves. The system then infers how each precondition of the operator is achieved. A precondition is inferred to be achieved by an observation if it is added in the delta-state of that observation. The system then recursively processes the episode, going backwards through the episode and determining how the preconditions of each operator are achieved. In an ISG, the preconditions of an operator that are connected to some operator that achieves them are considered to have been subgoaled on by the observed agent. These preconditions are called the subgoaled-preconds of the operator.

The Planning Module

The operators acquired by the learning module are themselves incorrect or incomplete. To plan with such operators, PRODIGY's search algorithm is augmented with some domain-independent heuristics to generate plans that are possibly incomplete or incorrect.

Given a practice problem, the system first generates a tentative plan. Then the plan is executed in a simulated environment⁴, and repaired if some operators in the plan cannot be executed because of unmet necessary preconditions in the environment. The following sections describe our approach to planning with incorrect and incomplete operators and to repairing a failed plan.

⁴The simulated environment is implemented using a complete and correct model of the domain.

Planning with Incomplete or Incorrect Domain Knowledge

PRODIGY's standard planner assumes a correct action model. But before our learner acquires a complete model of the domain, the operators can be incomplete or incorrect in the following ways: (a) over-specific preconditions: a learned operator has unnecessary preconditions. In this case, the system has to achieve these unnecessary preconditions during planning. Not only does this force the system to do unnecessary search, but it can also make many solvable problems unsolvable, because it may not be possible to achieve some unnecessary preconditions. (b) incomplete effects: some effects of an operator are not learned because they are present in the pre-state of the operator. While searching for a plan, when PRODIGY applies in its internal model an operator that is missing some effects, the state becomes incorrect.

These problems make it very difficult to generate a correct plan for a problem. Therefore, we relax the assumption that all the plans generated by the planner are correct. Over-specific preconditions prevent the system from generating any plan at all, and are dealt with by using the applicable threshold heuristic and the subgoal ordering heuristic to generate plans that include operators with unmet preconditions. Incomplete effects are not dealt with explicitly in our planner; when an operator cannot be executed because of unmet necessary preconditions, the plan repair mechanism is invoked to generate a modified plan.

PRODIGY's basic search algorithm is modified by using the following two heuristics to generate plans that are possibly incomplete or incorrect:

- Applicable threshold heuristic: While computing the applicable operators, a threshold variable *applicable-threshold* is introduced such that an operator is considered applicable in the planner's internal model during planning when all the subgoaled-preconds and the necessary preconds are satisfied, and when the fraction of the satisfied preconditions exceeds *applicable-threshold*. Of course, this introduces the possibility of incomplete or incorrect plans in that a necessary precondition may be unsatisfied, necessitating plan repair (see the next section).

- Subgoal ordering heuristic: While choosing a goal to work on next, goals that are subgoaled-preconds are preferred over those that are not.
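A small Python sketch of the applicability test under the applicable threshold heuristic; the set-based state representation, the function name, and the explicit partition into subgoaled, necessary, and remaining preconditions are our assumptions.

    # Sketch of the applicable-threshold test used when planning with possibly
    # over-specific operators. All precondition sets hold ground literals.
    APPLICABLE_THRESHOLD = 0.7   # the example in Figure 5 uses 0.7

    def considered_applicable(bound_preconds, subgoaled_preconds,
                              necessary_preconds, state,
                              threshold=APPLICABLE_THRESHOLD):
        # All subgoaled-preconds and all necessary preconditions must hold.
        if not (subgoaled_preconds <= state and necessary_preconds <= state):
            return False
        if not bound_preconds:
            return True
        # The remaining, possibly unnecessary, preconditions only need to be
        # satisfied in a sufficiently large fraction.
        satisfied = len(bound_preconds & state)
        return satisfied / len(bound_preconds) >= threshold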
For example, Figure 5 illustrates an example problem and the solution generated using the above heuristics with a set of learned operators that are still incorrect and incomplete, and with *applicable-threshold* set to 0.7⁵. This plan is compared to a correct plan.

⁵The *applicable-threshold* is currently set manually; how to set it automatically is a part of future work.

    (a) Initial State:
    (connects dr12 rm1 rm2) (dr-to-rm dr12 rm2)
    (connects dr12 rm2 rm1) (dr-to-rm dr12 rm1)
    (unlocked dr12) (dr-open dr12)
    (inroom robot rm2) (arm-empty)
    (inroom a rm2) (inroom b rm2) (pushable c) (inroom c rm1)
    (inroom key12 rm1) (carriable key12) (is-key dr12 key12)

    (b) Goal Statement:
    (and (inroom key12 rm1) (inroom c rm2))

    Incorrect and incomplete plan generated by the planner with the above heuristics:
    <goto-dr dr12 rm2 rm1>
    <go-thru-dr rm2 rm1 c dr12>
    <carry-thru-dr rm1 c rm2 b c dr12>

    A correct plan that can be executed in the environment:
    <goto-dr dr12 rm2 rm1>
    <go-thru-dr rm2 rm1 c dr12>
    <push-to-dr dr12 c key12 c b rm1 rm2>
    <push-thru-dr dr12 c rm1 rm2 key12 c b>

Figure 5: A sample problem. The incorrect and incomplete plan generated by the planner using the above heuristics is compared with a correct plan that can be executed in the environment.

Plan Repair

Plan repair is necessary when a planned operator fails to execute in the environment because of unmet necessary preconditions. Since there can be multiple unmet preconditions, and since some of them may be unnecessary preconditions, it is not necessary to generate a plan to achieve all of them. Therefore, the plan repair mechanism generates a plan fragment to achieve one of them at a time. The plan repair algorithm is illustrated in Figure 6. In step 1, heuristics from variable bindings are used to determine the order in which the unmet preconditions are achieved. An unmet precondition of the highest priority is chosen in step 2 as the goal to achieve. If a goal is chosen, the system attempts to achieve it using the planning heuristics described previously. When a plan is thus generated, it is added to the initial plan to form the repaired plan. When the goal cannot be achieved, it is remembered as an unachievable precondition, and the system loops back to step 2 to achieve the next unmet precondition. When the failed operator still cannot be executed in the environment after the system has attempted to achieve each unmet precondition in the order determined in step 1, the system proposes another operator to achieve the goal by using the one-step dependency-directed plan repair heuristic: while generating bindings for the failed operator, bindings leading to unachievable preconditions are rejected.

    Input: inst-op that failed to execute, top-goals, state
    Output: repaired-plan
    1. Decide the order in which the unmet preconditions of inst-op are to be achieved
    2. Choose the highest priority unmet precondition as the goal g to achieve
    3. If g is chosen, do:
       3.1. Generate a plan to achieve g from state
       3.2. If the plan is not empty, it is added to the initial plan to form repaired-plan.
            Return repaired-plan.
       3.3. Otherwise, go to 2
    4. If g is empty, generate a plan to achieve the top goals, but avoid the operators
       with unachievable preconditions.

Figure 6: The Plan Repair Process.
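A Python sketch of the repair procedure in Figure 6. The helpers order_unmet_preconds and plan_for stand in for the variable-binding heuristics and for the planner with the heuristics above, and returning the fragment prepended to the remaining plan is our reading of step 3.2; all names are illustrative.

    # Sketch of the plan-repair procedure of Figure 6. order_unmet_preconds and
    # plan_for are stand-ins for the variable-binding heuristics and the planner;
    # plan_for is assumed to accept an optional set of preconditions to avoid.
    def repair_plan(inst_op, top_goals, state, remaining_plan,
                    order_unmet_preconds, plan_for, unachievable):
        # Step 1: order the unmet preconditions of the failed operator.
        unmet = order_unmet_preconds(inst_op, state)

        # Steps 2-3: try to achieve the unmet preconditions one at a time.
        for goal in unmet:
            fragment = plan_for([goal], state)
            if fragment:
                # 3.2: add the fragment to the plan to form the repaired plan.
                return fragment + remaining_plan
            unachievable.add(goal)           # 3.3: remember it, try the next one

        # Step 4: replan for the top-level goals, avoiding bindings that lead
        # to unachievable preconditions (one-step dependency-directed repair).
        return plan_for(top_goals, state, avoid=unachievable)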
The Plan Execution Module

The plan execution process is integrated with plan repair. For each operator in the initial plan, the plan execution module tests it in the environment. If it is executed, then the learning module refines the operator according to the states before and after the execution. If the operator is not executed, the plan repair mechanism is invoked to generate a repaired plan and the execution continues. This process is illustrated in Figure 7.

    Input: plan, goals, state
    Output: whether goals are achieved or not
    For each instantiated operator inst-op in plan, do:
    1. Test if inst-op is applicable in the environment or not
    2. If inst-op is applicable, then do:
       2.1. state <- apply inst-op in state
       2.2. Modify the operator with the execution
       2.3. Check if goals are achieved. If so, terminate
    3. If inst-op is not executed because of unmet preconditions, then do:
       3.1. Modify the operator with the un-executed attempt
       3.2. new-plan <- Repair-plan(inst-op, goals, state)
       3.3. If new-plan is empty, terminate execute-plan with failure
       3.4. Otherwise, abandon the initial plan and restart execution:
            execute-plan(new-plan, goals, state)

Figure 7: Integration of Plan Execution and Plan Repair.

Examples

Let us trace how the problem in Figure 5 is solved when the system has learned a set of operators that are incorrect or incomplete. The initial plan is shown in Figure 5. While executing the plan, the first two steps, i.e. <goto-dr dr12 rm2 rm1> and <go-thru-dr rm2 rm1 c dr12>, are executed in the environment. <carry-thru-dr rm1 c rm2 b c dr12> is not, because of unmet preconditions. Therefore, the plan repair procedure is called. The unmet preconditions of this operator are (holding c), (carriable c), and (pushable b), and the planner decides to first achieve (holding c) using the variable-bindings heuristics. The following two steps are added to the initial plan through plan repair:

    <goto-obj dr12 c rm1 rm2 rm2 rm1 dr12>
    <pickup-obj c key12 c rm1 rm1 dr12>

<goto-obj dr12 c rm1 rm2 rm2 rm1 dr12> is executed, but <pickup-obj c key12 c rm1 rm1 dr12> fails. The only unmet precondition of this operator is (carriable c); therefore, it is marked as a necessary precondition of the operator by the learning module:

    Marking (carriable <v140>) as a necessary precondition of the operator PICKUP-OBJ.

Since (carriable c) cannot be achieved by the planner, <carry-thru-dr rm1 c rm2 b c dr12> thus fails, and its unachievable precondition is (holding c). The planner generates a new plan to achieve the top-level goals (inroom key12 rm1) and (inroom c rm2) using the one-step dependency-directed plan repair heuristic while avoiding the operators with the unachievable precondition (holding c). The new plan is:

    <push-thru-dr dr12 c rm1 rm2 key12 c b>

<push-thru-dr dr12 c rm1 rm2 key12 c b> fails to execute, and the unmet preconditions are ((next-to robot dr12) (next-to c dr12) (inroom key12 rm2)). A new plan step <push-to-dr dr12 key12 c b rm1 rm2> is generated to achieve the precondition (next-to robot dr12), and it is executed. Finally, <push-thru-dr dr12 c rm1 rm2 key12 c b> is also executed. The learning module notices that the precondition (inroom <v184> <v178>) is not satisfied in the state when <push-thru-dr dr12 c rm1 rm2 key12 c b> is executed, and therefore it is an unnecessary precondition of the operator PUSH-THRU-DR:

    Removing unnecessary precondition (inroom <v184> <v178>) from the operator PUSH-THRU-DR.

At this point, the system has achieved all the top-level goals.
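The trace above is driven by the execute-plan loop of Figure 7. A compact Python sketch of that loop is given below; the env object, the refinement callbacks, and the repair function follow the earlier sketches and are our assumptions about how the pieces fit together.

    # Sketch of the execute-plan loop of Figure 7. env, the refinement callbacks,
    # and repair are stand-ins for the modules described above; repair is assumed
    # to wrap the repair_plan sketch with its helper functions already bound.
    # goals and state are sets of ground literals.
    def execute_plan(plan, goals, state, env, refine_on_success,
                     refine_on_failure, repair):
        for i, inst_op in enumerate(plan):
            if env.applicable(inst_op, state):                # step 1
                new_state = env.apply(inst_op, state)         # 2.1
                refine_on_success(inst_op, state, new_state)  # 2.2: learn from it
                state = new_state
                if goals <= state:                            # 2.3: goals achieved
                    return True
            else:                                             # step 3
                refine_on_failure(inst_op, state)             # 3.1: record unmet preconds
                new_plan = repair(inst_op, goals, state, plan[i:])  # 3.2
                if not new_plan:
                    return False                              # 3.3: give up
                # 3.4: abandon the current plan, restart on the repaired one.
                return execute_plan(new_plan, goals, state, env,
                                    refine_on_success, refine_on_failure, repair)
        return goals <= state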
Empirical Results

The learning algorithm described in this paper has been tested in the extended-STRIPS domain (Minton 1988) and a subset of the process planning domain (Gil 1991). It also easily learns operators in simpler domains such as the blocksworld domain and the logistics domain. As already discussed, the learned operators always have all the correct preconditions (except for negated preconditions), but they may also have unnecessary preconditions that can be removed with more observations and practice. Therefore, we use the percentage of unnecessary preconditions in the learned operators to measure the effectiveness of the learning system. We also measure the number of problems solved with the learned operators, because the goal of learning is to solve problems in the domains.

The following table shows the system's performance in the extended-strips and process planning domains. Both the training and the practice problems in these domains are randomly generated. In the extended-strips domain, these problems have up to 3 goals and solution lengths of 5 to 10 operators. In the process planning domain, these problems have up to 3 goals and an average solution length of 20. We are currently performing more extensive tests in the complete process planning domain, which has a total of 76 operators. In both domains, we see that the learned operators can be used to solve the practice problems effectively, and the unnecessary preconditions of the learned operators are removed with practice so that the operators will eventually converge to the correct ones.

                                                     extended-strips   process planning
    # of training problems                                  7                 20
    # of learned operators                                 21                 12
    % of unnecessary preconditions                         45%                43%
    # of practice problems                                 32                 10
    # of practice problems solved                          26                 10
    % of practice problems solved                          81%               100%
    % of unnecessary preconditions (after practice)        25%                38%

Related Work

There is considerable interest in learning for planning systems. Most of this work concentrates on learning search control knowledge (Fikes, Hart, & Nilsson 1972; Mitchell, Utgoff, & Banerji 1983; Minton 1988; Veloso 1992). EXPO (Gil 1992) is a learning-by-experimentation module for refining incomplete planning operators. Learning is triggered when plan execution monitoring detects a divergence between internal expectations and external observations. Our system differs in that: 1) the initial knowledge given to the two systems is different. EXPO starts with a set of operators that are missing some preconditions and effects; our system starts with no knowledge about the preconditions or the effects of the operators. 2) EXPO only copes with incomplete domain knowledge, i.e. over-general preconditions and incomplete effects of the operators, while our system also copes with incorrect domain knowledge, i.e. operators with over-specific preconditions. 3) Operators are learned from general-to-specific in EXPO, while they are learned from specific-to-general in our system.

LIVE (Shen 1989) is a system that learns and discovers from the environment. It integrates action, exploration, experimentation, learning, and problem solving. It also constructs new terms, which our system does not deal with. It does not learn from observing others, and therefore it cannot learn search heuristics for planning. The preconditions of the operators in LIVE are learned mostly from general to specific, although over-specific preconditions can be eliminated using a complementary discrimination learning algorithm.
Our system deals explicitly with incomplete and incorrect operators by using a set of domain-independent heuristics for planning and plan repair, whereas LIVE depends on a set of domain-dependent search heuristics.

On the planning side, some planning systems do plan repair (Simmons 1988; Wilkins 1988; Hammond 1989; Kambhampati 1990). However, all these plan repair systems use a correct domain model. In our system, on the other hand, a correct domain model is not available; therefore it is necessary to plan and to repair plans using incomplete and incorrect domain knowledge. Our plan repair method also gives feedback to the learning module, which is not the case for the other plan repair systems.

Future Work and Conclusions

Future work includes relaxing some of the assumptions described in this paper. For example, if there is noise in the sensors, we can have a threshold for each precondition so that a predicate is removed only when it is absent from the state more often than the threshold.

In conclusion, we have described a framework for learning planning operators by observation and practice and have demonstrated its effectiveness in several domains. In our framework, operators are learned from specific-to-general through an integration of plan execution as well as planning and plan repair with possibly incorrect operators.

References

Anzai, Y., and Simon, H. A. 1979. The Theory of Learning by Doing. Psychological Review 86:124-140.

Carbonell, J.; Blythe, J.; Etzioni, O.; Gil, Y.; Joseph, R.; Kahn, D.; Knoblock, C.; Minton, S.; Pérez, M. A.; Reilly, S.; Veloso, M.; and Wang, X. 1992. PRODIGY 4.0: The Manual and Tutorial. Technical report, School of Computer Science, Carnegie Mellon University.

Carbonell, J. G.; Knoblock, C. A.; and Minton, S. 1990. PRODIGY: An Integrated Architecture for Planning and Learning. In VanLehn, K., ed., Architectures for Intelligence. Hillsdale, NJ: Erlbaum.

Fikes, R. E., and Nilsson, N. J. 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence 2(3-4):189-208.

Fikes, R. E.; Hart, P. E.; and Nilsson, N. J. 1972. Learning and Executing Generalized Robot Plans. Artificial Intelligence 3:251-288.

Gil, Y. 1991. A Specification of Manufacturing Processes for Planning. Technical Report CMU-CS-91-179, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.

Gil, Y. 1992. Acquiring Domain Knowledge for Planning by Experimentation. Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.

Hammond, K. 1989. Case-Based Planning: Viewing Planning as a Memory Task. Academic Press.

Kambhampati, S. 1990. A Theory of Plan Modification. In Proceedings of the Eighth National Conference on Artificial Intelligence.

Minton, S. 1988. Learning Search Control Knowledge: An Explanation-Based Approach. Kluwer Academic Publishers.

Mitchell, T.; Utgoff, P.; and Banerji, R. 1983. Learning by Experimentation: Acquiring and Refining Problem-Solving Heuristics. In Machine Learning: An Artificial Intelligence Approach, Volume 1. Palo Alto, CA: Tioga Press.

Newell, A. 1982. The Knowledge Level. Artificial Intelligence 18(1):87-127.

Shen, W. 1989. Learning from Environment Based on Percepts and Actions. Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.

Simmons, R. 1988. A Theory of Debugging Plans and Interpretations. In Proceedings of the Sixth National Conference on Artificial Intelligence.

Veloso, M. 1992. Learning by Analogical Reasoning in General Problem Solving. Ph.D.
Dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.

Wilkins, D. 1988. Practical Planning: Extending the Classical AI Planning Paradigm. Morgan Kaufmann.

Acknowledgments

Thanks to Jaime Carbonell for many discussions and insights in this work. This manuscript is submitted for publication with the understanding that the U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes, notwithstanding any copyright notation thereon.