From: AAAI Technical Report FS-93-03. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.

Integrating Reactive and Deliberative Planning in a Household Robot

Jim Blythe and W. Scott Reilly
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
jblythe+@cs.cmu.edu and wsr+@cs.cmu.edu

August 20, 1993

Abstract

Autonomous agents that respond intelligently in dynamic, complex environments need to be both reactive and deliberative. Reactive systems have traditionally fared better than deliberative planners in such environments, but are often hard to code and inflexible. To fill in some of these gaps, we propose a hybrid system that exploits the strengths of both reactive and deliberative systems. We demonstrate how our system controls a simulated household robot and compare our system to a purely reactive one in this domain.

1. Introduction

The typical household is a challenging environment for a robot, partly because of the presence of other agents. The robot must sense and respond to the other agents as it completes its task. In addition, other agents may move furniture and create new cleaning tasks; this leads to a less structured environment than, for example, a nuclear power plant or a road-following task. Thus it will be impossible to pre-program a vacuuming and cleaning robot to perform "blind", without sensing the environment and monitoring the progress of tasks. Some of the changes in the environment will demand reactive responses from the robot, such as interrupting a less important task to pick up a piece of trash. Thus a vacuuming robot must be flexible enough in its design to respond intelligently to a changing set of goals as well as a changing environment.

Such a robot needs to display a range of capabilities not typically found in a single system. Deliberative systems that embody powerful techniques for reasoning about actions and their consequences often fail to guarantee a timely response in time-critical situations [6].
Also, reactive systems that respond well in time-critical situations typically do not provide a reasonable response in situations unforeseen by the designer [5].

Reactive systems have traditionally been more successful than deliberative ones in controlling agents in dynamic domains, like a typical household. Building a reactive system, however, can be a complex and time-consuming endeavor because of the need to pre-code all of the behaviors of the system for all foreseeable circumstances. An efficient reactive system is also likely to have a narrow area of applicability, for instance a specific house and task set. Deliberative systems are best suited to long-term, off-line planning: effective for static environments, but not for controlling an autonomous agent operating in a dynamic environment. However, deliberative systems are often more robust to varying domains and sets of interacting goals because they employ some form of forward projection.

We have designed a hybrid reactive-deliberative system in an attempt to combine these complementary sets of capabilities, and use it to control a simulated vacuuming robot. In this paper, we compare the performance of the hybrid system with a purely reactive one hand-coded for the same domain. Although the performance of the hybrid system is inferior, we demonstrate that it is close enough to be an attractive option because of its flexibility.

2. The Architecture

2.1. The Component Systems: Hap & Prodigy

The Hap system is designed to execute plans for achieving multiple, prioritized goals. It is related to Firby's RAPs [8]. Hap starts with a set of pre-defined goals and hand-coded plans for achieving those goals. The hand-coded plans are designed to allow Hap to respond to its environment in a timely manner. Hap also allows for demons that dynamically create new goals in appropriate situations. If such a goal has a higher priority than the other active goals, execution of the current plan is interrupted in favor of a plan to achieve the new goal.
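The priority-and-demon mechanism just described can be illustrated with a small sketch. This is our own minimal reconstruction in Python, not Hap's actual production system; all class and function names here are invented for illustration.

```python
import heapq

class Goal:
    """A prioritized goal with a saved resume point, so an interrupted
    plan can continue where it left off."""
    def __init__(self, name, priority, plan):
        self.name, self.priority, self.plan = name, priority, plan
        self.step = 0

class ReactiveAgent:
    def __init__(self, demons):
        self.demons = demons   # callables mapping percepts -> Goal or None
        self.active = []       # heap ordered by (-priority, arrival order)
        self.seq = 0

    def post(self, goal):
        heapq.heappush(self.active, (-goal.priority, self.seq, goal))
        self.seq += 1

    def tick(self, percepts):
        # Demons may create new goals from the current percepts; a
        # higher-priority goal preempts whatever plan is executing.
        for demon in self.demons:
            new_goal = demon(percepts)
            if new_goal is not None:
                self.post(new_goal)
        if not self.active:
            return None
        _, _, goal = self.active[0]        # highest-priority active goal
        action = goal.plan[goal.step]
        goal.step += 1
        if goal.step == len(goal.plan):    # plan complete: drop the goal
            heapq.heappop(self.active)
        return action
```

For example, a demon that posts a high-priority recharge goal when the battery runs low will interrupt a vacuuming plan, which then resumes from its saved step.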
Hap will attempt to resume its previous plan once it has handled this unexpected event. More details about Hap can be found in [13].

Prodigy 4.0 is a classical deliberative planner that uses means-ends search to create plans from descriptions of operators, given initial and goal state descriptions. Goal statements as well as the preconditions of operators may be arbitrary expressions of first-order logic, involving existential and universal quantification. Prodigy's planning involves a number of choice points, such as choosing a goal to work on or an operator to use, and Prodigy uses control rules to represent information about which choices to make. More details about Prodigy can be found in [2].

A classical planner may take an arbitrary amount of time to complete its task; this is unacceptable for an agent that may require a rapid response. For this reason, we have modified Prodigy to be an instance of an anytime planner [4]. This means that the planner can be interrupted at any moment before it completes its task, and will return a portion of a complete plan that will allow the agent to take some actions. This has been done such that the planner is changed only minimally, allowing us to make use of the large body of work on classical planning systems, such as abstraction [12], machine learning to improve planning performance [14, 7, 10] and derivational analogy [16, 11]. More details about the anytime planner can be found in [3].

2.2. Integration

Hap is designed to react quickly and intelligently in a dynamic environment by using stored behaviors when possible. Prodigy is designed to plan for sets of goals that may interact, and to learn to plan more effectively. We integrate these two systems so as to retain the strengths of each, giving primary control of the agent to Hap.

Hap keeps the tree of goals and plans that the agent is pursuing up to date. When there are stored plans available or there is a strict time constraint, Hap will usually act in a pre-programmed way. When Hap has extra time to act or there is no stored plan to handle the current situation, Hap may call Prodigy. Planning is not distinguished from anything else Hap does, so the conditions under which Prodigy is called are contained in the pre-defined Hap productions. This allows the agent builder to create agents that vary along the spectrum between deliberation and reaction as desired. The call to Prodigy includes a predefined subset of the agent's goals for Prodigy to work on and a time bound, so that the agent does not lose reactivity even though planning would otherwise take an arbitrary amount of time to complete.

Since Prodigy reasons about the effects of its actions in the future, it can produce plans that will achieve a number of goals simultaneously, even if they interact in unexpected ways and contend for the same resources. A pre-programmed approach may fail or produce less efficient behaviour because of these interactions. Prodigy is given a time bound and uses anytime planning techniques to produce the best plan it can within the allowed time. The goals passed to Prodigy are prioritized so that if it cannot solve all the goals in time it will solve the ones with higher priorities.

The integration works, in part, because the communication between Hap and Prodigy takes place at an appropriate level of abstraction, both of state and operators. Like other researchers [9, 15], we have used an abstraction boundary between the reactive and deliberative components of our architecture. Deliberative planners typically assume a static or near-static world. While this allows the planner to construct a plan, in a dynamic domain this plan will frequently fail because some of its underlying assumptions have become false since the plan was constructed. This is one of the primary reasons deliberative planners have been unsuccessful as agent architectures. We can improve the quality of the plans by making the planner plan in a more static state. We accomplish this by providing the planner with an abstraction of the state that includes the more static elements and allowing it to plan in that state.

Prodigy uses the abstract state to generate an abstract plan, that is, a plan that is a sequence of Hap subgoals rather than a set of concrete actions. Hap uses its set of stored reactive plans to execute these subgoals in the dynamic world. Hap is able to fill in the dynamic details that Prodigy did not plan for. Also, Hap will often have numerous alternative plans for achieving a subgoal, so a single Prodigy plan can generate very different behavior depending on the current state of the world.

3. Mr. Fixit, a Household Robot

We have designed and built a simulated household robot agent, Mr. Fixit, using this architecture. Mr. Fixit lives in a simulated environment built with the Oz system [1]. Figure 1 gives a rough layout of the environment that Mr. Fixit inhabits. Many of the details of the world, which includes over 80 objects, have been left out as they are unimportant for the behaviors described here. The player is a human user that is also interacting with the simulation.

[Figure 1: Rough layout of the simulated household robot environment.]

The robot has a number of important goals that define its behavior. They are, from most to least important: recharge battery when low, clean up broken objects, greet the player, vacuum dirty rooms, and roam the house looking for tasks to perform. Mr. Fixit keeps a simple model of the world, but given his limited sensing abilities and the existence of an unpredictable user in the world, his model will often be incomplete or incorrect.

    Player: *BREAK China-Cup
    Fixit:  *GO-TO Sunroom
    [Plan: Goal: (and (recharge) (thrown-out-trash))
           Time limit: 1 second
           Plan: ((goto recharger) (recharge) (goto cup)
                  (goto trash-can) (put cup trash-can))]
    Fixit:  *GO-TO Bedroom
            *GO-TO Closet
            *RECHARGE
            *GO-TO Bedroom
            *TAKE China-Cup
            *GO-TO Sunroom
            *GO-TO Spare-Room
            *PUT China-Cup in Trash-Can
    [Plan: Goal: (clean-dining-room)
           Time limit: 1 second
           Plan: ((goto dining-room) (vacuum dining-room))]
    Fixit:  *GO-TO Sunroom
            *GO-TO Dining-Room
    Player: *GO-TO Sunroom
    Fixit:  *VACUUM Dining-Room
    Player: *GO-TO Dining-Room
    Fixit:  *SAY "Hello." to Player
    Player: *BREAK Jar
    [Plan: Goal: (and (clean-dining-room) (thrown-out-trash))
           Time limit: 1 second
           Plan: ((goto jar) (get jar) (goto trash-can)
                  (put jar in trash-can) (goto dining-room)
                  (vacuum dining-room))]
    Fixit:  *TAKE Jar
    [Plan: Goal: (and (clean-dining-room) (thrown-out-trash) (recharge))
           Time limit: 0.5 seconds
           Plan: ((goto recharger) (recharge))]
    Fixit:  *GO-TO Sunroom
            *GO-TO Bedroom
    Player: *GO-TO Sunroom
    Fixit:  *GO-TO Closet
            *RECHARGE
    [Plan: Goal: (and (clean-dining-room) (thrown-out-trash))
           Time limit: 1 second
           Plan: ((goto trash-can) (put jar trash-can)
                  (goto dining-room) (vacuum dining-room))]
    Fixit:  *GO-TO Bedroom
            *GO-TO Spare-Room
            *PUT Jar in Trash-Can
            *GO-TO Sunroom
            *SAY "Hello." to Player
            *GO-TO Dining-Room
            *VACUUM Dining-Room

Figure 2: Sample trace of the household robot.

                          Number of goals
    Time bound (seconds)    1      2      4      8
           0.25           0.90   0.80   0.38   0.13
           0.50           0.98   0.96   0.52   0.16
           1.00           1.00   1.00   1.00   0.62
           2.00           1.00   1.00   1.00   1.00

Table 1: Average proportion of goals solved.

Figure 2 shows a trace of a run of this simulation, which has been edited for brevity and clarity. The trace begins with Mr. Fixit in the spare room performing rounds while the Player is in the bedroom. The player breaks the cup. Fixit notices the broken cup at the same time that his battery needs recharging. Prodigy is notified of the two goals and is able to find a plan to solve them both within one second. Fixit executes the plan without any problems. During the execution of the plan Fixit goes by the Player twice without stopping for the traditional greeting. This is because the current plan being executed has a higher priority than greeting the user.
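The pattern this trace illustrates, where Hap hands Prodigy a prioritized goal set with a time bound, Prodigy returns an abstract plan of Hap subgoals, and Hap expands each subgoal with a stored reactive plan, can be sketched as follows. This is an illustrative Python reconstruction under assumptions of ours (the toy planner's fixed cost of 0.5 seconds per goal is invented), not the actual Hap/Prodigy interface.

```python
def hybrid_step(pending_goals, stored_plans, planner, time_limit):
    """One deliberation episode: give the planner a prioritized goal set
    and a time bound, then expand the abstract plan reactively."""
    # The planner returns a sequence of abstract subgoals, not concrete
    # actions; under a tight bound it may cover only the higher-priority goals.
    subgoals = planner(pending_goals, time_limit)
    actions = []
    for subgoal in subgoals:
        # The reactive layer fills in the concrete steps for each subgoal.
        actions.extend(stored_plans[subgoal]())
    return actions

def toy_planner(goals, time_limit):
    """Stand-in for the deliberative planner: pretend each goal costs
    0.5 s to plan for, and cover as many as the bound allows, by priority."""
    affordable = max(1, int(time_limit / 0.5))
    ranked = sorted(goals, key=lambda g: g[1], reverse=True)
    return [name for name, _ in ranked[:affordable]]

stored_plans = {
    "recharge": lambda: ["goto recharger", "recharge"],
    "thrown-out-trash": lambda: ["goto cup", "take cup",
                                 "goto trash-can", "put cup in trash-can"],
}

# With a 1 second bound both goals fit; recharge is planned first.
actions = hybrid_step([("thrown-out-trash", 3), ("recharge", 9)],
                      stored_plans, toy_planner, time_limit=1.0)
```

Mirroring the half-second episode in the trace, a tighter bound yields a plan for the high-priority recharge goal alone, leaving the remaining goals for a later call.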
Once the two goals are accomplished, Fixit notifies Prodigy of a goal to clean the dining room, which is planned for successfully. During the execution of this plan, the player enters the dining room and is greeted, because vacuuming is a low-priority goal. Despite the warm greeting, the Player decides to break the jar. Fixit notices this and plans to clean up the mess and then return to vacuuming. While cleaning up the broken jar, Fixit's battery again gets low. The three pending goals are sent to Prodigy along with a half-second time limit. Prodigy is able to solve only the goal to recharge in this amount of time, and Fixit executes that plan successfully. Once that is done, Fixit calls Prodigy to replan for the other two goals and executes them.

3.1. Discussion

One of our main concerns in building an agent like Mr. Fixit is how quickly Prodigy is able to plan for goals in the domain. If Prodigy isn't given enough time to generate plans for even single goals, then the agent is going to be stuck. On the other hand, if we know that Prodigy is generally going to be given more than enough time to achieve some subset of goals, then there are tradeoffs in speed vs. plan interleaving. For example, say Fixit knows about 2 broken objects that need to be thrown out. If Prodigy only has time to solve one of the goals, it will return a plan that may be sub-optimal with respect to solving both goals.

We tested Prodigy's ability to solve goals quickly in this domain by giving it problems to solve with varying numbers of goals and varying time bounds. The results of the test are reproduced in table 1. When Prodigy is given a 2 second time bound, it can always solve the goal conjunct in this relatively simple domain, making the anytime planning redundant. With smaller time bounds, Prodigy is only able to solve a proper subset of the goals. If Prodigy only returned complete solutions, Hap wouldn't get any useful information in these time-critical cases.
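One way to realize the behaviour measured in table 1, attempting goals in priority order and returning whatever is complete when the bound expires, is sketched below. This is our own illustrative wrapper, not the modification actually made to Prodigy [3]; `solve_one` stands in for planning a single goal.

```python
import time

def anytime_plan(goals, solve_one, bound_s):
    """Plan for (name, priority) goals in priority order; stop at the
    deadline and return the plans completed so far."""
    start, plan, solved = time.monotonic(), [], []
    for name, _priority in sorted(goals, key=lambda g: g[1], reverse=True):
        remaining = bound_s - (time.monotonic() - start)
        if remaining <= 0:
            break                           # deadline hit: return best so far
        steps = solve_one(name, remaining)  # should itself give up in time
        if steps is not None:
            plan.extend(steps)
            solved.append(name)
    return plan, solved
```

With a generous bound every goal is solved; with a zero bound the caller gets an empty but immediate answer rather than blocking, which is exactly the property an agent needing a rapid response requires.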
This information can be used to guide the design of the domain specification given to Prodigy. If we know that Prodigy will often only have 0.25 or 0.5 seconds with which to work, we will want to generate simple serial plans. If the domain allows Prodigy to take 2 or more seconds, we can write control rules for Prodigy that will produce more efficient plans but that will usually take longer before completely planning for any single goal.

4. Experiments: Deliberative+Reactive vs. Reactive

We used the household robot domain as a testing ground for our architecture. We have already discussed some of the obvious benefits to using a deliberative planner as part of an agent architecture, but if the architecture doesn't perform well, these benefits may be overshadowed. To evaluate our architecture we decided to test it against a purely reactive agent. This agent was written entirely in Hap and was designed specifically for this domain. In general, the hand-coded plans were similar to the ones that Prodigy generated, but we also added specific interleaved plans for throwing out multiple pieces of garbage.

We did not expect the hybrid agent to do quite as well as the reactive agent, but if our architecture could come close, then we can reap the benefits of using a deliberative architecture without concern for losing reactivity or performance efficiency.

The domain described in section 3 was modified as follows: (1) We changed the procedure for throwing out garbage so that the robot first had to get a bag out of a cabinet in the kitchen, then put the trash in the bag, then put the bag in a trash bin in the sunroom; (2) A second robot, the Destructo2000, was added to the environment. This robot would generate cups and break them on the floor with some probability that we could control; (3) A (slightly unrealistic) phone was placed in the bedroom. With a 10% chance the phone would ring during any turn it was off the hook. If the robot hadn't answered the previous call, it was lost. Until another call came in, the previous caller would keep ringing; (4) All the rooms started in need of vacuuming and didn't become dirty again once vacuumed; (5) The player was removed from the simulation; (6) The new goal priority ordering was (from most to least important): recharge battery, answer phone, throw out trash, vacuum rooms.

The phone and the battery created occasional interrupts in behavior that were kept constant over every run. We were able to control how dynamic the environment was by changing the chance that the Destructo2000 would drop a cup. We ran both the hybrid and pure reactive robots with this chance at 5%, 8%, and 10%. We also ran the hybrid system at 3%.

The planner in the hybrid system had a 3 second time bound. We feel this was reasonable because, although a household robot might stand a longer delay without losing reactivity, its world model is also likely to be more complicated. Although we didn't do as complete an analysis of this version of the domain as was described in section 3.1, some informal experiments showed that the plans required to handle the new trash procedure required enough time that 3 seconds was not enough to consistently create interleaved plans for throwing out garbage. Because of this, the hybrid agent always generated serialized plans for throwing out trash. These were sub-optimal plans, but Prodigy was always able to solve at least one goal in the allotted time.

Figure 3 graphs how well the hybrid robot did in keeping up with the Destructo2000. It's fairly clear that at 5%, 8%, and 10%, Mr. Fixit is falling further and further behind, but at 3% he is able to keep up. The purely reactive agent that we created had hand-coded plans for throwing out multiple pieces of trash simultaneously, tuned to the layout of the building. Because of this we expected the reactive agent to perform better on this task. Our expectations were realized, and figure 4 shows how at 10% the reactive agent is falling behind, but at both 8% and 5% the robot seems able to keep up.

The relative performance of the two systems in picking up cups was expected. That the hybrid system fared as well as it did on this task is reassuring. Admittedly, this is still a relatively simple domain, so differences in performance will tend to be small, but at the same time moving to more complex domains will make it even harder for builders of reactive agents to create a complete and efficient set of behaviors for all situations.

A somewhat surprising result was how well the hybrid system did on the other tasks it was given. First, both agents performed at 100% on battery recharging, which was the most important goal the agent was given. Second, the hybrid agent was able to answer the phone at a rate of 78% as compared to 61% for the reactive agent. Third, at cleaning rooms, the hybrid agent was able to do almost as well as the reactive agent in the 10% domain and even slightly better in the 5% domain. This is shown in figure 5. We expect that the hybrid system outperforming the reactive system on these tasks is either a matter of a limited data set (10 runs for each robot-environment pair) or a product of the logistics of this domain, and not attributable to our architecture.

Each time cycle in the simulations with the reactive agent took 9.2 seconds. This includes both agents, the physical world simulation, and the data gathering. When we changed to a hybrid system, this increased to 9.7 seconds. So on average, the hybrid agent only took 0.5 seconds longer to choose an action than the reactive agent.

[Figure 3: Hybrid robot cleaning up cups (3%, 5%, 8%, and 10% break rates with planning; x-axis: time in simulation units).]

[Figure 4: Pure reactive robot cleaning up cups (5%, 8%, and 10% break rates without planning; x-axis: time in simulation units).]

[Figure 5: Robots vacuuming rooms (5% and 10% break rates, with and without planning; x-axis: time in simulation units).]

5. Conclusions

This preliminary version of the architecture can be improved in several obvious places. For example, our current model of anytime planning only gives credit to states that solve one or more top-level goals. If no goal can be completely solved within the time bound, the planner will be unable to suggest an action. The planner is unable to make use of partial plans built during previous calls from Hap, and we have not yet investigated the impact of Prodigy's machine learning capabilities on this task.

The hybrid architecture we built provides power to agent builders which will be useful in designing agents for a variety of domain types. In highly dynamic domains where quick action is vital, the agent builder can put reactive behavior to deal with most situations into Hap and design the Prodigy system to return quickly. This will often be at the expense of plan quality because the interactions between goals will not be explored. In more stable domains, Prodigy can be given more time to create efficient plans.
In fact, in some situations it might be the case that Prodigy is able to solve some resource-critical problem that a reactive system might not solve at all.

References

[1] Joseph Bates. Virtual reality, art, and entertainment. PRESENCE: Teleoperators and Virtual Environments, 1(1):133-138, 1992.

[2] Jim Blythe, Oren Etzioni, Yolanda Gil, Robert Joseph, Alicia Pérez, Scott Reilly, Manuela Veloso, and Xuemei Wang. Prodigy 4.0: The manual and tutorial. Technical report, School of Computer Science, Carnegie Mellon University, 1992.

[3] Jim Blythe and W. Scott Reilly. Integrating reactive and deliberative planning for agents. Technical Report CMU-CS-93-135, Carnegie Mellon University, May 1993.

[4] Mark Boddy. Solving Time-Dependent Problems: A Decision-Theoretic Approach to Planning in Dynamic Environments. PhD thesis, Brown University, May 1991.

[5] Rodney Brooks. Integrated systems based on behaviors. In Proceedings of AAAI Spring Symposium on Integrated Intelligent Architectures, Stanford University, March 1991. Available in SIGART Bulletin, Volume 2, Number 4, August 1991.

[6] David Chapman. Planning for conjunctive goals. Artificial Intelligence, 32:333-378, 1987.

[7] Oren Etzioni. A Structural Theory of Explanation-Based Learning. PhD thesis, School of Computer Science, Carnegie Mellon University, 1990. Available as technical report CMU-CS-90-185.

[8] James R. Firby. Adaptive Execution in Complex Dynamic Worlds. PhD thesis, Department of Computer Science, Yale University, 1989.

[9] Erann Gat. On the role of stored internal state in the control of autonomous mobile robots. AI Magazine, 14(1):64-73, Spring 1990.

[10] John Gratch and Gerry DeJong. Composer: A probabilistic solution to the utility problem in speed-up learning. In AAAI-92, 1992.

[11] Kristian J. Hammond. Case-based Planning: An Integrated Theory of Planning, Learning and Memory. PhD thesis, Yale University, 1986.

[12] Craig A. Knoblock, Josh D. Tenenberg, and Qiang Yang. Characterizing abstraction hierarchies for planning. In National Conference on Artificial Intelligence, 1991.

[13] A. Bryan Loyall and Joseph Bates. Hap: A reactive, adaptive architecture for agents. Technical Report CMU-CS-91-147, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, June 1991.

[14] Steven Minton. Learning Effective Search Control Knowledge: An Explanation-Based Approach. Kluwer, Boston, MA, 1988.

[15] Alberto Segre and Jennifer Turney. Planning, acting and learning in a dynamic domain. In S. Minton, editor, Machine Learning Methods for Planning. Morgan Kaufmann Publishers, San Mateo, CA, 1993.

[16] Manuela M. Veloso. Learning by Analogical Reasoning in General Problem Solving. PhD thesis, Carnegie Mellon University, August 1992.