
From: AAAI Technical Report FS-93-03. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.
Integrating Reactive and Deliberative Planning in a Household Robot
Jim Blythe and W. Scott Reilly
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
jblythe+@cs.cmu.edu and wsr+@cs.cmu.edu
August 20, 1993
Abstract

Autonomous agents that respond intelligently in dynamic, complex environments need to be both reactive and deliberative. Reactive systems have traditionally fared better than deliberative planners in such environments, but are often hard to code and inflexible. To fill in some of these gaps, we propose a hybrid system that exploits the strengths of both reactive and deliberative systems. We demonstrate how our system controls a simulated household robot and compare our system to a purely reactive one in this domain.
1. Introduction
The typical household is a challenging environment for a robot, partly because of the presence of other agents. The robot must sense and respond to the other agents as it completes its task. In addition, other agents may move furniture and create new cleaning tasks; this leads to a less structured environment than, for example, a nuclear power plant or a road-following task. Thus it will be impossible to pre-program a vacuuming and cleaning robot to perform "blind", without sensing the environment and monitoring the progress of tasks. Some of the changes in the environment will demand reactive responses from the robot, such as interrupting a less important task to pick up a piece of trash. Thus a vacuuming robot must be flexible enough in its design to respond intelligently to a changing set of goals as well as a changing environment.
Such a robot needs to display a range of capabilities not typically found in a single system. Deliberative systems that embody powerful techniques for reasoning about actions and their consequences often fail to guarantee a timely response in time-critical situations [6]. Conversely, reactive systems that respond well in time-critical situations typically do not provide a reasonable response in situations unforeseen by the designer [5].
Reactive systems have traditionally been more successful than deliberative ones in controlling agents in dynamic domains, like a typical household. Building a reactive system, however, can be a complex and time-consuming endeavor because of the need to pre-code all of the behaviors of the system for all foreseeable circumstances. An efficient reactive system is also likely to have a narrow area of applicability, for instance a specific house and task set. Deliberative systems are best suited to long-term, off-line planning: effective for static environments, but not for controlling an autonomous agent operating in a dynamic environment. However, deliberative systems are often more robust to varying domains and sets of interacting goals because they employ some form of forward projection.
We have designed a hybrid reactive-deliberative system in an attempt to combine these complementary sets of capabilities, and use it to control a simulated vacuuming robot. In this paper, we compare the performance of the hybrid system with a purely reactive one hand-coded for the same domain. Although the performance of the hybrid system is inferior, we demonstrate that it is close enough to be an attractive option because of its flexibility.
2. The Architecture

2.1. The Component Systems: Hap & Prodigy
The Hap system is designed to execute plans for achieving multiple, prioritized goals. It is related to Firby's RAPs [8]. Hap starts with a set of pre-defined goals and hand-coded plans for achieving those goals. The hand-coded plans are designed to allow Hap to respond to its environment in a timely manner. Hap also allows for demons that dynamically create new goals in appropriate situations. If such a goal has a higher priority than the other active goals, execution of the current plan is interrupted in favor of a plan to achieve the new goal. Hap will attempt to resume its previous plan once it has handled this unexpected event. More details about Hap can be found in [13].
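The interrupt-and-resume behavior just described can be sketched in a few lines. This is an illustrative toy, not Hap itself: the class names and the demon protocol are our own inventions, and real Hap productions are considerably richer.

```python
import heapq

class Goal:
    """A goal with a priority and a hand-coded plan (a list of actions)."""
    def __init__(self, name, priority, plan):
        self.name, self.priority, self.plan = name, priority, list(plan)

class HapLike:
    """Toy interpreter: demons post goals; the highest-priority goal runs,
    and a newly posted higher-priority goal interrupts it.  Interrupted
    goals stay on the agenda, so they resume where they left off."""
    def __init__(self, demons):
        self.demons = demons          # functions: state -> Goal or None
        self.agenda = []              # max-heap via negated priority

    def post(self, goal):
        heapq.heappush(self.agenda, (-goal.priority, id(goal), goal))

    def step(self, state, act):
        for demon in self.demons:     # demons may create new goals
            g = demon(state)
            if g:
                self.post(g)
        if self.agenda:               # run one action of the top goal
            _, _, top = self.agenda[0]
            act(top.plan.pop(0))
            if not top.plan:          # plan finished: drop the goal
                heapq.heappop(self.agenda)
```

With a demon that posts a high-priority recharge goal when the battery runs low, a vacuuming plan is interrupted mid-execution and resumed after recharging, mirroring the behavior described above.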
Prodigy 4.0 is a classical deliberative planner that uses means-ends search to create plans from descriptions of operators, given initial and goal state descriptions. Goal statements as well as the preconditions of operators may be arbitrary expressions of first-order logic, involving existential and universal quantification. Prodigy's planning involves a number of choice-points, such as choosing a goal to work on or an operator to use, and Prodigy uses control rules to represent information about which choices to make. More details about Prodigy can be found in [2].

A classical planner may take an arbitrary amount of time to complete its task; this is unacceptable for an agent that may require a rapid response. For this reason, we have modified Prodigy to be an instance of an anytime planner [4]. This means that the planner can be interrupted at any moment before it completes its task, and will return a portion of a complete plan that will allow the agent to take some actions. This has been done such that the planner is changed only minimally, allowing us to make use of the large body of work on classical planning systems, such as abstraction [12], machine learning to improve planning performance [14, 7, 10] and derivational analogy [16, 11]. More details about the anytime planner can be found in [3].

2.2. Integration

Hap is designed to react quickly and intelligently in a dynamic environment by using stored behaviors when possible. Prodigy is designed to plan for sets of goals that may interact, and to learn to plan more effectively. We integrate these two systems so as to retain the strengths of each, giving primary control of the agent to Hap.

Hap keeps the tree of goals and plans that the agent is pursuing up to date. When there are stored plans available or there is a strict time constraint, Hap will usually act in a pre-programmed way. When Hap has extra time to act or there is no stored plan to handle the current situation, Hap may call Prodigy. Planning is not distinguished from anything else Hap does, so the conditions under which Prodigy is called are contained in the pre-defined Hap productions. This allows the agent builder to create agents that vary along the spectrum between deliberation and reaction as desired. The call to Prodigy includes a predefined subset of the agent's goals for Prodigy to work on and a time bound, so that the agent does not lose reactivity even though planning would otherwise take an arbitrary amount of time to complete.

Since Prodigy reasons about the effects of its actions in the future, it can produce plans that will achieve a number of goals simultaneously, even if they interact in unexpected ways and contend for the same resources. A pre-programmed approach may fail or produce less efficient behaviour because of these interactions. Prodigy is given a time bound and uses anytime planning techniques to produce the best plan it can within the allowed time. The goals passed to Prodigy are prioritized so that if it cannot solve all the goals in time it will solve the ones with higher priorities.

The integration works, in part, because the communication between Hap and Prodigy takes place at an appropriate level of abstraction, both of state and operators. Like other researchers [9, 15], we have used an abstraction boundary between the reactive and deliberative components of our architecture.

Deliberative planners typically assume a static or near-static world. While this allows the planner to construct a plan, in a dynamic domain this plan will frequently fail because some of its underlying assumptions have become false since the plan was constructed. This is one of the primary reasons deliberative planners have been unsuccessful as agent architectures. We can improve the quality of the plans by making the planner plan in a more static state. We accomplish this by providing the planner with an abstraction of the state that includes the more static elements and allowing it to plan in that state.

Prodigy uses the abstract state to generate an abstract plan, that is, a plan that is a sequence of Hap subgoals rather than a set of concrete actions. Hap uses its set of stored reactive plans to execute these subgoals in the dynamic world. Hap is able to fill in the dynamic details that Prodigy did not plan for. Also, Hap will often have numerous alternative plans for achieving a subgoal, so a single Prodigy plan can generate very different behavior depending on the current state of the world.

3. Mr. Fixit, a Household Robot

We have designed and built a simulated household robot agent, Mr. Fixit, using this architecture. Mr. Fixit lives in a simulated environment built with the Oz system [1]. Figure 1 gives a rough layout of the environment that Mr. Fixit inhabits. Many of the details of the world, which includes over 80 objects, have been left out as they
Figure 1: Rough layout of the simulated household robot environment. [Figure not reproduced.]

Figure 2: Sample trace of the household robot.

Player: *BREAK China-Cup
Fixit:  *GO-TO Sunroom
    [Plan: Goal: (and (recharge) (thrown-out-trash))
     Time limit: 1 second
     Plan: ((goto recharger) (recharge) (goto cup)
            (goto trash-can) (put cup trash-can))]
Fixit:  *GO-TO Bedroom
        *GO-TO Closet
        *RECHARGE
        *GO-TO Bedroom
        *TAKE China-Cup
        *GO-TO Sunroom
        *GO-TO Spare-Room
        *PUT China-Cup in Trash-Can
    [Plan: Goal: (clean-dining-room)
     Time limit: 1 second
     Plan: ((goto dining-room) (vacuum dining-room))]
Fixit:  *GO-TO Sunroom
        *GO-TO Dining-Room
Player: *GO-TO Sunroom
Fixit:  *VACUUM Dining-Room
Player: *GO-TO Dining-Room
Fixit:  *SAY "Hello." to Player
Player: *BREAK Jar
    [Plan: Goal: (and (clean-dining-room) ...
                      (thrown-out-trash))
     Time limit: 1 second
     Plan: ((goto jar) (get jar) (goto trash-can)
            (put jar in trash-can) (goto dining-room)
            (vacuum dining-room))]
Fixit:  *TAKE Jar
    [Plan: Goal: (and (clean-dining-room) ...
                      (thrown-out-trash) (recharge))
     Time limit: 0.5 seconds
     Plan: ((goto recharger) (recharge))]
Fixit:  *GO-TO Sunroom
        *GO-TO Bedroom
Player: *GO-TO Sunroom
Fixit:  *GO-TO Closet
        *RECHARGE
    [Plan: Goal: (and (clean-dining-room)
                      (thrown-out-trash))
     Time limit: 1 second
     Plan: ((goto trash-can) (put jar trash-can)
            (goto dining-room) (vacuum dining-room))]
Fixit:  *GO-TO Bedroom
        *GO-TO Spare-Room
        *PUT Jar in Trash-Can
        *GO-TO Sunroom
        *SAY "Hello." to Player
        *GO-TO Dining-Room
        *VACUUM Dining-Room
Table 1: Average proportion of goals solved.

                        Time bound (seconds)
  Number of goals    0.25    0.50    1.00    2.00
        1            0.90    0.98    1.00    1.00
        2            0.80    0.96    1.00    1.00
        4            0.38    0.52    1.00    1.00
        8            0.13    0.16    0.62    1.00
are unimportant for the behaviors described here. The player is a human user that is also interacting with the simulation.
The robot has a number of important goals that define its behavior. They are, from most to least important: recharge battery when low, clean up broken objects, greet the player, vacuum dirty rooms, and roam the house looking for tasks to perform. Mr. Fixit keeps a simple model of the world, but given his limited sensing abilities and the existence of an unpredictable user in the world, his model will often be incomplete or incorrect.
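This priority ordering can be made concrete with a minimal sketch, with roaming as the default when nothing else is pending. The numeric values are our own illustration; the paper specifies only the ordering, not numbers.

```python
# Illustrative priorities for Mr. Fixit's goals; the paper gives only the
# ordering (most to least important), not these numeric values.
PRIORITIES = {
    "recharge-battery": 5,
    "clean-broken-object": 4,
    "greet-player": 3,
    "vacuum-room": 2,
}

def next_goal(pending):
    """Pick the most important pending goal; roam when nothing is pending."""
    if not pending:
        return "roam"
    return max(pending, key=PRIORITIES.get)
```

This is why, in the trace below, Fixit walks past the Player without a greeting while executing a recharge plan: the active goal simply outranks greeting.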
Figure 2 shows a trace of a run of this simulation, which has been edited for brevity and clarity. The trace begins with Mr. Fixit in the spare room performing rounds while the Player is in the bedroom. The player breaks the cup. Fixit notices the broken cup at the same time that his battery needs recharging. Prodigy is notified of the two goals and is able to find a plan to solve them both within one second. Fixit executes the plan without any problems. During the execution of the plan Fixit goes by the Player twice without stopping for the traditional greeting. This is because the current plan being executed has a higher priority than greeting the user.
Once the two goals are accomplished, Fixit notifies Prodigy of a goal to clean the dining room, which is planned for successfully. During the execution of this plan, the player enters the dining room and is greeted because vacuuming is a low-priority goal. Despite the warm greeting, the Player decides to break the jar. Fixit notices this and plans to clean up the mess and then return to vacuuming. While cleaning up the broken jar, Fixit's battery again gets low. The three pending goals are sent to Prodigy along with a half-second time limit. Prodigy is able to solve only the goal to recharge in this amount of time and Fixit executes that plan successfully. Once that is done, Fixit calls Prodigy to replan for the other two goals and executes them.
3.1. Discussion

One of our main concerns in building an agent like Mr. Fixit is how quickly Prodigy is able to plan for goals in the domain. If Prodigy isn't given enough time to generate plans for even single goals, then the agent is going to be stuck. On the other hand, if we know that Prodigy is generally going to be given more than enough time to achieve some subset of goals, then there are tradeoffs in speed vs. plan interleaving. For example, say Fixit knows about 2 broken objects that need to be thrown out. If Prodigy only has time to solve one of the goals, it will return a plan that may be sub-optimal with respect to solving both goals.
We tested Prodigy's ability to solve goals quickly in this domain by giving it problems to solve with varying numbers of goals and varying time bounds. The results of the test are reproduced in Table 1. When Prodigy is given a 2-second time bound, it can always solve the goal conjunct in this relatively simple domain, making the anytime planning redundant. With smaller time bounds, Prodigy is only able to solve a proper subset of the goals. If Prodigy only returned complete solutions, Hap wouldn't get any useful information in these time-critical cases.
This information can be used to guide the design of the domain specification given to Prodigy. If we know that Prodigy will often only have 0.25 or 0.5 seconds with which to work, we will want to generate simple serial plans. If the domain allows Prodigy to take 2 or more seconds, we can write control rules for Prodigy that will produce more efficient plans but that will usually take longer before completely planning for any single goal.
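The serial-plan fallback under tight bounds can be sketched with a small stand-in for the anytime scheme: solve goals one at a time in priority order and return whatever has accumulated when the bound expires. The function names and the per-goal solver interface here are our inventions; Prodigy's actual anytime mechanism is described in [3].

```python
import time

def anytime_plan(goals, solve_one, deadline):
    """Accumulate plans for goals in priority order (highest first),
    stopping when the time bound expires; whatever partial plan exists
    at that point is returned.  `goals` is a list of (priority, name)
    pairs and `solve_one(goal)` returns a list of steps for one goal.
    A hypothetical stand-in, not Prodigy's real anytime algorithm."""
    plan, solved = [], []
    for goal in sorted(goals, key=lambda g: -g[0]):
        if time.monotonic() >= deadline:
            break                      # interrupted: return best so far
        plan += solve_one(goal)
        solved.append(goal)
    return plan, solved
```

Concatenating per-goal plans this way yields exactly the simple serial plans discussed above: the planner never explores interactions between goals, trading plan quality for a guaranteed response within the bound.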
4. Experiments: Deliberative+Reactive vs. Reactive
We used the household robot domain as a testing ground for our architecture. We have already discussed some of the obvious benefits to using a deliberative planner as part of an agent architecture, but if the architecture doesn't perform well, these benefits may be overshadowed. To evaluate our architecture we decided to test it against a purely reactive agent. This agent was written entirely in Hap and was designed specifically for this domain. In general, the hand-coded plans were similar to the ones that Prodigy generated, but we also added specific interleaved plans for throwing out multiple pieces of garbage.

We did not expect the hybrid agent to do quite as well as the reactive agent, but if our architecture could come close, then we can reap the benefits of using a deliberative architecture without concern for losing reactivity or performance efficiency.

The domain described in section 3 was modified as follows: (1) We changed the procedure for throwing out garbage so that the robot first had to get a bag out of a cabinet in the kitchen, then put the trash in the bag, then put the bag in a trash bin in the sunroom; (2) A second robot, the Destructo2000, was added to the environment. This robot would generate cups and break them on the floor with some probability that we could control; (3) A (slightly unrealistic) phone was placed in the bedroom. With a 10% chance the phone would ring during any turn it was off the hook. If the robot hadn't answered the previous call, it was lost. Until another call came in, the previous caller would keep ringing; (4) All the rooms started in need of vacuuming and didn't become dirty again once vacuumed; (5) The player was removed from the simulation; (6) The new goal priority ordering was (from most to least important): recharge battery, answer phone, throw out trash, vacuum rooms.

The phone and the battery created occasional interrupts in behavior that were kept constant over every run. We were able to control how dynamic the environment was by changing the chance that the Destructo2000 would drop a cup. We ran both the hybrid and pure reactive robots with this chance at 5%, 8%, and 10%. We also ran the hybrid system at 3%.

The planner in the hybrid system had a 3 second time bound. We feel this was reasonable because, although a household robot might stand a longer delay without losing reactivity, its world model is also likely to be more complicated. Although we didn't do as complete an analysis of this version of the domain as was described in section 3.1, some informal experiments showed that the plans required to handle the new trash procedure required enough time that 3 seconds was not enough to consistently create interleaved plans for throwing out garbage. Because of this, the hybrid agent always generated serialized plans for throwing out trash. These were sub-optimal plans, but Prodigy was always able to solve at least one goal in the allotted time.

Figure 3 graphs how well the hybrid robot did in keeping up with the Destructo2000. It's fairly clear that at 5%, 8%, and 10%, Mr. Fixit is falling further and further behind, but at 3% he is able to keep up. The purely reactive agent that we created had hand-coded plans for throwing out multiple pieces of trash simultaneously, tuned to the layout of the building. Because of this we expected the reactive agent to perform better on this task. Our expectations were realized, and figure 4 shows how at 10% the reactive agent is falling behind, but at both 8% and 5% the robot seems able to keep up.

The relative performances of the two systems in picking up cups was expected. That the hybrid system fared as well as it did on this task is reassuring. Admittedly, this is still a relatively simple domain so differences in performance will tend to be small, but at the same time moving to more complex domains will make it even harder for builders of reactive agents to create a complete and efficient set of behaviors for all situations.

A somewhat surprising result was how well the hybrid system did on the other tasks it was given. First, both agents performed at 100% on battery recharging, which was the most important goal the agent was given. Second, the hybrid agent was able to answer the phone at a rate of 78% as compared to 61% for the reactive agent. Third, at cleaning rooms, the hybrid agent was able to do almost as well as the reactive agent in the 10% domain and even slightly better in the 5% domain. This is shown in figure 5. We expect that the hybrid system outperforming the reactive system on these tasks is either a matter of a limited data set (10 runs for each robot-environment pair) or a product of the logistics of this domain and not attributable to our architecture.

Each time cycle in the simulations with the reactive agent took 9.2 seconds. This includes both agents, the physical world simulation, and the data gathering. When we changed to a hybrid system, this increased to 9.7 seconds. So on average, the hybrid agent only took 0.5 seconds longer to choose an action than the reactive agent.

5. Conclusions

This preliminary version of the architecture can be improved in several obvious places. For example, our current model of anytime planning only gives credit to states that solve one or more top-level goals. If no goal can be completely solved within the time bound, the planner will be unable to suggest an action. The planner is unable to make use of partial plans built during previous calls from Hap, and we have not yet investigated the impact of Prodigy's machine learning capabilities on this task.

The hybrid architecture we built provides power to agent builders which will be useful in designing agents for a variety of domain types. In highly dynamic domains where quick action is vital, the agent builder can put reactive behavior to deal with most situations into Hap and design the Prodigy system to return quickly. This
Figure 3: Hybrid robot cleaning up cups. (Series: 3%, 5%, 8%, and 10% with planning; x-axis: time in simulation units. Plot not reproduced.)

Figure 4: Pure reactive robot cleaning up cups. (Series: 5%, 8%, and 10% without planning; x-axis: time in simulation units. Plot not reproduced.)

Figure 5: Robots vacuuming rooms. (Series: 5% and 10%, each with and without planning; x-axis: time in simulation units. Plot not reproduced.)
will often be at the expense of plan quality because the interactions between goals will not be explored. In more stable domains, Prodigy can be given more time to create efficient plans. In fact, in some situations it might be the case that Prodigy is able to solve some resource-critical problem that a reactive system might not solve at all.
References

[1] Joseph Bates. Virtual reality, art, and entertainment. PRESENCE: Teleoperators and Virtual Environments, 1(1):133-138, 1992.

[2] Jim Blythe, Oren Etzioni, Yolanda Gil, Robert Joseph, Alicia Pérez, Scott Reilly, Manuela Veloso, and Xuemei Wang. Prodigy 4.0: The manual and tutorial. Technical report, School of Computer Science, Carnegie Mellon University, 1992.

[3] Jim Blythe and W. Scott Reilly. Integrating reactive and deliberative planning for agents. Technical Report CMU-CS-93-135, Carnegie Mellon University, May 1993.

[4] Mark Boddy. Solving Time-Dependent Problems: A Decision-Theoretic Approach to Planning in Dynamic Environments. PhD thesis, Brown University, May 1991.

[5] Rodney Brooks. Integrated systems based on behaviors. In Proceedings of AAAI Spring Symposium on Integrated Intelligent Architectures, Stanford University, March 1991. Available in SIGART Bulletin, Volume 2, Number 4, August 1991.

[6] David Chapman. Planning for conjunctive goals. Artificial Intelligence, 32:333-378, 1987.

[7] Oren Etzioni. A Structural Theory of Explanation-Based Learning. PhD thesis, School of Computer Science, Carnegie Mellon University, 1990. Available as Technical Report CMU-CS-90-185.

[8] James R. Firby. Adaptive Execution in Complex Dynamic Worlds. PhD thesis, Department of Computer Science, Yale University, 1989.

[9] Erann Gat. On the role of stored internal state in the control of autonomous mobile robots. AI Magazine, 14(1):64-73, Spring 1993.

[10] John Gratch and Gerry DeJong. Composer: A probabilistic solution to the utility problem in speed-up learning. In AAAI-92, 1992.

[11] Kristian J. Hammond. Case-based Planning: An Integrated Theory of Planning, Learning and Memory. PhD thesis, Yale University, 1986.

[12] Craig A. Knoblock, Josh D. Tenenberg, and Qiang Yang. Characterizing abstraction hierarchies for planning. In National Conference on Artificial Intelligence, 1991.

[13] A. Bryan Loyall and Joseph Bates. Hap: A reactive, adaptive architecture for agents. Technical Report CMU-CS-91-147, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, June 1991.

[14] Steven Minton. Learning Effective Search Control Knowledge: An Explanation-Based Approach. Kluwer, Boston, MA, 1988.

[15] Alberto Segre and Jennifer Turney. Planning, acting and learning in a dynamic domain. In S. Minton, editor, Machine Learning Methods for Planning. Morgan Kaufmann Publishers, San Mateo, CA, 1993.

[16] Manuela M. Veloso. Learning by Analogical Reasoning in General Problem Solving. PhD thesis, Carnegie Mellon University, August 1992.