Integration and Action in Perception/Action Systems with Access to NonLocal-Space Information From: AAAI Technical Report WS-96-07. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved. Glenn S. Wassonand WorthyN. Martin ComputerScience Department, Thornton Hall, University of Virginia, Charlottesville, { wassonI martin }@virginia.odu Abstract Actionsand objects are closely tied together(Russell&Norvig 1995).Weoften think of the actions wecan performon objects as propertiesof those objects. Thetheoryof action presentedhere, containsthe belief that actionsare definedin termsof the objects they effect. Intuitively then, actions consistof a verbanda noun, for example,pick-up-the-soda-can, go-to-the-car,or paint-the-canvas. Weare interestedin twodifferentroles for nouns,as direct objects whichthe agentmanipulates(the-canin pick-up-the-can)and as indirect objectswhichspecifydestinations(the-cat in go-to-thecar). Werefer to these nounsas m-roleobjectsor d-role objects.In our theoryof action, before performinganyaction, an agent first developsa deietic representationof the m-roleor d-role object.So, the agentidentifies the-soda-can,the-car, or the-canvas.When this is completed, the specifiedactioncan be takenusingthe objectindicatedbythe representation.Aplanis, therefore,a set of objects and the actions to perform. Wecontendthat this deictic organizationis an effective means of expressingactions becausean action cannottake place without identifyingthe mor d-role objects. Ourresearchwill showthat actions can be performedeffectively and plans can be organizedappropriatelyusing such a representation.Wewill presenta system that usesdeictie representations as a meansto integratea deliberate plannerand a perception/actionsystem.Non-local-space information in the planswill be conveyed to the perception/actionsystem, whereit can be actedon. Emergent fromour theoryof activity is a representationsystem that integratesagent controlsystemcomponents. Ourtheoryis centered onthe real worldobjects effeetedby action. Webelievethis providessufficient and effectiveorganizingprinciplesfor designing perception/actionsystems. Background There are two schools of thought in the AI planning community, the classical planning camp[see (Tate, Hendler Drummond1990) for a survey] and the reactive camp (Brooks 1986)(Firby 1987). The classical planning proponents claim that complextasks cannot be completedwithout reasoning about the state of the world in advance, while the reactive proponents claim the world is too dynamicto reason about or evenrepresent. Clearly, a little of each is true. One source of information, commonto manyclassical planners, but inaccessible to pure perception/action (i.e. reactive) systems, is non-local-space information, such as is present in a map.A perception/action systemoperating in a large environment maynot be able to perceive its entire route space a priori. Somedeliberative entity must plan a 137 VA22901 course for the agent and activate appropriate behaviorsat the correct time. This is because a perception/action (PA) system has no memoryor representation. It cannot create a plan, it only knowswhat action should be taken now. Although PAsystems can perform somenavigation tasks without consulting a deliberator, whenthe agent reaches a Tjunction for example, the perception/action system must appeal to its route plan in order to deterministically choosethe correct direction. This paper’s agendais not to discuss better methodsof plan generation, but to discuss howto better communicatethe information expressed in plans to a perception/action system. That is, howto integrate a non-localspace system component and a PA system component by transferring non-local-space information to the PAcomponent. in such a waythat the PAcomponentcan use the informationefficiently and effectively. Integration Traditionally, perception/action systems are thought of as determining their actions directly from their perceptions. This can be expressed as a =tiP), wherea is the current action and P is the set of current precepts. Workby Brill (Brill 1995) has shownthat perception/action systems can exhibit greatly improved performance by incorporating small amountsof task-dependent representation into their framework. In such systems, a = f(P, M). wherea and P are as before and M is the PA component’s memory or representation. Brill proposes a deictic, local-space representation system based on Ullman’s (Ullman 1984) markers, as used in (Agre &Chapman1987). Since perception/action systems can use such local-space representations effectively to achieve certain complexbehaviors, then the task of the entity possessingthe non-localspace representation is to transfer this infoLmationinto the PAsystem’s local-space representation. Given the formulation, a = f(P, M), the mostefficient wayof influencing a PA system’s actions is through M. However,there is a temptation to pass arbitrarily complex data structures to the PAsystem through M. In order for a PAsystem to remain effective in dynamicenvironments, it cannot be asked to manipulate large data structures. We present a minimalist interface between a non-local-space componentand a perception/action system, using markers. Markers Reactivesystems, as conceivedby Brooks(Brooks1986), do not attemptto modelthe worldbecauseanystored representationwill become stale. In fact, the designphilosophy of suchsystemsis that the worldis a better modelof itself than an agentcouldever build. Therefore,an agentneedsno representationbecauseit can just "look"at the worldto determinewhateverit needs to know. Clearly, there are somesimpletasks whichrequire some formof representation.For example,whenI sit in an easychair, I align myselfwith the chair, but whenI actually sit down,I amfacing awayfrom the seat. However,becauseI havea representationof wherethe seat is, whichI developed whilelookingat it. I amable to maneuver myselfinto the chair. If a purereactivesystemweretrying to sit down,once it is nolongerfacingthe chair, thereis nothingin its perceptive field to indicatethat it shouldsit down.Withoutrepresentationto tell the systemthat it wastrying to sit, it never will. Amoresophisticated systemwhichcould sit in the chair at mydesk mightuse the computermonitoras a visual cue to decidewhento sit, i.e. whenyousee the monitor,the chair is behind you, so sit down.However,mydesk chair has wheelsand can movearound.In order to decidewhatvisual cuetells the agentwhen to sit for variousinstancesof the sitting task, the sit routinemustbe parameterized. Anothersystem componentmust pass this parameter to the reactive system.Thisinteractionis discussedin sectionrifled "Transferring Non-Local-Space Informationto a PASystem". Tryingto preserve the robustness of PAsystems, while incorporating representation involves using a minimalist representationfor anygiven task. Taskdependentrepresentations give us a minimalist deicric representationin that the agent modelsonly those objects whoseroles are relevant to its currenttask. Thereasonthat onlytask dependent objects are modeledis that the high cost of keepingan elaborate modelup-to-date limits PAsystemeffectiveness. Markersare the basis of the representationin our theory of action. Ullman(Ullman1984)uses the term "marking" refer to remembering a location for later reference. Hesupposes the creation of a "markingmap"whichholds coutext dependentlocation, in the visual field, that havebeenanalyzed. Attneaveand Farrar (Attneave& Farrar 1977)suggest that locations outsidethe visual field can be marked. Pylyshyndescribes a similar concept in his FqNST model (Pylyshyn&Storm1988). Hedescribeda limited number "reference tokens"whichcan be boundto visual features. This binding is a pre-requisite to determiningrelational properties about those features. Agreand Chapman (Agre Chapman 1987)use markersin their Pengisystemto identify objectsrelevantto the penguin’scurrenttask, e.g. the-iceblock-blocking-my-escape. Brill considers issues involved in using markersin dynamic3Ddomainsinvolving occlu- 138 sion (Brill, Wasson &Martin1996). Markersare a task dependentrepresentation in which each markeris associated with an object havingsomerole (mor d) in the task. Markers are generallythoughtto contain a ’what’ and a ’where’ component.The ’what’ component describesthe object’srole in the currenttask. Themarkerassociatedwith a particular object mayhavedifferent ’what’ components for different tasks. For example,an electric socketcouldbe markedas an obstaclefor navigation,unless the agentneededto recharge,in whichcase, the samesocket could be markedas goal. The ’where’ componentcontains the locationof the associatedobject in somecoordinatesystem useful to the current task. Themarkersin our system have other componentswhichare discussed in the implementation section. However,the total numberof components in a markeris purposelykept smallin order to avoid the data manipulationproblemsdiscussedearlier. Expectants Sincea markeris associatedwith an object, an important question is howthe association is made.Pylyshynsays the associationis madethrougha "primitiveoperatiouwhichis pre-attentively performedon the visual display" (Pylyshyn & Storm1988). This suggeststhat there is a processexamining the visual field for propertieswhichcanbe detectedin a pre-attenrivefashion. However, it is unlikelythat all the pre-attenfive properties whichhumanscan detect are being soughtat all times. Rather,weonly detect the featuresthat are importantto our current task. Giventhis task dependence,it is likely that the currenttask candictate not only whichfeatures oneexpectsto fred, but whereonemightexpect to f’mdthem. In orderto facilitate non-local/localspaceintegration.We define the term "expectant".Anexpectantdefmesa set of objects whichmatchsomedescription of an object. The morespecificthe description,the fewerobjectswill be in the expectantset. For example,supposemytask wasto tighten a screw.I knowthat myscrewdriversare in the basementon the workbench.Theremaybe multiple screwdriverson the workbenchand each of these can matchthe expectant description. Theexpectantdescribesthe objectI need,a screwdriver, andwhereI mayfred that object, on the workbench. Thedifference betweena markerand an expectantis that a markeris associatedwith a specific object, whilean expectant describesa set of objects. Transferring Non-Local-Space Information to a PA System Communication betweenour system componentpossessing non-local-space(NLS)informationand our perception/ action component occursvia nninstantiatedmarkers."Uninstantiated"reflects the fact that the markeris associatedwith an expectantandhencea single object has not yet beenassociated with the marker. TheNLScomponent can pass unimtantiated markersto a PAsystemfor instantiation. The PAsystemmustdetect andselect a single object fromthe set describedby the expectantandassociateit with the marker. Making this associationmeansthe markeris nowinstantiated. So, it is nowwithinthe local-spacerepresentationof the PAsystem,whereit can be used effectively. Consideran agent whosetask is to get a cup of coffee. Oneof the steps in this planis to get a coffeemug.Thecoffee mugsare in the cabinet abovethe coffee maker,on the bottomshelf. When the cabinetdooris closed,the agentcannot perceive any mugswhichmaybe in the cabinet. The NLScomponentpasses an uninstantiated ’mug’markerto the PAsystem.The’where’slot of this markertells the PA systemthat the mugwill be inside the cabinet, onthe bottom shelf. The PAsystemcan open the cabinet (whetherthis emergesfromsomebehaviorin the PAsystemor is initiated by the TEpassingin a’cabinet door’markerandtelling the PAsystemto execute an openaction on it, will be determinedby the PAsystem’sbasic capabilities). At this point, the PAsystemcan instantiate the ’mug’marker.This deictic representationcan be usedto effectively manipulatethe mug in later stagesof the plan, suchas pour-coffee-into-mug. It shouldbe notedthat the PAsystemis not limitedto representing only those objects indicated by the NLScomponent. The PAsystem mayhave its ownroutines actively looking for task dependentfeatures. Representationacquiredandmaintainedby the PAsystemis referred to as local-spacerepresentation. Architecture In the proceedingdiscussionabout our theory of action andhowit leads to a formof representation,wediscussedan agent whosecontrol systemconsists of a perception/action system and someother system componentpossessing nonlocal-spaceinformation.Adiagramof this generalarchitecture appearsin figure 1. Figure2 showsa morestructuredarchitecture, whichour systemuses. Ourarchitectureconsists of 3 layers, the planningmodule,the task executor,andthe perception/actionsystem.Thejob of the planningmoduleis to formulatehigh-levelgoalsfor the agentandcreate plans. Thetask executorsequencesthese plans to the perception/ action systemas necessary. Theperception/action system controlsthe agent’seffectors. This layerperformsall actions andformsall percepts fromthe environment. In the architectureof figure 2, the meansof integrationbetween the NLScomponentand the PAcomponentis the Task Executor(TIE). The planning modulehas access non-local-space representations whichit can incorp~ate into the plans it generates.TheTEtakes plans andcreates nninstantiated markersfor each object representedin the plan. TheseunirLstantiatedmarkersare passedto the PAsys- 139 Figure1. GeneralArchitecture Figure2. OurArchitecture temfor instantiation whenthe appropriatestep in the planis reached. Thespecificarchitectureof figure 2 is similar to a number of other 3 layeredarchitectures, suchas (Bonasso&Kortonkamp1995)(Connel11992)(Gat 1992), and is not the subject of this research. Features of Integration Thereare several importantfeaturesof the formof representationthat emerges fromour theoryof activity. First, our theory leads to the development of integrated agentcontrol systemswith a variable level of autonomy in their action sub-systems.Theperception/actionsystemis capableof taking actions basedonly on its ownpercepts and representation, i.e. withoutany informationfromother levels of the architecture.When there is little controlinformationflowing fromthe TEto the PA,the PAis moreautonomous. In fact, the PAsystemis capable of completelyautonomous operation if communicatlon with the TEis completelycut off. The systemis then reducedto a Brill-like system.This formof graceful degradationwasone of Brook’soriginal goals for subsumption. Fortasks in whichit is necessaryfor the TEto moretightly control the actions of the PAsystem, the TEcan send morecontrol informationvia expectants.Somesituations require decisionsthat are morecorrectly madeusing non-local-space information(and therefore morecorrectly made outside the PAsystem). Decidingwhichwayto go at a Tjunctionis sucha situation. So, ff the agent’splan wasto navigateto the waterfountainwhichis to the left at a T-junction, whenthe agentarrived at the T-junction,the TEwould pass an ,minstantiated markercorrespondingto the water fountain to the PAsystem. The’where’slot of this marker wouldindicatethat the waterfountainis to the agent’sleft. In order for the agent to be as robust as possible, the amountof control exercised over the PAsystemshould be minimized. Thisis becauseof the belief that the information whichthe PAsystem acquires is the most up-to-date and therefore the mostbelievable. Thenon-local-spaceinformation usedby the planningmodule is potentially stale. TheTE communicates expectantsto the PAsystemas necessary, to allow the PAsystem to operate moreautonomouslywhen this is appropriate. Anotherfeature of this integration methodology is the ability to use multiplemapsof differing scale. Planningbecomesmorecomplexfor an agent operating in a large environment. As the numberof features in the environment grows,so does the complexityof the planningtask. Route finding and feature selection maywell becomeintractable for evenmoderatelylarge areas. Consideran agent driving across the country. Whileon the interstate, manydetails aboutthe variouslocalities are not needed.Havingmultiple mapsof differing scale allows,mimportant details to be abstracted awayfromthe planningprocess. Thereare also situations whenhavingaccessto twodifferent scale representationsof the samearea is useful. Consider an agent whichis walkingthrougha university campus in the rain. In order to stay dry, the agententers a near-by building. Now,the agentstill wishesto reach its original destination,but in orderto stay dry, the agentwantsto followa route whichstays inside buildingsas muchas possible. Theagent can use its coarsescale mapof the universityto formthe context,i.e. the entranceandexit, for the planning processwhichselects a route througha givenbuildingfrom the agent’sfmescale mapof that building. Implementation In order to decidewhatto do, the PAsystemaccessesits perceptsandlooksat its markers.Thesestimuli maytrigger multiple possible behaviorsand the PAsystemmustprioritize its responses.Practical implementation of this system requires an expandedviewof markers,fromthat of earlier markersystems(Agr¢& Chapman 1987)(BriU1995). In dition to the ’what’and’where’slots, markershave’appearance’, ’action’and’action support’slots. 140 Appearance Appearance denotes the sensor signature characteristics of the object. This involvestwo, not necessarilydistinct, routinesfor initially locatingthe objectandfor trackingthe object onceit is marked.Theinitial location routinecan be a visual routine whichthe agentbeginsexecutingwhenits PAsystemreceivesan expectant.After the markeris instantinted, the PAsystemcanbeginusingthe trackingroutineto keepthe marker’sposition up to date. Action The’action’ slot is whatthe systemwantsto do with the markedobject. This is probablywhythe systemmarkedthe objectin the first place.Theactioncanbe anywhichthe systemknows aboutfor a particularobject, i.e. for a particular entry in the marker’s’what’slot. Thepossibleactions will be determinedby the capabilities of the system.Someactions will be common to manyobjects. Examplesof common’actions’ include, go-to and get. Examplesof actions specificto a particularobjectcouldbe ’pour’and’fill’ for a pitcher. Theagent mayhavea databaseof action routines whichthe PAsystemcan access. Progress. Thenotion of task progress is essential to the agent’s ability to determinewhenan action is complete. Obviously,the measureof progresswill vary fromtask to task. Eachaction whichthe systemknowsaboutcan haveits ownprogressroutine. Theprogressmayalso feed backinto the action routine. Forexample,whenpullinga car upto a stop sign, the closer the vehicle gets to the sign, the morestrongly the breaks shouldbe applied. Whilethe PAsystemcan executethe progress routine to determinehowclose to completionit is on sometask, it may not be possibleto resolvea lack of progressat the PAsystem level. ThePAsystemmaybe able to detect that a givenaction is failing, but mayhaveno idea whyor whatelse to try. At this point, it mustappealto higherlayers of the architecture for guidance. Action Support ’Action support’ holds information about the action whichthe perception/actionsystemdoes not possess. This could be parametersto the action’s progress routine or a pointer to anothermarkermarkingan object to be"operated" on. Theremayalso be informationapplicableto the particular instance of a task whichthe system is about to perform.Forexample,if the agentwantedto fall a glass half full of water froma pitcher, the pitcher wouldbe marked with a markerwhose’action’ slot contained ’pour’ and whose’action support’slot containeda referenceto the glass marker.Thetask also specifies a stop conditionfor the pour action, i.e. whenthe glass is half full, whichcanbe checked by the progressroutine. This conditionalso appearsin the ’action support’slot of the pitcher markerbecauseit is the pour action that mustbe stopped. Thepour-progressfunction mustbe applied to the object markedby the markerin the ’actionsupport’slot. i.e. the glass. Anotherexample of a parameterto a progress routine mightbe for an open-door task wherethe doorjust has a handlewhichcan be pulled or pushed.If the agentknowsthis particular door,it can notify the progressroutine that it shouldexpecta pushmotionto causethe doorto openfromthe side the agentis on. If the progressroutinereports a lack of progress,the agentcan try pulling the handle. System Platform Weare currentlydevelopinga systembasedon this theory of action. The platform is a mobile robot based on a MC6811 CPU.The robot, Bruce, has a single cameraon a pan/tilt platformfor sensing. Theperception/actionsystem runs on a workstationsendingcommands to the robot’s ellectors over a wirelessdata-link. Moredetails andexamples of the systemin action are available in (Brill, Wasson Martin1996). Conclusions Wehavediscusseda theoryof action whichidentifies actions by associating themwith the objects involved. This theory encompasses a representationalsystemand an architectural organizingprinciple. Actionsare performedby acquiringa deictic representationof the necessaryobjectsand then performingthe action. This theoryleads to a systemof representationwhichcan expresstask dependentlocal-space informationandbe usedto integrate perception/actionsystemsand non-local-spaceinformation,via expectants. Wehavedescribedour minimalistdeictic representation system,markers.Themarkerinterface is minimalistin both the numberand size of markers.Uninstantiatedand instantiated markersfacilitate communication betweena perception/action systemand a systemcomponent containingnonlocal-space information.Thesetwostates of markersallow non-local-spaceinformationto be placed into the localspacerepresentationfor a perception/actionsystem. Thetheoryof actionpresentedin this papercanbe usedto create systemswhichcan operaterobustlyin large scale domains. Variablelevels of PAsystemautonomyallow rapid executionof primitive actions whenthe environment is sufficiently dynamic.Multiplemapsof possiblydiffering scale can be used to abstract unnecessarydetails awayfromthe planning process. Ourcommunication betweensystemcomponents can express non-local-space information at any scale. Webelieve that designingagents in accordancewith our 141 theoryof action will producesystemswhichare both sufficient andeffective for operationin large dytlamic environments. References [1] Agre,P.E.; and Chapman. D. 1987. Pengi: AnImplementationof a Theoryof Activity. AAAI-87,268-272. [2] Attneave,F.; andFarter. P. 1977.TheVisualWorld Behindthe Head,AmericanJournalof Psychology.Vol. 90, No.4, 549-563. [3]Brill, EZ.1995.Representationof LocalSpacein Pexception/ActionSystems:BehavingAppropriatelyin Difficult Situations. Ph.D.Dissertation,Department of Computer Science,Universityof Virginia. [4]Brill, EZ.; Wasson.G.S.; andMartinW.N.1996.Sensing and RepresentingDynamic,Three-Dimensional Environments: Judicious Useof RepresentationMeasurably ImprovesPerformance,1ROS-96,submitted. [5] Brooks, R.A. 1986. A RobustLayeredControl System for a MobileRobot,IEEEJournalof Roboticsand Automation,RA-2(1):14-23. [6]Bonasso,R.P.; and Kortenkamp, D. 1995.Characterizing an Architecturefor Intelligent, ReactiveAgents.AAA/ Spring Symposium,1995. [7]Connell,J.H. 1992.SSS:AHybridArchitectureApplied to RobotNavigation,In Proceedingsof the 1992IEEE Conferenceon Roboticsand Automation,Nice, France, 2719-2724. [8]Firby, R.J. 1987.AnInvestigationinto ReactivePlanning in ComplexDomains,AAAI-87,202-206. [9] Gat, E. 1992.IntegratingPlanningandReactingin a HeterogeneousAsynchronous Architecturefor Controlling Real-WorldMobileRobots, AAAI-92,809-815. [10] Pylyshyn,Z.W.;and StormR.W.1988. TrackingMultiple Independent Targets:Evidence for a Parallel Tracking Mechanism, Spatial Vision 3(3): 179-197. [11] Russell,S.J., andNorvig,E1995.Artificial Intelligence: A ModernApproach,Prentice Hall, Englewood Cliffs, NewJersey. [12] Tate, A.; Hendler,J.; and Drummond, M. 1990.A Reviewof AI PlanningTechniques,Readingsin Planning, Allen, Hendlerand Tate ed., MorganKanfman. [13] Ullman,S. 1984.VisualRoutines,Cognition18:97159.