A Hybrid Hierarchical Schema-Based Architecture for Distributed Autonomous Agents

Matthew Gillen, Arvind Lakshmikumar, David Chelberg, Cynthia Marling, Mark Tomko and Lonnie Welch
School of Electrical Engineering and Computer Science
Ohio University, Athens, OH 45701 USA

Abstract

The advent of inexpensive computers has spurred interest in distributed architectures, in which a cluster of low-cost computers can achieve performance on the same scale as expensive supercomputers. The goal of research in distributed systems is to take a highly decomposable problem solution and put that solution into a framework that allows the system to take full advantage of the solution's decomposability. An architecture that can exploit additional processors with little or no manual reconfiguration would be ideal.

Artificial Intelligence (AI) algorithms for planning in nontrivial domains are typically resource-intensive. We believe that a framework for developing planning algorithms that allows an arbitrary level of decomposition and provides the means for distributing the computation would be a valuable contribution to the AI community.

We propose a general, flexible, and scalable hierarchical architecture for designing multi-agent distributed systems. We developed this architecture to facilitate the development of software for soccer-playing robots. These robots will compete in the international robot soccer competition RoboCup. All of our software, from high-level team strategy planning down to the control loop for the motors on our robots, fits into this architecture.

We build on the notion of deliberative and reactive agents, forming a hierarchy of hybrid agents. The hierarchy covers the entire continuum of hybrid agents: agents near the root of the hierarchy are mostly deliberative, while agents near the leaves are almost purely reactive.

Introduction to Multi-Agent Systems

A multi-agent system can be thought of as a group of interacting agents working together to achieve a set of goals. To maximize the efficiency of the system, each agent must be able to reason about other agents' actions in addition to its own. A dynamic and unpredictable environment creates a need for an agent to employ flexible strategies. The more flexible the strategies, however, the more difficult it becomes to predict what the other agents are going to do. For this reason, coordination mechanisms have been developed to help the agents interact when performing complex actions requiring teamwork. These mechanisms must ensure that the plans of individual agents do not conflict, while guiding the agents in pursuit of the goals of the system.

Agents themselves have traditionally been categorized into one of the following types (Wooldridge and Jennings 1995):

- Deliberative
- Reactive
- Hybrid

Deliberative Agents

The key component of a deliberative agent is a central reasoning system (Ginsberg 1989) that constitutes the intelligence of the agent. Deliberative agents generate plans to accomplish their goals. A world model may be used in a deliberative agent, increasing the agent's ability to generate a plan that is successful in achieving its goals even in unforeseen situations. This ability to adapt is desirable in a dynamic environment. The main problem with a purely deliberative agent when dealing with real-time systems is reaction time. For simple, well-known situations, reasoning may not be required at all.
In some real-time domains, such as robotic soccer, minimizing the latency between changes in world state and reactions is important.

Reactive Agents

Reactive agents maintain no internal model with which to predict future states of the world. They choose actions by using the current world state as an index into a table of actions, where the indexing function's purpose is to map known situations to appropriate actions. These types of agents are sufficient for limited environments where every possible situation can be mapped to an action or set of actions. The purely reactive agent's major drawback is its lack of adaptability. This type of agent cannot generate an appropriate plan if the current world state was not considered a priori. In domains that cannot be completely mapped, using reactive agents can be too restrictive (Mataric 1995).

Hybrid Agents

Hybrid agents, when designed correctly, use both approaches to get the best properties of each (Bensaid and Mathieu 1997). Specifically, hybrid agents aim to have the quick response time of reactive agents for well-known situations, yet also have the ability to generate new plans for unforeseen situations. We propose to have a hierarchy of agents spanning a continuum of deliberative and reactive components. At the root of the hierarchy are agents that are mostly deliberative, while at the leaf nodes are agents that are completely reactive. Figure 1 shows a visual representation of the hierarchy.

[Figure 1: Agent hierarchy, ranging from a purely deliberative agent at the root, through progressively more reactive agents, to purely reactive agents at the leaves.]
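To make the reactive/hybrid distinction concrete, the sketch below shows a minimal hybrid agent in Python: known world states are handled by a reactive lookup table, and only unmapped states fall through to a slower deliberative planner. The class name, the state discretization, and the planner interface are our illustrative assumptions, not part of the architecture described in this paper.

```python
from typing import Callable, Hashable

Action = str
WorldState = Hashable  # assume world states are discretized into hashable keys


class HybridAgent:
    """Minimal hybrid agent sketch: reactive table lookup with a deliberative
    fallback. For illustration only; the paper's agents are built from schemas
    rather than a single lookup table."""

    def __init__(self, reactive_table: dict[WorldState, Action],
                 planner: Callable[[WorldState], Action]):
        self.reactive_table = reactive_table  # known situation -> action
        self.planner = planner                # slow, deliberative fallback

    def act(self, state: WorldState) -> Action:
        # Fast path: the situation was considered a priori.
        if state in self.reactive_table:
            return self.reactive_table[state]
        # Slow path: generate a new plan for an unforeseen situation.
        return self.planner(state)


# Example usage with a toy soccer-like discretization (hypothetical values).
table = {("ball_near", "goal_open"): "shoot",
         ("ball_near", "goal_blocked"): "pass"}
agent = HybridAgent(table, planner=lambda s: "move_to_ball")
print(agent.act(("ball_near", "goal_open")))   # reactive: shoot
print(agent.act(("ball_far", "goal_open")))    # deliberative fallback
```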
A Hybrid Hierarchical Architecture

The Test Domain for the Architecture

This architecture was developed for the RoboCats (Chelberg et al. 2000; 2001) to cope with the demands of competing in the small-size RoboCup robot soccer league (RoboCup Federation 2001). As such, examples and analogies will relate to that domain. The key features of the RoboCup domain as it relates to this architecture are as follows:

- One or more overhead cameras provide the world view
- All robots on the field must be untethered (e.g., use wireless communication)
- Any number of computers off the field may provide computing power
- Once the game starts, no human interaction may help or guide any robot
- The exact state of the world is not obtainable, nor is it possible to entirely predict the opponent's future actions, making long-term planning difficult

We believe that this domain incorporates many of the issues present in real-world planning problems, and therefore is an ideal testbed for our architecture.

A domain like RoboCup requires coordination to achieve the full benefit of applying multiple agents to the problem. In addition to the regular hardware/software problems faced by mobile robotic systems, soccer-playing robots also need to cope with communication delays, and with vision system imperfections and inconsistencies. Within such systems, each agent has a responsibility to carry out tasks that must benefit the whole team. When a team fails to perform well, it becomes difficult to assess and analyze the source of the problem. The agents might have incorrect individual strategies, or the overall team strategy may not be suitable. (Mataric 1998) and (Boutilier 1996) have suggested communication as a possible solution to resolve these problems in certain multi-agent environments. For our domain, extensive communication would result in unacceptable delays between changes in the world state and appropriate responses. Our architecture is designed to minimize communication among agents to keep the required bandwidth low.

The Schema

We use the term goal to mean a desired world state. A goal could imply a partial world state that is desired; for example, some goal may be accomplished if the ball reaches some position on the field, irrespective of where all the players are. According to Webster's dictionary, a schema is "a mental image produced in response to a stimulus, that becomes a framework or basis for analyzing or responding to other related stimuli". We will use the term schema in the same spirit, but define it more concretely by saying that it is a strategy for achieving one or more goals. An ideal schema generates a plan, based on the current world state, that has the highest desirability for achieving some subset of the goals the schema was designed to accomplish. Desirability is a function of the utility of the goals and the probability of success of the plan. We care about the probability of success because even an ideal strategy producing an optimal plan cannot guarantee success in a world that cannot be completely modeled.

The Hierarchy of Schemas

The plan a schema generates consists of either a simple action or a sequential execution of other schemas' plans. Schemas are designed hierarchically; the lowest-level schemas perform simple actions such as moving a robot to a specified destination. A higher-level schema may use any schema that falls strictly below it in the schema hierarchy in order to generate a plan. For example, a higher-level schema might use the 'move to destination' schema several times sequentially to maneuver a robot around an obstacle. When a parent schema tells a child schema (that is part of the parent's plan) to start executing its plan, a new thread of control is started for the execution of that plan. The parent's thread of execution does not stop, however. The parent must continue to monitor the world state, in case the plan it began executing becomes non-optimal. Thus, the higher-level 'avoid obstacle' schema may interrupt the execution of 'move to destination' if the obstacle moves unexpectedly. Continual monitoring of a plan's optimality at all levels of the hierarchy is a key part of this architecture.
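The following sketch illustrates, under stated assumptions, how the parent/child schema relationship described above might look in code: a schema builds its plan from child schemas or primitive actions, launches the plan in its own thread, and exposes hooks for the continual monitoring that can interrupt a plan that has become non-optimal. The class and method names are ours, chosen for illustration; the paper does not prescribe a particular API.

```python
import threading
from abc import ABC, abstractmethod


class Schema(ABC):
    """Illustrative schema node: a strategy for achieving one or more goals.

    Leaf schemas return plans made of primitive actions (plain callables);
    higher-level schemas return plans whose steps run child schemas.
    """

    def __init__(self, goals, children=()):
        self.goals = goals              # goals this schema is designed to achieve
        self.children = list(children)  # schemas strictly below this one
        self._thread = None
        self._interrupted = threading.Event()

    @abstractmethod
    def generate_plan(self, world_state):
        """Return a list of zero-argument callables to execute in order.
        A step is either a primitive action (for leaf schemas) or a call
        that runs one of this schema's children to completion."""

    @abstractmethod
    def plan_still_optimal(self, world_state) -> bool:
        """Continual monitoring hook: is the executing plan still the most
        desirable one for this schema's own goals?"""

    def start(self, world_state):
        # Launch the plan in a new thread of control; the caller's thread
        # does not stop, so a parent can keep monitoring the world state.
        self._interrupted.clear()
        steps = self.generate_plan(world_state)
        self._thread = threading.Thread(target=self._run, args=(steps,), daemon=True)
        self._thread.start()

    def interrupt(self):
        # Invoked when monitoring finds the plan non-optimal, e.g. the
        # 'avoid obstacle' schema interrupting 'move to destination'.
        self._interrupted.set()

    def join(self):
        if self._thread is not None:
            self._thread.join()

    def _run(self, steps):
        # Sequential execution of the plan, abandoned if interrupted.
        for step in steps:
            if self._interrupted.is_set():
                return
            step()
```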
Given this hierarchy of schemas (strategies), each schema is only responsible for optimality with respect to its own goals. The goals of the parent schema need not be considered, since it is assumed that the parent is optimizing its own goals. The only goals any given schema needs to consider are its own goals and the goals of its children. This encapsulation helps to make the system manageable. If any one schema A seems to do too much, its goals can be split into smaller pieces and the schema divided into two levels: a high-level A_H that keeps the same parents as A and manages the more abstract goals, and a lower-level A_L that keeps the same children as A but has A_H as a parent. A sanity check of A_H's parents to ensure that A_H is still being used correctly should not be necessary, since A_H provides the same functionality that A did, and A_H's parents did not depend on anything that A used in its implementation (e.g., A's children).

An Agent can be composed of one or more schemas. What constitutes the boundaries of an agent in this system is still an open question. We use the agent to represent a process, an atomic unit that may be distributed among the available resources. How many schemas a given agent contains is up to the designer. For maximum flexibility, a designer may decide that each schema is an agent. Or, if distributing the computation is not as important, the designer may group the top half of the hierarchy into one agent and the rest of the hierarchy into a set of agents.

The first implementation of our RoboCup system provides a concrete example. We grouped all schemas considering multiple robots into a single Meta-agent (e.g., formations, passing). Schemas that only dealt with one robot (e.g., roles in a formation) were grouped into a Player-agent. Finally, all schemas for interfacing with the hardware on the robots composed the Robot-agent. There was an instantiation of Player-agent and Robot-agent for each of our five physical robots.

Decision Making and its Role in the Architecture

When schemas have to make decisions in a dynamic environment, they have to take into account the fact that their actions may have several outcomes, some of which may be more desirable than others. They must balance the potential of a plan achieving a goal state against the risk of producing an undesirable state and against the cost of performing the plan. Decision theory (French 1988) provides an attractive framework for weighing the strengths and weaknesses of a particular course of action, with roots in probability theory and utility theory. Probabilistic methods provide coherent prescriptions for choosing actions and meaningful guarantees of the quality of these choices. While judgments about the likelihood of events are quantified by probabilities, judgments about the desirability of action consequences are quantified by utilities. Given a probability distribution over the possible outcomes of an action in any state, and a reasonable utility (preference) function over outcomes, we can compute the expected utility of each action. The task of the schema then seems straightforward: find the plan with the maximum expected utility (MEU).
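As a brief illustration of the expected-utility idea (not of any specific algorithm from this paper), the snippet below computes the MEU action from an assumed outcome distribution and utility function; the action names, probabilities, and utilities are made up for the example.

```python
# Hypothetical outcome models: action -> list of (probability, outcome).
outcome_model = {
    "shoot": [(0.2, "goal_scored"), (0.5, "ball_blocked"), (0.3, "ball_lost")],
    "pass":  [(0.7, "teammate_has_ball"), (0.3, "ball_lost")],
}

# Hypothetical utilities over outcomes (the preference function).
utility = {
    "goal_scored": 1.0,
    "teammate_has_ball": 0.4,
    "ball_blocked": 0.1,
    "ball_lost": -0.5,
}


def expected_utility(action: str) -> float:
    # EU(a) = sum over outcomes o of P(o | a) * U(o)
    return sum(p * utility[o] for p, o in outcome_model[action])


# Choose the maximum-expected-utility (MEU) action.
best = max(outcome_model, key=expected_utility)
print(best, expected_utility(best))
```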
Desirability Analysis: An Example

Intuitively, the goals of a schema can be thought of as having a context-free utility that is the same across all schemas with that goal, and some context-dependent utility that is specific to the particular schema implementation. The context-free utility is used to imply that, in general, pursuing some goal with a high context-free utility is better than pursuing some other goal with a lower context-free utility. It can also be interpreted as a way of expressing heuristics about the relationships between goals. For example, if we take the old adage that "the best defense is a good offense" as a general heuristic for accomplishing the goal of winning, then the goal of scoring would have a higher context-free utility than the goal of preventing opponent scores.

The context-dependent utility refers to the utility of the goal in the context of the strategy that the schema was designed to implement. A strategy may be implemented to achieve some set of goals H. As a by-product of achieving those goals, it may achieve some other set of goals L. However, since the main focus of the strategy is H and not L, the goals within H will have a context-dependent utility much greater than those goals which are a part of L. There are also other ways of ordering the relative context-free utilities of the set of all goals that a schema achieves; this is just one example. The important notion is that the ordering of context-dependent utilities does not matter outside of the context of the schema. All context-dependent utilities are local to the schema that defines them, and are never propagated to parents or children.

So how does a parent decide whether a child schema is an appropriate strategy for accomplishing its goals? An example is instructive. Assume a schema, A_s, is at the root node in the hierarchy, and that A_s's children are represented by the set C_ss = {C_s1, C_s2} (s denotes a schema, ss denotes a set of schemas). G(A_s) will represent A_s's goals, and G(C_ss) will represent the union of the goals of the elements of C_ss. Each child's goals represent some subset of the possible ways of achieving one or more elements of G(A_s). Assume G(A_s) has one element: win the game of soccer. Then G(C_ss) could consist of score and prevent opponent score, which could be thought of as components of the goal win.

For A_s to determine the best way of achieving G(A_s), it must decide which child will be permitted to execute its plan. A_s ranks its children based on the desirability of each strategy. Desirability is a function of the utility of achieving elements of G(C_ss) with respect to G(A_s) and the feasibility of the instantiated plan achieving G(A_s). We have discussed what the utility of a goal represents, so we will move on to feasibility. Feasibility is a measure of how achievable a goal is given the current situation, irrespective of how useful accomplishing that goal would be. To estimate how achievable a goal is in the current situation, it is necessary to have a plan already instantiated. The feasibility is dependent on three things: a plan, a goal, and a current world state. So, to rank the elements of C_ss, A_s asks each element of C_ss for the feasibility of its accomplishing each element of G(C_ss). For any child to compute the feasibility of achieving a given goal, the child must have a plan. Each child in turn develops a plan by asking its children about the feasibility of accomplishing their goals. This recursive structure allows accurate prediction of feasibility.

Once the feasibility of a child's plan achieving each goal, F(c_j(p), g), is determined, a ranking of the children according to desirability is possible. We will now describe one way of computing desirability, although other algorithms are certainly possible. Given the utility of each goal in G(C_ss), U(g_i), and a feasibility for each {child, goal} pair, F(c_j, g_i), we could define the desirability of a given child c_j as

    D(c_j) = \sum_{g_i \in G(C_{ss})} U(g_i) \cdot F(c_j, g_i)        (1)

We will call the most desirable child C_D. If we now assume that A_s is not the root node, then when asked about the feasibility of each element g_k of G(A_s) by one of A_s's parents, A_s would return a value computed by considering the feasibility of C_D accomplishing g_k.
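Below is a minimal sketch of how equation (1) and the recursive feasibility query might be realized, assuming each schema exposes a feasibility(goal, world_state) method and a table of goal utilities; the class names, method names, and data layout are our assumptions, not a prescribed interface.

```python
class RankingSchema:
    """Illustrative parent schema that ranks its children using equation (1)."""

    def __init__(self, goals, children, goal_utilities):
        self.goals = goals                    # G(A_s)
        self.children = children              # C_ss
        self.goal_utilities = goal_utilities  # U(g) for each g in G(C_ss)

    def desirability(self, child, world_state) -> float:
        # D(c_j) = sum over g_i in G(C_ss) of U(g_i) * F(c_j, g_i)
        return sum(utility * child.feasibility(goal, world_state)
                   for goal, utility in self.goal_utilities.items())

    def most_desirable_child(self, world_state):
        # C_D: the child whose instantiated plan is most desirable right now.
        return max(self.children,
                   key=lambda c: self.desirability(c, world_state))

    def feasibility(self, goal, world_state) -> float:
        # When a parent asks about one of this schema's goals g_k, answer by
        # considering the feasibility of C_D accomplishing g_k. This is the
        # recursive query that lets feasibility estimates propagate upward
        # without exchanging plan details.
        best = self.most_desirable_child(world_state)
        return best.feasibility(goal, world_state)


class LeafSchema:
    """Leaf schema: estimates feasibility directly from the world state."""

    def __init__(self, estimator):
        self.estimator = estimator  # (goal, world_state) -> value in [0, 1]

    def feasibility(self, goal, world_state) -> float:
        return self.estimator(goal, world_state)
```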
Related Work

Choosing the best plan for a given strategy entails selecting child schemas (actions) that are appropriate to the current situation. Many action selection mechanisms for robot control have been discussed in the literature. At the highest level, these mechanisms can be divided into two main categories: arbitration (Ishiguro et al. 1995) and command fusion (Rosenblatt 1995). Arbitration mechanisms can be divided into priority-based, state-based, and winner-take-all. The subsumption architecture (Brooks 1986) is a priority-based arbitration mechanism, where behaviors with higher priorities are allowed to subsume the output of behaviors with lower priority. State-based arbitration mechanisms include Discrete Event Systems (DES) (Kosecka and Bajcsy 1993), a similar method called temporal sequencing (Arkin 1992), and action selection based on Bayesian decision theory (Hager 1990). The advantage of the DES and temporal sequencing approaches is that they are based on a finite-state machine formalism that makes it easy to predict future states from current ones. However, this modeling scheme is known to be NP-hard. Finally, in winner-take-all mechanisms, action selection results from the interaction of a set of distributed behaviors that compete until one behavior wins the competition and takes control of the robot. An example is the activation network approach (Maes 1990), where it is shown that no central bureaucratic module is required.

Command fusion mechanisms can be divided into voting, fuzzy, and superposition approaches. Voting techniques interpret the output of each behavior as votes for or against possible actions; the action with the maximum weighted sum of votes is selected. Fuzzy command fusion methods are similar to voting, but use fuzzy inferencing methods to formalize the voting approach. Superposition techniques combine behavior recommendations using linear combinations.

Conclusions

We have presented an architecture employing a hierarchy of hybrid agents. The advantages of this architecture are many. It allows the planning process to be distributed across any available computing platforms. Additionally, the required communication among agents is minimized by communicating feasibilities of plans, not plan details. This architecture allows an arbitrary decomposition of the planning problem, while introducing minimal overhead. It also facilitates the integration of middleware-based resource managers, like DeSiDeRaTa (Welch et al. 1998), to ensure optimal use of resources in dynamic environments.

We will implement this architecture as part of our RoboCup effort this year. We expect to achieve low communication latencies while utilizing several off-field platforms to control a group of five robots playing soccer. This uniform architecture provides an ideal framework to build planning agents at different levels of abstraction, from team-level strategy to robot-level motor control. The uniformity of our approach to distributed hierarchical planning lends itself to rapid software development, and to the ability to more easily integrate the efforts of our team of programmers.

Acknowledgments

This work has been partially supported by the NASA CETDP grant "Resource Management for Real-Time Adaptive Agents" and by the Ohio University 1804 Research Fund. The authors would like to thank the entire RoboCats team, without whose support and contributions this work would not be possible.

References

Arkin, R. 1992. Integrating behavioral, perceptual, and world knowledge in reactive navigation. Robotics and Autonomous Systems 6(1):105-122.

Bensaid, N., and Mathieu, P. 1997. A hybrid architecture for hierarchical agents.

Boutilier, C. 1996. Planning, learning and coordination in multiagent decision processes. In Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge, 195-210. San Mateo, CA: Morgan Kaufmann.

Brooks, R. A. 1986. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation RA-2(1):14-23.

Chelberg, D. M.; Gillen, M.; Zhou, Q.; and Lakshmikumar, A. 2000. 3D-VDBM: 3D visual debug monitor for RoboCup. In IASTED International Conference on Computer Graphics and Imaging, 14-19.

Chelberg, D. M.; Welch, L.; Lakshmikumar, A.; Gillen, M.; and Zhou, Q. 2001. Meta-reasoning for a distributed agent architecture. In Proceedings of the Southeastern Symposium on System Theory, 377-381.
Cooper, G. 1990. The computational complexity of probabilistic inference using Bayesian networks. Artificial Intelligence 42(2-3):393-405.

French, S. 1988. Decision Theory. Chichester, West Sussex, England: Ellis Horwood.

Ginsberg, M. 1989. Universal planning: An (almost) universally bad idea. AI Magazine 10(4):40-44.

Hager, G. D. 1990. Task Directed Sensor Fusion and Planning: A Computational Approach. The Kluwer International Series in Engineering and Computer Science. Boston: Kluwer Academic Publishers.

Ishiguro, A.; Kondo, T.; Watanabe, Y.; and Uchikawa, Y. 1995. Dynamic behavior arbitration of autonomous mobile robots using immune networks. In IEEE International Conference on Evolutionary Computing (ICEC), 722-727.

Kosecka, J., and Bajcsy, R. 1993. Discrete event systems for autonomous mobile agents. In Workshop on Intelligent Robot Control, 21-31.

Maes, P. 1990. How to do the right thing. Connection Science Journal, Special Issue on Hybrid Systems 1(3):291-323.

Mataric, M. 1995. Issues and approaches in the design of collective autonomous agents. Robotics and Autonomous Systems 16(2-4):321-331.

Mataric, M. 1998. Using communication to reduce locality in distributed multiagent learning. Journal of Experimental and Theoretical Artificial Intelligence 10(3):357-369.

Noda, I., and Matsubara, H. 1996. Learning of cooperative actions in multi-agent systems: A case study of pass play in soccer. In Working Notes for the AAAI Symposium on Adaptation, Co-evolution and Learning in Multiagent Systems, 63-67.

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, California: Morgan Kaufmann.

RoboCup Federation. 2001. RoboCup Overview. World Wide Web site, http://www.robocup.org/overview/2.html.

Rosenblatt, J. 1995. DAMN: A distributed architecture for mobile navigation. In AAAI Spring Symposium on Lessons Learned from Implemented Software Architectures for Physical Agents.

Welch, L.; Shirazi, B.; Ravindran, B.; and Bruggeman, C. 1998. DeSiDeRaTa: QoS management technology for dynamic, scalable, dependable real-time systems. In Proceedings of the 15th IFAC Workshop on Distributed Computer Control Systems.

Wooldridge, M., and Jennings, N. 1995. Intelligent agents: Theory and practice. Knowledge Engineering Review 10(2):115-152.