A Hybrid Hierarchical Schema-Based Architecture for Distributed Autonomous Agents

From: AAAI Technical Report SS-02-04. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.
Matthew Gillen, Arvind Lakshmikumar, David Chelberg, Cynthia Marling, Mark Tomko, and Lonnie Welch
School of Electrical Engineering and Computer Science
Ohio University, Athens, OH 45701 USA
Abstract

The advent of inexpensive computers has spurred interest in distributed architectures, in which a cluster of low-cost computers can achieve performance on the same scale as expensive supercomputers. The goal of research in distributed systems is to take a highly decomposable problem solution and put that solution into a framework that allows the system to take full advantage of the solution's decomposability. An architecture that allows additional processors to be taken advantage of with little or no manual reconfiguration would be ideal.

Artificial Intelligence (AI) algorithms for planning in nontrivial domains are typically resource-intensive. We believe that a framework for developing planning algorithms that allows an arbitrary level of decomposition and provides the means for distributing the computation would be a valuable contribution to the AI community.

We propose a general, flexible, and scalable hierarchical architecture for designing multi-agent distributed systems. We developed this architecture to facilitate the development of software for soccer-playing robots. These robots will compete in the international robot soccer competition RoboCup. All of our software, from the high-level team strategy planning down to the control loop for the motors on our robots, fits into this architecture.

We build on the notion of deliberative and reactive agents, forming a hierarchy of hybrid agents. The hierarchy covers the entire continuum of hybrid agents: agents near the root of the hierarchy are mostly deliberative, while agents near the leaves are almost purely reactive.
Introduction to Multi-Agent Systems
A multi-agent system can be thought of as a group of interacting agents working together to achieve a set of goals. To maximize the efficiency of the system, each agent must be able to reason about other agents' actions in addition to its own. A dynamic and unpredictable environment creates a need for an agent to employ flexible strategies. The more flexible the strategies, however, the more difficult it becomes to predict what the other agents are going to do. For this reason, coordination mechanisms have been developed to help the agents interact when performing complex actions requiring teamwork. These mechanisms must ensure that the plans of individual agents do not conflict, while guiding the agents in pursuit of the goals of the system.
Agents themselves have traditionally been categorized into one of the following types (Wooldridge and Jennings 1995):

• Deliberative
• Reactive
• Hybrid
Deliberative Agents
The key component of a deliberative agent is a central reasoning system (Ginsberg 1989) that constitutes the intelligence of the agent. Deliberative agents generate plans to accomplish their goals. A world model may be used in a deliberative agent, increasing the agent's ability to generate a plan that is successful in achieving its goals even in unforeseen situations. This ability to adapt is desirable in a dynamic environment.

The main problem with a purely deliberative agent when dealing with real-time systems is reaction time. For simple, well-known situations, reasoning may not be required at all. In some real-time domains, such as robotic soccer, minimizing the latency between changes in world state and reactions is important.
Reactive Agents
Reactive agents maintain no internal model of how to predict future states of the world. They choose actions by using the current world state as an index into a table of actions, where the indexing function's purpose is to map known situations to appropriate actions. These types of agents are sufficient for limited environments where every possible situation can be mapped to an action or set of actions.
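To make the table-lookup idea concrete, the following minimal sketch shows a purely reactive agent in Python. It is our own illustration, not code from our system; the state discretization and action names are hypothetical.

```python
# Minimal sketch of a purely reactive agent: a lookup table from
# discretized world states to actions. State and action names are
# hypothetical, chosen for a robot-soccer-like domain.

ACTION_TABLE = {
    ("ball_near", "goal_clear"):   "kick_toward_goal",
    ("ball_near", "goal_blocked"): "dribble_sideways",
    ("ball_far",  "any"):          "move_toward_ball",
}

def reactive_action(ball_distance, goal_blocked):
    """Index the current (discretized) world state into the action table."""
    ball_key = "ball_near" if ball_distance < 0.5 else "ball_far"
    goal_key = "goal_blocked" if goal_blocked else "goal_clear"
    # Fall back to a generic entry when the exact situation was not mapped.
    return (ACTION_TABLE.get((ball_key, goal_key))
            or ACTION_TABLE.get((ball_key, "any"))
            or "stop")

print(reactive_action(ball_distance=0.3, goal_blocked=False))  # kick_toward_goal
```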
The purely reactive agent's major drawback is its lack of adaptability. This type of agent cannot generate an appropriate plan if the current world state was not considered a priori.
Figure 1: Agent Hierarchy (a purely deliberative agent at the root; agents become more reactive toward the leaves)
In domains that cannot be completely mapped, using reactive agents can be too restrictive (Mataric 1995).
Hybrid Agents
Hybrid agents, when designed correctly, use both approaches to get the best properties of each (Bensaid and Mathieu 1997). Specifically, hybrid agents aim to have the quick response time of reactive agents for well-known situations, yet also have the ability to generate new plans for unforeseen situations.

We propose to have a hierarchy of agents spanning a continuum of deliberative and reactive components. At the root of the hierarchy are agents that are mostly deliberative, while at the leaf nodes are agents that are completely reactive. Figure 1 shows a visual representation of the hierarchy.
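A single node in this continuum can be pictured as a hybrid agent that reacts when the situation is well known and deliberates otherwise. The sketch below is our own illustration, assuming a hypothetical, level-specific planner passed in as a callable.

```python
# Sketch of one hybrid node: reactive table for known situations,
# deliberative planning as a fallback. The planner is a hypothetical
# placeholder for whatever deliberation a given level of the hierarchy uses.

class HybridAgent:
    def __init__(self, reaction_table, planner):
        self.reaction_table = reaction_table  # known world state -> action
        self.planner = planner                # callable(world_state) -> list of actions
        self._pending_plan = []

    def act(self, world_state):
        # Fast path: a well-known situation triggers an immediate reaction.
        if world_state in self.reaction_table:
            self._pending_plan = []
            return self.reaction_table[world_state]
        # Slow path: deliberate only when no canned reaction applies.
        if not self._pending_plan:
            self._pending_plan = list(self.planner(world_state))
        return self._pending_plan.pop(0) if self._pending_plan else "idle"
```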
A Hybrid Hierarchical Architecture

The Test Domain for the Architecture
This architecture was developed for the Robobcats (Chelberg et al. 2000; 2001) to cope with the demands of competing in the small-size RoboCup robot soccer league (Robocup Federation 2001). As such, examples and analogies will relate to that domain. The key features of the RoboCup domain as it relates to this architecture are as follows:

• One or more overhead cameras provide the world view
• All robots on the field must be untethered (e.g., use wireless communication)
• Any number of computers off the field may provide computing power
• Once the game starts, no human interaction may help or guide any robot
• The exact state of the world is not obtainable, nor is it possible to entirely predict the opponent's future actions, making long-term planning difficult

We believe that this domain incorporates many of the issues present in real-world planning problems, and therefore is an ideal testbed for our architecture.
A domain like RoboCup requires coordination to achieve the full benefit of applying multiple agents to the problem. In addition to the regular hardware/software problems faced by mobile robotic systems, soccer-playing robots also need to cope with communication delays, and with vision system imperfections and inconsistencies. Within such systems, each agent has a responsibility to carry out tasks that must benefit the whole team. When a team fails to perform well, it becomes difficult to assess and analyze the source of the problem. The agents might have incorrect individual strategies, or the overall team strategy may not be suitable. (Mataric 1998) and (Boutilier 1996) have suggested communication as a possible solution to resolve these problems in certain multi-agent environments. For our domain, extensive communication would result in unacceptable delays between changes in the world state and appropriate responses. Our architecture is designed to minimize communication among agents to keep the required bandwidth low.
The Schema
We use the term goal to mean a desired world state. A goal could imply a partial world state that is desired; for example, some goal may be accomplished if the ball reaches some position on the field irrespective of where all the players are. According to Webster's dictionary, a schema is "a mental image produced in response to a stimulus, that becomes a framework or basis for analyzing or responding to other related stimuli". We will use the term schema in the same spirit, but define it more concretely by saying that it is a strategy for achieving one or more goals. An ideal schema generates a plan, based on the current world state, that has the highest desirability for achieving some subset of the goals the schema was designed to accomplish. Desirability is a function of the utility of the goals and the probability of success of the plan. We care about the probability of success because even an ideal strategy producing an optimal plan cannot guarantee success in a world that cannot be completely modeled.
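Read this way, a schema can be summarized as an interface. The sketch below is our own rendering of the definition above, not code from our system; the method names and the goal objects carrying a utility attribute are hypothetical.

```python
# Sketch of the schema abstraction described above. Method names are
# hypothetical; the text defines the concepts, not this interface.
from abc import ABC, abstractmethod

class Schema(ABC):
    """A strategy for achieving one or more goals."""

    def __init__(self, goals):
        self.goals = goals  # the goals this schema was designed to accomplish

    @abstractmethod
    def generate_plan(self, world_state):
        """Return a plan: a simple action or a sequence of child-schema plans."""

    @abstractmethod
    def success_probability(self, plan, world_state):
        """Estimate how likely the plan is to succeed from this world state."""

    def desirability(self, world_state):
        # Desirability combines the utility of the goals pursued with the
        # probability that the generated plan actually succeeds.
        plan = self.generate_plan(world_state)
        utility = sum(goal.utility for goal in self.goals)  # hypothetical goal objects
        return utility * self.success_probability(plan, world_state)
```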
The Hierarchy of Schemas
The plan a schema generates consists of either a simple action or a sequential execution of other schemas' plans. Schemas are designed hierarchically; the lowest-level schemas perform simple actions such as moving a robot to a specified destination. A higher-level schema may use any schema that falls strictly below it in the schema hierarchy in order to generate a plan. For example, a higher-level schema might use the 'move to destination' schema several times sequentially to maneuver a robot around an obstacle.
When a parent schema tells a child schema (that is part of the parent's plan) to start executing its plan, a new thread of control is started for the execution of that plan. The parent's thread of execution does not stop, however. The parent must continue to monitor the world state, in case the plan it began executing becomes non-optimal. Thus, the higher-level 'avoid obstacle' schema may interrupt the execution of 'move to destination' if the obstacle moves unexpectedly. Continual monitoring of a plan's optimality at all levels of the hierarchy is a key part of this architecture.
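The thread-per-plan execution and continual monitoring described above might look roughly like the following sketch. It is our own illustration using Python threads; the child interface and the plan_still_optimal test are hypothetical stand-ins.

```python
# Sketch of a parent schema launching a child's plan in its own thread
# while continuing to monitor the world. The child interface and the
# optimality test are hypothetical illustrations of the idea above.
import threading
import time

def execute_child_plan(parent, child, world, poll_interval=0.05):
    stop_flag = threading.Event()
    worker = threading.Thread(
        target=child.execute_plan, args=(world, stop_flag), daemon=True)
    worker.start()  # the child's plan runs in its own thread of control

    # The parent's thread does not block: it keeps watching the world state
    # and interrupts the child if the running plan becomes non-optimal
    # (e.g., the obstacle being avoided moves unexpectedly).
    while worker.is_alive():
        if not parent.plan_still_optimal(world.current_state()):
            stop_flag.set()  # ask the child to abandon its plan
            break
        time.sleep(poll_interval)
    worker.join(timeout=1.0)
```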
Given this hierarchy of schemas (strategies), each schema is only responsible for optimality with respect to its own goals. The goals of the parent schema need not be considered, since it is assumed that the parent is optimizing its own goals. The only goals any given schema needs to consider are its own goals and the goals of its children.
This encapsulation helps to make the system manageable. If any one schema A seems to do too much, its goals can be split into smaller pieces and the schema divided into two levels: a high-level AH that keeps the same parents as A and manages the more abstract goals, and a lower-level AL that keeps the same children as A, but has AH as a parent. A sanity check of AH's parents to ensure that AH is still being used correctly should not be necessary, since AH provides the same functionality that A did, and AH's parents did not depend on anything that A used in its implementation (e.g., A's children).
An agent can be composed of one or more schemas. What constitutes the boundaries of an agent in this system is still an open question. We use the agent to represent a process, an atomic unit that may be distributed among the available resources. How many schemas a given agent contains is up to the designer. For maximum flexibility, a designer may decide that each schema is an agent. Or, if distributing the computation is not as important, the designer may group the top half of the hierarchy into one agent and the rest of the hierarchy into a set of agents.
The first implementation of our RoboCup system provides a concrete example. We grouped all schemas considering multiple robots into a single Meta-agent (e.g., formations, passing). Schemas that only dealt with one robot (e.g., roles in a formation) were grouped into a Player-agent. Finally, all schemas for interfacing with the hardware on the robots composed the Robot-agent. There was an instantiation of Player-agent and Robot-agent for each of our five physical robots.
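A minimal sketch of this grouping is shown below. The class and schema names are illustrative only, not our actual implementation.

```python
# Sketch of the schema-to-agent grouping used in our first RoboCup
# implementation. An Agent is the unit that becomes a process; the
# schema names here are illustrative placeholders.

class Agent:
    def __init__(self, name, schemas):
        self.name = name
        self.schemas = schemas  # schemas this process is responsible for

meta_agent = Agent("Meta", ["formation", "passing"])            # multi-robot schemas
player_agents = [Agent(f"Player-{i}", ["role_in_formation"])    # one per robot
                 for i in range(5)]
robot_agents = [Agent(f"Robot-{i}", ["motor_control", "hardware_interface"])
                for i in range(5)]
```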
Decision Making and its Role in the Architecture

When schemas have to make decisions in a dynamic environment, they have to take into account the fact that their actions may have several outcomes, some of which may be more desirable than others. They must balance the potential of a plan achieving a goal state against the risk of producing an undesirable state and against the cost of performing the plan. Decision theory (French 1988) provides an attractive framework for weighing the strengths and weaknesses of a particular course of action, with roots in probability theory and utility theory. Probabilistic methods provide coherent prescriptions for choosing actions and meaningful guarantees of the quality of these choices. While judgments about the likelihood of events are quantified by probabilities, judgments about the desirability of action consequences are quantified by utilities.

Given a probability distribution over the possible outcomes of an action in any state, and a reasonable utility (preference) function over outcomes, we can compute the expected utility of each action. The task of the schema then seems straightforward: to find the plan with the maximum expected utility (MEU).
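As an illustration of what MEU selection amounts to in code, the following sketch (our own, with hypothetical inputs) computes expected utilities from an outcome distribution and picks the plan with the highest value.

```python
# Generic maximum-expected-utility selection: each candidate plan maps to a
# probability distribution over outcomes; pick the plan whose expected
# utility is highest. The outcome model and utility function are
# hypothetical inputs, not part of the architecture itself.

def expected_utility(outcome_probs, utility):
    """outcome_probs: {outcome: probability}; utility: outcome -> float."""
    return sum(p * utility(outcome) for outcome, p in outcome_probs.items())

def best_plan(plans, outcome_model, utility):
    """outcome_model(plan) returns that plan's outcome distribution."""
    return max(plans,
               key=lambda plan: expected_utility(outcome_model(plan), utility))
```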
Desirability Analysis: An Example
Intuitively, the goals of a schema can be thought of as having a context-free utility that is the same across all schemas with that goal, and some context-dependent utility that is specific to the particular schema implementation. The context-free utility is used to imply that, in general, pursuing some goal with a high context-free utility is better than pursuing some other goal (with a lower context-free utility). It can also be interpreted as a way of expressing heuristics about the relationships between goals. For example, if we take the old adage that "the best defense is a good offense" as a general heuristic for accomplishing the goal of winning, then the goal of scoring would have a higher context-free utility compared to the goal of preventing opponent scores.

The context-dependent utility refers to the utility of the goal in the context of the strategy that the schema was designed to implement. A strategy may be implemented to achieve some set of goals H. As a by-product of achieving those goals, it may achieve some other set of goals L. However, since the main focus of the strategy is H and not L, the goals within H will have a context-dependent utility much greater than those goals which are a part of L. There are also other ways of ordering the relative context-free utilities of the set of all goals that a schema achieves. This is just one example.
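As a purely illustrative example (the numbers and the schema are made up), the two kinds of utility can be pictured as separate tables: a context-free table shared by every schema, and a context-dependent table local to one particular schema.

```python
# Illustrative only: context-free utilities shared across schemas, and a
# context-dependent table local to one hypothetical schema. Numbers are made up.

CONTEXT_FREE_UTILITY = {          # global heuristic ordering of goals
    "score": 0.8,                 # "the best defense is a good offense"
    "prevent_opponent_score": 0.6,
}

class AggressivePassingSchema:
    # Context-dependent utilities are local to this schema.
    CONTEXT_DEPENDENT_UTILITY = {
        "score": 0.9,                   # the goal set H this strategy targets
        "prevent_opponent_score": 0.2,  # goals in L, achieved only as a by-product
    }
```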
The important notion is that the ordering of context-dependent utilities does not matter outside of the context of the schema. All context-dependent utilities are local to the schema that defines them, and are never propagated to parents or children. So how does a parent decide whether a child schema is an appropriate strategy for accomplishing its goals?
An example is instructive. Assume a schema, As, is at the root node in the hierarchy, and that As's children are represented by the set Css = {Cs1, Cs2} (s denotes a schema, ss denotes a set of schemas). G(As) will represent As's goals, and G(Css) will represent the union of the goals of the elements of Css. Each child's goals represent some subset of the possible ways of achieving one or more elements of G(As). Assume G(As) has one element: win the game of soccer. Then G(Css) could consist of score and prevent opponent score, which could be thought of as components of the goal win.
For As to determine the best way of achieving G(As), it must decide which child will be permitted to execute its plan. As ranks its children based on the desirability of each strategy. Desirability is a function of the utility of achieving elements of G(Css) with respect to G(As) and the feasibility of the instantiated plan achieving G(As).

We have discussed what the utility of a goal represents, so we will move on to feasibility. Feasibility is a measure of how achievable a goal is given the current situation, irrespective of how useful accomplishing that goal would be. To estimate how achievable a goal is in the current situation, it is necessary to have a plan already instantiated. The feasibility is dependent on three things: a plan, a goal, and a current world state.
So, to rank the elements of Css within As, As asks each element of Css for the feasibility of its accomplishing each element of G(Css). For any child to compute the feasibility of achieving a given goal, the child must have a plan. Each child in turn develops a plan by asking its children about the feasibility of accomplishing their goals. This recursive structure allows accurate prediction of feasibility. Once the feasibility of a child's plan achieving each goal, F(cj(p), g), is determined, a ranking of the children according to desirability is possible.
We will now describe one way of computing desirability, although other algorithms are certainly possible. Given the utility of each goal in G(Css), U(gi), and a feasibility for each {child, goal} pair, F(cj, gi), we could define the desirability of a given child cj as

D(cj) = Σi U(gi) · F(cj, gi)    (1)

where the sum is taken over the goals gi in G(Css). We will call the most desirable child CD. If we now assume that As is not the root node, then when asked about the feasibility of some element gk of G(As) by one of As's parents, As would return a value computed by considering the feasibility of CD accomplishing gk.
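As a sketch of how Equation 1 and the recursive feasibility query fit together, consider the code below. It is our own illustration under the assumption of a simple child interface (a feasibility query plus local utility and leaf-feasibility tables), not our actual implementation.

```python
# Sketch of the desirability ranking in Equation 1 and the recursive
# feasibility query. The child interface is a hypothetical illustration
# of the scheme described above.

def desirability(child, goals, utility, world_state):
    # D(c) = sum over goals g of U(g) * F(c, g)
    return sum(utility[g] * child.feasibility(g, world_state) for g in goals)

def most_desirable_child(children, goals, utility, world_state):
    return max(children, key=lambda c: desirability(c, goals, utility, world_state))

class SchemaNode:
    def __init__(self, goals, utility, children=None, leaf_feasibility=None):
        self.goals = goals                    # goals this schema pursues
        self.utility = utility                # {goal: local, context-dependent utility}
        self.children = children or []        # child schemas (empty for leaf schemas)
        self.leaf_feasibility = leaf_feasibility or {}  # stubbed leaf estimates

    def feasibility(self, goal, world_state):
        """Answer a parent's query: how achievable is `goal` from here?"""
        if not self.children:
            # A leaf schema would estimate feasibility from the world state
            # directly; here that estimate is stubbed as a stored value.
            return self.leaf_feasibility.get(goal, 0.0)
        # Recursive case: the answer is based on the most desirable child's
        # feasibility of accomplishing the goal.
        best = most_desirable_child(self.children, self.goals,
                                    self.utility, world_state)
        return best.feasibility(goal, world_state)
```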
Related Work
Choosing the best plan for a given strategy entails selecting child schemas (actions) that are appropriate to the current situation. Many action selection mechanisms for robot control have been discussed in the literature. At the highest level, these mechanisms can be divided into two main categories: arbitration (Ishiguro et al. 1995) and command fusion (Rosenblatt 1995). Arbitration mechanisms can be divided into priority-based, state-based, and winner-take-all. The subsumption architecture (Brooks 1986) is a priority-based arbitration mechanism, where behaviors with higher priorities are allowed to subsume the output of behaviors with lower priority. State-based arbitration mechanisms include Discrete Event Systems (DES) (Kosecka and Bajcsy 1993), a similar method called temporal sequencing (Arkin 1992), and action selection based on Bayesian decision theory (Hager 1990). The advantage of the DES and temporal sequencing approaches is that they are based on a finite-state machine formalism that makes it easy to predict future states from current ones. However, this modeling scheme is known to be NP-hard. Finally, in winner-take-all mechanisms, action selection results from the interaction of a set of distributed behaviors that compete until one behavior wins the competition and takes control of the robot. An example is the activation network approach (Maes 1990), where it is shown that no central bureaucratic module is required.
Command fusion mechanisms can be divided into voting, fuzzy, and superposition approaches. Voting techniques interpret the output of each behavior as votes for or against possible actions. The action with the maximum weighted sum of votes is selected. Fuzzy command fusion methods are similar to voting; however, they use fuzzy inferencing methods to formalize the voting approach. Superposition techniques combine behavior recommendations using linear combinations.
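As a concrete illustration of the voting style of command fusion (a generic sketch of the idea, not the DAMN implementation or any system cited above), each behavior scores candidate actions and the weighted sum of votes picks the winner.

```python
# Generic sketch of voting-based command fusion: each behavior votes for or
# against candidate actions; the action with the highest weighted sum wins.

def fuse_by_voting(behaviors, weights, candidate_actions):
    """behaviors: list of callables mapping an action to a vote in [-1, 1]."""
    def weighted_score(action):
        return sum(w * vote(action) for vote, w in zip(behaviors, weights))
    return max(candidate_actions, key=weighted_score)

# Hypothetical usage: obstacle avoidance vetoes "forward" while ball seeking
# favors it; the weighted vote sum decides the fused command.
def avoid_obstacles(action):
    return -1.0 if action == "forward" else 0.5

def seek_ball(action):
    return 1.0 if action == "forward" else 0.0

print(fuse_by_voting([avoid_obstacles, seek_ball], [0.7, 0.5],
                     ["forward", "turn_left", "turn_right"]))  # turn_left
```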
Conclusions
We have presented an architecture employing a hierarchy of hybrid agents. The advantages of this architecture are many. It allows the planning process to be distributed across any available computing platforms. Additionally, the required communication among agents is minimized by communicating feasibilities of plans, not plan details. This architecture allows an arbitrary decomposition of the planning problem, while introducing minimal overhead. It also facilitates the integration of middleware-based resource managers, like DeSiDeRaTa (Welch et al. 1998), to ensure optimal use of resources in dynamic environments.

We will implement this architecture as part of our RoboCup effort this year. We expect to achieve low communication latencies while utilizing several off-field platforms to control a group of five robots playing soccer. This uniform architecture provides an ideal framework in which to build planning agents at different levels of abstraction, from team-level strategy to robot-level motor control. The uniformity of our approach to distributed hierarchical planning lends itself to rapid software development, and to more easily integrating the efforts of our team of programmers.
Acknowledgments
This work has been partially supported by the NASA CETDP grant "Resource Management for Real-Time Adaptive Agents" and by the Ohio University 1804 Research Fund. The authors would like to thank the entire Robobcats team, without whose support and contributions this work would not be possible.
References
Arkin, R. 1992. Integrating behavioral, perceptual, and world knowledge in reactive navigation. Robotics and Autonomous Systems 6(1):105-122.

Bensaid, N., and Mathieu, P. 1997. A hybrid architecture for hierarchical agents.

Boutilier, C. 1996. Planning, learning and coordination in multiagent decision processes. In Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge, 195-210. San Mateo, CA: Morgan Kaufmann.

Brooks, R. A. 1986. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation RA-2(1):14-23.

Chelberg, D. M.; Gillen, M.; Zhou, Q.; and Lakshmikumar, A. 2000. 3D-VDBM: 3D visual debug monitor for RoboCup. In IASTED International Conference on Computer Graphics and Imaging, 14-19.

Chelberg, D. M.; Welch, L.; Lakshmikumar, A.; Gillen, M.; and Zhou, Q. 2001. Meta-reasoning for a distributed agent architecture. In Proceedings of the Southeastern Symposium on System Theory, 377-381.

Cooper, G. 1990. The computational complexity of probabilistic inference using Bayesian networks. Artificial Intelligence 42(2-3):393-405.

French, S. 1988. Decision Theory. Ellis Horwood, Chichester, West Sussex, England.

Ginsberg, M. 1989. Universal planning: An (almost) universally bad idea. AI Magazine 10(4):40-44.

Hager, G. D. 1990. Task directed sensor fusion and planning: A computational approach. The Kluwer International Series in Engineering and Computer Science. Boston: Kluwer Academic Publishers.

Ishiguro, A.; Kondo, T.; Watanabe, Y.; and Uchikawa, Y. 1995. Dynamic behavior arbitration of autonomous mobile robots using immune networks. In IEEE International Conference on Evolutionary Computing (ICEC), 722-727.

Kosecka, J., and Bajcsy, R. 1993. Discrete event systems for autonomous mobile agents. In Workshop on Intelligent Robot Control, 21-31.

Maes, P. 1990. How to do the right thing. Connection Science Journal, Special Issue on Hybrid Systems 1(3):291-323.

Mataric, M. 1995. Issues and approaches in the design of collective autonomous agents. Robotics and Autonomous Systems 16(2-4):321-331.

Mataric, M. 1998. Using communication to reduce locality in distributed multiagent learning. Journal of Experimental and Theoretical Artificial Intelligence 10(3):357-369.

Noda, I., and Matsubara, H. 1996. Learning of cooperative actions in multi-agent systems: A case study of pass play in soccer. In Working Notes for the AAAI Symposium on Adaptation, Co-evolution and Learning in Multiagent Systems, 63-67.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, California: Morgan Kaufmann.

Robocup Federation. 2001. Robocup Overview. World Wide Web site, http://www.robocup.org/overview/2.html.

Rosenblatt, J. 1995. DAMN: A distributed architecture for mobile navigation. In AAAI Spring Symposium on Lessons Learned from Implemented Software Architectures for Physical Agents.

Welch, L.; Shirazi, B.; Ravindran, B.; and Bruggeman, C. 1998. DeSiDeRaTa: QoS management technology for dynamic, scalable, dependable real-time systems. In Proceedings of the 15th IFAC Workshop on Distributed Computer Control Systems.

Wooldridge, M., and Jennings, N. 1995. Intelligent agents: Theory and practice. Knowledge Engineering Review 10(2):115-152.