
From: AAAI Technical Report SS-02-04. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.
Team Coordination among Distributed Agents:
Analyzing Key Teamwork Theories and Models

David V. Pynadath and Milind Tambe
Computer Science Department and Information Sciences Institute
University of Southern California
4676 Admiralty Way, Marina del Rey, CA 90292
Email: pynadath@isi.edu, tambe@usc.edu
Abstract
Multiagent research has made significant progress in constructing teams of distributed entities (e.g., robots, agents, embedded systems) that act autonomously in the pursuit of common goals. There now exist a variety of prescriptive theories, as well as implemented systems, that can specify good team behavior in different domains. However, each of these theories and systems addresses different aspects of the teamwork problem, and each does so in a different language. In this work, we seek to provide a unified framework that can capture all of the common aspects of the teamwork problem (e.g., heterogeneous, distributed entities, uncertain and dynamic environment), while still supporting analyses of both the optimality of team performance and the computational complexity of the agents' decision problem. Our COMmunicative Multiagent Team Decision Problem (COM-MTDP) model provides such a framework for specifying and analyzing distributed teamwork. The COM-MTDP model is general enough to capture many existing models of multiagent systems, and we use this model to provide some comparative results of these theories. We also provide a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains. We then use the COM-MTDP model to compare (both analytically and empirically) two specific coordination theories (joint intentions theory and STEAM) against optimal coordination, in terms of both performance and computational complexity.
Introduction
A central challenge in the control and coordination of distributed agents, robots, and embedded systems is enabling these different, distributed entities to work together as a team, toward a common goal. Such teamwork is critical in a vast range of domains, e.g., future teams of orbiting spacecraft, sensor teams for tracking targets, teams of unmanned vehicles for urban battlefields, and software agent teams for assisting organizations in rapid crisis response. Research in teamwork theory has built the foundations for successful practical agent team implementations in such domains. On the forefront so far have been theories based on belief-desire-intentions (BDI) frameworks (e.g., joint intentions (Cohen & Levesque 1991), SharedPlans (Grosz & Kraus 1996), and others (Sonenberg et al. 1994)) that have provided prescriptions for coordination in practical systems. These theories have inspired the construction of practical, domain-independent teamwork models (Jennings 1995; Rich & Sidner 1997; Tambe 1997; Yen et al. 2001), successfully applied in a range of complex domains. While the BDI-based theories continue to be useful in practical systems, there is now a critical need for complementary foundational frameworks as practical teams scale up towards complex, real-world domains. Such complementary frameworks would address some key weaknesses in the existing theories, as outlined below:
• Existing teamwork theories provide prescriptions that usually ignore the various uncertainties and costs in real-world environments. While such frameworks can still provide satisfactory behavior, it is difficult to evaluate the degree of optimality of the behavior suggested by these theories/models. For instance, the joint intentions theory (Cohen & Levesque 1991) suggests that team members commit to attainment of mutual beliefs in key circumstances, but it ignores the cost of attaining mutual belief (e.g., via communication). Coordination strategies that blindly follow such prescriptions may lead to highly suboptimal implementations. While in some cases optimality may be practically unattainable, understanding the strengths and limitations of coordination prescriptions is definitely useful.
Of course, while existing theories have avoided costs and uncertainties, practical systems cannot do so in real-world environments. For instance, STEAM (Tambe 1997) extends the BDI-logic approach with decision-theoretic extensions. Unfortunately, while such approaches succeed in their intended domains, their very pragmatism often necessarily leads to a lack of theoretical rigor. Furthermore, it remains unclear whether their suggested prescriptions (e.g., STEAM's prescriptions) necessarily lead to optimal team coordination.
• Existing teamwork theories have so far failed to provide insights into the computational complexity of various aspects of teamwork decisions. Such results may show intractability of some coordination strategies in specific domains and potentially justify the use of practical teamwork models as an approximation to optimality in such domains.
To answer these needs, we propose a new complementary framework, the COMmunicative Multiagent Team Decision Problem (COM-MTDP), inspired by earlier work in economic team theory (Marschak & Radner 1971; Ho 1980), within the context of economics and operations research. While our COM-MTDP model borrows from a theory developed in another field, we are not just dressing up old wine in a new bottle. We make several contributions in applying and extending the original theory for applicability to intelligent distributed systems. First, we extend the theory to include key issues of interest in the distributed AI literature, most notably by explicitly including communication. Second, we introduce a categorization of teamwork problems based on these extensions. Third, we provide complexity analyses of these problems, providing a clearer understanding of the difficulty of teamwork problems under different circumstances. For instance, some researchers have advocated teamwork without communication (Matarić 1997). This paper clearly identifies domains where such team planning remains tractable, at least when compared to teamwork with communication. It also identifies domains where communication provides a significant advantage in teamwork by reducing the time complexity.

In addition to analyzing complexity, our framework supports the analysis of the optimality of different means of coordination. We isolate the conditions under which coordination based on joint intentions (Cohen & Levesque 1991) is preferred, based on the resulting performance of the distributed team at the desired task. We provide similar results for coordination based on the STEAM algorithm (Tambe 1997). We also analyze the complexity of the decision problem within these frameworks. The end result is the beginnings of a well-grounded characterization of the complexity-optimality tradeoff among various means of team coordination.

Finally, we are currently conducting experiments using the domain of simulated helicopter teams conducting a joint mission. We are evaluating different coordination strategies, and comparing them to the optimal strategy that we can compute using our framework. We will use the helicopter team as a running example in our explanations in the following sections.
Multiagent Team Decision Problems
Given a team of selfless, distributed entities, α, who intend to perform some joint task, we wish to evaluate the effectiveness of possible coordination policies. A coordination policy prescribes certain behavior, on top of the domain-level actions the agents perform in direct service of the joint task. We begin with the initial team decision model (Ho 1980). This initial model focused on a static decision, but we provide an extension to handle dynamic decisions over time. We also provide other extensions to the individual components to generalize the representation in ways suitable for representing multiagent domains. We represent our resulting multiagent team decision problem (MTDP) model as a tuple, ⟨S, A, P, Ω, O, R⟩. The remainder of this section describes each of these components in more detail.
World States: S
• S: a set of world states (the team's environment).
Domain-Level Actions: A
• {A_i}_{i∈α}: a set of actions available to each agent i ∈ α, implicitly defining a set of combined actions, A = Π_{i∈α} A_i.

These actions consist of tasks an agent may perform to change the environment (e.g., change helicopter altitude/heading). These actions correspond to the decision variables in team theory.
Extension to Dynamic Problem  The original conception of the team decision problem focused on a one-shot, static problem. Typical multiagent domains require multiple decisions made over time. We extend the original specification so that each component is a time series of random variables, rather than a single one. The interaction between the distributed entities and their environment obeys the following probabilistic distribution:

• P: state-transition function, S × A → Π(S).

This function represents the effects of domain-level actions (e.g., a flying action changes a helicopter's position). The given definition of P assumes that the world dynamics obey the Markov assumption.
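As a purely illustrative rendering of these first components, the sketch below encodes a finite S, per-agent action sets {A_i}, and a transition function P in Python. The state and action names and the dictionary-based representation are our own assumptions, chosen to echo the helicopter example, not part of the formal model.

```python
import random

# Illustrative finite encodings of S, {A_i}, and P (all names are our own).
STATES = ["start", "enemy_territory", "destination"]          # S
ACTIONS = {"transport": ["fly_low", "fly_normal"],            # A_i for each agent i
           "escort": ["fly_normal", "destroy_radar"]}

# P: S x A -> Pi(S), a distribution over successor states,
# keyed by (state, joint_action); values are {next_state: probability}.
P = {
    ("start", ("fly_low", "fly_normal")): {"enemy_territory": 1.0},
    ("enemy_territory", ("fly_low", "fly_normal")): {
        "enemy_territory": 0.7, "destination": 0.3},
    # ... remaining entries omitted for brevity
}

def sample_next_state(state, joint_action):
    """Sample S^{t+1} from P(S^t, A^t), per the Markov assumption."""
    dist = P[(state, joint_action)]
    states, probs = zip(*dist.items())
    return random.choices(states, weights=probs)[0]
```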
Agent Observations: Ω

• {Ω_i}_{i∈α}: a set of observations that each agent, i, can experience of its world, implicitly defining a combined observation, Ω = Π_{i∈α} Ω_i.

In a completely observable case, this set may correspond exactly to S × A, meaning that the observations are drawn directly from the actual state and combined actions. In the more general case, Ω may include elements corresponding to indirect evidence of the state (e.g., sensor readings) and actions of other agents (e.g., movement of other helicopters). In a team decision problem, the information structure represents the observation process of the agents. In the original conception, the information structure was a set of deterministic functions, O_i: S → Ω_i.
Extension of Allowable Information Structures  Such deterministic observations are rarely present in nontrivial distributed domains, so we extend the information structure representation to allow for uncertain observations. There are many choices of models possible for representing the observability of the domain. In this work, we use a general stochastic model, borrowed from the partially observable Markov decision process model (Smallwood & Sondik 1973):

• {O_i}_{i∈α}: an observation function, O_i: S × A → Π(Ω_i), that gives a distribution over possible observations that an entity can make. This function models the sensors, representing any errors, noise, etc. Each O_i is a cross product of observation functions of the state and of the actions of other entities: O_iS × O_iA, respectively. We can define a combined observation function, O: S × A → Π(Ω), taking the cross product over the individual observation functions: O = Π_{i∈α} O_i. Thus, the probability distribution specified by O forms the richer information structure used in our model.
We can make some useful distinctions between different classes of information structures:

Collective Unobservability  This is the general case, where we make no assumptions on the observations.

Collective Observability  ∀ω ∈ Ω, ∃s ∈ S such that Pr(S^t = s | Ω^t = ω) = 1. In other words, the state of the world is uniquely determined by the combined observations of the team of agents. The set of domains that are collectively observable is a strict subset of the domains that are collectively unobservable.

Individual Observability  ∀ω_i ∈ Ω_i, ∃s ∈ S such that Pr(S^t = s | Ω_i^t = ω_i) = 1. In other words, the state of the world is uniquely determined by the observations of each individual agent. The set of domains that are individually observable is a strict subset of the domains that are collectively observable.
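For finite models, these definitions can be checked mechanically. The sketch below is our own construction (not from the paper): it tests collective observability given the conditional distribution Pr(S^t | Ω^t), represented here as a plain dictionary.

```python
# Illustrative check of collective observability for a finite model:
# every combined observation omega must pin down a unique state with
# probability 1.

def is_collectively_observable(pr_state_given_obs):
    """pr_state_given_obs: {omega: {state: Pr(S^t = s | Omega^t = omega)}}."""
    for omega, dist in pr_state_given_obs.items():
        # Some state must receive essentially all the probability mass.
        if not any(abs(p - 1.0) < 1e-9 for p in dist.values()):
            return False
    return True

# A two-state example: each joint observation uniquely identifies the state.
example = {
    ("radar_blip", "radar_blip"): {"radar_alive": 1.0, "radar_dead": 0.0},
    ("silence", "silence"): {"radar_alive": 0.0, "radar_dead": 1.0},
}
assert is_collectively_observable(example)
```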
Policy (Strategy) Space: π

• π_iA: a domain-level policy (or strategy, in the original team theory specification) to govern each agent's behavior.

Each such policy maps an agent's belief state to an action, where, in the original formalism, the agent's beliefs correspond directly to its observations (i.e., π_iA: Ω_i → A_i).

Extension to Richer Strategy Space  We generalize the set of possible strategies to capture the more complex mental states of the agents. Each agent, i ∈ α, forms a belief state, b_i^t ∈ B_i, based on its observations seen through time t, where B_i circumscribes the set of possible belief states for the agent. We define the set of possible combined belief states over all agents to be B = Π_{i∈α} B_i. The corresponding random variable, b^t, represents the agents' combined belief state at time t. Thus, we define the set of possible domain-level policies as mappings from belief states to actions, π_iA: B_i → A_i. We elaborate on different types of belief states and the mapping of observations to belief states (i.e., the state estimator function) in a later section.
Reward Function: R
• R: reward function (or payoff function, or loss criterion), S × A → ℝ.

This function represents the team's joint preferences over domain-level states (e.g., destroying the enemy is good; returning to home base with only 10% of the original force is bad). The function also represents the cost of domain-level actions (e.g., flying consumes fuel). Thus, all the agents share the same common interest. Following the original team-theoretic specification, we assume a Markovian property for R.
Extension for Explicit Representation of Communication: Σ

We make an explicit separation between the domain-level actions (A in the original team-theoretic model) and communication actions. In principle, A can include communication actions, but the separation is useful for the purposes of this framework. Thus, we extend our initial MTDP model to be a communicative multiagent team decision problem (COM-MTDP), which we define as a tuple, ⟨S, A, Σ, P, Ω, O, R⟩, with the additional component, Σ, and an extended reward function, R, as follows:

• {Σ_i}_{i∈α}: a set of possible "speech acts", implicitly defining a set of combined communications, Σ = Π_{i∈α} Σ_i. In other words, an agent, i, may communicate an element, σ ∈ Σ_i, to its teammates, who interpret the communication as they wish (e.g., broadcast arrival in enemy territory). We assume here perfect communication, in that each agent immediately observes the speech acts of the others without any noise or delay.
• R: reward function, S × A × Σ → ℝ. This function represents the team's joint preferences over domain-level states and domain-level actions, as in the original team-theoretic model. However, we now extend it to also represent the cost of communicative acts (e.g., communication channels may have associated costs).

We assume that the cost of communication and the cost of domain-level actions are independent of each other, given the current state of the world. With this assumption, we can decompose the reward function into two components: a domain-level reward function, R_A: S × A → ℝ, and a communication-level reward function, R_Σ: S × Σ → ℝ. The total reward is the sum of the two component values: R(s, a, σ) = R_A(s, a) + R_Σ(s, σ). We assume that communication has no inherent benefit and may instead have some cost, so that for all states, s ∈ S, and messages, σ ∈ Σ, the communication-level reward is never positive: R_Σ(s, σ) ≤ 0.
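A minimal container for the COM-MTDP tuple ⟨S, A, Σ, P, Ω, O, R⟩, with the decomposed reward, might look as follows. This is our own encoding choice (field names and Python types are assumptions, not the paper's notation).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
JointAction = Tuple[str, ...]
JointMessage = Tuple[str, ...]

@dataclass
class COMMTDP:
    """Illustrative container for <S, A, Sigma, P, Omega, O, R>."""
    states: List[State]                                    # S
    actions: Dict[str, List[str]]                          # {A_i}
    messages: Dict[str, List[str]]                         # {Sigma_i}
    transition: Callable[[State, JointAction], Dict[State, float]]           # P
    observe: Callable[[State, JointAction], Dict[Tuple[str, ...], float]]    # O
    reward_domain: Callable[[State, JointAction], float]   # R_A
    reward_comm: Callable[[State, JointMessage], float]    # R_Sigma (always <= 0)

    def reward(self, s: State, a: JointAction, sigma: JointMessage) -> float:
        """Decomposed reward: R(s, a, sigma) = R_A(s, a) + R_Sigma(s, sigma)."""
        return self.reward_domain(s, a) + self.reward_comm(s, sigma)
```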
With the introduction of this communication stage, the agents now update their belief states at two distinct points within each decision epoch t: once upon receiving observation Ω_i^t (producing the pre-communication belief state b_{i•Σ}^t), and again upon observing the other agents' communications (producing the post-communication belief state b_{iΣ•}^t). The distinction allows us to differentiate between the belief state used by the agents in selecting their communication actions and the more "up-to-date" belief state used in selecting their domain-level actions. We also distinguish between the separate state-estimator functions used in each update phase:

b_{i•Σ}^t = SE_{i•Σ}(b_{iΣ•}^{t-1}, Ω_i^t)        b_{iΣ•}^t = SE_{iΣ•}(b_{i•Σ}^t, Σ^t)

where SE_{i•Σ}: B_i × Ω_i → B_i is the pre-communication state estimator for agent i, and SE_{iΣ•}: B_i × Σ → B_i is the post-communication state estimator for agent i. They specify functions that update the agent's beliefs based on the latest observation and communication, respectively. The initial state estimator, SE_i^0: ∅ → B_i, specifies the agent's prior beliefs, before any observations are made. For each of these, we also make the obvious definitions for the corresponding estimators for the combined belief states: SE_{•Σ}, SE_{Σ•}, and SE^0. Note that, although we assume perfect communication here, we could potentially use the post-communication state estimator to model any noise in the communication channel.
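One concrete instantiation (our own, not prescribed by the model) takes belief states to be probability distributions over S: the pre-communication estimator is then a Bayesian filter over the latest observation, and the post-communication estimator conditions on the received messages. The function names and interfaces below are assumptions.

```python
# Illustrative state estimators: belief states are {state: probability}
# dictionaries over S, updated by Bayesian conditioning.

def se_pre_comm(belief, observation, transition, observe, joint_action):
    """SE_{i.Sigma}: fold in the latest observation after a state transition."""
    predicted = {}
    for s, p in belief.items():
        for s2, p2 in transition(s, joint_action).items():
            predicted[s2] = predicted.get(s2, 0.0) + p * p2
    posterior = {s2: p * observe(s2, joint_action).get(observation, 0.0)
                 for s2, p in predicted.items()}
    total = sum(posterior.values())
    return {s2: p / total for s2, p in posterior.items()}

def se_post_comm(belief, messages, likelihood):
    """SE_{iSigma.}: condition on teammates' (perfectly received) messages.
    likelihood(s, messages) = Pr(messages | S^t = s), an assumed interface."""
    posterior = {s: p * likelihood(s, messages) for s, p in belief.items()}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}
```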
We extend our definition of an agent's policy to include a coordination policy, π_iΣ: B_i → Σ_i, analogous to the domain-level policy. We define the joint policies, π_Σ and π_A, as the combined policies across all agents in α. We also define the overall policy of agent i, π_i, as the pair ⟨π_iA, π_iΣ⟩, and the overall combined policy, π, as the pair ⟨π_A, π_Σ⟩.
Communication Subclasses

One key assumption we make is that communication has no latency, i.e., an agent's communication instantly reaches the other agents without delay. However, as with the observability function, we parameterize the communication costs associated with message transmissions:

General Communication  In this general case, we make no assumptions on the communication.

Free Communication  R(s, a, σ_1) = R(s, a, σ_2) for any σ_1, σ_2 ∈ Σ, s ∈ S, and a ∈ A. In other words, communication actions have no effect on the agents' reward.

No Communication  Σ = ∅, so the agents have no explicit communication capabilities. Alternatively, communication may be prohibitively expensive, so that ∀σ ∈ Σ, s ∈ S, and a ∈ A, R(s, a, σ) = −∞. If communication has such an unbounded negative cost, then we can effectively treat the agents as having no communication capabilities when determining optimal policies.

The free-communication case is common in the multiagent literature, when researchers wish to focus on issues other than communication cost. We also identify the no-communication case because such decision problems have been of interest to researchers as well (Matarić 1997). Of course, even if Σ = ∅, it is possible that there are domain-level actions in A that have communicative value. Such implicitly communicative actions may act as signals that convey information to the other agents. However, we still label such agent teams as having no communication for the purposes of the work here, where we gain extensive leverage from explicit models of communication when available.
Model Illustration

We can view the evolving state as a Markov chain with separate stages for domain-level and coordination-level actions. In other words, the agent team begins in some initial state, S^0, with separate features, f_1^0, …, f_n^0. We also add initial belief states for each agent i ∈ α, b_i^0 = SE_i^0(). Each agent, i ∈ α, receives an observation Ω_i^0 drawn from Ω_i according to the probability distribution O_i(S^0, ·), since there are no actions yet. Then, each agent updates its belief state, b_{i•Σ}^0 = SE_{i•Σ}(b_i^0, Ω_i^0).

Next, each agent i ∈ α selects a speech act according to its coordination policy, Σ_i^0 = π_iΣ(b_{i•Σ}^0), defining a combined speech act Σ^0. Each agent observes the communications of all of the others, and it updates its belief state, b_{iΣ•}^0 = SE_{iΣ•}(b_{i•Σ}^0, Σ^0). Each then selects an action according to its domain-level policy, A_i^0 = π_iA(b_{iΣ•}^0), defining a combined action A^0. In executing these actions, the team receives a single reward, R^0 = R(S^0, A^0, Σ^0). The world then moves into a new state, S^1, according to the distribution P(S^0, A^0), and the process continues.
Now, each agent i receives an observation Ω_i^1 drawn from Ω_i according to the distribution O_i(S^1, A^0), and it updates its belief state, b_{i•Σ}^1 = SE_{i•Σ}(b_{iΣ•}^0, Ω_i^1). Each agent again chooses its speech act, Σ_i^1 = π_iΣ(b_{i•Σ}^1). The agents then incorporate the received messages into their new belief states, b_{iΣ•}^1 = SE_{iΣ•}(b_{i•Σ}^1, Σ^1). The agents then choose new domain-level actions, A_i^1 = π_iA(b_{iΣ•}^1), resulting in yet another state, and so on.
Thus, in addition to the time series of world states, S^0, S^1, …, S^t, the agents themselves determine a time series of coordination-level and domain-level actions, Σ^0, Σ^1, …, Σ^t and A^0, A^1, …, A^t, respectively. We also have a time series of observations for each agent i, Ω_i^0, Ω_i^1, …, Ω_i^t. Likewise, we can treat the combined observations, Ω^0, Ω^1, …, Ω^t, as a similar time series of random variables. Finally, the agents receive a series of rewards, R^0, R^1, …, R^t. We can define the value, V, of the policies, π_A and π_Σ, as the expected reward received when executing those policies. Over a finite horizon, T, this value is:

V^T(π_A, π_Σ) = E[ Σ_{t=0}^{T} R^t | π_A, π_Σ ]    (1)
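The execution cycle just described translates directly into a simulation loop. The sketch below is our own rendering, with Monte-Carlo averaging standing in for the exact expectation in Equation 1; the interfaces (`model`, `policies`, `estimators`) are assumptions that mirror the components defined above.

```python
import random

def sample_from(dist):
    """Draw an outcome from a {outcome: probability} dictionary."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

def estimate_value(model, policies, estimators, horizon, episodes=1000):
    """Monte-Carlo estimate of V^T(pi_A, pi_Sigma) from Equation 1.
    policies[i] = (pi_A, pi_Sigma); estimators[i] = (SE_0, SE_pre, SE_post);
    model bundles initial, observe, transition, and reward (assumed interfaces)."""
    total = 0.0
    for _ in range(episodes):
        state = sample_from(model.initial)                      # S^0
        beliefs = {i: est[0]() for i, est in estimators.items()}
        joint_action = None  # no actions yet at t = 0
        for t in range(horizon + 1):
            # Observation phase: pre-communication belief update.
            for i in beliefs:
                obs = sample_from(model.observe(i, state, joint_action))
                beliefs[i] = estimators[i][1](beliefs[i], obs)
            # Communication phase: speech acts, post-communication update.
            sigma = tuple(policies[i][1](beliefs[i]) for i in sorted(beliefs))
            for i in beliefs:
                beliefs[i] = estimators[i][2](beliefs[i], sigma)
            # Domain-level phase: joint action, reward, state transition.
            joint_action = tuple(policies[i][0](beliefs[i])
                                 for i in sorted(beliefs))
            total += model.reward(state, joint_action, sigma)
            state = sample_from(model.transition(state, joint_action))
    return total / episodes
```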
              | Individually | Collectively  | Collectively
              | Observable   | Observable    | Unobservable
No Comm.      | P-complete   | NEXP-complete | NEXP-complete
Gen. Comm.    | P-complete   | NEXP-complete | NEXP-complete
Free Comm.    | P-complete   | P-complete    | PSPACE-complete

Table 1: Time complexity of COM-MTDPs under combinations of observability and communicability.
Complexity Analysis of Team Decision Problems
The problem facing these agents (or the designer of these agents) is how to construct the joint policies, π_Σ and π_A, so as to maximize their joint utility, as specified by R(S^t, A^t, Σ^t).

Theorem 1  The decision problem of determining whether there exist joint policies, π_Σ and π_A, for a given team decision problem that yield a total reward at least K over some finite horizon T (given integers K and T) is NEXP-complete if |α| ≥ 2 (i.e., more than one agent).

The proof for this theorem follows from some recent results in decentralized POMDPs (Bernstein, Zilberstein, & Immerman 2000). Due to lack of space, we will not provide the proofs for the theorems outlined in this paper.
The distributed nature of the agent team is one factor behind the complexity of the problem. The ability of the agents to communicate potentially allows sharing of observations and, thus, simplification of the agents' problem.

Theorem 2  The decision problem of determining whether there exist joint policies, π_Σ and π_A, for a given team decision problem with free communication, that yield a total reward at least K over some finite horizon T (given integers K and T) is PSPACE-complete.

Theorem 3  The decision problem of determining whether there exist joint policies, π_Σ and π_A, for a given team decision problem with free communication and collective observability, that yield a total reward at least K over some finite horizon T (given integers K and T) is P-complete.
Table 1 summarizes the above results. The columns outline different observability conditions, while the rows outline the different communication conditions. We can make the following observations:

• Difficult problems in teamwork (in the sense of providing agent teams with domain-level and coordination policies) arise under the conditions of collective observability or unobservability, and when there are costs associated with communication. In attacking teamwork problems, our energies should concentrate in these arenas.

• When the world is individually observable, communication makes little difference in performance.

• The collectively observable case clearly illustrates a case where teamwork without communication is highly intractable, potentially with a double exponential. However, with communication, the complexity drops down to a polynomial running time.

• There is a reduction in complexity in the collectively unobservable case as well, but the difference is not as significant as in the collectively observable case.
Evaluating Coordination Strategies
While providing domain-level and coordination policies for teams in general is seen to be a difficult challenge, many systems attempt to alleviate this difficulty by having human users provide agents with domain-level plans (Tambe 1997). The problem for the agents then is to generate appropriate team coordination actions, so as to ensure that the domain-level plans are suitably executed. In other words, agents are provided with π_A, but they must compute π_Σ autonomously. Different team coordination strategies have been proposed in the literature, essentially to compute π_Σ. We illustrate that the COM-MTDP can be used to evaluate such coordination strategies, by focusing on a key theory and a practical teamwork model.
Communication under Joint Intentions

Joint-intention theory provides a prescriptive framework for multiagent coordination in a team setting. It does not make any claims of optimality in its coordination, but it provides theoretical justifications for its prescriptions, grounded in the attainment of mutual belief among the team members. Our goal here is to identify the domain properties under which joint-intention theory provides a good basis for coordination. By "good", we mean that a team of agents that communicates as specified under joint-intention theory will achieve a high expected utility in performing a given task. A team following joint-intention theory may not perform optimally, but we wish to find a method for identifying precisely how suboptimal the performance will be. In addition, we have already shown that finding the optimal policy is a complex problem. The complexity of finding the joint-intention coordination policy is lower than that of the optimal policy. In many problem domains, we may be willing to incur the suboptimal performance in exchange for the gains in efficiency.
Under the joint-intention framework, the agents, α, make commitments to achieve certain joint goals. We specify a joint goal, G, as a subset of states, G ⊆ S, where the desired condition holds. Under joint-intention theory, when an individual privately believes that the team has achieved its joint goal, it should then attain mutual belief of this achievement with its teammates. Presumably, such a prescription indicates that joint intentions do not apply to individually observable environments, since all of the agents would observe that S^t ∈ G, and they would instantaneously attain mutual belief. Instead, the joint-intention framework aims at domains with some degree of unobservability. In such domains, the agents must communicate to attain mutual belief. However, we can also assume that joint-intention theory does not apply to domains with free communication. In such domains, we do not need any specialized prescription for behavior, since we can simply have the agents communicate everything, all the time.

The joint-intention framework does not specify a precise communication policy for the attainment of mutual belief. Let us consider one possible instantiation of joint-intention theory: a simple joint-intention communication policy, π_Σ^SJI, that prescribes that agents communicate the achievement of the joint goal, G, in each and every belief state where they believe G to be true. Under joint intentions, we make an assumption of sincerity (Smith & Cohen 1996), that the agents never select the special σ_G message in any belief state where G is not believed to be true with certainty. We make no other assumptions about what messages π_Σ^SJI specifies in all other belief states.
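In code, this instantiation reduces to a single test on the agent's belief state. The sketch below is our own encoding (with beliefs as distributions over S, and SIGMA_G and SILENCE as hypothetical message names), purely for illustration:

```python
# Illustrative SJI communication policy: send the special sigma_G message
# exactly when the belief state is certain that G holds (sincerity).

SIGMA_G = "achieved_G"   # hypothetical speech act announcing achievement of G
SILENCE = None           # hypothetical stand-in for "no message"

def pi_sji(belief, goal_states, epsilon=1e-9):
    """belief: {state: probability}; goal_states: the subset G of S."""
    pr_goal = sum(p for s, p in belief.items() if s in goal_states)
    return SIGMA_G if pr_goal >= 1.0 - epsilon else SILENCE

def pi_neg_sji(belief, goal_states):
    """The negated policy: never announce the achievement of G."""
    return SILENCE
```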
Given the assumption of sincerity, we can assume that all of the other agents immediately accept the special message, σ_G, as true, and we define our post-communication state estimator function accordingly. With this assumption, as well as our assumptions of perfect communication, the team attains mutual belief that G is true immediately upon receiving the message, σ_G.
We can define the negated simple joint-intention communication policy, π_Σ^¬SJI, as an alternative to π_Σ^SJI. In particular, this negated policy never prescribes communicating the σ_G message. For all other belief states, the two policies are identical. The difference between these policies depends on many factors within our model. Given that the domain is not individually observable, certain agents may not be aware of the achievement of G when it happens. The risk with the negated policy, π_Σ^¬SJI, is that, even if one of their teammates knows of the achievement, these unaware agents may unnecessarily continue performing actions in the pursuit of achieving G. The performance of these extraneous actions could potentially incur costs and lead to a lower utility than one would expect under the simple joint-intention policy, π_Σ^SJI.

              | Individually | Collectively     | Collectively
              | Observable   | Observable       | Unobservable
No Comm.      | Ω(1)         | Ω(1)             | Ω(1)
General Comm. | Ω(1)         | O((|S| · |Ω|)^T) | O((|S| · |Ω|)^T)
Free Comm.    | Ω(1)         | Ω(1)             | Ω(1)

Table 2: Time complexity of choosing between the SJI and ¬SJI policies at a single point in time.
To more precisely compare a communication policy against its negation, we define the following difference value, for a fixed domain-level policy, π_A:

Δ_SJI^T = V^T(π_A, π_Σ^SJI) − V^T(π_A, π_Σ^¬SJI)    (2)
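Given any routine for estimating V^T from Equation 1 (for instance, a wrapper around the Monte-Carlo sketch earlier), Δ_SJI^T can be estimated by evaluating the team under each communication policy and differencing. The helper below is our own; note that an exact evaluation would instead enumerate all state/observation sequences, which is the source of the O((|S| · |Ω|)^T) entries in Table 2.

```python
# Illustrative estimate of Delta^T_SJI from Equation 2 (our own helper).
# value_fn(model, pi_A, pi_Sigma, horizon) is an assumed interface returning
# an estimate of V^T(pi_A, pi_Sigma).

def estimate_delta_sji(value_fn, model, pi_A, pi_sji, pi_neg_sji, horizon):
    """A positive result favors communicating achievement of G (SJI)."""
    v_sji = value_fn(model, pi_A, pi_sji, horizon)
    v_neg = value_fn(model, pi_A, pi_neg_sji, horizon)
    return v_sji - v_neg
```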
So, all else being equal, we would like to see that the simple joint-intention policy dominates its negation, i.e., Δ_SJI^T ≥ 0. The execution of the two policies differs only if the agents achieve G and one of the agents comes to know this fact with certainty. Due to space restrictions, we omit the precise characterization of the policies, but we can use our COM-MTDP model to derive a computable expression corresponding to our condition, Δ_SJI^T ≥ 0. This expression takes the form of a page-long inequality grounded in the basic elements of our model. Informally, the inequality states that we prefer the SJI policy whenever the magnitude of the cost of execution after already achieving G outweighs the cost of communication of the fact that G has been achieved. At this high level, the result may sound obvious, but the inequality provides an operational criterion for an agent to make optimal decisions. Moreover, the details of the inequality provide a measure of the complexity of making such optimal decisions. In the worst case, determining whether this inequality holds for a given domain requires iteration over all of the possible sequences of states and observations, leading to a computational complexity of O((|S| · |Ω|)^T).
However, under certain domain conditions, we can reduce our derived general criterion to draw general conclusions about the SJI and ¬SJI policies. Under no communication, the inequality is always false, and we know to prefer ¬SJI (i.e., we do not communicate). Under free communication, the inequality is always true, and we know to prefer SJI (i.e., we always communicate). Under general communication, the determination is more complicated. When the domain is individually observable, the inequality is always false (unless under free communication), and we prefer the ¬SJI policy (i.e., we do not communicate).

When the environment is only partially observable, then agent i is unsure about whether its teammates have observed that G has been achieved. In such cases, the agent must evaluate the general criterion in its full complexity. In other words, it must consider all possible sequences of states and observations to determine the belief states of its teammates. Table 2 summarizes the complexity required to evaluate our general criterion under the various domain properties.
STEAM
In choosing between the SJI and ¬SJI policies, we are faced with two relatively inflexible styles of coordination: always communicate or never communicate. Recognizing that practical implementations must be more flexible in addressing varying communication costs, the STEAM teamwork model includes decision-theoretic communication selectivity. The algorithm that computes this selectivity uses an inequality, similar to the high-level view of our optimal criterion for communication: γ · C_mt > C_c. Here, C_c is the cost of communicating that the joint goal has been achieved; C_mt is the cost of miscoordinated termination, where miscoordination may arise because teammates may remain unaware that the goal has been achieved; and γ is the probability that the other agents are unaware of the goal being achieved.
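Because γ, C_mt, and C_c are fixed constants in STEAM, the test is a single comparison. The sketch below renders it directly; the parameter values in the example are hypothetical placeholders, not from the paper.

```python
# Constant-time rendering of STEAM's communication selectivity:
# communicate achievement of the joint goal iff gamma * C_mt > C_c.

def steam_should_communicate(gamma, cost_miscoordination, cost_communication):
    """gamma: Pr(teammates are unaware the goal was achieved);
    cost_miscoordination: C_mt; cost_communication: C_c."""
    return gamma * cost_miscoordination > cost_communication

# Hypothetical constants: likely unawareness, costly miscoordination,
# cheap message => STEAM says communicate.
assert steam_should_communicate(gamma=0.6, cost_miscoordination=10.0,
                                cost_communication=1.0)
```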
While each of these parameters has a corresponding expression in the optimal criterion for communication, in STEAM, each parameter is a fixed constant. Thus, STEAM's criterion is not sensitive to the particular world and belief state that the team finds itself in at the point of decision, nor does it conduct any lookahead over future states. Therefore, STEAM provides a rough approximation to the optimal criterion for communication, but it will suffer some suboptimality in domains of reasonable complexity.
On the other hand, while optimality is crucial, feasibility of execution is also important. Agents can compute the STEAM selectivity criterion in constant time, which presents an enormous savings over the worst-case complexity of the optimal decision, as shown in Table 2. In addition, STEAM is more flexible than the SJI policy, since its conditions for communication are more selective.
Summary and Future Experimental Work

In addition to providing these analytical results over general classes of problem domains, the COM-MTDP framework also supports the analysis of specific domains. Given a particular problem domain, we can use the COM-MTDP model to construct an optimal coordination policy. If the complexity of computing an optimal policy is prohibitive, we can instead use the COM-MTDP model to evaluate and compare candidates for approximate policies.
To provide a concrete illustration of the application of the COM-MTDP framework to a specific problem, we are investigating an example domain inspired by our experiences in constructing teams of agent-piloted helicopters. Consider two helicopters that must fly across enemy territory to their destination without detection. The first, piloted by agent A, is a transport vehicle with limited firepower. The second, piloted by agent B, is an escort vehicle with significant firepower. Somewhere along their path is an enemy radar unit, but its location is unknown (a priori) to the agents. Agent B's escort helicopter is capable of destroying the radar unit upon encountering it. However, agent A's transport helicopter is not, and it will itself be destroyed by enemy units if detected. On the other hand, agent A's transport helicopter can escape detection by the radar unit by traveling at a very low altitude (nap-of-the-earth flight), though at a lower speed than at its typical, higher altitude. In this scenario, agent B will not worry about detection, given its superior firepower; therefore, it will fly at a fast speed at its typical altitude.

The two agents form a top-level joint commitment, G_D, to reach their destination. There is no incentive for the agents to communicate the achievement of this goal, since they will both eventually reach their destination, at which point they will achieve mutual belief in the achievement of G_D. However, in the service of their top-level goal, G_D, the two agents also adopt a joint goal, G_R, of destroying the radar unit, since, without destroying the radar unit, agent A's transport helicopter cannot reach the destination. We consider here the problem facing agent B with respect to communicating the achievement of goal G_R. If agent B communicates the achievement of G_R, then agent A knows that the enemy is destroyed and that it is safe to fly at its normal altitude (thus reaching the destination sooner). If agent B does not communicate the achievement of G_R, there is still some chance that agent A will observe the event anyway. If agent A does not observe the achievement of G_R, then it must fly nap-of-the-earth the whole distance, and the team receives a lower reward because of the later arrival. Therefore, agent B must weigh the increase in expected reward against the cost of communication.
We are in the process of conducting experiments using this example scenario with the aim of characterizing the optimality-efficiency tradeoffs made by various policies. In particular, we will compute the expected reward achievable by the agents using the optimal coordination policy vs. STEAM vs. the simple joint-intentions policy. We will also measure the amount of computational time required to generate these various policies. We will evaluate this profile over various domain parameters, more precisely, agent A's ability to observe G_R and the cost of communication. The results of this investigation will appear in a forthcoming publication (Pynadath & Tambe 2002).
References
Bernstein, D. S.; Zilberstein, S.; and Immerman, N. 2000. The complexity of decentralized control of Markov decision processes. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 32-37.

Cohen, P. R., and Levesque, H. J. 1991. Teamwork. Nous 25(4):487-512.

Grosz, B., and Kraus, S. 1996. Collaborative plans for complex group actions. Artificial Intelligence 86:269-358.

Ho, Y.-C. 1980. Team decision theory and information structures. Proceedings of the IEEE 68(6):644-654.

Jennings, N. 1995. Controlling cooperative problem solving in industrial multi-agent systems using joint intentions. Artificial Intelligence 75:195-240.

Marschak, J., and Radner, R. 1971. The Economic Theory of Teams. New Haven, CT: Yale University Press.

Matarić, M. J. 1997. Using communication to reduce locality in multi-robot learning. In Proceedings of the National Conference on Artificial Intelligence, 643-648.

Pynadath, D. V., and Tambe, M. 2002. Multiagent teamwork: Analyzing the optimality and complexity of key theories and models. In Proceedings of the International Joint Conference on Autonomous Agents and Multi-Agent Systems, to appear.

Rich, C., and Sidner, C. 1997. COLLAGEN: When agents collaborate with people. In Proceedings of the International Conference on Autonomous Agents (Agents'97).

Smallwood, R. D., and Sondik, E. J. 1973. The optimal control of partially observable Markov processes over a finite horizon. Operations Research 21:1071-1088.

Smith, I. A., and Cohen, P. R. 1996. Toward a semantics for an agent communications language based on speech-acts. In Proceedings of the National Conference on Artificial Intelligence, 24-31.

Sonenberg, E.; Tidhar, G.; Werner, E.; Kinny, D.; Ljungberg, M.; and Rao, A. 1994. Planned team activity. Technical Report 26, Australian AI Institute.

Tambe, M. 1997. Towards flexible teamwork. Journal of Artificial Intelligence Research 7:83-124.

Yen, J.; Yin, J.; Ioerger, T. R.; Miller, M. S.; Xu, D.; and Volz, R. A. 2001. CAST: Collaborative agents for simulating teamwork. In Proceedings of the International Joint Conference on Artificial Intelligence.