From: AAAI Technical Report SS-02-04. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.
Team Formation for Reformation
Ranjit Nair and Milind Tambe and Stacy Marsella
Computer Science Department and Information Sciences Institute
University of Southern California
Los Angeles, CA 90089
{nair,tambe}@usc.edu, marsella@isi.edu
Introduction
The utility of the multi-agent team approach for coordination of distributed agents has been demonstrated in a number of large-scale systems for sensing and acting, such as sensor networks for real-time tracking of moving targets (Modi et al. 2001) and disaster rescue simulation domains such as the RoboCup Rescue Simulation Domain (Kitano et al. 1999; Tadokoro et al. 2000). These domains contain tasks that can be performed only by collaborative actions of the agents. Incomplete or incorrect knowledge, owing to constrained sensing and uncertainty of the environment, further motivates the need for these agents to explicitly work in teams. A key precursor to teamwork is team formation, the problem of how best to organize the agents into collaborating teams that perform the tasks that arise. For instance, in the disaster rescue simulation domain, injured civilians in a burning building may require teaming of two ambulances and three nearby fire brigades to extinguish the fire and quickly rescue the civilians. If there are several such fires and injured civilians, the teams must be carefully formed to optimize performance.
Our work in team formation focuses on dynamic, real-time environments, such as sensor networks (Modi et al. 2001) and the RoboCup Rescue Simulation Domain (Kitano et al. 1999; Tadokoro et al. 2000). In such domains, teams must be formed rapidly so that tasks are performed within given deadlines, and teams must be reformed in response to the dynamic appearance or disappearance of tasks. The problems with current team formation work for such dynamic real-time domains are two-fold: (i) most team formation algorithms (Tidhar, Rao, & Sonenberg 1996; Hunsberger & Grosz 2000; Fatima & Wooldridge 2001; Horling, Benyo, & Lesser 2001; Modi et al. 2001) are static, so that adapting to the changing environment requires running the static algorithm repeatedly; (ii) team formation has largely relied on experimental work, without any theoretical analysis of key properties of team formation algorithms, such as their worst-case complexity. This is especially important because of the real-time nature of the domains.
In this paper we take initial steps to attack both these problems. As the tasks change and members of the team fail, the current team needs to evolve to handle the changes. In both the sensor network domain (Modi et al. 2001) and RoboCup Rescue (Kitano et al. 1999; Tadokoro et al. 2000), each re-organization of the team requires time (e.g., fire brigades may need to drive to a new location) and is hence expensive, given the need for quick response. Clearly, the current configuration of agents is relevant to how quickly and how well they can be re-organized in the future. Each re-organization of the teams should be such that the resulting team is not only effective at performing the existing tasks but also flexible enough to adapt to new scenarios quickly. We refer to this reorganization of the teams as "Team Formation for Reformation". In order to solve the "Team Formation for Reformation" problem, we present R-COM-MTDPs (Roles and Communication in a Markov Team Decision Process), a formal model based on communicating decentralized POMDPs, to address the above shortcomings. R-COM-MTDP significantly extends an earlier model called COM-MTDP (Pynadath & Tambe 2002) by making the important additions of roles and agents' local states, to more closely model current complex multiagent teams. Thus, R-COM-MTDP provides decentralized optimal policies to take up and change roles in a team (planning ahead to minimize reorganization costs), and to execute such roles.

R-COM-MTDPs provide a general tool to analyze role-taking and role-executing policies in multiagent teams. We show that while generation of optimal policies in R-COM-MTDPs is NEXP-complete, different communication and observability conditions significantly reduce such complexity. In this paper, we use the disaster rescue domain to motivate the "Team Formation for Reformation" problem. We present real-world scenarios where such an approach would be useful and use the RoboCup Rescue Simulation Environment (Kitano et al. 1999; Tadokoro et al. 2000) to explain the working of our model.
Domain and Motivation
The RoboCup-Rescue Simulation Domain (Kitano et al. 1999; Tadokoro et al. 2000) provides an environment where large-scale earthquakes can be simulated and heterogeneous agents can collaborate in the task of disaster mitigation. Currently, the environment is a simulation of an earthquake in the Nagata ward in Kobe, Japan. As a result of the quake, many buildings collapse, civilians get trapped, roads get damaged, and gas leaks cause fires which spread to neighboring buildings. There are different kinds of agents that participate in the task of disaster mitigation, viz. fire brigades, ambulances, police forces, fire stations, ambulance centers and police stations. In addition to having a large number of heterogeneous agents, the state of the environment is rapidly changing: buried civilians die, fires spread to neighboring buildings, buildings get burnt down, rescue agents run out of stamina, etc. There is uncertainty in the system on account of incorrect information or information not reaching agents. In such a hostile, uncertain and dynamically changing environment, a purely reactive method of team formation that relies only on current information is not likely to perform as well as a method that takes into account future tasks and the reformations they require. Reformations take time, and hence we would like to form teams keeping in mind the future reformations that may be necessary.

While teams can be formed without look-ahead, the following scenarios illustrate the need for "Team Formation for Reformation".
1. A factory B catches fire at night. Since it is known that the factory is empty, no casualties are likely. Without looking ahead at the possible outcomes of this fire, one would not give too much importance to it and might assign just one or two fire brigades. However, if by looking ahead there is a high probability that the fire would spread to a nearby hospital, then more fire brigades and ambulances could be assigned to the factory and the surrounding area to reduce the response time. Moving fire brigades and ambulances to this area might leave other areas, where new tasks could arise, empty. Thus other ambulances and fire brigades could be moved to strategic locations within these areas.

2. There are two neighborhoods, one with small wooden houses close together and the other with houses of more fire-resistant material. Both these neighborhoods have a fire in them, with the fire in the wooden neighborhood being smaller at this time. Without looking ahead to how these fires might spread, more fire brigades may be assigned to the larger fire. But the fire in the wooden neighborhood might soon get larger and may require more fire brigades. Since we are strapped for resources, the response time to get more fire brigades from the first neighborhood to the second would be long and possibly critical.

3. There is an unexplored region of the world from which no reports of any incident have come in. This could be because nothing untoward has happened in that region or, more likely considering that a major earthquake has just taken place, because there has been a communication breakdown in that area. By considering both possibilities, it might be best if police agents take on the role of exploration to discover new tasks, while ambulances and fire brigades ready themselves to perform the new tasks that may be discovered.

Each of these scenarios demonstrates that looking ahead at what events may arise in the future is critical to knowing what teams will need to be formed. The time to form these future teams from the current teams could be greatly reduced if the current teams were formed keeping this future reformation in mind.
From COM-MTDP to R-COM-MTDP
In the following sub-sections we present R-COM-MTDP, a formal model based on COM-MTDP (Pynadath & Tambe 2002) that can address the shortcomings of existing team formation approaches, and its application to RoboCup Rescue.
COM-MTDP
Given a team of selfless agents, $\alpha$, a COM-MTDP (Pynadath & Tambe 2002) is a tuple $\langle S, A_\alpha, \Sigma_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R\rangle$. $S$ is a set of world states. $A_\alpha = \prod_{i\in\alpha} A_i$ is a set of combined domain-level actions, where $A_i$ is the set of actions for agent $i$. $\Sigma_\alpha = \prod_{i\in\alpha} \Sigma_i$ is a set of combined messages, where $\Sigma_i$ is the set of messages for agent $i$. $P(s_b, a, s_e) = \Pr(S^{t+1}=s_e \mid S^t=s_b, A_\alpha^t=a)$ governs the domain-level actions' effects. $\Omega_\alpha = \prod_{i\in\alpha} \Omega_i$ is a set of combined observations, where $\Omega_i$ is the set of observations for agent $i$. Observation functions, $O_\alpha = \prod_{i\in\alpha} O_i$, specify a probability distribution over an agent's observations: $O_i(s, a, \omega) = \Pr(\Omega_i^t=\omega \mid S^t=s, A_\alpha^{t-1}=a)$, and may be classified as:
Collective Partial Observability: No assumptions are made about the observability of the world state.

Collective Observability: The team's combined observations uniquely determine the world state: $\forall\omega \in \Omega_\alpha$, $\exists s \in S$ such that $\forall s' \neq s$, $\Pr(\Omega_\alpha^t = \omega \mid S^t = s') = 0$.

Individual Observability: Each individual's observation uniquely determines the world state: $\forall\omega \in \Omega_i$, $\exists s \in S$ such that $\forall s' \neq s$, $\Pr(\Omega_i^t = \omega \mid S^t = s') = 0$.
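To make the observability classes concrete, the following sketch (our own illustration, not part of the formal model; names such as is_collectively_observable are hypothetical) checks whether a tabulated joint observation function satisfies the collective observability condition, i.e., whether every joint observation with non-zero probability is consistent with at most one world state.

```python
from typing import Dict, Hashable, Tuple

State = Hashable
JointObs = Tuple[Hashable, ...]

def is_collectively_observable(
    obs_prob: Dict[Tuple[State, JointObs], float]
) -> bool:
    """Return True if every joint observation with non-zero probability
    is consistent with at most one world state (collective observability).

    obs_prob maps (state, joint_observation) -> Pr(Omega_alpha = omega | S = s),
    marginalized over actions for simplicity.
    """
    states_per_obs: Dict[JointObs, set] = {}
    for (state, omega), prob in obs_prob.items():
        if prob > 0.0:
            states_per_obs.setdefault(omega, set()).add(state)
    # Collective observability fails if some joint observation is compatible
    # with two distinct world states.
    return all(len(states) <= 1 for states in states_per_obs.values())
```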
Agent $i$ chooses its actions and communication based on its belief state, $b_i^t \in B_i$, derived from the observations and communication it has received through time $t$. $B_\alpha = \prod_{i\in\alpha} B_i$ is the set of possible combined belief states. As in the Xuan-Lesser model (Xuan, Lesser, & Zilberstein 2001), each decision epoch $t$ consists of two phases. In the first phase, each agent $i$ updates its belief state on receiving its observation, $\omega_i^t \in \Omega_i$, and chooses a message to send to its teammates. In the second phase, it updates its beliefs based on the communication received, $\Sigma_\alpha^t$, and then chooses its action. The agents use separate state-estimator functions to update their belief states: initial belief state, $b_{i\Sigma\bullet}^0 = SE_{i\Sigma\bullet}^0()$; pre-communication belief state, $b_{i\bullet\Sigma}^t = SE_{i\bullet\Sigma}(b_{i\Sigma\bullet}^{t-1}, \omega_i^t)$; and post-communication belief state, $b_{i\Sigma\bullet}^t = SE_{i\Sigma\bullet}(b_{i\bullet\Sigma}^t, \Sigma_\alpha^t)$.
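As an illustration of this two-phase epoch structure, the following Python sketch (our own, with hypothetical names; the actual state estimators SE are domain-specific) wires the pre- and post-communication belief updates to the communication and action choices within a single epoch.

```python
from dataclasses import dataclass
from typing import Any, Callable

Belief = Any       # agent i's belief state b_i
Observation = Any  # omega_i
Messages = Any     # joint messages Sigma_alpha

@dataclass
class AgentEstimators:
    """One agent's state estimators, following the two-phase epoch structure."""
    init: Callable[[], Belief]                         # SE^0_{i Sigma .}
    pre_comm: Callable[[Belief, Observation], Belief]  # SE_{i . Sigma}
    post_comm: Callable[[Belief, Messages], Belief]    # SE_{i Sigma .}

def run_epoch(est: AgentEstimators, belief: Belief,
              observation: Observation, received_msgs: Messages,
              comm_policy: Callable[[Belief], Any],
              action_policy: Callable[[Belief], Any]):
    """One COM-MTDP decision epoch for a single agent:
    observe -> update -> communicate -> update -> act."""
    b_pre = est.pre_comm(belief, observation)      # phase 1: fold in observation
    message = comm_policy(b_pre)                   # choose message to broadcast
    b_post = est.post_comm(b_pre, received_msgs)   # phase 2: fold in received messages
    action = action_policy(b_post)                 # choose domain-level action
    return b_post, message, action
```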
The COM-MTDP reward function represents the team's joint utility (shared by all members) over states and actions, $R : S \times \Sigma_\alpha \times A_\alpha \rightarrow \mathbb{R}$, and is the sum of two rewards: a domain-action-level reward, $R_A : S \times A_\alpha \rightarrow \mathbb{R}$, and a communication-level reward, $R_\Sigma : S \times \Sigma_\alpha \rightarrow \mathbb{R}$. COM-MTDP (and likewise R-COM-MTDP) domains can be classified based on the allowed communication and its reward:

General Communication: no assumptions on $\Sigma_\alpha$ nor $R_\Sigma$.

No Communication: $\Sigma_\alpha = \emptyset$.

Free Communication: $\forall\sigma \in \Sigma_\alpha$, $R_\Sigma(s, \sigma) = 0$.

Analyzing the extreme cases, like free communication (and others in this paper), helps to understand the computational impact of the extremes. In addition, we can approximate some real-world domains with such assumptions.
R-COM-MTDP Extensions to the COM-MTDP Model

We define an R-COM-MTDP as an extended tuple, $\langle S, A_\alpha, \Sigma_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R, \mathcal{PL}\rangle$. The key extension over the COM-MTDP is the addition of subplans, $\mathcal{PL}$, and the individual roles associated with those plans. Agents take on and change roles via explicit role-taking actions, which have associated costs. Such costs may represent preparation or training or traveling time for new members; e.g., if a sensor agent changes its role to join a new sub-team tracking a new target, there is a few seconds' delay in tracking. However, a change of roles may potentially provide significant future rewards.
We can define a role-taking policy, $\pi_{i\Upsilon} : B_i \rightarrow \Upsilon_i$, for each agent's role-taking actions, a role-execution policy, $\pi_{i\Phi} : B_i \rightarrow \Phi_i$, for each agent's role-execution actions, and a communication policy, $\pi_{i\Sigma} : B_i \rightarrow \Sigma_i$, for each agent's communication actions. The goal is to come up with joint policies $\pi_\Upsilon$, $\pi_\Phi$ and $\pi_\Sigma$ that will maximize the total reward.
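A minimal sketch of how these three per-agent policies might be represented, assuming belief states, actions and messages are opaque values (all names are hypothetical, not part of the model definition):

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Belief = Any
RoleTakingAction = Any     # upsilon in Upsilon_i
RoleExecutionAction = Any  # phi in Phi_i
Message = Any              # sigma in Sigma_i

@dataclass
class AgentPolicies:
    """One agent's three policies, each a mapping from its belief state."""
    role_taking: Callable[[Belief], RoleTakingAction]        # pi_{i Upsilon}
    role_execution: Callable[[Belief], RoleExecutionAction]  # pi_{i Phi}
    communication: Callable[[Belief], Message]               # pi_{i Sigma}

# A joint policy is simply one AgentPolicies object per agent in alpha.
JointPolicy = Dict[str, AgentPolicies]
```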
Extension for Explicit Local States, $S_i$: In considering distinct roles within a team, it is useful to consider distinct subspaces of $S$ relevant for each individual agent. If we consider the world state to be made up of orthogonal features (i.e., $S = \Xi_1 \times \Xi_2 \times \cdots \times \Xi_n$), then we can identify the subset of features that agent $i$ may observe. We denote this subset as its local state, $S_i = \Xi_{i_1} \times \Xi_{i_2} \times \cdots \times \Xi_{i_{m_i}}$. By definition, the observation that agent $i$ receives is independent of any features not covered by $S_i$: $\Pr(\Omega_i^t = \omega \mid S^t = \langle\xi_1, \xi_2, \ldots, \xi_n\rangle, A_\alpha^{t-1} = a) = \Pr(\Omega_i^t = \omega \mid S_i^t, A_\alpha^{t-1} = a)$.
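For illustration, a factored world state and an agent's local-state projection might be represented as follows (a sketch with hypothetical feature names, not the paper's implementation):

```python
from typing import Dict, Sequence, Tuple

# A factored world state: feature name -> feature value (S = Xi_1 x ... x Xi_n).
WorldState = Dict[str, object]

def local_state(world: WorldState, observable_features: Sequence[str]) -> Tuple:
    """Project the global state onto agent i's local state S_i,
    i.e., the subset of features the agent may observe."""
    return tuple(world[f] for f in observable_features)

# Example: a fire brigade in RoboCup Rescue might only observe features
# within its visual range (feature names are illustrative only).
world = {"fire_at_B": True, "road_12_blocked": False, "civilian_7_buried": True}
brigade_view = local_state(world, ["fire_at_B", "road_12_blocked"])
```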
Extension for Explicit Sub-Plans, $\mathcal{PL}$: $\mathcal{PL}$ is the set of all possible sub-plans that $\alpha$ can perform. We express a sub-plan $p_k \in \mathcal{PL}$ as a tuple of roles $\langle r_1, \ldots, r_s\rangle$. $r_{jk}$ represents an instance of role $r_j$ for plan $p_k$ and requires some agent $i \in \alpha$ to fulfill it. Roles enable better modeling of real systems, where each agent's role restricts its domain-level actions (Wooldridge, Jennings, & Kinny 1999). Agents' domain-level actions are now distinguished between two types:

Role-Taking actions: $\Upsilon_\alpha = \prod_{i\in\alpha} \Upsilon_i$ is the set of combined role-taking actions, where $\Upsilon_i = \{\upsilon_{i r_{jk}}\}$ contains the role-taking actions for agent $i$. $\upsilon_{i r_{jk}} \in \Upsilon_i$ means that agent $i$ takes on the role $r_j$ as part of plan $p_k$. An agent's role can be uniquely determined from its belief state.

Role-Execution actions: $\Phi_{i r_{jk}}$ is the set of agent $i$'s actions for executing role $r_j$ of plan $p_k$ (Wooldridge, Jennings, & Kinny 1999). $\Phi_i = \bigcup_{r_{jk}} \Phi_{i r_{jk}}$. This defines the set of combined execution actions $\Phi_\alpha = \prod_{i\in\alpha} \Phi_i$.

The distinction between role-taking and role-execution actions ($A_\alpha = \Upsilon_\alpha \cup \Phi_\alpha$) enables us to separate their costs. We can then compare the costs of different role-taking policies analytically and empirically, as shown in later sections. Within this model, we can represent the specialized behaviors associated with each role, and also any possible differences among the agents' capabilities for these roles. While filling a particular role, $r_{jk}$, agent $i$ can perform only those role-execution actions, $\phi \in \Phi_{i r_{jk}}$, which may not contain all of its available actions in $\Phi_i$. Another agent $l$ may have a different set of available actions, $\Phi_{l r_{jk}}$, allowing us to model the different methods by which agents $i$ and $l$ may fill role $r_{jk}$. These different methods can produce varied effects on the world state (as modeled by the transition probabilities, $P$) and the team's utility (as modeled by the reward function, $R$). Thus, the policies must ensure that the agents filling each role have the capabilities that benefit the team the most.

In R-COM-MTDPs (as in COM-MTDPs), each decision epoch consists of two stages, a communication stage and an action stage. In successive epochs, the agents alternate between role-taking and role-execution: the agents are in a role-taking epoch if the time index is divisible by 2, and in a role-execution epoch otherwise. Although this sequencing of role-taking and role-execution epochs prevents different agents from performing role-taking and role-execution actions in the same epoch, it is conceptually simple and synchronization is automatically enforced. As with COM-MTDP, the total reward is a sum of communication and action rewards, but the action reward is further separated into a role-taking component and a role-execution component: $R_A(s, a) = R_\Upsilon(s, a) + R_\Phi(s, a)$. By definition, $R_\Upsilon(s, \phi) = 0$ for all $\phi \in \Phi_\alpha$, and $R_\Phi(s, \upsilon) = 0$ for all $\upsilon \in \Upsilon_\alpha$. We view the role-taking reward as the cost (negative reward) for taking up different roles in different teams.

Application in RoboCup Rescue

The notation described above can be applied easily to the RoboCup Rescue domain as follows:

1. $\alpha$ consists of three types of agents: ambulances, police forces and fire brigades.

2. Injured civilians, buildings on fire and blocked roads can be grouped together to form tasks. We specify sub-plans for each such task type. These plans consist of roles that can be fulfilled by agents whose capabilities match those of the role.

3. A sub-plan, $p \in \mathcal{PL}$, comprises a variable number of roles that need to be fulfilled in order to accomplish a task. For example, the task of rescuing a civilian from a burning building can be accomplished by a plan where fire brigades first extinguish the fire, then ambulances free the buried civilian, and one ambulance takes the civilian to a hospital. Each task can have multiple plans, which represent multiple ways of achieving the task (see the sketch after this list).

4. Each agent receives observations about the objects within its visible range, but there may be parts of the world that are not observable because there are no agents there. Thus, RoboCup Rescue is a collectively partially observable domain, and each agent needs to maintain a belief state over what it believes the true world state to be.

5. The benefit function can be chosen to consider the capabilities of the agents to perform particular roles; e.g., police agents may be more adept at performing the "search" role than ambulances and fire brigades. This would be reflected in a higher value for choosing a police agent to take on the "search" role than an ambulance or a fire brigade. In addition, the reward function takes into consideration the number of civilians rescued, the number of fires put out and the health of agents.
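The following sketch illustrates, under our own hypothetical encoding, how sub-plans, role instances and the split between role-taking and role-execution actions could be represented for the rescue example above; the role names and action sets are illustrative only, not part of the model.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass(frozen=True)
class RoleInstance:
    """r_jk: an instance of role r_j required by sub-plan p_k."""
    role: str      # e.g. "extinguish", "free_civilian", "transport"
    plan_id: str   # identifies p_k

@dataclass
class SubPlan:
    """p_k in PL: a tuple of role instances needed to accomplish one task."""
    plan_id: str
    roles: List[RoleInstance]

# Hypothetical sub-plan for rescuing a civilian from a burning building.
rescue_plan = SubPlan("rescue_civilian_7", roles=[
    RoleInstance("extinguish", "rescue_civilian_7"),
    RoleInstance("free_civilian", "rescue_civilian_7"),
    RoleInstance("transport", "rescue_civilian_7"),
])

# Role-execution actions available while holding a role (Phi_{i r_jk});
# role-taking actions (Upsilon_i) switch which role an agent holds.
ROLE_EXECUTION_ACTIONS: Dict[str, Set[str]] = {
    "extinguish": {"move", "spray_water"},
    "free_civilian": {"move", "dig"},
    "transport": {"move", "load", "unload"},
}

def action_reward(role_taking_cost: float, execution_reward: float,
                  is_role_taking_epoch: bool) -> float:
    """R_A = R_Upsilon + R_Phi: only one component is non-zero per epoch,
    since role-taking and role-execution actions alternate."""
    return -role_taking_cost if is_role_taking_epoch else execution_reward
```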
The R-COM-MTDP model works as follows. Initially, the global world state is $S^0$, where each agent $i \in \alpha$ has local state $S_i^0$, belief state $b_{i\Sigma\bullet}^0 = SE_{i\Sigma\bullet}^0()$ and no role. Each agent $i$ receives an observation, $\omega_i^0$, according to the probability distribution $O_i(S^0, null, \omega_i^0)$ (there are no actions yet) and updates its belief state, $b_{i\bullet\Sigma}^0 = SE_{i\bullet\Sigma}(b_{i\Sigma\bullet}^0, \omega_i^0)$, to incorporate this new evidence. Each agent then decides what to broadcast based on its communication policy, $\pi_{i\Sigma}$, and updates its belief state according to $b_{i\Sigma\bullet}^0 = SE_{i\Sigma\bullet}(b_{i\bullet\Sigma}^0, \Sigma_\alpha^0)$. Each agent, based on its belief state, then executes a role-taking action according to its role-taking policy, $\pi_{i\Upsilon}$. Thus, some police agents may decide to perform the "search" role, while others may decide to "clear roads", and fire brigades decide which fires to put out. By the central assumption of teamwork, all of the agents receive the same joint reward, $R^0 = R(S^0, \Sigma_\alpha^0, A_\alpha^0)$. The world then moves into a new state, $S^1$, according to the distribution $P(S^0, A_\alpha^0)$. Each agent then receives the next observation about its new local state based on its position and its visual range, and updates its belief state using $b_{i\bullet\Sigma}^1 = SE_{i\bullet\Sigma}(b_{i\Sigma\bullet}^0, \omega_i^1)$. This is followed by another communication action, resulting in the belief state $b_{i\Sigma\bullet}^1 = SE_{i\Sigma\bullet}(b_{i\bullet\Sigma}^1, \Sigma_\alpha^1)$. The agent then decides on a role-execution action based on its policy $\pi_{i\Phi}$. It then receives new observations about its local state, and the cycle of observation, communication, role-taking action, observation, communication and role-execution action continues.
Complexity of R-COM-MTDPs

R-COM-MTDP supports a range of complexity analyses for generating optimal policies under different communication and observability conditions.

Theorem 1 We can reduce a COM-MTDP to an equivalent R-COM-MTDP.
Proof: Given a COM-MTDP, $\langle S, A_\alpha, \Sigma_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R\rangle$, we can generate an equivalent R-COM-MTDP, $\langle S, A'_\alpha, \Sigma_\alpha, P', \Omega_\alpha, O_\alpha, B_\alpha, R'\rangle$. Within the R-COM-MTDP actions, $A'_\alpha$, we define $\Upsilon_\alpha = \{null\}$ and $\Phi_\alpha = A_\alpha$. In other words, all of the original COM-MTDP actions become role-execution actions in the R-COM-MTDP, and we add a single role-taking action that has no effect (i.e., $P'(s, null, s) = 1$). The new reward function borrows the same role-execution and communication-level components: $R'_\Phi(s, a) = R_A(s, a)$ and $R'_\Sigma(s, \sigma) = R_\Sigma(s, \sigma)$. We also add the new role-taking component: $R'_\Upsilon(s, null) = 0$. Thus, the only role-taking policy possible for this R-COM-MTDP is $\pi_{i\Upsilon}(b) = null$, and any role-execution and communication policies ($\pi_\Phi$ and $\pi_\Sigma$, respectively) will have an expected reward identical to that of the identical domain-level and communication policies ($\pi_A$ and $\pi_\Sigma$, respectively) in the original COM-MTDP. □
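The construction in this proof is mechanical enough to sketch in code. The following illustration (our own, assuming a simple callable encoding of the transition and domain-level reward functions) wraps a COM-MTDP's actions as role-execution actions and adds the no-op null role-taking action:

```python
NULL_ROLE_TAKING = "null"

def commtdp_to_rcommtdp(states, actions, transition, reward_action):
    """Sketch of the Theorem 1 construction (hypothetical encoding):
    every COM-MTDP domain-level action becomes a role-execution action, and a
    single no-op role-taking action `null` is added with zero cost."""
    role_taking_actions = {NULL_ROLE_TAKING}
    role_execution_actions = set(actions)

    def transition_prime(s, a, s_next):
        # The null role-taking action leaves the state unchanged.
        if a == NULL_ROLE_TAKING:
            return 1.0 if s_next == s else 0.0
        return transition(s, a, s_next)

    def reward_role_taking(s, upsilon):
        return 0.0                      # R'_Upsilon(s, null) = 0

    def reward_role_execution(s, phi):
        return reward_action(s, phi)    # R'_Phi = R_A

    return (role_taking_actions, role_execution_actions,
            transition_prime, reward_role_taking, reward_role_execution)
```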
Theorem 2 We can reduce an R-COM-MTDP to an equivalent COM-MTDP. (The proof of this theorem was contributed by Dr. David Pynadath.)
Proof: Given an R-COM-MTDP, $\langle S, A_\alpha, \Sigma_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R, \mathcal{PL}\rangle$, we can generate an equivalent COM-MTDP, $\langle S', A_\alpha, \Sigma_\alpha, P', \Omega_\alpha, O_\alpha, B_\alpha, R'\rangle$. The COM-MTDP state space, $S'$, includes all of the features, $\Xi_i$, in the original R-COM-MTDP state space, $S = \Xi_1 \times \cdots \times \Xi_n$, as well as an additional feature, $\Xi_{phase} = \{taking, executing\}$. This new feature indicates whether the current state corresponds to a role-taking or a role-executing stage of the R-COM-MTDP. The new transition probability function, $P'$, augments the original function with an alternating behavior for this new feature: $P'(\langle\xi_{1b}, \ldots, \xi_{nb}, taking\rangle, \upsilon, \langle\xi_{1e}, \ldots, \xi_{ne}, executing\rangle) = P(\langle\xi_{1b}, \ldots, \xi_{nb}\rangle, \upsilon, \langle\xi_{1e}, \ldots, \xi_{ne}\rangle)$ and $P'(\langle\xi_{1b}, \ldots, \xi_{nb}, executing\rangle, \phi, \langle\xi_{1e}, \ldots, \xi_{ne}, taking\rangle) = P(\langle\xi_{1b}, \ldots, \xi_{nb}\rangle, \phi, \langle\xi_{1e}, \ldots, \xi_{ne}\rangle)$. Within the COM-MTDP, we restrict the actions that agents can take in each stage by assigning illegal actions an excessively negative reward (denoted $-r_{max}$): $\forall\upsilon \in \Upsilon_\alpha$, $R'_A(\langle\xi_{1b}, \ldots, \xi_{nb}, executing\rangle, \upsilon) = -r_{max}$, and $\forall\phi \in \Phi_\alpha$, $R'_A(\langle\xi_{1b}, \ldots, \xi_{nb}, taking\rangle, \phi) = -r_{max}$. Thus, for a COM-MTDP domain-level policy, $\pi_A$, we can extract role-taking and role-executing policies, $\pi_\Upsilon$ and $\pi_\Phi$, respectively, that generate identical behavior in the R-COM-MTDP when used in conjunction with identical communication-level policies, $\pi_\Sigma = \pi'_\Sigma$. □
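The reverse construction can be sketched similarly (again our own illustration, with a hypothetical tuple-based state encoding): a phase feature is appended to the state, forced to alternate by the transition function, and actions used in the wrong phase are penalized with $-r_{max}$.

```python
TAKING, EXECUTING = "taking", "executing"
R_MAX = 1e9  # excessively large penalty for illegal actions

def rcommtdp_to_commtdp(transition, reward_action,
                        role_taking_actions, role_execution_actions):
    """Sketch of the Theorem 2 construction (hypothetical encoding):
    add a `phase` feature that alternates between role-taking and
    role-execution, and penalize actions used in the wrong phase."""

    def transition_prime(state_with_phase, a, next_state_with_phase):
        (*s, phase) = state_with_phase
        (*s_next, next_phase) = next_state_with_phase
        # The phase feature must alternate on every transition.
        expected_next = EXECUTING if phase == TAKING else TAKING
        if next_phase != expected_next:
            return 0.0
        return transition(tuple(s), a, tuple(s_next))

    def reward_prime(state_with_phase, a):
        (*s, phase) = state_with_phase
        # Illegal phase/action combinations get -r_max, so an optimal
        # policy never chooses them.
        if phase == TAKING and a in role_execution_actions:
            return -R_MAX
        if phase == EXECUTING and a in role_taking_actions:
            return -R_MAX
        return reward_action(tuple(s), a)

    return transition_prime, reward_prime
```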
Thus, the problem of finding optimal policies for R-COM-MTDPs has the same complexity as the problem of finding optimal policies for COM-MTDPs. Table 1 shows the computational complexity results for various classes of R-COM-MTDP domains, where the results for individual, collective, and collective partial observability follow from COM-MTDPs (Pynadath & Tambe 2002) (see http://www.isi.edu/teamcore/COM-MTDP/ for proofs of the COM-MTDP results). In the individual observability case and the collective observability under free communication case, each agent knows exactly what the global state is; the P-Complete result follows from a reduction from and to MDPs. The collectively partially observable case with free communication can be treated as a single-agent POMDP, where the actions correspond to the joint actions of the R-COM-MTDP; the reduction from and to a single-agent POMDP gives the PSPACE-Complete result. In the general case, by a reduction from and to decentralized POMDPs, the worst-case computational complexity of finding the optimal policy is NEXP-Complete.

Table 1 shows us that the task of finding the optimal policy is extremely hard in general. However, by increasing the amount of communication we can drastically improve the worst-case computational complexity (by paying the price of the additional communication). As can be seen from the table, when communication changes from no communication to free communication in collectively partially observable domains like RoboCup Rescue, the computational complexity changes from NEXP-Complete to PSPACE-Complete. In collectively observable domains, like the sensor domain, the computational savings are even greater, from NEXP-Complete to P-Complete. This emphasizes the importance of communication in reducing the worst-case complexity. Table 1 also suggests that if we were designers of a multiagent system, we would increase the observability of the system so as to reduce the computational complexity.
              | Ind. Obs. | Coll. Obs.  | Coll. Part. Obs.
No Comm.      | P-Comp.   | NEXP-Comp.  | NEXP-Comp.
Gen. Comm.    | P-Comp.   | NEXP-Comp.  | NEXP-Comp.
Free Comm.    | P-Comp.   | P-Comp.     | PSPACE-Comp.

Table 1: Computational complexity of R-COM-MTDPs.
Summary and Related Work

This work addresses two shortcomings of the current work in team formation for dynamic real-time domains: (i) most algorithms are static, in the sense that they do not anticipate the changes that will be required in the configuration of the teams; (ii) complexity analysis of the problem is lacking. We addressed the first shortcoming by presenting R-COM-MTDP, a formal model based on decentralized communicating POMDPs, to determine the team configuration that takes into account how the team will have to be restructured in the future. R-COM-MTDP enables a rigorous analysis of complexity-optimality tradeoffs in team formation and reorganization approaches. The second shortcoming was addressed by presenting an analysis of the worst-case computational complexity of team formation and reformation under various types of communication. We intend to use the R-COM-MTDP model to compare the various team formation algorithms for RoboCup Rescue which were used by our agents in RoboCup-2001 and Robofesta 2001, where our agents finished in third place and second place, respectively.

While there are related multiagent models based on MDPs, they have focused on coordination after team formation on a subset of the domain types we consider, and they do not address team formation and reformation. For instance, the decentralized partially observable Markov decision process (DEC-POMDP) model (Bernstein, Zilberstein, & Immerman 2000) focuses on generating decentralized policies in collectively partially observable domains with no communication, while the Xuan-Lesser model (Xuan, Lesser, & Zilberstein 2001) focuses only on a subset of collectively observable environments. Finally, while Modi et al. (2001) provide an initial complexity analysis of distributed sensor team formation, their analysis is limited to static environments (no reorganizations), in fact illustrating the need for R-COM-MTDP-type analysis tools.
Acknowledgment
We would like to thank David Pynadath for his discussions on extending the COM-MTDP model to R-COM-MTDP, and the Intel Corporation for their generous gift that made this research possible.
References
Bernstein, D. S.; Zilberstein, S.; and Immerman, N. 2000. The complexity of decentralized control of MDPs. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence.

Fatima, S. S., and Wooldridge, M. 2001. Adaptive task and resource allocation in multi-agent systems. In Proceedings of the Fifth International Conference on Autonomous Agents.

Horling, B.; Benyo, B.; and Lesser, V. 2001. Using self-diagnosis to adapt organizational structures. In Proceedings of the Fifth International Conference on Autonomous Agents.

Hunsberger, L., and Grosz, B. 2000. A combinatorial auction for collaborative planning. In Proceedings of the Fourth International Conference on MultiAgent Systems.

Kitano, H.; Tadokoro, S.; Noda, I.; Matsubara, H.; Takahashi, T.; Shinjoh, A.; and Shimada, S. 1999. RoboCup Rescue: Search and rescue for large scale disasters as a domain for multi-agent research. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics.

Modi, P. J.; Jung, H.; Tambe, M.; Shen, W.-M.; and Kulkarni, S. 2001. A dynamic distributed constraint satisfaction approach to resource allocation. In Proceedings of the Seventh International Conference on Principles and Practice of Constraint Programming.

Pynadath, D., and Tambe, M. 2002. Multiagent teamwork: Analyzing the optimality and complexity of key theories and models. In Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems.

Tadokoro, S.; Kitano, H.; Tomoichi, T.; Noda, I.; Matsubara, H.; Shinjoh, A.; Koto, T.; Takeuchi, I.; Takahashi, H.; Matsuno, F.; Hatayama, M.; Nobe, J.; and Shimada, S. 2000. The RoboCup-Rescue: An international cooperative research project of robotics and AI for the disaster mitigation problem. In Proceedings of the SPIE 14th Annual International Symposium on Aerospace/Defense Sensing, Simulation, and Controls (AeroSense), Conference on Unmanned Ground Vehicle Technology II.

Tidhar, G.; Rao, A. S.; and Sonenberg, E. 1996. Guided team selection. In Proceedings of the Second International Conference on Multi-Agent Systems.

Wooldridge, M.; Jennings, N.; and Kinny, D. 1999. A methodology for agent oriented analysis and design. In Proceedings of the Third International Conference on Autonomous Agents.

Xuan, P.; Lesser, V.; and Zilberstein, S. 2001. Communication decisions in multiagent cooperation. In Proceedings of the Fifth International Conference on Autonomous Agents.