From: AAAI Technical Report SS-02-04. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.

Team Formation for Reformation

Ranjit Nair and Milind Tambe and Stacy Marsella
Computer Science Department and Information Sciences Institute
University of Southern California, Los Angeles, CA 90089
{nair,tambe}@usc.edu, marsella@isi.edu

Introduction

The utility of the multi-agent team approach for coordination of distributed agents has been demonstrated in a number of large-scale systems for sensing and acting, like sensor networks for real-time tracking of moving targets (Modi et al. 2001) and disaster rescue simulation domains, such as the RoboCup Rescue Simulation Domain (Kitano et al. 1999; Tadokoro et al. 2000). These domains contain tasks that can be performed only by collaborative actions of the agents. Incomplete or incorrect knowledge owing to constrained sensing and uncertainty of the environment further motivates the need for these agents to explicitly work in teams. A key precursor to teamwork is team formation, the problem of how best to organize the agents into collaborating teams that perform the tasks that arise. For instance, in the disaster rescue simulation domain, injured civilians in a burning building may require teaming of two ambulances and three nearby fire brigades to extinguish the fire and quickly rescue the civilians. If there are several such fires and injured civilians, the teams must be carefully formed to optimize performance.

Our work in team formation focuses on dynamic, real-time environments, such as sensor networks (Modi et al. 2001) and the RoboCup Rescue Simulation Domain (Kitano et al. 1999; Tadokoro et al. 2000). In such domains teams must be formed rapidly so tasks are performed within given deadlines, and teams must be reformed in response to the dynamic appearance or disappearance of tasks. The problems with the current team formation work for such dynamic real-time domains are two-fold: i) most team formation algorithms (Tidhar, Rao, & Sonenberg 1996; Hunsberger & Grosz 2000; Fatima & Wooldridge 2001; Horling, Benyo, & Lesser 2001; Modi et al. 2001) are static. In order to adapt to the changing environment the static algorithm would have to be run repeatedly. ii) Team formation has largely relied on experimental work, without any theoretical analysis of key properties of team formation algorithms, such as their worst-case complexity. This is especially important because of the real-time nature of the domains. In this paper we take initial steps to attack both these problems.

As the tasks change and members of the team fail, the current team needs to evolve to handle the changes. In both the sensor network domain (Modi et al. 2001) and RoboCup Rescue (Kitano et al. 1999; Tadokoro et al. 2000), each re-organization of the team requires time (e.g., fire brigades may need to drive to a new location) and is hence expensive because of the need for quick response. Clearly, the current configuration of agents is relevant to how quickly and well they can be re-organized in the future. Each reorganization of the teams should be such that the resulting team is effective at performing the existing tasks but also flexible enough to adapt to new scenarios quickly. We refer to this reorganization of the teams as "Team Formation for Reformation". In order to solve the "Team Formation for Reformation" problem, we present R-COM-MTDPs (Roles and Communication in a Markov Team Decision Process), a formal model based on communicating decentralized POMDPs, to address the above shortcomings.
R-COM-MTDP significantly extends an earlier model called COM-MTDP (Pynadath & Tambe 2002) by making important additions of roles and agents' local states, to more closely model current complex multiagent teams. Thus, R-COM-MTDP provides decentralized optimal policies to take up and change roles in a team (planning ahead to minimize reorganization costs), and to execute such roles. R-COM-MTDPs provide a general tool to analyze role-taking and role-executing policies in multiagent teams. We show that while generation of optimal policies in R-COM-MTDPs is NEXP-complete, different communication and observability conditions significantly reduce such complexity.

In this paper, we use the disaster rescue domain to motivate the "Team Formation for Reformation" problem. We present real-world scenarios where such an approach would be useful and use the RoboCup Rescue Simulation Environment (Kitano et al. 1999; Tadokoro et al. 2000) to explain the working of our model.

Domain and Motivation

The RoboCup-Rescue Simulation Domain (Kitano et al. 1999; Tadokoro et al. 2000) provides an environment where large-scale earthquakes can be simulated and heterogeneous agents can collaborate in the task of disaster mitigation. Currently, the environment is a simulation of an earthquake in the Nagata ward in Kobe, Japan. As a result of the quake many buildings collapse, civilians get trapped, roads get damaged and gas leaks cause fires which spread to neighboring buildings. There are different kinds of agents that participate in the task of disaster mitigation, viz. fire brigades, ambulances, police forces, fire stations, ambulance centers and police stations. In addition to having a large number of heterogeneous agents, the state of the environment is rapidly changing: buried civilians die, fires spread to neighboring buildings, buildings get burnt down, rescue agents run out of stamina, etc. There is uncertainty in the system on account of incorrect information or information not reaching agents. In such a hostile, uncertain and dynamically changing environment, a purely reactive method of team formation that relies on only current information is not likely to perform as well as a method that takes into account future tasks and reformations that are required. Reformations take time and hence we would like to form teams keeping in mind the future reformations that may be necessary. While teams can be formed without look-ahead, the following scenarios illustrate the need for "Team Formation for Reformation".

1. A factory B catches fire at night. Since it is known that the factory is empty no casualties are likely. Without looking ahead at the possible outcomes of this fire, one would not give too much importance to this fire and might assign just one or two fire brigades to it. However, if by looking ahead, there is a high probability that the fire would spread to a nearby hospital, then more fire brigades and ambulances could be assigned to the factory and the surrounding area to reduce the response time. Moving fire brigades and ambulances to this area might leave empty other areas where new tasks could arise. Thus other ambulances and fire brigades could be moved to strategic locations within these areas.

2. There are two neighborhoods, one with small wooden houses close together and the other with houses of more fire-resistant material. Both these neighborhoods have a fire in each of them, with the fire in the wooden neighborhood being smaller at this time. Without looking ahead to how these fires might spread, more fire brigades may be assigned to the larger fire.
But the fire in the wooden neighborhood might soon get larger and may require more fire brigades. Since we are strapped for resources, the response time to get more fire brigades from the first neighborhood to the second would be long and possibly critical.

3. There is an unexplored region of the world from which no reports of any incident have come in. This could be because nothing untoward has happened in that region or, more likely, considering that a major earthquake has just taken place, that there has been a communication breakdown in that area. By considering both possibilities, it might be best if police agents take on the role of exploration to discover new tasks, and ambulances and fire brigades ready themselves to perform the new tasks that may be discovered.

Each of these scenarios demonstrates that looking ahead at what events may arise in the future is critical to knowing what teams will need to be formed. The time to form these future teams from the current teams could be greatly reduced if the current teams were formed keeping this future reformation in mind.

From COM-MTDP to R-COM-MTDP

In the following sub-sections we present R-COM-MTDP, a formal model based on COM-MTDP (Pynadath & Tambe 2002) that can address the shortcomings of existing team formation approaches, and its application to RoboCup Rescue.

COM-MTDP

Given a team of selfless agents, $\alpha$, a COM-MTDP (Pynadath & Tambe 2002) is a tuple, $\langle S, A_\alpha, \Sigma_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R\rangle$. $S$ is a set of world states. $A_\alpha = \prod_{i \in \alpha} A_i$ is a set of combined domain-level actions, where $A_i$ is the set of actions for agent $i$. $\Sigma_\alpha = \prod_{i \in \alpha} \Sigma_i$ is a set of combined messages, where $\Sigma_i$ is the set of messages for agent $i$. $P(s_b, a, s_e) = \Pr(S^{t+1} = s_e \mid S^t = s_b, A_\alpha^t = a)$ governs the domain-level actions' effects. $\Omega_\alpha = \prod_{i \in \alpha} \Omega_i$ is a set of combined observations, where $\Omega_i$ is the set of observations for agent $i$. Observation functions, $O_\alpha = \prod_{i \in \alpha} O_i$, specify a probability distribution over an agent's observations: $O_i(s, a, \omega) = \Pr(\Omega_i^t = \omega \mid S^t = s, A_\alpha^{t-1} = a)$, and may be classified as:

Collective Partial Observability: No assumptions are made about the observability of the world state.

Collective Observability: The team's combined observations uniquely determine the world state: $\forall \omega \in \Omega_\alpha$, $\exists s \in S$ such that $\forall s' \neq s$, $\Pr(\Omega_\alpha^t = \omega \mid S^t = s') = 0$.

Individual Observability: Each individual's observation uniquely determines the world state: $\forall \omega \in \Omega_i$, $\exists s \in S$ such that $\forall s' \neq s$, $\Pr(\Omega_i^t = \omega \mid S^t = s') = 0$.

Agent $i$ chooses its actions and communication based on its belief state, $b_i^t \in B_i$, derived from the observations and communication it has received through time $t$. $B_\alpha = \prod_{i \in \alpha} B_i$ is the set of possible combined belief states. Like the Xuan-Lesser model (Xuan, Lesser, & Zilberstein 2001), each decision epoch $t$ consists of two phases. In the first phase, each agent $i$ updates its belief state on receiving its observation, $\omega_i^t \in \Omega_i$, and chooses a message to send to its teammates. In the second phase, it updates its beliefs based on the communication received, $\Sigma_\alpha^t$, and then chooses its action. The agents use separate state-estimator functions to update their belief states: initial belief state, $b_i^0 = SE_i^0()$; pre-communication belief state, $b_{i \bullet \Sigma}^t = SE_{i \bullet \Sigma}(b_{i \Sigma \bullet}^{t-1}, \omega_i^t)$; and post-communication belief state, $b_{i \Sigma \bullet}^t = SE_{i \Sigma \bullet}(b_{i \bullet \Sigma}^t, \Sigma_\alpha^t)$.
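To make the two-phase decision epoch concrete, here is a minimal Python sketch. The class and function names (ComMtdpAgent, run_epoch, pre_comm_update, and so on) are our own illustrative choices rather than anything defined by the COM-MTDP model, and the state estimators and policies are left as pluggable callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Belief = Any  # a belief state b_i is left abstract in this sketch

@dataclass
class ComMtdpAgent:
    """One agent i in the team alpha of a COM-MTDP (illustrative only)."""
    name: str
    init_belief: Callable[[], Belief]                         # SE_i^0
    pre_comm_update: Callable[[Belief, Any], Belief]          # folds in omega_i^t
    post_comm_update: Callable[[Belief, Dict[str, Any]], Belief]  # folds in Sigma_alpha^t
    comm_policy: Callable[[Belief], Any]                      # pi_{i Sigma}: B_i -> Sigma_i
    action_policy: Callable[[Belief], Any]                    # B_i -> A_i
    belief: Belief = None

def run_epoch(agents: List[ComMtdpAgent], observations: Dict[str, Any]) -> Dict[str, Any]:
    """One decision epoch: phase 1 (observe, communicate), phase 2 (act)."""
    # Phase 1: each agent folds in its observation and chooses a message.
    messages = {}
    for ag in agents:
        ag.belief = ag.pre_comm_update(ag.belief, observations[ag.name])
        messages[ag.name] = ag.comm_policy(ag.belief)
    # Phase 2: each agent folds in the broadcast messages and chooses an action.
    actions = {}
    for ag in agents:
        ag.belief = ag.post_comm_update(ag.belief, messages)
        actions[ag.name] = ag.action_policy(ag.belief)
    return actions
```

An actual COM-MTDP solver would supply concrete state estimators (e.g., exact Bayesian belief updates over $S$) and policies; the skeleton only fixes the ordering of observation, communication and action within an epoch.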
The COM-MTDP reward function represents the team's joint utility (shared by all members) over states and actions, $R : S \times \Sigma_\alpha \times A_\alpha \rightarrow \mathbb{R}$, and is the sum of two rewards: a domain-action-level reward, $R_A : S \times A_\alpha \rightarrow \mathbb{R}$, and a communication-level reward, $R_\Sigma : S \times \Sigma_\alpha \rightarrow \mathbb{R}$. COM-MTDP (and likewise R-COM-MTDP) domains can be classified based on the allowed communication and its reward:

General Communication: no assumptions on $\Sigma_\alpha$ nor $R_\Sigma$.

No Communication: $\Sigma_\alpha = \emptyset$.

Free Communication: $\forall \sigma \in \Sigma_\alpha$, $R_\Sigma(s, \sigma) = 0$.

Analyzing the extreme cases, like free communication (and others in this paper), helps to understand the computational impact of the extremes. In addition, we can approximate some real-world domains with such assumptions.

R-COM-MTDP Extensions to COM-MTDP Model

We define an R-COM-MTDP as an extended tuple, $\langle S, A_\alpha, \Sigma_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R, \mathcal{PL}\rangle$. The key extension over the COM-MTDP is the addition of sub-plans, $\mathcal{PL}$, and the individual roles associated with those plans. Taking up or changing a role incurs a cost: such costs may represent preparation or training or traveling time for new members, e.g., if a sensor agent changes its role to join a new sub-team tracking a new target, there is a few seconds' delay in tracking. However, a change of roles may potentially provide significant future rewards. We can define a role-taking policy, $\pi_{i\Upsilon} : B_i \rightarrow \Upsilon_i$, for each agent's role-taking action; a role-execution policy, $\pi_{i\Phi} : B_i \rightarrow \Phi_i$, for each agent's role-execution action; and a communication policy, $\pi_{i\Sigma} : B_i \rightarrow \Sigma_i$, for each agent's communication action. The goal is to come up with joint policies $\pi_\Upsilon$, $\pi_\Phi$ and $\pi_\Sigma$ that will maximize the total reward.

Extension for Explicit Local States, $S_i$: In considering distinct roles within a team, it is useful to consider distinct subspaces of $S$ relevant for each individual agent. If we consider the world state to be made up of orthogonal features (i.e., $S = \Xi_1 \times \Xi_2 \times \cdots \times \Xi_n$), then we can identify the subset of features that agent $i$ may observe. We denote this subset as its local state, $S_i = \Xi_{i_1} \times \Xi_{i_2} \times \cdots \times \Xi_{i_m}$. By definition, the observation that agent $i$ receives is independent of any features not covered by $S_i$: $\Pr(\Omega_i^t = \omega \mid S^t = \langle \xi_1, \xi_2, \ldots, \xi_n\rangle, A_\alpha^{t-1} = a) = \Pr(\Omega_i^t = \omega \mid S_i^t, A_\alpha^{t-1} = a)$.

Extension for Explicit Sub-Plans: $\mathcal{PL}$ is the set of all possible sub-plans that $\alpha$ can perform. We express a sub-plan $p_k \in \mathcal{PL}$ as a tuple of roles $\langle r_1, \ldots, r_s\rangle$. $r_{jk}$ represents a role instance of role $r_j$ for a plan $p_k$ and requires some agent $i \in \alpha$ to fulfill it. Roles enable better modeling of real systems, where each agent's role restricts its domain-level actions (Wooldridge, Jennings, & Kinny 1999). Agents' domain-level actions are now distinguished between two types:

Role-Taking actions: $\Upsilon_\alpha = \prod_{i \in \alpha} \Upsilon_i$ is a set of combined role-taking actions, where $\Upsilon_i = \{\upsilon_{i r_{jk}}\}$ contains the role-taking actions for agent $i$. $\upsilon_{i r_{jk}} \in \Upsilon_i$ means that agent $i$ takes on the role $r_j$ as part of plan $p_k$. An agent's role can be uniquely determined from its belief state.

Role-Execution Actions: $\Phi_{i r_{jk}}$ is the set of agent $i$'s actions for executing role $r_j$ of plan $p_k$ (Wooldridge, Jennings, & Kinny 1999). $\Phi_i = \bigcup_{r_{jk}} \Phi_{i r_{jk}}$. This defines the set of combined execution actions $\Phi_\alpha = \prod_{i \in \alpha} \Phi_i$.

The distinction between role-taking and role-execution actions ($A_\alpha = \Upsilon_\alpha \cup \Phi_\alpha$) enables us to separate their costs. We can then compare costs of different role-taking policies analytically and empirically, as shown in later sections. Within this model, we can represent the specialized behaviors associated with each role, and also any possible differences among the agents' capabilities for these roles. While filling a particular role, $r_{jk}$, agent $i$ can perform only those role-execution actions, $\phi \in \Phi_{i r_{jk}}$, which may not contain all of its available actions in $\Phi_i$. Another agent $i'$ may have a different set of available actions, $\Phi_{i' r_{jk}}$, allowing us to model the different methods by which agents $i$ and $i'$ may fill role $r_{jk}$. These different methods can produce varied effects on the world state (as modeled by the transition probabilities, $P$) and the team's utility (as modeled by the reward function, $R$). Thus, the policies must ensure that the agents chosen for each role have the capabilities that benefit the team the most.

In R-COM-MTDPs (as in COM-MTDPs), each decision epoch consists of two stages, a communication stage and an action stage. In successive epochs, the agents alternate between role-taking and role-execution actions. Thus, the agents are in a role-taking epoch if the time index is divisible by 2, and in a role-execution epoch otherwise. Although this sequencing of role-taking and role-execution epochs restricts different agents from performing role-taking and role-execution actions in the same epoch, it is conceptually simple and synchronization is automatically enforced. As with COM-MTDP, the total reward is a sum of communication and action rewards, but the action reward is further separated into a role-taking vs. a role-execution reward: $R_A(s, a) = R_\Upsilon(s, a) + R_\Phi(s, a)$. By definition, $R_\Upsilon(s, \phi) = 0$ for all $\phi \in \Phi_\alpha$, and $R_\Phi(s, \upsilon) = 0$ for all $\upsilon \in \Upsilon_\alpha$. We view the role-taking reward as the cost (negative reward) for taking up different roles in different teams.
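The role machinery can be summarized in a short sketch. The following fragment is illustrative only, under assumed names (Role, SubPlan and total_action_reward are hypothetical, not from the paper): it records which execution actions a role instance permits, groups role instances into a sub-plan, and splits the action reward so that role-taking actions are charged only through $R_\Upsilon$ and role-execution actions only through $R_\Phi$.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Tuple

State = Any  # a world state s in S, left abstract

@dataclass(frozen=True)
class Role:
    """A role instance r_jk: role r_j within sub-plan p_k."""
    name: str                        # e.g. "extinguish"
    plan: str                        # e.g. "rescue-civilian"
    allowed_actions: FrozenSet[str]  # Phi_{i r_jk}: execution actions legal in this role

@dataclass(frozen=True)
class SubPlan:
    """A sub-plan p_k in PL, expressed as a tuple of role instances."""
    name: str
    roles: Tuple[Role, ...]

def total_action_reward(s: State, action: Any,
                        r_taking: Callable[[State, Any], float],
                        r_exec: Callable[[State, Any], float],
                        is_role_taking_epoch: bool) -> float:
    """R_A(s, a) = R_Upsilon(s, a) + R_Phi(s, a); one term is zero by definition."""
    if is_role_taking_epoch:
        return r_taking(s, action)   # R_Phi(s, upsilon) = 0 for role-taking actions
    return r_exec(s, action)         # R_Upsilon(s, phi) = 0 for role-execution actions

# Hypothetical example: one way a "rescue civilian" task might be encoded.
rescue = SubPlan("rescue-civilian", (
    Role("extinguish", "rescue-civilian", frozenset({"move", "spray_water"})),
    Role("transport", "rescue-civilian", frozenset({"move", "load", "unload"})),
))
```

A role-taking reward function would typically return a negative value (the cost of travel or preparation), while the role-execution reward reflects task progress.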
Application in RoboCup Rescue

The notation described above can be applied to the RoboCup Rescue domain as follows:

1. $\alpha$ consists of three types of agents: ambulances, police forces and fire brigades.

2. Injured civilians, buildings on fire and blocked roads can be grouped together to form tasks. We specify sub-plans for each such task type. These plans consist of roles that can be fulfilled by agents whose capabilities match those of the role.

3. A sub-plan, $p_k \in \mathcal{PL}$, comprises a variable number of roles that need to be fulfilled in order to accomplish a task. For example, the task of rescuing a civilian from a burning building can be accomplished by a plan where fire brigades first extinguish the fire, then ambulances free the buried civilian and one ambulance takes the civilian to a hospital. Each task can have multiple plans, which represent multiple ways of achieving the task.

4. Each agent receives observations about the objects within its visible range. But there may be parts of the world that are not observable because there are no agents there. Thus, RoboCup Rescue is a collectively partially observable domain. Therefore each agent needs to maintain a belief state of what it believes the true world state is.

5. The reward function can be chosen to consider the capabilities of the agents to perform particular roles; e.g., police agents may be more adept at performing the "search" role than ambulances and fire brigades. This would be reflected in a higher value for choosing a police agent to take on the "search" role than an ambulance or a fire brigade. In addition, the reward function takes into consideration the number of civilians rescued, the number of fires put out and the health of agents.

The R-COM-MTDP model works as follows. Initially, the global world state is $S^0$, where each agent $i \in \alpha$ has local state $S_i^0$, belief state $b_i^0 = SE_i^0()$ and no role. Each agent $i$ receives an observation, $\omega_i^0$, according to the probability distribution $O_i(S^0, null, \omega_i^0)$ (there are no actions yet) and updates its belief state, $b_{i \bullet \Sigma}^0 = SE_{i \bullet \Sigma}(b_i^0, \omega_i^0)$, to incorporate this new evidence. Each agent then decides what to broadcast based on its communication policy, $\pi_{i\Sigma}$, and updates its belief state according to $b_{i \Sigma \bullet}^0 = SE_{i \Sigma \bullet}(b_{i \bullet \Sigma}^0, \Sigma_\alpha^0)$. Each agent, based on its belief state, then executes a role-taking action according to its role-taking policy, $\pi_{i\Upsilon}$. Thus, some police agents may decide on performing the "search" role, while others may decide to "clear roads", and fire brigades decide which fires to put out. By the central assumption of teamwork, all of the agents receive the same joint reward, $R^0 = R(S^0, \Sigma_\alpha^0, A_\alpha^0)$. The world then moves into a new state, $S^1$, according to the distribution $P(S^0, A_\alpha^0)$. Each agent then receives the next observation about its new local state based on its position and its visual range, and updates its belief state using $b_{i \bullet \Sigma}^1 = SE_{i \bullet \Sigma}(b_{i \Sigma \bullet}^0, \omega_i^1)$. This is followed by another communication action, resulting in the belief state $b_{i \Sigma \bullet}^1 = SE_{i \Sigma \bullet}(b_{i \bullet \Sigma}^1, \Sigma_\alpha^1)$. The agent then decides on a role-execution action based on its policy $\pi_{i\Phi}$. It then receives new observations about its local state, and the cycle of observation, communication, role-taking action, observation, communication and role-execution action continues.
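The cycle just described can be written down as a simulation skeleton. The sketch below is a hypothetical outline, not RoboCup Rescue kernel code: the agent and environment interfaces (env.observe, env.step, role_taking_policy, and so on) are placeholder names we introduce for illustration. It captures only the ordering of observation, communication, and the alternating role-taking and role-execution epochs.

```python
from typing import Any, List

def simulate(agents: List[Any], env: Any, horizon: int) -> float:
    """Run an R-COM-MTDP-style cycle: role-taking on even t, role execution on odd t."""
    total_reward = 0.0
    for ag in agents:
        ag.belief = ag.init_belief()                      # b_i^0 = SE_i^0(), no role yet
    for t in range(horizon):
        # Observation phase: each agent observes its local state (drawn from O_i).
        for ag in agents:
            omega = env.observe(ag)
            ag.belief = ag.pre_comm_update(ag.belief, omega)
        # Communication phase: broadcast messages, then fold them into beliefs.
        messages = {ag.name: ag.comm_policy(ag.belief) for ag in agents}
        for ag in agents:
            ag.belief = ag.post_comm_update(ag.belief, messages)
        # Action phase: alternate between role-taking and role-execution epochs.
        if t % 2 == 0:
            joint_action = {ag.name: ag.role_taking_policy(ag.belief) for ag in agents}
        else:
            joint_action = {ag.name: ag.role_execution_policy(ag.belief) for ag in agents}
        # All agents share the same joint reward.
        total_reward += env.step(joint_action)
    return total_reward
```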
Complexity of R-COM-MTDPs

R-COM-MTDP supports a range of complexity analyses for generating optimal policies under different communication and observability conditions.

Theorem 1 We can reduce a COM-MTDP to an equivalent R-COM-MTDP.

Proof: Given a COM-MTDP, $\langle S, A_\alpha, \Sigma_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R\rangle$, we can generate an equivalent R-COM-MTDP, $\langle S, A'_\alpha, \Sigma_\alpha, P', \Omega_\alpha, O_\alpha, B_\alpha, R'\rangle$. Within the R-COM-MTDP actions, $A'_\alpha$, we define $\Upsilon_\alpha = \{null\}$ and $\Phi_\alpha = A_\alpha$. In other words, all of the original COM-MTDP actions become role-execution actions in the R-COM-MTDP, where we add a single role-taking action that has no effect (i.e., $P'(s, null, s) = 1$). The new reward function borrows the same role-execution and communication-level components: $R'_\Phi(s, a) = R_A(s, a)$ and $R'_\Sigma(s, \sigma) = R_\Sigma(s, \sigma)$. We also add the new role-taking component: $R'_\Upsilon(s, null) = 0$. Thus, the only role-taking policy possible for this R-COM-MTDP is $\pi_{i\Upsilon}(b) = null$, and any role-execution and communication policies ($\pi_\Phi$ and $\pi_\Sigma$, respectively) will have an identical expected reward as the identical domain-level and communication policies ($\pi_A$ and $\pi_\Sigma$, respectively) in the original COM-MTDP. □
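The construction in Theorem 1 is mechanical enough to sketch directly. Below is an illustrative fragment under our own representation assumptions (a COM-MTDP passed around as a dictionary of components; nothing here is from the paper): it reuses the original domain-level actions as role-execution actions and adds a single no-effect null role-taking action with zero reward.

```python
from typing import Any, Dict

def com_mtdp_to_r_com_mtdp(com_mtdp: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a COM-MTDP (dict of components) as an R-COM-MTDP, per Theorem 1.

    Assumed keys: 'P' (transition fn s, a, s' -> prob), 'R_A' (domain action
    reward), 'A' (domain actions). The communication components are carried
    over unchanged; 'null' becomes the only role-taking action.
    """
    def P_prime(s, a, s_next):
        if a == "null":                            # the added role-taking action
            return 1.0 if s_next == s else 0.0     # has no effect on the state
        return com_mtdp["P"](s, a, s_next)

    def R_taking(s, a):
        return 0.0                                 # R'_Upsilon(s, null) = 0

    return {
        **com_mtdp,
        "Upsilon": {"null"},                       # role-taking actions
        "Phi": com_mtdp["A"],                      # role-execution actions = A_alpha
        "P": P_prime,
        "R_Phi": com_mtdp["R_A"],                  # R'_Phi = R_A
        "R_Upsilon": R_taking,
    }
```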
Theorem 2 We can reduce an R-COM-MTDP to an equivalent COM-MTDP.¹

Proof: Given an R-COM-MTDP, $\langle S, A_\alpha, \Sigma_\alpha, P, \Omega_\alpha, O_\alpha, B_\alpha, R, \mathcal{PL}\rangle$, we can generate an equivalent COM-MTDP, $\langle S', A_\alpha, \Sigma_\alpha, P', \Omega_\alpha, O_\alpha, B_\alpha, R'\rangle$. The COM-MTDP state space, $S'$, includes all of the features, $\Xi_i$, in the original R-COM-MTDP state space, $S = \Xi_1 \times \cdots \times \Xi_n$, as well as an additional feature, $\Xi_{phase} = \{taking, executing\}$. This new feature indicates whether the current state corresponds to a role-taking or role-executing stage of the R-COM-MTDP. The new transition probability function, $P'$, augments the original function with an alternating behavior for this new feature: $P'(\langle \xi_{1b}, \ldots, \xi_{nb}, taking\rangle, \upsilon, \langle \xi_{1e}, \ldots, \xi_{ne}, executing\rangle) = P(\langle \xi_{1b}, \ldots, \xi_{nb}\rangle, \upsilon, \langle \xi_{1e}, \ldots, \xi_{ne}\rangle)$ and $P'(\langle \xi_{1b}, \ldots, \xi_{nb}, executing\rangle, \phi, \langle \xi_{1e}, \ldots, \xi_{ne}, taking\rangle) = P(\langle \xi_{1b}, \ldots, \xi_{nb}\rangle, \phi, \langle \xi_{1e}, \ldots, \xi_{ne}\rangle)$. Within the COM-MTDP, we restrict the actions that agents can take in each stage by assigning illegal actions an excessively negative reward (denoted $-r_{max}$): $\forall \upsilon \in \Upsilon_\alpha$, $R'(\langle \xi_{1b}, \ldots, \xi_{nb}, executing\rangle, \upsilon) = -r_{max}$, and $\forall \phi \in \Phi_\alpha$, $R'(\langle \xi_{1b}, \ldots, \xi_{nb}, taking\rangle, \phi) = -r_{max}$. Thus, for a COM-MTDP domain-level policy, $\pi_A$, we can extract role-taking and role-executing policies, $\pi_\Upsilon$ and $\pi_\Phi$, respectively, that generate identical behavior in the R-COM-MTDP when used in conjunction with identical communication-level policies, $\pi_\Sigma$. □

¹The proof of this theorem was contributed by Dr. David Pynadath.

Thus, the problem of finding optimal policies for R-COM-MTDPs has the same complexity as the problem of finding optimal policies for COM-MTDPs. Table 1 shows the computational complexity results for various classes of R-COM-MTDP domains, where the results for individual, collective and collective partial observability follow from COM-MTDPs (Pynadath & Tambe 2002) (see http://www.isi.edu/teamcore/COM-MTDP/ for proofs of the COM-MTDP results). In the individual observability case and the collective observability under free communication case, each agent knows exactly what the global state is. The P-complete result is from a reduction from and to MDPs. The collectively partially observable case with free communication can be treated as a single-agent POMDP, where the actions correspond to the joint actions of the R-COM-MTDP. The reduction from and to a single-agent POMDP gives the PSPACE-complete result. In the general case, by a reduction from and to decentralized POMDPs, the worst-case computational complexity of finding the optimal policy is NEXP-complete.

              Ind. Obs.   Coll. Obs.   Coll. Part. Obs.
No Comm.      P-Comp.     NEXP-Comp.   NEXP-Comp.
Gen. Comm.    P-Comp.     NEXP-Comp.   NEXP-Comp.
Free Comm.    P-Comp.     P-Comp.      PSPACE-Comp.

Table 1: Computational complexity of R-COM-MTDPs.

Table 1 shows us that the task of finding the optimal policy is extremely hard in general. However, by increasing the amount of communication we can drastically improve the worst-case computational complexity (by paying the price of the additional communication). As can be seen from the table, when communication changes from no communication to free communication, in collectively partially observable domains like RoboCup Rescue, the computational complexity changes from NEXP-complete to PSPACE-complete. In collectively observable domains, like the sensor domain, the computational savings are even greater, from NEXP-complete to P-complete. This emphasizes the importance of communication in reducing the worst-case complexity. Table 1 also suggests that if we were designers of a multiagent system, we would increase the observability of the system so as to reduce the computational complexity.

Summary and Related Work

This work addresses two shortcomings of the current work in team formation for dynamic real-time domains: i) most algorithms are static in the sense that they do not anticipate the changes that will be required in the configuration of the teams, ii) complexity analysis of the problem is lacking. We addressed the first shortcoming by presenting R-COM-MTDP, a formal model based on decentralized communicating POMDPs, to determine the team configuration that takes into account how the team will have to be restructured in the future. R-COM-MTDP enables a rigorous analysis of complexity-optimality tradeoffs in team formation and reorganization approaches. The second shortcoming was addressed by presenting an analysis of the worst-case computational complexity of team formation and reformation under various types of communication. We intend to use the R-COM-MTDP model to compare the various team formation algorithms for RoboCup Rescue which were used by our agents in RoboCup-2001 and Robofesta 2001, where our agents finished in third place and second place respectively.
While there are related multiagent models based on MDPs, they have focused on coordination after team formation for a subset of the domain types we consider, and they do not address team formation and reformation. For instance, the decentralized partially observable Markov decision process (DEC-POMDP) (Bernstein, Zilberstein, & Immerman 2000) model focuses on generating decentralized policies in collectively partially observable domains with no communication, while the Xuan-Lesser model (Xuan, Lesser, & Zilberstein 2001) focuses only on a subset of collectively observable environments. Finally, while (Modi et al. 2001) provide an initial complexity analysis of distributed sensor team formation, their analysis is limited to static environments (no reorganizations), in fact illustrating the need for R-COM-MTDP type analysis tools.

Acknowledgment

We would like to thank David Pynadath for his discussions on extending the COM-MTDP model to R-COM-MTDP, and the Intel Corporation for their generous gift that made this research possible.

References

Bernstein, D. S.; Zilberstein, S.; and Immerman, N. 2000. The complexity of decentralized control of MDPs. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence.

Fatima, S. S., and Wooldridge, M. 2001. Adaptive task and resource allocation in multi-agent systems. In Proceedings of the Fifth International Conference on Autonomous Agents.

Horling, B.; Benyo, B.; and Lesser, V. 2001. Using self-diagnosis to adapt organizational structures. In Proceedings of the Fifth International Conference on Autonomous Agents.

Hunsberger, L., and Grosz, B. 2000. A combinatorial auction for collaborative planning. In Proceedings of the Fourth International Conference on MultiAgent Systems.

Kitano, H.; Tadokoro, S.; Noda, I.; Matsubara, H.; Takahashi, T.; Shinjoh, A.; and Shimada, S. 1999. RoboCup Rescue: Search and rescue for large scale disasters as a domain for multi-agent research. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics.

Modi, P. J.; Jung, H.; Tambe, M.; Shen, W.-M.; and Kulkarni, S. 2001. A dynamic distributed constraint satisfaction approach to resource allocation. In Proceedings of the Seventh International Conference on Principles and Practice of Constraint Programming.

Pynadath, D., and Tambe, M. 2002. Multiagent teamwork: Analyzing the optimality and complexity of key theories and models. In Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems.

Tadokoro, S.; Kitano, H.; Tomoichi, T.; Noda, I.; Matsubara, H.; Shinjoh, A.; Koto, T.; Takeuchi, I.; Takahashi, H.; Matsuno, F.; Hatayama, M.; Nobe, J.; and Shimada, S. 2000. The RoboCup-Rescue: An international cooperative research project of robotics and AI for the disaster mitigation problem. In Proceedings of SPIE 14th Annual International Symposium on Aerospace/Defense Sensing, Simulation, and Controls (AeroSense), Conference on Unmanned Ground Vehicle Technology II.

Tidhar, G.; Rao, A. S.; and Sonenberg, E. 1996. Guided team selection. In Proceedings of the Second International Conference on Multi-Agent Systems.

Wooldridge, M.; Jennings, N.; and Kinny, D. 1999. A methodology for agent oriented analysis and design. In Proceedings of the Third International Conference on Autonomous Agents.

Xuan, P.; Lesser, V.; and Zilberstein, S. 2001. Communication decisions in multiagent cooperation. In Proceedings of the Fifth International Conference on Autonomous Agents.