Learning to Form Negotiation Coalitions in a Multiagent System

Leen-Kiat Soh
Computer Science and Engineering
University of Nebraska
115 Ferguson Hall
Lincoln, NE
(402) 472-6738
lksoh@cse.unl.edu

Costas Tsatsoulis
Dept. of Electrical Engineering and Computer Science
Information and Telecommunication Technology Center
University of Kansas
Lawrence, KS 66044
(785) 864-7749
tsatsoul@ittc.ukans.edu

Abstract

In a multiagent system where agents are peers and collaborate to achieve a global task or resource allocation goal, coalitions are usually formed dynamically from the bottom up. Each agent has high autonomy, and the system as a whole tends to be anarchic due to the distributed decision making process. In this paper, we present a negotiation-based coalition formation approach that learns in two different ways to improve the chance of a successful coalition formation. First, every agent evaluates the utility of its coalition candidates via reinforcement learning of past negotiation outcomes and behaviors. As a result, an agent assigns its task requirements differently based on what it has learned from its interactions with its neighbors in the past. Second, each agent uses a case-based reasoning (CBR) mechanism to learn useful negotiation strategies that dictate how negotiations should be executed. Furthermore, an agent also learns from its past relationship with a particular neighbor when conducting a negotiation with that neighbor. The collaborative learning behavior allows two negotiation partners to reach a deal more effectively, and agents to form better coalitions faster.

Introduction

We have developed a multiagent system that uses negotiation between agents to form coalitions for solving task and resource allocation problems. All agents of this system are peers, and decisions are made autonomously at each agent, from the bottom up. Each agent maintains its own knowledge and information base (and thus has a partial view of the world) and is capable of initiating its own coalitions. Due to its incomplete view and the dynamics of the world, an agent is unable, or cannot afford, to reason its way to an optimal coalition. Hence, we have designed a negotiation-based coalition formation methodology, in which an agent forms a sub-optimal initial coalition based on its current information and then conducts concurrent 1-to-1 negotiations with the members of the initial coalition to refine and finalize the coalition. To form better coalitions faster, we have incorporated two learning mechanisms. First, each agent maintains a history of its relationships with its neighbors, documenting the negotiation experiences and using reinforcement learning to form potentially more viable and useful coalitions. Consequently, an agent assigns its task requirements differently based on what it has learned from its interactions with its neighbors in the past. Second, each agent has a case-based reasoning (CBR) module with its own case base that allows it to learn useful negotiation strategies. In this manner, our agents are able to form better coalitions faster. An agent maintains its own knowledge base that helps determine its behavior: who to include in a coalition, how to negotiate to convince a neighbor to be in the coalition, and when to agree to join a coalition.
As the agent continues reasoning, it adapts its behavior to what it has experienced. This in turn impacts its negotiation partners, and collaborative learning is transferred implicitly, allowing a negotiation partner to update its relationship with the agent. This dynamic, negotiation-based coalition formation strategy can be viewed as an effective and efficient tool for collaborative learning. The efficiency comes from the fact that information is exchanged only when necessary during a negotiation.

The driving application for our system is multisensor target tracking, a distributed resource allocation and constraint satisfaction problem. The objective is to track as many targets as possible and as accurately as possible using a network of fixed sensors under real-time constraints. A "good-enough, soon-enough" coalition has to be formed quickly since, for example, accurately tracking a target moving at half a foot per second requires one measurement each from at least three different sensors within a time interval of less than 2 seconds. Finally, the environment is noisy and subject to uncertainty and errors such as message loss and jammed communication channels.

In this paper, we first present our coalition formation model. Then we briefly describe our argumentative negotiation approach. After that, we discuss how reinforcement learning and case-based learning are used in the selection and ranking of coalition members, the determination of negotiation strategies, the task assignment among coalition candidates, and the evaluation of an ongoing negotiation for evidence support. Then, we present and discuss some results. Before we conclude the paper, we outline some future work and interesting issues.

Coalition Formation

In dynamic, distributed problem solving or resource allocation, an initiating agent (or initiator) must form a coalition with other agents so that it can perform its task. The members of a coalition are selected from a neighborhood, a set of other agents known to the initiator. In our application domain, an agent $a_i$ is a neighbor to an agent $a_j$ if $a_i$'s sensor coverage area overlaps that of $a_j$. An initiator first selects members in the neighborhood that are qualified to be part of an initial coalition. Second, it evaluates these members to rank them in terms of their potential utility value to the coalition. Third, it initiates negotiation requests to the top-ranked members, trying to convince them to join the coalition. In the end, the coalition may fail to form because the members refuse to cooperate, or may form successfully when enough members reach a deal with the initiating agent. Finally, the agent sends a confirmation message to all coalition members involved to announce the success or failure of the proposed coalition.¹ If it is a success, then all coalition members that have agreed to join will carry out their respective tasks at the planned time steps.

To establish who can provide useful resources, the initiator calculates the position and the velocity of the target it is tracking and establishes a potential future path that the target will follow. Next, the initiator finds the radar coverage areas that this path crosses and identifies areas where at least three radars can track the target (remember that tracking requires almost simultaneous measurements from at least three sensors). The agents controlling these radars become members of the initial coalition. A minimal sketch of this selection step is given below.
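The paper does not specify an implementation of this step, so the following Python sketch is only illustrative: the names (Sensor, project_path, initial_coalition), the circular coverage areas, and the linear path projection are all simplifying assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class Sensor:
    agent_id: str
    center: Point
    radius: float  # circular coverage area, a simplifying assumption

    def covers(self, p: Point) -> bool:
        dx, dy = p[0] - self.center[0], p[1] - self.center[1]
        return dx * dx + dy * dy <= self.radius ** 2

def project_path(pos: Point, vel: Point, horizon: float, step: float) -> List[Point]:
    """Linearly extrapolate the target's future positions (a simplification)."""
    path, t = [], step
    while t <= horizon:
        path.append((pos[0] + vel[0] * t, pos[1] + vel[1] * t))
        t += step
    return path

def initial_coalition(pos: Point, vel: Point, neighbors: List[Sensor],
                      horizon: float = 4.0, step: float = 0.5) -> set:
    """Agents whose coverage the projected path crosses at points where at
    least three sensors overlap (triangulation needs three measurements)."""
    members = set()
    for p in project_path(pos, vel, horizon, step):
        covering = [s for s in neighbors if s.covers(p)]
        if len(covering) >= 3:
            members.update(s.agent_id for s in covering)
    return members
```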
Since computational resources are limited, and negotiating consumes CPU and bandwidth, the initiator does not start negotiation with all members of the coalition, but first ranks them and then initiates negotiation with the highest-ranked ones. Ranking of the coalition members is done using a multi-criterion utility-theoretic evaluation technique. There are two groups of evaluation criteria, one problem-related and the other experience-based. The problem-related criteria are: (1) the target's projected time of arrival at the coverage area of a sensor: there has to be a balance between too short arrival times, which do not allow enough time to negotiate, and too long arrival times, which do not allow adequate tracking; (2) the target's projected time of departure from the coverage area of a sensor: the target needs to be in the coverage area long enough to be illuminated by the radar; (3) the number of overlapping radar sectors: the more sectors that overlap, the higher the chance that three agents will agree on measurements, thus achieving target triangulation; and (4) whether the initiator's coverage overlaps the coverage area of the coalition agent: in this case the initiator needs to convince only two agents to measure (since it is the third one), which may be easier than convincing three.

We will discuss our experience-based criteria further, in the form of distributed case-based learning and utility-driven reinforcement learning, in the following sections. In summary, at the end of the evaluation all coalition members are ranked, and the initiator activates negotiations with as many high-ranked agents as possible (there have to be at least two, and the maximum is established by the negotiation threads available to the initiator at the time, since it may be responding to negotiation requests even as it is initiating other ones).

¹ Due to unreliable inter-agent communications, a confirmation message may not arrive at the coalition members on time. We implemented one mechanism and are considering another to improve the fault tolerance of the system. First, a coalition member will hold on to a commitment (though not yet confirmed) until it is no longer feasible to do so, for example, when the duration of the task is no longer cost-effective. The second improvement is to make communication persistence depend on the time between now and the start of the planned coalition. If the agent has more time, then it will make more attempts to communicate the confirmation.

Argumentative Negotiations

Our agents use a variation of the argumentative negotiation model (Jennings et al. 1998) in which it is not necessary for them to exchange their inference models with their negotiation partners. Note that after the initial coalition formation, the initiator knows who can help. The goal of negotiations is to find out who is willing to help. To do so, first the initiator contacts a coalition candidate to start a negotiating session. When the responding agent (or responder) agrees to negotiate, it computes a persuasion threshold that indicates the degree to which it needs to be convinced in order to free or share a resource (alternatively, one can view the persuasion threshold as the degree to which an agent tries to hold on to a resource). Subsequently, the initiator attempts to convince the responder by sharing parts of its local information. The responder, in turn, uses a set of domain-specific rules to establish whether the information provided by the initiator pushes it above a resource's persuasion threshold, in which case it frees the resource. If the responder is not convinced by the evidential support provided by the initiator, it requests more information, which is then provided by the initiator. The negotiation continues based on the established strategy, and eventually either the agents reach an agreement, in which case a resource or a percentage of a resource is freed, or the negotiation fails. Note that, being motivated to cooperate, the responder also counteroffers when it realizes that the initiator has exhausted its arguments or when time is running out for the particular negotiation. How to negotiate successfully is dictated by a negotiation strategy, which each agent derives using case-based reasoning (CBR). CBR greatly limits the time needed to decide on a negotiation strategy, which is necessary in our real-time domain since the agent does not have to compute its negotiation strategy from scratch. The exchange can be summarized by the simple loop sketched below.
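In this sketch the responder accumulates evidence support until its persuasion threshold is crossed or the initiator runs out of arguments or steps. The helper evidence_value and the fixed argument ordering are hypothetical stand-ins for the domain-specific rules and the CBR-derived strategy described above, not part of the actual system.

```python
from typing import Dict, Iterable

def run_negotiation(arguments: Iterable[Dict],   # initiator's evidence, strongest first
                    persuasion_threshold: float, # responder's resistance to freeing the resource
                    evidence_value,               # responder's domain-specific scoring rules
                    max_steps: int = 6) -> str:
    """Return 'deal', 'counteroffer', or 'fail' for one 1-to-1 negotiation.

    A sketch only: the real agents run this under real-time constraints, with
    CBR-derived strategies controlling what is sent and when.
    """
    support = 0.0
    for step, arg in enumerate(arguments):
        if step >= max_steps:              # steps allowed by the negotiation strategy
            break
        support += evidence_value(arg)     # responder scores the new information
        if support >= persuasion_threshold:
            return "deal"                  # resource (or a share of it) is freed
    # initiator exhausted its arguments or ran out of steps; a cooperative
    # responder may counteroffer instead of failing outright
    return "counteroffer" if support > 0 else "fail"
```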
Reinforcement Learning for Coalition Formation

Each agent keeps a profile of its neighborhood² and of its current and past relationships with its neighbors, and the selection of the potential members of an initial coalition is based on this profile. The current relationship is based on the negotiation strains and leverage between two agents at the time when the coalition is about to be formed. The past relationship, however, is collected over time and enables an agent to adapt to form coalitions more effectively via reinforcement learning. This we discuss in this section.

² In our multiagent system, each agent maintains a neighborhood. An agent knows and can communicate with all its neighbors directly. This configuration of overlapping neighborhoods centered around each agent allows the system to scale up. Each agent, for example, only needs to know about its local neighborhood, and does not need to have global knowledge of the entire system. In this manner, an agent can maintain its neighborhood with the various measures discussed in this section.

First, we define the past relationship between an agent $a_i$ and a candidate $a_k$. Suppose that the number of negotiations initiated from agent $a_i$ to $a_k$ is $\Sigma_{\mathrm{negotiate}}(a_i \to a_k)$, the number of successful negotiations initiated from $a_i$ to $a_k$ is $\Sigma^{\mathrm{success}}_{\mathrm{negotiate}}(a_i \to a_k)$, the number of negotiation requests from $a_k$ that $a_i$ agrees to entertain is $\Sigma_{\mathrm{entertain}}(a_k \to a_i)$, the total number of all negotiations initiated from $a_i$ to all its neighbors is $\Sigma_{\mathrm{negotiate}}(a_i \to \Pi_{a_i})$, and the total number of all successful negotiations initiated from $a_i$ to all its neighbors is $\Sigma^{\mathrm{success}}_{\mathrm{negotiate}}(a_i \to \Pi_{a_i})$.

In our model, $rel_{\mathrm{past},a_i}(a_k,t)$ includes the following:

(a) the helpfulness of $a_k$ to $a_i$: $\dfrac{\Sigma^{\mathrm{success}}_{\mathrm{negotiate}}(a_i \to a_k)}{\Sigma_{\mathrm{negotiate}}(a_i \to a_k)}$;

(b) the importance of $a_k$ to $a_i$: $\dfrac{\Sigma_{\mathrm{negotiate}}(a_i \to a_k)}{\Sigma_{\mathrm{negotiate}}(a_i \to \Pi_{a_i})}$;

(c) the reliance of $a_i$ on $a_k$: $\dfrac{\Sigma^{\mathrm{success}}_{\mathrm{negotiate}}(a_i \to a_k)}{\Sigma^{\mathrm{success}}_{\mathrm{negotiate}}(a_i \to \Pi_{a_i})}$.

Similarly, the agent knows how useful it has been to its potential coalition partner $a_k$:

(d) the friendliness of $a_i$ to $a_k$: $\dfrac{\Sigma_{\mathrm{entertain}}(a_k \to a_i)}{\Sigma_{\mathrm{negotiate}}(a_k \to a_i)}$;

(e) the helpfulness of $a_i$ to $a_k$: $\dfrac{\Sigma^{\mathrm{success}}_{\mathrm{entertain}}(a_k \to a_i)}{\Sigma_{\mathrm{negotiate}}(a_k \to a_i)}$, where the numerator counts the requests from $a_k$ that $a_i$ entertained and that ended in a deal;

(f) the relative importance of $a_i$ to $a_k$: $\dfrac{\Sigma_{\mathrm{negotiate}}(a_k \to a_i)}{\Sigma_{\mathrm{negotiate}}(a_i \to a_k)}$;

(g) the reliance of $a_k$ on $a_i$: $\dfrac{\Sigma^{\mathrm{success}}_{\mathrm{negotiate}}(a_k \to a_i)}{\Sigma^{\mathrm{success}}_{\mathrm{negotiate}}(a_k \to \Pi_{a_k})}$.

Note that the above attributes are based on data readily collected whenever agent $a_i$ initiates a request to its neighbors or whenever it receives a request from one of its neighbors. The higher the value of each of the above attributes, the higher the potential utility the agent $a_k$ may contribute to the coalition. The first three attributes tell the agent how helpful and important a particular neighbor has been. The more helpful and important that neighbor is, the better it is to include that neighbor in the coalition. A sketch of how these attributes can be derived from the recorded counters is given below.
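The sketch keeps the raw counters per neighbor and derives the attributes as ratios. The counter names, and the exact ratios used for attributes (e) and (f), are our reading of the definitions above rather than a specification taken from the system.

```python
from dataclasses import dataclass

@dataclass
class NeighborProfile:
    """Counters agent a_i keeps about one neighbor a_k (names are illustrative)."""
    sent: int = 0              # negotiations a_i initiated to a_k
    sent_ok: int = 0           # ... of those, the ones that succeeded
    recv: int = 0              # requests a_k initiated to a_i
    recv_entertained: int = 0  # ... of those, the ones a_i agreed to entertain
    recv_ok: int = 0           # ... of those, the ones that ended in a deal
    # reported by a_k inside its arguments, not observable locally:
    k_success_total: int = 0   # a_k's successful negotiations to all of its neighbors

def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0

def past_relationship(p: NeighborProfile, my_sent_total: int, my_sent_ok_total: int) -> dict:
    """The seven past-relationship attributes of a_k, as ratios of the counters."""
    return {
        "helpfulness_of_k": _ratio(p.sent_ok, p.sent),
        "importance_of_k":  _ratio(p.sent, my_sent_total),
        "reliance_on_k":    _ratio(p.sent_ok, my_sent_ok_total),
        "my_friendliness":  _ratio(p.recv_entertained, p.recv),
        "my_helpfulness":   _ratio(p.recv_ok, p.recv),
        "my_importance":    _ratio(p.recv, p.sent),
        "k_reliance_on_me": _ratio(p.recv_ok, p.k_success_total),  # needs a_k's statistics
    }
```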
The second set of attributes tells the agent the chance of having a successful negotiation. The agent expects the particular neighbor to be grateful and more willing to agree to a request based on the agent's friendliness, helpfulness, and relative importance to that neighbor.

To further the granularity of the above attributes, one may measure them along different event types: for each event type, the initiating agent records the above six attributes. This allows the agent to better analyze the utility of a neighbor based on the type of event for which it is currently trying to form a coalition. In that case, an event type would qualify all the above attributes.

An agent $a_i$ can readily compute and update the first six attributes since they are based on the agent's encounters with its neighbors. However, an agent is not able to compute the last attribute, the reliance of $a_k$ on $a_i$, since the denominator is based on $a_k$'s negotiation statistics. This attribute is updated only when agent $a_i$ receives arguments from $a_k$. Thus, an agent updates the attributes from two sources. First, the update is triggered by the negotiation outcome directly. Second, the update comes from the information pieces contained in the arguments (when the initiator reminds the responder of the responder's reliance on the initiator). With the argumentative negotiation between agents, and the ability of an agent to conduct multiple, concurrent, coordinated 1-to-1 negotiations, the above profiled attributes facilitate a convergence of useful coalitions, allowing the multiagent system as a whole to learn to collaborate more efficiently.

From Learning to Selecting Better Coalition Members

An agent evaluates its coalition candidates and ranks them based on their potential utility value. The potential utility, $PU_{a_i,e_j}(a_k)$, of a candidate $a_k$ is a weighted sum of the past relationship, i.e., $rel_{\mathrm{past},a_i}(a_k,t)$, the current relationship, i.e., $rel_{\mathrm{now},a_i}(a_k,t)$, and the ability of the candidate to help with the task $e_j$, i.e., $ability_{a_i}(a_k,e_j)$. The ability value is domain-dependent and predefined (e.g., an IR sensor is better suited for night detection than an optical one).

The current relationship between an agent $a_i$ and its neighbor $a_k$ is defined as follows. Suppose the number of concurrent negotiations that an agent can conduct is $N_{\mathrm{avail}}$, the number of tasks that the agent $a_i$ is currently executing as requested by $a_k$ is $\Sigma_{\mathrm{execute}}(task: initiator(task)=a_k)$, and the number of ongoing negotiations initiated from $a_i$ to $a_k$ is $\Sigma^{\mathrm{ongoing}}_{\mathrm{negotiate}}(a_i \to a_k)$. Then $rel_{\mathrm{now},a_i}(a_k,t)$ includes the following:

(a) the negotiation strain between $a_i$ and $a_k$: $\dfrac{\Sigma^{\mathrm{ongoing}}_{\mathrm{negotiate}}(a_i \to a_k)}{N_{\mathrm{avail}}}$;

(b) the negotiation leverage between $a_i$ and $a_k$: $\dfrac{\Sigma^{\mathrm{ongoing}}_{\mathrm{negotiate}}(a_k \to a_i)}{N_{\mathrm{avail}}}$; and

(c) the degree of strain on $a_i$ from $a_k$: $\dfrac{\Sigma_{\mathrm{execute}}(task: initiator(task)=a_k)}{\Sigma_{\mathrm{execute}}(task: initiator(task) \in \Pi_{a_i})}$.

The first attribute contributes inversely to $rel_{\mathrm{now},a_i}(a_k,t)$, and the other two contribute proportionally to it. The first attribute approximates how demanding the agent is of a particular neighbor: the more negotiations an agent is initiating to a neighbor, the more demanding the agent is, which strains the relationship between the two, and the negotiations suffer. The last two attributes represent leverage that the agent can use against a neighbor it is negotiating with, about a request initiated by that neighbor.

The potential utility is then

$$PU_{a_i,e_j}(a_k) = w_{\mathrm{past},a_i,e_j} \, rel_{\mathrm{past},a_i}(a_k,t) + w_{\mathrm{now},a_i,e_j} \, rel_{\mathrm{now},a_i}(a_k,t) + w_{\mathrm{ability},a_i,e_j} \, ability_{a_i}(a_k,e_j),$$

where $w_{\mathrm{past},a_i,e_j} + w_{\mathrm{now},a_i,e_j} + w_{\mathrm{ability},a_i,e_j} = 1$. Note that ultimately these weights may be dynamically dependent on the current status of $a_i$ and the task $e_j$. And $rel_{\mathrm{past},a_i}(a_k,t)$ is itself a weighted sum of the seven attributes previously discussed. As a result, an agent is reinforced to go back to the same neighbor (for a particular task) with which it has had a good past relationship. In this manner, our agents are able to form more effective coalitions via reinforcement learning of which neighbors are useful for which tasks. A minimal sketch of this ranking step is given below.
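The weights and attribute values in this sketch are illustrative only; in the actual system the two relationship terms are themselves weighted combinations of the attributes defined above, and the weights are per-agent and per-task.

```python
def potential_utility(rel_past: float, rel_now: float, ability: float,
                      w_past: float, w_now: float, w_ability: float) -> float:
    """Weighted sum PU_{a_i,e_j}(a_k); the three weights sum to 1."""
    assert abs(w_past + w_now + w_ability - 1.0) < 1e-9
    return w_past * rel_past + w_now * rel_now + w_ability * ability

def rank_candidates(candidates: dict, weights: tuple) -> list:
    """Order coalition candidates by potential utility, highest first.

    `candidates` maps an agent id to its (rel_past, rel_now, ability) triple,
    e.g. produced from the profiles sketched above; the names are illustrative.
    """
    scored = {k: potential_utility(*v, *weights) for k, v in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Example: two candidates for task e_j, with equal weighting of the three factors.
ranking = rank_candidates(
    {"a_3": (0.8, 0.6, 0.9), "a_7": (0.5, 0.9, 0.7)},
    weights=(1 / 3, 1 / 3, 1 / 3),
)
```

The initiator would then open negotiation threads with as many of the top-ranked candidates as its available negotiation threads allow.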
From Learning to Arguing More Effectively

As we mentioned earlier, when an initiating agent argues, it sends over different information classes. One of them is the world class, which includes a profile of the neighbors. This information includes the past relationship attributes. Upon receiving these arguments, the responding agent evaluates the evidence support that they bring forth. Here we show an actual CLIPS rule that our agents use:

    (defrule world-help-rate
      (world (helpRate ?y&:(> ?y 0.6667)))
      =>
      (bind ?*evidenceSupport* (+ (* 0.05 ?y) ?*evidenceSupport*)))

This rule says that if the helpfulness attribute is greater than 0.6667, then 5 percent of that rate is added to the evidence support. This means that the responding agent is reinforced to agree to a negotiation by its previous interactions with the initiating agent. And the more successes the initiating agent has with a particular neighbor, the more effectively it can argue with that neighbor, due to its reinforcement learning. As a result, both agents learn to conduct better negotiations. Note also that this collaborative learning via inter-agent negotiations allows both agents to improve their performance, because both agents are motivated to cooperate to achieve global goals.

Case-Based Learning for Better Negotiations

Before a negotiation can take place, an agent has to define its negotiation strategy, which it learns from the retrieved, most similar old case. Note that a better negotiation does not mean one that always results in a deal. A better negotiation is one that is effective and efficient, meaning that a quick successful negotiation is always preferred, but if a negotiation is going to fail, then a quickly failed negotiation is also preferred.

A negotiation process is situated. That is, for the same task, because of differences in the world scenarios, constraints, evaluation criteria, information certainty and completeness, and agent status, an agent may adopt a different negotiation strategy. A strategy establishes the types of information to be transmitted, the number of communication acts, the computing resources needed, and so on. To represent each situation, we use cases. Each agent maintains two case bases, one for the agent as an initiator, the other for the agent as a responder. In general, an initiator is more conceding and agreeable, and a responder is more demanding and unyielding. Each agent learns from its own experiences and thus evolves its own case bases.

In our work a case contains the following information: a situation space, a solution space, and an outcome. The situation space documents the information about the world, the agent, and the neighbors when the task arises. The solution space is the negotiation strategy that determines how an agent should behave in its upcoming negotiation. A minimal sketch of this case representation is given below.
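In this sketch the situation and solution spaces are kept as generic feature maps, which is an assumption made purely for brevity; the concrete features differ between the initiator and responder cases described next.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Case:
    """One negotiation case: situation space, solution space, and outcome."""
    situation: Dict[str, float]      # world / agent / neighbor features when the task arose
    solution: Dict[str, float]       # the negotiation strategy parameters
    outcome: Optional[str] = None    # filled in after the negotiation completes

@dataclass
class CaseBase:
    """Each agent keeps two of these: one for its initiator role, one for its responder role."""
    role: str                        # "initiator" or "responder"
    cases: List[Case] = field(default_factory=list)
```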
In an initiator case, the situation features are the list of current tasks, who the potential negotiation partners are, the task description, and the target speed and location. The negotiation parameters are the classes and descriptions of the information to transfer, the time constraints, the number of negotiation steps planned, the CPU resource usage, the number of steps possible, the CPU resources needed, and the number of agents that can be contacted. In a responder case, the situation feature set consists of the list of current tasks, the ID of the initiating agent, the task description, the power and data quality of its sensor, its available CPU resources, and the status of its sensing sector. The negotiation parameters to be determined are the time allocated, the number of negotiation steps planned, the CPU usage, the power usage, and a persuasion threshold for turning a sensing sector on (performing frequency or amplitude measurements), giving up CPU resources, or sharing communication channels. Finally, in both cases the outcome records the result of the negotiation.

When a task is formulated, an agent composes a new problem case with a situation space. Then it retrieves the most similar case from the case base by a weighted, parametric matching on the situation spaces. Given the most similar case, the agent adapts that solution, or negotiation strategy, to more closely reflect the current situation. Equipped with this modified negotiation strategy, the agent proceeds with its negotiation. Finally, when the negotiation completes, the agent updates the case with the outcome.

Incremental and Refinement Learning

After a negotiation is completed (successfully or otherwise), the agent updates its case base using an incremental and a refinement learning step.

During incremental learning the agent matches the new case to all cases in the case base, and if it is significantly different from all other stored cases, then it stores the new case. When the agent computes the difference between a pair of cases, it emphasizes the case description more than the negotiation parameters, since its objective is to learn a wide coverage of the problem domain. This will improve its future case retrieval and case adaptation. So, an agent learns good, unique cases incrementally. The difference measure is

$$Diff(a,b) = w_{\mathrm{sit}} \sum_{i=1}^{M} w_{\mathrm{sit},i} \left| f_{\mathrm{sit},i,a} - f_{\mathrm{sit},i,b} \right| + w_{\mathrm{sol}} \sum_{j=1}^{N} w_{\mathrm{sol},j} \left| f_{\mathrm{sol},j,a} - f_{\mathrm{sol},j,b} \right|,$$

where $f_{\mathrm{sit},i,k}$ is the $i$th situation feature for case $k$, $f_{\mathrm{sol},j,k}$ is the $j$th solution feature for case $k$, $w_{\mathrm{sit},i}$ is the weight for $f_{\mathrm{sit},i,k}$, $w_{\mathrm{sol},j}$ is the weight for $f_{\mathrm{sol},j,k}$, $w_{\mathrm{sit}}$ is the weight for the situation space, and $w_{\mathrm{sol}}$ is the weight for the solution space. Also, $w_{\mathrm{sit}} + w_{\mathrm{sol}} = 1$, and each $w_{\mathrm{sit},i}$ or $w_{\mathrm{sol},j}$ is between 0 and 1. Finally, the agent computes $\min_{b=1,\ldots,K} Diff(a,b)$ for the new case $a$, and if this number is greater than a pre-determined threshold, then the case is learned.

Second, since we want to keep the size of the case base under control, especially for speed of retrieval and maintenance (our problem domain deals with real-time target tracking), the agent also performs refinement learning. If the new case is found to be very similar to one of the existing cases, then it computes (1) the sum of differences (i.e., $\sum_{b} Diff(n,b)$, where $n$ is that old case and $b$ ranges over the case base minus the old case) between that old case and the rest of the case base, and (2) the sum of differences between the new case and the case base minus the old case. This establishes the utility of the new case and of the old case (which the agent considers replacing with the new case). The agent chooses to keep the case that will increase the diversity of the case base. Thus, if the second sum is greater than the first sum, then the agent replaces the old case with the new one. In this manner, we are able to gradually refine the cases in the case base while keeping the size of the case base under control. A minimal sketch of these two steps is given below.
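The sketch operates on cases represented as numeric feature dictionaries, as in the earlier Case sketch; the keyword names, and the assumption that matching cases share the same feature keys, are simplifications and not a specification of the system.

```python
def diff(a, b, w_sit, w_sol, w_sit_i, w_sol_j):
    """Weighted distance between cases a and b over situation and solution features.

    a.situation / a.solution are dicts of numeric features (a simplification);
    w_sit + w_sol = 1, and the per-feature weights lie in [0, 1].
    """
    d_sit = sum(w_sit_i[k] * abs(a.situation[k] - b.situation[k]) for k in a.situation)
    d_sol = sum(w_sol_j[k] * abs(a.solution[k] - b.solution[k]) for k in a.solution)
    return w_sit * d_sit + w_sol * d_sol

def maybe_learn(case_base, new_case, threshold, **w):
    """Incremental step: store the new case if it differs enough from all stored cases.
    Refinement step: otherwise, keep whichever of the new case and its nearest stored
    case makes the case base more diverse."""
    if not case_base:
        case_base.append(new_case)
        return
    nearest = min(case_base, key=lambda c: diff(new_case, c, **w))
    if diff(new_case, nearest, **w) > threshold:
        case_base.append(new_case)                      # unique enough: learn it
        return
    rest = [c for c in case_base if c is not nearest]
    if sum(diff(new_case, c, **w) for c in rest) > sum(diff(nearest, c, **w) for c in rest):
        case_base[case_base.index(nearest)] = new_case  # new case adds more diversity
```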
Results

We have built a fully integrated multiagent system with agents performing end-to-end behavior. In our simulation, we have any number of autonomous agents and Tracker modules. A Tracker module is tasked to accept target measurements from the agents and predict the location of the target. The agents track targets and negotiate with each other. We have tested our agents in a simulated environment and with actual hardware of up to eight sensors and two targets. For the following experiments, we used four sensors and one target moving about 60 feet between two points.

We ran two basic experiments: in the first one we investigated the effect of reinforcement learning on the quality of the resulting coalition, and in the second one we looked into the quality of the negotiation as a function of learning new cases of negotiation strategies. The quality was based on the number of successful negotiations, since when the agents reach a negotiated deal to jointly track a target, the overall system utility increases. Initial results indicate that the agents form coalitions with partners who are more willing to accommodate them in negotiation, and that the cases learned are being used in future negotiations. Agents that use learning of negotiation cases have between 20% and 40% fewer failed negotiations. Agents that use reinforcement learning to determine future coalition partners tend to prefer neighbors who are more conceding.

We also conducted experiments with four versions of learning: (1) both case-based learning and reinforcement learning (CBLRL), (2) only reinforcement learning (NoCBL), (3) only case-based learning (NoRL), and (4) no learning at all (NoCBLRL). Figure 1 shows the results in terms of the success rates for negotiations and coalition formations. As can be observed from the graph, the agent design with both case-based learning and reinforcement learning outperformed the others in both its negotiation success rate and its coalition formation success rate. That means that with learning, the agents were able to negotiate more effectively (and perhaps more efficiently as well), which led to more coalitions being formed. Without either case-based learning or reinforcement learning (but not both), the negotiation success rates remained about the same, but the coalition formation rate tended to deteriorate. This indicates that without one of the learning mechanisms, the agents were still able to negotiate effectively, but perhaps not efficiently (resulting in less processing time for the initiating agent to post-process an agreement). Without both learning mechanisms, there was a significant drop in the negotiation success rate. This indicates that the learning mechanisms did help improve the negotiation success rate.

Figure 1. Success rates of negotiations and coalition formations for different learning mechanisms.

Note that the entire agent system is complicated, with the end-to-end behavior affected by various threads and environments. The results reported here need to be scrutinized further to isolate the learning, monitoring, detection, reasoning, communication, and execution components in the evaluation. An agent busy tracking will not entertain a negotiation request that requires it to give up the resources it is using to track; that refusal leads to a failure on the initiator's side. Another point worth mentioning is the real-time nature of our system and experiments. Added learning steps may cause an agent to lose valuable processing time for handling a coalition formation problem.
More frequent coalition formations may outright prevent other negotiations from proceeding, as more agents will be tied up in their scheduled tasks as part of their commitments to the coalitions.

Future Research Issues

There are other learning issues that we plan to investigate. Of immediate concern to us are the following:

(1) We plan to investigate the learned cases and the overall 'health' of the case bases. We plan to measure the diversity growth of each case base (initiating and responding) for each agent. We also plan to examine the relationships between an agent's behavior (such as the number of tracking tasks, the number of negotiation tasks, the CPU resources allocated, etc.) and the agent's learning rate. For example, if an agent is always busy, does it learn more or less, or learn faster or more slowly?

(2) We plan to investigate the learning of "good-enough, soon-enough" solutions. Under time constraints, agents cannot afford to perform complicated learning. Should the agent hastily learn sub-optimal solutions to its coalition formation and negotiation ventures? Or should it keep track of its coalition formation success rates and its negotiation success rates? For example, if one of the rates drops below an acceptable level, should an agent ask for help from more successful agents and perhaps learn their cases? If so, then we may have to incorporate time issues into this behavior to trade off the loss of target tracking time against the gain of higher coalition formation rates.

(3) We plan to investigate further the effects of reinforcement learning on the agents' coalition formation capabilities. Will each agent learn to approach a static coalition formation for each particular task? That is, will the reinforcement learning inhibit an agent from looking to other neighbors for help instead of the same old group of neighbors? If so, could this be harmful to the system?

(4) We plan to investigate cooperative and distributed case-based learning (Martin and Plaza 1998; Martin et al. 1998; Plaza et al. 1997). However, instead of evaluating the worth of such learning in terms of tasks getting executed, we want to concentrate on coalitions getting formed. We believe that combining case-based learning with negotiations allows the agents to exchange experiences selectively, since a negotiation occurs only when necessary. This facilitates a more conservative but potentially more powerful learning behavior, as the learning is reinforced through subsequent agent processes.

Conclusions

We have described the combination of reinforcement and case-based learning in negotiating multiagent systems. Each agent keeps a profile of its interactions with its neighbors and keeps a log of their relationships. This allows an agent to compute a potential utility for each coalition candidate and to approach only those with high utility values, increasing the chance of a successful coalition formation. This is a form of reinforcement learning, as agents are able to learn to go back to the same agents that have cooperated before and to argue more effectively. The approach of negotiation-based coalition formation facilitates collaborative learning efforts among the agents, as the negotiation outcomes influence each agent's perception of its neighbors, which, in turn, plays an important role in forming better coalitions faster. This learning is derived from the outcomes of the negotiations and also from the information exchanged during a negotiation. Each agent also uses case-based reasoning to derive a negotiation strategy.
When a negotiation completes, the resulting new case is learned if it increases the diversity of the case base. The integration of these learning mechanisms leads to more effective negotiations and deals between agents. Initial results support this avenue of research and show that learning agents perform better and more successful negotiations.

Acknowledgments

The work described in this paper is sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-99-2-0502. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA), the Air Force Research Laboratory, or the U.S. Government.

References

Jennings, N. R., Parsons, S., Noriega, P., and Sierra, C. 1998. On Argumentation-Based Negotiation. In Proceedings of the International Workshop on Multi-Agent Systems, Boston, MA.

Martin, F. and Plaza, E. 1998. Auction-Based Retrieval. In Proceedings of the 2nd Congrés Català d'Intel·ligència Artificial, Girona, Spain, 136-145.

Martin, F., Plaza, E., and Arcos, J. L. 1998. Knowledge and Experience Reuse through Communication among Competent (Peer) Agents. International Journal of Software Engineering and Knowledge Engineering 9(3):319-341.

Plaza, E., Arcos, J. L., and Martin, F. 1997. Cooperative Case-Based Reasoning. In Weiss, G. (ed.), Distributed Artificial Intelligence Meets Machine Learning, Lecture Notes in Artificial Intelligence 1221, Springer Verlag, 180-201.

Soh, L.-K. and Tsatsoulis, C. 2001. Reflective Negotiating Agents for Real-Time Multisensor Target Tracking. In Proceedings of IJCAI'01, Seattle, WA, 1121-1127.