Learning to Form Negotiation Coalitions in a Multiagent System

From: AAAI Technical Report SS-02-02. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.
Leen-Kiat Soh
Computer Science and Engineering
University of Nebraska
115 Ferguson Hall
Lincoln, NE
(402) 472-6738
lksoh@cse.unl.edu

Costas Tsatsoulis
Dept. of Electrical Engineering and Computer Science
Information and Telecommunication Technology Center
University of Kansas
Lawrence, KS 66044
(785) 864-7749
tsatsoul@ittc.ukans.edu
Abstract
In a multiagent system where agents are peers and collaborate to achieve a global task or resource allocation goal, coalitions are usually formed dynamically from the bottom up. Each agent has high autonomy and the system as a whole tends to be anarchic due to the distributed decision-making process. In this paper, we present a negotiation-based coalition formation approach that learns in two different ways to improve the chance of a successful coalition formation. First, every agent evaluates the utility of its coalition candidates via reinforcement learning of past negotiation outcomes and behaviors. As a result, an agent assigns its task requirements differently based on what it has learned from its interactions with its neighbors in the past. Second, each agent uses a case-based reasoning (CBR) mechanism to learn useful negotiation strategies that dictate how negotiations should be executed. Furthermore, an agent also learns from its past relationship with a particular neighbor when conducting a negotiation with that neighbor. This collaborative learning behavior allows two negotiation partners to reach a deal more effectively, and agents to form better coalitions faster.
Introduction
We have developed a multiagent system that uses negotiation between agents to form coalitions for solving task and resource allocation problems. All agents of this system are peers, and decisions are made autonomously at each agent, from the bottom up. Each agent maintains its own knowledge and information base (and thus has a partial view of the world) and is capable of initiating its own coalitions. Due to the incomplete view and the dynamics of the world, an agent is unable or cannot afford to reason its way to an optimal coalition. Hence, we have designed a negotiation-based coalition formation methodology, in which an agent forms a sub-optimal initial coalition based on its current information and then conducts concurrent 1-to-1 negotiations with the members of the initial coalition to refine and finalize the coalition. To form better coalitions faster, we have incorporated two learning mechanisms. First, each agent maintains a history of its relationships with its neighbors, documenting the negotiation experiences and using reinforcement learning to form potentially more viable and useful coalitions. Consequently, an agent assigns its task requirements differently based on what it has learned from its interactions with its neighbors in the past. Second, each agent has a case-based reasoning (CBR) module with its own case base that allows it to learn useful negotiation strategies. In this manner, our agents are able to form better coalitions faster.
An agent maintains its own knowledge base that helps determine its behavior: whom to include in a coalition, how to negotiate to convince a neighbor to be in the coalition, and when to agree to join a coalition. As the agent continues reasoning, it adapts its behavior to what it has experienced. This in turn impacts its negotiation partners, and collaborative learning is transferred implicitly, allowing a negotiation partner to update its relationship with the agent. This dynamic, negotiation-based coalition formation strategy can be viewed as an effective and efficient tool for collaborative learning. The efficiency comes from the fact that information is exchanged only when necessary during a negotiation.

The driving application for our system is multisensor target tracking, a distributed resource allocation and constraint satisfaction problem. The objective is to track as many targets as possible and as accurately as possible using a network of fixed sensors under real-time constraints. A "good-enough, soon-enough" coalition has to be formed quickly since, for example, accurately tracking a target moving at half a foot per second requires one measurement each from at least three different sensors within a time interval of less than 2 seconds. Finally, the environment is noisy and subject to uncertainty and errors such as message loss and jammed communication channels.

In this paper, we first present our coalition formation model. Then we briefly describe our argumentative negotiation approach. After that, we discuss how reinforcement learning and case-based learning are used in the selection and ranking of coalition members, the determination of negotiation strategies, the task assignment among coalition candidates, and the evaluation of an ongoing negotiation for evidence support. Then, we present and discuss some results. Before we conclude the paper, we outline some future work and interesting issues.
Coalition Formation
In dynamic, distributed problem solving or resource allocation, an initiating agent (or initiator) must form a coalition with other agents so that it can perform its task. The members of a coalition are selected from a neighborhood, a set of other agents known to the initiator. In our application domain, an agent $a_i$ is a neighbor to an agent $a_j$ if $a_i$'s sensor coverage area overlaps that of $a_j$. An initiator first selects members in the neighborhood that are qualified to be part of an initial coalition. Second, it evaluates these members to rank them in terms of their potential utility value to the coalition. Third, it initiates negotiation requests to the top-ranked members, trying to convince them to join the coalition. In the end, the coalition may fail to form because the members refuse to cooperate, or may form successfully when enough members reach a deal with the initiating agent. Finally, the agent sends a confirmation message to all coalition members involved to announce the success or failure of the proposed coalition.¹ If it is a success, then all coalition members that have agreed to join will carry out their respective tasks at planned timesteps.

¹ Due to unreliable inter-agent communications, a confirmation message may not arrive at the coalition members on time. We implemented one mechanism and are considering another to improve the fault tolerance of the system. First, a coalition member will hold on to a commitment (though not yet confirmed) until it is no longer feasible to do so, for example, when the duration of the task is not cost-effective. The second improvement is to make communication persistent, depending on the time between now and the start of the planned coalition. If the agent has more time, then it will make more attempts to communicate the confirmation.
To establish who can provide useful resources, the initiator calculates the position and the velocity of the target it is tracking and establishes a potential future path that the target will follow. Next, the initiator finds the radar coverage areas that this path crosses and identifies areas where at least three radars can track the target (remember that tracking requires almost simultaneous measurements from at least three sensors). The agents controlling these radars become members of the initial coalition.
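To make this selection step concrete, here is a minimal Python sketch, assuming straight-line path extrapolation and circular coverage areas (the real sensors use sectors); the names SensorAgent, project_path, and select_initial_coalition are illustrative choices for the example, not identifiers from our implementation.

    import math
    from dataclasses import dataclass

    @dataclass
    class SensorAgent:
        name: str
        center: tuple   # (x, y) of the coverage area (a circle here; real sensors use sectors)
        radius: float

    def project_path(pos, vel, horizon=10.0, step=0.5):
        # Extrapolate a straight-line target path over the planning horizon.
        steps = int(horizon / step)
        return [(pos[0] + vel[0] * t * step, pos[1] + vel[1] * t * step)
                for t in range(steps + 1)]

    def covers(agent, point):
        return math.dist(agent.center, point) <= agent.radius

    def select_initial_coalition(neighbors, pos, vel, min_sensors=3):
        # Keep only path points covered by at least min_sensors radars and
        # return the agents controlling those radars as the initial coalition.
        members = set()
        for point in project_path(pos, vel):
            covering = [a for a in neighbors if covers(a, point)]
            if len(covering) >= min_sensors:   # triangulation needs three near-simultaneous measurements
                members.update(a.name for a in covering)
        return members

    # Example: one target moving along the x-axis among four fixed sensors.
    neighbors = [SensorAgent("a1", (0, 0), 20), SensorAgent("a2", (15, 0), 20),
                 SensorAgent("a3", (30, 0), 20), SensorAgent("a4", (60, 50), 10)]
    print(select_initial_coalition(neighbors, pos=(0, 0), vel=(3, 0)))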
Since computational resources are limited, and negotiating consumes CPU and bandwidth, the initiator does not start negotiation with all members of the coalition, but first ranks them and then initiates negotiation with the highest-ranked ones. Ranking of the coalition members is done using a multi-criterion, utility-theoretic evaluation technique. There are two groups of evaluation criteria, one problem-related and the other experience-based. The problem-related criteria are: (1) the target's projected time of arrival at the coverage area of a sensor: there has to be a balance between too-short arrival times, which do not allow enough time to negotiate, and too-long arrival times, which do not allow adequate tracking; (2) the target's projected time of departure from the coverage area of a sensor: the target needs to be in the coverage area long enough to be illuminated by the radar; (3) the number of overlapping radar sectors: the more sectors that overlap, the higher the chance that three agents will agree on measurements, thus achieving target triangulation; and (4) whether the initiator's coverage overlaps the coverage area of the coalition agent: in this case the initiator needs to convince only two agents to measure (since it is the third one), which may be easier than convincing three.
We will discuss our experience-based criteria further, in the form of distributed case-based learning and utility-driven reinforcement learning, in the following sections. In summary, at the end of the evaluation all coalition members are ranked and the initiator activates negotiations with as many high-ranked agents as possible (there have to be at least two, and the maximum is established by the negotiation threads available to the initiator at the time, since it may be responding to negotiation requests even as it is initiating other ones).
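As a rough illustration of how the four problem-related criteria could be folded into a single ranking score, consider the sketch below; the thresholds and weights are invented for the example and are not the values used by our agents.

    def problem_related_score(arrival_t, departure_t, overlap_count, initiator_overlaps,
                              weights=(0.3, 0.3, 0.2, 0.2)):
        # Combine the four problem-related criteria into one score in [0, 1].
        w1, w2, w3, w4 = weights
        # (1) projected arrival time: penalize arrivals that leave too little time
        #     to negotiate or that are too far in the future (thresholds are invented).
        arrival_score = 1.0 if 2.0 <= arrival_t <= 10.0 else 0.3
        # (2) projected departure time: the target must dwell long enough to be illuminated.
        dwell_score = min((departure_t - arrival_t) / 4.0, 1.0)
        # (3) overlapping sectors: more overlap raises the chance of triangulation.
        overlap_score = min(overlap_count / 3.0, 1.0)
        # (4) initiator overlap: only two other agents need to be convinced.
        own_score = 1.0 if initiator_overlaps else 0.5
        return w1 * arrival_score + w2 * dwell_score + w3 * overlap_score + w4 * own_score

    # A candidate whose sector the target enters after 3 s and leaves after 8 s,
    # with three overlapping sectors, and whose coverage overlaps the initiator's:
    print(problem_related_score(3.0, 8.0, 3, True))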
Argumentative Negotiations
Our agents use a variation of the argumentative negotiation model (Jennings et al. 1998) in which it is not necessary for them to exchange their inference model with their negotiation partners. Note that after the initial coalition formation, the initiator knows who can help. The goal of negotiations is to find out who is willing to help. To do so, first the initiator contacts a coalition candidate to start a negotiating session. When the responding agent (or responder) agrees to negotiate, it computes a persuasion threshold that indicates the degree to which it needs to be convinced in order to free or share a resource (alternatively, one can view the persuasion threshold as the degree to which an agent tries to hold on to a resource). Subsequently, the initiator attempts to convince the responder by sharing parts of its local information. The responder, in turn, uses a set of domain-specific rules to establish whether the information provided by the initiator pushes it above a resource's persuasion threshold, in which case it frees the resource. If the responder is not convinced by the evidential support provided by the initiator, it requests more information, which is then provided by the initiator. The negotiation continues based on the established strategy and eventually either the agents reach an agreement, in which case a resource or a percentage of a resource is freed, or the negotiation fails. Note that, motivated to cooperate, the responder also counter-offers when it realizes that the initiator has exhausted its arguments or when time is running out for the particular negotiation. How to negotiate successfully is dictated by a negotiation strategy, which each agent derives using case-based reasoning (CBR). CBR greatly limits the time needed to decide on a negotiation strategy, which is necessary in our real-time domain since the agent does not have to compute its negotiation strategy from scratch.
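A minimal sketch of one such 1-to-1 exchange, assuming simple in-memory message passing; the argument names, rule content, and threshold value are placeholders rather than the domain-specific rules our agents actually use.

    def run_negotiation(initiator_args, responder_rules, persuasion_threshold, max_steps=5):
        # initiator_args: ordered (name, value) information pieces the initiator will
        #   reveal, one per step; responder_rules: callables mapping a piece of
        #   information to the evidence support it contributes.
        evidence_support = 0.0
        for step, (name, value) in enumerate(initiator_args):
            if step >= max_steps:              # time is running out for this negotiation
                break
            for rule in responder_rules:
                evidence_support += rule(name, value)
            if evidence_support >= persuasion_threshold:
                return True                    # responder frees (part of) the resource
        return False                           # negotiation fails; the initiator moves on

    # Placeholder responder rule: a reminder of the responder's reliance on the
    # initiator adds evidence support (rule content and constants are invented).
    def reliance_rule(name, value):
        return 0.1 * value if name == "relianceReminder" and value > 0.5 else 0.0

    args = [("targetPriority", 0.9), ("helpRate", 0.8), ("relianceReminder", 0.7)]
    print(run_negotiation(args, [reliance_rule], persuasion_threshold=0.05))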
Reinforcement Learning for Coalition Formation
Each agent keeps a profile of its neighborhood², and of its current and past relationships with its neighbors, and the selection of the potential members of an initial coalition is based on this profile. The current relationship is based on the negotiation strains and leverage between two agents at the time when the coalition is about to be formed. The past relationship, however, is collected over time and enables an agent to adapt to form coalitions more effectively via reinforcement learning; this is what we discuss in this section.

First, we define the past relationship between an agent $a_i$ and a candidate $a_k$. Suppose that the number of negotiations initiated from an agent $a_i$ to $a_k$ is $\Sigma_{negotiate}(a_i \to a_k)$, the number of successful negotiations initiated from $a_i$ to $a_k$ is $\Sigma^{success}_{negotiate}(a_i \to a_k)$, the number of negotiation requests from $a_k$ that $a_i$ agrees to entertain is $\Sigma^{entertain}_{negotiate}(a_k \to a_i)$, the total number of all negotiations initiated from $a_i$ to all its neighbors is $\Sigma_{negotiate}(a_i \to \Pi_{a_i})$, and the total number of all successful negotiations initiated from $a_i$ to all its neighbors is $\Sigma^{success}_{negotiate}(a_i \to \Pi_{a_i})$.

In our model, $rel_{past,a_i}(a_k,t)$ includes the following:

(a) the helpfulness of $a_k$ to $a_i$: $\Sigma^{success}_{negotiate}(a_i \to a_k) \,/\, \Sigma_{negotiate}(a_i \to a_k)$;

(b) the importance of $a_k$ to $a_i$: $\Sigma^{success}_{negotiate}(a_i \to a_k) \,/\, \Sigma^{success}_{negotiate}(a_i \to \Pi_{a_i})$;

(c) the reliance of $a_i$ on $a_k$: $\Sigma_{negotiate}(a_i \to a_k) \,/\, \Sigma_{negotiate}(a_i \to \Pi_{a_i})$.

Similarly, the agent knows how useful it has been to its potential coalition partner $a_k$:

(d) the friendliness of $a_i$ to $a_k$: $\Sigma^{entertain}_{negotiate}(a_k \to a_i) \,/\, \Sigma_{negotiate}(a_k \to a_i)$;

(e) the helpfulness of $a_i$ to $a_k$: $\Sigma^{success}_{negotiate}(a_k \to a_i) \,/\, \Sigma^{entertain}_{negotiate}(a_k \to a_i)$;

(f) the relative importance of $a_i$ to $a_k$: $\Sigma_{negotiate}(a_k \to a_i) \,/\, \Sigma_{negotiate}(\Pi_{a_i} \to a_i)$;

(g) the reliance of $a_k$ on $a_i$: $\Sigma_{negotiate}(a_k \to a_i) \,/\, \Sigma_{negotiate}(a_k \to \Pi_{a_k})$.

² In our multiagent system, each agent maintains a neighborhood. An agent knows and can communicate with all its neighbors directly. This configuration of overlapping neighborhoods centered around each agent allows the system to scale up. Each agent, for example, only needs to know about its local neighborhood, and does not need to have global knowledge of the entire system. In this manner, an agent can maintain its neighborhood with the various measures discussed in this section.
Note that the above attributes are based on data readily collected whenever the agent $a_i$ initiates a request to its neighbors or whenever it receives a request from one of its neighbors. The higher the value of each of the above attributes, the higher the potential utility that neighbor may contribute to the coalition. The first three attributes tell the agent how helpful and important a particular neighbor has been. The more helpful and important that neighbor is, the better it is to include that neighbor in the coalition. The second set of attributes tells the agent the chance of having a successful negotiation. The agent expects the particular neighbor to be grateful and more willing to agree to a request based on the agent's friendliness, helpfulness, and relative importance to that neighbor. To further refine the granularity of the above attributes, one may measure them along different event types: for each event type, the initiating agent records the above six attributes. This allows the agent to better analyze the utility of a neighbor based on the type of event for which it is currently trying to form a coalition. In that case, an event type would qualify all the above attributes.
An agent $a_i$ can readily compute and update the first six attributes since they are based on the agent's encounters with its neighbors. However, an agent is not able to compute the last attribute, the reliance of $a_k$ on $a_i$, since the denominator is based on $a_k$'s negotiation statistics. This attribute is updated only when agent $a_i$ receives arguments from $a_k$. Thus, an agent updates the attributes from two sources. First, the update is triggered directly by the negotiation outcome. Second, the update comes from the information pieces contained in the arguments (when the initiator reminds the responder of the responder's reliance on the initiator). With the argumentative negotiation between agents, and the ability of an agent to conduct multiple, concurrent, coordinated 1-to-1 negotiations, the above profiled attributes facilitate a convergence toward useful coalitions, allowing the multiagent system as a whole to learn to collaborate more efficiently.
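The counters above are cheap to maintain as a running profile. The sketch below shows one plausible bookkeeping structure; the class and field names are illustrative, and the ratios simply mirror attributes (a) through (f) as defined above.

    class NeighborProfile:
        # Running negotiation statistics that agent a_i keeps about one neighbor a_k.
        def __init__(self):
            self.sent = 0                  # negotiations a_i initiated to a_k
            self.sent_success = 0          # ... of which succeeded
            self.received = 0              # requests a_k sent to a_i
            self.entertained = 0           # ... of which a_i agreed to entertain
            self.entertained_success = 0   # ... of which ended in a deal
            self.reported_reliance = None  # attribute (g), learned only from a_k's arguments

        def past_relationship(self, total_sent, total_sent_success, total_received):
            # Attributes (a)-(f); ratios default to 0 when the denominator is 0.
            div = lambda n, d: n / d if d else 0.0
            return {
                "helpfulness_of_k":  div(self.sent_success, self.sent),               # (a)
                "importance_of_k":   div(self.sent_success, total_sent_success),      # (b)
                "reliance_on_k":     div(self.sent, total_sent),                      # (c)
                "friendliness_to_k": div(self.entertained, self.received),            # (d)
                "helpfulness_to_k":  div(self.entertained_success, self.entertained), # (e)
                "importance_to_k":   div(self.received, total_received),              # (f)
            }

    profile = NeighborProfile()
    profile.sent, profile.sent_success = 10, 7
    profile.received, profile.entertained, profile.entertained_success = 6, 5, 4
    print(profile.past_relationship(total_sent=30, total_sent_success=15, total_received=20))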
From Learning to Selecting Better Coalition Members
An agent evaluates its coalition candidates and ranks them based on their potential utility value. The potential utility, $PU_{a_i,e_j}$, of a candidate $a_k$ is a weighted sum of the past relationship, i.e., $rel_{past,a_i}(a_k,t)$, the current relationship, i.e., $rel_{now,a_i}(a_k,t)$, and the ability of the candidate to help with the task $e_j$, i.e., $ability_{a_i}(a_k,e_j)$. The ability value is domain-dependent and predefined (e.g., an IR sensor is better suited for night detection than an optical one).

The current relationship between an agent $a_i$ and its neighbor $a_k$ is defined as follows. Suppose the number of concurrent negotiations that an agent can conduct is $N_{avail}$, the number of tasks that the agent $a_i$ is currently executing as requested by $a_k$ is $\Sigma_{execute}(task : initiator(task) = a_k)$, and the number of ongoing negotiations initiated from $a_i$ to $a_k$ is $\Sigma^{ongoing}_{negotiate}(a_i \to a_k)$. Then $rel_{now,a_i}(a_k,t)$ includes the following:

(a) the negotiation strain between $a_i$ and $a_k$: $\Sigma^{ongoing}_{negotiate}(a_i \to a_k) \,/\, N_{avail}$;

(b) the negotiation leverage between $a_i$ and $a_k$: $\Sigma^{ongoing}_{negotiate}(a_k \to a_i) \,/\, N_{avail}$; and

(c) the degree of strain on $a_i$ from $a_k$: $\Sigma_{execute}(task : initiator(task) = a_k) \,/\, \Sigma_{execute}(task : initiator(task) \in \Pi_{a_i})$.

$rel_{now,a_i}(a_k,t)$ is inversely proportional to the first attribute and proportional to the other two. The first attribute approximates how demanding the agent is of a particular neighbor: the more negotiations an agent is initiating to a neighbor, the more demanding the agent is, and this strains the relationship between the two and the negotiations suffer. The last two attributes are used as leverage that the agent can use against a neighbor that it is negotiating with, about a request initiated by that neighbor.

The potential utility is then:

$PU_{a_i,e_j} = W_{a_i}(a_i,e_j) \cdot \left[ \, rel_{past,a_i}(a_k,t) \;\; rel_{now,a_i}(a_k,t) \;\; ability_{a_i}(a_k,e_j) \, \right]^{T}$

where $W_{a_i}(a_i,e_j) = \left[ \, w_{past,a_i,e_j} \;\; w_{now,a_i,e_j} \;\; w_{ability,a_i,e_j} \, \right]$ and $w_{past,a_i,e_j} + w_{now,a_i,e_j} + w_{ability,a_i,e_j} = 1$. Note that ultimately these weights may be dynamically dependent on the current status of $a_i$ and the task $e_j$. And $rel_{past,a_i}(a_k,t)$ is a weighted sum of the seven attributes previously discussed. As a result, an agent is reinforced to go back to the same neighbor (for a particular task) with which it has had a good past relationship in their interactions. In this manner, our agents are able to form more effective coalitions via reinforcement learning of which neighbors are useful for which tasks.
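A compact illustration of the ranking step, using made-up weights and attribute values; rel_past is collapsed to a single number here for brevity, whereas in the system it is itself a weighted sum of the seven attributes.

    def potential_utility(rel_past, rel_now, ability, weights=(0.4, 0.3, 0.3)):
        # PU = w_past * rel_past + w_now * rel_now + w_ability * ability, weights summing to 1.
        w_past, w_now, w_ability = weights
        return w_past * rel_past + w_now * rel_now + w_ability * ability

    # Hypothetical candidates with (rel_past, rel_now, ability) already scaled to [0, 1].
    candidates = {"a2": (0.8, 0.6, 0.9), "a3": (0.5, 0.9, 0.7), "a4": (0.9, 0.2, 0.4)}
    ranked = sorted(candidates, key=lambda k: potential_utility(*candidates[k]), reverse=True)
    print(ranked)   # negotiate first with as many top-ranked candidates as threads allow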
From Learning to Arguing More Effectively
As we mentioned earlier, when an initiating agent argues, it sends over different information classes. One of them is the world class, which includes a profile of the neighbors. This information includes the past relationship attributes. Upon receiving these arguments, the responding agent evaluates the evidence support that they bring forth. Here we show an actual CLIPS rule that our agents use:
(defrule world-help-rate
  ;; if the initiator's helpfulness rate exceeds 2/3, add 5 percent of
  ;; that rate to the accumulated evidence support
  (world (_helpRate ?y&:(> ?y 0.6667)))
  =>
  (bind ?*evidenceSupport*
        (+ (* 0.05 ?y) ?*evidenceSupport*)))
This rule says that if the attribute helpfulness is greater than 0.6667, then 5 percent of that rate is added to the evidence support. This means that the responding agent is reinforced to agree to a negotiation by its previous interactions with the initiating agent. And the more successes the initiating agent has with a particular neighbor, the more effectively it can argue with that neighbor due to its reinforcement learning. As a result, both agents learn to conduct better negotiations. Note also that this collaborative learning via inter-agent negotiations allows both agents to improve their performance, because both agents are motivated to cooperate to achieve global goals.
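For readers less familiar with CLIPS, the same effect can be written as a plain conditional; the function below is a rough Python paraphrase of the rule above, not a second implementation used by the agents.

    def world_help_rate(help_rate, evidence_support):
        # If the initiator's helpfulness rate exceeds 0.6667, add 5 percent of that
        # rate to the responder's accumulated evidence support (mirrors the rule above).
        if help_rate > 0.6667:
            evidence_support += 0.05 * help_rate
        return evidence_support

    print(world_help_rate(0.8, 0.0))   # adds 0.04 toward the persuasion threshold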
Case-Based Learning for Better Negotiations
Before a negotiation can take place, an agent has to define its negotiation strategy, which it learns from the retrieved, most similar old case. Note that a better negotiation does not mean one that always results in a deal. A better negotiation is one that is effective and efficient, meaning that a quick successful negotiation is always preferred, but if a negotiation is going to fail, then a quickly failed negotiation is also preferred.

A negotiation process is situated. That is, for the same task, because of differences in the world scenarios, constraints, evaluation criteria, information certainty and completeness, and agent status, an agent may adopt a different negotiation strategy. A strategy establishes the types of information to be transmitted, the number of communication acts, the computing resources needed, and so on. To represent each situation, we use cases. Each agent maintains two case bases, one for the agent as an initiator, the other for the agent as a responder. In general, an initiator is more conceding and agreeable and a responder is more demanding and unyielding. Each agent learns from its own experiences and thus evolves its own case bases.
In our work a case contains the following information: a situation space, a solution space, and an outcome. The situation space documents the information about the world, the agent, and the neighbors when the task arises. The solution space is the negotiation strategy that determines how an agent should behave in its upcoming negotiation. In an initiator case, the situation features are the list of current tasks, the potential negotiation partners, the task description, and the target speed and location. The negotiation parameters are the classes and descriptions of the information to transfer, the time constraints, the number of negotiation steps planned, the CPU resource usage, the number of steps possible, the CPU resources needed, and the number of agents that can be contacted. In a responder case, the situation feature set consists of the list of current tasks, the ID of the initiating agent, the task description, the power and data quality of its sensor, its CPU resources available, and the status of its sensing sector. The negotiation parameters to be determined are the time allocated, the number of negotiation steps planned, the CPU usage, the power usage, and a persuasion threshold for turning a sensing sector on (performing frequency or amplitude measurements), giving up CPU resources, or sharing communication channels. Finally, in both cases the outcome records the result of the negotiation.
When a task is formulated, an agent composes a new problem case with a situation space. Then it retrieves the most similar case from the case base by a weighted, parametric matching on the situation spaces. Given the most similar case, the agent adapts that solution, or negotiation strategy, to more closely reflect the current situation. Equipped with this modified negotiation strategy, the agent proceeds with its negotiation. Finally, when the negotiation completes, the agent updates the case with the outcome.
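The retrieval step can be pictured as a weighted nearest-neighbor match over the situation features. The sketch below is a simplified, numeric-only version: real cases mix symbolic and numeric features, and the feature names and weights here are invented.

    def weighted_distance(new_situation, case_situation, weights):
        # Weighted, normalized difference over situation features (smaller = more similar).
        total = sum(w * abs(new_situation[f] - case_situation[f]) for f, w in weights.items())
        return total / sum(weights.values())

    def retrieve(case_base, new_situation, weights):
        # Return the stored case whose situation space best matches the new problem.
        return min(case_base, key=lambda c: weighted_distance(new_situation, c["situation"], weights))

    weights = {"num_tasks": 0.2, "target_speed": 0.5, "cpu_free": 0.3}   # invented weights
    case_base = [
        {"situation": {"num_tasks": 1, "target_speed": 0.5, "cpu_free": 0.8},
         "strategy": {"steps_planned": 4, "time_allocated": 1.5}},
        {"situation": {"num_tasks": 3, "target_speed": 0.2, "cpu_free": 0.3},
         "strategy": {"steps_planned": 2, "time_allocated": 0.8}},
    ]
    best = retrieve(case_base, {"num_tasks": 2, "target_speed": 0.45, "cpu_free": 0.7}, weights)
    print(best["strategy"])   # this strategy would then be adapted to the current situation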
Incremental and Refinement Learning
After a negotiation is completed (successfully or otherwise), the agent updates its case base using an incremental and a refinement learning step.

During incremental learning the agent matches the new case against all cases in the case base and, if it is significantly different from all other stored cases, it stores the new case. When the agent computes the difference between a pair of cases, it emphasizes the case description more than the negotiation parameters, since its objective is to learn a wide coverage of the problem domain. This will improve its future case retrieval and case adaptation. So, an agent learns good, unique cases incrementally. The difference measure is:

$Diff(a,b) = w_{sit} \dfrac{\sum_{i=1}^{N} w_{sit,i} \left| f_{sit,i,a} - f_{sit,i,b} \right|}{\sum_{i=1}^{N} w_{sit,i}} + w_{sol} \dfrac{\sum_{j=1}^{M} w_{sol,j} \left| f_{sol,j,a} - f_{sol,j,b} \right|}{\sum_{j=1}^{M} w_{sol,j}}$

where $f_{sit,i,k}$ is the $i$th situation feature for case $k$, $f_{sol,j,k}$ is the $j$th solution feature for case $k$, $w_{sit,i}$ is the weight for $f_{sit,i,k}$, $w_{sol,j}$ is the weight for $f_{sol,j,k}$, $w_{sit}$ is the weight for the situation space, and $w_{sol}$ is the weight for the solution space. Also, $w_{sit} + w_{sol} = 1$, and each $w_{sit,i}$ or $w_{sol,j}$ is between 0 and 1. Finally, the agent computes $\min_{b=1 \ldots K} Diff(a,b)$ for the new case $a$, and if this number is greater than a pre-determined threshold, then the case is learned.
Second, since we want to keep the size of the case base under control, especially for speed of retrieval and maintenance (since our problem domain deals with real-time target tracking), the agent also performs refinement learning. If the new case is found to be very similar to one of the existing cases, then it computes (1) the sum of differences (i.e., $\sum_{b=1}^{K} Diff(o,b)$, where $o$ is the old case) between that old case and the entire case base minus the old case, and (2) the sum of differences between the new case and the entire case base minus the old case. This establishes the utility of the new case and of the old case (which the agent considers replacing with the new case). The agent chooses to keep the case that will increase the diversity of the case base. Thus, if the second sum is greater than the first sum, then the agent replaces the old case with the new one. In this manner, we are able to gradually refine the cases in the case base while keeping the size of the case base under control.
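Putting the two maintenance steps together, a minimal sketch of the decision logic might look as follows, assuming a simplified Diff over numeric features and an invented novelty threshold.

    def diff(a, b, w_sit=0.7, w_sol=0.3):
        # Simplified Diff(a, b): weighted, normalized distance over situation and
        # solution features (assumes purely numeric features with matching keys).
        def part(x, y):
            return sum(abs(x[k] - y[k]) for k in x) / len(x)
        return w_sit * part(a["situation"], b["situation"]) + w_sol * part(a["solution"], b["solution"])

    def learn_case(case_base, new_case, threshold=0.25):
        # Incremental step: store the new case if it is sufficiently novel.
        # Refinement step: otherwise keep whichever of the new case and its nearest
        # old case adds more diversity to the rest of the case base.
        if not case_base:
            case_base.append(new_case)
            return
        nearest = min(case_base, key=lambda c: diff(new_case, c))
        if diff(new_case, nearest) > threshold:
            case_base.append(new_case)                      # learn a good, unique case
            return
        rest = [c for c in case_base if c is not nearest]
        old_sum = sum(diff(nearest, c) for c in rest)
        new_sum = sum(diff(new_case, c) for c in rest)
        if new_sum > old_sum:                               # the new case is more diverse
            case_base[case_base.index(nearest)] = new_case

    case_base = [{"situation": {"speed": 0.5}, "solution": {"steps": 3}}]
    learn_case(case_base, {"situation": {"speed": 0.9}, "solution": {"steps": 5}})
    print(len(case_base))   # 2: the new case was different enough to be stored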
Results
We have built a fully integrated multiagent system with agents performing end-to-end behavior. In our simulation, we have any number of autonomous agents and Tracker modules. A Tracker module is tasked to accept target measurements from the agents and predict the location of the target. The agents track targets and negotiate with each other. We have tested our agents in a simulated environment and with actual hardware of up to eight sensors and two targets. For the following experiments, we used four sensors and one target moving about 60 feet between two points.

We ran two basic experiments: in the first one we investigated the effect of reinforcement learning on the quality of the resulting coalition, and in the second one we looked into the quality of the negotiation as a function of learning new cases of negotiation strategies. The quality was based on the number of successful negotiations, since when the agents reach a negotiated deal to jointly track a target, the overall system utility increases. Initial results indicate that the agents form coalitions with partners who are more willing to accommodate them in negotiation, and that the cases learned are being used in future negotiations. Agents that use learning of negotiation cases have between 20% and 40% fewer failed negotiations. Agents that use reinforcement learning to determine future coalition partners tend to prefer neighbors who are more conceding.
We also conducted experiments with four versions of learning: (1) both case-based learning and reinforcement learning (CBLRL), (2) only reinforcement learning (NoCBL), (3) only case-based learning (NoRL), and (4) no learning at all (NoCBLRL). Figure 1 shows the results in terms of the success rates for negotiations and coalition formations. As can be observed from the graph, the agent design with both case-based learning and reinforcement learning outperformed the others in both its negotiation success rate and its coalition formation success rate. That means that with learning, the agents were able to negotiate more effectively (and perhaps more efficiently as well), which led to more coalitions formed. Without either case-based learning or reinforcement learning (but not both), the negotiation success rates remained about the same but the coalition formation rate tended to deteriorate. This indicates that without one of the learning mechanisms, the agents were still able to negotiate effectively, but perhaps not efficiently (resulting in less processing time for the initiating agent to post-process an agreement). Without both learning mechanisms, there was a significant drop in the negotiation success rate. This indicates that the learning mechanisms did help improve the agents' negotiation performance.

Figure 1. Success rates of negotiations and coalition formations for different learning mechanisms.
Note that the entire agent system is complicated, with the end-to-end behavior affected by various threads and environments. The results reported here need to be scrutinized further to isolate the learning, monitoring, detection, reasoning, communication, and execution components in the evaluation: an agent busy tracking will not entertain a negotiation request that requires it to give up the resources that it is using to track, and that refusal leads to a failure on the initiator's side. Another point worth mentioning is the real-time nature of our system and experiments. Added learning steps may cause an agent to lose valuable processing time to handle a coalition formation problem. More frequent coalition formations may outright prevent other negotiations from proceeding, as more agents will be tied up in their scheduled tasks as part of their commitments to the coalitions.
Future Research Issues
There are other learning issues that we plan to investigate. Of immediate concern to us are the following:

(1) We plan to investigate the learned cases and the overall 'health' of the case bases. We plan to measure the diversity growth of each case base (initiating and responding) for each agent. We also plan to examine the relationships between an agent's behavior (such as the number of tracking tasks, the number of negotiation tasks, the CPU resources allocated, etc.) and the agent's learning rate. For example, if an agent is always busy, does it learn more or less, or learn faster or more slowly?

(2) We plan to investigate the learning of "good-enough, soon-enough" solutions. Under time constraints, agents cannot afford to perform complicated learning. Should the agent hastily learn sub-optimal solutions to its coalition formation and negotiation ventures? Or should it keep track of its coalition formation success rates and its negotiation success rates? For example, if one of the rates drops below an acceptable level, should an agent ask for help from more successful agents and perhaps learn their cases? If so, then we may have to incorporate time issues into this behavior to trade off between the loss of target tracking time and the gain of higher coalition formation rates.

(3) We plan to investigate further the effects of reinforcement learning on the agents' coalition formation capabilities. Will each agent learn to approach a static coalition formation for each particular task? That is, will the reinforcement learning inhibit an agent from looking to other neighbors for help instead of the same old group of neighbors? If so, could this be harmful to the system?

(4) We plan to investigate cooperative and distributed case-based learning (Martin and Plaza 1998; Martin et al. 1998; Plaza et al. 1997). However, instead of evaluating the worth of such learning in terms of tasks getting executed, we want to concentrate on coalitions getting formed. We believe that combining case-based learning with negotiations allows the agents to exchange experiences selectively: a negotiation occurs only when necessary. This facilitates a more conservative but potentially more powerful learning behavior, as the learning is reinforced through subsequent agent processes.
Conclusions
We have described the combination of reinforcement and case-based learning in negotiating multiagent systems. Each agent keeps a profile of its interactions with its neighbors and keeps a log of their relationships. This allows an agent to compute a potential utility for each coalition candidate and to approach only those with high utility values to increase the chance of a successful coalition formation. This is a form of reinforcement learning, as agents are able to learn to go back to the same agents that have cooperated before and to argue more effectively. The approach of negotiation-based coalition formation facilitates collaborative learning efforts among the agents, as the negotiation outcomes influence each agent's perception of its neighbors, which, in turn, plays an important role in forming better coalitions faster. This learning is derived from the outcome of the negotiations and also from the information exchanged during a negotiation. Each agent also uses case-based reasoning to derive a negotiation strategy. When a negotiation completes, the resultant new case is learned if it increases the diversity of the case base. The integration of these learning mechanisms leads to more effective negotiations and deals between agents. Initial results support this avenue of research and show that learning agents perform better and conduct more successful negotiations.
Acknowledgments
The work described in this paper is sponsored by the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-99-2-0502. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA), the Air Force Research Laboratory, or the U.S. Government.
References
Jennings, N. R., Parsons, S., Noriega, P., and Sierra, C. 1998. On Argumentation-Based Negotiation, in Proceedings of the Int. Workshop on Multi-Agent Systems, Boston, MA.

Martin, F. and Plaza, E. 1998. Auction-Based Retrieval, in Proceedings of the 2nd Congrés Català d'Intel·ligència Artificial, Girona, Spain, 136-145.

Martin, F., Plaza, E., and Arcos, J. L. 1998. Knowledge and Experience Reuse through Communication among Competent (Peer) Agents, International Journal of Software Engineering and Knowledge Engineering, 9(3):319-341.

Plaza, E., Arcos, J. L., and Martin, F. 1997. Cooperative Case-Based Reasoning, in Weiss, G. (ed.), Distributed Artificial Intelligence Meets Machine Learning, Lecture Notes in Artificial Intelligence 1221, Springer Verlag, 180-201.

Soh, L.-K. and Tsatsoulis, C. 2001. Reflective Negotiating Agents for Real-Time Multisensor Target Tracking, in Proceedings of IJCAI'01, Seattle, WA, 1121-1127.