
Learning Problem Solving Control in Cooperative Multi-Agent Systems*
M. V. Nagendra Prasad and Victor R. Lesser
Department of Computer Science
University of Massachusetts, Amherst, MA 01003
{nagendra,lesser}@cs.umass.edu
Abstract
The work presented here deals with ways to improve the problem solving control skills of cooperative agents. We propose situation-specific learning as a way to learn cooperative control knowledge in complex multi-agent systems. We demonstrate the power of situation-specific learning in the context of dynamic choice of a coordination strategy based on the problem solving situation. We present a learning system, called COLLAGE, that uses meta-level information in the form of an abstract characterization of the coordination problem instance to learn to choose the appropriate coordination strategy from among a class of strategies.
The work presented here deals with ways to improve the problem solving control skills of cooperative agents. Problem solving control is the process by which an agent organizes its computation. An agent in a MAS faces control uncertainties due to its partial local view of the problem solving states of the other agents in the system. Uncertainties about the progress of problem solving in other agents, the characteristics of non-local interacting sub-problems, expectations of non-local problem-solving activity, and generated partial results can lead to global incoherence and degradation in system performance if they are not managed effectively through cooperation. Effective cooperation in such systems requires that the global problem-solving state influence the local control decisions made by an agent. We call such an influence cooperative control (Lesser 1991).

Cooperation among agents can be explicit or implicit. In explicitly cooperative systems, the agents interact and exchange information or perform actions so as to benefit other agents. On the other hand, implicitly cooperative agents perform actions that are a part of their own goal-seeking process, but these actions affect the other agents in beneficial ways (Mataric 1993).

Agents exchange information through communication. An agent in a MAS needs to communicate with other agents to acquire a view of the non-local problem solving so as to make local decisions that are influenced by more global considerations. The agents perform direct communication or indirect communication as a part of their process of interaction with other agents. Direct communication involves intentionality, entailing a sender and one or more receivers. The communication is purposeful and is aimed at particular members of the agent set. Much of the explicitly cooperative activity involves direct communication. In contrast, indirect communication is achieved through observation of the behavior of the other agents and their effects on the environment (Mataric 1993). Indirect communication involves cues that are "simply by-products of behaviors performed for reasons other than communication" (Seeley 1989).

In this paper, we deal with an explicitly cooperative, directly communicating system. In such systems, the local problem solving control of an agent can interact in complex and intricate ways with the problem solving control of other agents. In these systems, an agent cannot make effective control decisions based on a purely local problem solving state. A given local problem solving state can map to a multitude of global problem solving states. An agent with a purely local view of the problem solving situation cannot learn effective control decisions that may have global implications, due to the uncertainty about the overall state of the system. This gives rise to the need for learning more globally situated control knowledge. On the other hand, an agent cannot utilize the entire global problem solving state, even if other agents are willing to communicate all the necessary information, because of the limitations on its computational resources, representational heterogeneity, and phenomena like distraction. If an agent A provides another agent B with an enormous amount of problem solving state information, then agent B has to divert its computational resources away from local problem solving towards the task of assimilating this information. This can have drastic consequences in time constrained or resource bounded domains. Moreover, the phenomenon of distraction can become a serious problem. Incorrect or irrelevant information provided by another agent with weakly constrained knowledge could lead the receiving agent's computations along unproductive directions (Lesser 1991).

This brings us to the importance of communicating only relevant information at suitable abstractions to other agents. We call such information a situation. An agent needs to associate appropriate views of the global situation with the knowledge learned about effective control decisions. We call this form of knowledge situation-specific control. In complex multi-agent systems, the knowledge engineering involved in providing the agents with effective control strategies is a formidable and perhaps even impossible task due to the dynamic nature of the interactions among agents and between the environment and agents. Thus, we appeal to machine learning techniques to learn situation-specific problem solving control in such systems. Learning can involve acquiring strategies for negotiation or coordination, detecting coordination relationships, and incrementally constructing models of other agents and the task environment.

*This material is based upon work supported by the National Science Foundation under Grant No. IRI-9523419. The content of this paper does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
In this paper, we will be dealing with learning coordination as an illustration of learning situation-specific control. Coordination is the act of managing interdependencies in a multi-agent system (Decker & Lesser 1995). The agents learn to dynamically choose the appropriate coordination strategy in different coordination problem instances. We empirically demonstrate that, even for a narrow class of agent activities, learning to choose an appropriate coordination strategy based on a meta-level characterization of the global problem solving state outperforms using any single coordination strategy across all problem instances. The broad insights presented here are more generally applicable to learning problem solving control in complex cooperative multi-agent systems. Our past work has dealt with learning situation-specific organizational roles in a multi-agent parametric design system (Nagendra Prasad, Lesser, & Lander 1996b) and learning non-local requirements in a negotiated search process (Nagendra Prasad, Lesser, & Lander 1996a).
Related Work
Much of the literature in multi-agent learning relies on reinforcement learning and classifier systems as learning algorithms. In Sen, Sekaran, and Hale (Sen, Sekaran, & Hale 1994) and Crites and Barto (Crites & Barto 1996), the agents do not communicate with one another, and an agent treats the other agents as a part of the environment. Weiss (Weiss 1994) uses classifier systems for learning appropriate multi-agent hierarchical organization structuring relationships. In Tan (Tan 1993), the agents share perception information to overcome perceptual limitations, or communicate policy functions. In Sandholm and Crites (Sandholm & Crites 1995), the agents are self-interested, and an agent is not free to ask for any kind of information from the other agents. The MAS that this paper deals with contains complex cooperative agents (each agent is a sophisticated problem solver), and an agent's local problem solving control interacts with that of the other agents in intricate ways. In our work, rather than treating other agents as a part of the environment and learning in the presence of increased uncertainty, an agent tries to resolve the uncertainty about other agents' activities by communicating meta-level information (or situations), to the extent possible (there is still the environmental uncertainty that the agents cannot do much about). In cooperative systems, an agent can "ask" other agents for any information that it deems relevant to appropriately situate its learned local coordination knowledge. Agents sharing perceptual information as in Tan (Tan 1993) or bidding information as in Weiss (Weiss 1994) do not make explicit the notion of situating the local control knowledge in a more global abstract situation. The information shared is weak, and these approaches are studied in domains such as predator-prey (Tan 1993) or the blocks world (Weiss 1994), where the need for sharing meta-level information and situating learning in it is not apparent.
Environment-Specific Coordination Mechanisms
In this paper, we show the effectiveness of learning situation-specific coordination by extending a domain-independent coordination framework called Generalized Partial Global Planning (GPGP) (Decker & Lesser 1995). GPGP is a flexible, modular method that recognizes the need for creating tailored coordination strategies in response to the characteristics of a particular task environment. It is structured as an extensible set of modular coordination mechanisms so that any subset of mechanisms can be used. Each mechanism can be invoked in response to the detection of certain features of the environment and interdependencies between the problem solving activities of agents. The coordination mechanisms can be configured (or parameterized) in different ways to derive different coordination strategies. In GPGP (without the learning extensions), once a set of mechanisms is configured (by a human expert) to derive a coordination strategy, that strategy is used across all problem instances in the environment. Experimental results have already verified that for some environments a subset of the mechanisms is more effective than using the entire set of mechanisms (Decker & Lesser 1995; Nagendra Prasad et al. 1996). In this work, we present a learning extension, called COLLAGE, that endows the agents with the capability to choose a suitable subset of the coordination mechanisms based on the present problem solving situation, instead of having a fixed subset across all problem instances in an environment.
GPGP coordination mechanisms rely on the coordination problem instance being represented in a framework called TÆMS (Decker & Lesser 1993). The TÆMS framework (Task Analysis, Environment Modeling, and Simulation) (Decker & Lesser 1993) represents coordination problems in a formal, domain-independent way. It captures the essence of the coordination problem instances, independent of the details of the domain, in terms of the trade-offs between multiple ways of accomplishing a goal, progress towards the achievement of goals, the various interactions between the activities in the problem instance, and the effect of these activities on the performance of the system as a whole. There are deadlines associated with the tasks, and some of the subtasks may be interdependent, that is, they cannot be solved independently in isolation. There may be multiple ways to accomplish a task, and these trade off the quality of a result produced for the time to produce that result. Quality is used as a catch-all term representing acceptability characteristics like certainty, precision, and completeness, other than temporal characteristics. A set of related tasks is called a task group or a task structure. A task structure is the elucidation of the structure of the problem an agent is trying to solve. The quality of a task is a function of the quality of its subtasks. The leaf tasks are denoted as methods, with base-level quality and duration. Methods represent domain actions, like executing a blackboard knowledge source, running an instantiated plan, or executing a piece of code with its data. A task may have multiple ways to accomplish it, represented by multiple methods, that trade off the time to produce a result for the quality of the result. Besides task/subtask relationships, there can be other interdependencies, called coordination interrelationships, between tasks in a task group (Decker & Lesser 1993). In this paper, we will be dealing with two such interrelationships, illustrated in the sketch after this list:

- facilitates relationship or soft interrelationship: Task A facilitates Task B if the execution of Task A produces a result that affects an optional parameter of Task B. Task B could have been executed without the result from Task A, but the availability of the result provides constraints on the execution of Task B, leading to improved quality and reduced time requirements.

- enables relationship or hard interrelationship: If Task A enables Task B, then the execution of Task A produces a result that is a required input parameter for Task B. The results of the execution of Task A must be communicated to Task B before it can be executed.
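To make the TÆMS vocabulary concrete, the following is a minimal sketch of how a task structure with methods and interrelationships might be represented. The class and field names are our own illustrative choices, not the actual TÆMS implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Method:
    """Leaf task: a domain action with base-level quality and duration."""
    name: str
    quality: float
    duration: float

@dataclass
class Task:
    """A task accomplished by one of several methods (a quality/time trade-off)
    or via subtasks; its quality is a function of its subtasks' quality."""
    name: str
    methods: List[Method] = field(default_factory=list)
    subtasks: List["Task"] = field(default_factory=list)
    deadline: Optional[float] = None

@dataclass
class Interrelationship:
    """Coordination interrelationship between tasks, possibly across agents."""
    kind: str      # "facilitates" (soft) or "enables" (hard)
    source: Task   # Task A: its result affects Task B
    target: Task   # Task B

# A task with two methods trading quality for time, enabled by another task.
sensing = Task("sense-data", methods=[Method("quick-scan", quality=2.0, duration=5.0)])
analysis = Task("analyze-data", deadline=140.0, methods=[
    Method("thorough-analysis", quality=10.0, duration=40.0),
    Method("fast-analysis", quality=4.0, duration=10.0),
])
rel = Interrelationship("enables", source=sensing, target=analysis)
```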
In GPGP, a coordination strategy can be derived by activating a subset of the coordination mechanisms. Specifically, we will be investigating the effect of three coordination strategies:

Balanced (or dynamic-scheduling): Agents coordinate their actions by dynamically forming commitments. A commitment represents a contract to generate results of a particular quality by a specified deadline. Relevant results are generated by specific times and communicated to the agents to whom the corresponding commitments are made. Agents schedule their local tasks trying to maximize the accrual of quality based on the commitments made to them by the other agents, while ensuring that their commitments to other agents are satisfied. The agents have the relevant non-local view of the coordination problem, detect coordination relationships, form commitments, and communicate the committed results.

Data Flow Strategy: An agent communicates the result of performing a task to all the agents, and the other agents can exploit these results if they still can. This represents the other extreme, where there are no commitments from any agent to any other agent.

Rough Coordination: This is similar to balanced, but commitments do not arise out of communication between agents; they are known a priori. Each agent has an approximate idea of when the other agents complete their tasks and communicate results, based on its past experience. "Rough commitments" are a form of tacit social contract between agents about the completion times of their tasks.
COLLAGE: Learning Coordination

Learning Coordination
Our learning algorithm, called COLLAGE (COordination Learner for muLtiple AGEnt systems), uses abstract meta-level information about coordination problem instances to learn to choose, for a given problem instance, the appropriate coordination strategy from the three strategies described previously. Learning in COLLAGE falls into the category of Instance-Based Learning algorithms (Aha, Kibler, & Albert 1991), originally proposed for supervised classification learning. We, however, use the IBL paradigm for unsupervised learning of decision-theoretic choice.

Learning involves running the multi-agent system on a large number of training coordination problem instances and observing the performance of different coordination strategies on these instances. When a new task group arises in the environment, each of the agents has its own partial local view of the task group. Based on its local view, each agent forms a local situation vector. A local situation represents an agent's assessment of the utility of reacting to various characteristics of the environment. Such an assessment can potentially indicate how to activate the various GPGP mechanisms and consequently has a direct bearing on the type of coordination strategy that is best for the given coordination episode. The agents then exchange their local situation vectors, and each of the agents composes all the local situation vectors into a global situation vector. All agents agree on a choice of the coordination strategy, and the choice depends on the learning mode of the agents:
Mode 1: In this mode, the agents run all the available coordination strategies and note their relative performances for each of the coordination problem instances. Thus, for example, the agents run each of data-flow, rough, and balanced for a coordination episode and store their performances for each strategy.

Mode 2: In this mode, the agents choose one of the coordination strategies for a given coordination episode and observe and store the performance only for that coordination strategy. They choose the coordination strategy that is represented the least number of times in the neighborhood of a small radius around the present global situation. This is done to obtain a balanced representation of all the coordination strategies across the space of possible global situations (see the sketch below).
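A minimal sketch of the Mode 2 exploration rule; the paper gives no pseudocode, so the tuple layout of stored instances and the `radius` value are our own illustrative assumptions.

```python
import math
from collections import Counter

def mode2_choice(instance_base, situation, strategies, radius=0.1):
    """Pick the strategy least represented among stored instances whose
    global situation lies within `radius` of the present situation."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    counts = Counter({s: 0 for s in strategies})
    for past_situation, strategy, _performance in instance_base:
        if dist(past_situation, situation) <= radius:
            counts[strategy] += 1
    # min() breaks ties by the order in which strategies are listed
    return min(strategies, key=lambda s: counts[s])
```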
At the end of each run of a coordination episode with a selected coordination strategy, the performance of the system is registered. This is represented as a vector of four performance measures: total quality, number of methods executed, number of communications, and termination time. Learning involves simply adding the new instance, formed by the performance of the coordination strategy along with the associated problem solving situation, to the "instance-base". Thus, the training phase builds a set of {situation, coordination_strategy, performance} triplets for each of the agents. Here the global situation vector is the abstraction of the global problem solving state associated with the choice of a coordination strategy. Note that at the beginning of a problem solving episode, all agents communicate their local problem solving situations to the other agents. Thus, each agent aggregates the local problem solving situations to form a common global situation. All agents form identical instance-bases because they build the same global situation vectors through communication. A sketch of this training loop follows.
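The training phase might look like the sketch below, building on the `mode2_choice` helper above. The `run_episode` callable and the `local_situation_vectors` attribute are stand-ins of ours for machinery the paper describes in prose, not actual COLLAGE interfaces.

```python
def train(episodes, strategies, mode, run_episode, compose):
    """Build the instance-base of (situation, strategy, performance) triplets.

    episodes    -- iterable of coordination problem instances, each carrying
                   the agents' exchanged local situation vectors
    run_episode -- callable (episode, strategy) -> performance 4-vector
                   (total quality, methods executed, communications, time)
    compose     -- callable mapping local situation vectors to the global one
    """
    instance_base = []
    for episode in episodes:
        situation = compose(episode.local_situation_vectors)
        if mode == 1:
            # Mode 1: run every strategy on this instance, record all outcomes.
            chosen = strategies
        else:
            # Mode 2: run only the least-represented strategy near this situation.
            chosen = [mode2_choice(instance_base, situation, strategies)]
        for strategy in chosen:
            instance_base.append((situation, strategy, run_episode(episode, strategy)))
    return instance_base
```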
Forming a Local Situation Vector  The situation vector is an abstraction of the coordination problem and the effects of the coordination mechanisms in GPGP. It is composed of six components:

(a) The first component represents an approximation of the effect of detecting soft coordination relationships on the quality component of the overall performance. An agent creates virtual task structures from the locally available task structures by letting each of the facilitates coordination relationships potentially affecting a local task actually take effect, and calls the scheduler on these task structures. In order to achieve this, the agent detects all the facilitates interrelationships that affect its tasks. An agent can be expected to know the interrelationships affecting its tasks, though it may not know the exact tasks in other agents that affect it without communicating with them. The agent then produces another set of virtual task structures, but this time with the assumption that the facilitates relationships are not detected, and hence the tasks that can potentially be affected by them are not affected in these task structures. The scheduler is again called with this task structure. The first component, representing the effect of detecting facilitates, is obtained as the ratio of the quality produced by the schedule without facilitates relationships to the quality produced by the schedule with facilitates relationships.

(b) The second component represents an approximation of the effect of detecting soft coordination relationships on the duration component of the overall performance. It is formed using the same techniques discussed above for quality, but using the duration of the schedules formed with the virtual task structures.

(c, d) The third and fourth components represent an approximation of the effect of detecting hard coordination interrelationships on the quality and duration of the local task structures at an agent. They are obtained in a manner similar to that described for facilitates.

(e) The fifth component represents the time pressure on the agent. In a design-to-time scheduler, increased time pressure on an agent will lead to schedules that still adhere to the deadline requirements as far as possible, but with a sacrifice in quality. Under time pressure, lower quality, lower duration methods are preferred over higher quality, higher duration methods for achieving a particular task. In order to get an estimate of the time pressure, an agent generates virtual task structures from its local task structures by setting the deadlines of the task groups, tasks, and methods to ∞ (a large number) and scheduling these virtual task structures. The agent then schedules again with the local task structures set to the actual deadline. Time pressure is obtained as the ratio of the schedule quality with the actual deadlines to the schedule quality with large deadlines.

(f) The sixth component represents the load. It is obtained as the ratio of the execution time under the actual deadline to the execution time under no time pressure. It is formed using methods similar to those discussed above for time pressure, but using the duration of the schedules formed with the virtual task structures. A sketch of how these ratio components might be computed follows.
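As an illustration, components (a) and (e) reduce to ratios of schedule qualities under two hypothetical schedulings. This sketch assumes a `schedule(task_structures) -> (quality, duration)` interface that we invent here; the virtual-task-structure helpers are placeholders for the rewriting described above.

```python
# Placeholder helpers standing in for the virtual-task-structure construction
# described in the text; real implementations would rewrite TAEMS structures.
def apply_facilitates(task_structures): return task_structures
def strip_facilitates(task_structures): return task_structures
def set_deadlines(task_structures, deadline): return task_structures

def soft_quality_component(task_structures, schedule):
    """Component (a): quality of the schedule without facilitates effects
    divided by the quality of the schedule with them."""
    q_with, _ = schedule(apply_facilitates(task_structures))
    q_without, _ = schedule(strip_facilitates(task_structures))
    return q_without / q_with

def time_pressure_component(task_structures, schedule, deadline):
    """Component (e): quality under the actual deadline divided by quality
    under an effectively unbounded deadline."""
    q_actual, _ = schedule(set_deadlines(task_structures, deadline))
    q_relaxed, _ = schedule(set_deadlines(task_structures, float("inf")))
    return q_actual / q_relaxed
```

Components (b), (c, d), and (f) follow the same pattern, substituting duration for quality or hard for soft interrelationships.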
In the work presented here, the cost of scheduling is ignored, and the time for scheduling is considered negligible compared to the execution time of the methods. However, more sophisticated models would need to take these factors into consideration too. We view this as one of our future directions of research.
Forming a Global Situation Vector  Each agent communicates its local situation vector to all other agents. An agent composes all the local situation vectors, its own and those it received from others, to form a global situation vector. We can have a number of composition functions, but the one we used in the experiments reported here is simple: a component-wise average of the local situation vectors. Thus the global situation vector has six components, where each component is the average of all the corresponding local situation vector components.
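The component-wise average used here is a one-liner; this function can serve as the `compose` argument in the training sketch above.

```python
def compose_global_situation(local_vectors):
    """Component-wise average of the agents' six-component situation vectors."""
    n = len(local_vectors)
    return [sum(v[i] for v in local_vectors) / n for i in range(len(local_vectors[0]))]
```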
Choosing a Coordination Strategy
COLLAGE chooses a coordination strategy based on how the set of available strategies performed in similar past cases. We adopt the notation from Gilboa and Schmeidler (Gilboa & Schmeidler 1995). Each case $c$ is a triplet

$$\langle p, a, r \rangle \in C \subseteq P \times A \times R$$

where $p \in P$ and $P$ is the set of situations representing abstract characterizations of coordination problems, $a \in A$ and $A$ is the set of coordination choices available, and $r \in R$ and $R$ is the set of results from running the coordination strategies.

Decisions about coordination strategy choice are made based on similar past cases. Outcomes decide the desirability of the strategies. We define a similarity function and a utility function as follows:

$$s : P^2 \rightarrow [0,1] \qquad u : R \rightarrow \mathbb{R}$$

In the experiments presented later, we use the Euclidean metric for similarity.

The desirability of a coordination strategy is determined by a similarity-weighted sum of the utility it yielded in the similar past cases in a small neighborhood around the present situation vector. We observed that such an averaging process in a neighborhood around the present situation vector was more robust than taking the nearest neighbor. Let $M$ be the set of past cases similar to problem $p_{new} \in P$ (greater than threshold similarity):

$$m = \langle p, a, r \rangle \in M \iff s(p_{new}, p) > s_{threshold}$$

For $a \in A$, let $M_a = \{ m = \langle p, a', r \rangle \in M \mid a' = a \}$. The utility of $a$ is defined as

$$U(p_{new}, a) = \frac{1}{|M_a|} \sum_{\langle q, a, r \rangle \in M_a} s(p_{new}, q)\, u(r)$$
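A minimal sketch of this case-based choice over the instance-base built during training. The paper states that the Euclidean metric underlies the similarity, but not how it is normalized into [0, 1], so the 1/(1 + d) mapping, the threshold value, and the `utility` callable over the four-component performance vector are our assumptions.

```python
import math

def similarity(p, q):
    """Similarity derived from the Euclidean metric, mapped into [0, 1]
    (the exact normalization is an assumption of this sketch)."""
    return 1.0 / (1.0 + math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q))))

def choose_strategy(instance_base, p_new, strategies, utility, s_threshold=0.8):
    """Pick the strategy with the highest similarity-weighted average utility
    over past cases whose situation exceeds the similarity threshold."""
    best, best_value = None, float("-inf")
    for a in strategies:
        cases = [(q, r) for (q, a2, r) in instance_base
                 if a2 == a and similarity(p_new, q) > s_threshold]
        if not cases:
            continue
        # U(p_new, a) = (1 / |M_a|) * sum over M_a of s(p_new, q) * u(r)
        value = sum(similarity(p_new, q) * utility(r) for q, r in cases) / len(cases)
        if value > best_value:
            best, best_value = a, value
    return best
```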
Experiments
Experiments in the DDP domain
Our experiments on learning coordination were conducted in the domain of distributed data processing (Nagendra Prasad et al. 1996). Some of the early insights into learning coordination in this domain were presented in (Nagendra Prasad & Lesser 1996), and the details of the modeling of the distributed data processing domain can be found in (Nagendra Prasad et al. 1996). This domain consists of a number of geographically dispersed data processing centers (agents). Each center is responsible for conducting certain types of analysis tasks on streams of satellite data arriving at its site: "routine analysis" that needs to be performed on data coming in at regular intervals during the day, "crisis analysis" that needs to be performed on the incoming data but with a certain probability, and "low priority analysis", the need for which arises at the beginning of the day with a certain probability. Low priority analysis involves performing specialized analysis on specific archival data. Different types of analysis tasks have different priorities. A center should first attend to the "crisis analysis" tasks and then perform "routine tasks" on the data. Time permitting, it can handle the low-priority tasks. The processing centers have limited resources to conduct their analysis on the incoming data, and they have to do this within certain deadlines. Results of processing data at a center may need to be communicated to other centers due to the interrelationships between the tasks at these centers. (Nagendra Prasad et al. 1996) developed a graph-grammar-based stochastic task structure description language and generation tool for modeling task structures arising in a domain such as this. They present the results of empirical explorations of the effects of varying deadlines and crisis task group arrival probability. Based on the experiments, they noted the need for different coordination strategies in different situations to achieve good performance. In this section, we intend to demonstrate the power of COLLAGE in choosing the most appropriate coordination strategy in a given situation. We performed two sets of experiments varying the probability of the centers seeing crisis tasks. In the first set of experiments, the crisis task group arrival probability was 0.25, and in the second set it was 1.0. For both sets of experiments, low priority tasks arrived with a probability of 0.5, and the routine tasks were always seen at the time of new arrivals. A day consisted of a time slice of 140 time units, and hence the deadline for the task structures was fixed at 140 time units. In the experiments described here, utility is the primary performance measure. Each message an agent communicates to another agent penalizes the overall utility by a factor called comm_cost. However, achieving a better non-local view can potentially lead to higher quality that adds to the system-wide utility. Thus, utility = quality − total_communication × comm_cost (see the sketch below). The system consisted of three agents (or data processing centers), and they had to learn to choose the best coordination strategy from among balanced, rough, and data-flow.
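For concreteness, the utility measure reduces to a single expression; the argument names simply mirror the formula above.

```python
def system_utility(quality, total_communication, comm_cost):
    """System-wide utility: accrued quality minus the communication penalty."""
    return quality - total_communication * comm_cost
```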
Experiments  For the experiments where the crisis task group arrival probability was 0.25, COLLAGE was trained on 4500 instances in Mode 1 and on 10000 instances in Mode 2. For the case where the crisis task group arrival probability was 1.0, it was trained on 2500 instances in Mode 1 and on 12500 instances in Mode 2. Figure 1 shows the average quality over 100 runs for the different coordination strategies at various communication costs. The curves for both the Mode 1 and Mode 2 learning algorithms lie above those for all the other coordination strategies for the most part in both experiments. We performed a Wilcoxon matched-pair signed-ranks analysis to test for significant differences (at significance level 0.05) between the average performances of the strategies across communication costs (as opposed to pairwise tests at each communication cost). This test revealed significant differences between the learning algorithms (both Mode 1 and Mode 2) and the other three coordination strategies, indicating that we can assert with a high degree of confidence that the performance of the learning algorithms across various communication costs is better than statically using any one of the family of coordination strategies.¹ As the communication costs go up, the mean performance of the coordination strategies goes down. For a crisis task group arrival probability of 0.25, the balanced coordination strategy performs better than the learning algorithms at very high communication costs, because the learning algorithms use additional units of communication to form the global situation vectors. At very high communication costs, even the three additional meta-level messages for local situation communication (one for each agent) led to large penalties on the utility of the system. At a communication cost of 1.0, the Mode 1 learner and Mode 2 learner average 77.72 and 79.98 respectively, whereas choosing balanced always produces an average performance of 80.48. Similar behavior was exhibited at very high communication costs when the crisis task group arrival probability was 1.0. Figure 2 gives an example of the situation-specific choice of coordination strategies for the Mode 1 learner in 100 test runs when the crisis task group probability was 1.0. The Z-axis shows the number of times a particular coordination strategy was chosen in the 100 runs at a particular communication cost, the X-axis shows the communication cost, and the Y-axis shows the coordination strategy.

¹Testing across communication costs is justified because, in reality, the cost may vary during the course of the day.
[Figure 1: Average Quality versus Communication Cost. a) Crisis TG 0.25. b) Crisis TG 1.0. Strategies compared: Collage M1, Collage M2, dataflow, balanced, rough.]

[Figure 2: Strategies chosen by COLLAGE M1 for Crisis TG Probability 1.0 at various communication costs.]
Conclusion

Many researchers have shown that no single coordination mechanism is good for all situations. However, there is little in the literature that deals with how to dynamically choose a coordination strategy based on the problem solving situation. In this paper, we presented the COLLAGE system, which uses meta-level information in the form of an abstract characterization of the coordination problem instance to learn to choose the appropriate coordination strategy from among a class of strategies.
However, the scope of situation-specific learning goes beyond learning coordination. Effective cooperation in a multi-agent system requires that the global problem-solving state influence the local learning process of an agent. Agents need to associate appropriate views of the global problem solving state with the learned control knowledge in order to develop more socially situated control knowledge.
This paper presents some of the empirical evidence that we developed to validate these ideas. Further experimental evidence for the effectiveness of learning situation-specific coordination can be found in (Nagendra Prasad & Lesser 1997). (Nagendra Prasad, Lesser, & Lander 1996b) and (Nagendra Prasad, Lesser, & Lander 1996a) deal with situation-specific learning of other aspects of cooperative control.
References

Aha, D. W.; Kibler, D.; and Albert, M. K. 1991. Instance-based learning algorithms. Machine Learning 6:37-66.

Crites, R. H., and Barto, A. G. 1996. Improving elevator performance using reinforcement learning. In Touretzky, D.; Mozer, M.; and Hasselmo, M., eds., Advances in Neural Information Processing Systems 8. Cambridge, MA: MIT Press.

Decker, K. S., and Lesser, V. R. 1993. Quantitative modeling of complex computational task environments. In Proceedings of the Eleventh National Conference on Artificial Intelligence, 217-224.

Decker, K. S., and Lesser, V. R. 1995. Designing a family of coordination algorithms. In Proceedings of the First International Conference on Multi-Agent Systems, 73-80. San Francisco, CA: AAAI Press.

Gilboa, I., and Schmeidler, D. 1995. Case-based decision theory. The Quarterly Journal of Economics 605-639.

Lesser, V. R. 1991. A retrospective view of FA/C distributed problem solving. IEEE Transactions on Systems, Man, and Cybernetics 21(6):1347-1362.

Mataric, M. J. 1993. Kin recognition, similarity, and group behavior. In Proceedings of the Fifteenth Annual Cognitive Science Society Conference, 705-710.

Nagendra Prasad, M. V., and Lesser, V. R. 1996. Off-line learning of coordination in functionally structured agents for distributed data processing. In ICMAS-96 Workshop on Learning, Interaction and Organizations in Multiagent Environments.

Nagendra Prasad, M. V., and Lesser, V. R. 1997. Learning situation-specific coordination in cooperative multi-agent systems. Computer Science Technical Report 97-12, University of Massachusetts.

Nagendra Prasad, M. V.; Decker, K. S.; Garvey, A.; and Lesser, V. R. 1996. Exploring organizational designs with TAEMS: A case study of distributed data processing. In Proceedings of the Second International Conference on Multi-Agent Systems. Kyoto, Japan: AAAI Press.

Nagendra Prasad, M. V.; Lesser, V. R.; and Lander, S. E. 1996a. Cooperative learning over composite search spaces: Experiences with a multi-agent design system. In Thirteenth National Conference on Artificial Intelligence (AAAI-96). Portland, Oregon: AAAI Press.

Nagendra Prasad, M. V.; Lesser, V. R.; and Lander, S. E. 1996b. Learning organizational roles in a heterogeneous multi-agent system. In Proceedings of the Second International Conference on Multi-Agent Systems. Kyoto, Japan: AAAI Press.

Sandholm, T., and Crites, R. 1995. Multi-agent reinforcement learning in the repeated prisoner's dilemma. To appear in Biosystems.

Seeley, T. D. 1989. The honey bee colony as a superorganism. American Scientist 77:546-553.

Sen, S.; Sekaran, M.; and Hale, J. 1994. Learning to coordinate without sharing information. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 426-431. Seattle, WA: AAAI.

Tan, M. 1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, 330-337.

Weiss, G. 1994. Some studies in distributed machine learning and organizational design. Technical Report FKI-189-94, Institut für Informatik, TU München.