From: AAAI Technical Report SS-96-01. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.

Learning Organizational Roles in a Heterogeneous Multi-agent System*

M V Nagendra Prasad (1), Victor R. Lesser (1,2), and Susan E. Lander (2)

(1) Department of Computer Science, University of Massachusetts, Amherst, MA 01003. {nagendra,lesser}@cs.umass.edu
(2) Blackboard Technology Group, Inc., 401 Main Street, Amherst, MA 01002. lander@bbtech.com

Abstract

In this paper, we present a heterogeneous multi-agent system called L-TEAM that highlights the potential for learning as an important tool for acquiring organizational knowledge in heterogeneous multi-agent systems. No single organization is good for all situations, and it is almost impossible to anticipate all the situations and provide the agents with capabilities to react to each of them appropriately. L-TEAM is our attempt to study ways to alleviate this enormous knowledge-acquisition bottleneck.

Introduction

As Distributed Artificial Intelligence matures as a field, the complexity of the applications being tackled is beginning to challenge some of the common assumptions, such as homogeneity of agents and tightly integrated coordination. Reusability of legacy systems and heterogeneity of agent representations demand a reexamination of many of the key assumptions about the amount of sharable information and the types of protocols for coordination. Lander and Lesser (Lander & Lesser 1994) developed the TEAM framework to examine cooperative search among a set of heterogeneous reusable agents. TEAM is an open system assembled through minimally customized integration of a dynamically selected subset from a catalogue of existing agents. Reusable agents may be involved in systems and situations that may not have been explicitly anticipated at the time of their design. Each agent works on a specific part of the overall problem. The agents work towards achieving a set of local solutions to different parts of the problem that are mutually consistent and that satisfy, as far as possible, the global considerations related to the overall problem.

TEAM was introduced in the context of parametric design in multi-agent systems. Each of the agents has its own local state information, a local database with static and dynamic constraints on its design components, and a local agenda of potential actions. The search is performed over a space of partial designs. It is initiated by placing a problem specification in a centralized shared memory that also acts as a repository for the emerging composite solutions (i.e., partial solutions) and is visible to all the agents. Any design component produced by an agent is placed in the centralized repository. Some of the agents initiate base proposals based on the problem specifications and their own internal constraints and local state. Other agents in turn extend and critique these proposals to form complete designs. An agent may detect conflicts during this process and communicate feedback to the relevant agents, augmenting their local view of the composite search space with meta-level information about its local search space to minimize the likelihood of generating conflicting solutions (Lander & Lesser 1994). For a composite solution in a given state, an agent can play one of a set of organizational roles (in TEAM, these roles are solution-initiator, solution-extender, or solution-critic). An organizational role represents a set of operators an agent can apply to a composite solution. An agent can be working on several composite solutions concurrently.
Thus, at a given time, an agent is faced with the problem of: 1) choosing which solution to work on; and 2) choosing a role from the set of allowed roles that it can play for that solution. This decision is complicated by the fact that an agent has to make this choice within its local view of the problem-solving situation. The objective of this paper is to investigate the utility of machine learning techniques as an aid to the decision processes of agents that may be involved in problem-solving situations not necessarily known at the time of their design. The results in our previous paper (NagendraPrasad, Lesser, & Lander 1995b) demonstrated the promise of learning techniques for such a task in L-TEAM, which is a learning version of TEAM. In this paper, we present further experimental details of our results and some new results obtained since.

* This material is based upon work supported by the National Science Foundation under Grant Nos. IRI-9523419 and EEC-9209623. The content of this paper does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

Organizational Roles in Distributed Search

Organizational knowledge can be described as a specification of the way the overall search should be organized, in terms of which agents play what roles in the search process and communicate what information, when, and to whom. It provides the agents with a way to handle cooperative tasks effectively and reliably. Each agent in L-TEAM plays some organizational role in distributed search. A role is a task or a set of tasks to be performed in the context of a single solution. A role may encompass one or more operators; e.g., the role solution-initiator includes the operators initiate-solution and relax-solution-requirement. A pattern of activation of roles in an agent set is a role assignment. All agents need not play all organizational roles, which in turn implies that agents can differ in the kinds of roles they are allotted.

The organizational roles played by the agents are important for the efficiency of a search process and the quality of the final solutions produced. To illustrate this issue, we will use a simple, generic two-agent example and the agents' search and solution spaces as shown in Figure 1. The shaded portions in the local spaces of agents A and B are the local solution spaces, and their intersection represents the global solution space. It is clear that if agent A initiates and agent B extends, there is a greater chance of finding a mutually acceptable solution. Agent A trying to extend a solution initiated by agent B is likely to lead to failure more often than not, due to the small intersection space compared with the large local solution space of agent B. Note, however, that the solution distribution in the space is not known a priori to the designer, so good organizational roles cannot be hand-coded at design time.

[Figure 1: Local and Composite Search Spaces -- the solution spaces of agents A and B over parameters X and Y, and the composite solution space of A and B.]

During each cycle of operator application in TEAM, each agent in turn has to decide on the role it can play next, based on the available partial designs. An agent can choose to be an initiator of a new design, an extender of an already existing partial design, or a critic of an existing design. The agent needs to decide on the best role to assume next and accordingly construct a design component. Due to the complexity and uncertainty associated with most real-world organizations, there may be no single organizational structure that suits all situations. This makes it imperative that the agents adapt themselves to the current situation, so as to be effective.
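To make the notions of roles, role assignments, and the per-cycle role decision concrete, the following Python sketch (the paper itself gives no code) shows roles represented as sets of operators, a role assignment over an agent set, and the two-part choice an agent faces on every cycle. The class names, the extend-solution and critique-solution operator names, and the naive default policy (extend an existing partial design if possible, otherwise initiate) are illustrative assumptions rather than details of L-TEAM; learning a situation-specific replacement for such a policy is what the rest of the paper addresses.

    from dataclasses import dataclass

    # A role names the set of operators an agent may apply to a composite solution.
    @dataclass(frozen=True)
    class Role:
        name: str
        operators: frozenset

    SOLUTION_INITIATOR = Role("solution-initiator",
                              frozenset({"initiate-solution", "relax-solution-requirement"}))
    SOLUTION_EXTENDER = Role("solution-extender", frozenset({"extend-solution"}))
    SOLUTION_CRITIC = Role("solution-critic", frozenset({"critique-solution"}))

    @dataclass
    class Agent:
        name: str
        allowed_roles: tuple  # not every agent is allotted every role

        def choose_action(self, partial_designs):
            """The two-part decision faced on each cycle: which composite solution
            to work on (None means start a new one) and which allowed role to play.
            This naive placeholder policy extends when it can and initiates otherwise."""
            if partial_designs and SOLUTION_EXTENDER in self.allowed_roles:
                return partial_designs[0], SOLUTION_EXTENDER
            if SOLUTION_INITIATOR in self.allowed_roles:
                return None, SOLUTION_INITIATOR
            return None, None  # nothing this agent can usefully do right now

    # A role assignment is a pattern of activation of roles over the agent set.
    role_assignment = {"agent-A": SOLUTION_INITIATOR, "agent-B": SOLUTION_EXTENDER}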
This paper investigates the effectiveness of learning situation-specific organizational role assignments. Roles for the agents are chosen based on the present problem-solving situation (we discuss situations in more detail in the following section).

Learning Organizational Roles

The formal basis for learning search strategies adopted in this paper is derived from the UPC formalism for search control (see (Whitehair & Lesser 1993)), which relies on the calculation and use of the Utility, Probability, and Cost (UPC) values associated with each (state, op, final state) tuple. The Utility component represents the present state's estimate of the final state's expected value or utility if we apply operator op in the present state. Probability represents the expected uncertainty associated with the ability to reach the final state from the present state, given that we apply operator op. Cost represents the expected computational cost of reaching the final state. Additionally, in the complex search spaces for which the UPC formalism was developed, application of an operator to a state does more than expand the state; it may also result in an increase in the problem solver's understanding of the interrelationships among states. In these situations, an operator that looks like a poor choice from the perspective of a local control policy may actually be a good choice from a more global perspective, due to the increased information it makes available to the problem solver. This property of an operator is referred to as its potential, and it needs to be taken into account while rating the operator. An evaluation function defines the objective strategy of the problem-solving system based on the UPC components of an operator and its potential.

We modify the UPC formalism for the purpose of learning organizational roles for agents. All the possible states of the search are classified into a pre-enumerated finite set of situation classes. These classes of situations represent abstractions of the state of a search. Thus, for each agent, there is a UPC vector per situation per operator leading to a final state. A situation in L-TEAM is represented by a feature vector whose values determine the class of a state of the search. In L-TEAM, an agent responsible for decision making at a node retrieves the UPC values, based on the situation vector, for all the roles that are applicable in the current state. Depending on the objective function to be maximized, these UPC vectors are used to choose a role to be performed next. We use the supervised-learning approach to prediction learning (see (Sutton 1988)) to learn estimates of the UPC vectors for each of the states.[1]

[1] For details, the reader is referred to (NagendraPrasad, Lesser, & Lander 1995a).

Obtaining measures of potential is a more involved process, requiring a certain understanding of the system. For example, in L-TEAM the agents can apply operators that lead to infeasible solutions due to conflicts in their requirements. However, this process of running into a conflict leads to certain important consequences, such as the exchange of violated constraints. The constraints an agent receives from other agents aid that agent's subsequent search in that episode by letting it relate its local solution requirements to more global requirements. Hence, the operators leading to conflicts followed by information exchange are rewarded by potential. The learning algorithm for the potential of an operator again uses the supervised-learning approach to prediction learning and is similar to that for utility. We used the following objective evaluation function for rating an organizational role:

    f(U, P, C, potential) = U * P + potential

that is, a role is rated by the product of its utility and probability estimates plus its potential; the cost component does not enter this particular function.
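As a rough illustration of the modified formalism, the sketch below keeps, for one agent, a table of (situation, role) entries holding [U, P, C, potential] estimates, selects the applicable role that maximizes f = U * P + potential, and nudges the estimates toward observed episode outcomes with a fixed learning rate. The class name, the neutral initial values, the learning rate, and the exact update rule are assumptions made for the sketch; the update actually used in L-TEAM is described in (NagendraPrasad, Lesser, & Lander 1995a).

    import random
    from collections import defaultdict

    class UPCLearner:
        """Per-agent table of (situation, role) -> [U, P, C, potential] estimates.
        The fixed-rate update below is a simple prediction-learning rule in the
        spirit of Sutton (1988), not the exact rule used in L-TEAM."""

        def __init__(self, alpha=0.1):
            self.alpha = alpha  # learning rate (assumed value)
            # neutral initial estimates: U = 0, P = 0.5, C = 0, potential = 0
            self.table = defaultdict(lambda: [0.0, 0.5, 0.0, 0.0])

        def rate(self, situation, role):
            u, p, _c, potential = self.table[(situation, role)]
            return u * p + potential  # the objective function f(U, P, C, potential)

        def select_role(self, situation, applicable_roles):
            # pick the applicable role with the highest rating, breaking ties at random
            best = max(self.rate(situation, r) for r in applicable_roles)
            return random.choice([r for r in applicable_roles
                                  if self.rate(situation, r) == best])

        def update(self, situation, role, final_utility, reached_final, observed_potential):
            """After an episode, move each estimate one step toward the observed outcome.
            Cost is carried in the vector but left untouched, since the objective
            function above does not use it."""
            entry = self.table[(situation, role)]
            targets = (final_utility, 1.0 if reached_final else 0.0, None, observed_potential)
            for i, target in enumerate(targets):
                if target is not None:
                    entry[i] += self.alpha * (target - entry[i])

    # Example: choose a role for situation (1, 0, 0), then learn from the episode outcome.
    learner = UPCLearner()
    role = learner.select_role((1, 0, 0), ("solution-initiator", "solution-extender"))
    learner.update((1, 0, 0), role, final_utility=0.8, reached_final=True, observed_potential=0.2)

In L-TEAM, the utility target would come from the value of the final design a decision contributed to, the probability target from whether an acceptable final design was reached, and the potential reward from conflicts that triggered a useful constraint exchange, as described above.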
Experiments

To demonstrate the effectiveness of the mechanisms in L-TEAM and compare them to those in TEAM, we used the same domain as in (Lander & Lesser 1994) -- parametric design of steam condensers. The prototype multi-agent system for this domain, built on top of the TEAM framework, consists of seven agents: pump-agent, heat-exchanger-agent, motor-agent, vbelt-agent, shaft-agent, platform-agent, and frequency-critic. The problem specification consists of three parameters -- required capacity, platform side length, and maximum platform deflection. Each agent can play a single organizational role in any single design. In this paper, we confine ourselves to learning the appropriate application of two roles in the agents: solution-initiator and solution-extender. Four of the seven agents -- pump-agent, motor-agent, heat-exchanger-agent, and vbelt-agent -- learn whether to be a solution-initiator or a solution-extender in each situation. The other three agents have fixed organizational roles: the platform and shaft agents always extend, and the frequency-critic always critiques.

In the experiments reported below, the situation vector for each agent had three components. The first component represented changes in the global views of any of the agents in the system: if any of the agents received new external constraints from other agents in the past m time units (m is set to 4 in the experiments), this component is '1' for all agents; otherwise it is '0'. If any of the agents has relaxed its local quality requirements in the past n time units (n = 2), then the second component is '1' for all agents; otherwise it is '0'. Typically, a problem-solving episode in L-TEAM starts with an initial phase of exchange of all the communicable information involved in conflicts and then enters a phase where the search is more informed and all the information that leads to conflicts and can be communicated has already been exchanged. This phase shift is represented in the situation vector through the third component: during the initial phase of conflict detection and exchange of information it is '0', while in the second phase it is '1'.
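The following is a small sketch of how the three-component situation vector just described might be computed. The per-agent event logs and the function name are hypothetical; the thresholds m = 4 and n = 2 and the meaning of each component follow the text above.

    def situation_vector(agent_events, now, in_informed_phase, m=4, n=2):
        """Compute the three-component situation vector described above."""
        # Component 1: '1' for all agents if any agent received new external
        # constraints from other agents within the last m time units.
        got_constraints = any(now - t <= m
                              for events in agent_events
                              for t in events["constraint_times"])
        # Component 2: '1' for all agents if any agent relaxed its local quality
        # requirements within the last n time units.
        relaxed = any(now - t <= n
                      for events in agent_events
                      for t in events["relaxation_times"])
        # Component 3: '0' during the initial conflict-detection/information-exchange
        # phase of an episode, '1' once that information has been exchanged.
        return (int(got_constraints), int(relaxed), int(in_informed_phase))

    # Example: one agent received a constraint 3 time units ago; no recent relaxations.
    logs = [{"constraint_times": [7], "relaxation_times": []},
            {"constraint_times": [], "relaxation_times": [2]}]
    print(situation_vector(logs, now=10, in_informed_phase=False))  # (1, 0, 0)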
We trained L-TEAM on 150 randomly generated design requirements and then tested L-TEAM and TEAM pairwise on 100 randomly generated design requirements different from those used for training. TEAM was set up so that the heat-exchanger and pump agents could either initiate a design or extend a design, whereas the vbelt, shaft, and platform agents could only extend a design. In TEAM, an agent initiates a design only if there are no partial designs on the blackboard that it can extend. The performance parameter we analyzed was the cost of the best design produced (lowest cost). We ran L-TEAM and TEAM over two ranges of the input parameters. Range 1 consisted of required-capacity 50-1500, platform-side-length 25-225, and platform-deflection 0.02-0.1. Range 2 consisted of required-capacity 1750-2000, platform-side-length 175-225, and platform-deflection 0.06-0.1. Lower values of required-capacity in Range 1 represented easier problems; we chose the two ranges to represent "easy" and "tough" problems.

Table 2 shows the organization learned by non-situation-specific L-TEAM, which was the same in both ranges. One can see from Table 3 and Table 4 that the two organizations learned by situation-specific L-TEAM for Range 1 and Range 2 are different. Table 1 shows the average design costs for the three systems -- situation-specific L-TEAM (ss-L-TEAM), non-situation-specific L-TEAM (ns-L-TEAM), and TEAM -- over the two ranges. The Wilcoxon matched-pair signed-ranks test revealed significant differences between the costs of the designs produced by all the pairs in the table, except between situation-specific L-TEAM and non-situation-specific L-TEAM in Range 1[2] and between non-situation-specific L-TEAM and TEAM in Range 2.[3]

[2] Easy problems may not gain from sophisticated mechanisms like situation-specificity.
[3] In addition to the cost of a design, we also ran experiments on the number of cycles (representing the amount of search); ss-L-TEAM outperformed both ns-L-TEAM and TEAM on this measure too.

Table 1: Average Cost of a Design

                Range 1    Range 2
    ss-L-TEAM   5587.6     17353.75
    ns-L-TEAM   5616.2     17678.97
    TEAM        5770.6     17704.70

Table 2: Organizational roles for non-situation-specific L-TEAM after learning

    pump agent        initiator
    heatx agent       extender
    motor agent       extender
    vbelt agent       extender
    shaft agent       extender
    platform agent    extender
    frequency critic  critique

[Table 3: Organizational roles learned by situation-specific L-TEAM for Range 1 -- the role (initiator or extender) chosen by each of the four learning agents in each of the eight situation classes.]

These experiments highlight some important aspects of role organization:

- Learning is a promising approach to acquiring organizational knowledge about roles in distributed search. Both situation-specific and non-situation-specific learning outperform hand-coded organizational role assignment in our experiments.

- Situation-specificity is an important part of role organization. Situation-specific L-TEAM produces designs that are lower in cost than those produced by non-situation-specific L-TEAM in both ranges.

- In addition, the role organization produced by learning is different for the two ranges, implying that problems of a different nature may need a different role organization.

Conclusion

L-TEAM highlights the potential for learning as an important tool for acquiring organizational knowledge in heterogeneous multi-agent systems. No single organization is good for all situations, and it is almost impossible to anticipate all the situations and provide the agents with capabilities to react to each of them appropriately. L-TEAM is our attempt to study ways to alleviate this enormous knowledge-acquisition bottleneck.

References

Lander, S. E., and Lesser, V. R. 1994. Sharing meta-information to guide cooperative search among heterogeneous reusable agents. Computer Science Technical Report 94-48, University of Massachusetts. To appear in IEEE Transactions on Knowledge and Data Engineering, 1996.

NagendraPrasad, M. V.; Lesser, V. R.; and Lander, S. E. 1995a. Learning organizational roles in a heterogeneous multi-agent system. Computer Science Technical Report 95-35, University of Massachusetts.
NagendraPrasad, M. V.; Lesser, V. R.; and Lander, S. E. 1995b. Learning experiments in a heterogeneous multi-agent system. In Proceedings of the IJCAI-95 Workshop on Adaptation and Learning in Multi-Agent Systems.

Sutton, R. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3:9-44.

Whitehair, R., and Lesser, V. R. 1993. A framework for the analysis of sophisticated control in interpretation systems. Computer Science Technical Report 93-53, University of Massachusetts.

[Table 4: Organizational roles learned by situation-specific L-TEAM for Range 2 -- the role (initiator or extender) chosen by each of the four learning agents in each of the eight situation classes.]