Group Reputation in Multi-Agent Systems

Ari Halberstadt
Magic Cookie Software
9 Whittemore Road
Newton, MA 02458, USA
ari@magiccookie.com

Lik Mui
MIT Laboratory for Computer Science
200 Technology Square
Cambridge, MA 02139, USA
lmui@lcs.mit.edu

ABSTRACT
Reputation studies have so far concentrated on individual reputation. Individuals, however, exist within social groups. By nature of their affiliations, individuals affect how the groups that they belong to are perceived. Reputations for these groups are constructed from these individual perceptions. In multi-agent research, reputation systems have been constructed to capture individual reputation. This paper proposes a computational model for groups of agents and a means for inferring the reputations of group members. We argue that group reputation can be used as an a priori estimate for individual reputation. Simulations based on evolutionary game theory are used to validate our group reputation model.

Categories and Subject Descriptors
I.2.11 [ARTIFICIAL INTELLIGENCE]: Distributed Artificial Intelligence – Multiagent systems. I.6.8 [SIMULATION AND MODELING]: Types of Simulation – Gaming.

General Terms
Experimentation, Theory.

Keywords
Reputation, cooperation, multiagent-based simulation, social norms, evolutionary game theory.

1. INTRODUCTION
Boyd and Richerson[1] and Tirole[2] have considered groups in the context of game theory. Their group models are simple aggregations of agents with no obvious relationships among groups. We introduce a hierarchical tree model for groups in which agents can belong to multiple groups. Groups possess parameters that affect the behavior of member agents. Each agent has its own perception of the actions of individual agents and of group members. Agents maintain interaction histories of other individual agents and use these histories to ascribe reputations to individual agents and to groups of agents. Agents combine their strategies with these observations to make decisions.
For encounters between agents in different groups, we introduce the concept of a shared group environment. This shared environment is the least common ancestral node for the involved agents in the group hierarchy.

2. Background
The prisoner's dilemma is a well-known framework for the study of cooperation[3]. In our simulations, we apply repeated two-player prisoner's dilemma encounters over multiple generations to randomly selected pairs of agents. In each generation, a new population of agents is created based on the fitness of agents in the preceding generation.

The payoff matrix for players in the prisoner's dilemma game is shown in Figure 1. The payoffs for an agent are known as temptation to defect (T), reward for cooperation (R), punishment for defection (P), and sucker's payoff (S). Constraints on the payoffs are $T > R > P > S$ and $2R > T + S$. Values that satisfy these constraints are $T = 5$, $R = 3$, $P = 1$, and $S = 0$.

          C       D
    C   R, R    S, T
    D   T, S    P, P

Figure 1. Payoff matrix for players in the Prisoner's Dilemma.

Tit-for-tat (TFT) is a strategy that initially cooperates and then does whatever its opponent did in the last round. TFT can be an evolutionarily stable strategy (ESS) against always-defecting agents when the probability of interaction among TFT agents is sufficiently high[3],[4]. We explored the effect of group reputation on TFT population size, where group reputation is measured as the proportion of cooperations over the total number of encounters for a given group as observed by an agent. When TFT is modified to use group reputation, the population of modified TFT agents can increase in the presence of always-defect (D) agents, even with a relatively low probability of repeated interactions.

3. Group Reputation
A reputation measure is a metric that allows the reputations of individual agents or groups to be compared. A higher reputation implies a more favorable opinion of the agent or group and a higher level of trust towards the agent or towards members of the group.
We have studied several types of reputation[5]. Of these types, group reputation is a measure of the aggregate reputation of agents that are members of a group.

Group reputation is used by people in their real-life interactions. Some of these uses are helpful both to the person making use of group knowledge and to the member of the group affected by the person's choices, while other uses may be considered harmful or undesirable. Group reputation can be used by individuals to enhance their prestige and opportunity. For instance, a graduate of a prestigious school tends to have enhanced reputation and respect, even among people who do not know that individual personally, relative to a graduate of a less prestigious school. The school's reputation serves to enhance the graduate's opportunities, and presumably to also enhance the chance that an employer will hire a highly qualified and capable individual. Groups may also acquire poor reputation. Poor reputation can be used to avoid interacting with harmful individuals, such as members of criminal groups.

In our simulations, group reputation is measured by an agent as the proportion of encounters in which members of a group cooperated out of the total number of encounters that were observed by the agent and that involved members of the group. Thus, group reputation varies between agents and depends on the encounters that each agent has observed and the actions of the agents participating in those encounters.

Our simulations use group reputation as an a priori estimate of the reputation of an agent being encountered for the first time. In their initial interaction, an agent employing a strategy that uses group reputation will cooperate with another agent if the other agent's group reputation is above a minimum threshold. Group reputation may be the only measure available when direct interaction is limited.
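The measure just described can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `GroupReputation`, its method names, and the `default` value returned for a group with no observations are all hypothetical choices made for illustration. The sketch treats a reputation exactly equal to the threshold as sufficient for cooperation, which is also an assumption.

```python
class GroupReputation:
    """One agent's private record of cooperation by members of each group.

    Hypothetical sketch: names and the `default` value are assumptions.
    """

    def __init__(self, threshold, default=1.0):
        self.threshold = threshold  # minimum group reputation for initial cooperation
        self.default = default      # assumed reputation of a group never observed
        self.cooperations = {}      # group -> observed cooperations
        self.encounters = {}        # group -> observed encounters

    def observe(self, groups, cooperated):
        """Record one observed action by an agent that belongs to all of `groups`."""
        for group in groups:
            self.encounters[group] = self.encounters.get(group, 0) + 1
            if cooperated:
                self.cooperations[group] = self.cooperations.get(group, 0) + 1

    def reputation(self, group):
        """Proportion of observed encounters in which group members cooperated."""
        total = self.encounters.get(group, 0)
        if total == 0:
            return self.default
        return self.cooperations.get(group, 0) / total

    def cooperate_first(self, group):
        """Decide the first move against an unknown member of `group`."""
        return self.reputation(group) >= self.threshold
```

Because each agent would hold its own `GroupReputation` instance, two agents that observe different encounters can assign different reputations to the same group, as the text notes.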
Since groups contain more than one member, there is a higher probability of interacting with some member of a group than with any particular member of the group. Thus, more information will be available about the aggregate group than about any individual member. Since an individual agent may belong to multiple groups, each observed interaction may affect the reputations of all of the groups to which the agent belongs.

The use of group reputation by an agent can be viewed as an attempt by the agent to discern the norms of behavior followed by a group's members. Group reputation can be a good indicator of the behavior of an unknown agent who is a member of a homogeneous group. In a homogeneous group, all member agents may be expected to follow similar or identical strategies, and thus to behave similarly. Group reputation is a less successful predictor of agent behavior in heterogeneous groups, in which agents follow different strategies, since information gleaned from one member is less applicable to another member.

4. Group Model
Groups can be structured in a hierarchical tree. This is analogous to membership in a hierarchy of groups such as family, school, city, state, and country. While people belong to multiple overlapping groups, modeling such a complex structure would have significantly complicated our simulations. In our model, an individual agent belongs to a primary group and to all groups at nodes higher in the group hierarchy. All agents exist within a single group tree. Each agent carries a group identity, which is observable by other agents. The group structure is accessible to all agents, so that an agent can infer the complete group membership of another agent.

Figure 2 shows a sample group tree. Within this tree, an agent whose primary membership is in group 4 is also a member of groups 2 and 1, while an agent whose primary group membership is group 3 is also a member of group 1.

[Figure 2. Sample group tree: group 1 is the root with children 2 and 3; groups 4 and 5 are children of group 2.]
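As a rough illustration, the sample tree and the membership rule can be written as a parent map in Python. The names `PARENT`, `memberships`, and `shared_group` are hypothetical, and placing group 5 under group 2 is inferred from the paper's statement that members of groups 4 and 5 interact in group 2. The second helper computes the shared group environment, that is, the least common ancestral node introduced in Section 1.

```python
# Parent links for the sample tree of Figure 2 (group 1 is the root).
# The dictionary layout is an assumption made for this sketch.
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}


def memberships(primary):
    """Return the primary group followed by every ancestor up to the root."""
    chain = []
    group = primary
    while group is not None:
        chain.append(group)
        group = PARENT[group]
    return chain


def shared_group(primary_a, primary_b):
    """Return the least common ancestor of two primary groups."""
    ancestors_of_a = set(memberships(primary_a))
    # Walk upward from b; the first group that also contains a is the deepest one.
    for group in memberships(primary_b):
        if group in ancestors_of_a:
            return group
    raise ValueError("groups do not share a tree")
```

Since an agent's primary group is observable and the tree is public, any agent could call `memberships` on another agent's primary group to recover its complete group membership.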
Each group has parameters that affect the interactions of its members and that specify the group's initial composition of agents. Group parameters include the initial number of agents with a given strategy and the probability that agents will interact in the group. Interaction probabilities reflect the idea that members of groups have different propensities to interact with members of other groups.

Agents may interact with other agents within their primary group and with agents in any other group. Since different groups have different parameters, some determination must be made as to which group's parameters govern an interaction. In our model, an interaction between agents occurs in the group situated at the least common ancestral node of the two agents' primary groups within the group tree. The determination of the shared group, and the application of the group's parameters, are features of the system and of the groups in which the agents exist, not of individual agents. For instance, in the preceding group tree, an interaction between members of group 4 occurs in group 4, an interaction between members of groups 4 and 5 occurs in group 2, and an interaction between members of groups 4 and 3 occurs in group 1.

4.1 Agent Strategies
We studied agent strategies in which the decision for an encounter with an agent is based on the last interaction with that agent. Each strategy is characterized by five probabilities for cooperation[6]: an initial probability and one probability for each of the four possible outcomes of the last encounter. We extended these strategies by adding a reputational threshold that determines how an agent will act. Some strategies that can be constructed from the five probabilities for cooperation combined with group reputation measures include:

Cooperate (C): always cooperates.
Defect (D): always defects.
Tit-for-tat (TFT): initially cooperates, and then does what the other agent did in the last round.
Reputation tit-for-tat (RTFT): initially cooperates depending on the reputation of the other agent's primary group, and then does whatever the other agent did in the last round.

When an RTFT agent has not yet interacted with another agent, it uses group reputation as an a priori measure of whether to cooperate with that agent. If the reputation of the target agent's primary group is less than a minimum group reputation, the RTFT agent defects; otherwise it cooperates. If an RTFT agent has no observations of a group, it initially cooperates (it assumes a favorable group reputation).

    Strategy                         I            T   R   P   S
    Cooperate (C)                    1            1   1   1   1
    Defect (D)                       0            0   0   0   0
    Tit-for-tat (TFT)                1            1   1   0   0
    Reputation tit-for-tat (RTFT)    (see text)   1   1   0   0

Figure 3. Cooperation probabilities of different strategies. The column labeled I gives the initial probability for cooperation, while those labeled T, R, P, and S give the probabilities for cooperation given that the outcome (payoff) of the previous encounter was temptation, reward, punishment, or sucker. The initial probability for RTFT depends on group reputation.

5. Strategy Discussion
Excluding reputation measures, the strategies that we examined use a history of only one prior interaction with another agent. For a selective strategy to be successful, it must discriminate between agents that will defect and agents that will cooperate. The cumulative payoff to an agent (the agent's fitness) is

    $F = N_T P_T + N_R P_R + N_P P_P + N_S P_S$

where $N_x$ is the number of $x$ outcomes and $P_x$ is the payoff for each outcome $x$.

For agents following one strategy to win out over agents following another strategy, the expected fitness of the first strategy must be greater than the expected fitness of the second strategy:

    $E[F_a] > E[F_b]$

where $F_x$ is the fitness of strategy $x$.

To avoid overly complex scenarios, in one of our simulations we looked at a population consisting only of RTFT and D agents in two homogeneous groups.
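The cooperation probabilities of Figure 3 and the cumulative payoff $F$ can be made concrete with a short Python sketch. This is an illustrative reading of the five-probability representation, not the authors' code: the names `PAYOFF`, `STRATEGIES`, `rtft`, `outcome`, and `play` are hypothetical, and RTFT's reputation-dependent first move is passed in as a plain boolean rather than computed from observations.

```python
import random

# Payoff values from Section 2: temptation, reward, punishment, sucker.
PAYOFF = {"T": 5, "R": 3, "P": 1, "S": 0}

# Cooperation probabilities from Figure 3: initial (I), then one per previous outcome.
STRATEGIES = {
    "C": {"I": 1, "T": 1, "R": 1, "P": 1, "S": 1},
    "D": {"I": 0, "T": 0, "R": 0, "P": 0, "S": 0},
    "TFT": {"I": 1, "T": 1, "R": 1, "P": 0, "S": 0},
}


def rtft(initially_cooperates):
    """RTFT: same as TFT except that the first move depends on group reputation."""
    return {"I": 1 if initially_cooperates else 0, "T": 1, "R": 1, "P": 0, "S": 0}


def outcome(mine, theirs):
    """Name the payoff an agent receives, given both moves (True = cooperate)."""
    if mine:
        return "R" if theirs else "S"
    return "T" if theirs else "P"


def play(strategy_a, strategy_b, rounds, rng=random):
    """Play repeated encounters and return each agent's cumulative payoff F."""
    last_a = last_b = "I"  # no previous outcome yet, so use the initial probability
    fitness_a = fitness_b = 0
    for _ in range(rounds):
        move_a = rng.random() < strategy_a[last_a]
        move_b = rng.random() < strategy_b[last_b]
        last_a, last_b = outcome(move_a, move_b), outcome(move_b, move_a)
        fitness_a += PAYOFF[last_a]
        fitness_b += PAYOFF[last_b]
    return fitness_a, fitness_b
```

With these definitions, `play(STRATEGIES["TFT"], STRATEGIES["D"], 4)` gives the TFT agent the sucker's payoff once and the punishment payoff thereafter, while `play(rtft(False), STRATEGIES["D"], 4)` avoids the initial sucker's payoff.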
Figure 4 shows all possible outcomes for interactions between two agents in the two homogeneous groups of RTFT and D agents.

    Encounter     Resulting sequence of interactions
    RTFT-RTFT     <1,1>, ...
                  <1,0>, <0,1>, ...
                  <0,1>, <1,0>, ...
                  <0,0>, ...
    RTFT-D        <0,1>, <0,0>, ...
                  <0,0>, ...
    D-D           <0,0>, ...

Figure 4. Possible outcomes in population of RTFT and D agents. A value of 1 indicates cooperation, while a value of 0 indicates defection. An RTFT agent cooperates in its initial encounter if the other agent has sufficient group reputation. The left column shows possible encounters, while the right column shows the resulting sequence of interactions.

Since all of an RTFT agent's outcomes are determined by its initial encounter, an RTFT agent receives a higher total payoff when it initially cooperates with an RTFT agent and defects against a D agent. If an RTFT agent initially defects against an RTFT agent, then both agents get locked into either an alternating defect/cooperate sequence or an always-defect sequence, both of which have lower payoffs than the reward for repeated cooperation. Group reputation can help an RTFT agent make the correct choice in its first encounter with another agent.

6. Simulation Framework
Our simulation consists of a tree of environments containing agents. A simulation is run for some number of generations. In each generation, agents are randomly chosen to interact and observe other agents. When agents interact, their fitness is adjusted based on the outcome of their interaction. At the end of each generation, agents leave offspring in proportion to their accumulated fitness and the parent generation is destroyed. The total population size is fixed, so any increase in the number of one type of agent is balanced by a decrease in the numbers of other types of agents.

A simulation has parameters that specify the number of generations for which the simulation is run, the number of encounters between agents in each generation, and the minimum group reputation for initial cooperation by agents following the RTFT strategy.
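The reproduction step described above, in which agents leave offspring in proportion to accumulated fitness while the total population stays fixed, might look like the following Python sketch. The paper does not state whether offspring counts are sampled or computed deterministically; this sketch assumes random sampling with replacement, and the function name `next_generation`, the tuple representation of an agent, and the uniform fallback when every fitness is zero are all hypothetical.

```python
import random


def next_generation(agents, fitness, rng=random):
    """Sample a new population of the same size, weighting parents by fitness.

    `agents` is a list of (strategy, primary_group) tuples and `fitness` holds
    the accumulated payoff of each agent, in the same order.
    """
    if sum(fitness) == 0:
        # Assumption: with no fitness differences, every parent is equally likely.
        return [rng.choice(agents) for _ in agents]
    # Each offspring is a copy of its parent's (strategy, primary_group) pair.
    return rng.choices(agents, weights=fitness, k=len(agents))
```

Because the new list always has the same length as the old one, a gain for one strategy necessarily comes at the expense of the others, which matches the fixed-population constraint.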
Within a simulation, a specification is provided for the group tree and for each group's associated parameters (initial number of agents and interaction probability).

Agents consist of four main elements: an interaction strategy, a primary group membership, a history of interactions with other agents and groups, and an accumulated fitness. An agent uses its interaction strategy to decide how to act towards another agent. Each agent belongs to a primary group. When agents reproduce, an offspring agent inherits the parent agent's strategy and group membership. Each time an agent observes another agent, it updates the reputation for all of the groups of which the observed agent is a member. Each agent maintains its own history of other agents and groups, so that there is no shared global history or reputation. An agent's fitness measure is the sum of the benefits accumulated by an agent in all of its encounters.

Participants in an interaction are chosen randomly from the total population. After the first participant is selected, a second participant is randomly selected. The shared environment of the two participants is then found, and the agents interact with the interaction probability of the shared environment. If no interaction takes place, then a different second agent is selected and the test is repeated until an interacting pair is found.

7. Simulation Results
Following are typical results from two simulations. Each simulation started with an equal mix of 50 D agents and 50 TFT or RTFT agents. The first simulation provides a baseline for comparison with the other simulations, while the second simulation shows the effect of group reputation.

7.1 Baseline: TFT vs. Defect
In our baseline experiment, agents inhabiting a single group interacted for 10,000 encounters in each generation. As shown in Figure 5, the TFT agents decreased in number and within 14 generations had completely disappeared.
This outcome is typical of this scenario and occurs because there are too few repeated interactions between agents for the TFT strategy to gain an advantage over always defecting.

[Figure 5. TFT vs. Defect. Plot of TFT population size (vertical axis, Size) against generation (horizontal axis, Generation, 0 to 20).]

In our next simulation, we looked at two homogeneous groups of agents, one consisting of RTFT agents and the second consisting of D agents. We specified a fixed minimum group reputation threshold and varied the number of interactions per generation from 10,000 down to 100. Figure 6 shows the number of RTFT agents after each generation for a given number of encounters. The RTFT agents were able to increase their numbers or maintain a population even when the number of interactions per generation dropped to 500. At 200 interactions per generation there were insufficient interactions between RTFT and D agents for group reputation to be useful, and the RTFT population decreased.

[Figure 6. RTFT vs. Defect. Plot of RTFT population size (vertical axis, Size) against generation (horizontal axis, Generation, 0 to 39), with one series each for 10000, 1000, 500, 200, and 100 encounters per generation.]

8. Conclusion and Future Study
Group reputation can help agents choose an appropriate course of action when interacting with an otherwise unknown agent. A model of groups, relating groups within a hierarchy, was developed to support the study of group reputation.

There are many additional aspects of groups that affect agent behavior. Additional parameters affecting groups could be studied, such as observation probabilities and knowledge transfer among agents. A more complete model of groups would include more complex relationships between groups and multiple group memberships for agents. Further work should clarify the relationships among different types of reputations and provide a clearer understanding of how group reputation may be used in different circumstances.

9. References
[1] Boyd, R., Richerson, P. J.: The Evolution of Reciprocity in Sizable Groups. Journal of Theoretical Biology, 132, 1988, pp. 337-356
[2] Tirole, J.: A Theory of Collective Reputations (with Applications to the Persistence of Corruption and to Firm Quality). The Review of Economic Studies, 63(1), 1996, pp. 1-22
[3] Axelrod, R.: The Evolution of Cooperation. Basic Books, New York, 1984
[4] Maynard Smith, J.: Evolution and the Theory of Games. Cambridge University Press, Cambridge, 1982
[5] Mui, L., Halberstadt, A.: Notions of Reputation in Multi-Agent Systems: A Review, 2001
[6] Nowak, M., Sigmund, K.: A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature, 364, 1993, pp. 56-58