Group Reputation in Multi-Agent Systems

Ari Halberstadt
Magic Cookie Software
9 Whittemore Road
Newton, MA 02458, USA
ari@magiccookie.com

Lik Mui
MIT Laboratory for Computer Science
200 Technology Square
Cambridge, MA 02139, USA
lmui@lcs.mit.edu
ABSTRACT
Reputation studies have so far concentrated on individual reputation. Individuals, however, exist within social groups, and by the nature of their affiliations they affect how the groups they belong to are perceived. Reputations for these groups are constructed from these individual perceptions. In multi-agent research, reputation systems have been built to capture individual reputation. This paper proposes a computational model for groups of agents and a means for inferring the reputations of group members. We argue that group reputation can be used as an a priori estimate for individual reputation. Simulations based on evolutionary game theory are used to validate our group reputation model.
Categories and Subject Descriptors
I.2.11 [ARTIFICIAL INTELLIGENCE]: Distributed Artificial
Intelligence – Multiagent systems.
I.6.8 [SIMULATION AND MODELING]: Types of Simulation
– Gaming.
General Terms
Experimentation, Theory.
Keywords
Reputation, cooperation, multiagent-based simulation, social
norms, evolutionary game theory.
1. INTRODUCTION
Boyd and Richerson[1] and Tirole[2] have considered groups
in the context of game theory. Their group models are simple
aggregations of agents with no obvious relationships among
groups. We introduce a hierarchical tree model for groups in
which agents can belong to multiple groups. Groups possess
parameters that affect the behavior of member agents. Each
agent has its own perception of individual agent and group
member actions. Agents maintain interaction histories of other
individual agents and use these histories to ascribe reputations to
individual agents and to groups of agents. Agents use strategies in
combination with observations for decision making. For
encounters between agents in different groups, we introduce the
concept of shared group environment. This shared environment
is the least common ancestral node for the involved agents in the
group hierarchy.
2. Background
The prisoner’s dilemma is a well known framework for the
study of cooperation[3]. In our simulations, we apply repeated
two-player prisoner’s dilemma encounters over multiple
generations to randomly selected pairs of agents. In each
generation, a new population of agents is created based on the
fitness of agents in the preceding generation. The payoff matrix
for players in the prisoners' dilemma game is shown in Figure 1.
The payoffs for an agent are known as temptation to defect (T),
reward for cooperation (R), punishment for defection (P), and
sucker's payoff (S). Constraints on the payoffs are $T > R > P > S$ and $2R > T + S$. Values that satisfy these constraints are $T = 5$, $R = 3$, $P = 1$, and $S = 0$.
                 C          D
        C      R, R       S, T
        D      T, S       P, P

Figure 1. Payoff matrix for players in the Prisoners' Dilemma.
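The payoff structure can be sketched in a few lines of Python. This is an illustrative encoding of Figure 1, not code from the paper; the constant names and the payoff() helper are ours.

```python
# Prisoner's Dilemma payoffs from Figure 1: T = 5, R = 3, P = 1, S = 0.
T, R, P, S = 5, 3, 1, 0

# The constraints stated above must hold for a Prisoner's Dilemma.
assert T > R > P > S
assert 2 * R > T + S

def payoff(my_move: int, other_move: int) -> int:
    """Return the payoff to 'my' side; 1 means cooperate, 0 means defect."""
    if my_move == 1 and other_move == 1:
        return R          # mutual cooperation
    if my_move == 1 and other_move == 0:
        return S          # I cooperated, the other defected
    if my_move == 0 and other_move == 1:
        return T          # I defected, the other cooperated
    return P              # mutual defection
```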
Tit-for-tat (TFT) is a strategy that initially cooperates and
then does whatever its opponent did in the last round. TFT can
be an evolutionarily stable strategy (ESS) against always
defecting agents when the probability of interaction among TFT
agents is sufficiently high[3],[4]. We explored the effect of group
reputation—measured as the proportion of cooperations over the
total number of encounters for a given group as observed by an
agent—on TFT population size. By modifying TFT to use group
reputation, the modified TFT population size can increase in the
presence of D agents, even with a relatively low probability for
repeated interactions.
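As a quick illustration of why plain TFT struggles without sufficiently repeated interaction, the sketch below (our own code, reusing the payoff() helper sketched above) plays TFT against an always-defecting opponent. TFT is exploited only in the first round: over 10 rounds it earns S + 9P = 9 while the defector earns T + 9P = 14.

```python
def play_rounds(n_rounds: int):
    """Repeated Prisoner's Dilemma: TFT versus an always-defect opponent."""
    tft_last_seen = None          # TFT cooperates on the first round
    tft_score, d_score = 0, 0
    for _ in range(n_rounds):
        tft_move = 1 if tft_last_seen in (None, 1) else 0   # copy opponent's last move
        d_move = 0                                           # always defect
        tft_score += payoff(tft_move, d_move)
        d_score += payoff(d_move, tft_move)
        tft_last_seen = d_move
    return tft_score, d_score
```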
3. Group Reputation
A reputation measure is a metric that allows the reputations
of individual agents or groups to be compared. Higher
reputations imply a more favorable opinion towards the agent or
group and imply a higher level of trust towards the agent or
towards members of the group. We have studied several types of
reputation[5]. Of these types, group reputation is a measure of the
aggregate reputation of agents that are members of a group.
Group reputation is used by people in their real-life
interactions. Some of these uses are helpful both to the person
making use of group knowledge and to the member of the group
affected by the person's choices, while other uses may be
considered harmful or undesirable. Group reputation can be used
by individuals to enhance their prestige and opportunity. For
instance, a graduate of a prestigious school tends to have
enhanced reputation and respect, even among people who do not
know that individual personally, relative to a graduate of a less
prestigious school. The school's reputation serves to enhance the
graduate's opportunities, and presumably to also enhance the
chance that an employer will hire a highly qualified and capable
individual. Groups may also acquire poor reputation. Poor
reputation can be used to avoid interacting with harmful
individuals, such as members of criminal groups.
In our simulations, group reputation is measured by an
agent as the proportion of encounters in which members of a
group cooperated out of the total number of encounters that were
observed by the agent and that involved members of the group.
Thus, group reputation varies between agents and depends on
the encounters that each agent has observed and the actions of
the agents participating in those encounters. Our simulations use
group reputation as an a priori estimate of the reputation of an
agent being encountered for the first time. In their initial
interaction, an agent employing a strategy that uses group
reputation will cooperate with another agent if the other agent's
group reputation is above a minimum threshold.
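The measure just described admits a compact sketch. The class below is illustrative (the names GroupReputationObserver, observe, and cooperate_a_priori are ours, and the default threshold is an arbitrary example value); the paper treats the minimum group reputation as a simulation parameter.

```python
class GroupReputationObserver:
    """Tracks one agent's private observations of group members' actions."""

    def __init__(self, min_reputation: float = 0.5):
        self.min_reputation = min_reputation
        self.cooperations = {}   # group id -> cooperations observed from members
        self.encounters = {}     # group id -> encounters observed involving members

    def observe(self, group_id: int, cooperated: bool) -> None:
        self.encounters[group_id] = self.encounters.get(group_id, 0) + 1
        if cooperated:
            self.cooperations[group_id] = self.cooperations.get(group_id, 0) + 1

    def reputation(self, group_id: int) -> float:
        """Proportion of observed encounters in which the group's members cooperated."""
        n = self.encounters.get(group_id, 0)
        if n == 0:
            return 1.0   # no observations: assume a favorable reputation
        return self.cooperations.get(group_id, 0) / n

    def cooperate_a_priori(self, group_id: int) -> bool:
        """First-encounter decision: cooperate iff group reputation clears the threshold."""
        return self.reputation(group_id) >= self.min_reputation
```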
Group reputation may be the only measure available when
direct interaction is limited. Since groups contain more than one
member, there is a higher probability for interactions with some
member of a group than with any individual member of the
group. Thus, there will be more information available about the
aggregate group than about any individual member. Since an
individual agent may belong to multiple groups, each observed
interaction may affect the reputations of all of the groups to
which the agent belongs.
The use of group reputation by an agent can be viewed as
an attempt by the agent to discern the norms of behavior
followed by a group's members. Group reputation can be a good
indicator of the behavior of an unknown agent who is a member
of a homogeneous group. In a homogeneous group, all member
agents may be expected to follow similar or identical strategies,
and thus to behave similarly. Group reputation is less successful
as a predictor of agent behavior in heterogeneous groups in
which agents follow different strategies, since information
gleaned from one member is less applicable to another member.
4. Group Model
Groups can be structured in a hierarchical tree. This is
analogous to membership in a hierarchy of groups such as
family, school, city, state, country. While people belong to
multiple overlapping groups, modeling such a complex structure
would have significantly complicated our simulations. In our
model, an individual agent belongs to a primary group and to all
groups in nodes higher in the group hierarchy. All agents exist
within a single group tree. Each agent carries a group identity,
which is observable by other agents. The group structure is
accessible to all agents, so that an agent can infer the complete
group membership of another agent. Figure 2 shows a sample
group tree. Within this tree, an agent whose primary
membership is in group 4 is also a member of groups 2 and 1,
while an agent whose primary group membership is group 3 is
also a member of group 1.
Figure 2. Sample group tree: group 1 is the root, its children are groups 2 and 3, and groups 4 and 5 are children of group 2.
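The tree of Figure 2 and the membership rule just described can be sketched as a simple parent map; the names below are illustrative, not the authors' code.

```python
# Group tree from Figure 2: child group -> parent group (group 1 is the root).
PARENT = {2: 1, 3: 1, 4: 2, 5: 2}

def memberships(primary_group: int) -> list:
    """Return the primary group followed by all ancestor groups up to the root."""
    groups = [primary_group]
    while groups[-1] in PARENT:
        groups.append(PARENT[groups[-1]])
    return groups

# An agent with primary group 4 also belongs to groups 2 and 1;
# an agent with primary group 3 also belongs to group 1.
assert memberships(4) == [4, 2, 1]
assert memberships(3) == [3, 1]
```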
Associated with a group are parameters that affect the
interactions of its members and which specify the group's initial
composition of agents. Group parameters include the initial
number of agents with a given strategy and the probability that
agents will interact in a group. Interaction probabilities reflect
the idea that members of groups have different propensities to
interact with members of other groups.
Agents may interact with other agents within their primary
group and with agents in any other group. Since different groups
have different parameters, some determination must be made as
to which group's parameters affect an interaction. In our model,
an interaction between agents occurs in the group situated at the
least common ancestral node of each agent's primary group
membership within the group tree. The determination of the
shared group, and the application of the group's parameters, are
features of the system and of the groups in which the agents
exist, not of individual agents. For instance, in the preceding
group tree, an interaction between members of group 4 occurs in
group 4, an interaction between members of groups 4 and 5
occurs in group 2, and an interaction between members of
groups 4 and 3 occurs in group 1.
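A sketch of this rule, reusing the PARENT map and memberships() helper from the previous sketch (both our own illustrative names):

```python
def shared_group(primary_a: int, primary_b: int) -> int:
    """Least common ancestral node of two agents' primary groups."""
    ancestors_a = memberships(primary_a)
    for group in memberships(primary_b):   # walk b's groups from leaf to root
        if group in ancestors_a:
            return group
    raise ValueError("agents do not share a group tree")

# Examples from the text above.
assert shared_group(4, 4) == 4   # members of group 4 interact in group 4
assert shared_group(4, 5) == 2   # groups 4 and 5 share group 2
assert shared_group(4, 3) == 1   # groups 4 and 3 share group 1
```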
4.1 Agent Strategies
We studied agent strategies in which the decision for an
encounter with an agent is based on the last interaction with that
agent. Each strategy is characterized by five probabilities for
cooperation[6]: an initial probability and four probabilities for
each of the possible outcomes of the last encounter. We
extended these strategies by adding a reputational threshold that
determines how an agent will act. Some strategies that can be
constructed from the five probabilities for cooperation combined
with group reputation measures include:
Cooperate (C): always cooperates.
Defect (D): always defects.
Tit-for-tat (TFT): initially cooperates, and then does what the
other agent did in the last round.
Reputation tit-for-tat (RTFT): initially cooperates depending on
the reputation of the other agent's primary group, and then does
whatever the other agent did in the last round.
When an RTFT agent has not interacted with another agent
it uses group reputation as an a priori measure of whether to
cooperate with the other agent. If the reputation of the target
agent's primary group is less than a minimum group reputation
then the RTFT agent defects, otherwise it cooperates. If an
RTFT agent has no observations of a group then it initially
cooperates (it assumes a favorable group reputation).
Strategy                          I            T    R    P    S
Cooperate (C)                     1            1    1    1    1
Defect (D)                        0            0    0    0    0
Tit-for-tat (TFT)                 1            1    1    0    0
Reputation tit-for-tat (RTFT)     (see text)   1    1    0    0
Figure 3. Cooperation probabilities of different strategies.
The column labeled I gives the initial probability for
cooperation, while those labeled T, R, P, and S give the
probabilities for cooperation given that the outcome (payoff)
of the previous encounter was temptation, reward,
punishment, or sucker. The initial probability for RTFT
depends on group reputation.
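The table translates directly into a small decision routine. The sketch below is illustrative (the function and dictionary names are ours); it combines the five cooperation probabilities of Figure 3 with the RTFT first-encounter rule described in the text.

```python
import random

# (initial, after T, after R, after P, after S) probabilities of cooperation.
STRATEGIES = {
    "C":    (1.0, 1.0, 1.0, 1.0, 1.0),
    "D":    (0.0, 0.0, 0.0, 0.0, 0.0),
    "TFT":  (1.0, 1.0, 1.0, 0.0, 0.0),
    "RTFT": (None, 1.0, 1.0, 0.0, 0.0),   # initial move decided by group reputation
}

def choose_move(strategy, last_outcome, group_reputation, min_reputation=0.5):
    """Return 1 to cooperate or 0 to defect.

    last_outcome is None on a first encounter, otherwise one of "T", "R", "P", "S"
    (the payoff this agent received in its previous encounter with the other agent).
    group_reputation is the observer's estimate for the other agent's primary
    group, or None if that group has never been observed.
    """
    initial, p_t, p_r, p_p, p_s = STRATEGIES[strategy]
    if last_outcome is None:
        if strategy == "RTFT":
            if group_reputation is None:
                return 1                     # no observations: assume favorable
            return 1 if group_reputation >= min_reputation else 0
        return 1 if random.random() < initial else 0
    p = {"T": p_t, "R": p_r, "P": p_p, "S": p_s}[last_outcome]
    return 1 if random.random() < p else 0
```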
5. Strategy Discussion
Excluding reputation measures, the strategies that we
examined have a history of one prior interaction with another
agent. For a selective strategy to be successful, it must
discriminate between agents that will defect and agents that will
cooperate. The cumulative payoff to an agent (the agent's
fitness) is
$$F = N_T P_T + N_R P_R + N_P P_P + N_S P_S$$
where $N_x$ is the number of outcomes of type $x$ and $P_x$ is the payoff for outcome $x$.
For agents following one strategy to win out over agents
following another strategy, the expected fitness of the first
strategy must be greater than the expected fitness of the second
strategy.
$$E[F_a] > E[F_b]$$
where $F_x$ is the fitness of an agent following strategy $x$.
To avoid overly complex scenarios, in one of our simulations we
looked at a population consisting only of RTFT and D agents in
two homogeneous groups. Figure 4 shows all possible outcomes
for interactions between two agents in these groups.
Encounter      Resulting sequences of interactions
RTFT-RTFT      <1,1>, ...   <1,0>, <0,1>, ...   <0,1>, <1,0>, ...   <0,0>, ...
RTFT-D         <0,1>, <0,0>, ...   <0,0>, ...
D-D            <0,0>, ...
Figure 4. Possible outcomes in population of RTFT and D
agents. A value of 1 indicates cooperation, while a value of 0
indicates defection. An RTFT agent cooperates in its initial
encounter if the other agent has sufficient group reputation.
The left column shows possible encounters, while the right
column shows the resulting sequence of interactions.
Since all of an RTFT agent's outcomes are determined by
its initial encounter, an RTFT agent receives a higher total
payoff when it initially cooperates with an RTFT agent and
defects against a D agent. If an RTFT agent initially defects against another RTFT agent, then both agents become locked into either an alternating defect/cooperate sequence or an always-defect sequence, both of which yield lower payoffs than the reward for
repeated cooperation. Group reputation can help an RTFT agent
make the correct choice in its first encounter with another agent.
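To make the comparison concrete, consider $2k$ rounds between two RTFT agents under the payoff values given earlier ($T = 5$, $R = 3$, $P = 1$, $S = 0$); the arithmetic below is a worked check rather than a result from the paper:

$$\text{mutual cooperation: } 2kR = 6k, \qquad \text{alternating defect/cooperate: } k(T + S) = 5k, \qquad \text{mutual defection: } 2kP = 2k.$$

The constraint $2R > T + S$ guarantees that mutual cooperation beats alternation; with these values, $6k > 5k > 2k$, so an initial defection against another RTFT agent lowers both agents' cumulative payoffs.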
6. Simulation Framework
Our simulation consists of a tree of environments
containing agents. A simulation is run for some number of
generations. In each generation, agents are randomly chosen to
interact and observe other agents. When agents interact, their
fitness is adjusted based on the outcome of their interaction. At
the end of each generation, agents leave offspring in proportion
to their accumulated fitness and the parent generation is
destroyed. The total population size is fixed, so any increase in
the number of one type of agent is balanced by a decrease in the
numbers of other types of agents. A simulation has parameters
that specify the number of generations for which the simulation
is run, the number of encounters between agents in each
generation, and the minimum group reputation for initial
cooperation by agents following the RTFT strategy. Within a
simulation, a specification is provided for the group tree and for
each group's associated parameters (initial number of agents and
interaction probability).
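For illustration, such a specification might be written as a small configuration structure. Everything below is an assumed example (the field names, the group layout, and the probability and threshold values are ours); only the kinds of parameters come from the text.

```python
# Hypothetical simulation specification; values are illustrative only.
SIMULATION = {
    "generations": 20,
    "encounters_per_generation": 10_000,
    "min_group_reputation": 0.5,        # RTFT threshold for initial cooperation
    "groups": {
        # group id: parent in the tree, interaction probability,
        #           and initial number of agents per strategy
        1: {"parent": None, "interaction_prob": 0.1, "initial_agents": {}},
        2: {"parent": 1,    "interaction_prob": 0.5, "initial_agents": {"RTFT": 50}},
        3: {"parent": 1,    "interaction_prob": 0.5, "initial_agents": {"D": 50}},
    },
}
```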
Agents consist of four main elements: an interaction
strategy, a primary group membership, a history of interactions
with other agents and groups, and an accumulated fitness. An
agent uses its interaction strategy to decide how to act towards
another agent. Each agent belongs to a primary group. When
agents reproduce, an offspring agent inherits the parent agent's
strategy and group membership. Each time an agent observes
another agent, it updates the reputation for all of the groups of
which the observed agent is a member. Each agent maintains its
own history of other agents and groups, so that there is no
shared global history or reputation. An agent's fitness measure is
the sum of the benefits accumulated by an agent in all of its
encounters.
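These four elements map naturally onto a small data structure; the sketch below is illustrative and the field names are ours.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    strategy: str                 # e.g. "C", "D", "TFT", "RTFT"
    primary_group: int            # node in the group tree
    fitness: float = 0.0          # sum of payoffs from all encounters
    # Private history: other agent id -> outcome ("T", "R", "P", "S") of the last encounter.
    last_outcome: dict = field(default_factory=dict)
    # Private group observations: group id -> [cooperations, encounters].
    group_counts: dict = field(default_factory=dict)

    def offspring(self) -> "Agent":
        # Offspring inherit the parent's strategy and primary group membership,
        # but start with empty histories and zero fitness.
        return Agent(self.strategy, self.primary_group)
```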
Participants in an interaction are chosen randomly from the
total population. After the first participant is selected, a second
participant is randomly selected. The shared environment of the
two participants is then found, and the agents interact with the
interaction probability of the shared environment. If no
interaction takes place, then a different second agent is selected
and the test is repeated until an interacting pair is found.
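The selection procedure can be sketched as follows, reusing the shared_group() helper and an interaction-probability map keyed by group (both illustrative names from the earlier sketches, not the authors' code):

```python
import random

def select_pair(population, interaction_prob):
    """Pick a pair of agents that will interact, as described above."""
    first = random.choice(population)
    while True:
        second = random.choice(population)
        if second is first:
            continue
        # The shared environment is the least common ancestor of the two primary groups.
        group = shared_group(first.primary_group, second.primary_group)
        # The pair interacts with that group's interaction probability;
        # otherwise another candidate partner is drawn.
        if random.random() < interaction_prob[group]:
            return first, second
```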
7. Simulation Results
Following are typical results from two simulations. Each
simulation started with an equal mix of 50 D agents and 50 TFT
or RTFT agents. The first simulation provides a baseline for
comparison with the other simulations, while the second
simulation shows the effect of group reputation.
7.1 Baseline: TFT vs. Defect
In our baseline experiment, agents inhabiting a single group
interacted for 10,000 encounters in each generation. As shown
in Figure 5, the TFT agents decreased in number and within 14
generations had completely disappeared. This outcome is typical
of this scenario and occurs since there are insufficient repeated
interactions between agents for TFT's strategy to have an
advantage over always defect.
Figure 5. TFT vs. Defect. Population size (y-axis, 0-60) of the TFT agents over 20 generations (x-axis).
In our next simulation, we looked at two homogeneous groups of agents, one consisting of RTFT agents and the second consisting of
D agents. We specified a fixed minimum group reputation threshold and varied the number of interactions per generation from 10,000
down to 100. Figure 6 shows the number of RTFT agents after each generation for a given number of encounters. The RTFT agents were
able to increase their numbers or maintain a population even when the number of interactions per generation dropped to 500. At 200
interactions per generation there were insufficient interactions between RTFT and D agents for group reputation to be useful, and RTFT's
population decreased.
Figure 6. RTFT vs. Defect. Population size (y-axis, 0-120) of the RTFT agents over 39 generations (x-axis), for 10,000, 1,000, 500, 200, and 100 interactions per generation.
8. Conclusion and Future Study
Group reputation can help agents choose an appropriate
course of action when interacting with an otherwise unknown
agent. A model of groups, relating groups within a hierarchy,
was developed to support the study of group reputation. There
are many additional aspects of groups that affect agent behavior.
Additional parameters affecting groups could be studied, such as
observation probabilities, knowledge transfer among agents, etc.
A more complete model of groups would include more complex
relationships between groups and multiple group memberships
for agents. Further work should clarify the relationships among
different types of reputations and provide a clearer
understanding of how group reputation may be used in different
circumstances.
9. References
[1] Boyd, R., Richerson, P. J.: The Evolution of Reciprocity in Sizable Groups. Journal of Theoretical Biology, 132, 1988, pp. 337-356.
[2] Tirole, J.: A Theory of Collective Reputations (with Applications to the Persistence of Corruption and to Firm Quality). The Review of Economic Studies, 63(1), 1996, pp. 1-22.
[3] Axelrod, R.: The Evolution of Cooperation. Basic Books, New York, 1984.
[4] Maynard Smith, J.: Evolution and the Theory of Games. Cambridge University Press, Cambridge, 1982.
[5] Mui, L. and Halberstadt, A.: Notions of Reputation in Multi-Agent Systems: A Review, 2001.
[6] Nowak, M. and Sigmund, K.: A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature, 364, 1993, pp. 56-58.