Artificial Agents Play the “Mad Mex Trust Game”:
A Computational Approach¹
Wu, D.J., S. Kimbrough and F. Zhong
Abstract
We investigate the “Mad Mex Trust Game,” which cannot easily be represented in strategic form. In
outline, the game is as follows. N players of various types are free to negotiate with each other. The
players and their types are identified and known to the other players. Players of a given type produce
a particular commodity at a certain rate. The well-being of each player depends upon having a
mixture of commodities; hence the players have an incentive to negotiate trades with players of other
types. After an agreement is reached, there is a fulfillment stage. Players are free to renege on their
agreements, and players are able to remember who has reneged and who hasn't. Will cooperative
behavior emerge, and under what conditions? What are efficient and effective mechanisms for trust
building in electronic markets? How do these mechanisms affect the emergence of trust and
cooperative behavior? What are the key ingredients in building distributed trust, and what destroys
trust? This game constitutes a more realistic model of negotiation support systems in electronic
markets, particularly on the Internet.
¹ This material is based upon work supported by, or in part by, DARPA contract DASW01 97 K
0007. File: Trust_HICSS35_J22.doc. Partial support by a mini-Summer research grant and a
research fellowship from the Safeguard Scientifics Center for Electronic Commerce Management,
LeBow College of Business at Drexel University is gratefully acknowledged. The corresponding
author is D.J. Wu; his current address is 101 North 33rd Street, Academic Building, Philadelphia, PA
19104. Email: wudj@drexel.edu. The Java code we developed in this study can be downloaded by
interested readers at www.lebow.drexel.edu/wu300/trust_game/.
1. Introduction
An important aspect of electronic commerce is that often it is not trusted (Tan and Thoen 2001), since
it is often difficult for a user to figure out whom to trust in online communities (Dasgupta 1988;
Schillo and Funk 1999). Recently, researchers have taken much interest in how to build trust in
electronic markets operating in environments such as the Internet. The literature has approached the
study of trust with various well-defined “trust games”.
Basically, there are two versions of the trust game: the classical trust game in economics, or the
investment game (Lahno 1995; Berg et al. 1994; Erev and Roth 1998), and the “Electronic Commerce
Trust Game” or the “Mad Mex Trust Game”². The former is well studied in the economics literature
and is regarded as revealing and of fundamental importance in social interaction and knowledge
management, as fundamental as the prisoner’s dilemma game (Hardin 1982; Lahno 1995). The latter
is due to Kimbrough and Tan; in it, the players exchange goods, such as red sauce or green sauce,
rather than money. In this paper, we focus on the Mad Mex Trust Game using artificial agents. We
leave the Economics Trust Game to a subsequent paper, where we plan to use the agent-based
approach as well. Will trust/cooperative behavior emerge? If so, under what conditions (and when
and how)? Put another way, what are the conditions that promote trust or distrust? How do we
explain, reveal, and understand the behavior of agents (what they are doing, and why they are doing
it)? The ultimate goals are to study the effects of markets, characteristics of markets, and market
mechanisms associated with systems of artificial agents.
² Named after its place of conception, a restaurant near the University of Pennsylvania, by
Kimbrough and Tan.
The contribution of this paper is the integration of several strands of research literature: the trust
literature (we focus here on the computational approach to social trust) and the electronic
communities and electronic markets literature (we focus here on what kinds of market mechanisms
facilitate, and what kinds disrupt, trust and cooperation).
The rest of the paper is organized as follows. Section 2 provides a brief literature review.
Section 3 outlines our key research methodologies and implementation details. Section 4 reports our
experimental findings for two agents. Section 5 reports further experiments in which an additional
player, a third agent, is introduced. Section 6 summarizes and discusses future research.
2. Literature Review
There are roughly two major streams in the trust literature. One stream is interested in developing
trust technology (e.g., security technologies such as passwords or digital watermarking).
Representative work can be found in the recent special section of CACM (December 2000). The
second stream focuses on social trust (Shapiro 1987) and the work on social capital (e.g., Uslaner
2000). Our interest is in the latter; e.g., an online trader may well have access to the trading system
but not cooperate with other online traders out of self-interest. In particular, we are interested in trust
based on cooperation (Güth et al. 1997), i.e., social trust viewed as cooperative behavior.
Methodologically, there are several approaches to the study of trust, illustrating a broad interest from
several disciplines in the social sciences. These include the behavioral approach (e.g., Das and Teng
1998; Mayer, Davis and Schoorman 1995); the philosophical and logical approach (e.g., Tan 2000;
Tan and Thoen 2001); the computer science approach (e.g., Holland and Lockett 1998; Zacharia,
Moukas and Maes 1999); the sociology approach (e.g., Shapiro 1987); the psychology approach (e.g.,
Güth et al. 1997); the classical economics approach (e.g., Ellison 1993); and the experimental
economics approach (e.g., Engle-Warnick 2000; Erev and Roth 1998; Sundali, Israeli and Janicki
2000). In this paper, we use an interdisciplinary approach that integrates the economics and computer
science approaches, i.e., the computational economics approach. Details of our methodology and
framework are provided in Section 3. We now define our specific trust game.
As mentioned earlier, there are several versions of the trust game. The following is our version of the
game, known in the literature as the investment game. There are two players, the principal and the
agent. The principal has some amount of money to invest, say x, so he hires an agent (she) to invest it
for him. The agent, in turn, gives the money to an investor or broker, who invests the money in the
market and truthfully reports to the agent on the return of the investment, say 3x. The agent then
decides how to split the profit with the principal. The game is played repeatedly, i.e., the principal
has the choice of whether or not to rehire the agent. Under some regularity conditions, it has been
shown in the literature that trust can be built if the game is played repeatedly (Lahno 1995).
In the Mad Mex Trust game, the money is replaced with goods. This game cannot easily be
represented in strategic form. In outline, the game is as follows. N players of various types are free to
negotiate with each other. The players and their types are identified and known to the other players.
Players of a given type produce a particular commodity at a certain rate. The well-being of each
player depends upon having a mixture of commodities; hence the players have an incentive to
negotiate trades with players of other types. After arriving at an agreement, there is a fulfillment
stage. Players are free to renege on their agreements, and players are able to remember who has
reneged and who hasn't.
We now describe our research framework and methodology in more detail.
3. Methodology and Implementations
In our framework, artificial agents are modeled as finite automata (Hopcroft and Ullman 1979). This
framework has been adopted by a number of previous investigations. Among them, Rubinstein
(1986), Sandholm and Crites (1995), Miller (1996) and many others used it to study the iterated
prisoner’s dilemma (IPD). Kimbrough, Wu and Zhong (2001a) used it to study the MIT “Beer
Game,” where genetic learning artificial agents played the game and managed a linear supply chain.
Wu and Sun (2001a, b) investigated the off-equilibrium behavior of artificial agents in an electronic
market price and capacity bidding game using genetic algorithms (Holland 1992). Arthur et al. (1996)
modeled a realistic stock marketplace composed of various genetic learning agents. Zhong,
Kimbrough and Wu (2001) studied the ultimatum game using reinforcement learning agents. These
are merely examples illustrating the acceptance of this framework in the literature. The reader is
referred to Kimbrough and Wu (2001) for a survey.
In this study, we depart from previous research by integrating several strands of approaches. First, we
study a different game, namely the Mad Mex Trust game, rather than games such as the IPD, Beer,
Bidding, or Ultimatum games. Second, in studying this game, we use a computational/evolutionary
approach with artificial agents, in contrast to classical or behavioral game-theoretic approaches.
Third, our agents use a reinforcement learning regime (Sutton and Barto 1998), Q-learning, as their
learning mechanism in game playing. Previous studies of the trust game are not computational (with
the exception of Zacharia et al., who employed a reputation rating mechanism). Finally, our agents
are identity-centric rather than strategy-centric as in previous studies (e.g., Kimbrough, Wu and
Zhong 2001a; Wu and Sun 2001a, b). That is, our agents may meaningfully be said to have individual
identities and behavior. They are not just naked strategies that play and succeed or fail. Individuals,
rather than populations as a whole, learn and adapt over time and with experience. Fielding these
kinds of agents, we believe, is needed for e-commerce applications.
We now describe our model and prototype implementations in more detail within the Q-learning
framework: how the rules, or state-action pairs, are embedded in our artificial agents; how the
rewards are set up; what the long-term goal of the game (the return) is; and, finally, the specific
Q-learning algorithm we designed.
Rules (State-Action pairs):
The Q-learning algorithm estimates the values of state-action pairs Q(s, a). At each decision point, the
state of an agent is determined by the information in its memory history, e.g., its own and its
opponent’s last trade volumes. The possible actions an agent can take at this decision point are the
integers between zero and its endowment. In this sense, the agent’s strategy is the mapping from its
memory of the last iteration to its current action. To balance exploration and exploitation, we use the
ε-greedy method, choosing a random action with small probability ε. The value of ε starts at 0.3 and
then decreases to 0.01 in steps of 0.000001.
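For concreteness, the following is a minimal Java sketch of this ε-greedy selection with the decaying
exploration rate; the class and method names are our own illustrative choices, not the authors'
released code.

    import java.util.Random;

    /** Sketch of epsilon-greedy action selection with a decaying
        exploration rate (epsilon: 0.3 down to 0.01 in steps of 1e-6). */
    public class EpsilonGreedy {
        private final Random rng = new Random();
        private double epsilon = 0.3;   // initial exploration probability

        /** Pick a trade volume in {0, ..., endowment} for one state,
            given that state's row of Q-values (one entry per action). */
        public int selectAction(double[] qRow) {
            int action = 0;
            if (rng.nextDouble() < epsilon) {
                action = rng.nextInt(qRow.length);        // explore
            } else {
                for (int a = 1; a < qRow.length; a++) {   // exploit
                    if (qRow[a] > qRow[action]) action = a;
                }
            }
            epsilon = Math.max(0.01, epsilon - 0.000001); // decay epsilon
            return action;
        }
    }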
Rewards:
The instant reward an agent receives is determined by a modified Cobb-Douglas function of the
amounts of the different types of sauces the agent possesses after each episode:

    U = ∏_i a_i^(1/n)

where n is the number of types of commodities in the market and a_i is the amount of commodity i
the agent holds. We chose this Cobb-Douglas utility function for our simulation because commodities
A and B have equal weight for the agents.
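As a minimal sketch (assuming the equal-weight form U = ∏_i a_i^(1/n) stated above; the method
name is our own), this utility can be computed as:

    /** Cobb-Douglas utility with equal weights 1/n over the amounts
        of the n commodity types an agent holds after an episode. */
    public static double cobbDouglas(double[] amounts) {
        double u = 1.0;
        for (double a : amounts) {
            u *= Math.pow(a, 1.0 / amounts.length);
        }
        return u;
    }

Note that if an agent ends an episode holding none of some commodity, its utility for that episode is
zero, which is why one-sided trades in the experiments below yield alternating zero payoffs.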
Returns:
The long-run return is simply the total utility an agent has obtained over all episodes played so far:

    R = ∑_i U_i

The goal of an agent at any iteration is to select actions that maximize its discounted long-term return
under its policy. The use of Q-learning ensures that the artificial agents are non-myopic.
Q-learning:
The learning algorithm used by the artificial agents is one-step Q-learning, described as follows:

    Initialize Q(s, a) to zero
    Repeat
        From the current state s, select an action a using the ε-greedy method
        Take action a; observe the immediate reward r and the next state s'
        Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]
        s ← s'
    Until the end of a trial

The experiment runs with learning rate α = 0.05 and discount factor γ = 0.95. The values of α and γ
are chosen to promote learning of cooperation.
4. Two-Agent Experiment Design and Results
We compare the following trust mechanisms: (1) moving average of the past five observations; (2)
exponential smoothing; (3) Zacharia-Moukas-Maes reputation rating; (4) tit-for-tat; and (5) most
recent move (or last move). We are interested in seeing whether, under each of these five
mechanisms, agents will trust each other and, if so, whether cooperative behavior will converge. By
comparing the five mechanisms, we are interested in which mechanism does better in terms of
building trust and promoting social welfare.
Experiment one: Two Q-learner agents play against each other
Two learning agents play the iterated trust game. In each iteration, both agents start with the same
endowment. Player A first offers its commodity to player B; upon receiving the commodity, player B
decides how much of its own commodity to trade in return. Since there is no third-party agent
handling the transaction, the exact trade volume is visible to both parties, so there is no information
asymmetry. The whole experiment is one long trial, i.e., the two artificial agents play the game an
indefinite number of times.
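To make the interaction concrete, here is a minimal Java sketch of one Q-learning trader in this loop,
combining the one-step update of Section 3 (α = 0.05, γ = 0.95) with a state built from the two last
trade volumes; all identifiers are our own illustrative choices.

    /** Sketch of a Q-learning trader whose state is the pair of last
        trade volumes (its own and its opponent's). */
    public class QTrader {
        private static final double ALPHA = 0.05;  // learning rate
        private static final double GAMMA = 0.95;  // discount factor
        private final int endowment;
        private final double[][] q;                // Q(s, a), zero-initialized

        public QTrader(int endowment) {
            this.endowment = endowment;
            int states = (endowment + 1) * (endowment + 1);
            this.q = new double[states][endowment + 1];
        }

        /** Encode (own last volume, opponent's last volume) as a state.
            With endowment 3, this gives 16 states of 4 actions each. */
        public int state(int myLast, int oppLast) {
            return myLast * (endowment + 1) + oppLast;
        }

        /** One-step Q-learning update after observing (s, a, r, s'). */
        public void update(int s, int a, double r, int sNext) {
            double best = q[sNext][0];
            for (double v : q[sNext]) best = Math.max(best, v);
            q[s][a] += ALPHA * (r + GAMMA * best - q[s][a]);
        }
    }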
To make it easy to test the validity of the experiment and to analyze the results, the endowment is set
rather small, at 3, so that each agent has 4 × 4 × 4 = 64 state-action pairs. Based on the utility
function, through long iterations one agent learns to switch its trade volume between 3 and 0, while
the other learns to switch between 0 and 2 correspondingly. This can be illustrated as:
Agent A (trade volume): 3 0 3 0 3 0 …
Agent A (utility):      0 6 0 6 0 6 …
Agent B (trade volume): 0 2 0 2 0 2 …
Agent B (utility):      9 0 9 0 9 0 …
Thus the utility of the first agent alternates between 0 and 6, giving it an average of 3; the second
agent gets 9 and 0 in turn, giving it an average of 4.5. Although this is still not as good as the
following outcome, which gives both agents an average utility of 4.5, it is better than sticking to
trading 1 unit or 2 units all the time.
Agent A (trade volume): 3 0 3 0 3 0 …
Agent A (utility):      0 9 0 9 0 9 …
Agent B (trade volume): 0 3 0 3 0 3 …
Agent B (utility):      9 0 9 0 9 0 …
Experiment two: Two Q-learner agents with reputation index play against each other
Experiment two includes a set of sub-experiments to test the efficiency of different reputation
mechanisms. At the end of each time period t, both agents rate each other. The rating given to one’s
opponent, r′, is simply the ratio of the opponent’s trade volume V′ to the endowment N:

    r′ = V′ / N

The reputation index is then updated based on this rating according to the different mechanisms. The
value of the reputation index is normalized in each mechanism, so that 1 is perfect and 0 is terrible.
The strategies of each agent are now mappings from the reputation information to possible actions.
We specifically test the following four reputation mechanisms.
1. Moving Average
The value of the reputation index is simply the arithmetic average of the most recent five ratings.
2. Exponential Smoothing
The reputation index is an exponentially weighted average of past ratings, with more recent ratings
weighted more heavily.
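As an illustration, the first two mechanisms might be realized as follows; the smoothing weight
lambda below is our own assumption, since the paper does not report the weights used.

    import java.util.ArrayDeque;
    import java.util.Deque;

    /** Sketch of the two simplest reputation indices over ratings in [0, 1]. */
    public class SimpleReputation {
        // Mechanism 1: arithmetic mean of the five most recent ratings.
        private final Deque<Double> window = new ArrayDeque<>();

        public double movingAverage(double rating) {
            window.addLast(rating);
            if (window.size() > 5) window.removeFirst();
            double sum = 0.0;
            for (double r : window) sum += r;
            return sum / window.size();
        }

        // Mechanism 2: exponentially weighted average of all past ratings.
        private double index = 0.0;
        private final double lambda = 0.3;   // assumed smoothing weight

        public double exponentialSmoothing(double rating) {
            index = lambda * rating + (1.0 - lambda) * index;
            return index;
        }
    }

Since each rating r′ = V′/N lies in [0, 1], both indices remain normalized, as required above.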
3. Reputation Rating (Zacharia-Moukas-Maes)
Introduced by Zacharia, Moukas and Maes (1999), every agent’s reputation index is updated after
each iteration based on its reputation value in the last iteration, R_{t-1}, its counterpart’s reputation,
R′, and the rating it received for this iteration, W_t. The recursive estimate of the reputation value of
an agent at time t can be expressed as:

    R_t = R_{t-1} + (1/θ) Φ(R_{t-1}) R′ (W_t − E_t)
    Φ(R) = 1 − 1 / (1 + e^{−(R − D)/σ})
    E_t = R_{t-1} / D
    W_t = V′_t / N

where V′_t is the trade volume of the agent’s counterpart and D is the range of the reputation values,
which is 1 here.
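The recursion translates directly into code. The paper fixes only D = 1, so the values of theta, sigma,
and the initial reputation below are illustrative assumptions of ours:

    /** Sketch of the Zacharia-Moukas-Maes recursive reputation update. */
    public class ZmmReputation {
        private final double theta = 10.0;  // damping factor (assumed)
        private final double sigma = 0.5;   // sigmoid width (assumed)
        private final double d = 1.0;       // range of reputation values (paper: D = 1)
        private double r = 0.5;             // current reputation R_{t-1} (initial value assumed)

        /** Update own reputation given the counterpart's reputation rPrime
            and its trade volume vPrime out of endowment n, so W_t = vPrime/n. */
        public double update(double rPrime, double vPrime, double n) {
            double w = vPrime / n;                                       // W_t
            double e = r / d;                                            // E_t = R_{t-1} / D
            double phi = 1.0 - 1.0 / (1.0 + Math.exp(-(r - d) / sigma)); // Phi(R_{t-1})
            r = r + (1.0 / theta) * phi * rPrime * (w - e);
            return r;
        }
    }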
4. Tit-for-tat
Under this mechanism, each agent trades the amount that its counterpart traded to it in the last time
period, i.e., V_t = V′_{t-1}.
5. Performance Comparison of various mechanisms
The total utility of each agent under the different reputation mechanisms is compared in Figures 1
and 2. Furthermore, the joint utility of both agents under the different reputation mechanisms is
compared in Figure 3.
-----------------------------------------
Insert Figure 1 here
-----------------------------------------
-----------------------------------------
Insert Figure 2 here
-----------------------------------------

-----------------------------------------
Insert Figure 3 here
-----------------------------------------

Furthermore, we assign different reputation mechanisms to the two agents and compare the total
points after 300,000 iterations. The following set of figures describes the performance of each
reputation mechanism against the other mechanisms.

-----------------------------------------
Insert Figures 4a – 4e here
-----------------------------------------
5. Three-Agent Experiment Design and Results
Three agents selling two goods
In this experiment, three agents trade two types of goods: agents B and C produce the same type of
good, while agent A produces a different type. At the beginning of each episode, agent A chooses the
agent with the higher reputation value from agents B and C and gives that agent its goods. The
reputations of the chosen agent and of agent A are updated after each episode. All three agents are
assigned the same reputation mechanism. We test the performance of different reputation
mechanisms: moving average, last move, exponential smoothing, and Zacharia-Moukas-Maes
reputation rating. Figure 5 displays the total utility of agent A under these mechanisms. The
experiment shows that the “most recent” reputation mechanism quickly wins out against the others,
except tit-for-tat: if all agents are using tit-for-tat, then obviously all agents cooperate and each agent
achieves its best performance.
-----------------------------------------
Insert Figure 5 here
-----------------------------------------
We now study the impact of an additional trade partner by comparing the total utility of agent A in
the two-agent and three-agent contexts. Not surprisingly, agent A benefits from the introduction of
the third player, as there is now competition between agents B and C. The results are summarized in
Table 1.
Table 1: Total utility of Agent A in the 2-agent and 3-agent contexts.

                          Three agents    Two agents
Moving Average            506829.6        337441
Most Recent Move          577475.8        279490.9
Exponential Smoothing     423258.9        449496.1
Zacharia-Moukas-Maes      452451.9        449606.6
What if agent A uses a different, fixed reputation mechanism, such as tit-for-tat, while agents B and C
use another, identical, reputation mechanism? Will agent A benefit from such differentiation? The
results are somewhat mixed and show that performance depends on what the others are using, as
summarized in Table 2.
Table 2: Total utility of Agent A with the tit-for-tat strategy playing against other reputation
mechanisms in two-agent and three-agent environments.

                          Two agents    Three agents
Moving Average            413824.6      418158.5
Most Recent Move          423078.2      389489.9
Exponential Smoothing     394672.8      429542.0
Zacharia-Moukas-Maes      423752.8      382129.0
Three agents selling three goods
We now let agent C sell a third type of sauce, i.e., we consider the general case in which each agent
produces a different good. The endowment of each agent in each period/episode is set to 3, reflecting
a steady-state production rate for each agent (i.e., each agent can produce a fixed amount of goods
during a period). Now we describe the trade game. At the beginning of each episode, each agent
decides simultaneously how much to give to each of the other two agents, expecting an exchange
from them. It turns out that the system can quickly become too complicated to be tractable, even in
the two-agent or three-agent learning situation (Sandholm and Crites 1995; Marimon, McGrattan and
Sargent 1990). In our setting, it would be very difficult for the Q-learning agents to learn the true
values of the state-action functions. In this initial exploration, we therefore start with one agent (A)
learning while fixing the strategies of the other two agents (B and C). We note in passing that we
leave the case of three agents learning simultaneously for future research.
We have identified from the literature the following heuristics for players B and C, suggested by
previous research, including the nice, nasty, fair, and modified tit-for-tat strategies (e.g., Axelrod
1984; Axelrod and Hamilton 1981). For benchmarking purposes, as in the literature, we also add the
random strategy. Table 3 describes these five strategies.
Table 3: Possible strategies of players B and C.

Random:             The agent randomly decides how many goods to give to the other two agents.

Nasty:              IF V′_{t-1} = 0 THEN V_t = 1 ELSE V_t = 0.

Fair:               IF V′_{t-1} = 0 THEN V_t = 0 ELSE V_t = 1.

Nice:               The agent always gives 1 unit of its good to each of the other two agents
                    (V_t = 1).

Modified Tit-4-Tat: The agent gives each opponent the amount that opponent gave to it in the last
                    episode (V_t = V′_{t-1}) when the total amount it got in the last episode does
                    not exceed its endowment. If the total exceeds its endowment, the agent gives
                    its good to the other agents in proportion to the amounts it was given in the
                    last episode.
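As a minimal sketch of the last rule (identifiers are ours), the proportional cap can be implemented as:

    /** Sketch of modified tit-for-tat: echo last episode's received amounts,
        scaled down proportionally when their sum exceeds the endowment. */
    public class ModifiedTitForTat {
        public static int[] decide(int[] receivedLast, int endowment) {
            int total = 0;
            for (int v : receivedLast) total += v;
            int[] give = new int[receivedLast.length];
            for (int i = 0; i < give.length; i++) {
                if (total <= endowment) {
                    give[i] = receivedLast[i];                    // V_t = V'_{t-1}
                } else {
                    // give in proportion to what each opponent gave last time
                    give[i] = (endowment * receivedLast[i]) / total;
                }
            }
            return give;
        }
    }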
We experiment with the behavior of our learning agent (A) in the following typical scenarios for
agents B and C³. Agents B and C use the following strategy pairs: random and random; tit-for-tat and
random; tit-for-tat and nasty; tit-for-tat and fair; tit-for-tat and tit-for-tat; tit-for-tat and nice; and,
finally, fair and nice. Our interest is in the performance of agent A under the above five reputation
mechanisms. Figures 6a–6e display the results for the moving average, most recent move,
exponential smoothing, Zacharia-Moukas-Maes rating, and tit-for-tat mechanisms, respectively.
-----------------------------------------
Insert Figures 6a – 6e here
-----------------------------------------
Can the artificial intelligent agent learn to cooperate? Based on the above experiments, the answer is
yes. When using exponential smoothing, agent A seems to learn slowly and its performance is
somewhat inferior; otherwise, it performs well under all the other reputation mechanisms. The value
of intelligence (Zhong, Kimbrough and Wu 2001) is further confirmed in this study, i.e., intelligence
pays: the learning agent can quickly learn how to exploit the other agents’ fixed strategies.
³ We note that it would be straightforward to conduct statistical significance tests over all possible
combinations of agent B’s and C’s strategies. However, we choose not to do so, since we believe
such statistical formalism would not add much additional insight here.
The results show that the emergence of trust depends on the strategies used by B and C, i.e., on the
“climate”. When the climate is nice, the agent learns to cooperate, and social welfare is maximized
and rather fairly distributed (almost equally split).
Comparing the five reputation mechanisms, except for exponential smoothing there does not seem to
be any significant difference in building trust. This is due to a commonality of these mechanisms:
they forgive, or discount, previous actions taken by other parties. This is interesting, and the role of
forgiveness in promoting trust building deserves further investigation. Again, we leave this for a
subsequent project.
Overall, this experiment demonstrates the promise of artificial intelligent agents in the Mad Mex trust
game, and indeed in market negotiation contexts generally.
6. Conclusions and Future research
It is well known in the literature that trust will emerge in repeated games such as the Mad Mex Trust
game studied here. However, this study deepens previous work by examining, within a framework
that allows parameter settings, when trust will and will not emerge. The agents here are
identity-centric, using reinforcement Q-learning, and their performance has been compared with that
of strategy-centric agents.
Artificial agents using Q-learning have been found to be capable of playing the Mad Mex Trust game
efficiently and effectively. Cooperative behaviors have emerged, and the conditions for such
cooperation, or trust building, have been studied experimentally. Several efficient and effective
mechanisms for trust building in electronic markets have been tested and compared. The study
explores, initially, how these mechanisms affect the emergence of trust and cooperative behavior.
Key ingredients in building, as well as destroying, distributed trust have been examined
experimentally. Can we find characteristics of trusting/distrusting systems? Our initial yet original
exploration sheds light on this question.
We believe our Mad Mex Trust game constitutes a more realistic model of negotiation support
systems in electronic markets, particularly on the Internet. We are actively investigating other forms
of trust games, including the classical investment game (or trust game) as well as the ultimatum
game (see Zhong, Kimbrough and Wu 2001 for initial results). Of particular interest, we plan to
investigate a closely related game, the Santa Fe Bar Game, first proposed by Arthur (1994). In the
long run, we hope to develop computational principles for understanding social trust.
7. References
1. Arthur, B. “Inductive Reasoning and Bounded Rationality,” The American Economic Review,
V. 84, No. 2, pp. 406-411, May 1994.
2. Arthur, B., Holland, J., LeBaron, B., Palmer, R., and Tayler, P. “Asset Pricing Under
Endogenous Expectations in an Artificial Stock Market,” Working Paper, Santa Fe Institute,
December 1996.
3. Axelrod, R. The Evolution of Cooperation. Basic Books, New York, N.Y., 1984.
4. Axelrod, R., and Hamilton, W. “The evolution of cooperation”, Science, Vol. 211, No. 27 pp.
1390-1396, March 1981.
5. Berg, J., Dickhaut, J., and McCabe, K. “Trust, Reciprocity, and Social History,” Games and
Economic Behavior, 10, pp. 122 – 142, 1994.
6. CACM (Communications of ACM), Special Section on Trusting Technology,
http://www.acm.org/cacm/1200/1200toc.html, Vol. 43, No. 12, December 2000.
7. Das, T.K., and Teng, B.-S. “Between Trust and Control: developing confidence in partner
cooperation in alliances”, Academy of Management Review, Vol. 23, No. 3, pp. 491-512,
1998.
8. Dasgupta, P. “Trust as a Commodity,” in D. Gambetta, editor, Trust: Making and Breaking
Cooperative Relations, pp. 49 – 72. Blackwell, Oxford and New York, 1988.
9. Erev, I. and Roth, A. “Predicting How People Play Games: Reinforcement Learning in
Experimental Games with Unique, Mixed Strategy Equilibria,” The American Economic
Review, 88, pp. 848 – 881, 1998.
10. Ellison, G. “Learning, Local Interaction, and Coordination,” Econometrica, V. 61, No. 5, pp.
1047-1071, September 1993.
11. Güth, W., Ockenfels, P., and Wendel, M. “Cooperation Based on Trust: An Experimental
Investigation,” Journal of Economic Psychology, 18, 15 – 43, 1997.
12. Hardin, R. “Exchange Theory on Strategic Bases,” Social Science Information, Vol. 21, No.
2, pp. 251-272, 1982.
13. Holland, C.P., and Lockett, A.G. “Business Trust and the Formation of Virtual Organizations,”
Proceedings of the 31st Annual Hawaii International Conference on System Sciences
(HICSS-31), IEEE Computer Society, 1998.
14. Holland, J. “Artificial Adaptive Agents in Economic Theory,” The American Economic Review,
81, pp. 365 – 370, 1991.
15. Hopcroft, J. and Ullman, J. Introduction to Automata Theory, Languages and Computation.
Addison-Wesley, Reading, MA, 1979.
16. Kimbrough, S., Wu, D.J., and Zhong, F. “Computers Play the Beer Game: Can Artificial
Agents Manage the Supply Chain?” HICSS-34, 2001.
17. Lahno, B. “Trust, Reputation, and Exit in Exchange Relationships”, Journal of Conflict
Resolution Vol. 39, No. 3, pp. 495-510, 1995.
18. Marimon, R., McGrattan, E., and Sargent, T. “Money as a Medium of Exchange in an
Economy with Artificially Intelligent Agents,” Journal of Economic Dynamics and Control,
Vol. 14, pp. 329-373, 1990.
19. Mayer, R.C., Davis, J.H., and Schoorman, F.D., “An Integrative Model of Organizational
Trust”, Academy of Management Review, Vol. 20, No 3, pp. 709-734, 1995.
20. Miller, J. “The Coevolution of Automata in the Repeated Prisoner’s Dilemma,” Journal of
Economic Behavior and Organization, 29, pp. 87-112, 1996.
21. Rubinstein, A. “Finite Automata Play the Repeated Prisoner’s Dilemma”, Journal of
Economic Theory 39, pp. 83-96, 1986.
22. Sandholm, T., and Crites, R. “Multiagent Reinforcement Learning in Iterated Prisoner's
Dilemma,” Biosystems, 37, pp. 147 - 166, 1995. Special Issue on the Prisoner's Dilemma.
23. Schillo, M. and Funk, P. “Who Can You Trust: Dealing with Deception”, in Proceedings of
the Workshop Deception, Fraud and Trust in Agent Societies at the Autonomous Agents
Conference, pp. 95-106, 1999.
24. Shapiro, S. P. “The Social Control of Impersonal Trust”, The American Journal of Sociology,
Vol. 93, No. 3, pp. 623-658, 1987.
25. Sundali, J., Israeli, A., and Janicki, T. “Reputation and Deterrence: Experimental Evidence
from the Chain Store Game,” Journal of Business and Economic Studies, Vol. 6, No. 1, pp. 1–
19, Spring 2000.
26. Tan, Y. H. and Thoen, W. “Formal Aspects of a Generic Model of Trust for Electronic
Commerce,” Working Paper, Erasmus University Research Institute for Decision and
Information Systems (EURIDIS), Erasmus University Rotterdam, The Netherlands, 2001.
27. Uslaner, E.M. “Social Capital and the Net,” Communications of the ACM,
http://www.acm.org/cacm/1200/1200toc.html, Vol. 43, No. 12, December 2000.
28. Van der Heijden, E.C.M., Nelissen, J.H.M., Potters, J.J.M., and Verbon, H.A.A. “Simple and
Complex Gift Exchange in the Laboratory,” Working Paper, Department of Economics and
CentER, Tilburg University.
29. Wolfram, S. Cellular Automata and Complexity. Addison-Wesley Publishing Company,
Reading, MA, 1994.
30. Zacharia, G., Moukas, A., and Maes, P. “Collaborative Reputation Mechanisms in Electronic
Marketplace,” HICSS-33, 1999.
31. Zhong, F., Kimbrough, S. and Wu, D.J. “Cooperative Agent Systems: Artificial Agents Play
the Ultimatum Game”, Working Paper, The Wharton School, University of Pennsylvania,
2001.
Figure 1: Total utility of agent A under different mechanisms for 300,000 episodes. [Chart: time vs.
total utility; series: moving average, recency, smoothing, Maes, tit-for-tat.]
Figure 2: Total utility of agent B under different mechanisms for 300,000 episodes. [Chart: time vs.
total utility; series: moving average, recency, smoothing, Maes, tit-for-tat.]
Figure 3: Joint utility of both agents under different mechanisms for 300,000 episodes. [Chart: time
vs. joint utility; series: moving average, recency, smoothing, Maes, tit-for-tat.]
Figure 4: The performance of different reputation mechanisms when playing against each other (a –
moving average, r – most recent move, s – exponential smoothing, m – Maes, t – tit-for-tat). [Five bar
charts, 4a–4e, one per mechanism, each showing own, opponent, and joint utility against each
opposing mechanism.]
Figure 5: Total utility of Agent A in the three-agent context. [Chart: time vs. total utility; series:
moving average, most recent move, exponential smoothing, Maes.]
Figure 6: Total utilities of the three agents under different reputation mechanisms and strategy
combinations. [Five bar charts, 6a–6e, each showing the utilities of agents A, B, and C under the
strategy pairs random-random, t-random, t-nasty, t-fair, t-t, t-nice, and fair-nice.]