
Trust, reputation and learning strategies in multi-agent systems

Authors: Iulia Mărieş, PhD Student, Academy of Economic Studies, Bucharest

Bogdan Vintilă, PhD Student, Academy of Economic Studies, Bucharest

Abstract

This paper highlights the connections between multi-agent systems and game theory through trust and reputation models. The concepts of multi-agent system, trust and reputation are described, and it is shown how an agent can learn a model of another agent and how it can evaluate a set of competing hypotheses about that agent. Learning strategies for an Iterated Prisoner's Dilemma setting are indicated. Experimental results are obtained using the NetLogo simulation environment.

Keywords: multi-agent systems, game theory, trust, reputation, Prisoner’s Dilemma

JEL Classification: C73


1. INTRODUCTION

There is increasing interest in applying techniques from game theory to the analysis and implementation of agent systems. To collaborate in multi-agent systems, agents must trust other agents' competence and willingness to cooperate. Autonomous agents in open systems encounter problems in establishing and maintaining relationships, so a need arises for modeling trust and reputation in agents. Trust and reputation are important concepts in multi-agent systems, and essential to human life.

Game theory studies interactions between self-interested agents, as well as the question of how interaction strategies can be designed to maximize the welfare of an agent. Game theory is applied to analyze multi-agent interactions, particularly those involving coordination and negotiation. Cooperation can take place without communication when both agents compute the best outcome and know that the other agent will do the same; in this situation, agents can use game theory to predict what other agents will do, and coordination arises through the assumption of mutual rationality. Negotiation is a process by which agents can reach agreement on matters of common interest. Game theory is used to design an appropriate protocol to govern interactions between negotiation participants and to design the particular strategy that individual agents will follow while negotiating.

To gain knowledge about the true game, knowledge of the opponent's beliefs about it is desirable, and this determines a role-based trust. To improve performance, such knowledge may be based on empirical observations or on trusted sources.

Agents and multi-agent systems offer a new possibility for analyzing, modeling and implementing complex systems. The agent-based view offers a wide range of tools, techniques and paradigms, with a real potential to improve the use of information technologies.

In a dictionary, an agent is defined as “someone or something who acts on behalf of another person or group”. This type of definition is too general to be considered operational.

But agents have also been defined as “autonomous, problem-solving computational entities capable of effective operation in dynamic and open environments” (Luck, McBurney, Preist, 2003). Therefore, agents offer a new and appropriate route to the development of complex systems, especially in open and dynamic environments.

Multi-agent systems can approach problems that have multiple solving methods, multiple structuring possibilities or multiple solving entities, much like distributed systems. Thus, multi-agent systems combine the advantage of distributed and concurrent problem solving with the advantage of representing complex patterns of interaction. Here, interactions refer to cooperation, coordination and negotiation.

2. TRUST AND REPUTATION

This section presents a selection of computational trust and reputation models, through which the connections between game theory and multi-agent systems are highlighted. First we need to clarify the notions of trust and reputation. The field is quite recent, but in recent years interesting models with direct implementations in different domains have been proposed.

2.1 Trust

Trust is important to human society due to its social component. The concept of trust has different meanings, but Gambetta’s point of view is the most significant:

“ … trust (or, symmetrically, distrust) is a particular level of the subjective probability with which an agent assesses that another agent or group of agents will perform a particular action, both before he can monitor such action (or independently of his capacity ever to be able to monitor it) and in a context in which it affects his own action ”. (Gambetta, 2000)

The above definition mentions several significant characteristics of trust:

Trust is subjective

Trust is affected by actions that cannot be monitored

The level of trust is dependent on how our actions are affected by the other agent’s actions.

From the socio-cognitive perspective of Castelfranchi and Falcone (1998), trust is an explicit, reason-based and conscious attitude. While trust means different things, the concept can be seen as:

A mental attitude towards another agent, a disposition

A decision to rely upon another agent, an intention to delegate and trust

A behavior, for example the intentional act of trusting, and the relation between the trustor and the trustee.


The above concepts imply multiple sets of cognitive elements involved in the trustor's mind.

Typologies of trust

From a social perspective, three types of trust have been identified:

Interpersonal trust (the direct trust that an agent has in another agent)

Impersonal trust (the trust within a system that is perceived through different properties)

Dispositional trust (the general trusting attitude)

2.2 Reputation

An agent's behavior can be induced by other cooperating agents, giving rise to a reputation mechanism. The simplest definition of reputation is the opinion others have of us.

More generally, reputation represents the perception that an agent has of another agent's intentions, or an expectation about an agent's behavior.

Abdul-Rahman and Hailes (2000) have defined reputation as “an expectation about an agent's behavior based on information about or observations of its past behavior.” This definition grounds reputational information in the agent's personal experiences.

2.3 Competence and integrity

This paper emphasizes two fundamental characteristics of trust. One is competence, which refers to an agent's ability to accomplish an action. The second is integrity, which refers to an agent's commitment to a long-term cooperative stance. For rational agents, integrity is limited by their belief in the potential return over the long run.

Agents have to decide whether or not to trust a partner when engaging in joint actions.

Sometimes cooperation among agents is not possible. In such environments, “defecting” actions can be seen as actions that lead to higher utility in the short term. But when playing for longer periods with partners that have patience and memory across repeated interactions, the same agent will choose to cooperate.

Integrity and competence should be treated as separate components in decisions based on trust and reputation. An agent's competence in accomplishing an action represents its ability to change the state of its environment, while a rational agent's long-term integrity consists in respecting its commitments. An agent can decide whether or not to act honestly, but it cannot simply decide its competence (at least not at the time of the decision). An increase in competence can be attained only through a significant expenditure of time and resources. Agents can improve their ability to accomplish an action by learning or by committing resources. The decision to commit resources is determined by the agent's integrity, but it is not part of competence in the sense of the ability to accomplish an action.

Integrity can be modeled through an agent's belief in a discount factor δ, considered separately for the interactions with each potential partner. The discount factor represents the probability that an interaction will be repeated. If the discount factor rises, the degree of commitment to cooperation between rational agents rises as well. Many environments are characterized by interactions that do not have a high probability of being repeated (p ≤ 0.95). Exchanges of services, products or information between agents face an uncertain time horizon for any pair of agents, especially when the interactions involve individuals or small organizations.
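As an illustration (the numeric values are ours, not from the original): if each interaction is repeated with probability $\delta$, the expected number of interactions with a partner is

$$\mathbb{E}[\text{rounds}] = \sum_{t=0}^{\infty} \delta^{t} = \frac{1}{1-\delta},$$

so $\delta = 0.95$ corresponds to 20 expected rounds, while $\delta = 0.5$ corresponds to only 2.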

In multi-agent systems, agents have to make trust-based decisions and to learn about competence, integrity and the difference between these two characteristics. Agents face two important tasks: observing other agents' behavior in an environment and inferring knowledge about their competence and integrity, and applying that knowledge to make effective decisions when interacting with the observed agents.

This paper presents a strategy for learning competence and discount factors that evaluates alternative hypotheses for each partner. The technique applies these learning methods in an Iterated Prisoner's Dilemma setting.

2.4 Trust and reputation models

In recent years, research on multi-agent systems has intensified and new trust and reputation models have been created.

Abdul-Rahman and Hailes (2000) have suggested a model that allows agents to decide which other agents' opinions they trust more. In their view, trust can be observed from two perspectives: as direct trust or as recommender trust. Direct trust is represented as one of the values “very trustworthy”, “trustworthy”, “untrustworthy” or “very untrustworthy”. For each partner, the agent keeps a record of the number of past experiences in each category, and trust in a partner is given by the degree corresponding to the maximum value in that record. The model takes into account only the trust coming from a witness, the recommender trust, which is treated as “reputation”. This approach cannot differentiate agents that are lying from those that are telling the truth but think differently, so the model gives more importance to information coming from agents with a similar point of view. The model assigns higher values to trustworthy agents, without taking probabilities into account. Decisions to trust are therefore relative, not tied to payoffs in absolute terms. Thus, an agent can identify whether to trust one agent more than another, but cannot decide whether to trust another agent in an interaction with given payoffs without specifying threshold values.
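As an illustration of this degree-based scheme, the following minimal Python sketch (the data structure and function names are ours, not part of the original model) counts past experiences per trust degree and returns the degree with the maximum count:

from collections import Counter

# The four degrees follow Abdul-Rahman and Hailes (2000); the record of past
# experiences with a partner is simply a list of such degrees.
DEGREES = ["very trustworthy", "trustworthy", "untrustworthy", "very untrustworthy"]

def direct_trust(experiences):
    """Return the trust degree with the largest number of past experiences."""
    counts = Counter(experiences)
    return max(DEGREES, key=lambda degree: counts[degree])

print(direct_trust(["trustworthy", "trustworthy", "untrustworthy"]))  # -> trustworthy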

Sabater and Sierra (2001) have proposed a modular trust and reputation model (ReGreT) for e-commerce environments. The model takes into consideration three different types of information sources: direct experiences, information from third-party agents and social structures.

Trust is determined by combining direct experiences with the reputation model. Direct trust is built from direct interactions, using information perceived by the agent itself. The reputation model is composed of specialized types of reputation: witness reputation (calculated from the reputation reported by witnesses), neighborhood reputation (calculated from information about the social relations between agents) and system reputation (calculated from roles and general properties). Witness reputation is calculated from information provided by other agents of the community. Neighborhood reputation is derived from the social environment of the agent and the relations between the agent and that environment. System reputation is based on objective features of the agent (for example, the agent's role in the society). These components are merged into a trust model based on direct knowledge and reputation. The proposed metrics are intuitive, but the framework does not include probabilistic or utility-based terms.
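The actual ReGreT aggregation weights each source by its reliability and uses fuzzy rules, so the following Python sketch is only an illustration of the general idea of merging direct trust with witness, neighborhood and system reputation; the weights and the function name are ours:

def regret_style_trust(direct, witness, neighborhood, system,
                       weights=(0.5, 0.25, 0.15, 0.1)):
    """Illustrative weighted merge of the four information sources, each in [-1, 1]."""
    w_d, w_w, w_n, w_s = weights
    return w_d * direct + w_w * witness + w_n * neighborhood + w_s * system

print(regret_style_trust(0.8, 0.6, 0.4, 0.9))  # -> 0.7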

The two models above try to combine trust derived from interaction with reported reputation or reputation scores, and they can generate higher or lower reputation scores according to the trustworthiness of the agent. But these models do not accommodate payoffs that vary over time. The frameworks can therefore determine a degree of trust, a subjective probability, but they are not grounded in decision theory and cannot ensure payoff maximization.

Trust and reputation models try to decompose trust into fundamental aspects that can be modeled and learned. The notion of decomposing trust is not new. Marsh (1994) introduced a computational trust model in distributed artificial intelligence: an artificial agent can incorporate trust and then make trust-based decisions. This model represents trust as a continuous variable over the range [-1, +1). Three types of trust are distinguished: basic trust (calculated from all the agent's experiences), general trust (the trust in another agent without taking a specific situation into account) and situational trust (the trust in another agent in a specific situation). Three statistical methods are proposed to estimate general trust, each determining a different type of agent: the maximum method leads to an optimistic agent (which takes the maximum trust value over its experiences), the minimum method leads to a pessimistic agent (which takes the minimum trust value over its experiences), and the mean method leads to a realistic agent (which takes the mean trust value over its experiences). Trust values are used in the agent's decision whether or not to cooperate with another agent.
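A minimal Python sketch of the three estimation methods, assuming past situational trust values in Marsh's range [-1, +1) (the history values are invented for illustration):

def general_trust(experiences, method="mean"):
    """Estimate general trust from past trust values in [-1, +1)."""
    if method == "max":   # optimistic agent
        return max(experiences)
    if method == "min":   # pessimistic agent
        return min(experiences)
    return sum(experiences) / len(experiences)  # realistic agent

history = [0.2, 0.7, -0.1]
print(general_trust(history, "max"), general_trust(history, "min"), general_trust(history))
# optimistic, pessimistic and realistic estimates, respectively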

Castelfranchi (2003) tries to capture a part of integrity, suggesting a commitment or willingness to cooperate with other agents. Disposition is incorporated as a belief component in a trust model based on fuzzy cognitive maps. The application of fuzzy cognitive maps assumes that both the map's graph and its weights are determined by an expert; thus, there are no explicit criteria establishing the conditions under which the constructed cognitive map is optimal. Furthermore, decisions taken using cognitive maps require establishing a threshold belief for the decision to trust at any payoff level.

3. THE SMITH – DESJARDINS MODEL

The problem approached is to determine how an agent can maximize its returns over interactions with another agent, knowing that the partner does not always succeed in accomplishing its actions. The specific problem addressed in this paper is a variation of the Iterated Prisoner's Dilemma.

The model proposed by Smith and desJardins (2008) tries to incorporate aspects of competence and integrity into a formal framework for making decisions. The framework applies game-theoretic concepts to model and learn about other agents, based on previous experiences.

This approach can separate the influence of competence from that of commitment, and establishes a utility criterion for making trust-based decisions when payoffs vary over time.

Experimental results show the advantages of distinguishing between integrity and competence: agents with accurate estimates of other agents' integrity and competence outperform strategies that do not model these factors.

Competence is represented as a probability, denoted $c$. Each agent's individual competence should be taken into account for every decision. In the Iterated Prisoner's Dilemma games considered here, all agents have the same probability of choosing the wrong strategy, so $c$ is the same for all agents. Integrity is represented by an agent's belief in a discount factor, denoted $\delta$. For a rational agent, integrity is directly related to beliefs concerning future interactions with other agents. For a low discount factor $\delta$, the temptation to defect will overcome any advantage of long-term cooperation.

An agent must have beliefs about its own competence, $\hat{c}_S$, and about the game discount factor, $\delta_S$, together with the corresponding beliefs about the partner, $\hat{c}_P$ and $\delta_P$. An agent can defect against a partner who overestimates the number of future interactions, if the immediate payoff is high enough. Once an agent cheats a partner, that partner will defect in all subsequent rounds.

The experiments use variations of the Iterated Prisoner's Dilemma; the payoff matrix is shown in Table 1. The Prisoner's Dilemma payoffs are symmetric, and the values expected by the agents depend on their beliefs: C = 4 for mutual cooperation, D = 2 for mutual defection, T = 7 for defecting when the partner cooperates and S = 0 for cooperating when the partner defects.

Table 1. Prisoner's Dilemma payoff matrix (payoff to the deciding agent; columns give the agent's own action, rows the partner's action)

                        Agent cooperates    Agent defects
Partner cooperates      C = 4               T = 7
Partner defects         S = 0               D = 2
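For reference, Table 1 can be encoded directly. This is only a small illustrative sketch (the dictionary and its name are ours), returning the payoff received by the deciding agent given its own action and the partner's action:

# Payoff to the deciding agent, keyed by (own action, partner action),
# using the values from Table 1: C = 4, D = 2, T = 7, S = 0.
PAYOFF = {
    ("C", "C"): 4,  # mutual cooperation (C)
    ("C", "D"): 0,  # cooperating against a defector (S)
    ("D", "C"): 7,  # defecting against a cooperator (T)
    ("D", "D"): 2,  # mutual defection (D)
}

print(PAYOFF[("D", "C")])  # -> 7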

An agent can model a partner whose actions succeed only sometimes. The realized payoffs depend on each agent's competence, $c$. The discount factor $\delta$ represents an agent's belief about the probability of each successive opportunity to interact with the partner. Knowing the agents' competencies and discount factors, the current payoffs and the expected number of future rounds, an agent can make a decision by estimating the expected returns and choosing the action with the highest expected value.

3.1 The decision rule

The players estimate the expected payoffs of each strategy (cooperate or defect) based on their current knowledge and choose the strategy corresponding to the highest payoff. The decision of a rational player is given by the rule:

If $\hat{C}_\delta \geq \hat{D}_\delta$ then COOPERATE, else DEFECT.
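A minimal Python sketch of this rule, assuming the discounted returns $\hat{C}_\delta$ and $\hat{D}_\delta$ defined in Section 3.2 have already been computed (the function and argument names are ours):

def trust_decision(coop_return, defect_return):
    """Rational choice between the two discounted returns."""
    return "COOPERATE" if coop_return >= defect_return else "DEFECT"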

3.2 Discounted returns

The players estimate $\hat{C}_\delta$ and $\hat{D}_\delta$ as the total expected return of each strategy, discounted over the remainder of the game, using the competence-adjusted estimates of the payoff matrix ($\hat{C}$, $\hat{D}$, $\hat{T}$).

The discounted returns follow an infinite geometric series. For example, the expected return of mutual cooperation with a discount rate $\delta < 1$ is:

$$\hat{C}_\delta = \hat{C} + \delta \hat{C} + \delta^2 \hat{C} + \dots = \lim_{t \to \infty} \sum_{i=0}^{t} \delta^i \hat{C} = \frac{\hat{C}}{1-\delta}$$

When there is no payoff variation, the discounted values of cooperation and defection are:

$$\hat{C}_\delta = \frac{\hat{C}}{1-\delta}, \qquad \hat{D}_\delta = \hat{T} + \frac{\delta \hat{D}}{1-\delta}$$

The discounted value $\hat{C}_\delta$ is optimistic, because it assumes mutual cooperation for the rest of the game if the trust-based player cooperates, and the discounted value $\hat{D}_\delta$ is pessimistic, because it assumes the partner will defect for the rest of the game once it has been cheated by the trust-based player. If the partner's memory is short, the penalty for defecting will be overestimated. Both assumptions encourage cooperation.

For a unit discount factor, $\delta = 1$, the game is assumed to be endless, and as long as the partner also believes this, cooperation is the best strategy for a rational player.

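As a worked illustration (the numbers are ours, not from the original): take the payoffs from Table 1 and assume perfect competence, so that $\hat{C} = 4$, $\hat{D} = 2$ and $\hat{T} = 7$. With $\delta = 0.9$,

$$\hat{C}_\delta = \frac{4}{1 - 0.9} = 40, \qquad \hat{D}_\delta = 7 + \frac{0.9 \cdot 2}{1 - 0.9} = 25,$$

so the decision rule of Section 3.1 selects cooperation. With $\delta = 0.5$ the same computation gives $\hat{C}_\delta = 8$ and $\hat{D}_\delta = 9$, and defection becomes the rational choice, illustrating how a low discount factor undermines cooperation.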

3.3 Estimated payoffs

The estimated payoffs for the four possible outcomes in the payoff matrix must be adjusted by the estimate of the agent's own competence, $\hat{c}_S$, and of the other agent's competence, $\hat{c}_P$. The expected payoff for mutual cooperation, $\hat{C}$, is:

$$\hat{C} = \hat{c}_S \hat{c}_P C + \hat{c}_S (1-\hat{c}_P) S + (1-\hat{c}_S) \hat{c}_P T + (1-\hat{c}_S)(1-\hat{c}_P) D$$

As long as agents' competencies are imperfect, any of the four outcomes is possible. Thus, the expected payoff is the sum of the four payoffs, weighted by their respective probabilities.

The expected payoff for mutual defection, $\hat{D}$, is:

$$\hat{D} = \hat{c}_S \hat{c}_P D + \hat{c}_S (1-\hat{c}_P) T + (1-\hat{c}_S) \hat{c}_P S + (1-\hat{c}_S)(1-\hat{c}_P) C$$

And the expected payoff for defecting when the partner cooperates, $\hat{T}$, is:

$$\hat{T} = \hat{c}_S \hat{c}_P T + \hat{c}_S (1-\hat{c}_P) D + (1-\hat{c}_S) \hat{c}_P C + (1-\hat{c}_S)(1-\hat{c}_P) S$$

For the final estimation of the discounted payoffs after cooperation and defection, the expected payoff $\hat{S}$ is never needed.
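The following Python sketch puts Sections 3.1-3.3 together: it computes the competence-adjusted expected payoffs, the discounted returns and the resulting trust-based decision. It is an illustrative reconstruction from the formulas above, not the authors' implementation; all names and the numeric examples are ours:

def expected_payoffs(c_S, c_P, C=4.0, D=2.0, T=7.0, S=0.0):
    """Competence-adjusted expected payoffs (C_hat, D_hat, T_hat) from Section 3.3."""
    C_hat = c_S * c_P * C + c_S * (1 - c_P) * S + (1 - c_S) * c_P * T + (1 - c_S) * (1 - c_P) * D
    D_hat = c_S * c_P * D + c_S * (1 - c_P) * T + (1 - c_S) * c_P * S + (1 - c_S) * (1 - c_P) * C
    T_hat = c_S * c_P * T + c_S * (1 - c_P) * D + (1 - c_S) * c_P * C + (1 - c_S) * (1 - c_P) * S
    return C_hat, D_hat, T_hat

def decide(c_S, c_P, delta):
    """Cooperate if the discounted return of cooperation beats a one-shot defection (Sections 3.1-3.2)."""
    C_hat, D_hat, T_hat = expected_payoffs(c_S, c_P)
    coop_return = C_hat / (1 - delta)                     # mutual cooperation forever
    defect_return = T_hat + delta * D_hat / (1 - delta)   # temptation once, then mutual defection
    return "COOPERATE" if coop_return >= defect_return else "DEFECT"

print(decide(c_S=0.9, c_P=0.9, delta=0.9))  # -> COOPERATE
print(decide(c_S=0.9, c_P=0.9, delta=0.3))  # -> DEFECT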

4. SIMULATION

The simulations start from a base configuration of the system with an equal number of agents of each type.

The strategies are as follows (a short sketch of their decision rules is given after the list):

RANDOM - randomly cooperate or defect

COOPERATE - always cooperate

DEFECT - always defect

TIT-FOR-TAT - if a partner cooperates on this interaction, cooperate on the next interaction with them; if a partner defects on this interaction, defect on the next interaction with them; initially cooperate

UNFORGIVING - cooperate until a partner defects once, then always defect in each interaction with them

UNKNOWN - this strategy is included as a placeholder for experimenting with new strategies; it currently defaults to TIT-FOR-TAT
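A minimal Python sketch of these decision rules (the NetLogo model itself is not reproduced here; the function names and the "C"/"D" move encoding are ours). Each strategy maps the history of a partner's past moves to the agent's next move:

import random

def random_strategy(partner_history):
    return random.choice(["C", "D"])

def cooperate(partner_history):
    return "C"

def defect(partner_history):
    return "D"

def tit_for_tat(partner_history):
    # Cooperate first, then copy the partner's previous move.
    return "C" if not partner_history else partner_history[-1]

def unforgiving(partner_history):
    # Cooperate until the partner defects once, then defect forever.
    return "D" if "D" in partner_history else "C"

unknown = tit_for_tat  # the UNKNOWN strategy currently defaults to TIT-FOR-TAT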


Figure 1. Simulation results for the base configuration (equal numbers of agents of each type)

In the base configuration of the system (Figure 1), in the short term the defecting agents obtain higher benefits than the other agents. In the long term, their benefits fall below the level obtained by the tit-for-tat and unforgiving agents, but remain above that of the cooperating agents.

Figure 2. Simulation results for a reduced number of cooperating agents

In the configuration with a reduced number of cooperating agents (Figure 2), as in the base configuration, the defecting agents obtain the highest benefits. In the long term their benefits fall below those obtained by the tit-for-tat and unforgiving agents, and even below the benefits obtained by the cooperating ones.


Figure 3. Simulation results for a reduced number of defecting agents

In the case with a reduced number of defecting agents (Figure 3), the defecting agents' tendency to obtain the highest benefits is maintained. In the long term the agents' benefits tend to converge, but the defecting agents' benefits remain the highest and the cooperating agents' benefits the lowest.

Figure 4. Simulation results for a reduced number of tit-for-tat agents

The configuration with a reduced number of tit-for-tat agents (Figure 4) suggests, in the short term, that defecting behavior is advantageous. In the long term the defecting agents' benefits diminish, but their level remains the highest among all strategies. The lowest benefits are obtained by the cooperating agents.


Figure 5. Simulation results for a reduced number of unforgiving agents

In the configuration with a reduced number of unforgiving agents (Figure 5), the highest benefits are obtained by the defecting agents. The tit-for-tat and unforgiving agents obtain similar benefits, and the cooperating agents obtain the lowest benefits. In the long term, the ordering of the benefit levels is maintained, but the gap between the highest and the lowest benefits diminishes.

For the configurations considered, defecting behavior appears to be the best choice for the agents, in both the short and the long term. The cooperating agents' benefits are the lowest in all configurations, in both the short and the long term. The tit-for-tat and unforgiving behaviors are viable alternatives, as their benefits remain at good levels in both the short and the long term.

5. CONCLUSIONS

Agents and multi-agent systems offer new methods for the analysis, modeling and implementation of complex systems. Multi-agent systems can approach problems that have multiple solving methods, multiple structuring possibilities or multiple solving entities, similar to distributed systems. They combine the advantages of distributed and concurrent problem solving with the ability to represent complex methods of interaction. Trust has an important role in human society because of its social component. An agent's behavior can be shaped by other cooperating agents, building a reputation mechanism. Reputation represents the perception an agent has of another agent's intentions, or an expectation regarding that agent's behavior. Integrity and competence are distinct and must be treated as separate components in order to make decisions based on trust and reputation. Trust models try to decompose trust into fundamental characteristics that can be modeled and learned. Experimental results reveal the benefits of distinguishing between integrity and competence: agents with accurate estimates of other agents' integrity and competence can outperform strategies that do not model these factors.

REFERENCES

1. Abdul-Rahman, A., Hailes, S., 2000, "Supporting Trust in Virtual Communities", Proceedings of the Hawaii International Conference on System Sciences, Maui, Hawaii, pp. 8-20.

2. Castelfranchi, C., Falcone, R., 1998, "Principles of Trust for MAS: Cognitive Anatomy, Social Importance, and Quantification", Proceedings of the 3rd International Conference on Multi-Agent Systems, pp. 1-8.

3. Castelfranchi, C., Falcone, R., Marzo, F., 2007, "Cognitive Model of Trust as Relational Capital", 10th Workshop on Trust, Privacy, Deception and Fraud in Agent Societies, Honolulu, Hawaii.

4. Falcone, R., Castelfranchi, C., 2004, "Trust Dynamics: How Trust is Influenced by Direct Experiences and by Trust Itself", AAMAS 2004, New York, USA.

5. Gambetta, D., 2000, "Can We Trust Trust?", in Gambetta, D. (ed.), Trust: Making and Breaking Cooperative Relations, University of Oxford, p. 4.

6. Luck, M., McBurney, P., Preist, C., (eds.) 2003, Agent Technology: Enabling Next Generation Computing – A Roadmap for Agent Based Computing, AgentLink.

7. Marsh, S. P., 1994, "Formalizing Trust as a Computational Concept", PhD Thesis, Department of Computing Science and Mathematics, University of Stirling.

8. Sabater, J., Sierra, C., 2001, "REGRET: A Reputation Model for Gregarious Societies", Proceedings of the 5th International Conference on Autonomous Agents, Montreal, Quebec, Canada, pp. 1-9.

9. Scarlat, E., (ed.) 2005, Agenţi şi Modelarea bazată pe Agenţi în Economie, Editura ASE, Bucharest.

10. Smith, M. J., desJardins, M., 2008, "Learning to Trust in the Competence and Commitment of Agents", Journal of Autonomous Agents and Multi-Agent Systems, Vol. 18, No. 1, Springer Netherlands, pp. 7-11.
