Collective Learning in Social Coordination Games

Akira Namatame, Hiroshi Sato
Dept. of Computer Science, National Defense Academy
Yokosuka, 239-8686, JAPAN
E-mail: {nama, hsato}@cc.nda.ac.jp

Abstract

An important aspect of collective intelligence is the learning strategy adopted by each agent. An interesting problem that has been widely investigated is: under what circumstances will agents converge to some particular equilibrium? We consider networks of agents in which each agent faces strategic decision problems. We investigate the dynamics of collective decision when each agent adapts its strategy of interaction to its neighbors. We are interested in showing how the society gropes its way towards an equilibrium situation. We show that the society selects the most efficient equilibrium among multiple equilibria when the agents composing it learn from each other as collective learning and co-evolve their strategies over time. The possibility of mutual learning achieves social optimality where the model with random matching may result in inefficiency.

Keywords: collective learning, crossover, coordination games, equilibrium selection

1. Introduction

The concepts and techniques of game theory have been used extensively in dealing with strategic decision-making situations. However, game theory has been unsuccessful in explaining how agents know which equilibrium will be realized if a game has multiple equally plausible equilibria [2]. Introspective or deductive theories that attempt to explain the problem of equilibrium selection impose very strong informational assumptions on the decision-making process [3]. In most game-theoretic models, agents are assumed to have perfect knowledge of the consequences of their decisions.

As a consequence, attention has shifted to evolutionary explanations motivated by evolutionary game theory [9]. Evolutionary game theory assumes that the game is repeated by trial and error. The evolutionary selection process operates over time on the population, and fitter agents or strategies can be selected by natural selection. That is, selection pressure favors agents that are fitter, i.e., whose strategy yields a higher payoff against the average of the population. Two features of the evolutionary approach also distinguish it from the introspective approach of game theory. First, agents are not assumed to be so rational or knowledgeable as to correctly guess or anticipate the other agents' strategies. Second, an explicit dynamic process is specified describing how agents adjust their strategies over time as they learn from their experiences [11].

This paper considers the dynamics of collective decision when agents adapt their strategic decisions to their neighbors. We especially consider the networks of strategic decisions shown in Figure 1, in which each node represents an agent and each link represents social interaction between two agents. We model the strategic decisions of agents as follows: each agent faces a binary decision with externalities. An externality occurs if one cares about others' decisions, and those decisions also affect each agent's strategic decision. We consider networks of social games in which each agent is repeatedly matched with its neighbors to play two-person coordination games. We investigate how the society gropes its way towards equilibrium in an imperfect world where agents interact locally. Agents do not perform optimization calculations; instead they observe the current performance of their neighbors and learn from the most successful one. We attempt to probe deeper into these issues by modeling the choices made by agents with their own internal models. The term evolutionary dynamics often refers to systems that exhibit a time evolution in which the character of the dynamics may change due to internal mechanisms [4]. We focus on evolutionary dynamics that may change in time according to each agent's local rule of interaction. We then model the microscopic evolutionary dynamics induced by both agents' conscious decisions and adaptive decisions.

There is no presumption that the self-interested behavior of agents should usually lead to collectively satisfactory results. How well each agent does for itself in adapting to its social environment is not the same thing as how satisfactory a social environment they collectively create for themselves. We explore the mechanism by which emergent collective behavior can be manipulated in order to achieve desirable situations. We discuss the role of collective learning by considering the problem of equilibrium selection. We use the term emergent to denote stable macroscopic patterns arising from the local rules of interaction. We investigate the mechanism by which collective learning produces coherent and systematic emergent behavior of social efficiency.

Fig. 1. Networks of strategic decisions: social games

2. Social Coordination Games and Equilibrium Selection

Social coordination games are modeled to describe social situations in which agents are expected to select the strategy that the majority selects. Coordination games have multiple equilibria with the possibility of coordinating on different strategies, which is known as the problem of mis-coordination. Traditional game theory is silent on how agents know which equilibrium should be realized if a game has multiple equally plausible equilibria, where these can be Pareto-ranked. Game theory has also been unsuccessful in explaining how each agent should learn in order to improve an equilibrium situation [2]. As a specific example of a coordination game, we formulate the endogenous choice problem of partners.
Each link in Fig. 1 represents social interaction, formulated as the following coordination game. Agents periodically have the discretion to add or sever links to their neighbors. All possible outcomes between two agents are given by the payoff matrix in Table 1. If both agents add the link, they each receive the payoff α. The parameter β represents the loss from a one-sided addition to the network: if the partner severs the link, the connection cannot be made, and the effort of adding to the network becomes meaningless. The social interaction model is characterized by the fact that each agent can completely control with whom it interacts. With such endogenous interaction patterns, the question is whether the society may select an efficient equilibrium, in which every agent is linked as shown in Figure 1, starting from any initial configuration of the network.

The payoff matrix in Table 1 has two equilibria in pure strategies, (S1, S1) and (S2, S2), and one mixed-strategy equilibrium. The most preferable equilibrium, defined by Pareto dominance, is (S1, S1), which dominates the other equilibria. There is another equilibrium concept, risk dominance. If α − β < 0, the equilibrium (S2, S2) risk-dominates (S1, S1). Equivalently, if the value θ = β/(α + β), defined as the threshold, is less than 0.5, the Pareto-dominant equilibrium (S1, S1) also risk-dominates (S2, S2); however, if θ is greater than 0.5, the equilibrium (S2, S2) risk-dominates (S1, S1). The optimal situation for the society is realized if all agents choose the Pareto-optimal strategy S1. How do rational agents make their decisions when the Pareto-dominant and risk-dominant strategies differ? Absent an explanation of how agents coordinate their expectations on the multiple equilibria, some agents expect the equilibrium (S1, S1) and others expect (S2, S2), and then they often face the problem of mis-coordination.
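The threshold and risk-dominance analysis above can be sketched in a few lines of code. This is a minimal illustration, assuming the payoff matrix implied by the text (mutual addition pays α to each agent, one-sided addition costs the adder β, and severing pays nothing); the function names and the sample values of α and β are ours.

```python
def threshold(alpha, beta):
    """Mixed-equilibrium threshold theta = beta / (alpha + beta)."""
    return beta / (alpha + beta)

def risk_dominant(alpha, beta):
    """Return the risk-dominant equilibrium by comparing Nash products:
    (alpha - 0)^2 for (S1, S1) versus (0 - (-beta))^2 for (S2, S2)."""
    return "(S1,S1)" if alpha * alpha > beta * beta else "(S2,S2)"

# (S1, S1) is always Pareto-dominant for alpha > 0; it is also
# risk-dominant only when theta < 0.5, i.e. when alpha > beta.
print(threshold(3, 1))      # theta = 0.25 < 0.5
print(risk_dominant(3, 1))  # (S1,S1): Pareto and risk dominance coincide
print(risk_dominant(1, 3))  # (S2,S2): risk dominance conflicts with efficiency
```

The last case, θ > 0.5, is exactly the one where equilibrium selection becomes interesting: efficiency and strategic safety pull in opposite directions.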
Even if we consider social coordination games in an evolutionary setting, the situation remains the same. The risk-dominant equilibrium is selected from among the set of Nash equilibria, even if it is inefficient. The basin of attraction of the risk-dominant equilibrium is larger than that of the Pareto-dominant equilibrium. In addition, if agents sometimes make mistakes, the society occasionally switches from one equilibrium to another. As the likelihood of mistakes goes to zero, it is known that the risk-dominant equilibrium becomes stochastically stable [6][8].

Table 1. The payoff matrix of a coordination game

                      The other's strategy
  Own strategy      S1 (add)      S2 (sever)
  S1 (add)          (α, α)        (−β, 0)
  S2 (sever)        (0, −β)       (0, 0)

3. Spatial Evolutionary Dynamics with Collective Learning

In order to formulate social interactions in the large, we have two fundamental models: random matching and local matching [1]. The approach of random (or uniform) matching is modeled as follows: at each time period, every agent is assumed to match (interact) with one agent drawn at random from the society. An important assumption of random matching is that agents receive knowledge of the current strategy distribution. Agents can then calculate their best strategy based on information about what other agents have done in the past. What happens when an agent does not have all of this information? We especially consider a structure of territories in which the entire territory is divided up so that each agent interacts with its eight neighbors, as in Fig. 1 [7]. Agents adopt the most successful strategy as a guide for their own decisions (individual learning). Hence their success depends in large part on how well they learn from their neighbors. If a neighbor is doing well, its strategy can be imitated by all others (collective learning). Each agent is modeled to be matched several times with the same partner, and the strategies for the same partner are coded as a list.
A part of the list is replaced with that of the most successful agent, as shown in Fig. 2 (crossover) [5][10]. Neighbors can serve another function as well: if a neighbor is doing well, its behavior can be imitated, and successful strategies can spread throughout a population from neighbor to neighbor. The consequences of their behaviors may take some time to have an effect on agents with whom they are not directly linked.

Fig. 2. Illustration of crossover as individual learning

4. Simulation Results

We endow agents with a simple form of collective learning and describe the evolutionary dynamics that magnifies tendencies toward better situations. We arrange agents on an area of 50 × 50 (N = 2,500 agents) as a lattice model with no gaps, where the four corners and the edges of the area connect to the opposite side (a torus). At each time period, each agent plays the coordination game in Table 1 with its 8 neighbors. We investigate the role of collective learning in social games. We demonstrate various evolutionary phenomena by studying the behavior of two completely different models: the mean-field model in which all agents interact with all others (or equivalently the random matching model in which each agent interacts with a randomly chosen partner), and the local interaction model on the lattice.

In Fig. 4, we show the simulation results obtained by changing the parameters α and β in Table 1. The x-axis represents the threshold value θ = β/(α + β). The y-axis represents the ratio of Pareto-optimal strategies (S1) in the society at equilibrium. The dotted line in Fig. 4 represents the result of the evolutionary approach. If θ < 0.5, the Pareto-optimal strategy S1 also risk-dominates S2, and therefore all agents finally choose S1. However, if θ > 0.5, all agents finally choose the risk-dominant strategy S2 in the mean-field model.
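The lattice dynamics just described can be sketched as follows. This is a simplified illustration, not the authors' implementation: crossover on per-partner strategy lists is reduced here to whole-strategy imitation of the best scorer in the Moore neighborhood, and the values of ALPHA, BETA, EPS and the number of steps are illustrative assumptions (the 50 × 50 torus, 8 neighbors, and mistake probability ε follow the text).

```python
import random

N, ALPHA, BETA, EPS = 50, 3.0, 1.0, 0.01
S1, S2 = 1, 0  # add / sever

def payoff(own, other):
    # Payoff matrix implied by the text: (alpha, alpha), (-beta, 0), (0, 0).
    if own == S1:
        return ALPHA if other == S1 else -BETA
    return 0.0

def neighbors(i, j):
    # Moore neighborhood on a torus: 8 surrounding cells, wrapping at edges.
    return [((i + di) % N, (j + dj) % N)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]

def step(grid):
    # 1. Each agent accumulates payoff against its 8 neighbors.
    score = {(i, j): sum(payoff(grid[i][j], grid[x][y])
                         for x, y in neighbors(i, j))
             for i in range(N) for j in range(N)}
    # 2. Each agent imitates the best scorer among itself and its neighbors
    #    (collective learning), flipping strategy with mistake probability EPS.
    new = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            best = max([(i, j)] + neighbors(i, j), key=lambda c: score[c])
            s = grid[best[0]][best[1]]
            new[i][j] = s if random.random() > EPS else 1 - s
    return new

random.seed(0)
grid = [[random.randint(0, 1) for _ in range(N)] for _ in range(N)]
for _ in range(30):
    grid = step(grid)
ratio = sum(map(sum, grid)) / (N * N)  # fraction playing S1 at the end
```

With θ = 0.25 as parameterized here, imitation of successful neighbors lets clusters of S1 players grow, which is the mechanism behind the lattice results reported below.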
With the lattice model with collective learning, if the threshold value θ is less than 0.8, then almost every agent finally chooses S1. However, if the threshold θ is greater than 0.8, more agents come to choose the risk-dominant strategy S2.

We also considered the effect of mistakes on equilibrium selection. We assume that agents sometimes make mistakes in the implementation of their strategies. That is, with some probability ε, an agent does not use the current strategy determined by its rule of interaction, but the other one. We use noise levels of ε = 0.01, ε = 0.025, and ε = 0.05. With a high rate of mistakes (ε = 0.05), the risk-dominant strategy S2 comes to dominate if the threshold value θ is large (θ > 0.8). However, with low levels of mistakes, the Pareto-dominant strategy S1 still dominates the society.

Fig. 4. The ratio of the Pareto-dominant strategy (S1) at equilibrium

The differences between the mean-field model and the lattice model with collective learning are evident in all these cases. We found that the collective learning process leads the society of rational agents to the Pareto-dominant equilibrium situation in most cases. In this context, even when mistakes or errors in choosing strategies are introduced, we found that social coordination games with collective learning almost always converge to the Pareto-dominant equilibrium.

5. Conclusion

This paper discussed strategic models in which agents interact locally with their neighbors and learn from the most successful one. We compared the global model, in which all agents can interact with each other, and the lattice model with local interaction, in which agents can only interact with their immediate neighbors.
As an example, we showed how social networks evolve to a socially optimal situation, where everybody is fully connected, starting from a randomly disconnected situation. We also discussed the role of collective learning in social games. The hypotheses we employed here reflect the limited ability of agents. The learning strategy employed here is crossover, and agents learn from the most successful neighbor. The methodology of collective learning is very slow to evolve, but it is essential, especially in imperfect worlds. We investigated how the society selects the most efficient equilibrium among multiple equilibria when the agents composing it learn from each other, and showed that collective learning is essential for realizing social efficiency.

6. References

[1] Cohendet, P. (ed.): The Economics of Networks, Springer, 1998.
[2] Fudenberg, D. and Levine, D., The Theory of Learning in Games, The MIT Press, 1998.
[3] Harsanyi, J. and Selten, R., A General Theory of Equilibrium Selection in Games, MIT Press, 1988.
[4] Hofbauer, J. and Sigmund, K., Evolutionary Games and Population Dynamics, Cambridge Univ. Press, 1998.
[5] Iwanaga, S. and Namatame, A., "Decentralized Decision Networks", The Third Int. Conf. on Information Fusion, pp. 571-577, 2000.
[6] Kandori, M. and Mailath, G., "Learning, Mutation and Long Run Equilibria in Games", Econometrica, Vol. 61, pp. 29-56, 1993.
[7] Nowak, M. A., et al., "The Arithmetics of Mutual Help", Scientific American, June 1995.
[8] Samuelson, L., Evolutionary Games and Equilibrium Selection, The MIT Press, 1998.
[9] Smith, J. M., Evolution and the Theory of Games, Cambridge University Press, 1982.
[10] Uno, K. and Namatame, A., "An Evolutionary Design of the Networks of Mutual Reliability", CEC'99, Washington D.C., pp. 1717-1723, 1999.
[11] Uno, K. and Namatame, A., "Evolutionary Behaviors Emerged through Strategic Interactions in the Large", GECCO'99, Orlando, Florida, pp. 1414-1421, 1999.
[12] Weibull, J., Evolutionary Game Theory, The MIT Press, 1996.