Collaborative Learning in Strategic Environments

Akira Namatame, Noriko Tanoura, Hiroshi Sato
Dept. of Computer Science, National Defense Academy
Yokosuka, 239-8686, JAPAN
E-mail: {nama, hsato}@cc.nda.ac.jp
Abstract
There is no presumption that the collective behavior of interacting agents leads to collectively satisfactory results. How well agents can adapt to their social environment is different from how satisfactory a social environment they collectively create. In this paper, we attempt to probe a deeper understanding of this issue by specifying how agents interact by adapting their behavior. We consider problems of asymmetric coordination, which are formulated as minority games, and we address the following question: how do interacting agents realize efficient coordination, without any central authority, through self-organizing macroscopic order from the bottom up? We investigate several types of learning methodologies, including a new model, give-and-take learning, in which agents yield to others if they gain and randomize their actions if they lose or do not gain. We show that evolutionary learning is the most efficient in asymmetric strategic environments.

Keywords: asymmetric coordination, social efficiency, evolutionary learning, give-and-take learning
1. Introduction
There are many situations where interacting agents can benefit from coordinating their actions. Social interactions pose many coordination problems for individuals. Individuals face problems of sharing and distributing limited resources in an efficient way. Consider a competitive routing problem of networks in which the paths from sources to destinations have to be established by multiple agents. In the context of traffic networks, for instance, agents have to determine their routes independently, and in telecommunication networks, they have to decide what fraction of their traffic to send on each link of the network.

Coordination implies that increased effort by some agents leads the remaining agents to follow suit, which gives rise to multiplier effects. We classify this type of coordination as symmetric coordination [3]. Coordination is also necessary to ensure that individual actions are carried out with little conflict. We classify this type of coordination as asymmetric coordination [7]. Consider the following situation: a collection of agents has to travel using one of the routes A or B. If each agent gains a payoff when he chooses the same route as the majority, the coordination is classified as symmetric coordination. On the other hand, if each agent gains a payoff when he chooses the route opposite to the majority, the coordination is classified as asymmetric coordination.

Coordination problems are characterized by many equilibria, and agents often face the problem of coordination failure resulting from their independent inductive processes [1][4]. An interesting problem is then under what circumstances a collection of agents realizes a particular stable situation, and whether that situation satisfies the conditions of social efficiency.

In recent years, this issue has been addressed by formulating minority games (MG) [2][10]. However, the growing literature on MG treats agents as automata, merely responding to changing environments without deliberating about individual decisions [13]. There is no presumption that the self-interested behavior of agents should usually lead to collectively satisfactory results [8][9]. How well each agent does in adapting to its social environment is not the same thing as how satisfactory a social environment they collectively create for themselves. An interesting problem is then under what circumstances a society of rational agents realizes social efficiency. Solutions to these problems invoke the intervention of an authority who finds the social optimum and imposes the optimal behavior on agents. While such an optimal solution may be easy to find, it may be difficult to enforce in practical situations. Self-enforcing solutions, where agents achieve an optimal allocation of resources while pursuing their self-interests without any explicit agreement with others, are of great practical importance.

We are interested in a bottom-up approach that leads to more efficient coordination through more effective learning at the individual level [11]. Within the scope of our model, we create models in which agents make deliberate decisions by applying rational learning procedures. We explore the mechanism by which interacting agents get stuck at an inefficient equilibrium. While agents understand that the outcome is inefficient, each agent acting independently is powerless to manage decisions which reflect the collective activity. Agents also may not know what to do or how to make a decision. The design of efficient collective action is crucial in many fields. In collective activity, two types of activities may be necessary: each agent behaves as a member of society, while at the same time it behaves independently by adjusting its view and action. At the individual level, it learns to improve its action based on its own observations and experiences; at the same time, agents put forward their learnt knowledge for consideration by others. An important aspect of this coordination is the learning rule adopted by individuals.

2. Formalism of Asymmetric Coordination and Minority Games

The El Farol bar problem and its variants provide a clean and simple example of asymmetric coordination problems [1][4]. Brian Arthur used this very simple yet interesting problem to illustrate effective uses of inductive reasoning by heterogeneous agents. There is a bar called El Farol in the downtown of Santa Fe, and there are agents interested in going to the bar each night. All agents have identical preferences. Each of them will enjoy the night at El Farol very much if there are no more than the threshold number of agents in the bar; however, each of them will suffer miserably if there are more than the threshold number of agents. In Arthur's example, the total number of agents is N=100, and the threshold number is set to 60. The only information available to agents is the number of visitors to the bar on previous nights.

What makes this problem particularly interesting is that it is impossible for each agent to be perfectly rational, in the sense of correctly predicting the attendance on any given night. This is because if most agents predict that the attendance will be low (and therefore decide to attend), the attendance will actually be high, while if they predict the attendance to be high (and therefore decide not to attend), the attendance will be low. Agents make their choices by predicting ahead of time whether the attendance on the current night will exceed the capacity, and then take the appropriate course of action. Arthur investigated the number of agents attending the bar over time by using a diverse population of simple predictive rules and examined the dynamic driving force behind the resulting equilibrium. One interesting result was that, over time, the average attendance of the bar is about 60.

Arthur's El Farol model has been extended in the form of Minority Games (MG), which showed for the first time how an equilibrium can be reached using inductive learning [2]. The MG is played by a collection of rational agents G = {A_i : 1 ≤ i ≤ N}. Without losing generality, we can assume N is an odd number. On each period of the stage game, each agent must choose privately and independently between two strategies S = {S1, S2}. We represent the action of agent A_i at time period t by a_i(t) = 1 if he chooses S1, and a_i(t) = 0 if he chooses S2. Given the actions of all agents, the payoff of agent A_i is given by

(i) u_i(t) = 1 if a_i(t) = 1 and p(t) = Σ_{1≤j≤N} a_j(t)/N < 0.5
(ii) u_i(t) = 0 if a_i(t) = 1 and p(t) > 0.5
(2.1)
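As a minimal computational sketch of the payoff rule (2.1), the following Python fragment is purely illustrative (the population size N = 101 and the random actions are assumptions, not values from the paper):

import numpy as np

def minority_payoffs(actions):
    # Payoff rule (2.1): an agent is rewarded only when its side is the minority.
    # actions[i] = 1 if agent A_i chooses S1, and 0 if it chooses S2.
    p = actions.mean()                       # p(t) = sum_j a_j(t) / N
    if p < 0.5:                              # S1 is the minority side
        return np.where(actions == 1, 1, 0)
    return np.where(actions == 0, 1, 0)      # otherwise S2 is the minority side

# One illustrative round with N = 101 agents acting at random (N odd, so no ties).
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=101)
u = minority_payoffs(a)
print(a.sum(), "agents chose S1; total payoff =", u.sum())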
Each agent first receives the aggregate information p(t), which summarizes all agents' actions, and then decides whether to choose S1 or S2. Each agent is rewarded with a unitary payoff whenever the side he chooses happens to be chosen by the minority of the agents, while agents on the majority side get nothing. All agents have access to public information on the record of past values p(τ), τ < t. The past history available at time period t is denoted by μ(t). How do agents choose actions under the common information μ(t)? Agents may behave differently because of their personal beliefs about the outcome of the next time period, p(t+1), which depends only on what agents do at the next time period t+1; the past history μ(t) has no direct impact on it.
We analyze the structure of the MG to see what we should expect. Social efficiency can be measured by the average payoff of one agent over a long time period. Consider the extreme case where only one agent takes one side and all the others take the other side at each time period. The lucky agent gets a reward, the others get nothing, and the average payoff per agent is 1/N. The equally extreme situation is when (N-1)/2 agents are on one side and (N+1)/2 agents are on the other side, where the average payoff is about 0.5. From the society's point of view, the latter situation is preferable.
The MG is characterized by many solutions. It is easy to see that the game has asymmetric Nash equilibria in pure strategies in which exactly (N-1)/2 agents choose one of the two sides. The game also has a unique symmetric mixed-strategy Nash equilibrium in which each agent selects the two sides with equal probability. With this mixed strategy, each agent can expect a payoff of 0.5 in each time period, and the number of agents on a given side follows a binomial distribution with mean N/2 and variance N/4. The variance is also a measure of the degree of social efficiency: the higher the variance, the larger the fluctuations around N/2 and the corresponding aggregate welfare loss. Several learning rules have been found to lead to an efficient outcome when agents learn from each other [2][15].
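The mixed-equilibrium properties stated above can be checked numerically with the sketch below; N and the number of periods are illustrative assumptions:

import numpy as np

# Under the symmetric mixed equilibrium every agent plays S1 with probability 0.5,
# so the number of agents on side S1 is Binomial(N, 0.5): mean N/2, variance N/4.
rng = np.random.default_rng(1)
N, T = 101, 20000
n_s1 = rng.binomial(N, 0.5, size=T)        # agents choosing S1 in each period
minority = np.minimum(n_s1, N - n_s1)      # only the minority side is rewarded
print("mean of n_s1:", n_s1.mean())        # close to N/2
print("variance of n_s1:", n_s1.var())     # close to N/4
print("average payoff per agent:", minority.mean() / N)   # below 0.5: the welfare loss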
How exactly does an agent's utility depend on the total number of participants? We now show that the MG can be represented as a 2x2 game in which an agent plays against the aggregate of the society with the payoff matrix in Table 1. Suppose each agent plays with every other agent individually with the payoff matrix in Table 1. The average payoffs of agent A_i from playing S1 and S2 are then given by

U_i(S1) = 1 - Σ_{1≤j≤N} a_j(t)/N
U_i(S2) = Σ_{1≤j≤N} a_j(t)/N
(2.2)

where p(t) = Σ_{1≤j≤N} a_j(t)/N represents the proportion of agents who choose S1 at time period t.

Table 1: The payoff matrix of the minority game, with strategies S1 (go) and S2 (stay). Consistent with Eq. (2.2), the row player receives a payoff of 1 only when he chooses the side opposite to his opponent, and 0 otherwise.
Figure 1: Local matching with 8 neighbours.
The matching methodology also plays an important role in the outcome of the game. Uniform matching, random matching, and local matching are the matching methodologies used in much of the literature [3][6]. Under uniform matching, agents interact with all other agents. Agents are not assumed to be knowledgeable enough to correctly anticipate all other agents' choices; they can only access information about the aggregate behavior of the society. The optimal solution of the MG depends on the matching methodology. As shown in Figure 2, if each agent is matched with all other agents (uniform matching), he receives 0.5 as the optimal payoff. If each agent is matched with his eight neighbours (local matching), he receives 0.75 or 0.5 as the optimal payoff, depending on how the agents split into the two groups choosing S1 (dark circles) or S2 (white circles).
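A sketch of the local-matching payoffs discussed above; the 10x10 toroidal grid and the two example configurations are assumptions chosen only to reproduce the 0.75 and 0.5 values of Figure 2:

import numpy as np

def local_matching_payoff(grid, r, c):
    # Average payoff of the agent at (r, c) against its 8 Moore neighbours on a torus;
    # consistent with Table 1, a pairing pays 1 only when the two sides differ.
    n_rows, n_cols = grid.shape
    wins = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            if grid[(r + dr) % n_rows, (c + dc) % n_cols] != grid[r, c]:
                wins += 1
    return wins / 8

# Alternating rows of S1 (1) and S2 (0): six of eight neighbours are on the other side.
stripes = np.tile(np.arange(10) % 2, (10, 1)).T
print(local_matching_payoff(stripes, 3, 4))      # 0.75, as in Figure 2(a)

# Checkerboard split: only the four orthogonal neighbours differ.
checker = np.indices((10, 10)).sum(axis=0) % 2
print(local_matching_payoff(checker, 3, 4))      # 0.5, as in Figure 2(b)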
Figure 2: The optimal payoffs per agent with different matchings: (a), (b) local matching with 8 neighbours (average payoffs 0.75 and 0.5, respectively); (c) uniform or random matching (average payoff 0.5).

3. Learning Models in Strategic Environments

Game theory is typically based upon the assumption of rational choice. In our view, the reason for the dominance of the rational-choice approach is not that scholars think it is realistic. Nor is game theory used solely because it offers good advice to a decision maker; its unrealistic assumptions undermine much of its value as a basis for advice. The real advantage of the rational-choice assumption is that it often allows deduction. The main alternative to the assumption of rational choice is some form of adaptive behavior. Adaptation may be expected at the individual level through learning, or it may be at the population level through differential survival and reproduction of the more successful individuals. Either way, the consequences of adaptive processes are often very hard to deduce when there are many interacting agents following rules that have nonlinear effects.

We specify how agents adapt their behavior in response to others' behavior in strategic environments. An important issue in a strategic environment is the learning strategy adopted by each individual. Among the adaptive mechanisms that have been discussed in the learning literature are the following [5][6][12][14]:

(1) Reinforcement learning
Agents tend to adopt actions that yielded a higher payoff in the past, and to avoid actions that yielded a low payoff. Payoffs describe choice behavior, but it is one's own past payoffs that matter, not the payoffs of the others. The basic premise is that the probability of taking an action in the present increases with the payoff that resulted from taking that action in the past [6] (a minimal sketch is given at the end of this section).

(2) Best-response learning
Agents adopt actions that optimize their expected payoff given what they expect others to do. In this learning model, agents choose best replies to the empirical frequency distribution of the previous actions of the others.
(3) Evolutionary learning
Agents who use high-payoff strategies are at a reproductive advantage compared to agents who use low-payoff strategies; hence the latter decrease in frequency in the population over time (natural selection). In the standard model of this situation, agents are viewed as being genetically coded with a strategy, and selection pressure favors agents that are fitter, i.e., whose strategy yields a higher payoff against the population.
(4) Social learning
With social learning, agents learn from each other. For instance, agents may copy the behavior of others, especially behavior that is popular or yields high payoffs (imitation). In contrast to natural selection, the payoffs describe how agents make choices, and agents' payoffs must be observable by others for the model to make sense. Crossover of strategies is another type of social learning.
These learning models can be represented on the spectrum in Figure 3, with reinforcement learning at one end and social learning based on give-and-take at the other.

Figure 3: The spectrum of learning models: reinforcement, best-response, give-and-take, and evolution (crossover).
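As a concrete reading of adaptive rule (1) above, the sketch below uses payoff-proportional propensities in the spirit of Roth-Erev reinforcement; the initial propensity, population size, and random seed are illustrative assumptions rather than parameters from the paper:

import numpy as np

rng = np.random.default_rng(2)

class ReinforcementAgent:
    # Choice probabilities are proportional to accumulated propensities;
    # only the agent's own past payoffs matter, not those of the others.
    def __init__(self, initial_propensity=1.0):
        self.propensity = np.array([initial_propensity, initial_propensity])  # [S1, S2]

    def choose(self):
        prob = self.propensity / self.propensity.sum()
        return int(rng.choice([1, 0], p=prob))       # 1 means S1, 0 means S2

    def update(self, action, payoff):
        self.propensity[0 if action == 1 else 1] += payoff

# One illustrative MG round played by 101 reinforcement learners.
agents = [ReinforcementAgent() for _ in range(101)]
actions = np.array([agent.choose() for agent in agents])
p = actions.mean()
for agent, a in zip(agents, actions):
    reward = 1 if (a == 1 and p < 0.5) or (a == 0 and p > 0.5) else 0
    agent.update(a, reward)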
4. Best-Response Learning

With the assumption of rationality, agents are assumed to choose an optimal strategy based on a sample of information about what other agents have done in the past. Agents are able to calculate best replies and to learn the strategy distribution of play in the society; gradually, agents learn the strategy distribution in the society. With best-response learning, each agent calculates his best strategy based on information about the current distributional pattern of strategies [5]. At each period of time, each agent decides which strategy to choose given knowledge of the aggregate behavior of the population. Each agent thinks strategically, knowing that everyone else is also making a rational choice given its own information.

An important assumption is how agents receive knowledge of the current strategy distribution. For simplicity we assume that there is no strategic interaction across time; an agent's decision depends only on current information and not on any previous actions. The dynamics of the collective decision of agents are described as follows. Let p(t) be the proportion of agents who have chosen S1 at time t, and let U_i(S_k) be the expected payoff to A_i when he chooses S_k, k = 1, 2. The best response of agent A_i is then given as follows:

If U_i(S1) > U_i(S2), then choose S1
If U_i(S1) < U_i(S2), then choose S2
(4.1)

The expected payoffs of agent A_i are obtained as

U_i(S1) = q(1 - p(t)),  U_i(S2) = (1 - q)p(t)
(4.2)

where q is the threshold proportion (q = 0.5 in the basic MG). The best-response adaptive rule of agent A_i is then obtained as follows:

(i) If p(t) < q, then choose S1
(ii) If p(t) > q, then choose S2
(4.3)

The aggregate information p(t), the current status of the collective decision, has a significant effect on agents' rational decisions. The result of learning with the global best-response strategy is simple: starting from any initial condition p(0), it cycles between the two extreme situations where all agents choose S1 or all agents choose S2. Under this cyclic behavior, no agent ever gains, resulting in a huge waste. This result has considerable intuitive appeal since it displays situations where rational individual action, in pursuit of well-defined preferences, leads to undesirable outcomes.
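A minimal sketch of the best-response dynamics (4.3) that reproduces the cycling described above; the initial condition p(0) = 0.3 and the number of periods are arbitrary assumptions:

import numpy as np

def best_response_dynamics(N=101, q=0.5, p0=0.3, periods=6):
    # Rule (4.3): every agent chooses S1 if p(t) < q and S2 if p(t) > q.
    # Because all agents react to the same aggregate p(t), they all switch together.
    p = p0
    history = []
    for _ in range(periods):
        actions = np.ones(N, dtype=int) if p < q else np.zeros(N, dtype=int)
        p = actions.mean()               # the aggregate information for the next period
        history.append(p)
    return history

print(best_response_dynamics())          # [1.0, 0.0, 1.0, 0.0, ...]: no agent ever gains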
5. Collaborative Learning with Give-and-Take

In this section, we propose give-and-take learning, which departs from the conventional assumption that agents update their behaviors in order to improve measures such as their payoffs. It is commonly assumed that agents tend to adopt actions that yielded a higher payoff in the past, and to avoid actions that yielded a low payoff. With give-and-take learning, on the contrary, agents are assumed to yield to others if they receive a payoff, by taking the opposite strategy at the next time period, and to choose randomly if they do not gain a payoff. Each agent gets the common information p(t), which aggregates all agents' actions of the last time period, and then decides whether to choose S1 or S2 at time period t+1 by considering whether he was rewarded at time t: he is rewarded a unitary payoff whenever the side he chooses happens to be chosen by the minority of the agents.
We formalize give-and-take learning as follows. The action a_i(t+1) of agent i at the next time period t+1 is determined by the following rule:

(i) a_i(t+1) = 0 (choose S2) if a_i(t) = 1 and p(t) < 0.5 (gain)
(ii) a_i(t+1) = 1 (choose S1) if a_i(t) = 0 and p(t) > 0.5 (gain)
(iii) a_i(t+1) = RND(x) if a_i(t) = 1 and p(t) > 0.5 (no gain)
(iv) a_i(t+1) = RND(x) if a_i(t) = 0 and p(t) < 0.5 (no gain)
(5.1)

where RND(x) represents the mixed strategy x = (x, 1-x) of choosing S1 with probability x and S2 with probability 1-x.
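A sketch of rule (5.1); the mixing probability x = 0.5, the population size, and the horizon are illustrative assumptions:

import numpy as np

def give_and_take(N=101, T=500, x=0.5, seed=3):
    # Rule (5.1): agents on the minority side ("gain") yield by switching sides,
    # agents on the majority side ("no gain") re-randomize with the mixed strategy RND(x).
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=N)                  # a[i] = 1 chooses S1, 0 chooses S2
    total_payoff = 0
    for _ in range(T):
        p = a.mean()
        gain = (a == 1) if p < 0.5 else (a == 0)    # N is odd, so p is never exactly 0.5
        total_payoff += gain.sum()
        nxt = a.copy()
        nxt[gain] = 1 - a[gain]                     # cases (i) and (ii)
        nxt[~gain] = (rng.random(int((~gain).sum())) < x).astype(int)   # cases (iii) and (iv)
        a = nxt
    return total_payoff / (N * T)

# The average payoff per agent settles near 1/3, close to the 0.35 reported in the text.
print(give_and_take())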
We evaluate the performance by comparing the average payoff of an individual. The expected payoff of agents who choose S1 is 1-p, and that of agents who choose S2 is p, where p denotes the proportion of agents who choose S1. Therefore the average payoff per individual is given by 2p(1-p), which takes its maximum value 0.5 at p = 0.5. If exactly N = 50 agents attend the bar, they receive 1 as the payoff and the average payoff is 0.5, which is the maximum. With the collaborative (give-and-take) learning, every agent receives a payoff, and the majority of agents receive about 0.35.

Figure 4(a): The number of agents choosing S1 and S2 with the mixed strategy.
Figure 4(b): The proportion of agents with the same average payoff with the mixed strategy.
Figure 5(a): The number of agents choosing S1 and S2 with give-and-take learning.
Figure 5(b): The proportion of agents with the same average payoff with give-and-take learning.

6. Evolutionary Learning with Local Matching

In this section, we investigate evolutionary learning in which agents learn from their most successful neighbours and co-evolve their strategies over time. Each agent adopts the most successful strategy as a guide for its own decision (individual learning). Hence their success depends in large part on how well they learn from their neighbours. If a neighbour is doing well, its strategy can be imitated by all others (collective learning). In an evolutionary approach, there is no need to assume a rational calculation to identify the best strategy. Instead, the analysis of what is chosen at any specific time is based upon an implementation of the idea that effective strategies are more likely to be retained than ineffective strategies [15]. Moreover, the evolutionary approach allows the introduction of new strategies as occasional random mutations of old strategies. The evolutionary principle itself can be thought of as the consequence of any one of three different mechanisms. It could be that the more effective individuals are more likely to survive and reproduce. A second interpretation is that agents learn by trial and error, keeping effective strategies and altering ones that turn out poorly. A third interpretation is that agents observe each other, and those with poor performance tend to imitate the strategies of those they see doing better.
In this section we consider the local matching shown in Figure 1, where each agent is matched with his 8 neighbours. Each agent is matched several times with the same neighbour, and the rule of strategy selection is coded as the list shown in Figure 6; a part of the list is replaced with that of the most successful neighbour. An agent's decision rule is represented by a binary string. At each generation gen, gen ∈ [1,...,lastgen], agents repeatedly play the game for T iterations. An agent A_i, i ∈ [1,...,N], uses binary string i to make a decision about his action at each iteration t, t ∈ [1,...,T]. A binary string consists of 22 positions (genes). Each position p_j, j ∈ [1,22], is interpreted as follows. The first and second positions, p_1 and p_2, encode the actions that the agent takes at iterations t = 1 and t = 2. The positions p_j, j ∈ [3,6], encode the history of mutual hands (cooperate or defect) that agent i took at iterations t-1 and t-2 with his neighbour (opponent). The positions p_j, j ∈ [7,22], encode the action that agent i takes at iteration t > 2, corresponding to the history positions p_j, j ∈ [3,6].

Figure 6: The representation of the meta-rule of interaction as the list of strategies.
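A sketch of how such a 22-gene string can be read as a decision rule; the exact bit ordering used to index the history is an assumption, since the paper only fixes the roles of the positions:

import numpy as np

rng = np.random.default_rng(4)

def decide(genome, t, own_hist, opp_hist):
    # genome[0:2]  : actions at iterations t = 1 and t = 2
    # genome[2:6]  : premise history positions (not consulted in this simplified reading)
    # genome[6:22] : action for t > 2, indexed by the mutual play of the last two iterations
    if t == 1:
        return int(genome[0])
    if t == 2:
        return int(genome[1])
    index = (own_hist[-1] << 3) | (opp_hist[-1] << 2) | (own_hist[-2] << 1) | opp_hist[-2]
    return int(genome[6 + index])

genome = rng.integers(0, 2, size=22)     # one randomly initialized strategy
own, opp = [1, 0], [0, 0]                # play so far against one neighbour
print(decide(genome, 3, own, opp))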
We consider two types of evolutionary learning, mimicry and crossover. Each agent interacts with the agents on all eight adjacent squares and imitates the strategy of any better performing one. In each generation, each agent attains a success score measured by its average performance with its eight neighbours. Then, if an agent has one or more neighbours who are more successful, the agent converts to the strategy of the most successful of them (mimicry), or crosses with the strategy of the most successful neighbour (crossover). Neighbours also serve another function: if a neighbour is doing well, the behaviour of the neighbour can be shared, and successful strategies can spread throughout the population from neighbour to neighbour [11].

Significant differences were observed between mimicry and crossover. As shown in Figure 7, with the mimicry strategy each agent acquires a payoff of approximately 0.35, while with crossover each agent acquires a payoff of approximately 0.7. Consequently, we can conclude that evolutionary learning leads to a more efficient situation in strategic environments.

Figure 7: The average payoff with evolutionary learning: crossover and mimicry.
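The generation step of the two variants can be sketched as follows; the torus grid, the single-point crossover, and the random scores in the example are assumptions made only for illustration:

import numpy as np

rng = np.random.default_rng(5)

def next_generation(genomes, scores, mode="mimicry"):
    # One update on an n x n torus of agents, each holding a 22-gene strategy string.
    # Each agent looks at its 8 Moore neighbours; if the best of them scores higher, it
    # copies that neighbour's whole genome (mimicry) or splices in a part of it (crossover).
    n = genomes.shape[0]
    new = genomes.copy()
    for r in range(n):
        for c in range(n):
            nbrs = [((r + dr) % n, (c + dc) % n)
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
            best = max(nbrs, key=lambda rc: scores[rc])
            if scores[best] > scores[r, c]:
                if mode == "mimicry":
                    new[r, c] = genomes[best]
                else:                                   # crossover at a single cut point
                    cut = rng.integers(1, 22)
                    new[r, c, cut:] = genomes[best][cut:]
    return new

# Illustrative call on a 10 x 10 grid with random genomes and scores.
genomes = rng.integers(0, 2, size=(10, 10, 22))
scores = rng.random((10, 10))
updated = next_generation(genomes, scores, mode="crossover")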
7. Conclusion
The interaction of heterogeneous agents produces some kind of coherent, systematic behavior. We investigated the macroscopic patterns arising from strategic interactions of heterogeneous agents who behave based on local rules. In this paper we address questions such as: 1) How does a society of selfish agents self-organize its collective behavior, without a central authority, so as to satisfy the constraints? 2) How does learning at the individual level generate more efficient collective behavior? 3) How does co-evolution in a society put its invisible hand to promote self-organization of emerging collective behaviors?

In previous work on collective behavior, the standard assumption was that agents use the same kind of adaptive rule. In this paper, we departed from this assumption by considering a model of agents that are heterogeneous with respect to their meta-rule for making decisions at each time period. Agents use ad hoc meta-rules to make their decisions based on past performance. We also considered several types of learning rules agents use to update their meta-rule for making decisions. We considered specific strategic environments in which a large number of agents have to choose one of two sides independently and those on the minority side win, which is known as a minority game. A rational approach is helpless in our minority game, as it generates large-scale social inefficiency. We introduced a new learning model at the individual level, give-and-take learning, where every agent makes his decision based on the past history of collective behavior. It is shown that the emergent collective behavior is more efficient than that generated from the mixed Nash equilibrium strategies. We also proposed collaborative learning based on a Darwinian approach. It is shown that in strategic environments where every agent has to keep improving his meta-rule for making decisions in order to survive, if agents learn from each other, then social efficiency is realized without a central authority.

8. References

[1] Arthur, W.B., "Inductive Reasoning and Bounded Rationality", American Economic Review, Vol. 84, pp. 406-411, 1994.
[2] Challet, D., and Zhang, Y.-C., "Emergence of Cooperation and Organization in an Evolutionary Game", Physica A, 246, 1997.
[3] Cooper, R., Coordination Games, Cambridge Univ. Press, 1999.
[4] Fogel, D.B., and Chellapilla, K., "Inductive Reasoning and Bounded Rationality Reconsidered", IEEE Trans. on Evolutionary Computation, Vol. 3, pp. 142-146, 1999.
[5] Fudenberg, D., and Levine, D., The Theory of Learning in Games, The MIT Press, 1998.
[6] Hart, S., and Mas-Colell, A., "A Simple Adaptive Procedure Leading to Correlated Equilibrium", Econometrica, 2000.
[7] Iwanaga, S., and Namatame, A., "Asymmetric Coordination of Heterogeneous Agents", IEICE Trans. on Information and Systems, Vol. E84-D, pp. 937-944, 2001.
[8] Iwanaga, S., and Namatame, A., "The Complexity of Collective Decision", Journal of Nonlinear Dynamics and Control, (to appear).
[9] Kaniovski, Y., Kryazhimskii, A., and Young, H., "Adaptive Dynamics in Games Played by Heterogeneous Populations", Games and Economic Behavior, 31, pp. 50-96, 2000.
[10] Marsili, M., "Toy Models of Markets with Heterogeneous Interacting Agents", Lecture Notes in Economics and Mathematical Systems, Vol. 503, pp. 161-181, Springer, 2001.
[11] Namatame, A., and Sato, H., "Collective Learning in Social Coordination Games", Proc. of AAAI Spring Symposium, SS-01-03, pp. 156-159, 2001.
[12] Samuelson, L., Evolutionary Games and Equilibrium Selection, The MIT Press, 1998.
[13] Sipper, M., Evolution of Parallel Cellular Machines: The Cellular Programming Approach, Springer, 1996.
[14] Young, H.P., Individual Strategy and Social Structure, Princeton Univ. Press, 1998.
[15] Weibull, J., Evolutionary Game Theory, The MIT Press, 1996.