Axelrod_Prisoners_Dilemma_Notes

The Evolution of Strategies in the Iterated Prisoner’s Dilemma

In the Prisoner’s Dilemma, each of two individuals either cooperates or defects, and the payoff to
each player is determined by the following table:
                         Column Player
                         Cooperate                       Defect
Row Player   Cooperate   R=3, R=3                        S=0, T=5
                         Reward for mutual               Sucker’s payoff and
                         cooperation                     temptation to defect
             Defect      T=5, S=0                        P=1, P=1
                         Temptation to defect            Punishment for mutual
                         and sucker’s payoff             defection
Whichever move the other player makes, defection yields a higher payoff than cooperation. However,
if both individuals defect, both receive a smaller payoff than they would if both cooperated.
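The two claims above can be checked directly against the payoff values. The following is a minimal sketch; the names `PAYOFF`, `row_payoff`, `defection_dominates`, and `mutual_defection_is_worse` are illustrative and not taken from the notes.

```python
# Payoffs from the table, keyed by (row move, column move).
# Each value is (row player's payoff, column player's payoff).
PAYOFF = {
    ("C", "C"): (3, 3),  # R, R: reward for mutual cooperation
    ("C", "D"): (0, 5),  # S, T: sucker's payoff and temptation to defect
    ("D", "C"): (5, 0),  # T, S: temptation to defect and sucker's payoff
    ("D", "D"): (1, 1),  # P, P: punishment for mutual defection
}


def row_payoff(row_move, column_move):
    """Payoff to the row player for a single encounter."""
    return PAYOFF[(row_move, column_move)][0]


def defection_dominates():
    """True if defecting beats cooperating against either column move."""
    return all(row_payoff("D", other) > row_payoff("C", other) for other in "CD")


def mutual_defection_is_worse():
    """True if both players earn less from (D, D) than from (C, C)."""
    return all(d < c for d, c in zip(PAYOFF[("D", "D")], PAYOFF[("C", "C")]))
```

Both checks return `True` for these values: 5 > 3 and 1 > 0 make defection the better reply to either move, while 1 < 3 makes mutual defection worse for both players than mutual cooperation.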
Iterated Prisoner’s Dilemma: If an individual recognizes another individual from a previous
encounter and can remember the history of the outcomes, then the situation becomes an iterated
Prisoner’s Dilemma.
o A strategy is a decision rule which specifies the probability of cooperation or defection as a
function of the history of the interactions so far
Axelrod (the author) conducted a computer tournament for the iterated Prisoner’s Dilemma, with
strategies submitted by game theorists in economics, sociology, political science, and
mathematics. The strategy with the highest average score was TIT FOR TAT.
TIT FOR TAT: the strategy of cooperating on the first move and then doing whatever the other player
did on the preceding move. It is cooperation based upon reciprocity.
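As a rough illustration of the rule, the sketch below plays TIT FOR TAT in a short iterated game. The `play` helper, the `always_defect` opponent, and the 10-move game length are assumptions made for the example; they are not part of the tournament described in these notes.

```python
# (row move, column move) -> (row payoff, column payoff), from the payoff table.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(their_history):
    """Cooperate first, then copy the other player's preceding move."""
    return their_history[-1] if their_history else "C"


def always_defect(their_history):
    """Hypothetical opponent used only for this example."""
    return "D"


def play(strategy_a, strategy_b, moves):
    """Play an iterated game and return the two total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        # Each strategy sees only the other player's past moves.
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b
```

With these definitions, `play(tit_for_tat, tit_for_tat, 10)` gives `(30, 30)`, since both players cooperate throughout. `play(tit_for_tat, always_defect, 10)` gives `(9, 14)`: TIT FOR TAT is exploited once on the first move and then defects for the rest of the game.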
A second tournament drew 62 entries from 6 countries. The majority of the contestants were computer
hobbyists, along with a few professors in evolutionary biology, physics, computer science, and the
same disciplines that were represented in the first round. TIT FOR TAT won again.
The strategies considered here base the current decision on the three previous moves. Since there
are 4 possible outcomes for each move, there are 4*4*4 = 64 different histories. A strategy is
therefore a list of 64 elements, one per history, and each element is either a C or a D, depending
on whether the individual will cooperate or defect after that history. 6 additional genes are
required to represent the three hypothetical moves assumed to precede the start of the game, for a
total of 70 genes.
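The gene count can be made concrete with a small encoding sketch. It treats each three-move history as a three-digit base-4 number. The digit order (R=0, T=1, S=2, P=3) and the split of the 6 extra genes into two per hypothetical move (one for each player's move) are assumptions made for illustration.

```python
# Outcome of one move from the player's own point of view.
# Assumed digit order: R=0, T=1, S=2, P=3.
OUTCOMES = "RTSP"

HISTORY_GENES = 4 ** 3   # one gene per three-move history: 64
PREMISE_GENES = 3 * 2    # assumed: 3 hypothetical moves, 2 genes each: 6
CHROMOSOME_LENGTH = HISTORY_GENES + PREMISE_GENES  # 70


def history_index(history):
    """Map a three-outcome history such as 'RRS' to a gene position in 0..63."""
    index = 0
    for outcome in history:
        index = index * 4 + OUTCOMES.index(outcome)
    return index


def response(chromosome, history):
    """Look up the move (C or D) that a chromosome dictates after a history."""
    return chromosome[history_index(history)]
```

Under this encoding, `history_index("RRR")` is 0 and `history_index("PPP")` is 63, so the 64 histories fill positions 0 to 63 and the 6 premise genes occupy positions 64 to 69.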
The simulation works in 5 stages:
o Create a population where each individual is a string of length 70 containing C’s or D’s
o Run each individual in the current environment. Each individual player uses the strategy
defined by its chromosome to play an iterated Prisoner’s Dilemma with other strategies. An
individual’s score is its average over all the games it played.
o Successful individuals are selected. An individual with an average score is given one
mating. An individual that is one standard deviation above the average is given two matings.
An individual that is one standard deviation below the average does not mate.
o For each random pair of successful individuals, create two offspring using crossover and
mutation
o The new population is more likely to have genes that are successful
Parameters: a population size of 20; 1 crossover and ½ mutation per chromosome per generation; 151
moves in a game; 50 generations in a run; 40 runs.
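The five stages can be sketched as a short program. This is an illustration under stated assumptions, not Axelrod's implementation: the environment here is a placeholder of two opponents (`tit_for_tat` and `always_defect`) rather than the strategies used in the study, the premise genes are read as three (own move, other's move) pairs, crossover is single-point, and pairs are drawn from the mating pool until the population is refilled.

```python
import random
import statistics

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
# Outcome digits from the player's own view (assumed order R=0, T=1, S=2, P=3).
OUTCOME = {("C", "C"): 0, ("D", "C"): 1, ("C", "D"): 2, ("D", "D"): 3}
POP_SIZE, GENES, MOVES, GENERATIONS = 20, 70, 151, 50


def tit_for_tat(opponent_moves):
    return opponent_moves[-1] if opponent_moves else "C"


def always_defect(opponent_moves):
    return "D"


ENVIRONMENT = [tit_for_tat, always_defect]  # placeholder opponents (assumption)


def play(chromosome, opponent):
    """Average payoff per move for a chromosome against one opponent."""
    # Genes 64..69: assumed premise, three (own move, other's move) pairs.
    history = [OUTCOME[(chromosome[64 + 2 * i], chromosome[65 + 2 * i])] for i in range(3)]
    my_moves, score = [], 0
    for _ in range(MOVES):
        my_move = chromosome[history[0] * 16 + history[1] * 4 + history[2]]
        their_move = opponent(my_moves)
        score += PAYOFF[(my_move, their_move)]
        my_moves.append(my_move)
        history = history[1:] + [OUTCOME[(my_move, their_move)]]
    return score / MOVES


def fitness(chromosome):
    """Stage 2: average score over all games played."""
    return statistics.mean([play(chromosome, opponent) for opponent in ENVIRONMENT])


def matings(score, mean, sd):
    """Stage 3: two matings if one sd above average, none if one sd below."""
    if score >= mean + sd:
        return 2
    if score <= mean - sd:
        return 0
    return 1


def mutate(chromosome, rng):
    rate = 0.5 / GENES  # half a mutation per chromosome on average
    return [("D" if g == "C" else "C") if rng.random() < rate else g for g in chromosome]


def offspring(parent_a, parent_b, rng):
    """Stage 4: two children from one single-point crossover, then mutation."""
    cut = rng.randrange(1, GENES)
    children = (parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:])
    return [mutate(child, rng) for child in children]


def evolve(seed=0):
    rng = random.Random(seed)
    # Stage 1: random initial population of C/D strings of length 70.
    population = [[rng.choice("CD") for _ in range(GENES)] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        scores = [fitness(c) for c in population]
        mean, sd = statistics.mean(scores), statistics.pstdev(scores)
        pool = [c for c, s in zip(population, scores) for _ in range(matings(s, mean, sd))]
        # Stage 5: the offspring form the new population.
        new_population = []
        while len(new_population) < POP_SIZE:
            new_population.extend(offspring(rng.choice(pool), rng.choice(pool), rng))
        population = new_population[:POP_SIZE]
    return population
```

Calling `evolve()` performs one run of 50 generations and returns the final population of 20 chromosomes. Repeating it with 40 different seeds would correspond to the 40 runs listed in the parameters.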
Running the simulation on random chromosomes gave rise to evolved chromosomes with characteristics
similar to TIT FOR TAT. The median evolved chromosome was just as successful as TIT FOR TAT. Most
of the evolved rules shared the following behavioral patterns:
o Don’t rock the boat (C after RRR)
o Be provocable (D after receiving RRS)
o Accept an apology (C after TSR)
o Forget (C after SRR)
o Accept a rut (D after PPP)
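One way to see why these patterns resemble TIT FOR TAT is to write TIT FOR TAT itself as 64 history genes and look up each pattern. The sketch below assumes that outcomes are labeled from the player's own point of view (T means the player defected while the other cooperated, S the reverse) and uses an assumed digit order R=0, T=1, S=2, P=3. All names are illustrative.

```python
OUTCOMES = "RTSP"  # assumed digit order: R=0, T=1, S=2, P=3

# The five patterns listed above: history -> move chosen after it.
PATTERNS = {"RRR": "C", "RRS": "D", "TSR": "C", "SRR": "C", "PPP": "D"}


def history_index(history):
    """Map a three-outcome history such as 'TSR' to a gene position in 0..63."""
    index = 0
    for outcome in history:
        index = index * 4 + OUTCOMES.index(outcome)
    return index


def tit_for_tat_genes():
    """TIT FOR TAT as 64 history genes: copy the other player's last move.

    The other player cooperated if the last outcome was R or T,
    and defected if it was S or P.
    """
    genes = []
    for index in range(64):
        last_outcome = OUTCOMES[index % 4]
        genes.append("C" if last_outcome in "RT" else "D")
    return "".join(genes)


def matches_patterns(genes):
    """For each pattern, report whether the genes choose the listed move."""
    return {h: genes[history_index(h)] == move for h, move in PATTERNS.items()}
```

Here `matches_patterns(tit_for_tat_genes())` reports `True` for all five histories, so under this encoding TIT FOR TAT follows every pattern in the list.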
Like TIT FOR TAT, most of the evolved rules did well by achieving almost complete mutual
cooperation with seven of the eight representative strategies that made up the environment. Also
like TIT FOR TAT, most of the evolved rules did poorly against one representative called Adjuster,
which adjusts its rate of defection to try to exploit the other player.
In 11 of the 40 runs, the median rule did better than TIT FOR TAT. In these runs, the populations
evolved strategies that exploit one of the eight representatives at the cost of achieving somewhat
less cooperation with two others. Such a rule:
o Must be able to discriminate between one representative and another based upon only the
behavior the other player shows spontaneously or is provoked into showing
o Must be able to adjust its own behavior to exploit a representative that is identified as an
exploitable player
o Must be able to achieve this discrimination and exploitation without getting into much
trouble with other representatives. No submitted strategy could do this.
These highly effective rules defect on the first move and sometimes on the second move as well.
They were able to “apologize” and reach mutual cooperation with most of the unexploitable
representatives, and they had different responses that exploited the exploitable ones.
The genetic algorithm was very good at developing highly specialized adaptations to specific
environmental settings
Asexual reproduction yielded individuals with rules that did about as well as TIT FOR TAT in most
cases, but asexual populations were only half as likely to evolve a median member that was
substantially more effective than TIT FOR TAT.
In a changing environment, the effectiveness of a strategy depends upon which strategies are being
used by the other members of the population.
o In the early generations, individuals tend to defect. However, after a few generations, some
individuals begin to cooperate and build on each other’s cooperation. This causes the next
generations to contain more individuals who cooperate, which increases the payoff of the
individuals.




The genetic algorithm is a highly effective method of problem solving. The problem for evolution
can be conceptualized as a search for relatively high points in a multidimensional field of gene
combinations, where height corresponds to fitness. The computer simulations demonstrate that
the genetic algorithm is a highly efficient method for searching such a complex multidimensional
space
Some sequences of moves are not likely to occur, so what their genes dictate may not matter much.
If all of the individuals in a population are descendants of just a few individuals, then these
irrelevant genes may become fixed at the values the ancestors happened to share. Repeated
simulation shows that some genes that are fixed in one run are not necessarily fixed in another.
For some genes, it is not arbitrary that they remain fixed. Such a gene could have been either C or
D, but once it is fixed, other genes adapt to it in a way that improves the fitness of the
individual.
There is a tradeoff between short-term and long-term gains. Some median individuals might
eventually have evolved into very successful individuals, but they were driven out by others that
were slightly better at the time and could not improve much further.