Axelrod_Prisoners_Dilemma_Notes

The Evolution of Strategies in the Iterated Prisoner’s Dilemma

In the Prisoner’s Dilemma, each of two individuals either cooperates or defects, and the payoffs are determined by the following table (the row player’s payoff is listed first in each cell):

                          Column Player: Cooperate          Column Player: Defect
Row Player: Cooperate     R=3, R=3                          S=0, T=5
                          Reward for mutual cooperation     Sucker’s payoff and temptation to defect
Row Player: Defect        T=5, S=0                          P=1, P=1
                          Temptation to defect and          Punishment for mutual defection
                          sucker’s payoff

Whatever the other player does, defection yields a higher payoff than cooperation (T > R and P > S). However, if both individuals defect, both receive a smaller payoff than they would if both cooperated (P < R).

Iterated Prisoner’s Dilemma: If an individual recognizes another individual from a previous encounter and can remember the history of the outcomes, then the situation becomes an iterated Prisoner’s Dilemma.
o A strategy is a decision rule which specifies the probability of cooperation or defection as a function of the history of the interactions so far

Axelrod (the author) conducted a computer tournament for the iterated Prisoner’s Dilemma, in which the strategies were submitted by game theorists in economics, sociology, political science, and mathematics. The strategy with the highest average score was TIT FOR TAT.

TIT FOR TAT: cooperate on the first move, and then do whatever the other player did on the preceding move. This is cooperation based upon reciprocity.

A second tournament drew 62 entries from 6 countries. The majority of the contestants were computer hobbyists, along with a few professors of evolutionary biology, physics, and computer science, as well as of the disciplines represented in the first tournament. TIT FOR TAT won again.

One class of strategies bases the current decision on the outcomes of the three previous moves. Since there are 4 possible outcomes for each move (R, S, T, or P), there are 4*4*4 = 64 different histories. Such a strategy can therefore be written as a string of 64 elements, one per history. Each element is either a C or a D, depending on whether the individual will cooperate or defect, respectively, after that history.
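The payoff table, TIT FOR TAT, and the 64-history indexing described above can be sketched in Python as follows. This is a minimal illustration, not code from Axelrod's study: the names `PAYOFF`, `tit_for_tat`, `always_defect`, `play`, and `history_index` are hypothetical, `always_defect` is an opponent invented here only for contrast, and the ordering of outcomes in `OUTCOMES` is an arbitrary assumption.

```python
from typing import Callable, List, Tuple

# Payoffs from the table above, keyed by (my move, other's move).
# Each value is (my payoff, other's payoff).
PAYOFF = {
    ("C", "C"): (3, 3),  # R, R: reward for mutual cooperation
    ("C", "D"): (0, 5),  # S, T: sucker's payoff vs. temptation
    ("D", "C"): (5, 0),  # T, S: temptation vs. sucker's payoff
    ("D", "D"): (1, 1),  # P, P: punishment for mutual defection
}

# A strategy sees its own past moves and the other player's past moves.
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(own: List[str], other: List[str]) -> str:
    """Cooperate first, then copy the other player's previous move."""
    return "C" if not other else other[-1]


def always_defect(own: List[str], other: List[str]) -> str:
    """Hypothetical opponent used only for contrast."""
    return "D"


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play an iterated game and return the two total scores."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


# Assumed ordering of the four outcomes, seen from one player's side.
OUTCOMES = ["R", "T", "S", "P"]


def history_index(last_three: str) -> int:
    """Map a three-outcome history such as 'RRS' to an index in 0..63."""
    index = 0
    for outcome in last_three:
        index = index * 4 + OUTCOMES.index(outcome)
    return index


print(play(tit_for_tat, tit_for_tat, 151))    # (453, 453)
print(play(tit_for_tat, always_defect, 151))  # (150, 155)
print(history_index("RRR"), history_index("PPP"))  # 0 63
```

Treating each history as a base-4 number is one convenient way to give every one of the 64 histories its own position in the strategy string. Against the always-defecting opponent, TIT FOR TAT loses only the first move and then settles into mutual defection.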
Six additional genes are required to encode the three hypothetical moves assumed to precede the start of the game, for a total of 70 genes.

The simulation works in 5 stages:
o Create a population where each individual is a string of length 70 containing C’s or D’s
o Run each individual in the current environment. Each individual uses the strategy defined by its chromosome to play an iterated Prisoner’s Dilemma with other strategies. An individual’s score is its average over all the games it played.
o Select the successful individuals. An individual with an average score gets one mating. An individual one standard deviation above the average gets two matings. An individual one standard deviation below the average gets none.
o For each random pair of successful individuals, create two offspring using crossover and mutation
o The new population is more likely to have genes that are successful

Parameters: the population size is 20, with on average 1 crossover and ½ mutation per chromosome per generation, 151 moves in a game, 50 generations in a run, and 40 runs.

Running the simulation from random chromosomes gave rise to evolved chromosomes with characteristics similar to TIT FOR TAT. The median evolved chromosome was just as successful as TIT FOR TAT. The evolved rules shared these behavioral patterns:
o Don’t rock the boat (C after RRR)
o Be provocable (D after receiving RRS)
o Accept an apology (C after TSR)
o Forget (C after SRR)
o Accept a rut (D after PPP)

Like TIT FOR TAT, most of the evolved rules did well by achieving almost complete mutual cooperation with seven of the eight representative strategies in the environment. Also like TIT FOR TAT, most of the evolved rules did poorly against one representative, called adjuster, which adjusts its rate of defection to try to exploit the other player.

In 11 of the 40 runs, the median rule did better than TIT FOR TAT.
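The selection, crossover, and mutation stages above can be sketched as follows. This is a minimal illustration under stated assumptions rather than Axelrod's implementation: all function names are hypothetical, scores are passed in instead of being computed by playing games, the mating rule is read as a three-way threshold at one standard deviation from the mean, crossover is single-point, a per-gene mutation probability of 0.5/70 is assumed to approximate ½ mutation per chromosome, and the population is assumed to have at least two members.

```python
import random
import statistics
from typing import List, Tuple

GENES = 70  # 64 history genes plus 6 genes for the assumed pre-game moves


def random_chromosome(rng: random.Random) -> str:
    """Stage 1: a random string of C's and D's."""
    return "".join(rng.choice("CD") for _ in range(GENES))


def matings(scores: List[float]) -> List[int]:
    """Stage 3: 2 matings if at least one SD above the mean, 0 if at
    least one SD below, otherwise 1 (assumed reading of the rule)."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    counts = []
    for score in scores:
        if sd > 0 and score >= mean + sd:
            counts.append(2)
        elif sd > 0 and score <= mean - sd:
            counts.append(0)
        else:
            counts.append(1)
    return counts


def crossover(a: str, b: str, rng: random.Random) -> Tuple[str, str]:
    """Single-point crossover (assumed): swap the tails of two parents."""
    point = rng.randrange(1, GENES)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chrom: str, rng: random.Random, rate: float = 0.5 / GENES) -> str:
    """Flip each gene independently with probability `rate`."""
    flip = {"C": "D", "D": "C"}
    return "".join(flip[g] if rng.random() < rate else g for g in chrom)


def next_generation(population: List[str], scores: List[float],
                    rng: random.Random) -> List[str]:
    """Stages 3 to 5: build a mating pool, pair at random, breed."""
    pool = [c for c, n in zip(population, matings(scores)) for _ in range(n)]
    children: List[str] = []
    while len(children) < len(population):
        parent_a, parent_b = rng.sample(pool, 2)
        for child in crossover(parent_a, parent_b, rng):
            children.append(mutate(child, rng))
    return children[:len(population)]
```

A usage sketch with the population size from the notes, using random placeholder scores instead of tournament results:

```python
rng = random.Random(0)
population = [random_chromosome(rng) for _ in range(20)]
placeholder_scores = [rng.random() for _ in population]
population = next_generation(population, placeholder_scores, rng)
```

In a full simulation the scores would come from stage 2, where each chromosome plays 151-move games against the strategies in its environment.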
In these runs, the populations evolved strategies that exploit one of the eight representatives at the cost of achieving somewhat less cooperation with two others. To do so, a rule:
o Must be able to discriminate between one representative and another based only upon the behavior the other player shows spontaneously or is provoked into showing
o Must be able to adjust its own behavior to exploit a representative that is identified as an exploitable player
o Must be able to achieve this discrimination and exploitation without getting into much trouble with the other representatives

No strategy submitted to the tournament could do this.

These highly effective rules defect on the first move and sometimes on the second move. They were able to “apologize” and reach mutual cooperation with most of the unexploitable representatives, and they had responses that exploited the representatives that were exploitable.

The genetic algorithm was very good at developing highly specialized adaptations to specific environmental settings.

Asexual reproduction yielded individuals with rules that did about as well as TIT FOR TAT in most cases, but the asexual populations were only half as likely to evolve a median member that was substantially more effective than TIT FOR TAT.

In a changing environment, the effectiveness of a strategy depends upon which strategies are being used by the other members of the population.
o At first, as generations progress, individuals tend to defect. However, after a few generations, some individuals begin to cooperate and do well with one another. As a result, later generations contain more individuals who cooperate, which increases the payoffs of the individuals

The genetic algorithm is a highly effective method of problem solving. The problem for evolution can be conceptualized as a search for relatively high points in a multidimensional field of gene combinations, where height corresponds to fitness.
The computer simulations demonstrate that the genetic algorithm is a highly efficient method for searching such a complex multidimensional space.

Some sequences of moves are not likely to occur, so what the corresponding genes dictate may not matter much. If all of the individuals in a population are descendants of just a few individuals, then these irrelevant genes may become fixed at the values the ancestors happened to share. Repeated simulation shows that some genes that are fixed in one run are not necessarily fixed in another.

For other genes, the fact that they remain fixed is not arbitrary, even though the fixed value could be either C or D: once such a gene is fixed, other genes adapt to it in a way that improves the fitness of the individual.

There is a tradeoff between the short term and the long term. Median individuals that could potentially have evolved into highly successful individuals may be driven out by others that are slightly better at the time but cannot improve much further.
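To make the idea of fixed genes concrete, a small helper can report which positions hold the same value in every chromosome of a population. This is only an illustrative sketch: `fixed_positions` and the four-gene toy population are hypothetical and are not taken from the simulation.

```python
from typing import Dict, List


def fixed_positions(population: List[str]) -> Dict[int, str]:
    """Return {position: allele} for genes shared by every individual."""
    first = population[0]
    return {
        i: allele
        for i, allele in enumerate(first)
        if all(chrom[i] == allele for chrom in population)
    }


# Toy population of four-gene chromosomes (real ones have 70 genes).
toy_population = ["CCDC", "CDDC", "CCDD"]
print(fixed_positions(toy_population))  # {0: 'C', 2: 'D'}
```

Comparing the output of such a helper across independent runs would be one way to see which genes are fixed in one run but not in another.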
