OR 3: Chapter 10 - Infinitely Repeated Games Recap In the previous chapter: • We defined repeated games; • We showed that a sequence of stage Nash games would give a subgame perfect equilibria; • We considered a game, illustrating how to identify equilibria that are not a sequence of stage Nash profiles. In this chapter we'll take a look at what happens when games are repeatedly infinitely. Discounting To illustrate infinitely repeated games (饾憞 → ∞) we will consider a Prisoners dilemma as our stage game. ( (2,2) (0,3) ) (3,0) (1,1) Let us denote 饾憼饾惗 as the strategy "cooperate at every stage". Let us denote 饾憼饾惙 as the strategy "defect at every stage". If we assume that both players play 饾憼饾惗 their utility would be: ∞ 饾憟1 (饾憼饾惗 , 饾憼饾惗 ) = 饾憟2 (饾憼饾惗 , 饾憼饾惗 ) = ∑ 2 > ∞ 饾憱=1 Similarly: ∞ 饾憟1 (饾憼饾惙 , 饾憼饾惙 ) = 饾憟2 (饾憼饾惙 , 饾憼饾惙 ) = ∑ 1 > ∞ 饾憱=1 It is impossible to compare these two strategies. To be able to carry out analysis of strategies in infinitely repeated games we make use of a discounting factor 0 < 饾浛 < 1. The interpretation of 饾浛 is that there is less importance given to future payoffs. One way of thinking about this is that "the probability of recieveing the future payoffs decreases with time". In this case we write the utility in an infinitely repeated game as: ∞ 饾憟饾憱 (饾憻, 饾憼) = ∑ 饾浛 饾憽−1 饾憿饾憱 饾憻(饾憽), 饾憼(饾憽) 饾憽=1 Thus: ∞ 饾憟1 (饾憼饾惗 , 饾憼饾惗 ) = 饾憟2 (饾憼饾惗 , 饾憼饾惗 ) = 2 ∑ 饾浛 饾憽−1 = 2/(1 − 饾浛) 饾憱=1 and: ∞ 饾憟1 (饾憼饾惙 , 饾憼饾惙 ) = 饾憟2 (饾憼饾惙 , 饾憼饾惙 ) = ∑ 饾浛 饾憽−1 = 1/(1 − 饾浛) 饾憱=1 Conditions for cooperation in Prisoner's Dilemmas Let us consider the "Grimm trigger" strategy (which we denote 饾憼饾惡 ): "Start by cooperating until your opponent defects at which point defect in all future stages." If both players play 饾憼饾惡 we have 饾憼饾惡 = 饾憼饾惗 : 饾憟1 (饾憼饾惡 , 饾憼饾惡 ) = 饾憟1 (饾憼饾惡 , 饾憼饾惡 ) = 2/(1 − 饾浛) If we assume that 饾憜1 = 饾憜2 = {饾憼饾惗 , 饾憼饾惙 , 饾憼饾惡 } and player 2 deviates from 饾憜饾惡 at the first stage to 饾憼饾惙 we get: ∞ 饾憟2 (饾憼饾惗 , 饾憼饾惡 ) = 3 + ∑ 饾浛 饾憽−1 = 3 + 饾浛/(1 − 饾浛) 饾憽=2 Deviation from 饾憼饾惡 to 饾憼饾惙 is rational if and only if: 2/(1 − 饾浛) < 3 + 饾浛/(1 − 饾浛) ⇔ 饾浛 < 1/2 thus if 饾浛 is large enough (饾憼饾惡 , 饾憼饾惡 ) is a Nash equilibrium. Importantly (饾憼饾惡 , 饾憼饾惡 ) is not a subgame perfect Nash equilibrium. Consider the subgame following (饾憻1 , 饾憼2 ) having been played in the first stage of the game. Assume that player 1 adhers to 饾憼饾惡 : 1. If player 2 also plays 饾憼饾惡 then the first stage of the subgame will be (饾憻2 , 饾憼1 ) (player 1 punishes while player 2 sticks with 饾憼1 as player 1 played 饾憻1 in previous stage). All subsequent plays will be (饾憻2 , 饾憼2 ) so player 2's utility will be: ∞ 0 + ∑ 饾浛 饾憽−1 = 饾浛/(1 − 饾浛) 饾憽=2 2. If player 2 deviates from 饾憼饾惡 and chooses to play 饾惙 in every period of the subgame then player 2's utility will be: ∞ ∑ 饾浛 饾憽−1 = (1 − 饾浛) 饾憽=1 which is a rational deviation (as 0 < 饾浛 < 1). Two questions arise: 1. Can we always find a strategy in a repeated game that gives us a better outcome than simply repeating the stage Nash equilibria? (Like 饾憼饾惡 ) 2. Can we also find a strategy with the above property that in fact is subgame perfect? (Unlike 饾憼饾惡 ) Folk theorm The answer is yes! To prove this we need to define a couple of things. Definition of an average payoff If we interpret 饾浛 as the probability of the repeated game not ending then the average length of the game is: 饾憞= 1 1−饾浛 We can use this to define the average payoffs per stage: 1 饾憞 饾憟饾憱 (饾憻, 饾憼) = (1 − 饾浛)饾憟饾憱 (饾憻, 饾憼) This average payoff is a tool that allows us to compare the payoffs in an infitely repeated game to the payoff in a single stage game. Definition of individually rational payoffs Individually rational payoffs are average payoffs that exceed the stage game Nash equilibrium payoffs for both players. As an example consider the plot corresponding to a repeated Prisoner's Dilemma. Convex hull of payoffs to a prisoners dilemma. The feasible average payoffs correspond to the feasible payoffs in the stage game. The individually rational payoffs show the payoffs that are better for both players than the stag Nash equilibrium. The following theorem states that we can choose a particular discount rate that for which there exists a subgame perfect Nash equilibrium that would give any individually rational payoff pair! Folk Theorem for infinetely repeated games Let (饾憿1∗ , 饾憿2∗ ) be a pair of Nash equilibrium payoffs for a stage game. For every individually rational pair (饾懀1 , 饾懀2 ) there exists 饾浛 such that for all 1 > 饾浛 > 饾浛 > 0 there is a subgame perfect Nash equilibrium with payoffs (饾懀1 , 饾懀2 ). Proof Let (饾湈1∗ , 饾湈2∗ ) be the stage Nash profile that yields (饾憿1∗ , 饾憿2∗ ). Now assume that playing 饾湈1 ∈ 饾洢饾憜1 and 饾湈2 ∈ 饾洢饾憜2 in every stage gives (饾懀1 , 饾懀2 ) (an individual rational payoff pair). Consider the following strategy: "Begin by using 饾湈饾憱 and continue to use 饾湈饾憱 as long as both players use the agreed strategies. If any player deviates: use 饾湈饾憱∗ for all future stages." We begin by proving that the above is a Nash equilibrium. Without loss of generality if player 1 deviates to 饾湈1 使 ∈ 饾洢饾憜1 such that 饾憿1 (饾湈1 使, 饾湈2 ) > 饾懀1 in stage 饾憳 then: 饾憳−1 (饾憳) 饾憟1 = ∑饾浛 饾憳 饾憽−1 饾懀1 + 饾浛 饾憳−1 饾憿1 (饾湈1 使, 饾湈2 ) + 饾憽=1 饾憿1∗ ( 1 − ∑ 饾浛 饾憽−1 ) 1−饾浛 饾憽=1 Recalling that player 1 would receive 饾懀1 in every stage with no devitation, the biggest gain to be made from deviating is if player 1 deviates in the first stage (all future gains are more 饾懀1 (1) heavily discounted). Thus if we can find 饾浛 such that 饾浛 > 饾浛 implies that 饾憟1 ≤ 1−饾憺 then player 1 has no incentive to deviate. 饾浛 1−饾浛 (1 − 饾浛)饾憿1 (饾湈1 使, 饾湈2 ) + 饾憿1∗ 饾浛 饾憿1 (饾湈1 使, 饾湈2 ) − 饾懀1 (1) 饾憟1 = 饾憿1 (饾湈1 使, 饾湈2 ) + 饾憿1∗ 饾懀1 1−饾浛 ≤ 饾懀1 ≤ 饾浛(饾憿1 (饾湈1 使, 饾湈2 ) − 饾憿1∗ ) ≤ 饾憿 (饾湈 使,饾湈 )−饾懀 as 饾憿1 (饾湈1 使, 饾湈2 ) > 饾懀1 > 饾憿1∗ , taking 饾浛 = 饾憿 1(饾湈1使,饾湈 2)−饾憿1∗ gives the required required result for 1 1 2 1 player 1 and repeating the argument for player 2 completes the proof of the fact that the prescribed strategy is a Nash equilibrium. By construction this strategy is also a subgame perfect Nash equilibrium. Given any history both players will act in the same way and no player will have an incentive to deviate: • If we consider a subgame just after any player has deviated from 饾湈饾憱 then both players use 饾湈饾憱∗ . • If we consider a subgame just after no player has deviated from 饾湈饾憱 then both players continue to use 饾湈饾憱 .