EC941 - Game Theory
Lecture 7
Prof. Francesco Squintani
Email: f.squintani@warwick.ac.uk

Structure of the Lecture
Infinitely Repeated Games
Nash and Subgame-Perfect Equilibrium
Finitely Repeated Games

Repeated Games
Repeated games are a special class of interactions, represented as extensive form games.
A simultaneous move game, represented as a normal form game, is repeated over time.
Repetition can enlarge the set of equilibria if players are sufficiently patient.
For example, cooperation is a subgame perfect equilibrium outcome in the infinitely repeated prisoner’s dilemma.

Definition
Let $G = (N, A, u)$ be a strategic game. Let $T$ be finite or infinite. The $T$-repeated game of $G$ for the discount factor $\delta$ is the extensive game in which:
- the set of players is $N$
- the set of terminal histories is the set of sequences $(a^1, a^2, \ldots)$ of $T$ action profiles in $G$ (infinite sequences when $T = \infty$)
- the player function assigns the set of all players to every proper subhistory of every terminal history
- the set of actions of player $i$ after any history is $A_i$
- each player $i$ evaluates each terminal history $(a^1, a^2, \ldots)$ according to its discounted average $(1 - \delta) \sum_{t=1}^{T} \delta^{t-1} u_i(a^t)$.

Repeated Prisoner’s Dilemma
Suppose that the following game is infinitely repeated with discount factor $\delta$.

            C       D
    C     2, 2    0, 3
    D     3, 0    1, 1

Strategies
A player’s strategy in an extensive game specifies her action after all possible histories after which it is her turn to move.
A strategy of player $i$ in an infinitely repeated game of the strategic game $G$ specifies an action of player $i$ (a member of $A_i$) for every finite sequence $(a^1, \ldots, a^T)$ of outcomes of $G$.

Grim Trigger Strategy
Consider the repeated prisoner’s dilemma.
The strategy prescribes that the player initially cooperates, and continues to do so if both players cooperated at all previous times.
$s_i(a^1, \ldots, a^T) = D$ if $a^t \neq (C, C)$ for some $t = 1, \ldots, T$,
$s_i(a^1, \ldots, a^T) = C$ otherwise (including the initial, empty history).
Note that a player defects if either she or her opponent defected in the past.
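As a minimal sketch, the discounted average and the grim trigger rule can be written as two small Python functions. The names `discounted_average` and `grim_trigger` are illustrative assumptions, a history is represented as a list of action pairs, and an infinite payoff stream is approximated by truncating it at a long finite horizon.

```python
from typing import List, Tuple

Profile = Tuple[str, str]  # (action of player 1, action of player 2)


def discounted_average(payoffs: List[float], delta: float) -> float:
    """Compute (1 - delta) * sum_t delta^(t-1) * u_t for a finite payoff stream.

    For delta < 1, truncating an infinite stream at a long horizon
    approximates its discounted average.
    """
    return (1 - delta) * sum(delta ** t * u for t, u in enumerate(payoffs))


def grim_trigger(history: List[Profile]) -> str:
    """Play C if every past outcome was (C, C) (or the history is empty), else D."""
    return "C" if all(a == ("C", "C") for a in history) else "D"


if __name__ == "__main__":
    print(grim_trigger([]))                        # initial period
    print(grim_trigger([("C", "C"), ("C", "D")]))  # after a defection
    print(discounted_average([2.0] * 500, 0.9))    # constant stream of 2s
```

A constant stream of payoff 2 has discounted average $2(1 - \delta^T)$ over $T$ periods, which approaches 2 as the horizon grows; this is why the normalization by $(1 - \delta)$ makes repeated-game payoffs comparable with stage-game payoffs.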
Automaton Representation
An automaton for player $i$ is $(X, x_0, f, g)$.
1. $X$ is a set of states.
2. $x_0 \in X$ is the initial state of the automaton.
3. $f : X \times A \to X$ is the transition function across states, as a function of the play.
4. $g : X \to A_i$ is the output function, giving the action played in each state.

The automaton of the Grim Trigger Strategy is as follows:
There are two states: C, in which C is chosen, and D, in which D is chosen.
The initial state is C.
If the play is not (C,C) in any period, then the state changes to D.
If the automaton is in state D, it remains there forever.
[Diagram: state C (initial, marked *) loops to itself on (C,C) and moves to state D on (C,D), (D,C) or (D,D); state D is absorbing.]

Tit for Tat
The player initially cooperates.
At subsequent rounds, she plays the action played by the opponent at the previous round.
$s_i(a^1, \ldots, a^T) = C$ if $a_j^T = C$, or if the history is empty (first period),
$s_i(a^1, \ldots, a^T) = D$ if $a_j^T = D$.
[Diagram: two states, C (initial, marked *) and D; from either state the automaton moves to C after any outcome $(\cdot, C)$ and to D after any outcome $(\cdot, D)$.]

Grim Trigger Nash Equilibrium
Suppose that player $j$ adopts the grim trigger strategy.
If player $i$ plays grim trigger, then the outcome is (C, C) in every period with payoffs (2, 2, . . .). The discounted average is 2.
If $i$ deviates from the grim trigger strategy, then there is at least one period in which she chooses D.
In all subsequent periods player $j$ chooses D.
So the best deviation for player $i$ is choosing D in every subsequent period (because D is her unique best response to D).

If $i$ can increase her payoff by deviating, then she can do so by deviating to D in the first period.
She obtains the stream of payoffs (3, 1, 1, . . .) with discounted average
$(1 - \delta)[3 + \delta + \delta^2 + \delta^3 + \cdots] = 3(1 - \delta) + \delta$.
Thus player $i$ cannot increase her payoff by deviating if and only if $2 \geq 3(1 - \delta) + \delta$, or $\delta \geq 1/2$.
Hence, if $\delta \geq 1/2$, then playing the grim trigger strategy by both players is a Nash equilibrium of the infinitely repeated Prisoner’s Dilemma.

Tit for Tat Nash Equilibrium
Suppose that player $j$ adopts the tit for tat strategy.
If player $i$ plays tit for tat, then the outcome is (C, C) in every period with payoffs (2, 2, . . .).
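The claim that two tit for tat players cooperate in every period can be checked by simulating the automata described above. The sketch below is only illustrative: the class `Automaton`, the function `play`, and the extra `ALWAYS_D` automaton are hypothetical names introduced here, the state set $X$ is left implicit as the strings "C" and "D", and each automaton receives the outcome as (own action, opponent's action).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Profile = Tuple[str, str]  # (own action, opponent's action)


@dataclass(frozen=True)
class Automaton:
    initial: str                               # x0
    transition: Callable[[str, Profile], str]  # f : X x A -> X
    output: Callable[[str], str]               # g : X -> A_i


# Grim trigger: stay in C only while the outcome is (C, C); D is absorbing.
GRIM = Automaton(
    initial="C",
    transition=lambda x, a: "C" if x == "C" and a == ("C", "C") else "D",
    output=lambda x: x,
)

# Tit for tat: the next state copies the opponent's last action.
TIT_FOR_TAT = Automaton(
    initial="C",
    transition=lambda x, a: a[1],
    output=lambda x: x,
)

# A deviating player who defects in every period (used for comparison only).
ALWAYS_D = Automaton(initial="D", transition=lambda x, a: "D", output=lambda x: x)


def play(m1: Automaton, m2: Automaton, periods: int) -> List[Profile]:
    """Return the outcome path (a1, a2) generated by two automata."""
    x1, x2 = m1.initial, m2.initial
    path = []
    for _ in range(periods):
        a1, a2 = m1.output(x1), m2.output(x2)
        path.append((a1, a2))
        # Each automaton sees the outcome from its own perspective.
        x1 = m1.transition(x1, (a1, a2))
        x2 = m2.transition(x2, (a2, a1))
    return path


if __name__ == "__main__":
    print(play(TIT_FOR_TAT, TIT_FOR_TAT, 4))
    print(play(ALWAYS_D, GRIM, 4))
```

Running `play(ALWAYS_D, GRIM, 4)` reproduces the deviation path used in the grim trigger argument: one period of (D, C) followed by (D, D) in every later period.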
If $i$ deviates, then there is at least one period in which she chooses D.
In the subsequent period player $j$ chooses D.
If player $i$ then plays D again, she triggers one further D by $j$.
If $i$ instead chooses C, then player $j$ returns to C in the following period.

If $i$ can increase her payoff by deviating, then she can do so by deviating to D in the first period.
Player $i$ can obtain either the stream (3, 1, 1, 1, …), by defecting in every period, or the stream (3, 0, 3, 0, . . .), by alternating between D and C.
The first stream has discounted average $3(1 - \delta) + \delta$. The second has discounted average
$(1 - \delta)[3 + 0\,\delta + 3\delta^2 + 0\,\delta^3 + \cdots] = 3(1 - \delta) \sum_{t=0}^{\infty} \delta^{2t} = \frac{3(1 - \delta)}{1 - \delta^2} = \frac{3}{1 + \delta}$.
Thus player $i$ cannot increase her payoff by deviating if and only if $2 \geq 3(1 - \delta) + \delta$ and $2 \geq 3/(1 + \delta)$, that is, $\delta \geq 1/2$.

Nash Folk Theorem in the Prisoner’s Dilemma
Definition: The set of feasible payoff profiles of a strategic game is the set of all weighted averages of payoff profiles in the game.
For any feasible pair $(x_1, x_2)$ of payoffs and any $\varepsilon_1 > 0$, there is a finite sequence $(a^1, \ldots, a^k)$ of outcomes for which each player $i$’s average payoff is within $\varepsilon_1$ of $x_i$:
$\frac{u_i(a^1) + \cdots + u_i(a^k)}{k} - \varepsilon_1 < x_i < \frac{u_i(a^1) + \cdots + u_i(a^k)}{k} + \varepsilon_1$.

When this sequence is repeated forever, the discounted average payoff is arbitrarily close to $x_i$ once the discount factor is close enough to 1:
$(1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t) - \varepsilon_2 < x_i < (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t) + \varepsilon_2$.
Consider the feasible payoff pair $(x_1, x_2)$, and the outcome path $b$ that consists of repetitions of the sequence $(a^1, \ldots, a^k)$: $b^{nk+l} = a^l$ for $l = 1, \ldots, k$ and $n = 0, 1, 2, \ldots$
Consider the strategy
$s_i(h^1, \ldots, h^{T-1}) = b_i^T$ if $h^t = b^t$ for $t = 1, \ldots, T - 1$,
$s_i(h^1, \ldots, h^{T-1}) = D$ otherwise.

As long as $x_1 > u_1(D, D)$ and $x_2 > u_2(D, D)$, the pair of these “grim trigger” strategies is a Nash Equilibrium when the discount factor is close enough to 1.
We conclude that any feasible payoff pair $(x_1, x_2)$ such that $x_1 > u_1(D, D)$ and $x_2 > u_2(D, D)$ is (approximately) a Nash Equilibrium payoff of the infinitely repeated Prisoner’s Dilemma game, for discount factors close enough to 1.

Nash Folk Theorem
Consider a one-shot game.
Suppose that each player $i$ can guarantee herself a “minimum” payoff $m_i$.
We will show that every feasible payoff profile $w$ such that $w_i > m_i$ for every player $i$ can be achieved as the discounted average payoff profile of a Nash equilibrium in the infinitely repeated game, when $\delta$ is close to 1.
This payoff can be achieved with strategies similar to grim trigger strategies.
A deviation from the path by player $i$ is punished by minimizing $i$’s payoff forever.

For the Prisoner’s Dilemma, the minimum payoff of player $i$ supported by a Nash equilibrium is $u_i(D, D)$.
Player $j$ can ensure (by choosing D) that player $i$’s payoff does not exceed $u_i(D, D)$, and there is no lower payoff with this property.
Hence, $u_i(D, D)$ is the lowest payoff that player $j$ can force upon player $i$.
What is this minimum payoff for player $i$ in an arbitrary game?

For any collection $\alpha_{-i}$ of the other players’ mixed actions, player $i$’s highest possible payoff is
$\max_{a_i \in A_i} u_i(a_i, \alpha_{-i})$.
As $\alpha_{-i}$ changes, this maximal payoff changes.
The collection $\alpha_{-i}$ of “punishments” that makes this maximum as small as possible is the solution of
$\min_{\alpha_{-i} \in \Delta(A_{-i})} \max_{a_i \in A_i} u_i(a_i, \alpha_{-i})$.
This payoff is known as player $i$’s minmax payoff.

Theorem (Nash Folk Theorem). Let $G$ be a strategic game.
Let $w$ be a feasible payoff profile of $G$ for which each player’s payoff exceeds her minmax payoff. Then, for all $\varepsilon > 0$, there exists $\underline{\delta} < 1$ such that if the discount factor exceeds $\underline{\delta}$, then the infinitely repeated game of $G$ has a Nash equilibrium whose discounted average payoff profile $w'$ satisfies $|w' - w| < \varepsilon$.
For any discount factor $\delta$ with $0 < \delta < 1$, the discounted average payoff of every player in any Nash equilibrium of the infinitely repeated game of $G$ is at least her minmax payoff.

To sketch the argument, let $x$ be the payoff profile induced by an action profile $a$.
By hypothesis, each $x_i$ exceeds player $i$’s minmax payoff.
For each player $i$, let $p_{-i}$ be a profile of mixed actions for the players other than $i$ that holds player $i$ down to her minmax payoff.
Define each player $i$’s strategy as follows.
In each period, play $a_i$ as long as the play was $a$ in every previous period.
Otherwise play $(p_{-j})_i$, where $j$ is the player who deviated in the first period in which the play was not $a$.

Let $H^*$ be the set of histories in which there is a period in which exactly one player $j$ chose an action different from $a_j$. Refer to $j$ as a lone deviant.
The strategy of player $i$ is defined as follows:
$s_i(\emptyset) = a_i$,
$s_i(h) = a_i$ if $h$ is not in $H^*$,
$s_i(h) = (p_{-j})_i$ if $h \in H^*$ and $j$ is the first lone deviant in $h$.

We now show that the profile $s$ is a Nash equilibrium.
If each player $i$ adheres to $s_i$, then her payoff is $x_i$ in every period.
If player $i$ deviates from $s_i$, then she may gain in the period in which she deviates, but she loses in every subsequent period, obtaining at most her minmax payoff, rather than $x_i$.
Thus for a discount factor close enough to 1, $s_i$ is a best response to $s_{-i}$ for every player $i$.

Subgame Perfect Equilibrium
Theorem (One-Shot Deviation Property of subgame perfect equilibria of infinitely repeated games)
A strategy profile in an infinitely repeated game is a subgame perfect equilibrium if and only if no player can gain by changing her action after any single history, given both the strategies of the other players and the remainder of her own strategy.

The One-Shot Deviation Principle is deceptively simple.
Its application is often not straightforward.
First, it requires that players cannot gain by deviating once, after any history of play.
But there are infinitely many histories, so they cannot be checked one by one.
They must be grouped according to the prescriptions of the strategy profile we are considering.
Second, the one-shot deviation may change future play, according to the strategy that we are considering.
The deviation is one shot from a strategy, not from a play.

Grim Trigger Strategy
Consider the repeated prisoner’s dilemma.
The strategy prescribes that the player initially cooperates, and continues to do so if both players cooperated at all previous times.
Otherwise, she defects forever.
$s_i(a^1, \ldots, a^T) = D$ if $a^t \neq (C, C)$ for some $t = 1, \ldots, T$,
$s_i(a^1, \ldots, a^T) = C$ otherwise.

Grim Trigger SPE
Suppose that both players adopt the grim trigger strategy.
There are two “groups” of histories: those after which the grim trigger strategy prescribes that the players play (C,C), and those after which it prescribes that they play (D,D).
In the first set of histories, if player $i$ plays grim trigger, then the outcome is (C, C) in every period with payoffs (2, 2, . . .), whose discounted average is 2.

If $i$ deviates only once, she plays D. Then she reverts to the grim trigger strategy, which prescribes playing D in all subsequent periods.
The opponent, playing the grim trigger strategy, plays D forever as a consequence of $i$’s one-shot deviation.
The one-shot deviation (OSD) yields the stream of payoffs (3, 1, 1, . . .) with discounted average
$(1 - \delta)[3 + \delta + \delta^2 + \delta^3 + \cdots] = 3(1 - \delta) + \delta$.
Thus player $i$ cannot increase her payoff by deviating if and only if $2 \geq 3(1 - \delta) + \delta$, or $\delta \geq 1/2$.

In the second set of histories, if player $i$ plays grim trigger, then the outcome is (D, D) in every period with payoffs (1, 1, . . .), whose discounted average is 1.
If $i$ deviates only once, she plays C. Then she reverts to the grim trigger strategy, which prescribes playing D in all subsequent periods.
The opponent, playing the grim trigger strategy, plays D forever regardless of $i$’s one-shot deviation.
The OSD yields the stream of payoffs (0, 1, 1, . . .) with discounted average
$(1 - \delta)[0 + \delta + \delta^2 + \delta^3 + \cdots] = \delta$.

Player $i$ cannot increase her payoff by deviating: $1 \geq \delta$.
We conclude that if $\delta \geq 1/2$, then the strategy pair in which each player’s strategy is the grim trigger strategy is a Subgame-Perfect Equilibrium of the infinitely repeated Prisoner’s Dilemma.
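The two one-shot deviation conditions can be verified numerically. The following is a hedged sketch rather than part of the lecture: the helper names `stream_value` and `grim_trigger_is_spe` are assumptions made for illustration, and exact rational arithmetic (`fractions.Fraction`) is used so that the boundary case at discount factor 1/2 is not blurred by rounding.

```python
from fractions import Fraction

# Stage payoffs of player i in the lecture's Prisoner's Dilemma,
# indexed by (own action, opponent's action).
U = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def stream_value(first: int, forever: int, delta: Fraction) -> Fraction:
    """Discounted average of the payoff stream (first, forever, forever, ...)."""
    return (1 - delta) * first + delta * forever


def grim_trigger_is_spe(delta: Fraction) -> bool:
    """Check the one-shot deviation conditions in both groups of histories."""
    # Cooperative histories: follow -> (C,C) forever;
    # deviate once -> (D,C) today, then (D,D) forever.
    coop_ok = stream_value(U["C", "C"], U["C", "C"], delta) >= stream_value(
        U["D", "C"], U["D", "D"], delta
    )
    # Punishment histories: follow -> (D,D) forever;
    # deviate once -> (C,D) today, then (D,D) forever.
    punish_ok = stream_value(U["D", "D"], U["D", "D"], delta) >= stream_value(
        U["C", "D"], U["D", "D"], delta
    )
    return coop_ok and punish_ok


if __name__ == "__main__":
    for d in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
        print(d, grim_trigger_is_spe(d))
```

Only the condition in cooperative histories binds: the punishment-phase condition $1 \geq \delta$ holds for every discount factor below 1.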
SPE Folk Theorem
Theorem (Simplified Subgame Perfect Folk Theorem for Two-Player Games)
Let $G$ be a two-player strategic game. Let $w$ be a feasible payoff profile of $G$ for which each player’s payoff exceeds her (pure-strategy) minmax payoff. Then for all $\varepsilon > 0$ there exists $\underline{\delta} < 1$ such that if the discount factor exceeds $\underline{\delta}$ then the infinitely repeated game of $G$ has a subgame perfect equilibrium whose discounted average payoff profile $w'$ satisfies $|w' - w| < \varepsilon$.

Take an outcome path $a$ such that both players’ discounted average payoffs exceed their pure-strategy minmax payoffs.
Let $p_j$ be an action of player $i$ that holds player $j$ down to her minmax payoff, and let $p = (p_2, p_1)$.
If the minmax profile $p$ is a Nash Equilibrium of the stage game, then consider a modified grim trigger strategy such that both players play $a^t$ at every time $t$ and, if either player deviates, $p$ is played forever.
Because both players’ discounted average payoffs for $a$ exceed their minmax payoffs, if the discount factor $\delta$ is sufficiently close to one, the players will follow the modified grim trigger strategy, yielding the outcome path $a$.

If $p$ is not a Nash Equilibrium, the proof is as follows.
Let $s_i$ be a strategy of player $i$ that starts off choosing $a_{i,0}$, and continues to choose $a_{i,t}$ in period $t$ so long as the previous outcomes followed the path $a$; otherwise, it chooses the action $p_j$ that holds player $j$ to her minmax payoff.
Once punishment begins, it continues for $k$ periods, as long as both players choose their punishment actions, and then players revert to $a$.
If any player $j$ deviates from the assigned punishment action, then the punishments are re-started, and player $j$ is now punished.

To prove that $(s_1, s_2)$ is a subgame perfect equilibrium, we now find $\delta'$ and $k(\delta')$ such that if $\delta > \delta'$ then the strategy pair $(s_1, s_2)$ is a subgame perfect equilibrium of the infinitely repeated game.
Suppose that player $j$ adheres to $s_j$.
If player $i$ adheres to $s_i$ in any history with no deviations, then her discounted average payoff is $u_i(a)$.
If she deviates, she obtains at most her maximal payoff in the game, say $u_i^*$, in the period of her deviation, then $u_i(p)$ for $k$ periods, and subsequently $u_i(a)$ in the future.

Her discounted average payoff from the deviation is at most
$(1 - \delta)[u_i^* + \delta u_i(p) + \cdots + \delta^k u_i(p)] + \delta^{k+1} u_i(a) = (1 - \delta) u_i^* + \delta (1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.
Hence, she does not deviate if
$u_i(a) \geq (1 - \delta) u_i^* + \delta (1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.
If player $i$ adheres to $s_i$ in any history where the players play $p$, she gets $u_i(p)$ for at most $k$ periods, then $u_i(a)$ in every subsequent period.
This yields a discounted average payoff of at least $(1 - \delta^k) u_i(p) + \delta^k u_i(a)$.
Note that $u_i(p) \leq m_i$, her minmax payoff, and $u_i(a) > m_i$.

If she deviates from $s_i$, she obtains at most her minmax payoff in the period of her deviation, then $u_i(p)$ for $k$ periods, then $u_i(a)$ in the future.
This yields a discounted average payoff of at most
$(1 - \delta) m_i + \delta (1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.
She does not deviate if
$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \geq (1 - \delta) m_i + \delta (1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$,
or, after collecting terms and dividing by $1 - \delta$,
$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \geq m_i$.
For each value of $\delta$ sufficiently close to 1 we can find $k(\delta)$ such that $(\delta, k(\delta))$ satisfies the two no-deviation inequalities:
$u_i(a) \geq (1 - \delta) u_i^* + \delta (1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$,
$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \geq m_i$.

Finitely Repeated Games
Consider any Subgame Perfect Equilibrium of a finitely repeated game.
In the final stage, a Nash Equilibrium of the stage game must be played.
Hence, the set of equilibria is enlarged only if there are multiple equilibria in the stage game.
Otherwise, the unique Subgame Perfect Equilibrium of the repeated game is to play the unique Nash Equilibrium of the stage game in every period.

Prisoner’s Dilemma
The following game is repeated for $T$ periods.

            C       D
    C     2, 2    0, 3
    D     3, 0    1, 1

Proceeding by backward induction, in the last period, the unique Nash equilibrium is (D,D).

Because in the last period players play (D,D) regardless of the previous play, in the second to last period future payoffs do not depend on current play.
It is as if players were playing the following game.

            C       D
    C     2, 2    0, 3
    D     3, 0    1, 1

The unique Nash Equilibrium is (D,D).
Proceeding by backward induction, the unique subgame-perfect equilibrium is (D,D) in every period.

Expanded Prisoner’s Dilemma
The following game is repeated for $T$ periods.

            C        D        E
    C     2, 2     0, 3    -2,-2
    D     3, 0     1, 1    -2,-2
    E    -2,-2    -2,-2    -1,-1

There are 2 pure-strategy Nash Equilibria: (D, D) and (E, E).

Expanded Prisoner’s Dilemma
In period $T$, a Nash Equilibrium is played, either (D, D), with payoffs (1, 1), or (E, E), with payoffs (-1, -1).
We construct a Subgame Perfect Equilibrium as follows.
1. In period $T$, the profile (D, D) is played.
2. In all periods $t = 1, \ldots, T-1$, the profile (C, C) is played with payoffs (2, 2).
3. If either player deviates to D, then the future play switches to (E, E) in every remaining period.

This is a SPE if and only if players do not have an incentive to deviate in the period before the last.
In fact, the punishment of playing (E, E) in every remaining period is more severe if there are more periods left to play, so earlier deviations are deterred whenever a deviation in period $T-1$ is.
In period $T-1$, each player must prefer to play C, with payoff $2 + \delta$ (2 today, then 1 from (D, D) in period $T$), to playing D, with payoff $3 - \delta$ (3 today, then -1 from (E, E) in period $T$).
Hence, the strategies are a SPE if and only if $3 - \delta \leq 2 + \delta$, i.e. $\delta \geq 1/2$.

Summary of the Lecture
Infinitely Repeated Games
Nash and Subgame-Perfect Equilibrium
Finitely Repeated Games

Preview of the Next Lecture
Coalitional Games and the Core
Ownership and the Distribution of Wealth
Horse Trading and House Exchanges
Voting and Matching