Enforcing Social Norms: Trust-building and community enforcement∗

Joyee Deb† (New York University)
Julio González-Díaz‡ (University of Santiago de Compostela)

April 3, 2014

Abstract

We study impersonal exchange, and ask how agents can behave honestly in anonymous transactions, without contracts. We analyze repeated anonymous random matching games, where agents observe only their own transactions. Little is known about cooperation in this setting beyond the prisoner's dilemma. We show that cooperation can be sustained quite generally, using community enforcement and "trust-building". The latter refers to an initial phase of the game in which one community builds trust by not deviating despite a short-run incentive to cheat; the other community reciprocates trust by not punishing deviations during this phase. Trust-building is followed by cooperative play, sustained through community enforcement.

∗ Acknowledgements. We are grateful to Johannes Hörner for many insightful discussions and comments. We thank Mehmet Ekmekci, Jérôme Renault, Larry Samuelson, Satoru Takahashi and three anonymous referees for their comments. We also thank seminar participants at Boston University, GAMES 2008, Erice 2010, Northwestern University, New York University, Penn State University, and the Repeated Games workshop at SUNY Stony Brook for many suggestions. The second author gratefully acknowledges the support of the Sixth Framework Programme of the European Commission through a Marie Curie fellowship, and the Ministerio de Ciencia e Innovación through a Ramón y Cajal fellowship and through projects ECO2008-03484-C02-02 and MTM2011-27731-C03. Support from Xunta de Galicia through project INCITE09-207-064-PR is also acknowledged. A version of this paper has been circulated under the title "Community Enforcement Beyond the Prisoner's Dilemma."
† Email: joyee.deb@nyu.edu
‡ Email: julio.gonzalez@usc.es

1 Introduction

In many economic settings, impersonal exchange in society is facilitated by legal institutions that can enforce contractual obligations. However, there is also a wide range of economic examples where impersonal exchange exists in the absence of legal contractual enforcement, either because existing legal institutions are weak, or even by design. For instance, Greif (1994, 2006) documents impersonal exchange or trade between different communities in medieval Europe before the existence of centralized legal institutions. In large part, the diamond industry, even today, opts out of the legal system (Bernstein, 1992). Diamond dealers have developed norms or rules that enforce honest trading, and the community punishes opportunistic behavior by appropriate sanctions. Millions of transactions are completed on eBay and other internet sites, where buyers and sellers trade essentially anonymously (Resnick et al., 2000). These settings raise the important question of how agents achieve cooperative outcomes, and act in good faith in transactions with strangers, in the absence of formal contracts. This is the central question of our paper.

We model impersonal exchange as an infinitely repeated random matching game, in which players from two different communities are randomly and anonymously matched to each other to play a two-player game. Each player observes only his own transactions: He does not receive any information about the identity of his opponent or about how play proceeds in other transactions. In this setting of "minimal information-transmission," we ask what payoffs can be achieved in equilibrium.
In particular, can agents be prevented from behaving opportunistically? Can they achieve cooperative outcomes that could not be achieved in a one-shot interaction?

In practice, cooperation or honest behavior within and between communities is often sustained by community enforcement: a mechanism in which a player's dishonest behavior against one partner is punished by other members of society. Community enforcement arises as a norm in many settings. In this paper, we examine whether such social norms (cooperative behavior enforced by social sanctions) can be sustained in equilibrium with rational agents. Two early papers by Kandori (1992) and Ellison (1994) showed that community enforcement can sustain cooperative behavior for the special case in which the agents are engaged in a prisoner's dilemma game (PD). However, it is an open question whether cooperation can be sustained in other strategic situations beyond the PD.

In the aforementioned papers, if a player ever faces a defection, he punishes all future rivals by switching to defection forever (Nash reversion). By starting to defect, he spreads the information that someone has defected. The defection action spreads throughout the population: More and more people get infected, and cooperation eventually breaks down completely. The credible threat of such a breakdown of cooperation can deter players from defecting in the first place. However, these arguments rely critically on properties of the PD. In particular, since the Nash equilibrium of the PD is in strictly dominant strategies, the punishment action is dominant and so gives a current gain even if it lowers continuation payoffs. In an arbitrary game, on facing a deviation for the first time, players may not have the incentive to punish, because punishing can both lower future continuation payoffs and entail a short-term loss in that period. How then can cooperation be sustained in general?

We establish that it is, indeed, possible to sustain a wide range of payoffs in equilibrium in a large class of games beyond the PD, provided that all players are sufficiently patient and that the population size is not too small. We argue that communities of around twenty people should suffice for our results in most games. In particular, we show that for any stage-game with a strict Nash equilibrium, the ideas of community enforcement coupled with "trust-building" can be used to sustain cooperation. In equilibrium, play proceeds in two phases. There is an initial phase of what we call "trust-building," followed by a cooperative phase that lasts forever, as long as nobody deviates. If anyone observes a deviation in the cooperative phase, he triggers Nash reversion (or community enforcement). In the initial trust-building phase, players of one community build trust by not deviating from the equilibrium action even though they have a short-run incentive to do so, and players in the other community reciprocate the trust by not starting any punishments during this phase even if they observe a deviation. This initial trust-building phase turns out to be crucial to sustaining cooperation in the long-run.
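To make the contagion dynamics concrete, the following minimal simulation sketch (our illustration, not part of the paper's formal argument; the community size M = 20 and the seeding of a single deviant are assumptions chosen for the example) tracks how Nash reversion spreads under uniform random matching once one player defects and every player who has faced a defection defects thereafter:

```python
import random

def contagion_sim(M=20, T=30, seed=1):
    """Simulate contagion under uniform random matching: one player in
    Community 1 defects in period 1, and from then on every infected
    player plays the punishment action, infecting each rival he or she
    meets.  Returns the number of infected players in each community
    after every period."""
    random.seed(seed)
    infected1, infected2 = {0}, set()   # the initial deviant is in Community 1
    path = []
    for _ in range(T):
        partners = list(range(M))
        random.shuffle(partners)        # player i in Community 1 meets partners[i]
        new1, new2 = set(), set()
        for i, j in enumerate(partners):
            if i in infected1 and j not in infected2:
                new2.add(j)             # buyer j faces a defection this period
            if j in infected2 and i not in infected1:
                new1.add(i)             # seller i faces a defection this period
        infected1 |= new1               # infections take effect from next period
        infected2 |= new2
        path.append((len(infected1), len(infected2)))
    return path

print(contagion_sim())
```

In typical runs with M = 20, both communities are fully infected within roughly ten periods: the number of infected players grows roughly exponentially before saturating, which is the breakdown that deters the original defection.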
The ideas that profitable long-term relationships must start by building trust, and that a breach of trust is subsequently punished harshly by the community, are very intuitive, and our paper can be viewed as providing a foundation for how such norms of behavior are sustained.¹

¹ There is a literature on building trust in repeated interactions (see, for instance, Ghosh and Ray (1996) and Watson (2002)). The role of trust in this paper has a different flavor. These papers focus on the "gradual" building of trust, where the stakes in a relationship grow over time. Our equilibrium does not feature this gradualism. Rather, we have an initial phase in which players behave cooperatively even though they have an incentive to deviate, and this initial phase is exactly what helps sustain cooperation in the long-run. In this sense, trust is built so that trigger strategies can be effective in sustaining cooperation.

To the best of our knowledge, this is the first paper to sustain cooperation in a random matching game beyond the PD without adding any extra informational assumptions. Some papers that go beyond the PD introduce verifiable information about past play to sustain cooperation.² More recently, Deb (2012) obtains a general folk theorem by allowing transmission of unverifiable information (cheap talk).³

² For instance, Kandori (1992) assumes the existence of a mechanism that assigns labels to players based on their history of play. Players who have deviated or have seen a deviation can be distinguished from those who have not, by their labels. This naturally enables transmission of information, and cooperation can be sustained in a specific class of games. For related approaches, see Dal Bó (2007), Hasker (2007), Okuno-Fujiwara and Postlewaite (1995), and Takahashi (2010).

³ There is a recent literature on repeated games and community enforcement on networks. It may be worthwhile to investigate if the ideas of our construction apply in these settings. Yet, there are important modeling differences, which change the incentive issues substantively. On a network players are not anonymous. Moreover, this literature mostly restricts attention to the PD, and partners do not change over time (see, for instance, Ali and Miller (2012) and Lippert and Spagnolo (2011)). More recently, Nava and Piccione (Forthcoming) consider cooperation on a network that can change over time and show that cooperation can be sustained through some notion of reciprocity, which is not a possibility in our anonymous setting.

An important feature of our equilibrium is that the strategies are simple and plausible. Unlike recent work on games with imperfect private monitoring (Ely and Välimäki, 2002; Piccione, 2002; Ely, Hörner and Olszewski, 2005; Hörner and Olszewski, 2006) and, specifically, in repeated random matching games (Takahashi, 2010; Deb, 2012), we do not rely on belief-free ideas. The strategies give the players strict incentives on and off the equilibrium path. Further, unlike the existing literature, our strategies are robust to changes in the discount factor.⁴

⁴ In a very recent sequence of papers, Sugaya (2013a,b,c) establishes general folk theorems under imperfect private monitoring, for the two-player PD, general two-player games and N-player games. However, our anonymous random matching setting, even if viewed as an N-player game of imperfect private monitoring, does not fall within the scope of these papers, since it violates the full-support monitoring assumption, as well as other identifiability assumptions required in these papers.

The reader may wonder how our equilibrium strategies compare with social norms that arise in reality. Our strategies use community enforcement, which has the defining feature that opportunistic behavior by any agent is punished by (possibly all) members of the community, not necessarily by the victim alone. This is indeed a common feature in practice. For instance, in the diamond industry dishonest traders are punished through sanctions by all other community members. Greif (1994, 2006) documents the existence of a community responsibility system in which a player who acted dishonestly was punished by all members of the rival community, who in addition also punished the rest of the deviator's community. In practice, however, societies also try to disseminate information and publicize the identity of deviators for better enforcement. For instance, Bernstein (1992) reports that the diamond dealer's club "facilitates the transmission of information about dealers' reputations, and serves both a reputation-signaling and a reputation-monitoring role." Similarly, eBay and Amazon have developed reputation rating systems to enable transmission of information about past transactions (Resnick et al., 2000).
We do not allow dissemination of information within the community, but rather model all transactions as anonymous (even after deviations). We do this for two reasons. First, the anonymous setting with minimal information transmission offers a useful benchmark: Establishing the possibility of cooperation in this environment tells us that cooperation should certainly be sustainable when more verifiable information is available, and punishments can be personalized by using information about past deviations. Second, the anonymous setting allows us to model applications in which it may be difficult to know the identity of the rival. Consider, for instance, settings in which individual stakes are low but, because the number of agents is large, equilibrium behavior may still have a high social impact: For example, think of the repeated interaction between citizens in a district and the staff that keeps the district streets clean. Actions are taken asynchronously and it is natural to assume that the agents cannot transmit information about each other. Citizens can choose to litter or not. If the streets are too dirty, i.e., if the citizens litter, then a staff member might not have an incentive to exert effort, since such an effort would not make a difference. In this setting, we can ask if it is possible to sustain an equilibrium in which citizens do not litter, and the staff exerts effort, when the outcome in a static setting might be for the citizens to litter and for the staff to shirk.

It is worth emphasizing that this paper also makes a methodological contribution, since we develop techniques to work explicitly with players' beliefs. We use Markov chains to model the beliefs of the players off the equilibrium path. We hope that the methods we use to study the evolution of beliefs will be of independent interest, and can be applied in other contexts such as the study of belief-based equilibria in general repeated games.

It is useful to explain why working with beliefs is fundamental to our approach. Recall that the main challenge to sustaining cooperation through community enforcement is that, when an agent is supposed to punish a deviation, he may find that doing so is costly for both his current and his continuation payoffs.
The main feature of our construction with trust-building is that, when a player is required to punish by reverting to the Nash action, his off-path beliefs are such that he thinks that most people are already playing Nash. To see how this works, start by assuming that players entertained the possibility of correlated deviations. Then, we could assume that, upon observing a deviation, a player thinks that all the players in the rival community have simultaneously deviated and that everybody is about to start the punishment. This would make Nash reversion optimal, but it is an artificial way to guarantee a common flow of information and get coordinated punishments.⁵ Indeed, if we rule out the possibility of coordinated deviations, a player who observes a deviation early in the game will know that few players have been affected so far and that Nash reversion will not be optimal for him. This suggests that, to induce appropriate beliefs, equilibrium strategies must prescribe something different from Nash reversion in the initial periods of the game, which is the reason for the trust-building phase.⁶

⁵ The solution concept used in this paper is sequential equilibrium (Kreps and Wilson, 1982), which rules out such off-path beliefs. In contrast, such beliefs would be admissible, for instance, under weak perfect Bayesian equilibrium.

⁶ Ideally, we would like strategies such that every player, at each information set, has a unique best reply that is independent of his beliefs (as in Kandori (1992) and Ellison (1994)). We have not been able to construct such strategies.

The rest of the paper is organized as follows. Section 2 contains the model and the main result. In Section 3, we illustrate the result using a leading example of the product-choice game. This section is useful in understanding the strategies and the intuition behind the result. In Section 4, we discuss the generality of our result and potential extensions. We provide the general equilibrium construction and the formal proof in the Appendix.

2 Cooperation Beyond the PD

2.1 The Setting

There are 2M players, divided into two communities with M players each. In each period t ∈ {1, 2, …}, players are randomly matched into pairs, with each player from Community 1 facing a player from Community 2. The matching is anonymous, independent and uniform over time.⁷ After being matched, each pair of players plays a finite two-player game G. The action set of each player i ∈ {1, 2} is denoted by A_i and an action profile is an element of A := A_1 × A_2. Players observe the actions and payoffs in their match. Then, a new matching occurs in the next period. Throughout, we refer to arbitrary players and players in Community 1 as male players and to players in Community 2 as female players.

⁷ Although the assumption of uniform matching greatly simplifies the calculations, we expect our results to hold for other similar matching technologies.

Players can observe only the transactions they are personally engaged in, i.e., each player knows the history of action profiles played in each of his stage-games in the past. A player never observes his opponent's identity. Further, he gets no information about how other players have been matched or about the actions chosen by any other pair of players. All players have discount factor δ ∈ (0, 1), and their payoffs in the infinitely repeated random matching game are the normalized sum of the discounted payoffs from the stage-games.
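In symbols (our transcription of this standard definition, where m(i, t) denotes the rival assigned to player i by the matching in period t; the notation m(i, t) is ours and is not used elsewhere in the paper), player i's payoff in the repeated random matching game is

U_i = (1 − δ) Σ_{t=1}^{∞} δ^{t−1} u_i(a^t_i, a^t_{m(i,t)}),

where u_i is player i's payoff function in the stage-game G and a^t_j is the action played by player j in period t.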
No public randomization device is assumed.⁸ The solution concept used is sequential equilibrium. Given a game G, the associated repeated random matching game⁹ with communities each of size M and discount factor δ is denoted by G^M_δ.

⁸ Section 4 has a discussion of what can be gained with a randomization device.

⁹ Note that we are assuming that there are two independent communities. Alternatively, we could have taken one population whose members are matched in pairs in every period and, in each match, the roles of players are randomly assigned. We believe that the trust-building ideas that underlie this paper can be adapted to this new setting and refer the reader to the Online Appendix (B.7) for a discussion. It is worth noting that the choice to model independent communities is driven by the fact that it seems to be the hardest scenario in which to achieve cooperation, because of the negative result in Section 2.2.

2.2 A negative result

The main difficulty in sustaining cooperation through community enforcement is that players may not have the incentive to punish deviations. Below, we present a simple example to show that a straightforward adaptation of grim trigger strategies (or the contagion strategies as in Ellison (1994)) cannot be used to support cooperation in general.

                     Buyer
               BH            BL
Seller  QH     1, 1          −l, 1 − c
        QL     1 + g, −l     0, 0

Figure 1: The product-choice game.

Consider the simultaneous-move game between a buyer and a seller presented in Figure 1. Suppose that this game is played by a community of M buyers and a community of M sellers in the repeated anonymous random matching setting. In each period, every seller is randomly matched with a buyer. After being matched, each pair plays the product-choice game, where g > 0, c > 0, and l > 0.¹⁰ The seller can exert either high effort (QH) or low effort (QL) in the production of his output. The buyer, without observing the seller's choice, can buy either a high-priced product (BH) or a low-priced product (BL). The buyer prefers the high-priced product if the seller has exerted high effort and prefers the low-priced product if the seller has not. For the seller, exerting low effort is a dominant action. The efficient outcome of this game is (QH, BH), while the Nash equilibrium is (QL, BL).

¹⁰ See Mailath and Samuelson (2006) for a discussion of this game in the context of repeated games.

We denote a product-choice game by Γ(g, l, c). The infinitely repeated random matching game associated with the product-choice game Γ(g, l, c), with discount parameter δ and communities of size M, is denoted by Γ^M_δ(g, l, c).

Proposition 1. Let Γ(g, l, c) be a product-choice game with c ≤ 1. Then, there is M̄ ∈ N such that, for each M ≥ M̄, regardless of the discount factor δ, the repeated random matching game Γ^M_δ(g, l, c) has no sequential equilibrium in which (QH, BH) is played in every period on the equilibrium path, and in which players play the Nash action off the equilibrium path.

Proof. Suppose that there exists an equilibrium in which (QH, BH) is played in every period on the equilibrium path, and in which players play the Nash action off the equilibrium path. Suppose that a seller deviates in period 1. We argue below that, for a buyer who observes this deviation, it is not optimal to switch to the Nash action permanently from period 2. In particular, we show that playing BH in period 2, followed by switching to BL from period 3 onwards, gives the buyer a higher payoff.
The buyer who observes the deviation knows that, in period 2, with probability (M − 1)/M she will face a different seller, who will play QH. Consider this buyer's short-run and long-run incentives:

Short-run: The buyer's payoff in period 2 from playing BH is (1/M)(−l) + (M − 1)/M. Her payoff if she switches to BL is ((M − 1)/M)(1 − c). Hence, if M is large enough, she has no short-run incentive to switch to the Nash action.

Long-run: With probability 1/M, the buyer will meet the deviant seller (who is already playing QL) in period 2. In this case, her action will not affect this seller's future behavior, and so her continuation payoff will be the same regardless of her action. With probability (M − 1)/M, the buyer will meet a different seller. Note that, since 1 − c ≥ 0, a buyer always prefers to face a seller playing QH. So, regardless of the buyer's strategy, the larger the number of sellers who have already switched to QL, the lower is her continuation payoff. Hence, playing BL in period 2 will give her a lower continuation payoff than playing BH, because action BL will make a new seller switch permanently to QL.

Since there is no short-run or long-run incentive to switch to the Nash action in period 2, the buyer will not start punishing. Therefore, playing (QH, BH) in every period on-path, and playing the Nash action off-path, does not constitute a sequential equilibrium, regardless of the discount factor.
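The short-run comparison in the proof is easy to check mechanically. The sketch below (our illustration; the parameter values l = c = 1/2 are arbitrary values satisfying the assumptions of Proposition 1) evaluates the buyer's two period-2 payoffs:

```python
def buyer_period2_payoffs(M, l, c):
    """Period-2 expected payoffs of a buyer who faced QL in period 1,
    when only the deviant seller (one of M) has switched to QL."""
    pay_BH = (1 / M) * (-l) + ((M - 1) / M) * 1        # face QL: -l; face QH: 1
    pay_BL = (1 / M) * 0 + ((M - 1) / M) * (1 - c)     # face QL: 0; face QH: 1 - c
    return pay_BH, pay_BL

# pay_BH - pay_BL = ((M - 1) * c - l) / M, so BH is strictly better as soon
# as M > 1 + l / c; with l = c = 1/2 this already holds for every M >= 3.
for M in (2, 3, 10, 50):
    print(M, buyer_period2_payoffs(M, l=0.5, c=0.5))
```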
Notice that the product-choice game represents a minimal departure from the PD. If we replace the payoff 1 − c with 1 + g, we get the PD. However, even with this small departure from the PD, cooperation cannot be sustained in equilibrium using the standard grim trigger strategies.¹¹ In the next section, we state our main result, which is a possibility result about cooperation in a class of games beyond the PD.

¹¹ It is worth emphasizing that while Proposition 1 establishes that cooperation cannot be sustained via grim trigger strategies, it does not rule out the possibility of sustaining cooperation using other repeated game strategies. Establishing such a negative result for patient players for the full class of repeated game strategies, although desirable, is typically very challenging, and we have not been able to obtain such a result.

2.3 The Main Result

CLASS OF GAMES: We denote by 𝒢 the class of finite two-player games with the following two properties:

P1. Strict Nash equilibrium. There exists a strict Nash equilibrium a*.¹²

P2. One-sided Incentives. There exists a pure action profile (â_1, â_2) in which one player has a strict incentive to deviate while the other has a strict incentive to stick to the current action.

¹² A Nash equilibrium is strict if each player has a unique best response to his rival's actions (Harsanyi, 1973). By definition, a strict equilibrium is in pure actions.

ACHIEVABLE PAYOFFS: Let G be a game and let ā ∈ A. Let A_ā := {a ∈ A : a_1 = ā_1 ⇔ a_2 = ā_2}. Define F_ā := conv{u(a) : a ∈ A_ā} ∩ {v ∈ R² : v > u(ā)}.

Our main result says that, given a game G in 𝒢 with a strict Nash equilibrium a*, it is possible for players to achieve any payoff in F_{a*} in equilibrium in the corresponding infinitely repeated random matching game G^M_δ, if players are sufficiently patient and the communities are not too small. 𝒢 is a large class of games and, in particular, includes the PD. Before stating the result formally, we discuss the assumptions P1 and P2.

Since we are interested in Nash reversion, the existence of a pure Nash equilibrium is needed.¹³ The extra requirement of strictness eliminates only games that are non-generic. To see why we need P1, note that under imperfect private monitoring, it is typically impossible to get full coordination in the punishments. Therefore, if a player is asked to revert to a Nash equilibrium that is not strict, he may not be willing to do so. That is, if he thinks that there is a positive probability that his opponent will not play the Nash action, he may not be indifferent between two actions in the support of his Nash action and, indeed, his myopic best reply may even be outside the support.

¹³ In the literature on repeated random matching games, it is common to restrict attention to pure strategies.

P2 is a mild condition on the class of games. 𝒢 excludes what we call games with strictly aligned interests: for two-player games this means that, at each action profile, a player has a strict incentive to deviate if and only if his opponent also has a strict incentive to deviate. The games in 𝒢 are generic in the class of games without strictly aligned interests that have a pure Nash equilibrium.¹⁴

¹⁴ We have not been able to apply our approach to games of strictly aligned interests. We refer the reader to the Online Appendix (B.9) for an example that illustrates the difficulty with achieving cooperation in certain games in this class. However, cooperation is not an issue in commonly studied games in this class, like "battle of the sexes" and the standard version of "chicken," since in these games the set of Pareto efficient payoffs is spanned by the set of pure Nash payoffs (so, we can alternate the pure Nash action profiles with the desired frequencies).

The set of achievable payoffs includes payoffs arbitrarily close to efficiency for the prisoner's dilemma. In general, our strategies do not suffice to get a folk theorem for games in 𝒢. However, we conjecture that, by adequately modifying our strategies, it may be possible to support payoffs outside F_{a*} and obtain a Nash threats folk theorem for the games in 𝒢.¹⁵ We now present the formal statement of the result.

¹⁵ We refer the reader to the Online Appendix (B.8) for a discussion.

Theorem 1. Let G be a game in 𝒢 with a strict Nash equilibrium a*. There exists M̄ ∈ N such that, for each payoff profile v ∈ F_{a*}, each ε > 0, and each M ≥ M̄, there exists δ̲ ∈ (0, 1) such that there is a strategy profile in the repeated random matching game G^M_δ that constitutes a sequential equilibrium for each δ ∈ [δ̲, 1) with payoff within ε of v.

A noteworthy feature of our equilibrium strategies is that if a strategy profile constitutes an equilibrium for a given discount factor, it does so for any higher discount factor as well; in particular, the equilibrium strategy profile defines what is called a uniform equilibrium (Sorin, 1990). This is in contrast with the existing literature, where strategies have to be fine-tuned based on the discount factor (e.g., Takahashi (2010) and Deb (2012)).¹⁶ Interestingly, the continuation payoff is within ε of the target equilibrium payoff v, not just ex ante, but throughout the game on the equilibrium path. In this sense, cooperation is sustained as a durable phenomenon, which contrasts with reputation models where, for every δ, there exists a time after which cooperation collapses (e.g., Cripps, Mailath and Samuelson (2004)).

¹⁶ Further, in Ellison (1994), the severity of punishments depends on the discount factor, which has to be common for all players. We just need all players to be sufficiently patient.
While cooperation with a larger population calls for more patient players (larger δ), a very small population also hinders cooperation. Our construction requires a minimum community size M̄. We work explicitly with beliefs off-path, and a relatively large M guarantees that the beliefs induce the correct incentives to punish. The lower bound M̄ on the community size depends only on the game G; it is independent of the target payoff v and the precision ε. In Sections A.3.3 and A.4 in the Appendix, we discuss rough bounds on M̄ that suggest that populations of size 20 would be enough for many games.¹⁷

¹⁷ It would be interesting to investigate if our strategies are robust to entry and exit of players. We think that our equilibrium construction may be extended to a setting in which we allow players to enter and exit in each period with small probability, but such an analysis is beyond the scope of this paper.

2.4 Equilibrium Strategies

Let G be a game in 𝒢 and let a* be a strict Nash equilibrium. Let (â_1, â_2) be a pure action profile in which only one player has an incentive to deviate. Without loss of generality, suppose it is player 1 who wants to deviate from (â_1, â_2) while player 2 does not, and let ā_1 be the most profitable deviation. Let the target equilibrium payoff be v ∈ F_{a*}. We maintain the convention that players 1 and 2 of the stage-game belong to Communities 1 and 2, respectively. Also, henceforth, when we refer to the Nash action, we mean the strict Nash equilibrium action profile a*.

We present the strategies that sustain v. We divide the game into three phases (Figure 2). Phases I and II are trust-building phases and Phase III is the target payoff phase.

Phase I: periods 1, …, Ṫ  |  Phase II: periods Ṫ + 1, …, Ṫ + T̈  |  Phase III: periods Ṫ + T̈ + 1, …, ∞

Figure 2: Different phases of the strategy profiles.

Equilibrium play:

Phase I: During the first Ṫ periods, action profile (â_1, â_2) is played. In every period in this phase, players from Community 1 have a short-run incentive to deviate, but those from Community 2 do not.

Phase II: During the next T̈ periods, players play (a*_1, a_2), an action profile in which players from Community 1 play their Nash action and players from Community 2 do not play their best response.¹⁸ In every period in this phase, players from Community 2 have a short-run incentive to deviate.

Phase III: For the rest of the game, the players play a sequence of pure action profiles that approximates the target payoff v and such that no player plays according to a* in period Ṫ + T̈ + 1.

¹⁸ Player 2's action a_2 can be any action other than a*_2 in the stage-game.

Off-Equilibrium play: A player can be in one of two moods: uninfected or infected, with the latter mood being irreversible. At the start of the game, all players are uninfected. We classify deviations into two types of actions. Any deviation by a player from Community 2 in Phase I is called a non-triggering action. Any other deviation is a triggering action. A player who has observed a triggering action is in the infected mood. An uninfected player continues to play as if on-path.¹⁹ An infected player acts as follows.
• A player who gets infected after facing a triggering action switches to the Nash action forever, either from the beginning of Phase II or immediately from the next period, whichever is later.²⁰ In other words, any player in Community 2 who faces a triggering action in Phase I switches to her Nash action forever from the start of Phase II, playing as if on-path in the meantime. A player facing a triggering action at any other stage of the game immediately switches to the Nash action forever.

• A player who gets infected by playing a triggering action himself henceforth best responds, period by period, to the strategies of the other players. In particular, we will show that, for large enough Ṫ, a player from Community 1 who deviates in the first period will continue to (profitably) deviate until the end of Phase I, and then switch to playing the Nash action forever.²¹

¹⁹ In other words, if a player from Community 1 observes a deviation in Phase I, he ignores the deviation, and continues to play as if on-path.

²⁰ Moreover, if such an infected player decides to deviate himself in the future, he will subsequently play ignoring his own deviation.

²¹ In particular, even if the deviating player observes a history that has probability zero, he will continue to best respond given his beliefs (described in Section A.2).

Note that a profitable deviation by a player is punished (ultimately) by the whole community of players, with the punishment action spreading like an epidemic. We refer to the spread of punishments as contagion.

The difference between our strategies and standard contagion (Kandori, 1992; Ellison, 1994) is that here, the game starts with two trust-building phases. This can be interpreted as follows. In Phase I, players in Community 1 build credibility by not deviating even though they have a short-run incentive to do so. The situation is reversed in Phase II, where players in Community 2 build credibility by not playing a*_2, even though they have a short-run incentive to do so. A deviation by a player in Community 1 in Phase I is not punished in his trust-building phase, but is punished as soon as the phase is over. Similarly, if a player in Community 2 deviates in her trust-building phase, she effectively faces punishment once the trust-building phase is over. Unlike the results for the PD, where the equilibria are based on trigger strategies, we have "delayed" trigger strategies. In Phase III, deviations immediately trigger Nash reversion.
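The prescriptions above can be summarized as a small decision rule. The sketch below (our own rendering, instantiated for the product-choice game of Section 3; the values Ṫ = 100 and T̈ = 10 are placeholders) covers uninfected players and players infected by facing a triggering action; a player infected by his own deviation instead best responds, as described in the second bullet above:

```python
T_DOT, T_DDOT = 100, 10     # placeholder values for Ṫ and T̈

def prescribed_action(community, t, infected_since=None):
    """Equilibrium action in period t for a player who is uninfected
    (infected_since is None) or became infected by *facing* a triggering
    action in period infected_since.  Community 1 = sellers, 2 = buyers."""
    nash = 'QL' if community == 1 else 'BL'            # strict Nash profile a*
    if infected_since is not None and t > max(T_DOT, infected_since):
        # Nash reversion from the start of Phase II or from the period
        # after infection, whichever is later
        return nash
    if t <= T_DOT:                      # Phase I: (â1, â2) = (QH, BH)
        return 'QH' if community == 1 else 'BH'
    if t <= T_DOT + T_DDOT:             # Phase II: (a1*, a2) = (QL, BH)
        return 'QL' if community == 1 else 'BH'
    return 'QH' if community == 1 else 'BH'            # Phase III: target profile
```

For instance, a buyer infected in period 3 keeps playing BH until the end of Phase I and reverts to BL from period Ṫ + 1 onward, while a player infected in Phase III reverts immediately from the next period.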
Next, we discuss the main ideas underlying the proof of Theorem 1 for the particular case of the product-choice game. The general proof can be found in the Appendix.

3 An Illustration: The Product-Choice Game

We show that, for the product-choice game (Figure 1), it is possible to sustain the efficient action profile in equilibrium, using strategies as described above in Section 2.4.

3.1 Equilibrium Strategies in the Product-Choice Game

Equilibrium play:

Phase I: During the first Ṫ periods, (QH, BH) is played.

Phase II: During the next T̈ periods, the players play (QL, BH).

Phase III: For the rest of the game, the players play the efficient action profile (QH, BH).

Off-Equilibrium play: Action BL in Phase I is a non-triggering action. Any other deviation is a triggering action. An uninfected player continues to play as if on-path. A player who gets infected after facing a triggering action switches to the Nash action forever, either from the beginning of Phase II or immediately from the next period, whichever is later. A player facing a triggering action at any other stage immediately switches to the Nash action forever. A player who gets infected by playing a triggering action best responds to the strategies of the other players.

Clearly, the payoff from the strategy profile above will be arbitrarily close to the efficient payoff (1, 1) for δ large enough. We need to establish that the strategy profile constitutes a sequential equilibrium of the repeated random matching game Γ^M_δ(g, l, c) when Ṫ, T̈, and M are appropriately chosen and δ is close enough to 1.

3.2 Optimality of Strategies: Intuition

The incentives on-path are quite straightforward. Any short-run profitable deviation will eventually trigger Nash reversion that will spread and reduce continuation payoffs. Hence, given Ṫ, T̈, and M, for sufficiently patient players, the future loss in continuation payoff will outweigh any current gain from deviation. Establishing sequential rationality off-path is the challenge. Below, we consider some key histories and argue why the strategies are optimal after these histories. We start with two observations.

First, a seller who deviates to make a short-term gain in period 1 will find it optimal to revert to the Nash action immediately. This seller knows that, regardless of his choice of actions, from period Ṫ on, at least one buyer will start playing Nash. Then, from period Ṫ + T̈ on, more and more players will start playing the Nash action, and contagion will spread exponentially fast. Thus, his continuation payoff after Ṫ + T̈ will be quite low, regardless of what he does in the remainder of Phase I. Our construction is such that if Ṫ is large enough, both in absolute terms and relative to T̈, no matter how patient this seller is, the best thing he can do after deviating in period 1 is to play the Nash action forever.²²

²² For the deviant seller's incentives, the choice of large Ṫ with Ṫ ≫ T̈ is important for two reasons: First, if Ṫ ≫ T̈, then a seller who deviates in period 1 will find it optimal to keep deviating and making short-run profits in Phase I, without caring about potential losses in Phase II. Second, as long as Ṫ ≫ T̈, the deviant seller will believe that the contagion is spread widely enough in Phase I that he will be willing to play Nash throughout Phase III, regardless of the history he observes.

Second, the optimal action of a player after he observes a triggering action depends on the beliefs that he has about how far the contagion has already spread. To see why, think of a buyer who observes a triggering action during, say, Phase III. Is Nash reversion optimal for her? If she believes that few people are infected, then playing the Nash action may not be optimal. In such a case, with high probability she will face a seller playing QH, and playing the Nash action will entail a loss in that period. Moreover, she is likely to infect her opponent, hastening the contagion and lowering her own continuation payoff. The situation is different if she believes that almost everybody is infected (so, playing Nash). Then, there is a short-run gain from playing the Nash action in this period. Moreover, the effect on the contagion process and the continuation payoff will be negligible.

Since the optimal action for a player after observing a triggering action depends on the beliefs he has about "how widespread the contagion is," we need to define a system of beliefs and check if Nash reversion is optimal after getting infected, given these beliefs. We make the following assumptions on beliefs.
If an uninfected player observes a non-triggering action, then he just thinks that the opponent made a mistake and that no one is infected. If a player observes a triggering action, he thinks that some seller deviated in period 1 and that contagion has been spreading since then. Such beliefs are natural given the interpretation of our strategies: deviations become less likely as more trust is built. Here, we formally study the most extreme such beliefs, which leads to cleaner proofs. In the Online Appendix (B.1) we prove that they are consistent in the sense required by sequential equilibrium. These beliefs, along with the fact that a deviant seller will play the Nash action forever, imply that, for sufficiently large Ṫ, any player who observes a triggering action thinks that almost everybody must be infected by the end of Phase I. This makes Nash reversion optimal after the end of Phase I. To gain some insight, consider the following histories.

• Suppose that I am a buyer who gets infected in Phase I. I think that a seller deviated in the first period and that he will continue infecting buyers throughout Phase I. Unless M is too small, in each of the remaining periods of Phase I, the probability of meeting the same seller again is low; so, I prefer to play BH during Phase I (since all other sellers are playing QH). Yet, if Ṫ is large enough, once Phase I is over I will think that, with high probability, every buyer is now infected. Nash reversion thereafter is optimal. It may happen that, after I get infected, I observe QL in most (possibly all) periods of Phase I. Then, I think that I met the deviant seller repeatedly, and so not all buyers are infected. However, it turns out that if T̈ is large enough, I will still revert to Nash play. Since I expect my continuation payoff to drop after Ṫ + T̈ anyway, for T̈ large enough, I prefer to play the myopic best reply in Phase II, to make some short-term gains (similar to the argument for a seller's best reply if he deviates in period 1).²³

• Suppose that I am a seller who faces BL in Phase I (a non-triggering action). Since such actions are never profitable (on-path or off-path), after observing such an action, I will think it was a mistake and that no one is infected. Then, it is optimal to ignore it. The deviating player knows this, and so it is also optimal for him to ignore it.

• Suppose that I am a player who gets infected in Phase II, or shortly after period Ṫ + T̈. I know that contagion has been spreading since the first period. But the fact that I was uninfected so far may indicate that, possibly, not so many people are infected. We show that if Ṫ is large enough and Ṫ ≫ T̈, I will still think that, with high probability, I was just lucky not to have been infected so far, but that everybody is infected now. This makes Nash reversion optimal.

• Suppose that I get infected late in the game, at period t̄ ≫ Ṫ + T̈. If t̄ ≫ Ṫ + T̈, it is no longer possible to rely on how large Ṫ is to characterize my beliefs. However, for this and other related histories late in the game, it turns out that, if M is reasonably large, I will still believe that "enough" people are infected and already playing the Nash action, so that playing the Nash action is also optimal for me.

²³ The reason why we need Phase II is to account for the histories in which the deviant seller meets the same buyer in most of the periods of Phase I. Since the probability of these histories is extremely low, they barely affect the incentives of a seller, and so having T̈ periods of Phase II is probably not necessary. Yet, introducing Phase II is convenient since it allows us to completely characterize off-path behavior of the buyers.
3.3 Choice of Parameters for Equilibrium Construction

It is important to clarify how the different parameters, M̄, T̈, Ṫ, and δ̲, are chosen to construct the equilibrium. First, given a game, we find M̄ so that i) a buyer who is infected in Phase I does not revert to the Nash action before Phase II, and ii) players who are infected very late in the game believe that enough people are infected for Nash reversion to be optimal. Then, we choose T̈ so that, in Phase II, any infected buyer will find it optimal to revert to the Nash action (even if she observed QL in all periods of Phase I). Then, we pick Ṫ, with Ṫ ≫ T̈, so that i) players infected in Phase II or early in Phase III believe that almost everybody is infected, and ii) a seller who deviates in period 1 plays the Nash action ever after. Finally, we pick δ̲ large enough so that players do not deviate on path.²⁴

²⁴ Note that δ̲ must also be large enough so that the payoff achieved in equilibrium is close enough to (1, 1).

The role of the discount factor δ requires further explanation. A high δ deters players from deviating from cooperation, but also makes players want to slow down the contagion. Then, why are very patient players willing to spread the contagion after getting infected? The essence of the argument is that, for a fixed M, once a player is infected he knows that contagion will start spreading, and expected future payoffs will converge to 0 exponentially fast. Indeed, we can derive an upper bound on the future gains any player can make by slowing down the contagion once it has started, even if he were perfectly patient (δ = 1). Once we know that the gains from slowing down the contagion are bounded in this way, it is easy to show that even very patient players will be willing to play the Nash action.

4 Discussion and Extensions

4.1 Introduction of Noise

An equilibrium is said to be globally stable if, after any finite history, play eventually reverts to cooperative play (Kandori, 1992). The notion is appealing because it implies that a single mistake does not entail permanent reversion to punishments. The equilibrium here fails to satisfy global stability, though this can be obtained by introducing a public randomization device.²⁵

²⁵ This is similar to Ellison (1994). The randomization device could be used to allow for the possibility of restarting the game in any period, with a low but positive probability.

A related question is to see if cooperation can be sustained in a model with some noise. Since players have strict incentives, our equilibria are robust to the introduction of some noise in the payoffs. Suppose, however, that we are in a setting where players are constrained to make mistakes with probability at least ε > 0 (small) at every possible history. Our equilibrium construction is not robust to this modification. The incentive compatibility of our strategies relies on the fact that players believe that early deviations are more likely. This ensures that whenever players are required to punish, they think that the contagion has spread enough for punishing to be optimal. If players make mistakes with positive and equal probability in all periods, this property is lost. However, we conjecture that, depending on the game, suitable modifications of our strategies may be robust to the introduction of mistakes.
To see why, think of a player who observes a deviation for the first time late in Phase III. In a setting in which mistakes occur with probability ε at every period, this player may not believe that the contagion is widespread, but will still think that it has spread to a certain degree. Whether this is enough to provide him with the incentive to punish depends on the specific payoffs of the stage-game.²⁶

²⁶ It is worth noting that if we consider the same strategies we used to prove Theorem 1, there is one particularly problematic situation. Consider the product-choice game in a setting with noise. If a buyer makes a mistake late in Phase II, no matter what she does after that, she will start Phase III knowing that not many people are already infected. Hence, if she is very patient, it may be optimal for her to continue play as if on-path and slow down the contagion. Suppose that a seller observes a triggering action in the last period of Phase II. Since sellers do not spread the contagion in Phase II, this seller will think that it is very likely that his opponent was uninfected and has just made a mistake, and so will not punish. In this case, neither player reverts to Nash punishments. This implies that a buyer may profitably deviate in the last period of Phase II, since her deviation would go unpunished.

4.2 Uncertainty about Timing

In the equilibrium strategies in this paper, players condition behavior on calendar time. On-path, players in Community 1 switch their action in a coordinated way at the end of Phases I and II. Off-path, players coordinate the start of the punishment phase. The calendar time and timing of phases (Ṫ and T̈) are commonly known and used to coordinate behavior. Arguably, in modeling large communities, the need to switch behavior with precise coordination is an unappealing feature. It may be interesting to investigate if cooperation can be sustained if players are not sure about the precise time to switch phases. A complete analysis of this issue is beyond the scope of this paper. We conjecture that a modification of our strategies would be robust to the introduction of small uncertainty about timing. The reader may refer to the Online Appendix (B.6), where we consider an altered environment in which players are slightly uncertain about the timing of the different phases. We conjecture equilibrium strategies in this setting, and provide the intuition behind them.

4.3 Alternative Systems of Beliefs

We assume that a player who observes a triggering action believes that some player from Community 1 deviated in the first period of the game. This ensures that an infected player thinks that the contagion has been spreading long enough that, after Phase I, almost everybody is infected. It is easy to see that alternative (less extreme) assumptions on beliefs would still have delivered this property. We work with this case mainly for tractability. Also, since our equilibrium is based on communities building trust in the initial phases of the game, it is plausible that players regard deviations to be more likely earlier rather than later. Further, the assumption we make is a limiting one in the sense that it yields the weakest bound on M.
With other assumptions, for a given game G ∈ 𝒢 and given Ṫ and T̈, the threshold population size M̄ required to sustain cooperation would be weakly greater than the threshold we obtain. Why is this so? On observing a triggering action, my belief about the number of infected people is determined by my belief about when the first deviation took place and the subsequent contagion process. Formally, on getting infected at period t, my belief x^t can be expressed as x^t = Σ_{τ=1}^{t} µ(τ) y^t(τ), where µ(τ) is the probability I assign to the first deviation having occurred at period τ, and y^t(τ) is my belief about the number of people infected if I know that the first deviation took place at period τ. Since contagion is not reversible, every elapsed period of contagion results in a weakly greater number of infected people. Thus, my belief if I think the first infection occurred at τ = 1 first-order stochastically dominates my belief if I think the first infection happened later, at any τ > 1; i.e., for each τ and each l ∈ {1, …, M}, Σ_{i=l}^{M} y^t_i(1) ≥ Σ_{i=l}^{M} y^t_i(τ). Now consider any belief x̂^t that I might have had with alternative assumptions on when I think the first deviation occurred. This belief will be some convex combination of the beliefs y^t(τ), for τ = 1, …, t. Since we know that y^t(1) first-order stochastically dominates y^t(τ) for all τ > 1, it follows that y^t(1) will also first-order stochastically dominate x̂^t. Therefore, our assumption is the one for which players will think that the contagion is most widespread at any given time, and so makes the off-path incentives easier to satisfy.
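In display form, the dominance step in this argument is the following one-line computation (our transcription, with µ̂(τ) denoting the alternative weights over the date of the first deviation):

Σ_{i=l}^{M} x̂^t_i = Σ_{τ=1}^{t} µ̂(τ) Σ_{i=l}^{M} y^t_i(τ) ≤ Σ_{τ=1}^{t} µ̂(τ) Σ_{i=l}^{M} y^t_i(1) = Σ_{i=l}^{M} y^t_i(1)   for each l ∈ {1, …, M},

which is precisely the statement that y^t(1) first-order stochastically dominates x̂^t.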
References

Ali, Nageeb, and David Miller. 2012. "Efficiency in repeated games with local interaction and uncertain local monitoring."

Bernstein, Lisa. 1992. "Opting Out of the Legal System: Extralegal Contractual Relations in the Diamond Industry." Journal of Legal Studies, 115.

Cripps, Martin W., George J. Mailath, and Larry Samuelson. 2004. "Imperfect Monitoring and Impermanent Reputations." Econometrica, 72(2): 407–432.

Dal Bó, P. 2007. "Social norms, cooperation and inequality." Economic Theory, 30(1): 89–105.

Deb, J. 2012. "Cooperation and Community Responsibility: A Folk Theorem for Random Matching Games with Names." Mimeo.

Ellison, Glenn. 1994. "Cooperation in the Prisoner's Dilemma with Anonymous Random Matching." Review of Economic Studies, 61(3): 567–88.

Ely, J. C., and J. Välimäki. 2002. "A Robust Folk Theorem for the Prisoner's Dilemma." Journal of Economic Theory, 102(1): 84–105.

Ely, Jeffrey C., Johannes Hörner, and Wojciech Olszewski. 2005. "Belief-Free Equilibria in Repeated Games." Econometrica, 73(2): 377–415.

Ghosh, Parikshit, and Debraj Ray. 1996. "Cooperation in Community Interaction without Information Flows." The Review of Economic Studies, 63: 491–519.

Greif, Avner. 1994. "Cultural Beliefs and the Organization of Society: A Historical and Theoretical Reflection on Collectivist and Individualist Societies." Journal of Political Economy, 102(5): 912–950.

Greif, Avner. 2006. Institutions and the Path to the Modern Economy: Lessons from Medieval Trade. Cambridge University Press.

Harsanyi, J. 1973. "Oddness of the number of equilibrium points: A new proof." International Journal of Game Theory, 2: 235–250.

Hasker, K. 2007. "Social norms and choice: a weak folk theorem for repeated matching games." International Journal of Game Theory, 36(1): 137–146.

Hörner, Johannes, and Wojciech Olszewski. 2006. "The folk theorem for games with private almost-perfect monitoring." Econometrica, 74(6): 1499–1544.

Kandori, Michihiro. 1992. "Social Norms and Community Enforcement." Review of Economic Studies, 59(1): 63–80.

Kreps, David M., and Robert Wilson. 1982. "Sequential Equilibria." Econometrica, 50: 863–894.

Lippert, Steffen, and Giancarlo Spagnolo. 2011. "Networks of Relations and Word-of-Mouth Communication." Games and Economic Behavior, 72(1): 202–217.

Mailath, George J., and Larry Samuelson. 2006. Repeated Games and Reputations: Long-Run Relationships. Oxford University Press.

Nava, Francesco, and Michele Piccione. Forthcoming. "Efficiency in repeated games with local interaction and uncertain local monitoring." Theoretical Economics.

Okuno-Fujiwara, M., and A. Postlewaite. 1995. "Social Norms and Random Matching Games." Games and Economic Behavior, 9: 79–109.

Piccione, M. 2002. "The Repeated Prisoner's Dilemma with Imperfect Private Monitoring." Journal of Economic Theory, 102(1): 70–83.

Resnick, Paul, Richard Zeckhauser, Eric Friedman, and Ko Kuwabara. 2000. "Reputation Systems: Facilitating Trust in Internet Interactions." Communications of the ACM, 43(12): 45–48.

Seneta, Eugene. 2006. Non-negative Matrices and Markov Chains. Springer Series in Statistics, Springer.

Sorin, S. 1990. "Supergames." In Game Theory and Applications, ed. T. Ichiishi, A. Neyman and Y. Taumann, 46–63. Academic Press.

Sugaya, Takuo. 2013a. "Folk Theorem in Repeated Games with Private Monitoring."

Sugaya, Takuo. 2013b. "Folk Theorem in Repeated Games with Private Monitoring with More than Two Players."

Sugaya, Takuo. 2013c. "Folk Theorem in Two-Player Repeated Games with Private Monitoring."

Takahashi, Satoru. 2010. "Community enforcement when players observe partners' past play." Journal of Economic Theory, 145: 42–62.

Watson, Joel. 2002. "Starting Small and Commitment." Games and Economic Behavior, 38: 176–199.

A Proof of the Main Result

We show below that, under the assumptions of Theorem 1, the strategy profile described in Section 2.4 constitutes a sequential equilibrium of the repeated random matching game G^M_δ when M is large enough, Ṫ and T̈ are appropriately chosen, and δ is close enough to 1. If we compare the general game to the product-choice game, a player in Community 1 in a general game in 𝒢 is like the seller, who is the only player with an incentive to deviate in Phase I, and a player in Community 2 is like the buyer, who has no incentive to deviate in Phase I but has an incentive to deviate in Phase II. For the sake of clarity, we often refer to players from Community 1 as sellers and players from Community 2 as buyers.

A.1 Incentives on-path

The incentives on-path are straightforward, and so we omit the formal proof. The logic is that if players are sufficiently patient, on-path deviations can be deterred by the threat of eventual Nash reversion. It is easy to see that, if Ṫ is large enough, the most "profitable" on-path deviation is that of a player in Community 1 (a seller) in period 1. Given M, Ṫ and T̈, the discount factor δ̲ can be chosen close enough to 1 to deter such deviations.
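As a heuristic version of this logic (our own back-of-the-envelope condition, not the bound used in the formal proof): let ḡ be the largest one-period gain from an on-path deviation, and let T* be a date by which, with high probability, contagion has driven the deviator's continuation payoff down to roughly u_1(a*). In normalized terms the deviation then gains at most (1 − δ)ḡ and loses roughly δ^{T*}(v_1 − u_1(a*)), so it is deterred whenever

(1 − δ) ḡ < δ^{T*} (v_1 − u_1(a*)).

As δ → 1, the left-hand side vanishes while the right-hand side converges to v_1 − u_1(a*) > 0, which is why a sufficiently large δ̲ exists.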
A.2 System of beliefs

In this section we describe the assumptions we make for the formation of off-path beliefs. As usual, they are then updated using Bayes rule whenever possible.

i) Assumption 1: If a player observes a triggering action, then this player believes that some player in Community 1 (a seller) deviated in the first period of the game. Subsequently, his beliefs are described as follows:

• Assumption 1.1: He believes that after the deviation by a seller in the first period, play has proceeded as prescribed by the strategies, provided his observed history can be explained in this way.

• Assumption 1.2: If his observed history cannot be explained with play having proceeded as prescribed by the strategies after a seller's deviation in period 1, then we call this an erroneous history.²⁷ In this case, the player believes that after the seller deviated in the first period of the game, one or more of his opponents in his matches made a mistake. Indeed, this player will think that there have been as many mistakes by his past rivals as needed to explain the history at hand. Erroneous histories include the following:

– A player observes an action in Phases II or III that is neither part of the strict Nash profile a* nor the action that is supposed to be played in equilibrium.

– A player who, after being certain that all the players in the other community are infected, faces an opponent who does not play the Nash action. This can only happen in Phase III. For instance, this happens if an infected player infects M − 1 opponents by playing a triggering action against them.

ii) Assumption 2: If a player observes a non-triggering action, then this player believes that his opponent made a mistake.

iii) Assumption 3: If a player plays a triggering action and then observes a history that has probability zero according to his beliefs, we also say that he is at an erroneous history, and he will think that there have been as many mistakes by his past rivals as needed to explain the history at hand.

²⁷ For the formation of a player's beliefs after erroneous histories, we assume that "mistakes" by infected players are infinitely more likely than "mistakes" by uninfected players.

In the Online Appendix (B.1) we formally prove consistency of these beliefs.

A.2.1 Modeling Beliefs with Contagion Matrices

So far, we have not formally described the structure of a player's beliefs. The payoff-relevant feature of a player's beliefs is the number of people he believes to be currently infected. Accordingly, we let a vector x^t ∈ R^M denote the beliefs of player i about the number of infected people in the other community at the end of period t, where x^t_k denotes the probability he assigns to exactly k people being infected in the other community. To illustrate, when player i observes the first triggering action, Assumption 1 implies that x^1 = (1, 0, …, 0). In some abuse of notation, when it is known that a player assigns 0 probability to more than k opponents being infected, we work with x^t ∈ R^k. We say a belief x^t ∈ R^k first-order stochastically dominates a belief x̂^t if x^t assigns higher probability to more people being infected; i.e., for each l ∈ {1, …, k}, Σ_{i=l}^{k} x^t_i ≥ Σ_{i=l}^{k} x̂^t_i.

Let I^t denote the random variable representing the number of infected people in the other community at the end of period t. Let k^t denote the event "k people in the other community are infected by the end of period t", i.e., k^t and I^t = k denote the same event. As we will see below, the beliefs of players after different histories evolve according to simple Markov processes, and so can be studied using an appropriate transition matrix and an initial belief. We define below a useful class of matrices: contagion matrices.
The element c_ij of a contagion matrix C denotes the probability that the state “i rivals infected” transitions to the state “j rivals infected.” Formally, if we let M_k denote the set of k × k matrices with real entries, we say that a matrix C ∈ M_k is a contagion matrix if it has the following properties:

i) All the entries of C belong to [0, 1] (they represent probabilities).

ii) C is upper triangular (being infected is irreversible).

iii) All diagonal entries are strictly positive (with some probability, infected people meet other infected people and contagion does not spread in the current period).

iv) For each i > 1, c_{i−1,i} is strictly positive (with some probability, exactly one new person gets infected in the current period, unless everybody is already infected).

A useful technical property is that, since contagion matrices are upper triangular, their eigenvalues correspond to the diagonal entries. Given y ∈ R^k, let $\|y\| := \sum_{i\in\{1,\dots,k\}} y_i$. We will often be interested in the limit behavior of $\frac{yC^t}{\|yC^t\|}$, where C is a contagion matrix and y is a probability vector. Given a matrix C, let C1⌋ denote the matrix derived by removing the last l rows and columns from C (here l = 1; in general, Cl⌋ removes the last l of each). Similarly, C⌈k is the matrix derived by removing the first k rows and columns, and C⌈k,l⌋ by doing both operations simultaneously. Clearly, if we perform any of these operations on a contagion matrix, we get a new contagion matrix.

A.3 Incentives off-path

To prove sequential rationality, we need to examine players’ incentives after possible off-path histories, given the beliefs. This is the heart of the proof and the exposition proceeds as follows. We classify possible off-path histories of a player i based on when player i observed off-path behavior for the first time.

• Given the beliefs described above, it is important to first characterize the best response of a seller who deviates in the first period of the game.

• Next, we consider histories where player i observes a triggering action (gets infected) for the first time in the target payoff phase (Phase III).

• We then consider histories where player i observes a triggering action for the first time during one of the two trust-building phases.

• We discuss non-triggering actions.

• Finally, we discuss two particular types of off-path histories that might be of special interest, since the arguments used here can be applied to other similar histories.

We need some extra notation to characterize the possible histories that can arise in the game. Denote a t-period private history for a player i by h^t. At any time period, actions are classified as two types, depending on the information they convey.

Good behavior (g): We say that a player observes good behavior, denoted by g, if he observes the action an uninfected player would choose. There is one exception here: if the on-path and the off-path actions coincide with the Nash action, then observing the Nash action is good behavior only if the player is uninfected.28

Footnote 28: If a player is infected and observes a Nash action in one such period, this does not convey information about how widespread the contagion is. In particular, it does not point towards fewer people being infected.

Bad behavior (b): We say that a player observes bad behavior, denoted by b, if he does not observe good behavior.29

Footnote 29: By Assumption 1.2, if a player observes off-path behavior that does not correspond with the action an infected player would play, he thinks that this corresponds to a “mistake” by an infected player, pointing in the direction of more people being infected. Thus, this is classified as bad behavior.

In a nutshell, good behavior points in the direction of fewer people being infected and bad behavior does not.
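Before proceeding, here is a minimal numerical sketch of the contagion-matrix machinery of Section A.2.1. It builds the Phase I matrix S_M defined in Section A.3.3 below and checks properties i)–iv); the helper names are ours, not the paper’s.

```python
import numpy as np

def phase1_matrix(M):
    """Phase I contagion matrix S_M: state k moves to k+1 when the deviant
    seller meets an uninfected buyer, w.p. (M-k)/M, and stays at k otherwise."""
    S = np.zeros((M, M))
    for k in range(1, M + 1):
        S[k - 1, k - 1] = k / M
        if k < M:
            S[k - 1, k] = (M - k) / M
    return S

def is_contagion_matrix(C):
    """Check properties i)-iv) of a contagion matrix."""
    k = C.shape[0]
    return (((C >= 0) & (C <= 1)).all()                      # i) probabilities
            and np.allclose(C, np.triu(C))                   # ii) upper triangular
            and (np.diag(C) > 0).all()                       # iii) positive diagonal
            and all(C[i - 1, i] > 0 for i in range(1, k)))   # iv) positive superdiagonal

def truncate(C, first=0, last=0):
    """Drop `first` leading and `last` trailing rows and columns (C_<k, l])."""
    n = C.shape[0]
    return C[first:n - last, first:n - last]

S = phase1_matrix(10)
print(is_contagion_matrix(S), is_contagion_matrix(truncate(S, last=1)))  # True True
```

Truncation preserves the defining properties, which is the observation the text uses repeatedly below.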
If player i observes a t-period history h^t followed by three periods of good behavior and then one period of bad behavior, we represent this by h^t gggb. For any player i, let g^t denote the event “player i observed g in period t,” and let U^t denote the event “player i is uninfected at the end of period t.” In an abuse of notation, a player’s history is written omitting his own actions. For most of the arguments below, we discuss beliefs from the point of view of a fixed player i, and so we often refer to player i in the first person.

A.3.1 Computing off-path beliefs

Since we work with off-path beliefs of players, it is useful to clarify at the outset our approach to computing beliefs. Suppose that I get infected at some period t̄ in Phase III and that I face an uninfected player in period t̄ + 1. I will think that a seller deviated in period 1 and that in period Ṫ + T̈ + 1 all infected buyers and sellers played the Nash action (which is triggering in this period). Therefore, period Ṫ + T̈ + 2 must always start with the same number of players infected in both communities. So it suffices to compute beliefs about the number of people infected in the rival community. These beliefs are represented by x^{t̄+1} ∈ R^M, where x^{t̄+1}_k is the probability of exactly k people being infected after period t̄ + 1, and they must be computed using Bayes rule and conditioning on my private history.

What information do I have after history h^{t̄+1}? I know that a seller deviated at period 1, so x^1 = (1, 0, . . . , 0). I also know that, after any period t < t̄, I was not infected (U^t). Moreover, since I got infected at period t̄, at least one player in the rival community got infected in the same period. Finally, since I faced an uninfected player at t̄ + 1, at most M − 2 people were infected after any period t < t̄ (i.e., I^t ≤ M − 2).

To compute x^{t̄+1}, we compute a series of intermediate beliefs x^t, for t < t̄ + 1. We compute x^2 from x^1 by conditioning on U^2 and I^2 ≤ M − 2; then we compute x^3 from x^2 and so on. Note that, to compute x^2, we do not use the information that “I did not get infected at any period 2 < t < t̄.” So, at each t < t̄, x^t represents my beliefs when I condition on the fact that the contagion started at period 1 and that no matching that leads to more than M − 2 people being infected could have been realized.30

Footnote 30: The updating after period t̄ is different, since I know that I was infected at t̄ and that no more than M − 1 people could possibly be infected in the other community at the end of period t̄.

Put differently, at each period, I compute my beliefs by eliminating (assigning zero probability to) the matchings I know could not have taken place. At a given period τ < t̄, the information that “I did not get infected at any period t, with τ < t < t̄” is not used. This extra information is added period by period, i.e., only at period t do we add the information coming from the fact that “I was not infected at period t.” In the Online Appendix (B.3), we show that this method of computing x^{t̄+1} generates the required beliefs, i.e., beliefs at period t̄ + 1 conditioning on the entire history. Now we are equipped to check the optimality of the equilibrium strategies.
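As a schematic illustration of this period-by-period updating, the following toy sketch (ours; the 4-state matrix is hypothetical, not one of the paper’s contagion matrices) propagates a belief vector through a conditional transition matrix and, each period, zeroes out the states the history rules out before renormalizing.

```python
import numpy as np

def update(x, C, known_cap):
    """One intermediate-belief step: propagate x through the conditional
    transition matrix C, kill the states my history rules out (more than
    `known_cap` rivals infected), and renormalize (Bayes rule)."""
    x = x @ C
    x[known_cap:] = 0.0          # states known to be impossible
    return x / x.sum()

C = np.array([[0.4, 0.6, 0.0, 0.0],     # hypothetical 4-state contagion matrix
              [0.0, 0.3, 0.7, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
x = np.array([1.0, 0.0, 0.0, 0.0])      # Assumption 1: one infection in period 1
for t in range(10):
    x = update(x, C, known_cap=3)       # e.g. "at most 3 rivals infected"
print(np.round(x, 3))
```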
A.3.2 A seller deviates at beginning of the game

The strategies specify that a player who gets infected by deviating to a triggering action henceforth plays his best response to the other players’ strategies. As we argued for the product-choice game, it is easy to show that for a given M, if Ṫ is large enough (and large relative to T̈), a seller who deviates from equilibrium action â_1 in period 1 will find it optimal to continue deviating and playing his most profitable deviation, ā_1, until the end of Phase I to maximize short-run gains, and then will switch to playing the Nash action a∗_1 forever. Note that, here, it is important that Ṫ be chosen large enough relative to T̈. First, it ensures that if the seller keeps deviating in Phase I, his short-run gains in Phase I are larger than the potential losses in Phase II. Second, even if the deviant seller faces many occurrences of a non-Nash action in Phase II, he will assign high probability to the event that all but one buyer got infected in Phase I, and that he has been repeatedly meeting the only uninfected buyer in Phase II. Under such beliefs, playing the Nash action in Phase III is still optimal for him.

A.3.3 A player gets infected in Phase III

Our analysis of incentives in Phase III is structured as follows. First we check the off-path incentives of a player infected at the start of Phase III. Next, we analyze incentives of a player after infection very late in Phase III. Finally, we use a monotonicity argument on the beliefs to check the incentives after infection in intermediate periods in Phase III.

Case 1: Infection at the start of Phase III. Let h^{Ṫ+T̈+1} denote a history in which I am a player who gets infected in period Ṫ + T̈ + 1. The equilibrium strategies prescribe that I switch to the Nash action forever. For this to be optimal, I must believe that enough players in the other community are already infected. My beliefs depend on what I know about how contagion spreads in Phase I after a seller deviates in period 1. In Phase I, only one deviant seller is infecting buyers (recall that buyers cannot spread contagion in Phase I). The contagion, then, is a Markov process with state space {1, . . . , M}, where the state represents the number of infected buyers. The transition matrix corresponds with the contagion matrix S_M ∈ M_M, where a state k transits to k + 1 if the deviant seller meets an uninfected buyer, which has probability (M − k)/M. With the remaining probability, i.e., k/M, state k remains at state k. When no confusion arises, we omit the subscript M in matrix S_M. Let s_kl be the probability that state k transitions to state l. Then,

$$S_M = \begin{pmatrix}
\frac{1}{M} & \frac{M-1}{M} & 0 & \cdots & 0 & 0\\
0 & \frac{2}{M} & \frac{M-2}{M} & \cdots & 0 & 0\\
\vdots & & \ddots & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & \frac{M-1}{M} & \frac{1}{M}\\
0 & 0 & 0 & \cdots & 0 & 1
\end{pmatrix}.$$

The transition matrix represents how the contagion is expected to spread. To compute my beliefs after being infected, I must also condition on the information from my own history. Consider any period t < Ṫ. After observing history h^{Ṫ+T̈+1} = g . . . gb, I know that at the end of period t + 1, at most M − 1 buyers were infected and I was not infected. Therefore, to compute x^{t+1}, my intermediate beliefs about the number of buyers who were infected at the end of period t + 1 (i.e., about I^{t+1}), I need to condition on the following:

i) My beliefs about I^t: x^t.

ii) I was uninfected at the end of t + 1: the event U^{t+1} (this is irrelevant if I am a seller, since sellers cannot get infected in Phase I).
iii) At most M − 1 buyers were infected by the end of period t + 1: I^{t+1} ≤ M − 1 (otherwise I could not have been uninfected at the start of Phase III).

Therefore, given l < M, if I am a buyer, the probability that exactly l buyers are infected after period t + 1, conditional on the above information, is given by:

$$P(l^{t+1} \mid x^t \cap U^{t+1} \cap I^{t+1}\le M-1) = \frac{P(l^{t+1}\cap U^{t+1}\cap I^{t+1}\le M-1 \mid x^t)}{P(U^{t+1}\cap I^{t+1}\le M-1 \mid x^t)} = \frac{x^t_{l-1}\,s_{l-1,l}\,\frac{M-l}{M-l+1} + x^t_l\,s_{l,l}}{\sum_{k=1}^{M-1}\left(x^t_{k-1}\,s_{k-1,k}\,\frac{M-k}{M-k+1} + x^t_k\,s_{k,k}\right)}.$$

The expression for a seller would be analogous, but without the (M − l)/(M − l + 1) factors. Notice that we can express the transition from x^t to x^{t+1} using what we call the conditional transition matrix. Let C ∈ M_M be defined, for each pair k, l ∈ {1, . . . , M − 1}, by c_kl := s_kl (M − l)/(M − k); by c_MM := 1; and with all remaining entries being 0. Since we already know that x^t_M = x^{t+1}_M = 0, we can just work in R^{M−1}. Recall that C1⌋ and S1⌋ denote the matrices obtained from C and S by removing the last row and the last column of each. So, the truncated matrix of conditional transition probabilities C1⌋ is as follows:

$$C_{1\rfloor} = \begin{pmatrix}
\frac{1}{M} & \frac{M-2}{M} & 0 & \cdots & 0 & 0\\
0 & \frac{2}{M} & \frac{M-3}{M} & \cdots & 0 & 0\\
\vdots & & \ddots & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & \frac{M-2}{M} & \frac{1}{M}\\
0 & 0 & 0 & \cdots & 0 & \frac{M-1}{M}
\end{pmatrix}.$$

Thus, it is crucial for our analysis to understand the evolution of the Markov processes associated with matrices C1⌋ and S1⌋, starting out with the situation in which only one player is infected. Then, for the buyer case, let y_B^1 = (1, 0, . . . , 0) and define y_B^{t+1} as

$$y_B^{t+1} = \frac{y_B^t\, C_{1\rfloor}}{\|y_B^t\, C_{1\rfloor}\|} = \frac{y_B^1\, C_{1\rfloor}^t}{\|y_B^1\, C_{1\rfloor}^t\|}.$$

Analogously, we define the Markov process for the seller, y_S^t, by using S1⌋ instead of C1⌋. Therefore, my intermediate beliefs at the end of period Ṫ, x^Ṫ, would be given by y_B^Ṫ if I am a buyer and y_S^Ṫ if I am a seller. The following result characterizes the limit behavior of y_B^t and y_S^t, needed to understand x^Ṫ for large Ṫ.

Lemma 1. Fix M. Then, lim_{t→∞} y_B^t = lim_{t→∞} y_S^t = (0, . . . , 0, 1).

The intuition for the above lemma is as follows. Note that the largest diagonal entry in the matrix C1⌋ (or S1⌋) is the last one. This means that the state M − 1 is more stable than any other state. Consequently, as more periods of contagion elapse in Phase I, state M − 1 becomes more and more likely. The formal proof is a straightforward consequence of some of the properties of contagion matrices (see Proposition B.1 in the Online Appendix).

Proposition 2. Fix T̈ and M. If I observe history h^{Ṫ+T̈+1} = g . . . gb and Ṫ is large enough, then it is sequentially rational for me to play the Nash action at period Ṫ + T̈ + 2.

Proof. Suppose that I am a buyer. Since sellers always play the Nash action in Phase II, I cannot learn anything from play in Phase II. By Lemma 1, if Ṫ is large enough, x^Ṫ, which coincides with y_B^Ṫ, will be close to (0, . . . , 0, 1). Thus, I assign high probability to M − 1 players in my community being infected at the end of Phase I. Then, at least as many sellers got infected during Phase II. If exactly M − 1 buyers were infected by the end of Phase II, then the only uninfected seller must have got infected in period Ṫ + T̈ + 1 since, in this period, I was the only uninfected buyer and I met an infected opponent. So, if Ṫ is large enough, I assign probability arbitrarily close to 1 to all sellers being infected by the end of Ṫ + T̈ + 1. Thus, it is optimal to play the Nash action a∗_2. Next, suppose that I am a seller.
In this case, the fact that no player infected me in Phase II will make me update my beliefs about how contagion has spread. However, if Ṫ is large enough relative to T̈, even if I factor in the information that I was not infected in Phase II, the probability I assign to M − 1 buyers being infected by the end of Phase I is arbitrarily higher than the probability I assign to k players being infected for any k < M − 1. By the same argument as above, playing Nash is optimal.

Next, we consider (Ṫ + T̈ + 1 + α)-period histories of the form h^{Ṫ+T̈+1+α} = g . . . gb followed by α observations of g, with 1 ≤ α ≤ M − 2, i.e., histories where I was infected at period Ṫ + T̈ + 1 and then I observed α periods of good behavior while I was playing the Nash action. For the sake of exposition, assume that I am a buyer (the arguments for a seller are analogous). Why are these histories significant? Notice that if I get infected in period Ṫ + T̈ + 1, I can believe that all other players in my community are infected. However, if after that I observe the on-path action, I will think that I have met an uninfected player and I will have to revise my beliefs, since it is not possible that all the players in my community were infected after period Ṫ + T̈ + 1. Can this alter my incentives to play the Nash action?

Suppose that α = 1. After history h^{Ṫ+T̈+2} = g . . . gbg, I know that at most M − 2 buyers were infected by the end of Phase I. So, for each t ≤ Ṫ, x^t_M = x^t_{M−1} = 0. My beliefs are no longer computed using C1⌋, but rather with C2⌋, which is derived by removing the last two rows and columns of C. By a similar argument as in Lemma 1, we can show that the new intermediate beliefs, x^Ṫ ∈ R^{M−2}, also converge to (0, 0, . . . , 1). In other words, the state M − 2 is the most stable and, for Ṫ large enough, I assign very high probability to M − 2 buyers being infected at the end of Phase I. Consequently, at least as many sellers were infected during Phase II. This, in turn, implies (just as in Proposition 2) that I believe that, with high probability, all players are infected by now (at t = Ṫ + T̈ + 2). To see why, note that, with high probability, M − 2 sellers were infected during Phase II. Then, one of the uninfected sellers met an infected buyer at period Ṫ + T̈ + 1, and I infected the last one in the last period. So, if Ṫ is large enough, I still assign arbitrarily high probability to everyone being infected by now, and it is optimal for me to play Nash. A similar argument holds for (Ṫ + T̈ + 1 + α)-period histories with α ∈ {1, . . . , M − 2}.

We need not check incentives for Nash reversion after histories where I observe more than M − 2 periods of g after being infected. Note that these are erroneous histories. When I get infected at period Ṫ + T̈ + 1, I believe that the minimum number of people infected in each community is two (at least one pair of players in period 1 and one pair in period Ṫ + T̈ + 1). Suppose that subsequently I face M − 2 instances of g. Since I switched to Nash after being infected, I infected my opponents in the last M − 2 periods. At the end of M − 2 observations of g, I am sure that even if there were only two players infected in each community at the end of Ṫ + T̈ + 1, I have infected all the remaining uninfected people. So observing more g after that cannot be explained by a single deviation in period 1. Then, I will believe that in addition to the first deviation by a seller in period 1, there have been as many mistakes by players as needed to be consistent with the observed history.
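Before turning to the remaining histories, here is a quick numerical check of Lemma 1 (a sketch of ours, not the authors’ code): iterating the normalized process y ↦ yC1⌋/‖yC1⌋‖, with the entries of C1⌋ built directly from the display above (diagonal k/M, superdiagonal (M − k − 1)/M), the mass indeed concentrates on state M − 1.

```python
import numpy as np

M = 10
C1 = np.zeros((M - 1, M - 1))
for k in range(1, M):
    C1[k - 1, k - 1] = k / M               # diagonal k/M
    if k < M - 1:
        C1[k - 1, k] = (M - k - 1) / M     # superdiagonal (M-k-1)/M

y = np.zeros(M - 1); y[0] = 1.0            # y_B^1 = (1, 0, ..., 0)
for _ in range(200):
    y = y @ C1
    y /= y.sum()
print(np.round(y, 4))   # ~ (0, ..., 0, 1), as Lemma 1 asserts
```

The same loop with S1⌋ in place of C1⌋ reproduces the seller case.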
Finally, consider histories where, after getting infected, I observe a sequence of actions that includes both g and b, i.e., histories starting with h^{Ṫ+T̈+1} = g . . . gb and where I faced b in one or more periods after getting infected. After such histories, I will assign higher probability to more people being infected compared to histories where I observed only g after getting infected. This is because, by definition, every additional observation of b is an indication of more people being infected, which increases my incentives to play Nash.

Thus, we have shown that a player who observes a triggering action for the first time at the start of Phase III will find it optimal to revert permanently to the Nash action.

Case 2: Infection late in Phase III. Next, suppose that I get infected after observing history h^{t̄+1} = g . . . gb, with t̄ ≫ Ṫ + T̈. This is the situation where the incentives are harder to verify, because we have to study how the contagion evolves during Phase III. In this phase, all infected players in both communities spread the contagion and, hence, from period Ṫ + T̈ + 1 on, the same number of people is infected in both communities. The contagion can again be studied as a Markov process with state space {1, . . . , M}. The new transition matrix corresponds with the contagion matrix S̄ ∈ M_M where, for each pair k, l ∈ {1, . . . , M}, if k > l or l > 2k, s̄_kl = 0; otherwise, i.e., if k ≤ l ≤ 2k, the probability of transition from state k to state l is (see Figure 3):

$$\bar s_{kl} = \frac{\left[\binom{k}{l-k}\binom{M-k}{l-k}\,(l-k)!\right]^2\,(2k-l)!\,(M-l)!}{M!} = \frac{(k!)^2\,((M-k)!)^2}{((l-k)!)^2\,(2k-l)!\,(M-l)!\,M!}.$$

[Figure 3: Spread of Contagion in Phase III (diagram omitted). It depicts Community 1 (sellers) and Community 2 (buyers), each partitioned into the k already infected players, the l − k newly infected players, and the M − l still uninfected players.] There are M! possible matchings. For state k to transit to state l, exactly (l − k) infected people from each community must meet (l − k) uninfected people from the other community. The number of ways of choosing exactly (l − k) buyers from k infected ones is $\binom{k}{l-k}$. The number of ways of choosing the corresponding (l − k) uninfected sellers that will get infected is $\binom{M-k}{l-k}$. Finally, the number of ways in which these sets of (l − k) people can be matched is the total number of permutations of l − k people, i.e., (l − k)!. Analogously, we choose the (l − k) infected sellers who will be matched to (l − k) uninfected buyers. The number of ways in which the remaining infected buyers and sellers get matched to each other is (2k − l)! and, for the uninfected ones, we have (M − l)!.
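As a sanity check on this combinatorial formula, the following sketch (ours) builds S̄ from the bracketed expression above and verifies that every row is a probability distribution, i.e., that the favorable matchings over all reachable l sum to M!.

```python
import numpy as np
from math import comb, factorial

def phase3_matrix(M):
    """Phase III contagion matrix S-bar: with k infected in each community,
    exactly l-k newly infected pairs form on each side."""
    S = np.zeros((M, M))
    for k in range(1, M + 1):
        for l in range(k, min(2 * k, M) + 1):
            j = l - k
            ways = (comb(k, j) * comb(M - k, j) * factorial(j)) ** 2 \
                   * factorial(2 * k - l) * factorial(M - l)
            S[k - 1, l - 1] = ways / factorial(M)
    return S

Sbar = phase3_matrix(8)
print(np.allclose(Sbar.sum(axis=1), 1.0))   # True: rows sum to one
```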
Consider any t such that Ṫ + T̈ < t < t̄. Since I have observed history h^{t̄+1} = g . . . gb, I know that “at most M − 1 people could have been infected in the rival community at the end of period t + 1” (I^{t+1} ≤ M − 1), and “I was not infected at the end of period t + 1” (U^{t+1}). As before, let x^t be my intermediate beliefs after period t. Since, for each t ≤ t̄, x^t_M = 0, we can work with x^t ∈ R^{M−1}. We want to compute P(l^{t+1} | x^t ∩ U^{t+1} ∩ I^{t+1} ≤ M − 1) for l ∈ {1, . . . , M − 1} and Ṫ + T̈ < t:

$$P(l^{t+1}\mid x^t\cap U^{t+1}\cap I^{t+1}\le M-1) = \frac{P(l^{t+1}\cap U^{t+1}\cap I^{t+1}\le M-1\mid x^t)}{P(U^{t+1}\cap I^{t+1}\le M-1\mid x^t)} = \frac{\sum_{k\in\{1,\dots,M\}} x^t_k\,\bar s_{kl}\,\frac{M-l}{M-k}}{\sum_{l'\in\{1,\dots,M-2\}}\sum_{k\in\{1,\dots,M\}} x^t_k\,\bar s_{kl'}\,\frac{M-l'}{M-k}}.$$

Again, we can express these probabilities using the corresponding conditional transition matrix. Let C̄ ∈ M_M be defined, for each pair k, l ∈ {1, . . . , M − 1}, by c̄_kl := s̄_kl (M − l)/(M − k); by c̄_MM := 1; and with all remaining entries being 0. Then, given a vector of beliefs at the end of Phase II represented by a probability vector ȳ^0, we are interested in the evolution of the Markov process where ȳ^{t+1} is defined as

$$\bar y^{t+1} = \frac{\bar y^t\, \bar C_{1\rfloor}}{\|\bar y^t\, \bar C_{1\rfloor}\|}.$$

Note that, as long as t ≤ t̄ − Ṫ − T̈, each ȳ^t coincides with the intermediate beliefs x^{Ṫ+T̈+t}. Thus, to understand the beliefs after late deviations, we study ȳ^t as t → ∞. Importantly, provided that ȳ^0_1 > 0, the limit results below do not depend on ȳ^0, i.e., the specific beliefs at the end of Phase II do not matter.31

Footnote 31: The condition ȳ^0_1 > 0 is just saying that, with positive probability, the deviant seller may meet the same buyer in all the periods in phases I and II.

We show next that a result similar to Lemma 1 holds.

Lemma 2. Suppose that ȳ^0_1 > 0. Then, lim_{t→∞} ȳ^t = (0, 0, . . . , 0, 1) ∈ R^{M−1}.

The lemma follows from properties of contagion matrices (see Proposition B.1 in the Online Appendix for the proof). The informal argument is as follows. The largest diagonal entries of the matrix C̄1⌋ are the first and last ones (c̄_11 and c̄_{M−1,M−1}), which are equal. Unlike in the Phase I transition matrix, state M − 1 is not the unique most stable state. Here, states 1 and M − 1 are equally stable, and more stable than any other state. Why, then, do the beliefs converge to (0, 0, . . . , 0, 1)? In each period, many states transit to M − 1 with positive probability, while no state transits to state 1, and so the ratio ȳ^t_{M−1}/ȳ^t_1 goes to ∞ as t goes to ∞.

Proposition 3. Fix Ṫ ∈ N, T̈ ∈ N, and M ∈ N. If I observe history h^{t̄+1} = g . . . gb and t̄ is large enough, then it is sequentially rational for me to play the Nash action at period t̄ + 2.

Proof. We want to know how spread the contagion is according to the beliefs x^{t̄+1}. To do so, we start with x^t̄. By Lemma 2, if t̄ is large enough, ȳ^{t̄−Ṫ−T̈} is very close to (0, 0, . . . , 0, 1). Thus, x^t̄ is such that I assign very high probability to M − 1 players in the other community being infected by the end of period t̄. Now, to compute x^{t̄+1} from x^t̄, I add the information that I got infected at t̄ + 1 and, hence, the only uninfected person in the other community got infected too. Thus, I assign very high probability to everyone being infected, and Nash reversion is optimal.
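The convergence claimed in Lemma 2 can be checked numerically with a short sketch (ours), building C̄1⌋ from the s̄_kl formula and iterating the normalized process from an arbitrary initial belief with positive weight on state 1:

```python
import numpy as np
from math import comb, factorial

def sbar(M, k, l):
    """Phase III transition probability s-bar_{kl} from the text."""
    if l < k or l > 2 * k or l > M:
        return 0.0
    j = l - k
    ways = (comb(k, j) * comb(M - k, j) * factorial(j)) ** 2 \
           * factorial(2 * k - l) * factorial(M - l)
    return ways / factorial(M)

M = 8
Cbar1 = np.zeros((M - 1, M - 1))           # Cbar_{1]}: last row/column dropped
for k in range(1, M):
    for l in range(1, M):
        Cbar1[k - 1, l - 1] = sbar(M, k, l) * (M - l) / (M - k)

y = np.full(M - 1, 1.0 / (M - 1))          # any y0 with y0[0] > 0 will do
for _ in range(3000):
    y = y @ Cbar1
    y /= y.sum()
print(np.round(y, 3))   # mass concentrates on state M-1 (slowly, as the
                        # ratio to state 1 grows with t, per the text)
```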
Suppose now that I observe h^{t̄+2} = g . . . gbg and that I played the Nash action at period t̄ + 2. The analysis of these histories turns out to be more involved than in Case 1. After observing h^{t̄+2}, I will know that fewer than (M − 1) people were infected at the end of period t̄ since, otherwise, I could not have faced g in period t̄ + 2. In other words, I have to recompute my beliefs using the information that, for each t ≤ t̄, I^t ≤ M − 2. To do so, I should use the truncated transition matrix C̄2⌋. Since, for each t ≤ t̄, x^t_M = x^t_{M−1} = 0, to obtain x^t̄ we just work with x^t ∈ R^{M−2}. We have the Markov process that starts with a vector of beliefs at the end of Phase II, represented by a probability vector ȳ̄^0, and such that ȳ̄^{t+1} is computed as

$$\bar{\bar y}^{t+1} = \frac{\bar{\bar y}^t\, \bar C_{2\rfloor}}{\|\bar{\bar y}^t\, \bar C_{2\rfloor}\|}.$$

As before, as long as t ≤ t̄ − Ṫ − T̈, each ȳ̄^t coincides with the intermediate beliefs x^{Ṫ+T̈+t}. Also, we want to study the limit behavior of ȳ̄^t as t goes to ∞ and then rely on x^t̄ to compute beliefs at t̄ + 2.

The extra difficulty comes from the fact that c̄_{M−2,M−2} < c̄_11 = 1/M, so the arguments behind Lemma 2 do not apply to matrix C̄2⌋ and, indeed, the limit beliefs do not converge to (0, . . . , 0, 1). However, we show that I will still believe that “enough” people are infected with “high enough” probability. In the lemma below, we first establish that ȳ̄^t converges. The result follows from properties of contagion matrices. The proof is available in the Online Appendix (Proposition B.1). Since this limit only depends on M, with a slight abuse of notation we denote it by ȳ̄^M.

Lemma 3. Suppose that ȳ̄^0_1 > 0. Then, lim_{t→∞} ȳ̄^t = ȳ̄^M, where ȳ̄^M is the unique nonnegative left eigenvector associated with the largest eigenvalue of C̄2⌋ such that ‖ȳ̄^M‖ = 1. Thus, $\bar{\bar y}^M \bar C_{2\rfloor} = \frac{\bar{\bar y}^M}{M}$.

In particular, the result above implies that the limit as t̄ goes to infinity of the beliefs x^t̄ is independent of Ṫ and T̈. We are interested in the properties of ȳ̄^M, which we study next.

Lemma 4. Let r ∈ (0, 1). Then, for each m ∈ N, there is $\underline{M}$ ∈ N such that, for each M ≥ $\underline{M}$,

$$\sum_{j=\lceil rM\rceil}^{M-2} \bar{\bar y}^M_j > 1 - \frac{1}{M^m},$$

where ⌈z⌉ is the ceiling function, defined as the smallest integer not smaller than z.

This result can be interpreted as follows. Think of r as a proportion of people, say 0.9. Given m, if the population size M is large enough, after observing history h^{t̄+2} = g . . . gbg, my limiting belief ȳ̄^M will be such that I will assign probability at least (1 − 1/M^m) to at least 90 percent of the population being infected. We can choose r close enough to 1 and m large enough and then find M ∈ N large enough so that I believe that the contagion has spread enough that playing the Nash action is optimal. Figure 4 below represents the probabilities $\sum_{j=\lceil rM\rceil}^{M-2} \bar{\bar y}^M_j$ for different values of r and M. In particular, it shows that they go to one very fast with M.32

Footnote 32: The non-monotonicity in the graphs in Figure 4 may be surprising. To the best of our understanding, this can be essentially attributed to the fact that the states that are powers of 2 tend to be more likely, and their distribution within the top M − ⌈rM⌉ states varies in a non-monotone way.

From the results in this section it will follow that, after any history in which I have been infected late in Phase III, I will believe that the contagion is at least as spread as ȳ̄^M indicates. For instance, consider M = 20. In this case, ȳ̄^M is such that I believe that, with probability at least 0.75, at least 90 percent of the people are infected. This may be enough to induce the right incentives for many games. In general, these incentives will hold even for fairly small population sizes.

[Figure 4: Illustration of Lemma 4 (plots omitted). Two panels plot the probability $\sum_{j=\lceil rM\rceil}^{M-2} \bar{\bar y}^M_j$ against the community size M (one panel up to M = 50, the other up to M = 600), for r ∈ {0.5, 0.75, 0.9, 0.95, 0.99}.]

In order to prove Lemma 4, we need to study the transitions between states. There are two opposing forces that affect how my beliefs evolve after I observe g . . . gbg. On the one hand, each observation of g suggests to me that not too many people are infected, making me step back in my beliefs and assign higher weight to lower states (fewer people infected). On the other hand, since I believe that the contagion started at t = 1 and that it has been spreading exponentially during Phase III, every elapsed period makes me assign more weight to higher states (believe that more people are infected). What we need to do is to compare the magnitudes of these two effects. Two main observations drive the proof. First, each time I observe g, my beliefs get updated with more weight assigned to lower states and, roughly speaking, this step back in beliefs turns out to be of the order of M. Second, we identify the most likely transition from any given state k.
It turns out that the likeliest transition from k, say state k′, is about √M times more likely than the state k. Similarly, the most likely transition from k′, say state k″, is √M times more likely than the state k′. Hence, given a proportion of people r ∈ (0, 1) and m ∈ N, if M is large enough, for each state k < rM, we can find another state k̄ that is at least M^m times more likely than state k. Thus, given m ∈ N, by taking M large enough, we get that the second effect on the beliefs is, at least, of an order of M^m, and so it dominates the first effect.

We need some preliminaries before we formally prove Lemma 4. Recall that

$$(\bar C_{2\rfloor})_{k,k+j} = \bar s_{k,k+j}\,\frac{M-k-j}{M-k} = \frac{(k!)^2\,((M-k)!)^2}{(j!)^2\,(k-j)!\,(M-k-j)!\,M!}\cdot\frac{M-k-j}{M-k}.$$

Given a state k ∈ {1, . . . , M − 2}, consider the transition from k to state tr(k) := k + ⌊k(M − k)/M⌋, where ⌊z⌋ is the floor function, defined as the largest integer not larger than z. It turns out that, for large M, this is a good approximation of the most likely transition from state k.

We temporarily switch to the case where there is a continuum of states, i.e., we think of the set of states as the interval [0, M]. In the continuous setting, a state z ∈ [0, M] can be represented as rM, where r = z/M can be interpreted as the proportion of infected people at state z. Let γ ∈ R. We define a function f_γ : [0, 1] → R as

$$f_\gamma(r) := \frac{rM(M-rM)}{M} + \gamma = (r-r^2)M + \gamma.$$

Note that f_γ is continuous and, further, that rM + f_0(r) would just be the extension of the function tr(k) to the continuous case. We want to know how likely the transition from state rM to rM + f_0(r) is. Define a function g : [0, 1] → [0, 1] as g(r) := 2r − r². The function g is continuous and strictly increasing. Given r ∈ [0, 1], g(r) represents the proportion of infected people if, at state rM, f_0(r) people get infected. To see why, note that rM + f_0(r) = rM + (r − r²)M = (2r − r²)M. Let g²(r) := g(g(r)) and define analogously any other power of g. Hence, for each r ∈ [0, 1], g^n(r) represents the fraction of people infected after n steps starting at rM when transitions are made according to f_0(·).

Claim 1. Let M ∈ N and a, b ∈ (0, 1), with a > b. Then, aM + f_0(a) > bM + f_0(b).

Proof. Note that aM + f_0(a) − bM − f_0(b) = (g(a) − g(b))M, and the result follows from the fact that g(·) is strictly increasing on (0, 1).

Now we define a function h^M_γ : (0, 1) → (0, ∞) as

$$h^M_\gamma(r) := \frac{((rM)!)^2\,((M-rM)!)^2}{(f_\gamma(r)!)^2\,(rM-f_\gamma(r))!\,(M-rM-f_\gamma(r))!\,M!}\cdot\frac{M-rM-f_\gamma(r)}{M-rM}.$$

This function is the continuous version of the transitions given by the matrix C̄2⌋. In particular, given γ ∈ R and r ∈ [0, 1], the function h^M_γ(r) represents the conditional probability of transition from state rM to state rM + f_γ(r). In some abuse of notation, we apply the factorial function to non-integer real numbers. In such cases, the factorial can be interpreted as the corresponding Gamma function, i.e., a! = Γ(a + 1).

Claim 2. Let γ ∈ R and r ∈ (0, 1). Then, lim_{M→∞} M h^M_γ(r) = ∞. More precisely,

$$\lim_{M\to\infty}\frac{M\,h^M_\gamma(r)}{\sqrt{M}} = \frac{1}{r\sqrt{2\pi}}.$$

Proof. This proof is based on Stirling’s approximation. The interested reader is referred to the Online Appendix (B.4) for a proof.
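Claim 2 is easy to probe numerically. The sketch below (ours) evaluates h^M_γ(r) through log-Gamma, reading non-integer factorials as Γ(a + 1) exactly as the text prescribes, and compares M·h^M_0(r)/√M with the stated limit 1/(r√(2π)):

```python
from math import lgamma, exp, sqrt, pi

def h(M, r, gamma=0.0):
    """Continuous transition likelihood h^M_gamma(r); non-integer factorials
    are read as Gamma(a+1) via lgamma, as in the text."""
    f = (r - r * r) * M + gamma
    k = r * M
    log_h = (2 * lgamma(k + 1) + 2 * lgamma(M - k + 1)
             - 2 * lgamma(f + 1) - lgamma(k - f + 1)
             - lgamma(M - k - f + 1) - lgamma(M + 1))
    return exp(log_h) * (M - k - f) / (M - k)

r = 0.3
target = 1 / (r * sqrt(2 * pi))
for M in (100, 1000, 10000):
    print(M, M * h(M, r) / sqrt(M), "->", target)
```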
We are now ready to prove Lemma 4.

Proof of Lemma 4. Fix r ∈ (0, 1) and m ∈ N. We show that there is $\underline{M}$ such that, for each M ≥ $\underline{M}$ and each state k < ⌈rM⌉, there is a state k̄ ∈ {⌈rM⌉, . . . , M − 2} such that ȳ̄^M_{k̄} > M^{m+1} ȳ̄^M_k. We show this first for state k_0 = ⌈rM⌉ − 1.

Consider the state rM and define ρ := 2m + 3 and r̄ := g^ρ(r). Recall that r̄ is the state reached (proportion of people infected) from initial state rM after ρ steps according to the function f_0. Recall that functions f_0 and g are such that r < r̄ < 1. Moreover, suppose that M is large enough so that r̄M ≤ M − 2. Consider the state k_0 and let k̄ be the number of infected people after ρ steps according to function tr(·). Given r̂ ∈ (0, 1), f_0(r̂) and tr(r̂M) just differ because f_0(r̂) may not be a natural number, while tr(r̂M) always is, and so the difference between these two functions is just a matter of truncation. Hence, for each step j ∈ {1, . . . , ρ}, there is γ_j ∈ (−1, 0] such that the step corresponds with that of function f_{γ_j}; for instance, γ_1 is such that tr(k_0) = k_0 + f_{γ_1}(k_0/M). By Claim 1, since k_0 < rM, k̄ < M − 2. Moreover, it is trivial to see that k̄ > ⌈rM⌉. Let k_1 be the state that is reached after the first step from k_0 according to function tr(·). Recall that, by Lemma 3, ȳ̄^M = M ȳ̄^M C̄2⌋. Then,

$$\bar{\bar y}^M_{k_1} = M\sum_{k=1}^{M-2}\bar{\bar y}^M_k\,(\bar C_{2\rfloor})_{k k_1} > M\,\bar{\bar y}^M_{k_0}\,(\bar C_{2\rfloor})_{k_0 k_1} = \bar{\bar y}^M_{k_0}\,M h^M_{\gamma_1}(r),$$

which, by Claim 2, can be approximated by $\frac{\sqrt{M}}{r\sqrt{2\pi}}\,\bar{\bar y}^M_{k_0}$ if M is large enough. Repeating the same argument for the other intermediate states that are reached in each of the ρ steps we get that, if M is large enough,

$$\bar{\bar y}^M_{\bar k} > \frac{M^{\rho/2}}{(r\sqrt{2\pi})^{\rho}}\,\bar{\bar y}^M_{k_0} = M^{m+1}\,\frac{\sqrt{M}}{(r\sqrt{2\pi})^{\rho}}\,\bar{\bar y}^M_{k_0} > M^{m+1}\,\bar{\bar y}^M_{k_0}.$$

The proof for an arbitrary state k < ⌈rM⌉ − 1 is very similar, with the only difference that more steps might be needed to get to a state k̄ ∈ {⌈rM⌉, . . . , M − 2}; yet, the extra number of steps makes the difference between ȳ̄^M_k and ȳ̄^M_{k̄} even bigger.33

Footnote 33: It is worth noting that we can do this argument uniformly for the different states and the corresponding γ’s because we know that all of them lie inside [−1, 0], a bounded interval; that is, we can take M big enough so as to ensure that we can use the approximation given by Claim 2 for all γ in [−1, 0].

Let k̄ ∈ {⌈rM⌉, . . . , M − 2} be a state such that ȳ̄^M_{k̄} > M^{m+1} max{ȳ̄^M_k : k ∈ {1, . . . , ⌈rM⌉ − 1}}. Then,

$$\sum_{k=1}^{\lceil rM\rceil-1}\bar{\bar y}^M_k < rM\,\frac{\bar{\bar y}^M_{\bar k}}{M^{m+1}} < \frac{1}{M^m}.$$

Corollary 1. Fix a game G ∈ G. There is M_G^1 ∈ N such that, for each M ≥ M_G^1, an infected player in Phase III with beliefs given by ȳ̄^M strictly prefers to play the Nash action.

Proof. Suppose an infected player with beliefs given by ȳ̄^M is deciding what to do at some period in Phase III. By Lemma 4, the probability that he assigns to meeting an infected player in the current period converges to 1 as M goes to infinity. This has two implications for incentives. First, in the current period, the Nash action becomes a strict best reply. Second, the impact of the current action on the continuation payoff goes to 0. Therefore, there is M_G^1 ∈ N such that, for each M larger than M_G^1, playing the Nash action is a best reply given ȳ̄^M.
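The limit belief ȳ̄^M and the mass bound of Lemma 4 can also be computed directly, as in the sketch below (ours): ȳ̄^M is obtained as the normalized nonnegative left eigenvector of C̄2⌋ for its largest eigenvalue, and we then report the probability assigned to states at or above ⌈rM⌉.

```python
import numpy as np
from math import comb, factorial, ceil

def sbar(M, k, l):
    if l < k or l > 2 * k or l > M:
        return 0.0
    j = l - k
    ways = (comb(k, j) * comb(M - k, j) * factorial(j)) ** 2 \
           * factorial(2 * k - l) * factorial(M - l)
    return ways / factorial(M)

M, r = 20, 0.9
C2 = np.zeros((M - 2, M - 2))            # Cbar_{2]}: last two rows/cols dropped
for k in range(1, M - 1):
    for l in range(1, M - 1):
        C2[k - 1, l - 1] = sbar(M, k, l) * (M - l) / (M - k)

vals, vecs = np.linalg.eig(C2.T)         # left eigenvectors of C2
y = np.abs(vecs[:, np.argmax(vals.real)].real)
y /= y.sum()                             # this is ybarbar^M (Lemma 3)
print(sum(y[ceil(r * M) - 1:]))          # mass on states >= ceil(rM);
                                         # the text reports at least 0.75 here
```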
Proposition 4. Fix a game G ∈ G and M > M_G^1. Fix Ṫ ∈ N and T̈ ∈ N and let t̄ ≫ Ṫ + T̈. Suppose I observe history h^{t̄+2} = g . . . gbg. Then, it is sequentially rational for me to play the Nash action at period t̄ + 3.

Proof. Now, we want to know how spread the contagion is according to the beliefs x^{t̄+2}. As usual, to do so we start with x^t̄, my beliefs computed conditioning only on the information that at most M − 2 people were infected after period t̄ and that I was uninfected until period t̄ + 1. From Lemma 3, if t̄ is large enough, x^t̄, which coincides with ȳ̄^{t̄−Ṫ−T̈}, is very close to ȳ̄^M and, further, all subsequent elements ȳ̄^{t̂−Ṫ−T̈} with t̂ > t̄ are also very close to ȳ̄^M. In particular, we can assume that, taking m = 1 in Lemma 4, I believe that, with probability at least 1 − 1/M, at least rM people are infected. We can use x^t̄ to compute my beliefs at the end of period t̄ + 2.

• After period t̄ + 1: I compute x^{t̄+1} by updating x^t̄, conditioning on i) I observed b in period t̄ + 1 and ii) at most M − 1 people were infected after t̄ + 1 (I observed g at t̄ + 2). Let x̃^{t̄+1} be the belief computed from x^t̄ by conditioning instead on i) I observed g in period t̄ + 1 and ii) at most M − 2 people are infected. Clearly, x^{t̄+1} first-order stochastically dominates x̃^{t̄+1}, in the sense of placing higher probability on more people being infected. Moreover, x̃^{t̄+1} coincides with ȳ̄^{t̄+1−Ṫ−T̈}, which we argued above is also close enough to ȳ̄^M.

• After period t̄ + 2: I compute x^{t̄+2} based on x^{t̄+1} and conditioning on i) I observed g; ii) I infected my opponent by playing the Nash action at t̄ + 2; and iii) at most M people have been infected after t̄ + 2. Again, this updating leads to beliefs that first-order stochastically dominate x̃^{t̄+2}, the beliefs we would obtain if we instead conditioned on i) I observed g and ii) at most M − 2 people were infected after t̄ + 2. Again, the belief x̃^{t̄+2} would be very close to ȳ̄^M.

Hence, if it is optimal for me to play the Nash action when my beliefs are given by ȳ̄^M, it is also optimal to do so after observing the history h^{t̄+2} = g . . . gbg (provided that t̄ is large enough). By Corollary 1, if M > M_G^1, the beliefs ȳ̄^M ensure that I have the incentive to switch to Nash.

As we did in Case 1, we still need to show that Nash reversion is optimal after histories of the form h^{t̄+1+α} = g . . . gb followed by α observations of g. The intuition is as follows. If α is small, then I will think that a lot of players were already infected when I got infected, the argument being similar to that in Proposition 4. If α is large, I may learn that there were not so many players infected when I got infected. However, the number of players I myself have infected since then, together with the exponential spread of the contagion, will be enough to convince me that, at the present period, contagion is widespread anyway.

To formalize the above intuition, we need the following strengthening of Lemma 4. We omit the proof, as it just involves a minor elaboration of the arguments in Lemma 4.

Lemma 5. Let r ∈ (0, 1). Then, for each m ∈ N, there are r̂ ∈ (r, 1) and $\underline{M}$ ∈ N such that, for each M ≥ $\underline{M}$,

$$\frac{\sum_{j=\lceil rM\rceil}^{\lfloor \hat r M\rfloor}\bar{\bar y}^M_j}{1-\sum_{j=\lfloor M-\hat r M\rfloor+1}^{M}\bar{\bar y}^M_j} > 1 - \frac{1}{M^m}.$$

Proposition 5. Fix a game G ∈ G. Then, there is M_G^2 such that, regardless of Ṫ ∈ N and T̈ ∈ N, for each t̄ ≫ Ṫ + T̈ the following holds. For each M > M_G^2, if I observe history h^{t̄+1+α} = g . . . gb followed by α observations of g, with α ≥ 1, then it is sequentially rational for me to play the Nash action at period t̄ + 2 + α.

Proof. The proof is similar to that of Proposition 4, so we relegate it to the Online Appendix (B.4).

Finally, we consider histories in which, after getting infected, I observe actions that include both g and b, i.e., histories starting with h^{t̄+1} = g . . . gb and in which I have observed b in one or more periods after getting infected. The reasoning is the same as in Case 1.
After such histories, I will assign higher probability to more people being infected compared to histories where I only observed g after getting infected.

Case 3: Infection in other periods of Phase III (“monotonicity” of beliefs). So far, we have shown that if a player is infected early in Phase III, he thinks that he was the last player to be infected and that everybody is infected, making Nash reversion optimal. Also, if a player is infected late in Phase III, he will believe that enough players are already infected for it to be optimal to play the Nash action (with his limit belief being given by ȳ̄^M). Next, we show that the belief of a player who is infected not very late in Phase III will lie in between. The earlier a player gets infected in Phase III, the closer his belief will be to (0, . . . , 0, 1), and the later he gets infected, the closer his belief will be to ȳ̄^M.

Proposition 6. Fix a game G ∈ G and let M > max{M_G^1, M_G^2}. Fix T̈ ∈ N. If Ṫ is large enough, then it is sequentially rational for me to play the Nash action after any history in which I get infected in Phase III.

Proof. Let M > max{M_G^1, M_G^2}. In Cases 1 and 2, we showed that if I get infected at the start of Phase III (at Ṫ + T̈ + 1) or late in Phase III (at t̄ ≫ Ṫ + T̈), I will switch to the Nash action. What remains to be shown is that the same is true if I get infected at some intermediate period in Phase III. We prove this for histories in Phase III of the form h^{t̂+2} = g . . . gbg. The proof can be extended to include other histories, just as in Cases 1 and 2.

We want to compute my belief x^{t̂+2} after history h^{t̂+2} = g . . . gbg. We first compute the intermediate beliefs x^t, for t ≤ t̂. Beliefs are computed using matrix C2⌋ in Phase I, and C̄2⌋ in Phase III. We know from Case 1 that, for Ṫ large enough, x^{Ṫ+T̈+1} ∈ R^M is close to (0, . . . , 0, 1). Using the properties of the contagion matrix C̄2⌋ (see Proposition B.2 in the Online Appendix), we can show that if we start Phase III with such a belief x^{Ṫ+T̈+1}, x^t̂ first-order stochastically dominates ȳ̄^M, in the sense of placing higher probability on more people being infected. My beliefs still need to be updated from x^t̂ to x^{t̂+1} and then from x^{t̂+1} to x^{t̂+2}. We can use similar arguments as in Proposition 4 to show that x^{t̂+2} first-order stochastically dominates ȳ̄^M. Hence, if it is sequentially rational for me to play the Nash action when my beliefs are given by ȳ̄^M, it is also sequentially rational to do so when they are given by x^{t̂+2}.

We have established that if a player observes a triggering action during Phase III, it is sequentially rational for him to revert to the Nash action.

A.3.4 A player observes a triggering action in Phases I or II

It remains to check the incentives for a player who is infected during the trust-building phases. We argued informally in Section 3.2 that buyers infected in Phase I find it optimal to start the punishment only when Phase II starts. To get this, all that is needed is that, in each period of Phase I, the likelihood of meeting the deviant seller is low enough. For each game G, there is M_G^3 such that, if M > M_G^3, this will be the case. Also, we argued that, in case of being infected in Phase II, they would find it optimal to switch to the Nash action. We omit the formal proofs, as the arguments are very similar to those for Case 1 above.
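To get a feel for how quickly the contagion becomes widespread, the following rough Monte Carlo sketch (ours, under our reading of the phase structure: only the deviant seller spreads in Phase I, infected buyers infect the sellers they meet in Phase II, and contagion is two-sided in Phase III) simulates the random matching process directly.

```python
import random

def simulate_contagion(M, T1, T2, T3, seed=1):
    """Counts of infected sellers and buyers after phases of lengths T1, T2, T3."""
    rng = random.Random(seed)
    inf_sellers, inf_buyers = {0}, set()     # seller 0 deviates in period 1
    for t in range(1, T1 + T2 + T3 + 1):
        match = list(range(M))
        rng.shuffle(match)                   # seller s meets buyer match[s]
        if t <= T1:                          # Phase I: deviant seller spreads
            inf_buyers.add(match[0])
        elif t <= T1 + T2:                   # Phase II: buyers infect sellers
            inf_sellers |= {s for s in range(M) if match[s] in inf_buyers}
        else:                                # Phase III: two-sided contagion
            new_buyers = {match[s] for s in inf_sellers}
            new_sellers = {s for s in range(M) if match[s] in inf_buyers}
            inf_buyers |= new_buyers
            inf_sellers |= new_sellers
    return len(inf_sellers), len(inf_buyers)

print(simulate_contagion(M=20, T1=60, T2=10, T3=10))  # typically (20, 20)
```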
A.3.5 A player observes a non-triggering action

An uninfected player who observes a non-triggering action knows that his opponent will not get infected, and will continue to play as if on-path. Since he knows that contagion will not start, the best thing to do is also to ignore this off-path behavior.

A.3.6 Off-path histories with more than one deviation

First, consider the situation in which the deviating player observes a probability zero history. In such a case, he will be at an erroneous history (Assumption 3 in Section A.2) and will think that there has been some mistake, update his beliefs accordingly, and best respond.

Second, suppose I am an infected player who deviates from the prescribed off-path action. The strategies prescribe that I subsequently play ignoring my own deviation (i.e., play the Nash action). To see why this is optimal, consider a history in which I have been infected at a period t̄ + 1 late in Phase III and have observed history h^{t̄+1+α} = g . . . gbg . . . g. Further, suppose that, instead of playing Nash, I have played my on-path action after being infected. To argue why I will still think that contagion is widely spread regardless of how large α is,34 let us compare the Markov processes needed to compute my intermediate beliefs with the one used to obtain the beliefs after history h^{t̄+1} = g . . . gb.

Footnote 34: Note that the arguments in Proposition 5 cannot be applied, since I am not spreading the contagion.

History h^{t̄+1} = g . . . gb. For this case we built upon the properties of ȳ̄^M to check the incentives (see Corollary 1). Recall that ȳ̄^M is the limit of the Markov process starting with some probability vector y^0 with y^0_1 > 0 and transition matrix C̄2⌋.

History h^{t̄+1+α} = g . . . gbg . . . g. We start with the intermediate beliefs x^t̄. Regardless of the value of α, since I am not spreading contagion (I may be meeting the same uninfected player in every period since I got infected), I will still think that at most M − 2 people were infected at any period t ≤ t̄. As above, the transition matrix is C̄2⌋, and x^t̄ will be close to ȳ̄^M. To compute subsequent beliefs, since I know that at least two people in each community were infected after t̄, we have to use matrix C̄⌈1,2⌋, which shifts the beliefs towards more people being infected (relative to the process given by C̄2⌋). Therefore, the ensuing process will converge to a limit that stochastically dominates ȳ̄^M in terms of more people being infected.

Finally, to study the beliefs after histories in which, after being infected, I alternate on-path play with the Nash action, we would have to combine the above arguments with those in the proof of Proposition 5 (Online Appendix, B.4).

A.4 Choice of Parameters for Equilibrium Construction

We now describe the order in which the parameters M, T̈, Ṫ, and δ must be chosen to satisfy all incentive constraints simultaneously. Fix any game G ∈ G with a strict Nash equilibrium a∗ and a target payoff v ∈ F_{a∗}.

The first step is to choose M > max{M_G^1, M_G^2, M_G^3}. This ensures that a buyer who gets infected in Phase I starts punishing in Phase II and that the incentive constraints in Phase III are satisfied, i.e., a player who observes a triggering action late in Phase III believes that enough people are already infected so that Nash reversion is optimal. As we argued in the discussion after Lemma 4, M_G^1 ≈ 20 should suffice for many games. A similar argument would apply to M_G^2. Finally, M_G^3 will typically be smaller. However, these are rough estimates.
The precise computation of tight bounds is beyond the scope of this paper.

There is one subtle issue. Once M, T̈, and Ṫ are chosen, we need players to be patient enough to prevent deviations on-path. But we also need to check that a very patient player does not want to slow down the contagion once it has started. The essence of the argument is in observing that, for a fixed population size, once contagion has started, the expected future stage-game payoffs go down to the Nash payoff u(a∗) exponentially fast. That is, regardless of how an infected player plays in the future, the undiscounted sum of possible future gains he can make relative to the payoff from Nash reversion is bounded above. Thus, in some sense, even a perfectly patient player becomes effectively impatient. In the Online Appendix (B.5), we show formally that even an infinitely patient player would have the incentives to revert to the Nash action when prescribed to do so by our strategies.

Once M is chosen, we pick T̈ large enough so that a buyer who is infected in Phase I and knows that not all buyers were infected by the end of Phase I still has an incentive to revert to Nash in Phase II. This buyer knows that contagion will spread from Phase III anyway, and the Nash action gives her a short-term gain in Phase II. Hence, if T̈ is long enough, she will revert to Nash in Phase II.

Next, Ṫ is chosen large enough so that i) a buyer infected in Phase I who has observed â_1 in most periods of Phase I believes that with high probability all buyers were infected during Phase I; ii) a seller infected in Phase II believes that with high probability at least M − 1 buyers were infected during Phase I; iii) a seller who deviates in Phase I believes that, with high probability, he met all the buyers in Phase I; and iv) players infected in Phase III believe that, with high probability, “enough” people were infected by the end of Phase II.

Finally, once M, T̈, and Ṫ have been chosen, we find the threshold $\underline{\delta}$ such that i) for discount factors δ > $\underline{\delta}$, players will not deviate on-path and ii) the payoff associated with our strategies is as close to the target payoff as needed.

B For Online Publication

B.1 Sequential Equilibrium: Consistency of Beliefs

In this section we prove that the beliefs we have defined are consistent. First, recall the assumptions used to characterize them.

i) Assumption 1: If a player observes a triggering action, then this player believes that some player in Community 1 (a seller) deviated in the first period of the game. Subsequently, his beliefs are described as follows:

• Assumption 1.1: He believes that after the deviation by a seller in the first period, play has proceeded as prescribed by the strategies, provided his observed history can be explained in this way.

• Assumption 1.2: If his observed history cannot be explained with play having proceeded as prescribed by the strategies after a seller’s deviation in period 1, then we call this an erroneous history.35 In this case, the player believes that after the seller deviated in the first period of the game, one or more of his opponents in his matches made a mistake. Indeed, this player will think that there have been as many mistakes by his past rivals as needed to explain the history at hand.

Footnote 35: For the formation of a player’s beliefs after erroneous histories, we assume that “mistakes” by infected players are infinitely more likely than “mistakes” by uninfected players.

ii) Assumption 2: If a player observes a non-triggering action, then this player believes that his opponent made a mistake.
iii) Assumption 3: If a player plays a triggering action and then observes a history that has probability zero according to his beliefs, we also say that he is at an erroneous history, and he will think that there have been as many mistakes by his past rivals as needed to explain the history at hand.

Let σ and µ denote the equilibrium strategy profile and the associated system of beliefs, respectively. We prove below that there is a sequence of completely mixed strategies converging uniformly to σ such that the associated sequence of systems of beliefs, uniquely derived from the strategies by Bayes rule, converges pointwise to µ. Indeed, convergence will be uniform except for a set of pathological histories that has extremely low probability (and, therefore, would have no influence at all on incentives). It is worth noting that there is hardly any reference in the literature in which consistency of beliefs is formally proved, so it is not even clear whether uniform convergence should be preferred over pointwise convergence. For the reader keen on the former, at the end of this section we present a mild assumption on the matching technology under which uniform convergence to the limit beliefs would be obtained after every history.

Before moving to the proof, we present the notions of pointwise and uniform convergence adapted to our setting (we present them for beliefs, with the definitions for strategies being analogous):

Pointwise convergence. A sequence of beliefs {µ_n}_{n∈N} converges pointwise to the belief µ if, for each δ > 0 and each T ∈ N, there is n̄ ∈ N such that, for each n > n̄ and each information set at a period t ≤ T, the probability attached by µ_n at each node of that information set differs by less than δ from that attached by µ.

Uniform convergence. A sequence of beliefs {µ_n}_{n∈N} converges uniformly to the belief µ if, for each δ > 0, there is n̄ ∈ N such that, for each n > n̄ and each information set, the probability attached by µ_n at each node of that information set differs by less than δ from that attached by µ.

Proof of consistency. First, we construct the sequence of completely mixed strategies converging to our equilibrium strategies σ, and let µ be a belief system satisfying Assumptions 1–3. Fix a player i and let D + 1 be the number of actions available to player i in the stage game. Then, for each n ∈ N, let ε_n = 1/2^n. Now, we distinguish two cases:

A buyer at period t ≤ Ṫ. An uninfected buyer plays the prescribed equilibrium action with probability (1 − ε_n^t) and plays any other action with probability ε_n^t/D. For an infected buyer these probabilities are (1 − ε_n^{nt}) and ε_n^{nt}/D, respectively. These trembles convey the information that early deviations are more likely and that, in Phase I, mistakes by uninfected buyers are more likely than mistakes by infected ones.

A buyer at period t > Ṫ or a seller at any period. Each uninfected player plays the prescribed equilibrium action with probability (1 − ε_n^{nt}) and plays any other action with probability ε_n^{nt}/D. For an infected player these probabilities are (1 − ε_n^t) and ε_n^t/D, respectively. These trembles again convey the information that early deviations are more likely, but now infected players are more likely to make mistakes.
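A simplified numerical illustration of why these trembles make a period-1 deviation dominate (ours; it ignores the (1 − ε) factors that appear in the full likelihood-ratio bound of Equation (1) below): with ε_n = 2^{−n}, a seller’s triggering deviation at period τ has probability of order ε_n^{nτ}/D, so the ratio between a period-2 and a period-1 deviation vanishes as n grows.

```python
for n in (5, 10, 20):
    eps = 2.0 ** -n
    p1 = eps ** n          # first triggering action in period 1
    p2 = eps ** (2 * n)    # first triggering action in period 2
    print(f"n={n}: p2/p1 = {p2 / p1:.3e}  (= eps_n^n -> 0)")
```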
Note that the above beliefs are natural in the spirit of trembling-hand perfection: more costly mistakes are less likely. This is why, except for the case of uninfected buyers in Phase I (whose mistakes are ignored anyway), mistakes by infected players are more likely (contagion will spread anyway). For each n ∈ N we denote by σ_n the corresponding completely mixed strategy and by µ_n the belief system derived from σ_n using Bayes rule. Clearly, the sequence {σ_n}_{n∈N} converges uniformly to σ. We move now to the convergence of {µ_n}_{n∈N} to µ, which we discuss through a series of representative histories. We start with the case that is most important for the incentives of our equilibrium strategies.

Case 1. Consider a history h^t̄, late in Phase III (t̄ ≫ Ṫ + T̈), such that player i, a buyer, gets infected at time t̄, i.e., h^t̄ = g . . . gb. We show that, as n goes to ∞, the beliefs µ_n are such that the probability that player i attaches to a seller having deviated in period 1 converges to one. For each t̄ ∈ N and τ ∈ N, with τ ≤ t̄, we define H(t̄, τ) as the set of all histories of the matching process such that:

• Player i observed history h^t̄ = g . . . gb.

• The first triggering action occurred at period τ.

Let P_n(t̄, τ) denote the sum of the probabilities of the histories in H(t̄, τ) according to µ_n. Then, we want to show that, as n goes to ∞, P_n(t̄, 1) becomes infinitely larger than $\sum_{\tau=2}^{\bar t} P_n(\bar t, \tau)$. To do so, we start by comparing the probabilities in H(t̄, 1) and H(t̄, 2). Take H ∈ H(t̄, 2) and associate with it an element Ĥ ∈ H(t̄, 1) such that:

• The same two players who got infected at period 2 according to H get infected in period 1 according to Ĥ.

• These two players were matched to each other also in period 2.

• The realized matches in H and Ĥ are the same from period 2 until t̄.

We show that, as n goes to ∞, player i assigns arbitrarily higher probability to Ĥ compared to H. Recall that, at a given period t, the probability that a player plays the prescribed action is either 1 − ε_n^t or 1 − ε_n^{nt}. Finally, note that triggering actions in Phase I must come from sellers and, therefore, all triggering actions have probability ε_n^{nτ}/D. Then,

$$\frac{P(H)}{P(\hat H)} \le \frac{\frac{\varepsilon_n^{2n}}{D}\,(1-\varepsilon_n^{n})^{2M}\,(1-\varepsilon_n^{2n})^{2M-1}\,q}{\frac{\varepsilon_n^{n}}{D}\,(1-\varepsilon_n)^{2M-1}\,(1-\varepsilon_n^{2})^{2M}\,q},$$

where q is the probability of the realizations from period 2 until t̄, which, by definition, is the same for H and Ĥ. Simplifying the above expression we get

$$\varepsilon_n^{n}\,\frac{(1-\varepsilon_n^{n})^{2M}\,(1-\varepsilon_n^{2n})^{2M-1}}{(1-\varepsilon_n)^{2M-1}\,(1-\varepsilon_n^{2})^{2M}} \qquad (1)$$

and, as n goes to ∞, this converges to zero. We could have derived a similar expression comparing histories in H(t̄, τ) and H(t̄, τ + 1). Further, t̄ would not appear in these expressions, which is important for our convergence arguments to hold uniformly (not just pointwise).

Next, to compare P_n(t̄, 1) with P_n(t̄, 2) we have to note the following. Given H ∈ H(t̄, 2), there are M! elements of H(t̄, 2) that coincide with H from period 2 onwards (corresponding to all the possible matchings in period 1). On the other hand, given Ĥ ∈ H(t̄, 1), there are just (M − 1)! elements of H(t̄, 1) that coincide with Ĥ from period 2 onwards (the players infected in period 2 must be matched also in period 1). What this implies is that if we want to compare P_n(t̄, 1) and P_n(t̄, 2) relying on the above association between elements of H(t̄, 1) and H(t̄, 2), we have to account for the fact that each element of H(t̄, 1) will be associated with M different elements of H(t̄, 2).
Thus, if we define Ĥ(t̄, 1) as the set formed by the elements of H(t̄, 1) that are associated with some element of H(t̄, 2), we have that |H(t̄, 2)| = M|H(t̄, 1)|. Then,

$$\frac{P_n(\text{First dev. at }\tau=2\mid h^{\bar t})}{P_n(\text{First dev. at }\tau=1\mid h^{\bar t})} = \frac{P_n(\bar t,2)}{P_n(\bar t,1)} = \frac{\sum_{H\in H(\bar t,2)} P_n(H)}{\sum_{H'\in H(\bar t,1)} P_n(H')} \le \frac{\sum_{H\in H(\bar t,2)} P_n(H)}{\sum_{\hat H\in \hat H(\bar t,1)} P_n(\hat H)},$$

which by the above argument and Equation (1) reduces to

$$M\,\varepsilon_n^{n}\,\frac{(1-\varepsilon_n^{n})^{2M}\,(1-\varepsilon_n^{2n})^{2M-1}}{(1-\varepsilon_n)^{2M-1}\,(1-\varepsilon_n^{2})^{2M}},$$

which, as we increase n, soon becomes smaller than ε_n^2. Wrapping up, we have shown that, according to the belief system µ_n, the event “the first triggering action occurred at period 1” is, for large n, at least 1/ε_n^2 times more likely than the event “the first triggering action occurred at period 2”, with 1/ε_n^2 converging to ∞ as n goes to ∞. More generally, the above arguments can be used, mutatis mutandis, to show that the event “the first triggering action occurred at period τ” is, for large n, at least 1/ε_n^2 times more likely than the event “the first triggering action occurred at period τ + 1”. Thus, in general we have that, for large n,

$$\frac{P_n(\bar t,\tau)}{P_n(\bar t,1)} \le \varepsilon_n^{2}\,\frac{P_n(\bar t,\tau-1)}{P_n(\bar t,1)} \le \varepsilon_n^{4}\,\frac{P_n(\bar t,\tau-2)}{P_n(\bar t,1)} \le \cdots \le \varepsilon_n^{2(\tau-1)}\,\frac{P_n(\bar t,1)}{P_n(\bar t,1)} = \varepsilon_n^{2(\tau-1)}.$$

Therefore, for large n we have

$$\frac{P_n(\text{First dev. at }\tau\neq 1\mid h^{\bar t})}{P_n(\text{First dev. at }\tau=1\mid h^{\bar t})} = \sum_{\tau=2}^{\bar t}\frac{P_n(\bar t,\tau)}{P_n(\bar t,1)} \le \sum_{\tau=2}^{\bar t}\varepsilon_n^{2(\tau-1)} \le \sum_{\tau=2}^{\infty}\varepsilon_n^{2(\tau-1)} = \frac{\varepsilon_n^{2}}{1-\varepsilon_n^{2}},$$

which, as we wanted to prove, converges to zero as n goes to ∞ and, further, this convergence is uniform in t̄.

Case 2. Consider a history h^t̄, in Phase I (t̄ ≤ Ṫ), such that player i, a seller, observes a non-triggering action at time t̄, i.e., h^t̄ = g . . . gb. In this case, what is relevant for the incentives is that the seller believes that there has been no triggering action so far. First, by similar arguments to those in Case 1, we can show that, as n goes to ∞, the set of histories with a triggering action in period 1 becomes infinitely more likely than the set of histories with the first triggering action at a period τ > 1. We argue now why the former histories become themselves infinitely less likely than histories in which there has been no triggering action. The idea is that, to explain the observation of h^t̄ = g . . . gb with a history that entails a triggering action at period 1, one needs at least two deviations from the equilibrium strategy: one at period 1, with probability ε_n^n/D, and another one by a buyer at period t̄, which has probability at most ε_n^{t̄}/D (if the buyer was uninfected). On the other hand, explaining the observation of h^t̄ = g . . . gb with histories without triggering actions just requires one deviation by a buyer at period t̄, which has probability ε_n^{t̄}/D (no one is infected). Comparing the likelihood of the two types of histories, it is clear that the latter ones become infinitely more likely as n goes to ∞.

Case 3. We discuss now the beliefs after an erroneous history. Consider a history h^t̄, at the start of Phase III (t̄ = Ṫ + T̈ + 1), such that player i, a buyer, gets infected at period t̄ and then observes M − 1 periods of good behavior while he is playing the Nash action: h^{t̄+M−1} = g . . . gbg . . . g. As argued at the end of Case 1 in Section A.3.3, such a history is erroneous, since a deviation by a seller at period 1 does not suffice to explain it.36
Case 3. We discuss now the beliefs after an erroneous history. Consider a history h^t̄, at the start of Phase III (t̄ = Ṫ + T̈ + 1), such that player i, a buyer, gets infected at period t̄ and then observes M − 1 periods of good behavior while he is playing the Nash action: h^{t̄+M−1} = g…gbg…g. As argued at the end of Case 1 in Section A.3.3, such a history is erroneous, since a deviation by a seller at period 1 does not suffice to explain it.³⁶ Then, Assumption 1.2 on beliefs says that, in addition to the first deviation by a seller in period 1, there was another mistake; in this case, this means that one of the M − 1 opponents that player i faced after period t̄ was infected and, yet, played as if on-path.

It is worth noting that the history h^{t̄+M−1} = g…gbg…g can alternatively be explained by a single deviation: the triggering action observed by player i at period t̄ was the first triggering action of the game and, hence, only one player in each community was infected at the end of that period; in the M − 1 subsequent periods, player i infected each and every player in the other community, one by one. We informally argue why, as n goes to ∞, histories like this one become infinitely less likely than those with a seller deviating in period 1. The key is that deviations by infected players are much more likely than those by uninfected ones.

First triggering action at period 1. In this case the probability goes to zero at speed, at most, ε_n^n · ε_n^{t̄+M−1} = ε_n^{n+t̄+M−1}.

First triggering action at period t̄. As n goes to ∞, the probability of these histories goes to zero at speed ε_n^{nt̄}, which comes from the probability that an uninfected player deviates from the equilibrium path at period t̄.

Clearly, as n goes to ∞, ε_n^{nt̄} goes to zero faster than ε_n^{n+t̄+M−1}. The same kind of comparison would arise for similar histories in which t̄ ≫ Ṫ + T̈, and therefore uniform convergence of beliefs would not be a problem either. Yet, a formal proof would require an analysis similar to that of Case 1, to factor in the likelihood of the matching realizations associated with the different histories.

³⁶We briefly recall the argument used there. Player i believes that at least two people are infected at the end of period t̄ (at least one pair of players in period 1 and one pair in period t̄). Suppose that, subsequently, player i faces M − 2 instances of g. Since player i switched to the Nash action after getting infected, he has infected each and every one of his opponents in the last M − 2 periods. Then, at the end of the M − 2 observations of g, he is sure that, even if only two players in each community were infected at the end of t̄, he has infected all the remaining uninfected people. So observing more g after that cannot be explained by a single deviation in period 1.

Case 4. To conclude, we discuss another type of erroneous history, for which convergence is only achieved pointwise.³⁷ Consider a history in which player i makes the first triggering deviation of the game at period t̄ = Ṫ + T̈ + 1 and in which, since then, he has always observed the Nash action: h = g…gbbb…. Then, player i believes that, very likely, he has been meeting the same player, say player j, ever since he deviated (setting aside "mistakes", no other player would have played the Nash action at period t̄ + 1, for instance). Given this belief, suppose that, after deviating, player i played an action ã that is neither the on-path action nor the Nash action (maybe maximizing short-run profits before the contagion spreads). The question now is, how does player j explain the history he is observing? First of all, note that the histories we are describing are very pathological: players i and j are matched in every single period after period t̄, which is very unlikely and leads to a situation in which, period after period, player i has very precise information about how the contagion is spreading.

³⁷We thank an anonymous referee for pointing this out.
Since these histories have very low probability, they have no impact on the incentives of a player who may be considering a deviation in Phase III but, in any case, we argue below why the beliefs of player j converge to those in which he thinks that a seller deviated in period 1 and that all the observations from period t̄ onwards have been mistakes. Let τ be the number of periods after t̄ in which players i and j have been matched and consider the following two types of histories:

First triggering action at period 1. The probability of these histories goes to zero at speed

  ε_n^n · Π_{t=1}^{τ} ε_n^{t̄+t} = ε_n^n · ε_n^{τ(2t̄+τ+1)/2} = ε_n^{n+τ(2t̄+τ+1)/2}.

First triggering action at period t̄. As n goes to ∞, the probability of these histories goes to zero at speed ε_n^{nt̄}, which comes from the probability that an uninfected player deviates from the equilibrium path.

Then, for each τ ∈ N, as n goes to ∞, ε_n^{nt̄} goes to zero faster than ε_n^{n+τ(2t̄+τ+1)/2}. However, differently from Case 3, the convergence cannot be made uniform. The reason is that, for a fixed n, no matter how large, we can always find τ ∈ N such that ε_n^{n+τ(2t̄+τ+1)/2} is smaller than ε_n^{nt̄}; take, for instance, τ = n.

B.1.1 Ensuring uniform convergence of beliefs

We have seen above that there are histories after which the convergence of beliefs required for consistency is only achieved pointwise. These histories are very special, since they require the same two players to be matched in many consecutive periods, which results in one of the players being highly informed about the evolution of the contagion. As soon as there is a period in which these two players do not face each other, uniform convergence is recovered.

The preceding argument suggests a natural modification of our matching assumptions that ensures uniform convergence of beliefs after every history: assume that there is L ∈ N such that no two players are matched to each other for more than L consecutive periods. In particular, L = 1 would mean that no two players are matched together twice in a row. Thus, matching would not be completely independent over time. We conjecture that, regardless of the value of L, our construction goes through and the convergence of beliefs would be uniform. However, a formal analysis of this modified setting is beyond the scope of this paper.
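Although we do not pursue the formal analysis, the modified matching process is straightforward to simulate. The sketch below draws a uniform matching by rejection sampling for the case L = 1; the sizes are illustrative, and this is a sketch rather than the formal model:

```python
import random

# Draw a uniform random matching that repeats no pair from the previous period
# (the case L = 1 above): resample until the constraint holds.
def next_matching(prev: list, M: int) -> list:
    while True:
        m = list(range(M))
        random.shuffle(m)                           # candidate: seller i meets buyer m[i]
        if all(m[i] != prev[i] for i in range(M)):  # no pair matched twice in a row
            return m

random.seed(0)
prev = list(range(6))                               # last period's matching
print(next_matching(prev, 6))
```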
B.2 Properties of the Conditional Transition Matrices

In Section A.2, we introduced a class of matrices, contagion matrices, which turns out to be very useful in analyzing the beliefs of players. First, note that, since contagion matrices are upper triangular, their eigenvalues are precisely their diagonal entries. It is easy to check that, given a contagion matrix, any eigenvector associated with the largest eigenvalue is either nonnegative or nonpositive. Given y ∈ R^k, let ‖y‖ := Σ_{i∈{1,…,k}} y_i. We are often interested in the limit behavior of y^t := yC^t/‖yC^t‖, where C is a contagion matrix and y is a probability vector. We present below a few results about this limit behavior. We distinguish three special types of contagion matrices that deliver different limiting results.

Property C1: {c_11} = argmax_{i∈{1,…,k}} c_ii.
Property C2: c_kk ∈ argmax_{i∈{1,…,k}} c_ii.
Property C3: For each l < k, C⌈l satisfies C1 or C2.

Lemma B.1. Let C be a contagion matrix and let λ be its largest eigenvalue. Then, the left eigenspace associated with λ has dimension one. That is, the geometric multiplicity of λ is one, irrespective of its algebraic multiplicity.

Proof. Let l be the largest index such that c_ll = λ > 0, and let y be a nonnegative left eigenvector associated with λ. We claim that, for each i < l, y_i = 0. Suppose not, and let i be the largest index smaller than l such that y_i ≠ 0. If i < l − 1, we have that y_{i+1} = 0 and, since c_{i,i+1} > 0, we get (yC)_{i+1} > 0, which contradicts that y is an eigenvector associated with λ. If i = l − 1, then (yC)_l ≥ c_ll y_l + c_{l−1,l} y_{l−1} > c_ll y_l = λy_l which, again, contradicts that y is an eigenvector associated with λ. Then, we can restrict attention to the matrix C⌈(l−1). Now, λ is also the largest eigenvalue of C⌈(l−1) but, by the definition of l, only one diagonal entry of C⌈(l−1) equals λ and, hence, its multiplicity is one. Then, z ∈ R^{k−(l−1)} is a left eigenvector associated with λ for the matrix C⌈(l−1) if and only if (0,…,0,z) ∈ R^k is a left eigenvector associated with λ for the matrix C.

Given a contagion matrix C with largest eigenvalue λ, we denote by y̆ the unique nonnegative left eigenvector associated with λ such that ‖y̆‖ = 1.

Proposition B.1. Let C ∈ M_k be a contagion matrix satisfying C1 or C2. Then, for each nonnegative vector y ∈ R^k with y_1 > 0, we have lim_{t→∞} yC^t/‖yC^t‖ = y̆. In particular, under C2, y̆ = (0,…,0,1).

Proof. Clearly, since C is a contagion matrix, if t is large enough, all the components of y^t are positive. Then, for the sake of exposition, we assume that all the components of y are positive. We distinguish two cases.

C satisfies C1. This part of the proof is a direct application of the Perron–Frobenius theorem. First, note that yC^t/‖yC^t‖ can be written as y(C^t/λ^t)/‖y(C^t/λ^t)‖. Now, using for instance Theorem 1.2 in Seneta (2006), we have that C^t/λ^t converges to a matrix that is obtained as the product of the right and left eigenvectors associated with λ. Since in our case the right eigenvector is (1, 0, …, 0), C^t/λ^t converges to a matrix that has y̆ in the first row, all other rows being the zero vector. Therefore, the result follows from the fact that y_1 > 0.

C satisfies C2. We show that, for each i < k, lim_{t→∞} y_i^t = 0. We prove this by induction on i. Let i = 1. Then, for each t ∈ N,

  y_1^{t+1}/y_k^{t+1} = c_11 y_1^t / Σ_{l≤k} c_lk y_l^t < c_11 y_1^t / (c_kk y_k^t) ≤ y_1^t/y_k^t,

where the first inequality is strict because y_{k−1}^t > 0 and c_{k−1,k} > 0 (C is a contagion matrix), and the second inequality follows from C2. Hence, the ratio y_1^t/y_k^t is strictly decreasing in t. Moreover, since all the components of y^t lie in [0,1], it is not hard to see that, as long as y_1^t is bounded away from 0, the speed at which the above ratio decreases is also bounded away from 0.³⁸ Therefore, lim_{t→∞} y_1^t = 0.

Suppose now that the claim holds for each i < j, with j < k. Then,

  y_j^{t+1}/y_k^{t+1} < Σ_{l≤j} c_lj y_l^t / (c_kk y_k^t) = Σ_{l<j} c_lj y_l^t / (c_kk y_k^t) + c_jj y_j^t / (c_kk y_k^t) ≤ Σ_{l<j} c_lj y_l^t / (c_kk y_k^t) + y_j^t/y_k^t.

By the induction hypothesis, for each l < j, the ratio y_l^t/y_k^t can be made arbitrarily small for large enough t. Then, the first term in the above expression can be made arbitrarily small. Hence, it is easy to see that, for large enough t, the ratio y_j^t/y_k^t is strictly decreasing in t and, as above, this can happen only if lim_{t→∞} y_j^t = 0.

³⁸Roughly speaking, this is because state k will always get some probability from state 1 via the intermediate states, and this probability is bounded away from 0 as long as the probability of state 1 is bounded away from 0.
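The convergence in Proposition B.1 is easy to see numerically. The sketch below iterates y^t = yC^t/‖yC^t‖ for an illustrative 4×4 contagion matrix satisfying C1 (the entries are made up for the example, not taken from the model) and compares the limit with y̆:

```python
import numpy as np

# Illustrative contagion matrix: upper triangular, positive above the diagonal,
# with c_11 = 0.9 the unique largest diagonal entry (property C1).
C = np.array([[0.9, 0.3, 0.2, 0.1],
              [0.0, 0.3, 0.3, 0.2],
              [0.0, 0.0, 0.4, 0.3],
              [0.0, 0.0, 0.0, 0.5]])

y = np.array([0.7, 0.1, 0.1, 0.1])        # nonnegative, with y_1 > 0
for _ in range(200):
    y = y @ C
    y /= y.sum()                          # here ||.|| is the sum of the entries

# y̆: nonnegative left eigenvector of C for the largest eigenvalue, normalized.
vals, vecs = np.linalg.eig(C.T)           # left eigenvectors of C
breve = np.real(vecs[:, np.argmax(np.real(vals))])
breve /= breve.sum()

print(np.round(y, 4), np.round(breve, 4), np.allclose(y, breve))   # same vector, True
```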
Recall the matrices used to represent a player's beliefs after he observes history h^t = g…gb. At the beginning of Phase III, the beliefs evolved according to the matrices C_1⌋ and S_1⌋ and, late in Phase III, according to C̄_1⌋. All three matrices satisfy the conditions of the above proposition, and this is what drives Lemmas 1 and 2 in the text. Consider now the truncated matrix C̄_2⌋ that gave the transition of beliefs of a player who observes history h^t = g…bg. This matrix also satisfies the conditions of the above proposition, and this suffices for Lemma 3.

Proposition B.2. Let C ∈ M_k be a contagion matrix satisfying C1 and C3. Let y ∈ R^k be a nonnegative vector. Then, if y is close enough to (0,…,0,1), we have that, for each t ∈ N and each l ∈ {1,…,k}, Σ_{i=l}^{k} y_i^t ≥ Σ_{i=l}^{k} y̆_i. Whenever two vectors are related as y^t and y̆ above, we say that y^t first-order stochastically dominates y̆ (in the sense of more people being infected).

Proof. For each i ∈ {1,…,k}, let e_i denote the i-th element of the canonical basis of R^k. By C1, c_11 is larger than any other diagonal entry of C. Let y̆ be the unique nonnegative left eigenvector associated with c_11 such that ‖y̆‖ = 1. Clearly, y̆_1 > 0 and, hence, {y̆, e_2, …, e_k} is a basis of R^k. With respect to this basis, the matrix C looks like

  ( c_11  0   )
  ( 0    C⌈1 ).

Now, we distinguish two cases.

C⌈1 satisfies C2. In this case, we can apply Proposition B.1 to C⌈1 to get that, for each nonnegative vector z ∈ R^{k−1} with z_1 > 0, lim_{t→∞} zC⌈1^t/‖zC⌈1^t‖ = (0,…,0,1). Now, let y ∈ R^k be the vector in the statement of this result. Since y is very close to (0,…,0,1), using the above basis it is clear that y = αy̆ + v, with α ≥ 0 and v ≈ (0,…,0,1). Then, for each t ∈ N,

  y^t = yC^t/‖yC^t‖ = (λ^t α y̆ + vC^t)/‖yC^t‖ = (λ^t α y̆ + ‖vC^t‖ · vC^t/‖vC^t‖)/‖yC^t‖.

Clearly, ‖yC^t‖ = ‖λ^t α y̆ + vC^t‖ and, since all the terms are positive, ‖yC^t‖ = λ^t α ‖y̆‖ + ‖vC^t‖ = λ^t α + ‖vC^t‖. Hence, y^t is a convex combination of y̆ and vC^t/‖vC^t‖. Since v ≈ (0,…,0,1) and vC^t/‖vC^t‖ → (0,…,0,1), it is clear that, for each t ∈ N, vC^t/‖vC^t‖ first-order stochastically dominates y̆ in the sense of more people being infected. Therefore, y^t also first-order stochastically dominates y̆.

C⌈1 satisfies C1. By C1, the first diagonal entry of C⌈1 is larger than any other of its diagonal entries. Let y̆² be the unique associated nonnegative left eigenvector such that ‖y̆²‖ = 1. It is easy to see that y̆² first-order stochastically dominates y̆; the reason is that y̆² and y̆ are the limits of the same contagion process, with the only difference that the state in which only one person has been infected is known to have probability 0 when obtaining y̆² from C⌈1. Clearly, y̆²_2 > 0 and, hence, {y̆, y̆², e_3, …, e_k} is a basis of R^k. With respect to this basis, the matrix C looks like

  ( c_11  0    0   )
  ( 0    c_22  0   )
  ( 0    0    C⌈2 ).

Again, we can distinguish two cases.

• C⌈2 satisfies C2. In this case, we can repeat the arguments above to show that y^t is a convex combination of y̆, y̆², and vC^t/‖vC^t‖. Since both y̆² and vC^t/‖vC^t‖ first-order stochastically dominate y̆, so does y^t.

• C⌈2 satisfies C1. Now, we would get a vector y̆³, and the procedure would continue until a truncated matrix satisfies C2 or until we get a basis of eigenvectors, one of them being y̆ and all the others first-order stochastically dominating y̆. In both situations, the result immediately follows from the above arguments.
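The dominance property can also be checked numerically. Reusing the illustrative matrix C from the previous sketch (which satisfies C1, and also C3, since every truncation C⌈l satisfies C1 or C2) and a starting vector close to (0,0,0,1):

```python
import numpy as np

C = np.array([[0.9, 0.3, 0.2, 0.1],      # same illustrative contagion matrix as above
              [0.0, 0.3, 0.3, 0.2],
              [0.0, 0.0, 0.4, 0.3],
              [0.0, 0.0, 0.0, 0.5]])

vals, vecs = np.linalg.eig(C.T)
breve = np.real(vecs[:, np.argmax(np.real(vals))])
breve /= breve.sum()                          # the limit vector y̆

y = np.array([0.001, 0.001, 0.001, 0.997])    # close to (0,...,0,1)
min_slack = np.inf
for _ in range(300):
    y = y @ C
    y /= y.sum()
    # first-order stochastic dominance: every tail sum of y^t weakly exceeds y̆'s
    slack = min(y[l:].sum() - breve[l:].sum() for l in range(len(y)))
    min_slack = min(min_slack, slack)

print(min_slack >= -1e-12)                    # True: y^t dominates y̆ at every t
```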
Note that the matrix C̄_2⌋, which gave the transition of beliefs of a player conditional on history h^t = g…gbg late in the game, satisfies the conditions of the above proposition. This property is useful in proving Proposition 6.

B.3 Updating of Beliefs Conditional on Observed Histories

Below, we validate the approach to computing off-path beliefs discussed in Section A.3.1. Suppose that player i observes history h^{t̄+1} = g…gbg in Phase III, and we want to compute her beliefs at period t̄ + 1 conditional on h^{t̄+1}, namely x^{t̄+1}. Recall our method for computing x^{t̄+1}. We first compute a set of intermediate beliefs x^t for t < t̄ + 1. For any period τ, we compute x^{τ+1} from x^τ by conditioning on the event that "I was uninfected in period τ + 1" and that "I^{τ+1} ≤ M − 2" (I^t is the random variable representing the number of infected people after period t). We do not use the information that "I remained uninfected after any period t with τ + 1 < t < t̄." This information is added later, period by period; i.e., only at period t do we add the information coming from the fact that "I was not infected at period t." Below, we show that this method of computing beliefs is equivalent to the standard updating of beliefs that conditions on the entire history at once.

Let α ∈ {0,…,M−2} and let h^{t+1+α} denote the (t+1+α)-period history g…gbg…g whose final α observations are g. Recall that U^t denotes the event that i is uninfected at the end of period t. Let b^t (resp. g^t) denote the event that player i faced b (resp. g) in period t. We introduce some additional notation.

• I_{i,k}^t denotes the event i ≤ I^t ≤ k, i.e., the number of infected people at the end of t periods is at least i and at most k.
• E_α^t := I_{0,M−α}^t ∩ U^t.
• E_α^{t+1} := E_α^t ∩ I_{1,M−α+1}^{t+1} ∩ b^{t+1}.
• For each β ∈ {1,…,α−1}, E_α^{t+1+β} := E_α^{t+β} ∩ I_{β+1,M−α+β+1}^{t+1+β} ∩ g^{t+1+β}.
• E_α^{t+1+α} := E_α^{t+α} ∩ g^{t+1+α} = h^{t+1+α}.

Let H^t denote a history of the contagion process up to period t, let 𝓗^t be the set of all such histories, and let 𝓗_k^t be the set of t-period histories of the stochastic process for which I^t = k. We say that H^{t+1} ⇒ h^{t+1} if history H^{t+1} implies that player i observed history h^{t+1}. Let i →^{t+1+β} k denote the event that state i transits to state k in period t+1+β, consistently with h^{t+1+α} or, equivalently, consistently with E_α^{t+1+β}.

The probabilities of interest are P(I^{t+1+α} = k | h^{t+1+α}) = P(I^{t+1+α} = k | E_α^{t+1+α}). We want to show that we can obtain the probabilities after period t+1+α conditional on h^{t+1+α} by starting with the probabilities after period t conditional on E_α^t and then letting the contagion elapse one more period at a time, conditioning at each step on the new information, i.e., adding the "local" information that player i observed g in the next period and that he infected one more person. Precisely, we want to show that, for each β ∈ {0,…,α},

  P(I^{t+1+β} = k | E_α^{t+1+β}) = [Σ_{i=1}^{M} P(i →^{t+1+β} k) P(I^{t+β} = i | E_α^{t+β})] / [Σ_{j=1}^{M} Σ_{i=1}^{M} P(i →^{t+1+β} j) P(I^{t+β} = i | E_α^{t+β})].
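Before giving the formal argument, note that the claim is an instance of a familiar filtering identity: conditioning period by period and renormalizing gives the same posterior as conditioning on the whole history at once, because the normalization constants cancel. The sketch below illustrates this with a generic hidden Markov chain; all the numbers are arbitrary and unrelated to the model.

```python
import numpy as np

rng = np.random.default_rng(7)
S = 5                                         # states (think: numbers of infected people)
T = rng.random((S, S))
T /= T.sum(axis=1, keepdims=True)             # transition matrix of the hidden chain
L = rng.random((4, S))                        # L[t, s] = P(period-t observation | state s)
prior = np.full(S, 1.0 / S)

seq = prior.copy()                            # (a) add each period's information in turn
for t in range(4):
    seq = (seq @ T) * L[t]
    seq /= seq.sum()                          # renormalize every period

joint = prior.copy()                          # (b) condition on the entire history at once
for t in range(4):
    joint = (joint @ T) * L[t]                # unnormalized joint probability
joint /= joint.sum()

print(np.allclose(seq, joint))                # True: the two updating methods coincide
```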
Fix β ∈ {0,…,α}. For each H^{t+1+β} ∈ 𝓗^{t+1+β}, let H^{t+1+β,β} denote the unique H^{t+β} ∈ 𝓗^{t+β} that is compatible with H^{t+1+β}, i.e., the restriction of H^{t+1+β} to the first t+β periods. Let F^{1+β} := {H̃^{t+1+β} ∈ 𝓗^{t+1+β} : H̃^{t+1+β} ⇒ E_α^{t+1+β}} and F_k^{1+β} := {H̃^{t+1+β} ∈ F^{1+β} : H̃^{t+1+β} ∈ 𝓗_k^{t+1+β}}. Clearly, the sets F_k^{1+β} define a "partition" of F^{1+β} (one or more sets in the partition might be empty). Let F_i^β := {H̃^{t+1+β} ∈ F^{1+β} : H̃^{t+1+β,β} ∈ 𝓗_i^{t+β}}. Clearly, the sets F_i^β also define a "partition" of F^{1+β}.

Note that, for each pair H^{t+1+β}, H̃^{t+1+β} ∈ F_k^{1+β} ∩ F_i^β, we have P(H^{t+1+β} | H^{t+1+β,β}) = P(H̃^{t+1+β} | H̃^{t+1+β,β}). Denote this probability by P(F_i^β →^{t+1+β} F_k^{1+β}). Let |i →^{t+1+β} k| denote the number of ways in which state i can transition to state k at period t+1+β consistently with h^{t+1+α} or, equivalently, consistently with E_α^{t+1+β}. Clearly, this number is independent of the history that led to i people being infected. Now, P(i →^{t+1+β} k) = P(F_i^β →^{t+1+β} F_k^{1+β}) |i →^{t+1+β} k|. Then,

  P(I^{t+1+β} = k | E_α^{t+1+β})
  = Σ_{H^{t+1+β} ∈ 𝓗_k^{t+1+β}} P(H^{t+1+β} | E_α^{t+1+β}) = Σ_{H^{t+1+β} ∈ F_k^{1+β}} P(H^{t+1+β} | E_α^{t+1+β})
  = (1/P(E_α^{t+1+β})) Σ_{H^{t+1+β} ∈ F_k^{1+β}} P(H^{t+1+β} ∩ E_α^{t+1+β}) = (1/P(E_α^{t+1+β})) Σ_{H^{t+1+β} ∈ F_k^{1+β}} P(H^{t+1+β})
  = (1/P(E_α^{t+1+β})) Σ_{i=1}^{M} Σ_{H^{t+1+β} ∈ F_k^{1+β} ∩ F_i^β} P(H^{t+1+β})
  = (1/P(E_α^{t+1+β})) Σ_{i=1}^{M} Σ_{H^{t+1+β} ∈ F_k^{1+β} ∩ F_i^β} P(H^{t+1+β} | H^{t+1+β,β}) P(H^{t+1+β,β} | E_α^{t+β}) P(E_α^{t+β})
  = (P(E_α^{t+β})/P(E_α^{t+1+β})) Σ_{i=1}^{M} P(F_i^β →^{t+1+β} F_k^{1+β}) Σ_{H^{t+1+β} ∈ F_k^{1+β} ∩ F_i^β} P(H^{t+1+β,β} | E_α^{t+β})
  = (P(E_α^{t+β})/P(E_α^{t+1+β})) Σ_{i=1}^{M} P(F_i^β →^{t+1+β} F_k^{1+β}) |i →^{t+1+β} k| Σ_{H^{t+β} ∈ 𝓗_i^{t+β}} P(H^{t+β} | E_α^{t+β})
  = (P(E_α^{t+β})/P(E_α^{t+1+β})) Σ_{i=1}^{M} P(i →^{t+1+β} k) P(I^{t+β} = i | E_α^{t+β}).

It is easy to see that P(E_α^{t+1+β}) = P(E_α^{t+β}) Σ_{j=1}^{M} Σ_{i=1}^{M} P(i →^{t+1+β} j) P(I^{t+β} = i | E_α^{t+β}), and the result follows. Similar arguments apply to histories h^{t+1+α} in which player i observes both g and b in the α periods following the first triggering action.

B.4 Proofs of Claim 2 and Proposition 5

Proof of Claim 2. We prove the claim in two steps.

Step 1: γ = 0. Stirling's formula implies that lim_{n→∞} (e^{−n} n^{n+1/2} √(2π))/n! = 1. Given r ∈ (0,1), to study h_γ^M(r) in the limit, we use the approximation n! ≈ e^{−n} n^{n+1/2} √(2π). Substituting and simplifying, we get the following:

  M h_0^M(r) = M ((rM)!)² (((1−r)M)!)² (1−r) / [M! (r²M)! (((r−r²)M)!)² ((1−r)²M)!]
  = M (rM)^{1+2rM} ((1−r)M)^{1+2(1−r)M} (1−r) / [√(2π) M^{1/2+M} (r²M)^{1/2+r²M} ((r−r²)M)^{1+2(r−r²)M} ((1−r)²M)^{1/2+(1−r)²M}]
  = √M / (r√(2π)).

Step 2: Let γ ∈ R and r ∈ (0,1). Now,

  h_0^M(r)/h_γ^M(r) = [(r²M−γ)! (((r−r²)M+γ)!)² ((1−r)²M−γ)!] / [(r²M)! (((r−r²)M)!)² ((1−r)²M)!] · (1−r)²M/((1−r)²M−γ).

Applying Stirling's formula again, the above expression becomes

  [(r²M−γ)^{1/2+r²M−γ}/(r²M)^{1/2+r²M}] · [((r−r²)M+γ)^{1+2(r−r²)M+2γ}/((r−r²)M)^{1+2(r−r²)M}] · [((1−r)²M−γ)^{1/2+(1−r)²M−γ}/((1−r)²M)^{1/2+(1−r)²M}] · [(1−r)²M/((1−r)²M−γ)].   (2)

To compute the limit of the above expression as M → ∞, we analyze the four fractions separately. Clearly, ((1−r)²M)/((1−r)²M−γ) → 1 as M → ∞, so we restrict attention to the first three fractions. Take the first one:

  (r²M−γ)^{1/2+r²M−γ}/(r²M)^{1/2+r²M} = (1 − γ/(r²M))^{1/2} · (1 − γ/(r²M))^{r²M} · (r²M−γ)^{−γ} = A₁ · A₂ · A₃,

where lim_{M→∞} A₁ = 1 and lim_{M→∞} A₂ = e^{−γ}. Similarly, the second fraction decomposes as B₁ · B₂ · B₃, where lim_{M→∞} B₁ = 1, lim_{M→∞} B₂ = e^{2γ}, and B₃ = ((r−r²)M+γ)^{2γ}. The third fraction decomposes as C₁ · C₂ · C₃, where lim_{M→∞} C₁ = 1, lim_{M→∞} C₂ = e^{−γ}, and C₃ = ((1−r)²M−γ)^{−γ}. Thus, since e^{−γ} e^{2γ} e^{−γ} = 1, the limit of expression (2) as M → ∞ reduces to

  lim_{M→∞} ((r−r²)M+γ)^{2γ} / [(r²M−γ)^{γ} ((1−r)²M−γ)^{γ}] = 1,

which delivers the desired result.
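A numeric check of Step 1 is reassuring; the sketch below uses the illustrative values r = 1/2 and M = 400 (chosen so that rM, r²M, (r−r²)M, and (1−r)²M are integers) and lgamma to keep the factorials from overflowing:

```python
import math

def log_h0(r: float, M: int) -> float:
    lg = lambda x: math.lgamma(x + 1)               # log(x!)
    return (2 * lg(r * M) + 2 * lg((1 - r) * M) + math.log(1 - r)
            - lg(M) - lg(r * r * M) - 2 * lg((r - r * r) * M)
            - lg((1 - r) ** 2 * M))

r, M = 0.5, 400
print(M * math.exp(log_h0(r, M)))                   # ~ 15.9 at this finite M
print(math.sqrt(M) / (r * math.sqrt(2 * math.pi)))  # ~ 15.96, the Stirling limit
```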
Proof of Proposition 5. First, I know that at most M − α − 1 people were infected after period t̄. The new limit vector y* ∈ R^{M−α−1} must be computed using the matrix C̄_{(α+1)⌋}. If we define z := (ȳ̄_1^M, …, ȳ̄_{M−α−1}^M), it is easy to see that y* = z/‖z‖.³⁹

Second, consider the following scenario. Suppose that, late in Phase III, an infected player happened to believe that exactly two people were infected in each community, and that he then played the Nash action for multiple periods while observing only g. In each period, he infected a new person, and the contagion continued to spread exponentially. If the number of periods in which the player has been infecting people is large enough, Nash reversion is his best reply. This number of periods depends only on the specific game G and on the population size M and, thus, we denote it by φ_G(M). Since the contagion spreads exponentially, for fixed G, φ_G(M) is a logarithmic function of M. Hence, for each r̂ ∈ (0,1), there is M̂ such that, for each M > M̂, r̂M > φ_G(M). Now, given r ∈ (0,1) and m ∈ N, we can find r̂ ∈ (r,1) and M̲ such that Lemma 5 holds. For the rest of the proof, we work with M > max{M̂, M̲}. There are two cases:

α < φ_G(M): In this case, we can repeat the arguments in the proof of Proposition 4 to show that my beliefs x^{t̄+1+α} first-order stochastically dominate y*. Since I observed α periods of good behavior after being infected, I know that at most M − α players were infected at the end of period t̄. Now, since r̂M > φ_G(M), we have that ⌊M − r̂M⌋ < M − φ_G(M) < M − α. Then, we can parallel the arguments in Proposition 4 and use Lemma 5 to find an M_G that ensures that the player has strict incentives to play the Nash action if the population size is at least M_G.

α ≥ φ_G(M): In this case, I played the Nash action for α periods. By the definition of φ_G(M), playing Nash is the unique best reply after observing h^{t̄+1+α}.

Finally, we can just take M_G² = max{M_G, M̂, M̲}.

³⁹One way to see this is to recall that ȳ̄^M = lim_{t→∞} ȳ̄^t C̄_{2⌋}/‖ȳ̄^t C̄_{2⌋}‖ and y* = lim_{t→∞} z^t C̄_{(α+1)⌋}/‖z^t C̄_{(α+1)⌋}‖, and to note that, for each t and each k ∈ {1,…,M−α−1}, z_k^t = ȳ̄_k^t/‖ȳ̄^t‖.

B.5 Incentives of Patient Players

We formally show here that even an infinitely patient player would have the incentive to punish when prescribed to do so. This is important for the discussion in Section A.4. Let m̄ be the maximum possible gain any player can make by a unilateral deviation from any action profile of the stage-game. Let l̲ be the minimum loss that a player suffers in the stage-game when he does not play his best response to his rival's strict Nash action a* (recall that, since a* is a strict Nash equilibrium, l̲ > 0). Suppose that we are in Phase III, and take a player i who knows that the contagion has started. For the analysis in this section, we use the continuation payoffs of the Nash reversion strategy as the benchmark: for any given continuation strategy of player i, let v(M) denote player i's expected undiscounted sum of future gains relative to his payoff from the Nash reversion strategy. It is easy to see that v(M) is finite. Player i knows that the contagion is spreading exponentially and, hence, his payoff will drop to the Nash payoff u(a*) in the long run. In fact, although v(M) increases with M, since the contagion spreads exponentially fast, v(M) grows at a slower rate than M.
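Both φ_G(M) above and the finiteness of v(M) rest on the same fact: an exponentially spreading contagion covers the whole population within roughly log₂ M periods, which grows much more slowly than any linear function r̂M. A toy comparison (with purely illustrative numbers, treating the contagion as at least doubling each period) makes this concrete:

```python
import math

# If the number of infected people at least doubles each period, contagion covers
# the population within about log2(M) periods, while r*M grows linearly in M.
r_hat = 0.5
for M in (10, 100, 1_000, 10_000):
    phi_proxy = math.ceil(math.log2(M))          # stand-in for the logarithmic phi_G(M)
    print(M, phi_proxy, r_hat * M > phi_proxy)   # True already for small M
```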
Similarly, for any continuation strategy and each r ∈ (0,1], let v(r,M) denote the expected undiscounted sum of future gains player i can make from playing this continuation strategy relative to the payoff from Nash reversion, when he is in Phase III and knows that at least rM people are infected in each community. In the result below, we show that v(r,M) is uniformly bounded in M.

Lemma 6. Fix a game G ∈ G. Let r > 1/2. Then, there is Ū such that, for each r̄ > r and each M ∈ N, v(r̄, M) ≤ Ū.

Proof. Let P(r,M) denote the probability of the event "if ⌈rM⌉ people are infected at period t, the number of infected people at period t+1 is at least ⌈rM + ((1−r)/2)M⌉"; that is, P(r,M) is the probability that at least half the uninfected people get infected in the present period, given that rM people are infected already. We claim that P(1/2, M) ≥ 1/2. Suppose that M is even and M/2 people are infected at the start of period t (i.e., r = 1/2). What is the probability of at least M/4 more people getting infected in this period? It is easy to see from the contagion matrix that there is a symmetry in the transition probabilities of the different states. The probability of no one getting infected in this period is the same as that of no one remaining uninfected. The probability of exactly one person getting infected is the same as that of exactly one person remaining uninfected. In general, the probability of k < M/4 players getting infected is the same as that of k players remaining uninfected. This symmetry immediately implies that P(1/2, M) ≥ 1/2. It is easy to see that, for r > 1/2, ⌈rM⌉ > M/2 and, further, P(r,M) > P(1/2,M) ≥ 1/2. Thus, for each r > 1/2 and each M ∈ N, P(r,M) > 1/2. Also, for any fixed r > 1/2, lim_{M→∞} P(r,M) = 1. Intuitively, if more than half the population is already infected, then the larger M is, the more unlikely it is that more than half of the uninfected people remain uninfected. Hence, given r > 1/2, there is p̂ < 1/2 such that, for each M ∈ N, P(r,M) > 1 − p̂.

Given any continuation strategy, we want to bound v(r,M). Note that if a player meets an infected opponent, then not playing the Nash action involves a minimal loss of l̲, and if he meets an uninfected opponent, his maximal gain relative to playing the Nash action is m̄. So, we first bound the probability of meeting an uninfected player in any future period. Suppose that ⌈rM⌉ people are infected at the start of period t, and let s₀ = 1 − r.

Current period t: There are ⌈rM⌉ infected players. So, the probability of meeting an uninfected opponent is (1 − r) = s₀.

Period t+1: With probability at least 1 − p̂, at least half the uninfected players got infected in period t and, with probability at most p̂, less than half of them did. So, the probability of meeting an uninfected opponent in this period is less than (1 − p̂)(s₀/2) + p̂s₀ < s₀/2 + p̂s₀ = s₀(1/2 + p̂).

Period t+2: Similarly, using the upper bound s₀(1/2 + p̂) obtained for period t+1, the probability of meeting an uninfected opponent in period t+2 is less than

  (1 − p̂)[(1 − p̂)(s₀/4) + p̂(s₀/2)] + p̂[(1 − p̂)(s₀/2) + p̂s₀] ≤ (1/4 + 2(p̂/2) + p̂²)s₀ = s₀(1/2 + p̂)².

Period t+τ: The probability of meeting an uninfected opponent in period t+τ is less than s₀(1/2 + p̂)^τ.

Therefore, we have

  v(r,M) < [s₀m̄ − (1 − s₀)l̲] + [s₀(1/2 + p̂)m̄ − (1 − s₀(1/2 + p̂))l̲] + …
  = Σ_{τ=0}^{∞} [s₀(1/2 + p̂)^τ m̄ − (1 − s₀(1/2 + p̂)^τ) l̲] ≤ Σ_{τ=0}^{∞} s₀(1/2 + p̂)^τ m̄ = s₀ m̄ Σ_{τ=0}^{∞} (1/2 + p̂)^τ = Ū.

Convergence of the series follows from the fact that 1/2 + p̂ < 1. Clearly, for r̄ > r, v(r̄,M) ≤ v(r,M) and, hence, v(r̄,M) is uniformly bounded; i.e., for each r̄ ≥ r and each M ∈ N, v(r̄,M) ≤ Ū.⁴⁰

⁴⁰The upper bound obtained here is a very loose one, since the contagion proceeds at a very high rate. In the proof of Lemma 4, we showed that if the state of the contagion is such that the number of infected people is rM, the most likely state in the next period is (2r − r²)M. This implies that the fraction of uninfected people goes down from (1 − r) to (1 − 2r + r²) = (1 − r)². More generally, if we consider the contagion evolving along this path of "most likely" transitions for t consecutive periods, the fraction of uninfected people goes down to (1 − r)^{2^t}; i.e., the contagion spreads at a highly exponential rate. In particular, this rate of convergence is independent of M.
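In line with footnote 40, the geometric decay used in the proof is very conservative: simulating the matching process directly shows the uninfected fraction roughly squaring each period. The sketch below uses the illustrative values M = 50 and r = 0.6:

```python
import random

def uninfected_path(M: int = 50, r: float = 0.6, periods: int = 6) -> list:
    infected1 = [i < int(r * M) for i in range(M)]   # community 1
    infected2 = [i < int(r * M) for i in range(M)]   # community 2
    path = []
    for _ in range(periods):
        match = list(range(M))
        random.shuffle(match)                        # this period's random matching
        for i, j in enumerate(match):                # player i meets player match[i]
            if infected1[i] or infected2[j]:
                infected1[i] = infected2[j] = True   # contagion spreads within the pair
        path.append(1 - sum(infected1) / M)          # fraction of community 1 uninfected
    return path

random.seed(1)
print(uninfected_path())   # roughly (1-r), (1-r)^2, (1-r)^4, ..., then 0
```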
Proposition 7. Fix a game G ∈ G. Then, there exist r̲ ∈ (0,1) and M̲ ∈ N such that, for each r ≥ r̲ and each M ≥ M̲, a player who gets infected very late will not slow down the contagion even if he is infinitely patient (δ = 1).

Proof. Take r > 1/2. By Lemma 4, if M is big enough, a player who gets infected late in Phase III believes that "with probability at least 1 − 1/M², at least rM people in each community are infected." Consider a continuation strategy where he deviates and does not play the Nash action.

i) With probability 1 − 1/M², at least rM people are infected. So, with probability at least r, he meets an infected player, makes a loss of at least l̲ by not playing Nash, and does not slow down the contagion. With probability 1 − r, he gains at most m̄ in the current period and v(r,M) in the future.

ii) With probability 1/M², fewer than rM people are infected, and the player's gain is at most m̄ in the current period and at most v(M) in the future.

Hence, by Lemma 6, the gain from not playing the Nash action instead of playing it is bounded above by

  (m̄ + v(M))/M² + (1 − 1/M²)[−r l̲ + (1−r)(m̄ + v(r,M))] < (m̄ + v(M))/M² + (1 − 1/M²)[−r l̲ + (1−r)(m̄ + Ū)].

The inequality follows from the facts that v(M) is finite and increases more slowly than M, and that Ū is a uniform bound for v(r,M) for any r > 1/2. If M is large enough and r is close to 1, the expression is negative. So, there is no incentive to slow down the contagion.

B.6 Uncertainty about Timing

In this section, we investigate what happens if players are not sure about the timing of the different phases. We conjecture that a modification of our strategies is robust to the introduction of small uncertainty about timing. To provide some intuition for this conjecture, we consider a setting in which there is uncertainty about the timing of the phases. More precisely, the players do not know exactly which equilibrium they are supposed to play, among all those that can be defined via pairs (Ṫ, T̈). For the sake of exposition, we restrict attention to the product-choice game and try to sustain a payoff close to the efficient outcome (1,1). Given the product-choice game and community size M, we choose a pair Ṫ and T̈ that induces an equilibrium. At the start of the game, each player receives an independent, noisy but informative signal about the timing of the trust-building phases (the values of Ṫ and T̈). Each player receives a signal ω_i = (ḋ_i, ∆̇_i, d̈_i, ∆̈_i), which is interpreted as follows: player i, on receiving signal ω_i, can bound the values of Ṫ and T̈ with two intervals; i.e., she knows that Ṫ ∈ [ḋ_i, ḋ_i + ∆̇_i] and T̈ ∈ [d̈_i, d̈_i + ∆̈_i]. The signal-generation process is described below.
The signal generation process she knows that Ṫ ∈ [d˙i , d˙i + ∆ is described below. The idea is that players are aware that there are two trust-building phases followed by the target payoff phase, but are uncertain about the precise timing of the phases. Moreover, signals are informative in that the two intervals are non-overlapping and larger intervals (imprecise estimates) are less likely than smaller ones. 60 ˙ i is drawn from a Poisson distribution with parameter γ̇, and then ḋi is drawn from i) ∆ ˙ i , Ṫ − ∆ ˙ i + 1, . . . , Ṫ ]. If either 1 or T̈ the discrete uniform distribution over [Ṫ − ∆ ˙ i ], then ∆ ˙ i and ḋi are drawn again. lie in the resulting interval [d˙i , d˙i + ∆ ˙ i and ḋi are drawn as above, ∆ ¨ i is drawn from a Poisson distribution with ii) After ∆ parameter γ̈. Finally, d̈i is drawn from the discrete uniform distribution over [T̈ − ˙ i , T̈ − ∆ ˙ i + 1, . . . , T̈ ]. If the resulting interval, [d¨i , d¨i + ∆ ¨ i ] overlaps with the first ∆ ˙ i ], (i.e.,d˙i + ∆ ˙ i ∈ [d¨i , d¨i + ∆ ¨ i ]), then d¨i is redrawn. interval [d˙i , d˙i + ∆ In this setting, players are always uncertain about the start of the trust-building phases and precise coordination is impossible. However, we conjecture that with a modification to our strategies, sufficiently patient players will be able to attain payoffs arbitrarily close to (1, 1), if the uncertainty about timing is very small. ˙ i , d̈i , ∆ ¨ i ). DurEquilibrium play: Phase I: Consider any player i with signal ωi = (ḋi , ∆ ˙ i periods, he plays the cooperative action (QH or BH ). Phase II: ing the first d˙i + ∆ ˙ i ) periods, he plays as if he were in Phase II, i.e., a seller During the next d¨i − (d˙i + ∆ plays QL and a buyer BH . Phase III: For the rest of the game (i.e., from period d¨i on), he plays the efficient action (QH or BH ). Off-Equilibrium play: A player can be in one of two moods: uninfected or infected, with the latter mood being irreversible. We define the moods a little differently. At the beginning of the game, all players are uninfected. Any action (observed or played) that is not consistent with play that can arise on-path, given the signal structure, is called a deviation. We classify deviations into two types. Deviations that definitely entail a short-run loss for the deviating player are called non-triggering deviations (e.g., a buyer deviating in the first period of the game). Any other deviation is called a triggering deviation (i.e., these are deviations that with positive probability give the deviating player a short-run gain). A player who is aware of a triggering deviation is said to be infected. Below, we specify off-path behavior. We do not completely specify play after all histories, but the description below will suffice to provide the intuition behind the conjecture. An uninfected player continues to play as if on-path. An infected player acts as follows. ˙ i : A buyer i who gets infected before period d˙i • Deviations observed before d˙i +∆ ˙ i when switches to her Nash action forever at some period between d˙i and d˙i + ∆ 61 she believes that enough buyers are infected and have also switched. Note that ˙ i , since any action observed buyers cannot get infected between d˙i and d˙i + ∆ during this period is consistent with equilibrium play (i.e., a seller j playing QL ˙ i ] may have received a signal such that d˙j + ∆ ˙ j = t). 
In this setting, players are always uncertain about the start of the trust-building phases, and precise coordination is impossible. However, we conjecture that, with a modification of our strategies, sufficiently patient players can attain payoffs arbitrarily close to (1,1) if the uncertainty about timing is very small.

Equilibrium play: Consider any player i with signal ω_i = (ḋ_i, ∆̇_i, d̈_i, ∆̈_i). Phase I: During the first ḋ_i + ∆̇_i periods, he plays the cooperative action (Q_H or B_H). Phase II: During the next d̈_i − (ḋ_i + ∆̇_i) periods, he plays as if he were in Phase II; i.e., a seller plays Q_L and a buyer B_H. Phase III: For the rest of the game (i.e., from period d̈_i on), he plays the efficient action (Q_H or B_H).

Off-equilibrium play: A player can be in one of two moods, uninfected or infected, with the latter mood being irreversible. We define the moods a little differently here. At the beginning of the game, all players are uninfected. Any action (observed or played) that is not consistent with play that can arise on-path, given the signal structure, is called a deviation. We classify deviations into two types. Deviations that definitely entail a short-run loss for the deviating player are called non-triggering deviations (e.g., a buyer deviating in the first period of the game). Any other deviation is called a triggering deviation (i.e., these are deviations that, with positive probability, give the deviating player a short-run gain). A player who is aware of a triggering deviation is said to be infected. Below, we specify off-path behavior. We do not completely specify play after all histories, but the description below suffices to convey the intuition behind the conjecture. An uninfected player continues to play as if on-path. An infected player acts as follows.

• Deviations observed before ḋ_i + ∆̇_i: A buyer i who gets infected before period ḋ_i switches to her Nash action forever at some period between ḋ_i and ḋ_i + ∆̇_i, when she believes that enough buyers are infected and have also switched. Note that buyers cannot get infected between ḋ_i and ḋ_i + ∆̇_i, since any action observed during this period is consistent with equilibrium play (i.e., a seller j playing Q_L at time t ∈ [ḋ_i, ḋ_i + ∆̇_i] may have received a signal such that ḋ_j + ∆̇_j = t). A seller i who faces B_L before period ḋ_i ignores it (this is a non-triggering deviation, as the buyer must still be in Phase I, which means that the deviation entails a short-term loss for her). If a seller observes B_L between periods ḋ_i and ḋ_i + ∆̇_i, he switches to Nash immediately.

• Deviations observed between ḋ_i + ∆̇_i + 1 and d̈_i: A player who gets infected in the interval [ḋ_i + ∆̇_i + 1, d̈_i] switches to Nash forever from period d̈_i on. Note that buyers who observe Q_H ignore such deviations, as they are non-triggering.

• Deviations observed after d̈_i: A player who gets infected after d̈_i switches to the Nash action immediately and forever.

We argue below why these strategies can constitute an equilibrium, by analyzing some important histories.

Incentives of players on-path: If triggering deviations are detected with certainty and punished by Nash reversion then, for sufficiently patient players, the short-run gain from a deviation will be less than the long-term loss in payoff from starting the contagion. So, we need to check that all deviations are detected (though possibly with probability less than one), and that the resulting punishment is enough to deter the deviation.

• Seller i deviates (plays Q_L) at t = 1: With probability 1, his opponent will detect the deviation, and ultimately his payoffs will drop to a very low level. A sufficiently patient player will, therefore, not deviate.

• Seller i deviates at 2 ≤ t < ḋ_i + ∆̇_i: With positive probability, his opponent j has ḋ_j > t, and will detect the deviation and punish him. But, because of the uncertainty about the values of Ṫ and T̈, with positive probability the deviation goes undetected and unpunished. The probability of detection depends on the time of the deviation (detection is more likely earlier than later because, early on, most players are outside their first interval). So, the deviation gives the seller a small current gain with probability 1, but a large future loss (from punishment) with probability less than 1. If the uncertainty about Ṫ and T̈ is small enough (i.e., signals are very precise), then the probability of detection (and future loss) will be high. For a sufficiently patient player, the current gain will then be outweighed by the expected future loss.

• Seller i deviates (plays Q_L) at t ≥ d̈_i: With positive probability, his opponent j has signal d̈_j = d̈_i, and will detect the deviation.

• All deviations by buyers (playing B_L) are detected, since B_L is never consistent with equilibrium play. If a buyer plays a triggering deviation B_L, she knows that, with probability 1, her opponent will start punishing immediately. The buyer's incentives in this case are exactly as in the setting without uncertainty. For appropriately chosen Ṫ and T̈, buyers will not deviate on-path.

Optimality of Nash reversion off-path: Because players are uncertain about the true values of Ṫ and T̈, there are periods in which they cannot distinguish between equilibrium play and deviations. We need to consider histories in which a player can observe a triggering deviation, and check that it is optimal for him to start punishments. As before, we assume that players, on observing a deviation, believe that some seller deviated in the first period. First, consider the incentives of a seller i. We argue that a seller who deviates at t = 1 will find it optimal to continue deviating.
Further, a seller who gets infected by a triggering deviation at any other period will find it optimal to revert immediately to the Nash action.

• Suppose that seller i deviates at t = 1 and plays Q_L. He knows that his opponent will switch to the Nash action, at the latest, at the end of her first interval (close to the true Ṫ with high probability), and that the contagion will spread exponentially from some period close to the true Ṫ + T̈. Thus, if seller i is sufficiently patient, his continuation payoff will drop to a very low level after Ṫ + T̈, regardless of his play in his Phase I (until period ḋ_i + ∆̇_i). Therefore, for a given M, if Ṫ is large enough (and so ḋ_i + ∆̇_i is large), the optimal continuation strategy for seller i is to continue playing Q_L.

• Seller i observes a triggering deviation B_L: If a seller observes a triggering deviation B_L by a buyer (in Phase II), he thinks that the first deviation occurred at period 1 and that, by now, all buyers are infected. Since his play will have a negligible effect on the contagion process, it is optimal for him to play Q_L.

Now, consider the incentives of a buyer.

• Buyer i observes Q_L at 1 ≤ t < ḋ_i: This must be a triggering deviation: a seller j should switch to Q_L only at the end of his first interval (ḋ_j + ∆̇_j), and this cannot be the case here because, then, the true Ṫ would not lie in player i's first interval. On observing this triggering deviation, the buyer believes that the first deviation occurred at t = 1 and that the contagion has been spreading since then. Consequently, she will switch to her Nash action forever at some period between ḋ_i and ḋ_i + ∆̇_i, when she begins to believe that enough other buyers are infected and have switched as well (it is easily seen that, at worst, buyer i will switch at period ḋ_i + ∆̇_i).

• Buyer i observes Q_L at t ≥ d̈_i + ∆̈_i: Since i is at the end of her second interval, she knows that every rival has started his second interval and should be playing Q_H. So, this is a triggering deviation. She believes that the first deviation occurred at t = 1, and so most players must be infected by now. This makes Nash reversion optimal.

Note that in any other period, buyers cannot distinguish a deviation from equilibrium play:

i) Any action observed by buyer i in her first interval (i.e., at t such that ḋ_i ≤ t < ḋ_i + ∆̇_i) is consistent with equilibrium play. A seller j playing Q_H could have got a signal with ḋ_j > t, and a seller playing Q_L could have got a signal with ḋ_j + ∆̇_j ≤ t.

ii) Any action observed by buyer i between her two intervals (i.e., at t such that ḋ_i + ∆̇_i ≤ t < d̈_i) is consistent with equilibrium play. Q_L is consistent with a seller j who got ḋ_j + ∆̇_j ≤ t, and Q_H is consistent with a seller with a signal such that t < ḋ_j + ∆̇_j.

iii) Any action observed by buyer i within her second interval (i.e., at t such that d̈_i ≤ t < d̈_i + ∆̈_i) is consistent with equilibrium play. Q_L is consistent with a seller j who got d̈_j > t (say, d̈_j = d̈_i + ∆̈_i), and Q_H is consistent with a seller with a signal such that d̈_j ≤ t (say, j got the same signal as buyer i).

B.7 Interchangeable Populations

We assumed that each player belongs either to Community 1 or to Community 2. Alternatively, we could have assumed that there is one population whose members are matched in pairs every period and, in each match, the roles of the players are randomly assigned: at the start of every period, each player has an equal probability of playing as player 1 or as player 2 in the stage-game.
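The matching technology of this alternative setting is easy to describe in code; the sketch below (with illustrative sizes) pairs the single population and assigns roles with a fair coin, as in the text:

```python
import random

# One population of 2M players: match them in pairs each period and assign the
# seller/buyer roles within each pair by an independent fair coin flip.
def match_with_roles(n_players: int):
    players = list(range(n_players))
    random.shuffle(players)
    pairs = [(players[2 * k], players[2 * k + 1]) for k in range(n_players // 2)]
    return [p if random.random() < 0.5 else (p[1], p[0]) for p in pairs]  # (seller, buyer)

random.seed(3)
print(match_with_roles(8))
```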
A first implication of this setting is that a negative result like Proposition 1 may no longer hold.⁴¹ We believe that the trust-building ideas that underlie this paper can be adapted to this new setting.

⁴¹A buyer infected in period 1 might become a seller in period 2, and he might indeed have the right incentives to punish.

Consider the repeated product-choice game when roles are randomly assigned in each period. We conjecture that the following version of our strategies can be used to get as close as desired to the payoff (1,1). There are two phases. Phase I is the trust-building phase: sellers play Q_L and buyers play B_H; the important features of this profile are that i) only buyers have an incentive to deviate, and ii) sellers are playing a Nash action. Phase II is the target-payoff phase, and (Q_H, B_H) is played. Deviations are punished through Nash reversion, with no delay in the punishment. Thus, contagion also takes place in Phase I: whenever an "infected" player is in the role of a buyer, he plays B_L and spreads the contagion, so we no longer have a single player infecting people in this phase. This implies that we do not need a second trust-building phase, since its primary goal was to give the infected buyers the right incentives to "tell" the sellers that there had been a deviation.

The arguments would be very similar to those in the setting with independent populations. After getting infected, a player would believe that a buyer deviated in period one and that punishments have been going on ever since. The key observation is that a player infected early in Phase I knows that contagion will spread regardless of his actions, and so he wants to reap all the short-run gains he can when he is in the role of a buyer (similar to the incentives of the deviant seller in Phase I in our construction). Proving formally that players have the right incentives after all histories is hard, and we cannot rely on the analysis of the independent populations. Interchangeable roles have two main consequences for the analysis. First, the contagion process is not the same, and a slightly different mathematical object is needed to model it. Second, the set of histories a player can observe depends on the roles he played in past periods, so it is harder to characterize all possible histories. We think that this exercise would not yield new insights, and so we leave it as a conjecture.

B.8 Can we get a (Nash Threats) Folk Theorem?

Our strategies do not suffice to get a folk theorem for all games in G. For a game G ∈ G with strict Nash equilibrium a*, the set F_{a*} does not include action profiles in which only one player is playing the Nash action a*_i. For instance, in the product-choice game, our construction cannot achieve payoffs close to (1+g, −l) or (−l, 1−c). However, we believe that the trust-building idea is powerful enough to take us farther. We conjecture that we can obtain a Nash-threats folk theorem for two-player games by modifying our strategies with the addition of further trust-building phases. We do not prove a folk theorem here, but we hope that the informal argument below illustrates how this might be done. To fix ideas, we restrict attention to the product-choice game, although the extension to general games may entail additional difficulties.

Consider a feasible and individually rational target payoff that can be achieved by playing short sequences of (Q_H, B_H) (10 percent of the time) alternating with longer sequences of (Q_H, B_L) (90 percent of the time).
It is not possible to sustain this payoff in Phase III with our strategies. To see why not, consider a long time window in Phase III where the prescribed action profile is (Q_H, B_L). Suppose that a buyer faces Q_L for the first time in a period of this window, followed by many periods of Q_H. Notice that, since the action for a buyer in this time window is B_L, she cannot infect any sellers herself. Then, with more and more observations of Q_H, she will ultimately be convinced that few people are infected. Thus, it may no longer be optimal for her to keep playing Nash. This is in sharp contrast with the original situation, where the target action profile was (Q_H, B_H). In that case, a player who gets infected starts infecting players himself and so, after at most M − 1 periods of infecting opponents, he is convinced that everyone is infected.

What modification of our strategies might enable us to attain these payoffs? We can use additional trust-building phases. Consider a target-payoff phase that involves alternating sequences of (Q_H, B_L) for T₁ periods and (Q_H, B_H) for T₂ = T₁/9 periods. In the modified equilibrium strategies, in Phase III, the windows of (Q_H, B_L) and (Q_H, B_H) are separated by trust-building phases. To illustrate, we start the game as before, with two phases: Ṫ periods of (Q_H, B_H) and T̈ periods of (Q_L, B_H). In Phase III, players play the action profile (Q_H, B_L) for T₁ periods, followed by a new trust-building phase of T′ periods during which (Q_L, B_H) is played. Then, players switch to playing the sequence of (Q_H, B_H) for T₂ periods. The new phase is chosen to be short enough (i.e., T′ ≪ T₁) to have no significant payoff consequences, but long enough that a player who is infected during the T₁-period window, but thinks that very few people are infected, will still want to revert to Nash punishments to make short-term gains during the new phase.⁴² We conjecture that adding appropriate trust-building phases to the target-payoff phase in this way can guarantee that players have the incentive to revert to Nash punishments off-path, for any beliefs they may have about the number of people infected.

⁴²For example, think of a buyer who observes a triggering action for the first time in Phase III (while playing (Q_H, B_L)) and then observes only good behavior for a long time while continuing to play (Q_H, B_L). Even if this buyer is convinced that very few people are infected, she knows that the contagion has begun and that her continuation payoff will ultimately become very low. So, if there is a long enough phase of (Q_L, B_H) play ahead, she will choose to revert to Nash, because this is the myopic best response and would give her at least some short-term gains.

B.9 A Problematic Game outside G

Consider the two-player game in Figure 5:

        L        C        R
  T   −5, −5   −1, 8     5, 5
  M   −5, −5   −2, −2    8, −1
  B   −3, −3   −5, −5   −5, −5

Figure 5: A game outside G.

This is a game with strictly aligned interests: for each (pure) action profile, either it is a Nash equilibrium or both players want to deviate. The difference with respect to other strictly-aligned-interests games, such as the battle of the sexes or the chicken game, is that there is a Pareto-efficient payoff, (5,5), that cannot be achieved as a convex combination of Nash payoffs. Further, since it Pareto dominates the pure Nash equilibrium payoff given by (B, L), it might be possible to achieve it using Nash reversion.
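The claims about Figure 5 can be verified by enumeration. The sketch below finds the pure Nash equilibria and checks that every other profile gives both players a profitable deviation, confirming in particular that no combination of Nash payoffs reaches (5,5):

```python
import numpy as np

A = np.array([[-5, -1,  5],     # row player's payoffs (rows T, M, B; columns L, C, R)
              [-5, -2,  8],
              [-3, -5, -5]])
B = np.array([[-5,  8,  5],     # column player's payoffs
              [-5, -2, -1],
              [-3, -5, -5]])

for i in range(3):
    for j in range(3):
        row_dev = A[:, j].max() > A[i, j]     # row player gains by deviating
        col_dev = B[i, :].max() > B[i, j]     # column player gains by deviating
        if not row_dev and not col_dev:
            print("Nash:", "TMB"[i] + "LCR"[j], (A[i, j], B[i, j]))
        else:
            assert row_dev and col_dev        # every non-Nash profile tempts BOTH players

# Prints: Nash at TC (-1, 8), MR (8, -1), BL (-3, -3). No convex combination of
# these payoffs reaches (5, 5), which Pareto-dominates the pure profile (B, L).
```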
Note that, given a strictly-aligned-interests game and an action profile, if a player plays her best reply against her opponent's action, the resulting profile is a Nash equilibrium. What is special about games like this one that makes it harder to get cooperation? Suppose that we want to get close to the payoff (5,5) in equilibrium in the repeated random matching game. Both players have a strict incentive to deviate from (T, R), the action profile that delivers (5,5). For this game, we cannot rely on the construction we used to prove Theorem 1, since there is no one-sided incentive profile to use in Phase I. The approach of starting play with an initial phase of Nash action profiles does not work well either. This suggests that play in the initial phase of the game should consist of action profiles in which both players have an incentive to deviate.

Suppose that we start the game with a phase in which we aim to achieve the target payoff (5,5), with the threat that any deviation will, at some point, be punished by Nash reversion to (−3,−3). Suppose that a player deviates in period 1. Then, the opponent knows that no one else is infected in her community and that Nash reversion will eventually occur. Hence, both infected players will try to make short-run gains by moving to the profile that gives them 8. As more players become infected, more people play M and C, and the payoff gets closer to (−2,−2). In this situation, it is not clear how the best-replying dynamics will evolve. Further, and even more complicated, it is very hard to provide the players with the right incentives to move to (−3,−3). Note that, as long as no player plays B or L, no one ever gets a payoff below −2, while B and L lead to at most −3. That is, a player will not switch to B unless she thinks that a large number of players in the other community are already playing L; but then, who will be the first one to switch? Can we somehow ensure that players switch to (−3,−3) in a coordinated way in this setting?