Improving the Relationship: A Principal-Agent Model of Progressive Learning and Path Dependence∗

Avidit Acharya† and Juan Ortner‡

May 3, 2016

Preliminary - comments welcome

Abstract

The ratchet effect literature studies principal-agent relationships in which the principal lacks inter-temporal commitment power and the agent possesses persistent private information. A key message of this literature is that in such a setting the possibilities for learning by the principal are typically limited and the principal must content herself with low values, giving up substantial informational rents to the agent. This paper investigates a new source of potential learning for the principal: time-varying productivity shocks. We show that when the incentive environment is subject to these shocks, the principal may be able to increase her value by gradually learning the agent's private information over time. We find conditions under which full learning eventually takes place so that the principal eventually achieves her first best payoffs, as well as conditions under which learning is path dependent in the sense that the history of early shocks determines the principal's long run value. We show how a strategic principal leverages these shocks to both increase her value and potentially achieve efficiency even when commitment is not possible.

JEL Classification Codes: C73, D86

Key words: principal-agent model, adverse selection, ratchet effect, inefficiency, learning, path dependence

∗ For helpful comments, we would like to thank Stephane Wolton and seminar audiences at Boston University, Stanford, Berkeley and the LSE/NYU political economy conference.
† Assistant Professor of Political Science, 616 Serra Street, Stanford University, Stanford CA 94305 (email: avidit@stanford.edu).
‡ Assistant Professor of Economics, Boston University, 270 Bay State Road, Boston MA 02215 (email: jortner@bu.edu).
1 Introduction

An early literature on the ratchet effect studies long-run principal-agent relationships, finding that the value of the relationship to the principal suffers when the agent has persistent private information and the principal lacks long-term commitment power. The lack of commitment power, in particular, hinders the principal's ability to incentivize information disclosure, and as a result, the principal has to give up substantial informational rents to the agent. One key message of this literature is that without commitment power, the principal's ability to improve the relationship by learning the agent's private information is severely limited.

In this paper, we investigate a new source of learning for the principal that mitigates the ratchet effect: time-varying productivity shocks. Such shocks are a natural feature of most dynamic relationships, and our model departs from previous work (e.g., Hart and Tirole (1988) and Schmidt (1993)) only with respect to these shocks. In particular, we maintain the assumption that the principal lacks commitment power, and the agent's private information is persistent. Although these productivity shocks further complicate the environment, we show that they provide the principal with valuable opportunities to learn the agent's private information over time. We are interested in how the value of the relationship for the principal evolves in the presence of these shocks.1

In each period of our model, the principal offers the agent a transfer in exchange for taking an action that benefits the principal. The principal has short-term, but not long-term, commitment power: she can credibly promise to pay a transfer in the current period if the agent takes the action, but she cannot commit to any future transfers. The principal is able to observe the agent's decision, but the agent's cost of taking the action is his private information and is constant over time.
In each period, the realization of a productivity shock affects the size of the benefit that the principal obtains from having the agent take the action. The current level of productivity is publicly observed by both the principal and the agent at the start of the period, and productivity evolves over time as a Markov process.

The basic structure of our model is motivated by the focus of the ratchet effect literature and facilitates a direct comparison of our results to existing results by Hart and Tirole (1988), Schmidt (1993), Gerardi and Maestri (2015), and others.2 These authors consider stationary models in which the agent has persistent private information, and show that the principal's inability to commit to long term contracts severely limits what she can learn about the agent.3 Hart and Tirole (1988) and Schmidt (1993) establish their results by studying games with long but finite time horizons, while Gerardi and Maestri (2015) assume that the time horizon is infinite. We take the latter approach, and focus on pure strategy Markovian equilibria that are optimal for the principal.

1 Other papers examine alternative approaches to mitigating the ratchet effect. For example, Kanemoto and MacLeod (1992) show that piece-rate contracts alleviate the ratchet effect in labor contracting when there is competition for second-hand workers. Carmichael and MacLeod (2000) show that the ratchet effect can be mitigated when the principal and agent play a cooperative equilibrium that is sustained via the threat of future punishment. Fiocco and Strausz (2015) show that the ratchet effect can be mitigated when contracting is strategically delegated to an independent third party. Our paper differs from this work in that we do not introduce external sources of contract enforcement, nor do we reintroduce commitment through the back door by allowing for punishment strategies.
In these equilibria, the shock variable and principal's belief are treated as the relevant state variables. These restrictions keep our model closely related to the existing work in the sense that without the productivity shocks, our equilibrium collapses to the standard ratchet effect equilibrium studied in the literature.4

Our analysis produces three key results that contrast with the main results of the previous work cited above. First, we show that in the presence of productivity shocks, the principal's lack of commitment power may produce lasting inefficiencies. To see why, consider a productivity shock at which it is only efficient for low cost agents to take the action. If the equilibrium outcome were efficient, the principal would learn information about the agent's cost after observing the agent's choice of action. Given the principal's inability to commit, a low cost agent would not be willing to reveal his private information if the benefit that he obtains from pooling with high cost types is sufficiently large. When this happens, the equilibrium must necessarily be inefficient.

Second, and more importantly, we show that the principal might be able to gradually learn the agent's cost over time. In some cases, she may be able to eventually learn the agent's exact cost, while in other situations only some (but not full) learning takes place. The principal's ability to learn arises because the benefit that low cost types obtain by pooling with high cost types changes over time, together with the level of productivity. Specifically, mimicking a high cost type becomes less profitable for a low cost type when productivity is low. In such periods, it becomes cheaper for the principal to extract information from low cost agent types, and thus she may find it optimal to do so.5

2 Other papers in the literature include Freixas et al. (1985), Gibbons (1987), Laffont and Tirole (1988), Dewatripont (1989), and, more recently, Halac (2012) and Malcomson (2015).
Third, and finally, we uncover an interesting feature of the equilibrium: it may be path-dependent. By this we mean that the information that the principal is able to learn about the agent's type along the path of play may depend on the sequence of productivity shocks that was realized early on in the relationship. Consequently, the principal's long-run value from the relationship may also depend crucially on the path of productivity in the early stages. This is true even when the process governing the evolution of productivity is ergodic and our equilibrium concept is Markovian.

To understand the intuition behind our path dependence result, consider a setting with three possible agent types, with costs c1 < c2 < c3. In this setting, the information rents that a c1-agent type gets by mimicking a c2-agent type depend on how often the c2 type is expected to take the productive action in the future. In turn, how often a c2 type takes the action depends on the principal's beliefs. If the principal assigns positive probability to the agent's type being c3, the c2-type has an incentive to mimic the c3-type by not taking the action in periods when productivity is low. This incentive disappears if at some point along the path of play the principal learns that the agent's cost is not c3. As a result, there may be levels of productivity at which it is profitable for the principal to incentivize a c1-type to reveal his type when she assigns positive probability to all three agent types. However, if at some point in the past the principal learned that the agent's cost is not c3, it becomes too expensive for her to incentivize a c1-agent type to reveal his private information. The principal then never fully learns the agent's type.

Our paper is relevant to the set of applications that motivated the original work of Hart and Tirole (1988) on contract renegotiation, particularly repeated buyer-seller relationships. As a model of repeated bargaining with one-sided offers, it is also relevant to the literature on repeated negotiations in which path dependence has been highlighted as an important empirical feature (see, e.g., Kennan, 2001).

The two main assumptions of our model—that inter-temporal contracts are not available, and that the relationship is periodically hit by productivity shocks—make our results especially relevant to a range of applications for which these assumptions are natural. The literature on relational contracting, for example, starts from the premise that not everything can be contracted (see, e.g., Levin, 2003). Shareholders may provide the managers of their firm with short term incentives, but they cannot always make long-term commitments. In addition, as the firm's investment opportunities change, its productivity will also vary over time.

Similarly, in many political economy applications, it is natural to assume that key actors like governments and international organizations cannot always make credible long-term commitments. Dixit (2000), for example, suggests that the IMF's relationship with a client government is a principal-agent relationship. He writes that "IMF programs are incentive schemes or mechanisms" and that "viewing them explicitly in this way ... reminds us of the essential common elements of such problems, for example asymmetries of information and observation, credibility of commitment, [etc.]" He further explains how "the outcomes [of IMF programs] are eventually beneficial, for example lower inflation and better access to international financial markets, but taking the required actions can have some economic costs to a country, for example higher unemployment in the short run, and political costs to its government, for example a reduction in subsidies to its favored groups." Our model speaks to this view, especially if the client government has private information about the political costs of reform. Since the benefits to economic reform will, in general, depend on the state of the economy as it moves through different points on the business cycle, our assumption that there are productivity shocks to the environment is also natural in this setting.

3 One paper in which the environment is non-stationary is Blume (1998), in which the agent's type is changing over time. In our model, the agent's private information is fixed throughout the entire game, and the non-stationarity of our environment arises from changes in productivity over time.

4 Specifically, in the case of no shocks, the path of play generated by our equilibrium is, for arbitrarily long histories, identical to the equilibrium path of a corresponding game in which the time horizon is long but finite, as in Hart and Tirole (1988) and Schmidt (1993).

5 The idea that time-varying shocks can ameliorate a player's lack of commitment also appears in Ortner (2016), who studies the problem of a durable goods monopolist with time-varying production costs. The paper shows that, unlike the classic Coase conjecture results in Gul et al. (1986) and Fudenberg et al. (1985), a monopolist with time-varying costs may extract rents from consumers.

2 Model

2.1 Setup

Consider the following repeated interaction between a principal and an agent. Time is discrete and indexed by t = 0, 1, 2, .... Both players are risk-neutral expected utility maximizers and share a common discount factor δ < 1.6 At the start of each period t, a state bt is drawn from a finite set of states B, and is publicly revealed. After observing bt ∈ B, the principal decides how much transfer Tt ≥ 0 to offer the agent in exchange for taking a productive action. The agent then decides whether or not to take the action. We denote the agent's choice by at ∈ {0, 1}, where at = 1 means that the agent takes the action in period t. The action provides the principal a benefit equal to bt.
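As a minimal illustration of the timing just described, the following sketch simulates one period of the stage game. It is our own illustration, not part of the paper's analysis: the parameter values are made up, and the agent's acceptance rule is a myopic placeholder (accept any transfer covering his cost), not the equilibrium behavior characterized in Section 3, where the agent also weighs future informational rents.

```python
import random

def stage_game(b, T, c):
    """One period of the stage game: state b, offered transfer T, agent cost c.

    Placeholder acceptance rule: the agent takes the action iff the transfer
    covers his cost. Equilibrium behavior (Section 3) is forward-looking.
    """
    a = 1 if T >= c else 0
    u = a * (b - T)   # principal's payoff: a_t [b_t - T_t]
    v = a * (T - c)   # agent's payoff:     a_t [T_t - c]
    return a, u, v

# One simulated period: state drawn from B = {bL, bH}, principal offers T = c2.
B = [1.0, 3.0]        # hypothetical benefit levels bL < bH
c1, c2 = 0.5, 2.0     # hypothetical agent costs
b = random.choice(B)
a, u, v = stage_game(b, T=c2, c=c1)
# Total surplus created equals a * (b - c1), split between the two players.
assert abs((u + v) - a * (b - c1)) < 1e-12
```

The identity checked at the end is the accounting behind the model: the transfer only divides the surplus a(b − c) between the players, so inefficiency can arise only through the action choice, not through the transfer.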
We assume that b > 0 for all b ∈ B. The action, however, has a cost c > 0 to the agent. This cost is private information to the agent, and is fixed through time. The set of possible costs is C = {c1, ..., cK}, and the principal has a prior belief µ0 ∈ ∆(C) with full support. At the end of each period, the principal observes the agent's choice and updates her beliefs about the agent's cost. The players receive their payoffs and the game moves to the next period.7

These assumptions imply that the payoffs to the principal and an agent of cost type c = ck at the end of period t are, respectively,

u(bt, Tt, at) = at[bt − Tt]
vk(bt, Tt, at) = at[Tt − ck]

We assume, without loss of generality, that the agent's possible costs are ordered so that 0 < c1 < c2 < ... < cK. To avoid having to deal with knife-edge cases, we further assume that b ≠ ck for all b ∈ B and ck ∈ C. This means that it is socially optimal (efficient) for an agent with cost ck to take the productive action in state b if and only if b − ck > 0. Let the set of states for which it is socially optimal for an agent with cost ck to take the action be Ek := {b ∈ B : b − ck > 0}. We refer to Ek as the efficiency set for type ck. Note that by our assumptions on the ordering of types, the efficiency sets are nested, i.e., Ek′ ⊆ Ek for all k′ ≥ k.

We assume that the evolution of states is governed by a Markov process with transition matrix [Qb,b′]b,b′∈B. We further assume that this process is relatively persistent. To formalize this, first define the following function: for any state b ∈ B and subset of states B′ ⊆ B, let

X(b, B′) := E[ Σ_{t=1}^∞ δ^t 1{bt ∈ B′} | b0 = b ],

where E[·|b0 = b] denotes the expectation operator with respect to the Markov process governing state transitions, given that the period 0 state is b. The term X(b, B′) is the expected discounted amount of time that the process visits a state in B′ in the future, given that the current state is b. The following assumption then captures the idea that discounting is not too high, and that states are relatively persistent.

Assumption 1 (discounting/persistence) X(b, {b}) > 1 for all b ∈ B.

When there are no shocks (i.e., the state is fully persistent so that B is a singleton) the above assumption holds when δ > 1/2, since in that case X(b, {b}) = δ/(1 − δ). In general, for any ergodic process, the assumption holds whenever δ is above a cutoff that is itself greater than 1/2. Conversely, for any δ > 1/2, the assumption holds whenever the process is sufficiently persistent; that is, whenever Prob(bt+1 = b | bt = b) is sufficiently large for all b ∈ B.

6 Our results remain qualitatively unchanged if the principal and agent have different discount factors.

7 As in Hart and Tirole (1988) and Schmidt (1993), the principal in our model can commit to paying the transfer within the current period, but cannot commit to a schedule of transfers in future periods.

2.2 Histories, Strategies and Equilibrium Concept

A history ht = ⟨(b0, T0, a0), ..., (bt−1, Tt−1, at−1)⟩ records the states, transfers and agent's action choices that have been realized from the beginning of the game until the start of period t. For any two histories ht′ and ht with t′ ≥ t, we write ht′ ⪰ ht if the first t period entries of ht′ are the same as the t period entries of ht. As usual, we let Ht denote the set of period t histories and H = ∪_{t≥0} Ht the set of all histories. A pure strategy for the principal is a function τ : H × B → R+, which maps histories and the current state to transfer offers T.
A pure strategy for the agent is a collection of mappings {αk}_{k=1}^K, with αk : H × B × R+ → {0, 1}, each of which maps the current history, current state and current transfer offer to the action choice a ∈ {0, 1} for a particular type ck. Given a pair of strategies σ = (τ, {αk}), the continuation payoffs of the principal and an agent with cost ck after history ht and shock realization bt are denoted U^σ[ht, bt] and V^σ_k[ht, bt] respectively. The principal's belief about the agent's cost after history ht is denoted µ[ht] and is given by a mapping µ : H → ∆(C).

A pure strategy perfect Bayesian equilibrium (PBE) is a pair of strategies σ and posterior beliefs µ for the principal such that the strategies form a Bayesian Nash equilibrium in every continuation game given the posterior beliefs, and beliefs are consistent with Bayes' rule whenever possible. Thus, a pure strategy PBE can be denoted by the pair (σ, µ). We use the term "equilibrium" to refer to pure strategy PBE that satisfy the two conditions below, where we identify τ(ht, ·) and αk(ht, ·, ·) with the continuation strategies of the principal and agent with cost ck, given the occurrence of history ht.

R1. (Markovian condition) For all histories ht and ht′, if µ[ht] = µ[ht′] then τ(ht, ·) = τ(ht′, ·) and αk(ht, ·, ·) = αk(ht′, ·, ·) for all k.

R2. (best for principal) There is no history ht, shock bt ∈ B and pure strategy PBE (σ′, µ′) that also satisfies the Markovian condition, R1, for which

U^{σ′}[ht, bt] > U^σ[ht, bt].

R1 says that the principal's and agent's strategies depend on history only through the principal's current beliefs. R2 says that after every history, the equilibrium yields the highest possible continuation payoff to the principal among all pure strategy PBE that satisfy R1. We impose these restrictions to rule out indirect sources of commitment for the principal.
In particular, R1 rules out equilibria in which the threat of punishment enforces high continuation payoffs for the agent. R2 rules out Markovian equilibria in which off-path beliefs are constructed in ways that make the principal give the agent extra rents beyond his informational rents.8 As we show in Lemma 0 below, the main equilibrium implications of these restrictions are that the highest cost type in the support of the principal's belief has a zero continuation payoff at any history, and local incentive constraints always bind with equality.

3 Equilibrium Analysis

3.1 Incentive Constraints

Fix any equilibrium (σ, µ) = ((τ, {αk}), µ) and, given the equilibrium, let a_{t,k} be a random variable indicating the action that agent type ck takes in period t. We will use C[ht] to denote the support of µ[ht] and k[ht] := max{k : ck ∈ C[ht]} to denote the highest index of types in the support of the principal's beliefs. For any history ht, any pair ci, cj ∈ C[ht], and any b ∈ B, let

V^σ_{i→j}[ht, b] := E_{σj}[ Σ_{t′=t}^∞ δ^{t′−t} a_{t′,j} (T_{t′} − ci) | ht, bt = b ]

be the expected discounted payoff that type ci would obtain after history ht when bt = b from following the equilibrium strategy of type cj. Here, E_{σj}[·|ht, bt = b] denotes the expectation over future events given type ci's deviation to type cj's strategy after history ht when bt = b and when all other types play according to σ. For any ci ∈ C[ht], the continuation value of an agent with cost ci at history ht is simply V^σ_i[ht, b] = V^σ_{i→i}[ht, b]. Then, note that

V^σ_{i→j}[ht, b] = E_{σj}[ Σ_{t′=t}^∞ δ^{t′−t} a_{t′,j} (T_{t′} − cj) | ht, bt = b ] + E_{σj}[ Σ_{t′=t}^∞ δ^{t′−t} a_{t′,j} (cj − ci) | ht, bt = b ]
= V^σ_j[ht, b] + (cj − ci) A^σ_j[ht, b]    (1)

where V^σ_j[ht, b] is type cj's continuation value at history (ht, bt = b), and

A^σ_j[ht, b] := E_{σj}[ Σ_{t′=t}^∞ δ^{t′−t} a_{t′,j} | ht, bt = b ]

is the expected discounted number of times that type cj takes the action after history ht, according to σ.
Equation (1) says that type ci's payoff from deviating to cj's strategy can be decomposed into two parts: type cj's continuation value, and an informational rent (cj − ci)A^σ_j[ht, bt], which depends on how frequently cj is expected to take the action in the future. Incentive compatibility requires that for all histories ht, all shocks bt ∈ B and every pair of types ci, cj ∈ C[ht], V^σ_i[ht, bt] ≥ V^σ_{i→j}[ht, bt], or using (1),

V^σ_i[ht, bt] ≥ V^σ_j[ht, bt] + (cj − ci) A^σ_j[ht, bt]  ∀(ht, bt), ∀ci, cj ∈ C[ht]   (2)

We then have the following fact, which follows from equilibrium conditions R1 and R2. Part (i) says that the highest cost type in the support of the principal's beliefs obtains a zero continuation payoff, while part (ii) says that local incentive constraints bind with equality.

Lemma 0. Fix any equilibrium (σ, µ) and history ht, and if necessary renumber the types so that C[ht] = {c1, c2, ..., c_{k[ht]}} with c1 < c2 < ... < c_{k[ht]}. Then, for all ht ∈ H and all b ∈ B,

(i) V^σ_{k[ht]}[ht, b] = 0.

(ii) If |C[ht]| ≥ 2, then V^σ_i[ht, b] = V^σ_{i+1}[ht, b] + (c_{i+1} − ci) A^σ_{i+1}[ht, b] for all ci, c_{i+1} ∈ C[ht].

Proof. See Appendix A.

8 Markovian equilibria in which the principal offers high transfers to the agent can be constructed by specifying off-path beliefs that "punish" an agent who accepts low transfers. Such beliefs incentivize the agent to reject low transfers, and by doing this they also incentivize the principal to offer high transfers.

3.2 Equilibrium Characterization

We now describe the (essentially) unique equilibrium of the game. Recall that k[ht] denotes the highest index in the support of the principal's equilibrium beliefs at any history ht, and that Ek is the set of states at which it is socially optimal for type ck ∈ C to take the action.

Theorem 1. The set of equilibria is non-empty, and all equilibria are payoff equivalent.
For every history ht and every bt ∈ B, in equilibrium we have:

(i) If bt ∈ E_{k[ht]}, then the principal offers transfer Tt = c_{k[ht]}, and all types in C[ht] take the action.

(ii) If bt ∉ E_{k[ht]} and X(bt, E_{k[ht]}) > 1, then no type in C[ht] takes the action.

(iii) If bt ∉ E_{k[ht]} and X(bt, E_{k[ht]}) ≤ 1, then there is a threshold type ck∗ ∈ C[ht] such that types in C− := {ck ∈ C[ht] : ck < ck∗} that are below the threshold take the action, while types in C+ := {ck ∈ C[ht] : ck ≥ ck∗} that are above the threshold don't take the action. If C− is non-empty, the transfer that is accepted by types in C− is

Tt = cj∗ + V^σ_{k∗}[ht, bt] + (ck∗ − cj∗) A^σ_{k∗}[ht, bt],   (∗)

where cj∗ = max C−.

Proof. See Appendix B.

Part (i) characterizes a situation in which an efficient ratchet effect is in play, replicating the main finding of the ratchet effect literature. In this case, it is socially optimal for all agent types in the principal's support to take the action (and they all do), but the principal must compensate the agent as if he were the highest cost type. For example, Hart and Tirole (1988) and Schmidt (1993) consider a special case of our model in which the benefit from taking the action is constant over time (for all t, the state is bt = b for some constant b) and strictly larger than the highest cost (b > cK). Thus, part (i) of Theorem 1 applies. In each period t, the principal offers a transfer Tt = cK that all agent types accept, and she never learns anything about the agent's type.

Part (ii) characterizes situations where an inefficient ratchet effect is in play: low cost types pool with high cost types and don't take the productive action even if the principal would be willing to fully compensate their costs.
The reason is that if bt ∉ E_{k[ht]}, then the payoff that an agent with cost ck < c_{k[ht]} obtains by pooling with type c_{k[ht]} is

(c_{k[ht]} − ck) A^σ_{k[ht]}[ht, bt] = (c_{k[ht]} − ck) X(bt, E_{k[ht]}),

which follows because an agent with cost c_{k[ht]} takes the action at time t′ ≥ t if and only if bt′ ∈ E_{k[ht]}. Suppose there exists a nonempty subset C̃ ⊂ C[ht]\{c_{k[ht]}} of types that take the action after history ht at a state bt ∉ E_{k[ht]} with X(bt, E_{k[ht]}) > 1. Let c∗ = max C̃. By Lemma 0, an agent with cost c∗ obtains a total payoff of Tt − c∗ + 0 by taking the action at time t.9 Since this payoff must be larger than what this type would get by mimicking an agent with cost c_{k[ht]}, it must be that

Tt ≥ c∗ + (c_{k[ht]} − c∗) X(bt, E_{k[ht]}) > c_{k[ht]},

where the second inequality follows because X(bt, E_{k[ht]}) > 1. But this cannot occur in an equilibrium, since by Lemma 0(i) an agent with type c_{k[ht]} would strictly prefer to accept offer Tt > c_{k[ht]} and take the action. Therefore, for all bt ∉ E_{k[ht]} with X(bt, E_{k[ht]}) > 1, no type takes the action.

Part (iii) characterizes situations where learning may take place. Specifically, learning takes place when the set C− is nonempty. Although the theorem does not tell us under which conditions C− is nonempty, we provide sufficient conditions for learning to occur in Section 4 below. In Appendix B.3 we provide a characterization of the threshold ck∗ as the unique solution to a finite maximization problem. Building on this, we also characterize the principal's equilibrium payoffs as the unique fixed point of a contraction mapping.

9 Indeed, under the proposed equilibrium, the principal infers that the agent's type is in C̃ if she observes that the agent took the action at time t. Hence, by Lemma 0, an agent with cost c∗ obtains a continuation payoff of zero from time t + 1 onwards.

3.3 Examples

We now present two examples that illustrate some of the main equilibrium features of our model.
The first highlights the fact that the equilibrium outcome in our model can be inefficient. This contrasts with the results in Hart and Tirole (1988) and Schmidt (1993), where the equilibrium is always socially optimal.

Example 1 (inefficient ratchet effect) Suppose that there are two states, B = {bL, bH}, with 0 < bL < bH, and two types, C = {c1, c2} (recall our assumption that c2 > c1). Let the efficiency sets be E1 = {bL, bH} and E2 = {bH}, and assume that X(bL, {bH}) > 1. Consider a history ht such that C[ht] = {c1, c2}. Theorem 1(i) implies that, at such a history, both types take the action if bt = bH, receiving a transfer equal to c2. On the other hand, Theorem 1(ii) implies that neither type takes the action if bt = bL. Indeed, when X(bL, {bH}) > 1 the benefit that a c1-agent obtains by pooling with a c2-agent is so large that there does not exist any offer that a c1-agent would accept but a c2-agent would reject. As a result, the principal never learns the agent's type in equilibrium. Inefficiencies arise in all periods t in which bt = bL: an agent with cost c1 never takes the action when the state is bL, even though it is socially optimal for him to do so.

The next example illustrates a situation in which the principal is able to learn the agent's type, and the equilibrium outcome is efficient. This too contrasts with earlier work on the ratchet effect, in which there is no learning by the principal on the path of play.

Example 2 (efficiency and learning) The environment is the same as in Example 1, with the only difference that X(bL, {bH}) < 1. Consider a history ht such that C[ht] = {c1, c2}. As in Example 1, both types take the action in period t if bt = bH, i.e., they take the action in the high productivity state. The difference is that, if bt = bL, the principal offers a transfer Tt that a c2-agent rejects, but a c1-agent accepts.
To see why, note first that by Theorem 1, an agent of type c2 does not take the action at time t if bt = bL. Suppose that type c1 does not take the action when bt = bL either. Since the equilibrium is Markovian, this implies that the principal never learns the agent's type, and her payoff at time t when bt = bL is U = X(bL, {bH})[bH − c2]. If instead the principal were to make an offer that only an agent with cost c1 accepts, then by Theorem 1(iii) and Lemma 0(i) the principal's offer would exactly compensate type c1 for revealing his type, i.e., T − c1 = X(bL, {bH})(c2 − c1). Note that X(bL, {bH}) < 1 implies that T < c2, so an agent with cost c2 rejects this offer.

Conditional on the agent's cost being c1, the principal's payoff from making offer T when bt = bL is

U[c1] = bL − T + X(bL, {bL})[bL − c1] + X(bL, {bH})[bH − c1]
= [1 + X(bL, {bL})][bL − c1] + X(bL, {bH})[bH − c2].

On the other hand, conditional on the agent's type being c2, the agent would reject the transfer and the principal's payoff would be U[c2] = X(bL, {bH})[bH − c2] = U. The principal finds it optimal to make offer T if µ0[c1]U[c1] + µ0[c2]U[c2] > U, where µ0[cj] is the prior probability that the agent's cost is cj. Since U[c2] = U and U[c1] > U, this inequality holds. Therefore, in any Markovian PBE satisfying R2, type c1 takes the action in state bL when C[ht] = {c1, c2}.

Finally, note that the principal learns the agent's type at time t∗ = min{t : bt = bL}, and the equilibrium outcome is efficient from time t∗ + 1 onwards: type ci takes the action at time t′ > t∗ if and only if bt′ ∈ Ei. Moreover, Lemma 0(i) guarantees that the principal extracts all of the surplus from time t∗ + 1 onwards, paying the agent a transfer equal to his cost.

Example 2 has three notable features. First, despite her lack of commitment, the principal is able to learn the agent's type.
Learning takes place the first time the relationship hits the low productivity state. In the next subsection we present conditions under which this result generalizes. Second, the principal's value increases over time, since the surplus she extracts from the agent increases as she learns the agent's type. In Section 4 we characterize general conditions under which full learning eventually takes place. Third, the equilibrium exhibits a form of path-dependence: equilibrium play at time t depends on the entire history of shocks up to period t. Before state bL is reached, the principal pays a transfer equal to the agent's highest cost c2 to get both types to take the action. After state bL is visited, if the principal finds that the agent has low cost, then she pays a transfer equal to the low type's cost. Note, however, that the path dependence in this example is short-lived: after state bL is visited for the first time, the principal learns the agent's type and the equilibrium outcome from that point on is independent of the prior history of shocks. It turns out, however, that this is not a general property of our model. In Section 4 we demonstrate how the equilibrium may also display long-run path dependence.

3.4 Learning in Bad Times

Under natural conditions on the process [Qb,b′] that governs the evolution of the stochastic shock, the principal learns about the agent's type only in "bad times," i.e., only when the benefit bt is small. Recall that µ[ht] denotes the principal's beliefs about the agent's type after history ht. Then, we have:

Proposition 1. (learning in bad times) Suppose that for all ck ∈ C and all b, b′ ∈ B\Ek such that b < b′, X(b, Ek) ≤ X(b′, Ek). Then, in any equilibrium and for every history ht there exists a state b[ht] ∈ B such that µ[ht+1] ≠ µ[ht] only if bt ≤ b[ht].

Proof. By Theorem 1, after history ht the principal learns at state b only if X(b, E_{k[ht]}) ≤ 1.
Since for all types ck, b < b′ implies X(b, Ek) ≤ X(b′, Ek), there exists a state b[ht] ∈ B such that X(b, Ek[ht]) ≤ 1 if and only if b ≤ b[ht].

Proposition 1 provides conditions under which the principal only updates her beliefs about the agent's type at states at which the benefits from taking the productive action are sufficiently small. The reason is that under the premise of the proposition, the informational rent that agents with type ci < ck[ht] get from mimicking an agent with the highest cost ck[ht] is decreasing in the realization of bt. As a result, the principal is only able to learn about the agent's type when bt is small.

4 Long Run Properties

In this section, we study the long run properties of the equilibrium characterized in the previous section. Before stating our results, we introduce some additional notation, and make a preliminary observation. An equilibrium outcome can be written as an infinite sequence h∞ = ⟨bt, Tt, at⟩∞t=0, or equivalently as an infinite sequence of equilibrium histories h∞ = {ht}∞t=0 such that ht+1 ≻ ht for all t. Because we focus on pure strategy Markovian equilibria and because the sets of types and states are finite, for any equilibrium outcome h∞ there exists a time t∗[h∞] such that µ[ht] = µ[ht∗[h∞]] for all ht ≻ ht∗[h∞]. That is, given an equilibrium outcome, learning always stops after some time t∗[h∞]. Therefore, given an agent's type c ∈ C and an equilibrium outcome h∞ that can arise when the agent has type c, in every period after t∗[h∞] the principal's continuation payoff can be written to depend only on the realization of the current period shock. Formally, given any equilibrium outcome h∞ = {ht}∞t=0 that is possible when the agent's true type is c = ck ∈ C, the principal's equilibrium continuation value conditional on the agent's type being c = ck can be written as Uσ(bt | ht∗[h∞], c = ck) for all ht ≻ ht∗[h∞].
This says that after period t∗[h∞] we need only keep track of the current shock and of what beliefs were at the end of period t∗[h∞], since beliefs are constant from this period onwards. We use this fact in the next two subsections to study properties of the principal's long run value.

4.1 The Principal's Long Run Value

We start by studying the extent to which the principal can learn the agent's type, and how the efficiency of the relationship might improve over time. For all b ∈ B and all ck ∈ C, the principal's first best payoffs conditional on the current shock being b and the agent's type being c = ck are given by

\[
U^{*}(b \mid c_k) := \mathbb{E}\left[ \sum_{t'=t}^{\infty} \delta^{t'-t} (b_{t'} - c_k) \mathbf{1}_{\{b_{t'} \in E_k\}} \,\middle|\, b_t = b \right].
\]

Thus, under the first best outcome the agent takes the action whenever it is socially optimal and the principal always compensates the agent his exact cost. We then say that an equilibrium is long run first best if for all ck ∈ C and for every equilibrium outcome h∞ that arises with positive probability when the agent's type is c = ck,

Uσ(bt | ht∗[h∞], c = ck) = U∗(bt | ck)  ∀t ≥ t∗[h∞] and ∀bt ∈ B.

This says that no matter what the agent's true type is, and no matter what the equilibrium outcome is, once learning has stopped the principal achieves her first best payoff at every subsequent realization of the shock. The following proposition reports a sufficient condition for the principal to always eventually achieve her first best payoffs.

Proposition 2. (long run first best) Suppose that [Qb,b′] is ergodic and that for all ck ∈ C\{cK} there exists b ∈ Ek\Ek+1 such that X(b, Ek+1) < 1. Then, the equilibrium is long run first best.

Proof. See Appendix C.

Note that an equilibrium is long run first best if and only if the principal always eventually learns the agent's type, i.e., if and only if for all c ∈ C and every equilibrium outcome h∞ that is possible when the agent's true cost is c, we have µ[ht∗[h∞]](c) = 1.
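Since B is finite and [Qb,b′] is a Markov transition matrix, the first best value defined above can be computed by solving a linear system: the definition implies the fixed-point equation U∗ = r + δQU∗, where r(b) = (b − ck)1{b ∈ Ek}. A minimal numerical sketch (the two-state chain, cost, and discount factor below are hypothetical):

```python
import numpy as np

# Sketch: with B finite, the first-best value U*(.|ck) defined above solves
# the linear system u = r + delta * Q @ u, with r(b) = (b - ck) * 1{b in Ek}.
# The two-state chain, cost ck and discount factor below are hypothetical.
def first_best(Q, b_vals, ck, in_Ek, delta):
    """Solve (I - delta * Q) u = r for the first-best value at each state."""
    r = (b_vals - ck) * in_Ek
    return np.linalg.solve(np.eye(len(b_vals)) - delta * Q, r)

Q = np.array([[0.5, 0.5],        # transition matrix [Q_{b,b'}]
              [0.2, 0.8]])
b_vals = np.array([1.0, 3.0])    # B = {b1, b2}
in_Ek = np.array([0.0, 1.0])     # Ek = {b2}: act only in the high state
u = first_best(Q, b_vals, ck=2.0, in_Ek=in_Ek, delta=0.9)

# The solution satisfies the fixed-point equation term by term...
assert np.allclose(u, (b_vals - 2.0) * in_Ek + 0.9 * Q @ u)
# ...and the first-best value is higher when the current shock lies in Ek.
assert u[1] > u[0] > 0
```

The same solve-against-(I − δQ) device applies to any expected-discounted-occupancy object over a finite chain, which is why the conditions in the propositions of this section can be checked directly from the primitives.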
The proof of Proposition 2 shows that, under the sufficient condition, with probability 1 the principal eventually learns the agent's type. The condition guarantees that, for any history ht such that |C[ht]| ≥ 2, there exists at least one state b ∈ B at which the principal finds it optimal to make an offer that only a strict subset of types accept. Therefore, if the process [Qb,b′] is ergodic, then it is certain that the principal will eventually learn the agent's type, and from that point onwards she will obtain her first best payoffs.

If an equilibrium is long run first best then it is also long run efficient, i.e. for all ck ∈ C and for every equilibrium outcome h∞ that is possible when the agent's true cost is ck, an agent with cost ck takes the action in each period t > t∗[h∞] if and only if bt ∈ Ek. However, the converse of this statement is not true. Because of this, it is easy to find weaker sufficient conditions under which long run efficiency holds. One such condition is that [Qb,b′] is ergodic and for all ck ∈ C\{cK} there exists b ∈ Ek\Ek̄ such that X(b, Ek̄) < 1, where k̄ = min{j ≥ k : Ej ≠ Ek}. This condition guarantees that the principal's beliefs will eventually place unit mass on the set of types that share the same efficiency set with the agent's true type. After this happens, even if the principal does not achieve her first best payoff by further learning the agent's type, the agent takes the action if and only if it is socially optimal to do so.

The next and final result of this section provides a partial counterpart to Proposition 2 by presenting conditions under which the equilibrium is not long run first best, and conditions under which it is not long run efficient.

Proposition 3. (no long run first best; no long run efficiency) Let ht be an equilibrium history such that |C[ht]| ≥ 2 and X(b, Ek[ht]) > 1 for all b ∈ B.
Then C[ht′] = C[ht] (and thus |C[ht′]| ≥ 2) for all histories ht′ ≻ ht, so the equilibrium is not long run first best. If, in addition, there exists ci ∈ C[ht] such that Ei ≠ Ek[ht], then the equilibrium is not long run efficient either.

Proof. Follows from Theorem 1.

4.2 Path Dependence

In the examples of Section 3.3 the principal always learns the same amount of information about the agent's type in the long run. As a result, even if equilibrium play may exhibit path-dependence in the short-run, as in Example 2, the principal's long run value from the relationship, conditional on the agent's type, is independent of the history of play. In this section we show that this is not a general property of the equilibrium of our model. We show here that the learning process, and hence the principal's value from the relationship, may exhibit path dependence even in the long run. We say that an equilibrium exhibits long run path dependence if for some type of the agent c = ck ∈ C there are two equilibrium outcomes, h∞ and h̃∞, that arise with positive probability when the agent's type is c = ck, such that

Uσ(· | ht∗[h∞], c = ck) ≠ Uσ(· | ht∗[h̃∞], c = ck).

As we have emphasized repeatedly, the equilibrium may exhibit long run path dependence even when the process [Qb,b′] governing the evolution of shocks is ergodic. In fact, our next example illustrates how easily long run path dependence can arise when the productivity shock process [Qb,b′] is not ergodic.

Example 3 Let C = {c1, c2}, and B = {bL, bM, bH}, with bL < bM < bH. Suppose that E1 = {bL, bM, bH} and E2 = {bM, bH}. Suppose the process [Qb,b′] satisfies: (i) X(bL, E2) < 1, and (ii) QbH,bH = 1 and Qb,b′ ∈ (0, 1) for all (b, b′) ≠ (bH, bH) (recall that Qb,b′ denotes the probability of transitioning to state b′ from state b). Thus, state bH is absorbing.
By Theorem 1, if bt = bH, then from period t onwards the principal makes an offer equal to ck[ht] and all agent types in C[ht] accept. Consider a history ht with C[ht] = {c1, c2}. By Theorem 1, if bt = bM the principal makes an offer Tt = c2 that both types of agents accept. If bt = bL, by arguments similar to those in Example 2, the principal finds it optimal to make an offer Tt = c1 + X(bL, E2)(c2 − c1) ∈ (c1, c2) that an agent with cost c1 accepts and that an agent with cost c2 rejects. Therefore, the principal learns the agent's type.

Suppose that the agent's true type is c = c1, and consider the following two histories, ht and h̃t:

ht = ⟨(bt′ = bM, Tt′ = c2, at′ = 1) for t′ = 1, ..., t − 1⟩,
h̃t = ⟨(bt′ = bM, Tt′ = c2, at′ = 1) for t′ = 1, ..., t − 2; (bt−1 = bL, Tt−1 = T̃, at−1 = 1)⟩.

Under history ht, bt′ = bM for all t′ ≤ t − 1, so the principal's beliefs after ht is realized are equal to her prior. Under history h̃t the principal learns that the agent's type is c1 at time t − 1. Suppose that bt = bH, so that bt′ = bH for all t′ ≥ t. Under history ht, the principal doesn't know the agent's type at t, and therefore offers a transfer Tt′ = c2 for all t′ ≥ t, which both agent types accept. Instead, under history h̃t the principal knows that the agent's type is c1, and therefore offers transfer Tt′ = c1 for all t′ ≥ t, and the agent accepts it. Therefore, when the agent's type is c1, the principal's continuation payoff at history (ht, bt = bH) is (bH − c2)/(1 − δ), while her payoff at history (h̃t, bt = bH) is (bH − c1)/(1 − δ).

We now establish that equilibrium may exhibit long-run path dependence even when [Qb,b′] is ergodic. Let B = {b1, b2, b3, b4}, with b1 < b2 < b3 < b4 and C = {c1, c2, c3}, and assume that the efficiency sets are E1 = E2 = {b2, b3, b4} and E3 = {b4}.
Thus, in the most productive state, it is socially optimal for all types to take the productive action; in the next two most productive states, it is socially optimal for only the two lowest cost types to take the productive action; and in the least productive state it is not socially optimal for any type to take the productive action. The following proposition shows that equilibrium may have long-run path dependence even when every entry of the transition matrix [Qb,b′] is positive.

Proposition 4. (long run path dependence) Suppose that the agent's cost is c1, all of the entries of [Qb,b′] are positive, X(b3, {b4}) > 1 and X(b2, {b4}) < 1. If |b − c1| is small enough, |b − c2| is large enough, Qb,b1 is small enough for all b ≠ b1 and Qb,b2 is small enough for all b ≠ b2, then the equilibrium has long-run path dependence. In particular, there exist two equilibrium outcomes, h∞ and h̃∞, such that C[ht∗[h∞]] = {c1} ≠ {c1, c2} = C[ht∗[h̃∞]] and thus

(†) Uσ(· | ht∗[h∞], c = c1) ≠ Uσ(· | h̃t∗[h̃∞], c = c1).

Proof. See Appendix D.

Proposition 4 shows that, even when the process [Qb,b′] is ergodic, the information that the principal learns about the agent's type in the long run might be influenced by the history of productivity shocks early on in the relationship. In particular, when the agent's true cost is c1, under some sequences of productivity shocks the principal eventually ends up learning the agent's exact cost and achieving her first best payoff. Under other sequences of shocks the principal only learns that the agent's cost is in the set {c1, c2}, after which learning stops. In this case, the principal never achieves her first best payoff, and in the long run she must pay a transfer equal to c2 whenever she incentivizes the agent to take the action; that is, she may be giving up quite substantial informational rents even in the long run. Therefore, the early shocks have a lasting effect on the principal's equilibrium value.
The intuition behind Proposition 4 is as follows. The information rent that a c1-agent gets by mimicking a c2-agent depends on how often the c2-agent is expected to take the productive action in the future (see equation (1)). In turn, how often a c2-agent takes the productive action depends on the principal's beliefs. Indeed, if the principal assigns positive probability to the agent's type being c3, under the assumptions in Proposition 4 a c2-agent will not take the productive action at periods such that bt = b2. In contrast, if the principal learns along the path of play that the agent's type is not c3, from that time onwards a c2-agent will take the action whenever the state is in E2 = {b2, b3, b4}.

As a consequence of this, at histories (ht, bt) with C[ht] = {c1, c2, c3} and bt = b1, it is profitable for the principal to make an offer that only a c1-agent accepts (i.e., an offer that induces a c1-agent to reveal his type). In contrast, at histories (ht, bt) with C[ht] = {c1, c2}, inducing a c1-agent to reveal his private information is too expensive, and the principal is unable to fully learn the agent's type. At the same time, in the proof of Proposition 4 we show that, at histories (ht, bt) with C[ht] = {c1, c2, c3} and bt = b2, the principal finds it optimal to make an offer that only types in {c1, c2} accept. This, together with the arguments above, explains why the equilibrium displays long run path-dependence. Suppose the agent's type is c1. Then, if state b = b1 is visited before state b = b2, the principal will learn the agent's type and from that point onwards she will extract all the surplus. In contrast, if state b = b2 is visited before state b = b1, the principal learns that the agent's type is in {c1, c2}, and then learning stops.
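The race between states b1 and b2 described above can be sketched in a few lines. The helper below is purely illustrative (the function and the state/type labels are ours, not part of the model's formal apparatus): it applies the two learning rules from the intuition to a realized path of shocks and returns the long-run support of the principal's beliefs when the agent's type is c1:

```python
# Illustrative sketch of the race between b1 and b2 described above. Starting
# from support {c1, c2, c3} with a c1-agent: state b1 fully separates c1,
# while state b2 only screens out c3; once the support is {c1, c2}, learning
# stops. (Labels and function are hypothetical, for illustration only.)
def long_run_support(shock_path):
    support = {"c1", "c2", "c3"}
    for b in shock_path:
        if support == {"c1", "c2", "c3"}:
            if b == "b1":
                support = {"c1"}          # c1 reveals his type
            elif b == "b2":
                support = {"c1", "c2"}    # only c3 is screened out
        # with support {c1, c2} (or a singleton), beliefs never move again
    return support

# b1 hit first: the principal learns the agent's type exactly.
assert long_run_support(["b3", "b1", "b2", "b4"]) == {"c1"}
# b2 hit first: learning stops at {c1, c2} even though b1 arrives later.
assert long_run_support(["b3", "b2", "b1", "b4"]) == {"c1", "c2"}
```

The two assertions trace the two equilibrium outcomes h∞ and h̃∞ of Proposition 4: identical sets of states visited, different orders, different long-run beliefs.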
Path dependence has been highlighted to be an important phenomenon in organizational economics, especially in understanding why seemingly identical firms have persistent differences in performance – see Gibbons (2010).10 With informational asymmetries, as in our model, shocks to the environment provide variation in learning about firm productivity over time. The path of learning then has implications for long run performance. In particular, two firms with similar structures may embark upon two very different learning paths, even though their internal institutions for providing incentives might appear to be identical.11

5 Conclusion

Productivity shocks are a natural feature of most economic environments, and the incentives that economic agents face in completely stationary environments can be very different from the incentives they face in environments subject to these shocks. Our results show that this is true of the traditional ratchet effect literature. The message of this literature is that outside institutions that provide contract enforcement can help improve the principal's welfare. Our results show that even without such institutions, a strategic principal can use productivity shocks to her advantage to gradually learn the agent's private information and improve her own welfare.

10 Luria (1996), for example, shows that in the American metal manufacturing industry in the 1990's, some firms were more than three times more productive than others (by labor productivity) even though they sold essentially the same product to the same customers. Similarly, Hallward-Driemeier et al. (2001) show that the top quartile of Indonesian electronics manufacturers were more than eight times as productive as the bottom quartile, even though all firms were supplying similar products on a competitive global market.
11 Other dynamic principal-agent models giving rise to path-dependence include Chassang (2010), Li and Matouschek (2013) and Halac and Prat (2015).
In addition, a relationship that was initially highly inefficient may become efficient over time. On the other hand, whether or not the relationship ever becomes efficient, and how profitable it becomes for the principal, may be path dependent.

Appendix

A. Proof of Lemma 0

Proof of part (i). The proof is by strong induction on the cardinality of the support of the principal's beliefs, C[ht]. Fix an equilibrium (σ, µ), and note that the claim is true for all histories ht such that |C[ht]| = 1. Suppose next that the claim is true for all histories h̃t̃ with |C[h̃t̃]| ≤ n − 1, and consider a history ht with |C[ht]| = n.

Suppose by contradiction that Vσk[ht][ht, bt] > 0. Then, there must exist a state bt′ and history ht′ ≻ ht that arises on the path of play at which type ck[ht] receives a transfer Tt′ > ck[ht] that the type accepts. Note first that, since type ck[ht] accepts offer Tt′, all types in the support of C[ht′] must also accept it. If this were not true, then there would be a highest type ck ∈ C[ht′] that rejects. By the induction hypothesis, the equilibrium payoff that this type obtains at history ht′ is Vσk[ht′, bt′] = 0, since this type will be the highest cost in the support of the principal's beliefs following a rejection. But this cannot be, since type ck can get a payoff of at least Tt′ − ck > 0 by accepting the principal's offer at time t′.

We now construct an alternative strategy profile σ̃ that is otherwise identical to σ except that in state bt′ and history ht′ the agent is offered a transfer T̃ ∈ (ck[ht], Tt′). Specify the principal's beliefs at history (ht′, bt′) as follows: regardless of the agent's action, the principal's beliefs at the end of period t′ are the same as her beliefs at the beginning of the period. At all other histories, the principal's beliefs are the same as in the original equilibrium.
Note that, given these beliefs, at history ht′ all agent types in C[ht′] find it strictly optimal to accept the principal's offer T̃ and take the action. Thus, the principal's payoff at history ht′ is larger than her payoff under the original equilibrium, which contradicts R2.

Proof of part (ii). The proof is by induction on the cardinality of C[ht]. Consider first a history ht such that |C[ht]| = 2. Without loss of generality, let C[ht] = {c1, c2}, with c1 < c2. There are two cases to consider: (i) for all histories ht′ ≻ ht, µ[ht′] = µ[ht], i.e., there is no more learning; and (ii) there exists a history ht′ ≻ ht such that µ[ht′] ≠ µ[ht].

Consider first case (i). Since µ[ht′] = µ[ht] for all ht′ ≻ ht, both types of agents take the productive action at the same times. This implies that Aσ2[ht, bt] = Aσ1[ht, bt]. Moreover, by Lemma 0, the transfer that the principal pays when the productive action is taken is equal to c2. Hence,

\[
V_1^{\sigma}[h_t, b_t] = \mathbb{E}\left[ \sum_{t'=t}^{\infty} \delta^{t'-t} (T_{t'} - c_1) a_{t',1} \,\middle|\, h_t \right] = V_2^{\sigma}[h_t, b_t] + A_2^{\sigma}[h_t, b_t](c_2 - c_1),
\]

where we have used the facts that Vσ2[ht, bt] = 0 and Tt′ = c2 for all t′ such that at′,1 = at′,2 = 1.

Consider next case (ii), and let t̂ = min{t′ ≥ t : at′,1 ≠ at′,2}. Hence, at time t̂ only one type of agent in {c1, c2} takes the action. Note that an agent of type c1 must take the action. To see why, suppose that it is only the agent of type c2 that takes the action. By part (i) of the Lemma, the transfer Tt̂ that the principal pays the agent must be equal to c2. The payoff that an agent with type c1 gets by accepting the offer Tt̂ is bounded below by c2 − c1 > 0. In contrast, by part (i) of the Lemma, an agent of type c1 would obtain a continuation payoff of zero by rejecting this offer. Hence, it must be that only an agent with type c1 takes the action at time t̂.
Note that the total payoff that an agent with type c1 gets from time t̂ onwards must satisfy Vσ1[ht̂, bt̂] = Tt̂ − c1 ≥ Vσ2[ht̂, bt̂] + Aσ2[ht̂, bt̂](c2 − c1), where the inequality follows since an agent of type c1 can get a payoff equal to the right-hand side by mimicking an agent with type c2. Since we focus on stationary PBE that are optimal for the principal, the transfer that the principal offers the agent at time t̂ must be Tt̂ = c1 + Vσ2[ht̂, bt̂] + Aσ2[ht̂, bt̂](c2 − c1), and so

\[
V_1^{\sigma}[h_{\hat t}, b_{\hat t}] = V_2^{\sigma}[h_{\hat t}, b_{\hat t}] + A_2^{\sigma}[h_{\hat t}, b_{\hat t}](c_2 - c_1). \tag{3}
\]

Note next that, for all t′ ∈ {t, ..., t̂ − 1}, at′,1 = at′,2, i.e., both types of agents take the same action, and that Tt′ = c2 whenever at′,1 = at′,2 = 1, i.e., the principal pays a transfer equal to c2 whenever the high cost agent takes the action. Therefore,

\[
V_1^{\sigma}[h_t, b_t] = \mathbb{E}\left[ \sum_{t'=t}^{\hat t - 1} \delta^{t'-t}(T_{t'} - c_1) a_{t',1} + \delta^{\hat t - t} V_1^{\sigma}[h_{\hat t}, b_{\hat t}] \,\middle|\, h_t, b_t \right]
= \mathbb{E}\left[ \sum_{t'=t}^{\hat t - 1} \delta^{t'-t}(c_2 - c_1) a_{t',2} + \delta^{\hat t - t} A_2^{\sigma}[h_{\hat t}, b_{\hat t}](c_2 - c_1) \,\middle|\, h_t, b_t \right]
= V_2^{\sigma}[h_t, b_t] + A_2^{\sigma}[h_t, b_t](c_2 - c_1), \tag{4}
\]

where we have used (3), and the fact that Vσ2[ht, bt] = 0. Therefore, the result of the lemma holds for all ht such that |C[ht]| = 2.

Suppose next that the result holds for all h̃t̃ such that |C[h̃t̃]| ≤ n − 1, and consider a history ht such that |C[ht]| = n. Consider two "adjacent" types ci, ci+1 ∈ C[ht]. We have two possible cases: (i) with probability 1, types ci and ci+1 take the same action at all histories ht′ ≻ ht; (ii) there exists a history ht′ ≻ ht at which types ci and ci+1 take different actions.

Under case (i),

\[
V_i^{\sigma}[h_t, b_t] = \mathbb{E}\left[ \sum_{t'=t}^{\infty} \delta^{t'-t}(T_{t'} - c_i) a_{t',i} \,\middle|\, h_t, b_t \right]
= \mathbb{E}\left[ \sum_{t'=t}^{\infty} \delta^{t'-t}(T_{t'} - c_{i+1}) a_{t',i+1} \,\middle|\, h_t, b_t \right] + \mathbb{E}\left[ \sum_{t'=t}^{\infty} \delta^{t'-t}(c_{i+1} - c_i) a_{t',i+1} \,\middle|\, h_t, b_t \right]
= V_{i+1}^{\sigma}[h_t, b_t] + A_{i+1}^{\sigma}[h_t, b_t](c_{i+1} - c_i). \tag{5}
\]

For case (ii), let t̂ = min{t′ ≥ t : at′,i+1 ≠ at′,i} be the first time after t at which types ci and ci+1 take different actions.
Let ck ∈ C[ht̂] be the highest cost type that takes the action at time t̂. The transfer Tt̂ that the principal offers at time t̂ must satisfy Vσk[ht̂, bt̂] = Tt̂ − ck + 0 = Vσk+1[ht̂, bt̂] + Aσk+1[ht̂, bt̂](ck+1 − ck).12 Note further that Vσk+1[ht̂, bt̂] ≥ Tt̂ − ck+1, since an agent with cost ck+1 can guarantee Tt̂ − ck+1 by taking the action at time t̂ and then not taking the action in all future periods. This, combined with the previous equality, implies that Aσk+1[ht̂, bt̂] ≤ 1.

We now use this to show that all types below ck also take the action at time t̂. This implies that all agents in the support of C[ht̂] with cost weakly lower than ck take the action at t̂, and all agents with cost weakly greater than ck+1 do not take the action. Note that this implies that ci = ck (since types ci and ci+1 take different actions at time t̂). Suppose for the sake of contradiction that this is not true, and let cj be the highest cost type below ck that does not take the action. The payoff that this agent gets from not taking the action is Vσj[ht̂, bt̂] = Vσk+1[ht̂, bt̂] + Aσk+1[ht̂, bt̂](ck+1 − cj), which follows since at time t̂ types cj and ck+1 do not take the action and since, by the induction hypothesis, from time t̂ + 1 onwards the payoff that an agent with cost cj gets is equal to what this agent would get by mimicking an agent with cost ck+1. On the other hand, the payoff that agent cj obtains by taking the action and mimicking type ck is

\[
V_k^{\sigma}[h_{\hat t}, b_{\hat t}] + A_k^{\sigma}[h_{\hat t}, b_{\hat t}](c_k - c_j) = T_{\hat t} - c_k + A_k^{\sigma}[h_{\hat t}, b_{\hat t}](c_k - c_j)
= V_{k+1}^{\sigma}[h_{\hat t}, b_{\hat t}] + A_{k+1}^{\sigma}[h_{\hat t}, b_{\hat t}](c_{k+1} - c_k) + A_k^{\sigma}[h_{\hat t}, b_{\hat t}](c_k - c_j)
> V_{k+1}^{\sigma}[h_{\hat t}, b_{\hat t}] + A_{k+1}^{\sigma}[h_{\hat t}, b_{\hat t}](c_{k+1} - c_j), \tag{6}
\]

which follows since Aσk+1[ht̂, bt̂] ≤ 1 < Aσk[ht̂, bt̂]. Hence, type cj strictly prefers to take the action, a contradiction. Therefore, all types below ck take the action at time t̂, and so ci = ck.
12 The first equality follows since, after time t̂, type ck is the highest type in the support of the principal's beliefs if the agent takes action a = 1 at time t̂.

By the arguments above, the payoff that an agent of type ci = ck obtains at time t̂ is

\[
V_i^{\sigma}[h_{\hat t}, b_{\hat t}] = T_{\hat t} - c_i + 0 = V_{i+1}^{\sigma}[h_{\hat t}, b_{\hat t}] + A_{i+1}^{\sigma}[h_{\hat t}, b_{\hat t}](c_{i+1} - c_i), \tag{7}
\]

since the transfer that the principal offers at time t̂ is Tt̂ = ci + Vσi+1[ht̂, bt̂] + Aσi+1[ht̂, bt̂](ci+1 − ci). Moreover,

\[
V_i^{\sigma}[h_t, b_t] = \mathbb{E}\left[ \sum_{t'=t}^{\hat t - 1} \delta^{t'-t}(T_{t'} - c_i) a_{t',i} + \delta^{\hat t - t} V_i^{\sigma}[h_{\hat t}, b_{\hat t}] \,\middle|\, h_t, b_t \right]
= \mathbb{E}\left[ \sum_{t'=t}^{\hat t - 1} \delta^{t'-t}\big((T_{t'} - c_{i+1}) a_{t',i+1} + (c_{i+1} - c_i) a_{t',i+1}\big) \,\middle|\, h_t, b_t \right] + \mathbb{E}\left[ \delta^{\hat t - t}\big( V_{i+1}^{\sigma}[h_{\hat t}, b_{\hat t}] + A_{i+1}^{\sigma}[h_{\hat t}, b_{\hat t}](c_{i+1} - c_i) \big) \,\middle|\, h_t, b_t \right]
= V_{i+1}^{\sigma}[h_t, b_t] + A_{i+1}^{\sigma}[h_t, b_t](c_{i+1} - c_i), \tag{8}
\]

where the second equality follows since at′,i = at′,i+1 for all t′ ∈ [t, t̂ − 1]. Hence, the result also holds for histories ht with |C[ht]| = n.

B. Proof of Theorem 1

The proof proceeds in three steps. First we analyze the case where bt ∈ Ek[ht], establishing part (i) of the theorem. Then we analyze the case where bt ∉ Ek[ht], establishing (ii) and (iii). Finally, we show that equilibrium exists and has unique payoffs. In doing so, we also characterize the threshold type ck∗ defined in part (iii).

B.1. Proof of part (i) (the case of bt ∈ Ek[ht])

We prove part (i) of the theorem by strong induction on the size of the set C[ht]. If C[ht] is a singleton {ck}, the statement of part (i) holds: by R1-R2, the principal offers the agent a transfer Tt′ = ck at all times t′ ≥ t such that bt′ ∈ Ek and the agent accepts, and she offers some transfer Tt′ < ck at all times t′ ≥ t such that bt′ ∉ Ek, and the agent rejects. Suppose next that the claim is true for all histories ht′ such that |C[ht′]| ≤ n − 1, let ht be a history such that |C[ht]| = n, and let bt ∈ Ek[ht]. In a PBE that satisfies R1-R2, it cannot be that the principal makes an offer that no type in C[ht] accepts.
Indeed, suppose that no type in C[ht] takes the action. Consider an alternative Markovian PBE which is identical to the original PBE, except that when the principal's beliefs are µ[ht] and the shock is bt, the principal makes an offer T = ck[ht], and all agent types in C[ht] accept any offer weakly larger than T. The principal's beliefs after this period are equal to µ[ht] regardless of the agent's action. Note that it is optimal for all types of agents to accept this offer, and it is optimal for the principal to make this offer. Moreover, since bt ∈ Ek[ht], the payoff that the principal obtains from this PBE is strictly larger than her payoff under the original PBE. But this cannot be, since the original PBE satisfies R1-R2. Hence, if bt ∈ Ek[ht], at least a subset of types in C[ht] take the action at time t.

Suppose for the sake of contradiction that the principal makes an offer Tt that only a subset C ⊊ C[ht] of types accept, and let cj = max C. By Lemma 0, the payoff of type cj from taking the productive action is Tt − cj + 0. Since an agent with cost cj can mimic the strategy of type ck[ht], incentive compatibility implies that

\[
T_t - c_j \ge V_{k[h_t]}^{\sigma}[h_t, b_t] + (c_{k[h_t]} - c_j) A_{k[h_t]}^{\sigma}[h_t, b_t]
\ge (c_{k[h_t]} - c_j) X(b_t, E_{k[h_t]})
> c_{k[h_t]} - c_j. \tag{9}
\]

The first inequality follows from equation (2) in the main text. The second inequality follows from Lemma 0 and the fact that Aσk[ht][ht, bt] ≥ X(bt, Ek[ht]). To see why, note that ck[ht] ∉ C,13 so at most n − 1 types accept the principal's offer. Thus, the inductive hypothesis implies that if the agent rejects the offer, then in all periods after t the principal will get all the remaining types to take the action whenever the state is in Ek[ht]. The last inequality in equation (9) follows from the fact that X(bt, Ek[ht]) ≥ X(bt, {bt}) > 1, where the first inequality is due to the fact that bt ∈ Ek[ht] and the second is by Assumption 1.
On the other hand, because Lemma 0 implies that an agent with type ck[ht] has a continuation value of zero, the transfer Tt that the principal offers must be weakly smaller than ck[ht]; otherwise, if Tt > ck[ht], an agent with type ck[ht] could guarantee himself a strictly positive payoff by accepting the offer. But this contradicts (9). Hence, it must be that all agents in C[ht] take action a = 1 at history (ht, bt) if bt ∈ Ek[ht].

13 Suppose, for the sake of contradiction, that type ck[ht] ∈ C. Since by Lemma 0 this type's continuation payoff is zero for all histories, it must be that Tt ≥ ck[ht]. Let ci = max C[ht]\C. Since ci rejects the offer today and becomes the highest cost in the support of the principal's beliefs tomorrow, Lemma 0 implies that Vσi[ht] = 0. But this cannot be, since this agent can guarantee a payoff of at least Tt − ci ≥ ck[ht] − ci > 0 by accepting the offer. Contradiction.

B.2. Proof of parts (ii) & (iii) (the case of bt ∉ Ek[ht])

In both parts (ii) and (iii) of the theorem, the highest cost type in the principal's support ck[ht] does not take the productive action when bt ∉ Ek[ht]. We prove this in Lemma 1 below, and use the lemma to prove parts (ii) and (iii) separately.

Lemma 1. Fix any equilibrium (σ, µ) and history ht. If bt ∉ Ek[ht], then an agent with cost ck[ht] does not take the productive action.

Proof. Suppose for the sake of contradiction that an agent with type ck[ht] does take the action when bt ∉ Ek[ht]. Since, by Lemma 0, this type's payoff must equal zero at all histories, it must be that the offer that is accepted is Tt = ck[ht]. We now show that if the principal makes such an offer, then all agent types will accept the offer and take the productive action. To see this, suppose some types reject the offer. Let cj be the highest cost type that rejects the offer. By Lemma 0, type cj earns a continuation payoff of zero.
So, the payoff that type cj gets by rejecting the offer is zero. However, this type can guarantee itself a payoff of at least Tt − cj = ck[ht] − cj > 0 by accepting the current offer. Hence, it cannot be that some agents reject the offer Tt = ck[ht] when an agent with type ck[ht] accepts the offer.

It then follows that if type ck[ht] accepts the offer, then the principal will not learn anything about the agent's type. Since bt ∉ Ek[ht], her flow payoff from making the offer is bt − ck[ht] < 0. If instead the principal offers a transfer equal to 0 that all agents reject, she would obtain a current payoff of zero and have the same beliefs as in the case where everyone accepts. Therefore, by conditions R1-R2, she would have the same continuation payoff as well. Since bt − ck[ht] < 0, the principal obtains a higher payoff from following the second strategy.

Proof of part (ii). Fix a history ht and let bt ∈ B\Ek[ht] be such that X(bt, Ek[ht]) > 1. By Lemma 1, type ck[ht] doesn't take the productive action at time t if bt ∉ Ek[ht]. Suppose, for the sake of contradiction, that there is a nonempty set of types C ⊊ C[ht] that do take the productive action. Let cj = max C. By Lemma 0, type cj obtains a continuation payoff of zero starting in period t + 1. Hence, type cj receives a payoff Tt − cj + 0 from taking the productive action in period t. Since this payoff must be weakly larger than the payoff the agent would obtain by not taking the action and mimicking the strategy of agent ck[ht] in all future periods, it follows that

\[
T_t - c_j \ge V_{k[h_t]}^{\sigma}[h_t, b_t] + (c_{k[h_t]} - c_j) A_{k[h_t]}^{\sigma}[h_t, b_t]
\ge (c_{k[h_t]} - c_j) X(b_t, E_{k[h_t]})
> c_{k[h_t]} - c_j, \tag{10}
\]

where the first line follows from incentive compatibility, the second line follows from the fact that at′,k[ht] = 1 for all times t′ ≥ t such that bt′ ∈ Ek[ht] (by the result of part (i) proven above), and the third line follows since X(bt, Ek[ht]) > 1 by assumption.
The inequalities in (10) imply that Tt > ck[ht]. But then by Lemma 0, it would be strictly optimal for type ck[ht] to deviate by accepting the transfer and taking the productive action. So it must be that all agent types in C[ht] take action at = 0.

Proof of part (iii). We start by showing that the set of types that accept the offer has the form C− = {ck ∈ C[ht] : ck < ck∗} for some ck∗ ∈ C[ht]. The result is clearly true if no agent type takes the action, in which case set ck∗ = min C[ht]; or if only an agent with type min C[ht] takes the action, in which case set ck∗ equal to the second lowest cost in C[ht]. Therefore, suppose that an agent with type larger than min C[ht] takes the action, and let cj∗ ∈ C[ht] be the highest cost agent that takes the action. Since bt ∉ Ek[ht], by Lemma 1 it must be that cj∗ < ck[ht]. By Lemma 0, type cj∗'s payoff is Tt − cj∗, since from date t + 1 onwards this type will be the largest cost in the support of the principal's beliefs if the principal observes that the agent took the action at time t. Let ck∗ = min{ck ∈ C[ht] : ck > cj∗}, and note that (2) implies that

\[
T_t - c_{j^*} \ge V_{k^*}^{\sigma}[h_t, b_t] + (c_{k^*} - c_{j^*}) A_{k^*}^{\sigma}[h_t, b_t]. \tag{11}
\]

Furthermore, type ck∗ can guarantee himself a payoff of Tt − ck∗ by taking the action once and never taking the action again. Therefore, it must be that

\[
V_{k^*}^{\sigma}[h_t, b_t] \ge T_t - c_{k^*} \ge c_{j^*} - c_{k^*} + V_{k^*}^{\sigma}[h_t, b_t] + (c_{k^*} - c_{j^*}) A_{k^*}^{\sigma}[h_t, b_t] \;\Longrightarrow\; 1 \ge A_{k^*}^{\sigma}[h_t, b_t], \tag{12}
\]

where the second inequality in the first line follows from (11). We now show that all types ci ∈ C[ht] with ci < cj∗ also take the action at time t. Suppose for the sake of contradiction that this is not true, and let ci∗ ∈ C[ht] be the highest cost type lower than cj∗ that does not take the action.
The payoff that this type would get by taking the action at time $t$ and then mimicking type $c_{j^*}$ is
$$V^{\sigma}_{i^* \to j^*}[h_t, b_t] = T_t - c_{i^*} + (c_{j^*} - c_{i^*})\,A^{\sigma}_{j^*}[h_t, b_t] = T_t - c_{i^*} + (c_{j^*} - c_{i^*})\,X(b_t, E_{j^*}) \;\geq\; (c_{j^*} - c_{i^*})\big[1 + X(b_t, E_{j^*})\big] + V^{\sigma}_{k^*}[h_t, b_t] + (c_{k^*} - c_{j^*})\,A^{\sigma}_{k^*}[h_t, b_t], \qquad (13)$$
where the first equality follows from the fact that type $c_{j^*}$ is the highest type in the support of the principal's beliefs in period $t+1$, so he receives a payoff of $0$ from $t+1$ onwards; the second follows from part (i) and Lemma 1, which imply that type $c_{j^*}$ takes the action in periods $t' \geq t+1$ only when $b_{t'} \in E_{j^*}$; and the inequality follows by applying (11). On the other hand, by Lemma 0(ii), the payoff that type $c_{i^*}$ gets by rejecting the offer at time $t$ is equal to the payoff he would get by mimicking type $c_{k^*}$, since the principal will believe for sure that the agent's type is not in $\{c_{i^*+1}, \ldots, c_{j^*}\} \subseteq C[h_t]$ after observing a rejection. That is, type $c_{i^*}$'s payoff is
$$V^{\sigma}_{i^*}[h_t, b_t] = V^{\sigma}_{i^* \to k^*}[h_t, b_t] = V^{\sigma}_{k^*}[h_t, b_t] + (c_{k^*} - c_{i^*})\,A^{\sigma}_{k^*}[h_t, b_t]. \qquad (14)$$
From equations (13) and (14) it follows that
$$V^{\sigma}_{i^*}[h_t, b_t] - V^{\sigma}_{i^* \to j^*}[h_t, b_t] \;\leq\; (c_{j^*} - c_{i^*})\Big(A^{\sigma}_{k^*}[h_t, b_t] - \big[1 + X(b_t, E_{j^*})\big]\Big) \;<\; 0,$$
where the strict inequality follows after using (12). Hence type $c_{i^*}$ strictly prefers to mimic type $c_{j^*}$ and take the action at time $t$ rather than not take it, a contradiction. Hence all types $c_i \in C[h_t]$ with $c_i \leq c_{j^*}$ take the action at $t$, and so the set of types taking the action has the form $C^- = \{c_j \in C[h_t] : c_j < c_{k^*}\}$.

Finally, it is clear that in equilibrium the transfer that the principal pays at time $t$ when all agents with type $c_i \in C^-$ take the action is given by $(\ast)$. The payoff that an agent with type $c_{j^*} = \max C^-$ gets by accepting the offer is $T_t - c_{j^*}$, while his payoff from rejecting the offer and mimicking type $c_{k^*} = \min C[h_t] \setminus C^-$ is $V^{\sigma}_{k^*}[h_t, b_t] + (c_{k^*} - c_{j^*})\,A^{\sigma}_{k^*}[h_t, b_t]$.
Hence, the lowest offer that a $c_{j^*}$-agent accepts is $T_t = c_{j^*} + V^{\sigma}_{k^*}[h_t, b_t] + (c_{k^*} - c_{j^*})\,A^{\sigma}_{k^*}[h_t, b_t]$.

B.3. Proof of Existence and Uniqueness

For each history $h_t$ and each $c_j \in C[h_t]$, let $C^+_j[h_t] = \{c_i \in C[h_t] : c_i \geq c_j\}$. For each history $h_t$ and state $b \in B$, let $(h_t, b_t)$ denote the concatenation of the history $h_t = \langle b_{t'}, T_{t'}, a_{t'} \rangle_{t'=0}^{t-1}$ together with the state realization $b_t$. Let
$$A^{\sigma}_{j+}[h_t, b_t] := \mathbb{E}^{\sigma}_j\left[\sum_{t'=t+1}^{\infty} \delta^{t'-t}\, a_{t',j} \;\Big|\; (h_t, b_t) \text{ and } C[h_{t+1}] = C^+_j[h_t]\right].$$
That is, $A^{\sigma}_{j+}[h_t, b_t]$ is the expected discounted fraction of time at which an agent with type $c_j$ takes the action after history $(h_t, b_t)$ if the principal's beliefs at time $t+1$ have support $C^+_j[h_t]$. We then have:

Lemma 2. Fix any equilibrium $(\sigma, \mu)$ and history-state pair $(h_t, b_t)$. Then there exists an offer $T \geq 0$ such that types $c_i \in C[h_t]$ with $c_i < c_j$ accept at time $t$ and types $c_i \in C[h_t]$ with $c_i \geq c_j$ reject if and only if $A^{\sigma}_{j+}[h_t, b_t] \leq 1$.

Proof. First, suppose such an offer $T$ exists. Let $c_k$ be the highest type in $C[h_t]$ that accepts $T$, so that $c_j$ is the lowest type in $C[h_t]$ that rejects it. By Lemma 0, the expected discounted payoff that an agent with type $c_k$ gets from accepting the offer is $T - c_k$. The payoff that type $c_k$ obtains by rejecting the offer and mimicking type $c_j$ from time $t+1$ onwards is $V^{\sigma}_j[h_t, b_t] + (c_j - c_k)\,A^{\sigma}_{j+}[h_t, b_t]$. Therefore, the offer $T$ that the principal makes must satisfy
$$T - c_k \;\geq\; V^{\sigma}_j[h_t, b_t] + (c_j - c_k)\,A^{\sigma}_j[h_t, b_t] \;=\; V^{\sigma}_j[h_t, b_t] + (c_j - c_k)\,A^{\sigma}_{j+}[h_t, b_t], \qquad (15)$$
where the equality follows because, if the offer is rejected at $t$, the principal's beliefs in period $t+1$ have support $C^+_j[h_t]$, and thus $A^{\sigma}_j[h_t, b_t] = A^{\sigma}_{j+}[h_t, b_t]$.
Note that an agent with type $c_j$ can guarantee himself a payoff of $T - c_j$ by taking the action in period $t$ and then never taking it again; therefore, incentive compatibility implies
$$V^{\sigma}_j[h_t, b_t] \;\geq\; T - c_j \;\geq\; V^{\sigma}_j[h_t, b_t] + (c_j - c_k)\big(A^{\sigma}_{j+}[h_t, b_t] - 1\big) \;\;\Longrightarrow\;\; 1 \;\geq\; A^{\sigma}_{j+}[h_t, b_t],$$
where the second inequality follows after substituting $T$ from (15).

Suppose next that $A^{\sigma}_{j+}[h_t, b_t] \leq 1$, and suppose the principal makes the offer $T = c_k + V^{\sigma}_j[h_t, b_t] + (c_j - c_k)\,A^{\sigma}_{j+}[h_t, b_t]$, which only agents with type $c_\ell \in C[h_t]$, $c_\ell \leq c_k$, are supposed to accept. The payoff that an agent with cost $c_k$ obtains by accepting the offer is $T - c_k$, which is exactly what he would obtain by rejecting the offer and mimicking type $c_j$. Hence type $c_k$ has an incentive to accept such an offer. Similarly, one can check that all types $c_\ell \in C[h_t]$ with $c_\ell < c_k$ also have an incentive to accept the offer. If the agent accepts such an offer and takes the action in period $t$, the principal will believe that the agent's type lies in $\{c_\ell \in C[h_t] : c_\ell \leq c_k\}$. Note that, in all periods $t' > t$, the principal will never offer $T_{t'} > c_k$.

Consider the incentives of an agent with type $c_j$ at time $t$. The payoff that this agent gets from accepting the offer is $T - c_j$, since from $t+1$ onwards the agent will never accept any equilibrium offer: all subsequent offers will be lower than $c_k < c_j$. On the other hand, the agent's payoff from rejecting the offer is
$$V^{\sigma}_j[h_t, b_t] \;\geq\; T - c_j \;=\; V^{\sigma}_j[h_t, b_t] + (c_j - c_k)\big(A^{\sigma}_{j+}[h_t, b_t] - 1\big),$$
where the inequality follows since $A^{\sigma}_{j+}[h_t, b_t] \leq 1$. Hence type $c_j$ (and, similarly, every higher type) has an incentive to reject the offer. $\square$

The proof of existence and uniqueness relies on Lemma 2 and uses strong induction on the cardinality of $C[h_t]$.
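Before turning to the induction, note that occupancy measures such as $X(b, E)$ (the expected discounted time the shock chain spends in a set $E$ starting from $b$), which drive the conditions used below, are pinned down by the transition matrix $[Q_{b,b'}]$ and can be computed by solving one linear system. The sketch below uses a hypothetical four-state chain and discount factor; none of these numbers come from the paper:

```python
import numpy as np

# X(b, E) = E[ sum_{t'>=t} delta^{t'-t} 1{b_{t'} in E} | b_t = b ] satisfies
# x = 1_E + delta * Q x, so x = (I - delta*Q)^{-1} 1_E (hypothetical primitives).
delta = 0.9
Q = np.array([[0.7, 0.1, 0.1, 0.1],   # hypothetical transition matrix over {b1, b2, b3, b4}
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]])

def X(indicator_E):
    """Expected discounted occupancy of E, one entry per starting state b."""
    n = Q.shape[0]
    return np.linalg.solve(np.eye(n) - delta * Q, indicator_E)

x_b4 = X(np.array([0.0, 0.0, 0.0, 1.0]))   # X(b, {b4}) for each starting state b
# Sanity check: occupancies over the whole state space sum to 1/(1 - delta).
assert np.allclose(X(np.ones(4)), 1.0 / (1.0 - delta))
```

Conditions of the form $X(b_t, E_{k[h_t]}) > 1$ or $\leq 1$ can then be checked state by state against the resulting vector.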
Clearly, equilibrium exists and equilibrium payoffs are unique at histories $h_t$ such that $C[h_t]$ is a singleton $\{c_k\}$: in this case, the principal offers the agent a transfer $T_{t'} = c_k$ at all times $t' \geq t$ such that $b_{t'} \in E_k$ (which the agent accepts), and offers some transfer $T_{t'} < c_k$ at all times $t' \geq t$ such that $b_{t'} \notin E_k$. Suppose next that equilibrium exists and equilibrium payoffs are unique at all histories $\tilde{h}_{\tilde{t}}$ such that $|C[\tilde{h}_{\tilde{t}}]| \leq n-1$, and let $h_t$ be a history such that $|C[h_t]| = n$. Fix a candidate equilibrium $(\sigma, \mu)$, and let $U^{\sigma}[b_t, \mu[h_t]]$ denote the principal's equilibrium payoff when her beliefs are $\mu[h_t]$ and the shock is $b_t$. We now show that, when the principal's beliefs are $\mu[h_t]$, equilibrium payoffs are also unique.

If $b_t \in E_{k[h_t]}$, then by part (i) all agent types in $C[h_t]$ take the action in period $t$ and $T_t = c_{k[h_t]}$; hence, at such states,
$$U^{\sigma}[b_t, \mu[h_t]] = b_t - c_{k[h_t]} + \delta\,\mathbb{E}\big[U^{\sigma}[b_{t+1}, \mu[h_t]]\;\big|\;b_t\big].$$
If $b_t \notin E_{k[h_t]}$ and $X(b_t, E_{k[h_t]}) > 1$, then by part (ii) no agent type in $C[h_t]$ takes the action (in this case, the principal makes an offer $T$ small enough that all agents reject); hence, at such states,
$$U^{\sigma}[b_t, \mu[h_t]] = \delta\,\mathbb{E}\big[U^{\sigma}[b_{t+1}, \mu[h_t]]\;\big|\;b_t\big].$$
In either case the principal learns nothing about the agent's type, since all types in $C[h_t]$ take the same action, so her beliefs do not change.

Finally, consider states $b_t \notin E_{k[h_t]}$ with $X(b_t, E_{k[h_t]}) \leq 1$. Two things can happen at such a state: (i) no type in $C[h_t]$ takes the action, or (ii) a strict subset of types in $C[h_t]$ do not take the action and the rest do.¹⁴ In case (i), the principal's beliefs at time $t+1$ equal her beliefs at time $t$, and her payoff is
$$U^{\sigma}[b_t, \mu[h_t]] = \delta\,\mathbb{E}\big[U^{\sigma}[b_{t+1}, \mu[h_t]]\;\big|\;b_t\big].$$
In case (ii), the set of types not taking the action has the form $C^+_j[h_t] = \{c_i \in C[h_t] : c_i \geq c_j\}$ for some $c_j \in C[h_t]$.
So, in case (ii), the support of the principal's beliefs at time $t+1$ is $C^+_j[h_t]$ if the agent does not take the action, and $C[h_t] \setminus C^+_j[h_t]$ if he does. By Lemma 2, there exists an offer that types in $C^+_j[h_t]$ reject and types in $C[h_t] \setminus C^+_j[h_t]$ accept if and only if $A^{\sigma}_{j+}[h_t, b_t] \leq 1$. Note that, by the induction hypothesis, $A^{\sigma}_{j+}[h_t, b_t]$ is uniquely determined.¹⁵ Let $C^*[h_t, b_t] = \{c_i \in C[h_t] : A^{\sigma}_{i+}[h_t, b_t] \leq 1\}$. Without loss of generality, renumber the types in $C[h_t]$ so that $C[h_t] = \{c_1, \ldots, c_{k[h_t]}\}$ with $c_1 < \ldots < c_{k[h_t]}$. For each $c_i \in C^*[h_t, b_t]$, let
$$T^*_{t,i-1} = c_{i-1} + V^{\sigma}_i[h_t, b_t] + A^{\sigma}_{i+}[h_t, b_t]\,(c_i - c_{i-1})$$
be the offer that leaves an agent with type $c_{i-1}$ indifferent between accepting and rejecting when all types in $C^+_i[h_t]$ reject the offer and all types in $C[h_t] \setminus C^+_i[h_t]$ accept. Note that $T^*_{t,i-1}$ is the best offer for a principal who wants all agents with types in $C[h_t] \setminus C^+_i[h_t]$ to take the action and all agents with types in $C^+_i[h_t]$ not to take it.

Let $\mathcal{T} = \{T^*_{t,i-1} : c_i \in C^*[h_t, b_t]\}$. At states $b_t \notin E_{k[h_t]}$ with $X(b_t, E_{k[h_t]}) \leq 1$, the principal must choose optimally between making an offer in $\mathcal{T}$ and making a low offer (for example, $T_t = 0$) that all agents reject: an offer $T_t = T^*_{t,i-1}$ would be accepted by types in $C[h_t] \setminus C^+_i[h_t]$ and rejected by types in $C^+_i[h_t]$, while an offer $T_t = 0$ would be rejected by everyone. For each $T^*_{t,i-1} \in \mathcal{T}$, let $p(T^*_{t,i-1})$ be the probability that the offer $T^*_{t,i-1}$ is accepted, i.e., the probability that the agent's cost is weakly smaller than $c_{i-1}$. Let $U^{\sigma}[b_t, T^*_{t,i-1}, a_t = 1]$ and $U^{\sigma}[b_t, T^*_{t,i-1}, a_t = 0]$ denote the principal's expected continuation payoffs if the offer $T^*_{t,i-1} \in \mathcal{T}$ is accepted and rejected, respectively, at state $b_t$.

¹⁴ By Lemma 1, in equilibrium an agent with cost $c_{k[h_t]}$ does not take the action.
Note that these payoffs are uniquely pinned down by the induction hypothesis: after observing whether the agent accepted or rejected the offer, the cardinality of the support of the principal's beliefs will be weakly lower than $n-1$. Let
$$U^*(b) = \max_{T \in \mathcal{T}} \Big\{ p(T)\big(b - T + U^{\sigma}[b, T, 1]\big) + \big(1 - p(T)\big)\,U^{\sigma}[b, T, 0] \Big\}$$
and let $T(b)$ be a maximizer of this expression. Partition the states $B$ as follows:
$$B_1 = E_{k[h_t]}, \qquad B_2 = \{b \in B \setminus B_1 : X(b, E_{k[h_t]}) > 1\}, \qquad B_3 = \{b \in B \setminus B_1 : X(b, E_{k[h_t]}) \leq 1\}.$$

¹⁵ $A^{\sigma}_{j+}[h_t, b_t]$ is determined in equilibrium when the principal has beliefs with support $C^+_j[h_t]$, and the induction hypothesis states that the continuation equilibrium is unique when the cardinality of the support of the principal's beliefs is less than $n$.

By our arguments above, the principal's payoff $U^{\sigma}[b, \mu[h_t]]$ satisfies
$$U^{\sigma}[b, \mu[h_t]] = \begin{cases} b - c_{k[h_t]} + \delta\,\mathbb{E}\big[U^{\sigma}[b_{t+1}, \mu[h_t]]\,\big|\,b_t = b\big] & \text{if } b \in B_1, \\ \delta\,\mathbb{E}\big[U^{\sigma}[b_{t+1}, \mu[h_t]]\,\big|\,b_t = b\big] & \text{if } b \in B_2, \\ \max\big\{U^*(b),\; \delta\,\mathbb{E}\big[U^{\sigma}[b_{t+1}, \mu[h_t]]\,\big|\,b_t = b\big]\big\} & \text{if } b \in B_3. \end{cases} \qquad (16)$$
Let $\mathcal{F}$ be the set of functions from $B$ to $\mathbb{R}$ and let $\Phi : \mathcal{F} \to \mathcal{F}$ be the operator such that, for every $f \in \mathcal{F}$,
$$\Phi(f)(b) = \begin{cases} b - c_{k[h_t]} + \delta\,\mathbb{E}\big[f(b_{t+1})\,\big|\,b_t = b\big] & \text{if } b \in B_1, \\ \delta\,\mathbb{E}\big[f(b_{t+1})\,\big|\,b_t = b\big] & \text{if } b \in B_2, \\ \max\big\{U^*(b),\; \delta\,\mathbb{E}\big[f(b_{t+1})\,\big|\,b_t = b\big]\big\} & \text{if } b \in B_3. \end{cases}$$
One can check that $\Phi$ is a contraction of modulus $\delta < 1$, and therefore has a unique fixed point. Moreover, by (16), the principal's equilibrium payoffs $U^{\sigma}[b, \mu[h_t]]$ are a fixed point of $\Phi$. These two observations together imply that the principal's equilibrium payoffs $U^{\sigma}[b, \mu[h_t]]$ are unique. Finally, the equilibrium strategies at $(h_t, b_t)$ can be immediately derived from (16).

C. Proof of Proposition 2

Fix a history $h_t$ such that $|C[h_t]| \geq 2$ and, without loss of generality, renumber the types so that $C[h_t] = \{c_1, \ldots, c_{k[h_t]}\}$ with $c_1 < \ldots < c_{k[h_t]}$.
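As a side note on the uniqueness argument just given: because $\Phi$ is a $\delta$-contraction on a finite shock space, its unique fixed point can be found by plain value iteration. The sketch below instantiates $\Phi$ with made-up placeholders (the shock values, partition into $B_1, B_2, B_3$, cost $c_{k[h_t]}$, and pre-computed separating values $U^*(b)$ are all hypothetical, not the paper's primitives):

```python
import numpy as np

# Value iteration on the operator Phi; hypothetical primitives throughout.
delta = 0.9
B = np.array([1.0, 2.0, 3.0, 4.0])            # hypothetical shock values b1 < b2 < b3 < b4
Q = np.array([[0.7, 0.1, 0.1, 0.1],           # hypothetical transition matrix [Q_{b,b'}]
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]])
c_k = 2.5                                     # highest cost in the support of beliefs (placeholder)
region = np.array([3, 3, 2, 1])               # 1 -> B1, 2 -> B2, 3 -> B3 (placeholder partition)
U_star = np.array([0.4, 0.6, 0.0, 0.0])       # placeholder separating values U*(b) on B3

def Phi(f):
    cont = delta * Q @ f                      # delta * E[f(b_{t+1}) | b_t = b]
    out = np.where(region == 1, B - c_k + cont, cont)            # B1: pooling flow b - c_k
    return np.where(region == 3, np.maximum(U_star, cont), out)  # B3: option to separate

f = np.zeros(4)
for _ in range(500):                          # contraction of modulus delta: iterate to fixed point
    f = Phi(f)

assert np.max(np.abs(Phi(f) - f)) < 1e-10     # f is (numerically) the unique fixed point
```

Starting the iteration from any other initial guess converges to the same vector, which is the numerical counterpart of the uniqueness claim.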
We start by showing that for every such history there exists a shock realization $b \in B$ with the property that, at state $(\mu[h_t], b)$, the principal makes an offer that a strict subset of the types in $C[h_t]$ accepts. Suppose, for the sake of contradiction, that this is not true. Note that this implies that $\mu[h_{t'}] = \mu[h_t]$ for every $h_{t'} \supseteq h_t$. By Theorem 1, this further implies that after history $h_t$ the agent takes the action only when the shock is in $E_{k[h_t]}$, and receives a transfer equal to $c_{k[h_t]}$. Therefore, the principal's payoff after history $(h_t, b)$ is
$$U^{\sigma}[h_t, b] = \mathbb{E}\left[\sum_{t'=t}^{\infty} \delta^{t'-t}\,(b_{t'} - c_{k[h_t]})\,\mathbf{1}_{\{b_{t'} \in E_{k[h_t]}\}} \;\Big|\; b_t = b\right].$$
Let $b \in E_{k[h_t]-1}$ be such that $X(b, E_{k[h_t]}) < 1$; by Assumption 2 such a shock $b$ exists. Suppose that the shock at time $t$ after history $h_t$ is $b$, and let $\epsilon > 0$ be small enough that
$$T = c_{k[h_t]-1} + X(b, E_{k[h_t]})\,(c_{k[h_t]} - c_{k[h_t]-1}) + \epsilon \;<\; c_{k[h_t]}. \qquad (17)$$
Note that at state $(\mu[h_t], b)$ an offer equal to $T$ is accepted by all types below $c_{k[h_t]}$ and rejected by type $c_{k[h_t]}$.¹⁶ The principal's payoff from making offer $T$ conditional on the agent's type being $c_{k[h_t]}$ is $U^{\sigma}[h_t, b]$. On the other hand, when the agent's type is lower than $c_{k[h_t]}$, the principal obtains $b - T$ in period $t$ if she offers transfer $T$, and learns that the agent's type is not $c_{k[h_t]}$. From period $t+1$ onwards, the principal's payoff is bounded below by what she could obtain if at all periods $t' > t$ she offers $T_{t'} = c_{k[h_t]-1}$ whenever $b_{t'} \in E_{k[h_t]-1}$ (an offer which is accepted by all types) and offers $T_{t'} = 0$ otherwise (which is rejected by all types).
The payoff that the principal obtains from following this strategy when the agent's cost is lower than $c_{k[h_t]}$ is
$$\begin{aligned} U &= b - T + \mathbb{E}\left[\sum_{t'=t+1}^{\infty} \delta^{t'-t}\,(b_{t'} - c_{k[h_t]-1})\,\mathbf{1}_{\{b_{t'} \in E_{k[h_t]-1}\}} \;\Big|\; b_t = b\right] \\ &= b - c_{k[h_t]-1} - \epsilon + \mathbb{E}\left[\sum_{t'=t+1}^{\infty} \delta^{t'-t}\,(b_{t'} - c_{k[h_t]})\,\mathbf{1}_{\{b_{t'} \in E_{k[h_t]}\}} \;\Big|\; b_t = b\right] + \mathbb{E}\left[\sum_{t'=t+1}^{\infty} \delta^{t'-t}\,(b_{t'} - c_{k[h_t]-1})\,\mathbf{1}_{\{b_{t'} \in E_{k[h_t]-1} \setminus E_{k[h_t]}\}} \;\Big|\; b_t = b\right] \\ &= U^{\sigma}[h_t, b] + b - c_{k[h_t]-1} - \epsilon + \mathbb{E}\left[\sum_{t'=t+1}^{\infty} \delta^{t'-t}\,(b_{t'} - c_{k[h_t]-1})\,\mathbf{1}_{\{b_{t'} \in E_{k[h_t]-1} \setminus E_{k[h_t]}\}} \;\Big|\; b_t = b\right], \end{aligned}$$
where the second line follows from substituting (17). From the third line we see that, if $\epsilon > 0$ is small enough, then $U$ is strictly larger than $U^{\sigma}[h_t, b]$. But this cannot be, since the proposed strategy profile was an equilibrium. Therefore, for all histories $h_t$ such that $|C[h_t]| \geq 2$, there exists $b \in B$ with the property that at state $(\mu[h_t], b)$ the principal makes an offer that a strict subset of the types in $C[h_t]$ accepts.

We now use this result to establish the proposition. Note first that this result, together with the assumption that $[Q_{b,b'}]$ is ergodic, implies that there is long-run learning in equilibrium: as long as $C[h_t]$ has two or more elements, there will be some shock realization at which the principal makes an offer that only a strict subset of the types in $C[h_t]$ accepts. Since there are finitely many types, the principal will end up learning the agent's type. Finally, suppose that the history $h_t$ is such that $C[h_t] = \{c_i\}$. Then, from time $t$ onwards, the principal's payoff is $U^{\sigma}[h_t, b] = \mathbb{E}\big[\sum_{t'=t}^{\infty} \delta^{t'-t}\,(b_{t'} - c_i)\,\mathbf{1}_{\{b_{t'} \in E_i\}} \,\big|\, b_t = b\big] = U^*_i(b \,|\, c = c_i)$, which is the first-best payoff. This and the previous arguments imply that the equilibrium is long-run first best, has long-run learning, and is long-run efficient. $\square$

¹⁶ Indeed, an agent with cost $c_i < c_{k[h_t]}$ obtains a strictly larger payoff by accepting offer $T$ than by rejecting it and continuing to play the equilibrium.

D.
Proof of Proposition 4

To prove the proposition we show that, under the assumptions of Proposition 4, the unique equilibrium has the following properties.

(i) If $h_t$ is such that $C[h_t] = \{c_1, c_2\}$, then $\mu[h_{t'}] = \mu[h_t]$ for all $h_{t'} \supseteq h_t$ (i.e., there is no more learning by the principal from time $t$ onwards).

(ii) If $h_t$ is such that $C[h_t] = \{c_2, c_3\}$, the principal learns the agent's type at time $t$ if and only if $b_t = b2$.

(iii) For histories $h_t$ such that $C[h_t] = \{c_1, c_2, c_3\}$: if $b_t = b1$, type $c_1$ takes action $a = 1$ while types $c_2$ and $c_3$ take action $a = 0$; if $b_t = b2$, types $c_1$ and $c_2$ take action $a = 1$ and type $c_3$ takes action $a = 0$; if $b_t = b3$, all agent types take action $a = 0$; and if $b_t = b4$, all agent types take action $a = 1$.

Before proving properties (i)-(iii), we note that they imply the desired result. Indeed, when the agent's type is $c_1$, properties (i)-(iii) imply that the principal eventually learns the agent's type if and only if $t(b1) := \min\{t \geq 0 : b_t = b1\} < t(b2) := \min\{t \geq 0 : b_t = b2\}$ (i.e., if state $b1$ is visited before state $b2$).

Proof of Property (i). Note first that, by Theorem 1, after such a history the principal makes a pooling offer that both types accept if $b_t \in E_2 = \{b2, b3, b4\}$. To establish the result, we show that if $b_t = b1$, types $c_1$ and $c_2$ take action $a = 0$ after history $h_t$. If the principal makes a separating offer that only a $c_1$-agent accepts, she pays a transfer $T_t = c_1 + X(b1, E_2)(c_2 - c_1)$. The principal's payoff from making such an offer, conditional on the agent being of type $c_1$, is
$$\tilde{U}^{sc}[c_1] = b1 - T_t + \mathbb{E}\left[\sum_{t' > t} \delta^{t'-t}\,\mathbf{1}_{\{b_{t'} \in E_1\}}(b_{t'} - c_1) \;\Big|\; b_t = b1\right] = b1 - c_1 + \sum_{b \in \{b2,b3,b4\}} X(b1, \{b\})\,[b - c_2].$$
Her payoff from making that offer conditional on the agent's type being $c_2$ is $\tilde{U}^{sc}[c_2] = \sum_{b \in \{b2,b3,b4\}} X(b1, \{b\})[b - c_2]$. If the principal does not make a separating offer when $b_t = b1$, she never learns the agent's true type and gets a payoff $\tilde{U}^{nsc} = \sum_{b \in \{b2,b3,b4\}} X(b1, \{b\})[b - c_2]$.
Since $b1 - c_1 < 0$ by assumption, $\tilde{U}^{nsc} > \mu[h_t][c_1]\,\tilde{U}^{sc}[c_1] + \mu[h_t][c_2]\,\tilde{U}^{sc}[c_2]$, and therefore the principal chooses not to make a separating offer.

Proof of Property (ii). Theorem 1 implies that, after such a history, the principal makes a pooling offer that both types accept if $b_t \in E_3 = \{b4\}$. Theorem 1 also implies that, if $b_t = b3$, then after such a history the principal makes an offer that both types reject (since $X(b3, \{b4\}) > 1$ by assumption). So it remains to show that, after history $h_t$, the principal makes an offer that a $c_2$-agent accepts and a $c_3$-agent rejects if $b_t = b2$, and an offer that both types reject if $b_t = b1$.

Suppose $b_t = b2$. Let $U[c_i]$ be the principal's value at history $(h_t, b_t = b2)$ conditional on the agent's type being $c_i \in \{c_2, c_3\}$, and let $V_i$ be the value of an agent of type $c_i$ at history $(h_t, b_t = b2)$. Note that $U[c_2] + V_2 \leq b2 - c_2 + \sum_{b \in \{b2,b3,b4\}} X(b2, \{b\})[b - c_2]$, since the right-hand side of this inequality corresponds to the efficient total payoff when the agent is of type $c_2$ (i.e., the agent taking the action if and only if the state is in $E_2$). Note also that incentive compatibility implies $V_2 \geq X(b2, \{b4\})(c_3 - c_2)$, since a $c_2$-agent can mimic a $c_3$-agent forever and obtain $X(b2, \{b4\})(c_3 - c_2)$. It thus follows that $U[c_2] \leq b2 - c_2 + X(b2, \{b4\})[b4 - c_3] + \sum_{b \in \{b2,b3\}} X(b2, \{b\})[b - c_2]$.

If, when $b_t = b2$, the principal makes an offer that only a $c_2$-agent accepts, the offer must satisfy $T_t = c_2 + X(b2, \{b4\})(c_3 - c_2) < c_3$. The principal's payoff from making such an offer when the agent's type is $c_2$ is
$$b2 - T_t + \sum_{b \in \{b2,b3,b4\}} X(b2, \{b\})[b - c_2] \;=\; b2 - c_2 + X(b2, \{b4\})[b4 - c_3] + \sum_{b \in \{b2,b3\}} X(b2, \{b\})[b - c_2], \qquad (18)$$
which, by the arguments in the previous paragraph, is the highest payoff that the principal can ever get from a $c_2$-agent after history $(h_t, b_t = b2)$. Hence it is optimal for the principal to make such a separating offer.¹⁷

Suppose next that $b_t = b1$.
If the principal makes an offer that a $c_2$-agent accepts and a $c_3$-agent rejects, she pays a transfer $T_t = c_2 + X(b1, E_3)(c_3 - c_2)$. Thus, the principal's payoff from making such an offer, conditional on the agent being of type $c_2$, is
$$\tilde{U}^{sc}[c_2] = b1 - T_t + \sum_{b \in \{b2,b3,b4\}} X(b1, \{b\})[b - c_2] = b1 - c_2 + X(b1, \{b4\})[b4 - c_3] + \sum_{b \in \{b2,b3\}} X(b1, \{b\})[b - c_2].$$
If the principal makes an offer that both types reject when $b_t = b1$, then by the arguments above she learns the agent's type the first time the shock $b2$ is reached. Let $\check{t}$ be the random variable that indicates the next date at which shock $b2$ is realized. Then, conditional on the agent's type being $c_2$, the principal's payoff from making an offer that both types reject when $b_t = b1$ is
$$\begin{aligned} \tilde{U}^{nsc}[c_2] &= \mathbb{E}\left[\sum_{t'=t+1}^{\check{t}-1} \delta^{t'-t}\,\mathbf{1}_{\{b_{t'}=b4\}}(b4 - c_3) \;\Big|\; b_t = b1\right] + \mathbb{E}\left[\delta^{\check{t}-t}\Big(b2 - c_2 + X(b2, \{b4\})[b4 - c_3] + \sum_{b \in \{b2,b3\}} X(b2, \{b\})[b - c_2]\Big) \;\Big|\; b_t = b1\right] \\ &= X(b1, \{b4\})[b4 - c_3] + X(b1, \{b2\})[b2 - c_2] + \mathbb{E}\big[\delta^{\check{t}-t}\,\big|\,b_t = b1\big]\,X(b2, \{b3\})[b3 - c_2], \end{aligned}$$
where we used (18), which is the payoff that the principal obtains from an agent of type $c_2$ when the state is $b2$ and the support of her beliefs is $\{c_2, c_3\}$. Then we have
$$\tilde{U}^{nsc}[c_2] - \tilde{U}^{sc}[c_2] = -[b1 - c_2] - \Big(X(b1, \{b3\}) - \mathbb{E}\big[\delta^{\check{t}-t}\,\big|\,b_t = b1\big]\,X(b2, \{b3\})\Big)[b3 - c_2],$$
which is positive when $|b1 - c_2|$ is large enough, since $b1 - c_2 < 0$ by assumption. Since the principal's payoff conditional on the agent's type being $c_3$ is the same regardless of whether she makes a separating offer when $b_t = b1$ (in either case the principal earns $X(b1, \{b4\})(b4 - c_3)$), the principal chooses not to make an offer that $c_2$ accepts and $c_3$ rejects when $b_t = b1$.

¹⁷ Indeed, the principal's payoff from making an offer equal to $T_t$ when the agent's type is $c_3$ is $X(b2, \{b4\})[b4 - c_3]$, which is also the most that she can extract from an agent of type $c_3$.

Proof of Property (iii). Suppose $C[h_t] = \{c_1, c_2, c_3\}$.
Theorem 1 implies that all agent types take action $a = 1$ if $b_t = b4$, and that all agent types take action $a = 0$ if $b_t = b3$ (this last claim follows since $X(b3, \{b4\}) > 1$). Suppose next that $C[h_t] = \{c_1, c_2, c_3\}$ and $b_t = b2$. We first claim that if the principal makes an offer that only a subset of types accepts at state $b2$, then this offer must be such that types in $\{c_1, c_2\}$ take action $a = 1$ and type $c_3$ takes action $a = 0$. To see this, suppose instead that she makes an offer that only an agent with type $c_1$ accepts and that agents with types in $\{c_2, c_3\}$ reject. The offer that she makes in this case satisfies $T_t - c_1 = V^{\sigma}_2[h_t, b_t] + A^{\sigma}_2[h_t, b_t](c_2 - c_1)$. By property (ii) above, under this proposed equilibrium a $c_2$-agent will, from period $t+1$ onwards, take the action at all times $t' > t$ such that $b_{t'} = b2$.¹⁸ Therefore $A^{\sigma}_2[h_t, b_t] \geq X(b2, \{b2\}) > 1$, where the last inequality follows from Assumption 1. The payoff that an agent of type $c_2$ obtains by accepting offer $T_t$ at time $t$ is bounded below by
$$T_t - c_2 = c_1 - c_2 + V^{\sigma}_2[h_t, b_t] + A^{\sigma}_2[h_t, b_t](c_2 - c_1) \;>\; V^{\sigma}_2[h_t, b_t],$$
where the inequality follows since $A^{\sigma}_2[h_t, b_t] > 1$. But this cannot be, since $V^{\sigma}_2[h_t, b_t]$ is the equilibrium payoff of an agent with type $c_2$. Therefore, either the principal makes an offer that only types in $\{c_1, c_2\}$ accept at state $b2$, or she makes an offer that all types reject.

We now show that the principal makes an offer that types in $\{c_1, c_2\}$ accept and type $c_3$ rejects when $b_t = b2$ and $C[h_t] = \{c_1, c_2, c_3\}$. If she makes an offer that agents with cost in $\{c_1, c_2\}$ accept and a $c_3$-agent rejects, she pays a transfer $T_t = c_2 + X(b2, \{b4\})(c_3 - c_2)$. Note then that, by property (i) above, when the agent's cost is in $\{c_1, c_2\}$ the principal stops learning: for all times $t' > t$ she makes an offer $T_{t'} = c_2$ that both types accept when $b_{t'} \in E_2$, and she makes a low offer $T_{t'} = 0$ that both types reject when $b_{t'} \notin E_2$.

¹⁸ Indeed, under the proposed equilibrium, if the offer is rejected the principal learns that the agent's type is in $\{c_2, c_3\}$. By property (ii), if the agent's type is $c_2$, the principal will learn the agent's type the first time the shock is $b2$ (because at that time an agent with type $c_2$ takes the action, while an agent with type $c_3$ does not), and from that point onwards the agent takes the action whenever the shock is in $E_2 = \{b2, b3, b4\}$.
Therefore, conditional on the agent's type being either $c_1$ or $c_2$, the principal's payoff from making at time $t$ an offer $T_t$ that agents with cost in $\{c_1, c_2\}$ accept and a $c_3$-agent rejects is
$$\hat{U}^{sc}[\{c_1, c_2\}] = b2 - T_t + \sum_{b \in \{b2,b3,b4\}} X(b2, \{b\})[b - c_2] = b2 - c_2 + X(b2, \{b4\})[b4 - c_3] + \sum_{b \in \{b2,b3\}} X(b2, \{b\})[b - c_2].$$
On the other hand, if she does not make an offer that a subset of types accepts when $b_t = b2$, then the principal's payoff conditional on the agent being of type $c_i \in \{c_1, c_2\}$ is bounded above by
$$\hat{U}^{nsc}[c_i] = \mathbb{E}\left[\sum_{t'=t}^{\hat{t}-1} \delta^{t'-t}\,\mathbf{1}_{\{b_{t'}=b4\}}(b4 - c_3) + \delta^{\hat{t}-t}\sum_{b \in E_i} X(b1, \{b\})(b - c_i) \;\Big|\; b_t = b2\right],$$
where $\hat{t}$ denotes the next period at which state $b1$ is realized. Note that there exists $\epsilon > 0$ small enough that, if $Q_{b,b1} < \epsilon$ for all $b \neq b1$, then $\hat{U}^{sc}[\{c_1, c_2\}] > \hat{U}^{nsc}[c_i]$ for $i = 1, 2$. Finally, note that the payoff the principal obtains from an agent of type $c_3$ at history $h_t$ when $b_t = b2$ is $X(b2, \{b4\})(b4 - c_3)$, regardless of whether the principal makes a separating offer. Therefore, if $Q_{b,b1} < \epsilon$ for all $b \neq b1$, when $C[h_t] = \{c_1, c_2, c_3\}$ and $b_t = b2$ the principal makes an offer $T_t$ that only types in $\{c_1, c_2\}$ accept.

Finally, we show that when $C[h_t] = \{c_1, c_2, c_3\}$ and $b_t = b1$, the principal makes an offer that only type $c_1$ accepts. Let $\check{t}$ be the random variable that indicates the next date at which state $b2$ is realized.
If the principal makes an offer $T_t$ that only a $c_1$-agent accepts, this offer satisfies
$$T_t - c_1 = V^{\sigma}_2[h_t, b1] + A^{\sigma}_2[h_t, b1](c_2 - c_1) = X(b1, \{b4\})(c_3 - c_1) + \Big(X(b1, \{b2\}) + \mathbb{E}\big[\delta^{\check{t}-t}\,\big|\,b_t = b1\big]\,X(b2, \{b3\})\Big)(c_2 - c_1), \qquad (19)$$
where the second equality follows since $V^{\sigma}_2[h_t, b1] = X(b1, \{b4\})(c_3 - c_2)$ and since, by property (ii), when the support of the principal's beliefs is $\{c_2, c_3\}$ and the agent's type is $c_2$, the principal learns the agent's type at time $\check{t}$.¹⁹ Therefore, the principal's equilibrium payoff from making an offer that only an agent with cost $c_1$ accepts at state $b1$, conditional on the agent's type being $c_1$, is
$$\begin{aligned} \check{U}^{sc}[c_1] &= b1 - T_t + \sum_{b \in \{b2,b3,b4\}} X(b1, \{b\})[b - c_1] \\ &= b1 - c_1 + X(b1, \{b4\})[b4 - c_3] + X(b1, \{b3\})[b3 - c_1] + X(b1, \{b2\})[b2 - c_2] - \mathbb{E}\big[\delta^{\check{t}-t}\,\big|\,b_t = b1\big]\,X(b2, \{b3\})(c_2 - c_1), \end{aligned}$$
where the second line follows from substituting the transfer in (19). On the other hand, the principal's payoff from making such an offer at state $b1$, conditional on the agent's type being $c_2$, is
$$\begin{aligned} \check{U}^{sc}[c_2] &= \mathbb{E}\left[\sum_{\tau=t}^{\check{t}-1} \delta^{\tau-t}\,\mathbf{1}_{\{b_\tau=b4\}}(b4 - c_3) \;\Big|\; b_t = b1\right] + \mathbb{E}\left[\delta^{\check{t}-t}\big(b2 - c_2 - X(b2, \{b4\})(c_3 - c_2)\big) + \sum_{\tau=\check{t}+1}^{\infty} \delta^{\tau-t}\,\mathbf{1}_{\{b_\tau \in E_2\}}(b_\tau - c_2) \;\Big|\; b_t = b1\right] \\ &= X(b1, \{b4\})(b4 - c_3) + X(b1, \{b2\})(b2 - c_2) + \mathbb{E}\big[\delta^{\check{t}-t}\,\big|\,b_t = b1\big]\,X(b2, \{b3\})(b3 - c_2), \end{aligned}$$
where we used the fact that, when the support of her beliefs is $\{c_2, c_3\}$, the principal makes an offer that only a $c_2$-agent accepts when the state is $b2$ (the offer that she makes at that point is $T = c_2 + X(b2, \{b4\})(c_3 - c_2)$).

Alternatively, suppose the principal makes an offer that both $c_1$ and $c_2$ accept but $c_3$ rejects.

¹⁹ Indeed, the fact that the principal learns the agent's type at time $\check{t}$ implies that
$$A^{\sigma}_2[h_t, b1] = \mathbb{E}\left[\sum_{t'=t}^{\check{t}-1} \delta^{t'-t}\,\mathbf{1}_{\{b_{t'}=b4\}} + \delta^{\check{t}-t}\sum_{t'=\check{t}}^{\infty} \delta^{t'-\check{t}}\,\mathbf{1}_{\{b_{t'} \in E_2\}} \;\Big|\; b_t = b1\right] = X(b1, \{b4\}) + X(b1, \{b2\}) + \mathbb{E}\big[\delta^{\check{t}-t}\,X(b2, \{b3\})\,\big|\,b_t = b1\big].$$
Since $X(b1, \{b4\}) < 1$, there exists $\varepsilon > 0$ such that $A^{\sigma}_2[h_t, b1] < 1$ whenever $Q_{b,b2} < \varepsilon$ for all $b \neq b2$.
Then she pays a transfer $T_t = c_2 + X(b1, \{b4\})(c_3 - c_2)$; thus, her payoff from learning that the agent's type is in $\{c_1, c_2\}$ at state $b1$ is
$$\bar{U}^{sc}[\{c_1, c_2\}] = b1 - T_t + \sum_{b \in \{b2,b3,b4\}} X(b1, \{b\})(b - c_2) = b1 - c_2 + X(b1, \{b4\})[b4 - c_3] + X(b1, \{b2\})[b2 - c_2] + X(b1, \{b3\})[b3 - c_2],$$
where we used the fact that the principal never learns anything more about the agent's type when the support of her beliefs is $\{c_1, c_2\}$ (see property (i) above). Note that there exist $\eta > 0$ and $K > 0$ such that, if $Q_{b,b2} < \eta$ for all $b \neq b2$ and if $b1 - c_2 < -K$, then
$$\check{U}^{sc}[c_1] - \bar{U}^{sc}[\{c_1, c_2\}] = \Big(1 + X(b1, \{b3\}) - \mathbb{E}\big[\delta^{\check{t}-t}\,\big|\,b_t = b1\big]\,X(b2, \{b3\})\Big)(c_2 - c_1) > 0$$
and
$$\check{U}^{sc}[c_2] - \bar{U}^{sc}[\{c_1, c_2\}] = \Big(\mathbb{E}\big[\delta^{\check{t}-t}\,X(b2, \{b3\})\,\big|\,b_t = b1\big] - X(b1, \{b3\})\Big)(b3 - c_2) - (b1 - c_2) > 0.$$
Therefore, under these conditions, at state $b1$ the principal strictly prefers making an offer that a $c_1$-agent accepts and agents with cost $c \in \{c_2, c_3\}$ reject to making an offer that agents with cost in $\{c_1, c_2\}$ accept and a $c_3$-agent rejects.

However, the principal may instead choose to make an offer that all agent types reject when $b_t = b1$ and $C[h_t] = \{c_1, c_2, c_3\}$. In this case, by the arguments above, the next time the state equals $b2$ the principal will make an offer that only types in $\{c_1, c_2\}$ accept; the offer that she makes in this case satisfies $T - c_2 = X(b2, \{b4\})(c_3 - c_2)$. From that point onwards she never learns more (by property (i) above). In this case, the principal's payoff conditional on the agent's type being in $\{c_1, c_2\}$ is
$$\begin{aligned} \bar{U}^{nsc} &= \mathbb{E}\left[\sum_{\tau=t}^{\check{t}-1} \delta^{\tau-t}\,\mathbf{1}_{\{b_\tau=b4\}}(b_\tau - c_3) \;\Big|\; b_t = b1\right] + \mathbb{E}\left[\delta^{\check{t}-t}\Big(b2 - T + \sum_{b \in E_2} X(b2, \{b\})(b - c_2)\Big) \;\Big|\; b_t = b1\right] \\ &= X(b1, \{b4\})[b4 - c_3] + X(b1, \{b2\})[b2 - c_2] + \mathbb{E}\big[\delta^{\check{t}-t}\,\big|\,b_t = b1\big]\,X(b2, \{b3\})[b3 - c_2]. \end{aligned}$$
Note that there exist $\eta' > 0$ and $\epsilon' > 0$ such that, if $Q_{b,b2} < \eta'$ for all $b \neq b2$ and if $b1 - c_1 > -\epsilon'$, then
$$\check{U}^{sc}[c_1] - \bar{U}^{nsc} = b1 - c_1 + \Big(X(b1, \{b3\}) - \mathbb{E}\big[\delta^{\check{t}-t}\,\big|\,b_t = b1\big]\,X(b2, \{b3\})\Big)[b3 - c_1] > 0$$
and $\check{U}^{sc}[c_2] - \bar{U}^{nsc} = 0$.
Therefore, under these conditions, the principal makes an offer that type $c_1$ accepts and types in $\{c_2, c_3\}$ reject when $C[h_t] = \{c_1, c_2, c_3\}$ and $b_t = b1$. $\square$

References

Blume, A. (1998): "Contract Renegotiation with Time-Varying Valuations," Journal of Economics & Management Strategy, 7, 397–433.

Carmichael, H. L. and W. B. MacLeod (2000): "Worker Cooperation and the Ratchet Effect," Journal of Labor Economics, 18, 1–19.

Chassang, S. (2010): "Building Routines: Learning, Cooperation, and the Dynamics of Incomplete Relational Contracts," The American Economic Review, 100, 448–465.

Dewatripont, M. (1989): "Renegotiation and Information Revelation over Time: The Case of Optimal Labor Contracts," The Quarterly Journal of Economics, 589–619.

Dixit, A. (2000): "IMF Programs as Incentive Mechanisms," unpublished manuscript, Department of Economics, Princeton University, Princeton, NJ.

Fiocco, R. and R. Strausz (2015): "Consumer Standards as a Strategic Device to Mitigate Ratchet Effects in Dynamic Regulation," Journal of Economics & Management Strategy, 24, 550–569.

Freixas, X., R. Guesnerie, and J. Tirole (1985): "Planning under Incomplete Information and the Ratchet Effect," The Review of Economic Studies, 52, 173–191.

Fudenberg, D., D. K. Levine, and J. Tirole (1985): "Infinite-Horizon Models of Bargaining with One-Sided Incomplete Information," in Bargaining with Incomplete Information, ed. by A. Roth, Cambridge University Press, 73–98.

Gerardi, D. and L. Maestri (2015): "Dynamic Contracting with Limited Commitment and the Ratchet Effect," tech. rep., Collegio Carlo Alberto.

Gibbons, R. (1987): "Piece-Rate Incentive Schemes," Journal of Labor Economics, 413–429.

——— (2010): "Inside Organizations: Pricing, Politics, and Path Dependence," Annual Review of Economics, 2, 337–365.

Gul, F., H. Sonnenschein, and R. Wilson (1986): "Foundations of Dynamic Monopoly and the Coase Conjecture," Journal of Economic Theory, 39, 155–190.

Halac, M.
(2012): "Relational Contracts and the Value of Relationships," The American Economic Review, 102, 750–779.

Halac, M. and A. Prat (2015): "Managerial Attention and Worker Performance."

Hallward-Driemeier, M., G. Iarossi, and K. Sokoloff (2001): "Manufacturing Productivity in East Asia: Market Depth and Aiming for Exports," unpublished manuscript, World Bank.

Hart, O. D. and J. Tirole (1988): "Contract Renegotiation and Coasian Dynamics," The Review of Economic Studies, 55, 509–540.

Kanemoto, Y. and W. B. MacLeod (1992): "The Ratchet Effect and the Market for Secondhand Workers," Journal of Labor Economics, 85–98.

Kennan, J. (2001): "Repeated Bargaining with Persistent Private Information," The Review of Economic Studies, 68, 719–755.

Laffont, J.-J. and J. Tirole (1988): "The Dynamics of Incentive Contracts," Econometrica, 1153–1175.

Levin, J. (2003): "Relational Incentive Contracts," The American Economic Review, 93, 835–857.

Li, J. and N. Matouschek (2013): "Managing Conflicts in Relational Contracts," The American Economic Review, 103, 2328–2351.

Luria, D. (1996): "Why Markets Tolerate Mediocre Manufacturing," Challenge, 39, 11–16.

Malcomson, J. M. (2015): "Relational Incentive Contracts with Persistent Private Information," Econometrica, forthcoming.

Ortner, J. (2016): "Durable Goods Monopoly with Stochastic Costs," Theoretical Economics, forthcoming.

Schmidt, K. M. (1993): "Commitment through Incomplete Information in a Simple Repeated Bargaining Game," Journal of Economic Theory, 60, 114–139.