Recursive Formulation with Learning
Suehyun Kwon∗
April 24, 2014
Abstract
This paper provides an algorithm to find the optimal contract with learning. There
is a payoff-relevant state, and the principal and the agent start with a common prior.
Each period, the agent takes an unobservable action, and the distribution of outcome
is determined by the agent’s action and the state. Because of learning, the principal
and the agent have asymmetric information off the equilibrium path, and the usual
algorithms don’t apply. The proposed algorithm decomposes the continuation values
by state, and the one-shot deviation principle is established. The algorithm allows for
any risk preference of the agent and different commitment powers of the principal.
1 Introduction
Consider the worker’s ability in Harris and Holmström (1982) or Holmström (1982). This
paper presents a model of dynamic moral hazard in which the principal and the agent learn
about the agent’s type over time. An algorithm to find the optimal contract in a general
environment is provided. The main difference from the two papers cited above is that the
outside option of the agent is assumed to be exogenous. The focus of the paper is how
learning affects the agent’s incentives within the relationship and how one can characterize
the optimal contract with an algorithm.
When the principal hires an agent, the agent’s ability may be unknown to both the
principal and the agent. Or if the quality of matching between the principal and the agent
is unknown at the onset of the relationship, the principal and the agent learn about the
match quality over time. In innovation or experimentation, neither the principal nor the
agent knows the quality of the project, and they have to learn over time. Learning brings
a few challenges into the characterization of the optimal contract.

∗ Kwon: University College London, suehyun.kwon@ucl.ac.uk, Department of Economics, University College London, Gower Street, London, WC1E 6BT, United Kingdom. I thank Dilip Abreu, V. Bhaskar, Patrick Bolton, Antonio Cabrales, Pierre-André Chiappori, Martin Cripps, Glenn Ellison, Jihong Lee, George Mailath and participants at UCL Economics Research Day for helpful comments.

The first challenge is to represent the continuation values of the principal and the agent. When the principal
offers history-contingent payments, the agent evaluates them with his prior beliefs at every
information set. If the states are partially or fully persistent, the agent updates his belief
after each period, given his action and the realization of the outcome. In order to evaluate
the agent’s payoff after some history, one needs to know the prior after the history, the
equilibrium strategies and the payment for each history.
The problem is further complicated when one considers the IC constraint for the one-shot
deviation. When the agent deviates, the agent updates his posterior belief with the action
he actually took. However, the principal believes that the agent took a different action and
uses a different probability distribution to update his posterior belief. The principal and the
agent have different prior beliefs in all periods after the agent deviates unless they learn the
state perfectly after certain outcomes. In the IC constraint for the one-shot deviation, even
if the agent deviates only once, now the agent evaluates his deviation payoff with a different
prior from the principal’s prior. When the principal and the agent play the equilibrium
strategies from the following period on, the payments for each history are the same as
when the agent has never deviated, but because the agent has a different prior belief, the
agent’s continuation value is different from the on-the-equilibrium-path continuation value.
To find the optimal contract recursively, one needs to express the agent’s payoffs on and off
the equilibrium path.
The second challenge is to show the one-shot deviation principle. Given a history and
a prior, the agent has infinitely many deviations he can make, and there are infinitely many
IC constraints. In addition, if the agent has deviated previously, then the principal and the
agent have different prior beliefs, and in order to account for all deviations, the principal
also has to consider the potential prior beliefs the agent may have. I show that it is sufficient
to consider IC constraints for one-shot deviations when the agent has never deviated before.
If the agent has never deviated before, the principal and the agent share the same prior,
and there is only one IC constraint for the one-shot deviation.
The algorithm I propose in this paper circumvents these issues by decomposing the
continuation values in each state. Instead of keeping track of the agent’s prior belief at every
information set, I note that the state transition is exogenous, and once the principal and the
agent are in a particular state, the transition probabilities and the probability distribution
of the outcome for each action are independent of the initial prior. The continuation value
when the principal and the agent follow the equilibrium strategies and they are in state i
in the current period is given by some number, and the continuation value of the agent can
be written as a linear combination of those values, with the agent’s prior as weights.
Once we decompose the continuation values in each state, the one-shot deviation principle follows from the supermodularity of the optimal contract. If the agent's continuation value under the optimal contract is supermodular in his effort and the state, then it is sufficient to consider IC constraints for one-shot deviations when the agent has never deviated
before. I assume that the agent can only deviate to a less costly action, and the agent’s
posterior after a deviation dominates the principal’s posterior in the sense of first-order
stochastic dominance.
Decomposition of continuation values and the one-shot deviation principle allow one to find the optimal contract with one IC constraint at a time. In addition to the one-shot deviation principle in the usual sense, the principal doesn't need to worry about the
potential priors the agent may have if he has deviated previously. If the environment is
finite, i.e., the set of states is finite and the set of outcomes is finite, then there are a
finite number of state variables, and one can find the largest self-generating set subject to
constraints. With a continuum of states and outcomes, one can discretize the state space.
The algorithm can be applied for a general outcome structure. There are a few recent
papers on dynamic moral hazard and learning. See Bhaskar (2012), DeMarzo and Sannikov
(2011), He, Wei and Yu (2013), Jovanovic and Prat (2013) and Kwon (2013). For innovation
and experimentation, see Bergemann and Hege (1998, 2005) and Hörner and Samuelson
(2013). DeMarzo and Sannikov (2011), He, Wei and Yu (2013) and Jovanovic and Prat
(2013) share a similar outcome structure in which the outcome is the sum of the agent's
type and the effort with noise. In Bergemann and Hege (1998, 2005), Hörner and Samuelson
(2013) or Kwon (2013), there is a positive probability of a good outcome only if the agent
works and the project (state) is good. I assume the supermodularity of the optimal contract,
but even in a binary environment, the outcome structure allows for a positive probability of
a good outcome when the agent shirks if the state is good. This implies that the principal
can learn the state without leaving any rent by letting the agent shirk. If there is no
probability of a good outcome when the agent shirks, the literature finds it optimal to
frontload the incentives early on and learn the state. This property has to change when
there is a chance of learning with shirking. The principal loses output because the outside
option is inefficient, but he doesn’t have to leave any rent to the agent, and it can be optimal
to let the agent shirk early on in the relationship until they learn the state sufficiently.
DeMarzo and Sannikov (2011) and He, Wei and Yu (2013) assume that the state follows
a Brownian motion, and other papers cited above assume that the state is fully persistent.
When the state is fully persistent, the principal’s problem is a stopping-time problem. The
algorithm I propose can allow for both partially persistent states and fully persistent states.
Even if the state evolves over time, the algorithm can be applied to find the optimal contract.
Another advantage of my algorithm is that it allows for both risk-neutrality and risk-aversion of the agent. If the agent is risk-averse, he may have any concave and increasing utility function.
function. All of the above papers assume either a risk-neutral agent or the CARA preference.
I don’t show qualitative properties of the optimal contract in this paper, but my algorithm
can provide the optimal contract for wider risk preferences. I can also allow for different
commitment powers of the principal. The algorithm can accommodate both within-period commitment power and no commitment power.
There is a large literature on models with learning starting with Harris and Holmström
(1982) and Holmström (1982). Harris and Holmström (1982) doesn't have moral hazard, while the agent in Holmström (1982) decides how much effort to exert. Cabrales, Calvó-Armengol and Pavoni (2007) is a more recent paper on the topic. The main difference from
this literature is that I assume an exogenous outside option of the agent, and there is no
competitive market. Without the competitive market, the agent doesn’t benefit from the
market expectation through the outside option, and the agent’s IC constraint focuses on
the potential information asymmetry between the principal and the agent.
Lastly, there are papers with private Markovian types. Fernandes and Phelan (2000)
provides a recursive formulation with a first-order Markov chain with two types, and Zhang
(2009) solves for optimal contracts in continuous time. Recent papers include Farhi and
Werning (2013), Golosov et al. (2011) and Williams (2011).
The rest of the paper is organized as follows. Section 2 describes the model, and I set
up the recursive formulation in Section 3. Section 4 discusses extensions, and Section 5
concludes.
2 Model
The principal hires an agent over an infinite horizon, t = 0, 1, 2, · · · . At the beginning of
each period, the principal offers a contract to the agent, and the agent decides whether to
accept. If the agent accepts, he takes an action from a finite set, a ∈ A = {a1 , · · · , an }, and
the outcome is realized. The distribution of the outcome is determined by the underlying
state ω ∈ Ω = [0, 1] and the agent’s action. The outcome y is in Y = [0, 1], and the
distribution of the outcome is denoted by F (·|a, ω) with pdf f (·|a, ω). The agent’s action
is unobservable to the principal, which leads to moral hazard, and the underlying state is
unobservable to both the principal and the agent; the principal only observes the outcome.
The principal makes a payment, and they move on to the next period. If the agent rejects,
the principal and the agent each get their outside options in this period, v̄ and ū, and
continue.
At the beginning of period 0, the principal and the agent start with a common prior π0.
The states follow a first-order Markov chain, and the probability of transition from state ω to ω′ is denoted by m(ω′|ω). For the moment, I'm not assuming anything about the transition matrix, but I will impose a condition in Assumption 3. Action ai costs ci to the agent, and I assume 0 ≤ c1 < c2 < · · · < cn. I assume the principal has within-period
commitment power, and the agent has limited liability.1 The agent cannot save, and his
consumption in each period is bounded from below by 0. The principal is risk-neutral, but the agent has an increasing utility function u(·) with u′ > 0, u(0) = 0. I allow for both
risk-aversion and risk-neutrality. The equilibrium concept is perfect Bayesian equilibrium.
I make three assumptions. The first assumption is that the agent becomes more optimistic than the principal if he takes a less costly action.
Assumption 1. Let π̂(π, ai, y) be the posterior of the agent when the prior is π, the agent takes action ai and gets outcome y. π̂(π, ai, y) dominates π̂(π, aj, y) in the sense of first-order stochastic dominance for all j > i.
If the principal believes that the agent took action aj , his posterior belief after getting
outcome y is π̂(π, aj , y). If the agent deviated to ai such that i < j, then the agent’s
posterior belief after the deviation is π̂(π, ai , y), and the agent’s posterior dominates the
principal’s posterior in the sense of first-order stochastic dominance.
The next assumption is that the agent can only deviate to a less costly action.
Assumption 2. Suppose the principal wants the agent to take action ai . The agent can
only deviate to aj such that j < i.
I’m considering an environment where the agent needs access to a particular technology
to produce. In some environments, the agent has to be on site to do his work, and sometimes,
researchers can access the data only from a designated location or a machine. If the agent
needs access to a technology, then the principal can limit the amount of time the agent
has access, and the agent can deviate downwards, but he can’t work more than what the
principal allows him to.
The last assumption is that the agent’s payoff under the optimal contract is supermodular in his effort and the state. This is an assumption on endogenous objects, but it is
automatically satisfied in the following environment. Suppose there are two states (“good”
and “bad”), two actions (“work” and “shirk”) and two outcomes (“good” and “bad”). If
the agent works in the good state, he gets a good outcome with some probability pH > 0. If
the agent shirks in the good state, he gets a good outcome with some probability pL < pH .
If the state is bad, then the outcome is bad. Assumption 3 is always satisfied for any pL , pH ,
and this environment embeds other models on innovation and experimentation.2
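In this binary environment, the supermodularity claim can be checked directly: the return to effort is positive in the good state and zero in the bad state whenever the good outcome carries a weakly higher continuation utility. A minimal sketch; the values X1 and X0 below are hypothetical, not from the paper.

```python
# Check Assumption 3 in the 2x2x2 environment: the return to effort is
# larger in the good state whenever X1 >= X0 (illustrative numbers).
def expected_utility(action_p, good_state, X1, X0):
    """E[u(w_y) + delta * continuation | action, state]; action_p is the
    good-outcome probability of the action when the state is good."""
    p = action_p if good_state else 0.0   # bad state: outcome is always bad
    return p * X1 + (1 - p) * X0

pH, pL = 0.6, 0.2
X1, X0 = 2.0, 0.5                         # utilities after good/bad outcome
work_good = expected_utility(pH, True, X1, X0)
shirk_good = expected_utility(pL, True, X1, X0)
work_bad = expected_utility(pH, False, X1, X0)
shirk_bad = expected_utility(pL, False, X1, X0)
# Supermodularity: the gain from working is larger in the good state.
assert work_good - shirk_good >= work_bad - shirk_bad
```

Note that in the bad state the two actions give identical payoffs, so the comparison reduces to (pH − pL)(X1 − X0) ≥ 0, which holds for any pL, pH once X1 ≥ X0.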
1 I consider full-commitment power and no-commitment power in Section 4.1.
2 See for example Hörner and Samuelson (2013). Their model is in continuous time, but the agent can produce a good outcome with a positive probability only if he works and the project is good. I can also allow the state to evolve over time.
Assumption 3. Under the optimal contract, the agent's payoff is supermodular in the agent's effort and the state:

$$\int \big(u(w_y) + \delta V_{y\omega'}\, m(\omega'|\omega)\big)\, dF(y|a,\omega)\, d\omega'$$

is supermodular in a and ω, where Vyω′ is the agent's payoff on the equilibrium path after outcome y if the state in the following period is ω′ and wy is the payment for outcome y in the current period.
A history htP = (d1, y1, w1, d2, · · ·) is the history from the principal's perspective, where dt is the agent's decision to accept or reject the offer. A history htA = (d1, a1, y1, w1, d2, · · ·) is the history from the agent's perspective. The sets of histories for the principal and the agent are denoted by HP = ∪HPt and HA = ∪HAt, respectively. Since the agent's effort is
unobservable to the principal, the agent’s history also keeps track of his past efforts. The
principal’s history is a public history, but I allow private strategies so that the agent’s effort
can depend on his past effort. In this context, the updating of prior beliefs depends on the
agent’s effort. The principal updates his belief based on the on-the-equilibrium-path action
of the agent, whereas if the agent deviates, the agent updates his belief with the true action.
3 Recursive Formulation
This section develops the recursive formulation to characterize the optimal contracts. I
first show the results for the binary environment in Section 3.1 and illustrate the challenges and the methodological contributions of the paper. Section 3.2 generalizes the
results to the model in Section 2.
3.1 Binary Environment
I consider the binary environment in this section. I decompose the principal’s continuation
value and the agent’s continuation value by the on-the-equilibrium-path continuation values in each state. Then I show the one-shot deviation principle holds, and in particular,
it is sufficient to consider the IC constraint for one-shot deviations when the agent has
never deviated before. Combining the decomposition of continuation values and the one-shot deviation principle, I construct the recursive formulation to characterize the optimal
contracts.
There are two states (“good” and “bad”), two actions (“work” and “shirk”) and two
outcomes (“good” and “bad”). If the agent works in the good state, he gets a good outcome
with some probability pH > 0. If the agent shirks in the good state, he gets a good outcome
with some probability pL < pH . If the state is bad, then the outcome is bad for both
actions. Work costs c > 0 to the agent, and shirking costs nothing. Assumptions 1 and 2
hold. Denote the Markov transition matrix by M , where Mij is the probability of being in
state j after being in state i.
3.1.1 Decomposition of Continuation Values
The first step is to decompose the continuation values of the principal and the agent.
Consider the principal’s problem. The principal maximizes his payoff in period 0 subject to
IR, IC and limited liability every period. Since the principal has within-period commitment
power, the principal’s payoff at each information set also has to be weakly greater than his
outside option. At any information set, the principal’s payoff should be weakly greater than
his payoff off the equilibrium path. This payoff is bounded from below by his outside option.
Since I’m interested in the set of all equilibrium payoffs, there is no loss of generality in
assuming that off the equilibrium path, the agent believes that his payoff is constant across
all outcomes. The agent takes the least expensive action, and anticipating this, the principal
takes his outside option.
Given a history ht and the principal’s prior π, there are infinitely many IC constraints,
but first consider the IC constraint for the one-shot deviation when the agent has never
deviated before. If the agent has never deviated before, the principal and the agent share
the same prior π. Let V (ht , π) be the agent’s continuation value after history ht with prior
π when both the principal and the agent play the equilibrium strategies. The IC constraint
for the one-shot deviation takes the following form:
$$-c + \pi_1 p_H\big(u(w_1) + \delta V(h^t 1, (1,0)M)\big) + (1 - \pi_1 p_H)\big(u(w_0) + \delta V(h^t 0, \hat\pi)\big)$$
$$\geq \pi_1 p_L\big(u(w_1) + \delta V(h^t 1, (1,0)M)\big) + (1 - \pi_1 p_L)\big(u(w_0) + \delta V(h^t 0, \tilde\pi)\big).$$
If they get a good outcome, then they learn that they are in the good state, and the prior
is updated according to the Markov matrix. If they get a bad outcome, the posterior belief
depends on which action the principal believes the agent took. On the equilibrium path,
the agent worked, and we have
$$\hat\pi = \left(\frac{\pi_1(1-p_H)}{\pi_1(1-p_H)+(1-\pi_1)},\; \frac{1-\pi_1}{\pi_1(1-p_H)+(1-\pi_1)}\right) M.$$
On the other hand, when the agent deviated, he updates his belief to
$$\tilde\pi = \left(\frac{\pi_1(1-p_L)}{\pi_1(1-p_L)+(1-\pi_1)},\; \frac{1-\pi_1}{\pi_1(1-p_L)+(1-\pi_1)}\right) M.$$
[Figure 1: Decomposition of Continuation Values]

Given history ht0, the principal believes he's offering V(ht0, π̂) to the agent, but off the equilibrium path, the agent is getting V(ht0, π̃). Because of this difference, we can't take
the agent’s continuation value as the state variable anymore, and the approach in Spear
and Srivastava (1987) doesn’t work in this environment. Furthermore, ht 0 determines the
payoff stream on the equilibrium path, while the prior keeps getting updated every period;
we need to evaluate the expected payoff with respect to the prior. The first innovation
of the paper is to decompose the agent’s continuation value into on-the-equilibrium-path
continuation values in each state and to represent V (ht , π) as a linear combination of π for
any prior π. Suppose for the moment that the Markov matrix is the identity. Figure 1 shows the decomposition of the continuation values. π′ = (1, 0)M is the prior in the period following a good outcome, and p⃗ = (pH, 0) is the vector of probabilities of a good outcome in each state when the
agent works. The graph on the left is the usual representation of the information set and
the priors, and the graph on the right is the decomposition. After history ht , the principal
and the agent don’t know which state they are in. However, the state is exogenous, and
once they are in a particular state, the probability of a good outcome given the agent’s
action is the same every period. In particular, we can express the conditional probability of
a history as the sum of conditional probabilities in each state. For example, the probability
of ht11 given history ht is (π · p⃗) × (π′ · p⃗). But we can express it as π1 pH² + (1 − π1) × 0. We
can express every probability as the sum of two probabilities with π as the weights. Given
history ht, there exist V1 and V0 such that V(ht, π) = π1V1 + (1 − π1)V0 for all π. V1 is
the expected payoff of the agent on the equilibrium path in the good state, and V0 is the
expected payoff of the agent on the equilibrium path in the bad state.
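The linearity of V(hᵗ, π) in the prior can be verified by brute force in a toy two-period example with fully persistent states (M the identity) and an agent who always works; the payment plan and parameters below are hypothetical.

```python
from itertools import product

# Check V(h, pi) = pi1 * V1 + (1 - pi1) * V0 for every prior.
pH, c, delta = 0.6, 0.1, 0.9
u = {  # utility u(w_y) of the payment after each outcome history
    (1,): 1.0, (0,): 0.2,
    (1, 1): 1.5, (1, 0): 0.3, (0, 1): 1.2, (0, 0): 0.1,
}

def value_given_state(good):
    """Agent's expected two-period payoff when the state is known."""
    p = pH if good else 0.0   # bad state: a good outcome never occurs
    total = 0.0
    for y1, y2 in product([1, 0], repeat=2):
        prob = (p if y1 else 1 - p) * (p if y2 else 1 - p)
        total += prob * (-c + u[(y1,)] + delta * (-c + u[(y1, y2)]))
    return total

def value_given_prior(pi1):
    """Agent's expected payoff evaluated directly under the prior."""
    total = 0.0
    for y1, y2 in product([1, 0], repeat=2):
        p_good = (pH if y1 else 1 - pH) * (pH if y2 else 1 - pH)
        p_bad = 0.0 if (y1 or y2) else 1.0   # bad state: both outcomes bad
        prob = pi1 * p_good + (1 - pi1) * p_bad
        total += prob * (-c + u[(y1,)] + delta * (-c + u[(y1, y2)]))
    return total

V1, V0 = value_given_state(True), value_given_state(False)
for pi1 in [0.0, 0.3, 0.7, 1.0]:
    assert abs(value_given_prior(pi1) - (pi1 * V1 + (1 - pi1) * V0)) < 1e-12
```

The equality holds exactly because each path probability under the prior is itself the π-weighted sum of the state-conditional path probabilities.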
The decomposition works for any matrix M . In Figure 2, the principal and the agent
don’t know which state they are in. If they are in the good state, they get the good outcome
with probability pH , and they get the bad outcome in the bad state. After the outcome is
realized, the state transits to the next state, and the transition probabilities are determined
by what state they are in currently. The state transition is exogenous, and it's not affected by the agent's effort or the realization of the outcome.3

[Figure 2: Decomposition for Partially Persistent States]

Given a history, the payoff stream
on the equilibrium path is determined, and we can decompose the continuation value by the
continuation value on the equilibrium path in each state: If the principal wants the agent
to work this period, the agent’s continuation value is given by
$$V(h^t, \pi) = -c + \pi_1 p_H\big(u(w_1) + \delta(M_{11}V_{11} + M_{12}V_{10})\big) + \pi_1(1-p_H)\big(u(w_0) + \delta(M_{11}V_{01} + M_{12}V_{00})\big) + (1-\pi_1)\times 0 + (1-\pi_1)\big(u(w_0) + \delta(M_{21}V_{01} + M_{22}V_{00})\big),$$

where Vyω′ is the agent's continuation value on the equilibrium path if the outcome this period is y and they are in state ω′ in the following period. w1 and w0 are the payments for the good outcome and the bad outcome, respectively.
Once we decompose the continuation values by the on-the-equilibrium-path continuation values in each state, we can express V(ht0, π̂) and V(ht0, π̃) with the same V11, V10, V01 and V00; the only difference between the two continuation values is that we use π̂ and π̃, respectively, as the weights.
3 In some circumstances, one can allow the transition probabilities to be endogenous. Section 4.2 discusses the endogenous states.

Proposition 1. Given a contract, fix any public history ht. The agent's expected payoff satisfies the following equations for any prior π if the principal and the agent follow the equilibrium strategies after ht. If the principal wants the agent to work this period, there exist V11, V10, V01 and V00 such that

$$V(h^t, \pi) = -c + \pi_1 p_H\big(u(w_1) + \delta(M_{11}V_{11} + M_{12}V_{10})\big) + \pi_1(1-p_H)\big(u(w_0) + \delta(M_{11}V_{01} + M_{12}V_{00})\big) + (1-\pi_1)\big(u(w_0) + \delta(M_{21}V_{01} + M_{22}V_{00})\big).$$

If the principal wants the agent to shirk this period, there exist V1, V0 such that for any prior π,

$$V(h^t, \pi) = \bar u + (\pi_1 M_{11} + (1-\pi_1)M_{21})\delta V_1 + (\pi_1 M_{12} + (1-\pi_1)M_{22})\delta V_0.$$

Similarly, let W(ht, π) be the principal's payoff after history ht given prior π when the principal and the agent follow the equilibrium strategies after ht. W(ht, π) satisfies the following equations. If the principal wants the agent to work this period, there exist W11, W10, W01 and W00 such that for any prior π,

$$W(h^t, \pi) = \pi_1 p_H\big(1 - w_1 + \delta(M_{11}W_{11} + M_{12}W_{10})\big) + \pi_1(1-p_H)\big({-w_0} + \delta(M_{11}W_{01} + M_{12}W_{00})\big) + (1-\pi_1)\big({-w_0} + \delta(M_{21}W_{01} + M_{22}W_{00})\big).$$

If the principal wants the agent to shirk this period, there exist W1, W0 such that for any prior π,

$$W(h^t, \pi) = \bar v + (\pi_1 M_{11} + (1-\pi_1)M_{21})\delta W_1 + (\pi_1 M_{12} + (1-\pi_1)M_{22})\delta W_0.$$
The decomposition of continuation value allows one to express the agent’s continuation
values on and off the equilibrium path with the same state variables when the agent does
a one-shot deviation. The only difference between the on- and off-the-equilibrium-path
continuation values is that the agent has different prior beliefs, which is the weight in the
linear combination of the state variables. Without the decomposition, one doesn't know the possible deviation payoffs of the agent when the principal offers a certain amount as the on-the-equilibrium-path continuation value, which is a problem when one considers the agent's IC constraint. The next section shows how the decomposition enters the IC constraints.
3.1.2 One-Shot Deviation Principle
The next step is to reduce the set of IC constraints. I show that the IC constraints for
one-shot deviations when the agent has never deviated before are sufficient conditions for
all IC constraints.
Given any history ht and prior π, the agent has infinitely many ways to deviate, and
there are an infinite number of IC constraints. Furthermore, if the agent has deviated before
reaching ht , the principal and the agent have different prior beliefs, and in order to account
for all potential prior beliefs the agent may have, we need to know the time index and
history ht , which prevents the recursive representation of the problem. However, I show
that the IC constraints for one-shot deviations when the agent has never deviated before
are sufficient, and at any point, it is sufficient to consider only one IC constraint.
Suppose after history ht , the principal has prior π. If the agent has never deviated
before, the principal and the agent share the common prior π. From Proposition 1, if the
principal wants the agent to work this period, we can decompose the agent’s continuation
value by V11 , V10 , V01 and V00 . The IC constraint for the one-shot deviation can be written
as
$$\begin{aligned} V(h^t, \pi) ={}& -c + \pi_1 p_H\big(u(w_1) + \delta(M_{11}V_{11} + M_{12}V_{10})\big) + \pi_1(1-p_H)\big(u(w_0) + \delta(M_{11}V_{01} + M_{12}V_{00})\big) \\ &+ (1-\pi_1)\big(u(w_0) + \delta(M_{21}V_{01} + M_{22}V_{00})\big) \\ \geq{}& \pi_1 p_L\big(u(w_1) + \delta(M_{11}V_{11} + M_{12}V_{10})\big) + \pi_1(1-p_L)\big(u(w_0) + \delta(M_{11}V_{01} + M_{12}V_{00})\big) \\ &+ (1-\pi_1)\big(u(w_0) + \delta(M_{21}V_{01} + M_{22}V_{00})\big). \end{aligned}$$
Note that the continuation values from the following period and on are V11 , V10 , V01 and
V00 , and they don’t change when the agent deviates. The agent’s deviation affects the
distribution of outcomes in this period, but the agent’s expected payoffs in the continuation
game are completely determined by the outcome this period and the state in the next
period. The state transition is also exogenous, and the probabilities are independent of the
agent’s action.
We can simplify the IC constraint as

$$\pi_1 (p_H - p_L)\Big(\big(u(w_1) + \delta(M_{11}V_{11} + M_{12}V_{10})\big) - \big(u(w_0) + \delta(M_{11}V_{01} + M_{12}V_{00})\big)\Big) \geq c. \tag{1}$$
From Assumptions 1 and 2, the agent has prior π̂ with π̂1 ≥ π1 if he has deviated before.
The agent’s IC constraint for the one-shot deviation becomes
$$\hat\pi_1 (p_H - p_L)\Big(\big(u(w_1) + \delta(M_{11}V_{11} + M_{12}V_{10})\big) - \big(u(w_0) + \delta(M_{11}V_{01} + M_{12}V_{00})\big)\Big) \geq c,$$
and it is satisfied whenever (1) holds. Therefore, the IC constraints for one-shot deviations
when the agent has never deviated before are sufficient conditions for all IC constraints for
one-shot deviations. Since the principal has within-period commitment power, the agent’s
payoff is bounded from above, and together with limited liability, we have continuity at
infinity. Therefore, the IC constraints for one-shot deviations are sufficient conditions for
all IC constraints, and we have the following proposition.
Proposition 2. In the binary environment with pL < pH , IC constraints for one-shot
deviations when the agent has never deviated before are sufficient conditions for all IC
constraints.
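The sufficiency argument rests on the slack in (1) being increasing in the prior whenever the constraint binds with c > 0. A minimal numerical sketch, with hypothetical payments and continuation values and a risk-neutral agent for simplicity:

```python
def ic_slack(pi1, w1, w0, V11, V10, V01, V00, M, pH, pL, delta, c):
    """Left side minus right side of IC constraint (1); nonnegative slack
    means the constraint holds (risk-neutral agent: u(w) = w)."""
    gain = (w1 + delta * (M[0][0] * V11 + M[0][1] * V10)) \
         - (w0 + delta * (M[0][0] * V01 + M[0][1] * V00))
    return pi1 * (pH - pL) * gain - c

# Hypothetical numbers: if the IC holds at the common prior pi1, it also
# holds at any more optimistic prior pi_hat1 >= pi1, because c > 0 forces
# the bracketed gain to be positive, making the slack increasing in the prior.
M = [[0.9, 0.1], [0.2, 0.8]]
args = dict(w1=1.0, w0=0.0, V11=2.0, V10=0.5, V01=1.0, V00=0.2,
            M=M, pH=0.6, pL=0.2, delta=0.9, c=0.1)
assert ic_slack(pi1=0.5, **args) >= 0                       # holds at common prior
assert ic_slack(pi1=0.7, **args) >= ic_slack(pi1=0.5, **args)  # monotone in prior
```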
3.1.3 Recursive Formulation: Binary Environment
This section sets up the algorithm. After history ht , suppose the principal has prior π and
he wants the agent to work this period. The principal’s payoff should be weakly greater
than his outside option,
$$W = \pi_1 p_H\big(1 - w_1 + \delta(M_{11}W_{11} + M_{12}W_{10})\big) + \pi_1(1-p_H)\big({-w_0} + \delta(M_{11}W_{01} + M_{12}W_{00})\big) + (1-\pi_1)\big({-w_0} + \delta(M_{21}W_{01} + M_{22}W_{00})\big) \geq \frac{\bar v}{1-\delta}. \tag{2}$$
The agent has the following three constraints:
$$\text{(IC)}:\quad \pi_1(p_H - p_L)\Big(\big(u(w_1) + \delta(M_{11}V_{11} + M_{12}V_{10})\big) - \big(u(w_0) + \delta(M_{11}V_{01} + M_{12}V_{00})\big)\Big) \geq c, \tag{3}$$

$$\text{(IR)}:\quad V = -c + \pi_1 p_H\big(u(w_1) + \delta(M_{11}V_{11} + M_{12}V_{10})\big) + \pi_1(1-p_H)\big(u(w_0) + \delta(M_{11}V_{01} + M_{12}V_{00})\big) + (1-\pi_1)\big(u(w_0) + \delta(M_{21}V_{01} + M_{22}V_{00})\big) \geq \frac{\bar u}{1-\delta}, \tag{4}$$

$$\text{(LL)}:\quad w_1, w_0 \geq 0. \tag{5}$$
One can decompose the continuation values of the agent by Proposition 1, and it is sufficient
to consider one IC constraint by Proposition 2. From the decomposition, we also know that
the following four equalities should hold:
$$W_1 = p_H\big(1 - w_1 + \delta(M_{11}W_{11} + M_{12}W_{10})\big) + (1-p_H)\big({-w_0} + \delta(M_{11}W_{01} + M_{12}W_{00})\big), \tag{6}$$
$$W_0 = -w_0 + \delta(M_{21}W_{01} + M_{22}W_{00}), \tag{7}$$
$$V_1 = p_H\big(u(w_1) + \delta(M_{11}V_{11} + M_{12}V_{10})\big) + (1-p_H)\big(u(w_0) + \delta(M_{11}V_{01} + M_{12}V_{00})\big), \tag{8}$$
$$V_0 = u(w_0) + \delta(M_{21}V_{01} + M_{22}V_{00}). \tag{9}$$
The set of continuation values the principal can get when he wants the agent to work this
period is given by
$$S_w = \Big\{(\pi, W_1, W_0, V_1, V_0)\ \Big|\ \exists\,(\pi', W_{11}, W_{10}, V_{11}, V_{10}),\,(\hat\pi, W_{01}, W_{00}, V_{01}, V_{00}) \in S,$$
$$\pi' = (1,0)M,\qquad \hat\pi = \left(\frac{\pi_1(1-p_H)}{\pi_1(1-p_H)+(1-\pi_1)},\; \frac{1-\pi_1}{\pi_1(1-p_H)+(1-\pi_1)}\right)M,$$
$$W = \pi \begin{pmatrix} W_1 \\ W_0 \end{pmatrix},\qquad V = \pi \begin{pmatrix} V_1 \\ V_0 \end{pmatrix},\qquad \text{(2)--(9) hold}\Big\},$$

where S is the set of all (π, W1, W0, V1, V0) the principal can implement.
If the principal takes his outside option this period, both the principal and the agent
get their outside options and continue to the next period. The set of continuation values is
given by
$$S_o = \Big\{(\pi, W_1, W_0, V_1, V_0)\ \Big|\ \exists\,(\check\pi, \check W_1, \check W_0, \check V_1, \check V_0) \in S,\ \check\pi = \pi M,$$
$$V_1 = \bar u + M_{11}\delta \check V_1 + M_{12}\delta \check V_0,\qquad V_0 = \bar u + M_{21}\delta \check V_1 + M_{22}\delta \check V_0,$$
$$W_1 = \bar v + M_{11}\delta \check W_1 + M_{12}\delta \check W_0,\qquad W_0 = \bar v + M_{21}\delta \check W_1 + M_{22}\delta \check W_0\Big\}.$$
Proposition 3. The set of payoffs the principal can implement is given by the largest self-generating set S = Sw ∪ So. (π, W1, W0, V1, V0) ∈ S corresponds to a contract starting with prior π; the principal gets W1 if he's in the good state and W0 if he's in the bad state. V1, V0 are the agent's payoffs in the good state and the bad state, respectively. The principal's expected payoff is $\pi \begin{pmatrix} W_1 \\ W_0 \end{pmatrix}$, and the agent's expected payoff is $\pi \begin{pmatrix} V_1 \\ V_0 \end{pmatrix}$.
3.2 General Case
This section develops the recursive formulation for the model in Section 2. The main
steps are the decomposition of continuation values and the one-shot deviation principle.
Assumptions 1 to 3 hold in this section.
In the binary environment, I decomposed the continuation values of the agent by the
continuation values on the equilibrium path given outcome y and state ω. Since there were
two outcomes and two states, the continuation value could be decomposed as a linear combination of four values, Vyω, for any given prior π. The decomposition works similarly when there
is a continuum of states and a continuum of outcomes. Let Vyω be the continuation value
of the agent after outcome y when the state in the next period is ω and both the principal
and the agent follow the equilibrium strategies from this period on. The state transition is
exogenous, and the history of outcomes determines the payments in the future. Let wy be
the payment for outcome y in the current period, and we can write the agent’s payoff as
follows:
$$V(h^t, \pi) = -c_i + \int \pi(\omega)\big(u(w_y) + \delta m(\omega'|\omega)V_{y\omega'}\big)\, dF(y|a_i,\omega)\, d\omega'\, d\omega,$$
when the agent takes action ai .
The next step is the sufficiency of IC constraints for one-shot deviations when the agent
has never deviated before. From Assumption 3, we know
∫ (u(wy) + δm(ω′|ω)Vyω′) dF(y|a, ω) dω′
is supermodular in a and ω. Consider the IC constraint for the one-shot deviation from ai
to aj . From Assumption 2, j < i, and the IC constraint is given by
V(ht, π) = −ci + ∫ π(ω)(u(wy) + δm(ω′|ω)Vyω′) dF(y|ai, ω) dω′ dω
≥ −cj + ∫ π(ω)(u(wy) + δm(ω′|ω)Vyω′) dF(y|aj, ω) dω′ dω.
We can rewrite the IC constraint as
∫ π(ω)(u(wy) + δm(ω′|ω)Vyω′)(dF(y|ai, ω) − dF(y|aj, ω)) dω′ dω ≥ ci − cj.
Supermodularity implies that ∫ (u(wy) + δm(ω′|ω)Vyω′)(dF(y|ai, ω) − dF(y|aj, ω)) dω′ increases with ω. If the agent has deviated previously, the agent took a less costly action, and Assumption 1 implies that the agent's prior π̂ dominates the principal's prior π in the sense of first-order stochastic dominance. Together with the supermodularity, we know
∫ π̂(ω)(u(wy) + δm(ω′|ω)Vyω′)(dF(y|ai, ω) − dF(y|aj, ω)) dω′ dω
≥ ∫ π(ω)(u(wy) + δm(ω′|ω)Vyω′)(dF(y|ai, ω) − dF(y|aj, ω)) dω′ dω
≥ ci − cj,
and the agent will work again in the current period even if he has deviated before. Since
the principal has within-period commitment power and the agent has limited liability, the
agent’s continuation values are bounded, and we have continuity at infinity. The IC constraints for one-shot deviations when the agent has never deviated before are sufficient
conditions for all IC constraints.
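The key monotonicity step is easy to see numerically: if g(ω) = ∫ (u(wy) + δm(ω′|ω)Vyω′)(dF(y|ai, ω) − dF(y|aj, ω)) dω′ is increasing in ω, then any prior π̂ that first-order stochastically dominates π gives a weakly larger integral. The grid, g, and both priors below are illustrative.

```python
import numpy as np

states = np.linspace(0, 1, 6)
g = 0.5 + states**2                       # any increasing function of the state

pi = np.array([0.3, 0.25, 0.2, 0.15, 0.07, 0.03])
pi_hat = np.array([0.1, 0.15, 0.2, 0.25, 0.2, 0.1])   # FOSD-dominates pi

def fosd_dominates(p, q):
    """True if p first-order stochastically dominates q on an ordered grid,
    i.e. the CDF of p lies weakly below the CDF of q."""
    return bool(np.all(np.cumsum(p) <= np.cumsum(q) + 1e-12))

lhs = pi_hat @ g                          # deviator's expected incentive term
rhs = pi @ g                              # on-path expected incentive term
```

Since lhs ≥ rhs whenever g is increasing and pi_hat dominates pi, an IC constraint that holds under the common prior also holds under the more optimistic off-path prior.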
Proposition 4. IC constraints for one-shot deviations when the agent has never deviated
before are sufficient conditions for all IC constraints.
We can find the optimal contract using the following recursive formulation. Let S0, S1, · · · , Sn be defined by
S0 = {(π, W, V) | ∃(π̌, W̌, V̌) ∈ S such that
π̌ = πM,
W(ω) = v̄ + δ ∫ m(ω′|ω)W̌(ω′) dω′,
V(ω) = ū + δ ∫ m(ω′|ω)V̌(ω′) dω′},
Si = {(π, W, V) | ∃(π^y, W^y, V^y) ∈ S, ∀y ∈ Y, such that
π^y is updated using Bayes' rule with action ai and outcome y,
W(ω) = ∫ (−wy + m(ω′|ω)W^y(ω′)) dω′ dF(y|ai, ω),
V(ω) = ∫ (u(wy) + m(ω′|ω)V^y(ω′)) dω′ dF(y|ai, ω),
∫ π(ω)W(ω) dω ≥ v̄/(1 − δ),
∫ π(ω)V(ω) dω ≥ ū/(1 − δ),
∫ π(ω)(u(wy) + δm(ω′|ω)V^y(ω′))(dF(y|ai, ω) − dF(y|aj, ω)) dω′ dω ≥ ci − cj, ∀j < i,
wy ≥ 0, ∀y ∈ Y}, ∀i = 1, · · · , n,
where (π, W, V) ∈ S satisfies W, V : Ω → R and wy is the payment for outcome y in this period.
Proposition 5. S is the largest self-generating set, and S = S0 ∪ S1 ∪ · · · ∪ Sn.
The above recursive formulation requires Assumptions 1 to 3. Assumption 3 is about
the agent’s payoff under the optimal contract, which is an endogenous object. However,
we can consider the relaxed problem without Assumption 3 and verify ex-post that the
optimal contract does satisfy the assumption. If we don’t require Assumption 3, we don’t
know whether the IC constraints for one-shot deviations when the agent has never deviated
are sufficient conditions for all IC constraints. The recursive formulation with only the IC
constraints for one-shot deviations when the agent has never deviated before is a relaxed
problem of the original problem of finding the optimal contract. But if we verify ex-post
that the optimal contract of the relaxed problem does satisfy Assumption 3, then it is the
optimal contract of the original problem.
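The membership check behind the sets Si can be sketched as follows: given candidate continuation tuples (W^y, V^y) drawn from S and current payments wy, construct (W, V) from the accounting identities and test IR, the downward IC constraint, and limited liability. All primitives (grids, f, m, u, costs) are illustrative, and discounting of the continuation values is written explicitly.

```python
import numpy as np

delta, u_bar, v_bar = 0.9, 0.0, 0.0
ns = 4
states = np.linspace(0.0, 1.0, ns)
u = np.sqrt                                       # illustrative utility

f = {1: np.stack([1 - 0.5 * states, 0.5 * states]),   # f[a][y, w] = Pr(y|a, w)
     2: np.stack([1 - 0.8 * states, 0.8 * states])}
costs = {1: 0.05, 2: 0.10}
m = np.full((ns, ns), 1.0 / ns)                   # m[w', w]: transition density
w_y = np.array([0.0, 0.3])                        # payments for each outcome

rng = np.random.default_rng(1)
V_y = rng.uniform(0.0, 1.0, (2, ns))              # candidate V^y(w') from S
W_y = rng.uniform(0.0, 2.0, (2, ns))              # candidate W^y(w') from S
pi = np.full(ns, 1.0 / ns)

i, j = 2, 1                                       # recommended action, deviation
cont_V = np.einsum('pw,yp->yw', m, V_y)           # sum_{w'} m(w'|w) V^y(w')
cont_W = np.einsum('pw,yp->yw', m, W_y)
# accounting identities defining W(w) and V(w)
W = np.einsum('yw,yw->w', -w_y[:, None] + delta * cont_W, f[i])
V = np.einsum('yw,yw->w', u(w_y)[:, None] + delta * cont_V, f[i])

ir_principal = pi @ W >= v_bar / (1 - delta)      # principal's IR
ir_agent = pi @ V >= u_bar / (1 - delta)          # agent's IR
# downward IC constraint from a_i to a_j
gain = np.einsum('w,yw,yw->', pi, u(w_y)[:, None] + delta * cont_V, f[i] - f[j])
ic = gain >= costs[i] - costs[j]
ll = bool(np.all(w_y >= 0))                       # limited liability
```

A tuple passes into Si only if all four flags hold; iterating this check over candidate tuples approximates the largest self-generating set.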
4 Discussion
This section extends the algorithm to different commitment powers of the principal and
endogenous state transition.
4.1 Commitment Power
The model in Section 2 assumes that the principal has within-period commitment power. This can be generalized to the case of no commitment power.
4.1.1 No-commitment Power
Suppose the principal has no commitment power and one is interested in the set of all perfect Bayesian equilibria. All constraints from the within-period commitment case must still be satisfied when the principal has no commitment power. In addition, the
principal should be willing to make the payment he promised. The payment at any point
is bounded from above by the difference in the continuation value of the principal and
his outside option. The algorithm with this promise-keeping constraint finds the set of all
perfect Bayesian equilibria. Since the agent’s payoff is bounded, we still have continuity at
infinity, and the one-shot deviation principle holds.
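One natural formalization of the extra promise-keeping constraint is the following sketch: a promised payment wy is credible only if paying and continuing with value W_cont weakly beats reneging and taking the outside option v̄ forever. The bound and all numbers are illustrative.

```python
delta, v_bar = 0.9, 0.0

def payment_credible(w_y, W_cont):
    """w_y is credible iff w_y <= delta * W_cont - v_bar / (1 - delta):
    the payment cannot exceed the gap between the principal's discounted
    continuation value and his outside option value."""
    return w_y <= delta * W_cont - v_bar / (1 - delta)
```

For instance, with W_cont = 2.0 a payment of 0.5 is credible, while a payment of 2.0 is not; adding this inequality to the constraint set of each Si yields the no-commitment version of the algorithm.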
4.1.2 Full-commitment Power
Suppose the principal has full-commitment power. The continuation value of the principal
doesn’t have to be above his outside option, and we can drop the constraint on the principal’s
payoff. For the agent, if he also commits to the contract, then IR binds only in the very first period. Otherwise, if the agent can quit at any time, he faces the same constraints as in Section 3. With full commitment power, we don't necessarily know that the agent's continuation value is bounded from above, and we have to show continuity at infinity.
In particular, if the agent is risk-neutral, the principal can always delay the payment at
no cost, and we need to restrict the class of contracts such that the principal makes the
payment at the earliest possible time. As long as we are interested in the principal’s payoff
in period 0, this is without loss of generality. When the agent is risk-averse, the principal has incentives to smooth the agent's consumption. I conjecture that the agent's payoff is bounded under the optimal contract, but this remains to be shown formally.
4.2 Endogenous State Transition
Consider the binary environment from Section 3.1. The algorithm generalizes when the state transition from the good state depends on the agent's action. Specifically, let M1j^a be the probability of transitioning from state 1 to state j when the agent takes action a. Given the state in the following period, one can decompose the agent's continuation value by state, and the IC constraint for the one-shot deviation when the agent has never deviated before is given as follows:
V(ht, π) = −c + π1 pH(u(w1) + δ(M11^1 V11 + M12^1 V10)) + π1(1 − pH)(u(w0) + δ(M11^1 V01 + M12^1 V00)) + (1 − π1)(u(w0) + δ(M21 V01 + M22 V00))
≥ π1 pL(u(w1) + δ(M11^0 V11 + M12^0 V10)) + π1(1 − pL)(u(w0) + δ(M11^0 V01 + M12^0 V00)) + (1 − π1)(u(w0) + δ(M21 V01 + M22 V00)).
The IC constraint simplifies to
π1(pH(u(w1) + δ(M11^1 V11 + M12^1 V10)) + (1 − pH)(u(w0) + δ(M11^1 V01 + M12^1 V00)) − pL(u(w1) + δ(M11^0 V11 + M12^0 V10)) − (1 − pL)(u(w0) + δ(M11^0 V01 + M12^0 V00))) ≥ c.   (10)
If the agent has deviated before, his prior satisfies π̂1 ≥ π1 , and the agent will work again.
The one-shot deviation principle holds in this environment, and the algorithm with (10) as
the IC constraint characterizes the set of all equilibrium payoffs the principal can implement.
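Constraint (10) is straightforward to check numerically. In the sketch below, M1[a] holds the transition row (M11^a, M12^a) out of the good state under action a; all parameter values and continuation values are illustrative, not calibrated to anything in the paper.

```python
import numpy as np

delta, c, pi1 = 0.9, 0.1, 0.7
pH, pL = 0.8, 0.4
u = lambda w: np.sqrt(w)                  # illustrative utility
w1, w0 = 1.0, 0.0
M1 = {1: np.array([0.9, 0.1]),            # work: good state more persistent
      0: np.array([0.6, 0.4])}            # shirk
V11, V10, V01, V00 = 1.2, 0.4, 0.9, 0.2   # continuation values by (y, state)

def good_state_value(p, Ma):
    """Expected flow in the good state: outcome 1 w.p. p, transition row Ma."""
    after_1 = u(w1) + delta * (Ma[0] * V11 + Ma[1] * V10)
    after_0 = u(w0) + delta * (Ma[0] * V01 + Ma[1] * V00)
    return p * after_1 + (1 - p) * after_0

# LHS of (10): the bad-state terms cancel, so only the good state matters.
lhs = pi1 * (good_state_value(pH, M1[1]) - good_state_value(pL, M1[0]))
ic_holds = lhs >= c
```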
5 Conclusion
I develop an algorithm to find the optimal contract when the principal and the agent learn
the payoff-relevant state over time. The principal and the agent start with symmetric
information about the underlying state, and every period after observing the outcome, they
update their beliefs about the state. The updating depends on the action of the agent, and
off the equilibrium path, the principal and the agent have asymmetric information, which leads to an information rent.
Since the agent’s continuation value depends on both the payments in the future and his
prior in the current period, the agent’s deviation payoff off the equilibrium path takes into
account the agent’s prior after a deviation. The first innovation of this paper is to decompose
the agent’s continuation value by the continuation value in each state. If the principal and
the agent play the equilibrium strategies from this period on, the principal’s and the agent’s
continuation values can be written as a linear combination of the continuation values in each
state. The prior at the beginning of this period enters as weights in the linear combination.
Once one decomposes the continuation values, one can also show the one-shot deviation
principle. What I show in this paper is stronger than the usual one-shot deviation principle in the sense that it suffices to consider one-shot deviations when the agent has never deviated before.
If the agent has never deviated before, then the principal and the agent have the same prior,
and there is only one IC constraint for the one-shot deviation. The key assumptions for
the one-shot deviation principle are downward deviations, supermodularity of the agent’s
payoff and the assumption that the agent is weakly more optimistic than the principal after
a deviation.
Together, the decomposition and the one-shot deviation principle allow one to formulate
the principal’s problem recursively. My algorithm finds the set of all PBEs the principal
can implement.
In the binary environment when the supermodularity is already satisfied, the principal
has incentives to learn the state cheaply by letting the agent shirk. The principal loses the output, but he doesn't have to leave any rent, so he doesn't necessarily want to frontload all the effort. Characterizing the qualitative properties of the optimal contract when the principal can learn without leaving rent is one direction for future research.
Another topic for further research is to find primitive conditions for supermodularity. It is satisfied in the binary environment, but in a more general environment, it is an assumption on endogenous objects. It would be useful to find primitive conditions under which the agent's payoff under the optimal contract is supermodular.
References
[1] Abreu, Dilip, David Pearce, and Ennio Stacchetti. 1990. “Toward a Theory of Discounted Repeated Games with Imperfect Monitoring.” Econometrica 58 (5): 1041-1063.
[2] Athey, Susan, and Kyle Bagwell. 2008. “Collusion with Persistent Cost Shocks.”
Econometrica 76 (3): 493-540.
[3] Battaglini, Marco. 2005. “Long-Term Contracting with Markovian Consumers.”
American Economic Review 95 (3): 637-658.
[4] Bergemann, Dirk, and Ulrich Hege. 1998. “Venture capital financing, moral hazard, and learning.” Journal of Banking and Finance 22 (6): 703-735.
[5] Bergemann, Dirk, and Ulrich Hege. 2005. “The Financing of Innovation: Learning
and Stopping.” RAND Journal of Economics 36 (4): 719-752.
[6] Bhaskar, V. 2012. “Dynamic Moral Hazard, Learning and Belief Manipulation.”
http://www.ucl.ac.uk/∼uctpvbh.
[7] Cabrales, Antonio, Antoni Calvó-Armengol, and Nicola Pavoni. 2008. “Social Preferences, Skill Segregation, and Wage Dynamics.” Review of Economic Studies 75 (1): 65-98.
[8] DeMarzo, Peter M., and Yuliy Sannikov. 2011. “Learning, Termination, and Payout Policy in Dynamic Incentive Contracts.” http://www.princeton.edu/∼sannikov.
[9] Escobar, Juan F., and Juuso Toikka. 2012. “Efficiency in Games with Markovian
Private Information.” http://economics.mit.edu/faculty/toikka.
[10] Farhi, Emmanuel, and Iván Werning. 2013. “Insurance and taxation over the life
cycle.” Review of Economic Studies 80 (2): 596-635.
[11] Fernandes, Ana, and Christopher Phelan. 2000. “A Recursive Formulation for
Repeated Agency with History Dependence.” Journal of Economic Theory 91 (2):
223-247.
[12] Garrett, Daniel, and Alessandro Pavan. 2012. “Managerial Turnover in a Changing World.” Journal of Political Economy 120 (5): 879-925.
[13] Garrett, Daniel, and Alessandro Pavan. 2013. “Dynamic Managerial Compensation: On the Optimality of Seniority-based Schemes.” http://faculty.wcas.northwestern.edu/∼apa522.
[14] Golosov, Mikhail, Maxim Troshkin, and Aleh Tsyvinski. 2011. “Optimal dynamic taxes.” National Bureau of Economic Research No. w17642.
[15] Green, Edward. 1987. “Lending and the Smoothing of Uninsurable Income.” Contractual arrangements for intertemporal trade 1: 3-25.
[16] Halac, Marina, Navin Kartik, and Qingmin Liu. 2012. “Optimal Contracts for
Experimentation.” http://www0.gsb.columbia.edu/faculty/mhalac/research.html.
[17] Harris, Milton, and Bengt Holmström. 1982. “A Theory of Wage Dynamics.”
Review of Economic Studies 49 (3): 315-333.
[18] He, Zhiguo, Bin Wei, and Jianfeng Yu. 2013. “Optimal Long-term Contracting
with Learning.” http://faculty.chicagobooth.edu/zhiguo.he/pubs.html.
[19] Holmström, Bengt. 1982. “Managerial Incentive Schemes—A Dynamic Perspective.” In Essays in Economics and Management in Honor of Lars Wahlbeck. Helsinki:
Swedish School of Economics.
[20] Hörner, Johannes, and Larry Samuelson. 2012. “Incentives for Experimenting
Agents.” https://sites.google.com/site/jo4horner.
[21] Jovanovic, Boyan, and Julien Prat. 2013. “Dynamic Contracts when Agent’s Quality is Unknown.” Forthcoming in Theoretical Economics.
[22] Kwon, Suehyun. 2013. “Dynamic Moral Hazard with Persistent States.” http://www.ucl.ac.uk/∼uctpskw.
[23] Spear, Stephen E., and Sanjay Srivastava. 1987. “On Repeated Moral Hazard
with Discounting.” Review of Economic Studies 54 (4): 599-617.
[24] Strulovici, Bruno. 2011. “Contracts, Information Persistence, and Renegotiation.”
http://faculty.wcas.northwestern.edu/∼bhs675/
[25] Tchistyi, Alexei. 2013. “Security Design with Correlated Hidden Cash Flows: The
Optimality of Performance Pricing.” http://faculty.haas.berkeley.edu/Tchistyi.
[26] Williams, Noah. 2011. “Persistent Private Information.” Econometrica 79 (4): 1233-1275.
[27] Zhang, Yuzhe. 2009. “Dynamic Contracting with Persistent Shocks.” Journal of Economic Theory 144 (2): 635-675.
A Proofs
Proof of Proposition 1. Suppose the principal and the agent follow the equilibrium strategies after ht under a given contract and the principal wants the agent to work this period. If
the principal and the agent are in ω1 in the following period, there is no information asymmetry about the probability distributions of the outcome and the state transition in the
future. The history ht and the outcome this period determine the payment stream in the
future, and there exist V11 , V01 , W11 , W01 such that Vy1 is the agent’s payoff after outcome
y. Wy1 is the principal’s payoff after outcome y. Similarly, one can define V10 , V00 , W10 , W00
if the state in the next period is ω0 .
The prior π after ht doesn’t affect the continuation values from the following period,
and π only affects the probability of transition into each state in the next period. Therefore,
the agent’s payoff satisfies
V (ht , π) = − c + π1 pH (u(w1 ) + δ(M11 V11 + M12 V10 )) + π1 (1 − pH )(u(w0 ) + δ(M11 V01 + M12 V00 ))
+ (1 − π1 )(u(w0 ) + δ(M21 V01 + M22 V00 )),
and the principal’s payoff satisfies
W (ht , π) =π1 pH (1 − w1 + δ(M11 W11 + M12 W10 )) + π1 (1 − pH )(−w0 + δ(M11 W01 + M12 W00 ))
+ (1 − π1 )(−w0 + δ(M21 W01 + M22 W00 )).
If the principal wants the agent to shirk this period, they get their outside options
in the current period, and if the principal and the agent follow the equilibrium strategies
from the following period, the continuation values from the following period are completely
determined by what state they are in next period. There exist V1 , V0 , W1 , W0 such that for
any prior π,
V (ht , π) = ū + (π1 M11 + (1 − π1 )M21 )δV1 + (π1 M12 + (1 − π1 )M22 )δV0 ,
W (ht , π) = v̄ + (π1 M11 + (1 − π1 )M21 )δW1 + (π1 M12 + (1 − π1 )M22 )δW0 ,
where Vi and Wi are the agent’s payoff and the principal’s payoff if they’re in ωi next
period.
Proof of Proposition 2. Suppose after history ht , the principal has prior π. If the agent
has never deviated before, the principal and the agent share the common prior π. From
Proposition 1, if the principal wants the agent to work this period, we can decompose the
agent’s continuation value by V11 , V10 , V01 and V00 . The IC constraint for the one-shot
deviation can be written as
V (ht , π) = − c + π1 pH (u(w1 ) + δ(M11 V11 + M12 V10 )) + π1 (1 − pH )(u(w0 ) + δ(M11 V01 + M12 V00 ))
+ (1 − π1 )(u(w0 ) + δ(M21 V01 + M22 V00 ))
≥ π1 pL (u(w1 ) + δ(M11 V11 + M12 V10 )) + π1 (1 − pL )(u(w0 ) + δ(M11 V01 + M12 V00 ))
+ (1 − π1 )(u(w0 ) + δ(M21 V01 + M22 V00 )).
Note that the continuation values from the following period and on are V11 , V10 , V01 and
V00 , and they don’t change when the agent deviates. The agent’s deviation affects the
distribution of outcomes in this period, but the agent’s expected payoffs in the continuation
game are completely determined by the outcome this period and the state in the next
period. The state transition is also exogenous, and the probabilities are independent of the
agent’s action.
The IC constraint can be rewritten as
π1 (pH − pL )((u(w1 ) + δ(M11 V11 + M12 V10 )) − (u(w0 ) + δ(M11 V01 + M12 V00 ))) ≥ c,
and from Assumptions 1 and 2, the agent’s prior π̂ satisfies π̂1 ≥ π1 if he has deviated
before. The agent’s IC constraint for the one-shot deviation becomes
π̂1 (pH − pL )((u(w1 ) + δ(M11 V11 + M12 V10 )) − (u(w0 ) + δ(M11 V01 + M12 V00 ))) ≥ c,
and it is satisfied whenever the IC constraint holds for π. Therefore, the IC constraints for
one-shot deviations when the agent has never deviated before are sufficient conditions for all
IC constraints for one-shot deviations. Since the principal has within-period commitment
power, the agent’s payoff is bounded from above, and together with limited liability, we have
continuity at infinity. Therefore, the IC constraints for one-shot deviations are sufficient
conditions for all IC constraints.
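The monotonicity used in the last step of the proof can be expressed in code: the binary IC constraint π1(pH − pL)∆ ≥ c is increasing in the prior π1 whenever ∆ ≥ 0 (which holds whenever the constraint is satisfied at all, since c > 0), so if it holds at the common prior it also holds at any more optimistic prior π̂1 ≥ π1. The numbers are illustrative.

```python
def ic_slack(pi1, Delta, pH=0.8, pL=0.4, c=0.1):
    """Slack in the binary IC constraint pi1 * (pH - pL) * Delta >= c,
    where Delta = (u(w1) + delta(...)) - (u(w0) + delta(...))."""
    return pi1 * (pH - pL) * Delta - c

Delta = 0.5          # assumed nonnegative, as implied by a satisfied IC
```

For example, ic_slack(0.6, Delta) is nonnegative, and ic_slack(0.8, Delta) is weakly larger, mirroring the step from π to π̂ in the proof.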
Proof of Proposition 3. At every period, the IR constraints for the principal and the agent
should hold. The agent also has the IC constraints, and he has limited liability. From
Proposition 2, it is sufficient to consider the IC constraint for the one-shot deviation when
the agent has never deviated before. The principal should also prefer offering the on-the-equilibrium-path contract to deviating. Since we are interested in the set of all
PBEs, there is no loss of generality in assuming that off the equilibrium path, the agent
chooses the least expensive action, and anticipating this, the principal takes his outside
option. Therefore, the IC constraint for the principal can be replaced with the IR constraint. The other
equalities are for accounting, and the set of all PBEs is the largest self-generating set subject
to the constraints.
Proof of Proposition 4. Consider the IC constraint for the one-shot deviation from ai to aj .
From Assumption 2, j < i, and the IC constraint is given by
V(ht, π) = −ci + ∫ π(ω)(u(wy) + δm(ω′|ω)Vyω′) dF(y|ai, ω) dω′ dω
≥ −cj + ∫ π(ω)(u(wy) + δm(ω′|ω)Vyω′) dF(y|aj, ω) dω′ dω.
We can rewrite the IC constraint as
∫ π(ω)(u(wy) + δm(ω′|ω)Vyω′)(dF(y|ai, ω) − dF(y|aj, ω)) dω′ dω ≥ ci − cj.
Supermodularity implies that ∫ (u(wy) + δm(ω′|ω)Vyω′)(dF(y|ai, ω) − dF(y|aj, ω)) dω′ increases with ω. If the agent has deviated previously, the agent took a less costly action, and Assumption 1 implies that the agent's prior π̂ dominates the principal's prior π in the sense of first-order stochastic dominance. Together with the supermodularity, we know
∫ π̂(ω)(u(wy) + δm(ω′|ω)Vyω′)(dF(y|ai, ω) − dF(y|aj, ω)) dω′ dω
≥ ∫ π(ω)(u(wy) + δm(ω′|ω)Vyω′)(dF(y|ai, ω) − dF(y|aj, ω)) dω′ dω
≥ ci − cj,
and the agent will work again in the current period even if he has deviated before. Since
the principal has within-period commitment power and the agent has limited liability, the
agent’s continuation values are bounded, and we have continuity at infinity. The IC constraints for one-shot deviations when the agent has never deviated before are sufficient
conditions for all IC constraints.
Proof of Proposition 5. The set of constraints is the same as in the binary environment.
From Proposition 4, it is sufficient to consider the IC constraint for the one-shot deviation
when the agent has never deviated before. Si is the set of payoffs the principal can implement
if he wants the agent to take action ai in the current period. S0 is the set of payoffs
when the principal takes his outside option. Therefore, the set of all PBEs is the union of
Si , i = 1, · · · , n and S0 .