
Play with fire: Experiments to find
the location of a catastrophic threshold
Florian K Diekert§
November 12, 2015¶
Abstract
Many dynamic systems exhibit tipping points – they fundamentally change their character once a critical value, or threshold, is crossed. A key aspect is that the threshold’s
location is almost always unknown. Because experimentation only reveals whether the
threshold has been crossed or not, learning is “affirmative”. When crossing the threshold is disastrous, in the sense that the post-event value is independent of the pre-event
state, affirmative learning implies that all experimentation is undertaken at once. However, the magnitude of the experiment is smaller the more valuable the current state.
The paper further shows that this feature of learning allows non-cooperative agents to
take advantage of the regime shift threat and coordinate on a cautious equilibrium that
preserves the resource with positive probability. If the safe status quo is sufficiently
valuable, players can even coordinate on the first-best.
Keywords: Catastrophic shifts; Tipping points; Learning; Dynamic Games.
JEL-Classifications: C73, D83, H41, Q20, Q54
§ Department of Economics and Centre for Ecological and Evolutionary Synthesis, University of Oslo,
PO-Box 1095, 3017 Oslo, Norway. Email: f.k.diekert@ibv.uio.no. This research is funded by NorMER, a
Nordic Centre of Excellence for Research on Marine Ecosystems and Resources under Climate Change.
¶ Job Market Paper; updated frequently. The most recent version is deposited at http://folk.uio.no/floriakd/papers/DiekertPlayFire.pdf.
Introduction
Suppose you want to go as far as possible, but any step too far would be the end.
How far would you go and how many steps would you take when you don’t know where the
catastrophic threshold is?
Many natural systems at the very foundation of human activity exhibit tipping points
whose location is unknown. In particular, the dynamics of the climate system may change
fundamentally once global warming exceeds a critical value. A rapid meltdown of the
Greenland Ice Sheet, a disintegration of the West-Antarctic Ice Sheet, or a shutdown of the
Gulf Stream all are possible tipping elements in the earth’s climate system (Lenton et al.,
2008). Any of these events may have large and long-lasting, or even disastrous, consequences
for life on this planet. Tipping points have also been documented in other natural systems
such as lakes, coral reefs, or woodlands (Scheffer et al., 2001). Economically important
regime shifts in the recent past include the collapse of the Canadian cod and Norwegian
herring fisheries (Frank et al., 2005; Amundsen and Bjørndal, 1999). Finally, a wide range
of applications from the social sphere are illustrated in the popular book by Gladwell (2000).
Given its importance, it is not surprising that there is a rapidly growing literature that
grapples with this issue and advances our understanding of how to manage dynamic systems
under regime shift risks (see below for a review of that literature). However, to date,
two aspects remain underexplored: learning about the unknown location of the threshold, and
the effect of strategic interaction. To provide sound policy advice, we need to know more
about how these two factors affect the aggregate use of resources under regime shift risk.
This paper proposes a tractable framework to analyze both learning and strategic interactions. I focus on the uncertainty about the threshold’s location by assuming that the
threshold value T is constant but unknown (with an arbitrary continuous probability distribution), which implies that if an exploration of the state space up to a value s has not
triggered the regime shift, it will also not do so in the future. The value s is henceforth
known to be safe. In the simplest setting, this means that it is socially optimal to either not
experiment at all, or to undertake all experimentation at once. The reason is that learning is
only “affirmative”: Having explored the state space between $s$ and $s'$, it is revealed whether
the state $s'$ is safe or not, but no new knowledge about the relative probability that the
threshold is located at, say, $s_1$ or $s_2$ (with $s_1, s_2 > s'$) has been acquired. Therefore, it does
not pay to experiment a second time. This feature of learning is remarkably robust to the
wide array of changes to the basic model structure that I consider in the first and third part
of the paper.
In the second part of the paper, I turn to a non-cooperative setting (but go back to the
simplest version of the system dynamics). I show that strategic agents can take advantage
of the regime shift threat and coordinate on a “cautious” equilibrium that preserves the
resource with positive probability (although experimentation is still inefficiently risky). If
the safe status quo is sufficiently valuable, players can even coordinate on the first-best.
Applications
The aim of this paper is to develop a generic theoretical model to systematically characterize
the effect of a disastrous regime shift on optimal and non-cooperative experimentation and
resource use. My research strategy is to model the occurrence of the regime shift as a
function of the current control variable and to simplify the underlying stock dynamics by
assuming that the current state equals the last state as long as the threshold has not been
crossed. This allows me to concentrate on the uncertainty about the location of the threshold
and to obtain clear analytical results. It also implies that the model most closely represents
“flow problems”.
A very important example of such a problem is saltwater intrusion that threatens freshwater reservoirs in many coastal regions around the world (Barlow and Reichard, 2010).
Due to strong pressure from population growth, managers may be tempted to increase the
extraction of water from the reservoir beyond what is known to be safe (which is given by
the current water table). Because the geological structures are often highly complex, and
the spoilage may be irreversible, a manager thus faces the trade-off that increasing extraction may trigger the catastrophic regime shift – or it may prove to be just fine, so that the
consumption level that is known to be safe is updated.
Another example of a relevant flow problem is harvesting from an annual fishery, such as fishing for shrimp or small pelagics.
The magnitude of the annual recruitment of individual shrimp/fish is largely controlled by
exogenous environmental factors and independent of the remaining stock in the last year –
unless the biomass has been depleted below its minimum viable population size.
In spite of the simplicity with which the underlying dynamics of the state variable are
modeled, it is also possible to apply the model to a “stock problem”. Consider climate
change: while it is the temperature level and not consumption directly that will cause the
climate system to tip into a potentially disastrous state, there is a one-to-one mapping from
consumption to temperature and thus to the probability of experiencing the regime shift. The
crucial point is that it is the current addition to the stock that causes tipping (even if the
event may occur only a long time in the future). In essence, this perspective reinterprets
consumption as the utility derived from the current stock of pollution and subsumes all non-catastrophic damages from an increased pollution stock in the utility function. In this way,
one can also think of land-use choices as a relevant example. The supporting services of an
ecosystem may stay relatively intact even as large parts of it are permanently transformed
and put to other uses (such as turning rainforest into palm-oil plantations). However, once
more than a critical part has been converted, the supporting mechanism collapses and the
ecosystem tips into a degraded state.
Although environmental problems arguably constitute a very important and direct application, my model is more general and can be used to shed light on a variety of processes.
For example, the testing of a large-scale machine or a power plant presents engineers with
a similar problem: To what level of pressure should they expose a machine to learn about
its resistance? When the machine is large and its breakdown costly, it will be optimal to
describe a lower bound for the level of pressure that is safe, but not actually find out at
which level the machine breaks.
Relation to the literature
This paper links to three strands of literature. First, it contributes to the literature on the
management of natural resources under regime shift risk by explicitly analyzing learning
about the location of a threshold. Second, it relates to the more general literature by
describing optimal experimentation in a set-up of “affirmative learning”. Third, the paper
extends the literature on coordination in the face of a catastrophic public bad, which has hitherto
been analyzed in a static setting, by showing that the sharp distinction between a known
and unknown location of the threshold vanishes in a dynamic context.
The pioneering contributions that analyze the economics of regime shifts in an environmental/resource context were Cropper (1976) and Kemp (1976). There are by now a good
dozen papers on the optimal management of renewable resources under the threat of an
irreversible regime shift (see, for example: Tsur and Zemel, 1995; Nævdal, 2006; Brozović
and Schlenker, 2011; Lemoine and Traeger, 2014). Polasky et al. (2011) summarize and
characterize the literature by means of a simple fishery model. They contrast whether the
regime shift implies a collapse of the resource or merely a reduction of its renewability, and
whether the probability of crossing the threshold is exogenous or depends on the state of the
system (i.e. it is endogenous). They show that resource extraction should be more cautious
when crossing the threshold implies a loss of renewability and the probability of crossing
the threshold depends on the state of the system. In contrast, exploitation should be more
aggressive when a regime shift implies a collapse of the resource and the probability of
crossing the threshold cannot be influenced. There is no change in optimal extraction for
the loss-of-renewability/exogenous-probability case and the results are ambiguous for the
collapse/endogenous-probability case.
Ren and Polasky (2014) scrutinize the loss-of-renewability/endogenous-probability case
for more general growth and utility functions. They point out that in addition to the risk-reduction effect of the linear model in Polasky et al. (2011), there may also be a consumption-smoothing and an investment effect. The consumption-smoothing effect gives incentives to
build up a higher resource stock, so that one has more should the future growth be reduced.
The investment effect gives incentives to draw down the resource stock as the rate of
return after the regime shift is lower. Lemoine and Traeger (2014) call the sum of these two
countervailing effects the “differential welfare effect”. In their climate-change application
the total effect is positive. As I consider a truly catastrophic event, this effect is zero by
construction in the baseline model of the current paper.
Until now the literature in resource economics has been predominantly occupied with
optimal management, leaving aside the central question of how agents’ strategic considerations influence and are influenced by the potential to trigger a disastrous regime shift.
Still, there are a few notable exceptions: Crépin and Lindahl (2009) analyze the classical
“tragedy of the commons” in a grazing game with complex feedbacks, focussing on open-loop
strategies. They find that, depending on the productivity of the rangeland, under- or overexploitation might occur. Kossioris et al. (2008) focus on feedback equilibria and analyze,
with help of numerical methods, non-cooperative pollution of a “shallow lake”. They show
that, as in most differential games with renewable resources, the outcome of the feedback
Nash equilibrium is in general worse than the open-loop equilibrium or the social optimum.
Ploeg and Zeeuw (2015b) compare the socially optimal carbon tax to the tax in the open-loop equilibrium under the threat of a productivity shock due to climate change. While
it would be optimal in their two-region model to have taxes that converge to a high level,
the players apply diverging taxes in the non-cooperative equilibrium: While the more developed and less vulnerable region has a lower carbon tax throughout, the less developed
and more vulnerable region has low taxes initially in order to build up a capital stock, but
will apply high taxes later on in order to reduce the chance of tipping the climate into
the unproductive regime. Fesselmeyer and Santugini (2013) introduce an exogenous event
risk into a non-cooperative renewable resource game à la Levhari and Mirman (1980). As
in the optimal management problem with an exogenous probability of a regime shift, the
impact of shifted resource dynamics is ambiguous: On the one hand, the threat of a less
productive resource induces a conservation motive for all players, but on the other hand, it
exacerbates the tragedy of the commons as the players do not take the risk externality into
account. Finally, Sakamoto (2014) has, by combining analytical and numerical methods,
analyzed a non-cooperative game with an endogenous regime shift hazard. He shows that
this setting may lead to more precautionary management, also in a strategic setting. Miller
and Nkuiya (2014) also combine analytical and numerical methods to investigate how an
exogenous or endogenous regime shift affects coalition formation in the Levhari-Mirman
model. They show that an endogenous regime shift hazard increases coalition sizes and it
allows the players, in some cases, to achieve full cooperation. Taken together, these studies
and the current paper show that the effect of a regime shift pulls in the same direction
in a non-cooperative setting as under optimal management. However, neither the literature
on optimal resource management under regime shift risk nor its non-cooperative counterpart has explicitly addressed learning about the unknown location of the tipping point.
There are two papers that discuss optimal experimentation in an environment of affirmative
learning: Rob (1991) studies optimal and competitive capacity expansion when market
demand is unknown. Rob finds that learning will take place over several periods. Costello
and Karp (2004) investigate optimal pollution quotas when abatement costs are unknown.
In line with the baseline model of the current paper, they find that any experimentation
takes place in the first period only.
The difference between Rob’s model on the one hand and Costello and Karp (2004) on
the other hand is the following: In Rob (1991), the information gained by an additional unit
of installed capital is small, but so is the cost. However, experimenting too much (in the
sense of installing more capital than is needed to satisfy the revealed demand) is very costly
compared to experimenting too little (so that the true size of the market remains unknown)
several times. Consequently, learning takes place gradually. In the competitive equilibrium,
learning is even slower due to the private nature of search costs but the public nature of
information. In Costello and Karp (2004), the information gain from an additional unit of
quota is small as well, but search costs are very high in the beginning and then decline.
In fact, the costs are zero once the quota is non-binding. Thus, although the costs of
an experiment are high, there is no additional harm from experimenting too much and it
is therefore optimal to search only once. In my model, the marginal gain from search is
bounded above and decreasing, but the marginal costs from search increase. The disastrous
regime shift occurs when the threshold is crossed, irrespective of how far the agents have
stepped over it. As in Costello and Karp (2004), there is thus no additional harm in
experimenting too much (but the costs of an experiment are increasing with its size in my
model), and it is therefore optimal to search only once.
I show that non-cooperative learning is more aggressive than socially optimal because
costs of search are public while the immediate gains are private. Moreover, I show that
experimentation is decreasing in the value of the state that is known to be safe: The more
the players know that they can safely consume, the less willing they will be to risk triggering
the regime shift by enlarging the set of consumption opportunities. This aspect has, to the
best of my knowledge, not yet been appreciated.
My discussion of learning about a reversible threshold also extends the pioneering treatment of Groeneveld et al. (2013). I show that their result, that the upper bound of the belief
about the threshold’s location is expanded only once, holds more generally for any concave
utility function and continuous probability density. But in contrast to what their numerical
forward simulation suggests, I find that learning occurs in at most a finite number of steps.
At this point it is important to highlight the difference between the current approach,
in which learning is “affirmative”, on the one hand, and, on the other hand, the literature
on strategic experimentation (e.g.: Bolton and Harris, 1999; Keller et al., 2005; Bonatti and
Hörner, 2015) and the literature on learning in a resource management context (e.g.: Kolstad
and Ulph, 2008; Agbo, 2014; Koulovatianos, 2015). In the latter two strands of literature,
learning is “informative” in the sense that the agents obtain a random sample on which they
base their inference about the state of the world. It pays to obtain repeated samples as
this improves the estimate (of course, the public nature of information introduces free-rider
incentives in a strategic setting).
Finally, the current paper is closely related to three articles that discuss how uncertainty about the threshold’s location affects whether a catastrophe can be avoided. Barrett (2013) shows that players in a linear-quadratic game are in most cases able to form
self-enforcing agreements that avoid catastrophic climate change when the location of the
threshold is known but not when it is unknown. Similarly, Aflaki (2013) analyzes a model
of a common-pool resource problem that is, in its essence, the same as the stage-game
developed in section II–1. Aflaki shows that an increase in uncertainty leads to increased
consumption, but that increased ambiguity may have the opposite effect. Bochet et al.
(2013) confirm the detrimental role of increased uncertainty in the stochastic variant of the
Nash Demand Game: Even though “cautious” and “dangerous” equilibria co-exist (as they
do in my model), they provide experimental evidence that participants in the lab are not
able to coordinate on the Pareto-dominant cautious equilibrium.1 However, the models in
Aflaki (2013), Barrett (2013), and Bochet et al. (2013) are all static and can therefore not
address the prospects of learning. Here, I show that the sharp distinction between known
and unknown location of a threshold vanishes in a dynamic context. More uncertainty still
leads to increased consumption, but this is now partly driven by the increased gain from
experimentation.
Analyzing how strategic interactions shape the exploitation pattern of a renewable resource under the threat of a disastrous regime shift is important beyond mere curiosity-driven
interest. It is probably fair to say that international relations are basically characterized by an absence of supranational enforcement mechanisms that would allow
binding agreements to be made. But also locally, within the jurisdiction of a given nation, control is seldom complete and the exploitation of many common pool resources is shaped by
strategic considerations. Extending our knowledge on the effect of looming regime shifts by
taking non-cooperative behavior into account is therefore a timely contribution to both the
scientific literature and the current policy debate.
Plan of the paper
The structure of the paper is simple. In the first part, I focus on learning and explore the
effect of various structural assumptions in the context of a first-best social planner problem.
In Part II, I focus on strategic interaction and to do so, I revert to the simplest version of the
dynamic model. Finally, I discuss extensions to the underlying model of resource dynamics
in Part III. All proofs are relegated to the Appendix.
1 Bochet et al. (2013, p.1) conclude that a “risk-taking society may emerge from the decentralized actions of risk-averse individuals”. Unfortunately, it is not clear from the description in their manuscript whether the participants were able to communicate. The latter has been shown to be a crucial factor for coordination in threshold public goods experiments (Tavoni et al., 2011; Barrett and Dannenberg, 2012). Hence, it may be that what they refer to as “societal risk taking” is simply the result of strategic uncertainty.
Part I – The social optimum
In this part of the paper, I introduce the basic modeling framework in a setting where a
social planner seeks to find the optimal strategy (section I–1). I first analyze the baseline
scenario of an unconstrained optimization in section I–2, showing that all experimentation is
undertaken at once, but that the magnitude of experimentation is smaller the more valuable
the current state. In section I–3, I develop a simple example for specific functional forms to
obtain closed-form solutions. Generalizations and extensions to the baseline scenario, such
as incorporating the potential reversibility of a regime shift, are discussed in section I–4.
Although the model that I present below is generic, and could fit many different applications, it may be useful to fix ideas by considering a real-world example of an underground
aquifer. The overall volume of freshwater in the reservoir is approximately known, and the
annual recharge, say due to rainfall or melting snow, is sufficient to fully replenish
it. However, the manager fears that extracting all the water in the reservoir may cause
the intrusion of saltwater. Further, suppose the underwater geology is complex so that it
is not known at which level of the water table saltwater intrusion would occur. (Suppose
further that the location of the threshold cannot be adequately learned by scientific investigation, or it is feared that the scientific exploration itself causes the intrusion of saltwater
by destabilizing the geological structure.) However, saltwater intrusion has not occurred in
the past, so that the current level of use is known to be safe. Thus, the manager now faces
the trade-off whether to expand the current consumption of water, or not. If she decides to
expand the current level of use, by how much should extraction increase, and in how many
steps should the expansion occur?
With this example in mind, let me now present the formal framework.
I–1
The model
To concentrate on learning and the uncertainty about the location of the threshold $T$, I strip
the model to its bare necessities: Utility at time $t$ is derived from consumption $c_t$ according
to a function $u$ (with $u' > 0$, $u'' \le 0$). Consumption cannot exceed the resource base, which
is given by $R$ (as long as the threshold has not been crossed).
I treat the threshold as constant, but its location is unknown. The social planner has
a prior belief F about its location, so that we are in a situation of risk (and not Knightian
uncertainty; more about the updating of beliefs below).
Time is discrete and indexed by t = 0, 1, .... The social planner seeks to maximize the
discounted sum of utilities, where the discount factor is given by β. To focus on the effect
of a catastrophic threshold, the resource dynamics are simple and given by equation (1):
$$R_0 = R; \qquad R_{t+1} = \begin{cases} R & \text{if } c_t \le T \\ r & \text{if } c_t > T \text{ or } R_t = r \end{cases} \tag{1}$$
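To make the mechanics of equation (1) concrete, the following is a minimal sketch in Python. The function names `next_resource` and `simulate`, as well as all numerical values, are illustrative assumptions and not part of the model itself.

```python
def next_resource(R_t, c_t, T, R, r):
    """One transition of equation (1).

    The resource base stays at R as long as consumption has never
    exceeded the threshold T; once c_t > T (or the base is already r),
    it is r forever. The constraint that consumption cannot exceed the
    resource base is not enforced in this sketch.
    """
    if R_t == r or c_t > T:
        return r
    return R


def simulate(consumption_path, T, R, r):
    """Return the path R_0, R_1, ... implied by a consumption path."""
    path = [R]  # R_0 = R
    for c_t in consumption_path:
        path.append(next_resource(path[-1], c_t, T, R, r))
    return path


# Hypothetical numbers: threshold at 3.5, resource base 10, collapsed base 0.
collapse_path = simulate([2, 3, 4, 3], T=3.5, R=10, r=0)
edge_path = simulate([3.5], T=3.5, R=10, r=0)
```

In this hypothetical run, the first consumption level above $T$ collapses the resource base to $r$ for all later periods, whereas consuming exactly $T$ (the second call) leaves it intact, which mirrors the weak inequality in equation (1).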
Note that the abstraction from the stock dynamics in absence of the regime shift does
not mean that they do not exist. One could very well imagine a more general, underlying,
process that generates R, which could be interpreted as the consumption level that would
be optimal in absence of a regime shift risk. For example, this could be the steady-state
harvest level when applying this model to a fishery, or it could be the bliss point where
marginal benefits from pollution equal the marginal cost of pollution in an application to
climate-change. The model formulation of equation (1) simply means that the internal
resource dynamics are not relevant for analyzing the tipping point problem.
Note further that the extreme simplicity of modeling brings some freedom: Should the
regime shift have occurred, it is obvious that the best action of the social planner is to set $c_t$
to $r$ for all remaining time. Without loss of generality, one can therefore normalize $u(r) = 0$.
The post-event continuation value is consequently zero, which greatly simplifies the algebra.
However, this does not mean that all economic activity stops once the threshold is crossed,
but it simply implies that the pre-event value could be interpreted as the benefit that is
obtained in addition to some post-event baseline.
While I do discuss extensions and different modeling assumptions (such as a delay in the
occurrence, or reversibility of the regime shift) in section I–4, I would argue that the above
model is a very sensible way to analyze the problem. Granted, equation (1) implies that –
figuratively speaking – the edge of the cliff is a safe place (choosing $c_t = T$ does not trigger
the regime shift) and this does indeed not seem to be a very realistic feature at first sight.
Upon closer inspection, however, one realizes that there are two aspects that could make
the edge an unsafe place that need to be distinguished here. First, there could be additive
disturbances, and second, there could be multiplicative disturbances in the system.
Additive disturbances, such as stochastic (white) noise, are independent of the current
state and would not affect the calculations in a meaningful way. They could be absorbed in
the discount factor. To be more concrete, think of a sardine or shrimp fishery, and let T be
the minimum viable population size, so that the fish stock collapses once the escapement falls
below this threshold. Stochastic noise would then mean that T moves for some exogenous
reason such as changes in salinity. Multiplicative disturbances, in contrast, would not be
independent of the current state. In the shrimp example, the survival from egg to larvae
could be viewed as the product of many smaller survival events at the individual level. This
would mean that a population collapse is more likely to occur when escapement is closely
above the average threshold value and T is small, than when escapement is closely above
the average threshold value and T is large. However, this second aspect can be readily
accounted for in the probability distribution function.2
2 Note that I rule out a constructivist worldview. T is given by nature. While the distance between me and the edge of the cliff (the threshold) gets smaller because I walk towards it, the threshold does not appear because I am walking.
Let me therefore now turn to the probability of triggering the regime shift. Let the
probability density of $T$ on $[0, A]$ be given by a continuous function $f$ such that the cumulative probability of triggering the regime shift is a priori given by $F(x) = \int_0^x f(\tau)\,d\tau$. The
variable $A$ with $R \le A \le \infty$ denotes the upper bound of the support of $T$. When $R < A$,
there is some probability $1 - F(R)$ that extracting the entire amount of the resource is
actually safe and the presence of a critical threshold is immaterial. When $R = A$, extracting
the entire amount of the resource will trigger the regime shift for sure. Both $R$ and $A$ are
known with certainty.
Because T is constant, it follows that any segment of the state space that has been
explored without observing the threshold is known to be safe, also in the future. It is
therefore useful to split the per-period consumption choice in two parts: $c_t = s_t + \delta_t$. This
means:
1. The planner consumes $s_t$ (the amount of the resource that can be used safely).
2. The planner may choose to consume an additional amount $\delta_t$, effectively pushing the
boundary of the safe consumption set at the risk of triggering the regime shift.
Knowing that a given exploitation level $s$ is safe, the updated density of $T$ on
$[s, A]$ is given by $f_s(\delta) = \frac{f(s+\delta)}{1-F(s)}$ (Figure 1). The cumulative probability of triggering the
regime shift when, so to say, taking a step of distance $\delta$ from the safe value $s$ is:
$$F_s(\delta) = \int_0^\delta f_s(\tau)\,d\tau = \frac{1}{1-F(s)} \int_0^\delta f(s+\xi)\,d\xi = \frac{F(s+\delta) - F(s)}{1-F(s)}$$
Hence, $F_s(\delta)$ is the discretized version of the hazard rate. I assume that the hazard rate
is not decreasing with $s$. Most proofs do not rely on this assumption, but it simplifies the
following exposition considerably.3
[Figure 1: vertical axis: Density; horizontal axis marks: 0, s, R]
Figure 1: Updating of belief upon learning that $T > s$: Grey area is $F$, blue hatched area is $F_s$.
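The updating rule can be illustrated numerically. The sketch below assumes, purely for illustration, a uniform prior for $T$ on $[0, A]$ with $A = 10$; the helper names `uniform_cdf` and `conditional_cdf` are hypothetical and not taken from the paper.

```python
from typing import Callable


def uniform_cdf(A: float) -> Callable[[float], float]:
    """Prior CDF F of a threshold uniformly distributed on [0, A] (assumed prior)."""
    return lambda x: min(max(x / A, 0.0), 1.0)


def conditional_cdf(F: Callable[[float], float], s: float, delta: float) -> float:
    """F_s(delta) = (F(s + delta) - F(s)) / (1 - F(s)).

    Probability of crossing the threshold when stepping delta beyond a
    value s that is already known to be safe.
    """
    survival_at_s = 1.0 - F(s)
    if survival_at_s <= 0.0:
        raise ValueError("s must lie strictly below the upper bound of the support")
    return (F(s + delta) - F(s)) / survival_at_s


F = uniform_cdf(10.0)
p_from_zero = conditional_cdf(F, 0.0, 2.0)  # step of 2 when nothing is known to be safe
p_from_six = conditional_cdf(F, 6.0, 2.0)   # same step once s = 6 is known to be safe
```

Under this assumed prior, a step of the same size is riskier from $s = 6$ than from $s = 0$ (roughly 0.5 versus 0.2), which is consistent with a hazard rate that does not decrease in $s$.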
The (Bayesian) updating of beliefs is illustrated in Figure 1. Note that it is only revealed
whether the state $s$ is safe or not, but no new knowledge about the relative probability that
the threshold is located at, say, $s_1$ or $s_2$ (with $s_1, s_2 > s$) has been acquired. Therefore, I
call this type of learning “affirmative”. The absence of any passive learning (an arrival of
information simply due to the passage of time) is justified in a situation where all learning
opportunities from other, similar resources have been exhausted. The only way to learn
more about the location of the threshold in the specific problem at hand is to experiment
with it.4 Another explanation for the absence of passive learning is when the resource at
hand is very large and unique, such as the planet earth when thinking about tipping points
in the climate system.
3 Note that this assumption is not necessarily inconsequential with respect to the optimal policy: Essentially, it rules out very “rugged” probability distributions, where – figuratively speaking – one could be in a situation such that one would not want to take another step just to the left of a peak in the density, but if one were to the right, one would want to jump very far to the next peak. In other words, this assumption guarantees that the optimal policy is a continuous function of the state. At the end of the day, this assumption is not very strong, as I am most interested in analyzing cases that are sufficiently well behaved to be amenable to real world applications.
The key expression that I use in the remainder of the paper is $\bar F_s(\delta)$, which I call the
survival function. It denotes the probability that the threshold is not crossed when taking
a step $\delta$, given that the event has not occurred up to $s$. Let $\bar F(x) = 1 - F(x)$ and
$$\bar F_s(\delta) = 1 - F_s(\delta) = \frac{1 - F(s) - (F(s+\delta) - F(s))}{1 - F(s)} = \frac{\bar F(s+\delta)}{\bar F(s)} \tag{2}$$
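As a worked illustration of equation (2), consider a uniform prior on $[0, A]$. This functional form is an assumption made here only for concreteness; the model itself allows any continuous density.

```latex
% Assumed prior: T uniform on [0, A], so f(x) = 1/A and F(x) = x/A.
\bar F_s(\delta)
  = \frac{1 - F(s+\delta)}{1 - F(s)}
  = \frac{1 - (s+\delta)/A}{1 - s/A}
  = \frac{A - s - \delta}{A - s},
\qquad
\frac{f(s)}{1 - F(s)} = \frac{1}{A - s}.
```

In this special case the survival probability falls linearly in the step size $\delta$, and the hazard rate $1/(A - s)$ increases in $s$, so the uniform prior is one example that satisfies the assumption of a non-decreasing hazard rate.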
The survival function has the following properties:
• $\bar F_s(\delta) \in \left[\frac{1-F(R)}{1-F(s)},\, 1\right]$ (it is bounded below by the conditional probability that $T$ is not in the interval $[s, R]$);
• $\frac{\partial \bar F_s(\delta)}{\partial \delta} = \frac{-f(s+\delta)}{1-F(s)} < 0$ (the survival probability decreases as the step size increases);
• $\frac{\partial \bar F_s(\delta)}{\partial s} = \frac{-f(s+\delta)(1-F(s)) + (1-F(s+\delta))f(s)}{[1-F(s)]^2} < 0$ because $\frac{f(s)}{1-F(s)} < \frac{f(s+\delta)}{1-F(s+\delta)}$ by assumption.
I–2
Optimal experimentation
Starting from a given safe value $s$, the social planner has in principle two options: She can
either stay at $s$ (choose $\delta = 0$), thereby ensuring the existence of the resource in the next
period (as the probability of crossing the threshold is then 0, or $\bar F_s(0) = 1$). Alternatively,
she can take a positive step into unknown territory (choose $\delta > 0$), potentially expanding
the set of safe consumption possibilities to $s' = s + \delta$, albeit at the risk of a resource collapse
(as $\bar F_s(\delta) < 1$ for $\delta > 0$). The social planner’s “Bellman equation” is thus:
$$V(s) = \max_{\delta \in [0, R-s]} \left\{ u(s+\delta) + \beta \bar F_s(\delta) V(s+\delta) \right\} \tag{3}$$
The crux is, of course, that the value function V (s) is a priori not known. However,
we do know that once the planner has decided not to expand the set of safe consumption
possibilities, it cannot be optimal to do so in a later period: If δ = 0 is chosen in a given
period, nothing is learned for the future ($s' = s$), so that the problem in the next period is
identical to the problem in the current period. If moving in the next period would increase
the payoff, it would increase the payoff even more had the move been made a
period earlier (as the future is discounted).
4
An everyday example is blowing up a balloon: We all know that balloons burst at some point, and we
have blown up sufficiently many balloons, or seen our parents blow up sufficiently many balloons, to have a
good idea which size is safe. But for a given balloon at hand, I do not know when it will burst.
To introduce some notation, let S be the set of values s at which it is not socially
optimal to experiment (as the threat of a disastrous regime shift is too large) and let $\bar s^*$
be the lowest member of this set of values. In Appendix A–1, part 1, I show that S is not
empty, so that for s ∈ S, it is optimal to choose δ = 0. In this case, we know V (s): it is
given by $V(s) = \frac{u(s)}{1-\beta}$.
This leaves three possible paths when starting from values of s0 that are not in S. The
social planner could
a.) make one step and then stay,
b.) make several, but finitely many steps and then stay,
c.) make infinitely many steps.
Suppose that S is reached in finitely many steps. This implies that there must be a last
step. For this last step, we can explicitly write down the objective function as we know that
the continuation value of staying at $s'$ forever is $\frac{u(s')}{1-\beta}$. Denote the social planner’s valuation
of taking exactly one step δ from the initial value s and then staying at $s' = s + \delta$ forevermore by
ϕ(δ; s) and denote by δ∗(s) the optimal choice of the last step. Formally:
$$\varphi(\delta; s) = u(s+\delta) + \beta \bar F_s(\delta) \frac{u(s+\delta)}{1-\beta}. \tag{4}$$
This yields the following first-order-condition for an interior solution:
$$\varphi'(\delta; s) = u'(s+\delta) + \frac{\beta}{1-\beta}\left[ \bar F_s'(\delta)\, u(s+\delta) + \bar F_s(\delta)\, u'(s+\delta) \right] = 0. \tag{5}$$
Note that the solution need not be interior: δ∗(s) = 0 when ϕ′(δ; s) < 0 for all
δ ∈ (0, R − s] and δ∗(s) = R − s when ϕ′(δ; s) > 0 for all δ ∈ [0, R − s). That is:
$$\delta^*(s) = \max\left\{ 0 \,;\, \min\left\{ \arg\max_{\delta} \varphi(\delta; s) \,;\, R - s \right\} \right\}. \tag{6}$$
With this explicit functional form in hand, I can show that it is better to traverse any
given distance in one step rather than in two steps before remaining standing (Appendix A–1,
part 2). A fortiori, this holds for any finite sequence of steps. Nor can an infinite sequence of
steps yield a higher payoff, since the first step towards S will be arbitrarily close to
S and discounting ensures that there is no gain from never actually reaching S.
The intuition is the following: Given that it is optimal to eventually stop at some $s^* \geq \bar s^*$,
the probability that the threshold is located on the interval $[s_0, s^*]$ is exogenous. Hence the
probability of triggering the regime shift when going from $s_0$ to $s^*$ is the same whether
the distance is traversed in one step or in many steps. Due to discounting, the earlier the
optimal safe value $s^*$ is reached, the better. In other words, given that one has to walk out
into the dark, it is best to take a deep breath and get to it.
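The one-step-versus-two-steps comparison can be spot-checked numerically. The sketch below assumes the functional forms of the example in section I–3 (u(c) = √c, threshold uniform on [0, A]) together with illustrative values A = 1, β = 0.8 and s0 = 0.1, and takes the endpoint to be the optimal one implied by equation (9). It is a check for one parameterization, not a substitute for the proof in Appendix A–1; all function names are hypothetical.

```python
import math

A, BETA = 1.0, 0.8  # assumed prior support [0, A] and discount factor


def u(c):
    return math.sqrt(c)


def surv(s, x):
    """Probability that x is safe given that s is safe (uniform prior)."""
    return (A - x) / (A - s)


def one_step(s0, x):
    """Jump from s0 to x at once, then stay at x forever if it is safe."""
    return u(x) + BETA * surv(s0, x) * u(x) / (1.0 - BETA)


def two_steps(s0, x1, x):
    """Go to x1 first, to x one period later, then stay at x if it is safe."""
    stay_at_x = u(x) + BETA * surv(x1, x) * u(x) / (1.0 - BETA)
    return u(x1) + BETA * surv(s0, x1) * stay_at_x


if __name__ == "__main__":
    s0 = 0.1
    x = s0 + (A - (1.0 + 2.0 * BETA) * s0) / (3.0 * BETA)  # s0 + delta* from (9)
    for share in (0.25, 0.5, 0.75):
        x1 = s0 + share * (x - s0)  # intermediate stop on the way to x
        print(share, one_step(s0, x), two_steps(s0, x1, x))
```

Both plans face the same overall probability of crossing the threshold on the way to x, so the difference in payoffs comes only from reaching the higher consumption level one period earlier versus deferring part of the risk.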
The first-best consumption pattern is summarized by the following proposition:
Proposition 1. The socially optimal total use of the resource is either s0 for all t or
s0 + δ ∗ (s0 ) for t = 0 and, if the resource has not collapsed, s1 for all t ≥ 1. In other words,
any experimentation – if at all – is undertaken in the first period.
Proof. The proof is given in Appendix A–1.
In short, the dynamics of learning are stunted: For initial values of s below the threshold $\bar s^*$, it is optimal to make exactly one step and then stay at the updated value $s'$ forever
(provided T is not located between s and $s'$, of course). For initial values of s in S, it is
optimal to never expand the set of safe consumption possibilities.
Note that the setup here explicitly allows the planner to take a step of any size.
In many real world applications, this would not be possible. There could, for example,
be capacity constraints on harvest, or there could be convex adjustment costs that make it
prohibitively costly to take large steps. In this sense, the current section analyses an ideal
case, whereas constraints on the choice are discussed in more detail in section I–4.2.
Proposition 2 then shows that the more the social planner knows, the less she wants to
learn. In other words, the degree of experimentation is declining in s. The intuition for this
effect is clear: The more valuable my current outside option, the less I can gain from an
increased consumption set, but the more I can lose should the experiment trigger the regime
shift. This implies that the largest step is undertaken when s = 0, which is reminiscent of
Janis Joplin’s dictum that “freedom is another word for nothing left to lose”.
Proposition 2. The socially optimal step size δ ∗ (s) is decreasing in s.
Proof. The proof is placed in Appendix A–2.
I–3
A simple example with specific functional forms
For a given utility function and a given probability distribution of the threshold’s location it
is then possible to find closed-form solutions for δ∗(s) and the value function V (s). Below,
I do this for $u(c) = c^{1/2}$ and a uniform probability distribution so that it is especially easy
to explicitly solve (5). When the social planner thinks that every value in [0, A] is equally
likely to be the threshold we have $f = \frac{1}{A}$, and accordingly $\bar F_s(\delta) = \frac{A-s-\delta}{A-s}$.
Consequently, we have
$$\varphi(\delta; s) = (s+\delta)^{1/2} + \beta\, \frac{A-s-\delta}{A-s}\, \frac{(s+\delta)^{1/2}}{1-\beta}. \tag{7}$$
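This yields the following first-order-condition for an interior solution:
$$\varphi'(\delta; s) = \frac{1}{2}(s+\delta)^{-1/2} + \frac{\beta}{1-\beta}\left[ -\frac{(s+\delta)^{1/2}}{A-s} + \frac{A-s-\delta}{A-s}\,\frac{1}{2}(s+\delta)^{-1/2} \right] = 0, \tag{8}$$
which can be solved for:
$$\delta^* = \frac{A - (1+2\beta)s}{3\beta} \tag{9}$$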
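A quick way to check the closed form (9) is to compare it with a brute-force maximization of the objective (7). The sketch below does this on a grid; the parameter values A = R = 1 and β = 0.8 and the function names are assumptions chosen for illustration.

```python
import math


def phi(delta, s, A=1.0, beta=0.8):
    """Objective (7): one step of size delta from s, then stay forever."""
    x = s + delta
    return math.sqrt(x) * (1.0 + beta / (1.0 - beta) * (A - x) / (A - s))


def delta_closed_form(s, A=1.0, beta=0.8):
    """Interior solution (9)."""
    return (A - (1.0 + 2.0 * beta) * s) / (3.0 * beta)


def delta_grid(s, A=1.0, R=1.0, beta=0.8, n=20000):
    """Brute-force maximizer of phi on an evenly spaced grid over [0, R - s]."""
    step = (R - s) / n
    best_k = max(range(n + 1), key=lambda k: phi(k * step, s, A, beta))
    return best_k * step


if __name__ == "__main__":
    # For these starting values the optimum is interior, so both should agree
    # up to the grid resolution.
    for s in (0.1, 0.2, 0.3):
        print(s, delta_closed_form(s), delta_grid(s))
```

The grid search makes no use of the first-order condition, so agreement between the two columns is an independent check on the algebra leading from (8) to (9).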
Recall that there need not be an interior solution. When $s \geq \bar s^*$, it is optimal to never
experiment and to remain at the initial safe value s. But also when the future is discounted
heavily (so that β is very low) there may be a range of initial values for which it is not
optimal to choose an interior step size δ∗ ∈ (0, R − s), but rather to try immediately whether
consuming the entire resource triggers the regime shift. Choosing δ(s) = R − s could even
be the optimal action when it is known to cause the catastrophe (i.e. when A = R), namely
when the current consumption of R is more valuable than staying below R in the present
period in order to increase the chances of continued consumption in the future. Denote
by $\underline{s}^*$ the largest member of the set of initial values at which it is optimal to consume the
entire resource. $\underline{s}^*$ is found by solving $R - s = \frac{A-(1+2\beta)s}{3\beta}$. At $\bar s^*$ it is optimal to remain
standing, so that $\bar s^*$ is found by solving $0 = \frac{A-(1+2\beta)s}{3\beta}$. Thus:
$$\underline{s}^* = \max\left\{0,\; \frac{A - 3\beta R}{1-\beta}\right\}, \qquad \bar s^* = \min\left\{\frac{A}{1+2\beta},\; R\right\}$$
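The resulting policy, including both corner cases, can be sketched in a few lines. The code assumes A = R = 1 (not stated for this example in the text) and uses β = 0.32, the value used for Figure 2; the function names are hypothetical.

```python
def interior_step(s, A=1.0, beta=0.32):
    """Interior solution (9) for u(c) = c**0.5 and a uniform prior on [0, A]."""
    return (A - (1.0 + 2.0 * beta) * s) / (3.0 * beta)


def policy(s, A=1.0, R=1.0, beta=0.32):
    """Policy (6): the interior step clamped to the feasible range [0, R - s]."""
    return max(0.0, min(interior_step(s, A, beta), R - s))


def s_lower(A=1.0, R=1.0, beta=0.32):
    """Largest s at which consuming the entire resource is optimal."""
    return max(0.0, (A - 3.0 * beta * R) / (1.0 - beta))


def s_upper(A=1.0, R=1.0, beta=0.32):
    """Smallest s at which remaining standing is optimal."""
    return min(A / (1.0 + 2.0 * beta), R)


if __name__ == "__main__":
    print("s_lower =", s_lower(), "s_upper =", s_upper())
    for s in (0.02, 0.3, 0.7):
        step = policy(s)
        # Third column: the step prescribed after a safe first step.
        print(s, step, policy(s + step))
```

Evaluating the policy again at s + δ(s) illustrates Proposition 1 for this example: once the first step has turned out to be safe, the prescribed further step is zero.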
Figure 2 plots the socially optimal extension δ of the safe consumption.
[Figure 2 (plot not reproduced): total extraction δ(s) against the initial safe value s.]
Figure 2: Illustration of policy function δ(s). The blue circles represent the socially optimal extension
δ of the safe consumption set s (on the y-axis) as a function of the safe consumption set on the
x-axis (where obviously s ≤ R and δ ∈ [0, R − s]). For values of s below $\underline{s}^*$, it is optimal to consume
the entire resource (choose δ(s) = R − s). For values of s above $\bar s^*$, it is optimal to remain standing
(choose δ(s) = 0). The discount factor is set to β = 0.32 to illustrate the values $\underline{s}^*$ and $\bar s^*$.
I–4
Generalizations and Extensions
In this section, I relax several of the underlying assumptions of the model. I discuss how
optimal experimentation changes when the regime shift occurs only after some delay (section I–4.1), how an exogenously growing upper bound of the consumption possibility set
Rt+1 ≥ Rt changes the optimal consumption pattern (section I–4.2), and I analyze optimal
experimentation when the regime shift is reversible (section I–4.3). Throughout, I keep to
the assumption that the resource replenishes fully as long as the threshold has not been
crossed. More complex resource dynamics where the upper bound of the consumption possibility set in the next period explicitly depends on what is left behind in this period are
addressed in Part III of the paper.
I–4.1
Delay in the occurrence of the regime shift
Consider a situation where the social planner, in a given period, observes only with some
probability whether she has crossed the threshold. In fact, it is not unreasonable to model
the true process of the resource as hidden and that it will manifest itself only after some
delay (see Gerlagh and Liski (2014) for a recent paper that focusses on this effect in the
context of optimal climate policies). Hence, as time passes the planner will update her
beliefs about whether the threshold has been located on the interval [st , st + δt ]. How
does this passive learning affect optimal experimentation? Although solving for the optimal
consumption decision becomes extremely difficult as – due to the delay – the problem is no
longer Markovian, it is possible to show the following:
Proposition 3. Also when crossing the threshold at time t triggers the regime shift at some
(potentially uncertain) time τ > t, it is still optimal to experiment – if at all – in the first
period only.
Proof. The proof is given in Appendix A–3.
In other words, the fact that the learning dynamics are stunted is robust to a delay in
the occurrence of the regime-shift. This does of course not imply that the optimal decision
under the two different models will be the same. They almost surely will differ, as delaying
the consequences of crossing the threshold decreases the costs of experimentation. Yet,
as the planner only learns that she has crossed the threshold when the disastrous regime
shift actually occurs, she cannot capitalize on this delay by trying to expand the set of safe
consumption possibilities several times.
I–4.2
Growing R and constraints on the choice set
Previously, the upper bound of the resource, R, has been treated as known and constant. In
this subsection, I shall depart from this assumption and consider the case when R increases
(but f and T remain unchanged). Formally, the resource dynamics can be expressed as:
$$R_{t+1} = \begin{cases} G(R_t) & \text{if } \sum_i c_t^i \leq T \\ r & \text{if } \sum_i c_t^i > T \text{ or } R_t = r \end{cases} \tag{10}$$
where $G'(R) > 0$.
In this situation, there is scope for a continued expansion of the set of safe consumption
possibilities, but only as long as the upper bound of the available resource at time t, Rt ,
is binding. By construction, Rt cannot exceed A, and we know from the proof of
Proposition 1 that there will be some point at which it is not socially optimal to further
expand the set of safe consumption possibilities. Thus, once δt (st ) < Rt − st for some t = τ ,
we have δt = 0 for all t > τ . That said, a growing Rt may induce several periods where
δt (st ) = Rt − st . The validity of this conclusion can be easily checked by observing that
the first-order condition for an interior choice of δt (equation 5) does not depend on R.
Note that this argument also shows that uncertainty about R is immaterial for the optimal
learning dynamics.
Similarly, constraints on the choice set (such that δ ∈ [0, δmax ] where δmax < R − s)
will mechanically lead to repeated experimentation. When the first-best unconstrained
expansion of the set of safe consumption possibilities is δ ∗ (s0 ) but δmax is such that it
requires several steps to traverse the distance δ ∗ (s0 ), then the safe value s will be updated
sequentially (conditional on not causing the regime shift, of course). Note that it follows
directly from Proposition 2 that a constrained choice set implies an overall more cautious
plan: As δ ∗ (s) is declining in s, it will at some point no longer be optimal to choose δmax ; instead,
an interior step size δ ∈ [0, δmax ] will be optimal.
I–4.3
Reversible regime shifts
So far, the regime shift was assumed to be irreversible. While this simplified the analysis,
this may not be an adequate description for all applications. For some underground aquifers,
it may be possible to desalinate contaminated reservoirs, or a change in the surrounding
hydrological conditions may result in a recharge of freshwater. In this subsection, I therefore
analyze the situation when the upper bound of the consumption possibility set goes back to
R after the system has spent l periods in the unproductive regime (where l > 0). Clearly,
the lag l that the system spends in the unproductive regime could also be interpreted as
the cleanup cost caused by an active effort to reverse the regime (e.g. desalination).
What we observe in case the step δ implies exceeding T and we tip from the productive
into the unproductive regime is a critical modeling choice. On the one hand, one could
presume that while the choice of δ is set and may push us over the edge, we will at least
observe where the edge has been. On the other hand, one can presume that the only thing
we observe is that the regime shift has occurred, so that it must lie between s and s + δ,
but we do not know exactly where.5
5
The latter type of learning is a little bit like sitting in a car, fixing the course to a destination and then
blindfolding oneself. Conscious experimentation, however, is more realistically described by saying that the
course is set, but the eyes remain open.
In discussing optimal experimentation for these two cases in turn, I will assume that
F (R) = 1. That is, the social planner knows that the threshold is for sure somewhere on
the interval between 0 and R. Keeping the possibility that F (R) < 1 makes the analysis
significantly more tedious without yielding additional insights.
Location of the threshold is discovered if it is crossed. When we take the first
modeling route, the continuation value in case of a negative regime shift will be given by
a period in which the resource is in its unproductive state, the per-period payoff being
u(r) = 0. When the resource has recovered after a lag of length l, the continuation value is
given by $\frac{u(T)}{1-\beta}$ as the threshold has been discovered. As the location of T is unknown, the
social planner evaluates the payoff in case of a reversible regime shift at its expected value.
When experimentation occurs in the first period only (as will be shown below), the Bellman
equation for this problem can be written as:
$$V(s) = \max_{\delta\in[0,R-s]} \left\{ u(s+\delta) + \beta\left[ \bar F_s(\delta)\,\frac{u(s+\delta)}{1-\beta} + \beta^l\, \frac{\int_s^{s+\delta} u(y) f(y)\,dy}{1-\beta} \right] \right\} \tag{11}$$
where $\beta^l \in (0, 1)$ is the discount factor that accounts for the time that the system spends in
the unproductive regime before recovering. The larger l, the stronger the hysteresis. Clearly,
as l → ∞, the reversible case approaches the irreversible case discussed above.
Because the consequences of a regime shift, should it occur, are less malign than in the
irreversible case, the optimal step size will be larger. This becomes clear when inspecting
the derivative of the maximand with respect to δ:
$$u'(s+\delta) + \frac{\beta}{1-\beta}\left[ \bar F_s'(\delta)\, u(s+\delta) + \bar F_s(\delta)\, u'(s+\delta) + \beta^l u(s+\delta) f(s+\delta) \right] \tag{12}$$
In comparison with the first-order condition in the irreversible case, equation (5), there is
an additional positive term, $\beta^l u(s+\delta) f(s+\delta)$. Therefore, the function described by (12)
will cross the x-axis at a larger value of δ than the function described by (5). Clearly, this
additional term, $\beta^l u(s+\delta) f(s+\delta)$, is larger the smaller is l.
Note however, that a region of the state space may still remain unexplored. The probability of crossing the threshold and incurring the cost of the regime shift is increasing,
but the marginal gain from going yet a little further is constant or decreasing, so that the
expected gain is decreasing. Whether or not it pays to explore the entire state space will
depend on the length of the time lag in the unproductive regime. Finally, it will be optimal
to experiment only once. Again, the argument is that if it would be optimal to explore the
state space in some interval $(s_1, s_2]$ in the second period, it would have also been optimal
to do so in the first period, and due to discounting, it must be better to do so right away.
The following proposition summarizes this discussion:
Proposition 4. When the regime shift is reversible after a lag of length l and T is revealed
when st + δt > T , then any experimentation is undertaken in the first period and the size of
the first step, δ ∗ (s0 ), is larger the smaller is l. Depending on l and the initial safe value s0 , a range of
the state-space remains permanently unexplored.
Proof. The proof is placed in Appendix A–4.
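The dependence of the first step on the lag l can be illustrated numerically. The sketch below maximizes the objective in (11), taken as printed, on a grid for the example's functional forms (u(c) = √c, f = 1/A) with assumed values A = R = 1, β = 0.8 and s = 0.1; the function names are hypothetical.

```python
import math


def objective(delta, s, lag, A=1.0, beta=0.8):
    """Maximand of (11) for u(c) = sqrt(c) and density f = 1/A, as printed."""
    x = s + delta
    stay = (A - x) / (A - s) * math.sqrt(x) / (1.0 - beta)
    # Integral of u(y) f(y) over [s, x], in closed form for sqrt utility.
    integral = (2.0 / 3.0) * (x ** 1.5 - s ** 1.5) / A
    recover = beta ** lag * integral / (1.0 - beta)
    return math.sqrt(x) + beta * (stay + recover)


def best_step(s, lag, A=1.0, R=1.0, beta=0.8, n=20000):
    """Grid maximizer of the objective over delta in [0, R - s]."""
    step = (R - s) / n
    best_k = max(range(n + 1),
                 key=lambda k: objective(k * step, s, lag, A, beta))
    return best_k * step


if __name__ == "__main__":
    # Shorter lags should give larger first steps; a very long lag should
    # approach the irreversible solution (9).
    for lag in (1, 5, 50):
        print(lag, best_step(0.1, lag))
    print("irreversible:", (1.0 - (1.0 + 2.0 * 0.8) * 0.1) / (3.0 * 0.8))
```

For these values the maximizer is interior for each lag shown, so the ordering of the printed steps reflects the additional term in (12) rather than a binding constraint.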
Location of the threshold remains unknown if it is crossed. When we take the
second modeling route, according to which the social planner does not learn the exact
location of the threshold upon crossing it, an experiment still reveals useful information:
The social planner can now update the upper bound of T ’s distribution. Denote this upper
bound by $U_t$. We have $U_0 = R$ and, if the threshold has been crossed, $U_{t+1} = s_t + \delta_t$.
Consequently, we need to explicitly account for how the upper bound changes when
formulating the planner’s value function. In addition, note that the social planner can
always secure herself a payoff of $\frac{u(s)}{1-\beta}$ by simply remaining standing at s. Moreover, she can
secure herself a payoff of $\frac{u(R)}{1-\beta^{l+1}}$ when she goes all the way to R, taking into account that
this will trigger the regime shift for sure, but after a lag of l periods, she can do the same
again. Obviously, the planner can also choose an interior step size δ ∈ (0, U − s) (any choice
of δ ∈ [U − s, R − s) cannot be optimal). Denote the payoff from an interior choice by J.
The planner’s value function is then given by:
$$V(s, U) = \max\left\{ \frac{u(s)}{1-\beta} \,;\, J(s, U) \,;\, \frac{u(R)}{1-\beta^{l+1}} \right\} \tag{13}$$
$$\text{where } J(s, U) = \sup_{\delta\in(0,U-s)} \left\{ u(s+\delta) + \beta\left[ \frac{\int_{s+\delta}^{U} f(y)\,dy}{\int_{s}^{U} f(y)\,dy}\, V(s+\delta, U) + \beta^l\, \frac{\int_{s}^{s+\delta} f(y)\,dy}{\int_{s}^{U} f(y)\,dy}\, V(s, s+\delta) \right] \right\}$$
By now, it will not be surprising that the first step will be the farthest, if it is at all
optimal to experiment. If the first step does not trigger the regime shift, we learn that
T > s0 + δ0∗ , and we will stay at s1 = s0 + δ0∗ forever (see also section 2.5 in Groeneveld
et al., 2013). This implies that also here, a region of the state space will remain permanently
unexplored. If the first step triggers the regime shift, we will – after living through a period
of low resource productivity – have the same knowledge about s but we will have updated
the upper bound U. The important thing to note is that there will be a critical value Û
below which it does not pay to experiment further: as the expected gain is small but the
probability of a regime shift is large when U becomes small, it will be better to remain
standing at s or oscillate between consuming R and r. The optimal consumption pattern is
therefore characterized by the “stopping rule” described in Proposition 5.
Proposition 5. When the regime shift is reversible after a lag of length l and T is not
revealed when st + δt > T , it is either not optimal to experiment at all, or there is repeated
experimentation with decreasing step sizes δt > δt+1 . Experimentation stops the moment
that st + δt < T or st + δt < Û.
Proof. The proof is placed in Appendix A–5.
Part II – The non-cooperative game
While the first part of the paper investigates optimal experimentation of a social planner,
I now analyze how strategic interactions affect learning and resource use under the threat
of a regime shift. To do so, I go back to the simplest case of an irreversible threshold.
Below, I introduce the modifications of the basic model that account for the game-theoretic
setting. In section II–2, I will first look at the case when the location of the threshold is
known in order to expose the underlying strategic structure. Section II–3 contains the main
result and section II–4 gives comparative statics. Section II–5 returns to the example of the
specific functional forms assumed in section I–3 in order to compare the non-cooperative
equilibrium to the social optimum.
II–1
The model
There are N identical players that share a common resource whose dynamics are described
by equation (1). Again, we can think of an underground freshwater reservoir that is accessed
by several well-owners. All players have the same belief F about the location of the threshold
T . Upon learning that a value s of the consumption possibility set is safe, all players update
their belief according to equation (2). Furthermore, I continue to assume that $\frac{\partial \bar F_s(\delta)}{\partial s} < 0$,
so that the hazard rate is increasing in s.
At time t, each player may choose to consume an amount $\delta_t^i$ more than $s_t$, effectively
pushing the boundary of the safe consumption set at the risk of triggering the regime shift.
In other words, $\delta_t^i$ is the effective choice variable with $\delta_t^i \in [0, R - s_t - \delta_t^{-i}]$, where $\delta_t^{-i}$ is the
extension of the safe consumption set by all other players. I denote δ without superscript i
as the total extension of the safe set, i.e. $\delta_t = \sum_{i=1}^{N} \delta_t^i$.
It is well known that the static non-cooperative game of sharing a given resource has
infinitely many equilibria. Here, I focus on symmetric pure-strategy equilibria. That is,
any safe value st is shared equitably. Moreover, the game requires a statement about
the consequences when the sum of players’ consumption plans exceeds the total available
resource. In this case, I also assume that the resource is rationed so that each player gets
an equal share.
The objective of the players is to choose that sequence of extension decisions $\Delta^i = \{\delta_0^i, \delta_1^i, \ldots\}$ which, for given strategies of the other players $\Delta^{-i}$, and for a given initial value $s_0$,
maximizes the sum of expected per-period utilities, discounted by a common factor β with
β ∈ (0, 1). I concentrate on Markovian strategies.
For now, the model abstracts from the dynamic common pool problem in the sense that
the consumption decision of a player today has no effect on the consumption possibilities
tomorrow, except that a.) the set of safe consumption possibilities may have been enlarged
and b.) the disastrous regime shift may have been triggered.
II–2
Preliminary step: known threshold location
To expose the underlying strategic structure, I consider the case when the location of the
threshold T is known. What is the first-best outcome in such a situation? When T is large,
the first-best is to indefinitely use exactly that amount of the resource which does not cause
the regime shift. However, when T is small (so that a large part of the available resource R
must be foregone to ensure its continued existence) it will be socially optimal to cross the
threshold and deplete the resource immediately. How small T must be for depletion to be
optimal depends obviously on the discount factor β: the less one discounts the future, the
more willing one is to sacrifice today’s consumption to ensure consumption in the future.
The non-cooperative game then has two equilibria in pure strategies: Either the players
deplete the resource immediately, or they can coordinate on staying at the threshold. For a
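given safe value of total consumption s, player i’s value function is:
$$V^i(s) = \max_{\delta^i} \left\{ u(s/N + \delta^i) + I \cdot \beta V^i(s) \right\}, \quad \text{where } I = \begin{cases} 1 & \text{when } s + \delta^i + \delta^{-i} \leq T \\ 0 & \text{when } s + \delta^i + \delta^{-i} > T \end{cases} \tag{14}$$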
Due to the stationarity of the model structure, it is clear that if staying at the threshold
can be rationalized in any one period, it can be rationalized in every period. The payoff from
avoiding the regime shift is $\frac{u(T/N)}{1-\beta}$. Conversely, the payoff from deviating and immediately
depleting the resource when all other players’ policy is to stay at the threshold is given
by $u\left(R - \frac{N-1}{N}T\right)$. Staying at the threshold can thus be sustained as a Nash equilibrium
whenever $\frac{u(T/N)}{1-\beta} \geq u\left(R - \frac{N-1}{N}T\right)$. Denote by β̄ the value of β for which this condition
holds with equality (i.e. β̄ is the lowest discount factor for which staying at the threshold
can be sustained for given values of N , T , and R). We have:
$$\bar\beta = 1 - \frac{u(T/N)}{u\left(R - \frac{N-1}{N}T\right)} \tag{15}$$
In fact, there will always be a parameter combination so that the first-best can be
supported as a Nash-equilibrium of the game with a known threshold (Proposition 6). Given
these conditions, the game exhibits the structure of a coordination game. Here, as in the
static game from Barrett (2013, p.236), “nature herself enforces an agreement to avoid
catastrophe.”
Proposition 6. When the location of the threshold is known with certainty, then there
exists, for every combination of N , T , and R, a value of β̄ such that the first-best can be
sustained as a Nash-equilibrium when β ≥ β̄. The larger is N , or the closer T is to 0, the
larger has to be β.
Proof. The proof is placed in Appendix A–6
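The comparative statics in Proposition 6 can be illustrated by evaluating equation (15) directly. The sketch below assumes u(c) = √c and R = 1 purely for illustration; `beta_bar` is a hypothetical helper name.

```python
import math


def beta_bar(N, T, R, u=math.sqrt):
    """Critical discount factor (15): staying at T can be sustained as a Nash
    equilibrium of the known-threshold game whenever beta >= beta_bar."""
    return 1.0 - u(T / N) / u(R - (N - 1) * T / N)


if __name__ == "__main__":
    R = 1.0
    for N in (2, 5, 10):
        for T in (0.25, 0.5, 0.75):
            print(N, T, round(beta_bar(N, T, R), 4))
```

Reading the output across N for a fixed T, and across T for a fixed N, shows the two monotonicity statements of the proposition for this particular utility function.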
II–3
Non-cooperative equilibrium when the location of T is unknown
The game has two equilibria in pure strategies: an “aggressive” equilibrium where the players
immediately deplete the resource, and a “cautious” equilibrium where there is a positive
probability that the resource is maintained forever. In fact, there are two types of the
“cautious” equilibrium, depending on the initial value s. Similar to section I–2, I define
$\bar s^{nc}$ to be the lowest member of the set of safe values at which a Nash equilibrium with
the players choosing δ = 0 can be supported. For $s \geq \bar s^{nc}$, the cautious equilibrium thus
conserves the resource with probability 1. I define $\underline{s}^{nc}$ to be the largest member of the
set at which depletion is the only equilibrium. For values of $s \in (\underline{s}^{nc}, \bar s^{nc})$, the “cautious”
equilibrium implies that the players experiment once (and the regime shift thus occurs with
positive probability).
In general, the “Bellman equation” of player i can (for a given strategy of the other
players $\Delta^{-i}$) be expressed as:
$$V^i(s, \Delta^{-i}) = \max_{\delta^i \in [0, R-s]} \left\{ u\left(\frac{s}{N} + \delta^i\right) + \beta \bar F_s(\delta^i + \delta^{-i})\, V^i(s+\delta, \Delta^{-i}) \right\} \tag{16}$$
Also here, the crux is that $V^i$ is a priori unknown. Similar to the analysis in part I, I denote
by φ the value for player i of taking exactly one step of size $\delta^i$ and then remaining standing when
the other players’ strategy is to do the same (i.e. $\Delta^{-i} = \{\delta^{-i}, 0, 0, 0, \ldots\}$):
$$\phi(\delta^i; \delta^{-i}, s) = u\left(\frac{s}{N}+\delta^i\right) + \beta \bar F_s(\delta^i+\delta^{-i})\, \frac{u\left(\frac{s+\delta^i+\delta^{-i}}{N}\right)}{1-\beta} \tag{17}$$
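The derivative of φ with respect to $\delta^i$ is given by:
$$\phi'(\delta^i; \delta^{-i}, s) = u'\left(\frac{s}{N}+\delta^i\right) + \frac{\beta}{1-\beta}\left[ \bar F_s'(\delta^i+\delta^{-i})\, u\left(\frac{s+\delta^i+\delta^{-i}}{N}\right) + \frac{1}{N}\, \bar F_s(\delta^i+\delta^{-i})\, u'\left(\frac{s+\delta^i+\delta^{-i}}{N}\right) \right] \tag{18}$$
Let $g(\delta^{-i}, s)$ be the interior solution to the first-order-condition of maximizing $\phi(\delta^i; \delta^{-i}, s)$.
The best-reply function for player i, $\delta^{i*}(\delta^{-i}, s)$, is then:
$$\delta^{i*}(\delta^{-i}, s) = \begin{cases} 0 & \text{if } s \geq \bar s^{nc} \quad (19a) \\ g(\delta^{-i}, s) & \text{if } s \in (\underline{s}^{nc}, \bar s^{nc}) \quad (19b) \\ R - s - \delta^{-i} & \text{if } s \leq \underline{s}^{nc} \quad (19c) \end{cases}$$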
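For a symmetric step size $\delta^{-i} = (N-1)\delta^i$, we have:
$$\phi'(\delta^{nc}; s) = u'\left(\frac{s}{N}+\delta^{nc}\right) + \frac{\beta}{1-\beta}\left[ \bar F_s'(N\delta^{nc})\, u\left(\frac{s}{N}+\delta^{nc}\right) + \frac{1}{N}\, \bar F_s(N\delta^{nc})\, u'\left(\frac{s}{N}+\delta^{nc}\right) \right] \tag{20}$$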
Proposition 7. The set of Markov-strategies
$$\begin{cases} 0 & \text{if } s \geq \bar s^{nc} \\ \delta^{nc}(s) & \text{if } s \in (\underline{s}^{nc}, \bar s^{nc}) \\ \frac{R-s}{N} & \text{if } s \leq \underline{s}^{nc} \end{cases}$$
where $\delta^{nc}(s)$ is defined by the interior solution to (20), constitutes a feedback Nash equilibrium. That is, for $s_0 \geq \bar s^{nc}$ coordination to stay at $s_0$ can be supported as a Nash equilibrium.
For $s_0 < \bar s^{nc}$ taking one step and then staying at $s_1 = s_0 + N\delta^{nc}$ can be supported as a Nash
equilibrium.
Proof. The proof is given in Appendix A–7.
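The interior solution to (20) can be computed numerically. The sketch below solves the symmetric first-order condition by bisection for the example's functional forms (u(c) = √c, uniform prior on [0, A]) with assumed values A = R = 1 and β = 0.8. The corner handling is a simple clamp based on the sign of (20); this is an assumption of the sketch and need not coincide with the thresholds in Proposition 7. Function names are hypothetical.

```python
import math


def foc_symmetric(d, s, N, A=1.0, beta=0.8):
    """Left-hand side of (20) for u(c) = sqrt(c) and a uniform prior on [0, A].
    d is the individual step; the aggregate step is N * d."""
    c = s / N + d                 # individual consumption after the step
    x = s + N * d                 # new aggregate consumption level
    surv = (A - x) / (A - s)      # survival probability of the joint step
    dsurv = -1.0 / (A - s)        # its derivative w.r.t. the aggregate step
    u, du = math.sqrt(c), 0.5 / math.sqrt(c)
    return du + beta / (1.0 - beta) * (dsurv * u + surv * du / N)


def solve_delta_nc(s, N, A=1.0, R=1.0, beta=0.8, tol=1e-12):
    """Bisection on (20) over [0, (R - s) / N]; a corner is returned when the
    condition has no sign change. Requires s > 0 (finite marginal utility)."""
    lo, hi = 0.0, (R - s) / N
    if foc_symmetric(lo, s, N, A, beta) <= 0.0:
        return 0.0
    if foc_symmetric(hi, s, N, A, beta) >= 0.0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc_symmetric(mid, s, N, A, beta) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    s = 0.5
    planner = 1 * solve_delta_nc(s, N=1)    # N = 1 reproduces the planner's FOC
    players = 10 * solve_delta_nc(s, N=10)  # aggregate step with ten players
    print(planner, players)
```

With N = 1 the condition collapses to the planner's first-order condition (8), so the same routine gives a like-for-like comparison between the aggregate non-cooperative step and the social optimum at a given s.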
Obviously, the best-reply for player i when all other players plan to expand the consumption set by R − s is to choose R − s as well. This would ensure that the player at
least gets an equal share of R. I call this equilibrium in which the resource is immediately
depleted the “aggressive equilibrium” and the equilibrium described in Proposition 7 the
“cautious equilibrium”. Note that, for a given s, both the “cautious” and the “aggressive
equilibrium” are unique.6
In short, the game has the structure of a coordination problem where the immediate
depletion of the resource may become a self-fulfilling prophecy. Indeed, for $s \leq \underline{s}^{nc}$, the
immediate depletion of the resource cannot be avoided in a non-cooperative setting, despite
the fact that there is a range of initial values ($s \in [\underline{s}^*, \underline{s}^{nc}]$) for which it is optimal to
conserve the resource indefinitely with positive probability. For $s \in [\underline{s}^{nc}, \bar s^{nc}]$, the strategic
interactions imply that experimentation is inefficiently large. However, should it turn out
that $s' = s + N\delta^{nc}$ is safe, this consumption pattern is ex-post socially optimal. Figure 3
illustrates the aggregate expansion of the set of safe consumption possibilities in the cautious
equilibrium and contrasts it with the social optimum.
6
Uniqueness of the latter type of equilibrium simply follows from the assumption that in case of incompatible demands, the resource is shared equally among the players. Uniqueness of the symmetric “cautious
equilibrium” (should it entail $\delta^{nc}(s) < \frac{R-s}{N}$) can be established by contradiction. Suppose all other players
j ≠ i choose to expand the consumption set to a level at which, should the threshold not have been crossed,
no player would have an incentive to go further. Player i’s best-reply cannot be to choose $\delta^i = 0$ in
this situation as the gain from making a small positive step (which is private) exceeds the (public) cost of
advancing a little further. Hence, the only equilibrium at which the players expand the consumption set
once is the symmetric one.
[Figure 3 (plot not reproduced): total extraction δ(s) against the initial safe value s, for the social optimum and for non-cooperation.]
Figure 3: Illustration of policy function δ(s). The blue circles represent the socially optimal extension
δ of the safe consumption set s as discussed in section I–3 above. The red dashed line plots the
cautious non-cooperative equilibrium, showing how $\underline{s}^* \leq \underline{s}^{nc}$ and $\bar s^* \leq \bar s^{nc}$ (but note that in some
cases we may even have snc < s∗ ). It illustrates how even the “cautious” experimentation under
non-cooperation implies excessive risk-taking. The figure also shows that the first-best and the
non-cooperative outcome may coincide for very low and very high values of s. A value of β = 0.32
has been chosen to illustrate all values $\underline{s}^*$, $\underline{s}^{nc}$, $\bar s^*$ and $\bar s^{nc}$.
Faced with this coordination problem, the question arises which of the two equilibria
we can expect to be selected. Clearly, the “cautious equilibrium” pareto-dominates the “aggressive equilibrium”.7 With rational players and without strategic uncertainty, the cautious
equilibrium would thus be the outcome of the game. But what happens when the players are
uncertain about the other players’ behavior? As the disastrous regime shift is irreversible,
there is no room for dynamic processes that lead players to select the pareto-dominant equilibrium (Kim, 1996). Therefore, I turn to the static concept of risk-dominance (Harsanyi
and Selten, 1988).
Since the game is symmetric, applying the criterion of risk-dominance for equilibrium
selection has an intuitive interpretation: the cautious equilibrium is selected if player i’s
expected payoff from playing cautiously (i.e. choosing $\delta^i(s) = \delta^{nc}(s)$) exceeds the expected
payoff from playing aggressively (i.e. choosing $\delta^i(s) = R - s$) when player i assigns
probability p to the other players playing aggressively.
Obviously, whether the cautious or the aggressive equilibrium is risk-dominant will depend
both on this probability p as well as on the safe value s. We can, for a given safe value s,
solve for the probability p∗ at which the player is just indifferent between playing cautiously
or aggressively:
7
This follows immediately from the fact that, by definition, $\delta^{nc}(s)$ is the interior solution to the symmetric
maximization problem (16) (with $\delta^{-i} = (N-1)\delta^{nc}$) where the policy δ(s) = R − s was an admissible
candidate.
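\[
p^* \cdot \pi[\text{all aggressive}] + (1 - p^*) \cdot \pi[\text{only } i \text{ aggressive}]
= p^* \cdot \pi[\text{only } i \text{ cautious}] + (1 - p^*) \cdot \pi[\text{all cautious}]
\]
\[
\Leftrightarrow \quad
p^* = \frac{\pi[\text{all cautious}] - \pi[\text{only } i \text{ aggressive}]}
{\left(\pi[\text{all cautious}] - \pi[\text{only } i \text{ aggressive}]\right) - \left(\pi[\text{only } i \text{ cautious}] - \pi[\text{all aggressive}]\right)}
\]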
In the above calculation, π[all aggressive] refers to the payoff of playing aggressive, when all
other players play aggressively, π[only i aggressive] refers to the payoff of playing aggressive,
when all other players play cautiously, etc. In order to explicitly solve for the value of p∗ ,
we need to put more structure on the problem. For the functional forms developed in the
specific example, we can calculate and plot p∗ as a function of s. Figure 4 then illustrates
how robust this equilibrium is: Even when the players think that there is more than a 50%
chance that all other players play the aggressive strategy, it still pays to play the cautious
strategy for a wide range of initial values s.
[Figure 4. Horizontal axis: Initial safe value s (marked values snc , snc , R); vertical axis: Probability that opponents play aggressively (0 to 1); curve: p∗ ; shaded area: Region where playing cautious is risk-dominant.]
Figure 4: p∗ as a function of s for u(c) = √c, f = 1/A and β = 0.8, A = R = 1 and N = 10. The
grey area below the line drawn by p∗ shows the set of values for which player i prefers to play the
strategy that pertains to the cautious equilibrium. p∗ is not defined for s < snc when the cautious
and the aggressive equilibrium coincide.
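The indifference condition that defines p∗ can be sketched in a few lines of Python. This is only an illustration of the formula: the function name and the four payoff numbers are hypothetical and are not taken from the paper's example.

```python
def indifference_probability(pi_all_cautious, pi_only_i_aggressive,
                             pi_only_i_cautious, pi_all_aggressive):
    """Probability p* at which player i is indifferent between the strategies.

    Solves p*·pi_AA + (1-p*)·pi_iA = p*·pi_iC + (1-p*)·pi_CC for p*.
    """
    gain_when_others_cautious = pi_all_cautious - pi_only_i_aggressive
    gain_when_others_aggressive = pi_only_i_cautious - pi_all_aggressive
    denominator = gain_when_others_cautious - gain_when_others_aggressive
    if denominator == 0:
        raise ValueError("payoff differences cancel; p* is not defined")
    return gain_when_others_cautious / denominator


# Hypothetical payoffs, chosen only to illustrate the formula.
payoffs = dict(pi_all_cautious=3.0, pi_only_i_aggressive=2.0,
               pi_only_i_cautious=0.0, pi_all_aggressive=1.0)
p_star = indifference_probability(**payoffs)

# Expected payoffs of the two strategies when others are aggressive with probability p_star.
aggressive = p_star * payoffs["pi_all_aggressive"] + (1 - p_star) * payoffs["pi_only_i_aggressive"]
cautious = p_star * payoffs["pi_only_i_cautious"] + (1 - p_star) * payoffs["pi_all_cautious"]
```

With these illustrative numbers the two expected payoffs coincide at p_star, and for any belief below p_star the cautious strategy yields the higher expected payoff, which corresponds to the grey area in Figure 4.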
II–4 Comparative statics
In order to analyze how the extraction pattern changes with changes in the parameters,
I first note that δ nc , the equilibrium expansion of the set of safe values, is monotonically
decreasing in s. (The argument is the same as in the proof of Proposition 2 when simply
replacing δ ∗ (s) with N δ nc (s) and is therefore omitted.) This implies that the aggregate
extraction pattern as a function of the prior knowledge about the set of safe consumption
possibilities indeed looks qualitatively as in Figure 3 (where it was plotted for the specific
functional forms assumed in the example). The effect of an increase in the fundamentals β,
N , Fs (δ), and R can therefore be analyzed by investigating changes to φ′(δ nc , s).
Proposition 8. We have the following comparative statics results:
(a) The boundaries $\underline{s}^{nc}$, $\bar{s}^{nc}$, and aggregate extraction for $s \in [\underline{s}^{nc}, \bar{s}^{nc}]$, decrease with β.
(b) An increase in N leads to more aggressive extraction when
$\frac{N}{N+1} > u'\!\left(\frac{R}{N}\right) \Big/ \, u'\!\left(\frac{R}{N+1}\right)$.
(c) The more likely the regime shift (in terms of a first-order stochastic dominance), the
larger the range where a cautious Nash-equilibrium exists.
(d) As long as R < A, the higher the maximum potential reward R, the larger the range
where a cautious Nash-equilibrium exists.
Proof. The proofs are given in Appendix A–8.
The first comparative static result conforms with basic intuition: The more impatient the
players are, the less they value the current safe consumption value, and the more aggressive
their experimentation. The second result shows that an increase in the number of players may
exacerbate the “tragedy of the commons”, but not necessarily in all cases. There are two
opposing effects: On the one hand, the addition of one more player increases aggregate
extraction if all players were to choose the same consumption level as before. On the other
hand, the addition of another player leads all other players to decrease their individual
consumption as they partly take the increase in N into account. A sufficient condition for
the former to dominate is that $\frac{N}{N+1} > u'\!\left(\frac{R}{N}\right) \Big/ \, u'\!\left(\frac{R}{N+1}\right)$. The third comparative static
result, that an increased risk of crossing the threshold leads to a larger range of the cautious
equilibrium, is not related to the risk aversion of the agents as such, but stems from the fact
that the expected cost of experimentation increases, while the gains stay the same. Finally,
as long as R < A, an increase in the upper bound of the consumption possibility set, R,
increases the range where a cautious equilibrium exists, in spite of the fact that R neither
affects the interior consumption choices directly, nor the value $\bar{s}^{nc}$ at which the cautious
equilibrium implies no further experimentation. The reason for the result is the following:
At the old value of $\underline{s}^{nc}$, the equilibrium experimentation just coincides with choosing R − s.
Now when R shifts outwards, say to R̃, equilibrium experimentation at the old $\underline{s}^{nc}$ is
strictly less than R̃ − s.
II–5 Specific example again
In this section, I use the specific example developed throughout the paper to derive closed-form solutions for the step size in the “cautious” equilibrium and to plot the individual
player’s value function in that equilibrium.
Exploiting the fact that due to symmetry we have δ −i = (N − 1)δ i , I solve (20) for an
interior equilibrium value δ nc . Total non-cooperative expansion is then given by:
\[
N\delta^{nc} = \frac{\big((1-\beta)N + \beta\big)A - \big((1-\beta)N + 3\beta\big)s}{3\beta}
\tag{21}
\]
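Again, there will only be an interior equilibrium for $s \in [\underline{s}^{nc}, \bar{s}^{nc}]$, where:
\[
\underline{s}^{nc} = \max\left\{0, \; \frac{\big((1-\beta)N + \beta\big)A - 3\beta R}{(1-\beta)N}\right\}
\qquad
\bar{s}^{nc} = \min\left\{\frac{\big((1-\beta)N + \beta\big)A}{(1-\beta)N + 3\beta}, \; R\right\}
\]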
From inspection of (21) it becomes clear that the total non-cooperative consumption is
increasing in N : $\frac{\partial [N\delta^{nc}]}{\partial N} = \frac{(1-\beta)(A-s)}{3\beta} > 0$. This points to the “tragedy of the commons”:
The more players there are, the more aggressive the first-period expansion of the set of
consumption possibilities. Furthermore, one can find the combination of parameters that
would ensure a self-fulfilling prophecy of extirpation when starting from an initial value of s =
0 (no prior knowledge of a safe level of extraction). It is given by $N \geq \frac{\beta}{1-\beta}(3R - A)$,
hence increasing in R and decreasing in β and A, as is intuitive.
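Equation (21) and the two boundaries of the interior range are easy to evaluate numerically. The following Python sketch does so for the parameter values used in the figures (β = 0.8, A = R = 1, N = 10); the function names are introduced here for illustration and are not part of the paper.

```python
def total_expansion(s, N, beta, A):
    """Aggregate interior expansion N * delta_nc as given by equation (21)."""
    k = (1.0 - beta) * N
    return ((k + beta) * A - (k + 3.0 * beta) * s) / (3.0 * beta)


def interior_range(N, beta, A, R):
    """Lower and upper boundary of the range with an interior equilibrium."""
    k = (1.0 - beta) * N
    lower = max(0.0, ((k + beta) * A - 3.0 * beta * R) / k)
    upper = min((k + beta) * A / (k + 3.0 * beta), R)
    return lower, upper


# Parameter values from the example in the text.
N, beta, A, R = 10, 0.8, 1.0, 1.0
lower, upper = interior_range(N, beta, A, R)

# At the lower boundary the interior expansion equals R - s (exhaustion),
# at the upper boundary it is zero (no further experimentation).
expansion_at_lower = total_expansion(lower, N, beta, A)
expansion_at_upper = total_expansion(upper, N, beta, A)
```

For these values the lower boundary evaluates to 0.2, in line with the region without a “cautious” equilibrium described for N = 10.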
Figure 5 plots the value function for a uniform prior (with A = R = 1) and a discount
factor of β = 0.8, illustrating how it changes as the number of players increases. The more
players there are, the greater the distance of the non-cooperative value function (assuming
that the players coordinate on the Pareto-dominant equilibrium, plotted by the blue solid
diamonds) to the socially optimal value function (plotted by the black open circles). In
particular when N = 10, one sees the region (roughly from s = 0 to s = 0.2) where there
is no “cautious” equilibrium, and the large value of s̄nc (roughly 0.62) when it first becomes
individually rational to remain standing. All in all, however, this example shows that the
threat of an irreversible regime shift is very effective when the common pool externality
applies only to the risk of crossing the threshold. (At least for this specific utility function
and these parameter values. Note that β = 0.8 implies an unreasonably high discount rate,
but it was chosen to magnify the effect of non-cooperation for a small number of players.)
[Figure 5. Two panels titled N=5 and N=10. Horizontal axes run from 0.0 to 1.0 with marked values snc , snc , s∗ ; vertical axes run from 0 to 5; curves: Social Optimum and Non-cooperation.]
Figure 5: Illustration of expected total payoff in the social optimum (blue open circles) and in the
cautious non-cooperative equilibrium (red dashed line). The left panel shows the aggregate payoff in
the game when there are N = 5 players and the right panel shows N = 10. Parameters and functional
forms: u(c) = √c, β = 0.8, A = R = 1, and Fs (δ) = (1 − s − δ)/(1 − s).
Part III – Discussion
The paper’s main results do not rely on specific functional forms for utility or the probability distribution of the threshold’s location. Tractability was achieved by: (1) considering
extremely simple resource dynamics, namely that the resource remains intact and replenishes
fully in the next period as long as resource use in the current period has not exceeded T ; and
(2) assuming that the value after the threshold has been crossed is independent of the
situation before the regime shift has occurred. Thus, one could interpret R as the upper
bound of the consumption possibility set in the productive regime and r as the upper
bound of the consumption possibility set in the unproductive regime.
In this part of the paper, I relax the first structural assumption and discuss generic non-renewable resource dynamics in section III–1 and generic renewable resource dynamics in
section III–2. In contrast to the analysis in part II, there is now a common-pool externality
that relates to the resource itself. Therefore, the cautious non-cooperative equilibrium will
coincide with the social optimum in very special cases only. Nevertheless, it is still socially
optimal and a Nash equilibrium to undertake any experimentation in the first period only.
Thus, if the regime shift has not occurred in the first period, the threat of crossing the
catastrophic threshold disciplines non-cooperative play and leads to a Pareto improvement.
In section III–3, I then relax the second structural assumption. The value of the unproductive regime cannot be normalized to zero when it explicitly depends on the values of
the state and control variables in the pre-event regime. In such an environment, learning
will be gradual when the post-event value is declining sufficiently strongly in the size of the
experiment. Intuitively, the more costly it is to over-invest, the more it pays to experiment
repeatedly and cautiously, in spite of the fact that sequential investment yields a lower
payoff when the threshold is not crossed.
III–1 Non-renewable resource dynamics
So far, it has been assumed that the resource replenishes fully every period unless the
threshold has been crossed. Here, I study the opposite case of a non-renewable resource
to analyze the effect of a disastrous regime shift when the common-pool externality relates
not only to the risk of crossing the threshold but also to the resource itself. Specifically, I
consider the following model of extraction from a known stock of a non-renewable resource:
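\[
\max_{c_t^i} \sum_{t=0}^{\infty} \beta^t u(c_t^i)
\quad \text{subject to:} \quad
R_{t+1} =
\begin{cases}
R_t - \sum_i c_t^i & \text{if } \sum_i c_t^i \leq T \\
r & \text{if } \sum_i c_t^i > T \text{ or } R_t = r
\end{cases}
\tag{22}
\]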
I assume that the utility function is of such a form that, in a world without the threshold,
there is a Pareto-dominant non-cooperative equilibrium in which positive extraction occurs
in several periods (though the players could empty the resource in one period, if they so
wish). Due to discounting, it is clear that the extraction level will decline as time passes, both
in the social optimum and in the non-cooperative equilibrium. Due to the stock externality,
it is clear that the extraction rate in the non-cooperative equilibrium is inefficiently large.
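The transition rule in (22) can be sketched as a small Python function. This is an illustrative reading of the law of motion only, not a solution of the game; the function name, the default r = 0, and the numbers in the example are assumptions made here.

```python
def next_stock(stock, extractions, threshold, collapsed_value=0.0):
    """One period of the resource transition in (22).

    stock           -- current resource stock R_t
    extractions     -- list with each player's extraction c_t^i
    threshold       -- the threshold T on aggregate extraction
    collapsed_value -- the post-event stock r (assumed 0 by default)
    """
    total = sum(extractions)
    if total > stock:
        raise ValueError("aggregate extraction cannot exceed the remaining stock")
    # Once the regime shift has occurred (R_t = r), the stock stays at r;
    # it is triggered when aggregate extraction exceeds T.
    if stock == collapsed_value or total > threshold:
        return collapsed_value
    return stock - total


# Hypothetical example: two players, stock 10, threshold 4.
path = [10.0]
for _ in range(3):
    path.append(next_stock(path[-1], [1.5, 1.5], threshold=4.0))
```

In this example aggregate extraction stays at 3 per period, below the assumed threshold of 4, so the stock simply declines by 3 each period.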
A simple interpretation of this model could be a mine from which several agents extract
a valuable resource. For example, the structure of the mining shafts may collapse and the
remainder of the resource becomes inaccessible if aggregate extraction is too high in any
given period. In spite of this natural interpretation, two things are rather peculiar about
this model setup: First, any player can extract any amount up to Rt . (The option to
introduce a capacity constraint on current extraction – though realistic – would come at the
cost of significant clutter without yielding any apparent benefit.) Second, the assumption
that R0 is known and that T is constant means that this is not a problem of eating a cake of
unknown size. This problem has long been dealt with in the literature (see e.g. Kemp,
1976; Hoel, 1978) and is not considered here.
As in part II of the paper, it is instructive to first discuss the case when the location
of the threshold is known in order to expose the strategic structure that results from the
potential for a disastrous regime shift. Let c̃nc (Rt ) be the total non-cooperative extraction
level (as a function of the resource stock Rt ) in absence of the regime shift risk. Clearly, if
T > c̃nc (R0 ) the threshold is immaterial and the whole problem is not interesting. Thus, I
only consider the case when the known value of T is below c̃nc (R0 ). The relevant question is
therefore whether the agents can coordinate on staying at the level T in the first period or
not. If they can, they will stay at the level T for an interval of time t = 0, 1, ..., τ , where τ is
the time at which the resource stock has been depleted to a level where the non-cooperative
extraction path stays below T until the resource is exhausted. That is, Rτ is given by R0 − τ T ,
where τ is defined implicitly by c̃nc (Rτ ) = T .
The temptation to empty the resource in the first period, when all other players stay at
the threshold, is given by $u\!\left(R_0 - \frac{N-1}{N}T\right) > u\!\left(\frac{T}{N}\right) + \beta V(R_0 - T)$. Whether this inequality
holds depends on the particular form of u and V and cannot be answered in general. Still,
it is possible to prove the following:
Proposition 9. In the game described by (22), a known threshold is crossed in the first
period, or never.
Proof. The proof is given in Appendix A–9.
When we now turn to the case where the location of the threshold is unknown, it will
not be surprising that it does not pay to experiment in the second (or any later) period
because learning is only affirmative. This means that, if the threshold has not been crossed,
the extraction path will be constrained by s1 = s0 + δ nc (s0 ). While the extraction path
in the absence of regime shift risk, c̃nc (Rt ), declines monotonically, the extraction path
under regime shift risk in the “cautious equilibrium” is characterized by an initial non-declining phase for which ct = s1 = s0 + δ nc (s0 ), and at some period τ , when c̃nc (Rτ ) = s1 ,
the extraction path then follows c̃nc (Rt ) (i.e. it declines monotonically). As the players
would never expand the set of safe extraction possibilities beyond c̃nc (R0 ) in the cautious
equilibrium, the threat of a disastrous regime shift is welfare improving when the players
coordinate. However, the cautious equilibrium will coincide with the first-best only in the
very special case that the initially safe value s0 is binding in each period of the game and
the social optimum.
Proposition 10 summarizes this discussion.
Proposition 10. In the game described by (22) when T is unknown, there exists, in addition
to the aggressive equilibrium in which the resource is exhausted in the initial period, a Pareto-dominant
equilibrium in which experimentation – if at all – is undertaken in the first period
only and s1 = s0 + δ nc (s0 ) is an upper bound on aggregate extraction for the remainder of
the game. The threat of a regime shift slows down the extraction rate and improves welfare.
Proof. The proof is given in Appendix A–10.
III–2 Renewable resource dynamics
Consider a generic renewable resource problem, where the objective of player i is:
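\[
\max_{c_t^i} \sum_{t=0}^{\infty} \beta^t u(c_t^i, R_t)
\quad \text{s.t.:} \quad
R_{t+1} =
\begin{cases}
G\!\left(R_t - \sum_i c_t^i\right) & \text{if } \sum_i c_t^i \leq T \\
r & \text{if } \sum_i c_t^i > T \text{ or } R_t = r
\end{cases}
\tag{23}
\]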
Note that the instantaneous utility function now directly depends on the resource stock
(with $\partial u / \partial R > 0$). This could, for example, be due to stock-dependent harvesting costs, as
is usual for fishery models. Suppose that without the threshold, there is a unique Nash
equilibrium with steady state resource stock $R_\infty^{nc}$. Due to the negative stock externality, the
socially optimal steady state resource stock, $R_\infty^{so}$, is larger than $R_\infty^{nc}$.
Parallel to section III–1, the threshold applies to the total exploitation level in any given
period, and not to the stock as such. This structure is without loss of generality here,
because there is a one-to-one mapping between total harvest and escapement for a given
initial stock.
Assume that the initial resource stock R0 , and the initial safe exploitation level, s0 , are
above $R_\infty^{so}$. Assume furthermore that there are no capacity constraints to harvesting, so that
any agent i can consume $R_0 - R_\infty^{nc}$ in one period should he or she wish to do so. Although
we would need to put a lot more structure on this problem to solve it explicitly, we can
make the following statement:
Proposition 11. Unless $\delta_0^{nc}(s_0) = R_0 - R_\infty^{nc}$, the threat of a disastrous regime shift will
strictly improve welfare as players can coordinate on a steady-state resource stock above $R_\infty^{nc}$.
Proof. The argument for the fact that the players can coordinate on a cautious equilibrium
is the same as the one for Proposition 10 and is omitted.
As in the case of non-renewable resource dynamics, the maximum extraction level that
is known to be safe puts an upper bound on extraction and thereby mitigates the negative
stock externality. If the initial value of s0 is relatively low the players will experiment once
and stay at the updated level, unless, of course, the threshold has been crossed.
The disciplining effect of the threat of a stock collapse may indeed be very policy relevant,
as the example of the North-Atlantic herring fishery suggests: For centuries, this fishery has
been the economic centerpiece for many regions in Northern Europe. The collapse of this
immense stock in the late 1960s created significant hardship for the affected communities. In
spite of a complete harvest moratorium, it took almost 30 years for the fish stock to recover.
By the end of the 1990s the spawning stock biomass had reached levels above 6 million
tons again and the fishery was re-opened. A changed distribution pattern in the early 2000s
led to strong disagreement among the harvesting nations and severe overfishing. The trend
in biomass growth halted and even turned negative. Nevertheless, the competing nations
could restore cooperation, supposedly because they were “staring into the abyss that yawned
before them” (Miller et al., 2013, p.325).
III–3 Pre-event choices matter for post-event value
The second simplifying assumption in parts I and II that allowed me to get tractable results
was that pre-event choices did not matter for post-event value. In this sense, the regime shift
was really disastrous, breaking any links between the state before T has been crossed and
the state afterwards. Because of this assumption, I could simply normalize the continuation
value in case of a regime shift to zero. For some applications, this independence of the
post-event value is not a very fitting description. This is especially true when the system
under consideration is large, such as for global climate change, and the threshold effect on
the damage is not truly catastrophic, but just one of many parts in the equation. In such a
setting, it would be more appropriate to explicitly take into account how the continuation
value depends on the pre-event situation, for example by writing the continuation value
function as W (st+1 ).
How W would depend on the pre-event values of the state and choice variables s and δ
is not generally clear. The capital stock in a climate change application, for example, has
an ambiguous effect (Ploeg and Zeeuw, 2015a). On the one hand, it could buffer against
the adverse effects of the regime shift and hence smooth consumption over regimes. On the
other hand, a higher capital stock implies more intense use of fossil fuels, which aggravates
climate damages. In a renewable resource application, Ren and Polasky (2014) similarly
discuss under which conditions regime shift risk implies more cautionary management. In
particular, they highlight the role of an “investment effect”: because investing in the renewable resource stock by harvesting less pays off badly should the regime shift occur, there are
incentives for more aggressive management. These incentives are balanced by what they call
the “consumption smoothing effect” (leading to more precaution in their application) and
the “risk reduction effect”.
Whether overall W′(st+1 ) > 0 or W′(st+1 ) < 0 will depend on the specific model at
hand. Regardless of this, however, when pre-event choices matter for the post-event value,
it is no longer immaterial by how much one has stepped over the threshold when it is crossed.
Below, I derive general conditions when this implies that it is not optimal to undertake all
experimentation at once, but rather approach the value at which experimentation ceases
sequentially.
III–3.1 Optimal and non-cooperative experimentation
The value function for a social planner that knows that the current consumption level s is
safe, but an increase of the consumption flow by an amount δ may trigger a regime shift
is given by (24), where the post-event value function is given by W . I assume that Wt+1
depends only on the pre-event value st at time t and on the choice in period t. In other
words, the Markov property is maintained and the problem is stationary. Furthermore, I
continue to assume that the regime shift is irreversible.
\[
V(s) = \max_{\delta \in [0, R-s]} \; u(s+\delta) + \beta \Big[ \big(1 - F_s(\delta)\big) \cdot V(s+\delta) + F_s(\delta) \cdot W(s+\delta) \Big]
\tag{24}
\]
When the post-event value is a function of the pre-event state, optimal experimentation
will occur in several steps only if the post-event value declines with the size of the last step
before the threshold has been crossed. Intuitively, it only pays off to proceed in several
steps when crossing the threshold is worse the further it has been exceeded. To see this
more clearly, I compare the expected value of traversing a given distance in one step with
the expected value of traversing the same distance in two steps (equation 25). In the proof of
Proposition 1 (in Appendix A–1) this construction was used to show that experimentation
in the basic setup involves at most one step. Before analyzing equation (25), note that also
here there will be some value of s∗ such that for s ≥ s∗ , it is not optimal to experiment any
further, but rather to stay at s.
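The recursion in (24) lends itself to value function iteration on a grid. The Python sketch below illustrates the idea; it is not the author's script (footnote 8 reports that the original computations were done in R), and it rests on assumptions stated here: u(c) = √c, A = R, a uniform prior so that the conditional probability of crossing the threshold when stepping from s to s + δ is δ/(R − s) (this reads the expression (1 − s − δ)/(1 − s) printed in the figure captions as the probability of not crossing), and a post-event value W that defaults to zero as in the baseline model.

```python
import math


def solve_planner(beta=0.8, R=1.0, n=41, tol=1e-10, post_event=None):
    """Value function iteration for the planner's problem (24) on a grid.

    Returns the grid of safe values, the value function, and the index of
    the chosen next safe value for every grid point.
    """
    grid = [R * i / (n - 1) for i in range(n)]
    u = math.sqrt                                       # assumed utility
    W = post_event if post_event is not None else (lambda x: 0.0)
    V = [u(s) / (1.0 - beta) for s in grid]             # start from "stay forever"
    policy = list(range(n))
    while True:
        V_new = []
        for i, s in enumerate(grid):
            best_val, best_j = -math.inf, i
            for j in range(i, n):                       # candidate next safe values
                delta = grid[j] - s
                # Assumed uniform prior: probability that the step crosses T.
                F = 0.0 if delta == 0.0 else min(1.0, delta / (R - s))
                val = u(grid[j]) + beta * ((1.0 - F) * V[j] + F * W(grid[j]))
                if val > best_val:
                    best_val, best_j = val, j
            V_new.append(best_val)
            policy[i] = best_j
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return grid, V_new, policy
        V = V_new


grid, V, policy = solve_planner()
```

A state-dependent post-event value can be passed through the hypothetical `post_event` argument, for example `post_event=lambda x: math.sqrt(max(1.0 - x, 0.0) / (1 - 0.8 ** 2))` for the square-root case with R = 1 and β = 0.8 discussed in section III–3.2. The grid size and tolerance are arbitrary choices and affect accuracy.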
As in the proof of Proposition 1, take some s̃ outside of the set of values at which it is
optimal to remain standing (I denote this set by S), and let δ ∗ be the optimal step from s̃
into S. Then denote by δ̃ the step from some s to s̃, and compare the expected payoff from
taking one step of size δ̂ = δ̃ + δ ∗ to the expected payoff from taking two steps: one step of
size δ̃ from s to s̃, and then the step δ ∗ from s̃ into S.
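\[
\begin{aligned}
& u(s+\hat\delta) + \beta\Big[F_s(\hat\delta)\,W(s+\hat\delta) + \big(1-F_s(\hat\delta)\big)V(s+\hat\delta)\Big] \;\overset{?}{>}\; \\
& u(s+\tilde\delta) + \beta\Big[F_s(\tilde\delta)\,W(s+\tilde\delta) + \big(1-F_s(\tilde\delta)\big)\Big(u(s+\tilde\delta+\delta^*) \\
& \qquad + \beta\big[F_{s+\tilde\delta}(\delta^*)\,W(s+\tilde\delta+\delta^*) + \big(1-F_{s+\tilde\delta}(\delta^*)\big)V(s+\tilde\delta+\delta^*)\big]\Big)\Big]
\end{aligned}
\]
Using the fact that because s + δ̃ + δ ∗ ∈ S we know $V(s+\tilde\delta+\delta^*) = \frac{u(s+\hat\delta)}{1-\beta}$, this can be
re-written as:
\[
\begin{aligned}
u(s+\hat\delta) - u(s+\tilde\delta) \;\overset{?}{>}\;
& \beta\,\frac{\int_{s+\tilde\delta}^{s+\hat\delta} f(y)\,dy}{1-F(s)}\Big[u(s+\hat\delta) - (1-\beta)\,W(s+\hat\delta)\Big] \\
& + \beta\,\frac{\int_{s}^{s+\tilde\delta} f(y)\,dy}{1-F(s)}\Big[W(s+\tilde\delta) - W(s+\hat\delta)\Big]
\end{aligned}
\tag{25}
\]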
Equation (25) compares the immediate gain from taking a large instead of a small step
with the (discounted) future consequences when the threshold was either located between
s + δ̃ and s + δ̂, or located between s and s + δ̃. Clearly, the more the future is discounted,
the more important will be the immediate gain of taking a large step. But note also that
when W = 0, the condition above is identical to equation (A-2) in the basic setup, where
it was shown that the LHS was larger than the RHS. Now when W ≠ 0, there are two
additional effects. First, postponing the regime shift is less valuable when the consequences
of the regime shift are less severe. This is captured by the term (1 − β)W (s + δ̂) in the first
line of equation (25). This effect strengthens the social planner’s incentive to experiment
only once. However, this effect can be overturned when W (s + δ̃) > W (s + δ̂): Specifically,
the term in the second line of equation (25) represents the value of postponing the loss from
having overstepped by a lot rather than just a little, and it is proportional to the probability
that the threshold is located in the interval between s and s + δ̃. In summary, only when the
post-event value declines sufficiently strongly in the size of the pre-event experiment will it
pay to update the upper bound of the set of safe consumption values in several steps.
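The step from the one-step versus two-step comparison to equation (25) can be sketched as follows. The shorthand symbols ($\hat p$, $\tilde p$, $q$, $\hat u$, $\tilde u$, $\hat W$, $\tilde W$, $\hat V$) are introduced here for compactness and do not appear in the paper, and the sketch assumes that beliefs are updated by Bayes' rule, so that $F_s(\delta) = \frac{F(s+\delta) - F(s)}{1 - F(s)}$.

```latex
% Shorthand (introduced here):
%   \hat p = F_s(\hat\delta), \tilde p = F_s(\tilde\delta), q = F_{s+\tilde\delta}(\delta^*),
%   \hat u = u(s+\hat\delta), \tilde u = u(s+\tilde\delta),
%   \hat W = W(s+\hat\delta), \tilde W = W(s+\tilde\delta),
%   \hat V = V(s+\hat\delta) = \hat u/(1-\beta).
\begin{align*}
% Bayesian updating implies
(1-\tilde p)(1-q) &= 1-\hat p, \qquad (1-\tilde p)\,q = \hat p - \tilde p. \\
% One large step:
\text{one step:}\quad & \hat u + \beta \hat p\,\hat W + \beta(1-\hat p)\hat V \\
% Two steps, after multiplying out the inner bracket:
\text{two steps:}\quad & \tilde u + \beta \tilde p\,\tilde W + \beta(1-\tilde p)\hat u
   + \beta^2(\hat p-\tilde p)\hat W + \beta^2(1-\hat p)\hat V \\
% Use \beta(1-\beta)(1-\hat p)\hat V = \beta(1-\hat p)\hat u and collect terms:
\text{one step} > \text{two steps}
 \;\Longleftrightarrow\;
 \hat u - \tilde u &> \beta(\hat p-\tilde p)\big[\hat u - (1-\beta)\hat W\big]
   + \beta\,\tilde p\,\big[\tilde W - \hat W\big].
\end{align*}
```

Here $\hat p - \tilde p$ is the probability that the threshold lies between s + δ̃ and s + δ̂, and $\tilde p$ is the probability that it lies between s and s + δ̃, which are the two integral terms in (25).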
In the non-cooperative game, there are two countervailing forces when pre-event choices
matter for post-event value: On the one hand, the classic dynamic common pool externality
emphasizes current short-term gains over long-term savings, which speaks against cautious,
sequential experimentation. On the other hand, as also the post-event value will be characterized by the common-pool externality, a lower value of the state variable after the regime shift
has been triggered is much more harmful under non-cooperation than in the social optimum.
This effectively increases the penalty from overstepping the threshold by far and speaks in
favor of cautious, sequential experimentation. How these two effects play out cannot be said
in general, and I therefore turn to the specific example that I have developed throughout
this paper in the next section.
III–3.2 Numerical solution for specific functional forms
To illustrate the possible size of the effect when pre-event choices matter for the post-event
value, I numerically solve the model for the specific functional forms developed above.8
For the post-event continuation value, I assume that the resource loses all its productivity
once the regime shift occurs. In other words, in the social optimum W is the highest value
that can be obtained when spreading the consumption of the now non-renewable resource
rt = R − st − δt over the remaining time horizon. Formally:
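\[
\begin{aligned}
W(s_t + \delta_t) &= \max_{0 \leq c_\tau \leq r_\tau} \sum_{\tau=t}^{\infty} \beta^{\tau-t} u(c_\tau)
\quad \text{subject to: } r_{\tau+1} = r_\tau - c_\tau; \quad r_t = R - s_t - \delta_t \\
&= \sqrt{\frac{R - (s_t + \delta_t)}{1 - \beta^2}}
\qquad \text{when } u(c) = \sqrt{c}
\end{aligned}
\]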
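The closed form for W under square-root utility can be checked numerically. The Python sketch below is an illustration under assumptions made here: it restricts attention to paths that consume a fixed share of the remaining stock each period (the Euler equation for square-root utility implies the share 1 − β²), truncates the infinite sum after a finite number of periods, and uses hypothetical values for β and the remaining stock.

```python
import math


def cake_value_closed_form(r, beta):
    """Closed-form post-event value sqrt(r / (1 - beta^2)) for u(c) = sqrt(c)."""
    return math.sqrt(r / (1.0 - beta ** 2))


def cake_value_of_share_rule(r, beta, share, periods=500):
    """Discounted utility of consuming a fixed share of the remaining stock.

    The infinite horizon is truncated after `periods` periods.
    """
    total, stock = 0.0, r
    for k in range(periods):
        c = share * stock
        total += beta ** k * math.sqrt(c)
        stock -= c
    return total


# Hypothetical values: remaining stock r = R - (s + delta) = 0.5, beta = 0.8.
beta, r = 0.8, 0.5
closed_form = cake_value_closed_form(r, beta)
best_rule = cake_value_of_share_rule(r, beta, share=1.0 - beta ** 2)
slower_rule = cake_value_of_share_rule(r, beta, share=0.2)
faster_rule = cake_value_of_share_rule(r, beta, share=0.5)
```

Consuming the share 1 − β² reproduces the closed form, while the slower and faster share rules used for comparison yield a lower value.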
Clearly, W′(st + δt ) < 0. Turning to the non-cooperative game, denote the aggregate
expansion of all N players at time t by δt , let δi,t be the expansion of player i and let
δ−i,t be the expansion of all other players. For a square-root utility function and without
exogenous constraints in extraction, the number of players cannot be too big in order to
have an interior equilibrium with positive extraction over the entire time path (instead of
immediate, pre-emptive, extraction in the first period). Here I choose N =2. We then have:
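\[
W^{nc}(s_t + \delta_{i,t} + \delta_{-i,t}) = \max_{0 \leq c_\tau \leq r_\tau} \sum_{\tau=t}^{\infty} \beta^{\tau-t} u(c_{i,\tau})
\quad \text{subject to: } r_t = R - (s_t + \delta_{i,t} + \delta_{-i,t});
\quad \text{and: } r_{\tau+1} = r_\tau - \sum_{i}^{N} c_{i,\tau}.
\]
For a symmetric equilibrium $c_{i,\tau} = c_\tau^{nc}$ ∀i and ∀τ , we have the following explicit solution:
\[
W^{nc}(s_t + \delta_{i,t} + \delta_{-i,t}) = \sqrt{\frac{R - (s_t + \delta_{i,t} + \delta_{-i,t})}{\beta\sqrt{1 - \beta^2}}}
\qquad \text{when } u(c_i) = \sqrt{c_i} \text{ and } N = 2.
\]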
Panel (a) of Figure 6 shows the optimal size of experimentation δ, as a function of the
safe value s, for three different values of the discount factor β. The overall structure of the
optimal policy is the same as in the baseline case (compare to Figure 3), namely that there
is a region where s < s∗ for which experimentation is optimal, and a region where s ≥ s∗ for
which it is not optimal to experiment (note that I do not plot the entire state-space [0, R] here
in order to concentrate on the relevant range of state values). Moreover, as in the baseline
scenario of Part I, the size of the experiment is decreasing in the value of consumption that is
known to be safe. In contrast to the baseline case of an exogenous post-event value, however,
Figure 6a shows clearly how there may be repeated experimentation when β = 0.95 and
β = 0.99. In fact, as the optimal policy in these cases is always below the curve that maps
the policy of taking only one step (that is, the curve at which δ(s) = s∗ − s), represented by
the thin black line, it is optimal to approach s∗ only asymptotically. As discussed above,
the more strongly the future is discounted, the less valuable is a cautious approach. In Figure
6a, this is illustrated by plotting the optimal policy for β = 0.60, which is always above the
thin black line that represents δ(s) = s∗ − s. That is, in the case that β is sufficiently small,
all experimentation will be undertaken at once.
8 I solve by straightforward value function iteration, using the computer program R 3.1.1 (2014). The
scripts are available on request.
[Figure 6. Two panels: (a) Social optimal expansion; (b) Aggregate non-cooperative expansion. Horizontal axes: Initial safe value s; vertical axes: Total extraction δ(s); one curve per value of β.]
Figure 6: Illustration of optimal and non-cooperative experimentation when pre-event choices matter
for post-event value. Parameters and functional forms: u(c) = √c, N = 2, A = R = 1, Fs (δ) = (1 − s − δ)/(1 − s);
various values for β.
Panel (b) serves to contrast the socially optimal expansion with the aggregate non-cooperative expansion. Not surprisingly, non-cooperative experimentation is inefficiently
risky. But again, the overall structure of the non-cooperative equilibrium is the same as
in the baseline case, namely there is a region where s < snc with positive experimentation,
and a region where s ≥ snc for which it is an equilibrium to not experiment. Again, as
in the baseline scenario of Part II, the size of the experiment is decreasing in the value
of consumption that is known to be safe. However, in contrast to the baseline scenario,
equilibrium expansion does not significantly exceed the level that brings st+1 to the set at
which no further experimentation is an equilibrium. At the same time, endogeneity of the
post-event value does not induce sequential experimentation either, unless the value of β
is very high. Generally, it is noteworthy that the slope of the aggregate expansion is much
less sensitive to changes in the discount factor as compared to the social optimum (though
it moves snc to the left and right, respectively, with a higher or lower β, as in the social
optimum).
Conclusion
The effect of potential regime shifts on the first-best and the non-cooperative use of environmental goods and services is polarizing. Depending on the initial level of use that is known
to be safe, and depending on the belief about the location of the catastrophic threshold, it
may be optimal to never use more than the initially known safe level. Or, it may be optimal to experiment exactly once and then not expand resource use again when the updated
level of resource use turned out to be safe. This first experiment may however imply the
exhaustion of the resource.
Similarly, when the players believe that even a low level of consumption causes catastrophe, the game exhibits prisoner’s-dilemma features: Although it would be optimal to
sustain the resource at its current level of use, the only non-cooperative equilibrium will
be the immediate extirpation of the resource. In contrast, when the players believe that
it is sufficiently likely that the productive regime can be sustained even at a high level of
consumption, the game changes into a coordination problem: The threat of losing the productive resource can effectively enforce the first-best consumption level. For intermediate
values, the equilibrium will neither be extirpation nor status-quo consumption, but rather a
one-time increase in consumption, expanding the set of safe consumption possibilities. This
expansion will be inefficiently large compared to the first-best experiment, but if it has not
caused the regime shift, the players will be able to coordinate on staying at the updated
level. Staying at the updated level is ex post socially optimal.
When the externality applies not only to the risk of a regime shift (i.e. any given level of
safe consumption is efficiently shared among the agents), but also to the resource use itself,
the threat of the threshold loses importance. Due to the dynamic common-pool externality
on the resource, non-cooperative extraction will be inefficiently high even in absence of any
risk of a regime shift. This means that when the threshold is believed to be above the first-best consumption pattern, its threat cannot act as a “commitment device” to ensure efficient
extraction. Nevertheless, the threshold may still dampen non-cooperative extraction.
These conclusions have been derived by using a general dynamic model that has placed
only minimal requirements on the utility function (concavity and boundedness) and the
probability distribution of the threshold (continuity). Their robustness has been demonstrated by exploring a range of alternative assumptions on the timing at which the catastrophe occurs, on the renewability of the regime shift, and the growth potential of the
resource. Nevertheless, there are four aspects of the underlying modeling structure that
warrant special discussion.
First, a prominent aspect of this model is that the threshold itself is not stochastic.
The central motivation is that this allows concentrating on the effect of uncertainty about
the threshold’s location. This is arguably the core of the problem: we don’t know which
level of use triggers the regime shift. This modeling approach implies a clear demarcation
between a safe region and a risky region of the state space. In particular, it implies that
the edge of the cliff, figuratively speaking, is a safe – and in many cases optimal – place
to be. The alternative approach, modeling the risk of a regime shift by a hazard rate,
acknowledges that, figuratively speaking, the edge of a cliff is often quite windy and not a
particularly safe place. This, however, implies that under a hazard rate approach, the regime
shift will occur with certainty as time goes to infinity, no matter how little of the resource
is used. Eventually, there will be a gust of wind that is strong enough to blow us over
the edge, regardless of where we stand. This is of course not very realistic either. But
also on a deeper level one could argue that the non-stochasticity is in effect not a flaw but
a feature: Let me cite Lemoine and Traeger (2014, p.28) who argue that “we would not
actually expect tipping to be stochastic. Instead, any such stochasticity would serve to
approximate a more complete model with uncertainty (and potentially learning) over the
precise trigger mechanism underlying the tipping point.” This being said, it would still be
interesting to investigate how the choice between a hazard-rate formulation (as in Polasky
et al., 2011 or Sakamoto, 2014) or a threshold formulation influences the outcome and policy
conclusions in an otherwise identical model.
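The contrast between the two formulations can be illustrated with a minimal numerical sketch. The constant hazard rate of 0.02, the uniform distribution of T on [0, 1], and the function names are illustrative assumptions, not part of the model in the text.

```python
def survival_hazard(h, periods):
    """Probability of no regime shift after `periods` periods
    under an (assumed) constant per-period hazard rate h."""
    return (1.0 - h) ** periods


def survival_threshold(s, delta, periods, upper=1.0):
    """Survival probability in the threshold model when use is raised once
    from the safe level s to s + delta and then held there.
    Assumes T ~ Uniform[0, upper]; only the first period carries risk."""
    if periods == 0:
        return 1.0
    return (upper - (s + delta)) / (upper - s)


horizons = (1, 10, 100, 1000)
hazard_path = [survival_hazard(0.02, t) for t in horizons]
threshold_path = [survival_threshold(0.2, 0.1, t) for t in horizons]

for t, p_h, p_t in zip(horizons, hazard_path, threshold_path):
    print(t, p_h, p_t)
```

Under the hazard-rate assumption the survival probability shrinks towards zero with the horizon, whereas in the threshold sketch it stays at the value realized after the single experiment.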
Second, I have modeled the players to be identical. In the real world, players are rarely
identical. One dimension along which players could differ is their valuation of the future.
However, prima facie it should not be difficult to show that any such differences could be
smoothed out by a contract that gives a larger share of the gains from cooperation to more
impatient players. Another dimension along which players could differ is their beliefs about
the existence and location of the threshold. Agbo (2014) and Koulovatianos (2015) analyze
belief heterogeneity about the renewability parameter in the framework of Levhari and
Mirman (1980). They find that in particular the players with the most pessimistic beliefs can
have a detrimental effect on resource governance. In the current set-up such a heterogeneity
could lead to interesting dynamics and possible multiple equilibria, where some players
are so pessimistic about the location of T that they rationally do not want to experiment,
whereas other players would want to invest in learning and experimentation. Finally, players
could differ in their size or the degree to which they depend on the environmental goods or
services in question. As larger players are likely to be able to internalize a larger part of the
externality than smaller players, different sets of equilibria may emerge. Especially in light
of the discussions surrounding a possible climate treaty (Harstad, 2012; Nordhaus, 2015), it
is topical to analyze a situation where groups of players can form a coalition to ameliorate
the negative effects of non-cooperation in future applications.
Third, while I have analyzed reversibility of the regime shift in the social planner environment of Part I, it was beyond the scope of this paper to also do so in the game. However,
leveraging the tractability of the current modeling approach to explore this issue could be
very fruitful. For example, if one presumes that crossing the threshold implies that one
learns where it is, the game turns into a repeated game. On the one hand, this may imply
that cooperation is sustainable for sufficiently patient players (van Damme, 1989). But on
the other hand, there could also be cases where irreversibility emerges “endogenously” when
it is possible – but not an equilibrium – to move out of a non-productive regime.
A final, related, point is the fact that I have concentrated on Markovian strategies. When
the players are allowed to use history-dependent strategies, the threat of a threshold may
allow them to coordinate on the social optimum in all phases of the game. They could simply
agree on expanding the set of safe consumption possibilities by the socially optimal amount
and threaten that if any player steps too far, this triggers the depletion of the resource in the
next period. This obviously begs the question of renegotiation proofness, but it is plausible
that already a contract that is binding for two periods is sufficient to achieve the first-best.
The threat of a disastrous regime shift is a very strong coordinating device. This is true
irrespective of whether the threshold’s location is known or unknown, because the current
model of uncertainty implies that it is, loosely speaking, pitch dark when the players take a
step. It is only afterwards that they realize whether the disastrous regime shift has occurred
or not. Would the coordinating force of a catastrophic threshold diminish when the players
can learn about its location without risking crossing it? Importantly, an extension of the
model along these lines would link the game-theoretic part to the debate on “early warning
signals” (Scheffer et al., 2009; Boettiger and Hastings, 2013) and is the task of future work.
References
Aflaki, S. (2013). The effect of environmental uncertainty on the tragedy of the commons. Games and
Economic Behavior, 82(0):240–253.
Agbo, M. (2014). Strategic exploitation with learning and heterogeneous beliefs. Journal of Environmental
Economics and Management, 67(2):126–140.
Amundsen, E. S. and Bjørndal, T. (1999). Optimal exploitation of a biomass confronted with the threat of
collapse. Land Economics, 75(2):185–202.
Barlow, P. and Reichard, E. (2010). Saltwater intrusion in coastal regions of North America. Hydrogeology
Journal, 18(1):247–260.
Barrett, S. (2013). Climate treaties and approaching catastrophes. Journal of Environmental Economics
and Management, 66(2):235–250.
Barrett, S. and Dannenberg, A. (2012). Climate negotiations under scientific uncertainty. Proceedings of
the National Academy of Sciences, 109(43):17372–17376.
Bochet, O., Laurent-Lucchetti, J., Leroux, J., and Sinclair-Desgagné, B. (2013). Collective Dangerous Behavior: Theory and Evidence on Risk-Taking. Research Papers by the Institute of Economics and Econometrics, Geneva School of Economics and Management, University of Geneva 13101, Institut d’Economie
et Econométrie, Université de Genève.
Boettiger, C. and Hastings, A. (2013). Tipping points: From patterns to predictions. Nature, 493(7431):157–
158.
Bolton, P. and Harris, C. (1999). Strategic experimentation. Econometrica, 67(2):349–374.
Bonatti, A. and Hörner, J. (2015). Learning to disagree in a game of experimentation. Discussion paper,
Cowles Foundation, Yale University.
Brozović, N. and Schlenker, W. (2011). Optimal management of an ecosystem with an unknown threshold.
Ecological Economics, 70(4):627 – 640.
Costello, C. and Karp, L. (2004). Dynamic taxes and quotas with learning. Journal of Economic Dynamics
and Control, 28(8):1661–1680.
Crépin, A.-S. and Lindahl, T. (2009). Grazing games: Sharing common property resources with complex
dynamics. Environmental and Resource Economics, 44(1):29–46.
Cropper, M. (1976). Regulating activities with catastrophic environmental effects. Journal of Environmental
Economics and Management, 3(1):1–15.
Fesselmeyer, E. and Santugini, M. (2013). Strategic exploitation of a common resource under environmental
risk. Journal of Economic Dynamics and Control, 37(1):125–136.
Frank, K. T., Petrie, B., Choi, J. S., and Leggett, W. C. (2005). Trophic cascades in a formerly cod-dominated ecosystem. Science, 308(5728):1621–1623.
Gerlagh, R. and Liski, M. (2014). Carbon Prices for the Next Hundred Years. CESifo Working Paper Series
4671, CESifo Group Munich.
Gladwell, M. (2000). The Tipping Point: How Little Things Can Make a Big Difference. Little Brown.
Groeneveld, R. A., Springborn, M., and Costello, C. (2013). Repeated experimentation to learn about a
flow-pollutant threshold. Environmental and Resource Economics, forthcoming:1–21.
Harsanyi, J. C. and Selten, R. (1988). A general theory of equilibrium selection in games. MIT Press,
Cambridge.
Harstad, B. (2012). Climate contracts: A game of emissions, investments, negotiations, and renegotiations.
The Review of Economic Studies, 79(4):1527–1557.
Hoel, M. (1978). Resource extraction, uncertainty, and learning. The Bell Journal of Economics, 9(2):642–
645.
Keller, G., Rady, S., and Cripps, M. (2005). Strategic experimentation with exponential bandits. Econometrica, 73(1):39–68.
Kemp, M. (1976). How to eat a cake of unknown size. In Kemp, M., editor, Three Topics in the Theory of
International Trade, pages 297–308. North-Holland, Amsterdam.
Kim, Y. (1996). Equilibrium selection in n-person coordination games. Games and Economic Behavior,
15(2):203–227.
Kolstad, C. and Ulph, A. (2008). Learning and international environmental agreements. Climatic Change,
89(1-2):125–141.
Kossioris, G., Plexousakis, M., Xepapadeas, A., de Zeeuw, A., and Mäler, K.-G. (2008). Feedback Nash
equilibria for non-linear differential games in pollution control. Journal of Economic Dynamics and
Control, 32(4):1312–1331.
Koulovatianos, C. (2015). Strategic exploitation of a common-property resource under rational learning
about its reproduction. Dynamic Games and Applications, 5(1):94–119.
Lemoine, D. and Traeger, C. (2014). Watch your step: Optimal policy in a tipping climate. American
Economic Journal: Economic Policy, forthcoming:1–47.
Lenton, T. M., Held, H., Kriegler, E., Hall, J. W., Lucht, W., Rahmstorf, S., and Schellnhuber, H. J.
(2008). Tipping elements in the earth’s climate system. Proceedings of the National Academy of Sciences,
105(6):1786–1793.
Levhari, D. and Mirman, L. J. (1980). The Great Fish War: An Example Using a Dynamic Cournot-Nash
Solution. The Bell Journal of Economics, 11(1):322–334.
Miller, K. A., Munro, G. R., Sumaila, U. R., and Cheung, W. W. L. (2013). Governing marine fisheries in a
changing climate: A game-theoretic perspective. Canadian Journal of Agricultural Economics, 61(2):309–
334.
Miller, S. and Nkuiya, B. (2014). Coalition formation in fisheries with potential regime shift. Unpublished
Working Paper, University of California, Santa Barbara.
Nævdal, E. (2006). Dynamic optimisation in the presence of threshold effects when the location of the
threshold is uncertain – with an application to a possible disintegration of the western antarctic ice sheet.
Journal of Economic Dynamics and Control, 30(7):1131–1158.
Nordhaus, W. (2015). Climate clubs: Overcoming free-riding in international climate policy. American
Economic Review, 105(4):1339–1370.
Ploeg, F. v. d. and Zeeuw, A. d. (2015a). Climate tipping and economic growth: Precautionary saving and
the social cost of carbon. OxCarre Research Paper 118, OxCarre, Oxford, UK.
Ploeg, F. v. d. and Zeeuw, A. d. (2015b). Non-cooperative and cooperative responses to climate catastrophes
in the global economy: A north-south perspective. OxCarre Research Paper 149, OxCarre, Oxford, UK.
Polasky, S., Zeeuw, A. d., and Wagener, F. (2011). Optimal management with potential regime shifts.
Journal of Environmental Economics and Management, 62(2):229 – 240.
R 3.1.1 (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical
Computing, Vienna, Austria. http://www.R-project.org.
Ren, B. and Polasky, S. (2014). The optimal management of renewable resources under the risk of potential
regime shift. Journal of Economic Dynamics and Control, 40(0):195–212.
Rob, R. (1991). Learning and capacity expansion under demand uncertainty. The Review of Economic
Studies, 58(4):655–675.
Sakamoto, H. (2014). A dynamic common-property resource problem with potential regime shifts. Discussion
Paper E-12-012, Graduate School of Economics, Kyoto University.
Scheffer, M., Bascompte, J., Brock, W. A., Brovkin, V., Carpenter, S. R., Dakos, V., Held, H., van Nes,
E. H., Rietkerk, M., and Sugihara, G. (2009). Early-warning signals for critical transitions. Nature,
461(7260):53–59.
Scheffer, M., Carpenter, S., Foley, J. A., Folke, C., and Walker, B. (2001). Catastrophic shifts in ecosystems.
Nature, 413(6856):591–596.
Tavoni, A., Dannenberg, A., Kallis, G., and Löschel, A. (2011). Inequality, communication, and the avoidance of disastrous climate change in a public goods game. Proceedings of the National Academy of
Sciences, 108(29):11825–11829.
Tsur, Y. and Zemel, A. (1995). Uncertainty and irreversibility in groundwater resource management. Journal
of Environmental Economics and Management, 29(2):149 – 161.
van Damme, E. (1989). Renegotiation-proof equilibria in repeated prisoners’ dilemma. Journal of Economic
Theory, 47(1):206–217.
Appendix
A–1 Proof of Proposition 1
Recall that Proposition 1 states that the socially optimal total use of the resource is either s0 for all t or
s0 + δ ∗ (s0 ) for t = 0 and, if the resource has not collapsed, s1 for all t ≥ 1.
Part (1) First, I show that there is a non-empty set $S$ of values of $s$ at which it is optimal to stay.
Regardless of the actual form of the value function, we know that an upper bound for the continuation value $V(s+\delta)$ is $\frac{u(R)}{1-\beta}$. Using this, an upper bound for the gain from taking a positive step $\delta > 0$ is $u(s+\delta) + \beta \bar F_s(\delta)\frac{u(R)}{1-\beta}$. Therefore, when the derivative of $u(s+\delta) + \beta \bar F_s(\delta)\frac{u(R)}{1-\beta}$ with respect to $\delta$ is less than or equal to zero on the domain $\delta \in (0, R-s]$ for some given value of $s$, then it is optimal to stay (choose $\delta^*(s) = 0$) for that value of $s$. I now show that there exists some value of $s$ at which this is the case.
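Let $s = R - \varepsilon$ with $\varepsilon > 0$ so that $s$ is close to $R$. The derivative of $u(s+\delta) + \beta \bar F_s(\delta)\frac{u(R)}{1-\beta}$ is given by $u'(s+\delta) + \beta \bar F_s'(\delta)\frac{u(R)}{1-\beta}$, and evaluated at $s = R - \varepsilon$ this is:
\[
u'(R-\varepsilon+\delta) + \beta \bar F'_{R-\varepsilon}(\delta)\frac{u(R)}{1-\beta}
\;=\;
u'(R-\varepsilon+\delta) + \beta\,\frac{-f(R-\varepsilon+\delta)}{1-F(R-\varepsilon)}\,\frac{u(R)}{1-\beta}
\]
An interior solution would imply that:
\[
(1-F(R-\varepsilon))\,u'(R-\varepsilon+\delta) = \beta f(R-\varepsilon+\delta)\frac{u(R)}{1-\beta}
\]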
When it is known that there is a catastrophic threshold on $[0, R]$, we have $F(R) = 1$. In this case, we have $\lim_{\varepsilon\to 0}\left[(1-F(R-\varepsilon))\,u'(R-\varepsilon+\delta)\right] = 0$ whereas $\lim_{\varepsilon\to 0}\beta f(R-\varepsilon+\delta)\frac{u(R)}{1-\beta} > 0$ (the existence of $\lim_{x\to R^-} f(x) > 0$ is implied by the assumption of a continuous support for $T$ on $[0, R]$) so that there cannot be an interior solution for small values of $\varepsilon$.
When there is a positive probability that there is no threshold on $[0, R]$ (that is, $F(R) < 1$), then it need not be the case that $(1-F(R-\varepsilon))\,u'(R-\varepsilon+\delta) < \beta f(R-\varepsilon+\delta)\frac{u(R)}{1-\beta}$ as $\varepsilon \to 0$. However, even when $(1-F(R))\,u'(R) \geq \beta f(R)\frac{u(R)}{1-\beta}$, there will be a value of $s$, namely $s = R$, at which it is optimal to stay – simply because there is no other choice.
Thus, the set $S$ is not empty. Moreover, when the hazard rate is not decreasing with $s$, that is when
\[
\frac{\partial \bar F_s(\delta)}{\partial s} = \frac{-f(s+\delta)(1-F(s)) + (1-F(s+\delta))f(s)}{[1-F(s)]^2} < 0
\;\Leftrightarrow\;
\frac{f(s)}{1-F(s)} < \frac{f(s+\delta)}{1-F(s+\delta)},
\]
it can be shown that the set $S$ is convex, so that $S = [s^*, R]$ where $s^*$ is defined in the main text as the lowest value of $s$ at which it is optimal to never experiment. First, note that convexity of $S$ is trivial when $s^* = R$. Consider then the case that $s^* < R$. By definition we have for $s^*$ that the first-order condition must just hold with equality:
\[
u'(s^*) = \beta\,\frac{f(s^*)}{1-F(s^*)}\,\frac{u(s^*)}{1-\beta}
\]
Convexity of $S$ requires:
\[
u'(\lambda s^* + (1-\lambda)R) < \beta\,\frac{f(\lambda s^* + (1-\lambda)R)}{1-F(\lambda s^* + (1-\lambda)R)}\,\frac{u(\lambda s^* + (1-\lambda)R)}{1-\beta}
\quad \text{for all } \lambda \in (0,1]
\tag{A-1}
\]
The term on the LHS of (A-1) is smaller the larger is $\lambda$. The rightmost fraction of (A-1) is larger the larger is $\lambda$, $\beta$ is a positive constant, and the term in the middle is increasing in $\lambda$ when $\frac{\partial \bar F_s(\delta)}{\partial s} < 0$.
Summing up, when s0 ∈ S, the socially optimal total use of the resource is s0 for all t.
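The hazard-rate condition used for the convexity of S can be checked numerically in a simple special case. The sketch below assumes that T is uniformly distributed on [0, 1], which is an illustrative choice and not an assumption made in the text; the function names are likewise hypothetical.

```python
def survival(x, upper=1.0):
    """Survival function 1 - F(x) for T ~ Uniform[0, upper] (assumed)."""
    return max(0.0, 1.0 - x / upper)


def hazard(x, upper=1.0):
    """Hazard rate f(x) / (1 - F(x)), equal to 1 / (upper - x) here."""
    return (1.0 / upper) / survival(x, upper)


def conditional_survival(s, delta, upper=1.0):
    """Probability that s + delta is safe given that s is known to be safe."""
    return survival(s + delta, upper) / survival(s, upper)


delta = 0.1
grid = [0.1 * k for k in range(9)]  # s = 0.0, 0.1, ..., 0.8
hazards = [hazard(s) for s in grid]
cond = [conditional_survival(s, delta) for s in grid]

for s, h, c in zip(grid, hazards, cond):
    print(round(s, 1), round(h, 4), round(c, 4))
```

In this example the hazard rate rises with s while the conditional survival probability of a fixed step falls with s, which is the equivalence stated in the proof.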
Part (2) When $s_0 \notin S$, it is not optimal to stay, so that it is optimal to expand the set of safe consumption values by choosing $\delta > 0$. Due to discounting, it cannot be optimal to approach $S$ asymptotically. Thus there must be a last step from some $s_t \notin S$ to $s_{t+1} = s_t + \delta_t$ with $s_{t+1} \in S$. Below, I show that it is in fact optimal to take only one step. It then follows that when $s_0 \notin S$, it is optimal to choose $s_0 + \delta^*(s_0)$ for $t = 0$ and, if the resource has not collapsed, $s_1$ for all $t \geq 1$.
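Denote $\delta^*(\tilde s)$ the optimal last step when starting from some value $\tilde s \notin S$ and $s^* = \tilde s + \delta^*$ with $s^* \in S$. The following calculations show that going from some $s$ to $\tilde s$ (by taking a step of size $\tilde\delta$) and then to $s^*$ yields a lower payoff than going from $s$ to $s^*$ directly (by taking a step of size $\hat\delta = \tilde\delta + \delta^*$; see the box below for a sketch of the involved step-sizes).
[Sketch of the step sizes on the interval from $s$ to $R$: $s < \tilde s < s^* \leq R$, with $\tilde\delta = \tilde s - s$, $\delta^* = s^* - \tilde s$, and $\hat\delta = s^* - s$.]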
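That is, I claim:
\[
u(s+\tilde\delta) + \beta \bar F_s(\tilde\delta)\left[u(s+\tilde\delta+\delta^*) + \beta \bar F_{s+\tilde\delta}(\delta^*)\frac{u(s+\tilde\delta+\delta^*)}{1-\beta}\right]
\leq
u(s+\hat\delta) + \beta \bar F_s(\hat\delta)\frac{u(s+\hat\delta)}{1-\beta}
\tag{A-2}
\]
The important thing to note is that:
\[
\bar F_s(\tilde\delta)\,\bar F_{s+\tilde\delta}(\delta^*) = \frac{\bar F(s+\tilde\delta)}{\bar F(s)}\,\frac{\bar F(s+\tilde\delta+\delta^*)}{\bar F(s+\tilde\delta)} = \frac{\bar F(s+\tilde\delta+\delta^*)}{\bar F(s)} = \bar F_s(\tilde\delta+\delta^*).
\]
Hence, (A-2) can, upon using $\hat\delta = \tilde\delta + \delta^*$ and splitting the RHS into three parts ($t = 0$, $t = 1$, $t \geq 2$), be written as:
\[
u(s+\tilde\delta) + \beta \bar F_s(\tilde\delta)u(s+\hat\delta) + \beta^2 \bar F_s(\hat\delta)\frac{u(s+\hat\delta)}{1-\beta}
\leq
u(s+\hat\delta) + \beta \bar F_s(\hat\delta)u(s+\hat\delta) + \beta^2 \bar F_s(\hat\delta)\frac{u(s+\hat\delta)}{1-\beta}
\]
which simplifies to:
\[
u(s+\tilde\delta) \leq \left[1 + \beta\left(\bar F_s(\hat\delta) - \bar F_s(\tilde\delta)\right)\right]u(s+\hat\delta)
\]
\[
\Leftrightarrow\quad
u(s+\tilde\delta) \leq \left[1 + \beta\,\frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s)}\right]u(s+\hat\delta)
\tag{A-2'}
\]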
Because the term in the squared bracket is smaller than 1 (as $\bar F_s(\hat\delta) < \bar F_s(\tilde\delta)$), it is not immediately obvious that the inequality in the last line holds. However, we can use the fact that because $\tilde s \notin S$, and because $\delta^*$ is defined as the optimal last step from $\tilde s$ into the set $S$, the following must hold:
\[
\frac{u(\tilde s)}{1-\beta} < u(\tilde s+\delta^*) + \beta \bar F_{\tilde s}(\delta^*)\frac{u(\tilde s+\delta^*)}{1-\beta}.
\]
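Using the fact that $\tilde s = s + \tilde\delta$ and that $\tilde s + \delta^* = s + \hat\delta$, this can be re-arranged to give:
\[
\frac{u(s+\tilde\delta)}{1-\beta} < u(s+\hat\delta) + \beta\,\frac{\bar F(s+\hat\delta)}{\bar F(s+\tilde\delta)}\,\frac{u(s+\hat\delta)}{1-\beta}
\]
\[
\Leftrightarrow\quad
u(s+\tilde\delta) < \left[1 + \beta\,\frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s+\tilde\delta)}\right]u(s+\hat\delta)
\tag{A-3}
\]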
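Since $\bar F'(s) < 0$, we know that $\frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s+\tilde\delta)} < \frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s)} < 0$. Therefore, combining (A-2') and (A-3) establishes the claim and completes the proof:
\[
u(s+\tilde\delta) < \left[1 + \beta\,\frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s+\tilde\delta)}\right]u(s+\hat\delta) < \left[1 + \beta\,\frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s)}\right]u(s+\hat\delta)
\tag{A-4}
\]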
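The one-step result can be illustrated with a small numerical sketch. The specification below (square-root utility, T uniform on [0, 1], β = 0.9, and a grid search for the optimal last step) is an illustrative assumption and not the general model of the proof; all function names are hypothetical.

```python
import math

BETA = 0.9  # assumed discount factor


def u(c):
    """Assumed utility function (increasing and concave)."""
    return math.sqrt(c)


def surv(x):
    """Survival function 1 - F(x) for T ~ Uniform[0, 1] (assumed)."""
    return 1.0 - x


def one_step(s, target):
    """Payoff from moving once from the safe level s to target, then staying."""
    return u(target) + BETA * surv(target) / surv(s) * u(target) / (1.0 - BETA)


def two_step(s, mid, target):
    """Payoff from moving s -> mid, then mid -> target, then staying."""
    return u(mid) + BETA * surv(mid) / surv(s) * one_step(mid, target)


s, mid = 0.0, 0.2
# Optimal last step from `mid`, found by grid search over [mid, 1].
candidates = [mid + (1.0 - mid) * k / 10000 for k in range(10001)]
target = max(candidates, key=lambda x: one_step(mid, x))

direct = one_step(s, target)       # step of size delta-hat in one go
detour = two_step(s, mid, target)  # delta-tilde first, then delta-star
print(target, direct, detour)
```

In this example the direct step yields the higher payoff, in line with (A-2): the detour through the intermediate level delays the higher consumption without reducing the overall risk of crossing the threshold.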
A–2 Proof of Proposition 2
Proposition 2 states that δ∗ is decreasing in s.
Recall that the optimal choice of a positive expansion $\delta^*(s)$ is the solution of $\phi'(\delta^*; s) = 0$ where $\phi'$ is given by (5):
\[
\phi'(\delta^*; s) = u'(s+\delta^*) + \frac{\beta}{1-\beta}\left[\bar F_s'(\delta^*)\,u(s+\delta^*) + \bar F_s(\delta^*)\,u'(s+\delta^*)\right]
\tag{5}
\]
with the second-order condition:
\[
\phi''(\delta^*; s) = u'' + \frac{\beta}{1-\beta}\left[\bar F_s''(\delta^*)\,u + 2\bar F_s'(\delta^*)\,u' + \bar F_s(\delta^*)\,u''\right] < 0
\tag{A-5}
\]
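To show that $\delta^*$ is declining in $s$, I therefore need to show that:
\[
\frac{d\delta^*}{ds} = -\frac{\partial\left[\phi'(\delta^*; s)\right]/\partial s}{\partial\left[\phi'(\delta^*; s)\right]/\partial \delta^*} < 0
\]
Since the denominator is negative when the second-order condition is satisfied, we have that $\frac{d\delta^*}{ds} < 0$ when $\frac{\partial[\phi'(\delta^*; s)]}{\partial s} < 0$, so that the condition to check is (A-6):
\[
\frac{\partial[\phi'(\delta^*; s)]}{\partial s} = u'' + \frac{\beta}{1-\beta}\left(\frac{\partial \bar F_s'(\delta^*)}{\partial s}\,u + \bar F_s'(\delta^*)\,u' + \frac{\partial \bar F_s(\delta^*)}{\partial s}\,u' + \bar F_s(\delta^*)\,u''\right) < 0
\tag{A-6}
\]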
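Noting the similarity of (A-6) to the second-order condition (A-5), and realizing that (A-5) can be decomposed into a common part A and a part B and that (A-6) can be decomposed into the common part A and a part C, a sufficient condition for (A-6) to be satisfied is that B > C.
\[
\underbrace{u'' + \frac{\beta}{1-\beta}\left[\bar F_s'(\delta^*)\,u' + \bar F_s(\delta^*)\,u''\right]}_{A}
+ \underbrace{\frac{\beta}{1-\beta}\left[\bar F_s''(\delta^*)\,u + \bar F_s'(\delta^*)\,u'\right]}_{B} < 0
\tag{A-5'}
\]
\[
\underbrace{u'' + \frac{\beta}{1-\beta}\left[\bar F_s'(\delta^*)\,u' + \bar F_s(\delta^*)\,u''\right]}_{A}
+ \underbrace{\frac{\beta}{1-\beta}\left(\frac{\partial \bar F_s'(\delta^*)}{\partial s}\,u + \frac{\partial \bar F_s(\delta^*)}{\partial s}\,u'\right)}_{C} < 0
\tag{A-6'}
\]
In order to show that $\bar F_s''(\delta)\,u + \bar F_s'(\delta)\,u' > \frac{\partial \bar F_s'(\delta)}{\partial s}\,u + \frac{\partial \bar F_s(\delta)}{\partial s}\,u'$, I use the first-order condition for an interior solution from (5) to write $u'$ in terms of $u$:
\[
u' = \frac{-\bar F_s'(\delta)}{\frac{1-\beta}{\beta} + \bar F_s(\delta)}\,u
\]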
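Upon inserting and canceling $u$, I need to show that:
\[
\bar F_s''(\delta) + \bar F_s'(\delta)\left[\frac{-\bar F_s'(\delta)}{\frac{1-\beta}{\beta} + \bar F_s(\delta)}\right]
>
\frac{\partial \bar F_s'(\delta)}{\partial s} + \frac{\partial \bar F_s(\delta)}{\partial s}\left[\frac{-\bar F_s'(\delta)}{\frac{1-\beta}{\beta} + \bar F_s(\delta)}\right]
\tag{A-7}
\]
Recall that $\bar F_s(\delta) = \frac{\bar F(s+\delta)}{\bar F(s)}$ and hence:
\[
\bar F_s'(\delta) = \frac{\bar F'(s+\delta)}{\bar F(s)}, \qquad
\frac{\partial \bar F_s(\delta)}{\partial s} = \frac{\bar F'(s+\delta)\bar F(s) - \bar F(s+\delta)\bar F'(s)}{[\bar F(s)]^2}
\]
\[
\bar F_s''(\delta) = \frac{\bar F''(s+\delta)}{\bar F(s)}, \qquad
\frac{\partial \bar F_s'(\delta)}{\partial s} = \frac{\bar F''(s+\delta)\bar F(s) - \bar F'(s+\delta)\bar F'(s)}{[\bar F(s)]^2}
\]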
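Tedious but straightforward calculations then show that (A-7) is indeed satisfied.
\[
\underbrace{\left[\frac{1-\beta}{\beta} + \frac{\bar F(s+\delta)}{\bar F(s)}\right]}_{a}\frac{\bar F''(s+\delta)}{\bar F(s)} - \left[\frac{\bar F'(s+\delta)}{\bar F(s)}\right]^2
>
\underbrace{\left[\frac{1-\beta}{\beta} + \frac{\bar F(s+\delta)}{\bar F(s)}\right]}_{a}\frac{\partial \bar F_s'(\delta)}{\partial s} - \frac{\partial \bar F_s(\delta)}{\partial s}\,\bar F_s'(\delta)
\]
\[
\Leftrightarrow\quad
a\,\frac{\bar F''(s+\delta)}{\bar F(s)} - \left[\frac{\bar F'(s+\delta)}{\bar F(s)}\right]^2
>
a\,\frac{\bar F''(s+\delta)\bar F(s) - \bar F'(s+\delta)\bar F'(s)}{[\bar F(s)]^2} - \frac{\partial \bar F_s(\delta)}{\partial s}\,\bar F_s'(\delta)
\]
\[
\Leftrightarrow\quad a\,\bar F(s) > \bar F(s+\delta)
\]
\[
\Leftrightarrow\quad \left[\frac{1-\beta}{\beta} + \frac{\bar F(s+\delta)}{\bar F(s)}\right]\bar F(s) > \bar F(s+\delta)
\]
\[
\Leftrightarrow\quad \frac{1-\beta}{\beta}\,\bar F(s) > 0
\]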
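Proposition 2 can be illustrated numerically by maximizing the objective whose derivative is given in (5). The sketch assumes square-root utility, T uniform on [0, 1] and β = 0.9; these choices, the grid search, and the function names are illustrative assumptions only.

```python
import math


def phi(delta, s, beta=0.9):
    """Payoff from a single step of size delta from the safe level s,
    followed by staying at s + delta if no regime shift occurs.
    Assumes u(c) = sqrt(c) and T ~ Uniform[0, 1]."""
    x = s + delta
    cond_survival = (1.0 - x) / (1.0 - s)
    return math.sqrt(x) * (1.0 + beta / (1.0 - beta) * cond_survival)


def optimal_step(s, beta=0.9, n=20000):
    """Grid search for the payoff-maximizing step size on [0, 1 - s]."""
    candidates = [(1.0 - s) * k / n for k in range(n + 1)]
    return max(candidates, key=lambda d: phi(d, s, beta))


levels = [0.0, 0.1, 0.2, 0.3, 0.4]
steps = [optimal_step(s) for s in levels]

for s, d in zip(levels, steps):
    print(s, round(d, 4))
```

In this example the optimal step shrinks as the known safe level rises and drops to zero once the safe level is sufficiently valuable.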
A–3 Proof of Proposition 3
Recall that Proposition 3 states that also when crossing the threshold at time $t$ triggers the regime shift at some (potentially uncertain) time $\tau > t$, it is still optimal to experiment – if at all – in the first period only.
The key is to realize that yesterday's decisions are exogenous today. This means that the threat of a regime shift can be modeled as an exogenous hazard rate: Let $h_t$ be the probability that the regime shift, triggered by events earlier than and including time $t$, occurs at time $t$ (conditional on not having occurred prior to $t$, of course). The planner's problem in this situation can be formulated as:
\[
V(s) = \max_{\delta \in [0, R-s]} \; u(s+\delta) + (1-h_t)\,\beta\,\bar F_s(\delta)\,V(s+\delta)
\tag{A-8}
\]
The structure of (A-8) is identical to the one in equation (3), only the effective discount factor is scaled down by the factor $(1-h_t)$. As the value of $\beta$ is immaterial for the fact that it is optimal to experiment only once, the learning dynamics are unchanged.
A–4 Proof of Proposition 4
Recall that Proposition 4 states that when the regime shift is reversible after a lag of length l and T is
revealed when st + δt > T , then any experimentation is undertaken in the first period and the size of the first
step, δ ∗ (s0 ), is larger the shorter is l. Depending on l and the initial safe value s0 , a range of the state-space remains
permanently unexplored.
To prove this proposition, I first show that for suitable values of l, the set S of initial values s0 at which
it is not optimal to experiment is not empty. As in the proof A–1 above, I use the fact that I can find
an upper bound for the continuation value after a step δ to show that for some values of s the first-order
condition for a positive step size cannot be satisfied.
Because it is assumed that the location of the threshold is revealed when it is being crossed, the optimal
choice after recovery (which takes l periods) is to stay exactly at the threshold T . An upper bound for the
payoff from an experiment of size $\delta$ is therefore given by:
\[
u(s+\delta) + \frac{\beta}{1-\beta}\left[\beta^l\,\frac{\int_s^{s+\delta} u(y)f(y)\,dy}{1-F(s)} + \bar F_s(\delta)\,u(R)\right]
\]
The corresponding first-order condition for an interior optimum is:
\[
0 = u'(s+\delta) + \frac{\beta}{1-\beta}\left[\beta^l\,\frac{f(s+\delta)}{1-F(s)}\,u(s+\delta) + \frac{-f(s+\delta)}{1-F(s)}\,u(R)\right]
\]
which can be rewritten as:
\[
[1-F(s)]\,u'(s+\delta) = \frac{\beta}{1-\beta}\,f(s+\delta)\left[u(R) - \beta^l u(s+\delta)\right]
\tag{A-9}
\]
As above, consider s → R. Because we assume F (R) = 1 in this section, the LHS of (A-9) will go to zero
whereas the RHS is positive when l > 0. Clearly, the larger l the larger the RHS. Note also that (A-9)
shows that it will always be optimal to explore the entire state-space when l = 0 as the RHS is then zero as
s → R.
Again, to show that any experimentation is undertaken in the first step, I show that the payoff from reaching $S$ in one step is higher than the payoff from doing so in two steps. As the only thing that differs from the calculations (A-2) to (A-4) above is the addition of the continuation value in case the threshold is crossed and discovered, this amounts to showing:
\[
\beta^{l}\,\frac{\int_s^{s+\tilde\delta} u(y)f(y)\,dy}{(1-F(s))(1-\beta)} + \beta^{l+1}\,\frac{\int_{s+\tilde\delta}^{s+\hat\delta} u(y)f(y)\,dy}{(1-F(s))(1-\beta)}
\leq
\beta^{l}\,\frac{\int_s^{s+\hat\delta} u(y)f(y)\,dy}{(1-F(s))(1-\beta)}
\]
which is true (because $\beta < 1$) as the following equivalent reformulation makes clear:
\[
\int_s^{s+\tilde\delta} u(y)f(y)\,dy + \beta\int_{s+\tilde\delta}^{s+\hat\delta} u(y)f(y)\,dy \leq \int_s^{s+\hat\delta} u(y)f(y)\,dy
\]
Finally, direct inspection of equation (12) in the main text shows that the size of the first step, δ ∗ (s0 ),
is larger the shorter the duration of the lag l.
A–5 Proof of Proposition 5
Recall that Proposition 5 states that when the regime shift is reversible after a lag of length l and T is not
revealed when st + δt > T , it is either not optimal to experiment at all, or there is repeated experimentation
with decreasing step sizes δt > δt+1 . Experimentation stops the moment that st + δt < T or st + δt < Û.
To prove this proposition, I first note that the existence of a set S at which it is not optimal to experiment
further is implied by Proposition 4. The continuation value when the location is discovered upon crossing
the threshold is larger than the continuation value when the threshold is crossed but not revealed. Because
even in the former case, it was optimal to leave some of the state space unexplored as the cost of crossing
the threshold becomes prohibitively high, this must necessarily also be true when the cost of crossing the
threshold is larger.
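To see that there must be some critical value $\hat U$ below which it does not pay to experiment further, I set $U$ to some small value $s + \varepsilon$ and show that as $\varepsilon \to 0$, $J(s, U) < \frac{u(s)}{1-\beta}$. Inserting $s + \varepsilon$ for $U$ in the equation of $J(s, U)$ (equation (13) in the main text), we have:
\[
J(s, s+\varepsilon) = \sup_{\delta \in (0, \varepsilon)}\left\{ u(s+\delta) + \beta\left[\frac{\beta^l \int_s^{s+\delta} f(y)\,dy\; V(s, s+\delta) + \int_{s+\delta}^{s+\varepsilon} f(y)\,dy\; V(s+\delta, s+\varepsilon)}{\int_s^{s+\varepsilon} f(y)\,dy}\right]\right\}
\]
Clearly, for $l = 0$, we would have $\lim_{\varepsilon \to 0} J = \frac{u(s)}{1-\beta}$, but because $l > 0$, we have $\lim_{\varepsilon \to 0} J < \frac{u(s)}{1-\beta}$.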
The fact that the step size decreases simply follows from the successive updating of the upper bound
U: as Ut+1 = st + δt , a new step from st of size δt+1 ∈ (0, Ut+1 − st ) must necessarily be smaller than δt .
A–6 Proof of Proposition 6
Recall that Proposition 6 states that when the location of the threshold is known with certainty, then there exists, for every combination of $N$, $T$, and $R$, a value of $\bar\beta$ such that the first-best can be sustained as a Nash-equilibrium when $\beta \geq \bar\beta$. The larger is $N$, or the closer $T$ is to 0, the larger has to be $\beta$.
Recall that $\bar\beta$ was defined as the lowest value of $\beta$ at which staying at $T$ can be sustained as a Nash equilibrium. Equation (15), which is replicated below, characterizes $\bar\beta$.
\[
\bar\beta = 1 - \frac{u(T/N)}{u\left(R - \frac{N-1}{N}T\right)}
\tag{15}
\]
First, fix a value of $N$ and $R$ and consider how $\bar\beta$ changes with $T$:
\[
\frac{d\bar\beta}{dT} = -\frac{\frac{1}{N}\,u'(T/N)\,u\left(R - \frac{N-1}{N}T\right) + u(T/N)\,u'\left(R - \frac{N-1}{N}T\right)\frac{N-1}{N}}{\left[u\left(R - \frac{N-1}{N}T\right)\right]^2} < 0.
\]
The players need to be the more patient the less valuable it is to stay below the threshold (i.e. as $T$ declines). Note that for $T \to 0$, $u(T/N) \to 0$ while $u\left(R - \frac{N-1}{N}T\right) \to u(R) > 0$ so that $\frac{u(T/N)}{u\left(R - \frac{N-1}{N}T\right)} \to 0$. The right-hand-side of (15) therefore approaches 1 as $T \to 0$. But since it approaches 1 from below, we can always find some value of $\beta$ that could still sustain the first-best.
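Second, fix $T$ and $R$ and consider how $\bar\beta$ changes with $N$:
\[
\frac{d\bar\beta}{dN} = \frac{T}{N^2}\,\frac{u'(T/N)\,u\left(R - \frac{N-1}{N}T\right) - u(T/N)\,u'\left(R - \frac{N-1}{N}T\right)}{\left[u\left(R - \frac{N-1}{N}T\right)\right]^2} > 0,
\]
where the sign follows from $T/N < R - \frac{N-1}{N}T$ together with $u$ being increasing and concave. The more players there are, the more patient they have to be in order to sustain the productive equilibrium. Note that as $N \to \infty$, $u(T/N) \to 0$ while $u\left(R - \frac{N-1}{N}T\right) \to u(R - T) > 0$ so that $\frac{u(T/N)}{u\left(R - \frac{N-1}{N}T\right)} \to 0$. Again, $\bar\beta$ approaches 1 from below, which allows one to find some value of $\beta$ that could still sustain the first-best.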
Finally, fix $N$ and $T$ and consider how $\bar\beta$ changes with $R$:
\[
\frac{d\bar\beta}{dR} = \frac{u(T/N)\,u'\left(R - \frac{N-1}{N}T\right)}{\left[u\left(R - \frac{N-1}{N}T\right)\right]^2} > 0.
\]
The larger is $R$, the larger the temptation to deviate and extirpate the resource immediately, which means that $\beta$ must be higher in order for a sustainable Nash equilibrium. However, as $R > T$ by construction, there will always be some value of $\beta$ at which the resource is preserved indefinitely.
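The comparative statics of the critical discount factor in equation (15) can be illustrated numerically. The sketch below assumes square-root utility and arbitrary parameter values; both are illustrative choices rather than part of the proposition, and the function name is hypothetical.

```python
import math


def beta_bar(n_players, threshold, resource, u=math.sqrt):
    """Critical discount factor from equation (15):
    1 - u(T/N) / u(R - (N-1)/N * T), with an assumed utility function u."""
    stay = u(threshold / n_players)
    deviate = u(resource - (n_players - 1) / n_players * threshold)
    return 1.0 - stay / deviate


base = beta_bar(2, 0.5, 1.0)
more_players = beta_bar(3, 0.5, 1.0)      # larger N
lower_threshold = beta_bar(2, 0.25, 1.0)  # T closer to 0
larger_resource = beta_bar(2, 0.5, 2.0)   # larger R

print(base, more_players, lower_threshold, larger_resource)
```

In this example the critical discount factor rises when players are added, when the threshold moves towards zero, and when the resource to be grabbed becomes larger, matching the three cases discussed in the proof.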
A–7 Proof of Proposition 7
Recall that Proposition 7 states that for s0 ≥ snc coordination to stay at s0 can be supported as a Nash
equilibrium. For s0 < snc taking one step and then staying at s1 = s0 + δ nc can be supported as a Nash
equilibrium.
First note that if it is a Nash equilibrium to stay at some s in any one period, it will be a Nash
equilibrium to stay at that s in all subsequent periods. Again, there will be some snc at which staying is
a Nash equilibrium, because at least at s = R, there is no other choice. But parallel to the argument in
Proposition 1, there will also be some $s^{nc} < R$ when $s$ is close enough to $R$ and $\bar F_s(\delta)$ becomes sufficiently
small. Also here, there will always be values of snc < R when it is known that there is a catastrophic
threshold on [0, R]. Suppose all other players stay at s = R − ε, then for ε small, the value from staying at
s = R − ε is at least as large as the value of making a step towards R so that the updated value is R − δ
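(with $\delta \in (0, \varepsilon]$):
\[
\frac{u\left(\frac{R-\varepsilon}{N}\right)}{1-\beta} \geq u\left(\frac{R-\varepsilon}{N} + \delta\right) + \beta \bar F_s(\delta)\,\frac{u\left(\frac{R-\delta}{N}\right)}{1-\beta}
\tag{A-10}
\]
Parallel to the social optimum we have $\lim_{\varepsilon \to 0} \frac{u\left(\frac{R-\varepsilon}{N}\right)}{1-\beta} = \frac{u(R/N)}{1-\beta}$. Again, since $\delta \in (0, \varepsilon]$ and $\bar F_s(\delta) \to 0$ as $\delta \to R - s$ we have $\lim_{\varepsilon \to 0}\left[u\left(\frac{R-\varepsilon}{N} + \delta\right) + \beta \bar F_s(\delta)\,\frac{u\left(\frac{R-\delta}{N}\right)}{1-\beta}\right] = u(R/N) < \frac{u(R/N)}{1-\beta}$.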
Now, as there is some $s^{nc}$ at which staying is a Nash equilibrium, there will be a last step at which this value is reached. Take some value $s$ at which staying is not a Nash equilibrium. Suppose the strategy of the opponents is to take some step $\delta_1^{-i} < \delta^{nc}(s)$ and then some step $\delta_2^{-i*}(s + \delta_1^{-i} + \delta_1^{i})$. The following calculations show that the best-reply from player $i$ is to take only one step $\delta_1^{i*}$. Hence $\delta_2^{-i*}(s + \delta_1^{-i} + \delta_1^{i}) = 0$ and the equilibrium will be to reach a value at which staying is a Nash equilibrium in one step.
For player $i$ the payoff from making one step $\delta_1^{i*} = s^{nc} - s_0 - \delta_1^{-i}$ exceeds the payoff from making two steps $\delta_1^{i} < s^{nc} - s_0 - \delta_1^{-i}$ and $\delta_2^{i*} = s^{nc} - s_1 - \delta_2^{-i*}$ when:
\[
u\left(\frac{s_0}{N} + \delta_1^{i*}\right) + \frac{\beta}{1-\beta}\,\bar F_{s_0}(\delta_1^{i*} + \delta_1^{-i})\,u\left(\frac{s^{nc}}{N}\right)
\geq
u\left(\frac{s_0}{N} + \delta_1^{i}\right) + \beta \bar F_{s_0}(\delta_1^{i} + \delta_1^{-i})\left[u\left(\frac{s_1}{N} + \delta_2^{i*}\right) + \frac{\beta}{1-\beta}\,\bar F_{s_1}(\delta_2^{i*})\,u\left(\frac{s^{nc}}{N}\right)\right]
\tag{A-11}
\]
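As for the coordinated case, $\bar F_{s_0}(s_1 - s_0)\,\bar F_{s_1}(s^{nc} - s_1) = \bar F_{s_0}(s^{nc} - s_0)$ so that (A-11) implies:
\[
u\left(\frac{s_0}{N} + \delta_1^{i*}\right) - u\left(\frac{s_0}{N} + \delta_1^{i}\right)
\geq
\beta\left[\frac{\bar F(s_1)}{\bar F(s_0)}\,u\left(\frac{s_1}{N} + \delta_2^{i*}\right) - \frac{\bar F(s^{nc})}{\bar F(s_0)}\,u\left(\frac{s^{nc}}{N}\right)\right]
\tag{A-12}
\]
For clarity, write this inequality as $A - a \geq B - b$. This inequality holds because both $A > B$ and $a < b$.
To see that $A > B$ note that $u$ is an increasing and concave function so that $u\left(\frac{s_0}{N} + \delta_1^{i*}\right) > u\left(\frac{s_1}{N} + \delta_2^{i*}\right)$ when $\frac{s_0}{N} + \delta_1^{i*} > \frac{s_1}{N} + \delta_2^{i*}$. Inserting $\delta_1^{i*} = s^{nc} - s_0 - \delta_1^{-i}$, $\delta_2^{i*} = s^{nc} - s_1 - \delta_2^{-i*}$ and $s_1 = s_0 + \delta_1^{i} + \delta_1^{-i}$ in this inequality simplifies to $(N-1)(\delta_1^{i} + \delta_1^{-i}) > 0$, which is true. By the same argument, $a < b$ when $\frac{s_0}{N} + \delta_1^{i} < \frac{s^{nc}}{N}$. Re-write this as $N\delta_1^{i} < s^{nc} - s_0$. This inequality holds because it is implied by the definition that $\delta_1^{i} < s^{nc} - s_0 - \delta_1^{-i}$ and $\delta_1^{-i} < \delta^{nc}(s)$.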
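Recall that the best-reply function $g(\delta^{-i}, s)$ in equation (19b) is therefore defined by the interior solution to the first-order-condition of maximizing $\varphi(\delta^{i}; \delta^{-i}, s)$:
\[
\varphi'(\delta^{i}; \delta^{-i}, s) = u'\left(\frac{s}{N} + \delta^{i} + \delta^{-i}\right) + \frac{\beta}{1-\beta}\left[\bar F_s'(\delta^{i} + \delta^{-i})\,u\left(\frac{s}{N} + \delta^{i} + \delta^{-i}\right) + \frac{1}{N}\,\bar F_s(\delta^{i} + \delta^{-i})\,u'\left(\frac{s}{N} + \delta^{i} + \delta^{-i}\right)\right]
\]
For a symmetric step size $\delta^{-i} = (N-1)\delta^{i}$, we have:
\[
\varphi'(\delta^{nc}; s) = u'\left(\frac{s}{N} + \delta^{nc}\right) + \frac{\beta}{1-\beta}\left[\bar F_s'(N\delta^{nc})\,u\left(\frac{s + \delta^{nc}}{N}\right) + \frac{1}{N}\,\bar F_s(N\delta^{nc})\,u'\left(\frac{s + \delta^{nc}}{N}\right)\right]
\]
The value of $\underline{s}^{nc}$ is defined by $\delta^{nc} = \frac{R-s}{N}$, which is the largest value of $s$ at which equation (20) does not yet have an interior solution but $\varphi'(\delta, s) > 0$ for all $\delta \in [0, R-s)$. Similarly, the value of $\bar{s}^{nc}$ is defined by $\delta^{nc} = 0$, which is the smallest value of $s$ at which equation (20) no longer has an interior solution but $\varphi'(\delta, s) < 0$ for all $\delta \in (0, R-s]$.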
A–8
Proof of Proposition 8.
Let me repeat the comparative statics results here:
(a) The boundaries $\underline{s}^{nc}$ and $\overline{s}^{nc}$, and aggregate extraction for $s \in [\underline{s}^{nc}, \overline{s}^{nc}]$, decrease with $\beta$.
(b) An increase in $N$ leads to more aggressive extraction when $\frac{N}{N+1} > \frac{u'(R/N)}{u'(R/(N+1))}$.
(c) The more unlikely the regime shift (in terms of a first-degree stochastic dominance), the larger the
range where a cautious Nash-equilibrium exists.
(d) As long as R < A, the higher the maximum potential reward R, the larger the range where a cautious
Nash-equilibrium exists.
First, as $\varphi' = 0$ implicitly defines a monotonically decreasing function $\delta^{nc}(s)$ on $[\underline{s}^{nc}, \overline{s}^{nc}]$ (which can be shown by replacing $\delta^{*}(s)$ with $N\delta^{nc}(s)$ in the proof of Proposition 2) and $\delta^{nc}(s)$ is bounded above by $R - s$ and below by 0, an increase in $\delta^{nc}$ will also lead to an increase in $\underline{s}^{nc}$ and $\overline{s}^{nc}$ respectively.
(a) To prove the proposition's part with respect to $\beta$, it is thus sufficient to analyze $\frac{d\varphi'}{d\beta}$. We have $\frac{d\varphi'}{d\beta} = \frac{2\beta[\ldots] - \beta}{(1-\beta)^2}$, where the term in the square brackets $[\ldots]$ is the term in the square brackets of equation (20). We know that this term must be negative for an interior solution because $u' > 0$. Therefore:
\[
\frac{d\varphi'}{d\beta} = \frac{2\beta[\ldots] - \beta}{(1-\beta)^2} < 0.
\]
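(b) I now turn to the effect of increasing $N$. To provide a sufficient condition for when an increase in $N$ decreases the range where there is a cautious equilibrium, and therefore increases aggregate expansion, I make the following argument: $\underline{s}^{nc}$, the largest value at which immediate extirpation is the only Nash equilibrium, becomes larger when adding another player and $\frac{N}{N+1} > \frac{u'(R/N)}{u'(R/(N+1))}$. For a given number of players $N$ we have at a given $\underline{s}^{nc} = \hat{s}$ that $\varphi'\!\left(\frac{R-s}{N}; \hat{s}\right) = 0$, and I show that $\varphi'\!\left(\frac{R-s}{N+1}; \hat{s}\right) > 0$ when $\frac{N}{N+1} > \frac{u'(R/N)}{u'(R/(N+1))}$:
\[
\varphi'\!\left(\frac{R-s}{N+1}; \hat{s}\right) - \varphi'\!\left(\frac{R-s}{N}; \hat{s}\right) > 0
\]
\[
\Leftrightarrow \quad
u'\!\left(\frac{R}{N+1}\right) - u'\!\left(\frac{R}{N}\right)
+ \frac{\beta}{1-\beta} \left[ \left( u\!\left(\frac{R}{N+1}\right) - u\!\left(\frac{R}{N}\right) \right) F_{\hat{s}}' + F_{\hat{s}} \left( \frac{1}{N+1}\, u'\!\left(\frac{R}{N+1}\right) - \frac{1}{N}\, u'\!\left(\frac{R}{N}\right) \right) \right] > 0
\]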
The first part of the last line is positive due to concavity of $u$, the first term in the square bracket is positive since $F_s' < 0$ and $u\!\left(\frac{R}{N}\right) > u\!\left(\frac{R}{N+1}\right)$, and the last term in the square bracket is positive whenever $\frac{N}{N+1} > \frac{u'(R/N)}{u'(R/(N+1))}$.
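To get a feel for the sufficient condition in part (b), it can be evaluated for a specific utility function. The sketch below assumes a CRRA form with $u'(x) = x^{-\eta}$; this functional form, the parameter `eta` and the function names are illustrative assumptions, not taken from the paper. Under this assumption the ratio $u'(R/N)/u'(R/(N+1))$ equals $(N/(N+1))^{\eta}$, so the condition holds exactly when $\eta > 1$, independently of $R$.

```python
def marginal_utility(x: float, eta: float) -> float:
    """Marginal utility u'(x) = x**(-eta) of an assumed CRRA utility function."""
    return x ** (-eta)


def more_players_more_aggressive(N: int, R: float, eta: float) -> bool:
    """Evaluate the sufficient condition N/(N+1) > u'(R/N) / u'(R/(N+1))."""
    ratio = marginal_utility(R / N, eta) / marginal_utility(R / (N + 1), eta)
    return N / (N + 1) > ratio


# Compare a weakly curved (eta < 1) and a strongly curved (eta > 1) utility.
for eta in (0.5, 2.0):
    print(eta, more_players_more_aggressive(N=3, R=1.0, eta=eta))
```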
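(c) Consider equation (20) at $s = \underline{s}^{nc}$:
\[
\varphi'\!\left(\frac{R - \underline{s}^{nc}}{N}; \underline{s}^{nc}\right) = u'\!\left(\frac{R}{N}\right)
+ \frac{\beta}{1-\beta} \left[ F_s'(R - \underline{s}^{nc})\, u\!\left(\frac{R}{N}\right) + \frac{1}{N}\, F_s(R - \underline{s}^{nc})\, u'\!\left(\frac{R}{N}\right) \right] = 0
\]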
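Now when the regime shift is more likely, we have $\hat{F} < F$ and $\hat{F}' < F'$. The term in the square brackets above will therefore be smaller in absolute terms. As it is negative, it must mean that:
\[
\tilde{\varphi}'\!\left(\frac{R - \underline{s}^{nc}}{N}; \underline{s}^{nc}\right) = u'\!\left(\frac{R}{N}\right)
+ \frac{\beta}{1-\beta} \left[ \hat{F}_s'(R - \underline{s}^{nc})\, u\!\left(\frac{R}{N}\right) + \frac{1}{N}\, \hat{F}_s(R - \underline{s}^{nc})\, u'\!\left(\frac{R}{N}\right) \right] > 0
\]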
so that the range of values at which a cautious equilibrium exists is larger.
(d) Finally, to see the effect of an increase in $R$ to $\tilde{R}$ when $R < A$ and $\tilde{R} \le A$, note that this does not impact equation (20) directly, but it does have an effect on the first value $\underline{s}^{nc}$: As the diagonal line defining the upper bound of $\delta \in [0, R-s]$ shifts outwards, and $\delta^{nc}(s)$ is a downward sloping function steeper than $R - s$, the first value at which it is not optimal to extirpate the resource must be smaller.
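Part (a) can be illustrated numerically. The sketch below is only an illustration under assumed primitives that are not taken from the paper: $u(x) = \sqrt{x}$, a threshold uniformly distributed on $[0, A]$ so that $F_s(\delta) = (A - s - \delta)/(A - s)$, and each player's share after an aggregate step $N\delta$ taken to be $s/N + \delta$ (which equals $R/N$ at $\delta = (R-s)/N$, as in part (c)). The function names and parameter values are hypothetical. The code solves the symmetric first-order condition by bisection and compares the step size and the upper boundary for two discount factors.

```python
import math


def phi_prime(delta, s, N, beta, A):
    """Symmetric first-order condition phi'(delta; s) under the assumed primitives."""
    share = s / N + delta                 # assumed per-player share after an aggregate step N*delta
    u = math.sqrt(share)                  # assumed utility u(x) = sqrt(x)
    du = 0.5 / math.sqrt(share)           # its derivative u'(x)
    surv = (A - s - N * delta) / (A - s)  # F_s(N*delta) for a threshold uniform on [0, A]
    dsurv = -1.0 / (A - s)                # derivative of F_s with respect to the aggregate step
    return du + beta / (1 - beta) * (dsurv * u + surv * du / N)


def bisect(f, lo, hi, tol=1e-12):
    """Root of a decreasing function f on [lo, hi], assuming f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delta_nc(s, N, beta, A, R):
    """Interior symmetric step solving phi' = 0 on (0, (R - s) / N)."""
    return bisect(lambda d: phi_prime(d, s, N, beta, A), 0.0, (R - s) / N)


def upper_boundary(N, beta, A, R):
    """Smallest s at which phi'(0; s) is no longer positive, so no step is taken."""
    return bisect(lambda s: phi_prime(0.0, s, N, beta, A), 1e-9, R - 1e-9)


# Two players, R = A = 1, current safe value s = 0.2 (all values illustrative).
for beta in (0.8, 0.9):
    print(beta, delta_nc(0.2, 2, beta, 1.0, 1.0), upper_boundary(2, beta, 1.0, 1.0))
```

With these assumed values, both the interior step and the upper boundary come out smaller for the higher discount factor, in line with part (a). The example says nothing about other functional forms.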
A–9
Proof of Proposition 9
Proposition 9 states that a known threshold is crossed in the first period, or never.
I show that crossing the threshold in period t cannot be an equilibrium because the payoff for player
i to preempt crossing the threshold (i.e. exhausting the resource) at t by exhausting the resource at time
t − 1 is strictly larger. I claim:
\[
u\!\left(R_{t-1} - \frac{N-1}{N}\, T\right) > u\!\left(\frac{R_t}{N}\right)
\]
where $u\!\left(R_{t-1} - \frac{N-1}{N}\, T\right)$ is the payoff from preemptively exhausting the resource at $t-1$ while all other players still extract $T$. As $R_t = R_{t-1} - T$ when the threshold is not crossed at time $t$, the above inequality holds when $R_{t-1} - T + \frac{T}{N} > \frac{R_t}{N} \Leftrightarrow R_t > \frac{R_t}{N} - \frac{T}{N}$, which is obviously true. Thus, if it is a Nash equilibrium to exhaust the resource by crossing the threshold, it must be so in the first period.
A–10
Proof of Proposition 10
Recall that Proposition 10 states that in the game described by (22) when T is unknown, there exists,
in addition to the aggressive equilibrium in which the resource is exhausted in the initial period, a Pareto-dominant
equilibrium in which experimentation – if at all – is undertaken in the first period only and
s1 = s0 + δ nc (s0 ) is an upper bound on aggregate extraction for the remainder of the game. The threat of a
regime shift slows down the extraction rate and improves welfare.
The existence of the aggressive equilibrium is self-evident. The cautious equilibrium, if it exists, must
Pareto-dominate the aggressive equilibrium as – by assumption – there was a Pareto-dominant equilibrium
with several periods of extraction in a world without the threshold. If the cautious equilibrium implies, for
all periods t, a reduced extraction compared to the equilibrium extraction path in absence of regime shift
risk, the cautious equilibrium must be welfare improving.
Below I show that experimentation in the second period of the game is not individually (and socially) optimal. Therefore, if the players coordinate on the cautious equilibrium, and because the cautious equilibrium
implies $\delta^{nc} < \tilde{c}^{nc}(R_0) - s_0$ by construction, it slows down the extraction rate for all periods $t$.
To show that experimentation in the second period is not optimal, I argue by contradiction. For a given
safe value s1 and a given stock of the remaining resource R1 in the second period, the value of the game for
player i, when all other players share the extraction of the safe amount s1 equally, is given by:
\[
V_2(R_1, s_1) = \max_{\delta^{i}} \; u\!\left(\frac{s_1}{N} + \delta^{i}\right) + \beta F_{s_1}(\delta^{i})\, V_3\!\left(R_1 - s_1 - \delta^{i}\right)
\tag{A-13}
\]
Suppose it were optimal to expand the set of safe values in the second period. The first-order condition for
an (interior) expansion is given by:
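\[
u'\!\left(\frac{s_1}{N} + \delta^{i}\right) = -\beta F_{s_1}'(\delta^{i})\, V_3\!\left(R_1 - s_1 - \delta^{i}\right) + \beta F_{s_1}(\delta^{i})\, V_3'\!\left(R_1 - s_1 - \delta^{i}\right)
\tag{A-14}
\]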
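For readers who want the intermediate step, (A-14) follows from differentiating the maximand in (A-13) with respect to $\delta^{i}$ and setting the derivative to zero. The lines below are a sketch of that calculation. The shorthand $x = R_1 - s_1 - \delta^{i}$ is introduced here for brevity and is not the paper's notation.

```latex
\begin{align*}
0 &= \frac{d}{d\delta^{i}} \left[ u\!\left(\tfrac{s_1}{N} + \delta^{i}\right) + \beta F_{s_1}(\delta^{i})\, V_3(x) \right],
   \qquad x = R_1 - s_1 - \delta^{i}, \quad \frac{dx}{d\delta^{i}} = -1 \\
  &= u'\!\left(\tfrac{s_1}{N} + \delta^{i}\right) + \beta F_{s_1}'(\delta^{i})\, V_3(x) - \beta F_{s_1}(\delta^{i})\, V_3'(x) \\
\Rightarrow \quad u'\!\left(\tfrac{s_1}{N} + \delta^{i}\right) &= -\beta F_{s_1}'(\delta^{i})\, V_3(x) + \beta F_{s_1}(\delta^{i})\, V_3'(x)
\end{align*}
```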
Suppose that the first derivative of the objective function is declining in $s$ in a neighborhood of $\delta^{i*}$ (this is
shown below). Then for large s the right-hand side of (A-14) is larger than the left-hand side (hence there
is no interior solution). Accordingly, the last value of s at which no expansion can be coordinated upon in
the second period of the game is defined by:
\[
u'\!\left(\frac{s_1}{N}\right) = \beta V_3'(R_1 - s_1) - \beta F_{s_1}'(\delta^{i})\, V_3(R_1 - s_1)
\tag{A-15}
\]
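Now in the first period, the corresponding value function for player $i$ (again presuming that all other players remain at $s_0$) is:
\[
V_1(R_0, s_0) = \max_{\delta^{i}} \; u\!\left(\frac{s_0}{N} + \delta^{i}\right) + \beta F_{s_0}(\delta^{i})\, V_2\!\left(R_0 - s_0 - \delta^{i},\, s_0 + \delta^{i}\right)
\tag{A-16}
\]
The condition for the last value of $s$ at which no expansion can be coordinated upon is given by:
\[
u'\!\left(\frac{s_0}{N}\right) = \frac{\beta}{N}\, \frac{\partial V_2\!\left(R_0 - s_0 - \delta^{i},\, s_0 + \delta^{i}\right)}{\partial s_0} - \beta F_{s_0}'(\delta^{i})\, V_2\!\left(R_0 - s_0 - \delta^{i},\, s + \delta^{i}\right)
\tag{A-17}
\]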
Now consider the value of $s$ at which (A-17) holds. At this level of $s$, equation (A-15) will not hold (but the right-hand side will be larger than the left-hand side) as $\frac{\partial [V_2]}{\partial s} > u'$ and $V_2\!\left(R_0 - s_0 - \delta^{i},\, s + \delta^{i}\right) > V_3\!\left(R_1 - s_0 - \delta^{i}\right)$. This means that experimentation in the second period is no longer optimal but it is still optimal in the first period. As $\phi$ is a continuous function of $s$, the same holds also for depletion (the other corner solution).
What remains to be shown is that the first derivative of the objective function is declining in $s$ in a neighborhood of the optimal expansion $\delta^{i*}$. To this end, denote the derivative of the objective function by $\phi$. Omitting all sub- and superscripts to avoid clutter, we then have:
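\[
\frac{\partial \phi}{\partial s} = \frac{1}{N}\, u'' + \beta \left( \frac{F''(s+\delta)F(s) - F'(s+\delta)F'(s)}{[F(s)]^2}\, u - \frac{1}{N}\, \frac{F'(s+\delta)}{F(s)}\, u' \right)
- \frac{\beta}{N} \left( \frac{F'(s+\delta)F(s) - F(s+\delta)F'(s)}{[F(s)]^2}\, u' + \frac{F(s+\delta)}{F(s)}\, u'' \right)
\tag{A-18}
\]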
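Note that the second-order condition for a maximum at $\delta^{i*}$ requires:
\[
\frac{\partial \phi}{\partial \delta} = \frac{1}{N}\, u'' + \beta \left( \frac{F''(s+\delta)}{F(s)}\, u - \frac{1}{N}\, \frac{F'(s+\delta)}{F(s)}\, u' \right)
- \frac{\beta}{N} \left( \frac{F'(s+\delta)F(s) - F(s+\delta)F'(s)}{[F(s)]^2}\, u' + \frac{F(s+\delta)}{F(s)}\, u'' \right) < 0
\tag{A-19}
\]
Equations (A-18) and (A-19) are similar. In fact, $\frac{\partial \phi}{\partial s} = \frac{\partial \phi}{\partial \delta} - \frac{F'(s+\delta)F'(s)}{[F(s)]^2}$. The term $\frac{F'(s+\delta)F'(s)}{[F(s)]^2}$ is positive as $F'(s+\delta) < 0$ and $F'(s) < 0$. Hence equation (A-18) is negative when the second-order condition holds.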
Thus experimentation in the second period is not optimal, and a fortiori not in any later period.