Threatening Thresholds?
The effect of disastrous regime shifts on the non-cooperative use of environmental goods and services

Florian K. Diekert∗

May 1, 2016
Abstract
This paper presents an analytically tractable dynamic game in which agents jointly use a
resource. The resource replenishes fully but collapses irreversibly should total use exceed
a threshold. The initial level of resource use is known to be safe. Because experimentation
only reveals whether the threshold has been crossed or not, and the consequence of crossing
the threshold is disastrous, it is socially optimal, and a Nash Equilibrium, to update
beliefs about the location of the threshold only once (if at all). These learning dynamics
facilitate coordination on a “cautious equilibrium”. If the initial safe level is sufficiently
valuable, the equilibrium implies no experimentation and coincides with the first-best.
The less valuable the initial safe level, the more experimentation occurs. For sufficiently low
initial values, immediate depletion of the resource is the only equilibrium. The potentially
beneficial effect of the threat of a disastrous regime shift is robust to a variety of extensions.
When the regime shift is not disastrous, but the damage increases sufficiently strongly the
more the threshold is exceeded, learning may be gradual.
Keywords: Dynamic Games; Thresholds and Natural Disasters; Learning.
JEL-Codes: C73, Q20, Q54
Highlights:
• Thresholds may benefit non-cooperative agents by facilitating coordination.
• Learning depends on history: higher initial safe level implies less experimentation.
• For high initial value, preserving the resource with certainty is a Nash equilibrium.
∗ Department of Economics and Centre for Ecological and Evolutionary Synthesis, University of Oslo, P.O. Box 1095, 3017 Oslo, Norway. Email: f.k.diekert@ibv.uio.no. This research is funded by NorMER, a Nordic
Centre of Excellence for Research on Marine Ecosystems and Resources under Climate Change. I would like to
thank Geir Asheim, Scott Barrett, Bård Harstad, Larry Karp, Matti Liski, Leo Simon, and seminar participants
at the Beijer Institute, Statistics Norway, and in Oslo, Helsinki, Santa Barbara, and Berkeley for constructive
discussion and comments.
1 Introduction
Everyday experience painfully teaches that many things tolerate some stress, but if this is driven too far or for too long, they break. Not surprisingly, regime shifts, thresholds, or tipping points are a popular topic in the scientific literature. In this paper, I investigate whether and how the existence of a catastrophic threshold may be beneficial in the sense that it enables non-cooperative agents to coordinate their actions, thereby improving welfare over a situation where this tipping point is not present or ignored. To this end, I develop a generic model of using a productive asset that loses (some or all of) its productivity upon crossing some (potentially unknown) threshold.

The model is applicable to many different settings of non-cooperative learning about the location of a threshold, but I would like to think of it in terms of ecosystem services. Prime examples are the eutrophication of lakes, the bleaching of coral reefs, or saltwater intrusion (Scheffer et al., 2001; Tsur and Zemel, 1995). On a larger scale, the collapses of the North Atlantic cod stocks off the coast of Canada or of the capelin stock in the Barents Sea can – at least in part – be attributed to overfishing (Frank et al., 2005; Hjermann et al., 2004). The most important application may be the climate system, where potential drivers of a disastrous regime shift could be a disintegration of the West Antarctic ice sheet (Nævdal, 2006), a shutdown of the thermohaline circulation (Nævdal and Oppenheimer, 2007), or a melting of permafrost (Lenton et al., 2008).
A key feature of these examples is that the exact location of the tipping point is unknown. Most previous studies translate the uncertainty about the location of the threshold in state space into uncertainty about the occurrence of the event in time. This allows for a convenient hazard-rate formulation (where the hazard rate could be exogenous or endogenous), but it has the problematic feature that, eventually, the event occurs with probability 1. In other words, even if we were to stop extracting or polluting entirely, the disastrous regime shift could not be avoided. Arguably, it is more realistic to model the regime shift in such a way that, when it has not occurred up to some level, the players can avoid the event by staying at or below that level. I therefore follow the modeling approach of, e.g., Tsur and Zemel (1994), Nævdal (2003), and Lemoine and Traeger (2014), where – figuratively speaking – the threshold is interpreted as the edge of a cliff and the policy question is whether, when, and where to stop walking. This modeling approach leads, under certain conditions, to a socially optimal "safe minimum standard of conservation", as pointed out by Mitra and Roy (2006).
In order to isolate the effect of a catastrophic threshold on the ability to cooperate, I abstract – as a first step – from the dynamic common pool aspect of non-cooperative resource use. This allows me to obtain tractable analytic solutions and feedback Nash equilibria for general increasing and concave utility functions and continuous probability distributions. In section 3.1, I expose the underlying strategic structure of the game by considering the case when the location of the threshold is known. I show that there is a Nash equilibrium where the resource is conserved indefinitely and a Nash equilibrium where the resource is depleted immediately. When the location of the threshold is unknown, any experimentation is undertaken in the first period only (both in the social optimum, section 3.2, and in the non-cooperative game, section 3.3). Agents can coordinate on the first-best if (a) it is socially optimal to use the environmental good at its current level and (b) this status quo is sufficiently valuable. If preserving the status quo is not sufficiently valuable, agents may still coordinate to refrain from depleting the resource, but they will increase their consumption by an inefficiently high amount. However, provided that the increase in consumption has not caused the disastrous regime shift, the players can coordinate on keeping to the updated level of consumption, which is, ex post, socially optimal.
The way the threshold risk is modeled in this paper implies that learning is only "affirmative". That is, given that the agents know that the current state is safe and they expand the current state, they will only learn whether the future state is safe or not. The agents will not obtain any new information on how much closer they have come to the threshold.1 Put simply, it is pitch-dark when the agents walk towards the cliff. When the consequence of falling over the cliff is disastrous, in the sense that it does not matter by how far an agent has stepped over the edge, then there is no point in splitting any given distance into several steps. Any experimentation is – if at all – undertaken in the first period. Moreover, the degree of experimentation is decreasing in the value of current use that is known to be safe.

1 Empiricists will agree that there is no learning without experiencing.
This basic feature of experimentation and resource use under the threat of a disastrous threshold is robust to several extensions that are explored in section 4. While the threat of the threshold may no longer induce coordination on the first-best when the externality relates to both the (endogenous) risk of passing the threshold and the resource itself, the threshold still encourages coordination on a time-profile of resource use that is Pareto-superior compared to the Nash equilibrium without a threshold. As I show in section 4.4, repeated learning will take place only if the post-threshold value depends negatively on the pre-threshold degree of experimentation, and if this effect is sufficiently strong.

Section 5 concludes the paper and points to important future applications of the modeling framework. All proofs are collected in the Appendix.
Relation to the literature
This paper links to three strands of literature. First, it contributes to the literature on the management of natural resources under regime shift risk by explicitly analyzing learning about the location of a threshold. Second, it relates to the more general literature on optimal experimentation by describing a set-up of "affirmative learning". Third, the paper extends the literature on coordination in the face of a catastrophic public bad, which has hitherto been analyzed in a static setting, by showing that the sharp distinction between a known and an unknown location of the threshold vanishes in a dynamic context.
The pioneering contributions that analyze the economics of regime shifts in an environmental/resource context were Cropper (1976) and Kemp (1976). There are by now a good dozen papers on the optimal management of renewable resources under the threat of an irreversible regime shift (see, for example: Tsur and Zemel, 1995; Nævdal, 2006; Brozović and Schlenker, 2011; Lemoine and Traeger, 2014). Polasky et al. (2011) summarize and characterize the literature with the help of a simple fishery model. They contrast whether the regime shift implies a collapse of the resource or merely a reduction of its renewability, and whether the probability of crossing the threshold is exogenous or depends on the state of the system (i.e. is endogenous). They show that resource extraction should be more cautious when crossing the threshold implies a loss of renewability and the probability of crossing the threshold depends on the state of the system. In contrast, exploitation should be more aggressive when a regime shift implies a collapse of the resource and the probability of crossing the threshold cannot be influenced. There is no change in optimal extraction for the loss-of-renewability/exogenous-probability case, and the results are ambiguous for the collapse/endogenous-probability case.
Ren and Polasky (2014) scrutinize the loss-of-renewability/endogenous-probability case for more general growth and utility functions. They point out that, in addition to the risk-reduction effect of the linear model in Polasky et al. (2011), there may also be a consumption-smoothing and an investment effect. The consumption-smoothing effect gives incentives to build up a higher resource stock, so that more is available should future growth be reduced. The investment effect gives incentives to draw down the resource stock, as the rate of return after the regime shift is lower. Lemoine and Traeger (2014) call the sum of these two countervailing effects the "differential welfare effect". In their climate-change application the total effect is positive. As I consider a truly catastrophic event, this effect is zero by construction in the baseline model of the current paper (but see the discussion in section 4.4).
Until now, the literature in resource economics has been predominantly occupied with optimal management, leaving aside the central question of how agents' strategic considerations influence and are influenced by the potential to trigger a disastrous regime shift. Still, there are a few notable exceptions: Crépin and Lindahl (2009) analyze the classical "tragedy of the commons" in a grazing game with complex feedbacks, focusing on open-loop strategies. They find that, depending on the productivity of the rangeland, under- or overexploitation might occur. Kossioris et al. (2008) focus on feedback equilibria and analyze, with the help of numerical methods, non-cooperative pollution of a "shallow lake". They show that, as in most differential games with renewable resources, the outcome of the feedback Nash equilibrium is in general worse than the open-loop equilibrium or the social optimum. Ploeg and Zeeuw (2015b) compare the socially optimal carbon tax to the tax in the open-loop equilibrium under the threat of a productivity shock due to climate change. While it would be optimal in their two-region model to have taxes that converge to a high level, the players apply diverging taxes in the non-cooperative equilibrium: While the more developed and less vulnerable region has a lower carbon tax throughout, the less developed and more vulnerable region has low taxes initially in order to build up a capital stock, but will apply high taxes later on in order to reduce the chance of tipping the climate into the unproductive regime. Fesselmeyer and Santugini (2013) introduce an exogenous event risk into a non-cooperative renewable resource game à la Levhari and Mirman (1980). As in the optimal management problem with an exogenous probability of a regime shift, the impact of shifted resource dynamics is ambiguous: On the one hand, the threat of a less productive resource induces a conservation motive for all players; on the other hand, it exacerbates the tragedy of the commons, as the players do not take the risk externality into account. Finally, Sakamoto (2014) has, by combining analytical and numerical methods, analyzed a non-cooperative game with an endogenous regime shift hazard. He shows that the endogenous hazard may lead to more precautionary management even in a strategic setting. Miller and Nkuiya (2014) also combine analytical and numerical methods to investigate how an exogenous or endogenous regime shift affects coalition formation in the Levhari-Mirman model. They show that an endogenous regime shift hazard increases coalition sizes and, in some cases, allows the players to achieve full cooperation. Here, I simplify the dynamics of resource use to develop an analytically tractable model with an endogenous regime shift risk. Taken together, these studies and the current paper show that the effect of a regime shift pulls in the same direction in a non-cooperative setting as under optimal management. However, both the literature on optimal resource management under regime shift risk and its non-cooperative counterpart have not explicitly addressed learning about the unknown location of the tipping point.
Results on optimal experimentation in the context of affirmative learning appear at various places in the literature. For example, in their classic book, Dubins and Savage (1965) discuss circumstances under which it is optimal for gamblers to expose themselves to uncertainty in as few rounds as possible. Another example is Riley and Zeckhauser (1983), who discuss price-negotiation strategies where the seller does not know the valuation of the buyer. They find that "[a] seller encountering risk-neutral buyers one at a time should, if commitments are feasible, quote a single take-it-or-leave-it price to each." Another well-known study is Rob (1991), who analyzes optimal and competitive capacity expansion when market demand is unknown. Rob finds that learning will take place over several periods. In his model, the information gained by an additional unit of installed capital is small, but so is the cost. However, experimenting too much (in the sense of installing more capital than is needed to satisfy the revealed demand) is very costly compared to experimenting too little (so that the true size of the market remains unknown) several times. Consequently, learning takes place gradually. In the competitive equilibrium, learning is even slower due to the private nature of search costs but the public nature of information. In an application to environmental economics, Costello and Karp (2004) investigate optimal pollution quotas when abatement costs are unknown. In their model, the information gain from an additional unit of quota is small as well, but search costs are very high in the beginning and then decline. Importantly, there is no additional harm from experimenting too much, which – in line with the baseline model of the current paper – leads to the conclusion that any experimentation takes place in the first period only. Groeneveld et al. (2013) discuss socially optimal learning about a reversible flow-pollution threshold. They show that the upper bound of the belief about the threshold's location is expanded only once, while there may be several rounds of experimentation to update the lower bound of the belief about the threshold's location.
Here, I show that non-cooperative learning is more aggressive than what is socially optimal because the costs of search are public while the immediate gains are private. Moreover, I show that experimentation is decreasing in the value of the state that is known to be safe: The more the players know that they can safely consume, the less willing they will be to risk triggering the regime shift by enlarging the set of consumption opportunities. This aspect has, to the best of my knowledge, not yet been appreciated.
At this point, it is important to highlight the difference between the current approach, in which learning is "affirmative", on the one hand, and the literature on strategic experimentation (e.g. Bolton and Harris, 1999; Keller et al., 2005; Bonatti and Hörner, 2015) and the literature on learning in a resource management context (e.g. Kolstad and Ulph, 2008; Agbo, 2014; Koulovatianos, 2015), on the other. In the latter two strands of literature, learning is "informative" in the sense that the agents obtain a random sample on which they base their inference about the state of the world. It pays to obtain repeated samples, as this improves the estimate (of course, the public nature of information introduces free-rider incentives in a strategic setting).
Finally, the current paper is closely related to three articles that discuss the role of uncertainty about the threshold's location in whether a catastrophe can be avoided. Barrett (2013) shows that players in a linear-quadratic game are in most cases able to form self-enforcing agreements that avoid catastrophic climate change when the location of the threshold is known, but not when it is unknown. Similarly, Aflaki (2013) analyzes a model of a common-pool resource problem that is, in its essence, the same as the stage game developed in section 3. Aflaki shows that an increase in uncertainty leads to increased consumption, but that increased ambiguity may have the opposite effect. Bochet et al. (2013) confirm the detrimental role of increased uncertainty in the stochastic variant of the Nash Demand Game: Even though "cautious" and "dangerous" equilibria co-exist (as they do in my model), they provide experimental evidence that participants in the lab are not able to coordinate on the Pareto-dominant cautious equilibrium.2 However, the models in Aflaki (2013), Barrett (2013), and Bochet et al. (2013) are all static and therefore cannot address the prospects of learning. Here, I show that the sharp distinction between a known and an unknown location of a threshold vanishes in a dynamic context. More uncertainty still leads to increased consumption, but this is now partly driven by the increased gain from experimentation.
Analyzing how strategic interactions shape the exploitation pattern of a renewable resource under the threat of a disastrous regime shift is important beyond mere curiosity-driven interest. It is probably fair to say that international relations are characterized by an absence of supranational enforcement mechanisms that would allow binding agreements to be made. Locally, too, within the jurisdiction of a given nation, control is seldom complete, and the exploitation of many common-pool resources is shaped by strategic considerations. Extending our knowledge of the effect of looming regime shifts by taking non-cooperative behavior into account is therefore a timely contribution to both the scientific literature and the current policy debate.
2 Bochet et al. (2013, p.1) conclude that a "risk-taking society may emerge from the decentralized actions of risk-averse individuals". Unfortunately, it is not clear from the description in their manuscript whether the participants were able to communicate. The latter has been shown to be a crucial factor for coordination in threshold public goods experiments (Tavoni et al., 2011; Barrett and Dannenberg, 2012). Hence, it may be that what they refer to as "societal risk taking" is simply the result of strategic uncertainty.
2 Basic model
To repeat, I consider a situation where several agents share a productive resource. Each agent obtains utility from using it, and as long as total use is at or below a threshold, the resource remains intact. If, however, total use in any period exceeds the threshold, a disastrous regime shift will occur. The players are unable to make binding agreements. In order to expose the effect of a threatening regime shift as clearly as possible, I consider a model where the externality applies only to the risk of triggering the regime shift. While this is clearly a simplification, it allows for a tractable solution of a general dynamic model with minimal requirements on the utility function or the distribution of the threshold's location.

Essentially, the agents face the problem of sharing a magic pie: If they do not eat too much today, they will tomorrow find the full pie replenished, to be shared again. Yet each agent is tempted to eat a little more than what was eaten yesterday, since he will have that piece for himself, while the future consequences of his voracity will have to be shared by all.
A literal example could be saltwater intrusion: Consider a freshwater reservoir whose overall volume is approximately known and whose annual recharge, say due to rainfall or snowmelt, is sufficient to fully replenish it. However, the agents fear that saltwater may intrude and irreversibly spoil the reservoir once the water table falls too low. Further, suppose the geology is so complex that it is not known how much water must be left in the reservoir to avoid saltwater intrusion. Saltwater intrusion has not occurred in the past, so the current level of total use is known to be safe. Thus, the agents now face the trade-off of whether or not to expand the current consumption of water. If they decide to expand the current level of use, by how much should extraction increase, and in how many steps should the expansion occur? Moreover, could it be in an agent's own best interest to empty the remaining reservoir even when all other agents take just their share of the historical use?
Resource dynamics

• Time is discrete and indexed by t = 0, 1, 2, ....
• Each period, agents can in total consume up to the available amount of the resource. There are two regimes: In the productive regime, the upper bound on the available resource is given by R, and in the unproductive regime, the upper bound is given by r (with r < R).
• The game starts in the productive regime and will stay in the productive regime as long as total consumption does not exceed a threshold T. The threshold T is the same in all periods, but it may be known or unknown.
• To highlight the effect of uncertainty about the threshold, I define the state variable s_t, denoting the upper bound of the "safe consumption possibility set" at time t. That is, total resource use up to s_t has not triggered a regime shift before, and it is hence known that it will not trigger a regime shift in the future (i.e. Prob(T ≤ s_t) = 0).
Agents, choices, and payoff

• There are N identical agents. Each agent i derives utility from consuming the resource according to some general function u(c_t^i). I assume that this function is continuous, increasing (u' > 0), concave (u'' ≤ 0), and bounded below by u(0) = b.
• Let c_t^i be the consumption of player i at time t. For clarity, I split per-period consumption in the productive regime into two parts: c_t^i = s_t/N + δ_t^i. This means:
  1. The agents obtain an equitable share of the amount of the resource that can be used safely.
  2. The agents may choose to consume an additional amount δ_t^i, effectively pushing the boundary of the safe consumption possibility set at the risk of triggering the regime shift.
• In other words, δ_t^i is the effective choice variable, with δ_t^i ∈ [0, R − s_t − δ_t^{−i}], where δ_t^{−i} is the expansion of the safe consumption set by all other agents except i. I denote δ without superscript i as the total extension of the safe set, i.e. δ_t = Σ_{i=1}^{N} δ_t^i.
• The objective of the agents is to choose the sequence of decisions ∆^i = (δ_0^i, δ_1^i, ...) which, for given strategies of the other agents ∆^{−i} and for a given initial value s_0, maximizes the sum of expected per-period utilities, discounted by a common factor β with β ∈ (0, 1). I concentrate on Markovian strategies.
The probability of triggering the regime shift

• Let the probability density of T on [0, A] be given by a continuous function f such that the cumulative probability of triggering the regime shift is a priori given by F(x) = ∫_0^x f(τ) dτ. F(x) is the common prior of the agents, so that we are in a situation of risk (and not Knightian uncertainty).
• The variable A with R ≤ A ≤ ∞ denotes the upper bound of the support of T. When R < A, there is some probability 1 − F(R) that using the entire resource is safe and the presence of a critical threshold is immaterial. When R = A, using the entire resource will trigger the regime shift for sure. Both R and A are known with certainty.
• Knowing that a given consumption level s is safe, the updated density of T on [s, A] is given by f_s(δ) = f(s+δ)/(1 − F(s)) (see Figure 1). The cumulative probability of triggering the regime shift when, so to say, taking a step of distance δ from the safe value s is:

\[ F_s(\delta) = \int_0^{\delta} f_s(\tau)\,d\tau = \frac{1}{1-F(s)} \int_0^{\delta} f(s+\xi)\,d\xi = \frac{F(s+\delta)-F(s)}{1-F(s)} \qquad (1) \]

Thus, F_s(δ) is the discretized version of the hazard rate. I assume that the hazard rate does not decrease in s.
Figure 1: Updating of belief upon learning that T > s. The grey area is F, the blue hatched area is F_s; the density is plotted over consumption levels from 0 to R.
• The (Bayesian) updating of beliefs is illustrated in Figure 1. Note that it is only revealed whether the state s is safe or not, but no new knowledge about the relative probability that the threshold is located at, say, s_1 or s_2 (with s_1, s_2 > s) has been acquired.
• The key expression that I use in the remainder of the paper is L_s(δ), which I call the conditional survival function. It denotes the probability that the threshold is not crossed when taking a step δ, given that the event has not occurred up to s. Let L(x) = 1 − F(x):

\[ L_s(\delta) = 1 - F_s(\delta) = \frac{1 - F(s) - \big(F(s+\delta) - F(s)\big)}{1 - F(s)} = \frac{L(s+\delta)}{L(s)} \qquad (2) \]

The conditional survival function has the following properties:

  – It decreases with the step size δ: ∂L_s(δ)/∂δ = −f(s+δ)/(1 − F(s)) < 0.
  – It decreases with s: ∂L_s(δ)/∂s = [−f(s+δ)(1 − F(s)) + (1 − F(s+δ))f(s)] / [1 − F(s)]^2 ≤ 0 ⇔ f(s)/(1 − F(s)) ≤ f(s+δ)/(1 − F(s+δ)) (as the hazard rate is non-decreasing).
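To make the updating rule concrete, the following short Python sketch (my own illustration, not part of the paper) implements F, F_s, and L_s for the uniform prior f = 1/A that also underlies the specific example in section 3.5, and checks equation (2) together with the two monotonicity properties at arbitrary parameter values.

# Conditional survival function L_s(delta) for a uniform prior of T on [0, A].
# Illustrative sketch; the parameter values are arbitrary.

A = 1.0  # upper bound of the support of the threshold T

def F(x):
    """Prior CDF of T: uniform on [0, A]."""
    return min(max(x / A, 0.0), 1.0)

def F_s(s, delta):
    """Probability of triggering the regime shift when stepping delta beyond the safe value s, eq. (1)."""
    return (F(s + delta) - F(s)) / (1.0 - F(s))

def L_s(s, delta):
    """Conditional survival function, eq. (2): L_s(delta) = L(s + delta) / L(s)."""
    return (1.0 - F(s + delta)) / (1.0 - F(s))

s, delta = 0.4, 0.2
assert abs(L_s(s, delta) - (1.0 - F_s(s, delta))) < 1e-12   # eq. (2)
assert L_s(s, delta + 0.1) < L_s(s, delta)                  # decreasing in the step size
assert L_s(s + 0.1, delta) <= L_s(s, delta)                 # decreasing in s (non-decreasing hazard)
print(L_s(s, delta))   # (1 - 0.6) / (1 - 0.4) = 0.666...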
Preliminary clarifications and tractability assumptions

• It is well known that the static non-cooperative game of sharing a given resource has infinitely many equilibria, even when the agents are assumed to be symmetric. Moreover, the game requires a statement about the consequences when the sum of consumption plans exceeds the total available resource. Here, I assume that the resource is rationed so that each agent gets an equal share. (The important assumption of symmetry is discussed in section 5.)
• The agents' prior F(x) is fixed. The absence of any passive learning (an arrival of information simply due to the passage of time) is justified in a situation where all learning opportunities from other, similar resources have been exhausted. The only way to learn more about the location of the threshold in the specific resource at hand is to experiment with it.3 The absence of passive learning is also justified when the resource at hand is unique (such as planet Earth when thinking about tipping points in the climate system).
• The regime shift is irreversible. Moreover, I consider the regime shift to be disastrous, in the sense that crossing the threshold breaks all links between the pre-event and the post-event regime. Because the post-event value function is then independent of the pre-event state, I can, without loss of generality, set r = 0 and u(0) = 0.
• For now, the model abstracts from the dynamic common pool problem in the sense that the consumption decision of an agent today has no effect on the consumption possibilities tomorrow, except that (a) the set of safe consumption possibilities may have been enlarged and (b) the disastrous regime shift may have been triggered.
• The idea that a system is more likely to experience a disastrous regime shift the lower the amount of the resource that has been left untouched could simply be included in the belief F(x).4 Additive disturbances, such as stochastic (white) noise, are independent of the current state and would not affect the calculations in a meaningful way. They could be absorbed in the discount factor.

3 An everyday example is blowing up a balloon: We all know that they will burst at some point, and we have blown up sufficiently many balloons, or seen our parents blow up sufficiently many balloons, to have a good idea which size is safe. But for a given balloon at hand, I do not know when it will burst.
4 Note that I rule out a constructivist worldview. T is given by nature. While the distance between me and the edge of the cliff (the threshold) gets smaller because I walk towards it, the threshold does not appear because I am walking.
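Before turning to the analysis, the timing of the model can be illustrated with a minimal simulation. The sketch below is my own illustration, not part of the paper: it draws a threshold T from the uniform prior conditional on the initial level being safe, lets each of the N agents add a fixed symmetric step on top of the equal share s/N, and tracks the safe level until the threshold is crossed or the horizon ends. The fixed step rule and the parameter values are hypothetical placeholders, not the equilibrium strategies derived below.

import random

# Minimal simulation of the basic model dynamics (illustrative assumptions throughout).
A = R = 1.0                 # upper bound of the support of T and size of the resource
N = 5                       # number of agents
s = 0.3                     # initial safe level (known to satisfy T >= s)
T = random.uniform(s, A)    # true threshold, drawn from the prior conditional on s being safe

def symmetric_step(s):
    """Hypothetical fixed expansion rule: each agent adds 5% of the remaining resource."""
    return 0.05 * (R - s) / N

collapsed = False
for t in range(20):
    delta_i = symmetric_step(s)
    total_use = s + N * delta_i          # each agent consumes s/N + delta_i
    if total_use > T:                    # threshold exceeded: disastrous regime shift
        collapsed = True
        break
    s = total_use                        # the safe consumption possibility set has expanded

print("collapsed" if collapsed else f"safe level settled at s = {s:.3f}")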
3 Social optimum and non-cooperative equilibrium

In this main part of the paper, I will first expose the underlying strategic structure of the model by analyzing the situation when the threshold is known (section 3.1). In section 3.2, I describe the socially optimal course of action to highlight that any experimentation – if at all – is undertaken in the first period and that experimentation is decreasing in the value of the consumption level that is known to be safe. I then show that this feature of learning may allow for a cautious non-cooperative equilibrium where either the resource is conserved with probability 1 or the agents experiment once (section 3.3). The degree of experimentation will be inefficiently large, but if the threshold has not been crossed, staying at the updated safe level will – ex post – be socially optimal. In section 3.4, I analyze how optimal and non-cooperative resource use changes with the parameters. Finally, I provide an instructive example for which I derive closed-form solutions (section 3.5).
3.1 Known threshold location

What is the socially optimal outcome in a situation when the threshold T is known? When T is small, too much of the resource must be left untouched to ensure its future existence. As a consequence, it is socially optimal to cross the threshold and consume the entire resource immediately. When T is large, however, the per-period utility from staying at the threshold is sufficiently high so that the first-best is to indefinitely use exactly that amount of the resource which does not cause the regime shift. Intuitively, whether a given T is large enough to induce conservation will also depend on the overall amount of the resource R and the discount factor. The more the future is discounted, the less willing one is to sacrifice today's consumption to ensure consumption in the future.
In the non-cooperative game with a known threshold, immediate depletion is always a Nash-Equilibrium. Clearly, an agent's best reply when the other agents cross the threshold is to demand the maximal amount of the resource. However, there will be a critical value T_c so that staying at the threshold T is also a Nash-Equilibrium when T ≥ T_c. In fact, as Proposition 1 states, there will always be a parameter combination so that the first-best can be supported as a Nash-Equilibrium.
As the setup is stationary, it is clear that if staying at the threshold can be rationalized in any one period, it can be rationalized in every period. The payoff from avoiding the regime shift is u(T/N)/(1−β). Conversely, the payoff from deviating and immediately depleting the resource when all other players intend to stay at the threshold is given by u(R − ((N−1)/N)T). The lower is T, the lower will be the payoff from staying at the threshold, and the higher will be the payoff from deviating. I can therefore define a function Ψ that captures agent i's incentives to grab the resource when all other agents stay at T:

\[ \Psi(T, R, N, \beta) = u\!\left(R - \frac{N-1}{N}\,T\right) - \frac{u(T/N)}{1-\beta} \qquad (3) \]

The function Ψ is positive at T = 0 and declines as T gets larger. Staying at the threshold can be sustained as a Nash Equilibrium whenever Ψ ≤ 0. The critical value T_c is implicitly defined by Ψ(T_c, R, N, β) = 0.
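For a concrete utility function, T_c can be computed from Ψ(T_c, R, N, β) = 0 by standard root finding. The sketch below is my own illustration: it uses the square-root utility of the specific example in section 3.5 and the parameter values underlying Figure 3 (β = 0.8, R = 1, N = 10), and locates T_c by bisection, exploiting that Ψ is positive at T = 0 and declining in T.

from math import sqrt

def psi(T, R=1.0, N=10, beta=0.8, u=sqrt):
    """Incentive to deviate when all others stay at the threshold T, eq. (3)."""
    return u(R - (N - 1) / N * T) - u(T / N) / (1 - beta)

def critical_threshold(R=1.0, N=10, beta=0.8):
    """Find T_c with psi(T_c) = 0 by bisection on (0, R]."""
    lo, hi = 1e-9, R
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if psi(mid, R, N, beta) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

print(critical_threshold())   # staying at T is a Nash equilibrium for T_c <= T <= R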
Proposition 1. When the location of the threshold is known with certainty, there exists, for every combination of β, N, and R, a value T_c such that the first-best can be sustained as a Nash-Equilibrium when T ≥ T_c, where T_c is defined by Ψ = 0. The critical value T_c is higher, the larger are N and R, or the smaller is β.

Proof. The proof is given in Appendix A.1.
In other words, when T is known and T ≥ T_c, the game exhibits the structure of a coordination game with two Nash equilibria in symmetric pure strategies. Here, as in the static game from Barrett (2013, p.236), "[e]ssentially, nature herself enforces an agreement to avoid catastrophe." When staying at or below the threshold is not sufficiently valuable, immediate depletion is the only equilibrium.

Having exposed the underlying strategic structure of the game, I now turn to the situation when the location of the threshold is unknown, first by disregarding strategic interactions and studying optimal experimentation, and then by analyzing the non-cooperative game with unknown location of T.
3.2 Social optimum when the location of T is unknown

The objective of the social planner is:

\[ \max_{c_t \in C} \; \sum_{t=0}^{\infty} \beta^t u(c_t) \quad \text{subject to:} \quad R_{t+1} = \begin{cases} R_t & \text{if } c_t \le T \\ 0 & \text{if } c_t > T \text{ or } R_t = 0 \end{cases}; \qquad R_0 = R. \qquad (4) \]

Starting from a historically given safe value s_t and a belief about the location of the threshold, the social planner has in principle two options: She can either stay at s_t (choose δ = 0), thereby ensuring the existence of the resource in the next period. Alternatively, she can take a positive step into unknown territory (choose δ > 0), potentially expanding the set of safe consumption possibilities to s_{t+1} = s_t + δ, albeit at the risk of a resource collapse. Recalling that L_s(δ) is the probability of not crossing the threshold ("surviving") when taking a step of size δ from the safe value s, we can write the social planner's Bellman equation as:

\[ V(s) = \max_{\delta \in [0,\,R-s]} \; u(s+\delta) + \beta L_s(\delta)\, V(s+\delta) \qquad (5) \]
The crux is, of course, that the value function V(s) is a priori not known. However, we do know that once the planner has decided not to expand the set of safe consumption possibilities, it cannot be optimal to do so at a later period: If δ = 0 is chosen in a given period, nothing is learned for the future (s_{t+1} = s_t), so that the problem in the next period is identical to the problem in the current period. If moving in the next period were to increase the payoff, it would increase the payoff even more if the move had been made a period earlier (as the future is discounted).

To introduce some notation, consider the set of admissible consumption values in C = [0, R] at which it is not socially optimal to expand the set of safe consumption values (as the threat of a disastrous regime shift looms too large). Denote this set of values by S and let s^* be the smallest member of S. In Appendix A.2, I show that S must exist and that it is convex when the hazard rate is non-decreasing. Thus, for s ≥ s^*, it is optimal to choose δ = 0. In this case, we know V(s): it is given by V(s) = u(s)/(1−β).
This leaves three possible paths when starting from values of s_0 that are below s^*. The social planner could: 1) make one step and then stay, 2) make several, but finitely many, steps and then stay, or 3) make infinitely many steps. I now argue that 1) is optimal.

Suppose that a value at which it is socially optimal to remain standing is reached in finitely many steps. This implies that there must be a last step. For this last step, we can explicitly write down the objective function, as we know that the value of staying at s^* forever is u(s^*)/(1−β). Denote by ϕ(δ; s) the social planner's valuation of taking exactly one step of size δ from the initial value s to some value s^* and then staying at s^* forevermore, and denote by δ^*(s) the optimal choice of the last step. Formally:

\[ \varphi(\delta; s) = u(s+\delta) + \beta L_s(\delta)\,\frac{u(s+\delta)}{1-\beta}. \qquad (6) \]

This yields the following first-order condition for an interior solution:

\[ \varphi'(\delta; s) = u'(s+\delta) + \beta\left[ L_s'(\delta)\,\frac{u(s+\delta)}{1-\beta} + L_s(\delta)\,\frac{u'(s+\delta)}{1-\beta} \right] = 0. \qquad (7) \]
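The one-step trade-off in (6) and (7) is straightforward to evaluate numerically. The following sketch (my own illustration) maximizes ϕ(δ; s) on a grid for the square-root utility and uniform prior used in the specific example of section 3.5 below, and compares the result with the closed-form solution δ*(s) = (A − (1+2β)s)/(3β) derived there.

from math import sqrt

A = R = 1.0
beta = 0.8

def L_s(s, delta):
    """Conditional survival under the uniform prior: (A - s - delta) / (A - s)."""
    return (A - s - delta) / (A - s)

def phi(delta, s):
    """One-step value of the social planner, eq. (6), with u(c) = sqrt(c)."""
    return sqrt(s + delta) + beta * L_s(s, delta) * sqrt(s + delta) / (1 - beta)

def delta_star_numeric(s, n=20000):
    """Grid maximization of phi over the admissible steps [0, R - s]."""
    return max((i * (R - s) / n for i in range(n + 1)), key=lambda d: phi(d, s))

s = 0.2
print(delta_star_numeric(s), (A - (1 + 2 * beta) * s) / (3 * beta))   # both approximately 0.2 here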
With these explicit functional forms in hand, I can show that it is better to traverse any given distance before remaining standing in one step rather than in two steps (see Appendix A.2). A fortiori, this holds for any finite sequence of steps. An infinite sequence of steps cannot yield a higher payoff either, since the first step towards s^* will be arbitrarily close to s^* and concavity of the utility function ensures that there is no gain from never actually reaching s^*.

Let g^*(s) be the interior solution to the first-order condition (7). Note that we need not have an interior solution, so that δ^*(s) = 0 when ϕ'(δ; s) < 0 for all δ and δ^*(s) = R − s when ϕ'(δ; s) > 0 for all δ. The first corner solution arises when s ≥ s^*. Similarly, I define a critical value s_* so that the second corner solution arises when s ≤ s_*. (In most cases, this corner solution will not arise in the social optimum, which can be formally described by s_* < 0.) That is, the optimal expansion of the set of safe consumption values is given by:

\[ \delta^*(s) = \begin{cases} R - s & \text{if } s \le s_*, \quad \text{(8a)} \\ g^*(s) & \text{if } s \in (s_*, s^*), \quad \text{(8b)} \\ 0 & \text{if } s \ge s^*. \quad \text{(8c)} \end{cases} \]
The first-best consumption is summarized by the following proposition:

Proposition 2. There exists a set S so that for s ∈ S, it is optimal to choose δ^*(s) = 0. That is, if s_0 ∈ S, the socially optimal use of the resource is s_0 for all t. If s_0 ∉ S, it is optimal to experiment once at t = 0 and expand the set of safe values by δ^*(s_0). When this has not triggered the regime shift, it is socially optimal to stay at s_1 = s_0 + δ^*(s_0) for all t ≥ 1.

Proof. The proof is given in Appendix A.2.
In other words, any experimentation – if at all – is undertaken in the first period. The intuition is the following: Given that it is optimal to eventually stop at some s^* ∈ S, the probability of triggering the regime shift when going from s_0 to s^* is the same whether the distance is traversed in one step or in many steps. Due to discounting, the earlier the optimal safe value s^* is reached, the better.

Moreover, the degree of experimentation depends on history; it is declining in s (Proposition 3). The intuition for this effect is clear: The more valuable the current outside option, the less the social planner can gain from an increased consumption set, but the more she can lose should the experiment trigger the regime shift. In other words, the more the social planner knows, the less she wants to learn. In fact, this implies that the largest step is undertaken when s = 0, which is reminiscent of Janis Joplin's dictum that "freedom is just another word for nothing left to lose".

Proposition 3. The socially optimal step size δ^*(s) is decreasing in s for s ∈ (s_*, s^*).

Proof. The proof is given in Appendix A.3.

With this characterization of the social optimum in place, I next turn to the non-cooperative game of using a renewable resource under the threat of a regime shift with unknown location of the threshold.
3.3 Non-cooperative equilibrium when the location of T is unknown

For a given value of the total consumption that is known to be safe, and a given strategy of the other players ∆^{−i}, the Bellman equation of agent i is:

\[ V^i(s, \Delta^{-i}) = \max_{\delta^i \in [0,\, R-s-\delta^{-i}]} \left\{ u\!\left(\frac{s}{N} + \delta^i\right) + \beta L_s(\delta^i + \delta^{-i})\, V^i(s+\delta, \Delta^{-i}) \right\} \qquad (9) \]
Also here, the crux is that agent i's value function V^i is a priori unknown. However, as the analysis in the previous section has highlighted, we do know that s divides the state space into a safe region and an unsafe region. Moreover, due to the stationarity of the problem, we know that if the agents can coordinate to stay at some value s once, they can do so forever. Below, I will show that there indeed exists a set S^nc such that for any s ∈ S^nc, staying at s is an equilibrium. However, just as in the case when the threshold's location is known, immediate depletion is always also a Nash-Equilibrium. But different from the case when the threshold's location is known, immediate depletion need not be the best reply when s ∉ S^nc. Rather, the agents may coordinate on expanding the set of safe consumption values by some amount δ^nc, and this experiment need not trigger the regime shift. Provided that the regime shift has not occurred, the set of safe consumption possibilities will be expanded up to a level where it is a Nash-Equilibrium not to expand it further. Parallel to the socially optimal experimentation pattern, it will be a Nash-Equilibrium to reach the set S^nc in one step. This "cautious" pattern of non-cooperative resource use is summarized by the following proposition.
Proposition 4. There exists a set S^nc such that for s_0 ∈ S^nc, it is a symmetric Nash-Equilibrium to stay at s_0 and consume s_0/N for all t. For s_0 ∉ S^nc, it is a Nash-Equilibrium to take exactly one step and consume s_0/N + δ^nc(s_0) for t = 0 and – when this has not triggered the regime shift – to stay at s_1 = s_0 + Nδ^nc(s_0), consuming s_1/N for all t ≥ 1.

Proof. The proof is given in Appendix A.4.
Let φ denote the payoff for agent i when she takes exactly one step of size δ^i and then remains standing, and the strategy of the other agents, ∆^{−i} = {δ^{−i}, 0, 0, 0, ...}, is also to take only one step (of total size δ^{−i}):

\[ \phi(\delta^i; \delta^{-i}, s) = u\!\left(\frac{s}{N} + \delta^i\right) + \beta L_s(\delta^i + \delta^{-i})\,\frac{u\!\left(\frac{s+\delta^i+\delta^{-i}}{N}\right)}{1-\beta} \qquad (10) \]

The corresponding first-order condition for an interior maximum is:

\[ \phi'(\delta^i; \delta^{-i}, s) = u'\!\left(\frac{s}{N} + \delta^i\right) + \beta L_s'(\delta^i + \delta^{-i})\,\frac{u\!\left(\frac{s+\delta^i+\delta^{-i}}{N}\right)}{1-\beta} + \beta\,\frac{1}{N}\, L_s(\delta^i + \delta^{-i})\,\frac{u'\!\left(\frac{s+\delta^i+\delta^{-i}}{N}\right)}{1-\beta} = 0 \qquad (11) \]
Denote the interior solution to the first-order condition (11) (if it exists) by g(δ^{−i}, s). Three forces determine g: First, there is the marginal gain from the increase in current utility. For a given s, this private gain is larger the more agents there are, as u'' ≤ 0. Second, there is the marginal decrease in the probability of surviving, which is evaluated at the updated safe consumption value. As agent i obtains only 1/N-th of the updated safe consumption value, these costs weigh less the more agents there are. In this sense, the costs of triggering the regime shift are public. Third, conditional on survival, there is the marginal utility gain from an expanded safe consumption set. As this gain is again public and benefits all agents equally, it is devalued by the factor 1/N. Taken together, these three effects mean that the non-cooperative expansion of the set of safe consumption possibilities will be inefficiently large, despite the fact that it is "cautious" in the sense that it does not trigger the regime shift with probability 1.
Clearly, for a given s and δ^{−i} there need not be an interior solution. When the gain from expanding the set of safe consumption values is small, but the threat of triggering the regime shift is large, it may be individually rational to choose δ^i = 0. Conversely, when the gain from expanding the set of safe consumption values is large and/or it is unlikely that there is a regime shift, it may be individually rational to choose δ^i = R − s − δ^{−i}.

For a symmetric step size δ^{−i} = (N−1)δ^i, we can write equation (11) as follows:

\[ \phi'(\delta^{nc}; s) = u'\!\left(\frac{s}{N} + \delta^{nc}\right) + \beta\left[ L_s'(N\delta^{nc})\,\frac{u\!\left(\frac{s+N\delta^{nc}}{N}\right)}{1-\beta} + \frac{1}{N}\, L_s(N\delta^{nc})\,\frac{u'\!\left(\frac{s+N\delta^{nc}}{N}\right)}{1-\beta} \right] = 0 \qquad (12) \]
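Equation (12) can be solved numerically for the symmetric step. The sketch below (my own illustration, again for the square-root utility and uniform prior of section 3.5) finds the root of the symmetric first-order condition by bisection and compares the total expansion Nδ^nc with the planner's single step δ*; for these parameter values the non-cooperative experiment is larger.

from math import sqrt

A = R = 1.0
beta, N = 0.8, 5
s = 0.2

def foc(delta):
    """phi'(delta; s) from eq. (12) with u(c) = sqrt(c) and a uniform prior."""
    c = s / N + delta                        # = (s + N*delta) / N
    Ls = (A - s - N * delta) / (A - s)       # survival probability of the joint step
    dLs = -1.0 / (A - s)
    return (1 / (2 * sqrt(c))
            + beta * dLs * sqrt(c) / (1 - beta)
            + beta / N * Ls / (2 * sqrt(c)) / (1 - beta))

lo, hi = 0.0, (R - s) / N                    # phi' is decreasing in delta here, so bisect
for _ in range(80):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if foc(mid) > 0 else (lo, mid)
delta_nc = 0.5 * (lo + hi)

delta_planner = (A - (1 + 2 * beta) * s) / (3 * beta)   # closed form from section 3.5
print(N * delta_nc, delta_planner)           # about 0.467 versus 0.2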
Let g^nc(s) be the individual symmetric interior non-cooperative expansion. It is implicitly defined by φ'(δ^nc; s) = 0. Noting the similarity of (12) to (7) when replacing δ^* with Nδ^nc, it is possible to show – parallel to Proposition 3 – that g^nc(s) is decreasing in s. We can therefore define s^nc, the smallest member of the set S^nc, by g^nc(s^nc) = 0. In other words, for s ≥ s^nc, the threat of triggering a disastrous regime shift is sufficiently large so that the agents find it in their own best interest to stay at s when all other agents do so, too. Conversely, we can define the value s_nc by the other corner solution g^nc(s_nc) = (R − s_nc)/N. In other words, for s ≤ s_nc, the threat of triggering a regime shift is so small compared to the gains from increasing one's own consumption that it is individually rational to use the resource up to its maximal capacity R.
To sum up, in the non-cooperative game when the location of T is unknown, there is a "cautious equilibrium" that is described by the following set of Markov strategies:

\[ \delta^{nc}(s) = \begin{cases} \dfrac{R-s}{N} & \text{if } s \le s_{nc}, \quad \text{(13a)} \\ g^{nc}(s) & \text{if } s \in (s_{nc}, s^{nc}), \quad \text{(13b)} \\ 0 & \text{if } s \ge s^{nc}. \quad \text{(13c)} \end{cases} \]
The key intuition for the existence of this "cautious equilibrium" is 1) that for high values of s, staying at s is individually rational when all other agents do so, too, and 2) that when s ∉ S^nc, no agent has an incentive to deviate from a one-step experimentation that expands the set of safe consumption values into the region in which staying is optimal. Of course, there will always also exist an "aggressive equilibrium" in which the resource is depleted immediately, simply because the best reply for player i when all other players plan to expand the consumption set by (R−s)/N is to choose (R−s)/N as well. Note that, for a given s, both the "cautious" and the "aggressive" equilibrium are unique.5
Figure 2 illustrates the aggregate expansion of the set of safe consumption possibilities in the cautious equilibrium and contrasts it with the social optimum. The game has the structure of a coordination problem where the immediate depletion of the resource may become a self-fulfilling prophecy. If any agent i is convinced that the other agents plan to deplete the resource, his best reply is to claim the maximum amount of the resource. Indeed, for s_0 ≤ s_nc, the immediate depletion of the resource is the only Nash Equilibrium, despite the fact that there is a range of initial values (s_0 ∈ [s_*, s_nc]) for which it is socially optimal to conserve the resource indefinitely (at least with positive probability). For s_0 > s_nc the "cautious" and the "aggressive" equilibrium coexist. For s_0 > s^nc the agents do not expand the set of consumption values and share s_0 in the cautious equilibrium. Indeed, the cautious equilibrium coincides with the first-best for some range: The threat of the regime shift allows the players to coordinate on the social optimum. For s ∈ [s_nc, s^nc], the strategic interactions imply that experimentation is inefficiently large. However, should it turn out that s_1 = s_0 + Nδ^nc is safe, this consumption pattern is ex post socially optimal.
Faced with this coordination problem, the question arises which of the two equilibria we can expect to be selected. Clearly, the "cautious equilibrium" Pareto-dominates the "aggressive equilibrium".6 Without strategic uncertainty, the cautious equilibrium would thus be the outcome of the game. But what happens when the agents are uncertain about the other agents' behavior? As the disastrous regime shift is irreversible, there is no room for dynamic processes that lead agents to select the Pareto-dominant equilibrium (Kim, 1996). Therefore, I turn to the static concept of risk-dominance (Harsanyi and Selten, 1988).

5 Uniqueness of the latter type of equilibrium simply follows from the assumption that, in case of incompatible demands, the resource is shared equally among the players. Uniqueness of the symmetric "cautious equilibrium" (should it entail δ^nc(s) < (R−s)/N) can be established by contradiction. Suppose all other players j ≠ i choose to expand the consumption set to a level at which – should the threshold not have been crossed – no player would have an incentive to go further. Player i's best reply cannot be to choose δ^i = 0 in this situation, as the gains from making a small positive step (which are private) exceed the (public) costs of advancing a little further. Hence, the only equilibrium at which the players expand the consumption set once is the symmetric one.
6 This follows immediately from the fact that, by definition, δ^nc(s) is the interior solution to the symmetric maximization problem (9) (with δ^{−i} = (N−1)δ^nc), where the policy δ(s) = R − s was an admissible candidate.

Figure 2: Illustration of the policy function δ(s); the y-axis shows the aggregate expansion (step) size δ(s) and the x-axis the initial safe value s. The blue circles represent the socially optimal extension δ of the safe consumption set s as a function of the safe consumption set (where obviously s ≤ R and δ ∈ [0, R − s]). For values of s below s_*, it is optimal to consume the entire resource (choose δ(s) = R − s). For values of s above s^*, it is optimal to remain standing (choose δ(s) = 0). The red dashed line plots the cautious non-cooperative equilibrium, showing how s_* ≤ s_nc and s^* ≤ s^nc (in some cases we may even have s^nc < s^*). It illustrates how even the "cautious" experimentation under non-cooperation implies excessive risk-taking. The figure also shows that the first-best and the non-cooperative outcome may coincide for very low and very high values of s.
Since the game is symmetric, applying the criterion of risk-dominance for equilibrium selection has the intuitive interpretation that the cautious equilibrium is selected if the expected payoff from playing cautiously exceeds the expected payoff from playing aggressively when agent i assigns probability p to the other agents playing aggressively. Obviously, whether the cautious or the aggressive equilibrium is risk-dominant will depend both on this probability p and on the safe value s. We can, for a given safe value s, solve for the probability p^* at which agent i is just indifferent between playing cautiously or aggressively:

\[ p^*\,\pi_{[\text{all aggressive}]} + (1-p^*)\,\pi_{[\text{only } i \text{ aggressive}]} = p^*\,\pi_{[\text{only } i \text{ cautious}]} + (1-p^*)\,\pi_{[\text{all cautious}]} \]
\[ \Leftrightarrow \quad p^* = \frac{\pi_{[\text{all cautious}]} - \pi_{[\text{only } i \text{ aggressive}]}}{\left(\pi_{[\text{all cautious}]} - \pi_{[\text{only } i \text{ aggressive}]}\right) - \left(\pi_{[\text{only } i \text{ cautious}]} - \pi_{[\text{all aggressive}]}\right)} \]
In the above calculation, π_[all aggressive] refers to the payoff of playing aggressively when all other agents play aggressively, π_[only i aggressive] refers to the payoff of playing aggressively when all other agents play cautiously, and so on. In order to explicitly solve for the value of p^*, we need to put more structure on the problem. For the specific example developed in section 3.5 below, we can calculate and plot p^* as a function of s (see Figure 3). The grey area below the line drawn by p^* shows the set of values for which agent i prefers to play cautiously. Figure 3 illustrates how robust the cautious equilibrium is: Even when the agents think that there is a 50% chance that all other agents play the aggressive strategy, it still pays to play cautiously for a wide range of initial values s. (Clearly, p^* is not defined for s < s_nc, where the cautious and the aggressive equilibrium coincide.)
Figure 3: The black line plots p^* as a function of s for u(c) = √c, f = 1/A, β = 0.8, A = R = 1, and N = 10. It shows, for a given value of s, the maximum value that agent i can assign to the probability that all other agents play aggressively and still prefer to play cautiously. The y-axis gives the probability that opponents play aggressively (from 0 to 1); the x-axis gives the initial safe value s (with s_nc and s^nc marked); the region below the p^* line is where playing cautiously is risk-dominant.
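The indifference probability itself is a one-line formula once the four payoffs are known; in the model they are the expected payoffs of the cautious and aggressive strategies for a given safe value s (their derivation for the specific example is not reproduced here). The sketch below (my own illustration) simply implements the formula for p*, with purely illustrative placeholder payoff numbers.

def risk_dominance_cutoff(pi_all_c, pi_only_i_a, pi_only_i_c, pi_all_a):
    """Indifference probability p* from the risk-dominance condition above.

    pi_all_c    : payoff when everyone plays cautiously
    pi_only_i_a : payoff to i from playing aggressively while the others are cautious
    pi_only_i_c : payoff to i from playing cautiously while the others are aggressive
    pi_all_a    : payoff when everyone plays aggressively
    """
    gain_if_others_cautious = pi_all_c - pi_only_i_a       # net gain from caution, others cautious
    gain_if_others_aggressive = pi_only_i_c - pi_all_a     # net gain from caution, others aggressive
    return gain_if_others_cautious / (gain_if_others_cautious - gain_if_others_aggressive)

# Placeholder payoffs (purely illustrative, not values from the model):
print(risk_dominance_cutoff(pi_all_c=10.0, pi_only_i_a=6.0, pi_only_i_c=1.0, pi_all_a=2.0))   # 0.8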
3.4 Comparative statics

In this section, I analyze how the consumption pattern in the cautious equilibrium changes with the parameters. Recall that g^nc is defined as the interior solution to φ' = 0, where φ' is given by (12). The effect of an increase in a parameter a in the interior range s ∈ (s_nc, s^nc) is given by

\[ \frac{dg^{nc}}{da} = -\frac{\partial \phi'/\partial a}{\partial \phi'/\partial g^{nc}}. \]

Thus, to show that aggregate consumption is higher the higher the parameter a, it is sufficient to show that ∂φ'/∂a > 0 (because the second-order condition implies that ∂φ'/∂g^nc < 0). Because g^nc is monotonically decreasing in s, it is equivalently sufficient to show that, for a given value of R, neither boundary s_nc nor s^nc decreases and at least one boundary increases with a. The reason is that, for a given value of R, an upward shift of s_nc or s^nc (and no downward shift of the respective other boundary) necessarily implies that all new values of g^nc must lie above the old values of g^nc.
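The sign of such comparative statics can be spot-checked numerically by re-solving the first-order condition (12) at slightly perturbed parameter values. The sketch below (my own check, using the square-root utility and uniform prior of section 3.5) approximates dg^nc/dβ by a central finite difference and confirms that the symmetric expansion falls as the agents become more patient.

from math import sqrt

A = R = 1.0
N, s = 5, 0.2

def g_nc(beta):
    """Symmetric interior step solving the first-order condition (12), found by bisection."""
    def foc(delta):
        c = s / N + delta
        Ls = (A - s - N * delta) / (A - s)
        return (1 / (2 * sqrt(c))
                - beta / (A - s) * sqrt(c) / (1 - beta)
                + beta / N * Ls / (2 * sqrt(c)) / (1 - beta))
    lo, hi = 0.0, (R - s) / N
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if foc(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

h = 1e-4
print((g_nc(0.8 + h) - g_nc(0.8 - h)) / (2 * h))   # about -0.42: dg_nc/dbeta < 0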
Proposition 5 summarizes the comparative statics results with respect to β, N, R, and the agents' prior belief about the location of the threshold.

Proposition 5.

(a) The boundaries s_nc and s^nc, and aggregate consumption in the cautious equilibrium, Ng^nc, decrease with β.

(b) An increase in N leads to higher resource use in the cautious equilibrium when u'(R/N) > (N/(N+1)) u'(R/(N+1)).

(c) The more likely the regime shift (in terms of first-order stochastic dominance), the larger the range where a separate cautious Nash equilibrium exists and the lower aggregate consumption.

(d) An increase of R to R̃ for an unchanged risk of the regime shift (i.e. R < R̃ ≤ A) decreases s_nc and leads to a larger range where a separate cautious equilibrium exists.

Proof. The proofs are given in Appendix A.5.
The first comparative static result conforms with basic intuition: The more patient the agents are, the more they value the annuity of staying at s, and the more cautious they are. The second result shows that an increase in the number of agents may exacerbate the "tragedy of the commons", but not necessarily in all cases. There are two opposing effects: On the one hand, the addition of one more agent increases aggregate resource use if all agents were to choose the same consumption level as before. On the other hand, the addition of another agent leads all other agents to decrease their individual consumption, as they partly take the increase in N into account. Proposition 5(b) gives a sufficient condition for when the former effect dominates. The third comparative static result also conforms with basic intuition: The more dangerous any step is, the more cautiously the agents will experiment. The last comparative statics result, finally, highlights the difference from the situation when the location of the threshold is known with certainty. In that situation, an increase in R leads to an increase in T_c, which shrinks the range in which the socially optimal outcome is a Nash-Equilibrium (Proposition 1). Here, immediate depletion is not necessarily the dominant strategy. An increase in R essentially means that the scope for an interior solution is widened, so that the range for which immediate depletion is the only Nash-Equilibrium is smaller.
586
3.5
587
For a given utility function and a given probability distribution of the threshold’s location it is
Specific example
589
possible to explicitly solve for δ ∗ (s), δ nc (s) and calculate the value function V (s). Specifically,
√
I assume that u(c) = c and that the agents believe that every value in [0, A] is equally likely
590
to be the threshold, i.e. f =
588
1
A,
and accordingly Ls (δ) =
A−s−δ
A−s .
Let us first consider the case of the social planner: The optimal expansion of the set of
safe consumption possibilities is given by:
δ∗ =
A − (1 + 2β)s
3β
19
There will only be an interior solution to (7) when s ∈ [s∗ , s∗ ]. We have:7
A − 3βR
s = max 0,
(1 − β)
A
∗
s = min
,R
1 + 2β
∗
Let us now consider the cautious equilibrium of the non-cooperative game. Solving (12)
for the interior equilibrium g nc , we have that total non-cooperative expansion is given by:

R−s





(1 − β)N + β A − (1 − β)N + 3β s
nc
nc
N δ (s) =
Ng =

3β




0
if
s ≤ snc
if
s ∈ (snc , snc )
if
s ≥ snc
where
(
nc
s
)
(1 − β)N + β A − 3βR
= max 0,
(1 − β)N
(
nc
s
= min
)
(1 − β)N + β A
,R
(1 − β)N + 3β
591
The closed form solutions make it easy to confirm the comparative statics results. First,
592
when R increases to R̃ but A remains unchanged (so that R < A and R̃ ≤ A), it only has
593
an effect on snc (provided that snc < R). Provided that snc > 0, it is plain to see that
594
∂snc
∂R
595
Second, for these specific functional forms, a monotonic increase in the risk of a regime shift
596
is equivalent to a decrease in A (say, from A to Â). Clearly, when R is unchanged (so that
597
R < A and R ≤  when  < A), a lower A means a lower snc , a lower snc , and a decreased
598
expansion of the set of safe consumption values when s ∈ (snc , snc ) because ((1−β)N +β) > 0
599
for N ≥ 1. The same holds when A = R and R̂ =  with  < A, because R only appears in
600
the condition for snc and snc > 0 ⇔ (1 − β)N − 2β > 0. Third, considering the comparative
601
static results with respect to N , it is straightforward to check that the denominator of
602
is 2β(1 − β)A > 0 and that
603
the functional forms of this example, the more agents there are, the smaller the region where
604
they can coordinate on not expanding the set of safe consumption values and the larger the
605
total resource use in the case of experimentation. Finally, an increase in the discount factor
606
implies that each agent is more patient and values the preservation of the resource for future
607
consumption more. Thus, aggregate experimentation declines and the range of the cautious
608
equilibrium is larger, which can be readily confirmed by noting that the denominator of
609
is 3N (s − A), which is negative because s ≤ A.
7
3β
= − (1−β)N
< 0 so that the range where a separate cautious equilibrium exists is larger.
∂N g nc
∂N
∂snc
∂N
= g nc + N 1−β
3β (A − s) > 0 as A ≥ s by definition. For
At s∗ it is optimal to consume the entire resource, so that s∗ is found by solving R − s∗ =
s∗ it is optimal to remain standing, so that s∗ is found by solving 0 =
20
A−(1+2β)s∗
.
3β
∂g nc
∂β
A−(1+2β)s∗
.
3β
At
610
Figure 4 plots the value function for a uniform prior (with A = R = 1) and a discount
611
factor of β = 0.8, illustrating how it changes as the number of agents increases. The more
612
agents there are, the greater the distance of the non-cooperative value function (assuming that
613
agents coordinate on the cautious equilibrium, plotted by the red dashed line) to the socially
614
optimal value function (plotted by the blue open circles). In particular when N = 10, one sees
615
the region (roughly from s = 0 to s = 0.2) where there is no separate cautious equilibrium,
616
and that the value s̄nc when it becomes individually rational to remain standing is relatively
617
large (roughly 0.62). All in all however, this example shows that the threat of a irreversible
618
regime shift is very effective when the common pool externality applies only to the risk of
619
crossing the threshold.8 In particular, for s ≥ snc , the cautious equilibrium coincides with the
620
social optimum.
N=5
N=10
5
Social Optimum
Non-cooperation
1
1
2
2
3
3
4
4
5
Social Optimum
Non-cooperation
snc
snc
s*
0.2
0.4
snc
0
0
s*
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.6
0.8
1.0
Figure 4: Illustration√of the value derived from the socially optimal and non-cooperative use of the
resource with u(c) = c, β = 0.8, and A = R = 1 for N = 5 and N = 10.
621
4
Extensions
622
The paper’s main results do not rely on specific functional forms for utility or the probability
623
distribution of the threshold’s location. Tractability is achieved by considering extremely
624
simple resource dynamics, namely the resource remains intact and replenishes fully in the
625
next period as long as resource use in the current period has not exceeded T . In other words,
626
there is no common pool externality relating to the resource dynamics itself. In this part of the
627
paper, I explore to what extent the main results are robust to more general resource dynamics
628
(section 4.1 amd 4.2). Moreover, I show that the result of once-and-for-all experimentation
629
does not rely on the assumption that the regime shift occurs immediately if the threshold is
8
At least for this specific utility function and these parameter values. Note that β = 0.8 implies a unreasonably high discount rate, but it was chosen to magnify the effect of non-cooperation for a small number of
agents.
21
630
crossed (section 4.3). However, once-and-for-all experimentation may no longer be optimal
631
when the regime shift is not disastrous in the sense that the post-event value function depends
632
on the extent to which the threshold has been surpassed: Learning is gradual if, and only if,
633
the post-event value decreases sufficiently strongly with the extent to which the threshold has
634
been surpassed (section 4.4).
635
4.1
636
In this subsection, I shall relax the assumption that R is constant. Instead, I consider the
637
case that R grows according to some function G(R). However, I maintain the assumption
638
that f and A are fixed, so that Rt converges to some R∞ ≤ A with time. This situation may
639
mechanistically lead to repeated experimentation as long as the upper bound of the feasible
640
choice set in the respective period is binding. As a consequence, the overall consumption plan
641
will be more cautious.
642
Growing R and constraints on the choice set
Formally, the resource dynamics can be expressed as:
Rt+1
643

P i
 Rt + G(Rt ) if
i ct ≤ T
=
P i

0
if
i ct > T
(15)
or Rt = 0
where G(R) > 0 for R ∈ (0, R∞ ), G(R∞ ) = 0, and R0 < R∞ ≤ A.
644
Let us first consider the social optimum. As the set of safe values at which no more
645
experimentation is optimal, S, depends only on the belief about the location of the threshold
646
F and not on R, it will be optimal to expand the set of safe consumption values as long as
647
Rt ∈
/ S. Specifically, in this initial phase, it will be optimal to choose δt∗ = Rt − st−1 . Once
648
Rt ∈ S for some t (say at t = τ ), it will be optimal to choose one last step δ ∗ (sτ ) (which may
649
be of size zero) and to remain at sτ +1 = sτ + δ ∗ (sτ ) for all remaining time.
650
Note that constraints on the choice set such that δ ∈ [0, δmax ] where δmax < R − st for
651
some period t = 0, ..., τ will lead to repeated experimentation for the same mechanistic reason:
652
When the first-best unconstrained expansion of the set of safe consumption possibilities is
653
δ ∗ (s0 ) but δmax is such that it requires several steps to traverse this distance, then the safe
654
value s will be updated sequentially (conditional on not causing the regime shift, of course).
655
The optimal plan prescribes choosing δmax for some period t = 0, ..., τ and then choose δ ∗ (sτ .
656
657
Because ∂δ ∗ (s)∂s < 0 (Proposition 3), it follows directly that this implies an overall more
P
cautious plan (that is: τt=0 δmax + δ ∗ (sτ ) < δ ∗ (s0 )).
658
The exact same reasoning applies in the non-cooperative game. Given that the agents
659
coordinate on the cautious equilibrium, the set S nc does not depend on R. Consequently,
660
the equilibrium path prescribes choosing δtnc =
661
then stay at sτ +1 for the remaining time (even though Rt , and with it also the incentive to
662
unilaterally deviate and deplete the resource, may continue to grow). Note that this rests on
663
the assumption that all agents rationally anticipate the evolution of Rt . Analyzing the effect
664
of uncertainty about G(R) could be very interesting, but is left for future work. That said,
22
Rt −st−1
N
for some period t = 0, ..., τ and
665
even with perfect knowledge about G(R), strategic uncertainty will matter a lot in the real
666
world. Given that a real world actor knows that the incentive to grab is increasing through
667
time and he or she is uncertain whether the other actors will actually stick to the cooperative
668
choice, he or she will have strong incentives to pre-empt the other actors.
669
The discussion in this subsection highlights how the result of once-and-for-all experimen-
670
tation is linked to the assumption of an unconstrained choice set δ ∈ [0, R − s]. The discussion
671
further sheds light on the difference of this model to e.g. the climate change application of
672
Lemoine and Traeger (2014): One reason for the gradual approach in their model is that
673
their assumed capital dynamics implicitly translate into a constrained choice set (as it is very
674
reasonable in their setting, capital cannot be adjusted instantly and costlessly). As I will
675
show in section 4.4, this is not the only reason for gradual experimentation.
676
4.2
677
So far, I have assumed that the resource replenishes fully every period unless the thresholds
678
has been crossed. In other words, the externality related only to the risk of crossing the
679
threshold. Here, I study the opposite case of a non-renewable resource to analyze the effect of
680
a disastrous regime shift when the externality relates both to the risk of crossing the threshold
681
and to the resource itself. Specifically, I consider the following model of extraction from a
682
known stock of a non-renewable resource:
Non-renewable resource dynamics
max
cit
∞
X
β t u(cit )
subject to: Rt+1
t=0

P i
P
 Rt − i cit if
i ct ≤ T
;
=
P

i >T
0
if
c
i t
R0 given
(16)
683
A simple interpretation of this model could be a mine from which several agents extract
684
a valuable resource. If aggregate extraction is too high in a given period, the structure of the
685
shafts may collapse, making the remainder of the resource inaccessible. In spite of this natural
686
interpretation, two things are rather peculiar about this model setup: First, any player can
687
extract any amount up to Rt . (The option to introduce a capacity constraint on current
688
extraction – though realistic – would come at the cost of significant clutter without yielding
689
any apparent benefit.) Second, the assumption that R0 is known and that T is constant means
690
that this is not a problem of eating a cake of unknown size. This problem has since long been
691
dealt with in the literature (see e.g. Kemp, 1976; Hoel, 1978) and is not considered here.
692
I assume that the utility function is of such a form, that in a world without the threshold,
693
there is a pareto-dominant non-cooperative equilibrium in which positive extraction occurs
694
in several periods. Due to discounting, it is clear that the extraction level will decline as
695
time passes, both in the social optimum and in the non-cooperative equilibrium. Due to the
696
stock externality, it is clear that the extraction rate in the non-cooperative equilibrium is
697
inefficiently large (see e.g. Harstad and Liski, 2013). To introduce some notation, let c̃nc (Rt )
698
be the total non-cooperative extraction level (as a function of the resource stock Rt ) in absence
699
of the regime shift risk.
23
700
The threat of a regime shift has the potential to limit non-cooperative extraction below
701
c̃nc (Rt )
702
dynamics given by (16), it is possible to show that experimentation continues to exhibit
703
“once-and-for-all dynamics”.9 Thus, for a given value s0 for which it is known from history
704
that s0 can extracted safely in any one period, agents will either experiment once to learn
705
nc
whether they can safely extract snc
1 = s0 + N δ (s0 ), or not at all. As the extraction path will
706
decline, snc
1 will only be binding on the per-period extraction for an initial phase (say until
707
t = τ , where τ is implicitly defined by s1 = c̃nc (Rτ ). After time period τ , the extraction path
708
will follow the same non-cooperative path as in the absence of a regime shift. Consequently,
709
the threat of a regime shift cannot induce coordination on the intertemporal first-best (except
710
∗
∗
in the very special case that snc
1 = s0 = c (RT ) where c (RT ) is the socially optimal extraction
711
at the finite exhaustion period T ). Nevertheless, the threat of a regime shift still allows the
712
agents to coordinate on an extraction path that may pareto-dominate the extraction path
713
without the regime shift in expected terms: While the agents would obviously be better of
714
without the regime shift risk when the initial experiment triggers the collapse, the agents
715
strictly benefit from the constraint on the per-period extraction when the regime shift not be
716
triggered by the initial experiment.
717
4.3
718
In this subsection, I depart from the assumption that all agents observe immediately whether
719
last period’s expansion has triggered the regime shift or not, but consider a situation where
720
the agents observe only with some probability whether they have crossed the threshold. In
721
fact, it is not unreasonable to model the true process of the resource as hidden and that it
722
will manifest itself only after some delay (see Gerlagh and Liski (2014) for a recent paper
723
that focusses on this effect in the context of optimal climate policies). Hence, as time passes
724
the agents will update their beliefs whether the threshold has been located on the interval
725
[st , st + δt ]. How does this learning affect the optimal and non-cooperative strategies? This
726
becomes an extremely difficult question as – due to the delay – the problem is no longer
727
Markovian.
and thereby improve welfare. Also in the case of a non-renewable resource, with
Delay in the occurrence of the regime shift
Nevertheless, it is possible to show that also when crossing the threshold at time t triggers
the regime shift at some (potentially uncertain) time τ > t, it is still socially and individually
rational to experiment – if at all – in the first period only. The key is to realize that yesterday’s
decisions are exogenous today. This means that threat of a regime shift can be modeled as
an exogenous hazard rate: Let ht be the probability that the regime shift, triggered by events
earlier than and including time t, occurs at time t (conditional on not having occurred prior
9
The proof is omitted because it follows the same steps as the proof of Proposition 4. In particular, the
argument that agent i’s payoff is higher when expanding the set of safe values in one step rather than two rests
on a comparison of the risk encountered by the two strategies and that, by definition, staying after the first
expansion is sub-optimal for agent i. Neither of these two comparisons uses the specific form of the continuation
value (it is irrelevant when comparing the regime shift risk and held constant in the second argument).
24
to t, of course). The agent’s problem in this situation can be formulated as:
V i (s, ∆−i ) =
max
δ i ∈[0,R−s]
u(s + δ i ) + (1 − ht )βLs (δ i + δ −i )V i (s + δ, ∆−i )
(17)
728
The structure of (17) is identical to the one in equation (9), only the effective discount factor
729
decreases by (1 − ht ). Thus, the learning dynamics are unchanged.
730
In other words, the “once-and-for-all” dynamics of experimentation are robust to a delay
731
in the occurrence of the regime-shift. This does of course not imply that the optimal decision
732
under the two different models will be the same. It will almost surely differ, as delaying
733
the consequences of crossing the threshold decreases the costs of experimentation. Yet, as
734
the agents only learn that they have crossed the threshold when the disastrous regime shift
735
actually occurs, they cannot capitalize on this delay by trying to expand the set of safe
736
consumption possibilities several times.
737
4.4
738
A central feature of the baseline model was that the regime shift is disastrous: Crossing the
739
threshold breaks any links between the state before and after the regime shift. The pre-
740
event choices did not matter for the post-event value. This structure allowed me to simply
741
normalize the continuation value in case of a regime shift to zero. For some applications,
742
this independence of the post-event value is not a very fitting description. This is especially
743
true when the system under consideration is large, such as for global climate change, and the
744
threshold effect on the damage is not truly catastrophic, but just one of many parts in the
745
equation. In such a setting, one would need to take into account how the continuation value
746
depends on how far the set of consumption values has been expanded before the regime shift.
747
Denote the function that captures how the post-event continuation value depends on the
748
pre-event expansion of the set of consumption values by W (st+1 ) (where st+1 = st + δt ).
749
How W would depend on the pre-event values of the state st and the choice variable δt is
750
not generally clear. The capital stock in a climate change application, for example, likely
751
has an ambiguous effect (Ploeg and Zeeuw, 2015a). On the one hand, it buffers against the
752
adverse effects of the regime shift and hence smoothes consumption over regimes. On the other
753
hand, a higher capital stock implies more intense use of fossil fuels, which aggravates climate
754
damages. Similarly, Ren and Polasky (2014) discuss under which conditions regime shift risk
755
implies more cautionary or more aggressive management in a renewable resource application.
756
In particular, they highlight the role of an “investment effect” that induces incentives for more
757
aggressive management: harvesting less (investing in the renewable resource stock) pays off
758
badly should the regime shift occur. Ren and Polasky go on to show how these incentives
759
are balanced (and potentially overturned) by the “risk reduction effect” and a “consumption
760
smoothing effect” (that leads to more precaution in their application).
Non-disastrous regime shift
761
Thus, whether we overall have W 0 (st+1 ) > 0 or W 0 (st+1 ) < 0 depends on the specific
762
model at hand. Regardless of this, however, when pre-event choices matter for the post-event
763
value, it is no longer immaterial by how much one has stepped over the threshold when it
25
764
is crossed. I now argue that 1) even when the regime shift is not disastrous, there will still
765
be a set S or S nc at which it is socially or individually rational to not experiment further
766
but to stay at the consumption value that is currently known to be safe, and 2) I point out
767
that a necessary condition for a gradual approach to S is that the post-event value declines
768
sufficiently strongly in st+1 .
To put some more structure to the argument, consider the Bellman equation of the sole
769
770
owner before the regime shift has occurred:
V (s) =
max
δ∈[0,R−s]
u(s + δ) + β Ls (δ)· V (s + δ) + (1 − Ls (δ))· W (s + δ)
(18)
771
In other words, the social planner seeks to choose that expansion of the set of consumption
772
values that is currently known to be safe that maximizes his current utility plus the discounted
773
continuation value. With probability Ls (δ), the step of size δ turns out to be safe and the
774
continuation value is given by V (s + δ). With probability (1 − Ls (δ)), the threshold is located
775
on the interval between s and s + δ and the continuation value after the regime shift depends
776
on how far s has been expanded.
777
Although these considerations certainly impact the choice of δ, it will still be the case that
778
there is a non-empty set S for which the optimal choice is δ ∗ = 0. The reason is that, as long
779
as the regime shift is a negative event (V (s) > W (s)), the gains from further expansion of the
780
set of safe consumption values are bounded above, while the risk of triggering the regime shift
781
grows exceedingly large as s → R. To obtain more insights about how the optimal expansion
782
choice is changed the existence of an endogenous continuation value, consider the derivative
783
of the RHS of (18) (denoting this function again by ϕ0 should not cause confusion):
ϕ0 = u0 + β L0s (δ)(V − W ) + Ls (δ)· V 0 + (1 − Ls (δ))· W 0 = 0
(19)
The size of δ is determined by three factors: First, there is the gain in marginal utility
784
785
u0 .
Second, there is the term involving L0s (δ) which captures the increased risk of the regime
786
shift. This term is negative. Third, the the optimal choice of δ is affected by the marginal
787
continuation value. Previously, only the event of not crossing the threshold mattered here.
788
Now, also the event of crossing the threshold has to be evaluated explicitly.
789
Analyzing how an endogenous post-event value affects δ ∗ , we first note that the negative
790
term L0s (δ)(V − W ) decreases in absolute value. When the second-order condition for an
791
interior solution is satisfied, ϕ0 is a decreasing function in the neighborhood of δ ∗ and a
792
decrease in absolute value of the term L0s (δ)(V − W ) shifts the function upwards. As it is
793
intuitive, a non-zero continuation value in case of a regime shift pushes for a larger current
794
consumption. However, when W 0 < 0, the term (1 − Ls (δ))· W 0 is negative, which, ceteris
795
paribus, lead to a lower value of δ ∗ . If, and only if, the post-event value declines sufficiently
796
strongly in st+1 will the optimal δ ∗ be so small that for st ∈
/ S, we have st + δt∗ = st+1 ∈
/ S.
797
The approach to S will then be gradual, implying periods of repeated experimentation.
26
799
To illustrate more concretely how these effects play out, I assume specific functional forms.
√
As in section 3.5, I set u(c) = c and assume a uniform distribution for the location of the
800
threshold so that Ls (δ) =
801
assume that the resource loses all its productivity once the regime shift occurs. In other
802
words, W is the highest value that can be obtained when spreading the consumption of the
798
803
804
A−s−δ
A−s
with A = R. For the post-event continuation value, I
now non-renewable
resource rt = R − st − δt over the remaining time horizon. We have
q
R−(st +δt )
∗
W (st + δt ) =
. Clearly, W 0 (st + δt ) < 0. I assume the same structure to contrast
1−β 2
805
the social optimum with the non-cooperative game. For a square-root utility function and
806
without exogenous constraints in extraction, the number of players cannot be too big in order
807
to have an interior equilibrium with positive extraction over the entire time path (instead of
β
1−β
instructive, so that I present the results graphically.
0.4
0.2
s*
*
s^
0.6
0.2
nc
snc s^
0.6
Safe value s
Safe value s
Social Optimum
Cautious equilibrium, N=2
Post-event value = W(s)
Post-event value = 0
45-degree line
0.2
0.4
0.6
Post-event value = W(s)
Post-event value = 0
45-degree line
0.6
β low
Post-event value = W(s)
Post-event value = 0
45-degree line
0.2
0.4
0.6
Post-event value = W(s)
Post-event value = 0
45-degree line
0.2
Aggregate expansion (step) size δ(s)
β high
Cautious equilibrium, N=2
0.6
Social Optimum
0.4
810
0.2
809
immediate, pre-emptive,
r extraction in the first period). Here I choose N =2. We then have
R−(st +δti +δt−i )
√
W nc (st + δti + δt−i ) =
. The analytic closed form solutions are not particularly
2
Aggregate expansion (step) size δ(s)
808
0.2
s*
*
s^
0.6
Safe value s
0.2
0.4
snc
nc
s^
Safe value s
Figure 5: Illustration of optimal experimentation
when pre-event choices matter for post-event value.
√
Parameters and functional forms: u(c) = c, N =2, A=R=1, Ls (δ) = 1−s−δ
1−s ; for β = 0.95 and β = 0.7.
27
811
Figure 5 plots the aggregate expansion of the set of safe consumption values as a function
812
of s for a high and a low value of the discount factor (β = 0.95 and β = 0.7, respectively). The
813
left panel shows the social optimum and the right panel shows the cautious equilibrium when
814
N = 2. In addition to the socially optimal and cautious non-cooperative expansion when
815
the continuation value is endogenous (in blue and red, respectively), I plot the corresponding
816
expansion when the continuation value is zero for comparison.
817
The Figure shows how the first value of s at which is socially optimal and individually
818
rational to remain standing (here marked by ŝ∗ and ŝnc , respectively) is higher when the
819
regime shift is non-disastrous. For a high value of β this difference is relatively small, but
820
for a low value of β the difference is substantial. Figure 5 furthermore shows, in particular
821
for the social optimum at a high discount factor, that the slope of the optimal expansion is
822
lower when the continuation value is endogenous. For the social optimum, this slope is less
823
than one in absolute value (it is below the 45-degree line – the thin dotted line – anchored
824
at ŝ∗ or ŝnc ). This means that the approach to the set of values at which it is optimal to
825
not experiment further is gradual. In fact, S is reached only asymptotically here. For the
826
non-cooperative game at a low discount factor, however, the set S nc is still reached in one
827
step (if the threshold has not been crossed, of course). At the high discount factor (β = 0.95)
828
the slope of the non-cooperative expansion is just equal to -1, but for higher values of β (0.97
829
and above; not shown here) also the cautious non-cooperative equilibrium implies repeated
830
experimentation.
831
5
832
The threat of a disastrous regime shift may be beneficial in the context of non-cooperative use
833
of a renewable resource because it can facilitate coordination on a pareto-dominant equilibrium
834
even when the location of the threshold is unknown. Because the historical level of use is known
835
to be safe, the consequence of the regime shift is catastrophic, and learning is only affirmative,
836
it is socially optimal and a “cautious” Nash Equilibrium to update beliefs only once (if at all).
837
When the current level of safe use is sufficiently valuable, the threat of loosing the productive
838
resource discourages any experimentation and effectively enforces the first-best consumption
839
level. When the agents experiment, the expansion of the consumption set is inefficiently large,
840
but when it has not triggered the regime shift, staying at the updated level is ex post socially
841
optimal. However, there is always also an “aggressive” Nash equilibrium in which the agents
842
immediately deplete the resource, and when the initial value of safe use is not valuable enough,
843
immediate depletion will be the only equilibrium.
Discussion and Conclusion
844
The amount of learning thus depends on history and there will be at most one experiment.
845
These dynamics are robust to a variety of extensions. However, when the regime shift is not
846
disastrous and the post-event continuation value depends on how far the threshold has been
847
overstepped, repeated experimentation may emerge (but only if the post-event value declines
848
sufficiently steeply in the amount by with the threshold has been overstepped). When the
849
externality applies both to the risk of triggering the regime shift and to the resource itself, the
28
850
threat of the disastrous regime shift loses importance, but it can still act as a “commitment
851
device” to at least dampen non-cooperative extraction.
852
These conclusions have been derived by developing a general dynamic model that has
853
placed only minimal requirements on the utility function and the probability distribution of
854
the threshold. Nevertheless, there are a number of structural modeling assumptions that
855
warrant discussion.
856
First, a prominent aspect of this model is that the threshold itself is not stochastic. The
857
central motivation is that this allows concentrating on the effect of uncertainty about the
858
threshold’s location. This is arguably the core of the problem: we don’t know which level of
859
use triggers the regime shift. This modeling approach implies a clear demarcation between a
860
safe region and a risky region of the state space. In particular, it implies that the edge of the
861
cliff, figuratively speaking, is a safe – and in many cases optimal – place to be. The alternative
862
approach to model the regime shift as a function of time t (with a positive hazard rate for all
863
t) acknowledges that, figuratively speaking, the edge of a cliff is often quite windy and not a
864
particular safe place. This however implies that the regime shift will occur with certainty as
865
time goes to infinity, no matter how little of the resource is used; eventually there will be a
866
gust of wind that is strong enough to blow us over the edge, regardless of where we stand.
867
This is of course not very realistic either. But also on a deeper level one could argue that
868
the non-stochasticity is in effect not a flaw but a feature: Let me cite Lemoine and Traeger
869
(2014, p.28) who argue that “we would not actually expect tipping to be stochastic. Instead,
870
any such stochasticity would serve to approximate a more complete model with uncertainty
871
(and potentially learning) over the precise trigger mechanism underlying the tipping point.”
872
This being said, it would still be interesting to investigate how the choice between a hazard-
873
rate formulation (as in Polasky et al., 2011 or Sakamoto, 2014) or a threshold formulation
874
influences the outcome and policy conclusions in an otherwise identical model.
875
Second, I have modeled the players to be identical. In the real world, players are rarely
876
identical. One dimension along which players could differ could be their valuation of the
877
future. However, prima facie it should not be difficult to show that any such differences could
878
be smoothed out by a contract that gives a larger share of the gains from cooperation to
879
more impatient players. One could also investigate the effect of heterogenous beliefs about the
880
existence and location of the threshold. Agbo (2014) and Koulovatianos (2015) analyze this in
881
the framework of Levhari and Mirman (1980). In the current set-up such a heterogeneity could
882
lead to interesting dynamics and possible multiple equilibria, where some players rationally
883
do not want to learn about the probability distribution of T whereas other players do invest
884
in learning and experimentation. Another dimension along which players could differ is their
885
size or the degree to which they depend on the environmental goods or services in question.
886
As larger players are likely to be able to internalize a larger part of the externality than
887
smaller players, different sets of equilibria may emerge. Especially in light of the discussions
888
surrounding a possible climate treaty (Harstad, 2012; Nordhaus, 2015), it is topical to analyze
889
a situation where groups of players can form a coalition to ameliorate the negative effects of
890
non-cooperation in future applications.
29
891
Third, I have assumed the regime shift to be irreversible. This is obviously a considerable
892
simplification. Groeneveld et al. (2013) have analyzed the problem of how a social planner
893
would learn about the location of a flow-pollution threshold in a setting where repeated
894
crossings are allowed. In the current set-up, the introduction of several thresholds would
895
most likely not yield significant new insights but make the analysis a lot more cumbersome.
896
For example, if one presumes that crossing the threshold implies that one learns where it
897
is,10 the game turns into a repeated game. This may imply that cooperation is sustainable
898
for sufficiently patient agents (van Damme, 1989). However, there could also be cases where
899
irreversibility emerges “endogenously” when it is possible – but not an equilibrium – to move
900
out of a non-productive regime. The tractability of the current modeling approach may prove
901
fruitful to further explore this issue.
902
A final, related, point is the fact that I have concentrated on Markovian strategies. When
903
the agents are allowed to use history-dependent strategies, the threat of a threshold may allow
904
them to coordinate on the social optimum in all phases of the game. They could simply agree
905
on expanding the set of safe consumption possibilities by the socially optimal amount and
906
threaten that if any agent steps too far, this triggers a reaction to deplete the resource in the
907
next period. This obviously begs the question of renegotiation proofness, but it is plausible
908
that already a contract that is binding for two periods is sufficient to achieve the first-best.
909
The threat of a disastrous regime shift is a very strong coordinating device. This is true
910
irrespective of whether the threshold’s location is known or unknown, because the current
911
model of uncertainty implies that the agents learn only after the fact whether the disastrous
912
regime shift has occurred or not. Would a catastrophic threshold lose its coordinating force
913
when the agents can learn about its location without risking to cross it? Importantly, an
914
extension of the model along these lines would speak to the recent debate on “early warning
915
signals” (Scheffer et al., 2009; Boettiger and Hastings, 2013) and is the task of future work.
10
Groeneveld et al. (2013) presume that crossing the threshold tells the social planner that the regime shift
has occurred, but not where. This is a little bit like sitting in a car, fixing the course to a destination and
then blindfolding oneself. Arguably conscious experimentation is more realistically described by saying that
the course is set, but the eyes remain open.
30
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
References
Aflaki, S. (2013). The effect of environmental uncertainty on the tragedy of the commons. Games and Economic
Behavior, 82(0):240–253.
Agbo, M. (2014). Strategic exploitation with learning and heterogeneous beliefs. Journal of Environmental Economics
and Management, 67(2):126–140.
Barrett, S. (2013). Climate treaties and approaching catastrophes. Journal of Environmental Economics and Management, 66(2):235–250.
Barrett, S. and Dannenberg, A. (2012). Climate negotiations under scientific uncertainty. Proceedings of the National
Academy of Sciences, 109(43):17372–17376.
Bochet, O., Laurent-Lucchetti, J., Leroux, J., and Sinclair-Desgagné, B. (2013). Collective Dangerous Behavior: Theory
and Evidence on Risk-Taking. Research Papers by the Institute of Economics and Econometrics, Geneva School of
Economics and Management, University of Geneva 13101, Institut d’Economie et Economtrie, Universit de Genve.
Boettiger, C. and Hastings, A. (2013). Tipping points: From patterns to predictions. Nature, 493(7431):157–158.
Bolton, P. and Harris, C. (1999). Strategic experimentation. Econometrica, 67(2):349–374.
Bonatti, A. and Hörner, J. (2015). Learning to disagree in a game of experimentation. Discussion paper, Cowles
Foundation, Yale University.
Brozović, N. and Schlenker, W. (2011). Optimal management of an ecosystem with an unknown threshold. Ecological
Economics, 70(4):627 – 640.
Costello, C. and Karp, L. (2004). Dynamic taxes and quotas with learning. Journal of Economic Dynamics and Control,
28(8):1661–1680.
Crépin, A.-S. and Lindahl, T. (2009). Grazing games: Sharing common property resources with complex dynamics.
Environmental and Resource Economics, 44(1):29–46.
Cropper, M. (1976). Regulating activities with catastrophic environmental effects. Journal of Environmental Economics
and Management, 3(1):1–15.
Dubins, L. E. and Savage, L. J. (1965). How to gamble if you must : inequalities for stochastic processes. McGraw-Hill,
New York.
Fesselmeyer, E. and Santugini, M. (2013). Strategic exploitation of a common resource under environmental risk. Journal
of Economic Dynamics and Control, 37(1):125–136.
Frank, K. T., Petrie, B., Choi, J. S., and Leggett, W. C. (2005). Trophic cascades in a formerly cod-dominated ecosystem.
Science, 308(5728):1621–1623.
Gerlagh, R. and Liski, M. (2014). Carbon Prices for the Next Hundred Years. CESifo Working Paper Series 4671, CESifo
Group Munich.
Groeneveld, R. A., Springborn, M., and Costello, C. (2013). Repeated experimentation to learn about a flow-pollutant
threshold. Environmental and Resource Economics, forthcoming:1–21.
Harsanyi, J. C. and Selten, R. (1988). A general theory of equilibrium selection in games. MIT Press, Cambridge.
Harstad, B. (2012). Climate contracts: A game of emissions, investments, negotiations, and renegotiations. The Review
of Economic Studies, 79(4):1527–1557.
Harstad, B. and Liski, M. (2013). Games and resources. In Shogren, J., editor, Encyclopedia of Energy, Natural Resource
and Environmental Economics, volume 2, pages 299–308. Elsevier, Amsterdam.
Hjermann, D. Ø., Stenseth, N. C., and Ottersen, G. (2004). Competition among fishermen and fish causes the collapse
of Barents Sea capelin. Proceedings of the National Academy of Sciences, 101:11679–11684.
Hoel, M. (1978). Resource extraction, uncertainty, and learning. The Bell Journal of Economics, 9(2):642–645.
Keller, G., Rady, S., and Cripps, M. (2005). Strategic experimentation with exponential bandits. Econometrica, 73(1):39–
68.
Kemp, M. (1976). How to eat a cake of unknown size. In Kemp, M., editor, Three Topics in the Theory of International
Trade, pages 297–308. North-Holland, Amsterdam.
Kim, Y. (1996). Equilibrium selection in n-person coordination games. Games and Economic Behavior, 15(2):203–227.
Kolstad, C. and Ulph, A. (2008). Learning and international environmental agreements. Climatic Change, 89(1-2):125–
141.
Kossioris, G., Plexousakis, M., Xepapadeas, A., de Zeeuw, A., and Mäler, K.-G. (2008). Feedback nash equilibria for
non-linear differential games in pollution control. Journal of Economic Dynamics and Control, 32(4):1312–1331.
Koulovatianos, C. (2015). Strategic exploitation of a common-property resource under rational learning about its reproduction. Dynamic Games and Applications, 5(1):94–119.
Lemoine, D. and Traeger, C. (2014). Watch your step: Optimal policy in a tipping climate. American Economic Journal:
Economic Policy,, forthcoming:1–47.
Lenton, T. M., Held, H., Kriegler, E., Hall, J. W., Lucht, W., Rahmstorf, S., and Schellnhuber, H. J. (2008). Tipping
elements in the earth’s climate system. Proceedings of the National Academy of Sciences, 105(6):1786–1793.
Levhari, D. and Mirman, L. J. (1980). The Great Fish War: An Example Using a Dynamic Cournot-Nash Solution. The
Bell Journal of Economics, 11(1):322–334.
Miller, S. and Nkuiya, B. (2014). Coalition formation in fisheries with potential regime shift. Unpublished Working
Paper, University of California, Santa Barbara.
Mitra, T. and Roy, S. (2006). Optimal exploitation of renewable resources under uncertainty and the extinction of
species. Economic Theory, 28(1):1–23.
Nævdal, E. (2003). Optimal regulation of natural resources in the presence of irreversible threshold effects. Natural
31
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
Resource Modeling, 16(3):305–333.
Nævdal, E. (2006). Dynamic optimisation in the presence of threshold effects when the location of the threshold is
uncertain – with an application to a possible disintegration of the western antarctic ice sheet. Journal of Economic
Dynamics and Control, 30(7):1131–1158.
Nævdal, E. and Oppenheimer, M. (2007). The economics of the thermohaline circulation – a problem with multiple
thresholds of unknown locations. Resource and Energy Economics, 29(4):262–283.
Nordhaus, W. (2015). Climate clubs: Overcoming free-riding in international climate policy. American Economic Review,
105(4):1339–1370.
Ploeg, F. v. d. and Zeeuw, A. d. (2015a). Climate tipping and economic growth: Precautionary saving and the social
cost of carbon. OxCarre Research Paper 118, OxCarre, Oxford, UK.
Ploeg, F. v. d. and Zeeuw, A. d. (2015b). Non-cooperative and cooperative responses to climate catastrophes in the
global economy: A north-south perspective. OxCarre Research Paper 149, OxCarre, Oxford, UK.
Polasky, S., Zeeuw, A. d., and Wagener, F. (2011). Optimal management with potential regime shifts. Journal of
Environmental Economics and Management, 62(2):229 – 240.
Ren, B. and Polasky, S. (2014). The optimal management of renewable resources under the risk of potential regime shift.
Journal of Economic Dynamics and Control, 40(0):195–212.
Riley, J. and Zeckhauser, R. (1983). Optimal selling strategies: When to haggle, when to hold firm. The Quarterly
Journal of Economics, 98(2):267–289.
Rob, R. (1991). Learning and capacity expansion under demand uncertainty. The Review of Economic Studies, 58(4):655–
675.
Sakamoto, H. (2014). A dynamic common-property resource problem with potential regime shifts. Discussion Paper
E-12-012, Graduate School of Economics, Kyoto University.
Scheffer, M., Bascompte, J., Brock, W. A., Brovkin, V., Carpenter, S. R., Dakos, V., Held, H., van Nes, E. H., Rietkerk,
M., and Sugihara, G. (2009). Early-warning signals for critical transitions. Nature, 461(7260):53–59.
Scheffer, M., Carpenter, S., Foley, J. A., Folke, C., and Walker, B. (2001). Catastrophic shifts in ecosystems. Nature,
413(6856):591–596.
Tavoni, A., Dannenberg, A., Kallis, G., and Löschel, A. (2011). Inequality, communication, and the avoidance of
disastrous climate change in a public goods game. Proceedings of the National Academy of Sciences, 108(29):11825–
11829.
Tsur, Y. and Zemel, A. (1994). Endangered species and natural resource exploitation: Extinction vs. coexistence. Natural
Resource Modeling, 8:389–413.
Tsur, Y. and Zemel, A. (1995). Uncertainty and irreversibility in groundwater resource management. Journal of
Environmental Economics and Management, 29(2):149 – 161.
van Damme, E. (1989). Renegotiation-proof equilibria in repeated prisoners’ dilemma. Journal of Economic Theory,
47(1):206–217.
32
1015
Appendix
1016
A.1
1017
Recall that Proposition 1 states that when the location of the threshold is known with certainty, then there
1018
exists, for every combination of β, N , and R, a value Tc such that the first-best can be sustained as a Nash-
1019
Equilibrium when T ≥ Tc , where Tc is defined by Ψ = 0. The critical value Tc is higher the larger is N and R,
1020
or the smaller is β.
It is useful to replicate equation (3) that describes the gain from immediate depletion over staying at T
1021
1022
Proof of Proposition 1
when all other agents stay at T :
u(T /N )
N −1
Ψ(T, R, N, β) = u R −
T −
N
1−β
1023
1024
(3)
To show that a value Tc , defined by Ψ = 0, always exists, I first note that Ψ declines monotonically in
0
(T /N )
T:
= − NN−1 u0 R − NN−1 T − N1 u 1−β
< 0 as u0 > 0, N ≥ 1 and β ∈ (0, 1). Then, I show that Ψ is
∂Ψ
∂T
1025
larger than zero at T = 0: Ψ(0, R, N, β) = u(R) > 0. Finally, I show that Ψ is smaller than zero as T → R:
1026
β
limT →R Ψ = − 1−β
u(R/N ) < 0 as β ∈ (0, 1) and u(R/N ) > 0. Thus, by the mean value theorem, for every
1027
combination of β, N , and R, there must be a value of T at which Ψ = 0.
Now, to show that staying at T > Tc is indeed the socially optimal action (the first-best), I show that
1028
1029
dTc
dN
1030
which by implies that Tc is smallest when N = 1 (the social planner case). As Ψ is monotonically declining in
1031
T , it will be socially optimal to stay at T for all values of T that are larger than the social-planner’s Tc . A Tc
1032
is implicitly defined by Ψ = 0 we have:
> 0. This means that the critical value at which staying is a Nash-Equilibrium is higher the larger is N ,
∂Ψ/∂N
dTc
=−
>0
dN
∂Ψ/∂T
1033
1034
1035
1036
∂Ψ
∂N
We know that the denominator is negative, so that
= NT2 u0 R − NN−1 T − u0 (T /N ) > 0.
dTc
dN
> 0 when the numerator is positive. We have:
Finally, it remains to show that Tc is higher the larger is R and the smaller is β. Again, a sufficient
condition for the former statement is ∂Ψ
> 0, which holds because ∂Ψ
= u0 R − NN−1 T > 0. A sufficient
∂R
∂R
dTc
dβ
< 0 is that
∂Ψ
∂β
< 0 which holds because
∂Ψ
∂β
/N )
= − u(T
< 0.
(1−β)2
1037
condition for
1038
A.2
1039
Recall that Proposition 2 consists of two parts: First, it states that there exists a set S so that for s ∈ S, it is
1040
optimal to choose δ(s) = 0. That is, if s0 ∈ S, the socially optimal use of the resource is s0 for all t. Second,
1041
the proposition states that if s0 ∈
/ S, it is optimal to experiment once at t = 0 and expand the set of safe values
1042
by δ ∗ (s0 ). When this has not triggered the regime shift, it is socially optimal to stay at s1 = s0 + δ ∗ (s0 ) for all
1043
t ≥ 1.
Proof of Proposition 2
33
1044
1045
First, I show that there is a non-empty set S ⊂ [0, R] at which it is optimal to stay. Assume
Part (1)
for contradiction that for all s in [0, R]:
max
δ∈(0,R−s]
1046
u(s + δ) + βLs (δ)V (s + δ) > u(s) + βV (s)
(A-1)
Then there is a value of δ such that:
V (s) −
L(s + δ)
u(s + δ) − u(s)
V (s + δ) <
L(s)
β
(A-2)
1047
Now as u is concave, positive, and bounded above by u(R), we know that for s sufficiently close to R, the
1048
denominator of the RHS of (A-2) is bounded above: u(s + δ) − u(s) < βKδ. Using this and multiplying both
1049
sides by L(s) as well as dividing both sides by δ we have:
L(s)V (s) − L(s + δ)V (s + δ)
< KL(s)
δ
(A-3)
1050
Now, take the limit as s → R. Because δ ∈ (0, R − s], we have that δ → 0 when s → R so that the LHS of
1051
(s)]
(A-3) is the negative of the derivative of L(s)V (s): lims→R − ∂[L(s)V
= f (R)V (R), which is positive while
∂s
1052
the RHS of (A-3) vanishes when F (R) = 1. Thus, we have a contradiction and there must be some s at which
1053
it is optimal to choose δ(s) = 0. When there is a positive probability that there is no threshold on [0, R] (that
1054
is, F (R) < 1), the RHS of (A-3) does not vanish. Nevertheless, there will always be value of s, namely s = R,
1055
at which it is optimal to stay – simply because there is no other choice.
Thus, the set S is not empty. Moreover, when the hazard rate is not decreasing with s (that is when
∂Ls (δ)
∂s
=
−f (s+δ)(1−F (s))+(1−F (s+δ))f (s)
[1−F (s)]2
∗
<0⇔
f (s)
1−F (s)
<
f (s+δ)
),
1−F (s+δ)
it can be shown that the set S is convex,
∗
so that S = [s , R] where s is defined in the main text as the lowest value of s at which it is optimal to never
experiment. First, note that convexity of S is trivial when s∗ = R. Consider then the case that s∗ < R. By
definition, the first-order condition (equation (7) in the main text) must just hold with equality for s∗ :
ϕ0 (δ; s) = 0
⇔
u0 (s∗ ) = β
f (s∗ ) u(s∗ )
1 − F (s∗ ) 1 − β
S is convex if for any s ∈ (s∗ , R] we have ϕ0 < 0 (i.e. a boundary solution of δ ∗ = 0). That is:
u0 (λs∗ + (1 − λ)R) < β
f (λs∗ + (1 − λ)R) u(λs∗ + (1 − λ)R)
1 − F (λs∗ + (1 − λ)R)
1−β
for all λ ∈ (0, 1]
(A-4)
1056
Because u0 > 0 and u00 ≤ 0, the term on the LHS of (A-4) is smaller the larger is λ. Because u0 > 0 the
1057
rightmost fraction of (A-4) is larger the larger is λ, β is positive constant. The term in the middle is the
1058
hazard rate, which is non-decreasing by assumption.
1059
1060
Part (2)
When s0 ∈
/ S, it is not optimal to stay. Thus, it is optimal to expand the set of safe consumption
values by choosing δ > 0. Due to discounting, it cannot be optimal to approach S asymptotically. Thus there
34
1061
must be a last step from some st ∈
/ S to st+1 = st + δt with st+1 ∈ S. Below, I show that it is in fact optimal
1062
to take only one step. It then follows that when s0 ∈
/ S, it is optimal to choose s0 + δ ∗ (s0 ) for t = 0 and, if the
1063
resource has not collapsed, s1 for all t ≥ 1.
1064
Denote δ ∗ (s̃) the optimal last step when starting from some value s̃ ∈
/ S and s∗ = s̃ + δ ∗ with s∗ ∈ S. The
1065
following calculations show that going from some s to s̃ (by taking a step of size δ̃) and then to s∗ (by taking
1066
a step of size δ ∗ ) yields a lower payoff than going from s to s∗ directly (by taking a step of size δ̂ = δ̃ + δ ∗ ; see
1067
the box below for a sketch of the involved step-sizes).
δ̂
δ̃
s
1068
δ∗
s
^
R
s∗
s̃
That is, I claim:
u(s + δ̃ + δ ∗ )
u(s + δ̃) + βLs (δ̃) u(s + δ̃ + δ ∗ ) + βLs+δ̃ (δ ∗ )
1−β
The important thing to note is that: Ls (δ̃)Ls+δ̃ (δ ∗ ) =
≤
L(s+δ̃) L(s+δ̃+δ ∗ )
L(s)
L(s+δ̃)
u(s + δ̂) + βLs (δ̂)
=
L(s+δ̃+δ ∗ )
L(s)
u(s + δ̂)
1−β
(A-5)
= Ls (δ̃ + δ ∗ ). Hence, (A-5)
can, upon using δ̂ = δ̃ + δ ∗ and splitting the RHS into three parts (t = 0, t = 1, t ≥ 2), be written as:
u(s + δ̃) + βLs (δ̃)u(s + δ̂) + β 2 Ls (δ̂)
u(s + δ̂)
1−β
≤
u(s + δ̂) + βLs (δ̂)u(s + δ̂) + β 2 Ls (δ̂)
which simplifies to:
u(s + δ̃)
≤
h
i
1 + β Ls (δ̂) − Ls (δ̃) u(s + δ̂)
"
u(s + δ̃)
≤
u(s + δ̂)
1−β
#
L(s + δ̂) − L(s + δ̃)
u(s + δ̂)
1+β
L(s)
(A-5’)
Because the term in the squared bracket is smaller than 1 (as L(s+ δ̂) < L(s+ δ̃)), it is not immediately obvious
that the inequality in the last line holds. However, we can use the fact that because s̃ ∈
/ S, and because δ ∗ is
defined as the optimal last step from s̃ into the set S, the following must hold:
u(s̃)
u(s̃ + δ ∗ )
< u(s̃ + δ ∗ ) + βLs̃ (δ ∗ )
.
1−β
1−β
Using the fact that s̃ = s + δ̃ and that s̃ + δ ∗ = s + δ̂, this can be re-arranged to give:
u(s + δ̃)
L(s + δ̂) u(s + δ̂)
< u(s + δ̂) + β
1−β
L(s + δ̃) 1 − β
⇔
"
#
L(s + δ̂) − L(s + δ̃)
u(s + δ̃) < 1 + β
u(s + δ̂)
L(s + δ̃)
Since L0 (s) < 0, we know that
L(s+δ̂)−L(s+δ̃)
L(s+δ̃)
<
L(s+δ̂)−L(s+δ̃)
L(s)
35
(A-6)
< 0. Therefore, combining (A-5’) and (A-6)
establishes the claim and completes the proof:
"
u(s + δ̃)
<
#
L(s + δ̂) − L(s + δ̃)
1+β
u(s + δ̂)
L(s + δ̃)
"
<
#
L(s + δ̂) − L(s + δ̃)
u(s + δ̂)
1+β
L(s)
1069
A.3
1070
Proposition 3 states that the socially optimal step size δ ∗ (s) is decreasing in s for s ∈ (s∗ , s∗ ).
(A-7)
Proof of Proposition 3.
Recall that δ ∗ (s) is implicitly defined by the solution of ϕ0 (δ ∗ ; s) = 0 for s ∈ (s∗ , s∗ ) and where ϕ0 is:
ϕ0 (δ ∗ ; s) = u0 (s + δ ∗ ) +
β 0 ∗
Ls (δ )u(s + δ ∗ ) + Ls (δ ∗ )u0 (s + δ ∗ )
1−β
(7)
with the second-order condition:
ϕ00 (δ ∗ ; s) = u00 +
β 00 ∗
Ls (δ )u + 2L0s (δ ∗ )u0 + Ls (δ ∗ )u00 < 0
1−β
(A-8)
To show that δ ∗ is declining in s, I can use the implicit function theorem and need to show that:
∂ ϕ0 (δ ∗ ; s) /∂s
dδ ∗
=−
<0
ds
∂ ϕ0 (δ ∗ ; s) /∂δ ∗
The denominator is negative when the second-order condition is satisfied. Therefore, a sufficient condition for
dδ ∗
ds
< 0 when the second-order condition is satisfied is that
∂[ϕ0 (δ ∗ ; s)]
β
= u00 +
∂s
1−β
∂[ϕ0 (δ ∗ ;s)]
∂s
< 0 holds. In other words, we must have:
∂L0s (δ ∗ )
∂Ls (δ ∗ ) 0
u + L0s (δ ∗ )u0 +
u + Ls (δ ∗ )u00
∂s
∂s
<0
(A-9)
Noting the similarity of (A-9) to the second-order condition (A-8), and realizing that (A-8) can be decomposed into a common part A and a part B and that (A-9) can be decomposed into the common part A and a
part C, a sufficient condition for (A-9) to be satisfied is that B > C.
u00 +
|
|
β 0
β 00 ∗
Ls L(δ ∗ )u0 + Ls (δ ∗ )u00
+
Ls (δ )u + L0s (δ ∗ )u0 < 0
1−β
1−β
|
{z
}
{z
}
B
A
u00 +
β 0 ∗ 0
β
Ls (δ )u + Ls (δ ∗ )u00
+
1−β
1−β
{z
}
|
A
In order to show that L00s (δ)u + L0s (δ)u0 >
∂L0s (δ ∗ )
∂Ls (δ ∗ ) 0
u+
u <0
∂s
∂s
{z
}
∂L0s (δ)
s (δ) 0
u + ∂L∂s
u,
∂s
0
solution from (7) to write u in terms of u:
u0 =
(A-8’)
−L0s (δ)
u
+ Ls (δ)
1−β
β
36
(A-9’)
C
I use the first-order condition for an interior
Upon inserting and canceling u, I need to show that:
"
L00s (δ)
Recall that Ls (δ) =
1071
L(s+δ)
L(s)
+
L0s (δ)
−L0s (δ)
1−β
+ Ls (δ)
β
#
∂Ls (δ)
∂L0s (δ)
+
>
∂s
∂s
"
−L0s (δ)
1−β
+ Ls (δ)
β
#
(A-10)
and hence:
L0s (δ) =
L0 (s + δ)
L(s)
∂Ls (δ)
L0 (s + δ)L(s) − L(s + δ)L0 (s)
=
∂s
[L(s)]2
L00s (δ) =
L00 (s + δ)
L(s)
∂L0s (δ)
L00 (s + δ)L(s) − L0 (s + δ)L0 (s)
=
∂s
[L(s)]2
Tedious but straightforward calculations then show that (A-10) is indeed satisfied.
0
1−β
L(s + δ) L00 (s + δ)
L (s + δ) 2
1−β
L(s + δ) ∂L0s (δ)
∂Ls (δ) 0
+
−
>
+
−
Ls (δ)
β
L(s)
L(s)
L(s)
β
L(s)
∂s
∂s
|
{z
}
|
{z
}
a
a
⇔
a
0
L00 (s + δ)L(s) − L0 (s + δ)L0 (s)
∂Ls (δ) 0
L00 (s + δ)
L (s + δ) 2
−
>a
−
Ls (δ)
L(s)
L(s)
[L(s)]2
∂s
⇔
L0 (s + δ)
aL00 (s + δ)L(s) − [L0 (s + δ)]2 > a L00 (s + δ)L(s) − L0 (s + δ)L0 (s) − L0 (s + δ)L(s) − L(s + δ)L0 (s)
L(s)
⇔
− [L0 (s + δ)]2 L(s) > −aL0 (s + δ)L0 (s)L(s) − L0 (s + δ)L(s)L0 (s + δ) + L(s + δ)L0 (s)L0 (s + δ)
⇔
aL0 (s + δ)L0 (s)L(s) > L(s + δ)L0 (s)L0 (s + δ)
⇔
aL(s) > L(s + δ)
⇔
L(s + δ)
1−β
+
L(s) > L(s + δ)
β
L(s)
⇔
1−β
L(s) > 0.
β
True because β, L ∈ (0, 1).
1072
1073
A.4
1074
Recall that Proposition 4 states that There exists a set S nc such that for s0 ∈ S nc , it is a symmetric Nash-
1075
Equilibrium to stay at s0 and consume
1076
one step and consume
1077
s1 = s0 + N δ nc (s0 ), consuming
1078
1079
Proof of Proposition 4.
s0
N
s0
N
for all t. For s0 ∈
/ S nc , it is a Nash-Equilibrium to take exactly
+ δ nc (s0 ) for t = 0 and – when this has not triggered the regime shift – to stay at
s1
N
for all t ≥ 1.
Preliminarily, note that the game’s stationarity implies that if it is a Nash equilibrium to stay at some s
in any one period, it will be a Nash equilibrium to stay at that s in all subsequent periods.
1080
The first part of the proof, showing the existence of S nc , is parallel to the first part of the proof of
1081
Proposition 2 and is not repeated here. It rests on the same argument, namely that there is some s at which
1082
the gains from increasing consumption are small compared to the expected loss, even when the short term gain
1083
does not have to be shared among all N agents.
37
1084
To prove the second part of the proposition, I need to show that, for s0 ∈
/ S nc , any agent i prefers to
1085
reach the set S nc in one step rather than two when the strategy of all other agents is to first take one step of
1086
fixed size and then a second feedback step δ −i∗ (s1 ) that ensures reaching snc ∈ S nc . Importantly, I use the
1087
symmetry of the agents, that is, the second step δ −i∗ (s) is given by δ −i∗ (s) = (N − 1)δ i∗ (s) where δ i∗ (s) is
1088
the state-dependent best reply defined by equation (11) in the main text. Moreover, I assume that all agents
1089
coordinate on staying at snc whenever it is reached.
1090
The setup is the following: All agents stand at s0 at the beginning of the first period. All agents except
1091
i take step of fixed size and their combined expansion is given by δ̃ −i . If agent i chooses the same symmetric
1092
fixed size, her expansion being denoted by δ̃ i , the state s̃ ∈
/ S nc is reached (provided the regime shift has not
1093
occurred). That is, s0 < s̃ < snc and s̃ − s0 = N δ̃ i . Agent i can also choose her first step so that snc already
1094
in the first period. Denote this step that expands the set of safe consumption values from s0 to snc when
1095
the total expansion of all other agents taken together is δ̃ −i by δ̂ i∗ (s0 ). The size of the step δ̂ i∗ (s0 ) is thus
1096
δ̂ i∗ (s0 ) = snc − s0 − δ̃ −i . Because all other agents remain at snc once it is reached, agent i’s payoff is in this
1097
case:
π (1) = u
s
0
N
+ δ̂ i∗ (s0 ) +
nc s
β
Ls0 (δ̂ i∗ + δ̃ −i )u
1−β
N
(A-11)
When agent $i$ takes two steps, first a step of size $\tilde\delta^i$ to some value $\tilde s$ (with $\tilde s \notin S^{nc}$) and then a second step of size $\delta^{i*}(\tilde s) = s^{nc} - \tilde s - \delta^{-i*}(\tilde s)$, her payoff is:

$$ \pi^{(2)} = u\left(\frac{s_0}{N} + \tilde\delta^i\right) + \beta\,L_{s_0}\!\left(\tilde\delta^i + \tilde\delta^{-i}\right)\left[u\left(\frac{\tilde s}{N} + \delta^{i*}\right) + \frac{\beta}{1-\beta}\,L_{\tilde s}\!\left(\delta^{i*} + \delta^{-i*}\right)u\left(\frac{s^{nc}}{N}\right)\right] \qquad (A\text{-}12) $$

I now show that $\pi^{(1)} > \pi^{(2)}$. For clarity, rewrite (A-11) and (A-12), using the fact that $L_{s_0}(\hat\delta^{i*} + \tilde\delta^{-i}) = L_{s_0}(\tilde\delta^i + \tilde\delta^{-i})\,L_{\tilde s}(\delta^{i*} + \delta^{-i*}) = \frac{L(s^{nc})}{L(s_0)}$:

$$ \pi^{(1)} = u\left(\frac{s_0}{N} + \hat\delta^{i*}(s_0)\right) + \beta\,\frac{L(s^{nc})}{L(s_0)}\,u\left(\frac{s^{nc}}{N}\right) + \beta^2\,\frac{L(s^{nc})}{L(s_0)}\,\frac{u(s^{nc}/N)}{1-\beta} \qquad (A\text{-}11') $$

$$ \pi^{(2)} = u\left(\frac{s_0}{N} + \tilde\delta^i\right) + \beta\,\frac{L(\tilde s)}{L(s_0)}\,u\left(\frac{\tilde s}{N} + \delta^{i*}(\tilde s)\right) + \beta^2\,\frac{L(s^{nc})}{L(s_0)}\,\frac{u(s^{nc}/N)}{1-\beta} \qquad (A\text{-}12') $$
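A small numerical sketch (illustrative only) confirms that the rewritten expressions (A-11') and (A-12') coincide with (A-11) and (A-12) once the conditional survival probabilities are chained. The survival function L(s) = 1 − s on (0, 1), the utility u(c) = √c, and the chosen state values are assumptions made purely for this check.

import math

# Illustrative parameters (assumptions for this check only, not the paper's calibration)
beta, N = 0.9, 3
L = lambda s: 1.0 - s            # survival function of a uniform threshold on (0, 1)
u = lambda c: math.sqrt(c)       # concave utility

s0, s_tilde, s_nc = 0.2, 0.35, 0.5           # s0 < s_tilde < s_nc, all below 1
d_tilde_i  = (s_tilde - s0) / N              # symmetric first step of agent i
d_tilde_mi = (N - 1) * d_tilde_i             # combined first step of the others
d_hat      = s_nc - s0 - d_tilde_mi          # one-step best reply of agent i
d_i2       = (s_nc - s_tilde) / N            # symmetric second-period step
d_mi2      = (N - 1) * d_i2

L_s0 = lambda d: L(s0 + d) / L(s0)           # L_{s0}(.)
L_st = lambda d: L(s_tilde + d) / L(s_tilde) # L_{s_tilde}(.)

# (A-11) and (A-12)
pi1 = u(s0 / N + d_hat) + beta / (1 - beta) * L_s0(d_hat + d_tilde_mi) * u(s_nc / N)
pi2 = (u(s0 / N + d_tilde_i)
       + beta * L_s0(d_tilde_i + d_tilde_mi)
         * (u(s_tilde / N + d_i2) + beta / (1 - beta) * L_st(d_i2 + d_mi2) * u(s_nc / N)))

# (A-11') and (A-12'), using L_{s0}(.) L_{s_tilde}(.) = L(s_nc)/L(s0)
pi1p = (u(s0 / N + d_hat) + beta * L(s_nc) / L(s0) * u(s_nc / N)
        + beta**2 * L(s_nc) / L(s0) * u(s_nc / N) / (1 - beta))
pi2p = (u(s0 / N + d_tilde_i) + beta * L(s_tilde) / L(s0) * u(s_tilde / N + d_i2)
        + beta**2 * L(s_nc) / L(s0) * u(s_nc / N) / (1 - beta))

print(abs(pi1 - pi1p) < 1e-12, abs(pi2 - pi2p) < 1e-12)   # True True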
Thus, $\pi^{(1)} > \pi^{(2)}$ if:

$$ u\left(\frac{s_0}{N} + \tilde\delta^i\right) < u\left(\frac{s_0}{N} + \hat\delta^{i*}(s_0)\right) + \beta\left[\frac{L(s^{nc})}{L(s_0)}\,u\left(\frac{s^{nc}}{N}\right) - \frac{L(\tilde s)}{L(s_0)}\,u\left(\frac{\tilde s}{N} + \delta^{i*}(\tilde s)\right)\right] \qquad (A\text{-}13) $$

First, by symmetry of the agents and the definition of $\delta^{i*}(\tilde s)$, we have $\frac{\tilde s}{N} + \delta^{i*}(\tilde s) = \frac{\tilde s + N\delta^{i*}(\tilde s)}{N} = \frac{s^{nc}}{N}$, so that (A-13) can be written as:

$$ u\left(\frac{s_0}{N} + \tilde\delta^i\right) < u\left(\frac{s_0}{N} + \hat\delta^{i*}(s_0)\right) - \beta\,\frac{L(\tilde s) - L(s^{nc})}{L(s_0)}\,u\left(\frac{s^{nc}}{N}\right) \qquad (A\text{-}13') $$

Similarly, we have $\frac{s_0}{N} + \tilde\delta^i = \frac{s_0 + N\tilde\delta^i}{N} = \frac{\tilde s}{N}$:

$$ u\left(\frac{\tilde s}{N}\right) < u\left(\frac{s_0}{N} + \hat\delta^{i*}(s_0)\right) - \beta\,\frac{L(\tilde s) - L(s^{nc})}{L(s_0)}\,u\left(\frac{s^{nc}}{N}\right) \qquad (A\text{-}13'') $$

Now, note that $\frac{s_0}{N} + \hat\delta^{i*}(s_0) = \frac{s_0 + N\hat\delta^{i*}(s_0)}{N}$ and that the step $\hat\delta^{i*}(s_0)$ is larger than the symmetric step $\frac{s^{nc} - s_0}{N}$ that would be necessary to reach $s^{nc}$ from $s_0$. Formally: $\hat\delta^{i*}(s_0) = s^{nc} - s_0 - \tilde\delta^{-i} > \frac{s^{nc} - s_0}{N} \Leftrightarrow Ns^{nc} - Ns_0 - N(N-1)\tilde\delta^i > s^{nc} - s_0 \Leftrightarrow s^{nc} - s_0 > N\tilde\delta^i = \tilde s - s_0$, which is true by construction. It follows that $\frac{s_0 + N\hat\delta^{i*}(s_0)}{N} > \frac{s^{nc}}{N}$. Consequently, it is a sufficient condition to show:

$$ u\left(\frac{\tilde s}{N}\right) < u\left(\frac{s^{nc}}{N}\right) - \beta\,\frac{L(\tilde s) - L(s^{nc})}{L(s_0)}\,u\left(\frac{s^{nc}}{N}\right) \qquad (A\text{-}14) $$
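The equivalence used in the step above can also be checked symbolically; the short sketch below (an illustration with symbol names chosen here) verifies that the gap between the one-step best reply and the symmetric step is a positive multiple of $(s^{nc}-s_0) - N\tilde\delta^i$.

import sympy as sp

# Symbols: N >= 2 players, the initial and target states, and the symmetric per-agent step.
N, s0, snc, dt = sp.symbols('N s_0 s_nc delta_tilde', positive=True)

d_hat = snc - s0 - (N - 1) * dt          # one-step best reply of agent i
gap   = d_hat - (snc - s0) / N           # claim: positive whenever snc - s0 > N * delta_tilde

# The gap equals (N-1)/N times (snc - s0 - N*delta_tilde), a positive factor:
print(sp.simplify(gap - (N - 1) / N * ((snc - s0) - N * dt)))   # -> 0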
Parallel to the argument in Proposition 2, we can use the fact that, because $\tilde s \notin S^{nc}$, we must have

$$ \frac{u(\tilde s/N)}{1-\beta} < u\left(\frac{\tilde s}{N} + \delta^{i*}(\tilde s)\right) + \beta\,L_{\tilde s}\!\left(\delta^{i*}(\tilde s) + \delta^{-i*}(\tilde s)\right)\frac{u(s^{nc}/N)}{1-\beta} $$

Using that $\frac{\tilde s}{N} + \delta^{i*}(\tilde s) = \frac{s^{nc}}{N}$ and re-arranging, we can write this as:

$$ \frac{u(\tilde s/N)}{1-\beta} < \left[1 - \beta\,\frac{L(\tilde s) - L(s^{nc})}{L(\tilde s)}\right]\frac{u(s^{nc}/N)}{1-\beta} \qquad (A\text{-}15) $$

Because $L'(s) < 0$, we know that $0 < \frac{L(\tilde s) - L(s^{nc})}{L(s_0)} < \frac{L(\tilde s) - L(s^{nc})}{L(\tilde s)}$, so that the right-hand side of (A-14) exceeds the right-hand side of (A-15). Combining (A-14) and (A-15) therefore establishes that $\pi^{(1)} > \pi^{(2)}$ and completes the proof.
A.5 Proof of Proposition 5

Proof of Proposition 5. Let me repeat the argument from the main text: The effect of an increase in a parameter $a$ in the interior range $s \in (\underline{s}^{nc}, \overline{s}^{nc})$ is given by $\frac{dg^{nc}}{da} = -\frac{\partial\phi'/\partial a}{\partial\phi'/\partial g^{nc}}$. Thus, to show that aggregate consumption is higher the higher the parameter $a$, it is sufficient to show that $\frac{\partial\phi'}{\partial a} > 0$ (because the second-order condition implies that $\frac{\partial\phi'}{\partial g^{nc}} < 0$). Because $g^{nc}$ is monotonically decreasing in $s$, it is also sufficient to show that, for a given value of $R$, neither boundary $\underline{s}^{nc}$ nor $\overline{s}^{nc}$ decreases and at least one boundary increases with $a$. The reason is that, for a given value of $R$, an upward shift of $\underline{s}^{nc}$ or $\overline{s}^{nc}$ (and no downward shift of the respective other boundary) necessarily implies that all new values of $g^{nc}$ must lie above the old values of $g^{nc}$.
(a) The boundaries $\underline{s}^{nc}$, $\overline{s}^{nc}$, and aggregate consumption in the cautious equilibrium, $Ng^{nc}$, decrease with $\beta$. Here, it is simple to show that $\frac{\partial\phi'}{\partial\beta} < 0$. We have $\frac{\partial\phi'}{\partial\beta} = \frac{[\ldots]}{(1-\beta)^2}$, where the term in the squared brackets $[\ldots]$ is the term in the squared brackets of equation (12). We know that this term must be negative for an interior solution because $u' > 0$.
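The expression for $\partial\phi'/\partial\beta$ can be illustrated numerically (a sketch only; the uniform threshold, CRRA utility, and parameter values below are assumptions of this check, not the paper's specification): a finite-difference derivative of $\phi'$ with respect to $\beta$, evaluated at an arbitrary feasible step, agrees with the closed form $[\ldots]/(1-\beta)^2$.

# Illustrative check that d(phi')/d(beta) equals [bracket]/(1 - beta)^2, where
# [bracket] is the squared-bracket term of equation (12). Functional forms are
# assumptions for this sketch only: uniform threshold on (0, A), CRRA utility.
A, N, sigma = 1.0, 3, 2.0
s_hat, delta = 0.3, 0.05                   # an arbitrary state and feasible step

u  = lambda c: c**(1 - sigma) / (1 - sigma)
up = lambda c: c**(-sigma)
L  = lambda x: (A - x) / A                 # survival function of the threshold
Ls  = lambda d: L(s_hat + d) / L(s_hat)    # L_{s_hat}(d)
Lsp = lambda d: -1.0 / (A - s_hat)         # L'_{s_hat}(d), constant in the uniform case

def bracket(d):
    c = (s_hat + N * d) / N
    return Lsp(N * d) * u(c) + Ls(N * d) * up(c) / N

def phi_prime(d, beta):
    return up(s_hat / N + d) + beta / (1 - beta) * bracket(d)

beta, h = 0.9, 1e-6
fd = (phi_prime(delta, beta + h) - phi_prime(delta, beta - h)) / (2 * h)
print(fd, bracket(delta) / (1 - beta)**2)  # the two numbers should agree closely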
(b) An increase in $N$ leads to higher resource use in the cautious equilibrium when $\frac{N}{N+1} > \frac{u'(R/N)}{u'(R/(N+1))}$. Here, I argue that both $\underline{s}^{nc}$ and $\overline{s}^{nc}$ increase when adding another player and $\frac{N}{N+1} > \frac{u'(R/N)}{u'(R/(N+1))}$:
First, for a given number of players $N$ we have at a given $\underline{s}^{nc} = \hat s$ that

$$ \phi'\!\left(\frac{R-\hat s}{N};\hat s\right) = u'\!\left(\frac{\hat s}{N} + \frac{R-\hat s}{N}\right) + \frac{\beta}{1-\beta}\left[L'_{\hat s}(N\delta^{nc})\,u\!\left(\frac{R}{N}\right) + L_{\hat s}(N\delta^{nc})\,u'\!\left(\frac{R}{N}\right)\frac{1}{N}\right] = 0 $$

I now show that $\phi'\!\left(\frac{R-\hat s}{N+1};\hat s\right) > 0$ when $\frac{N}{N+1} > \frac{u'(R/N)}{u'(R/(N+1))}$:

$$ \phi'\!\left(\frac{R-\hat s}{N+1};\hat s\right) - \phi'\!\left(\frac{R-\hat s}{N};\hat s\right) > 0 $$
$$ \Leftrightarrow\quad u'\!\left(\frac{R}{N+1}\right) - u'\!\left(\frac{R}{N}\right) + \frac{\beta}{1-\beta}\left[\left(u\!\left(\frac{R}{N+1}\right) - u\!\left(\frac{R}{N}\right)\right)L'_{\hat s} + L_{\hat s}\left(\frac{1}{N+1}u'\!\left(\frac{R}{N+1}\right) - \frac{1}{N}u'\!\left(\frac{R}{N}\right)\right)\right] > 0 $$

The first part of the last line is positive due to concavity of $u$, the first term in the squared bracket is positive since $L'_s < 0$ and $u\!\left(\frac{R}{N+1}\right) < u\!\left(\frac{R}{N}\right)$, and the last term in the squared bracket is positive whenever $\frac{N}{N+1} > \frac{u'(R/N)}{u'(R/(N+1))}$.
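As an aside (an illustrative sketch, not part of the proof), for CRRA utility $u'(c) = c^{-\sigma}$ the ratio $\frac{u'(R/N)}{u'(R/(N+1))}$ equals $\left(\frac{N}{N+1}\right)^{\sigma}$, so the condition holds precisely when $\sigma > 1$. The parameter values below are assumptions made only for the tabulation.

# Tabulate both sides of the condition in part (b) for CRRA utility.
R = 1.0
up = lambda c, sigma: c**(-sigma)          # u'(c) for CRRA utility

for N in (2, 5, 10):
    for sigma in (0.5, 1.0, 2.0):
        lhs = N / (N + 1)
        rhs = up(R / N, sigma) / up(R / (N + 1), sigma)   # = (N/(N+1))**sigma
        print(f"N={N:2d} sigma={sigma:3.1f}  N/(N+1)={lhs:.3f}  ratio={rhs:.3f}  holds={lhs > rhs}")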
Second, for a given number of players $N$ we have at a given $\overline{s}^{nc} = \check s$ that

$$ \phi'(0;\check s, N) = u'\!\left(\frac{\check s}{N}\right) + \frac{\beta}{1-\beta}\left[L'_{\check s}(0)\,u\!\left(\frac{\check s}{N}\right) + u'\!\left(\frac{\check s}{N}\right)\frac{1}{N}\right] = 0 $$

Clearly, we can make exactly the same argument as above to show that $\phi'(0;\check s, N+1) > 0$ when $\frac{N}{N+1} > \frac{u'(R/N)}{u'(R/(N+1))}$.

(c) The more likely the regime shift (in terms of first-order stochastic dominance), the larger the range where a separate cautious Nash-equilibrium exists and the lower aggregate consumption.
Suppose that for some given value $\hat s$, equation (12) has an interior solution that defines $g^{nc}$:

$$ \phi'(\delta^{nc};\hat s, L) = u'\!\left(\frac{\hat s}{N} + \delta^{nc}\right) + \frac{\beta}{1-\beta}\left[L'_{\hat s}(N\delta^{nc})\,u\!\left(\frac{\hat s + N\delta^{nc}}{N}\right) + L_{\hat s}(N\delta^{nc})\,u'\!\left(\frac{\hat s + N\delta^{nc}}{N}\right)\frac{1}{N}\right] = 0 \qquad (12) $$

I now show that a more likely regime shift (in terms of first-order stochastic dominance) means a change in $L_s$ to $\tilde L_s$ in such a way that $\phi'(\delta^{nc};\hat s, \tilde L) < 0$, so that for every $s \in (\underline{s}^{nc}, \overline{s}^{nc})$ the resulting interior solution $\tilde g^{nc}$ is smaller than the original $g^{nc}$. As a consequence, the range where a separate cautious Nash-equilibrium exists will also be larger.
1134
A first-order stochastic dominance means that F̃ ≥ F (where the inequality is strict for at least one
s). Because the hazard rate is non-declining, this means that
−f˜(s+δ)
1−F̃ (s)
−f (s+δ)
1−F (s)
F̃ (s+δ)−F̃ (s)
1−F̃ (s)
≥
F (s+δ)−F (s)
1−F (s)
and consequently
1135
L̃s ≤ Ls . This implies also that L̃0ŝ =
1136
positive second term in the squared bracket of (12) are smaller, which implies that φ0 (δ nc ; ŝ, L̃) < 0.
1137
(d) An increase of R to R̃ for an unchanged risk of the regime shift (i.e. R < R̃ ≤ A) decreases sn c and
1138
thus leads to a larger range where a separate cautious equilibrium exists.
<
= L0ŝ < 0. Thus, both the negative first term and the
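A concrete illustration (a sketch under assumed uniform distributions, not the paper's general setting): a more likely regime shift is represented by a uniform threshold on $(0, \tilde A)$ with $\tilde A < A$ instead of $(0, A)$, so that $\tilde F \geq F$, and the implied conditional survival probabilities satisfy $\tilde L_s \leq L_s$ and $\tilde L'_s \leq L'_s$, as used above.

import numpy as np

# Check the comparative-statics step of part (c) for uniform thresholds:
# shrinking the support from (0, A) to (0, A_tilde) lowers both the level and
# the slope of the conditional survival probability. Uniform forms are an
# assumption of this sketch only.
A, A_tilde = 1.0, 0.8

def L_cond(s, d, upper):         # L_s(d) = (upper - s - d) / (upper - s)
    return (upper - s - d) / (upper - s)

def L_cond_prime(s, upper):      # dL_s(d)/dd = -1 / (upper - s)
    return -1.0 / (upper - s)

s_grid = np.linspace(0.05, 0.6, 12)   # keep s + d inside both supports
d = 0.1
ok_level = all(L_cond(s, d, A_tilde) <= L_cond(s, d, A) for s in s_grid)
ok_slope = all(L_cond_prime(s, A_tilde) <= L_cond_prime(s, A) for s in s_grid)
print(ok_level, ok_slope)   # True True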
(d) An increase of $R$ to $\tilde R$ for an unchanged risk of the regime shift (i.e. $R < \tilde R \leq A$) decreases $\underline{s}^{nc}$ and thus leads to a larger range where a separate cautious equilibrium exists. Note that $R$ does not impact equation (12) when $R < \tilde R \leq A$, but it has an effect on the first value $\underline{s}^{nc}$: As the diagonal line defining the upper bound of $\delta$ shifts outwards, and $g^{nc}(s)$ is a downward-sloping function steeper than $R - s$, the first value at which it is not optimal to deplete the resource, $\underline{s}^{nc}$, must be smaller when $R$ increases from $R$ to $\tilde R$.