Threatening Thresholds? The effect of disastrous regime shifts on the non-cooperative use of environmental goods and services

Florian K. Diekert∗

May 1, 2016

Abstract

This paper presents an analytically tractable dynamic game in which agents jointly use a resource. The resource replenishes fully but collapses irreversibly should total use exceed a threshold. The initial level of resource use is known to be safe. Because experimentation only reveals whether the threshold has been crossed or not, and the consequence of crossing the threshold is disastrous, it is socially optimal, and a Nash Equilibrium, to update beliefs about the location of the threshold only once (if at all). These learning dynamics facilitate coordination on a “cautious equilibrium”. If the initial safe level is sufficiently valuable, the equilibrium implies no experimentation and coincides with the first-best. The less valuable the initial safe level, the more experimentation. For sufficiently low initial values, immediate depletion of the resource is the only equilibrium. The potentially beneficial effect of the threat of a disastrous regime shift is robust to a variety of extensions. When the regime shift is not disastrous, but the damage increases sufficiently strongly the more the threshold is exceeded, learning may be gradual.

Keywords: Dynamic Games; Thresholds and Natural Disasters; Learning. JEL-Codes: C73, Q20, Q54

Highlights:
• Thresholds may benefit non-cooperative agents by facilitating coordination.
• Learning depends on history: a higher initial safe level implies less experimentation.
• For high initial values, preserving the resource with certainty is a Nash equilibrium.

∗ Department of Economics and Centre for Ecological and Evolutionary Synthesis, University of Oslo, POBox 1095, 3017 Oslo, Norway. Email: f.k.diekert@ibv.uio.no. This research is funded by NorMER, a Nordic Centre of Excellence for Research on Marine Ecosystems and Resources under Climate Change. I would like to thank Geir Asheim, Scott Barrett, Bård Harstad, Larry Karp, Matti Liski, Leo Simon, and seminar participants at the Beijer Institute, Statistics Norway, and in Oslo, Helsinki, Santa Barbara, and Berkeley for constructive discussion and comments.

1 Introduction

Everyday experience painfully teaches that many things tolerate some stress, but if this is driven too far or for too long, they break. Not surprisingly, regime shifts, thresholds, or tipping points are a popular topic in the scientific literature. In this paper, I investigate whether and how the existence of a catastrophic threshold may be beneficial in the sense that it enables non-cooperative agents to coordinate their actions, thereby improving welfare over a situation where this tipping point is not present or ignored. To this end, I develop a generic model of using a productive asset that loses (some or all of) its productivity upon crossing some (potentially unknown) threshold.

The model is applicable to many different settings of non-cooperative learning about the location of a threshold, but I would like to think of it in terms of ecosystem services. Prime examples are the eutrophication of lakes, the bleaching of coral reefs, or saltwater intrusion (Scheffer et al., 2001; Tsur and Zemel, 1995).
On a larger scale, the collapses of the North 39 Atlantic cod stocks off the coast of Canada, or the capelin stock in the Barents Sea can – 40 at least in part – be attributed to overfishing (Frank et al., 2005; Hjermann et al., 2004). 41 The most important application may be the climate system, where potential drivers of a 42 disastrous regime shift could be a disintegration of the West-Antartic ice sheet (Nævdal, 43 2006), a shutdown of the thermohaline circulation (Nævdal and Oppenheimer, 2007), or a 44 melting of Permafrost (Lenton et al., 2008). 45 A key feature of these examples is that the exact location of the tipping point is unknown. 46 Most previous studies translate the uncertainty about the location of the threshold in state 47 space into uncertainty about the occurrence of the event in time. This allows for a convenient 48 hazard-rate formulation (where the hazard rate could be exogenous or endogenous), but it 49 has the problematic feature that, eventually, the event occurs with probability 1. In other 50 words, even if we were to totally stop extracting/polluting, the disastrous regime shift could 51 not be avoided. Arguably, it is more realistic to model the regime shift in such a way that 52 when it has not occurred up to some level, the players can avoid the event by staying at or 53 below that level. I therefore follow the modeling approach of e.g. Tsur and Zemel (1994); 54 Nævdal (2003); Lemoine and Traeger (2014) where – figuratively speaking – the threshold is 55 interpreted as the edge of a cliff and the policy question is whether, when, and where to stop 56 walking. This modeling approach leads, under certain conditions, to a socially optimal “safe 57 minimum standard of conservation” as pointed out by Mitra and Roy (2006). 58 In order to isolate the effect of a catastrophic threshold on the ability to cooperate, I 59 abstract – as a first step – from the dynamic common pool aspect of non-cooperative resource 60 use. This allows me to obtain tractable analytic solutions and feedback Nash-equilibria for 61 general increasing and concave utility functions and continuous probability distributions. In 62 section 3.1, I expose the underlying strategic structure of the game by considering the case 63 when the location of the threshold is known. I show that there is a Nash equilibrium where the 64 resource is conserved indefinitely and a Nash equilibrium where the resource is depleted im- 65 mediately. When the location of the threshold is unknown, any experimentation is undertaken 66 in the first period only (both in the social optimum, section 3.2, and in the non-cooperative 2 67 game, section 3.3). Agents can coordinate on the first-best, if a) it is socially optimal to use 68 the environmental good at its current level, and if b) this status quo is sufficiently valuable. 69 If preserving the status quo is not sufficiently valuable, agents may still coordinate to refrain 70 from depleting the resource, but they will increase their consumption by an inefficiently high 71 amount. However, provided that the increase in consumption has not caused the disastrous 72 regime shift, the players can coordinate on keeping to the updated level of consumption, which 73 is, ex post, socially optimal. 74 The way the threshold risk is modeled in this paper implies that learning is only “affir- 75 mative”. That is, given that the agents know that the current state is safe, and they expand 76 the current state, they will only learn whether the future state is safe or not. 
The agents will 77 not obtain any new information on how much closer they have come to the threshold.1 In 78 simple words, it is pitch-dark when the agents walk towards the cliff. When the consequence 79 of falling over the cliff is disastrous in the sense that it does not matter by how far an agent 80 has stepped over the edge, then there is no point to split any given distance in several steps. 81 Any experimentation is – if at all – undertaken in the first period. Moreover, the degree of 82 experimentation is decreasing in the value of current use that is known to be safe. 83 This basic feature of experimentation and resource use under the threat of a disastrous 84 threshold is robust to several extensions that are explored in section 4. While the threat of 85 the threshold may no longer induce coordination on the first-best when the externality relates 86 to both the (endogenous) risk of passing the threshold and resource itself, the threshold still 87 encourages coordination on a time-profile of resource use that is Pareto-superior compared to 88 the Nash equilibrium without a threshold. As I show in section 4.4, repeated learning will 89 take place only if the post-threshold value depends negatively on the pre-threshold degree of 90 experimentation, and if this effect is sufficiently strong. Section 5 concludes the paper and points to important future applications of the modeling 91 92 framework. All proofs are collected in the Appendix. 93 Relation to the literature 94 This paper links to three strands of literature. First, it contributes to the literature on the 95 management of natural resources under regime shift risk by explicitly analyzing learning about 96 the location of a threshold. Second, it relates to the more general literature by describing 97 optimal experimentation in a set-up of “affirmative learning”. Third, the paper extends 98 the literature on coordination in face of a catastrophic public bad, that has hitherto been 99 analyzed in a static setting, by showing that the sharp distinction between a known and 100 unknown location of the threshold vanishes in a dynamic context. 101 The pioneering contributions that analyze the economics of regime shifts in an environ- 102 mental/resource context were Cropper (1976) and Kemp (1976). There are by now a good 103 dozen papers on the optimal management of renewable resources under the threat of an ir- 104 reversible regime shift (see, for example: Tsur and Zemel, 1995; Nævdal, 2006; Brozović and 105 Schlenker, 2011; Lemoine and Traeger, 2014). Polasky et al. (2011) summarize and character1 Empiricists will agree that there is no learning without experiencing. 3 106 ize the literature at hand of a simple fishery model. They contrast whether the regime shift 107 implies a collapse of the resource or merely a reduction of its renewability, and whether prob- 108 ability of crossing the threshold is exogenous or depends on the state of the system (i.e. it is 109 endogenous). They show that resource extraction should be more cautious when crossing the 110 threshold implies a loss of renewability and the probability of crossing the threshold depends 111 on the state of the system. In contrast, exploitation should be more aggressive when a regime 112 shift implies a collapse of the resource and the probability of crossing the threshold cannot be 113 influenced. 
There is no change in optimal extraction for the loss-of-renewability/exogenous- 114 probability case and the results are ambiguous for the collapse/endogenous-probability case. 115 Ren and Polasky (2014) scrutinize the loss-of-renewability/endogenous-probability case 116 for more general growth and utility functions. They point out that in addition to the risk- 117 reduction effect of the linear model in Polasky et al. (2011), there may also be a consumption- 118 smoothing and an investment effect. The consumption smoothing effect gives incentives to 119 build up a higher resource stock, so that one has more should the future growth be reduced. 120 The investment effect gives incentives to draw down the resource stock as the rate of return 121 after the regime shift is lower. Lemoine and Traeger (2014) call the sum of these two coun- 122 tervailing effects the “differential welfare effect”. In their climate-change application the total 123 effect is positive. As I consider a truly catastrophic event, this effect is zero by construction 124 in the baseline model of the current paper (but see the discussion in section 4.4). 125 Until now the literature in resource economics has been predominantly occupied with 126 optimal management, leaving aside the central question of how agent’s strategic considerations 127 influence and are influenced by the potential to trigger a disastrous regime shift. Still, there 128 are a few notable exceptions: Crépin and Lindahl (2009) analyze the classical “tragedy of 129 the commons” in a grazing game with complex feedbacks, focussing on open-loop strategies. 130 They find that, depending on the productivity of the rangeland, under- or overexploitation 131 might occur. Kossioris et al. (2008) focus on feedback equilibria and analyze, with help of 132 numerical methods, non-cooperative pollution of a “shallow lake”. They show that, as in most 133 differential games with renewable resources, the outcome of the feedback Nash equilibrium 134 is in general worse than the open-loop equilibrium or the social optimum. Ploeg and Zeeuw 135 (2015b) compare the social optimal carbon tax to the tax in the open-loop equilibrium under 136 the threat of a productivity shock due to climate change. While it would be optimal in their 137 two-region model to have taxes that converge to a high level, the players apply diverging taxes 138 in the non-cooperative equilibrium: While the more developed and less vulnerable region has 139 a lower carbon tax throughout, the less developed and more vulnerable region has low taxes 140 initially in order to build up a capital stock, but will apply high taxes later on in order to reduce 141 the chance of tipping the climate into the unproductive regime. Fesselmeyer and Santugini 142 (2013) introduce an exogenous event risk into a non-cooperative renewable resource game à 143 la Levhari and Mirman (1980). As in the optimal management problem with an exogenous 144 probability of a regime shift, the impact of shifted resource dynamics is ambiguous: On the 145 one hand, the threat of a less productive resource induces a conservation motive for all players, 146 but on the other hand, it exacerbates the tragedy of the commons as the players do not take 4 147 the risk externality into account. Finally, Sakamoto (2014) has, by combining analytical 148 and numerical methods, analyzed a non-cooperative game with an endogenous regime shift 149 hazard. He shows that this setting may lead to more precautionary management, also in a 150 strategic setting. 
Miller and Nkuiya (2014) also combine analytical and numerical methods 151 to investigate how an exogenous or endogenous regime shift affects coalition formation in the 152 Levhari-Mirman model. They show that an endogenous regime shift hazard increases coalition 153 sizes and it allows the players, in some cases, to achieve full cooperation. Here, I simplify 154 the dynamics of resource use to develop an analytically tractable model with an endogenous 155 regime shift risk. Taken together, these studies and the current paper show that the effect 156 of a regime shift pulls in the same direction in a non-cooperative setting as under optimal 157 management. However, both the literature on optimal resource management under regime 158 shift risk and its non-cooperative counterpart have not explicitly addressed learning about 159 the unknown location of the tipping point. 160 Results on optimal experimentation in the context of affirmative learning appear at various 161 places in the literature. For example, in their classical book, Dubins and Savage (1965) dis- 162 cuss circumstances under which it is optimal for gamblers to expose themselves to uncertainty 163 in as few rounds as possible. Another example is Riley and Zeckhauser (1983) who discuss 164 price-negotiation strategies where the seller does not know the valuation of the buyer. They 165 find that “[a] seller encountering risk-neutral buyers one at a time should, if commitments are 166 feasible, quote a single take-it-or-leave-it price to each.” Another well-known study is from 167 Rob (1991), who studies optimal and competitive capacity expansion when market demand is 168 unknown. Rob finds that learning will take place over several periods. In his model, the infor- 169 mation gained by an additional unit of installed capital is small, but so is the cost. However, 170 experimenting too much (in the sense of installing more capital than is needed to satisfy the 171 revealed demand) is very costly compared to experimenting too little (so that the true size of 172 the market remains unknown) several times. Consequently, learning takes place gradually. In 173 the competitive equilibrium, learning is even slower due to the private nature of search costs 174 but the public nature of information. In an application to environmental economics, Costello 175 and Karp (2004) investigate optimal pollution quotas when abatement costs are unknown. In 176 their model, the information gain from an additional unit of quota is small as well, but search 177 costs are very high in the beginning and then decline. Importantly, there is no additional 178 harm from experimenting too much, which – in line with the baseline model of the current 179 paper – leads to the conclusion that any experimentation takes place in the first period only. 180 Groeneveld et al. (2013) discuss socially optimal learning about a reversible flow-pollution 181 threshold. They show that the upper bound of the belief about the threshold’s location is 182 expanded only once, while there may be several rounds of experimentation to update the 183 lower bound of the belief about the threshold’s location. 184 Here, I show that non-cooperative learning is more aggressive than socially optimal because 185 costs of search are public while the immediate gains are private. 
Moreover, I show that 186 experimentation is decreasing in the value of the state that is known to be safe: The more 187 the players know that they can safely consume, the less will they be willing to risk triggering 5 188 the regime shift by enlarging the set of consumption opportunities. This aspect has, to the 189 best of my knowledge, not yet been appreciated. 190 At this point, it is important to highlight the difference between the current approach, 191 in which learning is “affirmative”, on the one hand, and, on the other hand, the literature 192 on strategic experimentation (e.g.: Bolton and Harris, 1999; Keller et al., 2005; Bonatti and 193 Hörner, 2015) and the literature on learning in a resource management context (e.g.: Kolstad 194 and Ulph, 2008; Agbo, 2014; Koulovatianos, 2015). In the latter two strands of literature, 195 learning is “informative” in the sense that the agents obtain a random sample on which they 196 base their inference about the state of the world. It pays to obtain repeated samples as 197 this improves the estimate (of course, the public nature of information introduces free-rider 198 incentives in a strategic setting). 199 Finally, the current paper is closely related to three articles that discuss the role of un- 200 certainty about the threshold’s location on whether a catastrophe can be avoided. Barrett 201 (2013) shows that players in a linear-quadratic game are in most cases able to form self- 202 enforcing agreements that avoid catastrophic climate change when the location of the thresh- 203 old is known but not when it is unknown. Similarly, Aflaki (2013) analyzes a model of a 204 common-pool resource problem that is, in its essence, the same as the stage-game developed 205 in section 3. Aflaki shows that an increase in uncertainty leads to increased consumption, 206 but that increased ambiguity may have the opposite effect. Bochet et al. (2013) confirm the 207 detrimental role of increased uncertainty in the stochastic variant of the Nash Demand Game: 208 Even though “cautious” and “dangerous” equilibria co-exist (as they do in my model), they 209 provide experimental evidence that participants in the lab are not able to coordinate on the 210 Pareto-dominant cautious equilibrium.2 However, the models in Aflaki (2013), Barrett (2013), 211 and Bochet et al. (2013) are all static and can therefore not address the prospects of learning. 212 Here, I show that the sharp distinction between known and unknown location of a threshold 213 vanishes in a dynamic context. More uncertainty still leads to increased consumption, but 214 this is now partly driven by the increased gain from experimentation. 215 Analyzing how strategic interactions shape the exploitation pattern of a renewable resource 216 under the threat of a disastrous regime shift is important beyond mere curiosity driven interest. 217 It is probably fair to say that international relations are basically characterized by an absence 218 of supranational enforcement mechanisms which would allow to make binding agreements. 219 But also locally, within the jurisdiction of a given nation, control is seldom complete and the 220 exploitation of many common pool resources is shaped by strategic considerations. Extending 221 our knowledge on the effect of looming regime shifts by taking non-cooperative behavior into 222 account is therefore a timely contribution to both the scientific literature and the current 223 policy debate. 2 Bochet et al. 
(2013, p.1) conclude that a “risk-taking society may emerge from the decentralized actions of risk-averse individuals”. Unfortunately, it is not clear from the description in their manuscript whether the participants were able to communicate. The latter has shown to be a crucial factor for coordination in threshold public goods experiments (Tavoni et al., 2011; Barrett and Dannenberg, 2012). Hence, it may be that what they refer to as “societal risk taking” is simply the result of strategic uncertainty. 6 224 2 Basic model 225 To repeat, I consider a situation where several agents share a productive resource. Each agent 226 obtains utility from using it and as long as total use is at or below a threshold, the resource 227 remains intact. Contrarily, if total use in any period exceeds the threshold, a disastrous 228 regime shift will occur. The players are unable to make binding agreements. In order to 229 expose the effect of a threatening regime shift as clearly as possible, I consider a model where 230 the externality applies only to the risk of triggering the regime shift. While this is clearly 231 a simplification, it allows for a tractable solution of a general dynamic model with minimal 232 requirements on the utility function or the distribution of the threshold’s location. 233 Essentially, the agents face the problem of sharing a magic pie: If they do not eat too 234 much today, they will tomorrow find the full pie replenished, to be shared again. Yet, each 235 agent is tempted to eat a little more than what was eaten yesterday since he will have that 236 piece for himself, while the future consequences of his voracity will have to be shared by all. 237 A literal example could be saltwater intrusion: Consider a freshwater reservoir whose over- 238 all volume is approximately known, and the annual recharge, say due to rainfall or snowmelt 239 is sufficient to fully replenish it. However, the agents fear that saltwater may intrude and 240 irreversibly spoil the reservoir once the water table falls too low. Further, suppose the geol- 241 ogy is so complex that it is not known how much water must be left in the reservoir to avoid 242 saltwater intrusion. Saltwater intrusion has not occurred in the past, so that the current level 243 of total use is known to be safe. Thus, the agents now face the trade-off whether to expand 244 the current consumption of water, or not. If they decide to expand the current level of use, 245 by how much should extraction increase, and in how many steps should the expansion occur? 246 Moreover, could it be in the agent’s own best interest to empty the remaining reservoir even 247 when all other agents take just their share of the historical use? 248 Resource dynamics 249 • Time is discrete and indexed by t = 0, 1, 2, .... 250 • Each period, agents can in total consume up to the available amount of the resource. 251 There are two regimes: In the productive regime, the upper bound on the available 252 resource is given by R, and in the unproductive regime, the upper bound is given by r 253 (with r R). 254 • The game starts in the productive regime and will stay in the productive regime as long 255 as total consumption does not exceed a threshold T . The threshold T is the same in all 256 periods, but it may be known or unknown. 257 • To highlight the effect of uncertainty about the threshold, I define the state variable st , 258 denoting the upper bound of the “safe consumption possibility set” at time t. 
That is, total resource use up to s_t has not triggered a regime shift before, and it is hence known that it will not trigger a regime shift in the future (i.e. Prob(T ≤ s_t) = 0).

Agents, choices, and payoff

• There are N identical agents. Each agent i derives utility from consuming the resource according to some general function u(c_t^i). I assume that this function is continuous, increasing (u′ > 0), concave (u″ ≤ 0), and bounded below by u(0) = b.

• Let c_t^i be the consumption of player i at time t. For clarity, I split per-period consumption in the productive regime in two parts: c_t^i = s_t/N + δ_t^i. This means:
1. The agents obtain an equitable share of the amount of the resource that can be used safely.
2. The agents may choose to consume an additional amount δ_t^i, effectively pushing the boundary of the safe consumption possibility set at the risk of triggering the regime shift.

• In other words, δ_t^i is the effective choice variable with δ_t^i ∈ [0, R − s_t − δ_t^{−i}], where δ_t^{−i} is the expansion of the safe consumption set by all other agents except i. I denote δ without superscript i as the total extension of the safe set, i.e. δ_t = Σ_{i=1}^N δ_t^i.

• The objective of the agents is to choose that sequence of decisions ∆^i = {δ_0^i, δ_1^i, ...} which, for given strategies of the other agents ∆^{−i}, and for a given initial value s_0, maximizes the sum of expected per-period utilities, discounted by a common factor β with β ∈ (0, 1). I concentrate on Markovian strategies.

The probability of triggering the regime shift

• Let the probability density of T on [0, A] be given by a continuous function f such that the cumulative probability of triggering the regime shift is a priori given by F(x) = ∫_0^x f(τ) dτ. F(x) is the common prior of the agents, so that we are in a situation of risk (and not Knightian uncertainty).

• The variable A with R ≤ A ≤ ∞ denotes the upper bound of the support of T. When R < A, there is some probability 1 − F(R) that using the entire resource is safe and the presence of a critical threshold is immaterial. When R = A, using the entire resource will trigger the regime shift for sure. Both R and A are known with certainty.

• Knowing that a given consumption level s is safe, the updated density of T on [s, A] is given by f_s(δ) = f(s+δ)/(1−F(s)) (see Figure 1). The cumulative probability of triggering the regime shift when, so to say, taking a step of distance δ from the safe value s is:
\[ F_s(\delta) = \int_0^{\delta} f_s(\tau)\, d\tau = \frac{1}{1-F(s)} \int_0^{\delta} f(s+\xi)\, d\xi = \frac{F(s+\delta) - F(s)}{1-F(s)} \tag{1} \]
So that F_s(δ) is the discretized version of the hazard rate. I assume that the hazard rate does not decrease in s.

Figure 1: Updating of belief upon learning that T > s: grey area is F, blue hatched area is F_s. [The figure plots the two densities against the resource level, with 0, s, and R marked on the horizontal axis.]

• The (Bayesian) updating of beliefs is illustrated in Figure 1. Note that it is only revealed whether the state s is safe or not, but no new knowledge about the relative probability that the threshold is located at, say, s_1 or s_2 (with s_1, s_2 > s) has been acquired.

• The key expression that I use in the remainder of the paper is L_s(δ), which I call the conditional survival function. It denotes the probability that the threshold is not crossed when taking a step δ, given that the event has not occurred up to s.
Let L(x) = 1 − F(x). Then
\[ L_s(\delta) = 1 - F_s(\delta) = \frac{1-F(s) - \big(F(s+\delta)-F(s)\big)}{1-F(s)} = \frac{L(s+\delta)}{L(s)} \tag{2} \]
The conditional survival function has the following properties:
– It decreases with the step size δ: ∂L_s(δ)/∂δ = −f(s+δ)/(1−F(s)) < 0.
– It decreases with s: ∂L_s(δ)/∂s = [−f(s+δ)(1−F(s)) + (1−F(s+δ))f(s)] / [1−F(s)]² ≤ 0 ⇔ f(s)/(1−F(s)) ≤ f(s+δ)/(1−F(s+δ)) (as the hazard rate is non-decreasing).

Preliminary clarifications and tractability assumptions

• It is well known that the static non-cooperative game of sharing a given resource has infinitely many equilibria, even when the agents are assumed to be symmetric. Moreover, the game requires a statement about the consequences when the sum of consumption plans exceeds the total available resource. Here, I assume that the resource is rationed so that each agent gets an equal share. (The important assumption of symmetry is discussed in section 5.)

• The agent's prior F(x) is fixed. The absence of any passive learning (an arrival of information simply due to the passage of time) is justified in a situation where all learning opportunities from other, similar resources have been exhausted. The only way to learn more about the location of the threshold in the specific resource at hand is to experiment with it.³ The absence of passive learning is also justified when the resource at hand is unique (such as the planet earth when thinking about tipping points in the climate system).

• The regime shift is irreversible. Moreover, I consider the regime shift to be disastrous, in the sense that crossing the threshold breaks all links between the pre-event and the post-event regime. Because the post-event value function is then independent of the pre-event state, I can, without loss of generality, set r = 0 and u(0) = 0.

• For now, the model abstracts from the dynamic common pool problem in the sense that the consumption decision of an agent today has no effect on the consumption possibilities tomorrow, except that a) the set of safe consumption possibilities may have been enlarged and b) the disastrous regime shift may have been triggered.

• The idea that a system is more likely to experience a disastrous regime shift the lower the amount of the resource that has been left untouched could simply be included in the belief F(x).⁴ Additive disturbances, such as stochastic (white) noise, are independent of the current state and would not affect the calculations in a meaningful way. They could be absorbed in the discount factor.

Footnote 3: An everyday example is blowing up a balloon: We all know that they will burst at some point, and we have blown up sufficiently many balloons, or seen our parents blow sufficiently many balloons, to have a good idea which size is safe. But for a given balloon at hand, I do not know when it will burst.

Footnote 4: Note that I rule out a constructivist worldview. T is given by nature. While the distance between me and the edge of the cliff (the threshold) gets smaller because I walk towards it, the threshold does not appear because I am walking.
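Before turning to the analysis, a minimal numerical sketch may help fix ideas about the updating rule (1) and the conditional survival function (2). The sketch below is my own illustration, not part of the paper; it assumes the uniform prior f = 1/A that the paper only introduces in section 3.5, and simply checks the two monotonicity properties stated above.

```python
import numpy as np

A = 1.0  # upper bound of the support of T (assumption: uniform prior, as in section 3.5)

def F(x):
    """Prior cdf of the threshold T under the uniform prior f = 1/A."""
    return np.clip(x / A, 0.0, 1.0)

def F_s(delta, s):
    """Eq. (1): probability that a step of size delta from the safe value s crosses T."""
    return (F(s + delta) - F(s)) / (1.0 - F(s))

def L_s(delta, s):
    """Eq. (2): conditional survival function, L_s(delta) = L(s + delta) / L(s)."""
    return (1.0 - F(s + delta)) / (1.0 - F(s))

# Survival falls with the step size delta and with the safe value s:
print(L_s(0.2, s=0.1), L_s(0.4, s=0.1))  # larger step from the same s -> lower survival
print(L_s(0.2, s=0.1), L_s(0.2, s=0.5))  # same step from a higher s -> lower survival
```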
3 Social optimum and non-cooperative equilibrium

In this main part of the paper, I will first expose the underlying strategic structure of the model by analyzing the situation when the threshold is known (section 3.1). In section 3.2, I describe the socially optimal course of action to highlight that any experimentation – if at all – is undertaken in the first period and that experimentation is decreasing with the value of the consumption level that is known to be safe. I then show that this feature of learning may allow for a cautious non-cooperative equilibrium where either the resource is conserved with probability 1 or the agents experiment once (section 3.3). The degree of experimentation will be inefficiently large, but if the threshold has not been crossed, staying at the updated safe level will – ex post – be socially optimal. In section 3.4, I analyze how optimal and non-cooperative resource use changes with changes in the parameters. Finally, I provide an instructive example for which I derive closed-form solutions (section 3.5).

3.1 Known threshold location

What is the socially optimal outcome in a situation when the threshold T is known? When T is small, too much of the resource must be left untouched to ensure its future existence. As a consequence, it is socially optimal to cross the threshold and consume the entire resource immediately. When T is large, however, the per-period utility from staying at the threshold is sufficiently high so that the first-best is to indefinitely use exactly that amount of the resource which does not cause the regime shift. Intuitively, whether a given T is large enough to induce conservation will also depend on the overall amount of the resource R and the discount factor. The more the future is discounted, the less willing one is to sacrifice today's consumption to ensure consumption in the future.

In the non-cooperative game with a known threshold, immediate depletion is always a Nash-Equilibrium. Clearly, an agent's best reply when the other agents cross the threshold is to demand the maximal amount of the resource. However, there will be a critical value Tc so that staying at the threshold T is also a Nash-Equilibrium when T ≥ Tc. In fact, as Proposition 1 states, there will always be a parameter combination so that the first-best can be supported as a Nash-Equilibrium.

As the setup is stationary, it is clear that if staying at the threshold can be rationalized in any one period, it can be done so in every period. The payoff from avoiding the regime shift is u(T/N)/(1−β). Conversely, the payoff from deviating and immediately depleting the resource when all other players intend to stay at the threshold is given by u(R − (N−1)T/N). The lower is T, the lower will be the payoff from staying at the threshold, and the higher will be the payoff from deviating. I can therefore define a function Ψ that captures agent i's incentives to grab the resource when all other agents stay at T:
\[ \Psi(T, R, N, \beta) = u\!\left(R - \frac{N-1}{N}\,T\right) - \frac{u(T/N)}{1-\beta} \tag{3} \]
The function Ψ is positive at T = 0 and declines as T gets larger. Staying at the threshold can be sustained as a Nash Equilibrium whenever Ψ ≤ 0. The critical value Tc is implicitly defined by Ψ(Tc, R, N, β) = 0.
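As a numerical illustration of the incentive function (3), the following sketch computes Tc by root-finding. It is a sketch under stated assumptions, not part of the paper: I use the square-root utility of section 3.5 and the parameter values of the later figures (β = 0.8, N = 10, R = 1); the function names are my own.

```python
import numpy as np
from scipy.optimize import brentq

beta, N, R = 0.8, 10, 1.0   # illustrative parameters (as in Figure 3 of the paper)
u = np.sqrt                  # assumption: sqrt utility, as in the example of section 3.5

def psi(T):
    """Eq. (3): agent i's incentive to grab the resource when all others stay at T."""
    return u(R - (N - 1) / N * T) - u(T / N) / (1.0 - beta)

# Psi is positive at T = 0 and declines in T; the critical value Tc solves Psi(Tc) = 0,
# so staying at any known threshold T >= Tc can be sustained as a Nash equilibrium.
Tc = brentq(psi, 1e-9, R)
print(f"Tc = {Tc:.3f}")
```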
Proposition 1. When the location of the threshold is known with certainty, then there exists, for every combination of β, N, and R, a value Tc such that the first-best can be sustained as a Nash-Equilibrium when T ≥ Tc, where Tc is defined by Ψ = 0. The critical value Tc is higher, the larger is N and R, or the smaller is β.

Proof. The proof is placed in Appendix A.1.

In other words, when T is known and T ≥ Tc, the game exhibits the structure of a coordination game with two Nash equilibria in symmetric pure strategies. Here, as in the static game from Barrett (2013, p.236), “[e]ssentially, nature herself enforces an agreement to avoid catastrophe.” When staying at or below the threshold is not sufficiently valuable, immediate depletion is the only equilibrium.

Having exposed the underlying strategic structure of the game, I now turn to the situation when the location of the threshold is unknown, first by disregarding strategic interactions and studying optimal experimentation, and then by analyzing the non-cooperative game with unknown location of T.

3.2 Social optimum when the location of T is unknown

The objective of the social planner is:
\[ \max_{c_t \in C} \sum_{t=0}^{\infty} \beta^t u(c_t) \quad \text{subject to: } R_{t+1} = \begin{cases} R_t & \text{if } c_t \le T \\ 0 & \text{if } c_t > T \text{ or } R_t = 0 \end{cases}; \qquad R_0 = R. \tag{4} \]

Starting from a historically given safe value s_t, and a belief about the location of the threshold, the social planner has in principle two options: She can either stay at s_t (choose δ = 0), thereby ensuring the existence of the resource in the next period. Alternatively, she can take a positive step into unknown territory (choose δ > 0), potentially expanding the set of safe consumption possibilities to s_{t+1} = s_t + δ, albeit at the risk of a resource collapse. Recalling that L_s(δ) is the probability of not crossing the threshold (“surviving”) when taking a step of size δ from the safe value s, we can write the social planner's Bellman equation as:
\[ V(s) = \max_{\delta \in [0, R-s]} \; u(s+\delta) + \beta L_s(\delta)\, V(s+\delta) \tag{5} \]

The crux is, of course, that the value function V(s) is a priori not known. However, we do know that once the planner has decided to not expand the set of safe consumption possibilities, it cannot be optimal to do so at a later period: If δ = 0 is chosen in a given period, nothing is learned for the future (s_{t+1} = s_t), so that the problem in the next period is identical to the problem in the current period. If moving in the next period were to increase the payoff, it would increase the payoff even more when one would have made the move a period earlier (as the future is discounted).

To introduce some notation, let $s^*$ be a member of the set of admissible consumption values C = [0, R] at which it is not socially optimal to expand the set of safe consumption values (as the threat of a disastrous regime shift looms too large). Denote this set of values by S and let $s^*$ be the smallest member of S. In Appendix A.2, I show that S must exist and that it is convex when the hazard rate is non-decreasing. Thus, for s ≥ $s^*$, it is optimal to choose δ = 0. In this case, we know V(s): it is given by V(s) = u(s)/(1−β).

This leaves three possible paths when starting from values of s_0 that are below $s^*$. The social planner could: 1) make one step and then stay, 2) make several, but finitely many steps and then stay, and 3) make infinitely many steps. I now argue that 1) is optimal.

Suppose that a value at which it is socially optimal to remain standing is reached in finitely many steps. This implies that there must be a last step.
For this last step, we can explicitly write down the objective function as we know that the value of staying at $s^*$ forever is u($s^*$)/(1−β). Denote by ϕ(δ; s) the social planner's valuation of taking exactly one step of size δ from the initial value s to some value $s^*$ and then staying at $s^*$ forevermore, and denote by δ*(s) the optimal choice of the last step. Formally:
\[ \varphi(\delta; s) = u(s+\delta) + \beta L_s(\delta)\, \frac{u(s+\delta)}{1-\beta}. \tag{6} \]
This yields the following first-order-condition for an interior solution:
\[ \varphi'(\delta; s) = u'(s+\delta) + \beta\left[ L_s'(\delta)\, \frac{u(s+\delta)}{1-\beta} + L_s(\delta)\, \frac{u'(s+\delta)}{1-\beta} \right] = 0. \tag{7} \]

With these explicit functional forms in hand, I can show that it is better to traverse any given distance before remaining standing in one step rather than two steps (see Appendix A.2). A fortiori, this holds for any finite sequence of steps. Also an infinite sequence of steps cannot yield a higher payoff since the first step towards $s^*$ will be arbitrarily close to $s^*$ and concavity of the utility function ensures that there is no gain from never actually reaching $s^*$.

Let g*(s) be the interior solution to the first-order-condition (7). Note that we need not have an interior solution, so that δ*(s) = 0 when ϕ′(δ; s) < 0 for all δ and δ*(s) = R − s when ϕ′(δ; s) > 0 for all δ. The first corner solution arises when s ≥ $s^*$. Similarly, I define a critical value $s_*$ so that the second corner solution arises when s ≤ $s_*$. (In most cases, this corner solution will not arise in the social optimum, which can be formally described by $s_*$ < 0.) That is, the optimal expansion of the set of safe consumption values is given by:
\[ \delta^*(s) = \begin{cases} R-s & \text{if } s \le s_* \quad (8a) \\ g^*(s) & \text{if } s \in (s_*, s^*) \quad (8b) \\ 0 & \text{if } s \ge s^* \quad (8c) \end{cases} \]

The first-best consumption is summarized by the following proposition:

Proposition 2. There exists a set S so that for s ∈ S, it is optimal to choose δ*(s) = 0. That is, if s_0 ∈ S, the socially optimal use of the resource is s_0 for all t. If s_0 ∉ S, it is optimal to experiment once at t = 0 and expand the set of safe values by δ*(s_0). When this has not triggered the regime shift, it is socially optimal to stay at s_1 = s_0 + δ*(s_0) for all t ≥ 1.

Proof. The proof is given in Appendix A.2.

In other words, any experimentation – if at all – is undertaken in the first period. The intuition is the following: Given that it is optimal to eventually stop at some $s^*$ ∈ S, the probability of triggering the regime shift when going from s_0 to $s^*$ is the same whether the distance is traversed in one step or in many steps. Due to discounting, the earlier the optimal safe value $s^*$ is reached, the better.

Moreover, the degree of experimentation depends on history; it is declining in s (Proposition 3). The intuition for this effect is clear: The more valuable the current outside option, the less the social planner can gain from an increased consumption set, but the more she can lose should the experiment trigger the regime shift. In other words, the more the social planner knows, the less she wants to learn. In fact, this implies that the largest step is undertaken when s = 0, which is reminiscent of Janis Joplin's dictum that “freedom is another word for nothing left to lose”.

Proposition 3. The socially optimal step size δ*(s) is decreasing in s for s ∈ ($s_*$, $s^*$).

Proof. The proof is placed in Appendix A.3.
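Propositions 2 and 3 are easy to reproduce numerically. The sketch below maximizes the one-step valuation (6) directly; it is my own illustration under the assumptions of the example in section 3.5 (square-root utility, uniform prior on [0, A], A = R = 1, β = 0.8), and cross-checks the result against that example's closed-form step size.

```python
import numpy as np
from scipy.optimize import minimize_scalar

beta, A, R = 0.8, 1.0, 1.0                     # illustrative parameters (section 3.5)
u = np.sqrt                                    # assumption: sqrt utility
L_s = lambda delta, s: (A - s - delta) / (A - s)   # conditional survival, eq. (2), uniform prior

def phi(delta, s):
    """Eq. (6): value of one step of size delta from s, then staying forever."""
    return u(s + delta) + beta * L_s(delta, s) * u(s + delta) / (1.0 - beta)

def delta_star(s):
    """Numerically optimal (last) step from s, respecting the bounds in (8a)-(8c)."""
    res = minimize_scalar(lambda d: -phi(d, s), bounds=(0.0, R - s), method="bounded")
    return res.x

for s in (0.0, 0.2, 0.3, 0.38):
    closed_form = max(0.0, min(R - s, (A - (1 + 2 * beta) * s) / (3 * beta)))
    print(f"s = {s:.2f}: numerical step = {delta_star(s):.3f}, closed form = {closed_form:.3f}")
# The step size shrinks as the known safe value s rises (Proposition 3) and hits zero at s^*.
```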
With this characterization of the social optimum in place, I next turn to the non-cooperative game of using a renewable resource under the threat of a regime shift with unknown location of the threshold.

3.3 Non-cooperative equilibrium when the location of T is unknown

For a given value of the total consumption that is known to be safe, and a given strategy of the other players ∆^{−i}, the Bellman equation of agent i is:
\[ V^i(s, \Delta^{-i}) = \max_{\delta^i \in [0,\, R-s-\delta^{-i}]} \Big\{ u\!\big(s/N + \delta^i\big) + \beta L_s(\delta^i + \delta^{-i})\, V^i(s+\delta, \Delta^{-i}) \Big\} \tag{9} \]

Also here, the crux is that agent i's value function V^i is a priori unknown. However, as the analysis in the previous section has highlighted, we do know that s divides the state space into a safe region and an unsafe region. Moreover, due to the stationarity of the problem, we know that if the agents can coordinate to stay at some value s once, they can do so forever. Below, I will show that there indeed exists a set S^nc where for any s ∈ S^nc staying at s is an equilibrium. However, just as in the case when the threshold's location is known, immediate depletion is always also a Nash-Equilibrium. But different from the case when the threshold's location is known, immediate depletion need not be the best-reply when s ∉ S^nc. Rather, the agents may coordinate on expanding the set of safe consumption values by some amount δ^nc and this experiment need not trigger the regime shift. Provided that the regime shift has not occurred, the set of safe consumption possibilities will be expanded up to a level where it is a Nash-Equilibrium to not expand it further. Parallel to the socially optimal experimentation pattern, it will be a Nash-Equilibrium to reach the set S^nc in one step. This “cautious” pattern of non-cooperative resource use is summarized by the following proposition.

Proposition 4. There exists a set S^nc such that for s_0 ∈ S^nc, it is a symmetric Nash-Equilibrium to stay at s_0 and consume s_0/N for all t. For s_0 ∉ S^nc, it is a Nash-Equilibrium to take exactly one step and consume s_0/N + δ^nc(s_0) for t = 0 and – when this has not triggered the regime shift – to stay at s_1 = s_0 + Nδ^nc(s_0), consuming s_1/N for all t ≥ 1.

Proof. The proof is given in Appendix A.4.

Let φ denote the payoff for agent i when she takes exactly one step of size δ^i and then remains standing and the strategy of the other agents, ∆^{−i} = {δ^{−i}, 0, 0, 0, ...}, is also to take only one step (of total size δ^{−i}):
\[ \phi(\delta^i; \delta^{-i}, s) = u\!\left(\frac{s}{N} + \delta^i\right) + \beta L_s(\delta^i + \delta^{-i})\, \frac{u\!\big(\tfrac{s+\delta^i+\delta^{-i}}{N}\big)}{1-\beta} \tag{10} \]
The corresponding first-order-condition for an interior maximum is:
\[ \phi'(\delta^i; \delta^{-i}, s) = u'\!\left(\frac{s}{N}+\delta^i\right) + \beta\left[ L_s'(\delta^i+\delta^{-i})\, \frac{u\!\big(\tfrac{s+\delta^i+\delta^{-i}}{N}\big)}{1-\beta} + L_s(\delta^i+\delta^{-i})\, \frac{1}{N}\, \frac{u'\!\big(\tfrac{s+\delta^i+\delta^{-i}}{N}\big)}{1-\beta} \right] = 0 \tag{11} \]

Denote the interior solution to the first-order-condition (11) (if it exists) by g(δ^{−i}, s). Three forces determine g: First, there is the marginal gain from the increase in current utility. For a given s, this private gain is larger the more agents there are as u″ ≤ 0. Second, there is the marginal decrease in the probability of surviving, which is evaluated at the updated safe consumption value. As agent i obtains only 1/N-th of the updated safe consumption value, these costs weigh less the more agents there are. In this sense, the costs of triggering the regime shift are public. Third, conditional on survival, there is the marginal utility gain from an expanded safe consumption set.
As this gain is again public and benefits all agents equally, it is devalued by the factor 1/N. Taken together, these three effects mean that the non-cooperative expansion of the set of safe consumption possibilities will be inefficiently large, despite the fact that it is “cautious” in the sense that it does not trigger the regime shift with probability 1.

Clearly, for a given s and δ^{−i} there need not be an interior solution. When the gain from expanding the set of safe consumption values is small, but the threat of triggering the regime shift is large, it may be individually rational to choose δ^i = 0. Conversely, when the gain from expanding the set of safe consumption values is large and/or it is unlikely that there is a regime shift, it may be individually rational to choose δ^i = R − s − δ^{−i}.

For a symmetric step size δ^{−i} = (N−1)δ^i, we can write equation (11) as follows:
\[ \phi'(\delta^{nc}; s) = u'\!\left(\frac{s}{N}+\delta^{nc}\right) + \beta\left[ L_s'(N\delta^{nc})\, \frac{u\!\big(\tfrac{s+N\delta^{nc}}{N}\big)}{1-\beta} + \frac{1}{N}\, L_s(N\delta^{nc})\, \frac{u'\!\big(\tfrac{s+N\delta^{nc}}{N}\big)}{1-\beta} \right] = 0 \tag{12} \]

Let g^nc(s) be the individual symmetric interior non-cooperative expansion. It is implicitly defined by φ′(δ^nc; s) = 0. Noting the similarity of (12) to (7) when replacing δ* with Nδ^nc, it is possible to show – parallel to Proposition 3 – that g^nc(s) is decreasing in s. We can therefore define $\bar{s}^{nc}$, the smallest member of the set S^nc, by g^nc($\bar{s}^{nc}$) = 0. In other words, for s ≥ $\bar{s}^{nc}$, the threat of triggering a disastrous regime shift is sufficiently large so that the agents find it in their own best interest to stay at s when all other agents do so, too. Conversely, we can define the value $\underline{s}^{nc}$ by the other corner solution g^nc($\underline{s}^{nc}$) = (R − $\underline{s}^{nc}$)/N. In other words, for s ≤ $\underline{s}^{nc}$, the threat of triggering a regime shift is so small compared to the gains from increasing one's own consumption that it is individually rational to use the resource up to its maximal capacity R.

To sum up, in the non-cooperative game when the location of T is unknown, there is a “cautious equilibrium” that is described by the following set of Markov-strategies:
\[ \delta^{nc}(s) = \begin{cases} \frac{R-s}{N} & \text{if } s \le \underline{s}^{nc} \quad (13a) \\ g^{nc}(s) & \text{if } s \in (\underline{s}^{nc}, \bar{s}^{nc}) \quad (13b) \\ 0 & \text{if } s \ge \bar{s}^{nc} \quad (13c) \end{cases} \]

The key intuition for the existence of this “cautious equilibrium” is that 1) for high values of s, staying at s is individually rational when all other agents do so, too, and 2) that when s ∉ S^nc, no agent has an incentive to deviate from a one-step experimentation that expands the set of safe consumption values into the region in which staying is optimal. Of course, there will always also exist an “aggressive equilibrium” in which the resource is depleted immediately, simply because the best-reply for player i when all other players plan to expand the consumption set by (R−s)/N is to choose (R−s)/N as well. Note that, for a given s, both the “cautious” and the “aggressive equilibrium” are unique.⁵

Footnote 5: Uniqueness of the latter type of equilibrium simply follows from the assumption that in case of incompatible demands, the resource is shared equally among the players. Uniqueness of the symmetric “cautious equilibrium” (should it entail δ^nc(s) < (R−s)/N) can be established by contradiction. Suppose all other players j ≠ i choose to expand the consumption set to a level at which – should the threshold not have been crossed – no player would have an incentive to go further. Player i's best-reply cannot be to choose δ^i = 0 in this situation as the gains from making a small positive step (which are private) exceed the (public) costs of advancing a little further. Hence, the only equilibrium at which the players expand the consumption set once is the symmetric one.
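The symmetric condition (12) is straightforward to solve numerically. The sketch below does so under the functional forms of section 3.5 (square-root utility, uniform prior, A = R = 1, β = 0.8, N = 10); the root-finding approach and the function names are mine, not the paper's. It also contrasts the aggregate non-cooperative step with the planner's step, illustrating the inefficiently large but cautious expansion.

```python
import numpy as np
from scipy.optimize import brentq

beta, A, R, N = 0.8, 1.0, 1.0, 10              # illustrative parameters
u, u_prime = np.sqrt, lambda c: 0.5 / np.sqrt(c)

def foc(d, s):
    """phi'(d; s) of eq. (12) when every agent steps by d (uniform prior on [0, A])."""
    total = N * d
    L, dL = (A - s - total) / (A - s), -1.0 / (A - s)   # L_s(N d) and L_s'(N d)
    cont = u((s + total) / N) / (1.0 - beta)            # i's continuation value if safe
    return u_prime(s / N + d) + beta * (dL * cont + L * u_prime((s + total) / N) / (N * (1.0 - beta)))

def g_nc(s):
    """Interior symmetric per-agent expansion; compare eq. (13b)."""
    return brentq(lambda d: foc(d, s), 1e-9, (R - s) / N - 1e-9)

for s in (0.3, 0.5):
    aggregate = N * g_nc(s)
    planner = (A - (1 + 2 * beta) * s) / (3 * beta)     # closed-form delta*(s) of section 3.5
    closed = (((1 - beta) * N + beta) * A - ((1 - beta) * N + 3 * beta) * s) / (3 * beta)
    print(f"s = {s}: N*g_nc = {aggregate:.3f} (closed form {closed:.3f}), planner step = {planner:.3f}")
```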
Figure 2 illustrates the aggregate expansion of the set of safe consumption possibilities in the cautious equilibrium and contrasts it with the social optimum. The game has the structure of a coordination problem where the immediate depletion of the resource may become a self-fulfilling prophecy. If any agent i is convinced that the other agents plan to deplete the resource, his best reply is to claim the maximum amount of the resource. Indeed, for s_0 ≤ $\underline{s}^{nc}$, the immediate depletion of the resource is the only Nash Equilibrium, despite the fact that there is a range of initial values (s_0 ∈ [$s_*$, $\underline{s}^{nc}$]) for which it is socially optimal to conserve the resource indefinitely (at least with positive probability). For s_0 > $\underline{s}^{nc}$ the “cautious” and the “aggressive equilibrium” coexist. For s_0 > $\bar{s}^{nc}$ the agents do not expand the set of consumption values and share s_0 in the cautious equilibrium. Indeed, the cautious equilibrium coincides with the first-best for some range: The threat of the regime shift allows the players to coordinate on the social optimum. For s ∈ [$\underline{s}^{nc}$, $\bar{s}^{nc}$], the strategic interactions imply that experimentation is inefficiently large. However, should it turn out that s_1 = s_0 + Nδ^nc is safe, this consumption pattern is ex-post socially optimal.

Figure 2: Illustration of policy function δ(s). The blue circles represent the socially optimal extension δ of the safe consumption set s (on the y-axis) as a function of the safe consumption set on the x-axis (where obviously s ≤ R and δ ∈ [0, R−s]). For values of s below $s_*$, it is optimal to consume the entire resource (choose δ(s) = R−s). For values of s above $s^*$, it is optimal to remain standing (choose δ(s) = 0). The red dashed line plots the cautious non-cooperative equilibrium, showing how $s_*$ ≤ $\underline{s}^{nc}$ and $s^*$ ≤ $\bar{s}^{nc}$ (in some cases we may even have $\underline{s}^{nc}$ < $s^*$). It illustrates how even the “cautious” experimentation under non-cooperation implies excessive risk-taking. The figure also shows that the first-best and the non-cooperative outcome may coincide for very low and very high values of s. [Axes: aggregate expansion (step) size δ(s) against the initial safe value s, with $s_*$, $\underline{s}^{nc}$, $s^*$, $\bar{s}^{nc}$, and R marked.]

Faced with this coordination problem, the question arises which of the two equilibria we can expect to be selected. Clearly, the “cautious equilibrium” Pareto-dominates the “aggressive equilibrium”.⁶ Without strategic uncertainty, the cautious equilibrium would thus be the outcome of the game. But what happens when the agents are uncertain about the other agents' behavior? As the disastrous regime shift is irreversible, there is no room for dynamic processes that lead agents to select the Pareto-dominant equilibrium (Kim, 1996). Therefore, I turn to the static concept of risk-dominance (Harsanyi and Selten, 1988).

Footnote 6: This follows immediately from the fact that, by definition, δ^nc(s) is the interior solution to the symmetric maximization problem (9) (with δ^{−i} = (N−1)δ^nc) where the policy δ(s) = R − s was an admissible candidate.
Since the game is symmetric, applying the criterion of risk-dominance for equilibrium selection has the intuitive interpretation that the cautious equilibrium is selected if the expected payoff from playing cautiously exceeds the expected payoff from playing aggressively when agent i assigns probability p to the other agents playing aggressively. Obviously, whether the cautious or the aggressive equilibrium is risk-dominant will depend both on this probability p and on the safe value s. We can, for a given safe value s, solve for the probability p* at which agent i is just indifferent between playing cautiously or aggressively:
\[ p^* \cdot \pi_{[\text{all aggressive}]} + (1-p^*) \cdot \pi_{[\text{only } i \text{ aggressive}]} = p^* \cdot \pi_{[\text{only } i \text{ cautious}]} + (1-p^*) \cdot \pi_{[\text{all cautious}]} \]
\[ \Leftrightarrow \quad p^* = \frac{\pi_{[\text{all cautious}]} - \pi_{[\text{only } i \text{ aggressive}]}}{\big(\pi_{[\text{all cautious}]} - \pi_{[\text{only } i \text{ aggressive}]}\big) - \big(\pi_{[\text{only } i \text{ cautious}]} - \pi_{[\text{all aggressive}]}\big)} \]

In the above calculation, π_[all aggressive] refers to the payoff of playing aggressively when all other agents play aggressively, π_[only i aggressive] refers to the payoff of playing aggressively when all other agents play cautiously, etc. In order to explicitly solve for the value of p*, we need to put more structure on the problem. For the specific example developed in section 3.5 below, we can calculate and plot p* as a function of s (see Figure 3). The grey area below the line drawn by p* shows the set of values for which agent i prefers to play cautiously. Figure 3 illustrates how robust the cautious equilibrium is: Even when the agents think that there is a 50% chance that all other agents play the aggressive strategy, it still pays to play cautiously for a wide range of initial values s. (Clearly, p* is not defined for s < $\underline{s}^{nc}$ when the cautious and the aggressive equilibrium coincide.)

Figure 3: The black line plots p* as a function of s for u(c) = √c, f = 1/A and β = 0.8, A = R = 1 and N = 10. It shows, for a given value of s, the maximum value that agent i can assign to the probability that all other agents play aggressively and still prefer to play cautiously. [The grey region below the line is labeled “Region where playing cautious is risk-dominant”; the axes are the probability that opponents play aggressively and the initial safe value s, with $\underline{s}^{nc}$, $\bar{s}^{nc}$, and R marked.]

3.4 Comparative statics

In this section, I analyze how the consumption pattern in the cautious equilibrium changes with changes in the parameters. Recall that g^nc is defined as the interior solution φ′ = 0 where φ′ is given by (12). The effect of an increase in a parameter a in the interior range s ∈ ($\underline{s}^{nc}$, $\bar{s}^{nc}$) is given by dg^nc/da = −(∂φ′/∂a)/(∂φ′/∂g^nc). Thus, to show that aggregate consumption is higher the higher the parameter a, it is sufficient to show that ∂φ′/∂a > 0 (because the second-order condition implies that ∂φ′/∂g^nc < 0). Because g^nc is monotonically decreasing in s, it is equivalently sufficient to show that, for a given value of R, neither boundary $\underline{s}^{nc}$ nor $\bar{s}^{nc}$ decreases and at least one boundary increases with a. The reason is that for a given value of R an upward shift of $\underline{s}^{nc}$ or $\bar{s}^{nc}$ (and no downward shift of the respective other boundary) necessarily implies that all new values of g^nc must lie above the old values of g^nc.

Proposition 5 summarizes the comparative statics results with respect to β, N, R and the agent's prior belief about the location of the threshold.
Proposition 5.

(a) The boundaries $\underline{s}^{nc}$ and $\bar{s}^{nc}$, and aggregate consumption in the cautious equilibrium, Ng^nc, decrease with β.

(b) An increase in N leads to higher resource use in the cautious equilibrium when u′(R/N) > (N/(N+1)) u′(R/(N+1)).

(c) The more likely the regime shift (in terms of first-order stochastic dominance), the larger the range where a separate cautious Nash-equilibrium exists and the lower aggregate consumption.

(d) An increase of R to R̃ for an unchanged risk of the regime shift (i.e. R < R̃ ≤ A) decreases $\underline{s}^{nc}$ and leads to a larger range where a separate cautious equilibrium exists.

Proof. The proofs are given in Appendix A.5.

The first comparative static result conforms with basic intuition: The more patient the agents are, the more they value the annuity of staying at s, and the more cautious they are. The second result shows that an increase in the number of agents may exacerbate the “tragedy of the commons”, but not necessarily in all cases. There are two opposing effects: On the one hand, the addition of one more agent increases aggregate resource use if all agents were to choose the same consumption level as before. On the other hand, the addition of another agent leads all other agents to decrease their individual consumption as they partly take the increase in N into account. Proposition 5 (b) gives a sufficient condition for when the former effect dominates. The third comparative static result also conforms with basic intuition: The more dangerous any step is, the more cautiously will the agents experiment. The last comparative statics result finally highlights the difference to the situation when the location of the threshold is known with certainty. In that situation, an increase in R leads to an increase in Tc, which shrinks the range in which the socially optimal outcome is a Nash-Equilibrium (Proposition 1). Here, immediate depletion is not necessarily the dominant strategy. An increase in R essentially means that the scope for an interior solution is widened so that the range for which immediate depletion is the only Nash-Equilibrium is smaller.

3.5 Specific example

For a given utility function and a given probability distribution of the threshold's location it is possible to explicitly solve for δ*(s), δ^nc(s) and calculate the value function V(s). Specifically, I assume that u(c) = √c and that the agents believe that every value in [0, A] is equally likely to be the threshold, i.e. f = 1/A, and accordingly L_s(δ) = (A − s − δ)/(A − s).

Let us first consider the case of the social planner: The optimal expansion of the set of safe consumption possibilities is given by:
\[ \delta^* = \frac{A - (1+2\beta)s}{3\beta} \]
There will only be an interior solution to (7) when s ∈ [$s_*$, $s^*$]. We have:⁷
\[ s_* = \max\left\{0,\; \frac{A - 3\beta R}{1-\beta}\right\}, \qquad s^* = \min\left\{\frac{A}{1+2\beta},\; R\right\} \]

Footnote 7: At $s_*$ it is optimal to consume the entire resource, so that $s_*$ is found by solving R − $s_*$ = (A − (1+2β)$s_*$)/(3β). At $s^*$ it is optimal to remain standing, so that $s^*$ is found by solving 0 = (A − (1+2β)$s^*$)/(3β).

Let us now consider the cautious equilibrium of the non-cooperative game. Solving (12) for the interior equilibrium g^nc, we have that total non-cooperative expansion is given by:
\[ N\delta^{nc}(s) = \begin{cases} R - s & \text{if } s \le \underline{s}^{nc} \\[4pt] N g^{nc} = \dfrac{\big((1-\beta)N+\beta\big)A - \big((1-\beta)N+3\beta\big)s}{3\beta} & \text{if } s \in (\underline{s}^{nc}, \bar{s}^{nc}) \\[4pt] 0 & \text{if } s \ge \bar{s}^{nc} \end{cases} \]
where
\[ \underline{s}^{nc} = \max\left\{0,\; \frac{\big((1-\beta)N+\beta\big)A - 3\beta R}{(1-\beta)N}\right\}, \qquad \bar{s}^{nc} = \min\left\{\frac{\big((1-\beta)N+\beta\big)A}{(1-\beta)N + 3\beta},\; R\right\} \]
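For readers who want to experiment with these closed forms, the sketch below collects them as small functions and checks two basic consistency properties: N = 1 recovers the planner's solution, and the boundary values match those visible in Figures 3 and 4 for A = R = 1, β = 0.8, N = 10. The parameter choices and the function names are mine; the formulas are the ones displayed above.

```python
def planner_bounds(A, R, beta):
    s_lo = max(0.0, (A - 3 * beta * R) / (1 - beta))        # s_* (full depletion below this)
    s_hi = min(A / (1 + 2 * beta), R)                        # s^* (no expansion above this)
    return s_lo, s_hi

def nc_bounds(A, R, beta, N):
    s_lo = max(0.0, (((1 - beta) * N + beta) * A - 3 * beta * R) / ((1 - beta) * N))
    s_hi = min(((1 - beta) * N + beta) * A / ((1 - beta) * N + 3 * beta), R)
    return s_lo, s_hi

def delta_star(s, A, R, beta):
    """Planner's step, eq. (8a)-(8c) with the closed-form interior solution."""
    s_lo, s_hi = planner_bounds(A, R, beta)
    if s <= s_lo: return R - s
    if s >= s_hi: return 0.0
    return (A - (1 + 2 * beta) * s) / (3 * beta)

def total_nc(s, A, R, beta, N):
    """Aggregate non-cooperative expansion N*delta_nc(s)."""
    s_lo, s_hi = nc_bounds(A, R, beta, N)
    if s <= s_lo: return R - s
    if s >= s_hi: return 0.0
    return (((1 - beta) * N + beta) * A - ((1 - beta) * N + 3 * beta) * s) / (3 * beta)

A = R = 1.0; beta = 0.8
print(planner_bounds(A, R, beta))        # (0.0, ~0.385): the planner's bounds
print(nc_bounds(A, R, beta, N=10))       # (~0.2, ~0.636): bounds of the cautious equilibrium
print(total_nc(0.3, A, R, beta, N=1), delta_star(0.3, A, R, beta))  # N = 1 recovers the planner
```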
The closed form solutions make it easy to confirm the comparative statics results. First, when R increases to R̃ but A remains unchanged (so that R < A and R̃ ≤ A), it only has an effect on $\underline{s}^{nc}$ (provided that $\bar{s}^{nc} < R$). Provided that $\underline{s}^{nc} > 0$, it is plain to see that $\partial \underline{s}^{nc}/\partial R = -\frac{3\beta}{(1-\beta)N} < 0$, so that the range where a separate cautious equilibrium exists is larger. Second, for these specific functional forms, a monotonic increase in the risk of a regime shift is equivalent to a decrease in A (say, from A to Â). Clearly, when R is unchanged (so that R < A and R ≤ Â when Â < A), a lower A means a lower $\underline{s}^{nc}$, a lower $\bar{s}^{nc}$, and a decreased expansion of the set of safe consumption values when $s \in (\underline{s}^{nc}, \bar{s}^{nc})$, because $(1-\beta)N + \beta > 0$ for N ≥ 1. The same holds when A = R and R̂ = Â with Â < A, because R only appears in the condition for $\underline{s}^{nc}$, and $\underline{s}^{nc} > 0 \Leftrightarrow (1-\beta)N - 2\beta > 0$. Third, considering the comparative static results with respect to N, it is straightforward to check that $\partial \bar{s}^{nc}/\partial N > 0$ (its numerator is $2\beta(1-\beta)A > 0$) and that $\partial (N g^{nc})/\partial N = g^{nc} + N\,\partial g^{nc}/\partial N = \frac{1-\beta}{3\beta}(A - s) > 0$, as A ≥ s by definition. For the functional forms of this example, the more agents there are, the smaller the region where they can coordinate on not expanding the set of safe consumption values and the larger the total resource use in the case of experimentation. Finally, an increase in the discount factor implies that each agent is more patient and values the preservation of the resource for future consumption more. Thus, aggregate experimentation declines and the range of the cautious equilibrium is larger, which can be readily confirmed by noting that the numerator of $\partial (N g^{nc})/\partial \beta$ is $3N(s - A)$, which is negative because s ≤ A.

Figure 4 plots the value function for a uniform prior (with A = R = 1) and a discount factor of β = 0.8, illustrating how it changes as the number of agents increases. The more agents there are, the greater the distance of the non-cooperative value function (assuming that agents coordinate on the cautious equilibrium, plotted by the red dashed line) from the socially optimal value function (plotted by the blue open circles). In particular, when N = 10, one sees the region (roughly from s = 0 to s = 0.2) where there is no separate cautious equilibrium, and that the value $\bar{s}^{nc}$ at which it becomes individually rational to remain standing is relatively large (roughly 0.62). All in all, however, this example shows that the threat of an irreversible regime shift is very effective when the common pool externality applies only to the risk of crossing the threshold (at least for this specific utility function and these parameter values; note that β = 0.8 implies an unreasonably high discount rate, chosen here to magnify the effect of non-cooperation for a small number of agents). In particular, for $s \ge \bar{s}^{nc}$, the cautious equilibrium coincides with the social optimum.

[Figure 4 about here. Two panels, N = 5 and N = 10; horizontal axis: safe value s from 0 to 1, with $\underline{s}^{nc}$, $\bar{s}^{nc}$, and $\bar{s}^*$ marked; vertical axis: value; each panel shows the social optimum and the non-cooperative (cautious-equilibrium) value.]

Figure 4: Illustration of the value derived from the socially optimal and non-cooperative use of the resource with u(c) = √c, β = 0.8, and A = R = 1, for N = 5 and N = 10.
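Because play stays put after at most one (joint) experiment, the associated value functions can be evaluated directly from these step sizes rather than by solving a fixed point. The sketch below is again my own illustration: it reuses planner_step and aggregate_nc_step from the previous listing and measures both paths with the same aggregate-utility yardstick, which may differ from the normalization actually used in Figure 4.

```python
import math

def survival_prob(s, delta, A=1.0):
    """L_s(delta) = (A - s - delta)/(A - s): probability that expanding the safe
    level s by delta does not cross the threshold (uniform prior on [0, A])."""
    return 1.0 if s >= A else (A - s - delta) / (A - s)

def value_of_path(s, step, A=1.0, beta=0.8):
    """Expected discounted utility of aggregate consumption when the safe level s
    is expanded once by `step` and play stays put afterwards (the once-and-for-all
    structure of Propositions 2 and 4); u(c) = sqrt(c)."""
    u = math.sqrt(s + step)
    return u + beta * survival_prob(s, step, A) * u / (1 - beta)

# Planner vs cautious equilibrium for N = 5 and N = 10 (cf. the pattern in Figure 4):
for s in (0.1, 0.3, 0.5, 0.7):
    print(s, round(value_of_path(s, planner_step(s)), 3),
          round(value_of_path(s, aggregate_nc_step(s, 5)), 3),
          round(value_of_path(s, aggregate_nc_step(s, 10)), 3))
```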
4 Extensions

The paper's main results do not rely on specific functional forms for utility or the probability distribution of the threshold's location. Tractability is achieved by considering extremely simple resource dynamics, namely that the resource remains intact and replenishes fully in the next period as long as resource use in the current period has not exceeded T. In other words, there is no common pool externality relating to the resource dynamics itself. In this part of the paper, I explore to what extent the main results are robust to more general resource dynamics (sections 4.1 and 4.2). Moreover, I show that the result of once-and-for-all experimentation does not rely on the assumption that the regime shift occurs immediately if the threshold is crossed (section 4.3). However, once-and-for-all experimentation may no longer be optimal when the regime shift is not disastrous in the sense that the post-event value function depends on the extent to which the threshold has been surpassed: learning is gradual if, and only if, the post-event value decreases sufficiently strongly with the extent to which the threshold has been surpassed (section 4.4).

4.1 Growing R and constraints on the choice set

In this subsection, I relax the assumption that R is constant. Instead, I consider the case that R grows according to some function G(R). However, I maintain the assumption that f and A are fixed, so that R_t converges to some R_∞ ≤ A with time. This situation may mechanistically lead to repeated experimentation as long as the upper bound of the feasible choice set in the respective period is binding. As a consequence, the overall consumption plan will be more cautious.

Formally, the resource dynamics can be expressed as:
\[
R_{t+1} = \begin{cases} R_t + G(R_t) & \text{if } \sum_i c^i_t \le T \\ 0 & \text{if } \sum_i c^i_t > T \text{ or } R_t = 0 \end{cases}
\tag{15}
\]
where G(R) > 0 for R ∈ (0, R_∞), G(R_∞) = 0, and R_0 < R_∞ ≤ A.

Let us first consider the social optimum. As the set of safe values at which no more experimentation is optimal, S, depends only on the belief about the location of the threshold, F, and not on R, it will be optimal to expand the set of safe consumption values as long as R_t ∉ S. Specifically, in this initial phase it will be optimal to choose δ*_t = R_t − s_{t−1}. Once R_t ∈ S for some t (say at t = τ), it will be optimal to choose one last step δ*(s_τ) (which may be of size zero) and to remain at s_{τ+1} = s_τ + δ*(s_τ) for all remaining time.

Note that constraints on the choice set of the form δ ∈ [0, δ_max] with δ_max < R − s_t for some period t = 0, …, τ will lead to repeated experimentation for the same mechanistic reason: when the first-best unconstrained expansion of the set of safe consumption possibilities is δ*(s_0) but δ_max is such that it takes several steps to traverse this distance, then the safe value s will be updated sequentially (conditional on not causing the regime shift, of course). The optimal plan prescribes choosing δ_max for the periods t = 0, …, τ and then choosing δ*(s_τ). Because ∂δ*(s)/∂s < 0 (Proposition 3), it follows directly that this implies an overall more cautious plan (that is, $\sum_{t=0}^{\tau}\delta_{max} + \delta^*(s_\tau) < \delta^*(s_0)$).
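The mechanics of the initial catch-up phase – expand to the binding bound R_t each period, then take one last interior step and stay – can be simulated in a few lines. The sketch below is my own illustration only: it assumes a logistic growth function G(R) = gR(1 − R/A) and borrows the interior step and the stopping boundary A/(1 + 2β) from the uniform-prior example of section 3.5; the displayed path is conditional on the threshold never being crossed.

```python
def delta_star(s, A=1.0, beta=0.8):
    """One-shot interior step from the uniform-prior example, truncated at zero."""
    return max(0.0, (A - (1 + 2 * beta) * s) / (3 * beta))

def growing_R_path(s0=0.05, R0=0.1, A=1.0, beta=0.8, g=0.3, periods=30):
    """Planner's expansion path when the feasible range R_t grows over time."""
    s, R = s0, R0
    s_stop = A / (1 + 2 * beta)               # lower edge of the stopping set in the example
    rows = []
    for t in range(periods):
        if R < s_stop:
            step = R - s                      # catch-up phase: expand to the binding bound R_t
        else:
            step = min(delta_star(s, A, beta), R - s)   # one last (possibly capped) step
        s += step
        rows.append((t, round(R, 3), round(s, 3), round(step, 3)))
        if R >= s_stop and step == 0.0:       # inside the stopping set: stay put from now on
            break
        R += g * R * (1 - R / A)              # assumed logistic growth of the feasible range
    return rows

for row in growing_R_path():
    print(row)
```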
The exact same reasoning applies in the non-cooperative game. Given that the agents coordinate on the cautious equilibrium, the set $S^{nc}$ does not depend on R. Consequently, the equilibrium path prescribes choosing $\delta^{nc}_t = \frac{R_t - s_{t-1}}{N}$ for the periods t = 0, …, τ and then staying at s_{τ+1} for the remaining time (even though R_t, and with it also the incentive to unilaterally deviate and deplete the resource, may continue to grow). Note that this rests on the assumption that all agents rationally anticipate the evolution of R_t. Analyzing the effect of uncertainty about G(R) could be very interesting, but is left for future work. That said, even with perfect knowledge about G(R), strategic uncertainty will matter a lot in the real world. A real-world actor who knows that the incentive to grab is increasing through time, and who is uncertain whether the other actors will actually stick to the cooperative choice, will have strong incentives to pre-empt the other actors.

The discussion in this subsection highlights how the result of once-and-for-all experimentation is linked to the assumption of an unconstrained choice set δ ∈ [0, R − s]. The discussion further sheds light on the difference between this model and, for example, the climate change application of Lemoine and Traeger (2014): one reason for the gradual approach in their model is that their assumed capital dynamics implicitly translate into a constrained choice set (as is very reasonable in their setting, capital cannot be adjusted instantly and costlessly). As I will show in section 4.4, this is not the only reason for gradual experimentation.

4.2 Non-renewable resource dynamics

So far, I have assumed that the resource replenishes fully every period unless the threshold has been crossed. In other words, the externality related only to the risk of crossing the threshold. Here, I study the opposite case of a non-renewable resource in order to analyze the effect of a disastrous regime shift when the externality relates both to the risk of crossing the threshold and to the resource itself. Specifically, I consider the following model of extraction from a known stock of a non-renewable resource:
\[
\max_{c^i_t} \sum_{t=0}^{\infty} \beta^t u(c^i_t) \quad \text{subject to:} \quad
R_{t+1} = \begin{cases} R_t - \sum_i c^i_t & \text{if } \sum_i c^i_t \le T \\ 0 & \text{if } \sum_i c^i_t > T \end{cases}, \qquad R_0 \text{ given}
\tag{16}
\]
A simple interpretation of this model could be a mine from which several agents extract a valuable resource. If aggregate extraction is too high in a given period, the structure of the shafts may collapse, making the remainder of the resource inaccessible. In spite of this natural interpretation, two things are rather peculiar about this model setup. First, any player can extract any amount up to R_t. (The option to introduce a capacity constraint on current extraction – though realistic – would come at the cost of significant clutter without yielding any apparent benefit.) Second, the assumption that R_0 is known and that T is constant means that this is not a problem of eating a cake of unknown size; that problem has long since been dealt with in the literature (see e.g. Kemp, 1976; Hoel, 1978) and is not considered here.

I assume that the utility function is of such a form that, in a world without the threshold, there is a Pareto-dominant non-cooperative equilibrium in which positive extraction occurs in several periods. Due to discounting, it is clear that the extraction level will decline as time passes, both in the social optimum and in the non-cooperative equilibrium. Due to the stock externality, it is clear that the extraction rate in the non-cooperative equilibrium is inefficiently large (see e.g. Harstad and Liski, 2013).
To introduce some notation, let $\tilde{c}^{nc}(R_t)$ be the total non-cooperative extraction level (as a function of the resource stock R_t) in the absence of the regime shift risk.

The threat of a regime shift has the potential to limit non-cooperative extraction below $\tilde{c}^{nc}(R_t)$ and thereby improve welfare. Also in the case of a non-renewable resource, with dynamics given by (16), it is possible to show that experimentation continues to exhibit “once-and-for-all dynamics”. (The proof is omitted because it follows the same steps as the proof of Proposition 4: the argument that agent i's payoff is higher when expanding the set of safe values in one step rather than two rests on a comparison of the risk encountered by the two strategies and on the fact that, by definition, staying after the first expansion is sub-optimal for agent i; neither of these two comparisons uses the specific form of the continuation value, which is irrelevant when comparing the regime-shift risk and held constant in the second argument.) Thus, for a given value s_0 for which it is known from history that s_0 can be extracted safely in any one period, agents will either experiment once to learn whether they can safely extract $s^{nc}_1 = s_0 + N\delta^{nc}(s_0)$, or not at all. As the extraction path will decline, $s^{nc}_1$ will only be binding on the per-period extraction for an initial phase (say until t = τ, where τ is implicitly defined by $s^{nc}_1 = \tilde{c}^{nc}(R_\tau)$). After time period τ, the extraction path will follow the same non-cooperative path as in the absence of a regime shift. Consequently, the threat of a regime shift cannot induce coordination on the intertemporal first-best (except in the very special case that $s^{nc}_1 = s_0 = c^*(R_T)$, where $c^*(R_T)$ is the socially optimal extraction at the finite exhaustion period T). Nevertheless, the threat of a regime shift still allows the agents to coordinate on an extraction path that may Pareto-dominate the extraction path without the regime shift in expected terms: while the agents would obviously be better off without the regime shift risk when the initial experiment triggers the collapse, they strictly benefit from the constraint on per-period extraction when the regime shift is not triggered by the initial experiment.
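The binding-then-slack logic of the previous paragraph can be illustrated with a small simulation. Since c̃^nc(R) is not derived in closed form here, the sketch below simply assumes a linear Markov extraction rule c̃^nc(R) = αR (purely for illustration) and caps per-period total extraction at s^nc_1: the cap binds during an initial phase and becomes slack once αR_t falls below it, after which the two paths follow the same rule.

```python
def extraction_paths(R0=1.0, s1=0.25, alpha=0.4, periods=10):
    """Total extraction with and without the threshold-induced cap s1, assuming an
    (illustrative) unconstrained rule c_tilde(R) = alpha*R; the capped path is
    conditional on the initial experiment not triggering the regime shift."""
    R_free, R_capped, rows = R0, R0, []
    for t in range(periods):
        c_free = alpha * R_free
        c_capped = min(s1, alpha * R_capped)   # the cap binds while alpha*R_t > s1
        rows.append((t, round(c_free, 3), round(c_capped, 3)))
        R_free -= c_free
        R_capped -= c_capped
    return rows

for row in extraction_paths():
    print(row)
```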
4.3 Delay in the occurrence of the regime shift

In this subsection, I depart from the assumption that all agents observe immediately whether last period's expansion has triggered the regime shift or not, and consider a situation where the agents observe only with some probability whether they have crossed the threshold. In fact, it is not unreasonable to model the true process of the resource as hidden, manifesting itself only after some delay (see Gerlagh and Liski (2014) for a recent paper that focuses on this effect in the context of optimal climate policies). Hence, as time passes the agents will update their beliefs about whether the threshold is located on the interval [s_t, s_t + δ_t]. How does this learning affect the optimal and non-cooperative strategies? This becomes an extremely difficult question as – due to the delay – the problem is no longer Markovian.

Nevertheless, it is possible to show that also when crossing the threshold at time t triggers the regime shift at some (potentially uncertain) time τ > t, it is still socially and individually rational to experiment – if at all – in the first period only. The key is to realize that yesterday's decisions are exogenous today. This means that the threat of a regime shift can be modeled as an exogenous hazard rate: let h_t be the probability that the regime shift, triggered by events earlier than and including time t, occurs at time t (conditional on not having occurred prior to t, of course). The agent's problem in this situation can be formulated as:
\[
V^i(s, \Delta^{-i}) = \max_{\delta^i \in [0, R-s]} \; u(s + \delta^i) + (1 - h_t)\,\beta\, L_s(\delta^i + \delta^{-i})\, V^i(s + \delta, \Delta^{-i})
\tag{17}
\]
The structure of (17) is identical to the one in equation (9); only the effective discount factor decreases by the factor (1 − h_t). Thus, the learning dynamics are unchanged.

In other words, the “once-and-for-all” dynamics of experimentation are robust to a delay in the occurrence of the regime shift. This does of course not imply that the optimal decision under the two different models will be the same. It will almost surely differ, as delaying the consequences of crossing the threshold decreases the costs of experimentation. Yet, as the agents only learn that they have crossed the threshold when the disastrous regime shift actually occurs, they cannot capitalize on this delay by trying to expand the set of safe consumption possibilities several times.
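Since (17) merely scales the discount factor by (1 − h_t), one rough way to see how the delay changes the chosen step is to feed a lower effective discount factor into the closed-form planner step from section 3.5. This is only a stand-in (equation (17) describes the individual agent's problem and h_t need not be constant); the hazard h below is an arbitrary illustrative value.

```python
# Reuses planner_step from the sketch in section 3.5.
beta, h, s = 0.8, 0.2, 0.3
print(planner_step(s, beta=beta))            # step under the baseline discount factor
print(planner_step(s, beta=(1 - h) * beta))  # larger step under the lower effective discount factor
```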
4.4 Non-disastrous regime shift

A central feature of the baseline model was that the regime shift is disastrous: crossing the threshold breaks any link between the state before and after the regime shift. The pre-event choices did not matter for the post-event value. This structure allowed me to simply normalize the continuation value in case of a regime shift to zero. For some applications, this independence of the post-event value is not a very fitting description. This is especially true when the system under consideration is large, such as for global climate change, and the threshold effect on the damage is not truly catastrophic, but just one of many parts in the equation. In such a setting, one would need to take into account how the continuation value depends on how far the set of consumption values has been expanded before the regime shift.

Denote the function that captures how the post-event continuation value depends on the pre-event expansion of the set of consumption values by W(s_{t+1}) (where s_{t+1} = s_t + δ_t). How W would depend on the pre-event values of the state s_t and the choice variable δ_t is not generally clear. The capital stock in a climate change application, for example, likely has an ambiguous effect (Ploeg and Zeeuw, 2015a). On the one hand, it buffers against the adverse effects of the regime shift and hence smoothes consumption over regimes. On the other hand, a higher capital stock implies more intense use of fossil fuels, which aggravates climate damages. Similarly, Ren and Polasky (2014) discuss under which conditions regime shift risk implies more cautionary or more aggressive management in a renewable resource application. In particular, they highlight the role of an “investment effect” that induces incentives for more aggressive management: harvesting less (investing in the renewable resource stock) pays off badly should the regime shift occur. Ren and Polasky go on to show how these incentives are balanced (and potentially overturned) by the “risk reduction effect” and a “consumption smoothing effect” (which lead to more precaution in their application).

Thus, whether we overall have W′(s_{t+1}) > 0 or W′(s_{t+1}) < 0 depends on the specific model at hand. Regardless of this, however, when pre-event choices matter for the post-event value, it is no longer immaterial by how much one has stepped over the threshold when it is crossed. I now argue 1) that even when the regime shift is not disastrous, there will still be a set S or $S^{nc}$ at which it is socially or individually rational not to experiment further but to stay at the consumption value that is currently known to be safe, and 2) that a necessary condition for a gradual approach to S is that the post-event value declines sufficiently strongly in s_{t+1}.

To put some more structure on the argument, consider the Bellman equation of the sole owner before the regime shift has occurred:
\[
V(s) = \max_{\delta \in [0, R-s]} \; u(s+\delta) + \beta\Bigl[ L_s(\delta)\, V(s+\delta) + \bigl(1 - L_s(\delta)\bigr)\, W(s+\delta) \Bigr]
\tag{18}
\]
In other words, the social planner chooses the expansion of the set of consumption values that is currently known to be safe so as to maximize his current utility plus the discounted continuation value. With probability L_s(δ), the step of size δ turns out to be safe and the continuation value is given by V(s + δ). With probability 1 − L_s(δ), the threshold is located on the interval between s and s + δ, and the continuation value after the regime shift depends on how far s has been expanded.

Although these considerations certainly affect the choice of δ, it will still be the case that there is a non-empty set S for which the optimal choice is δ* = 0. The reason is that, as long as the regime shift is a negative event (V(s) > W(s)), the gains from further expansion of the set of safe consumption values are bounded above, while the risk of triggering the regime shift grows exceedingly large as s → R. To obtain more insight into how the optimal expansion choice is changed by the existence of an endogenous continuation value, consider the derivative of the RHS of (18) (denoting this function again by $\varphi'$ should not cause confusion):
\[
\varphi' = u' + \beta\Bigl[ L_s'(\delta)\,(V - W) + L_s(\delta)\, V' + \bigl(1 - L_s(\delta)\bigr)\, W' \Bigr] = 0
\tag{19}
\]
The size of δ is determined by three factors. First, there is the gain in marginal utility u′. Second, there is the term involving L′_s(δ), which captures the increased risk of the regime shift; this term is negative. Third, the optimal choice of δ is affected by the marginal continuation value. Previously, only the event of not crossing the threshold mattered here; now, the event of crossing the threshold also has to be evaluated explicitly.

Analyzing how an endogenous post-event value affects δ*, we first note that the negative term L′_s(δ)(V − W) decreases in absolute value. When the second-order condition for an interior solution is satisfied, $\varphi'$ is a decreasing function in the neighborhood of δ*, and a decrease in the absolute value of the term L′_s(δ)(V − W) shifts the function upwards. As is intuitive, a non-zero continuation value in case of a regime shift pushes for larger current consumption.
However, when W′ < 0, the term (1 − L_s(δ))·W′ is negative, which, ceteris paribus, leads to a lower value of δ*. If, and only if, the post-event value declines sufficiently strongly in s_{t+1} will the optimal δ* be so small that for s_t ∉ S we have s_t + δ*_t = s_{t+1} ∉ S. The approach to S will then be gradual, implying periods of repeated experimentation.

To illustrate more concretely how these effects play out, I assume specific functional forms. As in section 3.5, I set u(c) = √c and assume a uniform distribution for the location of the threshold, so that $L_s(\delta) = \frac{A-s-\delta}{A-s}$, with A = R. For the post-event continuation value, I now assume that the resource loses all its productivity once the regime shift occurs. In other words, W is the highest value that can be obtained by spreading the consumption of the non-renewable resource r_t = R − s_t − δ_t over the remaining time horizon. We have
\[
W^*(s_t + \delta_t) = \sqrt{\frac{R - (s_t + \delta_t)}{1 - \beta^2}}.
\]
Clearly, W′(s_t + δ_t) < 0. I assume the same structure to contrast the social optimum with the non-cooperative game. For a square-root utility function and without exogenous constraints on extraction, the number of players cannot be too big in order to have an interior equilibrium with positive extraction over the entire time path (instead of immediate, pre-emptive extraction in the first period). Here I choose N = 2 and denote the corresponding non-cooperative post-event value, in which the remaining stock R − (s_t + δ^i_t + δ^{−i}_t) is extracted by the two agents, by $W^{nc}(s_t + \delta^i_t + \delta^{-i}_t)$; it is again decreasing in the total pre-event expansion. The analytic closed-form solutions are not particularly instructive, so I present the results graphically.
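One way to produce the social-optimum curves of a figure like Figure 5 is plain value iteration on a grid, once with the post-event value W* defined above and once with W ≡ 0. The sketch below is my own illustration (the grid size, the iteration count, and the restriction of steps to grid points are arbitrary choices); the cautious-equilibrium panels would require the analogous best-reply computation, which is omitted here.

```python
import math

def solve_planner(beta, endogenous_W=True, A=1.0, R=1.0, n=121, iters=400):
    """Value iteration for V(s) = max_d sqrt(s+d) + beta*[L_s(d)*V(s+d) + (1-L_s(d))*W(s+d)]
    with a uniform prior, where W(x) = sqrt((R-x)/(1-beta**2)) if endogenous_W else 0.
    Steps d are restricted to an evenly spaced grid of safe values."""
    grid = [i * R / (n - 1) for i in range(n)]
    W = [math.sqrt(max(R - x, 0.0) / (1 - beta ** 2)) if endogenous_W else 0.0 for x in grid]
    V = [math.sqrt(x) / (1 - beta) for x in grid]   # value of staying put forever (a feasible policy)
    policy = [0.0] * n
    for _ in range(iters):
        V_new = []
        for i, s in enumerate(grid):
            best_val, best_step = -1.0, 0.0
            for j in range(i, n):                   # candidate next safe level x = s + d
                x = grid[j]
                L = 1.0 if s >= A else (A - x) / (A - s)
                val = math.sqrt(x) + beta * (L * V[j] + (1.0 - L) * W[j])
                if val > best_val:
                    best_val, best_step = val, x - s
            V_new.append(best_val)
            policy[i] = best_step
        V = V_new
    return grid, policy

# Optimal step delta(s) at s = 0.2, 0.4, 0.6, 0.8 for beta = 0.95:
for flag in (True, False):
    grid, pol = solve_planner(beta=0.95, endogenous_W=flag)
    print("W endogenous" if flag else "W = 0", [round(pol[i], 3) for i in (24, 48, 72, 96)])
```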
[Figure 5 about here. Four panels: the social optimum (left column) and the cautious equilibrium with N = 2 (right column), for a high and a low discount factor; each panel plots the aggregate expansion (step) size δ(s) against the safe value s and compares the case “post-event value = W(s)” with the case “post-event value = 0”, together with a 45-degree reference line anchored at ŝ* or ŝ^{nc}.]

Figure 5: Illustration of optimal experimentation when pre-event choices matter for the post-event value. Parameters and functional forms: u(c) = √c, N = 2, A = R = 1, $L_s(\delta) = \frac{1-s-\delta}{1-s}$; for β = 0.95 and β = 0.7.

Figure 5 plots the aggregate expansion of the set of safe consumption values as a function of s for a high and a low value of the discount factor (β = 0.95 and β = 0.7, respectively). The left panel shows the social optimum and the right panel shows the cautious equilibrium when N = 2. In addition to the socially optimal and cautious non-cooperative expansion when the continuation value is endogenous (in blue and red, respectively), I plot the corresponding expansion when the continuation value is zero for comparison.

The figure shows that the first value of s at which it is socially optimal and individually rational to remain standing (here marked by ŝ* and ŝ^{nc}, respectively) is higher when the regime shift is non-disastrous. For a high value of β this difference is relatively small, but for a low value of β the difference is substantial. Figure 5 furthermore shows, in particular for the social optimum at a high discount factor, that the slope of the optimal expansion is lower when the continuation value is endogenous. For the social optimum, this slope is less than one in absolute value (it lies below the 45-degree line – the thin dotted line – anchored at ŝ* or ŝ^{nc}). This means that the approach to the set of values at which it is optimal not to experiment further is gradual. In fact, S is reached only asymptotically here. For the non-cooperative game at a low discount factor, however, the set S^{nc} is still reached in one step (if the threshold has not been crossed, of course). At the high discount factor (β = 0.95) the slope of the non-cooperative expansion is just equal to −1, but for higher values of β (0.97 and above; not shown here) the cautious non-cooperative equilibrium also implies repeated experimentation.

5 Discussion and Conclusion

The threat of a disastrous regime shift may be beneficial in the context of non-cooperative use of a renewable resource because it can facilitate coordination on a Pareto-dominant equilibrium even when the location of the threshold is unknown. Because the historical level of use is known to be safe, the consequence of the regime shift is catastrophic, and learning is only affirmative, it is socially optimal and a “cautious” Nash equilibrium to update beliefs only once (if at all). When the current level of safe use is sufficiently valuable, the threat of losing the productive resource discourages any experimentation and effectively enforces the first-best consumption level. When the agents experiment, the expansion of the consumption set is inefficiently large, but when it has not triggered the regime shift, staying at the updated level is ex post socially optimal. However, there is always also an “aggressive” Nash equilibrium in which the agents immediately deplete the resource, and when the initial value of safe use is not valuable enough, immediate depletion will be the only equilibrium.

The amount of learning thus depends on history, and there will be at most one experiment. These dynamics are robust to a variety of extensions. However, when the regime shift is not disastrous and the post-event continuation value depends on how far the threshold has been overstepped, repeated experimentation may emerge (but only if the post-event value declines sufficiently steeply in the amount by which the threshold has been overstepped). When the externality applies both to the risk of triggering the regime shift and to the resource itself, the threat of the disastrous regime shift loses importance, but it can still act as a “commitment device” that at least dampens non-cooperative extraction.

These conclusions have been derived by developing a general dynamic model that places only minimal requirements on the utility function and the probability distribution of the threshold. Nevertheless, there are a number of structural modeling assumptions that warrant discussion.

First, a prominent aspect of this model is that the threshold itself is not stochastic. The central motivation is that this allows concentrating on the effect of uncertainty about the threshold's location. This is arguably the core of the problem: we do not know which level of use triggers the regime shift. This modeling approach implies a clear demarcation between a safe region and a risky region of the state space. In particular, it implies that the edge of the cliff, figuratively speaking, is a safe – and in many cases optimal – place to be.
The alternative approach of modeling the regime shift as a function of time t (with a positive hazard rate for all t) acknowledges that, figuratively speaking, the edge of a cliff is often quite windy and not a particularly safe place. This, however, implies that the regime shift will occur with certainty as time goes to infinity, no matter how little of the resource is used; eventually there will be a gust of wind that is strong enough to blow us over the edge, regardless of where we stand. This is of course not very realistic either. But also on a deeper level one could argue that the non-stochasticity is in effect not a flaw but a feature. Let me cite Lemoine and Traeger (2014, p.28), who argue that “we would not actually expect tipping to be stochastic. Instead, any such stochasticity would serve to approximate a more complete model with uncertainty (and potentially learning) over the precise trigger mechanism underlying the tipping point.” This being said, it would still be interesting to investigate how the choice between a hazard-rate formulation (as in Polasky et al., 2011 or Sakamoto, 2014) and a threshold formulation influences the outcome and policy conclusions in an otherwise identical model.

Second, I have modeled the players as identical. In the real world, players are rarely identical. One dimension along which players could differ is their valuation of the future. However, prima facie it should not be difficult to show that any such differences could be smoothed out by a contract that gives a larger share of the gains from cooperation to more impatient players. One could also investigate the effect of heterogeneous beliefs about the existence and location of the threshold. Agbo (2014) and Koulovatianos (2015) analyze this in the framework of Levhari and Mirman (1980). In the current set-up such heterogeneity could lead to interesting dynamics and possibly multiple equilibria, where some players rationally do not want to learn about the probability distribution of T whereas other players do invest in learning and experimentation. Another dimension along which players could differ is their size or the degree to which they depend on the environmental goods or services in question. As larger players are likely to be able to internalize a larger part of the externality than smaller players, different sets of equilibria may emerge. Especially in light of the discussions surrounding a possible climate treaty (Harstad, 2012; Nordhaus, 2015), it is topical for future applications to analyze a situation where groups of players can form a coalition to ameliorate the negative effects of non-cooperation.

Third, I have assumed the regime shift to be irreversible. This is obviously a considerable simplification. Groeneveld et al. (2013) have analyzed the problem of how a social planner would learn about the location of a flow-pollution threshold in a setting where repeated crossings are allowed. In the current set-up, the introduction of several thresholds would most likely not yield significant new insights but would make the analysis a lot more cumbersome. For example, if one presumes that crossing the threshold implies that one learns where it is,10 the game turns into a repeated game. This may imply that cooperation is sustainable for sufficiently patient agents (van Damme, 1989).
However, there could also be cases where 899 irreversibility emerges “endogenously” when it is possible – but not an equilibrium – to move 900 out of a non-productive regime. The tractability of the current modeling approach may prove 901 fruitful to further explore this issue. 902 A final, related, point is the fact that I have concentrated on Markovian strategies. When 903 the agents are allowed to use history-dependent strategies, the threat of a threshold may allow 904 them to coordinate on the social optimum in all phases of the game. They could simply agree 905 on expanding the set of safe consumption possibilities by the socially optimal amount and 906 threaten that if any agent steps too far, this triggers a reaction to deplete the resource in the 907 next period. This obviously begs the question of renegotiation proofness, but it is plausible 908 that already a contract that is binding for two periods is sufficient to achieve the first-best. 909 The threat of a disastrous regime shift is a very strong coordinating device. This is true 910 irrespective of whether the threshold’s location is known or unknown, because the current 911 model of uncertainty implies that the agents learn only after the fact whether the disastrous 912 regime shift has occurred or not. Would a catastrophic threshold lose its coordinating force 913 when the agents can learn about its location without risking to cross it? Importantly, an 914 extension of the model along these lines would speak to the recent debate on “early warning 915 signals” (Scheffer et al., 2009; Boettiger and Hastings, 2013) and is the task of future work. 10 Groeneveld et al. (2013) presume that crossing the threshold tells the social planner that the regime shift has occurred, but not where. This is a little bit like sitting in a car, fixing the course to a destination and then blindfolding oneself. Arguably conscious experimentation is more realistically described by saying that the course is set, but the eyes remain open. 30 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 References Aflaki, S. (2013). The effect of environmental uncertainty on the tragedy of the commons. Games and Economic Behavior, 82(0):240–253. Agbo, M. (2014). Strategic exploitation with learning and heterogeneous beliefs. Journal of Environmental Economics and Management, 67(2):126–140. Barrett, S. (2013). Climate treaties and approaching catastrophes. Journal of Environmental Economics and Management, 66(2):235–250. Barrett, S. and Dannenberg, A. (2012). Climate negotiations under scientific uncertainty. Proceedings of the National Academy of Sciences, 109(43):17372–17376. Bochet, O., Laurent-Lucchetti, J., Leroux, J., and Sinclair-Desgagné, B. (2013). Collective Dangerous Behavior: Theory and Evidence on Risk-Taking. Research Papers by the Institute of Economics and Econometrics, Geneva School of Economics and Management, University of Geneva 13101, Institut d’Economie et Economtrie, Universit de Genve. Boettiger, C. and Hastings, A. (2013). Tipping points: From patterns to predictions. Nature, 493(7431):157–158. Bolton, P. and Harris, C. (1999). Strategic experimentation. Econometrica, 67(2):349–374. Bonatti, A. and Hörner, J. (2015). Learning to disagree in a game of experimentation. Discussion paper, Cowles Foundation, Yale University. Brozović, N. 
and Schlenker, W. (2011). Optimal management of an ecosystem with an unknown threshold. Ecological Economics, 70(4):627 – 640. Costello, C. and Karp, L. (2004). Dynamic taxes and quotas with learning. Journal of Economic Dynamics and Control, 28(8):1661–1680. Crépin, A.-S. and Lindahl, T. (2009). Grazing games: Sharing common property resources with complex dynamics. Environmental and Resource Economics, 44(1):29–46. Cropper, M. (1976). Regulating activities with catastrophic environmental effects. Journal of Environmental Economics and Management, 3(1):1–15. Dubins, L. E. and Savage, L. J. (1965). How to gamble if you must : inequalities for stochastic processes. McGraw-Hill, New York. Fesselmeyer, E. and Santugini, M. (2013). Strategic exploitation of a common resource under environmental risk. Journal of Economic Dynamics and Control, 37(1):125–136. Frank, K. T., Petrie, B., Choi, J. S., and Leggett, W. C. (2005). Trophic cascades in a formerly cod-dominated ecosystem. Science, 308(5728):1621–1623. Gerlagh, R. and Liski, M. (2014). Carbon Prices for the Next Hundred Years. CESifo Working Paper Series 4671, CESifo Group Munich. Groeneveld, R. A., Springborn, M., and Costello, C. (2013). Repeated experimentation to learn about a flow-pollutant threshold. Environmental and Resource Economics, forthcoming:1–21. Harsanyi, J. C. and Selten, R. (1988). A general theory of equilibrium selection in games. MIT Press, Cambridge. Harstad, B. (2012). Climate contracts: A game of emissions, investments, negotiations, and renegotiations. The Review of Economic Studies, 79(4):1527–1557. Harstad, B. and Liski, M. (2013). Games and resources. In Shogren, J., editor, Encyclopedia of Energy, Natural Resource and Environmental Economics, volume 2, pages 299–308. Elsevier, Amsterdam. Hjermann, D. Ø., Stenseth, N. C., and Ottersen, G. (2004). Competition among fishermen and fish causes the collapse of Barents Sea capelin. Proceedings of the National Academy of Sciences, 101:11679–11684. Hoel, M. (1978). Resource extraction, uncertainty, and learning. The Bell Journal of Economics, 9(2):642–645. Keller, G., Rady, S., and Cripps, M. (2005). Strategic experimentation with exponential bandits. Econometrica, 73(1):39– 68. Kemp, M. (1976). How to eat a cake of unknown size. In Kemp, M., editor, Three Topics in the Theory of International Trade, pages 297–308. North-Holland, Amsterdam. Kim, Y. (1996). Equilibrium selection in n-person coordination games. Games and Economic Behavior, 15(2):203–227. Kolstad, C. and Ulph, A. (2008). Learning and international environmental agreements. Climatic Change, 89(1-2):125– 141. Kossioris, G., Plexousakis, M., Xepapadeas, A., de Zeeuw, A., and Mäler, K.-G. (2008). Feedback nash equilibria for non-linear differential games in pollution control. Journal of Economic Dynamics and Control, 32(4):1312–1331. Koulovatianos, C. (2015). Strategic exploitation of a common-property resource under rational learning about its reproduction. Dynamic Games and Applications, 5(1):94–119. Lemoine, D. and Traeger, C. (2014). Watch your step: Optimal policy in a tipping climate. American Economic Journal: Economic Policy,, forthcoming:1–47. Lenton, T. M., Held, H., Kriegler, E., Hall, J. W., Lucht, W., Rahmstorf, S., and Schellnhuber, H. J. (2008). Tipping elements in the earth’s climate system. Proceedings of the National Academy of Sciences, 105(6):1786–1793. Levhari, D. and Mirman, L. J. (1980). The Great Fish War: An Example Using a Dynamic Cournot-Nash Solution. 
The Bell Journal of Economics, 11(1):322–334. Miller, S. and Nkuiya, B. (2014). Coalition formation in fisheries with potential regime shift. Unpublished Working Paper, University of California, Santa Barbara. Mitra, T. and Roy, S. (2006). Optimal exploitation of renewable resources under uncertainty and the extinction of species. Economic Theory, 28(1):1–23. Nævdal, E. (2003). Optimal regulation of natural resources in the presence of irreversible threshold effects. Natural 31 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 Resource Modeling, 16(3):305–333. Nævdal, E. (2006). Dynamic optimisation in the presence of threshold effects when the location of the threshold is uncertain – with an application to a possible disintegration of the western antarctic ice sheet. Journal of Economic Dynamics and Control, 30(7):1131–1158. Nævdal, E. and Oppenheimer, M. (2007). The economics of the thermohaline circulation – a problem with multiple thresholds of unknown locations. Resource and Energy Economics, 29(4):262–283. Nordhaus, W. (2015). Climate clubs: Overcoming free-riding in international climate policy. American Economic Review, 105(4):1339–1370. Ploeg, F. v. d. and Zeeuw, A. d. (2015a). Climate tipping and economic growth: Precautionary saving and the social cost of carbon. OxCarre Research Paper 118, OxCarre, Oxford, UK. Ploeg, F. v. d. and Zeeuw, A. d. (2015b). Non-cooperative and cooperative responses to climate catastrophes in the global economy: A north-south perspective. OxCarre Research Paper 149, OxCarre, Oxford, UK. Polasky, S., Zeeuw, A. d., and Wagener, F. (2011). Optimal management with potential regime shifts. Journal of Environmental Economics and Management, 62(2):229 – 240. Ren, B. and Polasky, S. (2014). The optimal management of renewable resources under the risk of potential regime shift. Journal of Economic Dynamics and Control, 40(0):195–212. Riley, J. and Zeckhauser, R. (1983). Optimal selling strategies: When to haggle, when to hold firm. The Quarterly Journal of Economics, 98(2):267–289. Rob, R. (1991). Learning and capacity expansion under demand uncertainty. The Review of Economic Studies, 58(4):655– 675. Sakamoto, H. (2014). A dynamic common-property resource problem with potential regime shifts. Discussion Paper E-12-012, Graduate School of Economics, Kyoto University. Scheffer, M., Bascompte, J., Brock, W. A., Brovkin, V., Carpenter, S. R., Dakos, V., Held, H., van Nes, E. H., Rietkerk, M., and Sugihara, G. (2009). Early-warning signals for critical transitions. Nature, 461(7260):53–59. Scheffer, M., Carpenter, S., Foley, J. A., Folke, C., and Walker, B. (2001). Catastrophic shifts in ecosystems. Nature, 413(6856):591–596. Tavoni, A., Dannenberg, A., Kallis, G., and Löschel, A. (2011). Inequality, communication, and the avoidance of disastrous climate change in a public goods game. Proceedings of the National Academy of Sciences, 108(29):11825– 11829. Tsur, Y. and Zemel, A. (1994). Endangered species and natural resource exploitation: Extinction vs. coexistence. Natural Resource Modeling, 8:389–413. Tsur, Y. and Zemel, A. (1995). Uncertainty and irreversibility in groundwater resource management. Journal of Environmental Economics and Management, 29(2):149 – 161. van Damme, E. (1989). Renegotiation-proof equilibria in repeated prisoners’ dilemma. Journal of Economic Theory, 47(1):206–217. 
32 1015 Appendix 1016 A.1 1017 Recall that Proposition 1 states that when the location of the threshold is known with certainty, then there 1018 exists, for every combination of β, N , and R, a value Tc such that the first-best can be sustained as a Nash- 1019 Equilibrium when T ≥ Tc , where Tc is defined by Ψ = 0. The critical value Tc is higher the larger is N and R, 1020 or the smaller is β. It is useful to replicate equation (3) that describes the gain from immediate depletion over staying at T 1021 1022 Proof of Proposition 1 when all other agents stay at T : u(T /N ) N −1 Ψ(T, R, N, β) = u R − T − N 1−β 1023 1024 (3) To show that a value Tc , defined by Ψ = 0, always exists, I first note that Ψ declines monotonically in 0 (T /N ) T: = − NN−1 u0 R − NN−1 T − N1 u 1−β < 0 as u0 > 0, N ≥ 1 and β ∈ (0, 1). Then, I show that Ψ is ∂Ψ ∂T 1025 larger than zero at T = 0: Ψ(0, R, N, β) = u(R) > 0. Finally, I show that Ψ is smaller than zero as T → R: 1026 β limT →R Ψ = − 1−β u(R/N ) < 0 as β ∈ (0, 1) and u(R/N ) > 0. Thus, by the mean value theorem, for every 1027 combination of β, N , and R, there must be a value of T at which Ψ = 0. Now, to show that staying at T > Tc is indeed the socially optimal action (the first-best), I show that 1028 1029 dTc dN 1030 which by implies that Tc is smallest when N = 1 (the social planner case). As Ψ is monotonically declining in 1031 T , it will be socially optimal to stay at T for all values of T that are larger than the social-planner’s Tc . A Tc 1032 is implicitly defined by Ψ = 0 we have: > 0. This means that the critical value at which staying is a Nash-Equilibrium is higher the larger is N , ∂Ψ/∂N dTc =− >0 dN ∂Ψ/∂T 1033 1034 1035 1036 ∂Ψ ∂N We know that the denominator is negative, so that = NT2 u0 R − NN−1 T − u0 (T /N ) > 0. dTc dN > 0 when the numerator is positive. We have: Finally, it remains to show that Tc is higher the larger is R and the smaller is β. Again, a sufficient condition for the former statement is ∂Ψ > 0, which holds because ∂Ψ = u0 R − NN−1 T > 0. A sufficient ∂R ∂R dTc dβ < 0 is that ∂Ψ ∂β < 0 which holds because ∂Ψ ∂β /N ) = − u(T < 0. (1−β)2 1037 condition for 1038 A.2 1039 Recall that Proposition 2 consists of two parts: First, it states that there exists a set S so that for s ∈ S, it is 1040 optimal to choose δ(s) = 0. That is, if s0 ∈ S, the socially optimal use of the resource is s0 for all t. Second, 1041 the proposition states that if s0 ∈ / S, it is optimal to experiment once at t = 0 and expand the set of safe values 1042 by δ ∗ (s0 ). When this has not triggered the regime shift, it is socially optimal to stay at s1 = s0 + δ ∗ (s0 ) for all 1043 t ≥ 1. Proof of Proposition 2 33 1044 1045 First, I show that there is a non-empty set S ⊂ [0, R] at which it is optimal to stay. Assume Part (1) for contradiction that for all s in [0, R]: max δ∈(0,R−s] 1046 u(s + δ) + βLs (δ)V (s + δ) > u(s) + βV (s) (A-1) Then there is a value of δ such that: V (s) − L(s + δ) u(s + δ) − u(s) V (s + δ) < L(s) β (A-2) 1047 Now as u is concave, positive, and bounded above by u(R), we know that for s sufficiently close to R, the 1048 denominator of the RHS of (A-2) is bounded above: u(s + δ) − u(s) < βKδ. Using this and multiplying both 1049 sides by L(s) as well as dividing both sides by δ we have: L(s)V (s) − L(s + δ)V (s + δ) < KL(s) δ (A-3) 1050 Now, take the limit as s → R. 
Because δ ∈ (0, R − s], we have that δ → 0 when s → R so that the LHS of 1051 (s)] (A-3) is the negative of the derivative of L(s)V (s): lims→R − ∂[L(s)V = f (R)V (R), which is positive while ∂s 1052 the RHS of (A-3) vanishes when F (R) = 1. Thus, we have a contradiction and there must be some s at which 1053 it is optimal to choose δ(s) = 0. When there is a positive probability that there is no threshold on [0, R] (that 1054 is, F (R) < 1), the RHS of (A-3) does not vanish. Nevertheless, there will always be value of s, namely s = R, 1055 at which it is optimal to stay – simply because there is no other choice. Thus, the set S is not empty. Moreover, when the hazard rate is not decreasing with s (that is when ∂Ls (δ) ∂s = −f (s+δ)(1−F (s))+(1−F (s+δ))f (s) [1−F (s)]2 ∗ <0⇔ f (s) 1−F (s) < f (s+δ) ), 1−F (s+δ) it can be shown that the set S is convex, ∗ so that S = [s , R] where s is defined in the main text as the lowest value of s at which it is optimal to never experiment. First, note that convexity of S is trivial when s∗ = R. Consider then the case that s∗ < R. By definition, the first-order condition (equation (7) in the main text) must just hold with equality for s∗ : ϕ0 (δ; s) = 0 ⇔ u0 (s∗ ) = β f (s∗ ) u(s∗ ) 1 − F (s∗ ) 1 − β S is convex if for any s ∈ (s∗ , R] we have ϕ0 < 0 (i.e. a boundary solution of δ ∗ = 0). That is: u0 (λs∗ + (1 − λ)R) < β f (λs∗ + (1 − λ)R) u(λs∗ + (1 − λ)R) 1 − F (λs∗ + (1 − λ)R) 1−β for all λ ∈ (0, 1] (A-4) 1056 Because u0 > 0 and u00 ≤ 0, the term on the LHS of (A-4) is smaller the larger is λ. Because u0 > 0 the 1057 rightmost fraction of (A-4) is larger the larger is λ, β is positive constant. The term in the middle is the 1058 hazard rate, which is non-decreasing by assumption. 1059 1060 Part (2) When s0 ∈ / S, it is not optimal to stay. Thus, it is optimal to expand the set of safe consumption values by choosing δ > 0. Due to discounting, it cannot be optimal to approach S asymptotically. Thus there 34 1061 must be a last step from some st ∈ / S to st+1 = st + δt with st+1 ∈ S. Below, I show that it is in fact optimal 1062 to take only one step. It then follows that when s0 ∈ / S, it is optimal to choose s0 + δ ∗ (s0 ) for t = 0 and, if the 1063 resource has not collapsed, s1 for all t ≥ 1. 1064 Denote δ ∗ (s̃) the optimal last step when starting from some value s̃ ∈ / S and s∗ = s̃ + δ ∗ with s∗ ∈ S. The 1065 following calculations show that going from some s to s̃ (by taking a step of size δ̃) and then to s∗ (by taking 1066 a step of size δ ∗ ) yields a lower payoff than going from s to s∗ directly (by taking a step of size δ̂ = δ̃ + δ ∗ ; see 1067 the box below for a sketch of the involved step-sizes). δ̂ δ̃ s 1068 δ∗ s ^ R s∗ s̃ That is, I claim: u(s + δ̃ + δ ∗ ) u(s + δ̃) + βLs (δ̃) u(s + δ̃ + δ ∗ ) + βLs+δ̃ (δ ∗ ) 1−β The important thing to note is that: Ls (δ̃)Ls+δ̃ (δ ∗ ) = ≤ L(s+δ̃) L(s+δ̃+δ ∗ ) L(s) L(s+δ̃) u(s + δ̂) + βLs (δ̂) = L(s+δ̃+δ ∗ ) L(s) u(s + δ̂) 1−β (A-5) = Ls (δ̃ + δ ∗ ). Hence, (A-5) can, upon using δ̂ = δ̃ + δ ∗ and splitting the RHS into three parts (t = 0, t = 1, t ≥ 2), be written as: u(s + δ̃) + βLs (δ̃)u(s + δ̂) + β 2 Ls (δ̂) u(s + δ̂) 1−β ≤ u(s + δ̂) + βLs (δ̂)u(s + δ̂) + β 2 Ls (δ̂) which simplifies to: u(s + δ̃) ≤ h i 1 + β Ls (δ̂) − Ls (δ̃) u(s + δ̂) " u(s + δ̃) ≤ u(s + δ̂) 1−β # L(s + δ̂) − L(s + δ̃) u(s + δ̂) 1+β L(s) (A-5’) Because the term in the squared bracket is smaller than 1 (as L(s+ δ̂) < L(s+ δ̃)), it is not immediately obvious that the inequality in the last line holds. 
However, we can use the fact that because s̃ ∈ / S, and because δ ∗ is defined as the optimal last step from s̃ into the set S, the following must hold: u(s̃) u(s̃ + δ ∗ ) < u(s̃ + δ ∗ ) + βLs̃ (δ ∗ ) . 1−β 1−β Using the fact that s̃ = s + δ̃ and that s̃ + δ ∗ = s + δ̂, this can be re-arranged to give: u(s + δ̃) L(s + δ̂) u(s + δ̂) < u(s + δ̂) + β 1−β L(s + δ̃) 1 − β ⇔ " # L(s + δ̂) − L(s + δ̃) u(s + δ̃) < 1 + β u(s + δ̂) L(s + δ̃) Since L0 (s) < 0, we know that L(s+δ̂)−L(s+δ̃) L(s+δ̃) < L(s+δ̂)−L(s+δ̃) L(s) 35 (A-6) < 0. Therefore, combining (A-5’) and (A-6) establishes the claim and completes the proof: " u(s + δ̃) < # L(s + δ̂) − L(s + δ̃) 1+β u(s + δ̂) L(s + δ̃) " < # L(s + δ̂) − L(s + δ̃) u(s + δ̂) 1+β L(s) 1069 A.3 1070 Proposition 3 states that the socially optimal step size δ ∗ (s) is decreasing in s for s ∈ (s∗ , s∗ ). (A-7) Proof of Proposition 3. Recall that δ ∗ (s) is implicitly defined by the solution of ϕ0 (δ ∗ ; s) = 0 for s ∈ (s∗ , s∗ ) and where ϕ0 is: ϕ0 (δ ∗ ; s) = u0 (s + δ ∗ ) + β 0 ∗ Ls (δ )u(s + δ ∗ ) + Ls (δ ∗ )u0 (s + δ ∗ ) 1−β (7) with the second-order condition: ϕ00 (δ ∗ ; s) = u00 + β 00 ∗ Ls (δ )u + 2L0s (δ ∗ )u0 + Ls (δ ∗ )u00 < 0 1−β (A-8) To show that δ ∗ is declining in s, I can use the implicit function theorem and need to show that: ∂ ϕ0 (δ ∗ ; s) /∂s dδ ∗ =− <0 ds ∂ ϕ0 (δ ∗ ; s) /∂δ ∗ The denominator is negative when the second-order condition is satisfied. Therefore, a sufficient condition for dδ ∗ ds < 0 when the second-order condition is satisfied is that ∂[ϕ0 (δ ∗ ; s)] β = u00 + ∂s 1−β ∂[ϕ0 (δ ∗ ;s)] ∂s < 0 holds. In other words, we must have: ∂L0s (δ ∗ ) ∂Ls (δ ∗ ) 0 u + L0s (δ ∗ )u0 + u + Ls (δ ∗ )u00 ∂s ∂s <0 (A-9) Noting the similarity of (A-9) to the second-order condition (A-8), and realizing that (A-8) can be decomposed into a common part A and a part B and that (A-9) can be decomposed into the common part A and a part C, a sufficient condition for (A-9) to be satisfied is that B > C. u00 + | | β 0 β 00 ∗ Ls L(δ ∗ )u0 + Ls (δ ∗ )u00 + Ls (δ )u + L0s (δ ∗ )u0 < 0 1−β 1−β | {z } {z } B A u00 + β 0 ∗ 0 β Ls (δ )u + Ls (δ ∗ )u00 + 1−β 1−β {z } | A In order to show that L00s (δ)u + L0s (δ)u0 > ∂L0s (δ ∗ ) ∂Ls (δ ∗ ) 0 u+ u <0 ∂s ∂s {z } ∂L0s (δ) s (δ) 0 u + ∂L∂s u, ∂s 0 solution from (7) to write u in terms of u: u0 = (A-8’) −L0s (δ) u + Ls (δ) 1−β β 36 (A-9’) C I use the first-order condition for an interior Upon inserting and canceling u, I need to show that: " L00s (δ) Recall that Ls (δ) = 1071 L(s+δ) L(s) + L0s (δ) −L0s (δ) 1−β + Ls (δ) β # ∂Ls (δ) ∂L0s (δ) + > ∂s ∂s " −L0s (δ) 1−β + Ls (δ) β # (A-10) and hence: L0s (δ) = L0 (s + δ) L(s) ∂Ls (δ) L0 (s + δ)L(s) − L(s + δ)L0 (s) = ∂s [L(s)]2 L00s (δ) = L00 (s + δ) L(s) ∂L0s (δ) L00 (s + δ)L(s) − L0 (s + δ)L0 (s) = ∂s [L(s)]2 Tedious but straightforward calculations then show that (A-10) is indeed satisfied. 0 1−β L(s + δ) L00 (s + δ) L (s + δ) 2 1−β L(s + δ) ∂L0s (δ) ∂Ls (δ) 0 + − > + − Ls (δ) β L(s) L(s) L(s) β L(s) ∂s ∂s | {z } | {z } a a ⇔ a 0 L00 (s + δ)L(s) − L0 (s + δ)L0 (s) ∂Ls (δ) 0 L00 (s + δ) L (s + δ) 2 − >a − Ls (δ) L(s) L(s) [L(s)]2 ∂s ⇔ L0 (s + δ) aL00 (s + δ)L(s) − [L0 (s + δ)]2 > a L00 (s + δ)L(s) − L0 (s + δ)L0 (s) − L0 (s + δ)L(s) − L(s + δ)L0 (s) L(s) ⇔ − [L0 (s + δ)]2 L(s) > −aL0 (s + δ)L0 (s)L(s) − L0 (s + δ)L(s)L0 (s + δ) + L(s + δ)L0 (s)L0 (s + δ) ⇔ aL0 (s + δ)L0 (s)L(s) > L(s + δ)L0 (s)L0 (s + δ) ⇔ aL(s) > L(s + δ) ⇔ L(s + δ) 1−β + L(s) > L(s + δ) β L(s) ⇔ 1−β L(s) > 0. β True because β, L ∈ (0, 1). 
1072 1073 A.4 1074 Recall that Proposition 4 states that There exists a set S nc such that for s0 ∈ S nc , it is a symmetric Nash- 1075 Equilibrium to stay at s0 and consume 1076 one step and consume 1077 s1 = s0 + N δ nc (s0 ), consuming 1078 1079 Proof of Proposition 4. s0 N s0 N for all t. For s0 ∈ / S nc , it is a Nash-Equilibrium to take exactly + δ nc (s0 ) for t = 0 and – when this has not triggered the regime shift – to stay at s1 N for all t ≥ 1. Preliminarily, note that the game’s stationarity implies that if it is a Nash equilibrium to stay at some s in any one period, it will be a Nash equilibrium to stay at that s in all subsequent periods. 1080 The first part of the proof, showing the existence of S nc , is parallel to the first part of the proof of 1081 Proposition 2 and is not repeated here. It rests on the same argument, namely that there is some s at which 1082 the gains from increasing consumption are small compared to the expected loss, even when the short term gain 1083 does not have to be shared among all N agents. 37 1084 To prove the second part of the proposition, I need to show that, for s0 ∈ / S nc , any agent i prefers to 1085 reach the set S nc in one step rather than two when the strategy of all other agents is to first take one step of 1086 fixed size and then a second feedback step δ −i∗ (s1 ) that ensures reaching snc ∈ S nc . Importantly, I use the 1087 symmetry of the agents, that is, the second step δ −i∗ (s) is given by δ −i∗ (s) = (N − 1)δ i∗ (s) where δ i∗ (s) is 1088 the state-dependent best reply defined by equation (11) in the main text. Moreover, I assume that all agents 1089 coordinate on staying at snc whenever it is reached. 1090 The setup is the following: All agents stand at s0 at the beginning of the first period. All agents except 1091 i take step of fixed size and their combined expansion is given by δ̃ −i . If agent i chooses the same symmetric 1092 fixed size, her expansion being denoted by δ̃ i , the state s̃ ∈ / S nc is reached (provided the regime shift has not 1093 occurred). That is, s0 < s̃ < snc and s̃ − s0 = N δ̃ i . Agent i can also choose her first step so that snc already 1094 in the first period. Denote this step that expands the set of safe consumption values from s0 to snc when 1095 the total expansion of all other agents taken together is δ̃ −i by δ̂ i∗ (s0 ). The size of the step δ̂ i∗ (s0 ) is thus 1096 δ̂ i∗ (s0 ) = snc − s0 − δ̃ −i . Because all other agents remain at snc once it is reached, agent i’s payoff is in this 1097 case: π (1) = u s 0 N + δ̂ i∗ (s0 ) + nc s β Ls0 (δ̂ i∗ + δ̃ −i )u 1−β N (A-11) When agent i takes two steps, first a step of size δ̃ i to some value s̃ (with s̃ ∈ / S nc ) and then a second step δ −i∗ (s̃) of size δ −i∗ (s̃) = snc − s̃ − δ −i∗ (s̃), her payoff is: π (2) = u 1098 1099 nc s̃ β s + δ̃ i + βLs0 (δ̃ i + δ̃ −i ) u + δ i∗ + Ls̃ (δ i∗ + δ −i∗ )u N N 1−β N s 0 I now show that π (1) > π (2) . 
For clarity, rewrite (A-11) and (A-12), using the fact that Ls0 (δ̂ i∗ + δ̃ −i ) = Ls0 (δ̃ i + δ̃ −i )Ls̃ (δ i∗ + δ −i∗ ) = π (1) π (2) 1100 (A-12) L(snc ) : L(s0 ) nc L(snc ) L(snc ) u(snc /N ) s =u + δ̂ (s0 ) + β u + β2 N L(s0 ) N L(s0 ) 1 − β s L(s̃) L(snc ) u(snc /N ) s̃ 0 =u + δ̃ i + β u + δ i∗ (s̃) + β 2 N L(s0 ) N L(s0 ) 1 − β s 0 i∗ (A-11’) (A-12’) Thus, π (1) > π (2) if: u nc s L(snc ) L(s̃) s̃ s 0 + δ̃ i < u + δ̂ i∗ (s0 ) + β u + δ i∗ (s̃) − u N N L(s0 ) N L(s0 ) N s 0 First, by symmetry of the agents and the definition of δ i∗ (s̃), we have u s̃ N + δ i∗ (s̃) = nc s L(s0 ) − L(snc ) s 0 + δ̃ i < u + δ̂ i∗ (s0 ) + β u N N L(s0 ) N s 0 38 s̃+N δ i∗ (s̃) N (A-13) = snc . N (A-13’) Similarly, we have s0 N + δ̃ i = u Now, note that 1101 s0 N s̃ N s0 +N δ̃ i N <u = s̃ N : nc L(s̃) − L(snc ) s + δ̂ i∗ (s0 ) + β u N L(s0 ) N s 0 + δ̂ i∗ (s0 ) = s0 +N δ̂ i∗ (s0 ) N (A-13”) and that the step δ̂ i∗ (s0 ) is larger than the symmetric step that snc −s0 N 1102 would be necessary to reach snc from s0 . Formally: δ̂ i∗ (s0 ) = snc − s0 − δ̃ −i > 1103 1)δ̃ i > snc − s0 ⇔ snc − s0 > N δ̃ i = s̃ − s0 which is true by construction. It follows that 1104 Consequently, it is sufficient condition to show: u s̃ N nc nc L(s̃) − L(snc ) s s u <u +β N L(s0 ) N <u ⇔ N snc − N s0 − N (N − s0 +N δ̂ i∗ (s0 ) N nc L(s̃) − L(snc ) s + δ̂ i∗ (s0 ) + β u N L(s0 ) N s 0 > snc . N (A-14) Parallel to the argument in Proposition 2, we can use the fact that because s̃ ∈ / S nc we must have 1105 u(s̃/N ) u(snc /N ) s̃ <u + δ i∗ (s̃) + βLs̃ (δ i∗ (s̃) + δ −i∗ (s̃)) 1−β N 1−β Using that s̃ N + δ i∗ (s̃) = snc N and re-arranging, we can write this as: nc u(s̃/N ) L(s̃) − L(snc ) s < 1+β u 1−β L(s̃) N 1106 Because L0 (s) < 0, we know that (1) < L(s̃)−L(snc ) L(s0 ) < 0 and combining (A-14) and (A-15) therefore 1107 establishes that π 1108 A.5 1109 Let me repeat the argument from the main text: The effect of an increase in a parameter a in the interior 1110 range s ∈ (snc , snc ) is given by 1111 higher the parameter a, it is sufficient to show that 0 >π (2) L(s̃)−L(snc ) L(s̃) (A-15) and completes the proof. Proof of Proposition 5. dg nc da 0 ∂φ /∂a = − ∂φ 0 /∂g nc . Thus, to show that aggregate consumption is higher the ∂φ0 ∂a > 0 (because the second-order condition implies that 1112 ∂φ ∂g nc 1113 R, neither boundary snc or snc decreases and at least one boundary increases with a. The reason is that for a 1114 given value of R an upward shift of snc or snc (and no downward of the respective other boundary) necessarily 1115 implies that all new values of g nc must lie above the old values of g nc . 1116 (a) The boundaries snc , snc , and aggregate consumption in the cautious equilibrium, N gnc , decrease with β. 1117 Here, it is simple to show that 1118 is term in the squared brackets of equation (12). We know that this term must be negative for an interior 1119 solution because u0 > 0. 1120 (b) An increase in N leads to higher resource use in the cautious equilibrium when 1121 Here, I argue that both snc and snc increase when adding another player and < 0). Because g nc is monotonically decreasing in s, it is also sufficient to show that, for a given value of ∂φ0 ∂β < 0. We have ∂φ0 ∂β = 39 [...] , (1−β)2 where the term in the squared brackets [...] N N +1 R > u0 ( N ) u0 ( NR+1 ). 
R > u0 ( N ) u0 ( NR+1 ): N N +1 First, for a given number of players N we have at a given snc = ŝ that φ0 ( ŝ R R−s R − ŝ R β 1 ; ŝ) = u0 + + L0ŝ (N δ nc )u + Lŝ (N δ nc )u0 =0 N N N 1−β N N N R−s ; ŝ) > 0 when I now show that φ0 ( N +1 N N +1 > R ) N : R ) N +1 u0 ( u0 ( R−s φ0 ( N ; ŝ) − φ0 ( R−s ; ŝ) > 0 +1 N ⇔ R )+ u0 ( NR+1 ) − u0 ( N 1 β 1 R R ) L0ŝ + Lŝ u0 ( NR+1 ) − u0 ( N ) >0 u( NR+1 ) − u( N 1−β N +1 N 1122 The first part of the last line is positive due to concavity of u, the first term in the squared bracket is 1123 R positive since L0s < 0 and u0 ( NR+1 ) < u( N ), and the last term in the squared bracket is positive whenever 1124 N N +1 > R ) N . R ) N +1 u0 ( u0 ( Second, for a given number of players N we have at a given snc = š that φ0 (0; š, N ) = u0 š N + β š 1 š L0š (0)u + u0 =0 1−β N N N 1125 Clearly, we can make exactly the same argument as above to show that φ0 (0; š, N +1) > 0 when 1126 (c) 1127 where a separate cautious Nash-equilibrium exists and the lower aggregate consumption. 1128 N N +1 > R ) N . R u0 ( ) N +1 u0 ( The more likely the regime shift (in terms of a first-order stochastic dominance), the larger the range Suppose that for some given value ŝ, equation (12) has an interior solution that defines g nc : φ0 (δ nc ; ŝ, L) = u0 ŝ + δ nc N + β ŝ + N δ nc 1 ŝ + N δ nc L0ŝ (N δ nc )u + Lŝ (N δ nc )u0 =0 1−β N N N (12) 1129 I now show that a more likely regime shift (in terms of a first-order stochastic dominance) means a change 1130 in Ls to L̃s in such a way that φ0 (δ nc ; ŝ, L̃) < 0 so that for every s ∈ (snc , snc ) we have that the resulting 1131 interior solution g̃ nc is smaller than the orginal g nc . As a consequence, the range where a separate cautious 1132 Nash-equilibrium exists will also be larger. 1133 1134 A first-order stochastic dominance means that F̃ ≥ F (where the inequality is strict for at least one s). Because the hazard rate is non-declining, this means that −f˜(s+δ) 1−F̃ (s) −f (s+δ) 1−F (s) F̃ (s+δ)−F̃ (s) 1−F̃ (s) ≥ F (s+δ)−F (s) 1−F (s) and consequently 1135 L̃s ≤ Ls . This implies also that L̃0ŝ = 1136 positive second term in the squared bracket of (12) are smaller, which implies that φ0 (δ nc ; ŝ, L̃) < 0. 1137 (d) An increase of R to R̃ for an unchanged risk of the regime shift (i.e. R < R̃ ≤ A) decreases sn c and 1138 thus leads to a larger range where a separate cautious equilibrium exists. < = L0ŝ < 0. Thus, both the negative first term and the 1139 Note that R does not impact equation (12) when R < R̃ ≤ A, but it has an effect on the first value snc : 1140 As the diagonal line defining the upper bound of δ shifts outwards, and g nc (s) is a downward sloping function 1141 steeper than R − s, the first value at which it is not optimal to deplete the resource, snc , must be smaller when 1142 R increases from to R̃ 40