Play with fire: Experiments to find the location of a catastrophic threshold

Florian K. Diekert§

November 12, 2015¶

Abstract

Many dynamic systems exhibit tipping points – they fundamentally change their character once a critical value, or threshold, is crossed. A key aspect is that the threshold’s location is almost always unknown. Because experimentation only reveals whether the threshold has been crossed or not, learning is “affirmative”. When crossing the threshold is disastrous, in the sense that the post-event value is independent of the pre-event state, affirmative learning implies that all experimentation is undertaken at once. However, the magnitude of the experiment is smaller the more valuable the current state. The paper further shows that this feature of learning allows non-cooperative agents to take advantage of the regime shift threat and coordinate on a cautious equilibrium that preserves the resource with positive probability. If the safe status quo is sufficiently valuable, players can even coordinate on the first-best.

Keywords: Catastrophic shifts; Tipping points; Learning; Dynamic Games.
JEL-Classifications: C73, D83, H41, Q20, Q54

§ Department of Economics and Centre for Ecological and Evolutionary Synthesis, University of Oslo, PO-Box 1095, 3017 Oslo, Norway. Email: f.k.diekert@ibv.uio.no. This research is funded by NorMER, a Nordic Centre of Excellence for Research on Marine Ecosystems and Resources under Climate Change.
¶ Job Market Paper; updated frequently. The most recent version is deposited at http://folk.uio.no/floriakd/papers/DiekertPlayFire.pdf.

1 Introduction

Suppose you wanted to go as far as possible, but any step too far would be the end. How far would you go, and how many steps would you take, when you do not know where the catastrophic threshold is? Many natural systems at the very foundation of human activity exhibit tipping points whose location is unknown.
In particular, the dynamics of the climate system may change fundamentally once global warming exceeds a critical value. A rapid meltdown of the Greenland Ice Sheet, a disintegration of the West-Antarctic Ice Sheet, or a shutdown of the Gulf Stream are all possible tipping elements in the earth’s climate system (Lenton et al., 2008). Any of these events may have large and long-lasting, or even disastrous, consequences for life on this planet. Tipping points have also been documented in other natural systems such as lakes, coral reefs, or woodlands (Scheffer et al., 2001). Economically important regime shifts in the recent past include the collapse of the Canadian cod or Norwegian herring fisheries (Frank et al., 2005; Amundsen and Bjørndal, 1999). Finally, a wide range of applications from the social sphere are illustrated in the popular book by Gladwell (2000).

Given its importance, it is not surprising that a rapidly growing literature grapples with this issue and advances our understanding of how to manage dynamic systems under regime shift risk (see below for a review of that literature). To date, however, two aspects are underexplored: learning about the unknown location of the threshold, and the effect of strategic interaction. To provide sound policy advice, we need to know more about how these two factors affect the aggregate use of resources under regime shift risk.

This paper proposes a tractable framework to analyze both learning and strategic interactions. I focus on the uncertainty about the threshold’s location by assuming that the threshold value T is constant but unknown (with an arbitrary continuous probability distribution). This implies that if an exploration of the state space up to a value s has not triggered the regime shift, it will also not do so in the future. The value s is henceforth known to be safe.
In the simplest setting, this means that it is socially optimal either not to experiment at all, or to undertake all experimentation at once. The reason is that learning is only “affirmative”: Having explored the state space between s and s′, it is revealed whether the state s′ is safe or not, but no new knowledge about the relative probability that the threshold is located at, say, s₁ or s₂ (with s₁, s₂ > s′) has been acquired. Therefore, it does not pay to experiment a second time. This feature of learning is remarkably robust to the wide array of changes to the basic model structure that I consider in the first and third parts of the paper.

In the second part of the paper, I turn to a non-cooperative setting (but go back to the simplest version of the system dynamics). I show that strategic agents can take advantage of the regime shift threat and coordinate on a “cautious” equilibrium that preserves the resource with positive probability (although experimentation is still inefficiently risky). If the safe status quo is sufficiently valuable, players can even coordinate on the first-best.

2 Applications

The aim of this paper is to develop a generic theoretical model to systematically characterize the effect of a disastrous regime shift on optimal and non-cooperative experimentation and resource use. My research strategy is to model the occurrence of the regime shift as a function of the current control variable and to simplify the underlying stock dynamics by assuming that the current state equals the last state as long as the threshold has not been crossed. This allows me to concentrate on the uncertainty about the location of the threshold and to obtain clear analytical results. It also implies that the model most closely represents “flow problems”. A very important example of such a problem is saltwater intrusion, which threatens freshwater reservoirs in many coastal regions around the world (Barlow and Reichard, 2010).
Due to strong pressure from population growth, managers may be tempted to increase the extraction of water from the reservoir beyond what is known to be safe (which is given by the current water table). Because the geological structures are often highly complex, and the spoilage may be irreversible, a manager thus faces the trade-off that increasing extraction may trigger the catastrophic regime shift – or it may prove to be just fine, so that the consumption level that is known to be safe is updated. Another relevant flow problem is harvesting from an annual fishery, such as fishing for shrimp or small pelagics. The magnitude of the annual recruitment of individual shrimp/fish is largely controlled by exogenous environmental factors and is independent of the stock remaining from the last year – unless the biomass has been depleted below its minimum viable population size.

In spite of the simplicity with which the underlying dynamics of the state variable are modeled, it is also possible to apply the model to a “stock problem”. Consider climate change: while it is the temperature level, and not consumption directly, that will cause the climate system to tip into a potentially disastrous state, there is a one-to-one mapping from consumption to temperature and thus to the probability of experiencing the regime shift. The crucial point is that it is the current addition to the stock that causes tipping (even if the event may occur only a long time in the future). In essence, this perspective reinterprets consumption as the utility derived from the current stock of pollution and subsumes all non-catastrophic damages from an increased pollution stock in the utility function. In this way, one can also think of land-use choices as a relevant example.
The supporting services of an ecosystem may stay relatively intact even as large parts of it are permanently transformed and put to other uses (such as turning rainforest into palm-oil plantations). However, once more than a critical part has been converted, the supporting mechanism collapses and the ecosystem tips into a degraded state. Although environmental problems arguably constitute a very important and direct application, my model is more general and can be used to shed light on a variety of processes. For example, the testing of a large-scale machine or a power plant presents engineers with a similar problem: To what level of pressure should they expose a machine to learn about its resistance? When the machine is large and its breakdown costly, it will be optimal to establish a lower bound for the level of pressure that is safe, but not actually find out at which level the machine breaks.

Relation to the literature

This paper links to three strands of literature. First, it contributes to the literature on the management of natural resources under regime shift risk by explicitly analyzing learning about the location of a threshold. Second, it relates to the more general literature by describing optimal experimentation in a set-up of “affirmative learning”. Third, the paper extends the literature on coordination in the face of a catastrophic public bad, which has hitherto been analyzed in a static setting, by showing that the sharp distinction between a known and an unknown location of the threshold vanishes in a dynamic context.

The pioneering contributions that analyze the economics of regime shifts in an environmental/resource context were Cropper (1976) and Kemp (1976). There are by now a good dozen papers on the optimal management of renewable resources under the threat of an irreversible regime shift (see, for example: Tsur and Zemel, 1995; Nævdal, 2006; Brozović and Schlenker, 2011; Lemoine and Traeger, 2014). Polasky et al.
(2011) summarize and characterize the literature by means of a simple fishery model. They contrast whether the regime shift implies a collapse of the resource or merely a reduction of its renewability, and whether the probability of crossing the threshold is exogenous or depends on the state of the system (i.e. is endogenous). They show that resource extraction should be more cautious when crossing the threshold implies a loss of renewability and the probability of crossing the threshold depends on the state of the system. In contrast, exploitation should be more aggressive when a regime shift implies a collapse of the resource and the probability of crossing the threshold cannot be influenced. There is no change in optimal extraction for the loss-of-renewability/exogenous-probability case, and the results are ambiguous for the collapse/endogenous-probability case.

Ren and Polasky (2014) scrutinize the loss-of-renewability/endogenous-probability case for more general growth and utility functions. They point out that in addition to the risk-reduction effect of the linear model in Polasky et al. (2011), there may also be a consumption-smoothing and an investment effect. The consumption-smoothing effect gives incentives to build up a higher resource stock, so that more is available should future growth be reduced. The investment effect gives incentives to draw down the resource stock, as the rate of return after the regime shift is lower. Lemoine and Traeger (2014) call the sum of these two countervailing effects the “differential welfare effect”. In their climate-change application the total effect is positive. As I consider a truly catastrophic event, this effect is zero by construction in the baseline model of the current paper.
Until now, the literature in resource economics has been predominantly occupied with optimal management, leaving aside the central question of how agents’ strategic considerations influence, and are influenced by, the potential to trigger a disastrous regime shift. Still, there are a few notable exceptions: Crépin and Lindahl (2009) analyze the classical “tragedy of the commons” in a grazing game with complex feedbacks, focusing on open-loop strategies. They find that, depending on the productivity of the rangeland, under- or overexploitation might occur. Kossioris et al. (2008) focus on feedback equilibria and analyze, with the help of numerical methods, non-cooperative pollution of a “shallow lake”. They show that, as in most differential games with renewable resources, the outcome of the feedback Nash equilibrium is in general worse than the open-loop equilibrium or the social optimum. Ploeg and Zeeuw (2015b) compare the socially optimal carbon tax to the tax in the open-loop equilibrium under the threat of a productivity shock due to climate change. While it would be optimal in their two-region model to have taxes that converge to a high level, the players apply diverging taxes in the non-cooperative equilibrium: the more developed and less vulnerable region has a lower carbon tax throughout, while the less developed and more vulnerable region has low taxes initially in order to build up a capital stock, but will apply high taxes later on in order to reduce the chance of tipping the climate into the unproductive regime. Fesselmeyer and Santugini (2013) introduce an exogenous event risk into a non-cooperative renewable resource game à la Levhari and Mirman (1980).
As in the optimal management problem with an exogenous probability of a regime shift, the impact of shifted resource dynamics is ambiguous: On the one hand, the threat of a less productive resource induces a conservation motive for all players; on the other hand, it exacerbates the tragedy of the commons as the players do not take the risk externality into account. Finally, Sakamoto (2014) has, by combining analytical and numerical methods, analyzed a non-cooperative game with an endogenous regime shift hazard. He shows that an endogenous hazard may lead to more precautionary management even in a strategic setting. Miller and Nkuiya (2014) also combine analytical and numerical methods to investigate how an exogenous or endogenous regime shift affects coalition formation in the Levhari-Mirman model. They show that an endogenous regime shift hazard increases coalition sizes and, in some cases, allows the players to achieve full cooperation. Taken together, these studies and the current paper show that the effect of a regime shift pulls in the same direction in a non-cooperative setting as under optimal management.

However, both the literature on optimal resource management under regime shift risk and its non-cooperative counterpart have not explicitly addressed learning about the unknown location of the tipping point. There are two papers that discuss optimal experimentation in an environment of affirmative learning: Rob (1991) studies optimal and competitive capacity expansion when market demand is unknown. Rob finds that learning will take place over several periods. Costello and Karp (2004) investigate optimal pollution quotas when abatement costs are unknown. In line with the baseline model of the current paper, they find that any experimentation takes place in the first period only.
The difference between Rob’s model on the one hand and Costello and Karp’s (2004) on the other is the following: In Rob (1991), the information gained by an additional unit of installed capital is small, but so is the cost. However, experimenting too much (in the sense of installing more capital than is needed to satisfy the revealed demand) is very costly compared to experimenting too little (so that the true size of the market remains unknown) several times. Consequently, learning takes place gradually. In the competitive equilibrium, learning is even slower due to the private nature of search costs but the public nature of information. In Costello and Karp (2004), the information gain from an additional unit of quota is small as well, but search costs are very high in the beginning and then decline. In fact, the costs are zero once the quota is non-binding. Thus, although the costs of an experiment are high, there is no additional harm from experimenting too much, and it is therefore optimal to search only once.

In my model, the marginal gain from search is bounded above and decreasing, but the marginal costs of search increase. The disastrous regime shift occurs when the threshold is crossed, irrespective of how far the agents have stepped over it. As in Costello and Karp (2004), there is thus no additional harm in experimenting too much (although the costs of an experiment increase with its size in my model), and it is therefore optimal to search only once. I show that non-cooperative learning is more aggressive than socially optimal because the costs of search are public while the immediate gains are private. Moreover, I show that experimentation is decreasing in the value of the state that is known to be safe: The more the players know that they can safely consume, the less they will be willing to risk triggering the regime shift by enlarging the set of consumption opportunities. This aspect has, to the best of my knowledge, not yet been appreciated.
My discussion of learning about a reversible threshold also extends the pioneering treatment of Groeneveld et al. (2013). I show that their result, that the upper bound of the belief about the threshold’s location is expanded only once, holds more generally for any concave utility function and continuous probability density. But in contrast to what their numerical forward simulation suggests, I find that learning occurs in at most a finite number of steps.

At this point it is important to highlight the difference between the current approach, in which learning is “affirmative”, on the one hand, and, on the other hand, the literature on strategic experimentation (e.g. Bolton and Harris, 1999; Keller et al., 2005; Bonatti and Hörner, 2015) and the literature on learning in a resource management context (e.g. Kolstad and Ulph, 2008; Agbo, 2014; Koulovatianos, 2015). In the latter two strands of literature, learning is “informative” in the sense that the agents obtain a random sample on which they base their inference about the state of the world. It pays to obtain repeated samples as this improves the estimate (of course, the public nature of information introduces free-rider incentives in a strategic setting).

Finally, the current paper is closely related to three articles that discuss the role of uncertainty about the threshold’s location in whether a catastrophe can be avoided. Barrett (2013) shows that players in a linear-quadratic game are in most cases able to form self-enforcing agreements that avoid catastrophic climate change when the location of the threshold is known, but not when it is unknown. Similarly, Aflaki (2013) analyzes a model of a common-pool resource problem that is, in its essence, the same as the stage-game developed in section II–1. Aflaki shows that an increase in uncertainty leads to increased consumption, but that increased ambiguity may have the opposite effect. Bochet et al.
(2013) confirm the detrimental role of increased uncertainty in the stochastic variant of the Nash Demand Game: Even though “cautious” and “dangerous” equilibria co-exist (as they do in my model), they provide experimental evidence that participants in the lab are not able to coordinate on the Pareto-dominant cautious equilibrium.1 However, the models in Aflaki (2013), Barrett (2013), and Bochet et al. (2013) are all static and can therefore not address the prospects of learning. Here, I show that the sharp distinction between a known and an unknown location of a threshold vanishes in a dynamic context. More uncertainty still leads to increased consumption, but this is now partly driven by the increased gain from experimentation.

Analyzing how strategic interactions shape the exploitation pattern of a renewable resource under the threat of a disastrous regime shift is important beyond mere curiosity-driven interest. It is probably fair to say that international relations are characterized by the absence of supranational enforcement mechanisms that would allow binding agreements to be made. But also locally, within the jurisdiction of a given nation, control is seldom complete and the exploitation of many common-pool resources is shaped by strategic considerations. Extending our knowledge of the effect of looming regime shifts by taking non-cooperative behavior into account is therefore a timely contribution to both the scientific literature and the current policy debate.

Plan of the paper

The structure of the paper is simple. In the first part, I focus on learning and explore the effect of various structural assumptions in the context of a first-best social planner problem. In Part II, I focus on strategic interaction and, to do so, revert to the simplest version of the dynamic model. Finally, I discuss extensions to the underlying model of resource dynamics in Part III. All proofs are relegated to the Appendix.

1 Bochet et al. (2013, p.1) conclude that a “risk-taking society may emerge from the decentralized actions of risk-averse individuals”. Unfortunately, it is not clear from the description in their manuscript whether the participants were able to communicate. The latter has been shown to be a crucial factor for coordination in threshold public goods experiments (Tavoni et al., 2011; Barrett and Dannenberg, 2012). Hence, it may be that what they refer to as “societal risk taking” is simply the result of strategic uncertainty.

Part I – The social optimum

In this part of the paper, I introduce the basic modeling framework in a setting where a social planner seeks to find the optimal strategy (section I–1). I first analyze the baseline scenario of an unconstrained optimization in section I–2, showing that all experimentation is undertaken at once, but that the magnitude of experimentation is smaller the more valuable the current state. In section I–3, I develop a simple example with specific functional forms to obtain closed-form solutions. Generalizations and extensions to the baseline scenario, such as incorporating the potential reversibility of a regime shift, are discussed in section I–4.

Although the model that I present below is generic and could fit many different applications, it may be useful to fix ideas by considering a real-world example of an underground aquifer. The overall volume of freshwater in the reservoir is approximately known, and the annual recharge, say from rainfall or melting snow, is sufficient to fully replenish it. However, the manager fears that extracting all the water in the reservoir may cause the intrusion of saltwater. Further, suppose the underwater geology is complex, so that it is not known at which level of the water table saltwater intrusion would occur.
(Suppose further that the location of the threshold cannot be adequately learned by scientific investigation, or that it is feared that the scientific exploration itself would cause the intrusion of saltwater by destabilizing the geological structure.) However, saltwater intrusion has not occurred in the past, so the current level of use is known to be safe. Thus, the manager now faces the trade-off of whether or not to expand the current consumption of water. If she decides to expand the current level of use, by how much should extraction increase, and in how many steps should the expansion occur? With this example in mind, let me now present the formal framework.

I–1 The model

To concentrate on learning and the uncertainty about the location of the threshold T, I strip the model to its bare necessities: Utility at time t is derived from consumption c_t according to a function u (with u′ > 0, u″ ≤ 0). Consumption cannot exceed the resource base, which is given by R (as long as the threshold has not been crossed). I treat the threshold as constant, but its location is unknown. The social planner has a prior belief F about its location, so that we are in a situation of risk (and not Knightian uncertainty; more about the updating of beliefs below). Time is discrete and indexed by t = 0, 1, .... The social planner seeks to maximize the discounted sum of utilities, where the discount factor is given by β. To focus on the effect of a catastrophic threshold, the resource dynamics are simple and given by equation (1):

    R_0 = R;    R_{t+1} = R  if c_t ≤ T and R_t = R,
                R_{t+1} = r  if c_t > T or R_t = r.        (1)

Note that the abstraction from the stock dynamics in the absence of the regime shift need not mean that they do not exist. One could very well imagine a more general, underlying process that generates R, which could be interpreted as the consumption level that would be optimal in the absence of a regime shift risk.
For example, this could be the steady-state harvest level when applying this model to a fishery, or it could be the bliss point where marginal benefits from pollution equal the marginal costs of pollution in an application to climate change. The model formulation of equation (1) simply means that the internal resource dynamics are not relevant for analyzing the tipping point problem.

Note further that the extreme simplicity of the modeling brings some freedom: Should the regime shift have occurred, it is obvious that the best action of the social planner is to set c_t to r for all remaining time. Without loss of generality, one can therefore normalize u(r) = 0. The post-event continuation value is consequently zero, which greatly simplifies the algebra. This need not mean that all economic activity stops once the threshold is crossed; it simply implies that the pre-event value can be interpreted as the benefit that is obtained in addition to some post-event baseline.

While I do discuss extensions and different modeling assumptions (such as a delay in the occurrence, or reversibility, of the regime shift) in section I–4, I would argue that the above model is a very sensible way to analyze the problem. Granted, equation (1) implies that – figuratively speaking – the edge of the cliff is a safe place (choosing c_t = T does not trigger the regime shift), and this does not seem very realistic at first sight. Upon closer inspection, however, one realizes that there are two aspects that could make the edge an unsafe place, and they need to be distinguished here. First, there could be additive disturbances, and second, there could be multiplicative disturbances in the system. Additive disturbances, such as stochastic (white) noise, are independent of the current state and would not affect the calculations in a meaningful way. They could be absorbed in the discount factor.
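The bare-bones dynamics of equation (1) can be sketched in a few lines of code. The normalization R = 1, r = 0, the uniform prior for T, and the particular consumption path below are illustrative assumptions only:

```python
import random

def simulate(path, T, R=1.0, r=0.0):
    """Dynamics of equation (1): the resource stays at R until consumption
    c_t first exceeds the threshold T; thereafter it is stuck at r forever."""
    stock = R
    stocks = []
    for c in path:
        stocks.append(stock)
        if stock == R and c > T:  # threshold crossed for the first time
            stock = r             # the shift is irreversible
    return stocks

# Illustrative run with a threshold drawn from a uniform prior on [0, 1]:
random.seed(1)
T = random.random()
stocks = simulate([0.2, 0.4, 0.6, 0.8], T)
```

Because the post-event value is independent of the pre-event state, any stock path produced this way is non-increasing: once a consumption choice overshoots T, every later entry equals r.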
To be more concrete, think of a sardine or shrimp fishery, and let T be the minimum viable population size, so that the fish stock collapses once the escapement falls below this threshold. Stochastic noise would then mean that T moves for some exogenous reason, such as changes in salinity. Multiplicative disturbances, in contrast, would not be independent of the current state. In the shrimp example, the survival from egg to larva could be viewed as the product of many smaller survival events at the individual level. This would mean that a population collapse is more likely to occur when escapement is just above the average threshold value and T is small than when escapement is just above the average threshold value and T is large. However, this second aspect can be readily accounted for in the probability distribution function.2

2 Note that I rule out a constructivist worldview. T is given by nature. While the distance between me and the edge of the cliff (the threshold) gets smaller because I walk towards it, the threshold does not appear because I am walking.

Let me therefore now turn to the probability of triggering the regime shift. Let the probability density of T on [0, A] be given by a continuous function f, such that the cumulative probability of triggering the regime shift is a priori given by F(x) = ∫_0^x f(τ) dτ. The variable A with R ≤ A ≤ ∞ denotes the upper bound of the support of T. When R < A, there is some probability 1 − F(R) that extracting the entire amount of the resource is actually safe and the presence of a critical threshold is immaterial. When R = A, extracting the entire amount of the resource will trigger the regime shift for sure. Both R and A are known with certainty. Because T is constant, it follows that any segment of the state space that has been explored without observing the threshold is known to be safe, also in the future. It is therefore useful to split the per-period consumption choice into two parts: c_t = s_t + δ_t.
This means:

1. The planner consumes s_t (the amount of the resource that can be used safely).

2. The planner may choose to consume an additional amount δ_t, effectively pushing the boundary of the safe consumption set at the risk of triggering the regime shift.

Knowing that a given exploitation level s is safe, the updated density of T on [s, A] is given by f_s(δ) = f(s + δ)/(1 − F(s)) (Figure 1). The cumulative probability of triggering the regime shift when, so to say, taking a step of distance δ from the safe value s is:

    F_s(δ) = ∫_0^δ f_s(τ) dτ = (1/(1 − F(s))) ∫_0^δ f(s + ξ) dξ = (F(s + δ) − F(s)) / (1 − F(s))

so that F_s(δ) is the discretized version of the hazard rate. I assume that the hazard rate is not decreasing with s. Most proofs do not rely on this assumption, but it simplifies the following exposition considerably.3

3 Note that this assumption is not necessarily inconsequential with respect to the optimal policy: Essentially, it rules out very “rugged” probability distributions, where – figuratively speaking – one could be in a situation such that one would not want to take another step just to the left of a peak in the density, but if one were to the right, one would want to jump very far to the next peak. In other words, this assumption guarantees that the optimal policy is a continuous function of the state. At the end of the day, this assumption is not very strong, as I am most interested in analyzing cases that are sufficiently well behaved to be amenable to real-world applications.

Figure 1: Updating of belief upon learning that T > s. Grey area is F, blue hatched area is F_s. [The figure plots the density of T over [0, R].]

The (Bayesian) updating of beliefs is illustrated in Figure 1. Note that it is only revealed whether the state s is safe or not; no new knowledge about the relative probability that the threshold is located at, say, s₁ or s₂ (with s₁, s₂ > s) has been acquired. Therefore, I call this type of learning “affirmative”.
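The update rule F_s(δ) = (F(s + δ) − F(s))/(1 − F(s)) is mechanical enough to check numerically. As an illustration (the uniform prior on [0, A] is my assumption here, not a restriction of the model), the conditional probability of tipping collapses to δ/(A − s):

```python
def F_uniform(x, A=1.0):
    """Prior cdf of the threshold T: uniform on [0, A] (illustrative choice)."""
    return min(max(x / A, 0.0), 1.0)

def F_cond(s, delta, F):
    """Probability of tipping when stepping delta beyond the known-safe
    value s:  F_s(delta) = (F(s + delta) - F(s)) / (1 - F(s))."""
    return (F(s + delta) - F(s)) / (1.0 - F(s))

# For a uniform prior, the updated tipping probability is delta / (A - s):
s, delta, A = 0.3, 0.2, 1.0
lhs = F_cond(s, delta, lambda x: F_uniform(x, A))
rhs = delta / (A - s)
```

For this prior, the larger the known-safe value s, the larger the conditional probability attached to any fixed step δ, consistent with the hazard-rate monotonicity assumed in the text.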
The absence of any passive learning (an arrival of information simply due to the passage of time) is justified in a situation where all learning opportunities from other, similar resources have been exhausted. The only way to learn more about the location of the threshold in the specific problem at hand is to experiment with it.4 Another explanation for the absence of passive learning is when the resource at hand is very large and unique, such as planet earth when thinking about tipping points in the climate system.

The key expression that I use in the remainder of the paper is F̄_s(δ), which I call the survival function. It denotes the probability that the threshold is not crossed when taking a step δ, given that the event has not occurred up to s. Let F̄(x) = 1 − F(x) and

    F̄_s(δ) = 1 − F_s(δ) = (1 − F(s) − (F(s + δ) − F(s))) / (1 − F(s)) = F̄(s + δ) / F̄(s)        (2)

The survival function has the following properties:

• F̄_s(δ) ∈ [(1 − F(R))/(1 − F(s)), 1] (it is bounded below by the conditional probability that T is not in the interval [s, R]);

• ∂F̄_s(δ)/∂δ = −f(s + δ)/(1 − F(s)) < 0 (the survival probability decreases as the step size increases);

• ∂F̄_s(δ)/∂s = [−f(s + δ)(1 − F(s)) + (1 − F(s + δ))f(s)] / [1 − F(s)]² < 0, because f(s)/(1 − F(s)) < f(s + δ)/(1 − F(s + δ)) by assumption.

I–2 Optimal experimentation

Starting from a given safe value s, the social planner has in principle two options: She can either stay at s (choose δ = 0), thereby ensuring the existence of the resource in the next period (as the probability of crossing the threshold is then 0, or F̄_s(0) = 1). Alternatively, she can take a positive step into unknown territory (choose δ > 0), potentially expanding the set of safe consumption possibilities to s′ = s + δ, albeit at the risk of a resource collapse (as F̄_s(δ) < 1 for δ > 0). The social planner’s “Bellman equation” is thus:

    V(s) = max_{δ ∈ [0, R−s]} { u(s + δ) + β F̄_s(δ) V(s + δ) }        (3)

The crux is, of course, that the value function V(s) is a priori not known.
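A consequence of equation (2) worth making explicit is that survival probabilities telescope: since F̄_s(δ) = F̄(s + δ)/F̄(s), one step of length δ₁ + δ₂ gives exactly the same survival probability as two consecutive steps δ₁ and δ₂, so only discounting distinguishes the two paths. A quick numerical check, with the density f(x) = 2x on [0, 1] chosen purely for illustration:

```python
def F(x):
    """Illustrative prior cdf on [0, 1] with density f(x) = 2x."""
    return x * x

def survival(s, delta):
    """Survival function of equation (2): Fbar_s(delta) = Fbar(s + delta) / Fbar(s)."""
    return (1.0 - F(s + delta)) / (1.0 - F(s))

s, d1, d2 = 0.2, 0.15, 0.25
one_step = survival(s, d1 + d2)
two_steps = survival(s, d1) * survival(s + d1, d2)  # belief updated after step one
```

The two quantities coincide for any continuous prior, not just this one; the telescoping is an identity of the conditional probabilities.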
However, we do know that once the planner has decided not to expand the set of safe consumption possibilities, it cannot be optimal to do so at a later period: If $\delta = 0$ is chosen in a given period, nothing is learned for the future ($s' = s$), so that the problem in the next period is identical to the problem in the current period. If moving in the next period would increase the payoff, it would increase the payoff even more had the move been made one period earlier (as the future is discounted).

To introduce some notation, let $S$ be the set of values $s^*$ at which it is not socially optimal to experiment (as the threat of a disastrous regime shift is too large), and let $\bar s^*$ be the lowest member of this set. In Appendix A–1, part 1, I show that $S$ is not empty, so that for $s \in S$, it is optimal to choose $\delta = 0$. In this case, we know $V(s)$: it is given by $V(s) = \frac{u(s)}{1-\beta}$.

This leaves three possible paths when starting from values of $s_0$ that are not in $S$. The social planner could a.) make one step and then stay, b.) make several, but finitely many, steps and then stay, or c.) make infinitely many steps. Suppose that $S$ is reached in finitely many steps. This implies that there must be a last step. For this last step, we can explicitly write down the objective function, as we know that the continuation value of staying at $s'$ forever is $\frac{u(s')}{1-\beta}$. Denote the social planner's valuation of taking exactly one step $\delta$ from the initial value $s$ and then staying at $s'$ forevermore by $\varphi(\delta; s)$, and denote by $\delta^*(s)$ the optimal choice of the last step. Formally:

\[ \varphi(\delta; s) = u(s+\delta) + \beta \bar F_s(\delta) \frac{u(s+\delta)}{1-\beta} \tag{4} \]

This yields the following first-order condition for an interior solution:

\[ \varphi'(\delta; s) = u'(s+\delta) + \frac{\beta}{1-\beta} \left[ \bar F_s'(\delta)\, u(s+\delta) + \bar F_s(\delta)\, u'(s+\delta) \right] = 0 \tag{5} \]

Note that we need not have an interior solution, so that $\delta^*(s) = 0$ when $\varphi'(\delta; s) < 0$ for all $\delta \in (0, R-s]$, and $\delta^*(s) = R - s$ when $\varphi'(\delta; s) > 0$ for all $\delta \in [0, R-s)$. That is:

\[ \delta^*(s) = \max\big\{ 0 \; ; \; \min\{ \arg\max \varphi(\delta; s) \; ; \; R - s \} \big\} \tag{6} \]

With this explicit functional form in hand, I can show that it is better to traverse any given distance in one step rather than in two steps before remaining standing (Appendix A–1, part 2). A fortiori, this holds for any finite sequence of steps. Also, an infinite sequence of steps cannot yield a higher payoff, since the first step towards $S$ will be arbitrarily close to $S$, and discounting ensures that there is no gain from never actually reaching $S$. The intuition is the following: Given that it is optimal to eventually stop at some $s^* \geq \bar s^*$, the probability that the threshold is located on the interval $[s_0, s^*]$ is exogenous. Hence the probability of triggering the regime shift when going from $s_0$ to $s^*$ is the same whether the distance is traversed in one step or in many steps. Due to discounting, the earlier the optimal safe value $s^*$ is reached, the better. In other words, given that one has to walk out into the dark, it is best to take a deep breath and get to it.

The first-best consumption pattern is summarized by the following proposition:

Proposition 1. The socially optimal total use of the resource is either $s_0$ for all $t$, or $s_0 + \delta^*(s_0)$ for $t = 0$ and, if the resource has not collapsed, $s_1$ for all $t \geq 1$. In other words, any experimentation – if at all – is undertaken in the first period.

Proof. The proof is given in Appendix A–1.

In short, the dynamics of learning are stunted: For initial values of $s$ below some threshold $\bar s^*$, it is optimal to make exactly one step and then stay at the updated value $s'$ forever (provided $T$ is not located between $s$ and $s'$, of course).

4 An everyday example is blowing up a balloon: We all know that balloons burst at some point, and we have blown up sufficiently many balloons, or seen our parents blow up sufficiently many balloons, to have a good idea which size is safe. But for the given balloon at hand, I do not know when it will burst.
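The one-step property can also be checked numerically: splitting the distance to the optimal stopping point into two steps is dominated by taking it at once. A sketch with the $\sqrt{\cdot}$-utility and uniform prior of the paper's later example; the particular values of $\beta$, $A$, and $s$ are my own illustrative choices:

```python
from math import sqrt

# Numerical check of the one-step property: traversing the optimal distance
# at once dominates any split into two steps. Functional forms as in the
# paper's example (sqrt utility, T uniform on [0, A]); the parameter values
# are illustrative choices of this sketch, not the paper's.
beta, A, R = 0.8, 1.0, 1.0
s = 0.1

def u(c):
    return sqrt(c)

def surv(a, d):
    """Survival probability of a step d from the safe value a (uniform prior)."""
    return (A - a - d) / (A - a)

def one_step(d):
    """Take the whole distance d at once, then remain standing forever."""
    return u(s + d) + beta * surv(s, d) * u(s + d) / (1 - beta)

def two_steps(d1, d2):
    """Take d1, then (if the shift was not triggered) d2, then stay."""
    stay = u(s + d1 + d2) / (1 - beta)
    return (u(s + d1)
            + beta * surv(s, d1) * (u(s + d1 + d2) + beta * surv(s + d1, d2) * stay))

# Optimal last step: maximize the one-step value phi on a fine grid.
grid = [i * (R - s) / 10_000 for i in range(10_001)]
d_star = max(grid, key=one_step)

# Any interior split of d_star into two steps does strictly worse.
best_split = max(two_steps(x * d_star / 100, d_star - x * d_star / 100)
                 for x in range(1, 100))
assert one_step(d_star) > best_split
```

Note that the dominance is a statement about the optimal distance $\delta^*(s)$: the split delays part of the risk, but at the optimum the forgone first-period utility always outweighs that delay.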
For initial values of $s$ in $S$, it is optimal to never expand the set of safe consumption possibilities.

Note that the setup here explicitly allows the planner to take any step size she wishes. In many real-world applications, this would not be possible. There could, for example, be capacity constraints on harvesting, or convex adjustment costs that make it prohibitively costly to take large steps. In this sense, the current section analyses an ideal case, whereas constraints on the choice set are discussed in more detail in section I–4.2.

Proposition 2 then shows that the more the social planner knows, the less she wants to learn. In other words, the degree of experimentation is declining in $s$. The intuition for this effect is clear: The more valuable my current outside option, the less I can gain from an increased consumption set, but the more I can lose should the experiment trigger the regime shift. This implies that the largest step is undertaken when $s = 0$, which is reminiscent of Janis Joplin's dictum that "freedom is just another word for nothing left to lose".

Proposition 2. The socially optimal step size $\delta^*(s)$ is decreasing in $s$.

Proof. The proof is placed in Appendix A–2.

I–3 A simple example with specific functional forms

For a given utility function and a given probability distribution of the threshold's location, it is possible to find closed-form solutions for $\delta^*(s)$ and the value function $V(s)$. Below, I do this for $u(c) = c^{\frac{1}{2}}$ and a uniform probability distribution, for which it is especially easy to explicitly solve (5). When the social planner thinks that every value in $[0, A]$ is equally likely to be the threshold, we have $f = \frac{1}{A}$, and accordingly $\bar F_s(\delta) = \frac{A-s-\delta}{A-s}$. Consequently, we have

\[ \varphi(\delta; s) = (s+\delta)^{\frac{1}{2}} + \beta\, \frac{A-s-\delta}{A-s}\, \frac{(s+\delta)^{\frac{1}{2}}}{1-\beta} \tag{7} \]

This yields the following first-order condition for an interior solution:

\[ \varphi'(\delta; s) = \frac{1}{2}(s+\delta)^{-\frac{1}{2}} + \frac{\beta}{1-\beta} \left[ -\frac{(s+\delta)^{\frac{1}{2}}}{A-s} + \frac{A-s-\delta}{A-s}\, \frac{1}{2}(s+\delta)^{-\frac{1}{2}} \right] = 0 \tag{8} \]

which can be solved for:

\[ \delta^* = \frac{A - (1+2\beta)s}{3\beta} \tag{9} \]

Recall that there need not be an interior solution. When $s \geq \bar s^*$, it is optimal to never experiment and to remain at the initial safe value $s$. But when the future is discounted heavily (so that $\beta$ is very low), there may also be a range of initial values for which it is not optimal to choose an interior step size $\delta^* \in (0, R-s)$, but rather to try immediately whether consuming the entire resource triggers the regime shift. Choosing $\delta(s) = R - s$ could even be the optimal action when it is known to cause the catastrophe (i.e. when $A = R$), namely when the current consumption of $R$ is more valuable than staying below $R$ in the present period in order to increase the chances of continued consumption in the future. Denote by $\underline s^*$ the largest member of the set of initial values at which it is optimal to consume the entire resource. $\underline s^*$ is found by solving $R - s = \frac{A-(1+2\beta)s}{3\beta}$. At $\bar s^*$ it is optimal to remain standing, so that $\bar s^*$ is found by solving $0 = \frac{A-(1+2\beta)s}{3\beta}$. Thus:

\[ \underline s^* = \max\left\{ 0, \frac{A - 3\beta R}{1-\beta} \right\} \qquad \bar s^* = \min\left\{ \frac{A}{1+2\beta}, R \right\} \]

Figure 2 plots the socially optimal extension $\delta$ of the safe consumption set.

Figure 2: Illustration of the policy function $\delta(s)$. The blue circles represent the socially optimal extension $\delta$ of the safe consumption set $s$ (on the y-axis) as a function of the safe consumption set on the x-axis (where obviously $s \leq R$ and $\delta \in [0, R-s]$). For values of $s$ below $\underline s^*$, it is optimal to consume the entire resource (choose $\delta(s) = R - s$). For values of $s$ above $\bar s^*$, it is optimal to remain standing (choose $\delta(s) = 0$). The discount factor is set to $\beta = 0.32$ to illustrate the values $\underline s^*$ and $\bar s^*$.

I–4 Generalizations and Extensions

In this section, I relax several of the underlying assumptions of the model.
I discuss how optimal experimentation changes when the regime shift occurs only after some delay (section I–4.1), how an exogenously growing upper bound of the consumption possibility set, $R_{t+1} \geq R_t$, changes the optimal consumption pattern (section I–4.2), and I analyze optimal experimentation when the regime shift is reversible (section I–4.3). Throughout, I keep the assumption that the resource replenishes fully as long as the threshold has not been crossed. More complex resource dynamics, where the upper bound of the consumption possibility set in the next period explicitly depends on what is left behind in this period, are addressed in Part III of the paper.

I–4.1 Delay in the occurrence of the regime shift

Consider a situation where the social planner, in a given period, observes only with some probability whether she has crossed the threshold. In fact, it is not unreasonable to model the true process of the resource as hidden, manifesting itself only after some delay (see Gerlagh and Liski (2014) for a recent paper that focuses on this effect in the context of optimal climate policies). Hence, as time passes, the planner will update her beliefs about whether the threshold is located on the interval $[s_t, s_t + \delta_t]$. How does this passive learning affect optimal experimentation? Although solving for the optimal consumption decision becomes extremely difficult – due to the delay, the problem is no longer Markovian – it is possible to show the following:

Proposition 3. Even when crossing the threshold at time $t$ triggers the regime shift only at some (potentially uncertain) time $\tau > t$, it is still optimal to experiment – if at all – in the first period only.

Proof. The proof is given in Appendix A–3.

In other words, the fact that the learning dynamics are stunted is robust to a delay in the occurrence of the regime shift. This does of course not imply that the optimal decisions under the two different models will be the same.
They almost surely will differ, as delaying the consequences of crossing the threshold decreases the costs of experimentation. Yet, as the planner only learns that she has crossed the threshold when the disastrous regime shift actually occurs, she cannot capitalize on this delay by trying to expand the set of safe consumption possibilities several times.

I–4.2 Growing R and constraints on the choice set

Previously, the upper bound of the resource, $R$, has been treated as known and constant. In this subsection, I depart from this assumption and consider the case when $R$ increases (but $f$ and $T$ remain unchanged). Formally, the resource dynamics can be expressed as:

\[ R_{t+1} = \begin{cases} G(R_t) & \text{if } \sum_i c_t^i \leq T \\ r & \text{if } \sum_i c_t^i > T \text{ or } R_t = r \end{cases} \tag{10} \]

where $G'(R) > 0$. In this situation, there is scope for a continued expansion of the set of safe consumption possibilities, but only as long as the upper bound of the available resource at time $t$, $R_t$, is binding. As $R_t$ can, by construction, not exceed $A$, we know from the proof of Proposition 1 that there will be some point at which it is not socially optimal to further expand the set of safe consumption possibilities. Thus, once $\delta_t(s_t) < R_t - s_t$ for some $t = \tau$, we have $\delta_t = 0$ for all $t > \tau$. That said, a growing $R_t$ may induce several periods in which $\delta_t(s_t) = R_t - s_t$. The validity of this conclusion can easily be checked by observing that the first-order condition for an interior choice of $\delta_t$ (equation 5) does not depend on $R$. Note that this argument also shows that uncertainty about $R$ is immaterial for the optimal learning dynamics. Similarly, constraints on the choice set (such that $\delta \in [0, \delta_{max}]$ where $\delta_{max} < R - s$) will mechanically lead to repeated experimentation.
When the first-best unconstrained expansion of the set of safe consumption possibilities is $\delta^*(s_0)$, but $\delta_{max}$ is such that it requires several steps to traverse the distance $\delta^*(s_0)$, then the safe value $s$ will be updated sequentially (conditional on not triggering the regime shift, of course). Note that it follows directly from Proposition 2 that a constrained choice set implies an overall more cautious plan: As $\delta^*(s)$ is declining in $s$, it will at some point no longer be optimal to choose $\delta_{max}$; rather, an interior step size $\delta \in [0, \delta_{max}]$ will be optimal.

I–4.3 Reversible regime shifts

So far, the regime shift was assumed to be irreversible. While this simplifies the analysis, it may not be an adequate description for all applications. For some underground aquifers, it may be possible to desalinate contaminated reservoirs, or a change in the surrounding hydrological conditions may result in a recharge of freshwater. In this subsection, I therefore analyze the situation where the upper bound of the consumption possibility set returns to $R$ after the system has spent $l$ periods in the unproductive regime (where $l > 0$). Clearly, the lag $l$ that the system spends in the unproductive regime could also be interpreted as the cleanup cost caused by an active effort to reverse the regime shift (e.g. desalination).

What we observe in case the step $\delta$ implies exceeding $T$, so that we tip from the productive into the unproductive regime, is a critical modeling choice. On the one hand, one could presume that while the choice of $\delta$ is set and may push us over the edge, we will at least observe where the edge has been.
On the other hand, one can presume that the only thing we observe is that the regime shift has occurred, so that the threshold must lie between $s$ and $s + \delta$, but we do not know exactly where.5

5 The latter type of learning is a little bit like sitting in a car, fixing the course to a destination, and then blindfolding oneself. Conscious experimentation, however, is more realistically described by saying that the course is set, but the eyes remain open.

In discussing optimal experimentation for these two cases in turn, I will assume that $F(R) = 1$. That is, the social planner knows that the threshold is for sure somewhere on the interval between 0 and $R$. Keeping the possibility that $F(R) < 1$ makes the analysis significantly more tedious without yielding additional insights.

Location of the threshold is discovered if it is crossed. When we take the first modeling route, the continuation value in case of a negative regime shift is given by a spell in which the resource is in its unproductive state, the per-period payoff being $u(r) = 0$. When the resource has recovered after a lag of length $l$, the continuation value is given by $\frac{u(T)}{1-\beta}$, as the threshold has been discovered. As the location of $T$ is unknown, the social planner evaluates the payoff in case of a reversible regime shift at its expected value. When experimentation occurs in the first period only (as will be shown below), the Bellman equation for this problem can be written as:

\[ V(s) = \max_{\delta \in [0, R-s]} \left\{ u(s+\delta) + \beta \left[ \bar F_s(\delta) \frac{u(s+\delta)}{1-\beta} + \beta^l \, \frac{\int_s^{s+\delta} u(y) f(y)\,dy}{1-\beta} \right] \right\} \tag{11} \]

where $\beta^l \in (0, 1)$ is the discount factor that accounts for the time the system spends in the unproductive regime before recovering. The larger $l$, the stronger the hysteresis. Clearly, as $l \to \infty$, the reversible case approaches the irreversible case discussed above. Because the consequences of a regime shift, should it occur, are less malign than in the irreversible case, the optimal step size will be larger.
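Maximizing the two one-step objectives – that of equation (5) and the maximand of (11) – confirms the bolder step under reversibility. A sketch with the example's $\sqrt{\cdot}$-utility and uniform prior; the values of $\beta$, $s$, and the lag $l$ are my own illustrative choices:

```python
from math import sqrt

# Compare the optimal one-step size under an irreversible collapse with the
# reversible case. Functional forms as in the example: u(c) = sqrt(c), T
# uniform on [0, A] with A = R = 1 (so F(R) = 1); beta, s, and l are
# illustrative choices of this sketch, not the paper's.
beta, A, R, l = 0.7, 1.0, 1.0, 3
s = 0.2

def u(c):
    return sqrt(c)

def surv(d):                    # survival probability of a step d from s
    return (A - s - d) / (A - s)

def exp_gain(d):                # integral of u(y) f(y) dy over [s, s+d], f = 1/A
    return (2.0 / 3.0) * ((s + d) ** 1.5 - s ** 1.5) / A

def phi_irrev(d):               # one step; collapse is forever
    return u(s + d) + beta * surv(d) * u(s + d) / (1 - beta)

def phi_rev(d):                 # one step; resource returns after l lean periods
    return u(s + d) + beta * (surv(d) * u(s + d)
                              + beta ** l * exp_gain(d)) / (1 - beta)

grid = [i * (R - s) / 400 for i in range(401)]
d_irrev = max(grid, key=phi_irrev)
d_rev = max(grid, key=phi_rev)
assert d_rev > d_irrev          # reversibility makes the experiment bolder
```

As $l \to \infty$ the extra term vanishes and the two step sizes coincide, in line with the text.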
This becomes clear when inspecting the derivative of the maximand with respect to $\delta$:

\[ u'(s+\delta) + \frac{\beta}{1-\beta} \left[ \bar F_s'(\delta)\, u(s+\delta) + \bar F_s(\delta)\, u'(s+\delta) + \beta^l u(s+\delta) f(s+\delta) \right] \tag{12} \]

In comparison with the first-order condition in the irreversible case, equation (5), there is an additional positive term, $\beta^l u(s+\delta) f(s+\delta)$. Therefore, the function described by (12) will cross the x-axis at a larger value of $\delta$ than the function described by (5). Clearly, this additional term is larger the smaller is $l$.

Note, however, that a region of the state space may still remain unexplored. The probability of crossing the threshold and incurring the cost of the regime shift is increasing in $\delta$, but the marginal gain from going yet a little further is constant or decreasing, so that the expected net gain is decreasing. Whether or not it pays to explore the entire state space will depend on the length of the time lag in the unproductive regime. Finally, it will be optimal to experiment only once. Again, the argument is that if it were optimal to explore the state space in some interval $(s_1, s_2]$ in the second period, it would also have been optimal to do so in the first period, and due to discounting, it must be better to do so right away. The following proposition summarizes this discussion:

Proposition 4. When the regime shift is reversible after a lag of length $l$ and $T$ is revealed when $s_t + \delta_t > T$, then any experimentation is undertaken in the first period, and the size of the first step, $\delta^*(s_0)$, is larger the smaller is $l$. Depending on $l$ and the initial safe value $s_0$, a range of the state space remains permanently unexplored.

Proof. The proof is placed in Appendix A–4.

Location of the threshold remains unknown if it is crossed.
When we take the second modeling route, according to which the social planner does not learn the exact location of the threshold upon crossing it, an experiment still reveals useful information: The social planner can now update the upper bound of $T$'s distribution. Denote this upper bound by $U_t$. We have $U_0 = R$ and, if the threshold has been crossed, $U_{t+1} = s_t + \delta_t$. Consequently, we need to explicitly account for how the upper bound changes when formulating the planner's value function.

In addition, note that the social planner can always secure herself a payoff of $\frac{u(s)}{1-\beta}$ by simply remaining standing at $s$. Moreover, she can secure herself a payoff of $\frac{u(R)}{1-\beta^{l+1}}$ by going all the way to $R$: taking into account that this triggers the regime shift for sure, she waits out the lag of $l$ periods and then does the same again. Obviously, the planner can also choose an interior step size $\delta \in (0, U - s)$ (any choice of $\delta \in [U - s, R - s)$ cannot be optimal). Denote the payoff from an interior choice by $J$. The planner's value function is then given by:

\[ V(s, U) = \max \left\{ \frac{u(s)}{1-\beta} \; ; \; J(s, U) \; ; \; \frac{u(R)}{1-\beta^{l+1}} \right\} \tag{13} \]

where

\[ J(s, U) = \sup_{\delta \in (0, U-s)} \left\{ u(s+\delta) + \beta \left[ \frac{\int_{s+\delta}^{U} f(y)\,dy}{\int_s^{U} f(y)\,dy}\, V(s+\delta, U) + \beta^l\, \frac{\int_s^{s+\delta} f(y)\,dy}{\int_s^{U} f(y)\,dy}\, V(s, s+\delta) \right] \right\} \]

By now, it will not be surprising that the first step will be the farthest, if it is optimal to experiment at all. If the first step does not trigger the regime shift, we learn that $T > s_0 + \delta_0^*$, and we will stay at $s_1 = s_0 + \delta_0^*$ forever (see also section 2.5 in Groeneveld et al., 2013). This implies that also here, a region of the state space will remain permanently unexplored. If the first step triggers the regime shift, we will – after living through a spell of low resource productivity – have the same knowledge about $s$, but we will have updated the upper bound $U$.
The important thing to note is that there will be a critical value $\hat U$ below which it does not pay to experiment further: as the expected gain is small but the probability of a regime shift is large when $U$ becomes small, it will be better to remain standing at $s$ or to oscillate between consuming $R$ and $r$. The optimal consumption pattern is therefore characterized by the "stopping rule" described in Proposition 5.

Proposition 5. When the regime shift is reversible after a lag of length $l$ and $T$ is not revealed when $s_t + \delta_t > T$, it is either not optimal to experiment at all, or there is repeated experimentation with decreasing step sizes $\delta_t > \delta_{t+1}$. Experimentation stops the moment that $s_t + \delta_t < T$ or $s_t + \delta_t < \hat U$.

Proof. The proof is placed in Appendix A–5.

Part II – The non-cooperative game

While the first part of the paper investigates optimal experimentation by a social planner, I now analyze how strategic interactions affect learning and resource use under the threat of a regime shift. To do so, I go back to the simplest case of an irreversible threshold. Below, I introduce the modifications of the basic model that account for the game-theoretic setting. In section II–2, I first look at the case when the location of the threshold is known, in order to expose the underlying strategic structure. Section II–3 contains the main result and section II–4 gives comparative statics. Section II–5 returns to the specific functional forms assumed in section I–3 in order to compare the non-cooperative equilibrium to the social optimum.

II–1 The model

There are $N$ identical players that share a common resource whose dynamics are described by equation (1). Again, we can think of an underground freshwater reservoir that is accessed by several well-owners. All players have the same belief $F$ about the location of the threshold $T$. Upon learning that a value $s$ of the consumption possibility set is safe, all players update their belief according to equation (2).
Furthermore, I continue to assume that $\frac{\partial \bar F_s(\delta)}{\partial s} < 0$, so that the hazard rate is increasing in $s$. At time $t$, each player may choose to consume an amount $\delta_t^i$ more than $s_t$, effectively pushing the boundary of the safe consumption set at the risk of triggering the regime shift. In other words, $\delta_t^i$ is the effective choice variable, with $\delta_t^i \in [0, R - s_t - \delta_t^{-i}]$, where $\delta_t^{-i}$ is the extension of the safe consumption set by all other players. I denote $\delta$ without superscript $i$ as the total extension of the safe set, i.e. $\delta_t = \sum_{i=1}^N \delta_t^i$.

It is well known that the static non-cooperative game of sharing a given resource has infinitely many equilibria. Here, I focus on symmetric pure-strategy equilibria. That is, any safe value $s_t$ is shared equitably. Moreover, the game requires a statement about the consequences when the sum of the players' consumption plans exceeds the total available resource. In this case, I assume that the resource is rationed so that each player gets an equal share. The objective of the players is to choose that sequence of extension decisions $\Delta^i = \{\delta_0^i, \delta_1^i, \dots\}$ which, for given strategies of the other players $\Delta^{-i}$ and for a given initial value $s_0$, maximizes the sum of expected per-period utilities, discounted by a common factor $\beta$ with $\beta \in (0, 1)$. I concentrate on Markovian strategies. For now, the model abstracts from the dynamic common-pool problem in the sense that the consumption decision of a player today has no effect on the consumption possibilities tomorrow, except that a.) the set of safe consumption possibilities may have been enlarged and b.) the disastrous regime shift may have been triggered.

II–2 Preliminary step: known threshold location

To expose the underlying strategic structure, I consider the case when the location of the threshold $T$ is known. What is the first-best outcome in such a situation? When $T$ is large, the first-best is to indefinitely use exactly that amount of the resource which does not cause the regime shift.
However, when $T$ is small (so that a large part of the available resource $R$ must be foregone to ensure its continued existence), it will be socially optimal to cross the threshold and deplete the resource immediately. How small $T$ must be for depletion to be optimal depends on the discount factor $\beta$: the less one discounts the future, the more willing one is to sacrifice today's consumption to ensure consumption in the future.

The non-cooperative game then has two equilibria in pure strategies: Either the players deplete the resource immediately, or they coordinate on staying at the threshold. For a given safe value of total consumption $s$, player $i$'s value function is:

\[ V^i(s) = \max_{\delta^i} \; u(s/N + \delta^i) + \mathbb{I} \cdot \beta V^i(s), \quad \text{where } \mathbb{I} = \begin{cases} 1 & \text{when } s + \delta^i + \delta^{-i} \leq T \\ 0 & \text{when } s + \delta^i + \delta^{-i} > T \end{cases} \tag{14} \]

Due to the stationarity of the model structure, it is clear that if staying at the threshold can be rationalized in any one period, it can be rationalized in every period. The payoff from avoiding the regime shift is $\frac{u(T/N)}{1-\beta}$. Conversely, the payoff from deviating and immediately depleting the resource when all other players' policy is to stay at the threshold is given by $u\!\left(R - \frac{N-1}{N}T\right)$. Staying at the threshold can thus be sustained as a Nash equilibrium whenever $\frac{u(T/N)}{1-\beta} \geq u\!\left(R - \frac{N-1}{N}T\right)$. Denote by $\bar\beta$ the value of $\beta$ for which this condition holds with equality (i.e. $\bar\beta$ is the lowest discount factor for which staying at the threshold can be sustained for given values of $N$, $T$, and $R$). We have:

\[ \bar\beta = 1 - \frac{u(T/N)}{u\!\left(R - \frac{N-1}{N}T\right)} \tag{15} \]

In fact, there will always be a parameter combination such that the first-best can be supported as a Nash equilibrium of the game with a known threshold (Proposition 6). Given these conditions, the game exhibits the structure of a coordination game. Here, as in the static game of Barrett (2013, p. 236), "nature herself enforces an agreement to avoid catastrophe."

Proposition 6.
When the location of the threshold is known with certainty, then there exists, for every combination of $N$, $T$, and $R$, a value $\bar\beta$ such that the first-best can be sustained as a Nash equilibrium when $\beta \geq \bar\beta$. The larger is $N$, or the closer $T$ is to 0, the larger $\beta$ has to be.

Proof. The proof is placed in Appendix A–6.

II–3 Non-cooperative equilibrium when the location of T is unknown

The game has two equilibria in pure strategies: an "aggressive" equilibrium where the players immediately deplete the resource, and a "cautious" equilibrium where there is a positive probability that the resource is maintained forever. In fact, there are two types of the "cautious" equilibrium, depending on the initial value $s$. Similar to section I–2, I define $\bar s^{nc}$ to be the lowest member of the set of safe values at which a Nash equilibrium with the players choosing $\delta = 0$ can be supported. For $s \geq \bar s^{nc}$, the cautious equilibrium thus conserves the resource with probability 1. I define $\underline s^{nc}$ to be the largest member of the set at which depletion is the only equilibrium. For values of $s \in (\underline s^{nc}, \bar s^{nc})$, the "cautious" equilibrium implies that the players experiment once (and the regime shift thus occurs with positive probability). In general, the "Bellman equation" of player $i$ can (for a given strategy of the other players $\Delta^{-i}$) be expressed as:

\[ V^i(s, \Delta^{-i}) = \max_{\delta^i \in [0, R-s]} \; u(s/N + \delta^i) + \beta \bar F_s(\delta^i + \delta^{-i})\, V^i(s+\delta, \Delta^{-i}) \tag{16} \]

Also here, the crux is that $V^i$ is a priori unknown. Similar to the analysis in part I, I denote by $\phi$ the value for player $i$ of taking exactly one step of size $\delta^i$ and then remaining standing when the other players' strategy is to do the same (i.e.
$\Delta^{-i} = \{\delta^{-i}, 0, 0, 0, \dots\}$):

\[ \phi(\delta^i; \delta^{-i}, s) = u\!\left(\frac{s}{N} + \delta^i\right) + \beta \bar F_s(\delta^i + \delta^{-i})\, \frac{u\!\left(\frac{s + \delta^i + \delta^{-i}}{N}\right)}{1-\beta} \tag{17} \]

The derivative of $\phi$ with respect to $\delta^i$ is given by:

\[ \phi'(\delta^i; \delta^{-i}, s) = u'\!\left(\frac{s}{N} + \delta^i\right) + \frac{\beta}{1-\beta} \left[ \bar F_s'(\delta^i + \delta^{-i})\, u\!\left(\frac{s + \delta^i + \delta^{-i}}{N}\right) + \frac{1}{N} \bar F_s(\delta^i + \delta^{-i})\, u'\!\left(\frac{s + \delta^i + \delta^{-i}}{N}\right) \right] \tag{18} \]

Let $g(\delta^{-i}, s)$ be the interior solution to the first-order condition of maximizing $\phi(\delta^i; \delta^{-i}, s)$. The best-reply function for player $i$, $\delta^{i*}(\delta^{-i}, s)$, is then:

\[ \delta^{i*}(\delta^{-i}, s) = \begin{cases} 0 & \text{if } s \geq \bar s^{nc} & (19a) \\ g(\delta^{-i}, s) & \text{if } s \in (\underline s^{nc}, \bar s^{nc}) & (19b) \\ R - s - \delta^{-i} & \text{if } s \leq \underline s^{nc} & (19c) \end{cases} \]

For a symmetric step size $\delta^{-i} = (N-1)\delta^i$, we have:

\[ \phi'(\delta^{nc}; s) = u'\!\left(\frac{s}{N} + \delta^{nc}\right) + \frac{\beta}{1-\beta} \left[ \bar F_s'(N\delta^{nc})\, u\!\left(\frac{s}{N} + \delta^{nc}\right) + \frac{1}{N} \bar F_s(N\delta^{nc})\, u'\!\left(\frac{s}{N} + \delta^{nc}\right) \right] \tag{20} \]

Proposition 7. The set of Markov strategies

\[ \delta^i(s) = \begin{cases} 0 & \text{if } s \geq \bar s^{nc} \\ \delta^{nc}(s) & \text{if } s \in (\underline s^{nc}, \bar s^{nc}) \\ \frac{R-s}{N} & \text{if } s \leq \underline s^{nc} \end{cases} \]

where $\delta^{nc}(s)$ is defined by the interior solution to (20), constitutes a feedback Nash equilibrium. That is, for $s_0 \geq \bar s^{nc}$, coordination to stay at $s_0$ can be supported as a Nash equilibrium. For $s_0 < \bar s^{nc}$, taking one step and then staying at $s_1 = s_0 + N\delta^{nc}$ can be supported as a Nash equilibrium.

Proof. The proof is given in Appendix A–7.

Obviously, the best reply for player $i$ when all other players plan to expand the consumption set by $R - s$ is to choose $R - s$ as well. This ensures that the player at least gets an equal share of $R$. I call this equilibrium, in which the resource is immediately depleted, the "aggressive equilibrium", and the equilibrium described in Proposition 7 the "cautious equilibrium". Note that, for a given $s$, both the "cautious" and the "aggressive" equilibrium are unique.6 In short, the game has the structure of a coordination problem where the immediate depletion of the resource may become a self-fulfilling prophecy.

6 Uniqueness of the latter type of equilibrium simply follows from the assumption that, in case of incompatible demands, the resource is shared equally among the players. Uniqueness of the symmetric "cautious equilibrium" (should it entail $\delta^{nc}(s) < \frac{R-s}{N}$) can be established by contradiction. Suppose all other players $j \neq i$ choose to expand the consumption set to a level at which – should the threshold not have been crossed – no player would have an incentive to go further. Player $i$'s best reply cannot be to choose $\delta^i = 0$ in this situation, as the gains from making a small positive step (which are private) exceed the (public) costs of advancing a little further. Hence, the only equilibrium at which the players expand the consumption set once is the symmetric one.
Indeed, for $s \leq \underline s^{nc}$, the immediate depletion of the resource cannot be avoided in a non-cooperative setting, despite the fact that there is a range of initial values ($s \in [\underline s^*, \underline s^{nc}]$) for which it would be socially optimal to conserve the resource indefinitely with positive probability. For $s \in [\underline s^{nc}, \bar s^{nc}]$, the strategic interactions imply that experimentation is inefficiently large. However, should it turn out that $s' = s + N\delta^{nc}$ is safe, this consumption pattern is ex-post socially optimal. Figure 3 illustrates the aggregate expansion of the set of safe consumption possibilities in the cautious equilibrium and contrasts it with the social optimum.

Figure 3: Illustration of the policy function $\delta(s)$, social optimum vs. non-cooperation. The blue circles represent the socially optimal extension $\delta$ of the safe consumption set $s$ as discussed in section I–3 above. The red dashed line plots the cautious non-cooperative equilibrium, showing how $\underline s^* \leq \underline s^{nc}$ and $\bar s^* \leq \bar s^{nc}$ (but note that in some cases we may even have $\bar s^{nc} < \bar s^*$). It illustrates how even the "cautious" experimentation under non-cooperation implies excessive risk-taking.
The figure also shows that the first-best and the non-cooperative outcome may coincide for very low and very high values of $s$. A value of $\beta = 0.32$ has been chosen to illustrate all four values $\underline s^*$, $\underline s^{nc}$, $\bar s^*$, and $\bar s^{nc}$.

Faced with this coordination problem, the question arises which of the two equilibria we can expect to be selected. Clearly, the "cautious equilibrium" pareto-dominates the "aggressive equilibrium".7 With rational players and without strategic uncertainty, the cautious equilibrium would thus be the outcome of the game. But what happens when the players are uncertain about the other players' behavior? As the disastrous regime shift is irreversible, there is no room for dynamic processes that lead players to select the pareto-dominant equilibrium (Kim, 1996). Therefore, I turn to the static concept of risk-dominance (Harsanyi and Selten, 1988). Since the game is symmetric, applying the criterion of risk-dominance for equilibrium selection has the intuitive interpretation that the cautious equilibrium is selected if, when player $i$ assigns probability $p$ to the other players playing aggressively, the payoff from playing cautiously (i.e. choosing $\delta^i(s) = \delta^{nc}(s)$) exceeds the payoff from playing aggressively (i.e. choosing $\delta^i(s) = R - s$). Obviously, whether the cautious or the aggressive equilibrium is risk-dominant will depend both on this probability $p$ and on the safe value $s$.

7 This follows immediately from the fact that, by definition, $\delta^{nc}(s)$ is the interior solution to the symmetric maximization problem (16) (with $\delta^{-i} = (N-1)\delta^{nc}$), where the policy $\delta(s) = R - s$ was an admissible candidate.
We can, for a given safe value $s$, solve for the probability $p^*$ at which the player is just indifferent between playing cautiously and aggressively:

\[ p^* \cdot \pi_{[\text{all aggressive}]} + (1-p^*) \cdot \pi_{[\text{only } i \text{ aggressive}]} = p^* \cdot \pi_{[\text{only } i \text{ cautious}]} + (1-p^*) \cdot \pi_{[\text{all cautious}]} \]

\[ \Leftrightarrow \quad p^* = \frac{\pi_{[\text{all cautious}]} - \pi_{[\text{only } i \text{ aggressive}]}}{\big(\pi_{[\text{all cautious}]} - \pi_{[\text{only } i \text{ aggressive}]}\big) - \big(\pi_{[\text{only } i \text{ cautious}]} - \pi_{[\text{all aggressive}]}\big)} \]

In the above calculation, $\pi_{[\text{all aggressive}]}$ refers to the payoff of playing aggressively when all other players play aggressively, $\pi_{[\text{only } i \text{ aggressive}]}$ refers to the payoff of playing aggressively when all other players play cautiously, etc. In order to explicitly solve for the value of $p^*$, we need to put more structure on the problem. For the functional forms developed in the specific example, we can calculate and plot $p^*$ as a function of $s$. Figure 4 then illustrates how robust this equilibrium is: Even when the players think that there is more than a 50% chance that all other players play the aggressive strategy, it still pays to play the cautious strategy for a wide range of initial values $s$.

Figure 4: $p^*$ as a function of $s$ for $u(c) = \sqrt{c}$, $f = \frac{1}{A}$, $\beta = 0.8$, $A = R = 1$, and $N = 10$. The grey area below the line drawn by $p^*$ – the region where playing cautiously is risk-dominant – shows the set of values for which player $i$ prefers to play the strategy that pertains to the cautious equilibrium. $p^*$ is not defined for $s < \underline s^{nc}$, where the cautious and the aggressive equilibrium coincide.

II–4 Comparative statics

In order to analyze how the extraction pattern changes with the parameters, I first note that $\delta^{nc}$, the equilibrium expansion of the set of safe values, is monotonically decreasing in $s$. (The argument is the same as in the proof of Proposition 2 when simply replacing $\delta^*(s)$ with $N\delta^{nc}(s)$, and is therefore omitted.)
This implies that the aggregate extraction pattern as a function of the prior knowledge about the set of safe consumption possibilities indeed looks qualitatively as in Figure 3 (where it was plotted for the specific functional forms assumed in the example). The effect of an increase in the fundamentals β, N, Fs(δ), and R can therefore be analyzed by investigating changes to φ′(δnc, s).

Proposition 8. We have the following comparative statics results: (a) The boundaries snc, s̄nc, and aggregate extraction for s ∈ [snc, s̄nc], decrease with β. (b) An increase in N leads to more aggressive extraction when N/(N + 1) > u′(R/N) / u′(R/(N + 1)). (c) The more likely the regime shift (in terms of first-order stochastic dominance), the larger the range where a cautious Nash equilibrium exists. (d) As long as R < A, the higher the maximum potential reward R, the larger the range where a cautious Nash equilibrium exists.

Proof. The proofs are given in Appendix A–8.

The first comparative static result conforms with basic intuition: The more impatient the players are, the less they value the current safe consumption value, and the more aggressive their experimentation. The second result shows that an increase in the number of players may exacerbate the “tragedy of the commons”, but not necessarily in all cases. There are two opposing effects: On the one hand, the addition of one more player increases aggregate extraction if all players were to choose the same consumption level as before. On the other hand, the addition of another player leads all other players to decrease their individual consumption as they partly take the increase in N into account. A sufficient condition for the former effect to dominate is that N/(N + 1) > u′(R/N) / u′(R/(N + 1)).
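The sufficient condition in result (b) is straightforward to check numerically for a given marginal utility. A hedged sketch (the CRRA marginal utility in the usage example is my illustration, not a functional form used in the paper):

```python
def more_aggressive_with_entry(u_prime, N, R):
    """Check the sufficient condition N/(N+1) > u'(R/N) / u'(R/(N+1))
    under which adding a player raises aggregate extraction."""
    return N / (N + 1) > u_prime(R / N) / u_prime(R / (N + 1))

# Illustration with CRRA marginal utility u'(c) = c**(-2):
# the ratio u'(R/N)/u'(R/(N+1)) equals (N/(N+1))**2 < N/(N+1),
# so the sufficient condition holds for any N and R.
print(more_aggressive_with_entry(lambda c: c ** -2.0, N=5, R=1.0))  # True
```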
The third comparative static result, that an increased risk of crossing the threshold leads to a larger range of the cautious equilibrium, is not related to the risk aversion of the agents as such, but stems from the fact that the expected costs of experimentation increase while the gains stay the same. Finally, as long as R < A, an increase in the upper bound of the consumption possibility set, R, increases the range where a cautious equilibrium exists, in spite of the fact that R neither affects the interior consumption choices directly, nor the value s̄nc at which the cautious equilibrium implies no further experimentation. The reason for the result is the following: At the old value of snc, the equilibrium experimentation just coincides with choosing R − s. Now when R shifts outwards, say to R̃, equilibrium experimentation at the old snc is strictly less than R̃ − s.

II–5 Specific example again

In this section, I use the specific example developed throughout the paper to derive closed-form solutions for the step size in the “cautious” equilibrium and to plot the individual player’s value function in that equilibrium. Exploiting the fact that due to symmetry we have δ−i = (N − 1)δi, I solve (20) for an interior equilibrium value δnc. Total non-cooperative expansion is then given by:

Nδnc = [ ((1 − β)N + β)A − ((1 − β)N + 3β)s ] / (3β)    (21)

Again, there will only be an interior equilibrium for s ∈ [snc, s̄nc], where:

snc = max{ 0, ( ((1 − β)N + β)A − 3βR ) / ((1 − β)N) }

s̄nc = min{ ((1 − β)N + β)A / ((1 − β)N + 3β), R }

From inspection of (21) it becomes clear that total non-cooperative consumption is increasing in N: ∂[Nδnc]/∂N = (1 − β)(A − s)/(3β) > 0. This points to the “tragedy of the commons”: The more players there are, the more aggressive the first-period expansion of the set of consumption possibilities.
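Equation (21) and the two boundaries can be evaluated directly. A minimal sketch for the uniform-prior example (parameter values as in Figure 5, with N = 10):

```python
def total_expansion(s, beta, A, N):
    """Aggregate non-cooperative expansion N*delta_nc(s), equation (21)."""
    return (((1 - beta) * N + beta) * A - ((1 - beta) * N + 3 * beta) * s) / (3 * beta)

def s_lower(beta, A, R, N):
    """Below this safe value no cautious equilibrium exists."""
    return max(0.0, (((1 - beta) * N + beta) * A - 3 * beta * R) / ((1 - beta) * N))

def s_upper(beta, A, R, N):
    """Above this safe value no further experimentation occurs."""
    return min((((1 - beta) * N + beta) * A) / ((1 - beta) * N + 3 * beta), R)

beta, A, R, N = 0.8, 1.0, 1.0, 10
print(round(s_lower(beta, A, R, N), 3))  # 0.2
print(round(s_upper(beta, A, R, N), 3))  # 0.636
```

The two printed boundaries match the region without a cautious equilibrium (s below roughly 0.2) and the value at which it first becomes individually rational to remain standing in the N = 10 panel of Figure 5.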
Furthermore, one can find the combination of parameters that would ensure a self-fulfilling prophecy of extirpation when starting from an initial value of s = 0 (no prior knowledge of a safe level of extraction). It is given by N ≥ β(3R − A)/((1 − β)A): the required number of players is increasing in R and decreasing in A, as is intuitive. Figure 5 plots the value function for a uniform prior (with A = R = 1) and a discount factor of β = 0.8, illustrating how it changes as the number of players increases. The more players there are, the greater the distance of the non-cooperative value function (assuming that the players coordinate on the Pareto-dominant equilibrium, plotted by the blue solid diamonds) to the socially optimal value function (plotted by the black open circles). In particular when N = 10, one sees the region (roughly from s = 0 to s = 0.2) where there is no “cautious” equilibrium, and the large value of s̄nc (roughly 0.64) at which it first becomes individually rational to remain standing. All in all, however, this example shows that the threat of an irreversible regime shift is very effective when the common-pool externality applies only to the risk of crossing the threshold. (At least for this specific utility function and these parameter values. Note that β = 0.8 implies an unreasonably high discount rate, but it was chosen to magnify the effect of non-cooperation for a small number of players.)

Figure 5: Illustration of expected total payoff in the social optimum and in the cautious non-cooperative equilibrium. The left panel shows the aggregate payoff in the game when there are N = 5 players and the right panel shows N = 10. Parameters and functional forms: u(c) = √c, β = 0.8, A = R = 1, and Fs(δ) = (1 − s − δ)/(1 − s).
Part III – Discussion

The paper’s main results do not rely on specific functional forms for utility or the probability distribution of the threshold’s location. Tractability was achieved by two structural assumptions: (1) I have considered extremely simple resource dynamics, namely the resource remains intact and replenishes fully in the next period as long as resource use in the current period has not exceeded T. (2) I have assumed that the value after the threshold has been crossed is independent of the situation before the regime shift has occurred. Thus, one could interpret R as the upper bound of the consumption possibility set in the productive regime and r as the upper bound of the consumption possibility set in the unproductive regime. In this part of the paper, I relax the first structural assumption and discuss generic non-renewable resource dynamics in section III–1 and generic renewable resource dynamics in section III–2. In contrast to the analysis in part II, there is now a common-pool externality that relates to the resource itself. Therefore, the cautious non-cooperative equilibrium will coincide with the social optimum in very special cases only. Nevertheless, it is still socially optimal and a Nash equilibrium to undertake any experimentation in the first period only. Thus, if the regime shift has not occurred in the first period, the threat of crossing the catastrophic threshold disciplines non-cooperative play and leads to a Pareto improvement. In section III–3, I then relax the second structural assumption. The value of the unproductive regime cannot be normalized to zero when it explicitly depends on the values of the state and control variables in the pre-event regime. In such an environment, learning will be gradual when the post-event value declines sufficiently strongly in the size of the experiment.
Intuitively, the more costly it is to over-invest, the more it pays to experiment repeatedly and cautiously, in spite of the fact that sequential investment yields a lower payoff when the threshold is not crossed.

III–1 Non-renewable resource dynamics

So far, it has been assumed that the resource replenishes fully every period unless the threshold has been crossed. Here, I study the opposite case of a non-renewable resource to analyze the effect of a disastrous regime shift when the common-pool externality relates not only to the risk of crossing the threshold but also to the resource itself. Specifically, I consider the following model of extraction from a known stock of a non-renewable resource:

max_{c_t^i} Σ_{t=0}^∞ β^t u(c_t^i)  subject to:
R_{t+1} = R_t − Σ_i c_t^i   if Σ_i c_t^i ≤ T
R_{t+1} = r                 if Σ_i c_t^i > T or R_t = r    (22)

I assume that the utility function is of such a form that, in a world without the threshold, there is a Pareto-dominant non-cooperative equilibrium in which positive extraction occurs in several periods (though the players could empty the resource in one period, if they so wished). Due to discounting, it is clear that the extraction level will decline as time passes, both in the social optimum and in the non-cooperative equilibrium. Due to the stock externality, it is clear that the extraction rate in the non-cooperative equilibrium is inefficiently large. A simple interpretation of this model could be a mine from which several agents extract a valuable resource. For example, the structure of the mining shafts may collapse and the remainder of the resource become inaccessible if aggregate extraction is too high in any given period. In spite of this natural interpretation, two things are rather peculiar about this model setup: First, any player can extract any amount up to Rt. (The option to introduce a capacity constraint on current extraction – though realistic – would come at the cost of significant clutter without yielding any apparent benefit.)
Second, the assumption that R0 is known and that T is constant means that this is not a problem of eating a cake of unknown size. That problem has long been dealt with in the literature (see e.g. Kemp, 1976; Hoel, 1978) and is not considered here. As in part II of the paper, it is instructive to first discuss the case when the location of the threshold is known, in order to expose the strategic structure that results from the potential for a disastrous regime shift. Let c̃nc(Rt) be the total non-cooperative extraction level (as a function of the resource stock Rt) in the absence of the regime shift risk. Clearly, if T > c̃nc(R0) the threshold is immaterial and the whole problem is not interesting. Thus, I only consider the case when the known value of T is below c̃nc(R0). The relevant question is therefore whether the agents can coordinate on staying at the level T in the first period or not. If they can, they will stay at the level T for an interval of time t = 0, 1, ..., τ, where τ is the time at which the resource stock has been depleted to a level where the non-cooperative extraction path stays below T until the resource is exhausted. That is, Rτ is given by R0 − τT, where τ is defined implicitly by c̃nc(Rτ) = T. The temptation to empty the resource in the first period when all other players stay at the threshold is given by:

u(R0 − ((N − 1)/N) T) > u(T/N) + βV(R0 − T).

Whether this inequality holds depends on the particular form of u and V and cannot be answered in general. Still, it is possible to prove the following:

Proposition 9. In the game described by (22), a known threshold is crossed in the first period, or never.

Proof. The proof is given in Appendix A–9.

When we now turn to the case where the location of the threshold is unknown, it will not be surprising that it does not pay to experiment in the second (or any later) period, because learning is only affirmative.
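The resulting extraction path is easy to simulate: the updated safe level caps per-period aggregate extraction until the unconstrained non-cooperative path falls below it. A minimal sketch, assuming a hypothetical linear unconstrained rule c̃nc(R) = (1 − β)R as a stand-in (the true rule depends on u and N):

```python
def cautious_path(R0, s1, beta, periods):
    """Simulate aggregate extraction when the safe level s1 caps extraction:
    c_t = min(s1, c_nc(R_t)), where c_nc(R) = (1 - beta) * R is a
    hypothetical stand-in for the unconstrained rule."""
    path, R = [], R0
    for _ in range(periods):
        c = min(s1, (1 - beta) * R)  # flat at s1 until the unconstrained path falls below it
        path.append(c)
        R -= c
    return path

path = cautious_path(R0=1.0, s1=0.1, beta=0.5, periods=15)
# initial non-declining phase at s1, followed by a monotone decline
```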
This means that, if the threshold has not been crossed, the extraction path will be constrained by s1 = s0 + δnc(s0). While the extraction path in the absence of regime shift risk, c̃nc(Rt), declines monotonically, the extraction path under regime shift risk in the “cautious equilibrium” is characterized by an initial non-declining phase for which ct = s1 = s0 + δnc(s0); at the period τ when c̃nc(Rτ) = s1, the extraction path then follows c̃nc(Rt) (i.e. it declines monotonically). As the players would never expand the set of safe extraction possibilities beyond c̃nc(R0) in the cautious equilibrium, the threat of a disastrous regime shift is welfare improving when the players coordinate. However, the cautious equilibrium will coincide with the first-best only in the very special case that the initially safe value s0 is binding in each period of the game and of the social optimum. Proposition 10 summarizes this discussion.

Proposition 10. In the game described by (22) when T is unknown, there exists, in addition to the aggressive equilibrium in which the resource is exhausted in the initial period, a Pareto-dominant equilibrium in which experimentation – if at all – is undertaken in the first period only and s1 = s0 + δnc(s0) is an upper bound on aggregate extraction for the remainder of the game. The threat of a regime shift slows down the extraction rate and improves welfare.

Proof. The proof is given in Appendix A–10.

III–2 Renewable resource dynamics

Consider a generic renewable resource problem, where the objective of player i is:

max_{c_t^i} Σ_{t=0}^∞ β^t u(c_t^i, Rt)  subject to:
R_{t+1} = G(Rt − Σ_i c_t^i)   if Σ_i c_t^i ≤ T
R_{t+1} = r                   if Σ_i c_t^i > T or Rt = r    (23)

Note that the instantaneous utility function now directly depends on the resource stock (with ∂u/∂R > 0). This could, for example, be due to stock-dependent harvesting costs, as is usual for fishery models. Suppose that without the threshold, there is a unique Nash equilibrium with steady-state resource stock R∞nc.
Due to the negative stock externality, the socially optimal steady-state resource stock, R∞so, is larger than R∞nc. Parallel to section III–1, the threshold applies to the total exploitation level in any given period, and not to the stock as such. This structure is without loss of generality here, because there is a one-to-one mapping between total harvest and escapement for a given initial stock. Assume that the initial resource stock R0 and the initial safe exploitation level s0 are above R∞so. Assume furthermore that there are no capacity constraints to harvesting, so that any agent i can consume R0 − R∞nc in one period should he or she wish to do so. Although we would need to put a lot more structure on this problem to solve it explicitly, we can make the following statement:

Proposition 11. Unless δ0nc(s0) = R0 − R∞nc, the threat of a disastrous regime shift will strictly improve welfare, as players can coordinate on a steady-state resource stock above R∞nc.

Proof. The argument for the fact that the players can coordinate on a cautious equilibrium is the same as the one for Proposition 10 and is omitted.

As in the case of non-renewable resource dynamics, the maximum extraction level that is known to be safe puts an upper bound on extraction and thereby mitigates the negative stock externality. If the initial value of s0 is relatively low, the players will experiment once and stay at the updated level – unless, of course, the threshold has been crossed. The disciplining effect of the threat of a stock collapse may indeed be very policy relevant, as the example of the North-Atlantic herring fishery suggests: For centuries, this fishery has been the economic centerpiece for many regions in Northern Europe. The collapse of this immense stock in the late 1960s created significant hardship for the affected communities.
In spite of a complete harvest moratorium, it took almost 30 years for the fish stock to recover. By the end of the 1990s, the spawning stock biomass had reached levels above 6 million tons again and the fishery was re-opened. A changed distribution pattern in the early 2000s led to strong disagreement among the harvesting nations and severe overfishing. The trend in biomass growth halted and even turned negative. Nevertheless, the competing nations could restore cooperation, supposedly because they were “staring into the abyss that yawned before them” (Miller et al., 2013, p.325).

III–3 Pre-event choices matter for post-event value

The second simplifying assumption in parts I and II that allowed me to get tractable results was that pre-event choices did not matter for the post-event value. In this sense, the regime shift was really disastrous, breaking any links between the state before T has been crossed and the state afterwards. Because of this assumption, I could simply normalize the continuation value in case of a regime shift to zero. For some applications, this independence of the post-event value is not a very fitting description. This is especially true when the system under consideration is large, such as for global climate change, and the threshold effect on the damage is not truly catastrophic, but just one of many parts in the equation. In such a setting, it would be more appropriate to explicitly take into account how the continuation value depends on the pre-event situation, for example by writing the continuation value function as W(st+1). How W depends on the pre-event values of the state and choice variables s and δ is not generally clear. The capital stock in a climate change application, for example, has an ambiguous effect (Ploeg and Zeeuw, 2015a). On the one hand, it could buffer against the adverse effects of the regime shift and hence smooth consumption over regimes.
On the other hand, a higher capital stock implies more intense use of fossil fuels, which aggravates climate damages. In a renewable resource application, Ren and Polasky (2014) similarly discuss under which conditions regime shift risk implies more cautionary management. In particular, they highlight the role of an “investment effect”: because investing in the renewable resource stock by harvesting less pays off badly should the regime shift occur, there are incentives for more aggressive management. These incentives are balanced by what they call the “consumption smoothing effect” (leading to more precaution in their application) and the “risk reduction effect”. Whether overall W′(st+1) > 0 or W′(st+1) < 0 will depend on the specific model at hand. Regardless of this, however, when pre-event choices matter for the post-event value, it is no longer immaterial by how much one has stepped over the threshold when it is crossed. Below, I derive general conditions under which this implies that it is not optimal to undertake all experimentation at once, but rather to approach the value at which experimentation ceases sequentially.

III–3.1 Optimal and non-cooperative experimentation

The value function for a social planner who knows that the current consumption level s is safe, but that an increase of the consumption flow by an amount δ may trigger a regime shift, is given by (24), where the post-event value function is given by W. I assume that Wt+1 depends only on the pre-event value st at time t and on the choice in period t. In other words, the Markov property is maintained and the problem is stationary. Furthermore, I continue to assume that the regime shift is irreversible.

V(s) = max_{δ∈[0,R−s]}  u(s + δ) + β[ (1 − Fs(δ)) · V(s + δ) + Fs(δ) · W(s + δ) ]    (24)

When the post-event value is a function of the pre-event state, optimal experimentation will occur in several steps only if the post-event value declines with the size of the last step before the threshold has been crossed.
Intuitively, it only pays to proceed in several steps when crossing the threshold is worse the further it has been exceeded. To see this more clearly, I compare the expected value of traversing a given distance in one step with the expected value of traversing the same distance in two steps (equation 25). In the proof of Proposition 1 (in Appendix A–1), this construction was used to show that experimentation in the basic setup involves at most one step. Before analyzing equation (25), note that also here there will be some value s∗ such that for s ≥ s∗ it is not optimal to experiment any further, but rather to stay at s. As in the proof of Proposition 1, take some s̃ outside of the set of values at which it is optimal to remain standing (I denote this set by S), and let δ∗ be the optimal step from s̃ into S. Then denote by δ̃ the step from some s to s̃, and compare the expected payoff from taking one step of size δ̂ = δ̃ + δ∗ to the expected payoff from taking two steps: one step of size δ̃ from s to s̃, and then the step δ∗ from s̃ into S.

u(s + δ̂) + β[ Fs(δ̂)W(s + δ̂) + (1 − Fs(δ̂))V(s + δ̂) ]
   ?>
u(s + δ̃) + β[ Fs(δ̃)W(s + δ̃) + (1 − Fs(δ̃)) ( u(s + δ̃ + δ∗) + β[ Fs+δ̃(δ∗)W(s + δ̃ + δ∗) + (1 − Fs+δ̃(δ∗))V(s + δ̃ + δ∗) ] ) ]

Using the fact that s + δ̃ + δ∗ ∈ S, so that V(s + δ̃ + δ∗) = u(s + δ̂)/(1 − β), this can be re-written as:

u(s + δ̂) − u(s + δ̃)  ?>  β [ ∫_{s+δ̃}^{s+δ̂} f(y)dy / (1 − F(s)) ] [ u(s + δ̂) − (1 − β)W(s + δ̂) ]
                          + β [ ∫_{s}^{s+δ̃} f(y)dy / (1 − F(s)) ] [ W(s + δ̃) − W(s + δ̂) ]    (25)

Equation (25) compares the immediate gain from taking a large instead of a small step with the (discounted) future consequences when the threshold is either located between s + δ̃ and s + δ̂, or located between s and s + δ̃. Clearly, the more the future is discounted, the more important will be the immediate gain of taking a large step.
But note also that when W = 0, the condition above is identical to equation (A-2) in the basic setup, where it was shown that the LHS is larger than the RHS. Now, when W ≠ 0, there are two additional effects. First, postponing the regime shift is less valuable when the consequences of the regime shift are less severe. This is captured by the term (1 − β)W(s + δ̂) in the first line of equation (25). This effect strengthens the social planner’s incentive to experiment only once. However, this effect can be overturned when W(s + δ̃) > W(s + δ̂): Specifically, the term in the second line of equation (25) represents the value of postponing the loss from having overstepped by a lot rather than just a little, and it is proportional to the probability that the threshold is located in the interval between s and s + δ̃. In summary, only when the post-event value declines sufficiently strongly in the size of the pre-event experiment will it pay to update the upper bound of the set of safe consumption values in several steps. In the non-cooperative game, there are two countervailing forces when pre-event choices matter for post-event value: On the one hand, the classic dynamic common-pool externality emphasizes current short-term gains over long-term savings, which speaks against cautious, sequential experimentation. On the other hand, as the post-event value will also be characterized by the common-pool externality, a lower value of the state variable after the regime shift has been triggered is much more harmful under non-cooperation than in the social optimum. This effectively increases the penalty from overstepping the threshold by far and speaks in favor of cautious, sequential experimentation. How these two effects play out cannot be said in general, and I therefore turn to the specific example that I have developed throughout this paper in the next section.
III–3.2 Numerical solution for specific functional forms

To illustrate the possible size of the effect when pre-event choices matter for the post-event value, I numerically solve the model for the specific functional forms developed above.⁸ For the post-event continuation value, I assume that the resource loses all its productivity once the regime shift occurs. In other words, in the social optimum, W is the highest value that can be obtained when spreading the consumption of the now non-renewable resource rt = R − st − δt over the remaining time horizon. Formally:

W(st + δt) = max_{0≤cτ≤rτ} Σ_{τ=t}^∞ β^(τ−t) u(cτ)  subject to: rτ+1 = rτ − cτ;  rt = R − st − δt.

W(st + δt) = √[ (R − (st + δt)) / (1 − β²) ]   when u(c) = √c.

Clearly, W′(st + δt) < 0. Turning to the non-cooperative game, denote the aggregate expansion of all N players at time t by δt, let δi,t be the expansion of player i, and let δ−i,t be the expansion of all other players. For a square-root utility function and without exogenous constraints on extraction, the number of players cannot be too big in order to have an interior equilibrium with positive extraction over the entire time path (instead of immediate, pre-emptive extraction in the first period). Here I choose N = 2. We then have:

Wnc(st + δi,t + δ−i,t) = max_{0≤cτ≤rτ} Σ_{τ=t}^∞ β^(τ−t) u(ci,τ)  subject to: rt = R − (st + δi,t + δ−i,t);  rτ+1 = rτ − Σ_i ci,τ.

For a symmetric equilibrium, ci,τ = cτnc for all i and all τ, and we have the following explicit solution:

Wnc(st + δi,t + δ−i,t) = √β · √[ (R − (st + δi,t + δ−i,t)) / (1 − β²) ]   when u(ci) = √ci and N = 2.

Panel (a) of Figure 6 shows the optimal size of experimentation δ as a function of the safe value s, for three different values of the discount factor β.
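The paper reports solving the model by straightforward value-function iteration in R. The following Python sketch reproduces the same iteration for equation (24) under the uniform-prior example, assuming the conditional crossing probability Fs(δ) = δ/(1 − s) and the closed-form post-event value W above (grid size, tolerance, and iteration cap are my choices):

```python
import numpy as np

R, beta = 1.0, 0.95
n = 201
s = np.linspace(0.0, R, n)            # grid of safe values
u = np.sqrt                           # u(c) = sqrt(c)
W = np.sqrt((R - s) / (1 - beta**2))  # post-event value, closed form for sqrt-utility
V = u(s) / (1 - beta)                 # initial guess: stay at s forever

for _ in range(5000):                 # value-function iteration on (24)
    V_new = np.empty(n)
    for i in range(n):
        sp = s[i:]                    # candidate updated safe values s + delta
        F = (sp - s[i]) / (1 - s[i]) if s[i] < 1 else np.zeros_like(sp)
        V_new[i] = np.max(u(sp) + beta * ((1 - F) * V[i:] + F * W[i:]))
    done = np.max(np.abs(V_new - V)) < 1e-10
    V = V_new
    if done:
        break

# greedy policy: optimal expansion delta(s) on the grid
delta = np.empty(n)
for i in range(n):
    sp = s[i:]
    F = (sp - s[i]) / (1 - s[i]) if s[i] < 1 else np.zeros_like(sp)
    delta[i] = sp[np.argmax(u(sp) + beta * ((1 - F) * V[i:] + F * W[i:]))] - s[i]
```

The recovered policy is positive for low s and zero near s = R, consistent with the region structure described for Figure 6a; the grid resolution governs how precisely the cut-off s∗ is located.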
The overall structure of the optimal policy is the same as in the baseline case (compare to Figure 3), namely that there is a region s < s∗ for which experimentation is optimal, and a region s ≥ s∗ for which it is not optimal to experiment (note that I do not plot the entire state space [0, R] here in order to concentrate on the relevant range of state values). Moreover, as in the baseline scenario of Part I, the size of the experiment is decreasing in the value of consumption that is known to be safe. In contrast to the baseline case of an exogenous post-event value, however, Figure 6a shows clearly how there may be repeated experimentation when β = 0.95 and β = 0.99. In fact, as the optimal policy in these cases is always below the curve that maps the policy of taking only one step (that is, the curve at which δ(s) = s∗ − s), represented by the thin black line, it is optimal to approach s∗ only asymptotically. As discussed above, the stronger the future is discounted, the less valuable is a cautious approach. In Figure 6a, this is illustrated by plotting the optimal policy for β = 0.60, which is always above the thin black line that represents δ(s) = s∗ − s. That is, in the case that β is sufficiently small, all experimentation will be undertaken at once. Panel (b) serves to contrast the socially optimal expansion with the aggregate non-cooperative expansion.

⁸ I solve by straightforward value function iteration, using the computer program R 3.1.1 (2014). The scripts are available on request.

Figure 6: Illustration of optimal and non-cooperative experimentation when pre-event choices matter for post-event value. Panel (a): socially optimal expansion; panel (b): aggregate non-cooperative expansion. Parameters and functional forms: u(c) = √c, N = 2, A = R = 1, Fs(δ) = (1 − s − δ)/(1 − s); various values for β.
Not surprisingly, non-cooperative experimentation is inefficiently risky. But again, the overall structure of the non-cooperative equilibrium is the same as in the baseline case: there is a region s < snc with positive experimentation, and a region s ≥ snc for which it is an equilibrium not to experiment. Again, as in the baseline scenario of Part II, the size of the experiment is decreasing in the value of consumption that is known to be safe. However, in contrast to the baseline scenario, equilibrium expansion does not significantly exceed the level that brings st+1 to the set at which no further experimentation is an equilibrium. At the same time, endogeneity of the post-event value does not induce sequential experimentation either, unless the value of β is very high. Generally, it is noteworthy that the slope of the aggregate expansion is much less sensitive to changes in the discount factor than the social optimum (though a higher or lower β moves snc to the left and right, respectively, as in the social optimum).

Conclusion

The effect of potential regime shifts on the first-best and the non-cooperative use of environmental goods and services is polarizing. Depending on the initial level of use that is known to be safe, and depending on the belief about the location of the catastrophic threshold, it may be optimal to never use more than the initially known safe level. Or, it may be optimal to experiment exactly once and then not expand resource use again when the updated level of resource use turned out to be safe. This first experiment may, however, imply the exhaustion of the resource. Similarly, when the players believe that even a low level of consumption causes catastrophe, the game exhibits prisoner’s-dilemma features: Although it would be optimal to sustain the resource at its current level of use, the only non-cooperative equilibrium will be the immediate extirpation of the resource.
In contrast, when the players believe that it is sufficiently likely that the productive regime can be sustained even at a high level of consumption, the game changes into a coordination problem: The threat of losing the productive resource can effectively enforce the first-best consumption level. For intermediate values, the equilibrium will be neither extirpation nor status-quo consumption, but rather a one-time increase in consumption, expanding the set of safe consumption possibilities. This expansion will be inefficiently large compared to the first-best experiment, but if it has not caused the regime shift, the players will be able to coordinate on staying at the updated level. Staying at the updated level is ex post socially optimal. When the externality applies not only to the risk of a regime shift (i.e. any given level of safe consumption is efficiently shared among the agents), but also to the resource use itself, the threat of the threshold loses importance. Due to the dynamic common-pool externality on the resource, non-cooperative extraction will be inefficiently high even in the absence of any risk of a regime shift. This means that when the threshold is believed to be above the first-best consumption pattern, its threat cannot act as a “commitment device” to ensure efficient extraction. Nevertheless, the threshold may still dampen non-cooperative extraction. These conclusions have been derived by using a general dynamic model that places only minimal requirements on the utility function (concavity and boundedness) and the probability distribution of the threshold (continuity). Their robustness has been demonstrated by exploring a range of alternative assumptions on the timing at which the catastrophe occurs, the reversibility of the regime shift, and the growth potential of the resource. Nevertheless, there are four aspects of the underlying modeling structure that warrant special discussion.
First, a prominent aspect of this model is that the threshold itself is not stochastic. The central motivation is that this allows concentrating on the effect of uncertainty about the threshold’s location. This is arguably the core of the problem: we don’t know which level of use triggers the regime shift. This modeling approach implies a clear demarcation between a safe region and a risky region of the state space. In particular, it implies that the edge of the cliff, figuratively speaking, is a safe – and in many cases optimal – place to be. The alternative approach, modeling the risk of a regime shift by a hazard rate, acknowledges that, figuratively speaking, the edge of a cliff is often quite windy and not a particularly safe place. This, however, implies that under a hazard-rate approach the regime shift will occur with certainty as time goes to infinity, no matter how little of the resource is used. Eventually, there will be a gust of wind that is strong enough to blow us over the edge, regardless of where we stand. This is of course not very realistic either. But on a deeper level, one could argue that the non-stochasticity is in effect not a flaw but a feature: Let me cite Lemoine and Traeger (2014, p.28), who argue that “we would not actually expect tipping to be stochastic. Instead, any such stochasticity would serve to approximate a more complete model with uncertainty (and potentially learning) over the precise trigger mechanism underlying the tipping point.” This being said, it would still be interesting to investigate how the choice between a hazard-rate formulation (as in Polasky et al., 2011 or Sakamoto, 2014) and a threshold formulation influences the outcome and policy conclusions in an otherwise identical model. Second, I have modeled the players as identical. In the real world, players are rarely identical. One dimension along which players could differ is their valuation of the future.
However, prima facie it should not be difficult to show that any such differences could be smoothed out by a contract that gives a larger share of the gains from cooperation to more impatient players. Another dimension along which players could differ is their beliefs about the existence and location of the threshold. Agbo (2014) and Koulovatianos (2015) analyze belief heterogeneity about the renewability parameter in the framework of Levhari and Mirman (1980). They find that in particular the players with the most pessimistic beliefs can have a detrimental effect on resource governance. In the current set-up, such heterogeneity could lead to interesting dynamics and possibly multiple equilibria, where some players are so pessimistic about the location of $T$ that they rationally do not want to experiment, whereas other players would want to invest in learning and experimentation. Finally, players could differ in their size or in the degree to which they depend on the environmental goods or services in question. As larger players are likely to be able to internalize a larger part of the externality than smaller players, different sets of equilibria may emerge. Especially in light of the discussions surrounding a possible climate treaty (Harstad, 2012; Nordhaus, 2015), it is topical to analyze, in future applications, a situation where groups of players can form a coalition to ameliorate the negative effects of non-cooperation.

Third, while I have analyzed reversibility of the regime shift in the social planner environment of Part I, it was beyond the scope of this paper to also do so in the game. However, leveraging the tractability of the current modeling approach to explore this issue could be very fruitful. For example, if one presumes that crossing the threshold implies that one learns where it is, the game turns into a repeated game. On the one hand, this may imply that cooperation is sustainable for sufficiently patient players (van Damme, 1989).
But on the other hand, there could also be cases where irreversibility emerges "endogenously" when it is possible – but not an equilibrium – to move out of a non-productive regime.

A final, related, point is the fact that I have concentrated on Markovian strategies. When the players are allowed to use history-dependent strategies, the threat of a threshold may allow them to coordinate on the social optimum in all phases of the game. They could simply agree on expanding the set of safe consumption possibilities by the socially optimal amount and threaten that if any player steps too far, this triggers the depletion of the resource in the next period. This obviously begs the question of renegotiation-proofness, but it is plausible that already a contract that is binding for two periods is sufficient to achieve the first-best. The threat of a disastrous regime shift is a very strong coordinating device. This is true irrespective of whether the threshold's location is known or unknown, because the current model of uncertainty implies that it is, loosely speaking, pitch dark when the players take a step. It is only afterwards that they realize whether the disastrous regime shift has occurred or not. Would the coordinating force of a catastrophic threshold diminish when the players can learn about its location without risking crossing it? Importantly, an extension of the model along these lines would link the game-theoretic part to the debate on "early warning signals" (Scheffer et al., 2009; Boettiger and Hastings, 2013) and is the task of future work.

References

Aflaki, S. (2013). The effect of environmental uncertainty on the tragedy of the commons. Games and Economic Behavior, 82(0):240–253.
Agbo, M. (2014). Strategic exploitation with learning and heterogeneous beliefs. Journal of Environmental Economics and Management, 67(2):126–140.
Amundsen, E. S. and Bjørndal, T. (1999). Optimal exploitation of a biomass confronted with the threat of collapse.
Land Economics, 75(2):185–202.
Barlow, P. and Reichard, E. (2010). Saltwater intrusion in coastal regions of North America. Hydrogeology Journal, 18(1):247–260.
Barrett, S. (2013). Climate treaties and approaching catastrophes. Journal of Environmental Economics and Management, 66(2):235–250.
Barrett, S. and Dannenberg, A. (2012). Climate negotiations under scientific uncertainty. Proceedings of the National Academy of Sciences, 109(43):17372–17376.
Bochet, O., Laurent-Lucchetti, J., Leroux, J., and Sinclair-Desgagné, B. (2013). Collective dangerous behavior: Theory and evidence on risk-taking. Research Paper 13101, Institute of Economics and Econometrics, Geneva School of Economics and Management, University of Geneva.
Boettiger, C. and Hastings, A. (2013). Tipping points: From patterns to predictions. Nature, 493(7431):157–158.
Bolton, P. and Harris, C. (1999). Strategic experimentation. Econometrica, 67(2):349–374.
Bonatti, A. and Hörner, J. (2015). Learning to disagree in a game of experimentation. Discussion paper, Cowles Foundation, Yale University.
Brozović, N. and Schlenker, W. (2011). Optimal management of an ecosystem with an unknown threshold. Ecological Economics, 70(4):627–640.
Costello, C. and Karp, L. (2004). Dynamic taxes and quotas with learning. Journal of Economic Dynamics and Control, 28(8):1661–1680.
Crépin, A.-S. and Lindahl, T. (2009). Grazing games: Sharing common property resources with complex dynamics. Environmental and Resource Economics, 44(1):29–46.
Cropper, M. (1976). Regulating activities with catastrophic environmental effects. Journal of Environmental Economics and Management, 3(1):1–15.
Fesselmeyer, E. and Santugini, M. (2013). Strategic exploitation of a common resource under environmental risk. Journal of Economic Dynamics and Control, 37(1):125–136.
Frank, K. T., Petrie, B., Choi, J. S., and Leggett, W. C. (2005).
Trophic cascades in a formerly cod-dominated ecosystem. Science, 308(5728):1621–1623.
Gerlagh, R. and Liski, M. (2014). Carbon prices for the next hundred years. CESifo Working Paper Series 4671, CESifo Group Munich.
Gladwell, M. (2000). The Tipping Point: How Little Things Can Make a Big Difference. Little Brown.
Groeneveld, R. A., Springborn, M., and Costello, C. (2013). Repeated experimentation to learn about a flow-pollutant threshold. Environmental and Resource Economics, forthcoming:1–21.
Harsanyi, J. C. and Selten, R. (1988). A General Theory of Equilibrium Selection in Games. MIT Press, Cambridge.
Harstad, B. (2012). Climate contracts: A game of emissions, investments, negotiations, and renegotiations. The Review of Economic Studies, 79(4):1527–1557.
Hoel, M. (1978). Resource extraction, uncertainty, and learning. The Bell Journal of Economics, 9(2):642–645.
Keller, G., Rady, S., and Cripps, M. (2005). Strategic experimentation with exponential bandits. Econometrica, 73(1):39–68.
Kemp, M. (1976). How to eat a cake of unknown size. In Kemp, M., editor, Three Topics in the Theory of International Trade, pages 297–308. North-Holland, Amsterdam.
Kim, Y. (1996). Equilibrium selection in n-person coordination games. Games and Economic Behavior, 15(2):203–227.
Kolstad, C. and Ulph, A. (2008). Learning and international environmental agreements. Climatic Change, 89(1-2):125–141.
Kossioris, G., Plexousakis, M., Xepapadeas, A., de Zeeuw, A., and Mäler, K.-G. (2008). Feedback Nash equilibria for non-linear differential games in pollution control. Journal of Economic Dynamics and Control, 32(4):1312–1331.
Koulovatianos, C. (2015). Strategic exploitation of a common-property resource under rational learning about its reproduction. Dynamic Games and Applications, 5(1):94–119.
Lemoine, D. and Traeger, C. (2014). Watch your step: Optimal policy in a tipping climate. American Economic Journal: Economic Policy, forthcoming:1–47.
Lenton, T.
M., Held, H., Kriegler, E., Hall, J. W., Lucht, W., Rahmstorf, S., and Schellnhuber, H. J. (2008). Tipping elements in the earth's climate system. Proceedings of the National Academy of Sciences, 105(6):1786–1793.
Levhari, D. and Mirman, L. J. (1980). The great fish war: An example using a dynamic Cournot-Nash solution. The Bell Journal of Economics, 11(1):322–334.
Miller, K. A., Munro, G. R., Sumaila, U. R., and Cheung, W. W. L. (2013). Governing marine fisheries in a changing climate: A game-theoretic perspective. Canadian Journal of Agricultural Economics, 61(2):309–334.
Miller, S. and Nkuiya, B. (2014). Coalition formation in fisheries with potential regime shift. Unpublished working paper, University of California, Santa Barbara.
Nævdal, E. (2006). Dynamic optimisation in the presence of threshold effects when the location of the threshold is uncertain – with an application to a possible disintegration of the Western Antarctic Ice Sheet. Journal of Economic Dynamics and Control, 30(7):1131–1158.
Nordhaus, W. (2015). Climate clubs: Overcoming free-riding in international climate policy. American Economic Review, 105(4):1339–1370.
Ploeg, F. v. d. and Zeeuw, A. d. (2015a). Climate tipping and economic growth: Precautionary saving and the social cost of carbon. OxCarre Research Paper 118, OxCarre, Oxford, UK.
Ploeg, F. v. d. and Zeeuw, A. d. (2015b). Non-cooperative and cooperative responses to climate catastrophes in the global economy: A north-south perspective. OxCarre Research Paper 149, OxCarre, Oxford, UK.
Polasky, S., Zeeuw, A. d., and Wagener, F. (2011). Optimal management with potential regime shifts. Journal of Environmental Economics and Management, 62(2):229–240.
R 3.1.1 (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org.
Ren, B. and Polasky, S. (2014). The optimal management of renewable resources under the risk of potential regime shift.
Journal of Economic Dynamics and Control, 40(0):195–212.
Rob, R. (1991). Learning and capacity expansion under demand uncertainty. The Review of Economic Studies, 58(4):655–675.
Sakamoto, H. (2014). A dynamic common-property resource problem with potential regime shifts. Discussion Paper E-12-012, Graduate School of Economics, Kyoto University.
Scheffer, M., Bascompte, J., Brock, W. A., Brovkin, V., Carpenter, S. R., Dakos, V., Held, H., van Nes, E. H., Rietkerk, M., and Sugihara, G. (2009). Early-warning signals for critical transitions. Nature, 461(7260):53–59.
Scheffer, M., Carpenter, S., Foley, J. A., Folke, C., and Walker, B. (2001). Catastrophic shifts in ecosystems. Nature, 413(6856):591–596.
Tavoni, A., Dannenberg, A., Kallis, G., and Löschel, A. (2011). Inequality, communication, and the avoidance of disastrous climate change in a public goods game. Proceedings of the National Academy of Sciences, 108(29):11825–11829.
Tsur, Y. and Zemel, A. (1995). Uncertainty and irreversibility in groundwater resource management. Journal of Environmental Economics and Management, 29(2):149–161.
van Damme, E. (1989). Renegotiation-proof equilibria in repeated prisoners' dilemma. Journal of Economic Theory, 47(1):206–217.

Appendix

A–1 Proof of Proposition 1

Recall that Proposition 1 states that the socially optimal total use of the resource is either $s_0$ for all $t$, or $s_0 + \delta^*(s_0)$ for $t = 0$ and, if the resource has not collapsed, $s_1$ for all $t \geq 1$.

Part (1). First, I show that there is a non-empty set $S$ of values of $s$ at which it is optimal to stay. Regardless of the actual form of the value function, we know that an upper bound for the continuation value $V(s+\delta)$ is $u(R)/(1-\beta)$. Using this, an upper bound for the payoff from taking a positive step $\delta > 0$ is $u(s+\delta) + \beta \bar F_s(\delta)\, u(R)/(1-\beta)$.
Therefore, when the derivative of $u(s+\delta) + \beta \bar F_s(\delta)\, u(R)/(1-\beta)$ with respect to $\delta$ is less than or equal to zero on the domain $\delta \in (0, R-s]$ for some given value of $s$, then it is optimal to stay (choose $\delta^*(s) = 0$) at that value of $s$. I now show that there exists some value of $s$ at which this is the case. Let $s = R - \varepsilon$ with $\varepsilon > 0$, so that $s$ is close to $R$. The derivative is
\[
u'(s+\delta) + \beta \bar F_s'(\delta) \frac{u(R)}{1-\beta},
\]
and evaluated at $s = R-\varepsilon$ this is
\[
u'(R-\varepsilon+\delta) + \beta \bar F_{R-\varepsilon}'(\delta) \frac{u(R)}{1-\beta} = u'(R-\varepsilon+\delta) - \beta \frac{f(R-\varepsilon+\delta)}{1-F(R-\varepsilon)} \frac{u(R)}{1-\beta}.
\]
An interior solution would imply that
\[
(1-F(R-\varepsilon))\, u'(R-\varepsilon+\delta) = \beta f(R-\varepsilon+\delta) \frac{u(R)}{1-\beta}.
\]
When it is known that there is a catastrophic threshold on $[0,R]$, we have $F(R) = 1$. In this case, $\lim_{\varepsilon \to 0} (1-F(R-\varepsilon))\, u'(R-\varepsilon+\delta) = 0$, whereas $\lim_{\varepsilon \to 0} \beta f(R-\varepsilon+\delta)\, u(R)/(1-\beta) > 0$ (the existence of $\lim_{x \to R^-} f(x) > 0$ is implied by the assumption of a continuous support for $T$ on $[0,R]$), so that there cannot be an interior solution for small values of $\varepsilon$.

When there is a positive probability that there is no threshold on $[0,R]$ (that is, $F(R) < 1$), then it need not be the case that $(1-F(R-\varepsilon))\, u'(R-\varepsilon+\delta) < \beta f(R-\varepsilon+\delta)\, u(R)/(1-\beta)$ as $\varepsilon \to 0$. However, even when $(1-F(R))\, u'(R) \geq \beta f(R)\, u(R)/(1-\beta)$, there will be a value of $s$, namely $s = R$, at which it is optimal to stay – simply because there is no other choice. Thus, the set $S$ is not empty.

Moreover, when the hazard rate is not decreasing in $s$, that is, when
\[
\frac{\partial \bar F_s(\delta)}{\partial s} = \frac{-f(s+\delta)(1-F(s)) + (1-F(s+\delta)) f(s)}{[1-F(s)]^2} < 0 \;\Leftrightarrow\; \frac{f(s)}{1-F(s)} < \frac{f(s+\delta)}{1-F(s+\delta)},
\]
it can be shown that the set $S$ is convex, so that $S = [s^*, R]$, where $s^*$ is defined in the main text as the lowest value of $s$ at which it is optimal to never experiment. First, note that convexity of $S$ is trivial when $s^* = R$. Consider then the case that $s^* < R$.
By definition, the first-order condition must just hold with equality at $s^*$:
\[
u'(s^*) = \beta\, \frac{f(s^*)}{1-F(s^*)}\, \frac{u(s^*)}{1-\beta}.
\]
Convexity of $S$ requires, writing $x_\lambda = \lambda R + (1-\lambda) s^*$,
\[
u'(x_\lambda) < \beta\, \frac{f(x_\lambda)}{1-F(x_\lambda)}\, \frac{u(x_\lambda)}{1-\beta} \quad \text{for all } \lambda \in (0,1]. \tag{A-1}
\]
The term on the LHS of (A-1) is smaller the larger is $\lambda$ (by concavity of $u$). The rightmost fraction of (A-1) is larger the larger is $\lambda$ (as $u$ is increasing), $\beta/(1-\beta)$ is a positive constant, and the term in the middle – the hazard rate – is increasing in $\lambda$ when $\partial \bar F_s(\delta)/\partial s < 0$. Summing up, when $s_0 \in S$, the socially optimal total use of the resource is $s_0$ for all $t$.

Part (2). When $s_0 \notin S$, it is not optimal to stay, so that it is optimal to expand the set of safe consumption values by choosing $\delta > 0$. Due to discounting, it cannot be optimal to approach $S$ asymptotically. Thus there must be a last step from some $s_t \notin S$ to $s_{t+1} = s_t + \delta_t$ with $s_{t+1} \in S$. Below, I show that it is in fact optimal to take only one step. It then follows that when $s_0 \notin S$, it is optimal to choose $s_0 + \delta^*(s_0)$ for $t = 0$ and, if the resource has not collapsed, $s_1$ for all $t \geq 1$.

Denote by $\delta^*(\tilde s)$ the optimal last step when starting from some value $\tilde s \notin S$, and let $s^* = \tilde s + \delta^*$ with $s^* \in S$. The following calculations show that going from some $s$ to $\tilde s$ (by taking a step of size $\tilde\delta$) and then to $s^*$ (by taking the step $\delta^*$) yields a lower payoff than going from $s$ to $s^*$ directly, by taking a step of size $\hat\delta = \tilde\delta + \delta^*$ (so that $s < \tilde s < s^* \leq R$). That is, I claim:
\[
u(s+\tilde\delta) + \beta \bar F_s(\tilde\delta) \left[ u(s+\tilde\delta+\delta^*) + \beta \bar F_{s+\tilde\delta}(\delta^*)\, \frac{u(s+\tilde\delta+\delta^*)}{1-\beta} \right] \leq u(s+\hat\delta) + \beta \bar F_s(\hat\delta)\, \frac{u(s+\hat\delta)}{1-\beta}. \tag{A-2}
\]
The important thing to note is that
\[
\bar F_s(\tilde\delta)\, \bar F_{s+\tilde\delta}(\delta^*) = \frac{\bar F(s+\tilde\delta)}{\bar F(s)} \cdot \frac{\bar F(s+\tilde\delta+\delta^*)}{\bar F(s+\tilde\delta)} = \frac{\bar F(s+\tilde\delta+\delta^*)}{\bar F(s)} = \bar F_s(\tilde\delta + \delta^*),
\]
where $\bar F(x) \equiv 1 - F(x)$.
Hence, (A-2) can, upon using $\hat\delta = \tilde\delta + \delta^*$ and splitting the right-hand side into three parts ($t=0$, $t=1$, $t \geq 2$), be written as
\[
u(s+\tilde\delta) + \beta \bar F_s(\tilde\delta)\, u(s+\hat\delta) + \beta^2 \bar F_s(\hat\delta)\, \frac{u(s+\hat\delta)}{1-\beta} \leq u(s+\hat\delta) + \beta \bar F_s(\hat\delta)\, u(s+\hat\delta) + \beta^2 \bar F_s(\hat\delta)\, \frac{u(s+\hat\delta)}{1-\beta},
\]
which simplifies to
\[
u(s+\tilde\delta) \leq \left[ 1 + \beta \left( \bar F_s(\hat\delta) - \bar F_s(\tilde\delta) \right) \right] u(s+\hat\delta) = \left[ 1 + \beta\, \frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s)} \right] u(s+\hat\delta). \tag{A-2'}
\]
Because the term in the square brackets is smaller than 1 (as $\bar F_s(\hat\delta) < \bar F_s(\tilde\delta)$), it is not immediately obvious that the inequality in the last line holds. However, we can use the fact that, because $\tilde s \notin S$ and because $\delta^*$ is defined as the optimal last step from $\tilde s$ into the set $S$, the following must hold:
\[
\frac{u(\tilde s)}{1-\beta} < u(\tilde s + \delta^*) + \beta \bar F_{\tilde s}(\delta^*)\, \frac{u(\tilde s + \delta^*)}{1-\beta}.
\]
Using the fact that $\tilde s = s + \tilde\delta$ and that $\tilde s + \delta^* = s + \hat\delta$, this can be rearranged to give
\[
\frac{u(s+\tilde\delta)}{1-\beta} < u(s+\hat\delta) + \beta\, \frac{\bar F(s+\hat\delta)}{\bar F(s+\tilde\delta)}\, \frac{u(s+\hat\delta)}{1-\beta} \;\Leftrightarrow\; u(s+\tilde\delta) < \left[ 1 + \beta\, \frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s+\tilde\delta)} \right] u(s+\hat\delta). \tag{A-3}
\]
Since $\bar F' < 0$, the numerator $\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)$ is negative, and since $\bar F(s) > \bar F(s+\tilde\delta)$, we know that
\[
\frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s+\tilde\delta)} < \frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s)} < 0.
\]
Therefore, combining (A-2') and (A-3) establishes the claim and completes the proof:
\[
u(s+\tilde\delta) < \left[ 1 + \beta\, \frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s+\tilde\delta)} \right] u(s+\hat\delta) < \left[ 1 + \beta\, \frac{\bar F(s+\hat\delta) - \bar F(s+\tilde\delta)}{\bar F(s)} \right] u(s+\hat\delta). \tag{A-4}
\]

A–2 Proof of Proposition 2

Recall that Proposition 2 states that $\delta^*$ is decreasing in $s$.
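Before the formal argument, the claimed monotonicity can be illustrated numerically. The following sketch is not part of the paper's analysis: it assumes a uniform prior $T \sim U[0,1]$ (which has a nondecreasing hazard rate, as the proof requires) and the illustrative utility $u(x) = \sqrt{x}$, and locates $\delta^*(s)$ by grid search over the one-step value $u(s+\delta) + \beta \bar F_s(\delta)\, u(s+\delta)/(1-\beta)$:

```python
# Grid-search illustration of Proposition 2 (delta* decreasing in s),
# assuming T ~ U[0,1] and u(x) = sqrt(x); beta and the grid are illustrative.
from math import sqrt

beta, R = 0.9, 1.0

def delta_star(s, n=20_000):
    # maximise u(s+d) + beta * Fbar_s(d) * u(s+d) / (1-beta) over d in [0, R-s];
    # for the uniform prior, Fbar_s(d) = (1 - s - d) / (1 - s)
    best_d, best_v = 0.0, sqrt(s) / (1 - beta)        # d = 0: stay forever
    for i in range(1, n + 1):
        d = (R - s) * i / n
        v = sqrt(s + d) * (1 + beta * ((1 - s - d) / (1 - s)) / (1 - beta))
        if v > best_v:
            best_d, best_v = d, v
    return best_d

steps = [delta_star(s) for s in (0.0, 0.1, 0.2, 0.3, 0.4)]
print([round(d, 3) for d in steps])
# the optimal step shrinks as s grows, and is 0 once s lies in the stay-set S
assert all(a >= b for a, b in zip(steps, steps[1:]))
```

The grid search also recovers the structure of Proposition 1 for this example: the stay-set is an interval $[s^*, R]$, with $\delta^*(s) = 0$ for the larger values of $s$.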
Recall that the optimal choice of a positive expansion $\delta^*(s)$ is the solution of $\varphi'(\delta^*; s) = 0$, where $\varphi'$ is given by (5):
\[
\varphi'(\delta^*; s) = u'(s+\delta^*) + \frac{\beta}{1-\beta} \left[ \bar F_s'(\delta^*)\, u(s+\delta^*) + \bar F_s(\delta^*)\, u'(s+\delta^*) \right], \tag{5}
\]
with the second-order condition
\[
\varphi''(\delta^*; s) = u'' + \frac{\beta}{1-\beta} \left[ \bar F_s''(\delta^*)\, u + 2 \bar F_s'(\delta^*)\, u' + \bar F_s(\delta^*)\, u'' \right] < 0. \tag{A-5}
\]
To show that $\delta^*$ is declining in $s$, I therefore need to show that
\[
\frac{d\delta^*}{ds} = - \frac{\partial \varphi'(\delta^*; s)/\partial s}{\partial \varphi'(\delta^*; s)/\partial \delta^*} < 0.
\]
Since the denominator is negative when the second-order condition is satisfied, we have $d\delta^*/ds < 0$ when $\partial \varphi'(\delta^*; s)/\partial s < 0$, so that the condition to check is (A-6):
\[
\frac{\partial \varphi'(\delta^*; s)}{\partial s} = u'' + \frac{\beta}{1-\beta} \left[ \frac{\partial \bar F_s'(\delta^*)}{\partial s}\, u + \bar F_s'(\delta^*)\, u' + \frac{\partial \bar F_s(\delta^*)}{\partial s}\, u' + \bar F_s(\delta^*)\, u'' \right] < 0. \tag{A-6}
\]
Noting the similarity of (A-6) to the second-order condition (A-5), and realizing that (A-5) can be decomposed into a common part $A$ and a part $B$, while (A-6) can be decomposed into the common part $A$ and a part $C$, a sufficient condition for (A-6) to be satisfied is that $B > C$:
\[
\underbrace{u'' + \frac{\beta}{1-\beta} \left[ \bar F_s'(\delta^*)\, u' + \bar F_s(\delta^*)\, u'' \right]}_{A} + \underbrace{\frac{\beta}{1-\beta} \left[ \bar F_s''(\delta^*)\, u + \bar F_s'(\delta^*)\, u' \right]}_{B} < 0, \tag{A-5'}
\]
\[
\underbrace{u'' + \frac{\beta}{1-\beta} \left[ \bar F_s'(\delta^*)\, u' + \bar F_s(\delta^*)\, u'' \right]}_{A} + \underbrace{\frac{\beta}{1-\beta} \left[ \frac{\partial \bar F_s'(\delta^*)}{\partial s}\, u + \frac{\partial \bar F_s(\delta^*)}{\partial s}\, u' \right]}_{C} < 0. \tag{A-6'}
\]
In order to show that $\bar F_s''(\delta)\, u + \bar F_s'(\delta)\, u' > \frac{\partial \bar F_s'(\delta)}{\partial s}\, u + \frac{\partial \bar F_s(\delta)}{\partial s}\, u'$, I use the first-order condition for an interior solution from (5) to write $u'$ in terms of $u$:
\[
u' = \frac{-\bar F_s'(\delta)\, u}{\frac{1-\beta}{\beta} + \bar F_s(\delta)}.
\]
Upon inserting and canceling $u$, I need to show that
\[
\bar F_s''(\delta) - \frac{\bar F_s'(\delta)^2}{\frac{1-\beta}{\beta} + \bar F_s(\delta)} > \frac{\partial \bar F_s'(\delta)}{\partial s} - \frac{\partial \bar F_s(\delta)}{\partial s} \cdot \frac{\bar F_s'(\delta)}{\frac{1-\beta}{\beta} + \bar F_s(\delta)}. \tag{A-7}
\]
Recall that $\bar F_s(\delta) = \bar F(s+\delta)/\bar F(s)$ and hence
\[
\bar F_s'(\delta) = \frac{\bar F'(s+\delta)}{\bar F(s)}, \qquad \bar F_s''(\delta) = \frac{\bar F''(s+\delta)}{\bar F(s)},
\]
\[
\frac{\partial \bar F_s(\delta)}{\partial s} = \frac{\bar F'(s+\delta)\, \bar F(s) - \bar F(s+\delta)\, \bar F'(s)}{[\bar F(s)]^2}, \qquad \frac{\partial \bar F_s'(\delta)}{\partial s} = \frac{\bar F''(s+\delta)\, \bar F(s) - \bar F'(s+\delta)\, \bar F'(s)}{[\bar F(s)]^2}.
\]
Tedious but straightforward calculations then show that (A-7) is indeed satisfied.
Inserting these expressions into (A-7), the terms involving $\bar F''(s+\delta)$ cancel, and with $a \equiv \frac{1-\beta}{\beta} + \bar F_s(\delta)$ collecting the remaining terms shows that (A-7) is equivalent to
\[
\frac{\bar F'(s+\delta)\, \bar F'(s)}{[\bar F(s)]^2} \left[ 1 - \frac{\bar F(s+\delta)}{a\, \bar F(s)} \right] > 0.
\]
Since $\bar F'(s+\delta)\, \bar F'(s) = f(s+\delta) f(s) > 0$, this holds if and only if
\[
a\, \bar F(s) > \bar F(s+\delta) \;\Leftrightarrow\; \frac{1-\beta}{\beta}\, \bar F(s) + \bar F(s+\delta) > \bar F(s+\delta) \;\Leftrightarrow\; \frac{1-\beta}{\beta}\, \bar F(s) > 0,
\]
which is true.

A–3 Proof of Proposition 3

Recall that Proposition 3 states that also when crossing the threshold at time $t$ triggers the regime shift at some (potentially uncertain) time $\tau > t$, it is still optimal to experiment – if at all – in the first period only. The key is to realize that yesterday's decisions are exogenous today. This means that the threat of a regime shift can be modeled as an exogenous hazard rate: let $h_t$ be the probability that the regime shift, triggered by events earlier than and including time $t$, occurs at time $t$ (conditional on not having occurred prior to $t$, of course). The planner's problem in this situation can be formulated as
\[
V(s) = \max_{\delta \in [0, R-s]} u(s+\delta) + (1-h_t)\, \beta \bar F_s(\delta)\, V(s+\delta). \tag{A-8}
\]
The structure of (A-8) is identical to the one in equation (3); only the effective discount factor decreases by the factor $(1-h_t)$. As the value of $\beta$ is immaterial for the fact that it is optimal to experiment only once, the learning dynamics are unchanged.

A–4 Proof of Proposition 4

Recall that Proposition 4 states that when the regime shift is reversible after a lag of length $l$ and $T$ is revealed when $s_t + \delta_t > T$, then any experimentation is undertaken in the first period, and the size of the first step, $\delta^*(s_0)$, is larger the shorter is the lag $l$. Depending on $l$ and the initial safe value $s_0$, a range of the state space remains permanently unexplored. To prove this proposition, I first show that for suitable values of $l$, the set $S$ of initial values $s_0$ at which it is not optimal to experiment is not empty.
As in the proof A–1 above, I use the fact that I can find an upper bound for the continuation value after a step $\delta$ to show that for some values of $s$ the first-order condition for a positive step size cannot be satisfied. Because it is assumed that the location of the threshold is revealed when it is being crossed, the optimal choice after recovery (which takes $l$ periods) is to stay exactly at the threshold $T$. An upper bound for the payoff from an experiment of size $\delta$ is therefore given by
\[
u(s+\delta) + \frac{\beta}{1-\beta} \left[ \beta^l\, \frac{\int_s^{s+\delta} u(y) f(y)\, dy}{1-F(s)} + \bar F_s(\delta)\, u(R) \right];
\]
the corresponding first-order condition for an interior optimum is
\[
0 = u'(s+\delta) + \frac{\beta}{1-\beta} \left[ \beta^l\, \frac{f(s+\delta)}{1-F(s)}\, u(s+\delta) - \frac{f(s+\delta)}{1-F(s)}\, u(R) \right],
\]
which can be rewritten as
\[
[1-F(s)]\, u'(s+\delta) = \frac{\beta}{1-\beta}\, f(s+\delta) \left[ u(R) - \beta^l\, u(s+\delta) \right]. \tag{A-9}
\]
As above, consider $s \to R$. Because we assume $F(R) = 1$ in this section, the LHS of (A-9) will go to zero, whereas the RHS is positive when $l > 0$. Clearly, the larger $l$, the larger the RHS. Note also that (A-9) shows that it will always be optimal to explore the entire state space when $l = 0$, as the RHS is then zero as $s \to R$.

Again, to show that any experimentation is undertaken in the first step, I show that the payoff from reaching $S$ in one step is higher than the payoff from doing so in two steps. As the only thing that differs from the calculations (A-2) to (A-4) above is the addition of the continuation value in case the threshold is crossed and discovered, this amounts to showing:
\[
\beta^l\, \frac{\int_s^{s+\tilde\delta} u(y) f(y)\, dy}{(1-F(s))(1-\beta)} + \beta^{l+1}\, \frac{\int_{s+\tilde\delta}^{s+\hat\delta} u(y) f(y)\, dy}{(1-F(s))(1-\beta)} \leq \beta^l\, \frac{\int_s^{s+\hat\delta} u(y) f(y)\, dy}{(1-F(s))(1-\beta)},
\]
which is true (because $\beta < 1$), as the following equivalent reformulation makes clear:
\[
\int_s^{s+\tilde\delta} u(y) f(y)\, dy + \beta \int_{s+\tilde\delta}^{s+\hat\delta} u(y) f(y)\, dy \leq \int_s^{s+\hat\delta} u(y) f(y)\, dy.
\]
Finally, direct inspection of equation (12) in the main text shows that the size of the first step, $\delta^*(s_0)$, is larger the shorter the duration of the lag $l$.
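The "because $\beta < 1$" step of this proof can be checked by quadrature. A quick sketch, assuming $T \sim U[0,1]$ (so $f \equiv 1$) and the illustrative utility $u(x) = \sqrt{x}$, with arbitrary step sizes:

```python
# Quadrature check: discounting the second slice of the integral by beta < 1
# can only lower its value. Assumes T ~ U[0,1] (f = 1) and u(x) = sqrt(x);
# all parameter values are illustrative, not from the paper.
from math import sqrt

beta = 0.9
s, d1, dhat = 0.05, 0.05, 0.15   # s, delta-tilde, delta-hat = delta-tilde + delta-star

def integral(a, b, n=10_000):
    # midpoint rule for the integral of u(y) f(y) dy with f = 1, u = sqrt
    h = (b - a) / n
    return sum(sqrt(a + (i + 0.5) * h) for i in range(n)) * h

lhs = integral(s, s + d1) + beta * integral(s + d1, s + dhat)
rhs = integral(s, s + dhat)
print(round(lhs, 4), round(rhs, 4))
assert lhs < rhs
```

The inequality is strict for any $\beta < 1$ as long as the second slice carries positive probability mass.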
A–5 Proof of Proposition 5

Recall that Proposition 5 states that when the regime shift is reversible after a lag of length $l$ and $T$ is not revealed when $s_t + \delta_t > T$, it is either not optimal to experiment at all, or there is repeated experimentation with decreasing step sizes $\delta_t > \delta_{t+1}$. Experimentation stops the moment that $s_t + \delta_t < T$ or $s_t + \delta_t < \hat U$.

To prove this proposition, I first note that the existence of a set $S$ at which it is not optimal to experiment further is implied by Proposition 4. The continuation value when the location is discovered upon crossing the threshold is larger than the continuation value when the threshold is crossed but not revealed. Because even in the former case it was optimal to leave some of the state space unexplored, as the costs of crossing the threshold become prohibitively high, this must necessarily also be true when the costs of crossing the threshold are larger.

To see that there must be some critical value $\hat U$ below which it does not pay to experiment further, I set $U$ to some small value $s + \varepsilon$ and show that as $\varepsilon \to 0$, $J(s, U) < u(s)/(1-\beta)$. Inserting $s+\varepsilon$ for $U$ in the equation of $J(s, U)$ (equation (13) in the main text), we have:
\[
J(s, s+\varepsilon) = \sup_{\delta \in (0,\varepsilon)} \left\{ u(s+\delta) + \beta\, \frac{\beta^l \int_s^{s+\delta} f(y)\, dy\; V(s, s+\delta) + \int_{s+\delta}^{s+\varepsilon} f(y)\, dy\; V(s+\delta, s+\varepsilon)}{\int_s^{s+\varepsilon} f(y)\, dy} \right\}.
\]
Clearly, for $l = 0$ we would have $\lim_{\varepsilon \to 0} J = u(s)/(1-\beta)$, but because $l > 0$, we have $\lim_{\varepsilon \to 0} J < u(s)/(1-\beta)$. The fact that the step size decreases simply follows from the successive updating of the upper bound $U$: as $U_{t+1} = s_t + \delta_t$, a new step from $s_t$ of size $\delta_{t+1} \in (0, U_{t+1} - s_t)$ must necessarily be smaller than $\delta_t$.

A–6 Proof of Proposition 6

Recall that Proposition 6 states that when the location of the threshold is known with certainty, then there exists, for every combination of $N$, $T$, and $R$, a value $\bar\beta$ such that the first-best can be sustained as a Nash equilibrium when $\beta \geq \bar\beta$. The larger is $N$, or the closer $T$ is to 0, the larger $\bar\beta$ has to be.
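Before the proof, the comparative statics claimed here can be cross-checked numerically from the closed form of equation (15), replicated in the proof below. The sketch assumes the illustrative utility $u(x) = \sqrt{x}$; the values of $N$, $T$, and $R$ are arbitrary:

```python
# Numerical cross-check of the comparative statics of equation (15),
# assuming u(x) = sqrt(x); the parameter values are illustrative.
from math import sqrt

def beta_bar(N, T, R):
    # lowest discount factor sustaining "stay at T" as a Nash equilibrium:
    # beta_bar = 1 - u(T/N) / u(R - (N-1)T/N)
    return 1 - sqrt(T / N) / sqrt(R - (N - 1) * T / N)

base = beta_bar(N=2, T=0.5, R=1.0)
assert beta_bar(2, 0.6, 1.0) < base   # higher T: less patience needed
assert beta_bar(3, 0.5, 1.0) > base   # more players: more patience needed
assert beta_bar(2, 0.5, 1.2) > base   # larger R: more patience needed
print(round(base, 3))                 # prints 0.423
```

All three directions match the statement of Proposition 6 and part (d) of the discussion: $\bar\beta$ falls in $T$ and rises in $N$ and $R$, while remaining strictly below 1 whenever $0 < T < R$.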
Recall that $\bar\beta$ was defined as the lowest value of $\beta$ at which staying at $T$ can be sustained as a Nash equilibrium. Equation (15), which is replicated below, characterizes $\bar\beta$:
\[
\bar\beta = 1 - \frac{u(T/N)}{u\!\left(R - \frac{N-1}{N} T\right)}. \tag{15}
\]
First, fix a value of $N$ and $R$ and consider how $\bar\beta$ changes with $T$:
\[
\frac{d\bar\beta}{dT} = - \frac{\frac{1}{N}\, u'\!\left(\frac{T}{N}\right) u\!\left(R - \frac{N-1}{N}T\right) + u\!\left(\frac{T}{N}\right) \frac{N-1}{N}\, u'\!\left(R - \frac{N-1}{N}T\right)}{u\!\left(R - \frac{N-1}{N}T\right)^2} < 0.
\]
The players need to be the more patient the less valuable it is to stay below the threshold (i.e. as $T$ declines). Note that for $T \to 0$, $u(T/N) \to 0$ while $u(R - \frac{N-1}{N}T) \to u(R) > 0$, so that the ratio $u(T/N)/u(R - \frac{N-1}{N}T) \to 0$. The right-hand side of (15) therefore approaches 1 as $T \to 0$. But since it approaches 1 from below, we can always find some value of $\beta$ that can still sustain the first-best.

Second, fix $T$ and $R$ and consider how $\bar\beta$ changes with $N$:
\[
\frac{d\bar\beta}{dN} = \frac{T}{N^2} \cdot \frac{u'\!\left(\frac{T}{N}\right) u\!\left(R - \frac{N-1}{N}T\right) - u\!\left(\frac{T}{N}\right) u'\!\left(R - \frac{N-1}{N}T\right)}{u\!\left(R - \frac{N-1}{N}T\right)^2} > 0,
\]
where the sign follows from concavity and monotonicity of $u$ (as $T/N < R - \frac{N-1}{N}T$). The more players there are, the more patient they have to be in order to sustain the productive equilibrium. Note that as $N \to \infty$, $u(T/N) \to 0$ while $u(R - \frac{N-1}{N}T) \to u(R-T) > 0$, so that the ratio again goes to 0. Again, $\bar\beta$ approaches 1 from below, which allows finding some value of $\beta$ that can still sustain the first-best.

Finally, fix $N$ and $T$ and consider how $\bar\beta$ changes with $R$:
\[
\frac{d\bar\beta}{dR} = \frac{u\!\left(\frac{T}{N}\right) u'\!\left(R - \frac{N-1}{N}T\right)}{u\!\left(R - \frac{N-1}{N}T\right)^2} > 0.
\]
The larger is $R$, the larger the temptation to deviate and extirpate the resource immediately, which means that $\beta$ must be higher in order to sustain the productive Nash equilibrium. However, as $R > T$ by construction, there will always be some value of $\beta$ at which the resource is preserved indefinitely.

A–7 Proof of Proposition 7

Recall that Proposition 7 states that for $s_0 \geq \bar s^{nc}$, coordination to stay at $s_0$ can be supported as a Nash equilibrium, while for $s_0 < \bar s^{nc}$, taking one step and then staying at $s_1 = s_0 + \delta^{nc}$ can be supported as a Nash equilibrium. First note that if it is a Nash equilibrium to stay at some $s$ in any one period, it will be a Nash equilibrium to stay at that $s$ in all subsequent periods.
Again, there will be some $\bar s^{nc}$ at which staying is a Nash equilibrium, because at least at $s = R$ there is no other choice. But parallel to the argument in Proposition 1, there will also be some $\bar s^{nc} < R$ when $s$ is close enough to $R$ and $\bar F_s(\delta)$ becomes sufficiently small. Also here, there will always be values $\bar s^{nc} < R$ when it is known that there is a catastrophic threshold on $[0,R]$. Suppose all other players stay at $s = R - \varepsilon$; then, for $\varepsilon$ small, the value from staying at $s = R - \varepsilon$ is at least as large as the value of making a step towards $R$ so that the updated value is $R - \delta$ (with $\delta \in (0, \varepsilon]$):
\[
\frac{u\!\left(\frac{R-\varepsilon}{N}\right)}{1-\beta} \geq u\!\left(\frac{R-\varepsilon}{N} + \delta\right) + \beta \bar F_s(\delta)\, \frac{u\!\left(\frac{R-\delta}{N}\right)}{1-\beta}. \tag{A-10}
\]
Parallel to the social optimum, we have $\lim_{\varepsilon \to 0} u\!\left(\frac{R-\varepsilon}{N}\right)/(1-\beta) = u(R/N)/(1-\beta)$. Again, since $\delta \in (0, \varepsilon]$ and $\bar F_s(\delta) \to 0$ as $s + \delta \to R$, we have $\lim_{\varepsilon \to 0} \left[ u\!\left(\frac{R-\varepsilon}{N} + \delta\right) + \beta \bar F_s(\delta)\, u\!\left(\frac{R-\delta}{N}\right)/(1-\beta) \right] = u(R/N) < u(R/N)/(1-\beta)$.

Now, as there is some $\bar s^{nc}$ at which staying is a Nash equilibrium, there will be a last step at which this value is reached. Take some value $s$ at which staying is not a Nash equilibrium. Suppose the strategy of the opponents is to take some step $\delta_1^{-i} < \delta^{nc}(s)$ and then some step $\delta_2^{-i*}(s + \delta_1^{-i} + \delta_1^i)$. The following calculations show that the best reply of player $i$ is to take only one step $\delta_1^{i*}$. Hence $\delta_2^{-i*}(s + \delta_1^{-i} + \delta_1^{i*}) = 0$, and the equilibrium will be to reach a value at which staying is a Nash equilibrium in one step.

For player $i$, the payoff from making one step $\delta_1^{i*} = \bar s^{nc} - s_0 - \delta_1^{-i}$ exceeds the payoff from making two steps $\delta_1^i < \bar s^{nc} - s_0 - \delta_1^{-i}$ and $\delta_2^{i*} = \bar s^{nc} - s_1 - \delta_2^{-i*}$ when:
\[
u\!\left(\frac{s_0}{N} + \delta_1^{i*}\right) + \frac{\beta}{1-\beta}\, \bar F_{s_0}(\delta_1^{i*} + \delta_1^{-i})\, u\!\left(\frac{\bar s^{nc}}{N}\right) \geq u\!\left(\frac{s_0}{N} + \delta_1^i\right) + \beta \bar F_{s_0}(\delta_1^i + \delta_1^{-i}) \left[ u\!\left(\frac{s_1}{N} + \delta_2^{i*}\right) + \frac{\beta}{1-\beta}\, \bar F_{s_1}(\delta_2^{i*})\, u\!\left(\frac{\bar s^{nc}}{N}\right) \right]. \tag{A-11}
\]
As for the coordinated case, $\bar F_{s_0}(s_1 - s_0)\, \bar F_{s_1}(\bar s^{nc} - s_1) = \bar F_{s_0}(\bar s^{nc} - s_0)$, so that (A-11) implies:
\[
u\!\left(\frac{s_0}{N} + \delta_1^{i*}\right) - u\!\left(\frac{s_0}{N} + \delta_1^i\right) \geq \beta \left[ \frac{\bar F(s_1)}{\bar F(s_0)}\, u\!\left(\frac{s_1}{N} + \delta_2^{i*}\right) - \frac{\bar F(\bar s^{nc})}{\bar F(s_0)}\, u\!\left(\frac{\bar s^{nc}}{N}\right) \right]. \tag{A-12}
\]
For clarity, write this inequality as $A - a \geq B - b$.
This inequality holds because both $A > B$ and $a < b$. To see that $A > B$, note that $u$ is an increasing and concave function, so that $u\!\left(\frac{s_0}{N} + \delta_1^{i*}\right) > u\!\left(\frac{s_1}{N} + \delta_2^{i*}\right)$ when $\frac{s_0}{N} + \delta_1^{i*} > \frac{s_1}{N} + \delta_2^{i*}$. Inserting $\delta_1^{i*} = \bar s^{nc} - s_0 - \delta_1^{-i}$, $\delta_2^{i*} = \bar s^{nc} - s_1 - \delta_2^{-i*}$, and $s_1 = s_0 + \delta_1^i + \delta_1^{-i}$, this inequality simplifies to $(N-1)(\delta_1^i + \delta_1^{-i}) > 0$, which is true. By the same argument, $a < b$ when $\frac{s_0}{N} + \delta_1^i < \frac{\bar s^{nc}}{N}$. Rewrite this as $N\delta_1^i < \bar s^{nc} - s_0$. This inequality holds because it is implied by the definitions $\delta_1^i < \bar s^{nc} - s_0 - \delta_1^{-i}$ and $\delta_1^{-i} < \delta^{nc}(s)$.

Recall that the best-reply function $g(\delta^{-i}, s)$ in equation (19b) is therefore defined by the interior solution to the first-order condition of maximizing $\phi(\delta^i; \delta^{-i}, s)$:
\[
\phi'(\delta^i; \delta^{-i}, s) = u'\!\left(\frac{s}{N} + \delta^i\right) + \frac{\beta}{1-\beta} \left[ \bar F_s'(\delta^i + \delta^{-i})\, u\!\left(\frac{s + \delta^i + \delta^{-i}}{N}\right) + \frac{1}{N}\, \bar F_s(\delta^i + \delta^{-i})\, u'\!\left(\frac{s + \delta^i + \delta^{-i}}{N}\right) \right].
\]
For a symmetric step size $\delta^{-i} = (N-1)\delta^i$, we have:
\[
\phi'(\delta^{nc}; s) = u'\!\left(\frac{s}{N} + \delta^{nc}\right) + \frac{\beta}{1-\beta} \left[ \bar F_s'(N\delta^{nc})\, u\!\left(\frac{s}{N} + \delta^{nc}\right) + \frac{1}{N}\, \bar F_s(N\delta^{nc})\, u'\!\left(\frac{s}{N} + \delta^{nc}\right) \right].
\]
The value of $\underline s^{nc}$ is defined by $\delta^{nc} = \frac{R-s}{N}$: it is the largest value of $s$ at which equation (20) does not yet have an interior solution but $\phi'(\delta; s) > 0$ for all $\delta \in [0, R-s)$. Similarly, the value of $\bar s^{nc}$ is defined by $\delta^{nc} = 0$: it is the smallest value of $s$ at which equation (20) no longer has an interior solution but $\phi'(\delta; s) < 0$ for all $\delta \in (0, R-s]$.

A–8 Proof of Proposition 8

Let me repeat the comparative statics results here:
(a) The boundaries $\underline s^{nc}$ and $\bar s^{nc}$, and aggregate extraction for $s \in [\underline s^{nc}, \bar s^{nc}]$, decrease with $\beta$.
(b) An increase in $N$ leads to more aggressive extraction when $\frac{N}{N+1} > u'\!\left(\frac{R}{N}\right) \big/ u'\!\left(\frac{R}{N+1}\right)$.
(c) The more unlikely the regime shift (in terms of first-degree stochastic dominance), the larger the range where a cautious Nash equilibrium exists.
(d) As long as $R < A$, the higher the maximum potential reward $R$, the larger the range where a cautious Nash equilibrium exists.
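Condition (b) depends jointly on $N$, $R$, and the curvature of $u$. A small sketch, assuming the illustrative utility $u(x) = 1 - e^{-x}$ (so $u'(x) = e^{-x}$); the parameter values are assumptions, not taken from the paper:

```python
# Evaluating condition (b) for the illustrative utility u(x) = 1 - exp(-x),
# so that u'(R/N) / u'(R/(N+1)) = exp(-R/(N(N+1))). Whether an extra player
# makes extraction more aggressive then depends on the size of R.
from math import exp

def condition_b(N, R):
    # N/(N+1) > u'(R/N) / u'(R/(N+1)) with u'(x) = exp(-x)
    return N / (N + 1) > exp(-R / N) / exp(-R / (N + 1))

print(condition_b(2, 1.0), condition_b(2, 3.0))   # → False True
```

For this utility the condition reduces to $\frac{N}{N+1} > e^{-R/(N(N+1))}$, so it holds for large $R$ (strong temptation to extirpate) but fails for small $R$.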
First, as $\phi' = 0$ implicitly defines a monotonically decreasing function $\delta^{nc}(s)$ on $[\underline s^{nc}, \bar s^{nc}]$ (which can be shown by replacing $\delta^*(s)$ with $N\delta^{nc}(s)$ in the proof of Proposition 2), and $\delta^{nc}(s)$ is bounded above by $\frac{R-s}{N}$ and below by 0, an increase in $\delta^{nc}$ will also lead to an increase in $\underline s^{nc}$ and $\bar s^{nc}$, respectively.

(a) To prove the proposition's part with respect to $\beta$, it is thus sufficient to analyze $d\phi'/d\beta$. We have
\[
\frac{d\phi'}{d\beta} = \frac{[\,\cdots]}{(1-\beta)^2},
\]
where $[\,\cdots]$ is the term in the square brackets of equation (20). We know that this term must be negative for an interior solution, because $u' > 0$. Therefore $d\phi'/d\beta < 0$.

(b) I now turn to the effect of increasing $N$. To provide a sufficient condition for when an increase in $N$ decreases the range where there is a cautious equilibrium, and therefore increases aggregate expansion, I make the following argument: $\underline s^{nc}$, the largest value at which immediate extirpation is the only Nash equilibrium, becomes larger when adding another player and $\frac{N}{N+1} > u'\!\left(\frac{R}{N}\right)/u'\!\left(\frac{R}{N+1}\right)$. For a given number of players $N$ we have, at a given $\underline s^{nc} = \hat s$, that $\phi'\!\left(\frac{R-\hat s}{N}; \hat s\right) = 0$, and I show that $\phi'\!\left(\frac{R-\hat s}{N+1}; \hat s\right) > 0$ when $\frac{N}{N+1} > u'\!\left(\frac{R}{N}\right)/u'\!\left(\frac{R}{N+1}\right)$:
\[
\phi'\!\left(\frac{R-\hat s}{N+1}; \hat s\right) - \phi'\!\left(\frac{R-\hat s}{N}; \hat s\right) > 0
\]
\[
\Leftrightarrow\; u'\!\left(\frac{R}{N+1}\right) - u'\!\left(\frac{R}{N}\right) + \frac{\beta}{1-\beta} \left[ \bar F_{\hat s}' \left( u\!\left(\frac{R}{N+1}\right) - u\!\left(\frac{R}{N}\right) \right) + \bar F_{\hat s} \left( \frac{1}{N+1}\, u'\!\left(\frac{R}{N+1}\right) - \frac{1}{N}\, u'\!\left(\frac{R}{N}\right) \right) \right] > 0.
\]
The first difference is positive due to concavity of $u$; the first term in the square brackets is positive since $\bar F_{\hat s}' < 0$ and $u\!\left(\frac{R}{N}\right) > u\!\left(\frac{R}{N+1}\right)$; and the last term in the square brackets is positive whenever $\frac{N}{N+1} > u'\!\left(\frac{R}{N}\right)/u'\!\left(\frac{R}{N+1}\right)$.

(c) Consider equation (20) at $s = \underline s^{nc}$:
\[
\phi'\!\left(\frac{R - \underline s^{nc}}{N}; \underline s^{nc}\right) = u'\!\left(\frac{R}{N}\right) + \frac{\beta}{1-\beta} \left[ \bar F_{\underline s^{nc}}'(R - \underline s^{nc})\, u\!\left(\frac{R}{N}\right) + \frac{1}{N}\, \bar F_{\underline s^{nc}}(R - \underline s^{nc})\, u'\!\left(\frac{R}{N}\right) \right] = 0.
\]
Now when the regime shift is more unlikely (denoting the corresponding survival function by $\hat{\bar F}$), the term in the square brackets above becomes smaller in absolute terms.
As it is negative, it must mean that:
$$\tilde{\varphi}'\Big(\frac{R - \underline{s}^{nc}}{N}; \underline{s}^{nc}\Big) = u'\Big(\frac{R}{N}\Big) + \frac{\beta}{1-\beta}\Big[\hat{F}_s'(R - \underline{s}^{nc})\,u\Big(\frac{R}{N}\Big) + \frac{1}{N}\hat{F}_s(R - \underline{s}^{nc})\,u'\Big(\frac{R}{N}\Big)\Big] > 0,$$
so that the range of values at which a cautious equilibrium exists is larger.

(d) Finally, to see the effect of an increase in $R$ to $\tilde{R}$ when $R < A$ and $\tilde{R} \le A$, note that this does not impact equation (20) directly, but it does have an effect on the first value $\underline{s}^{nc}$: as the diagonal line defining the upper bound of $\delta \in [0, R-s]$ shifts outwards, and $\delta^{nc}(s)$ is a downward-sloping function steeper than $R-s$, the first value at which it is not optimal to extirpate the resource must be smaller.

A–9 Proof of Proposition 9

Proposition 9 states that a known threshold is crossed in the first period, or never. I show that crossing the threshold in period $t$ cannot be an equilibrium because the payoff for player $i$ from preempting the crossing of the threshold (i.e., the exhaustion of the resource) at $t$ by exhausting the resource at time $t-1$ is strictly larger. I claim:
$$u\Big(R_{t-1} - \frac{N-1}{N}T\Big) > u\Big(\frac{R_t}{N}\Big),$$
where $u\big(R_{t-1} - \frac{N-1}{N}T\big)$ is the payoff from preempting the exhaustion at $t-1$ while all other players still extract their share of $T$. As $R_t = R_{t-1} - T$ when the threshold is not crossed at time $t$, the above inequality holds when $R_{t-1} - T + \frac{T}{N} > \frac{R_t}{N} \Leftrightarrow R_t > \frac{R_t}{N} - \frac{T}{N}$, which is obviously true. Thus, if it is a Nash equilibrium to exhaust the resource by crossing the threshold, it must be so in the first period.

A–10 Proof of Proposition 10

Recall that Proposition 10 states that in the game described by (22), when $T$ is unknown, there exists, in addition to the aggressive equilibrium in which the resource is exhausted in the initial period, a Pareto-dominant equilibrium in which experimentation – if at all – is undertaken in the first period only, and $s_1 = s_0 + \delta^{nc}(s_0)$ is an upper bound on aggregate extraction for the remainder of the game. The threat of a regime shift slows down the extraction rate and improves welfare. The existence of the aggressive equilibrium is self-evident.
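The preemption inequality from the proof of Proposition 9 above can also be spot-checked numerically. The sketch below is my illustration with arbitrary parameter draws (assuming only $N \ge 2$, $T > 0$, and $R_{t-1} > T$); since any strictly increasing $u$ preserves the inequality, it suffices to compare the consumption levels directly.

```python
# Spot-check of the preemption inequality in the proof of Proposition 9:
#   R_{t-1} - (N-1)/N * T  >  R_t / N,   with  R_t = R_{t-1} - T.
# Any strictly increasing u preserves this ranking. The parameter ranges
# below are arbitrary illustrations, not taken from the paper.
import random

random.seed(0)
for _ in range(1000):
    N = random.randint(2, 10)
    T = random.uniform(0.1, 5.0)
    R_prev = T + random.uniform(0.1, 20.0)  # R_{t-1} > T: stock survives one more period
    R_t = R_prev - T
    preempt = R_prev - (N - 1) / N * T      # consumption when deviating at t-1
    share = R_t / N                         # consumption when exhausting jointly at t
    assert preempt > share
print("preemption payoff exceeds the shared exhaustion payoff in all draws")
```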
The cautious equilibrium, if it exists, must Pareto-dominate the aggressive equilibrium as – by assumption – there was a Pareto-dominant equilibrium with several periods of extraction in a world without the threshold. If the cautious equilibrium implies, for all periods $t$, a reduced extraction compared to the equilibrium extraction path in the absence of regime-shift risk, the cautious equilibrium must be welfare-improving. Below I show that experimentation in the second period of the game is not individually (and socially) optimal. Therefore, if the players coordinate on the cautious equilibrium, and because the cautious equilibrium implies $\delta^{nc} < \tilde{c}^{nc}(R_0) - s_0$ by construction, it slows down the extraction rate for all periods $t$.

To show that experimentation in the second period is not optimal, I argue by contradiction. For a given safe value $s_1$ and a given stock of the remaining resource $R_1$ in the second period, the value of the game for player $i$, when all other players share the extraction of the safe amount $s_1$ equally, is given by:
$$V_2(R_1, s_1) = \max_{\delta^i}\; u\Big(\frac{s_1}{N} + \delta^i\Big) + \beta F_{s_1}(\delta^i)\,V_3\big(R_1 - s_1 - \delta^i\big) \quad (A\text{-}13)$$
Suppose it were optimal to expand the set of safe values in the second period. The first-order condition for an (interior) expansion is given by:
$$u'\Big(\frac{s_1}{N} + \delta^i\Big) = -\beta F_{s_1}'(\delta^i)\,V_3\big(R_1 - s_1 - \delta^i\big) + \beta F_{s_1}(\delta^i)\,V_3'\big(R_1 - s_1 - \delta^i\big) \quad (A\text{-}14)$$
Suppose that the first derivative of the objective function is declining in $s$ in a neighborhood of $\delta^{i*}$ (this is shown below). Then for large $s$ the right-hand side of (A-14) is larger than the left-hand side (hence there is no interior solution).
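The second-period problem (A-13) can be made concrete by grid search under assumed functional forms. The sketch below is my illustration, not part of the paper: it assumes a threshold $T$ uniform on $[0, A]$ (so the conditional survival probability is $F_{s_1}(\delta) = (A - s_1 - \delta)/(A - s_1)$), log utility, and a placeholder increasing concave continuation value $V_3$; the names `F_cond`, `V3`, and `best_step` are mine.

```python
# A minimal numerical sketch of the second-period problem (A-13).
# All functional forms are illustrative ASSUMPTIONS, not taken from the paper.
import math

A, N, beta = 10.0, 3, 0.9   # arbitrary illustrative parameters

def F_cond(s1, d):
    """Pr(expanding by d from safe value s1 does not cross T), T ~ U[0, A]."""
    return max(A - s1 - d, 0.0) / (A - s1)

def V3(x):
    """Placeholder continuation value of a remaining stock x (assumption)."""
    return math.log(1.0 + x) / (1.0 - beta)

def objective(d, R1, s1):
    """Right-hand side of (A-13) for one player, the others staying at s1."""
    return math.log(s1 / N + d) + beta * F_cond(s1, d) * V3(R1 - s1 - d)

def best_step(R1, s1, grid=2000):
    """Grid search for the optimal expansion d in [0, R1 - s1]."""
    ds = [(R1 - s1) * k / grid for k in range(grid + 1)]
    return max(ds, key=lambda d: objective(d, R1, s1))

for s1 in (1.0, 3.0, 5.0):
    print(s1, round(best_step(8.0, s1), 3))
```

Note that `F_cond(s1, 0.0)` equals one, as the conditional survival probability of a zero expansion must; the grid search simply traces how the chosen step varies with the safe value under these assumptions.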
Accordingly, the last value of $s$ at which no expansion can be coordinated upon in the second period of the game is defined by:
$$u'\Big(\frac{s_1}{N}\Big) = \beta V_3'(R_1 - s_1) - \beta F_{s_1}'(0)\,V_3(R_1 - s_1) \quad (A\text{-}15)$$
Now in the first period, the corresponding value function for player $i$ (again presuming that all other players remain at $s_0$) is:
$$V_1(R_0, s_0) = \max_{\delta^i}\; u\Big(\frac{s_0}{N} + \delta^i\Big) + \beta F_{s_0}(\delta^i)\,V_2\big(R_0 - s_0 - \delta^i,\; s_0 + \delta^i\big) \quad (A\text{-}16)$$
The condition for the last value of $s$ at which no expansion can be coordinated upon is given by:
$$u'\Big(\frac{s_0}{N}\Big) = \frac{\beta}{N}\,\frac{\partial V_2\big(R_0 - s_0 - \delta^i,\; s_0 + \delta^i\big)}{\partial s_0} - \beta F_{s_0}'(\delta^i)\,V_2\big(R_0 - s_0 - \delta^i,\; s_0 + \delta^i\big) \quad (A\text{-}17)$$
Now consider the value of $s$ at which (A-17) holds. At this level of $s$, equation (A-15) will not hold (rather, the right-hand side will be larger than the left-hand side), as $\frac{\partial V_2}{\partial s} > u'$ and $V_2\big(R_0 - s_0 - \delta^i,\; s + \delta^i\big) > V_3\big(R_1 - s_1 - \delta^i\big)$. This means that experimentation in the second period is no longer optimal while it is still optimal in the first period. As $\varphi$ is a continuous function of $s$, the same holds also for depletion (the other corner solution).

What remains to be shown is that the first derivative of the objective function is declining in $s$ in a neighborhood of the optimal expansion $\delta^{i*}$. To this end, denote the derivative of the objective function by $\varphi$. Omitting all sub- and superscripts to avoid clutter, we then have:
$$\frac{\partial\varphi}{\partial s} = \frac{1}{N}u'' + \beta\left(\frac{F''(s+\delta)F(s) - F'(s+\delta)F'(s)}{[F(s)]^2}\,u + \frac{1}{N}\frac{F'(s+\delta)}{F(s)}\,u'\right) + \frac{\beta}{N}\left(\frac{F'(s+\delta)F(s) - F(s+\delta)F'(s)}{[F(s)]^2}\,u' + \frac{F(s+\delta)}{F(s)}\,u''\right) \quad (A\text{-}18)$$
Note that the second-order condition for a maximum at $\delta^{i*}$ requires:
$$\frac{\partial\varphi}{\partial\delta} = \frac{1}{N}u'' + \beta\left(\frac{F''(s+\delta)}{F(s)}\,u + \frac{1}{N}\frac{F'(s+\delta)}{F(s)}\,u'\right) + \frac{\beta}{N}\left(\frac{F'(s+\delta)F(s) - F(s+\delta)F'(s)}{[F(s)]^2}\,u' + \frac{F(s+\delta)}{F(s)}\,u''\right) \le 0 \quad (A\text{-}19)$$
Equations (A-18) and (A-19) are similar. In fact,
$$\frac{\partial\varphi}{\partial s} = \frac{\partial\varphi}{\partial\delta} - \beta\,\frac{F'(s+\delta)F'(s)}{[F(s)]^2}\,u.$$
The term $\frac{F'(s+\delta)F'(s)}{[F(s)]^2}$ is positive as $F'(s+\delta) < 0$ and $F'(s) < 0$.
Hence equation (A-18) is negative when the second-order condition holds. Thus experimentation in the second period is not optimal, and a fortiori not in any later period.