Learning under Ambiguity, Portfolio Choice, and Asset Returns

Hongseok Choi∗

December 15, 2015

Job Market Paper

Abstract

This paper investigates the effects of learning under ambiguity on investment strategies and asset returns. The main contributions of the paper are twofold. First, it recognizes and quantifies the uncertainty involved in estimating the data-generating mechanism, or estimation ambiguity. I consider the consumption/portfolio choice problem of a multiple-priors investor with logarithmic felicity and observe that ignoring estimation ambiguity can result in a significant underestimation of hedging demand: even with only a moderate degree of ambiguity in the equity premium (1 percentage point in annual terms), a log investor learning about the data-generating mechanism can sell short an amount of the risky asset worth half of her wealth when the estimated equity premium is zero. The second contribution of the paper is that it provides endogenous dynamics for the conditional degree of ambiguity, and consequently for the ambiguity premium. I observe that the resulting relationship of the total equity premium with return volatility is unclear. The ambiguity premium depends on the history of return volatility rather than its contemporaneous value. Furthermore, a period of low (high) return volatility can instead be followed by a high (low) ambiguity premium.

Keywords: ambiguity, learning, state-space models, recursive multiple-priors utility, model selection, hedging demand, equity premium.

∗ University of Pennsylvania, aitch.choi@gmail.com. This paper is a revised version of my doctoral dissertation at the University of Pennsylvania (Choi, 2012), and I am deeply indebted to my advisor, Domenico Cuoco, for his guidance and support throughout the course of this research.
I would also like to thank Philipp Illeditsch and Dirk Krueger, as well as the seminar participants at Seoul National University, Korea University Business School, the 11th Annual Conference of the Asia-Pacific Association of Derivatives (especially Sun-Joong Yoon), Korea University Mathematics Department, the Korea Institute of Finance, and Yonsei University.

1 Introduction

In standard financial models, agents are depicted as having a unique probabilistic model regarding uncertain investment opportunities. However, our knowledge about the data-generating mechanism (for example, how stock returns are generated) is often limited, and in such cases it is difficult to assess the probabilities of all uncertain events precisely; in other words, we face ambiguity. One of the most popular ways to model decision makers under ambiguity is through Gilboa and Schmeidler’s (1989) multiple-priors utility, and over the past few decades multiple-priors models have successfully demonstrated the economic significance of taking ambiguity into account by shedding light on a number of puzzling phenomena in financial markets.1

In most dynamic multiple-priors models, however, the degree of ambiguity as perceived by an agent follows an exogenous process, whether constant or time-varying. In a dynamic multiple-priors model, an agent’s view about the uncertain outcomes of the immediate future is represented by a set of one-step-ahead conditionals as opposed to a single one-step-ahead conditional;2 and the size of the set of one-step-ahead conditionals is thought to represent the conditional degree of ambiguity. Some authors have assumed that the size of the set of one-step-ahead conditionals follows a mean-reverting process;3 a larger number of authors have assumed that it stays constant, interpreting the constancy as lack of learning due to the agent’s having learned all that she can.4 While these models of ambiguity provide useful benchmarks, they are not ideal, essentially being reduced-form models.
1 Among other examples: stock market nonparticipation (Dow and Werlang, 1992), excess volatility (Epstein and Wang, 1994; Illeditsch, 2011), excess equity premium (Chen and Epstein, 2002), and equity home bias (Epstein and Miao, 2003).
2 Let $\mathcal{G}_0, \mathcal{G}_1, \dots, \mathcal{G}_T$ be the agent’s observation filtration. Given a prior $P$ on $\mathcal{G}_T$, the implied time-$t$ one-step-ahead conditional means the restriction of $P(\cdot \mid \mathcal{G}_t)$ to $\mathcal{G}_{t+1}$.
3 See Sbuelz and Trojani (2008), Drechsler (2013), and Ilut and Schneider (2014).
4 See Hernández-Hernández and Schied (2006, 2007a,b), Schied (2008), Miao (2009), Routledge and Zin (2009), and Liu (2011, 2013) for applications of constant ambiguity to portfolio choice, and Chen and Epstein (2002), Epstein and Miao (2003), Trojani and Vanini (2004), and Gagliardini et al. (2009) for those to asset pricing.

In this paper, I endogenize the conditional degree of ambiguity and its response to observations by constructing a model of learning under ambiguity. The underlying idea is familiar and simple: fearing misspecification, the agent considers multiple Bayesian models, or theories, of the market, and at each instant she constructs the set of one-step-ahead conditionals by updating the theories with sufficiently high plausibility. The set of theories is constructed around a “reference theory” that data are generated by a continuous-time linear state-space model. The agent, however, is not bold enough to claim that she has identified the model of the market, and seeks a degree of robustness by expanding her consideration to a class of theories that are statistically close to the reference theory in the sense of mutual absolute continuity. These theories are not as intuitive and simple as the reference theory, but, since they all agree with the reference theory on what events are and are not possible, it is difficult to tell them apart from each other even after a long period of observation.
Now, a key feature of the present paper is that the agent is unable to assign second-order probabilities over the theories. However complicated a Bayesian hierarchy may be, as long as it is complete the agent can tell the probabilities of all conceivable events, and the objective of this paper is precisely to consider a case in which she cannot. Consequently, the agent instead constructs a “confidence set” of theories by comparing their plausibility.

The main contributions of the paper are twofold. First, it recognizes and quantifies the uncertainty involved in estimating the data-generating mechanism. In this respect, the paper is also closely related to the Bayesian literature on estimation risk (Kalymon, 1971; Barry, 1974; Klein and Bawa, 1976, 1977). In Bayesian models, agents may not know the exact values of certain parameters, but they still have a prior over the parameter space, which is updated as they make observations. The dispersion of the posterior distribution is known as estimation risk. In contrast, in the present paper the agent not only faces estimation risk under each theory but also faces uncertainty in estimating the unknown data-generating mechanism, or estimation ambiguity. The economic significance of estimation ambiguity comes from the fact that in revising their estimates, agents place a greater weight on new evidence the more unreliable the current estimates are. This means a higher local correlation between the return and the state variable (of the control problem) and consequently higher hedging demand. In other words, ignoring estimation ambiguity can result in an underestimation of hedging demand, as I demonstrate in Section 4.

In Section 4, I solve, up to a value function, the consumption/portfolio choice problem of a multiple-priors investor with logarithmic felicity. After calibrating the model to U.S.
stock market data (Barberis, 2000), I numerically compute the optimal Markov policy and observe that even with only a moderate degree of ambiguity in the equity premium (1 percentage point in annual terms), an investor learning about the data-generating mechanism can sell short an amount of the risky asset worth half of her wealth when the estimated equity premium is zero. This is in stark contrast with both the Bayesian result that log investors are myopic, in which case an absence of an equity premium implies no investment in the risky asset (see, for example, Kim and Omberg, 1996), and the static multiple-priors result that ambiguity aversion deters investors (with a general felicity function) from taking a position in the risky asset for a range of prices (Dow and Werlang, 1992). Multiple-priors log investors’ nonmyopic behavior was first observed in discrete time by Epstein and Schneider (2007) and in continuous time by Hernández-Hernández and Schied (2007a).

The second contribution of the paper is that, as I have emphasized, it provides endogenous dynamics for conditional ambiguity. The economic significance of these dynamics comes from the fact that conditional ambiguity is directly reflected in the equilibrium equity premium; that is, the equity premium includes a reward for bearing ambiguity, or ambiguity premium, as well as the risk premium (Chen and Epstein, 2002). Specifically, in Section 5, I consider a Lucas economy populated by a multiple-priors agent with logarithmic felicity and explore a few theoretical possibilities regarding the dynamics and asymptotic level of the ambiguity premium.

First, the relationship between the total equity premium and the conditional variance of returns is theoretically, as well as empirically, unclear.
Motivated by standard asset pricing models such as the Intertemporal CAPM of Merton (1973), numerous empirical papers have investigated the relationship between the two, but the findings are mixed.5 In the present model, the equity premium is given by the conditional variance of returns plus the ambiguity premium, and the latter both varies over time and does not have a deterministic relationship with the former. The ambiguity premium, following an absolutely continuous process, responds slowly to variations in the conditional variance of returns, and thus depends on the recent history of the latter rather than its contemporaneous value. Furthermore, if expected and realized growth rates are locally negatively correlated, a period of low (high) return volatility can instead be followed by a high (low) ambiguity premium.

Second, in the long run, the ambiguity premium exhibits a downward trend, and consequently so does the total equity premium. This is because, in the present model, the ambiguity premium is a reward for bearing ambiguity in the expected growth rate, and the component ambiguity in the long-run growth rate resolves as time passes. The equity premium has indeed been observed to have declined over the postwar period (Merton, 1980; Blanchard, 1993; Jagannathan et al., 2000), and part of it could be due to learning and the accompanying resolution of ambiguity.

Finally, in Section 5.5, I demonstrate that an improvement in the quality of public information can increase the asymptotic level of the equity premium. In a Bayesian framework, Veronesi (2000) made a similar observation that higher precision of signals tends to increase the risk premium. What distinguishes the present observation from Veronesi’s is that his result relies on the representative agent’s being sufficiently risk-averse, more so than log agents.
In his model, a deterioration in the quality of public information decreases the equity premium because the agent’s hedging demand tones down the covariation between returns and consumption growth. In contrast, the present paper shows that the equity premium can exhibit such counterintuitive behavior even under unit risk aversion, which is conventionally associated with myopia.

5 See the introduction of Rossi and Timmermann (2015).
6 Campanale (2011) applies Epstein and Schneider’s model in the context of life-cycle portfolio choice, and Miao and Wang (2011), in the context of job matching. Epstein and Schneider’s (2008) asset pricing model, too, conforms to the formalism of Epstein and Schneider (2007), but the agents of the 2008 paper do not rule out any of the theories they a priori entertain.

To the best of my knowledge, the one paper in the existing literature that derives time variation in the set of one-step-ahead conditionals from a model of learning is Epstein and Schneider (2007).6 In that paper, too, the predictive set is constructed by a statistical test over multiple theories. The main differences between Epstein and Schneider’s paper and mine are twofold. First, whereas Epstein and Schneider consider memoryless data-generating mechanisms, I consider those with serial dependence. The latter consideration clearly complements the former. For example, with the present model, we can study the effects of learning under ambiguity when stock returns or the growth of consumption/dividends is (ambiguously) predictable (Sections 4 and 5).7 Second, whereas Epstein and Schneider set their model in discrete time, I set mine in continuous time. Continuous-time modeling is known to facilitate analysis with its analytical tractability.
More importantly, I also note that the continuous-time counterpart of Epstein and Schneider’s portfolio choice example results in no learning because the likelihood function degenerates to infinity everywhere.8 Consequently, their discrete-time finding that learning resolves ambiguity does not immediately carry over to continuous time: learning under ambiguity in continuous time needs separate treatment.

Another paper that considers learning in the context of multiple-priors utility is Miao (2009). Specifically, he considers the consumption/portfolio choice problem of a multiple-priors investor in continuous time who partially observes stochastic investment opportunities. However, his notion of learning is fundamentally different from mine. Miao’s investor obtains a benchmark predictive measure by updating a reference theory, and the set of predictive measures is given by a neighborhood of the benchmark with a fixed radius. Thus, learning and ambiguity do not interact. In fact, Miao’s model is the limit of the present model as the investor gains confidence (Section 4.4.3).9

See also Hansen and Sargent (2011), Chen et al. (2014), and the references therein. The former paper considers learning in the context of robust control, and the latter in the context of smooth ambiguity. In both papers, agents learn by updating a Bayesian model. Chen et al.’s agent, in particular, entertains a standard hierarchical Bayesian model. In a smooth ambiguity model, aversion to ambiguity is captured not by imprecise probabilities but by failures to reduce compound lotteries.

The rest of the paper is organized as follows. Section 2 defines and solves the model of learning under ambiguity. Section 3 discusses the results. Section 4 applies the model to portfolio choice; Section 5, to asset pricing. All proofs are collected in the appendix.

7 There indeed is convincing evidence that both returns and growth rates are predictable (van Binsbergen and Koijen, 2010).
For a review of the predictability literature, see Koijen and van Nieuwerburgh (2011).
8 See the supplementary appendix.
9 Liu (2011) considers the consumption/portfolio choice problem of a Miao investor when expected returns follow a Markov chain.

2 Learning under Ambiguity

This section defines and solves the model of learning. Section 2.1 describes the agent’s utility function and Section 2.2 her theories. Section 2.3 then applies the learning mechanism outlined in the introduction to the setting; this amounts to mapping the theories to the priors in the representation of preferences.

2.1 Preferences

The agent has continuous-time recursive multiple-priors utility with equivalent priors as formulated by Chen and Epstein (2002). Specifically, time is continuous and varies over $[0, T]$, $T \in (0, \infty)$. Let $\Omega$ denote the set of states of Nature, and let a filtration $\mathbb{G} = \{\mathcal{G}_t\}$ on $\Omega$ represent the accrual of the agent’s information. There is a set $\mathcal{P}$ of equivalent probability measures on $(\Omega, \mathcal{G}_T)$, the set of priors, with the following properties. Let $P^0 \in \mathcal{P}$. There is an $n_y$-dimensional Wiener process $W = \{W(t), \mathcal{G}_t\}$ under $P^0$ such that $\mathbb{G}$ is the augmentation under $P^0$ of the filtration generated by $W$. (The notation $W = \{W(t), \mathcal{G}_t\}$ signifies that the process $W$ is adapted to the filtration $\mathbb{G}$. All vectors, including the gradient $\partial f$ of a scalar function $f$, are column vectors. Hence, $W(t) = (W_1(t), \dots, W_{n_y}(t))^\top$.) Each prior is identified with the corresponding density generator $\xi = \{\xi(t), \mathcal{G}_t\}$ and is thus written as $P^\xi$, where

$$dP^\xi = \mathcal{E}^\xi(T)\, dP^0 \tag{1}$$

and $\mathcal{E}^\xi$ denotes the Doléans-Dade exponential

$$\mathcal{E}^\xi(t) \triangleq \exp\left\{ \int_0^t \xi(s)^\top dW(s) - \frac{1}{2} \int_0^t |\xi(s)|^2\, ds \right\}, \quad 0 \le t \le T.$$

Then, under $P^\xi$, $W^\xi$ defined by $dW^\xi(t) = dW(t) - \xi(t)\, dt$, $W^\xi(0) = 0$, is a Wiener process.
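As a numerical illustration of the change of measure in (1), the following minimal Python sketch checks two of its implications by Monte Carlo in the scalar case with a constant density generator: the Doléans-Dade exponential has mean one under $P^0$, and reweighting by it shifts the mean of the $P^0$-Wiener process at time $T$ to $\xi T$. The function name `girsanov_check`, the value $\xi = 0.5$, the horizon, the sample size, and the seed are illustrative assumptions and not part of the model.

```python
import math
import random


def girsanov_check(xi=0.5, T=1.0, n_paths=100_000, seed=0):
    """Monte Carlo check of dP^xi = E^xi(T) dP^0 for a constant scalar xi.

    Under P^0 the terminal value of the Wiener process is N(0, T), and the
    density is exp(xi * W_T - 0.5 * xi**2 * T).
    """
    rng = random.Random(seed)
    sum_density = 0.0
    sum_weighted_w = 0.0
    for _ in range(n_paths):
        w_T = rng.gauss(0.0, math.sqrt(T))          # draw under P^0
        density = math.exp(xi * w_T - 0.5 * xi * xi * T)
        sum_density += density
        sum_weighted_w += density * w_T
    mean_density = sum_density / n_paths            # estimates E^0[density], close to 1
    mean_w_under_xi = sum_weighted_w / n_paths      # estimates E^xi[W_T], close to xi * T
    return mean_density, mean_w_under_xi


mean_density, mean_w_under_xi = girsanov_check()
```

Both numbers are sample estimates, so they match the theoretical values only up to Monte Carlo error.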
$\mathcal{P}$ is further required to be rectangular, which means that there is a set-valued process $\Xi : [0, T] \times \Omega \to 2^{\mathbb{R}^{n_y}}$ such that $P^\xi$ defined by (1) is a prior if and only if $\xi$ is progressive and $\xi(t, \omega) \in \Xi(t, \omega)$ for Lebesgue$\times P^0$ almost every $(t, \omega)$. Since $\mathcal{P}$ consists of equivalent measures, “for Lebesgue$\times P^0$ almost every $(t, \omega)$” is henceforth abbreviated without obscurity to “a.e.” $\Xi$ is called the one-step-ahead beliefs process10 and is further required to be (i) uniformly bounded, (ii) compact-convex-valued, and (iii) “progressive”: (i) $\Xi(t, \omega) \subset K$ a.e. for some bounded $K \subset \mathbb{R}^{n_y}$, (ii) $\Xi(t, \omega)$ is compact-convex a.e., and (iii) the restriction of $\Xi$ to $[0, t] \times \Omega$ is $\mathcal{B}_{[0,t]} \otimes \mathcal{G}_t$-measurable11 for all $t \in [0, T]$, where $\mathcal{B}_X$ denotes the Borel $\sigma$-algebra of a topological space $X$. Incidentally, constant ambiguity, or independently and indistinguishably distributed (IID) ambiguity (Epstein and Schneider, 2003; Chen and Epstein, 2002), refers to the situation where $\Xi$ is constant, that is, $\Xi(t, \omega) = K$ a.e. for some compact-convex $K \subset \mathbb{R}^{n_y}$.

10 Since I work with continuous time, one-step-ahead is an abuse of language, which can be remedied, if need be, by introducing infinitesimal generators. See, for example, Anderson et al. (2003).
11 $\{(s, \omega) \in [0, t] \times \Omega : \Xi(s, \omega) \cap K' \neq \emptyset\} \in \mathcal{B}_{[0,t]} \otimes \mathcal{G}_t$ for all closed $K' \subset K$. See Aliprantis and Border (1999), Sections 16.1, 16.2, and 17.1.

A scalar process $c = \{c(t), \mathcal{G}_t\}$ is a consumption process if it is progressive, positive, and integrable with respect to time. Denote the set of consumption processes by $\mathcal{C}$. The agent’s conditional preferences at time $t \in [0, T]$ are represented by

$$U^c(t, \omega) = \min_{P \in \mathcal{P}} U^{c,P}(t, \omega), \quad c \in \mathcal{C}, \tag{2}$$

where $U^{c,P} = \{U^{c,P}(s), \mathcal{G}_s\}$, the utility process under $P \in \mathcal{P}$, uniquely solves the backward stochastic differential equation (BSDE)

$$U^{c,P}(s) = E^P\left[ \int_s^T F(c(\tau), U^{c,P}(\tau))\, d\tau \,\Big|\, \mathcal{G}_s \right], \quad t \le s \le T.$$

Here, $F$ is the aggregator. See Chen and Epstein (2002), Section 2.5, for the conditions that an aggregator must satisfy.
The utility process $U^c$ defined by (2) solves

$$dU^c(t) = -F(c(t), U^c(t))\, dt + \max_{\xi(t) \in \Xi(t)} \sigma_U^c(t)\,(dW(t) + \xi(t)\, dt), \quad 0 \le t \le T, \tag{3}$$

with terminal condition $U^c(T) = 0$, for some process $\sigma_U^c = \{\sigma_U^c(t), \mathcal{G}_t\}$. When (3) is viewed as a BSDE, the pair $(U^c, \sigma_U^c)$ constitutes a solution. Interpretation is straightforward. By the assumption that $W$ generates $\mathbb{G}$, all changes in fundamentals to take place over the next instant are functions of $dW(t)$; (3) then indicates that the agent’s conditional beliefs about the one-step-ahead uncertainties are summarized by a set of distributions $\{N(\xi(t)\, dt, I_{n_y}\, dt) : \xi(t) \in \Xi(t)\}$; and as a “pessimist,” the agent assesses each consumption plan under the distribution that is worst for the plan.

2.2 The Theories

Denote the observable process that generates the agent’s information by $y = \{y(t), \mathcal{G}_t\}$. That is, $\mathbb{G}$ is the $P^0$-augmentation of the filtration generated by $y$. $n_y \ge 1$ denotes the dimension of $y$. Examples of $y$ will be given shortly. In this section, I define the theories that the agent entertains about how $y$ is generated. They are given by a set of probability measures $Q \in \mathcal{Q}$ on a common measurable space.

2.2.1 The Reference Likelihood

Let there be a filtration $\mathbb{F} = \{\mathcal{F}_t\}$ on $\Omega$ and a probability measure $Q^{\bar{x},0}$ on $(\Omega, \mathcal{F}_T)$, where $\bar{x} \in \mathbb{R}^{n_x}$, $n_x \ge 1$. $\mathbb{F}$ satisfies the usual conditions with respect to $Q^{\bar{x},0}$. Let there also be two independent $(Q^{\bar{x},0}, \mathbb{F})$-Wiener processes $w$ and $v^{\bar{x},0}$, $n_y$-dimensional and $n_x$-dimensional, respectively. Under $Q^{\bar{x},0}$, $y$ satisfies the following system of stochastic differential equations (SDEs):

$$dy(t) = (a(t, y) + b(t, y) x(t))\, dt + \sigma(t, y)\, dw(t),$$
$$dx(t) = \kappa(\bar{x} - x(t))\, dt + \rho_w\, dw(t) + \rho_v\, dv^{\bar{x},0}(t).$$
Here, $x = \{x(t), \mathcal{F}_t\}$ is an $n_x$-dimensional process that is unobservable to the agent; $a$, $b$, and $\sigma$ are nonanticipative path functionals from $[0, T] \times C([0, T], \mathbb{R}^{n_y})$ into $\mathbb{R}^{n_y}$, $\mathbb{R}^{n_y \times n_x}$, and $\mathbb{R}^{n_y \times n_y}$, respectively, where $C([0, T], \mathbb{R}^{n_y})$ denotes the set of continuous functions from $[0, T]$ into $\mathbb{R}^{n_y}$;12 $\kappa$ is an $n_x \times n_x$ diagonal matrix with positive entries, $\rho_w$ is an $n_x \times n_y$ matrix, and $\rho_v$ is an $n_x \times n_x$ invertible matrix. Given that $y$ is observed, the diffusion matrix process $\sigma\sigma^\top$, too, is observed via quadratic variation-covariation. The assumption that $\sigma$ as a nonanticipative path functional depends only on $y$, or equivalently, that $\sigma$ as a process is adapted to $\mathbb{G}$, embodies the restriction that observing the diffusion matrix does not expand the agent’s information. $x(0)$ is an $\mathcal{F}_0$-measurable random variable. The distribution of $x(0)$ conditional on $\mathcal{G}_0$ is normal with mean $m_0 \in \mathbb{R}^{n_x}$ and variance-covariance matrix $\gamma_0 \in \mathbb{R}^{n_x \times n_x}$. For simplicity, I assume $y(0)$ is nonrandom. All the parameters and functionals are known except $\bar{x}$. While this assumption may seem unrealistic (if the agent knows so much, why not $\bar{x}$, or vice versa?), I point out that (i) in many special cases considered in the literature, the functionals are simple, being constant or linear, and (ii) the restrictive form of ignorance is only a first step. The agent may well find $\kappa$ ambiguous as well, for example.

12 Let $\iota$ be such that $\iota(t, f) = f(t)$, $0 \le t \le T$, $f \in C([0, T], \mathbb{R}^{n_y})$. Let $\mathcal{B}_t \triangleq \sigma(\iota(s) : 0 \le s \le t)$. Then $a$, $b$, and $\sigma$ are measurable and adapted to $\{\mathcal{B}_t\}$.

Example 2.1 (Stock Returns with Constant Volatility). Suppose that the cumulative return process $R$ of a stock satisfies

$$dR(t) = x(t)\, dt + \sigma_R\, dw(t)$$

and $R$ is the only observable process. Then $y = R$ with $a \equiv 0$, $b \equiv 1$, and $\sigma \equiv \sigma_R > 0$.

Example 2.2 (Extra Signal). Continue to assume constant volatility, but now assume there is an extra signal about the unobservable expected return:

$$dR(t) = x(t)\, dt + \sigma_R \left( \sqrt{1 - r_{RA}^2},\; r_{RA} \right) dw(t),$$
$$dA(t) = x(t)\, dt + \sigma_A\, (0, 1)\, dw(t),$$

where $r_{RA} \in (-1, 1)$ and $\sigma_R, \sigma_A > 0$. Then $y = (R, A)^\top$ with

$$a \equiv \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad b \equiv \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \text{and} \quad \sigma \equiv \begin{pmatrix} \sigma_R \sqrt{1 - r_{RA}^2} & \sigma_R r_{RA} \\ 0 & \sigma_A \end{pmatrix}.$$

Similar examples can be constructed in general equilibrium settings in which, for example, consumption/dividend growth replaces the stock returns (see Section 5).

Now, the reference likelihood function of the parameter $\bar{x}$ under full observation, or simply the reference likelihood, is defined by

$$L_{\mathrm{FO},T}(\bar{x}) \triangleq \frac{dQ^{\bar{x},0}}{dQ^{0,0}}.$$

2.2.2 Bayesian Benchmark

Let $M$ be a probability measure on $(\mathbb{R}^{n_x}, \mathcal{B}_{\mathbb{R}^{n_x}})$. Then, $(M, L_{\mathrm{FO},T})$ defines a Bayesian model of data generation, according to which $\bar{x}$ is drawn from $M$, and conditional on $\bar{x}$, $(y, x)$ from $L_{\mathrm{FO},T}(\bar{x})$. Incidentally, $M$, too, is typically called the prior, but to distinguish it from $P \in \mathcal{P}$ (and $Q \in \mathcal{Q}$), I refer to it as the parameter prior.

2.2.3 The Theories

Bayesian agents behave as if they knew the probabilities of all events precisely. The agent of this paper, in contrast, lacks confidence in her understanding of the environment and finds both the parameter and the likelihood ambiguous. To elaborate, the agent’s perception of ambiguity regarding $\bar{x}$ is expressed by multiple parameter priors. For simplicity, I assume that the parameter priors are all Dirac measures; that is, the set of parameter priors is given by

$$\mathcal{M} = \{\mathrm{Dirac}^{\bar{x}'} : \bar{x}' \in \mathbb{R}^{n_x}\},$$

where $\mathrm{Dirac}^{\bar{x}'}$ denotes the Dirac measure concentrated at $\bar{x}' \in \mathbb{R}^{n_x}$.13

Similarly, the agent entertains multiple likelihoods as well. Fix $\bar{x}$. Let there be a probability measure $Q^{\bar{x},\eta}$, $\eta \in L^2([0, T], \mathbb{R}^{n_x})$, on $(\Omega, \mathcal{F}_T)$, where $L^2([0, T], \mathbb{R}^{n_x})$ denotes the set of square-integrable $\mathbb{R}^{n_x}$-valued functions. Let there also be an $n_x$-dimensional Wiener process $v^{\bar{x},\eta} = \{v^{\bar{x},\eta}(t), \mathcal{F}_t\}$ independent of $w$.
Under $Q^{\bar{x},\eta}$, $(y, x)$ satisfies the following system of SDEs:

$$dy(t) = (a(t, y) + b(t, y) x(t))\, dt + \sigma(t, y)\, dw(t), \tag{4}$$
$$dx(t) = \kappa(\bar{x} + \kappa^{-1} \rho_v \eta(t) - x(t))\, dt + \rho_w\, dw(t) + \rho_v\, dv^{\bar{x},\eta}(t), \tag{5}$$

where, as before, $y(0) \in \mathbb{R}^{n_y}$ is nonrandom and the $\mathcal{F}_0$-measurable random variable $x(0)$ has the conditional distribution $x(0) \mid \mathcal{G}_0 \sim N(m_0, \gamma_0)$. The presence of $\eta$ clearly allows for the structural breaks suspected in the literature; meanwhile, the lack of assumptions on $\eta$ (besides the minimal technical one of square integrability) reflects the fact that evidence is too inconclusive for the agent to confidently make one.

13 To quote Epstein and Schneider (2007), who first considered multiple parameter priors, “Indeed, one may wonder whether there is a need for non-Dirac [parameter] priors at all.”

The set of full-observation likelihoods is given by

$$\mathcal{L}_{\mathrm{FO},T} = \left\{ \bar{x} \mapsto L_{\mathrm{FO},T}(\bar{x}, \eta) : \eta \in L^2([0, T], \mathbb{R}^{n_x}) \right\}, \qquad L_{\mathrm{FO},T}(\bar{x}, \eta) \triangleq \frac{dQ^{\bar{x},\eta}}{dQ^{0,0}}.$$

(5) can alternatively be written as

$$dx(t) = \kappa(\bar{x} - x(t))\, dt + \rho_w\, dw(t) + \rho_v\, (dv^{\bar{x},\eta}(t) + \eta(t)\, dt),$$

which shows that the ambiguity in question is equivalent to that in the noise $v^{\bar{x},\eta}$ specific to the state dynamics and not the other one, $w$. Indeed, the agent of this paper is not mechanically taking into consideration all the theories that are close to a reference, in which case $w$, too, would be perturbed, but is rather questioning a particular aspect of market dynamics, namely, mean reversion.

Now I turn to the issue of the existence and uniqueness of a solution to the system of SDEs (4)–(5). $|\cdot|$ denotes the Euclidean norm for vectors and the Frobenius norm for matrices; that is, for a vector or matrix $z$, $|z| \triangleq \sqrt{\operatorname{tr}(z z^\top)}$. All numbered assumptions stand throughout the paper from their statement on, unless otherwise noted.

Assumption 2.1 (Sufficient Conditions for Unique Strong Existence). (i) $b$ is uniformly bounded. (ii) For all $f \in C([0, T], \mathbb{R}^{n_y})$,

$$\int_0^T \left( |a(t, f)| + |\sigma(t, f)|^2 \right) dt < \infty.$$

(iii) $a$, $b$, and $\sigma$ are locally Lipschitz.
That is, for each $N$ there is a $K_N$ such that

$$\sup_{s \le t} |f(s)| \vee \sup_{s \le t} |g(s)| \le N \;\Rightarrow\; |\sigma(t, f) - \sigma(t, g)| \le K_N \sup_{s \le t} |f(s) - g(s)|$$

for all $t \in [0, T]$; and the same for $a$ and $b$ mutatis mutandis. (iv) $a$ and $\sigma$ are linearly growing. That is, there is a $K$ such that

$$|a(t, f)| + |\sigma(t, f)| \le K \left( 1 + \sup_{s \le t} |f(s)| \right)$$

for all $(t, f) \in [0, T] \times C([0, T], \mathbb{R}^{n_y})$.

Proposition 2.1. Strong existence and pathwise uniqueness hold for the system of SDEs (4)–(5).

Suppose $(w, v^{\bar{x},\eta})$ and $(y, x)$ are defined on some filtered complete probability space. With a slight abuse of notation, $\sigma(t) \equiv \sigma(t, \omega) \equiv \sigma(t, y(\omega))$. With the notation $\sigma(t, \omega)$, $\sigma$ can be considered a process (adapted to $\mathbb{G}$). Similar remarks apply to the other functionals. As is the custom, the qualification almost surely is suppressed unless necessary.

Assumption 2.2. There is an $\varepsilon > 0$ such that $z^\top \sigma(t) \sigma(t)^\top z \ge \varepsilon |z|^2$ for all $z \in \mathbb{R}^{n_y}$ and all $t \in [0, T]$.

Assumption 2.2 implies that $\sigma(t)$ has an inverse and $|\sigma(t)^{-1} z| \le \varepsilon^{-1/2} |z|$ for all $z$ and $t$; likewise, $\sigma(t)^\top$, too, has an inverse and $|(\sigma(t)^\top)^{-1} z| \le \varepsilon^{-1/2} |z|$ for all $z$ and $t$ (Karatzas and Shreve, 1988, Problem 5.8.1).

With the last observation, we can rewrite (4) and (5) as

$$dw(t) = \sigma(t)^{-1} [dy(t) - (a(t) + b(t) x(t))\, dt], \tag{6}$$
$$dv^{\bar{x},\eta}(t) = -\eta(t)\, dt + \rho_v^{-1} \left\{ dx(t) - \kappa(\bar{x} - x(t))\, dt - \rho_w \sigma(t)^{-1} [dy(t) - (a(t) + b(t) x(t))\, dt] \right\}, \tag{7}$$

and use these SDEs to define $w$ and $v^{\bar{x},\eta}$. Let $\Omega \triangleq C([0, T], \mathbb{R}^{n_y}) \times C([0, T], \mathbb{R}^{n_x})$, $\mathcal{F}^\circ \triangleq \mathcal{B}_{C([0, T], \mathbb{R}^{n_y})} \otimes \mathcal{B}_{C([0, T], \mathbb{R}^{n_x})}$, let $(y, x)$ be the identity map on $\Omega$, and let $Q^{\bar{x},\eta} \triangleq \operatorname{law}(y, x)$ be defined on $(\Omega, \mathcal{F}^\circ)$. Let $\mathbb{F} = \{\mathcal{F}_t\}$ be the augmented filtration generated by $(y, x)$. Since $\{Q^{\bar{x},\eta} : \bar{x}, \eta\}$ are equivalent, they all lead to the same augmentation. Finally, define $\sigma$ by $\sigma(t, \omega) = \sigma(t, y(\omega))$; $a$ and $b$ similarly; and $w$ and $v^{\bar{x},\eta}$ by (6) and (7). In particular, this construction (of weak solutions) explains why $v^{\bar{x},\eta}$ is superscripted.
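To make the role of $\eta$ in (4)–(5) concrete, the following is a minimal Euler-Maruyama sketch in Python for the scalar constant-volatility case of Example 2.1 ($a \equiv 0$, $b \equiv 1$, $\sigma \equiv \sigma_R$). It rests on several assumptions that are not in the text: $\eta$ is held constant, $x(0)$ is fixed rather than drawn from a normal distribution, the parameter values are arbitrary, and the function name `simulate_example_2_1` is hypothetical. Because the state equation is linear, two paths driven by the same shocks differ only by a deterministic term that converges to $\kappa^{-1} \rho_v \eta$, the shift in the mean-reversion level implied by (5).

```python
import math
import random


def simulate_example_2_1(x_bar, eta, kappa=2.0, rho_w=0.0, rho_v=0.2,
                         sigma_R=0.15, x0=0.0, T=10.0, dt=0.01, seed=1):
    """Euler-Maruyama path of (y, x) in the scalar constant-volatility case.

    dy = x dt + sigma_R dw
    dx = kappa * (x_bar + rho_v * eta / kappa - x) dt + rho_w dw + rho_v dv
    eta is constant here, which is an illustrative simplification.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    y, x = 0.0, x0
    for _ in range(int(round(T / dt))):
        dw = rng.gauss(0.0, sqrt_dt)   # shock shared by returns and state
        dv = rng.gauss(0.0, sqrt_dt)   # shock specific to the state
        dy = x * dt + sigma_R * dw
        dx = (kappa * (x_bar + rho_v * eta / kappa - x) * dt
              + rho_w * dw + rho_v * dv)
        y += dy
        x += dx
    return y, x


# Same seed, so both paths share the same shocks and differ only through eta.
_, x_ref = simulate_example_2_1(x_bar=0.05, eta=0.0)
_, x_eta = simulate_example_2_1(x_bar=0.05, eta=0.5)
shift = x_eta - x_ref   # approaches rho_v * eta / kappa
```

With these illustrative values, `shift` settles at $\rho_v \eta / \kappa$ once the initial transient has died out, which is one way to see that $\eta$ acts on the level toward which $x$ reverts.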
In sum, the agent’s theories of how the data $y$ is generated can be identified with the probability measures

$$\mathcal{Q} \triangleq \left\{ Q^{\bar{x},\eta} : (\bar{x}, \eta) \in \mathbb{R}^{n_x} \times L^2([0, T], \mathbb{R}^{n_x}) \right\}$$

on the common measurable space $(\Omega, \mathcal{F}_T)$. I call these measures theoretical priors to distinguish them from the measures $P \in \mathcal{P}$ that are part of the representation of the agent’s preferences, or the preferential priors.

The elicited probability measures are often interpreted as those that are actually entertained by the agent; see, for example, Gagliardini et al. (2009).14 However, “nothing in the theoretical construct of Gilboa and Schmeidler (1989) supports this interpretation” (Gajdos et al., 2008), and rather it is clear that “information and personal taste jointly determine [the set of preferential priors]” (Gilboa and Marinacci, 2013). Thus in this paper I speak of the agent’s theories in their own right and with them make explicit the cognitive origin of the elicited beliefs (Epstein and Schneider, 2007; Gajdos et al., 2008).

14 In motivating their preferential priors $\{P^h : h\}$ Gagliardini et al. say, “The representative agent does not know the data-generating process ... [and] considers a class of probabilistic scenarios, or contaminations, $P^h$ around the reference belief [a Cox-Ingersoll-Ross (1985) economy]. These contaminations are interpreted as likely specifications for the constituents of the opportunity sets.”

2.3 The Preferential Priors

This subsection executes the learning mechanism. This amounts to mapping the theories of the agent to her preferential priors.

2.3.1 Learning under $Q^{\bar{x},\eta}$

Recall the partially observable system

$$dy(t) = (a(t) + b(t) x(t))\, dt + \sigma(t)\, dw(t),$$
$$dx(t) = \kappa(\bar{x} - x(t))\, dt + \rho_w\, dw(t) + \rho_v\, (dv^{\bar{x},\eta}(t) + \eta(t)\, dt).$$

I use dot notation for time derivatives: $\dot{f}(t) \equiv df(t)/dt$.

Proposition 2.2. The following standard results in Gaussian filtering hold: (i) $(y, x)$ is conditionally Gaussian.
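(ii) The conditional mean vector and variance-covariance matrix

$$m^{\bar{x},\eta}(t) \triangleq E^{Q^{\bar{x},\eta}}(x(t) \mid \mathcal{G}_t),$$
$$\gamma(t) \triangleq E^{Q^{\bar{x},\eta}}\left[ (x(t) - m^{\bar{x},\eta}(t))(x(t) - m^{\bar{x},\eta}(t))^\top \mid \mathcal{G}_t \right],$$

satisfy the system of differential equations

$$dm^{\bar{x},\eta}(t) = [\kappa(\bar{x} - m^{\bar{x},\eta}(t)) + \rho_v \eta(t)]\, dt + (\rho_w \sigma(t)^\top + \gamma(t) b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1} [dy(t) - (a(t) + b(t) m^{\bar{x},\eta}(t))\, dt] \tag{8}$$
$$\phantom{dm^{\bar{x},\eta}(t)} = (\kappa\bar{x} + \rho_v \eta(t) - \bar{\kappa}(t) m^{\bar{x},\eta}(t))\, dt + (\rho_w \sigma(t)^\top + \gamma(t) b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1} (dy(t) - a(t)\, dt),$$
$$\dot{\gamma}(t) = \rho_w \rho_w^\top + \rho_v \rho_v^\top - \kappa\gamma(t) - \gamma(t)\kappa - (\rho_w \sigma(t)^\top + \gamma(t) b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1}(\rho_w \sigma(t)^\top + \gamma(t) b(t)^\top)^\top, \tag{9}$$

with initial conditions $m^{\bar{x},\eta}(0) = m_0$ and $\gamma(0) = \gamma_0$, where

$$\bar{\kappa}(t) \triangleq \kappa + (\rho_w \sigma(t)^\top + \gamma(t) b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1} b(t).$$

(iii) The process $\bar{w}^{\bar{x},\eta} = \{\bar{w}^{\bar{x},\eta}(t), \mathcal{G}_t\}$ defined by

$$\bar{w}^{\bar{x},\eta}(t) \triangleq \int_0^t \sigma(s)^{-1} [dy(s) - (a(s) + b(s) m^{\bar{x},\eta}(s))\, ds], \quad 0 \le t \le T, \tag{10}$$

is a Wiener process under $Q^{\bar{x},\eta}$ and generates $\mathbb{G}$.

Lemma 2.1. $\gamma$ is uniformly bounded.

Let $\varphi : [0, T] \times \Omega \to \mathbb{R}^{n_x \times n_x}$ be the solution of

$$\dot{\varphi}(t) = -\bar{\kappa}(t)\varphi(t), \quad \varphi(0) = I_{n_x},$$

where $I_{n_x}$ denotes the $n_x$-dimensional identity matrix. $\varphi(t)$ is invertible for all $t \ge 0$. Introduce the following notation: for functions $f$ from $[0, T]$ into $\mathbb{R}^{n_x}$ or into $\mathbb{R}^{n_x \times n_x}$, $\Phi^f$ denotes the process defined by

$$\Phi^f(t) \triangleq \varphi(t) \int_0^t \varphi(s)^{-1} f(s)\, ds, \quad 0 \le t \le T.$$

Then

$$m^{\bar{x},\eta}(t) = \varphi(t) m_0 + \Phi^{\kappa\bar{x} + \rho_v \eta}(t) + \varphi(t) \int_0^t \varphi(s)^{-1} (\rho_w \sigma(s)^\top + \gamma(s) b(s)^\top)(\sigma(s)\sigma(s)^\top)^{-1} (dy(s) - a(s)\, ds). \tag{11}$$

2.3.2 Likelihood of Theories

The log-likelihood function for the theories under partial observation is given by

$$\ell_T(\bar{x}, \eta) \triangleq \log \frac{d(Q^{\bar{x},\eta}|_{\mathcal{G}_T})}{d(Q^{0,0}|_{\mathcal{G}_T})} = \log E^{Q^{0,0}}\left[ \frac{dQ^{\bar{x},\eta}}{dQ^{0,0}} \,\Big|\, \mathcal{G}_T \right], \tag{12}$$

where $Q^{\bar{x},\eta}|_{\mathcal{G}_T}$ denotes the restriction of $Q^{\bar{x},\eta}$ to $\mathcal{G}_T$. The choice of the reference, here $(\bar{x}, \eta) = (0, 0)$, is inconsequential.

Proposition 2.3.

$$\ell_T(\bar{x}, \eta) = \int_0^T (a(t) + b(t) m^{\bar{x},\eta}(t))^\top (\sigma(t)\sigma(t)^\top)^{-1}\, dy(t) - \frac{1}{2} \int_0^T (a(t) + b(t) m^{\bar{x},\eta}(t))^\top (\sigma(t)\sigma(t)^\top)^{-1} (a(t) + b(t) m^{\bar{x},\eta}(t))\, dt$$
$$\qquad - \left[ \int_0^T (a(t) + b(t) m^{0,0}(t))^\top (\sigma(t)\sigma(t)^\top)^{-1}\, dy(t) - \frac{1}{2} \int_0^T (a(t) + b(t) m^{0,0}(t))^\top (\sigma(t)\sigma(t)^\top)^{-1} (a(t) + b(t) m^{0,0}(t))\, dt \right].$$

The log-likelihood at time $t$, $t < T$, is obtained by replacing the arbitrary $T$ with $t$:

$$\ell_t(\bar{x}, \eta) = \int_0^t (a(s) + b(s) m^{\bar{x},\eta}(s))^\top (\sigma(s)\sigma(s)^\top)^{-1}\, dy(s) - \frac{1}{2} \int_0^t (a(s) + b(s) m^{\bar{x},\eta}(s))^\top (\sigma(s)\sigma(s)^\top)^{-1} (a(s) + b(s) m^{\bar{x},\eta}(s))\, ds + f_t, \tag{13}$$

where $f_t$ is independent of $(\bar{x}, \eta)$. Since $m^{\bar{x},\eta}(t)$ is linear in $\bar{x}$, $\ell_t(\bar{x}, \eta)$ is quadratic in $\bar{x}$. $\ell_t(\bar{x}, \eta)$ is also Gâteaux differentiable with respect to $\eta$, and the derivative is linear in $\eta$:

Lemma 2.2. The Gâteaux differential of $\ell_t(\bar{x}, \cdot)$ at $\eta \in L^2([0, T], \mathbb{R}^{n_x})$ in the direction $h \in L^2([0, T], \mathbb{R}^{n_x})$ is

$$\int_0^t \left\{ (\varphi(s)^{-1} \rho_v)^\top \int_s^t \varphi(\tau)^\top b(\tau)^\top (\sigma(\tau)\sigma(\tau)^\top)^{-1} [dy(\tau) - (a(\tau) + b(\tau) m^{\bar{x},\eta}(\tau))\, d\tau] \right\}^\top h(s)\, ds.$$

2.3.3 Learning about the Data-Generating Mechanism

Recall the dynamics of the observable process

$$dy(t) = (a(t) + b(t) x(t))\, dt + \sigma(t)\, dw(t).$$

If the agent were Bayesian with unique theoretical prior $Q^{\bar{x},0} \in \mathcal{Q}$,15 Bayesian updating would result in the filtered dynamics

$$dy(t) = (a(t) + b(t) m^{\bar{x},0}(t))\, dt + \sigma(t)\, d\bar{w}(t), \tag{14}$$

where $\bar{w}$, defined by (14), is a $(Q^{\bar{x},0}, \mathbb{G})$-Wiener process, and her time-$t$ decisions would accordingly be based on the unique one-step-ahead conditional

$$dy(t) \mid \mathcal{G}_t \sim N\left( (a(t) + b(t) m^{\bar{x},0}(t))\, dt,\; \sigma(t)\sigma(t)^\top dt \right).$$

On the other hand, our agent entertains a set of theories, $\{Q^{\bar{x},\eta} : \bar{x}, \eta\}$, and rules out some of them in light of evidence. Hence, unless she rules out all but one theory, the agent will have multiple one-step-ahead conditionals of the form

$$dy(t) \mid \mathcal{G}_t \sim N\left( (a(t) + b(t) m^{\bar{x},\eta}(t))\, dt,\; \sigma(t)\sigma(t)^\top dt \right),$$

where $(\bar{x}, \eta)$ runs over a set. Note that the ambiguity in the data-generating mechanism boils down to that in $m^{\bar{x},\eta}(t)$, the posterior mean of $x(t)$.

15 Given that our agent is uncertain about $\bar{x}$, a fair comparison would require that the hypothetical Bayesian agent be given a diffuse parameter prior; but the form of the parameter prior is irrelevant to the point I am making here, namely, unique versus nonunique one-step-ahead conditionals.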
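The dependence of the one-step-ahead conditional on the theory can be illustrated with a short Python sketch of the filtering equations (8)–(9), specialized to the scalar constant-volatility case of Example 2.1 ($a \equiv 0$, $b \equiv 1$, $\sigma \equiv \sigma_R$) and discretized with a plain Euler step. Everything specific here is an assumption made for illustration: the parameter values, the simulated data, the constant $\eta$, and the hypothetical function names `simulate_returns` and `filter_mean`. The same observed increments are filtered under three candidate values of $\bar{x}$; the posterior variance does not depend on the theory, while the posterior mean does.

```python
import math
import random


def simulate_returns(kappa=2.0, x_bar_true=0.06, rho_v=0.2, sigma_R=0.15,
                     T=20.0, dt=0.01, seed=3):
    """Increments dy of dy = x dt + sigma_R dw with a mean-reverting hidden x."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x = x_bar_true
    increments = []
    for _ in range(int(round(T / dt))):
        increments.append(x * dt + sigma_R * rng.gauss(0.0, sqrt_dt))
        x += kappa * (x_bar_true - x) * dt + rho_v * rng.gauss(0.0, sqrt_dt)
    return increments


def filter_mean(increments, x_bar, eta=0.0, kappa=2.0, rho_w=0.0, rho_v=0.2,
                sigma_R=0.15, m0=0.0, gamma0=0.01, dt=0.01):
    """Euler discretization of (8)-(9) with a = 0, b = 1, sigma = sigma_R."""
    m, gamma = m0, gamma0
    for dy in increments:
        gain = (rho_w * sigma_R + gamma) / sigma_R ** 2
        # Posterior mean: theory-dependent drift plus gain times the innovation.
        m += (kappa * (x_bar - m) + rho_v * eta) * dt + gain * (dy - m * dt)
        # Posterior variance: Riccati equation, independent of (x_bar, eta).
        gamma += (rho_w ** 2 + rho_v ** 2 - 2.0 * kappa * gamma
                  - (rho_w * sigma_R + gamma) ** 2 / sigma_R ** 2) * dt
    return m, gamma


dys = simulate_returns()
m_low, gamma_T = filter_mean(dys, x_bar=0.00)
m_mid, _ = filter_mean(dys, x_bar=0.05)
m_high, _ = filter_mean(dys, x_bar=0.10)
```

In this sketch the three posterior means are equally spaced, reflecting the fact that $m^{\bar{x},\eta}(t)$ is affine in $\bar{x}$ for given data, and each one implies a different mean for the one-step-ahead conditional of $dy(t)$.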
Plausibility: Penalized Log-Likelihood

The ambiguity, however, is too large for there to be learning if the agent assesses the plausibility of a theory based on the likelihood alone. To elaborate, define the log-likelihood induced by the transformation $(\bar x,\eta) \mapsto m^{\bar x,\eta}(t)$ by
\[
\ell_{t,m(t)}(m) \triangleq \sup_{(\bar x,\eta)\in\mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})} \{\ell_t(\bar x,\eta) : m^{\bar x,\eta}(t) = m\}, \quad m \in \mathbb{R}^{n_x}.
\]
Then $\ell_{t,m(t)}$ is constant, the constant value lying in $\mathbb{R}\cup\{\infty\}$.$^{16}$ In other words, $m^{\bar x,\eta}(t)$ is not identified. The reason is that each value of $m^{\bar x,\eta}(t)$ can be supported equally well by some theory with a large $\eta$.$^{17}$ Indeed, “inductive inference based on objective criteria alone is bound to fail, while incorporating subjective criteria alongside objective ones can lead to successful learning”; that is, “effective learning requires a willingness to sacrifice goodness-of-fit in return for enhanced subjective appeal” (Gilboa and Samuelson, 2012).

Thus, I assume that the plausibility ranking, a binary relation “at least as plausible as,” over the theories is represented by a penalized log-likelihood function. Specifically, the agent finds more appealing the “reference” or “simple” theories free of the poorly understood factor $\eta$, and that subjective criterion is translated into a penalty proportional to the magnitude of $\eta$ measured by the squared $L^2$-norm:
\[
\ell^\lambda_t(\bar x,\eta) \triangleq \ell_t(\bar x,\eta) - \frac{\lambda}{2}\int_0^t |\eta(s)|^2\,ds \tag{15}
\]
where $\lambda \in (0,\infty]$ measures the agent’s a priori confidence about the reference likelihood. When $\lambda = \infty$, the set of theories reduces to $\{Q^{\bar x,0} : \bar x \in \mathbb{R}^{n_x}\}$ and the agent perceives no persistent source of ambiguity; when $\lambda$ is small, the agent fits data with large $\eta$s with little restraint.

It is also worth noting that half the squared $L^2$-norm of $\eta$ is equal to the deviation of a theory $Q^{\bar x,\eta}$ from its simple counterpart $Q^{\bar x,0}$ measured by the Kullback-Leibler divergence:
\[
D_{KL}(Q^{\bar x,0}\,\|\,Q^{\bar x,\eta}) \triangleq E^{Q^{\bar x,0}}\log\frac{dQ^{\bar x,0}}{dQ^{\bar x,\eta}} = \frac12\int_0^T |\eta(t)|^2\,dt.
\]

$^{16}$ See the supplementary appendix for a proof.
$^{17}$ Precisely speaking, the supremum is not attained; that is, there does not exist a maximum likelihood estimate. Fix $\bar x$ and suppose there is a partial maximizer $\eta$ of the likelihood $\ell_t(\bar x,\eta)$, $0 < t \le T$, in $L^2([0,T],\mathbb{R}^{n_x})$. Then it must satisfy, from Lemma 2.2,
\[
0 = (\phi(s)^{-1}\rho_v)^\top \int_s^t \phi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}[dy(\tau) - (a(\tau) + b(\tau)m^{\bar x,\eta}(\tau))\,d\tau], \quad 0 \le s \le t,
\]
but the constancy of the left-hand side and the unbounded variation of the right-hand side are incompatible. It follows that for any given $\eta$, there is another $\eta'$ with higher likelihood.

The idea of penalizing the likelihood was first discussed by Good and Gaskins (1971) in the context of nonparametric density estimation. Green (1987) extended it to semiparametric settings. In these non- or semi-parametric estimation problems, Sobolev norms of higher orders, as well as the $L^2$-norm, are favored; but for us, imposing smoothness on $\eta$ would violate the assumption of symmetry. In the context of model selection, Akaike (1973) extended the maximum likelihood principle by proposing his celebrated criterion in the form of a penalized log-likelihood; and ever since, penalizing the likelihood has been a standard method in information theory to strike a balance between the goodness of fit and the simplicity of the model (see Konishi and Kitagawa, 2008). The penalized log-likelihood representation of a plausibility ranking has recently been axiomatized by Gilboa and Schmeidler (2010).

Finally, a theory $Q^{\bar x,\eta}$ is not ruled out if and only if
\[
\ell^\lambda_t(\bar x,\eta) \ge \max_{(\bar x',\eta')\in\mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})} \ell^\lambda_t(\bar x',\eta') - \alpha
\]
where $0 \le \alpha < \infty$. $\alpha$ measures how conservative the agent is in model selection; when $\alpha = 0$, in particular, the agent keeps nothing but the most plausible theories. And as shall be seen, the corresponding induced plausibility of $m^{\bar x,\eta}(t)$,
\[
\ell^\lambda_{t,m(t)}(m) \triangleq \max_{(\bar x,\eta)\in\mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})} \{\ell^\lambda_t(\bar x,\eta) : m^{\bar x,\eta}(t) = m\}, \quad m \in \mathbb{R}^{n_x},
\]
has a nonzero curvature (Lemma 2.5).

Remark 2.1.
There are two prominent alternatives to the $L^2$-penalty. The first is Epstein and Schneider’s (2007) $L^\infty$-constraint: $\operatorname{ess\,sup}_{t\le T}|\eta(t)| \le \bar\eta$. This amounts to constraining instantaneous entropy rates point by point in time. While this is sensible when the agent is looking forward and fears misspecification of the infinitesimal future, it is not sensible in looking backward. What the agent tries to pin down here is the value of $m^{\bar x,\eta}(t)$, and in this regard, $\eta(s)$, $s \le t$, having large values for a short period of time has little significance.

The other is an $L^2$-constraint: $\int_0^T |\eta(t)|^2\,dt \le \bar\eta_T$. Naturally, this is closely related to the $L^2$-penalty. First, a constraint is a penalty that is discontinuous. Second, a constraint is the dual of a penalty in the method of Lagrange multipliers: the constant $\lambda$ defines a shadow process $\bar\eta^\lambda = \{\bar\eta^\lambda_t\}$ that implies the same most plausible theories. And it can be seen that the penalized likelihood ratio test with $\lambda$ is more conservative than the constrained likelihood ratio test with $\bar\eta^\lambda$. Compared to its penalty counterpart, however, the $L^2$-constraint has the following drawbacks. First, the sharp bounds seem to be at odds with the assumed a priori ignorance. Second, if, as is natural, the time-$t$ bound $\bar\eta_t$ is lower than $\bar\eta_T$, $t < T$, it implies that the agent has a time-varying parameter set; for example, she would deem $\eta(s) = \sqrt{2\bar\eta_T/t}\,(1,0,\cdots,0)^\top$, $s \le t$, implausible at time $t$ but plausible at time $T$.

Maximum Plausibility Estimation

I will need the following facts to characterize the natural “center” of the set of preferential priors. The maximum plausibility estimate (MPE) of $(\bar x,\eta)$ at time $t$ is defined as
\[
(\bar x^*_t,\eta^*_t) \triangleq \arg\max_{(\bar x,\eta)\in\mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})} \ell^\lambda_t(\bar x,\eta).
\]
The notion of the partial MPE of $\eta$ given $\bar x$ will prove helpful:
\[
\eta^*_{\bar x,t} \triangleq \arg\max_{\eta\in L^2([0,T],\mathbb{R}^{n_x})} \ell^\lambda_t(\bar x,\eta).
\]
Clearly, $\eta^*_t = \eta^*_{\bar x^*_t,t}$.
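The first-order condition with respect to $\eta$ (FOC($\eta$)) is
\[
\lambda\eta(s) = (\phi(s)^{-1}\rho_v)^\top \int_s^t \phi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}[dy(\tau) - (a(\tau) + b(\tau)m^{\bar x,\eta}(\tau))\,d\tau], \quad 0 \le s \le t.
\]
To write the solution of this integral equation, introduce the following notation. Let
\[
\chi(s) \triangleq \begin{pmatrix} \rho_v\rho_v^\top\bar\kappa(s)^\top(\rho_v\rho_v^\top)^{-1} & \rho_v\rho_v^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s) \\ \lambda^{-1}I_{n_x} & -\bar\kappa(s) \end{pmatrix},
\]
and let $\psi$ be the matrix-valued process such that
\[
\dot\psi(s) = \chi(s)\psi(s), \quad 0 \le s \le T, \qquad \psi(0) = I_{2n_x}. \tag{16}
\]
$\psi(s)$ is invertible for all $s \ge 0$. Let $\iota_1 \triangleq (I_{n_x}, 0)^\top$, $\iota_2 \triangleq (0, I_{n_x})^\top$, and $A_{ij} \triangleq \iota_i^\top A\iota_j$ for a $2n_x \times 2n_x$ matrix $A$.

Lemma 2.3. For all $t > 0$, (i) $\psi_{11}(t)$ is invertible, and (ii) $\psi_{21}(t)\psi_{11}(t)^{-1}\rho_v\rho_v^\top$ is symmetric and positive definite.

Let also
\[
\Psi(s) \triangleq \psi(s)\int_0^s \psi(\tau)^{-1}\,d\tau, \quad 0 \le s \le T. \tag{17}
\]

Proposition 2.4 (Partial MPE of $\eta$).
\begin{align}
\begin{pmatrix} \lambda\rho_v\eta^*_{\bar x,t}(s) \\ \Phi^{\kappa\bar x + \rho_v\eta^*_{\bar x,t}}(s) \end{pmatrix}
&= \Psi(s)\iota_2\kappa\bar x - \psi(s)\int_0^s \psi(\tau)^{-1}\iota_1\rho_v\rho_v^\top b(\tau)^\top(\sigma(\tau)^\top)^{-1}\,d\bar w^{0,0}(\tau) \notag\\
&\quad - \psi(s)\iota_1\psi_{11}(t)^{-1}\iota_1^\top\bigg[\Psi(t)\iota_2\kappa\bar x - \psi(t)\int_0^t \psi(\tau)^{-1}\iota_1\rho_v\rho_v^\top b(\tau)^\top(\sigma(\tau)^\top)^{-1}\,d\bar w^{0,0}(\tau)\bigg]. \tag{18}
\end{align}

Hence, $m^{\bar x,\eta^*_{\bar x,t}}(s)$ is linear in $\bar x$ (recall (11)). Define $\theta(t)$ by
\[
\theta(t) \triangleq \Psi_{22}(t) - \psi_{21}(t)\psi_{11}(t)^{-1}\Psi_{12}(t) \tag{19}
\]
or
\[
m^{\bar x,\eta^*_{\bar x,t}}(t) = m^{0,\eta^*_{0,t}}(t) + \theta(t)\kappa\bar x.
\]
That is, $\theta(t)$ measures the sensitivity to $\kappa\bar x$ of $m^{\bar x,\eta}(t)$ with $\eta$ “profiled out.” Let $I_{\bar x}(t)$ denote the “observed Fisher information” about $\bar x$:
\[
I_{\bar x}(t) \triangleq -\frac{\partial^2}{\partial(\kappa\bar x)^2}\,\ell^\lambda_t(\bar x,\eta^*_{\bar x,t})\bigg|_{\bar x = \bar x^*_t}. \tag{20}
\]
Precisely speaking, $I_{\bar x}(t)$ is the information about $\kappa\bar x$, but I adopt this slight abuse of terminology because $\kappa$ is known and the parameter of interest is clearly $\bar x$.

Assumption 2.3. (i) $n_x \le n_y$. (ii) $b(t)$ is of full column rank (that is, $n_x$) for all $t \in [0,T]$.

Lemma 2.4.
\[
I_{\bar x}(t) = \int_0^t \theta(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\theta(s)\,ds \tag{21}
\]
and is invertible for all $t > 0$.

FOC($\bar x$) is
\[
0 = \int_0^t (b(s)\Phi^{I_{n_x}}(s)\kappa)^\top(\sigma(s)\sigma(s)^\top)^{-1}[dy(s) - (a(s) + b(s)m^{\bar x,\eta}(s))\,ds].
\]

Proposition 2.5 (MPE of $\bar x$). For $t > 0$,
\[
\bar x^*_t = \kappa^{-1}I_{\bar x}(t)^{-1}\int_0^t \Phi^{I_{n_x}}(s)^\top b(s)^\top(\sigma(s)^\top)^{-1}\,d\bar w^{0,\eta^*_{0,t}}(s).
\]

Remark 2.2.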
Estimation is not defined at time 0, and consequently, neither is the time-0 decision making. This is natural. At time 0, the agent is in a state of sheer ignorance; once the observable process $y$ starts to move, information accrues continuously. The singularity at time 0 is not a problem because, as we shall see, decision making is well-defined for all $t > 0$. Nevertheless, I assume purely for brevity of exposition that the agent’s learning started prior to time 0 and that all the statistics, including $I_{\bar x}(0)$ and $\bar x^*_0$, have a definite, finite value at time 0. The differential dynamics I am about to characterize determine their evolution from then on. To maintain the convention that $\mathcal{G}_0$ is trivial, I assume that all the $\mathcal{G}_0$-measurable variables are nonrandom constants.

The natural center of the time-$t$ set of one-step-ahead conditionals is
\[
dy(t) \mid \mathcal{G}_t \sim N\big((a(t) + b(t)m^{\bar x^*_t,\eta^*_t}(t))\,dt,\ \sigma(t)\sigma(t)^\top dt\big).
\]
This observation motivates us to define a process $\epsilon = \{\epsilon(t), \mathcal{G}_t\}$ by
\[
\epsilon(t) \triangleq \int_0^t \sigma(s)^{-1}[dy(s) - (a(s) + b(s)m^{\bar x^*_s,\eta^*_s}(s))\,ds], \quad 0 \le t \le T.
\]
To prove that there is a probability measure on $(\Omega,\mathcal{G}_T)$ under which $\epsilon$ is a Wiener process, I first observe the dynamics of the statistics.

Proposition 2.6 (Dynamics of the MPEs).
\begin{align}
d\bar x^*_t &= \kappa^{-1}\sigma_{\bar x^*}(t)^\top b(t)^\top(\sigma(t)^\top)^{-1}\,d\epsilon(t), \notag\\
dm^{\bar x^*_t,\eta^*_t}(t) &= \kappa(\bar x^*_t - m^{\bar x^*_t,\eta^*_t}(t))\,dt + [\rho_w\sigma(t)^\top + (\gamma(t) + \delta(t))b(t)^\top](\sigma(t)^\top)^{-1}\,d\epsilon(t), \tag{22}
\end{align}
where
\begin{align}
\sigma_{\bar x^*}(t) &\triangleq \theta(t)I_{\bar x}(t)^{-1}, \tag{23}\\
\delta(t) &\triangleq \psi_{21}(t)\psi_{11}(t)^{-1}\rho_v\rho_v^\top + \theta(t)\sigma_{\bar x^*}(t)^\top. \tag{24}
\end{align}
Note that $\delta$ is symmetric and positive definite (Lemma 2.3 and $\theta(t) = \sigma_{\bar x^*}(t)I_{\bar x}(t)$). The following proposition closes the dynamics:

Proposition 2.7.
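\begin{align}
\dot\theta(t) &= I_{n_x} - (\kappa + \rho_w\sigma(t)^{-1}b(t))\theta(t) \notag\\
&\quad - (\gamma(t) + \delta(t) - \theta(t)\sigma_{\bar x^*}(t)^\top)b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\theta(t), \tag{25}\\
\dot\sigma_{\bar x^*}(t) &= I_{\bar x}(t)^{-1} - \{\kappa + [\rho_w\sigma(t)^\top + (\gamma(t) + \delta(t))b(t)^\top](\sigma(t)\sigma(t)^\top)^{-1}b(t)\}\sigma_{\bar x^*}(t), \notag\\
\frac{d}{dt}\big(I_{\bar x}(t)^{-1}\big) &= -\sigma_{\bar x^*}(t)^\top b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\sigma_{\bar x^*}(t), \notag\\
\dot\delta(t) &= (\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1}(\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)^\top - \kappa\delta(t) - \delta(t)\kappa \notag\\
&\quad - [\rho_w\sigma(t)^\top + (\gamma(t) + \delta(t))b(t)^\top](\sigma(t)\sigma(t)^\top)^{-1}[\rho_w\sigma(t)^\top + (\gamma(t) + \delta(t))b(t)^\top]^\top \notag\\
&\quad + \sigma_{\bar x^*}(t) + \sigma_{\bar x^*}(t)^\top + \lambda^{-1}\rho_v\rho_v^\top. \tag{26}
\end{align}

The Preferential Priors

Now we are ready to characterize the preferential priors. The reference preferential prior $P^0$ is identified in Proposition 2.8, and the one-step-ahead beliefs process $\Xi$, in Proposition 2.9. (Recall the description of Section 2.1.) First, make the following additional assumption:

Assumption 2.4. $\theta$, $\sigma_{\bar x^*}$, and $\delta$ are uniformly bounded.

Remark 2.3. Here are simple example cases in which Assumption 2.4 holds: (i) $\sigma$ and $b$ are deterministic. (ii) $\sigma$, $\rho_w$, $\rho_v$, and $b$ are diagonal,$^{18}$ and there is an $\varepsilon > 0$ such that
\[
\bar\kappa = \kappa + (\rho_w\sigma^\top + \gamma b^\top)(\sigma\sigma^\top)^{-1}b \ge \varepsilon I_{n_x} \quad \text{a.e.}
\]
(Given that $\sigma$, $\rho_w$, $\rho_v$, and $b$ are diagonal, there trivially is an $\varepsilon > 0$ such that $\bar\kappa > \varepsilon I_{n_x}$ a.e. if $\rho_w = 0$.) See the appendix for a proof.

Proposition 2.8. There is a unique probability measure on $(\Omega,\mathcal{G}_T)$, denoted by $P^0$, such that $P^0 \sim (Q^{0,0}|_{\mathcal{G}_T})$ and $\epsilon$ is a Wiener process under $P^0$. Also, $\mathbb{G}$ equals the augmented filtration generated by $\epsilon$.

Observe that under $P^\xi$,
\[
dy(t) \mid \mathcal{G}_t \sim N\big((a(t) + b(t)m^{\bar x^*_t,\eta^*_t}(t) + \sigma(t)\xi(t))\,dt,\ \sigma(t)\sigma(t)^\top dt\big).
\]
Hence, the time-$t$ set of one-step-ahead conditionals $\Xi(t)$ is defined by
\[
a(t) + b(t)m^{\bar x^*_t,\eta^*_t}(t) + \sigma(t)\Xi(t) = \Big\{\mu \in \mathbb{R}^{n_y} : \ell^\lambda_t(\bar x^*_t,\eta^*_t) - \max_{(\bar x,\eta)\in\mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})}\{\ell^\lambda_t(\bar x,\eta) : a(t) + b(t)m^{\bar x,\eta}(t) = \mu\} \le \alpha\Big\}
\]
where the maximum of an empty set is defined to be $-\infty$. It turns out that $\delta(t)$ is the inverse of the observed Fisher information about $m^{\bar x,\eta}(t)$:

Lemma 2.5.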
\begin{align*}
\ell^\lambda_t(\bar x^*_t,\eta^*_t) - \max_{\bar x,\eta}\{\ell^\lambda_t(\bar x,\eta) : m^{\bar x,\eta}(t) = m\}
&= \ell^\lambda_{t,m(t)}(m^{\bar x^*_t,\eta^*_t}(t)) - \ell^\lambda_{t,m(t)}(m) \\
&= \frac12(m - m^{\bar x^*_t,\eta^*_t}(t))^\top\delta(t)^{-1}(m - m^{\bar x^*_t,\eta^*_t}(t)), \quad m \in \mathbb{R}^{n_x}.
\end{align*}
And the set of one-step-ahead conditionals is given as follows:

Proposition 2.9.
\[
\Xi(t) = \sigma(t)^{-1}b(t)\Big\{\Delta m \in \mathbb{R}^{n_x} : \frac12(\Delta m)^\top\delta(t)^{-1}\Delta m \le \alpha\Big\}, \quad 0 \le t \le T. \tag{27}
\]
The process $\Xi = \{\Xi(t), \mathcal{G}_t\}$ is uniformly bounded and compact-convex-valued. If each of the processes $b$ and $\sigma^{-1}$ is left- or right-continuous, $\Xi$ is furthermore progressive.

Remark 2.4. For $\xi(t) \in \Xi(t)$,
\[
\frac12(\sigma(t)\xi(t))^\top(b(t)\delta(t)b(t)^\top)^+\sigma(t)\xi(t) \le \alpha \tag{28}
\]
where $(b(t)\delta(t)b(t)^\top)^+$ denotes the Moore-Penrose pseudoinverse:
\[
(b(t)\delta(t)b(t)^\top)^+ = b(t)(b(t)^\top b(t))^{-1}\delta(t)^{-1}(b(t)^\top b(t))^{-1}b(t)^\top.
\]
However, the converse is not true; that is, (28) does not imply $\xi(t) \in \Xi(t)$.

$^{18}$ In case $n_x \ne n_y$, the $n_x \times n_y$ matrix $\rho_w$, for example, is diagonal if $\rho_w^{ij} = 0$ for all $i \ne j$.

Thus, the size of the set of one-step-ahead conditionals is proportional to that of the set of the plausible values of $m^{\bar x,\eta}(t)$, and the latter set is given by an $n_x$-dimensional hyper-ellipsoid centered at the most plausible value $m^{\bar x^*_t,\eta^*_t}(t)$. The lengths of the principal axes of the hyper-ellipsoid are proportional to the square roots of the eigenvalues of $\delta(t)$. Therefore, the square roots of the eigenvalues, or the eigenvalues themselves, of $\delta(t)$ are measures of conditional ambiguity.

To conclude, let $\xi \in \Xi$ mean that $\xi = \{\xi(t), \mathcal{G}_t\}$ is progressive and $\xi(t,\omega) \in \Xi(t,\omega)$ a.e. Then the set of preferential priors is given by
\[
\mathcal{P} = \Big\{P^\xi : P^\xi \text{ is a probability measure on } (\Omega,\mathcal{G}_T),\ \frac{dP^\xi}{dP^0} = \mathcal{E}^\xi(T),\ \xi \in \Xi\Big\}.
\]

3 Discussion

This section examines the learning dynamics derived in the previous section.
The conditional ambiguity about one-step-ahead uncertainties is represented by the set of one-step-ahead conditionals $\Xi(t)$; a nonsingleton $\Xi(t)$ results from ambiguity in the estimate $m^{\bar x^*_t,\eta^*_t}(t)$; and the ambiguity in $m^{\bar x^*_t,\eta^*_t}(t)$ comes from two component ambiguities, one about the time-invariant factor $\bar x$ and the other about the time-varying factor $\eta$. The discussion begins by noting that the former eventually resolves (Section 3.1). The ambiguity about $\eta$, however, persists, and the consequent time variation in the conditional ambiguity is discussed next. Specifically, Section 3.2 analyzes the filtering equations for $\{m^{\bar x^*_t,\eta^*_t}(t) : t\}$ and $\delta$ in comparison with the classical ones for $m^{\bar x,\eta}$ and $\gamma$. It turns out that IID ambiguity may indeed obtain as the limit of a learning process. Section 3.3 provides a sufficient condition for such convergence.

Assume henceforth $n_x = 1$. Still the setup is general enough to encompass both the examples given in Section 2.2.1 and those to be given in Section 5.

3.1 Learning about $\bar x$

The ambiguity associated with $\bar x$ eventually resolves, provided that the observation process $y$ as a signal about the hidden state $x$ maintains a level of informativeness:

Assumption 3.1. $|\sigma^{-1}b|$ is uniformly bounded away from zero.

Proposition 3.1. The set of the values of $\bar x$ that are sufficiently plausible at time $t$,
\[
\big\{\bar x \in \mathbb{R} : \ell^\lambda_t(\bar x^*_t,\eta^*_{\bar x^*_t,t}) - \ell^\lambda_t(\bar x,\eta^*_{\bar x,t}) \le \alpha\big\},
\]
shrinks to the point $\{\bar x^*_t\}$ as $t \to \infty$, for all critical values $\alpha \ge 0$.

The question that naturally arises next is whether $\bar x^*$ converges. But since convergence under one probability measure does not imply convergence under another probability measure obtained by a Girsanov change of measure (see Karatzas and Shreve, 1988, p. 193), to answer this question we need to take a stance regarding the true probability measure.
My stance is that the agent neither knows the true probability measure nor purports to have identified a set of probability measures (theoretical priors) that includes it. If need be, however, the natural candidate for the true probability measure is a theoretical prior $Q^{\bar x,0} \in \mathcal{Q}$ of the agent (correct specification). It remains to be seen if $\bar x^*$ converges under $Q^{\bar x,0}$.$^{19}$

$^{19}$ $\Delta\bar x^*(t) \triangleq \bar x^*_t - \bar x$ and $\Delta m^*(t) \triangleq m^{\bar x^*_t,\eta^*_t}(t) - m^{\bar x,0}(t)$ satisfy
\begin{align*}
\kappa\,d\Delta\bar x^* &= \sigma_{\bar x^*}^\top b^\top(\sigma^\top)^{-1}(d\bar w^{\bar x,0} - \sigma^{-1}b\,\Delta m^*\,dt), \\
d\Delta m^* &= \kappa(\Delta\bar x^* - \Delta m^*)\,dt + \delta b^\top(\sigma^\top)^{-1}\,d\bar w^{\bar x,0} - (\rho_w\sigma^\top + (\gamma + \delta)b^\top)(\sigma\sigma^\top)^{-1}b\,\Delta m^*\,dt,
\end{align*}
and $(\Delta\bar x^*,\Delta m^*)$ converges in $L^2$. The difficulty is that $\sigma_{\bar x^*}$ is square-integrable but not integrable. It is not clear whether
\[
\int_0^\infty \sigma_{\bar x^*}^\top b^\top(\sigma\sigma^\top)^{-1}b\,\Delta m^*\,dt
\]
is convergent or not. On the other hand, it is easy to see that $\bar x^*$ is an $L^2$-bounded continuous martingale under $P^0$, in which case $\lim_{t\to\infty}\bar x^*_t$ exists under $P^0$ by Doob’s martingale convergence theorem (Rogers and Williams, 1994, Theorem II.69.1).

3.2 Comparison with the Classical Filter

The agent’s learning process is summarized by a finite-dimensional filter (Propositions 2.6 and 2.7). The key components of the filter are $\{m^{\bar x^*_t,\eta^*_t}(t) : t\}$ and $\delta$. The ambiguity in the data-generating mechanism boils down, in the present model, to that in the current value of the state variable, and the plausible estimates of the latter are given by an interval centered at $m^{\bar x^*_t,\eta^*_t}(t)$ with length proportional to the square root of $\delta(t)$ (Lemma 2.5). Therefore, of prime interest is how $\{m^{\bar x^*_t,\eta^*_t}(t) : t\}$ and $\delta$ evolve. In the following lines, I analyze the differential equations that $\{m^{\bar x^*_t,\eta^*_t}(t) : t\}$ and $\delta$ satisfy in comparison with those satisfied by $m^{\bar x,\eta}$ and $\gamma$.

Let us begin with the unobservable process $x$:
\begin{align*}
dx(t) &= \kappa(\bar x - x(t))\,dt + \rho_w\,dw(t) + \rho_v(dv^{\bar x,\eta}(t) + \eta(t)\,dt), \\
\frac{d}{dt}\operatorname{Var}(x(t)) &= \underbrace{|\rho_w|^2 + \rho_v^2}_{\operatorname{Var}(dx(t)\mid\mathcal{F}_t)/dt} - 2\kappa\operatorname{Var}(x(t)).
\end{align*}
The time-derivative of the unconditional variance of $x(t)$ is the conditional variance of $dx(t)$ per unit time, less the unconditional variance of $x(t)$ times the rate of reversion (times two). This is self-explanatory.

Recall next the filtering equations of a conditionally Gaussian process, namely (8) and (9), slightly rephrased to facilitate the discussion:
\begin{align*}
dm^{\bar x,\eta}(t) &= [\kappa(\bar x - m^{\bar x,\eta}(t)) + \rho_v\eta(t)]\,dt + \underbrace{[\rho_w + \underbrace{(\sigma(t)^{-1}b(t)\gamma(t))^\top}_{\text{Kalman gain}}]}_{\text{weight on the innovation}}\,d\bar w^{\bar x,\eta}(t), \\
\dot\gamma(t) &= |\rho_w|^2 + \rho_v^2 - 2\kappa\gamma(t) - \underbrace{\big|\rho_w + (\sigma(t)^{-1}b(t)\gamma(t))^\top\big|^2}_{\text{weight on the innovation squared}}.
\end{align*}
We revise the estimate $m^{\bar x,\eta}(t)$ of $x(t)$ in consideration of two factors: (i) the estimation error and (ii) the (unobservable) evolution of $x$. First, the correction of the estimation error,
\[
+(\sigma(t)^{-1}b(t)\gamma(t))^\top\,d\bar w^{\bar x,\eta}(t),
\]
is proportional to the innovation $d\bar w^{\bar x,\eta}(t)$. Suppose, to ease explanation, $n_y = 1$ and $\sigma, b > 0$. Then, when the change in the observable variable exceeds (falls short of) what was expected, it is likely that the old estimate of the growth rate was an underestimate (overestimate), and it thus needs to be revised upward (downward). The multiplicative factor, or the Kalman gain, is increasing in the uncertainty $\gamma(t)$ in $m^{\bar x,\eta}(t)$ (the less trustworthy the old estimate, the more weighted the new evidence) and is decreasing in the imprecision $\sigma(t)$ of the signal (the less informative the signal, the less weighted the new evidence). Second, $m^{\bar x,\eta}(t)$ is also to be revised by
\[
+[\kappa(\bar x - m^{\bar x,\eta}(t)) + \rho_v\eta(t)]\,dt + \rho_w\,d\bar w^{\bar x,\eta}(t),
\]
to account for the corresponding change
\[
+[\kappa(\bar x - x(t)) + \rho_v\eta(t)]\,dt + \rho_w\,dw(t)
\]
in $x$.

$\dot\gamma(t)$ is given by an analogue of $(d/dt)\operatorname{Var}(x(t))$ less the weight on the innovation squared. The last term expresses that uncertainty resolves more quickly when new evidence is taken more seriously. For example, zero weight on the news is equivalent to no news, and that certainly cannot help resolve uncertainty.
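As a minimal numerical sketch of the variance equation just discussed, assume $n_x = n_y = 1$ and constant coefficients, so that $c = \sigma^{-1}b$ is a constant. The snippet below integrates the scalar Riccati equation for $\gamma$ by forward Euler and compares the result with the positive root obtained by setting $\dot\gamma = 0$. The function names, the Euler scheme, and the parameter values are illustrative assumptions, not part of the model.

```python
import math


def gamma_dot(gamma, kappa, rho_w, rho_v, c):
    """Right-hand side of the scalar variance equation (n_x = n_y = 1).

    c stands for sigma^{-1} b (assumed constant); rho_w + c * gamma is the
    weight on the innovation.
    """
    weight = rho_w + c * gamma
    return rho_w**2 + rho_v**2 - 2.0 * kappa * gamma - weight**2


def gamma_steady_state(kappa, rho_w, rho_v, c):
    """Positive root of gamma_dot = 0, i.e. of
    c^2 g^2 + 2 (kappa + rho_w c) g - rho_v^2 = 0."""
    k = kappa + rho_w * c
    return (math.sqrt(k**2 + (c * rho_v) ** 2) - k) / c**2


def integrate_gamma(gamma0, kappa, rho_w, rho_v, c, horizon=50.0, dt=1e-3):
    """Forward-Euler integration of gamma from gamma0 over [0, horizon]."""
    gamma = gamma0
    for _ in range(int(horizon / dt)):
        gamma += dt * gamma_dot(gamma, kappa, rho_w, rho_v, c)
    return gamma


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    params = dict(kappa=0.2, rho_w=-0.01, rho_v=0.02, c=5.0)
    print("steady state :", gamma_steady_state(**params))
    print("Euler at T=50:", integrate_gamma(0.01, **params))
```

With these hypothetical values the Euler path settles at the steady-state root, which illustrates how a larger weight on the innovation speeds up the resolution of uncertainty.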
Recall finally the governing equations (22) and (26) of $\{m^{\bar x^*_t,\eta^*_t}(t) : t\}$ and $\delta$ under the reference preferential prior $P^0$, again slightly rephrased:
\begin{align*}
dm^{\bar x^*_t,\eta^*_t}(t) &= \kappa(\bar x^*_t - m^{\bar x^*_t,\eta^*_t}(t))\,dt + \big\{\rho_w + [\sigma(t)^{-1}b(t)(\underbrace{\gamma(t) + \delta(t)}_{\text{estimation uncertainty}})]^\top\big\}\,d\epsilon(t), \\
\dot\delta(t) &= \underbrace{\big|\rho_w + (\sigma(t)^{-1}b(t)\gamma(t))^\top\big|^2}_{\operatorname{Var}(dm^{\bar x,\eta}(t)\mid\mathcal{G}_t)/dt} - 2\kappa\delta(t) - \underbrace{\big|\rho_w + [\sigma(t)^{-1}b(t)(\gamma(t) + \delta(t))]^\top\big|^2}_{\text{weight on the innovation squared}} \\
&\quad + \underbrace{2\theta(t)I_{\bar x}(t)^{-1}}_{\text{ambiguity associated with } \bar x} + \underbrace{\lambda^{-1}\rho_v^2}_{\text{ambiguity associated with } \eta}.
\end{align*}
As with the Bayesian estimate $m^{\bar x,\eta}(t)$, the ambiguity-conscious agent’s estimate $m^{\bar x^*_t,\eta^*_t}(t)$, too, is revised in consideration of the estimation error and the evolution of $x$. But the difference is that now $\gamma(t)$ is replaced by the sum of $\gamma(t)$ and $\delta(t)$. $\gamma(t)$ is also known as the estimation risk in the literature (Kalymon, 1971; Barry, 1974; Klein and Bawa, 1976, 1977) and represents the Bayesian uncertainty under each theory that results because the agent cannot observe $x(t)$ and consequently has to estimate it. On the other hand, $\delta(t)$ represents the Knightian uncertainty that results because the agent does not know the data-generating mechanism and consequently has to estimate it. Based on this parallelism, I call $\delta(t)$ the estimation ambiguity and the sum $\gamma(t) + \delta(t)$ the estimation uncertainty. When the estimated theory is imprecise (large $\delta(t)$), or the posterior distribution of $x(t)$ is diffuse under the theory (large $\gamma(t)$), or both, new evidence receives more weight.

$\dot\delta(t)$ is given by an analogue of $\dot\gamma(t)$ plus terms accounting for the ambiguity in the data-generating mechanism. The first three terms reflect the fact that $\delta$ measures the imprecision in the estimation of $m^{\bar x,\eta}$, as opposed to that in the estimation of $x$ as does $\gamma$.
To elaborate, the first term is the conditional variance of $dm^{\bar x,\eta}(t)$ (given $\mathcal{G}_t$) per unit time as opposed to that of $dx(t)$ (given $\mathcal{F}_t$) per unit time; the parallelism between the second terms is obvious; and the third term is the weight on the innovation squared, as in $\dot\gamma(t)$. Next, the fourth term $\theta(t)I_{\bar x}(t)^{-1}$ captures the ambiguity in the estimate $m^{\bar x^*_t,\eta^*_t}(t)$ of $m^{\bar x,\eta}(t)$ due to that in $\bar x^*_t$; recall that $\theta(t)$ measures the sensitivity to $\bar x$ of $m^{\bar x,\eta^*_{\bar x,t}}(t)$, and $I_{\bar x}(t)^{-1}$, the imprecision of $\bar x^*_t$.

Lemma 3.1. $\lim_{t\to\infty}\theta(t)I_{\bar x}(t)^{-1} = 0$.

The fifth and last term $\lambda^{-1}\rho_v^2$ then captures the ambiguity associated with $\eta$ and sets the long-run level of $\delta$ as the fourth term vanishes:

Proposition 3.2. Suppose $\sigma^{-1}b$ is constant. Then $\delta$ evolves deterministically, converging to
\[
\delta(\infty) = \frac{\sqrt{(\kappa + \rho_w\sigma^{-1}b)^2 + (1 + \lambda^{-1})\rho_v^2|\sigma^{-1}b|^2} - \sqrt{(\kappa + \rho_w\sigma^{-1}b)^2 + \rho_v^2|\sigma^{-1}b|^2}}{|\sigma^{-1}b|^2}.
\]
Note that $\delta(\infty)$ is strictly decreasing in the confidence measure $\lambda$, and $\delta(\infty) = 0$ if and only if $\lambda = \infty$.

When $\sigma^{-1}b$ is stochastic, we can roughly say that $\delta$ is instantaneously tending to the limit identified above, while how quickly it does so and how close it is to that value at each instant depend on other parameters in the governing equation and on how wildly $\sigma^{-1}b$ varies.

To be highlighted here is that $\delta$, a measure of uncertainty, may respond inversely to changes in the signal imprecision $\sigma$ (assume $n_y = 1$). That is, agents may perceive, rather counterintuitively, more ambiguity when news has been relatively precise and less ambiguity when news has been relatively imprecise. If we think of the observable process as the endowment stream, this means that market ambiguity, or the premium for bearing it, may negatively comove with market risk (see Section 4.1).

To elaborate on the behavior of $\delta$ in question, suppose the signs of $\rho_w$ and $\sigma^{-1}b$ differ.
If, as would typically be the case, $b > 0$, this means that the observable variable (return or growth rate) locally negatively covaries with the unobservable variable (expected return or expected growth rate). Then, the two considerations in revising $m^{\bar x^*_t,\eta^*_t}(t)$ oppose each other, and this can result in high uncertainty. For an illustration, suppose further that $b$ is constant; that $\sigma > 0$ and $\rho_w < 0$; and that $\sigma$ is currently very large, $\sigma \gg |\rho_w^{-1}(\gamma + \delta)|$, so that $\delta$ is approaching $\lambda^{-1}\rho_v^2/2\kappa$. (Note that the extremely noisy signal is nevertheless not vacuous; it is not directly informative about the unobservable variable, but is indirectly so by revealing the common noise $w$.) Now, if $\sigma$ drops a bit, new evidence starts to receive less weight rather than more, in which case, by way of the earlier observation, uncertainty rises. If, for example, $\sigma$ stays at
\[
\sigma \equiv -\frac{b}{2\kappa\rho_w}\big(\rho_w^2 + (1 + \lambda^{-1})\rho_v^2\big) > 0,
\]
then in the limit $t \to \infty$, the weight on the innovation is zero and $\delta$ is indeed larger than the earlier limit value $\lambda^{-1}\rho_v^2/2\kappa$.$^{20}$

Similar observations can be made about $\gamma$ as well; in fact, it is more straightforward to see that $\gamma$ may negatively comove with $\sigma$ given the simpler governing equation. However, it is of less interest because the “second-order” uncertainty is not reflected in the equity premium to begin with.

3.3 Convergence to IID Ambiguity

I define convergence to IID ambiguity to mean that the one-step-ahead beliefs process $\Xi$ converges uniformly in the Hausdorff metric to a compact-convex subset of $\mathbb{R}^{n_y}$.$^{21}$ An obvious sufficient condition is the following:

Proposition 3.3. Convergence to IID ambiguity occurs if $\sigma^{-1}b$ is constant.$^{22}$

Suppose $\sigma^{-1}b$ is constant; and suppose further $n_y = 1$ for simplicity. Then,
\[
\bar\xi(\infty)^2 = 2\alpha\Big(\sqrt{(\kappa + \sigma^{-1}b\rho_w)^2 + (1 + \lambda^{-1})(\sigma^{-1}b\rho_v)^2} - \sqrt{(\kappa + \sigma^{-1}b\rho_w)^2 + (\sigma^{-1}b\rho_v)^2}\Big)
\]
where $\Xi(t) = [-\bar\xi(t), \bar\xi(t)]$. Note that $\bar\xi(\infty)$ is nonzero if and only if $\lambda$ is finite. Also, not surprisingly, it is increasing in $\alpha$ and decreasing in $\lambda$.
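The long-run formulas of Propositions 3.2 and 3.3 can be checked numerically. The sketch below assumes $n_x = n_y = 1$, a constant $c = \sigma^{-1}b$, and drops the term $2\theta(t)I_{\bar x}(t)^{-1}$ from the equation for $\dot\delta$ (it vanishes in the limit by Lemma 3.1). It then integrates the scalar equations for $\gamma$ and $\delta$ by forward Euler and compares the terminal $\delta$ with the closed form. Function names, the Euler scheme, and all parameter values are illustrative assumptions.

```python
import math


def delta_inf(kappa, rho_w, rho_v, c, lam):
    """Closed-form long-run estimation ambiguity; c stands for sigma^{-1} b."""
    k = kappa + rho_w * c
    hi = math.sqrt(k**2 + (1.0 + 1.0 / lam) * (rho_v * c) ** 2)
    lo = math.sqrt(k**2 + (rho_v * c) ** 2)
    return (hi - lo) / c**2


def xi_bar_inf(kappa, rho_w, rho_v, c, lam, alpha):
    """Half-width of the limiting interval [-xi_bar, xi_bar] when n_y = 1,
    using xi_bar^2 = 2 * alpha * c^2 * delta (from the ellipsoid in (27))."""
    return math.sqrt(2.0 * alpha * c**2 * delta_inf(kappa, rho_w, rho_v, c, lam))


def integrate_delta(gamma0, delta0, kappa, rho_w, rho_v, c, lam,
                    horizon=60.0, dt=1e-3):
    """Forward-Euler integration of the scalar (gamma, delta) system,
    with the x-bar term 2 * theta / I omitted (assumption)."""
    gamma, delta = gamma0, delta0
    for _ in range(int(horizon / dt)):
        w_gamma = rho_w + c * gamma            # Bayesian weight on the innovation
        w_total = rho_w + c * (gamma + delta)  # weight under estimation uncertainty
        d_gamma = rho_w**2 + rho_v**2 - 2.0 * kappa * gamma - w_gamma**2
        d_delta = w_gamma**2 - 2.0 * kappa * delta - w_total**2 + rho_v**2 / lam
        gamma += dt * d_gamma
        delta += dt * d_delta
    return delta


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    p = dict(kappa=0.2, rho_w=-0.01, rho_v=0.02, c=5.0, lam=1.0)
    print("closed form:", delta_inf(**p))
    print("Euler limit:", integrate_delta(0.01, 0.01, **p))
    print("xi_bar(inf):", xi_bar_inf(alpha=0.5, **p))
```

Passing `lam=math.inf` makes the penalty term disappear and returns a zero limit, in line with the statement that $\delta(\infty) = 0$ if and only if $\lambda = \infty$.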
To see the dependence of $\bar\xi(\infty)$ on other parameters, define $Y$ by $Y(0) = 0$ and
\[
dY(t) = \sigma(t)^{-1}\,dy(t) = \sigma(t)^{-1}(a(t) + b(t)x(t))\,dt + dw(t),
\]
and assume $\sigma^{-1}a$ is deterministic. Then, denoting the asymptotic variability of $Y$ by $V_Y$,
\[
V_Y \triangleq \lim_{t\to\infty}\frac{d}{dt}\operatorname{Var}(Y(t)),
\]
we can rewrite $\bar\xi(\infty)^2$ as
\[
\bar\xi(\infty)^2 = 2\alpha\Big(\sqrt{\kappa^2V_Y + \lambda^{-1}(\sigma^{-1}b\rho_v)^2} - \sqrt{\kappa^2V_Y}\Big).
\]
Thus, $\bar\xi(\infty)$ is decreasing in $\kappa$ and $V_Y$. This is intuitive. First, when $\kappa$ is large, the unobservable process stays close to the attractor. Second, $V_Y$ measures the variability of the unobservable process relative to the measurement error $w$. The last observation, in particular, is in line with the following remark by Merton (1980): “Unless a significant portion of the variance of the market returns is caused by changes in the expected return on the market, it will be difficult to use the time series of realized market returns to distinguish among different models for expected return.”

$^{20}$ See the supplementary appendix.

$^{21}$ The Hausdorff metric $d_H$ on the set $\mathcal{K}$ of nonempty compact subsets of $\mathbb{R}^{n_y}$ is defined by
\[
d_H(X,Y) = \max\Big\{\sup_{x\in X}\inf_{y\in Y}|x - y|,\ \sup_{y\in Y}\inf_{x\in X}|y - x|\Big\}, \quad X, Y \in \mathcal{K}.
\]

$^{22}$ The constancy of $\sigma^{-1}b$ can be weakened to uniform convergence.

4 Portfolio Choice

In Section 4, I apply the model of learning to the consumption/portfolio choice problem of a log investor. The investor finances her intertemporal consumption by trading one risk-free asset (bond) and a number of risky assets (stocks). She believes, as is the prevailing view of the financial economics profession, that mean reversion in stock returns is a plausible assumption (Fama and French, 1988; Poterba and Summers, 1988); but facing at the same time nonnegligible evidence that questions its validity (Welch and Goyal, 2008), she fails to have full confidence in it. In Section 4.1, I explain the setup. Sections 4.2 and 4.3 characterize the optimal demand for stocks.
In Section 4.4, I consider the special case in which there is a single stock and the stock return volatility is constant. This simplification allows us to establish certain analytical properties of the optimal policy. In Section 4.4.3, I numerically compute the optimal policy and discuss its behavior in comparison with the related models by Epstein and Schneider (2007) and Miao (2009).

4.1 Setup

As with the previous section, time is continuous and varies over $[0,T]$, $T \in (0,\infty)$.

4.1.1 Securities Market Dynamics

There is a single consumption good in the economy, which is continuously consumed and serves as the numeraire. The investor finances her consumption by trading one bond and $n_R \ge 1$ stocks. The interest rate on the bond is constant at $r \in \mathbb{R}$. Regarding how the stock returns are generated, on the other hand, the investor entertains multiple theories. Specifically, the theories take the form of probability measures on a common measurable space: Let there be a measurable space $(\Omega,\mathcal{F})$, a set $\mathcal{Q}$ of probability measures (theoretical priors) on $(\Omega,\mathcal{F})$, and a filtration $\mathbb{F} = \{\mathcal{F}_t\}$ of $\mathcal{F}$. The theoretical priors are equivalent and $\mathbb{F}$ satisfies the usual conditions with respect to the theoretical priors. Under the theoretical prior $Q^{\bar x,\eta} \in \mathcal{Q}$, where $(\bar x,\eta) \in \mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})$ and $n_x \ge 1$, the cumulative return process $R = \{R(t),\mathcal{F}_t\}$ of the stocks is given by part of the solution to the system of SDEs
\begin{align}
dR(t) &= (a_R(t,R,A) + b_R(t,R,A)x(t))\,dt + \sigma_R(t,R,A)\,dw(t), \tag{29}\\
dA(t) &= (a_A(t,R,A) + b_A(t,R,A)x(t))\,dt + \sigma_A(t,R,A)\,dw(t), \tag{30}\\
dx(t) &= \kappa(\bar x - x(t))\,dt + \rho_w\,dw(t) + \rho_v(dv^{\bar x,\eta}(t) + \eta(t)\,dt). \notag
\end{align}
Here, $A = \{A(t),\mathcal{F}_t\}$ is an $n_A$-dimensional process, $n_A \ge 0$; $x = \{x(t),\mathcal{F}_t\}$ is an $n_x$-dimensional process; $n_x \le n_R + n_A$; $w = \{w(t),\mathcal{F}_t\}$ and $v^{\bar x,\eta} = \{v^{\bar x,\eta}(t),\mathcal{F}_t\}$ are independent Wiener processes of dimension $n_R + n_A$ and $n_x$, respectively; $a_R$, $b_R$, $\sigma_R$, $a_A$, $b_A$, and $\sigma_A$ are nonanticipating path functionals from $[0,T]\times C([0,T],\mathbb{R}^{n_R+n_A})$ into $\mathbb{R}^{n_R}$, $\mathbb{R}^{n_R\times n_x}$, $\mathbb{R}^{n_R\times(n_R+n_A)}$, $\mathbb{R}^{n_A}$, $\mathbb{R}^{n_A\times n_x}$, and $\mathbb{R}^{n_A\times(n_R+n_A)}$, respectively; $\kappa$ is an $n_x\times n_x$ diagonal matrix with positive entries, $\rho_w$ is an $n_x\times(n_R+n_A)$ matrix, and $\rho_v$ is an $n_x\times n_x$ invertible matrix.

The investor observes $R$ and $A$ but not $x$. $A$ in this context represents the observable macroeconomic variables, in addition to the stock returns themselves, that affect the stock returns; and $x$, the latent state of the economy.

To conform to the notation of Section 2, let $y \triangleq (R^\top, A^\top)^\top$ and $n_y \triangleq n_R + n_A$. Then the dynamics (29)–(30) of the observable processes can be rewritten compactly as
\[
dy(t) = (a(t,y) + b(t,y)x(t))\,dt + \sigma(t,y)\,dw(t)
\]
where the definitions of $a$, $b$, and $\sigma$ are obvious. I continue to adopt the slightly abusive notation $f(t) \equiv f(t,\omega) \equiv f(t,y(\omega))$ for the path functionals $f$.

4.1.2 The Investor’s Preferences

The investor has the Chen-Epstein recursive multiple-priors utility with log felicity. Her conditional preferences at time $t \in [0,T]$ are represented by
\[
\min_{\xi\in\Xi} E^{P^\xi}\int_t^T e^{-\beta s}\log(c(s))\,ds, \quad c \in \mathcal{C}.
\]
Under a generic preferential prior $P^\xi$,
\[
dR(t) = (a_R(t) + b_R(t)m^*_t)\,dt + \sigma_R(t)(d\epsilon^\xi(t) + \xi(t)\,dt),
\]
where $m^*_t \equiv m^{\bar x^*_t,\eta^*_t}(t)$ is the maximum plausibility estimate of the conditional expectation of $x(t)$ given $\mathcal{G}_t$ and $\epsilon^\xi = \{\epsilon^\xi(t),\mathcal{G}_t\}$ is a $P^\xi$-Wiener process of dimension $n_y$. $\Xi(t)$ thus acquires a more specific interpretation as the ambiguity in the contemporaneous price of risk.

4.1.3 Trading Strategies and the Budget Constraint

A $(1 + n_R)$-dimensional process $(\Pi^\circ,\Pi)$, $\Pi(t) = (\Pi_1(t),\ldots,\Pi_{n_R}(t))^\top$, is a trading strategy if it is $\mathbb{G}$-progressive and
\[
\int_0^T (|\Pi^\circ(t)| + |\Pi(t)|^2)\,dt < \infty.
\]
$\Pi^\circ$ represents the amount of money invested in the bond and $\Pi$ those invested in the stocks. A trading strategy $(\Pi^\circ,\Pi)$ finances a consumption plan $c \in \mathcal{C}$ if $\Pi^\circ(T) + \Pi(T)^\top 1_{n_R} \ge 0$ and
\[
d(\Pi^\circ(t) + \Pi(t)^\top 1_{n_R}) = \Pi^\circ(t)r\,dt + \Pi(t)^\top dR(t) - c(t)\,dt
\]
where $1_{n_R}$ denotes the $n_R$-dimensional vector of ones. Denote the wealth process $\Pi^\circ + \Pi^\top 1_{n_R}$ by $W$. $W$ satisfies
\[
dW(t) = (W(t) - \Pi(t)^\top 1_{n_R})r\,dt + \Pi(t)^\top dR(t) - c(t)\,dt \tag{31}
\]
with initial condition $W(0) = \Pi^\circ(0) + \Pi(0)^\top 1_{n_R}$. In fact, $W$ is the unique strong solution to the last equation, and therefore, we can suppress $\Pi^\circ$ and identify a trading strategy with $\Pi$. A pair $(\Pi,c)$ is admissible for initial wealth $W(0)$ if the corresponding wealth process $W^{\Pi,c,W(0)}$ is uniformly bounded below.

The market is dynamically incomplete if $n_A > 0$. Let
\[
\zeta(t) \triangleq \sigma_R(t)^\top(\sigma_R(t)\sigma_R(t)^\top)^{-1}(a_R(t) + b_R(t)m^*_t - r1_{n_R}).
\]
A consumption process $c \in \mathcal{C}$ can be financed by some trading strategy if and only if it satisfies the following static budget constraint:
\[
\sup_{\nu\in\operatorname{Ker}(\sigma_R)} E^{P^0}\int_0^T \mathcal{E}^{-(\zeta+\nu)}(t)e^{-rt}c(t)\,dt \le W(0) \tag{32}
\]
where $\operatorname{Ker}(\sigma_R)$ denotes the set of processes $\nu$ such that $\sigma_R(t,\omega)\nu(t,\omega) = 0$ a.e. (He and Pearson, 1991; Karatzas et al., 1991; Cuoco, 1997).

Remark 4.1. If the investor had full confidence in a simple theory $Q^{\bar x,0} \in \mathcal{Q}$, then the present model would have as special cases the Bayesian learning models of Lakner (1998), Xia (2001), Zohar (2001), and Brendle (2006), in which the unobservable instantaneous expected return process follows an Ornstein-Uhlenbeck process. In other words, this section extends the latter models to a case of ambiguity.

4.2 Optimal Consumption and Portfolio

Let $\mathcal{C}^2(u) \subset \mathcal{C}$ denote the set of consumption processes such that
\[
E^{P^0}\int_0^T [\log(c(t))]^2\,dt < \infty.
\]
I define the investor’s problem to be
\[
\sup_{c\in\mathcal{C}^2(u)}\min_{\xi\in\Xi} E^{P^\xi}\int_0^T e^{-\beta t}\log(c(t))\,dt \tag{33}
\]
subject to the budget constraint (32).
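The objective function in (33) is finite for all $(c,\xi) \in \mathcal{C}^2(u)\times\Xi$ due to the definition of $\mathcal{C}^2(u)$ and the uniform boundedness of $\Xi$. Let $\mathcal{C}_{\text{budget}} \subset \mathcal{C}$ denote the set of consumption processes that satisfy the budget constraint.

Lemma 4.1. The minimax theorem holds, that is,
\[
\sup_{c\in\mathcal{C}^2(u)\cap\mathcal{C}_{\text{budget}}}\min_{\xi\in\Xi} E^{P^\xi}\int_0^T e^{-\beta t}\log(c(t))\,dt = \min_{\xi\in\Xi}\sup_{c\in\mathcal{C}^2(u)\cap\mathcal{C}_{\text{budget}}} E^{P^\xi}\int_0^T e^{-\beta t}\log(c(t))\,dt.
\]

Remark 4.2. It is clear from the proof that the claim is true for any concave felicity, with the corresponding change to the definition of $\mathcal{C}^2(u)$.

Proposition 4.1. For a given $\xi \in \Xi$, the inner supremum
\[
\sup_{c\in\mathcal{C}^2(u)\cap\mathcal{C}_{\text{budget}}} E^{P^\xi}\int_0^T e^{-\beta t}\log(c(t))\,dt \tag{34}
\]
equals
\begin{align}
&-\frac{1 - e^{-\beta T}}{\beta}\log\frac{1 - e^{-\beta T}}{\beta} + \frac{\beta - r}{\beta}\Big(Te^{-\beta T} - \frac{1 - e^{-\beta T}}{\beta}\Big) + \frac{1 - e^{-\beta T}}{\beta}\log W(0) \notag\\
&\quad + E^{P^\xi}\int_0^T \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\,\frac12\,\big|\zeta(t) + \sigma_R(t)^\top(\sigma_R(t)\sigma_R(t)^\top)^{-1}\sigma_R(t)\xi(t)\big|^2\,dt. \tag{35}
\end{align}
Let $\xi^*$ denote the minimizer of the last expression:
\[
\xi^* \triangleq \arg\min_{\xi\in\Xi} E^{P^\xi}\int_0^T \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\,\frac12\,\big|\zeta(t) + \sigma_R(t)^\top(\sigma_R(t)\sigma_R(t)^\top)^{-1}\sigma_R(t)\xi(t)\big|^2\,dt. \tag{36}
\]
The optimal consumption process is given by
\begin{align}
c^*(t) &= \beta W(0)e^{rt}\,\frac{e^{-\beta t}}{1 - e^{-\beta T}}\,\frac{\mathcal{E}^{\xi^*}(t)}{\mathcal{E}^{-(\zeta+\nu^*)}(t)}, \tag{37}\\
\nu^*(t) &= [\sigma_R(t)^\top(\sigma_R(t)\sigma_R(t)^\top)^{-1}\sigma_R(t) - I_{n_y}]\xi^*(t). \tag{38}
\end{align}
Hence the key is to solve (36). Note for later reference that
\[
\zeta(t) + \sigma_R(t)^\top(\sigma_R(t)\sigma_R(t)^\top)^{-1}\sigma_R(t)\xi(t) = \sigma_R(t)^\top(\sigma_R(t)\sigma_R(t)^\top)^{-1}(a_R(t) + b_R(t)m^*_t - r1_{n_R} + \sigma_R(t)\xi(t)).
\]

To find the trading strategy that finances $c^*$, observe first that the wealth process corresponding to $c^*$ is
\begin{align}
W^*(t) &= \frac{1}{B(t)^{-1}\mathcal{E}^{-(\zeta+\nu^*)}(t)}\,E^{P^0}\Big[\int_t^T B(s)^{-1}\mathcal{E}^{-(\zeta+\nu^*)}(s)c^*(s)\,ds \,\Big|\, \mathcal{G}_t\Big] \notag\\
&= W(0)e^{rt}\,\frac{e^{-\beta t} - e^{-\beta T}}{1 - e^{-\beta T}}\,\frac{\mathcal{E}^{\xi^*}(t)}{\mathcal{E}^{-(\zeta+\nu^*)}(t)}. \tag{39}
\end{align}
Thus its differential is
\[
dW^*(t) = W^*(t)(\zeta(t) + \nu^*(t) + \xi^*(t))^\top\,d\epsilon(t) + (\cdots)\,dt.
\]
Comparing the last expression with (31) and recalling (38), we see that
\[
\pi^*(t) \triangleq \Pi^*(t)/W^*(t) = (\sigma_R(t)\sigma_R(t)^\top)^{-1}(a_R(t) + b_R(t)m^*_t - r1_{n_R} + \sigma_R(t)\xi^*(t)) \tag{40}
\]
where $\pi^*$ denotes the optimal fraction of wealth invested in the stocks.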
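To make the demand formula (40) concrete, the following sketch evaluates it in the scalar case $n_R = 1$, $n_A = 0$, taking the worst-case drift distortion $\xi^*(t)$ as a given input. Solving (36) for $\xi^*$ is not attempted here. The function name and the numerical values are hypothetical and serve only as an illustration.

```python
def log_investor_weight(m_star, xi_star, sigma_R, r, a_R=0.0, b_R=1.0):
    """Scalar version of the portfolio weight in (40).

    m_star  : maximum plausibility estimate of the hidden state
    xi_star : worst-case distortion xi*(t), supplied by the caller
    sigma_R : return volatility (positive scalar)
    r       : risk-free rate
    a_R, b_R: drift coefficients of the return process
    """
    excess_drift = a_R + b_R * m_star - r + sigma_R * xi_star
    return excess_drift / sigma_R**2


if __name__ == "__main__":
    # Hypothetical inputs: zero estimated equity premium (m_star = r) and a
    # negative distortion; the weight is then driven entirely by xi_star.
    print(log_investor_weight(m_star=0.03, xi_star=-0.1, sigma_R=0.2, r=0.03))
    # Without ambiguity (xi_star = 0) the rule is the usual myopic log demand.
    print(log_investor_weight(m_star=0.07, xi_star=0.0, sigma_R=0.2, r=0.03))
```

Setting `xi_star=0.0` recovers the demand of a Bayesian log investor under the reference prior, so the difference between the two calls isolates the contribution of the term involving $\xi^*$.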
The optimal consumption plan $c^*$ found above equals that of the Bayesian investor with unique prior $P^{\xi^*}$. Accordingly, $\pi^*$ equals the stock demand of the same Bayesian investor, the term involving $\xi^*$ accounting for the discrepancy between $P^{\xi^*}$ and $P^0$. This observation also suggests that, as is characteristic of Bayesian log investors, the optimal consumption is given by a fraction of wealth independent of other state variables, or precisely,
$$c^*(t) = \frac{\beta}{1 - e^{-\beta(T-t)}}\, W^*(t),$$
as can be verified from (37) and (39).

4.3 Markovian Characterization

Suppose the economy is Markovian, that is, $f(t, R, A) = f(t, R(t), A(t))$ where $f = a$, $b$, or $\sigma$. Then the investor's information can be summarized by a finite number of Markovian variables.

Observe first that the Bayesian investor who has full confidence in a simple theory $Q^{\bar{x},0} \in \mathcal{Q}$ has the following as the state variables (see Proposition 2.2): $R(t)$, $A(t)$, $m^{\bar{x},0}(t)$, and $\gamma(t)$. Our investor also has these as state variables, with the obvious replacement of $m^{\bar{x},0}(t)$ by $m^*_t$, that is,
$$R(t),\ A(t),\ m^*_t,\ \text{and } \gamma(t), \tag{41}$$
and the following in addition:
$$\bar{x}^*_t,\ \sigma_{\bar{x}^*}(t),\ I_{\bar{x}}(t)^{-1},\ \text{and } \delta(t). \tag{42}$$
See Propositions 2.6, 2.7, and 2.9. The first three of (42) originate from the estimation of $\bar{x}$; the last is needed to describe the set of one-step-ahead conditionals $\Xi(t)$.

The standard control approach to the minimization (36) requires that $\Xi(t)$, $\zeta(t)$, and $\sigma_R(t)$ be functions of some (multidimensional) Markov process, and Propositions 2.6 and 2.7 confirm that the variables identified in (41) and (42) form a closed system of Markovian variables. Collect them in $Z$,
$$Z \triangleq (R, A, m^*, \gamma, \bar{x}^*, \sigma_{\bar{x}^*}, I_{\bar{x}}^{-1}, \delta)^\top,$$
and write
$$dZ(t) = \mu_Z(t, Z(t))\, dt + \sigma_Z(t, Z(t))\, d\epsilon(t). \tag{43}$$

Remark 4.3. Some of the state variables identified above may be redundant. For example, if $a$, $b$, and $\sigma$ are deterministic functions of time independent of $R$ and $A$, then it suffices to take as state variables $m^*$ and $\bar{x}^*$. See Section 4.4 below.
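The consumption rule stated in Section 4.2, $c^*(t) = \beta W^*(t)/(1 - e^{-\beta(T-t)})$, can be illustrated with a minimal Python sketch. The function name and the numerical inputs are illustrative assumptions and do not come from the paper; only the formula itself is taken from the text.

```python
import math


def consumption_wealth_ratio(beta: float, horizon: float) -> float:
    """Return c*(t) / W*(t) = beta / (1 - exp(-beta * (T - t))).

    `horizon` is the remaining time T - t and must be positive.
    The name and signature are hypothetical; the formula is the log
    investor's consumption rule from Section 4.2.
    """
    if beta <= 0 or horizon <= 0:
        raise ValueError("beta and horizon must be positive")
    return beta / (1.0 - math.exp(-beta * horizon))


# Illustrative inputs (assumed): beta = 0.03 and a 10-year remaining horizon.
ratio = consumption_wealth_ratio(beta=0.03, horizon=10.0)
print(f"consumption-wealth ratio: {ratio:.4f}")
```

The ratio depends only on $\beta$ and the remaining horizon, which is the sense in which consumption is a fraction of wealth independent of the other state variables.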
Define the value function as
$$J(t, Z) \triangleq \min_{\xi \in \Xi} E^{P^\xi}\!\left[ \int_t^T \frac{e^{-\beta s} - e^{-\beta T}}{\beta}\, \frac{1}{2} \left| \zeta(s) + \sigma_R(s)^\top (\sigma_R(s)\sigma_R(s)^\top)^{-1} \sigma_R(s) \xi(s) \right|^2 ds \,\middle|\, Z(t) = Z \right]$$
subject to the state dynamics (43). Picking a particular $\xi \in \Xi$ is to say that $\epsilon^\xi = \{\epsilon^\xi(t), \mathcal{G}_t\}$ defined by $d\epsilon^\xi(t) = d\epsilon(t) - \xi(t)\, dt$ is a Wiener process. Hence
$$J(t, Z) = \min_{\xi \in \Xi} E^{P^0}\!\left[ \int_t^T \frac{e^{-\beta s} - e^{-\beta T}}{\beta}\, \frac{1}{2} \left| \zeta^\xi(s) + \sigma_R^\xi(s)^\top (\sigma_R^\xi(s)\sigma_R^\xi(s)^\top)^{-1} \sigma_R^\xi(s) \xi(s) \right|^2 ds \,\middle|\, Z^\xi(t) = Z \right] \tag{44}$$
subject to
$$dZ^\xi(t) = \mu_Z(t, Z^\xi(t))\, dt + \sigma_Z(t, Z^\xi(t)) \left( d\epsilon(t) + \xi(t)\, dt \right)$$
where $\sigma_R^\xi(s) \equiv \sigma_R(s, R^\xi(s), A^\xi(s))$. (44) is linear-quadratic in the control, although not in the state and hence not linear-quadratic in the classical sense. The corresponding Hamilton-Jacobi-Bellman (HJB) equation is
$$0 = \min_{\xi(t) \in \Xi(t, Z)} \Big\{ \partial_t J(t, Z) + (\partial_Z J(t, Z))^\top \left( \mu_Z(t, Z) + \sigma_Z(t, Z) \xi(t) \right) + \frac{1}{2} \mathrm{tr}\!\left[ (\partial_Z^2 J(t, Z))\, \sigma_Z(t, Z) \sigma_Z(t, Z)^\top \right] + \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\, \frac{1}{2} \left| \zeta(t, Z) + \sigma_R(t, Z)^\top (\sigma_R(t, Z)\sigma_R(t, Z)^\top)^{-1} \sigma_R(t, Z) \xi(t) \right|^2 \Big\} \tag{45}$$
with boundary condition $J(T, Z) = 0$ for all $Z$. In general, (45) is of degenerate parabolic type and we can only say that the value function is a viscosity solution of (45). But see Section 4.4.1, where I consider a special case in which the value function is the unique classical solution to the HJB equation.

4.4 Examples

To gain intuition, I consider in this section the special case in which there is a single stock, the stock return volatility is constant, and there are no other observable macroeconomic indicators that affect stock returns. That is, $n_R = 1$ and $n_A = 0$ so that $n_y = n_x = 1$ and $\sigma(t, y) = \sigma_R(t, y) = \sigma_R \in (0, \infty)$ for all $(t, y)$. Assume furthermore $a_R \equiv 0$ and $b_R \equiv 1$. This setup is simple but rich enough to let us discuss key aspects of the optimal policy.

4.4.1 $\bar{x}$ Known

Suppose first that $\bar{x}$ is known.

Optimal Policy Revisited. Under the aforementioned assumptions, the investor's problem is Markovian and her optimal stock demand can be written in a simple feedback form.
Recall Section 4.3 and note that (i) $R$ and $A$ are redundant as state variables because $\sigma$ is constant, (ii) $\gamma$ and $\delta$ are redundant because they are deterministic, and (iii) $\bar{x}^* \equiv \bar{x}$, $\sigma_{\bar{x}^*}$, and $I_{\bar{x}}^{-1}$ are redundant because $\bar{x}$ is known. It thus suffices to take $m^*$ as the sole state variable ($Z = m^*$). The controlled state dynamics is (see (22))
$$dm^{*,\xi}_t = \kappa(\bar{x} - m^{*,\xi}_t)\, dt + (\rho_w \sigma_R + \gamma(t) + \delta(t))\, \sigma_R^{-1} \left( d\epsilon(t) + \xi(t)\, dt \right) =: \mu_{m^*}(m^{*,\xi}_t)\, dt + \sigma_{m^*}(t) \left( d\epsilon(t) + \xi(t)\, dt \right).$$
The price of risk under $P^0$ is simplified to
$$\zeta(m^*) = \frac{m^* - r}{\sigma_R}.$$
$\Xi(t)$ is given by an interval $[-\bar{\xi}(t), \bar{\xi}(t)]$, where
$$\bar{\xi}(t) \triangleq \frac{\sqrt{2\alpha\delta(t)}}{\sigma_R};$$
$\bar{\xi}(t)$ measures the magnitude of the ambiguity in the price of risk and is increasing in the investor's conservatism in model selection $\alpha$ and the estimation ambiguity $\delta(t)$. Also, it decreases monotonically and deterministically over time, converging to a constant, as is a property of $\delta$. The estimated equity premium is $m^* - r$ and the ambiguity in the equity premium is $\sigma_R \bar{\xi}(t) = \sqrt{2\alpha\delta(t)}$. (Unless necessary, the [true or estimated] instantaneous equity premium will be referred to simply as the [true or estimated] equity premium.)

Next, the HJB equation (45) is simplified to
$$0 = \min_{\xi(t) \in \Xi(t)} \Big\{ \partial_t J(t, m^*) + (\partial_{m^*} J(t, m^*)) \left( \mu_{m^*}(m^*) + \sigma_{m^*}(t) \xi(t) \right) + \frac{1}{2} (\partial^2_{m^*} J(t, m^*))\, \sigma_{m^*}(t)^2 + \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\, \frac{1}{2} \left( \zeta(m^*) + \xi(t) \right)^2 \Big\} \tag{46}$$
with boundary condition $J(T, m^*) = 0$ for all $m^*$. It is still not clear if (46) allows for an analytical solution, but we can now check some basic properties of the value function.$^{23}$

23 It is possible to formulate (46) as a free boundary problem and characterize the solution to a certain degree (cf. Davis and Norman (1990)), but there is little practical benefit and I do not pursue this direction.

$C^{1,2}([0, T] \times \mathbb{R})$ denotes the set of real-valued functions $f$ from $[0, T] \times \mathbb{R}$ such that $f(t, m^*)$ is continuously differentiable in $t$ and twice continuously differentiable in $m^*$; and $C_p([0, T] \times \mathbb{R})$ the set of real-valued functions $f$ from $[0, T] \times \mathbb{R}$ that are continuous and satisfy the polynomial growth condition: $|f(t, m^*)| \le K(1 + |m^*|^n)$ for all $m^* \in \mathbb{R}$ for some nonnegative constants $K$ and $n$. Assume for the rest of Section 4.4.1,

Assumption 4.1. $\sigma^2_{m^*} : [0, T] \to \mathbb{R}$ is bounded below away from zero.

The assumption trivially holds if $\rho_w \ge 0$.

Proposition 4.2. (i) The partial differential equation (46) with its boundary condition has a unique solution $K \in C^{1,2}([0, T] \times \mathbb{R}) \cap C_p([0, T] \times \mathbb{R})$. (ii) $K$ is the value function, that is, $K = J$. (iii)
$$\xi^*(t, m^*) = \max\left\{ -\bar{\xi}(t),\ \min\left\{ \bar{\xi}(t),\ \xi^U(t, m^*) \right\} \right\},$$
$$\xi^U(t, m^*) \triangleq -\zeta(m^*) - \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*).$$
Thus, in particular, the optimal control $\xi^* : [0, T] \times \mathbb{R} \to \mathbb{R}$ is continuous.

The expression for the optimal stock demand (40) becomes
$$\pi^*(t, m^*) = \frac{m^* - r + \sigma_R \xi^*(t, m^*)}{\sigma_R^2} = \max\left\{ \frac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \min\left\{ \frac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ -\frac{1}{\sigma_R^2} \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*) \right\} \right\}. \tag{47}$$

Lemma 4.2. (i) $J(t, m^*)$ is convex in $m^*$. (ii)
$$\partial_{m^*} J(t, m^*) = E^{P^0}\!\left[ \int_t^T \frac{e^{-\beta s} - e^{-\beta T}}{\beta}\, \frac{e^{-\kappa(s-t)}}{\sigma_R}\, \frac{m^{*,\xi^*}_s - r + \sigma_R \xi^*(s)}{\sigma_R}\, ds \,\middle|\, m^{*,\xi^*}_t = m^* \right].$$

From the convexity, that is, from the fact that $\partial_{m^*} J(t, m^*)$ is nondecreasing in $m^*$, it follows that
$$\frac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2} = -\frac{1}{\sigma_R^2} \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*)$$
and
$$\frac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2} = -\frac{1}{\sigma_R^2} \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*)$$
as equations in $m^*$ each have a unique solution, $\overline{m}^*(t)$ and $\underline{m}^*(t) < \overline{m}^*(t)$, respectively. $\pi^*$ can be rewritten as
$$\pi^*(t, m^*) = \begin{cases} \dfrac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2} & \text{if } m^* < \underline{m}^*(t), \\[2ex] \dfrac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2} & \text{if } m^* > \overline{m}^*(t), \\[2ex] -\dfrac{1}{\sigma_R^2} \dfrac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*) & \text{if } m^* \in [\underline{m}^*(t), \overline{m}^*(t)]. \end{cases}$$

Since $\xi^*$ is bounded, the effect of ambiguity on $\partial_{m^*} J(t, m^*)$ is negligible for values of $m^*$ with a large absolute value. Combined with convexity, this implies that $m^* \mapsto J(t, m^*)$ is U-shaped. (Epstein and Schneider (2007, p. 1296) make a similar observation from a numerical exercise.)
As with her Bayesian counterpart with unique theoretical prior $Q^{\bar{x},0}$, our multiple-priors investor, too, is better off when the estimated equity premium is further away from zero, that is, when the stocks are (locally, in expected terms) more distinct from the bond.

The U-shape implies that the optimal policy may have curvature in the central region $m^* \in [\underline{m}^*(t), \overline{m}^*(t)]$. Compared to the Bayesian policy, our investor's stock demand is (i) shifted up by the ambiguity in the equity premium (divided by the return variance) when the estimated equity premium $m^* - r$ is sufficiently small (in the sense of $<$ on the real line), (ii) shifted down by the same amount when $m^* - r$ is sufficiently large, and (iii) proportional to the negative of the instantaneous covariation between the stock return and the state ($-\sigma_R \sigma_{m^*}(t)$) and the first derivative of the value function ($\partial_{m^*} J(t, m^*)$), when $m^* - r$ is intermediate. Clearly, the last case is reminiscent of Merton's (1973) hedging demand; it tells the investor to hold more of the stock if it pays in cases of low continuation utility. (But it is not exactly the same as Merton's hedging demand. His is such that the investor holds more of the assets that pay in cases of low consumption, or equivalently, high marginal utility.) I will take a closer look at the quantity $-\sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*)$ later, but to talk about hedging, we first have to clarify the myopic demand.

Myopic Demand. The myopic demand is defined to be
$$\pi^*_{\mathrm{myopic}}(t, m^*) \triangleq \left[ \lim_{t \to T} \pi^*(t, m^*) \right]_{T = t}.$$

Proposition 4.3.
$$\pi^*_{\mathrm{myopic}}(t, m^*) = \begin{cases} \dfrac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2} & \text{if } m^* - r < -\sigma_R \bar{\xi}(t), \\[2ex] \dfrac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2} & \text{if } m^* - r > +\sigma_R \bar{\xi}(t), \\[2ex] 0 & \text{if } -\sigma_R \bar{\xi}(t) \le m^* - r \le \sigma_R \bar{\xi}(t). \end{cases}$$

The myopic demand is more conservative than that of the Bayesian investor with unique theoretical prior $Q^{\bar{x},0}$ in that, in absolute values, the former is dominated by the latter:
$$|\pi^*_{\mathrm{myopic}}(t, m^*)| \le \left| \frac{m^* - r}{\sigma_R^2} \right| \text{ for all } m^* \quad \text{and} \quad |\pi^*_{\mathrm{myopic}}(t, m^*)| < \left| \frac{m^* - r}{\sigma_R^2} \right| \text{ for all } m^* \ne r.$$
(I am comparing the feedback policies, considering $m^*$ to signify the estimate of each investor. The actual values of $m^*$ will differ between the two investors.) Furthermore, there is a range of estimated equity premia for which our investor neither buys nor sells short the stock. Say that the estimated equity premium is unambiguously positive if it is greater than the ambiguity in the equity premium, that is, if $m^* - r > \sigma_R \bar{\xi}(t)$; unambiguously negative if $m^* - r < -\sigma_R \bar{\xi}(t)$; and not unambiguously distinct from zero, otherwise. Then, the observation, rephrased, is that the multiple-priors investor, if myopic, does not participate in the stock market when her estimate of the equity premium is not unambiguously distinct from zero; and participates when it is unambiguously positive or negative but invests a smaller fraction of her wealth than the Bayesian counterpart with the same estimate would. See Dow and Werlang (1992), who first presented a nonparticipation result for ambiguity-averse (in the sense of Schmeidler (1989)) investors.

Hedging Demand. Under risk, log investors do not hedge; under ambiguity, they do. Recall the total demand (47) and let
$$\pi^{**}_{\mathrm{hedging}}(t, m^*) \triangleq -\frac{1}{\sigma_R^2} \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*).$$
As noted earlier, $\pi^{**}_{\mathrm{hedging}}$ reflects the investor's desire to hedge against adverse changes in the investment opportunities. Under ambiguity, an adverse change in the investment opportunities is a change in the state variables that is associated with a decrease in continuation utility. In the present case, if the (estimated) equity premium is sufficiently large that $\partial_{m^*} J(t, m^*) > 0$, then the investor would fear a decrease in the equity premium, that is, its becoming ambiguous, and want to transfer wealth to states with lower equity premia. And she could do this by holding more of the stock if it pays at times of lower equity premia and less of it if it does not.
However, the desire to hedge is not fully realized, and how much of it is realized depends on the magnitude of the ambiguity present. The total demand $\pi^*(t, m^*)$ is given by $\pi^{**}_{\mathrm{hedging}}(t, m^*)$ confined between $(m^* - r \pm \sigma_R \bar{\xi}(t))/\sigma_R^2$, which collapse to the Bayesian demand $(m^* - r)/\sigma_R^2$ when no ambiguity is present. Hence I call $\pi^{**}_{\mathrm{hedging}}$ the shadow hedging demand. Finally, based on the interpretation of $\pi^{**}_{\mathrm{hedging}}$, the difference $\pi^* - \pi^*_{\mathrm{myopic}}$ between the total demand and the myopic demand is called the hedging demand, although the intent is not fully realized:
$$\pi^*_{\mathrm{hedging}}(t, m^*) \triangleq \pi^*(t, m^*) - \pi^*_{\mathrm{myopic}}(t, m^*) = \max\left\{ \frac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \min\left\{ \frac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \pi^{**}_{\mathrm{hedging}}(t, m^*) \right\} \right\} - \max\left\{ \frac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \min\left\{ \frac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ 0 \right\} \right\}.$$
Long-horizon, multiple-priors log investors' nonmyopic behavior was first observed in discrete time by Epstein and Schneider (2007) and in continuous time by Hernández-Hernández and Schied (2007a).

In Comparison with Merton (1973). The shadow hedging demand $\pi^{**}_{\mathrm{hedging}}$ is reminiscent of Merton's (1973), but not the same. The difference lies in what counts as an adverse change in the investment opportunities. Under risk, adverse changes are associated with low consumption; under ambiguity, with low continuation utility.

To draw further comparison between $\pi^{**}_{\mathrm{hedging}}$ and Merton's hedging demand, recall that the latter is the position in the stock that minimizes the volatility of consumption. On the other hand, $\pi^{**}_{\mathrm{hedging}}$ is the position in the stock that minimizes (to zero) the effect of misspecification on continuation utility. To elaborate, let
$$V^{\pi^*, c^*}(t, m^*, W^*) \triangleq E^{P^{\xi^*}}\!\left[ \int_t^T e^{-\beta s} \log(c^*(s))\, ds \,\middle|\, m^*_t = m^*,\ W^*(t) = W^* \right].$$
As is characteristic of log investors, $V$ separates additively into a part depending only on $(t, W^*)$ and another depending only on $(t, m^*)$, and I have been focusing on the latter, denoted by $J$.
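Let further
$$f^\xi(t) \triangleq \frac{E^{P^\xi}\!\left[ dV(t, m^*_t, W^{\pi,c}(t)) \,\middle|\, m^*_t = m^*,\ W^{\pi,c}(t) = W \right]}{dt}$$
and observe that
$$\partial_{\xi(t)} \left( f^\xi(t) - f^0(t) \right) = W \pi(t) \sigma_R\, \partial_W V(t, m^*, W) + \sigma_{m^*}(t)\, \partial_{m^*} V(t, m^*, W).$$
From (35),
$$\partial_W V(t, m^*, W) = \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\, \frac{1}{W} \quad \text{and} \quad \partial_{m^*} V(t, m^*, W) = \partial_{m^*} J(t, m^*).$$
It follows that $|\partial_{\xi(t)}(f^\xi(t) - f^0(t))|$ attains its minimum (zero) at $\pi(t) = \pi^{**}_{\mathrm{hedging}}(t, m^*)$.

4.4.2 $\bar{x}$ Unknown and Ambiguous

Suppose now that the investor does not know the value of $\bar{x}$ and entertains all the theoretical priors $\mathcal{Q} = \{Q^{\bar{x},\eta} : (\bar{x}, \eta) \in \mathbb{R} \times L^2([0, T], \mathbb{R})\}$. As before, $R$ and $A$ are redundant as state variables because $\sigma$ is constant, and $\gamma$, $\delta$, $\sigma_{\bar{x}^*}$, and $I_{\bar{x}}^{-1}$ are redundant because they are deterministic. But now $\bar{x}^*$ needs to be taken as a state variable as well as $m^*$: $Z = (m^*, \bar{x}^*)$ with dynamics (see Proposition 2.6)
$$dZ^\xi(t) = \begin{pmatrix} \kappa(\bar{x}^{*,\xi}_t - m^{*,\xi}_t) \\ 0 \end{pmatrix} dt + \begin{pmatrix} \rho_w \sigma_R + \gamma(t) + \delta(t) \\ \kappa^{-1} \sigma_{\bar{x}^*}(t) \end{pmatrix} \sigma_R^{-1} \left( d\epsilon(t) + \xi(t)\, dt \right) = \mu_Z(Z^\xi(t))\, dt + \sigma_Z(t) \left( d\epsilon(t) + \xi(t)\, dt \right).$$
Since the diffusion matrix $\sigma_Z \sigma_Z^\top$ is degenerate, the value function $J$ may not be differentiable. I assume nevertheless that $\partial_Z J(t, Z)$ exists everywhere and write
$$\pi^*(t, m^*, \bar{x}^*) = \max\left\{ \frac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \min\left\{ \frac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \pi^{**}_{\mathrm{hedging}}(t, m^*, \bar{x}^*) \right\} \right\},$$
$$\pi^{**}_{\mathrm{hedging}}(t, m^*, \bar{x}^*) \triangleq -\frac{1}{\sigma_R^2} \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}} (\rho_w \sigma_R + \gamma(t) + \delta(t))\, \partial_{m^*} J(t, m^*, \bar{x}^*) - \frac{1}{\sigma_R^2} \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \kappa^{-1} \sigma_{\bar{x}^*}(t)\, \partial_{\bar{x}^*} J(t, m^*, \bar{x}^*).$$
I call the first term of $\pi^{**}_{\mathrm{hedging}}$ the $m^*$-shadow hedging demand and the second the $\bar{x}^*$-shadow hedging demand.

4.4.3 Numerical Analysis

Continue to assume that the investor entertains all the theoretical priors $\mathcal{Q} = \{Q^{\bar{x},\eta} : (\bar{x}, \eta) \in \mathbb{R} \times L^2([0, T], \mathbb{R})\}$. In this section, I numerically compute the optimal stock demand $\pi^*(t, m^*, \bar{x}^*)$ and discuss its behavior.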
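Before turning to the figures, the feedback structure derived analytically above (a shadow hedging demand confined between two shifted Bayesian demands, with the myopic demand as the special case of zero shadow hedging) can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the shadow hedging values passed in are made-up inputs rather than a solution of the HJB equation, and the volatility 0.1428 and ambiguity 0.01 are the values used in the calibration of this section.

```python
from typing import Tuple


def demand_bounds(premium: float, sigma_r: float, ambiguity: float) -> Tuple[float, float]:
    """Lower and upper bounds (m* - r -/+ sigma_R * xi_bar) / sigma_R**2.

    `premium` is the estimated equity premium m* - r and `ambiguity`
    is sigma_R * xi_bar(t) = sqrt(2 * alpha * delta(t)).
    """
    variance = sigma_r ** 2
    return (premium - ambiguity) / variance, (premium + ambiguity) / variance


def total_demand(premium: float, sigma_r: float, ambiguity: float,
                 shadow_hedging: float) -> float:
    """Total demand: the shadow hedging demand clamped to the bounds."""
    lower, upper = demand_bounds(premium, sigma_r, ambiguity)
    return max(lower, min(upper, shadow_hedging))


def myopic_demand(premium: float, sigma_r: float, ambiguity: float) -> float:
    """Myopic demand: the same clamp applied to a zero shadow hedging demand."""
    return total_demand(premium, sigma_r, ambiguity, 0.0)


def hedging_demand(premium: float, sigma_r: float, ambiguity: float,
                   shadow_hedging: float) -> float:
    """Realized hedging demand: total demand minus myopic demand."""
    return (total_demand(premium, sigma_r, ambiguity, shadow_hedging)
            - myopic_demand(premium, sigma_r, ambiguity))


SIGMA_R = 0.1428    # annual return volatility used in the calibration
AMBIGUITY = 0.01    # ambiguity in the equity premium used in the calibration

# The shadow hedging value -1.5 is a hypothetical input chosen for illustration.
print(myopic_demand(-0.01, SIGMA_R, AMBIGUITY))
print(total_demand(-0.01, SIGMA_R, AMBIGUITY, shadow_hedging=-1.5))
```

With an estimated premium inside the band $[-0.01, 0.01]$ the myopic investor stays out of the market, whereas a sufficiently negative shadow hedging demand pushes the total demand to the lower bound.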
[Figure 1: two panels plotting Stock Demand against Premium; curve labels $m^*$, $\bar{x}^*$, and $m^* + \bar{x}^*$; right-panel annotations "Hedging Demand" and "Contrarian Behavior".]

Figure 1: Optimal stock demand (fraction of wealth) as a function of the estimated instantaneous equity premium (annual, decimal). The investor has observed 20 years of data and now faces a 10-year investment horizon. $\beta = 0.03$, $\lambda = \infty$, $\alpha = 0.38$, and the estimated long-run equity premium $\bar{x}^*_t - r$ is fixed at 0.0458. Left plot: The solid line passing through the origin shows the Bayesian demand; the dashed line, the myopic demand; the dotted lines, the shadow hedging demands; finally, the thick solid line shows the total demand. Right plot: An analysis of the optimal stock demand.

The securities market model is calibrated based on Barberis (2000):$^{24}$
$$dR(t) = x(t)\, dt + 0.1428\, dw(t),$$
$$dx(t) = 0.2743(\bar{x} - x(t))\, dt - 0.0392\, dw(t) + 0.0361\, dv^{\bar{x},0}(t),$$
and $r = 0.0432$ (all numbers are annual). The investor has observed 20 years of data and now faces a 10-year investment horizon. $\beta = 0.03$, $\lambda = \infty$, and $\alpha = 0.38$. These parameters translate into an ambiguity in the equity premium of 0.01. Also, $\sigma_Z(20) = (0.007, 0.009)^\top$.

24 I annualized his monthly estimates (left panel of his Table II). His estimation is based on the monthly NYSE value-weighted returns as calculated by the CRSP, from June 1952 to December 1995. Barberis assumes that excess stock returns are predicted by the dividend-price ratio, whereas the predictive variables of the present model, $x$, are unobservable. Hence, I calibrated the SDE for $x$ so that the SDE for $m^{\bar{x},0}$ matches Barberis's estimation: $dm^{\bar{x},0}(t) = 0.2743(\bar{x} - m^{\bar{x},0}(t))\, dt - 0.0031\, d\bar{w}^{\bar{x},0}(t)$ where $-0.0031 = \lim_{t \to \infty} (\rho_w + \gamma(t)/\sigma_R)$. To be precise, Barberis finds, in accordance with other empirical works, excess stock returns and the dividend-price ratio to be highly negatively correlated ($-0.9351$), and I set $R$ and $m^{\bar{x},0}$ to be perfectly negatively correlated.

Figure 1 shows the corresponding optimal stock demand as a function of the estimated instantaneous equity premium, with the estimated long-run equity premium fixed at 0.0458 (Barberis's estimate). In the left plot, the solid line passing through the origin shows the Bayesian demand; the kinked dashed line, the myopic demand; the dotted lines, the $m^*$-, $\bar{x}^*$-, and total shadow hedging demands; and finally, the thick solid line shows the total demand. As observed analytically, the total demand is given by the shadow hedging demand if the latter is moderate compared to the magnitude of the ambiguity present; otherwise, the investor behaves as if she were a Bayesian investor whose estimate of the equity premium is $m^* - r - \sigma_R \bar{\xi}(t)$ or $m^* - r + \sigma_R \bar{\xi}(t)$.

The hedging demands are represented by a shaded region in the right plot. Note that the investor hedges for a range of estimated equity premia wider than dictated by the ambiguity in her estimate and the hedging demands are significant. For example, when the estimated equity premium is $-0.01$, the long-horizon investor facing a 10-year horizon sells short an amount of the stock worth about 100% of her wealth, whereas a myopic investor would take no position in the stock.

In Comparison with Epstein and Schneider (2007). To further analyze the optimal policy, it helps to contrast it with that of related models, and first I consider Epstein and Schneider (2007).

First, in Epstein and Schneider's model, a long-horizon multiple-priors investor still holds no stock when the estimated equity premium is zero. In Figure 1, on the other hand, $\pi^*$ is negative around zero estimated premium. This is due to the asymmetry in the dynamics of the estimated premium $m^* - r$. When the true premium is constant and known, a log investor's value function is quadratic in it.
Hence, in particular, it is symmetric at zero premium and is strictly increasing in the absolute value of the premium; that is, the investor is better off when the stock is (locally, in expected terms) more distinct from the bond. However, since in the present model $m^* - r$ is attracted to $\bar{x}^* - r$, the current value of which is positive, the value function rises in the right vicinity of zero estimated premium and rises more on the right than on the left because a negative $m^* - r$ will have to pass the minimum of the value function before reaching $\bar{x}^* - r$. Consequently, $\partial_{m^*} J(t, r, \bar{x}^*) > 0$. $\partial_{\bar{x}^*} J(t, r, \bar{x}^*) > 0$ for the obvious reason, and the negative demands around zero estimated premium follow.

When the desire to hedge is fully realized, it may give rise to contrarian behavior. Note from Figure 1 that when the estimated premium falls around $-0.02$, the investor exhibits contrarian behavior in the sense that she decreases her stock holdings as the estimated premium increases. In the absence of ambiguity (see, for example, Brendle (2006)), as the estimated premium improves, that is, as it moves in the direction of increasing the continuation utility, the marginal indirect utility of such an improvement strictly increases and so does the desire to hedge. The introduction of ambiguity does not fundamentally alter this structure because the density generators are bounded by $\bar{\xi}$ and $\bar{\xi}$ is independent of the estimated premium. Epstein and Schneider make a similar observation that their investor is contrarian in the sense that when the estimated premium is not unambiguously distinct from zero, she goes long for negative premia and short for positive premia. This restricted form of contrarian behavior results from the symmetric structure of their model.

What, then, exactly is the dependence of the stock demand on the estimated long-run equity premium?
The argument leading to $\partial_{m^*} J(t, r, \bar{x}^*) > 0$ suggests that if the current value of $\bar{x}^* - r$ is negative, $\partial_{m^*} J(t, r, \bar{x}^*) < 0$. This is indeed the case. See Figure 2; I changed the value of $\bar{x}^*_t - r$ from 0.0458 to 0 (left plot) and $-0.0458$ (right plot). When $\bar{x}^*_t - r = -0.0458$, both derivatives at zero instantaneous premium are negative and the corresponding hedging demand is positive.

[Figure 2: two panels plotting Stock Demand against Premium; curve labels $m^*$, $\bar{x}^*$, and $m^* + \bar{x}^*$.]

Figure 2: Optimal stock demand (fraction of wealth) as a function of the estimated instantaneous equity premium (annual, decimal). The parametrization is the same as Figure 1 except that the estimated long-run equity premium $\bar{x}^*_t - r$ is 0 for the left plot and $-0.0458$ for the right plot, as opposed to 0.0458.

It is possible to show, following the proof of Lemma 4.2(i), that $J(t, m^*, \bar{x}^*)$ is convex in $(m^*, \bar{x}^*)$ and hence in particular in $\bar{x}^*$. Accordingly, the desire to hedge against low continuation utility results in contrarian behavior with respect to the long-run premium. Compare the monotonic dependence of the demand on the long-run premium with its nonmonotonic dependence on the instantaneous premium. Such a distinction is absent in Epstein and Schneider's model because they consider constant (in the sense of indistinguishability) investment opportunities.

The assumption of constant investment opportunities also implies that in Epstein and Schneider's model, hedging demands disappear as time goes to infinity. In contrast, in the present model, the desire to hedge against adverse changes in the estimate of the instantaneous premium, that is, the $m^*$-shadow hedging demand, persists.

In Comparison with Miao (2009). Miao (2009) also considers the consumption/portfolio choice problem of a multiple-priors investor in continuous time who partially observes stochastic investment opportunities.
However, his notion of learning is fundamentally different from mine. To review Miao's model in the context of the present model, pick a theoretical prior $Q^{\bar{x},0}$, $\bar{x} \in \mathbb{R}$. A preferential prior $P^\xi$ is characterized by the filtered stock return dynamics
$$dR(t) = m^{\bar{x},0}(t)\, dt + \sigma_R \left( d\bar{w}^{\bar{x},0,\xi}(t) + \xi(t)\, dt \right), \quad |\xi(t)| \le \bar{\xi}$$
where $\bar{w}^{\bar{x},0,\xi} = \{\bar{w}^{\bar{x},0,\xi}(t), \mathcal{G}_t\}$ is a $P^\xi$-Wiener process. That is, (i) the "center" of the set of one-step-ahead conditionals is obtained by the standard Bayesian learning under $Q^{\bar{x},0}$ and (ii) after the Bayesian learning, there remains an exogenous and time-invariant ambiguity. Thus, in particular, learning and ambiguity do not interact. In contrast, in the present model, the innovation receives a larger weight when the current estimate $m^*_t$ is ambiguous, that is, when $\delta(t)$ is large.

[Figure 3: left panel "Markov Policy", Stock Demand against Premium, with curves for $\log \lambda = \infty$, $1$, $0$, and $-0.5$; right panel "Stock Demand at $m^* - r = 0$", Stock Demand against $\log \lambda$.]

Figure 3: Confidence and optimal stock demand. The investor has learned all that she can ($t \to \infty$) and now faces a 10-year investment horizon. $\beta = 0.03$ and $\bar{x}^*_\infty - r = 0.0458$. $\alpha$ varies as $\lambda$ does in such a way that the ambiguity in the instantaneous equity premium $\sqrt{2\alpha\delta(t = \infty; \lambda)}$ stays at 0.01. Left plot: Optimal stock demand (fraction of wealth) as a function of the estimated instantaneous equity premium (annual, decimal), for different levels of the investor's confidence $\lambda$ in the reference likelihood. Right plot: The same demand at $m^* - r = 0$ as a function of $\lambda$.

In fact, Miao's model is the limit of the present model as $t, \lambda, \alpha \to \infty$ with the restriction $\sqrt{2\alpha\delta(t; \lambda)}/\sigma_R = \bar{\xi}$. Note that $t \to \infty$ is consistent with the IID ambiguity; $\lambda \to \infty$, that is, full confidence in the reference likelihood, with the Bayesian learning; and $\alpha \to \infty$ with the multiple one-step-ahead conditionals despite the full confidence.
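The restriction $\sqrt{2\alpha\delta(t; \lambda)}/\sigma_R = \bar{\xi}$ ties the conservatism parameter $\alpha$ to the estimation ambiguity $\delta$: holding the ambiguity in the premium fixed means $\alpha$ must scale inversely with $\delta$. A minimal Python sketch of that bookkeeping follows. The function names and the $\delta$ values are hypothetical, chosen only for illustration; the paper does not report $\delta$ in this passage.

```python
import math


def ambiguity_in_premium(alpha: float, delta: float) -> float:
    """Ambiguity in the equity premium, sqrt(2 * alpha * delta)."""
    return math.sqrt(2.0 * alpha * delta)


def xi_bar(alpha: float, delta: float, sigma_r: float) -> float:
    """Bound on the density generator, sqrt(2 * alpha * delta) / sigma_R."""
    return ambiguity_in_premium(alpha, delta) / sigma_r


def alpha_for_target_ambiguity(target: float, delta: float) -> float:
    """Solve sqrt(2 * alpha * delta) = target for alpha (delta > 0)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return target ** 2 / (2.0 * delta)


# Hypothetical delta values; the target 0.01 is the ambiguity level
# held fixed in the numerical exercises.
for delta in (1e-4, 1e-5, 1e-6):
    alpha = alpha_for_target_ambiguity(0.01, delta)
    print(delta, alpha, ambiguity_in_premium(alpha, delta))
```

As $\delta$ shrinks, the implied $\alpha$ grows without bound while the ambiguity stays at the target, which mirrors the joint limit described above.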
In Figure 3, I plot the optimal stock demand corresponding to different levels of confidence $\lambda$. The investor has learned all that she can, meaning in particular that $\gamma$ and $\delta$ have converged (assume that $\bar{x}^*$, too, has converged), and now faces a 10-year investment horizon. $\beta = 0.03$ as before and $\bar{x}^*_\infty - r = 0.0458$. $\alpha$ varies as $\lambda$ does in such a way that the ambiguity in the instantaneous premium $\sqrt{2\alpha\delta(t = \infty; \lambda)}$ stays at 0.01.

The left plot shows the Markov policies. The solid black line (top) in particular corresponds to full confidence and hence to the Miao demand. Note that it is increasing everywhere, that is, there is no region of contrarian behavior. This is because stock returns are negatively correlated with the state variable $m^*$: $\sigma_{m^*}(\infty) = \rho_w + \gamma(\infty)/\sigma_R = -0.003$.

More importantly, the stock demand monotonically decreases as the investor loses confidence. See the right plot, which shows the stock demand at $m^* - r = 0$ as $\lambda$ varies. Intuitively, the estimation of the true premium is more difficult and unreliable for those investors who are less confident about their grasp of the environment; the consequent lack of confidence in the estimate combined with the (apparent) pessimism leads those investors, then, to try to transfer wealth even more to adverse states.

The effect of learning under ambiguity can be significant: the difference between Miao's prediction and mine can be as large as half of wealth, depending on the investor's confidence.

5 Asset Pricing

In this section, I examine the asset pricing implications of learning under ambiguity. Specifically, I consider simple Lucas economies populated by log agents who find dividend growth ambiguous. Section 5.1 describes the general setup and Section 5.2, the equilibrium.
I then make specializing assumptions to highlight the unclear relationship between the equity premium and the conditional variance of returns (Section 5.3), the declining trend in the equity premium (Section 5.4), and the nonmonotonic dependence of the equity premium on the precision of signals (Section 5.5).

5.1 General Setup

Consider an economy populated by a representative agent who finances her intertemporal consumption over a finite period of time $[0, T]$, $T \in (0, \infty)$, by trading two financial assets: one locally risk-free asset (bond) and one risky asset (stock). The bond is in zero net supply; the stock is a claim to an exogenous dividend stream. The consumption good is perishable and serves as the numeraire.

5.1.1 The Agent's Observation and Theories

Regarding how the endowments are generated, the agent entertains multiple theories. Specifically, the theories take the form of probability measures on a common measurable space. Let there be a measurable space $(\Omega, \mathcal{F})$, a set $\mathcal{Q}$ of probability measures (theoretical priors) on $(\Omega, \mathcal{F})$, and a filtration $\mathbb{F} = \{\mathcal{F}_t\}$ of $\mathcal{F}$. Under the theoretical prior $Q^{\bar{x},\eta} \in \mathcal{Q}$, $(\bar{x}, \eta) \in \mathbb{R} \times L^2([0, T], \mathbb{R})$, the dividend-rate process $D = \{D(t), \mathcal{F}_t\}$ is given by part of the solution to the system of SDEs
$$dD(t)/D(t) = (a_D(t, D, A) + b_D(t, D, A) x(t))\, dt + \sigma_D(t, D, A)\, dw(t), \tag{48}$$
$$dA(t) = (a_A(t, D, A) + b_A(t, D, A) x(t))\, dt + \sigma_A(t, D, A)\, dw(t), \tag{49}$$
$$dx(t) = \kappa(\bar{x} - x(t))\, dt + \rho_w\, dw(t) + \rho_v \left( dv^{\bar{x},\eta}(t) + \eta(t)\, dt \right). \tag{50}$$
Here, $A = \{A(t), \mathcal{F}_t\}$ is an $n_A$-dimensional process, $n_A \ge 0$, representing the macroeconomic variables in addition to the dividends themselves that affect, or are simply correlated with, the growth of dividends; the scalar process $x = \{x(t), \mathcal{F}_t\}$ tracks the evolution of the unobservable state of the economy that determines the expected growth rate; $w = \{w(t), \mathcal{F}_t\}$ and $v^{\bar{x},\eta} = \{v^{\bar{x},\eta}(t), \mathcal{F}_t\}$ are independent Wiener processes of dimension $1 + n_A$ and 1, respectively; $a_D$, $b_D$, $\sigma_D$, $a_A$, $b_A$, and $\sigma_A$ are nonanticipative path functionals from $[0, T] \times C([0, T], \mathbb{R}^{1+n_A})$ into $\mathbb{R}$, $\mathbb{R}$, $\mathbb{R}^{1 \times (1+n_A)}$, $\mathbb{R}^{n_A}$, $\mathbb{R}^{n_A}$, and $\mathbb{R}^{n_A \times (1+n_A)}$, respectively; and $\kappa \in (0, \infty)$, $\rho_w \in \mathbb{R}^{1 \times (1+n_A)}$, and $\rho_v \in \mathbb{R} \setminus \{0\}$.

The agent observes the dividends and the rest of the macroeconomic variables but not the expected growth rate. If we define $d$ by $dd = dD/D$, then $d$ and $A$ constitute the observable component and $x$ the unobservable component of the partially observable process of the previous sections.$^{25}$ Let thus $y \triangleq (d, A^\top)^\top$. The dynamics (48)-(49) of the observable component can be rewritten compactly as
$$dy(t) = (a(t, y) + b(t, y) x(t))\, dt + \sigma(t, y)\, dw(t) \tag{51}$$
where the definitions of $a$, $b$, and $\sigma$ are obvious.

25 $a_D$, $b_D$, and so on are functionals in $(D, A)$ rather than in $(d, A)$. But by Lemma 4.9 of Liptser and Shiryaev (1977), there exist ($\{\mathcal{B}_{t+}\}$-adapted) functionals $\tilde{a}_D$ and so on such that $a_D(t, D(\cdot, \omega), A(\cdot, \omega)) = \tilde{a}_D(t, d(\cdot, \omega), A(\cdot, \omega)), \cdots$, a.e. This substitution is implicit in the statement.

Now recall that all numbered assumptions are to stand throughout the paper from their statement on: Assumptions 2.1 to 2.4 and 3.1. Thus, by Assumption 2.1 in particular, the system of SDEs (50)-(51) has a unique strong solution; and consequently so does (48)-(50). Assume further the following for simplicity:

Assumption 5.1. $b_D > 0$ a.e.

5.1.2 Asset Prices

The rate of (net) return on the bond, or the interest rate, is denoted by $r(t)$ while the initial value of the bond is normalized to one. The price of the stock is denoted by $S(t)$. The processes $r$ and $S$ are adapted to the agent's observation filtration $\mathbb{G}$ generated by $y$. It is further assumed that $r$ is a.s. bounded; and that $S$ is a positive Itô process,$^{26}$ so that the return on the stock is well-defined by
$$dR(t) = \frac{dS(t) + D(t)\, dt}{S(t)}.$$
The interest rate and the stock price are determined endogenously.$^{27}$

26 An Itô process consists of a time integral and a stochastic integral with respect to a Wiener process where the integrand of the stochastic integral is required to be square-integrable with respect to time a.s. This definition remains unambiguous, notwithstanding the multiplicity of the probability measures under consideration ($P$ and $\{Q|_{\mathcal{G}_T} : Q \in \mathcal{Q}\}$), by virtue of the uniform boundedness of $\xi \in \Xi$ and $b$ (Proposition 2.9 and Assumption 2.1), the nondegeneracy condition on $\sigma$ (Assumption 2.2), and the fact that $m^{\bar{x},\eta}(t)$ and $m^{\bar{x}^*_t, \eta^*_t}(t)$, $0 \le t \le T$, are continuous processes.

27 As usual, the assumptions made here on the equilibrium prices are only "provisional" in that the equilibria thus found (see Proposition 5.1 below) are indeed consistent with them. I do not check, however, whether there are other equilibria.

5.1.3 Consumption Plans and Trading Strategies

To guarantee that $D$ belongs to $\mathcal{C}^2(u)$, I make the following assumption:

Assumption 5.2. The processes $a_D$ and $\sigma_D$ satisfy
$$E^{P^0} \int_0^T a_D(t)^2\, dt < \infty, \qquad \sup_{t \le T} E^{P^0} \exp\!\left( h |\sigma_D(t)|^2 \right) < \infty \text{ for some } h > 0. \tag{52}$$

Lemma 5.1. $D \in \mathcal{C}^2(u)$.

Now, a two-dimensional process $(\Pi^\circ, \Pi) = \{(\Pi^\circ(t), \Pi(t)), \mathcal{G}_t\}$ is a trading strategy, where $\Pi^\circ(t)$ is to be read as the amount invested in the bond and $\Pi(t)$ as that invested in the stock, if (i) it is progressive; (ii) the gains process
$$\int_0^t \Pi^\circ(s) r(s)\, ds + \int_0^t \Pi(s)\, dR(s), \quad 0 \le t \le T,$$
is well-defined and again an Itô process; (iii) the discounted wealth process
$$\exp\!\left( -\int_0^t r(s)\, ds \right) (\Pi^\circ(t) + \Pi(t)), \quad 0 \le t \le T,$$
is uniformly bounded below; and (iv) $\Pi^\circ(T) + \Pi(T) \ge 0$.
A trading strategy $(\Pi^\circ, \Pi)$ finances a consumption plan $c \in C^2(u)$ if

\[ \Pi^\circ(t) + \Pi(t) = \Pi^\circ(0) + \Pi(0) + \int_0^t \Pi^\circ(s)r(s)\,ds + \int_0^t \Pi(s)\,dR(s) - \int_0^t c(s)\,ds - C(t), \quad 0 \le t \le T, \]

for some nondecreasing process $C = \{C(t), \mathcal{G}_t\}$ with $C(0) = 0$.

5.1.4 The Agent's Problem

The agent has the Chen-Epstein recursive multiple-priors utility with log felicity

\[ \min_{\xi\in\Xi} E^{P^\xi}\!\int_0^T e^{-\beta t}\log(c(t))\,dt, \quad c \in C^2(u), \]

and maximizes it over the consumption plans that can be financed by some trading strategy with initial worth $S(0)$. The preferential priors $\{P^\xi : \xi \in \Xi\}$ are constructed from the theoretical priors described above, as in Section 2.

5.1.5 Equilibrium: Definition

An equilibrium is a pair of processes $(r, S)$ such that with $r$ as the interest rate process and $S$ as the stock price process, the optimal consumption plan of the agent equals the exogenous dividend stream $D$.

5.2 The Equilibrium

5.2.1 Asset Prices

Denote the set of minimizing density generators by28

\[ \Xi^* \triangleq \arg\min_{\xi\in\Xi} E^{P^\xi}\!\int_0^T e^{-\beta t}\log(D(t))\,dt. \]

Proposition 5.1. Equilibria, possibly multiple, exist. To each $\xi^* \in \Xi^*$ corresponds an equilibrium $(r^{\xi^*}, S^{\xi^*})$ where

\[ r^{\xi^*}(t) = \beta + a_D(t) + b_D(t)\,m^{\bar x_t^*,\eta_t^*}(t) - |\sigma_D(t)|^2 + \sigma_D(t)\xi^*(t), \qquad S^{\xi^*}(t) = \frac{1 - e^{-\beta(T-t)}}{\beta}\,D(t). \tag{53} \]

Note that $S^{\xi^*}$ is independent of $\xi^*$. In fact, it is independent of ambiguity; the equilibrium stock price identified in (53) is the same as that of the Lucas economy populated by unique-prior agents with log felicity.

A description of $\Xi^*$ is then in order. Since

\[ dD(t)/D(t) = \big(a_D(t) + b_D(t)\,m^{\bar x_t^*,\eta_t^*}(t) + \sigma_D(t)\xi(t)\big)\,dt + \sigma_D(t)\,d^{\xi}(t) \tag{54} \]

and $\sigma_D(t)\xi(t) = b_D(t)\Delta m$, $|\Delta m| \le \sqrt{2\alpha\delta(t)}$, a natural candidate is $\xi^{**}$ defined by

\[ \xi^{**}(t) \triangleq -\sigma(t)^{-1}b(t)\sqrt{2\alpha\delta(t)}, \quad 0 \le t \le T. \]

Proposition 5.2. Suppose $\sigma^{-1}b$ is constant. Then $\Xi^* = \{\xi^{**}\}$.

With arbitrary stochastic coefficients $a$, $b$, and $\sigma$, however, an explicit characterization of $\Xi^*$ seems difficult.
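As a side check on the price formula in (53), the following is a minimal sketch of the standard log-utility pricing argument. It is not the paper's proof; it assumes that in equilibrium $c = D$, that marginal utility at time $s$ is $e^{-\beta s}/D(s)$, and that prices are evaluated under some minimizing prior $P^{\xi^*}$.

```latex
% Sketch under the stated assumptions (not the paper's proof).
\begin{align*}
S(t) &= E^{P^{\xi^*}}\!\left[\int_t^T
          \frac{e^{-\beta s}/D(s)}{e^{-\beta t}/D(t)}\,D(s)\,ds
          \,\Big|\, \mathcal{G}_t\right] \\
     &= D(t)\int_t^T e^{-\beta (s-t)}\,ds \\
     &= \frac{1-e^{-\beta (T-t)}}{\beta}\,D(t).
\end{align*}
```

After the dividends cancel, the integrand is deterministic, so the conditional expectation does not depend on which prior is used. This is consistent with the remark that the stock price is independent of $\xi^*$.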
In what follows I will thus assume $\sigma^{-1}b$ is constant, with the exception of Section 5.3, and focus on the equilibrium associated with $\xi^{**}$:

Assumption 5.3. $\sigma^{-1}b$ is constant.

With $\xi^* = \xi^{**}$, the equilibrium interest rate identified in Proposition 5.1 becomes

\[ r(t) = \beta + a_D(t) + b_D(t)\,m^{\bar x_t^*,\eta_t^*}(t) - |\sigma_D(t)|^2 - b_D(t)\sqrt{2\alpha\delta(t)} \]

where the now-redundant superscript is dropped.

28 The set is nonempty by Theorem 2.2 of Chen and Epstein (2002).

5.2.2 Equity Premium

If the agent had to estimate the equity premium, she would do so under the most plausible theory $Q^{\bar x_t^*,\eta_t^*}$. So compute $\mu_R$ defined by $dR(t) = \mu_R(t)\,dt + \sigma_R(t)\,d(t)$. From (53),

\[ \mu_R(t) = \beta + a_D(t) + b_D(t)\,m^{\bar x_t^*,\eta_t^*}(t), \qquad \sigma_R(t) = \sigma_D(t). \]

The (estimated) equilibrium equity premium is given by

\[ \mu_R(t) - r(t) = \underbrace{\underbrace{|\sigma_D(t)|^2}_{\text{risk premium}} + \underbrace{b_D(t)\sqrt{2\alpha\delta(t)}}_{\text{ambiguity premium}}}_{\text{uncertainty premium}}, \]

which we see consists of two components. I refer to the equity premium also as the uncertainty premium; it is the reward for bearing uncertainty in the growth of the dividends of the asset the agent is constrained to hold:

\[ dD(t)/D(t)\,\big|\,\mathcal{G}_t \sim N\Big(\big(a_D(t) + b_D(t)\,m^{\bar x',\eta'}(t)\big)\,dt,\ |\sigma_D(t)|^2\,dt\Big), \quad (\bar x', \eta') \in \mathbb{R} \times L^2([0,T], \mathbb{R}). \]

Of the two components, the first is the risk premium from the C-CAPM (Breeden, 1979),

\[ |\sigma_D(t)|^2 = \mathrm{Cov}\big(dR(t),\, dD(t)/D(t)\,\big|\,\mathcal{G}_t\big), \]

and equals the unambiguous local variance of dividend growth, namely the risk in dividend growth. The second is the ambiguity premium (Chen and Epstein, 2002),

\[ b_D(t)\sqrt{2\alpha\delta(t)} = -\sigma_R(t)\,\xi^{**}(t), \]

and equals the ambiguity in dividend growth; or, more specifically, the ambiguity in the local mean of dividend growth as measured by the square root of the inverse of the curvature of the (induced) plausibility, or the "standard error," $b_D(t)\sqrt{\delta(t)}$ (times the personal cutoff $\sqrt{2\alpha}$).
If the agent could put full confidence in a particular theory, she would deduce a single conditional distribution for dividend growth with a definite mean, and accordingly worry only about the dispersion of dividend growth as above. The agent cannot, however, and demands commensurate compensation.

Remark 5.1. The presence of ambiguity and aversion to it alleviate both the equity premium puzzle and the risk-free rate puzzle (Chen and Epstein, 2002). In the present setup, the stock price is unaffected by ambiguity (Proposition 5.1), and so for the agent to continue to hold the ambiguous asset, the interest rate has to fall.

5.3 Equity Premium & Conditional Variance of Returns

Motivated by standard asset pricing models such as the I-CAPM of Merton (1973), numerous empirical papers have investigated the relationship between the equity premium and the conditional variance of returns. The findings are mixed: some report a positive relationship as expected; others a negative relationship; and still others an insignificant relationship. Consequently, various explanations and state variables have been suggested.29

In the present setup, the equity premium would equal the conditional return variance if there were no ambiguity, that is, if the agent knew the true probability measure; with ambiguity, it equals the conditional return variance plus the ambiguity premium. And the ambiguity premium is time-varying and does not have a deterministic relationship with the conditional return variance.
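The decomposition of the equity premium into the conditional return variance plus the ambiguity premium can be sketched numerically as follows. This is only an illustration of the formulas stated in Section 5.2.2; the function names and the value of `delta` used in the example are hypothetical and are not taken from the paper.

```python
import math


def equity_premium_components(sigma_D, b_D, alpha, delta):
    """Return (risk premium, ambiguity premium, total equity premium).

    sigma_D is the row vector sigma_D(t) given as a sequence of floats;
    delta is the conditional degree of ambiguity delta(t).
    """
    risk = sum(s * s for s in sigma_D)                # |sigma_D(t)|^2
    ambiguity = b_D * math.sqrt(2.0 * alpha * delta)  # b_D(t) * sqrt(2 alpha delta(t))
    return risk, ambiguity, risk + ambiguity


def expected_return(beta, a_D, b_D, m_star):
    """mu_R(t) = beta + a_D(t) + b_D(t) m*(t)."""
    return beta + a_D + b_D * m_star


def interest_rate(beta, a_D, b_D, m_star, sigma_D, alpha, delta):
    """r(t) under xi** : mu_R(t) minus both premium components."""
    risk, ambiguity, _ = equity_premium_components(sigma_D, b_D, alpha, delta)
    return expected_return(beta, a_D, b_D, m_star) - risk - ambiguity


# Hypothetical inputs for illustration only (delta is not a value from the paper).
risk, ambiguity, total = equity_premium_components((0.0249,), 1.0, 16.0, 1e-5)
```

With no ambiguity (`delta = 0`) the second component vanishes and the premium reduces to the conditional variance, as the text notes.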
To further investigate the relationship between the equity premium and the conditional variance of returns, I consider an economy where the conditional variance of dividend growth (and that of returns) is time-varying, following a Cox-Ingersoll-Ross process (Gennotte and Marsh, 1993):30

\begin{align*}
dD(t)/D(t) &= x(t)\,dt + \sqrt{A(t)}\,\big(\sqrt{1 - r_{DA}^2},\ r_{DA}\big)\,dw(t), \\
dA(t) &= \nu(\bar A - A(t))\,dt + \varsigma_A\sqrt{A(t)}\,(0, 1)\,dw(t), \\
dx(t) &= \kappa(\bar x - x(t))\,dt + (\rho_{w,1}, \rho_{w,2})\,dw(t) + \rho_v\,dv^{\bar x,0}(t),
\end{align*}

where (i) $\nu\bar A > \varsigma_A^2/2$ so that $A$ stays away from zero and (ii) $r_{DA} \in (-1, 1)$. This specification violates Assumption 2.2. Thus, to characterize $\xi^*$ I invoke a stochastic control argument instead.

Denote the state vector by

\[ Z = (D, A, m^*, \bar x^*, \gamma, \delta, I_{\bar x}^{-1}, \sigma_{\bar x^*}) \]

where, as before, $m^*_t \equiv m^{\bar x_t^*,\eta_t^*}(t)$. Define the value function by

\[ J(t, Z) \triangleq \min_{\xi} E^{P^\xi}\!\left[\int_t^T e^{-\beta s}\log(D(s))\,ds \,\Big|\, Z(t) = Z\right] \]

subject to the dynamics of $Z$.

Proposition 5.3. Assume $J \in C^{1,2} \cap C_p$. Then $\xi^* \in \Xi^*$ if and only if

\[ \xi^*(t) = \sigma(t)^{-1}b(t) \times
\begin{cases}
+\sqrt{2\alpha\delta(t)} & \text{if } (\partial_Z J(t, Z(t)))^\top \sigma_Z(Z(t))\,\sigma(t)^{-1}b(t) < 0, \\
-\sqrt{2\alpha\delta(t)} & \text{if } (\partial_Z J(t, Z(t)))^\top \sigma_Z(Z(t))\,\sigma(t)^{-1}b(t) > 0, \\
\text{indeterminate} & \text{if } (\partial_Z J(t, Z(t)))^\top \sigma_Z(Z(t))\,\sigma(t)^{-1}b(t) = 0,
\end{cases} \]

where "indeterminate" means "any number in $[-\sqrt{2\alpha\delta(t)}, \sqrt{2\alpha\delta(t)}]$."

29 See the introduction of Rossi and Timmermann (2015).

30 In Gennotte and Marsh (1993), the expected growth rate of dividends is constant and known.

$(\partial_Z J)^\top \sigma_Z \sigma^{-1} b$ is the marginal effect of an increase in the estimated expected growth rate on the expected change over the next instant in the agent's continuation utility. Hence, when it is positive (negative), the agent views the expected growth rate as overestimated (underestimated) and demands a high (low) equity premium. Alternatively, we can interpret the ambiguity premium also as coming from the shadow hedging demand (see Section 4.4).

Lemma 5.2.

\begin{align*}
(\partial_Z J)^\top \sigma_Z \sigma^{-1} b = {} & \frac{e^{-\beta t} - e^{-\beta T}}{\beta}
 + \int_t^T \frac{\rho_{w,1}\sqrt{A}\sqrt{1 - r_{DA}^2} + \gamma + \delta}{A(1 - r_{DA}^2)}\,
   \frac{1 - e^{-\kappa_x(s-t)}}{\kappa_x}\,e^{-\beta s}\,ds \\
 & + \int_t^T \frac{\kappa_x^{-1}\sigma_{\bar x^*}}{A(1 - r_{DA}^2)}\,
   \left((s - t) - \frac{1 - e^{-\kappa_x(s-t)}}{\kappa_x}\right) e^{-\beta s}\,ds.
\end{align*}

The first term and the third term of $(\partial_Z J)^\top \sigma_Z \sigma^{-1} b$ are positive for all $t < T$ and all $Z$ in its natural domain. On the other hand, if $\rho_{w,1} < 0$, the second term may be negative; and when $(\partial_Z J)^\top \sigma_Z \sigma^{-1} b$ changes sign as a result, the ambiguity premium will jump between $\pm\sqrt{2\alpha\delta(t)}$. Compare this with the constant (positive) sign of the ambiguity premium under the assumption of constant volatility (Proposition 5.2).

In Figure 4, I plot a simulated path of the conditional variance (common to both plots) and the corresponding variations in the ambiguity premium when the growth rates $dD/D$ and the changes $dx$ in the expected rate of growth are locally uncorrelated (left) and locally negatively correlated (right). Cases of positive local correlation are qualitatively similar to those of zero correlation and hence are not reported. The dividend dynamics are calibrated based on Bansal et al. (2012) while, differently from their model, a local correlation between $dD/D$ and $dx$ is allowed.31 The parameter values related to the agent's dispositions are set as follows: $\beta = 0.03$, $\lambda = 0.01$, and $\alpha = 16$. The choice for $\beta$ is within the standard range; those for $\lambda$ and $\alpha$ are essentially arbitrary.

When $\mathrm{Corr}(dD/D, dx) \ge 0$ (see the left plot), the ambiguity premium follows the conditional variance of returns with some lag. This is intuitive: ambiguity about the unobservable state of the economy and the premium for bearing it are high when signals about the unobservable state have been imprecise in the recent past. The lag, however, confirms that the ambiguity premium is not a function of the contemporaneous conditional variance of returns.
Rather, the ambiguity premium at a particular instant depends on the history of return variance, or, in other words, there is a kind of hysteresis. Compare, for example, the ambiguity premia around Year 10 and Year 15. Despite the comparable levels of return variance, the ambiguity premium is higher around Year 10 because the signal has been imprecise over the preceding couple of years, while it stays relatively precise afterwards up to Year 15.

31 I annualized the monthly values used for the dynamics of aggregate consumption (the second row of Bansal et al.'s Table I):

\begin{align*}
dD(t)/D(t) &= x(t)\,dt + \sqrt{A(t)}\,(1, 0)\,dw(t), \\
dA(t) &= 0.0120\big((0.0249)^2 - A(t)\big)\,dt + 3.889 \times 10^{-4}\sqrt{A(t)}\,(0, 1)\,dw(t), \\
dx(t) &= 0.3038(0.0180 - x(t))\,dt + \frac{9.478 \times 10^{-4}}{\sqrt{\rho_{w,1}^2 + \rho_v^2}}\,\big((\rho_{w,1}, 0)\,dw(t) + \rho_v\,dv^{\bar x,0}(t)\big).
\end{align*}

Figure 4: Conditional variance of returns and ambiguity premium. Thin lines (left vertical axes): Conditional variance of dividend growth = conditional variance of returns = risk premium = $|\sigma_D(t)|^2$. Thick lines (right vertical axes): Ambiguity premium $\sqrt{2\alpha\delta(t)}$. The local correlation between $dD/D$ and $dx$ is 0 (left) and −0.99 (right). The simulated path of the conditional variance is common to the two plots. $\lambda = 0.01$ and $\alpha = 16$. Initial years are discarded.

Hysteresis can be observed when $\mathrm{Corr}(dD/D, dx) < 0$ as well; see the right plot and compare, again, Years 10 and 15. More noteworthy in this case, however, is that the ambiguity premium comoves negatively with the conditional variance of returns. That is, the ambiguity premium, or more fundamentally, market ambiguity itself, is high when the traditional measure of market uncertainty is low, and low when the traditional measure is high. The reason is as discussed in Section 3.2. Suppose, for example, that recently dividends have been growing steadily, at a rate faster than expected. The old estimate of the expected growth rate is likely an underestimate, and, given the reliability of the signal, it is to be revised upward.
But at the same time, the faster-than-expected growth is indicative also of positive shocks to the growth rate and in turn of negative shocks to the expected growth rate, and in this respect the estimate is to be revised downward. It is not clear whether the estimate of the expected growth rate should be revised upward or downward, and uncertainty (the estimation ambiguity) rises.

The path dependence and the possibility of negative comovement render unclear the relationship of the ambiguity premium (and ultimately, of the total equity premium) with the conditional variance of returns. In particular, if dividend growth and expected dividend growth are indeed locally negatively correlated and the ambiguity premium is of a comparable magnitude to the risk premium (low confidence $\lambda$ or high conservatism $\alpha$ or both), then the low-frequency correlation between the equity premium and the conditional variance of returns would be small.

5.4 Declining Trend in the Equity Premium

We observed in Section 3 that learning about ambiguous yet time-invariant factors of the data-generating mechanism generates a decreasing trend in the conditional ambiguity. We would thus expect the reward for bearing the ambiguity, too, to decrease over time; and if it does, the interest rate is to rise concurrently (Remark 5.1). These trends have indeed been documented over the post-WWII period by such authors as Merton (1980), Blanchard (1993), and Jagannathan et al. (2000). Blanchard, for example, finds that the U.S. equity premium, which was higher than 10 percent in the late 1940s, dropped to 2 to 3 percent toward the early 1990s as the interest rate rose from negative to positive values.

For a demonstration, the following minimal specification suffices, where dividends grow with constant volatility and there are no additional macroeconomic variables to consider other than the dividends:

\begin{align*}
dD(t)/D(t) &= x(t)\,dt + \sigma_D\,dw(t), \\
dx(t) &= \kappa(\bar x - x(t))\,dt + \rho_w\,dw(t) + \rho_v\,dv^{\bar x,0}(t).
\end{align*}
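The minimal specification above can be simulated with a simple Euler-type scheme. The sketch below is illustrative only: the function name, step size, and seeding are assumptions of this example, and the default parameter values are the annualized calibration reported in the paper's footnote 32 ($\sigma_D = 0.0249$, $\kappa = 0.3038$, $\bar x = 0.0180$, $\rho_w = 0$, $\rho_v = 9.478 \times 10^{-4}$). It simulates the dividend and hidden-state paths only, not the filter or the ambiguity premium.

```python
import math
import random


def simulate_minimal_spec(T=50.0, dt=1.0 / 252, D0=1.0, x0=0.0180,
                          sigma_D=0.0249, kappa=0.3038, x_bar=0.0180,
                          rho_w=0.0, rho_v=9.478e-4, seed=0):
    """Simulate (D, x) on [0, T] for the constant-volatility specification.

    D is advanced in logs (exact for a frozen drift over one step), which
    keeps the dividend positive; x uses a plain Euler-Maruyama step.
    """
    rng = random.Random(seed)          # hypothetical seeding choice
    n_steps = int(round(T / dt))
    sqrt_dt = math.sqrt(dt)
    D, x = [D0], [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, 1.0) * sqrt_dt   # increment of w
        dv = rng.gauss(0.0, 1.0) * sqrt_dt   # increment of v, independent of w
        log_D_next = math.log(D[-1]) + (x[-1] - 0.5 * sigma_D ** 2) * dt + sigma_D * dw
        x_next = x[-1] + kappa * (x_bar - x[-1]) * dt + rho_w * dw + rho_v * dv
        D.append(math.exp(log_D_next))
        x.append(x_next)
    return D, x


D_path, x_path = simulate_minimal_spec(T=1.0)
```

A daily step is used here purely for convenience; any step small relative to $1/\kappa$ would serve the same illustrative purpose.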
In this case, the risk premium is constant at $\sigma_D^2$ while the ambiguity premium $\sqrt{2\alpha\delta(t)}$ deterministically decreases over time to its limit value

\[ \sqrt{2\alpha}\,\sigma_D\left[\sqrt{(\kappa + \rho_w/\sigma_D)^2 + (1 + \lambda^{-1})(\rho_v/\sigma_D)^2} - \sqrt{(\kappa + \rho_w/\sigma_D)^2 + (\rho_v/\sigma_D)^2}\,\right]^{1/2} \]

(Proposition 3.2). The agent's conservatism in model selection $\alpha$ affects both the limit value of the ambiguity premium and the rate at which it falls. Note also that if the agent in fact has full confidence in mean reversion ($\lambda = \infty$), the equity premium will eventually reach the level that standard models predict.

The left panel of Figure 5 shows a simulated path for the equity premium (thick line) and a corresponding one for the interest rate (thin line). The dividend dynamics are calibrated again based on Bansal et al. (2012),32 while $\beta = 0.03$, $\lambda = \infty$, and $\alpha = 16$. $\beta$ and $\alpha$ are the same as before; but the value of 16 for $\alpha$, together with the full confidence, is in fact chosen to match (very roughly) the observed trends (see Figures 10 and 11 of Blanchard, 1993).

Given in the right panel of Figure 5 for comparison are the equity premium and the interest rate under the prevailing form of ambiguity, namely IID ambiguity, based on the same realization of the economy. To be precise, by an IID ambiguity model I mean a version of Miao (2009): at each time-state, the agent Bayes-updates the reference and true probability measure $Q^{\bar x,0}$, after which there still remains some exogenous ambiguity in the form of IID ambiguity. Under the IID ambiguity model, the equity premium is constant at $\sigma_D^2 + \sigma_D\bar\xi_{IID}$ where $\bar\xi_{IID} \ge 0$ is the bound on density generators. Thus, to generate time variations in the equity premium, one needs stochastic volatility; but even then, it is still difficult to generate a trend.

32 Specifically,

\begin{align*}
dD(t)/D(t) &= x(t)\,dt + 0.0249\,dw(t), \\
dx(t) &= 0.3038(0.0180 - x(t))\,dt + 9.478 \times 10^{-4}\,dv^{\bar x,0}(t).
\end{align*}

[Figure 5. Left panel: Present Model. Right panel: IID Ambiguity. Horizontal axes: Year.]

Figure 5: Declining equity premium and rising interest rate. The thick lines show the equity premium and the thin lines the interest rate (annual and decimal). The left plot shows these rates predicted by the present model, and the right plot those by an IID ambiguity model à la Miao (2009), from a common realization of the economy. Under the IID ambiguity model, the equity premium is constant at $\sigma_D^2 + \sigma_D\bar\xi_{IID}$, $\bar\xi_{IID} \ge 0$. I set the value of $\bar\xi_{IID}$ so that the resulting constant premium equals the mean of the premia in the left plot. The first 5 years are discarded.

The point here is not that the present model can more or less quantitatively match the observed trends in the equity premium and the interest rate, but only that it can qualitatively reproduce them; the present specification is too simplistic for a quantitative analysis. Furthermore, there are other explanations for the trends, such as increased portfolio diversification (Heaton and Lucas, 2000) and a fall in macroeconomic volatility (Lettau et al., 2008); and Pástor and Stambaugh (2001) find that prior to WWII, the equity premium had been on an increasing trend for a century. Thus, the claim is rather that learning contributes to a decline in the equity premium, through a resolution of ambiguity.

5.5 Equity Premium & Signal Precision

In a Bayesian framework, Veronesi (2000) observed that higher precision of signals tends to increase the risk premium. Now that we have an endogenous characterization of the ambiguity premium, it is of interest to see if such a counterintuitive relation holds for the ambiguity premium as well.

A minimal specification that will suffice for the purpose is as follows:

\begin{align*}
dD(t)/D(t) &= x(t)\,dt + \sigma_D\big(\sqrt{1 - r_{DA}^2},\ r_{DA}\big)\,dw(t), \\
dA(t) &= x(t)\,dt + \sigma_A(0, 1)\,dw(t), \\
dx(t) &= \kappa(\bar x - x(t))\,dt + \rho_v\,dv^{\bar x,0}(t),
\end{align*}

where $\sigma_D, \sigma_A > 0$ without loss of generality and $r_{DA} \in (-1, 1)$.
In words, dividends grow with constant volatility, and there is a signal $A$ about the expected growth rate $x$ (Detemple, 1986; Veronesi, 2000). (The dividend stream $D$, too, is a signal, but I reserve the term signal for $A$, which is to represent news circulating in the economy other than the dividends themselves. I will use the term dividend-signal when referring to $D$ as a signal.) Dividend growth and expected dividend growth are, for simplicity, assumed to be locally uncorrelated; but the signal and the dividend-signal are allowed to be locally correlated with correlation $r_{DA}$.

With the constant volatility of the observable processes (and the constant $b$), the ambiguity premium converges to a constant (Proposition 3.2), and I will focus on this limit value. Define the precision of the signal by $h_A \triangleq 1/\sigma_A$.

Proposition 5.4. Suppose $r_{DA} > 0$. Then the ambiguity premium as a function of the signal precision $h_A$ is hump-shaped: it is strictly increasing on $[0, r_{DA}h_D]$ and strictly decreasing on $(r_{DA}h_D, \infty)$.

To understand the result, let $h_D \triangleq 1/\sigma_D$ and

\[ h_{\mathrm{eff}} = h_{\mathrm{eff}}(h_A) \triangleq \sqrt{\frac{h_D^2 - 2h_Dh_Ar_{DA} + h_A^2}{1 - r_{DA}^2}}, \quad 0 \le h_A < \infty. \]

$h_{\mathrm{eff}}$ denotes the effective precision of the signal and the dividend-signal en masse; $\sigma_D$ and $\sigma_A$ enter the filtering equations only via $|\sigma^{-1}b|^2 = h_{\mathrm{eff}}^2$. Then, in terms of the effective precision the ambiguity premium is given by

\[ \sqrt{2\alpha\delta(\infty)} = \sqrt{2\alpha h_{\mathrm{eff}}^{-2}}\left[\sqrt{\kappa^2 + (1 + \lambda^{-1})(\rho_v h_{\mathrm{eff}})^2} - \sqrt{\kappa^2 + (\rho_v h_{\mathrm{eff}})^2}\,\right]^{1/2}. \]

As can be easily checked, the relationship between the ambiguity premium and the effective precision is intuitive: the higher the effective precision, the lower the ambiguity premium. What appears to be counterintuitive is that, when the signal and the dividend-signal are locally positively correlated, for low levels of the signal precision an increase in the signal precision decreases the collective informativeness of the signal and the dividend-signal.
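The two formulas above can be checked numerically. The following is a minimal sketch, not the paper's code: the function names are hypothetical, the parameter values are the calibration quoted in this section ($\sigma_D = 0.0249$, $\kappa = 0.3038$, $\rho_v = 9.478 \times 10^{-4}$, $\lambda = 0.01$, $\alpha = 16$), and $r_{DA} = 0.9$ is chosen only as an example of positive local correlation.

```python
import math


def effective_precision(h_A, h_D, r_DA):
    """h_eff(h_A) = sqrt((h_D^2 - 2 h_D h_A r_DA + h_A^2) / (1 - r_DA^2))."""
    return math.sqrt((h_D ** 2 - 2.0 * h_D * h_A * r_DA + h_A ** 2)
                     / (1.0 - r_DA ** 2))


def limit_ambiguity_premium(h_eff, kappa, rho_v, lam, alpha):
    """sqrt(2 alpha delta(inf)) written in terms of the effective precision."""
    inner = (math.sqrt(kappa ** 2 + (1.0 + 1.0 / lam) * (rho_v * h_eff) ** 2)
             - math.sqrt(kappa ** 2 + (rho_v * h_eff) ** 2))
    return math.sqrt(2.0 * alpha * inner) / h_eff


def premium_from_signal(h_A, h_D=1.0 / 0.0249, r_DA=0.9,
                        kappa=0.3038, rho_v=9.478e-4, lam=0.01, alpha=16.0):
    """Limiting ambiguity premium as a function of the signal precision h_A."""
    h_eff = effective_precision(h_A, h_D, r_DA)
    return limit_ambiguity_premium(h_eff, kappa, rho_v, lam, alpha)


h_D = 1.0 / 0.0249
peak = 0.9 * h_D   # h_A = r_DA * h_D, where h_eff is smallest
premia = [premium_from_signal(h) for h in (0.0, peak, 3.0 * h_D)]
```

Evaluating `premium_from_signal` on a grid of `h_A` values gives a curve that rises up to `h_A = r_DA * h_D` and falls afterwards, which is the hump shape stated in Proposition 5.4.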
See Figure 6, in which I plot the asymptotic level of the equity premium, namely $\sigma_D^2 + \sqrt{2\alpha\delta(\infty)}$ (left), and the effective precision (right), as functions of the signal precision.

The reason is in fact simple: an extremely noisy signal still helps infer the hidden state if it is correlated with another signal, by revealing the common noise. It is thus a signal with moderate precision, provided it is locally positively correlated with the dividend-signal, that adds least to the dividend-signal, because not only is it then revealing neither the hidden state nor the common noise, but it is similar to the dividend-signal, that is, redundant. Note that indeed $h_{\mathrm{eff}}$ attains its minimum of $h_D$ when $h_A = r_{DA}h_D$. When $r_{DA} < 0$, the ambiguity premium is strictly decreasing in $h_A$ everywhere because the signal is then never redundant.

[Figure 6. Left panel: $\mu_R - r$ against $h_A$ for $r_{DA} = 0.99$, $0.9$, and $0.1$. Right panel: $h_{\mathrm{eff}}$ ($r_{DA} = 0.99$) and $h_D$ against $h_A$.]

Figure 6: Equity premium and signal precision. Left plot: The asymptotic level of the equity premium (annual, decimal) as a function of the signal precision, for different levels of local correlation between the signal and the dividend-signal. Right plot: The effective precision as a function of the signal precision. The dividend dynamics are calibrated as in the previous subsection. $\lambda = 0.01$ and $\alpha = 16$.

Thus, the ambiguity premium, too, can increase in response to an improvement in the quality of news. What distinguishes this observation from Veronesi's (2000) is that his result relies on the representative agent's being sufficiently risk-averse, more so than log agents; in his model, a deterioration in the quality of news decreases the equity premium because the agent's hedging demand tones down the covariation between returns and consumption growth.
In contrast, the present paper shows that the equity premium can exhibit such counterintuitive behavior even under unit risk aversion, which is conventionally associated with myopia.33

33 Multiple-priors log investors are nonmyopic; see Epstein and Schneider (2007) and Hernández-Hernández and Schied (2007a).

A Proofs

Proof of Proposition 2.1. The local version of the Itô existence-uniqueness result. See Rogers and Williams (1994), Theorem V.12.1.

Proof of Proposition 2.2. Under Assumptions 2.1 and 2.2, the following theorems in Liptser and Shiryaev (1977) hold: (i) follows from Theorem 12.6; (ii), from Theorem 12.7; and (iii), from a multidimensional adaptation of Theorems 7.17 and 12.5.

Proof of Lemma 2.1. Let $f(t) = e^{\kappa t}\gamma(t)e^{\kappa^\top t}$. Then

\[ \dot f(t) = e^{\kappa t}\big[\rho_w\rho_w^\top + \rho_v\rho_v^\top - (\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1}(\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)^\top\big]e^{\kappa^\top t}. \]

Since $(\rho_w\sigma^\top + \gamma b^\top)(\sigma\sigma^\top)^{-1}(\rho_w\sigma^\top + \gamma b^\top)^\top$ is symmetric and positive semidefinite,

\[ \operatorname{tr} f(t) \le \operatorname{tr} f(0) + \int_0^t \operatorname{tr}\big(e^{\kappa s}(\rho_w\rho_w^\top + \rho_v\rho_v^\top)e^{\kappa^\top s}\big)\,ds. \]

It follows that the sum of the variances is bounded: $\sup_{t\le T}\sum_i \gamma_{ii}(t) < \infty$. Since covariances are bounded by variances, the claim follows.

Proof of Proposition 2.3.34 Fix $(\bar x, \eta) \in \mathbb{R}^{n_x} \times L^2([0,T], \mathbb{R}^{n_y})$. Let $\bar Q^{\bar x,\eta}$ be the measure under which $j$ and $v^{\bar x,\eta}$ are independent Wiener processes where $j$ is defined by

\[ dj(t) = \sigma(t)^{-1}\,dy(t), \quad j(0) = 0. \]

Then

\[ \frac{dQ^{\bar x,\eta}}{d\bar Q^{\bar x,\eta}} = \Lambda(T), \qquad \Lambda(t) \triangleq \exp\!\left(\int_0^t [\sigma(s)^{-1}(a(s) + b(s)x(s))]^\top dj(s) - \frac{1}{2}\int_0^t |\sigma(s)^{-1}(a(s) + b(s)x(s))|^2\,ds\right). \]

Let $\psi^{\bar x,\eta}(t,\cdot)$ denote the unnormalized density of $x(t)$ given $\mathcal{G}_t$ under $Q^{\bar x,\eta}$, defined by

\[ E^{\bar Q^{\bar x,\eta}}[\Lambda(t)f(x(t))\,|\,\mathcal{G}_t] = \int_X f(x)\psi^{\bar x,\eta}(t,x)\,dx \]

where $X \triangleq \mathbb{R}^{n_x}$, $f$ denotes an arbitrary test function, and

\[ \int_X dx \equiv \int_{\mathbb{R}}\cdots\int_{\mathbb{R}} dx_1\cdots dx_{n_x}. \]

Since $(y, x)$ is conditionally Gaussian,

\[ \psi^{\bar x,\eta}(t,x) = \exp\!\left(u^{\bar x,\eta}(t) - \frac{1}{2}(x - m^{\bar x,\eta}(t))^\top\gamma(t)^{-1}(x - m^{\bar x,\eta}(t))\right) \tag{55} \]

where $u^{\bar x,\eta}(t)$ is independent of $x$.
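Now use Bayes' rule to see

\[ \ell_T(\bar x,\eta) = \log E^{\bar Q^{\bar x,\eta}}\!\left[\frac{dQ^{\bar x,\eta}}{d\bar Q^{\bar x,\eta}}\,\Big|\,\mathcal{G}_T\right] - \log E^{\bar Q^{0,0}}\!\left[\frac{dQ^{0,0}}{d\bar Q^{0,0}}\,\Big|\,\mathcal{G}_T\right] + \log E^{\bar Q^{0,0}}\!\left[\frac{d\bar Q^{\bar x,\eta}}{d\bar Q^{0,0}}\,\Big|\,\mathcal{G}_T\right] \]

but the last term is 0 because under $\bar Q^{0,0}$, $v^{0,0}$ and $j$ are independent. Thus

\begin{align*}
\ell_T(\bar x,\eta) &= \log\int_X \psi^{\bar x,\eta}(T,x)\,dx - \log\int_X \psi^{0,0}(T,x)\,dx \\
&= u^{\bar x,\eta}(T) - u^{0,0}(T)
\end{align*}

and all boils down to computing $u^{\bar x,\eta}(T)$. To compute it, I compare the $\psi^{\bar x,\eta}$ given in (55) with that as the solution to the Zakai equation:

34 I thank Domenico Cuoco for this direct proof. Alternatively, we can differentiate (12) under the integral sign and reconstruct the log-likelihood function.

Lemma A.1. $\psi^{\bar x,\eta}$ satisfies

\begin{align*}
d\psi^{\bar x,\eta}(t,x) = {} & \psi^{\bar x,\eta}(t,x)[\sigma(t)^{-1}(a(t) + b(t)x)]^\top dj(t) - \operatorname{div}[(\kappa(\bar x - x) + \rho_v\eta(t))\psi^{\bar x,\eta}(t,x)]\,dt \\
& - \partial_x\psi^{\bar x,\eta}(t,x)^\top\rho_w\,dj(t) + \frac{1}{2}\operatorname{tr}[\partial_x^2\psi^{\bar x,\eta}(t,x)(\rho_w\rho_w^\top + \rho_v\rho_v^\top)]\,dt \tag{56}
\end{align*}

with initial condition $\psi^{\bar x,\eta}(0,\cdot) \sim N(m_0, \gamma_0)$.

Proof. The derivation is standard; see, for example, Elliott and Krishnamurthy (1997). First, differentiate $\Lambda(t)f(x(t))$ and then re-integrate the resulting expression:

\begin{align*}
\Lambda(t)f(x(t)) = {} & \Lambda(0)f(x(0)) + \int_0^t \Lambda(s)f(x(s))[\sigma(s)^{-1}(a(s) + b(s)x(s))]^\top dj(s) \\
& + \int_0^t \Lambda(s)\,\partial f(x(s))^\top\big\{[\kappa(\bar x - x(s)) + \rho_v\eta(s)]\,ds + \rho_w\,dj(s) + \rho_v\,dv^{\bar x,\eta}(s)\big\} \\
& + \frac{1}{2}\int_0^t \Lambda(s)\operatorname{tr}[\partial^2 f(x(s))(\rho_w\rho_w^\top + \rho_v\rho_v^\top)]\,ds.
\end{align*}

Then take the conditional expectation under $\bar Q^{\bar x,\eta}$ given $\mathcal{G}_t$:

\begin{align*}
\int_X f(x)\psi^{\bar x,\eta}(t,x)\,dx = {} & \int_X f(x)\psi^{\bar x,\eta}(0,x)\,dx \\
& + \int_0^t\!\int_X f(x)[\sigma(s)^{-1}(a(s) + b(s)x)]^\top\psi^{\bar x,\eta}(s,x)\,dx\,dj(s) \\
& + \int_0^t\!\int_X \partial f(x)^\top[\kappa(\bar x - x) + \rho_v\eta(s)]\psi^{\bar x,\eta}(s,x)\,dx\,ds \\
& + \int_0^t\!\int_X \partial f(x)^\top\rho_w\,\psi^{\bar x,\eta}(s,x)\,dx\,dj(s) \\
& + \frac{1}{2}\int_0^t\!\int_X \operatorname{tr}[\partial^2 f(x)(\rho_w\rho_w^\top + \rho_v\rho_v^\top)]\psi^{\bar x,\eta}(s,x)\,dx\,ds.
\end{align*}

For the change in the order of the conditional expectation and the stochastic integral with respect to $j$, see Liptser and Shiryaev (1977), Theorem 5.14. Now, integration by parts with respect to $x$ completes the derivation.

(Proof of the proposition continued.)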
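From (55),

\begin{align*}
d\log\psi^{\bar x,\eta}(t,x) = {} & du^{\bar x,\eta}(t) + (x - m^{\bar x,\eta}(t))^\top\gamma(t)^{-1}\,dm^{\bar x,\eta}(t) \\
& + \frac{1}{2}(x - m^{\bar x,\eta}(t))^\top\gamma(t)^{-1}\dot\gamma(t)\gamma(t)^{-1}(x - m^{\bar x,\eta}(t))\,dt \\
& - \frac{1}{2}\operatorname{tr}[\gamma(t)^{-1}(\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1}(\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)^\top]\,dt.
\end{align*}

On the other hand, computing the spatial derivatives of $\psi^{\bar x,\eta}$ using (55) and plugging them into (56), we obtain another expression for $d\psi^{\bar x,\eta}(t,x)$:

\begin{align*}
d\psi^{\bar x,\eta}(t,x)/\psi^{\bar x,\eta}(t,x) = {} & [\sigma(t)^{-1}(a(t) + b(t)x) + \rho_w^\top\gamma(t)^{-1}(x - m^{\bar x,\eta}(t))]^\top dj(t) \\
& + [(x - m^{\bar x,\eta}(t))^\top\gamma(t)^{-1}(\kappa(\bar x - x) + \rho_v\eta(t)) + \operatorname{tr}\kappa]\,dt \\
& + \frac{1}{2}(x - m^{\bar x,\eta}(t))^\top\gamma(t)^{-1}(\rho_w\rho_w^\top + \rho_v\rho_v^\top)\gamma(t)^{-1}(x - m^{\bar x,\eta}(t))\,dt \\
& - \frac{1}{2}\operatorname{tr}(\gamma(t)^{-1}(\rho_w\rho_w^\top + \rho_v\rho_v^\top))\,dt.
\end{align*}

Then

\begin{align*}
d\log\psi^{\bar x,\eta}(t,x) = {} & \frac{1}{\psi}\,d\psi + \frac{1}{2}\left(-\frac{1}{\psi^2}\right)(d\psi)^2 \\
= {} & [\sigma(t)^{-1}(a(t) + b(t)x) + \rho_w^\top\gamma(t)^{-1}(x - m^{\bar x,\eta}(t))]^\top dj(t) \\
& + [(x - m^{\bar x,\eta}(t))^\top\gamma(t)^{-1}(\kappa(\bar x - x) + \rho_v\eta(t)) + \operatorname{tr}\kappa]\,dt \\
& + \frac{1}{2}(x - m^{\bar x,\eta}(t))^\top\gamma(t)^{-1}(\rho_w\rho_w^\top + \rho_v\rho_v^\top)\gamma(t)^{-1}(x - m^{\bar x,\eta}(t))\,dt \\
& - \frac{1}{2}\operatorname{tr}(\gamma(t)^{-1}(\rho_w\rho_w^\top + \rho_v\rho_v^\top))\,dt \\
& - \frac{1}{2}|\sigma(t)^{-1}(a(t) + b(t)x) + \rho_w^\top\gamma(t)^{-1}(x - m^{\bar x,\eta}(t))|^2\,dt.
\end{align*}

Equate the two expressions of $d\log\psi^{\bar x,\eta}(t,x)$ to see

\begin{align*}
du^{\bar x,\eta}(t) = {} & -\frac{1}{2}\operatorname{tr}(\gamma(t)^{-1}\dot\gamma(t))\,dt + (a(t) + b(t)m^{\bar x,\eta}(t))^\top(\sigma(t)\sigma(t)^\top)^{-1}\,dy(t) \\
& - \frac{1}{2}(a(t) + b(t)m^{\bar x,\eta}(t))^\top(\sigma(t)\sigma(t)^\top)^{-1}(a(t) + b(t)m^{\bar x,\eta}(t))\,dt.
\end{align*}

Finally, note that $u^{\bar x,\eta}(0) = u^{0,0}(0)$.

Proof of Lemma 2.2. Let $\varepsilon > 0$ and observe

\begin{align*}
\ell_t(\bar x,\eta + \varepsilon h) - \ell_t(\bar x,\eta) = {} & \varepsilon\int_0^t\left(\int_0^s \phi(\tau)^{-1}\rho_v h(\tau)\,d\tau\right)^{\!\top}\phi(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}\,dy(s) \\
& - \varepsilon\int_0^t (a(s) + b(s)m^{\bar x,\eta}(s))^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\phi(s)\int_0^s \phi(\tau)^{-1}\rho_v h(\tau)\,d\tau\,ds + O(\varepsilon^2). \tag{57}
\end{align*}

The first term can be rewritten, by integration by parts, as

\begin{align*}
& \varepsilon\left(\int_0^t \phi(\tau)^{-1}\rho_v h(\tau)\,d\tau\right)^{\!\top}\int_0^t \phi(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}\,dy(s) \\
& \quad - \varepsilon\int_0^t (\phi(s)^{-1}\rho_v h(s))^\top\int_0^s \phi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}\,dy(\tau)\,ds \\
& = \varepsilon\int_0^t\left(\int_s^t \phi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}\,dy(\tau)\right)^{\!\top}\phi(s)^{-1}\rho_v h(s)\,ds \tag{58}
\end{align*}

and the second term, by changing the order of integration, as

\[ \varepsilon\int_0^t\left(\int_s^t \phi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}(a(\tau) + b(\tau)m^{\bar x,\eta}(\tau))\,d\tau\right)^{\!\top}\phi(s)^{-1}\rho_v h(s)\,ds. \]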
(59) s Now, plug (58) and (59) into (57), differentiate it with respect to ε, and set ε = 0. Proof of Lemma 2.3. Since ψ11 (0) = Inx , ψ11 (t) is by continuity invertible up to a random time τ11 ∈ (0, T ]. Up to τ11 , ψ21 (t)ψ11 (t)−1 ρv ρ> v satisfies > f˙(t) = λ−1 ρv ρ> v − κ̄(t)f (t) − f (t)κ̄(t) − f (t)b(t)> (σ(t)σ(t)> )−1 b(t)f (t), f (0) = 0. (60) (60) has a unique solution: suppose p and q solve (60), let ∆(t) = p(t) − q(t), and ˙ observe ∆(0) = ∆(0) = 0. Thus, ψ21 (t)ψ11 (t)−1 ρv ρ> v is symmetric up to τ11 . Consider the following hypothetical partially observable system dy(t) = b(t)x(t) dt + σ(t) dw(t), dx(t) = −κ̄(t)x(t) dt + λ−1/2 ρv dv(t), x(0) ∼ N (m0 , 0), with the understanding κ̄(t) = κ̄(t, y). By Liptser and Shiryaev (1977), Theorem 12.7, the conditional variance of x(t) satisfies (60) and stays positive definite for t > 0. (The assumptions of the theorem are satisfied; in particular, κ̄ is uniformly bounded by Lemma 2.1.) Hence, ψ21 (t)ψ11 (t)−1 ρv ρ> v is positive definite and consequently invertible up to τ11 . Since ψ21 (0) = 0 and ψ̇21 (0) = λ−1 In , ψ21 (t), t > 0, too, is invertible up to a random time τ21 ∈ (0, T ]. By the last paragraph, τ21 ≥ τ11 . Suppose τ11 < T . There are two cases to consider. First, τ11 = τ21 . This contradicts the invertibility of ψ. Second, τ11 < τ21 . Then ψ11 (t)−1 will explode as t ↑ τ11 , which is impossible because ψ21 (t) is invertible. To be concrete, let g be the solution of > ġ(t) = g(t)κ̄(t), g(0) = In , and let h(t) , g(t)ψ21 (t)ψ11 (t)−1 ρv ρ> v g(t) . Observe Z T > tr h(t) ≤ tr(g(s)λ−1 ρv ρ> v g(s) ) ds, t ≤ τ11 , 0 57 Given that g and ψ21 are invertible, the left-hand side should explode as t ↑ τ11 but the right-hand side is finite. Hence, τ11 = τ21 = T . (Note. T is arbitrary.) Proof of Proposition 2.4. Multiply ρv to FOC(η) to have Z t > −1 > ϕ(τ )> b(τ )> (σ(τ )σ(τ )> )−1 [ dy(τ )−(a(τ )+b(τ )mx̄,η (τ )) dτ ]. 
λρv η(s) = ρv ρv (ϕ(s) ) s Differentiate this with respect to s to see > > −1 d(λρv η(s)) = ρv ρ> v κ̄(s) (ρv ρv ) λρv η(s) ds > > −1 κx̄+η > > −1 (s) ds. dw̄0,0 (s) + ρv ρ> − ρv ρ> v b(s) (σ(s)σ(s) ) b(s)Φ v b(s) (σ(s) ) Observe in turn that dΦκx̄+ρv η (s) = −κ̄(s)Φκx̄+ρv η (s) ds + (κx̄ + ρv η(s)) ds and that consequently we have a linear system of differential equations in λρv η and Φκx̄+ρv η . Written in the matrix form, the system is > > −1 λρv η(s) λρv η(s) −ρv ρ> dw̄0,0 (s) v b(s) (σ(s) ) d = χ(s) ds + Φκx̄+ρv η (s) Φκx̄+ρv η (s) κx̄ ds It follows Z s > > −1 λρv η(s) −ρv ρ> dw̄0,0 (τ ) −1 v b(τ ) (σ(τ ) ) = ψ(s) ι1 λρv η(0) + ψ(τ ) Φκx̄+ρv η (s) κx̄ dτ 0 = ψ(s)ι1 λρv η(0) Z s > > −1 ψ(τ )−1 ι1 ρv ρ> dw̄0,0 (τ ) + Ψ(s)ι2 κx̄. − ψ(s) v b(τ ) (σ(τ ) ) 0 Finally, observe λρv η(t) = 0 = ψ11 (t)λρv η(0) − ι> 1 ψ(t) Z t > > −1 ψ(τ )−1 ι1 ρv ρ> dw̄0,0 (τ ) + Ψ12 (t)κx̄. v b(τ ) (σ(τ ) ) 0 Lemma A.2. θ is the unique solution of f˙(t) = Inx − κ̄λ (t)f (t), f (0) = 0 where > > −1 κ̄λ (t) , κ̄(t) + ψ21 (t)ψ11 (t)−1 ρv ρ> v b(t) (σ(t)σ(t) ) b(t) > > −1 = κ + [ρw σ(t)> + (γ(t) + ψ21 (t)ψ11 (t)−1 ρv ρ> v )b(t) ](σ(t)σ(t) ) b(t). 58 (61) Proof. (61) follows from direct differentiation. Uniqueness is standard. Define p(s, t) , Ψ(s)ι2 − ψ(s)ι1 ψ11 (t)−1 Ψ12 (t), s ≤ t ≤ T. Then ∗ (s) λρv ηx̄,t ∗ κx̄+ρv ηx̄,t Φ (s) ∗ (s) λρv η0,t ∗ 0+ρv η0,t Φ (s) = + p(s, t)κx̄, s ≤ t ≤ T and ι> 2 p(t, t) = θ(t). Also, ∂ > > −1 p(s, t) = −ψ(s)ι1 ψ11 (t)−1 ρv ρ> v b(t) (σ(t)σ(t) ) b(t)θ(t). ∂t Proof of Lemma 2.4. Let −1 −1 0 λ (ρv ρ> v) M (s) , 0 b(s)> (σ(s)σ(s)> )−1 b(s) and observe Z Ix̄ (t) = t p(s, t)> M (s)p(s, t) ds. 
0 Thus d Ix̄ (t) = θ(t)> b(t)> (σ(t)σ(t)> )−1 b(t)θ(t) dt Z t −2 |0 > > −1 p(s, t)> M (s)ψ(s)ι1 ds ψ11 (t)−1 ρv ρ> v b(t) (σ(t)σ(t) ) b(t)θ(t), {z } =:f (t) −1 > > −1 > > > > −1 −1 > f˙ = −f (κ̄ + ψ21 ψ11 ρv ρ> v b (σσ ) b) + θ b (σσ ) b ψ21 ψ11 ρv ρv (62) Z t > −1 > − (ψ(s)ι1 ψ11 (t)−1 ρv ρ> ) M (s)ψ(s)ι ψ (t) ρ ρ ds , 1 11 v v v 0 {z } | =:g(t) ġ = λ −1 ρv ρ> v > − κ̄g − gκ̄ − −1 > > −1 ψ21 ψ11 ρv ρ> v b (σσ ) bg −1 ρv ρ> + (ψ21 ψ11 v − −1 g)b> (σσ > )−1 bψ21 ψ11 ρv ρ> v, −1 ρv ρ> with f (0) = g(0) = 0, where I have suppressed t unless needed. g = ψ21 ψ11 v is the unique solution to the last equation. In turn, f = 0 is the unique solution to (62). 59 Suppose Ix̄ (t) is singular for some t > 0. Since it is symmetric and positive semidefinite, there must be a nonzero z ∈ Rnx such that Z t z > θ(s)> b(s)> (σ(s)σ(s)> )−1 b(s)θ(s)z ds = 0 0 or σ(s)−1 b(s)θ(s)z = 0 for Lebesgue almost every s ≤ t or θ(s)z = 0 for all s ≤ t. Multiply z to (61) to see d (θ(s)z) = 0 = z, s ≤ t ds which is absurd. Lemma A.3. Z t I nx > > Φ (t) − ΦInx (s)> b(s)> (σ(s)σ(s)> )−1 b(s)ψ21 (s) ds ψ11 (t)−1 ρv ρ> (63) v = θ(t) 0 and Z t ΦInx (s)> b(s)> (σ(s)σ(s)> )−1 b(s)ι> 2 p(s, t) ds = Ix̄ (t). (64) 0 Proof. (63): Denote the left-hand side by f (t). Then −1 f˙ = −(ΦInx )> κ̄> + Inx − (ΦInx )> b> (σσ > )−1 bψ21 ψ11 ρv ρ> v Z t + ΦInx (s)> b(s)> (σ(s)σ(s)> )−1 b(s)ψ21 (s) ds (65) 0 −1 > > > −1 −1 > × ψ11 ρv ρ> v (κ̄ + b (σσ ) bψ21 ψ11 ρv ρv ). But by Lemma 2.3(ii), −1 −1 > > > −1 > κ̄> + b> (σσ > )−1 bψ21 ψ11 ρv ρ> v = (κ̄ + ψ21 ψ11 ρv ρv b (σσ ) b) = (κ̄λ )> and with this, (65) can be rewritten as f˙ = Inx − f (κ̄λ )> , which is also satisfied by θ> . Since f (0) = θ(0)> = 0, it follows that f (t) = θ(t)> . (64): Denote the left-hand side by g(t). Then ġ(t) = ΦInx (t)> b(t)> (σ(t)σ(t)> )−1 b(t)θ(t) Z t ∂ p(s, t) ds + ΦInx (s)> b(s)> (σ(s)σ(s)> )−1 b(s) ι> ∂t 2 0 while ∂ > > > −1 ι p(s, t) = −ψ21 (s)ψ11 (t)−1 ρv ρ> v b(t) (σ(t)σ(t) ) b(t)θ(t). 
∂t 2 60 By (63), ġ(t) = θ(t)> b(t)> (σ(t)σ(t)> )−1 b(t)θ(t). Note finally that g(0) = 0. Proof of Proposition 2.5. From FOC(x̄), Z 0 t ΦInx (s)> b(s)> (σ(s)σ(s)> )−1 b(s)ι> 2 p(s, t) ds κx̄ Z t ∗ ΦInx (s)> b(s)> (σ(s)> )−1 dw̄0,η0,t (s). = 0 Recall (64). Proof of Proposition 2.6. (i) Differentiating FOC(x̄) with respect to t, we see 0 = ΦInx (t)> b(t)> (σ(t)> )−1 d(t) Z t ∗ ∗ ΦInx (s)> b(s)> (σ(s)σ(s)> )−1 b(s) dt Φκx̄t +ρv ηt (s) ds. − 0 Direct computation shows ∗ ∗ ∗ −1 > > > −1 dt Φκx̄t +ρv ηt (s) = ι> d(t). (66) 2 p(s, t)κ dx̄t + ψ21 (s)ψ11 (t) ρv ρv b(t) (σ(t) ) Hence, Z t ∗ ΦInx (s)> b(s)> (σ(s)σ(s)> )−1 b(s)ι> 2 p(s, t) ds κ dx̄t 0 Z t Inx > Inx > > > −1 −1 > = Φ (t) − Φ (s) b(s) (σ(s)σ(s) ) b(s)ψ21 (s) ds ψ11 (t) ρv ρv 0 > −1 > × b(t) (σ(t) ) d(t). Use Lemma A.3. (ii) Observe ∗ ∗ ∗ ∗ dmx̄t ,ηt (t) = dmx̄,η (t)|x̄=x̄∗t ,η=ηt∗ + Φκ dx̄t +ρv dηt (t) ∗ ∗ = κ(x̄∗t − mx̄t ,ηt (t)) dt + (ρw σ(t)> + γ(t)b(t)> )(σ(t)> )−1 d(t) ∗ ∗ + Φκ dx̄t +ρv dηt (t). ∗ ∗ dΦκx̄t +ρv ηt (t) is, if computed from the definition, ∗ ∗ ∗ ∗ ∗ ∗ dΦκx̄t +ρv ηt (t) = −κ̄(t)Φκx̄t +ρv ηt (t) dt + κx̄∗t dt + Φκ dx̄t +ρv dηt (t) 61 and is, if computed from the solution (18) (recall (66)), ∗ ∗ ∗ ∗ dΦκx̄t +ρv ηt (t) = −κ̄(t)Φκx̄t +ρv ηt (t) dt + κx̄∗t dt > > −1 d(t) + θ(t)κ dx̄∗t . + ψ21 (t)ψ11 (t)−1 ρv ρ> v b(t) (σ(t) ) Comparing the last two equations, we see ∗ ∗ > > > −1 d(t). Φκ dx̄t +ρv dηt (t) = (ψ21 (t)ψ11 (t)−1 ρv ρ> v + θ(t)σx̄∗ (t) )b(t) (σ(t) ) Proof of Proposition 2.7. All follow from direct differentiation. Proof of the Claim in Remark 2.3. (i) If σ and b are deterministic, then θ, σx̄∗ , and δ, too, are deterministic. Since the latter are continuous, boundedness follows. (ii) Suppose σ, ρw , ρv , and b are diagonal; it then suffices to consider the scalar case. Suppose also κ̄ ≥ ε a.e. for some ε > 0. −1 2 Since δ − θσx̄∗ = ψ21 ψ11 ρv > 0, θ̇(t) < 1 − εθ(t) for all t ≥ 0. Consider θ† defined by θ̇† (t) = 1 − εθ† (t) and θ† (0) = θ(0). 
θ(t) ≤ θ† (t) for all t ≥ 0 because θ(t) = θ† (t) implies θ̇(t) < θ̇† (t). Now, θ† monotonically converges to ε−1 , and thus, θ is uniformly bounded by ε−1 ∨ θ(0). Next, since Ix̄−1 is decreasing, σ̇x̄∗ (t) < Ix̄ (0)−1 − εσx̄∗ (t) and σx̄∗ is uniformly bounded by (εIx̄ (0))−1 ∨ σx̄∗ (0). Note, finally, that δ̇(t) < 2[(εIx̄ (0))−1 ∨ σx̄∗ (0)] + λ−1 ρ2v − 2εδ(t). Proof of Proposition 2.8. Since ∗ ∗ d(t) = dw̄0,0 (t) − σ(t)−1 b(t)(mx̄t ,ηt (t) − m0,0 (t)) dt, the question is whether Z t Z 1 t σ −1 b∆ −1 0,0 −1 2 E (t) = exp σ(s) b(s)∆(s) dw̄ (t) − |σ(s) b(s)∆(s)| ds , 2 0 0 0 ≤ t ≤ T , is a martingale under Q0,0 |GT , where ∗ ∗ ∆(t) , mx̄t ,ηt (t) − m0,0 (t), 0 ≤ t ≤ T. Observe d∆(t) = κ(x̄∗t − ∆(t)) dt + δ(t)b(t)> (σ(t)> )−1 dw̄0,0 (t) − [ρw σ(t)> + (γ(t) + δ(t))b(t)> ](σ(t)σ(t)> )−1 b(t)∆(t) dt. 62 Hence, (∆, x̄∗ ) satisfies a linear SDE with uniformly bounded volatility. Thus, by a multidimensional adaptation of Liptser and Shiryaev (1977), Theorem 4.7, there is an 0,0 h > 0 such that supt≤T EQ exp(h|∆(t)|2 ) < ∞; in turn, by the uniform boundedness 0,0 of σ −1 b, there is an h0 > 0 such that supt≤T EQ exp(h0 |σ(t)−1 b(t)∆(t)|2 ) < ∞. Now, by Liptser and Shiryaev (1977), Section 6.2.3, Example 3, Novikov’s condition holds −1 −1 and E σ b∆ is a martingale. Define P 0 by dP 0 / d(Q0,0 |GT ) = E σ b∆ (T ). Denote by F̄ the augmented filtration generated by . From the definition of , we have F̄t ⊆ Gt , 0 ≤ t ≤ T . For the other direction, observe the SDE that ∗ ∗ (y, mx̄· ,η· (·), x̄∗ ) satisfies with a, b, σ, γ, δ, and σx̄∗ replaced by their respective nonanticipative path functionals in y. The drift is locally Lipschitz and linearly growing, and the volatility is linearly growing (Assumptions 2.1, 2.2, and 2.4). 
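Hence, if in addition $(\gamma+\delta)b^\top(\sigma^\top)^{-1}$ and $\sigma_{\bar x^*}^\top b^\top(\sigma^\top)^{-1}$ are locally Lipschitz, then $(y, m^{\bar x^*_\cdot,\eta^*_\cdot}(\cdot), \bar x^*)$ is the unique strong solution to the SDE by Itô's existence and uniqueness theorem (Rogers and Williams (1994), Theorem V.12.1), and it would follow that $\bar{\mathcal F}_t \supseteq \mathcal G_t$, $0 \le t \le T$, or $\bar{\mathbb F} = \mathbb G$. (Recall that I assume all $\mathcal G_0$-measurable variables to be nonrandom constants.)

Since $\gamma$, $\delta$, $b^\top$, $(\sigma^\top)^{-1}$, and $\sigma_{\bar x^*}^\top$ are uniformly bounded, it suffices to show that each of them is locally Lipschitz: Suppose $p$ and $q$ are matrix-valued path functionals on $[0,T]\times C([0,T],\mathbb{R}^{n_y})$. Then, writing $p \equiv p(t,f)$ and $p' \equiv p(t,g)$ (and similarly for $q$),
\[
|p(t,f)q(t,f) - p(t,g)q(t,g)| = |pq - pq' + pq' - p'q'| \le |p|\,|q-q'| + |q'|\,|p-p'|
\]
by the triangle and Cauchy-Schwarz inequalities.

$\gamma$ is locally Lipschitz by the proof of Liptser and Shiryaev (1977), Theorem 12.5. $b$ is so by assumption (Assumption 2.1). To see that $\sigma^{-1}$ is locally Lipschitz, observe that
\[
|\sigma(t,f)^{-1} - \sigma(t,g)^{-1}| = |\sigma(t,f)^{-1}(\sigma(t,g) - \sigma(t,f))\sigma(t,g)^{-1}| \le |\sigma(t,f)^{-1}|\,|\sigma(t,g)^{-1}|\,|\sigma(t,g) - \sigma(t,f)|.
\]

It remains to show that $\delta$ and $\sigma_{\bar x^*}$ are locally Lipschitz. Let $N > 0$, $t \in [0,T]$, and let $f, g \in C([0,T],\mathbb{R}^{n_y})$ be such that
\[
\sup_{s\le t}|f(s)| \vee \sup_{s\le t}|g(s)| \le N.
\]
Consider $I_{\bar x}^{-1}$. Since
\[
\frac{d}{dt}\bigl(I_{\bar x}(t)^{-1}\bigr) = -\sigma_{\bar x^*}(t)^\top b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1} b(t)\sigma_{\bar x^*}(t),
\]
for $s \le t$,
\[
|I_{\bar x}(s,f)^{-1} - I_{\bar x}(s,g)^{-1}| \le K_1\int_0^s |\sigma_{\bar x^*}(\tau,f) - \sigma_{\bar x^*}(\tau,g)|\,d\tau + K_2\sup_{s\le t}|f(s)-g(s)|,
\]
where I use the same symbols for the path functionals and the $K_i$ are positive constants that do not depend on $s$ or $t$. Proceeding similarly for $\sigma_{\bar x^*}$, and using the last inequality,
\[
|\sigma_{\bar x^*}(s,f) - \sigma_{\bar x^*}(s,g)| \le K_3\int_0^s |\sigma_{\bar x^*}(\tau,f) - \sigma_{\bar x^*}(\tau,g)|\,d\tau + K_4\int_0^s |\delta(\tau,f) - \delta(\tau,g)|\,d\tau + K_5\sup_{s\le t}|f(s)-g(s)|.
\]
In turn,
\[
|\delta(s,f) - \delta(s,g)| \le K_6\int_0^s |\delta(\tau,f) - \delta(\tau,g)|\,d\tau + K_7\sup_{s\le t}|f(s)-g(s)|.
\]
By Gronwall's lemma,
\[
|\delta(s,f) - \delta(s,g)| \le e^{K_6 s}K_7\sup_{s\le t}|f(s)-g(s)|, \qquad s \le t,
\]
or
\[
|\delta(t,f) - \delta(t,g)| \le e^{K_6 T}K_7\sup_{s\le t}|f(s)-g(s)| =: K_8\sup_{s\le t}|f(s)-g(s)|,
\]
where $K_8$ does not depend on $t$. Hence, $\delta$ is locally Lipschitz. In turn, so is $\sigma_{\bar x^*}$.

Proof of Lemma 2.5. Let
\[
f_t(\Delta\bar x,\Delta\eta) \triangleq \ell^\lambda_t(\bar x^*_t,\eta^*_t) - \ell^\lambda_t(\bar x^*_t+\Delta\bar x,\eta^*_t+\Delta\eta) \ge 0,
\]
\[
f_t(\Delta\bar x,\Delta\eta) = \frac12\int_0^t \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1} b(s)\,\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\,ds + \frac{\lambda}{2}\int_0^t |\Delta\eta(s)|^2\,ds,
\]
where I have recalled the FOCs. We are to find
\[
\min_{\Delta\bar x,\Delta\eta}\bigl\{ f_t(\Delta\bar x,\Delta\eta) : \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(t) = \Delta m \bigr\},
\]
where $\Delta m \equiv m - m^{\bar x^*_t,\eta^*_t}(t)$. Note that there always is a $\Delta\bar x$ such that $(\Delta\bar x, \Delta\eta = 0)$ satisfies the constraint; $\Phi^\kappa(t)$ is invertible. Write the Lagrangian as
\[
f_t(\Delta\bar x,\Delta\eta) - \Lambda^\top\bigl(\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(t) - \Delta m\bigr);
\]
the dependence of $\Lambda$ on $t$ is suppressed. FOC($\Delta\eta$) is
\[
0 = (\phi(s)^{-1}\rho_v)^\top\int_s^t \phi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1} b(\tau)\,\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(\tau)\,d\tau + \lambda\Delta\eta(s) - \bigl(\Lambda^\top\phi(t)\phi(s)^{-1}\rho_v\bigr)^\top
\]
or, multiplied by $\rho_v$ and differentiated with respect to $s$,
\[
\frac{d}{ds}\bigl(\lambda\rho_v\Delta\eta(s)\bigr) = \rho_v\rho_v^\top\bar\kappa(s)^\top(\rho_v\rho_v^\top)^{-1}\lambda\rho_v\Delta\eta(s) + \rho_v\rho_v^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1} b(s)\,\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s).
\]
Proceeding similarly to the proof of Proposition 2.4,
\[
\begin{pmatrix}\lambda\rho_v\Delta\eta(s)\\ \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\end{pmatrix} = \psi(s)\iota_1\,\lambda\rho_v\Delta\eta(0) + \Psi(s)\iota_2\,\kappa\Delta\bar x.
\]
Let $s = t$ to obtain $\lambda\rho_v\Delta\eta(t) = \psi_{11}(t)\lambda\rho_v\Delta\eta(0) + \Psi_{12}(t)\kappa\Delta\bar x$. From FOC($\Delta\eta$) at $s = t$, $0 = \lambda\Delta\eta(t) - (\Lambda^\top\rho_v)^\top$. Thus
\[
\begin{pmatrix}\lambda\rho_v\Delta\eta(s)\\ \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\end{pmatrix} = p(s,t)\kappa\Delta\bar x + \psi(s)\iota_1\psi_{11}(t)^{-1}\rho_v\rho_v^\top\Lambda. \tag{67}
\]
Now, FOC($\Delta\bar x$) is
\[
0 = \Bigl[\int_0^t \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1} b(s)\,\Phi^{I_{n_x}}(s)\kappa\,ds\Bigr]^\top - \bigl(\Lambda^\top\Phi^{I_{n_x}}(t)\kappa\bigr)^\top.
\]
Substitute $\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)$ with that in (67) and use Lemma A.3 to see $\kappa\Delta\bar x = \sigma_{\bar x^*}(t)^\top\Lambda$. Plug this back into (67) and set $s = t$; the constraint is $\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(t) = \delta(t)\Lambda = \Delta m$, or $\Lambda = \delta(t)^{-1}\Delta m$. Thus
\[
\begin{pmatrix}\lambda\rho_v\Delta\eta(s)\\ \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\end{pmatrix} = \bigl(p(s,t)\sigma_{\bar x^*}(t)^\top + \psi(s)\iota_1\psi_{11}(t)^{-1}\rho_v\rho_v^\top\bigr)\delta(t)^{-1}\Delta m.
\]
Observe
\[
f_t(\Delta\bar x,\Delta\eta) = \frac12\int_0^t \begin{pmatrix}\lambda\rho_v\Delta\eta(s)\\ \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\end{pmatrix}^{\!\top} M(s)\begin{pmatrix}\lambda\rho_v\Delta\eta(s)\\ \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\end{pmatrix} ds,
\qquad
M(s) = \begin{pmatrix}\lambda^{-1}(\rho_v\rho_v^\top)^{-1} & 0\\ 0 & b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1} b(s)\end{pmatrix},
\]
where $M$ is as defined in the proof of Lemma 2.4.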
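Proceeding similarly to that proof, we prove
\[
f_t(\Delta\bar x,\Delta\eta) = \frac12(\Delta m)^\top\delta(t)^{-1}\Delta m.
\]

Proof of Proposition 2.9. Suppose first $\xi(t) \in \Xi(t)$. Then $b(t)m^{\bar x^*_t,\eta^*_t}(t) + \sigma(t)\xi(t) = b(t)m^{\bar x,\eta}(t)$ for some theory $(\bar x,\eta)$ and the theory passes the penalized likelihood ratio test. By Lemma 2.5,
\[
\frac12\bigl(m^{\bar x,\eta}(t) - m^{\bar x^*_t,\eta^*_t}(t)\bigr)^\top\delta(t)^{-1}\bigl(m^{\bar x,\eta}(t) - m^{\bar x^*_t,\eta^*_t}(t)\bigr) \le \ell^\lambda_t(\bar x^*_t,\eta^*_t) - \ell^\lambda_t(\bar x,\eta) \le \alpha.
\]
Suppose next $\sigma(t)\xi(t) = b(t)\Delta m$, $\Delta m \in \mathbb{R}^{n_x}$, and $2^{-1}(\Delta m)^\top\delta(t)^{-1}\Delta m \le \alpha$. Let $\Delta\bar x \triangleq \Phi^\kappa(t)^{-1}\Delta m$. Then
\[
b(t)m^{\bar x^*_t,\eta^*_t}(t) + \sigma(t)\xi(t) = b(t)m^{\bar x^*_t+\Delta\bar x,\eta^*_t}(t).
\]
There is a theory $(\bar x,\eta)$ such that it passes the penalized likelihood ratio test and $b(t)m^{\bar x,\eta}(t) = b(t)m^{\bar x^*_t+\Delta\bar x,\eta^*_t}(t)$, because
\[
\ell^\lambda_t(\bar x^*_t,\eta^*_t) - \max_{\bar x,\eta}\bigl\{\ell^\lambda_t(\bar x,\eta) : b(t)m^{\bar x,\eta}(t) = b(t)m^{\bar x^*_t+\Delta\bar x,\eta^*_t}(t)\bigr\}
= \ell^\lambda_t(\bar x^*_t,\eta^*_t) - \max_{\bar x,\eta}\bigl\{\ell^\lambda_t(\bar x,\eta) : m^{\bar x,\eta}(t) = m^{\bar x^*_t+\Delta\bar x,\eta^*_t}(t)\bigr\}
= \frac12(\Delta m)^\top\delta(t)^{-1}\Delta m \le \alpha,
\]
where the second equality follows from Lemma 2.5. Hence (27).

Since $\delta$ is uniformly bounded, so are its eigenvalues; hence, the eigenvalues of $\delta^{-1}$ are uniformly bounded below away from 0. It follows that the right-hand side of (27) is uniformly bounded; so is $\Xi$ by Assumption 2.2. Compact-convexity is clear.

Finally, progressive measurability is proved in the same way as that of a single-valued, left- or right-continuous adapted process: Suppose $b$ and $\sigma^{-1}$ are both right-continuous. Let $\{s^\nu_i : i\}$ denote the $\nu$th dyadic partition of $[0,t]$, $t \le T$, and define $\delta_\nu^{-1}$ by $\delta_\nu^{-1}(s) \triangleq \delta(s^\nu_{i+1})^{-1}$ for $s^\nu_i < s \le s^\nu_{i+1}$ and $\delta_\nu^{-1}(0) \triangleq \delta(0)^{-1}$; define $b_\nu$ and $\sigma_\nu^{-1}$ in the same way. Let $F$ be a closed subset of $\mathbb{R}^{n_y}$, and observe that the weak inverse (Aliprantis and Border, 1999, Section 16.1) is
\[
\Bigl\{(s,\omega) \in [0,t]\times\Omega : \sigma_\nu^{-1}(s,\omega)b_\nu(s,\omega)\bigl\{\Delta m \in \mathbb{R}^{n_x} : \tfrac12(\Delta m)^\top\delta_\nu^{-1}(s,\omega)\Delta m \le \alpha\bigr\} \cap F \ne \emptyset\Bigr\},
\]
which is trivially $\mathcal B[0,t]\otimes\mathcal G_t$-measurable. Now, note that $\delta^{-1}$ is differentiable and hence a fortiori continuous. We have $\delta_\infty^{-1} = \delta^{-1}$ as well as $b_\infty = b$ and $\sigma_\infty^{-1} = \sigma^{-1}$. Finally,
\[
\Bigl\{(s,\omega) : \sigma(s,\omega)^{-1}b(s,\omega)\bigl\{\Delta m : \tfrac12(\Delta m)^\top\delta(s,\omega)^{-1}\Delta m \le \alpha\bigr\} \cap F \ne \emptyset\Bigr\}
\]
\[
= \Bigl\{(s,\omega) : \Bigl[\bigcap_{\mu=1}^\infty\bigcup_{\nu=\mu}^\infty \sigma_\nu^{-1}(s,\omega)b_\nu(s,\omega)\bigl\{\Delta m : \tfrac12(\Delta m)^\top\delta_\nu^{-1}(s,\omega)\Delta m \le \alpha\bigr\}\Bigr] \cap F \ne \emptyset\Bigr\}
\]
\[
= \bigcap_{\mu=1}^\infty\bigcup_{\nu=\mu}^\infty\Bigl\{(s,\omega) : \sigma_\nu^{-1}(s,\omega)b_\nu(s,\omega)\bigl\{\Delta m : \tfrac12(\Delta m)^\top\delta_\nu^{-1}(s,\omega)\Delta m \le \alpha\bigr\} \cap F \ne \emptyset\Bigr\} \in \mathcal B[0,t]\otimes\mathcal G_t.
\]

Proof of Proposition 3.1. Since $\ell^\lambda_t(\bar x,\eta)$ is quadratic in $(\bar x,\eta)$ ((15), (13), and (11)) and $\eta^*_{\bar x,t}$ is linear in $\bar x$ (see (18)), the set in question equals (see (20))
\[
\Bigl\{\bar x^*_t + \Delta\bar x \in \mathbb{R} : \tfrac12 I_{\bar x}(t)(\kappa\Delta\bar x)^2 \le \alpha\Bigr\}.
\]
The claim then follows from the following lemma:

Lemma A.4. $\lim_{t\to\infty} I_{\bar x}(t) = \infty$.

Proof. Let $\varepsilon > 0$ be a lower bound of $|\sigma^{-1}b|$. Observe from the dynamics (25) of $\theta$ and the boundedness of the statistics $\gamma$, $\theta$, $\sigma_{\bar x^*}$, and $\delta$ (Lemma 2.1 and Assumption 2.4) that $\theta$ is bounded away from zero as well: $\theta(t) \ge \underline\theta > 0$, $t \ge 0$. (Keep in mind the convention that learning began prior to the decision making at time 0.) It follows that
\[
I_{\bar x}(t) = I_{\bar x}(0) + \int_0^t |\sigma(s)^{-1}b(s)|^2\theta(s)^2\,ds \ge I_{\bar x}(0) + \varepsilon^2\underline\theta^2 t \to \infty \quad\text{as } t \to \infty.
\]

Proof of Lemma 3.1. The claim follows from the boundedness of $\theta$ (Assumption 2.4) and Lemma A.4.

Proof of Proposition 3.2. Suppose $\sigma^{-1}b$ is constant. To see that $\delta$ evolves deterministically, simply recall the governing equations of $\delta$, $\sigma_{\bar x^*}$, $I_{\bar x}^{-1}$, and $\gamma$ (Proposition 2.7 and (9)); $a$, $b$, and $\sigma$ enter them only via $\sigma^{-1}b$. To prove convergence, denote the so far suppressed moment at which the agent's learning started by $t_\Gamma < 0$ (see Remark 2.2); define $f$ by $f(t) \triangleq \gamma(t) + \delta(t)$, $t > t_\Gamma$; and note that $f$ satisfies
\[
\dot f(t) = 2\theta(t)I_{\bar x}(t)^{-1} + |\rho_w|^2 + (1+\lambda^{-1})\rho_v^2 - 2\kappa f(t) - \bigl|\rho_w + (\sigma^{-1}b f(t))^\top\bigr|^2,
\]
which motivates us to consider the following DE in tandem: for some $t_0 \ge t_\Gamma$,
\[
\dot\gamma^\lambda(t) = |\rho_w|^2 + (1+\lambda^{-1})\rho_v^2 - 2\kappa\gamma^\lambda(t) - \bigl|\rho_w + (\sigma^{-1}b\gamma^\lambda(t))^\top\bigr|^2, \quad t \ge t_0, \qquad \gamma^\lambda(t_0) > 0.
\]

Lemma A.5.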
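(i) $\gamma^\lambda$ is given by
\[
\gamma^\lambda(t) = \begin{cases}
\bar\gamma^\lambda \\[4pt]
\dfrac{-(\kappa+\rho_w\sigma^{-1}b) + \nu^\lambda\tanh\bigl\{\nu^\lambda t + \tanh^{-1}\bigl[(\nu^\lambda)^{-1}(\kappa+\rho_w\sigma^{-1}b + |\sigma^{-1}b|^2\gamma^\lambda(t_0))\bigr]\bigr\}}{|\sigma^{-1}b|^2} \\[10pt]
\dfrac{-(\kappa+\rho_w\sigma^{-1}b) + \nu^\lambda\coth\bigl\{\nu^\lambda t + \coth^{-1}\bigl[(\nu^\lambda)^{-1}(\kappa+\rho_w\sigma^{-1}b + |\sigma^{-1}b|^2\gamma^\lambda(t_0))\bigr]\bigr\}}{|\sigma^{-1}b|^2}
\end{cases}
\]
depending on whether $\gamma^\lambda(t_0) = \bar\gamma^\lambda$ (top), $< \bar\gamma^\lambda$ (middle), or $> \bar\gamma^\lambda$ (bottom), where
\[
\nu^\lambda \triangleq \sqrt{(\kappa+\rho_w\sigma^{-1}b)^2 + (1+\lambda^{-1})\rho_v^2|\sigma^{-1}b|^2},
\qquad
\bar\gamma^\lambda \triangleq \frac{-(\kappa+\rho_w\sigma^{-1}b) + \sqrt{(\kappa+\rho_w\sigma^{-1}b)^2 + (1+\lambda^{-1})\rho_v^2|\sigma^{-1}b|^2}}{|\sigma^{-1}b|^2}.
\]
(ii) $\theta(t)I_{\bar x}(t)^{-1} > 0$ for all $t > t_\Gamma$.
(iii) $\lim_{t\downarrow t_\Gamma}\delta(t) = \infty$.
(iv) $f(t) > \bar\gamma^\lambda$ for all $t > t_\Gamma$.

Proof. Before proceeding with the proof, note carefully that in the expressions preceding Remark 2.2, time 0 refers to the beginning of learning.

(i) Recall the definition (10) of $\bar\kappa$ and observe that it satisfies $\dot{\bar\kappa}(t) = \nu^2 - \bar\kappa(t)^2$.

(ii) $\theta$ satisfies ((25) and (24))
\[
\dot\theta(t) = 1 - \bigl\{\kappa + \rho_w\sigma^{-1}b + (\gamma(t) + \psi_{21}(t)\psi_{11}(t)^{-1}\rho_v^2)|\sigma^{-1}b|^2\bigr\}\theta(t), \qquad t \ge t_\Gamma, \tag{68}
\]
while from (19) and (17), $\theta(t_\Gamma) = 0$. Since the expression in the curly brackets is a continuous function on $t \ge t_\Gamma$ (see (16)), it follows that $\theta(t) > 0$ for all $t > t_\Gamma$. Then $I_{\bar x}(t)$, too, is positive for all $t > t_\Gamma$ by (21).

(iii) From the definitions (24), (16), and (23) of $\delta$, $\psi$, and $\sigma_{\bar x^*}$, $\lim_{t\downarrow t_\Gamma}\delta(t) = \lim_{t\downarrow t_\Gamma}(\theta(t)^2/I_{\bar x}(t))$. However, $\theta(t_\Gamma) = I_{\bar x}(t_\Gamma) = 0$. Apply thus L'Hôpital's rule:
\[
\lim_{t\downarrow t_\Gamma}\delta(t) = \lim_{t\downarrow t_\Gamma}\frac{\theta(t)^2}{I_{\bar x}(t)} = \lim_{t\downarrow t_\Gamma}\frac{2\theta(t)\dot\theta(t)}{|\sigma^{-1}b|^2\theta(t)^2} = \lim_{t\downarrow t_\Gamma}\frac{2\dot\theta(t)}{|\sigma^{-1}b|^2\theta(t)}.
\]
First, from (68), $\lim_{t\downarrow t_\Gamma}\dot\theta(t) = 1$. Next, by the observations made in the proof of (ii) above, $\theta(t)$ approaches 0 from above as $t \downarrow t_\Gamma$. Thus $\lim_{t\downarrow t_\Gamma}\delta(t) = \infty$. (Footnote 35: Similarly we can also prove $\lim_{t\downarrow t_\Gamma}\sigma_{\bar x^*}(t) = \infty$.)

(iv) Let $t^* \triangleq \inf\{t \ge 0 : f(t) \le \bar\gamma^\lambda + 1\}$. If $t^* = \infty$, we are done. So suppose $t^* < \infty$. Since $\lim_{t\downarrow t_\Gamma}\delta(t) = \infty$ as (iii) states, $t^* > 0$; and $f(t) > \bar\gamma^\lambda + 1$ for all $t < t^*$. By continuity, $f(t^*) = \bar\gamma^\lambda + 1$. Let $\gamma^\lambda$ start with $\gamma^\lambda(t^*) = f(t^*) = \bar\gamma^\lambda + 1$. Then $f(t) \ge \gamma^\lambda(t)$ for all $t \ge t^*$ because $f(t) = \gamma^\lambda(t)$ implies $\dot f(t) > \dot\gamma^\lambda(t)$ by (ii). Since $\gamma^\lambda(t) > \bar\gamma^\lambda$ for all $t \ge t^*$ by (i), the claim follows.

(Proof of the proposition continued.) Let $\varepsilon > 0$. Then by Lemma 3.1 there exists $t_1 \ge 0$ such that $2\theta(t)I_{\bar x}(t)^{-1} < \varepsilon$ for all $t \ge t_1$. Let $\tilde\lambda \in (0,\lambda)$ be such that $\tilde\lambda^{-1}\rho_v^2 = \lambda^{-1}\rho_v^2 + \varepsilon$, and let $\gamma^\lambda$ and $\gamma^{\tilde\lambda}$ start with $\gamma^\lambda(t_1) = \gamma^{\tilde\lambda}(t_1) = f(t_1)$; $f(t_1) > 0$ by Lemma A.5(iv). Then $\gamma^\lambda(t) \le f(t) \le \gamma^{\tilde\lambda}(t)$ for all $t \ge t_1$ by Lemma A.5(ii). Now some further elementary arguments, based on the convergence of $\gamma^\lambda$ and $\gamma^{\tilde\lambda}$ (Lemma A.5(i)) and the arbitrariness of $\varepsilon$, prove $f(t) \to \bar\gamma^\lambda$ as $t \to \infty$; in turn, $\delta(t) \to \bar\gamma^\lambda - \bar\gamma^\infty$.

Proof of Proposition 3.3. Let $\bar\Xi \triangleq \sigma^{-1}b\{\Delta m \in \mathbb{R} : |\Delta m| \le \sqrt{2\alpha\delta(\infty)}\}$. Then
\[
d_H(\Xi(t),\bar\Xi) = |\sigma^{-1}b|\sqrt{2\alpha}\,\bigl|\sqrt{\delta(t)} - \sqrt{\delta(\infty)}\bigr|
\]
and uniform convergence to $\bar\Xi$ follows from Proposition 3.2.

Proof of Lemma 4.1. Let $\mathcal M \triangleq \{\mathcal E^\xi : \xi \in \Xi\}$. Define $f$ on $\mathcal M\times(C^2(u)\cap C_{\text{budget}})$ by
\[
f(M,c) \triangleq E^{P^0}\int_0^T e^{-\beta t}M(t)\log(c(t))\,dt.
\]
The claim is
\[
\sup_{c\in C^2(u)\cap C_{\text{budget}}}\;\min_{M\in\mathcal M} f(M,c) = \min_{M\in\mathcal M}\;\sup_{c\in C^2(u)\cap C_{\text{budget}}} f(M,c).
\]
I apply the Kneser-Fan minimax theorem (Fan (1953), Theorem 2). The conclusion follows once the following three assumptions are checked.

(i) $\mathcal M$ is a compact Hausdorff space. Let $L^2([0,T]\times\Omega) \equiv L^2([0,T]\times\Omega, \mathcal B[0,T]\otimes\mathcal G_T, \text{Lebesgue}\times P^0)$ be the set of processes $h$ such that
\[
\|h\| \triangleq \Bigl(E^{P^0}\int_0^T h(t)^2\,dt\Bigr)^{1/2} < \infty.
\]
$L^2([0,T]\times\Omega)$ is a reflexive Banach space with the norm $\|\cdot\|$ defined above. By design, $\mathcal M \subset L^2([0,T]\times\Omega)$. Let $K \ge 0$ be such that $\Xi(t) \subseteq [-K,K]^{n_y}$, $t \ge 0$. ($K$ may be state-dependent. See Section 4.3 and Remark 2.2.) For all $M \in \mathcal M$,
\[
\|M\|^2 \le E^{P^0}\int_0^T \mathcal E^{(2\xi)}(t)\,e^{n_y K^2 T}\,dt = T e^{n_y K^2 T}
\]
and $\mathcal M$ is norm-bounded. $\mathcal M$ is norm-closed by Lemma B.1 of Cuoco and Cvitanić (1998) and is convex by (the proof of) Theorem 2.1(c) of Chen and Epstein (2002); thus, it is weakly closed. By Alaoglu's theorem, then, $\mathcal M$ is weakly compact. The weak topology of a normed space is Hausdorff and so is a subspace.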
(ii) For every $c \in C^2(u)\cap C_{\text{budget}}$, $f(M,c)$ is lower semicontinuous on $\mathcal M$. Let $\operatorname{span}(\mathcal M)$ be the linear span of $\mathcal M$ over $\mathbb{R}$; $\operatorname{span}(\mathcal M) \subset L^2([0,T]\times\Omega)$ is a normed space. For each $c \in C^2(u)\cap C_{\text{budget}}$, the map $\tilde f_c : \operatorname{span}(\mathcal M) \to \mathbb{R}$,
\[
M \mapsto E^{P^0}\int_0^T e^{-\beta t}M(t)\log(c(t))\,dt,
\]
is linear; by Hölder's inequality, the norm of $\tilde f_c$ is bounded by $\|\log c\| < \infty$. Then there exists an extension $f_c$ of $\tilde f_c$ such that the linear functional $f_c$ defined on $L^2([0,T]\times\Omega)$ is continuous in the norm topology, and consequently, in the weak topology (Aliprantis and Border (1999), Lemma 6.13). Being a restriction of $f_c$ to $\mathcal M \subset \operatorname{span}(\mathcal M)$, $f(\cdot,c)$ is continuous as well.

(iii) $f$ is convexlike on $\mathcal M$ and concavelike on $C^2(u)\cap C_{\text{budget}}$. $\mathcal M$ and $C^2(u)\cap C_{\text{budget}}$ are both convex. It then suffices to note that $(M,c) \mapsto M\log c$ is convex-concave on $(0,\infty)^2$.

Proof of Proposition 4.1. Apply the minimax theorem and write the dual of the inner supremization as
\[
\inf_\nu E^{P^0}\int_0^T \max_{c(t)}\Bigl[\mathcal E^\xi(t)e^{-\beta t}\log(c(t)) - \Lambda\,\mathcal E^{-(\zeta+\nu)}(t)e^{-rt}c(t)\Bigr]dt \tag{69}
\]
where $\Lambda > 0$. The solution to the dual problem solves the primal problem as well (He and Pearson, 1991; Karatzas et al., 1991). $c^*(t)$ and $\Lambda^*$ are standard. Plugging $c^*$ into (69), ignoring irrelevant terms, and exchanging the order of integration, we reach
\[
E^{P^\xi}\int_0^T \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\,\inf_{\nu(t)}\frac12|\zeta(t)+\nu(t)+\xi(t)|^2\,dt.
\]
Without $\xi$, the minimizing $\nu(t)$ is 0 because $\nu(t) \in \operatorname{Ker}(\sigma_R(t))$. With $\xi$, on the other hand,
\[
|\zeta(t)+\nu(t)+\xi(t)|^2 = |\zeta(t)+\xi(t)|^2 + |\nu(t)|^2 + 2\xi(t)^\top\nu(t)
\]
and under the constraint $\sigma_R(t)\nu(t) = 0$, the unique minimizer is given by $\nu^*(t) = f(t)\xi(t)$ where
\[
f(t) \triangleq \sigma_R(t)^\top(\sigma_R(t)\sigma_R(t)^\top)^{-1}\sigma_R(t) - I_{n_y}.
\]
Observe that $f = f^\top$ and $f^2 = -f$, and plug $c^*$, $\nu^*$, and $\Lambda^*$ into (34).

Proof of Proposition 4.2. (i) follows from Theorem IV.4.3 of Fleming and Soner (1993). The assumptions of the theorem are (IV.3.5) and (IV.4.6) in their book. (IV.3.5) is the uniform parabolicity assumption, which is equivalent in the present case to Assumption 4.1.
(IV.4.6) is a collection of regularity conditions that can be checked straightforwardly. (ii) and (iii) follow from Theorem IV.3.1. Proof of Lemma 4.2. (i) Let "Z # ∗,ξ 2 T −βs −βT e − e 1 m − r + σ ξ(s) 0 R ∗,ξ s F (t, m∗ , ξ) , EP ds mt = m∗ β 2 σ R t 70 so that J(t, m∗ ) = minξ F (t, m∗ , ξ). The convexity of m∗ 7→ J(t, m∗ ) follows from that of (m∗ , ξ) 7→ F (t, m∗ , ξ) and of Ξ: Suppose m∗ = hm∗1 + (1 − h)m∗2 , h ∈ [0, 1], and let ξ1∗ and ξ2∗ be the respective minimizers. Then J(t, m∗ ) ≤ F (t, hm∗1 + (1 − h)m∗2 , hξ1∗ + (1 − h)ξ2∗ ) ≤ hJ(t, m∗1 ) + (1 − h)J(t, m∗2 ). (ii) ∂m∗ J(t, m∗ ) is obtained via the envelope theorem: If both ∂m∗ J(t, m∗ ) and ∂m∗ F (t, m∗ , ξ ∗ ) exist, then ∂m∗ J(t, m∗ ) = ∂m∗ F (t, m∗ , ξ ∗ ). (See Milgrom and Segal (2002), Theorem 1.) Observe Z s ∗,ξ κ(τ −t) ∗,ξ −κ(s−t) e [κx̄ dτ + σm∗ (τ )( d(τ ) + ξ(τ ) dτ )] mt + ms = e t and let e−βs − e−βT 1 f (s, t, m , ξ) , β 2 ∗ m∗,ξ s − r + σR ξ(s) σR so that ∗ F (t, m , ξ) = E P0 Z 2 ∗ , m∗,ξ t = m T f (s, t, m∗ , ξ) ds. t Now, it is easy to check the conditions for differentiating under the integral (Durrett 0 RT (2005), Theorem A.9.1) and we have ∂m∗ F (t, m∗ , ξ) = EP t ∂m∗ f (s, t, m∗ , ξ) ds. Proof of Proposition 4.3. It suffices to show lim t→T βeβt σR σm∗ (t)∂m∗ J(t, m∗ ) = 0. 1 − e−β(T −t) Recall Lemma 4.2(ii) and let −κ(s−t) ∗,ξ∗ ∗ ∗ e m − r + σ ξ (s) 0 R s , m∗,ξ K(t, m∗ ) , sup EP = m∗ . t σR σR s∈[t,T ] Then Z T −βs βt βe βeβt e − e−βT ∗ ∗ J(t, m ) ≤ ∂ ds K(t, m∗ ) m 1 − e−β(T −t) 1 − e−β(T −t) β t 1 T −t = − K(t, m∗ ). β eβ(T −t) − 1 limt→T K(t, m∗ ) < ∞ because (i) Z s ∗,ξ ∗ ∗,0 −κ(s−t) ms = ms + e eκ(τ −t) σm∗ (τ )ξ ∗ (τ ) dτ where mt∗,0 = m∗t , t 71 (ii) 1 K(t, m∗ ) ≤ 2 σR P0 sup E |m∗,0 s |+ Z s∈[t,T ] ! T ¯ ) dτ + r + σR ξ(t) ¯ eκ(τ −t) |σm∗ (τ )|ξ(τ , t 0 ∗ and (iii) EP |m∗,0 s | = g(s − t, m ) for some function g continuous in s − t. Thus, βt βe ∗ ∗ ∗ J(t, m ) ≤ 0 · lim K(t, m ) = 0. lim ∂ m t→T 1 − e−β(T −t) t→T Proof of Lemma 5.1. 
Note first that Z t Z t 1 ∗ 2 σD (s) d(s) + aD (s) + bD (s)ms − |σD (s)| ds log D(t) = log D(0) + 2 0 0 ∗ ∗ where m∗s ≡ mx̄s ,ηs (s). Note further that (i) (52) implies Z T P0 E |σD (t)|4 dt < ∞ 0 and (ii) m∗ is square-integrable by Theorem 4.7 of Liptser and Shiryaev (1977); (m∗ , x̄∗ ) satisfies a linear SDE with uniformly bounded volatility (Proposition 2.6). Then, the claim follows from Jensen’s inequality, Itô’s isometry, the boundedness of bD , and other elementary arguments. Proof of Proposition 5.1. Step 1. Notation. Let Z T Pξ f (ξ, c) , E e−βt log(c(t)) dt, (ξ, c) ∈ Ξ × C 2 (u). 0 Then, Ξ∗ = arg minξ∈Ξ f (ξ, D). Denote the set of consumption plans that can be financed (that is, are feasible) under price system (r, S) with initial wealth W0 , by Cf (W0 ; r, S) ⊂ C 2 (u). The agent’s problem, under (r, S) endowed with W0 , is sup min f (ξ, c) . ξ∈Ξ c∈Cf (W0 ;r,S) Step 2. Optimality of D given ξ ∗ . Fix ξ ∗ ∈ Ξ∗ . In this step, I show that D maxi∗ ∗ ∗ mizes f (ξ ∗ , c) on Cf (S ξ (0); rξ , S ξ ). Begin by noting that the maximization problem supc∈Cf f (ξ ∗ , c) can be seen as ∗ ∗ ∗ that of an expected utility investor with prior P ξ subject to price system (rξ , S ξ ); and with this in mind note further that ∗ ∗ ∗ dR(t) = (β + aD (t) + bD (t)mx̄t ,ηt (t) + σD (t)ξ ∗ (t)) dt + σD (t) dξ (t) ∗ ∗ =: µξR (t) dt + σR (t) dξ (t). 72 Then, in particular, ∗ ∗ µξR (t) − rξ (t) = |σR (t)|2 , ∗ ∗ ζ0 (t) , σR (t)> (σR (t)σR (t)> )−1 (µξR (t) − rξ (t)) = σR (t)> . ∗ Lemma A.6. ζ0 satisfies Novikov’s condition under P ξ : Z T ∗ 1 Pξ 2 E exp |ζ0 (t)| dt < ∞. 2 0 > Proof. Note first that ζ0 = σD . Then, by (52) and Example 3 of Section 6.2.3 of Liptser and Shiryaev (1977), ζ0 satisfies Novikov’s condition under P 0 . Now the claim follows from the Cauchy-Schwarz inequality and the uniform boundedness of ξ ∗ . (Proof of the proposition continued.) Suppose first nA = 0. 
Then the market is dynamically complete (Lemma A.6); and standard martingale arguments show that the optimal consumption plan equals D. Suppose next nA > 0. Then the market is dynamically incomplete, in which case the ξ ∗ -optimality of D can be argued along the lines of He and Pearson (1991), Karatzas et al. (1991), and Cuoco (1997). First, introduce nA fictitious financial assets (Karatzas et al., 1991) whose nA dimensional return process H = {H(t), Gt } follows ∗ ∗ dH(t) = rξ (t)1nA dt + σH (t) dξ (t) where 1nA denotes the nA -dimensional vector of ones and the rows of σH = {σH (t), Gt } consist of orthonormal vectors in the kernel of σR a.e. Next, let N denote the set of RnA -valued processes ν = {ν(t), Gt } satisfying Z T ∗ Pξ E |ν(t)|2 dt < ∞, 0 let ζν (t) , σR (t) σH (t) −1 ∗ ∗ µξR (t) − rξ (t) ν(t) = σR (t)> + σH (t)> ν(t), and collect in N ∗ those ν ∈ N with which Z T Z ∗ 1 T Pξ 2 > ξ∗ |ζν (t)| dt = 1. E exp − ζν (t) d (t) − 2 0 0 N ∗ is not empty: 0 ∈ N ∗ (Lemma A.6). Let also Z t Z t Z 1 t ξ∗ > ξ∗ 2 pν (t) , exp − r (s) ds exp − ζν (s) d (s) − |ζν (s)| ds . 2 0 0 0 73 Then, by Theorem 1 of Cuoco (1997),36 a feasible plan c ∈ Cf satisfies Pξ ∗ Z sup E ν∈N ∗ T ∗ pν (t)c(t) dt ≤ S ξ (0). 0 Accordingly, the dual problem is defined as −βt Z T ∗ e Pξ −βt ξ∗ −βt e log inf τ S (0) + E dt , −e (τ,ν)∈(0,∞)×N ∗ τ pν (t) 0 the unique solution to which is τ ∗ = 1/D(0) and ν ∗ ≡ 0. Since the candidate optimal consumption plan c∗ equals D ∈ Cf where c∗ (t) , D(0)e−βt , 0 ≤ t ≤ T, p0 (t) it follows, finally, from Proposition 1 of Cuoco (1997) that D solves supc∈Cf f (ξ ∗ , c). Step 3. Optimality of D. Therefore, for each ξ ∗ ∈ Ξ∗ , (ξ ∗ , D) is a saddle point of ∗ ∗ ∗ f on Ξ × Cf (S ξ (0); rξ , S ξ ); that is, for each ξ ∗ ∈ Ξ∗ , D∈ arg max min f (ξ, c) . c∈Cf (S ξ∗ (0);rξ∗ ,S ξ∗ ) ξ∈Ξ Proof of Proposition 5.2. 
First, from the law of motion (54) of D under P ξ , Z t 1 (aD + bD m∗s + σD ξ(s)) ds + σD ξ (t) − |σD |2 t log D(t) = log D(0) + 2 0 ∗ ∗ where m∗s ≡ mx̄s ,ηs (s). By Fubini’s theorem, Pξ Z T f (ξ) , E e−βt log(D(t)) dt 0 Z T Z th i 1 Pξ Pξ −βt 2 ∗ = e log D(0) + aD t − |σD | t + bD E (ms ) + σD E (ξ(s)) ds dt 2 0 0 Z T Z th i ξ ξ −βt = K1 + e bD EP (m∗s ) + σD EP (ξ(s)) ds dt (70) 0 0 36 The present specification violates one of the standing assumptions (Assumption 1) of the cited paper that the interest rate process is uniformly bounded. The theorem nevertheless applies because I directly required the discounted wealth process to be uniformly bounded below (cf. Equation (11) of the cited paper). 74 where K1 , whose definition is clear from the last equality, is a constant independent ξ of ξ. To compute EP m∗s , note from Proposition 2.6 that z , (m∗ , x̄∗ )> satisfies bz , dz(t) = bz z(t) dt + σz (t) d(t), −κ κ ρw + (γ(t) + δ(t))(σ −1 b)> and σz (t) , ; 0 0 κ−1 σx̄∗ (t)(σ −1 b)> in particular, σz : [0, T ] → R2×(1+nA ) is a deterministic function of time, which is differentiable, and hence is a fortiori bounded, on [0, T ]. The solution is Z t −bz s bz t e σz (s) d(s) . z(0) + z(t) = e 0 Thus, Pξ E m∗s = bz s ι> z(0) 1e + ξ bz s EP ι> 1e Z s e−bz τ σz (τ )ξ(τ ) dτ (71) 0 where ι1 = (1, 0)> and the expectation of the stochastic integral with respect to ξ has vanished given the boundedness of the integrand. Plugging (71) back into (70) we obtain Z T Z t Z s Pξ −βt > bz s −bz τ f (ξ) = K2 + E e bD ι1 e e σz (τ )ξ(τ ) dτ + σD ξ(s) ds dt 0 0 0 where Z K2 , K1 + T e −βt 0 Z t bz s bD ι> z(0) ds dt. 1e 0 Now, consider the following integral: Z t Z s Z tZ s > bz s −bz τ bz (s−τ ) bD ι1 e e σz (τ )ξ(τ ) dτ ds = bD ι> σz (τ )ξ(τ ) dτ ds 1e 0 0 0 0 Z tZ t bz (s−τ ) = bD ι> σz (τ )ξ(τ ) ds dτ 1e 0 τ Z tZ t bz (τ −s) = bD ι> σz (s)ξ(s) dτ ds. 1e 0 s Thus, f (ξ) = K2 + E Pξ Z T −βt e 0 Z t Z t > bz (τ −s) bD ι1 e dτ σz (s)ξ(s) + σD ξ(s) ds dt. 
0 s Recall that p ξ(s) ∈ Ξ(s) if and only if there is ∆m(s) ∈ R such that ξ(s) = σ −1 b∆m(s), |∆m(s)| ≤ 2αδ(s). Noting also that σD σ −1 = (1, 0, · · · , 0), we finally arrive at Z t Z TZ t ξ −βt > bz (τ −s) −1 dτ σz (s)σ b + 1 EP (∆m(s)) ds dt. f (ξ) = K2 + e bD ι1 e 0 0 s 75 Lemma A.7. h : dom(h) → R defined by Z t bz (τ −s) ι> dτ σz (s)σ −1 b + 1, h(s, t) , 1e s dom(h) , {(s, t) ∈ R2 : 0 ≤ s ≤ t ≤ T }, is continuous on dom(h) and positive on int(dom(h)) = {(s, t) ∈ R2 : 0 < s < t < T }. Proof. Direct computation shows κh(s, t) = κ + 1 − e−κ(t−s) ρw σ −1 b + (γ(t) + δ(t))|σ −1 b|2 + κ(t − s) − (1 − e−κ(t−s) ) κ−1 σx̄∗ (t)|σ −1 b|2 . Continuity is clear; γ, δ, and σx̄∗ are differentiable functions on [0, T ]. For the other claim, note first that since κ(t−s)−(1−e−κ(t−s) ) is positive whenever s < t and so is σx̄∗ (t), by Lemma A.5.(ii), for all t ≥ 0, the third term is positive on int(dom(h)). Meanwhile, p ρw σ −1 b + (γ(t) + δ(t))|σ −1 b|2 > −κ + (κ + ρw σ −1 b)2 + (1 + λ−1 )ρ2v |σ −1 b|2 , t ≥ 0, by Lemma A.5.(iv). Thus, the rest of κh(s, t), too, is positive on int(dom(h)). (Proof of the proposition continued.) Thus, e−βt bD h(s, t) > 0 on the interior of the domain of integration; and it follows that f (ξ) is uniquely minimized by ξ ∗∗ . Proof of Proposition 5.3. To conform to the standard presentation of a control problem, I rewrite J as Z T ξ P0 −βs ξ J(t, Z) = min E e log(D (s)) ds Z (t) = Z ξ t subject to dZ ξ (s) = µZ (Z ξ (s)) ds + σZ (Z ξ (s))( d(s) + ξ(s) ds). The HJB equation is 1 2 −βt > > 0 = min e log D + ∂t J + (∂Z J) (µZ + σZ ξ) + (∂Z J) ◦ (σZ σZ ) . ξ 2 The minimization problem taken separately is ξ ∗ (t) = arg min(∂Z J)> σZ ξ. ξ∈Ξ(t) Now recall that σξ = b(m − m∗t ) by definition where |m − m∗t | ≤ 76 p 2αδ(t). Proof of Lemma 5.2. First, !> √ p 2 ρw,1 A 1 − rDA +γ+δ κ−1 x σx̄∗ D, 0, , , 0, 0, 0, 0 2 2 A(1 − rDA ) A(1 − rDA ) −1 σZ σ b = Thus, expectedly because A is unambiguous, ∂A J is irrelevant. 
Next, observe Z ξ s ∗,ξ m (τ ) dτ D (s) = D exp t Z × exp t s 1 σD (τ )( d(τ ) + ξ(t) dτ ) − 2 Z s 2 |σD (τ )| dτ , t ( m∗,ξ (s) = e−κx (s−t) Z + m∗ " s eκx (τ −t) κx x̄∗,ξ τ dτ + t and x̄∗,ξ s ∗ Z = x̄ + t s ! #) γ(τ ) + δ(τ ) p (1, 0) ( d(τ ) + ξ(τ ) dτ ) , ρw + p 2 A(τ ) 1 − rDA κ−1 σ ∗ (τ ) p x px̄ (1, 0)( d(τ ) + ξ(τ ) dτ ). 2 A(τ ) 1 − rDA Thus, 1 − e−κx (s−t) ∗ 1 − e−κx (s−t) log(D (s)) = log D + x̄∗ + f (s, t, ξ) m + (s − t) − κx κx ξ where f is independent of D, m∗ , and x̄∗ . Finally, by the envelope theorem, √ p Z T −βt −βT −κx (s−t) 2 A 1 − rDA +γ+δ ρ e − e 1 − e w,1 (∂Z J)> σZ σ −1 b = + e−βs ds 2 β κx A(1 − rDA ) t Z T −κx (s−t) 1−e κ−1 x σx̄∗ + e−βs (s − t) − ds . 2 κx A(1 − rDA ) t Proof of Proposition 5.4. See the discussion following the statement of the proposition. As can be easily checked, ∂δ(∞)/∂(h2eff ) < 0. On the other hand, ∂h2eff −2hD rDA + 2hA = . 2 ∂hA 1 − rDA 77 References Akaike, Hirotugu (1973), “Information theory and an extension of the maximum likelihood principle.” Proc. 2nd Inter. Symposium on Information Theory, Budapest., 267–281. Aliprantis, Charalambos D. and Kim C. Border (1999), Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer. Anderson, Evan W., Lars Peter Hansen, and Thomas J. Sargent (2003), “A quartet of semigroups for model specification, robustness, prices of risk, and model detection.” Journal of the European Economic Association, 1, 68–123. Bansal, Ravi, Dana Kiku, and Amir Yaron (2012), “An empirical evaluation of the long-run risks model for asset prices.” Critical Finance Review, 1, 183–221. Barberis, Nicholas (2000), “Investing for the long run when returns are predictable.” Journal of Finance, 55, 225–264. Barry, Christopher B. (1974), “Portfolio analysis under uncertain means, variances, and covariances.” Journal of Finance, 29, 515–522. Blanchard, Olivier J. (1993), “Movements in the equity premium.” Brookings Papers on Economic Activity, 2, 75–138. Breeden, Douglas T. 
(1979), “An intertemporal asset pricing model with stochastic consumption and investment opportunities.” Journal of Financial Economics, 7, 265–296. Brendle, Simon (2006), “Portfolio selection under incomplete information.” Stochastic Processes and their Applications, 116, 701–723. Campanale, Claudio (2011), “Learning, ambiguity and life-cycle portfolio allocation.” Review of Economic Dynamics, 14, 339–367. Chen, Hui, Nengjiu Ju, and Jianjun Miao (2014), “Dynamic asset allocation with ambiguous return predictability.” Review of Economic Dynamics, 17, 799–823. Chen, Zengjing and Larry G. Epstein (2002), “Ambiguity, risk, and asset returns in continuous time.” Econometrica, 70, 1403–1443. Choi, Hongseok (2012), Essays on Learning under Ambiguity. Ph.D. dissertation, University of Pennsylvania. Cox, John C., Jr. Jonathan E. Ingersoll, and Stephen A. Ross (1985), “An intertemporal general equilibrium model of asset prices.” Econometrica, 53, 363–384. Cuoco, Domenico (1997), “Optimal consumption and equilibrium prices with portfolio constraints and stochastic income.” Journal of Economic Theory, 72, 33–73. Cuoco, Domenico and Jakša Cvitanić (1998), “Optimal consumption choices for a ‘large’ investor.” Journal of Economic Dynamics & Control, 22, 401–436. 78 Davis, M. H. A. and A. R. Norman (1990), “Portfolio selection with transaction costs.” Mathematics of Operations Research, 15, 676–713. Detemple, Jérôme B. (1986), “Asset pricing in a production economy with incomplete information.” Journal of Finance, 61, 383–392. Dow, James and Sergio Ribeiro da Costa Werlang (1992), “Uncertainty aversion, risk aversion, and the optimal choice of portfolio.” Econometrica, 60, 197–204. Drechsler, Itamar (2013), “Uncertainty, time-varying fear, and asset prices.” Journal of Finance, 68, 1843–1889. Durrett, Richard (2005), Probability: Theory and Examples. Thomson. Elliott, Robert J. 
and Vikram Krishnamurthy (1997), “Exact finite-dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian systems.” SIAM Journal on Control and Optimization, 35, 1908–1923. Epstein, Larry G. and Jianjun Miao (2003), “A two-person dynamic equilibrium under ambiguity.” Journal of Economic Dynamics & Control, 27, 1253–1288. Epstein, Larry G. and Martin Schneider (2003), “IID: Independently and indistinguishably distributed.” Journal of Economic Theory, 113, 32–50. Epstein, Larry G. and Martin Schneider (2007), “Learning under ambiguity.” Review of Economic Studies, 74, 1275–1303. Epstein, Larry G. and Martin Schneider (2008), “Ambiguity, information quality, and asset pricing.” Journal of Finance, 63, 197–228. Epstein, Larry G. and Tan Wang (1994), “Intertemporal asset pricing under Knightian uncertainty.” Econometrica, 62, 283–322. Fama, Eugene F. and Kenneth R. French (1988), “Permanent and temporary components of stock prices.” Journal of Political Economy, 96, 246–273. Fan, Ky (1953), “Minimax theorems.” Proceedings of the National Academyof Sciences, 39, 42–47. Fleming, Wendell H. and H. Mete Soner (1993), Controlled Markov Processes and Viscosity Solutions. Springer-Verlag. Gagliardini, Patrick, Paolo Porchia, and Fabio Trojani (2009), “Ambiguity aversion and the term structure of interest rates.” Review of Financial Studies, 22, 4157–4188. Gajdos, T., T. Hayashi, J.-M. Tallon, and J.-C. Vergnaud (2008), “Attitude toward imprecise information.” Journal of Economic Theory, 140, 27–65. Gennotte, Gérard and Terry A. Marsh (1993), “Variations in economic uncertainty and risk premiums on capital assets.” European Economic Review, 37, 1021–1041. 79 Gilboa, Itzhak and Massimo Marinacci (2013), “Ambiguity and the Bayesian paradigm.” In Advances in Economics and Econometrics (Daron Acemoglu, Manuel Arellano, and Eddie Dekel, eds.), volume I, 179–242, Cambridge University Press. 
Gilboa, Itzhak and Larry Samuelson (2012), “Subjectivity in inductive inference.” Theoretical Economics, 7, 183–215.
Gilboa, Itzhak and David Schmeidler (1989), “Maxmin expected utility with non-unique prior.” Journal of Mathematical Economics, 18, 141–153.
Gilboa, Itzhak and David Schmeidler (2010), “Simplicity and likelihood: An axiomatic approach.” Journal of Economic Theory, 145, 1757–1775.
Good, Irving J. and Ray A. Gaskins (1971), “Nonparametric roughness penalties for probability densities.” Biometrika, 58, 255–277.
Green, Peter J. (1987), “Penalized likelihood for general semi-parametric regression models.” International Statistical Review, 55, 245–259.
Hansen, Lars Peter and Thomas J. Sargent (2011), “Robustness and ambiguity in continuous time.” Journal of Economic Theory, 146, 1195–1223.
He, Hua and Neil D. Pearson (1991), “Consumption and portfolio policies with incomplete markets and short-sale constraints: The infinite dimensional case.” Journal of Economic Theory, 54, 259–304.
Heaton, John and Deborah Lucas (2000), NBER Macroeconomics Annual 1999, Volume 14, chapter Stock Prices and Fundamentals, 213–264. MIT.
Hernández-Hernández, Daniel and Alexander Schied (2006), “Robust utility maximization in a stochastic factor model.” Statistics & Decisions, 24, 109–125.
Hernández-Hernández, Daniel and Alexander Schied (2007a), “A control approach to robust utility maximization with logarithmic utility and time-consistent penalties.” Stochastic Processes and their Applications, 117, 980–1000.
Hernández-Hernández, Daniel and Alexander Schied (2007b), “Robust maximization of consumption with logarithmic utility.” Proceedings of the 2007 American Control Conference, 1120–1123.
Illeditsch, Philipp K. (2011), “Ambiguous information, portfolio inertia, and excess volatility.” Journal of Finance, 66, 2213–2247.
Ilut, Cosmin L. and Martin Schneider (2014), “Ambiguous business cycles.” American Economic Review, 104, 2368–2399.
Jagannathan, Ravi, Ellen R. McGrattan, and Anna Scherbina (2000), “The declining U.S. equity premium.” Federal Reserve Bank of Minneapolis Quarterly Review, 24, 3–19.
Kalymon, Basil A. (1971), “Estimation risk and the portfolio selection model.” Journal of Financial and Quantitative Analysis, 6, 559–582.
Karatzas, Ioannis, John P. Lehoczky, Steven E. Shreve, and Gan-Lin Xu (1991), “Martingale and duality methods for utility maximization in an incomplete market.” SIAM Journal on Control and Optimization, 29, 702–730.
Karatzas, Ioannis and Steven E. Shreve (1988), Brownian Motion and Stochastic Calculus. Springer-Verlag.
Kim, Tong Suk and Edward Omberg (1996), “Dynamic nonmyopic portfolio behavior.” Review of Financial Studies, 9, 141–161.
Klein, Roger W. and Vijay S. Bawa (1976), “The effect of estimation risk on optimal portfolio choice.” Journal of Financial Economics, 3, 215–231.
Klein, Roger W. and Vijay S. Bawa (1977), “The effect of limited information and estimation risk on optimal portfolio diversification.” Journal of Financial Economics, 5, 89–111.
Koijen, Ralph S.J. and Stijn van Nieuwerburgh (2011), “Predictability of returns and cash flows.” Annual Review of Financial Economics, 3, 467–491.
Konishi, Sadanori and Genshiro Kitagawa (2008), Information Criteria and Statistical Modeling. Springer.
Lakner, Peter (1998), “Optimal trading strategy for an investor: The case of partial information.” Stochastic Processes and their Applications, 76, 77–97.
Lettau, Martin, Sydney C. Ludvigson, and Jessica A. Wachter (2008), “The declining equity premium: What role does macroeconomic risk play?” Review of Financial Studies, 21, 1653–1687.
Liptser, Robert S. and Albert N. Shiryaev (1977), Statistics of Random Processes (Volumes I and II). Springer-Verlag.
Liu, Hening (2011), “Dynamic portfolio choice under ambiguity and regime switching mean returns.” Journal of Economic Dynamics & Control, 35, 623–640.
Liu, Hening (2013), “Optimal consumption and portfolio choice under ambiguity for a mean-reverting risk premium in complete markets.” Annals of Economics and Finance, 14, 21–52.
Merton, Robert C. (1973), “An intertemporal capital asset pricing model.” Econometrica, 41, 867–887.
Merton, Robert C. (1980), “On estimating the expected return on the market: An exploratory investigation.” Journal of Financial Economics, 8, 323–361.
Miao, Jianjun (2009), “Ambiguity, risk and portfolio choice under incomplete information.” Annals of Economics and Finance, 10, 257–279.
Miao, Jianjun and Neng Wang (2011), “Risk, uncertainty, and option exercise.” Journal of Economic Dynamics & Control, 35, 442–461.
Milgrom, Paul and Ilya Segal (2002), “Envelope theorems for arbitrary choice sets.” Econometrica, 70, 583–601.
Pástor, Ľuboš and Robert F. Stambaugh (2001), “The equity premium and structural breaks.” Journal of Finance, 56, 1207–1239.
Poterba, James M. and Lawrence H. Summers (1988), “Mean reversion in stock prices: Evidence and implications.” Journal of Financial Economics, 22, 27–59.
Rogers, L. C. G. and David Williams (1994), Diffusions, Markov Processes, and Martingales, Volumes 1 and 2. Cambridge University Press.
Rossi, Alberto G. and Allan Timmermann (2015), “Modeling covariance risk in Merton’s ICAPM.” Review of Financial Studies, 28, 1428–1461.
Routledge, Bryan R. and Stanley E. Zin (2009), “Model uncertainty and liquidity.” Review of Economic Dynamics, 12, 543–566.
Sbuelz, Alessandro and Fabio Trojani (2008), “Asset prices with locally constrained-entropy recursive multiple-priors utility.” Journal of Economic Dynamics & Control, 32, 3695–3717.
Schied, Alexander (2008), “Robust optimal control for a consumption-investment problem.” Mathematical Methods of Operations Research, 67, 1–20.
Schmeidler, David (1989), “Subjective probability and expected utility without additivity.” Econometrica, 57, 571–587.
Trojani, Fabio and Paolo Vanini (2004), “Robustness and ambiguity aversion in general equilibrium.” Review of Finance, 8, 279–324.
van Binsbergen, Jules H. and Ralph S. J. Koijen (2010), “Predictive regressions: A present-value approach.” Journal of Finance, 65, 1439–1471.
Veronesi, Pietro (2000), “How does information quality affect stock returns?” Journal of Finance, 55, 807–837.
Welch, Ivo and Amit Goyal (2008), “A comprehensive look at the empirical performance of equity premium prediction.” Review of Financial Studies, 21, 1455–1508.
Xia, Yihong (2001), “Learning about predictability: The effects of parameter uncertainty on dynamic asset allocation.” Journal of Finance, 56, 205–246.
Zohar, Gady (2001), “A generalized Cameron-Martin formula with applications to partially observed dynamic portfolio optimization.” Mathematical Finance, 11, 475–494.

Supplementary Appendix to Learning under Ambiguity, Portfolio Choice, and Asset Returns
Hongseok Choi∗
September 22, 2015

SA.1 Proof of the Claim on Page 5

Here I briefly review the related model of learning by Epstein and Schneider (2007), focusing on their portfolio choice example, and show that, as claimed in the introduction, the continuous-time counterpart of their example results in no learning because the likelihood function degenerates to infinity everywhere.

SA.1.1 The Model

Begin with the exchangeable Bayesian model of binary returns. There is a stock for which there are $d$ trading days per month. The likelihood function for the net rate of return between two consecutive trading days is[1]

$$L\left(\Delta R(t) = \pm\frac{\sigma_R}{\sqrt{d}} \,\middle|\, \bar{x}\right) = \frac{1}{2} \pm \frac{1}{2}\,\frac{\bar{x}}{\sigma_R\sqrt{d}},$$

where $\sigma_R > 0$ and the monthly expected return $\bar{x} \in \mathbb{R}$ is the parameter of interest. A Bayesian agent would have a unique parameter prior $M$.[2] Epstein and Schneider’s agent, on the other hand, entertains multiple parameter priors $M \in \mathcal{M}$ and multiple likelihoods $L \in \mathcal{L}$. The parameter priors are all Dirac measures.
The likelihoods are given by

$$L\left(\Delta R(t) = \pm\frac{\sigma_R}{\sqrt{d}} \,\middle|\, \bar{x}, \eta(t)\right) = \frac{1}{2} \pm \frac{1}{2}\,\frac{\bar{x} + \eta(t)}{\sigma_R\sqrt{d}}, \qquad |\eta(t)| \le \bar{\eta}, \tag{SA.1}$$

for some $\bar{\eta} < \infty$, so that at each trading date $t$, any value of $\eta(t)$ with $|\eta(t)| \le \bar{\eta}$ could be the case.

Having observed the returns up to trading date $t > 0$, the agent rules out theories with low likelihood[3] and Bayes-updates the remaining ones to obtain the set of one-step-ahead conditionals: denoting the time-$t$ log-likelihood function of theories by $\ell_t(\bar{x}, \eta)$, the set of one-step-ahead conditionals is given by

$$\left\{ L(\,\cdot \mid \bar{x}, \eta(t+1)) : |\eta(t+1)| \le \bar{\eta} \ \text{and} \ \max_{\eta'} \ell_t(\bar{x}, \eta') \ge \max_{\bar{x}', \eta'} \ell_t(\bar{x}', \eta') - \alpha \right\}, \tag{SA.2}$$

where $\alpha \ge 0$ is a primitive.

SA.1.2 The Likelihood Function in Continuous Time

The continuous-time return process implied by (SA.1) is

$$dR(t) = (\bar{x} + \eta(t))\,dt + \sigma_R\,dw(t)$$

with $|\eta(t)| \le \bar{\eta}$ for all $t \ge 0$.[4] The log-likelihood of $(\bar{x}, \eta)$ is (see Proposition 2.3)

$$\int_0^T (\bar{x} + \eta(t))\,\sigma_R^{-2}\,dR(t) - \frac{1}{2}\int_0^T \left[\sigma_R^{-1}(\bar{x} + \eta(t))\right]^2 dt.$$

But the sequence $\{\eta^\nu : \nu = 1, 2, \ldots\}$ defined by

$$\eta^\nu(t) \triangleq \begin{cases} 0 & \text{if } t = 0, \\ +\bar{\eta} & \text{if } t \in (t_i^\nu, t_{i+1}^\nu] \text{ and } R(t_{i+1}^\nu) - R(t_i^\nu) > 0, \\ -\bar{\eta} & \text{if } t \in (t_i^\nu, t_{i+1}^\nu] \text{ and } R(t_{i+1}^\nu) - R(t_i^\nu) \le 0, \end{cases}$$

where $\{t_1^\nu, \ldots, t_\nu^\nu\}$ is the $\nu$th dyadic partition of $[0, T]$, will make the integral

$$\int_0^T \eta^\nu(t)\,dR(t)$$

diverge to the infinite variation of $R$.[5] Epstein and Schneider could circumvent this problem because the set of one-step-ahead conditionals (SA.2) is well-defined for all trading frequencies $d$ and converges; but if time is continuous at the outset, such a circumvention is not possible.

∗ University of Pennsylvania, aitch.choi@gmail.com.

[1] Epstein and Schneider consider log returns, but the difference is inconsequential.

[2] See Section 2.2.2.

[3] See Section 2.3.3. In contrast to the present paper, Epstein and Schneider’s paper does not speak of penalization, but the restriction “$|\eta(t)| \le \bar{\eta}$ for all $t$” is equivalent to the penalty on the log-likelihood that takes the value of zero when the restriction is met and infinity otherwise.

[4] Let $d \to \infty$, and observe that the mean of returns is

$$\frac{\sigma_R}{\sqrt{d}} \cdot \frac{\bar{x} + \eta}{\sigma_R\sqrt{d}} = \frac{\bar{x} + \eta}{d} = (\bar{x} + \eta)\,dt$$

and the variance of returns is

$$\left(\frac{\sigma_R}{\sqrt{d}} - \frac{\bar{x} + \eta}{d}\right)^2 \left(\frac{1}{2} + \frac{1}{2}\,\frac{\bar{x} + \eta}{\sigma_R\sqrt{d}}\right) + \left(-\frac{\sigma_R}{\sqrt{d}} - \frac{\bar{x} + \eta}{d}\right)^2 \left(\frac{1}{2} - \frac{1}{2}\,\frac{\bar{x} + \eta}{\sigma_R\sqrt{d}}\right) \approx \frac{\sigma_R^2}{d} = \sigma_R^2\,dt.$$

[5] The discrete-time partial maximum-likelihood estimate also alternates between the extreme values $\pm\bar{\eta}$, and the corresponding profile likelihood function (the likelihood function with $\eta$ replaced by the partial maximizer) becomes degenerate as the trading dates become infinitely frequent. See Epstein and Schneider (2007), Supplementary Appendix, Proposition S1.

SA.2 Proof of the Claim on Page 15

Here I prove the claim on page 15 that the induced log-likelihood function of $m_{\bar{x},\eta}(t)$ without penalty, namely $\ell_{t,m(t)}$, is constant. Familiarity with Section 2.3.3, including the results given after page 15, is assumed.

Fix $t > 0$. Let the induced log-likelihood functions $\ell_{t,m(t)}$ and $\ell^{\lambda}_{t,m(t)}$ take values in $\mathbb{R} \cup \{\infty\}$. By Lemma 2.5, the curvature of $\ell^{\lambda}_{t,m(t)}$ is given by $\delta(t)^{-1}$. Since $\delta(t)^{-1} \downarrow 0$ as $\lambda \downarrow 0$ (see (26) and the proof of Proposition 3.2), it follows that $\ell^{\lambda}_{t,m(t)}$ converges to a constant function as $\lambda \downarrow 0$. Now, the constancy of $\ell_{t,m(t)}$ follows from the following lemma:

Lemma SA.2.1. $\ell_{t,m(t)}(m) = \lim_{\lambda \downarrow 0} \ell^{\lambda}_{t,m(t)}(m)$ for all $m \in \mathbb{R}^{n_x}$.

Proof. Fix $m \in \mathbb{R}^{n_x}$. Begin with the trivial observation

$$\ell_t(\bar{x}, \eta) - \frac{\lambda}{2}\int_0^t |\eta(s)|^2\,ds \le \ell_t(\bar{x}, \eta).$$

Take $\sup_{\bar{x},\eta}$ with the constraint $m_{\bar{x},\eta}(t) = m$ and then $\lim_{\lambda \downarrow 0}$ to see

$$\lim_{\lambda \downarrow 0} \ell^{\lambda}_{t,m(t)}(m) \le \ell_{t,m(t)}(m).$$

Next,

$$\ell_t(\bar{x}, \eta) - \frac{\lambda}{2}\int_0^t |\eta(s)|^2\,ds \le \ell^{\lambda}_{t,m(t)}(m), \qquad \eta \text{ is such that } m_{\bar{x},\eta}(t) = m,$$

$$\ell_t(\bar{x}, \eta) \le \lim_{\lambda \downarrow 0} \ell^{\lambda}_{t,m(t)}(m), \qquad \eta \text{ is such that } m_{\bar{x},\eta}(t) = m,$$

and $\ell_{t,m(t)}(m) \le \lim_{\lambda \downarrow 0} \ell^{\lambda}_{t,m(t)}(m)$.

SA.3 Proof of the Claim on Page 25

This section proves the following claim made at the end of the discussion of the negative comovement between $\delta$ and $\sigma$:

Lemma SA.3.1. Assume finite confidence $\lambda < \infty$ for nondegeneracy.
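Suppose further that $n_y = n_x = 1$; $b$ and $\sigma$ are constant; $b > 0$ and $\rho_w < 0$; and finally

$$\sigma = -\frac{b}{2\kappa\rho_w}\left(\rho_w^2 + (1 + \lambda^{-1})\rho_v^2\right) > 0. \tag{SA.3}$$

Then, in the limit $t \to \infty$, the weight on the innovation $\rho_w + \sigma^{-1}b(\gamma(t) + \delta(t))$ is zero, and $\delta(t)$ is larger than $\lambda^{-1}\rho_v^2/2\kappa$.

Proof. Since $\sigma^{-1}b$ is constant, $\gamma$ and $\delta$ converge to constants, which I will denote again by $\gamma$ and $\delta$ (see the proof of Proposition 3.2; $\gamma(t) = \gamma^{\infty}(t)$). Thus, the question is if there exist positive numbers $\gamma$, $\delta$, and $\sigma$ that solve the system of equations

$$0 = \rho_w^2 + \rho_v^2 - 2\kappa\gamma - (\rho_w + \sigma^{-1}b\gamma)^2, \tag{SA.4}$$

$$0 = (\rho_w + \sigma^{-1}b\gamma)^2 - 2\kappa\delta + \lambda^{-1}\rho_v^2, \tag{SA.5}$$

and

$$0 = \rho_w + \sigma^{-1}b(\gamma + \delta). \tag{SA.6}$$

Sum (SA.4) and (SA.5), and solve the resulting equation for $\gamma + \delta$ to obtain

$$\gamma + \delta = \frac{\rho_w^2 + (1 + \lambda^{-1})\rho_v^2}{2\kappa}. \tag{SA.7}$$

Plugging (SA.7) into (SA.6) we obtain (SA.3). Next, solve (SA.4), together with (SA.3), for $\gamma$ to arrive at

$$\gamma = \frac{\rho_w^2 + (1 + \lambda^{-1})\rho_v^2}{4\kappa\rho_w^2}\left[\rho_w^2 - (1 + \lambda^{-1})\rho_v^2 + \sqrt{\left(\rho_w^2 - (1 + \lambda^{-1})\rho_v^2\right)^2 + 4\rho_w^2\rho_v^2}\,\right] > 0.$$

Finally, from (SA.7),

$$f(\lambda^{-1}) \triangleq \delta(\lambda^{-1}) - \frac{\lambda^{-1}\rho_v^2}{2\kappa} = \frac{\rho_w^2 + \rho_v^2}{2\kappa} - \gamma(\lambda^{-1}). \tag{SA.8}$$

It only remains to show that $f(\lambda^{-1}) > 0$ for all $\lambda^{-1} > 0$. First of all, $f(0) = \delta(0) - 0 = 0$. Note then, from (SA.8),

$$\frac{\partial f}{\partial(\lambda^{-1})} = \rho_v^2\,\gamma \times \left[\frac{1}{\sqrt{\left(\rho_w^2 - (1 + \lambda^{-1})\rho_v^2\right)^2 + 4\rho_w^2\rho_v^2}} - \frac{1}{\rho_w^2 + (1 + \lambda^{-1})\rho_v^2}\right].$$

Since

$$0 < \left(\rho_w^2 - (1 + \lambda^{-1})\rho_v^2\right)^2 + 4\rho_w^2\rho_v^2 = \left(\rho_w^2 + (1 + \lambda^{-1})\rho_v^2\right)^2 - 4\lambda^{-1}\rho_w^2\rho_v^2 < \left(\rho_w^2 + (1 + \lambda^{-1})\rho_v^2\right)^2,$$

$\partial f/\partial(\lambda^{-1}) > 0$ for all $\lambda^{-1} > 0$ and the claim follows.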
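The algebra in the proof of Lemma SA.3.1 can be spot-checked numerically. The Python sketch below is not part of the paper: the function names (`steady_state`, `residuals`) and the parameter values are illustrative assumptions. It evaluates the closed-form expressions for $\sigma$, $\gamma$, and $\delta$ given by (SA.3), the solution for $\gamma$, and (SA.7), and then computes the residuals of (SA.4) to (SA.6), which should vanish up to rounding error.

```python
import math


def steady_state(rho_w, rho_v, kappa, b, lam_inv):
    """Closed-form limits from the proof of Lemma SA.3.1 (scalar case).

    lam_inv stands for lambda^{-1}. All argument names are illustrative.
    """
    a_plus = rho_w**2 + (1.0 + lam_inv) * rho_v**2    # equals 2*kappa*(gamma + delta), by (SA.7)
    a_minus = rho_w**2 - (1.0 + lam_inv) * rho_v**2
    root = math.sqrt(a_minus**2 + 4.0 * rho_w**2 * rho_v**2)
    sigma = -b * a_plus / (2.0 * kappa * rho_w)        # (SA.3)
    gamma = a_plus * (a_minus + root) / (4.0 * kappa * rho_w**2)
    delta = a_plus / (2.0 * kappa) - gamma             # (SA.7)
    return sigma, gamma, delta


def residuals(rho_w, rho_v, kappa, b, lam_inv):
    """Right-hand sides of (SA.4), (SA.5), (SA.6) at the closed-form solution."""
    sigma, gamma, delta = steady_state(rho_w, rho_v, kappa, b, lam_inv)
    gain = rho_w + b * gamma / sigma
    r4 = rho_w**2 + rho_v**2 - 2.0 * kappa * gamma - gain**2
    r5 = gain**2 - 2.0 * kappa * delta + lam_inv * rho_v**2
    r6 = rho_w + b * (gamma + delta) / sigma
    return r4, r5, r6


if __name__ == "__main__":
    # Hypothetical parameter values satisfying b > 0 and rho_w < 0.
    params = dict(rho_w=-0.3, rho_v=0.2, kappa=0.5, b=1.0, lam_inv=2.0)
    sigma, gamma, delta = steady_state(**params)
    bound = params["lam_inv"] * params["rho_v"] ** 2 / (2.0 * params["kappa"])
    print("sigma, gamma, delta:", sigma, gamma, delta)
    print("residuals of (SA.4)-(SA.6):", residuals(**params))
    print("delta exceeds lambda^{-1} rho_v^2 / (2 kappa):", delta > bound)
```

A check of this kind covers only the chosen parameter values and does not replace the monotonicity argument for $f$; it is meant to catch sign or transcription slips in the formulas.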