Learning under Ambiguity, Portfolio Choice, and Asset Returns
Hongseok Choi∗
December 15, 2015
Job Market Paper
Abstract
This paper investigates the effects of learning under ambiguity on investment strategies and asset returns. The main contributions of the paper are
twofold. First, it recognizes and quantifies the uncertainty involved in estimating the data-generating mechanism, or estimation ambiguity. I consider the
consumption/portfolio choice problem of a multiple-priors investor with logarithmic felicity and observe that ignoring estimation ambiguity can result in
a significant underestimation of hedging demand: even with only a moderate
degree of ambiguity in the equity premium (1 percentage point in annual terms), a log investor learning about the data-generating mechanism can sell short an amount
of the risky asset worth half of her wealth when the estimated equity premium
is zero. The second contribution of the paper is that it provides endogenous
dynamics for the conditional degree of ambiguity, and consequently for the ambiguity premium. I observe that the resulting relationship of the total equity
premium with return volatility is unclear. The ambiguity premium depends on
the history of return volatility rather than its contemporaneous value. Furthermore, a period of low (high) return volatility can instead be followed by a high
(low) ambiguity premium.
Keywords: ambiguity, learning, state-space models, recursive multiple-priors
utility, model selection, hedging demand, equity premium.
∗ University of Pennsylvania, aitch.choi@gmail.com. This paper is a revised version of my doctoral
dissertation at the University of Pennsylvania (Choi, 2012), and I am deeply indebted to my advisor,
Domenico Cuoco, for his guidance and support throughout the course of this research. I would also
like to thank Philipp Illeditsch and Dirk Krueger; and the seminar participants at Seoul National
University, Korea University Business School, the 11th Annual Conference of the Asia-Pacific Association of Derivatives (especially Sun-Joong Yoon), Korea University Mathematics Department, the
Korea Institute of Finance, and Yonsei University.
1 Introduction
In standard financial models, agents are depicted as having a unique probabilistic
model regarding uncertain investment opportunities. However, our knowledge about
the data-generating mechanism—for example, how stock returns are generated—is
often limited, and in such cases it is difficult to assess the probabilities of all uncertain
events precisely; in other words, we face ambiguity. One of the most popular ways to
model decision makers under ambiguity is through Gilboa and Schmeidler’s (1989)
multiple-priors utility, and over the past few decades multiple-priors models have
successfully demonstrated the economic significance of taking ambiguity into account
by shedding light on a number of puzzling phenomena in financial markets.1
In most dynamic multiple-priors models, however, the degree of ambiguity as
perceived by an agent follows an exogenous process, whether constant or time-varying.
In a dynamic multiple-priors model, an agent’s view about the uncertain outcomes of
the immediate future is represented by a set of one-step-ahead conditionals as opposed
to a single one-step-ahead conditional;2 and the size of the set of one-step-ahead
conditionals is thought to represent the conditional degree of ambiguity. Some authors
have assumed that the size of the set of one-step-ahead conditionals follows a mean-reverting process;3 a larger number have assumed that it stays constant, interpreting the
constancy as lack of learning due to the agent’s having learned all that she can.4 While
these models of ambiguity provide useful benchmarks, they are not ideal, essentially
being reduced-form models.
In this paper, I endogenize the conditional degree of ambiguity and its response
to observations by constructing a model of learning under ambiguity. The underlying idea is familiar and simple: fearing misspecification, the agent considers multiple
Bayesian models, or theories, of the market, and at each instant she constructs the set
of one-step-ahead conditionals by updating the theories with sufficiently high plausibility. The set of theories is constructed around a “reference theory” that data are
generated by a continuous-time linear state-space model. The agent, however, is not
bold enough to claim that she has identified the model of the market, and seeks a
degree of robustness by expanding her consideration to a class of theories that are
statistically close to the reference theory in the sense of mutual absolute continuity.
These theories are not as intuitive and simple as the reference theory, but, since they
1 Among other examples: stock market nonparticipation (Dow and Werlang, 1992), excess volatility (Epstein and Wang, 1994; Illeditsch, 2011), excess equity premium (Chen and Epstein, 2002),
and equity home bias (Epstein and Miao, 2003).
2 Let $\mathcal{G}_0, \mathcal{G}_1, \dots, \mathcal{G}_T$ be the agent’s observation filtration. Given a prior $P$ on $\mathcal{G}_T$, the implied
time-$t$ one-step-ahead conditional means the restriction of $P(\cdot \mid \mathcal{G}_t)$ to $\mathcal{G}_{t+1}$.
3 See Sbuelz and Trojani (2008), Drechsler (2013), and Ilut and Schneider (2014).
4 See Hernández-Hernández and Schied (2006, 2007a,b), Schied (2008), Miao (2009), Routledge
and Zin (2009), and Liu (2011, 2013) for applications of constant ambiguity to portfolio choice, and
Chen and Epstein (2002), Epstein and Miao (2003), Trojani and Vanini (2004), and Gagliardini
et al. (2009) for those to asset pricing.
all agree with the reference theory on what events are and are not possible, it is difficult to tell them apart from each other even after a long period of observation. Now,
a key feature of the present paper is that the agent is unable to assign second-order
probabilities over the theories. However complicated a Bayesian hierarchy may be, as
long as it is complete the agent can tell the probabilities of all conceivable events, and
the objective of this paper is precisely to consider a case in which she cannot. Consequently, the agent instead constructs a “confidence set” of theories by comparing
their plausibility.
The main contributions of the paper are twofold.
First, it recognizes and quantifies the uncertainty involved in estimating the data-generating mechanism. In this respect, the paper is also closely related to the Bayesian
literature on estimation risk (Kalymon, 1971; Barry, 1974; Klein and Bawa, 1976,
1977). In Bayesian models, agents may not know the exact values of certain parameters, but they still have a prior over the parameter space, which is updated as they
make observations. The dispersion of the posterior distribution is known as estimation
risk. In contrast, in the present paper the agent not only faces estimation risk under
each theory but also faces uncertainty in estimating the unknown data-generating
mechanism, or estimation ambiguity.
The economic significance of estimation ambiguity comes from the fact that in
revising their estimates, agents place a greater weight on new evidence the more unreliable their current estimates are. This means a higher local correlation between the return
and the state variable (of the control problem) and consequently higher hedging demand. In other words, ignoring estimation ambiguity can result in an underestimation
of hedging demand, as I demonstrate in Section 4.
In Section 4, I solve, up to a value function, the consumption/portfolio choice
problem of a multiple-priors investor with logarithmic felicity. After calibrating the
model to U.S. stock market data (Barberis, 2000), I numerically compute the optimal
Markov policy and observe that even with only a moderate degree of ambiguity in
the equity premium (1 percentage point in annual terms), an investor learning about the data-generating mechanism can sell short an amount of the risky asset worth half of her
wealth when the estimated equity premium is zero. This is in stark contrast with both
the Bayesian result that log investors are myopic, in which case an absence of an
equity premium implies no investment in the risky asset (see, for example, Kim and
Omberg, 1996) and the static multiple-priors result that ambiguity aversion deters
investors (with a general felicity function) from taking a position in the risky asset for
a range of prices (Dow and Werlang, 1992). Multiple-priors log investors’ nonmyopic
behavior was first observed in discrete time by Epstein and Schneider (2007) and in
continuous time by Hernández-Hernández and Schied (2007a).
The second contribution of the paper is that, as I have emphasized, it provides
endogenous dynamics for conditional ambiguity. The economic significance of these
dynamics comes from the fact that conditional ambiguity is directly reflected in the
equilibrium equity premium; that is, the equity premium includes a reward for bearing
ambiguity, or ambiguity premium, as well as the risk premium (Chen and Epstein, 2002). Specifically, in Section 5, I consider a Lucas economy populated by a
multiple-priors agent with logarithmic felicity and explore a few theoretical possibilities regarding the dynamics and asymptotic level of the ambiguity premium.
First, the relationship between the total equity premium and the conditional variance of returns is theoretically, as well as empirically, unclear. Motivated by standard
asset pricing models such as the Intertemporal CAPM of Merton (1973), numerous
empirical papers have investigated the relationship between the two, but the findings
are mixed.5 In the present model, the equity premium is given by the conditional
variance of returns plus the ambiguity premium, and the latter both varies over time
and does not have a deterministic relationship with the former. The ambiguity premium, following an absolutely continuous process, slowly responds to variations in
the conditional variance of returns, depending thus on the recent history of the latter
rather than its contemporaneous value. Furthermore, if expected and realized growth
rates are locally negatively correlated, a period of low (high) return volatility can instead be
followed by a high (low) ambiguity premium.
Second, in the long run, the ambiguity premium exhibits a downward trend, and
consequently so does the total equity premium. This is because, in the present model,
the ambiguity premium is a reward for bearing ambiguity in the expected growth rate,
and the component ambiguity in the long-run growth rate resolves as time passes. The
equity premium has indeed been observed to have declined over the postwar period
(Merton, 1980; Blanchard, 1993; Jagannathan et al., 2000), and part of it could be
due to learning and the accompanying resolution of ambiguity.
Finally, in Section 5.5, I demonstrate that an improvement in the quality of public
information can increase the asymptotic level of the equity premium. In a Bayesian
framework, Veronesi (2000) made a similar observation that higher precision of signals tends to increase the risk premium. What distinguishes the present observation
from Veronesi’s is that his result relies on the representative agent’s being sufficiently
risk-averse, more so than log agents. In his model, a deterioration in the quality of
public information decreases the equity premium because the agent’s hedging demand
tones down the covariation between returns and consumption growth. In contrast, the
present paper shows that the equity premium can exhibit such counterintuitive behavior even under unit risk aversion, which is conventionally associated with myopia.
To the best of my knowledge, the one paper in the existing literature that derives
time variation in the set of one-step-ahead conditionals from a model of learning is
Epstein and Schneider (2007).6 In that paper, too, the predictive set is constructed
by a statistical test over multiple theories. The main differences between Epstein and
5 See the introduction of Rossi and Timmermann (2015).
6 Campanale (2011) applies Epstein and Schneider’s model in the context of life-cycle portfolio
choice, and Miao and Wang (2011), in the context of job matching. Epstein and Schneider’s (2008)
asset pricing model, too, conforms to the formalism of Epstein and Schneider (2007), but the agents
of the 2008 paper do not rule out any of the theories they a priori entertain.
Schneider’s paper and mine are twofold. First, whereas Epstein and Schneider consider
memoryless data-generating mechanisms, I consider those with serial dependence. The
latter consideration clearly complements the former. For example, with the present
model, we can study the effects of learning under ambiguity when stock returns or the
growth of consumption/dividends is (ambiguously) predictable (Sections 4 and 5).7
Second, whereas Epstein and Schneider set their model in discrete time, I set mine
in continuous time. Continuous-time modeling is known for its analytical
tractability. More importantly, I note that the continuous-time
counterpart of Epstein and Schneider’s portfolio choice example results in no learning
because the likelihood function degenerates to infinity everywhere.8 Consequently,
their discrete-time finding that learning resolves ambiguity does not immediately
carry over to continuous time: learning under ambiguity in continuous time needs
separate treatment.
Another paper that considers learning in the context of multiple-priors utility
is Miao (2009). Specifically, he considers the consumption/portfolio choice problem
of a multiple-priors investor in continuous time who partially observes stochastic
investment opportunities. However, his notion of learning is fundamentally different
from mine. Miao’s investor obtains a benchmark predictive measure by updating a
reference theory, and the set of predictive measures is given by a neighborhood of
the benchmark with a fixed radius. Thus, learning and ambiguity do not interact. In
fact, Miao’s model is the limit of the present model as the investor gains confidence
(Section 4.4.3).9
See also Hansen and Sargent (2011), Chen et al. (2014), and the references therein.
The former paper considers learning in the context of robust control, and the latter
in the context of smooth ambiguity. In both papers, agents learn by updating a
Bayesian model. Chen et al.’s agent, in particular, entertains a standard hierarchical
Bayesian model. In a smooth ambiguity model, aversion to ambiguity is captured not
by imprecise probabilities but by failures to reduce compound lotteries.
The rest of the paper is organized as follows. Section 2 defines and solves the
model of learning under ambiguity. Section 3 discusses the results. Section 4 applies
the model to portfolio choice; Section 5, to asset pricing. All proofs are collected in
the appendix.
7 There indeed is convincing evidence that both returns and growth rates are predictable (van
Binsbergen and Koijen, 2010). For a review of the predictability literature, see Koijen and van
Nieuwerburgh (2011).
8 See the supplementary appendix.
9 Liu (2011) considers the consumption/portfolio choice problem of a Miao investor when expected
returns follow a Markov chain.
2 Learning under Ambiguity
This section defines and solves the model of learning. Section 2.1 describes the agent’s
utility function and Section 2.2 her theories. Section 2.3 then applies the learning
mechanism outlined in the introduction to the setting; this amounts to mapping the
theories to the priors in the representation of preferences.
2.1 Preferences
The agent has continuous-time recursive multiple-priors utility with equivalent priors
as formulated by Chen and Epstein (2002).
Specifically, time is continuous and varies over $[0, T]$, $T \in (0, \infty)$.
Let $\Omega$ denote the set of states of Nature, and let a filtration $\mathbb{G} = \{\mathcal{G}_t\}$ on $\Omega$
represent the accrual of the agent’s information.
There is a set $\mathcal{P}$ of equivalent probability measures on $(\Omega, \mathcal{G}_T)$, the set of priors,
with the following properties: Let $P^0 \in \mathcal{P}$. There is an $n_y$-dimensional Wiener process
$W = \{W(t), \mathcal{G}_t\}$ under $P^0$ such that $\mathbb{G}$ is the augmentation under $P^0$ of the filtration
generated by $W$. (The notation $W = \{W(t), \mathcal{G}_t\}$ signifies that the process $W$ is adapted
to the filtration $\mathbb{G}$. All vectors, including the gradient $\partial f$ of a scalar function $f$, are
column vectors. Hence, $W(t) = (W_1(t), \dots, W_{n_y}(t))^\top$.) Each prior is identified with the
corresponding density generator $\xi = \{\xi(t), \mathcal{G}_t\}$ and is thus written as $P^\xi$, where
$$\frac{dP^\xi}{dP^0} = \mathcal{E}^\xi(T) \tag{1}$$
and $\mathcal{E}^\xi$ denotes the Doléans-Dade exponential
$$\mathcal{E}^\xi(t) \triangleq \exp\left\{ \int_0^t \xi(s)^\top\, dW(s) - \frac{1}{2} \int_0^t |\xi(s)|^2\, ds \right\}, \quad 0 \le t \le T.$$
Then, under $P^\xi$, $W^\xi$ defined by $dW^\xi(t) = dW(t) - \xi(t)\, dt$, $W^\xi(0) = 0$, is a Wiener process.
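The change of measure in (1) can be checked numerically in the simplest case. The following is a minimal Monte Carlo sketch, assuming a scalar Wiener process and a constant density generator; the values of `T`, `xi`, and `n_paths` are hypothetical and chosen only for illustration. It checks that the density averages to one under the reference prior and that the Wiener process acquires mean $\xi T$ under $P^\xi$.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is reproducible
T, xi, n_paths = 1.0, 0.5, 200_000      # hypothetical horizon, constant generator, sample size

# Terminal value of a scalar Wiener process under the reference prior P^0.
W_T = rng.normal(0.0, np.sqrt(T), size=n_paths)

# Doleans-Dade exponential for a constant generator: exp(xi*W_T - 0.5*xi^2*T).
density = np.exp(xi * W_T - 0.5 * xi**2 * T)

# The density should average to 1, since P^xi is a probability measure.
mean_density = density.mean()

# Reweighting by the density gives expectations under P^xi,
# where the process has drift xi, so its terminal mean is xi*T.
mean_W_under_xi = (density * W_T).mean()

print(mean_density, mean_W_under_xi)
```

With a time-varying or vector-valued generator, the same check would require simulating whole paths rather than terminal values.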
$\mathcal{P}$ is further required to be rectangular, which means that there is a set-valued
process $\Xi : [0, T] \times \Omega \to 2^{\mathbb{R}^{n_y}}$ such that $P^\xi$ defined by (1) is a prior if and only if $\xi$ is
progressive and $\xi(t, \omega) \in \Xi(t, \omega)$ for Lebesgue$\times P^0$ almost every $(t, \omega)$. Since $\mathcal{P}$ consists
of equivalent measures, “for Lebesgue$\times P^0$ almost every $(t, \omega)$” is henceforth abbreviated without obscurity to “a.e.” $\Xi$ is called the one-step-ahead beliefs process10 and
is further required to be (i) uniformly bounded, (ii) compact-convex-valued, and (iii)
“progressive”: (i) $\Xi(t, \omega) \subset K$ a.e. for some bounded $K \subset \mathbb{R}^{n_y}$, (ii) $\Xi(t, \omega)$ is compact-convex a.e., and (iii) the restriction of $\Xi$ to $[0, t] \times \Omega$ is $\mathcal{B}[0, t] \otimes \mathcal{G}_t$-measurable11 for all
$t \in [0, T]$, where $\mathcal{B}X$ denotes the Borel $\sigma$-algebra of a topological space $X$. Incidentally,
10 Since I work with continuous time, “one-step-ahead” is an abuse of language, which can be remedied, if need be, by introducing infinitesimal generators. See, for example, Anderson et al. (2003).
11 $\{(s, \omega) \in [0, t] \times \Omega : \Xi(s, \omega) \cap K' \neq \emptyset\} \in \mathcal{B}[0, t] \otimes \mathcal{G}_t$ for all closed $K' \subset K$. See Aliprantis and
Border (1999), Sections 16.1, 16.2, and 17.1.
constant ambiguity, or independently and indistinguishably distributed (IID) ambiguity (Epstein and Schneider, 2003; Chen and Epstein, 2002), refers to the situation
where $\Xi$ is constant, that is, $\Xi(t, \omega) = K$ a.e. for some compact-convex $K \subset \mathbb{R}^{n_y}$.
A scalar process $c = \{c(t), \mathcal{G}_t\}$ is a consumption process if it is progressive, positive,
and integrable with respect to time. Denote the set of consumption processes by $\mathcal{C}$.
The agent’s conditional preferences at time $t \in [0, T]$ are represented by
$$U^c(t, \omega) = \min_{P \in \mathcal{P}} U^{c,P}(t, \omega), \quad c \in \mathcal{C}, \tag{2}$$
where $U^{c,P} = \{U^{c,P}(s), \mathcal{G}_s\}$, the utility process under $P \in \mathcal{P}$, uniquely solves the
backward stochastic differential equation (BSDE),
$$U^{c,P}(s) = E^P\left[ \int_s^T F(c(\tau), U^{c,P}(\tau))\, d\tau \,\Big|\, \mathcal{G}_s \right], \quad t \le s \le T.$$
Here, F is the aggregator. See Chen and Epstein (2002), Section 2.5, for the conditions
that an aggregator must satisfy.
The utility process $U^c$ defined by (2) solves
$$dU^c(t) = -F(c(t), U^c(t))\, dt + \max_{\xi(t) \in \Xi(t)} \sigma_U^c(t)\, (dW(t) + \xi(t)\, dt), \quad 0 \le t \le T, \tag{3}$$
with terminal condition $U^c(T) = 0$, for some process $\sigma_U^c = \{\sigma_U^c(t), \mathcal{G}_t\}$. When (3) is
viewed as a BSDE, the pair $(U^c, \sigma_U^c)$ constitutes a solution. Interpretation is straightforward. By the assumption that the Wiener process $W$ generates $\mathbb{G}$, all changes in fundamentals to take
place over the next instant are functions of $dW(t)$; (3) then indicates that the agent’s
conditional beliefs about the one-step-ahead uncertainties are summarized by a set
of distributions $\{N(\xi(t)\, dt, I_{n_y}\, dt) : \xi(t) \in \Xi(t)\}$; and as a “pessimist,” the agent
assesses each consumption plan under the distribution that is worst for the plan.
2.2 The Theories
Denote the observable process that generates the agent’s information by $y = \{y(t), \mathcal{G}_t\}$.
That is, $\mathbb{G}$ is the $P^0$-augmentation of the filtration generated by $y$. $n_y \ge 1$ denotes
the dimension of $y$. Examples of $y$ will be given shortly.
In this section, I define the theories that the agent entertains about how $y$ is
generated. They are given by a set of probability measures $Q \in \mathcal{Q}$ on a common
measurable space.
2.2.1 The Reference Likelihood
Let there be a filtration $\mathbb{F} = \{\mathcal{F}_t\}$ on $\Omega$ and a probability measure $Q^{\bar{x},0}$ on $(\Omega, \mathcal{F}_T)$
where $\bar{x} \in \mathbb{R}^{n_x}$, $n_x \ge 1$. $\mathbb{F}$ satisfies the usual conditions with respect to $Q^{\bar{x},0}$. Let there
also be two independent $(Q^{\bar{x},0}, \mathbb{F})$-Wiener processes $w$ and $v^{\bar{x},0}$, $n_y$-dimensional and
$n_x$-dimensional, respectively. Under $Q^{\bar{x},0}$, $y$ satisfies the following system of stochastic
differential equations (SDEs):
$$\begin{aligned} dy(t) &= (a(t, y) + b(t, y) x(t))\, dt + \sigma(t, y)\, dw(t), \\ dx(t) &= \kappa(\bar{x} - x(t))\, dt + \rho_w\, dw(t) + \rho_v\, dv^{\bar{x},0}(t). \end{aligned}$$
Here, $x = \{x(t), \mathcal{F}_t\}$ is an $n_x$-dimensional process that is unobservable to the agent;
$a$, $b$, and $\sigma$ are nonanticipative path functionals from $[0, T] \times C([0, T], \mathbb{R}^{n_y})$ into $\mathbb{R}^{n_y}$,
$\mathbb{R}^{n_y \times n_x}$, and $\mathbb{R}^{n_y \times n_y}$, respectively, where $C([0, T], \mathbb{R}^{n_y})$ denotes the set of continuous
functions from $[0, T]$ into $\mathbb{R}^{n_y}$;12 $\kappa$ is an $n_x \times n_x$ diagonal matrix with positive entries,
$\rho_w$ is an $n_x \times n_y$ matrix, and $\rho_v$ is an $n_x \times n_x$ invertible matrix. Given that $y$ is
observed, the diffusion matrix process $\sigma\sigma^\top$, too, is observed via quadratic variation-covariation. The assumption that $\sigma$ as a nonanticipative path functional depends only
on $y$, or equivalently, $\sigma$ as a process is adapted to $\mathbb{G}$, embodies the restriction that
observing the diffusion matrix does not expand the agent’s information. $x(0)$ is an
$\mathcal{F}_0$-measurable random variable. The distribution of $x(0)$ conditional on $\mathcal{G}_0$ is normal
with mean $m_0 \in \mathbb{R}^{n_x}$ and variance-covariance matrix $\gamma_0 \in \mathbb{R}^{n_x \times n_x}$. For simplicity, I
assume $y(0)$ is nonrandom.
All the parameters and functionals are known but x̄. While this assumption may
seem unrealistic—if the agent knows so much, why not x̄? or vice versa—I point out
that (i) in many special cases considered in the literature, the functionals are simple,
being constant or linear, and (ii) the restrictive form of ignorance is only a first step.
The agent may well find κ ambiguous as well, for example.
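Example 2.1 (Stock Returns with Constant Volatility). Suppose that the cumulative
return process $R$ of a stock satisfies
$$dR(t) = x(t)\, dt + \sigma_R\, dw(t)$$
and $R$ is the only observable process. Then $y = R$ with $a \equiv 0$, $b \equiv 1$, and $\sigma \equiv \sigma_R > 0$.
Example 2.2 (Extra Signal). Continue to assume constant volatility; but now assume
there is an extra signal about the unobservable expected return:
$$\begin{aligned} dR(t) &= x(t)\, dt + \sigma_R \left( \sqrt{1 - r_{RA}^2},\; r_{RA} \right) dw(t), \\ dA(t) &= x(t)\, dt + \sigma_A\, (0, 1)\, dw(t), \end{aligned}$$
where $r_{RA} \in (-1, 1)$ and $\sigma_R, \sigma_A > 0$. Then $y = (R, A)^\top$ with
$$a \equiv \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad b \equiv \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \text{and} \quad \sigma \equiv \begin{pmatrix} \sigma_R \sqrt{1 - r_{RA}^2} & \sigma_R r_{RA} \\ 0 & \sigma_A \end{pmatrix}.$$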
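As a quick sanity check on Example 2.2, the sketch below builds the functionals $a$, $b$, and $\sigma$ for hypothetical parameter values (the numbers are assumptions, not calibrated values from the paper). It confirms that $\sigma\sigma^\top$ has variances $\sigma_R^2$ and $\sigma_A^2$ and that the instantaneous correlation between the return and the signal is $r_{RA}$.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
sigma_R, sigma_A, r_RA = 0.15, 0.10, 0.3

# Functionals of Example 2.2: y = (R, A)^T, scalar hidden state x.
a = np.zeros(2)                 # no observable drift component
b = np.ones((2, 1))             # both R and A load on x with coefficient 1
sigma = np.array([
    [sigma_R * np.sqrt(1.0 - r_RA**2), sigma_R * r_RA],
    [0.0,                              sigma_A],
])

# Instantaneous covariance matrix of dy per unit of time.
cov = sigma @ sigma.T
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

print(cov)
print(corr)
```

The upper-triangular form of $\sigma$ is one convenient factorization of this covariance matrix; the agent observes only $\sigma\sigma^\top$ through the quadratic variation of $y$.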
12 Let $\iota$ be such that $\iota(t, f) = f(t)$, $0 \le t \le T$, $f \in C([0, T], \mathbb{R}^{n_y})$. Let $\mathcal{B}_t \triangleq \sigma(\iota(s) : 0 \le s \le t)$.
Then $a$, $b$, and $\sigma$ are measurable and adapted to $\{\mathcal{B}_t\}$.
Similar examples can be constructed in general equilibrium settings in which, for
example, consumption/dividend growth replaces the stock returns (see Section 5).
Now, the reference likelihood function of the parameter $\bar{x}$ under full observation,
or simply the reference likelihood, is defined by
$$L_{\mathrm{FO},T}(\bar{x}) \triangleq \frac{dQ^{\bar{x},0}}{dQ^{0,0}}.$$
2.2.2 Bayesian Benchmark
Let $M$ be a probability measure on $(\mathbb{R}^{n_x}, \mathcal{B}\mathbb{R}^{n_x})$. Then, $(M, L_{\mathrm{FO},T})$ defines a Bayesian
model of data generation, according to which $\bar{x}$ is drawn from $M$, and conditional
on $\bar{x}$, $(y, x)$ from $L_{\mathrm{FO},T}(\bar{x})$. Incidentally, $M$, too, is typically called the prior, but to
distinguish it from $P \in \mathcal{P}$ (and $Q \in \mathcal{Q}$), I refer to it as the parameter prior.
2.2.3 The Theories
Bayesian agents behave as if they knew the probabilities of all events precisely. The
agent of this paper, in contrast, lacks confidence in her understanding of the environment and finds both the parameter and the likelihood ambiguous.
To elaborate, the agent’s perception of ambiguity regarding $\bar{x}$ is expressed by
multiple parameter priors. For simplicity, I assume that the parameter priors are all
Dirac measures; that is, the set of parameter priors is given by
$$\mathcal{M} = \{\mathrm{Dirac}^{\bar{x}'} : \bar{x}' \in \mathbb{R}^{n_x}\}$$
where $\mathrm{Dirac}^{\bar{x}'}$ denotes the Dirac measure concentrated at $\bar{x}' \in \mathbb{R}^{n_x}$.13
Similarly, the agent entertains multiple likelihoods as well. Fix $\bar{x}$. Let there be a
probability measure $Q^{\bar{x},\eta}$, $\eta \in L^2([0, T], \mathbb{R}^{n_x})$, on $(\Omega, \mathcal{F}_T)$ where $L^2([0, T], \mathbb{R}^{n_x})$ denotes
the set of square-integrable $\mathbb{R}^{n_x}$-valued functions. Let there also be an $n_x$-dimensional
Wiener process $v^{\bar{x},\eta} = \{v^{\bar{x},\eta}(t), \mathcal{F}_t\}$ independent of $w$. Under $Q^{\bar{x},\eta}$, $(y, x)$ satisfies the
following system of SDEs:
$$dy(t) = (a(t, y) + b(t, y) x(t))\, dt + \sigma(t, y)\, dw(t), \tag{4}$$
$$dx(t) = \kappa(\bar{x} + \kappa^{-1} \rho_v \eta(t) - x(t))\, dt + \rho_w\, dw(t) + \rho_v\, dv^{\bar{x},\eta}(t), \tag{5}$$
where, as before, $y(0) \in \mathbb{R}^{n_y}$ is nonrandom and the $\mathcal{F}_0$-measurable random variable
$x(0)$ has the conditional distribution $x(0) \mid \mathcal{G}_0 \sim N(m_0, \gamma_0)$. The presence of $\eta$ clearly
allows for the structural breaks suspected in the literature; meanwhile, the lack of
assumptions on $\eta$ (besides the minimal technical one of square integrability) reflects
13 To quote Epstein and Schneider (2007), who first considered multiple parameter priors, “Indeed,
one may wonder whether there is a need for non-Dirac [parameter] priors at all.”
the fact that evidence is too inconclusive for the agent to confidently make one. The
set of full-observation likelihoods is given by
$$\mathcal{L}_{\mathrm{FO},T} = \left\{ \bar{x} \mapsto L_{\mathrm{FO},T}(\bar{x}, \eta) : \eta \in L^2([0, T], \mathbb{R}^{n_x}) \right\}, \qquad L_{\mathrm{FO},T}(\bar{x}, \eta) \triangleq \frac{dQ^{\bar{x},\eta}}{dQ^{0,0}}.$$
(5) can alternatively be written as
$$dx(t) = \kappa(\bar{x} - x(t))\, dt + \rho_w\, dw(t) + \rho_v\, (dv^{\bar{x},\eta}(t) + \eta(t)\, dt),$$
which shows that the ambiguity in question is equivalent to that in the noise $v^{\bar{x},\eta}$
specific to the state dynamics and not the other one $w$. Indeed, the agent of this
paper is not mechanically taking into consideration all the theories that are close to
a reference, in which case w, too, would be perturbed, but is rather questioning a
particular aspect of market dynamics, namely, mean reversion.
Now I turn to the issue of the existence and uniqueness of a solution to the system
of SDEs (4)–(5). $|\cdot|$ denotes the Euclidean norm for vectors and the Frobenius norm for
matrices; that is, for a vector or matrix $z$, $|z| \triangleq \sqrt{\operatorname{tr}(z z^\top)}$. All numbered assumptions
stand throughout the paper from their statement on, unless otherwise noted.
Assumption 2.1 (Sufficient Conditions for Unique Strong Existence).
(i) $b$ is uniformly bounded.
(ii) For all $f \in C([0, T], \mathbb{R}^{n_y})$,
$$\int_0^T \left( |a(t, f)| + |\sigma(t, f)|^2 \right) dt < \infty.$$
(iii) $a$, $b$, and $\sigma$ are locally Lipschitz. That is, for each $N$ there is a $K_N$ such that
$$\sup_{s \le t} |f(s)| \vee \sup_{s \le t} |g(s)| \le N \;\Rightarrow\; |\sigma(t, f) - \sigma(t, g)| \le K_N \sup_{s \le t} |f(s) - g(s)|$$
for all $t \in [0, T]$; and the same for $a$ and $b$ mutatis mutandis.
(iv) $a$ and $\sigma$ are linearly growing. That is, there is a $K$ such that
$$|a(t, f)| + |\sigma(t, f)| \le K \left( 1 + \sup_{s \le t} |f(s)| \right)$$
for all $(t, f) \in [0, T] \times C([0, T], \mathbb{R}^{n_y})$.
Proposition 2.1. Strong existence and pathwise uniqueness hold for the system of
SDEs (4)–(5).
Suppose $(w, v^{\bar{x},\eta})$ and $(y, x)$ are defined on some filtered complete probability
space. With a slight abuse of notation, $\sigma(t) \equiv \sigma(t, \omega) \equiv \sigma(t, y(\omega))$. With the notation
$\sigma(t, \omega)$, $\sigma$ can be considered a process (adapted to $\mathbb{G}$). Similar remarks apply to
the other functionals. As is the custom, the qualification “almost surely” is suppressed
unless necessary.
Assumption 2.2. There is an $\varepsilon > 0$ such that
$$z^\top \sigma(t) \sigma(t)^\top z \ge \varepsilon |z|^2 \quad \text{for all } z \in \mathbb{R}^{n_y} \text{ and all } t \in [0, T].$$
Assumption 2.2 implies that $\sigma(t)$ has an inverse and $|\sigma(t)^{-1} z| \le \varepsilon^{-1/2} |z|$ for all $z$
and $t$; likewise, $\sigma(t)^\top$, too, has an inverse and $|(\sigma(t)^\top)^{-1} z| \le \varepsilon^{-1/2} |z|$ for all $z$ and $t$
(Karatzas and Shreve, 1988, Problem 5.8.1). With the last observation, we can rewrite
(4) and (5) as
$$dw(t) = \sigma(t)^{-1} [dy(t) - (a(t) + b(t) x(t))\, dt], \tag{6}$$
$$dv^{\bar{x},\eta}(t) = -\eta(t)\, dt + \rho_v^{-1} \left\{ dx(t) - \kappa(\bar{x} - x(t))\, dt - \rho_w \sigma(t)^{-1} [dy(t) - (a(t) + b(t) x(t))\, dt] \right\}, \tag{7}$$
and use these SDEs to define $w$ and $v^{\bar{x},\eta}$.
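The bound implied by Assumption 2.2 can be illustrated numerically. The sketch below assumes a constant, hypothetical $2 \times 2$ diffusion matrix, takes $\varepsilon$ to be the smallest eigenvalue of $\sigma\sigma^\top$ (the largest constant for which the assumption holds), and checks $|\sigma^{-1} z| \le \varepsilon^{-1/2} |z|$ on randomly drawn vectors.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility

# Hypothetical constant, invertible diffusion matrix (illustrative values only).
sigma = np.array([[0.143, 0.045],
                  [0.0,   0.10]])

# Largest epsilon satisfying z' sigma sigma' z >= eps |z|^2:
# the smallest eigenvalue of the symmetric matrix sigma sigma'.
eps = np.linalg.eigvalsh(sigma @ sigma.T).min()

# Random test vectors z (one per row).
z = rng.normal(size=(1000, 2))

# |sigma^{-1} z| computed by solving sigma u = z for each z.
lhs = np.linalg.norm(np.linalg.solve(sigma, z.T).T, axis=1)
rhs = eps ** -0.5 * np.linalg.norm(z, axis=1)

bound_holds = bool(np.all(lhs <= rhs + 1e-12))
print(eps, bound_holds)
```

The same bound applies to $(\sigma^\top)^{-1}$ because $\sigma^\top\sigma$ and $\sigma\sigma^\top$ share eigenvalues.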
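Let
$$\Omega \triangleq C([0, T], \mathbb{R}^{n_y}) \times C([0, T], \mathbb{R}^{n_x}), \qquad \mathcal{F}^\circ \triangleq \mathcal{B}C([0, T], \mathbb{R}^{n_y}) \otimes \mathcal{B}C([0, T], \mathbb{R}^{n_x}),$$
let $(y, x)$ be the identity map on $\Omega$, and let
$$Q^{\bar{x},\eta} \triangleq \operatorname{law}(y, x)$$
be defined on $(\Omega, \mathcal{F}^\circ)$. Let $\mathbb{F} = \{\mathcal{F}_t\}$ be the augmented filtration generated by $(y, x)$.
Since $\{Q^{\bar{x},\eta} : \bar{x}, \eta\}$ are equivalent, they all lead to the same augmentation. Finally,
define $\sigma$ by $\sigma(t, \omega) = \sigma(t, y(\omega))$; $a$ and $b$ similarly; and $w$ and $v^{\bar{x},\eta}$ by (6) and (7). In
particular, this construction (of weak solutions) explains why $v^{\bar{x},\eta}$ is superscripted.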
In sum, the agent’s theories of how the data $y$ is generated can be identified with
the probability measures
$$\mathcal{Q} \triangleq \left\{ Q^{\bar{x},\eta} : (\bar{x}, \eta) \in \mathbb{R}^{n_x} \times L^2([0, T], \mathbb{R}^{n_x}) \right\}$$
on the common measurable space $(\Omega, \mathcal{F}_T)$. I call these measures theoretical priors to
distinguish them from the measures $P \in \mathcal{P}$ that are part of the representation of the
agent’s preferences, or the preferential priors.
The elicited probability measures are often interpreted as those that are actually entertained by the agent; see, for example, Gagliardini et al. (2009).14 However,
14 In motivating their preferential priors $\{P^h : h\}$ Gagliardini et al. say, “The representative agent
does not know the data-generating process · · · [and] considers a class of probabilistic scenarios,
or contaminations, $P^h$ around the reference belief [a Cox-Ingersoll-Ross (1985) economy]. These
contaminations are interpreted as likely specifications for the constituents of the opportunity sets.”
“nothing in the theoretical construct of Gilboa and Schmeidler (1989) supports this
interpretation” (Gajdos et al., 2008), and rather it is clear that “information and personal taste jointly determine [the set of preferential priors]” (Gilboa and Marinacci,
2013). Thus in this paper I speak of the agent’s theories in their own right and with
them make explicit the cognitive origin of the elicited beliefs (Epstein and Schneider,
2007; Gajdos et al., 2008).
2.3 The Preferential Priors
This subsection executes the learning mechanism. This amounts to mapping the theories of the agent to her preferential priors.
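2.3.1 Learning under $Q^{\bar{x},\eta}$
Recall the partially observable system
$$\begin{aligned} dy(t) &= (a(t) + b(t) x(t))\, dt + \sigma(t)\, dw(t), \\ dx(t) &= \kappa(\bar{x} - x(t))\, dt + \rho_w\, dw(t) + \rho_v\, (dv^{\bar{x},\eta}(t) + \eta(t)\, dt). \end{aligned}$$
I use dot notation for time derivatives: $\dot{f}(t) \equiv df(t)/dt$.
Proposition 2.2. The following standard results in Gaussian filtering hold:
(i) $(y, x)$ is conditionally Gaussian.
(ii) The conditional mean vector and variance-covariance matrix
$$m^{\bar{x},\eta}(t) \triangleq E^{Q^{\bar{x},\eta}}(x(t) \mid \mathcal{G}_t), \qquad \gamma(t) \triangleq E^{Q^{\bar{x},\eta}}\left[ (x(t) - m^{\bar{x},\eta}(t))(x(t) - m^{\bar{x},\eta}(t))^\top \mid \mathcal{G}_t \right],$$
satisfy the system of differential equations
$$\begin{aligned} dm^{\bar{x},\eta}(t) &= [\kappa(\bar{x} - m^{\bar{x},\eta}(t)) + \rho_v \eta(t)]\, dt \\ &\quad + (\rho_w \sigma(t)^\top + \gamma(t) b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1} [dy(t) - (a(t) + b(t) m^{\bar{x},\eta}(t))\, dt] \\ &= (\kappa \bar{x} + \rho_v \eta(t) - \bar{\kappa}(t) m^{\bar{x},\eta}(t))\, dt \\ &\quad + (\rho_w \sigma(t)^\top + \gamma(t) b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1} (dy(t) - a(t)\, dt), \end{aligned} \tag{8}$$
$$\begin{aligned} \dot{\gamma}(t) &= \rho_w \rho_w^\top + \rho_v \rho_v^\top - \kappa \gamma(t) - \gamma(t) \kappa \\ &\quad - (\rho_w \sigma(t)^\top + \gamma(t) b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1} (\rho_w \sigma(t)^\top + \gamma(t) b(t)^\top)^\top, \end{aligned} \tag{9}$$
with initial conditions $m^{\bar{x},\eta}(0) = m_0$ and $\gamma(0) = \gamma_0$, where
$$\bar{\kappa}(t) \triangleq \kappa + (\rho_w \sigma(t)^\top + \gamma(t) b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1} b(t).$$
(iii) The process $\bar{w}^{\bar{x},\eta} = \{\bar{w}^{\bar{x},\eta}(t), \mathcal{G}_t\}$ defined by
$$\bar{w}^{\bar{x},\eta}(t) \triangleq \int_0^t \sigma(s)^{-1} [dy(s) - (a(s) + b(s) m^{\bar{x},\eta}(s))\, ds], \quad 0 \le t \le T, \tag{10}$$
is a Wiener process under $Q^{\bar{x},\eta}$ and generates $\mathbb{G}$.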
Lemma 2.1. γ is uniformly bounded.
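To make the behavior of the posterior variance concrete, the following sketch integrates the Riccati equation (9) in the scalar, constant-coefficient setting of Example 2.1 ($a \equiv 0$, $b \equiv 1$, $\sigma \equiv \sigma_R$) using an Euler scheme. All parameter values, the step size, and the horizon are hypothetical. In this special case (9) reduces to $\dot{\gamma} = \rho_v^2 - 2c\gamma - \gamma^2/\sigma_R^2$ with $c = \kappa + \rho_w/\sigma_R$, whose positive stationary point is available in closed form, so the numerical path can be compared against it.

```python
import numpy as np

# Hypothetical scalar parameters (Example 2.1: a = 0, b = 1, sigma = sigma_R).
kappa, rho_w, rho_v, sigma_R = 0.3, -0.01, 0.02, 0.15
gamma0, dt, T = 0.05**2, 1e-3, 50.0

def gamma_dot(g):
    """Right-hand side of the Riccati equation (9) in the scalar case."""
    gain_num = rho_w * sigma_R + g          # rho_w*sigma + gamma*b with b = 1
    return rho_w**2 + rho_v**2 - 2.0 * kappa * g - gain_num**2 / sigma_R**2

# Euler integration of gamma over [0, T].
gamma = np.empty(int(T / dt) + 1)
gamma[0] = gamma0
for k in range(len(gamma) - 1):
    gamma[k + 1] = gamma[k] + gamma_dot(gamma[k]) * dt

# Closed-form positive root of rho_v^2 - 2*c*g - g^2/sigma_R^2 = 0.
c = kappa + rho_w / sigma_R
gamma_star = sigma_R**2 * (-c + np.sqrt(c**2 + rho_v**2 / sigma_R**2))

print(gamma[-1], gamma_star)
```

Under these assumed values the path stays between its initial value and the stationary level, which is consistent with the uniform bound in Lemma 2.1, although a single numerical run is of course not a proof.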
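Let $\varphi : [0, T] \times \Omega \to \mathbb{R}^{n_x \times n_x}$ be the solution of
$$\dot{\varphi}(t) = -\bar{\kappa}(t) \varphi(t), \quad \varphi(0) = I_{n_x},$$
where $I_{n_x}$ denotes the $n_x$-dimensional identity matrix. $\varphi(t)$ is invertible for all $t \ge 0$.
Introduce the following notation: for functions $f$ from $[0, T]$ into $\mathbb{R}^{n_x}$ or into $\mathbb{R}^{n_x \times n_x}$,
$\Phi^f$ denotes the process defined by
$$\Phi^f(t) \triangleq \varphi(t) \int_0^t \varphi(s)^{-1} f(s)\, ds, \quad 0 \le t \le T.$$
Then
$$\begin{aligned} m^{\bar{x},\eta}(t) &= \varphi(t) m_0 + \Phi^{\kappa \bar{x} + \rho_v \eta}(t) \\ &\quad + \varphi(t) \int_0^t \varphi(s)^{-1} (\rho_w \sigma(s)^\top + \gamma(s) b(s)^\top)(\sigma(s)\sigma(s)^\top)^{-1} (dy(s) - a(s)\, ds). \end{aligned} \tag{11}$$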
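The variation-of-constants representation (11) can be compared with a direct integration of the filter equation (8). The sketch below does this in the scalar setting of Example 2.1 ($a \equiv 0$, $b \equiv 1$) with hypothetical parameter values, a hypothetical perturbation $\eta$, and, purely for a deterministic check, a smooth stand-in path for $y$ rather than a simulated diffusion. Both sides are Euler approximations on the same grid, so they should agree up to discretization error.

```python
import numpy as np

# Hypothetical scalar parameters (Example 2.1 with a = 0, b = 1).
kappa, x_bar, rho_w, rho_v, sigma_R = 0.3, 0.06, -0.01, 0.02, 0.15
m0, gamma0, dt, T = 0.02, 0.05**2, 1e-3, 5.0
n = int(round(T / dt))
t = np.arange(n) * dt

eta = 0.5 * np.cos(t)                  # hypothetical perturbation eta(t)
dy = (0.05 + 0.1 * np.cos(t)) * dt     # increments of a smooth stand-in path y

m = m0            # Euler solution of the filter equation (8)
gamma = gamma0    # Euler solution of the Riccati equation (9)
phi = 1.0         # Euler solution of phi' = -kappa_bar * phi, phi(0) = 1
integral = 0.0    # running integral of phi^{-1} times the forcing terms in (11)

for k in range(n):
    gain = (rho_w * sigma_R + gamma) / sigma_R**2   # filter gain with b = 1
    kappa_bar = kappa + gain                        # kappa_bar(t) with b = 1
    forcing = (kappa * x_bar + rho_v * eta[k]) * dt + gain * dy[k]
    integral += forcing / phi
    m += -kappa_bar * m * dt + forcing
    gamma += (rho_w**2 + rho_v**2 - 2 * kappa * gamma - (gain * sigma_R) ** 2) * dt
    phi += -kappa_bar * phi * dt

# Right-hand side of (11) evaluated at time T.
m_formula = phi * (m0 + integral)

print(m, m_formula)
```

The representation makes explicit that $m^{\bar{x},\eta}(t)$ depends on $(\bar{x}, \eta)$ only through the term $\Phi^{\kappa \bar{x} + \rho_v \eta}(t)$, which is what the likelihood calculations in the next subsection exploit.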
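2.3.2 Likelihood of Theories
The log-likelihood function for the theories under partial observation is given by
$$\ell_T(\bar{x}, \eta) \triangleq \log \frac{d(Q^{\bar{x},\eta} | \mathcal{G}_T)}{d(Q^{0,0} | \mathcal{G}_T)} = \log E^{Q^{0,0}}\left[ \frac{dQ^{\bar{x},\eta}}{dQ^{0,0}} \,\Big|\, \mathcal{G}_T \right] \tag{12}$$
where $Q^{\bar{x},\eta} | \mathcal{G}_T$ denotes the restriction of $Q^{\bar{x},\eta}$ to $\mathcal{G}_T$. The choice of the reference, here
$(\bar{x}, \eta) = (0, 0)$, is inconsequential.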
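Proposition 2.3.
$$\begin{aligned} \ell_T(\bar{x}, \eta) ={}& \int_0^T (a(t) + b(t) m^{\bar{x},\eta}(t))^\top (\sigma(t)\sigma(t)^\top)^{-1}\, dy(t) \\ &- \frac{1}{2} \int_0^T (a(t) + b(t) m^{\bar{x},\eta}(t))^\top (\sigma(t)\sigma(t)^\top)^{-1} (a(t) + b(t) m^{\bar{x},\eta}(t))\, dt \\ &- \bigg[ \int_0^T (a(t) + b(t) m^{0,0}(t))^\top (\sigma(t)\sigma(t)^\top)^{-1}\, dy(t) \\ &\qquad - \frac{1}{2} \int_0^T (a(t) + b(t) m^{0,0}(t))^\top (\sigma(t)\sigma(t)^\top)^{-1} (a(t) + b(t) m^{0,0}(t))\, dt \bigg]. \end{aligned}$$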
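The log-likelihood at time $t$, $t < T$, is obtained by replacing the arbitrary $T$ with $t$:
$$\begin{aligned} \ell_t(\bar{x}, \eta) ={}& \int_0^t (a(s) + b(s) m^{\bar{x},\eta}(s))^\top (\sigma(s)\sigma(s)^\top)^{-1}\, dy(s) \\ &- \frac{1}{2} \int_0^t (a(s) + b(s) m^{\bar{x},\eta}(s))^\top (\sigma(s)\sigma(s)^\top)^{-1} (a(s) + b(s) m^{\bar{x},\eta}(s))\, ds + f_t \end{aligned} \tag{13}$$
where $f_t$ is independent of $(\bar{x}, \eta)$.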
Since $m^{\bar x,\eta}(t)$ is linear in $\bar x$, $\ell_t(\bar x,\eta)$ is quadratic in $\bar x$. $\ell_t(\bar x,\eta)$ is also Gâteaux differentiable with respect to η, and the derivative is linear in η:
Lemma 2.2. The Gâteaux differential of $\ell_t(\bar x,\cdot)$ at $\eta \in L^2([0,T],\mathbb{R}^{n_x})$ in the direction $h \in L^2([0,T],\mathbb{R}^{n_x})$ is
\[ \int_0^t \bigg\{(\varphi(s)^{-1}\rho_v)^\top \int_s^t \varphi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}\,[dy(\tau) - (a(\tau) + b(\tau)m^{\bar x,\eta}(\tau))\,d\tau]\bigg\}^{\!\top} h(s)\,ds. \]
2.3.3 Learning about the Data-Generating Mechanism
Recall the dynamics of the observable process
dy(t) = (a(t) + b(t)x(t)) dt + σ(t) dw(t).
If the agent were Bayesian with unique theoretical prior Qx̄,0 ∈ Q,15 Bayesian
updating would result in the filtered dynamics
\[ dy(t) = (a(t) + b(t)m^{\bar x,0}(t))\,dt + \sigma(t)\,d\bar w(t), \tag{14} \]
where $\bar w$, defined by (14), is a $(Q^{\bar x,0}, \mathbb{G})$-Wiener process, and her time-t decisions would accordingly be based on the unique one-step-ahead conditional
\[ dy(t) \mid \mathcal{G}_t \sim N\big((a(t) + b(t)m^{\bar x,0}(t))\,dt,\ \sigma(t)\sigma(t)^\top dt\big). \]
On the other hand, our agent entertains a set of theories, {Qx̄,η : x̄, η}, and rules out
some of them in light of evidence. Hence, unless she rules out all but one theory, the
agent will have multiple one-step-ahead conditionals of the form
\[ dy(t) \mid \mathcal{G}_t \sim N\big((a(t) + b(t)m^{\bar x,\eta}(t))\,dt,\ \sigma(t)\sigma(t)^\top dt\big) \]
where (x̄, η) runs over a set. Note that the ambiguity in the data-generating mechanism boils down to that in mx̄,η (t), the posterior mean of x(t).
15 Given that our agent is uncertain about x̄, a fair comparison would require that the hypothetical Bayesian agent be given a diffuse parameter prior; but the form of the parameter prior is irrelevant to the point I am making here, namely, unique versus nonunique one-step-ahead conditionals.
Plausibility: Penalized Log-Likelihood. The ambiguity, however, is too large for there to be learning, if the agent assesses the plausibility of a theory based on the likelihood alone. To elaborate, define the log-likelihood induced by the transformation $(\bar x,\eta) \mapsto m^{\bar x,\eta}(t)$ by
\[ \ell_{t,m(t)}(m) \triangleq \sup_{(\bar x,\eta)\in\mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})} \{\ell_t(\bar x,\eta) : m^{\bar x,\eta}(t) = m\}, \quad m \in \mathbb{R}^{n_x}. \]
Then, $\ell_{t,m(t)}$ is constant, the constant value lying in $\mathbb{R}\cup\{\infty\}$.16 In other words, $m^{\bar x,\eta}(t)$ is not identified. The reason is that each value of $m^{\bar x,\eta}(t)$ can be supported equally well by some theory with a large η.17
Indeed, “inductive inference based on objective criteria alone is bound to fail,
while incorporating subjective criteria alongside objective ones can lead to successful
learning”; that is, “effective learning requires a willingness to sacrifice goodness-of-fit
in return for enhanced subjective appeal” (Gilboa and Samuelson, 2012).
Thus, I assume that the plausibility ranking, a binary relation “at least as plausible
as,” over the theories is represented by a penalized log-likelihood function. Specifically, the agent finds more appealing the “reference” or “simple” theories free of the
poorly understood factor η, and that subjective criterion is translated into a penalty
proportional to the magnitude of η measured by the L2 -norm:
\[ \ell^\lambda_t(\bar x,\eta) \triangleq \ell_t(\bar x,\eta) - \frac{\lambda}{2}\int_0^t |\eta(s)|^2\,ds \tag{15} \]
where λ ∈ (0, ∞] measures the agent’s a priori confidence about the reference likelihood. When λ = ∞, the set of theories reduces to {Qx̄,0 : x̄ ∈ Rnx } and the agent
perceives no persistent source of ambiguity; when λ is small, the agent fits data with
large ηs with little restraint. It is also worth noting that the L2 -norm of η is equal
to the deviation of a theory Qx̄,η from its simple counterpart Qx̄,0 measured by the
Kullback-Leibler divergence:
\[ D_{KL}(Q^{\bar x,0} \,\|\, Q^{\bar x,\eta}) \triangleq E^{Q^{\bar x,0}}\!\left[\log\frac{dQ^{\bar x,0}}{dQ^{\bar x,\eta}}\right] = \frac12\int_0^T |\eta(t)|^2\,dt. \]
16 See the supplementary appendix for a proof.
17 Precisely speaking, the supremum is not attained; that is, there does not exist a maximum likelihood estimate. Fix x̄ and suppose there is a partial maximizer η of the likelihood $\ell_t(\bar x,\eta)$, 0 < t ≤ T, in $L^2([0,T],\mathbb{R}^{n_x})$. Then it must satisfy, from Lemma 2.2,
\[ 0 = (\varphi(s)^{-1}\rho_v)^\top \int_s^t \varphi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}[dy(\tau) - (a(\tau) + b(\tau)m^{\bar x,\eta}(\tau))\,d\tau], \quad 0 \le s \le t, \]
but the constancy of the left-hand side and the unbounded variation of the right-hand side are incompatible. It follows that for any given η, there is another η′ with higher likelihood.
The idea of penalizing the likelihood was first discussed by Good and Gaskins
(1971) in the context of nonparametric density estimation. Green (1987) extended
it to semiparametric settings. In these non- or semi-parametric estimation problems,
Sobolev norms of higher orders, as well as the L2 -norm, are favored; but for us, imposing smoothness on η would violate the assumption of symmetry. In the context of
model selection, Akaike (1973) extended the maximum likelihood principle by proposing his celebrated criterion in the form of a penalized log-likelihood; and ever since,
penalizing the likelihood has been a standard method in information theory to strike
a balance between the goodness of fit and the simplicity of the model (see Konishi and
Kitagawa, 2008). The penalized log-likelihood representation of a plausibility ranking
has recently been axiomatized by Gilboa and Schmeidler (2010).
Finally, a theory $Q^{\bar x,\eta}$ is not ruled out if and only if
\[ \ell^\lambda_t(\bar x,\eta) \ge \max_{(\bar x',\eta')\in\mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})} \ell^\lambda_t(\bar x',\eta') - \alpha \]
where 0 ≤ α < ∞. α measures how conservative the agent is in model selection; when α = 0, in particular, the agent keeps nothing but the most plausible theories. And as shall be seen, the corresponding induced plausibility of $m^{\bar x,\eta}(t)$,
\[ \ell^\lambda_{t,m(t)}(m) \triangleq \max_{(\bar x,\eta)\in\mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})} \{\ell^\lambda_t(\bar x,\eta) : m^{\bar x,\eta}(t) = m\}, \quad m \in \mathbb{R}^{n_x}, \]
has a nonzero curvature (Lemma 2.5).
Remark 2.1. There are two prominent alternatives to the L2 -penalty.
The first is Epstein and Schneider’s (2007) L∞ -constraint: ess supt≤T |η(t)| ≤ η̄.
This amounts to constraining instantaneous entropy rates point by point in time.
While this is sensible when the agent is looking forward and fears misspecification of
the infinitesimal future, in looking backward, it is not. What the agent tries to pin
down here is the value of mx̄,η (t), and in this regard, η(s), s ≤ t, having large values
for a short period of time has little significance.
The other is an L2-constraint: $\int_0^T |\eta(t)|^2\,dt \le \bar\eta_T$. Naturally, this is closely related
to the L2 -penalty. First, a constraint is a penalty that is discontinuous. Second, a
constraint is the dual of a penalty in the method of Lagrange multipliers: the constant
λ defines a shadow process η̄ λ = {η̄tλ } that implies the same most plausible theories.
And it can be seen that the penalized likelihood ratio test with λ is more conservative
than the constrained likelihood ratio test with η̄ λ .
Compared to its penalty counterpart, however, the L2 -constraint has the following
drawbacks. First, the sharp bounds seem to be at odds with the assumed a priori
ignorance. Second, if, as is natural, the time-t bound $\bar\eta_t$ is lower than $\bar\eta_T$, t < T, it implies that the agent has a time-varying parameter set; for example, she would deem $\eta(s) = \sqrt{2\bar\eta_T/t}\,(1, 0, \cdots, 0)^\top$, s ≤ t, implausible at time t but plausible at time T.
Maximum Plausibility Estimation. I will need the following facts to characterize the natural “center” of the set of preferential priors.
The maximum plausibility estimate (MPE) of (x̄, η) at time t is defined as
\[ (\bar x^*_t, \eta^*_t) \triangleq \arg\max_{(\bar x,\eta)\in\mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})} \ell^\lambda_t(\bar x,\eta). \]
The notion of the partial MPE of η given x̄ will prove helpful:
\[ \eta^*_{\bar x,t} \triangleq \arg\max_{\eta\in L^2([0,T],\mathbb{R}^{n_x})} \ell^\lambda_t(\bar x,\eta). \]
Clearly, $\eta^*_t = \eta^*_{\bar x^*_t,t}$.
The first-order condition with respect to η (FOC(η)) is
\[ \lambda\eta(s) = (\varphi(s)^{-1}\rho_v)^\top \int_s^t \varphi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}[dy(\tau) - (a(\tau) + b(\tau)m^{\bar x,\eta}(\tau))\,d\tau], \quad 0 \le s \le t. \]
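To write the solution of this integral equation, introduce the following notation. Let
\[
\chi(s) \triangleq
\begin{pmatrix}
\rho_v\rho_v^\top\,\bar\kappa(s)^\top(\rho_v\rho_v^\top)^{-1} & \rho_v\rho_v^\top\,b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s) \\
\lambda^{-1}I_{n_x} & -\bar\kappa(s)
\end{pmatrix},
\]
and let ψ be the matrix-valued process such that
\[ \dot\psi(s) = \chi(s)\psi(s), \quad 0 \le s \le T, \quad \psi(0) = I_{2n_x}. \tag{16} \]
ψ(s) is invertible for all s ≥ 0. Let $\iota_1 \triangleq (I_{n_x}, 0)^\top$, $\iota_2 \triangleq (0, I_{n_x})^\top$, and $A_{ij} \triangleq \iota_i^\top A\iota_j$ for a $2n_x \times 2n_x$ matrix A.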
Lemma 2.3. For all t > 0, (i) $\psi_{11}(t)$ is invertible, and (ii) $\psi_{21}(t)\psi_{11}(t)^{-1}\rho_v\rho_v^\top$ is symmetric and positive definite.
Let also
\[ \Psi(s) \triangleq \psi(s)\int_0^s \psi(\tau)^{-1}\,d\tau, \quad 0 \le s \le T. \tag{17} \]
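Proposition 2.4 (Partial MPE of η).
\[
\begin{aligned}
\begin{pmatrix} \lambda\rho_v\eta^*_{\bar x,t}(s) \\ \Phi^{\kappa\bar x+\rho_v\eta^*_{\bar x,t}}(s) \end{pmatrix}
={}& \Psi(s)\iota_2\kappa\bar x - \psi(s)\int_0^s \psi(\tau)^{-1}\iota_1\rho_v\rho_v^\top b(\tau)^\top(\sigma(\tau)^\top)^{-1}\,d\bar w^{0,0}(\tau) \\
& - \psi(s)\iota_1\psi_{11}(t)^{-1}\iota_1^\top\bigg[\Psi(t)\iota_2\kappa\bar x - \psi(t)\int_0^t \psi(\tau)^{-1}\iota_1\rho_v\rho_v^\top b(\tau)^\top(\sigma(\tau)^\top)^{-1}\,d\bar w^{0,0}(\tau)\bigg].
\end{aligned}
\tag{18}
\]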
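Hence, $m^{\bar x,\eta^*_{\bar x,t}}(s)$ is linear in x̄ (recall (11)). Define θ(t) by
\[ \theta(t) \triangleq \Psi_{22}(t) - \psi_{21}(t)\psi_{11}(t)^{-1}\Psi_{12}(t) \quad\text{or}\quad m^{\bar x,\eta^*_{\bar x,t}}(t) = m^{0,\eta^*_{0,t}}(t) + \theta(t)\kappa\bar x. \tag{19} \]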
That is, θ(t) measures the sensitivity to κx̄ of $m^{\bar x,\eta}(t)$ with η “profiled out.” Let $I_{\bar x}(t)$ denote the “observed Fisher information” about x̄:
\[ I_{\bar x}(t) \triangleq -\frac{\partial^2}{\partial(\kappa\bar x)^2}\,\ell^\lambda_t(\bar x,\eta^*_{\bar x,t})\bigg|_{\bar x=\bar x^*_t}. \tag{20} \]
Precisely speaking, $I_{\bar x}(t)$ is the information about κx̄, but I adopt this slight abuse of terminology because κ is known and the parameter of interest is clearly x̄.
Assumption 2.3. (i) nx ≤ ny .
(ii) b(t) is of full column rank (that is, nx ) for all t ∈ [0, T ].
Lemma 2.4.
\[ I_{\bar x}(t) = \int_0^t \theta(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\theta(s)\,ds \tag{21} \]
and is invertible for all t > 0.
FOC(x̄) is
\[ 0 = \int_0^t (b(s)\Phi^{I_{n_x}}(s)\kappa)^\top(\sigma(s)\sigma(s)^\top)^{-1}[dy(s) - (a(s) + b(s)m^{\bar x,\eta}(s))\,ds]. \]
Proposition 2.5 (MPE of x̄). For t > 0,
\[ \bar x^*_t = \kappa^{-1}I_{\bar x}(t)^{-1}\int_0^t \Phi^{I_{n_x}}(s)^\top b(s)^\top(\sigma(s)^\top)^{-1}\,d\bar w^{0,\eta^*_{0,t}}(s). \]
Remark 2.2. Estimation is not defined at time 0, and consequently, neither is the
time-0 decision making. This is natural. At time 0, the agent is in the state of sheer
ignorance while once the observable process y starts to wiggle, information thereafter
accrues continuously. The singularity at time 0 is not a problem because, as we shall
see, decision making is well-defined for all t > 0. Nevertheless, I assume purely for
the brevity of exposition that the agent’s learning started prior to time 0 and all
the statistics, including Ix̄ (0) and x̄∗0 , have a definite, finite value at time 0. The
differential dynamics I am about to characterize determine their evolution from then
on. To maintain the convention that G0 is trivial, I assume that all the G0 -measurable
variables are nonrandom constants.
The natural center of the time-t set of one-step-ahead conditionals is
\[ dy(t) \mid \mathcal{G}_t \sim N\big((a(t) + b(t)m^{\bar x^*_t,\eta^*_t}(t))\,dt,\ \sigma(t)\sigma(t)^\top dt\big). \]
This observation motivates us to define a process $\epsilon = \{\epsilon(t), \mathcal{G}_t\}$ by
\[ \epsilon(t) \triangleq \int_0^t \sigma(s)^{-1}[dy(s) - (a(s) + b(s)m^{\bar x^*_s,\eta^*_s}(s))\,ds], \quad 0 \le t \le T. \]
To prove that there is a probability measure on $(\Omega, \mathcal{G}_T)$ under which $\epsilon$ is a Wiener process, I first observe the dynamics of the statistics.
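Proposition 2.6 (Dynamics of the MPEs).
\[ d\bar x^*_t = \kappa^{-1}\sigma_{\bar x^*}(t)^\top b(t)^\top(\sigma(t)^\top)^{-1}\,d\epsilon(t), \]
\[ dm^{\bar x^*_t,\eta^*_t}(t) = \kappa(\bar x^*_t - m^{\bar x^*_t,\eta^*_t}(t))\,dt + [\rho_w\sigma(t)^\top + (\gamma(t) + \delta(t))b(t)^\top](\sigma(t)^\top)^{-1}\,d\epsilon(t), \tag{22} \]
where
\[ \sigma_{\bar x^*}(t) \triangleq \theta(t)I_{\bar x}(t)^{-1}, \tag{23} \]
\[ \delta(t) \triangleq \psi_{21}(t)\psi_{11}(t)^{-1}\rho_v\rho_v^\top + \theta(t)\sigma_{\bar x^*}(t)^\top. \tag{24} \]
Note that δ is symmetric and positive definite (Lemma 2.3 and $\theta(t) = \sigma_{\bar x^*}(t)I_{\bar x}(t)$).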
The following proposition closes the dynamics:
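Proposition 2.7.
\[
\begin{aligned}
\dot\theta(t) &= I_{n_x} - (\kappa + \rho_w\sigma(t)^{-1}b(t))\theta(t) \\
&\quad - (\gamma(t) + \delta(t) - \theta(t)\sigma_{\bar x^*}(t)^\top)\,b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\theta(t),
\end{aligned}
\tag{25}
\]
\[ \dot\sigma_{\bar x^*}(t) = I_{\bar x}(t)^{-1} - \{\kappa + [\rho_w\sigma(t)^\top + (\gamma(t) + \delta(t))b(t)^\top](\sigma(t)\sigma(t)^\top)^{-1}b(t)\}\,\sigma_{\bar x^*}(t), \]
\[ \frac{d}{dt}\big(I_{\bar x}(t)^{-1}\big) = -\sigma_{\bar x^*}(t)^\top b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\sigma_{\bar x^*}(t), \]
\[
\begin{aligned}
\dot\delta(t) &= (\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1}(\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)^\top - \kappa\delta(t) - \delta(t)\kappa \\
&\quad - [\rho_w\sigma(t)^\top + (\gamma(t) + \delta(t))b(t)^\top](\sigma(t)\sigma(t)^\top)^{-1}[\rho_w\sigma(t)^\top + (\gamma(t) + \delta(t))b(t)^\top]^\top \\
&\quad + \sigma_{\bar x^*}(t) + \sigma_{\bar x^*}(t)^\top + \lambda^{-1}\rho_v\rho_v^\top.
\end{aligned}
\tag{26}
\]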
The Preferential Priors Now we are ready to characterize the preferential priors.
The reference preferential prior P 0 is identified in Proposition 2.8, and the one-step-ahead beliefs process Ξ, in Proposition 2.9. (Recall the description of Section 2.1.)
First, make the following additional assumption:
Assumption 2.4. θ, σx̄∗ , and δ are uniformly bounded.
Remark 2.3. Here are simple example cases in which Assumption 2.4 holds: (i) σ and b are deterministic. (ii) σ, ρw, ρv, and b are diagonal,18 and there is an ε > 0 such that $\bar\kappa = \kappa + (\rho_w\sigma^\top + \gamma b^\top)(\sigma\sigma^\top)^{-1}b \ge \varepsilon I_{n_x}$ a.e. (Given that σ, ρw, ρv, and b are diagonal, there trivially is an ε > 0 such that $\bar\kappa > \varepsilon I_{n_x}$ a.e. if ρw = 0.) See the appendix for a proof.
Proposition 2.8. There is a unique probability measure on $(\Omega, \mathcal{G}_T)$, denoted by $P^0$, such that $P^0 \sim (Q^{0,0}|\mathcal{G}_T)$ and $\epsilon$ is a Wiener process under $P^0$. Also, $\mathbb{G}$ equals the augmented filtration generated by $\epsilon$.
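Observe that under $P^\xi$,
\[ dy(t) \mid \mathcal{G}_t \sim N\big((a(t) + b(t)m^{\bar x^*_t,\eta^*_t}(t) + \sigma(t)\xi(t))\,dt,\ \sigma(t)\sigma(t)^\top dt\big). \]
Hence, the time-t set of one-step-ahead conditionals Ξ(t) is defined by
\[
\begin{aligned}
& a(t) + b(t)m^{\bar x^*_t,\eta^*_t}(t) + \sigma(t)\Xi(t) \\
&\quad = \Big\{\mu \in \mathbb{R}^{n_y} : \ell^\lambda_t(\bar x^*_t,\eta^*_t) - \max_{(\bar x,\eta)\in\mathbb{R}^{n_x}\times L^2([0,T],\mathbb{R}^{n_x})}\{\ell^\lambda_t(\bar x,\eta) : a(t) + b(t)m^{\bar x,\eta}(t) = \mu\} \le \alpha\Big\}
\end{aligned}
\]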
where the maximum of an empty set is defined to be −∞. It turns out that δ(t) is
the inverse of the observed Fisher information about mx̄,η (t):
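Lemma 2.5.
\[
\begin{aligned}
\ell^\lambda_t(\bar x^*_t,\eta^*_t) - \max_{\bar x,\eta}\{\ell^\lambda_t(\bar x,\eta) : m^{\bar x,\eta}(t) = m\}
&= \ell^\lambda_{t,m(t)}(m^{\bar x^*_t,\eta^*_t}(t)) - \ell^\lambda_{t,m(t)}(m) \\
&= \frac12\,(m - m^{\bar x^*_t,\eta^*_t}(t))^\top\delta(t)^{-1}(m - m^{\bar x^*_t,\eta^*_t}(t)), \quad m \in \mathbb{R}^{n_x}.
\end{aligned}
\]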
And the set of one-step-ahead conditionals is given as follows:
Proposition 2.9.
\[ \Xi(t) = \sigma(t)^{-1}b(t)\Big\{\Delta m \in \mathbb{R}^{n_x} : \frac12(\Delta m)^\top\delta(t)^{-1}\Delta m \le \alpha\Big\}, \quad 0 \le t \le T. \tag{27} \]
The process $\Xi = \{\Xi(t), \mathcal{G}_t\}$ is uniformly bounded and compact-convex-valued. If each of the processes b and $\sigma^{-1}$ is left- or right-continuous, Ξ is furthermore progressive.
18 In case $n_x \ne n_y$, the $n_x \times n_y$ matrix $\rho_w$, for example, is diagonal if $\rho_w^{ij} = 0$ for all $i \ne j$.
Remark 2.4. For ξ(t) ∈ Ξ(t),
\[ \frac12(\sigma(t)\xi(t))^\top(b(t)\delta(t)b(t)^\top)^+\sigma(t)\xi(t) \le \alpha \tag{28} \]
where $(b(t)\delta(t)b(t)^\top)^+$ denotes the Moore-Penrose pseudoinverse:
\[ (b(t)\delta(t)b(t)^\top)^+ = b(t)(b(t)^\top b(t))^{-1}\delta(t)^{-1}(b(t)^\top b(t))^{-1}b(t)^\top. \]
However, the converse is not true; that is, (28) does not imply ξ(t) ∈ Ξ(t).
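The failure of the converse in Remark 2.4 is easy to see in a small numerical case. The sketch below uses hypothetical values with two signals and one state (n_y = 2, n_x = 1), takes σ to be the identity for simplicity, and checks that a vector orthogonal to the range of b satisfies inequality (28) without belonging to Ξ(t). The helper names are illustrative only, and the membership test is written for this particular b.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def quad_form(x, M):
    """x^T M x for a vector x (list) and a square matrix M."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical values: n_y = 2, n_x = 1, sigma = identity.
b = [[1.0], [0.0]]            # full column rank, as required by Assumption 2.3
delta, alpha = 0.5, 1.0

bbt = matmul(b, transpose(b))
A = [[delta * v for v in row] for row in bbt]            # b delta b^T
btb_inv = 1.0 / matmul(transpose(b), b)[0][0]            # (b^T b)^{-1}, a scalar here
A_plus = [[btb_inv * btb_inv / delta * v for v in row] for row in bbt]  # formula in Remark 2.4

def satisfies_28(xi):
    """Inequality (28) with sigma = identity."""
    return 0.5 * quad_form(xi, A_plus) <= alpha

def in_Xi(xi, tol=1e-12):
    """Membership in Xi(t): xi = b * dm with dm^2 / (2 delta) <= alpha (this b has b[0][0] != 0)."""
    dm = xi[0] / b[0][0]
    on_range = abs(xi[1] - b[1][0] * dm) <= tol
    return on_range and 0.5 * dm * dm / delta <= alpha

xi_inside = [0.8, 0.0]     # on the range of b and inside the ellipsoid
xi_outside = [0.0, 5.0]    # orthogonal to the range of b

print(satisfies_28(xi_inside), in_Xi(xi_inside))
print(satisfies_28(xi_outside), in_Xi(xi_outside))
```

The pseudoinverse ignores any component of σ(t)ξ(t) outside the range of b(t), which is why (28) alone cannot characterize Ξ(t) when n_x < n_y.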
Thus, the size of the set of one-step-ahead conditionals is proportional to that of the set of the plausible values of $m^{\bar x,\eta}(t)$, and the latter set is given by an $n_x$-dimensional hyper-ellipsoid centered at the most plausible value $m^{\bar x^*_t,\eta^*_t}(t)$. The lengths
of the principal axes of the hyper-ellipsoid are proportional to the square roots of the
eigenvalues of δ(t). Therefore, the square roots of the eigenvalues, or the eigenvalues
themselves, of δ(t) are measures of conditional ambiguity.
To conclude, let ξ ∈ Ξ mean that $\xi = \{\xi(t), \mathcal{G}_t\}$ is progressive and ξ(t, ω) ∈ Ξ(t, ω) a.e. Then the set of preferential priors is given by
\[ \mathcal{P} = \Big\{P^\xi : P^\xi \text{ is a probability measure on } (\Omega, \mathcal{G}_T),\ \frac{dP^\xi}{dP^0} = \mathcal{E}^\xi(T),\ \xi \in \Xi\Big\}. \]
3 Discussion
This section examines the learning dynamics derived in the previous section. The conditional ambiguity about one-step-ahead uncertainties is represented by the set of one-step-ahead conditionals Ξ(t); a nonsingleton Ξ(t) results from ambiguity in the estimate $m^{\bar x^*_t,\eta^*_t}(t)$; and the ambiguity in $m^{\bar x^*_t,\eta^*_t}(t)$ comes from two component ambiguities, one about the time-invariant factor x̄ and the other about the time-varying factor η. The discussion begins by noting that the former eventually resolves (Section 3.1). The ambiguity about η, however, persists, and the consequent time variation in the conditional ambiguity is discussed next. Specifically, Section 3.2 analyzes the filtering equations for $\{m^{\bar x^*_t,\eta^*_t}(t) : t\}$ and δ in comparison with the classical ones for $m^{\bar x,\eta}$ and γ. It turns out that IID ambiguity may indeed obtain as the limit of a learning process. Section 3.3 provides a sufficient condition for such convergence.
Assume henceforth nx = 1. Still the setup is general enough to encompass both
the examples given in Section 2.2.1 and those to be given in Section 5.
3.1 Learning about x̄
The ambiguity associated with x̄ eventually resolves, provided that the observation
process y as a signal about the hidden state x maintains a level of informativeness:
Assumption 3.1. |σ −1 b| is uniformly bounded away from zero.
Proposition 3.1. The set of the values of x̄ that are sufficiently plausible at time t,
\[ \big\{\bar x \in \mathbb{R} : \ell^\lambda_t(\bar x^*_t, \eta^*_{\bar x^*_t,t}) - \ell^\lambda_t(\bar x, \eta^*_{\bar x,t}) \le \alpha\big\}, \]
shrinks to the point $\{\bar x^*_t\}$ as t → ∞, for all critical values α ≥ 0.
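A heuristic way to read Proposition 3.1, offered as a sketch rather than as the paper's proof: the penalized log-likelihood is jointly quadratic in (x̄, η), so the profiled function $\bar x \mapsto \ell^\lambda_t(\bar x, \eta^*_{\bar x,t})$ is exactly quadratic, with curvature $I_{\bar x}(t)$ in κx̄ by definition (20). In the scalar case the plausible set is then an interval whose half-width is governed by $I_{\bar x}(t)$; that $I_{\bar x}(t)$ grows without bound under Assumption 3.1 is taken for granted here.

```latex
\ell^\lambda_t(\bar x^*_t,\eta^*_{\bar x^*_t,t}) - \ell^\lambda_t(\bar x,\eta^*_{\bar x,t})
  = \tfrac12\,\kappa^2 I_{\bar x}(t)\,(\bar x-\bar x^*_t)^2 \le \alpha
\quad\Longleftrightarrow\quad
|\bar x-\bar x^*_t| \le \frac{1}{\kappa}\sqrt{\frac{2\alpha}{I_{\bar x}(t)}} .
```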
The question that naturally arises next is whether x̄∗ converges. But since convergence under one probability measure does not imply convergence under another probability measure obtained by a Girsanov change of measure (see Karatzas and Shreve, 1988, p. 193), answering this question requires taking a stance on the true probability measure. My stance is that the agent neither knows the true probability measure nor purports to have identified a set of probability measures (theoretical priors) that includes it. If a candidate is needed nonetheless, the natural one is a theoretical prior $Q^{\bar x,0} \in \mathcal{Q}$ of the agent (correct specification). It remains to be seen whether x̄∗ converges under $Q^{\bar x,0}$.19
3.2 Comparison with the Classical Filter
The agent’s learning process is summarized by a finite-dimensional filter (Propositions 2.6 and 2.7). The key components of the filter are $\{m^{\bar x^*_t,\eta^*_t}(t) : t\}$ and δ. The ambiguity in the data-generating mechanism boils down, in the present model, to that in the current value of the state variable, and the plausible estimates of the latter are given by an interval centered at $m^{\bar x^*_t,\eta^*_t}(t)$ with length proportional to the square root of δ(t) (Lemma 2.5). Therefore, of prime interest is how $\{m^{\bar x^*_t,\eta^*_t}(t) : t\}$ and δ evolve. In the following lines, I analyze the differential equations $\{m^{\bar x^*_t,\eta^*_t}(t) : t\}$ and δ satisfy in comparison with those satisfied by $m^{\bar x,\eta}$ and γ.
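Let us begin with the unobservable process x:
\[ dx(t) = \kappa(\bar x - x(t))\,dt + \rho_w\,dw(t) + \rho_v(dv^{\bar x,\eta}(t) + \eta(t)\,dt), \]
\[ \frac{d}{dt}\mathrm{Var}(x(t)) = \underbrace{|\rho_w|^2 + \rho_v^2}_{\mathrm{Var}(dx(t)\mid\mathcal{F}_t)/dt} - 2\kappa\,\mathrm{Var}(x(t)). \]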
The time-derivative of the unconditional variance of x(t) is the conditional variance of
dx(t) per unit time, less the unconditional variance of x(t) times the rate of reversion
(times two). This is self-explanatory.
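As a quick numerical illustration of this variance equation, the sketch below integrates it with a plain Euler scheme for a scalar ρw and compares the result with the stationary level (|ρw|² + ρv²)/(2κ) obtained by setting the derivative to zero. The stationary level is not stated in the text but follows directly from the equation; the function name, step size, and parameter values are hypothetical.

```python
def variance_path(var0, kappa, rho_w, rho_v, T=40.0, dt=1e-3):
    """Euler scheme for d/dt Var = rho_w^2 + rho_v^2 - 2 kappa Var (scalar rho_w assumed)."""
    var = var0
    for _ in range(int(round(T / dt))):
        var += (rho_w ** 2 + rho_v ** 2 - 2.0 * kappa * var) * dt
    return var

kappa, rho_w, rho_v = 0.3, -0.1, 0.2      # hypothetical values
stationary = (rho_w ** 2 + rho_v ** 2) / (2.0 * kappa)

from_below = variance_path(0.0, kappa, rho_w, rho_v)
from_above = variance_path(1.0, kappa, rho_w, rho_v)
print(from_below, from_above, stationary)
```

Whatever the starting value, the variance is pulled toward the same level at rate 2κ, which is the "rate of reversion (times two)" in the verbal description.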
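19 $\Delta\bar x^*(t) \triangleq \bar x^*_t - \bar x$ and $\Delta m^*(t) \triangleq m^{\bar x^*_t,\eta^*_t}(t) - m^{\bar x,0}(t)$ satisfy
\[ \kappa\,d\Delta\bar x^* = \sigma_{\bar x^*}^\top b^\top(\sigma^\top)^{-1}(d\bar w^{\bar x,0} - \sigma^{-1}b\,\Delta m^*\,dt), \]
\[ d\Delta m^* = \kappa(\Delta\bar x^* - \Delta m^*)\,dt + \delta b^\top(\sigma^\top)^{-1}\,d\bar w^{\bar x,0} - (\rho_w\sigma^\top + (\gamma + \delta)b^\top)(\sigma\sigma^\top)^{-1}b\,\Delta m^*\,dt, \]
and $(\Delta\bar x^*, \Delta m^*)$ converges in $L^2$. The difficulty is that $\sigma_{\bar x^*}$ is square-integrable but not integrable. It is not clear whether
\[ \int_0^\infty \sigma_{\bar x^*}^\top b^\top(\sigma\sigma^\top)^{-1}b\,\Delta m^*\,dt \]
is convergent or not. On the other hand, it is easy to see that x̄∗ is an $L^2$-bounded continuous martingale under $P^0$, in which case $\lim_{t\to\infty}\bar x^*_t$ exists under $P^0$ by Doob’s martingale convergence theorem (Rogers and Williams, 1994, Theorem II.69.1).
Recall next the filtering equations of a conditionally Gaussian process, namely (8) and (9), slightly rephrased to facilitate the discussion:
\[ dm^{\bar x,\eta}(t) = [\kappa(\bar x - m^{\bar x,\eta}(t)) + \rho_v\eta(t)]\,dt + \underbrace{[\rho_w + \underbrace{(\sigma(t)^{-1}b(t)\gamma(t))^\top}_{\text{Kalman gain}}]}_{\text{weight on the innovation}}\,d\bar w^{\bar x,\eta}(t), \]
\[ \dot\gamma(t) = |\rho_w|^2 + \rho_v^2 - 2\kappa\gamma(t) - \underbrace{\big|\rho_w + (\sigma(t)^{-1}b(t)\gamma(t))^\top\big|^2}_{\text{weight on the innovation squared}}. \]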
We revise the estimate mx̄,η (t) of x(t) in consideration of two factors: (i) the
estimation error and (ii) the (unobservable) evolution of x. First, the correction of
the estimation error, +(σ(t)−1 b(t)γ(t))> dw̄x̄,η (t), is proportional to the innovation
dw̄x̄,η (t). Suppose, to ease explanation, ny = 1 and σ, b > 0. Then, when the change
in the observable variable exceeds (falls short of) what was expected, it is likely that
the old estimate of the growth rate was an underestimate (overestimate), and it thus
needs to be revised upward (downward). The multiplicative factor, or the Kalman
gain, is increasing in the uncertainty γ(t) in mx̄,η (t) (the less trustworthy the old
estimate, the more weighted the new evidence) and is decreasing in the imprecision
σ(t) of the signal (the less informative the signal, the less weighted the new evidence).
Second, mx̄,η (t) is also to be revised by +[κ(x̄ − mx̄,η (t)) + ρv η(t)] dt + ρw dw̄x̄,η (t), to
account for the corresponding change +[κ(x̄ − x(t)) + ρv η(t)] dt + ρw dw(t) in x.
γ̇(t) is given by an analogue of ( d/ dt) Var(x(t)) less the weight on the innovation
squared. The last term expresses that uncertainty resolves more quickly when new
evidence is taken more seriously. For example, zero weight on the news is equivalent
to no news, and that certainly cannot help resolve uncertainty.
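Recall finally the governing equations (22) and (26) of $\{m^{\bar x^*_t,\eta^*_t}(t) : t\}$ and δ under the reference preferential prior $P^0$, again slightly rephrased:
\[ dm^{\bar x^*_t,\eta^*_t}(t) = \kappa(\bar x^*_t - m^{\bar x^*_t,\eta^*_t}(t))\,dt + \big\{\rho_w + [\sigma(t)^{-1}b(t)(\underbrace{\gamma(t) + \delta(t)}_{\text{estimation uncertainty}})]^\top\big\}\,d\epsilon(t), \]
\[
\begin{aligned}
\dot\delta(t) ={}& \underbrace{\big|\rho_w + (\sigma(t)^{-1}b(t)\gamma(t))^\top\big|^2}_{\mathrm{Var}(dm^{\bar x,\eta}(t)\mid\mathcal{G}_t)/dt} - 2\kappa\delta(t) - \underbrace{\big|\rho_w + [\sigma(t)^{-1}b(t)(\gamma(t) + \delta(t))]^\top\big|^2}_{\text{weight on the innovation squared}} \\
& + \underbrace{2\theta(t)I_{\bar x}(t)^{-1}}_{\text{ambiguity associated with } \bar x} + \underbrace{\lambda^{-1}\rho_v^2}_{\text{ambiguity associated with } \eta}.
\end{aligned}
\]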
As with the Bayesian estimate $m^{\bar x,\eta}(t)$, the ambiguity-conscious agent’s estimate $m^{\bar x^*_t,\eta^*_t}(t)$, too, is revised in consideration of the estimation error and the evolution of x. But the difference is that now γ(t) is replaced by the sum of γ(t) and δ(t). γ(t) is also known as the estimation risk in the literature (Kalymon, 1971; Barry, 1974; Klein and Bawa, 1976, 1977) and represents the Bayesian uncertainty under each theory that results because the agent cannot observe x(t) and consequently has to estimate it. On the other hand, δ(t) represents the Knightian uncertainty that results because the agent does not know the data-generating mechanism and consequently has to estimate it. Based on this parallelism, I call δ(t) the estimation ambiguity and the sum γ(t) + δ(t) the estimation uncertainty. When the estimated theory is imprecise (large δ(t)), or the posterior distribution of x(t) is diffuse under the theory (large γ(t)), or both, new evidence receives more weight.
δ̇(t) is given by an analogue of γ̇(t) plus terms accounting for the ambiguity in
the data-generating mechanism. The first three terms reflect the fact that δ measures
the imprecision in the estimation of mx̄,η , as opposed to that in the estimation of
x as does γ. To elaborate, the first term is the conditional variance of dmx̄,η (t)
(given Gt ) per unit time as opposed to that of dx(t) (given Ft ) per unit time; the
parallelism between the second terms is obvious; and the third term is the weight
on the innovation squared, as in γ̇(t). Next, the fourth term $\theta(t)I_{\bar x}(t)^{-1}$ captures the ambiguity in the estimate $m^{\bar x^*_t,\eta^*_t}(t)$ of $m^{\bar x,\eta}(t)$ due to that in $\bar x^*_t$; recall that θ(t) measures the sensitivity to x̄ of $m^{\bar x,\eta^*_{\bar x,t}}(t)$, and $I_{\bar x}(t)^{-1}$, the imprecision of $\bar x^*_t$.
Lemma 3.1. limt→∞ θ(t)Ix̄ (t)−1 = 0.
The fifth and last term λ−1 ρ2v then captures the ambiguity associated with η and sets
the long-run level of δ as the fourth term vanishes:
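Proposition 3.2. Suppose $\sigma^{-1}b$ is constant. Then δ evolves deterministically, converging to
\[ \delta(\infty) = \frac{\sqrt{(\kappa + \rho_w\sigma^{-1}b)^2 + (1 + \lambda^{-1})\rho_v^2|\sigma^{-1}b|^2} - \sqrt{(\kappa + \rho_w\sigma^{-1}b)^2 + \rho_v^2|\sigma^{-1}b|^2}}{|\sigma^{-1}b|^2}. \]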
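The limit in Proposition 3.2 can be checked numerically in the scalar case (n_x = n_y = 1). The sketch below integrates the rephrased equations for γ and δ with an Euler scheme, dropping the term associated with x̄ on the strength of Lemma 3.1, and compares the terminal value of δ with the closed form. The parameter values, the symbol c for σ⁻¹b, and the integration scheme are assumptions made for illustration.

```python
import math

def delta_limit(kappa, rho_w, rho_v, c, lam):
    """Closed-form limit of delta from Proposition 3.2, with c = sigma^{-1} b (scalar)."""
    base = (kappa + rho_w * c) ** 2
    return (math.sqrt(base + (1.0 + 1.0 / lam) * (rho_v * c) ** 2)
            - math.sqrt(base + (rho_v * c) ** 2)) / c ** 2

def integrate_gamma_delta(kappa, rho_w, rho_v, c, lam, gamma0, delta0, T=60.0, dt=1e-3):
    """Euler scheme for the scalar gamma and delta equations, with the x_bar term dropped."""
    gamma, delta = gamma0, delta0
    for _ in range(int(round(T / dt))):
        w_gamma = rho_w + c * gamma              # weight on the innovation, classical filter
        w_total = rho_w + c * (gamma + delta)    # weight once estimation ambiguity is added
        d_gamma = rho_w ** 2 + rho_v ** 2 - 2.0 * kappa * gamma - w_gamma ** 2
        d_delta = w_gamma ** 2 - 2.0 * kappa * delta - w_total ** 2 + rho_v ** 2 / lam
        gamma += d_gamma * dt
        delta += d_delta * dt
    return gamma, delta

params = dict(kappa=0.3, rho_w=-0.1, rho_v=0.2, c=2.0, lam=4.0)   # hypothetical values
gamma_T, delta_T = integrate_gamma_delta(gamma0=0.5, delta0=0.5, **params)
delta_inf = delta_limit(**params)
print(delta_T, delta_inf)
```

Varying `lam` in `delta_limit` shows the comparative statics noted after the proposition: the limit falls as the confidence measure λ rises and vanishes as λ grows without bound.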
Note that δ(∞) is strictly decreasing in the confidence measure λ, and δ(∞) = 0 if and
only if λ = ∞. When σ −1 b is stochastic, we can roughly say that δ is instantaneously
tending to the limit identified above, while how quickly it does and how close it is to
the value at each instant depend on other parameters in the governing equation and
how wildly σ −1 b varies.
To be highlighted here is that δ, a measure of uncertainty, may respond inversely
to changes in the signal imprecision σ (assume ny = 1). That is, agents may perceive,
rather counterintuitively, more ambiguity when news has been relatively precise and
less ambiguity when news has been relatively imprecise. If we think of the observable
process as the endowment stream, this means that market ambiguity, or the premium
for bearing it, may negatively comove with market risk (see Section 4.1).
To elaborate on the behavior of δ in question, suppose the signs of ρw and σ −1 b
differ. If, as would typically be the case, b > 0, this means that the observable variable
(return or growth rate) locally negatively covaries with the unobservable variable
(expected return or expected growth rate). Then, the two considerations in revising
$m^{\bar x^*_t,\eta^*_t}(t)$ oppose each other, and this can result in high uncertainty. For an illustration, suppose further that b is constant; that σ > 0 and ρw < 0; and that σ is currently very large, $\sigma \gg |\rho_w^{-1}(\gamma + \delta)|$, so that δ is approaching $\lambda^{-1}\rho_v^2/2\kappa$. (Note that the extremely noisy signal is nevertheless not vacuous; it is not directly informative about the unobservable variable, but is indirectly so by revealing the common noise w.) Now, if σ drops a bit, new evidence starts to receive less weight, rather, in which case, by way of the earlier observation, uncertainty rises. If, for example, σ stays at
\[ \sigma \equiv -\frac{b}{2\kappa\rho_w}\big(\rho_w^2 + (1 + \lambda^{-1})\rho_v^2\big) > 0, \]
then in the limit t → ∞, the weight on the innovation is zero and δ is indeed larger than the earlier limit value $\lambda^{-1}\rho_v^2/2\kappa$.20
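This example can be verified numerically. The sketch below evaluates the limits of γ and δ at the particular σ singled out above and checks that the limiting weight on the innovation, ρw + σ⁻¹b(γ + δ), is zero while δ exceeds the noisy-signal benchmark λ⁻¹ρv²/2κ. The closed form used for the limit of γ is obtained here by setting γ̇ = 0 in the scalar equation (it is not quoted from the text), and all parameter values are hypothetical.

```python
import math

def gamma_limit(kappa, rho_w, rho_v, c):
    """Positive root of 0 = rho_v^2 - 2 (kappa + rho_w c) gamma - c^2 gamma^2."""
    k = kappa + rho_w * c
    return (-k + math.sqrt(k * k + (rho_v * c) ** 2)) / c ** 2

def delta_limit(kappa, rho_w, rho_v, c, lam):
    """Limit of delta from Proposition 3.2, with c = sigma^{-1} b (scalar)."""
    k = kappa + rho_w * c
    return (math.sqrt(k * k + (1.0 + 1.0 / lam) * (rho_v * c) ** 2)
            - math.sqrt(k * k + (rho_v * c) ** 2)) / c ** 2

kappa, rho_w, rho_v, lam, b = 0.3, -0.1, 0.2, 4.0, 1.0   # hypothetical values

# Signal imprecision singled out in the text.
sigma_star = -b * (rho_w ** 2 + (1.0 + 1.0 / lam) * rho_v ** 2) / (2.0 * kappa * rho_w)
c_star = b / sigma_star

gamma_inf = gamma_limit(kappa, rho_w, rho_v, c_star)
delta_inf = delta_limit(kappa, rho_w, rho_v, c_star, lam)
weight_inf = rho_w + c_star * (gamma_inf + delta_inf)    # limiting weight on the innovation

noisy_limit = rho_v ** 2 / (lam * 2.0 * kappa)           # benchmark for very large sigma
delta_noisy = delta_limit(kappa, rho_w, rho_v, b / 1000.0, lam)   # sigma = 1000

print(sigma_star, weight_inf, delta_inf, noisy_limit, delta_noisy)
```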
Similar observations can be made about γ as well; in fact, it is more straightforward
to see that γ may negatively comove with σ given the simpler governing equation.
However, it is of less interest because the “second-order” uncertainty is not reflected
in the equity premium to begin with.
3.3 Convergence to IID Ambiguity
I define convergence to IID ambiguity to mean that the one-step-ahead beliefs process
Ξ converges uniformly in the Hausdorff metric to a compact-convex subset of Rny .21
An obvious sufficient condition is the following:
Proposition 3.3. Convergence to IID ambiguity occurs if σ −1 b is constant.22
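To make the notion of convergence concrete, consider the case n_y = 1, in which Ξ(t) is a symmetric interval. The sketch below computes the Hausdorff distance between two such intervals, once from the general definition applied to finite grids and once from the shortcut that, for symmetric intervals, the distance is the absolute difference of the half-widths. The grid approximation and the function names are illustrative assumptions.

```python
def hausdorff_finite(X, Y):
    """Hausdorff distance between two nonempty finite subsets of the real line."""
    d_xy = max(min(abs(x - y) for y in Y) for x in X)
    d_yx = max(min(abs(y - x) for x in X) for y in Y)
    return max(d_xy, d_yx)

def hausdorff_symmetric_intervals(half_width_1, half_width_2):
    """Hausdorff distance between [-h1, h1] and [-h2, h2]."""
    return abs(half_width_1 - half_width_2)

def grid(half_width, n=200):
    """Evenly spaced points covering [-half_width, half_width], endpoints included."""
    return [half_width * (2.0 * i / n - 1.0) for i in range(n + 1)]

approx = hausdorff_finite(grid(1.0), grid(0.5))
exact = hausdorff_symmetric_intervals(1.0, 0.5)
print(approx, exact)
```

In this one-dimensional case, convergence of Ξ in the Hausdorff metric is therefore the same as convergence of the half-width of the interval.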
Suppose $\sigma^{-1}b$ is constant; and suppose further $n_y = 1$ for simplicity. Then,
\[ \bar\xi(\infty)^2 = 2\alpha\Big[\sqrt{(\kappa + \sigma^{-1}b\rho_w)^2 + (1 + \lambda^{-1})(\sigma^{-1}b\rho_v)^2} - \sqrt{(\kappa + \sigma^{-1}b\rho_w)^2 + (\sigma^{-1}b\rho_v)^2}\Big] \]
where $\Xi(t) = [-\bar\xi(t), \bar\xi(t)]$. Note that $\bar\xi(\infty)$ is nonzero if and only if λ is finite. Also, not surprisingly, it is increasing in α and decreasing in λ.
20 See the supplementary appendix.
21 The Hausdorff metric $d_H$ on the set $\mathcal{K}$ of nonempty compact subsets of $\mathbb{R}^{n_y}$ is defined by
\[ d_H(X, Y) = \max\Big\{\sup_{x\in X}\inf_{y\in Y}|x - y|,\ \sup_{y\in Y}\inf_{x\in X}|y - x|\Big\}, \quad X, Y \in \mathcal{K}. \]
22 The constancy of $\sigma^{-1}b$ can be weakened to uniform convergence.
To see the dependence of $\bar\xi(\infty)$ on other parameters, define Y by Y(0) = 0 and
\[ dY(t) = \sigma(t)^{-1}\,dy(t) = \sigma(t)^{-1}(a(t) + b(t)x(t))\,dt + dw(t), \]
and assume $\sigma^{-1}a$ is deterministic. Then, denoting the asymptotic variability of Y by
\[ V_Y \triangleq \lim_{t\to\infty}\frac{d}{dt}\mathrm{Var}(Y(t)), \]
we can rewrite $\bar\xi(\infty)^2$ as
\[ \bar\xi(\infty)^2 = 2\alpha\Big[\sqrt{\kappa^2V_Y + \lambda^{-1}(\sigma^{-1}b\rho_v)^2} - \sqrt{\kappa^2V_Y}\Big]. \]
Thus, $\bar\xi(\infty)$ is decreasing in κ and $V_Y$. This is intuitive. First, when κ is large, the
unobservable process stays close to the attractor. Second, VY measures the variability
of the unobservable process relative to the measurement error w. The last observation, in particular, is in line with the following remark by Merton (1980): “Unless a
significant portion of the variance of the market returns is caused by changes in the
expected return on the market, it will be difficult to use the time series of realized
market returns to distinguish among different models for expected return.”
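The two expressions for the limiting half-width can be compared numerically. The sketch below assumes the closed form V_Y = (1 + σ⁻¹bρw/κ)² + (σ⁻¹bρv/κ)² for the asymptotic variability, which is not stated in the text; it is what one obtains for a stationary Ornstein-Uhlenbeck state and should be read as an assumption of this sketch. The symbol c stands for σ⁻¹b, and all parameter values are hypothetical.

```python
import math

def xi_bar_sq_direct(alpha, kappa, rho_w, rho_v, c, lam):
    """Limiting squared half-width in terms of the primitives, with c = sigma^{-1} b."""
    base = (kappa + c * rho_w) ** 2
    return 2.0 * alpha * (math.sqrt(base + (1.0 + 1.0 / lam) * (c * rho_v) ** 2)
                          - math.sqrt(base + (c * rho_v) ** 2))

def asymptotic_variability(kappa, rho_w, rho_v, c):
    """Assumed closed form for V_Y (stationary Ornstein-Uhlenbeck state)."""
    return (1.0 + c * rho_w / kappa) ** 2 + (c * rho_v / kappa) ** 2

def xi_bar_sq_from_VY(alpha, kappa, V_Y, rho_v, c, lam):
    """Limiting squared half-width rewritten in terms of V_Y."""
    k2v = kappa ** 2 * V_Y
    return 2.0 * alpha * (math.sqrt(k2v + (c * rho_v) ** 2 / lam) - math.sqrt(k2v))

alpha, kappa, rho_w, rho_v, c, lam = 0.5, 0.3, -0.1, 0.2, 2.0, 4.0   # hypothetical values
V_Y = asymptotic_variability(kappa, rho_w, rho_v, c)

direct = xi_bar_sq_direct(alpha, kappa, rho_w, rho_v, c, lam)
via_VY = xi_bar_sq_from_VY(alpha, kappa, V_Y, rho_v, c, lam)
print(direct, via_VY)
```

Holding the other arguments of `xi_bar_sq_from_VY` fixed and raising `V_Y` lowers the result, in line with the comparative statics in the text.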
4 Portfolio Choice
In Section 4, I apply the model of learning to the consumption/portfolio choice problem of a log investor. The investor finances her intertemporal consumption by trading
one risk-free asset (bond) and a number of risky assets (stocks). She believes, as
is the prevailing view of the financial economics profession, that mean reversion in
stock returns is a plausible assumption (Fama and French, 1988; Poterba and Summers, 1988); but facing at the same time nonnegligible evidence that questions its
validity (Welch and Goyal, 2008), she fails to have full confidence in it.
In Section 4.1, I explain the setup. Sections 4.2 and 4.3 characterize the optimal
demand for stocks. In Section 4.4, I consider the special case in which there is a
single stock and the stock return volatility is constant. This simplification allows us
to establish certain analytical properties of the optimal policy. In Section 4.4.3, I
numerically compute the optimal policy and discuss its behavior in comparison with
the related models by Epstein and Schneider (2007) and Miao (2009).
4.1 Setup
As with the previous section, time is continuous and varies over [0, T ], T ∈ (0, ∞).
4.1.1 Securities Market Dynamics
There is a single consumption good in the economy, which is continuously consumed
and serves as the numeraire. The investor finances her consumption by trading one
bond and nR ≥ 1 stocks.
The interest rate on the bond is constant at r ∈ R.
Regarding how the stock returns are generated, on the other hand, the investor
entertains multiple theories. Specifically, the theories take the form of probability
measures on a common measurable space: Let there be a measurable space (Ω, F), a
set Q of probability measures (theoretical priors) on (Ω, F), and a filtration F = {Ft }
of F. The theoretical priors are equivalent and F satisfies the usual conditions with
respect to the theoretical priors. Under the theoretical prior Qx̄,η ∈ Q, where (x̄, η) ∈
Rnx × L2 ([0, T ], Rnx ) and nx ≥ 1, the cumulative return process R = {R(t), Ft } of
the stocks is given by part of the solution to the system of SDEs
\[ dR(t) = (a_R(t, R, A) + b_R(t, R, A)x(t))\,dt + \sigma_R(t, R, A)\,dw(t), \tag{29} \]
\[ dA(t) = (a_A(t, R, A) + b_A(t, R, A)x(t))\,dt + \sigma_A(t, R, A)\,dw(t), \tag{30} \]
\[ dx(t) = \kappa(\bar x - x(t))\,dt + \rho_w\,dw(t) + \rho_v(dv^{\bar x,\eta}(t) + \eta(t)\,dt). \]
Here, A = {A(t), Ft } is an nA -dimensional process, nA ≥ 0; x = {x(t), Ft } is an
nx -dimensional process; nx ≤ nR + nA ; w = {w(t), Ft } and v x̄,η = {v x̄,η (t), Ft } are
independent Wiener processes of dimension nR + nA and nx , respectively; aR , bR , σR ,
aA , bA , and σA are nonanticipating path functionals from [0, T ] × C([0, T ], RnR +nA )
into RnR , RnR ×nx , RnR ×(nR +nA ) , RnA , RnA ×nx , and RnA ×(nR +nA ) , respectively; κ is an
nx × nx diagonal matrix with positive entries, ρw is an nx × (nR + nA ) matrix, and
ρv is an nx × nx invertible matrix.
The investor observes R and A but not x. A in this context represents the observable macroeconomic variables in addition to the stock returns themselves that affect
the stock returns; and x the latent state of the economy.
To conform to the notation of Section 2, let $y \triangleq (R^\top, A^\top)^\top$ and $n_y \triangleq n_R + n_A$.
Then the dynamics (29)–(30) of the observable processes can be rewritten compactly
as
dy(t) = (a(t, y) + b(t, y)x(t)) dt + σ(t, y) dw(t)
where the definitions of a, b, and σ are obvious. I continue to adopt the slightly
abusive notation f (t) ≡ f (t, ω) ≡ f (t, y(ω)) for the path functionals f .
4.1.2 The Investor’s Preferences
The investor has the Chen-Epstein recursive multiple-priors utility with log felicity.
Her conditional preferences at time t ∈ [0, T] are represented by
\[ \min_{\xi\in\Xi} E^{P^\xi}\int_t^T e^{-\beta s}\log(c(s))\,ds, \quad c \in \mathcal{C}. \]
Under a generic preferential prior $P^\xi$,
\[ dR(t) = (a_R(t) + b_R(t)m^*_t)\,dt + \sigma_R(t)(d\epsilon^\xi(t) + \xi(t)\,dt), \]
where $m^*_t \equiv m^{\bar x^*_t,\eta^*_t}(t)$ is the maximum plausibility estimate of the conditional expectation of x(t) given $\mathcal{G}_t$ and $\epsilon^\xi = \{\epsilon^\xi(t), \mathcal{G}_t\}$ is a $P^\xi$-Wiener process of dimension $n_y$. Ξ(t) thus acquires a more specific interpretation as the ambiguity in the contemporaneous price of risk.
4.1.3 Trading Strategies and the Budget Constraint
A $(1 + n_R)$-dimensional process $(\Pi^\circ, \Pi)$, $\Pi(t) = (\Pi_1(t), \ldots, \Pi_{n_R}(t))^\top$, is a trading strategy if G-progressive and
\[ \int_0^T (|\Pi^\circ(t)| + |\Pi(t)|^2)\,dt < \infty. \]
$\Pi^\circ$ represents the amount of money invested in the bond and Π those invested in the stocks. A trading strategy $(\Pi^\circ, \Pi)$ finances a consumption plan c ∈ C if $\Pi^\circ(T) + \Pi(T)^\top\mathbf{1}_{n_R} \ge 0$ and
\[ d(\Pi^\circ(t) + \Pi(t)^\top\mathbf{1}_{n_R}) = \Pi^\circ(t)r\,dt + \Pi(t)^\top dR(t) - c(t)\,dt \]
where $\mathbf{1}_{n_R}$ denotes the $n_R$-dimensional vector of ones. Denote the wealth process $\Pi^\circ + \Pi^\top\mathbf{1}_{n_R}$ by W. W satisfies
\[ dW(t) = (W(t) - \Pi(t)^\top\mathbf{1}_{n_R})r\,dt + \Pi(t)^\top dR(t) - c(t)\,dt \tag{31} \]
with initial condition $W(0) = \Pi^\circ(0) + \Pi(0)^\top\mathbf{1}_{n_R}$. In fact, W is the unique strong
solution to the last equation, and therefore, we can suppress Π◦ and identify a trading
strategy with Π. A pair (Π, c) is admissible for initial wealth W (0) if the corresponding
wealth process W Π,c,W (0) is uniformly bounded below.
The market is dynamically incomplete if nA > 0. Let
$$\zeta(t) \triangleq \sigma_R(t)^\top (\sigma_R(t)\sigma_R(t)^\top)^{-1} (a_R(t) + b_R(t) m^*_t - r 1_{n_R}).$$
A consumption process $c \in C$ can be financed by some trading strategy if and only if it satisfies the following static budget constraint:
$$\sup_{\nu \in \mathrm{Ker}(\sigma_R)} E^{P^0}\left[\int_0^T \mathcal{E}^{-(\zeta+\nu)}(t)\, e^{-rt} c(t)\, dt\right] \leq W(0) \tag{32}$$
where $\mathrm{Ker}(\sigma_R)$ denotes the set of processes $\nu$ such that $\sigma_R(t, \omega)\nu(t, \omega) = 0$ a.e. (He and Pearson, 1991; Karatzas et al., 1991; Cuoco, 1997).
Remark 4.1. If the investor had full confidence in a simple theory Qx̄,0 ∈ Q, then
the present model would have as special cases the Bayesian learning models of Lakner
(1998), Xia (2001), Zohar (2001), and Brendle (2006), in which the unobservable instantaneous expected return process follows an Ornstein-Uhlenbeck process. In other
words, this section extends the latter models to a case of ambiguity.
4.2
Optimal Consumption and Portfolio
Let $C^2(u) \subset C$ denote the set of consumption processes such that
$$E^{P^0}\left[\int_0^T [\log(c(t))]^2\, dt\right] < \infty.$$
I define the investor’s problem to be
$$\sup_{c \in C^2(u)} \min_{\xi \in \Xi} E^{P^\xi}\left[\int_0^T e^{-\beta t} \log(c(t))\, dt\right] \tag{33}$$
subject to the budget constraint (32). The objective function in (33) is finite for all
(c, ξ) ∈ C 2 (u) × Ξ due to the definition of C 2 (u) and the uniform boundedness of
Ξ. Let Cbudget ⊂ C denote the set of consumption processes that satisfy the budget
constraint.
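Lemma 4.1. The minimax theorem holds, that is,
$$\sup_{c \in C^2(u) \cap C_{\mathrm{budget}}} \min_{\xi \in \Xi} E^{P^\xi}\left[\int_0^T e^{-\beta t} \log(c(t))\, dt\right] = \min_{\xi \in \Xi} \sup_{c \in C^2(u) \cap C_{\mathrm{budget}}} E^{P^\xi}\left[\int_0^T e^{-\beta t} \log(c(t))\, dt\right].$$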
Remark 4.2. It is clear from the proof that the claim is true for any concave felicity,
with the corresponding change to the definition of C 2 (u).
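Proposition 4.1. For a given $\xi \in \Xi$, the inner supremum
$$\sup_{c \in C^2(u) \cap C_{\mathrm{budget}}} E^{P^\xi}\left[\int_0^T e^{-\beta t} \log(c(t))\, dt\right] \tag{34}$$
equals
$$-\frac{1 - e^{-\beta T}}{\beta} \log\frac{1 - e^{-\beta T}}{\beta} + \frac{\beta - r}{\beta}\left(T e^{-\beta T} - \frac{1 - e^{-\beta T}}{\beta}\right) + \frac{1 - e^{-\beta T}}{\beta} \log W(0) + E^{P^\xi}\left[\int_0^T \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\, \frac{1}{2}\, |\zeta(t) + \sigma_R(t)^\top (\sigma_R(t)\sigma_R(t)^\top)^{-1} \sigma_R(t)\xi(t)|^2\, dt\right]. \tag{35}$$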
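Let $\xi^*$ denote the minimizer of the last expression:
$$\xi^* \triangleq \arg\min_{\xi \in \Xi} E^{P^\xi}\left[\int_0^T \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\, \frac{1}{2}\, |\zeta(t) + \sigma_R(t)^\top (\sigma_R(t)\sigma_R(t)^\top)^{-1} \sigma_R(t)\xi(t)|^2\, dt\right]. \tag{36}$$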
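The optimal consumption process is given by
$$c^*(t) = \beta W(0)\, e^{rt}\, \frac{e^{-\beta t}}{1 - e^{-\beta T}}\, \frac{\mathcal{E}^{\xi^*}(t)}{\mathcal{E}^{-(\zeta+\nu^*)}(t)}, \tag{37}$$
$$\nu^*(t) = [\sigma_R(t)^\top (\sigma_R(t)\sigma_R(t)^\top)^{-1} \sigma_R(t) - I_{n_y}]\, \xi^*(t). \tag{38}$$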
Hence the key is to solve (36). Note for later reference that
ζ(t) + σR (t)> (σR (t)σR (t)> )−1 σR (t)ξ(t)
= σR (t)> (σR (t)σR (t)> )−1 (aR (t) + bR (t)m∗t − r1nR + σR (t)ξ(t)).
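As a sanity check on the closed form (35), the following minimal sketch compares it with a direct quadrature of the discounted expected log consumption implied by (37). It assumes, purely for illustration, a single stock with constant $\zeta$ and $\xi$, so that the expected logarithm of the ratio of stochastic exponentials in (37) under $P^\xi$ grows like $\frac{1}{2}(\zeta+\xi)^2 t$. The function names and the scalar `theta` (standing for $\zeta+\xi$) are hypothetical and do not appear in the paper.

```python
import math

def closed_form_value(beta, r, T, w0, theta):
    """Right-hand side of (35) when zeta + xi equals a constant scalar theta."""
    a = (1.0 - math.exp(-beta * T)) / beta  # integral of exp(-beta t) over [0, T]
    deterministic = (-a * math.log(a)
                     + (beta - r) / beta * (T * math.exp(-beta * T) - a)
                     + a * math.log(w0))
    # integral of (exp(-beta t) - exp(-beta T)) / beta over [0, T]
    weight = a / beta - T * math.exp(-beta * T) / beta
    return deterministic + 0.5 * theta ** 2 * weight

def quadrature_value(beta, r, T, w0, theta, n=2000):
    """Simpson's rule for the integral of exp(-beta t) E[log c*(t)] over [0, T]."""
    def integrand(t):
        # Expected log of (37) under the constant-theta assumption.
        log_c = (math.log(beta * w0 / (1.0 - math.exp(-beta * T)))
                 + (r - beta) * t + 0.5 * theta ** 2 * t)
        return math.exp(-beta * t) * log_c
    h = T / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(T)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0
```

Calling both functions with, say, `beta=0.03`, `r=0.0432`, `T=10.0`, `w0=1.0`, and `theta=0.3` should return values that agree up to quadrature error.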
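To find the trading strategy that finances $c^*$, observe first that the wealth process corresponding to $c^*$ is
$$W^*(t) = \frac{1}{B(t)^{-1} \mathcal{E}^{-(\zeta+\nu^*)}(t)}\, E^{P^0}\left[\int_t^T B(s)^{-1} \mathcal{E}^{-(\zeta+\nu^*)}(s)\, c^*(s)\, ds \,\Big|\, G_t\right] = W(0)\, e^{rt}\, \frac{e^{-\beta t} - e^{-\beta T}}{1 - e^{-\beta T}}\, \frac{\mathcal{E}^{\xi^*}(t)}{\mathcal{E}^{-(\zeta+\nu^*)}(t)}. \tag{39}$$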
Thus its differential is
$$dW^*(t) = W^*(t)(\zeta(t) + \nu^*(t) + \xi^*(t))^\top d\epsilon(t) + (\,\cdot\,)\, dt.$$
Comparing the last expression with (31) and recalling (38), we see that
$$\pi^*(t) \triangleq \Pi^*(t)/W^*(t) = (\sigma_R(t)\sigma_R(t)^\top)^{-1} (a_R(t) + b_R(t) m^*_t - r 1_{n_R} + \sigma_R(t)\xi^*(t)) \tag{40}$$
where $\pi^*$ denotes the optimal fraction of wealth invested in the stock.
The optimal consumption plan $c^*$ found above equals that of the Bayesian investor with unique prior $P^{\xi^*}$. Accordingly, $\pi^*$ equals the stock demand of the same Bayesian investor, the term involving $\xi^*$ accounting for the discrepancy between $P^{\xi^*}$ and $P^0$. This observation also suggests that, as is characteristic of Bayesian log investors, the optimal consumption is given by a fraction of wealth independent of other state variables, or precisely,
$$c^*(t) = \frac{\beta}{1 - e^{-\beta(T-t)}}\, W^*(t),$$
as can be verified from (37) and (39).
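The verification from (37) and (39) is a one-line cancellation: the factors $W(0)e^{rt}$ and the ratio of stochastic exponentials are common to both expressions. A small sketch of the arithmetic follows; the function names are hypothetical and the parameter values in the usage note are illustrative only.

```python
import math

def consumption_wealth_ratio(beta, T, t):
    """c*(t) / W*(t) = beta / (1 - exp(-beta (T - t))), valid for t < T."""
    return beta / (1.0 - math.exp(-beta * (T - t)))

def ratio_from_37_and_39(beta, T, t):
    """Same ratio built from the deterministic factors of (37) and (39).

    W(0) e^{rt} and the ratio of stochastic exponentials appear in both
    c*(t) and W*(t), so they cancel and are omitted here.
    """
    c_factor = beta * math.exp(-beta * t) / (1.0 - math.exp(-beta * T))
    w_factor = (math.exp(-beta * t) - math.exp(-beta * T)) / (1.0 - math.exp(-beta * T))
    return c_factor / w_factor
```

For example, with `beta=0.03` and `T=10.0` the two functions agree at any `t` in `[0, T)`, and the ratio approaches `beta` as the remaining horizon `T - t` grows.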
4.3
Markovian Characterization
Suppose the economy is Markovian, that is,
f (t, R, A) = f (t, R(t), A(t))
where f = a, b, or σ. Then the investor’s information can be summarized by a finite
number of Markovian variables.
Observe first that the Bayesian investor who has full confidence in a simple theory $Q^{\bar{x},0} \in Q$ has the following as the state variables (see Proposition 2.2):
$$R(t),\ A(t),\ m^{\bar{x},0}(t),\ \text{and } \gamma(t).$$
Our investor also has these as state variables, with the obvious replacement of $m^{\bar{x},0}(t)$ by $m^*_t$, that is,
$$R(t),\ A(t),\ m^*_t,\ \text{and } \gamma(t), \tag{41}$$
and the following in addition:
$$\bar{x}^*_t,\ \sigma_{\bar{x}^*}(t),\ I_{\bar{x}}(t)^{-1},\ \text{and } \delta(t). \tag{42}$$
See Propositions 2.6, 2.7, and 2.9. The first three of (42) originate from the estimation
of x̄; the last is needed to describe the set of one-step-ahead conditionals Ξ(t). The
standard control approach to the minimization (36) requires that Ξ(t), ζ(t), and σR (t)
be functions of some (multidimensional) Markov process, and Propositions 2.6 and
2.7 confirm that the variables identified in (41) and (42) form a closed system of
Markovian variables. Collect them in Z,
$$Z \triangleq (R, A, m^*, \gamma, \bar{x}^*, \sigma_{\bar{x}^*}, I_{\bar{x}}^{-1}, \delta)^\top,$$
and write
$$dZ(t) = \mu_Z(t, Z(t))\, dt + \sigma_Z(t, Z(t))\, d\epsilon(t). \tag{43}$$
Remark 4.3. Some of the state variables identified above may be redundant. For
example, if a, b, and σ are deterministic functions of time independent of R and A,
then it suffices to take as state variables m∗ and x̄∗ . See Section 4.4 below.
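Define the value function as
$$J(t, Z) \triangleq \min_{\xi \in \Xi} E^{P^\xi}\left[\int_t^T \frac{e^{-\beta s} - e^{-\beta T}}{\beta}\, \frac{1}{2}\, |\zeta(s) + \sigma_R(s)^\top (\sigma_R(s)\sigma_R(s)^\top)^{-1} \sigma_R(s)\xi(s)|^2\, ds \,\Big|\, Z(t) = Z\right]$$
subject to the state dynamics (43). Picking a particular $\xi \in \Xi$ is to say that $\epsilon^\xi = \{\epsilon^\xi(t), G_t\}$ defined by $d\epsilon^\xi(t) = d\epsilon(t) - \xi(t)\, dt$ is a Wiener process. Hence
$$J(t, Z) = \min_{\xi \in \Xi} E^{P^0}\left[\int_t^T \frac{e^{-\beta s} - e^{-\beta T}}{\beta}\, \frac{1}{2}\, |\zeta^\xi(s) + \sigma_R^\xi(s)^\top (\sigma_R^\xi(s)\sigma_R^\xi(s)^\top)^{-1} \sigma_R^\xi(s)\xi(s)|^2\, ds \,\Big|\, Z^\xi(t) = Z\right] \tag{44}$$
subject to
$$dZ^\xi(t) = \mu_Z(t, Z^\xi(t))\, dt + \sigma_Z(t, Z^\xi(t))(d\epsilon(t) + \xi(t)\, dt)$$
where $\sigma_R^\xi(s) \equiv \sigma_R(s, R^\xi(s), A^\xi(s))$. (44) is linear-quadratic in the control, although not in the state and hence not linear-quadratic in the classical sense. The corresponding Hamilton-Jacobi-Bellman (HJB) equation is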
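$$0 = \min_{\xi(t) \in \Xi(t,Z)} \Big\{ \partial_t J(t, Z) + (\partial_Z J(t, Z))^\top (\mu_Z(t, Z) + \sigma_Z(t, Z)\xi(t)) + \frac{1}{2} \mathrm{tr}[(\partial_Z^2 J(t, Z))\sigma_Z(t, Z)\sigma_Z(t, Z)^\top] + \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\, \frac{1}{2}\, |\zeta(t, Z) + \sigma_R(t, Z)^\top (\sigma_R(t, Z)\sigma_R(t, Z)^\top)^{-1} \sigma_R(t, Z)\xi(t)|^2 \Big\} \tag{45}$$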
with boundary condition J(T, Z) = 0 for all Z. In general, (45) is of degenerate
parabolic type and we can only say that the value function is a viscosity solution of
(45). But see Section 4.4.1, where I consider a special case in which the value function
is a unique classical solution to the HJB equation.
4.4
Examples
To gain intuition, I consider in this section the special case in which there is a single
stock, the stock return volatility is constant, and there are no other observable macroeconomic indicators that affect stock returns. That is, nR = 1 and nA = 0 so that
ny = nx = 1 and σ(t, y) = σR (t, y) = σR ∈ (0, ∞) for all (t, y). Assume furthermore
aR ≡ 0 and bR ≡ 1. This setup is simple but rich enough to let us discuss key aspects
of the optimal policy.
4.4.1
x̄ Known
Suppose first that x̄ is known.
Optimal Policy Revisited Under the aforementioned assumptions, the investor’s
problem is Markovian and her optimal stock demand can be written in a simple
feedback form.
Recall Section 4.3 and note that (i) R and A are redundant as state variables
because σ is constant, (ii) γ and δ are redundant because they are deterministic, and
(iii) x̄∗ ≡ x̄, σx̄∗ , and Ix̄−1 are redundant because x̄ is known. It thus suffices to take
m∗ as the sole state variable (Z = m∗ ). The controlled state dynamics is (see (22))
$$dm^{*,\xi}_t = \kappa(\bar{x} - m^{*,\xi}_t)\, dt + (\rho_w \sigma_R + \gamma(t) + \delta(t))\sigma_R^{-1}(d\epsilon(t) + \xi(t)\, dt) =: \mu_{m^*}(m^{*,\xi}_t)\, dt + \sigma_{m^*}(t)(d\epsilon(t) + \xi(t)\, dt).$$
The price of risk under $P^0$ is simplified to
$$\zeta(m^*) = \frac{m^* - r}{\sigma_R}.$$
$\Xi(t)$ is given by an interval $[-\bar{\xi}(t), \bar{\xi}(t)]$ where
$$\bar{\xi}(t) \triangleq \frac{\sqrt{2\alpha\delta(t)}}{\sigma_R};$$
$\bar{\xi}(t)$ measures the magnitude of the ambiguity in the price of risk and is increasing in the investor’s conservatism in model selection $\alpha$ and the estimation ambiguity $\delta(t)$. Also, it decreases monotonically and deterministically over time, converging to a constant, as is a property of $\delta$. The estimated equity premium is $m^* - r$ and the ambiguity in the equity premium is $\sigma_R \bar{\xi}(t) = \sqrt{2\alpha\delta(t)}$. (Unless necessary, the [true or estimated] instantaneous equity premium will be referred to simply as the [true or estimated] equity premium.)
Next, the HJB equation (45) is simplified to
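$$0 = \min_{\xi(t) \in \Xi(t)} \Big\{ \partial_t J(t, m^*) + (\partial_{m^*} J(t, m^*))(\mu_{m^*}(m^*) + \sigma_{m^*}(t)\xi(t)) + \frac{1}{2}(\partial^2_{m^*} J(t, m^*))\sigma^2_{m^*}(t) + \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\, \frac{1}{2}\, (\zeta(m^*) + \xi(t))^2 \Big\} \tag{46}$$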
with boundary condition J(T, m∗ ) = 0 for all m∗ . It is still not clear if (46) allows for
an analytical solution, but we can now check some basic properties of the value function.23 C 1,2 ([0, T ] × R) denotes the set of real-valued functions f from [0, T ] × R such
that f (t, m∗ ) is continuously differentiable in t and twice continuously differentiable
in m∗ ; and Cp ([0, T ] × R) the set of real-valued functions f from [0, T ] × R that are
continuous and satisfy the polynomial growth condition:
|f (t, m∗ )| ≤ K(1 + |m∗ |n ) for all m∗ ∈ R
for some nonnegative constants K and n. Assume for the rest of Section 4.4.1,
Assumption 4.1. $\sigma^2_{m^*} : [0, T] \to \mathbb{R}$ is bounded below away from zero.
The assumption trivially holds if ρw ≥ 0.
Proposition 4.2. (i) The partial differential equation (46) with its boundary condition has a unique solution $K \in C^{1,2}([0, T] \times \mathbb{R}) \cap C_p([0, T] \times \mathbb{R})$.
(ii) $K$ is the value function, that is, $K = J$.
(iii)
$$\xi^*(t, m^*) = \max\{-\bar{\xi}(t), \min\{\bar{\xi}(t), \xi^U(t, m^*)\}\},$$
$$\xi^U(t, m^*) \triangleq -\zeta(m^*) - \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*).$$
Thus, in particular, the optimal control $\xi^* : [0, T] \times \mathbb{R} \to \mathbb{R}$ is continuous.
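Although (46) may not admit an analytical solution, it can be approximated numerically. The following is only a rough sketch of one possible approach, an explicit upwind finite-difference scheme stepped backward from $J(T, \cdot) = 0$, and is not the method used in the paper. It assumes a constant $\sigma_{m^*}$, a finite grid of candidate controls on $[-\bar{\xi}, \bar{\xi}]$, and a crude boundary treatment that copies the edge values; all default parameter values and the function name `solve_hjb` are hypothetical.

```python
import math

def solve_hjb(xi_bar, use_ambiguity=True, beta=0.03, r=0.0432, kappa=0.2743,
              x_bar=0.0890, sigma_r=0.1428, sigma_m=0.007, T=10.0,
              half_width=0.10, n_m=81, n_xi=11):
    """Explicit upwind scheme for (46); returns the m*-grid and J(0, .)."""
    dm = 2.0 * half_width / (n_m - 1)
    grid = [r - half_width + i * dm for i in range(n_m)]
    half = n_xi // 2
    if use_ambiguity and xi_bar > 0.0:
        # Symmetric grid of candidate controls; it contains xi = 0.
        controls = [xi_bar * k / half for k in range(-half, half + 1)]
    else:
        controls = [0.0]  # no ambiguity: the reference measure only
    # Time step small enough for the explicit scheme to be monotone (CFL bound).
    max_drift = kappa * (abs(x_bar - r) + half_width) + abs(sigma_m) * xi_bar
    dt_max = 1.0 / (max_drift / dm + sigma_m ** 2 / dm ** 2)
    n_t = int(math.ceil(T / (0.9 * dt_max)))
    dt = T / n_t
    J = [0.0] * n_m  # boundary condition J(T, m*) = 0
    for step in range(n_t, 0, -1):
        t_mid = (step - 0.5) * dt
        weight = (math.exp(-beta * t_mid) - math.exp(-beta * T)) / beta
        new_J = []
        for i, m in enumerate(grid):
            up = J[min(i + 1, n_m - 1)]    # edge values are copied at the boundary
            down = J[max(i - 1, 0)]
            diffusion = 0.5 * sigma_m ** 2 * (up - 2.0 * J[i] + down) / dm ** 2
            best = None
            for xi in controls:
                drift = kappa * (x_bar - m) + sigma_m * xi
                # Upwind difference: look in the direction the state drifts.
                slope = (up - J[i]) / dm if drift >= 0.0 else (J[i] - down) / dm
                cost = weight * 0.5 * ((m - r) / sigma_r + xi) ** 2
                value = drift * slope + cost
                if best is None or value < best:
                    best = value
            new_J.append(J[i] + dt * (diffusion + best))
        J = new_J
    return grid, J
```

Calling `solve_hjb(xi_bar=0.07)` and `solve_hjb(xi_bar=0.07, use_ambiguity=False)` on the same grid allows a comparison of the value function with and without ambiguity; because the minimization is over a set that contains $\xi = 0$, the former should not exceed the latter. A finite difference of the returned values then gives an approximation of $\partial_{m^*} J(0, m^*)$.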
The expression for the optimal stock demand (40) becomes
$$\pi^*(t, m^*) = \frac{m^* - r + \sigma_R \xi^*(t, m^*)}{\sigma_R^2} = \max\left\{\frac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \min\left\{\frac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ -\frac{1}{\sigma_R^2}\, \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*)\right\}\right\}. \tag{47}$$
23 It is possible to formulate (46) as a free boundary problem and characterize the solution to a certain degree (cf. Davis and Norman (1990)), but there is little practical benefit and I do not pursue this direction.
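Lemma 4.2. (i) $J(t, m^*)$ is convex in $m^*$.
(ii)
$$\partial_{m^*} J(t, m^*) = E^{P^0}\left[\int_t^T \frac{e^{-\beta s} - e^{-\beta T}}{\beta}\, \frac{e^{-\kappa(s-t)}}{\sigma_R}\, \frac{m^{*,\xi^*}_s - r + \sigma_R \xi^*(s)}{\sigma_R}\, ds \,\Big|\, m^{*,\xi^*}_t = m^*\right].$$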
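From the convexity, that is, from the fact that $\partial_{m^*} J(t, m^*)$ is nondecreasing in $m^*$, it follows that
$$\frac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2} = -\frac{1}{\sigma_R^2}\, \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*)$$
and
$$\frac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2} = -\frac{1}{\sigma_R^2}\, \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*)$$
as equations in $m^*$ each have a unique solution, $\overline{m}^*(t)$ and $\underline{m}^*(t) < \overline{m}^*(t)$, respectively. $\pi^*$ can be rewritten as
$$\pi^*(t, m^*) = \begin{cases} \dfrac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2} & \text{if } m^* < \underline{m}^*(t) \\[2ex] \dfrac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2} & \text{if } m^* > \overline{m}^*(t) \\[2ex] -\dfrac{1}{\sigma_R^2}\, \dfrac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*) & \text{if } m^* \in [\underline{m}^*(t), \overline{m}^*(t)]. \end{cases}$$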
Since $\xi^*$ is bounded, the effect of ambiguity on $\partial_{m^*} J(t, m^*)$ is negligible for values of $m^*$ with a large absolute value. Combined with convexity, this implies that $m^* \mapsto J(t, m^*)$ is U-shaped. (Epstein and Schneider (2007, p. 1296) make a similar observation from a numerical exercise.) As with her Bayesian counterpart with unique theoretical prior $Q^{\bar{x},0}$, our multiple-priors investor, too, is better off when the estimated equity premium is further away from zero, that is, when the stocks are (locally, in expected terms) more distinct from the bond. The U-shape implies that the optimal policy may have curvature in the central region $m^* \in [\underline{m}^*(t), \overline{m}^*(t)]$.
Compared to the Bayesian policy, our investor’s stock demand is (i) shifted up
by the ambiguity in the equity premium (divided by the return variance) when the
estimated equity premium m∗ − r is sufficiently small (in the sense of < on the
real line), (ii) shifted down by the same amount when m∗ − r is sufficiently large,
and (iii) proportional to the negative of the instantaneous covariation between the
stock return and the state (−σR σm∗ (t)) and the first derivative of the value function
(∂m∗ J(t, m∗ )), when m∗ − r is intermediate. Clearly, the last case is reminiscent of
Merton’s (1973) hedging demand; it tells the investor to hold more of the stock if it
pays in cases of low continuation utility. (But it is not exactly the same as Merton’s
hedging demand. His is such that the investor holds more of the assets that pay in
cases of low consumption, or equivalently, high marginal utility.) I will have a deeper
look at the quantity −σR σm∗ (t)∂m∗ J(t, m∗ ) later, but to talk about hedging, first we
have to clarify the myopic demand.
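The three regimes described above can be illustrated with a short sketch that evaluates the clipped control of Proposition 4.2(iii) and the resulting stock demand. The derivative $\partial_{m^*} J(t, m^*)$ is not available in closed form, so it enters here as a user-supplied number `dJ`; this input, the function names, and all numerical values in the usage note are hypothetical.

```python
import math

def hedge_scale(beta, T, t):
    """The factor beta e^{beta t} / (1 - e^{-beta (T - t)}), for t < T."""
    return beta * math.exp(beta * t) / (1.0 - math.exp(-beta * (T - t)))

def optimal_control(m, dJ, t, xi_bar, r, sigma_r, sigma_m, beta, T):
    """xi*(t, m*): the unconstrained minimizer xi^U clipped to [-xi_bar, xi_bar]."""
    xi_u = -(m - r) / sigma_r - hedge_scale(beta, T, t) * sigma_m * dJ
    return max(-xi_bar, min(xi_bar, xi_u))

def stock_demand(m, dJ, t, xi_bar, r, sigma_r, sigma_m, beta, T):
    """pi* = (m* - r + sigma_R xi*) / sigma_R^2 for a single stock."""
    xi = optimal_control(m, dJ, t, xi_bar, r, sigma_r, sigma_m, beta, T)
    return (m - r + sigma_r * xi) / sigma_r ** 2

def stock_demand_clipped(m, dJ, t, xi_bar, r, sigma_r, sigma_m, beta, T):
    """Equivalent form: the shadow hedging demand confined between two bounds."""
    lower = (m - r - sigma_r * xi_bar) / sigma_r ** 2
    upper = (m - r + sigma_r * xi_bar) / sigma_r ** 2
    shadow = -hedge_scale(beta, T, t) * sigma_r * sigma_m * dJ / sigma_r ** 2
    return max(lower, min(upper, shadow))
```

With `sigma_r=0.1428` and `xi_bar=0.01/0.1428` (an ambiguity in the equity premium of 0.01), an estimated premium of 0.05 and `dJ=0.0` give a demand of $(0.05 - 0.01)/\sigma_R^2$, the Bayesian demand shifted down by the ambiguity, while a premium of 0.002 with `dJ=0.0` gives a demand of zero.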
Myopic Demand The myopic demand is defined to be
$$\pi^*_{\mathrm{myopic}}(t, m^*) \triangleq \left[\lim_{t \to T} \pi^*(t, m^*)\right]_{T = t}.$$
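Proposition 4.3.
$$\pi^*_{\mathrm{myopic}}(t, m^*) = \begin{cases} \dfrac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2} & \text{if } m^* - r < -\sigma_R \bar{\xi}(t) \\[2ex] \dfrac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2} & \text{if } m^* - r > +\sigma_R \bar{\xi}(t) \\[2ex] 0 & \text{if } -\sigma_R \bar{\xi}(t) \leq m^* - r \leq \sigma_R \bar{\xi}(t). \end{cases}$$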
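The myopic demand is more conservative than that of the Bayesian investor with unique theoretical prior $Q^{\bar{x},0}$ in that in absolute values, the former is dominated by the latter:
$$|\pi^*_{\mathrm{myopic}}(t, m^*)| \leq \left|\frac{m^* - r}{\sigma_R^2}\right| \text{ for all } m^* \quad \text{and} \quad |\pi^*_{\mathrm{myopic}}(t, m^*)| < \left|\frac{m^* - r}{\sigma_R^2}\right| \text{ for all } m^* \neq r.$$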
(I am comparing the feedback policies, considering m∗ to be signifying the estimate
of each investor. The actual values of m∗ will differ between the two investors.) Furthermore, there is a range of estimated equity premia for which our investor neither
buys nor sells short the stock. Say that the estimated equity premium is unambiguously positive if it is greater than the ambiguity in the equity premium, that is, if $m^* - r > \sigma_R \bar{\xi}(t)$; unambiguously negative if $m^* - r < -\sigma_R \bar{\xi}(t)$; and not unambiguously distinct from zero, otherwise. Then, the observation, rephrased, is that the
multiple-priors investor, if myopic, does not participate in the stock market when her
estimate of the equity premium is not unambiguously distinct from zero; and participates when it is unambiguously positive or negative but invests a smaller fraction of
her wealth than the Bayesian counterpart with the same estimate would. See Dow and
Werlang (1992), who first presented a nonparticipation result for ambiguity-averse (in
the sense of Schmeidler (1989)) investors.
Hedging Demand Under risk, log investors do not hedge; under ambiguity, they
do.
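Recall the total demand (47) and let
$$\pi^{**}_{\mathrm{hedging}}(t, m^*) \triangleq -\frac{1}{\sigma_R^2}\, \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \sigma_R \sigma_{m^*}(t)\, \partial_{m^*} J(t, m^*).$$
As noted earlier, $\pi^{**}_{\mathrm{hedging}}$ reflects the investor’s desire to hedge against adverse changes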
in the investment opportunities. Under ambiguity, an adverse change in the investment opportunities is a change in the state variables that is associated with a decrease
in continuation utility. In the present case, if the (estimated) equity premium is sufficiently large that ∂m∗ J(t, m∗ ) > 0, then the investor would fear a decrease in the
equity premium, that is, its becoming ambiguous, and want to transfer wealth to
states with lower equity premia. And she could do this by holding more of the stock
if it pays at times of lower equity premia and less of it if it does not.
However, the desire to hedge is not fully realized, and how much of it is realized depends on the magnitude of the ambiguity present. The total demand $\pi^*(t, m^*)$ is given by $\pi^{**}_{\mathrm{hedging}}(t, m^*)$ confined between $(m^* - r \pm \sigma_R \bar{\xi}(t))/\sigma_R^2$, which collapse to the Bayesian demand $(m^* - r)/\sigma_R^2$ when no ambiguity is present. Hence I call $\pi^{**}_{\mathrm{hedging}}$ the shadow hedging demand. Finally, based on the interpretation of $\pi^{**}_{\mathrm{hedging}}$, the difference $\pi^* - \pi^*_{\mathrm{myopic}}$ between the total demand and the myopic demand is called the hedging demand, although the intent is not fully realized:
$$\pi^*_{\mathrm{hedging}}(t, m^*) \triangleq \pi^*(t, m^*) - \pi^*_{\mathrm{myopic}}(t, m^*) = \max\left\{\frac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \min\left\{\frac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \pi^{**}_{\mathrm{hedging}}(t, m^*)\right\}\right\} - \max\left\{\frac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \min\left\{\frac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ 0\right\}\right\}.$$
Long-horizon, multiple-priors log investors’ nonmyopic behavior was first observed in discrete time by Epstein and Schneider (2007) and in continuous time by Hernández-Hernández and Schied (2007a).
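The decomposition of the total demand into its myopic and hedging parts amounts to clipping two different numbers to the same interval. The sketch below makes this explicit; the shadow hedging demand is passed in as a given number, and the function names and the numerical inputs in the usage note are hypothetical.

```python
def clip(x, lo, hi):
    """Confine x to the interval [lo, hi]."""
    return max(lo, min(hi, x))

def demand_decomposition(premium, ambiguity, sigma_r, shadow_hedging):
    """Return (total, myopic, hedging) demands as fractions of wealth.

    premium        : estimated equity premium m* - r
    ambiguity      : ambiguity in the equity premium, sigma_R * xi_bar
    shadow_hedging : the shadow hedging demand, supplied by the caller
    """
    lower = (premium - ambiguity) / sigma_r ** 2
    upper = (premium + ambiguity) / sigma_r ** 2
    total = clip(shadow_hedging, lower, upper)   # shadow demand, confined
    myopic = clip(0.0, lower, upper)             # zero, confined
    return total, myopic, total - myopic
```

For instance, with a zero estimated premium, an ambiguity of 0.01, and `sigma_r=0.1428`, the bounds are $\pm 0.01/0.1428^2 \approx \pm 0.49$, so the myopic demand is zero and the realized hedging demand cannot exceed roughly half of wealth in absolute value, however large the shadow hedging demand.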
In Comparison with Merton (1973) The shadow hedging demand $\pi^{**}_{\mathrm{hedging}}$ is reminiscent of Merton’s (1973), but not the same. The difference lies in what are adverse changes in the investment opportunities. Under risk, they are associated with low consumption; under ambiguity, with low continuation utility.
To draw further comparison between $\pi^{**}_{\mathrm{hedging}}$ and Merton’s hedging demand, recall that the latter is the position in the stock that minimizes the volatility of consumption. On the other hand, $\pi^{**}_{\mathrm{hedging}}$ is the position in the stock that minimizes (to zero) the effect of misspecification on continuation utility. To elaborate, let
$$V(t, m^*, W) \triangleq E^{P^{\xi^*}}\left[\int_t^T e^{-\beta s} \log(c^*(s))\, ds \,\Big|\, m^*_t = m^*,\ W^{\pi^*, c^*}(t) = W\right].$$
As is characteristic of log investors, V additively separates to a part depending only
on (t, W ∗ ) and another depending only on (t, m∗ ), and I have been focusing on the
latter denoted by J. Let further
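$$f^\xi(t) \triangleq \frac{E^{P^\xi}[\, dV(t, m^*_t, W^{\pi,c}(t)) \mid m^*_t = m^*,\ W^{\pi,c}(t) = W\,]}{dt}$$
and observe that
$$\partial_{\xi(t)}(f^\xi(t) - f^0(t)) = W \pi(t) \sigma_R\, \partial_W V(t, m^*, W) + \sigma_{m^*}(t)\, \partial_{m^*} V(t, m^*, W).$$
From (35),
$$\partial_W V(t, m^*, W) = \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\, \frac{1}{W} \quad \text{and} \quad \partial_{m^*} V(t, m^*, W) = \partial_{m^*} J(t, m^*).$$
It follows that $|\partial_{\xi(t)}(f^\xi(t) - f^0(t))|$ attains its minimum (zero) at $\pi(t) = \pi^{**}_{\mathrm{hedging}}(t, m^*)$.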
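The last claim is a short piece of algebra that can also be checked numerically. The sketch below evaluates the sensitivity $W\pi\sigma_R\,\partial_W V + \sigma_{m^*}\,\partial_{m^*} V$ at the shadow hedging demand, taking $\partial_{m^*} J$ as a given number `dJ`. The function names and the test values are hypothetical.

```python
import math

def sensitivity(pi, W, dJ, t, sigma_r, sigma_m, beta, T):
    """W pi sigma_R dV/dW + sigma_m* dV/dm*, with dV/dW taken from (35)."""
    dV_dW = (math.exp(-beta * t) - math.exp(-beta * T)) / (beta * W)
    return W * pi * sigma_r * dV_dW + sigma_m * dJ

def shadow_hedging_demand(dJ, t, sigma_r, sigma_m, beta, T):
    """The shadow hedging demand for a given derivative dJ of the value function."""
    scale = beta * math.exp(beta * t) / (1.0 - math.exp(-beta * (T - t)))
    return -scale * sigma_r * sigma_m * dJ / sigma_r ** 2
```

Evaluating `sensitivity` at `pi = shadow_hedging_demand(...)` returns zero up to floating-point error for any wealth level `W > 0`, which reflects the fact that the position does not depend on wealth.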
4.4.2
x̄ Unknown and Ambiguous
Suppose now that the investor does not know the value of x̄ and entertains all the
theoretical priors Q = {Qx̄,η : (x̄, η) ∈ R × L2 ([0, T ], R)}.
As before, R and A are redundant as state variables because σ is constant, and
γ, δ, σx̄∗ , and Ix̄−1 are redundant because they are deterministic. But now x̄∗ needs
to be taken as a state variable as well as m∗ :
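$$Z = (m^*, \bar{x}^*)$$
with dynamics (see Proposition 2.6)
$$dZ^\xi(t) = \begin{pmatrix} \kappa(\bar{x}^{*,\xi}_t - m^{*,\xi}_t) \\ 0 \end{pmatrix} dt + \begin{pmatrix} \rho_w \sigma_R + \gamma(t) + \delta(t) \\ \kappa^{-1} \sigma_{\bar{x}^*}(t) \end{pmatrix} \sigma_R^{-1} (d\epsilon(t) + \xi(t)\, dt) = \mu_Z(Z^\xi(t))\, dt + \sigma_Z(t)(d\epsilon(t) + \xi(t)\, dt).$$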
Since the diffusion matrix σZ σZ> is degenerate, the value function J may not be
differentiable. I assume nevertheless that ∂Z J(t, Z) exists everywhere and write
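$$\pi^*(t, m^*, \bar{x}^*) = \max\left\{\frac{m^* - r - \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \min\left\{\frac{m^* - r + \sigma_R \bar{\xi}(t)}{\sigma_R^2},\ \pi^{**}_{\mathrm{hedging}}(t, m^*, \bar{x}^*)\right\}\right\},$$
$$\pi^{**}_{\mathrm{hedging}}(t, m^*, \bar{x}^*) \triangleq -\frac{1}{\sigma_R^2}\, \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, (\rho_w \sigma_R + \gamma(t) + \delta(t))\, \partial_{m^*} J(t, m^*, \bar{x}^*) - \frac{1}{\sigma_R^2}\, \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\, \kappa^{-1} \sigma_{\bar{x}^*}(t)\, \partial_{\bar{x}^*} J(t, m^*, \bar{x}^*).$$
I call the first term of $\pi^{**}_{\mathrm{hedging}}$ the $m^*$-shadow hedging demand and the second the $\bar{x}^*$-shadow hedging demand.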
4.4.3
Numerical Analysis
Continue to assume that the investor entertains all the theoretical priors Q = {Qx̄,η :
(x̄, η) ∈ R × L2 ([0, T ], R)}. In this section, I numerically compute the optimal stock
demand π ∗ (t, m∗ , x̄∗ ) and discuss its behavior.
[Figure 1: two panels plotting stock demand against the premium. The left panel shows curves labeled $m^*$, $\bar{x}^*$, and $m^* + \bar{x}^*$; the right panel marks the hedging demand and the region of contrarian behavior.]
Figure 1: Optimal stock demand (fraction of wealth) as a function of the estimated instantaneous
equity premium (annual, decimal). The investor has observed 20 years of data and now faces a 10-year investment horizon. β = 0.03, λ = ∞, α = 0.38, and the estimated long-run equity premium
x̄∗t − r is fixed at 0.0458. Left plot: The solid line passing through the origin shows the Bayesian
demand; the dashed line, the myopic demand; the dotted lines, the shadow hedging demands; finally,
the thick solid line shows the total demand. Right plot: An analysis of the optimal stock demand.
The securities market model is calibrated based on Barberis (2000):24
dR(t) = x(t) dt + 0.1428 dw(t),
dx(t) = 0.2743(x̄ − x(t)) dt − 0.0392 dw(t) + 0.0361 dv x̄,0 (t),
and r = 0.0432 (all numbers are annual). The investor has observed 20 years of
data and now faces a 10-year investment horizon. β = 0.03, λ = ∞, and α = 0.38.
These parameters translate into an ambiguity in the equity premium of 0.01. Also,
σZ (20) = (0.007, 0.009)> .
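The calibration can be cross-checked against the limiting filter gain of $-0.0031$ reported in footnote 24. The sketch below assumes that, for a given $\bar{x}$ and without ambiguity, the posterior variance $\gamma$ obeys the standard Kalman-Bucy Riccati equation for a scalar state with correlated noise, $\dot{\gamma} = -2\kappa\gamma + \rho_w^2 + \rho_v^2 - (\rho_w + \gamma/\sigma_R)^2$. This equation is an assumption here, since the paper's own filtering result (Proposition 2.2) is not reproduced in this section, and the function name is hypothetical.

```python
import math

def steady_state_gain(kappa, sigma_r, rho_w, rho_v):
    """Stationary gain g = rho_w + gamma / sigma_r and posterior variance gamma.

    Setting the assumed Riccati equation to zero with gamma = sigma_r (g - rho_w)
    gives the quadratic g^2 + b g + c = 0 solved below.
    """
    b = 2.0 * kappa * sigma_r
    c = -(b * rho_w + rho_w ** 2 + rho_v ** 2)
    disc = math.sqrt(b * b - 4.0 * c)
    for g in ((-b + disc) / 2.0, (-b - disc) / 2.0):
        gamma = sigma_r * (g - rho_w)
        if gamma >= 0.0:  # keep the root with a nonnegative variance
            return g, gamma
    raise ValueError("no admissible stationary solution")

# Calibrated values from the text (annual).
gain, gamma_inf = steady_state_gain(kappa=0.2743, sigma_r=0.1428,
                                    rho_w=-0.0392, rho_v=0.0361)
```

Under this assumption the stationary gain comes out close to $-0.003$, in line with the limit stated in footnote 24.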
Figure 1 shows the corresponding optimal stock demand as a function of the
estimated instantaneous equity premium, with the estimated long-run equity premium
fixed at 0.0458 (Barberis’s estimate). In the left plot, the solid line passing through the
origin shows the Bayesian demand; the kinked dashed line, the myopic demand; the
dotted lines, the m∗ -, x̄∗ -, and total shadow hedging demands; and finally, the thick
solid line shows the total demand. As observed analytically, the total demand is given by the shadow hedging demand if the latter is moderate compared to the magnitude of the ambiguity present; otherwise, the investor behaves as if she were a Bayesian investor whose estimate of the equity premium is $m^* - r - \sigma_R \bar{\xi}(t)$ or $m^* - r + \sigma_R \bar{\xi}(t)$. The hedging demands are represented by a shaded region in the right plot. Note that the investor hedges for a range of estimated equity premia wider than dictated by the ambiguity in her estimate and the hedging demands are significant. For example, when the estimated equity premium is −0.01, the long-horizon investor facing a 10-year horizon sells short an amount of the stock worth about 100% of her wealth, whereas a myopic investor would take no position in the stock.
24 I annualized his monthly estimates (left panel of his Table II). His estimation is based on the monthly NYSE value-weighted returns as calculated by the CRSP, from June 1952 to December 1995.
Barberis assumes that excess stock returns are predicted by the dividend-price ratio, whereas the predictive variables of the present model, $x$, are unobservable. Hence, I calibrated the SDE for $x$ so that the SDE for $m^{\bar{x},0}$ matches Barberis’s estimation:
$$dm^{\bar{x},0}(t) = 0.2743(\bar{x} - m^{\bar{x},0}(t))\, dt - 0.0031\, d\bar{w}^{\bar{x},0}(t)$$
where $-0.0031 = \lim_{t \to \infty}(\rho_w + \gamma(t)/\sigma_R)$. To be precise, Barberis finds, in accordance with other empirical works, excess stock returns and the dividend-price ratio to be highly negatively correlated (−0.9351), and I set $R$ and $m^{\bar{x},0}$ to be perfectly negatively correlated.
In Comparison with Epstein and Schneider (2007) To further analyze the
optimal policy, it helps to contrast it with that of related models, and first I consider
Epstein and Schneider (2007).
First, in Epstein and Schneider’s model, a long-horizon multiple-priors investor
still holds no stock when the estimated equity premium is zero. In Figure 1, on
the other hand, π ∗ is negative around zero estimated premium. This is due to the
asymmetry in the dynamics of the estimated premium m∗ −r. When the true premium
is constant and known, a log investor’s value function is quadratic in it. Hence, in
particular, it is symmetric at zero premium and is strictly increasing in the absolute
value of the premium; that is, the investor is better off when the stock is (locally, in
expected terms) more distinct from the bond. However, since in the present model
m∗ − r is attracted to x̄∗ − r, the current value of which is positive, the value function
rises in the right vicinity of zero estimated premium and rises more in the right than
in the left because a negative m∗ − r will have to pass the minimum of the value
function before reaching x̄∗ − r. Consequently, ∂m∗ J(t, r, x̄∗ ) > 0. ∂x̄∗ J(t, r, x̄∗ ) > 0
for the obvious reason, and the negative demands around zero estimated premium
follow.
When the desire to hedge fully realizes, it may give rise to contrarian behavior.
Note from Figure 1 that when the estimated premium falls around −0.02, the investor exhibits contrarian behavior in the sense that she decreases her stock holdings
as the estimated premium increases. In the absence of ambiguity (see, for example,
Brendle (2006)), as the estimated premium improves, that is, as it moves toward the
direction of increasing the continuation utility, the marginal indirect utility of such
an improvement strictly increases and so does the desire to hedge. The introduction
of ambiguity does not fundamentally alter this structure because the density generators are bounded by $\bar{\xi}$ and $\bar{\xi}$ is independent of the estimated premium. Epstein and
Schneider make a similar observation that their investor is contrarian in the sense that
when the estimated premium is not unambiguously distinct from zero, she goes long
for negative premia and short for positive premia. This restricted form of contrarian
behavior results from the symmetric structure of their model.
What, then, exactly is the dependence of the stock demand on the estimated long-run equity premium? The argument leading to $\partial_{m^*} J(t, r, \bar{x}^*) > 0$ suggests that if the current value of $\bar{x}^* - r$ is negative, $\partial_{m^*} J(t, r, \bar{x}^*) < 0$. This is indeed the case. See Figure 2; I changed the value of $\bar{x}^*_t - r$ from 0.0458 to 0 (left plot) and −0.0458 (right plot). When $\bar{x}^*_t - r = -0.0458$, both derivatives at zero instantaneous premium are negative and the corresponding hedging demand is positive. It is possible to show, following the proof of Lemma 4.2(i), that $J(t, m^*, \bar{x}^*)$ is convex in $(m^*, \bar{x}^*)$ and hence in particular in $\bar{x}^*$. Accordingly, the desire to hedge against low continuation utility results in contrarian behavior with respect to the long-run premium. Compare the monotonic dependence of the demand on the long-run premium with its nonmonotonic dependence on the instantaneous premium. Such a distinction is absent in Epstein and Schneider’s model because they consider constant (in the sense of indistinguishability) investment opportunities.
[Figure 2: two panels plotting stock demand against the premium, each with curves labeled $m^*$, $\bar{x}^*$, and $m^* + \bar{x}^*$.]
Figure 2: Optimal stock demand (fraction of wealth) as a function of the estimated instantaneous equity premium (annual, decimal). The parametrization is the same as Figure 1 except that the estimated long-run equity premium $\bar{x}^*_t - r$ is 0 for the left plot and −0.0458 for the right plot, as opposed to 0.0458.
The assumption of constant investment opportunities also implies that in Epstein
and Schneider’s model, hedging demands disappear as time goes to infinity. In contrast, in the present model, the desire to hedge against adverse changes in the estimate
of the instantaneous premium, that is, the m∗ -shadow hedging demand, persists.
In Comparison with Miao (2009) Miao (2009) also considers the consumption/portfolio choice problem of a multiple-priors investor in continuous time who
partially observes stochastic investment opportunities. However, his notion of learning is fundamentally different from mine.
To review Miao’s model in the context of the present model, pick a theoretical
prior Qx̄,0 , x̄ ∈ R. A preferential prior P ξ is characterized by the filtered stock return
dynamics
$$dR(t) = m^{\bar{x},0}(t)\, dt + \sigma_R(d\bar{w}^{\bar{x},0,\xi}(t) + \xi(t)\, dt), \quad |\xi(t)| \leq \bar{\xi}$$
where $\bar{w}^{\bar{x},0,\xi} = \{\bar{w}^{\bar{x},0,\xi}(t), G_t\}$ is a $P^\xi$-Wiener process. That is, (i) the “center” of the set of one-step-ahead conditionals is obtained by the standard Bayesian learning under $Q^{\bar{x},0}$ and (ii) after the Bayesian learning, there remains an exogenous and time-invariant ambiguity. Thus, in particular, learning and ambiguity do not interact. In contrast, in the present model, the innovation receives a larger weight when the current estimate $m^*_t$ is ambiguous, that is, when $\delta(t)$ is large.
[Figure 3: left panel, "Markov Policy", plotting stock demand against the premium for log λ = ∞, 1, 0, and −0.5; right panel, "Stock Demand at m* − r = 0", plotting stock demand against log λ.]
Figure 3: Confidence and optimal stock demand. The investor has learned all that she can ($t \to \infty$) and now faces a 10-year investment horizon. β = 0.03 and $\bar{x}^*_\infty - r = 0.0458$. α varies as λ does in such a way that the ambiguity in the instantaneous equity premium $\sqrt{2\alpha\delta(t = \infty; \lambda)}$ stays at 0.01. Left plot: Optimal stock demand (fraction of wealth) as a function of the estimated instantaneous equity premium (annual, decimal), for different levels of the investor’s confidence λ in the reference likelihood. Right plot: The same demand at $m^* - r = 0$ as a function of λ.
In fact, Miao’s model is the limit of the present model as $t, \lambda, \alpha \to \infty$ with the restriction $\sqrt{2\alpha\delta(t; \lambda)}/\sigma_R = \bar{\xi}$. Note that $t \to \infty$ is consistent with the IID ambiguity; $\lambda \to \infty$, that is, full confidence in the reference likelihood, with the Bayesian learning; and $\alpha \to \infty$ with the multiple one-step-ahead conditionals despite the full confidence.
In Figure 3, I plot the optimal stock demand corresponding to different levels of
confidence λ. The investor has learned all that she can, meaning in particular that γ
and δ have converged (assume that x̄∗ , too, has converged), and now faces a 10-year
investment horizon. β = 0.03 as before and $\bar{x}^*_\infty - r = 0.0458$. α varies as λ does in such a way that the ambiguity in the instantaneous premium $\sqrt{2\alpha\delta(t = \infty; \lambda)}$ stays at 0.01. The left plot shows the Markov policies. The solid black line (top)
in particular corresponds to full confidence and hence to the Miao demand. Note
that it is increasing everywhere, that is, there is no region of contrarian behavior.
This is because stock returns are negatively correlated with the state variable m∗ :
σm∗ (∞) = ρw + γ(∞)/σR = −0.003.
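The normalization used in Figure 3, in which α is adjusted as λ changes so that the ambiguity in the instantaneous premium stays fixed, is a simple inversion of $\sqrt{2\alpha\delta}$. A minimal sketch follows; the function names are hypothetical, and the values of δ used in the usage note are illustrative only, since the paper's values of $\delta(\infty; \lambda)$ are not given in this section.

```python
import math

def ambiguity_in_premium(alpha, delta):
    """Ambiguity in the equity premium, sqrt(2 alpha delta)."""
    return math.sqrt(2.0 * alpha * delta)

def alpha_for_target(delta, target=0.01):
    """Conservatism alpha that keeps sqrt(2 alpha delta) equal to target."""
    return target ** 2 / (2.0 * delta)

def density_generator_bound(alpha, delta, sigma_r):
    """xi_bar = sqrt(2 alpha delta) / sigma_R, the bound on the density generators."""
    return ambiguity_in_premium(alpha, delta) / sigma_r
```

For example, `alpha_for_target(1e-4)` returns 0.5, and halving δ doubles the required α, which is the sense in which α must grow without bound as δ shrinks in the Miao limit.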
More importantly, the stock demand monotonically decreases as the investor loses
confidence. See the right plot, which shows the stock demand at m∗ − r = 0 as λ
varies. Intuitively, the estimation of the true premium is more difficult and unreliable
for those investors who are less confident about their grasp of the environment; the
consequent lack of confidence in the estimate combined with the (apparent) pessimism
leads those investors, then, to try to transfer wealth even more to adverse states.
The effect of learning under ambiguity can be significant: the difference between
Miao’s prediction and mine can be as large as half of wealth, depending on the
investor’s confidence.
5
Asset Pricing
In this section, I examine the asset pricing implications of learning under ambiguity.
Specifically, I consider simple Lucas economies populated by log agents who find dividend growth ambiguous. Section 5.1 describes the general setup and Section 5.2, the
equilibrium. I then make specializing assumptions to highlight the unclear relationship between the equity premium and the conditional variance of returns (Section
5.3), the declining trend in the equity premium (Section 5.4), and the nonmonotonic
dependence of the equity premium on the precision of signals (Section 5.5).
5.1
General Setup
Consider an economy populated by a representative agent who finances her intertemporal consumption over a finite period of time [0, T ], T ∈ (0, ∞), by trading two
financial assets: one locally risk-free asset (bond) and one risky asset (stock). The
bond is in zero net supply; the stock is a claim to an exogenous dividend stream. The
consumption good is perishable and serves as the numeraire.
5.1.1
The Agent’s Observation and Theories
Regarding how the endowments are generated, the agent entertains multiple theories.
Specifically, the theories take the form of probability measures on a common measurable space. Let there be a measurable space (Ω, F), a set Q of probability measures
(theoretical priors) on (Ω, F), and a filtration F = {Ft } of F. Under the theoretical
prior Qx̄,η ∈ Q, (x̄, η) ∈ R × L2 ([0, T ], R), the dividend-rate process D = {D(t), Ft }
is given by part of the solution to the system of SDEs
$$dD(t)/D(t) = (a_D(t, D, A) + b_D(t, D, A)x(t))\, dt + \sigma_D(t, D, A)\, dw(t), \tag{48}$$
$$dA(t) = (a_A(t, D, A) + b_A(t, D, A)x(t))\, dt + \sigma_A(t, D, A)\, dw(t), \tag{49}$$
$$dx(t) = \kappa_x(\bar{x} - x(t))\, dt + \rho_w\, dw(t) + \rho_v(dv^{\bar{x},\eta}(t) + \eta(t)\, dt). \tag{50}$$
Here, A = {A(t), Ft} is an nA-dimensional process, nA ≥ 0, representing the macroeconomic variables in addition to the dividends themselves that affect, or are simply correlated with, the growth of dividends; the scalar process x = {x(t), Ft} tracks the evolution of the unobservable state of the economy that determines the expected growth rate; w = {w(t), Ft} and $v^{\bar{x},\eta} = \{v^{\bar{x},\eta}(t), F_t\}$ are independent Wiener processes of dimension 1+nA and 1, respectively; aD, bD, σD, aA, bA, and σA are nonanticipative path functionals from $[0,T] \times C([0,T], \mathbb{R}^{1+n_A})$ into $\mathbb{R}$, $\mathbb{R}$, $\mathbb{R}^{1\times(1+n_A)}$, $\mathbb{R}^{n_A}$, $\mathbb{R}^{n_A}$, and $\mathbb{R}^{n_A\times(1+n_A)}$, respectively; and κ ∈ (0, ∞), $\rho_w \in \mathbb{R}^{1\times(1+n_A)}$, and ρv ∈ R \ {0}.
The agent observes the dividends and the rest of the macroeconomic variables but
not the expected growth rate. If we define d by dd = dD/D, d and A constitute the
observable component and x the unobservable component of the partially observable
process of the previous sections.25 Let thus $y \triangleq (d, A^\top)^\top$. The dynamics (48)–(49) of
the observable component can be rewritten compactly as
\[ dy(t) = (a(t, y) + b(t, y)\,x(t))\,dt + \sigma(t, y)\,dw(t) \tag{51} \]
where the definitions of a, b, and σ are obvious. Recall that every numbered assumption remains in force from the point where it is stated; Assumptions 2.1 to 2.4 and 3.1 therefore apply here. By Assumption 2.1 in particular, the system of SDEs (50)–(51) has a unique strong solution, and consequently so does (48)–(50). Assume further the following for simplicity:
Assumption 5.1. bD > 0 a.e.
5.1.2 Asset Prices
The rate of (net) return on the bond, or the interest rate, is denoted by r(t) while
the initial value of the bond is normalized to one. The price of the stock is denoted
by S(t). The processes r and S are adapted to the agent’s observation filtration G
generated by y. It is further assumed that r is a.s. bounded; and that S is a positive
Itô process,26 so that the return on the stock is well-defined by
\[ dR(t) = \frac{dS(t) + D(t)\,dt}{S(t)}. \]
The interest rate and the stock price are determined endogenously.27
5.1.3 Consumption Plans and Trading Strategies
To guarantee that D belongs to C 2 (u), I make the following assumption:
25 aD, bD, and so on are functionals in (D, A) rather than in (d, A). But by Lemma 4.9 of Liptser and Shiryaev (1977), there exist ($\{B_{t+}\}$-adapted) functionals ãD and so on such that aD(t, D(·, ω), A(·, ω)) = ãD(t, d(·, ω), A(·, ω)), · · · , a.e. This substitution is implicit in the statement.
26 An Itô process consists of a time integral and a stochastic integral with respect to a Wiener process where the integrand of the stochastic integral is required to be square-integrable with respect to time a.s. This definition remains unambiguous despite the multiplicity of the probability measures under consideration (P and {Q|GT : Q ∈ Q}), by virtue of the uniform boundedness of ξ ∈ Ξ and b (Proposition 2.9 and Assumption 2.1), the nondegeneracy condition on σ (Assumption 2.2), and the fact that $m^{\bar{x},\eta}(t)$ and $m^{\bar{x}^*_t,\eta^*_t}(t)$, 0 ≤ t ≤ T, are continuous processes.
27 As usual, the assumptions made here on the equilibrium prices are only “provisional” in that the equilibria thus found (see Proposition 5.1 below) are indeed consistent with them. I do not check, however, whether there are other equilibria.
Assumption 5.2. The processes aD and σD satisfy
\[ E^{P^0}\!\int_0^T a_D(t)^2\,dt < \infty, \qquad \sup_{t \le T} E^{P^0}\exp\!\big(h\,|\sigma_D(t)|^2\big) < \infty \ \text{ for some } h > 0. \tag{52} \]
Lemma 5.1. D ∈ C 2 (u).
Now, a two-dimensional process (Π◦, Π) = {(Π◦(t), Π(t)), Gt} is a trading strategy, where Π◦(t) is to be read as the amount invested in the bond and Π(t) as that invested in the stock, if (i) it is progressive; (ii) the gains process
\[ \int_0^t \Pi^\circ(s)\,r(s)\,ds + \int_0^t \Pi(s)\,dR(s), \quad 0 \le t \le T, \]
is well-defined and again an Itô process; (iii) the discounted wealth process
\[ \exp\!\Big(-\int_0^t r(s)\,ds\Big)\big(\Pi^\circ(t) + \Pi(t)\big), \quad 0 \le t \le T, \]
is uniformly bounded below; and (iv) Π◦(T) + Π(T) ≥ 0. A trading strategy (Π◦, Π) finances a consumption plan $c \in C^2(u)$ if
\[ \Pi^\circ(t) + \Pi(t) = \Pi^\circ(0) + \Pi(0) + \int_0^t \Pi^\circ(s)\,r(s)\,ds + \int_0^t \Pi(s)\,dR(s) - \int_0^t c(s)\,ds - C(t), \]
0 ≤ t ≤ T, for some nondecreasing process C = {C(t), Gt} with C(0) = 0.
5.1.4 The Agent’s Problem
The agent has the Chen-Epstein recursive multiple-priors utility with log felicity
\[ \min_{\xi \in \Xi} E^{P^\xi}\!\int_0^T e^{-\beta t}\log(c(t))\,dt, \quad c \in C^2(u), \]
and maximizes it over the consumption plans that can be financed by some trading
strategy with initial worth S(0). The preferential priors {P ξ : ξ ∈ Ξ} are constructed
from the theoretical priors described above, as in Section 2.
5.1.5 Equilibrium: Definition
An equilibrium is a pair of processes (r, S) such that with r as the interest rate process
and S as the stock price process, the optimal consumption plan of the agent equals
the exogenous dividend stream D.
5.2 The Equilibrium
5.2.1 Asset Prices
Denote the set of minimizing density generators by28
\[ \Xi^* \triangleq \arg\min_{\xi \in \Xi} E^{P^\xi}\!\int_0^T e^{-\beta t}\log(D(t))\,dt. \]
Proposition 5.1. Equilibria, possibly multiple, exist. To each ξ∗ ∈ Ξ∗ corresponds an equilibrium $(r^{\xi^*}, S^{\xi^*})$ where
\[
\begin{aligned}
r^{\xi^*}(t) &= \beta + a_D(t) + b_D(t)\,m^{\bar{x}^*_t,\eta^*_t}(t) - |\sigma_D(t)|^2 + \sigma_D(t)\,\xi^*(t),\\
S^{\xi^*}(t) &= \frac{1 - e^{-\beta(T-t)}}{\beta}\,D(t).
\end{aligned}
\tag{53}
\]
Note that $S^{\xi^*}$ is independent of ξ∗. In fact, it is independent of ambiguity; the equilibrium stock price identified in (53) is the same as that of the Lucas economy populated by unique-prior agents with log felicity.
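The price formula in (53) can be evaluated directly. The following is a minimal sketch, not part of the original analysis; the function names and the horizon value are hypothetical choices made only for illustration, while β = 0.03 is the value used in the paper's numerical examples.

```python
import math


def price_dividend_ratio(beta: float, horizon: float, t: float) -> float:
    """Return S(t)/D(t) = (1 - exp(-beta * (T - t))) / beta, as in (53)."""
    if beta <= 0.0:
        raise ValueError("beta must be positive")
    if not 0.0 <= t <= horizon:
        raise ValueError("t must lie in [0, T]")
    return (1.0 - math.exp(-beta * (horizon - t))) / beta


def stock_price(dividend: float, beta: float, horizon: float, t: float) -> float:
    """Equilibrium stock price: the ratio above times the current dividend rate."""
    return price_dividend_ratio(beta, horizon, t) * dividend


if __name__ == "__main__":
    beta = 0.03      # discount rate used in the paper's numerical examples
    horizon = 50.0   # hypothetical horizon T, chosen only for illustration
    for t in (0.0, 25.0, 50.0):
        print(t, price_dividend_ratio(beta, horizon, t))
```

Neither function takes α, δ, or ξ∗ as an argument, which mirrors the statement that the stock price does not depend on ambiguity.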
A description of Ξ∗ is then in order. Since
dD(t)/D(t) = (aD(t) + bD(t) m^{x̄∗_t,η∗_t}(t) + σD(t)ξ(t)) dt + σD(t) dξ (t)   (54)
and $\sigma_D(t)\xi(t) = b_D(t)\Delta m$, $|\Delta m| \le \sqrt{2\alpha\delta(t)}$, a natural candidate is ξ∗∗ defined by
\[ \xi^{**}(t) \triangleq -\sigma(t)^{-1} b(t)\sqrt{2\alpha\delta(t)}, \quad 0 \le t \le T. \]
Proposition 5.2. Suppose $\sigma^{-1}b$ is constant. Then Ξ∗ = {ξ∗∗}.
With arbitrary stochastic coefficients a, b, and σ, however, an explicit characterization of Ξ∗ seems difficult. In what follows I will thus assume $\sigma^{-1}b$ is constant, with the exception of Section 5.3, and focus on the equilibrium associated with ξ∗∗:
Assumption 5.3. $\sigma^{-1}b$ is constant.
With ξ∗ = ξ∗∗, the equilibrium interest rate identified in Proposition 5.1 becomes
\[ r(t) = \beta + a_D(t) + b_D(t)\,m^{\bar{x}^*_t,\eta^*_t}(t) - |\sigma_D(t)|^2 - b_D(t)\sqrt{2\alpha\delta(t)} \]
where the now-redundant superscript is dropped.
28 The set is nonempty by Theorem 2.2 of Chen and Epstein (2002).
5.2.2 Equity Premium
If the agent had to estimate the equity premium, she would do so under the most plausible theory $Q^{\bar{x}^*_t,\eta^*_t}$. So compute µR defined by
dR(t) = µR (t) dt + σR (t) d(t).
From (53),
\[ \mu_R(t) = \beta + a_D(t) + b_D(t)\,m^{\bar{x}^*_t,\eta^*_t}(t), \qquad \sigma_R(t) = \sigma_D(t). \]
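The step from (53) to the drift µR can be spelled out as follows. This is only a sketch: f(t) is shorthand introduced here (it does not appear in the original) for the price-dividend ratio, and only the drift and diffusion coefficients are tracked.

```latex
\[
f(t) \triangleq \frac{1 - e^{-\beta(T-t)}}{\beta}, \qquad
S(t) = f(t)\,D(t), \qquad
f'(t) = -e^{-\beta(T-t)} = \beta f(t) - 1 .
\]
\[
dR(t) = \frac{dS(t) + D(t)\,dt}{S(t)}
      = \frac{dD(t)}{D(t)} + \frac{f'(t) + 1}{f(t)}\,dt
      = \frac{dD(t)}{D(t)} + \beta\,dt .
\]
```

Hence the return inherits the diffusion coefficient σD of dividend growth, and its drift is the estimated drift of dividend growth, aD(t) + bD(t) m^{x̄∗_t,η∗_t}(t), plus β.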
The (estimated) equilibrium equity premium is given by
\[
\mu_R(t) - r(t) = \underbrace{\underbrace{|\sigma_D(t)|^2}_{\text{risk premium}} + \underbrace{b_D(t)\sqrt{2\alpha\delta(t)}}_{\text{ambiguity premium}}}_{\text{uncertainty premium}},
\]
which we see consists of two components.
I refer to the equity premium also as the uncertainty premium; it is the reward for bearing uncertainty in the growth of the dividends of the asset the agent is constrained to hold:
\[ dD(t)/D(t) \mid \mathcal{G}_t \sim N\!\Big(\big(a_D(t) + b_D(t)\,m^{\bar{x}',\eta'}(t)\big)\,dt,\ |\sigma_D(t)|^2\,dt\Big), \quad (\bar{x}', \eta') \in \mathbb{R} \times L^2([0,T], \mathbb{R}). \]
Of the two components, the first is the risk premium from the C-CAPM (Breeden, 1979),
\[ |\sigma_D(t)|^2 = \operatorname{Cov}\big(dR(t),\, dD(t)/D(t) \mid \mathcal{G}_t\big), \]
and equals the unambiguous local variance of dividend growth, namely the risk in dividend growth. The second is the ambiguity premium (Chen and Epstein, 2002),
\[ b_D(t)\sqrt{2\alpha\delta(t)} = -\sigma_R(t)\,\xi^{**}(t), \]
and equals the ambiguity in dividend growth; or, more specifically, the ambiguity in the local mean of dividend growth as measured by the square root of the inverse of the curvature of the (induced) plausibility, or the “standard error,” $b_D(t)\sqrt{\delta(t)}$ (times the personal cutoff $\sqrt{2\alpha}$). If the agent could put full confidence in a particular theory,
she would deduce a single conditional distribution for dividend growth with a definite
mean, and accordingly worry only about the dispersion of dividend growth as above.
The agent cannot, however, and demands commensurate compensation.
Remark 5.1. The presence of ambiguity and aversion to it alleviate both the equity
premium puzzle and the risk-free rate puzzle (Chen and Epstein, 2002). In the present
setup, the stock price is unaffected by ambiguity (Proposition 5.1), and so for the
agent to continue to hold the ambiguous asset, the interest rate has to fall.
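The decomposition above, and the way the interest rate absorbs the ambiguity premium, can be illustrated numerically. The sketch below assumes Assumption 5.3 (so that ξ∗ = ξ∗∗); the function name is hypothetical, and the value of δ in the example is an arbitrary placeholder rather than an output of the paper's filter.

```python
import math


def equilibrium_rates(beta, a_D, b_D, m_star, sigma_D_sq, alpha, delta):
    """Return (mu_R, r, risk_premium, ambiguity_premium) under Assumption 5.3."""
    mu_R = beta + a_D + b_D * m_star                       # estimated expected stock return
    risk_premium = sigma_D_sq                              # |sigma_D|^2
    ambiguity_premium = b_D * math.sqrt(2.0 * alpha * delta)
    r = mu_R - risk_premium - ambiguity_premium            # equilibrium interest rate
    return mu_R, r, risk_premium, ambiguity_premium


if __name__ == "__main__":
    # delta = 1e-4 is a hypothetical level of conditional ambiguity.
    mu_R, r, risk, amb = equilibrium_rates(
        beta=0.03, a_D=0.0, b_D=1.0, m_star=0.018,
        sigma_D_sq=0.0249 ** 2, alpha=16.0, delta=1e-4,
    )
    print("expected return:", mu_R)
    print("interest rate:", r)
    print("risk premium:", risk, "ambiguity premium:", amb)
```

Raising δ leaves mu_R unchanged and lowers r one for one with the ambiguity premium, which is the mechanism described in Remark 5.1.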
5.3 Equity Premium & Conditional Variance of Returns
Motivated by standard asset pricing models such as the I-CAPM of Merton (1973),
numerous empirical papers have investigated the relationship between the equity premium and the conditional variance of returns. The findings are mixed: some report
a positive relationship as expected; others a negative relationship; and still others an
insignificant relationship. Consequently, various explanations and state variables have
been suggested.29
In the present setup, the equity premium would equal the conditional return variance if there were no ambiguity, that is, if the agent knew the true probability measure;
with ambiguity, it equals the conditional return variance plus the ambiguity premium.
Moreover, the ambiguity premium is time-varying and does not have a deterministic relationship with the conditional return variance.
To further investigate the relationship between the equity premium and the conditional variance of returns, I consider an economy where the conditional variance of
dividend growth (and that of returns) is time-varying, following a Cox-Ingersoll-Ross
process (Gennotte and Marsh, 1993):30
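\[
\begin{aligned}
dD(t)/D(t) &= x(t)\,dt + \sqrt{A(t)}\,\big(\sqrt{1 - r_{DA}^2},\ r_{DA}\big)\,dw(t),\\
dA(t) &= \nu(\bar{A} - A(t))\,dt + \varsigma_A\sqrt{A(t)}\,(0, 1)\,dw(t),\\
dx(t) &= \kappa(\bar{x} - x(t))\,dt + (\rho_{w,1}, \rho_{w,2})\,dw(t) + \rho_v\,dv^{\bar{x},0}(t),
\end{aligned}
\]
where (i) $\nu\bar{A} > \varsigma_A^2/2$ so that A stays away from zero and (ii) $r_{DA} \in (-1, 1)$.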
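To make the variance dynamics concrete, the following sketch checks condition (i) and simulates A with a full-truncation Euler scheme. The discretization scheme, the function names, and the step size are assumptions made for illustration and are not the paper's own simulation method; the parameter values are the annualized ones reported in the paper's footnote on the calibration.

```python
import math
import random


def feller_condition_holds(nu: float, A_bar: float, varsigma_A: float) -> bool:
    """Condition (i): nu * A_bar > varsigma_A^2 / 2."""
    return nu * A_bar > 0.5 * varsigma_A ** 2


def simulate_cir_variance(A0, nu, A_bar, varsigma_A, years, steps_per_year, seed=0):
    """Full-truncation Euler path of dA = nu (A_bar - A) dt + varsigma_A sqrt(A) dW."""
    rng = random.Random(seed)
    dt = 1.0 / steps_per_year
    n_steps = int(round(years * steps_per_year))
    A = A0
    path = [max(A, 0.0)]
    for _ in range(n_steps):
        A_pos = max(A, 0.0)                      # truncate before using A in the coefficients
        shock = rng.gauss(0.0, math.sqrt(dt))    # Wiener increment over one step
        A = A + nu * (A_bar - A_pos) * dt + varsigma_A * math.sqrt(A_pos) * shock
        path.append(max(A, 0.0))
    return path


if __name__ == "__main__":
    nu, A_bar, varsigma_A = 0.0120, 0.0249 ** 2, 3.889e-4
    print("condition (i) holds:", feller_condition_holds(nu, A_bar, varsigma_A))
    path = simulate_cir_variance(A_bar, nu, A_bar, varsigma_A, years=20, steps_per_year=252)
    print("min and max of simulated variance:", min(path), max(path))
```

The truncation only matters for the discretized path; under condition (i) the continuous-time process itself stays away from zero.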
This specification violates Assumption 2.2. Thus, to characterize ξ ∗ I invoke a
stochastic control argument instead. Denote the state vector by
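\[ Z = (D, A, m^*, \bar{x}^*, \gamma, \delta, I_{\bar{x}}^{-1}, \sigma_{\bar{x}^*}) \]
where, as before, $m^*_t \equiv m^{\bar{x}^*_t,\eta^*_t}(t)$. Define the value function by
\[ J(t, Z) \triangleq \min_{\xi} E^{P^\xi}\!\left[\int_t^T e^{-\beta s}\log(D(s))\,ds \,\middle|\, Z(t) = Z\right] \]
subject to the dynamics of Z.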
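Proposition 5.3. Assume $J \in C^{1,2} \cap C_p$. Then ξ∗ ∈ Ξ∗ if and only if
\[
\xi^*(t) = \sigma(t)^{-1} b(t) \times
\begin{cases}
+\sqrt{2\alpha\delta(t)} & \text{if } (\partial_Z J(t, Z(t)))^\top \sigma_Z(Z(t))\,\sigma(t)^{-1} b(t) < 0,\\
-\sqrt{2\alpha\delta(t)} & \text{if } (\partial_Z J(t, Z(t)))^\top \sigma_Z(Z(t))\,\sigma(t)^{-1} b(t) > 0,\\
\text{indeterminate} & \text{if } (\partial_Z J(t, Z(t)))^\top \sigma_Z(Z(t))\,\sigma(t)^{-1} b(t) = 0,
\end{cases}
\]
where “indeterminate” means “any number in $[-\sqrt{2\alpha\delta(t)}, \sqrt{2\alpha\delta(t)}]$.”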
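The selection rule in Proposition 5.3 is a simple sign test. The sketch below returns the scalar that multiplies σ(t)⁻¹b(t); the function name is hypothetical, and the argument `sensitivity` stands for the scalar (∂Z J)ᵀσZ σ⁻¹b, which is taken as given here rather than computed.

```python
import math


def worst_case_multiplier(sensitivity: float, alpha: float, delta: float):
    """Scalar multiplying sigma^{-1} b in Proposition 5.3.

    Returns None in the indeterminate case (sensitivity exactly zero), where any
    value in [-sqrt(2 alpha delta), +sqrt(2 alpha delta)] is admissible.
    """
    bound = math.sqrt(2.0 * alpha * delta)
    if sensitivity > 0.0:
        return -bound
    if sensitivity < 0.0:
        return bound
    return None


if __name__ == "__main__":
    # alpha = 16 as in the paper; delta = 1e-4 is a hypothetical value.
    for s in (0.5, -0.5, 0.0):
        print(s, worst_case_multiplier(s, alpha=16.0, delta=1e-4))
```

When the sensitivity changes sign along a path, the multiplier switches between the two bounds, which is why the ambiguity premium can change sign in this specification.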
29 See the introduction of Rossi and Timmermann (2015).
30 In Gennotte and Marsh (1993), the expected growth rate of dividends is constant and known.
$(\partial_Z J)^\top \sigma_Z \sigma^{-1} b$ is the marginal effect of an increase in the estimated expected growth rate on the expected change over the next instant in the agent’s continuation utility.
Hence, when it is positive (negative), the agent views the expected growth rate as
overestimated (underestimated) and demands a high (low) equity premium. Alternatively, we can interpret the ambiguity premium also as coming from the shadow
hedging demand (see Section 4.4).
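Lemma 5.2.
\[
\begin{aligned}
(\partial_Z J)^\top \sigma_Z \sigma^{-1} b ={}& \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\\
&+ \int_t^T e^{-\beta s}\,\frac{1 - e^{-\kappa_x(s-t)}}{\kappa_x}\,\frac{\rho_{w,1}\sqrt{A}\sqrt{1 - r_{DA}^2} + \gamma + \delta}{A(1 - r_{DA}^2)}\,ds\\
&+ \int_t^T e^{-\beta s}\left((s - t) - \frac{1 - e^{-\kappa_x(s-t)}}{\kappa_x}\right)\frac{\kappa_x^{-1}\sigma_{\bar{x}^*}}{A(1 - r_{DA}^2)}\,ds.
\end{aligned}
\]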
The first term and the third term of $(\partial_Z J)^\top \sigma_Z \sigma^{-1} b$ are positive for all t < T and all Z in its natural domain. On the other hand, if $\rho_{w,1} < 0$, the second term may be negative; and when $(\partial_Z J)^\top \sigma_Z \sigma^{-1} b$ changes sign as a result, the ambiguity premium will jump between $\pm\sqrt{2\alpha\delta(t)}$. Compare this with the constant (positive) sign of the ambiguity premium under the assumption of constant volatility (Proposition 5.2).
In Figure 4, I plot a simulated path of the conditional variance (common to both
plots) and the corresponding variations in the ambiguity premium when the growth
rates dD/D and the changes dx in the expected rate of growth are locally uncorrelated (left) and locally negatively correlated (right). Cases of positive local correlation
are qualitatively similar to those of zero correlation and hence are not reported. The
dividend dynamics are calibrated based on Bansal et al. (2012) while, differently from
their model, a local correlation between dD/D and dx is allowed.31 The parameter
values related to the agent’s dispositions are set as follows: β = 0.03, λ = 0.01, and
α = 16. The choice for β is within the standard range; those for λ and α are essentially
arbitrary.
When Corr( dD/D, dx) ≥ 0 (see the left plot), the ambiguity premium follows
the conditional variance of returns with some lag. This is intuitive: ambiguity about
the unobservable state of the economy and the premium for bearing it are high when
signals about the unobservable state have been imprecise in the recent past.
The lag, however, confirms that the ambiguity premium is not a function of the contemporaneous conditional variance of returns. Rather, the ambiguity premium at a particular instant depends on the history of return variance, or, in other words, there is a kind of hysteresis. Compare, for example, the ambiguity premia around Year 10 and Year 15. Despite the comparable levels of return variance, the ambiguity premium is higher around Year 10 because the signal has been imprecise over the preceding couple of years, while it stays relatively precise afterwards up to Year 15.
Hysteresis can be observed when Corr( dD/D, dx) < 0 as well; see the right plot and compare, again, Years 10 and 15.
Figure 4: Conditional variance of returns and ambiguity premium. Thin lines (left vertical axes): Conditional variance of dividend growth = conditional variance of returns = risk premium = $|\sigma_D(t)|^2$. Thick lines (right vertical axes): Ambiguity premium $\sqrt{2\alpha\delta(t)}$. The local correlation between dD/D and dx is 0 (left) and −0.99 (right). The simulated path of the conditional variance is common to the two plots. λ = 0.01 and α = 16. Initial years are discarded.
31 I annualized the monthly values used for the dynamics of aggregate consumption (the second row of Bansal et al.’s Table I):
\[
\begin{aligned}
dD(t)/D(t) &= x(t)\,dt + \sqrt{A(t)}\,(1, 0)\,dw(t),\\
dA(t) &= 0.0120\,\big((0.0249)^2 - A(t)\big)\,dt + 3.889 \times 10^{-4}\sqrt{A(t)}\,(0, 1)\,dw(t),\\
dx(t) &= 0.3038\,(0.0180 - x(t))\,dt + \frac{9.478 \times 10^{-4}}{\sqrt{\rho_{w,1}^2 + \rho_v^2}}\,\big((\rho_{w,1}, 0)\,dw(t) + \rho_v\,dv^{\bar{x},0}(t)\big).
\end{aligned}
\]
More noteworthy in this case, however, is that the ambiguity premium comoves
negatively with the conditional variance of returns. That is, the ambiguity premium,
or more fundamentally, market ambiguity itself, is high when the traditional measure
of market uncertainty is low, and low when the traditional measure is high. The
reason is as discussed in Section 3.2. Suppose, for example, that recently dividends
have been growing steadily, at a rate faster than expected. The old estimate of the
expected growth rate is likely an underestimate, and, given the reliability of the signal,
it is to be revised upward. But at the same time, the faster-than-expected growth is
indicative also of positive shocks to the growth rate and in turn of negative shocks to
the expected growth rate, and in this respect the estimate is to be revised downward.
It is not clear whether the estimate of the expected growth rate should be revised
upward or downward, and uncertainty—the estimation ambiguity—rises.
The path dependence and the possibility of negative comovement render unclear
the relationship of the ambiguity premium (and ultimately, of the total equity premium) with the conditional variance of returns. In particular, if dividend growth and
expected dividend growth are indeed locally negatively correlated and the ambiguity
premium is of a comparable magnitude to the risk premium (low confidence λ or
high conservatism α or both), then the low-frequency correlation between the equity
premium and the conditional variance of returns would be small.
5.4 Declining Trend in the Equity Premium
We observed in Section 3 that learning about ambiguous yet time-invariant factors of
the data-generating mechanism generates a decreasing trend in the conditional ambiguity. We would thus expect the reward for bearing the ambiguity, too, to decrease
over time; and if it does, the interest rate is to rise concurrently (Remark 5.1). These
trends have indeed been documented over the post-WWII period by such authors
as Merton (1980), Blanchard (1993), and Jagannathan et al. (2000). Blanchard, for
example, finds that the U.S. equity premium, which was higher than 10 percent in
the late 1940s, dropped to 2 to 3 percent toward the early 1990s as the interest rate
rose from negative to positive values.
For a demonstration, the following minimal specification suffices, where dividends
grow with constant volatility and there are no additional macroeconomic variables to
consider other than the dividends:
dD(t)/D(t) = x(t) dt + σD dw(t),
dx(t) = κ(x̄ − x(t)) dt + ρw dw(t) + ρv dv x̄,0 (t).
In this case, the risk premium is constant at $\sigma_D^2$ while the ambiguity premium $\sqrt{2\alpha\delta(t)}$ deterministically decreases over time to its limit value
\[
\sqrt{2\alpha\sigma_D^2}\,\Big[\sqrt{(\kappa + \rho_w/\sigma_D)^2 + (1 + \lambda^{-1})(\rho_v/\sigma_D)^2} - \sqrt{(\kappa + \rho_w/\sigma_D)^2 + (\rho_v/\sigma_D)^2}\Big]^{1/2}
\]
(Proposition 3.2). The agent’s conservatism in model selection α affects both the limit
value of the ambiguity premium and the rate at which it falls. Note also that if the
agent in fact has full confidence in mean reversion (λ = ∞), the equity premium will
eventually reach the level that standard models predict.
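The limit value above is easy to evaluate. The sketch below is an illustration only; the function name is hypothetical, and the parameter values are the annualized ones quoted in the paper's calibration footnotes, with ρw set to zero as in the minimal specification used for the figures.

```python
import math


def limit_ambiguity_premium(alpha, lam, kappa, rho_w, rho_v, sigma_D):
    """Limit of sqrt(2 alpha delta(t)) in the constant-volatility specification."""
    k = kappa + rho_w / sigma_D
    s = (rho_v / sigma_D) ** 2
    inner = math.sqrt(k * k + (1.0 + 1.0 / lam) * s) - math.sqrt(k * k + s)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(2.0 * alpha * sigma_D ** 2) * math.sqrt(max(inner, 0.0))


if __name__ == "__main__":
    params = dict(kappa=0.3038, rho_w=0.0, rho_v=9.478e-4, sigma_D=0.0249)
    # Full confidence in mean reversion: the limit ambiguity premium vanishes.
    print(limit_ambiguity_premium(alpha=16.0, lam=math.inf, **params))
    # Low confidence (lambda = 0.01, the value used elsewhere in the paper).
    print(limit_ambiguity_premium(alpha=16.0, lam=0.01, **params))
```

With λ = ∞ the two square roots coincide, so the function returns zero and only the risk premium remains in the limit, in line with the remark on full confidence.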
The left panel of Figure 5 shows a simulated path for the equity premium (thick
line) and a corresponding one for the interest rate (thin line). The dividend dynamics
are calibrated again based on Bansal et al. (2012),32 while β = 0.03, λ = ∞, and
α = 16. β and α are the same as before; but the value of 16 for α, together with
the full confidence, is in fact chosen to match (very roughly) the observed trends (see
Figures 10 and 11 of Blanchard, 1993).
[Figure 5 plot: panels “Present Model” and “IID Ambiguity”; horizontal axes: Year.]
Figure 5: Declining equity premium and rising interest rate. The thick lines show the equity premium and the thin lines the interest rate (annual and decimal). The left plot shows these rates predicted by the present model, and the right plot those by an IID ambiguity model à la Miao (2009), from a common realization of the economy. Under the IID ambiguity model, the equity premium is constant at $\sigma_D^2 + \sigma_D\bar{\xi}_{IID}$, $\bar{\xi}_{IID} \ge 0$. I set the value of $\bar{\xi}_{IID}$ so that the resulting constant premium equals the mean of the premia in the left plot. The first 5 years are discarded.
32 Specifically,
\[
\begin{aligned}
dD(t)/D(t) &= x(t)\,dt + 0.0249\,dw(t),\\
dx(t) &= 0.3038\,(0.0180 - x(t))\,dt + 9.478 \times 10^{-4}\,dv^{\bar{x},0}(t).
\end{aligned}
\]
Given in the right panel of Figure 5 for comparison are the equity premium and the interest rate under the prevailing form of ambiguity, namely IID ambiguity, based on the same realization of the economy. To be precise, by an IID ambiguity model
I mean a version of Miao (2009): at each time-state, the agent Bayes-updates the
reference and true probability measure Qx̄,0 , after which there still remains some
exogenous ambiguity in the form of IID ambiguity. Under the IID ambiguity model, the equity premium is constant at $\sigma_D^2 + \sigma_D\bar{\xi}_{IID}$, where $\bar{\xi}_{IID} \ge 0$ is the bound on density generators. Thus, to generate time variations in the equity premium, one
needs stochastic volatility; but even then, it is still difficult to generate a trend.
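The matching step described for the IID benchmark (choosing the bound so that the constant premium equals the mean premium of the present model) amounts to solving one linear equation. The sketch below is a hypothetical illustration; the function names and the sample premia are placeholders and are not taken from the paper.

```python
def iid_equity_premium(sigma_D: float, xi_bar: float) -> float:
    """Constant equity premium under IID ambiguity: sigma_D^2 + sigma_D * xi_bar."""
    return sigma_D ** 2 + sigma_D * xi_bar


def calibrate_xi_bar(target_premia, sigma_D: float) -> float:
    """Choose xi_bar >= 0 so that the IID premium equals the mean of target_premia."""
    mean_premium = sum(target_premia) / len(target_premia)
    xi_bar = (mean_premium - sigma_D ** 2) / sigma_D
    if xi_bar < 0.0:
        raise ValueError("mean premium is below the risk premium; no admissible bound")
    return xi_bar


if __name__ == "__main__":
    sigma_D = 0.0249                      # annualized dividend volatility from the calibration
    premia = [0.08, 0.06, 0.04]           # placeholder path of premia, for illustration only
    xi_bar = calibrate_xi_bar(premia, sigma_D)
    print(xi_bar, iid_equity_premium(sigma_D, xi_bar))
```

Because the bound is a constant, the resulting premium is flat over time, which is why this benchmark cannot generate a trend without further ingredients.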
The point here is not that the present model can more or less quantitatively
match the observed trends in the equity premium and the interest rate, but only
that it can qualitatively reproduce them; the present specification is too simplistic
for a quantitative analysis. Furthermore, there are other explanations for the trends,
such as increased portfolio diversification (Heaton and Lucas, 2000) and a fall in
macroeconomic volatility (Lettau et al., 2008); and Pástor and Stambaugh (2001)
find that prior to WWII, the equity premium had been on an increasing trend for a
century. Thus, the claim is rather that learning contributes to a decline in the equity
premium, through a resolution of ambiguity.
5.5 Equity Premium & Signal Precision
In a Bayesian framework, Veronesi (2000) observed that higher precision of signals
tends to increase the risk premium. Now that we have an endogenous characterization
of the ambiguity premium, it is of interest to see if such a counterintuitive relation
holds for the ambiguity premium as well.
A minimal specification that will suffice for the purpose is as follows:
\[
\begin{aligned}
dD(t)/D(t) &= x(t)\,dt + \sigma_D\big(\sqrt{1 - r_{DA}^2},\ r_{DA}\big)\,dw(t),\\
dA(t) &= x(t)\,dt + \sigma_A\,(0, 1)\,dw(t),\\
dx(t) &= \kappa(\bar{x} - x(t))\,dt + \rho_v\,dv^{\bar{x},0}(t),
\end{aligned}
\]
where σD , σA > 0 without loss of generality and rDA ∈ (−1, 1). In words, dividends
grow with constant volatility, and there is a signal A about the expected growth rate
x (Detemple, 1986; Veronesi, 2000). (The dividend stream D, too, is a signal, but I
reserve the term signal for A, which is to represent news circulating in the economy
other than the dividends themselves. I will use the term dividend-signal when referring
to D as a signal.) Dividend growth and expected dividend growth are, for simplicity,
assumed to be locally uncorrelated; but the signal and the dividend-signal are allowed
to be locally correlated with correlation rDA .
With the constant volatility of the observable processes (and the constant b), the
ambiguity premium converges to a constant (Proposition 3.2), and I will focus on this
limit value. Define the precision of the signal by $h_A \triangleq 1/\sigma_A$.
Proposition 5.4. Suppose rDA > 0. Then the ambiguity premium as a function of
the signal precision hA is hump-shaped: it is strictly increasing on [0, rDA hD ] and
strictly decreasing on (rDA hD , ∞).
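To understand the result, let $h_D \triangleq 1/\sigma_D$ and
\[ h_{\mathrm{eff}} = h_{\mathrm{eff}}(h_A) \triangleq \sqrt{\frac{h_D^2 - 2h_D h_A r_{DA} + h_A^2}{1 - r_{DA}^2}}, \quad 0 \le h_A < \infty. \]
$h_{\mathrm{eff}}$ denotes the effective precision of the signal and the dividend-signal en masse; σD and σA enter the filtering equations only via $|\sigma^{-1}b|^2 = h_{\mathrm{eff}}^2$. Then, in terms of the effective precision the ambiguity premium is given by
\[ \sqrt{2\alpha\delta(\infty)} = \sqrt{2\alpha h_{\mathrm{eff}}^{-2}}\,\Big[\sqrt{\kappa^2 + (1 + \lambda^{-1})(\rho_v h_{\mathrm{eff}})^2} - \sqrt{\kappa^2 + (\rho_v h_{\mathrm{eff}})^2}\Big]^{1/2}. \]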
As can be easily checked, the relationship between the ambiguity premium and the
effective precision is intuitive: the higher the effective precision, the lower the ambiguity premium. What appears to be counterintuitive is that, when the signal and the
dividend-signal are locally positively correlated, for low levels of the signal precision
an increase in the signal precision decreases the collective informativeness of the signal and the dividend-signal. See Figure 6, in which I plot the asymptotic level of the equity premium, namely $\sigma_D^2 + \sqrt{2\alpha\delta(\infty)}$ (left), and the effective precision (right), as functions of the signal precision.
[Figure 6 plot: two panels with horizontal axis hA; vertical axes µR − r (left) and heff (right); curve labels rDA = 0.99, rDA = 0.9, rDA = 0.1, hD, and heff (rDA = 0.99).]
Figure 6: Equity premium and signal precision. Left plot: The asymptotic level of the equity premium (annual, decimal) as a function of the signal precision, for different levels of local correlation between the signal and the dividend-signal. Right plot: The effective precision as a function of the signal precision. The dividend dynamics are calibrated as in the previous subsection. λ = 0.01 and α = 16.
The reason is in fact simple: an extremely noisy signal still helps infer the hidden state if it is correlated with another signal, by revealing the common noise. It is thus a signal with moderate precision, provided it is locally positively correlated with the dividend-signal, that adds least to the dividend-signal, because not only is it then revealing neither the hidden state nor the common noise, but it is similar to the dividend-signal, that is, redundant. Note that indeed $h_{\mathrm{eff}}$ attains its minimum of $h_D$ when $h_A = r_{DA}h_D$.
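The hump shape can be checked numerically from the two formulas of this subsection. The sketch below is illustrative only: the function names are hypothetical, the grid of signal precisions is arbitrary, and the parameter values are the annualized calibration values quoted earlier together with λ = 0.01 and α = 16.

```python
import math


def effective_precision(h_A: float, h_D: float, r_DA: float) -> float:
    """h_eff(h_A) = sqrt((h_D^2 - 2 h_D h_A r_DA + h_A^2) / (1 - r_DA^2))."""
    return math.sqrt((h_D ** 2 - 2.0 * h_D * h_A * r_DA + h_A ** 2) / (1.0 - r_DA ** 2))


def limit_premium_from_precision(h_eff, alpha, lam, kappa, rho_v):
    """Limit ambiguity premium sqrt(2 alpha delta(inf)) as a function of h_eff."""
    s = (rho_v * h_eff) ** 2
    inner = math.sqrt(kappa ** 2 + (1.0 + 1.0 / lam) * s) - math.sqrt(kappa ** 2 + s)
    return math.sqrt(2.0 * alpha) / h_eff * math.sqrt(max(inner, 0.0))


if __name__ == "__main__":
    h_D, r_DA = 1.0 / 0.0249, 0.9
    for h_A in (0.0, r_DA * h_D, 60.0):
        h_eff = effective_precision(h_A, h_D, r_DA)
        premium = limit_premium_from_precision(h_eff, alpha=16.0, lam=0.01,
                                               kappa=0.3038, rho_v=9.478e-4)
        print(h_A, h_eff, premium)
```

At h_A = r_DA h_D the effective precision equals h_D, its smallest value, and the premium is at its peak; on either side the effective precision is higher and the premium lower.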
When rDA < 0, the ambiguity premium is strictly decreasing in hA everywhere
because the signal is then never redundant.
Thus, the ambiguity premium, too, can increase in response to an improvement
in the quality of news. What distinguishes this observation from Veronesi’s (2000) is
that his result relies on the representative agent’s being sufficiently risk-averse, more
so than log agents; in his model, a deterioration in the quality of news decreases
the equity premium because the agent’s hedging demand tones down the covariation
between returns and consumption growth. In contrast, the present paper shows that
the equity premium can exhibit such counterintuitive behavior even under unit risk
aversion, which is conventionally associated with myopia.33
33 Multiple-priors log investors are nonmyopic; see Epstein and Schneider (2007) and Hernández-Hernández and Schied (2007a).
A Proofs
Proof of Proposition 2.1. The local version of the Itô existence-uniqueness result. See Rogers and Williams (1994), Theorem V.12.1.
Proof of Proposition 2.2. Under Assumptions 2.1 and 2.2, the following theorems in Liptser and Shiryaev (1977) hold: (i) follows from Theorem 12.6; (ii), from Theorem 12.7; and (iii), from a multidimensional adaptation of Theorems 7.17 and 12.5.
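Proof of Lemma 2.1. Let $f(t) = e^{\kappa t}\gamma(t)e^{\kappa t}$. Then
\[ \dot{f}(t) = e^{\kappa t}\big[\rho_w\rho_w^\top + \rho_v\rho_v^\top - (\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1}(\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)^\top\big]e^{\kappa t}. \]
Since $(\rho_w\sigma^\top + \gamma b^\top)(\sigma\sigma^\top)^{-1}(\rho_w\sigma^\top + \gamma b^\top)^\top$ is symmetric and positive semidefinite,
\[ \operatorname{tr} f(t) \le \operatorname{tr} f(0) + \int_0^t \operatorname{tr}\big(e^{\kappa s}(\rho_w\rho_w^\top + \rho_v\rho_v^\top)e^{\kappa s}\big)\,ds. \]
It follows that the sum of the variances is bounded: $\sup_{t \le T}\sum_i \gamma_{ii}(t) < \infty$. Since covariances are bounded by variances, the claim follows.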
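Proof of Proposition 2.3.34 Fix $(\bar{x}, \eta) \in \mathbb{R}^{n_x} \times L^2([0,T], \mathbb{R}^{n_y})$. Let $\bar{Q}^{\bar{x},\eta}$ be the measure under which j and $v^{\bar{x},\eta}$ are independent Wiener processes where j is defined by $dj(t) = \sigma(t)^{-1}dy(t)$, j(0) = 0. Then
\[
\frac{dQ^{\bar{x},\eta}}{d\bar{Q}^{\bar{x},\eta}} = \Lambda(T), \qquad
\Lambda(t) \triangleq \exp\left(\int_0^t [\sigma(s)^{-1}(a(s) + b(s)x(s))]^\top dj(s) - \frac{1}{2}\int_0^t |\sigma(s)^{-1}(a(s) + b(s)x(s))|^2\,ds\right).
\]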
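Let $\psi^{\bar{x},\eta}(t, \cdot)$ denote the unnormalized density of x(t) given Gt under $Q^{\bar{x},\eta}$, defined by
\[ E^{\bar{Q}^{\bar{x},\eta}}[\Lambda(t)f(x(t)) \mid \mathcal{G}_t] = \int_X f(x)\,\psi^{\bar{x},\eta}(t, x)\,dx \]
where $X \triangleq \mathbb{R}^{n_x}$, f denotes an arbitrary test function, and
\[ \int_X dx \equiv \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} dx_1 \cdots dx_{n_x}. \]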
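Since (y, x) is conditionally Gaussian,
\[ \psi^{\bar{x},\eta}(t, x) = \exp\left(u^{\bar{x},\eta}(t) - \frac{1}{2}(x - m^{\bar{x},\eta}(t))^\top\gamma(t)^{-1}(x - m^{\bar{x},\eta}(t))\right) \tag{55} \]
where $u^{\bar{x},\eta}(t)$ is independent of x. Now use Bayes’ rule to see
\[
\ell_T(\bar{x}, \eta) = \log E^{\bar{Q}^{\bar{x},\eta}}\!\left[\frac{dQ^{\bar{x},\eta}}{d\bar{Q}^{\bar{x},\eta}} \,\middle|\, \mathcal{G}_T\right] - \log E^{\bar{Q}^{0,0}}\!\left[\frac{dQ^{0,0}}{d\bar{Q}^{0,0}} \,\middle|\, \mathcal{G}_T\right] + \log E^{\bar{Q}^{0,0}}\!\left[\frac{d\bar{Q}^{\bar{x},\eta}}{d\bar{Q}^{0,0}} \,\middle|\, \mathcal{G}_T\right]
\]
34 I thank Domenico Cuoco for this direct proof. Alternatively, we can differentiate (12) under the integral sign and re-construct the log-likelihood function back.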
but the last term is 0 because under $\bar{Q}^{0,0}$, $v^{0,0}$ and j are independent. Thus
\[
\ell_T(\bar{x}, \eta) = \log\int_X \psi^{\bar{x},\eta}(T, x)\,dx - \log\int_X \psi^{0,0}(T, x)\,dx = u^{\bar{x},\eta}(T) - u^{0,0}(T)
\]
and all boils down to computing $u^{\bar{x},\eta}(T)$.
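To compute it, I compare the $\psi^{\bar{x},\eta}$ given in (55) with that as the solution to the Zakai equation:
Lemma A.1. $\psi^{\bar{x},\eta}$ satisfies
\[
\begin{aligned}
d\psi^{\bar{x},\eta}(t, x) ={}& \psi^{\bar{x},\eta}(t, x)[\sigma(t)^{-1}(a(t) + b(t)x)]^\top dj(t) - \operatorname{div}[(\kappa(\bar{x} - x) + \rho_v\eta(t))\psi^{\bar{x},\eta}(t, x)]\,dt\\
&- \partial_x\psi^{\bar{x},\eta}(t, x)^\top\rho_w\,dj(t) + \frac{1}{2}\operatorname{tr}[\partial_x^2\psi^{\bar{x},\eta}(t, x)(\rho_w\rho_w^\top + \rho_v\rho_v^\top)]\,dt
\end{aligned}
\tag{56}
\]
with initial condition $\psi^{\bar{x},\eta}(0, \cdot) \sim N(m_0, \gamma_0)$.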
Proof. The derivation is standard; see, for example, Elliott and Krishnamurthy (1997). First, differentiate Λ(t)f(x(t)) and then re-integrate the resulting expression:
\[
\begin{aligned}
\Lambda(t)f(x(t)) ={}& \Lambda(0)f(x(0)) + \int_0^t \Lambda(s)f(x(s))[\sigma(s)^{-1}(a(s) + b(s)x(s))]^\top dj(s)\\
&+ \int_0^t \Lambda(s)\,\partial f(x(s))^\top\{[\kappa(\bar{x} - x(s)) + \rho_v\eta(s)]\,ds + \rho_w\,dj(s) + \rho_v\,dv^{\bar{x},\eta}(s)\}\\
&+ \frac{1}{2}\int_0^t \Lambda(s)\operatorname{tr}[\partial^2 f(x(s))(\rho_w\rho_w^\top + \rho_v\rho_v^\top)]\,ds.
\end{aligned}
\]
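Then take the conditional expectation under $\bar{Q}^{\bar{x},\eta}$ given Gt:
\[
\begin{aligned}
\int_X f(x)\psi^{\bar{x},\eta}(t, x)\,dx ={}& \int_X f(x)\psi^{\bar{x},\eta}(0, x)\,dx\\
&+ \int_0^t\!\!\int_X f(x)[\sigma(s)^{-1}(a(s) + b(s)x)]^\top\psi^{\bar{x},\eta}(s, x)\,dx\,dj(s)\\
&+ \int_0^t\!\!\int_X \partial f(x)^\top[\kappa(\bar{x} - x) + \rho_v\eta(s)]\psi^{\bar{x},\eta}(s, x)\,dx\,ds\\
&+ \int_0^t\!\!\int_X \partial f(x)^\top\rho_w\,\psi^{\bar{x},\eta}(s, x)\,dx\,dj(s)\\
&+ \frac{1}{2}\int_0^t\!\!\int_X \operatorname{tr}[\partial^2 f(x)(\rho_w\rho_w^\top + \rho_v\rho_v^\top)]\psi^{\bar{x},\eta}(s, x)\,dx\,ds.
\end{aligned}
\]
For the change in the order of the conditional expectation and the stochastic integral with respect to j, see Liptser and Shiryaev (1977), Theorem 5.14. Now, integration by parts with respect to x completes the derivation.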
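(Proof of the proposition continued.) From (55),
\[
\begin{aligned}
d\log\psi^{\bar{x},\eta}(t, x) ={}& du^{\bar{x},\eta}(t) + (x - m^{\bar{x},\eta}(t))^\top\gamma(t)^{-1}\,dm^{\bar{x},\eta}(t)\\
&+ \frac{1}{2}(x - m^{\bar{x},\eta}(t))^\top\gamma(t)^{-1}\dot{\gamma}(t)\gamma(t)^{-1}(x - m^{\bar{x},\eta}(t))\,dt\\
&- \frac{1}{2}\operatorname{tr}[\gamma(t)^{-1}(\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)(\sigma(t)\sigma(t)^\top)^{-1}(\rho_w\sigma(t)^\top + \gamma(t)b(t)^\top)^\top]\,dt.
\end{aligned}
\]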
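On the other hand, computing the spatial derivatives of $\psi^{\bar{x},\eta}$ using (55) and plugging them into (56), we obtain another expression for $d\psi^{\bar{x},\eta}(t, x)$:
\[
\begin{aligned}
d\psi^{\bar{x},\eta}(t, x)/\psi^{\bar{x},\eta}(t, x) ={}& [\sigma(t)^{-1}(a(t) + b(t)x) + \rho_w^\top\gamma(t)^{-1}(x - m^{\bar{x},\eta}(t))]^\top dj(t)\\
&+ [(x - m^{\bar{x},\eta}(t))^\top\gamma(t)^{-1}(\kappa(\bar{x} - x) + \rho_v\eta(t)) + \operatorname{tr}\kappa]\,dt\\
&+ \frac{1}{2}(x - m^{\bar{x},\eta}(t))^\top\gamma(t)^{-1}(\rho_w\rho_w^\top + \rho_v\rho_v^\top)\gamma(t)^{-1}(x - m^{\bar{x},\eta}(t))\,dt\\
&- \frac{1}{2}\operatorname{tr}(\gamma(t)^{-1}(\rho_w\rho_w^\top + \rho_v\rho_v^\top))\,dt.
\end{aligned}
\]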
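Then
\[
\begin{aligned}
d\log\psi^{\bar{x},\eta}(t, x) ={}& \frac{1}{\psi}\,d\psi + \frac{1}{2}\left(-\frac{1}{\psi^2}\right)(d\psi)^2\\
={}& [\sigma(t)^{-1}(a(t) + b(t)x) + \rho_w^\top\gamma(t)^{-1}(x - m^{\bar{x},\eta}(t))]^\top dj(t)\\
&+ [(x - m^{\bar{x},\eta}(t))^\top\gamma(t)^{-1}(\kappa(\bar{x} - x) + \rho_v\eta(t)) + \operatorname{tr}\kappa]\,dt\\
&+ \frac{1}{2}(x - m^{\bar{x},\eta}(t))^\top\gamma(t)^{-1}(\rho_w\rho_w^\top + \rho_v\rho_v^\top)\gamma(t)^{-1}(x - m^{\bar{x},\eta}(t))\,dt\\
&- \frac{1}{2}\operatorname{tr}(\gamma(t)^{-1}(\rho_w\rho_w^\top + \rho_v\rho_v^\top))\,dt\\
&- \frac{1}{2}|\sigma(t)^{-1}(a(t) + b(t)x) + \rho_w^\top\gamma(t)^{-1}(x - m^{\bar{x},\eta}(t))|^2\,dt.
\end{aligned}
\]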
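Equate the two expressions of $d\log\psi^{\bar{x},\eta}(t, x)$ to see
\[
\begin{aligned}
du^{\bar{x},\eta}(t) ={}& -\frac{1}{2}\operatorname{tr}(\gamma(t)^{-1}\dot{\gamma}(t))\,dt + (a(t) + b(t)m^{\bar{x},\eta}(t))^\top(\sigma(t)\sigma(t)^\top)^{-1}\,dy(t)\\
&- \frac{1}{2}(a(t) + b(t)m^{\bar{x},\eta}(t))^\top(\sigma(t)\sigma(t)^\top)^{-1}(a(t) + b(t)m^{\bar{x},\eta}(t))\,dt.
\end{aligned}
\]
Finally, note that $u^{\bar{x},\eta}(0) = u^{0,0}(0)$.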
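Proof of Lemma 2.2. Let ε > 0 and observe
\[
\begin{aligned}
\ell_t(\bar{x}, \eta + \varepsilon h) - \ell_t(\bar{x}, \eta) ={}& \varepsilon\int_0^t\left(\int_0^s \phi(\tau)^{-1}\rho_v h(\tau)\,d\tau\right)^{\!\top}\phi(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}\,dy(s)\\
&- \varepsilon\int_0^t (a(s) + b(s)m^{\bar{x},\eta}(s))^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\phi(s)\int_0^s \phi(\tau)^{-1}\rho_v h(\tau)\,d\tau\,ds + O(\varepsilon^2).
\end{aligned}
\tag{57}
\]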
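The first term can be rewritten, by integration by parts, as
\[
\begin{aligned}
&\varepsilon\left(\int_0^t \phi(\tau)^{-1}\rho_v h(\tau)\,d\tau\right)^{\!\top}\int_0^t \phi(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}\,dy(s)\\
&\quad - \varepsilon\int_0^t (\phi(s)^{-1}\rho_v h(s))^\top\int_0^s \phi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}\,dy(\tau)\,ds\\
&= \varepsilon\int_0^t\left(\int_s^t \phi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}\,dy(\tau)\right)^{\!\top}\phi(s)^{-1}\rho_v h(s)\,ds
\end{aligned}
\tag{58}
\]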
and the second term, by changing the order of integration, as
\[
\varepsilon\int_0^t\left(\int_s^t \phi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}(a(\tau) + b(\tau)m^{\bar{x},\eta}(\tau))\,d\tau\right)^{\!\top}\phi(s)^{-1}\rho_v h(s)\,ds.
\tag{59}
\]
Now, plug (58) and (59) into (57), differentiate it with respect to ε, and set ε = 0.
Proof of Lemma 2.3. Since $\psi_{11}(0) = I_{n_x}$, $\psi_{11}(t)$ is by continuity invertible up to a random time $\tau_{11} \in (0, T]$. Up to $\tau_{11}$, $\psi_{21}(t)\psi_{11}(t)^{-1}\rho_v\rho_v^\top$ satisfies
\[
\dot{f}(t) = \lambda^{-1}\rho_v\rho_v^\top - \bar{\kappa}(t)f(t) - f(t)\bar{\kappa}(t)^\top - f(t)b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)f(t), \quad f(0) = 0.
\tag{60}
\]
(60) has a unique solution: suppose p and q solve (60), let ∆(t) = p(t) − q(t), and observe $\Delta(0) = \dot{\Delta}(0) = 0$. Thus, $\psi_{21}(t)\psi_{11}(t)^{-1}\rho_v\rho_v^\top$ is symmetric up to $\tau_{11}$.
Consider the following hypothetical partially observable system
dy(t) = b(t)x(t) dt + σ(t) dw(t),
dx(t) = −κ̄(t)x(t) dt + λ−1/2 ρv dv(t), x(0) ∼ N (m0 , 0),
with the understanding κ̄(t) = κ̄(t, y). By Liptser and Shiryaev (1977), Theorem 12.7,
the conditional variance of x(t) satisfies (60) and stays positive definite for t > 0. (The
assumptions of the theorem are satisfied; in particular, κ̄ is uniformly bounded by
Lemma 2.1.) Hence, $\psi_{21}(t)\psi_{11}(t)^{-1}\rho_v\rho_v^\top$ is positive definite and consequently invertible up to $\tau_{11}$.
Since ψ21 (0) = 0 and ψ̇21 (0) = λ−1 In , ψ21 (t), t > 0, too, is invertible up to a
random time τ21 ∈ (0, T ]. By the last paragraph, τ21 ≥ τ11 .
Suppose τ11 < T . There are two cases to consider. First, τ11 = τ21 . This contradicts
the invertibility of ψ. Second, τ11 < τ21 . Then ψ11 (t)−1 will explode as t ↑ τ11 , which
is impossible because $\psi_{21}(t)$ is invertible. To be concrete, let g be the solution of $\dot{g}(t) = g(t)\bar{\kappa}(t)$, $g(0) = I_n$, and let $h(t) \triangleq g(t)\psi_{21}(t)\psi_{11}(t)^{-1}\rho_v\rho_v^\top g(t)^\top$. Observe
\[ \operatorname{tr} h(t) \le \int_0^T \operatorname{tr}\big(g(s)\lambda^{-1}\rho_v\rho_v^\top g(s)^\top\big)\,ds, \quad t \le \tau_{11}. \]
Given that g and $\psi_{21}$ are invertible, the left-hand side should explode as $t \uparrow \tau_{11}$ but the right-hand side is finite. Hence, $\tau_{11} = \tau_{21} = T$. (Note. T is arbitrary.)
Proof of Proposition 2.4. Multiply ρv to FOC(η) to have
\[
\lambda\rho_v\eta(s) = \rho_v\rho_v^\top(\phi(s)^{-1})^\top\int_s^t \phi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}[\,dy(\tau) - (a(\tau) + b(\tau)m^{\bar{x},\eta}(\tau))\,d\tau\,].
\]
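Differentiate this with respect to s to see
\[
\begin{aligned}
d(\lambda\rho_v\eta(s)) ={}& \rho_v\rho_v^\top\bar{\kappa}(s)^\top(\rho_v\rho_v^\top)^{-1}\lambda\rho_v\eta(s)\,ds\\
&- \rho_v\rho_v^\top b(s)^\top(\sigma(s)^\top)^{-1}\,d\bar{w}^{0,0}(s) + \rho_v\rho_v^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\,\Phi^{\kappa\bar{x}+\rho_v\eta}(s)\,ds.
\end{aligned}
\]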
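Observe in turn that
\[ d\Phi^{\kappa\bar{x}+\rho_v\eta}(s) = -\bar{\kappa}(s)\Phi^{\kappa\bar{x}+\rho_v\eta}(s)\,ds + (\kappa\bar{x} + \rho_v\eta(s))\,ds \]
and that consequently we have a linear system of differential equations in $\lambda\rho_v\eta$ and $\Phi^{\kappa\bar{x}+\rho_v\eta}$. Written in the matrix form, the system is
\[
d\begin{pmatrix}\lambda\rho_v\eta(s)\\ \Phi^{\kappa\bar{x}+\rho_v\eta}(s)\end{pmatrix}
= \chi(s)\begin{pmatrix}\lambda\rho_v\eta(s)\\ \Phi^{\kappa\bar{x}+\rho_v\eta}(s)\end{pmatrix}ds
+ \begin{pmatrix}-\rho_v\rho_v^\top b(s)^\top(\sigma(s)^\top)^{-1}\,d\bar{w}^{0,0}(s)\\ \kappa\bar{x}\,ds\end{pmatrix}.
\]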
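It follows
\[ \begin{pmatrix}\lambda\rho_v\eta(s)\\ \Phi^{\kappa\bar x+\rho_v\eta}(s)\end{pmatrix} = \psi(s)\left[\iota_1\lambda\rho_v\eta(0) + \int_0^s \psi(\tau)^{-1}\begin{pmatrix}-\rho_v\rho_v^\top b(\tau)^\top(\sigma(\tau)^\top)^{-1}\,d\bar w^{0,0}(\tau)\\ \kappa\bar x\,d\tau\end{pmatrix}\right] \]
\[ = \psi(s)\iota_1\lambda\rho_v\eta(0) - \psi(s)\int_0^s \psi(\tau)^{-1}\iota_1\rho_v\rho_v^\top b(\tau)^\top(\sigma(\tau)^\top)^{-1}\,d\bar w^{0,0}(\tau) + \Psi(s)\iota_2\kappa\bar x. \]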
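Finally, observe
\[ \lambda\rho_v\eta(t) = 0 = \psi_{11}(t)\lambda\rho_v\eta(0) - \iota_1^\top\psi(t)\int_0^t \psi(\tau)^{-1}\iota_1\rho_v\rho_v^\top b(\tau)^\top(\sigma(\tau)^\top)^{-1}\,d\bar w^{0,0}(\tau) + \Psi_{12}(t)\kappa\bar x. \]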
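Lemma A.2. θ is the unique solution of
\[ \dot f(t) = I_{n_x} - \bar\kappa^\lambda(t)f(t), \quad f(0) = 0, \tag{61} \]
where
\[ \bar\kappa^\lambda(t) \triangleq \bar\kappa(t) + \psi_{21}(t)\psi_{11}(t)^{-1}\rho_v\rho_v^\top b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t) \]
\[ = \kappa + \big[\rho_w\sigma(t)^\top + (\gamma(t) + \psi_{21}(t)\psi_{11}(t)^{-1}\rho_v\rho_v^\top)b(t)^\top\big](\sigma(t)\sigma(t)^\top)^{-1}b(t). \]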
Proof. (61) follows from direct differentiation. Uniqueness is standard.
Define
p(s, t) , Ψ(s)ι2 − ψ(s)ι1 ψ11 (t)−1 Ψ12 (t), s ≤ t ≤ T.
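Then
\[ \begin{pmatrix}\lambda\rho_v\eta^*_{\bar x,t}(s)\\ \Phi^{\kappa\bar x+\rho_v\eta^*_{\bar x,t}}(s)\end{pmatrix} = \begin{pmatrix}\lambda\rho_v\eta^*_{0,t}(s)\\ \Phi^{0+\rho_v\eta^*_{0,t}}(s)\end{pmatrix} + p(s,t)\kappa\bar x, \quad s \le t \le T, \]
and $\iota_2^\top p(t,t) = \theta(t)$. Also,
\[ \frac{\partial}{\partial t}p(s,t) = -\psi(s)\iota_1\psi_{11}(t)^{-1}\rho_v\rho_v^\top b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\theta(t). \]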
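Proof of Lemma 2.4. Let
\[ M(s) \triangleq \begin{pmatrix}\lambda^{-1}(\rho_v\rho_v^\top)^{-1} & 0\\ 0 & b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\end{pmatrix} \]
and observe
\[ I_{\bar x}(t) = \int_0^t p(s,t)^\top M(s)p(s,t)\,ds. \]
Thus
\[ \frac{d}{dt}I_{\bar x}(t) = \theta(t)^\top b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\theta(t) - 2\underbrace{\int_0^t p(s,t)^\top M(s)\psi(s)\iota_1\,ds\;\psi_{11}(t)^{-1}\rho_v\rho_v^\top}_{=:f(t)}\,b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\theta(t), \]
\[ \dot f = -f\big(\bar\kappa + \psi_{21}\psi_{11}^{-1}\rho_v\rho_v^\top b^\top(\sigma\sigma^\top)^{-1}b\big)^\top + \theta^\top b^\top(\sigma\sigma^\top)^{-1}b\Big(\psi_{21}\psi_{11}^{-1}\rho_v\rho_v^\top - \underbrace{\int_0^t \big(\psi(s)\iota_1\psi_{11}(t)^{-1}\rho_v\rho_v^\top\big)^\top M(s)\psi(s)\iota_1\psi_{11}(t)^{-1}\rho_v\rho_v^\top\,ds}_{=:g(t)}\Big), \tag{62} \]
\[ \dot g = \lambda^{-1}\rho_v\rho_v^\top - \bar\kappa g - g\bar\kappa^\top - \psi_{21}\psi_{11}^{-1}\rho_v\rho_v^\top b^\top(\sigma\sigma^\top)^{-1}bg + (\psi_{21}\psi_{11}^{-1}\rho_v\rho_v^\top - g)b^\top(\sigma\sigma^\top)^{-1}b\psi_{21}\psi_{11}^{-1}\rho_v\rho_v^\top, \]
with $f(0) = g(0) = 0$, where I have suppressed t unless needed. $g = \psi_{21}\psi_{11}^{-1}\rho_v\rho_v^\top$ is the unique solution to the last equation. In turn, $f = 0$ is the unique solution to (62).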
Suppose Ix̄ (t) is singular for some t > 0. Since it is symmetric and positive
semidefinite, there must be a nonzero z ∈ Rnx such that
\[ \int_0^t z^\top\theta(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\theta(s)z\,ds = 0, \]
or $\sigma(s)^{-1}b(s)\theta(s)z = 0$ for Lebesgue almost every $s \le t$, or $\theta(s)z = 0$ for all $s \le t$. Multiply (61) by z to see
\[ \frac{d}{ds}(\theta(s)z) = 0 = z, \quad s \le t, \]
which is absurd.
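Lemma A.3.
\[ \Phi^{I_{n_x}}(t)^\top - \int_0^t \Phi^{I_{n_x}}(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\psi_{21}(s)\,ds\;\psi_{11}(t)^{-1}\rho_v\rho_v^\top = \theta(t)^\top \tag{63} \]
and
\[ \int_0^t \Phi^{I_{n_x}}(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\iota_2^\top p(s,t)\,ds = I_{\bar x}(t). \tag{64} \]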
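Proof. (63): Denote the left-hand side by f(t). Then
\[ \dot f = -(\Phi^{I_{n_x}})^\top\bar\kappa^\top + I_{n_x} - (\Phi^{I_{n_x}})^\top b^\top(\sigma\sigma^\top)^{-1}b\psi_{21}\psi_{11}^{-1}\rho_v\rho_v^\top + \int_0^t \Phi^{I_{n_x}}(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\psi_{21}(s)\,ds \times \psi_{11}^{-1}\rho_v\rho_v^\top\big(\bar\kappa^\top + b^\top(\sigma\sigma^\top)^{-1}b\psi_{21}\psi_{11}^{-1}\rho_v\rho_v^\top\big). \tag{65} \]
But by Lemma 2.3(ii),
\[ \bar\kappa^\top + b^\top(\sigma\sigma^\top)^{-1}b\psi_{21}\psi_{11}^{-1}\rho_v\rho_v^\top = \big(\bar\kappa + \psi_{21}\psi_{11}^{-1}\rho_v\rho_v^\top b^\top(\sigma\sigma^\top)^{-1}b\big)^\top = (\bar\kappa^\lambda)^\top \]
and with this, (65) can be rewritten as $\dot f = I_{n_x} - f(\bar\kappa^\lambda)^\top$, which is also satisfied by $\theta^\top$. Since $f(0) = \theta(0)^\top = 0$, it follows that $f(t) = \theta(t)^\top$.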
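(64): Denote the left-hand side by g(t). Then
\[ \dot g(t) = \Phi^{I_{n_x}}(t)^\top b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\theta(t) + \int_0^t \Phi^{I_{n_x}}(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\frac{\partial}{\partial t}\iota_2^\top p(s,t)\,ds \]
while
\[ \frac{\partial}{\partial t}\iota_2^\top p(s,t) = -\psi_{21}(s)\psi_{11}(t)^{-1}\rho_v\rho_v^\top b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\theta(t). \]
By (63),
\[ \dot g(t) = \theta(t)^\top b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\theta(t). \]
Note finally that $g(0) = 0$.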
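Proof of Proposition 2.5. From FOC(x̄),
\[ \int_0^t \Phi^{I_{n_x}}(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\iota_2^\top p(s,t)\,ds\;\kappa\bar x = \int_0^t \Phi^{I_{n_x}}(s)^\top b(s)^\top(\sigma(s)^\top)^{-1}\,d\bar w^{0,\eta^*_{0,t}}(s). \]
Recall (64).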
Proof of Proposition 2.6. (i) Differentiating FOC(x̄) with respect to t, we see
0 = ΦInx (t)> b(t)> (σ(t)> )−1 d(t)
Z t
∗
∗
ΦInx (s)> b(s)> (σ(s)σ(s)> )−1 b(s) dt Φκx̄t +ρv ηt (s) ds.
−
0
Direct computation shows
∗
∗
∗
−1
>
>
> −1
dt Φκx̄t +ρv ηt (s) = ι>
d(t). (66)
2 p(s, t)κ dx̄t + ψ21 (s)ψ11 (t) ρv ρv b(t) (σ(t) )
Hence,
Z t
∗
ΦInx (s)> b(s)> (σ(s)σ(s)> )−1 b(s)ι>
2 p(s, t) ds κ dx̄t
0
Z t
Inx
>
Inx
>
>
> −1
−1
>
= Φ (t) −
Φ (s) b(s) (σ(s)σ(s) ) b(s)ψ21 (s) ds ψ11 (t) ρv ρv
0
> −1
>
× b(t) (σ(t) )
d(t).
Use Lemma A.3.
(ii) Observe
∗
∗
∗
∗
dmx̄t ,ηt (t) = dmx̄,η (t)|x̄=x̄∗t ,η=ηt∗ + Φκ dx̄t +ρv dηt (t)
∗
∗
= κ(x̄∗t − mx̄t ,ηt (t)) dt + (ρw σ(t)> + γ(t)b(t)> )(σ(t)> )−1 d(t)
∗
∗
+ Φκ dx̄t +ρv dηt (t).
∗
∗
dΦκx̄t +ρv ηt (t) is, if computed from the definition,
∗
∗
∗
∗
∗
∗
dΦκx̄t +ρv ηt (t) = −κ̄(t)Φκx̄t +ρv ηt (t) dt + κx̄∗t dt + Φκ dx̄t +ρv dηt (t)
and is, if computed from the solution (18) (recall (66)),
∗
∗
∗
∗
dΦκx̄t +ρv ηt (t) = −κ̄(t)Φκx̄t +ρv ηt (t) dt + κx̄∗t dt
>
> −1
d(t) + θ(t)κ dx̄∗t .
+ ψ21 (t)ψ11 (t)−1 ρv ρ>
v b(t) (σ(t) )
Comparing the last two equations, we see
∗
∗
>
>
> −1
d(t).
Φκ dx̄t +ρv dηt (t) = (ψ21 (t)ψ11 (t)−1 ρv ρ>
v + θ(t)σx̄∗ (t) )b(t) (σ(t) )
Proof of Proposition 2.7. All follow from direct differentiation.
Proof of the Claim in Remark 2.3. (i) If σ and b are deterministic, then θ, σx̄∗ ,
and δ, too, are deterministic. Since the latter are continuous, boundedness follows.
(ii) Suppose σ, ρw , ρv , and b are diagonal; it then suffices to consider the scalar
case. Suppose also κ̄ ≥ ε a.e. for some ε > 0.
Since $\delta - \theta\sigma_{\bar x^*} = \psi_{21}\psi_{11}^{-1}\rho_v^2 > 0$,
\[ \dot\theta(t) < 1 - \varepsilon\theta(t) \quad\text{for all } t \ge 0. \]
Consider θ† defined by θ̇† (t) = 1 − εθ† (t) and θ† (0) = θ(0). θ(t) ≤ θ† (t) for all t ≥ 0
because θ(t) = θ† (t) implies θ̇(t) < θ̇† (t). Now, θ† monotonically converges to ε−1 ,
and thus, θ is uniformly bounded by ε−1 ∨ θ(0). Next, since Ix̄−1 is decreasing,
σ̇x̄∗ (t) < Ix̄ (0)−1 − εσx̄∗ (t)
and σx̄∗ is uniformly bounded by (εIx̄ (0))−1 ∨ σx̄∗ (0). Note, finally, that
δ̇(t) < 2[(εIx̄ (0))−1 ∨ σx̄∗ (0)] + λ−1 ρ2v − 2εδ(t).
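The comparison argument for θ can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the paper's computation: the time-varying coefficient `k(t)` is a hypothetical stand-in for the true coefficient of θ (all that matters is that it stays above ε), and the Euler step is an illustrative discretization. It checks that θ stays below the comparison path θ† and that θ† stays below ε⁻¹ ∨ θ(0).

```python
import math

def simulate(eps, theta0, horizon=20.0, dt=1e-3):
    """Euler paths of theta' = 1 - k(t) theta and theta_dagger' = 1 - eps theta_dagger.

    k(t) is a hypothetical coefficient with k(t) > eps for all t.
    Returns a list of (theta, theta_dagger) pairs.
    """
    steps = int(horizon / dt)
    theta, dagger = theta0, theta0
    path = [(theta, dagger)]
    for i in range(steps):
        t = i * dt
        k = eps + 0.5 + 0.5 * math.sin(t) ** 2  # assumed form, always above eps
        theta += dt * (1.0 - k * theta)
        dagger += dt * (1.0 - eps * dagger)
        path.append((theta, dagger))
    return path

if __name__ == "__main__":
    eps, theta0 = 0.2, 0.0
    path = simulate(eps, theta0)
    print("max theta        :", max(p[0] for p in path))
    print("max theta_dagger :", max(p[1] for p in path))
    print("bound            :", max(1.0 / eps, theta0))
```

The same pattern (dominate the drift, then bound the dominating linear equation) is what the proof reuses for σx̄∗ and δ.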
Proof of Proposition 2.8. Since
∗
∗
d(t) = dw̄0,0 (t) − σ(t)−1 b(t)(mx̄t ,ηt (t) − m0,0 (t)) dt,
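the question is whether
\[ \mathcal{E}^{\sigma^{-1}b\Delta}(t) = \exp\Big(\int_0^t \sigma(s)^{-1}b(s)\Delta(s)\,d\bar w^{0,0}(s) - \frac12\int_0^t |\sigma(s)^{-1}b(s)\Delta(s)|^2\,ds\Big), \]
$0 \le t \le T$, is a martingale under $Q^{0,0}|_{\mathcal G_T}$, where
\[ \Delta(t) \triangleq m^{\bar x^*_t,\eta^*_t}(t) - m^{0,0}(t), \quad 0 \le t \le T. \]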
Observe
d∆(t) = κ(x̄∗t − ∆(t)) dt + δ(t)b(t)> (σ(t)> )−1 dw̄0,0 (t)
− [ρw σ(t)> + (γ(t) + δ(t))b(t)> ](σ(t)σ(t)> )−1 b(t)∆(t) dt.
Hence, $(\Delta, \bar x^*)$ satisfies a linear SDE with uniformly bounded volatility. Thus, by a multidimensional adaptation of Liptser and Shiryaev (1977), Theorem 4.7, there is an $h > 0$ such that $\sup_{t\le T} E^{Q^{0,0}}\exp(h|\Delta(t)|^2) < \infty$; in turn, by the uniform boundedness of $\sigma^{-1}b$, there is an $h' > 0$ such that $\sup_{t\le T} E^{Q^{0,0}}\exp(h'|\sigma(t)^{-1}b(t)\Delta(t)|^2) < \infty$. Now, by Liptser and Shiryaev (1977), Section 6.2.3, Example 3, Novikov's condition holds and $\mathcal{E}^{\sigma^{-1}b\Delta}$ is a martingale. Define $P^0$ by $dP^0/d(Q^{0,0}|_{\mathcal G_T}) = \mathcal{E}^{\sigma^{-1}b\Delta}(T)$.
Denote by F̄ the augmented filtration generated by . From the definition of
, we have F̄t ⊆ Gt , 0 ≤ t ≤ T . For the other direction, observe the SDE that
∗ ∗
(y, mx̄· ,η· (·), x̄∗ ) satisfies with a, b, σ, γ, δ, and σx̄∗ replaced by their respective nonanticipative path functionals in y. The drift is locally Lipschitz and linearly growing,
and the volatility is linearly growing (Assumptions 2.1, 2.2, and 2.4). Hence, if in
∗ ∗
addition (γ + δ)b> (σ > )−1 and σx̄>∗ b> (σ > )−1 are locally Lipschitz, then (y, mx̄· ,η· (·), x̄∗ )
is the unique strong solution to the SDE by Itô’s existence and uniqueness theorem
(Rogers and Williams (1994), Theorem V.12.1), and it would follow that F̄t ⊇ Gt ,
0 ≤ t ≤ T , or F̄ = G. (Recall that I assume all G0 -measurable variables to be
nonrandom constants.)
Since $\gamma$, $\delta$, $b^\top$, $(\sigma^\top)^{-1}$, and $\sigma_{\bar x^*}^\top$ are uniformly bounded, it suffices to show that each of them is locally Lipschitz: Suppose p and q are matrix-valued path functionals on $[0,T] \times C([0,T], \mathbb{R}^{n_y})$. Then, writing $p \equiv p(t,f)$ and $p' \equiv p(t,g)$, and likewise q and q',
\[ |p(t,f)q(t,f) - p(t,g)q(t,g)| = |pq - pq' + pq' - p'q'| \le |p||q - q'| + |q'||p - p'| \]
by the triangle and Cauchy-Schwarz inequalities.
γ is locally Lipschitz by the proof of Liptser and Shiryaev (1977), Theorem 12.5.
b is so by assumption (Assumption 2.1). To see σ −1 is locally Lipschitz, observe that
|σ(t, f )−1 − σ(t, g)−1 | = |σ(t, f )−1 (σ(t, g) − σ(t, f ))σ(t, g)−1 |
≤ |σ(t, f )−1 ||σ(t, g)−1 ||σ(t, g) − σ(t, f )|.
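The two elementary bounds used here (the product bound and the resolvent-type identity for inverses) are easy to check numerically. The sketch below is only an illustration on hypothetical 2×2 matrices with the Frobenius norm, which is submultiplicative; the matrix values are assumptions, not taken from the model.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def fro(a):
    """Frobenius norm (submultiplicative, so the bounds below apply)."""
    return math.sqrt(sum(x * x for row in a for x in row))

def inverse_gap(sf, sg):
    """Both sides of sf^{-1} - sg^{-1} = sf^{-1} (sg - sf) sg^{-1}."""
    lhs = sub(inv(sf), inv(sg))
    rhs = matmul(matmul(inv(sf), sub(sg, sf)), inv(sg))
    return lhs, rhs

def product_gap_bound(p, q, p2, q2):
    """Return (|pq - p'q'|, |p||q - q'| + |q'||p - p'|)."""
    lhs = fro(sub(matmul(p, q), matmul(p2, q2)))
    rhs = fro(p) * fro(sub(q, q2)) + fro(q2) * fro(sub(p, p2))
    return lhs, rhs

if __name__ == "__main__":
    sf = [[2.0, 0.5], [0.1, 1.5]]   # hypothetical sigma(t, f)
    sg = [[1.8, 0.4], [0.2, 1.6]]   # hypothetical sigma(t, g)
    lhs, rhs = inverse_gap(sf, sg)
    print(fro(sub(lhs, rhs)))        # identity residual, numerically zero
    print(fro(lhs), fro(inv(sf)) * fro(inv(sg)) * fro(sub(sg, sf)))
```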
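It remains to show that $\delta$ and $\sigma_{\bar x^*}$ are locally Lipschitz. Let $N > 0$, $t \in [0,T]$, and let $f, g \in C([0,T], \mathbb{R}^{n_y})$ be such that
\[ \sup_{s\le t}|f(s)| \vee \sup_{s\le t}|g(s)| \le N. \]
Consider $I_{\bar x}^{-1}$. Since
\[ \frac{d}{dt}\big(I_{\bar x}(t)^{-1}\big) = -\sigma_{\bar x^*}(t)^\top b(t)^\top(\sigma(t)\sigma(t)^\top)^{-1}b(t)\sigma_{\bar x^*}(t), \]
for $s \le t$,
\[ |I_{\bar x}(s,f)^{-1} - I_{\bar x}(s,g)^{-1}| \le K_1\int_0^s |\sigma_{\bar x^*}(\tau,f) - \sigma_{\bar x^*}(\tau,g)|\,d\tau + K_2\sup_{s\le t}|f(s) - g(s)| \]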
where I use the same symbols for the path functionals and $K_i$ are positive constants that do not depend on s or t. Proceeding similarly for $\sigma_{\bar x^*}$, and using the last inequality,
\[ |\sigma_{\bar x^*}(s,f) - \sigma_{\bar x^*}(s,g)| \le K_3\int_0^s |\sigma_{\bar x^*}(\tau,f) - \sigma_{\bar x^*}(\tau,g)|\,d\tau + K_4\int_0^s |\delta(\tau,f) - \delta(\tau,g)|\,d\tau + K_5\sup_{s\le t}|f(s) - g(s)|. \]
In turn,
\[ |\delta(s,f) - \delta(s,g)| \le K_6\int_0^s |\delta(\tau,f) - \delta(\tau,g)|\,d\tau + K_7\sup_{s\le t}|f(s) - g(s)|. \]
By Gronwall's lemma,
\[ |\delta(s,f) - \delta(s,g)| \le e^{K_6 s}K_7\sup_{s\le t}|f(s) - g(s)|, \quad s \le t, \]
or
\[ |\delta(t,f) - \delta(t,g)| \le e^{K_6 T}K_7\sup_{s\le t}|f(s) - g(s)| =: K_8\sup_{s\le t}|f(s) - g(s)| \]
where $K_8$ does not depend on t. Hence, $\delta$ is locally Lipschitz. In turn, so is $\sigma_{\bar x^*}$.
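For reference, the form of Gronwall's lemma invoked in the step above can be stated as follows; the symbols u, C, and K are generic placeholders introduced here for illustration and are matched to the paper's quantities in the last line.

```latex
% Integral form of Gronwall's lemma (constant coefficients):
% if u is continuous and nonnegative on [0,t], with constants C, K >= 0, and
\[
  u(s) \le C + K \int_0^s u(\tau)\, d\tau , \qquad 0 \le s \le t,
\]
% then
\[
  u(s) \le C e^{K s}, \qquad 0 \le s \le t .
\]
% In the proof above: u(s) = |\delta(s,f) - \delta(s,g)|, \; K = K_6, \;
% C = K_7 \sup_{s \le t} |f(s) - g(s)|.
```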
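Proof of Lemma 2.5. Let
\[ f_t(\Delta\bar x, \Delta\eta) \triangleq \ell^\lambda_t(\bar x^*_t, \eta^*_t) - \ell^\lambda_t(\bar x^*_t + \Delta\bar x, \eta^*_t + \Delta\eta) \ge 0 \]
\[ = \frac12\int_0^t \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\,ds + \frac{\lambda}{2}\int_0^t |\Delta\eta(s)|^2\,ds \]
where I have recalled the FOCs. We are to find
\[ \min_{\Delta\bar x,\Delta\eta}\big\{ f_t(\Delta\bar x,\Delta\eta) : \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(t) = \Delta m \big\} \]
where $\Delta m \equiv m - m^{\bar x^*_t,\eta^*_t}(t)$. Note that there always is a $\Delta\bar x$ such that $(\Delta\bar x, \Delta\eta = 0)$ satisfies the constraint; $\Phi^\kappa(t)$ is invertible. Write the Lagrangian as
\[ f_t(\Delta\bar x,\Delta\eta) - \Lambda^\top\big(\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(t) - \Delta m\big); \]
the dependence of Λ on t is suppressed. FOC(∆η) is
\[ 0 = (\varphi(s)^{-1}\rho_v)^\top\int_s^t \varphi(\tau)^\top b(\tau)^\top(\sigma(\tau)\sigma(\tau)^\top)^{-1}b(\tau)\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(\tau)\,d\tau + \lambda\Delta\eta(s) - \big(\Lambda^\top\varphi(t)\varphi(s)^{-1}\rho_v\big)^\top \]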
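or, multiplied by $\rho_v$ and differentiated with respect to s,
\[ \frac{d}{ds}(\lambda\rho_v\Delta\eta(s)) = \rho_v\rho_v^\top\bar\kappa(s)^\top(\rho_v\rho_v^\top)^{-1}\lambda\rho_v\Delta\eta(s) + \rho_v\rho_v^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s). \]
Proceeding similarly to the proof of Proposition 2.4,
\[ \begin{pmatrix}\lambda\rho_v\Delta\eta(s)\\ \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\end{pmatrix} = \psi(s)\iota_1\lambda\rho_v\Delta\eta(0) + \Psi(s)\iota_2\kappa\Delta\bar x. \]
Let $s = t$ to obtain $\lambda\rho_v\Delta\eta(t) = \psi_{11}(t)\lambda\rho_v\Delta\eta(0) + \Psi_{12}(t)\kappa\Delta\bar x$. From FOC(η), $0 = \lambda\Delta\eta(t) - (\Lambda^\top\rho_v)^\top$. Thus
\[ \begin{pmatrix}\lambda\rho_v\Delta\eta(s)\\ \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\end{pmatrix} = p(s,t)\kappa\Delta\bar x + \psi(s)\iota_1\psi_{11}(t)^{-1}\rho_v\rho_v^\top\Lambda. \tag{67} \]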
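Now, FOC(∆x̄) is
\[ 0 = \Big(\int_0^t \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)^\top b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\Phi^{I_{n_x}}(s)\kappa\,ds\Big)^\top - \big(\Lambda^\top\Phi^{I_{n_x}}(t)\kappa\big)^\top. \]
Substitute $\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)$ with that in (67) and use Lemma A.3 to see $\kappa\Delta\bar x = \sigma_{\bar x^*}(t)^\top\Lambda$. Plug this back into (67) and set $s = t$; the constraint is $\Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(t) = \delta(t)\Lambda = \Delta m$ or
\[ \Lambda = \delta(t)^{-1}\Delta m. \]
Thus
\[ \begin{pmatrix}\lambda\rho_v\Delta\eta(s)\\ \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\end{pmatrix} = \big(p(s,t)\sigma_{\bar x^*}(t)^\top + \psi(s)\iota_1\psi_{11}(t)^{-1}\rho_v\rho_v^\top\big)\delta(t)^{-1}\Delta m. \]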
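Observe
\[ f_t(\Delta\bar x,\Delta\eta) = \frac12\int_0^t \begin{pmatrix}\lambda\rho_v\Delta\eta(s)\\ \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\end{pmatrix}^\top M(s)\begin{pmatrix}\lambda\rho_v\Delta\eta(s)\\ \Phi^{\kappa\Delta\bar x+\rho_v\Delta\eta}(s)\end{pmatrix}ds \]
where
\[ M(s) = \begin{pmatrix}\lambda^{-1}(\rho_v\rho_v^\top)^{-1} & 0\\ 0 & b(s)^\top(\sigma(s)\sigma(s)^\top)^{-1}b(s)\end{pmatrix} \]
as defined in the proof of Lemma 2.4. Proceeding similarly to that proof, we prove
\[ f_t(\Delta\bar x,\Delta\eta) = \frac12(\Delta m)^\top\delta(t)^{-1}\Delta m. \]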
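Proof of Proposition 2.9. Suppose first $\xi(t) \in \Xi(t)$. Then $b(t)m^{\bar x^*_t,\eta^*_t}(t) + \sigma(t)\xi(t) = b(t)m^{\bar x,\eta}(t)$ for some theory $(\bar x,\eta)$ and the theory passes the penalized likelihood ratio test. By Lemma 2.5,
\[ \frac12\big(m^{\bar x,\eta}(t) - m^{\bar x^*_t,\eta^*_t}(t)\big)^\top\delta(t)^{-1}\big(m^{\bar x,\eta}(t) - m^{\bar x^*_t,\eta^*_t}(t)\big) \le \ell^\lambda_t(\bar x^*_t,\eta^*_t) - \ell^\lambda_t(\bar x,\eta) \le \alpha. \]
Suppose next $\sigma(t)\xi(t) = b(t)\Delta m$, $\Delta m \in \mathbb{R}^{n_x}$, and $2^{-1}(\Delta m)^\top\delta(t)^{-1}\Delta m \le \alpha$. Let $\Delta\bar x \triangleq \Phi^\kappa(t)^{-1}\Delta m$. Then
\[ b(t)m^{\bar x^*_t,\eta^*_t}(t) + \sigma(t)\xi(t) = b(t)m^{\bar x^*_t+\Delta\bar x,\eta^*_t}(t). \]
There is a theory $(\bar x,\eta)$ such that it passes the penalized likelihood ratio test and
\[ b(t)m^{\bar x,\eta}(t) = b(t)m^{\bar x^*_t+\Delta\bar x,\eta^*_t}(t), \]
because
\[ \ell^\lambda_t(\bar x^*_t,\eta^*_t) - \max_{\bar x,\eta}\big\{\ell^\lambda_t(\bar x,\eta) : b(t)m^{\bar x,\eta}(t) = b(t)m^{\bar x^*_t+\Delta\bar x,\eta^*_t}(t)\big\} \]
\[ = \ell^\lambda_t(\bar x^*_t,\eta^*_t) - \max_{\bar x,\eta}\big\{\ell^\lambda_t(\bar x,\eta) : m^{\bar x,\eta}(t) = m^{\bar x^*_t+\Delta\bar x,\eta^*_t}(t)\big\} \]
\[ = \frac12(\Delta m)^\top\delta(t)^{-1}\Delta m \le \alpha \]
where the second equality follows from Lemma 2.5.
Hence (27).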
Since δ is uniformly bounded, so are its eigenvalues; hence, the eigenvalues of δ −1
are uniformly bounded below away from 0. It follows that the right-hand side of (27)
is uniformly bounded; so is Ξ by Assumption 2.2. Compact-convexity is clear. Finally,
progressive measurability is proved as that of a single-valued, left- or right-continuous
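adapted process is proved: Suppose b and $\sigma^{-1}$ are both right-continuous. Let $\{s^\nu_i : i\}$ denote the νth dyadic partition of $[0,t]$, $t \le T$, and define $\delta_\nu^{-1}$ by $\delta_\nu^{-1}(s) \triangleq \delta(s^\nu_{i+1})^{-1}$ for $s^\nu_i < s \le s^\nu_{i+1}$ and $\delta_\nu^{-1}(0) \triangleq \delta(0)^{-1}$; define $b_\nu$ and $\sigma_\nu^{-1}$ in the same way. Let F be a closed subset of $\mathbb{R}^{n_y}$, and observe that the weak inverse (Aliprantis and Border, 1999, Section 16.1) is
\[ \Big\{(s,\omega) \in [0,t]\times\Omega : \sigma_\nu^{-1}(s,\omega)b_\nu(s,\omega)\Big\{\Delta m \in \mathbb{R}^{n_x} : \frac12(\Delta m)^\top\delta_\nu^{-1}(s,\omega)\Delta m \le \alpha\Big\} \cap F \ne \emptyset\Big\} \]
which is trivially $\mathcal{B}[0,t]\otimes\mathcal{G}_t$-measurable. Now, note that $\delta^{-1}$ is differentiable and hence a fortiori continuous. We have $\delta_\infty^{-1} = \delta^{-1}$ as well as $b_\infty = b$ and $\sigma_\infty^{-1} = \sigma^{-1}$.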
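Finally,
\[ \Big\{(s,\omega) : \sigma(s,\omega)^{-1}b(s,\omega)\Big\{\Delta m : \frac12(\Delta m)^\top\delta(s,\omega)^{-1}\Delta m \le \alpha\Big\} \cap F \ne \emptyset\Big\} \]
\[ = \Big\{(s,\omega) : \bigcap_{\mu=1}^\infty\bigcup_{\nu=\mu}^\infty \sigma_\nu^{-1}(s,\omega)b_\nu(s,\omega)\Big\{\Delta m : \frac12(\Delta m)^\top\delta_\nu^{-1}(s,\omega)\Delta m \le \alpha\Big\} \cap F \ne \emptyset\Big\} \]
\[ = \bigcap_{\mu=1}^\infty\bigcup_{\nu=\mu}^\infty\Big\{(s,\omega) : \sigma_\nu^{-1}(s,\omega)b_\nu(s,\omega)\Big\{\Delta m : \frac12(\Delta m)^\top\delta_\nu^{-1}(s,\omega)\Delta m \le \alpha\Big\} \cap F \ne \emptyset\Big\} \in \mathcal{B}[0,t]\otimes\mathcal{G}_t. \]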
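Proof of Proposition 3.1. Since $\ell^\lambda_t(\bar x,\eta)$ is quadratic in $(\bar x,\eta)$ ((15), (13), and (11)) and $\eta^*_{\bar x,t}$ is linear in $\bar x$ (see (18)), the set in question equals (see (20))
\[ \Big\{\bar x^*_t + \Delta\bar x \in \mathbb{R} : \frac12 I_{\bar x}(t)(\kappa\Delta\bar x)^2 \le \alpha\Big\}. \]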
The claim then follows from the following lemma:
Lemma A.4. limt→∞ Ix̄ (t) = ∞.
Proof. Let $\varepsilon > 0$ be a lower bound of $|\sigma^{-1}b|$. Observe from the dynamics (25) of θ and the boundedness of the statistics $\gamma$, $\theta$, $\sigma_{\bar x^*}$, and $\delta$ (Lemma 2.1 and Assumption 2.4) that θ is bounded away from zero as well: $\theta(t) \ge \underline\theta > 0$, $t \ge 0$. (Keep in mind the convention that learning began prior to the decision making at time 0.) It follows
\[ I_{\bar x}(t) = I_{\bar x}(0) + \int_0^t |\sigma(s)^{-1}b(s)|^2\theta(s)^2\,ds \ge I_{\bar x}(0) + \varepsilon^2\underline\theta^2 t \to \infty \quad\text{as } t \to \infty. \]
Proof of Lemma 3.1. The claim follows from the boundedness of θ (Assumption
2.4) and Lemma A.4.
Proof of Proposition 3.2. Suppose σ −1 b is constant.
To see that δ evolves deterministically, simply recall the governing equations of δ,
σx̄∗ , Ix̄−1 , and γ (Proposition 2.7 and (9)); a, b, and σ enter them only via σ −1 b.
To prove convergence, denote the so far suppressed moment at which the agent's learning started by $t_\Gamma < 0$ (see Remark 2.2); define f by $f(t) \triangleq \gamma(t) + \delta(t)$, $t > t_\Gamma$; and note that f satisfies
\[ \dot f(t) = 2\theta(t)I_{\bar x}(t)^{-1} + |\rho_w|^2 + (1+\lambda^{-1})\rho_v^2 - 2\kappa f(t) - \big|\rho_w + (\sigma^{-1}bf(t))^\top\big|^2, \]
which motivates us to consider the following DE in tandem: for some $t_0 \ge t_\Gamma$,
\[ \dot\gamma^\lambda(t) = |\rho_w|^2 + (1+\lambda^{-1})\rho_v^2 - 2\kappa\gamma^\lambda(t) - \big|\rho_w + (\sigma^{-1}b\gamma^\lambda(t))^\top\big|^2, \quad t \ge t_0, \quad \gamma^\lambda(t_0) > 0. \]
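Lemma A.5. (i) $\gamma^\lambda$ is given by
\[ \gamma^\lambda(t) = \begin{cases} \bar\gamma^\lambda \\[4pt] \dfrac{-(\kappa+\rho_w\sigma^{-1}b) + \nu^\lambda\tanh\{\nu^\lambda t + \tanh^{-1}[(\nu^\lambda)^{-1}(\kappa+\rho_w\sigma^{-1}b+|\sigma^{-1}b|^2\gamma^\lambda(t_0))]\}}{|\sigma^{-1}b|^2} \\[10pt] \dfrac{-(\kappa+\rho_w\sigma^{-1}b) + \nu^\lambda\coth\{\nu^\lambda t + \coth^{-1}[(\nu^\lambda)^{-1}(\kappa+\rho_w\sigma^{-1}b+|\sigma^{-1}b|^2\gamma^\lambda(t_0))]\}}{|\sigma^{-1}b|^2} \end{cases} \]
depending on whether $\gamma^\lambda(t_0) = \bar\gamma^\lambda$ (top), $< \bar\gamma^\lambda$ (middle), or $> \bar\gamma^\lambda$ (bottom), where
\[ \nu^\lambda \triangleq \sqrt{(\kappa+\rho_w\sigma^{-1}b)^2 + (1+\lambda^{-1})\rho_v^2|\sigma^{-1}b|^2}, \qquad \bar\gamma^\lambda \triangleq \frac{-(\kappa+\rho_w\sigma^{-1}b) + \sqrt{(\kappa+\rho_w\sigma^{-1}b)^2 + (1+\lambda^{-1})\rho_v^2|\sigma^{-1}b|^2}}{|\sigma^{-1}b|^2}. \]
(ii) $\theta(t)I_{\bar x}(t)^{-1} > 0$ for all $t > t_\Gamma$. (iii) $\lim_{t\downarrow t_\Gamma}\delta(t) = \infty$. (iv) $f(t) > \bar\gamma^\lambda$ for all $t > t_\Gamma$.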
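The closed form in (i) can be sanity-checked against a direct numerical integration of the scalar Riccati equation. The sketch below is illustrative only: it treats σ⁻¹b as a scalar `c`, measures time from t0 (that is, it reads "ν t" in the formula as ν(t − t0), which is an assumption about the paper's time convention), and uses hypothetical parameter values.

```python
import math

def riccati_rhs(g, kappa, rho_w, rho_v, lam, c):
    """Right-hand side of the scalar ODE for gamma^lambda; c stands for sigma^{-1} b."""
    return rho_w**2 + (1 + 1/lam) * rho_v**2 - 2*kappa*g - (rho_w + c*g)**2

def nu_and_bar(kappa, rho_w, rho_v, lam, c):
    """Return (kappa + rho_w c, nu^lambda, stationary point gamma-bar^lambda)."""
    a = kappa + rho_w * c
    nu = math.sqrt(a*a + (1 + 1/lam) * rho_v**2 * c*c)
    return a, nu, (-a + nu) / (c*c)

def gamma_closed(t, t0, g0, kappa, rho_w, rho_v, lam, c):
    """tanh/coth closed form, with elapsed time t - t0 (assumed convention)."""
    a, nu, _ = nu_and_bar(kappa, rho_w, rho_v, lam, c)
    u0 = (a + c*c*g0) / nu
    x = nu * (t - t0)
    if u0 == 1.0:                      # start at the stationary point
        u = nu
    elif u0 < 1.0:                     # start below: tanh branch
        u = nu * math.tanh(x + math.atanh(u0))
    else:                              # start above: coth branch
        acoth = 0.5 * math.log((u0 + 1) / (u0 - 1))
        u = nu / math.tanh(x + acoth)
    return (-a + u) / (c*c)

def gamma_rk4(t, t0, g0, kappa, rho_w, rho_v, lam, c, steps=5000):
    """Classical Runge-Kutta integration of the same ODE from t0 to t."""
    h = (t - t0) / steps
    g = g0
    for _ in range(steps):
        k1 = riccati_rhs(g, kappa, rho_w, rho_v, lam, c)
        k2 = riccati_rhs(g + 0.5*h*k1, kappa, rho_w, rho_v, lam, c)
        k3 = riccati_rhs(g + 0.5*h*k2, kappa, rho_w, rho_v, lam, c)
        k4 = riccati_rhs(g + h*k3, kappa, rho_w, rho_v, lam, c)
        g += h * (k1 + 2*k2 + 2*k3 + k4) / 6
    return g

if __name__ == "__main__":
    params = dict(kappa=0.3, rho_w=0.1, rho_v=0.2, lam=2.0, c=1.5)  # hypothetical
    for g0 in (0.01, 0.5):
        print(g0, gamma_closed(5.0, 0.0, g0, **params), gamma_rk4(5.0, 0.0, g0, **params))
```

Both branches converge to the stationary point as t grows, which is the property the continued proof of Proposition 3.2 relies on.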
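Proof. Before proceeding with the proof, note carefully that in the expressions preceding Remark 2.2, time 0 refers to the beginning of learning.
(i) Recall the definition (10) of $\bar\kappa$ and observe that it satisfies $\dot{\bar\kappa}(t) = \nu^2 - \bar\kappa(t)^2$.
(ii) θ satisfies ((25) and (24))
\[ \dot\theta(t) = 1 - \big\{\kappa + \rho_w\sigma^{-1}b + (\gamma(t) + \psi_{21}(t)\psi_{11}(t)^{-1}\rho_v^2)|\sigma^{-1}b|^2\big\}\theta(t), \quad t \ge t_\Gamma, \tag{68} \]
while from (19) and (17), $\theta(t_\Gamma) = 0$. Since the expression in the curly brackets is a continuous function on $t \ge t_\Gamma$ (see (16)), it follows that $\theta(t) > 0$ for all $t > t_\Gamma$. Then $I_{\bar x}(t)$, too, is positive for all $t > t_\Gamma$ by (21).
(iii) From the definitions (24), (16), and (23) of $\delta$, $\psi$, and $\sigma_{\bar x^*}$, $\lim_{t\downarrow t_\Gamma}\delta(t) = \lim_{t\downarrow t_\Gamma}(\theta(t)^2/I_{\bar x}(t))$. However, $\theta(t_\Gamma) = I_{\bar x}(t_\Gamma) = 0$. Apply thus L'Hôpital's rule:
\[ \lim_{t\downarrow t_\Gamma}\delta(t) = \lim_{t\downarrow t_\Gamma}\frac{\theta(t)^2}{I_{\bar x}(t)} = \lim_{t\downarrow t_\Gamma}\frac{2\theta(t)\dot\theta(t)}{|\sigma^{-1}b|^2\theta(t)^2} = \lim_{t\downarrow t_\Gamma}\frac{2\dot\theta(t)}{|\sigma^{-1}b|^2\theta(t)}. \]
First, from (68), $\lim_{t\downarrow t_\Gamma}\dot\theta(t) = 1$. Next, by the observations made in the proof of (ii) above, $\theta(t)$ approaches 0 as $t \downarrow t_\Gamma$, from above. Thus $\lim_{t\downarrow t_\Gamma}\delta(t) = \infty$.³⁵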
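(iv) Let $t^* \triangleq \inf\{t \ge 0 : f(t) \le \bar\gamma^\lambda + 1\}$. If $t^* = \infty$, we are done. So suppose $t^* < \infty$. Since $\lim_{t\downarrow t_\Gamma}\delta(t) = \infty$ as (iii) states, $t^* > 0$; and $f(t) > \bar\gamma^\lambda + 1$ for all $t < t^*$. By continuity, $f(t^*) = \bar\gamma^\lambda + 1$. Let $\gamma^\lambda$ start with $\gamma^\lambda(t^*) = f(t^*) = \bar\gamma^\lambda + 1$. Then $f(t) \ge \gamma^\lambda(t)$ for all $t \ge t^*$ because $f(t) = \gamma^\lambda(t)$ implies $\dot f(t) > \dot\gamma^\lambda(t)$ by (ii). Since $\gamma^\lambda(t) > \bar\gamma^\lambda$ for all $t \ge t^*$ by (i), the claim follows.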
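(Proof of the proposition continued.) Let $\varepsilon > 0$. Then by Lemma 3.1 there exists $t_1 \ge 0$ such that $2\theta(t)I_{\bar x}(t)^{-1} < \varepsilon$ for all $t \ge t_1$. Let $\tilde\lambda \in (0,\lambda)$ be such that $\tilde\lambda^{-1}\rho_v^2 = \lambda^{-1}\rho_v^2 + \varepsilon$, and let $\gamma^\lambda$ and $\gamma^{\tilde\lambda}$ start with $\gamma^\lambda(t_1) = \gamma^{\tilde\lambda}(t_1) = f(t_1)$; $f(t_1) > 0$ by Lemma A.5(iv). Then $\gamma^\lambda(t) \le f(t) \le \gamma^{\tilde\lambda}(t)$ for all $t \ge t_1$ by Lemma A.5(ii). Now some further elementary arguments, based on the convergence of $\gamma^\lambda$ and $\gamma^{\tilde\lambda}$ (Lemma A.5(i)) and the arbitrariness of ε, prove $f(t) \to \bar\gamma^\lambda$ as $t \to \infty$; in turn, $\delta(t) \to \bar\gamma^\lambda - \bar\gamma^\infty$.
³⁵ Similarly we can also prove $\lim_{t\downarrow t_\Gamma}\sigma_{\bar x^*}(t) = \infty$.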
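Proof of Proposition 3.3. Let $\bar\Xi \triangleq \sigma^{-1}b\{\Delta m \in \mathbb{R} : |\Delta m| \le \sqrt{2\alpha\delta(\infty)}\}$. Then
\[ d_H(\Xi(t), \bar\Xi) = |\sigma^{-1}b|\sqrt{2\alpha}\,\big|\sqrt{\delta(t)} - \sqrt{\delta(\infty)}\big| \]
and uniform convergence to $\bar\Xi$ follows from Proposition 3.2.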
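A small sketch of the Hausdorff-distance formula in the scalar case, where both sets are symmetric intervals. The function names and numerical values are hypothetical; the endpoint formula for the Hausdorff distance between two compact intervals is a standard fact used here for comparison.

```python
import math

def xi_interval(c, alpha, delta):
    """Interval c * {dm : |dm| <= sqrt(2 alpha delta)}; c stands for sigma^{-1} b (scalar)."""
    half = abs(c) * math.sqrt(2 * alpha * delta)
    return (-half, half)

def hausdorff(i1, i2):
    """Hausdorff distance between two compact intervals via their endpoints."""
    return max(abs(i1[0] - i2[0]), abs(i1[1] - i2[1]))

def formula(c, alpha, delta_t, delta_inf):
    """Right-hand side of the displayed expression for d_H(Xi(t), Xi-bar)."""
    return abs(c) * math.sqrt(2 * alpha) * abs(math.sqrt(delta_t) - math.sqrt(delta_inf))

if __name__ == "__main__":
    c, alpha, delta_inf = 1.5, 0.02, 0.04       # hypothetical values
    for delta_t in (0.5, 0.1, 0.05, 0.041):
        d = hausdorff(xi_interval(c, alpha, delta_t), xi_interval(c, alpha, delta_inf))
        print(delta_t, d, formula(c, alpha, delta_t, delta_inf))
```

Because the distance depends on t only through δ(t), convergence of δ(t) to δ(∞) translates directly into convergence of the sets.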
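Proof of Lemma 4.1. Let $\mathcal{M} \triangleq \{\mathcal{E}^\xi : \xi \in \Xi\}$. Define $f : \mathcal{M} \times (C^2(u) \cap C_{\mathrm{budget}}) \to \mathbb{R}$ by
\[ f(M,c) \triangleq E^{P^0}\int_0^T e^{-\beta t}M(t)\log(c(t))\,dt. \]
The claim is
\[ \sup_{c \in C^2(u)\cap C_{\mathrm{budget}}}\ \min_{M\in\mathcal{M}} f(M,c) = \min_{M\in\mathcal{M}}\ \sup_{c \in C^2(u)\cap C_{\mathrm{budget}}} f(M,c). \]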
I apply the Kneser-Fan minimax theorem (Fan (1953), Theorem 2). The conclusion
follows once the following three assumptions are checked.
(i) $\mathcal{M}$ is a compact Hausdorff space. Let $L^2([0,T]\times\Omega) \equiv L^2([0,T]\times\Omega, \mathcal{B}[0,T]\otimes\mathcal{G}_T, \text{Lebesgue}\times P^0)$ be the set of processes h such that
\[ \|h\| \triangleq \Big(E^{P^0}\int_0^T h(t)^2\,dt\Big)^{1/2} < \infty. \]
$L^2([0,T]\times\Omega)$ is a reflexive Banach space with the norm $\|\cdot\|$ defined above. By design, $\mathcal{M} \subset L^2([0,T]\times\Omega)$. Let $K \ge 0$ be such that $\Xi(t) \in [-K,K]^{n_y}$, $t \ge 0$. (K may be state-dependent. See Section 4.3 and Remark 2.2.) For all $M \in \mathcal{M}$,
\[ \|M\|^2 \le E^{P^0}\int_0^T \mathcal{E}^{(2\xi)}(t)e^{n_yK^2T}\,dt = Te^{n_yK^2T} \]
and M is norm-bounded. M is norm-closed by Lemma B.1 of Cuoco and Cvitanić
(1998) and is convex by (the proof of) Theorem 2.1(c) of Chen and Epstein (2002);
thus, it is weakly closed. By Alaoglu’s theorem, then, M is weakly compact. The
weak topology of a normed space is Hausdorff and so is a subspace.
(ii) For every $c \in C^2(u)\cap C_{\mathrm{budget}}$, $f(M,c)$ is lower semicontinuous on $\mathcal{M}$. Let $\mathrm{span}(\mathcal{M})$ be the linear span of $\mathcal{M}$ over $\mathbb{R}$; $\mathrm{span}(\mathcal{M}) \subset L^2([0,T]\times\Omega)$ is a normed space. For each $c \in C^2(u)\cap C_{\mathrm{budget}}$, the map $\tilde f_c : \mathrm{span}(\mathcal{M}) \to \mathbb{R}$,
\[ M \mapsto E^{P^0}\int_0^T e^{-\beta t}M(t)\log(c(t))\,dt, \]
is linear; by Hölder's inequality, the norm of $\tilde f_c$ is bounded by $\|\log c\| < \infty$. Then there exists an extension $f_c$ of $\tilde f_c$ such that the linear functional $f_c$ defined on $L^2([0,T]\times\Omega)$ is continuous in the norm topology, and consequently, in the weak topology (Aliprantis and Border (1999), Lemma 6.13). Being a restriction of $f_c$ to $\mathcal{M} \subset \mathrm{span}(\mathcal{M})$, $f(\cdot,c)$ is continuous as well.
(iii) f is convexlike on M and concavelike on C 2 (u)∩Cbudget . M and C 2 (u)∩Cbudget
are both convex. It then suffices to note that (M, c) 7→ M log c is convex-concave on
(0, ∞)2 .
Proof of Proposition 4.1. Apply the minimax theorem and write the dual of the inner supremization as
\[ \inf_\nu E^{P^0}\int_0^T \max_{c(t)}\Big[\mathcal{E}^\xi(t)e^{-\beta t}\log(c(t)) - \Lambda\mathcal{E}^{-(\zeta+\nu)}(t)e^{-rt}c(t)\Big]dt \tag{69} \]
where $\Lambda > 0$. The solution to the dual problem solves the primal problem as well (He and Pearson, 1991; Karatzas et al., 1991). $c^*(t)$ and $\Lambda^*$ are standard.
Plugging $c^*$ into (69), ignoring irrelevant terms, and exchanging the order of integration, we reach
\[ E^{P^\xi}\int_0^T \frac{e^{-\beta t} - e^{-\beta T}}{\beta}\,\frac12\inf_{\nu(t)}|\zeta(t) + \nu(t) + \xi(t)|^2\,dt. \]
Without ξ, the minimizing ν(t) is 0 because ν(t) ∈ Ker(σR (t)). With ξ, on the other
hand,
|ζ(t) + ν(t) + ξ(t)|2 = |ζ(t) + ξ(t)|2 + |ν(t)|2 + 2ξ(t)> ν(t)
and under the constraint σR (t)ν(t) = 0, the unique minimizer is given by ν ∗ (t) =
f (t)ξ(t) where
f (t) , σR (t)> (σR (t)σR (t)> )−1 σR (t) − Iny .
Observe that f = f > and f 2 = −f , and plug c∗ , ν ∗ , and Λ∗ to (34).
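The properties of f used above can be checked in a small example. The sketch below assumes, purely for illustration, a single risky asset so that σ_R is a 1×n row vector, and takes ζ in the range of σ_Rᵀ; all numerical values are hypothetical.

```python
def f_matrix(sigma):
    """f = sigma^T (sigma sigma^T)^{-1} sigma - I for a 1 x n row vector sigma."""
    n = len(sigma)
    s2 = sum(x * x for x in sigma)
    return [[sigma[i] * sigma[j] / s2 - (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def objective(zeta, nu, xi):
    """|zeta + nu + xi|^2."""
    return sum((z + n_ + x) ** 2 for z, n_, x in zip(zeta, nu, xi))

if __name__ == "__main__":
    sigma = [0.2, 0.1, -0.05]          # hypothetical sigma_R (one asset, three shocks)
    xi = [0.3, -0.1, 0.4]              # hypothetical distortion
    zeta = [2.0 * s for s in sigma]    # assumed to lie in the range of sigma_R^T
    f = f_matrix(sigma)
    nu_star = matvec(f, xi)
    print("sigma_R nu* =", dot(sigma, nu_star))        # numerically zero
    print("objective at nu* =", objective(zeta, nu_star, xi))
```

Since −f is the orthogonal projector onto Ker(σ_R), the candidate ν* = fξ simply removes the component of ξ that lies in the kernel.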
Proof of Proposition 4.2. (i) follows from Theorem IV.4.3 of Fleming and Soner
(1993). The assumptions of the theorem are (IV.3.5) and (IV.4.6) in their book.
(IV.3.5) is the uniform parabolicity assumption, which is equivalent in the present
case to Assumption 4.1. (IV.4.6) is a collection of regularity conditions that can be
checked straightforwardly. (ii) and (iii) follow from Theorem IV.3.1.
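Proof of Lemma 4.2. (i) Let
\[ F(t, m^*, \xi) \triangleq E^{P^0}\left[\int_t^T \frac{e^{-\beta s} - e^{-\beta T}}{\beta}\,\frac12\left(\frac{m^{*,\xi}_s - r + \sigma_R\xi(s)}{\sigma_R}\right)^2 ds \,\middle|\, m^{*,\xi}_t = m^*\right] \]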
so that J(t, m∗ ) = minξ F (t, m∗ , ξ). The convexity of m∗ 7→ J(t, m∗ ) follows from that
of (m∗ , ξ) 7→ F (t, m∗ , ξ) and of Ξ: Suppose m∗ = hm∗1 + (1 − h)m∗2 , h ∈ [0, 1], and let
ξ1∗ and ξ2∗ be the respective minimizers. Then
J(t, m∗ ) ≤ F (t, hm∗1 + (1 − h)m∗2 , hξ1∗ + (1 − h)ξ2∗ )
≤ hJ(t, m∗1 ) + (1 − h)J(t, m∗2 ).
(ii) ∂m∗ J(t, m∗ ) is obtained via the envelope theorem: If both ∂m∗ J(t, m∗ ) and
∂m∗ F (t, m∗ , ξ ∗ ) exist, then ∂m∗ J(t, m∗ ) = ∂m∗ F (t, m∗ , ξ ∗ ). (See Milgrom and Segal
(2002), Theorem 1.) Observe
Z s
∗,ξ
κ(τ −t)
∗,ξ
−κ(s−t)
e
[κx̄ dτ + σm∗ (τ )( d(τ ) + ξ(τ ) dτ )]
mt +
ms = e
t
and let
\[ f(s,t,m^*,\xi) \triangleq \frac{e^{-\beta s} - e^{-\beta T}}{\beta}\,\frac12\left(\frac{m^{*,\xi}_s - r + \sigma_R\xi(s)}{\sigma_R}\right)^2, \quad m^{*,\xi}_t = m^*, \]
so that
\[ F(t,m^*,\xi) = E^{P^0}\int_t^T f(s,t,m^*,\xi)\,ds. \]
Now, it is easy to check the conditions for differentiating under the integral (Durrett (2005), Theorem A.9.1) and we have $\partial_{m^*}F(t,m^*,\xi) = E^{P^0}\int_t^T \partial_{m^*}f(s,t,m^*,\xi)\,ds$.
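Proof of Proposition 4.3. It suffices to show
\[ \lim_{t\to T}\frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\sigma_R\sigma_{m^*}(t)\partial_{m^*}J(t,m^*) = 0. \]
Recall Lemma 4.2(ii) and let
\[ K(t,m^*) \triangleq \sup_{s\in[t,T]} E^{P^0}\left[\left|\frac{e^{-\kappa(s-t)}}{\sigma_R}\,\frac{m^{*,\xi^*}_s - r + \sigma_R\xi^*(s)}{\sigma_R}\right| \,\middle|\, m^{*,\xi^*}_t = m^*\right]. \]
Then
\[ \left|\frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\partial_{m^*}J(t,m^*)\right| \le \frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\int_t^T \frac{e^{-\beta s} - e^{-\beta T}}{\beta}\,ds\;K(t,m^*) = \left(\frac{1}{\beta} - \frac{T-t}{e^{\beta(T-t)} - 1}\right)K(t,m^*). \]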
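$\lim_{t\to T}K(t,m^*) < \infty$ because (i)
\[ m^{*,\xi^*}_s = m^{*,0}_s + e^{-\kappa(s-t)}\int_t^s e^{\kappa(\tau-t)}\sigma_{m^*}(\tau)\xi^*(\tau)\,d\tau \quad\text{where } m^{*,0}_t = m^*_t, \]
(ii)
\[ K(t,m^*) \le \frac{1}{\sigma_R^2}\sup_{s\in[t,T]}E^{P^0}\left(|m^{*,0}_s| + \int_t^T e^{\kappa(\tau-t)}|\sigma_{m^*}(\tau)|\bar\xi(\tau)\,d\tau + r + \sigma_R\bar\xi(t)\right), \]
and (iii) $E^{P^0}|m^{*,0}_s| = g(s-t, m^*)$ for some function g continuous in $s - t$. Thus,
\[ \lim_{t\to T}\left|\frac{\beta e^{\beta t}}{1 - e^{-\beta(T-t)}}\partial_{m^*}J(t,m^*)\right| \le 0\cdot\lim_{t\to T}K(t,m^*) = 0. \]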
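The vanishing prefactor in this argument can be checked directly. The sketch below, with hypothetical values of β, t, and T, compares the closed form 1/β − (T−t)/(e^{β(T−t)} − 1) with a midpoint-rule evaluation of the prefactor times the integral, and shows that the closed form tends to zero as t approaches T.

```python
import math

def closed_form(x, beta):
    """1/beta - x/(exp(beta x) - 1), with x = T - t > 0."""
    return 1.0 / beta - x / math.expm1(beta * x)

def prefactor_times_integral(t, T, beta, n=20000):
    """beta e^{beta t} / (1 - e^{-beta (T-t)}) times the integral over [t, T]
    of (e^{-beta s} - e^{-beta T}) / beta, computed by the midpoint rule."""
    h = (T - t) / n
    integral = h * sum(
        (math.exp(-beta * (t + (i + 0.5) * h)) - math.exp(-beta * T)) / beta
        for i in range(n)
    )
    return beta * math.exp(beta * t) / (1.0 - math.exp(-beta * (T - t))) * integral

if __name__ == "__main__":
    beta, T = 0.05, 10.0                      # hypothetical values
    print(prefactor_times_integral(3.0, T, beta), closed_form(T - 3.0, beta))
    for x in (1.0, 1e-2, 1e-4, 1e-6):
        print(x, closed_form(x, beta))        # shrinks roughly like x / 2
```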
Proof of Lemma 5.1. Note first that
Z t
Z t
1
∗
2
σD (s) d(s) +
aD (s) + bD (s)ms − |σD (s)| ds
log D(t) = log D(0) +
2
0
0
where $m^*_s \equiv m^{\bar x^*_s,\eta^*_s}(s)$. Note further that (i) (52) implies
\[ E^{P^0}\int_0^T |\sigma_D(t)|^4\,dt < \infty \]
and (ii) m∗ is square-integrable by Theorem 4.7 of Liptser and Shiryaev (1977);
(m∗ , x̄∗ ) satisfies a linear SDE with uniformly bounded volatility (Proposition 2.6).
Then, the claim follows from Jensen’s inequality, Itô’s isometry, the boundedness of
bD , and other elementary arguments.
Proof of Proposition 5.1. Step 1. Notation. Let
\[ f(\xi,c) \triangleq E^{P^\xi}\int_0^T e^{-\beta t}\log(c(t))\,dt, \quad (\xi,c) \in \Xi\times C^2(u). \]
Then, $\Xi^* = \arg\min_{\xi\in\Xi}f(\xi,D)$. Denote the set of consumption plans that can be financed (that is, are feasible) under price system $(r,S)$ with initial wealth $W_0$, by $C_f(W_0; r, S) \subset C^2(u)$. The agent's problem, under $(r,S)$ endowed with $W_0$, is
\[ \sup_{c\in C_f(W_0;r,S)}\ \min_{\xi\in\Xi} f(\xi,c). \]
Step 2. Optimality of D given $\xi^*$. Fix $\xi^* \in \Xi^*$. In this step, I show that D maximizes $f(\xi^*,c)$ on $C_f(S^{\xi^*}(0); r^{\xi^*}, S^{\xi^*})$.
Begin by noting that the maximization problem supc∈Cf f (ξ ∗ , c) can be seen as
∗
∗
∗
that of an expected utility investor with prior P ξ subject to price system (rξ , S ξ );
and with this in mind note further that
∗
∗
∗
dR(t) = (β + aD (t) + bD (t)mx̄t ,ηt (t) + σD (t)ξ ∗ (t)) dt + σD (t) dξ (t)
∗
∗
=: µξR (t) dt + σR (t) dξ (t).
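Then, in particular,
\[ \mu^{\xi^*}_R(t) - r^{\xi^*}(t) = |\sigma_R(t)|^2, \qquad \zeta_0(t) \triangleq \sigma_R(t)^\top(\sigma_R(t)\sigma_R(t)^\top)^{-1}\big(\mu^{\xi^*}_R(t) - r^{\xi^*}(t)\big) = \sigma_R(t)^\top. \]
Lemma A.6. $\zeta_0$ satisfies Novikov's condition under $P^{\xi^*}$:
\[ E^{P^{\xi^*}}\exp\Big(\frac12\int_0^T |\zeta_0(t)|^2\,dt\Big) < \infty. \]
Proof. Note first that $\zeta_0 = \sigma_D^\top$. Then, by (52) and Example 3 of Section 6.2.3 of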
Liptser and Shiryaev (1977), ζ0 satisfies Novikov’s condition under P 0 . Now the claim
follows from the Cauchy-Schwarz inequality and the uniform boundedness of ξ ∗ .
(Proof of the proposition continued.) Suppose first nA = 0. Then the market is
dynamically complete (Lemma A.6); and standard martingale arguments show that
the optimal consumption plan equals D.
Suppose next nA > 0. Then the market is dynamically incomplete, in which case
the ξ ∗ -optimality of D can be argued along the lines of He and Pearson (1991),
Karatzas et al. (1991), and Cuoco (1997).
First, introduce nA fictitious financial assets (Karatzas et al., 1991) whose nA dimensional return process H = {H(t), Gt } follows
∗
∗
dH(t) = rξ (t)1nA dt + σH (t) dξ (t)
where 1nA denotes the nA -dimensional vector of ones and the rows of σH = {σH (t), Gt }
consist of orthonormal vectors in the kernel of σR a.e.
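Next, let $\mathcal{N}$ denote the set of $\mathbb{R}^{n_A}$-valued processes $\nu = \{\nu(t), \mathcal{G}_t\}$ satisfying
\[ E^{P^{\xi^*}}\int_0^T |\nu(t)|^2\,dt < \infty, \]
let
\[ \zeta_\nu(t) \triangleq \begin{pmatrix}\sigma_R(t)\\ \sigma_H(t)\end{pmatrix}^{-1}\begin{pmatrix}\mu^{\xi^*}_R(t) - r^{\xi^*}(t)\\ \nu(t)\end{pmatrix} = \sigma_R(t)^\top + \sigma_H(t)^\top\nu(t), \]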
and collect in N ∗ those ν ∈ N with which
Z T
Z
∗
1 T
Pξ
2
>
ξ∗
|ζν (t)| dt = 1.
E exp −
ζν (t) d (t) −
2 0
0
N ∗ is not empty: 0 ∈ N ∗ (Lemma A.6). Let also
Z t
Z t
Z
1 t
ξ∗
>
ξ∗
2
pν (t) , exp −
r (s) ds exp −
ζν (s) d (s) −
|ζν (s)| ds .
2 0
0
0
Then, by Theorem 1 of Cuoco (1997),³⁶ a feasible plan $c \in C_f$ satisfies
\[ \sup_{\nu\in\mathcal{N}^*} E^{P^{\xi^*}}\int_0^T p_\nu(t)c(t)\,dt \le S^{\xi^*}(0). \]
Accordingly, the dual problem is defined as
\[ \inf_{(\tau,\nu)\in(0,\infty)\times\mathcal{N}^*}\left\{\tau S^{\xi^*}(0) + E^{P^{\xi^*}}\int_0^T\Big[e^{-\beta t}\log\Big(\frac{e^{-\beta t}}{\tau p_\nu(t)}\Big) - e^{-\beta t}\Big]dt\right\}, \]
the unique solution to which is
\[ \tau^* = 1/D(0) \quad\text{and}\quad \nu^* \equiv 0. \]
Since the candidate optimal consumption plan $c^*$ equals $D \in C_f$ where
\[ c^*(t) \triangleq \frac{D(0)e^{-\beta t}}{p_0(t)}, \quad 0 \le t \le T, \]
it follows, finally, from Proposition 1 of Cuoco (1997) that D solves $\sup_{c\in C_f}f(\xi^*,c)$.
Step 3. Optimality of D. Therefore, for each $\xi^* \in \Xi^*$, $(\xi^*, D)$ is a saddle point of f on $\Xi\times C_f(S^{\xi^*}(0); r^{\xi^*}, S^{\xi^*})$; that is, for each $\xi^* \in \Xi^*$,
\[ D \in \arg\max_{c\in C_f(S^{\xi^*}(0);r^{\xi^*},S^{\xi^*})}\ \min_{\xi\in\Xi} f(\xi,c). \]
Proof of Proposition 5.2. First, from the law of motion (54) of D under P ξ ,
Z t
1
(aD + bD m∗s + σD ξ(s)) ds + σD ξ (t) − |σD |2 t
log D(t) = log D(0) +
2
0
where $m^*_s \equiv m^{\bar x^*_s,\eta^*_s}(s)$. By Fubini's theorem,
\[ f(\xi) \triangleq E^{P^\xi}\int_0^T e^{-\beta t}\log(D(t))\,dt \]
\[ = \int_0^T e^{-\beta t}\left\{\log D(0) + a_Dt - \frac12|\sigma_D|^2t + \int_0^t\Big[b_DE^{P^\xi}(m^*_s) + \sigma_DE^{P^\xi}(\xi(s))\Big]ds\right\}dt \]
\[ = K_1 + \int_0^T e^{-\beta t}\int_0^t\Big[b_DE^{P^\xi}(m^*_s) + \sigma_DE^{P^\xi}(\xi(s))\Big]ds\,dt \tag{70} \]
³⁶ The present specification violates one of the standing assumptions (Assumption 1) of the cited paper that the interest rate process is uniformly bounded. The theorem nevertheless applies because I directly required the discounted wealth process to be uniformly bounded below (cf. Equation (11) of the cited paper).
where $K_1$, whose definition is clear from the last equality, is a constant independent of ξ. To compute $E^{P^\xi}m^*_s$, note from Proposition 2.6 that $z \triangleq (m^*, \bar x^*)^\top$ satisfies
dz(t) = bz z(t) dt + σz (t) d(t),
\[ b_z \triangleq \begin{pmatrix}-\kappa & \kappa\\ 0 & 0\end{pmatrix} \quad\text{and}\quad \sigma_z(t) \triangleq \begin{pmatrix}\rho_w + (\gamma(t) + \delta(t))(\sigma^{-1}b)^\top\\ \kappa^{-1}\sigma_{\bar x^*}(t)(\sigma^{-1}b)^\top\end{pmatrix}; \]
in particular, $\sigma_z : [0,T] \to \mathbb{R}^{2\times(1+n_A)}$ is a deterministic function of time, which is differentiable, and hence is a fortiori bounded, on $[0,T]$. The solution is
z(t) = e^{bz t} ( z(0) + ∫_0^t e^{−bz s} σz (s) d(s) ).
Thus,
\[ E^{P^\xi}m^*_s = \iota_1^\top e^{b_zs}z(0) + E^{P^\xi}\iota_1^\top e^{b_zs}\int_0^s e^{-b_z\tau}\sigma_z(\tau)\xi(\tau)\,d\tau \tag{71} \]
where ι1 = (1, 0)> and the expectation of the stochastic integral with respect to ξ
has vanished given the boundedness of the integrand. Plugging (71) back into (70)
we obtain
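\[ f(\xi) = K_2 + E^{P^\xi}\int_0^T e^{-\beta t}\int_0^t\Big[b_D\iota_1^\top e^{b_zs}\int_0^s e^{-b_z\tau}\sigma_z(\tau)\xi(\tau)\,d\tau + \sigma_D\xi(s)\Big]ds\,dt \]
where
\[ K_2 \triangleq K_1 + \int_0^T e^{-\beta t}\int_0^t b_D\iota_1^\top e^{b_zs}z(0)\,ds\,dt. \]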
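Now, consider the following integral:
\[ \int_0^t b_D\iota_1^\top e^{b_zs}\int_0^s e^{-b_z\tau}\sigma_z(\tau)\xi(\tau)\,d\tau\,ds = \int_0^t\int_0^s b_D\iota_1^\top e^{b_z(s-\tau)}\sigma_z(\tau)\xi(\tau)\,d\tau\,ds \]
\[ = \int_0^t\int_\tau^t b_D\iota_1^\top e^{b_z(s-\tau)}\sigma_z(\tau)\xi(\tau)\,ds\,d\tau \]
\[ = \int_0^t\int_s^t b_D\iota_1^\top e^{b_z(\tau-s)}\sigma_z(s)\xi(s)\,d\tau\,ds. \]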
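The exchange of the order of integration over the triangle 0 ≤ τ ≤ s ≤ t can be verified numerically in a scalar setting. The sketch below is an illustration under stated assumptions: the matrix b_z is replaced by a scalar `b`, the product σ_z ξ by a hypothetical function `g`, and integrals are approximated by the midpoint rule.

```python
import math

def midpoint(fn, lo, hi, n):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (i + 0.5) * h) for i in range(n))

def iterated_original(b, g, t, n=300):
    """Outer integral over s in [0, t], inner integral over tau in [0, s]."""
    inner = lambda s: midpoint(lambda tau: math.exp(b * (s - tau)) * g(tau), 0.0, s, n)
    return midpoint(inner, 0.0, t, n)

def iterated_swapped(b, g, t, n=300):
    """Outer integral over tau in [0, t], inner integral over s in [tau, t]."""
    inner = lambda tau: midpoint(lambda s: math.exp(b * (s - tau)), tau, t, n) * g(tau)
    return midpoint(inner, 0.0, t, n)

def swapped_closed_inner(b, g, t, n=3000):
    """Same as iterated_swapped, with the inner integral done in closed form."""
    return midpoint(lambda tau: math.expm1(b * (t - tau)) / b * g(tau), 0.0, t, n)

if __name__ == "__main__":
    b, t = -0.3, 2.0            # hypothetical scalar stand-in for b_z and horizon
    g = math.cos                # hypothetical stand-in for sigma_z(tau) xi(tau)
    print(iterated_original(b, g, t))
    print(iterated_swapped(b, g, t))
    print(swapped_closed_inner(b, g, t))
```

After the swap, the inner integral no longer involves ξ, which is what lets the proof collect the deterministic weight multiplying ξ(s).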
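Thus,
\[ f(\xi) = K_2 + E^{P^\xi}\int_0^T e^{-\beta t}\int_0^t\Big[b_D\int_s^t \iota_1^\top e^{b_z(\tau-s)}\,d\tau\,\sigma_z(s)\xi(s) + \sigma_D\xi(s)\Big]ds\,dt. \]
Recall that $\xi(s) \in \Xi(s)$ if and only if there is $\Delta m(s) \in \mathbb{R}$ such that $\xi(s) = \sigma^{-1}b\Delta m(s)$, $|\Delta m(s)| \le \sqrt{2\alpha\delta(s)}$. Noting also that $\sigma_D\sigma^{-1} = (1, 0, \cdots, 0)$, we finally arrive at
\[ f(\xi) = K_2 + \int_0^T\int_0^t e^{-\beta t}b_D\Big[\int_s^t \iota_1^\top e^{b_z(\tau-s)}\,d\tau\,\sigma_z(s)\sigma^{-1}b + 1\Big]E^{P^\xi}(\Delta m(s))\,ds\,dt. \]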
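Lemma A.7. $h : \mathrm{dom}(h) \to \mathbb{R}$ defined by
\[ h(s,t) \triangleq \int_s^t \iota_1^\top e^{b_z(\tau-s)}\,d\tau\,\sigma_z(s)\sigma^{-1}b + 1, \qquad \mathrm{dom}(h) \triangleq \{(s,t) \in \mathbb{R}^2 : 0 \le s \le t \le T\}, \]
is continuous on $\mathrm{dom}(h)$ and positive on $\mathrm{int}(\mathrm{dom}(h)) = \{(s,t) \in \mathbb{R}^2 : 0 < s < t < T\}$.
Proof. Direct computation shows
\[ \kappa h(s,t) = \kappa + \big(1 - e^{-\kappa(t-s)}\big)\big[\rho_w\sigma^{-1}b + (\gamma(t) + \delta(t))|\sigma^{-1}b|^2\big] + \big[\kappa(t-s) - (1 - e^{-\kappa(t-s)})\big]\kappa^{-1}\sigma_{\bar x^*}(t)|\sigma^{-1}b|^2. \]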
Continuity is clear; γ, δ, and σx̄∗ are differentiable functions on [0, T ].
For the other claim, note first that since κ(t−s)−(1−e−κ(t−s) ) is positive whenever
s < t and so is σx̄∗ (t), by Lemma A.5.(ii), for all t ≥ 0, the third term is positive on
int(dom(h)). Meanwhile,
\[ \rho_w\sigma^{-1}b + (\gamma(t) + \delta(t))|\sigma^{-1}b|^2 > -\kappa + \sqrt{(\kappa + \rho_w\sigma^{-1}b)^2 + (1+\lambda^{-1})\rho_v^2|\sigma^{-1}b|^2}, \quad t \ge 0, \]
by Lemma A.5(iv). Thus, the rest of $\kappa h(s,t)$, too, is positive on int(dom(h)).
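The "direct computation" behind κh(s, t) rests on the exponential of b_z, read here as the 2×2 matrix with rows (−κ, κ) and (0, 0). The sketch below (hypothetical parameter values, Taylor-series exponential written out by hand) checks the closed form e^{b_z x} = [[e^{−κx}, 1 − e^{−κx}], [0, 1]] and the resulting integrated first row, whose second entry is positive for x > 0.

```python
import math

def expm_2x2(m, terms=60):
    """Matrix exponential of a 2x2 matrix via its Taylor series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # term becomes m^n / n!
        term = [[sum(term[i][k] * m[k][j] for k in range(2)) / n for j in range(2)]
                for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def expm_bz_closed(kappa, x):
    """Closed form of exp(b_z x) for b_z = [[-kappa, kappa], [0, 0]]."""
    e = math.exp(-kappa * x)
    return [[e, 1.0 - e], [0.0, 1.0]]

def integrated_first_row(kappa, x):
    """Integral over u in [0, x] of the first row of exp(b_z u)."""
    w = (1.0 - math.exp(-kappa * x)) / kappa
    return (w, x - w)

if __name__ == "__main__":
    kappa, x = 0.7, 1.3                     # hypothetical values
    bz_x = [[-kappa * x, kappa * x], [0.0, 0.0]]
    print(expm_2x2(bz_x))
    print(expm_bz_closed(kappa, x))
    print(integrated_first_row(kappa, x))
```

Multiplying the integrated first row by σ_z σ⁻¹b, adding 1, and scaling by κ reproduces the three terms of κh(s, t) with x = t − s.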
(Proof of the proposition continued.) Thus, e−βt bD h(s, t) > 0 on the interior of
the domain of integration; and it follows that f (ξ) is uniquely minimized by ξ ∗∗ .
Proof of Proposition 5.3. To conform to the standard presentation of a control problem, I rewrite J as
\[ J(t,Z) = \min_\xi E^{P^0}\left[\int_t^T e^{-\beta s}\log(D^\xi(s))\,ds \,\middle|\, Z^\xi(t) = Z\right] \]
subject to
dZ ξ (s) = µZ (Z ξ (s)) ds + σZ (Z ξ (s))( d(s) + ξ(s) ds).
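The HJB equation is
\[ 0 = \min_\xi\left\{e^{-\beta t}\log D + \partial_tJ + (\partial_ZJ)^\top(\mu_Z + \sigma_Z\xi) + \frac12(\partial_Z^2J)\circ(\sigma_Z\sigma_Z^\top)\right\}. \]
The minimization problem taken separately is
\[ \xi^*(t) = \arg\min_{\xi\in\Xi(t)}(\partial_ZJ)^\top\sigma_Z\xi. \]
Now recall that $\sigma\xi = b(m - m^*_t)$ by definition where $|m - m^*_t| \le \sqrt{2\alpha\delta(t)}$.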
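Proof of Lemma 5.2. First,
\[ \sigma_Z\sigma^{-1}b = \left(D,\ 0,\ \frac{\rho_{w,1}\sqrt{A}\sqrt{1 - r_{DA}^2} + \gamma + \delta}{A(1 - r_{DA}^2)},\ \frac{\kappa_x^{-1}\sigma_{\bar x^*}}{A(1 - r_{DA}^2)},\ 0,\ 0,\ 0,\ 0\right)^\top. \]
Thus, expectedly because A is unambiguous, $\partial_AJ$ is irrelevant. Next, observe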
Z
ξ
s
∗,ξ
m (τ ) dτ
D (s) = D exp
t
Z
× exp
t
s
1
σD (τ )( d(τ ) + ξ(t) dτ ) −
2
Z
s
2
|σD (τ )| dτ ,
t
(
m∗,ξ (s) = e−κx (s−t)
Z
+
m∗
"
s
eκx (τ −t) κx x̄∗,ξ
τ dτ +
t
and
x̄∗,ξ
s
∗
Z
= x̄ +
t
s
!
#)
γ(τ ) + δ(τ )
p
(1, 0) ( d(τ ) + ξ(τ ) dτ )
,
ρw + p
2
A(τ ) 1 − rDA
κ−1 σ ∗ (τ )
p x px̄
(1, 0)( d(τ ) + ξ(τ ) dτ ).
2
A(τ ) 1 − rDA
Thus,
1 − e−κx (s−t) ∗
1 − e−κx (s−t)
log(D (s)) = log D +
x̄∗ + f (s, t, ξ)
m + (s − t) −
κx
κx
ξ
where f is independent of D, m∗ , and x̄∗ . Finally, by the envelope theorem,
√ p
Z T
−βt
−βT
−κx (s−t)
2
A 1 − rDA
+γ+δ
ρ
e
−
e
1
−
e
w,1
(∂Z J)> σZ σ −1 b =
+
e−βs
ds
2
β
κx
A(1 − rDA )
t
Z T
−κx (s−t)
1−e
κ−1
x σx̄∗
+
e−βs (s − t) −
ds
.
2
κx
A(1 − rDA
)
t
Proof of Proposition 5.4. See the discussion following the statement of the proposition. As can be easily checked, $\partial\delta(\infty)/\partial(h^2_{\mathrm{eff}}) < 0$. On the other hand,
\[ \frac{\partial h^2_{\mathrm{eff}}}{\partial h_A} = \frac{-2h_Dr_{DA} + 2h_A}{1 - r_{DA}^2}. \]
References
Akaike, Hirotugu (1973), “Information theory and an extension of the maximum likelihood
principle.” Proc. 2nd Inter. Symposium on Information Theory, Budapest, 267–281.
Aliprantis, Charalambos D. and Kim C. Border (1999), Infinite Dimensional Analysis: A
Hitchhiker’s Guide. Springer.
Anderson, Evan W., Lars Peter Hansen, and Thomas J. Sargent (2003), “A quartet of semigroups for model specification, robustness, prices of risk, and model detection.” Journal
of the European Economic Association, 1, 68–123.
Bansal, Ravi, Dana Kiku, and Amir Yaron (2012), “An empirical evaluation of the long-run
risks model for asset prices.” Critical Finance Review, 1, 183–221.
Barberis, Nicholas (2000), “Investing for the long run when returns are predictable.” Journal
of Finance, 55, 225–264.
Barry, Christopher B. (1974), “Portfolio analysis under uncertain means, variances, and
covariances.” Journal of Finance, 29, 515–522.
Blanchard, Olivier J. (1993), “Movements in the equity premium.” Brookings Papers on
Economic Activity, 2, 75–138.
Breeden, Douglas T. (1979), “An intertemporal asset pricing model with stochastic consumption and investment opportunities.” Journal of Financial Economics, 7, 265–296.
Brendle, Simon (2006), “Portfolio selection under incomplete information.” Stochastic Processes and their Applications, 116, 701–723.
Campanale, Claudio (2011), “Learning, ambiguity and life-cycle portfolio allocation.” Review of Economic Dynamics, 14, 339–367.
Chen, Hui, Nengjiu Ju, and Jianjun Miao (2014), “Dynamic asset allocation with ambiguous
return predictability.” Review of Economic Dynamics, 17, 799–823.
Chen, Zengjing and Larry G. Epstein (2002), “Ambiguity, risk, and asset returns in continuous time.” Econometrica, 70, 1403–1443.
Choi, Hongseok (2012), Essays on Learning under Ambiguity. Ph.D. dissertation, University
of Pennsylvania.
Cox, John C., Jonathan E. Ingersoll, Jr., and Stephen A. Ross (1985), “An intertemporal
general equilibrium model of asset prices.” Econometrica, 53, 363–384.
Cuoco, Domenico (1997), “Optimal consumption and equilibrium prices with portfolio constraints and stochastic income.” Journal of Economic Theory, 72, 33–73.
Cuoco, Domenico and Jakša Cvitanić (1998), “Optimal consumption choices for a ‘large’
investor.” Journal of Economic Dynamics & Control, 22, 401–436.
Davis, M. H. A. and A. R. Norman (1990), “Portfolio selection with transaction costs.”
Mathematics of Operations Research, 15, 676–713.
Detemple, Jérôme B. (1986), “Asset pricing in a production economy with incomplete information.” Journal of Finance, 41, 383–392.
Dow, James and Sergio Ribeiro da Costa Werlang (1992), “Uncertainty aversion, risk aversion, and the optimal choice of portfolio.” Econometrica, 60, 197–204.
Drechsler, Itamar (2013), “Uncertainty, time-varying fear, and asset prices.” Journal of
Finance, 68, 1843–1889.
Durrett, Richard (2005), Probability: Theory and Examples. Thomson.
Elliott, Robert J. and Vikram Krishnamurthy (1997), “Exact finite-dimensional filters for
maximum likelihood parameter estimation of continuous-time linear Gaussian systems.”
SIAM Journal on Control and Optimization, 35, 1908–1923.
Epstein, Larry G. and Jianjun Miao (2003), “A two-person dynamic equilibrium under
ambiguity.” Journal of Economic Dynamics & Control, 27, 1253–1288.
Epstein, Larry G. and Martin Schneider (2003), “IID: Independently and indistinguishably
distributed.” Journal of Economic Theory, 113, 32–50.
Epstein, Larry G. and Martin Schneider (2007), “Learning under ambiguity.” Review of
Economic Studies, 74, 1275–1303.
Epstein, Larry G. and Martin Schneider (2008), “Ambiguity, information quality, and asset
pricing.” Journal of Finance, 63, 197–228.
Epstein, Larry G. and Tan Wang (1994), “Intertemporal asset pricing under Knightian
uncertainty.” Econometrica, 62, 283–322.
Fama, Eugene F. and Kenneth R. French (1988), “Permanent and temporary components
of stock prices.” Journal of Political Economy, 96, 246–273.
Fan, Ky (1953), “Minimax theorems.” Proceedings of the National Academy of Sciences, 39,
42–47.
Fleming, Wendell H. and H. Mete Soner (1993), Controlled Markov Processes and Viscosity
Solutions. Springer-Verlag.
Gagliardini, Patrick, Paolo Porchia, and Fabio Trojani (2009), “Ambiguity aversion and the
term structure of interest rates.” Review of Financial Studies, 22, 4157–4188.
Gajdos, T., T. Hayashi, J.-M. Tallon, and J.-C. Vergnaud (2008), “Attitude toward imprecise information.” Journal of Economic Theory, 140, 27–65.
Gennotte, Gérard and Terry A. Marsh (1993), “Variations in economic uncertainty and risk
premiums on capital assets.” European Economic Review, 37, 1021–1041.
Gilboa, Itzhak and Massimo Marinacci (2013), “Ambiguity and the Bayesian paradigm.”
In Advances in Economics and Econometrics (Daron Acemoglu, Manuel Arellano, and
Eddie Dekel, eds.), volume I, 179–242, Cambridge University Press.
Gilboa, Itzhak and Larry Samuelson (2012), “Subjectivity in inductive inference.” Theoretical Economics, 7, 183–215.
Gilboa, Itzhak and David Schmeidler (1989), “Maxmin expected utility with non-unique
prior.” Journal of Mathematical Economics, 18, 141–153.
Gilboa, Itzhak and David Schmeidler (2010), “Simplicity and likelihood: An axiomatic approach.” Journal of Economic Theory, 145, 1757–1775.
Good, Irving J. and Ray A. Gaskins (1971), “Nonparametric roughness penalties for probability densities.” Biometrika, 58, 255–277.
Green, Peter J. (1987), “Penalized likelihood for general semi-parametric regression models.”
International Statistical Review, 55, 245–259.
Hansen, Lars Peter and Thomas J. Sargent (2011), “Robustness and ambiguity in continuous
time.” Journal of Economic Theory, 146, 1195–1223.
He, Hua and Neil D. Pearson (1991), “Consumption and portfolio policies with incomplete
markets and short-sale constraints: The infinite dimensional case.” Journal of Economic
Theory, 54, 259–304.
Heaton, John and Deborah Lucas (2000), NBER Macroeconomics Annual 1999, Volume 14,
chapter Stock Prices and Fundamentals, 213–264. MIT.
Hernández-Hernández, Daniel and Alexander Schied (2006), “Robust utility maximization
in a stochastic factor model.” Statistics & Decisions, 24, 109–125.
Hernández-Hernández, Daniel and Alexander Schied (2007a), “A control approach to robust
utility maximization with logarithmic utility and time-consistent penalties.” Stochastic
Processes and their Applications, 117, 980–1000.
Hernández-Hernández, Daniel and Alexander Schied (2007b), “Robust maximization of consumption with logarithmic utility.” Proceedings of the 2007 American Control Conference,
1120–1123.
Illeditsch, Philipp K. (2011), “Ambiguous information, portfolio inertia, and excess volatility.” Journal of Finance, 66, 2213–2247.
Ilut, Cosmin L. and Martin Schneider (2014), “Ambiguous business cycles.” American Economic Review, 104, 2368–2399.
Jagannathan, Ravi, Ellen R. McGrattan, and Anna Scherbina (2000), “The declining U.S.
equity premium.” Federal Reserve Bank of Minneapolis Quarterly Review, 24, 3–19.
Kalymon, Basil A. (1971), “Estimation risk and the portfolio selection model.” Journal of
Financial and Quantitative Analysis, 6, 559–582.
Karatzas, Ioannis, John P. Lehoczky, Steven E. Shreve, and Gan-Lin Xu (1991), “Martingale
and duality methods for utility maximization in an incomplete market.” SIAM Journal
on Control and Optimization, 29, 702–730.
Karatzas, Ioannis and Steven E. Shreve (1988), Brownian Motion and Stochastic Calculus.
Springer-Verlag.
Kim, Tong Suk and Edward Omberg (1996), “Dynamic nonmyopic portfolio behavior.”
Review of Financial Studies, 9, 141–161.
Klein, Roger W. and Vijay S. Bawa (1976), “The effect of estimation risk on optimal portfolio choice.” Journal of Financial Economics, 3, 215–231.
Klein, Roger W. and Vijay S. Bawa (1977), “The effect of limited information and estimation
risk on optimal portfolio diversification.” Journal of Financial Economics, 5, 89–111.
Koijen, Ralph S.J. and Stijn van Nieuwerburgh (2011), “Predictability of returns and cash
flows.” Annual Review of Financial Economics, 3, 467–491.
Konishi, Sadanori and Genshiro Kitagawa (2008), Information Criteria and Statistical Modeling. Springer.
Lakner, Peter (1998), “Optimal trading strategy for an investor: The case of partial information.” Stochastic Processes and their Applications, 76, 77–97.
Lettau, Martin, Sydney C. Ludvigson, and Jessica A. Wachter (2008), “The declining equity
premium: What role does macroeconomic risk play?” Review of Financial Studies, 21,
1653–1687.
Liptser, Robert S. and Albert N. Shiryaev (1977), Statistics of Random Processes (Volumes
I and II). Springer-Verlag.
Liu, Hening (2011), “Dynamic portfolio choice under ambiguity and regime switching mean
returns.” Journal of Economic Dynamics & Control, 35, 623–640.
Liu, Hening (2013), “Optimal consumption and portfolio choice under ambiguity for a mean-reverting risk premium in complete markets.” Annals of Economics and Finance, 14,
21–52.
Merton, Robert C. (1973), “An intertemporal capital asset pricing model.” Econometrica,
41, 867–887.
Merton, Robert C. (1980), “On estimating the expected return on the market: An exploratory investigation.” Journal of Financial Economics, 8, 323–361.
Miao, Jianjun (2009), “Ambiguity, risk and portfolio choice under incomplete information.”
Annals of Economics and Finance, 10, 257–279.
Miao, Jianjun and Neng Wang (2011), “Risk, uncertainty, and option exercise.” Journal of
Economic Dynamics & Control, 35, 442–461.
Milgrom, Paul and Ilya Segal (2002), “Envelope theorems for arbitrary choice sets.” Econometrica, 70, 583–601.
Pástor, Ľuboš and Robert F. Stambaugh (2001), “The equity premium and structural
breaks.” Journal of Finance, 56, 1207–1239.
Poterba, James M. and Lawrence H. Summers (1988), “Mean reversion in stock prices:
Evidence and implications.” Journal of Financial Economics, 22, 27–59.
Rogers, L. C. G. and David Williams (1994), Diffusions, Markov Processes, and Martingales,
Volumes 1 and 2. Cambridge University Press.
Rossi, Alberto G. and Allan Timmermann (2015), “Modeling covariance risk in Merton’s
ICAPM.” Review of Financial Studies, 28, 1428–1461.
Routledge, Bryan R. and Stanley E. Zin (2009), “Model uncertainty and liquidity.” Review
of Economic Dynamics, 12, 543–566.
Sbuelz, Alessandro and Fabio Trojani (2008), “Asset prices with locally constrained-entropy
recursive multiple-priors utility.” Journal of Economic Dynamics & Control, 32, 3695–3717.
Schied, Alexander (2008), “Robust optimal control for a consumption-investment problem.”
Mathematical Methods of Operations Research, 67, 1–20.
Schmeidler, David (1989), “Subjective probability and expected utility without additivity.”
Econometrica, 57, 571–587.
Trojani, Fabio and Paolo Vanini (2004), “Robustness and ambiguity aversion in general
equilibrium.” Review of Finance, 8, 279–324.
van Binsbergen, Jules H. and Ralph S. J. Koijen (2010), “Predictive regressions: A present-value approach.” Journal of Finance, 65, 1439–1471.
Veronesi, Pietro (2000), “How does information quality affect stock returns?” Journal of
Finance, 55, 807–837.
Welch, Ivo and Amit Goyal (2008), “A comprehensive look at the empirical performance of
equity premium prediction.” Review of Financial Studies, 21, 1455–1508.
Xia, Yihong (2001), “Learning about predictability: The effects of parameter uncertainty
on dynamic asset allocation.” Journal of Finance, 56, 205–246.
Zohar, Gady (2001), “A generalized Cameron-Martin formula with applications to partially
observed dynamic portfolio optimization.” Mathematical Finance, 11, 475–494.
Supplementary Appendix to
Learning under Ambiguity, Portfolio Choice,
and Asset Returns
Hongseok Choi∗
September 22, 2015
SA.1 Proof of the Claim on Page 5
Here I briefly review the related model of learning by Epstein and Schneider (2007)
focusing on their portfolio choice example and show that, as claimed in the introduction, the continuous-time counterpart of their example results in no learning because
the likelihood function degenerates to infinity everywhere.
SA.1.1 The Model
Begin with the exchangeable Bayesian model of binary returns. There is a stock for which there are $d$ trading days per month. The likelihood function for the net rate of return between two consecutive trading days is¹
$$
L\bigl( \Delta R(t) = \pm \sigma_R / \sqrt{d} \,\big|\, \bar{x} \bigr) = \frac{1}{2} \pm \frac{1}{2} \frac{\bar{x}}{\sigma_R \sqrt{d}}
$$
where $\sigma_R > 0$ and the monthly expected return $\bar{x} \in \mathbb{R}$ is the parameter of interest. A Bayesian agent would have a unique parameter prior $M$.²
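Epstein and Schneider’s agent, on the other hand, entertains multiple parameter priors $M \in \mathcal{M}$ and multiple likelihoods $L \in \mathcal{L}$. The parameter priors are all Dirac measures. The likelihoods are given by
$$
L\bigl( \Delta R(t) = \pm \sigma_R / \sqrt{d} \,\big|\, \bar{x}, \eta(t) \bigr) = \frac{1}{2} \pm \frac{1}{2} \frac{\bar{x} + \eta(t)}{\sigma_R \sqrt{d}}, \quad |\eta(t)| \le \bar{\eta}, \tag{SA.1}
$$
for some $\bar{\eta} < \infty$, so that at each trading date $t$, any value of $\eta(t)$ with $|\eta(t)| \le \bar{\eta}$ could be the case.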
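As a quick numerical companion to (SA.1), the sketch below computes the one-day mean and variance implied by the binary likelihood. The function name `binary_return_moments` and the parameter values are illustrative assumptions, not taken from Epstein and Schneider (2007) or from this paper.

```python
import math


def binary_return_moments(x_bar, eta, sigma_r, d):
    """Mean and variance of the one-day binary return in (SA.1).

    The return is +sigma_r/sqrt(d) with probability p_up and
    -sigma_r/sqrt(d) with probability 1 - p_up.
    """
    step = sigma_r / math.sqrt(d)
    p_up = 0.5 + 0.5 * (x_bar + eta) / (sigma_r * math.sqrt(d))
    if not 0.0 <= p_up <= 1.0:
        raise ValueError("(x_bar + eta) is too large for p_up to be a probability")
    mean = p_up * step + (1.0 - p_up) * (-step)
    var = p_up * (step - mean) ** 2 + (1.0 - p_up) * (-step - mean) ** 2
    return p_up, mean, var


if __name__ == "__main__":
    # Hypothetical monthly figures: x_bar = 1%, eta = 0.5%, sigma_R = 5%.
    for d in (21, 210, 2100):
        p_up, mean, var = binary_return_moments(0.01, 0.005, 0.05, d)
        # mean * d stays at x_bar + eta; var * d approaches sigma_R^2.
        print(d, p_up, mean * d, var * d)
```

In this sketch the mean equals $(\bar{x} + \eta)/d$ exactly, while the variance equals $\sigma_R^2/d - (\bar{x} + \eta)^2/d^2$, so the variance per unit of time tends to $\sigma_R^2$ as $d$ grows.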
∗ University of Pennsylvania, aitch.choi@gmail.com.
¹ Epstein and Schneider consider log returns, but the difference is inconsequential.
² See Section 2.2.2.
Having observed the returns up to trading date $t > 0$, the agent rules out theories with low likelihood³ and Bayes-updates the remaining ones to obtain the set of one-step-ahead conditionals: denoting the time-$t$ log-likelihood function of theories by $\ell_t(\bar{x}, \eta)$, the set of one-step-ahead conditionals is given by
$$
\Bigl\{ L(\cdot \mid \bar{x}, \eta(t+1)) : |\eta(t+1)| \le \bar{\eta} \ \text{and} \ \max_{\eta'} \ell_t(\bar{x}, \eta') \ge \max_{\bar{x}', \eta'} \ell_t(\bar{x}', \eta') - \alpha \Bigr\} \tag{SA.2}
$$
where $\alpha \ge 0$ is a primitive.
SA.1.2 The Likelihood Function in Continuous Time
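The continuous-time return process implied by (SA.1) is
$$
dR(t) = (\bar{x} + \eta(t))\, dt + \sigma_R\, dw(t)
$$
with $|\eta(t)| \le \bar{\eta}$ for all $t \ge 0$.⁴ The log-likelihood of $(\bar{x}, \eta)$ is (see Proposition 2.3)
$$
\int_0^T (\bar{x} + \eta(t))\, \sigma_R^{-2}\, dR(t) - \frac{1}{2} \int_0^T \bigl[ \sigma_R^{-1} (\bar{x} + \eta(t)) \bigr]^2 dt.
$$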
But the sequence $\{\eta^\nu : \nu = 1, 2, \ldots\}$ defined by
$$
\eta^\nu(t) \triangleq
\begin{cases}
0 & \text{if } t = 0 \\
+\bar{\eta} & \text{if } t \in (t^\nu_i, t^\nu_{i+1}] \text{ and } R(t^\nu_{i+1}) - R(t^\nu_i) > 0 \\
-\bar{\eta} & \text{if } t \in (t^\nu_i, t^\nu_{i+1}] \text{ and } R(t^\nu_{i+1}) - R(t^\nu_i) \le 0
\end{cases},
$$
where $\{t^\nu_1, \ldots, t^\nu_\nu\}$ is the $\nu$th dyadic partition of $[0, T]$, will make the integral
$$
\int_0^T \eta^\nu(t)\, dR(t)
$$
diverge to the infinite variation of $R$.⁵ Epstein and Schneider could circumvent this problem because the set of one-step-ahead conditionals (SA.2) is well-defined for all trading frequencies $d$ and converges; but if time is continuous at the outset, such a circumvention is not possible.

³ See Section 2.3.3. In contrast to the present paper, Epstein and Schneider’s paper does not speak of penalization, but the restriction “$|\eta(t)| \le \bar{\eta}$ for all $t$” is equivalent to the penalty on the log-likelihood that takes the value of zero when the restriction is met and infinity otherwise.
⁴ Let $d \to \infty$, and observe that the mean of returns is
$$
\frac{\bar{x} + \eta}{\sigma_R \sqrt{d}} \, \frac{\sigma_R}{\sqrt{d}} = \frac{\bar{x} + \eta}{d} = (\bar{x} + \eta)\, dt
$$
and the variance of returns
$$
\left( \frac{\sigma_R}{\sqrt{d}} - \frac{\bar{x} + \eta}{d} \right)^2 \left( \frac{1}{2} + \frac{1}{2} \frac{\bar{x} + \eta}{\sigma_R \sqrt{d}} \right) + \left( -\frac{\sigma_R}{\sqrt{d}} - \frac{\bar{x} + \eta}{d} \right)^2 \left( \frac{1}{2} - \frac{1}{2} \frac{\bar{x} + \eta}{\sigma_R \sqrt{d}} \right) \approx \frac{\sigma_R^2}{d} = \sigma_R^2\, dt.
$$
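The divergence argued in SA.1.2 can be illustrated numerically. The following Python sketch simulates one discretized return path and evaluates the integral of the sign-matched $\eta^\nu$ against $dR$ over successively finer dyadic partitions. The function names, the use of $2^\nu$ equal subintervals for the $\nu$th partition, the constant drift, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def simulate_returns(n_steps, horizon, drift, sigma_r, seed=0):
    """Cumulative return R on a uniform grid: dR = drift dt + sigma_r dw."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [0.0]
    for _ in range(n_steps):
        shock = sigma_r * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(path[-1] + drift * dt + shock)
    return path


def sign_matched_integral(path, level, eta_bar):
    """Integral of eta^nu dR when eta^nu is +/- eta_bar on each of the
    2**level subintervals, with sign equal to the sign of the increment of R."""
    n_steps = len(path) - 1
    stride = n_steps // (2 ** level)
    total = 0.0
    for i in range(0, n_steps, stride):
        increment = path[i + stride] - path[i]
        eta = eta_bar if increment > 0 else -eta_bar
        total += eta * increment  # equals eta_bar * |increment|
    return total


if __name__ == "__main__":
    path = simulate_returns(2 ** 12, 1.0, 0.01, 0.05, seed=1)
    for level in range(1, 13):
        print(level, sign_matched_integral(path, level, 0.005))
```

Each refinement can only increase the sum of absolute increments (triangle inequality), and for a Brownian path that sum grows roughly like the square root of the number of subintervals, which is the discrete counterpart of the infinite variation invoked in the text.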
SA.2 Proof of the Claim on Page 15
Here I prove the claim on page 15 that the induced log-likelihood function of $m^{\bar{x},\eta}(t)$ without penalty, namely $\ell_{t,m(t)}$, is constant. Familiarity with Section 2.3.3, including the results given after page 15, is assumed.
Fix $t > 0$. Let the induced log-likelihood functions $\ell_{t,m(t)}$ and $\ell^{\lambda}_{t,m(t)}$ take values in $\mathbb{R} \cup \{\infty\}$. By Lemma 2.5, the curvature of $\ell^{\lambda}_{t,m(t)}$ is given by $\delta(t)^{-1}$. Since $\delta(t)^{-1} \downarrow 0$ as $\lambda \downarrow 0$ (see (26) and the proof of Proposition 3.2), it follows that $\ell^{\lambda}_{t,m(t)}$ converges to a constant function as $\lambda \downarrow 0$. The constancy of $\ell_{t,m(t)}$ then follows from the following lemma:
Lemma SA.2.1. $\ell_{t,m(t)}(m) = \lim_{\lambda \downarrow 0} \ell^{\lambda}_{t,m(t)}(m)$ for all $m \in \mathbb{R}^{n_x}$.
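Proof. Fix $m \in \mathbb{R}^{n_x}$. Begin with the trivial observation
$$
\ell_t(\bar{x}, \eta) - \frac{\lambda}{2} \int_0^t |\eta(s)|^2\, ds \le \ell_t(\bar{x}, \eta).
$$
Take $\sup_{\bar{x},\eta}$ with the constraint $m^{\bar{x},\eta}(t) = m$ and then $\lim_{\lambda \downarrow 0}$ to see
$$
\lim_{\lambda \downarrow 0} \ell^{\lambda}_{t,m(t)}(m) \le \ell_{t,m(t)}(m).
$$
Next,
$$
\ell_t(\bar{x}, \eta) - \frac{\lambda}{2} \int_0^t |\eta(s)|^2\, ds \le \ell^{\lambda}_{t,m(t)}(m), \quad \eta \text{ is such that } m^{\bar{x},\eta}(t) = m,
$$
so that, letting $\lambda \downarrow 0$,
$$
\ell_t(\bar{x}, \eta) \le \lim_{\lambda \downarrow 0} \ell^{\lambda}_{t,m(t)}(m), \quad \eta \text{ is such that } m^{\bar{x},\eta}(t) = m,
$$
and, taking the supremum on the left-hand side, $\ell_{t,m(t)}(m) \le \lim_{\lambda \downarrow 0} \ell^{\lambda}_{t,m(t)}(m)$.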
⁵ The discrete-time partial maximum-likelihood estimate also alternates between the extreme values $\pm\bar{\eta}$, and the corresponding profile likelihood function (the likelihood function with $\eta$ replaced by the partial maximizer) becomes degenerate as the trading dates become infinitely frequent. See Epstein and Schneider (2007), Supplementary Appendix, Proposition S1.
SA.3 Proof of the Claim on Page 25
This section proves the following claim made at the end of the discussion of the
negative comovement between δ and σ:
Lemma SA.3.1. Assume finite confidence $\lambda < \infty$ for nondegeneracy. Suppose further that $n_y = n_x = 1$; $b$ and $\sigma$ are constant; $b > 0$ and $\rho_w < 0$; and finally
$$
\sigma = -\frac{b}{2\kappa\rho_w} \bigl( \rho_w^2 + (1 + \lambda^{-1}) \rho_v^2 \bigr) > 0. \tag{SA.3}
$$
Then, in the limit $t \to \infty$, the weight on the innovation $\rho_w + \sigma^{-1} b (\gamma(t) + \delta(t))$ is zero, and $\delta(t)$ is larger than $\lambda^{-1} \rho_v^2 / 2\kappa$.
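Proof. Since $\sigma^{-1} b$ is constant, $\gamma$ and $\delta$ converge to constants, which I will denote again by $\gamma$ and $\delta$ (see the proof of Proposition 3.2; $\gamma(t) = \gamma^{\infty}(t)$). Thus, the question is whether there exist positive numbers $\gamma$, $\delta$, and $\sigma$ that solve the system of equations
$$
0 = \rho_w^2 + \rho_v^2 - 2\kappa\gamma - (\rho_w + \sigma^{-1} b \gamma)^2, \tag{SA.4}
$$
$$
0 = (\rho_w + \sigma^{-1} b \gamma)^2 - 2\kappa\delta + \lambda^{-1} \rho_v^2, \tag{SA.5}
$$
and
$$
0 = \rho_w + \sigma^{-1} b (\gamma + \delta). \tag{SA.6}
$$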
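Sum (SA.4) and (SA.5), and solve the resulting equation for $\gamma + \delta$ to obtain
$$
\gamma + \delta = \frac{\rho_w^2 + (1 + \lambda^{-1}) \rho_v^2}{2\kappa}. \tag{SA.7}
$$
Plugging (SA.7) into (SA.6) we obtain (SA.3). Next, solve (SA.4), together with (SA.3), for $\gamma$ to arrive at
$$
\gamma = \frac{\rho_w^2 + (1 + \lambda^{-1}) \rho_v^2}{4\kappa\rho_w^2} \left[ \rho_w^2 - (1 + \lambda^{-1}) \rho_v^2 + \sqrt{ \bigl( \rho_w^2 - (1 + \lambda^{-1}) \rho_v^2 \bigr)^2 + 4 \rho_w^2 \rho_v^2 } \right] > 0.
$$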
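The step from (SA.4) and (SA.3) to the closed form for $\gamma$ can be filled in as follows. This is a sketch of one possible route; the shorthand $S$, $a$, and $g$ is introduced here for convenience and is not the paper's notation.

```latex
% Shorthand (illustrative, not from the paper):
%   S := \rho_w^2 + (1+\lambda^{-1})\rho_v^2, \quad
%   a := \rho_w^2 - (1+\lambda^{-1})\rho_v^2, \quad
%   g := 2\kappa\gamma / S .
\begin{align*}
\sigma^{-1} b &= -\frac{2\kappa\rho_w}{S}
  && \text{by (SA.3)}, \\
\rho_w + \sigma^{-1} b \gamma &= \rho_w (1 - g)
  && \text{since } \gamma = S g / (2\kappa), \\
0 &= \rho_w^2 + \rho_v^2 - S g - \rho_w^2 (1 - g)^2
  && \text{by (SA.4)}, \\
0 &= \rho_w^2 g^2 - a g - \rho_v^2
  && \text{using } 2\rho_w^2 - S = a, \\
g &= \frac{a + \sqrt{a^2 + 4\rho_w^2\rho_v^2}}{2\rho_w^2}
  && \text{(the positive root)}, \\
\gamma &= \frac{S}{4\kappa\rho_w^2}
  \left[ a + \sqrt{a^2 + 4\rho_w^2\rho_v^2} \right] > 0 .
\end{align*}
```

The other root of the quadratic gives $g < 0$ and hence a negative $\gamma$, so it is discarded given the requirement that $\gamma$ be positive.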
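Finally, from (SA.7),
$$
f(\lambda^{-1}) \triangleq \delta(\lambda^{-1}) - \frac{\lambda^{-1} \rho_v^2}{2\kappa} = \frac{\rho_w^2 + \rho_v^2}{2\kappa} - \gamma(\lambda^{-1}). \tag{SA.8}
$$
It only remains to show that $f(\lambda^{-1}) > 0$ for all $\lambda^{-1} > 0$.
First of all, $f(0) = \delta(0) - 0 = 0$. Note then, from (SA.8),
$$
\frac{\partial f}{\partial(\lambda^{-1})} = \rho_v^2 \gamma \times \left[ \frac{1}{\sqrt{ \bigl( \rho_w^2 - (1 + \lambda^{-1}) \rho_v^2 \bigr)^2 + 4 \rho_w^2 \rho_v^2 }} - \frac{1}{\rho_w^2 + (1 + \lambda^{-1}) \rho_v^2} \right].
$$
Since
$$
0 < \bigl( \rho_w^2 - (1 + \lambda^{-1}) \rho_v^2 \bigr)^2 + 4 \rho_w^2 \rho_v^2 = \bigl( \rho_w^2 + (1 + \lambda^{-1}) \rho_v^2 \bigr)^2 - 4 \lambda^{-1} \rho_w^2 \rho_v^2 < \bigl( \rho_w^2 + (1 + \lambda^{-1}) \rho_v^2 \bigr)^2,
$$
$\partial f / \partial(\lambda^{-1}) > 0$ for all $\lambda^{-1} > 0$ and the claim follows.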
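The closed forms in this proof can be checked numerically. The Python sketch below evaluates $\gamma$, $\delta$, and $\sigma$ from the formulas above and reports the residuals of (SA.4) to (SA.6). The function names and the parameter values are hypothetical and chosen only to satisfy $b > 0$ and $\rho_w < 0$.

```python
import math


def steady_state(rho_w, rho_v, kappa, lam_inv, b):
    """Limits of gamma and delta, and the sigma of (SA.3), in closed form."""
    s = rho_w ** 2 + (1.0 + lam_inv) * rho_v ** 2
    a = rho_w ** 2 - (1.0 + lam_inv) * rho_v ** 2
    root = math.sqrt(a ** 2 + 4.0 * rho_w ** 2 * rho_v ** 2)
    gamma = s / (4.0 * kappa * rho_w ** 2) * (a + root)
    delta = s / (2.0 * kappa) - gamma            # from (SA.7)
    sigma = -b / (2.0 * kappa * rho_w) * s       # (SA.3)
    return gamma, delta, sigma


def residuals(rho_w, rho_v, kappa, lam_inv, b):
    """Right-hand sides of (SA.4), (SA.5), (SA.6); each should be near zero."""
    gamma, delta, sigma = steady_state(rho_w, rho_v, kappa, lam_inv, b)
    weight = rho_w + b / sigma * gamma
    r4 = rho_w ** 2 + rho_v ** 2 - 2.0 * kappa * gamma - weight ** 2
    r5 = weight ** 2 - 2.0 * kappa * delta + lam_inv * rho_v ** 2
    r6 = rho_w + b / sigma * (gamma + delta)
    return r4, r5, r6


if __name__ == "__main__":
    params = dict(rho_w=-0.5, rho_v=0.3, kappa=0.2, lam_inv=1.0, b=0.05)
    gamma, delta, sigma = steady_state(**params)
    bound = params["lam_inv"] * params["rho_v"] ** 2 / (2.0 * params["kappa"])
    print("gamma, delta, sigma:", gamma, delta, sigma)
    print("delta exceeds bound:", delta > bound)
    print("residuals:", residuals(**params))
```

A check at a single parameter point is of course no substitute for the proof; it only guards against algebraic slips in the closed forms.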