Pathwise Dynamic Programming

Christian Bender1,2 , Christian Gärtner2 , and Nikolaus Schweizer3
April 8, 2016
Abstract
We present a novel method for deriving tight Monte Carlo confidence intervals for solutions
of stochastic dynamic programming equations. Taking some approximate solution to the
equation as an input, we construct pathwise recursions with a known bias. Suitably coupling
the recursions for lower and upper bounds ensures that the method is applicable even when
the dynamic program does not satisfy a comparison principle. We apply our method to two
nonlinear option pricing problems, pricing under bilateral counterparty risk and pricing under
uncertain volatility.
Keywords: stochastic dynamic programming; Monte Carlo; confidence bounds; option pricing
MSC 2000: 90C39, 65C05, 90C46, 91B28.
1 Introduction
We study the problem of computing Monte Carlo confidence intervals for solutions of discrete-time,
finite horizon stochastic dynamic programming equations. Such an equation consists of a random terminal value
and a nonlinear recursion which computes the value at a given time from expectations
about the value one step ahead. Equations of this type are highly prominent in the analysis of
multistage sequential decision problems under uncertainty (Bertsekas, 2005; Powell, 2011). Yet
they also arise in a variety of further applications such as financial option pricing (e.g. Guyon and
Henry-Labordère, 2013), the evaluation of recursive utility functionals (e.g. Kraft and Seifried,
2014) or the numerical solution of partial differential equations (e.g. Fahim et al., 2011).
The key challenge when computing solutions to stochastic dynamic programs numerically
stems from a high order nesting of conditional expectations operators in the backward recursion.
The solution at each time step depends on an expectation of what happens one step ahead, which
in turn depends on expectations of what happens at later dates. If the system is driven by a
1 Corresponding author.
2 Saarland University, Department of Mathematics, Postfach 151150, D-66041 Saarbrücken, Germany, bender@math.uni-sb.de; gaertner@math.uni-sb.de.
3 University of Duisburg-Essen, Mercator School of Management, Lotharstr. 65, D-46057 Duisburg, nikolaus.schweizer@uni-due.de.
Markov process with a high-dimensional state space – as is the case in typical applications – a
naive numerical approach quickly runs into the curse of dimensionality. In practice, conditional
expectations thus need to be replaced by an approximate conditional expectations operator which
can be nested several times at moderate computational costs, e.g., by a least-squares Monte Carlo
approximation. For an introduction and overview of this “approximate dynamic programming”
approach, see, e.g., Powell (2011). The error induced by these approximations is typically hard to
quantify and control. This motivates us to develop a posteriori criteria which use an approximate
or heuristic solution to the dynamic program as an input in order to compute tight upper and
lower bounds on the true solution. In this context, “tight” means that the bounds match when
the input approximation coincides with the true solution.
Specifically, we study stochastic dynamic programming equations of the form
Yj = Gj (Ej [βj+1 Yj+1 ], Fj (Ej [βj+1 Yj+1 ]))    (1)
with random terminal condition YJ = ξ. Here, Ej [·] denotes the conditional expectation given the
information at time step j. Fj and Gj are real-valued, adapted, random functions. Fj is convex
while Gj is concave and increasing in its second argument. In particular, a possible dependence
on state and decision variables is modeled implicitly through the functions Fj and Gj . The
stochastic weight vectors βj form an adapted process, i.e., unlike Fj and Gj , βj+1 is known at
time j + 1 but not yet at time j. Such weight processes frequently arise in dynamic programs for
financial option valuation problems where, roughly speaking, they are associated with the first
two derivatives, Delta and Gamma, of the value process Y . Similarly, Fahim et al. (2011) derive
time discretization schemes of the form (1) for fully nonlinear second-order parabolic partial
differential equations which involve stochastic weights in the approximation of derivatives. As
prominent examples, this class of partial differential equations includes Hamilton-Jacobi-Bellman
equations arising from continuous-time stochastic control problems.
There are two main motivations for assuming the concave-convex structure of the right hand
side of (1). First, since convexity and concavity may stem, respectively, from maximization or
minimization, the equation is general enough to cover not only dynamic programs associated with
classical optimization problems but also dynamic programs arising from certain stochastic twoplayer games or robust optimization problems under model uncertainty. Second, many functions
which are themselves neither concave nor convex may be written in this concave-convex form.
This can be seen, for instance, in the application to bilateral counterparty credit risk in Section
6.1 below.
Traditionally, a posteriori criteria for this type of recursion were not derived directly from
equation (1) but rather from a primal-dual pair of optimization problems associated with it. For
instance, with the choices Gj (z, y) = y, Fj (z) = max{Sj , z}, βj = 1 and ξ = SJ , (1) becomes the
dynamic programming equation associated with the optimal stopping problem for a process (Sj )j .
From an approximate solution of (1) one can derive an approximation of the optimal stopping
strategy. Testing this strategy by simulation yields lower bounds on the value of the optimal
stopping problem. The derivation of dual upper bounds starts by considering strategies which
are allowed to look into the future, i.e., strategies which optimize pathwise rather
than in conditional expectation. The best strategy of this type is easy to simulate and implies
an upper bound on the value of the usual optimal stopping problem. This bound can be made
tight by penalizing the use of future information in an appropriate way. Approximately optimal
information penalties can be derived from approximate solutions to (1). Combining the resulting
low-biased and high-biased Monte Carlo estimators for Y0 yields Monte Carlo confidence intervals
for the solution of the dynamic program. This information relaxation approach was developed in
the context of optimal stopping independently by Rogers (2002) and Haugh and Kogan (2004).
The approach was extended to general discrete-time stochastic optimization problems by Brown
et al. (2010) and Rogers (2007).
Recently, Bender et al. (2015) have proposed a posteriori criteria which are derived directly
from a dynamic programming recursion like (1). The obvious advantage is that, in principle, this
approach requires neither knowledge nor existence of associated primal and dual optimization
problems. This paper generalizes the approach of Bender et al. (2015) in various directions. For
example, we merely require that the functions Fj and Gj are of polynomial growth while the
corresponding condition in the latter paper is Lipschitz continuity. Moreover, we assume that
the weights βj are sufficiently integrable rather than bounded, and introduce the concave-convex
functional form of the right hand side of (1) which is more flexible than the functional forms
considered there.
Conceptually, our main departure from the approach of Bender et al. (2015) is that we do not
require that the recursion (1) satisfies a comparison principle. Suppose that two adapted processes
Y low and Y up fulfill the analogs of (1) with “=” replaced, respectively, by “≤” and “≥”. We call
such processes subsolutions and supersolutions to (1). The comparison principle postulates that
subsolutions are always smaller than supersolutions. Relying on such a comparison principle, the
approach of Bender et al. (2015) mimics the classical situation in the sense that their lower and
upper bounds can be interpreted as stemming, respectively, from a primal and a dual optimization
problem constructed from (1).
The bounds of the present paper apply regardless of whether a comparison principle holds or
not. This increased applicability is achieved at negligible additional numerical costs. The key idea
is to construct a pathwise recursion associated with particular pairs of super- and subsolutions.
These pairs remain ordered even when such an ordering is not a generic property of super- and
subsolutions, i.e., when comparison is violated in general. Roughly speaking, whenever a violation
of comparison threatens to reverse the order of the upper and lower bound, the upper and lower
bound are simply exchanged on the respective path. Consequently, lower and upper bounds must
always be computed simultaneously. In particular, they can no longer be viewed as the respective
solutions of distinct primal and dual optimization problems.
For some applications like optimal stopping, the comparison principle is not an issue. For
others like the nonlinear pricing applications studied in Bender et al. (2015), the principle can
be set in force by a relatively mild truncation of the stochastic weights β. Dispensing with the
comparison principle allows us to avoid any truncation and the corresponding truncation error.
This matters because there is also a class of applications where the required truncation levels
are so low that truncation would fundamentally alter the problem. This concerns, in particular,
backward recursions associated with fully nonlinear second-order partial differential equations
such as the problem of pricing under volatility uncertainty which serves as a running example
throughout the paper. Solving equation (1) for the uncertain volatility model is well-known as
a challenging numerical problem (Guyon and Henry-Labordère, 2011; Alanko and Avellaneda,
2013), making an a posteriori evaluation of solutions particularly desirable.
The paper is organized as follows: Section 2 provides further motivation for the stochastic
dynamic programming equation (1) by discussing how it arises in a series of examples. Section
3 states our setting and key assumptions in detail and explores a partial connection between
equation (1) and a class of two-player games which makes some of the constructions of the later
sections easier to interpret. Section 4 presents our main results. We first discuss the restrictiveness
of assuming the comparison principle in the applications we are interested in. Afterwards, we
proceed to Theorem 4.5, the key result of the paper, providing a new pair of upper and lower
bounds which is applicable in the absence of a comparison principle. In Section 5, we relate
our results to the information relaxation duals of Brown et al. (2010). In particular, we show
that our bounds in the presence of the comparison principle can be reinterpreted in terms of
information relaxation bounds for stochastic two-player zero-sum games, complementing recent
work by Haugh and Wang (2015). Finally, in Section 6, we apply our results in the context
of two nonlinear valuation problems, option pricing in a four-dimensional interest rate model
with bilateral counterparty risk and in the uncertain volatility model. While such nonlinearities
in pricing due to counterparty risk and model uncertainty have received increased attention
since the financial crisis, Monte Carlo confidence bounds for these two problems were previously
unavailable. All proofs are postponed to the Appendix.
2 Examples
In this section, we briefly discuss several examples of backward dynamic programming equations
arising in option pricing problems, which are covered by the framework of our paper. These
include nonlinearities due to early exercise features, counterparty risk, and model uncertainty.
Example 2.1 (Bermudan options). Our first example is the optimal stopping problem, or, in
financial terms, the pricing problem of a Bermudan option. Given a stochastic process Sj , j =
0, . . . , J, which is adapted to the information given in terms of a filtration on an underlying
filtered probability space, one wishes to maximize the expected reward from stopping S, i.e.,
Y0 := sup_τ E[Sτ ],    (2)
where τ runs through the set of {0, . . . , J}-valued stopping times. If the expectation is taken with
respect to a pricing measure, under which all tradable and storable securities in an underlying
financial market are martingales, then Y0 is a fair price of a Bermudan option with discounted
payoff process given by S. It is well-known and easy to check that Y0 is the initial value of the
dynamic program
Yj = max{Ej [Yj+1 ], Sj },    YJ = SJ ,    (3)
which is indeed of the form (1). In the traditional primal-dual approach, the primal maximization
problem (2) is complemented by a dual minimization problem. This is the information relaxation
dual which was initiated for this problem independently by works of Rogers (2002) and Haugh
and Kogan (2004). It states that
Y0 = inf_M E[ sup_{j=0,...,J} (Sj − Mj ) ],    (4)
where M runs over the set of martingales which start in zero at time zero. In order to illustrate
our pathwise dynamic programming approach, we shall present an alternative proof of this dual
representation for optimal stopping in Example 4.2 below, which, in contrast to the arguments
in Rogers (2002) and Haugh and Kogan (2004), does not make use of specific properties of the
optimal stopping problem (2), but rather relies on the convexity of the max-operator in the
associated dynamic programming equation (3).
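The backward recursion (3) is straightforward to implement whenever the conditional expectations are available in closed form. The following sketch (a toy binomial example of our own choosing, not from the paper) prices a Bermudan put on a symmetric ±1 random walk, where Ej [Yj+1 ] is simply the average over the two successor nodes of the lattice:

```python
# Toy instance of the optimal stopping recursion (3): stop a symmetric
# binomial random walk X with payoff S_j = max(K - X_j, 0), a Bermudan put.
# K and J are illustrative choices; the point is only the backward recursion
# Y_j = max{E_j[Y_{j+1}], S_j} with terminal condition Y_J = S_J.

def bermudan_put_lattice(K=2.0, J=4):
    # Node (j, k) carries X = 2*k - j after k up-moves out of j steps.
    payoff = lambda x: max(K - x, 0.0)
    # Terminal condition Y_J = S_J.
    Y = [payoff(2 * k - J) for k in range(J + 1)]
    for j in range(J - 1, -1, -1):
        # E_j[Y_{j+1}] averages the two successor nodes with probability 1/2 each.
        cont = [0.5 * (Y[k] + Y[k + 1]) for k in range(j + 1)]
        Y = [max(cont[k], payoff(2 * k - j)) for k in range(j + 1)]
    return Y[0]  # Y_0, the value of the stopping problem

y0 = bermudan_put_lattice()  # = 2.125 for these parameters
```

Here Y0 = 2.125 exceeds the immediate payoff 2.0, i.e., continuing at time 0 is strictly better than stopping.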
Example 2.2 (Credit value adjustment). In order to explain the derivation of a discrete-time
dynamic programming equation of the form (1) for pricing under counterparty risk, we restrict
ourselves to a simplified setting, but refer, e.g., to the monographs by Brigo et al. (2013) or
Crépey et al. (2014) for more realistic situations: A party A and a counterparty B trade several
derivatives, all of which mature at the same time T . The random variable ξ denotes the (possibly negative) cumulative payoff of the basket of derivatives which A receives at time T . We
assume that ξ is measurable with respect to the market’s reference information which is given by
a filtration (Ft )0≤t≤T , and that only the counterparty B can default. Default occurs at an exponential waiting time τ with parameter γ which is independent of the reference filtration. The full
filtration (Gt )0≤t≤T is the smallest one which additionally to the reference information contains
observation of the default event, i.e. it satisfies {τ ≤ t} ∈ Gt for every t ∈ [0, T ]. A recovery
scheme X = (Xt )0≤t≤T describes the (possibly negative) amount of money which A receives, if
B defaults at time t. The process X is assumed to be adapted to the reference filtration. We
denote by D(t, s; κ) = e−κ(s−t) , t ≤ s, the discount factor for a rate κ and stipulate that the
constant r ≥ 0 is a good proxy to a risk-free rate. Then, standard calculations in this reduced
form approach (e.g. Duffie et al., 1996) show that the fair price for these derivatives under default
risk at time t is given by
Vt = 1{τ >t} E[ 1{τ >T } D(t, T ; r)ξ + 1{τ ≤T } D(t, τ ; r)Xτ | Gt ]
   = 1{τ >t} E[ D(t, T ; r + γ) ( ξ + ∫_t^T D(s, T ; −(r + γ)) γ Xs ds ) | Ft ]    (5)
   =: 1{τ >t} Yt ,
where expectation is again taken with respect to a pricing measure. We assume that, upon
default, the contract (consisting of the basket of derivatives) is replaced by the same one with a
new counterparty which has not yet defaulted but is otherwise identical to the old counterparty
B. This suggests the choice Xt = −(Yt )− + R(Yt )+ = Yt − (1 − R)(Yt )+ for some recovery rate
R ∈ [0, 1], where we write R for the recovery rate to distinguish it from the riskless rate r, and
where (·)± denote the positive part and the negative part, respectively. Substituting this
definition of X into (5) and integrating by parts leads to the following nonlinear backward
stochastic differential equation:

Yt = E[ ξ − ∫_t^T ( r Ys + γ(1 − R)(Ys )+ ) ds | Ft ].    (6)

Note that, by Proposition 3.4 in El Karoui et al. (1997), Y0 = inf_ρ E[ e^{−∫_0^T ρs ds} ξ ], where ρ runs
over the set of adapted processes with values in [r, r + γ(1 − R)]. Thus, the party A can minimize the
price by choosing the interest rate which is applied for the discounting within a range determined
by the riskless rate r, the credit spread γ of the counterparty, and the recovery rate R.
Discretizing (6) in time, we end up with the backward dynamic programming equation

Yj = (1 − r∆)E[Yj+1 | Ftj ] − γ(1 − R) (E[Yj+1 | Ftj ])+ ∆,    YJ = ξ,

where tj = j∆, j = 0, . . . , J, and ∆ = T /J. This equation is of the form (1) with βj ≡ 1,
Gj (z, y) = (1 − r∆)z − γ(1 − R)∆ (z)+ , and Fj (z) ≡ 1. We note that, starting with the work
by Zhang (2004), time discretization schemes for backward stochastic differential equations have
been intensively studied, see, e.g., the literature overview in Bender and Steiner (2012).
If one additionally takes the default risk and the funding cost of the party A into account, the
corresponding dynamic programming equation has both concave and convex nonlinearities, and
Y0 corresponds to the equilibrium value of a two-player game, see the discussion at the end of
Section 3. This is one reason to consider the more flexible concave-convex form (1). A numerical
example in the theoretical framework of Crépey et al. (2013) with random interest and default
rates is presented in Section 6 below.
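As a quick sanity check of this discretization (our own toy check, not taken from the paper), consider a deterministic terminal payoff ξ > 0, for which the conditional expectations are trivial: the recursion then discounts ξ at the rate r + γ(1 − R), where R denotes the recovery rate, and Y0 should converge to e^{−(r+γ(1−R))T} ξ as J → ∞.

```python
import math

# Minimal sanity check of the discretized recursion
# Y_j = (1 - r*dt) E_j[Y_{j+1}] - gamma*(1 - R) * (E_j[Y_{j+1}])^+ * dt.
# With a deterministic terminal payoff xi > 0, conditional expectations are
# trivial, and Y_0 should approach exp(-(r + gamma*(1 - R)) * T) * xi as the
# number of time steps J grows. All parameter values are illustrative.

def cva_recursion(xi, r, gamma, R, T, J):
    dt = T / J
    y = xi
    for _ in range(J):  # backward in time, J steps
        y = (1.0 - r * dt) * y - gamma * (1.0 - R) * max(y, 0.0) * dt
    return y

y0 = cva_recursion(xi=1.0, r=0.03, gamma=0.10, R=0.4, T=1.0, J=10_000)
limit = math.exp(-(0.03 + 0.10 * (1.0 - 0.4)) * 1.0)  # continuous-time limit
```

For a positive payoff the nonlinearity never switches sign, so the recursion reduces to repeated multiplication by 1 − (r + γ(1 − R))∆, which converges to the exponential discount factor.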
Example 2.3 (Uncertain volatility). Dynamic programming equations of the form (1) also appear
in the discretization of Hamilton-Jacobi-Bellman equations for stochastic control problems, see,
e.g., Fahim et al. (2011). We illustrate such an approach by the pricing problem of a European
option on a single asset under uncertain volatility, but the underlying discretization methodology
of Fahim et al. (2011) can, of course, be applied to a wide range of (multidimensional) stochastic
control problems in continuous time.
We suppose that the price of the asset is given under risk-neutral dynamics and in discounted
units by
Xtσ = x0 exp( ∫_0^t σu dWu − (1/2) ∫_0^t σu² du ),
where W denotes a Brownian motion, and the volatility σ is a stochastic process which is adapted
to the Brownian filtration (Ft )0≤t≤T . The pricing problem of a European option with payoff
function g under uncertain volatility then becomes the optimization problem
Y0 = sup_σ E[g(XTσ )],    (7)
where σ runs over the nonanticipating processes with values in [σlow , σup ] for some given constants
σlow < σup . This is the worst case price over all stochastic volatility processes ranging within
the interval [σlow , σup ], and thus reflects the volatility uncertainty. The study of this so-called
uncertain volatility model was initiated by Avellaneda et al. (1995) and Lyons (1995).
The corresponding Hamilton-Jacobi-Bellman equation can easily be transformed in such a
way that Y0 = v(0, 0) where v solves the fully nonlinear Cauchy problem
vt (t, x) = −(1/2) vxx (t, x) − max_{σ∈{σlow ,σup }} (1/2)(σ²/ρ̂² − 1) ( vxx (t, x) − ρ̂ vx (t, x) ),    (t, x) ∈ [0, T ) × R,
v(T, x) = g(x0 e^{ρ̂x−ρ̂²T /2} ),    x ∈ R,    (8)
for any (constant) reference volatility ρ̂ > 0 of one’s choice. Under suitable assumptions on
g, there exists a unique classical solution to (8) (satisfying appropriate growth conditions), see
Pham (2009). Let 0 = t0 < t1 < . . . < tJ = T again be an equidistant partition of the interval
[0, T ], where J ∈ N, and set ∆ = T /J. We approximate v by an operator splitting scheme,
which on each small time interval first solves the linear subproblem, i.e., a Cauchy problem for
the heat equation, and then corrects for the nonlinearity by plugging in the solution of the linear
subproblem. Precisely, for fixed J, let
y^J (x) = g(x0 e^{ρ̂x−ρ̂²T /2} ),    x ∈ R,
ȳ^j_t (t, x) = −(1/2) ȳ^j_xx (t, x),    ȳ^j (tj+1 , x) = y^{j+1} (x),    t ∈ [tj , tj+1 ), x ∈ R,    j = J − 1, . . . , 0,
y^j (x) = ȳ^j (tj , x) + ∆ max_{σ∈{σlow ,σup }} (1/2)(σ²/ρ̂² − 1) ( ȳ^j_xx (tj , x) − ρ̂ ȳ^j_x (tj , x) ),    x ∈ R.
Evaluating y^j (x) along the Brownian paths leads to Yj := y^j (Wtj ). Applying the Feynman-Kac representation for the heat equation, see e.g. Karatzas and Shreve (1991), repeatedly, one
observes that ȳ j (tj , Wtj ) = Ej [Yj+1 ], where Ej denotes the expectation conditional on Ftj . It is
well-known from Fournié et al. (1999) that the space derivatives of ȳ j (tj , ·) can be expressed via
the so-called Malliavin Monte-Carlo weights as
ȳ^j_x (tj , Wtj ) = Ej [ (∆Wj+1 /∆) Yj+1 ],    ȳ^j_xx (tj , Wtj ) = Ej [ ( ∆Wj+1²/∆² − 1/∆ ) Yj+1 ],
where ∆Wj = Wtj − Wtj−1 . (In our particular situation, one can verify this simply by re-writing
the conditional expectations as integrals on R with respect to the Gaussian density and integrating
by parts.) Hence, one arrives at the following discrete-time dynamic programming equation:
YJ = g(XTρ̂ ),
Yj = Ej [Yj+1 ] + ∆ max_{σ∈{σlow ,σup }} (1/2)(σ²/ρ̂² − 1) Ej [ ( ∆Wj+1²/∆² − ρ̂ ∆Wj+1 /∆ − 1/∆ ) Yj+1 ],    (9)
where XTρ̂ is the price of the asset at time T under the constant reference volatility ρ̂. This type
of time-discretization scheme was analyzed for a general class of fully nonlinear parabolic PDEs
by Fahim et al. (2011). In the particular case of the uncertain volatility model, the scheme was
suggested by Guyon and Henry-Labordère (2011) by a slightly different derivation.
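The Malliavin-weight identities behind this scheme are easy to test by simulation. For the explicit test function f(x) = x² (our own toy choice, with illustrative time step and sample size, not an example from the paper), the two weighted expectations should return the first and second space derivatives, 2x and 2:

```python
import math, random

# Monte Carlo check of the Malliavin weights used in the scheme above.
# For f(x) = x^2 one has, exactly,
#   E[(dW/dt) * f(x + dW)]             = 2*x   (first derivative)
#   E[(dW^2/dt^2 - 1/dt) * f(x + dW)]  = 2     (second derivative),
# which the sample averages below should reproduce up to Monte Carlo error.
# x, dt, n and the seed are toy choices.

def malliavin_derivatives(x=0.3, dt=1.0, n=200_000, seed=1):
    rng = random.Random(seed)
    f = lambda y: y * y
    d1 = d2 = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        d1 += (dw / dt) * f(x + dw)
        d2 += (dw * dw / dt ** 2 - 1.0 / dt) * f(x + dw)
    return d1 / n, d2 / n

d1, d2 = malliavin_derivatives()  # approx. (0.6, 2.0) for x = 0.3
```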
Let Gj (z, y) = y, sι = (1/2)(σι²/ρ̂² − 1) for ι ∈ {up, low}, and

Fj (z) = z (1) + ∆ max_{s∈{slow ,sup }} s z (2) ,
where z (i) denotes the i-th component of the two-dimensional vector z, and define the R2 -valued
process β by
βj = ( 1 , ∆Wj²/∆² − ρ̂ ∆Wj /∆ − 1/∆ )⊤ ,    j = 1, . . . , J.
With these choices, the dynamic programming equation for Y in (9) is of the form (1). We emphasize
that, although the recursive equation for Y stems from the discretization of the Hamilton-Jacobi-Bellman
equation of a continuous-time stochastic control problem, the discrete-time process Y may fail to be
the value process of a discrete-time control problem, since the weight

1 + ∆ (1/2)(σ²/ρ̂² − 1) ( ∆Wj+1²/∆² − ρ̂ ∆Wj+1 /∆ − 1/∆ )
can become negative with positive probability for σ = σlow or σ = σup , see Example 3.1 below.
In such situations, the general theory of information relaxation duals due to Brown et al. (2010)
cannot be applied to construct upper bounds on Y , but the pathwise dynamic programming
approach, which we present in Section 4, still applies.
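The negativity of this weight is easy to observe empirically. The following sketch (with σlow = 0.1 and ρ̂ = 0.2 as above; the time step, sample size and seed are our own illustrative choices) estimates how often the 'discount factor' is negative:

```python
import math, random

# Empirical check that the 'discount factor'
#   1 + dt * s * (dW^2/dt^2 - rho*dW/dt - 1/dt),  s = 0.5*(sigma^2/rho^2 - 1),
# can be negative with positive probability for sigma = sigma_low < rho,
# which is what rules out a direct control interpretation of scheme (9).

def neg_weight_frequency(sigma=0.1, rho=0.2, dt=1.0, n=100_000, seed=0):
    rng = random.Random(seed)
    s = 0.5 * (sigma ** 2 / rho ** 2 - 1.0)  # = -3/8 for sigma=0.1, rho=0.2
    count = 0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        weight = 1.0 + dt * s * (dw ** 2 / dt ** 2 - rho * dw / dt - 1.0 / dt)
        if weight < 0.0:
            count += 1
    return count / n

freq = neg_weight_frequency()  # a few percent of the samples are negative
```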
3 Setting
In this section, we introduce the general setting of the paper, which accommodates the examples
of the previous section. We then explain how the dynamic programming equation
(1) relates to a class of stochastic two-player games, which have a sound interpretation from the
point of view of option pricing. While establishing these connections has some intrinsic interest,
their main motivation is to make the constructions and results of Section 4 more tangible and
intuitive.
Throughout the paper we study the following type of concave-convex dynamic programming
equation on a complete filtered probability space (Ω, F, (Fj )j=0,...,J , P ) in discrete time:

YJ = ξ,    Yj = Gj (Ej [βj+1 Yj+1 ], Fj (Ej [βj+1 Yj+1 ])),    j = J − 1, . . . , 0,    (10)
where Ej [·] denotes the conditional expectation with respect to Fj . We assume
(C): For every j = 0, . . . , J − 1, Gj : Ω × RD × R → R and Fj : Ω × RD → R are measurable
and, for every (z, y) ∈ RD × R, the processes (j, ω) 7→ Gj (ω, z, y) and (j, ω) 7→ Fj (ω, z) are
adapted. Moreover, for every j = 0, . . . , J − 1 and ω ∈ Ω, the map (z, y) 7→ Gj (ω, z, y) is
concave in (z, y) and non-decreasing in y, and the map z 7→ Fj (ω, z) is convex in z.
(R):
• G and F are of polynomial growth in (z, y) in the following sense: There exist a
constant q ≥ 0 and a nonnegative adapted process (αj ) such that for all (z, y) ∈ RD+1
and j = 0, . . . , J − 1
|Gj (z, y)| + |Fj (z)| ≤ αj (1 + |z|q + |y|q ),
P -a.s.,
and αj ∈ Lp (Ω, P ) for every p ≥ 1.
• β = (βj )j=1,...,J is an adapted process such that βj ∈ Lp (Ω, P ) for every p ≥ 1 and
j = 1, . . . , J.
• The terminal condition ξ is an FJ -measurable random variable such that ξ ∈ Lp (Ω, P )
for every p ≥ 1.
In the following, we abbreviate by
• L∞− (RN ) the set of RN -valued random variables that are in Lp (Ω, P ) for all p ≥ 1,
• L∞−_j (RN ) the set of Fj -measurable random variables that are in L∞− (RN ),
• L∞−_ad (RN ) the set of adapted processes Z such that Zj ∈ L∞−_j (RN ) for every j = 0, . . . , J.
Thanks to the integrability assumptions on the terminal condition ξ and the weight process
β and thanks to the polynomial growth condition on F and G, it is straightforward to check
recursively that the (P -a.s. unique) solution Y to (10) belongs to L∞−_ad (R) under assumptions
(C) and (R). In order to simplify the exposition, we shall also use the following convention for
the remainder of the paper: Unless otherwise noted, all equations and inequalities are supposed
to hold P -almost surely.
For our main results in Section 4, we rewrite equation (10) relying on convex duality techniques. In the remainder of this section, we thus recall some concepts from convex analysis and
show that – in some cases – these techniques allow us to interpret the dynamic programming
equation (10) in terms of a class of two-player games. We recall that the convex conjugate of Fj
is given by
Fj# (u) := sup_{z∈RD} ( u⊤ z − Fj (z) ),    u ∈ RD ,

where (·)⊤ denotes matrix transposition. Note that Fj# can take the value +∞ and that the
maximization takes place pathwise, i.e., ω by ω. Analogously, for Gj the concave conjugate can
be defined in terms of the convex conjugate of −Gj as
Gj# (v (1) , v (0) ) := −(−Gj )# (−v (1) , −v (0) ) = inf_{(z,y)∈RD+1} ( (v (1) )⊤ z + v (0) y − Gj (z, y) ),    (v (1) , v (0) ) ∈ RD+1 ,
which can take the value −∞. By the Fenchel-Moreau theorem, e.g. in the form of Theorem 12.2
in Rockafellar (1970), one has, for every j = 0, . . . , J − 1, z ∈ RD , y ∈ R, and every ω ∈ Ω
Gj (ω, z, y) = inf_{(v (1) ,v (0) )∈RD+1} ( (v (1) )⊤ z + v (0) y − Gj# (ω, v (1) , v (0) ) ),    (11)

Fj (ω, z) = sup_{u∈RD} ( u⊤ z − Fj# (ω, u) ).    (12)
Here, the minimization and maximization can, of course, be restricted to the effective domains

D_{Gj# (ω,·)} = { (v (1) , v (0) ) ∈ RD+1 : Gj# (ω, v (1) , v (0) ) > −∞ },    D_{Fj# (ω,·)} = { u ∈ RD : Fj# (ω, u) < +∞ }

of Gj# and Fj# . Noting that v (0) ≥ 0 for (v (1) , v (0) ) ∈ D_{Gj# (ω,·)} by the monotonicity assumption
on G in the y-variable, the dynamic programming equation (10) can be rewritten in the form
Yj = inf_{(v (1) ,v (0) )∈D_{Gj#}} sup_{u∈D_{Fj#}} ( (v (1) + v (0) u)⊤ Ej [βj+1 Yj+1 ] − v (0) Fj# (u) − Gj# (v (1) , v (0) ) ),
which formally looks like the dynamic programming equation for a two-player game with a random
(Fj+1 -measurable!) and controlled ‘discount factor’ (v (1) + v (0) u)⊤ βj+1 between time j and j + 1.
To make this intuition precise, we next introduce sets of adapted ‘controls’ (or strategies) in
terms of F and G by
AFj = { (ri )i=j,...,J−1 : ri ∈ L∞−_i (RD ) and Fi# (ri ) ∈ L∞− (R) for i = j, . . . , J − 1 },
AGj = { (ρi (1) , ρi (0) )i=j,...,J−1 : (ρi (1) , ρi (0) ) ∈ L∞−_i (RD+1 ) and Gi# (ρi (1) , ρi (0) ) ∈ L∞− (R) for i = j, . . . , J − 1 }.    (13)

In Appendix A, we show that these sets of controls AGj and AFj are nonempty under the standing
assumptions (C) and (R). For j = 0, . . . , J, we consider the following multiplicative weights
(corresponding to the ‘discounting’ between time 0 and j)
wj (ω, v (1) , v (0) , u) = Π_{i=0}^{j−1} ( vi (1) + vi (0) ui )⊤ βi+1 (ω)    (14)

for ω ∈ Ω, v (1) = (v0 (1) , . . . , vJ−1 (1) ) ∈ (RD )J , v (0) = (v0 (0) , . . . , vJ−1 (0) ) ∈ RJ and u = (u0 , . . . , uJ−1 ) ∈
(RD )J . Assuming (for simplicity) that F0 is trivial, and (crucially) that the positivity condition
(v (1) + v (0) u)⊤ βj+1 (ω) ≥ 0    (15)

on the ‘discount factors’ holds for every ω ∈ Ω, (v (1) , v (0) ) ∈ D_{Gj# (ω,·)} and u ∈ D_{Fj# (ω,·)} , Y0
is indeed the equilibrium value of a (in general, non-Markovian) stochastic two-player zero-sum
game, namely,
Y0 = inf_{(ρ(1) ,ρ(0) )∈AG0} sup_{r∈AF0} E[ wJ (ρ(1) , ρ(0) , r)ξ − Σ_{j=0}^{J−1} wj (ρ(1) , ρ(0) , r) ( ρj (0) Fj# (rj ) + Gj# (ρj (1) , ρj (0) ) ) ]
   = sup_{r∈AF0} inf_{(ρ(1) ,ρ(0) )∈AG0} E[ wJ (ρ(1) , ρ(0) , r)ξ − Σ_{j=0}^{J−1} wj (ρ(1) , ρ(0) , r) ( ρj (0) Fj# (rj ) + Gj# (ρj (1) , ρj (0) ) ) ]
as is shown in Appendix C.2. In financial terms, we may think of wj (ρ(1) , ρ(0) , r) as a discrete-time price deflator (or as an approximation of a continuous-time price deflator given in terms of
a stochastic exponential which can incorporate both discounting in the real-world sense and a
change of measure). Then, the first term in
E wJ (ρ(1) , ρ(0) , r)ξ −
J−1
X

(0)
(1) (0) 
wj (ρ(1) , ρ(0) , r) ρj Fj# (rj ) + G#
j (ρj , ρj )
j=0
corresponds to the fair price of an option with payoff ξ in the price system determined by the
deflator wJ (ρ(1) , ρ(0) , r), which is to be chosen by the two players. The choice may come with an
additional running reward or cost which is formulated via the convex conjugates of F and G in the
second term of the above expression. With this interpretation, Y0 is the equilibrium price for an
option with payoff ξ, on which the two players agree. Of course, the above stochastic two-player
game degenerates to a stochastic maximization (resp. minimization) problem, if G (resp. F ) is
linear, and so one of the control sets becomes a singleton.
In the framework of the uncertain volatility model of Example 2.3, the following example
illustrates that the validity of this game-theoretic interpretation of the dynamic programming
equation (10) crucially depends on the positivity assumption (15).
Example 3.1. We consider the discrete-time dynamic programming equation (9) with T = 2,
J = 2, σlow = 0.1, σup = ρ̂ = 0.2 and (for simplicity) the artificial terminal condition Y2 := ξ :=
1{∆W1 ≤−3} 1{∆W2 ≤−3} . In this example, the positivity condition (15) is violated, since, with the
notation introduced in Example 2.3, we obtain for (1, slow ) = (1, −3/8) ∈ D_{Fj#} = {1} × [−3/8, 0],

(1, −3/8)⊤ βj+1 = 1 − (3/8)(∆Wj+1 )² + (3/40)∆Wj+1 + 3/8 =: aj+1 < −2 on {∆Wj+1 ≤ −3},
for j = 0, 1. As Y can be rewritten in the form
Yj = max{Ej [Yj+1 ], Ej [aj+1 Yj+1 ]},
one easily observes that (with Φ denoting the cumulative distribution function of a standard
normal),
Y1 = 1{∆W1 ≤−3} max{P ({∆W2 ≤ −3}), E1 [a2 1{∆W2 ≤−3} ]} = Φ(−3)1{∆W1 ≤−3}
Y0 = Φ(−3) max{P ({∆W1 ≤ −3}), E[a1 1{∆W1 ≤−3} ]} = Φ(−3)².

However, in this example F # ≡ 0, AF0 consists of the processes in L∞−_ad (R²) with values in
{1} × [−3/8, 0], and the multiplicative weight satisfies
w2 ((1, u1 ), (1, u2 )) = ( 1 + (8/3) u1 (1 − a1 ) ) ( 1 + (8/3) u2 (1 − a2 ) ).
Hence, Y0 is not the value of the maximization problem sup_{r∈AF0} E[w2 (r)ξ], because the
deterministic control r1 := r2 := (1, −3/8)⊤ achieves the reward

E[ a1 a2 1{∆W1 ≤−3} 1{∆W2 ≤−3} ] ≥ 4Φ(−3)² > Y0 .
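The numbers in this example can be reproduced in a few lines (a sketch of the computation under the stated parameters, using simple quadrature for the Gaussian tail integrals; not the paper's code):

```python
import math

# Numerical illustration of Example 3.1 (Delta = T/J = 1, rho = 0.2):
# a(x) = 1 - (3/8)x^2 + (3/40)x + 3/8 is the pathwise 'discount factor',
# and E[a(dW) 1{dW <= -3}]^2 exceeds 4*Phi(-3)^2 > Y_0 = Phi(-3)^2, so Y_0
# cannot be the value of the pathwise maximization problem.

def phi(x):   # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):   # standard normal cdf
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def a(x):
    return 1.0 - (3.0 / 8.0) * x * x + (3.0 / 40.0) * x + 3.0 / 8.0

def expect_a_tail(lo=-12.0, hi=-3.0, n=20_000):
    # trapezoidal approximation of E[a(dW) 1{dW <= -3}]; the tail below
    # lo = -12 is negligible for the standard normal density
    h = (hi - lo) / n
    total = 0.5 * (a(lo) * phi(lo) + a(hi) * phi(hi))
    total += sum(a(lo + k * h) * phi(lo + k * h) for k in range(1, n))
    return total * h

Y0 = Phi(-3.0) ** 2            # value of the dynamic program
reward = expect_a_tail() ** 2  # reward of the control r = (1, -3/8), by independence
```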
For most of the paper, we shall not assume the positivity condition (15), and hence go beyond
the above game-theoretic setting. Nonetheless, by a slight abuse of terminology, we will also refer
to members of the sets AGj and AFj as controls, when the positivity condition is violated.
4 Main results
In this section, we state and discuss the main contribution of the paper, the pathwise dynamic
programming approach of Theorem 4.5, which avoids the use of nested conditional expectations,
in order to construct tight upper and lower bounds on the solution Yj to the dynamic program
(10). Extending the ideas in Bender et al. (2015), this construction will be based on the concept
of supersolutions and subsolutions to the dynamic program (10).
Definition 4.1. A process Y up (resp. Y low ) ∈ L∞−_ad (R) is called supersolution (resp. subsolution)
to the dynamic program (10) if YJup ≥ YJ (resp. YJlow ≤ YJ ) and for every j = 0, . . . , J − 1 it
holds

Yjup ≥ Gj (Ej [βj+1 Yj+1up ], Fj (Ej [βj+1 Yj+1up ]))

(and with ‘≥’ replaced by ‘≤’ for a subsolution).
Before we explain a general construction method for such super- and subsolutions, we first
illustrate how the dual representation for optimal stopping in Example 2.1 can be derived by this
line of reasoning.
Example 4.2. Fix a martingale M. The dynamic version of (4) suggests to consider

    Θ^up_j := max_{i=j,...,J} (S_i − (M_i − M_j)),    Y^up_j = E_j[Θ^up_j].

The anticipative discrete-time process Θ^up clearly satisfies the recursive equation (pathwise
dynamic program)

    Θ^up_j = max{S_j, Θ^up_{j+1} − (M_{j+1} − M_j)},    Θ^up_J = S_J.
Hence, by convexity of the max-operator and Jensen's inequality, we obtain, thanks to the
martingale property of M and the tower property of conditional expectation,

    Y^up_j ≥ max{S_j, E_j[Θ^up_{j+1} − (M_{j+1} − M_j)]} = max{S_j, E_j[Y^up_{j+1}]},    Y^up_J = S_J.

Thus, Y^up is a supersolution to the dynamic program (3). By the monotonicity of the max-operator,
we observe by backward induction that Y^up_j ≥ Y_j, because (assuming that the claim is
already proved at time j + 1)

    Y^up_j ≥ max{S_j, E_j[Y^up_{j+1}]} ≥ max{S_j, E_j[Y_{j+1}]} = Y_j.
In particular, writing Y^up(M) instead of Y^up in order to emphasize the dependence on the fixed
martingale M, we get

    Y_0 ≤ inf_M Y^up_0(M) = inf_M E[max_{i=0,...,J} (S_i − (M_i − M_0))].    (16)
Defining M* to be the Doob martingale of Y, i.e. M*_{i+1} − M*_i = Y_{i+1} − E_i[Y_{i+1}], and
Θ^{up,*} = Θ^up(M*), it is straightforward to check by backward induction that Θ^{up,*}_j = Y_j
for every j = J, . . . , 0, as (assuming again that the claim is already proved for time j + 1)

    Θ^{up,*}_j = max{S_j, Θ^{up,*}_{j+1} − (Y_{j+1} − E_j[Y_{j+1}])} = max{S_j, E_j[Y_{j+1}]} = Y_j.

This turns the inequality in (16) into an equality.
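To make the construction concrete, the pathwise recursion Θ^up_j = max{S_j, Θ^up_{j+1} − (M_{j+1} − M_j)} can be run path by path. The following sketch is illustrative only (it is not the paper's implementation); the payoff paths S and martingale increments dM are hypothetical inputs from some concrete model.

```python
import numpy as np

def dual_upper_bound(S, dM):
    """Monte Carlo estimate of E[Theta^up_0(M)], an upper bound on Y_0.

    S  : array (n_paths, J+1), payoff values S_j along each path
    dM : array (n_paths, J), martingale increments M_{j+1} - M_j
    """
    theta = S[:, -1].copy()                  # Theta^up_J = S_J
    for j in range(S.shape[1] - 2, -1, -1):  # j = J-1, ..., 0
        theta = np.maximum(S[:, j], theta - dM[:, j])
    return theta.mean()
```

For M ≡ 0 the recursion collapses to the pathwise maximum of S, the crudest upper bound; the closer M is to the Doob martingale of Y, the tighter and less variable the estimate.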
This alternative proof of the dual representation for optimal stopping is by no means shorter
or simpler than the original one by Rogers (2002) and Haugh and Kogan (2004), which relies on
the optional sampling theorem and the supermartingale property of Y. It comes, however, with
the advantage that it generalizes immediately to dynamic programming equations which share
the same convexity and monotonicity properties. Indeed, if G_j(z, y) = y in our general setting
and we, thus, consider the convex dynamic program

    Y_j = F_j(E_j[β_{j+1} Y_{j+1}]),    j = J − 1, . . . , 0,    Y_J = ξ,    (17)
we can write the corresponding pathwise dynamic program

    Θ^up_j = F_j(β_{j+1} Θ^up_{j+1} − (M_{j+1} − M_j)),    Θ^up_J = ξ    (18)

for M ∈ M_D, i.e., for D-dimensional martingales which are members of L^{∞−}_{ad}(R^D). Then, the
same argument as above, based on convexity and Jensen's inequality, implies that Y^up_j = E_j[Θ^up_j]
is a supersolution to the dynamic program (17). A similar construction of a subsolution can be
built on classical Fenchel duality. Indeed, consider, for every r ∈ A^F_0, the linear pathwise
dynamic program

    Θ^low_j = r_j^⊤ β_{j+1} Θ^low_{j+1} − F^#_j(r_j),    Θ^low_J = ξ.    (19)

Then, by (12), it is easy to check that Y^low_j = E_j[Θ^low_j] defines a subsolution.
These observations already led to the construction of tight upper and lower bounds for
discrete-time backward stochastic differential equations (BSDEs) with convex generators in
Bender et al. (2015). In the latter paper, the authors exploit that, in their setting, the special
form of F stemming from a Lipschitz BSDE implies – after a truncation of the Brownian
increments – a similar monotonicity property as in the Bermudan option case.
In the general setting of the present paper, we would have to assume the following comparison
principle to ensure that the supersolution Y^up_j and the subsolution Y^low_j indeed constitute an
upper bound and a lower bound for Y_j.

(Comp): For every supersolution Y^up and every subsolution Y^low to the dynamic program (10)
it holds that

    Y^up_j ≥ Y^low_j,    P-a.s.,

for every j = 0, . . . , J.
The following characterization shows that the comparison principle can be rather restrictive
and is our motivation to remove it below through a more careful construction of suitable pathwise
dynamic programs.
Theorem 4.3. Suppose (R) and that the dynamic program is convex, i.e., (C) is in force with
G_j(z, y) = y for j = 0, . . . , J − 1. Then, the following assertions are equivalent:

(a) The comparison principle (Comp) holds.

(b) For every r ∈ A^F_0 the following positivity condition is fulfilled: For every j = 0, . . . , J − 1,

    r_j^⊤ β_{j+1} ≥ 0,    P-a.s.

(c) For every j = 0, . . . , J − 1 and any two random variables Y^(1), Y^(2) ∈ L^{∞−}(R) with
Y^(1) ≥ Y^(2) P-a.s., the monotonicity condition

    F_j(E_j[β_{j+1} Y^(1)]) ≥ F_j(E_j[β_{j+1} Y^(2)]),    P-a.s.,

is satisfied.
Theorem 4.3 provides three equivalent formulations for the comparison principle. Formulation
(b) illustrates the restrictiveness of the principle most clearly. For instance, when β contains
unbounded entries with random sign, comparison can only hold in degenerate cases. In some
applications, problems of this type can be resolved by truncation of the weights at the cost of
a small additional error. Yet in applications such as pricing under volatility uncertainty, the
comparison principle may fail (even under truncation). This is demonstrated in the following
example which can be interpreted as a more general and systematic version of the calculation in
Example 3.1.
Example 4.4. Recall the setting and the dynamic programming equation (9) from Example 2.3.
In this setting, thanks to condition (c) in Theorem 4.3, comparison boils down to the requirement
that the prefactor

    1 + s_j (ΔW^2_{j+1}/Δ − ρ̂ ΔW_{j+1} − 1)

of Y_{j+1} in the equation (9) for Y_j is P-almost surely nonnegative for both of the feasible
values of s_j,

    s_j ∈ { (1/2)(σ^2_low/ρ̂^2 − 1), (1/2)(σ^2_up/ρ̂^2 − 1) }.
For s_j > 1, this requirement is violated for realizations of ΔW_{j+1} sufficiently close to zero,
while for s_j < 0 violations occur for sufficiently negative realizations of the Brownian increment
– and this violation also takes place if one truncates the Brownian increments at ±const.·√Δ with
an arbitrarily large constant. Consequently, we arrive at the necessary conditions ρ̂ ≤ σ_low and
ρ̂ ≥ σ_up/√3 for comparison to hold. For σ_low = 0.1 and σ_up = 0.2, the numerical test case
in Guyon and Henry-Labordère (2011) and Alanko and Avellaneda (2013), these two conditions
cannot hold simultaneously, ruling out the possibility of a comparison principle.
Our aim is, thus, to modify the above construction of supersolutions and subsolutions in such
a way that comparison still holds for these particular pairs of supersolutions and subsolutions,
although the general comparison principle (Comp) may be violated. At the same time, we
generalize from the convex structure to the concave-convex structure.
To this end, we consider the following 'coupled' pathwise recursion. Given (ρ^(1), ρ^(0)) ∈ A^G_j,
r ∈ A^F_j and M ∈ M_D, define the (in general) nonadapted processes θ^up_i = θ^up_i(ρ^(1), ρ^(0), r, M)
and θ^low_i = θ^low_i(ρ^(1), ρ^(0), r, M), i = j, . . . , J, via the 'pathwise dynamic program'
    θ^up_J = θ^low_J = ξ,

    θ^up_i = ((ρ^(1)_i)^⊤ β_{i+1})_+ θ^up_{i+1} − ((ρ^(1)_i)^⊤ β_{i+1})_− θ^low_{i+1} − (ρ^(1)_i)^⊤ ΔM_{i+1}
             + ρ^(0)_i max_{ι∈{up,low}} F_i(β_{i+1} θ^ι_{i+1} − ΔM_{i+1}) − G^#_i(ρ^(1)_i, ρ^(0)_i),

    θ^low_i = min_{ι∈{up,low}} G_i( β_{i+1} θ^ι_{i+1} − ΔM_{i+1},
             (r_i^⊤ β_{i+1})_+ θ^low_{i+1} − (r_i^⊤ β_{i+1})_− θ^up_{i+1} − r_i^⊤ ΔM_{i+1} − F^#_i(r_i) ),    (20)
where ΔM_i = M_i − M_{i−1}. We emphasize that these two recursion formulas are coupled, as
θ^low_{i+1} enters the defining equation for θ^up_i and θ^up_{i+1} enters the defining equation for
θ^low_i. Roughly speaking, the rationale of this coupled recursion is to replace θ^up_{j+1} by
θ^low_{j+1} in the upper bound recursion at time j, whenever the violation of the comparison
principle threatens to reverse the order between upper bound recursion and lower bound recursion.
Due to this coupling, the elementary argument based on convexity and Fenchel duality outlined at
the beginning of this section does not apply anymore; instead, a careful analysis is required to
disentangle the influences of the upper and lower bound recursions (see the proofs in the appendix).
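For orientation, one backward sweep of the coupled recursion (20) along a single simulated path can be sketched as follows for a scalar driver (D = 1). This is an illustrative sketch, not the authors' code: the callables F, Fsharp, G, Gsharp and the control sequences are problem-specific inputs, and all names are assumptions of this example.

```python
def coupled_recursion(xi, beta, dM, rho1, rho0, r, F, Fsharp, G, Gsharp):
    """One path of the coupled pathwise dynamic program (20) with D = 1.

    xi                : terminal value xi on this path
    beta, dM          : beta_{i+1} and Delta M_{i+1}, i = 0, ..., J-1
    rho1, rho0, r     : control values (rho0[i] >= 0)
    F(i, z), G(i, z, y)          : the functions F_i and G_i
    Fsharp(i, r), Gsharp(i, a, b): the conjugate terms F_i^# and G_i^#
    Returns (theta_up_0, theta_low_0).
    """
    up = low = xi                                 # theta^up_J = theta^low_J = xi
    for i in reversed(range(len(beta))):
        b, dm = beta[i], dM[i]
        pos, neg = max(rho1[i] * b, 0.0), max(-rho1[i] * b, 0.0)
        new_up = (pos * up - neg * low - rho1[i] * dm
                  + rho0[i] * max(F(i, b * up - dm), F(i, b * low - dm))
                  - Gsharp(i, rho1[i], rho0[i]))
        qos, qeg = max(r[i] * b, 0.0), max(-r[i] * b, 0.0)
        y = qos * low - qeg * up - r[i] * dm - Fsharp(i, r[i])
        new_low = min(G(i, b * up - dm, y), G(i, b * low - dm, y))
        up, low = new_up, new_low
    return up, low
```

In the convex case G_i(z, y) = y with M ≡ 0 the sweep reproduces the elementary bounds from the beginning of this section, and θ^low_0 ≤ θ^up_0 holds along every path.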
Our main result on the construction of tight upper and lower bounds for the concave-convex
dynamic program (10) in absence of the comparison principle now reads as follows:

Theorem 4.5. Suppose (C) and (R). Then, for every j = 0, . . . , J,

    Y_j = essinf_{(ρ^(1),ρ^(0))∈A^G_j, r∈A^F_j, M∈M_D} E_j[θ^up_j(ρ^(1), ρ^(0), r, M)]
        = esssup_{(ρ^(1),ρ^(0))∈A^G_j, r∈A^F_j, M∈M_D} E_j[θ^low_j(ρ^(1), ρ^(0), r, M)],    P-a.s.

For any (ρ^(1), ρ^(0)) ∈ A^G_j, r ∈ A^F_j, and M ∈ M_D, we have the P-almost sure relation

    θ^low_i(ρ^(1), ρ^(0), r, M) ≤ θ^up_i(ρ^(1), ρ^(0), r, M)
for every i = j, . . . , J. Moreover,

    Y_j = θ^up_j(ρ^(1,*), ρ^(0,*), r*, M*) = θ^low_j(ρ^(1,*), ρ^(0,*), r*, M*)    (21)

P-almost surely, for every (ρ^(1,*), ρ^(0,*)) ∈ A^G_j and r* ∈ A^F_j satisfying the duality relations

    (ρ^(1,*)_i)^⊤ E_i[β_{i+1} Y_{i+1}] + ρ^(0,*)_i F_i(E_i[β_{i+1} Y_{i+1}]) − G^#_i(ρ^(1,*)_i, ρ^(0,*)_i)
        = G_i(E_i[β_{i+1} Y_{i+1}], F_i(E_i[β_{i+1} Y_{i+1}]))    (22)

and

    (r*_i)^⊤ E_i[β_{i+1} Y_{i+1}] − F^#_i(r*_i) = F_i(E_i[β_{i+1} Y_{i+1}])    (23)

P-almost surely for every i = j, . . . , J − 1, and with M* being the Doob martingale of βY, i.e.,

    M*_k = Σ_{i=0}^{k−1} (β_{i+1} Y_{i+1} − E_i[β_{i+1} Y_{i+1}]),    P-a.s., for every k = 0, . . . , J.
We note that, by Lemma A.1, optimizers (ρ^(1,*), ρ^(0,*)) ∈ A^G_j and r* ∈ A^F_j which solve the
duality relations (22) and (23) do exist. These duality relations also provide some guidance on
how to choose nearly optimal controls in numerical implementations of the method: When an
approximate solution Ỹ of the dynamic program is available, controls can be chosen such that
they (approximately) fulfill the analogs of (22) and (23) with Ỹ in place of the unknown true
solution. Likewise, M can be chosen as an approximate Doob martingale of β Ỹ . Moreover, the
almost sure optimality property in (21) suggests that (as is the case for the dual upper bounds
for optimal stopping) a Monte Carlo implementation benefits from a low variance, when the
approximate solution Ỹ to the dynamic program (10) is sufficiently close to the true solution Y .
As shown in the next proposition, the processes E_j[θ^up_j] and E_j[θ^low_j] in Theorem 4.5 indeed
define super- and subsolutions to the dynamic program (10) which are ordered, even though the
comparison principle need not hold:

Proposition 4.6. Under the assumptions of Theorem 4.5, the processes Y^up and Y^low given by

    Y^up_j = E_j[θ^up_j(ρ^(1), ρ^(0), r, M)]    and    Y^low_j = E_j[θ^low_j(ρ^(1), ρ^(0), r, M)],    j = 0, . . . , J,

are, respectively, super- and subsolutions to (10) for every (ρ^(1), ρ^(0)) ∈ A^G_0, r ∈ A^F_0, and
M ∈ M_D.
We close this section by stating a simplified and decoupled pathwise dynamic programming
approach, which can be applied if the comparison principle is known to be in force.
Theorem 4.7. Suppose (C), (R), and (Comp), and fix j = 0, . . . , J. For every (ρ^(1), ρ^(0)) ∈ A^G_j
and M ∈ M_D define Θ^up_j = Θ^up_j(ρ^(1), ρ^(0), M) by the pathwise dynamic program

    Θ^up_i = (ρ^(1)_i)^⊤ (β_{i+1} Θ^up_{i+1} − (M_{i+1} − M_i))
             + ρ^(0)_i F_i(β_{i+1} Θ^up_{i+1} − (M_{i+1} − M_i)) − G^#_i(ρ^(1)_i, ρ^(0)_i)

for i = J − 1, . . . , j, initiated at Θ^up_J = ξ. Moreover, for r ∈ A^F_j and M ∈ M_D define
Θ^low_j = Θ^low_j(r, M) for i = J − 1, . . . , j by the pathwise dynamic program

    Θ^low_i = G_i( β_{i+1} Θ^low_{i+1} − (M_{i+1} − M_i),
                   r_i^⊤ (β_{i+1} Θ^low_{i+1} − (M_{i+1} − M_i)) − F^#_i(r_i) )

initiated at Θ^low_J = ξ. Then,

    Y_j = essinf_{(ρ^(1),ρ^(0))∈A^G_j, M∈M_D} E_j[Θ^up_j(ρ^(1), ρ^(0), M)]
        = esssup_{r∈A^F_j, M∈M_D} E_j[Θ^low_j(r, M)],    P-a.s.
Moreover,

    Y_j = Θ^up_j(ρ^(1,*), ρ^(0,*), M*) = Θ^low_j(r*, M*)    (24)

P-almost surely, for every (ρ^(1,*), ρ^(0,*)) ∈ A^G_j and r* ∈ A^F_j satisfying the duality relations
(22)–(23) and with M* being the Doob martingale of βY.
5 Relation to information relaxation duals
In this section, we relate a special case of our pathwise dynamic programming approach to the
information relaxation dual for the class of stochastic two-player games discussed in Section
3. Throughout the section, we assume that the positivity condition (15) holds and that no
information is available at time zero, i.e., F0 is trivial. Then, by virtue of Proposition B.1,
the comparison principle (Comp) is in force and we can apply the simplified pathwise dynamic
programming approach of Theorem 4.7.
Under assumption (15), Y_0 is the equilibrium value of a stochastic two-player zero-sum game,
namely,

    Y_0 = inf_{(ρ^(1),ρ^(0))∈A^G_0} sup_{r∈A^F_0} E[ w_J(ρ^(1), ρ^(0), r) ξ
            − Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), r) (ρ^(0)_j F^#_j(r_j) + G^#_j(ρ^(1)_j, ρ^(0)_j)) ]
        = sup_{r∈A^F_0} inf_{(ρ^(1),ρ^(0))∈A^G_0} E[ w_J(ρ^(1), ρ^(0), r) ξ
            − Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), r) (ρ^(0)_j F^#_j(r_j) + G^#_j(ρ^(1)_j, ρ^(0)_j)) ],

as seen in Section 3, where the multiplicative weights w_j are defined in (14). Now, assume that
player 1 fixes her control (ρ^(1), ρ^(0)) ∈ A^G_0. Then,

    Y_0 ≤ sup_{r∈A^F_0} E[ w_J(ρ^(1), ρ^(0), r) ξ
            − Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), r) (ρ^(0)_j F^#_j(r_j) + G^#_j(ρ^(1)_j, ρ^(0)_j)) ].
One can next apply the information relaxation dual of Brown et al. (2010) (with strong duality),
which states that the maximization problem on the right-hand side equals

    inf_p E[ sup_{u∈(R^D)^J} ( w_J(ρ^(1), ρ^(0), u) ξ
            − Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), u) (ρ^(0)_j F^#_j(u_j) + G^#_j(ρ^(1)_j, ρ^(0)_j)) − p(u) ) ],

where the infimum runs over the set of all dual-feasible penalties, i.e., all mappings
p : Ω × (R^D)^J → R ∪ {+∞} which satisfy E[p(r)] ≤ 0 for every r ∈ A^F_0, i.e., for every adapted
and admissible control r of player 2. Notice that, for every martingale M ∈ M_D, the map

    p_M : u ↦ Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), u) (ρ^(1)_j + ρ^(0)_j u_j)^⊤ ΔM_{j+1}
is a dual-feasible penalty, since, for adapted controls r ∈ A^F_0,

    E[ Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), r) (ρ^(1)_j + ρ^(0)_j r_j)^⊤ ΔM_{j+1} ]
        = Σ_{j=0}^{J−1} E[ w_j(ρ^(1), ρ^(0), r) (ρ^(1)_j + ρ^(0)_j r_j)^⊤ E_j[ΔM_{j+1}] ] = 0
by the martingale property of M and the tower property of the conditional expectation. We shall
refer to penalties of that particular form as martingale penalties. It turns out (see Appendix C.1
for the detailed argument) that, given the fixed controls (ρ^(1), ρ^(0)) ∈ A^G_0 of player 1, one
has, for every martingale M ∈ M_D,

    Θ^up_0(ρ^(1), ρ^(0), M) = sup_{u∈(R^D)^J} ( w_J(ρ^(1), ρ^(0), u) ξ
            − Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), u) (ρ^(0)_j F^#_j(u_j) + G^#_j(ρ^(1)_j, ρ^(0)_j)) − p_M(u) ),    (25)
i.e., the explicit and straightforward-to-implement pathwise dynamic program Θ^up(ρ^(1), ρ^(0), M)
in Theorem 4.7 solves the pathwise maximization problem in the information relaxation dual for
the subclass of martingale penalties. We emphasize that, in contrast, for general choices of the
penalties, solving the pathwise maximization problem may turn out to be a computational
bottleneck of the information relaxation approach; compare, e.g., the discussion in Section 4.2 of
Brown and Smith (2011) and in Section 2.3 of Haugh and Lim (2012).
In view of (25), Theorem 4.7 implies

    Y_0 = inf_{(ρ^(1),ρ^(0))∈A^G_0, M∈M_D} E[ sup_{u∈(R^D)^J} ( w_J(ρ^(1), ρ^(0), u) ξ
            − Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), u) (ρ^(0)_j F^#_j(u_j) + G^#_j(ρ^(1)_j, ρ^(0)_j)) − p_M(u) ) ],

i.e., one still obtains strong duality when the minimization is restricted from the set of all
dual-feasible penalties to the computationally tractable subset of martingale penalties.
Summarizing, when assuming the positivity condition (15), we can interpret the upper bound
E[Θ^up_0(ρ^(1), ρ^(0), M)] in such a way that, first, player 1 fixes her strategy (ρ^(1), ρ^(0)) and
the penalty by the choice of the martingale M, while, then, player 2 is allowed to maximize the
penalized problem pathwise. An analogous representation and interpretation holds, of course, for
the lower bound E[Θ^low_0(r, M)]. Indeed, given a control r ∈ A^F_0 and a martingale M ∈ M_D,

    Θ^low_0(r, M) = inf_{(v^(1)_j, v^(0)_j)∈(R^{D+1})^J} ( w_J(v^(1), v^(0), r) ξ
            − Σ_{j=0}^{J−1} w_j(v^(1), v^(0), r) ( v^(0)_j F^#_j(r_j)
            + (v^(1)_j + v^(0)_j r_j)^⊤ ΔM_{j+1} + G^#_j(v^(1)_j, v^(0)_j) ) ),
where now the map

    (v^(1), v^(0)) ↦ Σ_{j=0}^{J−1} w_j(v^(1), v^(0), r) (v^(1)_j + v^(0)_j r_j)^⊤ ΔM_{j+1}

is a dual-feasible penalty in the sense of Brown et al. (2010).
In this way, we end up with the information relaxation dual of Brown et al. (2010) for each
player given that the other player has fixed a control, but with the additional twist that, for our
particular class of (possibly non-Markovian) two-player games, minimization can be restricted to
the tractable subclass of martingale penalties. The general procedure explained above is analogous
to the recent information relaxation approach by Haugh and Wang (2015) for two-player games
in a classical Markovian framework which dates back to Shapley (1953), but in their setup Haugh
and Wang (2015) have to take minimization over all dual-feasible penalties into account.
As an important example, the setting of this section covers the problem of pricing convertible
bonds, which (in its simplest form) is a stopping game governed by the dynamic programming
equation

    Y_j = min{U_j, max{L_j, E_j[Y_{j+1}]}},    Y_J = L_J,

for adapted processes L_j ≤ U_j. For this problem, representations in the sense of pathwise
optimal control were previously studied by Kühn et al. (2007) in continuous time and by Beveridge
and Joshi (2011) in discrete time.
6 Applications
In this section, we apply the pathwise dynamic programming methodology to calculate upper
and lower bounds for two nonlinear pricing problems, pricing of a long-term interest rate product
under bilateral counterparty risk, and option pricing under volatility uncertainty. Traditionally,
the valuation of American options was by far the most prominent nonlinear pricing problem
both in practice and in academia. In the wake of the financial crisis, various other sources of
nonlinearity such as model uncertainty, default risk, liquidity problems or transaction costs have
received considerable attention, see the recent monographs Brigo et al. (2013), Crépey (2013),
and Guyon and Henry-Labordère (2013).
6.1 Bilateral counterparty risk
Suppose that two counterparties have agreed to exchange a stream of payments Ctj over the
sequence of time points t0 , . . . , tJ . For many common interest rate products such as swaps, the
sign of C is random – so that the agreed payments may flow in either direction. Therefore,
a consistent pricing approach must take into account bilateral default risk, thus introducing
nonlinearities into the arising recursive pricing equations which are in general neither convex nor
concave. In this respect, the present setting is conceptually similar but more complex than the
model of unilateral counterparty risk in Example 2.2. We refer to Crépey et al. (2013) for technical
background and an in-depth discussion of the intricacies of pricing under bilateral counterparty
risk and funding. We notice that, by discretizing their equations (2.14) and (3.8), we arrive at
the following nonlinear backward dynamic program for the value Y_j of the product at time t_j
(given that there was no default prior to t_j) associated with the payment stream C, namely,
Y_J = C_{t_J} and

    Y_j = (1 − Δ(r_{t_j} + γ_{t_j}(1 − r)(1 − 2p_{t_j}) + λ)) E_j[Y_{j+1}]
          + Δ(γ_{t_j}(1 − r)(1 − 3p_{t_j}) + λ̄ − λ) E_j[Y_{j+1}]_+ + C_{t_j}.
Here E_j denotes the expectation conditional on the market's reference filtration up to time t_j
(i.e., the full market information is obtained by enlarging this filtration with the information
whether default has yet occurred or not). In focusing on equation (3.8) in Crépey et al. (2013),
we consider a pre-default CSA recovery scheme without collateralization; see their paper for
background. In the pricing equation, r_t denotes (a proxy to) the risk-less short rate at time t.
The rate at which a default of either side occurs at time t is denoted by γ_t. Moreover, p_t is the
associated conditional probability that it is the counterparty who defaults, if default occurs at
time t. We rule out simultaneous default, so that own default happens with conditional probability
1 − p_t, and assume that the three parameters ρ, ρ̄ and r associated with partial recovery from
Crépey et al. (2013) are identical and denoted by r. Finally, λ and λ̄ are two constants associated
with the costs of external lending and borrowing, and Δ = t_{j+1} − t_j is the stepsize of the
equidistant time partition.
Defining g_j = 1 − Δ(r_{t_j} + γ_{t_j}(1 − r)(1 − 2p_{t_j}) + λ) and
h_j = Δ(γ_{t_j}(1 − r)(1 − 3p_{t_j}) + λ̄ − λ), we can express the recursion for Y in terms of a
concave function G_j and a convex function F_j by setting F_j(z) = z_+ and

    G_j(z, y) = g_j z + (h_j)_+ y − (h_j)_− z_+ + C_{t_j},

so that Y_j = G_j(E_j[Y_{j+1}], F_j(E_j[Y_{j+1}])). Denote by (Ỹ_j) a numerical approximation of
the process (Y_j), by (Q̃_j) a numerical approximation of (E_j[Y_{j+1}]), and by (M̃_j) a martingale
which we think of as an approximation to the Doob martingale of Y. In terms of these inputs, the
pathwise recursions (20) for the upper and lower bounds are given by
    θ^up_j = (ρ̃^(1)_j)_+ (θ^up_{j+1} − ΔM̃_{j+1}) − (ρ̃^(1)_j)_− (θ^low_{j+1} − ΔM̃_{j+1})
             + ρ̃^(0)_j (θ^up_{j+1} − ΔM̃_{j+1})_+ + C_{t_j}

and

    θ^low_j = min_{ι∈{up,low}} ( g_j (θ^ι_{j+1} − ΔM̃_{j+1}) − (h_j)_− (θ^ι_{j+1} − ΔM̃_{j+1})_+ )
             + (h_j)_+ s̃_j (θ^low_{j+1} − ΔM̃_{j+1}) + C_{t_j},

where

    (ρ̃^(1)_j, ρ̃^(0)_j, s̃_j) = (g_j, (h_j)_+, 0)              if Q̃_j < 0,
    (ρ̃^(1)_j, ρ̃^(0)_j, s̃_j) = (g_j − (h_j)_−, (h_j)_+, 1)    if Q̃_j ≥ 0.
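A single outer-path sweep of these two recursions can be sketched as follows; this is an illustrative sketch under the reconstruction above, and the array and variable names are stand-ins for the simulated quantities, not the authors' implementation.

```python
def cva_bounds_path(C, Q, dM, g, h, xi):
    """One outer path of the counterparty-risk recursions for theta^up / theta^low.

    C, g, h : length-J arrays of cash flows C_{t_j} and coefficients g_j, h_j
    Q       : length-J array, approximation Q~_j of E_j[Y_{j+1}] (selects the controls)
    dM      : length-J array of approximate Doob-martingale increments
    xi      : terminal value C_{t_J}
    Returns (theta_up_0, theta_low_0).
    """
    up = low = xi
    for j in reversed(range(len(C))):
        # controls from the approximate duality relations, keyed on the sign of Q~_j
        rho1 = g[j] if Q[j] < 0 else g[j] - max(-h[j], 0.0)
        rho0 = max(h[j], 0.0)
        s = 0.0 if Q[j] < 0 else 1.0
        zu, zl = up - dM[j], low - dM[j]
        up_new = (max(rho1, 0.0) * zu - max(-rho1, 0.0) * zl
                  + rho0 * max(zu, 0.0) + C[j])
        low_new = (min(g[j] * zu - max(-h[j], 0.0) * max(zu, 0.0),
                       g[j] * zl - max(-h[j], 0.0) * max(zl, 0.0))
                   + rho0 * s * zl + C[j])
        up, low = up_new, low_new
    return up, low
```

When h_j ≡ 0 the recursion is linear and both bounds coincide; in general the coupling keeps θ^low_0 ≤ θ^up_0 on every path.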
For the payment stream C_{t_j}, we consider a swap with notional N, fixed rate R and an
equidistant sequence of tenor dates T = {T_0, . . . , T_K} ⊆ {t_0, . . . , t_J}. Denote by δ the length
of the time interval between T_i and T_{i+1} and by P(T_{i−1}, T_i) the T_{i−1}-price of a zero-bond
with maturity T_i. Then, the payment process C_{t_j} is given by

    C_{T_i} = N · (1/P(T_{i−1}, T_i) − (1 + Rδ))

for T_i ∈ T \ {T_0} and C_{t_j} = 0 otherwise; see Brigo and Mercurio (2006), Chapter 1.
For r and γ, we implement the model of Brigo and Pallavicini (2007), assuming that the
risk-neutral dynamics of r is given by a two-factor Gaussian short rate model, a reparametrization
of the two-factor Hull-White model, while γ is a CIR process. For the conditional default
probabilities p_t we assume p_t = (p̃_t ∨ 0) ∧ 1, where p̃ is an Ornstein-Uhlenbeck process. In
continuous time, this corresponds to the system of stochastic differential equations

    dx_t = −κ_x x_t dt + σ_x dW^x_t,    dy_t = −κ_y y_t dt + σ_y dW^y_t,
    dγ_t = κ_γ(μ_γ − γ_t) dt + σ_γ √γ_t dW^γ_t,    dp̃_t = κ_p(μ_p − p̃_t) dt + σ_p dW^p_t,

with r_t = r_0 + x_t + y_t and x_0 = y_0 = 0. Here, W^x, W^y and W^γ are Brownian motions with
instantaneous correlations ρ_xy, ρ_xγ and ρ_yγ. In addition, we assume that
W^p_t = ρ_γp W^γ_t + √(1 − ρ²_γp) W_t, where the Brownian motion W is independent of
(W^x, W^y, W^γ). We choose the filtration generated by the four Brownian motions as the
reference filtration.
For the dynamics of x, y and p̃, exact time discretizations are available in closed form. We
discretize γ using the fully truncated scheme of Lord et al. (2010). The bond prices P (t, s) are
given as an explicit function of xt and yt in this model. This implies that the swap’s “clean price”,
i.e., the price in the absence of counterparty risk is given in closed form as well, see Sections 1.5
and 4.2 of Brigo and Mercurio (2006).
We consider 60 half-yearly payments over a horizon of T = 30 years, i.e., δ = 0.5. J is always
chosen as an integer multiple of 60, so that δ is an integer multiple of Δ = T/J. For the model
parameters, we choose

    (r_0, κ_x, σ_x, κ_y, σ_y) = (0.03, 0.0558, 0.0093, 0.5493, 0.0138),
    (γ_0, μ_γ, κ_γ, σ_γ, p_0, μ_p, κ_p, σ_p) = (0.0165, 0.026, 0.4, 0.14, 0.5, 0.5, 0.8, 0.2),
    (ρ_xy, ρ_xγ, ρ_yγ, r, λ, λ̄, N) = (−0.7, 0.05, −0.7, 0.4, 0.015, 0.045, 1).
We thus largely follow Brigo and Pallavicini (2007) for the parametrization of r and γ but leave
out their calibration to initial market data and choose slightly different correlations to avoid the
extreme cases of a perfect correlation or independence of r and γ. The remaining parameters J,
R and ργp are varied in the numerical experiments below.
Except for the different recursions for θ^up and θ^low, the overall numerical procedure is similar
to the one in Bender et al. (2015), so we refer to that paper for a more detailed exposition. We
calculate the initial approximation (Q̃_j, Ỹ_j) by a least-squares Monte Carlo approach. We first
simulate N_r = 10^5 trajectories of (x_{t_j}, y_{t_j}, γ_{t_j}, p_{t_j}, C_{t_j}) forward in time, the
so-called regression paths. Then, for j = J − 1, . . . , 0, we compute Q̃_j by least-squares regression
of Ỹ_{j+1} onto a set of basis functions. As the four basis functions, we use 1, γ_{t_j}, γ_{t_j}·p_{t_j}
and the closed-form expression for the clean price at time t_j of the swap's remaining payment
stream. Ỹ_j is computed by application of F_j and G_j to Q̃_j. Storing the coefficients from the
regression, we next simulate N_o = 10^6 independent trajectories of
(x_{t_j}, y_{t_j}, γ_{t_j}, p_{t_j}, C_{t_j}, Q̃_j, Ỹ_j) forward in time, the so-called outer paths. In
order to simulate an approximate Doob martingale M̃ of Ỹ_j, we rely on a subsampling approach
as proposed in Andersen and Broadie (2004). Along each outer path and for all j, we simulate
N_i = 100 copies of Ỹ_{j+1} conditional on the observed value of (x_{t_j}, y_{t_j}, γ_{t_j}, p̃_{t_j})
and take the average of these copies to obtain an unbiased estimator of E_j[Ỹ_{j+1}].
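The two building blocks of this procedure, the least-squares projection that produces Q̃_j and the subsampled martingale increment, can be sketched as follows (illustrative Python, not the authors' code; the array layouts are assumptions of this sketch):

```python
import numpy as np

def regression_step(basis, y_next):
    """Least-squares Monte Carlo step: project the simulated Y~_{j+1} onto
    basis functions evaluated at time t_j, returning the regression
    coefficients and the fitted approximation Q~_j of E_j[Y~_{j+1}]."""
    coef, *_ = np.linalg.lstsq(basis, y_next, rcond=None)
    return coef, basis @ coef

def doob_increment(y_next, inner_copies):
    """Subsampling estimator of M~_{j+1} - M~_j in the spirit of Andersen and
    Broadie (2004): subtract the average of the Ni inner copies of Y~_{j+1},
    an unbiased estimate of E_j[Y~_{j+1}], from the outer-path realization."""
    return y_next - np.mean(inner_copies, axis=-1)
```

Storing the coefficients from `regression_step` on the regression paths allows Q̃_j to be evaluated cheaply on the outer paths.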
              J = 120        J = 360        J = 720        J = 1440
Clean Price   0              0              0              0
ργp = 0.8     21.53  21.70   21.34  21.52   21.28  21.46   21.28  21.46
ργp = 0       25.13  25.31   24.97  25.16   24.88  25.07   24.88  25.08
ργp = −0.8    28.42  28.64   28.25  28.50   28.16  28.41   28.15  28.41

Table 1: Lower and upper bound estimators for varying values of ργp and J with R = 275.12
basis points (b.p.), N_o = 10^6 and N_i = 100. Prices are given in b.p.; the empirical standard
deviation of every estimator is 0.02 b.p.
              Ni = 10        Ni = 20        Ni = 50        Ni = 100       Ni = 200
Clean Price   0              0              0              0              0
ργp = 0.8     21.14  22.15   21.38  21.94   21.28  21.57   21.22  21.41   21.42  21.55
ργp = 0       24.79  25.81   25.01  25.58   24.90  25.19   24.83  25.02   25.00  25.14
ργp = −0.8    28.09  29.23   28.30  28.96   28.19  28.54   28.10  28.35   28.29  28.48
Std. dev.     (0.17)         (0.12)         (0.09)         (0.07)         (0.06)

Table 2: Lower and upper bound estimators for varying values of ργp and N_i with N_o = 10^5,
R = 275.12 b.p. and J = 720. Prices and standard deviations (in brackets) are given in b.p.
After these preparations, we can go through the recursions for θ^up and θ^low along each outer
path. Denote by Ŷ^up_0 and Ŷ^low_0 the resulting Monte Carlo estimators of E[θ^up_0] and
E[θ^low_0], and their associated empirical standard deviations by σ̂^up and σ̂^low. Then, an
asymptotic 95% confidence interval for Y_0 is given by

    [Ŷ^low_0 − 1.96 σ̂^low, Ŷ^up_0 + 1.96 σ̂^up].
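In code, the interval can be assembled from the pathwise samples as follows (a sketch; here σ̂ is taken as the standard error of the corresponding Monte Carlo mean):

```python
import numpy as np

def confidence_interval(theta_low, theta_up):
    """Asymptotic 95% confidence interval for Y_0 from samples of theta^low_0
    and theta^up_0 along the outer paths."""
    se_low = theta_low.std(ddof=1) / np.sqrt(len(theta_low))
    se_up = theta_up.std(ddof=1) / np.sqrt(len(theta_up))
    return theta_low.mean() - 1.96 * se_low, theta_up.mean() + 1.96 * se_up
```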
Table 1 displays upper and lower bound estimators with their standard deviations for different
values of the time discretization and of the correlation between γ and p. Here, R is chosen as the
fair swap rate in the absence of default risk, i.e., such that the swap's clean price at j = 0 is zero.
The four choices of J correspond to a quarterly, monthly, bi-weekly, and weekly time
discretization, respectively. In all cases, the width of the resulting confidence interval is about
1.2% of the value. While variations between the two finer time discretizations can be explained
by standard deviations, the two rougher discretizations lead to values which are slightly higher.
We thus work with the bi-weekly discretization J = 720 in the following. The effect of varying
the correlation parameter of γ and p also has the expected direction. Roughly, if ρ_γp is positive,
then larger values of the overall default rate go together with larger conditional default risk of the
ργp      Adjusted Fair Swap Rate   Clean Price   Bounds
0.8      290.91                    −31.70        −0.07   0.12
0        293.75                    −37.41        −0.09   0.12
−0.8     296.41                    −42.75        −0.12   0.14

Table 3: Adjusted fair swap rates and lower and upper bound estimators for varying values of
ργp with N_o = 10^5, N_i = 100 and J = 720. Rates and prices are given in b.p.; the standard
deviation of every bound estimator is 0.07 b.p.
counterparty and smaller conditional default risk of the party, making the product less valuable
to the party. While this effect is not as pronounced as the overall deviation from the clean price,
the bounds are easily tight enough to differentiate between the three cases.
Table 2 shows the effect of varying the number of inner paths in the same setting. Since a
larger number of inner paths improves the approximation quality of the martingale, and since we
have pathwise optimality for the limiting Doob martingale, we observe two effects, a reduction in
standard deviations and a reduction of the distance between lower and upper bound estimators.
To display these effects more clearly, we have decreased the number of outer paths to N_o = 10^5
in this table, leading to the expected increase in the overall level of standard deviations.
Finally, Table 3 displays the adjusted fair swap rates accounting for counterparty risk and
funding for the three values of ρ_γp, i.e., the values of R which set the adjusted price to zero in
the three scenarios. To identify these rates, we fix a set of outer, inner and regression paths and
define μ(R) as the midpoint of the confidence interval we obtain when running the algorithm
with these paths and rate R for the fixed leg of the swap. We apply a standard bisection method
to find the zero of μ(R). The confidence intervals for the prices in Table 3 are then obtained by
validating these swap rates with a new set of outer and inner paths. We observe that switching
from a clean valuation to the adjusted valuation with ρ_γp = 0.8 increases the fair swap rate by
16 basis points (from 275 to 291). Changing ρ_γp from 0.8 to −0.8 leads to a further increase by
about 5 basis points.
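The bisection step can be sketched as follows, assuming (as the numbers above suggest) that the adjusted value μ(R) is decreasing in the fixed rate R; the function μ and the bracketing interval are inputs from the simulation described in the text.

```python
def fair_rate(mu, r_lo, r_hi, tol=1e-6):
    """Bisection for the adjusted fair swap rate: find R with mu(R) = 0 on
    [r_lo, r_hi], where mu(r_lo) > 0 > mu(r_hi) and mu is decreasing in R."""
    while r_hi - r_lo > tol:
        mid = 0.5 * (r_lo + r_hi)
        if mu(mid) > 0.0:
            r_lo = mid
        else:
            r_hi = mid
    return 0.5 * (r_lo + r_hi)
```

Fixing the simulated paths across all evaluations of μ, as described above, makes μ a deterministic function of R, so the bisection converges in the usual way.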
6.2 Uncertain Volatility Model
In this section, we apply our numerical approach to the uncertain volatility model of Example 2.3.
Let 0 = t0 < t1 < ... < tJ = T be an equidistant partition of the interval [0, T ], where T ∈ R+ .
Recall that, for an adapted process (σ_t)_t, the price of the risky asset X^σ at time t is given by

    X^σ_t = x_0 exp( ∫_0^t σ_u dW_u − (1/2) ∫_0^t σ²_u du ),
where W is a Brownian motion. Further, let g be the payoff of a European option written on the
risky asset. Then, by Example 2.3, the discretized value process of this European option under
uncertain volatility is given by Y_J = g(X^ρ̂_T),

    Γ_j = E_j[B^ρ̂_{j+1} Y_{j+1}]    and    Y_j = E_j[Y_{j+1}] + (Δ/2) max_{s∈{s_low, s_up}} s Γ_j,    (26)

where

    B^ρ̂_{j+1} = ΔW²_{j+1}/Δ² − ρ̂ ΔW_{j+1}/Δ − 1/Δ    and    s_ι = σ²_ι/ρ̂² − 1

for ι ∈ {low, up}. Moreover, X^ρ̂_T is the price of the asset at time T under the constant reference
volatility ρ̂ > 0. Notice that the reference volatility ρ̂ is a choice parameter in the discretization.
The basic idea is to view the uncertain volatility model as a suitable correction of a Black-Scholes
model with volatility ρ̂.
As in Section 6.1, we denote in the following by (Ỹ_j) a numerical approximation of the process
(Y_j), by (Q̃_j) a numerical approximation of (E_j[Y_{j+1}]), and by (Γ̃_j) a numerical
approximation of (Γ_j). Furthermore, (M̃_j) = (M̃^(1)_j, M̃^(2)_j)^⊤ denotes an approximation of
the Doob martingale M of β_j Y_j, which is given by

    M_{j+1} − M_j = ( Y_{j+1} − E_j[Y_{j+1}],  B^ρ̂_{j+1} Y_{j+1} − E_j[B^ρ̂_{j+1} Y_{j+1}] )^⊤.
The recursion (20) for θ^up_j and θ^low_j which is based on these input approximations can be
written as

    θ^up_j = max_{ι∈{up,low}} max_{s∈{s_low, s_up}} { θ^ι_{j+1} − ΔM̃^(1)_{j+1}
             + (sΔ/2) (B^ρ̂_{j+1} θ^ι_{j+1} − ΔM̃^(2)_{j+1}) }

and

    θ^low_j = (r̃^(1)_j + (Δ/2) r̃^(2)_j B^ρ̂_{j+1})_+ θ^low_{j+1} − (r̃^(1)_j + (Δ/2) r̃^(2)_j B^ρ̂_{j+1})_− θ^up_{j+1}
             − r̃^(1)_j ΔM̃^(1)_{j+1} − (Δ/2) r̃^(2)_j ΔM̃^(2)_{j+1},

where r̃_j = (r̃^(1)_j, r̃^(2)_j) is given by

    r̃_j = (1, s_low)    if Γ̃_j < 0,
    r̃_j = (1, s_up)     if Γ̃_j ≥ 0.
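One path of these two recursions can be sketched as follows; this illustrative Python sketch takes the simulated weights, martingale increments and Gamma approximations described above as inputs, and all names are assumptions of this example.

```python
def uvm_bounds_path(xi, B, dM1, dM2, Gamma, s_low, s_up, dt):
    """One path of the uncertain-volatility recursions for theta^up / theta^low.

    xi          : terminal payoff g(X_T) on this path
    B           : length-J array of Gamma weights B^rho_{j+1}
    dM1, dM2    : increments of the two components of the approximate Doob martingale
    Gamma       : length-J array of Gamma approximations (selects the control r~_j)
    s_low, s_up : sigma_iota^2 / rho^2 - 1 for the two extreme volatilities
    dt          : step size Delta
    Returns (theta_up_0, theta_low_0).
    """
    up = low = xi
    for j in reversed(range(len(B))):
        # upper bound: maximize over both continuation values and both slopes
        cand = [th - dM1[j] + 0.5 * dt * s * (B[j] * th - dM2[j])
                for th in (up, low) for s in (s_low, s_up)]
        up_new = max(cand)
        # lower bound: control picked from the sign of the approximate Gamma
        s = s_low if Gamma[j] < 0 else s_up
        a = 1.0 + 0.5 * dt * s * B[j]
        low_new = (max(a, 0.0) * low - max(-a, 0.0) * up
                   - dM1[j] - 0.5 * dt * s * dM2[j])
        up, low = up_new, low_new
    return up, low
```

For s_low = s_up = 0 (no volatility uncertainty) both bounds collapse to the plain Black-Scholes recursion, and in general θ^low_0 ≤ θ^up_0 holds along every path because the lower-bound step always coincides with one of the candidates maximized over in the upper-bound step.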
For the payoff, we consider a European call-spread option with strikes K_1 and K_2, i.e.,

    g(x) = (x − K_1)_+ − (x − K_2)_+,

which is also studied in Guyon and Henry-Labordère (2011) and Alanko and Avellaneda (2013).
Following them, we choose the maturity T = 1 as well as K_1 = 90, K_2 = 110 and x_0 = 100.
The parameters ρ̂, σ_low and σ_up are varied in the numerical experiments below.
In order to calculate the initial approximations (Ỹj ) and (Γ̃j ), we apply the martingale basis
variant of least-squares Monte Carlo which was proposed in Bender and Steiner (2012) in the
context of backward stochastic differential equations and dates back to Glasserman and Yu (2004)
for the optimal stopping problem. The basic idea is to use basis functions whose conditional
expectations (under the reference Black-Scholes model) can be calculated in closed form. When
Ỹj+1 is given as a linear combination of such basis functions, Ej [Ỹj+1 ] can be calculated without
regression by simply applying the same coefficients to the conditional expectations of the basis
functions. Similarly the Gamma of Ỹ can be represented in terms of the second derivatives of the
basis functions. To be more precise, we fix K basis functions ηJ,k (x), k = 1, . . . , K, at terminal
time and assume that
    η_{j,k}(x) = E[η_{J,k}(X^ρ̂_J) | X^ρ̂_j = x]

can be computed in closed form. Then, it is straightforward to check that

    E[η_{j+1,k}(X^ρ̂_{j+1}) | X^ρ̂_j = x] = η_{j,k}(x),
    E[B^ρ̂_{j+1} η_{j+1,k}(X^ρ̂_{j+1}) | X^ρ̂_j = x] = ρ̂² x² (d²/dx²) η_{j,k}(x).
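In the Black-Scholes reference model these closed-form conditional expectations are just Black-Scholes prices and Gammas. A minimal sketch (we assume a zero interest rate, so the reference process is a driftless geometric Brownian motion; names are ours):

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_call(x, strike, sigma, tau):
    """Black-Scholes call price with zero interest rate (our assumption)."""
    if tau <= 0.0:
        return max(x - strike, 0.0)
    d1 = (math.log(x / strike) + 0.5 * sigma**2 * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return x * norm_cdf(d1) - strike * norm_cdf(d2)

def bs_gamma(x, strike, sigma, tau):
    """Second derivative of the call price in x (the Black-Scholes Gamma)."""
    d1 = (math.log(x / strike) + 0.5 * sigma**2 * tau) / (sigma * math.sqrt(tau))
    return math.exp(-0.5 * d1 * d1) / (x * sigma * math.sqrt(2.0 * math.pi * tau))
```

With basis functions of this form, both the conditional expectation and the term ρ̂²x²η″ are available analytically.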
For a given approximation Ỹj+1 of Yj+1 , j = 0, . . . , J − 1, in terms of such martingale basis
functions, a first approximation of Yj can thus be calculated without regression by computing the
right hand sides in (26) explicitly with Ỹj+1 in place of Yj+1 . This approximation is regressed onto
the basis functions to ensure that the approximation Ỹj at time j is again a linear combination
of basis functions. This overall approach has two key advantages. First, since Q̃_j = E_j[Ỹ_{j+1}] and
Γ̃_j are exact conditional expectations of Ỹ_{j+1} and B^ρ̂_{j+1} Ỹ_{j+1}, the martingale

    M̃_{j+1} − M̃_j = ( Ỹ_{j+1} − Q̃_j,  B^ρ̂_{j+1} Ỹ_{j+1} − Γ̃_j )^⊤
can be computed without subsimulations, and thus without the associated sampling error and
simulation costs. Second, since we calculate Γ̃_j by twice differentiating the Y-approximation, we
avoid using the simulated Γ-weights B^ρ̂_{j+1} in the regression. Due to the factor (∆W_{j+1})²·∆^{−2}, these
weights are otherwise a source of large standard errors. See Alanko and Avellaneda (2013) for a
discussion of this problem and some control variates (which are not needed in our approach).
We first simulate N_r = 10⁵ regression paths of the process (X^ρ̂_j) under the constant volatility ρ̂.
For the regression, we do not start all paths at x_0 but rather start N_r/200 trajectories at each of the
points 31, . . . , 230. Since X is a geometric Brownian motion under ρ̂, it can be simulated exactly.
Starting the regression paths at multiple points reduces the instability of the regression
coefficients arising at early time points. See Rasmussen (2005) for a discussion of this stability
problem and of the method of multiple starting points. For the computation of Ỹj and Q̃j we
choose 162 basis functions. The first three are 1, x and E[g(X^ρ̂_J) | X^ρ̂_j = x]. Note that the third
one is simply the Black-Scholes price (under ρ̂) of the spread option g. For the remaining 159
basis functions, we also choose Black-Scholes prices of spread options with respective strikes K^(k)
and K^(k+1) for k = 1, . . . , 159, where the numbers K^(1), . . . , K^(160) increase from 20.5 to 230.5.
The second derivatives of these basis functions are just (differences of) Black-Scholes Gammas.
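The basis described above can be assembled from a call pricer; the following sketch (our own variable names, zero interest rate assumed) builds the 162 functions:

```python
import numpy as np
from math import log, sqrt, erf

def bs_call(x, strike, sigma, tau):
    # Black-Scholes call price, zero interest rate (assumption)
    phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
    d1 = (log(x / strike) + 0.5 * sigma**2 * tau) / (sigma * sqrt(tau))
    return x * phi(d1) - strike * phi(d1 - sigma * sqrt(tau))

def make_basis(rho_hat, tau, K1=90.0, K2=110.0):
    """162 basis functions: 1, x, the Black-Scholes price of the payoff g,
    and 159 call spreads with strikes K^(k), K^(k+1) from 20.5 to 230.5."""
    strikes = np.linspace(20.5, 230.5, 160)
    basis = [lambda x: 1.0,
             lambda x: x,
             lambda x: bs_call(x, K1, rho_hat, tau) - bs_call(x, K2, rho_hat, tau)]
    for lo, hi in zip(strikes[:-1], strikes[1:]):
        basis.append(lambda x, lo=lo, hi=hi:
                     bs_call(x, lo, rho_hat, tau) - bs_call(x, hi, rho_hat, tau))
    return basis

basis = make_basis(rho_hat=0.115, tau=1.0)
```

The second derivatives needed for the Gamma are then differences of Black-Scholes Gammas of the same strikes.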
Like in Section 6.1, we then simulate (X^ρ̂_j, Ỹ_j, Q̃_j, Γ̃_j) along N_o = 10⁵ outer paths started in
x0 . As pointed out above, the Doob martingale of βj Ỹj is available in closed form in this case.
Finally, the recursions for θup and θlow are calculated backwards in time on each outer path.
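The exact simulation with multiple starting points can be sketched as follows (a minimal illustration with our own function names; the zero-drift dynamics reflect the zero-interest-rate reference model assumed here):

```python
import numpy as np

def simulate_paths(rho_hat, T, J, starts, n_per_start, seed=0):
    """Exact simulation of the geometric Brownian motion X^rho under the
    reference volatility rho_hat.

    starts: initial values; multiple starting points stabilize the
    regression coefficients at early times."""
    rng = np.random.default_rng(seed)
    dt = T / J
    x0 = np.repeat(np.asarray(starts, dtype=float), n_per_start)
    Z = rng.standard_normal((J, x0.size))
    log_inc = (-0.5 * rho_hat**2) * dt + rho_hat * np.sqrt(dt) * Z
    X = np.empty((J + 1, x0.size))
    X[0] = x0
    X[1:] = x0 * np.exp(np.cumsum(log_inc, axis=0))
    return X  # shape (J+1, number of paths)

# e.g. paths started at 31, ..., 230 with 500 trajectories each
paths = simulate_paths(0.115, 1.0, 21, np.arange(31, 231), 500)
```

Since the log-increments are exact, no additional time-discretization error enters through the path simulation itself.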
As a first example we consider σlow = 0.1 and σup = 0.2 as in Guyon and Henry-Labordère
(2011) and Alanko and Avellaneda (2013). Following Vanden (2006), the continuous-time limiting
price of a European call-spread option in this setting is given by 11.2046. Table 4 shows the
approximated prices Ỹ_0 as well as upper and lower bounds for ρ̂ = 0.2/√3 ≈ 0.115 depending on
the time discretization. This is the smallest choice of ρ̂ for which the monotonicity condition in
Theorem 4.3 can only be violated when the absolute values of the Brownian increments are large,
cp. Example 4.4. As before, Ŷ0up and Ŷ0low denote Monte Carlo estimates of E[θ0up ] respectively
E[θ0low ]. The numerical results suggest convergence from below towards the continuous-time limit
for finer time discretizations. This is intuitive in this example, since finer time discretizations
allow for richer choices of the process (σt ) in the maximization problem (7). We notice that
the bounds are fairly tight (with, e.g., a relative width of 1.3% for the 95% confidence interval
with J = 21 time discretization points), although the upper bound begins to deteriorate as Ỹ0
approaches its limiting value. The impact of increasing ρ̂ to 0.15 (as proposed in Guyon and
Henry-Labordère, 2011; Alanko and Avellaneda, 2013) is shown in Table 5. The relative width
of the 95%-confidence interval is now about 0.6 % for up to J = 35 time steps, but also the
convergence to the continuous-time limit appears to be slower with this choice of ρ̂.
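As a sanity check on the quoted interval width, the 95% confidence interval can be assembled from the J = 21 row of Table 4 (treating the bracketed figures as standard errors of the Monte Carlo means and using the 1.96 normal quantile; this is our reading, not a formula stated in the text):

```python
# 95% confidence interval from the low- and high-biased estimates (Table 4, J = 21)
y_low, se_low = 11.1666, 0.0003   # lower-bound estimate and its standard error
y_up,  se_up  = 11.2764, 0.0173   # upper-bound estimate and its standard error

lower = y_low - 1.96 * se_low
upper = y_up + 1.96 * se_up
rel_width = (upper - lower) / ((upper + lower) / 2)
print(f"[{lower:.4f}, {upper:.4f}], relative width {100 * rel_width:.1f}%")
# prints: [11.1660, 11.3103], relative width 1.3%
```

This reproduces the approximately 1.3% relative width quoted in the text.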
J     Ỹ_0       Ŷ^low_0             Ŷ^up_0
3     10.8553   10.8550 (0.0001)    10.8591 (0.0001)
6     11.0500   11.0502 (0.0002)    11.0536 (0.0002)
9     11.1054   11.1060 (0.0005)    11.1116 (0.0005)
12    11.1340   11.1344 (0.0002)    11.1438 (0.0007)
15    11.1484   11.1486 (0.0002)    11.1728 (0.0056)
18    11.1585   11.1585 (0.0006)    11.2097 (0.0088)
21    11.1666   11.1666 (0.0003)    11.2764 (0.0173)
24    11.1710   11.1683 (0.0032)    11.5593 (0.0984)

Table 4: Approximated price as well as lower and upper bounds for ρ̂ = 0.2/√3 for different time
discretizations. Standard deviations are given in brackets.
Comparing Table 5 with the results in Alanko and Avellaneda (2013), we observe that their
point estimates for Y_0 at time discretization levels J = 10 and J = 20 do not lie in our confidence
intervals, which are given by [10.9981, 11.0030] and [11.1021, 11.1096], indicating that their
least-squares Monte Carlo estimator may still suffer from large variances (although they apply control
variates). The dependence of the time discretization error on the choice of the reference volatility
J     Ỹ_0       Ŷ^low_0             Ŷ^up_0
5     10.8153   10.8153 (0.0001)    10.8167 (0.0001)
10    10.9982   10.9983 (0.0001)    11.0028 (0.0001)
15    11.0684   11.0683 (0.0001)    11.0728 (0.0001)
20    11.1027   11.1023 (0.0001)    11.1092 (0.0002)
25    11.1241   11.1237 (0.0001)    11.1379 (0.0008)
30    11.1386   11.1376 (0.0002)    11.1687 (0.0030)
35    11.1479   11.1462 (0.0002)    11.2058 (0.0047)
40    11.1554   11.0101 (0.1391)    12.0483 (0.6464)

Table 5: Approximated price as well as lower and upper bounds for ρ̂ = 0.15 for different time
discretizations. Standard deviations are given in brackets.
ρ̂ is further illustrated in Table 6, which displays the mean and the standard deviation of 30 runs
of the martingale basis algorithm for different choices of ρ̂ and up to 640 time steps. By and
large, convergence is faster for smaller choices of ρ̂, but the algorithm becomes unstable when the
reference volatility is too small.
J     ρ̂ = 0.06                     ρ̂ = 0.08                   ρ̂ = 0.1
10    79.7561 (5.1739)             11.6463 (0.2634)           11.1552 (0.0031)
20    1.6421·10⁵ (8.4594·10⁵)      12.3183 (1.5447)           11.1823 (0.0040)
40    1.7010·10¹⁶ (6.3043·10¹⁶)    130.2723 (372.2625)        11.1942 (0.0005)
80    3.2151·10²⁴ (1.7603·10²⁵)    11.8494 (1.1846)           11.1999 (0.0002)
160   1.8613·10²⁴ (7.0234·10²⁴)    11.6951 (1.5772)           11.2026 (0.0001)
320   4.5672·10³⁹ (2.5016·10⁴⁰)    5.4389·10³ (1.8766·10⁴)    11.2047 (0.0001)
640   7.0277·10³⁹ (3.7590·10⁴⁰)    1.0153·10⁸ (3.5571·10⁸)    11.2057 (0.0001)

J     ρ̂ = 0.15           ρ̂ = 0.2            ρ̂ = 0.5
10    10.9985 (0.0006)   10.7999 (0.0007)   9.7088 (0.0004)
20    11.1027 (0.0006)   10.9746 (0.0005)   9.9652 (0.0005)
40    11.1556 (0.0004)   11.0819 (0.0005)   10.2306 (0.0009)
80    11.1821 (0.0003)   11.1455 (0.0006)   10.4945 (0.0015)
160   11.1952 (0.0003)   11.1802 (0.0005)   10.7453 (0.0047)
320   11.2017 (0.0003)   11.1980 (0.0006)   10.9635 (0.0076)
640   11.2047 (0.0002)   11.2073 (0.0006)   11.1248 (0.0103)

Table 6: Mean of L = 30 simulations of Ỹ_0 for different ρ̂ and discretizations. Standard deviations
are given in brackets.
In order to gain a better understanding of how the performance of the method depends on the
input parameters, we also consider the case σlow = 0.3 and σup = 0.4. Following Vanden (2006),
the price of the European call-spread option in the continuous-time limit is 9.7906 in this case.
We get qualitatively the same results as for the previous example, in the sense that convergence is
faster for the smaller reference volatility and that the upper bound estimators begin to deteriorate
as the time partition becomes too fine. However, quantitatively, the numerical results in Tables
7 and 8 are better than in the previous example (confidence intervals remain tight for finer time
partitions, for which the discretized price has a relative error of about 0.2 % compared to the
continuous-time limit). This is quite likely to be connected to the fact that the ratio between σup
and σlow is smaller in this second example.
7 Conclusion
This paper proposes a new method for constructing high-biased and low-biased Monte Carlo estimators for solutions of stochastic dynamic programming equations. When applied to the optimal
J     Ỹ_0      Ŷ^low_0            Ŷ^up_0
3     9.6206   9.6139 (0.0001)    9.6211 (0.0002)
6     9.7147   9.7147 (0.0002)    9.7167 (0.0002)
9     9.7435   9.7433 (0.0002)    9.7475 (0.0006)
12    9.7574   9.7573 (0.0003)    9.7629 (0.0005)
15    9.7643   9.7642 (0.0002)    9.7725 (0.0016)
18    9.7683   9.7681 (0.0005)    9.7848 (0.0028)
21    9.7717   9.7728 (0.0006)    9.7933 (0.0046)
24    9.7750   9.7740 (0.0012)    9.8010 (0.0052)
27    9.7767   9.7768 (0.0002)    9.8248 (0.0115)
30    9.7772   9.7769 (0.0002)    9.9354 (0.0592)

Table 7: Approximated price as well as lower and upper bounds for ρ̂ = 0.4/√3 for different time
discretizations. Standard deviations are given in brackets.
J     Ỹ_0      Ŷ^low_0             Ŷ^up_0
10    9.6061   9.6060 (0.0001)     9.6063 (0.0001)
20    9.6924   9.6922 (<0.0001)    9.6930 (<0.0001)
30    9.7235   9.7231 (<0.0001)    9.7248 (0.0001)
40    9.7409   9.7406 (<0.0001)    9.7435 (0.0001)
50    9.7509   9.7506 (<0.0001)    9.7567 (0.0002)
60    9.7574   9.7571 (0.0001)     9.7707 (0.0006)
70    9.7626   9.7622 (0.0001)     9.7946 (0.0018)
80    9.7653   9.7644 (0.0002)     9.8651 (0.0088)

Table 8: Approximated price as well as lower and upper bounds for ρ̂ = 0.35 for different time
discretizations. Standard deviations are given in brackets.
stopping problem, the method simplifies to the classical primal-dual approach of Rogers (2002)
and Haugh and Kogan (2004) (except that the martingale penalty appears as a control variate in
the lower bound). Our approach is complementary to earlier generalizations of this methodology
by Rogers (2007) and Brown et al. (2010) whose approaches start out from a primal optimization
problem rather than from a dynamic programming equation. The resulting representation of
high-biased and low-biased estimators in terms of a pathwise recursion makes the method very
tractable from a computational point of view. Suitably coupling upper and lower bounds in the
recursion enables us to handle situations which are outside the scope of classical primal-dual
approaches, because the dynamic programming equation fails to satisfy a comparison principle.
Acknowledgements
Financial support by the Deutsche Forschungsgemeinschaft under grant BE3933/5-1 is gratefully
acknowledged.
A Preparations
In this section, we verify in our setting, i.e., under the standing assumptions (C) and (R), a
number of technical properties of the control processes which are needed in the later proofs. Note
first that, by continuity of Fi ,
    F_i^#(r_i) = sup_{z∈Q^D} ( r_i^⊤ z − F_i(z) )

is F_i-measurable for every r ∈ A^F_j and i = j, . . . , J − 1; the same holds for G_i^#(ρ^(1)_i, ρ^(0)_i) for
(ρ^(1), ρ^(0)) ∈ A^G_j. Moreover, the integrability condition on the controls requires that F_i^#(r_i) < ∞ and
G_i^#(ρ^(1)_i, ρ^(0)_i) > −∞ P-almost surely, i.e., controls take values in the effective domain of the
convex conjugate of F and in the effective domain of the concave conjugate of G, respectively.
In particular, by the monotonicity assumption on G in the y-variable, we observe that ρ^(0)_i ≥ 0
P-almost surely.
Finally, we show existence of adapted controls, so that the sets A^F_j and A^G_j are always
nonempty.
Lemma A.1. Fix j ∈ {0, . . . , J − 1} and let f_j : Ω × R^N → R be a mapping such that, for
every ω ∈ Ω, the map x ↦ f_j(ω, x) is convex, and for every x ∈ R^N, the map ω ↦ f_j(ω, x)
is F_j-measurable. Moreover, suppose that f_j satisfies the following polynomial growth condition:
There are a constant q ≥ 0 and a nonnegative random variable α_j ∈ L^{∞−}_j(R) such that

    |f_j(x)| ≤ α_j (1 + |x|^q),  P-a.s.,

for every x ∈ R^N. Then, for every Z ∈ L^{∞−}_j(R^N) there exists a random variable ρ_j ∈ L^{∞−}_j(R^N)
such that f_j^#(ρ_j) ∈ L^{∞−}_j(R) and

    f_j(Z) = ρ_j^⊤ Z − f_j^#(ρ_j),  P-a.s.    (27)
Proof. Let Z ∈ L^{∞−}_j(R^N). Notice first that, since f_j is convex and closed, we have f_j^{##} = f_j by
Theorem 12.2 in Rockafellar (1970) and thus

    f_j(Z) ≥ ρ^⊤ Z − f_j^#(ρ)    (28)

holds ω-wise for any random variable ρ. We next show that there exists an F_j-measurable random
variable ρ_j for which (28) holds with P-almost sure equality. To this end, we apply Theorem 7.4
in Cheridito et al. (2015), which yields the existence of an F_j-measurable subgradient to f_j, i.e.,
existence of an F_j-measurable random variable ρ_j such that for all F_j-measurable R^N-valued
random variables Z̃

    f_j(Z + Z̃) − f_j(Z) ≥ ρ_j^⊤ Z̃,  P-a.s.    (29)

From (29) (with the choice Z̃ = z − Z for z ∈ Q^N), we conclude that

    ρ_j^⊤ Z − f_j(Z) ≥ sup_{z∈Q^N} ( ρ_j^⊤ z − f_j(z) ) = f_j^#(ρ_j),  P-a.s.,    (30)

by continuity of f_j, which is the converse of (28), proving P-almost sure equality for ρ = ρ_j and
thus (27). We next show that ρ_j satisfies the required integrability conditions, i.e., ρ_j ∈ L^{∞−}(R^N)
and f_j^#(ρ_j) ∈ L^{∞−}(R). To this end, we first prove that ρ_j^⊤ Z̃ ∈ L^{∞−}_j(R) for any Z̃ ∈ L^{∞−}_j(R^N).
Due to (29) and the Minkowski inequality, and since a ≤ b implies a_+ ≤ |b|, it follows for
Z̃ ∈ L^{∞−}_j(R^N) that, for every p ≥ 1,

    E[ (ρ_j^⊤ Z̃)_+^p ]^{1/p} ≤ E[ |f_j(Z + Z̃)|^p ]^{1/p} + E[ |f_j(Z)|^p ]^{1/p} < ∞,

since f_j is of polynomial growth with 'random constant' α_j ∈ L^{∞−}_j(R) and Z, Z̃ are members of
L^{∞−}_j(R^N) by assumption. Applying the same argument to −Z̃ yields

    E[ (ρ_j^⊤ Z̃)_−^p ] = E[ (ρ_j^⊤ (−Z̃))_+^p ] < ∞,

since (29) holds for all F_j-measurable random variables and −Z̃ inherits the integrability of Z̃.
We thus conclude that

    E[ |ρ_j^⊤ Z̃|^p ] < ∞  and  E[ |ρ_j|^p ] < ∞,

where the second claim follows from the first by taking Z̃ = sgn(ρ_j), with the sign function
applied componentwise. In order to show that f_j^#(ρ_j) ∈ L^{∞−}(R), we start with (27) and apply
the Minkowski inequality to conclude that

    E[ |f_j^#(ρ_j)|^p ]^{1/p} ≤ E[ |ρ_j^⊤ Z|^p ]^{1/p} + E[ |f_j(Z)|^p ]^{1/p} < ∞.
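The conjugate appearing in the lemma can be illustrated numerically; here is a small sketch for a piecewise linear convex f (our own toy example, reminiscent of the volatility case), approximating the sup over rationals by a fine grid:

```python
import numpy as np

def conjugate(f, r, grid):
    """f#(r) = sup_z (r*z - f(z)); the sup over rationals in the text is
    approximated here by a fine grid."""
    return np.max(r * grid - f(grid))

# illustrative convex f: f(z) = max(s_low*z, s_up*z)
s_low, s_up = 0.5, 2.0
f = lambda z: np.maximum(s_low * z, s_up * z)
grid = np.linspace(-50.0, 50.0, 200001)

z_bar = 3.0
rho = s_up if z_bar >= 0 else s_low   # a subgradient of f at z_bar
# duality holds with equality at the subgradient, cf. (27)
gap = f(z_bar) - (rho * z_bar - conjugate(f, rho, grid))
```

For r outside the interval [s_low, s_up] the grid maximum grows without bound, mirroring the effective-domain restriction on the controls.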
B Proofs of Section 4

B.1 Proof of Theorem 4.3
We first show that the implications (b) ⇒ (c) ⇒ (a) hold in an analogous formulation without
the additional assumption that G_i(z, y) = y.

Proposition B.1. Suppose (R) and (C), and consider the following assertions:

(a) The comparison principle (Comp) holds.

(b′) For every (ρ^(1), ρ^(0)) ∈ A^G_0 and r ∈ A^F_0 the following positivity condition is fulfilled: For
every i = 0, . . . , J − 1,

    (ρ^(1)_i + ρ^(0)_i r_i)^⊤ β_{i+1} ≥ 0,  P-a.s.

(c′) For every j = 0, . . . , J − 1 and any two random variables Y^(1), Y^(2) ∈ L^{∞−}(R) with Y^(1) ≥
Y^(2) P-a.s., the monotonicity condition

    G_j(E_j[β_{j+1}Y^(1)], F_j(E_j[β_{j+1}Y^(1)])) ≥ G_j(E_j[β_{j+1}Y^(2)], F_j(E_j[β_{j+1}Y^(2)])),  P-a.s.,

is satisfied.

Then, (b′) ⇒ (c′) ⇒ (a).
Proof. (b′) ⇒ (c′): Fix j ∈ {0, . . . , J − 1} and let Y^(1) and Y^(2) be random variables which are
in L^{∞−}(R) and satisfy Y^(1) ≥ Y^(2). By Lemma A.1, there are r ∈ A^F_0 and (ρ^(1), ρ^(0)) ∈ A^G_0
such that

    F_j(E_j[β_{j+1}Y^(2)]) = r_j^⊤ E_j[β_{j+1}Y^(2)] − F_j^#(r_j)

and

    G_j(E_j[β_{j+1}Y^(1)], F_j(E_j[β_{j+1}Y^(1)]))
      = (ρ^(1)_j)^⊤ E_j[β_{j+1}Y^(1)] + ρ^(0)_j F_j(E_j[β_{j+1}Y^(1)]) − G_j^#(ρ^(1)_j, ρ^(0)_j),

P-almost surely. Hence, by (11), (b′) and (12) we obtain

    G_j(E_j[β_{j+1}Y^(2)], F_j(E_j[β_{j+1}Y^(2)]))
      ≤ (ρ^(1)_j)^⊤ E_j[β_{j+1}Y^(2)] + ρ^(0)_j F_j(E_j[β_{j+1}Y^(2)]) − G_j^#(ρ^(1)_j, ρ^(0)_j)
      = E_j[ (ρ^(1)_j + ρ^(0)_j r_j)^⊤ β_{j+1} Y^(2) ] − ρ^(0)_j F_j^#(r_j) − G_j^#(ρ^(1)_j, ρ^(0)_j)
      ≤ E_j[ (ρ^(1)_j + ρ^(0)_j r_j)^⊤ β_{j+1} Y^(1) ] − ρ^(0)_j F_j^#(r_j) − G_j^#(ρ^(1)_j, ρ^(0)_j)
      ≤ (ρ^(1)_j)^⊤ E_j[β_{j+1}Y^(1)] + ρ^(0)_j F_j(E_j[β_{j+1}Y^(1)]) − G_j^#(ρ^(1)_j, ρ^(0)_j)
      = G_j(E_j[β_{j+1}Y^(1)], F_j(E_j[β_{j+1}Y^(1)])).

(c′) ⇒ (a): We prove this implication by backward induction. Let Y^up and Y^low respectively be
super- and subsolutions of (10). Then the assertion is trivially true for j = J. Now assume
that the assertion is true for j + 1 ∈ {1, . . . , J}. It follows by (c′) and the definition of a
sub- and supersolution that

    Y^up_j ≥ G_j(E_j[β_{j+1}Y^up_{j+1}], F_j(E_j[β_{j+1}Y^up_{j+1}])) ≥ G_j(E_j[β_{j+1}Y^low_{j+1}], F_j(E_j[β_{j+1}Y^low_{j+1}])) ≥ Y^low_j.
Proof of Theorem 4.3. Notice first that, under the additional assumption G_i(z, y) = y, assertions
(b′), (c′) in Proposition B.1 coincide with assertions (b), (c) in Theorem 4.3, because the vector
(0, . . . , 0, 1)^⊤ ∈ R^{D+1} is the only control in A^G_0 by linearity of G. It hence remains to show:

(a) ⇒ (b): We prove the contraposition. Hence, we assume that there exist an r ∈ A^F_0 and a
j_0 ∈ {0, . . . , J − 1} such that P({r_{j_0}^⊤ β_{j_0+1} < 0}) > 0. Then we define the process Ȳ by

    Ȳ_i = Y_i,  i > j_0 + 1,
    Ȳ_i = Y_{j_0+1} − n 1_{{r_{j_0}^⊤ β_{j_0+1} < 0}},  i = j_0 + 1,
    Ȳ_i = r_i^⊤ E_i[β_{i+1} Ȳ_{i+1}] − F_i^#(r_i),  i ≤ j_0,

for n ∈ N which we fix later on. In view of (12), it follows easily that Ȳ is a subsolution to (10).
Now we observe that

    Ȳ_{j_0} − Y_{j_0} = E_{j_0}[ (r_{j_0} − r^∗_{j_0})^⊤ β_{j_0+1} Y_{j_0+1} ] + n E_{j_0}[ (r_{j_0}^⊤ β_{j_0+1})_− ] − F_{j_0}^#(r_{j_0}) + F_{j_0}^#(r^∗_{j_0}),

where r^∗ ∈ A^F_{j_0} is such that for all j = j_0, . . . , J − 1

    (r^∗_j)^⊤ E_j[β_{j+1} Y_{j+1}] − F_j^#(r^∗_j) = F_j(E_j[β_{j+1} Y_{j+1}]),

see Lemma A.1. In a next step we define the set A_{j_0,N} by

    A_{j_0,N} = { E_{j_0}[ (r_{j_0}^⊤ β_{j_0+1})_− ] ≥ 1/N }
                ∩ { E_{j_0}[ (r_{j_0} − r^∗_{j_0})^⊤ β_{j_0+1} Y_{j_0+1} ] − F_{j_0}^#(r_{j_0}) + F_{j_0}^#(r^∗_{j_0}) > −N }.

For N ∈ N sufficiently large (which is fixed from now on), we get that P(A_{j_0,N}) > 0 and therefore

    (Ȳ_{j_0} − Y_{j_0}) 1_{A_{j_0,N}} > −N + n/N = 0

for n = N², which means that the comparison principle is violated for the subsolution Ȳ (with
this choice of n) and the (super-)solution Y.
B.2 Proof of Theorem 4.5
The proof of Theorem 4.5 is prepared by two propositions. The first of them shows almost sure
optimality of optimal controls and martingales. For convenience, we show this simultaneously for
the processes Θup and Θlow from Theorem 4.7.
Proposition B.2. With the notation in Theorems 4.5 and 4.7, we have for every i = j, . . . , J,

    Y_i = θ^up_i(ρ^(1,∗), ρ^(0,∗), r^∗, M^∗) = Θ^up_i(ρ^(1,∗), ρ^(0,∗), M^∗)
        = θ^low_i(ρ^(1,∗), ρ^(0,∗), r^∗, M^∗) = Θ^low_i(r^∗, M^∗).
Proof. The proof is by backward induction on i = J, . . . , j. The case i = J is obvious as all five
processes have the same terminal condition ξ by construction. Now suppose the claim is already
shown for i + 1 ∈ {j + 1, . . . , J}. Then, applying the induction hypothesis to the right hand side
of the recursion formulas, we observe

    Θ^{up,∗}_i := Θ^up_i(ρ^(1,∗), ρ^(0,∗), M^∗) = θ^up_i(ρ^(1,∗), ρ^(0,∗), r^∗, M^∗),
    Θ^{low,∗}_i := Θ^low_i(r^∗, M^∗) = θ^low_i(ρ^(1,∗), ρ^(0,∗), r^∗, M^∗).

Exploiting again the induction hypothesis as well as the definition of the Doob martingale and
the duality relation (22) we obtain

    Θ^{up,∗}_i = (ρ^(1,∗)_i)^⊤ ( β_{i+1} Y_{i+1} − ∆M^∗_{i+1} ) + ρ^(0,∗)_i F_i( β_{i+1} Y_{i+1} − ∆M^∗_{i+1} ) − G_i^#(ρ^(1,∗)_i, ρ^(0,∗)_i)
             = (ρ^(1,∗)_i)^⊤ E_i[β_{i+1} Y_{i+1}] + ρ^(0,∗)_i F_i( E_i[β_{i+1} Y_{i+1}] ) − G_i^#(ρ^(1,∗)_i, ρ^(0,∗)_i)
             = G_i( E_i[β_{i+1} Y_{i+1}], F_i(E_i[β_{i+1} Y_{i+1}]) )
             = Y_i.

An analogous argument, making use of (23), shows Θ^{low,∗}_i = Y_i.
The key step in the proof of Theorem 4.5 is the following alternative recursion formula for
θ^up_j and θ^low_j. This result enables us to establish inequalities between Y_j, E_j[θ^up_j] and E_j[θ^low_j],
replacing, in a sense, the comparison principle (Comp).

Proposition B.3. Suppose (R) and (C) and let M ∈ M^D. Then, for every j = 0, . . . , J and
(ρ^(1), ρ^(0)) ∈ A^G_j and r ∈ A^F_j, we have for all i = j, . . . , J the P-almost sure identities

    θ^up_i(ρ^(1), ρ^(0), r, M) = sup_{u∈R^D} Φ_{i+1}( ρ^(1)_i, ρ^(0)_i, u, θ^up_{i+1}(ρ^(1), ρ^(0), r, M), θ^low_{i+1}(ρ^(1), ρ^(0), r, M), ∆M_{i+1} ),

    θ^low_i(ρ^(1), ρ^(0), r, M) = inf_{(v^(1),v^(0))∈R^{D+1}} Φ_{i+1}( v^(1), v^(0), r_i, θ^low_{i+1}(ρ^(1), ρ^(0), r, M), θ^up_{i+1}(ρ^(1), ρ^(0), r, M), ∆M_{i+1} ),

where Φ_{J+1}(v^(1), v^(0), u, ϑ_1, ϑ_2, m) = ξ and

    Φ_{i+1}(v^(1), v^(0), u, ϑ_1, ϑ_2, m)
      = ((v^(1))^⊤ β_{i+1})_+ ϑ_1 − ((v^(1))^⊤ β_{i+1})_− ϑ_2 − (v^(1))^⊤ m
        + v^(0) ( (u^⊤ β_{i+1})_+ ϑ_1 − (u^⊤ β_{i+1})_− ϑ_2 − u^⊤ m − F_i^#(u) ) − G_i^#(v^(1), v^(0))

for i = j, . . . , J − 1. In particular, θ^low_i(ρ^(1), ρ^(0), r, M) ≤ θ^up_i(ρ^(1), ρ^(0), r, M) for every i = j, . . . , J.
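The final assertion ultimately rests on the elementary coupling inequality: for ϑ^low ≤ ϑ ≤ ϑ^up and any real a, one has a₊ϑ^low − a₋ϑ^up ≤ aϑ ≤ a₊ϑ^up − a₋ϑ^low. A quick numerical sweep (purely illustrative, not part of the proof) confirms it:

```python
import random

random.seed(1)
for _ in range(10_000):
    a = random.uniform(-10, 10)
    lo = random.uniform(-5, 5)
    up = lo + random.uniform(0, 5)          # so lo <= up
    theta = random.uniform(lo, up)
    a_plus, a_minus = max(a, 0.0), max(-a, 0.0)
    lower = a_plus * lo - a_minus * up      # the coupling used for theta^low
    upper = a_plus * up - a_minus * lo      # the coupling used for theta^up
    assert lower - 1e-9 <= a * theta <= upper + 1e-9
print("coupling inequality verified on 10,000 random draws")
```

The lower coupling is exactly what appears in the recursion for θ^low, and the upper one in the recursion for θ^up.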
Proof. First we fix j ∈ {0, . . . , J − 1}, M ∈ M^D and controls (ρ^(1), ρ^(0)) and r in A^G_j respectively
A^F_j, and define θ^up and θ^low by (20). To lighten the notation, we set

    Φ^low_{i+1}(v^(1), v^(0), r_i) = Φ_{i+1}( v^(1), v^(0), r_i, θ^low_{i+1}(ρ^(1), ρ^(0), r, M), θ^up_{i+1}(ρ^(1), ρ^(0), r, M), ∆M_{i+1} ),

and define Φ^up_{i+1} accordingly (interchanging the roles of θ^up and θ^low). We show the assertion by
backward induction on i = J, . . . , j, with the case i = J being trivial since θ^up_J = θ^low_J = Φ_{J+1} = ξ
by definition. Now suppose that the assertion is true for i + 1. For any (v^(1), v^(0)) ∈ R^{D+1} we
obtain, by (11), the following upper bound for θ^low_i:

    Φ^low_{i+1}(v^(1), v^(0), r_i)
      = (v^(1))^⊤ ( β_{i+1}( θ^low_{i+1} 1_{{(v^(1))^⊤β_{i+1} ≥ 0}} + θ^up_{i+1} 1_{{(v^(1))^⊤β_{i+1} < 0}} ) − ∆M_{i+1} )
        + v^(0) ( (r_i^⊤β_{i+1})_+ θ^low_{i+1} − (r_i^⊤β_{i+1})_− θ^up_{i+1} − r_i^⊤∆M_{i+1} − F_i^#(r_i) ) − G_i^#(v^(1), v^(0))
      ≥ G_i( β_{i+1}( θ^low_{i+1} 1_{{(v^(1))^⊤β_{i+1} ≥ 0}} + θ^up_{i+1} 1_{{(v^(1))^⊤β_{i+1} < 0}} ) − ∆M_{i+1},
             (r_i^⊤β_{i+1})_+ θ^low_{i+1} − (r_i^⊤β_{i+1})_− θ^up_{i+1} − r_i^⊤∆M_{i+1} − F_i^#(r_i) )
      ≥ min_{ι∈{up,low}} G_i( β_{i+1}θ^ι_{i+1} − ∆M_{i+1},
             (r_i^⊤β_{i+1})_+ θ^low_{i+1} − (r_i^⊤β_{i+1})_− θ^up_{i+1} − r_i^⊤∆M_{i+1} − F_i^#(r_i) )
      = θ^low_i.

We emphasize that this chain of inequalities holds for every ω ∈ Ω, and not only P-almost surely.
Hence,

    inf_{(v^(1),v^(0))∈R^{D+1}} Φ^low_{i+1}(v^(1), v^(0), r_i) ≥ θ^low_i

for every ω ∈ Ω. To conclude the argument for θ^low_i, it remains to show that the converse
inequality holds P-almost surely. Thanks to (11) we get

    θ^low_i = min_{ι∈{up,low}} G_i( β_{i+1}θ^ι_{i+1} − ∆M_{i+1},
             (r_i^⊤β_{i+1})_+ θ^low_{i+1} − (r_i^⊤β_{i+1})_− θ^up_{i+1} − r_i^⊤∆M_{i+1} − F_i^#(r_i) )
      = min_{ι∈{up,low}} inf_{(v^(1),v^(0))∈R^{D+1}} ( (v^(1))^⊤( β_{i+1}θ^ι_{i+1} − ∆M_{i+1} )
             + v^(0)( (r_i^⊤β_{i+1})_+ θ^low_{i+1} − (r_i^⊤β_{i+1})_− θ^up_{i+1} − r_i^⊤∆M_{i+1} − F_i^#(r_i) ) − G_i^#(v^(1), v^(0)) ).

Together with θ^up_{i+1} ≥ θ^low_{i+1} P-a.s. (by the induction hypothesis) we obtain

    θ^low_i ≥ inf_{(v^(1),v^(0))∈R^{D+1}} ( ((v^(1))^⊤β_{i+1})_+ θ^low_{i+1} − ((v^(1))^⊤β_{i+1})_− θ^up_{i+1} − (v^(1))^⊤∆M_{i+1}
             + v^(0)( (r_i^⊤β_{i+1})_+ θ^low_{i+1} − (r_i^⊤β_{i+1})_− θ^up_{i+1} − r_i^⊤∆M_{i+1} − F_i^#(r_i) ) − G_i^#(v^(1), v^(0)) )
      = inf_{(v^(1),v^(0))∈R^{D+1}} Φ^low_{i+1}(v^(1), v^(0), r_i),  P-a.s.

We next turn to θ^up_i, where the overall strategy of proof is similar. Recall first that the monotonicity
of G in the y-component implies existence of a set Ω̄_ρ (depending on ρ^(0)) of full P-measure such
that ρ^(0)_k(ω) ≥ 0 for every ω ∈ Ω̄_ρ and k = j, . . . , J − 1. By (12) we find that, for any u ∈ R^D,
Φ^up_{i+1}(ρ^(1)_i, ρ^(0)_i, u) is a lower bound for θ^up_i on Ω̄_ρ:

    Φ^up_{i+1}(ρ^(1)_i, ρ^(0)_i, u)
      = ((ρ^(1)_i)^⊤β_{i+1})_+ θ^up_{i+1} − ((ρ^(1)_i)^⊤β_{i+1})_− θ^low_{i+1} − (ρ^(1)_i)^⊤∆M_{i+1}
        + ρ^(0)_i ( (u^⊤β_{i+1})_+ θ^up_{i+1} − (u^⊤β_{i+1})_− θ^low_{i+1} − u^⊤∆M_{i+1} − F_i^#(u) ) − G_i^#(ρ^(1)_i, ρ^(0)_i)
      ≤ ((ρ^(1)_i)^⊤β_{i+1})_+ θ^up_{i+1} − ((ρ^(1)_i)^⊤β_{i+1})_− θ^low_{i+1} − (ρ^(1)_i)^⊤∆M_{i+1}
        + ρ^(0)_i F_i( β_{i+1}( θ^up_{i+1} 1_{{u^⊤β_{i+1} ≥ 0}} + θ^low_{i+1} 1_{{u^⊤β_{i+1} < 0}} ) − ∆M_{i+1} ) − G_i^#(ρ^(1)_i, ρ^(0)_i)
      ≤ ((ρ^(1)_i)^⊤β_{i+1})_+ θ^up_{i+1} − ((ρ^(1)_i)^⊤β_{i+1})_− θ^low_{i+1} − (ρ^(1)_i)^⊤∆M_{i+1}
        + ρ^(0)_i max_{ι∈{up,low}} F_i( β_{i+1}θ^ι_{i+1} − ∆M_{i+1} ) − G_i^#(ρ^(1)_i, ρ^(0)_i)
      = θ^up_i.

Hence,

    sup_{u∈R^D} Φ^up_{i+1}(ρ^(1)_i, ρ^(0)_i, u) ≤ θ^up_i

on Ω̄_ρ, and, thus, P-almost surely. To complete the proof of the proposition, we show the converse
inequality. As θ^up_{i+1} ≥ θ^low_{i+1} P-a.s., we conclude, by (12),

    θ^up_i = ((ρ^(1)_i)^⊤β_{i+1})_+ θ^up_{i+1} − ((ρ^(1)_i)^⊤β_{i+1})_− θ^low_{i+1} − (ρ^(1)_i)^⊤∆M_{i+1}
             + ρ^(0)_i max_{ι∈{up,low}} F_i( β_{i+1}θ^ι_{i+1} − ∆M_{i+1} ) − G_i^#(ρ^(1)_i, ρ^(0)_i)
      = ((ρ^(1)_i)^⊤β_{i+1})_+ θ^up_{i+1} − ((ρ^(1)_i)^⊤β_{i+1})_− θ^low_{i+1} − (ρ^(1)_i)^⊤∆M_{i+1}
             + ρ^(0)_i max_{ι∈{up,low}} sup_{u∈R^D} ( u^⊤β_{i+1}θ^ι_{i+1} − u^⊤∆M_{i+1} − F_i^#(u) ) − G_i^#(ρ^(1)_i, ρ^(0)_i)
      ≤ ((ρ^(1)_i)^⊤β_{i+1})_+ θ^up_{i+1} − ((ρ^(1)_i)^⊤β_{i+1})_− θ^low_{i+1} − (ρ^(1)_i)^⊤∆M_{i+1}
             + ρ^(0)_i sup_{u∈R^D} ( (u^⊤β_{i+1})_+ θ^up_{i+1} − (u^⊤β_{i+1})_− θ^low_{i+1} − u^⊤∆M_{i+1} − F_i^#(u) ) − G_i^#(ρ^(1)_i, ρ^(0)_i)
      = sup_{u∈R^D} Φ^up_{i+1}(ρ^(1)_i, ρ^(0)_i, u),  P-a.s.

As Φ_{i+1}(v^(1), v^(0), u, ϑ_1, ϑ_2, m) is increasing in ϑ_1 and decreasing in ϑ_2, we finally get

    θ^up_i = sup_{u∈R^D} Φ_{i+1}( ρ^(1)_i, ρ^(0)_i, u, θ^up_{i+1}, θ^low_{i+1}, ∆M_{i+1} )
           ≥ sup_{u∈R^D} Φ_{i+1}( ρ^(1)_i, ρ^(0)_i, u, θ^low_{i+1}, θ^up_{i+1}, ∆M_{i+1} )
           ≥ inf_{(v^(1),v^(0))∈R^{D+1}} Φ_{i+1}( v^(1), v^(0), r_i, θ^low_{i+1}, θ^up_{i+1}, ∆M_{i+1} ) = θ^low_i,  P-a.s.,

as θ^up_{i+1} ≥ θ^low_{i+1} P-a.s. by the induction hypothesis.
We are now in the position to complete the proof of Theorem 4.5.
Proof of Theorem 4.5. Let j ∈ {0, . . . , J − 1} be fixed from now on. Due to Propositions B.2 and
B.3, it only remains to show that

    E_i[θ^low_i] ≤ Y_i ≤ E_i[θ^up_i]

for i = j, . . . , J. We prove this by backward induction on i. To this end, we fix M ∈ M^D and
controls (ρ^(1), ρ^(0)) and r in A^G_j respectively A^F_j, as well as 'optimizers' (ρ^(1,∗), ρ^(0,∗)) and r^∗ in
A^G_j respectively A^F_j which satisfy the duality relations (22) and (23). By definition of θ^up and
θ^low the assertion is trivially true for i = J. Suppose that the assertion is true for i + 1. Recalling
Proposition B.3 and applying the tower property of the conditional expectation as well as the
induction hypothesis, we get

    E_i[θ^low_i] = E_i[ inf_{(v^(1),v^(0))∈R^{D+1}} ( ((v^(1))^⊤β_{i+1})_+ θ^low_{i+1} − ((v^(1))^⊤β_{i+1})_− θ^up_{i+1} − (v^(1))^⊤∆M_{i+1}
             + v^(0)( (r_i^⊤β_{i+1})_+ θ^low_{i+1} − (r_i^⊤β_{i+1})_− θ^up_{i+1} − r_i^⊤∆M_{i+1} − F_i^#(r_i) ) − G_i^#(v^(1), v^(0)) ) ]
      ≤ E_i[ ((ρ^(1,∗)_i)^⊤β_{i+1})_+ E_{i+1}[θ^low_{i+1}] − ((ρ^(1,∗)_i)^⊤β_{i+1})_− E_{i+1}[θ^up_{i+1}] − (ρ^(1,∗)_i)^⊤∆M_{i+1}
             + ρ^(0,∗)_i ( (r_i^⊤β_{i+1})_+ E_{i+1}[θ^low_{i+1}] − (r_i^⊤β_{i+1})_− E_{i+1}[θ^up_{i+1}] − r_i^⊤∆M_{i+1} − F_i^#(r_i) ) − G_i^#(ρ^(1,∗)_i, ρ^(0,∗)_i) ]
      ≤ E_i[ (ρ^(1,∗)_i)^⊤β_{i+1}Y_{i+1} + ρ^(0,∗)_i ( r_i^⊤β_{i+1}Y_{i+1} − F_i^#(r_i) ) − G_i^#(ρ^(1,∗)_i, ρ^(0,∗)_i) ]
      ≤ G_i( E_i[β_{i+1}Y_{i+1}], F_i(E_i[β_{i+1}Y_{i+1}]) )
      = Y_i.

Here, the last inequality is an immediate consequence of (12), the nonnegativity of ρ^(0,∗)_i and the
duality relation (22). Applying an analogous argument, we obtain that E_i[θ^up_i] ≥ Y_i. Indeed,

    E_i[θ^up_i] = E_i[ ((ρ^(1)_i)^⊤β_{i+1})_+ θ^up_{i+1} − ((ρ^(1)_i)^⊤β_{i+1})_− θ^low_{i+1} − (ρ^(1)_i)^⊤∆M_{i+1}
             + ρ^(0)_i sup_{u∈R^D} ( (u^⊤β_{i+1})_+ θ^up_{i+1} − (u^⊤β_{i+1})_− θ^low_{i+1} − u^⊤∆M_{i+1} − F_i^#(u) ) − G_i^#(ρ^(1)_i, ρ^(0)_i) ]
      ≥ E_i[ ((ρ^(1)_i)^⊤β_{i+1})_+ E_{i+1}[θ^up_{i+1}] − ((ρ^(1)_i)^⊤β_{i+1})_− E_{i+1}[θ^low_{i+1}] − (ρ^(1)_i)^⊤∆M_{i+1}
             + ρ^(0)_i ( ((r^∗_i)^⊤β_{i+1})_+ E_{i+1}[θ^up_{i+1}] − ((r^∗_i)^⊤β_{i+1})_− E_{i+1}[θ^low_{i+1}] − (r^∗_i)^⊤∆M_{i+1} − F_i^#(r^∗_i) ) − G_i^#(ρ^(1)_i, ρ^(0)_i) ]
      ≥ (ρ^(1)_i)^⊤ E_i[β_{i+1}Y_{i+1}] + ρ^(0)_i ( (r^∗_i)^⊤ E_i[β_{i+1}Y_{i+1}] − F_i^#(r^∗_i) ) − G_i^#(ρ^(1)_i, ρ^(0)_i)
      = (ρ^(1)_i)^⊤ E_i[β_{i+1}Y_{i+1}] + ρ^(0)_i F_i( E_i[β_{i+1}Y_{i+1}] ) − G_i^#(ρ^(1)_i, ρ^(0)_i)
      ≥ G_i( E_i[β_{i+1}Y_{i+1}], F_i(E_i[β_{i+1}Y_{i+1}]) )
      = Y_i,

making now use of the nonnegativity of ρ^(0)_i, the duality relation (23), and (11). This establishes

    E_i[θ^low_i] ≤ Y_i ≤ E_i[θ^up_i]

for i = j, . . . , J.
B.3 Proof of Proposition 4.6
By the definition of θ^up, the tower property of the conditional expectation, Jensen's inequality
(applied to the convex functions max and F_j), and the comparison Y^up_{j+1} ≥ Y^low_{j+1} (in view of
Proposition B.3), we obtain

    Y^up_j = E_j[ ((ρ^(1)_j)^⊤β_{j+1})_+ θ^up_{j+1} − ((ρ^(1)_j)^⊤β_{j+1})_− θ^low_{j+1} − (ρ^(1)_j)^⊤∆M_{j+1}
             + ρ^(0)_j max_{ι∈{up,low}} F_j( β_{j+1}θ^ι_{j+1} − ∆M_{j+1} ) − G_j^#(ρ^(1)_j, ρ^(0)_j) ]
      ≥ E_j[ ((ρ^(1)_j)^⊤β_{j+1})_+ Y^up_{j+1} ] − E_j[ ((ρ^(1)_j)^⊤β_{j+1})_− Y^low_{j+1} ]
             + ρ^(0)_j max_{ι∈{up,low}} F_j( E_j[β_{j+1}Y^ι_{j+1}] ) − G_j^#(ρ^(1)_j, ρ^(0)_j)
      ≥ (ρ^(1)_j)^⊤ E_j[β_{j+1}Y^up_{j+1}] + ρ^(0)_j max_{ι∈{up,low}} F_j( E_j[β_{j+1}Y^ι_{j+1}] ) − G_j^#(ρ^(1)_j, ρ^(0)_j)
      ≥ (ρ^(1)_j)^⊤ E_j[β_{j+1}Y^up_{j+1}] + ρ^(0)_j F_j( E_j[β_{j+1}Y^up_{j+1}] ) − G_j^#(ρ^(1)_j, ρ^(0)_j)
      ≥ G_j( E_j[β_{j+1}Y^up_{j+1}], F_j(E_j[β_{j+1}Y^up_{j+1}]) ),

making, again, use of the nonnegativity of ρ^(0)_j and (11). As in the previous proofs, the argument
for the lower bound Y^low is essentially the same, and we skip the details.

B.4 Proof of Theorem 4.7
Proof of Theorem 4.7
Let j ∈ {0, ..., J − 1} be fixed. Due to Proposition B.2 and the comparison principle (Comp), it
only remains to show that the pair of processes Y up and Y low given by
Yiup = Ei [Θup
i ]
and
Yilow = Ei [Θlow
i ],
i = j, ..., J,
defines a super- and a subsolution to (10). The proof of this claim is similar to the proof of
Proposition 4.6 but simpler.
C Proofs on Stochastic Games
In this section, we provide the technical details for the discussion of stochastic two-player zero-sum
games in Sections 3 and 5.
C.1 Alternative Representation
We first derive the alternative representation for Θ^low_0(r, M) as a pathwise minimization problem
discussed in Section 5. To this end, define, for some fixed control r ∈ A^F_0 and martingale M ∈ M^D,

    Θ̃^low_i = inf_{(v^(1)_j, v^(0)_j)∈R^{D+1}, j=i,...,J−1} ( w_{i,J}(v^(1), v^(0), r) ξ
             − Σ_{j=i}^{J−1} w_{i,j}(v^(1), v^(0), r) ( v^(0)_j F_j^#(r_j) + (v^(1)_j + v^(0)_j r_j)^⊤ ∆M_{j+1} + G_j^#(v^(1)_j, v^(0)_j) ) ),

where

    w_{i,j}(v^(1), v^(0), u) = Π_{k=i}^{j−1} (v^(1)_k + v^(0)_k u_k)^⊤ β_{k+1}.

Then, Θ̃^low_J = ξ and, for i = 0, . . . , J − 1,

    Θ̃^low_i = inf_{(v^(1)_i, v^(0)_i)∈R^{D+1}} ( (v^(1)_i + v^(0)_i r_i)^⊤ β_{i+1}
               [ inf_{(v^(1)_j, v^(0)_j)∈R^{D+1}, j=i+1,...,J−1} ( w_{i+1,J}(v^(1), v^(0), r) ξ
                 − Σ_{j=i+1}^{J−1} w_{i+1,j}(v^(1), v^(0), r) ( v^(0)_j F_j^#(r_j) + (v^(1)_j + v^(0)_j r_j)^⊤ ∆M_{j+1} + G_j^#(v^(1)_j, v^(0)_j) ) ) ]
               − ( v^(0)_i F_i^#(r_i) + (v^(1)_i + v^(0)_i r_i)^⊤ ∆M_{i+1} + G_i^#(v^(1)_i, v^(0)_i) ) ).

The outer infimum can be restricted to those (v^(1)_i, v^(0)_i) ∈ R^{D+1} which belong to the effective
domain D_{G_i^#(ω,·)}, because the expression which is to be minimized is +∞ otherwise. Then,
assumption (15) implies that the inner infimum can be interchanged with the nonnegative factor
(v^(1)_i + v^(0)_i r_i)^⊤ β_{i+1}, which yields

    Θ̃^low_i = inf_{(v^(1)_i, v^(0)_i)∈D_{G_i^#(ω,·)}} ( (v^(1)_i + v^(0)_i r_i)^⊤ β_{i+1} Θ̃^low_{i+1}
               − ( v^(0)_i F_i^#(r_i) + (v^(1)_i + v^(0)_i r_i)^⊤ ∆M_{i+1} + G_i^#(v^(1)_i, v^(0)_i) ) ),    (31)

and the infimum can, by the same argument as above, again be replaced by that over the whole
R^{D+1}. Further rewriting (31) using (11) shows that the recursion for Θ̃^low_i coincides with the one
for Θ^low_i. Therefore,

    Θ^low_0(r, M) = Θ̃^low_0 = inf_{(v^(1)_j, v^(0)_j)∈R^{D+1}, j=0,...,J−1} ( w_J(v^(1), v^(0), r) ξ
             − Σ_{j=0}^{J−1} w_j(v^(1), v^(0), r) ( v^(0)_j F_j^#(r_j) + (v^(1)_j + v^(0)_j r_j)^⊤ ∆M_{j+1} + G_j^#(v^(1)_j, v^(0)_j) ) ),

P-almost surely. The alternative expression for Θ^up_0(ρ^(1), ρ^(0), M) as a pathwise maximization
problem can be shown in the same way.
C.2 Equilibrium Value
It remains to show that Y_0 is the equilibrium value of the two-player zero-sum game. We apply
the alternative representations for Θ^up and Θ^low as pathwise optimization problems from Section
C.1 as well as Theorem 4.7 (twice) in order to conclude

    Y_0 = sup_{r∈A^F_0, M∈M^D} E[Θ^low_0(r, M)]
      = sup_{r∈A^F_0, M∈M^D} E[ inf_{(v^(1)_j, v^(0)_j)∈R^{D+1}, j=0,...,J−1} ( w_J(v^(1), v^(0), r) ξ
             − Σ_{j=0}^{J−1} w_j(v^(1), v^(0), r) ( v^(0)_j F_j^#(r_j) + (v^(1)_j + v^(0)_j r_j)^⊤ ∆M_{j+1} + G_j^#(v^(1)_j, v^(0)_j) ) ) ]
      ≤ sup_{r∈A^F_0} inf_{(ρ^(1),ρ^(0))∈A^G_0} E[ w_J(ρ^(1), ρ^(0), r) ξ
             − Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), r) ( ρ^(0)_j F_j^#(r_j) + G_j^#(ρ^(1)_j, ρ^(0)_j) ) ]
      ≤ inf_{(ρ^(1),ρ^(0))∈A^G_0} sup_{r∈A^F_0} E[ w_J(ρ^(1), ρ^(0), r) ξ
             − Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), r) ( ρ^(0)_j F_j^#(r_j) + G_j^#(ρ^(1)_j, ρ^(0)_j) ) ]
      ≤ inf_{(ρ^(1),ρ^(0))∈A^G_0, M∈M^D} E[ sup_{(u_j)∈R^D, j=0,...,J−1} ( w_J(ρ^(1), ρ^(0), u) ξ
             − Σ_{j=0}^{J−1} w_j(ρ^(1), ρ^(0), u) ( ρ^(0)_j F_j^#(u_j) + (ρ^(1)_j + ρ^(0)_j u_j)^⊤ ∆M_{j+1} + G_j^#(ρ^(1)_j, ρ^(0)_j) ) ) ]
      = inf_{(ρ^(1),ρ^(0))∈A^G_0, M∈M^D} E[ Θ^up_0(ρ^(1), ρ^(0), M) ] = Y_0.

Here we also applied the zero-expectation property of martingale penalties at adapted controls.
Consequently, all inequalities turn into equalities, which completes the proof.
References
S. Alanko and M. Avellaneda. Reducing variance in the numerical solution of BSDEs. Comptes
Rendus Mathematique, 351(3):135–138, 2013.
L. Andersen and M. Broadie. Primal-dual simulation algorithm for pricing multidimensional
American options. Management Science, 50(9):1222–1234, 2004.
M. Avellaneda, A. Levy, and A. Parás. Pricing and hedging derivative securities in markets with
uncertain volatilities. Applied Mathematical Finance, 2(2):73–88, 1995.
C. Bender and J. Steiner. Least-squares Monte Carlo for Backward SDEs. In R. Carmona,
P. Del Moral, P. Hu, and N. Oudjane, editors, Numerical Methods in Finance, pages 257–289.
Springer, 2012.
C. Bender, N. Schweizer, and J. Zhuo. A primal-dual algorithm for BSDEs. Mathematical Finance,
Early View, DOI: 10.1111/mafi.12100, 2015.
D. P. Bertsekas. Dynamic programming and optimal control, volume I. Athena Scientific, 3rd
edition, 2005.
C. Beveridge and M. Joshi. Monte Carlo bounds for game options including convertible bonds.
Management Science, 57(5):960–974, 2011.
D. Brigo and F. Mercurio. Interest Rate Models – Theory and Practice: With Smile, Inflation
and Credit. Springer, 2nd edition, 2006.
D. Brigo and A. Pallavicini. Counterparty risk pricing under correlation between default and
interest rates. In J. H. Miller, D. C. Edelman, and J. A. D. Appleby, editors, Numerical
Methods for Finance, pages 63–81. Chapman and Hall/CRC, 2007.
D. Brigo, M. Morini, and A. Pallavicini. Counterparty Credit Risk, Collateral and Funding: With
Pricing Cases for All Asset Classes. John Wiley & Sons, 2013.
D. B. Brown and J. E. Smith. Dynamic portfolio optimization with transaction costs: Heuristics
and dual bounds. Management Science, 57(10):1752–1770, 2011.
D. B. Brown, J. E. Smith, and P. Sun. Information relaxations and duality in stochastic dynamic
programs. Operations Research, 58(4):785–801, 2010.
P. Cheridito, M. Kupper, and N. Vogelpoth. Conditional analysis on R^d. In A. H. Hamel,
F. Heyde, A. Löhne, B. Rudloff, and C. Schrage, editors, Set Optimization and Applications in
Finance - The State of the Art, pages 179–211. Springer, 2015.
S. Crépey. Financial Modeling. Springer, 2013.
S. Crépey, R. Gerboud, Z. Grbac, and N. Ngor. Counterparty risk and funding: The four wings
of the TVA. International Journal of Theoretical and Applied Finance, 16(2):1350006, 2013.
S. Crépey, T. R. Bielecki, and D. Brigo. Counterparty Risk and Funding: A Tale of Two Puzzles.
Chapman and Hall/CRC, 2014.
D. Duffie, M. Schroder, and C. Skiadas. Recursive valuation of defaultable securities and the
timing of resolution of uncertainty. Annals of Applied Probability, pages 1075–1090, 1996.
N. El Karoui, S. Peng, and M. C. Quenez. Backward stochastic differential equations in finance.
Mathematical Finance, 7(1):1–71, 1997.
A. Fahim, N. Touzi, and X. Warin. A probabilistic numerical method for fully nonlinear parabolic
PDEs. Annals of Applied Probability, 21(4):1322–1364, 2011.
E. Fournié, J.-M. Lasry, J. Lebuchoux, P.-L. Lions, and N. Touzi. Applications of Malliavin
calculus to Monte Carlo methods in finance. Finance and Stochastics, 3(4):391–412, 1999.
P. Glasserman and B. Yu. Simulation for American options: Regression now or regression later?
In H. Niederreiter, editor, Monte Carlo and Quasi-Monte Carlo Methods 2002, pages 213–226.
Springer, 2004.
J. Guyon and P. Henry-Labordère. The uncertain volatility model: A Monte Carlo approach.
Journal of Computational Finance, 14(3):37–71, 2011.
J. Guyon and P. Henry-Labordère. Nonlinear Option Pricing. Chapman and Hall/CRC, 2013.
M. Haugh and A. E. B. Lim. Linear–quadratic control and information relaxations. Operations
Research Letters, 40(6):521–528, 2012.
M. B. Haugh and L. Kogan. Pricing American options: A duality approach. Operations Research,
52(2):258–270, 2004.
M. B. Haugh and C. Wang. Information relaxations and dynamic zero-sum games. arXiv preprint
1405.4347, 2015.
I. Karatzas and S. Shreve. Brownian Motion and Stochastic Calculus. Springer, 2nd edition,
1991.
H. Kraft and F. T. Seifried. Stochastic differential utility as the continuous-time limit of recursive
utility. Journal of Economic Theory, 151:528–550, 2014.
C. Kühn, A. E. Kyprianou, and K. van Schaik. Pricing Israeli options: A pathwise approach.
Stochastics, 79(1-2):117–137, 2007.
R. Lord, R. Koekkoek, and D. van Dijk. A comparison of biased simulation schemes for stochastic
volatility models. Quantitative Finance, 10(2):177–194, 2010.
T. J. Lyons. Uncertain volatility and the risk-free synthesis of derivatives. Applied Mathematical
Finance, 2(2):117–133, 1995.
H. Pham. Continuous-time Stochastic Control and Optimization with Financial Applications.
Springer, 2009.
W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. John
Wiley & Sons, 2nd edition, 2011.
N. S. Rasmussen. Control variates for Monte Carlo valuation of American options. Journal of
Computational Finance, 9(1):84–102, 2005.
R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
L. C. G. Rogers. Monte Carlo valuation of American options. Mathematical Finance, 12(3):
271–286, 2002.
L. C. G. Rogers. Pathwise stochastic optimal control. SIAM Journal on Control and Optimization,
46(3):1116–1132, 2007.
L. S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100, 1953.
J. M. Vanden. Exact superreplication strategies for a class of derivative assets. Applied Mathematical Finance, 13(1):61–87, 2006.
J. Zhang. A numerical scheme for BSDEs. Annals of Applied Probability, 14(1):459–488, 2004.