Pathwise Dynamic Programming Christian Bender1,2 , Christian Gärtner2 , and Nikolaus Schweizer3 April 8, 2016 Abstract We present a novel method for deriving tight Monte Carlo confidence intervals for solutions of stochastic dynamic programming equations. Taking some approximate solution to the equation as an input, we construct pathwise recursions with a known bias. Suitably coupling the recursions for lower and upper bounds ensures that the method is applicable even when the dynamic program does not satisfy a comparison principle. We apply our method to two nonlinear option pricing problems, pricing under bilateral counterparty risk and pricing under uncertain volatility. Keywords: stochastic dynamic programming; Monte Carlo; confidence bounds; option pricing MSC 2000: 90C39, 65C05, 90C46, 91B28. 1 Introduction We study the problem of computing Monte Carlo confidence intervals for solutions of discretetime, finite horizon stochastic dynamic programming equations. There is a random terminal value, and a nonlinear recursion which allows to compute the value at a given time from expectations about the value one step ahead. Equations of this type are highly prominent in the analysis of multistage sequential decision problems under uncertainty (Bertsekas, 2005; Powell, 2011). Yet they also arise in a variety of further applications such as financial option pricing (e.g. Guyon and Henry-Labordère, 2013), the evaluation of recursive utility functionals (e.g. Kraft and Seifried, 2014) or the numerical solution of partial differential equations (e.g. Fahim et al., 2011). The key challenge when computing solutions to stochastic dynamic programs numerically stems from a high order nesting of conditional expectations operators in the backward recursion. The solution at each time step depends on an expectation of what happens one step ahead, which in turn depends on expectations of what happens at later dates. If the system is driven by a 1 Corresponding author. Saarland University, Department of Mathematics, Postfach 151150, D-66041 Saarbrücken, Germany, bender@math.uni-sb.de; gaertner@math.uni-sb.de. 3 University of Duisburg-Essen, Mercator School of Management, Lotharstr. 65, D-46057 Duisburg, nikolaus.schweizer@uni-due.de. 2 1 Markov process with a high-dimensional state space – as is the case in typical applications – a naive numerical approach quickly runs into the curse of dimensionality. In practice, conditional expectations thus need to be replaced by an approximate conditional expectations operator which can be nested several times at moderate computational costs, e.g., by a least-squares Monte Carlo approximation. For an introduction and overview of this “approximate dynamic programming” approach, see, e.g., Powell (2011). The error induced by these approximations is typically hard to quantify and control. This motivates us to develop a posteriori criteria which use an approximate or heuristic solution to the dynamic program as an input in order to compute tight upper and lower bounds on the true solution. In this context, “tight” means that the bounds match when the input approximation coincides with the true solution. Specifically, we study stochastic dynamic programming equations of the form Yj = Gj (Ej [βj+1 Yj+1 ], Fj (Ej [βj+1 Yj+1 ])) (1) with random terminal condition YJ = ξ. Here, Ej [·] denotes the conditional expectation given the information at time step j. Fj and Gj are real-valued, adapted, random functions. Fj is convex while Gj is concave and increasing in its second argument. In particular, a possible dependence on state and decision variables is modeled implicitly through the functions Fj and Gj . The stochastic weight vectors βj form an adapted process, i.e., unlike Fj and Gj , βj+1 is known at time j + 1 but not yet at time j. Such weight processes frequently arise in dynamic programs for financial option valuation problems where, roughly speaking, they are associated with the first two derivatives, Delta and Gamma, of the value process Y . Similarly, Fahim et al. (2011) derive time discretization schemes of the form (1) for fully nonlinear second-order parabolic partial differential equations which involve stochastic weights in the approximation of derivatives. As prominent examples, this class of partial differential equations includes Hamilton-Jacobi-Bellman equations arising from continuous-time stochastic control problems. There are two main motivations for assuming the concave-convex structure of the right hand side of (1). First, since convexity and concavity may stem, respectively, from maximization or minimization, the equation is general enough to cover not only dynamic programs associated with classical optimization problems but also dynamic programs arising from certain stochastic twoplayer games or robust optimization problems under model uncertainty. Second, many functions which are themselves neither concave nor convex may be written in this concave-convex form. This can be seen, for instance, in the application to bilateral counterparty credit risk in Section 6.1 below. Traditionally, a posteriori criteria for this type of recursion were not derived directly from equation (1) but rather from a primal-dual pair of optimization problems associated with it. For instance, with the choices Gj (z, y) = y, Fj (z) = max{Sj , z}, βj = 1 and ξ = SJ , (1) becomes the dynamic programming equation associated with the optimal stopping problem for a process (Sj )j . From an approximate solution of (1) one can conclude an approximation of the optimal stopping 2 strategy. Testing this strategy by simulation yields lower bounds on the value of the optimal stopping problem. The derivation of dual upper bounds starts by considering strategies which allow to look into the future, i.e., by considering strategies which allow to optimize pathwise rather than in conditional expectation. The best strategy of this type is easy to simulate and implies an upper bound on the value of the usual optimal stopping problem. This bound can be made tight by penalizing the use of future information in an appropriate way. Approximately optimal information penalties can be derived from approximate solutions to (1). Combining the resulting low-biased and high-biased Monte Carlo estimators for Y0 yields Monte Carlo confidence intervals for the solution of the dynamic program. This information relaxation approach was developed in the context of optimal stopping independently by Rogers (2002) and Haugh and Kogan (2004). The approach was extended to general discrete-time stochastic optimization problems by Brown et al. (2010) and Rogers (2007). Recently, Bender et al. (2015) have proposed a posteriori criteria which are derived directly from a dynamic programming recursion like (1). The obvious advantage is that, in principle, this approach requires neither knowledge nor existence of associated primal and dual optimization problems. This paper generalizes the approach of Bender et al. (2015) in various directions. For example, we merely require that the functions Fj and Gj are of polynomial growth while the corresponding condition in the latter paper is Lipschitz continuity. Moreover, we assume that the weights βj are sufficiently integrable rather than bounded, and introduce the concave-convex functional form of the right hand side of (1) which is more flexible than the functional forms considered there. Conceptually, our main departure from the approach of Bender et al. (2015) is that we do not require that the recursion (1) satisfies a comparison principle. Suppose that two adapted processes Y low and Y up fulfill the analogs of (1) with “=” replaced, respectively, by “≤” and “≥”. We call such processes subsolutions and supersolutions to (1). The comparison principle postulates that subsolutions are always smaller than supersolutions. Relying on such a comparison principle, the approach of Bender et al. (2015) mimics the classical situation in the sense that their lower and upper bounds can be interpreted as stemming, respectively, from a primal and a dual optimization problem constructed from (1). The bounds of the present paper apply regardless of whether a comparison principle holds or not. This increased applicability is achieved at negligible additional numerical costs. The key idea is to construct a pathwise recursion associated with particular pairs of super- and subsolutions. These pairs remain ordered even when such an ordering is not a generic property of super- and subsolutions, i.e., when comparison is violated in general. Roughly speaking, whenever a violation of comparison threatens to reverse the order of the upper and lower bound, the upper and lower bound are simply exchanged on the respective path. Consequently, lower and upper bounds must always be computed simultaneously. In particular, they can no longer be viewed as the respective solutions of distinct primal and dual optimization problems. 3 For some applications like optimal stopping, the comparison principle is not an issue. For others like the nonlinear pricing applications studied in Bender et al. (2015), the principle can be set in force by a relatively mild truncation of the stochastic weights β. Dispensing with the comparison principle allows us to avoid any truncation and the corresponding truncation error. This matters because there is also a class of applications where the required truncation levels are so low that truncation would fundamentally alter the problem. This concerns, in particular, backward recursions associated with fully nonlinear second-order partial differential equations such as the problem of pricing under volatility uncertainty which serves as a running example throughout the paper. Solving equation (1) for the uncertain volatility model is well-known as a challenging numerical problem (Guyon and Henry-Labordère, 2011; Alanko and Avellaneda, 2013), making an a posteriori evaluation of solutions particularly desirable. The paper is organized as follows: Section 2 provides further motivation for the stochastic dynamic programming equation (1) by discussing how it arises in a series of examples. Section 3 states our setting and key assumptions in detail and explores a partial connection between equation (1) and a class of two-player games which makes some of the constructions of the later sections easier to interpret. Section 4 presents our main results. We first discuss the restrictiveness of assuming the comparison principle in the applications we are interested in. Afterwards, we proceed to Theorem 4.5, the key result of the paper, providing a new pair of upper and lower bounds which is applicable in the absence of a comparison principle. In Section 5, we relate our results to the information relaxation duals of Brown et al. (2010). In particular, we show that our bounds in the presence of the comparison principle can be reinterpreted in terms of information relaxation bounds for stochastic two-player zero-sum games, complementing recent work by Haugh and Wang (2015). Finally, in Section 6, we apply our results in the context of two nonlinear valuation problems, option pricing in a four-dimensional interest rate model with bilateral counterparty risk and in the uncertain volatility model. While such nonlinearities in pricing due to counterparty risk and model uncertainty have received increased attention since the financial crisis, Monte Carlo confidence bounds for these two problems were previously unavailable. All proofs are postponed to the Appendix. 2 Examples In this section, we briefly discuss several examples of backward dynamic programming equations arising in option pricing problems, which are covered by the framework of our paper. These include nonlinearities due to early exercise features, counterparty risk, and model uncertainty. Example 2.1 (Bermudan options). Our first example is the optimal stopping problem, or, in financial terms, the pricing problem of a Bermudan option. Given a stochastic process Sj , j = 0, . . . , J, which is adapted to the information given in terms of a filtration on an underlying 4 filtered probability space, one wishes to maximize the expected reward from stopping S, i.e., Y0 := sup E[Sτ ], (2) τ where τ runs through the set of {0, . . . , J}-valued stopping times. If the expectation is taken with respect to a pricing measure, under which all tradable and storable securities in an underlying financial market are martingales, then Y0 is a fair price of a Bermudan option with discounted payoff process given by S. It is well-known and easy to check that Y0 is the initial value of the dynamic program Yj = max{Ej [Yj+1 ], Sj }, YJ = SJ , (3) which is indeed of the form (1). In the traditional primal-dual approach, the primal maximization problem (2) is complemented by a dual minimization problem. This is the information relaxation dual which was initiated for this problem independently by works of Rogers (2002) and Haugh and Kogan (2004). It states that " Y0 = inf E M # sup (Sj − Mj ) , (4) j=0,...,J where M runs over the set of martingales which start in zero at time zero. In order to illustrate our pathwise dynamic programming approach, we shall present an alternative proof of this dual representation for optimal stopping in Example 4.2 below, which, in contrast to the arguments in Rogers (2002) and Haugh and Kogan (2004), does not make use of specific properties of the optimal stopping problem (2), but rather relies on the convexity of the max-operator in the associated dynamic programming equation (3). Example 2.2 (Credit value adjustment). In order to explain the derivation of a discrete-time dynamic programming equation of the form (1) for pricing under counterparty risk, we restrict ourselves to a simplified setting, but refer, e.g., to the monographs by Brigo et al. (2013) or Crépey et al. (2014) for more realistic situations: A party A and a counterparty B trade several derivatives, all of which mature at the same time T . The random variable ξ denotes the (possibly negative) cumulative payoff of the basket of derivatives which A receives at time T . We assume that ξ is measurable with respect to the market’s reference information which is given by a filtration (Ft )0≤t≤T , and that only the counterparty B can default. Default occurs at an exponential waiting time τ with parameter γ which is independent of the reference filtration. The full filtration (Gt )0≤t≤T is the smallest one which additionally to the reference information contains observation of the default event, i.e. it satisfies {τ ≤ t} ∈ Gt for every t ∈ [0, T ]. A recovery scheme X = (Xt )0≤t≤T describes the (possibly negative) amount of money which A receives, if B defaults at time t. The process X is assumed to be adapted to the reference filtration. We denote by D(t, s; κ) = e−κ(s−t) , t ≤ s, the discount factor for a rate κ and stipulate that the constant r ≥ 0 is a good proxy to a risk-free rate. Then, standard calculations in this reduced 5 form approach (e.g. Duffie et al., 1996) show that the fair price for these derivatives under default risk at time t is given by Vt = = 1{τ >t} E 1{τ >T } D(t, T, r)ξ + 1{τ ≤T } D(t, τ, r)Xτ Gt Z T 1{τ >t} E D(t, T, r + γ) ξ + D(s, T, −(r + γ))γXs ds Ft (5) t =: 1{τ >t} Yt , where expectation is again taken with respect to a pricing measure. We assume that, upon default, the contract (consisting of the basket of derivatives) is replaced by the same one with a new counterparty which has not yet defaulted but is otherwise identical to the old counterparty B. This suggests the choice Xt = −(Yt )− + r(Yt )+ = Yt − (1 − r)(Yt )+ for some recovery rate r ∈ [0, 1]. Here (·)± denote the positive part and the negative part, respectively. Substituting this definition of X into (5) and integrating by parts, leads to the following nonlinear backward stochastic differential equation: Z Yt = E ξ − t T rYs + γ(1 − r)(Ys )+ ds Ft . (6) i h RT Note that, by Proposition 3.4 in El Karoui et al. (1997), Y0 = inf ρ E e− 0 ρs ds ξ , where ρ runs over the set of adapted processes with values in [r, r+γ(1−r)]. Thus, the party A can minimize the price by choosing the interest rate which is applied for the discounting within a range determined by the riskless rate r, the credit spread γ of the counterparty, and the recovery rate r. Discretizing (6) in time, we end up with the backward dynamic programming equation Yj = (1 − r∆)E[Yj+1 |Ftj ] − γ(1 − r) E[Yj+1 |Ftj ] + ∆, YJ = ξ, where tj = jh, j = 0, . . . , J, and ∆ = T /J. This equation is of the form (1) with βj ≡ 1, Gj (z, y) = (1 − r∆)z − γ(1 − r)∆ (z)+ , and Fj (z) ≡ 1. We note that, starting with the work by Zhang (2004), time discretization schemes for backward stochastic differential equations have been intensively studied, see, e.g., the literature overview in Bender and Steiner (2012). If one additionally takes the default risk and the funding cost of the party A into account, the corresponding dynamic programming equation has both concave and convex nonlinearities, and Y0 corresponds to the equilibrium value of a two-player game, see the discussion at the end of Section 3. This is one reason to consider the more flexible concave-convex form (1). A numerical example in the theoretical framework of Crépey et al. (2013) with random interest and default rates is presented in Section 6 below. Example 2.3 (Uncertain volatility). Dynamic programming equations of the form (1) also appear in the discretization of Hamilton-Jacobi-Bellman equations for stochastic control problems, see, e.g., Fahim et al. (2011). We illustrate such an approach by the pricing problem of a European 6 option on a single asset under uncertain volatility, but the underlying discretization methodology of Fahim et al. (2011) can, of course, be applied to a wide range of (multidimensional) stochastic control problems in continuous time. We suppose that the price of the asset is given under risk-neutral dynamics and in discounted units by Z Xtσ t = x0 exp 0 1 σu dWu − 2 Z t σu2 du , 0 where W denotes a Brownian motion, and the volatility σ is a stochastic process which is adapted to the Brownian filtration (Ft )0≤t≤T . The pricing problem of a European option with payoff function g under uncertain volatility then becomes the optimization problem Y0 = sup E[g(XTσ )], (7) σ where σ runs over the nonanticipating processes with values in [σlow , σup ] for some given constants σlow < σup . This is the worst case price over all stochastic volatility processes ranging within the interval [σlow , σup ], and thus reflects the volatility uncertainty. The study of this so-called uncertain volatility model was initiated by Avellaneda et al. (1995) and Lyons (1995). The corresponding Hamilton-Jacobi-Bellman equation can easily be transformed in such a way that Y0 = v(0, 0) where v solves the fully nonlinear Cauchy problem 1 1 vt (t, x) = − vxx (t, x) − max 2 σ∈{σlow ,σup } 2 1 2 v(T, x) = g(x0 eρ̂x− 2 ρ̂ T ), σ2 − 1 (vxx (t, x) − ρ̂vx (t, x)) , ρ̂2 (t, x) ∈ [0, T ) × R x ∈ R, (8) for any (constant) reference volatility ρ̂ > 0 of one’s choice. Under suitable assumptions on g, there exists a unique classical solution to (8) (satisfying appropriate growth conditions), see Pham (2009). Let 0 = t0 < t1 < . . . < tJ = T again be an equidistant partition of the interval [0, T ], where J ∈ N, and set ∆ = T /J. We approximate v by an operator splitting scheme, which on each small time interval first solves the linear subproblem, i.e., a Cauchy problem for the heat equation, and then corrects for the nonlinearity by plugging in the solution of the linear subproblem. Precisely, for fixed J, let 1 2 y J (x) = g(x0 eρ̂x− 2 ρ̂ T ), x ∈ R 1 j ȳtj (t, x) = − ȳxx (t, x), ȳ j (tj+1 , x) = y j+1 (x), t ∈ [tj , tj+1 ), x ∈ R, j = J − 1, . . . , 0 2 1 σ2 j j j y (x) = ȳ (tj , x) + ∆ max − 1 ȳxx (tj , x) − ρ̂ȳxj (tj , x) , x ∈ R. 2 ρ̂ σ∈{σlow ,σup } 2 Evaluating y j (x) along the Brownian paths leads to Yj := y j (Wtj ). Applying the FeynmanKac representation for the heat equation, see e.g. Karatzas and Shreve (1991), repeatedly, one 7 observes that ȳ j (tj , Wtj ) = Ej [Yj+1 ], where Ej denotes the expectation conditional on Ftj . It is well-known from Fournié et al. (1999) that the space derivatives of ȳ j (tj , ·) can be expressed via the so-called Malliavin Monte-Carlo weights as ȳxj (tj , Wtj ) = Ej " ∆Wj+1 Yj+1 , ∆ j ȳxx (tj , Wtj ) = Ej 2 ∆Wj+1 1 − 2 ∆ ∆ ! # Yj+1 , where ∆Wj = Wtj − Wtj−1 . (In our particular situation, one can verify this simply by re-writing the conditional expectations as integrals on R with respect to the Gaussian density and integrating by parts.) Hence, one arrives at the following discrete-time dynamic programming equation: YJ Yj = g(XTρ̂ ), = Ej [Yj+1 ] + ∆ max σ∈{σlow ,σup } 1 2 ! #! " 2 ∆Wj+1 ∆Wj+1 σ2 1 − 1 Ej − ρ̂ − Yj+1 , (9) ρ̂2 ∆2 ∆ ∆ where XTρ̂ is the price of the asset at time T under the constant reference volatility ρ̂. This type of time-discretization scheme was analyzed for a general class of fully nonlinear parabolic PDEs by Fahim et al. (2011). In the particular case of the uncertain volatility model, the scheme was suggested by Guyon and Henry-Labordère (2011) by a slightly different derivation. 2 Let Gj (z, y) = y, sι = 12 ( σρ̂2ι − 1) for ι ∈ {up, low}, and Fj (z) = z (1) + ∆ max s∈{slow ,sup } sz (2) , where z (i) denotes the i-th component of the two-dimensional vector z, and define the R2 -valued process β by βj = ! 1 ∆Wj2 ∆2 − ρ̂ ∆Wj ∆ − 1 ∆ , j = 1, . . . , J. With these choices, the dynamic programming equation for Y in (9) is of the form (1). We emphasize that, although the recursive equation for Y stems from the discretization of the HamiltonJacobi-Bellman equation of a continuous-time stochastic control problem, the discrete time process Y may fail to be the value process of a discrete-time control problem, when the weight 1 1+∆ 2 σ2 −1 ρ̂2 2 ∆Wj+1 ∆Wj+1 1 − ρ̂ − 2 ∆ ∆ ∆ ! can become negative with positive probability for σ = σlow or σ = σup , see Example 3.1 below. In such situations, the general theory of information relaxation duals due to Brown et al. (2010) cannot be applied to construct upper bounds on Y , but the pathwise dynamic programming approach, which we present in Section 4, still applies. 8 3 Setting In this section, we introduce the general setting of the paper, into which the examples of the previous section are accommodated. We then explain, how the dynamic programming equation (1) relates to a class of stochastic two-player games, which have a sound interpretation from the point of view of option pricing. While establishing these connections has some intrinsic interest, their main motivation is to make the constructions and results of Section 4 more tangible and intuitive. Throughout the paper we study the following type of concave-convex dynamic programming equation on a complete filtered probability space (Ω, F, (Fj )j=0,...J , P ) in discrete time: YJ = ξ, Yj = Gj (Ej [βj+1 Yj+1 ], Fj (Ej [βj+1 Yj+1 ])), j = J − 1, . . . , 0 (10) where Ej [·] denotes the conditional expectation with respect to Fj . We assume (C): For every j = 0, . . . , J − 1, Gj : Ω × RD × R → R and Fj : Ω × RD → R are measurable and, for every (z, y) ∈ RD × R, the processes (j, ω) 7→ Gj (ω, z, y) and (j, ω) 7→ Fj (ω, z) are adapted. Moreover, for every j = 0, . . . , J − 1 and ω ∈ Ω, the map (z, y) 7→ Gj (ω, z, y) is concave in (z, y) and non-decreasing in y, and the map z 7→ Fj (ω, z) is convex in z. (R): • G and F are of polynomial growth in (z, y) in the following sense: There exist a constant q ≥ 0 and a nonnegative adapted process (αj ) such that for all (z, y) ∈ RD+1 and j = 0, . . . , J − 1 |Gj (z, y)| + |Fj (z)| ≤ αj (1 + |z|q + |y|q ), P -a.s., and αj ∈ Lp (Ω, P ) for every p ≥ 1. • β = (βj )j=1,...,J is an adapted process such that βj ∈ Lp (Ω, P ) for every p ≥ 1 and j = 1, . . . , J. • The terminal condition ξ is an FJ -measurable random variable such that ξ ∈ Lp (Ω, P ) for every p ≥ 1. In the following, we abbreviate by • L∞− (RN ) the set of RN -valued random variables that are in Lp (Ω, P ) for all p ≥ 1. • L∞− (RN ) the set of Fj -measurable random variables that are in L∞− (RN ). j ∞− N • L∞− (RN ) for every j = 0, . . . , J. ad (R ) the set of adapted processes Z such that Zj ∈ Lj Thanks to the integrability assumptions on the terminal condition ξ and the weight process β and thanks to the polynomial growth condition on F and G, it is straightforward to check 9 recursively that the (P -a.s. unique) solution Y to (10) belongs to L∞− ad (R) under assumptions (C) and (R). In order to simplify the exposition, we shall also use the following convention for the remainder of the paper: Unless otherwise noted, all equations and inequalities are supposed to hold P -almost surely. For our main results in Section 4, we rewrite equation (10) relying on convex duality techniques. In the remainder of this section, we thus recall some concepts from convex analysis and show that – in some cases – these techniques allow us to interpret the dynamic programming equation (10) in terms of a class of two-player games. We recall that the convex conjugate of Fj is given by Fj# (u) := sup (u> z − Fj (z)), u ∈ RD , z∈RD where (·)> denotes matrix transposition. Note that Fj# can take the value +∞ and that the maximization takes place pathwise, i.e., ω by ω. Analogously, for Gj the concave conjugate can be defined in terms of the convex conjugate of −Gj as (1) (0) # (1) (0) G# v , v := −(−G ) −v , −v j j > (1) (0) = inf v z + v y − Gj (z, y) , (z,y)∈RD+1 v (1) , v (0) ∈ RD+1 , which can take the value −∞. By the Fenchel-Moreau theorem, e.g. in the form of Theorem 12.2 in Rockafellar (1970), one has, for every j = 0, . . . , J − 1, z ∈ RD , y ∈ R, and every ω ∈ Ω Gj (ω, z, y) = (1) > inf v z+v sup u> z − Fj# (ω, u) . (0) (v (1) ,v (0) )∈RD+1 Fj (ω, z) = y− (1) (0) G# j (ω, v , v ) , (11) (12) u∈RD Here, the minimization and maximization can, of course, be restricted to the effective domains # (1) (0) D DG# (ω,·) = {(v (1) , v (0) ) ∈ RD+1 , G# j (ω, v , v ) > −∞}, DF # (ω,·) = {u ∈ R , Fj (ω, u) < +∞} j j # (0) ≥ 0 for (v (1) , v (0) ) ∈ D by the monotonicity assumption of G# j and Fj . Noting that v G# (ω,·) j on G in the y-variable, the dynamic programming equation (10) can be rewritten in the form Yj = > # (1) (0) (1) (0) (0) # v + v u Ej [βj+1 Yj+1 ] − v Fj (u) − Gj (v , v ) , sup inf (v (1) ,v (0) )∈D G # j u∈D # F j which formally looks like the dynamic programming equation for a two-player game with a random > (Fj+1 -measurable!) and controlled ‘discount factor’ v (1) + v (0) u βj+1 between time j and j +1. To make this intuition precise, we next introduce sets of adapted ‘controls’ (or strategies) in 10 terms of F and G by o # D ∞− (ri )i=j,...,J−1 ri ∈ L∞− (R ), F (r ) ∈ L (R) for i = j, . . . , J − 1 , i i i n (1) (0) (1) (0) = ρi , ρi ∈ L∞− (RD+1 ), ρi , ρ i i i=j,...,J−1 o (1) (0) G# ρi , ρi ∈ L∞− (R) for i = j, . . . , J − 1 . i n AFj = AG j (13) F In Appendix A, we show that these sets of controls AG j and Aj are nonempty under the standing assumptions (C) and (R). For j = 0, . . . , J, we consider the following multiplicative weights (corresponding to the ‘discounting’ between time 0 and j) wj (ω, v (1) ,v (0) , u) = j−1 Y (1) (vi + vi ui )> βi+1 (ω) (0) (14) i=0 (1) (1) (0) (0) for ω ∈ Ω, v (1) = (v0 , ..., vJ−1 ) ∈ (RD )J , v (0) = (v0 , ..., vJ−1 ) ∈ RJ and u = (u0 , ..., uJ−1 ) ∈ (RD )J . Assuming (for simplicity) that F0 is trivial, and (crucially) that the positivity condition (v (1) + v (0) u)> βj+1 (ω) ≥ 0 (15) on the ‘discount factors’ holds for every ω ∈ Ω, (v (1) , v (0) ) ∈ DG# (ω,·) and u ∈ DF # (ω,·) , Y0 j j is indeed the equilibrium value of a (in general, non-Markovian) stochastic two-player zero-sum game, namely, Y0 = sup E wJ (ρ(1) , ρ(0) , r)ξ − inf F (ρ(1) ,ρ(0) )∈AG 0 r∈A 0 = sup inf (ρ(1) ,ρ(0) )∈AG r∈AF 0 0 J−1 X j=0 E wJ (ρ(1) , ρ(0) , r)ξ − J−1 X (0) (1) (0) wj (ρ(1) , ρ(0) , r) ρj Fj# (rj ) + G# j (ρj , ρj ) (0) (1) (0) wj (ρ(1) , ρ(0) , r) ρj Fj# (rj ) + G# j (ρj , ρj ) j=0 as is shown in Appendix C.2. In financial terms, we may think of wj (ρ(1) , ρ(0) , r) as a discretetime price deflator (or as an approximation of a continuous-time price deflator given in terms of a stochastic exponential which can incorporate both discounting in the real-world sense and a change of measure). Then, the first term in E wJ (ρ(1) , ρ(0) , r)ξ − J−1 X (0) (1) (0) wj (ρ(1) , ρ(0) , r) ρj Fj# (rj ) + G# j (ρj , ρj ) j=0 corresponds to the fair price of an option with payoff ξ in the price system determined by the deflator wJ (ρ(1) , ρ(0) , r), which is to be chosen by the two players. The choice may come with an additional running reward or cost which is formulated via the convex conjugates of F and G in the 11 second term of the above expression. With this interpretation, Y0 is the equilibrium price for an option with payoff ξ, on which the two players agree. Of course, the above stochastic two-player game degenerates to a stochastic maximization (resp. minimization) problem, if G (resp. F ) is linear, and so one of the control sets becomes a singleton. In the framework of the uncertain volatility model of Example 2.3, the following example illustrates that the validity of this game-theoretic interpretation of the dynamic programming equation (10) crucially depends on the positivity assumption (15). Example 3.1. We consider the discrete-time dynamic programming equation (9) with T = 2, J = 2, σlow = 0.1, σup = ρ̂ = 0.2 and (for simplicity) the artificial terminal condition Y2 := ξ := 1{∆W1 ≤−3} 1{∆W2 ≤−3} . In this example, the positivity condition (15) is violated, since, with the notation introduced in Example 2.3, we obtain for (1, slow ) = (1, −3/8) ∈ DF # = {1} × [−3/8, 0], j 3 3 3 (1, −3/8)> βj+1 = 1 − (∆Wj+1 )2 + ∆Wj+1 + =: aj+1 < −2 on {∆Wj+1 ≤ −3}, 8 40 8 for j = 0, 1. As Y can be rewritten in the form Yj = max{Ej [Yj+1 ], Ej [aj+1 Yj+1 ]}, one easily observes that (with Φ denoting the cumulative distribution function of a standard normal), Y1 = 1{∆W1 ≤−3} max{P ({∆W2 ≤ −3}), E1 [a2 1{∆W2 ≤−3} ]} = Φ(−3)1{∆W1 ≤−3} Y0 = Φ(−3) max{P ({∆W1 ≤ −3}), E[a1 1{∆W1 ≤−3} ]} = Φ(−3)2 . 2 However, in this example F # ≡ 0, AF0 consists of the processes in L∞− ad (R ) with values in {1} × [−3/8, 0], and the multiplicative weight satisfies w2 ((1, u1 ), (1, u2 )) = 8 8 1 + u1 (1 − a1 ) 1 + u2 (1 − a2 ) . 3 3 Hence, Y0 is not the value of the maximization problem supr∈AF E[w2 (r)ξ], because the deterministic control r1 := r2 := (1, −3/8)> achieves the reward 0 E a1 a2 1{∆W1 ≤−3} 1{∆W2 ≤−3} ≥ 4Φ(−3)2 > Y0 . For most of the paper, we shall not assume the positivity condition (15), and hence go beyond the above game-theoretic setting. Nonetheless, by a slight abuse of terminology, we will also refer F to members of the sets AG j and Aj as controls, when the positivity condition is violated. 12 4 Main results In this section, we state and discuss the main contribution of the paper, the pathwise dynamic programming approach of Theorem 4.5, which avoids the use of nested conditional expectations, in order to construct tight upper and lower bounds on the solution Yj to the dynamic program (10). Extending the ideas in Bender et al. (2015), this construction will be based on the concept of supersolutions and subsolutions to the dynamic program (10). Definition 4.1. A process Y up (resp. Y low ) ∈ L∞− ad (R) is called supersolution (resp. subsolution) to the dynamic program (10) if YJup ≥ YJ (resp. YJlow ≤ YJ ) and for every j = 0, . . . , J − 1 it holds up up Yjup ≥ Gj ([Ej [βj+1 Yj+1 ], Fj (Ej [βj+1 Yj+1 ])), (and with ’ ≥’ replaced by ’ ≤’ for a subsolution). Before we explain a general construction method for such super- and subsolutions, we first illustrate how the dual representation for optimal stopping in Example 2.1 can be derived by this line of reasoning. Example 4.2. Fix a martingale M . The dynamic version of (4) suggests to consider Θup j := max Si − (Mi − Mj ), i=j,...,J Yjup = Ej [Θup j ]. The anticipative discrete-time process Θup clearly satisfies the recursive equation (pathwise dynamic program) up Θup j = max{Sj , Θj+1 − (Mj+1 − Mj )}, Θup J = SJ . Hence, by convexity of the max-operator and Jensen’s inequality, we obtain, thanks to the martingale property of M and the tower property of conditional expectation, up Yjup ≥ max{Sj , Ej [Θup j+1 − (Mj+1 − Mj )]} = max{Sj , Ej [Yj+1 ]}, YJup = SJ . Thus, Y up is a supersolution to the dynamic program (3). By the monotonicity of the maxoperator, we observe by backward induction that Yjup ≥ Yj , because (assuming that the claim is already proved at time j + 1) up Yjup ≥ max{Sj , Ej [Yj+1 ]} ≥ max{Sj , Ej [Yj+1 ]} = Yj . In particular, writing Y up (M ) instead of Y up in order to emphasize the dependence on the fixed martingale M , we get Y0 ≤ inf M Y0up (M ) = inf E M max Si − (Mi − M0 ) . i=0,...,J 13 (16) ∗ − M∗ = Y up,∗ = Defining M ∗ to be the Doob martingale of Y , i.e. Mi+1 i+1 − Ei [Yi+1 ], and Θ i Θup (M ∗ ), it is straightforward to check by backward induction, that, for every j = J, . . . , 0, Θup,∗ = Yj , as (assuming again that the claim is already proved for time j + 1) j Θup,∗ = max{Sj , Θup,∗ j j+1 − (Yj+1 − Ej [Yj+1 ])} = max{Sj , Ej [Yj+1 ]} = Yj . This turns the inequality in (16) into an equality. This alternative proof of the dual representation for optimal stopping is by no means shorter or simpler than the original one by Rogers (2002) and Haugh and Kogan (2004) which relies on the optional sampling theorem and the supermartingale property of Y . It comes, however, with the advantage that it generalizes immediately to dynamic programming equations which share the same convexity and monotonicity properties. Indeed, if Gj (z, y) = y in our general setting and we, thus, consider the convex dynamic program Yj = Fj (Ej [βj+1 Yj+1 ]), j = J − 1, . . . , 0, YJ = ξ, (17) we can write the corresponding pathwise dynamic program up Θup j = Fj (βj+1 Θj+1 − (Mj+1 − Mj )), Θup J =ξ (18) D for M ∈ MD , i.e., for D-dimensional martingales which are members of L∞− ad (R ). Then, the same argument as above, based on convexity and Jensen’s inequality implies that Yjup = Ej [Θup j ] is a supersolution to the dynamic program (17). A similar construction of a subsolution can be built on classical Fenchel duality. Indeed, consider, for every r ∈ AF0 , the linear pathwise dynamic program # Θlow = rj> βj+1 Θup j j+1 − Fj (rj ), Θlow J = ξ. (19) Then, by (12), it is easy to check that Yjlow = Ej [Θlow j ] defines a subsolution. These observations already led to the construction of tight upper and lower bounds for discretetime backward stochastic differential equations (BSDEs) with convex generators in Bender et al. (2015). In the latter paper, the authors exploit that in their setting the special form of F , stemming from a Lipschitz BSDE, implies – after a truncation of Brownian increments – a similar monotonicity property than in the Bermudan option case. In the general setting of the present paper, we would have to assume the following comparison principle to ensure that the supersolution Yjup and the subsolution Yjlow indeed constitute an upper bound and a lower bound for Yj . (Comp): For every supersolution Y up and every subsolution Y low to the dynamic program (10) it holds that Yjup ≥ Yjlow , P -a.s., 14 for every j = 0, . . . , J. The following characterization shows that the comparison principle can be rather restrictive and is our motivation to remove it below through a more careful construction of suitable pathwise dynamic programs. Theorem 4.3. Suppose (R) and that the dynamic program is convex, i.e., (C) is in force with Gj (z, y) = y for j = 0, . . . , J − 1. Then, the following assertions are equivalent: (a) The comparison principle (Comp) holds. (b) For every r ∈ AF0 the following positivity condition is fulfilled: For every j = 0, . . . , J − 1 rj> βj+1 ≥ 0, P -a.s. (c) For every j = 0, . . . , J−1 and any two random variables Y (1) , Y (2) ∈ L∞− (R) with Y (1) ≥ Y (2) P -a.s., the monotonicity condition Fj (Ej [βj+1 Y (1) ]) ≥ Fj (Ej [βj+1 Y (2) ]), P -a.s., is satisfied. Theorem 4.3 provides three equivalent formulations for the comparison principle. Formulation (b) illustrates the restrictiveness of the principle most clearly. For instance, when β contains unbounded entries with random sign, comparison can only hold in degenerate cases. In some applications, problems of this type can be resolved by truncation of the weights at the cost of a small additional error. Yet in applications such as pricing under volatility uncertainty, the comparison principle may fail (even under truncation). This is demonstrated in the following example which can be interpreted as a more general and systematic version of the calculation in Example 3.1. Example 4.4. Recall the setting and the dynamic programming equation (9) from Example 2.3. In this setting, thanks to condition (c) in Theorem 4.3, comparison boils down to the requirement that the prefactor 1 + sj ! 2 ∆Wj+1 − ρ̂∆Wj+1 − 1 ∆ of Yj+1 in the equation (9) for Yj is P -almost surely nonnegative for both of the feasible values of sj , ( 2 1 σlow 1 sj ∈ −1 , 2 2 ρ̂ 2 !) 2 σup −1 . ρ̂2 For sj > 1, this requirement is violated for realizations of ∆Wj+1 sufficiently close to zero, while for sj < 0 violations occur for sufficiently negative realizations of the Brownian increment – and √ this violation also takes place if one truncates the Brownian increments at ±const. ∆ with an arbitrarily large constant. Consequently, we arrive at the necessary conditions ρ̂ ≤ σlow and √ ρ̂ ≥ σup / 3 for comparison to hold. For σlow = 0.1 and σup = 0.2, the numerical test case 15 in Guyon and Henry-Labordère (2011) and Alanko and Avellaneda (2013), these two conditions cannot hold simultaneously, ruling out the possibility of a comparison principle. Our aim is, thus, to modify the above construction of supersolutions and subsolutions in such a way that comparison still holds for these particular pairs of supersolutions and subsolutions, although the general comparison principle (Comp) may be violated. At the same time, we generalize from the convex structure to the concave-convex structure. To this end, we consider the following ‘coupled’ pathwise recursion. Given (ρ(1) , ρ(0) ) ∈ AG j , r ∈ AFj and M ∈ MD , define the (in general) nonadapted processes θiup = θiup (ρ(1) , ρ(0) , r, M ) and θilow = θilow (ρ(1) , ρ(0) , r, M ), i = j, . . . , J, via the ‘pathwise dynamic program’ θJup = θJlow = ξ, (1) > (1) > (1) > up up low θi+1 − θi+1 − ρi θi = ρi βi+1 ρi βi+1 ∆Mi+1 + − (0) (1) (0) ι +ρi max Fi (βi+1 θi+1 − ∆Mi+1 ) − G# ρ , ρ i i i ι∈{up,low} up ι low θilow = min Gi βi+1 θi+1 − ∆Mi+1 , ri> βi+1 θi+1 − ri> βi+1 θi+1 + − ι∈{up,low} −ri> ∆Mi+1 − Fi# (ri ) , (20) where ∆Mi = Mi − Mi−1 . We emphasize that these two recursion formulas are coupled, as low enters the defining equation for θ up and θ up enters the defining equation for θ low . Roughly θi+1 i i i+1 up low in the upper bound speaking, the rationale of this coupled recursion is to replace θj+1 by θj+1 recursion at time j, whenever the violation of the comparison principle threatens to reverse the order between upper bound recursion and lower bound recursion. Due to this coupling the elementary argument based on convexity and Fenchel duality outlined at the beginning of this section does not apply anymore, but a careful analysis is required to disentangle the influences of the upper and lower bound recursion (see the proofs in the appendix). Our main result on the construction of tight upper and lower bounds for the concave-convex dynamic program (10) in absence of the comparison principle now reads as follows: Theorem 4.5. Suppose (C) and (R). Then, for every j = 0, . . . , J, Yj = = essinf Ej [θjup (ρ(1) , ρ(0) , r, M )] esssup Ej [θjlow (ρ(1) , ρ(0) , r, M )], F (ρ(1) ,ρ(0) )∈AG j , r∈Aj , M ∈MD (ρ(1) ,ρ(0) )∈AG j , r∈AF j , M ∈MD P -a.s. F For any (ρ(1) , ρ(0) ) ∈ AG j , r ∈ Aj , and M ∈ MD , we have the P -almost sure relation θilow (ρ(1) , ρ(0) , r, M ) ≤ θiup (ρ(1) , ρ(0) , r, M ) 16 for every i = j, . . . , J. Moreover, Yj = θjup (ρ(1,∗) , ρ(0,∗) , r∗ , M ∗ ) = θjlow (ρ(1,∗) , ρ(0,∗) , r∗ , M ∗ ) (21) ∗ F P -almost surely, for every (ρ(1,∗) , ρ(0,∗) ) ∈ AG j and r ∈ Aj satisfying the duality relations (1,∗) > ρi (0,∗) Ei [βi+1 Yi+1 ] + ρi (1,∗) (0,∗) ρ , ρ Fi (Ei [βi+1 Yi+1 ]) − G# i i i = Gi (Ei [βi+1 Yi+1 ] , Fi (Ei [βi+1 Yi+1 ])) (22) and (ri∗ )> Ei [βi+1 Yi+1 ] − Fi# (ri∗ ) = Fi (Ei [βi+1 Yi+1 ]) (23) P -almost surely for every i = j, . . . , J − 1, and with M ∗ being the Doob martingale of βY , i.e., Mk∗ = k−1 X βi+1 Yi+1 − Ei [βi+1 Yi+1 ], P -a.s., for every k = 0, . . . J. i=0 ∗ F We note that, by Lemma A.1, optimizers (ρ(1,∗) , ρ(0,∗) ) ∈ AG j and r ∈ Aj which solve the duality relations (22) and (23) do exist. These duality relations also provide some guidance on how to choose nearly optimal controls in numerical implementations of the method: When an approximate solution Ỹ of the dynamic program is available, controls can be chosen such that they (approximately) fulfill the analogs of (22) and (23) with Ỹ in place of the unknown true solution. Likewise, M can be chosen as an approximate Doob martingale of β Ỹ . Moreover, the almost sure optimality property in (21) suggests that (as is the case for the dual upper bounds for optimal stopping) a Monte Carlo implementation benefits from a low variance, when the approximate solution Ỹ to the dynamic program (10) is sufficiently close to the true solution Y . As shown in the next proposition, the processes Ej [θjup ] and Ej [θjlow ] in Theorem 4.5 indeed define super- and subsolutions to the dynamic program (10) which are ordered even though the comparison principle need not hold: Proposition 4.6. Under the assumptions of Theorem 4.5, the processes Y up and Y low given by Yjup = Ej [θjup (ρ(1) , ρ(0) , r, M )] and Yjlow = Ej [θjlow (ρ(1) , ρ(0) , r, M )], j = 0, . . . , J F are, respectively, super- and subsolutions to (10) for every (ρ(1) , ρ(0) ) ∈ AG 0 , r ∈ A0 , and M ∈ MD . We close this section by stating a simplified and decoupled pathwise dynamic programming approach, which can be applied if the comparison principle is known to be in force. 17 Theorem 4.7. Suppose (C), (R), and (Comp) and fix j = 0, . . . , J. For every (ρ(1) , ρ(0) ) ∈ AG j up (1) (0) and M ∈ MD define Θup j = Θj (ρ , ρ , M ) by the pathwise dynamic program (1) > (0) up # (1) (0) = ρ βi+1 Θup Θup i i i+1 − (Mi+1 − Mi ) + ρi Fi (βi+1 Θi+1 − (Mi+1 − Mi )) − Gi (ρi , ρi ), F low = for i = J − 1, . . . , j, initiated at Θup J = ξ. Moreover, for r ∈ Aj and M ∈ MD define Θj Θlow j (r, M ) for i = J − 1, . . . , j by the pathwise dynamic program # low > low Θlow = G β Θ − (M − M ), r β Θ − (M − M ) − F (r ) i i+1 i+1 i i+1 i+1 i i i i+1 i i+1 i initiated at Θlow J = ξ. Then, Yj = essinf (ρ(1) ,ρ(0) )∈AG j , M ∈MD (1) (0) Ej [Θup j (ρ , ρ , M )] = esssup r∈AF j , M ∈MD Ej [Θlow j (r, M )], P -a.s., Moreover, (1,∗) (0,∗) ∗ ∗ Yj = Θup ,ρ , M ∗ ) = Θlow j (r , M ) j (ρ (24) ∗ F P -almost surely, for every (ρ(1,∗) , ρ(0,∗) ) ∈ AG j and r ∈ Aj satisfying the duality relations (22)– (23) and with M ∗ being the Doob martingale of βY . 5 Relation to information relaxation duals In this section, we relate a special case of our pathwise dynamic programming approach to the information relaxation dual for the class of stochastic two-player games discussed in Section 3. Throughout the section, we assume that the positivity condition (15) holds and that no information is available at time zero, i.e., F0 is trivial. Then, by virtue of Proposition B.1, the comparison principle (Comp) is in force and we can apply the simplified pathwise dynamic programming approach of Theorem 4.7. Under assumption (15), Y0 is the equilibrium value of a stochastic two-player zero-sum game, namely, Y0 = sup E wJ (ρ(1) , ρ(0) , r)ξ − inf F (ρ(1) ,ρ(0) )∈AG 0 r∈A 0 = sup r∈AF 0 inf (ρ(1) ,ρ(0) )∈AG 0 J−1 X j=0 E wJ (ρ(1) , ρ(0) , r)ξ − J−1 X (0) (1) (0) wj (ρ(1) , ρ(0) , r) ρj Fj# (rj ) + G# j (ρj , ρj ) (0) (1) (0) , wj (ρ(1) , ρ(0) , r) ρj Fj# (rj ) + G# j (ρj , ρj ) j=0 as seen in Section 3 where the multiplicative weights wj are defined in (14). Now, assume that 18 player 1 fixes her control (ρ(1) , ρ(0) ) ∈ AG 0 . Then, Y0 ≤ sup E wJ (ρ(1) , ρ(0) , r)ξ − r∈AF 0 J−1 X (1) (0) (0) . wj (ρ(1) , ρ(0) , r) ρj Fj# (rj ) + G# j (ρj , ρj ) j=0 One can next apply the information relaxation dual of Brown et al. (2010) (with strong duality), which states that the maximization problem on the right-hand side equals inf E sup p wJ (ρ(1) , ρ(0) , u)ξ − J−1 X u∈(RD )J (0) (1) (0) , wj (ρ(1) , ρ(0) , u) ρj Fj# (uj ) + G# j (ρj , ρj ) − p(u) j=0 where the infimum runs over the set of all dual-feasible penalties, i.e., all mappings p : Ω×(RD )J → R ∪ {+∞} which satisfy E[p(r)] ≤ 0 for every r ∈ AF0 , i.e., for every adapted and admissible control r of player 2. Notice that, for every martingale M ∈ MD , the map pM : u 7→ J−1 X wj (ρ(1) , ρ(0) , u)(ρj + ρj uj )> ∆Mj+1 (1) (0) j=0 is a dual-feasible penalty, since, for adapted controls r ∈ AF0 , E wj (ρ(1) , ρ(0) , r)(ρj + ρj rj )> ∆Mj+1 J−1 X (1) (0) j=0 = J−1 X h i (1) (0) E wj (ρ(1) , ρ(0) , r)(ρj + ρj rj )> Ej [∆Mj+1 ] = 0 j=0 by the martingale property of M and the tower property of the conditional expectation. We shall refer to penalties of that particular form as martingale penalties. It turns out (see Appendix C.1 for the detailed argument) that, given the fixed controls (ρ(1) , ρ(0) ) ∈ AG 0 of player 1, one has, for every martingale M ∈ MD , (1) (0) Θup 0 (ρ , ρ , M ) = sup u∈(RD )J wJ (ρ(1) , ρ(0) , u)ξ − J−1 X (0) (1) (0) wj (ρ(1) , ρ(0) , u) ρj Fj# (uj ) + G# j (ρj , ρj ) − pM (u) , j=0 (25) i.e. the explicit and straightforward to implement pathwise dynamic program Θup (ρ(1) , ρ(0) , M ) in Theorem 4.7 solves the pathwise maximization problem in the information relaxation dual for the subclass of martingale penalties. We emphasize that, in contrast, for general choices of the penalties solving the pathwise maximization problem may turn out to be a computational 19 bottleneck of the information relaxation approach, compare, e.g., the discussion in Section 4.2 of Brown and Smith (2011) and in Section 2.3 of Haugh and Lim (2012). In view of (25), Theorem 4.7 implies " Y0 = inf (ρ(1) ,ρ(0) )∈AG j , M ∈MD − J−1 X wj (ρ E (1) wJ (ρ(1) , ρ(0) , u)ξ sup u∈(RD )J ,ρ (0) , u) (0) ρj Fj# (uj ) + (1) (0) G# j (ρj , ρj ) !# − pM (u) j=0 i.e., one still obtains strong duality, when the minimization is restricted from the set of all dualfeasible penalties to the computationally tractable subset of martingale penalties. Summarizing, when assuming the positivity condition (15), we can interpret the upper bound (1) (0) (1) (0) E[Θup 0 (ρ , ρ , M )] in such a way that, first, player 1 fixes her strategy (ρ , ρ ) and the penalty by the choice of the martingale M , while, then, the player 2 is allowed to maximize the penalized problem pathwise. An analogous representation and interpretation holds, of course, for the lower F bound E[Θlow 0 (r, M )]. Indeed, given a control r ∈ A0 and a martingale M ∈ MD Θlow 0 (r, M ) = wJ (v (1) , v (0) , r)ξ inf (1) (0) (vj ,vj )∈(RD+1 )J − J−1 X wj (v (1) ,v (0) , r) (0) vj Fj# (rj ) + (1) (vj + (0) vj rj )> ∆Mj+1 + (1) (0) G# j (vj , vj ) ! , j=0 where now the map (v (1) , v (0) ) 7→ J−1 X wj (v (1) , v (0) , r)(vj + vj rj )> ∆Mj+1 (1) (0) j=0 is a dual-feasible penalty in the sense of Brown et al. (2010). In this way, we end up with the information relaxation dual of Brown et al. (2010) for each player given that the other player has fixed a control, but with the additional twist that, for our particular class of (possibly non-Markovian) two-player games, minimization can be restricted to the tractable subclass of martingale penalties. The general procedure explained above is analogous to the recent information relaxation approach by Haugh and Wang (2015) for two-player games in a classical Markovian framework which dates back to Shapley (1953), but in their setup Haugh and Wang (2015) have to take minimization over all dual-feasible penalties into account. As an important example, the setting of this section covers the problem of pricing convertible bonds which (in its simplest form) is a stopping game governed by the dynamic programming equation Yj = min{Uj , max{Lj , Ej [Yj+1 ]}}, 20 YJ = LJ for adapted processes Lj ≤ Uj . For this problem, representations in the sense of pathwise optimal control were previously studied by Kühn et al. (2007) in continuous time and by Beveridge and Joshi (2011) in discrete time. 6 Applications In this section, we apply the pathwise dynamic programming methodology to calculate upper and lower bounds for two nonlinear pricing problems, pricing of a long-term interest rate product under bilateral counterparty risk, and option pricing under volatility uncertainty. Traditionally, the valuation of American options was by far the most prominent nonlinear pricing problem both in practice and in academia. In the wake of the financial crisis, various other sources of nonlinearity such as model uncertainty, default risk, liquidity problems or transaction costs have received considerable attention, see the recent monographs Brigo et al. (2013), Crépey (2013), and Guyon and Henry-Labordère (2013). 6.1 Bilateral counterparty risk Suppose that two counterparties have agreed to exchange a stream of payments Ctj over the sequence of time points t0 , . . . , tJ . For many common interest rate products such as swaps, the sign of C is random – so that the agreed payments may flow in either direction. Therefore, a consistent pricing approach must take into account bilateral default risk, thus introducing nonlinearities into the arising recursive pricing equations which are in general neither convex nor concave. In this respect, the present setting is conceptually similar but more complex than the model of unilateral counterparty risk in Example 2.2. We refer to Crépey et al. (2013) for technical background and an in-depth discussion of the intricacies of pricing under bilateral counterparty risk and funding. We notice that by discretizing their equations (2.14) and (3.8), we arrive at the following nonlinear backward dynamic program for the value of the product Yj at time tj (given that there was no default prior to tj ) associated with the payment stream C, namely, YJ = CtJ and Yj = (1 − ∆(rtj + γtj (1 − r)(1 − 2ptj ) + λ))Ej [Yj+1 ] +∆(γtj (1 − r)(1 − 3ptj ) + λ − λ)Ej [Yj+1 ]+ + Ctj . Here Ej denotes the expectation conditional on the market’s reference filtration up to time tj (i.e., the full market information is obtained by enlarging this filtration with the information whether default has yet occurred or not). In focusing on equation (3.8) in Crépey et al. (2013), we consider a pre-default CSA recovery scheme without collateralization, see their paper for background. In the pricing equation, rt denotes (a proxy to) the risk-less short rate at time t. The rate at which a default of either side occurs at time t is denoted by γt . Moreover, pt is the associated conditional 21 probability that it is the counterparty who defaults, if default occurs at time t. We rule out simultaneous default so that own default happens with conditional probability 1 − pt , and assume that the three parameters ρ, ρ and r associated with partial recovery from Crépey et al. (2013) are identical and denoted by r. Finally, λ and λ are two constants associated with the costs of external lending and borrowing, and ∆ = tj+1 −tj is the stepsize of the equidistant time partition. Defining gj = 1 − ∆(rtj + γtj (1 − r)(1 − 2ptj ) + λ) and hj = ∆(γtj (1 − r)(1 − 3ptj ) + λ − λ), we can express the recursion for Y in terms of a concave function Gj and a convex function Fj by setting Fj (z) = z+ and Gj (z, y) = gj z + (hj )+ y − (hj )− z+ + Ctj , so that Yj = Gj (Ej [Yj+1 ], Fj (Ej [Yj+1 ])). Denote by (Ỹj ) a numerical approximation of the process (Yj ), by (Q̃j ) a numerical approximation of (Ej [Yj+1 ]) and by (M̃j ) a martingale which we think of as an approximation to the Doob martingale of Y . In terms of these inputs, the pathwise recursions (20) for upper and lower bound are given by θjup = (1) (1) (0) up up low ρ̃j θj+1 − ∆M̃j+1 − ρ̃j θj+1 − ∆M̃j+1 + ρ̃j θj+1 − ∆M̃j+1 + Ctj − + + and θjlow = min ι∈{up,low} ι ι low gj θj+1 − ∆M̃j+1 − (hj )− θj+1 − ∆M̃j+1 + (hj )+ s̃j θj+1 − ∆M̃j+1 + Ctj + where (g , (h ) , 0) , Q̃j < 0, j j + (1) (0) (ρ̃j , ρ̃j , s̃j ) = (g − (h ) , (h ) , 1) , Q̃ ≥ 0. j j − j + j For the payment stream Ctj , we consider a swap with notional N , fixed rate R and an equidistant sequence of tenor dates T = {T0 , . . . , TK } ⊆ {t0 , . . . tJ }. Denote by δ the length of the time interval between Ti and Ti+1 and by P (Ti−1 , Ti ) the Ti−1 -price of a zero-bond with maturity Ti . Then, the payment process Ctj is given by CTi = N · 1 P (Ti−1 , Ti ) − (1 + Rδ) for Ti ∈ T \ {T0 } and Ctj = 0 otherwise, see Brigo and Mercurio (2006), Chapter 1. For r and γ, we implement the model of Brigo and Pallavicini (2007), assuming that the risk-neutral dynamics of r is given by a two-factor Gaussian short rate model, a reparametrization of the two-factor Hull-White model, while γ is a CIR process. For the conditional default probabilities pt we assume pt = 0 ∧ p̃t ∨ 1 where p̃ is an Ornstein-Uhlenbeck process. In continuous 22 time, this corresponds to the system of stochastic differential equations dxt = −κx xt dt + σx dWtx , dyt = −κy yt dt + σy dWty √ dγt = κγ (µγ − γt )dt + σγ γt dWtγ , dp̃t = κp (µp − p̃t )dt + σp dWtp with rt = r0 + xt + yt , x0 = y0 = 0. Here, W x , W y and W γ are Brownian motions q with instanp γ taneous correlations ρxy , ρxγ and ρyγ . In addition, we assume that Wt = ργp Wt + 1 − ρ2γp Wt where the Brownian motion W is independent of (W x , W y , W γ ). We choose the filtration generated by the four Brownian motions as the reference filtration. For the dynamics of x, y and p̃, exact time discretizations are available in closed form. We discretize γ using the fully truncated scheme of Lord et al. (2010). The bond prices P (t, s) are given as an explicit function of xt and yt in this model. This implies that the swap’s “clean price”, i.e., the price in the absence of counterparty risk is given in closed form as well, see Sections 1.5 and 4.2 of Brigo and Mercurio (2006). We consider 60 half-yearly payments over a horizon of T = 30 years, i.e., δ = 0.5. J is always chosen as an integer multiple of 60 so that δ is an integer multiple of ∆ = T /J. For the model parameters, we choose (r0 , κx , σx , κy , σy ) = (0.03, 0.0558, 0.0093, 0.5493, 0.0138), (γ0 , µγ , κγ , σγ , p0 , µp , κp , σp ) = (0.0165, 0.026, 0.4, 0.14, 0.5, 0.5, 0.8, 0.2), (ρxy , ρxγ , ρyγ , r, λ, λ, N ) = (−0.7, 0.05, −0.7, 0.4, 0.015, 0.045, 1). We thus largely follow Brigo and Pallavicini (2007) for the parametrization of r and γ but leave out their calibration to initial market data and choose slightly different correlations to avoid the extreme cases of a perfect correlation or independence of r and γ. The remaining parameters J, R and ργp are varied in the numerical experiments below. Except for the different recursions for θup and θlow , the overall numerical procedure is similar to the one in Bender et al. (2015), so we refer to that paper for a more detailed exposition. We calculate the initial approximation (Q̃j , Ỹj ) by a least-squares Monte Carlo approach. We first simulate Nr = 105 trajectories of (xtj , ytj , γtj , ptj , Ctj ) forward in time, the so-called regression paths. Then, for j = J − 1, . . . 0, we compute Q̃j by least-squares regression of Ỹj+1 onto a set of basis functions. As the four basis functions, we use 1, γtj , γtj · ptj and the closed-form expression for the clean price at time tj of the swap’s remaining payment stream. Ỹj is computed by application of Fj and Gj to Q̃j . Storing the coefficients from the regression, we next simulate No = 106 independent trajectories of (xtj , ytj , γtj , ptj , Ctj , Q̃j , Ỹj ) forward in time, the so-called outer paths. In order to simulate an approximate Doob martingale M̃ of Ỹj , we rely on a subsampling approach as proposed in Andersen and Broadie (2004). Along each outer path and for all j, we simulate Ni = 100 copies of Ỹj+1 conditional on the observed value of (xtj , ytj , γtj , p̃tj ) and take the average of these copies to obtain an unbiased estimator of Ej [Ỹj+1 ]. 23 J 120 360 Clean Price 0 0 ργp = 0.8 21.53 21.70 ργp = 0 25.13 25.31 ργp = −0.8 28.42 28.64 (0.02) (0.02) (0.02) (0.02) (0.02) (0.02) 21.34 21.52 24.97 25.16 28.25 28.50 (0.02) (0.02) (0.02) (0.02) (0.02) (0.02) 21.46 24.88 25.07 28.16 28.41 720 0 21.28 (0.02) (0.02) (0.02) (0.02) (0.02) (0.02) 1440 0 21.28 21.46 24.88 25.08 28.15 28.41 (0.02) (0.02) (0.02) (0.02) (0.02) (0.02) Table 1: Lower and upper bound estimators for varying values of ργp and J with R = 275.12 basis points (b.p.), No = 106 and Ni = 100. Prices and standard deviations (in brackets) are given in b.p. Ni 10 Clean Price 0 20 0 50 0 ργp = 0.8 21.14 22.15 ργp = 0 24.79 25.81 ργp = −0.8 28.09 29.23 (0.17) (0.17) (0.17) (0.17) (0.17) (0.17) 21.38 21.94 25.01 25.58 28.30 28.96 (0.12) (0.12) (0.12) (0.12) (0.12) (0.12) 21.28 21.57 24.90 25.19 28.19 28.54 (0.09) (0.09) (0.09) (0.09) (0.09) (0.09) 21.41 24.83 25.02 28.10 28.35 100 0 21.22 (0.07) (0.07) (0.07) (0.07) (0.07) (0.07) 200 0 21.42 21.55 25.00 25.14 28.29 28.48 (0.06) (0.06) (0.06) (0.06) (0.06) (0.06) Table 2: Lower and upper bound estimators for varying values of ργp and Ni with No = 105 , R = 275.12 b.p. and J = 720. Prices and standard deviations (in brackets) are given in b.p. After these preparations, we can go through the recursion for θup and θlow along each outer path. Denote by Ŷ0up and Ŷ0low the resulting Monte Carlo estimators of E[θ0up ] and E[θ0low ] and their associated empirical standard deviations by σ̂ up and σ̂ low . Then, an asymptotic 95%confidence interval for Y0 is given by [Ŷ0low − 1.96σ̂ low , Ŷ0up + 1.96σ̂ up ]. Table 1 displays upper and lower bound estimators with their standard deviations for different values of the time discretization and the correlation between γ and p. Here, R is chosen as the fair swap rate in the absence of default risk, i.e., it is chosen such that the swap’s clean price at j = 0 is zero. The four choices of J correspond to a quarterly, monthly, bi-weekly, and weekly time discretization, respectively. In all cases, the width of the resulting confidence interval is about 1.2% of the value. While variations between the two finer time discretizations can be explained by standard deviations, the two rougher discretizations lead to values which are slightly higher. We thus work with the bi-weekly discretization J = 720 in the following. The effect of varying the correlation parameter of γ and p also has the expected direction. Roughly, if ργp is positive then larger values of the overall default rate go together with larger conditional default risk of the 24 ργp 0.8 Adjusted Fair Swap Rate 290.91 0 293.75 −0.8 296.41 Clean Price −31.70 Bounds −0.07 0.12 (0.07) −37.41 −0.09 −42.75 −0.12 (0.07) (0.07) (0.07) 0.12 (0.07) 0.14 (0.07) Table 3: Adjusted fair swap rates and lower and upper bound estimators for varying values of ργp with No = 105 , Ni = 100 and J = 720. Rates, prices and standard deviations (in brackets) are given in b.p. counterparty and smaller conditional default risk of the party, making the product less valuable to the party. While this effect is not as pronounced as the overall deviation from the clean price, the bounds are easily tight enough to differentiate between the three cases. Table 2 shows the effect of varying the number of inner paths in the same setting. Since a larger number of inner paths improves the approximation quality of the martingale, and since we have pathwise optimality for the limiting Doob martingale, we observe two effects, a reduction in standard deviations and a reduction of the distance between lower and upper bound estimators. To display these effects more clearly, we have decreased the number of outer paths to No = 105 in this table, leading to the expected increase in the overall level of standard deviations. Finally, Table 3 displays the adjusted fair swap rates accounting for counterparty risk and funding for the three values of ργp , i.e., the values of R which set the adjusted price to zero in the three scenarios. To identify these rates, we fix a set of outer, inner and regression paths and define µ(R) as the midpoint of the confidence interval we obtain when running the algorithm with these paths and rate R for the fixed leg of the swap. We apply a standard bisection method to find the zero of µ(R). The confidence intervals for the prices in Table 3 are then obtained by validating these swap rates with a new set of outer and inner paths. We observe that switching from a clean valuation to the adjusted valuation with ρ = 0.8 increases the fair swap rate by 16 basis points (from 275 to 291). Changing ρ from 0.8 to −0.8 leads to a further increase by 5 basis points. 6.2 Uncertain Volatility Model In this section, we apply our numerical approach to the uncertain volatility model of Example 2.3. Let 0 = t0 < t1 < ... < tJ = T be an equidistant partition of the interval [0, T ], where T ∈ R+ . Recall that for an adapted process (σt )t , the price of the risky asset X σ at time t is given by Z Xtσ = x0 exp 0 t 1 σu dWu − 2 25 Z t σu2 du 0 , where W is a Brownian motion. Further, let g be the payoff a European option written on the risky asset. Then, by Example 2.3, the discretized value process of this European option under uncertain volatility is given by YJ = g(XTρ̂ ), h i ρ̂ Γj = Ej Bj+1 Yj+1 where ρ̂ Bj+1 = and Yj = Ej [Yj+1 ] + ∆ 2 ∆Wj+1 ∆Wj+1 1 − ρ̂ − , ∆2 ∆ ∆ and sι = max s∈{slow ,sup } 1 2 s Γj , (26) σι2 − 1 ρ̂2 for ι ∈ {low, up}. Moreover, XTρ̂ is the price of the asset at time T under the constant reference volatility ρ̂ > 0. Notice that the reference volatility ρ̂ is a choice parameter in the discretization. The basic idea is to view the uncertain volatility model as a suitable correction of a Black-Scholes model with volatility ρ̂. As in Section 6.1, we denote in the following by (Ỹj ) a numerical approximation of the process (Yj ), by (Q̃j ) a numerical approximation of (Ej [Yj+1 ]) and by (Γ̃j ) a numerical approximation of (Γj ). Furthermore, (M̃j ) = (M̃j , M̃j )> denotes an approximation of the Doob martingale M (1) (2) of βj Yj , which is given by Mj+1 − Mj = Yj+1 − Ejh[Yj+1 ] ρ̂ ρ̂ Bj+1 Yj+1 − Ej Bj+1 Yj+1 ! i . The recursion (20) for θjup and θjlow which is based on these input approximations can be written as θjup = max max ι∈{up,low} s∈{slow ,sup } n o (1) (2) ρ̂ ι ι θj+1 − ∆M̃j+1 + sBj+1 θj+1 ∆ − s∆M̃j+1 ∆ and θjlow = (1) (2) ρ̂ (1) (2) ρ̂ (1) (1) (2) (2) up low r̃j + r̃j Bj+1 ∆ θj+1 − r̃j + r̃j Bj+1 ∆ θj+1 − r̃j ∆M̃j+1 − r̃j ∆M̃j+1 ∆, − + (1) (2) where r̃j = (r̃j , r̃j ) is given by (1, s ) , Γ̃ < 0 j low r̃j = (1, s ) , Γ̃ ≥ 0. up j For the payoff, we consider a European call-spread option with strikes K1 and K2 , i.e., g(x) = (x − K1 )+ − (x − K2 )+ , which is also studied in Guyon and Henry-Labordère (2011) and Alanko and Avellaneda (2013). 26 Following them, we choose the maturity T = 1 as well as K1 = 90 and K2 = 110 and x0 = 100. The parameters ρ̂, σlow and σup are varied in the numerical experiments below. In order to calculate the initial approximations (Ỹj ) and (Γ̃j ), we apply the martingale basis variant of least-squares Monte Carlo which was proposed in Bender and Steiner (2012) in the context of backward stochastic differential equations and dates back to Glasserman and Yu (2004) for the optimal stopping problem. The basic idea is to use basis functions whose conditional expectations (under the reference Black-Scholes model) can be calculated in closed form. When Ỹj+1 is given as a linear combination of such basis functions, Ej [Ỹj+1 ] can be calculated without regression by simply applying the same coefficients to the conditional expectations of the basis functions. Similarly the Gamma of Ỹ can be represented in terms of the second derivatives of the basis functions. To be more precise, we fix K basis functions ηJ,k (x), k = 1, . . . , K, at terminal time and assume that ηj,k (x) = E[ηJ,k (XJρ̂ )|Xjρ̂ = x] can be computed in closed form. Then, it is straightforward to check that ρ̂ E[ηj+1,k (Xj+1 )|Xjρ̂ = x] = ηj,k (x) h i d2 ρ̂ ρ̂ E Bj+1 ηj+1,k (Xj+1 ) Xjρ̂ = x = ρ̂2 x2 2 ηj,k (x). dx For a given approximation Ỹj+1 of Yj+1 , j = 0, . . . , J − 1, in terms of such martingale basis functions, a first approximation of Yj can thus be calculated without regression by computing the right hand sides in (26) explicitly with Ỹj+1 in place of Yj+1 . This approximation is regressed onto the basis functions to ensure that the approximation Ỹj at time j is again a linear combination of basis functions. This overall approach has two key advantages. First, since Q̃j = Ej [Ỹj+1 ] and ρ̂ Γ̃j are exact conditional expectations of Ỹj+1 and Bj+1 Ỹj+1 , the martingale M̃j+1 − M̃j = Ỹj+1 − Q̃j ! ρ̂ Bj+1 Ỹj+1 − Γ̃j can be computed without subsimulations, and thus without the associated sampling error and simulation costs. Second, since we calculate Γ̃j by twice differentiating the Y -approximation, we ρ̂ 2 ·∆−2 , these avoid to use the simulated Γ-weights Bj+1 in the regression. Due to the factor ∆Wj+1 weights are otherwise a source of large standard errors. See Alanko and Avellaneda (2013) for a discussion of this problem and some control variates (which are not needed in our approach). We first simulate Nr = 105 regression paths of the process (Xjρ̂ ) under the constant volatility ρ̂. For the regression, we do not start all paths at x0 but rather start Nr /200 trajectories at each of the points 31, . . . , 230. Since X is a geometric Brownian motion under ρ̂, it can be simulated exactly. Starting the regression paths at multiple points allows to reduce the instability of regression coefficients arising at early time points. See Rasmussen (2005) for a discussion of this stability 27 problem and of the method of multiple starting points. For the computation of Ỹj and Q̃j we choose 162 basis functions. The first three are 1, x and E[g(XJρ̂ )|Xjρ̂ = x]. Note that the third one is simply the Black-Scholes price (under ρ̂) of the spread option g. For the remaining 159 basis functions, we also choose Black-Scholes prices of spread options with respective strikes K (k) and K (k+1) for k = 1, . . . , 159, where the numbers K (1) , . . . , K (160) increase from 20.5 to 230.5. The second derivatives of these basis functions are just (differences of) Black-Scholes Gammas. Like in Section 6.1, we then simulate (Xjρ̂ , Ỹj , Q̃j , Γ̃j ) along No = 105 outer paths started in x0 . As pointed out above, the Doob martingale of βj Ỹj is available in closed form in this case. Finally, the recursions for θup and θlow are calculated backwards in time on each outer path. As a first example we consider σlow = 0.1 and σup = 0.2 as in Guyon and Henry-Labordère (2011) and Alanko and Avellaneda (2013). Following Vanden (2006), the continuous-time limiting price of a European call-spread option in this setting is given by 11.2046. Table 4 shows the √ approximated prices Ỹ0 as well as upper and lower bounds for ρ̂ = 0.2/ 3 ≈ 0.115 depending on the time discretization. This is the smallest choice of ρ̂, for which the monotonicity condition in Theorem 4.3 can only be violated when the absolute values of the Brownian increments are large, cp. Example 4.4. As before, Ŷ0up and Ŷ0low denote Monte Carlo estimates of E[θ0up ] respectively E[θ0low ]. The numerical results suggest convergence from below towards the continuous-time limit for finer time discretizations. This is intuitive in this example, since finer time discretizations allow for richer choices of the process (σt ) in the maximization problem (7). We notice that the bounds are fairly tight (with, e.g., a relative width of 1.3% for the 95% confidence interval with J = 21 time discretization points), although the upper bound begins to deteriorate as Ỹ0 approaches its limiting value. The impact of increasing ρ̂ to 0.15 (as proposed in Guyon and Henry-Labordère, 2011; Alanko and Avellaneda, 2013) is shown in Table 5. The relative width of the 95%-confidence interval is now about 0.6 % for up to J = 35 time steps, but also the convergence to the continuous-time limit appears to be slower with this choice of ρ̂. J 3 6 9 12 15 18 21 24 Ỹ0 10.8553 11.0500 11.1054 11.1340 11.1484 11.1585 11.1666 11.1710 Ŷ0low 10.8550 11.0502 11.1060 11.1344 11.1486 11.1585 11.1666 11.1683 (0.0001) (0.0002) (0.0005) (0.0002) (0.0002) (0.0006) (0.0003) (0.0032) Ŷ0up 10.8591 11.0536 11.1116 11.1438 11.1728 11.2097 11.2764 11.5593 (0.0001) (0.0002) (0.0005) (0.0007) (0.0056) (0.0088) (0.0173) (0.0984) √ Table 4: Approximated price as well as lower and upper bounds for ρ̂ = 0.2/ 3 for different time discretizations. Standard deviations are given in brackets Comparing Table 5 with the results in Alanko and Avellaneda (2013), we observe that their point estimates for Y0 at time discretization levels J = 10 and J = 20 do not lie in our confidence intervals which are given by [10.9981, 11.0030] and [11.1021, 11.1096], indicating that their leastsquares Monte Carlo estimator may still suffer from large variances (although they apply control variates). The dependence of the time discretization error on the choice of the reference volatility 28 J 5 10 15 20 25 30 35 40 Ỹ0 10.8153 10.9982 11.0684 11.1027 11.1241 11.1386 11.1479 11.1554 Ŷ0low 10.8153 10.9983 11.0683 11.1023 11.1237 11.1376 11.1462 11.0101 (0.0001) (0.0001) (0.0001) (0.0001) (0.0001) (0.0002) (0.0002) (0.1391) Ŷ0up 10.8167 11.0028 11.0728 11.1092 11.1379 11.1687 11.2058 12.0483 (0.0001) (0.0001) (0.0001) (0.0002) (0.0008) (0.0030) (0.0047) (0.6464) Table 5: Approximated price as well as lower and upper bounds for ρ̂ = 0.15 for different time discretizations. Standard deviations are given in brackets ρ̂ is further illustrated in Table 6, which displays the mean and the standard deviation of 30 runs of the martingale basis algorithm for different choices of ρ̂ and up to 640 time steps. By and large, convergence is faster for smaller choices of ρ̂, but the algorithm becomes unstable when the reference volatility is too small. J 10 20 40 80 160 320 640 ρ̂ = 0.06 79.7561 1.6421 · 105 1.7010 · 1016 3.2151 · 1024 1.8613 · 1024 4.5672 · 1039 7.0277 · 1039 11.6463 12.3183 130.2723 11.8494 11.6951 (0.2634) (1.5447) (372.2625) (1.1846) (1.5772) 5.4389 · 10 1.0153 · 10 11.1552 11.1823 11.1942 11.1999 11.2026 11.2047 11.2057 (0.0031) (0.0040) (0.0005) (0.0002) (0.0001) (0.0001) (0.0001) ρ̂ = 0.15 10.9985 11.1027 11.1556 11.1821 11.1952 11.2017 11.2047 (0.0006) (0.0006) (0.0004) (0.0003) (0.0003) (0.0003) (0.0002) ρ̂ = 0.2 10.7999 10.9746 11.0819 11.1455 11.1802 11.1980 11.2073 (0.0007) (0.0005) (0.0005) (0.0006) (0.0005) (0.0006) (0.0006) 9.7088 9.9652 10.2306 10.4945 10.7453 10.9635 11.1248 (0.0004) (0.0005) (0.0009) (0.0015) (0.0047) (0.0076) (0.0103) (5.1739) ρ̂ = 0.08 ρ̂ = 0.1 ρ̂ = 0.5 (8.4594·105 ) (6.3043·1016 ) (1.7603·1025 ) (7.0234·1024 ) (2.5016·1040 ) 3 (1.8766·104 ) (3.7590·1040 ) 8 (3.5571·108 ) Table 6: Mean of L = 30 simulations of Ỹ0 for different ρ̂ and discretizations. Standard deviations are given in brackets. In order to gain a better understanding of how the performance of the method depends on the input parameters, we also consider the case σlow = 0.3 and σup = 0.4. Following Vanden (2006), the price of the European call-spread option in the continuous-time limit is 9.7906 in this case. We get qualitatively the same results as for the previous example, in the sense that convergence is faster for the smaller reference volatility and that the upper bound estimators begin to deteriorate as the time partition becomes too fine. However, quantitatively, the numerical results in Table 7 and 8 are better than in the previous example (confidence intervals remain tight for finer time partitions, for which the discretized price has a relative error of about 0.2 % compared to the continuous-time limit). This is quite likely to be connected to the fact that the ratio between σup and σlow is smaller in this second example. 7 Conclusion This paper proposes a new method for constructing high-biased and low-biased Monte Carlo estimators for solutions of stochastic dynamic programming equations. When applied to the optimal 29 J 3 6 9 12 15 18 21 24 27 30 Ỹ0 9.6206 9.7147 9.7435 9.7574 9.7643 9.7683 9.7717 9.7750 9.7767 9.7772 Ŷ0low 9.6139 9.7147 9.7433 9.7573 9.7642 9.7681 9.7728 9.7740 9.7768 9.7769 (0.0001) (0.0002) (0.0002) (0.0003) (0.0002) (0.0005) (0.0006) (0.0012) (0.0002) (0.0002) Ŷ0up 9.6211 9.7167 9.7475 9.7629 9.7725 9.7848 9.7933 9.8010 9.8248 9.9354 (0.0002) (0.0002) (0.0006) (0.0005) (0.0016) (0.0028) (0.0046) (0.0052) (0.0115) (0.0592) √ Table 7: Approximated price as well as lower and upper bounds for ρ̂ = 0.4/ 3 for different time discretizations. Standard deviations are given in brackets J 10 20 30 40 50 60 70 80 Ỹ0 9.6061 9.6924 9.7235 9.7409 9.7509 9.7574 9.7626 9.7653 Ŷ0low 9.6060 9.6922 9.7231 9.7406 9.7506 9.7571 9.7622 9.7644 (0.0001) (<0.0001) (<0.0001) (<0.0001) (<0.0001) (0.0001) (0.0001) (0.0002) Ŷ0up 9.6063 9.6930 9.7248 9.7435 9.7567 9.7707 9.7946 9.8651 (0.0001) (<0.0001) (0.0001) (0.0001) (0.0002) (0.0006) (0.0018) (0.0088) Table 8: Approximated price as well as lower and upper bounds for ρ̂ = 0.35 for different time discretizations. Standard deviations are given in brackets stopping problem, the method simplifies to the classical primal-dual approach of Rogers (2002) and Haugh and Kogan (2004) (except that the martingale penalty appears as a control variate in the lower bound). Our approach is complementary to earlier generalizations of this methodology by Rogers (2007) and Brown et al. (2010) whose approaches start out from a primal optimization problem rather than from a dynamic programming equation. The resulting representation of high-biased and low-biased estimators in terms of a pathwise recursion makes the method very tractable from a computational point of view. Suitably coupling upper and lower bounds in the recursion enables us to handle situations which are outside the scope of classical primal-dual approaches, because the dynamic programming equation fails to satisfy a comparison principle. Acknowledgements Financial support by the Deutsche Forschungsgemeinschaft under grant BE3933/5-1 is gratefully acknowledged. A Preparations In this section, we verify in our setting, i.e., under the standing assumptions (C) and (R), a number of technical properties of the control processes which are needed in the later proofs. Note first that, by continuity of Fi , Fi# (ri ) = sup (ri> z − Fi (z)) z∈QD 30 (1) (0) (1) (0) is Fi -measurable for every r ∈ AFj and i = j, . . . , J − 1 — and so is G# i (ρi , ρi ) for (ρ , ρ ) ∈ AG j . Moreover, the integrability condition on the controls requires that Fi# (ri ) < ∞ and (1) (0) G# i (ρi , ρi ) > −∞ P -almost surely, i.e., controls take values in the effective domain of the convex conjugate of F and in the effective domain of the concave conjugate of G, respectively. (0) In particular, by the monotonicity assumption on G in the y-variable, we observe that ρi ≥0 P -almost surely. G Finally, we show existence of adapted controls, so that the sets AG j and Aj are always nonempty. Lemma A.1. Fix j ∈ {0, . . . , J − 1} and let fj : Ω × RN → R be a mapping such that, for every ω ∈ Ω, the map x 7→ fj (ω, x) is convex, and for every x ∈ RN , the map ω 7→ fj (ω, x) is Fj -measurable. Moreover, suppose that fj satisfies the following polynomial growth condition: (R) such that There are a constant q ≥ 0 and a nonnegative random variable αj ∈ L∞− j |fj (x)| ≤ αj (1 + |x|q ), P -a.s., for every x ∈ RN . Then, for every Z ∈ L∞− (RN ) there exists a random variable ρj ∈ L∞− (RN ) j j (R) and such that fj# ρj ∈ L∞− j # fj (Z) = ρ> j Z − fj (ρj ), P -a.s. (27) Proof. Let Z ∈ L∞− (RN ). Notice first that, since fj is convex and closed, we have fj## = fj by j Theorem 12.2 in Rockafellar (1970) and thus fj (Z) ≥ ρ> Z − fj# (ρ) (28) holds ω-wise for any random variable ρ. We next show that there exists an Fj -measurable random variable ρj for which (28) holds with P -almost sure equality. To this end, we apply Theorem 7.4 in Cheridito et al. (2015) which yields the existence of an Fj -measurable subgradient to fj , i.e., existence of an Fj -measurable random variable ρj such that for all Fj -measurable RN -valued random variables Z fj Z + Z − fj Z ≥ ρ> j Z, P -a.s. (29) From (29) (with the choice Z = z − Z̄ for z ∈ QN ), we conclude that # > ρ> j Z − fj Z ≥ sup ρj (z) − fj (z) = fj (ρj ), P -a.s., (30) z∈QN by continuity of fj , which is the converse of (28), proving P -almost sure equality for ρ = ρj and thus (27). We next show that ρj satisfies the required integrability conditions, i.e., ρj ∈ L∞− (RN ) ∞− (R) for any Z ∈ L∞− (RN ). and fj# (ρj ) ∈ L∞− (R). To this end, we first prove that ρ> j Z ∈ L j 31 Due to (29) and the Minkowski inequality and since a ≤ b implies a+ ≤ |b|, it follows for Z ∈ L∞− (RN ) that, for every p ≥ 1, j 1 > p p E ρj Z + p 1 p 1 E fj Z + Z p + E fj Z p < ∞, ≤ since fj is of polynomial growth with ‘random constant’ αj ∈ L∞− (R) and Z, Z are members of j L∞− (RN ) by assumption. Applying the same argument to Z̃ = −Z yields j > p > p E ρj Z = E ρj Z̃ < ∞, − + since (29) holds for all Fj -measurable random variables Z and Z̃ inherits the integrability of Z. We thus conclude that p i h Z <∞ E ρ> j and p E ρj < ∞, where the second claim follows from the first by taking Z = sgn(ρj ) with the sign function applied componentwise. In order to show that fj# (ρj ) ∈ L∞− (R), we start with (27) and apply the Minkowski inequality to conclude that h p i p1 h > p i p1 p 1 # E fj ρj ≤ E ρj Z + E fj Z p < ∞. B B.1 Proofs of Section 4 Proof of Theorem 4.3 We first show that the implications (b) ⇒ (c) ⇒ (a) hold in an analogous formulation without the additional assumption that Gi (z, y) = y. Proposition B.1. Suppose (R) and (C), and consider the following assertions: (a) The comparison principle (Comp) holds. F (b’) For every (ρ(1) , ρ(0) ) ∈ AG 0 and r ∈ A0 the following positivity condition is fulfilled: For every i = 0, . . . , J − 1 (ρi + ρi ri )> βi+1 ≥ 0, (1) (0) P -a.s. (c’) For every j = 0, . . . , J − 1 and any two random variables Y (1) , Y (2) ∈ L∞− (R) with Y (1) ≥ Y (2) P -a.s., the monotonicity condition Gj (Ej [βj+1 Y (1) ], Fj (Ej [βj+1 Y (1) ])) ≥ Gj (Ej [βj+1 Y (2) ], Fj (Ej [βj+1 Y (2) ])), is satisfied. 32 P -a.s., Then, (b0 ) ⇒ (c0 ) ⇒ (a). Proof. (b0 ) ⇒ (c0 ): Fix j ∈ {0, . . . , J − 1} and let Y (1) and Y (2) be random variables which are in L∞− (R) and satisfy Y (1) ≥ Y (2) . By Lemma A.1, there are r ∈ AF0 and (ρ(1) , ρ(0) ) ∈ AG 0 such that h i h i Fj Ej βj+1 Y (2) = rj> Ej βj+1 Y (2) − Fj# (rj ) and h i Gj Ej βj+1 Y (1) , Fj (Ej [βj+1 Y (1) ]) h i (1) > (0) (1) (0) = ρj Ej βj+1 Y (1) + ρj Fj Ej [βj+1 Y (1) ] − G# ρ , ρ , j j j P -almost surely. Hence, by (11), (b0 ) and (12) we obtain h h i i Gj Ej βj+1 Y (2) , Fj Ej βj+1 Y (2) h i (1) > (0) (1) (0) ≤ ρj Ej βj+1 Y (2) + ρj Fj Ej [βj+1 Y (2) ] − G# ρj , ρ j j > (1) (0) (0) # (1) (0) # (2) = Ej ρj + ρj rj βj+1 Y − ρj Fj (rj ) − Gj ρj , ρj > (1) (0) (0) # (1) (0) # (1) ≤ Ej ρj + ρj rj βj+1 Y − ρj Fj (rj ) − Gj ρj , ρj h i (1) > (0) (1) (0) ρj Ej βj+1 Y (1) + ρj Fj Ej [βj+1 Y (1) ] − G# ρ , ρ j j j h i h i = Gj Ej βj+1 Y (1) , Fj Ej βj+1 Y (1) . ≤ (c0 ) ⇒ (a): We prove this implication by backward induction. Let Y up and Y low respectively be super- and subsolutions of (10). Then the assertion is trivially true for j = J. Now assume, that the assertion is true for j + 1 ∈ {1, . . . , J}. It follows by (c0 ) and the definition of a sub- and supersolution that up up low low Yjup ≥ Gj (Ej [βj+1 Yj+1 ], Fj (Ej [βj+1 Yj+1 ])) ≥ Gj (Ej [βj+1 Yj+1 ], Fj (Ej [βj+1 Yj+1 ])) ≥ Yjlow . Proof of Theorem 4.3. Notice first, that under the additional assumption Gi (z, y) = y, assertions (b0 ), (c0 ) in Proposition B.1 coincide with assertions (b), (c) in Theorem 4.3, because the vector (0, ..., 0, 1)> ∈ RD+1 is the only control in AG 0 by linearity of G. It, hence, remains to show: (a) ⇒ (b): We prove the contraposition. Hence, we assume that there exists a r ∈ AF0 and a 33 j0 ∈ {0, . . . , J − 1} such that P ({r> j0 βj0 +1 < 0}) > 0. Then we define the process Y by Y, i > j0 + 1 i Y i = Yj0 +1 − n1{r> βj +1 <0} , i = j0 + 1 j0 0 # r> E [β Y i i+1 i+1 ] − Fi (r i ), i ≤ j0 , i for n ∈ N which we fix later on. In view of (12), it follows easily that Y is a subsolution to (10). Now we observe that i h i h # # ∗ Y j0 − Yj0 = Ej0 (rj0 − rj∗0 )> βj0 +1 Yj0 +1 + nEj0 (r> β ) j0 j0 +1 − − Fj0 (r j0 ) + Fj0 (rj0 ), where r∗ ∈ AFj0 is such that for all j = j0 , . . . , J − 1 (rj∗ )> Ej [βj+1 Yj+1 ] − Fj# (rj∗ ) = Fj (Ej [βj+1 Yj+1 ]), see Lemma A.1. In a next step we define the set Aj0 ,N by Aj0 ,N h i 1 > = Ej0 (rj0 βj0 +1 )− ≥ N n h i o ∩ Ej0 (rj0 − rj∗0 )> βj0 +1 Yj0 +1 − Fj#0 (rj0 ) + Fj#0 (rj∗0 ) > −N . For N ∈ N sufficiently large (which is fixed from now on), we get that P (Aj0 ,N ) > 0 and therefore (Y j0 − Yj0 )1Aj0 ,N > −N + n =0 N for n = N 2 , which means that the comparison principle is violated for the subsolution Y (with this choice of n) and the (super-)solution Y . B.2 Proof of Theorem 4.5 The proof of Theorem 4.5 is prepared by two propositions. The first of them shows almost sure optimality of optimal controls and martingales. For convenience, we show this simultaneously for the processes Θup and Θlow from Theorem 4.7. Proposition B.2. With the notation in Theorems 4.5 and 4.7, we have for every i = j, . . . J, (1,∗) (0,∗) Yi = θiup (ρ(1,∗) , ρ(0,∗) , r∗ , M ∗ ) = Θup ,ρ , M ∗) i (ρ ∗ ∗ = θilow (ρ(1,∗) , ρ(0,∗) , r∗ , M ∗ ) = Θlow i (r , M ). Proof. The proof is by backward induction on i = J, . . . , j. The case i = J is obvious as all five 34 processes have the same terminal condition ξ by construction. Now suppose the claim is already shown for i + 1 ∈ {j + 1, . . . , J}. Then, applying the induction hypothesis to the right hand side of the recursion formulas, we observe (1,∗) (0,∗) Θup,∗ := Θup ,ρ , M ∗ ) = θiup (ρ(1,∗) , ρ(0,∗) , r∗ , M ∗ ), i i (ρ (1,∗) (0,∗) Θlow,∗ := Θlow ,ρ , M ∗ ) = θilow (ρ(1,∗) , ρ(0,∗) , r∗ , M ∗ ). i (ρ i Exploiting again the induction hypothesis as well as the definition of the Doob martingale and the duality relation (22) we obtain (1,∗) > (0,∗) (1,∗) (0,∗) ∗ ∗ ρi βi+1 Yi+1 − ∆Mi+1 + ρi Fi (βi+1 Yi+1 − ∆Mi+1 ) − G# , ρi ) i (ρi > (1,∗) (0,∗) (1,∗) (0,∗) = ρi Ei [βi+1 Yi+1 ] + ρi Fi (Ei [βi+1 Yi+1 ]) − G# , ρi ) i (ρi Θup,∗ = i = Gi (Ei [βi+1 Yi+1 ], Fi (Ei [βi+1 Yi+1 ])) = Yi . An analogous argument, making use of (23), shows Θlow,∗ = Yi . i The key step in the proof of Theorem 4.5 is the following alternative recursion formula for θjup and θjlow . This result enables us to establish inequalities between Yj , Ej [θjup ] and Ej [θjlow ], replacing, in a sense, the comparison principle (Comp). Proposition B.3. Suppose (R) and (C) and let M ∈ MD . Then, for every j = 0, . . . , J and F ρ(1) , ρ(0) ∈ AG j and r ∈ Aj , we have for all i = j, . . . , J the P -almost sure identities θiup (ρ(1) , ρ(0) , r, M ) (1) (0) up low (1) (0) = sup Φi+1 ρi , ρi , u, θi+1 (ρ(1) , ρ(0) , r, M ), θi+1 (ρ , ρ , r, M ), ∆Mi+1 u∈RD θilow (ρ(1) , ρ(0) , r, M ) = up low (1) (0) inf Φi+1 v (1) , v (0) , ri , θi+1 (ρ , ρ , r, M ), θi+1 (ρ(1) , ρ(0) , r, M ), ∆Mi+1 , (v(1) ,v(0) )∈RD+1 where ΦJ+1 (v (1) , v (0) , u, ϑ1 , ϑ2 , m) = ξ and Φi+1 v (1) , v (0) , u, ϑ1 , ϑ2 , m > > > (1) (1) = v βi+1 ϑ1 − v βi+1 ϑ2 − v (1) m + − (1) (0) +v (0) u> βi+1 ϑ1 − u> βi+1 ϑ2 − u> m − Fi# (u) − G# v , v i + − 35 for i = j, ..., J − 1. In particular, θilow (ρ(1) , ρ(0) , r, M ) ≤ θiup (ρ(1) , ρ(0) , r, M ) for every i = j, . . . , J. Proof. First we fix j ∈ {0, . . . , J − 1}, M ∈ MD and controls ρ(1) , ρ(0) and r in AG j respectively AFj and define θup and θlow by (20). To lighten the notation, we set up (1) (0) (1) (0) low (1) (0) (1) (0) Φlow v , v , r = Φ v , v , r , θ (ρ , ρ , r, M ), θ (ρ , ρ , r, M ), ∆M i i+1 i i+1 i+1 i+1 i+1 up and θ low ). We show the assertion by and define Φup i+1 accordingly (interchanging the roles of θ backward induction on i = J, . . . , j with the case i = J being trivial since θJup = θJlow = ΦJ+1 = ξ by definition. Now suppose that the assertion is true for i + 1. For any (v (1) , v (0) ) ∈ RD+1 we obtain, by (11), the following upper bound for θilow : (1) (0) Φlow i+1 v , v , ri > up low = v (1) βi+1 θi+1 1{(v(1) )> βi+1 ≥0} + θi+1 1{(v(1) )> βi+1 <0} − ∆Mi+1 up # (0) > low > > (1) (0) +v ri βi+1 θi+1 − ri βi+1 θi+1 − ri ∆Mi+1 − Fi (ri ) − G# v , v i + − up low ≥ Gi βi+1 θi+1 1{(v(1) )> βi+1 ≥0} + θi+1 1{(v(1) )> βi+1 <0} − ∆Mi+1 , up # > low > > ri βi+1 θi+1 − ri βi+1 θi+1 − ri ∆Mi+1 − Fi (ri ) + − up ι low ≥ min Gi βi+1 θi+1 − ∆Mi+1 , ri> βi+1 θi+1 − ri> βi+1 θi+1 + − ι∈{up,low} # > −ri ∆Mi+1 − Fi (ri ) = θilow . We emphasize that this chain of inequalities holds for every ω ∈ Ω, and not only P -almost surely. Hence, inf (v (1) ,v (0) )∈RD+1 (1) (0) Φlow v , v , r ≥ θilow i i+1 for every ω ∈ Ω. To conclude the argument for θilow , it remains to show that the converse inequality holds P -almost surely. Thanks to (11) we get ι βi+1 θi+1 Gi = − ∆Mi+1 , ri> βi+1 + low θi+1 − ri> βi+1 − up θi+1 > ι inf v (1) βi+1 θi+1 − ∆Mi+1 + v (0) (v(1) ,v(0) )∈RD+1 − ri> ∆Mi+1 ri> βi+1 + − up low θi+1 − ri> βi+1 θi+1 − ri> ∆Mi+1 − Fi# (ri ) 36 Fi# (ri ) − (1) (0) − G# v , v . i up low P -a.s. (by the induction hypothesis) we obtain Together with θi+1 ≥ θi+1 up ι low Gi βi+1 θi+1 − ∆Mi+1 , ri> βi+1 θi+1 − ri> βi+1 θi+1 + − ι∈{up,low} −ri> ∆Mi+1 − Fi# (ri ) > > ι = min inf v (1) βi+1 θi+1 − v (1) ∆Mi+1 ι∈{up,low} (v (1) ,v (0) )∈RD+1 up # (1) (0) low > (0) > > v , v +v ri βi+1 θi+1 − ri βi+1 θi+1 − ri ∆Mi+1 − Fi (ri ) − G# i + − > > > up low v (1) βi+1 θi+1 ≥ inf − v (1) βi+1 θi+1 − v (1) ∆Mi+1 (v(1) ,v(0) )∈RD+1 + − up # (0) low > (1) (0) > > +v v , v ri βi+1 θi+1 − ri βi+1 θi+1 − ri ∆Mi+1 − Fi (ri ) − G# i + − (1) (0) = inf P -a.s. Φlow i+1 v , v , ri , (v(1) ,v(0) )∈RD+1 θilow = min We next turn to θiup where the overall strategy of proof is similar. Recall first that the monotonicity of G in the y-component implies existence of a set Ω̄ρ (depending on ρ(0) ) of full P -measure such (0) that ρk (ω) ≥ 0 for every ω ∈ Ω̄ρ and k = j, . . . , J − 1. By (12) we find that, for any u ∈ RD , (0) (1) up Φup i+1 (ρi , ρi , u) is a lower bound for θi on Ω̄ρ : (1) (0) Φup ρ , ρ , u i+1 i i (1) > (1) > (1) > up low = ρi βi+1 θi+1 − ρi βi+1 θi+1 − ρi ∆Mi+1 + − (0) (1) (0) up # > > low > +ρi u βi+1 θi+1 − u βi+1 θi+1 − u ∆Mi+1 − Fi (u) − G# ρi , ρ i i + − > > (1) > (1) (1) up low βi+1 θi+1 − ρi ∆Mi+1 βi+1 θi+1 − ρi ≤ ρi + − (0) (1) (0) up low +ρi Fi βi+1 θi+1 1{u> βi+1 ≥0} + θi+1 1{u> βi+1 <0} − ∆Mi+1 − G# ρ , ρ i i i > > > (1) (1) (1) up low ≤ ρi βi+1 θi+1 − ρi βi+1 θi+1 − ρi ∆Mi+1 + − (0) (1) (0) ι +ρi max Fi (βi+1 θi+1 − ∆Mi+1 ) − G# ρ , ρ i i i ι∈{up,low} = Hence, θiup . (1) (0) sup Φup ρ , ρ , u ≤ θiup i+1 i i u∈RD on Ω̄ρ , and, thus, P -almost surely. To complete the proof of the proposition, we show the converse 37 up low P -a.s., we conclude, by (12), inequality. As θi+1 ≥ θi+1 θiup (1) > (1) > (1) > up low θi+1 − ρi βi+1 θi+1 − ρi ∆Mi+1 = ρi βi+1 + − (0) (1) (0) ι +ρi max Fi (βi+1 θi+1 − ∆Mi+1 ) − G# ρ , ρ i i i ι∈{up,low} (1) > (1) > (1) > up low = ρi βi+1 θi+1 − ρi βi+1 θi+1 − ρi ∆Mi+1 + − (0) ι − u> ∆Mi+1 − Fi# (u) +ρi max sup u> βi+1 θi+1 ι∈{up,low} u∈RD (1) (0) ρ , ρ −G# i i i > (1) (1) > (1) > up low ≤ ρi βi+1 θi+1 − ρi βi+1 θi+1 − ρi ∆Mi+1 + − (0) up # > > low > +ρi sup u βi+1 θi+1 − u βi+1 θi+1 − u ∆Mi+1 − Fi (u) + − u∈RD (1) (0) −G# ρi , ρi i (1) (0) = sup Φup ρ , ρ , u , P -a.s. i+1 i i u∈RD As Φi+1 v (1) , v (0) , u, ϑ1 , ϑ2 , m is increasing in ϑ1 and decreasing in ϑ2 , we finally get (1) (0) (1) (0) up low low up sup Φi+1 ρi , ρi , u, θi+1 , θi+1 , ∆Mi+1 ≥ sup Φi+1 ρi , ρi , u, θi+1 , θi+1 , ∆Mi+1 u∈RD u∈RD low up Φi+1 v (1) , v (0) , ri , θi+1 , θi+1 , ∆Mi+1 = θilow , P -a.s., ≥ inf (v(1) ,v(0) )∈RD+1 θiup = up low P -a.s. by the induction hypothesis. as θi+1 ≥ θi+1 We are now in the position to complete the proof of Theorem 4.5. Proof of Theorem 4.5. Let j ∈ {0, . . . , J − 1} be fixed from now on. Due to Propositions B.2 and B.3, it only remains to show that Ei [θilow ] ≤ Yi ≤ Ei [θiup ] for i = j, ..., J. We prove this by backward induction on i. To this end, we fix M ∈ MD and F (1,∗) , ρ(0,∗) and r ∗ in controls ρ(1) , ρ(0) and r in AG j respectively Aj , as well as ‘optimizers’ ρ F up and AG j respectively Aj which satisfy the duality relations (22) and (23). By definition of θ θlow the assertion is trivially true for i = J. Suppose that the assertion is true for i + 1. Recalling Proposition B.3 and applying the tower property of the conditional expectation as well as the 38 induction hypothesis, we get " Ei [θilow ] > > up low (1) (1) θi+1 − v βi+1 θi+1 v βi+1 = Ei inf (v(1) ,v(0) )∈RD+1 + − > up low − v (1) ∆Mi+1 + v (0) ri> βi+1 θi+1 − ri> βi+1 θi+1 − ri> ∆Mi+1 + − # # # (1) (0) −Fi (ri ) − Gi v , v i h up (1,∗) > (1,∗) > low Ei+1 θi+1 ρi βi+1 Ei+1 θi+1 − ≤ Ei ρi βi+1 − + i h > up (1,∗) (0,∗) low − ri> βi+1 Ei+1 θi+1 − ρi ∆Mi+1 + ρi ri> βi+1 Ei+1 θi+1 − + (1,∗) (0,∗) −ri> ∆Mi+1 − Fi# (ri ) − G# ρi , ρ i i (1,∗) > (0,∗) (1,∗) (0,∗) # # > βi+1 Yi+1 + ρi ri βi+1 Yi+1 − Fi (ri ) − Gi ρi , ρi ≤ Ei ρi ≤ Gi (Ei [βi+1 Yi+1 ] , Fi (Ei [βi+1 Yi+1 ])) = Yi . (0,∗) Here, the last inequality is an immediate consequence of (12), the nonnegativity of ρi duality relation (22). Applying an analogous argument, we obtain that Ei [θiup ] = Ei (1) > ρi βi+1 up θi+1 Ei [θiup ] and the ≥ Yi . Indeed, (1) > (1) > low − ρi βi+1 θi+1 − ρi ∆Mi+1 + − (0) up # > > low > +ρi sup u βi+1 θi+1 − u βi+1 θi+1 − u ∆Mi+1 − Fi (u) + − u∈RD (1) (0) −G# ρi , ρ i i h i up (1) > (1) > (1) > low ≥ Ei ρi βi+1 Ei+1 θi+1 − ρi βi+1 Ei+1 θi+1 − ρi ∆Mi+1 − + h i up ∗ > (0) low +ρi (ri∗ )> βi+1 Ei+1 θi+1 − (ri ) βi+1 Ei+1 θi+1 − (ri∗ )> ∆Mi+1 − + (1) (0) # ∗ # −Fi (ri ) − Gi ρi , ρi (1) (0) (ri∗ )> Ei [βi+1 Yi+1 ] − Fi# (ri∗ ) − G# ρ , ρ i i i > (1) (0) (1) (0) = ρi Ei [βi+1 Yi+1 ] + ρi Fi (Ei [βi+1 Yi+1 ]) − G# ρi , ρi i ≥ (1) > ρi (0) Ei [βi+1 Yi+1 ] + ρi ≥ Gi (Ei [βi+1 Yi+1 ] , Fi (Ei [βi+1 Yi+1 ])) = Yi . 39 (0) making now use of the nonnegativity of ρi , the duality relation (23), and (11). This establishes Ei [θilow ] ≤ Yi ≤ Ei [θiup ] for i = j, . . . , J. B.3 Proof of Proposition 4.6 By the definition of θup , the tower property of the conditional expectation, Jensen’s inequality up low (in view of (applied to the convex functions max and Fj ), and the comparison Yj+1 ≥ Yj+1 Proposition B.3), we obtain Yjup (1) > (1) > (1) > up low θj+1 − θj+1 = Ej ρj βj+1 ρj βj+1 − ρj ∆Mj+1 + − (1) (0) (0) # ι + ρj max Fj (βj+1 θj+1 − ∆Mj+1 ) − Gj ρj , ρj ι∈{up,low} (1) > (1) > up low ≥ Ej ρj βj+1 Yj+1 − Ej ρj βj+1 Yj+1 + − (0) (1) (0) ι +ρj max Fj (Ej [βj+1 Yj+1 ]) − G# ρ , ρ j j j ι∈{up,low} h i (1) > (0) up ≥ ρj Ej βj+1 Yj+1 + ρj max ι∈{up,low} (1) (0) ι Fj (Ej [βj+1 Yj+1 ]) − G# ρ , ρ j j j h i (1) > (0) (1) (0) up up ≥ ρj Ej βj+1 Yj+1 + ρj Fj (Ej [βj+1 Yj+1 ]) − G# ρj , ρ j j up up ≥ Gj (Ej [βj+1 Yj+1 ], Fj (Ej [βj+1 Yj+1 ])), (0) making, again, use of the nonnegativity of ρj for the lower bound Y B.4 low and (11). As in the previous proofs, the argument is essentially the same, and we skip the details. Proof of Theorem 4.7 Let j ∈ {0, ..., J − 1} be fixed. Due to Proposition B.2 and the comparison principle (Comp), it only remains to show that the pair of processes Y up and Y low given by Yiup = Ei [Θup i ] and Yilow = Ei [Θlow i ], i = j, ..., J, defines a super- and a subsolution to (10). The proof of this claim is similar to the proof of Proposition 4.6 but simpler. 40 C Proofs on Stochastic Games In this section, we provide the technical details for the discussion of stochastic two-player zero-sum games in Sections 3 and 5. C.1 Alternative Representation We first derive the alternative representation for Θlow 0 (r, M ) as a pathwise minimization problem discussed in Section 5. To this end, define, for some fixed control r ∈ AF0 and martingale M ∈ MD , Θ̃low = i wi,J (v (1) , v (0) , r)ξ inf (1) (0) (vj ,vj )∈RD+1 , j=i,...,J−1 − J−1 X ! (0) (1) (0) (1) (0) # # wi,j (v (1) , v (0) , r) vj Fj (rj ) + (vj + vj rj )> ∆Mj+1 + Gj (vj , vj ) , j=i where wi,j (v (1) , v (0) , u) = j−1 Y (vk + vk uk )> βk+1 . (1) (0) k=i Then, Θ̃low J = ξ and, for i = 0, . . . , J − 1, Θ̃low = i inf − J−1 X (1) inf (1) (0) (vi ,vi )∈RD+1 (1) (0) (vj ,vj )∈RD+1 , wi+1,j (v (1) , v (0) , r) (vi + vi ri )> βi+1 wi+1,J (v (1) , v (0) , r)ξ (0) j=i+1,...,J−1 (0) vj Fj# (rj ) + (1) (vj + (0) vj rj )> ∆Mj+1 j=i+1 − (0) vi Fi# (ri ) + (1) (vi + (0) vi ri )> ∆Mi+1 + (1) (0) G# i (vi , vi ) (1) ! + (1) (0) G# j (vj , vj ) ! . (0) The outer infimum can be taken restricted to such (vi , vi ) ∈ RD+1 which belong to DG# (ω,·) , i because the expression which is to be minimized is +∞ otherwise. Then, assumption (15) implies that the inner infimum can be interchanged with the nonnegative factor (vi +vi ri )> βi+1 , which (1) (0) yields Θ̃low = i inf (1) (0) (vi ,vi )∈D # G (ω,·) i (1) (vi + vi ri )> βi+1 Θ̃low i+1 (0) (0) (1) (0) (1) (0) − vi Fi# (ri ) + (vi + vi ri )> ∆Mi+1 + G# (v , v ) , j i i (31) and the infimum can, by the same argument as above, again be replaced by that over the whole RD+1 . Further rewriting (31) using (11) shows that the recursion for Θ̃low coincides with the one i 41 for Θlow i . Therefore, low Θlow = 0 (r, M ) = Θ̃0 − J−1 X wJ (v (1) , v (0) , r)ξ inf (1) (0) (vj ,vj )∈RD+1 , wj (v (1) , v (0) , r) j=0,...,J−1 (0) vj Fj# (rj ) + (1) (vj + (0) vj rj )> ∆Mj+1 + (1) (0) G# j (vj , vj ) ! , j=0 (1) (0) P -almost surely. The alternative expression for Θup 0 (ρ , ρ , M ) as a pathwise maximization problem can be shown in the same way. C.2 Equilibrium Value It remains to show that Y0 is the equilibrium value of the two-player zero-sum game. We apply the alternative representations for Θup and Θlow as pathwise optimization problems from Section C.1 as well as Theorem 4.7 (twice) in order to conclude Y0 = sup r∈AF 0 , = M ∈MD − J−1 X " E sup r∈AF 0 , E[Θlow 0 (r, M )] inf (1) M ∈MD (0) (0) (1) (0) (1) (0) wj (v (1) , v (0) , r) vj Fj# (rj ) + (vj + vj rj )> ∆Mj+1 + G# (v , v ) j j j j=0 ≤ − J−1 X inf G (1) (0) M ∈MD (ρ ,ρ )∈A0 sup r∈AF 0 ≤ " inf (ρ(1) ,ρ(0) )∈AG 0 E wJ (ρ(1) , ρ(0) , r)ξ − sup E wJ (ρ(1) , ρ(0) , r)ξ − inf F (ρ(1) ,ρ(0) )∈AG 0 r∈A inf − J−1 X wj (ρ (1) (0) ,ρ E J−1 X # (0) (1) (0) wj (ρ(1) , ρ(0) , r) ρj Fj# (rj ) + G# j (ρj , ρj ) # (0) (1) (0) wj (ρ(1) , ρ(0) , r) ρj Fj# (rj ) + G# j (ρj , ρj ) j=0 " (ρ(1) ,ρ(0) )∈AG 0 , M ∈MD J−1 X j=0 " 0 ≤ E wJ (ρ(1) , ρ(0) , r)ξ # (0) (1) (0) (1) (0) wj (ρ(1) , ρ(0) , r) ρj Fj# (rj ) + (ρj + ρj rj )> ∆Mj+1 + G# j (ρj , ρj ) j=0 = wJ (ρ(1) , ρ(0) , u)ξ sup (uj )∈RD , j=0,...,J−1 (0) (1) (0) (1) (0) , u) ρj Fj# (uj ) + (ρj + ρj uj )> ∆Mj+1 + G# (ρ , ρ ) j j j j=0 = # " sup r∈AF 0 , wJ (v (1) , v (0) , r)ξ (vj ,vj )∈RD+1 , j=0,...,J−1 inf (ρ(1) ,ρ(0) )∈AG 0 , M ∈MD (1) (0) E[Θup 0 (ρ , ρ , M )] = Y0 . 42 !# Here we also applied the zero-expectation property of martingale penalties at adapted controls. Consequently, all inequalities turn into equalities, which completes the proof. References S. Alanko and M. Avellaneda. Reducing variance in the numerical solution of BSDEs. Comptes Rendus Mathematique, 351(3):135–138, 2013. L. Andersen and M. Broadie. Primal-dual simulation algorithm for pricing multidimensional American options. Management Science, 50(9):1222–1234, 2004. M. Avellaneda, A. Levy, and A. Parás. Pricing and hedging derivative securities in markets with uncertain volatilities. Applied Mathematical Finance, 2(2):73–88, 1995. C. Bender and J. Steiner. Least-squares Monte Carlo for Backward SDEs. In R. Carmona, P. Del Moral, P. Hu, and N. Oudjane, editors, Numerical Methods in Finance, pages 257–289. Springer, 2012. C. Bender, N. Schweizer, and J. Zhuo. A primal-dual algorithm for BSDEs. Mathematical Finance, Early View, DOI: 10.1111/mafi.12100, 2015. D. P. Bertsekas. Dynamic programming and optimal control, volume I. Athena Scientific, 3rd edition, 2005. C. Beveridge and M. Joshi. Monte Carlo bounds for game options including convertible bonds. Management Science, 57(5):960–974, 2011. D. Brigo and F. Mercurio. Interest Rate Models – Theory and Practice: With Smile, Inflation and Credit. Springer, 2nd edition, 2006. D. Brigo and A. Pallavicini. Counterparty risk pricing under correlation between default and interest rates. In J. H. Miller, D. C. Edelman, and J. A. D. Appleby, editors, Numerical Methods for Finance, pages 63–81. Chapman and Hall/CRC, 2007. D. Brigo, M. Morini, and A. Pallavicini. Counterparty Credit Risk, Collateral and Funding: With Pricing Cases for All Asset Classes. John Wiley & Sons, 2013. D. B. Brown and J. E. Smith. Dynamic portfolio optimization with transaction costs: Heuristics and dual bounds. Management Science, 57(10):1752–1770, 2011. D. B. Brown, J. E. Smith, and P. Sun. Information relaxations and duality in stochastic dynamic programs. Operations Research, 58(4):785–801, 2010. 43 P. Cheridito, M. Kupper, and N. Vogelpoth. Conditional analysis on Rd . In A. H. Hamel, F. Heyde, A. Löhne, B. Rudloff, and C. Schrage, editors, Set Optimization and Applications in Finance - The State of the Art, pages 179–211. Springer, 2015. S. Crépey. Financial Modeling. Springer, 2013. S. Crépey, R. Gerboud, Z. Grbac, and N. Ngor. Counterparty risk and funding: The four wings of the TVA. International Journal of Theoretical and Applied Finance, 16(2):1350006, 2013. S. Crépey, T. R. Bielecki, and D. Brigo. Counterparty Risk and Funding: A Tale of Two Puzzles. Chapman and Hall/CRC, 2014. D. Duffie, M. Schroder, and C. Skiadas. Recursive valuation of defaultable securities and the timing of resolution of uncertainty. Annals of Applied Probability, pages 1075–1090, 1996. N. El Karoui, S. Peng, and M. C. Quenez. Backward stochastic differential equations in finance. Mathematical Finance, 7(1):1–71, 1997. A. Fahim, N. Touzi, and X. Warin. A probabilistic numerical method for fully nonlinear parabolic PDEs. Annals of Applied Probability, 21(4):1322–1364, 2011. E. Fournié, J.-M. Lasry, J. Lebuchoux, P.-L. Lions, and N. Touzi. Applications of Malliavin calculus to Monte Carlo methods in finance. Finance and Stochastics, 3(4):391–412, 1999. P. Glasserman and B. Yu. Simulation for American options: Regression now or regression later? In H. Niederreiter, editor, Monte Carlo and Quasi-Monte Carlo Methods 2002, pages 213–226. Springer, 2004. J. Guyon and P. Henry-Labordère. The uncertain volatility model: A Monte Carlo approach. Journal of Computational Finance, 14(3):37–71, 2011. J. Guyon and P. Henry-Labordère. Nonlinear Option Pricing. Chapman and Hall/CRC, 2013. M. Haugh and A. E. B. Lim. Linear–quadratic control and information relaxations. Operations Research Letters, 40(6):521–528, 2012. M. B. Haugh and L. Kogan. Pricing American options: A duality approach. Operations Research, 52(2):258–270, 2004. M. B. Haugh and C. Wang. Information relaxations and dynamic zero-sum games. arXiv preprint 1405.4347, 2015. I. Karatzas and S. Shreve. Brownian Motion and Stochastic Calculus. Springer, 2nd edition, 1991. 44 H. Kraft and F. T. Seifried. Stochastic differential utility as the continuous-time limit of recursive utility. Journal of Economic Theory, 151:528–550, 2014. C. Kühn, A. E. Kyprianou, and K. van Schaik. Pricing Israeli options: A pathwise approach. Stochastics, 79(1-2):117–137, 2007. R. Lord, R. Koekkoek, and D. van Dijk. A comparison of biased simulation schemes for stochastic volatility models. Quantitative Finance, 10(2):177–194, 2010. T. J. Lyons. Uncertain volatility and the risk-free synthesis of derivatives. Applied Mathematical Finance, 2(2):117–133, 1995. H. Pham. Continuous-time Stochastic Control and Optimization with Financial Applications. Springer, 2009. W. B. Powell. Approximate Dynamic Programming: Solving the curses of dimensionality. John Wiley & Sons, 2nd edition, 2011. N. S. Rasmussen. Control variates for Monte Carlo valuation of American options. Journal of Computational Finance, 9(1):84–102, 2005. R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970. L. C. G. Rogers. Monte Carlo valuation of American options. Mathematical Finance, 12(3): 271–286, 2002. L. C. G. Rogers. Pathwise stochastic optimal control. SIAM Journal on Control and Optimization, 46(3):1116–1132, 2007. L. S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095– 1100, 1953. J. M. Vanden. Exact superreplication strategies for a class of derivative assets. Applied Mathematical Finance, 13(01):61–87, 2006. J. Zhang. A numerical scheme for bsdes. Annals of Applied Probability, 14(1):459–488, 2004. 45