Multilevel Monte Carlo Metamodeling
Imry Rosenbaum
Jeremy Staum

Outline
• What is simulation metamodeling?
• Metamodeling approaches
• Why use function approximation?
• Multilevel Monte Carlo
• MLMC in metamodeling

Simulation Metamodeling
• Simulation
– Given input $\theta$ we observe $Y(\theta)$.
– Each observation is noisy.
– Effort is measured by the number of observations, $n$.
– We use simulation output $Y(\theta; \omega)$ to estimate the response surface $y(\theta) = E[Y(\theta)]$.
• Simulation metamodeling
– A fast estimate of $y(\theta)$ given any $\theta$.
– "What does the response surface look like?"

Why Do We Need Metamodeling?
• What-if analysis
– How things will change under different scenarios.
– Applicable in financial, business, and military settings.
• For example
– Multi-product asset portfolios.
– How the product mix will change our business profit.

Approaches
• Regression
• Interpolation
• Kriging
– Stochastic Kriging
• Kernel smoothing

Metamodeling as Function Approximation
• Metamodeling is essentially function approximation under uncertainty.
• Information-Based Complexity has answers for such settings.
• One of those answers is Multilevel Monte Carlo.

Multilevel Monte Carlo
• Multilevel Monte Carlo was first suggested as a numerical method for parametric integration.
• Later the notion was extended to SDEs.
• In our work we extend the multilevel notion to stochastic simulation metamodeling.

Multilevel Monte Carlo
• In 1998 Stefan Heinrich introduced the notion of multilevel MC.
• The scheme reduces the computational cost of estimating a family of integrals.
• We use the smoothness of the underlying function to enhance our estimate of the integrals.

Example
• Consider $f \in C([0,1]^2)$; we want to compute
$u(\theta) = \int_0^1 f(\theta, t)\,dt$ for all $\theta \in \Theta = [0,1]$.
• We fix a grid $\theta_i = i/n$, $i = 0, \dots, n$, estimate the respective integrals, and interpolate.

Example Continued
• We use the piecewise-linear approximation
$u^n(\theta) = \sum_{i=0}^{n} \hat u(\theta_i)\, \phi_i(\theta)$,
where the $\phi_i$ are the respective hat functions and the $\hat u(\theta_i)$ are Monte Carlo estimates, i.e.,
$\hat u(\theta_i) = \frac{1}{m} \sum_{j=1}^{m} f(\theta_i, \tau_j)$,
with $\tau_j$ i.i.d. uniform random variables.

Example Continued
• We use the root-mean-square norm as the error metric:
$e(\hat u) = \left( \int_0^1 E\big[(u(\theta) - \hat u(\theta))^2\big] \, d\theta \right)^{1/2}$.
• It can be shown that, under our smoothness assumption, $e = O(N^{-1/2})$ at a cost of $O(N^{3/2})$.

Example Continued
• Consider a sequence of grids $\theta_i^\ell = i/2^\ell$, $i = 0, \dots, 2^\ell$.
• We can represent our estimator as
$u(\theta) = \sum_{\ell=0}^{L} \big( u_\ell(\theta) - u_{\ell-1}(\theta) \big)$, with $u_{-1} \equiv 0$,
where $u_\ell(\theta)$ is the estimate using the $\ell$-th grid.
• We define each of our decision variables in terms of $N$, so as to keep the comparison fair.

Example Continued

level     square root of variance     cost
0         $N^{-1/2}$                  $N$
$\ell$    $2^{-\ell} N^{-1/2}$        $2^{\ell} N$
$L$       $2^{-L} N^{-1/2}$           $2^{L} N$

• The variance reaches its maximum at the first level, but the cost reaches its maximum at the last level.

Example Continued
• Now let the number of observations differ across levels, $m_\ell$; the estimator becomes
$u_{\mathrm{MLMC}}(\theta) = \sum_{\ell=0}^{L} \frac{1}{m_\ell} \sum_{j=1}^{m_\ell} \Big[ \sum_{i=0}^{2^{\ell}} f(\theta_i^{\ell}, \tau_j^{\ell})\,\phi_i^{\ell}(\theta) - \sum_{i=0}^{2^{\ell-1}} f(\theta_i^{\ell-1}, \tau_j^{\ell})\,\phi_i^{\ell-1}(\theta) \Big]$,
where the level-$(-1)$ inner sum is taken to be zero and the draws $\tau_j^{\ell}$ are shared between the two grids at level $\ell$.
• We use $m_\ell = 2^{-3\ell/2} N$ to balance cost against variance.

Example Continued

level     square root of variance     cost
0         $N^{-1/2}$                  $N$
$\ell$    $2^{-\ell/4} N^{-1/2}$      $2^{-\ell/2} N$
$L$       $2^{-L/4} N^{-1/2}$        $2^{-L/2} N$

• It follows that the square root of the variance is $O(N^{-1/2})$ while the cost is $O(N)$.
• Previously, the same variance came at a cost of $O(N^{3/2})$. (A code sketch of this example follows below.)
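To make the example concrete, here is a minimal Python sketch of the estimator above. The grids $\theta_i^\ell = i/2^\ell$, the hat-function interpolation, and the allocation $m_\ell = 2^{-3\ell/2} N$ follow the slides; the test integrand $f(\theta, t) = e^{\theta t}$, the budget $N$, the depth $L$, and the RNG seed are illustrative assumptions, not part of the slides.

```python
# Minimal sketch of Heinrich's parametric-integration example (assumptions noted above).
import numpy as np

def f(theta, t):
    # Hypothetical smooth test integrand on [0, 1]^2 (an assumption, not from the slides).
    return np.exp(theta * t)

def hat_interp(grid, grid_vals, theta):
    # Piecewise-linear (hat-function) interpolation of grid values, evaluated at theta.
    return np.interp(theta, grid, grid_vals)

def mlmc_parametric_integral(theta, N, L):
    """Estimate u(theta) = int_0^1 f(theta, t) dt for all theta at once.

    Level ell uses the grid i / 2**ell and m_ell = 2**(-3*ell/2) * N samples;
    the level-(-1) term is zero, and each correction reuses the same tau_j
    on the fine and coarse grids.
    """
    theta = np.atleast_1d(theta).astype(float)
    rng = np.random.default_rng(0)
    estimate = np.zeros_like(theta)
    for ell in range(L + 1):
        m_ell = max(1, round(2.0 ** (-1.5 * ell) * N))
        fine_grid = np.linspace(0.0, 1.0, 2 ** ell + 1)
        tau = rng.uniform(size=m_ell)
        fine = np.mean([hat_interp(fine_grid, f(fine_grid, t), theta) for t in tau], axis=0)
        if ell == 0:
            correction = fine
        else:
            coarse_grid = np.linspace(0.0, 1.0, 2 ** (ell - 1) + 1)
            coarse = np.mean([hat_interp(coarse_grid, f(coarse_grid, t), theta) for t in tau], axis=0)
            correction = fine - coarse
        estimate += correction
    return estimate

# Sanity check against the closed form: int_0^1 exp(theta*t) dt = (exp(theta) - 1) / theta.
thetas = np.array([0.25, 0.5, 0.75])
print(mlmc_parametric_integral(thetas, N=4000, L=6))
print((np.exp(thetas) - 1.0) / thetas)
```

Sharing the draws $\tau_j$ between the fine and coarse grids within each level is what makes the variance of the correction terms shrink as the grids converge; independent draws across levels keep the level variances additive.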
Generalization
• Let $\Theta \subset \mathbb{R}^{d_1}$ and $G \subset \mathbb{R}^{d_2}$ be bounded open sets with Lipschitz boundary.
• We work with the Sobolev space
$W_p^{r,0}(\Theta \times G) = \{ f \in L_p(\Theta \times G) : \partial_\theta^\alpha f \in L_p,\ |\alpha| \le r \}$,
$\|f\|_{W_p^{r,0}} = \Big( \sum_{|\alpha| \le r} \| \partial_\theta^\alpha f \|_{L_p}^{p} \Big)^{1/p}$.
• We assume the Sobolev embedding condition $r/d_1 > 1/p$.

General Theorem
Theorem 1 (Heinrich). Let $1 < p < \infty$, $\bar p = \min(2, p)$. Then there exist constants $c_1, c_2 > 0$ such that for each integer $N > 1$ there is a choice of parameters $L, (m_\ell)_{\ell=1}^{L}$ such that the cost of computing $u_{\mathrm{MLMC}}$ is bounded by $c_1 N$, and for each $f \in W_p^{r,0}(\Theta \times G)$, …

Issues
• MLMC requires smoothness to work, but can we guarantee such smoothness?
• Moreover, the more dimensions we have, the more smoothness we require.
• Is there a setting that helps alleviate these concerns?

Answer
• The answer to our question came from the derivative-estimation setting in Monte Carlo simulation.
• Derivative estimation is mainly used in finance to estimate the Greeks of financial derivatives.
• Broadie and Glasserman presented a framework under which a pathwise estimator is unbiased.
• This framework is suitable in our case as well.

Simulation MLMC
• Goal
• Framework
• Multilevel Monte Carlo method
• Computational complexity
• Algorithm
• Results

Goal
• Our goal is to estimate the response surface $y(\theta) = E[Y(\theta)]$.
• The aim is to minimize the total number of observations used by the estimator.
• Effort is relative to the amount of precision we require.

Elements We Will Need for MLMC
• Smoothness told us how adjacent points behave.
• Our assumptions on the function will provide the same information.
• The choice of approximation and grid will preserve these properties in the estimator.

The Framework
• First we assume that our simulation output is a Hölder continuous function of a random vector $X$:
$Y(\theta, \omega) = g(X(\theta, \omega))$.
• That is, there exist $K \in \mathbb{R}$ and $p \in (0, 1]$ such that
$|g(x) - g(y)| \le K \|x - y\|^{p}$ for all $x, y \in \mathbb{R}^{d}$.

Framework Continued
• Next we assume that there exists a random variable $B$ with finite second moment such that
$\|X(\theta_1) - X(\theta_2)\| \le B\, \|\theta_1 - \theta_2\|$ for all $\theta_1, \theta_2 \in \Theta$, a.s.
• Furthermore, we assume that $\Theta$ is compact with $|\Theta| = 1$.

Behavior of Adjacent Points
• The bias of estimating $y(\theta + h)$ using $Y(\theta)$ satisfies
$|E[Y(\theta+h) - Y(\theta)]| \le E\,|Y(\theta+h) - Y(\theta)| \le E\big[K \|X(\theta+h) - X(\theta)\|^{p}\big] = O(\|h\|^{p})$.
• It follows immediately that
$\mathrm{Var}[Y(\theta+h) - Y(\theta)] \le E\big[(Y(\theta+h) - Y(\theta))^2\big] = O(\|h\|^{2p})$.

Multilevel Monte Carlo
• Assume we have a sequence of grids $\Delta_\ell$ with an increasing number of points, $O(M^{\ell+1})$.
• The experiment designs are structured so that the maximum distance between a point $\theta \in \Theta$ and the design is $O(M^{-\ell/d})$; we denote it by $\delta_\ell$.
• Let $Y_\ell(\theta, \omega)$ denote an approximation of $Y(\theta)$ using the same $\omega$ at each design point.

Approximating the Response
[figure]

MLMC Decomposition
• Rewrite the expectation of our approximation in the multilevel way:
$E[Y_\ell(\theta, \omega)] = E[Y_0(\theta, \omega)] + \sum_{k=1}^{\ell} \big( E[Y_k(\theta, \omega)] - E[Y_{k-1}(\theta, \omega)] \big)$.
• Define the estimator of $E[Y_\ell(\theta, \omega)]$ using $m$ observations:
$\bar Y_\ell(\theta, m) = \frac{1}{m} \sum_{j=1}^{m} Y_\ell(\theta, \omega_j)$.

MLMC Decomposition Continued
• Next we can write the estimator in the multilevel decomposition:
$\bar Y_\ell(\theta, m) = \bar Y_0(\theta, m) + \sum_{k=1}^{\ell} \big( \bar Y_k(\theta, m) - \bar Y_{k-1}(\theta, m) \big)$.
• Do we really have to use the same $m$ for all levels?

The MLMC Estimator
• We denote the MLMC estimator by
$\hat Y_\ell(\theta; m_0, \dots, m_\ell) = \sum_{k=0}^{\ell} \Delta \bar Y_k(\theta, m_k)$,
where
$\Delta \bar Y_k(\theta, m_k) = \bar Y_k(\theta, m_k, \omega^k) - \bar Y_{k-1}(\theta, m_k, \omega^k)$,
i.e., both levels of the $k$-th correction use the same $m_k$ observations $\omega^k$ (and $\bar Y_{-1} \equiv 0$).

Multilevel Illustration
[figure: multilevel grid refinement]

Multilevel MC Estimators
• Denote
$\mathrm{Box}_w(\theta) = \big\{ x : \forall i = 1, \dots, d,\ \theta_i - \tfrac{w}{2} \le x_i \le \theta_i + \tfrac{w}{2} \big\}$.
• We consider approximations of the form
$Y_\ell(\theta, \omega) = \sum_{i=0}^{n_\ell} K_\ell(\theta - \theta_i^\ell)\, Y(\theta_i^\ell, \omega)$.

Approximation Requirements
• We assume that for each $\epsilon > 0$ there exists a window size $w(\ell, \epsilon) > \delta_\ell$, of order $O(M^{-\ell/d})$, such that the weights $K_\ell$ are nonnegative and vanish outside the window, and for each $\theta \in \Theta$ the window weights nearly sum to one:
$1 - \epsilon \le \sum_{\theta_i^\ell \in \mathrm{Box}_{w(\ell,\epsilon)}(\theta)} K_\ell(\theta - \theta_i^\ell) \le 1 + \epsilon$.
(A code sketch of such a level-$\ell$ approximation follows below.)
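Below is a minimal sketch of the level-$\ell$ approximation $Y_\ell$ and the correction term $\Delta \bar Y_\ell$ in one dimension, using hat-function weights on the level-$\ell$ grid: these weights are nonnegative, supported on a window of width $2M^{-\ell}$, and sum exactly to one, so the requirements above hold with $\epsilon = 0$. The simulation model is an illustrative assumption: $g(x) = |x|$ (Lipschitz, so $p = 1$) and $X(\theta) = \theta + Z$ with $Z$ standard normal, giving $y(\theta) = E|\theta + Z|$; the helper names are ours.

```python
# Sketch of the level-ell approximation Y_ell(theta, omega) =
# sum_i K_ell(theta - theta_i^ell) * Y(theta_i^ell, omega), with hat weights.
import numpy as np

def Y(theta, z):
    # Assumed model: Y(theta, omega) = g(X(theta, omega)), g(x) = |x|, X(theta) = theta + z.
    return np.abs(theta + z)

def Y_level(theta, z, ell, M=2):
    # Y_ell(theta, omega): hat-weighted combination of the outputs at the
    # level-ell design points i / M**ell, all driven by the same omega (= z).
    grid = np.linspace(0.0, 1.0, M ** ell + 1)
    return np.interp(theta, grid, Y(grid, z))

def delta_Y_bar(theta, ell, m_ell, rng, M=2):
    # Delta Y-bar_ell(theta, m_ell): the level-ell correction, averaging
    # Y_ell - Y_{ell-1} over m_ell observations with shared randomness z_j.
    z = rng.standard_normal(m_ell)
    fine = np.mean([Y_level(theta, zj, ell, M) for zj in z], axis=0)
    if ell == 0:
        return fine
    coarse = np.mean([Y_level(theta, zj, ell - 1, M) for zj in z], axis=0)
    return fine - coarse

# Example: a 5-level metamodel on Theta = [0, 1] with illustrative m_ell.
rng = np.random.default_rng(1)
theta = np.linspace(0.0, 1.0, 5)
m = [4000, 1400, 500, 180, 64, 23]   # roughly m0 * M**(-1.5 * ell)
y_hat = sum(delta_Y_bar(theta, ell, m[ell], rng) for ell in range(6))
print(y_hat)  # estimates of y(theta) = E|theta + Z|
```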
Bias and Variance of the Approximation
• Under these assumptions we can show that
– $y(\theta) - E[Y_\ell(\theta)] = O(M^{-\ell p/d})$
– $\mathrm{Var}[\Delta Y_\ell(\theta)] = O(M^{-2\ell p/d})$
• Our measure of error is the mean integrated squared error:
$\mathrm{MISE}(\hat y) = E \int_\Theta \big( \hat y(\theta) - y(\theta) \big)^2 \, d\theta$.
• Next, we use a theorem provided by Cliffe et al. to bound the computational complexity of the MLMC.

Computational Complexity Theorem
Theorem. Let $y(\theta)$ denote a simulation response surface and $Y_\ell(\theta)$ an estimator of it using $m_\ell$ replications at each design point. Suppose there exist $\Delta Y_\ell(\theta)$ and constants $c_1, c_2, c_3, \alpha, \beta, \gamma, M$ with $M > 1$ and $\alpha \ge \tfrac{1}{2}\min(\beta, \gamma)$ such that
1. $|E[y(\theta) - Y_\ell(\theta)]| \le c_1 M^{-\alpha \ell}$
2. $E[\Delta Y_\ell(\theta)] = \begin{cases} E[Y_0(\theta)], & \ell = 0 \\ E[Y_\ell(\theta) - Y_{\ell-1}(\theta)], & \ell > 0 \end{cases}$
3. $\mathrm{Var}[\Delta Y_\ell(\theta)] \le c_2\, M^{-\beta \ell} / m_\ell$
4. the computational cost of $\Delta Y_\ell(\theta)$ is bounded by $c_3\, m_\ell\, M^{\gamma \ell}$.

Theorem Continued
Then for every $\epsilon$ there exist values of $L$ and $m_\ell$ for which the MSE of the MLMC estimator $\sum_{\ell=0}^{L} \Delta Y_\ell(\theta)$ is bounded by $\epsilon^2$, with a total computational cost of
$C = \begin{cases} O(\epsilon^{-2}), & \beta > \gamma \\ O\big(\epsilon^{-2} (\log \epsilon)^2\big), & \beta = \gamma \\ O\big(\epsilon^{-2 - (\gamma - \beta)/\alpha}\big), & \beta < \gamma \end{cases}$

Multilevel Monte Carlo Algorithm
• The theoretical results need translation into a practical setting.
• For simplicity, we consider only the Lipschitz continuous setting.

Simplifying Assumptions
• The constants $c_1, c_2, c_3$ in the theorem are crucial in deciding when to stop; in practice they are unknown.
• If $E[y(\theta) - Y_\ell(\theta)] = c_1 M^{-\alpha \ell}$, we can deduce that
$E[\Delta Y_\ell(\theta)] \approx (M - 1)\, E[y(\theta) - Y_\ell(\theta)]$.

Simplifying Assumptions Continued
• Hence, we can use $\Delta Y_\ell(\theta)$ as a pessimistic estimate of the bias at level $\ell$, and we continue adding levels until the following criterion is met:
$\|\Delta \bar Y_\ell\| \le (M - 1)\, \epsilon / \sqrt{2}$.
• However, due to its inherent variance, we recommend the stopping criterion
$\max\big( \|\Delta \bar Y_\ell\|,\ \tfrac{1}{M} \|\Delta \bar Y_{\ell-1}\| \big) \le (M - 1)\, \epsilon / \sqrt{2}$.

The Algorithm
[figure: algorithm pseudocode; a Python sketch appears after the Questions slide]

Black-Scholes
[figure: numerical results]

Black-Scholes Continued
[figure: numerical results]

Conclusion
• Multilevel Monte Carlo provides an efficient metamodeling scheme.
• We eliminated the need for increased smoothness as the dimension increases.
• We introduced a practical MLMC algorithm for stochastic simulation metamodeling.

Questions?
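Backup: a sketch of the adaptive algorithm with the recommended stopping rule $\max\big(\|\Delta \bar Y_\ell\|, \tfrac{1}{M}\|\Delta \bar Y_{\ell-1}\|\big) \le (M-1)\epsilon/\sqrt{2}$. This is not the (figure-only) algorithm slide itself: the simulation model, the per-level sample sizes $m_\ell$, and the test grid over $\Theta = [0,1]$ are illustrative assumptions, and the helpers are repeated from the earlier sketch so this one is self-contained.

```python
# Backup sketch: adaptive level selection with the stopping rule from the slides.
import numpy as np

def Y(theta, z):
    # Same illustrative model as before: y(theta) = E|theta + Z|.
    return np.abs(theta + z)

def delta_Y_bar(theta, ell, m_ell, rng, M=2):
    # Level-ell correction with shared randomness across the two grids.
    fine_grid = np.linspace(0.0, 1.0, M ** ell + 1)
    z = rng.standard_normal(m_ell)
    fine = np.mean([np.interp(theta, fine_grid, Y(fine_grid, zj)) for zj in z], axis=0)
    if ell == 0:
        return fine
    coarse_grid = np.linspace(0.0, 1.0, M ** (ell - 1) + 1)
    coarse = np.mean([np.interp(theta, coarse_grid, Y(coarse_grid, zj)) for zj in z], axis=0)
    return fine - coarse

def mlmc_adaptive(eps, M=2, m0=4000, max_L=12, seed=2):
    rng = np.random.default_rng(seed)
    thetas = np.linspace(0.0, 1.0, 33)                   # test grid on Theta = [0, 1]
    corrections, ell = [], 0
    while True:
        m_ell = max(1, round(m0 * M ** (-1.5 * ell)))    # illustrative allocation
        corrections.append(delta_Y_bar(thetas, ell, m_ell, rng, M))
        # Pessimistic bias proxies from the last two corrections (sup norm over the grid).
        bias_now = np.max(np.abs(corrections[-1]))
        bias_prev = np.max(np.abs(corrections[-2])) / M if ell > 0 else np.inf
        converged = ell > 0 and max(bias_now, bias_prev) <= (M - 1) * eps / np.sqrt(2)
        if converged or ell == max_L:                    # max_L is a safety cap
            return sum(corrections), thetas
        ell += 1

y_hat, thetas = mlmc_adaptive(eps=0.02)
print(np.c_[thetas, y_hat][:4])  # first few (theta, y_hat(theta)) pairs
```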