Multilevel Monte Carlo Metamodeling
Imry Rosenbaum
Jeremy Staum

Outline
• What is simulation metamodeling?
• Metamodeling approaches
• Why use function approximation?
• Multilevel Monte Carlo
• MLMC in metamodeling

Simulation Metamodeling
• Simulation
– Given input $\theta$ we observe $Y(\theta)$.
– Each observation is noisy.
– Effort is measured by the number of observations, $n$.
– We use simulation output $Y(\theta; \omega)$ to estimate the response surface $y(\theta) = E[Y(\theta)]$.
• Simulation metamodeling
– A fast estimate of $y(\theta)$ given any $\theta$.
– "What does the response surface look like?"

Why Do We Need Metamodeling?
• What-if analysis
– How things will change under different scenarios.
– Applicable in financial, business, and military settings.
• For example
– Multi-product asset portfolios.
– How the product mix will change our business profit.

Approaches
• Regression
• Interpolation
• Kriging
– Stochastic Kriging
• Kernel smoothing

Metamodeling as Function Approximation
• Metamodeling is essentially function approximation under uncertainty.
• Information-Based Complexity has answers for such settings.
• One of those answers is Multilevel Monte Carlo.

Multilevel Monte Carlo
• Multilevel Monte Carlo was first suggested as a numerical method for parametric integration.
• Later the notion was extended to SDEs.
• In our work we extend the multilevel notion to stochastic simulation metamodeling.

Multilevel Monte Carlo
• In 1998 Stefan Heinrich introduced the notion of multilevel MC.
• The scheme reduces the computational cost of estimating a family of integrals.
• We use the smoothness of the underlying function to enhance our estimate of the integrals.

Example
• Consider $f \in C([0,1]^2)$; we want to compute
$u(\theta) = \int_0^1 f(\theta, t)\,dt$ for all $\theta \in \Theta = [0,1]$.
• We fix a grid $\theta_i = i/n$, $i = 0, \dots, n$, estimate the respective integrals, and interpolate.

Example Continued
• We use the piecewise-linear approximation
$u^n(\theta) = \sum_{i=0}^{n} \hat u(\theta_i)\, \phi_i(\theta)$,
where the $\phi_i$ are the respective hat functions and the $\hat u(\theta_i)$ are Monte Carlo estimates, i.e.,
$\hat u(\theta_i) = \frac{1}{m} \sum_{j=1}^{m} f(\theta_i, \tau_j)$,
with $\tau_j$ i.i.d. uniform random variables.

Example Continued
• We use the root-mean-square norm as the error metric:
$e(\hat u) = \left( \int_0^1 E\big[(u(\theta) - \hat u(\theta))^2\big] \, d\theta \right)^{1/2}$.
• It can be shown that, under our smoothness assumption, $e = O(N^{-1/2})$ at a cost of $O(N^{3/2})$.

Example Continued
• Consider a sequence of grids $\theta_i^\ell = i/2^\ell$, $i = 0, \dots, 2^\ell$.
• We can represent our estimator as
$u(\theta) = \sum_{\ell=0}^{L} \big( u_\ell(\theta) - u_{\ell-1}(\theta) \big)$, with $u_{-1} \equiv 0$,
where $u_\ell(\theta)$ is the estimate using the $\ell$-th grid.
• We define each of our decision variables in terms of $N$, so as to keep the comparison fair.

Example Continued

level     square root of variance     cost
0         $N^{-1/2}$                  $N$
$\ell$    $2^{-\ell} N^{-1/2}$        $2^{\ell} N$
$L$       $2^{-L} N^{-1/2}$           $2^{L} N$

• The variance reaches its maximum at the first level, but the cost reaches its maximum at the last level.

Example Continued
• Now let the number of observations differ across levels, $m_\ell$; the estimator becomes
$u_{\mathrm{MLMC}}(\theta) = \sum_{\ell=0}^{L} \frac{1}{m_\ell} \sum_{j=1}^{m_\ell} \Big[ \sum_{i=0}^{2^{\ell}} f(\theta_i^{\ell}, \tau_j^{\ell})\,\phi_i^{\ell}(\theta) - \sum_{i=0}^{2^{\ell-1}} f(\theta_i^{\ell-1}, \tau_j^{\ell})\,\phi_i^{\ell-1}(\theta) \Big]$,
where the level-$(-1)$ inner sum is taken to be zero and the draws $\tau_j^{\ell}$ are shared between the two grids at level $\ell$.
• We use $m_\ell = 2^{-3\ell/2} N$ to balance cost against variance.

Example Continued

level     square root of variance     cost
0         $N^{-1/2}$                  $N$
$\ell$    $2^{-\ell/4} N^{-1/2}$      $2^{-\ell/2} N$
$L$       $2^{-L/4} N^{-1/2}$        $2^{-L/2} N$

• It follows that the square root of the variance is $O(N^{-1/2})$ while the cost is $O(N)$.
• Previously, the same variance came at a cost of $O(N^{3/2})$. (A code sketch of this example follows below.)
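To make the example concrete, here is a minimal Python sketch of the estimator above. The grids $\theta_i^\ell = i/2^\ell$, the hat-function interpolation, and the allocation $m_\ell = 2^{-3\ell/2} N$ follow the slides; the test integrand $f(\theta, t) = e^{\theta t}$, the budget $N$, the depth $L$, and the RNG seed are illustrative assumptions, not part of the slides.

```python
# Minimal sketch of Heinrich's parametric-integration example (assumptions noted above).
import numpy as np

def f(theta, t):
    # Hypothetical smooth test integrand on [0, 1]^2 (an assumption, not from the slides).
    return np.exp(theta * t)

def hat_interp(grid, grid_vals, theta):
    # Piecewise-linear (hat-function) interpolation of grid values, evaluated at theta.
    return np.interp(theta, grid, grid_vals)

def mlmc_parametric_integral(theta, N, L):
    """Estimate u(theta) = int_0^1 f(theta, t) dt for all theta at once.

    Level ell uses the grid i / 2**ell and m_ell = 2**(-3*ell/2) * N samples;
    the level-(-1) term is zero, and each correction reuses the same tau_j
    on the fine and coarse grids.
    """
    theta = np.atleast_1d(theta).astype(float)
    rng = np.random.default_rng(0)
    estimate = np.zeros_like(theta)
    for ell in range(L + 1):
        m_ell = max(1, round(2.0 ** (-1.5 * ell) * N))
        fine_grid = np.linspace(0.0, 1.0, 2 ** ell + 1)
        tau = rng.uniform(size=m_ell)
        fine = np.mean([hat_interp(fine_grid, f(fine_grid, t), theta) for t in tau], axis=0)
        if ell == 0:
            correction = fine
        else:
            coarse_grid = np.linspace(0.0, 1.0, 2 ** (ell - 1) + 1)
            coarse = np.mean([hat_interp(coarse_grid, f(coarse_grid, t), theta) for t in tau], axis=0)
            correction = fine - coarse
        estimate += correction
    return estimate

# Sanity check against the closed form: int_0^1 exp(theta*t) dt = (exp(theta) - 1) / theta.
thetas = np.array([0.25, 0.5, 0.75])
print(mlmc_parametric_integral(thetas, N=4000, L=6))
print((np.exp(thetas) - 1.0) / thetas)
```

Sharing the draws $\tau_j$ between the fine and coarse grids within each level is what makes the variance of the correction terms shrink as the grids converge; independent draws across levels keep the level variances additive.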
Generalization
• Let $\Theta \subset \mathbb{R}^{d_1}$ and $G \subset \mathbb{R}^{d_2}$ be bounded open sets with Lipschitz boundary.
• We work with the Sobolev space
$W_p^{r,0}(\Theta \times G) = \{ f \in L_p(\Theta \times G) : \partial_\theta^\alpha f \in L_p,\ |\alpha| \le r \}$,
$\|f\|_{W_p^{r,0}} = \Big( \sum_{|\alpha| \le r} \| \partial_\theta^\alpha f \|_{L_p}^{p} \Big)^{1/p}$.
• We assume the Sobolev embedding condition $r/d_1 > 1/p$.

General Theorem
Theorem 1 (Heinrich). Let $1 < p < \infty$, $\bar p = \min(2, p)$. Then there exist constants $c_1, c_2 > 0$ such that for each integer $N > 1$ there is a choice of parameters $L, (m_\ell)_{\ell=1}^{L}$ such that the cost of computing $u_{\mathrm{MLMC}}$ is bounded by $c_1 N$, and for each $f \in W_p^{r,0}(\Theta \times G)$, …

Issues
• MLMC requires smoothness to work, but can we guarantee such smoothness?
• Moreover, the more dimensions we have, the more smoothness we require.
• Is there a setting that helps alleviate these concerns?

Answer
• The answer to our question came from the derivative-estimation setting in Monte Carlo simulation.
• Derivative estimation is mainly used in finance to estimate the Greeks of financial derivatives.
• Broadie and Glasserman presented a framework under which a pathwise estimator is unbiased.
• This framework is suitable in our case as well.

Simulation MLMC
• Goal
• Framework
• Multilevel Monte Carlo method
• Computational complexity
• Algorithm
• Results

Goal
• Our goal is to estimate the response surface $y(\theta) = E[Y(\theta)]$.
• The aim is to minimize the total number of observations used by the estimator.
• Effort is relative to the amount of precision we require.

Elements We Will Need for MLMC
• Smoothness told us how adjacent points behave.
• Our assumptions on the function will provide the same information.
• The choice of approximation and grid will preserve these properties in the estimator.

The Framework
• First we assume that our simulation output is a Hölder continuous function of a random vector $X$:
$Y(\theta, \omega) = g(X(\theta, \omega))$.
• That is, there exist $K \in \mathbb{R}$ and $p \in (0, 1]$ such that
$|g(x) - g(y)| \le K \|x - y\|^{p}$ for all $x, y \in \mathbb{R}^{d}$.

Framework Continued
• Next we assume that there exists a random variable $B$ with finite second moment such that
$\|X(\theta_1) - X(\theta_2)\| \le B\, \|\theta_1 - \theta_2\|$ for all $\theta_1, \theta_2 \in \Theta$, a.s.
• Furthermore, we assume that $\Theta$ is compact with $|\Theta| = 1$.

Behavior of Adjacent Points
• The bias of estimating $y(\theta + h)$ using $Y(\theta)$ satisfies
$|E[Y(\theta+h) - Y(\theta)]| \le E\,|Y(\theta+h) - Y(\theta)| \le E\big[K \|X(\theta+h) - X(\theta)\|^{p}\big] = O(\|h\|^{p})$.
• It follows immediately that
$\mathrm{Var}[Y(\theta+h) - Y(\theta)] \le E\big[(Y(\theta+h) - Y(\theta))^2\big] = O(\|h\|^{2p})$.

Multilevel Monte Carlo
• Assume we have a sequence of grids $\Delta_\ell$ with an increasing number of points, $O(M^{\ell+1})$.
• The experiment designs are structured so that the maximum distance between a point $\theta \in \Theta$ and the design is $O(M^{-\ell/d})$; we denote it by $\delta_\ell$.
• Let $Y_\ell(\theta, \omega)$ denote an approximation of $Y(\theta)$ using the same $\omega$ at each design point.

Approximating the Response
[figure]

MLMC Decomposition
• Rewrite the expectation of our approximation in the multilevel way:
$E[Y_\ell(\theta, \omega)] = E[Y_0(\theta, \omega)] + \sum_{k=1}^{\ell} \big( E[Y_k(\theta, \omega)] - E[Y_{k-1}(\theta, \omega)] \big)$.
• Define the estimator of $E[Y_\ell(\theta, \omega)]$ using $m$ observations:
$\bar Y_\ell(\theta, m) = \frac{1}{m} \sum_{j=1}^{m} Y_\ell(\theta, \omega_j)$.

MLMC Decomposition Continued
• Next we can write the estimator in the multilevel decomposition:
$\bar Y_\ell(\theta, m) = \bar Y_0(\theta, m) + \sum_{k=1}^{\ell} \big( \bar Y_k(\theta, m) - \bar Y_{k-1}(\theta, m) \big)$.
• Do we really have to use the same $m$ for all levels?

The MLMC Estimator
• We denote the MLMC estimator by
$\hat Y_\ell(\theta; m_0, \dots, m_\ell) = \sum_{k=0}^{\ell} \Delta \bar Y_k(\theta, m_k)$,
where
$\Delta \bar Y_k(\theta, m_k) = \bar Y_k(\theta, m_k, \omega^k) - \bar Y_{k-1}(\theta, m_k, \omega^k)$,
i.e., both levels of the $k$-th correction use the same $m_k$ observations $\omega^k$ (and $\bar Y_{-1} \equiv 0$).

Multilevel Illustration
[figure: multilevel grid refinement]

Multilevel MC Estimators
• Denote
$\mathrm{Box}_w(\theta) = \big\{ x : \forall i = 1, \dots, d,\ \theta_i - \tfrac{w}{2} \le x_i \le \theta_i + \tfrac{w}{2} \big\}$.
• We consider approximations of the form
$Y_\ell(\theta, \omega) = \sum_{i=0}^{n_\ell} K_\ell(\theta - \theta_i^\ell)\, Y(\theta_i^\ell, \omega)$.

Approximation Requirements
• We assume that for each $\epsilon > 0$ there exists a window size $w(\ell, \epsilon) > \delta_\ell$, of order $O(M^{-\ell/d})$, such that the weights $K_\ell$ are nonnegative and vanish outside the window, and for each $\theta \in \Theta$ the window weights nearly sum to one:
$1 - \epsilon \le \sum_{\theta_i^\ell \in \mathrm{Box}_{w(\ell,\epsilon)}(\theta)} K_\ell(\theta - \theta_i^\ell) \le 1 + \epsilon$.
(A code sketch of such a level-$\ell$ approximation follows below.)
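Below is a minimal sketch of the level-$\ell$ approximation $Y_\ell$ and the correction term $\Delta \bar Y_\ell$ in one dimension, using hat-function weights on the level-$\ell$ grid: these weights are nonnegative, supported on a window of width $2M^{-\ell}$, and sum exactly to one, so the requirements above hold with $\epsilon = 0$. The simulation model is an illustrative assumption: $g(x) = |x|$ (Lipschitz, so $p = 1$) and $X(\theta) = \theta + Z$ with $Z$ standard normal, giving $y(\theta) = E|\theta + Z|$; the helper names are ours.

```python
# Sketch of the level-ell approximation Y_ell(theta, omega) =
# sum_i K_ell(theta - theta_i^ell) * Y(theta_i^ell, omega), with hat weights.
import numpy as np

def Y(theta, z):
    # Assumed model: Y(theta, omega) = g(X(theta, omega)), g(x) = |x|, X(theta) = theta + z.
    return np.abs(theta + z)

def Y_level(theta, z, ell, M=2):
    # Y_ell(theta, omega): hat-weighted combination of the outputs at the
    # level-ell design points i / M**ell, all driven by the same omega (= z).
    grid = np.linspace(0.0, 1.0, M ** ell + 1)
    return np.interp(theta, grid, Y(grid, z))

def delta_Y_bar(theta, ell, m_ell, rng, M=2):
    # Delta Y-bar_ell(theta, m_ell): the level-ell correction, averaging
    # Y_ell - Y_{ell-1} over m_ell observations with shared randomness z_j.
    z = rng.standard_normal(m_ell)
    fine = np.mean([Y_level(theta, zj, ell, M) for zj in z], axis=0)
    if ell == 0:
        return fine
    coarse = np.mean([Y_level(theta, zj, ell - 1, M) for zj in z], axis=0)
    return fine - coarse

# Example: a 5-level metamodel on Theta = [0, 1] with illustrative m_ell.
rng = np.random.default_rng(1)
theta = np.linspace(0.0, 1.0, 5)
m = [4000, 1400, 500, 180, 64, 23]   # roughly m0 * M**(-1.5 * ell)
y_hat = sum(delta_Y_bar(theta, ell, m[ell], rng) for ell in range(6))
print(y_hat)  # estimates of y(theta) = E|theta + Z|
```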
Bias and Variance of the Approximation
• Under these assumptions we can show that
– $y(\theta) - E[Y_\ell(\theta)] = O(M^{-\ell p/d})$
– $\mathrm{Var}[\Delta Y_\ell(\theta)] = O(M^{-2\ell p/d})$
• Our measure of error is the mean integrated squared error:
$\mathrm{MISE}(\hat y) = E \int_\Theta \big( \hat y(\theta) - y(\theta) \big)^2 \, d\theta$.
• Next, we use a theorem provided by Cliffe et al. to bound the computational complexity of the MLMC.

Computational Complexity Theorem
Theorem. Let $y(\theta)$ denote a simulation response surface and $Y_\ell(\theta)$ an estimator of it using $m_\ell$ replications at each design point. Suppose there exist $\Delta Y_\ell(\theta)$ and constants $c_1, c_2, c_3, \alpha, \beta, \gamma, M$ with $M > 1$ and $\alpha \ge \tfrac{1}{2}\min(\beta, \gamma)$ such that
1. $|E[y(\theta) - Y_\ell(\theta)]| \le c_1 M^{-\alpha \ell}$
2. $E[\Delta Y_\ell(\theta)] = \begin{cases} E[Y_0(\theta)], & \ell = 0 \\ E[Y_\ell(\theta) - Y_{\ell-1}(\theta)], & \ell > 0 \end{cases}$
3. $\mathrm{Var}[\Delta Y_\ell(\theta)] \le c_2\, M^{-\beta \ell} / m_\ell$
4. the computational cost of $\Delta Y_\ell(\theta)$ is bounded by $c_3\, m_\ell\, M^{\gamma \ell}$.

Theorem Continued
Then for every $\epsilon$ there exist values of $L$ and $m_\ell$ for which the MSE of the MLMC estimator $\sum_{\ell=0}^{L} \Delta Y_\ell(\theta)$ is bounded by $\epsilon^2$, with a total computational cost of
$C = \begin{cases} O(\epsilon^{-2}), & \beta > \gamma \\ O\big(\epsilon^{-2} (\log \epsilon)^2\big), & \beta = \gamma \\ O\big(\epsilon^{-2 - (\gamma - \beta)/\alpha}\big), & \beta < \gamma \end{cases}$

Multilevel Monte Carlo Algorithm
• The theoretical results need translation into a practical setting.
• For simplicity, we consider only the Lipschitz continuous setting.

Simplifying Assumptions
• The constants $c_1, c_2, c_3$ in the theorem are crucial in deciding when to stop; in practice they are unknown.
• If $E[y(\theta) - Y_\ell(\theta)] = c_1 M^{-\alpha \ell}$, we can deduce that
$E[\Delta Y_\ell(\theta)] \approx (M - 1)\, E[y(\theta) - Y_\ell(\theta)]$.

Simplifying Assumptions Continued
• Hence, we can use $\Delta Y_\ell(\theta)$ as a pessimistic estimate of the bias at level $\ell$, and we continue adding levels until the following criterion is met:
$\|\Delta \bar Y_\ell\| \le (M - 1)\, \epsilon / \sqrt{2}$.
• However, due to its inherent variance, we recommend the stopping criterion
$\max\big( \|\Delta \bar Y_\ell\|,\ \tfrac{1}{M} \|\Delta \bar Y_{\ell-1}\| \big) \le (M - 1)\, \epsilon / \sqrt{2}$.

The Algorithm
[figure: algorithm pseudocode; a Python sketch appears after the Questions slide]

Black-Scholes
[figure: numerical results]

Black-Scholes Continued
[figure: numerical results]

Conclusion
• Multilevel Monte Carlo provides an efficient metamodeling scheme.
• We eliminated the need for increased smoothness as the dimension increases.
• We introduced a practical MLMC algorithm for stochastic simulation metamodeling.

Questions?
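Backup: a sketch of the adaptive algorithm with the recommended stopping rule $\max\big(\|\Delta \bar Y_\ell\|, \tfrac{1}{M}\|\Delta \bar Y_{\ell-1}\|\big) \le (M-1)\epsilon/\sqrt{2}$. This is not the (figure-only) algorithm slide itself: the simulation model, the per-level sample sizes $m_\ell$, and the test grid over $\Theta = [0,1]$ are illustrative assumptions, and the helpers are repeated from the earlier sketch so this one is self-contained.

```python
# Backup sketch: adaptive level selection with the stopping rule from the slides.
import numpy as np

def Y(theta, z):
    # Same illustrative model as before: y(theta) = E|theta + Z|.
    return np.abs(theta + z)

def delta_Y_bar(theta, ell, m_ell, rng, M=2):
    # Level-ell correction with shared randomness across the two grids.
    fine_grid = np.linspace(0.0, 1.0, M ** ell + 1)
    z = rng.standard_normal(m_ell)
    fine = np.mean([np.interp(theta, fine_grid, Y(fine_grid, zj)) for zj in z], axis=0)
    if ell == 0:
        return fine
    coarse_grid = np.linspace(0.0, 1.0, M ** (ell - 1) + 1)
    coarse = np.mean([np.interp(theta, coarse_grid, Y(coarse_grid, zj)) for zj in z], axis=0)
    return fine - coarse

def mlmc_adaptive(eps, M=2, m0=4000, max_L=12, seed=2):
    rng = np.random.default_rng(seed)
    thetas = np.linspace(0.0, 1.0, 33)                   # test grid on Theta = [0, 1]
    corrections, ell = [], 0
    while True:
        m_ell = max(1, round(m0 * M ** (-1.5 * ell)))    # illustrative allocation
        corrections.append(delta_Y_bar(thetas, ell, m_ell, rng, M))
        # Pessimistic bias proxies from the last two corrections (sup norm over the grid).
        bias_now = np.max(np.abs(corrections[-1]))
        bias_prev = np.max(np.abs(corrections[-2])) / M if ell > 0 else np.inf
        converged = ell > 0 and max(bias_now, bias_prev) <= (M - 1) * eps / np.sqrt(2)
        if converged or ell == max_L:                    # max_L is a safety cap
            return sum(corrections), thetas
        ell += 1

y_hat, thetas = mlmc_adaptive(eps=0.02)
print(np.c_[thetas, y_hat][:4])  # first few (theta, y_hat(theta)) pairs
```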