Multilevel Monte Carlo
Metamodeling
Imry Rosenbaum
Jeremy Staum
Outline
• What is simulation metamodeling?
• Metamodeling approaches
• Why use function approximation?
• Multilevel Monte Carlo
• MLMC in metamodeling
Simulation Metamodeling
• Simulation
– Given input θ we observe Y(θ).
– Each observation is noisy.
– Effort is measured by the number of observations, m.
– We use the simulation output Y(θ; m) to estimate the response surface μ(θ) = E[Y(θ)].
• Simulation metamodeling
– A fast estimate of μ(θ) for any θ.
– "What does the response surface look like?"
Why Do We Need Metamodeling?
• What-if analysis
– How things change under different scenarios.
– Applicable in financial, business, and military settings.
• For example
– Multi-product asset portfolios.
– How the product mix changes our business profit.
Approaches
• Regression
• Interpolation
• Kriging
– Stochastic Kriging
• Kernel Smoothing
Metamodeling as Function
Approximation
• Metamodeling is essentially function approximation under uncertainty.
• Information-Based Complexity has answers for such settings.
• One of those answers is Multilevel Monte Carlo.
Multilevel Monte Carlo
• Multilevel Monte Carlo has been suggested as a
numerical method for parametric integration.
• Later the notion was extended to SDEs.
• In our work we extend the multilevel notion to
stochastic simulation metamodeling.
Multilevel Monte Carlo
• In 1998 Stefan Heinrich introduced the notion of
multilevel MC.
• The scheme reduces the computational cost of
estimating a family of integrals.
• We use the smoothness of the underlying function to enhance our estimate of the integral.
Example
• Let us consider f ∈ C([0,1]^2); we want to compute

u(θ) = ∫_0^1 f(θ, t) dt

for all θ ∈ Θ = [0,1].
• We fix a grid θ_i = i/n, i = 0, …, n, estimate the respective integrals, and interpolate.
Example Continued
We will use the piecewise linear approximation

η(θ) = Σ_{i=0}^{n} u(θ_i) φ_i(θ),

where the φ_i are the respective hat functions and the u(θ_i) are Monte Carlo estimates, i.e.,

u(θ_i) = (1/M) Σ_{j=1}^{M} f(θ_i, ξ_j),

with ξ_j iid uniform random variables.
Example Continued
• Let us use the root mean square norm as the error metric:

e(η) = ( ∫_0^1 (η(θ) - u(θ))^2 dθ )^{1/2}.

• It can be shown, under our smoothness assumption, that e(η) = O(M^{-1/2}) at a cost of O(M^{3/2}).
Example Continued
• Let us consider a sequence of grids

θ_i^ℓ = i/2^ℓ, i = 0, …, 2^ℓ.

• We could represent our estimator as

η(θ) = Σ_{ℓ=0}^{L} ( η_ℓ(θ) - η_{ℓ-1}(θ) ), (η_{-1} ≡ 0),

where η_ℓ(θ) is the estimate using the ℓth grid.
• We define each of our decision variables in terms of M, so as to keep the comparison fair.
Example Continued
level   square root of variance   cost
0       M^{-1/2}                  M
ℓ       2^{-ℓ} M^{-1/2}           2^ℓ M
L       2^{-L} M^{-1/2}           2^L M

• The variance reaches its maximum at the first level, but the cost reaches its maximum at the last level.
Example Continued
• Let us now use a different number of observations at each level, m_ℓ; the estimator becomes

η_MLMC(θ) = Σ_{ℓ=0}^{L} (1/m_ℓ) Σ_{j=1}^{m_ℓ} [ Σ_{i=0}^{2^ℓ} f(θ_i^ℓ, ξ_j^ℓ) φ_i^ℓ(θ) - Σ_{i=0}^{2^{ℓ-1}} f(θ_i^{ℓ-1}, ξ_j^ℓ) φ_i^{ℓ-1}(θ) ],

with the coarse term absent at ℓ = 0.
• We will use m_ℓ = 2^{-3ℓ/2} M to balance cost against variance.
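A sketch of the multilevel version under the same toy assumptions as before (f(θ, t) = e^{θt}, hypothetical). The essential point is that within each level the fine-grid and coarse-grid interpolants are evaluated on the same draws ξ_j, so the level differences have small variance.

```python
import numpy as np

def mlmc_metamodel(f, L, M, rng):
    """eta_MLMC(theta) = sum_l [eta_l(theta) - eta_{l-1}(theta)], where the
    level-l correction uses m_l = 2^{-3l/2} M draws shared by both grids."""
    levels = []
    for l in range(L + 1):
        m_l = max(1, int(2.0 ** (-1.5 * l) * M))       # m_l = 2^{-3l/2} M
        xi = rng.uniform(size=m_l)                      # draws for this level
        fine = np.linspace(0.0, 1.0, 2 ** l + 1)        # grid theta_i^l = i/2^l
        u_fine = np.array([f(th, xi).mean() for th in fine])
        if l == 0:
            levels.append((fine, u_fine, None, None))
        else:
            coarse = np.linspace(0.0, 1.0, 2 ** (l - 1) + 1)
            u_coarse = np.array([f(th, xi).mean() for th in coarse])  # same xi
            levels.append((fine, u_fine, coarse, u_coarse))

    def eta(theta):
        total = 0.0
        for fine, u_f, coarse, u_c in levels:
            total += np.interp(theta, fine, u_f)
            if coarse is not None:
                total -= np.interp(theta, coarse, u_c)  # telescoping correction
        return total

    return eta

rng = np.random.default_rng(0)
eta = mlmc_metamodel(lambda th, t: np.exp(th * t), L=6, M=10_000, rng=rng)
print(eta(0.5))  # compare with (e^{0.5} - 1) / 0.5 = 1.2974...
```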
Example Continued
level   square root of variance   cost
0       M^{-1/2}                  M
ℓ       2^{-ℓ/4} M^{-1/2}         2^{-ℓ/2} M
L       2^{-L/4} M^{-1/2}         2^{-L/2} M

• It follows that the square root of the variance is O(M^{-1/2}) while the cost is O(M).
• Previously, the same variance came at a cost of O(M^{3/2}).
Generalization
• Let Θ ⊂ ℝ^{d_1} and G ⊂ ℝ^{d_2} be bounded open sets with Lipschitz boundary.
• We assume the Sobolev embedding condition r/d_1 > 1/q.
• W_q^{r,0}(Θ × G) = { f ∈ L_q(Θ × G) : ∂^α f/∂θ^α ∈ L_q, |α| ≤ r }.
• ‖f‖_{W_q^{r,0}} = ( Σ_{|α| ≤ r} ‖∂^α f/∂θ^α‖_{L_q}^q )^{1/q}.
General Thm
Theorem 1 (Heinrich). Let 1 < q < ∞ and p = min(2, q). Then there exist constants c_1, c_2 > 0 such that for each integer M > 1 there is a choice of parameters L, (m_ℓ)_{ℓ=1}^{L} such that the cost of computing η_MLMC is bounded by c_1 M and, for each f ∈ W_q^{r,0}(Θ × G), the L_p error of η_MLMC is bounded by c_2 M^{-1/2}.
Issues
• MLMC requires smoothness to work, but can we guarantee such smoothness?
• Moreover, the more dimensions we have, the more smoothness we require.
• Is there a setting that helps alleviate these concerns?
Answer
• The answer to our question came from the derivative estimation setting in Monte Carlo simulation.
• Derivative estimation is mainly used in finance to estimate the Greeks of financial derivatives.
• Glasserman and Broadie presented a framework under which a pathwise estimator is unbiased.
• This framework is suitable in our case as well.
Simulation MLMC
• Goal
• Framework
• Multilevel Monte Carlo Method
• Computational Complexity
• Algorithm
• Results
Goal
• Our goal is to estimate the response surface μ(θ) = E[Y(θ)].
• The aim is to minimize the total number of observations used by the estimator.
• Effort is measured relative to the precision we require.
Elements We Will Need for the MLMC
• Smoothness provided us with information about how adjacent points behave.
• Our assumptions on the function will provide the same information.
• The choice of approximation and grid will allow the estimator to preserve these properties.
The framework
• First we assume that our simulation output is a Hölder continuous function of a random vector X: Y(θ, ω) = f(X(θ, ω)).
• That is, there exist c ∈ ℝ and ζ ∈ (0,1] such that |f(U) - f(V)| < c ‖U - V‖^ζ for all U, V ∈ ℝ^d.
Framework Continued…
• Next we assume that there exists a random variable κ with a finite second moment such that ‖X(θ_1) - X(θ_2)‖ ≤ κ ‖θ_1 - θ_2‖ for all θ_1, θ_2 ∈ Θ, a.s.
• Furthermore, we assume that |Θ| = 1 and that Θ is compact.
Behavior of Adjacent Points
• The bias of estimating Y(θ + h) using Y(θ) is

|E[Y(θ + h) - Y(θ)]| ≤ E|Y(θ + h) - Y(θ)| ≤ E[c ‖X(θ + h) - X(θ)‖^ζ] = O(‖h‖^ζ).

• It follows immediately that

Var[Y(θ + h) - Y(θ)] ≤ E[(Y(θ + h) - Y(θ))^2] = O(‖h‖^{2ζ}).
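A quick numerical check of these bounds under an assumed toy model (X(θ, ω) = θZ with Z standard normal and f(x) = |x|, so ζ = 1, none of which is from the slides); coupling the two evaluations through the same Z makes the variance of the difference shrink like h².

```python
import numpy as np

# Assumed toy model: X(theta) = theta * Z, f(x) = |x|, so f is Lipschitz
# (zeta = 1) and Var[Y(theta + h) - Y(theta)] should be O(h^2).
rng = np.random.default_rng(0)
Z = rng.standard_normal(1_000_000)
theta = 1.0
for h in (0.1, 0.01, 0.001):
    diff = np.abs((theta + h) * Z) - np.abs(theta * Z)  # common random numbers
    print(h, diff.var())  # variance drops ~100x for each 10x reduction in h
```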
Multilevel Monte Carlo
• Let us assume that we have a sequence of grids Δ_ℓ with an increasing number of points, k^{d(ℓ+1)}.
• The experiment designs are structured such that the maximum distance between a point θ and the experiment design is O(k^{-ℓ}), denoted |Δ_ℓ|.
• Let Y_ℓ(θ, ω) denote an approximation of Y(θ) using the same ω at each design point.
Approximating the Response
MLMC Decomposition
• Let us rewrite the expectation of our approximation in the multilevel way:

E[Y_ℓ(θ, ω)] = E[Y_0(θ, ω)] + Σ_{i=1}^{ℓ} ( E[Y_i(θ, ω)] - E[Y_{i-1}(θ, ω)] ).

• Let us define the estimator of E[Y_ℓ(θ, ω)] using m observations:

μ_ℓ(θ, m, (ω_i)) = (1/m) Σ_{i=1}^{m} Y_ℓ(θ, ω_i).
MLMC Decomposition Continued
• Next we can write the estimator in the multilevel decomposition:

μ_ℓ(θ, m, (ω_i)) = μ_0(θ, m, (ω_i)) + Σ_{j=1}^{ℓ} ( μ_j(θ, m, (ω_i)) - μ_{j-1}(θ, m, (ω_i)) ).

• Do we really have to use the same m for all levels?
The MLMC estimator
• We will denote the MLMC estimator

Z_ℓ(θ, (ω_i^0), …, (ω_i^ℓ)) = Σ_{j=0}^{ℓ} ΔZ_j(θ, (ω_i^j)),

where

ΔZ_j(θ, (ω_i^j)) = μ_j(θ, M_j, (ω_i^j)) - μ_{j-1}(θ, M_j, (ω_i^j)).
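A sketch of one level difference ΔZ_j for a generic stochastic simulator. Representing ω as a seed passed to a user-supplied simulator Y(θ, seed), and using piecewise-linear interpolation over the designs, are assumptions of the sketch rather than the slides' exact construction.

```python
import numpy as np

def delta_Z(Y, design_fine, design_coarse, M_j, theta, rng):
    """One MLMC correction Delta Z_j(theta) = mu_j - mu_{j-1}: the fine and
    coarse metamodels are built from the SAME M_j random streams omega_i^j."""
    total = 0.0
    for _ in range(M_j):
        omega = int(rng.integers(0, 2**31 - 1))  # one stream, reused below
        y_fine = np.array([Y(th, omega) for th in design_fine])
        y_coarse = np.array([Y(th, omega) for th in design_coarse])
        # interpolate each replication on its design, then take the difference
        total += (np.interp(theta, design_fine, y_fine)
                  - np.interp(theta, design_coarse, y_coarse))
    return total / M_j
```

Because both terms see the same ω, the variance of ΔZ_j inherits the decay established for adjacent points above.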
Multilevel Illustration
Multilevel MC Estimators
• Let us denote

Box_r(θ) = { x : ∀ i = 1, …, d, θ_i - r/2 ≤ x_i ≤ θ_i + r/2 }.

• We want to consider approximations of the form

Y_ℓ(θ, ω) = Σ_{i=0}^{k^{dℓ}} g_ℓ(θ - θ_i^ℓ) · Y(θ_i^ℓ, ω).
Approximation Requirements
• We assume that for each ξ > 0 there exists a window size r(ℓ, ξ) > |Δ_ℓ|, of order O(k^{-νℓ}), such that for each s ≥ r(ℓ, ξ) we have

0 ≤ g_ℓ(s) ≤ ξ / k^{ℓ(d+ν)},

and for each θ ∈ Θ we have

1 - ξ/k^{ℓν} ≤ Σ_{θ_i^ℓ ∈ Box_{r(ℓ,ξ)}(θ)} g_ℓ(θ - θ_i^ℓ) ≤ 1 + ξ/k^{ℓν}.
Bias and Variance of the Approximation
• Under these assumptions we can show that
– |μ(θ) - E[μ_ℓ(θ)]| = O(k^{-ℓνζ})
– Var[ΔZ_ℓ(θ)] = O(k^{-2ℓνζ})
• Our measure of error is the Mean Integrated Squared Error:

MISE(μ̂) = E ∫_Θ ( μ̂(θ) - μ(θ) )^2 dθ.

• Next, we can use a theorem provided by Cliffe et al. to bound the computational complexity of the MLMC.
Computational Complexity Theorem
Theorem. Let μ(θ) denote a simulation response surface and μ_ℓ(θ) an estimator of it using M_ℓ replications for each design point. Suppose there exist ΔZ_ℓ(θ), c_1, c_2, c_3, α, β, γ, k such that k > 1, α > (1/2) min(β, γ), and
1. |E[μ(θ) - μ_ℓ(θ)]| ≤ c_1 k^{-αℓ}
2. E[ΔZ_ℓ(θ)] = E[μ_0(θ)] for ℓ = 0, and E[μ_ℓ(θ) - μ_{ℓ-1}(θ)] for ℓ > 0
3. Var[ΔZ_ℓ(θ)] ≤ (c_2 / M_ℓ) k^{-βℓ}
4. The computational cost of ΔZ_ℓ(θ) is bounded by c_3 M_ℓ k^{γℓ}.
Theorem Continued…
Then for every ε there exist values of L and M_ℓ for which the MSE of the MLMC estimator Σ_{ℓ=0}^{L} ΔZ_ℓ(θ) is bounded by ε², with a total computational cost

C = O(ε^{-2})               if β > γ,
C = O(ε^{-2} (log ε)^2)     if β = γ,
C = O(ε^{-2-(γ-β)/α})       if β < γ.
Multilevel Monte Carlo Algorithm
• The theoretical results need to be translated into a practical setting.
• For simplicity we consider only the Lipschitz continuous setting.
Simplifying Assumptions
• The constants c_1, c_2 and c_3 stated in the theorem are crucial in deciding when to stop. However, in practice they will not be known to us.
• If |E[μ(θ) - μ_ℓ(θ)]| = c_1 k^{-αℓ}, we can deduce that |E[ΔZ_ℓ(θ)]| ≈ (k - 1) |E[μ(θ) - μ_ℓ(θ)]|.
Simplifying Assumptions Continued
• Hence, we can use ΔZ_ℓ(θ) as a pessimistic estimate of the bias at level ℓ. Thus, we will continue adding levels until the following criterion is met:

|ΔZ_ℓ(θ)| ≤ (k - 1) ε / 2.

• However, due to its inherent variance, we recommend the following stopping criterion instead:

max( |ΔZ_ℓ(θ)|, (1/k) |ΔZ_{ℓ-1}(θ)| ) ≤ (k - 1) ε / 2.
The algorithm
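The algorithm on this slide is an image and is not reproduced here. Below is a hedged sketch of the adaptive loop implied by the stopping criterion above; the helper estimate_level(ℓ), which returns the correction ΔZ_ℓ at a fixed check point, is a hypothetical name introduced for the sketch.

```python
def mlmc_algorithm(estimate_level, k, eps, max_levels=20):
    """Add levels until max(|dZ_l|, |dZ_{l-1}| / k) <= (k - 1) * eps / 2."""
    corrections = [estimate_level(0)]          # Delta Z_0
    for l in range(1, max_levels):
        corrections.append(estimate_level(l))  # Delta Z_l
        bias_proxy = max(abs(corrections[-1]), abs(corrections[-2]) / k)
        if bias_proxy <= (k - 1) * eps / 2:
            break                              # bias criterion met
    return sum(corrections)                    # Z_L = sum of corrections
```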
Black-Scholes
Black-Scholes continued
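The Black-Scholes results on these two slides are figures and are not reproduced here. As a hedged sketch of the kind of test problem the slides name, one can take θ to be the initial stock price and Y(θ) a discounted call payoff; the specific parameter values below are assumptions for illustration, not taken from the slides.

```python
import numpy as np

# Assumed test problem: mu(theta) = Black-Scholes call price, estimated from
# Y(theta) = e^{-rT} max(S_T - K, 0), with
# S_T = theta * exp((r - sigma^2/2) T + sigma * sqrt(T) * Z), Z ~ N(0,1).
# Note X(theta) = S_T(theta) is Lipschitz in theta with a square-integrable
# (lognormal) kappa, and the payoff is Lipschitz, so the framework holds
# with zeta = 1; mu(theta) is known in closed form for validation.
def Y(theta, z, r=0.05, sigma=0.2, T=1.0, K=1.0):
    ST = theta * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0)

rng = np.random.default_rng(0)
print(Y(1.0, rng.standard_normal(100_000)).mean())  # ~0.1045 (closed form)
```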
Conclusion
• Multilevel Monte Carlo provides an efficient metamodeling scheme.
• We eliminated the need for increased smoothness as the dimension increases.
• We introduced a practical MLMC algorithm for stochastic simulation metamodeling.
Questions?