Advanced Monte Carlo Methods: General Principles of the Monte Carlo Method Prof. Dr. Michael Mascagni E-mail: mascagni@math.ethz.ch, http://www.cs.fsu.edu/~mascagni Seminar für Angewandte Mathematik Eidgenössische Technische Hochschule, CH-8044 Zürich Switzerland and Department of Computer Science and School of Computational Science Florida State University, Tallahassee, Florida, 32306-4120 USA Numerical Integration: The Canonical Monte Carlo Application Numerical integration is a simple problem to explain and thoroughly analyze Deterministic methods Monte Carlo (stochastic methods) Integration is expectation: one can view all Monte Carlo as integration in appropriate setting Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 2 of 61 Numerical Integration (Cont.) Methods for approximating definite integrals Rectangle Rule Trapezoidal Rule Divide the curve into N strips of thickness h=(b-a)/N Sum the area of each trip Approximate to that of a trapezium Simpson’s Rule Calculate piecewise quadratic approximation instead Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 3 of 61 The Monte Carlo Method General Principles Every Monte Carlo computation that leads to quantitative results may be regarded as estimating the value of a multiple integral in the appropriate setting Efficiency Definition Suppose there are two Monte Carlo methods Method 1: n1 units of computing time, σ12 Method 2: n2 units of computing time, σ22 n1σ 12 Methods comparison: n2σ 22 Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 4 of 61 Monte Carlo Integration Consider a simple integral 1 θ = ∫ f ( x)dx 0 Definition of expectation of a function on random variable η 1 E ( f (η )) = ∫ f ( x) p ( x)dx 0 1 If η is uniformly distributed, then E ( f (η )) = ∫ f ( x)dx = θ 0 Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 5 of 61 Crude Monte Carlo Crude Monte Carlo If ξ1, …, ξn are independent random numbers Uniformly distributed then fi =f(ξi ) are random variates with expectation θ 1 f = n n ∑ i =1 fi is an unbiased estimator of θ The variance is 1 1 2 E (( f − θ ) ) = ∫ ( f ( x) − θ ) 2 dx = σ 2 / n n0 The standard error is σ/n1/2 Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 6 of 61 Hit-or-Miss Monte Carlo Suppose 0 ≤ f(x) ≤ 1 when 0 ≤ x ≤ 1 Main idea draw a curve y=f(x) in the unit square 0 ≤ x, y≤ 1 is the proportion of the are of the square beneath the curve 1 or we can write f ( x) = ∫ g ( x, y )dy 0 g ( x, y ) = 0 if f(x) < y = 1 if f(x) ≥ y Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 7 of 61 Analysis of Hit-or-Miss Monte Carlo θ can be estimated as the a double integral 1 1 θ = ∫ ∫ g ( x, y )dxdy 0 0 The estimator of hit-or-miss Monte Carlo n* 1 n g = ∑ g (ξ 2i −1 , ξ 2i ) = n i =1 n Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 8 of 61 Hit-or-Miss Monte Carlo (Cont.) Hit-or-Miss Monte Carlo We take n points at random in the unit square, and count the proportion of them which lie below the curve y=f(x) The points are either in or out of the area below the curve The probability that a point lies under the curve is θ The Hit-or-Miss Monte Carlo is a Bernoulli trial the estimator of Hit-or-Miss Monte Carlo is binomial distributed Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 9 of 61 Binomial Distribution Revisited Binomial Distribution Discrete probability distribution Pp(n|N) of obtaining exactly n successes out of N Bernoulli trials Each Bernoulli trial is true with probability p and false with probability q=1-p Mean: Np; Variance: N(1-p)p = = Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 10 of 61 Comparison of Hit-or-Miss Monte Carlo and Crude Monte Carlo Standard error of Hit-or-Miss Monte Carlo θ (1 − θ ) n Standard error of Crude Monte Carlo 1 ∫ ( f −θ ) 2 dx 0 n Hit-or-Miss Monte Carlo is always worse than Crude Monte Carlo Why? Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 11 of 61 Comparison of Hit-or-Miss Monte Carlo and Crude Monte Carlo (Cont.) Note: all of our integrands are in L2 as they have finite variance Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 12 of 61 Why Hit-or-Miss Monte Carlo is worse? Fact The hit-or-miss to crude sampling is equivalent to replacing g(x, ξ) by its expectation f(x) The y variable in g(x,y) is a random variable Leads to uncertainty Places extra uncertainty in the final results Can be replace by exact value Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 13 of 61 General Principle of Monte Carlo If, at any point of a Monte Carlo calculation, we can replace an estimate by an exact value, we shall replace an estimate by an exact value, we shall reduce the sampling error in the final result Mark Kac: “You use Monte Carlo until you understand the problem” Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 14 of 61 Curse of Dimensionality Curse of Dimensionality If one needs n points to achieve certain accuracy for an 1-D integral, to achieve the same accuracy one needs ns points for sdimensional integral Complexity of high-dimensional integration (tensor product rules Rectangle Rule Trapezoidal Rule Simpson’s Rule Convergence Rate O(n-α/s) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 15 of 61 High-Dimensional Monte Carlo Integration Consider the following integral ρ ρ θ = ∫ ...∫ f ( x )dx 1 1 Definition of expectation of a function on random variable η that is uniformly distributed 1 1 0 0 ρ ρ ρ E ( f (η )) = ∫ ...∫ f ( x )dx = θ Standard error 0 0 Independent of Dimension σ/n1/2 Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 16 of 61 Confidence Interval Estimator for the standard error 1 n 2 s = ( f − f ) ∑ i n − 1 i =1 2 Confidence Intervals 66%: [ f − s, f + s ] 95%: [ f − 2s, f + 2s ] 99%: [ f − 3s, f + 3s] Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 17 of 61 Variance Reduction Methods Variance Reduction Techniques Employs an alternative estimator Unbiased More deterministic Yields a smaller variance Methods Stratified Sampling Importance Sampling Control Variates Antithetic Variates Regression Methods Orthonormal Functions Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 18 of 61 Stratified Sampling Idea Analysis of Stratified Sampling Break the range of integration into several pieces Apply crude Monte Carlo sampling to each piece separately Estimator Variance Conclusion If the stratification is well carried out, the variance of stratified sampling will be smaller than crude Monte Carlo Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 19 of 61 Stratified Sampling First we divide the integration interval into k subintervals: The estimator is then: Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 20 of 61 Stratified Sampling (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 21 of 61 Stratified Sampling (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 22 of 61 Stratified Sampling (Cont.) This variance may be less than that from crude Monte Carlo with good stratification Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 23 of 61 Stratified Sampling (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 24 of 61 Idea Importance Sampling Concentrate the distribution of the sample points in the parts of the interval that are of most importance instead of spreading them out evenly Importance Sampling 1 θ =∫ 0 1 1 f ( x) f ( x) f ( x)dx = ∫ g ( x)dx = ∫ dG ( x) g ( x) g ( x) 0 0 where g and G satisfy x G ( x) = ∫ g ( y ) dy 1 G (1) = 0 ∫ g ( y ) dy = 1 0 G(x) is a distribution function Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 25 of 61 Importance Sampling Variance 1 σ 2 f / g = ∫ ( f ( x ) / g ( x ) − θ ) 2 dG ( x ) 0 How to select a good sampling function? How about g=cf? g must be simple enough for us to know its integral theoretically. Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 26 of 61 Control Variates Control Variates 1 1 0 0 θ = ∫ φ ( x)dx + ∫ [ f ( x) − φ ( x)]dx φ(x) is the control variate with known integral Estimator 1 t-t’+θ’ is the unbiased estimator t= θ’ is the first (known) integral n Variance var(t-t’+θ’)=var(t)+var(t’)-2cov(t,t’) ∑ i 1 f (ξ i ), t ' = ∑ φ (ξ i ) n i if 2cov(t,t’)<var(t’), then the variance is smaller than crude Monte Carlo t and t’ should have strong positive correlation Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 27 of 61 Antithetic Variates Main idea Estimator Select a second estimate that has a strong negative correlation with the original estimator t’’ has the same expectation of t [t+t’’]/2 is an unbiased estimator of θ var([t+t’’]/2)=var(t)/4+var(t’’)/4+cov(t,t’’)/2 Commonly used antithetic variate (t+t’’)/2=f(ξ)/2+ f(1-ξ)/2 If f is a monotone function, f(ξ) and f(1-ξ) are negatively correlated Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 28 of 61 Antithetic Variates (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 29 of 61 Antithetic Variates (Cont.) General definition of “Antithetic Variates”: Any method that introduces a set of estimators that mutually compensate for the others’ variance Theorem states any unbiased combination of variables can be rearranged into a linear combination that achieves the minimal possible variance Such combinations invariably are antithetic Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 30 of 61 Antithetic Variates (Cont.) General definition of “Antithetic Variates”: Any method that introduces a set of estimators that mutually compensate for the others’ variance Theorem states any unbiased combination of variables can be rearranged into a linear combination that achieves the minimal possible variance Some examples Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 31 of 61 Antithetic Variates: Examples Consider examples from stratification (I) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 32 of 61 Antithetic Variates: Examples Consider examples from stratification (II) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 33 of 61 Antithetic Variates: Examples (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 34 of 61 Antithetic Variates: Examples (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 35 of 61 Antithetic Variates: Examples (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 36 of 61 Antithetic Variates: Examples (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 37 of 61 Orthonormal Functions General method of Monte Carlo integration based on orthonormal functions (Ermakov & Zolotukhin) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 38 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 39 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 40 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 41 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 42 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 43 of 61 Orthonormal Functions (Cont.) An simple, 1-D example, n=0 Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 44 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 45 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 46 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 47 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 48 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 49 of 61 Orthonormal Functions (Cont.) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 50 of 61 Orthonormal Functions (Cont.) The formulae (quart) and (quint) exactly treat all quartic and quintic polynomials exactly Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 51 of 61 Regression Variation in Raw Experimental Data Two parts The first part consists of an entirely random variation The second part arises because the observations are influenced by certain concomitant conditions of the experiment We may do little about it We may record these condition Determine how they influence the raw observations Regression Calculate (estimate) the second part Subtract it out from the reckoning Leave only those variations in the observations which are not due to the concomitant conditions Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 52 of 61 Regression Model Model The random observations ηi (i=1, 2, … n) are associated with a set of concomitant numbers xij (j=1,2, .., p) xij describe the experimental condition under which the observations ηi was taken ηi is the sum of a purely random component δi and a linear combination ∑j β j xij of the concomitant numbers βi are called regression coefficient The minimum-variance unbiased linear estimator of βi is b=(X’V-1X)-1X’V-1 η X: nxp matrix xij V: nxn variance covariance matrix of δi Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 53 of 61 Regression Methods Regression Method Suppose we have several unknown estimates θ1, θ2, …, θp A set of estimators t1, t2, …, tn Eti=xi1 θ1+xi2 θ2 + … + xip θp (i=1, 2, …, n) Et=Xθ xij are a set of known constants Minimum-variance unbiased linear estimator of θ={θ1, θ2, …, θp } t*=(X’V-1X)-1X’V-1 t V is unknown Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 54 of 61 Regression Methods (Cont.) Consider an alternative estimator which uses an arbitrary V0 t0*=(X’V0-1X)-1X’V0-1 t Et0*= θ However, t0* is not a minimum-variance estimator If V0 is close to V, then t0* will have a very nearly minimum variance Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 55 of 61 Regression Method in Practice Regression Method in Practice Calculate N independent sets of estimates t1, t2, …, tn Each result is denoted by t1k, t2k, …, tnk (k=1,2,…N) vij can be estimated by N vij 0 = ∑ (tik − ti )(t jk − t j ) /( N − 1) k =1 N ti = ∑ tik / N k =1 Then, the estimator of θ is t t0*=(X’V0-1X)-1X’V0-1 Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 56 of 61 Example of Regression Methods t1 = 1 2 f (ξ ) + 12 f (1 − ξ ) t2 = 1 4 f ( 12 ξ ) + 14 f ( 12 − 12 ξ ) + 14 f ( 12 + 12 ξ ) + 14 f (1 − 12 ξ ) Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 57 of 61 Buffon Needle Problem Revisited If we through a needle of length L onto a square grid with Δx =Δy = 1 then Mantel gives a quadratic estimator for the Buffon Needle Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 58 of 61 Comparison of the Variance Reduction Methods Definitions Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 59 of 61 Comparison of the Variance Reduction Methods (Cont.) Method Used Var. Ratio Labor Ratio Efficiency Hit-or-miss 0.34 1/1 0.34 Stratified, 4 strata 13 1/1.3 10 Importance, g(x)=x 29.9 1/3 10 Control Var., ø(x)=x 60.4 1/2 30 Antithetic Variate 62 1/2 31 985 1/2 490 Antithetic (II)* Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 60 of 61 Comparison of the Variance Reduction Methods (Cont.) Method Var. Ratio Labor Ratio Efficiency Antithetic (II)* 2-way 15600 1/4 3900 Antithetic (II)* 4-way 249000 1/8 31000 Antithetic (II)* 8-way 3980000 1/16 250000 Antithetic (II)* special 2950000 1/6 460000 Orthonormal 240000 720000 1/3 Prof. Dr. Michael Mascagni: Advanced Monte Carlo Methods General Principles of Monte Carlo Slide 61 of 61