David Madigan – notes for "Bayesian Data Analysis"

Simulation

Recall the strong law of large numbers: let $x_1, x_2, \ldots, x_N$ be a sequence of independent random variables having a common distribution, and let $E[x_i] = \mu$. Then, with probability 1,

$$\frac{x_1 + x_2 + \cdots + x_N}{N} \to \mu \quad \text{as } N \to \infty.$$

It follows that one way to estimate $E[x] = \int x f(x)\,dx$ is to simulate $x_1, x_2, \ldots, x_N$ from $f(x)$ and form the arithmetic mean of the $x$'s as the estimate.

In general we can estimate

$$E[g(\mathbf{x})] = \int g(x_1, x_2, \ldots, x_n)\, f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$$

by $\frac{1}{N}\sum_i g(\mathbf{x}^i)$, where the $\mathbf{x}^i = (x_1^i, x_2^i, \ldots, x_n^i)$ are drawn independently from $f(x_1, x_2, \ldots, x_n)$.

Example: Suppose a random variable $X$ has a beta distribution with parameters $a, b$:

$$f(x) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)} \quad \text{if } x \in [0,1],$$

where $B(a,b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$ is called a Beta function.

We want to evaluate

$$P(X < 0.25) = \frac{1}{B(a,b)} \int_0^{0.25} x^{a-1}(1-x)^{b-1}\,dx.$$

This is an incomplete beta function, which has been tabulated.

Alternatively, define

$$J(x) = \begin{cases} 1 & x \in [0, 0.25] \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$P(X < 0.25) = E[J(x)] = \int J(x) f(x)\,dx.$$

So estimate $P(X < 0.25)$ by simulating $x_1, x_2, \ldots, x_N$ from $f(x)$ and letting

$$\hat{P}(X < 0.25) = \frac{1}{N} \sum_i J(x_i).$$

This general procedure is called Monte Carlo Integration.

How do you draw random variables from $f(x)$? Underlying many of the methods we will discuss for simulation will be the requirement to generate random numbers, e.g., a random number between 0 and 1. Random number generation and assessing random number generators is a major industry in itself. We will just assume that we have some method for doing it, e.g., $X_{n+1} = (aX_n + c) \bmod m$; then $X_n / m$ is approximately $U(0,1)$.

Note the examples in the book: generating a random permutation; estimating the number of distinct entries in a large set.

Simulating continuous random variables

The Inverse Transformation Method (method 1)

Proposition: Consider a random variable $X$ with CDF $F_X(x)$. Define a new random variable $Y = F_X(X)$. Then $Y \sim \text{Unif}(0,1)$.

Proof: $F_Y(y) = P(Y \le y) = P(F_X(X) \le y) = P(X \le F_X^{-1}(y))$ (since $F_X$ is monotone) $= F_X(F_X^{-1}(y)) = y$, so $Y \sim \text{Unif}(0,1)$.

Corollary: Let $U \sim \text{Unif}(0,1)$. Define a random variable $X = F^{-1}(U)$, where $F$ is a CDF. Then $F$ is the CDF of $X$.

Proof: $F_X(x) = P(X \le x) = P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x)$.

Thus, if we know how to invert $F_X$, we can simulate $X$ by generating a random $u \sim U(0,1)$ and letting $X = F^{-1}(u)$.

Example: Simulating a Weibull. For a Weibull random variable $X$ we have:

$$F_X(x) = 1 - e^{-(x/\beta)^\alpha}.$$

Then $F^{-1}(u)$ is the value of $x$ such that:

$$1 - e^{-(x/\beta)^\alpha} = u \;\Longrightarrow\; e^{-(x/\beta)^\alpha} = 1 - u \;\Longrightarrow\; \left(\frac{x}{\beta}\right)^{\alpha} = -\log(1-u) \;\Longrightarrow\; x = \beta\,\bigl(-\log(1-u)\bigr)^{1/\alpha}.$$

$X$ generated in this fashion will have a Weibull$(\alpha, \beta)$ distribution.

Note: This method only works when $F_X$ is invertible, which will often not be the case.

The rejection method (method 2)

Suppose there is a density $g(x)$ which is "close" to the density $f$ that we wish to simulate from, but it is much easier to simulate from $g$ than from $f$ (e.g., $f$ might be gamma and $g$ Weibull). Then, provided there exists a $c$ such that $f(x)/g(x) \le c$ for all $x$, we can use $g$ to get simulations from $f$.

Note: $g(x)$ must have support at least as big as $f(x)$.

Here is how it works:

Step 1: Simulate $Y$ having density $g$ and simulate a random number $U$.

Step 2: If $U \le \dfrac{f(Y)}{c\,g(Y)}$, set $X = Y$. Otherwise return to step 1.

Claim: the value $X$ has density function $f$.

Proof:

$$P(X \le x) = P\!\left(Y \le x \,\middle|\, U \le \frac{f(Y)}{c\,g(Y)}\right) = \frac{P\!\left(Y \le x,\; U \le \frac{f(Y)}{c\,g(Y)}\right)}{P\!\left(U \le \frac{f(Y)}{c\,g(Y)}\right)} = \frac{\int_{-\infty}^{x} \frac{f(y)}{c\,g(y)}\, g(y)\,dy}{P\!\left(U \le \frac{f(Y)}{c\,g(Y)}\right)} = \frac{\frac{1}{c}\int_{-\infty}^{x} f(y)\,dy}{P\!\left(U \le \frac{f(Y)}{c\,g(Y)}\right)}.$$

The denominator does not involve $x$. But $\lim_{x \to \infty} P(X \le x) = 1$, so

$$P\!\left(U \le \frac{f(Y)}{c\,g(Y)}\right) = \frac{1}{c},$$

and hence $P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$, as claimed.
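As a quick illustration of the Monte Carlo integration example above, here is a minimal sketch in Python (not part of the original notes; it leans on numpy's built-in beta sampler purely to supply draws from $f$, and the function name is mine):

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_prob_below(a, b, threshold=0.25, n_samples=100_000):
    """Estimate P(X < threshold) for X ~ Beta(a, b) by Monte Carlo:
    simulate x_1, ..., x_N from f and average the indicator J(x)."""
    x = rng.beta(a, b, size=n_samples)   # x_i ~ f
    return np.mean(x < threshold)        # (1/N) * sum_i J(x_i)

# For Beta(2, 3) the exact incomplete-beta value is about 0.2617.
print(mc_prob_below(2, 3))
```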
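A sketch of the congruential generator $X_{n+1} = (aX_n + c) \bmod m$ mentioned above. The notes leave $a$, $c$, $m$ unspecified, so the classic Numerical Recipes constants are assumed here for concreteness:

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: X_{n+1} = (a*X_n + c) mod m.
    Yields X_n / m, which is approximately Unif(0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

gen = lcg(seed=12345)
print([round(next(gen), 4) for _ in range(5)])
```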
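The Weibull inverse-transform derivation above translates directly into code; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def weibull_inverse_transform(alpha, beta, size=1):
    """Simulate Weibull(alpha, beta) via x = beta * (-log(1 - u))**(1/alpha),
    the inverse of F(x) = 1 - exp(-(x/beta)**alpha)."""
    u = rng.uniform(size=size)
    return beta * (-np.log1p(-u)) ** (1.0 / alpha)   # log1p(-u) = log(1 - u)

x = weibull_inverse_transform(alpha=2.0, beta=1.5, size=100_000)
print(x.mean())   # should be near beta * Gamma(1 + 1/alpha) ≈ 1.33
```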
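And a generic sketch of the two-step rejection procedure. The toy target $f(x) = 2x$ on $[0,1]$ with a uniform envelope and $c = 2$ is an illustrative choice, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)

def rejection_sample(f, g_density, g_sampler, c, size=1):
    """Rejection method: repeat {draw Y ~ g, U ~ Unif(0,1); accept Y
    if U <= f(Y) / (c * g(Y))}.  Requires f(x) <= c * g(x) for all x."""
    out = []
    while len(out) < size:
        y = g_sampler()                       # Step 1: Y ~ g, U ~ Unif(0,1)
        if rng.uniform() <= f(y) / (c * g_density(y)):
            out.append(y)                     # Step 2: accept; else retry
    return np.array(out)

samples = rejection_sample(f=lambda x: 2 * x,
                           g_density=lambda x: 1.0,
                           g_sampler=lambda: rng.uniform(),
                           c=2.0, size=10_000)
print(samples.mean())   # E[X] = 2/3 for f(x) = 2x
```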
Note: Since $P\!\left(U \le \frac{f(Y)}{c\,g(Y)}\right) = \frac{1}{c}$, the number of iterations until an acceptance is geometric with mean $c$. Thus it is important to choose $g$ so that $c$ is small.

Note: However, a difficulty with the rejection method is that in many applications $c$ is hard to compute. One often ends up choosing $c$ very conservatively large, thereby incurring very high computational costs.

Sampling Importance Resampling (method 3) (Rubin, 1987)

Again, assume there is a density $g(x)$ which is close to the density $f$ that we want to simulate from. Then, to generate a sample of size $n$ from $f$, proceed as follows:

1. draw $x_1, x_2, \ldots, x_N$ from $g(x)$
2. sample a value $x$ from the set $\{x_1, x_2, \ldots, x_N\}$, where the probability of sampling each $x_i$ is proportional to $w(x_i) = \dfrac{f(x_i)}{g(x_i)}$
3. sample a second value $x$ using the same procedure, but excluding the already sampled value from the set
4. repeatedly sample without replacement $n - 2$ more times.

"Proof": As $N \to \infty$, each of the $n$ $x$'s is drawn with probability

$$\frac{g(x)\,w(x)}{\int g(x)\,w(x)\,dx} = \frac{f(x)}{\int f(x)\,dx} = f(x).$$

Note: SIR is subject to the same problems as rejection sampling, but has the advantage that it converges provided $w(x) < \infty$, without explicitly requiring $\max w(x)$.

Example (beta distribution): Suppose we want to simulate from

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}, \quad x \in [0,1].$$

We choose as our candidate (also called the "envelope" distribution) $g(x) = 1$, $x \in [0,1]$. Then $f/g$ is bounded iff $\alpha, \beta \ge 1$:

$$\frac{d}{dx}\left[x^{\alpha-1}(1-x)^{\beta-1}\right] = (\alpha-1)\,x^{\alpha-2}(1-x)^{\beta-1} - (\beta-1)\,x^{\alpha-1}(1-x)^{\beta-2},$$

so the maximum is at 0, at 1, or at $x = \dfrac{\alpha-1}{\alpha+\beta-2}$. Use either rejection sampling or SIR.

If $0 < \alpha < 1$ and $\beta \ge 1$, then use $g(x) = \alpha x^{\alpha-1}$. We can simulate from $g(x)$ by the inverse transformation method:

$$G(x) = \int_0^x \alpha\, t^{\alpha-1}\,dt = x^{\alpha} = u,$$

so if $u \sim \text{Unif}(0,1)$, then $x = u^{1/\alpha}$ has density $g(x) = \alpha x^{\alpha-1}$.

Methods for simulating Normal Random Variables

1. Sum of 12 uniforms (approximate algorithm)

$$X = \sum_{i=1}^{12} u_i - 6, \qquad u_i \sim \text{Unif}(0,1),$$

$$E[X] = \sum_{i=1}^{12} E[u_i] - 6 = 12 \cdot \tfrac{1}{2} - 6 = 0, \qquad V(X) = \sum_{i=1}^{12} V(u_i) = 12 \cdot \tfrac{1}{12} = 1,$$

and $X$ is approximately normal by the Central Limit Theorem. This is a fairly crude approximation but may be adequate for some applications.

2. Rejection Sampling

First note that if $f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ (i.e., $Z \sim N(0,1)$), then letting $X = |Z|$:

$$F_X(x) = P(X \le x) = P(-x \le Z \le x) = F_Z(x) - F_Z(-x) = 2F_Z(x) - 1,$$

so

$$f_X(x) = 2 f_Z(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad 0 < x < \infty.$$

Try the candidate distribution $g_X(x) = e^{-x}$, $0 < x < \infty$ (i.e., exp(1)). Then

$$\frac{f(x)}{g(x)} = \frac{2}{\sqrt{2\pi}}\, e^{x - x^2/2} = \sqrt{\frac{2}{\pi}}\, e^{x - x^2/2}.$$

Differentiating, the maximum occurs at $x = 1$, so

$$c = \sqrt{\frac{2}{\pi}}\, e^{1/2} = \sqrt{\frac{2e}{\pi}} \approx 1.3155,$$

and

$$\frac{f(x)}{c\,g(x)} = e^{x - x^2/2 - 1/2} = e^{-(x-1)^2/2}.$$

So the procedure is:

1. generate a realization $x$ from $g_X(x) = e^{-x}$ (inverse transform)
2. generate $u \sim \text{Unif}(0,1)$
3. if $u \le e^{-(x-1)^2/2}$ then accept $x$; else go to 1
4. generate $u \sim \text{Unif}(0,1)$; if $u \le \tfrac{1}{2}$ set $Z = x$, otherwise set $Z = -x$.

Note: This is quite efficient, requiring an average of 1.32 iterations per acceptance.

3. Box-Muller

Let $X, Y \sim N(0,1)$, $X, Y$ independent. Consider a transformation to polar coordinates:

$$R^2 = X^2 + Y^2, \qquad \theta = \tan^{-1}(Y/X).$$

To get the joint distribution of $R^2 = d$ and $\theta$, we need the Jacobian of the transformation:

$$J = \begin{vmatrix} \frac{\partial d}{\partial x} & \frac{\partial d}{\partial y} \\[2pt] \frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y} \end{vmatrix} = \begin{vmatrix} 2x & 2y \\[2pt] \frac{-y}{x^2+y^2} & \frac{x}{x^2+y^2} \end{vmatrix} = 2.$$

Since $f_{X,Y}(x,y) = \frac{1}{2\pi}\, e^{-(x^2+y^2)/2}$,

$$f_{R^2,\theta}(d, \theta) = \frac{1}{2}\, e^{-d/2} \cdot \frac{1}{2\pi}, \qquad 0 < d < \infty,\; 0 \le \theta \le 2\pi.$$

So $R^2$ and $\theta$ are independent. Furthermore, $R^2 \sim \exp(\tfrac{1}{2})$ (exponential with mean 2) and $\theta \sim \text{Unif}(0, 2\pi)$.

So proceed by generating $R^2$ and $\theta$, and then setting $X = R\cos\theta$, $Y = R\sin\theta$, resulting in two independent standard normal random variables.
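A minimal sketch of SIR steps 1–4 above, using the beta target with a uniform envelope from the example. Function names are mine; note that the normalizing constant $B(\alpha,\beta)$ cancels in the weights, so an unnormalized density suffices:

```python
import numpy as np

rng = np.random.default_rng(2)

def sir_sample(f, g_density, g_sampler, n, N=100_000):
    """Sampling Importance Resampling (Rubin, 1987): draw x_1..x_N from g,
    then resample n of them without replacement with probabilities
    proportional to w(x_i) = f(x_i) / g(x_i)."""
    x = g_sampler(N)
    w = f(x) / g_density(x)
    idx = rng.choice(N, size=n, replace=False, p=w / w.sum())
    return x[idx]

# Beta(2, 3) target with a Unif(0,1) envelope.
f = lambda x: x * (1 - x) ** 2               # unnormalized Beta(2,3) density
draws = sir_sample(f, lambda x: np.ones_like(x),
                   lambda N: rng.uniform(size=N), n=1000)
print(draws.mean())   # E[X] = 2 / (2 + 3) = 0.4
```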
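The sum-of-12-uniforms approximation in code, a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

def approx_normal(size=1):
    """Crude N(0,1) via the CLT: sum of 12 Unif(0,1) draws minus 6
    (mean 12 * 1/2 - 6 = 0, variance 12 * 1/12 = 1)."""
    return rng.uniform(size=(size, 12)).sum(axis=1) - 6

z = approx_normal(100_000)
print(z.mean(), z.var())   # close to 0 and 1
```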
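A sketch of the four-step half-normal rejection procedure above (exp(1) envelope, then a random sign):

```python
import numpy as np

rng = np.random.default_rng(4)

def normal_by_rejection(size=1):
    """N(0,1) via rejection: simulate |Z| with an exp(1) envelope
    (c = sqrt(2e/pi) ≈ 1.3155), then attach a random sign."""
    out = []
    while len(out) < size:
        x = -np.log(rng.uniform())                       # x ~ exp(1), inverse transform
        if rng.uniform() <= np.exp(-(x - 1) ** 2 / 2):   # f/(c*g) = exp(-(x-1)^2/2)
            out.append(x if rng.uniform() <= 0.5 else -x)
    return np.array(out)

z = normal_by_rejection(100_000)
print(z.mean(), z.std())   # near 0 and 1
```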
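And Box-Muller in code, again a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(5)

def box_muller(size=1):
    """Box-Muller: R^2 ~ exponential with mean 2, theta ~ Unif(0, 2*pi);
    X = R*cos(theta), Y = R*sin(theta) are independent N(0,1)."""
    u1, u2 = rng.uniform(size=size), rng.uniform(size=size)
    r = np.sqrt(-2 * np.log(u1))    # R^2 = -2 log(U1) has mean 2
    theta = 2 * np.pi * u2
    return r * np.cos(theta), r * np.sin(theta)

x, y = box_muller(100_000)
print(x.std(), y.std(), np.corrcoef(x, y)[0, 1])   # ≈ 1, 1, 0
```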
Simulating a beta random variable

$$X \sim B(n, m): \qquad f_X(x) = \frac{\Gamma(n+m)}{\Gamma(n)\,\Gamma(m)}\, x^{n-1}(1-x)^{m-1}, \quad 0 \le x \le 1.$$

Consider the $n$-th smallest, $U_{(n)}$, of $n + m - 1$ random $(0,1)$ numbers. Then

$$f_{U_{(n)}}(x)\,dx = P\bigl(x \le U_{(n)} \le x + dx\bigr) = \underbrace{\frac{(n+m-1)!}{(n-1)!\,(m-1)!}}_{\text{ways of splitting around } x}\; x^{n-1}(1-x)^{m-1}\,dx,$$

counting the arrangements with $n-1$ of the uniforms below $x$, one at $x$, and $m-1$ above $x$. Since $\frac{(n+m-1)!}{(n-1)!\,(m-1)!} = \frac{\Gamma(n+m)}{\Gamma(n)\,\Gamma(m)}$, $U_{(n)}$ has a beta distribution with parameters $n$ and $m$.

A problem with this approach is that finding the $n$-th smallest of $n + m - 1$ random numbers is expensive if $n$, $m$ are large.
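A sketch of the order-statistic construction, which also makes the cost concern concrete (a sort over $n + m - 1$ uniforms per draw):

```python
import numpy as np

rng = np.random.default_rng(6)

def beta_by_order_stat(n, m, size=1):
    """Beta(n, m) for integer n, m: the n-th smallest of n + m - 1
    Unif(0,1) draws.  Expensive when n, m are large (needs a sort)."""
    u = rng.uniform(size=(size, n + m - 1))
    return np.sort(u, axis=1)[:, n - 1]   # n-th smallest (0-indexed: n - 1)

x = beta_by_order_stat(n=2, m=3, size=100_000)
print(x.mean())   # E[X] = n / (n + m) = 0.4
```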