Statistical Computing and Simulation NOV21 2005 1. Using Rejection Methods described in the class, to generate random numbers from Gamma(3/2,1). Check if the numbers generated are really independent and distributed as Gamma(3/2,1). Since f x 2 f x 3 g x 0 .5 x e 0.5 13 x x e x , choose g x is achieved at x 2 23 x with x 0 . Also, the maximum of 3e 3 , and so we have 2 1 x f x 1 3 3 (1.5) 0.5 x 0.5 e 2 3 . 1.25732 and cg x 2e 0.795345 Thus, following the idea of Rejection method, we can use the following algorithm to c generate random numbers from Gamma(3/2,1) 2 3 Step 1:Generate U and Y, where U ~ (0,1) and Y ~ Exp( )。 Step 2:If U (1.5) 0.5 Y 0.5 e then let X = Y; Otherwise go back to Step 1. 1 2 Y 3 I conducted 10,000 runs of replication in S-Plus. There are 7,970 runs satisfy the required condition and the following is the histogram these Gamma(3/2,1) random numbers and 2000 0 1000 Frequency 3000 their theoretical density plot. The p-value of chi-square goodness-of-fit is 0.4047 indicating that there is no enough evidence to reject the hypothesis that these numbers are from Gamma(3/2,1). 0 2 4 6 8 10 xv 2. Test the generation methods of normal distribution introduced in class, i.e., the Box-Muller, Polar, and Ratio-of-uniform algorithms. Based on your simulation results, choose the “best” generator. Here are the algorithms of the Box-Muller, Polar, and Ratio-of-uniform. # (1) Box-Muller theta=2*pi*runif(10000) radius=sqrt(-2*log(runif(10000))) x1=radius*cos(theta) y1=radius*sin(theta) # (2) Polar u1=runif(10000,-1,1) u2=runif(10000,-1,1) w=u1^2+u2^2 ind=(w<1)*c(1:10000) ind=ind[ind>0] tt=cbind(u1,u2,w) tt=tt[c(ind),] x2=sqrt(-2*log(tt[,3])/tt[,3])*tt[,1] y2=sqrt(-2*log(tt[,3])/tt[,3])*tt[,2] # (3) Ratio-of-uniform u3=runif(10000) u4=runif(10000) v=sqrt(2/exp(1))*(2*u4-1) x3=v/u3 z=x^2/4 z1=( z<=0.259/u3+0.35 & z<=-log(u3))*c(1:10000) z1=z1[z1>0] x3=x3[c(z1)] In order to know which is the “best” algorithm, I conducted the above simulation 1,000 times and the following are the p-values of 2 goodness-of-fit test for Box-Muller, Polar, Ratio of uniform, and the rnorm function in S-Plus: Box-Muller Polar Ratio of Uniform rnorm(S-Plus) # < 0.01 9 10 13 11 # < 0.05 48 53 66 51 # < 0.10 95 107 114 100 2 uniform 0.4136 0.5996 0.4355 0.4869 First, we can see that the information of p-values of all methods is very close to the expected values, indicating that all methods are equally accepted. In addition, the histograms of the p-values are uniformly distributed and the tests of uniformity via 2 goodness-of-fit show no special patterns. Overall, the function rnorm seems to be the best, judging from the histogram and numbers of p-values smaller than 0.01, 0.05, and 0.10. Ratio-of-uniform is a little bit away from the expected values. Polar Ratio-of-uniform rnorm(S-Plus) 100 80 60 0.0 0.2 0.4 0.6 0.8 1.0 20 40 0.0 0.2 0.4 p-value 0.6 0.8 p-value 1.0 0 20 0 0 0 20 20 40 40 40 60 60 60 80 80 80 100 100 100 120 Box-Muller 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 p-value 0.4 0.6 0.8 1.0 p-value 3. Describe an algorithm for generating from logistic distribution exp[ ( x ) / ] , x , 0, 0. [1 exp[ ( x ) / ]2 For the choice of “envelop,” we need a distribution defined on (, ) . f x ( x) Let 0, 1 ,i.e. f x ex 1 e x 2 . Let g x e x and so f x 1 g x 1 e x 2 1. Let c 1, then the random numbers from logistic distribution can be generated via the following 3 steps: Step 1. Generate U1, U2 ~U(0,1), Y ~Exp(1) f y 1 Step 2. If U1 then let X Y * (1)^ [10 *U 2 ] cg y 1 e y 2 else go to 1. Step 3. Let X X According to these steps, we generate 10,000 random numbers and 5,002 numbers satisfy the requirement. The following graph is the histogram of these logistic distribution random numbers and their theoretical density plot. We can see that they look very similar. Also, the p-value of chi-square goodness-of-fit is 0.9634 also indicates that we have no enough evidence to reject the hypothesis that these numbers are from standard logistic distribution. 1200 1000 800 600 Frequency 400 200 0 -10 -5 0 5 10 xv 4. (a) Let X and X1 be i.i.d. r.v.’s and let Y X (1 ) X1, where 0 | | 1. Prove that the correlation coefficient between X and Y is X ,Y algorithm for generating a pair of r.v.'s (X,Y) for which 2 (1 ) 2 . Describe an X ,Y . X and X 1 be i.i.d. r.v.`s Var(X) = Var(X 1 ), Cov(X,X 1 )=0 X ,Y = = X ,Y = cov( X , X (1 ) X 1 ) var( X ) var( X (1 ) X 1 ) cov( X , Y ) var( X ) var( Y ) var( X ) var( X ) 2 var( X ) (1 ) 2 var( X 1 ) 2 (1 ) 2 2 (1 ) 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 0 2 2 4 1 2 2 . Note that since 1,1 and 2 (1 ) = 2 2 4 1 2 2 2 implies that [-0.44721,1], we know that when 0.44721,0 , while = 2 2 4 1 2 2 when 0,1 . The following steps can generate random numbers (X,Y) with correlation coefficient . Step 1. Generate X and X 1 be i.i.d. random numbers. Step 2. If -0.44721 0 then Else If 0 1 then 2 2 4 1 2 2 2 2 4 1 2 2 Step 3. Let Y X (1 ) X1 Note that we can also use “Cholesky Decomposition” to generate correlated variables. For x x2 X example, suppose we want to generate ~ N , y x y Y x y . Let’s y2 consider a special case that x y 0 , x y 1 , and 0.5. Using the 1 0.5 X 1 we know that command “chol(A)” where A 0.5 1 Y 0.5 0 X1 , 3 / 2 X 2 where X1, X 2 are independent variables from N(0,1). It should be noted that the choice X a b X 1 , we only need to find of Cholesky decomposition is not unique. Via Y c d X 2 solutions for a 2 b 2 c 2 d 2 1 and ac bd 1/ 2. possible solutions include 1 a b 2 c d 2 6 4 1 2 and, in general, 2 6 4 a b t (chol ( A)). c d (b) Using the idea in (a), describe an algorithm for generating a random vector ( X , Y , Z )T ~ N ( ~, ), where ~ ( X , Y , Z )T and X2 XY X Y XZ X Z XY X Y Y2 YZ Y Z XZ X Z YZ Y Z . Z2 We can use “chol(A)” to create (X,Y,Z), similar to (a). The details are omitted. Here we describe an algorithm, similar to (a), to generate tri-variate random variables. First, 0 1 0 0 X1 X 2 ~ N 0 , 0 1 0 . 0 0 0 1 X 3 assume Cov( X , Y ) XY . Cov( X , Z ) XZ Then Let adjust Z X X1 2 Y XY X 1 1 XY X 2 Z X 1 2 X XZ 1 XZ 3 2 Z XZ X 1 1 XZ aX 2 1 a 2 X 3 . such and so . Let solution of Cov(Y , Z ) YZ that a and is the 2 2 Cov(Y , Z ) XY XZ a (1 XY )(1 XZ ). 5. Describe an algorithm for generating from multinomial distribution n! f ( x1 , x2 ,, xk ) p1x1 p2x 2 pkx k , x1! x2! xk ! where k i 1 pi 1 and k x n . (Note: Searching on the web, see if there are better i 1 i ways for generating random numbers from multinomial distribution.) j 1 Since X1 ~ B(n, p1 ) and X j | X 1 x1 , , X j 1 x j 1 ~ B(n xi , i 1 pj k p i j ) for 2 j k 1 , i i.e., X k n i 1 xi , we can use the algorithm of generating binomial variables to k 1 generate multinomial variables. This would require a number of (k-1) binomial-variable generations. Note that the computing time is minimized if we let p1 p2 pk . (In other words, we can use this idea to generate multinomial variables in S-Plus.) Also, you can use a program in the file “multinomial random numbers(R).txt” to generate: rmultinom <- function(n, size, prob) { K <- length(prob) # #{classes} matrix(tabulate(sample(K, n * size, repl = TRUE, prob) + K * 0:(n - 1), nbins = n * K), nrow = n, ncol = K, byrow = TRUE) } 6. Evaluate the number of comparisons needed to generate a random number from Poisson distribution, i.e., Poisson(=12). Methods considered include Inversion, Mode (or mean), and Indexed search. You don’t need to write a program to count the numbers. Instead, you could use the random numbers in R/S-Plus to evaluate. Based on 10,000 simulation runs, I found that the numbers of comparisons needed for the inversion, mean, and indexed searched are 13.0527 (i.e., +1), 3.7495, and 3.2276. The indexed search is based on m = 5, i.e., the cutting points being 0, 9, 11, 13, and 15.