Statistical Computing and Simulation

advertisement
Statistical Computing and Simulation
NOV21 2005
1. Using Rejection Methods described in the class, to generate random numbers from
Gamma(3/2,1). Check if the numbers generated are really independent and distributed
as Gamma(3/2,1).
 Since f x   2

f x 
3

g x 

0 .5
x e
0.5  13 x
x e
x
, choose g  x  
is achieved at x 
2  23 x
with x  0 . Also, the maximum of
3e
3
, and so we have
2
1 x
f x 

1
3 3
 (1.5) 0.5 x 0.5 e 2 3 .

 1.25732 and
cg x 
2e 0.795345
Thus, following the idea of Rejection method, we can use the following algorithm to
c
generate random numbers from Gamma(3/2,1)
2
3
Step 1:Generate U and Y, where U ~ (0,1) and Y ~ Exp( )。
Step 2:If U  (1.5) 0.5 Y 0.5 e  then let X = Y; Otherwise go back to Step 1.
1
2
Y
3
I conducted 10,000 runs of replication in S-Plus. There are 7,970 runs satisfy the required
condition and the following is the histogram these Gamma(3/2,1) random numbers and
2000
0
1000
Frequency
3000
their theoretical density plot.
The p-value of chi-square goodness-of-fit is 0.4047 indicating that there is no enough
evidence to reject the hypothesis that these numbers are from Gamma(3/2,1).
0
2
4
6
8
10
xv
2. Test the generation methods of normal distribution introduced in class, i.e., the
Box-Muller, Polar, and Ratio-of-uniform algorithms. Based on your simulation results,
choose the “best” generator.
Here are the algorithms of the Box-Muller, Polar, and Ratio-of-uniform.
# (1) Box-Muller
theta=2*pi*runif(10000)
radius=sqrt(-2*log(runif(10000)))
x1=radius*cos(theta)
y1=radius*sin(theta)
# (2) Polar
u1=runif(10000,-1,1)
u2=runif(10000,-1,1)
w=u1^2+u2^2
ind=(w<1)*c(1:10000)
ind=ind[ind>0]
tt=cbind(u1,u2,w)
tt=tt[c(ind),]
x2=sqrt(-2*log(tt[,3])/tt[,3])*tt[,1]
y2=sqrt(-2*log(tt[,3])/tt[,3])*tt[,2]
# (3) Ratio-of-uniform
u3=runif(10000)
u4=runif(10000)
v=sqrt(2/exp(1))*(2*u4-1)
x3=v/u3
z=x^2/4
z1=( z<=0.259/u3+0.35 & z<=-log(u3))*c(1:10000)
z1=z1[z1>0]
x3=x3[c(z1)]
In order to know which is the “best” algorithm, I conducted the above simulation
1,000 times and the following are the p-values of  2 goodness-of-fit test for Box-Muller,
Polar, Ratio of uniform, and the rnorm function in S-Plus:
Box-Muller
Polar
Ratio of Uniform rnorm(S-Plus)
# < 0.01
9
10
13
11
# < 0.05
48
53
66
51
# < 0.10
95
107
114
100
 2 uniform
0.4136
0.5996
0.4355
0.4869
First, we can see that the information of p-values of all methods is very close to the
expected values, indicating that all methods are equally accepted. In addition, the
histograms of the p-values are uniformly distributed and the tests of uniformity via
 2 goodness-of-fit show no special patterns. Overall, the function rnorm seems to be the
best, judging from the histogram and numbers of p-values smaller than 0.01, 0.05, and 0.10.
Ratio-of-uniform is a little bit away from the expected values.
Polar
Ratio-of-uniform
rnorm(S-Plus)
100
80
60
0.0
0.2
0.4
0.6
0.8
1.0
20
40
0.0
0.2
0.4
p-value
0.6
0.8
p-value
1.0
0
20
0
0
0
20
20
40
40
40
60
60
60
80
80
80
100
100
100
120
Box-Muller
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
p-value
0.4
0.6
0.8
1.0
p-value
3. Describe an algorithm for generating from logistic distribution
exp[ ( x   ) /  ]
,    x  ,   0,   0.
 [1  exp[ ( x   ) /  ]2
 For the choice of “envelop,” we need a distribution defined on (, ) .
f x ( x) 
Let   0,   1 ,i.e. f x  
ex
1  e 
x 2
. Let g x   e  x and so
f x 
1

g x  1  e x


2
 1.
Let c  1, then the random numbers from logistic distribution can be generated via the
following 3 steps:
Step 1. Generate U1, U2 ~U(0,1), Y ~Exp(1)
f y
1
Step 2. If U1 
then let X  Y * (1)^ [10 *U 2 ]

cg  y  1  e y 2
else go to 1.
Step 3. Let X  X  


According to these steps, we generate 10,000 random numbers and 5,002 numbers
satisfy the requirement. The following graph is the histogram of these logistic
distribution random numbers and their theoretical density plot. We can see that they
look very similar. Also, the p-value of chi-square goodness-of-fit is 0.9634 also
indicates that we have no enough evidence to reject the hypothesis that these numbers
are from standard logistic distribution.
1200
1000
800
600
Frequency
400
200
0
-10
-5
0
5
10
xv
4. (a) Let X and X1 be i.i.d. r.v.’s and let Y  X  (1   ) X1, where 0 |  | 1. Prove
that the correlation coefficient between X and Y is  X ,Y 
algorithm for generating a pair of r.v.'s
(X,Y) for which

 2  (1   ) 2
. Describe an
 X ,Y   .
  X and X 1 be i.i.d. r.v.`s  Var(X) = Var(X 1 ), Cov(X,X 1 )=0
  X ,Y 

=
=
 X ,Y =
cov( X , X  (1   ) X 1 )
var( X ) var( X  (1   ) X 1 )
cov( X , Y )

var( X ) var( Y )
 var( X )
var( X )  2 var( X )  (1   ) 2 var( X 1 )

 2  (1   ) 2

 2  (1   ) 2


  2   2  2   2  2   2  2


 1  2 2  2  2 2   2  0
 
2  2 4
1  2 2
.
Note that since    1,1 and  

 2 (1 )
 =
 2  2  4
1  2 2
2
implies that  [-0.44721,1], we know that
when    0.44721,0 , while  =
 2  2  4
1  2 2
when
  0,1 . The following steps can generate random numbers (X,Y) with correlation
coefficient .
Step 1. Generate X and X 1 be i.i.d. random numbers.
Step 2. If -0.44721    0 then   
Else If
0    1 then   
 2  2  4
1  2 2
 2  2  4
1  2 2
Step 3. Let Y  X  (1   ) X1
Note that we can also use “Cholesky Decomposition” to generate correlated variables. For
   x    x2
X
example, suppose we want to generate   ~ N   , 
   y   x y
Y 


 x y  
 . Let’s
 y2  
consider a special case that  x   y  0 ,  x   y  1 , and   0.5. Using the
 1 0.5 
X  1
 we know that    
command “chol(A)” where A  
 0.5 1 
 Y   0.5
0  X1 
  ,
3 / 2  X 2 
where X1, X 2 are independent variables from N(0,1). It should be noted that the choice
 X   a b  X 1 
  , we only need to find
of Cholesky decomposition is not unique. Via    
 Y   c d  X 2 
solutions for a 2  b 2  c 2  d 2  1 and ac  bd  1/ 2. possible solutions include
1


a b 
2

 
c d  2  6

4

1


2  and, in general,
2 6

4

a b

  t (chol ( A)).
c d
(b) Using the idea in (a), describe an algorithm for generating a random vector
( X , Y , Z )T ~ N ( ~, ), where ~  ( X , Y ,  Z )T and
  X2

    XY X  Y
  
 XZ X Z
 XY X  Y
 Y2
YZ Y  Z
 XZ X  Z 

YZ Y  Z  .
 Z2 
 We can use “chol(A)” to create (X,Y,Z), similar to (a). The details are omitted. Here we
describe an algorithm, similar to (a), to generate tri-variate random variables. First,
  0  1 0 0
 X1 
  
 

 X 2  ~ N   0 ,  0 1 0  .
  0  0 0 1
X 
 3

  
assume
Cov( X , Y )   XY
.

Cov( X , Z )   XZ
Then
Let
adjust
Z


 X  X1

2
Y   XY X 1  1   XY X 2
Z   X  1   2 X
XZ 1
XZ
3

2
Z   XZ X 1  1   XZ
aX 2  1  a 2 X 3 .
such
and
so
.
Let
solution
of
Cov(Y , Z )  YZ
that
a
and
is
the
2
2
Cov(Y , Z )   XY  XZ  a (1   XY
)(1   XZ
).
5. Describe an algorithm for generating from multinomial distribution
n!
f ( x1 , x2 ,, xk ) 
p1x1 p2x 2  pkx k ,
x1! x2! xk !
where

k
i 1
pi  1 and

k
x  n . (Note: Searching on the web, see if there are better
i 1 i
ways for generating random numbers from multinomial distribution.)
j 1
Since X1 ~ B(n, p1 ) and X j | X 1  x1 , , X j 1  x j 1 ~ B(n   xi ,
i 1
pj
k
p
i j
) for 2  j  k  1 ,
i
i.e., X k  n  i 1 xi , we can use the algorithm of generating binomial variables to
k 1
generate multinomial variables. This would require a number of (k-1) binomial-variable
generations. Note that the computing time is minimized if we let p1  p2    pk . (In
other words, we can use this idea to generate multinomial variables in S-Plus.) Also, you
can use a program in the file “multinomial random numbers(R).txt” to generate:
rmultinom <- function(n, size, prob) {
K <- length(prob) # #{classes}
matrix(tabulate(sample(K, n * size, repl = TRUE, prob) + K * 0:(n - 1),
nbins = n * K),
nrow = n, ncol = K, byrow = TRUE)
}
6. Evaluate the number of comparisons needed to generate a random number from Poisson
distribution, i.e., Poisson(=12). Methods considered include Inversion, Mode (or mean),
and Indexed search. You don’t need to write a program to count the numbers. Instead,
you could use the random numbers in R/S-Plus to evaluate.
 Based on 10,000 simulation runs, I found that the numbers of comparisons needed for
the inversion, mean, and indexed searched are 13.0527 (i.e., +1), 3.7495, and 3.2276.
The indexed search is based on m = 5, i.e., the cutting points being 0, 9, 11, 13, and 15.
Download