Lectures 17 and 18

Generating Random Variates
Introduction
• There are various procedures to produce observations (variates)
  from some desired input distribution (exponential, gamma, etc.)
• The algorithm usually depends on the desired distribution
• But all algorithms have the same general form
• Note the critical importance of a good random-number generator
• There may be several different algorithms for the desired distribution
Desired Properties
• Exact: X should have exactly (not approximately) the desired distribution
  – Example of an approximate algorithm (a code sketch follows this list):
    • Treat Z = U1 + U2 + ... + U12 – 6 as N(0, 1)
    • Mean and variance are correct; it relies on the CLT for approximate normality
    • Range is clearly incorrect: Z can never fall outside [–6, +6]
• Efficient:
  – Low storage
  – Fast
  – Efficient regardless of parameter values (robust)
• Simple:
  – Easy to understand and implement (often a tradeoff against efficiency)
  – Requires only U(0, 1) input
  – If possible, one U one X (for synchronization in variance reduction)
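A minimal Python sketch of the sum-of-twelve-uniforms approximation described under "Exact" above (the function name is illustrative, not from the slides):

    import random

    def approx_normal():
        # Approximate N(0, 1): sum of 12 U(0,1) variates minus 6.
        # Mean 0 and variance 1 are exact, but Z is confined to [-6, +6],
        # so the tails are wrong -- which is why this is only approximate.
        return sum(random.random() for _ in range(12)) - 6.0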
Generating Random Variates
• There are five general approaches to generating a univariate
  random variable from a distribution:
  – Inverse transform method
  – Composition method
  – Convolution method
  – Acceptance-rejection method
  – Special properties
Inverse Transform Method
• The simplest (in principle) and “best” method in some ways
Continuous Case
• Suppose X is continuous with cumulative distribution function (CDF)
  F(x) = P(X ≤ x) for all real numbers x, where F is strictly increasing over all x
Algorithm:
1. Generate U ~ U(0, 1) using a random number generator
2. Find X such that F(X) = U and return this value of X
• Step 2 involves solving the equation F(X) = U for X; the solution is written
  X = F–1(U), so we must invert the CDF F
• Inverting F might be easy (exponential) or difficult (normal), in which case
  numerical methods might be necessary
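As a sketch of the easy case, here is the inverse transform for the exponential distribution with mean β, whose CDF F(x) = 1 – e^(–x/β) inverts in closed form (Python, function name illustrative):

    import math
    import random

    def expon_variate(beta):
        # F(x) = 1 - exp(-x/beta), so F(X) = U gives X = -beta*ln(1 - U);
        # since 1 - U ~ U(0,1) as well, we may use U directly.
        u = 1.0 - random.random()  # in (0, 1], avoids log(0)
        return -beta * math.log(u)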
Inverse Transform Method
Mathematical proof: Assume that F is strictly increasing for all x. For a fixed value x0,
P(X ≤ x0) = P(F–1(U) ≤ x0)
          = P(F(F–1(U)) ≤ F(x0))
          = P(U ≤ F(x0))
          = F(x0)
Graphical proof:
X1 ≤ x0 if and only if U1 ≤ F(x0), so P(X1 ≤ x0) = P(U1 ≤ F(x0)) = F(x0)
Examples
• Weibull (α, β) distribution with parameters α > 0 and β > 0
Density function is
  f(x) = αβ^(–α) x^(α–1) e^(–(x/β)^α)   if x > 0
  f(x) = 0                              otherwise
CDF is
  F(x) = ∫ f(t) dt from 0 to x = 1 – e^(–(x/β)^α)   if x > 0
  F(x) = 0                                          otherwise
Solving U = F(X) for X, we get
U = 1 – e^(–(X/β)^α)  ⟹  X = β[–ln(1 – U)]^(1/α)
• Since 1 – U ~ U(0, 1) as well, we can replace 1 – U by U to get
X = β[–ln U]^(1/α)
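A minimal Python sketch of this Weibull generator, directly coding the closed-form inverse above (function name illustrative):

    import math
    import random

    def weibull_variate(alpha, beta):
        # X = beta * (-ln U)^(1/alpha), with U in place of 1 - U
        u = 1.0 - random.random()  # in (0, 1], avoids log(0)
        return beta * (-math.log(u)) ** (1.0 / alpha)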
Intuition Behind the Inverse Transform Method
• Weibull (α = 1.5, β = 6)
  [figure: U values on the vertical axis mapped through the Weibull CDF to X values on the horizontal axis]
The Algorithm in Action
  [figure: the algorithm applied to successive U's]
Inverse Transform Method
Discrete case
• Suppose X is discrete with cumulative distribution function F(x) = P(X ≤ x)
  for all real numbers x and probability mass function p(xi) = P(X = xi),
  where x1, x2, ... are the possible values X can take
Algorithm:
1. Generate random number U ~ U(0,1)
2. Find the smallest positive integer i such that U ≤ F(xi)
3. Return X = xi
• Unlike the continuous case, the discrete inverse transform method
  can always be used for any discrete distribution
• Graphical proof: from the picture of the step-function CDF, P(X = xi) = p(xi)
  in every case
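A minimal Python sketch of the discrete algorithm, doing a linear search over the cumulative probabilities (the example pmf in the comment is illustrative):

    import random

    def discrete_variate(values, probs):
        # Return the smallest x_i such that U <= F(x_i)
        u = random.random()
        cum = 0.0
        for x, p in zip(values, probs):
            cum += p          # cum equals F(x_i) after this addition
            if u <= cum:
                return x
        return values[-1]     # guard against floating-point round-off

    # e.g., discrete_variate([1, 2, 3], [0.2, 0.5, 0.3])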
Example
• Discrete uniform distribution on 1, 2, ..., 100
• xi = i, and p(xi) = p(i) = P(X = i) = 0.01 for i = 1, 2, ..., 100
Example
• “Literal” inverse transform search
1. Generate U ~ U(0,1)
2. If U ≤ 0.01 return X = 1 and stop; else go on
3. If U ≤ 0.02 return X = 2 and stop; else go on
4. If U ≤ 0.03 return X = 3 and stop; else go on
...
100. If U ≤ 0.99 return X = 99 and stop; else go on
101. Return X = 100
• Equivalently (on a U-for-U basis)
1. Generate U ~ U(0,1)
2. Return X = ⌊100U⌋ + 1
Generalized Inverse Transform Method
X = min{x: F(x) ≥ U}
• This is valid for any CDF F(x)
– Continuous, possibly with flat spots (i.e., not strictly increasing)
– Discrete
– Mixed continuous-discrete
Problems with the inverse transform method
• Inverting the CDF may be difficult (numerical methods)
• It may not be the fastest or simplest approach for a given distribution
Other Implementations of
Inverse Transform Approach
• Sometimes the inverse function is not available analytically but can
  be computed numerically
• In this case, we can still apply the inverse transform approach by using
  the numerical inverse
• Consider the normal distribution: the CDF cannot be written in closed
  form, so the inverse function does not exist in closed form either
• However, most software enables numerical calculation of the inverse function
• Example: NORMINV(p, m, s) in Excel returns FX–1(p), where X is
  Normal(m, s) and 0 < p < 1
• Therefore, we can generate realizations from X by NORMINV(RAND(), m, s),
  which is equivalent to FX–1(U) where U is uniform (0,1)
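The same idea in Python, using SciPy's numerical inverse normal CDF scipy.stats.norm.ppf in the role that NORMINV plays in Excel (a minimal sketch):

    import random
    from scipy.stats import norm

    def normal_variate(m, s):
        # ppf is the "percent-point function", i.e., the inverse CDF F^-1
        u = random.random()
        while u == 0.0:       # avoid ppf(0) = -infinity
            u = random.random()
        return norm.ppf(u, loc=m, scale=s)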
Acceptance-Rejection Method
• This is used when the inverse transform is not directly applicable or is
  inefficient (as for the gamma and beta distributions)
• It has continuous and discrete versions; we'll just do the continuous case,
  the discrete case being similar
• The goal is to generate X with some density function f
• Specify a function t(x) that majorizes f(x), i.e., t(x) ≥ f(x) for all x
Acceptance-Rejection Method
• Then t(x) ≥ 0 for all x, but ∫ t(x) dx ≥ ∫ f(x) dx = 1 (integrals over the
  whole real line), so t(x) is not a density
• Set c = ∫ t(x) dx ≥ 1
• Define a new density r(x) = t(x)/c for all x
• Algorithm:
1. Generate Y having density r
2. Generate U ~ U(0,1), independent of Y in step 1
3. If U ≤ f(Y)/t(Y), return X = Y and stop; else, go back to step 1 and
   try again (repeat 1-2-3 until acceptance finally occurs in step 3)
• Since t majorizes f, f(Y)/t(Y) ≤ 1, so "U ≤ f(Y)/t(Y)" may or may not occur
• We must be able to generate Y with density r; hopefully this is easy with
  our choice of t
• On each pass, P(acceptance) = 1/c, so we want small c = area under t(x),
  to "fit" t down on top of f closely (we want t, and thus r, to resemble f closely)
• Tradeoff between ease of generation from r and closeness of fit to f
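A minimal Python sketch of this loop; f, t, and sample_r are supplied by the caller (names illustrative):

    import random

    def accept_reject(f, t, sample_r):
        # f: target density; t: majorizing function (t(x) >= f(x) for all x);
        # sample_r(): draws Y from the density r(x) = t(x)/c.
        # Each pass accepts with probability 1/c, where c = area under t.
        while True:
            y = sample_r()            # step 1: Y ~ r
            u = random.random()       # step 2: U ~ U(0,1), independent of Y
            if u <= f(y) / t(y):      # step 3: accept with prob f(Y)/t(Y)
                return y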
Acceptance-Rejection Method
Proof: The key point is that we get an X only conditional on acceptance in step 3:
P(X ≤ x) = P(Y ≤ x | acceptance) = P(acceptance, Y ≤ x) / P(acceptance)
Conditional on Y = y, the chance of acceptance is
P(acceptance | Y = y) = P(U ≤ f(y)/t(y)) = f(y)/t(y)
Unconditionally,
P(acceptance) = ∫ [f(y)/t(y)] r(y) dy = ∫ [f(y)/t(y)] [t(y)/c] dy = 1/c
Next,
P(acceptance, Y ≤ x) = ∫ P(acceptance, Y ≤ x | Y = y) r(y) dy   (over all y)
                     = ∫ P(acceptance | Y = y) r(y) dy           (over y ≤ x)
                     = (1/c) ∫ [f(y)/t(y)] t(y) dy               (over y ≤ x)
                     = F(x)/c
Putting these together,
P(X ≤ x) = P(acceptance, Y ≤ x) / P(acceptance) = (F(x)/c) / (1/c) = F(x)
Example of Acceptance-Rejection Method
• Beta(4, 3) distribution; the density is f(x) = 60x³(1 – x)² for 0 ≤ x ≤ 1
• The maximum occurs at x = 0.6, where f(0.6) = 2.0736 (exactly), so let
  t(x) = 2.0736 for 0 ≤ x ≤ 1
• Thus, c = 2.0736, and r is the U(0,1) density function
Algorithm:
1. Generate Y ~ U(0,1)
2. Generate U ~ U(0,1) independent of Y
3. If U ≤ 60Y³(1 – Y)²/2.0736, return X = Y;
   else go back to step 1 and try again
• P(acceptance) in step 3 is 1/2.0736 ≈ 0.48
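This algorithm as a minimal Python sketch (function name illustrative):

    import random

    def beta_4_3_variate():
        # Beta(4,3) by acceptance-rejection with the constant majorant
        # t(x) = 2.0736 on [0,1], so r is just U(0,1) and c = 2.0736.
        while True:
            y = random.random()       # Y ~ U(0,1)
            u = random.random()       # U ~ U(0,1), independent of Y
            if u <= 60 * y**3 * (1 - y)**2 / 2.0736:
                return y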
Intuition
• A different way to look at it: accept Y if U ≤ f(Y)/t(Y), so plot the pairs
  (Y, U·t(Y)) and accept the Y's for which the pair falls under the f curve
  [figure: scatter of (Y, U·t(Y)) pairs under t, with those below f accepted]
A Closer Fit
• Higher acceptance probability on a given pass
• Harder to generate Y with a density shaped like t; we can perhaps use
  the composition method
Example
• fX(x) = (3/4)(1 – x²) if –1 < x < 1, and fX(x) = 0 otherwise
• Majorizing function? (one possible choice is sketched below)
  [figure: fX plotted on (–1, 1), with maximum 3/4 at x = 0]
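The slide leaves the choice open; one simple candidate (an assumption here, not stated on the slide) is the constant majorant t(x) = 3/4 on (–1, 1), which gives c = 3/2, r = U(–1, 1), and acceptance probability 2/3 per pass. A minimal Python sketch:

    import random

    def parabolic_variate():
        # f(x) = 0.75*(1 - x^2) on (-1, 1); with t(x) = 0.75 the
        # acceptance test U <= f(Y)/t(Y) simplifies to U <= 1 - Y^2.
        while True:
            y = random.uniform(-1.0, 1.0)   # Y ~ r = U(-1, 1)
            u = random.random()
            if u <= 1 - y * y:
                return y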
Composition Method
• Suppose that the inverse transform method is difficult or too slow
• Suppose we can find other CDFs F1, F2, ... (finite or infinite list) and
weights p1, p2, ... (pj ≥ 0 and p1 + p2 + ... = 1) such that for all x
F(x) = p1F1(x) + p2F2(x) + ...
• Equivalently, we can decompose the density or mass function f(x)
  into a convex combination of other density or mass functions
  f(x) = p1f1(x) + p2f2(x) + ...
• Algorithm:
1. Generate a positive random integer J such that P(J = j) = pj
2. Return X with CDF FJ (given J = j, X is generated independently of J)
• The trick is to find Fj's from which generation is easy and fast
• Sometimes we can use the geometry of the distribution to suggest a
  decomposition
Example of Composition Method
Symmetric triangular distribution on [–1, +1]
Density:
  f(x) = x + 1    if –1 ≤ x ≤ 0
  f(x) = –x + 1   if 0 < x ≤ 1
  f(x) = 0        otherwise
CDF:
  F(x) = 0                  if x < –1
  F(x) = 0.5x² + x + 0.5    if –1 ≤ x < 0
  F(x) = –0.5x² + x + 0.5   if 0 ≤ x < 1
  F(x) = 1                  if x ≥ 1
• Inverse transform method: set U = F(X)
  U = 0.5X² + X + 0.5    if U < 1/2
  U = –0.5X² + X + 0.5   if U ≥ 1/2
  and solve for X:
  X = √(2U) – 1          if U < 1/2
  X = 1 – √(2(1 – U))    if U ≥ 1/2
Example of Composition Method
• Composition of the density
  f(x) = (x + 1)·I[–1,0](x) + (–x + 1)·I[0,1](x)
       = 0.5·2(x + 1)·I[–1,0](x) + 0.5·2(–x + 1)·I[0,1](x)
       = p1·f1(x) + p2·f2(x)
  with p1 = p2 = 0.5, f1(x) = 2(x + 1) on [–1, 0], and f2(x) = 2(–x + 1) on [0, 1]
Example of Composition Method
• Composition algorithm
1. Generate U1, U2 ~ U(0,1) independently
2. If U1 < 1/2, return X = √U2 – 1; otherwise return X = 1 – √(1 – U2)
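The composition algorithm as a minimal Python sketch (function name illustrative):

    import math
    import random

    def triangular_composition():
        # Pick f1 or f2 with probability 1/2 each, then invert that CDF:
        # F1(x) = (x + 1)^2 on [-1, 0], F2(x) = 1 - (1 - x)^2 on [0, 1]
        u1, u2 = random.random(), random.random()
        if u1 < 0.5:
            return math.sqrt(u2) - 1        # X ~ f1
        return 1 - math.sqrt(1 - u2)        # X ~ f2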
• Comparison of algorithms (expected numbers of operations per variate):

  Method            | U's | Compares | Adds | Multiplies | Sq. Roots
  Inverse transform |  1  |    1     | 1.5  |     1      |    1
  Composition       |  2  |    1     | 1.5  |     0      |    1

• So composition needs one more U but one less multiplication; it is faster
  if the random-number generator is fast
Convolution
• Suppose X has the same distribution as Y1 + Y2 + ... + Ym, where
the Yj’s are IID and m is fixed and finite
• Then the distribution of X = Y1 + Y2 + ... + Ym is the m-fold convolution
  of the distribution of the Yj's
• Contrast with the composition method
– Composition method expresses the distribution function (or
density or mass) as a (weighted) sum of other distribution
functions (or densities or masses)
– Convolution method expresses the random variable itself as
the sum of other random variables
• Algorithm (obvious)
1. Generate Y1, Y2, ..., Ym independently from their distribution
2. Return X = Y1 + Y2 + ... + Ym
Examples of Convolution Method
• Symmetric triangular distribution on [–1, +1] (again)
Density:
  f(x) = x + 1    if –1 ≤ x ≤ 0
  f(x) = –x + 1   if 0 < x ≤ 1
  f(x) = 0        otherwise
• By simple conditional probability, if U1, U2 ~ IID U(0,1), then U1 + U2 ~
  symmetric triangular distribution on [0, 2], so just shift it left by 1:
  X = U1 + U2 – 1 = (U1 – 0.5) + (U2 – 0.5), i.e., Y1 = U1 – 0.5 and Y2 = U2 – 0.5
• 2 U's, 2 adds; no compares, multiplies, or square roots, so it clearly
  beats both the inverse transform and composition methods
• X ~ m-Erlang with mean b > 0
  – Then X = Y1 + Y2 + ... + Ym, where the Yj's are IID exponential with
    mean b/m
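A minimal Python sketch of the m-Erlang convolution, reusing the exponential inverse transform from earlier (function name illustrative):

    import math
    import random

    def erlang_variate(m, b):
        # Sum of m IID exponentials, each with mean b/m
        total = 0.0
        for _ in range(m):
            u = 1.0 - random.random()        # in (0, 1], avoids log(0)
            total += -(b / m) * math.log(u)
        return total

Equivalently, the sum of logs can be computed as a single log of the product of the U's, trading m – 1 logarithms for m – 1 multiplications.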
Special Properties
• These are simply "tricks" that rely completely on a given distribution's form
• They often involve combining several "component" variates algebraically
  (as in convolution)
• We must be able to figure out the distributions of functions of random variables
• There is no coherent general form, only examples