Random Numbers and
Simulation
Generating truly random numbers is not possible
• Programs have been developed to generate pseudo-random numbers
• Values are generated from deterministic algorithms
© Fall 2011 John Grego and the University of South Carolina
1
Random Numbers
Pseudo-random deviates can pass any statistical test for randomness
They appear to be independent and identically distributed
Random number generators for common distributions are available in R
Special techniques (STAT 740) may be needed as well
2
Monte Carlo Simulation
Some common uses of simulation
• Modeling stochastic behavior
• Calculating definite integrals
• Approximating the sampling distribution of a statistics (e.g., maximum of a random sample)
3
Modeling Stochastic Behavior
Buffon’s needle
Random Walk
Observe X
1
, X p=P(X
S
1
,S i
=1)=P(X
2 i
, …, where
2
,…, where
=-1)=.5 and study
S i i
X j 1 j
4
Modeling Stochastic Behavior
This is also called
X i
Gambler’s ruin ; each represents a $1 bet with a return of
$2 for a win and $0 for a loss.
5
Gambler’s Ruin
The properties of a of an unfair game ( fair p≠.5
) game ( p=.5
) are a lot more interesting than the properties
Some properties of this process are easy to anticipate ( E(S) )
6
Gambler’s Ruin
Some properties are difficult to anticipate, and can be aided by simulation.
• Expected number of returns to 0
• Expected length of a winning streak
• Probability of going broke given an initial bank
7
Calculating Definite Integrals
In statistics, we often have to calculate difficult definite integrals (posterior distributions, expected values)
I
a b h ( x ) dx
(here, x could be multidimensional)
8
Calculating Definite Integrals
Example 1
I
1
0
1
4
1 x 2 dx
Example 2
I
2
0
1
0
1
(4 x
1
2 2 x
2
2 ) dx
2 dx
1
9
Hit-or-Miss Monte Carlo
Example 1 h ( x )
4
1 x 2
0
1
4
1 x 2 dx 4(arctan(1) arctan(0)) 4 /4
Determine c such that c≥h(x) entire region of interest (here, across c=4 )
10
Hit-or-Miss Monte Carlo
Generate
X i
’s from from n random uniform (X
U[a,b] (here, U[0,1] ) and Y
U[0,c] (here, U[0,4] ) i
,Y i
) pairs, i
’s
Count the number of times (call this that Y i is less than h(X i
) m )
Then I
1
≈c(b-a)m/n
• I.e., (height)(width)(proportion under curve)
11
Classical Monte Carlo
Integration
I
a b h ( x ) dx
Take n random uniform values, U over [a,b] and estimate I using
1
,…,U n
I b a n n
i 1
This method seems straightforward, but is actually more efficient than Hit-or-Miss
12
Expected Value of a Function of a Random Variable
Suppose X is a random variable with density f . Find E[h(x)] for some function h , e.g.,
2
E
E
sin
13
Expected Value of a Function of a Random Variable
h x dx
For n the distribution of
X random values X
1
, X
2
, …, X n from
X (i.e., with density f ),
E
h
1 n i n
1 h
i
14
Examples
Example 3: If a
X is a random variable with
N(10,1) distribution, find E(X 2 )
Example 4: If a
Y is a random variable with
Beta(5,1) distribution, E(-lnY)
There are more advanced methods of integration using simulation (Importance
Sampling)
15
Integration
integrate() performs numerical integration for functions of a variable ( single not using simulation techniques) adapt() in the adapt package performs multivariate numerical integration
16
Approximating the Sampling
Distribution of a Statistic
To perform inference (CI’s, hypothesis tests) based on sampling statistics, we need to know the sampling distribution of the statistics, at least up to an approximation
Example: X
1
, X
2
, …, X n
~ iid N( m , s 2 ).
T
X m s n
has a t ( n 1) distribution
17
Approximating the Sampling
Distribution of a Statistic
What if the data’s distribution is not known?
• Large sample: Central Limit Theorem
• Small sample: Normal theory or nonparametric procedures based on permutation distributions
18
Approximating the Sampling
Distribution of a Statistic
If the population distribution is known, we can approximate the sampling distribution with simulation.
• Repeatedly ( samples of size distribution m times) generate random n from the population
• Calculate a statistic (say, S ) each time
• The empirical (observed) distribution of Svalues approximates the true distribution of S
19
Example
X
1
, X
2
, X
3
, X
4
~Expon(1)
What is the sampling distribution of:
X (the mean) max( X ) min( X )
(the midrange)
2
20