Lecture 15 and 16.pptx

advertisement
Random Numbers
•
•
All stochastic simulations need to “generate” IID U(0,1) “random
numbers”
Other random variates coming from other distribution can be
generated using U(0,1) random numbers
1 if 0  x  1
f ( x)  
0 otherwise
Uniform Random Numbers
Uniform Random Numbers
Early Methods
• Physical
– Dice
– Cards
– Lotteries
• Mechanical
– Spinning disks
• Electrical
Hamamaker’s Die
Having no access to existing tables of
random variables during world war II,
the late well-known statistics professor
H.C. Hamamaker at Philips Research
Laboratories in Eindhoven, invented this
die and the dice-throwing technique to
produce random sampling numbers. The
technique has been tested in various
ways on series of up to 40,000 throws
without showing evidence of bias.
– Electronic Random Number Indicator Equipment (ERNIE, used by
British Post Office to pick winners in a lottery (1959))
– RAND Corporation Tables (A million random digits (1955))
• Other
– Pick digits randomly from phone directories
– Decimals in expansion of π to 100,000 places
Algorithmic and Sequential
Computer Methods
•
•
•
•
Sequential means that the next “random” number is determined by one or
several of its predecessors according to a fixed mathematical formula
The midsquare method: (von Neumann and Metropolis, 1945)
– Start with Z0 = 4-digit positive integer
– Z1 = middle 4 digits of Z02 (append 0s if necessary to left of Z02 to get
exactly 8 digits); U1 = Z1, with decimal point at left
– Z2 = middle 4 digits of Z12; U2 = Z2, with decimal point at left
– Z3 = middle 4 digits of Z22; U3 = Z3, with decimal point at left
– So on ...
Problems the with midsquare method
– Not really “random” since the entire sequence is determined by Z0
– If a Zi ever reappears, the entire sequence will be recycled (This will
occur, since the only choices for 4-digit positive integers are 0000,
0001, 0002, ..., 9999)
“Design” generators so Ui’s “appear” to be IID U(0,1) and cycle length is
long
The Midsquare Method
Can We Generate “Truly” Random
Numbers?
•
•
“True” randomness
– Only possible with physical experiment having U(0,1) output
– Counting gamma rays from space
– Problems
• Not reproducible
• Impractical for computers
Practical approach: produce stream of numbers that appear to be IID U(0,1)
– Use theoretical properties as far as possible
– Perform empirical tests
Criteria for Random Number Generators
1. Appear to be distributed uniformly on [0, 1] and independent
2. Fast, low memory requirement
3. Should be able to reproduce a particular stream of random numbers
1. This makes debugging easier
2. One can use identical random numbers to simulate alternative
system configurations for better comparison
4. Should have provision in the generator for a large number of separate
(nonoverlapping) streams of random numbers (usually such streams are
just carefully chosen subsequences of the larger overall sequence)
•
Beware: There are many RNGs in use (and in software) that have
extremely poor statistical properties
Linear Congruential Generators
•
Specify four parameters (all nonnegative integers):
Z0 = seed (or starting value)
m = modulus (or divisor)
a = multiplier
c = increment
•
Then, Z1, Z2, Z3, ... are recursively generated by
Zi = (aZi-1 + c) (mod m)
•
•
Zi is the remainder of dividing aZi-1 + c by m
Thus, 0 ≤ Zi ≤ m – 1, so the random number is defined as
Ui = Zi/m
•
•
Note that 0 ≤ Ui < 1
c = 0 (Multiplicative LCG), c > 0 (Mixed LCG)
Example of a “Toy” LCG
•
•
•
m = 16, a = 5, c = 3, Z0 = 7
Formula
Zi = 5Zi-1 + 3 (mod 16)
Cycle length (or period) here is
16 = m (the longest possible)
Questions:
•
•
Can the period be “predicted” in
advance?
Can a full period be guaranteed?
i
Zi
Ui
0
1
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
7
6
1
8
11
10
5
12
15
14
9
0
3
2
13
4
7
6
1
8
---0.375
0.063
0.500
0.688
0.625
0.313
0.750
0.938
0.875
0.563
0.000
0.188
0.125
0.813
0.250
0.438
0.375
0.063
0.500
Objections to LCGs
•
Not really “random”. Indeed, an explicit formula for Zi is
 i
c(a i  1) 
Z i  a Z 0 
(mod m)
a 1 

•
•
•
Cycles when a previous Zi reappears (there are only m choices, so
cycle length is ≤ m)
Ui’s can only take the discrete values 0/m, 1/m, 2/m, ..., (m-1)/m ,
but they’re supposed to be continuous
For this reason, pick m ≥ 109 or more in practice
Full-Period Theorem
•
(Hull and Dobell, 1966) The LCG Zi = (aZi-1 + c) (mod m) has full
period (m) if and only if all three of the following conditions hold:
1. Both c and m are relatively prime (i.e., the only positive integer
that divides both c and m is 1)
2. If q is any prime number that divides m, then q also divides a – 1
3. If 4 divides m, then 4 also divides a – 1
•
•
•
Choice of initial seed Z0 is irrelevant
Checking these conditions for a “real” generator may be difficult
Condition 1 says that if c = 0, then full period is impossible (since m
divides both m and c = 0); but m – 1 is possible
Theorem is about period only, it says nothing about uniformity of a
subcycle, independence, or other statistical properties
•
Desirable Properties of LCGs
•
•
•
•
•
•
Fast
Low storage requirement
Reproducible (remember Z0)
Restart in middle (remember last Zi, use as Z0 next time)
Multiple streams (save separated seeds)
May have good statistical properties (depends on choice of
parameters)
Implementation/Portability Issues
•
•
•
•
LCGs (and other generators) generally deal with very large integers
Usually, m ≥ 2.1 billion ≈ 2.1 × 109 ≈ 231
Take care in coding, especially in high level languages (FORTRAN, C)
To avoid need to store and operate on large integers: use sophisticated
abstract-algebra tricks to “break up” the arithmetic on large integers, then
reassemble
Other Kinds of Generators
More General Congruences
Zi = g(Zi-1, Zi-2, ...) (mod m), Ui = Zi/m
– LCG: g(Zi-1) = a Zi-1 + c
– Multiplicative recursive generator: g(Zi-1, Zi-2, ..., Zi-q) = a1Zi-1 +
... + aq Zi-q
– Quadratic congruential generator: g(Zi-1) = a’Zi-12 + aZi-1 + c
– Fibonacci (bad): g(Zi-1, Zi-2) = Zi-1 + Zi-2
Composite Generators
•
Shuffling
– Fill a vector of length 128 (say) from generator 1
– Use generator 2 to pick one of the 128 in the vector
– Fill the hole with the next value from generator 1, use generator 2 to
pick one of the 128 in the vector, etc.
– Shuffling a bad generator improves it, but shuffling a good generator
doesn’t gain much
•
Differencing LCGs
– Get Z1i and Z2i from LCGs with different moduli
– Let Zi = (Z1i – Z2i ) (mod m); Ui = Zi / m
– They have very long periods (like 1018) and very good statistical
properties
•
Wichmann/Hill
– Use three LCGs to get U1i , U2i , and U3i sequences
– Let Ui = fractional part of U1i + U2i + U3i
– They have long period and good statistical properties (equivalent to a
LCG!)
Combined Multiple Recursive Generators
• Recall the single MRG:
Zi = ( a1Zi-1 + a2Zi-2 + ...+ aqZi-q ) (mod mi), Ui = Zi /mi
• Run J MRGs simultaneously to get {Z1,i},{Z2,i}, ..., {ZJ,i}
• Let m1 be the modulus used in the first of these J MRGs
• For constants δ1, δ2, ..., δJ, define
Yi = (δ1Z1,i+ δ2Z2,i+...+ δJZJ,i) (mod m1), Ui = Yi / m1
• Must choose constants carefully, but extremely long periods and
extremely good statistical behavior can be achieved
• Specific example given in the text for J = 2, q = 3, δ1 = 1, δ2 = -1, with
small, fast, portable C code in Appendix 7B (10,000 streams spaced
106 apart)
• Period is approximately 3.1 x 1057
• Excellent statistical properties through 32 dimensions
Testing Random Number Generators
•
•
•
Since RNGs are completely deterministic, we need to test them to
see if they appear to be random and IID uniform on [0, 1]
General advice: be wary of “canned” RNGs in software that is not
specifically simulation software, especially if they are not thoroughly
documented; perhaps even test them before using
Two types of testing procedures
1. Empirical statistical tests
– Chi-square
– Kolmogorov-Smirnov
– Serial test
– Runs test
– Correlation test
2. Theoretical procedures
– Full period values
– Lattice structure
Empirical Statistical Tests
•
•
Use trial generator to generate some U’s and apply statistical tests
This gives information only on that part of a generator’s cycle
examined (local)
Chi-square, K-S tests for U(0,1) (all parameters known)

2
k 1
k
 f j  ej 
j 1
ej

2
k

 f j   n / k 
j 1
n / k 
2
2
k k
   f j   n / k 
n j 1
Serial test: generalized chi-square test to higher dimensions
• Two dimensions, subdivide unit square into subcells

•
2
k 2 1

k
 fi, j  ei, j 
i , j 1
ei , j

2
k2

n

k
i , j 1
fi , j  (n / k 2 )
Provides indirect test of independence

2
Empirical Statistical Tests
Correlation test: perform correlation analysis using the correlation
coefficient of lag j to test H0: independence
E[U i ]  0.5, Var[U i ]  1/12
Cov(U i , U i  j )  E[U iU i  j ]  E[U i ]E[U i  j ]  E[U iU i  j ]  0.25
j 
Cov(U i , U i  j )
Var[U i ]Var[U i  j ]
 12 E[U iU i  j ]  3
 n  1
12 h
ˆ j 
U
U

3,
h

 1kj 1( k 1) j
 j  1
h  1 k 0


Var( ˆ j )
Aj 
H0

ˆ j
13h  7
 h  1
2
13h  7
 h  1
2
 N (0,1)
Checking the Excel RAND() generator
Data Characteristic
Value
Source file
<edited>
Observation type
Real valued
Number of observations
100
Minimum observation 0.03580
Maximum observation 0.99638
Mean 0.50290
Median
0.50590
Variance
0.06893
Coefficient of variation 0.52207
Skewness
0.04632
Checking the Excel RAND() generator
Checking the Excel RAND() generator
Lag-Correlation Plot
1.00
0.75
0.50
Correlation
0.25
0.00
-0.25
-0.50
-0.75
-1.00
1
2
3
4
5
6
7
8
9
10
Lag
20 lag correlations betw een -0.17938 and 0.12498
11
12
13
14
15
16
17
18
19
20
Checking the Excel RAND() generator
Checking the Excel RAND() generator
Checking the Excel RAND() generator
Some General Guidelines
• Beware of “canned” generators; especially in non-simulation
software, and especially if it is poorly documented
• Insist on documentation of the generator, check with “tried and true”
ones
• Don’t “seed” the generator with the square root of the clock or any
other such scatterbrained scheme, we want to get reproducibility of
the random-number stream
• Reasonably safe ideas
– Use one of the generators from the book
– Look for highly portable generators
– See if they provide default streams with controlled, known
spacing
– Take care to avoid overlapping the streams
Download