Random Numbers • • All stochastic simulations need to “generate” IID U(0,1) “random numbers” Other random variates coming from other distribution can be generated using U(0,1) random numbers 1 if 0 x 1 f ( x) 0 otherwise Uniform Random Numbers Uniform Random Numbers Early Methods • Physical – Dice – Cards – Lotteries • Mechanical – Spinning disks • Electrical Hamamaker’s Die Having no access to existing tables of random variables during world war II, the late well-known statistics professor H.C. Hamamaker at Philips Research Laboratories in Eindhoven, invented this die and the dice-throwing technique to produce random sampling numbers. The technique has been tested in various ways on series of up to 40,000 throws without showing evidence of bias. – Electronic Random Number Indicator Equipment (ERNIE, used by British Post Office to pick winners in a lottery (1959)) – RAND Corporation Tables (A million random digits (1955)) • Other – Pick digits randomly from phone directories – Decimals in expansion of π to 100,000 places Algorithmic and Sequential Computer Methods • • • • Sequential means that the next “random” number is determined by one or several of its predecessors according to a fixed mathematical formula The midsquare method: (von Neumann and Metropolis, 1945) – Start with Z0 = 4-digit positive integer – Z1 = middle 4 digits of Z02 (append 0s if necessary to left of Z02 to get exactly 8 digits); U1 = Z1, with decimal point at left – Z2 = middle 4 digits of Z12; U2 = Z2, with decimal point at left – Z3 = middle 4 digits of Z22; U3 = Z3, with decimal point at left – So on ... Problems the with midsquare method – Not really “random” since the entire sequence is determined by Z0 – If a Zi ever reappears, the entire sequence will be recycled (This will occur, since the only choices for 4-digit positive integers are 0000, 0001, 0002, ..., 9999) “Design” generators so Ui’s “appear” to be IID U(0,1) and cycle length is long The Midsquare Method Can We Generate “Truly” Random Numbers? • • “True” randomness – Only possible with physical experiment having U(0,1) output – Counting gamma rays from space – Problems • Not reproducible • Impractical for computers Practical approach: produce stream of numbers that appear to be IID U(0,1) – Use theoretical properties as far as possible – Perform empirical tests Criteria for Random Number Generators 1. Appear to be distributed uniformly on [0, 1] and independent 2. Fast, low memory requirement 3. Should be able to reproduce a particular stream of random numbers 1. This makes debugging easier 2. One can use identical random numbers to simulate alternative system configurations for better comparison 4. Should have provision in the generator for a large number of separate (nonoverlapping) streams of random numbers (usually such streams are just carefully chosen subsequences of the larger overall sequence) • Beware: There are many RNGs in use (and in software) that have extremely poor statistical properties Linear Congruential Generators • Specify four parameters (all nonnegative integers): Z0 = seed (or starting value) m = modulus (or divisor) a = multiplier c = increment • Then, Z1, Z2, Z3, ... are recursively generated by Zi = (aZi-1 + c) (mod m) • • Zi is the remainder of dividing aZi-1 + c by m Thus, 0 ≤ Zi ≤ m – 1, so the random number is defined as Ui = Zi/m • • Note that 0 ≤ Ui < 1 c = 0 (Multiplicative LCG), c > 0 (Mixed LCG) Example of a “Toy” LCG • • • m = 16, a = 5, c = 3, Z0 = 7 Formula Zi = 5Zi-1 + 3 (mod 16) Cycle length (or period) here is 16 = m (the longest possible) Questions: • • Can the period be “predicted” in advance? Can a full period be guaranteed? i Zi Ui 0 1 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 7 6 1 8 11 10 5 12 15 14 9 0 3 2 13 4 7 6 1 8 ---0.375 0.063 0.500 0.688 0.625 0.313 0.750 0.938 0.875 0.563 0.000 0.188 0.125 0.813 0.250 0.438 0.375 0.063 0.500 Objections to LCGs • Not really “random”. Indeed, an explicit formula for Zi is i c(a i 1) Z i a Z 0 (mod m) a 1 • • • Cycles when a previous Zi reappears (there are only m choices, so cycle length is ≤ m) Ui’s can only take the discrete values 0/m, 1/m, 2/m, ..., (m-1)/m , but they’re supposed to be continuous For this reason, pick m ≥ 109 or more in practice Full-Period Theorem • (Hull and Dobell, 1966) The LCG Zi = (aZi-1 + c) (mod m) has full period (m) if and only if all three of the following conditions hold: 1. Both c and m are relatively prime (i.e., the only positive integer that divides both c and m is 1) 2. If q is any prime number that divides m, then q also divides a – 1 3. If 4 divides m, then 4 also divides a – 1 • • • Choice of initial seed Z0 is irrelevant Checking these conditions for a “real” generator may be difficult Condition 1 says that if c = 0, then full period is impossible (since m divides both m and c = 0); but m – 1 is possible Theorem is about period only, it says nothing about uniformity of a subcycle, independence, or other statistical properties • Desirable Properties of LCGs • • • • • • Fast Low storage requirement Reproducible (remember Z0) Restart in middle (remember last Zi, use as Z0 next time) Multiple streams (save separated seeds) May have good statistical properties (depends on choice of parameters) Implementation/Portability Issues • • • • LCGs (and other generators) generally deal with very large integers Usually, m ≥ 2.1 billion ≈ 2.1 × 109 ≈ 231 Take care in coding, especially in high level languages (FORTRAN, C) To avoid need to store and operate on large integers: use sophisticated abstract-algebra tricks to “break up” the arithmetic on large integers, then reassemble Other Kinds of Generators More General Congruences Zi = g(Zi-1, Zi-2, ...) (mod m), Ui = Zi/m – LCG: g(Zi-1) = a Zi-1 + c – Multiplicative recursive generator: g(Zi-1, Zi-2, ..., Zi-q) = a1Zi-1 + ... + aq Zi-q – Quadratic congruential generator: g(Zi-1) = a’Zi-12 + aZi-1 + c – Fibonacci (bad): g(Zi-1, Zi-2) = Zi-1 + Zi-2 Composite Generators • Shuffling – Fill a vector of length 128 (say) from generator 1 – Use generator 2 to pick one of the 128 in the vector – Fill the hole with the next value from generator 1, use generator 2 to pick one of the 128 in the vector, etc. – Shuffling a bad generator improves it, but shuffling a good generator doesn’t gain much • Differencing LCGs – Get Z1i and Z2i from LCGs with different moduli – Let Zi = (Z1i – Z2i ) (mod m); Ui = Zi / m – They have very long periods (like 1018) and very good statistical properties • Wichmann/Hill – Use three LCGs to get U1i , U2i , and U3i sequences – Let Ui = fractional part of U1i + U2i + U3i – They have long period and good statistical properties (equivalent to a LCG!) Combined Multiple Recursive Generators • Recall the single MRG: Zi = ( a1Zi-1 + a2Zi-2 + ...+ aqZi-q ) (mod mi), Ui = Zi /mi • Run J MRGs simultaneously to get {Z1,i},{Z2,i}, ..., {ZJ,i} • Let m1 be the modulus used in the first of these J MRGs • For constants δ1, δ2, ..., δJ, define Yi = (δ1Z1,i+ δ2Z2,i+...+ δJZJ,i) (mod m1), Ui = Yi / m1 • Must choose constants carefully, but extremely long periods and extremely good statistical behavior can be achieved • Specific example given in the text for J = 2, q = 3, δ1 = 1, δ2 = -1, with small, fast, portable C code in Appendix 7B (10,000 streams spaced 106 apart) • Period is approximately 3.1 x 1057 • Excellent statistical properties through 32 dimensions Testing Random Number Generators • • • Since RNGs are completely deterministic, we need to test them to see if they appear to be random and IID uniform on [0, 1] General advice: be wary of “canned” RNGs in software that is not specifically simulation software, especially if they are not thoroughly documented; perhaps even test them before using Two types of testing procedures 1. Empirical statistical tests – Chi-square – Kolmogorov-Smirnov – Serial test – Runs test – Correlation test 2. Theoretical procedures – Full period values – Lattice structure Empirical Statistical Tests • • Use trial generator to generate some U’s and apply statistical tests This gives information only on that part of a generator’s cycle examined (local) Chi-square, K-S tests for U(0,1) (all parameters known) 2 k 1 k f j ej j 1 ej 2 k f j n / k j 1 n / k 2 2 k k f j n / k n j 1 Serial test: generalized chi-square test to higher dimensions • Two dimensions, subdivide unit square into subcells • 2 k 2 1 k fi, j ei, j i , j 1 ei , j 2 k2 n k i , j 1 fi , j (n / k 2 ) Provides indirect test of independence 2 Empirical Statistical Tests Correlation test: perform correlation analysis using the correlation coefficient of lag j to test H0: independence E[U i ] 0.5, Var[U i ] 1/12 Cov(U i , U i j ) E[U iU i j ] E[U i ]E[U i j ] E[U iU i j ] 0.25 j Cov(U i , U i j ) Var[U i ]Var[U i j ] 12 E[U iU i j ] 3 n 1 12 h ˆ j U U 3, h 1kj 1( k 1) j j 1 h 1 k 0 Var( ˆ j ) Aj H0 ˆ j 13h 7 h 1 2 13h 7 h 1 2 N (0,1) Checking the Excel RAND() generator Data Characteristic Value Source file <edited> Observation type Real valued Number of observations 100 Minimum observation 0.03580 Maximum observation 0.99638 Mean 0.50290 Median 0.50590 Variance 0.06893 Coefficient of variation 0.52207 Skewness 0.04632 Checking the Excel RAND() generator Checking the Excel RAND() generator Lag-Correlation Plot 1.00 0.75 0.50 Correlation 0.25 0.00 -0.25 -0.50 -0.75 -1.00 1 2 3 4 5 6 7 8 9 10 Lag 20 lag correlations betw een -0.17938 and 0.12498 11 12 13 14 15 16 17 18 19 20 Checking the Excel RAND() generator Checking the Excel RAND() generator Checking the Excel RAND() generator Some General Guidelines • Beware of “canned” generators; especially in non-simulation software, and especially if it is poorly documented • Insist on documentation of the generator, check with “tried and true” ones • Don’t “seed” the generator with the square root of the clock or any other such scatterbrained scheme, we want to get reproducibility of the random-number stream • Reasonably safe ideas – Use one of the generators from the book – Look for highly portable generators – See if they provide default streams with controlled, known spacing – Take care to avoid overlapping the streams