Pseudo-Random Numbers: Linear Congruential Generators As digital computers are deterministic machines, they are incapable of generating random numbers. Instead they generate large cycles of pseudo-random numbers (i.e.) numbers which will pass most of the statistical tests for randomness. However, if the program is rerun on a subsequent day the exact same stream of numbers will be produced. To overcome this problem, since the chain of numbers is very large, a starting seed which uses the current date and I.D. number of the user can be used to enter the chain at different points every time the program is run. a pseudo-random number generator is said to be full cycle if it leaves no gaps in the range of numbers it generates. Form: X0 a given “seed” Xi+1 = ( a Xi + c ) mod m Example Xi+1 = 7 Xi mod 11 -> Xi+1 = 6 Xi mod 11 -> Example [IMSL] [SIMSCRIPT] [RANDU] Xi+1 = 16807 Xi mod (231 - 1) Xi+1 = 6303600167 Xi mod (231 - 1) Xi+1 = 65539 Xi mod (231 ) Lemma. 1, 7, 5, 2, 3, 10, 4, 6, 9, 8, (1) 1, 6, 3, 7, 9, 10, 5, 8, 4, 2, (1) a prime modulus Xi+1 = a Xi mod m, with m prime, is full cycle if m divides ai - 1 for i = m-1, but for no smaller i Xi+1 = ( a Xi + c ) mod m is full cycle if c and m are relatively prime and if p divides m then p divides a - 1, if 4 divides m then 4 divides a - 1 Tests for Randomness In addition to various “Run Tests” and non-parametric tests, the following are useful. Chi-Square Test We partition the M random numbers generated into n classes and calculate the test statistic S ( Oj - Ej )2 / Ej ~ c2M-n-1 distribution for M large, min {Ej } > 4 and Ej ~ 4 M0.4. Kolmogorov Smirniv Test. Order the observations so that X1 X2 … Xn and calculate over i = { 1, 2, … , n} K+n = n Max { i / n - F ( Xi ) } K-n = n Max { F ( Xi ) - (i - 1) / n} If K+n or K-n are either too big or too small, reject H0: Distribution is F at the relevant confidence interval. k-Dimensional Spectral Tests 1. Regard the numbers as points on [0, 1] ~ uniformly distributed 2. Regard the pairs ( Xi, Xi+1 ) as points in [0, 1]2 ~ uniformly distributed 3. Regard the triplets ( Xi, Xi+1, Xi+2 ) as points on [0, 1]3 ….. Lemma. The sequences of size n from the generator Xi+1 = a Xi mod m fall on at most (n! m)1/n parallel hyperplanes. The Spectral Value of a generator is the maximum distance between the hyperplanes. A generator with a small spectral value is desirable. Example. The diagrams show that Xi+1 = 7 Xi mod 11 has a spectral value of 3.479, while that of Xi+1 = 6 Xi mod 11 is 4.919. 3.479 4.919 Newspaper Boy Problem This is a seminal example, which shows how a (Historical Data) forecasting filter can use historical data to Index Demand Freq. optimise parameter settings in nonlinear models. i Xi fi S fi The newspaper boy buys papers at 40p, sells 1 10 10 10 them at 50p and gets 20p back on unsold copies 2 30 25 35 at the end of the day. His “corporate strategy” is 3 70 45 80 to buy 90% of the demand that occurred on the 4 100 15 95 previous day. 5 130 5 100 Day Random # Buy Demand Sell Return Profit (p) Cumulative (p) 0 70 - 1 2 0.25 0.60 63 20 22 52 22 20 41 0 -600 200 -600 -400 3 4 0.80 0.55 47 63 70 48 47 48 0 15 470 180 70 250 5 6 0.90 0.45 43 81 90 39 43 39 0 42 430 -450 680 230 F ( Xi ) 0.10 0.35 0.80 0.95 1.00 Nonuniform Random Numbers - Search In simulation models we typically disregard the first 10,000 iterations to allow the system reach an equilibrium and then monitor the simulation over the next 100,000 iterations. Binary search is quite an efficient algorithm for finding the simulation value relative to a look-up key. In simulation models as this look-up facility will be used a large number of times, it is important that the most efficient algorithm is used. Bucket search is the most efficient algorithm known for searching tables. Suppose X1 < X2 < … < Xn Place m equally spaced buckets on [0, 1], with m a power of 2 and n m 3 n. Let pi = Min { j: F (Xj ) > (i - 1) / m} qi = Min { j: F (Xj ) i / m } Algorithm: Generate the uniform [0, 1] random number U Search pk pk+1, … , qk for j with F (Xj - 1) < U F (Xj ) Return j - 1 Interpolate for the exact value X = [F(Xj) - U]Xj-1 + [U - F(Xj-1)]Xj F(Xj) - F(Xj-1) Example [Newspaper Boy] Suppose U = 0.96 and m = 8, so we search {4, … , 5} for j : F( X4) < 0.96 F (X5 ) Return 4 and so X = 106, by interpolation. Generating Nonuniform Random Numbers 1 Lemma. U uniform [0, 1] <=> Prob { U } = U Lemma. F(X) increasing => F(X) uniform [0, 1] Prob { F(X) } = Prob { X F-1 ( )] = F F-1 ( ) = . Lemma. F-1(U) has the distribution function F. Prob { F-1 (U) t} = Prob { U F( t) } = F( t). 1 1 1 F (x) X F-1 ( ) Exponential. F( x) = 1 - exp ( - l x) , for x 0, so -1 F (U) = - ln ( 1 - U ) / l has an exponential distribution with rate l . For convenience, tables of - ln ( 1 - U ) have been published. This distribution is the kernel of simulation systems for modelling queues. Cauchy This is the Student t1 distribution has median m, scale s and has no mean or higher moments. Using this distribution will result in the simulation model crashing. The sequence in which it crashes can be used to debug the simulation model. Later a proper random number generator can be substituted. F(x) = 1 / p Tan -1 [(x - m ) / s ] + 1 / 2 F-1 ( U) = s Tan [p ( U - 1/2 )] + m ` Normal The Box-Muller formulae generate approximate n (0, 1) pairs: Yi = Cos ( 2 p Ui+1 ) -2 ln (Ui) Zi = Sin ( 2 p Ui+1 ) -2 ln (Ui) Illustration of M | M | 1 The diagram illustrates the dual nature of queuing statistics. T is the length of the simulation run. State No. in the System Expected number in the system = Shaded Area / T. This corresponds to numerical integration (accumulation) and requires careful computer 3 coding. Expected waiting time in the system = {S Service Time Ends i - S Arrival Time i } Number of Customers n This is a tally operation, requiring the use of three registers and is easy to implement. E1 E2 E3 E4 E5 E6 T 0 1 2 3 4 5 6 Stage Computer simulation languages, such as SIMSCRIPT, are a direct implementation of these ideas. The one caveat that needs to be added at this time, is that writing good simulation models requires considerable skill. In particular, unless the variance terms are controlled, each simulation run will result in wildly differing results.