Pseudorandom Number Generators and the Metropolis Algorithm Jane Ren Outline • Introduction to random number generators • The Marsaglia random number generator • SPRNG (A Scalable Library for Pseudorandom Number Generation) • Textbook assignment 3.2.3 with Marsaglia random numbers • Assignment 3.2.3 with random numbers from another type of random number generator. • Test results • Conclusions Properties of Good Random Number Generators • • • • • • • • • Randomness Uniformity Computational Efficiency Long period length Unpredictability Repeatability Portability Homogeneity Good random numbers are not easy to find The Marsaglia Random Number Generator • The programs in our textbook use the Marsaglia generator. • A combinations of two generators • Lagged Fibonacci series – In = In-r – In-s mod 224, r = 97, s = 33 • Arithmetic Series – I – k, I – 2k, I – 3k…, mod (224 - 3), k = 7654321 • The final step – – – – Take xn from a lagged Fibonacci series Take yn from a n arithmetic series If x >= y, the final number is x – y Else the final number is x – y + 1 Sample Output from the Marsaglia Generator • Assignment a0102_02 with seed1 = 1 and seed2 = 6. • Run 1 RANMAR INITIALIZED. idat= 1, xr= 0.894103587 idat= 2, xr= 0.034489155 idat= 3, xr= 0.330096781 idat= 4, xr= 0.613509536 extra xr= 0.710489452 • Run 2 MARSAGLIA CONTINUATION. idat= 1, xr= 0.710489452 idat= 2, xr= 0.429685593 idat= 3, xr= 0.142630935 idat= 4, xr= 0.369295061 extra xr= 0.894588530 Some Advantages of he Marsaglia Generator • The lagged Fibonacci series has a period of (224 – 1) * 294 ≈2110. This is very good for a pseudorandom number generator. • The Marsaglia random number generator is portable because it uses the IEEE standard representation of 32 bit floating point numbers. Some Drawbacks of the Marsaglia Generator • Low discretization step size 2-24 – We have already seen this problem from assignment a0204_03. • • Not as fast as some other generators, such as the congruential generator. It fails the birthday spacing test. – The birthday spacing test chooses m birthdays in a year of n days. If a certain day appeared more than once, then that value should be asymptotically Poisson distributed with mean m**3/(4n). 3.2.3 Assignment for section 3.2 Page 152 • a0302_01 • “Write a Metropolis program to generate the normal distribution through the Markov process x x’ = x + 2a(xr – 0.5) where xr is a uniformly distributed random number in the range [0,1) and a is a real constant. Use a = 3.0 and the initial value x = 0.0 to test your program. (i) Generate a chain of 2000 events. Plot the empirical peaked distribution function for these event in comparison with the exact peaked distribution function. Monitor also the acceptance rate. (ii) Repeat the above for a chain of 20000 events.” Assignment a0302_01 • Output for a chain of 20 000 events RANMAR INITIALIZED. Metropolis Generation of Gaussian random numbers Ndat=20000, a=3.0000 Acceptance rate found 0.491700 • CPU time for running the Metropolis Program with Marsaglia = 0.126u (average of 10 runs) • Comparison between the empirical peaked distribution and the exact peaked normal distribution Scalable Library for Pseudorandom Number Generation (SPRNG) • • Software package released under the GNU General Public License. There are 6 kinds of random number generators in SPRNG 1. Combined Multiple Recursive Generator 2. 48 Bit Linear Congruential Generator with Prime Addend 3. 64 Bit Linear Congruential Generator with Prime Addend 4. Modified Lagged Fibonacci Generator 5. Multiplicative Lagged Fibonacci Generator 6. Prime Modulus Linear Congruential Generator * Mersenne Twister Generator (will be added) • Link http://sprng.cs.fsu.edu/ Why SPRNG? • The random number generators in SPRNG are considered the top generators. • The generators have never failed the standard tests. – Including some of the largest random number tests with up to 10**13 • The tests used were collisions, coupon, equidistribution, gap, maximum of t, permutations, poker, runs up, serial, sum of distributions, Ising: Metropolis, Ising: Wolff, and Random Walk. • What if one of the SPRNG generators was used for assignment a0302_01? • All of the random number generators passed the empirical tests, so which one do we want to use? Descriptions of Some of the Tests SPRNG Generators passed • Permutation Test – Divide the random sequence into subsequences of a certain length. – Rank each value in the subsequence by magnitude. – Count the frequency each permutation occurs. – Conduct a Chi-Square test with these counts. – Compare the results. • Collision Test – Measure the number of times the number falls at a position that already had a number in it. • Coupon Collector’s Test – Compare actual period with expected period The Generators 1. Combined Multiple Recursive Generator • • Period ≈ 2219 Fast because it is a linear congruential generator 2. 48 Bit Linear Congruential Generator with Prime Addend • • • Period = 248 Number of distinct streams is of the order of 219 Fast because it is a linear congruential generator 3. 64 Bit Linear Congruential Generator with Prime Addend • • • Period = 264 Number of distinct streams is of the order of 108 Fast because it is a linear congruential generator The Generators 4. Modified Lagged Fibonacci Generator • Period = 231(2l – 1), where l is the lag. • • The default period is 21310 Number of distinct streams is of the order of 231(l – 1)-1 , default = 21310 5. Multiplicative Lagged Fibonacci Generator • • Period = 261(2l – 1), default = 281 Number of distinct streams is of the order of 263(l – 1)-1 , default = 281 6. Prime Modulus Linear Congruential Generator • • • Period = 264 - 2 Number of distinct streams ≈ 258 Fast because it is a linear congruential generator Is there a better generator? • The current 6 random number generators in SPRNG work sufficiently well. • The Mersenne Twister generator MT19937 is not a part of SPRNG yet, but it is being added to SPRNG. • I chose to use Mersenne Twister as a substitute for the Marsaglia generator for a0302_01. Why Mersenne Twister? • Astronomical period of 219937 – 1 – No other generator has achieved this! – Marsaglia has 2110 • 623 dimensional equidistribution with up to 32 bit accuracy in a working area of only 624 words. – No other generator has achieved this! • Can be employed in practical important simulations because it is based on a good definition of randomness. – Many other generators are only random in a particular area. • Performance exceeds that of integer-largemodulus generators. • Efficient random number generation. • Passed tests many stringent tests, including the diehard tests (invented by Marsaglia) and tests on the higher dimensional uniformity, including the spectral test and the k-distribution test. • Most suitable for Monte Carlo because it emphasizes on the most significant bits. Mersenne Twister – The Algorithm • Based on the recurrence (mod 2) • l, s, t, and u are integers, b and c are bit masks of word size, x[0:n-1] is an array of n unsigned integers of word size, and u, ll, and a are unsigned constant integers of word size. Tempering Matrix • The Mersenne Twister uses a tempering matrix, so that multiplication by the tempering matrix is very efficient and fast. – Example of a tempering Matrix • Let x = (xw-1, xw-2, xw-3,…, x0) a = (aw-1, aw-2, aw-3,…, a0) • xA can be calculated with only binary operations. – xA = shiftright(x) if x0 = 0 – xA = shiftright(x) XOR a if x0 = 1 Mersenne Twister – How does It Work? • Step 0 – Create a mask for the upper bits and another one for the lower bits. • Step 1 – The x array is initialized with a seed • Step 2 – y is the upper bits of x[i] concatenated with the lower bits of x[i+1]. Mersenne Twister – How does It Work? • Step 3 – y * A. – A is chosen carefully so it can be easily calculated with right shift and exor. • Step 4 – Multiply with T (tempering matrix) for better equidistribution and good acurracy. • Steps 5 & 6 • Increment i by 1 and repeat the process Mersenne Twister • We use an improved version of the Mersenne Twister from it’s original authors (released 2/2002). • Some disadvantages of the original version – Most significant bits only affect most significant bits of the array state. – Non-overlapping subsequences on successive runs is not possible. – Not as fast as the improved version. Sample Output from the Mersenne Twister Run 1 MERSENNE TWISTER INITIALIZED. idat= 1, xr= 0.814723692 idat= 2, xr= 0.135477004 idat= 3, xr= 0.905791934 idat= 4, xr= 0.835008590 extra xr= 0.126986812 • Run 2 MERSENNE TWISTER CONTINUATION. idat= 0, xr= 0.126986812 idat= 1, xr= 0.968867771 idat= 2, xr= 0.913375856 idat= 3, xr= 0.221034043 extra xr= 0.632359250 Mersenne Twister in a302_01 • Output for a chain of 20 000 events RANMAR INITIALIZED. Mersenne Twister Generation of Gaussian random numbers Ndat=20000, a=3.0000 Acceptance rate found 0.497850 • The acceptance rate is 0.4917000 with the Marsaglia generator • CPU Time for the Metropolis Program with Mersenne Twister = 0.124 s (Average of 10 runs) – Marsaglia = 0.126 s Mersenne Twister in a302_01 • Comparison between the empirical peaked distribution and the exact peaked normal distribution Results from Marsaglia and Mersenne Twister Combined Mersenne Twister and Marsaglia Execution Times • The CPU times for running the Metropolis program with Marsaglia and Mersenne Twister are very close. • Does it mean both generators have roughly the same speed? • Mersenne Twister is faster than Marsaglia. • I wrote a program to only generate 4,294,967,295 Marsaglia numbers and another one that only generates 4,294,967,295 Mersenne Twister numbers. – Mersenne Twister = 244.710u – Marsaglia = 312.580u – MT is approximately 22% faster. Gaussian Difference Test • Data from the Metropolis program • Marsaglia – Mean = 0.003543 – Error bar = 0.007169 • Mersenne Twister – Mean = -0.008677 – Error bar = 0.006998 • Q = 0.222553 GNUPLOT FOR DISTRIBUTION FUNCTION. • STMC_DF_GNU(Ndat, data1); • STMC_DF_GNU(Ndat2, data2); The Two-Sided Kolmogorov Test • N = N1 = N2 = 50000 • KOLM2_AS2(N1, N2, DAT1, DAT2, DEL, Q) – DAT1 = Marsaglia, DAT2 = Mersenne Twister • DEL = 0.001172, Q = 0.882084 – DAT1 = Marsaglia1, DAT2 = Marsaglia2 • DEL = 0.001148, Q = 0.896565 – DAT1 = MT1, DAT2 = MT2 • DEL = 0.002050, Q = 0.243914 • DEL is the maximum deviation from the exact distribution function • Q is the approximation to the probability that this deviation is due to chance. Conclusions • The Mersenne Twister generator is more suitable for Monte Carlo Simulations than the Marsaglia random number generator. – Mersenne Twister is faster. – Mersenne Twister more accurate. – Mersenne Twister has a linear discretization. • Mersenne Twister is probably the best generator at the present time. References 1. 2. 3. 4. 5. 6. Berg, Bernd. Markov Chain Monte Carlo Simulations and Their Statistical Analysis. World Scientific, 2004. Karaoglu, Hihat. Mersenne Twister: A Study on Random Number Generators. Project Prestudy. Vrije Universiteit Brussel, Apr 2004. Korab, Holly. Best of the Random Number Generators. The National Center for Supercomputing Applications. Featured story, Feb 1999. L’Ecuyer Pierre. Uniform Random Number Generation. The Annals of Operations Research (1994). Lee, J.C. Random Number and Statistical Tests. Talk at Chuo University, Japan. Jan, 2003. Matsumoto, Makoto and Takuji Nishimura. Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ACM Transactions on Modeling and Computer Simulations, Vol. 8, No. 1, Jan 1998, Pages 3-30. References (continued) 7. 8. Matsumoto, Makoto and Takuji Nishmura. rng_afm.c. An Improved version of the Mersenne Twister. http://www.math.keio.ac.jp/matumoto/emt.ht ml Mascagni, Michael, and Srinivasan, Ashok. Algorithm 806: SPRNG: A Scalable Library for Pseudorandom Number Generation. ACM Transactions on Mathematical Software, Vol 26, No. 3, September 2000, pages 436-461. 9. RSA Laboratories’ Bulletin – News and Advice on Data Security and Cryptography. Number 8. Sept 3, 1998. 10. Sinclair, Bart. Permutation Test. Class notes for Elec 428 Computer Systems Performance. http://www.owlnet.rice.edu/~elec428/rng/perexp.h tml 11. The Scalable Parallel Random Number Generator Library (SPRNG) for ASCI Monte Carlo Computations. http://sprng.cs.fsu.edu/ Questions