Chapter 3: Random number generators

• Random number generators
• Generating arbitrary distributions
• The transformation method
• The rejection method
• MATLAB functions for common distributions

ASTM21 Chapter 3: Random number generators p. 1

Random number generators

Generating (pseudo-)random numbers X ~ U(0,1) is basic to all experimental statistics, simulation, experiment design, and data analysis.

Computer languages often contain a system-supplied random number generator, say x = rand() or x = rand(seed), where seed is an integer used to "seed" the generator.

Designing good random number generators is an art that has only recently matured. Do not try to design one yourself, or use code that is more than 10 years old.

Some quotes from Numerical Recipes (3rd ed., Ch. 7):

    Be cautious about any source earlier than about 1995, since the field has progressed enormously in the following decade. [...]

    The greatest lurking danger for a user today is that many out-of-date and inferior methods remain in general use. [...]

    If all scientific papers whose results are in doubt because of [bad random number generators] were to disappear from library shelves, there would be a gap on each shelf about as big as your fist.

Numerical Recipes contains examples of good, portable random number generators. But the simplest is to use generators that are part of a well-reputed software package.

The so-called Mersenne Twister (Matsumoto & Nishimura 1997) used by MATLAB [mt19937ar] since about 2008 is considered to be of high quality - it passes a number of stringent tests for randomness, including the 'Diehard' test suite (Marsaglia 1998).
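As a quick illustration of x = rand() and of seeding, the following MATLAB sketch checks that a sample from the system-supplied generator has roughly the mean and variance expected for U(0,1), and that re-seeding reproduces the same sequence. It is only a rough sanity check, not a test of generator quality. The seed value 100 and the sample size n are arbitrary choices made for this illustration, and seeding is done with rng(seed), which is listed at the end of this chapter.

```matlab
% Sketch: basic sanity checks on the system-supplied generator.
% The seed (100) and the sample size n are arbitrary illustrative choices.
rng(100);                 % seed the global generator
n = 1e6;
x = rand(n, 1);           % n variates intended to be U(0,1)

m = mean(x);              % the mean of U(0,1) is 1/2
v = var(x);               % the variance of U(0,1) is 1/12
fprintf('mean = %.4f (expected 0.5000)\n', m);
fprintf('var  = %.4f (expected %.4f)\n', v, 1/12);

% Re-seeding with the same value reproduces the same sequence exactly.
rng(100);
x2 = rand(n, 1);
same = isequal(x, x2);
fprintf('sequence reproduced: %d\n', same);
```

Agreement of the first two moments is a necessary but very weak condition; serious test suites such as Diehard probe much subtler correlations.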
Random number generators: Use of seed

When MATLAB is started, and you ask for (say) three random numbers using rand(), you get:

    >> rand(3,1)
    ans =
       0.814723686393179
       0.905791937075619
       0.126986816293506

The next one is:

    >> rand
    ans =
       0.913375856139019

If you quit MATLAB and start again, you get for example (the same four numbers, filled into the matrix column by column):

    >> rand(2,2)
    ans =
       0.814723686393179   0.126986816293506
       0.905791937075619   0.913375856139019

Each time MATLAB is started, the default random stream is initialized with the same seed (= 0).

To check how the default random stream was initialized, use:

    >> RandStream.getGlobalStream
    ans =
    mt19937ar random stream (current global stream)
                 Seed: 0
      NormalTransform: Ziggurat

In simulation experiments it is often desirable to start each experiment (or batch of experiments) from a different, but well-defined state, so that a certain experiment can be reproduced exactly. (Why?) This is done by initializing the default random stream with a different seed for each batch.

    >> stream = RandStream.create('mt19937ar','seed',100)
    stream =
    mt19937ar random stream
                 Seed: 100
      NormalTransform: Ziggurat
    >> RandStream.setGlobalStream(stream)
    >> rand(3,1)
    ans =
       0.543404941790965
       0.278369385093796
       0.424517590749133

Random number generators: Some caveats

• Do not re-initialise unnecessarily: for example, to repeat an experiment 1000 times in sequence, do not re-initialise in between, unless each experiment takes a really long time. (Why?)
• If your investigation depends very critically on the quality of the random number generator, try another one and check that the results are (statistically) the same.
• Warning: In MATLAB, never use rand('seed',1), rand('state',2), randn('seed',3), etc. to set the seed.
  These are obsolete forms which cause MATLAB to switch to a very old random number generator, as can be seen by checking the default stream:

    >> rand('seed',100)
    >> RandStream.getGlobalStream
    ans =
    legacy random stream (current global stream)
      RAND algorithm: V4 (Congruential)
     RANDN algorithm: V5 (Ziggurat)

  The congruential RAND algorithm from V4 (1992) is definitely not a good one!

Generating arbitrary distributions: The transformation method

To generate a univariate (pseudo-)random variable y with given pdf p(y), there are a few basic techniques that can be used, and some nice tricks for special distributions (like the Gaussian). They all start with the generation of one or several uniform variates x ~ U(0,1).

The transformation method requires that you can compute (without too much difficulty) the inverse cdf of p(y), $F^{-1}(x)$. If x ~ U(0,1), then $y = F^{-1}(x)$ has the cdf F, because $P(y \le y_0) = P(x \le F(y_0)) = F(y_0)$.

The transformation method: Three examples

Example 1: The exponential distribution

To generate a random variable y with the exponential pdf with parameter λ (see L2:24),

$$ p(y) = \lambda e^{-\lambda y}, \qquad y \ge 0, $$

we use the analytical cdf

$$ F(y) = 1 - e^{-\lambda y} $$

with the inverse

$$ y = F^{-1}(x) = -\frac{\ln(1 - x)}{\lambda}. $$

Thus, if x ~ U(0,1) we find that y = −(ln x)/λ has the desired distribution. Note that 1 − x ~ U(0,1), so we can save one subtraction by using x instead of 1 − x.

Example 2: The Cauchy distribution

The standard form of the Cauchy distribution is

$$ p(y) = \frac{1}{\pi (1 + y^2)}, $$

which is symmetric about y = 0 and has a FWHM (Full Width at Half Maximum) of 2 units. As discussed in L1:19 this distribution has undefined mean value and infinite variance. The analytical cdf is

$$ F(y) = \frac{1}{2} + \frac{1}{\pi} \arctan y $$

with the inverse

$$ y = F^{-1}(x) = \tan\left[\left(x - \tfrac{1}{2}\right)\pi\right]. $$

Thus, if x ~ U(0,1) we find that y = tan[(x − 0.5)π] has the desired distribution.
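The two transformations in Examples 1 and 2 can be sketched in MATLAB as follows. This is a minimal illustration under assumed values: the rate λ = 2, the sample size and the seed are not from the text. For the exponential variates the sample mean and variance are compared with 1/λ and 1/λ². For the Cauchy variates the mean and variance are not usable, so the median and quartiles are checked instead; for the standard Cauchy distribution the quartiles lie at ±1, half the FWHM.

```matlab
% Sketch of the transformation method for Examples 1 and 2.
% lambda, n and the seed are illustrative assumptions, not values from the text.
rng(1);
n = 1e6;
lambda = 2.0;                      % rate parameter of the exponential pdf

% Example 1: y = -ln(x)/lambda with x ~ U(0,1)
x = rand(n, 1);
y_exp = -log(x) / lambda;
mean_exp = mean(y_exp);            % should be close to 1/lambda
var_exp  = var(y_exp);             % should be close to 1/lambda^2

% Example 2: y = tan((x - 0.5)*pi) with x ~ U(0,1)
x = rand(n, 1);
y_cau = tan((x - 0.5) * pi);
s   = sort(y_cau);                 % empirical quantiles from the sorted sample
q25 = s(round(0.25 * n));          % lower quartile, expected near -1
med = s(round(0.50 * n));          % median, expected near 0
q75 = s(round(0.75 * n));          % upper quartile, expected near +1

fprintf('exponential: mean = %.4f, var = %.4f\n', mean_exp, var_exp);
fprintf('Cauchy: quartiles = %.3f, %.3f, %.3f\n', q25, med, q75);
```

Note that the toolbox function exprnd is parameterized by the mean μ = 1/λ rather than by the rate λ used here.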
Example 3: The Box-Muller transformation for the normal distribution

It is not convenient to use the transformation method directly on the one-dimensional normal distribution N(0,1), because the cdf and its inverse cannot be expressed in terms of elementary functions. However, if x ~ N(0,1) and y ~ N(0,1) are independent normal variates, then their joint pdf is

$$ p(x, y) = \frac{1}{2\pi} \exp\left[-\frac{x^2 + y^2}{2}\right]. $$

Transforming to polar coordinates by means of x = r cos θ, y = r sin θ, we find

$$ p(x, y)\,dx\,dy = \left[ r\,e^{-r^2/2}\,dr \right] \left[ \frac{d\theta}{2\pi} \right], $$

which shows that r and θ are independent, that θ ~ U(0, 2π), and that the cdf of r is $F(r) = 1 - \exp(-r^2/2)$, with inverse $r = F^{-1}(u) = [-2 \ln(1 - u)]^{1/2}$. Thus, given two uniform variates u_1 ~ U(0,1) and u_2 ~ U(0,1) we obtain two independent normal variates as

$$ x = \sqrt{-2 \ln u_1}\,\cos(2\pi u_2), \qquad y = \sqrt{-2 \ln u_1}\,\sin(2\pi u_2), $$

which is the Box-Muller transformation (using u_1 in place of 1 − u_1, as in Example 1). However, more efficient algorithms exist (see the ziggurat algorithm, Example 2 of the rejection method).

Generating arbitrary distributions: The rejection method

The rejection method does not require the inverse cdf, or even the cdf, but only that you can compute p(x) for any given x. Moreover, you need some other function f(x) such that

• p(x) ≤ f(x) everywhere
• the integral of f is finite (say A)
• the inverse of the cumulative function of f, $F(x) = \int_{-\infty}^{x} f(t)\,dt$ (which runs from 0 to A), can be computed: $F^{-1}(a)$

Using the rejection method, the algorithm to generate x_0 ~ p(x) is:

1. Generate a number a ~ U(0, A)
2. Apply the transformation x_0 = F^{-1}(a)
3. Compute f(x_0) and p(x_0)
4. Generate another random number b ~ U(0, f(x_0))
5. If b ≤ p(x_0) accept x_0, otherwise go to 1

The rejection method: Two examples

Example 1: Generate x ~ Beta(2,5)

This is the beta distribution (see L1:18) with parameters α = 2 and β = 5:

$$ p(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)} = 30\,x\,(1 - x)^4, \qquad 0 \le x \le 1. $$

The transformation method is not useful, since the cdf is a polynomial of degree 6.
But we have p(x) < 2.5 everywhere (see diagram), so we can use f(x) = 2.5 (0 ≤ x ≤ 1) with integral A = 2.5. The cumulative function of f(x) is y = F(x) = Ax for 0 ≤ x ≤ 1, and the inverse is x = F^{-1}(y) = y/A. Thus the procedure in this case is:

1. Generate a number a ~ U(0, A)
2. Apply the transformation x = F^{-1}(a) = a/A [this is simply x ~ U(0,1)]
3. Compute f(x) = A and p(x)
4. Generate another random number y ~ U(0, A)
5. If y ≤ p(x) accept x, otherwise go to 1

[Figure: the Beta(2,5) pdf p(x) and the constant bound f(x) = 2.5, plotted against x]

It can be seen that this is equivalent to placing random points (x, y) in the rectangle outlined by f(x) and accepting the x value if the y value is below p(x).

The efficiency of the method depends on the ratio of the areas below the two curves, i.e., A = 2.5 in this case (the area below p(x) is 1). On average, A pairs (x, y) are needed to generate one x, so the acceptance probability is 1/A = 0.4.

Example 2: The ziggurat algorithm for x ~ N(0,1)

The ziggurat algorithm (by Marsaglia) is the most commonly used method to generate Gaussian numbers (e.g., randn in MATLAB) because it is very efficient. It is essentially the rejection method applied to segments of the Gaussian curve (see diagram). First one of the rectangles is selected at random (as they have equal area). Then a second random number is used to decide if the x value is to the left of the dotted line, in which case it is accepted immediately; otherwise the rejection method is applied to decide if the point is below the Gaussian curve.

[Figure: the Gaussian curve for x ≥ 0 covered by a stack of horizontal rectangles of equal area, with dotted lines marking the part of each rectangle that lies entirely below the curve]

[Figure: artist's drawing of a Guto-Sumerian ziggurat (step pyramid). Source: www.iranian.com]
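The five-step procedure of Example 1 (Beta(2,5) with the constant bound f(x) = 2.5) can be sketched in vectorized form as follows. Instead of looping back to step 1 after each rejection, this sketch draws a fixed number of candidate pairs (x, y) and keeps the accepted x values. The number of trials, the seed and the variable names are assumptions made for this illustration. The accepted fraction should be close to 1/A = 0.4, and the sample mean and variance should be close to the known Beta(2,5) values 2/7 and 10/392.

```matlab
% Sketch of the rejection method for x ~ Beta(2,5) with f(x) = A = 2.5.
% nTry and the seed are illustrative assumptions.
rng(2);
A = 2.5;                           % constant bound f(x) = A on [0,1]; integral is A
p = @(x) 30 * x .* (1 - x).^4;     % Beta(2,5) pdf on 0 <= x <= 1

nTry = 1e6;                        % number of candidate pairs (x, y)
x = rand(nTry, 1);                 % steps 1-2: a ~ U(0,A), x = a/A, i.e. x ~ U(0,1)
y = A * rand(nTry, 1);             % step 4: y ~ U(0, f(x)) = U(0, A)
keep = (y <= p(x));                % step 5: accept x if the point lies below p(x)
xs = x(keep);                      % accepted values, distributed as Beta(2,5)

acc = numel(xs) / nTry;            % acceptance fraction, expected near 1/A = 0.4
mean_xs = mean(xs);                % expected near alpha/(alpha+beta) = 2/7
var_xs  = var(xs);                 % expected near 10/392
fprintf('accepted fraction = %.4f (expected %.4f)\n', acc, 1/A);
fprintf('mean = %.4f (expected %.4f)\n', mean_xs, 2/7);
```

A tighter bounding function f(x) would raise the accepted fraction, at the cost of a less trivial F^{-1}.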
MATLAB functions for common distributions

Distribution   pdf or pmf   cdf p = F(x)   inverse cdf x = F^{-1}(p)   random generator   standard (0, 1)
Uniform        unifpdf      unifcdf        unifinv                     unifrnd            rand
Beta           betapdf      betacdf        betainv                     betarnd
Normal         normpdf      normcdf        norminv                     normrnd            randn
Chi-squared    chi2pdf      chi2cdf        chi2inv                     chi2rnd
Exponential    exppdf       expcdf         expinv                      exprnd
Gamma          gampdf       gamcdf         gaminv                      gamrnd
Binomial       binopdf      binocdf        binoinv                     binornd
Poisson        poisspdf     poisscdf       poissinv                    poissrnd

Initialization of the random number generator: rng(seed), rng('shuffle')
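The table lists randn as the standard normal generator. As a cross-check, the Box-Muller transformation from Example 3 of the transformation method can be compared with randn using only base MATLAB functions. The following is a minimal sketch; the sample size, the seed and the variable names are assumptions made for this illustration. It verifies only that both Box-Muller outputs have mean near 0 and variance near 1 and are uncorrelated, which is necessary but not sufficient for the claimed N(0,1) distribution.

```matlab
% Sketch: Box-Muller transformation compared with the built-in randn.
% n and the seed are illustrative assumptions.
rng(3);
n = 1e6;
u1 = rand(n, 1);                   % u1 ~ U(0,1)
u2 = rand(n, 1);                   % u2 ~ U(0,1)

r     = sqrt(-2 * log(u1));        % radius: inverse cdf of r, with u1 in place of 1 - u1
theta = 2 * pi * u2;               % angle: theta ~ U(0, 2*pi)
x = r .* cos(theta);               % first normal variate
y = r .* sin(theta);               % second, independent normal variate

z = randn(n, 1);                   % built-in generator for comparison

cxy = mean(x .* y) - mean(x) * mean(y);   % sample covariance of x and y, expected near 0
fprintf('Box-Muller x: mean = %+.4f, var = %.4f\n', mean(x), var(x));
fprintf('Box-Muller y: mean = %+.4f, var = %.4f\n', mean(y), var(y));
fprintf('randn      z: mean = %+.4f, var = %.4f\n', mean(z), var(z));
fprintf('cov(x, y) = %+.4f\n', cxy);
```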