Files\courses\MonteCarlo\MonteCarlo1.doc
Revision 11.06.97
Marc Nerlove 1997

NOTES ON MONTE CARLO, BOOTSTRAPPING AND ESTIMATION BY SIMULATION

Lest men suspect your tale untrue,
Keep probability in view.
John Gay, Fables, I, 1727

1. Random Number Generation

Anyone who considers arithmetical methods of producing random digits, is, of course, in a state of sin.
John von Neumann, 1951

Uniform and Normal Deviates

"It may seem perverse to use a computer, that most precise and deterministic of all machines conceived by the human mind, to produce 'random' numbers. More than perverse, it may seem to be a conceptual impossibility. Any program, after all, will produce output that is entirely predictable, hence, not truly 'random.'" (Press, et al., p.191) There is, nonetheless, a large literature on how one uses a computer to "simulate" a sequence of random numbers. [1]

The basic building block of all "random" number generation is the uniform random deviate. These are random numbers which lie within an interval, usually (0,1), and for which any number in this interval is as likely as any other. All other sorts of random deviates (GAUSS lists commands for beta, RNDBETA; gamma, RNDGAM; negative binomial, RNDNB; and Poisson, RNDP; as well as uniform, RNDU and RNDUS; and normal, RNDN and RNDNS [2]) are generated by transforming uniform deviates, sometimes simply and sometimes by elaborate algorithms. [3]

Standard normal deviates (zero mean, variance 1) may also be produced from uniform deviates by such transformations. [4] For example, Box and Muller suggest the following transformation, which produces pairs of independent normal deviates:

(1)    $y_1 = (-2 \ln u_1)^{1/2} \cos(2\pi u_2)$,
       $y_2 = (-2 \ln u_1)^{1/2} \sin(2\pi u_2)$,

where $u_1$ and $u_2$ are two consecutive uniform deviates ($u_i$ and $u_{i+1}$ in the generated sequence) in the interval (0,1). [5] At that point it is easy to generate vectors of numbers having a multivariate normal distribution with specified mean vector and variance-covariance matrix through the use of an appropriate matrix transformation.

[1] Beginning with John von Neumann's "Various Techniques Used in Connection with Random Digits," Bureau of Standards Applied Mathematics Series 12, pp. 36-38, 1951.

[2] In the case of RNDU and RNDN, the so-called "seed" is supplied from the computer clock; in the case of RNDUS and RNDNS, the "seed" is user supplied. For beta, gamma, negative binomial and Poisson, what code is available in the GAUSS file random.src is not very informative, and for normal and uniform deviates the code is concealed in the GAUSS executive program and considered proprietary. References to the literature in the GAUSS manual on these routines indicate that the method used to generate uniformly distributed random variates is (2) below. However, (1) is not used to generate normal deviates, but rather a method called the "fast acceptance-rejection algorithm" proposed in A. J. Kinderman and J. R. Ramage, "Computer Generation of Normal Random Numbers," Jour. Amer. Stat. Assoc., 71(356): 893-896 (1976). A brief discussion of the acceptance-rejection method is given below.

[3] GAUSS also has an algorithm, called by RNDVM and not listed in the manual, for generating deviates of the von Mises distribution, usually called the circular normal distribution.

[4] The definitive work on this subject is Luc Devroye, Non-Uniform Random Variate Generation, New York: Springer-Verlag, 1986. I discuss further examples below.

Uniform deviates are usually generated by so-called congruential generators. The first to be suggested (by Lehmer in 1951) was the multiplicative generator:

       $u_{i+1} = a u_i \pmod{m}$,

where the notation indicates that $u_{i+1}$ is the remainder when $a u_i$ is divided by m. Such generators have to be started by an initial value called the seed. The numbers produced by such a recurrence relation will eventually start to repeat.
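As a purely illustrative sketch of the multiplicative recurrence, the following GAUSS fragment iterates u(i+1) = a*u(i) (mod m) with toy values. The choices m = 31, a = 3 and seed = 1 are assumptions made here for readability; they are far too small for real work and are not the values used by any GAUSS routine. The variable names are likewise hypothetical.

```gauss
/* Toy multiplicative congruential generator: u(i+1) = a*u(i) mod m.
   m, a, seed and n are illustrative assumptions only. */
m = 31;
a = 3;
seed = 1;
n = 35;                          /* number of values to generate */
u = zeros(n,1);
u[1,1] = seed;
i = 2;
do while i le n;
    u[i,1] = (a*u[i-1,1]) % m;   /* remainder when a*u(i) is divided by m */
    i = i + 1;
endo;
v = u/m;                         /* scaled into the unit interval */
print u';                        /* the integer sequence */
print v';                        /* the corresponding "uniform" deviates */
```

With these toy values the 31st element equals the seed again, so the sequence repeats after 30 steps.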
The maximum period possible is m, but the multiplicative generator will generally have a period much shorter than this. A recursive relation with better properties is the linear congruential generator:

(2)    $u_{i+1} = (a u_i + c) \pmod{m}$,

where m is called the modulus, and a and c are positive integers called the multiplier and the increment respectively. The recurrence (2) will eventually repeat itself, but if m, a and c are properly chosen the period of the generator will be m, the maximum possible. The number m is usually chosen to be about the length of a machine word, making the generator machine dependent. [6] "Portable" random number generators (RNG) are generally much less efficient. Moreover, unless m, a and c are chosen very carefully in relation to one another, there may be a lot of serial correlation among the values generated. The rules are complicated: (a) c and m may have no common divisor; (b) $a \equiv 1 \pmod{p}$ for every prime factor p of m; and (c) $a \equiv 1 \pmod{4}$ if m is a multiple of 4.

The seed for each successive call of the generator is the last integer $u_{i+1}$ returned. Usually this value is saved for the next use of the procedure. In the case of the GAUSS generators RNDU and RNDUS, the ending value of the seed is not available directly to the user, but could be computed from the last random number returned. RNDU takes its seed from the computer clock when GAUSS is first started up. To obtain a number in the interval [0,1), usually $u_{i+1}/m$ is returned rather than $u_{i+1}$. [7]

What can go wrong with the RNG in common use? Clearly, the greatest problem, in situations in which the specifics of the RNG are not known, is that the choice of m, a and c may be botched. Press, et al., pp.194-195, suggest a kind of shuffling procedure to alleviate this problem. But randomness, like beauty, is in the eye of the beholder. "...the deterministic program that produces the random sequence should be different from, and – in all measurable respects – statistically uncorrelated with, the computer program that uses its output. In other words, any two different random number generators ought to produce statistically the same results when coupled to your particular applications program." (p.191) More generally, Knuth (1969, pp. 34-100) gives a large number of tests for randomness, the most powerful of which is called the spectral test. Understanding and implementing this test requires knowledge of time series analysis, but Exercise 2 to this section suggests a couple of simpler tests which might be implemented in GAUSS: Chi-square and Kolmogorov-Smirnov. GAUSS's generators RNDU and RNDN pass these tests.

To obtain a series of vectors which can be considered to have come from a multivariate normal distribution with mean vector $\mu$ and variance-covariance matrix $\Sigma$, we can make use of the Cholesky decomposition of the matrix $\Sigma$ and the characterization of the multivariate normal discussed in Econometrics for Applied Economists, Notes for AREC 624, Spring 1995, vol. 3, pp. APP/11 - 17. [8] Let u be a vector of iid normal variables with mean zero and variance 1; then,

(3)    $v = \mu + T'u$

has a multivariate normal distribution with mean vector $\mu$ and variance-covariance matrix $T'T = \Sigma$.

[5] G. E. P. Box and M. E. Muller, "A Note on the Generation of Random Normal Deviates," Annals of Mathematical Statistics, 29: 610-611 (1958). In fact, the Box-Muller method is not a very efficient way to generate normal deviates because it involves evaluation of logarithmic and trigonometric functions which require series expansions inside almost all computers (i.e., these functions are not "hard-wired"). This is why GAUSS uses an alternative method (Kinderman and Ramage, op. cit.).

[6] The GAUSS manual remarks that the generator assumes a 32-bit machine (p.1525), which means m should be about $2^{32}$. This is consistent with the restriction in RNDUS that the user supplied seed must be in the range $0 < \text{seed} < 2^{31}-1$ (p.1523). The example used there chooses 3937841 as the seed. The default value for m is in fact $2^{31}-1 = 2147483647$ (p. 1519).

[7] The default values are c = 0, a = 397204094 and m = 2147483647. But you can set your own parameters and starting seed with RNDCON, RNDMOD, and RNDSEED (p.1519).

Nonuniform Deviates in General [9]

One of the principal problems in doing Monte Carlo simulations of all kinds is the generation of nonnormal deviates. For example, in regression analysis, tests of significance are based on the assumption of an underlying normal distribution for the disturbances; if these are not normal, potentially serious errors can result in the assessment of the significance of the regression coefficients. Similarly, maximum-likelihood estimation is generally based on the assumption of a normal distribution, and heavy reliance is placed on the desirable asymptotic properties of the ML estimates; while many of these asymptotic properties hold for many nonnormal cases, we typically deal with samples which are not so large that we can be confident that the asymptotic properties are good approximations; moreover, methods of likelihood inference more generally depend crucially on the distributional assumptions made.

As remarked, all random number generation, including that for normal deviates, begins with the generation of uniform deviates. The problem is to get from uniform deviates to the deviates we want. [10]

There are essentially three basic methods for generating nonuniform deviates: (1) the inverse transform method; (2) the composition method; and (3) the acceptance-rejection method. Fishman (1996, pp. 179-187) deals with two other methods, as well as with refinements of the basic three.
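Returning for a moment to the normal case: transformations (1) and (3) can be chained in a few lines of GAUSS. The sketch below is only an illustration under assumed values; n, mu and sigma are hypothetical choices, and 1 - RNDU is used in place of RNDU so that the logarithm is never taken at zero. Each row of v is mu' + u'T, which has variance-covariance matrix T'T.

```gauss
/* Sketch: bivariate normal vectors via Box-Muller and the Cholesky factor.
   n, mu and sigma are illustrative assumptions. */
n = 1000;
mu = { 1, 2 };                   /* 2x1 mean vector */
sigma = { 4 1.2, 1.2 1 };        /* 2x2 positive definite symmetric matrix */

/* Box-Muller: two independent columns of standard normal deviates */
u1 = 1 - rndu(n,1);              /* lies in (0,1], so ln(u1) is finite */
u2 = rndu(n,1);
r = sqrt(-2*ln(u1));
y = (r.*cos(2*pi*u2)) ~ (r.*sin(2*pi*u2));   /* n x 2 */

/* chol returns an upper triangular t such that t't = sigma */
t = chol(sigma);
v = mu' + y*t;                   /* each row is one draw */

print meanc(v)';                 /* should be close to mu' */
print vcx(v);                    /* should be close to sigma */
```

In practice RNDN would replace the Box-Muller step; it is written out here only to show how (1) feeds into (3).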
(1) The inverse transform method is conceptually the easiest to explain and can be applied to discrete as well as continuously distributed variates. Since we know how to generate variates which are uniform on the interval [0,1), simply regard these as probabilities from a cumulative distribution function corresponding to the distribution we want and find the value of the variate corresponding to the "probability" turned up by the uniform generator. Thus, to obtain continuously distributed variates with the cumulative $F(\cdot)$, for which we have $F^{-1}(\cdot)$ in closed form, first generate the uniform(0,1) variable, u, then compute $z = F^{-1}(u)$. The z's so generated could have come from the distribution with density $f(\cdot)$. The applicability of this method depends on our ability to specify $F^{-1}(\cdot)$, and its efficiency is problematic since we often cannot avoid evaluating exponential, logarithmic or trigonometric functions. Fishman (1996, p.151) gives a useful table of common continuous distributions and their analytical inverses. Included are the following: uniform, beta, exponential, logistic, noncentral Cauchy, normal, Pareto, and Weibull. David Baird has a series of programs, available at the GAUSS site at American University, http://gurukul.ucc.american.edu/econ/gaussres/GAUSSIDX.HTM, for generating nonuniform random variates, in which he also gives programs for the following inverse cumulatives: Normal, t, F, chi-square, and Gamma.

For a discrete distribution the inverse transform method is a bit more difficult because the cumulative cannot generally be obtained in closed analytical form, although the principle is the same. The essential idea is as follows: Suppose we want to generate a discrete random variate, z, which can take on values a, a+1, ..., b, coming from the cumulative distribution with probabilities $\{0 < q_a < q_{a+1} < \dots < q_b = 1\}$, where $q_i$ is the probability that $z \le i$. Set z = a and generate a uniform variate u. As long as $u \le q_z$, z remains where it is; but when $u > q_z$, augment z by 1 and continue the comparison with the new $q_z$.

[8] GAUSS contains a command, CHOL, for finding the Cholesky decomposition of a positive definite symmetric matrix. This decomposition finds an upper triangular matrix T such that $T'T = \Sigma$.

[9] As remarked in footnote 4, the definitive reference on this subject is Devroye, op. cit., but Fishman (1996, Chapter 3, "Generating Samples," pp. 145-254) contains an excellent discussion. Mooney's discussion (1997, pp. 14-46) is good as far as it goes, but somewhat incomplete.

[10] GAUSS contains commands for generating deviates, as remarked, for beta, RNDBETA; gamma, RNDGAM; negative binomial, RNDNB; and Poisson, RNDP; as well as normal, RNDN. But you need to understand in general how this is done and to be able to generate variables from distributions not covered by GAUSS, such as the t-distribution.

If this seems difficult to follow, the example of Bernoulli trials discussed by Mooney (1997, pp. 31-33) is instructive: A Bernoulli trial is just a "one-shot" binomial; z takes on the value a with probability q and not a with probability 1-q; thus the cumulative distribution is {0 < q < 1}. Here is a little GAUSS program to generate n Bernoulli RVs with this distribution.

    n = 1000;          /* for example */
    q = 0.4;           /* for example */
    y = RNDU(n,1);
    p = q*ones(n,1);
    z = y .le p;       /* .le is the element by element logical operator,
                          returning 1 if true, 0 if false */

z is an n by 1 vector of RVs from the distribution {0 < 0.4 < 1}. Another example involving only a two-point cumulative is the binomial distribution with t trials and probability p of success. Here

       $q = \sum_{i=1}^{t} \binom{t}{i} p^i (1-p)^{t-i}$.

Obviously, this method applies to any univariate discrete distribution, although in general the probabilities $q_i$ can be specified in terms of a few underlying parameters and the method makes no use of this to increase efficiency. When the number of possible values is more than 2, the number of comparisons involved can be large. Fishman (1996, pp. 153-155) gives a method for improving efficiency and reducing the mean number of comparisons required to generate a random variable with the specified cumulative. GAUSS has commands for generating RVs for the negative binomial and the Poisson distributions, and Baird includes a program for the binomial. Fortunately, many common discrete distributions can be built up as the composition of Bernoulli variables.

(2) The composition method is the lazy approach, when it can be implemented, and is generally even more inefficient than the inverse transform method. We have already had an example of this approach in the Box-Muller method for generating random normal deviates from uniform deviates by transforming the latter. IID normal 0-1 variables can also be transformed to univariate normals with nonzero mean and variance different from one, as well as into nonindependent multivariate normals. Mooney (1997, pp. 18-23) suggests obtaining RVs from the lognormal, Chi-square and t distributions this way: lognormal by exponentiating a normal variable; Chi-square with df degrees of freedom by summing df independent normals squared; and t with df degrees of freedom as the ratio of a normal and the square root of an independent Chi-square divided by its df degrees of freedom. The binomial, geometric and negative binomial distributions can also be treated in this way (Mooney, 1997, pp. 35-42). A standard exponential variable can be generated from a uniform [0,1) variable. [11]

Notwithstanding its potential inefficiency, the composition method provides a very powerful tool for handling distributions that cannot be represented in terms of one of the standard distributions. The idea is to "mix" one or more distributions you already know something about. The key parameters in the implementation of this method are the mixing proportions.
For example, suppose we wanted to mix only two distributions for which we know how to create RVs easily, e.g., two normal distributions with different means and variances. We create a vector from one distribution of length np and one from the other of length n(1-p), where p is the mixing proportion; concatenate the two vectors so obtained vertically; then randomize the elements using the row index and the GAUSS commands RNDU and CEIL. Mooney (1997, pp. 24-25) gives the following GAUSS example for mixing two normal distributions:

    /* n, p, a1, b1, a2 and b2 must be set beforehand;
       p*n must be an integer */
    y1 = RNDN(((1-p)*n),1);
    mean = a1;
    variance = b1;
    y1 = mean + (sqrt(variance)*y1);
    y2 = RNDN((p*n),1);
    mean = a2;
    variance = b2;
    y2 = mean + (sqrt(variance)*y2);
    y = y1|y2;
    index = CEIL(RNDU(n,1)*n);   /* n row numbers drawn at random.
                                    See GAUSS manual, p. 1075. */
    x = SUBMAT(y, index', 0);    /* See GAUSS manual, p. 1599.
                                    Note that index has been transposed. */

x is the desired vector of RVs having a distribution which is a mixture in proportion p of the two normal distributions specified.

[11] It is interesting to note that von Neumann's 1951 paper suggested obtaining an exponentially distributed random variable by the acceptance-rejection method described below, rather than by the composition method. His sketch of the proof appears to have been the basis for the development of the AR method.

(3) The acceptance-rejection method (AR method) is not only the most generally applicable but the basis for the most efficient and refined algorithms. It works when the inverse cumulative is intractable and when the distribution of the RV we want to generate is not a simple function of one or more distributions we know how to generate. The AR method is based on a theorem on conditional probability stated by von Neumann (1951, op. cit.):

Theorem: Let f(z) be a pdf defined on the interval [a,b], such that

       $f(z) = c\, g(z)\, h(z)$,

where

       $h(z) \ge 0$,  $\int_a^b h(z)\,dz = 1$,  $c = \sup_z [f(z)/h(z)]$  and  $0 \le g(z) \le 1$.

Let Z be the RV with pdf h(z) and U be uniformly distributed on [0,1); then, if $U \le g(Z)$, Z has the pdf f(z).

This theorem suggests the following procedure for generating a vector, x, n by 1, of RVs from a distribution having the pdf f(z). In the simplest case f is defined on [0,1) and h(z) is taken to be the uniform density, so that g(z) = f(z)/c. Generate two independent RVs, distributed uniformly on [0,1), say z and u. Calculate g(z); if $u \le g(z)$, insert the value z as the next element in x; if not, try again. Go on until you have filled the vector x. [12]

Here is a short GAUSS program to generate a vector of 10000 RVs coming from the exponential distribution using the von Neumann method:

    samplesz = 10000;
    i = 1;
    x = zeros(samplesz,1);
    do while i le samplesz;
        T = 0;                   /* integer part of the variate */
      again:
        u = rndu(1,1);           /* candidate fractional part */
        sum = 0;
        N = 2;
        do while sum < u;        /* add uniforms until the sum reaches u */
            v = rndu(1,1);
            sum = sum + v;
            N = N + 1;
        endo;
        if (N % 2 eq 0);         /* even number of draws: reject and retry */
            T = T + 1;
            goto again;
        endif;
        x[i,1] = T + u;          /* odd number of draws: accept */
        i = i + 1;
    endo;

[12] The proof is as follows (Fishman, 1996, p.172): U and Z have the joint pdf $f_{U,Z}(u,z) = h(z)$, $0 \le u \le 1$, $a \le z \le b$. Then Z has the conditional pdf

       $h_Z(z \mid U \le g(Z)) = \dfrac{\int_0^{g(z)} f_{U,Z}(u,z)\,du}{\mathrm{prob}(U \le g(Z))}$,

where $\mathrm{prob}(U \le g(Z)) = \int_a^b h(z)\, g(z)\,dz = 1/c$, and $h_Z(z \mid U \le g(Z)) = c\, g(z)\, h(z) = f(z)$. QED.

The algorithm used by GAUSS to generate normal 0-1 deviates is of the AR type (Kinderman and Ramage, op. cit.). The AR method can be made quite efficient by clever programming and is the basis for many of the algorithms which are used in practice. The method, however, has problems with "thick-tailed" distributions, since, for such distributions, the ratio of rejections to acceptances will generally be high, leading to the necessity of generating a very large number of uniform RNs in order to achieve the desired final number. And, as is the case in general, formulations which require the evaluation of exponential, logarithmic, trigonometric, or other special functions may be inefficient simply because these evaluations are themselves time consuming.
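The general AR procedure stated in the theorem can be sketched for one concrete density. The example below is an illustration under assumptions chosen here, not a routine taken from GAUSS or from the references: the target is f(z) = 6z(1-z) on [0,1], h(z) is the uniform density, so that c = sup f/h = 1.5 and g(z) = f(z)/c = 4z(1-z). The names n, x and tries are hypothetical.

```gauss
/* Sketch of the AR method for the pdf f(z) = 6z(1-z) on [0,1],
   with h(z) = 1 (uniform), c = 1.5 and g(z) = 4z(1-z).
   Target density and sample size are illustrative assumptions. */
n = 10000;
x = zeros(n,1);
i = 1;
tries = 0;                       /* total number of candidates drawn */
do while i le n;
    z = rndu(1,1);               /* candidate drawn from h */
    u = rndu(1,1);               /* independent uniform deviate */
    tries = tries + 1;
    if u le 4*z*(1-z);           /* accept when u <= g(z) */
        x[i,1] = z;
        i = i + 1;
    endif;
endo;
print meanc(x);                  /* should be near 0.5, the mean of f */
print n/tries;                   /* acceptance rate, should be near 1/c = 2/3 */
```

The acceptance rate 1/c is what makes the choice of h matter: the closer h follows the shape of f, the smaller c is and the fewer uniform deviates are wasted on rejected candidates.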
--Continued next file: MonteCarlo2.doc

Exercises Part 1: Random Number Generation

1. Show that the transformation (1) of two variables each independently distributed on the unit interval yields two independently identically distributed normal variates with unit variances and zero means, with positive probability on the whole real plane. Follow the development of Hogg and Craig, 4th edition, section 4.3, "Transformations of Variables of the Continuous Type." Example 8 is exactly this problem. The variables $(u_1, u_2)$ are distributed with positive probability density on the unit square, [0,0], [0,1], [1,0], [1,1]; whereas the variables $(y_1, y_2)$ are distributed with positive probability density on the whole real plane. Pay particular attention as to how the region of positive probability for the uniformly distributed variates is transformed by (1) into the region of positive probability for the normal variates. (See H&C, sec. 4.3, Examples 3-7.)

2. Write a GAUSS program to generate 1000 random numbers in the interval [0,1) using RNDU. Divide the interval [0,1) into 10 equal parts and find the frequency of numbers falling in each category. Compare this with the theoretical frequency for each category of 1/10 using a standard Chi-square test to assess the statistical significance of the deviations you find. Use the same random numbers to implement a Kolmogorov-Smirnov test. The K-S test is unfortunately not referenced in the 4th and later editions of H&C, but you can find a good discussion in A. M. Mood, F. A. Graybill, and D. C. Boes, Introduction to the Theory of Statistics, 3rd edition, New York: McGraw-Hill, 1974, pp.508-510. Knuth, pp.41-51, and Press, et al., pp. 472-475, also have extended discussions. Critical values for the test are tabulated in CRC Standard Probability and Statistics Tables and Formulae, ed. by W. H. Beyer, Boca Raton: CRC Press, 1991, p. 334.
How would you modify these tests to check whether RNDN produces variates which are independent standard normal?

3. Write a GAUSS program to generate 1000 3 by 1 vectors having a trivariate normal distribution with mean vector (1, 2, 3)' and variance-covariance matrix:

       6.250    1.875    0.625
       1.875    2.8125   0.9375
       0.625    0.9375   1.3125

How would you modify your program to produce random normal deviates with a specified correlation matrix?

4. Write a GAUSS program to generate 1000 random nonuniform numbers using the GAUSS commands RNDN (normal), RNDBETA (beta, choose shape parameters both = 2), RNDGAM (gamma, choose parameter alpha = 2), and RNDNB (negative binomial, choose parameters = 1.5 and 0.5; note that this is a discrete distribution). The Poisson distribution is treated in Exercise 6 below. Compute the Kolmogorov-Smirnov test statistic against the null of the true distribution, and graph the empirical against the theoretical distribution. Do the same for the t-distribution, for which there is no GAUSS generator, with degrees of freedom = 2. (See Baird's programs, available at the GAUSS site at American University, http://gurukul.ucc.american.edu/econ/gaussres/GAUSSIDX.HTM. Note that the cumulative of the t-distribution is required and that you can use the GAUSS routine CDFTC.) GAUSS does not supply all of the other cumulatives you need; you will have to write these yourself or get them from Baird. Note that the Poisson distribution is a discrete distribution and that the Kolmogorov-Smirnov test based on a continuous cumulative distribution will give you weird results. See Exercise 6, below.

5. Write a GAUSS program to generate 1000 RVs from a binomial distribution with 20 trials and probability 0.75 of success. Use a Chi-square test to check whether the variates you generated could have come from a binomial with probability 0.5 of success.

6. The Poisson distribution, which is widely used in the study of event data in econometrics (Greene, 2nd ed., pp. 676-679), has the pdf

       $f(x) = \dfrac{m^x e^{-m}}{x!}$,  $x = 0, 1, 2, \dots$,
       $f(x) = 0$ elsewhere,  $m > 0$.

(Hogg and Craig, 4th ed., pp. 99-102.) This discrete distribution has, among others, the following properties, which may be useful in formulating computer algorithms to generate Poisson variates from uniform variates (Johnson, Kotz and Kemp, Univariate Discrete Distributions, 2nd. ed., Wiley, 1992, pp. 158-162):

(a) If $X_1, X_2, X_3, \dots$ are iid RVs from an exponential distribution with parameter 1 (pdf $e^{-x}$, $x \ge 0$) and Z is the smallest integer $\ge 0$ such that $X_1 + X_2 + \dots + X_{Z+1} > \lambda$, then Z has a Poisson distribution with parameter $m = \lambda$.

(b) If $U_1, U_2, \dots$ are iid uniform [0,1) variates and Z is the smallest nonnegative integer such that $\prod_{i=1}^{Z+1} U_i < e^{-\lambda}$, then Z has a Poisson distribution with parameter $m = \lambda$.

Write two short GAUSS programs to generate 10,000 Poisson RVs with parameter $\lambda$ = 0.5 based on method (a) and on method (b). In the case of method (a), based on exponentially distributed RVs, use the algorithm used in the text to illustrate the AR method. Show that z = -ln(u), where u is distributed uniform [0,1), has a standard exponential distribution and use this as well to generate Poisson variables by the composition method. Compare the frequencies obtained with the theoretical frequencies for m = 0.5, x = 0, 1, 2, ..., 10. Compare with the RVs produced by the GAUSS command RNDP. What do you conclude about the efficacy of the different methods to simulate Poisson RVs and about the probable nature of the method underlying RNDP?

7. The (central) Cauchy distribution (Hogg and Craig, 2nd. ed., p 142, Exercise 4.22) has the pdf

       $f(x) = \dfrac{1}{\pi (1 + x^2)}$,  $-\infty < x < \infty$.

Show (a) that f(x) is the marginal distribution of $X_1$ in the joint distribution of ($X_1 = Y_1/Y_2$, $X_2 = Y_2$), where $Y_1$ and $Y_2$ are iid normal 0-1 variables; and (b) that the cumulative Cauchy is

       $F(x) = \dfrac{1}{2} + \dfrac{1}{\pi} \arctan x$.

Write two short GAUSS programs to generate 1000 RVs from a Cauchy distribution with pdf f(x) using respectively the composition method and the inverse transform method. Compare the empirical cumulatives with the theoretical cumulative and test whether the differences are significant using a Kolmogorov-Smirnov test.

References

1. Random Number Generation:

Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes: the Art of Scientific Computing. New York: Cambridge University Press, 1986. Chapter 7, "Random Numbers," pp.191-225.

Knuth, D. E., The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Reading, MA: Addison-Wesley Pub. Co., 1969. Chapter 3, "Random Numbers," pp. 1-160.

Hammersley, J. M., and D. C. Handscomb, Monte Carlo Methods. London: Methuen & Co. Ltd., 1964. Chapter 3, "Random, Pseudorandom, and Quasirandom Numbers," pp. 25-42.

Mooney, C. Z., Monte Carlo Simulation, Sage Publications, Series No. 07-116, 1997.

Fishman, G. S., Monte Carlo: Concepts, Algorithms and Applications, New York: Springer-Verlag, 1996.