A Critical Assessment of Simulated Critical Values

By John T. Cuddington* and William Navidi**

December 29, 2010

Abstract

There is a wide variety of statistical problems (e.g., unit root and cointegration tests) where hypothesis testing involves the use of simulated rather than theoretical critical values. We argue that in practice the number of replications used to simulate critical values is often insufficient to provide the degree of precision that is implied. In particular, the number of replications needed is greatest for values in the tails of the distribution. We provide recommendations for approximating the number of replications needed to achieve a desired degree of precision.

Keywords: Monte Carlo simulations, simulated critical values, unit root tests, cointegration tests.

Running Head: Simulated Critical Values

*Division of Economics and Business and **Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, CO 80421. **To whom correspondence should be addressed.

1 Introduction

There is a wide variety of statistical problems where hypothesis testing involves the use of simulated rather than theoretical critical values. In some cases finite-sample critical values are required; in other cases, asymptotic critical values must be simulated. Important examples include the simulated critical values needed to carry out Dickey-Fuller and Phillips-Perron unit root tests, the Engle-Granger cointegration test, and unit root tests in the presence of breaks at known or unknown dates (Perron, Zivot-Andrews). Authors devising these new tests typically provide simulated critical values (CVs), often reporting two or even more significant digits for various sample sizes. Reporting CVs to two or more digits implies something about the purported precision of the underlying calculations, which is determined largely by the number of replications used in the Monte Carlo simulations.
The objective of a typical simulation exercise is to obtain one or more simulated critical values or quantiles for a test statistic calculated for a variety of sample sizes. For example, one might wish to obtain estimates of the 1%, 5%, and 10% critical values for a test statistic calculated from sample sizes of T = 25, 50, 100, and 250. For each sample size Ti, the following five steps are carried out.

1. Specify the data generating process (DGP) under the null hypothesis.

2. Using the DGP, generate a sample of length Ti. For some dynamic models, it may be necessary to sample Ti + k observations and discard the first k to eliminate the effects of initial conditions.

3. Calculate the statistic of interest using the sample of length Ti. We denote this statistic S(Ti).

4. Repeat steps 2 and 3 r times to obtain i.i.d. observations on the test statistic of interest. These observations from the r replications, S1(Ti), ..., Sr(Ti), are an estimate of the sampling distribution of the finite-sample statistic S(Ti).[1]

5. Use the rα order statistics for α = 0.01, 0.05, and 0.10, say, to estimate the desired critical values or quantiles for the finite-sample statistic S(Ti).

Even when simulation results are reported for a range of sample sizes, researchers attempting to apply them typically have sample sizes that differ from those that were simulated, necessitating some form of interpolation (at least implicitly). In a series of papers, MacKinnon has proposed a method in which response surface regressions are used to accurately estimate critical values of the asymptotic distribution of S(T) (as the sample size T goes to infinity), and to estimate critical values for finite sample sizes by interpolation. The method involves simulating a critical value, say the α quantile, for a range of sample sizes T1, T2, ..., Tn. Let q^α(Ti) be the estimated α quantile for sample size Ti. Each q^α(Ti) is obtained from a simulation with r replications.
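For concreteness, the five-step procedure above can be sketched in a few lines of Python. The DGP and statistic below (i.i.d. N(0,1) draws and the standardized sample mean) are illustrative stand-ins chosen so the true critical values are known, not a DGP from any of the papers cited:

```python
import math
import random

def simulate_critical_values(statistic, T, r, alphas=(0.01, 0.05, 0.10), seed=12345):
    """Steps 2-5 of the procedure: draw r samples of length T from the DGP,
    compute the statistic on each, and read off the r*alpha order statistics."""
    rng = random.Random(seed)
    # Steps 2-4: here the DGP is simply i.i.d. N(0,1) (an illustrative assumption).
    stats = sorted(statistic([rng.gauss(0.0, 1.0) for _ in range(T)])
                   for _ in range(r))
    # Step 5: the (r*alpha)-th order statistic estimates the alpha quantile.
    return {a: stats[int(r * a) - 1] for a in alphas}

def std_mean(x):
    # Stand-in statistic: the standardized sample mean, exactly N(0,1) here.
    return math.sqrt(len(x)) * sum(x) / len(x)

cvs = simulate_critical_values(std_mean, T=25, r=20_000)
```

With r = 20,000 the estimated 5% critical value typically lands within a few hundredths of the true value of −1.645; how close one can expect to get, as a function of r and the quantile, is exactly the question taken up below.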
The response surface regression corresponding to the α quantile is:

q^α(Ti) = θ^α_∞ + θ^α_1 Ti^−1 + θ^α_2 Ti^−2 + θ^α_3 Ti^−3 + εi

In this equation, θ^α_∞ represents the α quantile of the asymptotic distribution, and the remaining terms reflect the rate of convergence (1/T) of the quantiles of the finite-sample distributions to the quantile of the asymptotic distribution. One can substitute any chosen value of T into the right-hand side of the estimated response surface regression to obtain the estimated critical value for sample size T. This interpolation method is increasingly being used to get accurate finite-sample critical values for complicated time series estimators.

Since the early 1990s, MacKinnon has argued forcefully that large numbers of replications are required in simulation experiments in order to obtain accurate critical values. For example, his 1991 paper uses 25,000 replications for each experiment. Additionally, he repeats each experiment 40 times so that the total number of replications is equal to 1,000,000. In the 2010 reissue and extension of his original 1991 working paper, MacKinnon uses 500 sets of experiments with 200,000 replications in each experiment for a grand total of 100 million replications (for each sample size considered)! Note that in terms of total replications, MacKinnon recommends a "divide and conquer" approach. That is, he runs 500 experiments with 200,000 replications rather than one experiment with 100 million.

[1] It is important that r be large enough to obtain sufficient granularity in the CDF of S(Ti) to obtain the rα order statistic. For example, if r = 20, it would not be possible to determine the 1% quantile! One would need at least 100 replications.
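A hedged sketch of how such a response surface regression might be fit by ordinary least squares follows. The sample sizes, coefficients, and quantile curve below are synthetic, chosen only to illustrate recovering the asymptotic quantile θ^α_∞; they are not estimates for any actual test statistic, and in practice each q would come from its own simulation and carry simulation noise:

```python
def fit_response_surface(T_vals, q_vals):
    """OLS fit of q(T) = th0 + th1/T + th2/T^2 + th3/T^3 via the normal
    equations (X'X)b = X'q, solved by Gaussian elimination with pivoting."""
    X = [[1.0, 1.0 / T, T ** -2.0, T ** -3.0] for T in T_vals]
    k = 4
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    y = [sum(row[a] * q for row, q in zip(X, q_vals)) for a in range(k)]
    for c in range(k):                           # forward elimination
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[piv], y[c], y[piv] = A[piv], A[c], y[piv], y[c]
        for i in range(c + 1, k):
            mfac = A[i][c] / A[c][c]
            A[i] = [aij - mfac * acj for aij, acj in zip(A[i], A[c])]
            y[i] -= mfac * y[c]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):               # back substitution
        b[i] = (y[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# Synthetic quantile "estimates": assumed asymptotic quantile -1.95 plus
# 1/T convergence terms (all numbers made up for illustration).
T_vals = [25, 40, 60, 100, 150, 250, 400, 1000]
q_vals = [-1.95 - 8.0 / T + 25.0 / T ** 2 - 40.0 / T ** 3 for T in T_vals]
theta = fit_response_surface(T_vals, q_vals)     # theta[0] recovers about -1.95
```

The unweighted OLS fit above ignores the fact that estimated quantiles for different Ti have different sampling variances; it is a sketch of the idea, not of MacKinnon's full estimation procedure.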
MacKinnon (2000) lists several reasons for preferring the divide-and-conquer approach, including (1) the observed variation among the estimates from the 500 experiments provides a very easy way to measure the experimental randomness in the estimated quantiles, (2) less computer memory is required, (3) it reduces the sorting cost inherent in quantile calculations (as "it is cheaper to sort N numbers M times than to sort MN numbers at once"), and (4) it allows the simulation experiments to be easily divided among a number of computers for parallel processing and reduces vulnerability to power failures.

In spite of MacKinnon's admonitions, researchers simulating critical values have often based their results on woefully inadequate numbers of replications. For example, MacKinnon (2010, page 2) points out that "Engle and Granger (1987), Engle and Yoo (1987), Yoo (1987), and Phillips and Ouliaris (1990) all provide tables for one or more versions of the Engle-Granger cointegration test. But these tables are based on at most 10,000 observations, which means that they are quite inaccurate." In spite of tremendous gains in computing power over time, there is, if anything, a tendency towards smaller rather than larger numbers of replications in critical value simulation exercises. MacKinnon's papers with various coauthors are laudable exceptions. Here are a few other examples of simulation papers and the numbers of replications used: Dickey-Fuller (1979, 4,000 reps), Dickey-Fuller (1981, 50,000 reps), Schwert (1989, 10,000 reps), Zivot-Andrews (1992, 1,000 or 5,000 reps), Lumsdaine-Papell (1997, 500 reps), Nunes-Newbold-Kuan (1997, 5,000 reps), Harvey-Leybourne-Newbold (2001, 10,000 reps), Lee-Strazicich (2001, 5,000 reps), and Lee-Strazicich (2003, 2,000 or 5,000 reps). The purpose of this note is twofold.
First, we argue that, in practice, the number of replications used to simulate critical values is often insufficient to provide the degree of precision that is implied; second, we provide some recommendations for approximating, roughly, the number of replications needed to achieve a desired degree of precision. It is important to note that larger numbers of replications are needed to obtain precise critical values in the tails of the distribution.

In a typical simulation to obtain critical values, the true distribution of S(T) cannot be calculated analytically. In this paper we begin by discussing the estimation of quantiles in some situations where the true distribution is known. By examining the asymptotic distribution of the sample quantiles, we show that for some typical distributions, the number of replications needed to obtain precision to two or three significant digits is much larger than is typically used. Then we provide some suggestions for determining a number of replications that will provide a desired degree of precision in cases where the true distribution is unknown.

2 A simple example

Consider the most basic of simulation experiments. Suppose we take r independent observations of a random variable X that has the standard normal distribution. We might ask: what are the upper and lower bounds of the 95% confidence interval based on these r replications? In this trivial example, we know that the theoretical CVs are (−1.960, 1.960). The simulated CVs are found by calculating the 2.5% and 97.5% quantiles of the sampling distribution.

For a slightly more complex example, consider drawing a random sample of size n from the standard normal distribution N(0, 1). Using these n observations, we run the simplest of all regressions, where the model contains only an intercept term:

Xi = α + εi

It is, of course, well-known that the OLS estimator of α is just the sample mean.
So in this simple case:

S = (1/n) Σ_{i=1}^{n} Xi

which has the following distribution:

S ∼ N(0, 1/n)

For convenience, we will consider the standardized sample mean √n S, which has a standard normal distribution. Following the five-step procedure above, it is straightforward to replicate this procedure r times to obtain the empirical sampling distribution of this statistic.

Table 1 below shows our results of this exercise for estimating CVs of the standard normal distribution using r = 100 and r = 1,000,000, along with results for estimating CVs of the distribution of the sample mean of 100 i.i.d. standard normal random variables using r = 10,000. For r = 100, the estimated CVs are quite far from the true values. Undoubtedly, the reader's response to this finding is that the chosen r is much too small. Surprisingly, the estimated CVs for r = 1,000,000 and for the sample mean with r = 10,000 are still somewhat different from the theoretical values. These examples raise the question: just how large does r have to be, in this simplest of experiments, to get the precision of the 0.025 quantile to be 0.001? We can address this question by examining the asymptotic distribution of a sample quantile.

Table 1: True Quantiles and Estimated Quantiles for the Standard Normal Distribution

Quantile                                  0.01    0.025   0.05    0.10    0.50    0.90    0.95    0.975   0.99
True Value                               −2.326  −1.960  −1.645  −1.282   0.000   1.282   1.645   1.960   2.326
Estimated Value (n = 1, r = 100)         −2.550  −2.227  −1.988  −1.326  −0.172   1.519   1.941   2.150   2.425
Estimated Value (n = 1, r = 1,000,000)   −2.322  −1.955  −1.641  −1.281   0.000   1.278   1.640   1.957   2.321
Estimated Value (n = 100, r = 10,000)    −2.377  −1.954  −1.651  −1.281  −0.002   1.290   1.658   1.975   2.294

3 The asymptotic distribution of a sample quantile

Let F(x) be any absolutely continuous, strictly increasing cumulative distribution function, and let f(x) = F′(x) be the corresponding probability density function.
In the context of the discussion above, F(x) is the cdf of the statistic S(T). Let 0 < p < 1, and let zp = F⁻¹(p) be the pth quantile of F. The goal is to estimate the critical value zp and find the asymptotic distribution of the estimate. Let S1, ..., Sr be i.i.d. with cdf F(x), and let ẑp be the estimated critical value, for which the proportion of the Si that are less than ẑp is p. It is known that for r sufficiently large, ẑp is approximately normally distributed with

E(ẑp) = zp  and  V(ẑp) = p(1 − p) / (r f(zp)²)

[see, e.g., Walker (1968)]. We provide a brief derivation of this result. To start with, let p̂ be the proportion of the Si that are less than zp. Then by the Central Limit Theorem, p̂ is approximately normally distributed with mean p and variance p(1 − p)/r. Using the delta method, it follows that F⁻¹(p̂) is approximately normal with mean F⁻¹(p) and variance [dF⁻¹(p)/dp]² p(1 − p)/r. Now since F⁻¹(p) = zp and since dF⁻¹(p)/dp = 1/f(F⁻¹(p)) = 1/f(zp), F⁻¹(p̂) is approximately normal with mean zp and variance p(1 − p)/(r f(zp)²).

It can be shown using a stochastic equicontinuity argument (e.g., Andrews, 1994) that ẑp ≈ F⁻¹(p̂); in fact the difference between them is of order 1/r in probability. Since this difference is of higher order than the standard deviation of F⁻¹(p̂), the asymptotic distribution of ẑp is the same as that of F⁻¹(p̂). Therefore ẑp is asymptotically normal with E(ẑp) = zp and V(ẑp) = p(1 − p)/(r f(zp)²).

Some general observations follow. For a fixed number of replications, the variance is a function of p. The numerator is maximized at p = 0.5, and decreases quadratically as p tends toward 0 or 1. For values of zp where the density is large, the denominator is larger, so the variance is smaller. Intuitively, the reason that such values of zp can be estimated more precisely is that there will be comparatively many observations in neighborhoods of values where the density is large.
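The variance formula can be checked numerically. The sketch below (our illustration, not code from any of the sources cited) re-estimates the 2.5% quantile of N(0,1) many times, each from r = 2,000 replications, and compares the empirical standard deviation of the estimates with the asymptotic formula:

```python
import math
import random

def order_stat_quantile(xs, p):
    """Order-statistic estimate of the p-quantile (step 5 of the procedure)."""
    xs = sorted(xs)
    return xs[max(int(len(xs) * p) - 1, 0)]

p, r, m = 0.025, 2_000, 400
rng = random.Random(7)
# Estimate the 2.5% quantile of N(0,1) m separate times, r replications each.
ests = [order_stat_quantile([rng.gauss(0.0, 1.0) for _ in range(r)], p)
        for _ in range(m)]
mean_est = sum(ests) / m
sd_emp = math.sqrt(sum((e - mean_est) ** 2 for e in ests) / (m - 1))

# Asymptotic formula: sd = sqrt(p(1 - p) / (r f(z_p)^2)), f the N(0,1) density.
z_p = -1.95996
f_zp = math.exp(-z_p ** 2 / 2) / math.sqrt(2 * math.pi)
sd_asym = math.sqrt(p * (1 - p) / (r * f_zp ** 2))   # about 0.06
```

With these settings the empirical and asymptotic standard deviations agree to roughly two decimal places.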
One needs more replications to get precise CVs for small values of p because relatively few of the replications produce information regarding the tails of the sampling distribution.

Relationship between V(ẑp) and p

For the normal distribution, the density f(zp) is maximized at p = 0.5, and decreases exponentially as p tends toward 0 or 1. As a result, the variance is minimized at p = 0.5 and increases as p approaches 0 or 1. In contrast, the uniform distribution on [0, 1] has constant density f(x) = 1 on [0, 1]. In this case the variance is maximized at p = 0.5 and decreases quadratically as p approaches 0 or 1. Figure 1 presents a plot of the standard deviation of ẑp (multiplied by √r) as a function of p for both the standard normal distribution and the uniform distribution on [0, 1].

[Figure 1: Standard deviation (multiplied by √r) of ẑp as a function of p for the standard normal distribution (left) and the uniform distribution on [0, 1] (right).]

The behavior observed for the normal distribution in Figure 1 is typical of distributions supported on the whole line. To see this, write p = F(zp), so that

V(ẑp) = F(zp)(1 − F(zp)) / (r f(zp)²).

Applying L'Hospital's rule, we find lim_{p→0} rV(ẑp) = lim_{p→0} 1/(2f′(zp)). Similarly, lim_{p→1} rV(ẑp) = lim_{p→1} −1/(2f′(zp)). For distributions supported on the whole line, zp → ∞ as p → 1 and zp → −∞ as p → 0. Assuming there are not infinitely many sign changes in f′(z), then f′(z) < 0 for sufficiently large z, and lim_{z→∞} f′(z) = 0. It follows that lim_{p→1} V(ẑp) = ∞. Similarly, lim_{p→0} V(ẑp) = ∞ as well.
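The two curves plotted in Figure 1 come directly from the variance formula; a small sketch (the quantile grid is arbitrary, and `statistics.NormalDist` supplies the standard normal pdf and inverse cdf):

```python
import math
from statistics import NormalDist

def scaled_quantile_sd(p, pdf, inv_cdf):
    """sqrt(r) times the asymptotic sd of the sample p-quantile:
    sqrt(p(1 - p)) / f(z_p)."""
    return math.sqrt(p * (1 - p)) / pdf(inv_cdf(p))

nd = NormalDist()
grid = (0.01, 0.10, 0.50, 0.90, 0.99)
normal_sd = {p: scaled_quantile_sd(p, nd.pdf, nd.inv_cdf) for p in grid}
# Uniform on [0, 1]: f(z_p) = 1 everywhere, so the curve is just sqrt(p(1 - p)).
uniform_sd = {p: math.sqrt(p * (1 - p)) for p in grid}
```

The computed values reproduce the shapes in Figure 1: for the normal the scaled standard deviation rises toward the tails, while for the uniform it falls to zero there.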
For distributions supported on a finite interval, the dependence of V(ẑp) on p may be quite different. A striking example is the uniform distribution on [0, 1]. Here f(zp) = 1 for all p, so V(ẑp) = p(1 − p)/r. Now the variance is maximized at p = 0.5, and decreases to 0 as p tends toward 0 or 1 (see Figure 1).

An example

We'll consider the standard normal distribution, with density f(x) = φ(x) = (2π)^{−1/2} e^{−x²/2}. The most commonly used quantiles are p = 0.025 and p = 0.975, which are used to construct 95% confidence intervals. The true critical values are ±1.9600 to four decimal places. In particular, when r = 1,000,000 and p = 0.025 or 0.975, then zp = ±1.96 and φ(zp) = 0.058. The standard deviation of ẑp turns out to be about 0.003.

One can use the formula for the variance to compute the number of replications needed to estimate a quantile to a given level of precision. For example, to estimate the 0.025 quantile of the normal distribution with a precision of ±0.001 with 95% confidence, one must find the value of r that makes the standard deviation equal to 0.001/1.96. It turns out that r ≈ 27,400,000!

Figure 2 presents a plot of the approximate number of replications needed to estimate zp with a precision of ±0.001 with 95% confidence for the standard normal distribution. The approximation is based on estimating the true distribution of the sample quantile with its asymptotic normal distribution.

[Figure 2: Number of replications needed to estimate zp with a precision of ±0.001 with 95% confidence for the standard normal distribution.]

The standard error is inversely proportional to the square root of the number of replications, so the number of replications needed is inversely proportional to the square of the desired precision. Therefore, for example, if a precision of only ±0.01 were needed, the number of replications would be 1/100 of that shown in Figure 2.
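The required-replications calculation can be reproduced in a few lines. The function below is our illustration of the formula, with the standard error taken as precision/1.96 as in the text:

```python
import math
from statistics import NormalDist

def replications_needed(p, precision, dist=NormalDist()):
    """r = p(1 - p) / (sigma^2 f(z_p)^2), where sigma = precision / 1.96, so
    that the sample p-quantile is within +/- precision with ~95% confidence."""
    sigma = precision / 1.96
    f_zp = dist.pdf(dist.inv_cdf(p))
    return p * (1 - p) / (sigma ** 2 * f_zp ** 2)

r_needed = replications_needed(0.025, 0.001)   # roughly 27.4 million
```

Evaluating the function over a grid of p reproduces the curve in Figure 2, and the quadratic scaling in precision is immediate: relaxing the precision by a factor of 10 cuts the required replications by a factor of 100.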
The effect of the sample size and number of replications on precision

Let S̄n represent the sample mean of n i.i.d. observations from the standard normal distribution, and consider estimating the pth quantile of the distribution of S̄n on the basis of r replications. The density of S̄n is √n φ(√n x), and the pth quantile is zp/√n, where zp is the pth quantile of N(0, 1). It follows that the asymptotic variance of the estimated pth quantile is

p(1 − p) / (r n φ(zp)²)

We see that the asymptotic variance is inversely proportional to the product rn. Therefore, when estimating a normal quantile, it does not matter whether one takes r samples of size n or one sample of size rn. For sample means of other distributions, if n is large enough, the density of the sample mean is approximately normal, so the asymptotic variance again should be approximately inversely proportional to rn.

4 CV simulation recommendations

It is important to assess the precision of simulated critical values. When the density of a test statistic S is known, the asymptotic distribution for a particular critical value can be used to compute the uncertainty. For the normal distribution (see Figure 1), there is greater uncertainty associated with the critical values for extreme quantiles. Therefore it takes more replications to estimate the 1% critical value with precision 0.001 than to estimate the 50% critical value with the same precision. Suppose that we wish to estimate critical values for the 2.5% and 97.5% quantiles of a standard normal distribution with a precision of 0.01 (equivalent to a standard error of 0.005). Our analysis suggests that this precision can be attained with approximately 285,000 replications. Table 2 shows other critical values and the recommended numbers of replications needed for precision of 0.100, 0.010, and 0.001. If the researcher is content with a precision of 0.100 for the 2.5/97.5% quantile, for example, only 2,850 replications are needed.
On the other hand, for precision of 0.001 on this quantile, roughly 28,500,000 replications are needed. This may be impractical if many simulations need to be run for a collection of model specifications. For the more extreme 1.0/99.0% quantile, even more replications are required. For researchers who are using only 5,000 replications, say, to simulate critical values, it seems reasonable to produce tables of critical values with only one digit (rather than two or three) to accurately indicate significant digits.

Table 2: Recommended Number of Replications to Produce Simulated Critical Values for Various Quantiles of the Standard Normal Distribution with Desired Precision

                       Desired Precision
Quantiles          0.100      0.010        0.001
10.0 or 90.0%      1,170    117,000   11,700,000
 5.0 or 95.0%      1,790    179,000   17,900,000
 2.5 or 97.5%      2,850    285,000   28,500,000
 1.0 or 99.0%      5,570    557,000   55,700,000

In all cases, the number of recommended replications r is obtained from the formula

r = p(1 − p) / (σ² φ(zp)²),

where σ is the standard error of ẑp, equal to one-half the desired precision.

In many practical applications, the density of S will not be known. In these cases one might proceed by estimating the density with a standard smoothing technique, and then using that estimate to approximate the asymptotic distribution of the sample quantile. Here we describe a nonparametric bootstrap approach. Choose a value R, and generate R replications S1, ..., SR. The value R should be large enough to provide a reasonable approximation in the bootstrap procedure we will describe, but need not be large enough to provide the desired degree of precision. Partition the R values of S into m subsamples of size r = R/m. (It should be feasible to take, for example, R = 100,000, m = 100, and r = 1,000; smaller values may suffice in some situations.) Then compute the sample quantile ẑp from each subsample, obtaining estimates ẑp(1), ..., ẑp(m). Next compute the sample standard deviation sr of these estimates.
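The subsampling procedure just described might be sketched as follows; the choices R = 100,000 and m = 100, and the use of N(0,1) draws as a stand-in for the simulated statistic S, are illustrative only:

```python
import math
import random

def quantile_sd_by_subsampling(draws, p, m):
    """Split R = len(draws) simulated statistics into m subsamples of size
    r = R // m, estimate the p-quantile in each, and return (r, the sample
    sd of the m estimates) -- an estimate of the sd of the sample quantile
    based on r replications."""
    r = len(draws) // m
    ests = []
    for j in range(m):
        sub = sorted(draws[j * r:(j + 1) * r])
        ests.append(sub[max(int(r * p) - 1, 0)])
    mean = sum(ests) / m
    sd = math.sqrt(sum((e - mean) ** 2 for e in ests) / (m - 1))
    return r, sd

# Illustration with a known stand-in for S: R = 100,000 draws from N(0,1).
rng = random.Random(3)
draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
r, s_r = quantile_sd_by_subsampling(draws, p=0.025, m=100)
# Scale to any planned number of replications M: sd is about s_r * sqrt(r / M).
sd_at_M = s_r * math.sqrt(r / 1_000_000)
```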
The value of sr is an estimate of the standard deviation of the sample quantile based on r replications.[2] For any large number of replications M, we can then approximate the distribution of the sample quantile based on M replications with N(zp, r sr²/M).

[2] This recommendation amounts to MacKinnon's "divide and conquer" approach discussed in the introduction.

References

Andrews, D. (1994). Empirical Process Methods in Econometrics. In Handbook of Econometrics, R.F. Engle and D.L. McFadden, eds. Elsevier Science B.V.

Dickey, David A. and Wayne A. Fuller (1979). Distribution of the Estimators for Autoregressive Time Series with a Unit Root. Journal of the American Statistical Association, 74:427–431.

Dickey, David A. and Wayne A. Fuller (1981). Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root. Econometrica, 49 (July):1057–1072.

Engle, Robert F. and C. W. J. Granger (1987). Cointegration and Error Correction: Representation, Estimation, and Testing. Econometrica, 55:251–276.

Engle, R.F. and B.S. Yoo (1991). Cointegrated Economic Time Series: An Overview with New Results. In Long-Run Economic Relationships: Readings in Cointegration, R.F. Engle and C.W.J. Granger (eds). Oxford: Oxford University Press.

Harvey, D., Leybourne, S. and P. Newbold (2001). Innovational Outlier Unit Root Tests with an Endogenously Determined Break in Level. Oxford Bulletin of Economics and Statistics, 63:559–575.

Lee, Junsoo and Mark C. Strazicich (2001). Break Point Estimation and Spurious Rejections with Endogenous Unit Root Tests. Oxford Bulletin of Economics and Statistics, 63:535–558.

Lee, Junsoo and Mark C. Strazicich (2003). Minimum LM Unit Root Test with Two Structural Breaks. Review of Economics and Statistics, 85:1082–1089.

Lumsdaine, R. and D. Papell (1997). Multiple Trend Breaks and the Unit Root Hypothesis. Review of Economics and Statistics, 79:212–218.

MacKinnon, James G. (1991). Critical Values for Cointegration Tests. Chapter 13 in R. F.
Engle and C. W. J. Granger (eds.), Long-run Economic Relationships: Readings in Cointegration, Oxford: Oxford University Press.

MacKinnon, James G. (1996). Numerical Distribution Functions for Unit Root and Cointegration Tests. Journal of Applied Econometrics, 11:601–618.

MacKinnon, James G. (2000). Computing Numerical Distribution Functions in Econometrics. In High Performance Computing Systems and Applications, A. Pollard, D. Mewhort, and D. Weaver, eds. Amsterdam: Kluwer, 455–470.

MacKinnon, James G. (2010). Critical Values for Cointegration Tests. Queen's University Working Paper No. 1227, 2010. [This paper updates and extends MacKinnon (1991).]

Nunes, L., P. Newbold, and C. Kuan (1997). Testing for Unit Roots with Breaks: Evidence on the Great Crash and the Unit Root Hypothesis Reconsidered. Oxford Bulletin of Economics and Statistics, 59:435–448.

Perron, Pierre (1989). The Great Crash, the Oil Price Shock, and the Unit Root Hypothesis. Econometrica, 57:1361–1401.

Phillips, P.C.B. and S. Ouliaris (1990). Asymptotic Properties of Residual-Based Tests for Cointegration. Econometrica, 58:165–193.

Phillips, P.C.B. and P. Perron (1988). Testing for a Unit Root in Time Series Regression. Biometrika, 75:335–346.

Schwert, G. William (1989). Tests for Unit Roots: A Monte Carlo Investigation. Journal of Business and Economic Statistics, 7:147–159.

Walker, A. (1968). A Note on the Asymptotic Distribution of Sample Quantiles. Journal of the Royal Statistical Society, Series B, 30:570–575.

Yoo, B.S. (1987). Co-integrated Time Series: Structure, Forecasting and Testing. Unpublished Ph.D. Dissertation, University of California, San Diego.

Zivot, E. and D. Andrews (1992). Further Evidence on the Great Crash, the Oil-Price Shock, and the Unit-Root Hypothesis. Journal of Business and Economic Statistics, 10:251–270.