Econometrica, Vol. 72, No. 5 (September, 2004), 1519–1563

WAVELET-BASED TESTING FOR SERIAL CORRELATION OF UNKNOWN FORM IN PANEL MODELS

BY YONGMIAO HONG AND CHIHWA KAO¹

Wavelet analysis is a new mathematical method developed as a unified field of science over the last decade or so. As a spatially adaptive analytic tool, wavelets are useful for capturing serial correlation where the spectrum has peaks or kinks, as can arise from persistent dependence, seasonality, and other kinds of periodicity. This paper proposes a new class of generally applicable wavelet-based tests for serial correlation of unknown form in the estimated residuals of a panel regression model, where error components can be one-way or two-way, individual and time effects can be fixed or random, and regressors may contain lagged dependent variables or deterministic/stochastic trending variables. Our tests are applicable to unbalanced heterogeneous panel data. They have a convenient null limit $N(0,1)$ distribution. No formulation of an alternative model is required, and our tests are consistent against serial correlation of unknown form even in the presence of substantial inhomogeneity in serial correlation across individuals. This is in contrast to existing serial correlation tests for panel models, which ignore inhomogeneity in serial correlation across individuals by assuming a common alternative, and thus have no power against the alternatives where the average of serial correlations among individuals is close to zero. We propose and justify a data-driven method to choose the smoothing parameter (the finest scale in wavelet spectral estimation), making the tests completely operational in practice. The data-driven finest scale automatically converges to zero under the null hypothesis of no serial correlation and diverges to infinity as the sample size increases under the alternative, ensuring the consistency of our tests.
Simulation shows that our tests perform well in small and finite samples relative to some existing tests.

KEYWORDS: Error component, hypothesis testing, serial correlation of unknown form, spectral peak, static and dynamic panel models, unbalanced panel data, wavelet.

¹ We thank the co-editor and three referees for insightful comments that have led to significant improvement on a previous version. We also thank Badi Baltagi, Pierre Duchesne, Jiti Gao, Jerry Hausman, Cheng Hsiao, Hashem Pesaran, Jim Stock, and seminar participants at MIT-Harvard Econometrics Workshop, 2001 North American Summer Meeting of Econometric Society in Washington, DC, 2001 Far Eastern Meeting of Econometric Society in Kobe, Japan, Fifth ICSA International Conference in Hong Kong, and 10th International Conference on Panel Data Models in Berlin for helpful comments. This research is supported by the National Science Foundation via Grant SES-0111769.

1. INTRODUCTION

PANEL DATA HAVE BEEN WIDELY USED in economics and finance. They often provide insights not available in pure time-series or cross-sectional data (e.g., Baltagi (2002), Granger (1996), Hsiao (2003)). This paper proposes a new class of generally applicable wavelet-based consistent tests for serial correlation of unknown form in the errors of panel models. It is important to test serial correlation for panel models because the existence of serial correlation will invalidate conventional tests such as t- and F-tests, which use standard covariance estimators of parameter estimators, and will indicate model misspecification when regressors include lagged dependent variables. Moreover, the choice of estimation methods may depend on whether there exists serial correlation in the errors of panel models.
When the errors are serially correlated, for example, the computation of MLE (e.g., Anderson and Hsiao (1982), Hsiao (2003), Binder, Hsiao, and Pesaran (1999)) and GMM (e.g., Blundell and Bond (1998)) could be complicated, and the feasible GLS estimator will be invalid or have to be modified substantially (e.g., Baltagi and Li (1991)). Some procedures, such as Breusch and Pagan’s (1980) tests for random effects, also assume serial uncorrelatedness in the errors of panel models.

There have been some tests for serial correlation in panel models. Bhargava, Franzini, and Narendranathan (1982) extend Durbin and Watson’s (1951) test to static panel models. Breusch and Pagan (1980) propose an LM test for first-order serial correlation, assuming no random effects. Baltagi and Li (1995) propose a class of LM tests for first-order serial correlation, allowing random or fixed individual effects. Bera, Sosa-Escudero, and Yoon (2001) also propose a convenient OLS-based test for first-order serial dependence. Li and Hsiao (1998) propose tests for first-order and higher-order serial correlation in a semiparametric partially linear panel model.

All existing tests for serial correlation in panel models assume a known form of a common alternative, e.g., an AR(1) or MA(1) model. These tests have optimal power when the assumed model is true, but they are not consistent against serial correlation of unknown form. It is useful to test serial correlation of unknown form because prior information about the alternative is usually not available in practice. This is particularly relevant for panel models because there may exist significant inhomogeneity in serial correlation across individuals (e.g., Choi (2002)). By assuming a common alternative model, existing tests ignore inhomogeneity in serial correlation across individuals, and so have little power against the alternatives where the average of serial correlations among individuals is close to zero. Moreover, as Granger and Newbold (1977, p.
92) pointed out, the first few lags of OLS residuals of a misspecified linear dynamic model often appear like white noise, due to the very nature of OLS. It is therefore important to check serial correlation at higher order lags. Little effort has been made on evaluation of dynamic panel models (Granger (1996)).

Wavelets are a new mathematical tool alternative to the Fourier transform. They can effectively capture nonsmooth features such as singularities and spatial inhomogeneity (e.g., Donoho and Johnstone (1994, 1995a, 1995b), Donoho et al. (1996), Gao (1997), Hong and Lee (2001), Jensen (2000), Neumann (1996), Lee and Hong (2001), Ramsey (1999), Wang (1995)). Many economic and financial time series have a spectrum with peaks and kinks, as can arise from persistent dependence, business cycles, seasonality, and other kinds of periodicity (e.g., Bizer and Durlauf (1992), Granger (1969), Watson (1993)).

In this paper we use wavelets to test serial correlation in estimated residuals of panel models. Unlike existing tests, whose constructions are usually model-dependent, our tests are generally applicable. The panel model can be static or dynamic, and one-way or two-way; both balanced and unbalanced panel data are covered; individual and time effects can be fixed or random; regressors can contain lagged dependent variables or deterministic/stochastic trending variables; and no specific estimation method is required. Our tests have a convenient limit $N(0,1)$ distribution under the null hypothesis, no matter whether regressors contain lagged dependent variables or deterministic/stochastic trending variables. In contrast to Durbin and Watson’s (1951) test and Box and Pierce’s (1970) portmanteau test, parameter estimation uncertainty has no impact on the limit distribution of our test statistics when applied to dynamic panel models.
We do not require an alternative model, and our tests are consistent against serial correlation of unknown form even in the presence of substantial inhomogeneity in serial correlation across individuals. No consistent test for serial correlation of unknown form was available for panel models.

Our asymptotic theory considers a panel model with both large $n$ and $T$, where $n$ is the number of individuals and $T$ is the number of time-series observations. Increasing effort has been devoted to the study of panel models with both large $n$ and $T$, due to the growing use of cross-country data over time to study growth convergence, international R&D spillover and purchasing power parity, and to the growing use of firm- or portfolio-level financial time series. As is well known (e.g., Phillips and Moon (1999), Hahn and Kuersteiner (2002)), asymptotic analysis in panel models is much more involved than in pure time-series analysis, due to the need to handle double indices. As a distinct feature, we treat both $n \to \infty$ and $T \to \infty$ simultaneously, which complements Phillips and Moon (1999) and Hahn and Kuersteiner’s (2002) joint limit theory for panel models. Our general theory does not require that the ratio $n/T$ go to 0 or a constant. We also show that the use of the estimated residuals from a possibly nonstationary panel model rather than the unobservable errors has no impact on the null limit distribution of our test statistics.

In addition, we find several interesting features not available in pure time-series analysis. Most remarkably, the limit $N(0,1)$ distribution of our test statistics is obtained without requiring the smoothing parameter (the finest scale in wavelet estimation) to grow with $T$. This not only leads to reasonable asymptotic approximation in finite samples, but also makes it possible to use data-driven methods that deliver a fixed finest scale under the null hypothesis of no serial correlation.
This is in sharp contrast to Lee and Hong (2001), who, in testing serial correlation for observed raw time series data, require the finest scale to grow as $T \to \infty$ under the null hypothesis. We further develop a data-driven method to choose the finest scale, making our tests completely operational in practice. The data-driven finest scale converges to 0 under the null and grows to $\infty$ under the alternative, ensuring consistency against serial correlation of unknown form. We also find that a heteroskedasticity-corrected test may be less powerful than a heteroskedasticity-consistent test. This differs from the well-known estimation result that heteroskedasticity-corrected estimators (e.g., feasible GLS) are more efficient than heteroskedasticity-consistent estimators (e.g., OLS). Our tests work reasonably well in small and finite samples often encountered in economics.

We describe the panel model and hypotheses in Section 2, introduce wavelets and test statistics in Section 3, derive the limit distributions of these tests in Section 4, and establish their consistency in Section 5. Section 6 proposes a data-driven finest scale. Section 7 is a simulation study. Section 8 concludes. All proofs are in the Appendix. Throughout, $\|A\|$ denotes the Euclidean norm $[\mathrm{tr}(A'A)]^{1/2}$; $A^*$ and $\mathrm{Re}(A)$ the complex conjugate and the real part of $A$; $\mathbb{Z} \equiv \{0, \pm 1, \ldots\}$ and $\mathbb{Z}^+ \equiv \{0, 1, \ldots\}$ the set of integers and the set of nonnegative integers; and $c$ and $C$ generic bounded constants, with $0 < c < C < \infty$. Unless indicated, all limits are taken as both $n, T \to \infty$. A GAUSS program for implementing our tests is available from the authors upon request. The user only needs to supply estimated residuals.

2.
THE FRAMEWORK

Consider a linear panel process

$$Y_{it} = \alpha + X_{it}'\beta + \mu_i + \lambda_t + \varepsilon_{it} \qquad (t = 1, \ldots, T_i;\ i = 1, \ldots, n;\ n, T_i \in \mathbb{Z}^+), \tag{2.1}$$

where $Y_{it}$ is a scalar, $X_{it}$ is a $p \times 1$ vector of regressors that may contain lagged dependent variables, $\alpha$ is an intercept, $\beta$ is a $p \times 1$ vector of slope parameters, $\mu_i$ is the individual effect, $\lambda_t$ is the time effect, and $\varepsilon_{it}$ is the error such that $\{\varepsilon_{it}\}$ and $\{\varepsilon_{ls}\}$ are mutually independent for all $i \neq l$ and all $t, s$. We allow for fixed or random effects. We assume $T_i = c_i T$ for some integer $T$ and $c_i \in [c, C]$, thus permitting unbalanced panel data. Moreover, we allow $Y_{it}$, $X_{it}$, $\alpha$, and $\beta$ to depend on $n$ and $T$. (For notational simplicity, we have suppressed such dependence.)

The slope parameter $\beta$ is often of interest. It can be estimated, e.g., by the within estimator

$$\hat\beta \equiv \Big(\sum_{i=1}^{n}\sum_{t=1}^{T_i} \tilde X_{it}\tilde X_{it}'\Big)^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T_i} \tilde X_{it}\tilde Y_{it}, \tag{2.2}$$

where $\tilde X_{it} \equiv X_{it} - \bar X_{i\cdot} - \bar X_{\cdot t} + \bar X$, $\bar X_{i\cdot} \equiv T_i^{-1}\sum_{t=1}^{T_i} X_{it}$, $\bar X_{\cdot t} \equiv n^{-1}\sum_{i=1}^{n} X_{it}$, and $\bar X \equiv n^{-1}\sum_{i=1}^{n} T_i^{-1}\sum_{t=1}^{T_i} X_{it}$. The variables $\tilde Y_{it}$, $\bar Y_{i\cdot}$, $\bar Y_{\cdot t}$, and $\bar Y$ are defined similarly. For interval estimation and hypothesis testing, one often uses the standard covariance estimator $\hat\Omega_{\hat\beta} \equiv \hat\sigma_\varepsilon^2 \big(\sum_{i=1}^{n}\sum_{t=1}^{T_i} \tilde X_{it}\tilde X_{it}'\big)^{-1}$ of $\hat\beta$, where $\hat\sigma_\varepsilon^2$ is an estimator for $\sigma_\varepsilon^2 \equiv \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n} E(\varepsilon_{it}^2)$. This estimator is valid when $\{\varepsilon_{it}\}$ in (2.1) is serially uncorrelated, among other things. The existence of serial correlation of any form, however, will generally invalidate the covariance estimator and related inference. In particular, conventional t- and F-tests will be misleading. Moreover, when the $X_{it}$ contain lagged dependent variables, serial correlation will further render inconsistent the within estimator $\hat\beta$ for $\beta$.

We are interested in testing whether the error process $\{\varepsilon_{it}\}$ is serially correlated. The hypotheses of interest are $H_0 : \mathrm{cov}(\varepsilon_{it}, \varepsilon_{it-|h|}) = 0$ for all $h \neq 0$ and all $i$ vs.
$H_A : \mathrm{cov}(\varepsilon_{it}, \varepsilon_{it-|h|}) \neq 0$ at least for some $h \neq 0$ and some $i$. The alternative $H_A$ allows some (but not all) individual series to be white noises. Prior information about the alternative is usually not available in practice, although there may exist substantial inhomogeneity in serial correlation across $i$. To test $H_0$ we will examine serial correlation in the demeaned estimated residual

$$\hat v_{it} \equiv \hat u_{it} - \bar u_{i\cdot} - \bar u_{\cdot t} + \bar u \qquad (t = 1, \ldots, T_i;\ i = 1, \ldots, n), \tag{2.3}$$

where $\hat u_{it} \equiv Y_{it} - X_{it}'\hat\beta$, $\bar u_{i\cdot} \equiv T_i^{-1}\sum_{t=1}^{T_i}\hat u_{it}$, $\bar u_{\cdot t} \equiv n^{-1}\sum_{i=1}^{n}\hat u_{it}$, $\bar u \equiv n^{-1}\sum_{i=1}^{n} T_i^{-1}\sum_{t=1}^{T_i}\hat u_{it}$, and $\hat\beta$ is an estimator consistent for $\beta$ under $H_0$. When $\hat\beta$ is the within estimator in (2.2), $\hat v_{it}$ is the well known within residual. However, we allow using other estimators that are consistent for $\beta$ under $H_0$ but not necessarily under $H_A$.

Suppose $\hat\beta \to^{p} \beta$, as does the within estimator in (2.2) under $H_0$; then $\hat v_{it}$ will converge in probability to the true error $\varepsilon_{it}$. Under $H_A$, however, $\hat\beta$ may not be consistent for $\beta$ (as in a dynamic panel model), and $\hat v_{it}$ will converge in probability to the model error

$$v_{it} \equiv \varepsilon_{it} + (\beta - \beta^*)'(X_{it} - E\bar X_{i\cdot} - E\bar X_{\cdot t} + E\bar X), \tag{2.4}$$

which contains both the true error $\varepsilon_{it}$ and the misspecified component, where $\beta^* \equiv \mathrm{p\,lim}\,\hat\beta$. This does not invalidate our tests, but it affects the power of the tests in finite samples, because serial correlation in $\{v_{it}\}$ may differ from serial correlation in $\{\varepsilon_{it}\}$. However, our tests are still consistent against $H_A$ because $H_0$ holds if and only if $\{v_{it}\}$ is serially uncorrelated: When $H_0$ holds, $\{v_{it}\}$ coincides with $\{\varepsilon_{it}\}$ and so is serially uncorrelated; on the other hand, if $\{v_{it}\}$ is serially uncorrelated in a linear dynamic panel model, we can view $\{v_{it}\}$ as the true error for (2.1), and estimation and inference can be implemented in a standard fashion.
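The demeaning step in (2.3) can be illustrated with a minimal sketch. It assumes a balanced panel stored as an $n \times T$ array, which is a simplification of the unbalanced setting in the text, and the function name `two_way_demean` and the simulated inputs are hypothetical, chosen only for this example.

```python
import numpy as np

def two_way_demean(u):
    """Return v[i, t] = u[i, t] - ubar_i. - ubar_.t + ubar for a balanced n x T array."""
    u = np.asarray(u, dtype=float)
    row_mean = u.mean(axis=1, keepdims=True)   # individual means, shape (n, 1)
    col_mean = u.mean(axis=0, keepdims=True)   # time means, shape (1, T)
    grand_mean = u.mean()                      # overall mean (balanced case)
    return u - row_mean - col_mean + grand_mean

# Hypothetical residuals: intercept + individual effect + time effect + noise.
rng = np.random.default_rng(0)
n, T = 5, 8
mu = rng.normal(size=(n, 1))     # individual effects
lam = rng.normal(size=(1, T))    # time effects
eps = rng.normal(size=(n, T))    # idiosyncratic errors
u = 1.5 + mu + lam + eps
v = two_way_demean(u)
```

In this balanced sketch the transformation removes the intercept and both effects exactly, so `v` equals the demeaned idiosyncratic errors and has zero mean along each row and each column.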
(We note that in a linear dynamic panel setup, it is possible that $\{\varepsilon_{it}\}$ is serially correlated but $\{v_{it}\}$ is serially uncorrelated, due to $\beta^* \neq \beta$. This occurs when and only when $\varepsilon_{it}$ contains the misspecified linear component, $(\beta^* - \beta)'(X_{it} - E\bar X_{i\cdot} - E\bar X_{\cdot t} + E\bar X)$, plus a white noise. In this case, serial correlation in $\{\varepsilon_{it}\}$ is solely caused by the misspecified linear component, and it is actually more appropriate to view that (2.1) is a correctly specified linear dynamic panel model, but with $v_{it}$ as the true error and $\beta^*$ as the true model parameter. With such an interpretation, $H_0$ holds when $\{v_{it}\}$ is serially uncorrelated.)

Suppose $\{v_{it}\}$ has autocovariance function $R_i(h) \equiv E(v_{it}v_{it-|h|})$ and power spectrum

$$f_i(\omega) \equiv (2\pi)^{-1}\sum_{h=-\infty}^{\infty} R_i(h)e^{-\mathrm{i}h\omega}, \qquad \omega \in [-\pi, \pi],\ \mathrm{i} \equiv \sqrt{-1}. \tag{2.5}$$

Both $R_i(h)$ and $f_i(\omega)$ contain the same information on serial correlation of $\{v_{it}\}$. One can use $R_i(h)$ or $f_i(\omega)$ to test $H_0$ vs. $H_A$. All existing tests for serial correlation in panel models are based on $R_i(h)$, assuming a common model with some prespecified lags $h$ for all $i$ (e.g., AR(1) and MA(1)). We use $f_i(\omega)$ here, which is a natural tool to test serial correlation of unknown form, because it contains information on serial correlation at all lags. Under $H_0$, $f_i(\omega)$ becomes $f_{i0}(\omega) \equiv (2\pi)^{-1}R_i(0)$ for all $\omega \in [-\pi, \pi]$. Under $H_A$, $f_i(\omega) \neq (2\pi)^{-1}R_i(0)$ at least for some $i$. Thus, a consistent test for $H_0$ vs. $H_A$ can be formed by comparing consistent estimators of $f_i(\omega)$ and $f_{i0}(\omega)$. We will use wavelets to estimate $f_i(\omega)$, which are suitable for time series with spectral peaks and kinks. Of course, other nonparametric methods (e.g., kernel smoothing; see Hong (1996) and Section 7 below) could be used.

3. WAVELET METHOD

3.1. Wavelets

The essence of wavelet analysis is to expand a function as a sum of elementary functions called wavelets centered at a sequence of locations. These wavelets are derived from a single function $\psi(\cdot)$, called the mother wavelet, by translations and dilations.
As a spatially adaptive analytic tool, wavelets are powerful in capturing singularities of nonsmooth functions, such as spectral peaks and kinks (e.g., Gao (1997), Neumann (1996)). We first impose a standard condition on $\psi(\cdot)$.

ASSUMPTION 1: $\psi : \mathbb{R} \to \mathbb{R}$ is an orthonormal wavelet such that $\int_{-\infty}^{\infty}\psi(x)\,dx = 0$, $\int_{-\infty}^{\infty}|\psi(x)|\,dx < \infty$, $\int_{-\infty}^{\infty}\psi(x)\psi(x-k)\,dx = 0$ for all $k \in \mathbb{Z}$, $k \neq 0$, and $\int_{-\infty}^{\infty}\psi^2(x)\,dx = 1$.

The orthonormality of $\psi(\cdot)$ ensures that the doubly infinite sequence $\{\psi_{jk}(\cdot)\}$, where

$$\psi_{jk}(x) \equiv 2^{j/2}\psi(2^j x - k), \qquad j, k \in \mathbb{Z}, \tag{3.1}$$

constitutes an orthonormal basis for $L_2(\mathbb{R})$, the space of square-integrable functions on $\mathbb{R}$ (Daubechies (1992)). The integers $j$ and $k$ are called scale and translation parameters. Intuitively, $j$ localizes analysis in frequency and $k$ localizes analysis in time or space. Assumption 1 ensures that the Fourier transform of $\psi(\cdot)$,

$$\hat\psi(z) \equiv (2\pi)^{-1/2}\int_{-\infty}^{\infty}\psi(x)e^{-\mathrm{i}zx}\,dx, \qquad z \in \mathbb{R}, \tag{3.2}$$

exists and is continuous in $z$ almost everywhere. We impose a condition on $\hat\psi(\cdot)$.

ASSUMPTION 2: (a) $|\hat\psi(z)| \le C(1 + |z|)^{-\tau}$ for some $\tau > \frac{3}{2}$; (b) $\hat\psi(z) = e^{\mathrm{i}z/2}b(z)$ or $\hat\psi(z) = -\mathrm{i}e^{\mathrm{i}z/2}b(z)$, where $b(\cdot)$ is real-valued with $b(0) = 0$.

Many wavelets satisfy this. One example is spline wavelets of positive order $m \in \mathbb{Z}^+$. For odd $m$, this family has $\hat\psi(z) = e^{\mathrm{i}z/2}b(z)$, where $b(\cdot)$ is real-valued and symmetric. For even $m$, it has $\hat\psi(z) = -\mathrm{i}e^{\mathrm{i}z/2}b(z)$, where $b(\cdot)$ is real-valued and odd (e.g., Hernández and Weiss (1996, (2.16), p. 161)). One member in this family is the first-order spline wavelet, called the Franklin wavelet, whose

$$\hat\psi(z) = (2\pi)^{-1/2}e^{\mathrm{i}z/2}\,\frac{\sin^4(z/4)}{(z/4)^2}\left[\frac{P_3(z/4 + \pi/2)}{P_3(z/2)P_3(z/4)}\right]^{1/2}, \tag{3.3}$$

where $P_3(z) \equiv \frac{2}{3} + \frac{1}{3}\cos(2z)$. Another member is the second-order spline wavelet, with

$$\hat\psi(z) = -\mathrm{i}(2\pi)^{-1/2}e^{\mathrm{i}z/2}\,\frac{\sin^6(z/4)}{(z/4)^3}\left[\frac{P_5(z/4 + \pi/2)}{P_5(z/2)P_5(z/4)}\right]^{1/2}, \tag{3.4}$$

where $P_5(z) \equiv \frac{1}{30}\cos^2(2z) + \frac{13}{30}\cos(2z) + \frac{8}{15}$.

3.2.
Wavelet Representation of Spectrum

We now consider wavelet representation of spectral density $f_i(\cdot)$. Given an orthonormal wavelet basis $\{\psi_{jk}(\cdot)\}$ for $L_2(\mathbb{R})$, we define

$$\Psi_{jk}(\omega) \equiv (2\pi)^{-1/2}\sum_{m=-\infty}^{\infty}\psi_{jk}\Big(\frac{\omega}{2\pi} + m\Big), \qquad \omega \in [-\pi, \pi]. \tag{3.5}$$

This constitutes an orthonormal wavelet basis for $L_2[-\pi, \pi]$, the space of $2\pi$-periodic functions on $[-\pi, \pi]$. See, e.g., Daubechies (1992, Ch. 9) or Hernández and Weiss (1996, Ch. 4). One can also compute $\Psi_{jk}(\cdot)$ from its Fourier transform $\hat\Psi_{jk}(h) \equiv (2\pi)^{-1/2}\int_{-\pi}^{\pi}\Psi_{jk}(\omega)e^{-\mathrm{i}h\omega}\,d\omega$ via the formula

$$\Psi_{jk}(\omega) = (2\pi)^{-1/2}\sum_{h=-\infty}^{\infty}\hat\Psi_{jk}(h)e^{\mathrm{i}h\omega}. \tag{3.6}$$

Lee and Hong (2001) show that the spectral density $f_i(\cdot)$ in (2.5) can be decomposed as

$$f_i(\omega) = (2\pi)^{-1}\sigma_i^2 + \sum_{j=0}^{\infty}\sum_{k=1}^{2^j}\alpha_{ijk}\Psi_{jk}(\omega), \qquad \omega \in [-\pi, \pi], \tag{3.7}$$

where the wavelet coefficient $\alpha_{ijk}$ is the orthogonal projection of $f_i(\cdot)$ on the base $\Psi_{jk}(\cdot)$; i.e.,

$$\alpha_{ijk} \equiv \int_{-\pi}^{\pi} f_i(\omega)\Psi_{jk}(\omega)\,d\omega. \tag{3.8}$$

Unlike the Fourier transforms, $\alpha_{ijk}$ depends on the local behavior of $f_i(\cdot)$ because $\Psi_{jk}(\cdot)$ is effectively 0 outside an interval of width $2^{-j}$ centered at $k/2^j$. Such a spatial adaptation feature makes it powerful for capturing nonsmooth features. We can also express $\alpha_{ijk}$ in time domain; i.e.,

$$\alpha_{ijk} = (2\pi)^{-1/2}\sum_{h=-\infty}^{\infty} R_i(h)\hat\Psi_{jk}^*(h). \tag{3.9}$$

3.3. Wavelet Spectral Density Estimator

Define the sample autocovariance function of $\{\hat v_{it}\}$:

$$\hat R_i(h) \equiv T_i^{-1}\sum_{t=|h|+1}^{T_i}\hat v_{it}\hat v_{it-|h|} \qquad (h = 0, \pm 1, \ldots, \pm(T_i - 1)). \tag{3.10}$$

Then a wavelet estimator of the spectral density $f_i(\cdot)$ can be given by

$$\hat f_i(\omega) \equiv (2\pi)^{-1}\hat R_i(0) + \sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat\alpha_{ijk}\Psi_{jk}(\omega), \qquad \omega \in [-\pi, \pi], \tag{3.11}$$

where the empirical wavelet coefficient

$$\hat\alpha_{ijk} \equiv (2\pi)^{-1/2}\sum_{h=1-T_i}^{T_i-1}\hat R_i(h)\hat\Psi_{jk}^*(h), \tag{3.12}$$

and $J_i \equiv J_i(T_i)$ is the finest scale corresponding to the highest resolution level. Appropriate conditions on $J_i$ will be given. We allow a different $J_i$ for a different $i$. This is useful because the pattern of serial correlation may vary substantially across $i$. We will also propose a data-driven method to choose $J_i$.
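The quantities in (3.10) and (3.12) can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' GAUSS program. All function names are hypothetical. The closed form $\hat\Psi_{jk}(h) = (2\pi/2^j)^{1/2}e^{-\mathrm{i}2\pi hk/2^j}\hat\psi(2\pi h/2^j)$ used below is not stated in the text; it follows from the definitions (3.1), (3.2), and the periodization of $\psi_{jk}$ by a change of variables. The Franklin transform is coded with $P_3$ evaluated at $z/4 + \pi/2$, the form for which the orthonormality check in the usage note holds numerically.

```python
import numpy as np

def p3(z):
    """P_3(z) = 2/3 + cos(2z)/3."""
    return 2.0 / 3.0 + np.cos(2.0 * z) / 3.0

def franklin_hat(z):
    """Fourier transform of the Franklin wavelet, (2*pi)^(-1/2) convention."""
    z = np.asarray(z, dtype=float)
    q = z / 4.0
    # sin^4(q)/q^2 written as q^2 * (sin q / q)^4 so that it is finite at q = 0;
    # numpy's sinc(x) is sin(pi x)/(pi x), hence the division by pi.
    core = q**2 * np.sinc(q / np.pi) ** 4
    ratio = p3(q + np.pi / 2.0) / (p3(z / 2.0) * p3(q))
    return np.exp(0.5j * z) * core * np.sqrt(ratio) / np.sqrt(2.0 * np.pi)

def psi_hat_jk(h, j, k, psi_hat=franklin_hat):
    """Fourier coefficient of the periodized wavelet Psi_jk at integer lag(s) h."""
    scale = 2.0**j
    phase = np.exp(-2j * np.pi * np.asarray(h) * k / scale)
    return np.sqrt(2.0 * np.pi / scale) * phase * psi_hat(2.0 * np.pi * np.asarray(h) / scale)

def sample_autocov(v, h):
    """R_hat(h) = T^{-1} * sum_{t=|h|+1}^{T} v_t v_{t-|h|}, as in (3.10)."""
    v = np.asarray(v, dtype=float)
    h = abs(int(h))
    return float(np.dot(v[h:], v[: len(v) - h])) / len(v)

def wavelet_coefficients(v, J):
    """Empirical coefficients of (3.12) for j = 0..J and k = 1..2^j."""
    T = len(v)
    lags = np.arange(1 - T, T)
    R = np.array([sample_autocov(v, h) for h in lags])
    coeffs = {}
    for j in range(J + 1):
        for k in range(1, 2**j + 1):
            basis = psi_hat_jk(lags, j, k)
            coeffs[(j, k)] = (R * np.conj(basis)).sum() / np.sqrt(2.0 * np.pi)
    return coeffs

# Hypothetical demeaned residual series for one individual.
rng = np.random.default_rng(1)
v = rng.normal(size=64)
v = v - v.mean()
alpha_hat = wavelet_coefficients(v, J=2)
```

Two properties are worth checking in use. First, each coefficient is real up to rounding, because $\hat R_i(h)$ is even in $h$ and $\hat\Psi_{jk}(-h) = \hat\Psi_{jk}^*(h)$. Second, for any integer $h \neq 0$, $\sum_{j \ge 0} 2\pi|\hat\psi(2\pi h/2^j)|^2 = 1$, which is Parseval's identity for the orthonormal basis applied to $e^{\mathrm{i}h\omega}$.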
Note that (3.11) is a linear wavelet estimator. Nonlinear wavelet estimators of Donoho et al. (1996), which are popular in curve estimation, could be used. However, under our regularity conditions (see subsequent sections), which are not stronger than standard assumptions in time series panel econometrics, both linear and nonlinear wavelet estimators have the same convergence rate. Nonlinear wavelet estimators have no advantage at least in large samples. Masry (1994, 1997) also finds that the gain from using nonlinear wavelet estimators rather than linear wavelet estimators is marginal for different models. Moreover, the use of nonlinear wavelet estimators would lead to a much more complicated analysis in theory.

3.4. Wavelet-Based Tests

Put $Q(f_1, f_2) \equiv \int_{-\pi}^{\pi}[f_1(\omega) - f_2(\omega)]^2\,d\omega$ for any $f_1(\cdot)$ and $f_2(\cdot)$. We use the quadratic form

$$Q(\hat f_i, \hat f_{i0}) = \sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat\alpha_{ijk}^2, \tag{3.13}$$

where $\hat f_{i0}(\omega) \equiv (2\pi)^{-1}\hat R_i(0)$ and the equality follows by Parseval’s identity. Our first test statistic is

$$\hat W_1 \equiv \Big[\sum_{i=1}^{n}\Big(2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat\alpha_{ijk}^2\Big) - \hat M\Big]\Big/\hat V^{1/2}, \tag{3.14}$$

where

$$\hat M \equiv \sum_{i=1}^{n}\hat R_i^2(0)M_{i0}, \qquad \hat V \equiv \sum_{i=1}^{n}\hat R_i^4(0)V_{i0},$$

$$M_{i0} \equiv \sum_{h=1}^{T_i-1}(1 - h/T_i)\,b_{J_i}(h, h), \qquad V_{i0} \equiv 4\sum_{h=1}^{T_i}\sum_{m=1}^{T_i}(1 - h/T_i)(1 - m/T_i)\,b_{J_i}^2(h, m),$$

and

$$b_J(h, m) \equiv 2\,\mathrm{Re}[a_J(h, m) + a_J(h, -m)], \qquad a_J(h, m) \equiv \sum_{j=0}^{J}\sum_{k=1}^{2^j}\hat\Psi_{jk}(h)\hat\Psi_{jk}^*(m),$$

and $\hat\Psi_{jk}(\cdot)$ is as in (3.6).

Put $\hat a_{ijk} \equiv (2\pi)^{-1/2}\sum_{h=1-T_i}^{T_i-1}\hat\rho_i(h)\hat\Psi_{jk}^*(h)$, where $\hat\rho_i(h) \equiv \hat R_i(h)/\hat R_i(0)$. Our second test statistic is

$$\hat W_2 \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Big[2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat a_{ijk}^2 - M_{i0}\Big]\Big/V_{i0}^{1/2}. \tag{3.15}$$

Intuitively, $\hat W_2$ can be viewed as a heteroskedasticity-corrected test while $\hat W_1$ is a heteroskedasticity-consistent test, where heteroskedasticity arises from different variances $\sigma_i^2$ and finest scales $J_i$. In $\hat W_2$ these two forms of heteroskedasticity are corrected first for each $i$. As is shown below, $\hat W_1$ and $\hat W_2$ are asymptotically $N(0,1)$ under $H_0$, but their power properties generally differ. The heteroskedasticity-consistent test $\hat W_1$ may be more powerful than the heteroskedasticity-corrected test $\hat W_2$.
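To see where the centering term $M_{i0}$ comes from, the sum of squared empirical coefficients can be rewritten in terms of sample autocovariances. The following is a sketch under a simplifying assumption that is not the paper's: $\hat R_i(h)$ is treated as if it were computed from the unobservable i.i.d. zero-mean errors with variance $\sigma_i^2$ rather than from estimated residuals. It uses only $\hat R_i(-h) = \hat R_i(h)$, $\hat\Psi_{jk}(-h) = \hat\Psi_{jk}^*(h)$ (because $\Psi_{jk}$ is real-valued), and $\hat\Psi_{jk}(0) = 0$ (because $\int\psi(x)\,dx = 0$).

```latex
\begin{align*}
2\pi\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat\alpha_{ijk}^2
  &= \sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\Big|\sum_{h=1-T_i}^{T_i-1}\hat R_i(h)\hat\Psi_{jk}^*(h)\Big|^2
   = \sum_{h=1-T_i}^{T_i-1}\sum_{m=1-T_i}^{T_i-1} a_{J_i}(m,h)\,\hat R_i(h)\hat R_i(m) \\
  &= \sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1} b_{J_i}(h,m)\,\hat R_i(h)\hat R_i(m), \\[4pt]
T_i\,E\big[\hat R_i(h)\hat R_i(m)\big]
  &= (1-h/T_i)\,\sigma_i^4\,\mathbf{1}(h=m) \qquad (h, m \ge 1), \\[4pt]
E\Big[2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat\alpha_{ijk}^2\Big]
  &= \sigma_i^4\sum_{h=1}^{T_i-1}(1-h/T_i)\,b_{J_i}(h,h) = \sigma_i^4 M_{i0}.
\end{align*}
```

Under this assumption the term $\hat R_i^2(0)M_{i0}$ in $\hat M$ is an estimate of the null mean of $2\pi T_i\sum_{j}\sum_{k}\hat\alpha_{ijk}^2$, which is why it is subtracted in (3.14).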
This differs from the well-known estimation result that correcting heteroskedasticity leads to more efficient estimation (e.g., the feasible GLS is more efficient than OLS).

Both $\hat W_1$ and $\hat W_2$ apply to one-way or two-way error component models. For one-way component models, however, one can use $\hat v_{it} \equiv \hat u_{it} - \bar u_{i\cdot}$ if one knows $\lambda_t = 0$, and use $\hat v_{it} \equiv \hat u_{it} - \bar u_{\cdot t}$ if one knows $\mu_i = 0$. The limit distribution of the test statistics is unchanged.

4. ASYMPTOTIC DISTRIBUTION

We now impose a set of unified regularity conditions that hold under both $H_0$ and $H_A$.

ASSUMPTION 3: $\sqrt{nT}(\hat\beta - \beta^*) = O_P(1)$, where $\beta^* = \beta$ under $H_0$.

ASSUMPTION 4: (a) For each $i$, $\{v_{it}\}$ is covariance-stationary with $E(v_{it}) = 0$, $E(v_{it}^2) = \sigma_i^2 \in [c, C]$ and $E(v_{it}^8) \in [c, C]$; (b) the individual and time effects, $\mu_i$ and $\lambda_t$, can be stochastic (random effects) or deterministic (fixed effects).

ASSUMPTION 5: Put $\tilde\Gamma_{ixv}(h) \equiv T_i^{-1}\sum_{t=h+1}^{T_i}\tilde X_{it}\tilde v_{it-h}$ if $h \ge 0$ and $\tilde\Gamma_{ixv}(h) \equiv \tilde\Gamma_{ixv}(-h)$ if $h < 0$, and $\Gamma_{ixv}(h) \equiv \mathrm{p\,lim}\,\tilde\Gamma_{ixv}(h)$, where $\tilde X_{it} \equiv X_{it} - \bar X_{i\cdot} - \bar X_{\cdot t} + \bar X$ and $\tilde v_{it} \equiv v_{it} - \bar v_{i\cdot} - \bar v_{\cdot t} + \bar v$. Then (a) $\sup_{1\le i\le n} T_i^{-1}\sum_{t=1}^{T_i} E\|\tilde X_{it}\|^4 \le C$; (b) $\sup_{1\le i\le n}\sup_{1\le h<T_i} E\|\tilde\Gamma_{ixv}(h) - \Gamma_{ixv}(h)\|^2 \le CT_i^{-1}$; (c) $\sum_{h=-\infty}^{\infty}\|\Gamma_{ixv}(h)\| \le C$.

We only require that the estimator $\hat\beta$ be consistent for $\beta$ under $H_0$; it need not be asymptotically most efficient under $H_0$ and even need not be consistent for $\beta$ under $H_A$. Thus, we can use the convenient within estimator $\hat\beta$ in (2.2). Of course, other estimators such as OLS, feasible GLS, MLE, and IV are also allowed. In Assumption 4, no restrictive assumptions on $\{\mu_i\}$ and $\{\lambda_t\}$ are imposed. We allow $\{\lambda_t\}$ to be serially correlated if $\lambda_t$ is random, and $\{\mu_i\}$ to be spatially correlated if $\mu_i$ is random.
Given Assumption 3, $\{v_{it}\}$ coincides with the true error $\{\varepsilon_{it}\}$ under $H_0$. Here, we allow a certain degree of heterogeneity in panel data under $H_0$: the process $\{Y_{it}, X_{it}\}$ need not be stationary for each $i$, and $\{\varepsilon_{it}\}$ may have different variances across $i$. In particular, we allow some nonstationary processes. One example is the deterministic trend process (e.g., Kao and Emerson (2004))

$$Y_{it} = \alpha + \gamma_1 t + \gamma_2 t^2 + \cdots + \gamma_p t^p + \mu_i + \lambda_t + \varepsilon_{it}. \tag{4.1}$$

This is covered by (2.1) with $X_{it} \equiv [t/T, \ldots, (t/T)^p]'$ and $\beta \equiv (T\gamma_1, \ldots, T^p\gamma_p)'$. Another example is the panel cointegration process (e.g., Phillips and Moon (1999), Kao and Chiang (2000)):

$$Y_{it} = \alpha + \gamma Z_{it} + \mu_i + \lambda_t + \varepsilon_{it}, \tag{4.2}$$

where $Z_{it} = Z_{it-1} + \eta_{it}$, $\{\eta_{it}\}$ is $I(0)$ for each $i$, and $\{\eta_{it}\}$ may or may not be correlated with $\{\varepsilon_{it}\}$. This process is also covered by (2.1) with $X_{it} \equiv T^{-1}Z_{it}$ and $\beta \equiv T\gamma$. However, Assumption 4 and (2.4) generally imply that both $\{\varepsilon_{it}\}$ and $\{X_{it}\}$ are covariance-stationary when $\beta^* \neq \beta$ under $H_A$. This more restrictive condition is needed under $H_A$ when we investigate the asymptotic power property of the proposed tests.

THEOREM 1: Suppose Assumptions 1–5 hold and $\max_{1\le i\le n} 2^{2J_i}/(n^2 + T) \to 0$ as $n \to \infty$ and $T \to \infty$. If $\{\varepsilon_{it}\}$ in (2.1) is i.i.d. for each $i$, then $\hat W_1 \to^{d} N(0,1)$ and $\hat W_2 \to^{d} N(0,1)$.

The large sample properties of the proposed tests are due to the $n$ independent copies of serially uncorrelated series $\{\varepsilon_{it}\}$ and the existence of a $\sqrt{nT}$-consistent estimator for $\beta$ under $H_0$. In fact, the assumption of a linear panel model is not necessary to obtain the large sample properties. One could extend Assumptions 4 and 5 to cover some nonlinear panel models, with more tedious algebra in the proof. It is also possible to relax the condition on the $\sqrt{nT}$-convergence rate of $\hat\beta$ for $\beta$ at a cost of strengthening conditions on the finest scales $\{J_i\}$.

Most remarkably, we permit but do not require $J_i \to \infty$ for any $i$; all $J_i$ can be fixed as $n, T \to \infty$ under $H_0$. This is in sharp contrast to Lee and Hong (2001), who require $J \to \infty$ as $T \to \infty$ to achieve asymptotic normality in testing serial correlation for observed raw time-series data. The reason that all $J_i$ can be fixed is that the additional smoothing provided by $n$ ensures asymptotic normality of $\hat W_1$ and $\hat W_2$. Intuitively, $\hat W_1$ and $\hat W_2$ are sums of approximately independent random variables $\{2\pi T_i Q(\hat f_i, \hat f_{i0})\}_{i=1}^{n}$. By the central limit theorem, they will converge to a normal distribution as $n \to \infty$. This occurs no matter whether $J_i \to \infty$. In the time-series or cross-sectional literature it is often found that the normal approximation is inadequate for the finite sample distributions of quadratic forms involving smoothed nonparametric estimation. This is because the asymptotic normality of these quadratic forms requires the smoothing parameter to grow or vanish at a suitable rate as the sample size grows, and the convergence rate of test statistics delicately depends on the smoothing parameter. The fact that the asymptotic normality of $\hat W_1$ and $\hat W_2$ does not depend on whether $J_i \to \infty$ suggests that asymptotic approximation may work well in the panel context. Indeed, our simulation shows that $\hat W_1$ and $\hat W_2$ perform well in finite samples even when $J_i = 0$ for all $i$.

Most importantly, the fact that $J_i$ may be fixed for all $i$ allows use of data-driven methods that may deliver a fixed finest scale under $H_0$. Sensible data-driven methods have this feature because the optimal finest scale $J_0 = 0$ under $H_0$. We will propose a plug-in method to select a finest scale, which automatically converges to 0 under $H_0$ and grows to $\infty$ under $H_A$, thus ensuring consistency against serial correlation of unknown form. Such a data-driven method could not be used for Lee and Hong’s (2001) test.

Although we require that both $n$ and $T$ grow to $\infty$, we do not impose a restrictive relative speed limit between them.
Also, no specific estimation method is required. From the proof of Theorem 1, we find that parameter estimation uncertainty for $\beta$ has no impact on the null limit distribution of $\hat W_1$ and $\hat W_2$, no matter whether $X_{it}$ contains lagged dependent variables or deterministic/stochastic trending variables. This is in contrast to Durbin and Watson (1951) and Box and Pierce (1970), whose test statistics or limit distributions have to be modified when applied to estimated residuals of a stationary dynamic model. If regressors contain deterministic or stochastic trending variables, the limit distributions of these tests will become nonstandard (e.g., Kao and Chiang (2000), Kao and Emerson (2004)). Intuitively, parameter estimation uncertainty for $\beta$ induces an adjustment of a finite number of degrees of freedom for $\hat W_1$ and $\hat W_2$, but this becomes negligible as $n \to \infty$.

The tests $\hat W_1$ and $\hat W_2$ are applicable for both small and large $J_i$. When (and only when) $J_i \to \infty$ for all $i = 1, \ldots, n$, we can use the following simplified versions of test statistics:

$$\tilde W_1 = \frac{\sum_{i=1}^{n}\big[2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat\alpha_{ijk}^2 - \hat R_i^2(0)(2^{J_i+1} - 1)\big]}{2\big[\sum_{i=1}^{n}\hat R_i^4(0)(2^{J_i+1} - 1)\big]^{1/2}}, \tag{4.3}$$

$$\tilde W_2 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\hat a_{ijk}^2 - (2^{J_i+1} - 1)}{2(2^{J_i+1} - 1)^{1/2}}. \tag{4.4}$$

These are the generalizations of Lee and Hong’s (2001) test to estimated residuals of panel models. Theorem 2 shows that they are asymptotically $N(0,1)$ under $H_0$ if $J_i \to \infty$ for all $i$.

THEOREM 2: Suppose Assumptions 1–5 hold, $2^{J_i+1} = a_i T_i^{\nu}$ for $a_i \in [c, C]$ and $\nu \in (0, \frac{1}{2})$, $n/T^{\nu}\log_2^2 T \to 0$, and $n/T^{2(2\tau-1)-2(2\tau-1/2)\nu} \to 0$ as $n, T \to \infty$, where $\tau \ge \frac{3}{2}$ is as in Assumption 2. If $\{\varepsilon_{it}\}$ in (2.1) is i.i.d. for each $i$, then $\tilde W_1 - \hat W_1 \to^{p} 0$, $\tilde W_2 - \hat W_2 \to^{p} 0$, $\tilde W_1 \to^{d} N(0,1)$, and $\tilde W_2 \to^{d} N(0,1)$.

Thus, for large (and only large) $J_i$, $\tilde W_1$ and $\tilde W_2$ are asymptotically equivalent to $\hat W_1$ and $\hat W_2$ respectively. Note that here, $n$ cannot grow faster than $T^{\nu}$, where $\nu < \frac{1}{2}$.

5. CONSISTENCY

We now show that $\hat W_1$ and $\hat W_2$ are consistent against $H_A$.
We assume the following condition.

ASSUMPTION 6: For each $i$, $\{v_{it}\}$ is a fourth-order stationary zero-mean process with $\sum_{h=-\infty}^{\infty} R_i^2(h) \le C$ and $\sum_{j=-\infty}^{\infty}\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}|\kappa_i(j, k, l)| \le C$, where $\kappa_i(j, k, l)$ is the fourth-order cumulant of the joint distribution of $\{v_{it}, v_{it+j}, v_{it+k}, v_{it+l}\}$.

Assumption 6 characterizes temporal dependence of $\{v_{it}\}$. When $\{v_{it}\}$ is Gaussian, the cumulant condition holds trivially because $\kappa_i(j, k, l) = 0$ for all $j, k, l \in \mathbb{Z}$. If for each $i$, $\{v_{it}\}$ is a fourth-order stationary linear process with absolutely summable coefficients and i.i.d. innovations whose fourth order moment exists, the cumulant condition also holds (e.g., Hannan (1970, p. 211)). More primitive conditions (e.g., strong mixing) could be imposed, but such primitive conditions would rule out long memory processes. Assumption 6 allows long memory processes $I(d)$ with $d < \frac{1}{4}$ for $\{v_{it}\}$.

THEOREM 3: Put $n_A \equiv \#(N_A)$ and $c_i \equiv T_i/T$, where $N_A \equiv \{i : 0 \le i \le n,\ Q(f_i, f_{i0}) > 0\}$. Suppose Assumptions 1–6 hold, $(n_A T)^{-1}\sum_{i=1}^{n} 2^{J_i} \to 0$, and $J_i \to \infty$ for all $i = 1, \ldots, n$ as $n, T \to \infty$. Then (a) $(n_A T)^{-1}\hat V^{1/2}\hat W_1 - n_A^{-1}\sum_{i\in N_A} 2\pi c_i Q(f_i, f_{i0}) \to^{p} 0$; (b) if in addition $2^{J_i+1} = a_i T_i^{\nu}$ for all $i$, where $a_i \in [c, C]$ and $\nu \in (0, 1)$, then $(n_A T^{1-\nu/2})^{-1}\hat W_2 - n_A^{-1}\sum_{i\in N_A}\pi(c_i/a_i)^{1/2}Q(f_i, f_{i0}) \to^{p} 0$.

As discussed earlier, although serial correlation in $\{v_{it}\}$ may differ from serial correlation in $\{\varepsilon_{it}\}$, $H_0$ holds if and only if $\{v_{it}\}$ is serially uncorrelated. Consequently, the index set $N_A$ is nonempty under $H_A$, at least for large $n$. It follows that $n_A^{-1}\sum_{i\in N_A} c_i Q(f_i, f_{i0}) \ge c$ for large $n$. Then $P[\hat W_1 > C(n, T)] \to 1$ and $P[\hat W_2 > C(n, T)] \to 1$ under $H_A$ for any sequence of constants $\{C(n, T) = o[n_A T/(\sum_{i=1}^{n} 2^{J_i})^{1/2}]\}$. Thus, $\hat W_1$ and $\hat W_2$ are consistent against $H_A$ provided $(n_A T)^{-1}\sum_{i=1}^{n} 2^{J_i} \to 0$ and $J_i \to \infty$ for all $i$.
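As a worked illustration of the quantity $Q(f_i, f_{i0})$ that defines $N_A$, consider a stationary AR(1) model error. The AR(1) form and the symbols $\phi_i$ and $e_{it}$ are assumptions made only for this example; the tests themselves require no such model. The first line applies Parseval's identity to (2.5).

```latex
\begin{align*}
Q(f_i, f_{i0})
  &= \int_{-\pi}^{\pi}\Big|(2\pi)^{-1}\sum_{h\neq 0}R_i(h)e^{-\mathrm{i}h\omega}\Big|^2 d\omega
   = (2\pi)^{-1}\sum_{h\neq 0}R_i^2(h)
   = \pi^{-1}\sum_{h=1}^{\infty}R_i^2(h), \\[4pt]
v_{it} &= \phi_i v_{it-1} + e_{it},\quad |\phi_i| < 1,\quad E(v_{it}^2) = \sigma_i^2
  \;\Longrightarrow\; R_i(h) = \sigma_i^2\phi_i^{|h|}, \\[4pt]
Q(f_i, f_{i0}) &= \frac{\sigma_i^4}{\pi}\sum_{h=1}^{\infty}\phi_i^{2h}
   = \frac{\sigma_i^4\,\phi_i^2}{\pi(1-\phi_i^2)}.
\end{align*}
```

For instance, individuals with $\phi_i = 0.5$ and $\phi_i = -0.5$ both have $Q(f_i, f_{i0}) = \sigma_i^4/(3\pi) > 0$ and so both belong to $N_A$, even though their first-order autocorrelations average to zero.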
For simplicity, we let $J_i \to \infty$ for all $i$ to ensure consistency against $H_A$. This differs from the case under $H_0$, where $J_i$ can be fixed for all $i$. Our data-driven method below will deliver a finest scale that automatically converges to 0 under $H_0$ but grows to $\infty$ with $T$ under $H_A$.

Under $H_A$, $\hat W_1$ and $\hat W_2$ diverge to $\infty$ at the rate of $n_A T/(\sum_{i=1}^{n} 2^{J_i})^{1/2}$. Thus, the larger the set $N_A$ is, the more powerful $\hat W_1$ and $\hat W_2$ are. In fact, the power depends on $n_A/n$, the proportion of individuals with serial correlation. For $2^{J_i+1} = a_i T_i^{\nu}$, $n_A T/(\sum_{i=1}^{n} 2^{J_i})^{1/2} \propto (n_A/n)\, n^{1/2} T^{1-\nu/2}$. This implies that $\hat W_1$ and $\hat W_2$ have asymptotic power 1 against $H_A$ even if the proportion $n_A/n \to 0$, as long as it does so at a rate slightly slower than $(n^{1/2} T^{1-\nu/2})^{-1}$.

For $\hat W_1$ and $\hat W_2$, serial correlations from different individuals never cancel each other out when some individuals have positive autocorrelations and some have negative autocorrelations. In contrast, cancellation may occur at least in part for existing tests, leading to low or little power; see Section 7 for more discussion. We emphasize that the ability of our tests to detect serial correlation in the presence of substantial inhomogeneity in serial correlation across $i$ is not due to the use of wavelets, but to the use of the quadratic form in (3.13). On the other hand, we may extend the adaptive procedures of Fan (1996) and Spokoiny (1996) to further improve the power of our tests, as one referee pointed out. We leave this for future research.

Theorems 1 and 3 imply that for large $n$ and $T$, negative values of $\hat W_1$ and $\hat W_2$ can occur only under $H_0$. Thus, upper-tailed $N(0,1)$ critical values should be used.

As noted earlier, $\hat W_1$ and $\hat W_2$ are heteroskedasticity-consistent and heteroskedasticity-corrected tests respectively. An interesting question is, which test, $\hat W_1$ or $\hat W_2$, is more powerful? For convenience, we assume $2^{J_i+1} = a_i T_i^{\nu}$ for all $i$, where $a_i \in [c, C]$ and $\nu \in (0,1)$, and assume a larger $a_i$ for processes with stronger serial correlation in terms of a larger $Q(f_i, f_{i0})$.
With this rule, we have the following theorem.

THEOREM 4: Suppose Assumptions 1–6 hold, $n = \gamma T^{\varsigma}$ for $\gamma \in (0, \infty)$ and $\varsigma \in (0, \infty)$, and $2^{J_i+1} = a_i T_i^{\nu}$ for $a_i \in [c, C]$ and $\nu \in (0,1)$. If $a_i$ is monotonically increasing in $Q(f_i, f_{i0})$ and $T_i = T$ for all $i$, then $\hat W_1$ is more efficient than $\hat W_2$ in terms of Bahadur's asymptotic efficiency criterion.

Bahadur's (1960) asymptotic slope criterion is pertinent for power comparison of large sample tests under fixed alternatives. The basic idea is to compare the logarithms of the asymptotic significance levels (i.e., p-values) of the tests under a fixed alternative. Bahadur's asymptotic efficiency is defined as the limit ratio of the sample sizes required by the two tests under comparison to achieve the same asymptotic significance level (p-value) under a fixed alternative.

Theorem 4 implies that for hypothesis testing, correcting heteroskedasticity may give poorer power. This is in contrast to the well-known result that correcting heteroskedasticity leads to more efficient estimation. Intuitively, for $\hat W_2$, a larger $Q(f_i, f_{i0})$ is more heavily discounted by $\sqrt{V_{i0}} \sim 2(2^{J_i+1}-1)^{1/2}$ when $J_i$ is larger. Thus, it is less powerful than $\hat W_1$, which puts uniform weighting on each $Q(f_i, f_{i0})$. Of course it is possible that $\hat W_1$ is asymptotically less powerful than $\hat W_2$, as will occur when $a_i$ is monotonically decreasing in $Q(f_i, f_{i0})$. However, sensible data-driven methods usually provide a rule that $a_i$ is increasing in $Q(f_i, f_{i0})$. When $J_i = J$ for all $i$, $\hat W_1$ and $\hat W_2$ may still not be asymptotically equally efficient, because of heteroskedasticity ($\sigma_i^2 \ne \sigma^2$).

We note that the asymptotic power of $\hat W_1$ and $\hat W_2$ does not depend on the mother wavelet $\psi(\cdot)$. All wavelets are asymptotically equally efficient by Bahadur's criterion. This differs from the kernel method, where the choice of a kernel affects the asymptotic power of tests (Hong (1996)).

6.
ADAPTIVE CHOICE OF FINEST SCALE

Theorem 1 implies that the choice of $J_i$ is not important for asymptotic normality of $\hat W_1$ and $\hat W_2$. Both small and large $J_i$ can be used. However, the choice of $J_i$ may have significant impact on the power. Therefore, it is desirable to choose $J_i$ via suitable data-driven methods. We now develop a data-driven method to select a finest scale.

We first justify the use of a data-driven finest scale $\hat J$. For simplicity, we consider a common $\hat J$ for all $i$ here. We use $\hat W_c(\hat J)$ and $\hat W_c(J)$ to denote the $\hat W_c$ tests using $\hat J$ and $J$ respectively, where $c = 1, 2$. We impose a condition on the smoothness of $\hat\psi(\cdot)$ at 0.

ASSUMPTION 7: $|\hat\psi(z)| \le C|z|^q$ for some $q \in (0, \infty)$.

THEOREM 5: Suppose Assumptions 1–5 and 7 hold, and $\hat J$ is a data-driven finest scale with $2^{\hat J}/2^{J} = 1 + o_P(2^{-J/2})$, where $J$ is a nonstochastic finest scale such that $2^{2J}/(n^2 + T) \to 0$ as $n \to \infty$ and $T \to \infty$. If $\{\varepsilon_{it}\}$ in (2.1) is i.i.d. for each $i$, then $\hat W_1(\hat J) - \hat W_1(J) \xrightarrow{p} 0$, $\hat W_2(\hat J) - \hat W_2(J) \xrightarrow{p} 0$, $\hat W_1(\hat J) \xrightarrow{d} N(0,1)$, and $\hat W_2(\hat J) \xrightarrow{d} N(0,1)$.

Thus, the use of $\hat J$ rather than $J$ has no impact on the limit distribution of $\hat W_1(\hat J)$ and $\hat W_2(\hat J)$, provided that $\hat J$ converges to $J$ at a suitable rate. The rate condition $2^{\hat J}/2^{J} - 1 = o_P(2^{-J/2})$ is mild. If $2^{J} \propto T^{1/5}$, for example, we require $2^{\hat J}/2^{J} = 1 + o_P(T^{-1/10})$. If $J$ is fixed (e.g., $J = 0$), which occurs under $H_0$ for our data-driven method below, the condition becomes $2^{\hat J}/2^{J} \xrightarrow{p} 1$.

So far very few data-driven methods to choose $J$ are available in the literature. Walter (1994) and Hall and Patil (1996) consider some data-driven methods in related but different contexts. They cannot be applied directly to our tests. We now develop a data-driven $\hat J$ that can satisfy the condition of Theorem 5. We first derive the average asymptotic IMSE formula for $\{\hat f_i(\cdot)\}_{i=1}^{n}$, which was not available in the literature. We impose an additional condition on $\{v_{it}\}$.

ASSUMPTION 8: $\sum_{h=-\infty}^{\infty} |h|^q |R_i(h)| \le C$ for all $i$, where $q \in [1, \infty)$ is as in Assumption 7.

This characterizes the smoothness of $f_i(\cdot)$. It rules out long memory processes. Under Assumption 8, we have a well-defined $q$th order generalized spectral derivative of $f_i(\omega)$:

$$f_i^{(q)}(\omega) \equiv (2\pi)^{-1}\sum_{h=-\infty}^{\infty} |h|^q R_i(h)\, e^{-ih\omega}, \qquad \omega \in [-\pi, \pi]. \tag{6.1}$$

We also define a measure $\lambda_q \in (0, \infty)$ of the smoothness of $\hat\psi(\cdot)$ at 0:

$$\lambda_q \equiv \frac{(2\pi)^{2q+1}}{1 - 2^{-2q}} \lim_{z \to 0} \frac{|\hat\psi(z)|^2}{|z|^{2q}}. \tag{6.2}$$

For the Franklin wavelet (3.3), $q = 2$; for the second-order spline wavelet (3.4), $q = 3$.

To state the next result, we define a pseudo spectral density estimator $\bar f_i(\cdot)$ for $f_i(\cdot)$ that is based on the unobservable series $\{v_{it}\}_{t=1}^{T_i}$; namely,

$$\bar f_i(\omega) \equiv (2\pi)^{-1}\bar R_i(0) + \sum_{j=0}^{J_i}\sum_{k=1}^{2^j} \bar\alpha_{ijk}\,\Psi_{jk}(\omega), \qquad \omega \in [-\pi, \pi], \tag{6.3}$$

where $\bar R_i(h) \equiv T_i^{-1}\sum_{t=|h|+1}^{T_i} v_{it} v_{it-|h|}$ and $\bar\alpha_{ijk} \equiv (2\pi)^{-1/2}\sum_{h=1-T_i}^{T_i-1} \bar R_i(h)\,\hat\Psi_{jk}^{*}(h)$.

THEOREM 6: Suppose Assumptions 1–8 hold, $\lambda_q \in (0, \infty)$, $J_i \to \infty$, and $2^{J_i}/T_i \to 0$ as $T_i \to \infty$. Then (a) for each $i$, $Q(\hat f_i, f_i) = Q(\bar f_i, f_i) + o_P(2^{J_i}/T_i + 2^{-2qJ_i})$, and

$$E\,Q(\bar f_i, f_i) = \frac{2^{J_i+1}}{T_i}\int_{-\pi}^{\pi} f_i^2(\omega)\, d\omega + 2^{-2q(J_i+1)}\lambda_q \int_{-\pi}^{\pi} \left[f_i^{(q)}(\omega)\right]^2 d\omega + o\!\left(2^{J_i}/T_i + 2^{-2qJ_i}\right).$$

(b) If in addition $J_i = J$ for all $i$ and $T_i/T = c_i$, then $n^{-1}\sum_{i=1}^{n} Q(\hat f_i, f_i) = n^{-1}\sum_{i=1}^{n} Q(\bar f_i, f_i) + o_P(2^{J}/T + 2^{-2qJ})$, and

$$n^{-1}\sum_{i=1}^{n} E\,Q(\bar f_i, f_i) = \frac{2^{J+1}}{T}\, n^{-1}\sum_{i=1}^{n} c_i^{-1}\int_{-\pi}^{\pi} f_i^2(\omega)\, d\omega + 2^{-2q(J+1)}\lambda_q\, n^{-1}\sum_{i=1}^{n}\int_{-\pi}^{\pi} \left[f_i^{(q)}(\omega)\right]^2 d\omega + o\!\left(2^{J}/T + 2^{-2qJ}\right).$$

Theorem 6(a) gives the asymptotic IMSE of $\bar f_i(\cdot)$ and Theorem 6(b) gives the average asymptotic IMSE of $\{\bar f_i(\cdot)\}_{i=1}^{n}$. They imply that the optimal convergence rates of $Q(\hat f_i, f_i)$ and $n^{-1}\sum_{i=1}^{n} Q(\hat f_i, f_i)$ are the same as those of $Q(\bar f_i, f_i)$ and $n^{-1}\sum_{i=1}^{n} Q(\bar f_i, f_i)$ respectively.
Parameter estimation uncertainty in $\hat\beta$ has no impact on the optimal convergence rates of $Q(\hat f_i, f_i)$ and $n^{-1}\sum_{i=1}^{n} Q(\hat f_i, f_i)$.

The optimal finest scale $J_0$ that minimizes the average asymptotic IMSE of $\{\bar f_i(\cdot)\}_{i=1}^{n}$ is

$$2^{J_0+1} = \left[2q\lambda_q\,\xi_0(q)\,T\right]^{1/(2q+1)}, \tag{6.4}$$

where $\xi_0(q) \equiv \sum_{i=1}^{n}\int_{-\pi}^{\pi} [f_i^{(q)}(\omega)]^2\, d\omega \,\big/\, \sum_{i=1}^{n} c_i^{-1}\int_{-\pi}^{\pi} f_i^2(\omega)\, d\omega$. This is infeasible because $\xi_0(q)$ is unknown under $H_A$. However, we can use some estimator $\hat\xi_0(q)$ to obtain a plug-in finest scale $\hat J_0$:

$$2^{\hat J_0+1} \equiv \left[2q\lambda_q\,\hat\xi_0(q)\,T\right]^{1/(2q+1)}. \tag{6.5}$$

Because $\hat J_0$ is a nonnegative integer, we should use

$$\hat J_0 \equiv \max\left\{\left[\frac{1}{2q+1}\log_2\!\left(2q\lambda_q\,\hat\xi_0(q)\,T\right) - 1\right],\ 0\right\}, \tag{6.6}$$

where the square bracket denotes the integer part. We impose the following condition on $\hat\xi_0(q)$.

ASSUMPTION 9: $\hat\xi_0(q) - \zeta_0(q) = o_P(T^{-\delta})$ for some constant $\zeta_0(q)$, where $\delta = 1/[2(2q+1)]$ if $\zeta_0(q) \in [c, C]$ and $\delta = 1/(2q+1)$ if $\zeta_0(q) = 0$.

Note that the condition on $\hat\xi_0(q)$ is more stringent when $\zeta_0(q) = 0$ than when $\zeta_0(q) \ne 0$, but for both cases the conditions are mild. We do not require $\operatorname{p\,lim} \hat\xi_0(q) \equiv \zeta_0(q) = \xi_0(q)$, where $\xi_0(q)$ is as in (6.4). When (and only when) $\zeta_0(q) = \xi_0(q)$, $\hat J_0$ in (6.6) will converge to the optimal $J_0$ in (6.4).

COROLLARY 1: Suppose Assumptions 1–9 hold and $\hat J_0$ is given as in (6.6). If $\{\varepsilon_{it}\}$ in (2.1) is i.i.d. for each $i$, then $\hat W_1(\hat J_0) \xrightarrow{d} N(0,1)$ and $\hat W_2(\hat J_0) \xrightarrow{d} N(0,1)$.

We can use parametric or nonparametric (e.g., local linear smoothing) methods for the estimator $\hat\xi_0(q)$. The former generally deliver a suboptimal finest scale, but have less variation in finite samples. The latter will deliver an asymptotically optimal finest scale, but are subject to substantial variation. There are also some methods (e.g., cross-validation and AIC) in the literature for selecting the truncation parameter. There is usually a trade-off between Type I and Type II errors in choosing a specific method.
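The plug-in rule in (6.6) is a one-line computation once an estimate of $\xi_0(q)$ is available. The following is a minimal sketch, not the authors' code: the function name `plug_in_finest_scale` and its argument names are hypothetical, the default `lambda_q` is the Franklin-wavelet value $\pi^4/45$ quoted for $q = 2$, and a nonpositive estimate of $\xi_0(q)$ is mapped to the coarsest scale 0 as an assumption of this sketch (the logarithm in (6.6) is undefined there).

```python
import math


def plug_in_finest_scale(xi_hat, T, q=2, lambda_q=math.pi ** 4 / 45):
    """Integer finest scale from the plug-in rule (6.6).

    xi_hat   : estimate of xi_0(q) (hypothetical input name)
    T        : time-series length
    q        : smoothness order of the wavelet at zero
    lambda_q : wavelet constant from (6.2); default is the Franklin value for q = 2
    """
    if T <= 0:
        raise ValueError("T must be positive")
    if xi_hat <= 0:
        # Assumption of this sketch: a flat estimated spectrum gives scale 0.
        return 0
    # Solve 2^(J+1) = (2 q lambda_q xi_hat T)^(1/(2q+1)) for J.
    raw = math.log2(2 * q * lambda_q * xi_hat * T) / (2 * q + 1) - 1
    # Integer part, truncated below at zero as in (6.6).
    return max(math.floor(raw), 0)


if __name__ == "__main__":
    for xi in (0.0, 1.0, 100.0):
        print(xi, plug_in_finest_scale(xi, T=64))
```

Because the exponent is $1/(2q+1)$, the selected scale grows only logarithmically in $\hat\xi_0(q)T$, which is why small estimates of $\xi_0(q)$ keep the scale at 0.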
To obtain reasonable power while having a proper rejection probability under $H_0$ for sample sizes often encountered in economics, we use a parametric AR($p_i$) model for each $i$:

$$\hat v_{it} = \gamma_{i0} + \sum_{h=1}^{p_i} \gamma_{ih}\,\hat v_{it-h} + \varepsilon_{it} \qquad (t = 1, \ldots, T_i;\ i = 1, \ldots, n), \tag{6.7}$$

where $\hat v_{it} \equiv 0$ if $t \le 0$. In practice, one can use AIC or BIC to select $p_i$ for each $i$. Suppose $\hat\gamma_i \equiv (\hat\gamma_{i0}, \hat\gamma_{i1}, \ldots, \hat\gamma_{ip_i})'$ is the OLS estimator of $\gamma_i \equiv (\gamma_{i0}, \gamma_{i1}, \ldots, \gamma_{ip_i})'$. We consider $q = 2$; an example is the Franklin wavelet in (3.3), whose $\lambda_2 = \pi^4/45$. We have

$$\hat\xi_0(2) \equiv \frac{\sum_{i=1}^{n}\int_{-\pi}^{\pi}\left[\frac{d^2}{d\omega^2}\hat f_i(\omega)\right]^2 d\omega}{\sum_{i=1}^{n}(T_i/T)\int_{-\pi}^{\pi}\hat f_i^2(\omega)\, d\omega}, \tag{6.8}$$

where $\hat f_i(\omega) \equiv (2\pi)^{-1}\left|1 - \sum_{h=1}^{p_i}\hat\gamma_{ih}\, e^{-ih\omega}\right|^{-2}$. We note that $\hat\xi_0(2)$ satisfies Assumption 9 with $q = 2$ because for parametric AR($p_i$) approximations, $\hat\xi_0(2) - \zeta_0(2) = O_P((nT)^{-1/2})$.

The performance of the plug-in method in (6.6) relies on the specification in (6.7). To further improve the power, one could also consider a data-driven, individual-specific $\hat J_i$ using the IMSE criterion of $f_i(\cdot)$ in Theorem 6(a). Such individual-specific $\hat J_i$ may more effectively capture spatial nonhomogeneity in the degree of serial correlation across $i$. However, they may have wide variations across $i$, leading to large Type I errors for the tests. A compromise is to develop a data-driven $\hat J_c$, where $c$ is an index for some suitable groups such as regions and sectors, and all individuals in the same group have the same finest scale.

In fact, the IMSE criterion is more suitable for estimation than for testing. A better criterion for testing is to maximize power or trade off between level distortion and power improvement. This will, however, require higher-order asymptotic analysis for our tests, which is beyond the scope of this paper and should be pursued in future work.
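As a rough illustration of how the ratio in (6.8) could be evaluated once AR coefficients have been estimated, the sketch below works on a uniform frequency grid. It is not the authors' implementation: the names `ar_spectral_density` and `xi_hat_2` are hypothetical, the AR slope coefficients are taken as given inputs (the OLS step of (6.7) is omitted), the panel is assumed balanced so that the weights $T_i/T$ equal one, the second derivative is approximated by a central finite difference, and the integrals by a Riemann sum over one period.

```python
import cmath
import math


def ar_spectral_density(gamma, omega):
    """(2*pi)^{-1} |1 - sum_h gamma_h exp(-i h omega)|^{-2} for AR slope coefficients gamma."""
    transfer = 1 - sum(g * cmath.exp(-1j * (h + 1) * omega) for h, g in enumerate(gamma))
    return 1.0 / (2.0 * math.pi * abs(transfer) ** 2)


def xi_hat_2(gammas, n_grid=2048):
    """Grid approximation of (6.8) for a balanced panel (all weights T_i/T equal to 1).

    gammas : list of AR slope-coefficient lists, one per individual (assumed already estimated)
    """
    step = 2.0 * math.pi / n_grid
    grid = [-math.pi + k * step for k in range(n_grid)]
    numerator = 0.0
    denominator = 0.0
    for gamma in gammas:
        f = [ar_spectral_density(gamma, w) for w in grid]
        for k in range(n_grid):
            # Central second difference; the density is 2*pi-periodic, so indices wrap around.
            d2 = (f[(k + 1) % n_grid] - 2.0 * f[k] + f[k - 1]) / step ** 2
            numerator += d2 * d2 * step
            denominator += f[k] * f[k] * step
    return numerator / denominator


if __name__ == "__main__":
    print(xi_hat_2([[0.0]]))           # flat spectrum: no curvature
    print(xi_hat_2([[0.5], [-0.5]]))   # curvature from both signs of AR(1)
```

A flat fitted spectrum gives a ratio of zero, which by (6.6) keeps the finest scale at 0, while any nonzero AR coefficient contributes positive curvature regardless of its sign.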
Simulation studies below show that the finest scale chosen via (6.6)–(6.8) gives reasonable rejection probabilities under $H_0$ and gives robust and good power, particularly when the spectrum has distinct peaks or kinks.

7. MONTE CARLO EXPERIMENT

We now compare the performance of $\hat W_1$ and $\hat W_2$ with three existing tests for serial correlation: Bhargava, Franzini, and Narendranathan's (1982; BFN) Durbin–Watson type test, Baltagi and Li's (1995; BL) LM test,

$$\mathrm{BL} = \frac{nT^2}{T-1}\left[\frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \hat v_{it}\hat v_{it-1}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \hat v_{it}^2}\right]^2, \tag{7.1}$$

and Bera, Sosa-Escudero, and Yoon's (2001; BSY) modified LM test:

$$\mathrm{BSY} = \frac{nT^2\left[\tilde B + \tilde A/T\right]^2}{(T-1)\left(1 - \frac{2}{T}\right)}, \tag{7.2}$$

where $\hat v_{it}$ is the within residual,

$$\tilde A \equiv 1 - \frac{\sum_{i=1}^{n} \tilde u_i' I_T \tilde u_i}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde u_{it}^2}, \qquad \tilde B \equiv \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde u_{it}\tilde u_{it-1}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde u_{it}^2},$$

$I_T$ is a matrix of ones, and $\tilde u_i \equiv (\tilde u_{i1}, \ldots, \tilde u_{iT})'$ is the OLS residual without random effects. All these tests consider balanced panels. Both BL and BSY have an asymptotic $\chi_1^2$ distribution under $H_0$. In contrast, BFN converges to 2 under $H_0$, a degenerate distribution. Bhargava, Franzini, and Narendranathan (1982, p. 436) suggest using a critical value of 2 at the 5% level. We find that BFN rejects $H_0$ up to 67.7%, 66.9%, and 64.8% at the 5% level, when $(n, T) = (25, 32)$, $(50, 64)$, and $(100, 128)$ respectively. It seems that this test cannot be used, so we drop it from comparison.

For $\hat W_1$ and $\hat W_2$, because the choice of $\psi(\cdot)$ is not important, we only use the Franklin wavelet in (3.3). To examine the impact of the choice of $J$, we consider $J = 0, 1$ and the data-driven $\hat J_0$ in (6.6). We also compare $\hat W_1$ and $\hat W_2$ with the panel versions of Hong's (1996) kernel-based tests, $\hat K_1$ and $\hat K_2$, which are obtained by generalizing Hong's (1996) kernel method to model (2.1). They are based on an individual-specific kernel spectral density estimator. We use the Daniell kernel $k(z) = \sin(\pi z)/\pi z$, $z \in \mathbb{R}$, which has the optimal power over a class of kernels.
We choose a data-driven bandwidth using a plug-in method similar to that for $\hat J_0$.

We consider the following two data generating processes (DGP): (a) DGP1, a static panel model: $Y_{it} = 5 + 0.5X_{it} + \mu_i + \varepsilon_{it}$, $X_{it} = 0.5X_{it-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $U[-0.5, 0.5]$; and (b) DGP2, a dynamic panel model: $Y_{it} = 5 + 0.5Y_{it-1} + \mu_i + \varepsilon_{it}$. For both DGPs, $\mu_i \sim$ i.i.d. $N(0, \sigma_\mu^2)$. Let $\tau$ measure the relative strength of random effects (no random effect when $\tau = 0$); we choose a variety of $\tau$ as in Baltagi, Chang, and Li (1992). To examine the probability of rejecting a correct $H_0$, we set $\varepsilon_{it} = z_{it}$, where $z_{it} \sim$ i.i.d. $N(0, 1)$. To examine the power, we consider the following error processes:

AR(1) Alternatives:

$$\begin{aligned}
\mathrm{AR}(1)^a &: \ \varepsilon_{it} = 0.2\,\varepsilon_{it-1} + z_{it}, && i = 1, \ldots, n,\\
\mathrm{AR}(1)^b &: \ \varepsilon_{it} = -0.2\,\varepsilon_{it-1} + z_{it}, && i = 1, \ldots, n,\\
\mathrm{AR}(1)^c &: \ \varepsilon_{it} = \begin{cases} 0.2\,\varepsilon_{it-1} + z_{it}, & i = 1, \ldots, \tfrac{n}{2},\\ -0.2\,\varepsilon_{it-1} + z_{it}, & i = \tfrac{n}{2}+1, \ldots, n. \end{cases}
\end{aligned} \tag{7.3}$$

ARMA(12, 4) Alternatives:

$$\begin{aligned}
\mathrm{ARMA}(12,4)^a &: \ \varepsilon_{it} = -0.3\,\varepsilon_{it-12} + z_{it} + z_{it-4}, && i = 1, \ldots, n,\\
\mathrm{ARMA}(12,4)^b &: \ \varepsilon_{it} = 0.3\,\varepsilon_{it-12} + z_{it} - z_{it-4}, && i = 1, \ldots, n,\\
\mathrm{ARMA}(12,4)^c &: \ \varepsilon_{it} = \begin{cases} -0.3\,\varepsilon_{it-12} + z_{it} + z_{it-4}, & i = 1, \ldots, \tfrac{n}{2},\\ 0.3\,\varepsilon_{it-12} + z_{it} - z_{it-4}, & i = \tfrac{n}{2}+1, \ldots, n. \end{cases}
\end{aligned} \tag{7.4}$$

AR(1)$^a$ and AR(1)$^b$ are the full positive and negative AR(1) respectively. We expect BL and BSY to have optimal power against them. Wavelet tests have no advantages because these alternatives have a relatively flat spectrum. AR(1)$^c$ is a mixed AR(1), where the first half of the individuals have a positive AR coefficient and the second half have a negative AR coefficient. On the other hand, ARMA(12, 4) can arise from monthly data; it has four distinct spectral peaks.

The top panel of Table I reports rejection probabilities under $H_0$ with $\tau = 4$ for DGP1. When $n < T$, the rejection probabilities of BSY are reasonable and the best among all the tests. BL overrejects $H_0$, while $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ underreject $H_0$ but not excessively. For other values of $\tau$ (not reported), the rejection probability of BSY is sensitive to the choice of $\tau$, displaying severe overrejections when $\tau$ is large.
BL still overrejects H0 for all τ The Ŵ1 , Ŵ2 K̂1 , and K̂2 tests are robust to the choice of τ. The rejection probabilities of Ŵ1 and Ŵ2 TABLE I PERCENTAGE OF REJECTIONS UNDER THE NULL HYPOTHESIS IN THE STATIC PANEL MODEL (n, T ) Level (5, 8) 10% 5% (10, 16) 10% 5% (25, 32) 10% 5% (50, 64) 10% 5% (5, 5) 10% (10, 10) 5% 10% 5% (25, 25) 10% 5% (50, 50) 10% 5% (8, 5) 10% 5% (16, 10) 10% 5% (32, 25) 10% 5% (64, 50) 10% 5% Asymptotic Critical Values Bootstrapped Critical Values Ŵ1 (0) Ŵ1 (1) Ŵ1 (Jˆ0 ) Ŵ2 (0) Ŵ2 (1) Ŵ2 (Jˆ0 ) K̂1 K̂2 BL BSY 89 46 86 40 89 46 127 58 122 67 127 58 129 61 209 113 50 15 37 12 83 89 83 109 112 109 64 106 18 17 33 44 33 53 59 53 33 68 4 4 105 94 97 115 106 111 92 113 69 47 53 48 40 55 63 59 47 50 27 17 109 101 105 119 111 108 117 119 94 92 58 49 48 56 58 54 59 62 40 50 122 124 122 160 161 160 225 369 106 14 53 55 53 102 93 102 119 239 27 0 98 47 99 51 98 47 147 82 155 91 147 82 93 46 211 114 8 2 4 0 95 91 95 123 128 123 92 135 36 31 39 49 39 64 76 64 39 77 15 15 104 95 93 120 117 121 99 109 81 84 38 36 42 49 50 52 37 61 41 34 78 88 98 119 118 134 73 109 81 73 33 36 50 43 66 80 34 70 40 54 104 92 92 117 100 100 94 108 86 108 38 54 54 48 56 56 50 62 38 59 103 102 100 106 104 94 101 106 115 107 48 52 58 55 38 38 57 53 63 55 106 74 84 113 92 94 104 105 116 101 59 44 48 58 44 54 59 50 57 53 1539 Note: (a) DGP: Yit = 5 + 5Xit + µi + εit Xit = 5Xit−1 + ηit ηit ∼ i.i.d. U[−5 5] µi ∼ i.i.d. N(0 4) and εit ∼ i.i.d. N(0 1). (b) Ŵ1 , heteroskedasticity-consistent Franklin wavelet-based test; Ŵ2 , heteroskedasticity-corrected Franklin wavelet-based test; Jˆ0 data-driven finest scale. (c) K̂1 , heteroskedasticity-consistent Daniell kernel-based test; K̂2 heteroskedasticity-corrected Daniell kernel-based test. (d) BL, Baltagi–Li test; BSY, Bera, Sosa-Escudero, and Yoon test. (e) 1000 simulation replications; 500 bootstrap replications. 
SERIAL CORRELATION IN PANEL MODELS Ŵ1 (0) 25 16 45 28 60 31 66 29 6 0 31 17 48 26 66 34 4 0 26 18 49 25 63 30 Ŵ1 (1) 18 13 33 20 51 28 58 27 3 0 20 13 37 21 58 35 2 1 17 7 35 12 48 21 Ŵ1 (Jˆ0 ) 18 13 33 19 63 40 79 40 6 0 20 13 45 25 78 43 4 1 17 7 41 17 69 38 Ŵ2 (0) 14 4 45 25 57 30 68 34 1 0 24 11 43 22 66 35 1 0 14 7 43 28 59 29 Ŵ2 (1) 10 4 28 14 46 19 60 30 0 0 8 3 29 14 54 26 0 0 7 3 24 17 42 22 Ŵ2 (Jˆ0 ) 10 4 30 11 47 23 82 37 1 0 8 3 30 16 72 36 1 0 7 3 26 13 63 37 K̂1 25 15 36 17 66 39 72 39 9 0 23 13 50 29 75 40 4 2 16 8 42 21 67 35 K̂2 14 4 28 12 58 26 80 38 1 0 12 2 33 20 75 34 1 0 7 2 31 17 71 32 BL 305 199 226 149 232 146 225 145 453 324 352 239 302 208 282 192 578 463 480 354 347 245 353 238 BSY 90 30 57 21 67 18 77 22 183 93 135 51 99 34 73 28 319 207 220 103 142 59 114 48 1540 Y. HONG AND C. KAO are better when a smaller J or data-driven Jˆ0 is used. When n = T Ŵ1 , Ŵ2 K̂1 , and K̂2 all underreject H0 , but they improve when n = T increase. On the other hand, BL overrejects severely while BSY performs well. When n > T again, BSY has the best rejection probabilities if T > 10 but overrejects if T is small BL still overrejects H0 for all sample sizes. The Ŵ1 , Ŵ2 K̂1 , and K̂2 tests all underreject H0 but they are substantially improved when T > 25, especially when data-driven Jˆ0 or J = 0 is used. The rejection probabilities of Ŵ1 and Ŵ2 are similar to those of K̂1 and K̂2 in all cases. Table I indicates that when using asymptotic critical values, most tests do not perform well when n T < 32. For small samples, we suggest using wild bootstrap (e.g., Härdle and Mammen (1993), Davidson and Flachaire (2001), Horowitz (2001)) as an alternative to asymptotic approximation. The bottom panel of Table I reports bootstrap rejection probabilities, which are based on 1000 replications and 500 bootstrap resamples. The Ŵ1 and Ŵ2 tests have reasonable rejection probabilities in small samples for all cases (n < T , n = T , and n > T ). 
It appears that wild bootstrap can remedy the underrejection of $\hat W_1$ and $\hat W_2$ using asymptotic critical values, especially when the sample size is as small as 5 or 8. However, $\hat K_1$ and $\hat K_2$ overreject when $(n, T) = (5, 5)$. BL and BSY perform better using wild bootstrap, but they still tend to underreject.

Table II reports the rejection probabilities under DGP2, a dynamic panel process. Again $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ all underreject using asymptotic critical values, but they all perform well using wild bootstrap, though they still underreject compared to Table I. Unlike under DGP1 (a static panel process), BSY and BL perform poorly using either asymptotic or bootstrap critical values, though BL performs slightly better than BSY when $n \le T$. BSY overrejects for almost all cases using bootstrap critical values.

The top panel of Table III first reports the Type I error corrected powers against AR(1) alternatives, with $\tau = 4$, using empirical critical values, which provides a fair comparison. Because empirical critical values are not observable in practice, we also report power using bootstrap critical values. Under AR(1)$^a$, BSY is the most powerful, followed by BL. This is expected because both BSY and BL are optimal against AR(1). The $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ tests have nontrivial but substantially lower power when sample sizes are small. This is because AR(1)$^a$ has a relatively flat spectrum and the advantage of consistent testing is not displayed. Under AR(1)$^b$, BL becomes the most powerful. Somewhat surprisingly, $\hat W_1$, $\hat W_2$, $\hat K_1$, and $\hat K_2$ have rather high power and dominate BSY, perhaps due to the fact that a negative AR(1) has a less smooth spectrum than a positive AR(1). For example, with $(n, T) = (10, 16)$, the power of BSY is 6.3% while those of BL and $\hat W_1(0)$ are 71.4% and 47.1% respectively. Interestingly, both BSY and BL fail to detect AR(1)$^c$, the mixed model.
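The cancellation that hurts pooled statistics against a mixed AR(1) alternative can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not a replication of the paper's experiment: it works with simulated errors directly rather than within residuals, uses an illustrative AR coefficient of ±0.5, and the helper names (`simulate_ar1`, `pooled_and_quadratic`, `baltagi_li_lm`) are hypothetical. The function `baltagi_li_lm` applies the form of (7.1) to the simulated series, while the "quadratic" summary simply averages squared individual lag-1 autocorrelations to mimic the effect of a quadratic form.

```python
import random


def simulate_ar1(rho, T, rng, burn=50):
    """Simulate T observations of e_t = rho * e_{t-1} + z_t, z_t ~ N(0, 1), after a burn-in."""
    e, out = 0.0, []
    for t in range(T + burn):
        e = rho * e + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(e)
    return out


def lag1_sums(series):
    """Return (sum_t e_t e_{t-1}, sum_t e_t^2) for one individual."""
    cross = sum(a * b for a, b in zip(series[1:], series[:-1]))
    square = sum(a * a for a in series)
    return cross, square


def pooled_and_quadratic(panel):
    """Pooled lag-1 autocorrelation and the average squared individual autocorrelation."""
    sums = [lag1_sums(s) for s in panel]
    pooled = sum(c for c, _ in sums) / sum(s for _, s in sums)
    quadratic = sum((c / s) ** 2 for c, s in sums) / len(sums)
    return pooled, quadratic


def baltagi_li_lm(panel):
    """Statistic of the form (7.1), applied here to raw simulated errors (balanced panel)."""
    n, T = len(panel), len(panel[0])
    pooled, _ = pooled_and_quadratic(panel)
    return n * T * T / (T - 1) * pooled ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    n, T, rho = 40, 200, 0.5  # illustrative values, not the paper's design
    mixed = [simulate_ar1(rho if i < n // 2 else -rho, T, rng) for i in range(n)]
    same = [simulate_ar1(rho, T, rng) for _ in range(n)]
    print("mixed:", pooled_and_quadratic(mixed), baltagi_li_lm(mixed))
    print("same :", pooled_and_quadratic(same), baltagi_li_lm(same))
```

In the mixed panel the pooled autocorrelation averages positive and negative contributions and stays near zero, whereas the average of squared individual autocorrelations remains well away from zero.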
In contrast, Ŵ1 , Ŵ2 K̂1 , and K̂2 are very powerful against AR(1)c , indicating that wavelet and kernel tests are rather effective in TABLE II PERCENTAGE OF REJECTIONS UNDER THE NULL HYPOTHESIS IN THE DYNAMIC PANEL MODEL (n, T ) Level (5, 8) 10% 5% (10, 16) 10% 5% (25, 32) 10% 5% (50, 64) 10% 5% (5, 5) 10% 5% (10, 10) 10% 5% (25, 25) 10% 5% (50, 50) 10% 5% (8, 5) 10% 5% (16, 10) 10% 5% (32, 25) 10% 5% (64, 50) 10% 5% Asymptotic Critical Values 2 13 11 46 23 2 11 4 25 13 2 11 4 28 11 0 10 3 34 20 0 4 2 26 13 0 4 2 32 11 3 16 7 37 22 0 8 2 34 21 98 48 18 21 3 62 298 198 999 997 Ŵ1 (0) 35 Ŵ1 (1) 20 Ŵ1 (Jˆ0 ) 35 Ŵ2 (0) 40 Ŵ2 (1) 35 Ŵ2 (Jˆ0 ) 40 K̂1 60 K̂2 35 BL 120 BSY 155 15 45 20 60 15 45 15 65 10 90 15 65 10 65 5 65 45 20 80 100 62 41 63 55 35 59 70 67 3 0 29 0 0 5 24 0 0 4 33 0 0 4 24 0 0 1 20 0 0 0 27 0 0 0 38 0 0 3 28 0 0 1 0 402 314 128 0 240 163 142 55 75 45 85 55 90 60 90 65 90 60 90 35 90 60 85 0 5 90 205 45 45 30 70 50 40 25 70 60 45 30 70 45 80 45 65 40 65 40 40 40 80 45 65 50 40 25 55 45 70 40 55 0 240 160 110 90 220 130 110 0 27 19 38 12 4 0 26 18 49 25 63 30 3 27 11 20 7 2 1 17 7 35 12 48 21 3 23 15 35 18 4 1 17 7 41 17 69 38 0 23 14 30 15 1 0 14 7 43 28 59 29 0 17 4 18 5 0 0 7 3 24 17 42 22 0 12 7 28 9 1 0 7 3 26 13 63 37 2 28 14 32 15 4 2 16 8 42 21 67 35 0 18 9 33 16 1 0 7 2 31 17 71 32 58 28 10 13 2 567 442 177 100 19 5 6 2 73 969 931 949 949 233 160 145 81 994 986 590 590 Bootstrapped Critical Values 30 40 30 25 25 25 30 35 15 25 120 85 120 95 115 95 85 75 25 150 20 55 25 30 80 35 20 55 25 10 75 25 20 100 40 10 75 40 20 70 35 15 85 30 60 30 15 75 210 110 110 65 30 5 40 85 30 20 5 40 100 55 30 5 40 95 50 50 30 50 100 40 25 15 45 120 50 50 30 50 105 55 20 5 30 110 65 25 15 35 15 0 335 205 140 240 120 195 110 115 20 55 35 15 65 35 20 55 35 15 65 30 20 65 30 15 65 30 15 70 30 15 55 30 60 35 15 50 230 135 80 40 110 45 90 45 105 60 95 65 100 65 75 50 110 65 5 0 305 160 1541 Notes: (a) DGP: Yit = 5 + 5Yit−1 + µi + εit µi ∼ i.i.d. N(0 4) and εit ∼ i.i.d. 
N(0 1). (b) Ŵ1 , heteroskedasticity-consistent Franklin wavelet-based test; Ŵ2 , heteroskedasticity-corrected Franklin wavelet-based test; Jˆ0 data-driven finest scale. (c) K̂1 , heteroskedasticity-consistent Daniell kernel-based test; K̂2 heteroskedasticitycorrected Daniell kernel-based test. (d) BL, Baltagi–Li test; BSY, Bera, Sosa-Escudero, and Yoon test. (e) 1000 simulation replications; 500 bootstrap replications. SERIAL CORRELATION IN PANEL MODELS Ŵ1 (0) 5 Ŵ1 (1) 4 Ŵ1 (Jˆ0 ) 5 Ŵ2 (0) 1 Ŵ2 (1) 0 Ŵ2 (Jˆ0 ) 1 K̂1 6 K̂2 1 BL 166 BSY 129 TABLE III Model 1 (n, T ) Level (5, 8) 10% 5% (10, 16) 10% 5% Model 2 (25, 32) 10% 5% (50, 64) 10% 5% (5, 8) 10% (10, 16) 5% 10% 5% Model 3 (25, 32) 10% 1542 PERCENTAGE OF REJECTIONS UNDER THE AR(1) ALTERNATIVE IN THE STATIC PANEL MODEL (50, 64) 5% (5, 8) (10, 16) (25, 32) (50, 64) 10% 5% 10% 5% 10% 5% 10% 5% 10% 5% 999 999 999 999 999 999 999 999 999 999 999 999 999 999 999 999 999 999 999 999 167 160 161 143 154 155 164 149 177 111 105 106 105 72 72 72 103 81 83 62 386 299 257 344 238 221 346 287 116 87 252 185 159 197 151 127 242 69 610 45 933 837 800 937 800 758 879 866 174 107 887 736 678 883 681 666 807 775 124 60 999 999 999 999 999 999 999 999 139 126 999 999 999 999 999 999 999 999 74 63 995 995 995 995 995 995 995 995 995 995 125 120 125 140 145 140 130 45 70 65 75 75 75 65 50 65 65 70 25 65 420 270 420 355 255 355 350 300 100 60 255 170 255 240 160 240 195 205 35 55 915 795 825 915 770 800 845 865 175 20 825 700 710 830 675 715 785 780 115 20 995 995 995 995 995 995 995 995 115 80 995 995 995 995 995 995 995 995 80 75 Empirical Critical Values 144 61 748 99 58 554 115 69 564 93 35 683 72 40 462 102 49 477 157 96 720 116 58 653 199 119 978 748 599 999 623 424 417 537 333 373 600 511 960 999 999 999 999 999 999 999 999 999 999 999 999 999 998 999 999 998 999 999 999 999 286 247 246 263 245 245 270 265 401 79 198 163 160 164 140 140 179 163 243 52 560 370 410 515 350 340 555 530 905 995 995 995 995 995 995 995 995 995 995 
995 995 995 995 995 995 995 995 995 995 995 215 180 215 285 245 285 220 295 160 40 145 100 145 170 165 170 150 170 80 40 608 498 369 592 470 366 508 476 830 95 471 356 283 422 352 257 392 354 714 63 996 554 902 992 462 883 949 946 999 708 979 424 828 977 333 840 908 905 999 592 Bootstrapped Critical Values Ŵ1 (0) 25 20 90 30 Ŵ1 (1) 25 15 75 40 Ŵ1 (Jˆ0 ) 25 20 75 40 Ŵ2 (0) 40 5 65 40 Ŵ2 (1) 35 15 60 30 Ŵ2 (Jˆ0 ) 40 5 60 30 K̂1 30 20 90 40 K̂2 50 10 90 45 BL 15 0 110 40 BSY 205 195 670 665 700 495 535 645 445 480 700 700 950 995 620 490 960 585 485 955 450 510 715 80 465 335 920 470 335 930 340 345 580 80 995 930 995 985 930 995 995 995 995 995 970 880 995 970 885 995 995 995 995 995 995 995 995 995 995 995 995 995 995 995 Notes: (a) DGP: Yit = 5 + 5Xit + µi + εit Xit = 5Xit−1 + ηit ηit ∼ i.i.d. U[−5 5] µi ∼ i.i.d. N(0 4) and Model 1: εit = 2εit−1 + zit , i = 1 n; Model 2: εit = −2εit−1 + zit , i = 1 n; and Model 3: εit = 2εit−1 + zit , i = 1 n/2 and εit = −2εit−1 + zit i = n/2 + 1 n where zit ∼ i.i.d. N(0 1). (b) Ŵ1 , heteroskedasticityconsistent Franklin wavelet-based test; Ŵ2 , heteroskedasticity-corrected Franklin wavelet-based test; Jˆ0 data-driven finest scale. (c) K̂1 , heteroskedasticity-consistent Daniell kernel-based test; K̂2 heteroskedasticity-corrected Daniell kernel-based test. (d) BL, Baltagi–Li test; BSY, Bera, Sosa-Escudero, and Yoon test. (e) 200 simulation replications; 200 bootstrap replications. Y. HONG AND C. KAO Ŵ1 (0) 36 14 Ŵ1 (1) 43 14 Ŵ1 (Jˆ0 ) 42 14 Ŵ2 (0) 36 9 Ŵ2 (1) 56 14 Ŵ2 (Jˆ0 ) 56 14 K̂1 47 16 K̂2 38 15 BL 35 11 BSY 307 209 SERIAL CORRELATION IN PANEL MODELS 1543 capturing inhomogeneous serial correlations across individuals. The bottom panel of Table III reports bootstrap power. Due to the extensive computation involved, we use 200 bootstrap resamples and 200 replications for bootstrap power. Again, BSY is the best for AR(1)a and BL, Ŵ1 , Ŵ2 K̂1 , and K̂2 have low power when T < 32. 
We also observe that $\hat W_1$ and $\hat W_2$ are much more powerful than $\hat K_1$ and $\hat K_2$ under AR(1)$^b$ for $(n, T) = (10, 16)$, though the advantages diminish as the sample sizes increase. For all AR(1) alternatives, the choice of $J$ has significant impact on $\hat W_1$ and $\hat W_2$, and $J = 0$ gives the best power for $\hat W_1$ and $\hat W_2$ against various AR(1) alternatives. The data-driven $\hat J_0$ delivers reasonable and robust power in all cases.

The top panel of Table IV reports the Type I error corrected powers against ARMA(12, 4) using empirical critical values. All tests have no power when the sample size is $(10, 16)$, since the effective sample size in fact is $(10, 4)$. This is because we remove the first 12 time series observations for each $i$. Hence the effective samples for the results reported are $(10, 4)$, $(25, 20)$, $(50, 52)$. $\hat W_1(\hat J_0)$ and $\hat W_2(\hat J_0)$ have the best power and dominate $\hat K_1$, $\hat K_2$, BL, and BSY against all three ARMA(12, 4) alternatives. This is apparently due to the fact that ARMA(12, 4) has four sharp spectral peaks, which can be more effectively captured by wavelets than kernels. This confirms our prediction that wavelet-based tests are powerful in capturing spectral modes/peaks in small and finite samples.

The bottom panel of Table IV reports bootstrap powers. Both $\hat W_1(\hat J_0)$ and $\hat W_2(\hat J_0)$ perform the best and clearly dominate $\hat K_1$ and $\hat K_2$ for most cases. We note that the clear dominance of $\hat W_1$ and $\hat W_2$ over $\hat K_1$ and $\hat K_2$ is reduced if bootstrap rather than empirical critical values are used. The choice of $J$ has significant impact on the power of $\hat W_1$ and $\hat W_2$, using either empirical or bootstrap critical values. The data-driven $\hat J_0$ gives better power than $J = 0, 1$. Apparently due to the seasonal patterns of ARMA(12, 4), the choice of $J = 0, 1$ yields little or no power for $\hat W_1$ and $\hat W_2$ against ARMA(12, 4)$^b$ and ARMA(12, 4)$^c$. In contrast, $\hat J_0$ is able to adapt to the unknown serial correlation pattern and gives robust and high power. This highlights the value of our data-driven finest scale $\hat J_0$.
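The peaked shape of an ARMA(12, 4) spectrum, which the discussion above credits for the advantage of wavelets, can be checked numerically. The sketch below is only an illustration: it reads the first alternative as $\varepsilon_{it} = -0.3\,\varepsilon_{it-12} + z_{it} + z_{it-4}$ with unit innovation variance (the decimal coefficient is an assumption of this sketch), and the function names `arma_12_4_density` and `count_interior_peaks` are hypothetical. It evaluates the spectral density on a grid over $[0, \pi]$ and counts interior local maxima.

```python
import cmath
import math


def arma_12_4_density(omega, ar12=-0.3, ma4=1.0):
    """Spectral density of e_t = ar12 * e_{t-12} + z_t + ma4 * z_{t-4}, with Var(z_t) = 1."""
    ma_part = 1 + ma4 * cmath.exp(-4j * omega)
    ar_part = 1 - ar12 * cmath.exp(-12j * omega)
    return abs(ma_part) ** 2 / (2.0 * math.pi * abs(ar_part) ** 2)


def count_interior_peaks(values):
    """Count grid points that exceed their left neighbour and are not below their right one."""
    return sum(
        1
        for k in range(1, len(values) - 1)
        if values[k] > values[k - 1] and values[k] >= values[k + 1]
    )


if __name__ == "__main__":
    grid = [math.pi * k / 4000 for k in range(4001)]
    density = [arma_12_4_density(w) for w in grid]
    print("interior peaks on [0, pi]:", count_interior_peaks(density))
    print("max / value at zero      :", max(density) / density[0])
```

The density vanishes where the lag-4 moving average term cancels the innovation and rises sharply between those zeros, a shape that a fixed-bandwidth kernel smooths over more heavily than a fine-scale wavelet estimator.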
We also conduct a simulation study on the power under a dynamic panel model (DGP2 ) which we do not report in the paper for space. The relative ranking between our tests and other tests under a dynamic model remains similar to that under a static model, and in fact the dominance of our waveletbased tests over the kernel-based tests against ARMA(12 4) error alternatives becomes more striking. For example, the Type I error corrected powers of Ŵ1 , Ŵ2 K̂1 , and K̂2 are 673%, 828%, 457%, and 583% respectively when (n T ) = (10 16). On the other hand, the relative ranking between BL and BSY is reversed: unlike under a static model, BSY now has poor power while BL is most powerful against a positive or negative (but not mixed) AR(1) error alternative. TABLE IV Model 1 (n, T ) Level (10, 16) 10% (25, 32) 5% 10% 5% Model 2 (50, 64) 10% 5% (10, 16) 10% (25, 32) 5% 10% 5% 1544 PERCENTAGE OF REJECTIONS UNDER THE ARMA(12, 4) ALTERNATIVE IN THE STATIC PANEL MODEL Model 3 (50, 64) 10% 5% (10, 16) 10% (25, 32) 5% (50, 64) 10% 5% 10% 5% Empirical Critical Values 0 0 10 0 0 0 0 0 27 0 0 0 0 0 0 0 0 0 0 0 181 463 432 161 420 348 350 277 291 95 119 466 299 95 301 262 241 191 177 47 418 915 968 442 913 968 930 941 194 66 334 841 952 314 841 956 903 915 123 29 0 0 1 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 38 1 344 27 0 165 116 73 86 339 10 0 228 7 81 289 59 32 50 237 73 0 955 57 0 955 958 961 62 229 39 0 955 28 0 944 950 944 35 148 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 98 108 371 79 81 289 230 163 162 210 57 60 258 40 35 196 240 102 98 124 220 293 943 229 260 944 948 944 147 171 146 197 936 132 201 939 937 939 72 103 45 0 970 45 0 970 975 975 30 75 40 20 40 100 80 100 15 60 10 5 10 5 10 55 25 55 5 20 5 5 135 115 340 130 160 360 280 370 80 15 55 70 255 60 95 275 195 240 35 15 290 335 945 225 300 945 955 955 125 50 145 235 940 150 215 925 940 935 65 50 Bootstrapped Critical Values Ŵ1 (0) Ŵ1 (1) Ŵ1 (Jˆ0 ) Ŵ2 (0) Ŵ2 (1) Ŵ2 (Jˆ0 ) K̂1 K̂2 BL BSY 30 20 30 55 35 55 15 30 10 5 5 0 5 30 10 30 0 
10 5 5 270 570 550 275 610 605 470 505 170 0 165 425 425 180 480 455 335 375 85 0 395 925 965 420 935 980 920 925 165 10 300 855 950 300 870 960 900 895 105 5 30 20 30 90 65 90 15 55 15 10 10 5 10 45 35 45 0 30 5 10 60 0 160 65 0 155 210 230 45 50 10 0 90 35 0 105 125 155 15 50 115 0 975 80 0 970 975 975 70 85 Notes: (a) DGP: Yit = 5 + 5Xit + µi + εit Xit = 5Xit−1 + ηit ηit ∼ i.i.d. U[−5 5] µi ∼ i.i.d. N(0 4) and Model 1: εit = −3εit−2 + zit + zit−4 , i = 1 n; Model 2: εit = 3εit−2 + zit + zit−4 , i = 1 n; and Model 3: εit = −3εit−2 + zit + zit−4 , i = 1 n/2 and εit = 3εit−2 + zit + zit−4 , i = n/2 + 1 n where zit ∼ i.i.d. N(0 1). (b) Ŵ1 , heteroskedasticity-consistent Franklin wavelet-based test; Ŵ2 , heteroskedasticity-corrected Franklin wavelet-based test; Jˆ0 data-driven finest scale. (c) K̂1 , heteroskedasticity-consistent Daniell kernel-based test; K̂2 heteroskedasticity-corrected Daniell kernel-based test. (d) BL, Baltagi–Li test; BSY, Bera, Sosa-Escudero, and Yoon test. (e) 200 simulation replications; 200 bootstrap replications. Y. HONG AND C. KAO Ŵ1 (0) Ŵ1 (1) Ŵ1 (Jˆ0 ) Ŵ2 (0) Ŵ2 (1) Ŵ2 (Jˆ0 ) K̂1 K̂2 BL BSY SERIAL CORRELATION IN PANEL MODELS 1545 8. CONCLUSION We have proposed a class of generally applicable wavelet-based consistent tests for serial correlation in static and dynamic panel models. Wavelets are powerful for detecting serial correlation where the spectrum has peaks or kinks. The new tests have a convenient limit N(0 1) distribution, which is not affected by parameter estimation uncertainty, even if regressors contain lagged dependent variables or deterministic/stochastic trending variables. Our tests do not require an alternative model, and are consistent against serial correlation of unknown form even in the presence of substantial inhomogeneity in serial correlation across individuals. They are applicable to unbalanced heterogeneous panel models with one way or two way error components. 
A data-driven method is developed to select the smoothing parameter, namely the finest scale in wavelet estimation, making our tests entirely operational in practice. Simulation shows that our tests perform well in finite samples.

Dept. of Economics and Dept. of Statistical Science, Cornell University, 424 Uris Hall, Ithaca, NY 14850, U.S.A., and Department of Economics, Tsinghua University, Beijing 100084, China; yh20@cornell.edu
and
Dept. of Economics and Center for Policy Research, Syracuse University, 426 Eggers Hall, Syracuse, NY, U.S.A.; cdkao@maxwell.syr.edu.

Manuscript received October, 2000; final revision received February, 2004.

APPENDIX A: MATHEMATICAL APPENDIX

To prove Theorems 1–6 and Corollary 1, we will use the following lemma.

LEMMA A.1: Suppose Assumptions 1 and 2 hold, and let $b_{J_i}(h,m)$ be as in (3.15). Then for any $J_i, T_i \in \mathbb{Z}^+$ and a bounded constant $C$ that does not depend on $i$, $J_i$, and $T_i$:
(i) $b_{J_i}(h,m)$ is real-valued, $b_{J_i}(0,m) = b_{J_i}(h,0) = 0$, and $b_{J_i}(h,m) = b_{J_i}(m,h)$;
(ii) $\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1} h^{\upsilon}|b_{J_i}(h,m)| \le C2^{(1+\upsilon)(J_i+1)}$ for $0 \le \upsilon \le \frac{1}{2}$;
(iii) $\sum_{h=1}^{T_i-1}\big[\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\big]^2 \le C2^{J_i+1}$;
(iv) $\sum_{h_1=1}^{T_i-1}\sum_{h_2=1}^{T_i-1}\big[\sum_{m=1}^{T_i-1}|b_{J_i}(h_1,m)b_{J_i}(h_2,m)|\big]^2 \le C(J_i+1)2^{J_i+1}$;
(v) $\big|\sum_{h=1}^{T_i-1} b_{J_i}(h,h) - (2^{J_i+1}-1)\big| \le C\big[(J_i+1) + 2^{J_i+1}(2^{J_i+1}/T_i)^{2\tau-1}\big]$, where $\tau$ is in Assumption 2;
(vi) $\big|\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1} b_{J_i}^2(h,m) - 2(2^{J_i+1}-1)\big| \le C\big[(J_i+1)^2 + 2^{J_i+1}(2^{J_i+1}/T_i)^{2\tau-1}\big]$;
(vii) $\sup_{1\le h,m\le T_i-1}|b_{J_i}(h,m)| \le C(J_i+1)$;
(viii) $\sup_{1\le h\le T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)| \le C(J_i+1)$.

PROOF OF LEMMA A.1: This lemma extends Lee and Hong (2001, Lemma A.1), who consider the case where $J_i \equiv J \to \infty$ as $T_i \equiv T \to \infty$. See Hong and Kao (2002) for a detailed proof. Q.E.D.

PROOF OF THEOREM 1: We shall show Theorems A.1–A.3.

THEOREM A.1: Let $\hat\alpha_{ijk}$ and $\bar\alpha_{ijk}$ be defined as in (3.12) and (6.3), and $V_{nT} \equiv \sum_{i=1}^n \sigma_i^8 V_{i0}$, where $V_{i0}$ is as in (3.15). Then $V_{nT}^{-1/2}\sum_{i=1}^n 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}(\hat\alpha_{ijk}^2 - \bar\alpha_{ijk}^2) \overset{p}{\to} 0$.

THEOREM A.2: Put $M_{nT} \equiv \sum_{i=1}^n \sigma_i^4 M_{i0}$, where $M_{i0}$ is as in (3.14). Then $V_{nT}^{-1/2}\big(\sum_{i=1}^n 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\bar\alpha_{ijk}^2 - M_{nT}\big) \overset{d}{\to} N(0,1)$.

THEOREM A.3: Let $\hat M$ and $\hat V$ be defined as in (3.15). Then $V_{nT}^{-1/2}(\hat M - M_{nT}) \overset{p}{\to} 0$ and $\hat V/V_{nT} \overset{p}{\to} 1$.

PROOF OF THEOREM A.1: Because $\hat\alpha_{ijk}^2 - \bar\alpha_{ijk}^2 = (\hat\alpha_{ijk} - \bar\alpha_{ijk})^2 + 2(\hat\alpha_{ijk} - \bar\alpha_{ijk})\bar\alpha_{ijk}$, we shall show Propositions A.1 and A.2 below under the conditions of Theorem 1. Q.E.D.

PROPOSITION A.1: $V_{nT}^{-1/2}\sum_{i=1}^n 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}(\hat\alpha_{ijk} - \bar\alpha_{ijk})^2 = O_P\big[V_{nT}^{-1/2} + (n^{-1} + T^{-1})V_{nT}^{1/2}\big]$.

PROPOSITION A.2: $V_{nT}^{-1/2}\sum_{i=1}^n 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}(\hat\alpha_{ijk} - \bar\alpha_{ijk})\bar\alpha_{ijk} \overset{p}{\to} 0$.

PROOF OF PROPOSITION A.1: By the definition of $\hat v_{it}$ in (2.3), we have $\hat v_{it} = \tilde v_{it} - \tilde X_{it}'(\hat\beta - \beta)$, where $\tilde X_{it}$ and $\tilde v_{it}$ are as in Assumption 5. We note that under $H_0$, $\{v_{it}\}$ in (2.4) coincides with the true errors $\{\varepsilon_{it}\}$ in (2.1), and so is i.i.d. for each $i$, and $\{v_{it}\}$ and $\{v_{ls}\}$ are independent for all $i \ne l$ and all $t, s$. Putting $\tilde R_i(h) \equiv T_i^{-1}\sum_{t=|h|+1}^{T_i}\tilde v_{it}\tilde v_{it-|h|}$, we write

(A.1)  $\hat R_i(h) - \tilde R_i(h) = (\hat\beta - \beta)'\tilde\Gamma_{ixx}(h)(\hat\beta - \beta) - (\hat\beta - \beta)'\tilde\Gamma_{ixv}(h) - (\hat\beta - \beta)'\tilde\Gamma_{ivx}(h) \equiv \sum_{c=1}^{3}\hat\xi_{ci}(h)$,

where $\tilde\Gamma_{ixx}(h) \equiv T_i^{-1}\sum_{t=|h|+1}^{T_i}\tilde X_{it}\tilde X_{it-|h|}'$, and $\tilde\Gamma_{ixv}(h)$ and $\tilde\Gamma_{ivx}(h)$ are as in Assumption 5. Next, recalling the definition of $\bar R_i(h)$ as in (6.3), we can write

(A.2)  $\tilde R_i(h) - \bar R_i(h) = T_i^{-1}\sum_{t=|h|+1}^{T_i}\big(-\bar v_{i\cdot}v_{it} - \bar v_{i\cdot}\tilde v_{it-|h|} - v_{it}\bar v_{\cdot t-|h|} - \bar v_{\cdot t}\tilde v_{it-|h|} + \bar v v_{it} + \bar v\tilde v_{it-|h|}\big) \equiv \sum_{c=4}^{9}\hat\xi_{ci}(h)$.

Given (A.1) and (A.2), we have $\hat\alpha_{ijk} - \bar\alpha_{ijk} = (2\pi)^{-1/2}\sum_{c=1}^{9}\sum_{h=1-T_i}^{T_i-1}\hat\xi_{ci}(h)\hat\Psi_{jk}(h)$. It follows that

(A.3)  $\sum_{i=1}^n 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}(\hat\alpha_{ijk} - \bar\alpha_{ijk})^2 \le 25\sum_{c=1}^{9}\sum_{i=1}^n T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\Big|\sum_{h=1-T_i}^{T_i-1}\hat\xi_{ci}(h)\hat\Psi_{jk}(h)\Big|^2 \equiv 25\sum_{c=1}^{9}\hat A_c$.

We shall show that $V_{nT}^{-1/2}\hat A_c \overset{p}{\to} 0$ for $1 \le c \le 9$. We first consider $\hat A_1$. From (A.1), we have $|\hat\xi_{1i}(h)| \le \|\hat\beta - \beta\|^2\|\tilde\Gamma_{ixx}(h)\| \le \|\hat\beta - \beta\|^2\|\tilde\Gamma_{ixx}(0)\|$. Let $b_{J_i}(h,m)$ be defined as in Lemma A.1.
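Then we have

$V_{nT}^{-1/2}|\hat A_1| = V_{nT}^{-1/2}\Big|\sum_{i=1}^n T_i\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1} b_{J_i}(h,m)\hat\xi_{1i}(h)\hat\xi_{1i}(m)\Big|$
$\le V_{nT}^{-1/2}\|\hat\beta - \beta\|^4\Big[\sum_{i=1}^n T_i\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1} b_{J_i}^2(h,m)\Big]^{1/2}\Big[\sum_{i=1}^n T_i^3\|\tilde\Gamma_{ixx}(0)\|^4\Big]^{1/2} = O_P(n^{-3/2})$,

given Lemma A.1(vi), $V_{nT} \le C\sum_{i=1}^n 2^{J_i+1}$, Assumptions 3–5, $T_i \le CT$, and $\sigma_i^2 \in [c, C]$.

Next, we consider the second term $\hat A_2$ in (A.3). Recalling $\Gamma_{ixv}(h) \equiv \operatorname{p\,lim}_{T_i\to\infty}\tilde\Gamma_{ixv}(h)$, we have $\hat\xi_{2i}(h) = -(\hat\beta - \beta)'\Gamma_{ixv}(h) - (\hat\beta - \beta)'[\tilde\Gamma_{ixv}(h) - \Gamma_{ixv}(h)]$. It follows that

(A.4)  $\hat A_2 \le 2\|\hat\beta - \beta\|^2\sum_{i=1}^n T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\Big\{\Big\|\sum_{h=1-T_i}^{T_i-1}\Gamma_{ixv}(h)\hat\Psi_{jk}(h)\Big\|^2 + \Big\|\sum_{h=1-T_i}^{T_i-1}[\tilde\Gamma_{ixv}(h) - \Gamma_{ixv}(h)]\hat\Psi_{jk}(h)\Big\|^2\Big\} \equiv 2\|\hat\beta - \beta\|^2\hat M_1 + 2\|\hat\beta - \beta\|^2\hat M_2$, say.

We now consider $\hat M_1$ in (A.4). Let $\Lambda_{ijk}^{xv} \equiv \int_{-\pi}^{\pi} f_{ixv}(\omega)\Psi_{jk}(\omega)\,d\omega$, where $f_{ixv}(\omega) \equiv (2\pi)^{-1}\sum_{h=-\infty}^{\infty}\Gamma_{ixv}(h)e^{-\mathrm{i}h\omega}$. Then $\Lambda_{ijk}^{xv} = (2\pi)^{-1/2}\sum_{h=-\infty}^{\infty}\Gamma_{ixv}(h)\hat\Psi_{jk}(h)$ by Parseval's identity, and $\sum_{h=1-T_i}^{T_i-1}\Gamma_{ixv}(h)\hat\Psi_{jk}(h) = (2\pi)^{1/2}\Lambda_{ijk}^{xv} - \sum_{|h|\ge T_i}\Gamma_{ixv}(h)\hat\Psi_{jk}(h)$. It follows by the Cauchy–Schwarz inequality that

(A.5)  $\hat M_1 \le 2\sum_{i=1}^n 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\|\Lambda_{ijk}^{xv}\|^2 + 2\sum_{i=1}^n T_i\sum_{|h|\ge T_i}\|\Gamma_{ixv}(h)\|^2\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\sum_{|h|\ge T_i}|\hat\Psi_{jk}(h)|^2 = O(nT) + o[nT(2^{\bar J}/T)^{2\tau}] = O(nT)$,

given $2^{\bar J}/T \to 0$, where $\bar J \equiv \max_{1\le i\le n}(J_i)$. Here, we have used the facts that (a) $\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\|\Lambda_{ijk}^{xv}\|^2 \le (2\pi)^{-1}\sum_{h=-\infty}^{\infty}\|\Gamma_{ixv}(h)\|^2 \le C$ by Parseval's identity and Assumption 5; (b) $\sum_{|h|\ge T_i}\|\Gamma_{ixv}(h)\|^2 = o(T^{-1})$ given Assumption 5 and $T_i = c_iT \ge cT$; and (c) $\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\sum_{|h|\ge T_i}|\hat\Psi_{jk}(h)|^2 \le \sum_{j=0}^{J_i}\sum_{|h|\ge T_i}|\hat\psi(2\pi h/2^j)|^2 \le C2^{2\tau J_i}/T_i^{2\tau-1}$ given Assumption 2.

For the second term $\hat M_2$ in (A.4), we have

(A.6)  $\|\hat\beta - \beta\|^2\hat M_2 \le \|\hat\beta - \beta\|^2\sum_{i=1}^n T_i\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\,\|\tilde\Gamma_{ixv}(h) - \Gamma_{ixv}(h)\|\,\|\tilde\Gamma_{ixv}(m) - \Gamma_{ixv}(m)\|$
$= O_P[(nT)^{-1}]\sum_{i=1}^n\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)| = O_P[(nT)^{-1}V_{nT}]$,

given Lemma A.1(ii), $V_{nT} \le C\sum_{i=1}^n 2^{J_i+1}$, and Assumption 5. Combining (A.4)–(A.6) yields $V_{nT}^{-1/2}\hat A_2 = O_P(V_{nT}^{-1/2} + (nT)^{-1}V_{nT}^{1/2})$. Similarly, we have $V_{nT}^{-1/2}\hat A_3 = O_P(V_{nT}^{-1/2} + (nT)^{-1}V_{nT}^{1/2})$.

Now we consider the term $\hat A_4$ in (A.3).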
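By the Cauchy–Schwarz inequality and the fact that under $H_0$, $\{v_{it}\}$ coincides with $\{\varepsilon_{it}\}$ and so is i.i.d. with $Ev_{it}^8 \le C$ for each $i$, we have $E\big(\bar v_{i\cdot}^2\,|T_i^{-1}\sum_{t=h+1}^{T_i}v_{it}|\,|T_i^{-1}\sum_{t=m+1}^{T_i}v_{it}|\big) \le CT_i^{-2}$ for $h, m > 0$. It follows from Markov's inequality, Lemma A.1(ii), and $V_{nT} \le C\sum_{i=1}^n 2^{J_i+1}$ that

(A.7)  $V_{nT}^{-1/2}\hat A_4 \le V_{nT}^{-1/2}\sum_{i=1}^n T_i\sum_{h=1}^{T_i-1}\sum_{m=1}^{T_i-1}|b_{J_i}(h,m)|\,\bar v_{i\cdot}^2\,\Big|T_i^{-1}\sum_{t=h+1}^{T_i}v_{it}\Big|\,\Big|T_i^{-1}\sum_{t=m+1}^{T_i}v_{it}\Big| = O_P(T^{-1}V_{nT}^{1/2})$.

Similarly, we have $V_{nT}^{-1/2}\hat A_5 = O_P(T^{-1}V_{nT}^{1/2})$.

Next, for the term $\hat A_6$ in (A.3), noting that $v_{it}$ and $\bar v_{\cdot t-h}$ are independent for $h > 0$ under $H_0$, we have $E\big(|\bar v_{\cdot t-h}\bar v_{\cdot t-m}|\,|T_i^{-1}\sum_{t=h+1}^{T_i}v_{it}|\,|T_i^{-1}\sum_{t=m+1}^{T_i}v_{it}|\big) \le C(nT_i)^{-1}$ for $h, m > 0$ by the Cauchy–Schwarz inequality and $Ev_{it}^8 \le C$. It follows that $V_{nT}^{-1/2}\hat A_6 = O_P(n^{-1}V_{nT}^{1/2})$. Similarly, we have $V_{nT}^{-1/2}\hat A_7 = O_P(n^{-1}V_{nT}^{1/2})$.

Finally, given $E\big(\bar v^2\,|T_i^{-1}\sum_{t=h+1}^{T_i}v_{it}|\,|T_i^{-1}\sum_{t=m+1}^{T_i}v_{it}|\big) \le Cn^{-1}T_i^{-2}$ for $h, m > 0$ under $H_0$, we have $V_{nT}^{-1/2}\hat A_c = O_P[(nT)^{-1}V_{nT}^{1/2}]$ for $c = 8, 9$. We have shown $V_{nT}^{-1/2}\hat A_c \overset{p}{\to} 0$ for all $1 \le c \le 9$ given $\max_{1\le i\le n}2^{2(J_i+1)}/(n^2 + T) \to 0$. Proposition A.1 then follows from (A.3). Q.E.D.

PROOF OF PROPOSITION A.2: Recalling $\hat\alpha_{ijk} - \bar\alpha_{ijk} = (2\pi)^{-1/2}\sum_{c=1}^{9}\sum_{h=1-T_i}^{T_i-1}\hat\xi_{ci}(h)\hat\Psi_{jk}(h)$, we can write

(A.8)  $\sum_{i=1}^n 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}(\hat\alpha_{ijk} - \bar\alpha_{ijk})\bar\alpha_{ijk} = \sum_{c=1}^{9}(2\pi)^{1/2}\sum_{i=1}^n T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\Big[\sum_{h=1-T_i}^{T_i-1}\hat\xi_{ci}(h)\hat\Psi_{jk}(h)\Big]\bar\alpha_{ijk} \equiv \sum_{c=1}^{9}\hat\delta_c$.

We shall show $V_{nT}^{-1/2}\hat\delta_c \overset{p}{\to} 0$ for $1 \le c \le 9$. First, we have

(A.9)  $V_{nT}^{-1/2}|\hat\delta_1 + \hat\delta_8 + \hat\delta_9| \le V_{nT}^{-1/2}(\hat A_1 + \hat A_8 + \hat A_9)^{1/2}\Big(\sum_{i=1}^n 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\bar\alpha_{ijk}^2\Big)^{1/2} = O_P\big[n^{-3/4}V_{nT}^{1/4} + (V_{nT}/nT)^{1/2}\big]$,

where $V_{nT}^{-1}\sum_{i=1}^n 2\pi T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\bar\alpha_{ijk}^2 = O_P(1)$ by Lemma A.1(v) and $E\bar\alpha_{ijk}^2 \le CT_i^{-1}\sum_{h=1-T_i}^{T_i-1}|\hat\psi_{jk}(2\pi h)|^2$.

Next, we consider the second term $\hat\delta_2$ in (A.8). We write

(A.10)  $\hat\delta_2 = -(\hat\beta - \beta)'(2\pi)^{1/2}\sum_{i=1}^n T_i\sum_{j=0}^{J_i}\sum_{k=1}^{2^j}\Big\{\sum_{h=1-T_i}^{T_i-1}\Gamma_{ixv}(h)\hat\Psi_{jk}(h) + \sum_{h=1-T_i}^{T_i-1}[\tilde\Gamma_{ixv}(h) - \Gamma_{ixv}(h)]\hat\Psi_{jk}(h)\Big\}\bar\alpha_{ijk} \equiv -(\hat\beta - \beta)'\hat M_3 - (\hat\beta - \beta)'\hat M_4$, say.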
SERIAL CORRELATION IN PANEL MODELS 1549 For the first term M̂3 noting that {ᾱijk } is a zero-mean sequence independent across i we obtain E M̂32 = n T −1 T −1 i i Ti2 E i=1 ≤ n σi4 Ti h=1 m=1 T −1 i i=1 2 bJi (h m)Γixv (h)R̄i (m) 2 Γixv (h) h=1 =O T n sup 1≤h≤Ti −1 T −1 i 2 b2Ji (h m) m=1 (Ji + 1)2 = O(T VnT ) i=1 −1/2 given Assumption 5, Lemma A.1(viii), and VnT ≤ C ni=1 2Ji +1 . It follows that VnT (β̂ − β) M̂3 = OP (n−1/2 ) by Chebyshev’s inequality For the second term M̂4 in (A.10) we have Ji 2j −1/2 −1/2 1/2 1/2 2 1/2 VnT |(β̂ − β) M̂4 | ≤ VnT β̂ − βM̂2 ( ni=1 2πTi j=0 = OP [(nT )−1 VnT ] where k=1 ᾱijk ) −1/2 M̂2 = OP [(nT )−1 VnT ] as shown in (A.6). It follows from (A.10) that VnT δ̂2 = OP (n−1/2 + −1/2 −1 1/2 −1/2 −1 1/2 (nT ) VnT ) Similarly, we have VnT δ̂3 = OP (n + (nT ) VnT ) Ti −1 Ti −1 n We now consider δ̂4 in (A.8). Write δ̂ = Ti h=1 4 i=1 m=1 bJi (h m)ξ̂4i (h)R̄i (m) Using n J +1 Lemma A.1(ii) and VnT ≤ C i=1 2 i , we can obtain E δ̂24 = n n Ti −1 Ti −1 Ti −1 Ti −1 Ti Tl bJi (h1 m1 )bJl (h2 m2 ) h1 =1 h2 =1 m1 =1 m2 =1 i=1 l=1 ≤ CT −1 T −1 T −1 n i i i=1 × E[ξ̂4i (h1 )R̄i (m1 )ξ̂4l (h2 )R̄l (m2 )] 2 2 n T −1 T −1 i i −2 |bJi (h m)| + CT |bJi (h m)| h=1 m=1 J¯ i=1 h=1 m=1 ≤ C(2 /T )VnT + CT −2 2 VnT where the first inequality follows from the facts that (a) |E[ξ̂4i (h1 )R̄i (m1 )ξ̂4i (h2 )R̄i (m2 )]| ≤ CTi−3/2 Tl−3/2 , and (b) for i = l |E[ξ̂4i (h1 )R̄i (m1 )ξ̂4l (h2 )R̄l (m2 )]| ≤ CTi−2 Tl−2 which can be shown by exploiting the facts that under H0 , {vit } coincides with {εit } and so is i.i.d. with E(vit8 ) ≤ C for each i and {vit } and {vlt } are mutually independent for i = l. Hence, we have −1/2 ¯ 1/2 −1/2 VnT δ̂4 = OP (2J2 /T 1/2 + VnT /T ) by Chebyshev’s inequality Similarly, we can obtain VnT δ̂5 = ¯ 1/2 1/2 J/2 OP (2 /T + VnT /T ) Ti −1 Ti −1 Next, we consider δ̂6 Write δ̂6 = ni=1 Ti h=1 m=1 bJi (h m)ξ̂6i (h)R̄i (m) where ξ̂6i (h) = Ti n −1 Ti v as in (A.2). 
Then using Lemma A.1(ii) and VnT ≤ C i=1 2Ji +1 , we can obtain v̄ it ·t−h t=h+1 E δ̂26 = n n Ti −1 Ti −1 Ti −1 Ti −1 Ti Tl i=1 l=1 ≤ Cn−1 bJi (h1 m1 )bJl (h2 m2 ) h1 =1 h2 =1 m1 =1 m2 =1 T −1 T −1 n i i i=1 J¯ × E[ξ̂6i (h1 )R̄i (m1 )ξ̂6l (h2 )R̄l (m2 )] n T −1 T −1 2 2 i i −2 |bJi (h m)| + Cn |bJi (h m)| h=1 m=1 ≤ C(2 /T )VnT + Cn i=1 h=1 m=1 −2 2 VnT 1550 Y. HONG AND C. KAO where for the first inequality, we have used the facts that (a) |E[ξ̂6i (h1 )R̄i (m1 )ξ̂6i (h2 )R̄i (m2 )]| ≤ CTi−2 n−1 ; (b) for i = l |E[ξ̂6i (h1 )R̄i (m1 )ξ̂6l (h2 )R̄l (m2 )]| ≤ CTi−1 Tl−1 n−2 which can be shown by exploiting the i.i.d. property of {vit } and the independence between {vit } and {vlt } for i = l ¯ −1/2 under H0 via tedious algebra. It follows by Chebyshev’s inequality and 2J /n → 0 that VnT δ̂6 = p p ¯ ¯ 1/2 −1/2 1/2 1/2 1/2 J/2 J/2 OP (2 /T + VnT /n) → 0 Similarly, we have VnT δ̂7 = OP (2 /T + VnT /n) → 0 We have p −1/2 shown VnT δ̂c → 0 for 1 ≤ c ≤ 9 given max1≤i≤n 22(Ji +1 )/(n2 + T ) → 0 Proposition A.2 then follows from (A.8). Q.E.D. PROOF OF THEOREM A.2: Recalling the definition of ᾱijk in (6.3), we can write ni=1 2πTi × Ji 2j n T −1 T −1 n 2 j=0 k=1 ᾱijk = i=1 Ti h=1 m=1 bJi (h m)R̄i (h)R̄i (m) ≡ i=1 (Âi + B̂1i − B̂2i − B̂3i ) where Ti −1 Ti −1 Âi ≡ 2Ti−1 bJi (h m) h=1 m=1 bJi (h m) h=1 m=1 B̂2i ≡ Ti−1 T −1 T −1 T −1 T −1 (by symmetry of bJi (· ·)) Ti vit2 vit−h vit−m t=1 bJi (h m) h=1 m=1 B̂3i ≡ Ti−1 vit vit−j vis vis−m t=2 s=1 Ti −1 Ti −1 B̂1i ≡ Ti−1 Ti t−1 Ti h vit vit−h vis vis−m t=1 s=m+1 bJi (h m) h=1 m=1 Ti m vit vit−h vis vis−m t=1 s=1 Note again that under H0 , {vit } coincides with {εit } and so is i.i.d. for each i and {vit } and {vls } are independent for i = l and all t s. Q.E.D. i 2j −1/2 n −1/2 n 2 PROPOSITION A.3: VnT ( i=1 2πTi Jj=0 k=1 ᾱijk − MnT ) = VnT i=1 Âi + oP (1). 
Next, we decompose Âi into the terms with t − s > qi and t − s ≤ qi , for some integer qi ∈ Z+ : T t−q −1 T Ti −1 Ti −1 t−1 i i i −1 (A.11) bJi (h m) + Âi = 2Ti vit vit−h vis vis−m h=1 m=1 t=qi +2 s=1 t=2 s=max(t−qi 1) ≡ B̂i + B̂4i Furthermore, we decompose q q qi Ti t−qi −1 n−1 n−1 n−1 i i bJi (h m) B̂i = 2Ti−1 (A.12) + + vit vit−h vis vis−m h=1 m=1 h=1 m=qi +1 h=qi +1 m=1 t=qi +2 s=1 ≡ Ûi + B̂5i + B̂6i ¯ PROPOSITION A.4: Suppose Assumptions 1 and 2 hold, 22J /T → 0 qi ≡ qi (Ti ) → ∞, −1/2 n qi /2Ji → ∞, and qi2 /Ti → 0 where J¯ ≡ max1≤i≤n (Ji ) If {vit } is i.i.d. for each i then VnT i=1 Âi = −1/2 n Û VnT + o (1). P i=1 i −1/2 PROPOSITION A.5: Under the conditions of Proposition A.4, VnT n i=1 Ûi d → N(0 1). 1551 SERIAL CORRELATION IN PANEL MODELS Propositions A.3–A.5 and the Slutsky theorem imply Theorem A.2 We now prove Propositions A.3–A.5. PROOF OF PROPOSITION A.3: Recall the definition of MnT in Theorem A.2. We obtain n Ji 2 j 2πTi i=1 ᾱ2ijk − MnT = j=0 k=1 n Âi + i=1 n n n B̂2i − B̂3i (B̂1i − σi4 Mi0 ) − i=1 i=1 i=1 p p −1/2 n −1/2 n −1/2 We shall show (a) VnT ( i=1 B̂1i − MnT ) → 0; (b) VnT × i=1 B̂2i → 0; and (c) VnT n p B̂ → 0 i=1 3i (a) Observe that B̂1i has a structure similar to B̂1n in Lee and Hong (2001). Following Lee and Hong’s (2001) reasoning and using Lemma A.1(ii), we can obtain that for each i and for Ti sufficiently large, E(B̂1i − E B̂1i )2 ≤ CTi−1 T −1 T −1 i i 2 |bJi (h m)| h=1 m=1 ¯ ≤ C 2 (2J /T ) Ti −1 Ti −1 |bJi (h m)| h=1 m=1 n Because {B̂1i } is a random sequence independent across i and i=1 E B̂1i = MnT we have n ¯ n E( i=1 B̂1i − MnT )2 = i=1 E(B̂1i − E B̂1i )2 = O(VnT 2J /T ) given Lemma A.1(ii) and VnT ≤ ¯ −1/2 n C ni=1 2Ji +1 Hence, by Chebyshev’s inequality and 22J /T → 0 we have VnT ( i=1 B̂1i − MnT ) = ¯ OP [(2J /T )1/2 ] = oP (1). 
2 ≤ CTi−1 × (b) Next, we consider B̂2i Following Lee and Hong (2001), we have E B̂2i Ti −1 Ti −1 3 [ h=1 m=1 |bJi (h m)|] Then by the fact that B̂2i is a zero-mean random sequence inde 2 pendent across i, Lemma A.1(ii), and VnT ≤ C ni=1 2Ji +1 , we have E( ni=1 B̂2i )2 = ni=1 E B̂2i = p −1/2 n 2J¯ 2J¯ B̂ → 0 by Chebyshev’s inequality and 2 /T → 0 O[(2 /T )VnT ] Hence, VnT i=1 2i p −1/2 B̂3i → 0 (c) By reasoning similar to (b) we can obtain VnT Q.E.D. PROOF OF PROPOSITION A.4: Given (A.11) and (A.12), we have Âi = Ûi + B̂4i + B̂5i + B̂6i It p −1/2 n suffices to show VnT i=1 B̂ci → 0 for c = 4 5 6 (a) We first consider B̂4i in (A.11) From Lee and Hong (2001, Proof of Theorem 1), we have for each i and for Ti sufficiently large, 2 E B̂4i T −1 T −1 2 Ti −1 Ti −1 i i ¯ ≤ C(qi /Ti ) |bJi (h m) ≤ C 2 (q̄2J /T ) |bJi (h m)| h=1 m=1 h=1 m=1 where q̄ ≡ max1≤i≤n (qi ) and the last inequality follows from Lemma A.1(ii). Hence, using the fact that {B̂4i } is a zero-mean random sequence independent across i, Lemma A.1(ii) and VnT ≤ ¯ 2 = O(VnT q̄2J /T ) This, Chebyshev’s inequality, C ni=1 2Ji +1 , we have E( ni=1 B̂4i )2 = ni=1 E B̂4i p ¯ −1/2 n q̄2 /T → 0, and 22J /T → 0 imply VnT i=1 B̂4i → 0 (b) Next, we consider B̂5i as in (A.12) By the definition of bJi (h m) the Cauchy–Schwarz inequality, and Assumption 2, we obtain 2 E B̂5i = σi4 T −1 T −1 h=1 m>qi b2Ji (h m) ≤ C Ji |ψ̂(2πm/2j )|2 ≤ C 2 22τJi /qi2τ−1 j=0 m>qi ¯ 2 Therefore, E( ni=1 B̂5i )2 = ni=1 E B̂5i ≤ C(2J /q0 )2τ−1 ni=1 2Ji where q0 ≡ min1≤i≤n (qi ) It fol−1/2 n 2τ−1 J¯ J¯ lows by Chebyshev’s inequality and 2 /q0 → 0 that VnT ] = oP (1) i=1 B̂5i = OP [(2 /q0 ) 1552 Y. HONG AND C. KAO (c) Finally, we consider B̂6i in (A.12) Following Lee and Hong (2001, Proof of Theorem 1), we obtain 2 E B̂6i ≤ C2 2τJi /qi2τ−1 + CTi−1 T −1 T −1 2 i i bJ (h m) i h=1 m=1 ¯ ≤ C2Ji (2Ji /q0 )2τ−1 + C(2J /T ) Ti −1 Ti −1 bJ (h m) i h=1 m=1 p ¯ −1/2 N 2τ−1 J¯ by Lemma A.1(ii) and Assumption 2. 
Thus, VnT + (2J /T )1/2 ] → 0 by i=1 B̂6i = OP [(2 /q0 ) n J +1 J¯ 2J¯ i Chebyshev’s inequality, VnT ≤ C i=1 2 2 /q0 → 0, and 2 /T → 0. This completes the proof of Proposition A.4. Q.E.D. T i qi PROOF OF PROPOSITION A.5: We write Ûi = Ti−1 t=q Uit where Uit ≡ 2vit h=1 vit−h × i +2 qi t−q −1 bJi (h m)Sit−qi −1 (m), and Sit−qi −1 (m) ≡ s=1i vis vis−m Then Hit−qi −1 (h) Hit−qi −1 (h) ≡ m=1 Û = T̄t=q0 +2 Ut where T̄ ≡ max1≤i≤n (Ti ) Ut ≡ ni=1 Uit 1(qi ≤ t ≤ Ti ) and 1(·) is the indicator function Put Ft ≡ ni=1 Fit where Fit is the sigma field generated by {vis s ≤ t}ni=1 Because {vit vit−h } is independent of Hit−qi −1 (h) for 0 < h ≤ qi {Ut Ft−1 } is an adapted martingale difference sequence (m.d.s.), with E Û 2 = T̄ EUt2 = t=q0 +2 n T̄ E(Uit2 )1(qi ≤ t ≤ Ti ) = VnT [1 + o(1)] t=q0 +2 i=1 given qi → ∞ qi /2Ji → ∞ q̄2 /T → 0. We apply Brown’s (1971) martingale limit theorem by verifying his two conditions: (a) var−2 (Û) T̄t=q0 +2 E{Ut2 1[|Ut | ≥ var1/2 (Û)]} → 0 for all > 0, T̄ p (b) var−2 (Û 2 )T −2 t=q0 +2 E(Ut2 |Ft−1 ) → 1 T̄ −2 4 We first verify (a) by showing VnT t=q0 +2 EUt → 0 Given t {Uit } is a zero-mean independent sequence across i, so we have EUt4 ≤ C[ ni=1 Ti−2 (EUit4 )1/2 1(qi ≤ t ≤ Ti )]2 Moreover, following Lee and Hong (2001, Proof of Theorem 1), we can obtain that for each i and for Ti sufqi qi T̄ −2 2 4 −1 ) → 0 ficiently large, EUit4 ≤ Ct 2 h=1 m=1 bJi (h m) It follows that VnT t=1 EUt = O(T Hence, condition (a) holds. −2 E(Ũ 2 − E Û 2 ) → 0 where Next, we verify condition (b) by showing VnT Ũ 2 ≡ T̄ E(Ut2 |Ft−1 ) t=q0 +2 E(Ut2 |Ft−1 ) ≡E Ti−1 n 2 Uit 1(qi ≤ t ≤ Ti ) Ft−1 i=1 = n i=1 Ti−1 E(Uit2 |Fit−1 )1(qi ≤ t ≤ Ti ) 1553 SERIAL CORRELATION IN PANEL MODELS where the second equality follows from the facts that for each t {Uit } is a zero-mean random sequence independent across i, and that for each i {Uit Fit−1 } is an m.d.s. 
Lee and Hong (2001) show that for each i and for Ti sufficiently large, T 2 T −1 T −1 2 i i i E [E(Uit2 |Fit−1 ) − EUit2 ] ≤ C(q̄/T ) |bJi (h m)| + C(Ji + 1)2Ji t=qi +2 h=1 m=1 It follows that E(Ũ 2 − E Ũ 2 )2 = n E i=1 Ti 2 [E(Uit2 |Fit−1 ) − EUit2 ] 2 ≤ C(q̄/T )VnT + C(J¯ + 1)VnT t=qi +2 i for T sufficiently large, where the equality follows from the fact that { Tt=q [E(Uit2 |Fit−1 ) − i +2 2 EUit ]} is a zero-mean independent sequence across i and the inequality follows from Lem ¯ −2 ma A.1(ii) and VnT ≤ C ni=1 2Ji +1 . Hence, given q̄2 /T → 0 and 22J /T → 0 we have VnT E(Ũ 2 − n −2 2 2 Ji E Ũ ) = O(q̄/T ) + O[VnT i=1 (Ji + 1)2 ] → 0 as n T → ∞ Thus, condition (b) holds, and so −1/2 Û → N(0 1) by Brown’s theorem. VnT d Q.E.D. PROOF OF THEOREM A.3: (a) Recalling the definition of M̂ and MnT we have (A.13) M̂ − MnT = Ti −1 n [R̂i (0) − Ri (0)]2 + 2[R̂i (0) − Ri (0)]Ri (0) bJi (h h) i=1 h=1 ≡ M̂5 + 2M̂6 Because R̂i (0) − Ri (0) = [R̂i (0) − R̃i (0)] + [R̃i (0) − R̄i (0)] + [R̄i (0) − Ri (0)] we can write (A.14) M̂5 ≤ 4 Ti −1 n [R̂i (0) − R̃i (0)]2 + [R̃i (0) − R̄i (0)]2 + [R̄i (0) − Ri (0)]2 bJi (h h) i=1 h=1 ≡ 4(M̂51 + M̂52 + M̂53 ) Using (A1), the Cauchy–Schwarz inequality, Assumptions 3–5, Lemma A.1(v), and VnT ≤ C ni=1 2Ji +1 , we have −1/2 −1/2 VnT M̂51 ≤ 4VnT β̂ − β2 n β̂ − β2 Γ˜ixx (0)4 + Γ˜ixv (0)2 + Γ˜ivx (0)2 i=1 Ti −1 × bJi (h h) h=1 1/2 = OP (nT )−1 VnT n J +1 Similarly, using VnT ≤ C i=1 2 i , the Cauchy–Schwarz inequality, Markov’s inequality, the i.i.d. 
property of {vit } for each i and independence between {vit } and {vls } for all i = l under H0 we can obtain E[R̃i (0) − R̄i (0)]2 ≤ C(Ti−2 + n−1 Ti−1 ) It follows from Markov’s inequality, the −1/2 Cauchy–Schwarz inequality, Lemma A.1(v), and VnT ≤ C ni=1 2Ji +1 that VnT M̂52 = OP [(T −2 + 1/2 −1 −1 −1 2 n T )VnT ] Using Markov’s inequality, E[R̄i (0) − Ri (0)] ≤ CTi Lemma A.1(v), and VnT ≤ −1/2 1/2 −1/2 1/2 C ni=1 2Ji +1 , we have VnT M̂53 = OP (T −1 VnT M̂5 = OP (T −1 VnT ) Hence, we have VnT )= oP (1) given (A.14) and VnT /T 2 → 0 1554 Y. HONG AND C. KAO Next, we consider the second term M̂6 in (A.13). We write (A.15) M̂6 = Ti −1 n [R̂i (0) − R̃i (0)] + [R̃i (0) − R̄i (0)] + [R̄i (0) − Ri (0)] Ri (0) bJi (h h) i=1 h=1 ≡ M̂61 + M̂62 + M̂63 Using (A.1) with h = 0 Assumptions 3–5, Lemma A.1(v), and VnT ≤ C −1/2 −1/2 VnT |M̂61 | ≤ VnT β̂ − β2 n n i=1 2Ji +1 , we have β̂ − βΓ˜ixx (0) + Γ˜ixv (0) + Γ˜ivx (0) Ri (0) i=1 Ti −1 × bJi (h h) h=1 1/2 = OP (nT )−1/2 VnT Also, using (A.2), E[R̃i (0) − R̄i (0)]2 ≤ C(Ti−2 + (nTi )−1 ), the Cauchy–Schwarz inequality, n −1/2 1/2 Markov’s inequality, Lemma A.1(ii), and VnT ≤ C i=1 2Ji +1 , we have VnT M̂62 = OP [T −1 VnT + (nT )−1/2 VnT ] Finally, noting that R̄i (0) − Ri (0) is a zero-mean sequence independent across i, we have 2 T −1 T −1 n n T i i i −1 T i −1 ¯ 2 E M̂63 = E[R̄i (0) − Ri (0)]2 bJi (h h) ≤ C 2 (2J /T ) bJi (h h) i=1 h=1 m=1 where the last inequality follows by Lemma A.1(v). ¯ by Chebyshev’s inequality. Hence, (A.15) and 22J /T p −1/2 VnT (M̂ − MnT ) → 0. p (b) The proof for Vˆ /VnT → 1 is analogous to (a). i=1 h=1 m=1 −1/2 ¯ It follows that VnT M̂63 = OP (2J/2 /T 1/2 ) p −1/2 → 0 imply VnT M̂6 → 0. We have shown Q.E.D. PROOF OF THEOREM 2: To conserve space, we only show for W̃1 . The proof for W̃2 is similar. Put M̃ ≡ ni=1 R̂2i (0)(2Ji +1 − 1) and V˜ ≡ 4 ni=1 R̂4i (0)(2Ji +1 − 1) Then we can write W̃1 − Ŵ1 = Ŵ1 (Vˆ 1/2 /V˜ 1/2 − 1) + Vˆ −1/2 (M̂ − M̃)(Vˆ 1/2 /V˜ 1/2 ) where M̂ and Vˆ are as in (3.14). 
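The decomposition of W̃1 − Ŵ1 stated above is one line of algebra. The sketch below writes it out under an assumption that is not spelled out in this appendix: both statistics are taken to centre and scale the same quadratic form, here denoted by the hypothetical symbol $S$, so that $\hat W_1 = (S - \hat M)/\hat V^{1/2}$ and $\tilde W_1 = (S - \tilde M)/\tilde V^{1/2}$. This form is suggested by Theorems A.1–A.3, but the definitions in Section 3 should be checked before relying on it.

```latex
% Assumed (hypothetical notation): S = \sum_i 2\pi T_i \sum_j \sum_k \hat\alpha_{ijk}^2,
% \hat W_1 = (S - \hat M)/\hat V^{1/2},  \tilde W_1 = (S - \tilde M)/\tilde V^{1/2}.
\begin{align*}
\tilde W_1 - \hat W_1
  &= \frac{(S - \hat M) + (\hat M - \tilde M)}{\tilde V^{1/2}} - \hat W_1 \\
  &= \hat W_1\,\frac{\hat V^{1/2}}{\tilde V^{1/2}} - \hat W_1
     + \frac{\hat M - \tilde M}{\tilde V^{1/2}} \\
  &= \hat W_1\left(\frac{\hat V^{1/2}}{\tilde V^{1/2}} - 1\right)
     + \hat V^{-1/2}(\hat M - \tilde M)\,\frac{\hat V^{1/2}}{\tilde V^{1/2}} .
\end{align*}
```

Read this way, the first term is negligible once $\hat W_1 = O_P(1)$ and $\hat V/\tilde V \overset{p}{\to} 1$, and the second once $V_{nT}^{-1/2}(\hat M - \tilde M) \overset{p}{\to} 0$, which is what conditions (a) and (b) deliver.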
p p −1/2 Because Ŵ1 = OP (1) by Theorem 1, VnT (M̂ − MnT ) → 0 and Vˆ /VnT → 1 by Theorem A.3, it p p p d −1/2 suffices for W̃1 − Ŵ1 → 0 and W̃1 → N(0 1) if (a) VnT (M̃ − MnT ) → 0 and (b) V˜ /VnT → 1 We first show (a). Following reasoning analogous to the proof of Theorem A.3, we ob p −1/2 −1/2 0 0 0 (M̃ − MnT ) → 0 where MnT ≡ ni=1 σi4 (2Ji +1 − 1) It remains to show VnT (MnT − tain VnT 2 Ji +1 ν ν MnT ) → 0 This follows from Lemma A.1(v), 2 = ai T n/T log2 (T ) → 0, and n/T 2(2τ−1)−2(2τ−1/2)ν → 0 because n n ¯ −1/2 −1/2 0 VnT |MnT − MnT | ≤ CVnT (Ji + 1) + (2J /T )(2τ−1) 2Ji +1 → 0 i=1 Now we show i=1 n (b). Put ≡ i=1 σi8 (2Ji +1 − 1) Following reasoning 0 p 0 → 1 It remains to show VnT /VnT we can obtain V˜nT /VnT 0 VnT analogous to the proof → 1 This follows from of Theorem A.3, Lemma A.1(vi) and Ji → ∞, because n n −1 0 −1 2 (2τ−1) Ji +1 J¯ VnT |VnT − VnT | ≤ CVnT (Ji + 1) + (2 /T ) 2 → 0 i=1 i=1 1555 SERIAL CORRELATION IN PANEL MODELS −1 where VnT n i=1 (Ji + 1)2 → 0 given VnT ≤ C n i=1 2Ji +1 and Ji → ∞ Q.E.D. PROOF OF THEOREM 3: Recall the definition of M̂ and Vˆ as in (3.14). Following reasoning analogous to the proof of Theorem A.3, we can obtain M̂ = MnT [1 + oP (1)] and Vˆ = i 2j α̂2 + oP (1) VnT [1 + oP (1)] It follows that (nA T )−1 Vˆ −1/2 Ŵ1 = (nA T )−1 ni=1 2πTi Jj=0 n k=1 ijk n given MnT ≤ C i=1 (2Ji +1 − 1) = O(VnT ), and VnT /nA T → 0 by (nA T )−1 i=1 2Ji +1 → 0 It remains to show Ji 2j p n 2 2 (a) (nA T )−1 i=1 2πTi j=0 k=1 (α̂ijk − αijk ) → 0; and j n Ji 2 n 2 −1 (b) n−1 j=0 k=1 αijk = (nA T ) A i=1 2πTi i=1 2πci Q(fi fi0 ) + o(1) where αijk is defined in (3.8) We first show (a). Because (A.16) (nA T )−1 n Ji 2 i 2πTi i=1 (α̂2ijk − α2ijk ) j=0 k=1 = (nA T )−1 n Ji 2 i 2πTi i=1 [(α̂ijk − αijk )2 + 2(α̂ijk − αijk )αijk ] j=0 k=1 it suffices to show that the first term in (A.16) vanishes in probability. 
That the second term in (A.16) vanishes in probability then follows by the Cauchy–Schwarz inequality and the fact that n Ji 2j 2 2 (nA T )−1 i=1 2πTi j=0 k=1 αijk ≤ C supi∈NA Q(fi f0 ) ≤ C Noting α̂ijk − αijk = (α̂ijk − ᾱijk ) + (ᾱijk − αijk ), we obtain (A.17) n Ji 2 j 2πTi i=1 (α̂ijk − αijk )2 ≤ 2 j=0 k=1 n Ji 2 j 2πTi i=1 [(α̂ijk − ᾱijk )2 + (ᾱijk − αijk )2 ] j=0 k=1 ≡ 2(M̂71 + M̂72 ) Following reasoning analogous to the proof of Proposition A.1, we can obtain (A.18) (nA T )−1 M̂71 = OP [(nA T )−1 + (nA T )−1 VnT ] under Assumptions 1–6 and HA Note that we have obtained a slower rate for M̂71 under HA than under H0 For the second term in (A.17), we further decompose (A.19) M̂72 ≤ 2 n Ji 2 j 2πTi i=1 [(ᾱijk − E ᾱijk )2 + (E ᾱijk − αijk )2 ] ≡ 2(M̂721 + M̂722 ) j=0 k=1 We now consider the first term in (A.19). We have sup1≤h≤Ti −1 var[R̄i (h)] ≤ CTi−1 , which follows from Assumption 6 and Ti −1 var[R̄i (h)] = Ti−1 (1 − |l|/Ti )[R2i (l) + Ri (l − h)Ri (l + h) + κi (h l l + h)] l=1−Ti Cf. Hannan (1970, p. 209). Therefore, we have M̂721 ≤ n i=1 Ti −1 Ti −1 Ti sup var[R̄i (h)] 1≤h≤Ti −1 h=1 m=1 |bJi (h m)| = O(VnT ) 1556 Y. HONG AND C. KAO For the second term in (A.19), noting that |E ᾱijk − αijk | ≤ (2π)−1/2 Ti−1 we have ∞ T −1 Ji 2j n i −1 2 2 2 M̂722 ≤ Ti Ri (h) h |Ψ̂jk (h)| i=1 j=1 k=1 ¯ = O (22J /T ) h=1−Ti n ∞ h=−∞ |hRi (h)Ψ̂jk (h)| h=−∞ 2Ji +1 = o(VnT ) i=1 ¯ given Assumption 2 and 22J /T → 0. It follows by Markov’s inequality that (nA T )−1 M̂72 = OP [(nA T )−1 VnT ] This, (A.17), (A.18), and VnT /(nA T ) → 0 imply (a). Next, we show (b). This follows because n (nA T )−1 2 ∞ j 2πTi i=1 α2ijk − (nA T )−1 n j=0 k=1 2 ∞ Ji 2 j 2πTi i=1 α2ijk j=0 k=1 j ≤ C sup i∈NA α2ijk → 0 j=Ji +1 k=1 as min1≤i≤n (Ji ) → ∞ and Qi (fj f0 ) = ∞ 2j 2 k=1 αijk j=0 ≤ C This completes the proof for Ŵ1 . Q.E.D. 
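Step (b) of the proof above rests on a Parseval-type identity: the squared $L_2$ distance between a spectral density and the flat spectrum equals the sum of squared wavelet coefficients over all scales, so the part left out by stopping at a finest scale $J_i$ vanishes as $J_i \to \infty$. The following is a minimal numerical sketch of that mechanism, not the authors' procedure. It swaps in the Haar wavelet on a discretized grid in place of the Franklin wavelet, uses a hypothetical normalized AR(1) spectrum with coefficient `rho`, and assumes $Q(f, f_0) = \int_{-\pi}^{\pi}[f(\omega) - f_0(\omega)]^2\,d\omega$; all function names are illustrative.

```python
import math


def ar1_spectrum(omega, rho):
    """Normalized AR(1) spectral density with R(0) = 1 (hypothetical example)."""
    return (1.0 - rho ** 2) / (
        2.0 * math.pi * (1.0 - 2.0 * rho * math.cos(omega) + rho ** 2)
    )


def haar_scale_energies(samples):
    """Sum of squared orthonormal Haar detail coefficients per scale.

    Returned coarsest scale first, so entry j collects 2**j coefficients.
    """
    n = len(samples)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(samples)
    energies = []
    while len(approx) > 1:
        pairs = [(approx[2 * k], approx[2 * k + 1]) for k in range(len(approx) // 2)]
        # Detail coefficient is (a - b) / sqrt(2); its square is (a - b)^2 / 2.
        energies.append(sum((a - b) ** 2 / 2.0 for a, b in pairs))
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    energies.reverse()
    return energies


def cumulative_distance(rho, levels=10):
    """Partial sums over scales 0..J of the discretized squared distance."""
    n = 2 ** levels
    step = 2.0 * math.pi / n
    grid = [-math.pi + (k + 0.5) * step for k in range(n)]  # midpoint rule
    energies = haar_scale_energies([ar1_spectrum(w, rho) for w in grid])
    partial, total = [], 0.0
    for energy in energies:
        total += energy * step
        partial.append(total)
    return partial


rho = 0.5
partial = cumulative_distance(rho)
# Closed form from Parseval: (1 / 2pi) * sum over h != 0 of rho^(2|h|).
q_exact = 2.0 * rho ** 2 / (1.0 - rho ** 2) / (2.0 * math.pi)
for J, value in enumerate(partial):
    print(J, round(value, 6), round(q_exact - value, 6))
```

The last printed column is the energy omitted by truncating at scale `J`; it shrinks toward zero as `J` grows, which is the discrete analogue of $\sum_{j>J_i}\sum_k \alpha_{ijk}^2 \to 0$.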
PROOF OF THEOREM 4: Given Ti = ci T and 2Ji +1 = ai Tiν , we have 2Ji +1 = bi T ν where bi ≡ ai ciν When T → ∞ we have n n 0 −1 8 Ji +1 ν −1 8 VnT ≡ n σi (2 − 1) = T n σi bi [1 + o(1)] = b̄T ν [1 + o(1)] i=1 where b̄ ≡ n−1 n i=1 0 σi8 bi It follows from Theorem 2 and Vˆ /VnT → 1 that p i=1 (nA T )−1 (b̄T ν )1/2 Ŵ1 = n−1 A ci Q(fi fi0 ) + oP (1) and i∈NA (nA T )−1 (b̄T ν )1/2 Ŵ2 = n−1 A (b̄/σi8 bi )1/2 ci Qi + oP (1) i∈NA (c) For c = 1 2 we put SnT ≡ −2 ln[1 − Φ(Ŵc )] where Φ(·) is the N(0 1) CDF Because ln[1 − 1 2 Φ(z)] = − 2 z [1 + o(1)] as z → +∞, we have −2 (nA T ) (1) b̄T ν SnT n−1 A = 2 ci Q(fi fi0 ) + oP (1) and i∈NA −2 (nA T ) (2) b̄T ν SnT = n−1 A 2 (b̄/σi8 bi )1/2 ci Q(fi fi0 ) + oP (1) i∈NA (1) (2) Suppose {Ti(1) }ni=1 and {Ti(2) }ni=1 are two sequences of sample sizes used for Ŵ1 and Ŵ2 re(2) (1) spectively so that Sn(1) n(2) T (1) , and T (2) → ∞ Then Bahadur’s relative (1) T (1) /Sn(2) T (2) → 1 as n SERIAL CORRELATION IN PANEL MODELS 1557 efficiency of Ŵ1 to Ŵ2 n(2) n(2) (2) 1 (2) (2) ( n(2) Ti i=1 ci )n T BE(Ŵ1 : Ŵ2 ) ≡ lim i=1 = lim (1) (1) n n (1) 1 (1) (1) ( n(1) i=1 Ti i=1 ci )n T n ! lim −1 b̄/(σi8 bi )ci Q(fi fi0 ) (1+κ)/(3−ν) n→∞ n i=1 = limn→∞ n−1 ni=1 ci Q(fi fi0 ) where the last equality follows from n(c) = γ[T (c) ]κ for c = 1 2. Hence, BE(Ŵ1 : Ŵ2 ) > 1 if ai is a monotonically increasing function of Q(fi fi0 ) and ci = c (i.e., Ti = T ) for all i Q.E.D. ˆ the proof for Ŵ2 (J) ˆ is similar. We write PROOF OF THEOREM 5: We only consider Ŵ1 (J); ˆ − Ŵ2 (J) Ŵ1 (J) n 2j Jˆ −1/2 2 ˆ − M̂(J)] − [Vˆ (J)1/2 /Vˆ (J)1/2 − 1]Ŵ1 (J) ˆ 2πTi α̂ijk − [M̂(J) = Vˆ (J) i=1 j=J k ˆ − Given Ŵ1 (J) = OP (1) by Theorem 1 and Vˆ (J)/VnT → 1 by Theorem A.3, it suffices for Ŵ1 (J) ˆ 2j 2 p p d −1/2 n ˆ → ˆ − M̂(J)]} → Ŵ1 (J) → 0 and W1 (J) N(0 1) if (a) V { 2πTi J α̂ − [M̂(J) 0 and p nT i=1 j=J k ijk p (b) Vˆ (J)/Vˆ (J) → 1 We first show (a). 
Decompose (A.20) n 2 Jˆ j 2πTi i=1 α̂2ijk j=0 k=1 = n 2 Jˆ j 2πTi i=1 [(α̂ijk − ᾱijk )2 + ᾱ2ijk + 2(α̂ijk − ᾱijk )ᾱijk ] j=0 k=1 ≡ Ĝ1 + Ĝ2 + 2Ĝ3 For the first term in (A.20), we write Ĝ1 = n −1/2 i=1 p ˆ j 2πTi ( Jj=0 − Jj=0 ) 2k=1 (α̂ijk − ᾱijk )2 ≡ Ĝ11 − Ĝ12 By Proposition A.1, we have VnT Ĝ12 → 0 Next, for any given constants M > 0 and > 0 we have ˆ ˆ P(Ĝ11 > ) ≤ P Ĝ11 > 2J/2 M|2J /2J − 1| ≤ + P 2J/2 M|2J /2J − 1| > ˆ p where the second term vanishes to 0 as n T → ∞ given 2J/2 |2J /2J − 1| → 0 For the first probˆ −1/2 ability, given 2J/2 M|2J /2J − 1| ≤ we have for all n and all Ti sufficiently large, VnT Ĝ11 ≤ J+1 2j −1/2 n −1/2 2 p VnT Ĝ1 = oP (1) k=1 (α̂ijk − ᾱijk ) → 0 by Proposition A.1. Therefore, VnT i=1 2πTi j=0 Next, we consider Ĝ2 By symmetry of bJ (· ·) we write (A.21) Ĝ2 = n Ti −1 σi4 i=1 +2 [bJˆ(h h) − bJ (h h)] + h=1 n i=1 Ti n T i −1 [Ti−1 R̄2i (h) − σi4 ][bJˆ(h h) − bJ (h h)] i=1 h=1 Ti −1 h−1 R̄i (h)R̄i (m)[bJˆ(h m) − bJ (h m)] h=1 m=1 ≡ Ĝ21 + Ĝ22 + 2Ĝ23 say. 1558 Y. HONG AND C. KAO For the last term Ĝ23 in (A.21), we have for any constants M > 0 and > 0 −1/2 P(VnT |Ĝ23 | > ) −1/2 ˆ ˆ |Ĝ23 | > 2J/2 M|2J /2J − 1| ≤ + P 2J/2 M|2J /2J − 1| > ≤ P VnT where, again, the second term vanishes to 0 as n T → ∞ We now show the first probability vanˆ ishes. Put T̄ ≡ max1≤i≤n (Ti ) as before. Given 2J/2 M|2J /2J − 1| ≤ and the definition of aJ (h m) as in (3.14), we have [log2 2J (1+/M2J/2 )] |aJˆ(h m) − aJ (h m)| = 2π |cj (h m)ψ̂(2πh/2j )ψ̂∗ (2πm/2j )| j=[log2 2J (1−/M2J/2 )] where 2 j (A.22) −j cj (h m) ≡ 2 e i2π(m−h)k/2j = k=1 1 if m = h + 2j r for any integer r 0 otherwise. Cf. Priestley (1981, p. 392, (6.19)). 
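The orthogonality relation (A.22) is the usual sum of roots of unity, and it can be checked by brute force. The snippet below is a small sketch with illustrative function names (not from the paper) that compares the defining sum with the closed form over a range of lags and scales.

```python
import cmath
import math


def c_j(j, h, m):
    """Evaluate 2^{-j} * sum_{k=1}^{2^j} exp(i 2 pi (m - h) k / 2^j) directly."""
    n = 2 ** j
    total = sum(
        cmath.exp(1j * 2.0 * math.pi * (m - h) * k / n) for k in range(1, n + 1)
    )
    return total / n


def c_j_closed_form(j, h, m):
    """Right-hand side of (A.22): 1 if m - h is a multiple of 2^j, else 0."""
    return 1.0 if (m - h) % (2 ** j) == 0 else 0.0


# Largest discrepancy between the sum and the closed form on a small grid.
max_gap = max(
    abs(c_j(j, h, m) - c_j_closed_form(j, h, m))
    for j in range(0, 5)
    for h in range(-8, 9)
    for m in range(-8, 9)
)
print(max_gap)
```

The discrepancy is at the level of floating-point rounding, which is why only lag pairs with $m = h + 2^j r$ survive in the bound that follows.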
It follows that we have for all n and T sufficiently large, n T̄ −1 T̄ −1 E|Ĝ23 | ≤ E 1(h ≤ Ti )1(m ≤ Ti )Ti R̄i (h)R̄i (m) h=1−T̄ m=1−T̄ [log2 × 2π i=1 2J (1+/2J/2 M)] |cj (h m)ψ̂(2πh/2j )ψ̂∗ (2πm/2j )| j=[log2 2J (1−/2J/2 M)] [log2 2J (1+/2J/2 M)] ≤ 2πn × 1/2 j 2 2 j=[log2 2J (1−/2J/2 M)] ∞ T̄ −1 −j |ψ̂(2πh/2 )| j h=1−T̄ |ψ̂(2πh/2 + 2πr)| j r=−∞ 1/2 ≤ Cn1/2 2J+1 /2J/2 M = O(VnT /M) −1/2 ˆ where the second inequality follows given Assumption 2. Therefore, P(VnT |Ĝ23 | > 2J/2 M|2J / p −1/2 2J − 1| ≤ ) also vanishes to 0. Consequently, we have VnT Ĝ23 → 0 Similarly, we can also obtain p −1/2 VnT Ĝ22 → 0 and −1/2 VnT −1/2 ˆ − M̂(J)] = VnT Ĝ21 − [M̂(J) n p [σi4 − R̂2i (0)][bJˆ (h h) − bJ (h h)] → 0 i=1 n −1 2 4 −1/2 where n ] given Assumptions 3–5, and that under H0 {vit } i=1 [R̂i (0) − σi ] = OP [(nT ) −1/2 ˆ − coincides with {εit } and so is i.i.d. for each i It follows from (A.21) that VnT {Ĝ2 − [M̂(J) p M̂(J)]} → 0 Next, by the Cauchy–Schwarz inequality, we have −1/2 −1/2 −1/2 VnT |Ĝ3 | ≤ (VnT Ĝ1 )1/2 (VnT |Ĝ2 |)1/2 −1/4 1/4 = OP VnT + (T −1/2 + n−1/2 VnT ) oP (n1/4 ) = oP (1) 1559 SERIAL CORRELATION IN PANEL MODELS p −1/2 −1/2 Ĝ2 = n−1/2 VnT by Proposition A.1 and the fact that n−1/2 VnT (Ĝ21 + Ĝ22 + Ĝ23 ) → 0, as can be shown using reasoning similar to that for Ĝ23 Result (a) then follows from (A.20). p p ˆ nT → ˆ Vˆ (J) = 1 + oP (1) it suffices to show Vˆ (J)/V 1 given Vˆ (J)/VnT → 1 by (b) To show Vˆ (J)/ ˆ and VnT we can use reasoning analogous to that Theorem A.3. Recalling the definitions of Vˆ (J) for Ĝ23 to obtain ˆ − VnT ]/VnT = V −1 [Vˆ (J) nT n Ti −1 Ti −1 [R̂4i (0) − σi8 ] i=1 p [bJˆ(h m) − bJ (h m)] → 0 h=1 m=1 n −1 p 8 4 −1/2 ˆ Vˆ (J) → where we used the fact that n ] Thus, Vˆ (J)/ 1 It foli=1 [R̂i (0) − σi ] = OP [(nT ) p ˆ Vˆ (J) − 1]Ŵ1 (J) → 0 given Ŵ1 (J) = OP (1) by Theorem 1. Therefore, result (b) lows that [Vˆ (J)/ p d ˆ − Ŵ1 (J) → ˆ → holds, and we have Ŵ1 (J) 0 and Ŵ1 (J) N(0 1) This completes the proof. Q.E.D. 
PROOF OF THEOREM 6: We shall prove for Theorem 6(b) only; the proof for Theorem 6(a) is n n similar and simpler. (a) We first show n−1 i=1 Q(fˆi fi ) = n−1 i=1 Q(f¯i fi ) + oP (2J /T + 2−2qJ ) Write (A.23) n−1 n [Q(fˆi fi ) − Q(f¯i fi )] i=1 = n−1 n Q(fˆi f¯i ) + 2n−1 i=1 n i=1 π −π [fˆi (ω) − f¯i (ω)][f¯i (ω) − fi (ω)] dω ≡ Q̂1 + 2Q̂2 For Q̂1 in (A.23), by Parseval’s identity, (A.18), and VnT ∝ n2J+1 we have J 2 n (α̂ijk − ᾱijk )2 = OP [(nT )−1 + 2J /nT ] = oP (2J /T ) j Q̂1 = n−1 i=1 j=0 k=1 as n T → ∞ Next, we have Q̂2 = oP (2J /T + 2−2qJ ) by the Cauchy–Schwarz inequality, Q̂1 = n oP (2J /T ), n−1 i=1 Q(f¯i fi ) = OP (2J /T + 2−2qJ ) which follows by Markov’s inequality and n −1 J ¯ n /T + 2−2qJ ) The latter is to be shown below. i=1 EQ(fi fi ) = O(2 n −1 ¯ (b) To compute n i=1 EQ(fi fi ) we write (A.24) n−1 n EQ(f¯i fi ) = n−1 i=1 n EQ(f¯i E f¯i ) + n−1 i=1 n Q(E f¯i fi ) i=1 We first consider the second term in (A.24). By the orthonormality of {Ψjk (·)} we obtain (A.25) n−1 n ∞ 2 n j Q(E f¯i fi ) = n−1 i=1 i=1 j=J+1 k=1 J 2 n j α2ijk + n−1 (E ᾱijk − αijk )2 i=1 j=0 k=1 For the first term in (A.24), using (3.9) and (A.22), we have 2 ∞ j α2ijk = ∞ ∞ ∞ R(h)R(m)cj (h m)ψ̂(2πh/2j )ψ̂∗ (2πm/2j ) j=J+1 h=−∞ m=−∞ j=J+1 k=1 = ∞ ∞ ∞ j=J+1 h=−∞ r=−∞ R(h)R(h + 2j r)ψ̂(2πh/2j )ψ̂∗ (2πh/2j + 2πr) 1560 Y. HONG AND C. KAO We now evaluate the terms corresponding to r = 0 and r = 0 respectively. 
For r = 0 we have ∞ ∞ R2i (h)|ψ̂(2πh/2j )|2 = j=J+1 h=−∞ ∞ (2π/2j )q j=J+1 ∞ |ψ̂(2πh/2j )|2 2 2 h Ri (h) |2πh/2j |2q h=−∞ ∞ |ψ(z)|2 (2π/2j )q z→∞ |z|2q j=J+1 = lim = λq 2−2q(J+1) π −π ∞ h2 R2i (h) [1 + o(1)] h=−∞ 2 (q) fi (ω) dω [1 + o(1)] where f (·) and λq are defined in (6.1) and (6.2), and o(1) is uniform in i and ω ∈ [−π π] For the term corresponding to r = 0 it can be shown to be o(2−2q(J+1) ) It follows that for the first term in (A.25), (q) ∞ 2 n j (A.26) n−1 α2ijk = 2−2q(J+1) λ2q n−1 i=1 j=J+1 k=1 n i=1 π −π 2 (q) fi (ω) dω + o 2−2q(J+1) For the second term in (A.25), by Lemma A.1(vii) and Assumption 8 we have J 2 n j (A.27) n−1 (E ᾱijk − αijk )2 i=1 j=0 k=1 J 2 n j = n−1 Ti −1 i=1 j=0 k=1 ≤ 4Cn−1 n i=1 Ti−1 2 Ri (h)ψ̂jk (2πh) |h|≥Ti h=1−Ti Ti −1 Ti −1 Ti−2 |h|Ri (h)ψ̂jk (2πh) + |hRi (h)mRi (m)bJ (h m)| = O (J + 1)/T 2 h=1 m=1 Finally, we consider the first term in (A.24), the variance factor We write n−1 n EQ(f¯i E f¯i ) = n−1 i=1 Ti −1 n Ti −1 bJi (h m) cov[R̄i (h) R̄i (m)] i=1 h=1−Ti m=1−Ti = n−1 Ti −1 n Ti −1 bJi (h m)Ti−1 i=1 h=1−Ti m=1−Ti η(l) + m 1− Ti l × Ri (l)Ri (l + m − h) + Ri (l + m)Ri (l − h) + κi (l h m − h) ≡ Ω1nT + Ω2nT + Ω3nT say, where η(l) = l if l > 0 η(l) = 0 if h − m ≤ l ≤ 0, and η(l) = −l + h − m if −(Ti − h) + 1 ≤ l ≤ h − m Cf. Priestley (1981, p. 326). 
Given Assumption 6 and Lemma A.1(vii), we have $|\Omega_{2nT}| \le C(J+1)$ and $|\Omega_{3nT}| \le C(J+1)$. For the first term $\Omega_{1nT}$, we can write

\[
\begin{aligned}
\Omega_{1nT}
&= n^{-1}\sum_{i=1}^{n}\sum_{h=1-T_i}^{T_i-1} b_{J_i}(h,h)\,T_i^{-1}\sum_{l}\Bigl(1 - \frac{h}{T_i}\Bigr)R_i^2(l) \\
&\quad + n^{-1}\sum_{i=1}^{n}\sum_{h=1-T_i}^{T_i-1}\sum_{|r|=1}^{T_i-1} b_{J_i}(h,h+r)\,T_i^{-1}\sum_{l}\Bigl[1 - \frac{\eta(l) + h - r}{T_i}\Bigr]R_i(l)R_i(l+r) \\
&= n^{-1}\sum_{i=1}^{n}T_i^{-1}\bigl(2^{J_i+1} - 1\bigr)\sum_{h=-\infty}^{\infty}R_i^2(h) + O\bigl[(J+1)/T\bigr],
\end{aligned}
\]

where we have used Lemma A.1(v) for the first term, which corresponds to $h = m$; the second term corresponds to $h \neq m$ and is $O[(J+1)/T]$ uniformly in $i$ given $\sum_{h=-\infty}^{\infty}|R_i(h)| \le C$ and Lemma A.1(v). It follows that as $J \to \infty$,

\[
n^{-1}\sum_{i=1}^{n}EQ(\bar f_i, E\bar f_i) = \frac{2^{J+1}}{T}\,n^{-1}\sum_{i=1}^{n}\int_{-\pi}^{\pi}f_i^2(\omega)\,d\omega + o(2^J/T).
\tag{A.28}
\]

Collecting (A.24)–(A.28) and letting $J \to \infty$, we obtain

\[
n^{-1}\sum_{i=1}^{n}EQ(\hat f_i, f_i)
= \frac{2^{J+1}}{T}\,n^{-1}\sum_{i=1}^{n}\int_{-\pi}^{\pi}f_i^2(\omega)\,d\omega
+ 2^{-2q(J+1)}\lambda_q\,n^{-1}\sum_{i=1}^{n}\int_{-\pi}^{\pi}\bigl[f_i^{(q)}(\omega)\bigr]^2 d\omega
+ o(2^J/T + 2^{-2qJ}).
\]

This completes the proof of Theorem 6. Q.E.D.

PROOF OF COROLLARY 1: The result follows from Theorem 5 because Assumption 9 implies $2^{\hat J}/2^J - 1 = o_P(T^{-1/(2(2q+1))}) = o_P(2^{-J/2})$, where the nonstochastic finest scale $J$ is given by $2^{J+1} \equiv \max\{[2\alpha\lambda_q^2\zeta_0(q)T]^{1/(2q+1)}, 0\}$. The latter satisfies the conditions of Theorem 5. Q.E.D.

REFERENCES

ANDERSON, T. W., AND C. HSIAO (1982): “Formulation and Estimation of Dynamic Models Using Panel Data,” Journal of Econometrics, 18, 47–82.
ANDREWS, D. W. K. (1991): “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation,” Econometrica, 59, 817–858.
BALTAGI, B. H. (2002): Econometric Analysis of Panel Data, 2nd Edition. New York: John Wiley.
BALTAGI, B. H., AND Q. LI (1991): “A Joint Test for Serial Correlation and Random Individual Effects,” Statistics and Probability Letters, 11, 277–280.
BALTAGI, B. H., AND Q. LI (1995): “Testing AR(1) Against MA(1) Disturbances in an Error Component Model,” Journal of Econometrics, 68, 133–151.
BALTAGI, B. H., Y.-J. CHANG, AND Q.
LI (1992): “Monte Carlo Results on Several New and Existing Tests for the Error Component Model,” Journal of Econometrics, 54, 95–120.
BAHADUR, R. R. (1960): “Stochastic Comparison of Tests,” Annals of Mathematical Statistics, 31, 276–295.
BERA, A. K., W. SOSA-ESCUDERO, AND M. YOON (2001): “Tests for the Error Component Model in the Presence of Local Misspecification,” Journal of Econometrics, 101, 1–23.
BINDER, M., C. HSIAO, AND M. H. PESARAN (1999): “Likelihood Based Inference for Panel Vector Autoregressions: Testing for Unit Roots and Cointegration in Short Panels,” Working Paper, Department of Economics, University of Southern California.
BIZER, D. S., AND S. DURLAUF (1992): “Testing the Positive Theory of Government Finance,” Journal of Monetary Economics, 26, 123–141.
BHARGAVA, A., L. FRANZINI, AND W. NARENDRANATHAN (1982): “Serial Correlation and Fixed Effects Model,” Review of Economic Studies, 49, 533–549.
BLUNDELL, R., AND S. BOND (1998): “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models,” Journal of Econometrics, 87, 115–143.
BOX, G., AND D. PIERCE (1970): “Distribution of Residual Autocorrelations in Autoregressive Integrated Moving Average Time Series Models,” Journal of the American Statistical Association, 65, 1509–1526.
BREUSCH, T. S., AND A. PAGAN (1980): “The Lagrange Multiplier Test and Its Applications to Model Specification in Econometrics,” Review of Economic Studies, 47, 239–253.
BROWN, B. M. (1971): “Martingale Central Limit Theorems,” Annals of Mathematical Statistics, 42, 59–66.
CHOI, I. (2002): “Instrumental Variables Estimation of a Nearly Nonstationary Error Component Model,” Journal of Econometrics, 109, 1–32.
DAUBECHIES, I. (1992): Ten Lectures on Wavelets. Philadelphia: SIAM.
DAVIDSON, R., AND E. FLACHAIRE (2001): “The Wild Bootstrap, Tamed at Last,” Working Paper, Department of Economics, Queen’s University.
DONOHO, D. L., AND I. M. JOHNSTONE (1994): “Ideal Spatial Adaptation by Wavelet Shrinkage,” Biometrika, 81, 425–455.
DONOHO, D. L., AND I. M. JOHNSTONE (1995a): “Adapting to Unknown Smoothness via Wavelet Shrinkage,” Journal of the American Statistical Association, 90, 1200–1224.
DONOHO, D. L., AND I. M. JOHNSTONE (1995b): “Wavelet Shrinkage: Asymptopia,” Journal of the Royal Statistical Society, Series B, 57, 301–369.
DONOHO, D. L., I. M. JOHNSTONE, G. KERKYACHARIAN, AND D. PICARD (1996): “Density Estimation by Wavelet Thresholding,” Annals of Statistics, 24, 508–539.
DURBIN, J., AND G. S. WATSON (1951): “Testing for Serial Correlation in Least Squares Regression: II,” Biometrika, 38, 159–178.
DURLAUF, S. (1991): “Spectral Based Testing for the Martingale Hypothesis,” Journal of Econometrics, 50, 1–19.
FAN, J. (1996): “Test of Significance Based on Wavelet Thresholding and Neyman’s Truncation,” Journal of the American Statistical Association, 91, 674–688.
GAO, H.-Y. (1997): “Choice of Thresholds for Wavelet Shrinkage Estimate of the Spectrum,” Journal of Time Series Analysis, 18, 231–251.
GEWEKE, J. (1981): “The Approximate Slopes of Econometric Tests,” Econometrica, 49, 1427–1442.
GRANGER, C. W. J. (1969): “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods,” Econometrica, 37, 424–438.
GRANGER, C. W. J. (1996): “Evaluation of Panel Models: Some Suggestions from Time Series,” Working Paper, Department of Economics, University of California, San Diego.
GRANGER, C. W. J., AND P. NEWBOLD (1977): Forecasting Economic Time Series. New York: Academic Press.
HAHN, J., AND G. KUERSTEINER (2002): “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects when Both n and T Are Large,” Econometrica, 70, 1639–1657.
HALL, P., AND P. PATIL (1996): “On the Choice of Smoothing Parameter, Threshold and Truncation in Nonparametric Regression by Nonlinear Wavelet Methods,” Journal of the Royal Statistical Society, Series B, 58, 361–377.
HANNAN, E. (1970): Multiple Time Series. New York: John Wiley.
HÄRDLE, W., AND E. MAMMEN (1993): “Comparing Nonparametric versus Parametric Regression Fits,” The Annals of Statistics, 21, 1926–1947.
HERNÁNDEZ, E., AND G. WEISS (1996): A First Course on Wavelets. Boca Raton: CRC Press.
HJELLVIK, V., Q. YAO, AND D. TJØSTHEIM (1998): “Linearity Testing Using Local Polynomial Approximation,” Journal of Statistical Planning and Inference, 68, 295–321.
HONG, Y. (1996): “Consistent Testing for Serial Correlation of Unknown Form,” Econometrica, 64, 837–864.
HONG, Y., AND C. KAO (2002): “Wavelet-Based Testing for Serial Correlation of Unknown Form in Panel Models,” Working Paper, Department of Economics and Department of Statistical Science, Cornell University.
HONG, Y., AND J. LEE (2001): “One-Sided Testing for ARCH Effects,” Econometric Theory, 17, 1051–1081.
HOROWITZ, J. L. (2001): “The Bootstrap,” in Handbook of Econometrics, Vol. 5, ed. by J. J. Heckman and E. E. Leamer. New York: Elsevier, Ch. 52, 3159–3228.
HSIAO, C. (2003): Analysis of Panel Data. Cambridge: Cambridge University Press.
JENSEN, M. J. (2000): “An Alternative Maximum Likelihood Estimator of Long-Memory Processes Using Compactly Supported Wavelets,” Journal of Economic Dynamics and Control, 24, 361–387.
KAO, C., AND M.-H. CHIANG (2000): “On the Estimation and Inference of a Cointegrated Regression in Panel Data,” Advances in Econometrics, 15, 179–222.
KAO, C., AND J. EMERSON (2004): “On the Estimation of a Linear Time Trend Regression with a One-Way Error Component Model in the Presence of Serially Correlated Errors,” Journal of Probability and Statistical Science, forthcoming.
LEE, J., AND Y. HONG (2001): “Testing for Serial Correlation of Unknown Form Using Wavelet Methods,” Econometric Theory, 17, 386–423.
LI, Q., AND C. HSIAO (1998): “Testing Serial Correlation in Semiparametric Panel Data Models,” Journal of Econometrics, 87, 207–237.
MASRY, E. (1994): “Probability Density Estimation from Dependent Observations Using Wavelet Orthonormal Bases,” Statistics and Probability Letters, 21, 181–194.
MASRY, E. (1997): “Multivariate Probability Density Estimation by Wavelet Methods: Strong Consistency and Rates for Stationary Time Series,” Stochastic Processes and Their Applications, 67, 177–193.
NEUMANN, M. H. (1996): “Spectral Density Estimation via Nonlinear Wavelet Methods for Stationary Non-Gaussian Time Series,” Journal of Time Series Analysis, 17, 601–633.
NEWEY, W., AND K. WEST (1994): “Automatic Lag Selection in Covariance Matrix Estimation,” Review of Economic Studies, 61, 631–653.
PHILLIPS, P. C., AND H. R. MOON (1999): “Linear Regression Limit Theory for Nonstationary Panel Data,” Econometrica, 67, 1057–1112.
PRIESTLEY, M. B. (1981): Spectral Analysis and Time Series. New York: Academic Press.
RAMSEY, J. (1999): “The Contribution of Wavelets to the Analysis of Economic and Financial Data,” Philosophical Transactions Royal Society of London, Series A, 357, 2593–2606.
SPOKOINY, V. G. (1996): “Adaptive Hypothesis Testing Using Wavelets,” The Annals of Statistics, 24, 2477–2498.
WALTER, G. (1994): Wavelets and Other Orthogonal Systems with Applications. Boca Raton: CRC Press.
WANG, Y. (1995): “Jump and Sharp Cusp Detection by Wavelets,” Biometrika, 82, 385–397.
WATSON, M. W. (1993): “Measures of Fit for Calibrated Models,” Journal of Political Economy, 101, 1011–1041.