working paper
department of economics

A UNIFIED APPROACH TO ROBUST, REGRESSION-BASED SPECIFICATION TESTS

Jeffrey M. Wooldridge

No. 480
January 1988
Revised: November 1988

massachusetts institute of technology
50 memorial drive
Cambridge, mass. 02139

Jeffrey M. Wooldridge
Department of Economics
Massachusetts Institute of Technology, E52-262C
Cambridge, MA 02139
(617) 253-3488

I am grateful to Adrian Pagan, Peter Phillips, Jim Poterba, Danny Quah, Tom Stoker, Hal White, and an anonymous referee for helpful comments and suggestions on a previous version.

ABSTRACT: This paper develops a general approach to robust, regression-based specification tests for (possibly) dynamic econometric models. The key feature of the proposed tests is that, in addition to estimation under the null hypothesis, computation requires only a matrix linear least squares regression and then an ordinary least squares regression similar to those employed in popular nonrobust tests. For the leading cases of conditional mean and/or conditional variance tests, the proposed statistics are robust to departures from distributional assumptions that are not being tested. Moreover, the statistics can be computed using any $\sqrt{T}$-consistent estimator, resulting in significant simplifications in some otherwise difficult contexts.
Among the examples covered are conditional mean tests for models estimated by weighted nonlinear least squares under misspecification of the conditional variance, tests of jointly parameterized conditional means and variances estimated by quasi-maximum likelihood under nonnormality, and some new, computationally simple specification tests for the tobit model.

1. Introduction

Specification testing has become an integral part of the econometric model building process. The literature is extensive, and model diagnostics are available for most procedures used by applied econometricians. The most popular specification tests are those that can be computed via ordinary least squares regressions. Examples are the Lagrange Multiplier (LM) test for nested hypotheses, versions of Hausman's [13] specification tests, White's [24] information matrix (IM) test, and regression-based versions of various nonnested hypotheses tests. In fact, Newey [17], Tauchen [21], and White [26] have shown that all of these tests are asymptotically equivalent to a particular conditional moment (CM) test. In a maximum likelihood setting with independent observations, Newey [17] and Tauchen [21] have devised outer product-type auxiliary regressions for computing CM tests. White [26] has extended these methods to a general dynamic setting.

The simplicity of most popular regression-based procedures currently employed, including the Newey-Tauchen-White (NTW) procedure, is not without cost. In many cases the validity of these tests relies on certain auxiliary assumptions holding in addition to the relevant null hypothesis. For example, in a nonlinear regression framework where the dynamic regression function is correctly specified under the null hypothesis, the usual LM regression-based statistic is invalid in the presence of conditional or unconditional heteroskedasticity. Except in special cases the NTW outer product statistic is also invalid. Other examples include the various tests for heteroskedasticity: the regression forms currently used require constancy of the conditional fourth moment of the regression errors under the null hypothesis. In addition, the Lagrange Multiplier and other CM tests for jointly parameterized conditional means and variances are inappropriate under various departures from normality.

The above situations are all characterized by the same feature: validity of the tests requires imposition of more than just the hypotheses of interest under $H_0$. In addition, traditional econometric testing procedures require that the estimators used to compute the statistics are efficient (in some sense) under the null hypothesis. It is important to stress that this is not merely nitpicking about regularity conditions.

Due primarily to the work of White [22,23,24,26], Domowitz and White [7], Hansen [11], and Newey [17], there now exist general methods of computing robust statistics. In the context of linear regression models, Pagan and Hall [19] discuss how to compute conditional mean tests that are robust to heteroskedasticity. Their discussion centers around the use of the White [22] heteroskedasticity-consistent standard errors. For one degree of freedom tests the Pagan and Hall suggestion leads to easily computable tests, and it is certainly an alternative to the current approach for regression models. But computation of the statistics for tests with more than one degree of freedom requires explicit inversion of the White covariance matrix estimator; this matrix must then be used in a quadratic form to obtain the heteroskedasticity-robust Wald statistic. The tests proposed here are very much in the spirit of the LM approach: computation requires estimation of the model only under the null, so that any particular model can be subjected to a battery of robust specification tests without ever reestimating the model.
More importantly, the tests can be computed using any standard regression package.

Although there are some fairly general formulas available for robust LM statistics (e.g. Engle [9], White [24,25]), formulas for general nonlinear restrictions involve an analytical expression for the derivative of the implicit constraint function and a generalized inverse. In specific instances computationally simple robust LM statistics are available. A notable example is the paper by Davidson and MacKinnon [6], which develops a regression-based heteroskedasticity-robust LM test in a nonlinear regression model with independent errors and unconditional heteroskedasticity.

It is a safe bet that the substantial analytical and computational work required to obtain robust statistics is a primary reason that they are used infrequently in applied work. Evidence of this statement is the growing use of the White [22] heteroskedasticity-robust t-statistics, which are now computed by many econometrics packages. Only occasionally does one see an LM test, a Hausman test, or a nonnested hypothesis test carried out in a manner that is robust to second moment misspecification. This is unfortunate since these tests are inconsistent for the alternative that the conditional mean is correctly specified but the conditional variance is misspecified. In other words, the standard forms of well known tests can result in inference with the wrong asymptotic size while having no systematic power for testing the auxiliary assumptions that are imposed in addition to $H_0$.

This paper develops a unified approach to calculating robust statistics via least squares regressions which I believe is easily accessible to applied econometricians. The general method suggested here can be viewed as an extension of the Davidson and MacKinnon [6] approach.
In fact, in the context of nonlinear regression models, their procedure is shown to be valid for quite general dynamic models with conditional as well as unconditional heteroskedasticity. In the same context the approach here can be viewed as the Lagrange Multiplier version of the robust Wald strategy suggested by Pagan and Hall [19]. This paper also extends Wooldridge's [30] robust, regression-based conditional mean and conditional variance tests in the context of quasi-maximum likelihood estimation in multivariate linear exponential families. The current framework is more general because it applies anytime a generalized residual function (defined in section 2) is the basis for the test.

For the leading cases of conditional first and second moments, the regression-based tests proposed maintain only the hypotheses of interest under the null, and they are applicable to specification testing of dynamic multivariate models of first and second moments without imposing further assumptions on the conditional distribution (except regularity conditions). Moreover, in classical situations, these tests are asymptotically equivalent under the null and local alternatives to their traditional counterparts. Robustness is obtained without sacrificing asymptotic efficiency.

For some specification tests the current approach does impose auxiliary assumptions under the null hypothesis. This is the price one pays for the regression-based nature of the tests. Still, in most cases encountered so far, the current framework imposes fewer auxiliary assumptions under the null than popular nonrobust tests. This does not mean that robust tests are not available in such circumstances, but only that regression-based forms of these tests are not known. The goal here is to provide a unified approach to robust, regression-based specification tests, and not to robust tests in general. Nevertheless, the coverage is fairly broad.
A general treatment of robust tests is contained in White [26].

A second aspect of the proposed statistics is that they may be computed using any $\sqrt{T}$-consistent estimator. The asymptotic distribution of the test statistic under the null and local alternatives is invariant with respect to the asymptotic distribution of the estimators used in computation; this can be viewed as another kind of robustness. Consequently, the methodology leads to some interesting new tests in cases where the computational burden based on previous approaches can be prohibitive. This is true whether or not robustness to violation of auxiliary assumptions is an issue; in fact, the procedure can be profitably applied to situations which assume correct specification of the entire conditional distribution provided that the test statistic can be put into the form considered in section 2. In such cases the proposed tests have properties similar to Neyman's [18] $C(\alpha)$ tests, but they are applicable whether or not the score of the log-likelihood is the basis for the test statistic. When restricted to LM tests, the new statistics offer generalized residual alternatives to outer product-type $C(\alpha)$ statistics, provided of course that the score of the log-likelihood can be put into the appropriate generalized residual form.

Section 2 of the paper discusses the setup and the general results, section 3 illustrates the scope of the methodology with several examples, and section 4 contains concluding remarks. Regularity conditions and proofs are contained in an appendix.

2. General Results

Let $\{(y_t, z_t): t = 1,2,\ldots\}$ be a sequence of observable random vectors with $y_t$ $1 \times J$ and $z_t$ $1 \times K$. The vector $y_t$ is the vector of endogenous variables. Interest lies in explaining $y_t$ in terms of the explanatory variables $z_t$ and (in a time series context) past values of $y_t$ and $z_t$. For time series applications, let $x_t \equiv (z_t, y_{t-1}, z_{t-1}, \ldots, y_1, z_1)$ denote the predetermined variables. Note that $z_t$ can be excluded from $x_t$ or, if there are no "exogenous" variables, one may take $x_t \equiv (y_{t-1}, \ldots, y_1)$. For cross section applications set $x_t \equiv z_t$ and assume that the observations are independently distributed.

The conditional distribution of $y_t$ given $x_t$ always exists and is denoted $D_t(\cdot\,|x_t)$. Assume that the researcher is interested in testing hypotheses about a certain aspect of $D_t$, for example the conditional expectation and/or the conditional variance. Note that, because at time $t$ the conditioning set contains $\{(y_{t-1},z_{t-1}), \ldots, (y_1,z_1)\}$ or $\{y_{t-1}, y_{t-2}, \ldots, y_1\}$, the assumption is that interest lies in getting the dynamics of the relevant aspects of $D_t$ correctly specified. For cross section applications this point is irrelevant.

For motivational purposes and to illustrate the notation, it is useful to introduce a couple of examples. The first example concerns specification testing of a conditional mean. Suppose interest lies in testing hypotheses about the conditional expectation of $y_t$ (taken to be a scalar for simplicity) given $x_t$. The parametric model is

$$\{m_t(x_t,\alpha): \alpha \in A,\ t = 1,2,\ldots\}, \tag{2.1}$$

where $A \subset \mathbb{R}^P$, and the null hypothesis is

$$H_0: E(y_t|x_t) = m_t(x_t,\alpha_0), \quad \text{some } \alpha_0 \in A,\ t = 1,2,\ldots \tag{2.2}$$

If $\hat\alpha_T$ is a $\sqrt{T}$-consistent estimator of $\alpha_0$ under $H_0$ then the residuals are defined as $\hat u_t \equiv u_t(y_t,x_t,\hat\alpha_T) \equiv y_t - m_t(x_t,\hat\alpha_T)$. A test of $H_0$ can be based on the sample covariance

$$T^{-1}\sum_{t=1}^{T} \lambda_t(x_t,\hat\alpha_T,\hat\pi_T)'\,u_t(y_t,x_t,\hat\alpha_T) \tag{2.3}$$

$$\equiv T^{-1}\sum_{t=1}^{T} \hat\lambda_t'\hat u_t, \tag{2.4}$$

where $\lambda_t(x_t,\alpha,\pi)$ is a $1 \times Q$ vector function of "misspecification indicators" that can depend on $\alpha$ and a nuisance parameter estimator $\hat\pi_T$. The standard LM approach leads to a test based on the (uncentered) r-squared from the regression

$$\hat u_t \ \text{ on } \ \nabla_\alpha \hat m_t,\ \hat\lambda_t, \qquad t = 1,\ldots,T. \tag{2.5}$$

If $\hat\alpha_T$ is asymptotically equivalent to the NLS estimator then under $H_0$
and conditional homoskedasticity, $TR_u^2$ is asymptotically $\chi^2_Q$. Thus, the LM approach effectively takes the null hypothesis to be

$$H_0': H_0 \text{ holds and } V(y_t|x_t) = \sigma_0^2 \ \text{ for some } \sigma_0^2 > 0,\ t = 1,2,\ldots, \tag{2.6}$$

but it is inconsistent for the alternative $H_1'$: $H_0$ holds but $H_0'$ does not. It also essentially requires that $\hat\alpha_T$ be the NLS estimator. The Newey-Tauchen-White regression for the same problem is

$$1 \ \text{ on } \ \hat u_t \nabla_\alpha \hat m_t,\ \hat u_t \hat\lambda_t, \qquad t = 1,\ldots,T. \tag{2.7}$$

In general, $H_0'$ is also required for $TR_u^2$ from this regression to be asymptotically $\chi^2_Q$, although there are some cases, such as testing for serial correlation in a static regression model with static conditional heteroskedasticity (i.e. $V(y_t|x_t) = V(y_t|z_t)$), where the NTW regression is robust. The validity of the NTW procedure also generally relies on $\hat\alpha_T$ being the NLS estimator.

As pointed out by Pagan and Hall [19], a robust test is available from the regression (2.5). The White [22] heteroskedasticity-robust covariance matrix estimator can be used to compute a robust Wald statistic for the hypothesis that $\hat\lambda_t$ can be excluded from the regression (2.5). When $\hat\lambda_t$ is a scalar this is simple because a robust test statistic is simply the robust t-statistic on $\hat\lambda_t$. When $\hat\lambda_t$ is a vector, computation of the robust Wald statistic is somewhat more complicated since it involves inversion of the White covariance matrix estimator as well as explicit construction of the appropriate quadratic form. In addition, the Wald procedure is valid essentially only when $\hat\alpha_T$ is the NLS estimator. The regression-based heteroskedasticity-robust form of the test, which in addition is valid for any $\sqrt{T}$-consistent estimator, is a special case of Example 3.1 discussed in section 3.

As a second example, consider testing for heteroskedasticity.
The null hypothesis is taken to be

$$H_0: E(y_t|x_t) = m_t(x_t,\alpha_0) \ \text{ and } \ V(y_t|x_t) = \sigma_0^2, \quad \alpha_0 \in A,\ \sigma_0^2 > 0,\ t = 1,2,\ldots$$

Again let $u_t(y_t,x_t,\alpha)$ be the residual function, and let $\lambda_t(x_t,\theta,\pi)$ be a $1 \times Q$ vector of heteroskedasticity indicators, where $\theta = (\alpha',\sigma^2)'$. A general class of tests is based on

$$T^{-1}\sum_{t=1}^{T} \lambda_t(x_t,\hat\theta_T,\hat\pi_T)'\,[u_t^2(y_t,x_t,\hat\alpha_T) - \hat\sigma_T^2] \equiv T^{-1}\sum_{t=1}^{T} \hat\lambda_t'(\hat u_t^2 - \hat\sigma_T^2),$$

where $\hat\sigma_T^2 = T^{-1}\sum_{t=1}^{T}\hat u_t^2$. A standard LM-type statistic is obtained from the centered r-squared from the regression

$$\hat u_t^2 \ \text{ on } \ 1,\ \hat\lambda_t, \qquad t = 1,\ldots,T. \tag{2.8}$$

$TR_c^2$ is asymptotically $\chi^2_Q$ under

$$H_0': H_0 \text{ holds and, in addition, } E[(u_t^o)^4|x_t] = \kappa_0^2 > 0,\ t = 1,2,\ldots,$$

where $u_t^o \equiv y_t - m_t(x_t,\alpha_0)$. Regression (2.8) yields the "studentized" version of the Breusch-Pagan [2] test as derived by Koenker [16]. The studentized form of the test is robust to certain departures from normality, and it is now widely used in the literature (see, e.g., Engle [8], Pagan and Hall [19], and Pagan, Trivedi, and Hall [20]). Unfortunately, this form of the test is not completely robust in the sense defined in this paper. The constancy of $E[(u_t^o)^4|x_t]$ is an auxiliary assumption imposed under $H_0$ that is required for (2.8) to lead to a valid test. Normality of $u_t^o$ conditional on $x_t$ rules out heterokurtosis under $H_0$, but it is easy to construct examples to illustrate that the auxiliary assumption of homokurtosis is binding. If the regression errors $u_t^o$ have a conditional t-distribution with constant variance but degrees of freedom that otherwise depend on $x_t$ then $H_0$ holds but $H_0'$ does not.

Hsieh [14] and Pagan and Hall [19] have noted that just as with conditional mean tests, the White [22] covariance matrix can be used to compute heteroskedasticity tests that are robust to heterokurtosis. However, except when $\hat\lambda_t$ is a scalar, computation of the statistic requires several matrix operations.
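The nonrobust regression (2.8) described above can be sketched numerically. The following is a minimal illustration, assuming NumPy, a simulated homoskedastic linear model, and a single hypothetical indicator $\lambda_t = x_t^2$; the function name `studentized_bp_statistic` and the data are illustrative assumptions, not part of the paper.

```python
import numpy as np


def studentized_bp_statistic(u_hat, lam):
    """Return T times the centered R-squared from regressing u_hat**2 on (1, lam)."""
    u2 = np.asarray(u_hat, dtype=float) ** 2
    T = u2.shape[0]
    lam = np.asarray(lam, dtype=float).reshape(T, -1)   # T x Q indicators
    X = np.column_stack([np.ones(T), lam])              # constant plus indicators
    coef, *_ = np.linalg.lstsq(X, u2, rcond=None)
    resid = u2 - X @ coef
    tss = np.sum((u2 - u2.mean()) ** 2)                 # centered total sum of squares
    return T * (1.0 - resid @ resid / tss)


# Simulated homoskedastic linear model (illustrative data only).
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)
W = np.column_stack([np.ones(T), x])
alpha_hat, *_ = np.linalg.lstsq(W, y, rcond=None)       # OLS under the null
u_hat = y - W @ alpha_hat
stat = studentized_bp_statistic(u_hat, x ** 2)          # Q = 1 indicator: x_t squared
```

Under the stated auxiliary assumption of homokurtosis, `stat` would be compared with a chi-square critical value with $Q = 1$ degree of freedom; the sketch does not address the heterokurtosis-robust form.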
Pagan, Trivedi, and Hall [20] report the White [22] t-statistic in a model for the variance of inflation when $\hat\lambda_t$ is a scalar. An alternative robust form of the test is provided in Example 3.2 in section 3. It is almost as easy to compute as the nonrobust form even when $\hat\lambda_t$ is a vector, but it allows for heterokurtosis under the null.

There are other examples where the goal is to test hypotheses about certain aspects of a conditional distribution but auxiliary assumptions are maintained under the null hypothesis in order to obtain a simple regression-based test. Because the limiting distributions of test statistics are usually sensitive to violations of the auxiliary assumptions, it is important to use robust forms of tests for which $H_0$ includes only the hypotheses of interest. To be attractive these tests must be easy to compute under reasonably broad circumstances. The remainder of this section develops a general approach to constructing robust, regression-based tests.

Many specification tests, including those for conditional means and variances, have asymptotically equivalent versions that can be derived as follows. Let $\phi_t(y_t,x_t,\theta)$ be an $L \times 1$ random function defined on a parameter set $\Theta \subset \mathbb{R}^P$. The null hypothesis of interest is expressed as

$$H_0: E[\phi_t(y_t,x_t,\theta_0)|x_t] = 0, \quad \text{for some } \theta_0 \in \Theta,\ t = 1,2,\ldots \tag{2.9}$$

By definition, $\theta_0$ is the "true" parameter vector under $H_0$. Because the null hypothesis specifies that the conditional expectation of $\phi_t(y_t,x_t,\theta_0)$ given the predetermined variables $x_t$ is zero, it is natural to call $\phi_t$ a "generalized residual." For the conditional mean tests in a nonlinear regression model, $L = 1$, $\theta = \alpha$, and $\phi_t(y_t,x_t,\theta) \equiv u_t(y_t,x_t,\alpha) = y_t - m_t(x_t,\alpha)$. The tests for heteroskedasticity take $L = 1$, $\theta = (\alpha',\sigma^2)'$, and $\phi_t(y_t,x_t,\theta) \equiv u_t^2(\alpha) - \sigma^2$.
The validity of (2.9) can be tested by choosing functions of the predetermined variables $x_t$ and checking whether the sample covariances between these functions and $\phi_t(y_t,x_t,\hat\theta_T)$ are significantly different from zero. In order to cover a broad range of circumstances that are of interest to economists, it is useful to allow the misspecification indicators to depend on $\theta$ and some nuisance parameters. Let $\pi \in \Pi$ denote an $N \times 1$ vector of nuisance parameters. Let $\Lambda_t(x_t,\theta,\pi)$ be an $L \times Q$ matrix of misspecification indicators and let $C_t(x_t,\theta,\pi)$ be an $L \times L$, symmetric and positive semi-definite weighting matrix. Assume the availability of an estimator $\hat\theta_T$ such that $T^{1/2}(\hat\theta_T - \theta_0) = O_p(1)$ under $H_0$. Also assume that the nuisance parameter estimator $\hat\pi_T$ is such that $T^{1/2}(\hat\pi_T - \pi_T^o) = O_p(1)$ under $H_0$, where $\{\pi_T^o: T = 1,2,\ldots\}$ is a nonstochastic sequence in $\Pi$. It is because $\hat\pi_T$ need not have an interpretable probability limit under $H_0$ that $\pi$ is called a nuisance parameter.

A computable test statistic is the $Q \times 1$ vector

$$T^{-1}\sum_{t=1}^{T} \hat\Lambda_t'\hat C_t\hat\phi_t, \tag{2.10}$$

where "^" denotes that each function is evaluated at $\hat\theta_T$ or $(\hat\theta_T',\hat\pi_T')'$ (dependence of the summands in (2.10) on the sample size $T$ is suppressed for convenience). For the conditional mean tests and the heteroskedasticity tests, $\Lambda_t(x_t,\theta,\pi)$ is the $1 \times Q$ vector denoted $\lambda_t(x_t,\theta,\pi)$. From the point of view of simply obtaining tests with known asymptotic size under $H_0$, the p.s.d. matrix $C_t$ could be absorbed into $\Lambda_t$. But the structure in (2.10) is exploited below to generate regression-based tests with the additional property that they are asymptotically equivalent to better known tests in classical circumstances. In the examples discussed thus far $C_t(x_t,\theta,\pi) \equiv 1$. Section 3 covers some cases where it is profitable to allow $C_t$ to be random.

To use (2.10) as the basis for a test of (2.9), the limiting distribution of

$$\hat\xi_T \equiv T^{-1/2}\sum_{t=1}^{T} \hat\Lambda_t'\hat C_t\hat\phi_t \tag{2.11}$$

under $H_0$ is needed.
In general, finding the asymptotic distribution of $\hat\xi_T$ under $H_0$ entails finding the limiting distribution of

$$\xi_T^o \equiv T^{-1/2}\sum_{t=1}^{T} \Lambda_t^{o\prime} C_t^o \phi_t^o \tag{2.12}$$

(values with "o" superscripts are evaluated at $\theta_0$ or $(\theta_0',\pi_T^{o\prime})'$) and the limiting distribution of $T^{1/2}(\hat\theta_T - \theta_0)$ (the limiting distribution of $T^{1/2}(\hat\pi_T - \pi_T^o)$ does not affect the limiting distribution of $\hat\xi_T$ under $H_0$). Because $\xi_T^o$ is the standardized sum of a vector martingale difference sequence under $H_0$, its limiting distribution is generally derivable from a central limit theorem (provided that $\{\Lambda_t^{o\prime} C_t^o \phi_t^o\}$ is also weakly dependent in an appropriate sense). In standard cases $T^{1/2}(\hat\theta_T - \theta_0)$ will also be asymptotically normal. Given the asymptotic covariance matrices of $\xi_T^o$ and $T^{1/2}(\hat\theta_T - \theta_0)$ and differentiability assumptions on $\Lambda_t$, $C_t$ and $\phi_t$, it is possible to derive the asymptotic covariance matrix of $\hat\xi_T$ by a standard mean value expansion. In principle, deriving a quadratic form in $\hat\xi_T$ which has an asymptotic chi-square distribution is straightforward. But nothing guarantees that the resulting test statistic is easy to compute.

In certain instances test statistics based on $\hat\xi_T$ can be computed from simple OLS regressions. The Newey-Tauchen-White approach can be applied when $\hat\theta_T$ is the maximum likelihood estimator and the conditional density of $y_t$ given $x_t$ is correctly specified under $H_0$. In addition to $\hat\phi_t$, $\hat\Lambda_t$, and $\hat C_t$, the score $\hat s_t$ of the conditional log-likelihood is needed for computation. The NTW regression is simply

$$1 \ \text{ on } \ \hat s_t,\ \hat\phi_t'\hat C_t\hat\Lambda_t, \qquad t = 1,\ldots,T, \tag{2.13}$$

and one uses $TR_u^2$ as asymptotically $\chi^2_Q$. If interest lies in the case where the entire conditional density is correctly specified under $H_0$, and $\hat\theta_T$ is the maximum likelihood estimator of $\theta_0$, then the Newey-Tauchen-White approach is computationally easier than the present approach.
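As a rough illustration of the outer product regression (2.13), the sketch below assumes NumPy and a simulated Gaussian linear model estimated by maximum likelihood, with $\phi_t = u_t$, $C_t = 1$, and a hypothetical indicator $\Lambda_t = x_t^2$. The helper name `outer_product_statistic` and all data are assumptions made for illustration.

```python
import numpy as np


def outer_product_statistic(scores, moments):
    """T - RSS from regressing 1 on [scores, moments]; equals T times uncentered R^2."""
    S = np.asarray(scores, dtype=float)
    T = S.shape[0]
    M = np.asarray(moments, dtype=float).reshape(T, -1)
    X = np.column_stack([S, M])
    ones = np.ones(T)
    coef, *_ = np.linalg.lstsq(X, ones, rcond=None)
    rss = np.sum((ones - X @ coef) ** 2)
    return T - rss


# Illustrative normal linear model y = a0 + a1*x + u, estimated by Gaussian MLE.
rng = np.random.default_rng(1)
T = 400
x = rng.normal(size=T)
y = 0.5 + 1.5 * x + rng.normal(size=T)
W = np.column_stack([np.ones(T), x])
alpha_hat, *_ = np.linalg.lstsq(W, y, rcond=None)    # MLE of the mean parameters
u_hat = y - W @ alpha_hat
s2_hat = np.mean(u_hat ** 2)                         # MLE of the variance

# Per-observation scores of the Gaussian log-likelihood at the MLE.
score_alpha = W * (u_hat / s2_hat)[:, None]
score_s2 = -0.5 / s2_hat + 0.5 * u_hat ** 2 / s2_hat ** 2
scores = np.column_stack([score_alpha, score_s2])

# phi_t = u_t, C_t = 1, Lambda_t = x_t^2 (a hypothetical omitted-term indicator).
moments = u_hat * x ** 2
ntw_stat = outer_product_statistic(scores, moments)  # compare with chi-square(1)
```

Because the estimator here is the MLE, the score columns sum to zero in the sample, which is the situation in which this regression is appropriate.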
It should be noted, however, that the NTW regression is valid only when $\hat\theta_T$ is the MLE, whereas the procedure described below is valid when $\hat\theta_T$ is any $\sqrt{T}$-consistent estimator of $\theta_0$. Wooldridge [31] discusses a $C(\alpha)$ version of the NTW statistic that allows $\hat\theta_T$ to be any $\sqrt{T}$-consistent estimator. Another possible drawback to the NTW regression is that there is growing evidence that it can yield tests with poor finite sample properties even in the best possible circumstances (Davidson and MacKinnon [6], Bollerslev and Wooldridge [1], Kennan and Neumann [15]). This is at least in part because the NTW regression ignores the generalized residual structure in (2.2) in always using the outer product of the gradient in computing an estimate of the information matrix.

A relatively simple statistic that typically imposes fewer assumptions than the NTW approach is available if $\hat\xi_T$ is appropriately modified. Assume that $\theta_0 \in \mathrm{int}(\Theta)$ and that $\phi_t$ is differentiable on $\mathrm{int}(\Theta)$. Define $\Phi_t(x_t,\theta_0) \equiv E[\nabla_\theta \phi_t(y_t,x_t,\theta_0)|x_t]$. Then, instead of basing a test statistic on the covariance of the weighted misspecification indicator $\hat C_t^{1/2}\hat\Lambda_t$ and the weighted generalized residuals $\hat C_t^{1/2}\hat\phi_t$, the idea is to first purge from $\hat C_t^{1/2}\hat\Lambda_t$ its linear projection onto $\hat C_t^{1/2}\hat\Phi_t$, where $\hat\Phi_t \equiv \Phi_t(x_t,\hat\theta_T)$. That is, consider the modified statistic

$$\tilde\xi_T \equiv T^{-1/2}\sum_{t=1}^{T} [\hat\Lambda_t - \hat\Phi_t\hat B_T]'\hat C_t\hat\phi_t, \tag{2.14}$$

where

$$\hat B_T \equiv \left(\sum_{t=1}^{T} \hat\Phi_t'\hat C_t\hat\Phi_t\right)^{-1}\sum_{t=1}^{T} \hat\Phi_t'\hat C_t\hat\Lambda_t \tag{2.15}$$

is the $P \times Q$ matrix of regression coefficients from the matrix regression

$$\hat C_t^{1/2}\hat\Lambda_t \ \text{ on } \ \hat C_t^{1/2}\hat\Phi_t, \qquad t = 1,\ldots,T. \tag{2.16}$$

$\tilde\xi_T$ can be written more concisely as

$$\tilde\xi_T = T^{-1/2}\sum_{t=1}^{T} \ddot\Lambda_t'\tilde\phi_t, \tag{2.17}$$

where $\ddot\Lambda_t \equiv \hat C_t^{1/2}[\hat\Lambda_t - \hat\Phi_t\hat B_T]$ are the $L \times Q$ matrix residuals from the regression (2.16) and $\tilde\phi_t \equiv \hat C_t^{1/2}\hat\phi_t$. Note that by construction $\ddot\Lambda_t$ is weighted by $\hat C_t^{1/2}$.

It is important to realize that $\hat\xi_T$ and $\tilde\xi_T$ are not always asymptotically equivalent in the sense that $\hat\xi_T - \tilde\xi_T \to_p 0$ under $H_0$.
The indicators $\hat\Lambda_t$ and $[\hat\Lambda_t - \hat\Phi_t\hat B_T]$ generally yield tests with different power functions. Nevertheless, the robust form of the test almost always has a straightforward interpretation. I return to this issue below.

Even when $\hat\xi_T$ and $\tilde\xi_T$ are not asymptotically equivalent, $\tilde\xi_T$ can be used as the basis for a useful specification test. The computational simplicity of a limiting $\chi^2$ quadratic form in $\tilde\xi_T$ is a consequence of the following theorem.

Theorem 2.1: Assume that the following conditions hold under $H_0$:

(i) Regularity conditions A.1 in the appendix;

(ii) For some $\theta_0 \in \mathrm{int}(\Theta)$,

(a) $E[\phi_t(y_t,x_t,\theta_0)|x_t] = 0$, $t = 1,2,\ldots$;

(b) $\Phi_t(x_t,\theta_0) = E[\nabla_\theta\phi_t(y_t,x_t,\theta_0)|x_t]$, $t = 1,2,\ldots$;

(c) $T^{1/2}(\hat\theta_T - \theta_0) = O_p(1)$, $T^{1/2}(\hat\pi_T - \pi_T^o) = O_p(1)$.

Then

$$\tilde\xi_T = T^{-1/2}\sum_{t=1}^{T} [\Lambda_t^o - \Phi_t^o B_T^o]' C_t^o \phi_t^o + o_p(1), \tag{2.18}$$

where

$$B_T^o \equiv \left(\sum_{t=1}^{T} E[\Phi_t^{o\prime} C_t^o \Phi_t^o]\right)^{-1}\sum_{t=1}^{T} E[\Phi_t^{o\prime} C_t^o \Lambda_t^o].$$

In addition, $TR_u^2 \to_d \chi^2_Q$, where $R_u^2$ is the uncentered r-squared from the regression

$$1 \ \text{ on } \ [\hat C_t^{1/2}\hat\phi_t]'[\hat C_t^{1/2}(\hat\Lambda_t - \hat\Phi_t\hat B_T)], \qquad t = 1,\ldots,T, \tag{2.19}$$

and $\hat B_T$ is given by (2.15).

Equation (2.18) has a very useful interpretation. Viewing $\tilde\xi_T$ as a function of $\theta$, $\pi$, and $B$ evaluated at the estimators $\hat\theta_T$, $\hat\pi_T$, and $\hat B_T$, equation (2.18) demonstrates that the asymptotic distribution of this vector is unchanged when the estimators are replaced by their probability limits. Note that the original statistic $\hat\xi_T$ does not generally have this property.

Theorem 2.1 can be applied as follows:

(1) Given $\hat\theta_T$ and $\hat\pi_T$, compute $\hat\phi_t$, $\hat\Lambda_t$, $\hat C_t$, and $\hat\Phi_t$. Define $\tilde\phi_t \equiv \hat C_t^{1/2}\hat\phi_t$, $\tilde\Lambda_t \equiv \hat C_t^{1/2}\hat\Lambda_t$, and $\tilde\Phi_t \equiv \hat C_t^{1/2}\hat\Phi_t$;

(2) Run the matrix regression

$$\tilde\Lambda_t \ \text{ on } \ \tilde\Phi_t, \qquad t = 1,\ldots,T \tag{2.20}$$

and save the residuals, say $\ddot\Lambda_t$;

(3) Run the regression

$$1 \ \text{ on } \ \tilde\phi_t'\ddot\Lambda_t, \qquad t = 1,\ldots,T$$

and use $TR_u^2 = T - \mathrm{RSS}$ as asymptotically $\chi^2_Q$ under $H_0$, assuming that $\ddot\Lambda_t$ does not contain redundant indicators.

It must be emphasized that condition (ii.b), which requires that $\Phi_t(x_t,\theta_0) \equiv E[\nabla_\theta\phi_t(y_t,x_t,\theta_0)|x_t]$ be computable under the null hypothesis, can impose additional restrictions on $D_t$ that must be satisfied in order for (1)-(3) to be a valid procedure under $H_0$. If additional assumptions are used in forming $\Phi_t$ then the "implicit null hypothesis" includes more than just (2.9). But as shown in Wooldridge [30], $\Phi_t$ is always computable under the relevant null hypothesis for conditional mean (hence conditional probability) or conditional variance testing in a linear exponential family. These are leading - but certainly not the only - cases where one would like to be robust against other distributional misspecifications. Example 3.3 in section 3 shows that no auxiliary assumptions are needed to compute regression-based specification tests of jointly parameterized mean and variance functions that are robust to nonnormality.

In many other situations $\Phi_t(x_t,\theta_0)$ is easily computed if some additional, and in many cases standard, assumptions are imposed under $H_0$. For example, in the nonlinear regression example suppose that $\phi_t(y_t,x_t,\theta) \equiv [y_t - m_t(x_t,\alpha)]^3$, where $\theta$ contains $\alpha$ and any conditional variance parameters. $\Phi_t(\theta_0)$ is easily seen to be $\Phi_t(\theta_0) = -3\nabla_\alpha m_t(\alpha_0)V(y_t|x_t)$. Most tests for skewness in the literature impose homoskedasticity or some other conditional variance assumption under the null, and $\Phi_t(\theta_0)$ is readily computed once a model for $V(y_t|x_t)$ has been specified. Tests for skewness are typically carried out after the first two moments are thought to be correctly specified. If this is the case, Theorem 2.1 imposes no auxiliary assumptions under the null.
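Steps (1)-(3) of the procedure following Theorem 2.1 can be sketched in a few lines. The code below is a minimal illustration, assuming NumPy and a scalar generalized residual ($L = 1$) so that $C_t$ is a nonnegative scalar weight; the function name `robust_regression_test`, the simulated heteroskedastic data, and the indicator $x_t^2$ are assumptions introduced for the example and do not come from the paper.

```python
import numpy as np


def robust_regression_test(phi, Lam, Phi, c=None):
    """Steps (1)-(3) for a scalar generalized residual (L = 1).

    phi : (T,)   generalized residuals evaluated at the estimates
    Lam : (T, Q) misspecification indicators
    Phi : (T, P) conditional expectation of the gradient of phi
    c   : (T,)   nonnegative scalar weights C_t (defaults to 1)
    Returns (statistic, Q), the statistic being T - RSS from step (3).
    """
    phi = np.asarray(phi, dtype=float)
    T = phi.shape[0]
    Lam = np.asarray(Lam, dtype=float).reshape(T, -1)
    Phi = np.asarray(Phi, dtype=float).reshape(T, -1)
    root_c = np.ones(T) if c is None else np.sqrt(np.asarray(c, dtype=float))

    # Step (1): weight everything by C_t^{1/2}.
    phi_w = root_c * phi
    Lam_w = root_c[:, None] * Lam
    Phi_w = root_c[:, None] * Phi

    # Step (2): matrix regression of weighted indicators on weighted Phi; keep residuals.
    B, *_ = np.linalg.lstsq(Phi_w, Lam_w, rcond=None)
    Lam_res = Lam_w - Phi_w @ B

    # Step (3): regress 1 on phi_w * residuals; T - RSS equals T times uncentered R^2.
    R = phi_w[:, None] * Lam_res
    ones = np.ones(T)
    coef, *_ = np.linalg.lstsq(R, ones, rcond=None)
    rss = np.sum((ones - R @ coef) ** 2)
    return T - rss, Lam.shape[1]


# Illustrative data: linear conditional mean with heteroskedastic errors.
rng = np.random.default_rng(2)
T = 500
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + np.sqrt(0.5 + x ** 2) * rng.normal(size=T)
W = np.column_stack([np.ones(T), x])
alpha_hat, *_ = np.linalg.lstsq(W, y, rcond=None)
u_hat = y - W @ alpha_hat                               # phi_t = u_t
stat, dof = robust_regression_test(u_hat, x ** 2, -W)   # Phi_t = -grad m_t = -w_t
```

Here `stat` would be compared with a chi-square critical value with `dof` degrees of freedom. Restricting to $L = 1$ avoids computing a matrix square root of $C_t$; the general case would need one.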
However, it should be noted that in the skewness example the choice of indicator $\Lambda_t$ is limited by the form of $\Phi_t$. If $\Lambda_t$ is linearly related to $\Phi_t$ then the modified indicator is simply zero. In a linear model with conditional homoskedasticity and regressors $w_t$, $\Phi_t$ is proportional to $w_t$. Thus $\Lambda_t$ cannot contain linear combinations of $w_t$. In particular, if $w_t$ contains unity then the choice $\Lambda_t \equiv 1$ is unavailable; this rules out a standard test for unconditional skewness based on $T^{-1}\sum_{t=1}^{T}\hat u_t^3$. The Newey-Tauchen-White test would allow more flexibility in the choice of $\Lambda_t$ for this example.

As another example, consider testing for nonconstancy of the conditional first absolute moment of the regression errors. The null hypothesis is $E[\,|y_t - m_t(x_t,\alpha_0)|\,\big|\,x_t] = \kappa_0 > 0$. The generalized residual is $\phi_t(y_t,x_t,\theta) = |y_t - m_t(x_t,\alpha)| - \kappa$, where $\theta = (\alpha',\kappa)'$. Although $\phi_t(\theta)$ is not strictly differentiable in $\alpha$, it is differentiable almost surely under the usual assumptions imposed in these contexts. Under $H_0$, the quasi-gradient with respect to $\alpha$ is

$$\nabla_\alpha\phi_t(\theta_0) = -\big(1[y_t - m_t(\alpha_0) > 0] - 1[y_t - m_t(\alpha_0) < 0]\big)\nabla_\alpha m_t(\alpha_0).$$

Under conditional symmetry of the distribution of $y_t$ given $x_t$, $E(1[y_t - m_t(\alpha_0) > 0]\,|x_t) = E(1[y_t - m_t(\alpha_0) < 0]\,|x_t)$, so that $E[\nabla_\alpha\phi_t(\theta_0)|x_t] = 0$. Also, $E[\nabla_\kappa\phi_t(\theta_0)|x_t] = -1$, and $\Phi_t(x_t,\theta_0)$ is simply $(0,-1)$. If $m_t$ is the conditional mean function and $\hat\alpha_T$ is an M-estimator other than the NLS estimator (e.g. the least absolute deviations estimator) then conditional symmetry is needed anyway for $\hat\alpha_T$ to be consistent for $\alpha_0$.

Assumption (ii.c) is perhaps more properly listed as a regularity condition, but it is placed in the text to emphasize the generality of Theorem 2.1. Having $\sqrt{T}$-consistent estimators of $\theta_0$ and $\pi_T^o$ is a fairly weak requirement, and allows relatively simple specification tests when $\theta_0$ (as well as $\pi_T^o$) has been estimated by an inefficient procedure (under classical assumptions). An application to the tobit model is given in section 3.
The tobit example has the feature of imposing correct specification of the conditional distribution under $H_0$ but, unlike the usual LM or NTW regressions, Theorem 2.1 can be used when an estimator other than the MLE is available.

A yet unresolved issue is the relationship between $\hat\xi_T$ and $\tilde\xi_T$. There is a simple characterization of their asymptotic equivalence under $H_0$. The proof of the following lemma follows immediately from the construction of $\tilde\xi_T$.

Lemma 2.2: Let the conditions of Theorem 2.1 hold. If, in addition,

(iii) $T^{-1/2}\sum_{t=1}^{T} \hat\Phi_t'\hat C_t\hat\phi_t = o_p(1)$,

then

$$\hat\xi_T - \tilde\xi_T = o_p(1). \tag{2.21}$$

When (iii) holds, $\hat\xi_T$ and $\tilde\xi_T$ are asymptotically equivalent under $H_0$. Condition (iii) is usefully interpreted as the sample covariance between $\{\hat C_t^{1/2}\hat\Phi_t: t = 1,\ldots,T\}$ and $\{\hat C_t^{1/2}\hat\phi_t: t = 1,\ldots,T\}$ being asymptotically zero. It is trivially satisfied if

$$\sum_{t=1}^{T} \Phi_t(\theta)'C_t(\theta,\hat\pi_T)\phi_t(\theta) = 0 \tag{2.22}$$

is the defining first-order condition for $\hat\theta_T$. This is frequently the case, in particular when $\hat\theta_T$ is a quasi-maximum likelihood estimator (QMLE) of the parameters of a conditional mean (see Wooldridge [30]) or of the parameters of a jointly parameterized conditional mean and conditional variance (see Example 3.3 below). In these examples (2.21) also holds (trivially) for local alternatives, so that the difference between the test based on $\tilde\xi_T$ and, say, the NTW test based on $\hat\xi_T$, is simply that different estimators have been used for the moment matrices appearing in the quadratic form. Consequently, under the conditions required for the classical test to be valid, the two procedures are asymptotically equivalent under local alternatives; robustness is achieved without losing asymptotic efficiency. In addition to having known asymptotic size under $H_0$, the robust test has a limiting noncentral chi-square distribution even when the auxiliary assumptions are violated under local alternatives (e.g. heteroskedasticity is present in a dynamic regression model).
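The role of the first-order condition (2.22) in Lemma 2.2 can be checked numerically in the simplest case. The sketch below assumes NumPy, a simulated linear model with $\phi_t = u_t$, $C_t = 1$, $\Phi_t = -w_t$, and a hypothetical indicator $x_t^2$; the half-sample OLS estimator is an illustrative stand-in for a $\sqrt{T}$-consistent estimator that does not solve (2.22) in the full sample.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 400
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)
W = np.column_stack([np.ones(T), x])
Phi = -W                                   # gradient of u_t = y_t - w_t a w.r.t. a
Lam = (x ** 2)[:, None]                    # one hypothetical indicator, C_t = 1

B, *_ = np.linalg.lstsq(Phi, Lam, rcond=None)
Lam_res = Lam - Phi @ B                    # indicators purged of Phi


def xi_hat(alpha):
    """Original statistic T^{-1/2} sum Lam_t' u_t(alpha)."""
    return Lam.T @ (y - W @ alpha) / np.sqrt(T)


def xi_tilde(alpha):
    """Modified statistic T^{-1/2} sum (Lam_t - Phi_t B)' u_t(alpha)."""
    return Lam_res.T @ (y - W @ alpha) / np.sqrt(T)


# OLS on the full sample solves sum Phi_t' u_t = 0, i.e. (2.22) with C_t = 1.
alpha_ols, *_ = np.linalg.lstsq(W, y, rcond=None)
# OLS on the first half only: still root-T consistent, but (2.22) fails in the full sample.
alpha_half, *_ = np.linalg.lstsq(W[: T // 2], y[: T // 2], rcond=None)

gap_ols = np.abs(xi_hat(alpha_ols) - xi_tilde(alpha_ols)).max()
gap_half = np.abs(xi_hat(alpha_half) - xi_tilde(alpha_half)).max()
shift_tilde = np.abs(xi_tilde(alpha_ols) - xi_tilde(alpha_half)).max()
```

With the full-sample OLS estimator the two statistics coincide up to rounding (`gap_ols`), while with the half-sample estimator they differ (`gap_half`). In this linear setup the modified statistic is exactly the same for both estimators (`shift_tilde` is zero up to rounding) because the purged indicators are orthogonal to $w_t$; in nonlinear models Theorem 2.1 delivers this invariance only asymptotically.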
Lemma 2.2 does not directly provide a description of the local behavior of $\tilde\xi_T$ when (iii) fails to hold under local alternatives, but viewed from a slightly different angle it provides useful insight. Note that Theorem 2.1 implies that the quadratic form in $\tilde\xi_T$ has an asymptotic chi-square distribution under $H_0$ regardless of whether or not (iii) holds; the issue is how to characterize the directions of misspecification that $\tilde\xi_T$ has power against when (iii) does not hold. Fortunately, it is frequently the case that $\tilde\xi_T$ is asymptotically equivalent to some well-known statistic under local alternatives, when classical assumptions hold. This facilitates interpreting a rejection when (iii) fails to hold.

To characterize the local behavior of $\tilde\xi_T$, it is useful to be somewhat more explicit about the nature of the local alternatives. Let $\{\theta_T^*\}$ and $\{\pi_T^*\}$ be nonstochastic sequences such that $\sqrt{T}(\theta_T^* - \theta_0) = O(1)$ and $\sqrt{T}(\pi_T^* - \pi_T^o) = O(1)$. $\{\theta_T^*: T = 1,2,\ldots\}$ indexes the sequence of local alternatives, but, as with $\theta_0$ under $H_0$, $\theta_T^*$ need not uniquely index the nonnull probability measure. $\pi_T^*$ is the plim of the estimator $\hat\pi_T$ under the sequence of local alternatives $\{H_T: T = 1,2,\ldots\}$. Assume that the conditions of Theorem 2.1 are supplemented with conditions of the form

$$T^{-1}\sum_{t=1}^{T} E[G_t^*] - T^{-1}\sum_{t=1}^{T} E[G_t^o] \to 0 \quad \text{as } T \to \infty$$

for various functions $G_t$. This corresponds to standard assumptions in the analysis of the local behavior of test statistics. The arguments of Theorem 2.1 can be used to show that under the sequence of local alternatives

$$\tilde\xi_T = T^{-1/2}\sum_{t=1}^{T} [\Lambda_t^* - \Phi_t^* B_T^*]' C_t^* \phi_t^* + o_p(1), \tag{2.23}$$

where

$$B_T^* \equiv \left(\sum_{t=1}^{T} E[\Phi_t^{*\prime} C_t^* \Phi_t^*]\right)^{-1}\sum_{t=1}^{T} E[\Phi_t^{*\prime} C_t^* \Lambda_t^*]$$

and values with a "*" superscript are evaluated at $\theta_T^*$ or $(\theta_T^*,\pi_T^*)$. Equation (2.23) is the extension of (2.18) to local alternatives and implies that the local limiting distribution of $\tilde\xi_T$ is the same when $\hat\theta_T$ and $\hat\pi_T$ are replaced by their plims $\theta_T^*$ and $\pi_T^*$, provided that $\sqrt{T}(\hat\theta_T - \theta_T^*) = O_p(1)$ and $\sqrt{T}(\hat\pi_T - \pi_T^*) = O_p(1)$ under $\{H_T\}$. This implies that if $\hat\theta_{T1}$ and $\hat\theta_{T2}$ are both $\sqrt{T}$-consistent estimators of $\theta_T^*$ under $\{H_T\}$ (and $\hat\pi_{T1}$ and $\hat\pi_{T2}$ are both $\sqrt{T}$-consistent for $\pi_T^*$), then

$$\tilde\xi_{T1} - \tilde\xi_{T2} = o_p(1) \tag{2.24}$$

under $\{H_T\}$, where $\tilde\xi_{T1}$ is evaluated at $(\hat\theta_{T1},\hat\pi_{T1})$ and $\tilde\xi_{T2}$ is evaluated at $(\hat\theta_{T2},\hat\pi_{T2})$. Suppose, in addition, that $\hat\theta_{T2}$ and $\hat\pi_{T2}$ are chosen to satisfy (iii), i.e.

$$T^{-1/2}\sum_{t=1}^{T} \Phi_t(\hat\theta_{T2})'C_t(\hat\theta_{T2},\hat\pi_{T2})\phi_t(\hat\theta_{T2}) = o_p(1). \tag{2.25}$$

Then, by the analog of Lemma 2.2 for local alternatives, $\tilde\xi_{T2} - \hat\xi_{T2} = o_p(1)$, where $\hat\xi_{T2}$ is $\hat\xi_T$ evaluated at $(\hat\theta_{T2},\hat\pi_{T2})$. Along with (2.24) this implies that

$$\tilde\xi_{T1} - \hat\xi_{T2} = o_p(1) \tag{2.26}$$

under $H_0$ and local alternatives. Conclusion (2.26) is simple yet very powerful. It means that for any $\sqrt{T}$-consistent estimator $\hat\theta_{T1}$, $\tilde\xi_{T1}$ is asymptotically equivalent to $\hat\xi_{T2}$ because $\hat\xi_{T2}$ has been evaluated at an estimator that satisfies the asymptotic first order condition (2.22). Whenever such an estimator is available the interpretation of $\tilde\xi_T$ is straightforward: $\tilde\xi_T$ is asymptotically equivalent to the vector that originally motivated the test statistic, $\hat\xi_T$, when $\hat\xi_T$ is evaluated at the estimator that solves the first order condition. It does not matter which estimator is used in computing $\tilde\xi_T$, provided that it is $\sqrt{T}$-consistent. Thus, the interpretation of $\tilde\xi_T$ does not depend on the estimator used in computing it.

In many situations there is available an estimator that satisfies (2.22), and typically it solves a well-known problem. Interpreting $\tilde\xi_T$ whether or not (iii) holds typically reduces to interpreting an LM-type statistic in a particular weighted nonlinear regression model or in a model estimated by MLE under normality. An example of a case where an estimator satisfying (2.22) does not have a simple interpretation involves testing for skewness as discussed above. In a linear model with homoskedasticity, the estimator that solves the first order condition (2.22) sets the correlation between the regressors and the third moment of the errors equal to zero.
This method of moments estimator is not particularly easy to interpret.

The reasoning of the previous paragraph is applied to the tobit model in the next section. There it is seen that a test for the conditional mean using, e.g., Heckman's [12] two-step estimator, is asymptotically equivalent to a standard Davidson-MacKinnon [5] test for comparing two readily interpretable weighted nonlinear regression models.

The results of Theorem 2.1 and Lemma 2.2 are asymptotic. Very little is known about the finite sample performance of the statistics of Theorem 2.1, especially for nonlinear dynamic models. It should be emphasized, however, that even though the regression in step (3) uses unity as the dependent variable, these statistics do not necessarily have the same finite sample biases sometimes exhibited by outer product-type regressions. Unlike standard outer product regressions, the robust form does exploit the generalized residual form of the test statistic. In fact, the simulations of Davidson and MacKinnon [6] for a static regression model and of Bollerslev and Wooldridge [1] for an AR-GARCH model suggest that the orthogonalization of $\hat{C}_t^{1/2}\hat{\Lambda}_t$ with respect to $\hat{C}_t^{1/2}\hat{\Phi}_t$ in step (2) improves the finite sample performance relative to the NTW outer product regression, even under classical assumptions. That this might be the case was previously suggested to me by Peter Phillips.

3. Examples of Robust Regression-Based Tests

Example 3.1: Let $y_t$ be a scalar and let $\{m_t(x_t,\alpha): \alpha \in A\}$, $A \subset R^P$, be a parametric family for the conditional expectation of $y_t$ given $x_t$. The null hypothesis is

$$H_0: E(y_t|x_t) = m_t(x_t,\alpha_o), \quad \text{some } \alpha_o \in A, \quad t=1,2,\ldots \tag{3.1}$$

Let $\{h_t(x_t,\gamma): \gamma \in \Gamma\}$ be a sequence of weighting functions such that $h_t(x_t,\gamma) > 0$, and suppose that $\hat{\gamma}_T$ is an estimator such that $\sqrt{T}(\hat{\gamma}_T - \gamma_T^o) = O_p(1)$, where $\{\gamma_T^o\} \subset \Gamma$. It is not assumed that $\{h_t(x_t,\gamma): \gamma \in \Gamma\}$ contains a version of $V(y_t|x_t)$ or that $h_t(x_t,\gamma)$ is proportional to $V(y_t|x_t)$ for some $\gamma \in \Gamma$. The researcher chooses a set of weights $\{h_t(x_t,\hat{\gamma}_T)\}$ and performs weighted NLS (WNLS), or uses some other $\sqrt{T}$-consistent estimator for $\alpha_o$. However, no matter which estimator for $\alpha_o$ is used, the tests are motivated by the WNLS first order condition

$$\sum_{t=1}^{T}\nabla_\alpha m_t(\hat{\alpha}_T)'[y_t - m_t(\hat{\alpha}_T)]/h_t(\hat{\gamma}_T) = 0. \tag{3.2}$$

A general class of diagnostics is obtained by replacing $\nabla_\alpha m_t(\hat{\alpha}_T)$ with a $1 \times Q$ vector of misspecification indicators evaluated at the estimators:

$$T^{-1}\sum_{t=1}^{T}\Lambda_t(\hat{\alpha}_T,\hat{\pi}_T)'[y_t - m_t(\hat{\alpha}_T)]/h_t(\hat{\gamma}_T) \tag{3.3}$$

where $\pi$ can contain $\gamma$ and other nuisance parameters. In the notation of Theorem 2.1, $\theta = \alpha$, $\phi_t(\theta) = y_t - m_t(\alpha)$, $\Lambda_t(\theta,\pi) = \Lambda_t(\alpha,\pi)$, and $C_t(\theta,\pi) = 1/h_t(\gamma)$. It is easy to see that $\Phi_t(x_t,\theta_o) = -\nabla_\alpha m_t(x_t,\alpha_o)$, and in fact computation of $\Phi_t(x_t,\theta_o)$ requires no auxiliary assumptions under $H_0$. The usual LM-type statistic, which is $TR^2$ from the regression

$$\hat{u}_t/\sqrt{\hat{h}_t} \quad \text{on} \quad \nabla_\alpha\hat{m}_t/\sqrt{\hat{h}_t},\ \hat{\Lambda}_t/\sqrt{\hat{h}_t}, \qquad t=1,\ldots,T,$$

requires that $\hat{\alpha}_T$ be asymptotically equivalent to the WNLS estimator and that $h_t(\hat{\gamma}_T)$ be a consistent estimator of $V(y_t|x_t)$ up to scale. The following procedure is valid under $H_0$ for any $\sqrt{T}$-consistent estimator $\hat{\alpha}_T$ without any assumptions about $V(y_t|x_t)$:

(i) Let $\hat{\alpha}_T$ be a $\sqrt{T}$-consistent estimator of $\alpha_o$. Compute the residuals $\hat{u}_t$, the gradient $\nabla_\alpha\hat{m}_t \equiv \nabla_\alpha m_t(\hat{\alpha}_T)$, and the indicator $\hat{\Lambda}_t \equiv \Lambda_t(\hat{\alpha}_T,\hat{\pi}_T)$. Define $\tilde{u}_t \equiv \hat{h}_t^{-1/2}\hat{u}_t$, $\nabla_\alpha\tilde{m}_t \equiv \hat{h}_t^{-1/2}\nabla_\alpha\hat{m}_t$, and $\tilde{\Lambda}_t \equiv \hat{h}_t^{-1/2}\hat{\Lambda}_t$;

(ii) Regress $\tilde{\Lambda}_t$ on $\nabla_\alpha\tilde{m}_t$ and save the $1 \times Q$ residuals, say $\ddot{\Lambda}_t$;

(iii) Regress 1 on $\tilde{u}_t\ddot{\Lambda}_t$ and use $TR_u^2 = T - \mathrm{RSS}$ from this regression as asymptotically $\chi_Q^2$ under $H_0$.
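Steps (i)-(iii) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than a reference implementation: the function name `robust_mean_test`, the simulated linear model, and the choice of a squared regressor as the indicator are hypothetical. The caller is assumed to supply the residuals, the gradient of the mean function, the indicators, and (optionally) the weights, all evaluated at a $\sqrt{T}$-consistent estimate.

```python
import numpy as np

def robust_mean_test(u, grad_m, lam, h=None):
    """T - RSS statistic from steps (i)-(iii) of the conditional mean test.

    u      : (T,)   residuals y_t - m_t(alpha_hat)
    grad_m : (T, P) gradient of m_t with respect to alpha at alpha_hat
    lam    : (T, Q) misspecification indicators
    h      : (T,)   positive weights h_t(gamma_hat); ones if omitted
    """
    T = u.shape[0]
    if h is None:
        h = np.ones(T)
    w = 1.0 / np.sqrt(h)
    # (i) weight the residuals, gradient, and indicators
    u_w = u * w
    grad_w = grad_m * w[:, None]
    lam_w = lam * w[:, None]
    # (ii) regress weighted indicators on weighted gradient, keep residuals
    B, *_ = np.linalg.lstsq(grad_w, lam_w, rcond=None)
    lam_resid = lam_w - grad_w @ B
    # (iii) regress 1 on u_w * lam_resid and return T - RSS
    Z = u_w[:, None] * lam_resid
    ones = np.ones(T)
    coef, *_ = np.linalg.lstsq(Z, ones, rcond=None)
    rss = np.sum((ones - Z @ coef) ** 2)
    return T - rss

# Hypothetical usage: linear mean, heteroskedastic errors, test for x^2.
rng = np.random.default_rng(1)
T = 400
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])   # gradient of a linear mean
y = 1.0 + 0.5 * x + rng.normal(size=T) * np.sqrt(0.5 + x ** 2)
alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ alpha_hat
lam = (x ** 2)[:, None]                # Q = 1 indicator
stat = robust_mean_test(u_hat, X, lam)
```

The returned value would be compared with a $\chi_Q^2$ critical value (here $Q = 1$). Because the dependent variable in step (iii) is unity, rescaling the residuals leaves the statistic unchanged.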
This procedure with $\hat{h}_t \equiv 1$ was first proposed by Davidson and MacKinnon [6] in the context of a nonlinear regression model with independent errors and unconditional heteroskedasticity. It was independently suggested by Wooldridge [29] for nonlinear, possibly dynamic regression models with conditional or unconditional heteroskedasticity under a martingale difference assumption on the regression errors. Theorem 2.1 further demonstrates that $\hat{\alpha}_T$ need not be the NLS estimator. The indicator $\hat{\Lambda}_t$ can be chosen to yield LM tests, Hausman tests based on two WNLS regressions, and tests of nonnested hypotheses such as the Davidson-MacKinnon [5] test, which do not require correct specification of the conditional variance of $y_t$ given $x_t$. Conditional mean tests in the more general context of multivariate linear exponential families are considered in more detail in Wooldridge [30].

The estimator that satisfies condition (iii) of Lemma 2.2 is the WNLS estimator based on weights $1/h_t(\hat{\gamma}_T)$. From the remarks following Lemma 2.2, the robust test statistic employing any $\sqrt{T}$-consistent estimator is asymptotically equivalent to the LM statistic based on (3.3) when (3.3) is evaluated at the WNLS estimator. For efficiency reasons it is prudent to put some thought into the choice of $h_t$.

Example 3.2: Suppose now, in the context of Example 3.1, the goal is to test whether, for some $\gamma_o \in \Gamma$, $V(y_t|x_t)$ is proportional to $h_t(x_t,\gamma_o)$. Let $v_t(x_t,\gamma)$ denote the variance function $\sigma^2h_t(x_t,\gamma)$, where $\sigma^2$ is absorbed into $\gamma$. The null hypothesis is

$$H_0: E(y_t|x_t) = m_t(x_t,\alpha_o), \quad V(y_t|x_t) = v_t(x_t,\gamma_o), \quad \alpha_o \in A,\ \gamma_o \in \Gamma, \quad t=1,2,\ldots \tag{3.4}$$

Let $\hat{\alpha}_T$ be the WNLS or some other $\sqrt{T}$-consistent estimator of $\alpha_o$, and let $\hat{\gamma}_T$ be any $\sqrt{T}$-consistent estimator of $\gamma_o$. Let $\Lambda_t(x_t,\theta,\pi)$ be a $1 \times Q$ vector of indicators, where $\theta \equiv (\alpha',\gamma')'$. Most tests for variances can be derived from a statistic of the form

$$T^{-1}\sum_{t=1}^{T}\hat{\Lambda}_t'(\hat{u}_t^2 - \hat{v}_t)/\hat{v}_t^2. \tag{3.5}$$

Choosing $\Lambda_t(x_t,\theta,\pi)$ to be the nonconstant, nonredundant elements of $\mathrm{vech}[\nabla_\alpha m_t(\alpha)'\nabla_\alpha m_t(\alpha)]$ leads to the White [24] information matrix test in the context of quasi-maximum likelihood estimation in a linear exponential family (see Wooldridge [30]). When $v_t(\gamma) = \sigma^2$, choosing $\Lambda_t(x_t,\theta,\pi) = w_t$, where $w_t$ is a $1 \times Q$ subvector of $x_t$, leads to the Lagrange Multiplier test for a general form of heteroskedasticity (see Breusch and Pagan [2]). Setting $\Lambda_t(x_t,\theta,\pi) = (u_{t-1}^2(\alpha),\ldots,u_{t-Q}^2(\alpha))$ gives Engle's [8] test for ARCH(Q) under a null of conditional homoskedasticity.

The correspondences for Theorem 2.1 are $\theta = (\alpha',\gamma')'$, $\phi_t(\theta) = u_t^2(\alpha) - v_t(\gamma)$, and $C_t(\theta,\pi) = 1/v_t^2(\gamma)$. Note that $\nabla_\theta\phi_t(\theta) = [-2\nabla_\alpha m_t(\alpha)u_t(\alpha),\ -\nabla_\gamma v_t(\gamma)]$. Under $H_0$, $E[u_t(\alpha_o)|x_t] = 0$, so that $\Phi_t(x_t,\theta_o) \equiv E[\nabla_\theta\phi_t(\theta_o)|x_t] = [0,\ -\nabla_\gamma v_t(\gamma_o)]$; no additional assumptions are needed under $H_0$ to compute $\Phi_t$. The choice $C_t(\theta,\pi) = 1/v_t^2(\gamma)$ in (3.5) is motivated by the structure of the score of the normal log-likelihood with mean function $m_t(\alpha)$ and variance function $v_t(\gamma)$. In particular, the scaling $1/\hat{v}_t^2$ appears in the variance tests of Godfrey [10] and Breusch and Pagan [3]. The standard LM statistic in this context is $TR^2$ from the regression

$$(\hat{u}_t^2 - \hat{v}_t)/\hat{v}_t \quad \text{on} \quad \nabla_\gamma\hat{v}_t/\hat{v}_t,\ \hat{\Lambda}_t/\hat{v}_t, \qquad t=1,\ldots,T. \tag{3.6}$$

In addition to (3.4) this test imposes

$$E[u_t^4(\alpha_o)|x_t] = \kappa_o[v_t(x_t,\gamma_o)]^2 \quad \text{for some constant } \kappa_o \tag{3.7}$$

under the null, so that it is nonrobust. Moreover, as pointed out by Breusch and Pagan [3], (3.6) is generally valid only if $\hat{\gamma}_T$ is the QMLE of $\gamma_o$ under normality. Breusch and Pagan [3] offer a computationally simple $C(\alpha)$ test that allows $\hat{\gamma}_T$ to be any $\sqrt{T}$-consistent estimator of $\gamma_o$, but it still requires that (3.7) hold under the null. The NTW procedure applied to this case is valid essentially in the same cases as the usual LM statistic.
The robust procedure obtained from Theorem 2.1 is easy to compute, imposes only (3.4) under the null, and allows $\hat{\gamma}_T$ to be any $\sqrt{T}$-consistent estimator of $\gamma_o$.

(i) Let $\hat{\alpha}_T$ be a $\sqrt{T}$-consistent estimator of $\alpha_o$, and let $\hat{\gamma}_T$ be a $\sqrt{T}$-consistent estimator of $\gamma_o$. Compute the residuals $\hat{u}_t$, the gradient $\nabla_\gamma\hat{v}_t \equiv \nabla_\gamma v_t(\hat{\gamma}_T)$, and the indicator $\hat{\Lambda}_t$. Define $\tilde{\phi}_t \equiv (\hat{u}_t^2 - \hat{v}_t)/\hat{v}_t$, $\nabla_\gamma\tilde{v}_t \equiv \nabla_\gamma\hat{v}_t/\hat{v}_t$, and $\tilde{\Lambda}_t \equiv \hat{\Lambda}_t/\hat{v}_t$;

(ii) Regress $\tilde{\Lambda}_t$ on $\nabla_\gamma\tilde{v}_t$ and save the $1 \times Q$ residuals, say $\ddot{\Lambda}_t$;

(iii) Regress 1 on $\tilde{\phi}_t\ddot{\Lambda}_t$ and use $TR_u^2 = T - \mathrm{RSS}$ from this regression as asymptotically $\chi_Q^2$ under $H_0$.

Interestingly, when $v_t(x_t,\gamma) \equiv \sigma^2$, so that the null is conditional homoskedasticity, the regression in (ii) simply demeans the indicators. Given $\hat{u}_t$ and a choice for $\hat{\Lambda}_t$, the $\chi_Q^2$ statistic is obtained as $TR_u^2$ from the regression

$$1 \quad \text{on} \quad (\hat{u}_t^2 - \hat{\sigma}_T^2)(\hat{\Lambda}_t - \bar{\Lambda}_T), \qquad t=1,\ldots,T, \tag{3.8}$$

where $\hat{\sigma}_T^2 \equiv T^{-1}\sum_{t=1}^{T}\hat{u}_t^2$ and $\bar{\Lambda}_T \equiv T^{-1}\sum_{t=1}^{T}\hat{\Lambda}_t$. This procedure is asymptotically equivalent to the traditional regression form (2.8) under the additional assumption that $E[u_t^4(\alpha_o)|x_t]$ is constant. Note that (3.6) and the regression (2.8) usually yield different test statistics that are not asymptotically equivalent under $H_0$. The demeaning of the indicators may not seem like much of a modification, but it yields an asymptotically chi-square distributed statistic without the additional assumption of constant fourth moment for $u_t$. In the case of the White test in a linear time series model, the demeaning of the indicators yields a statistic which is asymptotically equivalent to Hsieh's [14] suggestion for a robust form of the White test, but the above statistic is significantly easier to compute.
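For the homoskedastic null, the demeaned regression (3.8) is short enough to sketch directly. The code below is a hedged illustration, not the author's implementation: the function name `robust_homoskedasticity_test`, the simulated data, and the choice of indicators are assumptions made here for the example.

```python
import numpy as np

def robust_homoskedasticity_test(u, lam):
    """T - RSS from regressing 1 on (u_t^2 - sigma2_hat) * (lam_t - lam_bar).

    u   : (T,)   residuals from the estimated conditional mean
    lam : (T, Q) heteroskedasticity indicators (e.g. regressors or
                 lagged squared residuals)
    """
    T = u.shape[0]
    sigma2_hat = np.mean(u ** 2)
    # With a constant variance under the null, purging the variance
    # gradient from the indicators amounts to demeaning them.
    lam_dm = lam - lam.mean(axis=0)
    Z = (u ** 2 - sigma2_hat)[:, None] * lam_dm
    ones = np.ones(T)
    coef, *_ = np.linalg.lstsq(Z, ones, rcond=None)
    rss = np.sum((ones - Z @ coef) ** 2)
    return T - rss

# Hypothetical usage with a linear mean estimated by OLS.
rng = np.random.default_rng(2)
T = 400
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y = 1.0 + 0.5 * x + rng.normal(size=T) * np.sqrt(0.5 + x ** 2)
alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ alpha_hat
lam = np.column_stack([x, x ** 2])     # Q = 2 indicators
stat = robust_homoskedasticity_test(u_hat, lam)
```

Because the indicators are demeaned inside the function, adding a constant to any column of `lam` leaves the statistic unchanged. An ARCH(Q) version would pass lagged squared residuals as `lam` and drop the first Q observations.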
In the case of the ARCH test, $TR_u^2$ from the regression in (3.8) is asymptotically equivalent to $TR_u^2$ from the regression

$$1 \quad \text{on} \quad (\hat{u}_t^2 - \hat{\sigma}_T^2)(\hat{u}_{t-1}^2 - \hat{\sigma}_T^2),\ \ldots,\ (\hat{u}_t^2 - \hat{\sigma}_T^2)(\hat{u}_{t-Q}^2 - \hat{\sigma}_T^2), \qquad t=Q+1,\ldots,T. \tag{3.9}$$

The regression-based form in (3.9) is robust to departures from the conditional normality assumption, and from any other auxiliary assumptions, such as constant conditional fourth moment for $u_t$. Nevertheless, it is asymptotically equivalent to the usual ARCH test under normality.

Example 3.3: Theorem 2.1 can also be applied to models that jointly parameterize the conditional mean and conditional variance. Again, let $y_t$ be a scalar, and consider LM tests that are robust to nonnormality. The unconstrained conditional mean and variance functions are

$$\{\mu_t(x_t,\delta),\ \omega_t(x_t,\delta): \delta \in \Delta\} \tag{3.10}$$

where $\Delta \subset R^M$. It is assumed that

$$E(y_t|x_t) = \mu_t(x_t,\delta_o), \quad V(y_t|x_t) = \omega_t(x_t,\delta_o), \quad \text{some } \delta_o \in \Delta. \tag{3.11}$$

Take the null hypothesis to be

$$H_0: \delta_o = r(\theta_o) \quad \text{for some } \theta_o \in \Theta \subset R^P \tag{3.12}$$

where $P < M$ and $r$ is continuously differentiable on $\mathrm{int}(\Theta)$. Let $m_t(\theta) \equiv \mu_t(r(\theta))$ and $v_t(\theta) \equiv \omega_t(r(\theta))$ be the constrained mean and variance functions. QMLE is carried out under the null hypothesis. Let $\hat{\theta}_T$ be the estimator of $\theta_o$ under $H_0$, and let $\hat{\delta}_T \equiv r(\hat{\theta}_T)$ be the constrained estimator of $\delta_o$. Note that $\nabla_\theta m_t$ and $\nabla_\theta v_t$ are the $1 \times P$ gradients of $m_t$ and $v_t$ under $H_0$. The transpose of the score of the quasi-log likelihood evaluated at $\delta$ is

$$s_t(\delta)' = \nabla_\delta\mu_t(\delta)'u_t(\delta)/\omega_t(\delta) + \nabla_\delta\omega_t(\delta)'[u_t^2(\delta) - \omega_t(\delta)]/2\omega_t^2(\delta) \tag{3.13}$$

$$= [\nabla_\delta\mu_t(\delta)'\ |\ \nabla_\delta\omega_t(\delta)']\begin{bmatrix}1/\omega_t(\delta) & 0\\ 0 & 1/[2\omega_t^2(\delta)]\end{bmatrix}\begin{bmatrix}u_t(\delta)\\ u_t^2(\delta) - \omega_t(\delta)\end{bmatrix}. \tag{3.14}$$

Evaluating $s_t$ at $r(\theta)$ gives

$$s_t(r(\theta))' = \Lambda_t(\theta)'C_t(\theta)\phi_t(\theta) \tag{3.15}$$

where $\Lambda_t(\theta)' \equiv [\nabla_\delta\mu_t(r(\theta))'\ |\ \nabla_\delta\omega_t(r(\theta))']$, $C_t(\theta)$ is the diagonal matrix in the middle of (3.14) evaluated at $r(\theta)$, and $\phi_t(\theta)' \equiv [u_t(r(\theta)),\ u_t^2(r(\theta)) - \omega_t(r(\theta))]$. The LM test of (3.12) is based on the unrestricted score by definition. The standardized score evaluated at $r(\hat{\theta}_T)$ is

$$T^{-1/2}\sum_{t=1}^{T}s_t(r(\hat{\theta}_T))' = T^{-1/2}\sum_{t=1}^{T}\hat{\Lambda}_t'\hat{C}_t\hat{\phi}_t. \tag{3.16}$$

Under $H_0$ and the assumption of conditional normality, $TR_u^2$ from the regression

$$1 \quad \text{on} \quad \hat{s}_t, \qquad t=1,\ldots,T \tag{3.17}$$

is asymptotically $\chi_Q^2$, where $Q \equiv M - P$ is the number of restrictions under $H_0$. Unfortunately, this procedure is invalid under nonnormality, nor does it have systematic power for detecting nonnormality. Theorem 2.1 suggests a robust form of the test. In this case, the transformed quantities are

$$\tilde{\phi}_t \equiv \begin{bmatrix}\hat{u}_t/\sqrt{\hat{v}_t}\\ (\hat{u}_t^2 - \hat{v}_t)/(\sqrt{2}\,\hat{v}_t)\end{bmatrix}\ (2 \times 1), \quad \tilde{\Phi}_t \equiv \begin{bmatrix}\nabla_\theta\hat{m}_t/\sqrt{\hat{v}_t}\\ \nabla_\theta\hat{v}_t/(\sqrt{2}\,\hat{v}_t)\end{bmatrix}\ (2 \times P), \quad \tilde{\Lambda}_t \equiv \begin{bmatrix}\nabla_\delta\hat{\mu}_t/\sqrt{\hat{v}_t}\\ \nabla_\delta\hat{\omega}_t/(\sqrt{2}\,\hat{v}_t)\end{bmatrix}\ (2 \times M),$$

corresponding to the $2 \times 2$ weighting matrix $\hat{C}_t = \mathrm{diag}(1/\hat{v}_t,\ 1/[2\hat{v}_t^2])$, where all quantities are evaluated at $\hat{\theta}_T$. The robust test statistic is obtained by first running the regression

$$\tilde{\Lambda}_t \quad \text{on} \quad \tilde{\Phi}_t, \qquad t=1,\ldots,T \tag{3.18}$$

and saving the matrix residuals $\{\ddot{\Lambda}_t: t=1,\ldots,T\}$. Then run the regression

$$1 \quad \text{on} \quad \tilde{\phi}_t'\ddot{\Lambda}_t, \qquad t=1,\ldots,T \tag{3.19}$$

and use $TR_u^2 = T - \mathrm{RSS}$ as asymptotically $\chi_Q^2$ under $H_0$. Note that the regression in (3.19) contains perfect multicollinearity since $\ddot{\Lambda}_t\nabla_\theta r(\hat{\theta}_T) = 0$, $t=1,\ldots,T$, where $\nabla_\theta r(\theta)$ is the $M \times P$ gradient of $r$. Many regression packages nevertheless compute a RSS; for those that do not, $P$ regressors can be omitted from (3.19).

Note that the first order condition for $\hat{\theta}_T$ is simply

$$\sum_{t=1}^{T}\Phi_t(\hat{\theta}_T)'C_t(\hat{\theta}_T)\phi_t(\hat{\theta}_T) = 0,$$

so that the robust indicator is asymptotically equivalent to the usual LM indicator. The matrix regression in (3.18) is the cost to the researcher in guarding against nonnormality.

Example 3.4: Suppose that $y_t$ is a random scalar censored below zero, and let $x_{t1}$ be a $1 \times M$ vector of predetermined variables from $x_t$. A popular model for $y_t$ is the tobit model. Among other things, the tobit model implies that

$$E(y_t|y_t>0,x_t) = x_{t1}\alpha_o + \sigma_o\rho(x_{t1}\alpha_o/\sigma_o) \equiv m_t(\theta_o) \tag{3.20}$$

and

$$V(y_t|y_t>0,x_t) = \sigma_o^2\{1 - (x_{t1}\alpha_o/\sigma_o)\rho(x_{t1}\alpha_o/\sigma_o) - [\rho(x_{t1}\alpha_o/\sigma_o)]^2\} \equiv v_t(\theta_o) \tag{3.21}$$

where $\rho(\cdot)$ is the Mill's ratio and $\theta_o \equiv (\alpha_o',\sigma_o^2)'$. Here $\sigma_o^2$ is the conditional variance usually associated with the underlying "latent" variable, and $x_{t1}\alpha_o$ is the conditional mean of the latent variable. From a statistical standpoint, the tobit model is no more sensible than

$$\log y_t\,|\,y_t>0,x_t \sim N(x_{t1}\beta_o,\eta_o^2) \tag{3.22}$$

((3.22) also seems reasonable for many economic applications; see Cragg [4]). If (3.22) is valid, $\beta_o$ and $\eta_o^2$ can be estimated by OLS of $\log y_t$ on $x_{t1}$ using only the positive values of $y_t$. Recall that (3.22) implies

$$E(y_t|y_t>0,x_t) = \exp(\eta_o^2/2 + x_{t1}\beta_o) \tag{3.23}$$

and

$$V(y_t|y_t>0,x_t) = [\exp(\eta_o^2) - 1][\exp(\eta_o^2/2 + x_{t1}\beta_o)]^2. \tag{3.24}$$

Let $\hat{\alpha}_T$, $\hat{\sigma}_T^2$ be any $\sqrt{T}$-consistent estimators of $\alpha_o$ and $\sigma_o^2$ under $H_0$. These include the MLE's, Heckman's [12] two-step estimators, and various WNLS estimators. Define the Davidson-MacKinnon [5] indicator to be the following weighted difference of the predicted values from (3.23) and (3.20):

$$\hat{\Lambda}_t \equiv (\hat{v}_t/\hat{w}_t)\{\exp[\hat{\eta}_T^2/2 + x_{t1}\hat{\beta}_T] - [x_{t1}\hat{\alpha}_T + \hat{\sigma}_T\rho(x_{t1}\hat{\alpha}_T/\hat{\sigma}_T)]\} \tag{3.25}$$

where $\hat{v}_t \equiv v_t(\hat{\alpha}_T,\hat{\sigma}_T^2)$ is the variance (3.21) and $\hat{w}_t \equiv w_t(\hat{\beta}_T,\hat{\eta}_T^2)$ is the variance (3.24), each evaluated at the estimates. Then, if the tobit model is true, $\hat{\Lambda}_t$ should be asymptotically uncorrelated with

$$\hat{u}_t \equiv y_t - x_{t1}\hat{\alpha}_T - \hat{\sigma}_T\rho(x_{t1}\hat{\alpha}_T/\hat{\sigma}_T). \tag{3.26}$$

A test which can be shown to be consistent against the alternative (3.23) is based on the weighted correlation between $\hat{u}_t$ and $\hat{\Lambda}_t$. Unfortunately, the usual LM statistic is invalid even when the weighting $1/\sqrt{\hat{v}_t}$ is employed. The reason is that the estimators $(\hat{\alpha}_T,\hat{\sigma}_T)$ need not have been obtained from the weighted nonlinear least squares problem

$$\min_{\alpha,\sigma}\sum_{t=1}^{T}(y_t - x_{t1}\alpha - \sigma\rho(x_{t1}\alpha/\sigma))^2/\hat{v}_t. \tag{3.27}$$

Nevertheless, a statistic is available from Theorem 2.1.
Let $\phi_t(\alpha,\sigma) = y_t - x_{t1}\alpha - \sigma\rho(x_{t1}\alpha/\sigma) \equiv u_t(\theta)$ and let $\nabla_\theta\hat{m}_t$ denote the $1 \times (M+1)$ gradient of $x_{t1}\alpha + \sigma\rho(x_{t1}\alpha/\sigma)$ with respect to $\alpha$ and $\sigma$, evaluated at $(\hat{\alpha}_T,\hat{\sigma}_T)$. Then the following procedure is asymptotically valid:

(i) Define $\hat{\Lambda}_t$ as in (3.25), $\hat{u}_t$ as in (3.26), and $\nabla_\theta\hat{m}_t$ as above. The weighted quantities are $\tilde{u}_t \equiv \hat{u}_t/\sqrt{\hat{v}_t}$, $\nabla_\theta\tilde{m}_t \equiv \nabla_\theta\hat{m}_t/\sqrt{\hat{v}_t}$, and $\tilde{\Lambda}_t \equiv \hat{\Lambda}_t/\sqrt{\hat{v}_t}$;

(ii) Run the OLS regression of $\tilde{\Lambda}_t$ on $\nabla_\theta\tilde{m}_t$, $t=1,\ldots,T$, and save the residuals $\ddot{\Lambda}_t$;

(iii) Run the regression of 1 on $\tilde{u}_t\ddot{\Lambda}_t$, $t=1,\ldots,T$, and use $TR_u^2 = T - \mathrm{RSS}$ as asymptotically $\chi_1^2$ under $H_0$.

This test takes the null hypothesis to be correct specification of the conditional density of $y_t$ given $x_t$, i.e. the tobit model holds under $H_0$. In particular, it relies on linearity of the conditional expectation in $x_{t1}$, conditional homoskedasticity, and conditional normality in the underlying latent variable model; it is not intended to be robust to departures from any of these assumptions. Instead the test is devised to have power against departures from the tobit model that invalidate the conditional expectation (3.20). Equation (3.20) is of course only one of many consequences of the tobit model that could be tested. The test is most useful if interest lies in determining the effect of explanatory variables on positive values of the dependent variable.

The discussion following Lemma 2.2 implies that (i)-(iii) is asymptotically equivalent to the Davidson-MacKinnon test for weighted NLS estimation of (3.20) and (3.23), provided $\hat{\alpha}_T$ and $\hat{\sigma}_T$ are $\sqrt{T}$-consistent estimators. If $\hat{\alpha}_T$ and $\hat{\sigma}_T$ are the MLE's then these estimators are more efficient than the WNLS estimators that solve (3.27). But Heckman's [12] two-step estimators yield a test that is asymptotically equivalent to the test employing the MLE's or the WNLS estimators. One is essentially doing specification testing of the nonlinear regression model (3.20) with variance (3.21). This makes interpretation of procedure (i)-(iii) straightforward. It must be emphasized that if the MLE's are available then a Newey-Tauchen-White regression (2.13), which requires the estimated scores of the conditional log-likelihood, is available. The procedure obtained from Theorem 2.1 is more flexible albeit less efficient.

A similar test could be based on competing specifications for $E(y_t|x_t)$; that is, the zero as well as positive observations for $y_t$ can be used. This would require specifying $P(y_t>0|x_t)$ in the competing model (3.22), such as in Cragg [4].

Before leaving this example, it is useful to note that simple tests for exclusion restrictions can be developed along similar lines. The unrestricted mean function for $E(y_t|y_t>0,x_t)$ is

$$x_{t1}\alpha_{o1} + x_{t2}\alpha_{o2} + \sigma_o\rho([x_{t1}\alpha_{o1} + x_{t2}\alpha_{o2}]/\sigma_o). \tag{3.28}$$

The null hypothesis is that $\alpha_{o2} = 0$, which reduces (3.28) to (3.20). The indicator $\hat{\Lambda}_t$ is now the gradient of (3.28) with respect to $\alpha_2$ evaluated at the restricted estimates. By the chain rule the factor $\hat{\sigma}_T$ multiplying the Mill's ratio cancels against the $1/\hat{\sigma}_T$ from its argument, so that

$$\hat{\Lambda}_t = x_{t2} + \hat{\sigma}_T\nabla_z\rho(x_{t1}\hat{\alpha}_{T1}/\hat{\sigma}_T)\,x_{t2}/\hat{\sigma}_T = [1 + \nabla_z\rho(x_{t1}\hat{\alpha}_{T1}/\hat{\sigma}_T)]\,x_{t2},$$

where $\nabla_z\rho(\cdot)$ is the derivative of the Mill's ratio. When this $\hat{\Lambda}_t$ is used in (i)-(iii) a test asymptotically equivalent to the LM statistic in the context of WNLS is obtained. Again, $\hat{\alpha}_{T1}$ and $\hat{\sigma}_T$ are any $\sqrt{T}$-consistent estimators of $\alpha_{o1}$ and $\sigma_o$.

4. Conclusions

This paper has developed a general class of regression-based specification tests for (possibly) dynamic multivariate models which, in the leading cases, imposes under $H_0$ only the hypotheses being tested (correctness of the conditional mean and/or correctness of the conditional variance). The framework can be applied to testing other aspects of a conditional distribution under a modest number of additional assumptions. It is hoped that the computational simplicity of the methods proposed here removes some of the barriers to using robust test statistics in practice.
The possibility of generating simple test statistics when $T^{1/2}(\hat{\theta}_T - \theta_o)$ has a complicated limiting distribution should be useful in several situations. The tobit example in Section 3 is only one case where the conditional mean parameters are estimated using a method other than the efficient WNLS procedure or the even more efficient MLE. Another application is to choosing between log-linear and linear-linear specifications. In this case, both models can be estimated by OLS, and then transformed in the manner of the tobit example to obtain estimates of $E(y_t|x_t)$ for the separate models.

Theorem 2.1 applies directly to linear simultaneous equations models (SEM's). Computation of $\Phi_t(x_t,\theta_o)$ is straightforward provided that the reduced form for the endogenous right hand side variables is available. The parameter vector $\theta$ contains the reduced form parameters of the relevant endogenous variables as well as the structural parameters in the equation(s) of interest. If all equations of a simultaneous system are being tested jointly then $\theta$ is simply all structural parameters. An immediate application is to testing for serial correlation in linear dynamic simultaneous equations models in the presence of heteroskedasticity (conditional or unconditional) of unknown form. Tests for multivariate ARCH in SEM's that are robust to heterokurtosis are also easily constructed. The scope of applications to nonlinear SEM's is limited by one's ability to compute $\Phi_t(x_t,\theta_o) = E[\nabla_\theta\phi_t(y_t,x_t,\theta_o)|x_t]$. This is exactly the problem of computing the optimal instrumental variables for nonlinear SEM's.

Theorem 2.1 can be extended to certain unit root time series models. The initial purging of $\hat{C}_t^{1/2}\hat{\Phi}_t$ from $\hat{C}_t^{1/2}\hat{\Lambda}_t$ can produce indicators $\ddot{\Lambda}_t$ that are effectively stationary. This happens for the LM test in linear time series models when the regressors excluded under the null hypothesis are individually cointegrated (in a generalized sense) with the regressors included under the null. In this context the statistics derived from Theorem 2.1 have the advantage over the usual Wald or LM tests of being robust to conditional heteroskedasticity under $H_0$. Extending Theorem 2.1 to general nonstationary time series models is left for future research.

References

1. Bollerslev, T. and J.M. Wooldridge. Quasi-maximum likelihood estimation of dynamic models with time-varying covariances. MIT Department of Economics Working Paper No. 505, 1988.
2. Breusch, T.S. and A.R. Pagan. A simple test for heteroskedasticity and random coefficient variation. Econometrica 47 (1979): 1287-1294.
3. Breusch, T.S. and A.R. Pagan. The Lagrange Multiplier statistic and its application to model specification in econometrics. Review of Economic Studies 47 (1980): 239-253.
4. Cragg, J.G. Some statistical models for limited dependent variables with applications to the demand for durable goods. Econometrica 39 (1971): 829-844.
5. Davidson, R. and J.G. MacKinnon. Several tests of model specification in the presence of alternative hypotheses. Econometrica 49 (1981): 781-793.
6. Davidson, R. and J.G. MacKinnon. Heteroskedasticity-robust tests in regression directions. Annales de l'INSEE 59/60 (1985): 183-218.
7. Domowitz, I. and H. White. Misspecified models with dependent observations. Journal of Econometrics 20 (1982): 35-58.
8. Engle, R.F. Autoregressive conditional heteroskedasticity with estimates of United Kingdom inflation. Econometrica 50 (1982): 987-1008.
9. Engle, R.F. Wald, Likelihood Ratio, and Lagrange Multiplier tests in econometrics. In Z. Griliches and M. Intriligator (ed.), Handbook of Econometrics, Vol. II, Amsterdam: North Holland (1984): 775-826.
10. Godfrey, L.G. Testing for multiplicative heteroskedasticity. Journal of Econometrics 8 (1978): 227-236.
11. Hansen, L.P. Large sample properties of generalized method of moments estimators. Econometrica 50 (1982): 1029-1054.
12. Heckman, J.J. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement 5 (1976): 475-492.
13. Hausman, J.A. Specification tests in econometrics. Econometrica 46 (1978): 1251-1271.
14. Hsieh, D.A. A heteroskedasticity-consistent covariance matrix estimator for time series regressions. Journal of Econometrics 22 (1983): 281-290.
15. Kennan, J. and G.R. Neumann. Why does the information matrix test reject too often? University of Iowa Department of Economics Working Paper No. 88-4, 1988.
16. Koenker, R. A note on studentizing a test for heteroscedasticity. Journal of Econometrics 17 (1981): 107-112.
17. Newey, W.K. Maximum likelihood specification testing and conditional moment tests. Econometrica 53 (1985): 1047-1070.
18. Neyman, J. Optimal asymptotic tests of composite statistical hypotheses. In U. Grenander (ed.), Probability and Statistics, Stockholm: Almquist and Wiksell (1959): 213-234.
19. Pagan, A.R. and A.D. Hall. Diagnostic tests as residual analysis. Econometric Reviews 2 (1983): 159-218.
20. Pagan, A.R., P.K. Trivedi, and A.D. Hall. Assessing the variability of inflation. Review of Economic Studies 50 (1983): 585-596.
21. Tauchen, G. Diagnostic testing and evaluation of maximum likelihood models. Journal of Econometrics 30 (1985): 415-443.
22. White, H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48 (1980): 817-838.
23. White, H. Nonlinear regression on cross section data. Econometrica 48 (1980): 721-746.
24. White, H. Maximum likelihood estimation of misspecified models. Econometrica 50 (1982): 1-26.
25. White, H. Asymptotic Theory for Econometricians. New York: Academic Press, 1984.
26. White, H. Specification testing in dynamic models. In T. Bewley (ed.), Advances in Econometrics - Fifth World Congress, Vol. I, New York: Cambridge University Press (1987): 1-58.
27. White, H. and I. Domowitz. Nonlinear regression with dependent observations. Econometrica 52 (1984): 143-162.
28. Wooldridge, J.M. Asymptotic properties of econometric estimators. UCSD Ph.D. dissertation, 1986.
29. Wooldridge, J.M. A regression-based Lagrange Multiplier statistic that is robust in the presence of heteroskedasticity. MIT Department of Economics Working Paper No. 478, 1987.
30. Wooldridge, J.M. Specification testing and quasi-maximum likelihood estimation. MIT Department of Economics Working Paper No. 479, 1987.
31. Wooldridge, J.M. On the application of robust, regression-based diagnostics to models of conditional means and conditional variances. Manuscript, MIT Department of Economics, 1988.

Mathematical Appendix

For convenience, I include a lemma that is used repeatedly in the proof of Theorem 2.1.

Lemma A.1: Assume that the sequence of random functions $\{Q_T(w_T,\theta): \theta \in \Theta,\ T=1,2,\ldots\}$, where $Q_T(w_T,\cdot)$ is continuous on $\Theta$ and $\Theta$ is a compact subset of $R^P$, and the sequence of nonrandom functions $\{\bar{Q}_T(\theta): \theta \in \Theta,\ T=1,2,\ldots\}$ satisfy the following conditions:

(i) $\sup_{\theta\in\Theta}|Q_T(w_T,\theta) - \bar{Q}_T(\theta)| \overset{p}{\to} 0$;

(ii) $\{\bar{Q}_T(\theta): \theta \in \Theta,\ T=1,2,\ldots\}$ is continuous on $\Theta$ uniformly in $T$.

Let $\hat{\theta}_T$ be a sequence of random vectors such that $\hat{\theta}_T - \theta_T^o \overset{p}{\to} 0$, where $\{\theta_T^o\} \subset \Theta$. Then $Q_T(w_T,\hat{\theta}_T) - \bar{Q}_T(\theta_T^o) \overset{p}{\to} 0$.

Proof: see Wooldridge [28, Lemma A.1, p. 229].

A definition simplifies the statement of the conditions.
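Definition A.1: A sequence of random functions $\{q_t(y_t,x_t,\theta): \theta \in \Theta,\ t=1,2,\ldots\}$, where $q_t(y_t,x_t,\cdot)$ is continuous on $\Theta$ and $\Theta$ is a compact subset of $R^P$, is said to satisfy the Uniform Weak Law of Large Numbers (UWLLN) and Uniform Continuity (UC) conditions provided that

(i) $\sup_{\theta\in\Theta}\left|T^{-1}\sum_{t=1}^{T}\{q_t(y_t,x_t,\theta) - E[q_t(y_t,x_t,\theta)]\}\right| \overset{p}{\to} 0$, and

(ii) $\{T^{-1}\sum_{t=1}^{T}E[q_t(y_t,x_t,\theta)]: \theta \in \Theta,\ T=1,2,\ldots\}$ is $O(1)$ and continuous on $\Theta$ uniformly in $T$.

In the statement of the conditions, the dependence of functions on the variables $y_t$ and $x_t$ is frequently suppressed for notational convenience. If $a(\theta)$ is a $1 \times L$ function of the $P \times 1$ vector $\theta$ then, by convention, $\nabla_\theta a(\theta)$ is the $L \times P$ matrix $\nabla_\theta[a(\theta)']$. If $\Lambda(\theta)$ is a $Q \times L$ matrix then the matrix $\nabla_\theta\Lambda(\theta)$ is the $LQ \times P$ matrix defined as

$$\nabla_\theta\Lambda(\theta) \equiv [\nabla_\theta\Lambda_1(\theta)'\ |\ \cdots\ |\ \nabla_\theta\Lambda_Q(\theta)']'$$

where $\Lambda_j(\theta)$ is the $j$th row of $\Lambda(\theta)$ and $\nabla_\theta\Lambda_j(\theta)$ is the $L \times P$ gradient of $\Lambda_j(\theta)$ as defined above. Also, for any $L \times 1$ vector function $\phi$, define the second derivative of $\phi$ to be the $LP \times P$ matrix $\nabla_\theta^2\phi(\theta) \equiv \nabla_\theta[\nabla_\theta\phi(\theta)']$. Finally, define the parameter vector $\delta \equiv (\theta',\pi')'$, with parameter space $\Delta \equiv \Theta \times \Pi$.

Conditions A.1:

(i) $\Theta \subset R^P$ and $\Pi \subset R^N$ are compact and have nonempty interiors;

(ii) $\theta_o \in \mathrm{int}(\Theta)$, and $\{\pi_T^o: T=1,2,\ldots\} \subset \mathrm{int}(\Pi)$ uniformly in $T$;

(iii) (a) $\{\phi_t(y_t,x_t,\theta): \theta \in \Theta\}$ is a sequence of $L \times 1$ functions such that $\phi_t(\cdot,\theta)$ is Borel measurable for each $\theta \in \Theta$ and $\phi_t(y_t,x_t,\cdot)$ is continuously differentiable on the interior of $\Theta$ for all $y_t$, $x_t$, $t=1,2,\ldots$;

(b) Define $\Phi_t(x_t,\theta_o) \equiv E[\nabla_\theta\phi_t(y_t,x_t,\theta_o)|x_t]$ for all $\theta_o \in \mathrm{int}(\Theta)$. Assume that $\Phi_t(x_t,\cdot)$ is continuously differentiable on the interior of $\Theta$ for all $x_t$, $t=1,2,\ldots$;

(c) $\{C_t(x_t,\delta): \delta \in \Delta\}$ is a sequence of $L \times L$ matrices satisfying the measurability requirements, $C_t(x_t,\delta)$ is symmetric and positive semi-definite for all $x_t$ and $\delta$, and $C_t(x_t,\cdot)$ is differentiable on $\mathrm{int}(\Delta)$ for all $x_t$, $t=1,2,\ldots$;

(d) $\{\Lambda_t(x_t,\delta): \delta \in \Delta\}$ is a sequence of $L \times Q$ matrices satisfying the measurability requirements, and $\Lambda_t(x_t,\cdot)$ is differentiable on $\mathrm{int}(\Delta)$ for all $x_t$, $t=1,2,\ldots$;

(iv) (a) $T^{1/2}(\hat{\theta}_T - \theta_o) = O_p(1)$; (b) $T^{1/2}(\hat{\pi}_T - \pi_T^o) = O_p(1)$;

(v) (a) $\{\Phi_t(\theta)'C_t(\delta)\Phi_t(\theta)\}$ and $\{\Phi_t(\theta)'C_t(\delta)\Lambda_t(\delta)\}$ satisfy the UWLLN and UC conditions;

(b) $\{T^{-1}\sum_{t=1}^{T}E[\Phi_t^{o\prime}C_t^o\Phi_t^o]\}$ is uniformly positive definite;

(vi) (a) $\{\Phi_t(\theta)'C_t(\delta)\nabla_\theta\phi_t(\theta)\}$, $\{[I_P \otimes \phi_t(\theta)'C_t(\delta)]\nabla_\theta\Phi_t(\theta)\}$, and $\{\Phi_t(\theta)'[I_L \otimes \phi_t(\theta)']\nabla_\delta C_t(\delta)\}$ satisfy the UWLLN and UC conditions;

(b) $T^{-1/2}\sum_{t=1}^{T}\Phi_t^{o\prime}C_t^o\phi_t^o = O_p(1)$;

(vii) $\{\Lambda_t(\delta)'C_t(\delta)\nabla_\theta\phi_t(\theta)\}$, $\{[I_Q \otimes \phi_t(\theta)'C_t(\delta)]\nabla_\delta\Lambda_t(\delta)\}$, and $\{\Lambda_t(\delta)'[I_L \otimes \phi_t(\theta)']\nabla_\delta C_t(\delta)\}$ satisfy the UWLLN and UC requirements;

(viii) (a) $\{\Xi_T^o \equiv T^{-1}\sum_{t=1}^{T}E[(\Lambda_t^o - \Phi_t^oB_T^o)'C_t^o\phi_t^o\phi_t^{o\prime}C_t^o(\Lambda_t^o - \Phi_t^oB_T^o)]\}$ is uniformly positive definite;

(b) $\Xi_T^{o-1/2}\,T^{-1/2}\sum_{t=1}^{T}(\Lambda_t^o - \Phi_t^oB_T^o)'C_t^o\phi_t^o \overset{d}{\to} N(0,I_Q)$;

(c) $\{\Lambda_t(\delta)'C_t(\delta)\phi_t(\theta)\phi_t(\theta)'C_t(\delta)\Lambda_t(\delta)\}$, $\{\Lambda_t(\delta)'C_t(\delta)\phi_t(\theta)\phi_t(\theta)'C_t(\delta)\Phi_t(\theta)\}$, and $\{\Phi_t(\theta)'C_t(\delta)\phi_t(\theta)\phi_t(\theta)'C_t(\delta)\Phi_t(\theta)\}$ satisfy the UWLLN and UC conditions.

Proof of Theorem 2.1: First, note that assumptions (i)-(v) ensure existence of $\hat{B}_T$ and imply that $\hat{B}_T - B_T^o = o_p(1)$ by Lemma A.1. Next, write

$$\ddot{\xi}_T = T^{-1/2}\sum_{t=1}^{T}[\hat{\Lambda}_t - \hat{\Phi}_tB_T^o]'\hat{C}_t\hat{\phi}_t - (\hat{B}_T - B_T^o)'\,T^{-1/2}\sum_{t=1}^{T}\hat{\Phi}_t'\hat{C}_t\hat{\phi}_t. \tag{a.1}$$

Consider the term post-multiplying $(\hat{B}_T - B_T^o)'$. A standard mean value expansion about $\delta_T^o \equiv (\theta_o',\pi_T^{o\prime})'$, assumption (vi.a), and Lemma A.1 yield

$$T^{-1/2}\sum_{t=1}^{T}\hat{\Phi}_t'\hat{C}_t\hat{\phi}_t = T^{-1/2}\sum_{t=1}^{T}\Phi_t^{o\prime}C_t^o\phi_t^o$$
$$\qquad + T^{-1}\sum_{t=1}^{T}\{\Phi_t^{o\prime}C_t^o\nabla_\theta\phi_t^o + [I_P \otimes \phi_t^{o\prime}C_t^o]\nabla_\theta\Phi_t^o\}\,T^{1/2}(\hat{\theta}_T - \theta_o)$$
$$\qquad + T^{-1}\sum_{t=1}^{T}\{\Phi_t^{o\prime}[I_L \otimes \phi_t^{o\prime}]\nabla_\delta C_t^o\}\,T^{1/2}(\hat{\delta}_T - \delta_T^o) + o_p(1). \tag{a.2}$$

The first term on the right hand side of (a.2) is $O_p(1)$ by (vi.b). By (vi.a) and (iv.a,b), the terms in lines two and three of (a.2) are also $O_p(1)$. Therefore

$$T^{-1/2}\sum_{t=1}^{T}\hat{\Phi}_t'\hat{C}_t\hat{\phi}_t = O_p(1). \tag{a.3}$$

Along with $\hat{B}_T - B_T^o = o_p(1)$, this establishes that under $H_0$

$$\ddot{\xi}_T = T^{-1/2}\sum_{t=1}^{T}[\hat{\Lambda}_t - \hat{\Phi}_tB_T^o]'\hat{C}_t\hat{\phi}_t + o_p(1). \tag{a.4}$$

A mean value expansion, assumption (vii), and Lemma A.1 yield

$$T^{-1/2}\sum_{t=1}^{T}[\hat{\Lambda}_t - \hat{\Phi}_tB_T^o]'\hat{C}_t\hat{\phi}_t = T^{-1/2}\sum_{t=1}^{T}[\Lambda_t^o - \Phi_t^oB_T^o]'C_t^o\phi_t^o$$
$$\qquad + T^{-1}\sum_{t=1}^{T}\{[\Lambda_t^o - \Phi_t^oB_T^o]'C_t^o\nabla_\theta\phi_t^o - B_T^{o\prime}[I_P \otimes \phi_t^{o\prime}C_t^o]\nabla_\theta\Phi_t^o\}\,T^{1/2}(\hat{\theta}_T - \theta_o)$$
$$\qquad + T^{-1}\sum_{t=1}^{T}\{[I_Q \otimes \phi_t^{o\prime}C_t^o]\nabla_\delta\Lambda_t^o + [\Lambda_t^o - \Phi_t^oB_T^o]'[I_L \otimes \phi_t^{o\prime}]\nabla_\delta C_t^o\}\,T^{1/2}(\hat{\delta}_T - \delta_T^o) + o_p(1). \tag{a.5}$$

Consider the second line of (a.5). It must be shown that the average appearing there is $o_p(1)$ under $H_0$. First, note that by definition of $\Phi_t^o$ and the law of iterated expectations,

$$E([\Lambda_t^o - \Phi_t^oB_T^o]'C_t^o\nabla_\theta\phi_t^o) = E\{E([\Lambda_t^o - \Phi_t^oB_T^o]'C_t^o\nabla_\theta\phi_t^o\,|\,x_t)\} = E([\Lambda_t^o - \Phi_t^oB_T^o]'C_t^o\Phi_t^o). \tag{a.6}$$

Therefore

$$T^{-1}\sum_{t=1}^{T}E([\Lambda_t^o - \Phi_t^oB_T^o]'C_t^o\nabla_\theta\phi_t^o) = T^{-1}\sum_{t=1}^{T}E([\Lambda_t^o - \Phi_t^oB_T^o]'C_t^o\Phi_t^o) = 0 \tag{a.7}$$

by definition of $B_T^o$. The regularity conditions imposed imply that each of the averages appearing in (a.7) satisfies the WLLN. Therefore

$$T^{-1}\sum_{t=1}^{T}[\Lambda_t^o - \Phi_t^oB_T^o]'C_t^o\nabla_\theta\phi_t^o = o_p(1). \tag{a.8}$$

Because $E[\phi_t^o|x_t] = 0$ under $H_0$, it is even easier to show that the remaining sample averages in (a.5) are $o_p(1)$. Combined with $T^{1/2}(\hat{\delta}_T - \delta_T^o) = O_p(1)$ this establishes the first conclusion of the theorem:

$$\ddot{\xi}_T = T^{-1/2}\sum_{t=1}^{T}[\Lambda_t^o - \Phi_t^oB_T^o]'C_t^o\phi_t^o + o_p(1). \tag{a.9}$$

Given (viii.a), the asymptotic covariance matrix of $\ddot{\xi}_T$ is uniformly positive definite. Moreover, $\Xi_T^{o-1/2}\ddot{\xi}_T \overset{d}{\to} N(0,I_Q)$ under $H_0$ by (viii.b). Condition (viii.c) ensures that

$$\hat{\Xi}_T \equiv T^{-1}\sum_{t=1}^{T}(\hat{\Lambda}_t - \hat{\Phi}_t\hat{B}_T)'\hat{C}_t\hat{\phi}_t\hat{\phi}_t'\hat{C}_t(\hat{\Lambda}_t - \hat{\Phi}_t\hat{B}_T) \tag{a.10}$$

is a consistent estimator of $\Xi_T^o$. It is easy to see that

$$\ddot{\xi}_T'\hat{\Xi}_T^{-1}\ddot{\xi}_T = TR_u^2, \tag{a.11}$$

where $R_u^2$ is the uncentered r-squared from the regression

$$1 \quad \text{on} \quad \tilde{\phi}_t'\ddot{\Lambda}_t, \qquad t=1,\ldots,T \tag{a.12}$$

and $\tilde{\phi}_t$ and $\ddot{\Lambda}_t$ are as defined in the text. Because the dependent variable in (a.12) is unity, $TR_u^2 = T - \mathrm{RSS}$, where RSS is the residual sum of squares from the regression (a.12).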
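The final algebraic step, that the quadratic form equals $T$ times the uncentered r-squared and hence $T - \mathrm{RSS}$, can be verified numerically. The sketch below assumes only that `Z` is a full-column-rank $T \times Q$ matrix whose rows play the role of the regressors in the unity regression; the function name and the randomly generated `Z` are hypothetical placeholders rather than quantities from any model in the paper.

```python
import numpy as np

def quadratic_form_and_tr2(Z):
    """Return (xi' Xi_hat^{-1} xi, T - RSS) for the regression of 1 on Z.

    Z : (T, Q) matrix whose t-th row is the regressor vector for
        observation t in the unity regression.
    """
    T = Z.shape[0]
    ones = np.ones(T)
    xi = Z.T @ ones / np.sqrt(T)      # T^{-1/2} * sum of the rows of Z
    Xi_hat = Z.T @ Z / T              # outer-product covariance estimate
    quad = xi @ np.linalg.solve(Xi_hat, xi)
    coef, *_ = np.linalg.lstsq(Z, ones, rcond=None)
    rss = np.sum((ones - Z @ coef) ** 2)
    return quad, T - rss

rng = np.random.default_rng(3)
Z = rng.normal(size=(200, 2)) + 0.1   # placeholder regressors
quad, tr2 = quadratic_form_and_tr2(Z)
```

The two returned numbers agree up to floating-point error, since both equal $\iota'Z(Z'Z)^{-1}Z'\iota$ with $\iota$ a vector of ones.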