Testing index sufficiency with a predicted index

Andreas Dzemski∗

November 20, 2015

This paper tackles the problem of testing the null hypothesis of index sufficiency H0 : E[Y |X] = E[Y |r0 (X)] when the index rule r0 is unknown and has to be estimated at a first stage. I extend a testing approach by Delgado and Manteiga (2001) to allow for predicted variables in the conditioning set of the null model. The class of permissible estimators of r0 is characterized in terms of assumptions about their precision and complexity, and comprises a wide range of parametric, semiparametric and fully nonparametric estimators. I provide a stochastic expansion that describes how the estimation of the index rule affects the asymptotic distribution of the test statistic. This expansion holds uniformly over the class of permissible first-stage estimators. As demonstrated for kernel-based estimators, the first-stage estimation typically affects the asymptotic distribution of the test statistic. I suggest a multiplier bootstrap procedure which explicitly accounts for the first-stage estimation error. A rejection rule based on bootstrapped critical values guarantees that the test is correctly sized. In contrast to the case of an observed index, employing a higher-order kernel is not sufficient to eliminate the bias from kernel smoothing. An alternative procedure that uses an estimator of the bias to center the test statistic works under relatively weak assumptions about the first-stage estimator.

JEL codes: C12, C14, C52

Keywords: significance test, generated regressors, U-statistic, multiplier bootstrap

∗ School of Business, Economics and Law, University of Gothenburg.

References

Aït-Sahalia, Yacine, Peter J Bickel, and Thomas M Stoker (2001). “Goodness-of-fit tests for kernel regression with an application to option implied volatilities”. In: Journal of Econometrics 105.2, pp. 363–412.
Carroll, Raymond J et al. (1997). “Generalized partially linear single-index models”. In: Journal of the American Statistical Association 92.438, pp. 477–489.
Chen, Song Xi and Ingrid Van Keilegom (2009). “A goodness-of-fit test for parametric and semi-parametric models in multiresponse regression”. In: Bernoulli 15.4, pp. 955–976.
Das, Mitali, Whitney K Newey, and Francis Vella (2003). “Nonparametric estimation of sample selection models”. In: The Review of Economic Studies 70.1, pp. 33–58.
Delgado, Miguel A and Wenceslao González Manteiga (2001). “Significance testing in nonparametric regression based on the bootstrap”. In: The Annals of Statistics, pp. 1469–1507.
Dzemski, Andreas and Florian Sarnetzki (2014). “Overidentification test in a nonparametric treatment model with unobserved heterogeneity”. Working Paper.
Escanciano, Juan Carlos, David Jacho-Chávez, and Arthur Lewbel (2014). “Uniform convergence of weighted sums of non and semiparametric residuals for estimation and testing”. In: Journal of Econometrics 178, pp. 426–443.
Escanciano, Juan Carlos and Kyungchul Song (2010). “Testing single-index restrictions with a focus on average derivatives”. In: Journal of Econometrics 156.2, pp. 377–391.
Fan, Yanqin and Qi Li (1996). “Consistent model specification tests: omitted variables and semiparametric functional forms”. In: Econometrica: Journal of the Econometric Society, pp. 865–890.
Hansen, Bruce (2008). “Uniform convergence rates for kernel estimation with dependent data”. In: Econometric Theory 24.03, pp. 726–748.
Härdle, Wolfgang and James Marron (1985). “Optimal bandwidth selection in nonparametric regression function estimation”. In: The Annals of Statistics, pp. 1465–1481.
Hastie, Trevor and Robert Tibshirani (1986). “Generalized additive models”. In: Statistical Science, pp. 297–310.
Heckman, James (1979). “Sample selection bias as a specification error”. In: Econometrica, pp. 153–161.
Heckman, James J and Edward Vytlacil (2005). “Structural Equations, Treatment Effects, and Econometric Policy Evaluation”. In: Econometrica 73.3, pp. 669–738.
Heckman, James, Hidehiko Ichimura, Jeffrey Smith, et al. (1998). “Characterizing Selection Bias Using Experimental Data”. In: Econometrica, pp. 1017–1098.
Heckman, James, Hidehiko Ichimura, and Petra Todd (1998). “Matching as an econometric evaluation estimator”. In: The Review of Economic Studies 65.2, pp. 261–294.
Ichimura, Hidehiko (1993). “Semiparametric least squares (SLS) and weighted SLS estimation of single-index models”. In: Journal of Econometrics 58.1, pp. 71–120.
Jones, Chris, James Marron, and Simon Sheather (1996). “A brief survey of bandwidth selection for density estimation”. In: Journal of the American Statistical Association 91.433, pp. 401–407.
Klein, Roger W and Richard H Spady (1993). “An efficient semiparametric estimator for binary response models”. In: Econometrica, pp. 387–421.
Lavergne, Pascal (2001). “An equality test across nonparametric regressions”. In: Journal of Econometrics 103.1, pp. 307–344.
Lavergne, Pascal and Quang Vuong (2000). “Nonparametric significance testing”. In: Econometric Theory 16.04, pp. 576–601.
Lavergne, Pascal, Samuel Maistre, Valentin Patilea, et al. (2015). “A significance test for covariates in nonparametric regression”. In: Electronic Journal of Statistics 9, pp. 643–678.
Li, Qi (1999). “Consistent model specification tests for time series econometric models”. In: Journal of Econometrics 92.1, pp. 101–147.
Li, Qi, Cheng Hsiao, and Joel Zinn (2003). “Consistent specification tests for semiparametric/nonparametric models based on series estimation methods”. In: Journal of Econometrics 112.2, pp. 295–325.
Maistre, Samuel and Valentin Patilea (2014). “Nonparametric model checks for single-index assumptions”. Working Paper.
Mammen, Enno, Christoph Rothe, and Melanie Schienle (2012). “Nonparametric regression with nonparametrically generated covariates”. In: The Annals of Statistics 40.2, pp. 1132–1170.
Mammen, Enno, Christoph Rothe, and Melanie Schienle (2015). “Semiparametric estimation with generated covariates”. In: Econometric Theory.
Masry, Elias (1996). “Multivariate local polynomial regression for time series: uniform strong consistency and rates”. In: Journal of Time Series Analysis 17.6, pp. 571–599.
Newey, Whitney K (2009). “Two-step series estimation of sample selection models”. In: The Econometrics Journal 12.s1, S217–S229.
Nolan, Deborah and David Pollard (1987). “U-processes: Rates of Convergence”. In: The Annals of Statistics, pp. 780–799.
Pollard, David (1984). Convergence of stochastic processes. Springer.
Rodríguez-Póo, Juan M, Stefan Sperlich, and Philippe Vieu (2015). “Specification testing when the null is nonparametric or semiparametric”. In: Econometric Theory, pp. 1–29.
Rosenbaum, Paul R and Donald B Rubin (1983). “The central role of the propensity score in observational studies for causal effects”. In: Biometrika 70.1, pp. 41–55.
Sherman, Robert P (1994). “Maximal inequalities for degenerate U-processes with applications to optimization estimators”. In: The Annals of Statistics, pp. 439–459.
Stute, Winfried (1997). “Nonparametric model checks for regression”. In: The Annals of Statistics, pp. 613–641.
Stute, Winfried and Li-Xing Zhu (2005). “Nonparametric checks for single-index models”. In: The Annals of Statistics, pp. 1048–1083.
van de Geer, Sara (2000). Empirical Processes in M-estimation. Vol. 6. Cambridge University Press.
van der Vaart, Aad and Jon Wellner (1996). Weak Convergence and Empirical Processes. Springer.
Vytlacil, Edward (2002). “Independence, monotonicity, and latent index models: An equivalence result”. In: Econometrica 70.1, pp. 331–341.
Xia, Yingcun and Wolfgang Härdle (2006). “Semi-parametric estimation of partially linear single-index models”. In: Journal of Multivariate Analysis 97.5, pp. 1162–1184.
Xia, Yingcun, WK Li, et al. (2004). “A goodness-of-fit test for single-index models”. In: Statistica Sinica 14.1, pp. 1–28.