Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries. http://www.archive.org/details/consistencyasympOOnewe

Consistency and Asymptotic Normality of Nonparametric Projection Estimators

Whitney K. Newey

MIT Working Paper No. 584
MIT Department of Economics, 50 Memorial Drive, Cambridge, MA 02139
March 1991; Revised July 1991

Helpful comments were provided by Andreas Buja, and financial support by the NSF and the Sloan Foundation.

Abstract

Least squares projections are a useful way of describing the relationship between random variables. These include conditional expectations and projections on additive functions. Sample least squares provides a convenient way of estimating such projections. This paper gives convergence rates and asymptotic normality results for least squares estimators of linear functionals of projections. General results are derived, and primitive regularity conditions are given for power series and splines. Also, it is shown that mean-square continuity of a linear functional is necessary for √n-consistency and sufficient under certain conditions for asymptotic normality, and this result is applied to estimating the parameters of a finite dimensional component of a projection and to weighted average derivatives of projections.

Keywords: Nonparametric regression, additive interactive models, partially linear models, average derivatives, polynomials, splines, convergence rates, asymptotic normality.

1. Introduction

Least squares projections of a random variable y on functions of a random vector q provide a useful way of describing the relationship between y and q. The most familiar example is the conditional expectation E[y|q], which is the projection on the linear space of all (measurable, finite mean-square) functions of q. Estimation of this projection is the nonparametric regression problem. Motivated partly by the difficulty of estimating E[y|q] when q has high dimension, projections on smaller sets of functions have been considered, e.g. by Breiman and Stone (1978), Breiman and Friedman (1985), Friedman and Stuetzle (1981), Stone (1985), and Zeldin and Thomas (1977). These include projections on the set of functions that are additive in linear combinations of q, and generalizations that allow the component functions to be multi-dimensional.

One simple way to estimate nonparametric projections is by regression on a finite dimensional subset, with dimension allowed to grow with the sample size, e.g. as in Agarwal and Studden (1980), Gallant (1981), Stone (1985), Cox (1988), and Andrews (1991), which will be referred to here as series estimation. This type of estimator may not be good at recovering the "fine structure" of the projection relative to other smoothers, e.g. see Buja, Hastie, and Tibshirani (1989), but is computationally simple. Also, the fine structure is less important for mean-square continuous functionals of the projection, such as the parameters of partially linear models or weighted average derivatives (examples discussed below), which are essentially averages.

This paper derives convergence rates and asymptotic normality of series estimators of projection functionals. Convergence rates are important because they show how dimension affects the asymptotic accuracy of the estimators (e.g. Stone 1982, 1985) and are useful for the theory of semiparametric estimators that depend on projection estimates (e.g. Newey 1991). Asymptotic normality is useful for statistical inference about functionals of projections, such as derivatives.
The paper gives mean-square rates for estimation of the projection and uniform convergence rates for estimation of functions and derivatives. Conditions for asymptotic normality and consistent estimation of asymptotic standard errors are given, and applied to estimation of a component of an additive projection and its derivatives. Fully primitive regularity conditions are given for power series and spline regression, as well as more general conditions that may apply to other types of series. The regularity conditions allow for dependent observations, so that they are of use for time series models.

The paper also relates continuity properties of linear functionals of projections to √n-consistent estimation. Under a regularity condition on the projection residual variance, continuity in mean-square is shown to be necessary for existence of a (regular) √n-consistent estimator, and sufficient for asymptotically normal series estimators. This result is used to derive √n-consistency results for partially linear models with an additive nonparametric component (a generalization of the model of Engle et. al., 1984) and for weighted average derivatives of (possibly) additive models (a generalization of the Stoker (1986) functional).

One problem that motivates the results given here is estimation of an additive nonparametric autoregression,

(1.1)    y_t = h_1(y_{t-1}) + ... + h_r(y_{t-r}) + ε_t,

where ε_t is the residual from the projection of y_t on additive functions of r lags. This model avoids high dimensional arguments but allows for several lags, which seems useful for short time series. The convergence rates and asymptotic normality results apply to estimates of this projection, although asymptotic normality here requires it to be a dynamic regression, where E[ε_t | y_{t-1}, y_{t-2}, ...] = 0. Effects of lagged values can be quantified by the weighted average derivative ∫ w(y_j)[∂h_j(y_j)/∂y_j]dy_j, (j = 1, ..., r).
The results to follow include primitive regularity conditions for √n-consistency and asymptotic normality of series estimators of this functional, and could also be applied to generalizations that allow interactions between lags in equation (1.1).

These results are an addition to previous work on the subject, including that of Agarwal and Studden (1980), Gallant (1981), Stone (1985), Cox (1988), Andrews (1991), and Andrews and Whang (1990), because the results (including asymptotic normality) do not require that the projection equal the conditional expectation, similarly to Stone (1985) but not to the others. In addition, the results allow for dependent observations, and apparently improve in some respects on those of Cox (1988) and Andrews (1991) for the special case of conditional expectations. There is some overlap of the convergence rates with a recent paper by Stone (1990) on additive-interactive spline estimators, which the author saw only after the first version of this paper was written. Stone's (1990) rate results are implied by those of Section 7 below, under conditions that are weaker in some respects (allowing for dependence) and stronger in others (imposing a side condition on the allowed number of terms). Also, the same convergence rate result is given in Section 6 for variable degree polynomial regression, which is not considered in Stone (1990), and uniform rates and asymptotic normality are shown here.

2. Series Estimators

The results of this paper concern estimators of least squares projections that can be described as follows. Let z denote a data observation, with y and q (measurable) functions of z, and q having dimension r. Let H denote a mean-square closed, linear subspace of the set of all functions of q with finite mean-square. The projection of y on H is

(2.1)    h_0(q) = argmin_{h∈H} E[{y - h(q)}²].

An example is the conditional expectation, h_0(q) = E[y|q], where H is the set of all measurable functions of q with finite mean-square. An important generalization has q = (q_1', q_2')', where the q_{2ℓ} (ℓ = 1, ..., L) are subvectors of q_2, and

(2.2)    H = {q_1'β + Σ_{ℓ=1}^L h_{2ℓ}(q_{2ℓ}) : E[h_{2ℓ}(q_{2ℓ})²] < ∞}.

Primitive conditions for this set to be closed are given in Section 6. This H is a smaller set of functions, whose consideration is motivated partly by the difficulty of estimating conditional expectations for q with many dimensions; see Stone (1985) for discussion and references. The general theory allows for any closed H (e.g. H = {w(q)[Σ_ℓ h_ℓ(q_ℓ)]}, w(q) a known function, under conditions for this to be closed), and primitive conditions are given for power series and spline estimators of a projection on the H of equation (2.2).

The estimators of h_0(q) considered here are sample projections on a finite dimensional subspace of H, which can be described as follows. Let p^K(q) = (p_{1K}(q), ..., p_{KK}(q))' be a vector of functions, each of which is an element of H. Denote the data observations by y_i and q_i, (i = 1, 2, ...), and let y = (y_1, ..., y_n)' and p^K = [p^K(q_1), ..., p^K(q_n)]' for sample size n. An estimator of h_0(q) is

(2.3)    ĥ(q) = p^K(q)'π̂,    π̂ = (p^K'p^K)⁻p^K'y,

where (·)⁻ denotes a generalized inverse and subscripts for K have been suppressed for notational convenience. The matrix p^K'p^K will be asymptotically nonsingular under conditions given below, making the choice of generalized inverse asymptotically irrelevant.

Often the object of interest is not the projection, but rather some functional A(h) of h_0(q), where A(h) is an s × 1 vector of real numbers. Examples are A(h) = h(q̄) for h(q) evaluated at some point q̄, and A(h) = ∂^d h(q̄), a partial derivative evaluated at q̄. For an additive projection h(q) = Σ_{ℓ=1}^L h_ℓ(q_ℓ), another example is A(h) = h_J(q̄_J) - h_J(q̃_J), the difference of the Jth component at two different points q̄_J and q̃_J. Another example is the parameters of the finite dimensional component of a projection, i.e. β in h(q) = q_1'β + Σ_ℓ h_{2ℓ}(q_{2ℓ}).
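The sample projection in equation (2.3) is ordinary least squares on the K approximating functions, with a generalized inverse guarding against a singular design. A minimal numerical sketch (the data-generating process, basis choice, and function name here are illustrative assumptions, not from the paper):

```python
import numpy as np

def series_projection(y, P):
    """Sample projection coefficients pi_hat = (P'P)^- P'y as in equation (2.3);
    pinv is a (Moore-Penrose) generalized inverse, so a singular design is handled."""
    return np.linalg.pinv(P.T @ P) @ (P.T @ y)

rng = np.random.default_rng(0)
n, K = 500, 4
q = rng.uniform(-1.0, 1.0, n)
y = np.sin(2.0 * q) + 0.1 * rng.standard_normal(n)   # h0(q) = sin(2q), assumed DGP
P = q[:, None] ** np.arange(K)                       # power series basis 1, q, q^2, q^3
pi_hat = series_projection(y, P)
h_hat = P @ pi_hat                                   # fitted values h_hat(q_i)
print(np.mean((h_hat - np.sin(2.0 * q)) ** 2))       # small sample mean-square error
```

With K allowed to grow with n, the same computation delivers the nonparametric estimator analyzed in the paper.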
Estimators of such parameters have been analyzed by Chamberlain (1986), Heckman (1986), Rice (1986), Robinson (1988), Schick (1986), and others, but only under the conditions h(q) = E[y|q] and L = 1. Here the nonparametric component can be additive, which leads to a more efficient estimator if Var(y|q) is constant; see Section 5. To describe β = A(h), assume for the moment that H_2 = {Σ_{ℓ=1}^L h_{2ℓ}(q_{2ℓ})} is closed, let P(q_1|H_2) denote the vector of projections of each element of q_1 on H_2, and let M = E[{q_1-P(q_1|H_2)}{q_1-P(q_1|H_2)}']. Assuming that M is nonsingular, the identification condition for β, it follows that

(2.4)    β = A(h) = M⁻¹E[{q_1-P(q_1|H_2)}h(q)].

In addition, once β is in hand, one can consider linear functionals of h(q) - q_1'β, obtained by subtracting off q_1'β, e.g. h_{2ℓ}(q̄_{2ℓ}) for some specified value q̄_{2ℓ} of q_{2ℓ}.

Another interesting example is a weighted integral of a partial derivative of h(q), of the form

(2.5)    A_j(h) = ∫ w_j(q) ∂^{λ_j}h(q) dq,    (j = 1, ..., s),

for multi-indices λ_j and functions w_j(q). This is an average derivative functional similar to that of Stoker (1986), including the nonparametric autoregression from Section 1. Estimators of similar functionals have been analyzed by Hardle and Stoker (1989), Powell, Stock, and Stoker (1989), and Andrews (1991).

In this paper most of the analysis will concern linear functionals of h, such as the above examples. The natural "plug-in" estimators of linear functionals have a simple form. Let A = (A(p_{1K}(q)), ..., A(p_{KK}(q)))'. Because ĥ is a linear combination of elements of H, linearity of A(h) implies A(ĥ) = A'π̂. The paper focuses on linear functionals because the linearity in π̂ of this estimator leads to straightforward asymptotic distribution theory for A(ĥ). Of course, the delta-method can also be used to analyze nonlinear functions of such estimators.
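Since the plug-in estimator is A'π̂ with A_k = A(p_{kK}), any linear functional is computed by applying it to each basis term. A sketch for the derivative-at-a-point functional A(h) = ∂h(q̄)/∂q (the simulated design and the value q̄ = 0.5 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 1000, 5
q = rng.uniform(-1.0, 1.0, n)
y = q ** 2 + 0.1 * rng.standard_normal(n)            # h0(q) = q^2, assumed DGP
P = q[:, None] ** np.arange(K)                       # power series basis
pi_hat = np.linalg.pinv(P.T @ P) @ (P.T @ y)

# Applying A(h) = dh(q0)/dq to each basis term q^k gives A_k = k * q0^(k-1),
# so linearity yields the plug-in estimator A(h_hat) = A' pi_hat.
q0 = 0.5
A = np.array([0.0] + [k * q0 ** (k - 1) for k in range(1, K)])
A_hat = A @ pi_hat
print(A_hat)                                         # near dh0(q0)/dq = 2*q0 = 1
```

The same pattern covers point evaluation, additive-component differences, and (after discretizing the integral) the weighted average derivative of equation (2.5).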
The idea of sample projection estimators is that they should approximate h_0(q) if K is allowed to grow with the sample size. The two key features of this approximation are that 1) each component of p^K(q) is an element of H, and 2) p^K(q) "spans" H as K grows (i.e. for any function in H, K can be chosen big enough that there is a linear combination of p^K(q) that approximates it arbitrarily closely in mean square). Under 1), π̂ estimates π_K = (E[p^K(q_i)p^K(q_i)'])⁻¹E[p^K(q_i)y_i] = (E[p^K(q_i)p^K(q_i)'])⁻¹E[p^K(q_i)h_0(q_i)], the coefficients of the projection of h_0(q) on p^K(q). Consequently, when the estimation error in π̂ is small, p^K(q)'π̂ will approximate p^K(q)'π_K. Under 2), p^K(q)'π_K should approximate h_0(q). Thus, ĥ(q) should approximate h_0(q).

Two types of approximating functions will be considered in detail. They are:

Power Series: Let λ = (λ_1, ..., λ_r)' denote an r-dimensional vector of nonnegative integers, i.e. a multi-index, with norm |λ| = Σ_{j=1}^r λ_j, and let q^λ = Π_{j=1}^r q_j^{λ_j}. For a sequence {λ(k)} of distinct such vectors, a power series approximation corresponds to

(2.6)    p_{kK}(q) = q_{1k},  k = 1, ..., s;    p_{kK}(q) = q^{λ(k-s)},  k = s+1, ...,

allowing for the finite dimensional component of H discussed above. Throughout the paper it will be assumed that the terms are ordered in the natural way, with the degree |λ(k-s)| of the terms monotonically increasing in k. Also, an additive component of the projection will be allowed by including in {q^{λ(k-s)}} only those terms with components that are subvectors of some q_{2ℓ}, i.e. by requiring that for each multi-index λ(k-s) there exists a component q_{2ℓ} of q_2 such that the only nonzero components of λ(k-s) are those where the corresponding component of q is included in q_{2ℓ}. The spanning condition will be that all such terms appear in {λ(k-s)}. All of these requirements are summarized in the statement that {q^{λ(k-s)}} consists of all multivariate powers of each q_{2ℓ}, ordered so that |λ(k-s)| is monotonically increasing.

The theory to follow uses orthogonal polynomials, which may also have computational advantages. Suppose each q^{λ(k-s)} is replaced with the product of orthogonal polynomials of order corresponding to components of λ(k-s), with respect to some weight function on the range of q. The estimator will be numerically invariant to such a replacement (because |λ(k-s)| is monotonically increasing), but it may alleviate the well known multicollinearity problem for power series: if the distribution of q is similar to this weight, then there should be little collinearity among the different p_{kK}(q).

Splines: Splines, which are smooth piecewise polynomials, can be formulated as projections if their knots (the joining points for the polynomials) are held fixed. They have attractive features relative to power series, being less oscillatory and less sensitive to bad approximation over small regions. The theory here requires that the knots be placed in the support of q, which therefore must be known. For convenience the support is normalized to be [-1,1]^{r-s}, and attention is restricted to evenly spaced knots. Let (u)_+ = 1(u > 0)u. A degree m spline with L+1 evenly spaced knots on [-1,1] is a linear combination of

    𝔅_{k,L}(u) = u^k for 0 ≤ k ≤ m;    𝔅_{k,L}(u) = {[u + 1 - 2(k-m)/(L+1)]_+}^m for m+1 ≤ k ≤ m+L.

For a set of multi-indices {λ(k)}, the approximating functions will be products of univariate splines,

(2.7)    p_{kK}(q) = q_{1k},  k = 1, ..., s;    p_{kK}(q) = Π_{j=1}^{r-s} 𝔅_{λ_j(k-s),L}(q_{2j}),  k = s+1, ..., K.

Note that implicit in k is a choice of the number of knots for each of the r-s components of q_2 and a choice of which multiplicative components to include. Throughout the paper it will be assumed that the ratio of numbers of knots for each pair of elements of q_2 is bounded above and below. An additive component of the projection will be allowed for by requiring that the multi-indices satisfy the same condition as for power series, which can be summarized in the statement that the components that appear in any terms consist of all interactions of the q_{2ℓ}.

The condition that the support of q is [-1,1]^{r-s} is not as restrictive as it may first appear. Suppose that there are "original" variables x with support R^{r-s}, and t(·) is a univariate one-to-one transformation with domain R and range [-1,1]; then q = (t(x_1), ..., t(x_{r-s}))' will have support [-1,1]^{r-s}. Since additive projections are invariant to such componentwise, one-to-one transformations, the spline estimator based on q will also estimate the original projection. Of course, the condition that the support of q is restricted in this way is restrictive. Also, the bounds on derivatives of the projection imposed to obtain convergence rates in what follows are restrictive. The transformation must be continuously differentiable with positive derivative to preserve differentiability of the projection, and boundedness of the derivatives will require that the derivatives of the original projection go to zero as x grows faster than the derivatives of the transformation. For example, if t(·) = 2F(·) - 1, where F(x) is a CDF that is continuously differentiable of all orders, then the order of differentiability of the projection is preserved under the transformation, but boundedness of derivatives requires that the derivatives of the projection go to zero faster than the density of F as x goes to infinity.

Fixed, evenly spaced knots are restrictive, and are motivated by theoretical convenience. A judicious choice of transformation may help alleviate the effects of evenly spaced knots. If a distribution function is used to transform the data, as discussed above, and the distribution matches closely the true distribution, then the transformed variable will be "spread out," which can improve splines with fixed evenly spaced knots.
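The univariate building block of equation (2.7) is a truncated power basis of some degree m with evenly spaced knots on the normalized support [-1,1]. A sketch (the paper's theory actually uses the B-spline transformation of this basis; the function name and the defaults m = 3, L = 4 are illustrative assumptions):

```python
import numpy as np

def truncated_power_basis(u, m=3, L=4):
    """Degree-m spline basis on [-1, 1]: the polynomials u^0, ..., u^m plus one
    truncated power (u - t_j)_+^m for each of the L evenly spaced interior knots."""
    u = np.asarray(u, dtype=float)
    knots = -1.0 + 2.0 * np.arange(1, L + 1) / (L + 1)
    cols = [u ** k for k in range(m + 1)]
    cols += [np.where(u > t, (u - t) ** m, 0.0) for t in knots]
    return np.stack(cols, axis=1), knots

u = np.linspace(-1.0, 1.0, 7)
B, knots = truncated_power_basis(u)
print(B.shape)        # (7, 8): m+1 polynomial terms plus L knot terms
print(knots)          # evenly spaced at -0.6, -0.2, 0.2, 0.6
```

Multivariate spline terms are then products of such univariate terms across the components of q_2, as in equation (2.7).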
Allowing for estimated knots (e.g. via smoothing splines, as in Wahba, 1984), which is known to lead to more accurate estimates, is outside the scope of this paper. The theory to follow uses B-splines, which are a linear transformation of the above basis that is nonsingular on [-1,1] and has low multicollinearity. The low multicollinearity of B-splines and a recursive formula for their calculation also lead to computational advantages; e.g. see Powell (1981).

Series estimates depend on the choice of the number of terms K, so these estimates have the flexibility to adjust to conditions in the data, and it is desirable to choose K based on the data. For example, one might choose K by delete-one cross validation, i.e. by minimizing the sum of squared residuals Σ_{i=1}^n [y_i - ĥ_{-i}(q_i)]², where ĥ_{-i}(·) is the estimate of the regression function computed from all the observations but the ith. Some of the results to follow will allow for data-based K.

3. Regularity Conditions

This section lists and discusses some fundamental regularity conditions on which all the following results are based. The first Assumption limits dependence of the observations.

Assumption 3.1: {(y_i, q_i)} is stationary and α-mixing with mixing coefficients α(t) = O(t^{-μ}), (t = 1, 2, ...), for μ > 2.

The stationarity assumption could be relaxed, but is not, in order to keep the notation as simple as possible. The results will also make use of moment conditions. Let u_i = y_i - h_0(q_i). For a matrix D, let ||D|| = [trace(D'D)]^{1/2}, and for a random matrix Y let |Y|_ν = {E[||Y||^ν]}^{1/ν} for ν < ∞, and |Y|_∞ the infimum of constants C such that Prob(||Y|| < C) = 1.

Assumption 3.2: |u_i|_{2s} is finite for some s ≥ 2, and E[u_i²|q_i] is bounded.

The bounded second conditional moment assumption is quite common in the literature (e.g. Stone, 1985). Apparently it can be relaxed only at the expense of affecting the convergence rates (e.g. see Newey, 1990), so to avoid further complication this assumption is retained.

Assumption 3.3: Either a) z_i is uniform mixing with mixing coefficients φ(t) = O(t^{-μ}), (t = 1, 2, ...), for μ > 2; or b) there exists c(t) such that |E[u_i u_{i+t}|q_i, q_{i+t}]| ≤ c(t) and Σ_{t=1}^∞ c(t) < ∞.

This assumption is restrictive, but covers many cases of interest, including a dynamic nonparametric regression with h_0(q_i) = E[y_i|q_i, z_{i-1}, z_{i-2}, ...]. The next Assumption is useful for controlling (p^K'p^K/n).

Assumption 3.4: For the support Q of q_i: i) there is a probability measure P and a constant c > 0 such that P(q_i ∈ Q̃) ≥ cP(Q̃) for any measurable Q̃ ⊆ Q; ii) for each K there is a constant nonsingular matrix A_K such that P^K(q) = A_K p^K(q) = (P_{1K}(q), ..., P_{KK}(q))' satisfies max_{k≤K} sup_{q∈Q} |P_{kK}(q)| ≤ ζ_0(K); iii) the smallest eigenvalue of ∫_Q P^K(q)P^K(q)'dP(q) is bounded away from zero as K → ∞.

The bounds in ii) give a convergence rate for p^K'p^K/n, while iii) controls its singularity. Hypothesis iii) is essentially a normalization, that loads all of the restrictions onto ii). Without this type of normalization the second moment matrix can be ill-conditioned, leading to technical difficulties. For example, if q is uniformly distributed on [0,1] and p_{kK}(q) = q^{k-1}, then E[p^K(q)p^K(q)'] = [σ_ij], with σ_ij = 1/(i+j-1), which has a smallest eigenvalue that goes to zero faster than K factorial.

One approach to verifying this assumption is to find a lower bound λ(K) on the smallest eigenvalue of E[p^K(q)p^K(q)'], and then let P^K(q) = [λ(K)]^{-1/2}p^K(q), as in Newey (1988a) for power series. Another approach is to let P^K(q) be a transformation that is orthonormal with respect to some density, to assume that the true distribution dominates the one corresponding to that density, and to use known bounds for orthonormal functions, as in Cox (1988), Newey (1988b), and Andrews (1991) for power series. A third approach is to find a transformation that is well conditioned, though not orthogonal, as for B-splines in Agarwal and Studden (1980) and Stone (1985).

The next Assumption specifies the way in which the number of terms is allowed to depend on the data.

Assumption 3.5: There are K̲(n) ≤ K̄(n) and K = K(z_1, ..., z_n) such that i) K̲(n) ≤ K ≤ K̄(n) with probability approaching one; ii) p^K̲(q) is a subvector of p^K(q), which is a subvector of p^K̄(q), for all K̲(n) ≤ K ≤ K̄(n).

That is, K is allowed to be random, but must lie between nonrandom upper and lower limits with probability approaching one, and the approximating functions must be nested. Nonrandom K is included as a special case where K̲ = K = K̄. These upper and lower limits control variance and bias, respectively (the larger is K the less bias there is but the more variance). It would be interesting to derive such upper and lower limits for specific choices of K (e.g. cross-validation), but these results are beyond the scope of this paper.

Some results below will require that the transformed approximating functions are also nested:

Assumption 3.6: Assumption 3.4 is satisfied and P^K̲(q) is a subvector of P^K(q), which is a subvector of P^K̄(q), for all K̲(n) ≤ K ≤ K̄(n).

This Assumption is satisfied for power series but not for splines, so that the following spline results are limited to the nonrandom K case. The next Assumption imposes a rate condition for convergence of the second moment matrix. Let K̃ denote the number of elements of P^K̄(q)P^K̄(q)' that are nonzero for some q ∈ Q, and for notational convenience suppress the n argument in K̲, K̄, and K henceforth.

Assumption 3.7: K̃ζ_0(K̄)²/n → 0.

This is a side condition that will be maintained throughout. It limits the growth rate of K̄, in a way that may be nonoptimal in the mean-square error sense, as discussed in Sections 6 and 7, although it is weaker than similar side conditions imposed by Cox (1988) and Andrews (1991).

The bias of these estimators depends on the error from the finite dimensional approximation. Sobolev norms will be used to quantify this error. For a measurable function f(q), let

|f(q)|_{d,ν} = Σ_{|λ|≤d} {E[|∂^λf(q_i)|^ν]}^{1/ν},    |f(q)|_{d,∞} = Σ_{|λ|≤d} sup_{q∈Q} |∂^λf(q)|.

The norm |f(q)|_{d,ν} will be taken to be infinite if ∂^λf(q) does not exist for some |λ| ≤ d. Inclusion of derivatives in these norms will be useful for deriving properties of ∂^d ĥ(q). Many of the results will be based on the following polynomial approximation rate hypothesis:

Assumption 3.8: For each class of functions ℱ there exist C = C(ℱ,d,ν) and α = α(ℱ,d,ν) such that for all f ∈ ℱ, min_{π∈R^K} |f(q) - p^K(q)'π|_{d,ν} ≤ CK^{-α}.

This condition is not primitive, but is known to be satisfied in many cases. Primitive conditions for power series and splines are given in Sections 6 and 7. In order for the same bias bounds to apply to estimated functionals of interest, it is necessary that A(h) be continuous with respect to the same norm as in Assumption 3.8, which is imposed in the following condition.

Assumption 3.9: For the d and ν of Assumption 3.8, A(h) is a continuous linear functional with respect to the Sobolev norm |h|_{d,ν}, i.e. there is C such that for all h ∈ H, ||A(h)|| ≤ C|h|_{d,ν}.

4. Convergence Rates

This Section gives mean-square convergence rates for ĥ(q) and uniform consistency rates for its derivatives and continuous linear functionals. The results include both sample and population mean-square error rates.
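The ill-conditioning motivating Assumption 3.4 iii) is easy to exhibit: for q uniform on [0,1] and raw powers, E[p^K(q)p^K(q)'] is the Hilbert matrix, while an orthonormalized transformation P^K(q) = A_K p^K(q) has an identity second-moment matrix. A sketch using shifted Legendre polynomials as one concrete choice of A_K (the quadrature check is an illustrative device):

```python
import numpy as np
from numpy.polynomial import legendre

K = 8
# Raw powers q^0, ..., q^{K-1} under q ~ Uniform[0,1]:
# E[q^{i-1} q^{j-1}] = 1/(i+j-1), the Hilbert matrix.
i = np.arange(1, K + 1)
hilbert = 1.0 / (i[:, None] + i[None, :] - 1)
print(np.linalg.eigvalsh(hilbert).min())    # tiny smallest eigenvalue: nearly singular

# Shifted, scaled Legendre terms sqrt(2k+1) * P_k(2q - 1) are orthonormal under
# the same distribution; verify E[P^K(q) P^K(q)'] = I by Gauss-Legendre quadrature.
x, w = legendre.leggauss(40)                # nodes/weights on [-1, 1]
q = (x + 1.0) / 2.0                         # map to [0, 1]; E[f] = sum (w/2) f(q)
Phi = np.stack([np.sqrt(2 * k + 1) * legendre.legval(2 * q - 1, np.eye(K)[k])
                for k in range(K)])
M = (Phi * (w / 2.0)) @ Phi.T
print(np.abs(M - np.eye(K)).max())          # ~0: second moment matrix is identity
```

The invariance of the estimator noted in Section 2 means this replacement changes nothing numerically in exact arithmetic, but it avoids the Hilbert-matrix conditioning in floating point.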
Theorem 4.1: If Assumptions 3.1-3.5 and 3.7 are satisfied for d = 0 and ℱ = {h_0}, then Σ_{i=1}^n [ĥ(q_i)-h_0(q_i)]²/n = O_p(K̄/n + K̲^{-2α}). If Assumption 3.6 is also satisfied, then for the CDF F(q) of q_i, ∫[ĥ(q)-h_0(q)]²dF(q) = O_p((K̄/n + K̲^{-2α}){Σ_{K=K̲}^{K̄}(K̄/K)^{αν}}^{2/ν}).

The two terms in the convergence rate essentially correspond to variance and bias. The bias term is bounded above by {Σ_{K=K̲}^{K̄}(K̄/K)^{αν}}^{2/ν}K̲^{-2α}, which equals K̲^{-2α} if K̲ = K̄. A consequence of the second conclusion is convergence rates for some version of the additive components, because H closed implies that the mapping from h(q) to some decomposition into additive components is mean-square continuous (see Bickel et. al., 1990, Appendix).

Uniform convergence rates depend on bounds for the derivatives of the series terms.

Assumption 4.1: For each k ≤ K̄, P_{kK}(q) is differentiable of order p, and for all multi-indices λ with |λ| ≤ p there is ζ_{|λ|}(K) such that with probability one max_{k≤K} sup_{q∈Q} |∂^λ P_{kK}(q)| ≤ ζ_{|λ|}(K).

Theorem 4.2: If Assumptions 3.1-3.8 are satisfied for d = |λ| ≤ p, ν = ∞, and ℱ = {h_0}, and Assumption 4.1 is satisfied, then sup_{q∈Q} |∂^λĥ(q) - ∂^λh_0(q)| = O_p(K^{1/2}ζ_{|λ|}(K)[(K/n)^{1/2} + K^{-α}]).

The uniform convergence rate for ĥ(q) is slower than the mean-square rate and does not attain Stone's (1982) bounds, although it is faster than previously derived rates for series estimators, as further discussed below. Convergence rates for continuous linear functionals of ĥ(q) will follow from this result.

Theorem 4.3: If Assumptions 3.1-3.9 and 4.1 are satisfied for d ≤ p, ν = ∞, and ζ_{|λ|}(K) monotonically increasing in |λ|, then A(ĥ)-A(h_0) = O_p(K^{1/2}ζ_d(K)[(K/n)^{1/2} + K^{-α}]).

The implied convergence rate for mean-square continuous linear functionals is not sharp, as they are shown to be √n-consistent (under slightly stronger conditions) in Section 5.

5. Asymptotic Normality

An estimator of the asymptotic variance of A(ĥ) = A'π̂ can be formed from the usual estimator of the asymptotic variance of the projection coefficients π̂. The asymptotic normality result below will require that the products of elements of H and the residual be martingale differences, so that no autocorrelation correction is required. Treating K as fixed, let Σ̂ = p^K'p^K/n and V̂ = Σ_{i=1}^n p^K(q_i)p^K(q_i)'[y_i-ĥ(q_i)]²/n. The White (1980) estimator of the asymptotic variance of the projection coefficient estimator is Σ̂⁻V̂Σ̂⁻. Since A'π̂ is a linear combination of π̂, a corresponding estimator of its asymptotic variance is

Ω̂ = A'Σ̂⁻V̂Σ̂⁻A/n.

This estimator is consistent as K grows, under conditions to follow. Further conditions are useful for asymptotic normality of A(ĥ) and consistency of Ω̂. Let Σ = E[p^K(q_i)p^K(q_i)'], V = E[p^K(q_i)p^K(q_i)'u_i²], and Ω = A'Σ⁻¹VΣ⁻¹A/n.

Assumption 5.1: Assumptions 3.1-3.2 are satisfied with s > 4μ/(μ-1); i) E[u_i²|q_i] is bounded away from zero; ii) for any h(q) ∈ H, E[h(q_i)u_i|z_{i-1}, z_{i-2}, ...] = 0; iii) K = K(n) is nonrandom.

Part ii) is the martingale difference assumption: it holds if the observations are independent or if h_0(q_i) = E[y_i|q_i, z_{i-1}, z_{i-2}, ...].

Assumption 5.2: A has full column rank for some K, and for all K there is a nonsingular matrix C_K such that for all J ≤ K, the Jth element of C_K P^K(q) does not depend on K.

Part ii) of this hypothesis rules out asymptotic linear dependence among different components of A(ĥ). When A(h) is mean-square continuous in h, a primitive condition for this hypothesis is that A(h) is onto R^s, as discussed below.

Assumption 5.3: Assumptions 3.4, 3.8, and 3.9 are satisfied for ν ≥ 2, K²ζ_0(K)²/n → 0, and √nK^{-α} → 0.

The second condition requires essentially that the bias converge to zero faster than 1/√n (see Assumption 3.8), which is stronger than the natural condition that the bias go to zero faster than the variance.
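The variance estimator Ω̂ = A'Σ̂⁻V̂Σ̂⁻A/n can be computed directly. A sketch for a scalar functional, with White-type weighting and no autocorrelation correction as in the martingale-difference case (the design, functional, and function name are illustrative assumptions):

```python
import numpy as np

def functional_and_se(y, P, A):
    """Plug-in A'pi_hat and its standard error from Omega_hat = A' S^- V S^- A / n,
    with S = P'P/n and V = sum_i p_i p_i' u_i^2 / n (White, 1980, weighting)."""
    n = len(y)
    S_inv = np.linalg.pinv(P.T @ P / n)
    pi_hat = S_inv @ (P.T @ y) / n
    u = y - P @ pi_hat                       # projection residuals
    V = (P * (u ** 2)[:, None]).T @ P / n
    Omega = A @ S_inv @ V @ S_inv @ A / n
    return A @ pi_hat, np.sqrt(Omega)

rng = np.random.default_rng(2)
n = 2000
q = rng.uniform(-1.0, 1.0, n)
y = 1.0 + q + 0.2 * rng.standard_normal(n)   # h0(q) = 1 + q, assumed DGP
P = q[:, None] ** np.arange(3)
A = np.array([1.0, 0.5, 0.25])               # A(h) = h(0.5) for this basis
a_hat, se = functional_and_se(y, P, A)
print(a_hat, se)                             # near h0(0.5) = 1.5, with small s.e.
```

An asymptotic 95% confidence interval is then a_hat ± 1.96·se, in the usual way.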
Theorem 5.1: If Assumptions 5.1-5.3 are satisfied then Ω^{-1/2}[A(ĥ) - A(h_0)] →d N(0, I_s) and Ω̂^{-1/2}Ω^{1/2} →p I. Furthermore, if there exists a scalar ψ_n and a nonsingular Ω_0 such that ψ_nΩ → Ω_0, then ψ_n^{1/2}[A(ĥ) - A(h_0)] →d N(0, Ω_0) and ψ_nΩ̂ →p Ω_0.

This result improves on Andrews (1991) in applying to estimators of projections other than conditional expectations, allowing for dependence, and having a faster growth rate for K, but restricts K to be fixed: it is more difficult to allow for random K with dependent observations. The second conclusion is useful for forming asymptotic Gaussian confidence intervals in the usual way. If the hypotheses of Theorem 4.3 are satisfied and K^{1/2}ζ_0(K)[(K/n)^{1/2} + K^{-α}] → 0, then Ω̂ is a consistent estimator of Ω, so that the delta-method can be used to make inferences about smooth nonlinear functions of A(h) in the standard way. The hypothesis about ψ_n is not restrictive when A(h) is a scalar. However, when A(h) is a vector, it requires essentially that the variance of each component of A(ĥ) converge to zero at the same rate, which may not be true when, e.g., A(h) includes both h(q) and its derivatives at a single point. It is possible to derive a primitive condition for ψ_n = n, corresponding to √n-consistency of A(ĥ), which is stated in the following result.

Theorem 5.2: Suppose that i) Assumptions 5.1 and 5.3 are satisfied for d = 0 and ν = 2; ii) for any h(q) ∈ H there exists π_K such that E[{h(q)-p^K(q)'π_K}²] → 0 as K → ∞; and iii) there exists an s × 1 vector δ(q) of elements of H such that A(h) = E[δ(q)h(q)] for all h ∈ H and E[δ(q)δ(q)'] exists and is nonsingular. Then for Ω_0 = E[σ²(q)δ(q)δ(q)'], √n[A(ĥ) - A(h_0)] →d N(0, Ω_0), nΩ̂ →p Ω_0.

Hypothesis ii) is the minimal mean-square spanning requirement for consistency of the sample projection. By the multivariate Riesz representation theorem (e.g. see Hansen, 1985), hypothesis iii) is equivalent to the statement that A(h) is mean-square continuous and has range R^s. Furthermore, mean-square continuity of such linear functionals is a necessary condition for a finite semiparametric variance bound for A(h), as in Stein (1956), and hence for existence of a (regular) √n-consistent estimator (Chamberlain, 1985), so that mean-square continuity of A(h) characterizes the √n-consistent case.

Theorem 5.3: If A(h) is a scalar functional that is not mean-square continuous then the semiparametric variance bound for A(h) is infinite.

Theorem 5.2 can be specialized to many interesting examples, including the parameters of the finite dimensional component of the projection and average derivatives. As noted in equation (2.4), the parameters β of the finite dimensional component of h_0(q) = q_1'β + Σ_ℓ h_{2ℓ}(q_{2ℓ}) satisfy β = E[δ(q)h_0(q)] for δ(q) = M⁻¹[q_1-P(q_1|H_2)]. Furthermore, the mean-square spanning hypothesis of ii) will be satisfied as long as the p_{1K}(q), ..., p_{KK}(q) span H, giving the following result:

Theorem 5.4: Suppose that i) Assumptions 5.1 and 5.3 are satisfied; ii) for any h(q) ∈ H there exists π_K such that E[{h(q)-p^K(q)'π_K}²] → 0 as K → ∞; and iii) M = E[{q_1-P(q_1|H_2)}{q_1-P(q_1|H_2)}'] is nonsingular. Then for Ω_0 = M⁻¹E[σ²(q){q_1-P(q_1|H_2)}{q_1-P(q_1|H_2)}']M⁻¹, √n(β̂ - β_0) →d N(0, Ω_0), nΩ̂ →p Ω_0.

Sample projection estimators of β have been previously analyzed by Chamberlain (1986), Andrews (1991), and Newey (1990), but only under h_0(q) = E[y|q], an unrestricted functional form for h_2(q_2) (e.g. q_1'β + h_2(q_2) could not be additive), and independent observations. One implication of this result is that if E[y|q] = q_1'β + h_2(q_2) and σ²(q) = Var(y|q) is constant, then an estimator that imposes additivity will be asymptotically more efficient than one that does not: the asymptotic variance matrices are σ²(E[{q_1-P(q_1|H_2)}{q_1-P(q_1|H_2)}'])⁻¹ and σ²(E[{q_1-E[q_1|q_2]}{q_1-E[q_1|q_2]}'])⁻¹ respectively, which have a positive semi-definite difference. Thus, although imposing additivity does not improve the convergence rate of β̂, it can lower its asymptotic variance.
One implication of this result is that if E[y|q] = q_1'β_0 + h_2(q_2) and σ²(q) = Var(y|q) is constant, then an estimator that imposes additivity will be asymptotically more efficient than one that does not: the asymptotic variance matrices are

σ²(E[{q_1 - P(q_1|ℋ_2)}{q_1 - P(q_1|ℋ_2)}'])^{-1} and σ²(E[{q_1 - E[q_1|q_2]}{q_1 - E[q_1|q_2]}'])^{-1}

respectively, which have a positive semi-definite difference. Thus, although imposing additivity does not improve the convergence rate of β̂, it can lower its asymptotic variance.

Theorem 5.2 can also be specialized to the average derivative functional of equation (2.4) for certain weights. If there are no boundary terms then integration by parts gives

A_j(h) = ∫_Q w_j(q)∂^{λ_j}h(q)dq = (-1)^{|λ_j|}∫_Q [∂^{λ_j}w_j(q)]h(q)dq = E[δ_j(q)h(q)], δ_j(q) = (-1)^{|λ_j|}[∂^{λ_j}w_j(q)]/f(q),

where f(q) is the density of q. Let δ(q) = (δ_1(q), ..., δ_s(q))'. Here, E[δ_j(q)²] will be finite if f(q) is not too small relative to ∂^{λ_j}w_j(q). Following Stoker (1986), the previous integration by parts will be valid, and hence A(h) mean-square continuous, under the hypotheses of the following result.

Theorem 5.5: Suppose that i) Assumptions 5.1 and 5.3 are satisfied; ii) for any h(q) ∈ ℋ there exists π_K such that E[{h(q) - p^K(q)'π_K}²] → 0 as K → ∞; iii) Q is convex with nonempty interior, w_j(q) is continuously differentiable to order |λ_j| on Q, ∂^μ w_j(q) = 0 on the boundary of Q for |μ| ≤ |λ_j| - 1, and E[δ(q)δ(q)'] exists and is nonsingular. Then for Ω_0 = E[σ²(q)δ(q)δ(q)'],

√n(Â(ĥ) - A(h_0)) →_d N(0, Ω_0), nΩ̂ →_p Ω_0.

Primitive conditions for these results for power series and splines are given in the following Sections.
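The integration-by-parts identity underlying the average derivative functional is easy to check numerically when the weight and its derivative vanish on the boundary. The sketch below (the grid, weight, and function are illustrative choices, not from the paper) verifies ∫ w(q)h'(q)dq = -∫ w'(q)h(q)dq on Q = [0, 1].

```python
import numpy as np

def trapz(f, x):
    # Simple trapezoid rule (kept explicit for portability).
    return float(np.sum((f[:-1] + f[1:]) * np.diff(x)) / 2.0)

q = np.linspace(0.0, 1.0, 20001)
w = (q * (1.0 - q))**2        # weight vanishing (with its derivative) at the boundary
h = np.exp(q)                  # a smooth stand-in for the projection h_0
dw = np.gradient(w, q)
dh = np.gradient(h, q)

lhs = trapz(w * dh, q)         # A(h) = integral of w(q) h'(q)
rhs = -trapz(dw * h, q)        # (-1)^1 integral of w'(q) h(q)
```

With the boundary terms killed by the weight, the two sides agree up to discretization error, which is the sense in which δ(q) = -w'(q)/f(q) represents the functional.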
6. Power Series

This Section gives primitive conditions for consistency and asymptotic normality of projections that use power series. With the partitioning of equation (2.2), the following conditions will be used: i) either q_1 is not present or, for some ℓ (ℓ = 1, ..., L), q_1 is a subvector of q_ℓ; ii) there exists a constant c > 0 such that for each ℓ and any a(q) ≥ 0,

c∫a(q)d[F(q_ℓ)·F(q̃_ℓ)] ≤ E[a(q)] ≤ c^{-1}∫a(q)d[F(q_ℓ)·F(q̃_ℓ)],

where q̃_ℓ denotes the components of q other than q_ℓ and F(·) the corresponding marginal distributions. Conditions i) and ii) are sufficient for ℋ to be closed, as specified in the following hypothesis:

Assumption 6.0: Throughout both this and the next Sections, it will be assumed that ℋ is the mean-square closure of {Σ_{ℓ=1}^L h_ℓ(q_ℓ)}, conditions i) and ii) hold, and E[{q_1 - P(q_1|ℋ_2)}{q_1 - P(q_1|ℋ_2)}'] is nonsingular.

The requirement that ℋ be closed can be relaxed, but for brevity it is not here. Power series estimators will be mean-square consistent if the regressor distribution has an absolutely continuous component and K grows slowly enough; see Newey (1988a). To obtain convergence rates it is useful to bound the regressor density below, as follows:

Assumption 6.1: There are finite q_j^1 < q_j^2, (j = 1, ..., r), such that the support of q is Q = Π_{j=1}^r [q_j^1, q_j^2], and the distribution of q has an absolutely continuous component with density bounded below by a positive constant on the support.

It is also possible to allow for a discrete regressor with finite support, by including dummy variables for all points of support of the regressor, and all interactions. Because such a regressor is essentially parametric, and allowing for it does not change any of the convergence rate results, this generalization will not be considered here.

To state further conditions, let {λ(k)} denote the sequence of multi-indices used in defining the power series or spline interactions in equations (2.3) and (2.4) respectively, and for each k let

𝒥(k) = {j : λ_j(k) ≠ 0}, J(k) = #𝒥(k), v = max_k Σ_{j∈𝒥(k)} v_j/J(k), Λ = max_k Σ_{j∈𝒥(k)} λ_j(k)/J(k).

For power series, Assumption 3.7 follows from

Assumption 6.2: K^{4+4v}/n → 0.

For v = 0, this condition is K = o(n^{1/4}), which is weaker than Cox's (1988, p. 715) requirement. Primitive approximation rate conditions (as in Assumption 3.8) for power series follow from known results of Lorentz (1986), Powell (1981), or a Taylor expansion.
Assumption 6.3: Each of the components h_ℓ(q_ℓ), (ℓ = 1, ..., L), of h_0(q) is continuously differentiable of order h on the support of q_ℓ.

This hypothesis implies Assumption 3.8 with α = h/r for d = 0 and v = 0, and with α = h - d for v = ∞ and d ≤ 1. A literature search has not yet revealed corresponding conditions for v = ∞ and d > 1, but rates for this case follow from a Taylor expansion under the following (strong) condition:

Assumption 6.4: There is a constant C such that for each multi-index λ, the λ-th partial derivative of each additive component of h_0(q) exists and is bounded by C^{|λ|}.

The first power series result gives convergence rates.

Theorem 6.1: Suppose that Assumptions 3.1 - 3.3 and 6.0 - 6.3 are satisfied. Then

Σ_{i=1}^n [ĥ(q_i) - h_0(q_i)]²/n = O_p(K/n + K^{-2α}),
∫[ĥ(q) - h_0(q)]²dF(q) = O_p(K/n + K^{-2α}),
sup_{q∈Q}|ĥ(q) - h_0(q)| = O_p(K{[K/n]^{1/2} + K^{-α}}).

Suppose, in addition, that either a) Assumption 6.4 is satisfied and α is any positive number, or b) r_1 = 1, |λ| ≤ h, and α = h - |λ|. Then

sup_{q∈Q}|∂^λ ĥ(q) - ∂^λ h_0(q)| = O_p(K^{1+2|λ|}{[K/n]^{1/2} + K^{-α}}).

This result implies optimal convergence rates for power series estimators of h_0(q) when K goes to infinity at the optimal rate and Assumption 6.2 is satisfied. If the density of q is bounded away from zero, h > 3r_1/2, and K = Cn^γ for γ = r_1/(2h + r_1), then the mean-square convergence rate for ĥ(q) is n^{-2h/(2h+r_1)}, which attains Stone's (1982) bounds. The side condition h > 3r_1/2, which is needed to guarantee Assumption 6.2, limits this optimality result, but is weaker than the corresponding condition in Cox (1988). These mean-square error results apply to additive projections (rather than conditional expectations), like Stone (1985, 1990) but unlike Cox (1988) or Andrews and Whang (1990); allow for interactive terms, similarly to Stone (1990) (although Stone (1985) also derives optimal rates for derivatives); and allow for dependent observations.
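The K/n + K^{-2α} trade-off in Theorem 6.1 can be seen in a small simulation: with too few terms the bias term K^{-2α} dominates, while a moderate K drives it down at a variance cost of order K/n. All choices below (design, sample size, names) are hypothetical illustrations, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
q = rng.uniform(-1.0, 1.0, n)
h0 = np.sin(3.0 * q)
y = h0 + rng.normal(0.0, 0.3, n)

def series_mse(K):
    # Least squares on powers q^0, ..., q^{K-1}; mean squared error against h_0.
    P = np.column_stack([q**k for k in range(K)])
    coef, *_ = np.linalg.lstsq(P, y, rcond=None)
    return float(np.mean((P @ coef - h0)**2))

mse_underfit = series_mse(2)   # large approximation bias, of order K^{-2*alpha}
mse_balanced = series_mse(8)   # bias negligible, variance term of order K/n
```

Letting K grow with n at the rate in Theorem 6.1 balances the two terms and yields Stone's optimal rate.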
The side condition Assumption 6.2 is not present in Stone (1985) or Andrews and Whang (1990), but a population mean-square error result is obtained here, unlike Andrews and Whang (1990). In comparison with Cox's (1988) uniform convergence rate, for univariate q with density bounded away from zero (k = 2 in Cox's notation), the rate here, K([K/n]^{1/2} + K^{-h}), is the faster one, and uniform convergence rates for derivatives are given here.

To state the asymptotic normality result for power series, let Ω̂ denote the variance matrix estimator described in Section 5, computed with power series.

Theorem 6.2: Suppose that i) Assumptions 3.1, 3.2, 5.1 - 5.2, 6.0, and 6.1 are satisfied with s > 4v/(γ - 1); ii) A(h) is continuous with respect to the Sobolev norm |h|_{d,∞}, and either a) Assumption 6.3 is satisfied, d = 0, √nK^{-α} → 0, and K⁷/n → 0; or b) Assumption 6.3 is satisfied, d ≥ 1, √nK^{-α} → 0, and K^{7+4d}/n → 0; or c) Assumption 6.4 is satisfied, √nK^{-α} → 0 for some α > 0, and K⁷/n → 0; iii) there exists a scalar ψ_n and nonsingular Ω_0 such that ψ_nΩ̂ →_p Ω_0. Then

ψ_n^{1/2}[Â(ĥ) - A(h_0)] →_d N(0, Ω_0), ψ_nΩ̂ →_p Ω_0.

In comparison with Andrews (1991), this result applies to projections other than the conditional expectation, allows for dependent observations, and has weaker growth rate conditions for K: for d = 0, i) and ii) require K = o(n^{1/7}), while the conditions of Andrews (1991) imply that h > 5r_1/2 (e.g. that h_0(q) is thrice continuously differentiable when r_1 = 1).

This result can be applied to estimation of the components of an additive projection and their derivatives, when the observations are independent.

Theorem 6.3: Suppose the observations are independent, σ²(q) is bounded and bounded away from zero, Assumptions 6.0 and 6.1 are satisfied for v = 0, each h_ℓ(q_ℓ) is continuously differentiable to order h on Q, and K = o(n^{1/7}). Then for any pair of points q̄_ℓ and q̃_ℓ in the support of q_ℓ, if √nK^{-α} → 0,

Ω̂^{-1/2}[{ĥ_ℓ(q̄_ℓ) - ĥ_ℓ(q̃_ℓ)} - {h_ℓ(q̄_ℓ) - h_ℓ(q̃_ℓ)}] →_d N(0, 1).

Also, if √nK^{-(α-|μ|)} → 0, then for any |μ| ≤ h - 1,

Ω̂^{-1/2}[{∂^μ ĥ_ℓ(q̄_ℓ) - ∂^μ ĥ_ℓ(q̃_ℓ)} - {∂^μ h_ℓ(q̄_ℓ) - ∂^μ h_ℓ(q̃_ℓ)}] →_d N(0, 1).
The differencing normalization here is different from the mean centering in Stone (1985), which would be more difficult to work with. A √n-consistency and asymptotic normality result for power series estimates of mean-square continuous linear functionals is:

Theorem 6.4: Suppose that i) Assumptions 5.1, 6.0, and 6.1 are satisfied and K⁷/n → 0; ii) Assumption 6.3 is satisfied and √nK^{-α} → 0; iii) there exists an s × 1 vector δ(q) of elements of ℋ such that A(h) = E[δ(q)h(q)] for all h ∈ ℋ, and E[δ(q)δ(q)'] exists and is nonsingular. Then for Ω_0 = E[σ²(q)δ(q)δ(q)'],

√n[Â(ĥ) - A(h_0)] →_d N(0, Ω_0), nΩ̂ →_p Ω_0.

This result can be specialized to the parameters of a finite dimensional component and average derivatives, as follows.

Theorem 6.5: If hypotheses i) and ii) of Theorem 6.4 are satisfied, then for

Ω_0 = M^{-1}E[σ²(q){q_1 - P(q_1|ℋ_2)}{q_1 - P(q_1|ℋ_2)}']M^{-1},

√n(β̂ - β_0) →_d N(0, Ω_0), nΩ̂ →_p Ω_0.

This result gives fully primitive regularity conditions for √n-consistency and asymptotic normality of a power series estimator of the parameters of a finite dimensional component of a projection. It allows for h_0(q) to have the additive form discussed above, and also allows for dependent observations. An analogous result can be given for weighted average derivatives, although for brevity such a result is only given below for splines.

7. Splines

Results for splines are limited to the case where each v_j is zero:

Assumption 7.1: Assumptions 6.0 and 6.1 are satisfied with v_j = 0, (j = 1, ..., r).

Splines allow for a faster growth rate for the number of terms.

Assumption 7.2: K³/n → 0.

Approximation rate conditions for d = 0 or 1 follow from known results, but a literature search has not yet revealed conditions for other cases, which limits the following results.
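The spline estimators of this Section use a fixed-order B-spline basis with evenly spaced knots. As an illustration, the basis can be built by the standard Cox-de Boor recursion and used in least squares as follows; the construction and all names are a generic sketch, not the paper's notation.

```python
import numpy as np

def bspline_basis(q, degree, n_interior):
    """Evenly spaced B-spline basis on [-1, 1] via the Cox-de Boor recursion."""
    knots = np.concatenate([[-1.0] * degree,
                            np.linspace(-1.0, 1.0, n_interior + 2),
                            [1.0] * degree])
    q = np.clip(q, -1.0, 1.0 - 1e-12)          # keep the right endpoint in-range
    # Degree-0 indicator functions on the knot intervals.
    B = np.array([(q >= knots[j]) & (q < knots[j + 1])
                  for j in range(len(knots) - 1)], dtype=float)
    for d in range(1, degree + 1):
        Bn = np.zeros((len(knots) - d - 1, len(q)))
        for j in range(len(knots) - d - 1):
            left = ((q - knots[j]) / (knots[j + d] - knots[j])
                    if knots[j + d] > knots[j] else 0.0)
            right = ((knots[j + d + 1] - q) / (knots[j + d + 1] - knots[j + 1])
                     if knots[j + d + 1] > knots[j + 1] else 0.0)
            Bn[j] = left * B[j] + right * B[j + 1]
        B = Bn
    return B.T                                  # n x K, K = n_interior + degree + 1

rng = np.random.default_rng(0)
q = rng.uniform(-1.0, 1.0, 1000)
B = bspline_basis(q, degree=3, n_interior=8)
y = np.sin(3.0 * q) + rng.normal(0.0, 0.2, 1000)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
mse = float(np.mean((B @ coef - np.sin(3.0 * q))**2))
```

The local support of B-splines is what produces the smaller ζ_0(K) (of order K^{1/2} rather than K) and hence the faster allowable growth of K relative to power series.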
Theorem 7.1: Suppose that Assumptions 3.1 - 3.3, 7.1, 7.2, and 6.3 are satisfied, r_1 = 1, and 1 ≤ |λ| ≤ h - 1 ≤ m. Then for α = h,

Σ_{i=1}^n [ĥ(q_i) - h_0(q_i)]²/n = O_p(K/n + K^{-2α}),
∫[ĥ(q) - h_0(q)]²dF(q) = O_p(K/n + K^{-2α}),
sup_{q∈Q}|ĥ(q) - h_0(q)| = O_p(K^{1/2}{[K/n]^{1/2} + K^{-α}}).

If, in addition, d = 0 or 1, then

sup_{q∈Q}|∂^λ ĥ(q) - ∂^λ h_0(q)| = O_p(K^{1/2+|λ|}{[K/n]^{1/2} + K^{-α}}).

This result yields optimal mean-square convergence rates for spline regression estimation of an additive projection with dependent observations; here the side condition of Assumption 7.2 is satisfied if K = o(n^{1/3}). Throughout the rest of Section 7, let Ω̂ be the variance estimator computed as described in Section 5, using splines.

Theorem 7.2: Suppose that i) Assumptions 5.1 - 5.2, 7.1, and 7.2 are satisfied; ii) A(h) is continuous with respect to the Sobolev norm |h|_{d,∞} for d = 0 or 1, d ≤ m, Assumption 6.3 is satisfied for h - 1 ≤ m, and √nK^{-α} → 0; iii) there exists a scalar ψ_n and nonsingular Ω_0 such that ψ_nΩ̂ →_p Ω_0. Then

ψ_n^{1/2}[Â(ĥ) - A(h_0)] →_d N(0, Ω_0), ψ_nΩ̂ →_p Ω_0.

Apparently, there are no other asymptotic normality results for spline projections in the literature. The growth rate restriction here, K = o(n^{1/4}), is weaker than the one for power series, so asymptotic normality will require only twice rather than the thrice differentiability needed for power series. This result can be specialized analogously to Theorem 6.3, although for brevity this specialization is omitted.

A √n-consistency and asymptotic normality result for spline estimators of mean-square continuous linear functionals is:

Theorem 7.3: Suppose that i) Assumptions 5.1 and 7.1 are satisfied and K⁴/n → 0; ii) Assumption 6.3 is satisfied, m ≥ h - 1, and √nK^{-α} → 0; iii) there exists an s × 1 vector δ(q) of elements of ℋ such that A(h) = E[δ(q)h(q)] for all h ∈ ℋ and E[δ(q)δ(q)'] exists and is nonsingular. Then for Ω_0 = E[σ²(q)δ(q)δ(q)'],

√n[Â(ĥ) - A(h_0)] →_d N(0, Ω_0), nΩ̂ →_p Ω_0.
This result can be specialized to the parameters of a finite dimensional component and average derivatives, as follows. For brevity, only the average derivative result is given here. Let q_{-j} denote the vector of all the components of q other than the jth, and f(q_j|q_{-j}) the conditional density of the jth component given the others.

Theorem 7.4: Suppose that hypotheses i) and ii) of Theorem 7.3 are satisfied, Q = [-1, 1]^r, h_j(q_j) is continuously differentiable to order d on Q for some integer d, 0 ≤ d ≤ m, w_j(q_j) is continuously differentiable to order d, ∂^μ w_j(q_j) = 0 on the boundary of [-1, 1] for |μ| ≤ d - 1, and ‖[d^d w_j(q_j)/dq_j^d]/f(q_j|q_{-j})‖_∞ < ∞ for all j. Then

√n[Σ_j ∫_{[-1,1]} w_j(q_j)[d^d ĥ_j(q_j)/dq_j^d]dq_j - Σ_j ∫_{[-1,1]} w_j(q_j)[d^d h_j(q_j)/dq_j^d]dq_j] →_d N(0, E[σ²(q)δ(q)²]), nΩ̂ →_p E[σ²(q)δ(q)²].

8. Proofs of Theorems

The proofs of Sections 8 and 9 are abbreviated, with details provided only for central or potentially unfamiliar results. A longer version of these sections is available from the author upon request. Throughout, let C be a generic positive constant and let λ_min(B) and λ_max(B) be the minimum and maximum eigenvalues of a symmetric matrix B. The following Lemmas are useful in proving the results for power series and splines.

Lemma 8.0: If conditions i) and ii) of Section 6 are satisfied, then {Σ_{ℓ=1}^L h_ℓ(q_ℓ) : E[h_ℓ(q_ℓ)²] < ∞, ℓ = 1, ..., L} is closed in mean-square. If in addition Assumption 6.0 is satisfied then ℋ is closed.

Proof: For now, drop the q_1 component, and for notational convenience let ‖h‖ = {E[h(q)²]}^{1/2}. By Proposition 2 of Section 4 of the Appendix of Bickel, Klaassen, Ritov, and Wellner (1990), closedness is equivalent to existence of a constant C such that for each h ∈ ℋ there is a decomposition h = Σ_ℓ h_ℓ(q_ℓ) with max_ℓ ‖h_ℓ‖ ≤ C‖h‖. Existence of such a C can be shown using an induction argument like that of Stone (1990, Lemma 1, "L_2 Rate of Convergence for Interaction Spline Regression," Tech. Rep. No. 268, Berkeley).
As noted there, assuming that the property holds for each collection of components of maximal dimension less than that of q_ℓ, it suffices to show that for any "maximal" q_ℓ (one that is not a strict subvector of any other q_{ℓ'}), there is a constant c > 1 such that E[h(q)²] ≥ c^{-1}E[h_ℓ(q_ℓ)²] for any decomposition h = Σ_ℓ h_ℓ(q_ℓ) into measurable functions with finite mean-square. To show this property, note that, holding fixed the vector q̃_ℓ of components of q that are not components of q_ℓ, each h_k(q_k) with k ≠ ℓ is a function of a strict subvector of q_ℓ. Then, by condition ii) of Section 6,

E[h(q)²] ≥ c^{-1}∫∫{h_ℓ(q_ℓ) + Σ_{k≠ℓ}h_k(q_k)}²dF(q_ℓ)dF(q̃_ℓ)
= c^{-1}∫[∫{h_ℓ(q_ℓ)² + {Σ_{k≠ℓ}h_k(q_k)}²}dF(q̃_ℓ)]dF(q_ℓ)
≥ c^{-1}∫[∫h_ℓ(q_ℓ)²dF(q̃_ℓ)]dF(q_ℓ) = c^{-1}E[h_ℓ(q_ℓ)²].

To show the second conclusion, consider a sequence h_j = q_1'β_j + h_{2j} ∈ ℋ converging in mean-square. By nonsingularity of E[{q_1 - P(q_1|ℋ_2)}{q_1 - P(q_1|ℋ_2)}'], β_j is a Cauchy sequence and hence converges to some β_0, so that q_1'β_j converges to q_1'β_0 in mean-square. Then h_j - q_1'β_j is also a Cauchy sequence, and hence, by ℋ_2 closed, converges in mean-square to an element h_2 of ℋ_2, so that h_j converges to q_1'β_0 + h_2, which is an element of ℋ. QED

Lemma 8.1: Let ℱ = {f(q) : each additive component f_ℓ(q_ℓ) of f(q) is continuously differentiable of order h and max_{|λ|≤d} sup_Q |∂^λ f(q)| ≤ C}. If the support Q of q is a box, then for power series there is C such that for all f ∈ ℱ,

inf_π sup_{q∈Q}|f(q) - p^K(q)'π| ≤ CK^{-α}, for d = 0 and α = h/r, and for d = 1 and α = (h - d)/r.

Proof: First, note that it suffices to show the result for a single component, since the approximation error of the function is bounded by the sum of the errors over all additive components.
For the first conclusion, note that by |λ(k)| monotonic increasing, the set of all linear combinations of p^1(q), ..., p^K(q) will include the set of all polynomials of degree CK^{1/r} for some C small enough, so Theorem 8 of Lorentz (1986) applies. For the second conclusion, let Q = [q^1, q^2] and note that ∂p^K(q)'π/∂q spans the power series up to order K - 1. By the first conclusion there is f_K(q) = p^K(q)'π satisfying sup_Q|∂f(q)/∂q - ∂f_K(q)/∂q| ≤ CK^{-α}, with the constant coefficient chosen so that f_K(q^1) = f(q^1). Then for all q ∈ Q,

|f(q) - f_K(q)| ≤ ∫_{q^1}^{q}|∂f(q̄)/∂q - ∂f_K(q̄)/∂q|dq̄ ≤ CK^{-α}.

The second conclusion then follows by integration and boundedness of Q. QED

Lemma 8.2: Let ℱ = {f(q) : each additive component f_ℓ(q_ℓ) of f(q) is continuously differentiable of all orders, and for all multi-indices λ, |∂^λ f(q)| ≤ C^{|λ|}}. If the support Q of q is star-shaped, then for power series there exist C and α > 0 such that for all f ∈ ℱ and all d,

inf_π max_{|λ|≤d} sup_{q∈Q}|∂^λ f(q) - ∂^λ p^K(q)'π| ≤ CK^{-α}.

Proof: As above, assume without loss of generality that r = 1. For a function f(q), let P(f, m, q̄) denote the Taylor series up to order m for an expansion around q̄, where q̄ is such that βq + (1 - β)q̄ ∈ Q for all q ∈ Q and 0 ≤ β ≤ 1 (such a q̄ exists by Q star-shaped). Note that ∂P(f, m, q)/∂q = P(∂f/∂q, m - 1, q), so that by induction ∂^λ P(f, m, q) = P(∂^λ f, m - |λ|, q). Also, ∂^λ f also satisfies the hypotheses, so that by the intermediate value form of the remainder,

max_{|λ|≤d} sup_{q∈Q}|∂^λ f(q) - ∂^λ P(f, m, q)| ≤ C^m/(m - d)!.

Next, let m(K) be the largest integer such that P(f, m(K), q̄) is a linear combination of p^1(q), ..., p^K(q), and let f_K(q) = P(f, m(K), q̄). By the "natural ordering" hypothesis, there are constants C and a > 0 such that m(K) ≥ CK^a, so that for any α > 0, C^{m(K)}/(m(K) - d)! ≤ CK^{-α} and

sup_{|λ|≤d} sup_{q∈Q}|∂^λ f(q) - ∂^λ f_K(q)| = sup_{|λ|≤d} sup_{q∈Q}|∂^λ f(q) - P(∂^λ f, m(K) - |λ|, q)| ≤ CK^{-α}. QED
Lemma 8.3: Let ℱ = {f(q) : each additive component f_ℓ(q_ℓ) of f(q) is continuously differentiable of order h and max_{|λ|≤d} sup_Q |∂^λ f(q)| ≤ C}. If the support Q of q is a box, then for splines of order m ≥ h - 1 there is C such that for all f ∈ ℱ,

inf_π sup_{q∈Q}|f(q) - p^K(q)'π| ≤ CK^{-α}, for d = 0 or 1, d ≤ m, and α = (h - d)/r.

Proof: Assume w.l.o.g. that r = 1 and let Q = [-1, 1]. The result for d = 0 follows by Theorem 12.8 of Schumaker (1981). For the other case, note that ∂p^K(q)'π/∂q is a spanning vector for splines of degree m - 1 with knot spacing bounded by C/K. Therefore, by the case d = 0, for K large enough there exists f_K(q) = p^K(q)'π such that sup_Q|∂f(q)/∂q - ∂f_K(q)/∂q| ≤ CK^{-α}. The conclusion then follows by integration. QED

Lemma 8.4: If Assumption 6.1 is satisfied, then for power series Assumption 3.4 is satisfied, and there is a constant C such that for any multi-index λ with |λ| ≤ d, sup_{q∈Q}|∂^λ p_k^K(q)| ≤ CK^{1+2(v+|λ|)}.

Proof: First, assume that q_1 is nonexistent. Following the definitions in Abramowitz and Stegun (1972, Ch. 22), let C_k^{(a)}(x) denote the ultraspherical polynomial of order k and exponent a, and let

p_k^{(a)}(x) = C_k^{(a)}(x)·[k!(k + a)[Γ(a)]²/{π2^{1-2a}Γ(k + 2a)}]^{1/2}

be its normalized version. Also, let x_j(q_j) = (2q_j - q_j^1 - q_j^2)/(q_j^2 - q_j^1) and define

P_k^K(q) = Π_{j=1}^r p_{λ_j(k)}^{(v_j+.5)}(x_j(q_j)).

By the "natural ordering" assumption (i.e. |λ(k)| monotonic increasing), p^K(q) is a nonsingular combination of P^K(q) = (P_1^K(q), ..., P_K^K(q))'.
Also, for P̄ the distribution that is absolutely continuous on Q = Π_{j=1}^r [q_j^1, q_j^2] with pdf proportional to Π_{j=1}^r (1 - x_j(q_j)²)^{v_j}, it follows by the change of variables x_j = x_j(q_j) and orthonormality of the p_k^{(v_j+.5)} that there is a constant C with λ_min(∫P^K(q)P^K(q)'dP̄(q)) ≥ C, so that by Assumption 6.1, λ_min(E[P^K(q)P^K(q)']) ≥ C > 0. Next, by differentiating 22.5.37 of Abramowitz and Stegun (for s ≤ k) and solving, d^s C_k^{(a)}(x)/dx^s is proportional to C_{k-s}^{(a+s)}(x), so that by 22.14.2 of Abramowitz and Stegun, for λ(k - s) as in equation (2.3),

|∂^λ P_k^K(q)| ≤ CΠ_j[1 + λ_j(k - s)]^{2(v_j+λ_j)+1} ≤ C|λ(k - s)|^{Σ_j[2(v_j+λ_j)+1]} ≤ CK^{1+2(v+|λ|)},

and these bounds continue to hold for p^K(q), with the constant adjusted for the nonsingular transformation. Now consider the case with q_1 present. Let ℋ_{2K} denote the linear space spanned by P(q_2) = (P_1^K(q_2), ..., P_{K_2}^K(q_2))' and M_K = E[{q_1 - P(q_1|ℋ_{2K})}{q_1 - P(q_1|ℋ_{2K})}']. Note that ℋ_{2K} is contained in ℋ_2, so that M_K is bigger than M = E[{q_1 - P(q_1|ℋ_2)}{q_1 - P(q_1|ℋ_2)}'] in the positive semi-definite sense, and thus has smallest eigenvalue bounded away from zero by Assumption 6.0. Furthermore, writing E[P^K(q)P^K(q)'] = BDB' for a triangular B and block diagonal D with blocks M_K and E[P(q_2)P(q_2)'],

λ_min(E[P^K(q)P^K(q)']) ≥ λ_min(BB')λ_min(D) ≥ min{λ_min(E[P(q_2)P(q_2)']), λ_min(M_K)} ≥ C > 0,

by the extremal characterization of the smallest eigenvalue. QED

Lemma 8.5: If Assumption 7.1 is satisfied then for splines Assumption 3.4 is satisfied, and there is a constant C such that for all |λ| ≤ d ≤ m, sup_{q∈Q}|∂^λ p_k^K(q)| ≤ CK^{1/2+|λ|}.

Proof: First, consider the case where q_1 is absent and let Q = [-1, 1]^r. Let B_j^L(x) be the B-spline of order m for the evenly spaced knot sequence with left end-knot -1 + 2j/(L + 1), j = -m, ..., L + m + 1, and let

P_k^K(q) = Π_{j=1}^r 1(λ_j(k) > 0)B_{λ_j(k)}^L(q_j).

Then existence of a nonsingular matrix relating p^K(q) and P^K(q) follows by inclusion of all multiplicative interactions of splines for components of q and the usual basis result for B-splines (e.g. Theorem 19.2 of Powell, 1981). Also, by the local support of B-splines, for each q ∈ Q the number of nonzero elements of P^K(q)P^K(q)' is bounded by 2(m + 1)^r K ≤ CK.
Also, for the so-called normalized B-splines with evenly spaced knots, ∫B_j^L(x)²dx ≥ C/L, so that, noting that [2(m + 1)/L][L/2] ≥ 1 for all positive integers L, it follows by the argument of Burman and Chen (1989, p. 1587) that for independent uniform random variables on Q, λ_min(∫P(q)P(q)'dq) ≥ C/K. Therefore, boundedness away from zero of the smallest eigenvalue of the second moment matrix of the normalized basis follows by Assumption 6.1, analogously to the proof of Lemma 8.4. Also, since changing even knot spacing is equivalent to rescaling the argument of the B-splines, sup_x|d^s B_j^L(x)/dx^s| ≤ CL^s for s ≤ m, implying the bounds on derivatives given in the conclusion. The proof when q_1 is present follows as in the proof of Lemma 8.4. QED

Proof of Theorem 4.1: Note that Assumptions 3.2 and 3.3 imply that Assumption 9.1 is satisfied for J = 1 and y_1 = y. The first conclusion then follows by Lemma 9.9, and the second by Lemma 9.10. QED

Proof of Theorem 4.2: By reasoning as in the previous proof, the theorem follows immediately from Lemma 9.11. QED

Proof of Theorem 4.3: Follows immediately from Theorem 4.2. QED

Proof of Theorem 5.1: By Assumption 3.4, A'Σ^{-1}A is invariant to replacing p^K(q) by the nonsingular linear combination P^K(q), as is Â(ĥ), so that by Assumption 5.2 it suffices to show the conclusion with this replacement, for which λ_min(Σ) ≥ C. For any matrix D and positive semi-definite B, let ‖D‖ = [tr(DD')]^{1/2}; then ‖DB^{1/2}‖² ≤ ‖D‖²λ_max(B). Thus, for the normalizing matrix F of Section 5 and V = E[σ²(q)P^K(q)P^K(q)'],

(8.1) ‖FA'Σ^{-1/2}‖ = {tr[FA'Σ^{-1}AF']}^{1/2} ≤ C{tr[FA'Σ^{-1}VΣ^{-1}AF']}^{1/2} ≤ C√n.
Note also that A'Σ^{-1}A is a monotonic increasing sequence in K in the positive semi-definite semi-order, since this matrix is formally identical to the inverse asymptotic variance of a minimum chi-square estimator, which rises as additional equality restrictions are added. Therefore the smallest eigenvalue of A'Σ^{-1}A is also monotonic increasing in K, so that by A having full rank for some K, it is bounded away from zero. Also, ‖Σ̂ - Σ‖ = o_p(1) by Lemma 9.6 and Assumption 3.7, so that λ_max(Σ̂^{-1}) = O_p(1) and, w.p.a.1,

(8.2) Σ̂^{-1} - Σ^{-1} = Σ^{-1}(Σ - Σ̂)Σ̂^{-1} = Σ^{-1}(Σ - Σ̂)Σ^{-1} + Σ^{-1}(Σ - Σ̂)Σ̂^{-1}(Σ - Σ̂)Σ^{-1}.

It follows that there are positive semi-definite square roots Σ^{-1/2} and Σ̂^{-1/2} such that, w.p.a.1,

‖FA'Σ̂^{-1/2}‖² ≤ ‖FA'Σ^{-1/2}‖²[1 + ‖Σ̂ - Σ‖O_p(1)] + ‖FA'Σ^{-1/2}‖²‖Σ̂ - Σ‖²λ_max(Σ̂^{-1}) = O_p(n).

Let π̄ be such that sup_Q|h_0(q) - P^K(q)'π̄| ≤ CK^{-α}, and let h̄ = (h_0(q_1), ..., h_0(q_n))'. For R = F[A'Σ̂^{-1}P'(h̄ - Pπ̄)/n + A'π̄ - A(h_0)], where for convenience the K superscript has been dropped,

(8.3) F[Â(ĥ) - A(h_0)] = FA'Σ̂^{-1}P'u/n + R,

and by Lemma 9.8, w.p.a.1,

‖R‖ ≤ ‖FA'Σ̂^{-1}P'/√n‖·‖h̄ - Pπ̄‖/√n + ‖F‖·‖A'π̄ - A(h_0)‖ ≤ C√nK^{-α} = o_p(1).

Also, ‖Σ^{-1/2}P'u/√n‖ = O_p(K^{1/2}) by Lemma 9.6, and ‖(Σ̂ - Σ)Σ^{-1}‖ ≤ ‖Σ̂ - Σ‖λ_max(Σ^{-1}) = O_p(K^{1/2}ζ_0(K)/√n), so that for R̄ = FA'(Σ̂^{-1} - Σ^{-1})P'u/n,

(8.4) ‖R̄‖ ≤ ‖FA'Σ̂^{-1}(Σ̂ - Σ)Σ^{-1}P'u/n‖ ≤ ‖FA'Σ̂^{-1/2}‖·‖(Σ̂ - Σ)Σ^{-1/2}‖·‖Σ^{-1/2}P'u/√n‖/n = O_p(K^{1/2}K^{1/2}ζ_0(K)²/√n) = o_p(1),

and hence FA'Σ̂^{-1}P'u/n = FA'Σ^{-1}P'u/n + o_p(1).
Next, let ν be a constant vector with ‖ν‖ = 1 and the same dimension as A(h), and let Z_in = ν'FA'Σ^{-1}P^K(q_i)u_i/√n, so that ν'FA'Σ^{-1}P'u/n = Σ_{i=1}^n Z_in/√n. Note that Z_in is a martingale difference sequence with E[Z_in²] = 1/n, and |Z_in| ≤ ‖ν‖·‖FA'Σ^{-1/2}‖·‖Σ^{-1/2}P^K(q_i)‖·|u_i|/√n ≤ Cζ_0(K)|u_i|. Thus, for a = s/2 in Assumption 5.1, so that a[1 - (2/a)] > 2, Davydov's (1968) inequality gives

E[(Σ_i Z_in²/n - 1)²] ≤ (C/n)[Σ_{t=0}^∞ φ(t)^{1-2/a}]Kζ_0(K)²E[|u_i|^s]^{2/s} → 0,

so that Σ_i Z_in²/n →_p 1. Also, for any ε > 0, the Lindeberg term satisfies

Σ_i E[1(|Z_in| > ε)Z_in²]/(nε) ≤ Σ_i E[Z_in⁴]/(nε) ≤ CKζ_0(K)⁴/n → 0.

It then follows by Theorem 5.2.3 of White (1984) that Σ_i Z_in/√n →_d N(0, 1). Since this result holds for all ν with ‖ν‖ = 1, FA'Σ^{-1}P'u/n →_d N(0, I) follows by the Cramér-Wold device. The first conclusion then follows by eqs. (8.3), (8.4), and the triangle inequality.

To prove the second conclusion, note that Assumption 5.1 implies that the hypotheses of Theorem 4.1 are satisfied, so that with û_i = y_i - ĥ(q_i) and u_i = y_i - h_0(q_i),

(8.5) Σ_i|û_i - u_i|²/n = Σ_i|ĥ(q_i) - h_0(q_i)|²/n = O_p(K/n + K^{-2α}) = o_p(1),

and Σ_i|û_i - u_i|/n ≤ {Σ_i|û_i - u_i|²/n}^{1/2} = o_p(1). Let V = E[σ²(q)P^K(q)P^K(q)'], Ṽ = Σ_i P^K(q_i)P^K(q_i)'u_i²/n, and V̂ = Σ_i P^K(q_i)P^K(q_i)'û_i²/n. Applying Lemma 9.6 with y_i there equal to u_i² and s there equal to s/2,

(8.6) ‖Ṽ - V‖ = O_p(K^{1/2}ζ_0(K)²K^{1/2}/n^{1/2}) = o_p(1),

and, noting that û_i² - u_i² = (û_i - u_i)² + 2u_i(û_i - u_i),

‖V̂ - Ṽ‖ ≤ Cζ_0(K)²[Σ_i|û_i - u_i|²/n + {Σ_i u_i²/n}^{1/2}{Σ_i|û_i - u_i|²/n}^{1/2}] = o_p(1),

so that ‖V̂ - V‖ = o_p(1).
It now follows by the triangle inequality that

‖F[Ω̂ - Ω]F'‖ ≤ ‖F[A'Σ̂^{-1}(Σ̂ - Σ)Σ^{-1}VΣ^{-1}(Σ̂ - Σ)Σ̂^{-1}A]F'‖ + 2‖F[A'Σ̂^{-1}(Σ̂ - Σ)Σ^{-1}VΣ̂^{-1}A]F'‖ + ‖F[A'Σ̂^{-1}(V̂ - V)Σ̂^{-1}A]F'‖ ≤ o_p(1) + o_p(1)λ_max(Σ̂^{-1}) + o_p(1) = o_p(1),

giving the second conclusion. If the final hypothesis is satisfied, then ψ_n^{1/2}Ω̂^{1/2} →_p Ω_0^{1/2} by continuity of the square root and, by Ω_0 nonsingular, ψ_n^{-1/2}Ω̂^{-1/2} →_p Ω_0^{-1/2}; therefore, multiplying through by ψ_n^{1/2}Ω̂^{1/2}, the final conclusion follows from the first conclusion. QED

Proof of Theorem 5.2: By iii), each component of δ(q) is an element of ℋ. Let δ_K(q) = p^K(q)'Σ^{-1}A for A = E[p^K(q)δ(q)'], so that δ_K(q)' is the minimum mean-square error linear combination of p^K(q) approximating δ(q)'. It follows by ii) and σ²(q) bounded that E[σ²(q)‖δ(q) - δ_K(q)‖²] ≤ CE[‖δ(q) - δ_K(q)‖²] → 0. Therefore, E[δ_K(q)δ_K(q)'] → E[δ(q)δ(q)'] and A'Σ^{-1}A = E[δ_K(q)δ_K(q)'], so that Assumption 5.2 is satisfied by iii). Also,

‖Ω_n - Ω_0‖ = ‖E[σ²(q)A'Σ^{-1}p^K(q)p^K(q)'Σ^{-1}A] - Ω_0‖ = ‖E[σ²(q)δ_K(q)δ_K(q)'] - E[σ²(q)δ(q)δ(q)']‖ ≤ CE[σ²(q)‖δ(q) - δ_K(q)‖²] → 0,

so that the final hypothesis of Theorem 5.1 is satisfied. The conclusion follows by the final conclusion of Theorem 5.1. QED

Proof of Theorem 5.3: If A(h) is not mean-square continuous then there exists a sequence h_j(q) ∈ ℋ, (j = 1, 2, ...), such that E[h_j(q)²] → 0 and |A(h_j)| is bounded away from zero. Consider any parametric submodel such that P(y|ℋ) = h(q) + τh_j(q), with true value of τ equal to zero. By Chamberlain (1987) the supremum over all such submodels of the Cramér-Rao variance bound is the asymptotic variance of the least squares estimator, which is (E[h_j(q)²])^{-2}E[σ²(q)h_j(q)²]. Furthermore, by the delta-method and σ²(q) bounded away from zero, the corresponding supremum for A(h) is

[∂A(h + τh_j)/∂τ]²(E[h_j(q)²])^{-2}E[σ²(q)h_j(q)²] ≥ CA(h_j)²(E[h_j(q)²])^{-1} → ∞.
Therefore, the supremum over all parametric submodels of the Cramér-Rao bounds for A(h) is not finite. QED

Proof of Theorem 5.4: Given in Section 5.

Proof of Theorem 5.5: Given in Section 5.

Proof of Theorem 6.1: Proceed by verifying the hypotheses of Theorems 4.1 and 4.2. Assumption 3.4 with ζ_0(K) = CK follows by Assumption 6.1 and Lemma 8.4. Assumptions 3.5 and 3.6 follow by the assumption that |λ(k)| is increasing, which implies that the products of univariate orthogonal polynomial terms form a nested sequence. Assumption 3.7 follows by Assumption 6.2 and ζ_0(K)² = K². Assumption 3.8 with d = 0 follows by Lemma 8.1. The first two conclusions then follow from Theorem 4.1, and the third from Theorem 4.1 similarly. The final conclusion follows from Theorem 4.2, with the bound on ζ_d(K) from Lemma 8.4, and Lemma 8.2 (which implies Assumption 3.8 for any α > 0). QED

Proof of Theorem 6.2: By Theorem 5.1 it suffices to show that Assumption 5.3 is satisfied. Assumption 3.4 with ζ_0(K) = CK follows by Assumption 6.1 and Lemma 8.4. By i), K³K²/n = K⁵/n → 0, so that KKζ_0(K)²/n → 0. Assumptions 3.8 and 3.9 follow by ii) and Lemmas 8.1 and 8.2. Finally, note that √nK^{-α} → 0 by ii). QED

Proof of Theorem 6.3: Proceed by verifying the hypotheses of Theorem 6.2. Assumptions 3.1, 3.2, and 5.1 are satisfied by the independence of the observations. Note that A(h) = h_ℓ(q̄_ℓ) - h_ℓ(q̃_ℓ) is continuous with respect to |h|_{0,∞}, while A(h) = ∂^μ h_ℓ(q̄_ℓ) - ∂^μ h_ℓ(q̃_ℓ) is continuous with respect to |h|_{d,∞} for d = |μ|, so that ii) of Theorem 6.2 is satisfied. The conclusion then follows by taking ψ_n = Ω̂^{-1}, so that Ω_0 = 1. QED

Proof of Theorem 6.4: Proceed from Theorem 5.2. It follows as in the proof of Theorem 6.2 that Assumption 5.3 is satisfied. Theorem 5.2 ii) follows by the well-known spanning result for power series for bounded q (e.g.
Gallant, 1980), giving the result. QED

Proof of Theorem 6.5: Follows from Theorem 6.4 by the same argument used in the proof of Theorem 5.4. QED

Proof of Theorem 7.1: Proceed by verifying the hypotheses of Theorems 4.1 and 4.2. Assumption 3.4 with ζ_0(K) = CK^{1/2} follows by Assumption 7.1 and Lemma 8.5. Assumptions 3.5 and 3.6 follow trivially. Assumption 3.7 follows by Assumption 7.2 and Lemma 8.5, which implies ζ_0(K)² ≤ CK, and Assumption 3.8 follows by Lemma 8.3. The conclusions then follow from Theorems 4.1 and 4.2, with the bound on ζ_d(K) from Lemma 8.5. QED

Proof of Theorem 7.2: By Theorem 5.1 it suffices to show that Assumption 5.3 is satisfied. Assumption 3.4 with ζ_0(K) ≤ CK^{1/2} follows by Assumption 7.1 and Lemma 8.5. Assumptions 3.8 and 3.9 follow by ii) and Lemma 8.3. Finally, note that √nK^{-α} → 0 by ii), and KKζ_0(K)²/n = K³/n → 0 by i). QED

Proof of Theorem 7.3: Follows analogously to the proof of Theorem 6.4.

Proof of Theorem 7.4: Follows analogously to the proof of Theorem 6.5, except for the explicit formula for δ(q) given, which follows by

∫w_j(q_j)[∂^d h_j(q_j)/∂q_j^d]dq_j = ∫w(q)[∂^d h(q)/∂q_j^d]dq for w(q) = w_j(q_j)f(q_{-j}),

where [∂^d w(q)/∂q_j^d]/f(q) = [∂^d w_j(q_j)/∂q_j^d]f(q_{-j})/f(q) = [∂^d w_j(q_j)/∂q_j^d]/f(q_j|q_{-j}). QED

9. Useful Lemmas

This Section gives general results on convergence rates for certain sample averages and series remainder terms. It is useful to allow throughout for a vector of series estimates with dimension that can increase with sample size. To do so, it is necessary to introduce more notation and assumptions.
Assumption 9.1: For s ≥ 1 and i = 1, 2, ..., max_{j≤J} |u_ij| ≤ B_{yi} ν_y(J) with E[B_{yi}^s | q_i] ≤ C, and either a) {z_i} is uniform mixing with mixing coefficients φ(t) = O(t^{−μ}), μ > 2, or b) there exists c(t) with Σ_{t=1}^∞ c(t) < ∞ such that |E[u_ij u_{i+t,j} | q_i, q_{i+t}]| ≤ c(t) ν_y(J)².

Henceforth, let Σ̂ = Σ_{i=1}^n p_i p_i'/n.

The first few lemmas consist of useful convergence results for random matrices with dimension that can depend on the sample size. Let Σ̂ and Σ denote symmetric matrices, and λ_min(·) and λ_max(·) their smallest and largest eigenvalues, respectively.

Lemma 9.1: If ||Σ̂ − Σ|| = o_p(1) and λ_min(Σ) ≥ C, then λ_min(Σ̂) ≥ C/2 with probability approaching one.

Proof: For a conformable vector μ with ||μ|| = 1, |μ'(Σ̂ − Σ)μ| ≤ ||Σ̂ − Σ||, so that λ_min(Σ̂) = min_{||μ||=1} μ'Σ̂μ ≥ min_{||μ||=1} μ'Σμ − ||Σ̂ − Σ|| ≥ C − o_p(1).

Lemma 9.2: If λ_min(Σ) ≥ C, ||Σ̂ − Σ|| = o_p(1), and D_n is a conformable random matrix such that ||Σ̂^{−1/2} D_n|| = O_p(ε_n), then ||Σ^{−1/2} D_n|| = O_p(ε_n).

Proof: It is easy to show that for any conformable matrices A and B, ||AB|| ≤ ||A||·||B||, tr(A'BA) ≤ ||A||² λ_max(B), and that if B is positive semi-definite, ||AB||, ||BA|| ≤ ||A|| λ_max(B). Let Σ^{−1/2} = UΛ^{−1/2}U' be the symmetric square root of Σ^{−1}, where U is an orthogonal matrix and Λ^{−1/2} a diagonal matrix consisting of the square roots of the eigenvalues of Σ^{−1}. By Lemma 9.1, λ_max(Σ̂^{−1}) ≤ C w.p.a.1. Then w.p.a.1,

(9.1) ||Σ^{−1/2}D_n||² = ||Σ̂^{−1/2}D_n||² + tr(D_n'[Σ^{−1} − Σ̂^{−1}]D_n) ≤ ||Σ̂^{−1/2}D_n||²(1 + ||Σ^{−1/2}[Σ̂ − Σ]Σ^{−1/2}|| λ_max(Σ̂^{−1})) = O_p(ε_n²)[1 + o_p(1)O_p(1)] = O_p(ε_n²).

Lemma 9.3: Let tr(A) denote the trace of a square matrix A, let u, P, and p denote random matrices with n rows such that p = PA for a conformable matrix A, and let W = P(P'P)⁻P' and Ŵ = p(p'p)⁻p' be the orthogonal projection operators for the linear spaces spanned by the columns of P and p, respectively. Suppose Σ is positive definite, ||P'P/n − Σ|| = o_p(1), and ||Σ^{−1/2}P'u/n|| = O_p(ε_n). Then tr(u'p(p'p)⁻p'u/n) = O_p(ε_n²).

Proof: Let Σ̂ = P'P/n. Then by Lemma 9.2, tr(u'Wu/n) = ||Σ̂^{−1/2}P'u/n||² = O_p(ε_n²). Since the space spanned by the columns of p is a subset of the space spanned by the columns of P, W − Ŵ is positive semi-definite, so that w.p.a.1, tr(u'Ŵu/n) ≤ tr(u'Wu/n) = O_p(ε_n²).

Lemma 9.4: Let Y and G denote random matrices with the same number of columns and n rows, u = Y − G, π̂ = (p'p)⁻p'Y, and Ĝ = pπ̂. If tr[u'p(p'p)⁻p'u/n] = O_p(ε_n²), then for any conformable matrix π, ||Ĝ − G||²/n = O_p(ε_n²) + O_p(1)||G − pπ||²/n.

Proof: For Ŵ = p(p'p)⁻p' as in the proof of Lemma 9.3, Ŵ is idempotent, Ŵp = p, and I − Ŵ is idempotent, so that

||Ĝ − G||²/n = tr[Y'ŴY − Y'ŴG − G'ŴY + G'G]/n = tr[u'Ŵu + G'(I − Ŵ)G]/n = tr[u'Ŵu + (G − pπ)'(I − Ŵ)(G − pπ)]/n ≤ O_p(ε_n²) + ||G − pπ||²/n.

Lemma 9.5: Suppose ||P'P/n − Σ|| = o_p(1), λ_min(Σ) ≥ C, tr[u'P(P'P)⁻P'u/n] = O_p(ε_n²), and p = PS, where S is a random selection matrix such that λ_min(p'p/n) ≥ C w.p.a.1. Then for any conformable matrix π,

||π̂ − π||² = O_p(ε_n²) + O_p(1)||G − pπ||²/n and tr[(π̂ − π)'S'ΣS(π̂ − π)] = O_p(ε_n²) + O_p(1)||G − pπ||²/n.

Proof: Note that p'p/n = S'(P'P/n)S, and λ_min(p'p/n) ≥ C w.p.a.1, so that (p'p/n)⁻¹ = O_p(1). For Ŵ = p(p'p)⁻p' as above,

||π̂ − π||² ≤ λ_min(p'p/n)^{−1} tr[(π̂ − π)'(p'p/n)(π̂ − π)] ≤ O_p(1)tr[Y'ŴY − Y'ŴG − G'ŴY + G'G]/n ≤ O_p(1)[tr(u'Ŵu/n) + ||G − pπ||²/n] = O_p(ε_n²) + O_p(1)||G − pπ||²/n.

To prove the second conclusion, note that by the triangle inequality and the same arguments as for the previous equation,

tr[(π̂ − π)'S'ΣS(π̂ − π)] = tr[(π̂ − π)'[S'ΣS − p'p/n](π̂ − π)] + tr[(π̂ − π)'(p'p/n)(π̂ − π)] ≤ ||π̂ − π||²·||S'ΣS − p'p/n|| + O_p(ε_n²) + O_p(1)||G − pπ||²/n ≤ [O_p(ε_n²) + O_p(1)||G − pπ||²/n](1 + ||Σ − P'P/n||) = O_p(ε_n²) + O_p(1)||G − pπ||²/n.

The next results give convergence rates for sample averages with dimension that can grow with the sample size. For μ in Assumption 9.1 and s in Lemma 9.6 below, let Δ > 0 be as small as desired and let α = −(1/2) + (1/s) + Δ.

Lemma 9.6: If Assumption 3.1 is satisfied and there exists ν_y(J), increasing in J, such that max_{j≤J} |y_ij| ≤ ν_y(J)B_{yi} with |B_{yi}|_s ≤ C (i = 1, 2, ...; J = 1, 2, ...) for s ≥ 2μ/(μ−1), then

||Σ_{i=1}^n y_i/n − E[y_i]|| = O_p(J^{1/2}ν_y(J)n^{−1/2}) for s > 2μ/(μ−1), and O_p(J^{1/2}ν_y(J)n^{α}) for s = 2μ/(μ−1).

Proof: The proof for the case s > 2μ/(μ−1) follows immediately from applying Davydov's inequality to the covariance terms in E[||Σ_i y_i/n − E[y_i]||²]. The proof for the other case follows by a truncation argument analogous to that used to prove weak laws of large numbers.

Lemma 9.7: If Assumptions 3.1, 3.4, and 3.7 are satisfied, then ||Σ̂ − Σ|| = O_p(Kζ_0(K)²/n^{1/2}).

Proof: Apply Lemma 9.6 with y_i equal to the vector of products P_{kK}(q_i)P_{ℓK}(q_i) that are nonzero at some point of the set Q of Assumption 3.5, for P(q) = P^K(q), Σ̂ = Σ_{i=1}^n P(q_i)P(q_i)'/n, and Σ = E[P(q_i)P(q_i)']. Here J ≤ K², B_{yi} = 1, and ν_y(J) = Cζ_0(K)², so that Assumption 9.1 is satisfied with s = ∞ > 2μ/(μ−1), and the conclusion follows by Lemma 9.6.

Lemma 9.8: If Assumptions 3.1, 3.4, 3.5, 3.7, and 9.1 are satisfied, then for u = y − h,

(9.5) tr[(y − h)'p(p'p)⁻p'(y − h)/n] = O_p(JKν_y(J)²/n).

Proof: Let y, h, and u = y − h be n × J matrices with respective ij-th elements y_ij, h_j(q_i), and u_ij, and let P_i = P^K(q_i) and Σ = E[P_iP_i']. By Assumption 9.1, iterated expectations, and orthogonality of the projection residual with each element of P_i, E[P_iP_i'u_ij²] ≤ Cν_y(J)²Σ, which is positive semi-definite in either case. Also, for C_{jt} = E[P_iP_{i+t}'u_iju_{i+t,j}] and c(t) as given there, it follows by Lemma 2.2 of White and Domowitz (1984) under the uniform mixing condition of Assumption 9.1 a), and directly under Assumption 9.1 b), that for any conformable vector ν̄,

(9.6) ν̄'[C_{jt} + C_{jt}']ν̄ ≤ 2c(t)ν_y(J)²ν̄'Σν̄ (0 ≤ t < n).

Then

(9.8) E[||Σ^{−1/2}P'u/n||²] = Σ_{j=1}^J E[u_j'PΣ^{−1}P'u_j]/n² ≤ Jν_y(J)²tr(Σ^{−1/2}[Σ_{t=0}^{n−1} c(t)]ΣΣ^{−1/2})/n = O(JKν_y(J)²/n),

so that ||Σ^{−1/2}P'u/n|| = O_p((JK)^{1/2}ν_y(J)n^{−1/2}). The conclusion then follows by Lemma 9.3, Lemma 9.7, Assumption 3.5, and Assumption 3.7.

Define

δ(h,K,d,ν) = inf_π max_{|λ|≤d} {E[|∂^λ{h(q_i) − π'p^K(q_i)}|^ν]}^{1/ν} + exp(−exp(K)),
δ(h,K,d,∞) = inf_π max_{|λ|≤d} sup_{q∈Q} |∂^λ[h(q) − π'p^K(q)]| + exp(−exp(K)).

Lemma 9.9: If Assumptions 3.1, 3.4, 3.5, 3.7, and 9.1 are satisfied, then

Σ_j Σ_i [ĥ_j(q_i) − h_j(q_i)]²/n = O_p(JK̄ν_y(J)²/n + Σ_j{Σ_{K̲≤K≤K̄} δ(h_j,K,0,ν)^ν}^{2/ν}).

Proof: Let π_{jK} be such that {E[|h_j(q_i) − π_{jK}'p^K(q_i)|^ν]}^{1/ν} ≤ δ(h_j,K,0,ν) for K̲ ≤ K ≤ K̄, and let 1_K denote the indicator function for the event K̂ = K, so that w.p.a.1, δ(h_j,K̂,0,ν) ≤ δ(h_j,K̲,0,ν). Then by Assumption 3.5 and the Markov inequality,

||h − pπ_K̂||²/n ≤ max_{K̲≤K≤K̄} ||h − pπ_K||²/n = O_p(Σ_j{Σ_{K̲≤K≤K̄} δ(h_j,K,0,ν)^ν}^{2/ν}).

Also, by Lemma 9.8, (y − h)'p(p'p)⁻p'(y − h)/n = O_p(JK̄ν_y(J)²/n). The conclusion then follows by Lemma 9.4.
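The pure-variance term tr[u'p(p'p)⁻p'u/n] bounded in Lemmas 9.3 and 9.8 can be illustrated with a hypothetical Monte Carlo sketch (all names and the power-series basis are choices made here, and the draw is independent rather than mixing, corresponding to J = 1 in Lemma 9.8). With independent noise the quadratic form has expectation σ²K/n, so it grows with K, shrinks with n, and is monotone in K for nested bases.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 5000, 1.0
q = rng.uniform(-1, 1, n)
u = rng.normal(0.0, sigma, n)      # take h = 0, so the "residual" is u itself

def fit_term(K):
    """tr[u' p (p'p)^- p' u]/n for a K-term power series basis in q."""
    p = np.vander(q, K, increasing=True)
    Pu = p.T @ u / n
    # equals u'p(p'p)^- p'u / n, written via the normalized moment matrix
    return Pu @ np.linalg.pinv(p.T @ p / n) @ Pu

# Expectation is sigma^2 * K / n, the J = 1 case of the O_p(JK nu^2/n) rate,
# and nesting of the power-series spaces makes the term monotone in K.
t5, t12 = fit_term(5), fit_term(12)
```

Running this, t5 and t12 hover near σ²K/n = 0.001 and 0.0024 respectively, and t12 ≥ t5 holds exactly because the K = 5 basis spans a subspace of the K = 12 basis, mirroring the positive semi-definiteness argument in the proof of Lemma 9.3.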
Lemma 9.10: If Assumptions 3.1, 3.4, 3.5, 3.7, and 9.1 are satisfied, then for F(q) the distribution of q,

{Σ_j ∫[ĥ_j(q) − h_j(q)]² dF(q)}^{1/2} = O_p((JK̄)^{1/2}ν_y(J)n^{−1/2} + {Σ_j[Σ_{K̲≤K≤K̄} δ(h_j,K,0,ν)^ν]^{2/ν}}^{1/2}).

Proof: Let π_K be as defined in the proof of the previous lemma, with P^K̄(q) replacing p^K(q). Then by the same argument as there,

Σ_K 1_K ∫||h(q) − P^K̄(q)'π_K||² dF(q) = O_p(Σ_j{Σ_{K̲≤K≤K̄} δ(h_j,K,0,ν)^ν}^{2/ν}).

Next, apply Lemma 9.5 with P = [P^K̄(q_1), ..., P^K̄(q_n)]', Σ = ∫P^K̄(q)P^K̄(q)' dF(q), G = h, ε_n = (JK̄)^{1/2}ν_y(J)n^{−1/2}, π̂ = (p'p)⁻p'y, and the selection matrix S such that p^K̂(q) = S'P^K̄(q) and π̃ = Sπ̂. From the conclusion of Lemma 9.5 and the argument in the previous lemma, it follows that

∫||P^K̄(q)'[π̃ − π_K̂]||² dF(q) = tr{(π̃ − π_K̂)'[∫P^K̄(q)P^K̄(q)' dF(q)](π̃ − π_K̂)} = tr[(π̂ − π)'S'ΣS(π̂ − π)] ≤ O_p(ε_n²) + O_p(1)||h − pπ_K̂||²/n = O_p(ε_n²) + O_p(Σ_j{Σ_{K̲≤K≤K̄} δ(h_j,K,0,ν)^ν}^{2/ν}).

Then by the first equation of this proof, w.p.a.1,

∫||ĥ(q) − h(q)||² dF(q) ≤ C{∫||P^K̄(q)'[π̃ − π_K̂]||² dF(q) + Σ_K 1_K ∫||h(q) − P^K̄(q)'π_K||² dF(q)},

so that the conclusion follows.

Lemma 9.11: If Assumptions 3.1, 3.4, 3.5, 3.7, and 9.1 are satisfied, and h(q) and P^K(q) are differentiable of order |λ|, then

sup_{q∈Q} ||∂^λĥ(q) − ∂^λh(q)|| = O_p(K̄^{1/2}ζ_{|λ|}(K̄)[(K̄/n)^{1/2} + {Σ_j δ(h_j,K̲,|λ|,∞)²}^{1/2}]).

Proof: Let π_{jK} ∈ R^K be such that, for each j and each K̲ ≤ K ≤ K̄, sup_{q∈Q} |Δ_{jK}(q)| ≤ δ(h_j,K,|λ|,∞) and sup_{q∈Q} |∂^λΔ_{jK}(q)| ≤ δ(h_j,K,|λ|,∞), where Δ_{jK}(q) = h_j(q) − P^K(q)'π_{jK}. Note that by Assumption 3.7,

(9.9) δ(h_j,K̂,|λ|,∞) ≤ δ(h_j,K̲,|λ|,∞) w.p.a.1.

Let π_K be the K × J matrix with j-th column π_{jK}, and Δ_K the n × J matrix with ij-th element Δ_{jK}(q_i). Thus, w.p.a.1,

||h − Pπ_K̂||²/n = ||Δ_K̂||²/n = Σ_j Σ_i Δ_{jK̂}(q_i)²/n ≤ Σ_j δ(h_j,K̲,|λ|,∞)².

Next, let Y = y and G = h. Note that the columns of p are a nonsingular linear transformation of the columns of P, so that P(P'P)⁻P' = p(p'p)⁻p'. Thus, by Lemma 9.8, tr[(Y − G)'P(P'P)⁻P'(Y − G)]/n = O_p(K̄/n). Thus, by Lemma 9.5,

(9.10) ||π̂ − π_K̂||² ≤ O_p(K̄/n) + O_p(1)Σ_j δ(h_j,K̲,|λ|,∞)² = O_p([(K̄/n)^{1/2} + {Σ_j δ(h_j,K̲,|λ|,∞)²}^{1/2}]²).

Noting that ∂^λĥ(q) = π̂'∂^λp^K̂(q), it then follows by the Cauchy-Schwartz inequality that for any q ∈ Q,

||∂^λĥ(q) − ∂^λh(q)||² ≤ C{||(π̂ − π_K̂)'∂^λp^K̂(q)||² + Σ_j Δ_{jK̂}(q)²} ≤ C{||π̂ − π_K̂||²·||∂^λp^K̂(q)||² + Σ_j δ(h_j,K̲,|λ|,∞)²} ≤ C{K̄ζ_{|λ|}(K̄)²||π̂ − π_K̂||² + Σ_j δ(h_j,K̲,|λ|,∞)²}.

Since the term following the last inequality does not depend on q, the conclusion then follows from eq. (9.10).

References

Abramowitz, M. and Stegun, I.A., eds. (1972). Handbook of Mathematical Functions. Washington, D.C.: Commerce Department.

Andrews, D.W.K. (1991). Asymptotic normality of series estimators for various nonparametric and semiparametric models. Econometrica 59, 307-345.

Andrews, D.W.K. and Whang, Y.J. (1990). Additive interactive regression models: circumvention of the curse of dimensionality. Econometric Theory 6, 466-479.

Bickel, P., Klaassen, C.A.J., Ritov, Y., and Wellner, J.A. (1990). Efficient and Adaptive Inference in Semiparametric Models. Monograph, forthcoming.

Breiman, L. and Friedman, J.H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association 80, 580-598.

Breiman, L. and Stone, C.J. (1978). Nonlinear additive regression. Note.

Buja, A., Hastie, T., and Tibshirani, R. (1989). Linear smoothers and additive models. Annals of Statistics 17, 453-510.

Burman, P. and Chen, K.W. (1989). Nonparametric estimation of a regression function. Annals of Statistics 17, 1567-1596.

Chamberlain, G. (1986). Notes on semiparametric regression. Preprint, Department of Economics, Harvard University.

Davydov, Y.A. (1968).
Convergence of distributions generated by stationary stochastic processes. Theory of Probability and Its Applications 13, 691-696.

Engle, R.F., Granger, C.W.J., Rice, J., and Weiss, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association 81, 310-320.

Friedman, J. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association 76, 817-823.

Gallant, A.R. (1980). Explicit estimators of parametric functions in nonlinear regression. Journal of the American Statistical Association 75, 182-193.

Gallant, A.R. (1981). On the bias in flexible functional forms and an essentially unbiased form: the Fourier flexible form. Journal of Econometrics 15, 211-245.

Hansen, L.P. (1985). A method for calculating bounds on the asymptotic covariance matrices of generalized method of moments estimators. Journal of Econometrics 30, 203-238.

Hardle, W. and Stoker, T.M. (1989). Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association 84, 986-995.

Heckman, N.E. (1986). Spline smoothing in a partly linear model. Journal of the Royal Statistical Society, Series B 48, 244-248.

Lorentz, G.G. (1986). Approximation of Functions. New York: Chelsea Publishing Company.

Newey, W.K. (1988a). Adaptive estimation of regression models via moment restrictions. Journal of Econometrics 38, 301-339.

Newey, W.K. (1988b). Two-step series estimation of sample selection models. Preprint, Department of Economics, Princeton University.

Newey, W.K. (1990). Series estimation of regression functionals. Preprint, Department of Economics, MIT.

Newey, W.K. (1991). The asymptotic distribution of semiparametric estimators. Preprint, Department of Economics, MIT.

Powell, M.J.D. (1981). Approximation Theory and Methods. Cambridge, England: Cambridge University Press.

Powell, J.L., Stock, J.H., and Stoker, T.M. (1989). Semiparametric estimation of index coefficients. Econometrica 57, 1403-1430.

Rice, J. (1986). Convergence rates for partially splined models. Statistics and Probability Letters 4, 203-208.

Robinson, P. (1988). Root-n-consistent semiparametric regression. Econometrica 56, 931-954.

Schick, A. (1986). On asymptotically efficient estimation in semiparametric models. Annals of Statistics 14, 1139-1151.

Schumaker, L.L. (1981). Spline Functions: Basic Theory. New York: Wiley.

Stone, C.J. (1982). Optimal global rates of convergence for nonparametric regression. Annals of Statistics 10, 1040-1053.

Stone, C.J. (1985). Additive regression and other nonparametric models. Annals of Statistics 13, 689-705.

Stone, C.J. (1990). L_2 rate of convergence for interaction spline regression. Technical Report No. 268, University of California, Berkeley.

Wahba, G. (1984). Cross-validated spline methods for the estimation of multivariate functions from data on functionals. In Statistics: An Appraisal, Proceedings 50th Anniversary Conference Iowa State Statistical Laboratory (H.A. David and H.T. David, eds.), 205-235. Ames, Iowa: Iowa State University Press.

White, H. (1980). Using least squares to approximate unknown regression functions. International Economic Review 21, 149-170.

White, H. (1984). Asymptotic Theory for Econometricians. Orlando: Academic Press.

White, H. and Domowitz, I. (1984). Nonlinear regression with dependent observations. Econometrica 52, 143-161.

Zeldin, M.D. and Thomas, D.M. (1975). Ozone trends in the Eastern Los Angeles basin corrected for meteorological variations. Proceedings, International Conference on Environmental Sensing and Assessment 2, held September 14-19, 1975, Las Vegas, Nevada.