CONVERGENCE RATES AND ASYMPTOTIC NORMALITY FOR SERIES ESTIMATORS

Whitney K. Newey
Department of Economics, MIT
Working Paper 95-13, January 1995
Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, Mass. 02139

ABSTRACT

This paper gives general conditions for convergence rates and asymptotic normality of series estimators of conditional expectations, and specializes these conditions to polynomial regression and regression splines. Both mean-square and uniform convergence rates are derived. Asymptotic normality is shown for nonlinear functionals of series estimators, covering many cases not previously treated. Also, a simple condition for $\sqrt{n}$-consistency of a functional of a series estimator is given. The regularity conditions are straightforward to understand, and several examples are given to illustrate their application.

Keywords: Nonparametric Estimation, Series Estimation, Convergence Rates, Asymptotic Normality.

Whitney K. Newey, Department of Economics, MIT, E52-262D, Cambridge, MA 02139. Phone: (617) 253-6420. Fax: (617) 253-1330. E-mail: wnewey@mit.edu

Support was provided by the NSF for research for this paper.

1. Introduction

Nonparametric estimation is useful in many econometric applications. Models often depend on a conditional expectation $E[y|x]$ that is unknown. For example, in demand analysis, one could model the demand function as a conditional expectation with unknown functional form (e.g. see Deaton, 1988, or Hausman and Newey, 1995). One way to estimate $E[y|x]$ is by least squares regression on approximating functions, referred to here as series estimation. Series estimation is relatively computationally convenient, because the data is summarized by a few estimated coefficients. Also, it is convenient for imposing certain restrictions on $E[y|x]$, such as additive separability (e.g. see Stone, 1985, or Andrews and Whang, 1990).

Large sample properties of series estimators have been derived by Stone (1985), Cox (1988), Andrews and Whang (1990), Eastwood and Gallant (1991), Gallant and Souza (1991), and Newey (1988, 1994a, b, c). This paper extends previous results on convergence rates and asymptotic normality in several ways. These extensions include convergence rates that are faster than some published ones (e.g. Cox, 1988) or have weaker side conditions than current results (e.g. Newey, 1994a). Also, asymptotic normality is shown for regression splines and/or nonlinear functionals that are not covered by the results of Andrews (1991), and a simple primitive condition for $\sqrt{n}$-consistency is given. In addition, the results are derived under an upper bound on the growth of the number of terms $K$ that is less restrictive than previous results (e.g. those of Andrews, 1991, or Newey, 1994), requiring only $K^2/n \to 0$ for regression splines and $K^3/n \to 0$ for power series estimators of linear functionals.

To describe a series estimator, let $g_0(x) = E[y|x]$ denote the true conditional expectation and $g$ denote some function of $x$. Also, consider a vector of approximating functions

$$p^K(x) = (p_{1K}(x), \ldots, p_{KK}(x))', \qquad (1)$$

having the property that a linear combination can approximate $g_0(x)$. Let $(y_i, x_i)$, $(i = 1, \ldots, n)$ denote the data. A series estimator of $g_0(x)$ is

$$\hat{g}(x) = p^K(x)'\hat{\beta}, \quad \hat{\beta} = (P'P)^{-}P'Y, \quad P = [p^K(x_1), \ldots, p^K(x_n)]', \qquad (2)$$

where $Y = (y_1, \ldots, y_n)'$ and $B^{-}$ denotes any symmetric generalized inverse. Under conditions given below, $P'P$ will be nonsingular with probability approaching one, and hence $(P'P)^{-}$ will be the standard inverse.
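The least squares formula in equation (2) can be illustrated numerically. The following is a minimal sketch, not the paper's own implementation: it assumes a univariate regressor, uses a plain power basis $(1, x, \ldots, x^{K-1})$, and takes the Moore-Penrose inverse as one admissible symmetric generalized inverse. The function names `power_basis` and `series_fit` and the simulated design are hypothetical.

```python
import numpy as np

def power_basis(x, K):
    """Rows are p^K(x_i)' = (1, x_i, ..., x_i^{K-1}); an illustrative choice of basis."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=K, increasing=True)  # shape (n, K)

def series_fit(P, y):
    """beta_hat = (P'P)^- P'Y, with the Moore-Penrose inverse as generalized inverse."""
    return np.linalg.pinv(P.T @ P) @ (P.T @ y)

# Simulated data (assumed design, for illustration only).
rng = np.random.default_rng(0)
n, K = 500, 6
x = rng.uniform(-1.0, 1.0, size=n)
g0 = np.sin(np.pi * x)                      # stand-in for the unknown E[y|x]
y = g0 + 0.1 * rng.standard_normal(n)

P = power_basis(x, K)
beta_hat = series_fit(P, y)
g_hat = P @ beta_hat                        # fitted values g_hat(x_i) = p^K(x_i)' beta_hat
mse = float(np.mean((g_hat - g0) ** 2))     # sample analogue of the mean-square error
```

When $P'P$ is nonsingular the pseudoinverse coincides with the ordinary inverse, so `beta_hat` agrees with the usual least squares coefficients.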
Series estimators are convenient for imposing certain types of restrictions. For example, suppose that $E[y|x]$ is additively separable in two subvectors $x_a$ and $x_b$ of $x$, so that

$$E[y|x] = g_a(x_a) + g_b(x_b). \qquad (3)$$

This restriction can be imposed by including in $p^K(x)$ functions that depend either on $x_a$ or $x_b$, but not on both. Additivity of $\hat{g}(x)$ then results from it being a linear combination of these functions of the separate arguments. Another example is a partially linear model, where one of the components is a linear function, as in

$$E[y|x] = x_a'\gamma_0 + g_b(x_b). \qquad (4)$$

This restriction can be imposed by including only $x_a$ and functions of $x_b$ in $p^K(x)$. The value of imposing either of these restrictions is that efficiency improvements result, in the sense that convergence rates are faster. The partially linear model is particularly convenient when $x_a$ has a large number of elements, e.g. when $x_a$ consists of categorical variables. We explicitly consider both types of restrictions in this paper.

There are several different types of series estimators that could be used, such as Fourier series, power series, and splines. Fourier series are not considered here because, as in Andrews (1991), it is difficult to derive primitive conditions except when $g_0(x)$ is periodic, which is not relevant for most econometric applications. Primitive conditions will be given in Sections 5 and 6 for power series and regression splines.

2. Convergence Rates

The results will follow from a few primitive, easy to interpret conditions. The first condition is

Assumption 1: $(y_1, x_1), \ldots, (y_n, x_n)$ are i.i.d. and $\mathrm{Var}(y|x)$ is bounded.

The bounded conditional variance assumption is difficult to relax without affecting the convergence rates. For the next condition let $\|B\| = [\mathrm{trace}(B'B)]^{1/2}$ be the Euclidean norm for a matrix $B$. Also, let $\mathcal{X}$ be the support of $x_i$.

Assumption 2: For every $K$ there is a nonsingular constant matrix $B$ such that for $P^K(x) = Bp^K(x)$; i) the smallest eigenvalue of $E[P^K(x)P^K(x)']$ is bounded away from zero uniformly in $K$ and; ii) there is a sequence of constants $\zeta_0(K)$ satisfying $\sup_{x \in \mathcal{X}} \|P^K(x)\| \le \zeta_0(K)$ and $K = K(n)$ such that $\zeta_0(K)^2 K/n \to 0$ as $n \to \infty$.

This condition imposes a normalization on the approximating functions, bounding the second moment matrix away from singularity, and restricting the magnitude of the series terms. This condition is useful for controlling the convergence in probability of the sample second moment matrix of the approximating functions to its expectation, in the Euclidean norm. Newey (1988) and Andrews (1991) also impose bounds on the smallest eigenvalue of the second moment matrix. This is also difficult for Gallant's Fourier flexible form. The reasons for difficulty are discussed in detail in Gallant and Souza (1991). Primitive conditions are given below for regression splines and power series when the density of $x$ is bounded away from zero, with $\zeta_0(K)$ equal to $C\sqrt{K}$ or $CK$ respectively, for a constant $C$, leading to the restrictions $K^2/n \to 0$ or $K^3/n \to 0$ mentioned in the introduction.

For controlling the bias of the estimator it is useful to specify a rate of approximation for the series. To do so, let $\lambda = (\lambda_1, \ldots, \lambda_r)'$ denote a vector of nonnegative integers, having the same dimension as $x$, let $|\lambda| = \sum_{j=1}^{r} \lambda_j$, and let $d$ be any nonnegative integer. Define

$$\partial^{\lambda} g(x) = \frac{\partial^{|\lambda|} g(x)}{\partial x_1^{\lambda_1} \cdots \partial x_r^{\lambda_r}}, \qquad |g|_d = \max_{|\lambda| \le d} \sup_{x \in \mathcal{X}} |\partial^{\lambda} g(x)|.$$

Assumption 3: For an integer $d \ge 0$ there are $\alpha$, $\beta_K$ such that $|g_0 - p^{K\prime}\beta_K|_d = O(K^{-\alpha})$ as $K \to \infty$.

This condition requires that the uniform approximation error to the function and its derivatives up to order $d$ shrink at the rate $K^{-\alpha}$. The integer $d$ will be specified below as the order of the derivative for which a uniform convergence rate is derived. The rate $\alpha$ is related to the smoothness of the function $g_0(x)$, the dimensionality of $x$, and the size of $d$.
For example, for splines and power series and $d = 0$, Assumption 3 will be satisfied with $\alpha = s/r$, where $s$ is the number of continuous derivatives of $g_0(x)$ that exist and $r$ is the dimension of $x$. It should also be noted that to derive a mean-square convergence rate only a mean-square approximation rate is needed (i.e. $E[\{g_0(x) - p^K(x)'\beta_K\}^2] = O(K^{-2\alpha})$), rather than the uniform convergence rate specified here, but for simplicity we do not make this generalization explicit.

To state the general convergence rate result, some notation is required. Let $F_0(x)$ denote the cumulative distribution function of $x$ and

$$\zeta_d(K) = \max_{|\lambda| \le d} \sup_{x \in \mathcal{X}} \|\partial^{\lambda} P^K(x)\|.$$

It will be assumed throughout that $\zeta_d(K)$ exists whenever it appears in the assumptions and that $\zeta_d(K) \ge 1$ for large enough $K$. The following theorem gives a general result on mean-square and uniform convergence rates.

Theorem 1: If Assumptions 1-3 are satisfied with $d = 0$ then

$$\int [g_0(x) - \hat{g}(x)]^2 \, dF_0(x) = O_p(K/n + K^{-2\alpha}).$$

Also, if Assumptions 1-3 are satisfied for some $d \ge 0$ then

$$|\hat{g} - g_0|_d = O_p(\zeta_d(K)[\sqrt{K}/\sqrt{n} + K^{-\alpha}]).$$

This result is similar to Newey (1994a), except that $K$ is required not to depend on the data here and the requirement that $\zeta_0(K)^2 K/n \to 0$ is different. The mean-square error result is different than Andrews and Whang (1990), in applying to integrated mean-square error, rather than sample mean-square error, and in imposing Assumption 2.

The conclusion for mean-square error leads to optimal convergence rates for power series and splines, i.e. rates that attain Stone's (1982) bound. The term $K/n$ essentially corresponds to a variance term and $K^{-2\alpha}$ to a bias term. When $K$ is chosen so that these two terms go to zero at the same rate, which occurs when $K$ goes to infinity at the same rate as $n^{1/(1+2\alpha)}$ (and the side condition $\zeta_0(K)^2 K/n \to 0$ is satisfied), the convergence rate will be $n^{-2\alpha/(1+2\alpha)}$. For power series and splines, where $\alpha = s/r$, the rate will be $n^{-2s/(r+2s)}$, which equals Stone's (1982) bound.

The uniform convergence rates will not be optimal. For example, for splines and $\alpha = s/r$, the corresponding rate will be $K/\sqrt{n} + K^{1/2 - s/r}$, which cannot attain Stone's (1982) bound. Nevertheless, these uniform convergence rates improve on some in the literature, e.g. on Cox (1988). Also, it is not yet known whether it is possible to attain the optimal uniform convergence rates using a series estimator.

3. Asymptotic Normality

There are many applications where a functional of a conditional expectation is of interest. For example, a common practice in demand analysis is estimation of the demand function in log linear form, where $y$ is the log of consumption, $x$ is a two dimensional vector with first argument equal to the log of price and second the log of income, and the estimated demand function is $e^{\hat{g}(x)}$. A functional of interest in demand analysis is approximate consumer surplus, equal to the integral of the demand function over a range of prices. For a fixed income $\bar{I}$ an estimator of this functional would be

$$\hat{\theta} = \int_{\underline{p}}^{\bar{p}} e^{\hat{g}(\ln t, \ln \bar{I})} \, dt. \qquad (5)$$

An asymptotic normality result for this functional could be useful in constructing approximate confidence intervals and tests.

This Section gives conditions for asymptotic normality of functionals of series estimates. In this Section we focus on the "slower than $1/\sqrt{n}$" case, and discuss the $\sqrt{n}$-consistent case in the next Section. To describe the results, let $a(\cdot)$ be a vector functional of $g$, i.e. a mapping from a possible conditional expectation function to a real vector. The estimator will be assumed to take the form

$$\hat{\theta} = a(\hat{g}). \qquad (6)$$

For example, the approximate consumer surplus estimator satisfies this equation with $a(g) = \int_{\underline{p}}^{\bar{p}} e^{g(\ln t, \ln \bar{I})} \, dt$. The true value corresponding to this estimator will be

$$\theta_0 = a(g_0), \qquad (7)$$

where $g_0$ denotes the true conditional expectation function.

To use $\hat{\theta}$ for approximate inference procedures, it is important to have an asymptotic variance estimator.
Such an estimator can be formed from a delta-method estimator of the variance of $\hat{\theta}$ as a function of the estimated coefficients $\hat{\beta}$. Let $\hat{A} = \partial a(p^{K\prime}\beta)/\partial\beta \,|_{\beta = \hat{\beta}}$ when it exists, and otherwise let $\hat{A}$ be any vector with the same dimension as $\hat{\beta}$. The regularity conditions given below will imply that $\hat{A}$ exists with probability that approaches one in large samples. Let

$$\hat{Q} = P'P/n, \quad \hat{V} = \hat{A}'\hat{Q}^{-}\hat{\Sigma}\hat{Q}^{-}\hat{A}, \quad \hat{\Sigma} = \sum_{i=1}^{n} p^K(x_i)p^K(x_i)'[y_i - \hat{g}(x_i)]^2/n. \qquad (8)$$

This estimator is just the usual one for a nonlinear function of least squares coefficients. The vector $\hat{A}$ is a Jacobian term, and $\hat{Q}^{-}\hat{\Sigma}\hat{Q}^{-}$ is the White (1980) estimator of the least squares asymptotic variance for a possibly misspecified model. This estimator will lead to correct asymptotic inferences because it accounts properly for variance, and because bias will be small relative to variance under the regularity conditions discussed below.

Some additional conditions are important for the asymptotic normality result.

Assumption 4: $E[\{y - g_0(x)\}^4 | x]$ is bounded, and $\mathrm{Var}(y|x)$ is bounded away from zero.

This assumption requires that the fourth conditional moment of the error is bounded, strengthening Assumption 1. The next one is a smoothness condition on $a(g)$, requiring that it be approximated sufficiently well by a linear functional when $g$ is close to $g_0$.

Assumption 5: Either a) $a(g)$ is linear in $g$, or; b) for $d$ as in Assumption 3, $\zeta_d(K)^4 K^2/n \to 0$ and there exists a function $D(g; \tilde{g})$ that is linear in $g$ and such that for some $C, L, \varepsilon > 0$ and all $\tilde{g}$, $\bar{g}$ with $|\tilde{g} - g_0|_d < \varepsilon$, $|\bar{g} - g_0|_d < \varepsilon$, it is true that $\|a(g) - a(\tilde{g}) - D(g - \tilde{g}; \tilde{g})\| \le C(|g - \tilde{g}|_d)^2$ and $\|D(g; \tilde{g}) - D(g; \bar{g})\| \le L|g|_d |\tilde{g} - \bar{g}|_d$.

The interpretation of $D(g; \tilde{g})$ is that it is a functional derivative of $a(g)$ at $\tilde{g}$. Indeed, this assumption implies that $a(g)$ is Fréchet differentiable in $g$ with respect to the norm $|g|_d$. It is often straightforward to verify.
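As an aside, the variance estimator in equation (8) can be sketched numerically. This is only an illustration under stated assumptions, not the paper's code: the regressor is univariate, the functional is the simplified surplus-type integral $a(g) = \int_{0.2}^{0.8} e^{g(t)}\,dt$ (integrating directly in the regressor's units), the integral is approximated by the trapezoid rule, and the names `delta_method_variance`, `theta_hat` and `A_hat` are hypothetical.

```python
import numpy as np

def delta_method_variance(P, y, beta_hat, A_hat):
    """V_hat = A' Q^- Sigma Q^- A with Q = P'P/n and Sigma = sum_i p_i p_i' e_i^2 / n."""
    n = P.shape[0]
    Q_hat = P.T @ P / n
    resid = y - P @ beta_hat                          # y_i - g_hat(x_i)
    Sigma_hat = (P * resid[:, None] ** 2).T @ P / n   # White-type middle matrix
    Q_inv = np.linalg.pinv(Q_hat)                     # one symmetric generalized inverse
    return A_hat @ Q_inv @ Sigma_hat @ Q_inv @ A_hat

# Simulated data (assumed design): log-linear "demand" in a single regressor.
rng = np.random.default_rng(1)
n, K = 400, 4
x = rng.uniform(0.0, 1.0, n)
y = 0.5 - x + 0.2 * rng.standard_normal(n)
P = np.vander(x, N=K, increasing=True)
beta_hat = np.linalg.pinv(P.T @ P) @ (P.T @ y)

# Trapezoid weights on the integration range [0.2, 0.8].
t = np.linspace(0.2, 0.8, 201)
Pt = np.vander(t, N=K, increasing=True)
w = np.full(t.size, t[1] - t[0])
w[0] *= 0.5
w[-1] *= 0.5

theta_hat = float(w @ np.exp(Pt @ beta_hat))          # a(g_hat)
A_hat = Pt.T @ (w * np.exp(Pt @ beta_hat))            # Jacobian of a(p^K' beta) at beta_hat
V_hat = float(delta_method_variance(P, y, beta_hat, A_hat))
std_err = float(np.sqrt(V_hat / n))                   # approximate standard error of theta_hat
```

An approximate 95 percent interval would then be `theta_hat` plus or minus `1.96 * std_err`, subject to the regularity conditions discussed in this Section.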
In the consumer surplus example it is easy to check that the smoothness assumption on $a(g)$ will be satisfied with $d = 0$ and

$$D(g; \tilde{g}) = \int_{\underline{p}}^{\bar{p}} g(\ln t, \ln \bar{I}) \, e^{\tilde{g}(\ln t, \ln \bar{I})} \, dt, \qquad (9)$$

as long as $[\ln \underline{p}, \ln \bar{p}] \times \{\ln \bar{I}\}$ is contained in the support of $x$. To see that this is so, note that for $|g - g_0|_0 < \varepsilon$ and $|\tilde{g} - g_0|_0 < \varepsilon$, $g$ and $\tilde{g}$ will be uniformly bounded on the range of integration. Also, for any scalar $z$, $d^2 e^z/dz^2 = e^z$, so that a mean-value expansion of $e^z$ with respect to $z$ around some other point $\tilde{z}$ gives $|e^z - e^{\tilde{z}} - e^{\tilde{z}}(z - \tilde{z})| \le C|z - \tilde{z}|^2$ for some constant $C$ when $z$ and $\tilde{z}$ are in some bounded set. Therefore,

$$\|a(g) - a(\tilde{g}) - D(g - \tilde{g}; \tilde{g})\| \le \int_{\underline{p}}^{\bar{p}} C|g(\ln p, \ln \bar{I}) - \tilde{g}(\ln p, \ln \bar{I})|^2 \, dp \le C(\bar{p} - \underline{p})(|g - \tilde{g}|_0)^2.$$

The next requirement imposes some continuity conditions on the derivative. Let $D(g) = D(g; g_0)$ denote the derivative at $g = g_0$.

Assumption 5: $a(g)$ is a scalar, there exists $C$ such that $|D(g)| \le C|g|_d$ for $d$ from Assumption 3, and there exists $\tilde{\beta}_K$ such that for $\tilde{g}_K(x) = p^K(x)'\tilde{\beta}_K$, $E[\tilde{g}_K(x)^2] \to 0$ and $D(\tilde{g}_K)$ is bounded away from zero.

This assumption says that the derivative $D(g)$ is continuous in $|g|_d$ but not in the mean-square norm $(E[g(x)^2])^{1/2}$. The lack of mean-square continuity will imply that the estimator $\hat{\theta}$ is not $\sqrt{n}$-consistent, and is also a useful regularity condition. Another restriction imposed is that $a(g)$ is a scalar, which is general enough to cover many cases of interest. When $a(g)$ is a vector, asymptotic normality would follow from conditions like those of Andrews (1991), which are difficult to verify. In contrast, Assumption 5 is a primitive condition that is relatively easy to verify.
For example, for the consumer surplus estimator previously discussed where $p^K(x)$ is a power series, suppose that $x$ is continuously distributed with compact support and bounded density, and that the set $\Gamma = \{(\ln p, \ln \bar{I}) : \underline{p} \le p \le \bar{p}\}$ is contained in this support. Then there exists a sequence of continuous functions $\tilde{g}_j(x)$ such that $\tilde{g}_j(x)$ is equal to $1/(\bar{p} - \underline{p})$ on the line $\Gamma$, is uniformly bounded, and converges to zero everywhere else. For this sequence,

$$D(\tilde{g}_j) = \int_{\underline{p}}^{\bar{p}} \tilde{g}_j(\ln p, \ln \bar{I}) \exp[g_0(\ln p, \ln \bar{I})] \, dp \ge C \int_{\underline{p}}^{\bar{p}} \tilde{g}_j(\ln p, \ln \bar{I}) \, dp = C$$

for a constant $C > 0$, and $E[\tilde{g}_j(x)^2] \to 0$. By the Weierstrass approximation theorem, any $\tilde{g}_j(x)$ can be approximated uniformly by a polynomial $q_j(x)$, so that $D(q_j) > C/2$ and $E[q_j(x)^2] \to 0$. Since a polynomial is a linear combination of a power series it follows that Assumption 5 is satisfied in this example.

To state the asymptotic normality result it is useful to work with an asymptotic variance formula. Let $\sigma^2(x) = \mathrm{Var}(y|x)$ and $A = (D(p_{1K}), \ldots, D(p_{KK}))'$. The asymptotic variance formula is

$$V_K = A'Q^{-1}\Sigma Q^{-1}A, \quad Q = E[p^K(x)p^K(x)'], \quad \Sigma = E[p^K(x)p^K(x)'\sigma^2(x)]. \qquad (10)$$

Theorem 2: If Assumptions 1-6 are satisfied and $\sqrt{n}K^{-\alpha} \to 0$ then $\hat{\theta} = \theta_0 + O_p(\zeta_d(K)/\sqrt{n})$ and

$$\sqrt{n}\,V_K^{-1/2}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0,1), \qquad \sqrt{n}\,\hat{V}^{-1/2}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0,1).$$

This theorem includes as a special case linear functionals, which were studied by Andrews (1991). In that case the conditions of Theorem 2 are quite simple relative to the conditions of Andrews (1991). Also, the restriction $\zeta_0(K)^2 K/n \to 0$ is weaker than that of Andrews (1991), leading to only requiring that $K^3/n \to 0$ for power series, rather than the stronger rate condition of Andrews (1991). One reason for this contrast is that the regressors are assumed to be i.i.d. here, while the general results of Andrews (1991) allow for the regressors to not be identically distributed.

This result only gives an upper bound $O_p(\zeta_d(K)/\sqrt{n})$ on the convergence rate for $\hat{\theta}$. This bound may not be sharp. We know it will not be sharp in the $\sqrt{n}$-consistent case considered next, where $d = 0$. Whether it is sharp for other cases is still an open question.

4. $\sqrt{n}$-Consistency

The key condition for $\sqrt{n}$-consistency is that the derivative $D(g)$ be mean-square continuous, as specified in the following hypothesis.

Assumption 6: There is $\nu(x)$ with $E[\nu(x)\nu(x)']$ finite and nonsingular such that $D(g_0) = E[\nu(x)g_0(x)]$, $D(p_{kK}) = E[\nu(x)p_{kK}(x)]$ for all $k$ and $K$, and there is $\beta_K$ with $E[\|\nu(x) - \beta_K p^K(x)\|^2] \to 0$.

This condition allows for $a(g)$ to be a vector. It requires a representation of $a(g)$ as an expected outer product, when $g$ is equal to the truth or any of the approximating functions, and for the function $\nu(x)$ in the outer product representation to be approximated in mean-square by some linear combination of the approximating functions. This condition and Assumption 5 are mutually exclusive, and together cover most cases of interest (i.e. they seem to be exhaustive).

A sufficient condition for Assumption 6 is that the functional $a(g)$ be mean-square continuous in $g$ over some linear domain that includes the truth and the approximating functions, and that the approximating functions form a basis for this domain. The outer product representation in Assumption 6 will then follow from the Riesz representation theorem. This condition is somewhat like Van der Vaart's (1991) condition for $\sqrt{n}$-consistent estimability of functionals, except that his mean-square continuity hypothesis pertains to the set of scores rather than a set of conditional expectations. Also, here it is a sufficient condition for $\sqrt{n}$-consistency of a particular estimator, rather than a necessary condition for existence of such an estimator. A similar condition was also used by Newey (1994b) to derive primitive conditions for $\sqrt{n}$-consistency of series estimators, under stronger regularity conditions.

There are many interesting examples of functionals that satisfy this condition.
One example is an average consumer surplus estimator, like that of equation (5), but integrated over income. Consider the functional

$$a(g) = \int\!\!\int_{\underline{p}}^{\bar{p}} v(I) \exp(g(\ln p, \ln I)) \, dp \, dI, \qquad (11)$$

where $v(I)$ is some weight function, with $\int v(I) \, dI = 1$. This is an average of consumer surplus over different income values. An estimator of this functional could be used as a summary of consumer surplus as a function of income. Similarly to equation (9), the derivative of this functional is

$$D(g) = \int\!\!\int v(I,p) \, g(\ln p, \ln I) \, dp \, dI, \quad v(I,p) = 1(\underline{p} \le p \le \bar{p}, \ \underline{I} \le I \le \bar{I}) \, v(I) \exp(g_0(\ln p, \ln I)). \qquad (12)$$

Assumption 6 will then be satisfied, with $\nu(x) = f(I,p)^{-1} v(I,p)$, where $f(I,p)$ is the density for $(I,p)$, assumed to be bounded away from zero on $[\underline{p}, \bar{p}] \times [\underline{I}, \bar{I}]$.

Another example is the coefficients $\gamma_0$ from the partially linear model of equation (4). Suppose that $E[\mathrm{Var}(x_a | x_b)]$ is nonsingular, an identification condition for $\gamma_0$, and let $U = x_a - E[x_a | x_b]$. Then for $\nu(x) = (E[UU'])^{-1}U$,

$$a(g_0) = E[\nu(x)g_0(x)] = (E[UU'])^{-1}(E[Ux_a']\gamma_0 + E[Ug_b(x_b)]) = \gamma_0. \qquad (13)$$

In this example $a(g)$ is a linear functional of $g$, $a(g) = D(g) = E[\nu(x)g(x)]$, and satisfies Assumption 6 by construction.

A third example is a weighted average derivative, where

$$a(g) = \int w(x)[\partial g(x)/\partial x] \, dx, \quad \int w(x) \, dx = 1, \quad w(x) \ge 0. \qquad (14)$$

This functional is useful for estimating scaled coefficients of an index model, as discussed in Stoker (1986), and can also be used to quantify the average slope of a function. Assuming that $w(x)$ is zero outside some compact set and that $x$ is continuously distributed with density $f(x)$ that is bounded away from zero on the set where $w(x)$ is positive, integration by parts gives

$$a(g) = -\int [\partial w(x)/\partial x] \, g(x) \, dx = E[\nu(x)g(x)], \quad \nu(x) = -f(x)^{-1} \partial w(x)/\partial x. \qquad (15)$$

Therefore, Assumption 6 is satisfied for this functional, and hence Theorem 3 below will give $\sqrt{n}$-consistency of a series estimator of this functional.

The asymptotic variance of the estimator will be determined by the function $\nu(x)$ from Assumption 6.
It will be equal to $V = E[\nu(x)\nu(x)'\mathrm{Var}(y|x)]$.

Theorem 3: If Assumptions 1-4 and 6 are satisfied for $d = 0$, and $\sqrt{n}K^{-\alpha} \to 0$ then

$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, V), \qquad \hat{V} \xrightarrow{p} V.$$

5. Power Series

One particular type of series for which primitive regularity conditions can be specified is a power series. Suppose that $r$ is the dimension of $x$, let $\lambda = (\lambda_1, \ldots, \lambda_r)'$ denote a vector of nonnegative integers, i.e. a multi-index, with norm $|\lambda| = \sum_{j=1}^{r} \lambda_j$, and let $z^{\lambda} \equiv \prod_{j=1}^{r} (z_j)^{\lambda_j}$. For a sequence $(\lambda(k))_{k=1}^{\infty}$ of distinct such vectors, a power series approximation has

$$p_{kK}(x) = x^{\lambda(k)}. \qquad (16)$$

It will be assumed that the multi-index sequence is ordered with degree $|\lambda(k)|$ increasing in $k$.

The theory to follow uses orthonormal polynomials, which may also have computational advantages. For the first step, replacing each power $x^{\lambda}$ by the product of orthonormal polynomials of order corresponding to components of $\lambda$, with respect to some distribution, may lead to reduced collinearity, particularly when the distribution closely matches that of the data. The estimator will be numerically invariant to such a replacement, because $|\lambda(k)|$ is monotonically increasing.

For power series primitive conditions for Assumptions 2 and 3 can be specified. Assumption 2 will be a consequence of the following condition.

Assumption 7: The support of $x$ is a Cartesian product of compact connected intervals on which $x$ has a probability density function that is bounded away from zero.

This assumption can be relaxed by specifying that it only holds for a component of the distribution of $x$ (which would allow points of positive probability in the support of $x$), but it appears difficult to be more general.

To be specific about Assumption 3 we need uniform approximation rates for power series. These rates will follow from the following smoothness assumption.

Assumption 8: $g_0(x) = E[y|x]$ is continuously differentiable of order $s$ on the support of $x$.
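The power series basis of equation (16) can be sketched as follows. This is a hedged illustration rather than the paper's construction: the tie-breaking rule among multi-indices of equal degree is an arbitrary assumption (the text only requires $|\lambda(k)|$ to be increasing in $k$), raw powers are used instead of orthonormal polynomials, and the function names `multi_indices` and `power_series_basis` are hypothetical.

```python
import itertools
import numpy as np

def multi_indices(r, max_degree):
    """All multi-indices lambda in N^r with |lambda| <= max_degree, ordered by |lambda|.

    Ties within a degree are broken lexicographically (an arbitrary choice).
    """
    idx = [lam for lam in itertools.product(range(max_degree + 1), repeat=r)
           if sum(lam) <= max_degree]
    return sorted(idx, key=lambda lam: (sum(lam), lam))

def power_series_basis(X, K):
    """Return (P, lams): P[i, k] = prod_j X[i, j] ** lams[k][j] for the first K multi-indices."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must have shape (n, r)")
    r = X.shape[1]
    degree = 0
    while len(multi_indices(r, degree)) < K:   # raise the degree until K terms are available
        degree += 1
    lams = multi_indices(r, degree)[:K]
    P = np.column_stack([np.prod(X ** np.array(lam), axis=1) for lam in lams])
    return P, lams

# Usage: the first six terms for r = 2 are 1, x2, x1, x2^2, x1*x2, x1^2 under this ordering.
P_demo, lams_demo = power_series_basis(np.array([[2.0, 3.0]]), 6)
```

Because any two bases spanning the same polynomial space give the same fitted values, the ordering within a degree only matters when $K$ cuts a degree partway through.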
It follows from Assumption 8 and Lorentz (1986) that for $d = 0$, the approximation rate of Assumption 3 is $\alpha = s/r$, where $r$ is the dimension of $x$. A literature search has not yet revealed a corresponding result for $d > 0$, where derivatives are also approximated, although it is known that if $E[y|x]$ is analytic, Assumption 3 will hold for $\alpha$ any positive number. Because of this absence of approximation rates for derivatives, the rest of the Section will confine discussion to the case where $d = 0$.

The first result for polynomials gives convergence rates.

Theorem 4: If Assumptions 1, 7, and 8 are satisfied and $K^3/n \to 0$ then

$$\int [g_0(x) - \hat{g}(x)]^2 \, dF_0(x) = O_p(K/n + K^{-2s/r}), \qquad |\hat{g} - g_0|_0 = O_p(K[\sqrt{K}/\sqrt{n} + K^{-s/r}]).$$

This result gives mean-square and uniform approximation rates for $\hat{g}$. Uniform convergence rates for derivatives are not given for the reason discussed above, that approximation rates for derivatives are not readily available in the literature. As previously noted, the mean-square approximation rate attains Stone's (1982) bounds for $K = Cn^{r/(r+2s)}$ and $s > r$ (implying $K^3/n \to 0$). The mean-square rate obtained here is different than that of Andrews and Whang (1990), because it is for the population mean-square error rather than the sample mean-square error; the former is a fixed norm rather than one that changes with the sample size and configuration of the regressors. On the other hand, Assumption 7 is stronger than the conditions imposed in Andrews and Whang (1990).

The uniform approximation rate does not appear to be optimal, although it seems to improve on rates existing in the literature, being faster than that of Cox (1988). One appealing implication of this rate is that $\hat{g}$ will be uniformly consistent when $s > r$ and $K^3/n \to 0$.

This result can also be extended to additive models, such as the one in equation (3). A power series estimator could be constructed as described above, except that products of powers from both $x_a$ and $x_b$ are excluded. If each of $g_a(x_a)$ and $g_b(x_b)$ is continuously differentiable of order $s$, the exclusion of the (many) interaction terms will increase the approximation rate to $K^{-s/\bar{r}}$, where $\bar{r}$ is the maximum dimension of $x_a$ and $x_b$. The conclusion of Theorem 4 will then hold with $r$ replaced by $\bar{r}$. This gives mean-square convergence rates for polynomials like the regression spline rates of Stone (1985). An extension to the partially linear case would also be similar, with $r$ replaced by the dimension of $x_b$ and Assumption 7 only required to hold for $x_b$, rather than $x$. Because these extensions are straightforward, explicit results are not given.

Assumptions 7 and 8 are primitive conditions for Assumptions 2 and 3, so that asymptotic normality and $\sqrt{n}$-consistency results will follow under the other conditions of Sections 3 and 4. All of these other conditions are primitive and straightforward to verify, except for Assumption 5. The following more primitive version of Assumption 5 is easier to check and will suffice for Assumption 5 for power series.

Assumption 9: $a(g)$ is a scalar and there exists a sequence of continuous functions $\{\tilde{g}_j(x)\}_{j=1}^{\infty}$ such that $a(\tilde{g}_j)$ is bounded away from zero but $E[\tilde{g}_j(x)^2] \to 0$.

An asymptotic normality result for power series estimators is given by the following:

Theorem 5: If Assumptions 1 and 4 are satisfied for $d = 0$, Assumptions 7-9 are satisfied, $\sqrt{n}K^{-s/r} \to 0$, and either $K^6/n \to 0$, or $a(g)$ is linear and $K^3/n \to 0$, then $\hat{\theta} = \theta_0 + O_p(K/\sqrt{n})$ and

$$\sqrt{n}\,\hat{V}^{-1/2}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0,1).$$

The rate conditions of this result require that $s > 3r/2$ in the case of a linear functional, or that $s > 3r$ in the case of a nonlinear functional. In this sense a certain amount of smoothness of the regression function is required for asymptotic normality.

For the consumer surplus example, it has already been shown that Assumptions 4 and 9 are satisfied.
Since it is nonlinear, and since the dimension of $x$ is 2 in this case, asymptotic normality will follow from Assumptions 7 and 8 and from $K^6/n \to 0$ and $\sqrt{n}K^{-s/2} \to 0$.

A result can also be formulated for $\sqrt{n}$-consistency of a functional of a polynomial regression estimator.

Theorem 6: If Assumptions 1 and 4 are satisfied for $d = 0$, Assumptions 6-8 are satisfied, $\sqrt{n}K^{-s/r} \to 0$, and either $K^6/n \to 0$ or $a(g)$ is linear and $K^3/n \to 0$, then

$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, V), \qquad \hat{V} \xrightarrow{p} V.$$

The conditions of this result are very similar to those of Theorem 5, except that Assumption 9 has been replaced by Assumption 6. More primitive conditions for Assumption 6 are not given because it is already in a form which is straightforward to verify, as demonstrated by the examples of Section 4. The average consumer surplus estimator satisfies Assumptions 4 and 6 with $d = 0$, so that under Assumptions 7 and 8 and $K^6/n \to 0$ and $\sqrt{n}K^{-s/2} \to 0$ it will be $\sqrt{n}$-consistent. Also, the partially linear coefficients and weighted average derivative examples satisfy the assumptions, and are linear functionals, so that they will be $\sqrt{n}$-consistent if $K^3/n \to 0$ and $\sqrt{n}K^{-s/r} \to 0$.

6. Regression Splines

Regression splines are linear combinations of functions that are smooth piecewise polynomials of a given order with fixed knots (join points). Spline approximating functions have attractive features relative to polynomials, being less sensitive to outliers and to bad approximation over small regions. They have the disadvantage, as far as the theory here is concerned, that the support of $x$ must be known. A known support is required for knot placement. Alternatively, the results could also be obtained for estimators that only use data where $x$ is in some specified region, but the conclusion would have to be modified for that case.
In particular, the uniform and mean-square convergence rates would only apply to the true function over the region of the data that is used, and asymptotic normality would only hold for functionals that did not depend on the function $g$ for $x$ outside the region. For simplicity, this section will only consider the case where the support is known and satisfies Assumption 7, where the following condition can be imposed without loss of generality.

Assumption 10: The support of $x$ is $[-1,1]^r$.

When the support of $x$ is known and Assumption 7 is satisfied, $x$ can always be rescaled so that this condition holds.

To describe a spline, let $(x)_+ = 1(x > 0) \cdot x$. A spline basis for a univariate $m$ degree polynomial with $L-1$ knots is

$$P_{\ell}(x) = \begin{cases} x^{\ell - 1}, & 1 \le \ell \le m+1, \\ \{[x + 1 - 2(\ell - m - 1)/L]_+\}^m, & m+2 \le \ell \le m+L. \end{cases} \qquad (17)$$

For a set of distinct $r$-tuples of nonnegative integers $\{\lambda(k,K)\}$ with $\lambda_j(k,K) \le m+L$ for each $j$ and $k$, a multivariate spline basis can be formed from products of these functions for the individual components of $x$. Let

$$p_{kK}(x) = \prod_{j=1}^{r} P_{\lambda_j(k,K)}(x_j), \quad (k = 1, \ldots, K), \quad K = (m+L)^r. \qquad (18)$$

The theory to follow uses B-splines, which are a linear transformation of the above functions that have lower multicollinearity. The low multicollinearity of B-splines and recursive formula for calculation also lead to computational advantages; e.g. see Powell (1981).

The rate at which splines uniformly approximate a function is the same as that for power series, so the smoothness condition of Assumption 8 will be left unchanged. For splines there do not seem to be approximation rates for derivatives readily available in the literature, so we limit attention to the case $d = 0$, as we do for power series.

Theorem 7: If Assumptions 1, 7, 8, and 10 are satisfied and $K^2/n \to 0$ then

$$\int [g_0(x) - \hat{g}(x)]^2 \, dF_0(x) = O_p(K/n + K^{-2s/r}), \qquad |\hat{g} - g_0|_0 = O_p(\sqrt{K}[\sqrt{K}/\sqrt{n} + K^{-s/r}]).$$

It is interesting to note that the uniform convergence rate for splines is faster than that for power series.
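Returning briefly to equations (17) and (18), the truncated power basis and its tensor product can be sketched as below. This is an illustrative sketch under stated assumptions, not the B-spline form used in the theory: it assumes $m \ge 1$, support $[-1,1]^r$ as in Assumption 10, equally spaced knots at $-1 + 2j/L$ for $j = 1, \ldots, L-1$, and the full tensor product with $K = (m+L)^r$ terms. The function names `univariate_spline_basis` and `tensor_spline_basis` are hypothetical.

```python
import itertools
import numpy as np

def univariate_spline_basis(x, m, L):
    """Columns P_1, ..., P_{m+L} of equation (17): powers 1, x, ..., x^m, then truncated powers."""
    x = np.asarray(x, dtype=float)
    cols = [x ** (l - 1) for l in range(1, m + 2)]
    for l in range(m + 2, m + L + 1):
        knot = -1.0 + 2.0 * (l - m - 1) / L          # [x + 1 - 2(l-m-1)/L]_+ = (x - knot)_+
        cols.append(np.maximum(x - knot, 0.0) ** m)  # assumes m >= 1
    return np.column_stack(cols)                     # shape (n, m + L)

def tensor_spline_basis(X, m, L):
    """Products over coordinates of univariate basis functions, as in equation (18)."""
    X = np.asarray(X, dtype=float)
    n, r = X.shape
    U = [univariate_spline_basis(X[:, j], m, L) for j in range(r)]
    cols = [np.prod([U[j][:, lam[j]] for j in range(r)], axis=0)
            for lam in itertools.product(range(m + L), repeat=r)]
    return np.column_stack(cols)                     # shape (n, (m + L) ** r)

# Usage: a linear spline (m = 1) with one interior knot at 0 (L = 2).
B_demo = univariate_spline_basis(np.array([-1.0, 0.0, 1.0]), m=1, L=2)
T_demo = tensor_spline_basis(np.array([[0.5, -0.5], [0.0, 1.0]]), m=1, L=2)
```

In practice a B-spline basis spanning the same space would usually be preferred for numerical stability, in line with the remark on multicollinearity above.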
It may be that the faster uniform rate for splines is an artifact of the proof, rather than some intrinsic feature of the two types of approximation, but more work is needed to determine that.

Asymptotic normality and consistency will follow similarly as for power series.

Theorem 8: If Assumptions 1 and 4 are satisfied for $d = 0$, Assumptions 7-10 are satisfied, $\sqrt{n}K^{-s/r} \to 0$, and either $K^4/n \to 0$ or $a(g)$ is linear and $K^2/n \to 0$, then $\hat{\theta} = \theta_0 + O_p(K^{1/2}/\sqrt{n})$ and

$$\sqrt{n}\,\hat{V}^{-1/2}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0,1).$$

In comparison with Theorem 5, this asymptotic normality result for functionals of a spline estimator imposes less stringent conditions on the growth rate of $K$. Consequently, the necessary smoothness conditions for asymptotic normality will be less severe, requiring only $s > 2r$ for a nonlinear functional or $s > r$ for a linear one. As in the case of polynomials, the consumer surplus functional satisfies the conditions of this result, so that it will be asymptotically normal for $K^4/n \to 0$ and $\sqrt{n}K^{-s/2} \to 0$.

A $\sqrt{n}$-consistency result for splines can also be formulated.

Theorem 9: If Assumptions 1 and 4 are satisfied for $d = 0$, Assumptions 6-8 are satisfied, $\sqrt{n}K^{-s/r} \to 0$, and either $K^4/n \to 0$ or $a(g)$ is linear and $K^2/n \to 0$, then

$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, V), \qquad \hat{V} \xrightarrow{p} V.$$

The examples of Section 4 meet the conditions of this result, and so for regression splines, the average consumer surplus estimator will be $\sqrt{n}$-consistent if $K^4/n \to 0$ and $\sqrt{n}K^{-s/2} \to 0$, and the partially linear coefficients and weighted average derivative will be $\sqrt{n}$-consistent if $K^2/n \to 0$ and $\sqrt{n}K^{-s/r} \to 0$.

7. Conclusion

This paper has derived convergence rate and asymptotic normality results for series estimators. Primitive conditions were given for power series and regression splines. Further refinements of these results would be useful. It would be good to have results for estimation of derivatives of functions. These could easily be obtained from corresponding approximation rates for power series or splines.
Also, it would be useful to know whether the uniform convergence rates could be improved on.

Appendix: Proofs of Theorems

Throughout the Appendix, let $C$ denote a generic constant that may be different in different uses and let $\sum_i = \sum_{i=1}^{n}$. Also, before proving Theorems 1-3, it is helpful to make the following observations that apply to each of those proofs. First, since $\hat g$ is invariant to nonsingular linear transformations of $p^K(x)$, it can be assumed throughout that $B = I$, i.e. $p^K(x) = P^K(x)$. Furthermore, since $Q = E[p^K(x)p^K(x)']$ has smallest eigenvalue bounded away from zero, it can be assumed throughout that $Q = I$. This is because, for a symmetric square root $Q^{-1/2}$ of $Q^{-1}$, $Q^{-1/2}p^K(x)$ is a nonsingular linear transformation of $p^K(x)$ satisfying

$$\tilde\zeta_d(K) = \sup_{x \in \mathcal{X},\, |\lambda| \le d} \|\partial^\lambda Q^{-1/2} p^K(x)\| \le C\zeta_d(K).$$

Hence, Assumptions 2 and 5 will be satisfied when $Q^{-1/2}p^K(x)$ replaces $p^K(x)$, and the conditions imposed on $K$ and the convergence rate bounds in terms of $\zeta_d(K)$ derived for this replacement will also hold for the original $p^K(x)$.

Next,

$$\begin{aligned} E[\|\hat Q - I\|^2] &= \sum_{k=1}^{K}\sum_{j=1}^{K} E\Big[\Big\{\sum_i p_{kK}(x_i)p_{jK}(x_i)/n - I_{jk}\Big\}^2\Big] \le \sum_{k=1}^{K}\sum_{j=1}^{K} E[p_{kK}(x)^2 p_{jK}(x)^2]/n \\ &= E\Big[\sum_{k=1}^{K} p_{kK}(x)^2 \sum_{j=1}^{K} p_{jK}(x)^2\Big]/n \le \zeta_0(K)^2\,\mathrm{tr}(I)/n = \zeta_0(K)^2 K/n \to 0, \end{aligned}$$

so that

$$\|\hat Q - I\| = O_p(\zeta_0(K)K^{1/2}/\sqrt{n}) = o_p(1). \tag{19}$$

Furthermore, since the difference in smallest eigenvalues is bounded in absolute value by $\|\hat Q - I\|$, the smallest eigenvalue of $\hat Q$ converges in probability to one. Let $1_n$ be the indicator function for the smallest eigenvalue of $\hat Q$ being greater than $1/2$, so $\mathrm{Prob}(1_n = 1) \to 1$.

Proof of Theorem 1: For $G = (g_0(x_1), \ldots, g_0(x_n))'$, let $\varepsilon = Y - G$ and $X = (x_1, \ldots, x_n)$. Boundedness of $\mathrm{Var}(y|x)$ and independence of the observations implies $E[\varepsilon\varepsilon'|X] \le CI$. Therefore,
$$\begin{aligned} E[1_n\|\hat Q^{-1/2}P'\varepsilon/n\|^2 \mid X] &= 1_n E[\varepsilon'P(P'P)^{-1}P'\varepsilon \mid X]/n = 1_n E[\mathrm{tr}\{P(P'P)^{-1}P'\varepsilon\varepsilon'\} \mid X]/n \\ &= 1_n\,\mathrm{tr}\{P(P'P)^{-1}P'E[\varepsilon\varepsilon' \mid X]\}/n \le C 1_n\,\mathrm{tr}\{P(P'P)^{-1}P'\}/n \le CK/n, \end{aligned}$$

so by the Markov inequality, $1_n\|\hat Q^{-1/2}P'\varepsilon/n\| = O_p(K^{1/2}/\sqrt{n})$. It follows that

$$1_n\|\hat Q^{-1}P'\varepsilon/n\| = 1_n\{(\varepsilon'P/n)\hat Q^{-1/2}\hat Q^{-1}\hat Q^{-1/2}P'\varepsilon/n\}^{1/2} \le O_p(1)\,1_n\|\hat Q^{-1/2}P'\varepsilon/n\| = O_p(K^{1/2}/\sqrt{n}).$$

Let $\beta$ be as in Assumption 3. Then for $g = (g_0(x_1), \ldots, g_0(x_n))'$, by $P(P'P)^{-}P'$ idempotent,

$$1_n\|\hat Q^{-1}P'(g - P\beta)/n\| \le O_p(1)\,1_n[(g - P\beta)'P(P'P)^{-1}P'(g - P\beta)/n]^{1/2} \le O_p(1)[(g - P\beta)'(g - P\beta)/n]^{1/2} = O_p(K^{-\alpha}).$$

Therefore, by $1_n(\hat\beta - \beta) = 1_n\hat Q^{-1}P'(y - g)/n + 1_n\hat Q^{-1}P'(g - P\beta)/n$, it follows that

$$1_n\|\hat\beta - \beta\| \le 1_n\|\hat Q^{-1}P'\varepsilon/n\| + 1_n\|\hat Q^{-1}P'(g - P\beta)/n\| = O_p(K^{1/2}/\sqrt{n} + K^{-\alpha}). \tag{20}$$

Next, by the triangle inequality,

$$\begin{aligned} 1_n\int[\hat g(x) - g_0(x)]^2\,dF_0(x) &= 1_n\int[p^K(x)'(\hat\beta - \beta) + p^K(x)'\beta - g_0(x)]^2\,dF_0(x) \\ &\le C 1_n\|\hat\beta - \beta\|^2 + C\int[p^K(x)'\beta - g_0(x)]^2\,dF_0(x) \\ &= O_p(K/n + K^{-2\alpha}) + O(K^{-2\alpha}) = O_p(K/n + K^{-2\alpha}). \end{aligned} \tag{21}$$

Therefore, the first conclusion follows by $1_n = 1$ with probability approaching one. Also, by the triangle and Cauchy-Schwartz inequalities,

$$1_n|\hat g - g_0|_d \le 1_n|p^{K\prime}(\hat\beta - \beta)|_d + |p^{K\prime}\beta - g_0|_d \le \zeta_d(K)\,1_n\|\hat\beta - \beta\| + O(K^{-\alpha}) = O_p\big(\zeta_d(K)[(K^{1/2}/\sqrt{n}) + K^{-\alpha}]\big). \tag{22}$$

The second conclusion follows from this equation like the first follows from eq. (21). QED.

Proof of Theorem 2: Note that Assumption 5 b) implies

$$\sqrt{n}\,[\zeta_d(K)(\sqrt{K}/\sqrt{n} + K^{-\alpha})]^2 \le C\{[\zeta_d(K)^4K^2/n]^{1/2} + [\zeta_d(K)^4/n]^{1/2}(\sqrt{n}K^{-\alpha})^2\} \to 0. \tag{23}$$

Let $F = V_K^{-1/2}$. Further, $\Sigma \ge CI$ by $\mathrm{Var}(y|x)$ bounded below and $Q = I$, so that $V_K = A'\Sigma A \ge C\|A\|^2$. Then by the Cauchy-Schwartz inequality, for $g_K$ as in Assumption 5, $|D(g_K)| = |A'\beta_K| \le \|A\|\|\beta_K\| = \|A\|(E[g_K(x)^2])^{1/2}$, implying $\|A\| \to \infty$, and hence $|F| \le C$. Then $\|FA'\|^2 = \mathrm{tr}(FA'AF) \le \mathrm{tr}(CFV_KF) = C$. Now, as in the proof of Theorem 1, let $\varepsilon = Y - G$.
K (x) 1, _1 1 n UFA' s UFA' — let s II + 1 n IIFA'(Q-I)Q + 1 n K = p (x)'0 tr(FA' (Q-I)Q 1/2 (e-9 n ) = v^F[a(i)-a(g n )] = i/nV" + FA'P'c/Vn + v^FA'(Q -I)P'c/n By Assumptions 3 and 5 n l n 1 iv (v^TK (•nK p P ) II = IIQ-III (24) (1), p AF) £ C + CIIFA' 111 n UFA' _i ll = IIQ-III (1). p G = 3, l^nF[D(g" )-D(g K = o (1). ( p P 1 n Next, let 1 (g (x g (x ) <VnF[a(g)-a(g n )-D(g)+D(g n )] 1 + v^nFA'Q" ?' (G-Pp )]| Cauchy-Schwartz inequality and IIFA'Q T'/v'nllllg-P/VI s _a a UFA' (1) ))', Then _1 1 s C + from Assumption for 1 Also, by the II p _1 2 II C. =s eq. n i'Q IVnFA'Q Vi P'(g-P0,J/n| N ) I Hlvfimax.^ |g.(x.)-g„(x.) u i i i^n IIFA'Q is. X = x (x, 1 23 n ) and p. i K Ci/nl^-g^ K 1 )/n + v/nF[D(g * i/nC|D(g -g (24), K (25) = p K (x.). i | £ 1 n IIFA'Q Then by )-D(g Q )]>. £ Cv/nK~ a -> 0. s ll^n|g_-g„ |_ = u E[e|X n ] = iv 0, u Etl n Hv'nFA'fQ^-nP'e/nl^lX 1 n -I)[y\p.p'.(r (x.)/n](Q *-I)AF' **i n 2 llvfiFA'tQ^-DP'e/nll Next Z. , (i = m n) 1 -^ i so that E[Z. Also, is i.i.d.. e)Z > | £ ne 2 4 IIFA' ll 2 in C n (K) by N(0,I) -5-> 1 2 2 E[llp.ll = ] conclusion follows by * )]| (1-1 n .^i |x.]]/n 11 L"/n( I VnV~K K A = so VnK ) — > Assumption 5 and = Prob(I 1) — I i-g 1 2 I Q d K |p I a(p n K ' | s T C(|p n as — » 0, < J K so that 2 A ) » — 0. > so N(0,1), and 5, eq. > FAQ 1 P' e//n (23), _a 2 )] Then by -£-> 0. ) so that the first N(0,1), a(g) lp /iip-pii < (K)[VR/Vn + K~ d 5, -£-» 0. l D(p K '3-g n s linear in is ;g))'. a ] Then g. l , < | Q l a(p '|3) 2 1/2 = [C (K) K/n] (l + d Therefore, for the lg-g n a(p e/2 By Assumption and 5, e 1 from =1, for any i\i\ £ < e, K ' 0)-a(g)-D(p K/ 0;g)+D(g;g) /II0-0H I Ia; d (K) 2 ii0-0ii -» exists and equals probability approaching one. and Theorem — n, (26) in d where 5, ;g) | d ] (Li/n[C (K)(v^/i/n+K u 0)-a(g)-J' (0-0) /II0-0II = T p-£i 2 4 £ ne E[(Z. /e) ] 2 Assumption lg-g n 1, = (D(p implying e/2, K, 2 -5- and 1, — £.Z. (0~0 n ) VnV,. 
n IK '/3-g| IIQ-III Note that for each 2 s C< rt (K) K/n ^0 e 1, Assumption II such that 2 4 = ) = ] equal to the indicator function for Also, let > 1. 2 follows that 1)(Z. /e) in > Theorem so that by Theorem 0, * (9-9„) -^-» 0. In case b) of A. 2 in limit theorem, Next, consider case a) of Assumption = A'£, F.EtZ 0, 4 E[e l and the triangle inequality, (25) it CIIFAII = FA'P'e/Vn. in in Finally, by 1. |vGF[a(g)-a(g )-D(g)+D(g eq. h 2 = ne E[l(lZ. /e| ] Then by the Lindbergh-Feller central > ^ ) (1), V.Z. in in — o ) 1 (Q-I)AF' is l in nE[l(|Z. 1 0. = FA'p.e./Vn, r Z. let , 1 _1 n Therefore, since this conditional expectation 1 2 1 tr(FA' (Q -I)AF) £ CI tr(FA' (Q-I)Q -I)Q(Q n = ] _1 -1 CI tr(FA' (Q n J when 1=1. Furthermore, by linearity of 1, 24 Therefore, D(g;g) in g, A exists with Assumption 5, Q 2 l n UA-A)' = IIA-AII £ CT l(A-A)'p n K | d l£-g l Eq. (23) and dividing through by p « dJ (K) 2 [v'R/vn K~ + a ]) K (A-A) = -£-» 0. n K lD((A-A)'p ;i)-D((A-A)'p ;g £ I CIIA-AIIC (K)li-g n d d II l A-A gives II Therefore, I 1 n II A-A £ I IIFAII . l d k £ C< (K)|g-g II n (27) )| + IIFIIIIA-AII = IIFAII = (1), and p -1 similarly I IIFAQ n = II (1). p h = I Q n Next, let _1 AF h = I AF. n and Then = llhll _1 _1 llh-hll E s Since £ I IIFA'Q n + I (I-Q)II F( A-A)' II n Z the largest eigenvalue of CI, I Ih'Zh n £ Cllh-hll II £ I UFA' n II + I |F|IIA-AII III-QII n bounded above, and hence by is + 2[(h-h)'Z(h-h)] 1/2 [h'Zh] 1/2 £ o (1) + Cllh-hll -^ h'Zh = = Ih'Zh - h'Zhl £ (h-h)'Z(h-h) + |2(h-h)'Zh| 1| 2 and (1) p 0. 1 (28) -^ 0. P 2 Z = r.p.p'.e./n. Also, let for ^111 4 E[e.|x.] follows by from Theorem -£-» IIQ-III It that 1, n vnK = n _a /K 2 llhll implying = IIZ-ZII (Do P = g (x.)-£(x.). A. let -E-» 0, IIZ-ZII I Ih'Zh - h'Zhl = |h'(Z-Z)h| £ Now, bounded, and an argument like that 11 l 1/2 — 0, > ) so by -1 Theorem 1, < (K)[(K/n) max.^ I A. I bounded and a similar argument to IIQ-III -J-» 0, 1/2 +K~ £ lg-g S = E[p.p'. |e.|] = E[p.p'.E[|e. and £.p.p'. 
|e.| Note that | |x.]] it (1) -^ (29) 0. P a ] 1/2 2 = [< (K) K/n] (l + = l (o(D) -?-> 0. £ CQ = follows that CI. II By S— S II E[e —^-» Also, let 2 |x.] 0. Therefore, I IFVF - h'Zhl = n £ 2max. i£n _1 I A. |n l V.(h'p.) u \ *i |h'(Z-Z)h| = |n 2 _1 Hi 2 |e. i I ] 2 y.(n'p.) (2c.A.+A + max. |A |n"V.(h'p.) u\ *i i^n l 25 2 ill 2 )| £ o U)[h' (S+Q)hl p (29) S pV s o (l)[h'(S+Q-S-I)h] + o (l)[h'(S+I)h] £ o (l)llhll P P P Then by the triangle inequality and = Probd Then by i/nv" 1/2 —> 1) (e-e F V 1, eqs. -£-> (28)-(30), 2 + IIQ-III) + o (IIS-SII -^ (l)llhll 0. P follows that it |FVF-1| -!-> 1 0. implying 1, 2 1/2 = VnF(e-e )/(F v) ) 2 -U n(o.i), giving the last conclusion. show the Finally, to first conclusion, it show that suffices to |V | s CC,(K) /2 9 = 6. + (V* because / /v n)vnF(9-9_) = 9 n + K Cauchy-Schwartz inequality implies C|p K, CHAN 2 A| CC.UO IIAII. s . s CC (K) d 2 Dividing by / — J f with satisfied with = llcll > Let and N(0,c'Vc) IIAII * CC.(K), IIAII c'Vc c'Vc -£-» Furthermore, for the functional 1. replaced by v(x) Wx). for scalar 2 so that (K)II0II, then gives IIAII Note that for any /Vn). the 0, K = |D(p 'A)| £ |V„| s and hence V By the Cramer-Wold device and by symmetry of 3: to prove that c'v n(9-6 c s £ '|3| /2 , QED. . Proof of Theorem vector |p (V* 2 u K. Therefore, c'v(x). K K v v = Ap (x) = E[i>(x)p (x)']Q V, it suffices for any conformable constant c'a(g), Assumption 6 is suffices to prove the result it —1 and K be the mean square p (x) IX v = v(x) projection of on the approximating functions, where the dropped for notational simplicity. 2 2 Then V,. ix K 2 E[{v-p (x)' v} I K — ] -V I > Assumptions 0. * E[|i> 2 2 -i> |] s EUv 2 1/? s od) + 2iEivnr'(Eiiv-v Thus, —^-> V V. By <r (x) 1 = E[v v <r K. x argument by Assumption (x)] 6, is E[[v-v v ) K ? -v) Z 1/? K r]r* ] + 2E[|v||i> -v|] K -> o. bounded away from zero and proof now follows exactly as in the proof of Theorem — V i v(x) non-zero, V > The 0. 
2, except that F = V -1/2 " is /*y —> l/W. s ] and the Cauchy-Schwartz inequality then give K. bounded by 2 K. Therefore, by the conclusion of Theorem 26 2, ^(9-9^ = V^ /2 1/2 v'nV V K (e-e —^-> V is P Let 4: bounded below, x density of (3.14) of and hence Andrews (1991) that 2 3 C->(K) — K/n £ CK /n Proof of Theorem 2 ] — » 5: 0. Furthermore, d = and K(J K a for 1/J c 0, that J c > g.(x) = VnST so that (x) IIP K(J), 1/e, ~ CK> " follows by Theorem 8 of Lorentz a = and such that such that (3 Let J. so the conclusion s/r, |a(g.)| is > a sequence of e for all and . and pv = pv let JK. . for Then by the triangle E[gv (x) 2 ] = sup A ^.|g (x)-|3 'p XGo. J K.J and J supremum — (x)| — 0, > K(J) £ K < K(J+1), inequality, so Assumption 5 |a(g.J| is p„ = > e satisfied. K for for < < K(l), K all as » A„. be a monotonic increasing sequence such that K(J) > d = Then, since K. and from the proof of Theorem C (K) V ' note that by Assumption 9 there 5, set, there is g K (x) = p (x)'3 K ), K follows as in the proof of Theorem 4 that Assumptions 2 and 3 are It for any fixed oo s ' it K.J » follows as in eqs. (3.13) Since power series can approximate a continuous function in the 0. norm on a compact — By the (x). 1/9 |P (x) kK k < K X €X uniformly bounded continuous functions E[g.(x) it p QED. 1. To show Assumption satisfied. Also, ]. max (1986) that Assumption 3 is satisfied with follows by Theorem so that their x's each term by the Jacobi in K K E[P (x)P (x)' =s > so that 1, orthonormal with respect to the uniform distribution is CI 1/2 -i> QED. which comprises a nonsingular linear transformation of [-1,1], and -^-» V. FV 2, be obtained by transforming the (x) polynomial of the same order that on V and replacing the individual powers [-1,1] Theorem Also, as in the proof of and squaring gives 1, Proof of Theorem support -i> N(0,V). ) K /n s CK /n 5/r Proof of Theorem — > C n (K) ^ CK, so for nonlinear and, since Assumption 3 is a(g) satisfied with follows it a = s/r, vftK QED. 
Proof of Theorem 6: Follows by Theorem 3 similarly to the proof of Theorem 5, with Assumption 6 replacing Assumption 9. QED.

Proof of Theorem 7: It follows by Lemma A.16 of Newey (1994a) that for $P^K(x)$ equal to the products of normalized B-splines (e.g. see Powell, 1981) multiplied by the square root of the number of knots, $\|P^K(x)\| \le CK^{1/2} = \zeta_0(K)$, and hence $\zeta_0(K)^2K/n \le CK^2/n \to 0$. Furthermore, it follows by the argument of Burman and Chen (1989, p. 1587) that the rest of Assumption 2 is satisfied. Also, by Theorem 12.8 of Schumaker (1981), Assumption 3 with $d = 0$ and $\alpha = s/r$ follows, so the conclusion follows from Theorem 1. QED.

Proof of Theorem 8: Follows as in the proof of Theorem 5.

Proof of Theorem 9: Follows as in the proof of Theorem 6.

References

Andrews, D.W.K., 1991, Asymptotic normality of series estimators for nonparametric and semiparametric models, Econometrica 59, 307-345.

Andrews, D.W.K. and Y.J. Whang, 1990, Additive interactive regression models: Circumvention of the curse of dimensionality, Econometric Theory 6, 466-479.

Burman, P. and K.W. Chen, 1989, Nonparametric estimation of a regression function, Annals of Statistics 17, 1567-1596.

Chen, H., 1988, Convergence rates for parametric components in a partly linear model, Annals of Statistics 16, 136-146.

Cox, D.D., 1988, Approximation of least squares regression on nested subspaces, Annals of Statistics 16, 713-732.

Deaton, A., 1988, Rice prices and income distribution in Thailand: A nonparametric analysis, Economic Journal 99 supplement, 1-37.

Eastwood, B.J. and A.R. Gallant, 1991, Adaptive rules for seminonparametric estimation that achieve asymptotic normality, Econometric Theory 7, 307-340.

Gallant, A.R. and G. Souza, 1991, On the asymptotic normality of Fourier flexible functional form estimates, Journal of Econometrics 50, 329-353.

Hausman, J.A. and W.K. Newey, 1995, Nonparametric estimation of exact consumer surplus and deadweight loss, forthcoming, Econometrica.

Lorentz, G.G., 1986, Approximation of Functions, New York: Chelsea Publishing Company.

Newey, W.K., 1988, Adaptive estimation of regression models via moment restrictions, Journal of Econometrics 38, 301-339.

Newey, W.K., 1994a, Convergence rates for series estimators, Statistical Methods of Economics and Quantitative Economics: Essays in Honor of C.R. Rao, G.S. Maddala, ed., forthcoming.

Newey, W.K., 1994b, The asymptotic variance of semiparametric estimators, Econometrica 62, 1349-1382.

Newey, W.K., 1994c, Series estimation of regression functionals, Econometric Theory 10, 1-28.

Powell, M.J.D., 1981, Approximation Theory and Methods, Cambridge, England: Cambridge University Press.

Schumaker, L.L., 1981, Spline Functions: Basic Theory, New York: Wiley.

Stoker, T.M., 1986, Consistent estimation of scaled coefficients, Econometrica 54, 1461-1481.

Stone, C.J., 1982, Optimal global rates of convergence for nonparametric regression, Annals of Statistics 10, 1040-1053.

Stone, C.J., 1985, Additive regression and other nonparametric models, Annals of Statistics 13, 689-705.

White, H., 1980, Using least squares to approximate unknown regression functions, International Economic Review 21, 149-170.