M.I.T. LIBRARIES - DEWEY

WORKING PAPER
DEPARTMENT OF ECONOMICS
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
50 Memorial Drive, Cambridge, Mass. 02139

Nonparametric Estimation of Triangular Simultaneous Equations Models

Whitney K. Newey
Department of Economics, MIT, E52-262D, Cambridge, MA 02139
wnewey@mit.edu

James L. Powell
Department of Economics, University of California at Berkeley

Francis Vella
Department of Economics, Rutgers University, New Brunswick, NJ 08903
Vella@fas-econ.rutgers.edu

Working Paper No. 98-06, July 1998
March 1995; Revised, February 1998

Abstract

This paper presents a simple two-step nonparametric estimator for a triangular simultaneous equation model. Our approach employs series approximations that exploit the additive structure of the model. The first step comprises the nonparametric estimation of the reduced form and the corresponding residuals. The second step is the estimation of the primary equation via nonparametric regression with the reduced form residuals included as a regressor. We derive consistency and asymptotic normality results for our estimator, including optimal convergence rates. Finally we present an empirical example, based on the relationship between the hourly wage rate and annual hours worked, which illustrates the utility of our approach.

Keywords: Nonparametric Estimation, Simultaneous Equations, Series Estimation, Two-Step Estimators.

The NSF provided financial support for the work of Newey and Powell on this paper.
An early version of this paper was presented at an NBER conference on applications of semiparametric estimation, October 1993, at Northwestern. We are grateful to participants at this conference and workshops at Harvard/MIT and Tilburg for helpful comments. We are also grateful to the referees for their extensive comments that have helped both the content and style of this paper.

1. Introduction

Structural estimation is important in econometrics, because it is needed to account correctly for endogeneity that comes from individual choice or market equilibrium. Often structural models do not have tight functional form specifications, so that it is useful to consider nonparametric structural models and their estimation. This paper proposes and analyzes one approach to nonparametric structural modeling and estimation. This approach is different than standard nonparametric regression, because the object of estimation is the structural model and not just a conditional expectation.

Nonparametric structural models have been previously considered by Roehrig (1988), Newey and Powell (1989), and Vella (1991). Roehrig (1988) gives identification results for a system of equations when the errors are independent of the instruments. Newey and Powell (1989) consider identification and estimation under the weaker condition that the disturbance has conditional mean zero given the instruments. The results of this paper are complementary to these previous results. The model we consider is in some ways more restrictive than that of Newey and Powell (1989), but is easier to estimate.

The model we consider is a triangular nonparametric simultaneous equations model,

(1.1)    y = g_0(x, z_1) + ε,    E[ε|u, z] = E[ε|u],
         x = Π_0(z) + u,         E[u|z] = 0,

where x is a d_x × 1 vector of endogenous variables, z is a d_z × 1 vector of instrumental variables that includes z_1 as a d_1 × 1 subvector, Π_0(z) is a d_x × 1 vector of functions of the instruments z, and u is a d_x × 1 vector of disturbances.
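The structure of model (1.1) can be illustrated by simulation. The sketch below draws data from one hypothetical parameterization; the choices of g_0, Π_0, and the error laws (in particular λ_0(u) = E[ε|u] = 0.9u) are illustrative assumptions, not specifications used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Instruments z = (z1, z2); z1 also enters the structural equation.
z1 = rng.uniform(-1, 1, n)
z2 = rng.uniform(-1, 1, n)

# Reduced form x = Pi0(z) + u with E[u|z] = 0 (hypothetical Pi0).
u = rng.normal(0, 0.5, n)
x = 0.8 * z2 + 0.3 * z1 + u

# Disturbance with E[eps|u, z] = E[eps|u]: eps depends on z only through u.
eps = 0.9 * u + rng.normal(0, 0.3, n)   # so lambda0(u) = E[eps|u] = 0.9 u

# Structural equation y = g0(x, z1) + eps (hypothetical g0).
g0 = lambda x, z1: np.sin(x) + 0.5 * z1**2
y = g0(x, z1) + eps

# x is endogenous: eps and x are correlated through the common shock u.
print(np.corrcoef(x, eps)[0, 1])   # noticeably nonzero
```

A plain nonparametric regression of y on (x, z1) in such data would not recover g0; the two-step procedure of this paper uses the reduced-form residuals to control for the endogeneity.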
Equation (1.1) generalizes the limited information simultaneous equations model to allow for a nonparametric relationship g_0(x, z_1) between the variables y and (x, z_1) and a nonparametric reduced form Π_0(z). The conditional mean restriction in equation (1.1) is one conditional mean version of the usual orthogonality condition for a linear model. It is a more general assumption than requiring that z be independent of (ε, u) and that E[u] = 0. [Footnote 2: If Π_0(z) were linear in z and the conditional expectations were replaced by population regressions, then the third condition implies that u is orthogonal to z and the second condition that z is orthogonal to ε.] This added generality may be important for some econometric models, because it allows for some conditional heteroskedasticity in the disturbances. For example, y can be expenditures on a subgroup of commodities and x purchases of a commodity in some separable demand models. Endogeneity results from expenditure on a commodity subset being a choice variable. Also, heteroskedasticity often results from individual heterogeneity in demand functions, as pointed out by Brown and Walker (1989). Assumption (1.1) allows for both endogeneity and heteroskedasticity.

An alternative model, considered by Newey and Powell (1989), requires only that E[ε|z] = 0. Strictly speaking, neither model is stronger than the other, because equation (1.1) does not imply that E[ε|z] = 0. [Footnote 3: If equation (1.1) is satisfied, u is independent of z, and E[ε] = 0, then E[ε|z] = E[E[ε|u,z]|z] = E[E[ε|u]|z] = E[E[ε|u]] = E[ε] = 0.] The additive separability of u in E[ε|u] is a strong restriction. One benefit of such a condition is that it leads to a straightforward estimation approach, as will be further discussed below. In this sense, the model of equation (1.1) is a convenient one for applications where its restrictions are palatable. In contrast, under only the conditional mean assumption E[ε|z] = 0, as discussed in Newey and Powell (1989), estimation is very difficult. [Footnote 4: The mapping from the reduced form E[y|z] to the structure g_0(x, z_1) turns out to be discontinuous, making it difficult to construct a consistent estimator.]

We propose a two-step nonparametric estimator of this model. The first step consists of the construction of a residual vector from the nonparametric regression of x on z. The second step is the nonparametric regression of y on x, z_1, and the estimated residual u. Two-step nonparametric kernel estimation has previously been considered in Ahn (1994). Our results concern series estimators, which are convenient for imposing the additive structure implied by equation (1.1). We derive optimal mean-square convergence rates and asymptotic normality results that account for the first step estimation and allow for √n-consistency of mean-square continuous functionals of the estimator.

Section 2 of the paper considers identification, presenting straightforward sufficient conditions. Section 3 describes the estimator and Section 4 derives convergence rates. Section 5 gives conditions for asymptotic normality and inference, including consistent standard error formulae. Section 6 describes an extension of the model and estimators to semiparametric models. Section 7 contains an empirical example and Monte Carlo results illustrating the utility of our approach.

2. Identification

An implication of our model is that for λ_0(u) = E[ε|u] and w = (x', z_1', u')',

(2.1)    E[y|x,z] = g_0(x, z_1) + E[ε|x,z] = g_0(x, z_1) + E[ε|u,z]
                  = g_0(x, z_1) + λ_0(u) = h_0(w).

Thus, the function of interest, g_0(x, z_1), is one component of an additive regression of y on (x, z_1) and u. Equation (2.1) also implies that E[λ_0(u) + {y - E[y|x,z]}|u,z] = λ_0(u) depends only on u, which implies E[ε|u,z] = E[y - g_0(x, z_1)|u,z] = E[ε|u], as specified in equation (1.1). Thus, the additive structure of equation (2.1) is equivalent to the conditional mean restriction of equation (1.1). Also, u is identified, so that the identification of g_0 under equation (1.1) is the same as the identification of this additive component in equation (2.1).

To analyze identification of g_0(x, z_1), note that a conditional expectation is unique with probability one, so that any other additive function g(x, z_1) + λ(u) satisfying equation (2.1) must have Prob(g(x, z_1) + λ(u) = g_0(x, z_1) + λ_0(u)) = 1. Therefore, identification is equivalent to equality of conditional expectations implying equality of the additive components, up to a constant. Equivalently, working with the difference of two conditional expectations, identification is equivalent to the statement that a zero additive function must have only constant components. To be precise we have

Theorem 2.1: g_0(x, z_1) is identified, up to an additive constant, if and only if Prob(δ(x, z_1) + ν(u) = 0) = 1 implies there is a constant c with Prob(δ(x, z_1) = c) = 1.

There is a straightforward interpretation of this result that leads to a simple sufficient condition for identification. Suppose that identification fails. Then by Theorem 2.1, there are δ(x, z_1) and ν(u) with δ(x, z_1) + ν(u) = 0 and δ(x, z_1) nonconstant. Intuitively, this implies a functional relationship between the random vectors (x, z_1) and u, and a degeneracy in the joint distribution of these two random variables. For instance, if ν(u) were a one-to-one function of a subvector ũ of u, then δ(x, z_1) + ν(u) = 0 implies an exact relationship between (x, z_1) and ũ. Consequently, the absence of an exact relationship will imply identification.
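Theorem 2.1 has a finite-dimensional analogue that can be checked numerically: when there is an exact relationship between x and u, additive basis functions in x and in u become linearly dependent, so the additive components cannot be separated; absent such a relationship the additive design has full rank. The quadratic bases and the relationship u = 2x - 1 below are hypothetical illustrations, not constructions from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 500)

def design(x, u):
    # Additive design [1, x, x^2, u, u^2], with no x-u interaction terms.
    return np.column_stack([np.ones_like(x), x, x**2, u, u**2])

# Exact relationship u = 2x - 1: then u and u^2 lie in span{1, x, x^2},
# so delta(x) + nu(u) = 0 has nonconstant solutions (nonidentification).
print(np.linalg.matrix_rank(design(x, 2 * x - 1)))   # 3, not 5

# No functional relationship: full rank, identification holds (Thm 2.2).
u = rng.uniform(-1, 1, 500)
print(np.linalg.matrix_rank(design(x, u)))           # 5
```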
To formalize these ideas it is helpful to be precise about what we mean by existence of a functional relationship.

Definition: There is a functional relationship between (x, z_1) and u if and only if there exist h(x, z_1, u) and a set U such that Prob(u ∈ U) > 0, Prob(h(x, z_1, u) = 0) = 1, and Prob(h(x, z_1, ũ) = 0) < 1 for all fixed ũ ∈ U.

This condition says that the pair of vectors (x, z_1) and u solve a nontrivial implicit equation. Thus, the functional relationship of this definition is an implicit one. As previously noted, nonidentification implies existence of such an implicit relationship. Therefore, the contrapositive statement leads to the following sufficiency result for identification.

Theorem 2.2: If there is no functional relationship between (x, z_1) and u then g_0(x, z_1) is identified up to an additive constant.

Although it is a sufficient condition, nonexistence of a functional relationship between (x, z_1) and u is not a necessary condition for identification. By Theorem 2.1, it is necessary and sufficient that there is no additive functional relationship. Thus, identification may still occur when there is an exact, nonadditive functional relationship.

The additive structure is so strong that the model may be identified even when the usual order condition is not satisfied, i.e. even though z is of smaller dimension than x, so that there is only one instrument although there are two endogenous variables. For example, suppose that x is two dimensional, z is a scalar, and the reduced form x = Π(z) + u has the form Π(z) = (z, exp(z))'. Then z = x_1 - u_1 and x_2 = exp(x_1 - u_1) + u_2, which implies a nonlinear, nonadditive relationship between x and u. Consider an additive function satisfying δ(x) + γ(u) = 0, i.e.

(2.2)    δ(x_1, exp(x_1 - u_1) + u_2) + γ(u_1, u_2) = 0.

The restriction that δ(x) and γ(u) are differentiable means that the reduced form must also be differentiable. Let numerical subscripts denote partial derivatives with respect to corresponding arguments. Then differentiating equation (2.2) with respect to x_1, u_1, and u_2 gives δ_1(x) + δ_2(x)exp(x_1 - u_1) = 0, -δ_2(x)exp(x_1 - u_1) + γ_1(u) = 0, and δ_2(x) + γ_2(u) = 0. Combining the last two equations gives γ_1(u) = -γ_2(u)exp(x_1 - u_1). Assuming that, with probability one conditional on x, the support of u contains more than one point, so that u can vary for given x, it follows from this equation that γ_1(u) = 0 and γ_2(u) = 0. Then the second and first equations imply δ_2(x) = 0 and δ_1(x) = 0, implying that both δ(x) and γ(u) are constant functions. Thus δ(x) is a constant function, and hence g_0(x) is identified, up to a constant, by Theorem 2.1.

There is a simple sufficient condition for identification that generalizes the rank condition for identification in a linear simultaneous equations system. Suppose that z is partitioned as z = (z_1', z_2')'. Let x, u, z_1, and z_2 subscripts denote differentiation with respect to x, u, z_1, and z_2, respectively.

Theorem 2.3: If g(x, z_1), λ(u), and Π(z) are differentiable, the boundary of the support of (z, u) has zero probability, and with probability one rank(Π_{z_2}(z)) = d_x, then g_0(x, z_1) is identified.

If g(x, z_1) and Π(z) were linear, then rank(Π_{z_2}(z)) = d_x is precisely the condition for identification of one equation of a linear simultaneous system, in terms of the reduced form coefficients. Thus, this condition is a nonlinear, nonparametric generalization of the rank condition.

It is interesting to note, as pointed out by a referee, that this condition leads to an explicit formula for the structure. Note that h(x, z) = E[y|x,z] = g(x, z_1) + λ(x - Π(z)). Differentiating gives

    h_x(x, z) = g_x(x, z_1) + λ_u(u),
    h_{z_1}(x, z) = g_{z_1}(x, z_1) - Π_{z_1}(z)'λ_u(u),
    h_{z_2}(x, z) = -Π_{z_2}(z)'λ_u(u).

Then, if rank(Π_{z_2}(z)) = d_x for almost all z, multiplying h_{z_2}(x, z) by D(z) = [Π_{z_2}(z)Π_{z_2}(z)']^{-1}Π_{z_2}(z) and solving gives λ_u(u) = -D(z)h_{z_2}(x, z). Plugging this result into the other terms gives

(2.3)    g_x(x, z_1) = h_x(x, z) + D(z)h_{z_2}(x, z),
         g_{z_1}(x, z_1) = h_{z_1}(x, z) - Π_{z_1}(z)'D(z)h_{z_2}(x, z).

This formula gives an explicit equation for the derivatives of the structural function in terms of the identified conditional expectation h(x, z).

So far we have only considered identification of g_0(x, z_1) up to an additive constant. This restriction is sufficient for many purposes, such as comparing g_0(x, z_1) at different values of (x, z_1). In other cases, such as forecasting demand quantity, it is also desirable to know the level of g_0(x, z_1), which can be identified if a location restriction is imposed on the distribution of ε. For example, suppose that E[ε] = 0, so that it is assumed that E[y] = E[g_0(x, z_1)]. Then from equation (2.1) it follows that for some function τ(u) with ∫τ(u)du = 1,

(2.4)    ∫E[y|x, z_1, u]τ(u)du - E[∫E[y|x, z_1, u]τ(u)du] + E[y]
         = g_0(x, z_1) - E[g_0(x, z_1)] + E[y] = g_0(x, z_1).

Thus, under the familiar restriction E[ε] = 0, the conditions for identification of g_0(x, z_1) up to a constant also serve to identify the level of g_0(x, z_1).

Another approach to identifying the constant term is to place restrictions on λ_0(u). For example, the restriction λ_0(0) = 0 is a generalization of a condition that holds in the linear simultaneous equations model because, in the linear model where g_0(x, z_1) and Π(z) are linear and equation (1.1) holds with population regressions replacing conditional expectations, the regression of y on (x, z_1) and u will be linear in u. More generally, we will consider imposing the restriction λ_0(ū) = λ̄ for some known ū and λ̄. In this case

(2.5)    E[y|x, z_1, ū] - λ̄ = g_0(x, z_1) + λ_0(ū) - λ̄ = g_0(x, z_1).

Thus, under the restriction λ_0(ū) = λ̄ there is a simple, explicit formula for the structural function in terms of h. The constant will be identified under other types of restrictions as well, but this will suffice for many purposes.
3. Estimation

Equation (2.1) can be estimated in two steps by combining estimation of the residual u with estimation of the additive regression in equation (2.1). The first step of this procedure is the formation of nonparametrically estimated residuals û_i = x_i - Π̂(z_i), (i = 1, ..., n). The second step is estimation of the additive regression in equation (2.1), using the estimated residuals in place of the true ones. An estimator of the structural function g_0(x, z_1) can then be recovered by pulling out the component that depends on those variables.

Estimation of additive regression models has previously been considered in the literature, and we can apply those results for our second step estimation. There are at least two general approaches to estimation of an additive component of a nonparametric regression. One is to use an estimator that imposes additivity and then pull out the component of interest. This approach for kernel estimators has been considered by Breiman and Friedman (1985), Hastie and Tibshirani (1991), and others. Series estimators are particularly convenient for imposing additivity, by simply excluding interaction terms, as in Stone (1985) and Andrews and Whang (1990).

Another approach to estimating the additive component is to use the fact that, for a function τ(u) such that ∫τ(u)du = 1,

    ∫E[y|x, z_1, u]τ(u)du = g_0(x, z_1) + ∫λ_0(u)τ(u)du.

Then g_0(x, z_1) can be estimated by integrating, over u, an unrestricted estimator of the conditional expectation. This partial means approach to estimating additive models has been proposed by Newey (1994), Tjostheim and Auestad (1994), and Linton and Nielsen (1995). It is computationally convenient for kernel estimators, although it is less asymptotically efficient than imposing additivity for estimation of √n-consistent functionals of additive models when the disturbance is homoskedastic.
In this paper we focus on series estimators because of their computational convenience and their high efficiency in imposing additivity. To describe the two-step series estimator we initially consider the first step. For each positive integer L, let r^L(z) = (r_{1L}(z), ..., r_{LL}(z))' be a vector of approximating functions. In this paper we will consider in detail polynomial and spline approximating functions, which will be further discussed below. Let an i subscript index the observations, let n be the total number of observations, and let r_i = r^L(z_i). Let Π̂(z) be the predicted value from a regression of x_i on r^L(z_i),

(3.1)    Π̂(z) = r^L(z)'π̂,    π̂ = (R'R)^{-1}R'[x_1, ..., x_n]',    R = [r_1, ..., r_n]'.

To form the second step, let w = (x', z_1', u')' and let p^K(w) = (p_{1K}(w), ..., p_{KK}(w))' be a vector of approximating functions of w, such that each p_{kK}(w) depends either on (x, z_1) or on u, but not both. Exclusion of the interaction terms will mean that any linear combination of the approximating functions has the same additive structure as in equation (2.1), i.e. that additivity is imposed. Also, let 1(A) denote the indicator function for the event A, and let τ(w) denote a function of the form

(3.2)    τ(w) = Π_{j=1}^{d+d_x} 1(a_j < w_j < b_j),

where a_j and b_j are finite constants, w_j is the jth component of w, and d = d_x + d_1 is the dimension of (x, z_1). This τ(w) is a trimming function, to be further discussed below. Let û_i = x_i - Π̂(z_i), ŵ_i = (x_i', z_{1i}', û_i')', p̂_i = p^K(ŵ_i), and τ̂_i = τ(ŵ_i), where an i subscript refers to the ith observation. The second step is obtained by regressing y_i on p̂_i for each observation where τ̂_i = 1, giving

(3.3)    ĥ(w) = p^K(w)'β̂,    β̂ = (P̂'P̂)^{-1}P̂'Y,    P̂ = [τ̂_1 p̂_1, ..., τ̂_n p̂_n]',    Y = (τ̂_1 y_1, ..., τ̂_n y_n)'.

The trimming function is convenient for the asymptotic theory. It can be used to exclude large values of ŵ that might lead to outliers in the case of polynomial regression. Although we assume that the trimming function is nonrandom, the results would extend to the case where the limits a_j and b_j are estimated.
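A minimal sketch of the two steps in equations (3.1)-(3.3), using ordinary least squares with power-series bases. The data-generating functions, the basis degrees, and the use of the normalization λ_0(0) = 0 from Section 2 are illustrative assumptions, and trimming is omitted for brevity.

```python
import numpy as np

def poly_basis(v, degree):
    """Univariate power basis [1, v, ..., v**degree], shape (n, degree+1)."""
    v = np.asarray(v, dtype=float)
    return np.column_stack([v**j for j in range(degree + 1)])

rng = np.random.default_rng(1)
n = 4000
z = rng.uniform(-1, 1, n)
u = rng.normal(0, 0.5, n)
x = np.sin(1.5 * z) + u                 # reduced form x = Pi0(z) + u
eps = u + rng.normal(0, 0.3, n)         # E[eps|u,z] = E[eps|u] = u
y = x**2 + eps                          # structural function g0(x) = x^2

# Step 1 (eq. 3.1): series regression of x on r^L(z); residuals uhat.
R = poly_basis(z, 4)
uhat = x - R @ np.linalg.lstsq(R, x, rcond=None)[0]

# Step 2 (eq. 3.3): regress y on a basis in x and a basis in uhat,
# with x-u interaction terms excluded so that additivity is imposed.
P = np.column_stack([poly_basis(x, 4), poly_basis(uhat, 4)[:, 1:]])
beta = np.linalg.lstsq(P, y, rcond=None)[0]

# Eq. (3.4): ghat collects the terms depending on x. The normalization
# lambda(0) = 0 holds automatically here, since the u-terms vanish at 0.
def g_hat(x0):
    return poly_basis(np.atleast_1d(x0), 4) @ beta[:5]

grid = np.linspace(-1, 1, 5)
err = np.max(np.abs(g_hat(grid) - grid**2))   # compare with g0 on a grid
print(float(err))
```

With this design the fitted component in x tracks g0(x) = x^2 closely on the grid, even though x is endogenous, because the residual uhat controls for the dependence between x and eps.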
If these limits are estimated from independent data (e.g. in a subsample that is not otherwise used in estimation), then the results described here will still hold.

The estimator ĥ(w) can be used to construct an estimator of the structural function g_0(x, z_1) and of λ_0(u), by collecting those terms that depend only on (x, z_1) and those that depend only on u, respectively. Suppose that p_{1K}(w) = 1 is constant, that p_{jK}(w) depends only on (x, z_1) for 2 ≤ j ≤ K_1, and that the remaining terms depend only on u. Then the estimators can be formed as

(3.4)    ĝ(x, z_1) = ĉ_g + Σ_{j=2}^{K_1} β̂_j p_{jK}(x, z_1),    λ̂(u) = ĉ_λ + Σ_{j=K_1+1}^{K} β̂_j p_{jK}(u).

To complete the definition of this estimator the constant terms ĉ_g and ĉ_λ, satisfying ĉ_g + ĉ_λ = β̂_1, should be specified. These estimators are uniquely defined except for the constant terms. For many objects of estimation, such as predicting how g(x, z_1) changes as x changes, the constants are not important. For example, if g(x, z_1) represents log-demand and x includes the log of price, a constant is not needed for estimation of elasticities (i.e. derivatives of g). In other cases the constant is important; for example, knowing the level of demand is important for predicting tax receipts.

Because of the trimming we cannot use the most familiar condition, E[ε] = 0, to estimate the constants. However, the condition λ_0(ū) = λ̄, which generalizes the linear model, can easily be used to estimate the constants. Choosing

(3.5)    ĉ_λ = λ̄ - Σ_{j=K_1+1}^{K} β̂_j p_{jK}(ū),    ĉ_g = β̂_1 - ĉ_λ,

gives an estimator satisfying λ̂(ū) = λ̄. In this case it will follow that the estimators of the individual components are

(3.6)    ĝ(x, z_1) = ĥ(x, z_1, ū) - λ̄,    λ̂(u) = ĥ(x, z_1, u) - ĥ(x, z_1, ū) + λ̄.

We consider two specific types of series estimators, corresponding to polynomial and spline approximating functions.
To describe these, let μ = (μ_1, ..., μ_{d_z})' denote a multi-index, i.e. a vector of nonnegative integers, with norm |μ| = Σ_{j=1}^{d_z} μ_j, and let z^μ = Π_{j=1}^{d_z} (z_j)^{μ_j}. For a sequence (μ(ℓ))_{ℓ=1}^∞ of distinct such vectors, a power series approximation for the first stage is given by r^L(z) = (z^{μ(1)}, ..., z^{μ(L)})'. To describe a power series approximation in the second stage, let (μ(k))_{k=1}^∞ denote a sequence of distinct multi-indices with dimension the same as w, and such that for each k, w^{μ(k)} depends only on (x, z_1) or on u, but not both. This feature can be incorporated by making sure that the first d components of μ(k) are zero if any of the last d_x components are nonzero, and vice versa. Then for the second stage a power series approximation can be obtained as p^K(w) = (w^{μ(1)}, ..., w^{μ(K)})'. For the asymptotic theory it will be assumed that each multi-index sequence is ordered so that the degree |μ(ℓ)| is monotonically increasing in ℓ or k, and that all terms of a particular order are included before increasing the order.

The theory to follow uses orthonormal polynomials, which may also have computational advantages. For the first step, replacing each power of z by the product of univariate polynomials of the same order, where the univariate polynomials are orthonormal with respect to some distribution, may lead to reduced collinearity. The estimator will be numerically invariant to such a replacement. An analogous replacement could also be used in the second step.

Regression splines are smooth piecewise polynomials with fixed knots (join points). They have been considered in the statistics literature by Agarwal and Studden (1980). They are different than smoothing splines in that the smoothing parameter is the number of knots. They have attractive features relative to power series: they are less sensitive to outliers and to bad approximations over small regions. The theory here requires that the knots be placed in the support of z, which therefore must be known, and that the knots be evenly spaced.
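The exclusion rule for second-stage multi-indices (the first d components zero whenever any of the last d_x components is nonzero, and vice versa) and the ordering by total degree can be written as a short enumeration. The dimensions and maximum degree below are illustrative.

```python
from itertools import product

def additive_multi_indices(d_xz, d_u, max_degree):
    """Multi-indices mu for w = (x, z1, u) with |mu| <= max_degree, keeping
    only terms that involve (x, z1) alone or u alone: the first d_xz entries
    are zero whenever any of the last d_u entries is nonzero, and vice versa,
    so that additivity as in equation (2.1) is imposed."""
    keep = []
    for mu in product(range(max_degree + 1), repeat=d_xz + d_u):
        if sum(mu) > max_degree:
            continue
        uses_xz = any(mu[:d_xz])
        uses_u = any(mu[d_xz:])
        if uses_xz and uses_u:      # interaction term: excluded
            continue
        keep.append(mu)
    # Order by total degree, as assumed for the asymptotic theory.
    return sorted(keep, key=sum)

mus = additive_multi_indices(d_xz=2, d_u=1, max_degree=2)
print(mus)
```

For d_xz = 2, d_u = 1, and degree 2 this keeps the constant, the three linear terms, and the pure quadratic terms in each block, while interaction terms such as (1, 0, 1) are dropped.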
It should be possible to generalize the results to allow the boundary of the support to be estimated and the knots not evenly spaced, but these generalizations lead to further technicalities that we leave to future research.

To describe regression splines it is convenient to assume that a_j = -1, b_j = 1, and that the support of z is a cartesian product of copies of the interval [-1,1]. For a scalar c, let (c)_+ = 1(c > 0)·c. An m-th degree spline with J-1 evenly spaced knots on [-1,1] is a linear combination of

    p_{ℓJ}(v) = v^{ℓ-1},                          1 ≤ ℓ ≤ m+1,
    p_{ℓJ}(v) = {[v + 1 - 2(ℓ-m-1)/J]_+}^m,       m+2 ≤ ℓ ≤ m+J.

In this paper we take m fixed but will allow J to increase with sample size. The approximating functions for z will be products of univariate splines. In particular, for a set of multi-indices {μ(ℓ)}, with μ_j(ℓ) the jth component of μ(ℓ), if the number of knots for the jth component of z is J_j, then the approximating functions could be formed as

(3.7)    r_{ℓL}(z) = Π_{j=1}^{d_z} p_{μ_j(ℓ)J_j}(z_j),    (ℓ = 1, ..., L).

Throughout the paper it will be assumed that L depends on K in such a way that the ratio of the numbers of knots for each pair of elements of z is bounded above and away from zero. Spline approximating functions in the second step could be formed in the analogous way, from products of univariate splines, with additivity imposed by only including terms that depend only on one of (x, z_1) or u.

The theory to follow uses B-splines, which are a linear transformation of the above functions that has lower multicollinearity. The low multicollinearity of B-splines and a recursive formula for calculation also lead to computational advantages; e.g. see Powell (1981).
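The univariate truncated power basis in the display above can be computed directly (in practice a linearly equivalent B-spline basis is preferable numerically, as noted above); the choices m = 3 and J = 4 are arbitrary illustrations.

```python
import numpy as np

def spline_basis(v, m, J):
    """Truncated power basis for an m-th degree spline with J-1 evenly
    spaced knots on [-1, 1]: v^(l-1) for l = 1..m+1, then
    {[v + 1 - 2(l-m-1)/J]_+}^m for l = m+2..m+J."""
    v = np.asarray(v, dtype=float)
    cols = [v**l for l in range(m + 1)]            # 1, v, ..., v^m
    for l in range(m + 2, m + J + 1):
        shift = 1.0 - 2.0 * (l - m - 1) / J        # knot at v = -shift
        cols.append(np.maximum(v + shift, 0.0)**m)
    return np.column_stack(cols)

v = np.linspace(-1, 1, 201)
B = spline_basis(v, m=3, J=4)    # cubic spline, 3 interior knots
print(B.shape)                   # (201, m + J) = (201, 7)
```

Each knot term is zero to the left of its knot and an m-th degree polynomial to the right, so the basis spans piecewise cubics with continuous second derivatives at the knots.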
4. Convergence Rates

In this section we derive mean-square error and uniform convergence rates. To obtain results it is important to impose some regularity conditions. Let ε = y - h_0(w), u = x - Π_0(z), and X = (x, z). Also, for a random matrix Y, let ||Y||_2 = {E[||Y||^2]}^{1/2}, where for a matrix D, ||D|| = [trace(D'D)]^{1/2}, and let ||Y||_∞ denote the infimum of constants C such that Prob(||Y|| < C) = 1.

Assumption 1: {(y_i, x_i, z_i)}, (i = 1, 2, ...), is i.i.d., and Var(x|z) and Var(y|X) are bounded.

The bounded second conditional moment assumption is quite common in the series estimation literature (e.g. Stone, 1985). Relaxing it would lead to complications that we wish to avoid. Next, we impose the following requirement.

Assumption 2: z is continuously distributed with density that is bounded away from zero on its support, and the support of z is a cartesian product of compact, connected intervals. Also, w is continuously distributed, the density of w is bounded away from zero on W = {w : τ(w) = 1}, and W is contained in the interior of the support of w.
the distribution of a one-to-one function of is must be bounded away from zero on an open z that will be smooth under assumptions has at least as many components as z x (z ,n(z)) w Then, because range must be confined to a manifold of smaller dimension than its (z ,n(z)) n{z) the density of IT(z), set. w z. in the z Since the dimension (z ,n(z)). must have as many components as components of z (x,z ). can be allowed for without affecting the Assumption 2 can be weakened to hold only for some component of Also, one could allow some components of distributed, as long as they have finite suppoj-t. z On the other hand, to be discretely it is not easy to extend our results to the case where there are discrete instruments that take on an infinite number of values. Some smoothness conditions are useful for controlling the bias of the series estimators. Assumption and 3: gQ(x,z^) TT (z) and is ^q(u) continuously differentiable of order s on the support of are Lipschitz and continuously differentiable of order If. 15 s z on The derivative orders and s the rate of approximation for In particular, approximate the function. and hence also for X(u), control the rate at which polynomials or splines s 0(K will be h (w), -s/d where ), d is g (x,z ) and the dimension of (x,z^). and The last regularity condition restricts the rate of L. Recall that Assumption denotes the dimension of d Either a) for power series, (K 4: + for splines, (K^ + KL^^^)[(L/n)^''^+L"^/^i] -^ For example, for splines, K if and requires that they grow slower than As a preliminary step expectation estimator Lemma it h(w). is z. at the and that , +L L)[(L/n) i i) U q = 1/2 Also, for sup Let F same rate then 0; or b) s > 2d this condition . denote the distribution function of all = w. ,,,\h(w) - (K/n + k~^^^^ + L/n + hJw)\ L^^/^0. p (J q = for splines, and ^weW = p for power series, 1 (K^KK/n)^^^ subsequent proofs are given + K^^'^ + iUn)^^^ equation (3.4). 
We term a will treat the constant square and uniform convergence rates, so it is + L~^/^i]). Appendix. in the This result leads to convergence rates for the structural estimator separately. > helpful to derive convergence rates for the conditional Sr(w)[h(w)-hJw)]^dFJw) in — If Assumptions 1-4 are satisfied then 4.1: This and n K 0. grow L K growth of the number of terms little g(x,z differently for the helpful to state and discuss ) given mean them For mean square convergence we give a result for mean-square convergence of the de-meaned difference between the estimator and the truth. 16 Let x = E[t(w)]. Theorem If Assumptions 1-4 are satisfied then for 4.2: ST(w)[kx,z^)-ST(w)A(x,z^)F^(dw)/TfF^(dw) = The convergence rate approximating functions in is the sum of K(x,z ) = g(x,z )-g^(x,z (K/n + k'^^^"^ + L/n + C^^/^^i). two terms, depending on the number K the first step and the number in the ) L of Each of second step. these two terms has a form analogous that derived in Stone (1985), which would attain the optimal mean-square convergence rate for estimating a conditional expectation were chosen appropriately. L and L proportional to n conclusion of Theorem 4.2 <^ In particular, i i i if K (1982) it ) if ^ \ satisfied, then the r^ / (max{n r i.e. -2s/(d+2s) , n -2s /(d +2s i i )-, )]. i would be optimal for estimation of n did not have to be estimated in the first step. u n follows that the convergence rates in this expression are optimal for their respective steps, g (x,z is and is ^ From Stone chosen proportional to and Assumption 4 , ,- 1'^ t- /j r , ^rJ/ r / i?/ M- /_; (dw)/T} F (dw) = )F )-St(w)A(x,z Sr(w)[A(x,z ,^ (4.1) is K if Thus, when K and L are chosen appropriately, the mean square error convergence rate of the second step estimator is the slower of the optimal rate for the first step and the optimal rate for the second step that would obtain An important feature n if the first step did not have to be estimated. 
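The variance terms K/n and L/n and the bias terms K^{-2s/d} and L^{-2s_1/d_z} in Lemma 4.1 balance at K proportional to n^{d/(d+2s)} and L proportional to n^{d_z/(d_z+2s_1)}. The sketch below evaluates these choices for illustrative smoothness and dimension values; the particular numbers (s = 2, d = 2, s_1 = 3, d_z = 2) are assumptions chosen for the example.

```python
# Balancing variance (K/n, L/n) against squared bias (K^(-2s/d),
# L^(-2*s1/dz)) in Lemma 4.1: K ~ n^(d/(d+2s)) equates the second-step
# terms and gives the rate n^(-2s/(d+2s)); likewise for L in the first step.
def optimal_terms(n, smooth, dim):
    num_terms = n ** (dim / (dim + 2 * smooth))
    mse_rate = n ** (-2 * smooth / (dim + 2 * smooth))
    return num_terms, mse_rate

n = 10_000
K, rate2 = optimal_terms(n, smooth=2, dim=2)   # second step: s = 2, d = 2
L, rate1 = optimal_terms(n, smooth=3, dim=2)   # first step: s1 = 3, dz = 2
print(round(K), round(L))    # K grows like n^(1/3), L like n^(1/4)
# The overall MSE rate is the slower (larger) of the two; here the
# second step dominates because s1/dz > s/d.
print(max(rate1, rate2))
```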
of this result is that the second step convergence rate the optimal one for a function of a d-dimensional variable rather than is the slower rate that would be optimal for estimation of a general function of additivity was not imposed. estimation. It is h (w) is where used in essentially the exclusion of interaction terms that leads to this result, as discussed in Although a This occurs because the additivity of w, Andrews and Whang full derivation of (1990). bounds on the rate of convergence of estimators of 17 g_^(x,z is ) beyond the scope of this paper, something can be said. n Stone (1982) that estimation of when ) u known. is cannot be better than the best rate when n the rate is attained we know c. vr K .^.^- Setting ^ and . that it u is known, w +. ^ the rates to K be proportional to +K rate will be attained for splines when d is unknown , s , d, where s this rate is s/d s n , and d /(d +2s nil ^, ) s/d ^ /d ^ and s/d means that the second stage We do (relative to its dimension) as the first stage. convergence rate would be when the first stage although in that case we know > For power series the 2. narrow the range of Also, the side conditions in It is s/d s less at least as and step. smooth know what the optimal smooth than the second stage, if K L and are chosen to minimize the mean-square Our estimator could conceivably be suboptimal in that case. Assumption 4 are quite strong and rule out any possibility of optimality of our estimator when not is s/d that the first stage convergence rate would dominate the mean-square error of our estimator error in Theorem 4.1 or 4.2. is ^ that i follows that the optimal second stage it /d s d/(d+2s) where the estimator could attain the best convergence rate for the second The condition when neither the first or the second stage is very smooth 1). interesting to note that the convergence rates in equation (4.1) are only relevant for the function itself and not for derivatives. 
Since ĥ(w) must attain the best rate when g0 is known, and K and L can be chosen to minimize each component of the mean-square error, it follows that for the ranges of s/d described above the rate in Theorem 4.2 is the best attainable mean-square error (MSE) rate for estimation of g0(x,z1), and that the estimator has the best rate; the side conditions of Assumption 4 are even stronger in the uniform case. (We thank a referee for pointing this out and for giving a slightly more general version of this discussion of optimality.)

Because û is plugged into ĥ, one might think that convergence rates for derivatives are also important, as in a Taylor expansion of ĥ(û) around ĥ(u). The reason that they are not is that h0 depends only on the conditioning variables x and z in equation (2.1), a feature that allows us to work with a Taylor expansion of h0(w) rather than ĥ(w).

Theorem 4.2 gives convergence rates that are net of the constant term. It will be interesting to know similar results when the constant term is included. The restriction E[ε] = 0 could be used to identify the constant term, but this condition does not seem to be well motivated in that model and so will not be considered. It is straightforward to obtain uniform convergence rates for ĥ(w) up to the constant term, by just fixing ĥ(w) at some value, so that uniform rates follow immediately. Uniform convergence rates for ĝ(x,z1) will then follow from those for ĥ(w) by specifying a pointwise identification restriction for the constant term. In particular, if the identification restriction λ0(ū) = λ̄ from equation (3.6) is imposed, leading to an estimator ĝ(x,z1) = ĥ(x,z1,ū) − λ̄ equal to a coordinate projection of ĥ(w), then the constant term can be recovered and uniform convergence rates for ĝ(x,z1) in the supremum norm on W follow immediately from those for ĥ(w).
It would also be possible to obtain mean-square rates that include the constant term; these would depend on the identifying assumption for the constant term. With the trimming it is E[τ(w)ε] = 0, rather than the usual restriction E[ε] = 0, that would be used to identify the constant term in the proofs.

Theorem 4.3: If Assumptions 1-4 are satisfied then, for q = 1 for power series and q = 1/2 for splines,

    sup_{(x,z1)∈W} |ĝ(x,z1) − g0(x,z1)| = Op(K^q[(K/n)^{1/2} + K^{−s/d} + (L/n)^{1/2} + L^{−s1/d1}]).

This uniform convergence rate cannot be made equal to Stone's (1982) bounds for either the first or second steps, due to the leading K^q term. Nevertheless, these uniform rates improve on some in the literature, e.g. on Cox (1988), as noted in Newey (1997). Apparently, it is not yet known whether series estimators can attain the optimal uniform convergence rates.

The uniform convergence rate is slower, and the conditions on K and L more stringent, for power series than for splines. Nevertheless we retain the results for polynomials because they have some appealing practical features. They are often used for increased flexibility in functional form. Also, polynomials do not require knot placement, and hence are simpler to use in practice.

We could also derive convergence rates for λ̂(u), and obtain convergence rates identical to those in Theorems 4.2 and 4.3. We omit these results for brevity.

5. Inference

Large sample confidence intervals are useful for inference. We focus on pointwise confidence intervals rather than uniform ones, as in much of the literature. We consider all linear functionals of h, which turn out to include the value of g0(x,z1) at a point. We construct confidence intervals using standard errors that account for the nonparametric first step estimation.
To describe the standard errors and associated confidence intervals it is useful to consider a general framework. For this purpose let h denote a possible true function h0(w), where the w argument is suppressed for convenience; that is, the symbol h represents the entire function. Let a(h) be a particular number associated with h, i.e. a(h) represents a mapping from possible functions to the real line, a functional. The object of interest is assumed to be the value θ0 = a(h0) of the functional at h0, and the estimator of θ0 is θ̂ = a(ĥ). In this Section we develop standard errors for θ̂ and give asymptotic normality results that allow formation of large sample confidence intervals.

This framework is general enough to include many objects of interest, such as the value of g0 at a point. For example, under the restriction λ0(ū) = λ̄ discussed earlier, the value of g0(x,z1) at a point can be represented as

    (5.1)  a(h) = h(x,z1,ū) − λ̄.

A general inference procedure for linear functionals will apply to the functional of this equation, i.e. can be used in the construction of large sample confidence intervals for the value of g0 at a point.

Another example is a weighted average derivative, a(h) = ∫ v(w)[∂h(w)/∂w1]dw. This object is particularly interesting when g0(x,z1) is an index model that depends on w1 only through a linear combination, say g0(w) = t0(w1'γ), where t0(q) is a differentiable function with derivative tq(q). Then for any v(w) such that v(w)tq(w1'γ) is integrable,

    (5.2)  ∫ v(w)[∂h0(w)/∂w1]dw = [∫ tq(w1'γ)v(w)dw]γ,

so that the weighted average derivative is proportional to γ, and is also a linear functional of h. This functional may also be useful for summarizing the slope ∂g0(w)/∂w1 of g0(w) over some range, as discussed in Section 7.
Unlike the value of g0 at a point, the weighted average derivative functional will be √n-consistent under regularity conditions discussed below.

We can derive standard errors for a class of linear functionals general enough to include many important examples, such as the value of g0 at a point or a weighted average derivative. The estimator θ̂ = a(ĥ) is a natural "plug-in" estimator. For example, the value of g0 at a point could be estimated by applying the functional in equation (5.1) to ĥ, to obtain ĝ(x,z1) = a(ĥ) = ĥ(x,z1,ū) − λ̄. In general, linearity of a(h) implies that

    (5.3)  θ̂ = Aβ̂,   A = (a(p1K), ..., a(pKK)).

Because θ̂ is a linear combination of the two-step least squares coefficients β̂, a natural estimator of the variance can be obtained by applying the formula for parametric two-step estimators (see Newey, 1984). As for other series estimators, such as in Newey (1997), this should lead to correct inference asymptotically, because it accounts properly for the variability of the estimator. Let ⊗ denote the Kronecker product and

    (5.4)  Q̂ = P'P/n,   Σ̂ = Σ_{i=1}^n τ_i p_i p_i'[y_i − ĥ(w_i)]²/n,   Q̂1 = I ⊗ (R'R/n),
           Σ̂1 = Σ_{i=1}^n (û_i û_i') ⊗ (r_i r_i')/n,   Ĥ = Σ_{i=1}^n τ_i[∂ĥ(w_i)/∂u]' ⊗ p_i r_i'/n.

The variance estimate for θ̂ is then

    (5.5)  V̂ = A Q̂^{−1}[Σ̂ + Ĥ Q̂1^{−1} Σ̂1 Q̂1^{−1} Ĥ'] Q̂^{−1} A'.

This estimator is like that suggested by Newey (1984) for parametric two-step regressions. It is equal to a term A Q̂^{−1} Σ̂ Q̂^{−1} A', that is the standard heteroskedasticity consistent variance estimator, plus an additional nonnegative definite term that accounts for the presence of û. It has this additive form, where the variance of the two-step estimator is larger than the first term, because the second step conditions on the first step dependent variables, making the second and first steps orthogonal. We will show that under certain regularity conditions √n V̂^{−1/2}(θ̂ − θ0) converges in distribution to N(0,1), so that doing inference as if θ̂ were distributed normally with mean θ0 and variance V̂/n will be a correct large sample approximation.
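For concreteness, the variance formula in (5.4)-(5.5) can be sketched in code for the case of scalar x, where the Kronecker products reduce to ordinary products. This is a minimal sketch under that assumption; the variable names are ours and this is not the authors' code:

```python
import numpy as np

def two_step_variance(A, P, R, y_resid, u_hat, dh_du, tau=None):
    """Sketch of (5.4)-(5.5) for scalar x.

    A:       (q, K) matrix of functional values (a(p_1K), ..., a(p_KK))
    P:       (n, K) second-step regressor matrix
    R:       (n, L) first-step regressor matrix
    y_resid: (n,) second-step residuals y_i - h_hat(w_i)
    u_hat:   (n,) first-step residuals
    dh_du:   (n,) derivative of h_hat with respect to u at w_i
    tau:     (n,) trimming indicators (defaults to all ones)
    """
    n = P.shape[0]
    if tau is None:
        tau = np.ones(n)
    Q = P.T @ P / n
    Sigma = (P * (tau * y_resid**2)[:, None]).T @ P / n
    Q1 = R.T @ R / n
    Sigma1 = (R * (u_hat**2)[:, None]).T @ R / n
    H = (P * (tau * dh_du)[:, None]).T @ R / n
    Qi, Q1i = np.linalg.inv(Q), np.linalg.inv(Q1)
    # heteroskedasticity-consistent term plus the nonnegative definite
    # correction for the estimated first step
    middle = Sigma + H @ Q1i @ Sigma1 @ Q1i @ H.T
    return A @ Qi @ middle @ Qi @ A.T
```

Setting `dh_du` to zero recovers the standard heteroskedasticity consistent variance, so the first-step correction is exactly the additional nonnegative definite term described in the text.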
This result extends large sample inference results of Andrews (1991) and Newey (1997) to account for an estimated first step. However, as in this previous work, such results do not specify the convergence rate of θ̂; that rate will depend on how fast V̂ goes to infinity. To date there is still not a complete characterization of the convergence rate of a series estimator available in the literature, although it is known when √n-consistency occurs (see Newey, 1997). Those results can be extended to obtain √n-consistency of functionals of the two-step estimators developed here.

Let ‖·‖ = (E[τ(w)(·)²]/τ̄)^{1/2} denote a trimmed mean-square norm, and let ℋ denote the set of functions that can be approximated arbitrarily well in this norm by a linear combination of p^K(w) for large enough K. It is well known, for p^K(w) corresponding to power series or splines, that ℋ includes all additive functions of (x,z1) and u where the individual components have finite norm. The critical condition for √n-consistency is that there is ν(w) ∈ ℋ such that for any h(w) ∈ ℋ,

    (5.6)  a(h) = E[τ(w)ν(w)h(w)].

Essentially this means that a(h) can be represented as an expected product of h(w) with a function ν(w) in the same space as h. By the Riesz representation theorem, this is the same as the functional a(h) being continuous with respect to the mean square norm. In this case √n(θ̂ − θ0) will be asymptotically normal, and its asymptotic variance can be given an explicit expression.
To describe the asymptotic variance in the √n-consistent case, let ρ(z) = E[τ(w)ν(w)∂h0(w)/∂u'|z]. The asymptotic variance of the estimator can then be expressed as

    (5.7)  V = E[τ(w)ν(w)ν(w)'Var(y|X)] + E[ρ(z)Var(x|z)ρ(z)'].

For example, for the weighted average derivative in equation (5.2), integration by parts shows that equation (5.6) is satisfied with ν(w) = −Proj(f0(w)^{−1}∂v(w)/∂x|ℋ) if v(w) is zero where τ(w) is zero, where f0(w) is the density of w and Proj(·|ℋ) denotes the mean-square projection on ℋ, the space spanned by functions that are additive in (x,z1) and u. The projection Proj(·|ℋ) is necessary to make ν(w) lie in ℋ, the set of functions that can be approximated by p^K(w) for large K. Then

    ρ(z) = −E[τ(w){Proj(f0(w)^{−1}∂v(w)/∂x|ℋ)}∂h0(w)/∂u'|z],

and the asymptotic variance will be V from equation (5.7).

To state precise results on asymptotic normality it is helpful to introduce some regularity conditions. Recall that η = y − h0(w).

Assumption 5: E[‖u‖⁴|X] is bounded, Var(y|X) is bounded and bounded away from zero, E[η⁴|X] is bounded, and h0(w) is twice continuously differentiable in u with bounded first and second derivatives.

This condition strengthens the bounded conditional second moment condition to boundedness of the fourth conditional moment, and adds the condition that the conditional variance is bounded away from zero.

The next assumption is the key condition for √n-consistency of the estimator. It requires that the functional have a representation as an expected product, at least for the truth h0 and the approximating functions.

Assumption 6: There exists ν(w) such that a(h0) = E[τ(w)ν(w)h0(w)], a(p_kK) = E[τ(w)ν(w)p_kK(w)] for large K, E[τ(w)‖ν(w)‖²] < ∞, and there exists β_K with E[τ(w)‖ν(w) − p^K(w)'β_K‖²] → 0 as K → ∞.
This condition requires that ν(w) can be approximated well by a linear combination of the approximating functions. For example, with the average derivative this condition leads ν(w) to be the projection of f0(w)^{−1}∂v(w)/∂x on functions that are additive in (x,z1) and u, which is important for the precise form that ν(w) must take.

The next condition is complementary to Assumption 6. For a d_w × 1 vector μ of nonnegative integers, with d_w = d + d_x the dimension of w, let |μ| = Σ_j μ_j, let ∂^μ h(w) = ∂^{|μ|}h(w)/∂w_1^{μ_1}···∂w_{d_w}^{μ_{d_w}}, and for a nonnegative integer δ let |h|_δ = max_{|μ|≤δ} sup_w |∂^μ h(w)|.

Assumption 7: a(h) is a scalar, |a(h)| ≤ C|h|_δ for some nonnegative integer δ, and there exists β_K such that, as K → ∞, a(p^K'β_K) is bounded away from zero while E[τ(w){p^K(w)'β_K}²] → 0.

This assumption says that a(h) is continuous in the norm |h|_δ but not in mean square. The lack of mean-square continuity will imply that the estimator is not √n-consistent. Another restriction imposed is that a(h) is a scalar, which is general enough to cover many cases of interest. When a(h) is a vector, asymptotic normality with an estimated covariance matrix would follow from Assumption iii) of Andrews (1991), but that assumption of Andrews (1991) is difficult to verify. In contrast, Assumption 7 is a primitive condition that is relatively easy to verify, and general enough to include many cases of interest. For example, as noted above, Assumption 6 includes the weighted average derivative. For another example, Assumption 7 obviously applies to the functional a(h) = h(x,z1,ū) − λ̄ from equation (5.1), which is continuous with respect to the supremum norm, so it is straightforward to show that Assumption 7 is satisfied with δ = 0. The next condition restricts the allowed rates of growth for K and L.
Assumption 8: √n K^{−s/d} → 0, √n L^{−s1/d1} → 0, and (K³L + KL³)/n → 0 for splines and (K⁴L + KL⁴)/n → 0 for power series.

These conditions imply that if K and L grow at the same rate, they can grow no faster than n^{1/4} for splines and n^{1/5} for power series. One can show that for power series this condition requires s/d > 5/2 and s1/d1 > 5/2, as pointed out by a referee. These conditions are stronger than for single step series estimators, as discussed in Newey (1997), because of complications due to the multistage nature of the estimation problem. The next condition is useful for the estimator V̂ of the asymptotic variance to have the right properties.

Assumption 9: Either a) x is univariate, any spline used is at least a quadratic one, and √n K^{1−s} → 0; or b) x is multivariate, p_K(u) is a power series, h0(w) is differentiable of all orders with the absolute value of the jth derivative bounded above by a constant C times (Cj)^j, and √n K^{−ε} → 0 for some ε > 0.

This assumption helps guarantee that ∂h0(w)/∂u can be approximated well enough for consistency of the covariance matrix estimator. These conditions are not very general, but more general ones would require series approximation rates for derivatives, which are not available in the literature. At least they are useful for some set of regularity conditions under which the asymptotic variance estimator will be consistent. Next, for the nonnegative integer δ of Assumption 7, let ζ_δ(K) = max_{|μ|≤δ} sup_w ‖∂^μ p^K(w)‖.

Theorem 5.1: If Assumptions 1-3, 5, either 6 or 7, and 8 and 9 are satisfied, then √n V̂^{−1/2}(θ̂ − θ0) converges in distribution to N(0,1) and θ̂ − θ0 = Op(ζ_δ(K)√K/√n). Furthermore, if Assumption 6 is satisfied, then √n(θ̂ − θ0) converges in distribution to N(0,V) and V̂ converges in probability to V.

In addition to the asymptotic normality needed for large sample confidence intervals, this result gives convergence rate bounds and, when Assumption 6 holds, √n-consistency.
Bounds on the convergence rate can also be derived from the results of Newey (1997), where it is shown that for power series ζ_δ(K) ≤ CK^{1+2δ} and for splines ζ_δ(K) ≤ CK^{1/2+δ}, where C is a generic positive constant. Thus, for example, when a(h) = h(x,z1,ū) − λ̄, then δ = 0, and an upper bound on the convergence rate for power series would be K^{3/2}/√n and for splines K/√n. When Assumption 9 b) is satisfied, so that h0(w) has derivatives of all orders, and Π0(z) also satisfies the same condition, the convergence rate can be made close to 1/√n by letting K grow slowly enough. In particular, for every ε > 0, if K = Cn^ε and L = Cn^ε then all of the conditions will be satisfied, and the convergence rate for a(h) = h(x,z1,ū) − λ̄ will then be close to 1/√n for ε small enough.

The centering of the limiting distribution at θ0 in this result is a reflection of the "over fitting" that is present, meaning that the bias of these estimators shrinks faster than the variance. The order of the bias is K^{−s/d} + L^{−s1/d1}, so that Assumption 8 requires that the bias shrink faster than 1/√n. Since 1/√n is generally the fastest that the standard deviation of these estimators can shrink (such as for the sample mean), over fitting is present. This feature of the asymptotic normality result is shared by other theorems for over fitting series estimators, as in Andrews (1991) and Newey (1997). When combined with the condition that K and L go to infinity, the bias shrinking faster than 1/√n means that the convergence rate for the estimator is bounded away from the optimal rate. This is different than asymptotic normality results for other types of nonparametric estimators, such as kernel regression.
It would be good to relax this, an extension that is beyond the scope of this paper.

6. Additive Semiparametric Models

In economic applications there is often a large number of covariates, thereby making nonparametric estimation problematic. The well known curse of dimensionality can make the estimation of large dimensional nonparametric models difficult. One approach to this problem is to restrict the model in some way while retaining some nonparametric features. The single index model discussed earlier is one example of such a restricted model. Another example is a model that is additive in some components and parametric in others. This type of model has received much attention in the literature as an approach to dimension reduction. Furthermore, it is particularly easy to estimate using a series estimator, by simply imposing restrictions on the approximating functions.

To describe this type of model, let w = (x,z1) as before, and let w_j (j = 0,1,...,J) denote J+1 subvectors of w. A semiparametric additive model is

    (6.1)  g0(w) = w0'γ + Σ_{j=1}^J g_j(w_j).

This model can be estimated by specifying that the approximating functions consist of the elements of w0 and power series or splines in each w_j (j = 1,...,J). One could also impose similar restrictions on the reduced form Π0(z). For example, one could specify a partially linear reduced form as

    (6.2)  Π_m(z) = z0'π_m + Σ_{j=1}^J Π_{mj}(z_j),  (m = 1,...,d_x),

where the z_j are subvectors of z. This restriction could be imposed in estimation by specifying that the approximating functions r^L(z) consist of the elements of z0 and power series or splines in each z_j.

The coefficients γ of equation (6.1) are examples of mean-square continuous functionals of h, so that a series estimator will be √n-consistent under the conditions of Section 5. Let q(w) be the residual from the mean-square projection of w0 on functions of the form Σ_{j=1}^J g_j(w_j) + λ(u), for the probability measure P(B) = E[τ(w)1(w∈B)]/E[τ(w)]. Assume that E[τ(w)q(w)q(w)'] is nonsingular, which is an identification condition for γ.
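Imposing (6.1) on a series estimator amounts to building the regressor matrix from a linear block for w0 plus a separate univariate block for each w_j, with no interaction terms. A minimal sketch of that construction follows; it is our own illustration with polynomial blocks and hypothetical names, not code from the paper:

```python
import numpy as np

def additive_design(W0, W_parts, degree=3):
    """Series design matrix for the additive model (6.1): an intercept,
    a linear block for the parametric part w0, and a separate polynomial
    block of the given degree for each additive component w_j.
    W0 is (n, k); each element of W_parts is (n, 1)."""
    blocks = [np.ones((W0.shape[0], 1)), W0]
    for Wj in W_parts:
        blocks += [Wj**k for k in range(1, degree + 1)]  # no interactions
    return np.hstack(blocks)
```

Least squares on this matrix is the restricted series estimator; the coefficients on the W0 block estimate γ.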
Given this nonsingularity, ν(w) = (E[τ(w)q(w)q(w)'])^{−1}q(w), and √n-consistency and asymptotic normality of the series estimator for γ follow from the results of Section 5. Robinson (1988) also gave some instrumental variable estimators for a semiparametric model that is linear in endogenous variables.

The regularity conditions of Section 5 can be weakened somewhat for this model. Basically, only the nonparametric part of the model need satisfy these conditions, so that, for example, the elements of w0 included in w need not be continuously distributed. One could also specify a partially linear structure for λ(u) and the reduced form, by specifying that they depend on a linear combination plus some additive components.

7. Empirical Example

We illustrate our approach by investigating the empirical relationship between the hourly wage rate and the number of annual hours worked. While early examinations treated the hourly wage rate as independent of the number of hours worked, recent studies have incorporated an endogenous wage rate (see, for example, Moffitt 1984, Biddle and Zarkin 1989, and Vella 1993), and have uncovered a non-linear relationship between the wage rate and annual hours. The theoretical underpinnings of this relationship can be assigned to an increasing importance of labor when a greater number of hours is involved (see Oi 1962). Alternatively, Barzel (1973) argues that labor is relatively unproductive at low hours of work due to related start up costs. Moreover, at high hours of work Barzel argues that fatigue decreases labor productivity. These forces will generate hourly wage rates that initially increase but subsequently decrease as daily hours of work increase. Finally, the taxation/labor supply literature argues that a non-linear relationship may exist between hours and wage rates due to the progressive nature of taxation rates.
Thus the non-linear relationship between hours and wage rates is potentially the outcome of many influences, and we capture it in the following model:

    (7.1)  y_i = z1_i'β + g20(x_i) + ε_i,   x_i = z_i'γ + u_i,

where y_i is the log of the hourly wage rate of individual i, x_i is annual hours worked, z1_i is a vector of individual characteristics, z_i is a vector of exogenous individual characteristics that includes z1_i, β and γ are parameters, g20 is an unknown function, and ε_i and u_i are zero mean error terms such that E[ε|u] ≠ 0. This is a semiparametric version of equation (1.1), like that discussed in Section 6, with a parametric reduced form. We use this specification because there are many individual characteristics, so that it would be difficult to apply fully nonparametric estimation due to the curse of dimensionality.

We estimate this model using data on males from the 1989 wave of the Michigan Panel Survey Of Income Dynamics. To preserve comparability with previous empirical studies we follow similar data exclusions to those in Biddle and Zarkin (1989). Accordingly, we include only males aged between 22 and 55 who had worked between 1000 and 3500 hours in the previous year. This produced a sample of 1314 observations. Our measures of hours and wages are annual hours worked and the hourly wage rate respectively.

We first estimate the wage equation by OLS and report the relevant parameters in column 1 of Table 1. The linear hours effect is small and not significantly different from zero. Adjusting for the endogeneity of hours via linear two stage least squares (2SLS), however, indicates that the impact of hours is statistically significant although the effect is small. The results in column 2, on the basis of the t-statistic for the residuals, suggest hours are endogenous to wages.

To allow for the function g to be non-linear we employed alternative specifications in which we account for the endogeneity of hours through various approximations for the manner in which the residuals enter. We also allow for some non-linearities in the reduced form by including non-linear terms for the experience and tenure variables.
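The two-step estimator of (7.1) is straightforward to sketch: OLS of hours on the instruments, then OLS of log wages on the individual characteristics, a polynomial in hours, and a polynomial in the first-step residuals. The code below is our own hypothetical illustration of this control-function recipe, not the authors' code:

```python
import numpy as np

def two_step_estimate(y, x, Z, Z1, p=4, r=3):
    """Two-step estimator for model (7.1), a sketch.
    Step 1: OLS of x (hours) on instruments Z; keep residuals u_hat.
    Step 2: OLS of y on Z1, a degree-p polynomial in x, and a
    degree-r polynomial in u_hat (p and r as used in the text)."""
    gamma = np.linalg.lstsq(Z, x, rcond=None)[0]
    u_hat = x - Z @ gamma
    X2 = np.hstack([np.ones((len(y), 1)), Z1]
                   + [x[:, None]**k for k in range(1, p + 1)]
                   + [u_hat[:, None]**k for k in range(1, r + 1)])
    beta = np.linalg.lstsq(X2, y, rcond=None)[0]
    return beta, u_hat
```

In a simulated triangular model the coefficients of g are recovered despite E[ε|u] ≠ 0, which is the point of including the residual polynomial.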
We discriminate between alternative specifications on the basis of the cross validation (CV) criterion. That is, we first choose the number of non-linear terms in the first step reduced form specification and then, by employing the residuals from the chosen first step specification, we determine the number of approximating terms in the primary equation. The CV criterion is well known to minimize the asymptotic mean-square error when first step estimation is not present, so we are hopeful that it will lead to estimates with good properties here. Below we consider its properties in a Monte Carlo study, and find that CV performs well. Because the theory requires over fitting, where the bias is smaller than the variance asymptotically, it seems prudent to consider specifications with more terms than CV gives, particularly for inference. Accordingly, for both steps we identify the number of terms that minimizes the CV criterion and then add an additional term. For example, while the approximation which minimized the CV criterion for the first step was a fourth order polynomial in tenure and experience, we generate the residuals from a model which includes a fifth order polynomial in these variables. Note that, to enable comparisons, these were the residuals which were included in column 2 discussed above.

The CV values for a subset of the second step specifications examined are reported in Table 2. Although we also experimented with splines in each step, we found that the specifications involving polynomial approximations appeared to dominate. The preferred specification is a fourth order polynomial in hours while accounting for the endogeneity through a third order polynomial in the residuals. The CV criterion for this specification is 182.643.
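The CV step can be sketched with the standard leave-one-out shortcut for least squares, together with the add-one-term over fitting rule described above. This is a hypothetical sketch with names of our own choosing, not the authors' code:

```python
import numpy as np

def loo_cv(X, y):
    """Leave-one-out CV criterion for an OLS series fit, computed with
    the hat-matrix shortcut: loo residual = residual / (1 - leverage)."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    resid = y - H @ y
    return float(np.sum((resid / (1.0 - np.diag(H)))**2))

def choose_order(x, y, max_degree=6):
    """CV-minimizing polynomial order plus one extra term (over fitting)."""
    cvs = [loo_cv(np.vander(x, d + 1), y) for d in range(1, max_degree + 1)]
    return int(np.argmin(cvs)) + 1 + 1  # +1 to 1-based degree, +1 extra term
```

The same routine can be applied in each step, first to the reduced form and then to the primary equation with the first-step residuals included among the regressors.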
The third column of Table 1 presents the hours coefficients while excluding the residuals. The fourth column of Table 1 reports the estimates of the hours profile employing these approximations. Due to the over fitting requirement we now take as a base case the specification with a fifth order polynomial in hours and a fourth order polynomial in the residual. We will also check the sensitivity of some results to this choice.

While Table 1 suggests the non-linearity and endogeneity are statistically important, it is useful to examine their respective roles in determining the hours profile. For comparison purposes Figure 1 plots the change in the log of the hourly wage rate as annual hours increase, where the nonparametric specification is the over fitted case with h = 5 and r = 4. To facilitate comparisons each function has been centered at its mean over the observations, and the graph shows the deviation of the function from its mean. As annual hours affect the log of the hourly wage rate in an additive manner, we simply plot the impact of hours on the log of the hourly wage rate. In Figure 1 we plot the quadratic 2SLS estimates like those of Biddle and Zarkin (1989), and the profile is very similar to theirs. In Figure 1 we also plot the predicted relationship between hours and wages for these data with and without the adjustment for endogeneity using our base specification. It indicates the relationship is highly non-linear. Furthermore, failing to adjust for the endogeneity leads to incorrect inferences regarding the overall impact of hours on wages and, more specifically, the value of the turning points.

Although we found that the polynomial approximations appeared to best fit the data, we also identified the best fitting spline approximations. The parameters we allow to be chosen by the data are the number of join points for the first and second steps, and we over fit in this instance by including an additional join point, in each step, above what is determined on the basis of the CV values.
Note that the best fitting approximation involved cubic splines and 1 join point in the reduced form, with a primary specification of 1 join point in the hours function and 1 join point for the residuals. In Figure 2 we plot the predicted relationship between hours and the wage rate from the over fitted spline approximation. This predicted relationship looks remarkably similar to that in Figure 1. Furthermore, the points noted above related to the failure to account for the endogeneity are similar.

Figures 1 and 2 indicate that at high numbers of hours the implied decrease in the wage rate is sufficient to decrease total earnings. This may be due to the small number of observations in this area of the profile and the associated large standard errors. Figures 3 and 4 explore this issue by plotting the 95 percent confidence intervals for our adjusted nonparametric estimates of the wage hours profile. They confirm that the estimated standard errors are large at the upper end of the hours range.

To quantify an average measure of the effect of hours on wages we consider the weighted average derivative of the estimated function over the range where the function is shown in Figures 1 and 2 to be upward sloping, 1300 to 2500 hours. This region represents 81 percent of the total sample. There were only 125 observations above 2600 hours, so that not very many were excluded by ignoring the upper tail of the hours profile. The average derivative for the polynomial approximation, based on h = 5 and r = 4, is .000513 with a standard error of .000155. The corresponding estimate from our spline approximation of the weighted average derivative is .000505 with a standard error of .000147. These estimates are quite different than the linear 2SLS estimate reported in Table 1, which is consistent with the presence of nonlinearity.
Furthermore, for the quadratic 2SLS the average derivative is .000350 with a standard error of .000142, which is also quite different than the average derivative for the nonparametric estimators. This difference is present despite the relatively wide pointwise confidence bands in Figures 3 and 4. Thus, in our example the average wage change for most of the individuals in the sample is found to be substantially larger for our nonparametric approach than by linear or quadratic 2SLS.

Before proceeding we examined the robustness of these estimates of the average derivative to the specification of the reduced form and to alternative approximations of the wage hours profile. First, we reduced the number of approximating terms in the reduced form. While we found that the CV criteria for each step increased with the exclusion of these terms in the reduced form, we found that there was virtually no impact on the total profile and a very minor effect on our estimate of the average derivative of the profile for the region discussed above. For example, consider the estimates from the polynomial approximation. When we included only linear terms in the reduced form the estimated average derivative is .000515 with a standard error of .0000175. The corresponding values for the specification including quadratic terms are .000546 with a standard error of .000174. Finally, the addition of cubic terms resulted in an estimate of .000522 with a standard error of .000153. Changes in the specification of the model based on the spline approximations produced similarly small differences.

We also explored the sensitivity of the average derivative estimate to the number of terms in the series approximation. Both the average derivative and its standard error were relatively unaffected by increasing the number of terms in the approximations.
For example, with an 8th order polynomial in hours and a 7th order polynomial in the residual the average derivative estimate is .000519 with a standard error of .000156.

Finally, we examine whether we are able to reject the additive structure implied by our model. We do this by including an interaction term capturing the product of hours and the residuals in our preferred specification. The t-statistic on this included variable was .231. The CV value for this specification is 182.924, while that for our preferred specification plus this interaction term and the term capturing the product of hours squared and the residual squared is 183.188. On the basis of these results we conclude there is no evidence against the additive structure of the model.

In addition to this empirical evidence we provide simulation evidence featuring the data examined above. The objective of this exercise is to explore the ability of our procedure to accurately estimate the unknown function in the empirical setting examined above. We simulate the endogenous variable through the exogenous variables and parameter estimates from each of the polynomial and spline approximations reported above. We generate the hours variable by simulating the reduced form and incorporating a random component drawn from the reduced form empirical residual vector. From this simulated hours vector we then estimate the residuals and generate the higher order terms for hours. We then simulate wages by employing the simulated values of hours and residuals, along with the true exogenous variables, and the parameter estimates discussed above. The random component in the wage equation is drawn from the distribution of the empirical residuals. As we employ the parameter vector from the over fitted models, the polynomial model has a fifth order polynomial in the reduced form while the wage equation has a 5th order polynomial in hours and a fourth order polynomial in the residuals.
The spline approximation employs a cubic spline with 2 join points for hours in the reduced form, while the wage equation has 2 join points for hours and 2 join points for the residuals. We examine the performance of our procedure for 3000 replications of the model.

In order to relax the parametric assumptions of the model we use the CV criterion in each step of the estimation. That is, for the model which employs the polynomial approximation we first choose the number of approximating terms in the reduced form. Then, on the basis of the residuals from the fitted reduced form, we choose the number of approximating terms in the wage equation. For the model generated by the spline approximation we do the same except we employ a cubic spline and choose the number of join points. For both approximations we trim the bottom and top two and a half percent of observations, on the basis of the hours residuals, from the data for which we estimate the wage equation.

An important aspect of our procedure is the ability of the CV method to identify the correct specification. To examine this, for the polynomial model we computed the CV criteria for all 25 specifications of the wage equation combining up to a fifth order polynomial in hours and a fifth order polynomial in residuals. As there were also 5 choices in the reduced form, this generated 125 possibilities. For the spline model we examined up to 3 join points in the reduced form and then 3 each for hours and the residuals in the wage equation. This produced 27 possible specifications. From the Monte Carlo data we can also evaluate the performance of an average derivative estimator like that applied to the actual data. Once again we fit the estimator by choosing the number of terms that minimizes the CV criterion, plus one additional term, and we computed standard errors for the same specification.

First consider the results from the polynomial model. The mean of the average derivative estimates, across the Monte Carlo replications, was .000456, which represents a bias of 8.9 percent. The standard error across the replications was .000153, while the average of the estimated standard errors was .000152. Therefore the estimated standard errors accurately reflect the variability of the estimator in this experiment. For the spline approximation the average estimate was .000442, which represents a bias of 11.1 percent. The standard error across the replications was .000145, while the average of the estimated standard errors was .000144.

We also computed rejection frequencies for tests that the average derivative was equal to its true value. For the polynomial model the rejection rates at the nominal 5, 10 and 20 percent significance levels were 6.5, 11.8 and 22.5 percent respectively. The corresponding figures for the spline model were 7.4, 13.1 and 24.0 percent. Thus, asymptotic inference procedures turned out to be quite accurate in this experiment, lending some credence to our inference in the empirical example.

Appendix

We will first prove one of the results on identification.

Proof of Theorem 2.3: To show identification we will use Theorem 2.1. By differentiability of g and λ, it suffices to consider δ(x,z₁) and γ(u) that are differentiable. Consider such functions and suppose that

0 = δ(x,z₁) + γ(u) = δ(Π₀(z)+u, z₁) + γ(u),

identically in z and u. Differentiating with respect to u, with respect to z₁, and with respect to the remaining components z₂ of z then gives

0 = δ_x(x,z₁) + γ_u(u),   0 = [∂Π₀(z)/∂z₁]'δ_x(x,z₁) + δ_z₁(x,z₁),   0 = [∂Π₀(z)/∂z₂]'δ_x(x,z₁).

If ∂Π₀(z)/∂z₂ is of full rank, it follows from the last equation that δ_x(x,z₁) = 0. It then follows from the first two equations that γ_u(u) = 0 and δ_z₁(x,z₁) = 0. Also, if δ(x,z₁) + γ(u) = 0 holds only with probability one, then by continuity of the functions and the boundary of the support having zero probability, the equality holds identically on the interior of the support of (u,z). Then, as above, all the partial derivatives of δ(x,z₁) and γ(u) are zero with probability one, implying that these functions are constant. Identification then follows by Theorem 2.1. QED.
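Before turning to the general results, it may help to see the two-step estimator in computational form. The sketch below is an illustrative reconstruction, not the authors' code: the series orders, the use of simple power series, and all names are our own choices.

```python
import numpy as np

def poly(v, order):
    # power-series approximating functions [v, v^2, ..., v^order]
    return np.column_stack([v ** k for k in range(1, order + 1)])

def two_step_series(y, x, z, order_rf=4, order_x=4, order_u=4):
    """Two-step series estimator for y = g(x) + lambda(u) + e with
    E[e|u,z] = E[e|u] and x = Pi(z) + u.

    Step 1: series regression of x on functions of z; keep residuals.
    Step 2: series regression of y on powers of x and powers of the
    first-step residual (the additive specification used in the paper)."""
    n = len(y)
    # step 1: reduced form and residuals
    R = np.column_stack([np.ones(n), poly(z, order_rf)])
    g_rf, *_ = np.linalg.lstsq(R, x, rcond=None)
    u_hat = x - R @ g_rf
    # step 2: additive series regression with the residual as a regressor
    P = np.column_stack([np.ones(n), poly(x, order_x), poly(u_hat, order_u)])
    beta, *_ = np.linalg.lstsq(P, y, rcond=None)
    fitted = P @ beta
    return beta, u_hat, fitted
```

The average derivative reported in the empirical section then corresponds to averaging the analytic derivative of the fitted polynomial in x over the (trimmed) sample.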
To avoid repetition, it is useful to prove consistency and asymptotic normality for a general two-step weighted least squares estimator. Throughout the Appendix, C will denote a generic positive constant that may be different in different uses. To state the Lemmas some notation is needed. Let X be a vector of variables that includes a vector of functions of z among its components, and let π represent a possible value of Π(z) = E[x|z]. Then for w = w(X,π), where w(X,Π₀(z)) and Π₀(z) are as in the body of the paper, it will be assumed that

(A.1)  E[y|X] = h₀(wᵢ),  wᵢ = w(Xᵢ,Π₀(zᵢ)),  E[x|z] = Π₀(z).

Also, let ψᵢ = ψ(Xᵢ) be a scalar, nonnegative function of Xᵢ, let τᵢ = τ(wᵢ) and W = {w : τ(w) = 1}, and let

(A.2)  β̂ = (P̂'P̂)⁻¹P̂'Y,  ĥ(w) = p^K(w)'β̂,  P̂ = [τ̂₁ψ₁p̂₁, ..., τ̂ₙψₙp̂ₙ]',  p̂ᵢ = p^K(ŵᵢ).

For each nonnegative integer j, let ζⱼ(K) = max_{|μ|≤j} sup_{w∈W} ‖∂^μ p^K(w)‖ and ξⱼ(L) = max_{|μ|≤j} sup_z ‖∂^μ r^L(z)‖.

Assumption A1: i) ψ(X) is bounded and wᵢ is continuously distributed with bounded density; ii) for each K and L, E[τ(w)ψ(X)²p^K(w)p^K(w)'] and E[R^L(z)R^L(z)'] have smallest eigenvalues that are bounded away from zero, uniformly in K and L; iii) w(X,π) either does not depend on π or is Lipschitz in π; iv) there exist α, α₁ > 0 and nonsingular constant matrices B and B₁ such that, for P̄^K(w) = Bp^K(w) and R̄^L(z) = B₁r^L(z), there are β_K and γ_L with sup_W |h₀(w) − P̄^K(w)'β_K| ≤ CK^(-α) and sup_z ‖Π₀(z) − γ_L'R̄^L(z)‖ ≤ CL^(-α₁); v) ζ₀(K)²K/n → 0 and ξ₀(L)²L/n → 0.

Lemma A1: If Assumptions 5 and A1 are satisfied, then

∫τ(w)ψ(X)²[ĥ(w) − h₀(w)]²dF₀(X) = O_p(K/n + K^(-2α) + L/n + L^(-2α₁)),

|ĥ − h₀|₀ = O_p(ζ₀(K)[(K/n)^(1/2) + K^(-α) + (L/n)^(1/2) + L^(-α₁)]).

Note that the value of ĥ is unchanged if a nonsingular linear transformation of p^K(w) and/or r^L(z) is used, so that if Assumptions 5 and A1 are satisfied for P̄^K(w) and R̄^L(z), then they are satisfied for p^K(w) and r^L(z).

Proof of Lemma A1: Let p̃ᵢ = p^K(wᵢ) and Q = E[τᵢψᵢ²p̃ᵢp̃ᵢ']. By Assumption A1, ii), the smallest eigenvalue of Q is bounded away from zero, so that the largest eigenvalue of Q⁻¹ is bounded. Consider replacing p^K(w) by p̄^K(w) = Q^(-1/2)p^K(w). By the Cauchy-Schwarz inequality there is a constant C such that ζⱼ(K) for p̄^K(w) is bounded by Cζⱼ(K), so that the hypotheses, and hence the conclusion, will be satisfied with this replacement. Since E[τ(w)ψ(X)²p̄^K(w)p̄^K(w)'] = I by construction, it suffices to prove the result with Q = I. By analogous reasoning it suffices to prove the result with Q₁ = E[r^L(z)r^L(z)'] = I. Also, suppose for notational simplicity that Π(z) is a scalar function.

Let Δ_π = (L/n)^(1/2) + L^(-α₁) and Πᵢ = Π₀(zᵢ). By Theorem 1 of Newey (1997),

(A.3)  ∫[Π̂(z) − Π₀(z)]²dF₀(z) ≤ C∫[Π̂(z) − γ_L'r^L(z)]²dF₀(z) + C∫[Π₀(z) − γ_L'r^L(z)]²dF₀(z) = O_p(Δ_π²),

(A.4)  Σᵢ‖Π̂ᵢ − Πᵢ‖²/n = O_p(Δ_π²),  max_{i≤n}‖Π̂ᵢ − Πᵢ‖ = O_p(ξ₀(L)Δ_π).

Next, let P = [τ₁ψ₁p̃₁, ..., τₙψₙp̃ₙ]' and Q̂ = P̂'P̂/n. By Assumption A1, v), and the Markov inequality, ‖P'P/n − I‖ = o_p(1). Also, by ψ bounded, w(X,π) Lipschitz in π, and a mean value expansion, ‖p̂ᵢ − p̃ᵢ‖ ≤ ζ₁(K)‖ŵᵢ − wᵢ‖ ≤ Cζ₁(K)‖Π̂ᵢ − Πᵢ‖, so that by (A.4) and the triangle and Cauchy-Schwarz inequalities,

(A.5)  ‖Q̂ − P'P/n‖ ≤ CΣᵢ(‖p̂ᵢ − p̃ᵢ‖² + 2ψᵢ‖p̃ᵢ‖‖p̂ᵢ − p̃ᵢ‖)/n + CΣᵢ|τ̂ᵢ − τᵢ|ψᵢ²‖p̂ᵢ‖²/n = o_p(1),

implying ‖Q̂ − I‖ = o_p(1), so that with probability approaching one the smallest eigenvalue of Q̂ is bounded away from zero. Let ηᵢ = ψᵢ[yᵢ − h₀(wᵢ)], and note that E[η|X] = 0 and, by Var(y|X) bounded and independence of the observations,

(A.6)  ‖(P̂ − P)'η/√n‖ = o_p(1),

(A.7)  E[‖P'η/n‖²] = E[Σᵢτᵢψᵢ²p̃ᵢ'p̃ᵢVar(yᵢ|Xᵢ)]/n² ≤ CE[τᵢψᵢ²p̃ᵢ'p̃ᵢ]/n = Ctr(Q)/n = Ctr(I)/n = CK/n.

Then by the triangle inequality ‖P̂'η/n‖² = O_p(K/n), and for M = P̂(P̂'P̂)⁻¹P̂', with probability approaching one, η'Mη/n = (η'P̂/n)Q̂⁻¹(P̂'η/n) ≤ O_p(1)‖P̂'η/n‖² = O_p(K/n).

Let β be such that sup_W |h₀(w) − p^K(w)'β| = O(K^(-α)), and let Δ = (K/n)^(1/2) + K^(-α) + Δ_π. Then, by M idempotent, h₀ Lipschitz, and (A.4),

‖β̂ − β‖² ≤ O_p(1)(β̂ − β)'Q̂(β̂ − β) ≤ O_p(1)[η'Mη/n + Σᵢτ̂ᵢψᵢ²{h₀(ŵᵢ) − h₀(wᵢ)}²/n + Σᵢτ̂ᵢψᵢ²{h₀(wᵢ) − p^K(wᵢ)'β}²/n] = O_p(K/n) + O_p(Δ_π²) + O(K^(-2α)) = O_p(Δ²).

The first conclusion then follows by the triangle inequality:

{∫τ(w)ψ(X)²[ĥ(w) − h₀(w)]²dF₀(X)}^(1/2) ≤ {∫τ(w)ψ(X)²[p^K(w)'(β̂ − β)]²dF₀(X)}^(1/2) + {∫τ(w)ψ(X)²[p^K(w)'β − h₀(w)]²dF₀(X)}^(1/2) = O_p(Δ) + O(K^(-α)).

Also, |ĥ − h₀|₀ ≤ |p^K(·)'(β̂ − β)|₀ + |p^K(·)'β − h₀|₀ ≤ ζ₀(K)‖β̂ − β‖ + O(K^(-α)) = O_p(ζ₀(K)Δ), giving the second conclusion. QED.

To state Lemma A2, some additional notation is needed. Let rᵢ = r^L(zᵢ), R = [r₁, ..., rₙ]', and

Σ̂ = Σᵢτ̂ᵢψᵢ²p̂ᵢp̂ᵢ'[yᵢ − ĥ(ŵᵢ)]²/n,  Σ = E[τᵢψᵢ²p̃ᵢp̃ᵢ'Var(yᵢ|Xᵢ)],  Q̂₁ = R'R/n,  Q₁ = E[rᵢrᵢ'],

Ĥ = Σᵢτ̂ᵢψᵢp̂ᵢ{[∂ĥ(ŵᵢ)/∂w]'∂w(Xᵢ,Π̂ᵢ)/∂π ⊗ rᵢ'}/n,
H = E[τᵢψᵢp̃ᵢ{[∂h₀(wᵢ)/∂w]'∂w(Xᵢ,Πᵢ)/∂π ⊗ rᵢ'}].

Let Σ̂₁ = Σᵢrᵢrᵢ'ûᵢ²/n and Σ₁ = E[rᵢrᵢ'E[uᵢ²|zᵢ]], let A denote the vector such that a(p^K(·)'β) = Aβ, and define

V̂ = AQ̂⁻¹(Σ̂ + ĤQ̂₁⁻¹Σ̂₁Q̂₁⁻¹Ĥ')Q̂⁻¹A',  V = AQ⁻¹(Σ + HQ₁⁻¹Σ₁Q₁⁻¹H')Q⁻¹A'.

For notational convenience, K and L subscripts are suppressed.

Assumption A2: i) Var(y|X) is bounded and bounded away from zero; h₀(w) and w(X,π) are twice continuously differentiable in w and π respectively, and the first and second derivatives are bounded; ii) a(h) is a scalar with |a(h)| ≤ C|h|₁, and there exists ν(w) such that a(p_k) = E[τ(w)ψ(X)²ν(w)p_k(w)] for each component p_k(w) of p^K(w), a(h₀) = E[τ(w)ψ(X)²ν(w)h₀(w)], and there are β_K with E[τ(w)ψ(X)²‖ν(w) − β_K'p^K(w)‖²] → 0 as K → ∞; iii) either a) there are K and β_K such that h₀(w) = p^K(w)'β_K, or b) there are h_K(w) = p^K(w)'β̄_K with E[τ(w)ψ(X)²h_K(w)²] → 0 and a(h_K) bounded away from zero.

Under iii), b), let d(X) = [∂h₀(w)/∂w]'∂w(X,Π₀(z))/∂π, let ρ(z) be the limit of the matrices of projections of τ(w)ψ(X)²ν(w)d(X) on r^L(z), and let

V̄ = E[τ(w)ψ(X)²ν(w)ν(w)'Var(y|X)] + E[ρ(z)Var(x|z)ρ(z)'].

Lemma A2: If Assumptions A1 and A2 are satisfied and each of the following goes to zero as n → ∞: √nK^(-α), √nL^(-α₁), ζ₀(K)²K²/n, L²ζ₀(K)²ξ₀(L)²/n, ζ₀(K)²ζ₁(K)²(LK + L²)/n, then, for θ̂ = a(ĥ) and θ₀ = a(h₀), √nV^(-1/2)(θ̂ − θ₀) →d N(0,1) and √nV̂^(-1/2)(θ̂ − θ₀) →d N(0,1); furthermore, under Assumption A2, iii), b), V → V̄ and √n(θ̂ − θ₀) →d N(0,V̄).

Proof of Lemma A2: Let

Δ_π = L^(1/2)/√n + L^(-α₁),  Δ_h = K^(1/2)/√n + K^(-α) + Δ_π,
Δ_Q = [K^(1/2)ζ₁(K) + ζ₀(K)²ξ₀(L)]Δ_π + ζ₀(K)K^(1/2)/√n,
Δ_Q₁ = [L^(1/2)ζ₁(K) + ζ₀(K)ξ₀(L)²]Δ_π + K^(1/2)ξ₀(L)/√n.

It follows from the rate conditions of the Lemma that Δ_Q → 0, Δ_Q₁ → 0, ζ₀(K)Δ_h → 0, K^(1/2)Δ_Q → 0, L^(1/2)Δ_Q₁ → 0, and √nζ₀(K)Δ_π² → 0. For simplicity the remainder of the proof is given for the scalar case; also, as in the proof of Lemma A1, it suffices to take Q and Q₁ equal to identity matrices. Let F = V^(-1/2) and ν_K(w) = Ap^K(w).

By Var(y|X) bounded away from zero, Σ − CI is positive semidefinite for some C > 0, so that

(A.9)  ‖FA‖ = {tr[FAA'F']}^(1/2) ≤ {tr[CFAΣA'F']}^(1/2) ≤ {tr[CFVF']}^(1/2) ≤ C,

so that ‖FA‖ is bounded. Suppose Assumption A2, iii), b), is satisfied. By A = E[τ(w)ψ(X)²ν(w)p^K(w)'], ν_K(w) is the mean-square projection of ν(w) on p^K(w), so that E[τ(w)ψ(X)²‖ν(w) − ν_K(w)‖²] ≤ E[τ(w)ψ(X)²‖ν(w) − β_K'p^K(w)‖²] → 0. Also, for b_KL(z) = E[τ(w)ψ(X)²d(X)ν_K(w)r^L(z)']r^L(z), since the mean square error of a least squares projection is no greater than that of the random variable being projected, E[‖b_KL(z) − ρ(z)‖²] → 0 as K, L → ∞. Then, by boundedness of σ²(X) = Var(y|X) and Var(x|z), and by the fact that mean square convergence implies convergence of second moments,

(A.10)  V = E[τ(w)ψ(X)²ν_K(w)ν_K(w)'σ²(X)] + E[b_KL(z)Var(x|z)b_KL(z)'] → V̄,

so that V is bounded; under iii), a), V is also bounded, by similar arguments.

Next, similarly to the proof of Lemma A1,

(A.11)  ‖Q̂ − I‖ = O_p(Δ_Q) = o_p(1),  ‖Q̂₁ − I‖ = O_p(Δ_Q₁) = o_p(1),  ‖Ĥ − H‖ = o_p(1),

so that with probability approaching one the smallest eigenvalues of Q̂ and Q̂₁ are bounded away from zero and ‖BQ̂⁻¹‖ ≤ ‖B‖O_p(1) for any conformable matrix B. In particular, by (A.9),

(A.13)  ‖FA(Q̂⁻¹ − I)‖ ≤ ‖FAQ̂⁻¹‖‖I − Q̂‖ = ‖FA‖O_p(Δ_Q) = o_p(1).

Let β be such that sup_W |p^K(w)'β − h₀(w)| = O(K^(-α)). Then

(A.14)  |√nF·a(p^K(·)'β − h₀)| ≤ C√n‖F‖|p^K(·)'β − h₀|₀ = O(√nK^(-α)) = o(1),

so that, with h = (h₀(w₁), ..., h₀(wₙ))' and h̃ = (h₀(ŵ₁), ..., h₀(ŵₙ))',

(A.16)  √nF[a(ĥ) − a(h₀)] = FAQ̂⁻¹P̂'η/√n + FAQ̂⁻¹P̂'(h − h̃)/√n + o_p(1).

By a second-order mean value expansion of h₀(ŵᵢ) around wᵢ, with dᵢ = d(Xᵢ), the remainder is O_p(√nζ₀(K)Δ_π²) = o_p(1), and substituting the first-step series structure of Π̂ᵢ − Πᵢ gives, for U = (u₁, ..., uₙ)',

(A.17)-(A.19)  FAQ̂⁻¹P̂'(h − h̃)/√n = FAQ̂⁻¹ĤQ̂₁⁻¹R'U/√n + o_p(1).

Also, by E[u²|z] bounded,

(A.20)  E[‖R'U/√n‖²] = tr(Σ₁) ≤ Ctr(I) ≤ CL, so ‖R'U/√n‖ = O_p(L^(1/2)),

and, since HH' is the population mean square of the regression of τᵢψᵢp̃ᵢdᵢ on rᵢ,

(A.21)-(A.23)  E[‖HR'U/√n‖²] = tr(HΣ₁H') ≤ CK, so ‖HR'U/√n‖ = O_p(K^(1/2)).

Combining these bounds with (A.11) and (A.13),

(A.24)  FAQ̂⁻¹P̂'(h − h̃)/√n = FAHR'U/√n + o_p(1),

and similarly, using E[η|X] = 0 and ‖(P̂ − P)'η/√n‖ = o_p(1),

(A.25)-(A.26)  FAQ̂⁻¹P̂'η/√n = FAP'η/√n + o_p(1).

Combining (A.16), (A.24), and (A.25)-(A.26) with the triangle inequality gives

(A.27)  √nF[a(ĥ) − a(h₀)] = FA(P'η/√n + HR'U/√n) + o_p(1).

Next, for any vector φ with ‖φ‖ = 1, let Z_in = φ'FA[τᵢψᵢp̃ᵢηᵢ + Hrᵢuᵢ]/√n. The Z_in are i.i.d. across i for a given n, with E[Z_in] = 0 and Var(Z_in) = 1/n. Furthermore, for any ε > 0,

nE[1(|Z_in| > ε)Z_in²] ≤ nε⁻²E[Z_in⁴] ≤ Cn⁻¹{ζ₀(K)²E[‖τᵢψᵢp̃ᵢ‖²] + ξ₀(L)²E[‖rᵢ‖²]} = O([ζ₀(K)²K + ξ₀(L)²L]/n) = o(1).

Therefore √nF(θ̂ − θ₀) →d N(0,1) follows by the Lindeberg-Feller theorem and equation (A.27). Also, by Lemma A1 and Theorem 1 of Newey (1997), max_{i≤n}|ĥ(ŵᵢ) − h₀(wᵢ)| = o_p(1) and max_{i≤n}‖Π̂ᵢ − Πᵢ‖ = o_p(1).
Consistency of V̂ remains. Let η̂ᵢ = ψᵢ[yᵢ − ĥ(ŵᵢ)] and Σ̄ = Σᵢτᵢψᵢ²p̃ᵢp̃ᵢ'ηᵢ²/n. By ψ bounded and E[ηᵢ⁴|Xᵢ] bounded, ‖Σ̄ − Σ‖ = o_p(1), and by max_{i≤n}|ĥ(ŵᵢ) − h₀(wᵢ)| = o_p(1) and arguments like those for (A.5), ‖Σ̂ − Σ̄‖ = o_p(1); hence ‖Σ̂ − Σ‖ = o_p(1). Next, let d̂ᵢ = [∂ĥ(ŵᵢ)/∂w]'∂w(Xᵢ,Π̂ᵢ)/∂π and dᵢ = d(Xᵢ). By ∂w(X,π)/∂π bounded and the conclusion of Lemma A1, Σᵢτᵢ|d̂ᵢ − dᵢ|²/n ≤ C(sup_W|ĥ − h₀|₁² + Σᵢ‖Π̂ᵢ − Πᵢ‖²/n) = o_p(1), so that

‖Ĥ − H‖ ≤ C(Σᵢτᵢ‖p̂ᵢ‖²‖rᵢ‖²/n)^(1/2)(Σᵢτᵢ|d̂ᵢ − dᵢ|²/n)^(1/2) + o_p(1) = o_p(1),

and similarly ‖Σ̂₁ − Σ₁‖ = o_p(1). Since CI − Σ is positive semidefinite for C large enough, ‖BΣ‖ ≤ C‖B‖ for any matrix B, and combining the above results with (A.11) and the triangle inequality, for example,

‖FA(Q̂⁻¹Σ̂Q̂⁻¹ − Σ)A'F'‖ ≤ ‖FAQ̂⁻¹‖²‖Σ̂ − Σ‖ + ‖FA(Q̂⁻¹ − I)ΣQ̂⁻¹A'F'‖ + ‖FAΣ(Q̂⁻¹ − I)A'F'‖ = o_p(1),

and likewise for the term involving ĤQ̂₁⁻¹Σ̂₁Q̂₁⁻¹Ĥ'. It follows that FV̂F' →p 1, and therefore

√nV̂^(-1/2)(θ̂ − θ₀) = (FV̂F')^(-1/2)√nF(θ̂ − θ₀) →d N(0,1).

Under Assumption A2, iii), b), V → V̄ by (A.10), so that also √n(θ̂ − θ₀) →d N(0,V̄). QED.

Lemma A3: If vᵢ is i.i.d. and continuously distributed with bounded density, and max_{i≤n}|v̂ᵢ − vᵢ| = O_p(Δₙ) with Δₙ → 0, then for any a < b and any positive sequence δₙ with δₙ/Δₙ → ∞,

Σᵢ|1(a < v̂ᵢ ≤ b) − 1(a ≤ vᵢ ≤ b)|/n = o_p(δₙ).

Proof: By the density of vᵢ bounded, for any Δ > 0, E[Σᵢ{1(|vᵢ − a| ≤ Δ) + 1(|vᵢ − b| ≤ Δ)}/n] = Prob(|vᵢ − a| ≤ Δ) + Prob(|vᵢ − b| ≤ Δ) ≤ CΔ, so by the Markov inequality, Σᵢ[1(|vᵢ − a| ≤ Δ) + 1(|vᵢ − b| ≤ Δ)]/n = O_p(Δ). We also use the well known result that if εₙ = O_p(Δₙ), then there is a positive sequence νₙ, going to zero slowly enough, such that εₙ ≤ Δₙ/νₙ with probability approaching one (w.p.a.1) as n → ∞. Choose νₙ → 0 so that, in addition, Δₙ/(νₙδₙ) → 0. Then w.p.a.1,

Σᵢ|1(a < v̂ᵢ ≤ b) − 1(a ≤ vᵢ ≤ b)|/(δₙn) ≤ Σᵢ|1(a ≤ vᵢ + [v̂ᵢ − vᵢ] ≤ b) − 1(a ≤ vᵢ ≤ b)|/(δₙn)
≤ Σᵢ[1(|vᵢ − a| ≤ Δₙ/νₙ) + 1(|vᵢ − b| ≤ Δₙ/νₙ)]/(δₙn) = δₙ⁻¹O_p(Δₙ/νₙ) = o_p(1). QED.

Proof of Lemma 4.1: Because series estimators are invariant to nonsingular linear transformations of the approximating functions, a location and scale shift allows us to assume that W and Z are unit cubes. Also, in the polynomial case the component powers can be replaced by polynomials of the same order that are orthonormal with respect to the uniform distribution on [0,1], and in the spline case by a B-spline basis. In both cases the resulting vector of functions is a nonsingular linear combination of the original vector, because of the assumption that the order of the multi-index is increasing. Assume that the same operation is carried out on the first step approximating functions r^L(z). For any nonnegative integer j, it then follows from Newey (1997) that in the polynomial case ζⱼ(K) ≤ CK^(1+2j) and ξ₀(L) ≤ CL, and in the spline case ζⱼ(K) ≤ CK^(1/2+j) and ξ₀(L) ≤ CL^(1/2). It follows from these inequalities and Assumption 4 that the conditions of Lemma A1 are satisfied, so the conclusion follows by Lemma A1. QED.

Proof of Theorem 4.2: Without changing the notation, let F₀(w₁) and F₀(u) denote the marginal distributions of w₁ = (x,z₁) and u on the set where τ(w) = 1, and let

ḡ(w₁) = ĝ(w₁) − ∫ĝ(w₁)dF₀(w₁),  λ̄(u) = λ̂(u) − ∫λ̂(u)dF₀(u),  Δ = ∫{ĥ(w) − h₀(w)}dF₀(w),

with ḡ₀(w₁) and λ̄₀(u) defined in the same way from g₀ and λ₀, so that ĥ(w) − h₀(w) = ḡ(w₁) − ḡ₀(w₁) + λ̄(u) − λ̄₀(u) + Δ. The product terms drop out by the construction of ḡ(w₁), λ̄(u), etc., e.g.

∫∫[ḡ(w₁) − ḡ₀(w₁)][λ̄(u) − λ̄₀(u)]dF₀(w₁)dF₀(u) = ∫[ḡ(w₁) − ḡ₀(w₁)]dF₀(w₁) · ∫[λ̄(u) − λ̄₀(u)]dF₀(u) = 0,

so that

∫∫[ĥ(w) − h₀(w)]²dF₀(w₁)dF₀(u) = ∫[ḡ(w₁) − ḡ₀(w₁)]²dF₀(w₁) + ∫[λ̄(u) − λ̄₀(u)]²dF₀(u) + Δ²
≥ max{∫[ḡ(w₁) − ḡ₀(w₁)]²dF₀(w₁), ∫[λ̄(u) − λ̄₀(u)]²dF₀(u), Δ²}.

Also, by the density of w bounded away from zero relative to the product of the marginals, ∫∫[ĥ(w) − h₀(w)]²dF₀(w₁)dF₀(u) ≤ C∫[ĥ(w) − h₀(w)]²dF₀(w). The conclusion then follows from Theorem 4.1. QED.

Proof of Theorem 4.3: Follows from Theorem 4.1 by fixing the value of u at any point in W. QED.

Proof of Theorem 5.1: For power series, using the inequalities in the proof of Lemma 4.1, it follows that L²ζ₀(K)²ξ₀(L)⁴ ≤ CK²L⁶ and ζ₀(K)²ζ₁(K)²(LK + L²) ≤ CK⁸(KL + L²), and for splines that L²ζ₀(K)²ξ₀(L)⁴ ≤ CKL⁴ and ζ₀(K)²ζ₁(K)²(LK + L²) ≤ CK⁴(KL + L²). It then follows from the rate conditions of the theorem that the rate conditions of Lemma A2 are satisfied. The conclusion then follows from Lemma A2, similarly to the proof of Theorem 4.1. QED.

References

Agarwal, G.G. and Studden, W.J. (1980): "Asymptotic Integrated Mean Square Error Using Least Squares and Bias Minimizing Splines," Annals of Statistics 8, 1307-1325.

Ahn, H. (1994): "Nonparametric Estimation of Conditional Choice Probabilities in a Binary Choice Model under Uncertainty," working paper, Virginia Polytechnic Institute.

Andrews, D.W.K. (1991): "Asymptotic Normality of Series Estimators for Nonparametric and Semiparametric Models," Econometrica 59, 307-345.

Andrews, D.W.K. and Whang, Y.J. (1990): "Additive Interactive Regression Models: Circumvention of the Curse of Dimensionality," Econometric Theory 6, 466-479.

Barzel, Y. (1973): "The Determination of Daily Hours and Wages," Quarterly Journal of Economics 87, 220-238.
Biddle, J. and Zarkin, G. (1989): "Choice among Wage-Hours Packages: An Empirical Investigation of Male Labor Supply," Journal of Labor Economics 7, 415-37.

Breiman, L. and Friedman, J.H. (1985): "Estimating Optimal Transformations for Multiple Regression and Correlation," Journal of the American Statistical Association 80, 580-598.

Brown, B.W. and Walker, M.B. (1989): "The Random Utility Hypothesis and Inference in Demand Systems," Econometrica 47, 814-829.

Cox, D.D. (1988): "Approximation of Least Squares Regression on Nested Subspaces," Annals of Statistics 16, 713-732.

Hastie, T. and Tibshirani, R. (1991): Generalized Additive Models, London: Chapman and Hall.

Linton, O. and Nielsen, J.B. (1995): "A Kernel Method of Estimating Structured Nonparametric Regression Based on Marginal Integration," Biometrika 82, 93-100.

Linton, O., Mammen, E., and Nielsen, J. (1997): "The Existence and Asymptotic Properties of a Backfitting Projection Algorithm Under Weak Conditions," preprint.

Moffitt, R. (1984): "The Estimation of a Joint Wage-Hours Labor Supply Model," Journal of Labor Economics 2, 550-66.

Newey, W.K. (1984): "A Method of Moments Interpretation of Sequential Estimators," Economics Letters 14, 201-206.

Newey, W.K. (1994): "Kernel Estimation of Partial Means and a General Variance Estimator," Econometric Theory 10, 233-253.

Newey, W.K. (1997): "Convergence Rates and Asymptotic Normality for Series Estimators," Journal of Econometrics 79, 147-168.

Newey, W.K. and Powell, J.L. (1989): "Nonparametric Instrumental Variables Estimation," working paper, MIT Department of Economics.

Oi, W. (1962): "Labor as a Quasi-Fixed Factor," Journal of Political Economy 70, 538-555.

Powell, M.J.D. (1981): Approximation Theory and Methods, Cambridge, England: Cambridge University Press.

Robinson, P. (1988): "Root-N-Consistent Semiparametric Regression," Econometrica 56, 931-954.

Roehrig, C.S. (1988): "Conditions for Identification in Nonparametric and Parametric Models," Econometrica 56, 433-447.
Stone, C.J. (1982): "Optimal Global Rates of Convergence for Nonparametric Regression," Annals of Statistics 10, 1040-1053.

Stone, C.J. (1985): "Additive Regression and Other Nonparametric Models," Annals of Statistics 13, 689-705.

Tjostheim, D. and Auestad, B. (1994): "Nonparametric Identification of Nonlinear Time Series: Projections," Journal of the American Statistical Association 89, 1398-1409.

Vella, F. (1991): "A Simple Two-Step Estimator for Approximating Unknown Functional Forms in Models with Endogenous Explanatory Variables," working paper, Department of Economics, Australian National University.

Vella, F. (1993): "Nonwage Benefits in a Simultaneous Model of Wages and Hours: Labor Supply Functions of Young Females," Journal of Labor Economics 11, 704-723.

Table 1:

             OLS        2SLS       NP           NPIV
h           .000017    .000387    -.0094       -.01097
            (.5597)    (2.7447)   (2.5513)     (2.9170)
h^2                               7.0151e-6    7.4577e-6
                                  (2.6363)     (2.7986)
h^3                              -2.1707e-9   -2.0250e-9
                                  (2.6304)     (2.4640)
h^4                               2.3692e-13   1.8929e-13
                                  (2.5537)     (2.0311)
r                                -.000384     -.0005
                                  (2.6732)     (2.7643)
r^2                              -6.6464e-8    3.4978e-10
                                  (.5211)      (2.7516)

Notes: i) h denotes hours worked and r is the reduced form residual. ii) The regressors in the wage equation included education dummies, union status, tenure, full-time work experience, a black dummy and regional variables. iii) The variables included in Z which are excluded from X are marital status, health status, presence of young children, a rural dummy and non-labor income. iv) Absolute values of t-statistics are reported in parentheses.

Table 2: Cross Validation Values

Specification    CV Value
h=0; r=0         185.567
h=1; r=1         183.158
h=2; r=2         183.738
h=3; r=3         182.899
h=4; r=4         182.882
h=5; r=5         183.128

[Figure 1 appears here: "Estimate of Wage/Hours Profile from Polynomial Approximation" (graphic not reproduced).]
[Figure 1 (graphic): lines show the Nonparametric, Unadjusted Nonparametric, and Two Stage Least Squares estimates; horizontal axis: Annual Hours Worked, 1000-3800.]

[Figure 2: Estimate of Wage/Hours Profile from Spline Approximation. Lines: Nonparametric; Unadjusted Nonparametric. Horizontal axis: Annual Hours Worked, 1000-3800.]

[Figure 3: 95% Confidence Interval for Wage/Hours Profile from Polynomial Approximation. Horizontal axis: Annual Hours Worked, 1000-3800.]

[Figure 4: 95% Confidence Interval for Wage/Hours Profile from Spline Approximation. Horizontal axis: Annual Hours Worked, 1000-3800.]