Working Paper, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02139.

Nonparametric Estimation of Triangular Simultaneous Equations Models

No. 98-16, October 1998

Whitney K. Newey
Department of Economics
MIT, E52-262D
Cambridge, MA 02139
wnewey@mit.edu

James L. Powell
Department of Economics
University of California at Berkeley

Francis Vella
Department of Economics
Rutgers University
New Brunswick, NJ 08903
Vella@fas-econ.rutgers.edu

March 1995
Revised, February 1998

Abstract

This paper presents a simple two-step nonparametric estimator for a triangular simultaneous equation model. Our approach employs series approximations that exploit the additive structure of the model. The first step comprises the nonparametric estimation of the reduced form and the corresponding residuals. The second step is the estimation of the primary equation via nonparametric regression with the reduced form residuals included as a regressor. We derive consistency and asymptotic normality results for our estimator, including optimal convergence rates. Finally we present an empirical example, based on the relationship between the hourly wage rate and annual hours worked, which illustrates the utility of our approach.

Keywords: Nonparametric Estimation, Simultaneous Equations, Series Estimation, Two-Step Estimators.

The NSF provided financial support for the work of Newey and Powell on this paper.
An early version of this paper was presented at an NBER conference on applications of semiparametric estimation, October 1993, at Northwestern. We are grateful to participants at this conference and workshops at Harvard/MIT and Tilburg for helpful comments. We are also grateful to the referees for their extensive comments that have helped both the content and style of this paper.

1. Introduction

Structural estimation is important in econometrics, because it is needed to account correctly for endogeneity that comes from individual choice or market equilibrium. Often structural models do not have tight functional form specifications, so that it is useful to consider nonparametric structural models and their estimation. This paper proposes and analyzes one approach to nonparametric structural modeling and estimation. This approach is different than standard nonparametric regression, because the object of estimation is the structural model and not just a conditional expectation.

Nonparametric structural models have been previously considered in Roehrig (1988), Newey and Powell (1989), and Vella (1991). Roehrig (1988) gives identification results for a system of equations when the errors are independent of the instruments. Newey and Powell (1989) consider identification and estimation under the weaker condition that the disturbance has conditional mean zero given the instruments. The results of this paper are in some ways complementary to these previous results. The model we consider is more restrictive than that of Newey and Powell (1989), but is easier to estimate.

The model we consider is a triangular nonparametric simultaneous equations model where

(1.1)  y = g_0(x, z_1) + ε,  x = Π_0(z) + u,  E[ε|u,z] = E[ε|u],  E[u|z] = 0,

x is a d_x × 1 vector of endogenous variables, z is a vector of instrumental variables that includes z_1 as a subvector, z_1 is d_1 × 1, Π_0(z) is a d_x × 1 vector of functions of the instruments z, and u is a d_x × 1 vector of disturbances.
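To make the structure of (1.1) concrete, here is a minimal simulation sketch. All functional forms, the value ρ = 0.7, and the Gaussian errors are illustrative assumptions, not taken from the paper: setting ε = ρu + η with η independent gives E[ε|u,z] = ρu, which depends on u alone, while E[u|z] = 0 holds by construction.

```python
import random, math

# Hypothetical data-generating process satisfying (1.1):
#   x = Pi0(z) + u,   y = g0(x, z1) + eps,   eps = rho*u + eta,
# so E[eps|u,z] = rho*u depends only on u, and E[u|z] = 0.
g0  = lambda x, z1: math.sin(x) + 0.5 * z1     # structural function (illustrative)
pi0 = lambda z1, z2: z1 + 0.8 * z2             # reduced form (illustrative)
rho = 0.7

rng = random.Random(0)
data = []
for _ in range(20000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    u, eta = rng.gauss(0, 1), rng.gauss(0, 1)
    eps = rho * u + eta                        # endogeneity: eps depends on u
    x = pi0(z1, z2) + u
    data.append((x, g0(x, z1) + eps, eps))

# x is endogenous: Cov(x, eps) = rho * Var(u) = 0.7, far from zero.
n = len(data)
mx = sum(t[0] for t in data) / n
me = sum(t[2] for t in data) / n
cov_x_eps = sum((t[0] - mx) * (t[2] - me) for t in data) / n
print(round(cov_x_eps, 2))
```

Because ε is correlated with u, and hence with x, a direct nonparametric regression of y on (x, z_1) would not recover g_0; the two-step estimator of Section 3 uses the reduced form residual to correct for this.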
Equation (1.1) generalizes the limited information simultaneous equations model to allow for a nonparametric relationship g_0(x, z_1) between y and (x, z_1) and a nonparametric reduced form Π_0(z). The conditional mean restriction in equation (1.1) is one conditional mean version of the usual orthogonality condition for a linear model. It is a more general assumption than requiring that (ε, u) be independent of z and that E[u] = 0. This added generality allows for both endogeneity and heteroskedasticity. For example, y can be purchases of a commodity and x separable expenditures on a subgroup of commodities. Endogeneity results from expenditure on a commodity subset being a choice variable. Also, heteroskedasticity may be important for some econometric models, because individual heterogeneity in demand functions often results in conditional heteroskedasticity in the disturbances, as pointed out by Brown and Walker (1989).

An alternative model, considered by Newey and Powell (1989), requires only that E[ε|z] = 0. Strictly speaking, neither model is stronger than the other, because equation (1.1) does not imply E[ε|z] = 0, and E[ε|z] = 0 does not imply equation (1.1). (If equation (1.1) is satisfied, u is independent of z, and E[ε] = 0, then E[ε|z] = E[E[ε|u,z]|z] = E[E[ε|u]|z] = E[E[ε|u]] = E[ε] = 0. Also, if Π_0(z) were linear in z and the conditional expectations were replaced by population regressions, then the third condition of (1.1) would imply that u is orthogonal to ε and the second condition that z is orthogonal to ε.) The additive separability of u and the conditional mean restriction in equation (1.1) lead to a straightforward estimation approach, as will be further discussed below, making the model a convenient one for applications; in this sense its restrictions are palatable. In contrast, the mapping from the reduced form E[y|z] to the structure g_0(x, z_1) under E[ε|z] = 0 is discontinuous, so that E[ε|z] = 0 turns out to be a strong restriction under which it is very difficult to construct a consistent estimator, as discussed in Newey and Powell (1989).
We propose a two-step nonparametric estimator of this model. The first step consists of the construction of a residual vector û from the nonparametric regression of x on z. The second step is the nonparametric regression of y on x, z_1, and û. Two-step nonparametric kernel estimation has previously been considered in Ahn (1994). Our results concern series estimators, which are convenient for imposing the additive structure implied by equation (1.1). We derive optimal mean-square convergence rates and asymptotic normality results that account for the first step estimation and allow for √n-consistency of mean-square continuous functionals of the estimator.

Section 2 of the paper considers identification, presenting straightforward sufficient conditions. Section 3 describes the estimator and Section 4 derives convergence rates. Section 5 gives conditions for asymptotic normality and inference, including consistent standard error formulae. Section 6 describes an extension of the model and estimators to semiparametric models. Section 7 contains an empirical example and Monte Carlo results illustrating the utility of our approach.

2. Identification

An implication of our model is that, for w = (x', z_1', u')',

(2.1)  E[y|x,z] = g_0(x,z_1) + E[ε|x,z] = g_0(x,z_1) + λ_0(u) ≡ h_0(w),  λ_0(u) = E[ε|u].
Thus, the function of interest, g_0(x,z_1), is one component of an additive regression of y on (x,z_1) and u. Conversely, equation (2.1) implies that E[λ_0(u) + {y − E[y|x,z]}|u,z] = λ_0(u) depends only on u, which implies E[ε|u,z] = E[ε|u], as specified in equation (1.1). Thus, the additive structure of equation (2.1) is equivalent to the conditional mean restriction of equation (1.1), so that the identification of g_0 under equation (1.1) is the same as the identification of this additive component in equation (2.1).

To analyze identification of g_0(x,z_1), note that a conditional expectation is unique with probability one, so that any other additive function g(x,z_1) + λ(u) satisfying equation (2.1) must have Prob(g(x,z_1) + λ(u) = g_0(x,z_1) + λ_0(u)) = 1. Therefore, identification is equivalent to equality of conditional expectations implying equality of the additive components, up to a constant. Equivalently, working with the difference of two conditional expectations, identification is equivalent to the statement that a zero additive function must have only constant components. To be precise, we have the following result.

Theorem 2.1: g_0(x,z_1) is identified, up to an additive constant, if and only if Prob(δ(x,z_1) + γ(u) = 0) = 1 implies there is a constant c with Prob(δ(x,z_1) = c) = 1.

There is a straightforward interpretation of this result that leads to a simple sufficient condition for identification. Suppose that identification fails. Then by Theorem 2.1, there are δ(x,z_1) and γ(u) with δ(x,z_1) + γ(u) = 0 and δ(x,z_1) nonconstant. Intuitively, this implies a functional relationship between (x,z_1) and u. For instance, if γ(u) were a one-to-one function of a subvector u_1 of u, then δ(x,z_1) + γ(u) = 0 implies an exact relationship between u_1 and (x,z_1). More generally, δ(x,z_1) + γ(u) = 0 implies a degeneracy in the joint distribution of these two random vectors. Consequently, the absence of an exact relationship will imply identification. To formalize these ideas it is helpful to be precise about what we mean by existence of a functional relationship.
Definition: There is a functional relationship between (x,z_1) and u if and only if there exist h(x,z_1,u) and a set 𝒰 with Prob(u ∈ 𝒰) > 0 such that Prob(h(x,z_1,ū) = 0) < 1 for all fixed ū ∈ 𝒰 and Prob(h(x,z_1,u) = 0) = 1.

This condition says that the pair of vectors (x,z_1) and u solve a nontrivial implicit equation. Thus, the functional relationship of this definition is an implicit one. As previously noted, nonidentification implies existence of such an implicit relationship. Therefore, the contrapositive statement leads to the following sufficiency result for identification.

Theorem 2.2: If there is no functional relationship between (x,z_1) and u, then g_0(x,z_1) is identified up to an additive constant.

Although it is a sufficient condition, nonexistence of a functional relationship between (x,z_1) and u is not a necessary condition for identification. By Theorem 2.1, it is nonexistence of an additive functional relationship that is necessary and sufficient. Thus, identification may still occur when there is an exact, nonadditive functional relationship.

The additive structure of equation (2.1) is so strong that the model may be identified even when the usual order condition is not satisfied, i.e. even when z is of smaller dimension than x. For example, suppose that x is two dimensional, z is a scalar, the reduced form is Π(z) = (z, e^z)', and g_0(x) and λ_0(u) are restricted to be differentiable. In this example there is only one instrument although there are two endogenous variables, and the reduced form x = Π(z) + u implies the nonlinear, nonadditive relationship x_2 − u_2 = exp(x_1 − u_1). Consider an additive function δ(x) + γ(u) satisfying

(2.2)  0 = δ(x) + γ(u) = δ(x_1, exp(x_1 − u_1) + u_2) + γ(u).

The restriction that g_0(x) and λ_0(u) are differentiable means that δ(x) and γ(u) must also be differentiable. Let numerical subscripts denote partial derivatives with respect to the corresponding arguments.
Then equation (2.2) implies

δ_1(x) + δ_2(x)exp(x_1 − u_1) = 0,  −δ_2(x)exp(x_1 − u_1) + γ_1(u) = 0,  δ_2(x) + γ_2(u) = 0.

Combining the last two equations gives γ_1(u) = −γ_2(u)exp(x_1 − u_1). Assuming that the support of u conditional on x contains more than one point with probability one, so that u can vary for given x, it follows from this equation that γ_2(u) = 0 with probability one. Then the second and first equations imply γ_1(u) = 0 and δ_1(x) = δ_2(x) = 0, implying that both γ and δ are constant functions. Thus equation (2.2) implies that δ(x) is a constant function, and hence g_0(x) is identified, up to a constant, by Theorem 2.1.

There is a simple sufficient condition for identification that generalizes the rank condition for identification in a linear simultaneous equations system. Let z be partitioned as z = (z_1', z_2')', and let x, z_1, and z_2 subscripts denote differentiation with respect to x, z_1, and z_2, respectively.

Theorem 2.3: If g_0(x,z_1), λ_0(u), and Π(z) are differentiable, the boundary of the support of (z,u) has zero probability, and with probability one rank(Π_{z_2}(z)) = d_x, then g_0(x,z_1) is identified.

If Π(z) were linear in z, then rank(Π_{z_2}(z)) = d_x is precisely the condition for identification of one equation of a linear simultaneous system, in terms of the reduced form coefficients. Thus, this condition is a nonlinear, nonparametric generalization of the rank condition.

It is interesting to note, as pointed out by a referee, that this condition leads to an explicit formula for the structure. Note that h(x,z) = E[y|x,z] = g(x,z_1) + λ(x − Π(z)). Differentiating gives

h_x(x,z) = g_x(x,z_1) + λ_u(u),  h_{z_1}(x,z) = g_{z_1}(x,z_1) − Π_{z_1}(z)'λ_u(u),  h_{z_2}(x,z) = −Π_{z_2}(z)'λ_u(u).

Then, if rank(Π_{z_2}(z)) = d_x for almost all z, multiplying h_{z_2}(x,z) by D(z) = [Π_{z_2}(z)Π_{z_2}(z)']^{-1}Π_{z_2}(z) and solving gives λ_u(u) = −D(z)h_{z_2}(x,z). Plugging this result into the other terms gives

(2.3)  g_x(x,z_1) = h_x(x,z) + D(z)h_{z_2}(x,z),  g_{z_1}(x,z_1) = h_{z_1}(x,z) − Π_{z_1}(z)'D(z)h_{z_2}(x,z).
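As a quick check on (2.3): in the scalar case (d_x = 1 and scalar z_2, so that D(z) = 1/Π_{z_2}), the identities can be verified numerically by finite differences. The functional forms below are arbitrary illustrative choices, not taken from the paper.

```python
import math

# Illustrative (hypothetical) structural objects; any smooth choices work.
g   = lambda x, z1: math.sin(x) + x * z1          # structural function g(x, z1)
lam = lambda u: u ** 3                            # additive component lambda(u)
Pi  = lambda z1, z2: z1 + math.exp(z2)            # reduced form Pi(z)
# Conditional expectation h(x, z) = g(x, z1) + lambda(x - Pi(z))
h   = lambda x, z1, z2: g(x, z1) + lam(x - Pi(z1, z2))

def d(f, args, i, eps=1e-6):
    """Central finite-difference derivative of f in its i-th argument."""
    hi, lo = list(args), list(args)
    hi[i] += eps
    lo[i] -= eps
    return (f(*hi) - f(*lo)) / (2 * eps)

x, z1, z2 = 0.7, 0.3, 0.2
hx, hz1, hz2 = d(h, (x, z1, z2), 0), d(h, (x, z1, z2), 1), d(h, (x, z1, z2), 2)
Pz1, Pz2 = d(Pi, (z1, z2), 0), d(Pi, (z1, z2), 1)
D = 1.0 / Pz2                                     # scalar version of D(z)

gx_formula  = hx + D * hz2                        # (2.3), first equation
gz1_formula = hz1 - Pz1 * D * hz2                 # (2.3), second equation
print(abs(gx_formula - d(g, (x, z1), 0)) < 1e-4,
      abs(gz1_formula - d(g, (x, z1), 1)) < 1e-4)
```

Both identities hold to finite-difference accuracy, illustrating that the structural derivatives are recoverable from the identified objects h and Π alone.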
This formula gives an explicit equation for the derivatives of the structural function in terms of the identified conditional expectations h(x,z) and Π(z).

So far we have only considered identification of g_0(x,z_1) up to an additive constant. This is sufficient for many purposes, such as comparing g_0(x,z_1) at different values of (x,z_1). In other cases, such as forecasting demand quantity, it is also desirable to know the level of g_0(x,z_1), which can be identified if a location restriction is imposed on the distribution of ε. For example, suppose that E[ε] = 0, so that E[y] = E[g_0(x,z_1)]. Then for some function τ(u) with ∫τ(u)du = 1, it follows from equation (2.1) that

(2.4)  ∫E[y|x,z_1,u]τ(u)du − E[∫E[y|x,z_1,u]τ(u)du] + E[y] = g_0(x,z_1) − E[g_0(x,z_1)] + E[y] = g_0(x,z_1).

Thus, under the familiar restriction E[ε] = 0, the conditions for identification of g_0(x,z_1) up to a constant also serve to identify the level of g_0(x,z_1).

Another approach to identifying the constant term is to place restrictions on λ_0(u). For example, the restriction λ_0(0) = 0 is a generalization of a condition that holds in the linear simultaneous equations model because, in the linear model where g_0(x,z_1) and Π(z) are linear and equation (1.1) holds with population regressions replacing conditional expectations, the regression of y on (x,z_1) and u will be linear in u. More generally, we will consider imposing the restriction λ_0(ū) = λ̄ for some known ū and λ̄. In this case

(2.5)  E[y|x,z_1,ū] − λ̄ = g_0(x,z_1) + λ_0(ū) − λ̄ = g_0(x,z_1).

Thus, under the restriction λ_0(ū) = λ̄ there is a simple, explicit formula for the structural function in terms of h, and the constant will be identified. The constant would also be identified under other types of restrictions, but this will suffice for many purposes.

3. Estimation

Equation (2.1) can be estimated in two steps by combining estimation of the residual u with estimation of the additive regression in equation (2.1). The first step of this procedure is the
formation of nonparametrically estimated residuals û_i = x_i − Π̂(z_i), i = 1,...,n. The second step is estimation of the additive regression in equation (2.1), using the estimated residuals in place of the true ones. An estimator of the structural function g_0(x,z_1) can then be recovered by pulling out the component that depends on those variables.

Estimation of additive regression models has previously been considered in the literature, and we can apply those results for our second step estimation. There are at least two general approaches to estimation of an additive component of a nonparametric regression. One is to use an estimator that imposes additivity and then pull out the component of interest. This approach for kernel estimators has been considered by Breiman and Friedman (1985), Hastie and Tibshirani (1991), and others. Series estimators are particularly convenient for imposing additivity, by simply excluding interaction terms, as in Stone (1985) and Andrews and Whang (1990). Another approach to estimating the additive component g_0(x,z_1) is to use the fact that, for a function τ(u) such that ∫τ(u)du = 1,

∫E[y|x,z_1,u]τ(u)du = g_0(x,z_1) + ∫λ_0(u)τ(u)du.

Then g_0(x,z_1) can be estimated by integrating, over u, an unrestricted estimator of the conditional expectation. This partial means approach to estimating additive models has been proposed by Newey (1994), Tjostheim and Auestad (1994), and Linton and Nielsen (1995). It is computationally convenient for kernel estimators, although it is less asymptotically efficient than imposing additivity for estimation of √n-consistent functionals of additive models when the disturbance is homoskedastic. In this paper we focus on series estimators because of their computational convenience and high efficiency in imposing additivity.

To describe the two-step series estimator we initially consider the first step. For each positive integer L let r^L(z) = (r_{1L}(z),...,r_{LL}(z))'
be a vector of approximating functions. In this paper we will consider in detail polynomial and spline approximating functions, which will be further discussed below. Let an i subscript index the observations, let n be the total number of observations, and let r_i = r^L(z_i). Let Π̂(z) be the predicted value from a regression of x_i on r^L(z_i),

(3.1)  Π̂(z) = r^L(z)'γ̂,  γ̂ = (R'R)^{-1}R'(x_1,...,x_n)',  R = [r_1,...,r_n]'.

To form the second step, let w = (x', z_1', u')' and let p^K(w) = (p_{1K}(w),...,p_{KK}(w))' be a vector of approximating functions such that each p_{kK}(w) depends either on (x,z_1) or on u, but not both. Exclusion of the interaction terms will mean that any linear combination of the approximating functions has the same additive structure as in equation (2.1), i.e. that additivity is imposed. Also, let 1(A) denote the indicator function for the event A, and let τ(w) denote a function of the form

(3.2)  τ(w) = ∏_{j=1}^{d+d_x} 1(a_j ≤ w_j ≤ b_j),

where the a_j and b_j are finite constants, w_j is the jth component of w, and d = d_x + d_1 is the dimension of (x,z_1). This τ(w) is a trimming function to be further discussed below. Let û_i = x_i − Π̂(z_i), ŵ_i = (x_i', z_{1i}', û_i')', p̂_i = p^K(ŵ_i), and τ̂_i = τ(ŵ_i), where an i subscript refers to the ith observation. The second step is obtained by regressing y_i on p̂_i for each observation where τ̂_i = 1, giving

(3.3)  ĥ(w) = p^K(w)'β̂,  β̂ = (P̂'P̂)^{-1}P̂'Y,  P̂ = [τ̂_1 p̂_1,...,τ̂_n p̂_n]',  Y = (τ̂_1 y_1,...,τ̂_n y_n)'.

The trimming function is convenient for the asymptotic theory. It can be used to exclude large values of w that might lead to outliers in the case of polynomial regression. Although we assume that the trimming function is nonrandom, the results would extend to the case where the limits a_j and b_j are estimated. In particular, if these limits are estimated from independent data (e.g. a subsample that is not otherwise used in estimation), then the results described here will still hold.

The estimator ĥ(w) can be used to construct an estimator of the structural function by collecting those terms that depend only on (x,z_1) and those that depend only on u. Suppose that p_{1K}(w) = 1 is the constant, that the next K_1 terms depend only on (x,z_1), and that the remaining terms depend only on u. Then the estimators can be formed as

(3.4)  ĝ(x,z_1) = ĉ_g + ∑_{k=2}^{K_1+1} β̂_k p_{kK}(x,z_1),  λ̂(u) = ĉ_λ + ∑_{k=K_1+2}^{K} β̂_k p_{kK}(u),  ĉ_g + ĉ_λ = β̂_1.

These estimators are uniquely defined except for the constant terms ĉ_g and ĉ_λ. For many objects of estimation, such as predicting how g(x,z_1) changes as x or z_1 changes, the constants are not important.
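The following is a minimal self-contained sketch of the two steps (3.1)-(3.3) for scalar x and z_1 with z = (z_1, z_2), using low-order polynomial bases and least squares coded by hand. The basis orders, trimming bound, and linear data-generating process are all illustrative assumptions, not the paper's choices; a linear truth makes it easy to see the estimator removing the endogeneity bias.

```python
import random

def lstsq(X, y):
    """OLS via normal equations, solved by Gaussian elimination with pivoting."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for r in range(c + 1, k):
            m = A[r][c] / A[c][c]
            A[r] = [arj - m * acj for arj, acj in zip(A[r], A[c])]
            b[r] -= m * b[c]
    beta = [0.0] * k
    for c in range(k - 1, -1, -1):
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    return beta

# Hypothetical DGP satisfying (1.1): g0(x,z1) = 1 + 2x + z1, eps = 0.8u + eta.
rng = random.Random(1)
n = 4000
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [a + b + c for a, b, c in zip(z1, z2, u)]            # Pi0(z) = z1 + z2
y = [1.0 + 2.0 * xi + z1i + 0.8 * ui + rng.gauss(0, 1)
     for xi, z1i, ui in zip(x, z1, u)]

# First step (3.1): regress x on r^L(z), form residuals u_hat.
rL = lambda a, b: [1.0, a, b, a * a, b * b, a * b]
gam = lstsq([rL(a, b) for a, b in zip(z1, z2)], x)
uh = [xi - sum(g * r for g, r in zip(gam, rL(a, b)))
      for xi, a, b in zip(x, z1, z2)]

# Second step (3.2)-(3.3): additive basis, no (x,z1)-u interaction terms,
# keeping only observations inside the trimming box |w_j| <= 2.5.
pK = lambda xi, z1i, ui: [1.0, xi, z1i, xi * xi, z1i * z1i, xi * z1i, ui, ui * ui]
keep = [i for i in range(n) if all(abs(v) <= 2.5 for v in (x[i], z1[i], uh[i]))]
beta = lstsq([pK(x[i], z1[i], uh[i]) for i in keep], [y[i] for i in keep])

# For comparison: ignoring endogeneity biases the slope on x upward.
naive = lstsq([[1.0, x[i], z1[i]] for i in keep], [y[i] for i in keep])
print("naive slope: %.2f, two-step slope: %.2f (truth 2.0)" % (naive[1], beta[1]))
```

Because the second-stage basis contains no products of (x, z_1) terms with û terms, the fitted ĥ is additive as in (2.1); collecting the terms that depend only on (x, z_1) then yields ĝ up to a constant, as in (3.4).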
For example, if x includes the log of price and g(x,z_1) represents log-demand, then ĝ(x,z_1) is useful for estimating elasticities (i.e. derivatives of g) even when the constant is unknown. In other cases the constant is important. For example, knowing the level of demand is important for predicting tax receipts. In such cases, to complete the definition of the estimator the constant terms should be specified. Because of the trimming we cannot use the most familiar condition, E[ε] = 0, to estimate the constants. However, the condition λ_0(ū) = λ̄, which generalizes the linear model restriction λ_0(0) = 0, can easily be used to estimate the constants. Choosing the constants so that λ̂(ū) = λ̄ and ĉ_g + ĉ_λ = β̂_1 leads to an estimator satisfying equation (3.4). In this case it follows that the estimators of the individual components are

(3.6)  ĝ(x,z_1) = ĥ(x,z_1,ū) − λ̄,  λ̂(u) = ĥ(x,z_1,u) − ĥ(x,z_1,ū) + λ̄,

where the right-hand side of the second equation does not depend on (x,z_1), by additivity of ĥ.

We consider two specific types of series estimators, corresponding to polynomial and spline approximating functions. To describe these, let μ = (μ_1,...,μ_d)' denote a multi-index, i.e. a vector of nonnegative integers, with norm |μ| = ∑_{j=1}^d μ_j, and let z^μ = ∏_j (z_j)^{μ_j}. For a sequence (μ(ℓ))_{ℓ=1}^∞ of distinct such vectors, a power series approximation for the first stage is given by r_{ℓL}(z) = z^{μ(ℓ)}. To describe a power
series in the second stage, let (μ(k))_{k=1}^∞ denote a sequence of multi-indices with dimension the same as w, and such that for each k, w^{μ(k)} depends only on (x,z_1) or on u, but not both. This feature can be incorporated by making sure that the first d components of μ(k) are zero if any of the last d_x components are nonzero, and vice versa. Then for the second stage a power series approximation can be obtained as p_{kK}(w) = w^{μ(k)}.

For the asymptotic theory it will be assumed that each multi-index sequence is ordered with the degree |μ(ℓ)| increasing in ℓ or k, and that all terms of a particular order are included before increasing the order.

The theory to follow uses orthonormal polynomials, which may also have computational advantages. For the first step, replacing each power z^{μ(ℓ)} by the product of univariate polynomials of the same order, where the univariate polynomials are orthonormal with respect to some distribution, may lead to reduced collinearity. The estimator will be numerically invariant to such a replacement, because |μ(ℓ)| is monotonically increasing. An analogous replacement could also be used in the second step.

Regression splines are smooth piecewise polynomials with fixed knots (join points). They have been considered in the statistics literature by Agarwal and Studden (1980). They are different than smoothing splines; for regression splines the smoothing parameter is the number of knots. They have attractive features relative to power series in that they are less sensitive to outliers and to bad approximations over small regions. The theory here requires that the knots be placed in the support of z, which therefore must be known, and that the knots be evenly spaced.
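Splines of this kind are easy to construct directly. The sketch below builds the univariate truncated-power basis with evenly spaced knots on [-1, 1] that is defined formally below; the degree m and knot count J are arbitrary illustrative choices.

```python
def spline_basis(u, m, J):
    """p_{1J}(u),...,p_{m+J,J}(u): powers 1, u, ..., u^m plus truncated-power
    terms {[u + 1 - 2(l-m-1)/J]_+}^m at the J-1 evenly spaced interior knots."""
    pos = lambda c: c if c > 0 else 0.0              # (c)_+ = 1(c > 0) * c
    basis = [u ** (l - 1) for l in range(1, m + 2)]  # 1, u, ..., u^m
    for l in range(m + 2, m + J + 1):
        knot = -1.0 + 2.0 * (l - m - 1) / J          # knots at -1 + 2k/J
        basis.append(pos(u - knot) ** m)
    return basis

print(spline_basis(0.5, m=1, J=2))   # linear spline, one interior knot at 0
```

With m fixed and J growing this spans the regression-spline space described here; in practice one would transform to the better-conditioned B-spline basis, as noted below.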
It should be possible to generalize the results to allow the boundary of the support to be estimated and the knots not evenly spaced, but these generalizations lead to further technicalities that we leave to future research.

To describe regression splines it is convenient to assume that a_j = −1, b_j = 1, and that the support of z is a cartesian product of the interval [−1,1]. Let (c)_+ = 1(c > 0)·c. For a scalar u, an mth degree spline with J − 1 evenly spaced knots on [−1,1] is a linear combination of the functions

p_{ℓJ}(u) = u^{ℓ−1} for 1 ≤ ℓ ≤ m+1,  p_{ℓJ}(u) = {[u + 1 − 2(ℓ−m−1)/J]_+}^m for m+2 ≤ ℓ ≤ m+J.

In this paper we take m fixed but will allow J to increase with the sample size. For a set of multi-indices {μ(ℓ)} with μ_j(ℓ) ≤ m + J_j for each j and ℓ, the approximating functions will be products of univariate splines. In particular, if the number of knots for the jth component of z is J_j, then the approximating functions could be formed as

(3.7)  r_{ℓL}(z) = ∏_{j=1}^{d_z} p_{μ_j(ℓ),J_j}(z_j),  ℓ = 1,...,L.

Throughout the paper it will be assumed that L grows in such a way that the ratio of the numbers of knots for each pair of elements of z is bounded above and away from zero. Spline approximating functions in the second step can be formed in the analogous way, from products of univariate splines, with additivity imposed by only including terms that depend only on one of (x,z_1) or u.

The theory to follow uses B-splines, which are linear transformations of the above functions that have lower multicollinearity. The low multicollinearity of B-splines and a recursive formula for calculation also lead to computational advantages; e.g. see Powell (1981).

4. Convergence Rates

In this section we derive mean-square error and uniform convergence rates. To obtain these results it is important to impose some regularity conditions. Let ε = y − h_0(w) and u = x − Π_0(z). Also, for a random matrix
Y, let ‖Y‖_ν = {E[‖Y‖^ν]}^{1/ν} for ν < ∞, and let ‖Y‖_∞ be the infimum of constants C such that Prob(‖Y‖ < C) = 1, where for a matrix D, ‖D‖ = [trace(D'D)]^{1/2}. Let X = (x,z).

Assumption 1: {(y_i, x_i, z_i)} (i = 1, 2, ...) is i.i.d., and Var(x|z) and Var(y|X) are bounded.

The bounded second conditional moment assumption is quite common in the series estimation literature (e.g. Stone, 1985). Relaxing it would lead to complications that we wish to avoid. Next, we impose the following requirement on z.

Assumption 2: z is continuously distributed with density that is bounded away from zero on its support, and the support of z is a cartesian product of compact, connected intervals. Also, for 𝒲 = {w : τ(w) = 1}, w is continuously distributed, the density of w is bounded away from zero on 𝒲, and 𝒲 is contained in the interior of the support of w.

This assumption is useful for deriving the properties of series estimators like those considered here (e.g. see Newey, 1997). It allows us to bound below the eigenvalues of the second moment matrix of the approximating functions. Also, an identification condition like those discussed in Section 2 is embodied in this assumption. The density of w being bounded away from zero means that there is no functional relationship between (x,z_1) and u, so that by Theorem 2.2 identification holds. This assumption, along with smoothness conditions discussed below, implies that the dimension of z must be at least as large as the dimension of (x,z_1), a familiar order condition for identification.

To see why Assumption 2 implies the order condition, note that the density of w is bounded away from zero on an open set. Then, because w is a one-to-one function of x, z_1, and Π(z), the density of
(z_1, Π(z)) must be bounded away from zero on an open set. But (z_1, Π(z)) is a vector function of z that will be smooth under assumptions given below, so its range must be confined to a manifold of smaller dimension than (z_1, Π(z)) unless z has at least as many components as (z_1, Π(z)). Since the dimensions of Π(z) and x are the same, z must have as many components as (x, z_1).

Some discreteness in the distribution of z can be allowed for without affecting the results. In particular, Assumption 2 can be weakened to hold only for some components of z. Also, one could allow some components of z to be discretely distributed, as long as they have finite support. On the other hand, it is not easy to extend our results to the case where there are discrete instruments that take on an infinite number of values.

Some smoothness conditions are useful for controlling the bias of the series estimators.

Assumption 3: Π_0(z) is continuously differentiable of order s_1 on the support of z, and g_0(x, z_1) and λ_0(u) are Lipschitz and continuously differentiable of order s on 𝒲.

The derivative orders s and s_1 control the rate at which polynomials or splines approximate the functions. In particular, the rate of approximation for g_0(x, z_1) and λ_0(u), and hence also for h_0(w), will be O(K^{-s/d}), where d is the dimension of (x, z_1).

The last regularity condition restricts the rate of growth of K and L. Recall that d denotes the dimension of (x, z_1).

Assumption 4: s > 2d, and either a) for power series or b) for splines, K and L grow slowly enough that a condition of the form (K^a + KL^b)[(L/n)^{1/2} + L^{-s_1/d_z}] → 0 holds, with smaller exponents a and b sufficing in the spline case. For example, for splines, if K and L grow at the same rate, then this condition requires that they grow slower than a fractional power of n.

As a preliminary step it is helpful to derive convergence rates for the conditional expectation estimator ĥ(w). Let F_0 denote the distribution function of w. The proofs of this and subsequent results are given in the Appendix.

Lemma 4.1: If Assumptions 1-4 are satisfied, then

∫τ(w)[ĥ(w) − h_0(w)]² dF_0(w) = O_p(K/n + K^{-2s/d} + L/n + L^{-2s_1/d_z}),

sup_{w∈𝒲} |ĥ(w) − h_0(w)| = O_p(K^q[(K/n)^{1/2} + K^{-s/d} + (L/n)^{1/2} + L^{-s_1/d_z}]),

where q = 1/2 for splines and q = 1 for power series.
This result leads to convergence rates for the structural estimator in equation (3.4). We will treat the constant term a little differently for the mean square and uniform convergence rates, so it is helpful to state and discuss them separately. For mean square convergence we give a result for mean-square convergence of the de-meaned difference between the estimator and the truth. Let τ̄ = E[τ(w)] and Δ̂(x,z_1) = ĝ(x,z_1) − g_0(x,z_1).

Theorem 4.2: If Assumptions 1-4 are satisfied, then

∫τ(w)[Δ̂(x,z_1) − ∫τ(w)Δ̂(x,z_1)F_0(dw)/τ̄]² F_0(dw) = O_p(K/n + K^{-2s/d} + L/n + L^{-2s_1/d_z}).

The convergence rate is the sum of two terms, depending on the number K of approximating functions in the second step and the number L in the first step. Each of these two terms has a form analogous to that derived in Stone (1985), which would attain the optimal mean-square convergence rate for estimating a conditional expectation if K and L were chosen appropriately. In particular, if K is chosen proportional to n^{d/(2s+d)} and L proportional to n^{d_z/(2s_1+d_z)}, and Assumption 4 is satisfied, then the conclusion of Theorem 4.2 is

(4.1)  ∫τ(w)[Δ̂(x,z_1) − ∫τ(w)Δ̂(x,z_1)F_0(dw)/τ̄]² F_0(dw) = O_p(max{n^{-2s/(2s+d)}, n^{-2s_1/(2s_1+d_z)}}).

From Stone (1982) it follows that the rates in this expression are optimal for their respective steps, i.e. n^{-2s/(2s+d)} would be optimal for estimation of g_0(x,z_1) if u did not have to be estimated in the first step. Thus, when K and L are chosen appropriately, the mean square error convergence rate of the second step estimator is the slower of the optimal rate for the first step and the optimal rate for the second step that would obtain if the first step did not have to be estimated. An important feature
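To see which step binds in (4.1), one can simply compare the two rate exponents. The sketch below does this arithmetic for illustrative dimension and smoothness values (not values used in the paper).

```python
# Compare the two exponents in (4.1): the second-step rate n^(-2s/(2s+d))
# and the first-step rate n^(-2s1/(2s1+dz)); a larger exponent means a
# faster rate, and the overall rate is governed by the smaller exponent.
def mse_rate_exponents(s, d, s1, dz):
    return 2 * s / (2 * s + d), 2 * s1 / (2 * s1 + dz)

# Illustrative values: d = dim(x, z1) = 2, dz = dim(z) = 3,
# second stage smoother (s = 4) than first stage (s1 = 3).
second, first = mse_rate_exponents(s=4, d=2, s1=3, dz=3)
print(round(second, 3), round(first, 3))
print(min(second, first))   # the slower (first) step dominates here
```

Here the first-step exponent 2/3 is smaller than the second-step exponent 4/5, so the first-step rate dominates the mean-square error, exactly the "slower of the two optimal rates" behavior described above.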
of this result is that the second step convergence rate is the optimal one for a function of a d-dimensional variable, rather than the slower rate that would be optimal for estimation of a general function of w, where additivity was not imposed. This occurs because the additivity of h_0(w) is used in estimation. It is essentially the exclusion of interaction terms that leads to this result, as discussed in Andrews and Whang (1990).

Although a full derivation of bounds on the rate of convergence of estimators of g_0(x,z_1) is beyond the scope of this paper, something can be said. It is known from Stone (1982) that the best attainable mean-square error (MSE) rate for estimation of g_0(x,z_1) when u is known is n^{-2s/(2s+d)}. Since the best rate when u is unknown cannot be better than the best rate when u is known, we know that our estimator attains the best rate when it attains this rate. Setting K proportional to n^{d/(d+2s)} and L proportional to n^{d_z/(d_z+2s_1)} to minimize each component of the mean-square error in Lemma 4.1 or Theorem 4.2, it follows that the optimal second stage rate will be attained for splines when s/d ≤ s_1/d_z (and that the estimator then has the best rate). For power series the side conditions of Assumption 4 are even stronger and would narrow the ranges of s, d, s_1, and d_z where the estimator could attain the best convergence rate for the second step. The condition s/d ≤ s_1/d_z means that the second stage is at least as smooth (relative to its dimension) as the first stage. We do not know what the optimal convergence rate would be when the first stage is less smooth than the second stage, although in that case we know that the first stage convergence rate would dominate the mean-square error of our estimator if K and L are chosen to minimize the mean-square error in Lemma 4.1 or Theorem 4.2. Our estimator could conceivably be suboptimal in that case.
Also, the side conditions in Assumption 4 are quite strong and rule out any possibility of optimality of our estimator when neither the first nor the second stage is very smooth (e.g. when s/d ≤ 1).

It is interesting to note that the convergence rates in equation (4.1) are only relevant for the function h(w) itself and not for its derivatives. Because û is plugged into ĥ(w), one might think that convergence rates for derivatives are also important, as in a Taylor expansion of ĥ(û) around ĥ(u). The reason that they are not is that û depends only on the conditioning variables x and z in equation (2.1), a feature that allows us to work with a Taylor expansion of h₀(w) rather than ĥ(w). (We thank a referee for pointing this out and for giving a slightly more general version of the following discussion of optimality.)

Theorem 4.2 gives convergence rates that are net of the constant term. The mean-square convergence rate for ĝ(x,z₁) itself will depend on the identifying assumption for the constant term, and it would be possible to obtain mean-square rates that include the constant term by specifying a restriction that identifies it. With the trimming included, the usual restriction E[ε] = 0 does not identify the constant, while the restriction E[τ(w)ε] = 0, under which the constant could also be recovered from (3.6), does not seem to be well motivated by the model and so will not be considered.

Uniform convergence rates are more straightforward, because the identification restriction λ₀(ū) = λ̄ used to identify the constant term is a pointwise one, so that uniform rates follow immediately from those for ĥ(w) in Lemma 4.1. In particular, as long as the restriction λ₀(ū) = λ̄ is imposed, leading to an estimator ĝ(x,z₁) = ĥ(x,z₁,ū) - λ̄ that is equal to the coordinate projection of ĥ at u = ū, uniform convergence rates for ĝ(x,z₁), in the supremum norm over values of (x,z₁) with (x,z₁,ū) ∈ W, follow immediately. Specifically:

Theorem 4.3: If Assumptions 1-4 are satisfied, then for q = 1 for power series and q = 1/2 for splines,

    sup_{(x,z₁)} |ĝ(x,z₁) - g₀(x,z₁)| = O_p(K^q[(K/n)^{1/2} + K^{-s/d} + (L/n)^{1/2} + L^{-s₁/d₁}]).
This uniform convergence rate cannot be made equal to Stone's (1982) bounds for either the first or the second step, due to the leading K^q term. Nevertheless, these uniform rates improve on some in the literature, e.g. on Cox (1988), as noted in Newey (1997). Apparently, it is not yet known whether series estimators can attain the optimal uniform convergence rates.

The uniform convergence rate is slower, and the conditions on K and L more stringent, for power series than for splines. Nevertheless we retain the results for polynomials because they have some appealing practical features. They are often used for increased flexibility in functional form. Also, polynomials do not require knot placement, and hence are simpler to use in practice.

We could also derive convergence rates for λ̂(u), and obtain convergence rates identical to those in Theorems 4.2 and 4.3. We omit these results for brevity.

5. Inference

Large sample confidence intervals are useful for inference. We focus on pointwise confidence intervals rather than uniform ones, as in much of the literature. We consider linear functionals of the estimated function, which turn out to include the value of g₀(x,z₁) at a point, and we construct confidence intervals using standard errors that account for the nonparametric first step estimation.
To describe the standard errors and associated confidence intervals it is useful to consider a general framework. For this purpose let h denote a possible true function h₀(w), where the w argument is suppressed for convenience. That is, the symbol h represents the entire function. Let a(h) be a functional, i.e. a mapping from possible functions h to the real line, so that a(h) is a particular number associated with h. The object of interest is assumed to be the value of the functional at h₀, θ₀ = a(h₀), and the estimator is θ̂ = a(ĥ). In this Section we develop standard errors for θ̂ and give asymptotic normality results that allow formation of large sample confidence intervals.

This framework is general enough to include many objects of interest, such as the value of g₀ at a point. For example, under the restriction λ₀(ū) = λ̄ discussed earlier, the value of g₀(x,z₁) at a point can be represented as

(5.1)    g₀(x,z₁) = a(h₀),    a(h) = h(x,z₁,ū) - λ̄.

A general inference procedure for linear functionals will apply to the functional of this equation, i.e. can be used in the construction of large sample confidence intervals for the value of g₀ at a point.

Another example is a weighted average derivative of g₀. This object is particularly interesting when g₀ is an index model that depends on w₁ = (x,z₁)' only through a linear combination, say g₀(w₁) = t(w₁'β₀), where t(q) is a differentiable function with derivative t_q(q). Then for any weight v(w₁) such that v(w₁)t_q(w₁'β₀) is integrable,

(5.2)    ∫ v(w₁)[∂g₀(w₁)/∂w₁]dw₁ = [∫ t_q(w₁'β₀)v(w₁)dw₁]β₀,

so that the weighted average derivative a(h) = ∫v(w₁)[∂h(w)/∂w₁]dw₁, which is also a linear functional of h, is proportional to β₀. This functional may also be useful for summarizing the slope ∂g₀(w₁)/∂w₁ over some range, as discussed in Section 7. Unlike the value of g₀(w₁) at a point, the weighted average derivative functional will be √n-consistent under regularity conditions discussed below.
We can derive standard errors for a class of linear functionals of ĥ, general enough to include many important examples, such as the value of g at a point or a weighted average derivative. The estimator θ̂ = a(ĥ) of θ₀ = a(h₀) is a natural "plug-in" estimator. For example, the value of g at a point could be estimated by applying the functional in equation (5.1) to obtain ĝ(x,z₁) = a(ĥ) = ĥ(x,z₁,ū) - λ̄. In general, linearity of a(h) implies that

(5.3)    θ̂ = Âβ̂,    Â = (a(p₁K), ..., a(pKK)).

Because θ̂ is a linear combination of the two-step least squares coefficients β̂, a natural estimator of the variance can be obtained by applying the formula for parametric two-step estimators (see Newey 1984). As for other series estimators, such as in Newey (1997), this should lead to correct inference asymptotically, because it accounts properly for the variability of the estimator. Let ⊗ denote the Kronecker product and

(5.4)    Q̂ = Σᵢ τᵢp̂ᵢp̂ᵢ'/n,    Σ̂ = Σᵢ τᵢp̂ᵢp̂ᵢ'[yᵢ - ĥ(wᵢ)]²/n,
         Q̂₁ = I_{d_x} ⊗ (R'R/n),    Σ̂₁ = Σᵢ (ûᵢûᵢ') ⊗ (rᵢrᵢ')/n,    Ĥ = Σᵢ τᵢ[∂ĥ(wᵢ)/∂u]' ⊗ p̂ᵢrᵢ'/n.

The variance estimate for θ̂ is then

(5.5)    V̂ = ÂQ̂⁻¹[Σ̂ + ĤQ̂₁⁻¹Σ̂₁Q̂₁⁻¹Ĥ']Q̂⁻¹Â'.

This estimator is like that suggested by Newey (1984) for parametric two-step estimators. It is equal to a term like the standard heteroskedasticity consistent variance estimator, plus an additional nonnegative definite term that accounts for the presence of û. It has this additive form, with the variance of the two-step estimator larger than that of the corresponding one-step estimator, because the second step conditions on the first step dependent variables, making the second and first steps orthogonal. We will show that under certain regularity conditions √nV̂^{-1/2}(θ̂ - θ₀) →_d N(0,1). Thus doing inference as if θ̂ were distributed normally with mean θ₀ and variance V̂/n will be a correct large sample approximation.
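The leading piece of this formula, the heteroskedasticity-consistent part ÂQ̂⁻¹Σ̂Q̂⁻¹Â', can be sketched in a few lines. The following is our own illustration with made-up data, not the paper's code; the correction term ĤQ̂₁⁻¹Σ̂₁Q̂₁⁻¹Ĥ' for the estimated first-step residuals would be added inside the brackets analogously.

```python
import numpy as np

# Sketch (ours) of the heteroskedasticity-consistent piece of (5.4)-(5.5):
# V = A Qhat^{-1} Sigma_hat Qhat^{-1} A', with trimming indicators tau_i.
# The extra term H Q1^{-1} Sigma1 Q1^{-1} H' that accounts for the estimated
# first-step residuals is omitted here for brevity.

def hc_variance(P, resid, tau, A):
    """P: n x K matrix of approximating functions p^K(w_i);
    resid: y_i - hhat(w_i); tau: trimming indicators; A: row(s) of a(p_kK)."""
    n = P.shape[0]
    Q = (P * tau[:, None]).T @ P / n                  # sum tau_i p_i p_i' / n
    S = (P * (tau * resid ** 2)[:, None]).T @ P / n   # sum tau_i p_i p_i' e_i^2 / n
    QinvA = np.linalg.solve(Q, A.T)                   # Q^{-1} A'
    return QinvA.T @ S @ QinvA

rng = np.random.default_rng(0)
P = np.column_stack([np.ones(200), rng.normal(size=200)])  # toy basis
resid = rng.normal(size=200)
tau = np.ones(200)
A = np.array([[1.0, 0.0]])          # functional: first series coefficient
V = hc_variance(P, resid, tau, A)
```

Note that when the squared residuals are all one, the formula collapses to AQ̂⁻¹Â', the usual homoskedastic form, which gives a quick sanity check on an implementation.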
This extends the large sample inference results of Andrews (1991) and Newey (1997) to account for an estimated first step. However, as in this previous work, such results do not specify the convergence rate of a(ĥ). That rate will depend on how fast V̂ goes to infinity. To date there is still not a complete characterization of the convergence rate of a series estimator available in the literature, although it is known when √n-consistency occurs (see Newey, 1997). Those results can be extended to obtain √n-consistency of functionals of the two-step estimators developed here.

Let ‖·‖ = (E[τ(w)(·)²]/τ̄)^{1/2} denote a trimmed mean-square norm, and let T denote the set of functions that can be approximated arbitrarily well in this norm by a linear combination of p^K(w) for large enough K. It is well known, for p^K(w) corresponding to power series or splines, that T includes all additive functions in (x,z₁) and u where the individual components have finite ‖·‖ norm. The critical condition for √n-consistency is that there is ν(w) ∈ T with ‖ν‖ finite such that for any h(w) ∈ T,

(5.6)    a(h) = E[τ(w)ν(w)h(w)].

Essentially this means that a(h) can be represented as an expected product with a function ν(w) in the same space as h. By the Riesz representation theorem, this is the same as the functional being continuous with respect to the mean square norm E[τ(w)(·)²]/τ̄. In this case √n(θ̂ - θ₀) will be asymptotically normal, and its asymptotic variance can be given an explicit expression.

To describe the asymptotic variance in the √n-consistent case, let ρ(z) = E[τ(w)ν(w)∂h₀(w)/∂u' | z]. The asymptotic variance of the estimator can then be expressed
as

(5.7)    V = E[τ(w)ν(w)²Var(y|X)] + E[ρ(z)Var(x|z)ρ(z)'].

For example, for the weighted average derivative in equation (5.2), if v(w) is zero where τ(w) is zero and where the density f₀(w) of w is zero, so that τ(w)v(w) = v(w), then integration by parts shows that equation (5.6) is satisfied with ν(w) = -Proj(f₀(w)⁻¹∂v(w)/∂x | T), where Proj(·|T) denotes the mean-square projection on T, the space spanned by functions that are additive in (x,z₁) and u. The presence of the projection is necessary to make ν(w) lie in T. Then from equation (5.7) the asymptotic variance can be computed with ρ(z) = -E[τ(w){Proj(f₀(w)⁻¹∂v(w)/∂x | T)}∂h₀(w)/∂u' | z].

To state precise results on asymptotic normality it is helpful to introduce some regularity conditions. Let η = y - h₀(w).

Assumption 5: E[‖u‖⁴|X] is bounded, σ²(X) = Var(y|X) is bounded and bounded away from zero, E[η⁴|X] is bounded, and h₀(w) is twice continuously differentiable in u with bounded first and second derivatives.

This condition strengthens the bounded conditional second moment condition to boundedness of the fourth conditional moment, and adds the condition that the conditional variance be bounded away from zero.
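The integration by parts step used in the average derivative example can be checked numerically. The following toy univariate example is our own sketch, with hypothetical weight and structural functions: when the weight v vanishes on the boundary, ∫v(w)h'(w)dw = -∫v'(w)h(w)dw.

```python
import numpy as np

# Toy check (ours) of the integration-by-parts identity behind the average
# derivative representation: if the weight v vanishes on the boundary,
# \int v(w) h'(w) dw = -\int v'(w) h(w) dw.

def trap(f, w):
    """Composite trapezoid rule on the grid w."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(w) / 2.0))

w = np.linspace(0.0, 1.0, 100001)
v = w ** 2 * (1.0 - w) ** 2          # weight, zero at both endpoints
h = np.sin(3.0 * w)                  # hypothetical stand-in for h0
lhs = trap(v * 3.0 * np.cos(3.0 * w), w)   # \int v h'
rhs = -trap(np.gradient(v, w) * h, w)      # -\int v' h
```

The two integrals agree to high precision on a fine grid; this is exactly the mechanism that lets the weighted average derivative be written as the expected product (5.6) without differentiating the estimated function.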
to be the projection of -f₀(w)⁻¹∂v(w)/∂x on the set of functions that are additive in (x,z₁) and u.

The next condition is complementary to Assumption 6. For ā = d + d_x and an ā × 1 vector μ of nonnegative integers, let ∂^μh(w) = ∂^{|μ|}h(w)/∂w₁^{μ₁}⋯∂w_ā^{μ_ā}, where |μ| = Σⱼμⱼ. Also let δ denote a nonnegative integer and |h|_δ = max_{|μ|≤δ} sup_{w∈W} |∂^μh(w)|.

Assumption 6: There exists ν(w) such that a(h₀) = E[τ(w)ν(w)h₀(w)], a(p_kK) = E[τ(w)ν(w)p_kK(w)] for all k ≤ K and all K, and there are β_K with E[τ(w)‖ν(w) - β_K'p^K(w)‖²] → 0 as K → ∞.

This condition requires that the functional a(h) have a representation as an expected product, at least for the truth h₀ and the approximating functions, and that ν(w) can be approximated well by a linear combination of the approximating functions. It is important for the precise form that ν(w) must take that the projection is on functions that are additive in (x,z₁) and u; for example, with the average derivative this condition leads ν(w) to be the projection described above.

Assumption 7: a(h) is a scalar, |a(h)| ≤ C|h|_δ for some δ ≥ 0, and there exists β_K such that, as K → ∞, a(p^K'β_K) is bounded away from zero while E[τ(w){p^K(w)'β_K}²] → 0.

This assumption says that the functional a(h) is continuous in the norm |h|_δ but not in mean square. The lack of mean-square continuity will imply that the estimator is not √n-consistent. Another restriction imposed is that a(h) is a scalar, which is general enough to cover many cases of interest; when a(h) is a vector, asymptotic normality with an estimated covariance matrix would follow from Assumption iii) of Andrews (1991). This condition and Assumption 6 are mutually exclusive, and in contrast to the assumption of Andrews (1991), which is difficult to verify, Assumption 7 is a primitive condition that is relatively easy to verify, as was pointed out by a referee. For example, as noted above, Assumption 6 includes the weighted average derivative, while Assumption 7 obviously applies to the functional a(h) = h(x,z₁,ū) - λ̄ from equation (5.1), which is continuous with respect to the supremum norm; it is straightforward to show that there is β_K such that p^K(x,z₁,ū)'β_K is bounded away from zero but E[τ(w){p^K(w)'β_K}²] → 0.

The next condition restricts the allowed rates of growth for K and L.

Assumption 8: √nK^{-s/d} → 0, √nL^{-s₁/d₁} → 0, and (K⁴L² + K²L⁴)/n → 0 for splines and (K⁶L³ + K³L⁶)/n → 0 for power series.

These conditions are stronger than for single step series estimators, as discussed in Newey (1997), because of complications due to the multistage nature of the estimation problem. Assumption 8 also limits the growth rates of K and L, so that if each grows at the same rate they can grow no faster than n^{1/6} for splines and n^{1/9} for power series.
One can show that this condition requires that s/d > 5/2 and s₁/d₁ > 2 for power series. The next condition is useful for the estimator V̂ of the asymptotic variance to have the right properties.

Assumption 9: Either a) x is univariate, p^K(w) is a spline that is at least a quadratic one, and √nK⁻¹ → 0; or b) x is multivariate, p^K(w) is a power series, h₀(w) is differentiable of all orders, there is a constant C with the absolute value of the jth derivative bounded above by Cʲ, and √nK^{-ε} → 0 for some ε > 0.

This assumption helps guarantee that the derivative ∂h₀(w)/∂u can be approximated by the corresponding derivative of a linear combination of p^K(w), which is important for consistency of the covariance matrix estimator. These conditions are not very general, but more general ones would require series approximation rates for derivatives, which are not available in the literature. This assumption is at least useful for showing that the asymptotic variance estimator will be consistent under some set of regularity conditions.

Next, for the nonnegative integer δ of Assumption 7, let

    ζ_δ(K) = max_{|μ|≤δ} sup_{w∈W} ‖∂^μp^K(w)‖.

Theorem 5.1: If Assumptions 1-3, 5, either 6 or 7, and 8 and 9 are satisfied, then θ̂ = θ₀ + O_p(ζ_δ(K)/√n) and √nV̂^{-1/2}(θ̂ - θ₀) →_d N(0,1). Furthermore, if Assumption 6 is satisfied, then √n(θ̂ - θ₀) →_d N(0,V) and V̂ →_p V.

In addition to the asymptotic normality results needed for large sample confidence intervals, this gives convergence rate bounds and, when Assumption 6 holds, √n-consistency. Bounds on the convergence rate can also be derived from the results in Newey (1997), where it is shown that for power series ζ₀(K) ≤ CK and for splines ζ₀(K) ≤ CK^{1/2}, where C is a generic positive constant. Thus, for example, when δ = 0 the convergence rate bound is K/√n for power series and K^{1/2}/√n for splines, so that the rate can be made close to 1/√n by letting K grow slowly enough.
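In practice Theorem 5.1 licenses the usual Wald construction for a functional such as a point on the profile or an average derivative. The following is our own toy sketch; the variance value passed in is a hypothetical stand-in, back-implied from the empirical standard errors reported later rather than taken from the paper.

```python
import math

# Sketch (ours): Theorem 5.1 justifies treating theta_hat as approximately
# N(theta0, Vhat/n), so a two-sided 95% confidence interval is
# theta_hat +/- 1.96 * sqrt(Vhat / n).

def wald_ci(theta_hat, v_hat, n, z=1.96):
    se = math.sqrt(v_hat / n)
    return theta_hat - z * se, theta_hat + z * se

# Hypothetical Vhat chosen so that sqrt(Vhat/n) is about .000155 at n = 1314.
lo, hi = wald_ci(0.000513, v_hat=3.16e-5, n=1314)
```

With these stand-in numbers the interval excludes zero, matching the kind of statement the empirical section makes about the average derivative being significantly positive.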
For example, suppose a(h) = h(x,z₁,ū) - λ̄ and that h₀(w) and Π₀(z) have derivatives of all orders, so that Assumption 9 b) is also satisfied. Then for every positive, small ε, setting K = Cnᵉ and L = Cnᵉ satisfies all of the conditions, and the convergence rate for power series will then be n^{ε-1/2}, which will be close to 1/√n for ε small enough.

The centering of the limiting distribution at θ₀ in this result is a reflection of the "over fitting" that is present, meaning that the bias of the estimator shrinks faster than the variance. This over fitting is implied by Assumption 8. The order of the bias is K^{-s/d}, so that Assumption 8 requires that the bias shrink faster than 1/√n. Since 1/√n is generally the fastest that the standard deviation of estimators can shrink (such as for the sample mean), the bias of these estimators shrinks faster than the standard deviation. When combined with the condition that K go to infinity slower than n^α for some α < 1, the bias shrinking faster than 1/√n means that the convergence rate for the estimator is bounded away from the optimal rate. This feature of the asymptotic normality result is shared by other theorems for series estimators, as in Andrews (1991) and Newey (1997), and is different than asymptotic normality results for other types of nonparametric estimators, such as kernel regression. It would be good to relax this overfitting condition, but that is an extension that is beyond the scope of this paper.

6. Additive Semiparametric Models

In economic applications there is often a large number of covariates, thereby making nonparametric estimation problematic. The well known curse of dimensionality can make the estimation of large dimensional nonparametric models difficult.
One approach to this problem is to restrict the model in some way while retaining some nonparametric features. The single index model discussed earlier is one example of such a restricted model. Another example is a model that is additive in some components and parametric in others. This type of model has received much attention in the literature as an approach to dimension reduction. Furthermore, it is particularly easy to estimate using a series estimator, by simply imposing restrictions on the approximating functions.

To describe this type of model, let w = (x,z₁) as before, and let w_j (j = 0,1,...,J) denote subvectors of w. A semiparametric additive model is

(6.1)    g₀(w) = w₀'β₀ + Σ_{j=1}^J g_j(w_j).

This model can be estimated by specifying that the approximating functions consist of the elements of w₀ and power series or splines in each w_j. One could also impose similar restrictions on the reduced form. For example, one could specify a partially linear reduced form

(6.2)    Π₀(z) = z̃'π₀ + Σ_{m=1}^M r_m(z_m),

and use power series or splines in each z_m.

The coefficients β₀ of equation (6.1) are examples of mean-square continuous functionals of h₀. Let q(w) be the residual from the mean-square projection of w₀ on functions of the form Σ_{j=1}^J g_j(w_j) + λ(u), for the probability measure P(A) = E[τ(w)1(w∈A)]/E[τ(w)]. Assume that E[τ(w)q(w)q(w)'] is nonsingular, which is an identification condition for β₀, and let ν(w) = (E[τ(w)q(w)q(w)'])⁻¹q(w). Then β₀ = E[τ(w)ν(w)h₀(w)], so that Assumption 6 is satisfied. Hence √n-consistency and asymptotic normality of the series estimator for β₀ follow from the results of Section 5. Robinson (1988) also gave some instrumental variable estimators for a semiparametric model that is linear in endogenous variables. The regularity conditions of Section 5 can be weakened somewhat for this case.
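Imposing the additive structure on a series estimator amounts to nothing more than excluding interaction terms from the design matrix. The following is a minimal sketch of ours, with hypothetical helper names, not the paper's code:

```python
import numpy as np

# Minimal sketch (ours) of imposing the additive structure (6.1) in a series
# estimator: stack the linear part w0 with separate univariate polynomial
# bases for each additive component, omitting all interaction terms.

def additive_design(w0, components, degree=3):
    """w0: n x k matrix entering linearly; components: list of length-n
    arrays, one per additive function g_j (one could also add the first-step
    residual u as a component for the control-function term)."""
    n = w0.shape[0]
    cols = [np.ones(n)] + list(w0.T)
    for c in components:
        cols += [c ** p for p in range(1, degree + 1)]
    return np.column_stack(cols)
```

With two linear covariates, two additive components, and cubic bases, the resulting matrix has 1 + 2 + 3 + 3 = 9 columns, versus the far larger fully interacted tensor-product basis, which is the dimension reduction the section describes.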
The estimator will be √n-consistent under the conditions of Section 5. Basically, only the nonparametric part of the model need satisfy these conditions, so that, for example, w₀ need not be continuously distributed. The restrictions could be imposed in estimation by specifying the approximating functions p^K(w) to consist of the elements of w₀, power series or splines in each w_j (j=1,...,J), and approximating functions for λ(u), and by specifying the reduced form approximating functions to consist of z̃ and power series or splines in each z_m (m=1,...,M). Models where the functions depend on a linear combination plus some additive components are also included in this framework.

7. Empirical Example

To illustrate our approach we investigate the empirical relationship between the hourly wage rate and the number of annual hours worked. While early examinations treated the hourly wage rate as independent of the number of hours worked, recent studies have incorporated an endogenous wage rate (see, for example, Moffitt 1984, Biddle and Zarkin 1989, and Vella 1993) and have uncovered a non-linear relationship between the wage rate and annual hours. Theoretical underpinnings can be assigned to this relationship. One argument is an increasing importance of labor when a greater number of hours is involved (see Oi 1962). Alternatively, Barzel (1973) argues that labor is relatively unproductive at low hours of work due to related start up costs. Moreover, at high hours of work Barzel argues that fatigue decreases labor productivity. These forces will generate hourly wage rates that initially increase but subsequently decrease as daily hours of work increase. Finally, the taxation/labor supply literature argues that a non-linear relationship between hours and wage rates may exist due to the progressive nature of taxation rates. Thus the non-linear relationship is potentially the outcome of many influences, and we capture it in the following model:

(7.1)    yᵢ = zᵢ'β₀ + g₀(xᵢ) + εᵢ,    xᵢ = zᵢ'π₀ + uᵢ,
where yᵢ is the log of the hourly wage rate of individual i, xᵢ is annual hours worked, zᵢ is a vector of individual characteristics, β₀ and π₀ are parameters, g₀ is an unknown function, and εᵢ and uᵢ are zero mean error terms such that E[ε|u] ≠ 0. This is a semiparametric version of equation (1.1), like that discussed in Section 6, with a parametric reduced form. We use this specification because there are many individual characteristics in z, so that it would be difficult to apply fully nonparametric estimation due to the curse of dimensionality.

We estimate this model using data on males from the 1989 wave of the Michigan Panel Survey of Income Dynamics. To preserve comparability with previous empirical studies we follow data exclusions similar to those in Biddle and Zarkin (1989). Accordingly, we only include males aged between 22 and 55 who had worked between 1000 and 3500 hours in the previous year. This produced a sample of 1314 observations. Our measures of hours and wages are annual hours worked and the hourly wage rate respectively.

We first estimate the wage equation by OLS and report the relevant parameters in column 1 of Table 1. The linear hours effect is small and not significantly different from zero. Adjusting for the endogeneity of hours via linear two stage least squares (2SLS), however, indicates that the impact of hours is statistically significant, although the effect is small. The results in column 2, on the basis of the t-statistic for the residuals, suggest hours are endogenous to wages.

To allow the function g to be non-linear we employed alternative specifications in which we account for the endogeneity of hours through various approximations for g. We also allow for some non-linearities in the reduced form by including non-linear terms for the experience and tenure variables. We first choose the number of non-linear terms in the first step specification and then, employing the associated residuals from the chosen first step, we determine the number of approximating terms in the primary equation. We discriminate between alternative specifications on the basis of a cross validation (CV) criterion.
The CV criterion is known to minimize asymptotic mean-square error when first step estimation is not present, so we are hopeful it will lead to estimates with good properties here. Below we consider its properties in a Monte Carlo study, and find that CV performs well. Because the theory requires over fitting, where the bias is smaller than the variance asymptotically, it seems prudent to consider specifications with more terms than CV gives, particularly for inference. Accordingly, for both steps we identify the number of terms that minimizes the CV criterion and then add an additional term. For example, while the approximation which minimized the CV criterion for the first step was a fourth order polynomial in tenure and experience, we generate the residuals from a model which includes a fifth order polynomial in these variables. Note that, to enable comparisons, these were the residuals included in column 2 discussed above.

The CV values for a subset of the second step specifications examined are reported in Table 2. Although we also experimented with splines in each step, we found that the polynomial approximations appeared to dominate. The preferred specification is a fourth order polynomial in hours while accounting for the endogeneity through a third order polynomial in the residuals. The CV criterion for this specification is 182.643. The third column of Table 1 presents the hours coefficients while excluding the residuals, and the fourth column reports the estimates of the hours profile employing these approximations. Due to the over fitting requirement we now take as a base case the specification with a fifth order polynomial in hours and a fourth order polynomial in the residual. We will also check the sensitivity of some results to this choice.

While Table 1 suggests the non-linearity and endogeneity are statistically important, it is useful to examine their respective roles in determining the hours profile.
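For a linear-in-parameters series fit, the leave-one-out cross-validation criterion has a closed form through the hat matrix, so no refitting is needed when comparing specifications. The following is our own sketch of such a criterion, not the authors' code:

```python
import numpy as np

# Sketch (ours) of a leave-one-out CV criterion for choosing the number of
# series terms: for y ~ P beta fit by least squares, the deleted residual is
# (y_i - yhat_i) / (1 - H_ii), where H = P (P'P)^{-1} P' is the hat matrix.

def loo_cv(P, y):
    H = P @ np.linalg.solve(P.T @ P, P.T)
    e = y - H @ y
    return float(np.sum((e / (1.0 - np.diag(H))) ** 2))
```

One would compute this for each candidate polynomial order (or knot count), pick the minimizer, and then, following the paper's over fitting recommendation, add one more term before computing standard errors.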
For comparison purposes Figure 1 plots the change in the log of the hourly wage rate as annual hours increase, where the nonparametric specification is the over fitted case with h = 5 and r = 4 (the polynomial orders in hours and the residual respectively). To facilitate comparisons each function has been centered at its mean over the observations, so the graph shows the deviation of the function from its mean. As annual hours affect the log of the hourly wage rate in an additive manner, we simply plot the impact of hours on the log of the hourly wage rate. In Figure 1 we also plot the quadratic 2SLS estimates like those of Biddle and Zarkin (1989); the profile for these data is very similar to theirs. We also plot the predicted relationship between hours and wages with and without the adjustment for endogeneity using our base specification. It indicates that the relationship is highly non-linear. Furthermore, failing to adjust for the endogeneity leads to incorrect inferences regarding the overall impact of hours on wages and, more specifically, the value of the turning points.

Although we found that the polynomial approximations appeared to best fit the data, we also identified the best fitting spline approximations. We employ cubic splines and allow the number of join points, in each step, to be chosen by the data on the basis of the CV values. The best fitting approximation involved 1 join point in the reduced form and a primary specification with 1 join point in the hours function and 1 join point for the residuals. We again over fit the data by including an additional join point, in each step, above what CV determined. In Figure 2 we plot the predicted relationship between hours and the wage rate from the over fitted spline approximation. This predicted relationship looks remarkably similar to that in Figure 1. Furthermore, the points noted above related to the failure to account for the endogeneity are similar.
Figures 1 and 2 indicate that at high numbers of hours the implied decrease in the wage rate is sufficient to decrease total earnings. This may be due to the small number of observations in this area of the profile and the associated large standard errors. Figures 3 and 4 explore this issue by plotting the 95 percent confidence intervals for our adjusted nonparametric estimates of the wage hours profile. They confirm that the estimated standard errors are large at the upper end of the hours range.

To quantify an average measure of the effect of hours on wages we consider the weighted average derivative of the estimated function over the range where the function is shown to be upward sloping, 1300 to 2500 hours, which represents 81 percent of the total sample. There were only 125 observations above 2500 hours, so that not very many were excluded by ignoring the part of the hours profile in the upper tail. The average derivative for the 1300-2500 region from the polynomial approximation, based on h = 5 and r = 4, is .000513 with a standard error of .000155. The corresponding estimate of the weighted average derivative from our spline approximation is .000505 with a standard error of .000147. These estimates are quite different than the linear 2SLS estimate reported in Table 1, which is consistent with the presence of nonlinearity. Furthermore, for the quadratic 2SLS in Figure 1 the average derivative is .000350 with a standard error of .000142, which is also quite different than the average derivative for the nonparametric estimators. This difference is present despite the relatively wide pointwise confidence bands in Figures 3 and 4. Thus, in our example the average wage change for most of the individuals in the sample is found to be substantially larger by our nonparametric approach than by linear or quadratic 2SLS.
Before proceeding we examined the robustness of these estimates of the average derivative to the specification of the reduced form and to alternative approximations of the wage hours profile. First, we considered alternative specifications in which we reduced the number of approximating terms in the reduced form. While the CV criteria for each step increased with the exclusion of these terms from the reduced form, we found that there was virtually no impact on the total profile and a very minor effect on our estimate of the average derivative of the profile for the region discussed above. For example, consider the estimates from the polynomial approximation. When we included only linear terms in the reduced form the estimated average derivative is .000546 with a standard error of .000175. The corresponding values for the specification including quadratic terms are .000522 with a standard error of .000174. Finally, the addition of cubic terms resulted in an estimate of .000515 with a standard error of .000153. Changes in the specification of the model based on the spline approximations produced similarly small differences.

We also explored the sensitivity of the average derivative estimate to the number of terms in the series approximation. Both the average derivative and its standard error were relatively unaffected by increasing the number of terms in the approximations. For example, with an 8th order polynomial in hours and a 7th order polynomial in the residual the average derivative estimate is .000519 with a standard error of .000156.

Finally, we examine whether we are able to reject the additive structure implied by our model. We do this by including in our preferred specification an interaction
term capturing the product of hours and the residuals. The t-statistic on this variable when included was .231. The CV value for this specification is 182.924, while that for our preferred specification plus this interaction term and a term capturing the product of hours squared and the residual squared is 183.188. On the basis of these results we conclude there is no evidence against the additive structure of the model.

In addition to this empirical evidence we provide simulation evidence featuring the data examined above. The objective of this exercise is to explore the ability of our procedure to accurately estimate the unknown function in the empirical setting examined above. We simulate the endogenous variable through the exogenous variables and parameter estimates from each of the polynomial and spline approximations reported above. We generate the hours variable by simulating the reduced form, incorporating a random component drawn from the reduced form empirical residual vector. From this simulated hours vector we estimate the residuals and generate the higher order terms for hours. We then simulate wages by employing the simulated values of hours and residuals, along with the true exogenous variables, and the parameter estimates discussed above. The random component in the wage equation is drawn from the distribution of the empirical residuals. As we employ the parameter vectors from the over fitted models in the empirical section, the polynomial model has a fifth order polynomial in the reduced form while the wage equation has a 5th order polynomial in hours and a fourth order polynomial in the residuals. The spline approximation employs a cubic spline with 2 join points in the reduced form, while the wage equation has 2 join points for hours and 2 join points for the residuals. We examine the performance of our procedure for 3000 replications of the model.

In order to relax the parametric assumptions of the model we use the CV criterion in each step of the estimation.
That is, for the model which employs the polynomial approximation we first choose the number of approximating terms in the reduced form. Then, on the basis of the residuals from the over fit reduced form, we choose the number of approximating terms in the wage equation. For the model generated by the spline approximation we do the same except we employ a cubic spline and choose the number of join points. For both approximations we trim the bottom and top two and a half percent of observations, on the basis of the hours residuals, from the data with which we estimate the wage equation.

An important aspect of our procedure is the ability of the CV method to correctly identify the specification. To examine this we computed the CV criteria for the wage equation combining up to a fifth order polynomial in hours and a fifth order polynomial in residuals, all 25 possibilities. As there were also five choices in the reduced form, this generated 125 possible specifications of the polynomial model. For the spline model we examined up to 3 join points in the reduced form and then 3 each for hours and residuals in the wage equation. This produced 27 possible specifications. Once again we evaluate the performance of an average derivative estimator like that applied to the actual data: from the Monte Carlo data we estimate the specification that minimizes the CV criterion, plus one additional term, and compute standard errors for the same specification.

First consider the results from the polynomial model. The mean of the average derivative estimates, across the Monte Carlo replications, was .000456, which represents a bias of 8.9 percent. The standard error across the replications was .000153 while the average of the estimated standard errors was .000152. Therefore the estimated standard errors accurately reflect the variability of the estimator in this experiment.
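Checks of this kind, comparing the spread of estimates across replications with the estimated standard errors and computing rejection frequencies, are easy to reproduce in miniature. The sketch below (Python; a plain t-test for a mean on skewed data, not the paper's average-derivative statistic) simulates many replications and tabulates rejection rates at nominal 5, 10 and 20 percent levels against two-sided normal critical values.

```python
import numpy as np

rng = np.random.default_rng(42)
B, n, mu0 = 2000, 200, 1.0                       # replications, sample size, true mean
crit = {0.05: 1.960, 0.10: 1.645, 0.20: 1.282}   # two-sided normal critical values

reject = {a: 0 for a in crit}
for _ in range(B):
    sample = rng.exponential(mu0, n)             # skewed data with mean mu0
    t = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
    for a, c in crit.items():
        reject[a] += abs(t) > c

for a in sorted(crit):
    print(f"nominal {a:.0%}: empirical rejection rate {reject[a] / B:.3f}")
```

With a well-behaved statistic the empirical rates should sit close to the nominal levels, which is the pattern reported for both the polynomial and spline models below.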
For the spline approximation the average estimate was .000442, which represents a bias of 11.1 percent. The standard error across the replications was .000145 while the average of the estimated standard errors was .000144. We also computed rejection frequencies for tests that the average derivative was equal to its true value. For the polynomial model the rejection rates at the nominal 5, 10 and 20 percent significance levels were 6.5, 11.8 and 22.5 percent respectively. The corresponding figures for the spline model were 7.4, 13.1 and 24.0 percent. Thus, asymptotic inference procedures turned out to be quite accurate in this experiment, lending some credence to our inference in the empirical example.

Appendix

We will first prove one of the results on identification.

Proof of Theorem 2.3: Consider differentiable functions δ(x,z₁) and γ(u), and suppose that δ(Π(z)+u, z₁) + γ(u) = 0 identically in z and u. Differentiating with respect to u, z₁, and z₂ then gives

∂δ(x,z₁)/∂x + ∂γ(u)/∂u = 0,  Π_{z₁}(z)'∂δ(x,z₁)/∂x + ∂δ(x,z₁)/∂z₁ = 0,  Π_{z₂}(z)'∂δ(x,z₁)/∂x = 0.

If Π_{z₂}(z) is of full rank, it follows from the last equation that ∂δ(x,z₁)/∂x = 0. It then follows from the first two equations that ∂γ(u)/∂u = 0 and ∂δ(x,z₁)/∂z₁ = 0. Now, to show identification we will use Theorem 2.1. By differentiability of g, g₀, λ, and λ₀, the functions δ(x,z₁) = g(x,z₁) − g₀(x,z₁) and γ(u) = λ(u) − λ₀(u) are differentiable. Also, if δ(x,z₁) + γ(u) = 0 with probability one, then by continuity of the functions and the boundary having zero probability, this equality holds identically on the interior of the support of (u,z). Then, as above, all the partial derivatives of δ(x,z₁) and γ(u) are zero with probability one, implying they are constant. Identification then follows by Theorem 2.1. QED.

To avoid repetition, it is useful to prove consistency and asymptotic normality for a general two-step weighted least squares estimator.
To state the Lemmas some notation is needed. Throughout the appendix C will denote a generic positive constant that may be different in different uses, and π represents a possible value of Π₀(z). Let X be a vector of variables that includes among its components a vector of functions of z, and let Π₀(z) = E[x|z]. Let w(X,π) be a vector of functions of X and π, where it will be assumed that, for w = w(X,Π₀(z)),

(A.1) E[y|X] = h₀(w),  E[x|z] = Π₀(z).

Also, let ψ(X) be a scalar, nonnegative function of X, and let τ(w), p^K(w), and R^L(z) be as in the body of the paper. Let

(A.2) β̂ = (P̂'P̂)⁻P̂'Y,  ĥ(w) = p^K(w)'β̂,  P̂ = [τ̂₁ψ₁p̂₁, ..., τ̂ₙψₙp̂ₙ]',  p̂ᵢ = p^K(ŵᵢ),

let W = {w : τ(w) > 0}, and for δ a nonnegative integer let |h|_δ = max_{|μ|≤δ} sup_{w∈W} |∂^μ h(w)|.

Assumption A1: i) either w(X,π) does not depend on π or each component w_j(X,π) is Lipschitz in π; ψ(X) is bounded and w(X,Π₀(z)) is continuously distributed with bounded density; ii) for each K and L there are nonsingular matrices B_K and B_L such that, for P̃^K(w) = B_K p^K(w) and R̃^L(z) = B_L R^L(z), the matrices E[τ(w)ψ(X)²P̃^K(w)P̃^K(w)'] and E[R̃^L(z)R̃^L(z)'] have smallest eigenvalues that are bounded away from zero, uniformly in K and L; iii) there exists ξ(L) such that sup_z ‖R̃^L(z)‖ ≤ ξ(L); iv) for each nonnegative integer d there is ζ_d(K) such that sup_{w∈W} ‖∂^d P̃^K(w)‖ ≤ ζ_d(K); v) there exist α, α₁, β_K, and γ_L such that |h₀ − p^K'β_K|_δ ≤ CK^{−α} and sup_z ‖Π₀(z) − γ_L'R^L(z)‖ ≤ CL^{−α₁}.

Lemma A1: If Assumptions 1 and A1 are satisfied for d = 0 and [K^{1/2}ζ₁(K) + ζ₀(K)ξ(L)][(L/n)^{1/2} + L^{−α₁}] → 0, then

∫ τ(w)ψ(X)²[ĥ(w) − h₀(w)]² dF₀(X) = O_p(K/n + K^{−2α} + L/n + L^{−2α₁}).

If Assumptions 1 and A1 are satisfied for some nonnegative integer d, then

|ĥ − h₀|_d = O_p(ζ_d(K)[(K/n)^{1/2} + K^{−α} + (L/n)^{1/2} + L^{−α₁}]).

Proof of Lemma A1: Note that the value of ĥ is unchanged if a nonsingular linear transformation of p^K(w) and/or R^L(z) is used, so that it can be assumed that p^K(w) = P̃^K(w) and R^L(z) = R̃^L(z). Let τᵢ = τ(wᵢ), pᵢ = p^K(wᵢ), p̂ᵢ = p^K(ŵᵢ), and Q = E[τᵢψᵢ²pᵢpᵢ']. By Assumption A1 the largest eigenvalue of Q is bounded.
u~K 113 Q 2~K Since E[t(w)i/'(X) p Q = prove the results with Q^ = E[rhz)rhz)'] = = L A Next, let L /Vn' + ~K (w)p (w)'] = by construction, I suffices to prove the result it 11 H. i, n(z) is = iKz.), and = IT. ^ O(A^) + CWr-Tf, L Tt + c^;." ) ll^o CJ-lnCz)-?-, L ' Note that for Newey (1997), = y.^Am-HAl^/n ^1=1 1 Theorem max.^ i^n 1 1 of Newey Ilft.-n.l! 11 = Therefore, by Assumption Al + 0(K"^"i) rhz)]^dF(z) + Clly-r, L ll^o p Theorem 1 follows that it of Newey 2 llr-y. II = (A 2 ), TT (1997), (e(L)A ). TT p ^ ii) [TjI/'jPj "^n^nPn^' + 0(K"^"i) and Lemma ^"^ A3, ^ ^ P'P/n. 38 Note that (A. 2) so that by eq. (A^). P (1) Also, by eq. (1997). 11 P = from rhz.)ii^/n iin.-?- l.^.\i.-T.\/n = O (?(L)A ). 1=1 71 p (A.4) y (1), p last inequality follows by the Appendix in Let Also, suppose for ft- 1 a scalar function. £ c(f-r, )'Q,(?-r, ^ Cj-[n(z)-n^(z)]^dF(z) + (A.3a) TT„(z.). 1 £ Cj'[n(z)-3-j^rhz)]^dF(z) + C|(y-?'j^)'(Q^-IKr-9'JI Also, by suffices to it Al, X;.!l,iin.-n.ii^/n (A.3) such that I. notational simplicity that that where the (w) = p C a constant is By analogous reasoning, I. 71 Assumption by (w) p so that the hypotheses and the conclusion will be satisfied with |(K), this replacement. for Consider replacing bounded. is bounded away from zero, so that the is By the Cauchy-Schwartz inequality there £ CCi (w)ll p Q Al, the smallest eigenvalue of Assumption E[IIQ-I!I^] ^ of (A.7a), implying —^ IIQ-Ill Lipschitz in w(X,7r) Markov inequality, ^ ri C^(K)llw.-w.ll ^ C<^(K)lin.-n.ll. :£ /n = ,T.0.llp.ll ^1=1 convex and a mean value expansion T.x.llp.-p.ll ti, Y. •" W Also, by 0. ^1 = x. x. = x., x., and Also, by the 1111 Then by (K). p w, in and i/i. ^1 bounded, (A.5) £ IIQ-QII lir.",x.x.^//^(p.p'. ^1=1 r 1 s + Clir.^Jx.x.-xJp.p'./nll ^1=1 1 11 1 1 ^r 1 - p.p'.)/nll + CllX;.",(x.x.-x.)p.p'./nll ^r 1 ^i=l 1 i i '^I'^i i cy;.",x.x.(llp.-p.ll^ + Z^.llp.mip.-p.lD/n + ^1=1 11 11 1 1 1 1 Cr (K)^.", |fi.-i/r. 
|/n ^1=1^1 ^1 a/2 s C<,(K)\.",lllt.-n.ll^/n + C(r.",x.i/(^llp.ll^/n)^^^(y.",x.x.llp.-p.ll^/n)^ ^1=1 1 1 O (C^(K)^^(L)A ^0 + P Let r). = (w.)], i//.[y.-h of the observations 2 ~ E[t}.IX] is the observations, = 11 tj = for E[ll(P-P)'7)/nll^|X] E[tj.t) i :£ |X] 1 ^ .|X.,X.] where the second to Y result is that = o (1/n). n Also, = = CElx.i//.p'p.]/n (A.7) E[t].|X1 P n ) if | (n"^[<^(K)^?(L)A P + r{K)hh) 1 TT E[|Y ||X] = O n (A P n it ), EdlP'Vnll^] = E[E[IIP'7)ll^/n|X]] = = Ctr(Q)/n = Ctr(I)/n = CK/n. (A.5) X, 111 llP'Vnll^ ^ C[ll(P-P)'Tj/nll^ + eqs. = depending only on E(y;.",x?i/»'*p'.p. Var(y. 1*^1*^1 ^1=1 1 r = o P(P'P)~ P' 39 ll(P-P)'T?/nll^ |X.)/n^] £ 1 Therefore, by the triangle inequality, llP'-rj/nll^] M = (n"^), P Then, since a standard follows that (1) + O (K/n) = P and (A.7) and the smallest eigenvalue of with probability approaching one, for = o 71 (A.5). P Then by Furthermore, 0. bounded, and by independence of last equality follows similarly to eq. (A = Cn~^.",0^1lx.p.-x.p.ll^ < Cn'V.'^, |x.-x. (llp.ll^+llp.ll^)/n ^1=1 1 ^1=1 11 11 '^i 1111 p (1). p 1 E[t}.E[t).17).,X.,X.] |X.,X.] p 2 so = o ) 71 Then by independence ). 1 Then by j. + Cn't",-^.T.IIp.-p.ll^/n = ^^1=1 (X ,...,X = ^.h^(w.), 1 + <^(K)^C(L)A ^ ^0 71 Var(y. |X) = Var(y|X.) and E[-r].Tj.|X] X = and ), = 0.E[y. 1 ip. =0 TT (t) E[^.y. |X] E[tj.E[t).|X.]|X.,X.] (A.6) p 1111 ^1=1 1 = O (C,(K)^A^ + K^^^<,(K)A ^1 ^1 ) 71 t) bounded by 11 ^1=1 1 , Q T)'MT)/n (K/n). P bounded away from zero = (T)'P/n)Q~ (P'Vn) ^ = O (K/n). p (DIIP'Tj/nll^ p h, h, and be h jthe Let = x h. 1 corresponding K sup,,,|h^(w)-p (w)'p| be such that Mw 111 , n x vectors .) .\jj = 0(K 1 h. = x.iJj.hAw.), 11 1 -oc o (M-I)(h-Pe), h^<w) Hh-h«^/n ^ mv Lipschitz. and Jin--IT.« 1=1 It 11 Gil p approaching one that, for s h = P/3 (l)(p-p)'Q(p-/3) = + (h-E3'M(h-h)]/n s O p M-I )'). Let and (B idempotent, (M-I)h o /n = O (A P for ), A^, h TT = VK/Vn + K~" + 11 and i/» y ,...,t \p (l)(y-h)'M(y-h)/n ^ (K/d) + p O (A?). 
p h bounded av^ay from zero with probability Q y = (x p + and (K/n) + C^.", Ilfi.-n.ll^/n + 0(K"^") ^1=1 then follows by the smallest eigenvalue of llp-/3ll^ 1 + M(h-h) + <M-i)hll^/n £ C(T]'MT)/n + y;.",T.i/»^[h^(w.)-h„(w.)]^/n ^1=1 1 1 1 1 + y.",T.^A^[h„(w.)-p'.p]^/n £ ^1=1 11 J]- ,...,h 1 ~ M Then by ). 1 1 h = (h (i.e. = T.0.h_(w.), h. 1 y )', (l)[T)'M7]/n + (h-h)'M(h-h) (DY." £.i//^[h^(w.)-h„(w.)]^/n ^^1=1 r 1 1 1 (l)j:.",T.0^[h^(w.)-p'.p]^/n = ^1=1 1 1 1 1 P {Ah. h Then by the triangle inequality, UT(w)!/»(X)^[h(w)-hQ(w)]^dFj^(X)}^''^ ^ {jT{w).//(X)^[p^(w)' (p-/3)]^dFQ(X)>^^^ + {jT(w)0(X)V(w)'l3-hQ(w)]^dF(w)}^'^^ - giving the first conclusion. + 0{K"") s lip-pll (Aj^), Also, by the triangle inequality, |h(w)-hQ(w)|g s |p^(w)'p-hQ(w)|g + |p^(w)'(p-i3)|^ ^ 0(K ") + Cg(K)llp-pll = «g(K)Aj^). To state Lemma A2, some additional notation as given in the body of the paper. Also, let is p. 40 QED. needed. = p K Z Let (w.), r. , = p S , L (z.), Q , and Q be 2 = ± = y.",T.i/»^p.p'.[y.-h(w.)]^/n, 1111 ^1=1 Q = Q = P'P/n, 1 r r r E[T.^^p.p'. 1111 -^i 1 y.",T.i//.p.{[Sh(w.)/5w]'aw(X.,fT.)/57r®r'. }/n, ^1=1 111 1 1 H = E[T.^/(.p.<[5h„(w.)/aw]'5w(X.,n.)/aTr®r'.>]. 11 ill AQ~^(f: + Assumption A2: HQ~^f ^Q^^H' )Q"^A' differentiable in lla(h)ll i) w 11 1 K For notational convenience, iii) and L and ^ C|h|„; o ir 1 V = AQ"^Z , V subscripts for h^(w) U ii) and + HQ^^Z^Q^^H' )Q"^A' v(w), = E[t(w).//(X)^i'(w)Pj^j^(w)], a(h ) is ~ (^ are twice continuously w(X,Tt) respectively, and the first and second derivatives are either a) there are a scalar and there exists . are suppressed. such that p a(h K. a(pj^j^) i ], H = V = bounded; E[T.t//^p.p'.Var(y. IXJI, 1 ) and U ) = 2 E[T(v^r)^//(X) E[T(w)t/((X)^lli^(w)-/3j^p^(w)ll^] such that for K ~ h (w) - p (w)'p -^ 0, or; b) E[h (w) , vCwjh Z ] — > U (w)], a(h) 0, bounded away from zero. K. 
Under iii) b), let d(X) = [ah (w)/aw]'5w(X,n (z))/a7t, the set of limit points of elements of r T(w)^(X)y(w)d(X) {z)'y on , £, and p(z) let recall the definition of + E[p(z)Var(x|z)p{z)' 41 as be the matrix of projections of and V = E[T(w)0(X)^i^(w)i^{w)'Var(y|X)] £ ]. is and Lemma If Assumptions Al and A2 are satisfied, A2: and ^(L) are bounded C,JK) following go to zero n —^ as away from zero as m.- VnK L^C,Q(K)^^(Lf/n, KLi^Q(Kf^(L)^/n, , and VnV^'^^Ce - e^) -^ N(O.I), VR(Q - Qq) -U Lemma A2: = V N(0,V), L^'^^/v/n + l""i, Aq = [K^^^q(K) A„ = 1/2 /Vn) grow, C,JK) K /n, (K L+L and each of the KXK) then /n, 6 = 6^ + -^V. A^ = K^^^/V^ + K""' + A 71 n U VnK"" -^ and A, and = 0(K Cs-(K) CL^CQ(K)^?(L)Vn -> 0, K^''% = N(0,I). o . A^, = ?(L)L^^^/;/n. Ql + Co(K)^C(L)]A^ + Co(K)K^^^/v^. 1 rate conditions and (A.8) K bounded away from zero, is satisfied, + K^^^^{L)/V^. [L^''^C,(K) + Cn(K)C(L)^]A li 0(L and is First, let TT Note that by L C,^(K)^i^^(K)^(LK+L^ )/n, ^^^(Q-B^) -^ Furthermore, if Assumption A2, Hi) a) A i, \ and (CXK)/Vn) p ^d Proof of VnL Var(y X) 1/2 71 vQc""i /Vn + L and 1/2 ^(L) ^ 0, = (L^''^/V^)(l + L"^^^vQ.~"i) = A 7r /Vn). Therefore, it follows from the convergence bounded away from zero that ^(L) and 0([KCj(K) + K^''^Co(K)^^(L)]L^''^/^n + Cq^^JK/V^) = 0({k\Ci(K)^ + KLCqCKj'^CCL)^ + K^Co(K)^}^''^/v^) -> 0. L^^^Ajj = 0([LCj(K) + L^''^Co(K)C(L)^]L^^^/v^ + L^'^\^^^^(L)/Vn) VnC^CKjA^ = 0(CQ(K)L/yn) -^ 0, 42 2 2 L /n ^ Cq(k)l^''^Cj(k)a^ = o({Cq(k)^lc^(k)^(k+l)}^''^/v^) -^ o. It from these also follows L results that 1/2 -^ A — Cq(K)A 0, and > For simplicity the remainder of the proof will be given for the scalar Also, as in the proof of r = R (z) Q and (z), A[Z + HS H']A'. Let Lemma Q and -^ v^{w) = Ap^(w). 0. equal to identity matrices. Q = Z - CI I, is V Q = By -1 U mean square error (MSE) By . and b, (z) 2 (wj-i/'fw)!! ] — K as > — > (w), V = = Var(y|X) (X) Therefore, = C. 2 E[t(w)i/»(X) u'{w)p K (w)' ]. 
^ E[T(w)i/((X)^lli;(w)-p-,p^(w)ll^] = E[T(w)i//(X)d(X)y(w)rhz)' ]rhz). (z)-b K.L }^'^^ K K.l_, of a least squares projection E[llb (w) = P 2 cr case. (z) IT 0. b^, (z) = (J random variable being projected, Up A = Then K . In this case, E[T(w)i/»(X)^lly(w)-i'-,(w)ll^] I, p positive semidefinite. a) is satisfied. A2,- iii), E[T(w)i//(X)d(X)yj.(w)rhz)']r^(z), 2 show the result with d(X) = [ah-(w)/5w]'aw(X,n-(z))/5jr, Also let CE[t(w)i//(X) suffices to = {tr[FAA'F']}^'^^ s {tr[CFASA'F' ]>^^^ £ tr{CFVF' IIFAll Suppose that Assumption Let it be a symmetric square root of F bounded away from zero and {A.9) Al -^ C,(K)A is 2 L oo. (z)ll no greater than the MSE of the 2 ] E[t(w)i/((X) d(X) :s Since the 2 Up (w)-t'(w)ll 2 s ] K. Furthermore, by Assumption A2, b), ii), K. 2 E[llb (z)-p(z)ll Var(x|z), it ] — as > L — > co. 2 Then by boundedness of cr (X) = Var(y|X) and and by the fact that MSE convergence implies convergence of second moments, follows that V = (A.IO) E[T(w)(^(X)^yj^{w)yj^(w)V^(X)] + E[bj^^(z)Var(x|z)bj^^(z)'] This shows that F is bounded. Suppose that Assumption A2, ~ by the Cauchy-Schwartz inequality, |a(h-,)| = |Ap^| s K. that IIAII — > 00. Also, V > ASA' ^ Next, by the proof of Lemma 2 CIIAII , so Al, 43 F ~ |IAIIIip„ll is V. b) is satisfied. iii) K. K. -^ = 2 IIAII(E[h„(x) Then 1/2 ]) . K. also bounded under Assumption 5. so (A.ll) = IIQ-III (A_) = o Q p H = Further, for (A.12) Lemma Al, implies that the smallest eigenvalue of Q (A„) = o -^ IIQ-III (A_,) = o (1). Ql p similarly to 11 111 ^1=1 H p Now, p 1 ,T.i//.p.d(X.)r'./n, T. = IIH-HII = IIQ -III (1), P (1), p is bounded away from zero with probability approaching one, implying that the largest eigenvalue of O impl3dng (1), (A.13) ^ IIFA{Q~^-I)II IIFAQ'^'^^II^ Also, £ IIBQ~'^II IIBIIO for any matrix (1) IIFA(I-Q)Q~-^II ^ = tr(FAQ"^A'F') ^ CO = (l)IIFAII^ be such that p IIV^a(p'^'p - (A.14) h^)ll (J Also, for h. = T.i/».h-(w.) 1 1 1 0. 
P h^(-)U = O Ip'^C-)'^ - U o (V^"") = £ Cv^llFII|p^(-)'3 - h^(-)k = U o In h = (h,,...,h and 1 Then (K"°'). p p ^ CV^[tr(FAQ"^A'F')]^'^^0(K~°^) = h. Ap = a(p = T.iA.h^(w.) 1 (A.16) 1 1 = A3, a(h) 'p), h = and eqs. In (h,,...,h 1 (1). )', = o (t/iiK"") P Then by o p IIFAQ"^' (h-Pp)/Vnll s Cv/KiiFAQ~^P'/V^llsup,,,|p^(w)'^ - h^(w) (A.15) for -^ (1). P Next, let is Therefore, B. IIFAIIIIQ-IIIO (1) Q {A.14) | (1). P and (A.15), and the triangle inequality, )', v^[a(h)-a(h-)] = FAQ"^P'VV^ + FAQ"^P' (h-h)/\/n + o (1). p Let IT = (n sup_|n (z)-r of each n )', {z)'2r\ h(w.) u. = x.-n., = 0(L around w., U = and d. and by eq. i), (u ,...,u )', = d(X.). (B.O), 44 y be such that By a second order mean-value expansion — (A.17) FAQ ^P'(h-h)/v^ = FAQ ^Y.-^^i.ip.p.d.[-n.-Tl.]/^/R + ^1=1 111 111 = FAQ^^Hq/r' U/^n p 1 111 + FAQ~-^HQ7^R' (n-R' j-VV^ + FAQ~V.",T.i/».p.d.[r'.y-n.l/Vn + ^1=1 111 1 < C^IIFAQ~^''^IICn(K)I.'',lin.-n.ll^/n = O (/SCn(K)A^) = o (1), ^0 ^1=1 71 p p 11 f ll^ll Also, by d. p '^ nHQ H' bounded and ^ multivariate regression of — -^"l yv on T.i/>.p.d. Ill sum equal to the matrix r., :^ 1 1 1 n HQ, H Y. ^ 2,^ '^ 2 .x.ip .p.p'.d./n ^1=1 from the of squares 1111 '^ ^ CQ. 1 Therefore, llFAQ~^HQ~^R'(n-R'3')/v^ll ^ IIFAQ'^HQ'^R'/'/nllV^-sup^lnQCzj-rhz)'?'! (A.18) < [trace(FAQ"^HQ~^Q^Q^^H'Q"^A'F')]^''^0(/nL~"i) ^ CIIFAQ"^^^IIQ(^/riL~"i) = (v/nL""i) = o (1). Similarly, ill IIFAQ~V.",T.i/».p.d.[r'.3'-n.]/v/n (A.19) 111 ^1=1 II s CIIFAQ~^''^IIO(VnL~"i) = o E[IIR'U/V^II^] = tr(Z^) £ Ctr(Ij^) ^ Next, note that (1). p L by E[u^|z] bounded, so by the Markov inequality, 1 (A.20) IIR'U/V?ill = O (L /7 ). P Also, note that IIFAQ'^HQT^H ^ (A.21) (1)IIFAQ~-^''^II p 1 IIFAQ~%(Q7^-I)R'U/\/nll :s (A.22) eq. (1). Therefore, p 11 IIFAQ"^Hq7^IIIQ -IIIIIR'U/V^II = 1 By similar reasoning, and by = = is Ql (A„L^^^) = o p HH' (L^'^^A^,) p = o (A. 8), IIFAQ ^(H-H)R'U/Vnll ^ IIFAQ ^IIIIH-HIIIIR'U/Vnll Noting also that O H (1). p the population matrix mean-square of the regression of 45 (1). 
p ) T.0.p.d. r i*^! on HH' s so that r., CI, E[I1HR' U/Vnll^] = tr(HZ,H') < CK. follows that it 1 1 1 Therefore, 1/2 = IIHR'U/\/nll (K and ), IIFA(Q"^-I)HR'U/V^II ^ IIFAQ"^II (A.23) ll(I-Q)ll IIHR' U/ZSlI = (A^K^'^^) = o Q P (1). P Combining eqs. (A.16)-(A.23) and the triangle inequality gives FAQ~-^P' (h-h)/v^ = (A.24) Next, it FAUR'V/Vn follows as in the proof of (Cq(K)^C(L)A^ + <:^(K)^A^) = o (A.25) + o (1). Lemma Al that (P-PJ'tj/VTTiI IIQ = implying (1), IIFAQ"^(P-P)'t]/v^II ^ IIFAQ~^'^^IIIIQ~^'^^(P-P)'-n/V^II = o (1). P Also, by (A.26) 0, E[IIFA(Q~^-I)P't7/v^II^|X] = tr(FA(I-Q)Q~^ZQ~^(I-Q)A'F' IIFA(I-Q)Q~^II^ s :£ so that = E[7)|X] IIFA(Q~ II FA 11^ I-Q 11^0 II -^ -DP'V/nll (1), Then combining 0. eqs. (A.25)-(A.26) with eq. (A.16) and the triangle inequality gives (A.27) v^[a(h)-a(h^)] = FA{P' n/VR + HR'U/v^) + o (1). p Next, for any vector Note that are Z. <p with across i.i.d. i = II0II e > 0, IIFAII let </i'FA[T.i/».p.T}. Ill ''in for a given in Furthermore, for any 1 £ C and n, IIFAHII E[Z. ] = + Hr.u.]/V^ = Z. 11 1 0, and £ CIIFAII £ C in I > e)Z^ in ] = n6^E[l(|Z. in £ Cne"^li0ll^E[llT.p.ll\[7)||X.]] + | > e){Z. /e)^] £ ne"^E[lZ. in in E[llr.ll'^E[ut |z.]]}/n^ 46 Var(Z. ) = 1/n. in by semidefinite, so that nEUdZ. . in l"^] CI-HH' positive £ Cn ^{Cq(K)^E[IIt.i/».p.II^] + C(L)^E[llr.ll^]} = 0([Cq(K)^K+ ^{L)\]/n) = o(l). /nF(9-0 Therefore, — follows by the Lindbergh-Feller theorem and equation N(0,I) > ) (A.27). CI - As previously noted, Suppose a(h) so that CJK)II/3II, o 9 IIAII^ :£ scalar CtK) 5 ^ V 9-6^ = that p . Next, by and 1 ^ C,AK)\\A\\. o /Vn) = Lemma P max. Al, of Newey |h.-h. w Lipschitz in w(X,7i) triangle inequality, max. 111111 and 11 x.[-27j.(h.-h.)+(h.-h.)^] = v.. | | = = o III.-IT. 11 | = 11 D = 111 ,7).llw.-w.ll ^1=1 /n = iiz-zii = Therefore, (A P £ ). 111 (l)o (1) O it max.^|h.-h.| = o i^r 1 II | ' ' = x.(7}.-t).) I+Itj 'n | }PQ"^A'F'/n. ' Then (1). P so that :s IIFAQ"^II^IIZ-ZII T.^.tj^Ix.-x. 
|/n ^1=1 1 11 IIQ-III = (^(L)A p^ = 47 ) 71 (A„), Q zii 9 (A^ C^(K)^K/n) pQ^O -^0. and CJ^." |x.-t. |/n + iiy;.",T.p.p'.T)^/n ^i=l ri^i i 11 s Hence, by the (1). v }PQ"U'F'/nll 4 4 1/? = ({E[x.llp.irE[7}^|X.]]/nr ) IIFAQ~^(Z-S)Q"^A'F' h„(w) 1 E[J]." tj^It.-x. |/n|X] i follows by 1111 = x., x. Then similarly to the proof of 1 follows it so that by (1), p r r ^ p TT pQpii (A^) + = o P iiy;.",(x.p.p'.-x.p.p'.)T}^/nii ^1=1 = o ) 71 FAQ~^P'diag{l+|T7, b 1^ '-'1=1 P uniformly bounded, o then gives IIAII other case Also, (1). (f(L)A y'.",x.p.p'.-n^/n. E(y;.",T}^llw.-w.ll^/n|X] £ cy.",llw.-w.ll^/n, ^1=1 1 ^1=1 1 1 1 1 y. 'p| p 1 Z = £ Ctr{D)max.^ |h.-h.| = E[t)^|X] = o p IIFAQ~^(Z-S)Q~^A'F'II = IIFAQ~^P'diag{i^ Also, by ) h Note that by (1). p i^n In the p Let |p This result for the " b). (C„(K)A, respectively, tt Also, for (1). iii), p 1 E[D|X] £ CFAQ~^A'F = n {C.JK)/Vn). 5 1 i£n |h.-h. i:£n 1 max. (1997) that all for ^^ Dividing by (C^(K)/\/n). P (1/V^) = i:sn Theorem ^ o covers the case of Assumption A2, a(h) V Op „ . r/9 9-6^ = O (V Therefore, . CIIAII the Cauchy-Schwartz inequality implies IS, = latAp'^)! ^ \Ap^\ 9 IIAII A 2 V £ positive semi-definite, so that is follows from the above proof that It Note that for any large enough. from a scalar. is HZH' + Furthermore, = o (1). p IIBSII s and for any matrix CIIBII CI - E by B, positive semi-definite, so that IIFA(Q~^EQ~^-Z)A'F'I1 < IIFA(Q~^-I)SQ~^A' F' £ O IIFAZIIIII-QIIIIFAQ~^II + IIFAIIo (1)0 (l)III-QII P P inequality, To show -^ JIE-ZII show that suffices to it IIFAZ(q"^-1)A' F' + II 11 -^ it = {[ah{w.)/dw]' av/(X.,fi.)/dn d. 11 1 C(sup^|h-hQ|2 0. 11 and d. = d(X.). bounded, 3w(X,7r)/57r Al, -^ follows similarly to the argument for 1 Then by the conclusion of Lemma ^ii^i'V^i'^^".^ Let 0. IIZ -Z,ll Then by the triangle (1). P FA(Q~^HQ~^Z^Q^^H'Q"^ - HE^H')A'F' this last result, note first that that = o (1) P < IIFAQ"^II^II(I-Q)ZII + II = + l.^^\\fl.-Tl//n) (q(K)2A^). 
Therefore,

‖Ĥ − H̃‖ ≤ C(Σᵢτᵢ‖pᵢ‖²‖rᵢ‖²/n)^{1/2}(Σᵢτᵢ|d̂ᵢ − dᵢ|²/n)^{1/2} = O_p(ζ₁(K)L^{1/2}ξ(L)Δ_π) = o_p(1).

Hence, by eq. (A.12) and the triangle inequality, ‖Ĥ − H‖ → 0 in probability. The conclusion then follows similarly to previous arguments. For example, by logic like that above,

‖FAQ̂⁻¹ĤQ̂₁⁻¹(Σ̂₁ − Σ₁)Q̂₁⁻¹Ĥ'Q̂⁻¹A'F'‖ ≤ ‖FAQ̂⁻¹ĤQ̂₁⁻¹‖²‖Σ̂₁ − Σ₁‖ ≤ tr(FAQ̂⁻¹ĤQ̂₁⁻¹Ĥ'Q̂⁻¹A'F')o_p(1) = o_p(1).

It then follows by similar arguments and the triangle inequality that FV̂F' → I in probability. In the case of Assumption A2 iii) b), where a(h) is a scalar, it follows by taking a square root that V̂^{1/2}/V^{1/2} → 1 in probability, so that √n V̂^{−1/2}(θ̂ − θ₀) → N(0,1) in distribution. In the other case the last conclusion follows similarly from V̂ − V → 0 in probability. QED.

Lemma A3: If vᵢ is i.i.d. and continuously distributed with bounded density, and max_{i≤n}|v̂ᵢ − vᵢ| → 0 in probability, then Σⁿᵢ₌₁|1(a < v̂ᵢ ≤ b) − 1(a < vᵢ ≤ b)|/n → 0 in probability.

Proof: By boundedness of the density, for any Δₙ → 0 we have E[Σᵢ{1(|vᵢ−a| ≤ Δₙ) + 1(|vᵢ−b| ≤ Δₙ)}/n] = Prob(|vᵢ−a| ≤ Δₙ) + Prob(|vᵢ−b| ≤ Δₙ) = O(Δₙ), so by the Markov inequality Σᵢ[1(|vᵢ−a| ≤ Δₙ) + 1(|vᵢ−b| ≤ Δₙ)]/n = O_p(Δₙ). We also use the well known result that if εₙ → 0 in probability then there is a positive sequence δₙ → 0, going to zero slowly enough, such that εₙ/δₙ → 0 in probability. Let εₙ = max_{i≤n}|v̂ᵢ − vᵢ| and choose such a δₙ. Then with probability approaching one, |1(a < v̂ᵢ ≤ b) − 1(a < vᵢ ≤ b)| ≤ 1(|vᵢ−a| ≤ δₙ) + 1(|vᵢ−b| ≤ δₙ) for all i ≤ n, so that Σᵢ|1(a < v̂ᵢ ≤ b) − 1(a < vᵢ ≤ b)|/n = O_p(δₙ) = o_p(1). QED.

Proof of Lemma 4.1: Because series estimators are invariant to nonsingular linear transformations of the approximating functions, a location and scale shift allows us to assume that W = [0,1]^{dim(w)} and Z = [0,1]^{dim(z)}. Also,
in the polynomial case, the component powers can be replaced by polynomials of the same order that are orthonormal with respect to the uniform distribution on [0,1], and in the spline case by a B-spline basis. In both cases the resulting vector of functions is a nonsingular linear combination of the original vector, because of the assumption that the order of the multi-index is increasing. Also, assume that the same operation is carried out on the first step approximating functions R^L(z). For any nonnegative integer j, let

ζ_j(K) = max_{|μ|≤j} sup_{w∈W} ‖∂^μ p^K(w)‖,  ξ_j(L) = max_{|μ|≤j} sup_{z∈Z} ‖∂^μ R^L(z)‖.

It follows from Newey (1997) that in the polynomial case ζ_j(K) ≤ CK^{1+2j} and ξ(L) ≤ CL, and in the spline case that ζ_j(K) ≤ CK^{1/2+j} and ξ(L) ≤ CL^{1/2}. It follows from these inequalities and Assumption 4 that

[ζ₀(K)ζ₁(K) + ζ₀(K)²ξ₀(L)]Δ_π → 0,  Δ_π = (L/n)^{1/2} + L^{−α₁},

so the conclusion follows by Lemma A1. QED.

Proof of Theorem 4.2: Without changing the notation, for w = (w₁', u')' with w₁ = (x', z₁')', let F₀ denote the distribution of w and F₀(w₁) and F₀(u) the corresponding marginals. Let g̃(w₁) = ĝ(w₁) − ∫ĝ(w₁)dF₀(w₁) and λ̃(u) = λ̂(u) − ∫λ̂(u)dF₀(u), define g̃₀(w₁) and λ̃₀(u) correspondingly, and let Δ = ∫{ĥ(w) − h₀(w)}dF₀(w). By the density of w being bounded and bounded away from zero, the conditional density on W is also bounded and bounded away from zero. Therefore,

∫[ĥ(w) − h₀(w)]²dF₀(w) ≥ C∫[g̃(w₁) + λ̃(u) − g̃₀(w₁) − λ̃₀(u) + Δ]²dF₀(w₁)dF₀(u)
= C∫[g̃(w₁) − g̃₀(w₁)]²dF₀(w₁) + C∫[λ̃(u) − λ̃₀(u)]²dF₀(u) + CΔ²
≥ C max{∫[g̃(w₁) − g̃₀(w₁)]²dF₀(w₁), ∫[λ̃(u) − λ̃₀(u)]²dF₀(u), Δ²},

where the product terms drop out by the construction of g̃(w₁), etc., e.g.

∫[g̃(w₁) − g̃₀(w₁)][λ̃(u) − λ̃₀(u)]dF₀(w₁)dF₀(u) = ∫[g̃(w₁) − g̃₀(w₁)]dF₀(w₁) · ∫[λ̃(u) − λ̃₀(u)]dF₀(u) = 0. QED.

Proof of Theorem 4.3: Follows from Theorem 4.1 by fixing the value of u at any point in W. QED.
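The device used in the proof of Lemma 4.1, replacing raw powers by polynomials orthonormal with respect to the uniform distribution on [0,1], can be checked numerically. The sketch below (Python/NumPy; our own construction, not code from the paper) builds shifted Legendre polynomials, which form exactly such an orthonormal system, and verifies orthonormality by Gaussian quadrature.

```python
import numpy as np
from numpy.polynomial import legendre

def shifted_legendre(k):
    """Degree-k Legendre polynomial mapped to [0,1], scaled to unit L2(Uniform[0,1]) norm."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return lambda x, c=c, k=k: np.sqrt(2 * k + 1) * legendre.legval(2 * x - 1, c)

# Gauss-Legendre nodes/weights, mapped from [-1,1] to [0,1]
nodes, weights = legendre.leggauss(50)
x01, w01 = (nodes + 1) / 2, weights / 2

P = np.array([shifted_legendre(k)(x01) for k in range(5)])  # 5 x 50 matrix of values
G = (P * w01) @ P.T                                         # Gram matrix of inner products

print(np.round(G, 8))   # approximately the 5 x 5 identity matrix
```

Because the map from the raw powers (1, x, ..., x^4) to these polynomials is a nonsingular (triangular) linear transformation, the series estimator itself is unchanged, which is exactly the invariance the proof exploits.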
Proof of Theorem 5.1: For power series, the inequalities ζ_j(K) ≤ CK^{1+2j} and ξ(L) ≤ CL from the proof of Lemma 4.1 bound each quantity in the rate conditions of Lemma A2 by a power of K and L; for example, ζ₀(K)²K² ≤ CK⁴, L²ζ₀(K)²ξ(L)² ≤ CK²L⁴, and KLζ₀(K)²ξ(L)² ≤ CK³L³. For splines the corresponding bounds follow from ζ_j(K) ≤ CK^{1/2+j} and ξ(L) ≤ CL^{1/2}. It then follows from the rate conditions of Theorem 4.2 that the rate conditions of Lemma A2 are satisfied. The conclusion then follows from Lemma A2, similarly to the proof of Theorem 4.1. QED.

References

Agarwal, G.G. and W.J. Studden (1980): "Asymptotic Integrated Mean Square Error Using Least Squares and Bias Minimizing Splines," Annals of Statistics 8, 1307-1325.

Ahn, H. (1994): "Nonparametric Estimation of Conditional Choice Probabilities in a Binary Choice Model Under Uncertainty," working paper, Virginia Polytechnic Institute.

Andrews, D.W.K. (1991): "Asymptotic Normality of Series Estimators for Nonparametric and Semiparametric Models," Econometrica 59, 307-345.

Andrews, D.W.K. and Y.J. Whang (1990): "Additive Interactive Regression Models: Circumvention of the Curse of Dimensionality," Econometric Theory 6, 466-479.

Barzel, Y. (1973): "The Determination of Daily Hours and Wages," Quarterly Journal of Economics 87, 220-238.

Biddle, J. and G. Zarkin (1989): "Choice among Wage-Hours Packages: An Empirical Investigation of Male Labor Supply," Journal of Labor Economics 7, 415-437.

Breiman, L. and J.H. Friedman (1985): "Estimating Optimal Transformations for Multiple Regression and Correlation," Journal of the American Statistical Association 80, 580-598.

Brown, B.W. and M.B. Walker (1989): "The Random Utility Hypothesis and Inference in Demand Systems," Econometrica 47, 814-829.

Cox, D.D. (1988): "Approximation of Least Squares Regression on Nested Subspaces," Annals of Statistics 16, 713-732.

Hastie, T. and R. Tibshirani (1991): Generalized Additive Models, London: Chapman and Hall.

Linton, O. and J.B. Nielsen
(1995): "A Kernel Method of Estimating Structured Nonparametric Regression Based on Marginal Integration," Biometrika 82, 93-100.

Linton, O., E. Mammen, and J. Nielsen (1997): "The Existence and Asymptotic Properties of a Backfitting Projection Algorithm Under Weak Conditions," preprint.

Moffitt, R. (1984): "The Estimation of a Joint Wage-Hours Labor Supply Model," Journal of Labor Economics 2, 550-566.

Newey, W.K. (1984): "A Method of Moments Interpretation of Sequential Estimators," Economics Letters 14, 201-206.

Newey, W.K. (1994): "Kernel Estimation of Partial Means and a General Variance Estimator," Econometric Theory 10, 233-253.

Newey, W.K. (1997): "Convergence Rates and Asymptotic Normality for Series Estimators," Journal of Econometrics 79, 147-168.

Newey, W.K. and J.L. Powell (1989): "Nonparametric Instrumental Variables Estimation," working paper, MIT Department of Economics.

Oi, W. (1962): "Labor as a Quasi-Fixed Factor," Journal of Political Economy 70, 538-555.

Powell, M.J.D. (1981): Approximation Theory and Methods, Cambridge: Cambridge University Press.

Robinson, P. (1988): "Root-N-Consistent Semiparametric Regression," Econometrica 56, 931-954.

Roehrig, C.S. (1988): "Conditions for Identification in Nonparametric and Parametric Models," Econometrica 56, 433-447.

Stone, C.J. (1982): "Optimal Global Rates of Convergence for Nonparametric Regression," Annals of Statistics 10, 1040-1053.

Stone, C.J. (1985): "Additive Regression and Other Nonparametric Models," Annals of Statistics 13, 689-705.

Tjostheim, D. and B. Auestad (1994): "Nonparametric Identification of Nonlinear Time Series: Projections," Journal of the American Statistical Association 89, 1398-1409.

Vella, F. (1991): "A Simple Two-Step Estimator for Approximating Unknown Functional Forms in Models with Endogenous Explanatory Variables," working paper, Australian National University, Department of Economics.

Vella, F.
(1993): "Nonwage Benefits in a Simultaneous Model of Wages and Hours: Labor Supply Functions of Young Females," Journal of Labor Economics 11, 704-723.

Table 1: Estimates

               OLS         2SLS         NP            NPIV
h            .000017      .000387     -.0094        -.01097
             (.5597)     (2.7447)    (2.5513)      (2.9170)
h^2                                   7.0151e-6     7.4577e-6
                                     (2.6363)      (2.7986)
h^3                                  -2.1707e-9    -2.0250e-9
                                     (2.6304)      (2.4640)
h^4                                   2.3692e-13    1.8929e-13
                                     (2.5537)      (2.0311)
r                        -.0005                    -.000384
                         (2.7643)                  (2.6732)
r^2                                                -6.6464e-8
                                                   (.5211)
r^3                                                 3.4978e-10
                                                   (2.7516)

Notes: i) h denotes hours worked and r is the reduced form residual. ii) The regressors in the wage equation included education dummies, union status, tenure, full-time work experience, a black dummy and regional variables. iii) The variables included in Z which are excluded from X are marital status, health status, presence of young children, a rural dummy and non-labor income. iv) Absolute values of t-statistics are reported in parentheses.

Table 2: Cross-Validation Values

Specification    CV Value
h=0; r=0         185.567
h=1; r=1         183.158
h=2; r=2         183.738
h=3; r=3         182.899
h=4; r=4         182.882
h=5; r=5         183.128

[Figure 1: Estimate of Wage/Hours Profile from Polynomial Approximation. Lines: Unadjusted, Nonparametric, Two Stage Least Squares; x-axis: Annual Hours Worked.]
[Figure 2: Estimate of Wage/Hours Profile from Spline Approximation. Lines: Unadjusted, Nonparametric; x-axis: Annual Hours Worked.]

[Figure 3: 95% Confidence Interval for Wage/Hours Profile from Polynomial Approximation. x-axis: Annual Hours Worked.]

[Figure 4: 95% Confidence Interval for Wage/Hours Profile from Spline Approximation. x-axis: Annual Hours Worked.]