WORKING PAPER
DEPARTMENT OF ECONOMICS

FLEXIBLE SIMULATED MOMENT ESTIMATION OF NONLINEAR ERRORS-IN-VARIABLES MODELS

Whitney K. Newey

No. 99-02        February, 1999

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASS. 02139

Flexible Simulated Moment Estimation of Nonlinear Errors-in-Variables Models

Whitney K. Newey
MIT

First Version, October 1989
Revised, February 1999

Abstract

Nonlinear regression with measurement error is important for estimation from microeconomic data. One approach to identification and estimation is a causal model, where the unobserved true variable is predicted by observable variables. This paper is about estimation of such a model using simulated moments and a flexible disturbance distribution. An estimator of the asymptotic variance is given for parametric models. Also, a semiparametric consistency result is given. The value of the estimator is demonstrated in a Monte Carlo study and an application to estimating Engel curves.

JEL Classification: C15, C21
Keywords: nonlinear regression, errors-in-variables, simulated moments.

This research was supported by the National Science Foundation and the Sloan Foundation. David Card and Angus Deaton provided helpful comments.

Department of Economics, E52-262D, MIT, Cambridge, MA 02139. Phone: 617-253-6420, Fax: 617-253-1330, Email: wnewey@mit.edu.

1. Introduction

Nonlinear regression models with measurement error are important but difficult to estimate. Measurement error is a common problem in microeconomic data, where nonlinear models are often of interest. For example, flexible functional forms often lead to inherently nonlinear specifications. Instrumental variables estimators are not consistent for these models, as discussed in Amemiya (1985), so that alternative approaches must be adopted. The purpose of this paper is to develop an approach that is computationally feasible and also allows for flexibility in the distribution of disturbances. This purpose is accomplished by using simulated moments estimation with flexible distributions, an approach that may be useful for simulated moments estimation of other models.

The measurement error model considered here has a prediction equation for the true regressor with a disturbance that is independent of the predictors. The estimator is based on the conditional expectation of the dependent variable, and the conditional expectation of the product of the dependent variable and mismeasured regressor. This model for measurement error in nonlinear models has previously been considered by Hausman, Ichimura, Newey, and Powell (1991) and Hausman, Newey, and Powell (1995), but only for the case of polynomial regression or approximation, and simulated moments was not considered. This paper allows for general functional forms, significantly extending the scope of the previous work.

Much of the other work on measurement error in nonlinear models relies heavily on the assumption that the variance of the measurement error is small relative to the sample size. These papers include Wolter and Fuller (1982) and Amemiya (1985). In econometric practice the measurement error often seems quite large relative to the sample size, and has big effects on the coefficients. Thus, it seems important to consider approaches that allow for relatively large measurement error, as does the one here.
Simulated moments estimation provides a computationally convenient approach when estimating equations involve integrals, as discussed in Lerman and Manski (1981), Pakes (1986), McFadden (1989), and Pakes and Pollard (1989). This approach uses Monte Carlo methods to form unbiased estimators of the integrals in moment equations. Allowing flexibility in disturbance distributions is desirable, because consistency of the estimator depends on correct specification of the distribution. Also, it is useful to preserve the computational convenience of simulated moments. These goals are accomplished by combining simulated moment estimation with a specification for the distribution shape that is linear in parameters. The specification parameterizes the ratio of the true density to the simulated one. This approach is similar to the importance sampling technique from the simulation literature.

The parametric simulated moments estimator we propose is essentially a generalized method of moments estimator. Here the moments are smooth in the parameters, so that standard asymptotic theory applies. For that reason we just give large sample inference procedures with an outline of the asymptotic theory for the parametric case. We pay more attention to conditions for consistency for the nonparametric case, giving a consistency result when the number of parameters in the distribution approximation is allowed to grow with sample size.

The paper also includes Monte Carlo and empirical applications, to evaluate the potential impact of this approach for applied work. The empirical application is estimation of Engel curves from household expenditure data. The measurement error correction makes a big difference in the application, with a Gaussian specification for the prediction error sufficing in most cases. Also, the estimator seems quite accurate, having small standard errors. The results illustrate the usefulness of using simulated moment estimation to correct for measurement error, while allowing some flexibility in the distribution of the prediction error.

Section 2 describes the errors-in-variables model and some of its implications for conditional moments. Section 3 lays out the estimation method and discusses parametric asymptotic inference for the estimator. Section 4 gives a semiparametric consistency result. Section 5 presents results of a small Monte Carlo study. Section 6 describes an empirical example of Engel curve estimation of the relationship between income and consumption.

2. The Model

The model considered here is

(2.1)    y = f(w*, δ₀) + ε,    w = w* + η,    w* = π₀′x + σ₀v,
         E[ε|x,v] = 0,    E[η|x,v,ε] = 0,

where y and ε are scalars, w*, w, η, and v are vectors, and π₀ and σ₀ are conformable matrices. Here w* represents true regressors, w the observed regressors, η the measurement error, and x observed predictor variables. The last equation is a prediction equation for the true regressors, where v is an unobserved prediction error, independent of x, and σ₀ is a scaling matrix, a square root of a variance matrix. Some of the true regressors may be allowed to be observed, by specifying that the corresponding elements of η and v are identically zero, so that the corresponding elements of w* are equal to those of w. This model was considered by Hausman, Ichimura, Newey, and Powell (1991) (HINP henceforth), for the special case where f(w*, δ) is a polynomial in w*. As long as f(w*, δ) includes a constant, the location and scale of v can be normalized, e.g. as E[v] = 0 and Var(v) = I when the second moment of v exists.

Instrumental variables (IV) estimators can be used to estimate this model when f(w*, δ) is linear in w*.
Substituting w − η for w* in the first equation then leads to x being valid instruments, because the disturbance is linear in η. In the nonlinear measurement error case this substitution leads to residuals that are nonlinear in η. Consequently, x will not be valid instruments, and another approach has to be adopted.

An approach to consistent estimation can be based on integrating out the prediction error. Let g₀(v) be the density of v. Integrating over the prediction error leads to three useful conditional expectation equations:

(2.2a)    E[y|x] = ∫ f(π₀′x + σ₀v, δ₀) g₀(v) dv,
(2.2b)    E[wy|x] = ∫ [π₀′x + σ₀v] f(π₀′x + σ₀v, δ₀) g₀(v) dv,
(2.2c)    E[w|x] = π₀′x.

The first is a regression of y on x, analogous to the usual one, except that the unobserved variable v has been integrated out. The second equation is less familiar. It is a regression of wy on x. The third equation is a standard regression equation.

The second equation is important for identification of nonlinear models. The components of this equation corresponding to unobserved w* (i.e. those not corresponding to observed covariates) provide information additional to the first equation. As shown in HINP for polynomial regression, the first equation does not suffice for identification. Intuitively, there are two functions that need to be identified, the regression function and the density of v, so that two equations are needed for identification. It was shown in HINP that the parameters of any polynomial regression equation are identified from these two equations, and one expects that identification of the regression parameters will hold more generally.

It is beyond the scope of this paper to develop fully primitive identification conditions for this model, but some things can be said. First, π₀ is identified from equation (2.2c) as the coefficients of a least squares regression of w on x, so π₀ can be treated as known and identification of the other pieces of the model considered by focusing on equations (2.2a) and (2.2b). For example, HINP showed that, in the case where w* is a scalar and f(w*, δ) is a polynomial of degree p, the parameters δ₀ and some of the moments of v are identified if a "rank condition" holds, i.e. the second moment matrix of (1, π₀′x, ..., (π₀′x)^p) is nonsingular. Also, if π₀′x has a discrete distribution with a finite support of m points of positive probability, then equations (2.2a) and (2.2b) provide 2m equations. Assuming that none are redundant, one could identify up to 2m parameters from these equations, including δ₀ and the parameters of a parametric family of distributions for v. If π₀′x has a continuous distribution, then a simple counting argument suggests that g₀(v) should be identified. Assuming that the left hand sides of (2.2a) and (2.2b) are distinct functions, these equations give two functional equations, and there are two functions to be identified. So, by an analogy with the finite dimensional case, it should be possible, under appropriate regularity conditions, to identify both the regression function for w* and the density function for v. Making this intuition precise would be quite difficult, because of the nonlinear, nonparametric (i.e. functional) nature of these equations, but it is an important problem deserving of future attention.

Independence of x and v is a strong assumption, but in the general nonlinear model of equation (2.1) it is difficult to drop this assumption. Intuitively, if some moments of v can depend on x, then it is much more difficult to separate the regression function from the distribution.

3. Estimation

To describe the estimator it is helpful to embed the model in a more general setup. Let z denote a data observation, x a set of conditioning variables, β a q × 1 vector of parameters, g a density function of a random vector v, and H(z, β, v) an r × 1 vector of functions. The r × 1 residual vector is

(3.1)    p(z, β, g) = ∫ H(z, β, v) g(v) dv.

Suppose that there is a true parameter value (β₀, g₀) such that

(3.2)    E[p(z, β₀, g₀)|x] = 0.
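As a hedged numerical sketch of the integrated residual (3.1) in the errors-in-variables case, the code below replaces the integral over the prediction error v with an average over simulation draws, for the two moment equations (2.2a) and (2.2b). The function names and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

# Illustrative regression function of the form used later in the Monte Carlo
# study; any smooth f(w*, delta) could be substituted.
def f(w_star, delta):
    return delta[0] + delta[1] * w_star + delta[2] * np.exp(w_star)

def integrated_residual(y, w, x, delta, sigma, pi, v_draws):
    """Monte Carlo analog of the moment equations (2.2a)-(2.2b):
    average H(z, beta, v_s) over draws v_s from the density of v."""
    w_star = pi * x + sigma * v_draws              # simulated true regressor
    fw = f(w_star, delta)
    return np.array([y - fw.mean(),                # E[y|x] equation
                     w * y - (w_star * fw).mean()])  # E[wy|x] equation
```

Averaging this residual across observations drawn from the model at the true parameter values should give a vector near zero, which is the unbiasedness property that simulated moments estimation exploits.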
The nonlinear errors-in-variables model (2.1) is a special case of this one, where

(3.3)    H(z, β, v) = ( y − f(π′x + σv, δ),  L{wy − [π′x + σv]·f(π′x + σv, δ)},  w − π′x ),
         β = (δ′, σ, π′)′,

and L is a selection matrix that picks out those elements of w that include measurement error.

The common approach to using equation (3.2) in estimation is nonlinear instrumental variables. One difficulty with this approach is that the density g(v) is unknown. Another difficulty is that the residual is an integral that may be difficult to compute. Here, these difficulties are dealt with simultaneously, by choosing a flexible parameterization for the density that makes it easy to use a simulation estimator of the integral. To describe this approach, we begin with a specification of the density function. For now, suppose that the density is a member of a parametric family,

(3.4)    g(v, γ) = P(v, γ)φ(v),    P(v, γ) = Σⱼ γⱼ pⱼ(v),    p₁(v) = 1,

where φ(v) is some fixed density function. For example, if φ(v) were standard normal then this would be an Edgeworth approximation. The function g(v, γ) need not be positive, but leads to residuals that are linear in the shape parameters γ and that can easily be estimated by simulation.

For a density like that of equation (3.4), a simulated residual can be constructed by drawing random variables from φ(v) and then evaluating the product of the linear combination P(v, γ) and the H functions. Let zᵢ denote a single observation and [vᵢ₁, ..., v_iS] denote a vector of random variables, each having marginal density φ(v). For example, if φ(v) is a standard normal pdf, then [vᵢ₁, ..., v_iS] could be computer generated Gaussian random numbers. Then a simulated residual for the iᵗʰ observation is

(3.5)    p̂ᵢ(θ) = S⁻¹ Σ_{s=1}^S H(zᵢ, β, v_is) P(v_is, γ),    θ = (β′, γ′)′.

This is essentially an importance sampling estimator of the residual, where φ(v) is the sampling density and P(v, γ) approximates g(v)/φ(v). The simulated residual is an unbiased estimator of the true residual, because E[p̂ᵢ(θ)|zᵢ] = p(zᵢ, β, g(·, γ)). Therefore, by the results of McFadden (1989) and Pakes and Pollard (1989), an instrumental variables (IV) estimator with p̂ᵢ(θ) as the residual should be consistent if the IV estimator with the true residual is. An IV estimator can be formed in a familiar way. Let A(x) denote a q × r matrix of instrumental variables, that may be estimated. Suppose that θ̂ solves

(3.6)    Σᵢ₌₁ⁿ A(xᵢ)p̂ᵢ(θ̂) = 0.

This is a simulated, nonlinear IV estimator like that of McFadden (1989). Because equation (3.6) is linear in the density P(v, γ), it is important to normalize the density P(v, γ)φ(v) to integrate to one. Also, it may be important to impose a location and scale normalization on this density. There are different ways to impose normalizations by imposing constraints on the coefficients. For example, if φ(v) is the standard normal density and p₁(v), p₂(v), ... are the Hermite polynomials that are orthonormal with respect to the standard normal density (i.e. ∫pⱼ(v)p_k(v)φ(v)dv = 0 for j ≠ k and ∫pⱼ(v)²φ(v)dv = 1), then γ₁ = 1 will imply that P(v, γ)φ(v) integrates to one, and γ₂ = γ₃ = 0 that it has zero mean and unit variance. It is also possible to impose such constraints using the simulated values, by requiring that γ satisfy Σᵢ₌₁ⁿ Σ_{s=1}^S (1, v_is, v_is²)′P(v_is, γ) = 0.

In the nonlinear errors-in-variables model, it is convenient to work with a two step estimator, where the first step consists of estimation of π by least squares (LS), and the second step is an instrumental variables estimator using the first two residuals of equation (3.3). The first order conditions for such an estimator can be formulated as a solution to equation (3.6), if A(x) is chosen in a particular fashion. Let π̂ be the LS estimator and

(3.7)    A(x) = diag[B(x), x],

where α = (δ′, σ, γ′)′.
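The Hermite polynomial normalizations just described can be checked by simulation. Below is a minimal sketch (not the paper's code): with polynomials orthonormal under the standard normal density and γ₁ = 1, γ₂ = γ₃ = 0, the implied density P(v, γ)φ(v) integrates to one and has zero mean and unit variance for any value of the remaining shape parameter, even though it need not be positive everywhere.

```python
import numpy as np

# Probabilists' Hermite polynomials, scaled to be orthonormal under the
# standard normal density phi(v): 1, v, (v^2-1)/sqrt(2), (v^3-3v)/sqrt(6).
def hermite_basis(v):
    return np.stack([np.ones_like(v), v,
                     (v ** 2 - 1) / np.sqrt(2.0),
                     (v ** 3 - 3 * v) / np.sqrt(6.0)])

def P(v, gamma):
    """Linear-in-parameters shape function of equation (3.4)."""
    return gamma @ hermite_basis(v)

# gamma1 = 1 imposes integration to one; gamma2 = gamma3 = 0 impose the
# location/scale normalization; gamma4 is a free asymmetry parameter.
gamma = np.array([1.0, 0.0, 0.0, 0.3])
```

Because P enters the simulated residual as a multiplicative weight on draws from φ(v), these constraints can be verified by simple weighted averages over Gaussian draws, exactly as in the importance sampling interpretation of equation (3.5).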
Here B(x) has two columns and number of rows equal to the number of elements of α = (δ′, σ, γ′)′. Suppose that constraints are imposed on the coefficients γ. Then the solution to equation (3.6), with A(x) specified as in equation (3.7), requires that π̂ be the least squares estimator, and that the other parameters solve the equation

(3.8)    0 = Σᵢ₌₁ⁿ B(xᵢ)p̂ᵢ(α),
         p̂ᵢ(α) = S⁻¹ Σ_{s=1}^S ( yᵢ − f(π̂′xᵢ + σv_is, δ),  Lwᵢyᵢ − L[π̂′xᵢ + σv_is]·f(π̂′xᵢ + σv_is, δ) )′ P(v_is, γ).

In the empirical example the estimator minimizes a quadratic form that has this type of equation as its first order condition, although the normalization is not imposed. Specifically, for C(x) equal to a matrix of instrumental variables and W a positive definite matrix, α̂ solves

(3.9)    min_α [Σᵢ₌₁ⁿ C(xᵢ)p̂ᵢ(α)]′ W [Σᵢ₌₁ⁿ C(xᵢ)p̂ᵢ(α)].

The first order conditions to this minimization problem are as given in equation (3.8), with

(3.10)    B(x) = [Σᵢ₌₁ⁿ ∂C(xᵢ)p̂ᵢ(α)/∂α]′W.

Standard large sample theory for IV can be used for asymptotic inference procedures. If the simulated values (vᵢ₁, ..., v_iS) are included with the data to form an augmented observation for the iᵗʰ data point, then the usual IV formulae can be used to form a consistent variance estimator. For example, suppose that (zᵢ, vᵢ₁, ..., v_iS) are independent observations as i varies. Then under standard regularity conditions (e.g. see Newey and McFadden, 1994), the asymptotic variance of √n(θ̂ − θ₀) can be estimated by

(3.11)    V̂ = Ĝ⁻¹Ω̂Ĝ⁻¹′,    Ĝ = n⁻¹Σᵢ₌₁ⁿ A(xᵢ)∂p̂ᵢ(θ̂)/∂θ,    Ω̂ = n⁻¹Σᵢ₌₁ⁿ A(xᵢ)p̂ᵢ(θ̂)p̂ᵢ(θ̂)′A(xᵢ)′.

If the simulation draws v_is are mutually statistically independent as s varies for a given i, then one could also use the estimator

(3.12)    V̂ = Ĝ⁻¹Ω̃Ĝ⁻¹′,    Ω̃ = n⁻¹Σᵢ₌₁ⁿ A(xᵢ)Λ̂ᵢA(xᵢ)′,    Λ̂ᵢ = S⁻²Σ_{s=1}^S H(zᵢ, β̂, v_is)H(zᵢ, β̂, v_is)′P(v_is, γ̂)².

Both of these variance estimators ignore estimation of the instruments, which is valid under standard regularity conditions. Because the large sample theory for these estimators is straightforward, we do not give regularity conditions here.

4. Consistent Semiparametric Estimation

If the functional form of the density g₀(v) is left unspecified, then the model becomes semiparametric. Models where identification is achieved by conditional moment restrictions like those of equation (3.2) are nonlinear, nonparametric simultaneous equations models. Newey and Powell (1991) have considered estimation of such models, and their result can be applied here. The basic idea is to apply the previous estimator, but with the density chosen to be a member of an increasing sequence of approximating families {P_J(v, γ)φ(v)} and the IV equation (3.6) replaced by a nonparametric conditional expectation equation. Let 𝒢 be a set of functions that will be assumed to include the true density g₀(v) and satisfy other regularity conditions given below, and let {P_J(v, γ)} be a sequence of families such that P_J(v, γ)φ(v) can approximate any g ∈ 𝒢. Let θ = (β, g) be the parameter consisting of the Euclidean vector β and a density g, p̂ᵢ(θ) be the simulated residual of equation (3.5) with P_J(v, γ) replacing P(v, γ), and Ê[p̂ᵢ(θ)|xᵢ] be a nonparametric estimator of the conditional expectation of p̂ᵢ(θ) given xᵢ, such as a series or kernel estimator. Then a minimum distance estimator of θ₀ = (β₀, g₀) is

(4.1)    θ̂ = argmin_{θ ∈ B×𝒢} Q̂(θ),    Q̂(θ) = Σᵢ₌₁ⁿ Ê[p̂ᵢ(θ)|xᵢ]′ D Ê[p̂ᵢ(θ)|xᵢ]/n,

where D is a positive definite matrix that can depend on the data and on sample size. The objective function in equation (4.1) is a sample analog of

Q(θ) = E[ E[p(z, θ)|x]′ D̄ E[p(z, θ)|x] ],

where D̄ is the limit of D and p(z, θ) = ∫H(z, β, v)g(v)dv. If θ₀ is identified from the conditional moment equation (3.2) (i.e. that equation has a unique solution), and D̄ is positive definite, then Q(θ) will have a unique minimum of zero at θ₀.
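As a hedged, self-contained illustration of the minimum distance construction in equation (4.1), the sketch below estimates the conditional expectation of a residual by a polynomial series regression and forms the quadratic objective. The basis, degree, and matrix D are arbitrary choices made for the example, not the paper's.

```python
import numpy as np

def series_conditional_mean(resid, x, degree=3):
    """Least squares series estimate of E[resid | x] using powers of x."""
    basis = np.column_stack([x ** k for k in range(degree + 1)])
    coef, *_ = np.linalg.lstsq(basis, resid, rcond=None)
    return basis @ coef

def min_distance_objective(resid, x, D):
    """Sample analog of Q(theta): average of Ehat[p|x]' D Ehat[p|x]."""
    ehat = series_conditional_mean(resid, x)
    return float(np.einsum('ir,rs,is->', ehat, D, ehat)) / resid.shape[0]
```

Residuals that are conditionally mean zero given x produce an objective near zero, while residuals with a systematic component in x do not, which is what drives the identification argument for the minimum distance estimator.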
The general extremum estimator reasoning (e.g. Newey and McFadden, 1994) then suggests that θ̂ should be consistent. The estimator can be shown to be consistent if β and g are restricted to compact sets, similarly to Gallant (1987). The compact function set assumption is a strong one, but the results of Newey and Powell (1991) indicate its importance for minimum distance estimators of the form considered here.

For a matrix A = [a_ij], let ‖A‖ = [trace(A′A)]^{1/2}, and for a function g(v) let ‖g‖ denote a function norm, to be further discussed below.

Assumption 4.1: β₀ ∈ B, which is compact, and g₀ ∈ 𝒢, which is compact in the norm ‖·‖.

In the primitive regularity conditions given below, ‖g‖ will be a Sobolev norm. The following dominance condition will be useful in showing uniform convergence.

Assumption 4.2: There exists M(z, v) such that for all β and β̃ in B, ‖H(z, β, v)‖ ≤ M(z, v) and ‖H(z, β, v) − H(z, β̃, v)‖ ≤ M(z, v)‖β − β̃‖.

Moment conditions for the dominating function M(z, v) will be specified below. To show uniform convergence of the objective function of equation (4.1), it is useful to impose a strong condition on ‖g‖, that it dominates a weighted supremum norm. Let V denote the support of v, and for a weight function ω(v) > 0 that is bounded on V, let ‖g‖_ω = sup_{v∈V}|g(v)|ω(v). Below it will be assumed that ‖g‖_ω ≤ ‖g‖, so that the compactness in Assumption 4.1 implies a uniform bound on ‖g‖_ω. The import of this assumption is the restriction on tail behavior imposed by the presence of the weight function; the faster ω(v) grows as v moves outward, the faster the tails of g(v) must go to zero in order to guarantee that sup_{v∈V}|g(v)|ω(v) is finite. Also, the nature of importance sampling imposes a restriction on the tail thickness of the true density relative to the baseline density φ(v). For second moment dominance this restriction will translate into a restriction on ω(v) relative to φ(v). These considerations lead to the following assumption:

Assumption 4.3: ‖g‖_ω ≤ ‖g‖ for all g ∈ 𝒢, and there exists ε > 0 such that E[∫_V {M(z, v)/[ω(v)φ(v)]}^{2+ε} φ(v) dv] < ∞.

In order to guarantee that the parametric approximation suffices for consistency, the following denseness condition will be imposed.

Assumption 4.4: For any g ∈ 𝒢 there exists γ_J such that lim_{J→∞} ‖P_J(·, γ_J)φ(·) − g‖ = 0.

This condition specifies that g can be approximated by the family. It is also necessary to make some assumption concerning the conditional expectation estimator. The following condition is lifted from Newey and Powell (1991). Without changing notation, assume that the data observation zᵢ includes the simulation draws (vᵢ₁, ..., v_iS), and assume that the data are stationary.

Assumption 4.5: Ê[ψ(z)|xᵢ] = Σⱼ₌₁ⁿ w_ij ψ(zⱼ) for weights w_ij (i, j = 1, ..., n) that do not depend on ψ, and for any ψ(z) with E[|ψ(z)|^{2+ε}] finite, where ε is from Assumption 4.3: i) Σᵢ₌₁ⁿ ‖Ê[ψ(z)|xᵢ] − E[ψ(z)|xᵢ]‖²/n → 0 in probability; ii) Σᵢ₌₁ⁿ ψ(zᵢ)/n → E[ψ(zᵢ)] in probability; iii) D → D̄ in probability, with D̄ positive definite.

Assumption 4.5 restricts the form of randomness allowed in the conditional expectation estimator. Implicitly the weights w_ij are restricted to not depend on the function ψ being averaged. Thus, while they could be chosen based on some fixed ψ (e.g. a linear combination of p(z, θ̃) for some preliminary estimator θ̃), they are not allowed to vary with p(z, θ) in θ. It is easy to use known results to show that Assumption 4.5 holds in some cases, for nearest neighbor and series estimators. For K-nearest-neighbor estimators, condition i) follows by Lemma 8 of Robinson (1987) and Proposition 1 of Stone (1977), as long as the data are i.i.d., K → ∞, and K/n → 0. For a series estimator of the form given in Newey (1994a), with approximating functions containing K elements such that any function with finite mean square can be approximated arbitrarily well in mean square for large enough K, condition i) follows from Lemma A.10 of Newey (1994a) and the arguments for Lemma A.11, as long as K → ∞ and K/n → 0. Neither of these results allows for data based K. Assumption 4.5 should also be "plug-compatible" with future results on nonparametric conditional expectation estimators, such as those for time series.

The last assumption specifies that J must go to infinity with the sample size.

Assumption 4.6: J → ∞ in probability as n → ∞.

As mentioned earlier, the degree of approximation J can be random, in a very general way. However, it should be noted that it is not restrictions on the growth rate of J that are used to obtain consistency, but rather the restriction of the function to a compact set. Often, the compactness condition will require that higher order derivatives be uniformly bounded, a condition that will have more "bite" for large values of J, imposing strong constraints on the coefficients of higher order terms.

These assumptions and identification deliver the following consistency result:

Theorem 4.1: If E[p(z, θ)|x] = 0 has a unique solution at θ₀ on B×𝒢 and Assumptions 4.1 - 4.6 are satisfied, then ‖β̂ − β₀‖ → 0 and ‖ĝ − g₀‖ → 0 in probability.

It should be noted that the hypotheses of this theorem are not very primitive until the norm ‖g‖ is specified. Once that is specified, it may require some work to check the other assumptions. The following set of assumptions is sufficient to demonstrate that the assumptions are not vacuous, and do cover cases of some interest.
Assumption 4.7: i) v is one-dimensional; ii) The support V of v is a compact interval; iii) ‖g‖ = sup_{v∈V}|g(v)| and, for a fixed constant B, 𝒢 = {g(v): 0 ≤ g(v) ≤ B and |g(v) − g(ṽ)| ≤ B|v − ṽ| for all v, ṽ ∈ V}; iv) φ(v) is continuous and bounded away from zero on V; v) E[sup_{v∈V} M(z, v)²] < ∞; vi) P_J(v, γ) is a polynomial in v of degree J.

Corollary 4.2: If E[p(z, θ)|x] = 0 has a unique solution at θ₀, with β₀ ∈ B and g₀ ∈ 𝒢, and Assumptions 4.2 and 4.5 - 4.7 are satisfied, then ‖β̂ − β₀‖ → 0 and ‖ĝ − g₀‖ → 0 in probability.

This result is restrictive in several ways. It is easy to relax the assumption that v is one-dimensional, using the results of Elbadawi, Gallant, and Souza (1983). It is more difficult to allow for a noncompact support for v, although this extension is possible using the results of Gallant and Nychka (1987). Unfortunately, their result allows for quite thick tails, with a weight function of the form ω(v) = C(1+v′v)^a. This tail behavior does not allow Assumption 4.3 to be satisfied when φ(v) is the standard normal density. Of course, there are fast computational methods for generating data from densities proportional to powers of (1+v′v), so that one could easily use such thick-tailed baseline densities. Also, it should be possible to develop intermediate conditions that allow for more general simulators.

5. A Sampling Experiment

A small Monte Carlo study is useful as a rough check of whether the estimator can give satisfactory results in practice. Consider the model

(5.1)    y = δ₁ + δ₂w* + δ₃e^{w*} + ε,    w = w* + η,    w* = π₁ + π₂x + v,

with δ₁ = δ₂ = δ₃ = 1 and π₂ = 1, where ε, x, and v are N(0,1) and η is N(0,.5). The regression equation for this model is one that is useful in estimating the relationship between consumption and income. This specification will be further discussed in Section 6, where it is used in the empirical example. The parameter values were set so that the r-squared for the prediction equation for w* was 1/2, and so the signal to noise ratio was 1. The number of observations was 100, chosen to be small relative to typical sample sizes in economics, to make computation easier. The r-squared for the regression of w on x was set higher than is typical, in order to offset the small sample size, so that the estimator might be informative.

Table One reports the results from 100 replications. Results for three different estimators of δ₁, δ₂, and δ₃ are reported. The first estimator is an ordinary least squares (OLS) regression of y on the right-hand side variables (1, w, e^w), where w is measured with error. The second estimator is an instrumental variables (IV) estimator with the same right-hand side but with instruments Rᵢ = (1, hᵢ, hᵢ², hᵢ³)′ that are third-order in the predicted value hᵢ = π̂₁ + π̂₂xᵢ. The third estimator is a simulated moment estimator (SM) from equation (3.9), with C(xᵢ) = I⊗Rᵢ and W = I⊗(Σᵢ RᵢRᵢ′)⁻¹, where I is a two dimensional identity matrix. This estimator is a system two-stage least squares estimator. Also, P(v, γ) was a Hermite polynomial of the third order, where γ₁ = 1, γ₂ = γ₃ = 0, and γ₄ was estimated. There were two simulations per observation. In one replication out of the 100 the estimator did not converge to a stationary point. This replication was excluded from the results that are reported in Table One.

The estimator shows promise. The standard errors of the IV and simulated moment estimators are much larger than those of the OLS estimator, but the biases are substantially smaller. As previously noted, the IV estimator is inconsistent, although in this example it leads to bias reduction. It is interesting to note that the standard error of the SM estimator is smaller than that of the IV estimator. Thus, in this example the valid SM correction for measurement error leads to both smaller bias and variance than the inconsistent IV correction.

6. An Application to Engel Curve Estimation

The application presented here is estimation of Engel curves, a subject that has long been of interest in econometrics.
Measurement error has recently been shown by Hausman, Newey, and Powell (1995) to be important in the estimation of nonlinear Engel curves. This section adds to that work by estimating a nonlinear, nonpolynomial Engel curve for the model of equation (2.1), a specification that was not estimated in Hausman, Newey, and Powell (1995). Also, the results here take account of measurement error in the denominator of the share equation.

The functional form considered here is that preferred by Leser (1963),

(6.1)    Sᵢ = δ₁ + δ₂ln(Iᵢ*) + δ₃(1/Iᵢ*) + εᵢ,

where Sᵢ is the share of expenditure on a commodity and Iᵢ* is the true total expenditure. As suggested by the Hausman, Newey, and Powell (1995) tests of the Gorman (1981) rank restriction, a rank two specification such as this may be a good specification, once the measurement error has been accounted for.

In addition, a specification is considered that accounts for the presence of Iᵢ* in the denominator of the left-hand side of this equation. This "denominator problem" results from the fact that Sᵢ = Yᵢ/Iᵢ*, where Yᵢ is the expenditure on the commodity. Thus, if Iᵢ* is measured with error, another nonlinear measurement error problem results from using the measured shares. This problem can be dealt with by bringing Iᵢ* out of the denominator, giving

(6.2)    Yᵢ = δ₁Iᵢ* + δ₂Iᵢ*ln(Iᵢ*) + δ₃ + Iᵢ*εᵢ.

If εᵢ satisfies the usual restriction E[εᵢ|Iᵢ*] = 0, then equations (6.1) and (6.2) are equivalent statistical specifications, in that running least squares on either equation should give a consistent estimator. Covariates will also be allowed in this specification, by allowing additional variables Iᵢ*x₁ᵢ to enter linearly in this equation, corresponding to inclusion of x₁ᵢ as additional regressors in the share equation (6.1).

The measurement error will be assumed to be multiplicative, i.e. for Iᵢ equal to the observed total expenditure,

(6.3)    ln(Iᵢ*) = wᵢ* = π₀′xᵢ + vᵢ,    ln(Iᵢ) = wᵢ = wᵢ* + ηᵢ.
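The equivalence between the share form and the multiplied-through expenditure form is pure algebra in w* = ln(I*), which the following sketch (with made-up coefficient values) verifies numerically:

```python
import numpy as np

# Writing ln(I*) = w_star, the Leser share equation uses
#   f_share(w*) = d1 + d2*w* + d3*exp(-w*),
# and multiplying through by I* = exp(w*) gives
#   f_exp(w*)   = d1*exp(w*) + d2*exp(w*)*w* + d3.

def f_share(w_star, d):
    return d[0] + d[1] * w_star + d[2] * np.exp(-w_star)

def f_expenditure(w_star, d):
    return d[0] * np.exp(w_star) + d[1] * np.exp(w_star) * w_star + d[2]

w = np.linspace(-1.0, 2.0, 9)
d = np.array([0.2, 0.05, -0.1])          # illustrative values only
assert np.allclose(f_expenditure(w, d), np.exp(w) * f_share(w, d))
```

Note that even with d3 = 0, so that the share form is linear in w*, the expenditure form remains nonlinear in w*, which is the source of the IV inconsistency discussed below.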
+ 1 i In the empirical ln(I.) v., 1 1 = 11111 * ln(I.) + l work the predictor variables * = w. = w. + T). tj.. will be a constant, x. age and age squared for household head and spouse, and dummies for educational attainment, spouse employment, home ownership, industry, occupation, region, and black or white, a total of 19 variables, With this specification for the measurement and including the constant. f(w prediction equations, = 8 ,<5) The measurement error + 5 w + 5 exp(-w in the left-hand side as in the Monte Carlo example. ), denominator can be accounted for as * * f(w equation (5.2), leading to a specification with It is that interesting to note that even 5 = 0, if ,5) * = 5 exp(w the share equation is + 5 ) linear in this equation is nonlinear, so that IV will not be consistent. measurement error in the * exp(w )w ln(I.), in + S . so Thus, denominator of the share suggests the need for the estimators developed here. The data used in estimation are from the 1982 Consumer Expenditure Survey (CES). The basic data we use are total expenditure and expenditure on commodity groups from the first quarter of 1982. Results were obtained for four commodity groups, food, clothing, transportation, and recreation. The number of observations empirical results were reported as elasticities, econometrics. To compare shapes, i.e. in the dlnf(x)/dlnx, data set as is common is 1321. The in were calculated at the quartiles of elasticities observed expenditure. The results are given in Tables Two through statistics, including the quartiles of the Five. Table income distribution. include estimated expenditure elasticities at these quartiles. information on the prediction regression. R The 2 in this Two gives The other tables Table Two regression The other information quite sizable for such a cross-section data set. 
The other information in Table Two is useful in calculating the magnitude of the measurement error and bounding the size of the variance of the prediction error v. In particular, the model we have assumed implies that the standard error .45 of the residual is an upper bound on the standard deviation of both the measurement error and the prediction error v. Also, given an estimator of σ_v², an estimator of the R-squared of the measurement equation, which determines the magnitude of the measurement error bias in a linear model, is

    Var(π'x + v)/Var(w) = [Var(π'x) + σ_v²]/Var(w) = [(.25)² + σ_v²]/(.51)² ≈ .24 + 3.8σ_v².

Tables Three to Five give results for each commodity for three different specifications of the share equation and four different estimators. Table Three gives results for the share equation, where measurement error in the denominator of the left-hand side is ignored. This specification is the same as in the Monte Carlo study. Table Four changes the specification to account for the left-hand side denominator by multiplying through the original equation by total expenditure, as described above. Table Five adds covariates x_i to the share equation to allow for demographic and regional price effects. There are six covariates: own and spouse age, family size, and three regional dummy variables. The equation estimated is analogous to that of Table Four in accounting for the left-hand side denominator, with

    f(w_i*, x_i, δ) = δ_1 exp(w_i*) + δ_2 exp(w_i*) w_i* + δ_3 + exp(w_i*) x_i'δ_x.

It should be noted that this specification restricts family size to be absent from the prediction equation.

Tables Three to Five report results for four different estimators: ordinary least squares (LS), two stage least squares (IV) with instruments described below, the simulated moment estimator with Gaussian v (SM0), and the simulated moment estimator with one Hermite polynomial term (SM1), of the third order, included in the moment functions.
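The reliability-ratio calculation above is simple arithmetic using the Table Two values, and can be verified directly. The value σ_v = .35 in the last line is purely illustrative, not an estimate from the tables:

```python
# Numerical check of the reliability-ratio calculation in the text,
# using the Table Two values: sd(pi'x) = .25 and sd(w) = .51.
sd_pred, sd_w = 0.25, 0.51

intercept = sd_pred**2 / sd_w**2        # Var(pi'x)/Var(w)
slope = 1.0 / sd_w**2                   # coefficient on sigma_v^2
print(round(intercept, 2), round(slope, 1))   # 0.24 3.8

# For an illustrative (not estimated) value sigma_v = .35, the implied
# R-squared of the measurement equation would be about .71.
r2 = intercept + slope * 0.35**2
print(round(r2, 2))   # 0.71
```

In a linear model this R-squared is the factor by which the slope coefficient would be attenuated, so values well below one signal a quantitatively large measurement error bias.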
The simulated moment estimators are each obtained as in equation (3.8), with 10 simulation draws and W equal to the inverse of an estimated asymptotic variance of the moment equation (3.9). Specifically,

(5.4)    W = [Σ_{i=1}^n Û_i Û_i'/n]⁻¹,
         Û_i = C(x_i)ρ_i(ᾱ) + [Σ_{j=1}^n ∂C(x_j)ρ_j(ᾱ)/∂π/n](Σ_{j=1}^n x_j x_j'/n)⁻¹ x_i(w_i − π̂'x_i),

where ᾱ is an initial consistent estimator.³ This is the minimizing choice of W, an asymptotic variance that accounts for the presence of π̂ in ρ_i. The standard errors for LS and IV were calculated from heteroskedasticity consistent formulae, e.g. as given in White (1982). The standard errors for the simulated moment estimators were calculated from the GMM asymptotic variance estimator (Ĥ'Σ̂⁻¹Ĥ)⁻¹, where Ĥ = Σ_{i=1}^n ∂C(x_i)ρ_i(α̂)/∂α/n.

A selection process was used to choose the order of powers of the predicted value to include in the instruments. Starting at the second order, the minimum needed to have enough moments to allow estimation of the distribution parameters, the order was chosen by cross-validation on the food equation, Gaussian, simulated moment estimator (SM0), using the cross-validation criteria for choice of instruments suggested in Newey (1994b). Inclusion of higher order powers did not result in any decrease in the cross-validation criteria. Consequently, in Tables Three and Four the instrumental variables were (1, x'π̂, (x'π̂)²). In Table Five exp(x'π̂)·x was added to the instruments, because of the presence of the covariates.

The number of Hermite polynomial terms to include was chosen essentially by an upwards testing procedure, applied in the model of Table Three. Inclusion of a third order term was tried in each case, as reported in Table Three. This term allows for asymmetry in the distribution of v. If it was statistically significant, a fourth order term was tried.
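The two-step weighting scheme and the GMM variance formula (Ĥ'Σ̂⁻¹Ĥ)⁻¹ can be sketched generically. The code below is not the paper's moment function C(x_i)ρ_i(α); it is a minimal method-of-moments skeleton, assuming only that a user-supplied `moments(a)` returns an n × m array of per-observation moments (a hypothetical interface chosen for the example):

```python
import numpy as np
from scipy.optimize import minimize

def two_step_gmm(moments, a0):
    """Two-step GMM sketch: identity weight in step one, then
    W = (sum_i g_i g_i'/n)^(-1), with variance estimate (H'WH)^(-1)/n,
    where H is the Jacobian of the mean moment at the estimate."""
    def objective(a, W):
        gbar = moments(a).mean(axis=0)
        return gbar @ W @ gbar

    m = moments(a0).shape[1]
    step1 = minimize(objective, a0, args=(np.eye(m),), method="Nelder-Mead")
    g = moments(step1.x)
    n = g.shape[0]
    W = np.linalg.inv(g.T @ g / n)                  # efficient weight estimate
    step2 = minimize(objective, step1.x, args=(W,), method="Nelder-Mead")
    a_hat = step2.x

    # Numerical Jacobian H = d gbar / d a at a_hat
    eps = 1e-6
    gbar = moments(a_hat).mean(axis=0)
    H = np.column_stack([
        (moments(a_hat + eps * e).mean(axis=0) - gbar) / eps
        for e in np.eye(a_hat.size)])
    V = np.linalg.inv(H.T @ W @ H) / n              # asymptotic variance estimate
    return a_hat, V

# Illustration on a toy problem: mean and variance via moment conditions.
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.5, 4000)

def moments(a):
    return np.column_stack([data - a[0], (data - a[0])**2 - a[1]])

a_hat, V = two_step_gmm(moments, np.array([0.0, 1.0]))
print(a_hat)
```

In the paper's setting the per-observation moments would additionally be averages over the 10 simulation draws, and Û_i in (5.4) carries a correction for the estimated π̂; the skeleton above omits both refinements.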
In none of the cases was this term significant, so only results for the one, third order, Hermite polynomial term are reported in the tables.

For each estimator, elasticities at the quartiles, the estimate of σ = Var(v)^{1/2}, and the estimate of the coefficient γ of the Hermite polynomial term, as well as standard errors (in parentheses below the estimates), are reported. The (asymptotic) t-statistic on the coefficient of inverse expenditure (t-stat) and the overidentification (minimum chi-square) test statistic (Q) for the simulated moment estimators are also reported. The t-statistic is particularly relevant in Table Three, because the 2SLS estimator would be consistent if the coefficient on inverse expenditure were zero. The degrees of freedom of the overidentification test statistic are 2 and 1 for SM0 and SM1, respectively, in Tables Three and Four, and 8 and 7, respectively, in Table Five. The difference between these statistics for SM0 and SM1 is a one-degree of freedom chi-squared test of the Hermite coefficient being zero.

³ The procedure used to obtain the initial consistent estimators was to begin with an identity weighting matrix, use a few iterations to obtain "reasonable" parameter values, choose W as in equation (5.4), and then minimize to get ᾱ.

Even though the IV estimator is inconsistent, it gives results similar to the SM estimator in a number of cases. When the share denominator is allowed to be measured with error there are larger differences between IV and SM. The standard errors of SM are smaller than those of IV, which is consistent with the Monte Carlo results of Section 4. There are large differences between the OLS and SM estimators, consistent with the presence of measurement error. It is interesting to note that the elasticities for transportation go down rather than up, unlike linear regression with measurement error.
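The Hermite-term specification for the distribution of v can be illustrated with a Gallant and Nychka (1987) style density: a Gaussian base scaled by a squared polynomial with a single third-order term. The normalization below is a standard one, and the value γ = .3 is purely illustrative, not an estimate from the tables:

```python
import numpy as np
from scipy.integrate import quad

def snp_density(v, gamma, sigma=1.0):
    """Gaussian base density scaled by (1 + gamma*H3(v/sigma))^2, renormalized.

    H3(u) = u^3 - 3u is the third (probabilists') Hermite polynomial; a
    nonzero gamma introduces asymmetry, and gamma = 0 gives back the
    N(0, sigma^2) density."""
    u = v / sigma
    h3 = u**3 - 3.0 * u
    base = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * sigma)
    # E[(1 + gamma*H3(U))^2] = 1 + 6*gamma^2 for U ~ N(0,1), since E[H3^2] = 3! = 6
    return base * (1.0 + gamma * h3)**2 / (1.0 + 6.0 * gamma**2)

# The density integrates to one for any gamma (gamma = .3 is illustrative):
total, _ = quad(snp_density, -12.0, 12.0, args=(0.3,))
print(round(total, 6))   # 1.0
```

An upwards testing procedure of the kind described in the text amounts to adding one more Hermite term to the polynomial and testing whether its coefficient is significantly different from zero.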
In comparing Tables 3 and 4, it is apparent that accounting for measurement error in the denominator leads to some changes in the results. The prediction error standard deviation σ is more precisely estimated in the food equation in Table 4 than in Table 3. The overidentification test statistics are larger in Table 4. There is more nonlinearity in these equations. Surprisingly, the estimated standard errors in Table 4 are not much larger than those in Table 3, although Table 4 is a levels equation that is thought to be more heteroskedastic than the share equation. In most cases SM0 is quite similar to SM1, except for sometimes much larger standard errors. There is little evidence of nonnormality.

In summary, although allowing for nonnormality does not change the empirical results much, correcting for measurement error makes a big difference. In several cases the simulated moments estimator is quite different than the inconsistent IV estimator, suggesting that the inconsistency of the IV estimator may not be uniformly small. Furthermore, the simulated moment estimators seem quite accurate, having small standard errors. These results illustrate the usefulness of using simulated moment estimation to correct for measurement error, while allowing some flexibility in the distribution of the prediction error to assess the impact of allowing for nonnormality.

Appendix

Proof of Theorem 4.1: The proof proceeds by verifying the hypotheses of Theorem 5.1 of Newey and Powell (1991). For S simulations let Z = (z, v_1, ..., v_S) be the augmented data vector, with θ = (β, g) and p(Z, θ) = S⁻¹ Σ_{s=1}^S H(z, β, v_s)ρ(v_s, g). Let the norm for θ be ‖θ‖ = ‖β‖ + ‖g‖_∞. Note that Θ = B × G is compact, by B and G compact.
"S Z = augmented data vector, with S 9 = Let the norm for 5.1 of denote the p(Z,6) = let Pq( v ) = g (v)/<p(v), it follows by Assumptions 4.2 and 4.3 that by the triangle inequality ^ ^ IIH(z,P sf1 (A.l) llp(Z,9 (A.2) {Etllp(Z,e n )ll )ll ,v )ll|p s 2+e 1/(2+e) ^ ]} = C-{E[J [M(z,v)w(v)]" It 1_€ V (v)~ dv]» iip(z,e)-p(z,e)n ^ c-<y s < S Newey and Powell so that Assumption 5.1 of and 5.3 then follow by Assumptions 4.4 and E[p(Z,6)|x] = E[p(z,/3,g) |x] 6, S Theorem J* in the 4.1 follows sup norm. by oo. 6 e 0, )) /s}iie-en, (1991) follows 4.5. by eq. (A.2). Assumptions 5.2 Furthermore, by the fact that for an unbiased simulator, as noted in the text, 5.1 of /S Newey and Powell 2.1. Newey and The conclusion then follows by QED. (1991). 4.1. by hypothesis and the Arzela theorem, which gives compactness of Assumption 4.3 follows with zero and Assumption 4.6, g(v)/op(v) ]> The proof proceeds by verifying the hypotheses of Theorem Proof of Corollary 4.2: Assumption > ll, S Powell's (1991) Assumption 3.1 holds by Assumption the conclusion of )]" /S}llg s 1 ,m(z,v )[u(v )<ph ^5=1 )] ?)(v 2+e 1/(2+e) 1 )[u(v )<p(v 1 ) 1/(2+6) follows similarly to equation A.l that that for (A.3) <I s f 1 M(z,V s )[w(v s =s s C-£ ? <E[<M(z,v 2-e v )|/S (v PjW- T ne vi). w(v) = 1 by <p{v) bounded away from Assumption 4.4 follows by a Weirstrass approximation of proof then follows by the conclusion of Theorem 24 4.1. QED. References Amemiya, 1985, Instrumental variables estimator for the nonlinear errors in variables Y., model, Journal of Econometrics 28, 273-289. A.R. Gallant, and G. Souza, 1983, An elasticity can be estimated consistently without a priori knowledge of functional form, Econometrica 51, 1731-1751. Elbadawi, I., Gallant, A.R., 1987, Identification and consistency in nonparametric regression, in T.F. Bewley, ed., Advances in econometrics: fifth world congress, Cambridge: Cambridge University Press, 145-169. Gallant.. A.R. and D.W. 
Gallant, A.R. and D.W. Nychka, 1987, Semi-nonparametric maximum likelihood estimation, Econometrica 55, 363-390.

Gorman, W.M., 1981, Some Engel curves, in A. Deaton, ed., Essays in the theory and measurement of consumer behaviour in honor of Sir Richard Stone, Cambridge: Cambridge University Press.

Hausman, J.A., H. Ichimura, W.K. Newey, and J.L. Powell, 1991, Estimation of polynomial errors in variables models, Journal of Econometrics 50, 273-295.

Hausman, J.A., W.K. Newey, and J.L. Powell, 1995, Nonlinear errors in variables: Estimation of some Engel curves, Journal of Econometrics 65, 205-233.

Leser, C.E.V., 1963, Forms of Engel functions, Econometrica 31, 694-703.

Lerman, S. and C. Manski, 1981, On the use of simulated frequencies to approximate choice probabilities, in C. Manski and D. McFadden, eds., Structural analysis of discrete data with econometric applications, Cambridge: MIT Press.

McFadden, D., 1989, A method of simulated moments for estimation of discrete response models without numerical integration, Econometrica 57, 995-1026.

Newey, W.K., 1994a, Series estimation of regression functionals, Econometric Theory 10, 1-28.

Newey, W.K., 1994b, Efficient estimation of models with conditional moment restrictions, in G.S. Maddala and C.R. Rao, eds., Handbook of statistics, volume 11: Econometrics, Amsterdam: North-Holland.

Newey, W.K. and D. McFadden, 1994, Large sample estimation and hypothesis testing, in R. Engle and D. McFadden, eds., Handbook of econometrics, vol. 4, Amsterdam: North-Holland.

Newey, W.K. and J.L. Powell, 1991, Instrumental variables estimation for nonparametric models, working paper, Department of Economics, MIT.

Pakes, A., 1986, Patents as options: Some estimates of the value of holding European patent stocks, Econometrica 54, 755-785.

Pakes, A. and D. Pollard, 1989, Simulation and the asymptotics of optimization estimators, Econometrica 57, 1027-1057.
Robinson, P., 1987, Asymptotically efficient estimation in the presence of heteroskedasticity of unknown form, Econometrica 55, 875-891.

Stone, C.J., 1977, Consistent nonparametric regression (with discussion), Annals of Statistics 5, 595-645.

White, H., 1982, Instrumental variables regression with independent observations, Econometrica 50, 483-499.

Wolter, K.M. and W.A. Fuller, 1982, Estimation of nonlinear errors in variables models, Annals of Statistics 10, 539-548.

Table One: Monte Carlo Results
(bias, standard error, and root mean square error of the OLS, IV, and SM estimates of δ_1, δ_2, and δ_3)

Table Two: Some Sample Statistics

  Income quartiles:   25th  3373
                      50th  4574
                      75th  6417
  Sample standard error of log of expenditure   .51
  Standard error of predicted value             .25
  Standard error of residual                    .45
  R-squared                                     .23

Table Three: Elasticity Estimates for Share Equations
(elasticities at the 25th, 50th, and 75th quartiles of observed expenditure; standard errors in parentheses)

                  25th         50th         75th
  Food
    LS            .72 (.02)    .66 (.02)    .59 (.03)
    2SLS          .82 (.05)    .78 (.04)    .74 (.06)
    SM0           .82 (.04)    .78 (.04)    .74 (.06)
    SM1           .84 (.20)    .78 (.09)    .71 (.36)
  Clothing
    LS           1.21 (.05)   1.08 (.04)    .97 (.05)
    2SLS         1.61 (.12)   1.42 (.09)   1.30 (.10)
    SM0          1.63 (.20)   1.40 (.10)   1.26 (.18)
    SM1          1.62 (.46)   1.28 (.30)   1.07 (.56)
  Transportation
    LS           1.28 (.07)   1.44 (.06)   1.50 (.07)
    2SLS          .99 (.08)   1.06 (.08)   1.12 (.12)
    SM0          1.02 (.06)   1.01 (.06)   1.01 (.06)
  Recreation
    LS           1.40 (.07)   1.20 (.06)   1.06 (.07)
    2SLS         1.70 (.15)   1.31 (.12)   1.07 (.15)
Table Four: Elasticity Estimates for Level Equations
(elasticities at the 25th, 50th, and 75th quartiles of observed expenditure; standard errors in parentheses)

                  25th         50th         75th
  Food
    LS            .68 (.04)    .63 (.03)    .57 (.03)
    2SLS          .90 (.08)    .80 (.05)    .70 (.05)
    SM0           .98 (.08)    .81 (.04)    .63 (.05)
    SM1          1.21 (.21)    .83 (.06)    .46 (.13)
  Clothing
    LS           1.40 (.11)   1.11 (.06)    .89 (.04)
    2SLS         2.04 (.26)   1.50 (.12)   1.21 (.13)
    SM0          2.07 (.21)   1.36 (.08)    .96 (.07)
    SM1          2.14 (.58)   1.37 (.11)    .93 (.18)
  Transportation
    LS           3.14 (1.06)  1.95 (.19)   1.48 (.09)
    SM0           .94 (.05)    .76 (.04)    .64 (.04)
  Recreation
    LS           1.74 (.19)   1.26 (.09)    .95 (.05)
    2SLS         1.84 (.28)   1.32 (.16)   1.01 (.15)
    SM0          2.35 (.26)   1.44 (.09)    .95 (.07)

Table Five: Elasticity Estimates for Level Equations with Covariates
(elasticities at the 25th, 50th, and 75th quartiles of observed expenditure; standard errors in parentheses)

                  25th         50th         75th
  Food
    LS            .72 (.05)    .66 (.03)
    2SLS          .97 (.09)    .85 (.06)    .74 (.06)
    SM0          1.00 (.08)    .86 (.05)    .73 (.05)
    SM1          1.25 (.29)    .90 (.07)    .58 (.15)
  Clothing
    LS           1.35 (.11)   1.08 (.06)    .88 (.04)
    2SLS         2.01 (.14)   1.50 (.15)   1.22 (.22)
  Transportation
    LS           2.54 (.63)   1.83 (.14)   1.50 (.08)
    SM0           .87 (.06)    .69 (.04)    .61 (.04)
  Recreation
    LS           1.73 (.19)   1.25 (.09)    .95 (.05)
    2SLS         1.78 (.30)   1.31 (.17)   1.01 (.16)
    SM0          2.40 (.32)   1.41 (.11)    .88 (.08)