Working Paper, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, Mass. 02139. No. 93-18, Nov. 1993.

Flexible Simulated Moment Estimation of Nonlinear Errors-in-Variables Models

Whitney K. Newey
MIT

October, 1989
Revised, November 1993

This research was supported by the National Science Foundation and the Sloan Foundation. David Card and Angus Deaton provided helpful comments.

1. Introduction

Nonlinear regression models with measurement error are important but difficult to estimate. Measurement error is a common problem in microeconomic data, where nonlinear (e.g. discrete choice) models are often of interest. Instrumental variables estimators are not consistent for these models, as discussed in Amemiya (1985), so that alternative approaches must be adopted. The purpose of this paper is to develop an approach based on a prediction equation for the true variable, that uses simulation to simplify computation. The approach allows for flexibility in the distribution being simulated, and could be used for simulation estimation of other models.

The measurement error model considered here is the prediction model analyzed in Hausman, Ichimura, Newey, and Powell (1991) and Hausman, Newey, and Powell (1993). This model has a prediction equation for the true regressor with a disturbance that is independent of the predictors. This previous work shows how to consistently estimate polynomial regression models, and general regression models via polynomial approximations.
This paper avoids polynomial approximation by working directly with certain integrals, estimating them by simulation methods. Other work relies on the assumption that the variance of the measurement error shrinks with sample size, e.g. Wolter and Fuller (1982) and Y. Amemiya (1985). That approach is applicable when there are a large number of measurements on the true regressors, but this situation does not occur often in econometric practice.

Flexibility in the distribution of the prediction error is desirable, because consistency of the estimator depends on correct specification. It is also important to manage computation costs, so that the estimator is feasible for a variety of regression models. These goals are accomplished by combining simulated moment estimation with a linear in parameters specification for distributional shape. Simulated moment estimation provides a convenient approach when estimation equations are integrals, e.g. Lerman and Manski (1981), Pakes (1986), and McFadden (1989). This approach uses Monte Carlo methods to form unbiased estimators of the integrals in moment equations. Flexibility in distribution is incorporated by multiplying by a linear in parameters function that approximates the ratio of the true density to the simulated one. This approach is similar to the importance sampling technique from the simulation literature.

Section 2 describes the errors-in-variables model and some of its implications for conditional moments. Section 3 lays out the estimation method and discusses parametric asymptotic inference for the estimator. Section 4 gives a semiparametric consistency result. Section 5 presents results of a small Monte Carlo study. Section 6 describes an empirical example of Engel curve estimation of the relationship between income and consumption.

2. The Model

The model considered here is

(2.1)  y = f(w*, δ₀) + ε,   E[ε|x,v] = 0,
       w = w* + η,          E[η|x,v,ε] = 0,
       w* = π'x + σv,

where π and σ are conformable matrices.
Here y and ε are scalars, and w*, w, η, x, and v are vectors. The vector w* represents true regressors, which are unobserved; w and x are observed. The variables η are measurement errors, and v, which is independent of x, is an unobserved prediction error. The last equation is a prediction equation for the true regressors, where x are observed predictor variables and σ is a scaling matrix, a square root of a variance matrix. Some of the true regressors can be allowed to be observed, with w* equal to an element of x, by specifying that the corresponding elements of η and v are identically zero and that the corresponding element of π'x be w*.

This model was considered by Hausman, Ichimura, Newey, and Powell (1991) (HINP henceforth), for the special case where f(w*, δ) is a polynomial in w*. As long as x includes a constant, the location and scale of v can be normalized, e.g. as E[v] = 0 and Var(v) = I when the second moment of v exists.

Instrumental variables (IV) estimators can be used to estimate this model when f(w*, δ) is linear in w*. Substituting w - η for w* in the first equation leads to a residual that is linear in the measurement error, so that x are valid instruments. In the nonlinear case this substitution leads to residuals that are nonlinear in η. Consequently, x will not be valid instruments, and another approach has to be adopted.

An approach to consistent estimation can be based on integrating out the prediction error. Let g₀(v) denote the density of v. Integrating out v leads to three equations:

(2.3a)  E[y|x] = ∫ f(π₀'x + σ₀v, δ₀)g₀(v)dv,
(2.3b)  E[w·y|x] = ∫ [π₀'x + σ₀v]f(π₀'x + σ₀v, δ₀)g₀(v)dv,
(2.3c)  E[w|x] = π₀'x.

The first equation is a regression of y on x, analogous to the usual one, except that the unobserved variable w* has been integrated out. The second equation, a regression of wy on x, is less familiar. The third equation is a standard regression equation. The second equation is important for identification of nonlinear models.
The components of this equation corresponding to unobserved w* (i.e. those not corresponding to observed covariates) provide information additional to the first equation. As shown in HINP for polynomial regression, the first equation does not suffice for identification. Intuitively, there are two functions that need to be identified, the regression function and the density of v, so that two equations are needed for identification. It was shown in HINP that the parameters of any polynomial regression equation are identified from these two equations, and one expects that identification of the regression parameters will hold more generally.

It is beyond the scope of this paper to develop fully primitive identification conditions for this model, but some things can be said. First, π₀ is identified from (2.3c) as the coefficients of a least squares regression of w on x, so that π can be treated as known and identification of the other pieces of the model considered by focusing on equations (2.3a) and (2.3b). For example, HINP showed that δ and some of the moments of v are identified in the case where w* is a scalar, f(w*, δ) is a polynomial of degree p, and the second moment matrix of (1, π'x, ..., (π'x)ᵖ)' is nonsingular. Also, if v has a discrete distribution with a finite support and a "rank condition" holds, one could identify g₀(v) on points of positive probability. If v has a continuous distribution, then a simple counting argument suggests that the parameters of a parametric family of distributions for v should be identified. Assuming that (2.3a) and (2.3b) provide distinct functions, and that none are redundant, the left-hand sides of (2.3a) and (2.3b) are two functional equations, and there are two functions to be identified.
So, by an analogy with the finite dimensional case, it should be possible, under appropriate regularity conditions, to identify both the regression function for w* and the density function for v. Making this intuition precise would be quite difficult, because of the nonlinear, nonparametric (i.e. functional) nature of these equations, but it is an important problem deserving of future attention.

Independence of v and x is a strong assumption, but in the general nonlinear model of equation (2.1) it is difficult to drop this assumption. Intuitively, if some moments of v can depend on x, then it is much more difficult to separate the regression function from the distribution.

3. Estimation

To describe the estimator in a more general setup, let z denote a data observation, β a p×1 parameter vector, g a density function of a q×1 random vector v, H(z,β,v) an r×1 vector of functions, and ρ(z,β,g) an r×1 residual vector, related as in

(3.1)  ρ(z,β,g) = ∫ H(z,β,v)g(v)dv.

Suppose that there is a set of conditioning variables x such that for the true parameter value β₀ and density g₀,

(3.2)  E[ρ(z,β₀,g₀)|x] = 0.

The nonlinear errors-in-variables model is a special case of this one, where

(3.3)  H₁(z,β,v) = y - f(π'x + σv, δ),
       H₂(z,β,v) = L{w·y - [π'x + σv]·f(π'x + σv, δ)},
       H₃(z,β,v) = w - π'x,     β = (δ', σ, π')',

and L is a selection matrix that picks out those elements of w that include measurement error.

The common approach to using equation (3.2) in estimation is nonlinear instrumental variables. One difficulty with this approach is that the density g(v) is unknown. Another difficulty is that the residual is an integral that may be difficult to compute.
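To make the notation of (3.3) concrete, the residual functions can be written down directly. The sketch below is my own illustration, not code from the paper: it assumes a scalar w*, takes L to be the identity, and uses a hypothetical regression function f.

```python
import numpy as np

def make_H(f):
    """Residual functions of eq. (3.3) for scalar w*, with beta = (delta, sigma, pi)
    and L equal to the identity (a simplifying assumption for this sketch)."""
    def H(z, beta, v):
        y, w, x = z                      # one data observation
        delta, sigma, pi = beta
        w_star = pi * x + sigma * v      # prediction equation evaluated at draw v
        return np.array([
            y - f(w_star, delta),                 # H1: regression residual
            w * y - w_star * f(w_star, delta),    # H2: wy residual
            w - pi * x,                           # H3: prediction residual
        ])
    return H

# Hypothetical regression function (the form used later in the Monte Carlo study)
f = lambda w, d: d[0] + d[1] * w + d[2] * np.exp(-w)
H = make_H(f)
out = H((1.0, 0.5, 0.3), (np.array([1.0, 1.0, 1.0]), 1.0, 1.0), 0.2)
```

Note that only H1 and H2 depend on the draw v; H3 involves observed quantities alone, which is why the third equation can be handled by ordinary least squares.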
Here, these difficulties are dealt with simultaneously, by choosing a flexible parameterization for the density that makes it easy to use a simulation estimator of the integral. To describe this approach, we begin with a specification of the density function. For now, suppose that the density is a member of a parametric family,

(3.4)  g(v,γ) = P(v,γ)φ(v),   P(v,γ) = Σⱼ₌₁ᴶ γⱼpⱼ(v),

where φ(v) is some fixed density function and the pⱼ(v) are known functions. For example, if φ(v) were standard normal, then this would be an Edgeworth approximation. The function g(v,γ) of this form need not be positive, but it leads to residuals that are linear in the shape parameters γ and that can easily be estimated by simulation.

For a density like that of equation (3.4), a simulated residual can be constructed by drawing random variables from φ(v) and then evaluating the product of H and the linear combination P(v,γ). Let [v_i1, ..., v_iS] denote a vector of random variables, each having marginal density φ(v). For example, if φ(v) is a standard normal pdf, then [v_i1, ..., v_iS] could be computer generated Gaussian random numbers. Then an estimator of the residual ρ(z_i, β, g(γ)) for a single observation z_i is

(3.5)  ρ̂ᵢ(θ) = S⁻¹ Σₛ₌₁ˢ H(z_i, β, v_is)P(v_is, γ),   θ = (β', γ')'.

This is essentially an importance sampling estimator of the residual, where φ(v) is the sampling density and P(v,γ) approximates g(v)/φ(v). The simulated residual is an unbiased estimator of the true residual, because E[ρ̂ᵢ(θ)|z_i] = ρ(z_i, β, g(γ)). Therefore, by the results of McFadden (1989) and Pakes and Pollard (1989), an instrumental variables (IV) estimator with ρ̂ᵢ(θ) as the residual should be consistent if the IV estimator with the true residual is.
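As a concrete illustration, the simulated residual of (3.5) is just a sample average over baseline draws, reweighted by P(v,γ). The sketch below is my own (the function names are hypothetical); the toy check takes γ = (1,) with p₁(v) = 1, so the density is φ itself and the residual reduces to a plain Monte Carlo integral.

```python
import numpy as np

def simulated_residual(H, z, beta, gamma, basis, v_draws):
    """rho_hat_i(theta) = S^{-1} sum_s H(z, beta, v_s) P(v_s, gamma):
    an importance-sampling estimate of  integral H(z, beta, v) g(v) dv
    with g(v) = P(v, gamma) phi(v) and the draws v_s taken from phi."""
    P = sum(g * p(v_draws) for g, p in zip(gamma, basis))   # P(v_s, gamma), shape (S,)
    Hv = H(z, beta, v_draws)                                # H(z, beta, v_s), shape (S, r)
    return (Hv * P[:, None]).mean(axis=0)

# Toy check with H(z, beta, v) = (v, v^2)' and gamma = (1,): the residual
# estimates (E[v], E[v^2]) under a standard normal phi, roughly (0, 1).
rng = np.random.default_rng(0)
draws = rng.standard_normal(100_000)
rho = simulated_residual(lambda z, b, v: np.column_stack([v, v**2]),
                         None, None, [1.0],
                         [lambda v: np.ones_like(v)], draws)
```

Because the draws enter only through a sample average, unbiasedness holds for any S, which is what licenses using a small number of simulations per observation later on.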
An IV estimator can be formed in a familiar way. Let A(x) denote a q×r matrix of instrumental variables, that may be estimated, and suppose that θ̂ solves

(3.6)  Σᵢ₌₁ⁿ A(xᵢ)ρ̂ᵢ(θ̂) = 0.

This is a simulated, nonlinear IV estimator like that of McFadden (1989). Because equation (3.6) is linear in P(v,γ), it may be important to normalize the density P(v,γ)φ(v) to integrate to one. Also, it may be important to impose a location and scale normalization on this density. There are different ways to impose normalizations by imposing constraints on the coefficients. For example, if φ(v) is the standard normal density and p₁(v), p₂(v), ... are the Hermite polynomials that are orthonormal with respect to the standard normal density (i.e. ∫pⱼ(v)²φ(v)dv = 1 and ∫pⱼ(v)pₖ(v)φ(v)dv = 0 for j ≠ k), then γ₁ = 1 and γ₂ = γ₃ = 0 will imply that P(v,γ)φ(v) integrates to one, and has zero mean and unit variance. It is also possible to impose such constraints using the simulated values, by requiring that the sample moments Σᵢ₌₁ⁿΣₛ₌₁ˢ(1, v_is, v_is²)'P(v_is,γ) equal their population counterparts under the normalized density.

In the nonlinear errors-in-variables model, it is convenient to work with a two step estimator, where the first step consists of estimation of π by least squares (LS), and the second step is an instrumental variables estimator using the first two residuals of equation (3.3). The first order conditions for such an estimator can be formulated as a solution to equation (3.6), if A(x) is chosen in a particular fashion. Let π̂ be the LS estimator and

(3.7)  A(x) = diag[B(x), x],

where B(x) has two columns and number of rows equal to the number of elements of (δ', σ, γ')'. Suppose that constraints are imposed on the γ coefficients as described above. Then the solution to equation (3.6), with A(x) specified as in equation (3.7), requires that π̂ be the least squares estimator, and that the other parameters solve the equation

(3.8)  0 = Σᵢ₌₁ⁿ B(xᵢ)ρ̂ᵢ(α̂),   α = (δ', σ, γ')',

       ρ̂ᵢ(α) = S⁻¹ Σₛ₌₁ˢ [ yᵢ - f(π̂'xᵢ + σv_is, δ) ;
                           L{wᵢyᵢ - (π̂'xᵢ + σv_is)f(π̂'xᵢ + σv_is, δ)} ] P(v_is, γ).
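The normalizations just described can be verified numerically. The sketch below is my own construction (not code from the paper): it builds the first four orthonormal probabilists' Hermite polynomials and checks, by Gauss-Hermite quadrature, that γ₁ = 1 and γ₂ = γ₃ = 0 leave P(v,γ)φ(v) with unit mass, zero mean, and unit variance for an arbitrary value of the third-order coefficient.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

# Orthonormal (probabilists') Hermite polynomials w.r.t. the standard normal
basis = [lambda v: np.ones_like(v),
         lambda v: v,
         lambda v: (v**2 - 1) / np.sqrt(2.0),
         lambda v: (v**3 - 3*v) / np.sqrt(6.0)]

gamma = [1.0, 0.0, 0.0, 0.4]     # gamma_1 = 1, gamma_2 = gamma_3 = 0 imposed;
                                 # gamma_4 = 0.4 is an arbitrary shape value

# hermegauss integrates against weight exp(-v^2/2); rescale the weights so
# the rule integrates against the standard normal density phi(v).
nodes, wts = He.hermegauss(20)
wts = wts / np.sqrt(2.0 * np.pi)

P = sum(g * p(nodes) for g, p in zip(gamma, basis))
mass = float(np.sum(wts * P))             # integral of P(v,gamma) phi(v) dv
mean = float(np.sum(wts * nodes * P))     # first moment
var = float(np.sum(wts * nodes**2 * P))   # second moment (= variance, mean is 0)
```

The three moments are exactly 1, 0, and 1 here (the 20-node rule is exact for these polynomial integrands), which is why only the coefficients of order three and higher remain free shape parameters.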
In the empirical example the estimator minimizes a quadratic form that has this type of first order condition, although the normalization for γ is not imposed. Specifically, α̂ solves

(3.9)  min_α [Σᵢ₌₁ⁿ C(xᵢ)ρ̂ᵢ(α)]' Ŵ [Σᵢ₌₁ⁿ C(xᵢ)ρ̂ᵢ(α)],

where C(x) is a matrix of instrumental variables and Ŵ is a positive definite matrix. The first order conditions for this minimization problem are as given in equation (3.8), with

(3.10)  B(x) = [Σᵢ₌₁ⁿ ∂C(xᵢ)ρ̂ᵢ(α̂)/∂α]'Ŵ C(x).

Standard large sample theory for IV can be used for asymptotic inference procedures. If the simulated values (v_i1, ..., v_iS) are included with the data to form an augmented observation for the ith data point, then the usual IV formulae can be used to form a consistent variance estimator. For example, suppose that (z_i, v_i1, ..., v_iS) are independent observations as i varies. Then under standard regularity conditions (e.g. see Newey and McFadden, 1994), the asymptotic variance of √n(θ̂ - θ₀) can be estimated by

(3.11)  V̂ = Ĝ⁻¹Ω̂(Ĝ⁻¹)',   Ĝ = n⁻¹Σᵢ₌₁ⁿ A(xᵢ)∂ρ̂ᵢ(θ̂)/∂θ,   Ω̂ = n⁻¹Σᵢ₌₁ⁿ A(xᵢ)ρ̂ᵢ(θ̂)ρ̂ᵢ(θ̂)'A(xᵢ)'.

If the simulation draws v_is are mutually statistically independent as s varies for a given i, then one could also use the estimator

(3.12)  Ṽ = Ĝ⁻¹Ω̃(Ĝ⁻¹)',   Ω̃ = n⁻¹Σᵢ₌₁ⁿ A(xᵢ)Λ̂ᵢA(xᵢ)',   Λ̂ᵢ = S⁻²Σₛ₌₁ˢ H(z_i,β̂,v_is)H(z_i,β̂,v_is)'P(v_is,γ̂)².

Both of these variance estimators ignore estimation of the instruments, which is valid under standard regularity conditions. Because the large sample theory for these estimators is straightforward, we do not give regularity conditions here.

4. Consistent Semiparametric Estimation

If the functional form of the density g₀(v) is left unspecified, then the model becomes semiparametric. Models where identification is achieved by conditional moment restrictions like those of equation (3.2) are nonlinear, nonparametric simultaneous equations models. Newey and Powell (1989) have considered estimation of such models, and their result can be applied here. The basic idea is to apply the previous estimator, but with P(v,γ) chosen from a member of an increasing sequence of approximating families and the IV equation (3.6) replaced by a nonparametric conditional expectation equation.

Let 𝒢 be a set of functions of v that will be assumed to include the true density g₀(v) and to satisfy other regularity conditions given below. Let {P_J(v,γ)}, with γ a Euclidean vector, be a sequence of families such that P_J(v,γ)φ(v) can approximate any density function, and let 𝒢_J = {P_J(v,γ)φ(v)} ∩ 𝒢. Let θ = (β,g) be the parameter consisting of β and a density g, and let ρ̂ᵢ(θ) be the simulated residual of equation (3.5) with P_J(v,γ) replacing P(v,γ). Also, let Ê[ρ̂ᵢ(θ)|xᵢ] be a nonparametric estimator of the conditional expectation of ρ̂ᵢ(θ) given xᵢ, such as a series or kernel estimator. Then a minimum distance estimator of θ₀ = (β₀,g₀) is

(4.1)  θ̂ = argmin_{θ ∈ B×𝒢_J} Q̂(θ),   Q̂(θ) = Σᵢ₌₁ⁿ Ê[ρ̂ᵢ(θ)|xᵢ]' D̂ Ê[ρ̂ᵢ(θ)|xᵢ]/n,

where D̂ is a positive definite matrix. The objective function can depend on the data and on the sample size. The objective function in equation (4.1) is a sample analog of

Q(θ) = E[ E[ρ(z,θ)|x]' D E[ρ(z,θ)|x] ],

where D is the limit of D̂ and ρ(z,θ) = ∫H(z,β,v)g(v)dv. If θ is identified from the conditional moment equation (3.2) (i.e. that equation has a unique solution) and D is positive definite, then Q(θ) will have a unique minimum of zero at θ₀. The general extremum estimator reasoning (e.g. Newey and McFadden, 1994) then suggests that θ̂ should be consistent.

The estimator can be shown to be consistent if β and g are restricted to compact sets, similarly to Gallant (1987). The compact function set assumption is a strong one, but the results of Newey and Powell (1989) indicate its importance for minimum distance estimators of the form considered here.

For a matrix A = [a_ij], let ‖A‖ = [trace(A'A)]^(1/2), and for a function g(v) let ‖g‖ denote a function norm, to be further discussed below.

Assumption 4.1: β ∈ B, which is compact, and g ∈ 𝒢, which is compact in the norm ‖g‖.

In the primitive regularity conditions given below, ‖g‖ will be a Sobolev norm. The following dominance condition will be useful in showing uniform convergence.

Assumption 4.2: There exists M(z,v) such that for all β, β̃ ∈ B and g ∈ 𝒢, ‖H(z,β,v)‖ ≤ M(z,v) and ‖H(z,β,v) - H(z,β̃,v)‖ ≤ M(z,v)‖β - β̃‖.

Moment conditions for the dominating function M(z,v) will be specified below. To show uniform convergence of the objective function of equation (4.1), it is useful to impose a strong condition on the function norm, namely that it dominates a weighted supremum norm. Let 𝒱 denote the support of φ(v), and for a weight ω(v) > 0 let ‖g‖_{𝒱,ω} = sup_𝒱 |g(v)|ω(v). Below it will be assumed that ‖g‖_{𝒱,ω} is dominated by ‖g‖, so that Assumption 4.1 implies that sup_𝒱{|g(v)|ω(v)} is bounded on 𝒢. The import of this assumption is the uniform bound imposed by the presence of the weight function: the faster ω(v) grows as v moves outward, the faster the tails of g(v) must go to zero in order to guarantee that ‖g‖_{𝒱,ω} is finite. Also, the nature of importance sampling imposes a restriction on the tail thickness of the true density relative to the baseline density. For second moment dominance this restriction will translate into a restriction on ω(v) relative to φ(v). These considerations lead to the following assumption:

Assumption 4.3: ‖g‖_{𝒱,ω} ≤ ‖g‖ for g ∈ 𝒢, and there exists ε > 0 such that E[∫ M(z,v)^(2+ε)/{ω(v)[ω(v)φ(v)]} dv] < ∞.

In order to guarantee that the parametric approximation suffices for consistency, the following denseness condition will be imposed.

Assumption 4.4: For any g ∈ 𝒢 there exist γ_J such that lim_{J→∞} ‖P_J(·,γ_J)φ(·) - g‖ = 0.

This condition specifies that g can be approximated by the family. It is also necessary to make some assumption concerning the conditional expectation estimator. The following condition is lifted from Newey and Powell (1989). Without changing notation, assume that the data observation z_i includes the simulation draws (v_i1, ..., v_iS), and assume that the data are stationary.

Assumption 4.5: For any ψ(z) with E[|ψ(z)|^(2+ε)] < ∞, for the ε of Assumption 4.3, Σᵢ₌₁ⁿ ‖Ê[ψ(z)|xᵢ] - E[ψ(z)|xᵢ]‖²/n converges in probability to zero, and either a) Ê[ψ(z)|xᵢ] = Σⱼ₌₁ⁿ w_ij ψ(z_j), with w_ij ≥ 0 and Σⱼ₌₁ⁿ w_ij = 1; or b) Ê[ψ(z)|xᵢ] = P_K(xᵢ)'(Σⱼ₌₁ⁿ P_K(x_j)P_K(x_j)')⁻ Σⱼ₌₁ⁿ P_K(x_j)ψ(z_j), for a K×1 vector of approximating functions P_K(x).

Assumption 4.5 can easily be checked in some cases, and in part it holds by construction. For instance, it is easy to use known results to show that Assumption 4.5 holds for nearest neighbor and series estimators when the data are i.i.d. For K-nearest-neighbor estimators with K → ∞ and K/n → 0, the mean-square convergence condition follows by Lemma 8 of Robinson (1987) and Proposition 1 of Stone (1977), while a) holds by construction. For a series estimator of the form given in b), with P_K containing elements such that any function with finite mean square can be approximated arbitrarily well in mean square for large enough K, the condition follows by Lemmas A.10 and A.11 of Newey (1993a) and the arguments contained there, as long as K → ∞. Neither of these results allows for data based K; Assumption 4.5 restricts the form of randomness of K. It should be noted that implicitly the weights w_ij in Assumption 4.5 and the approximating functions P_K are restricted to not depend on ψ. Thus, while they could be chosen based on some fixed ψ (e.g. a linear combination of ρ(z,θ) for some preliminary estimator θ̄), they are not allowed to vary with ψ (i.e. with θ in ρ(z,θ)). Assumption 4.5 should also be "plug-compatible" with future results on nonparametric conditional expectation estimators, such as those for time series.

The last assumption specifies that J must go to infinity with the sample size.

Assumption 4.6: J → ∞ as n → ∞.

As mentioned earlier, the degree of approximation J can be random, in a very general way. However, it should be noted that it is not restrictions on the growth rate of J that are used to obtain consistency, but rather the restriction of the function to a compact set. Often, the compactness condition will require that higher order derivatives be uniformly bounded, a condition that will have more "bite" for large values of J, imposing strong constraints on the coefficients of higher order terms.

These assumptions and identification deliver the following consistency result:

Theorem 4.1: If E[ρ(z,θ)|x] = 0 has a unique solution at θ₀ on B×𝒢 and Assumptions 4.1 - 4.6 are satisfied, then ‖β̂ - β₀‖ and ‖ĝ - g₀‖ converge in probability to zero.

It should be noted that the hypotheses of this theorem are not very primitive until the norm ‖g‖ is specified. Once ‖g‖ is specified, it may require some work to check the other assumptions. The following set of assumptions is sufficient to demonstrate that the assumptions are not vacuous, and do cover cases of some interest.

Assumption 4.7: i) v is one-dimensional; ii) the support 𝒱 of φ(v) is a compact interval; iii) ‖g‖ = sup_𝒱 |g(v)|; iv) 𝒢 = {g(v) : |g(v)| ≤ B̄ and |g(v) - g(ṽ)| ≤ B̄|v - ṽ| for all v, ṽ ∈ 𝒱}, for a fixed constant B̄; v) φ(v) is continuous and bounded away from zero on 𝒱; vi) E[sup_{v∈𝒱} M(z,v)^(2+ε)] < ∞.

Corollary 4.2: If equation (3.2) has a unique solution at θ₀ and Assumptions 4.2 and 4.5 - 4.7 are satisfied, then ‖β̂ - β₀‖ and ‖ĝ - g₀‖ converge in probability to zero.

This result is restrictive in several ways. It is easy to relax the assumption that v is one-dimensional, using the results of Elbadawi, Gallant, and Souza (1983). It is more difficult to allow for noncompact support for v, although this extension is possible using the results of Gallant and Nychka (1987). Unfortunately, their result allows for quite thick tails, with a weight of the form ω(v) = C(1+v'v)^(-q) in Assumption 4.3. This tail behavior does not allow Assumption 4.3 to be satisfied when φ(v) is the standard normal density. Of course, there are fast computational methods for generating data from densities proportional to (1+v'v)^(-q), so that one could easily use such thick-tailed baseline densities. Also, it should be possible to develop intermediate conditions that allow for more general simulators.

5. A Sampling Experiment

A small Monte Carlo study is useful as a rough check of whether the estimator can give satisfactory results in practice. Consider the model

y = δ₁ + δ₂w* + δ₃e^(-w*) + ε,   w = w* + η,   w* = π₁ + π₂x + v,
δ₁ = δ₂ = δ₃ = 1,  π₁ = 0,  π₂ = 1,  ε ~ N(0,1),  η ~ N(0,1),  x ~ N(0,.5),  v ~ N(0,.5).

The regression equation for this model is one that is useful in estimating the relationship between consumption and income. This specification will be further discussed in Section 6, where it is used in the empirical example. The parameter values were set so that the r-squared for the prediction equation for w* was 1/2, and so that the signal to noise ratio was 1. The number of observations was set to 100, small relative to typical sample sizes in economics, to make computation easier. The r-squared for the prediction equation was set higher than is typical in order to offset the small sample size, so that the estimator might be informative.

Table One reports the results from 100 replications. Results for three different estimators of δ₁, δ₂, and δ₃ are reported. The first estimator is an ordinary least squares (OLS) regression of y on the mismeasured right-hand side variables (1, wᵢ, e^(-wᵢ)). The second estimator is an instrumental variables (IV) estimator with the same right-hand side but with instruments Rᵢ = (1, hᵢ, hᵢ², hᵢ³)', where hᵢ = π̂₁ + π̂₂xᵢ.
The third estimator is a simulated moment estimator (SM) from equation (3.9), with C(xᵢ) = I⊗Rᵢ and Ŵ = I⊗(Σᵢ₌₁ⁿ RᵢRᵢ')⁻¹, where I is a two dimensional identity matrix. This estimator is a system two-stage least squares estimator where the instrumental variables are Rᵢ. Also, P(v,γ) was a Hermite polynomial of the third order, where γ₁ = 1, γ₂ = γ₃ = 0, and γ₄ was estimated. There were two simulations per observation.

In one replication out of the 100 the estimator did not converge to a stationary point. This replication was excluded from the results that are reported in Table One.

The estimator shows promise. The standard errors of the IV and simulated moment estimators are much larger than those of the OLS estimator, but the biases are substantially smaller. As previously noted, the IV estimator is inconsistent, although in this example it leads to bias reduction. It is interesting to note that the standard error of the SM estimator is smaller than that of the IV estimator. Thus, in this example the valid SM correction for measurement error leads to both smaller bias and variance than the inconsistent IV correction.

6. An Application to Engel Curve Estimation

The application presented here is estimation of Engel curves, a subject that has long been of interest in econometrics. Measurement error has recently been shown by Hausman, Newey, and Powell (1993) to be important in the estimation of nonlinear Engel curves. This section adds to that work by estimating a nonlinear, nonpolynomial Engel curve for the model of equation (2.1), which was not considered by Hausman, Newey, and Powell (1993).
rank restriction, a rank two specification such as this may be a good (1981) specification, once the a specification In addition, is results from the fact that if results is 1. = Y./l., S. for. considered that accounts for the presence of denominator of the left-hand side of this equation. in the Thus, measurement error has been accounted where Y. is 1. This "denominator problem" the expediture on the commodity. measured with error, another nonlinear measurement error problem from using the measured shares. This problem can be dealt with by bringing 1. out of the denominator, giving Y. = 5,1. (5.2) 1 1 If e. + 6^1.1n(l.) + 5^ + I.e.. 2 1 1 11 3 1 Elcll.] = satisfies the usual restriction 11 1 0, then equations (5.1) aind (5.2) are equivalent statistical specifications, in that running least squares on either Covariates will also be allowed equation should give a consistent estimator. specification by allowing additional variables equation, corresponding to inclusion of equation 1.x in this to enter linearly in this . as additional regressors in the share x (5.1). The measurement error will be assumed to be multiplicative, i.e. for 1. equal to the observed total expenditure, (5.3) • ln(I.) 1 In the empirical 11111 • • = w. = 1 tt'x. 1 + v., ln(I.) 1 1 work the predictor = Ind.) vEiriables squared for household head and spouse, aind x. « + t?. = w. = w. + ij.. will be a constant, age and age dummies for educationational attainment. - 17 spouse employment, home ownership, industry, occupation, region, With this specification for the total of 19 variables, including the constant. measurement and prediction equations, f(w black or white, a eind = 5 ,5) + 5„w + 6 exp(-w as in the ), Monte Carlo exeimple. The measurement error denominator can be accounted for as in the left-hand side * f(w equation (5.2), leading to a specification with m m ,5) = 5 exp(w + 6 ) in » exp(w )w + 6^. 
It is interesting to note that even if δ₃ = 0, so that the share equation is linear in ln(Iᵢ), equation (5.2) is nonlinear, so that IV will not be consistent. Thus, measurement error in the denominator of the share suggests the need for the estimators developed here.

The data used in estimation are from the 1982 Consumer Expenditure Survey (CES). The basic data we use are total expenditure and expenditure on commodity groups from the first quarter of 1982. Results were obtained for four commodity groups: food, clothing, transportation, and recreation. The number of observations in the data set is 1321. The empirical results are reported as elasticities, i.e. dlnf(x)/dlnx, as is common in econometrics. To compare shapes, elasticities were calculated at the quartiles of observed expenditure.

The results are given in Tables Two through Five. Table Two gives some sample statistics, including the quartiles of the income distribution. The other tables include estimated expenditure elasticities at these quartiles. Table Two also gives information on the prediction regression. The R² of the prediction regression is .23, which is quite sizeable for such a cross-section data set. The other information is useful in calculating the magnitude of the measurement error and bounding the size of the variance of the prediction error v. In particular, the standard error of the residual is .45; the model we have assumed implies that this is an upper bound on the standard deviation of both the measurement error and the prediction error v. Also, given an estimator σ̂ of Var(v)^(1/2), an estimator of the R² of the measurement equation, which determines the magnitude of the measurement error bias in a linear model, is

Var(π'x + v)/Var(w) = [Var(π'x) + σ̂²]/Var(w) = [(.25)² + σ̂²]/(.51)² = .24 + (3.8)σ̂².

Tables Three to Five give results for each commodity for three different specifications of the share equation and four different estimators.
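The arithmetic behind the measurement-equation R² bound above is easy to check from the two reported standard deviations (.25 for the fitted value π'x and .51 for w); the snippet below just redoes that calculation.

```python
# Measurement-equation R^2 as a function of the prediction-error scale sigma:
# R2(sigma) = [Var(pi'x) + sigma^2] / Var(w) = [(.25)^2 + sigma^2] / (.51)^2
sd_fitted, sd_w = 0.25, 0.51          # standard deviations reported in the text
intercept = sd_fitted**2 / sd_w**2    # constant term, approximately .24
slope = 1.0 / sd_w**2                 # coefficient on sigma^2, approximately 3.8
```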
Table Three gives the results for the share equation, where measurement error in the left-hand side denominator is ignored. This specification is the same as in the Monte Carlo study. Table Four changes the specification to account for the left-hand side denominator, by multiplying through the original equation by total expenditure, as described above. Table Five adds covariates to the share equation to allow for demographic and regional price effects. There are six covariates: own and spouse age, family size, and three regional dummy variables. The equation estimated is analogous to that of Table Four in accounting for the left-hand side denominator, with

f(w*, x̃, δ) = δ₁exp(w*) + δ₂exp(w*)w* + δ₃ + exp(w*)x̃'δ₄.

It should be noted that this specification restricts family size to be absent from the prediction equation.

Tables Three to Five report results for four different estimators: ordinary least squares (LS), two stage least squares (IV) with instruments described below, the simulated moment estimator with Gaussian v (SM0), and the simulated moment estimator with one Hermite polynomial term (SM1), of the third order, included in P(v,γ). The simulated moment estimators are each obtained by minimizing the quadratic form of equation (3.9), with ρ̂ᵢ(α) as given in equation (3.8), 10 simulation draws, and Ŵ equal to the inverse of an estimated asymptotic variance of Σᵢ₌₁ⁿ C(xᵢ)ρ̂ᵢ(ᾱ)/√n. Specifically,

(5.4)  Ŵ = Σ̂⁻¹,   Σ̂ = n⁻¹Σᵢ₌₁ⁿ ÛᵢÛᵢ',
       Ûᵢ = C(xᵢ)ρ̂ᵢ(ᾱ) + [∂Σⱼ₌₁ⁿ C(xⱼ)ρ̂ⱼ(ᾱ)/∂π](Σⱼ₌₁ⁿ xⱼxⱼ')⁻¹xᵢ(wᵢ - π̂'xᵢ),

where ᾱ is an initial consistent estimator. This is an estimator of an asymptotic variance that accounts for the presence of π̂ in ρ̂ᵢ. The procedure used to obtain the initial consistent estimators was to begin with an identity weighting matrix, use a few iterations to obtain "reasonable" parameter values, calculate Ŵ as in equation (5.4), and then minimize to get α̂.
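The weighting procedure just described is standard iterated GMM: minimize with an identity weight, rebuild the weight from the estimated moment covariance, and minimize again. The sketch below is my own minimal version of that loop under simplifying assumptions: a hypothetical linear moment function and data, no correction for a first-stage π̂, and SciPy's Nelder-Mead in place of whatever routine was actually used.

```python
import numpy as np
from scipy.optimize import minimize

def iterated_gmm(moments, data, theta0, n_iter=2):
    """Iterated GMM: moments(theta, data) returns an n x m matrix of moment
    contributions; the objective is gbar' W gbar, with W = I on the first
    pass and the inverse of the estimated moment covariance afterwards."""
    n = data.shape[0]
    W = np.eye(moments(theta0, data).shape[1])
    theta = theta0
    for _ in range(n_iter):
        obj = lambda t: (lambda g: g.mean(0) @ W @ g.mean(0))(moments(t, data))
        theta = minimize(obj, theta, method="Nelder-Mead").x
        g = moments(theta, data)
        W = np.linalg.inv(g.T @ g / n)   # inverse of estimated moment variance
    return theta

# Hypothetical example: y = 2*x + e, with instruments z and x themselves,
# moment contributions (z*(y - b*x), x*(y - b*x)).
rng = np.random.default_rng(2)
z = rng.standard_normal(400)
x = z + 0.5 * rng.standard_normal(400)
y = 2.0 * x + rng.standard_normal(400)
data = np.column_stack([y, x, z])
m = lambda t, d: np.column_stack([d[:, 2] * (d[:, 0] - t[0] * d[:, 1]),
                                  d[:, 1] * (d[:, 0] - t[0] * d[:, 1])])
b_hat = iterated_gmm(m, data, np.array([0.0]))
```

The first pass with W = I plays the role of the "reasonable parameter values" step; the second pass with the data-based weight is the efficient minimization.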
The standard errors for LS and IV were calculated from heteroskedasticity-consistent formulae, e.g. as given in White (1982). The standard errors for the simulated moment estimators were calculated from the GMM asymptotic variance estimator (Ĥ'Σ̂⁻¹Ĥ)⁻¹, where

Ĥ = Σᵢ₌₁ⁿ ∂[C(xᵢ)ρᵢ(α̂)]/∂α.

A preselection process was used to choose the order of powers of the predicted value to include in the instruments. Starting at the second order, the minimum needed to have enough moments to allow estimation of the distribution parameters, the order was chosen by cross-validation on the food equation for the Gaussian simulated moment estimator (SM0), using the cross-validation criteria for choice of instruments suggested in Newey (1993b). Inclusion of higher order powers did not result in any decrease in the cross-validation criteria. Consequently, in Tables Three and Four the instrumental variables were (1, x'π̂, (x'π̂)²). In Table Five, exp(x'π̂)·x₁ was added to the instruments because of the presence of the covariates.

The number of Hermite polynomial terms to include was chosen essentially by an upwards testing procedure, applied in the model of Table Three. One third-order Hermite polynomial term was tried in each case, as reported in Table Three. This term allows for asymmetry in the distribution of v. If it was statistically significant, a fourth order term was tried. In none of the cases was this term significant, so only results for the inclusion of a third order term are reported in the tables.

For each estimator, elasticities at the quartiles, the estimate of σ = Var(v)^(1/2), and the estimate of the coefficient γ of the Hermite polynomial term, as well as standard errors (in parentheses below the estimates), are reported. The (asymptotic) t-statistic on the coefficient of inverse expenditure (t-stat) and the overidentification (minimum chi-square) test statistic (Q) for the simulated moment estimators are also reported.
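The variance formula used for the simulated moment standard errors can be sketched generically; H and Sigma below stand for any consistent estimates of the moment Jacobian and moment variance, and the function name is ours:

```python
import numpy as np

def gmm_se(H, Sigma, n):
    """Standard errors from the GMM asymptotic variance (H' Sigma^{-1} H)^{-1}.

    H     : m x k Jacobian of the average moments in the parameters
    Sigma : m x m variance matrix of the moments
    n     : sample size
    """
    V = np.linalg.inv(H.T @ np.linalg.solve(Sigma, H))
    return np.sqrt(np.diag(V) / n)
```

Using `solve` rather than explicitly inverting Sigma is the usual numerically safer choice for the inner weighting.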
The t-statistic is particularly relevant in Table Three because the 2SLS estimator would be consistent if the coefficient on inverse expenditure were zero. The degrees of freedom of the overidentification test statistic are 2 and 1 respectively for SM0 and SM1 in Tables Three and Four, and 8 and 7 respectively in Table Five. The difference between these statistics for SM0 and SM1 is a one degree of freedom chi-squared test of the Hermite coefficient being zero.

Even though the IV estimator is inconsistent, it gives results similar to the SM estimator in a number of cases. When the share denominator is allowed to be measured with error there are larger differences between IV and SM. The standard errors of SM are smaller than those of IV, which is consistent with the Monte Carlo results of Section 4. There are large differences between the OLS and SM estimators, consistent with the presence of measurement error. It is interesting to note that the elasticities for transportation go down rather than up, unlike linear regression with measurement error.

In comparing Tables 3 and 4, it is apparent that accounting for measurement error in the denominator leads to some changes in the results. There is more nonlinearity in the food equation in Table 4 than in Table 3. The prediction error standard deviation σ̂ is more precisely estimated in Table 4 than in Table 3. Surprisingly, the estimated standard errors in Table 4 are not much larger than those in Table 3, although Table 4 is a levels equation that is thought to be more heteroskedastic than the share equation. The overidentification test statistics are larger in Table 4, sometimes much larger. In most cases SM0 is quite similar to SM1, except for much larger standard errors, and there is little evidence of nonnormality.

In summary, although allowing for nonnormality does not change the empirical results, correcting for measurement error makes a big difference.
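The one degree of freedom difference test can be computed directly from the two reported minimum chi-square values; the function below is an illustrative sketch (names are ours), using only the standard chi-squared(1) survival function:

```python
import math

def hermite_coef_test(q_sm0, q_sm1):
    """Difference of the SM0 and SM1 overidentification statistics as a
    chi-squared(1) test that the Hermite coefficient is zero.
    Returns (statistic, p-value)."""
    stat = q_sm0 - q_sm1
    # chi^2(1) survival function: P(X > s) = erfc(sqrt(s/2))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value
```

A difference statistic above 3.84 would reject the zero-coefficient hypothesis at the 5% level.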
In several cases the simulated moments estimator is quite different than the inconsistent IV estimator, suggesting that the inconsistency of the IV estimator may not be uniformly small. Furthermore, the simulated moment estimators seem quite accurate, having small standard errors. These results illustrate the usefulness of using simulated moment estimation to correct for measurement error, while allowing some flexibility in the distribution of the prediction error to assess the impact of allowing for nonnormality.

Appendix

Proof of Theorem 4.1: The proof proceeds by verifying the hypotheses of Theorem 5.1 of Newey and Powell (1989). The parameter space Θ = B × G is compact because B and G are compact. Let Z denote the augmented data vector, with Z = (z, v₁, ..., v_S) for S simulations, and let ρ(Z,θ) = S⁻¹ Σ_{s=1}^S H(z,β,v_s)P(v_s,γ). Let the norm for θ = (β,g) be ‖θ‖ = |β| + ‖g‖. Also, let p₀(v) = g₀(v)/φ(v). It follows by Assumptions 4.2 and 4.3 and the triangle inequality that

(A.1)  ‖ρ(Z,θ₀)‖ ≤ {S⁻¹ Σ_{s=1}^S M(z,v_s)[ω(v_s)ν(v_s)]⁻¹} ‖g₀‖,

(A.2)  {E[‖ρ(Z,θ₀)‖^(4+ε)]}^(1/(4+ε)) ≤ C S⁻¹ Σ_{s=1}^S {E[M(z,v_s)^(4+ε) [ω(v_s)ν(v_s)]^(-(4+ε))]}^(1/(4+ε))
        = C {∫ E[M(z,v)^(4+ε)] ω(v)^(-(4+ε)) ν(v)^(-(3+ε)) dv}^(1/(4+ε)).

It follows similarly to equation (A.1) that

(A.3)  ‖ρ(Z,θ) - ρ(Z,θ̃)‖ ≤ C {S⁻¹ Σ_{s=1}^S M(z,v_s)[ω(v_s)ν(v_s)]⁻¹} ‖θ - θ̃‖,

so that Assumption 5.1 of Newey and Powell (1989) follows by equation (A.2). Assumptions 5.2 and 5.3 then follow by Assumptions 4.4 and 4.5. Furthermore, for an unbiased simulator, as noted in the text, E[ρ(Z,θ)|x] = E[ρ(z,β,g)|x]. The conclusion then follows by the conclusion of Theorem 5.1 of Newey and Powell (1989). QED.

Proof of Corollary 4.2: The proof proceeds by verifying the hypotheses of Theorem 4.1. Newey and Powell's (1989) Assumption 3.1 holds by Assumption 4.1. Assumption 4.3 follows with ω(v) = 1, by φ(v) bounded away from zero and Assumption 4.6. Also, g(v)/φ(v) is bounded by hypothesis, and the Arzelà theorem gives compactness of G in the sup norm. Assumption 4.4 follows by a Weierstrass approximation of p₀(v). The proof then follows by the conclusion of Theorem 4.1. QED.

References

Amemiya, Y. (1985): "Instrumental Variables Estimator for the Nonlinear Errors in Variables Model," Journal of Econometrics, 28, 273-289.

Elbadawi, I., A.R. Gallant, and G. Souza (1983): "An Elasticity Can be Estimated Consistently without a Priori Knowledge of Functional Form," Econometrica, 51, 1731-1751.

Gallant, A.R. (1987): "Identification and Consistency in Nonparametric Regression," in T.F. Bewley, ed., Advances in Econometrics: Fifth World Congress, Cambridge: Cambridge University Press, 145-169.

Gallant, A.R. and D.W. Nychka (1987): "Semi-Nonparametric Maximum Likelihood Estimation," Econometrica, 55, 363-390.

Gorman, W.M. (1981): "Some Engel Curves," in A. Deaton, ed., Essays in the Theory and Measurement of Consumer Behaviour in Honor of Sir Richard Stone, Cambridge: Cambridge University Press.

Hausman, J.A., H. Ichimura, W.K. Newey, and J.L. Powell (1991): "Estimation of Polynomial Errors in Variables Models," Journal of Econometrics, 50, 273-295.

Hausman, J.A., W.K. Newey, and J.L. Powell (1993): "Nonlinear Errors in Variables Estimation of Some Engel Curves," Journal of Econometrics, forthcoming.

Leser, C.E.V. (1963): "Forms of Engel Functions," Econometrica, 31, 694-703.

Lerman, S. and C. Manski (1981): "On the Use of Simulated Frequencies to Approximate Choice Probabilities," in C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications, Cambridge: MIT Press.

McFadden, D. (1989): "A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration," Econometrica, 57, 995-1026.

Newey, W.K. (1993a): "Series Estimation of Regression Functionals," Econometric Theory, forthcoming.

Newey, W.K.
(1993b): "Efficient Estimation of Models with Conditional Moment Restrictions," in G.S. Maddala and C.R. Rao, eds., Handbook of Statistics, Volume 11: Econometrics, Amsterdam: North-Holland, forthcoming.

Newey, W.K. (1994): "Large Sample Estimation and Hypothesis Testing," in R. Engle and D. McFadden, eds., Handbook of Econometrics, Vol. 4, Amsterdam: North-Holland, forthcoming.

Newey, W.K. and J.L. Powell (1989): "Instrumental Variables Estimation for Nonparametric Models," preprint, Department of Economics, Princeton University.

Pakes, A. (1986): "Patents as Options: Some Estimates of the Value of Holding European Patent Stocks," Econometrica, 54, 755-785.

Pakes, A. and D. Pollard (1989): "Simulation and the Asymptotics of Optimization Estimators," Econometrica, 57, 1027-1057.

Robinson, P. (1987): "Asymptotically Efficient Estimation in the Presence of Heteroskedasticity of Unknown Form," Econometrica, 55, 875-891.

Stone, C.J. (1977): "Consistent Nonparametric Regression" (with discussion), Annals of Statistics, 5, 595-645.

White, H. (1982): "Instrumental Variables Regression with Independent Observations," Econometrica, 50, 483-499.

Wolter, K.M. and W.A. Fuller (1982): "Estimation of Nonlinear Errors in Variables Models," Annals of Statistics, 10, 539-548.

Table One: Monte Carlo Results

              β₁                      β₂                      β₃
        Bias    SE    RMSE      Bias    SE    RMSE      Bias    SE    RMSE
OLS    -1.10   .25    1.13       .67   .05     .67       .85   .13     .86
IV      -.30  3.31    3.32       .13  1.25    1.26       .47  2.01    2.07
SM      -.09  2.27    2.27       .07  1.05    1.05       .31  1.62    1.65

Table Two: Some Sample Statistics

Income quartiles: 25th 3373, 50th 4574, 75th 6417
Sample standard error of log of expenditure: .51
Standard error of predicted value: .25
Standard error of residual: .45
R-squared: .23

Table Three: Elasticity Estimates for Share Equations

Food
        25th         50th         75th         γ    σ̂    t-stat    Q
LS      .72 (.02)    .66 (.02)    .59 (.03)               4.52
2SLS    .82 (.05)    .78 (.04)    .74 (.06)
SM0     .82 (.04)    .78 (.04)    .74 (.06)
SM1     .84 (.20)    .78 (.09)    .71 (.36)
Other entries: .61, .47, 3.67, 6.36, .04, 6.08, -.01 (.08), .31 (1.49), (.05), .12

Clothing
        25th         50th         75th
LS      1.21 (.05)   1.08 (.05)   .97 (.04)
2SLS    1.61 (.12)   1.42 (.09)   1.30 (.10)
SM0     1.63 (.20)   1.40 (.10)   1.26 (.18)
SM1     1.62 (.46)   1.28 (.30)   1.07 (.56)    γ -.0009 (.0018)
Other entries: 18.66, 2.25, .02 (.38), .15, .56, 6.11, .20, 1.99, (.11)

Transportation
        25th         50th         75th
LS      1.28 (.07)   1.44 (.06)   1.50 (.07)    t-stat 11.19
2SLS    .99 (.08)    1.06 (.08)   1.12 (.12)
SM0     1.02 (.06)   1.01 (.06)   1.01 (.07)
SM1     1.40 (.27)   1.00                       γ .028 (.018)
Other entries: 1.71 (2.01), .98 (.07), .63 (.18), .10 (.07), (.07), .04, 11.28, 3.10, 7.54

Recreation
        25th         50th         75th
LS      1.40 (.07)   1.20 (.06)   1.06 (.07)
2SLS    1.70 (.15)   1.31 (.12)   1.07
Other entries: 1.33, SM0 2.97, 16.59, 11.45, (.15), 6.48, .02 (.34), (.63), (.12), .39 (.48), SM1 6.98 (4.40), 2.32 (.36), 1.28 (.12), .71 (.18), γ .024 (.002), 5.58, .09

Table Four: Elasticity Estimates for Level Equations

Food
        25th         50th         75th
LS      .68 (.04)    .63 (.03)    .57 (.03)
2SLS    .90 (.08)    .80 (.05)    .70 (.05)     t-stat 3.34
SM0     .98 (.08)    .81 (.04)    .63 (.05)     σ̂ .35 (.02)
SM1     1.21 (.21)   .83 (.06)    .46 (.13)     σ̂ .24 (.06)
Other entries: .31, γ .008 (.005), 12.4, 5.09, Q 18.16, 16.42

Clothing
        25th         50th         75th
LS      1.40         1.11 (.06)   .89 (.04)     t-stat 53.95
2SLS    2.04 (.26)   1.50 (.13)   1.21 (.12)
SM0     2.07 (.21)   1.36 (.08)   .96 (.07)     σ̂ .34 (.02)
SM1     2.14 (.58)   1.37 (.11)   .93 (.18)     σ̂ .33 (.09), γ .001 (.009)
Other entries: 4.92, (.11), t-stat 39.04, 17.15, 3.19, 17.13

Transportation
        25th         50th         75th
LS      3.14 (1.06)  1.95 (.19)   1.48 (.09)
SM0     .94 (.05)    .76 (.04)    .64 (.04)
Other entries: 2SLS .23 (.65), .93 (.12), 1.53, 1.16, SM1 .97 (.20), .74 (.09), .58 (.27), (.08), 1.25 (.59), t-stat 1.68, 2.04 (.53), γ .004 (.021), 38.27, 10.63, .69, 10.52

Recreation
        25th         50th         75th
LS      1.74 (.19)   1.26 (.09)   .95 (.05)     t-stat 42.87
2SLS    1.84 (.28)   1.32 (.16)   1.01 (.15)    t-stat 14.85
Other entries: SM0 2.35 (.34), 1.44 (.15), .95 (.07), σ̂ .36 (.03), SM1 7.85 (4.76), 1.93 (.26), σ̂ .25 (.02), .33, (.09), γ .024 (.005), 48.62, 33.23, 15.88, 21.83

Table Five:
Elasticity Estimates for Level Equations with Covariates

Food
        25th         50th         75th
LS      .72 (.05)    .66 (.03)    .58 (.03)     t-stat 1.90
2SLS    .97 (.09)    .85 (.06)    .74 (.06)     t-stat 4.40
SM0     1.00 (.08)   .86 (.05)    .73 (.05)     σ̂ .33 (.02)
SM1     1.25 (.29)   .90 (.07)    .58 (.15)     σ̂ .21 (.07), γ .009 (.007)
Other entries: 7.14, Q 43.51, 2.47, 41.59

Clothing
        25th         50th         75th
LS      1.35 (.11)   1.08 (.06)   .88 (.04)
2SLS    2.01 (.31)   1.50 (.14)   1.22 (.15)
SM0     1.94 (.22)   1.31 (.09)   .94 (.08)     σ̂ .33 (.02)
SM1     2.01 (.73)   1.33 (.14)   .92 (.23)     σ̂ .31 (.11), γ .002 (.011)
Other entries: t-stat 32.76, 4.22, 29.32, Q 43.23, 1.59, 42.95

Transportation
        25th         50th         75th
LS      2.54 (.63)   1.83 (.14)   1.50 (.08)
2SLS    .05 (.55)    .70 (.16)    1.38
SM0     1.07 (.09)   .87 (.06)    .69 (.04)
SM1     1.86 (.87)   1.13 (.26)   .61 (.07)     σ̂ .40 (.08), γ .016 (.009)
Other entries: (.04), 1.30, 3.09 (.45), .61, 28.01, 30.12, 2.61, 28.92

Recreation
        25th         50th         75th
LS      1.73 (.19)   1.25 (.09)   .95 (.05)     t-stat 35.96
2SLS    1.78 (.30)   1.31 (.11)   1.01 (.16)    t-stat 10.69
SM0     2.40 (.32)   1.41 (.17)   .88 (.08)     σ̂ .31 (.03)
Other entries: SM1 1.91 (.32), .18, .21 (.23), (.03), 8.38, 5.45, γ .021 (.005), t-stat 37.86, 54.23, 10.91, 44.88