WORKING PAPER
ALFRED P. SLOAN SCHOOL OF MANAGEMENT

CONSISTENT ESTIMATION OF SCALED COEFFICIENTS
by
Thomas M. Stoker
July 1984
WP #1583-84

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASSACHUSETTS 02139

Thomas Stoker is Associate Professor of Applied Economics, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139. This research was funded by a grant from the National Science Foundation. The author wishes to thank A. Deaton, J. Hausmann, and J. Powell for valuable ongoing discussions, J. Rotemberg for useful comments, and P. Bickel and C. Manski for useful conversations regarding adaptive estimation.

ABSTRACT

This paper studies the estimation of coefficients $\beta$ in single index models such that $E(y \mid X) = F(\alpha + X'\beta)$, where the function $F$ is misspecified or unknown. The slope coefficient vector of a linear instrumental variables regression of $y$ on $X$ is shown to be consistent for $\beta$ up to a scalar multiple, where the instruments are appropriately defined score vectors of the marginal distribution of $X$. The framework is illustrated by several common limited dependent variable models and models involving a transformed dependent variable. Similar estimators are indicated for multiple index models and models where extraneous variables are present. The construction of the instrumental variables is discussed, and illustrated by several examples. The asymptotic distribution of the instrumental variables estimator is established.

1. Introduction

In this paper we consider the generic econometric modeling situation in which a dependent variable $y$ is modeled as a function of a vector of independent variables $X$ and stochastic terms, where the conditional expectation of $y$ given $X$ takes the form $E(y \mid X) = F(\alpha + X'\beta)$.
This situation exists for many standard models of discrete choice, censoring and selection, but is clearly not limited to such models. Our interest is in what can be learned about the values of the coefficients $\beta$ without specific assumptions on the distribution of unobserved stochastic terms or other functional form aspects; in other words, when the true form of the function $F$ is misspecified or unknown.

For different examples of limited dependent variables models, Ruud (1983a), Goldberger (1981), Deaton and Irish (1984) and Chung and Goldberger (1984), among others, have studied the conditions under which OLS regression coefficients and other quasi-maximum likelihood estimators will consistently estimate $\beta$ up to a scalar multiple. Ruud (1983a) points out that a sufficient condition for this property occurs when the conditional expectation of each component of $X$ given $Z = \alpha + X'\beta$ is linear in $Z$, which is valid, for example, when $X$ is multivariate normally distributed. Goldberger (1981), Deaton and Irish (1984) and Chung and Goldberger (1984) point out the sufficiency of an analogous condition with a more general definition of $Z$.

An intriguing feature of this work is that it provides special cases where knowledge of the marginal distribution of $X$ is very useful for estimating behavioral effects when certain features of the true model are unknown. The question is immediately raised as to whether more general results of this type can be obtained, because as Ruud (1983a) states, the above sufficient condition is "too restrictive to be generally applicable." Results which apply to more general marginal distribution forms are of substantial practical interest because the marginal distribution of $X$ can in general be empirically characterized.
The purpose of this paper is to indicate that if $X$ is a continuously distributed random vector, knowledge of the marginal distribution of $X$ in general permits consistent estimation of $\beta$ up to a scalar multiple. In particular, we show that such a consistent estimate can be obtained as the slope coefficient vector of a linear instrumental variables regression of $y$ on $X$, where the instruments are appropriately defined score vectors from the marginal distribution of $X$. The ratio of any two components of this slope vector will consistently estimate the ratio of the corresponding components of $\beta$.

These estimates may suffice for many applications, such as judging relative marginal utilities in a discrete choice situation. Moreover, because the asymptotic distribution of the instrumental variables estimator is easily established, certain scale free hypotheses can be tested, such as zero restrictions and equality restrictions on the components of $\beta$. More broadly, the ratio estimates provide a consistent benchmark for choosing specific modelling assumptions. Namely, if alternative functional form or stochastic distribution assumptions give rise to substantively different estimates of $\beta$, the consistent ratio estimates can guide the choice of the best specification. For example, in a binary discrete choice situation, separate estimates of $\beta$ under logit or probit assumptions could be judged in relation to the consistent instrumental variable ratio estimates.

The exposition begins with notation, examples and assumptions in Section 2. The main result on consistency of the instrumental variables slope vector is presented in Section 3.1, with immediate extensions to more general models in Section 3.2. The proofs are of some potential independent interest because they utilize the results of Stoker (1982, 1983) in a novel way. Section 3.3 presents facilitating results on the construction of the instrumental variables, and Section 3.4 establishes the asymptotic distribution of the instrumental variables estimator. Specific examples of independent variable distributions are considered in Section 4, where the relation of the results to the previous literature is discussed. Section 5 contains some concluding remarks.

2. Notation and Basic Assumptions

We consider the situation where data is observed on a dependent variable $y_k$ and an $M$-vector of independent variables $X_k$, for $k = 1,\dots,K$, where $M \ge 2$. $(y_k, X_k)$, $k = 1,\dots,K$, represent random drawings from a distribution with density $P_0(y,X) = q(y \mid X)\,p_0(X)$, which is absolutely continuous with respect to a $\sigma$-finite measure $\nu$. $p_0(X)$ represents the density of the marginal distribution of $X$. The conditional density $q(y \mid X)$ represents the true behavioral econometric model, for which we assume that the conditional expectation $E(y \mid X)$ can be written in the form

(2.1)  $E(y \mid X) = F(\alpha + X'\beta) = F(Z)$

for some function $F$, where $\alpha$ is a constant, $\beta = (\beta_1,\dots,\beta_M)'$ is an $M$-vector of constants, and $Z$ is an index variable, defined as $Z = \alpha + X'\beta$. We refer to (2.1) as a single index model. This framework is very general, subsuming any standard limited dependent variable models, but is not restricted to such models.

Before proceeding to specific examples, it is useful to note the following generic special case of (2.1). Suppose that $Z^*$ is a general index variable such that $Z^* - Z$ is independent of $X$; then if $E(y \mid Z^*) = F^*(Z^*)$ for some function $F^*$, (2.1) is implied. This implies the natural result that important behavioral variables can be omitted from $X$ in (2.1) without affecting our results, provided that the omitted variables are distributed independently of the included ones. We will write $Z^* = \alpha + X'\beta + \varepsilon$ for such an index, where $\varepsilon$ is distributed independently of $X$. We now turn to some specific examples:

Example 1: Binary Discrete Choice. Suppose that $y$ represents a dichotomous random variable modeled as

  $y = 1$ if $\varepsilon > -(\alpha + X'\beta)$
  $y = 0$ otherwise.

Here $E(y \mid X) = F(\alpha + X'\beta)$ is the probability of $y = 1$ given the value of $X$, with the true function $F$ determined by the true distribution of $\varepsilon$. If $\varepsilon$ is distributed normally with mean 0 and variance $\sigma^2$, then the familiar probit model results, with $F(\alpha + X'\beta) = \Phi((\alpha + X'\beta)/\sigma)$, where $\Phi$ is the cumulative normal distribution function. Logit models, etc. can easily be included.

Example 2: Tobit Models. Suppose that $y$ is equal to an index $Z^*$ only if $Z^*$ is positive, as in the following censored tobit specification

  $y = \alpha + X'\beta + \varepsilon$ if $\varepsilon > -(\alpha + X'\beta)$
  $y = 0$ otherwise.

Alternatively, if $y = \alpha + X'\beta + \varepsilon$ is observed only when $\varepsilon > -(\alpha + X'\beta)$, we have the truncated tobit specification.

Example 3: Dependent Variable Transformations. Suppose there exists a function $g(y)$ such that the true model is of the form $g(y) = \alpha + X'\beta + \varepsilon$, where $g(y)$ is invertible everywhere except for a set of $\nu$-measure 0. A specific example here is the familiar Box-Cox transformation where $y^{(\lambda)} = \alpha + X'\beta + \varepsilon$, with $y^{(\lambda)} = (y^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$, and $y^{(\lambda)} = \ln(y)$ for $\lambda = 0$.

These examples serve to illustrate the wide spectrum of models covered by (2.1) with general function $F$, and many other single index examples can be found. Multiple index models are considered in Section 3.2.

We now turn to the other assumptions required. Formally, we assume that $X$ is continuously distributed, having carrier set $\Omega$ of the following form:

Assumption 1: $\Omega$ is a measurable, closed, convex subset of $R^M$ with nonempty interior. For $X \in d\Omega$, where $d\Omega$ is the boundary of $\Omega$, we have $F(\alpha + X'\beta)\,p_0(X) = 0$ and $X\,p_0(X) = 0$.

Assumption 1 allows for unbounded $X$'s, where $\Omega = R^M$ and $d\Omega = \emptyset$. For the bounded case $F p_0$ and $X p_0$ vanish on the boundary, which is obviously implied if $p_0$ vanishes on the boundary.
While the majority of the results employ Assumption 1, the incorporation of discrete (qualitative) independent variables is discussed in Section 3.2. The main regularity condition on the behavioral model is

Assumption 2: $F(Z)$ is differentiable for all $Z = \alpha + X'\beta$, where $X \in \tilde\Omega$, and $\tilde\Omega$ differs from $\Omega$ by a set of $\nu$-measure 0.

For technical reasons we will utilize the translation family generated by $p_0(X)$, defined as $\Pi = \{p(X \mid \theta)\}$, where $p(X \mid \theta) = p_0(X - \theta)$ is defined on $\Omega(\theta) = \{X + \theta \mid X \in \Omega\}$, with $\theta \in \Theta$, a compact subset of $R^M$ with interior point $\theta = 0$. We set $P(y, X \mid \theta) = q(y \mid X)\,p(X \mid \theta)$, so that $P(y, X \mid 0) = P_0(y, X)$, $p(X \mid 0) = p_0(X)$ and $\Omega(0) = \Omega$. We assume

Assumption 3: $P(y, X \mid \theta)$ is twice differentiable in $\theta$ for all $\theta \in \Theta$. The means, variances and covariances of $y$, $X$ and the score vector

(2.2)  $\ell_\theta = \dfrac{\partial \ln P(y, X \mid \theta)}{\partial \theta}$

exist for all $\theta \in \Theta$, where the matrix $E_\theta(\ell_\theta X')$ is nonsingular for all $\theta \in \Theta$.

Assumption 3 clearly guarantees the existence of the means, variances and covariances of $y$, $X$ and $\ell_0$ for $\theta = 0$, the data set moments. Note that $\ell_0$ can be written as:

(2.3)  $\ell_0 = \left.\dfrac{\partial \ln p(X \mid \theta)}{\partial \theta}\right|_{\theta = 0} = -\dfrac{\partial \ln p_0(X)}{\partial X}$

If we denote the mean of $y$ for each $\theta \in \Theta$ as

(2.4)  $E(y) = \phi^*(\theta) = \int y\, P(y, X \mid \theta)\, d\nu$

then we assume

Assumption 4: $\phi^*(\theta)$ is differentiable for all $\theta \in \Theta$, with nonzero derivative at $\theta = 0$.

Finally, we give Assumption 5 in the Appendix, which is a purely technical regularity assumption that assures that derivatives may be taken under expectations. While somewhat formidable technically, these assumptions are collectively very weak.

3. Consistent Estimation of Scaled Coefficients

In this section we consider the slope estimates of the linear equation

(3.1)  $y_k = c + X_k'd + u_k$

obtained by instrumenting with $(1, \ell_{0k}')'$, where $\ell_{0k}$ is the score vector (2.3) evaluated at $X_k$. The slope coefficients $d = (d_1,\dots,d_M)'$ can be written explicitly as

(3.2)  $d = (S_{\ell X})^{-1} S_{\ell y}$

where $S_{\ell X} = \sum_k \ell_{0k}(X_k - \bar X)'/K$ and $S_{\ell y} = \sum_k \ell_{0k}(y_k - \bar y)/K$ are the relevant sample covariance matrices. In Section 3.1 we establish the main result that $d$ is a strongly consistent estimator of $\beta$ up to a scalar multiple. In Section 3.2 we extend the result to more general models with extraneous independent variables and several index variables. In Section 3.3 we discuss how to construct the instruments $\ell_{0k}$ in applications, and in Section 3.4 we establish the asymptotic distribution of $d$ for statistical inference.

3.1 The Main Result

We begin by showing

Theorem 1: Under Assumptions 1, 2, 3, 4 and 5, $\lim_{K \to \infty} d = \gamma\beta$ a.s., where $\gamma$ is a nonzero constant.

Proof: We first consider the unbounded case where $\Omega = R^M$ and $d\Omega = \emptyset$. Begin by reparameterizing the translation family $\Pi$ by $\mu = E(X) = \mu_0 + \theta$, where $\mu_0 = E_0(X)$ is the population mean of $X$ in the data, and define $E(y) = \phi(\mu) = \phi^*(\mu - \mu_0)$. Since $\Omega(\theta) = \Omega$ for all $\theta \in \Theta$ in the unbounded case, by a direct application of Theorem 2 of Stoker (1983) we have that

(3.3)  $\lim d = \dfrac{\partial \phi(\mu_0)}{\partial \mu} = \dfrac{\partial \phi^*(0)}{\partial \theta}$  a.s.

where the latter equality follows from the definition of $\mu$. The result follows from computing the latter derivative. By a change of variables to $x = X - \theta$, we have that

(3.4)  $E(y) = \phi^*(\theta) = \int F(\alpha + (x + \theta)'\beta)\, p_0(x)\, d\nu$

Now, from Assumptions 2 and 5, we differentiate (3.4) as

(3.5)  $\dfrac{\partial \phi^*}{\partial \theta} = \int \dfrac{\partial F}{\partial \theta}\, p_0(x)\, d\nu = \left[\int \dfrac{dF}{dZ}\, p_0(x)\, d\nu\right]\beta$

where $dF/dZ$ is evaluated at $Z = \alpha + (x + \theta)'\beta$. The result follows from evaluating (3.5) at $\theta = 0$ and inserting into (3.3), where $\gamma = \int (dF/dZ)\, p_0(X)\, d\nu$, and $Z = \alpha + X'\beta$. $\gamma$ is nonzero by Assumption 4.

The bounded case where $d\Omega \neq \emptyset$ follows from a very careful consideration of the applicability of Theorem 2 of Stoker (1983) to this problem. Theorem 2 applies only when the carrier set does not vary with $\theta$, and so (3.3) is no longer immediately valid, because $\Omega(\theta) \neq \Omega$ when $\theta \neq 0$. The structure of the derivative in this case can be written as

(3.6)  $\dfrac{\partial \phi^*(\theta)}{\partial \theta} = \dfrac{\partial}{\partial \theta}\int_{\Omega} F(\alpha + X'\beta)\, p(X \mid \theta)\, d\nu + \dfrac{\partial}{\partial \theta}\int_{\Omega(\theta)} F(\alpha + X'\beta)\, p_0(X)\, d\nu$

where each term is evaluated at $\theta = 0$. The first term is the derivative of $\phi^*(\theta)$ holding the carrier set $\Omega(\theta)$ constant at $\Omega(0) = \Omega$, while the second term is the derivative of $\phi^*(\theta)$ holding the integrand constant at $F(\alpha + X'\beta)\,p_0(X)$ while varying the carrier set. By repeated application of Fubini's Theorem and the Fundamental Theorem of Calculus, the second term reduces to integrals of $F p_0$ over boundary points $X \in d\Omega$, so it vanishes by Assumption 1. Now, what the proof of Theorem 2 of Stoker (1983) actually shows is that $\lim S_{\ell y}$ is equal to the first derivative. By an analogous argument, we have that $\lim S_{\ell X} = \partial\mu/\partial\theta = I$ (an $M \times M$ identity matrix), so that equation (3.3) is shown to be valid in this case also. Consequently, the result that $\lim d = \gamma\beta$ a.s. follows. QED

The technique of the above proof is quite nonstandard, and possibly of some independent interest. The results of Stoker (1983) (and the predecessor Stoker (1982)) connect the large sample limits of linear instrumental variables regression coefficients to the aggregate effects induced by distribution changes, or changes in the sample configuration. The above proof exploits these results by considering the implied aggregate impact on $E(y)$ of a specific (but artificial) type of distribution change. Namely, $\partial\phi^*/\partial\theta$ gives the local effect on $E(y)$ of varying the density of $X$ within the translation family $\Pi$. This effect is seen to be consistently estimated by $d$. The desired property of $d$ is then established by calculating the value of the aggregate effect via (3.5). This technique of proof, namely to perturb the sample distribution and then trace the aggregate implications, may be useful in other contexts.

The reason that the translation family $\Pi$ works for this problem is that changes in the implied marginal distributions of $Z = \alpha + X'\beta$ are determined locally in a neighborhood of $\theta = 0$ by changes in the parameter $\theta'\beta$.
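The content of Theorem 1 can be checked numerically with a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes a binary choice model with logistic errors, and a known marginal density for $X$ with two independent components (standard logistic and Gamma with shape 4), chosen because their scores have simple closed forms. The names `logistic_draw` and `simulate_iv`, the sample size, and all parameter values are hypothetical. With $\beta = (1, 1)'$, the ratio $d_2/d_1$ should be near 1 and $S_{\ell X}$ should be near the identity matrix.

```python
import math
import random


def logistic_draw(rng):
    """Draw from the standard logistic distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_iv(K, seed=0):
    """Simulate a binary choice model and return (d, S_lX) from formula (3.2).

    Assumed design (not from the paper): X1 ~ standard logistic,
    X2 ~ Gamma(shape=4, scale=1), independent; y = 1 if alpha + X'beta + eps > 0.
    """
    rng = random.Random(seed)
    alpha, beta = -4.0, (1.0, 1.0)  # illustrative values; true ratio is 1
    X, y, scores = [], [], []
    for _ in range(K):
        x1 = logistic_draw(rng)
        x2 = rng.gammavariate(4.0, 1.0)
        eps = logistic_draw(rng)
        yk = 1.0 if alpha + beta[0] * x1 + beta[1] * x2 + eps > 0 else 0.0
        # Score (2.3): l_0 = -d ln p_0(X) / dX, componentwise by independence.
        l1 = math.tanh(x1 / 2.0)  # logistic density
        l2 = 1.0 - 3.0 / x2       # Gamma(4, 1) density, ln p = 3 ln x - x + const
        X.append((x1, x2))
        y.append(yk)
        scores.append((l1, l2))

    xbar = [sum(x[j] for x in X) / K for j in range(2)]
    ybar = sum(y) / K
    # Sample covariance matrices S_lX and S_ly as defined below (3.2).
    S_lx = [[sum(l[i] * (x[j] - xbar[j]) for l, x in zip(scores, X)) / K
             for j in range(2)] for i in range(2)]
    S_ly = [sum(l[i] * (yk - ybar) for l, yk in zip(scores, y)) / K
            for i in range(2)]
    # d = S_lX^{-1} S_ly, solved by Cramer's rule for the 2x2 case.
    det = S_lx[0][0] * S_lx[1][1] - S_lx[0][1] * S_lx[1][0]
    d = [(S_lx[1][1] * S_ly[0] - S_lx[0][1] * S_ly[1]) / det,
         (S_lx[0][0] * S_ly[1] - S_lx[1][0] * S_ly[0]) / det]
    return d, S_lx


if __name__ == "__main__":
    d, S_lx = simulate_iv(20000, seed=0)
    print("slope estimates d:", d)
    print("ratio d2/d1 (true beta2/beta1 = 1):", d[1] / d[0])
    print("S_lX (should be close to identity):", S_lx)
```

Only the ratio of the components of `d` is informative about $\beta$; the common scale of `d` estimates $\gamma$, which depends on the error distribution assumed in the simulation.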
In fact, this feature provides a characterization of the scalar $\gamma$. Namely, if we denote the mean of the index $Z$ as $\eta = E(Z) = \alpha + (\mu_0 + \theta)'\beta$, then (3.5) is seen to be a chain rule formula where $\gamma = \partial\phi^*(0)/\partial\eta$, so that $\gamma$ is interpreted as the change in $E(y)$ induced by a change in the mean of the index $E(Z)$ under density translation.

3.2 Immediate Extensions - Extraneous Variables and Multiple Index Models

The logic of the above proof can be immediately applied to more general modeling circumstances than provided by (2.1), which we outline below. For this section (only), we expand the notation slightly to consider two sets of independent variables; an $M_1 \ge 2$ vector $X_1$ and an $M_2$ vector $X_2$. Suppose that the behavioral model for $y$ implies that the conditional expectation of $y$ given $X_1$ and $X_2$ is of the form

(3.7)  $E(y \mid X_1, X_2) = F(\alpha_1 + X_1'\beta_1, X_2)$

for some function $F$ and constant coefficients $\alpha_1$ and $\beta_1$. (3.7) just adds the extraneous variables $X_2$ to the model (2.1). We assume that $(X_1', X_2')'$ is distributed with density $p_0(X_1, X_2)$.

It is easy to see that if $X_1$ is continuously distributed and $X_1$ and $X_2$ have no common components, then knowledge of $p_0(X_1, X_2)$ allows consistent estimation of $\beta_1$ up to a scalar multiple. In particular, reinterpret Assumptions 1 through 5 to apply to $X_1$ (defining the translation family with respect to $X_1$ only), define the generalized score vector as

(3.8)  $\ell_1 = -\dfrac{\partial \ln p_0(X_1, X_2)}{\partial X_1}$

and consider the slope coefficient estimates $d_1$ of the linear equation

(3.9)  $y_k = c_1 + X_{1k}'d_1 + u_{1k}$

obtained by instrumenting with $(1, \ell_{1k}')'$. By reinterpreting the proof of Theorem 1, we have that $\lim d_1 = \gamma_1\beta_1$ a.s., where $\gamma_1 = \int (\partial F/\partial Z_1)\, p_0(X_1, X_2)\, d\nu$, $Z_1 = \alpha_1 + X_1'\beta_1$.

This result indicates that extraneous variables are accommodated in the above analysis through their impact on the instruments $\ell_{1k}$. The variables $X_2$ could be ignored if (3.8) did not depend on the value of $X_2$, for which a sufficient condition is that $X_2$ is distributed independently of $X_1$ (as indicated in the discussion of a generalized index $Z^*$ of Section 2).

The extension permits the analysis of two commonly encountered practical situations which were not previously treated. The first occurs when the variables $X_2$ are qualitative variables, not continuously distributed. The above result says that when the qualitative variables $X_2$ are not independent of the continuous variables $X_1$, the coefficients $\beta_1$ of the continuous variables can be consistently estimated up to a scalar multiple by the instrumental variables regression (3.9). The instruments $\ell_{1k}$ in this case are just the score vectors of the distribution of $X_1$ conditional on the value of $X_2$, evaluated at $X_1 = X_{1k}$ and $X_2 = X_{2k}$.

The second practical situation occurs when the behavioral model employs several index variables. Suppose that $M_2 \ge 2$ and that the conditional expectation (3.7) can be written in the two index form

(3.10)  $E(y \mid X_1, X_2) = F(\alpha_1 + X_1'\beta_1, \alpha_2 + X_2'\beta_2) = F(Z_1, Z_2)$

where $Z_1 = \alpha_1 + X_1'\beta_1$ and $Z_2 = \alpha_2 + X_2'\beta_2$. As above, when $X_1$ is continuously distributed and $X_1$ and $X_2$ have no variables in common, the slope coefficients $d_1$ consistently estimate $\gamma_1\beta_1$. Moreover, if $X_2$ is continuously distributed, then the same argument can be applied to estimating $\beta_2$ up to scale. Formally, reinterpret Assumptions 1 through 5 to apply to $X = (X_1', X_2')'$, noting that Assumption 3 implies that no linear combination of the components of $X_1$ is perfectly correlated with any linear combination of the components of $X_2$. If we define

(3.11)  $\ell_2 = -\dfrac{\partial \ln p_0(X_1, X_2)}{\partial X_2}$

and $d_2$ as the estimated slope coefficients of the linear equation

(3.12)  $y_k = c_2 + X_{2k}'d_2 + u_{2k}$

obtained by instrumenting with $(1, \ell_{2k}')'$, then we have that $\lim d_2 = \gamma_2\beta_2$ a.s., where $\gamma_2 = \int (\partial F/\partial Z_2)\, p_0(X_1, X_2)\, d\nu$, with $Z_2 = \alpha_2 + X_2'\beta_2$.

Equivalently, we can set $X = (X_1', X_2')'$ and perform the single regression (3.1); here repeated as

(3.13)  $y_k = c + X_k'd + u_k$

where $(1, \ell_{0k}')' = (1, \ell_{1k}', \ell_{2k}')'$ is used as the instrumental variable. From the above development, we clearly have that $\lim d = (\gamma_1\beta_1', \gamma_2\beta_2')'$ a.s. Thus the coefficients of both index variables $Z_1 = \alpha_1 + X_1'\beta_1$ and $Z_2 = \alpha_2 + X_2'\beta_2$ can be estimated up to scale when the true function $F$ of (3.10) is unknown. It should be noted that the scale factors $\gamma_1$ and $\gamma_2$ will in general not be equal, so that only the ratios of components of $\beta_1$ or ratios of components of $\beta_2$ are consistently estimated. Ratios of a component of $\beta_1$ to a component of $\beta_2$ are not identified by $d$.

A standard example of a two index model obeying (3.10) is the selection bias model studied by Heckman (1979):

Example 4: Selection Bias. Suppose that $y$ is equal to an index $Z_1^*$, as in

  $y = \alpha_1 + X_1'\beta_1 + \varepsilon_1$

but that $y$ is observed only if a second index $Z_2^* = \alpha_2 + X_2'\beta_2 + \varepsilon_2$ is positive. We assume that $(\varepsilon_1, \varepsilon_2)$ is distributed independently of $(X_1, X_2)$. Thus, the conditional expectation of $y$ given $X_1$ and $X_2$ is

  $E(y \mid X_1, X_2, Z_2^* > 0) = \alpha_1 + X_1'\beta_1 + E(\varepsilon_1 \mid \varepsilon_2 > -(\alpha_2 + X_2'\beta_2)) = F(\alpha_1 + X_1'\beta_1, \alpha_2 + X_2'\beta_2)$

so that the structural parameters $\beta_1$ and the selection parameters $\beta_2$ can be estimated up to scale without explicit assumptions on the joint distribution of $(\varepsilon_1, \varepsilon_2)$. Notice that in this example $\gamma_1 = 1$, so that $\lim d_1 = \beta_1$ a.s.

By comparing Example 4 and the truncated tobit specification of Example 2, we see that selection parameters can be estimated up to scale in two polar situations, namely when the selection index $Z_2$ has no variables in common with the structural index $Z_1$, or when the selection index $Z_2$ is equal to the structural index $Z_1$. Moreover, it is easy to verify that if there is a common variable appearing in both $Z_1$ and $Z_2$, then the large sample limit of the corresponding component of $d$ of (3.2) is the sum of the corresponding components of $\gamma_1\beta_1$ and $\gamma_2\beta_2$.

The above discussion has focused on two index models; clearly analogous results can be obtained for models with three or more index variables.
While we now return to the notation and framework of Section 3.1, all of the following econometric results can be reformulated for the above estimators without difficulty.

3.3 Construction of the Instruments

In this section we discuss the empirical construction of the instrumental variables $\ell_{0k}$. There are two cases in which application of Theorem 1 is particularly easy. The first is when the density $p_0(X)$ is known exactly, so that $\ell_{0k}$ can be computed directly from (2.3), and evaluated at $X = X_k$. Unfortunately, this case is never likely to be met in practice. The second case occurs when the form of the density is congenial in that $\ell_0$ is exactly collinear with $X$ or some known function of $X$. We discuss this case in conjunction with the normal distribution examples in Section 4.

In general applications the above circumstances will not be valid, and the score vectors $\ell_{0k}$ will have to be estimated. Here we assume that the marginal distribution of $X$ is modeled, with the density $p_0(X)$ known up to some estimable parameters, and establish that the natural estimates of the score vectors $\ell_{0k}$ provide valid instruments. We then briefly raise the prospect of estimating $\ell_{0k}$ nonparametrically.

Suppose that the marginal density is assumed to lie within a parametric family $p^*(X \mid \Lambda)$, so that $p_0(X) = p^*(X \mid \Lambda_0)$, where $\Lambda$ is a finite vector of parameters which can contain the mean, variances and covariances, etc. of $X$, with true value $\Lambda = \Lambda_0$. We make Assumption 6 of the Appendix, which assumes that $p^*$ is twice differentiable with respect to the components of $\Lambda$ as well as some other regularity properties.

The application of Theorem 1 now proceeds in two steps. First obtain any strongly consistent estimate $\hat\Lambda$ of $\Lambda = \Lambda_0$ using the data $X_k$, $k = 1,\dots,K$. Standard goodness of fit tests can easily be performed at this stage to assure the suitability of the assumed parametric form $p^*$. Next construct estimates of the score vector $\hat\ell_{0k}$ for each $k = 1,\dots,K$ by evaluating (2.3) at $X_k$ and $\hat\Lambda$ as

(3.14)  $\hat\ell_{0k} = -\left.\dfrac{\partial \ln p^*(X \mid \hat\Lambda)}{\partial X}\right|_{X = X_k}$

and form the instrumental variable estimator $d^* = (d_1^*,\dots,d_M^*)'$ of (3.1) using $\hat\ell_{0k}$ as in

(3.15)  $d^* = (\hat S_{\ell X})^{-1}\hat S_{\ell y}$

where $\hat S_{\ell X}$ and $\hat S_{\ell y}$ are the sample covariance matrices between $\hat\ell_{0k}$ and $X_k$ and $y_k$ respectively. The justification of this procedure is formalized as

Theorem 2: Under Assumptions 1, 2, 3, 4, 5 and 6, $\lim_{K \to \infty} d^* = \gamma\beta$ a.s.

Proof: A direct application of Theorem 1 above and Theorem 7 of Stoker (1983), reinterpreted to apply to the elements of $\Lambda$. QED

While Theorem 2 permits the implementation of Theorem 1 in applications, it relies on specific modeling of the independent variable distribution. A question of significant practical importance concerns whether the score vectors $\ell_{0k}$ can be nonparametrically estimated, because then specific modeling assumptions on $p_0(X)$ would not be necessary. A number of natural methods for such nonparametric estimation come to mind, such as to use an adaptive score estimate of the type proposed in Stone (1975), Bickel (1982) and Manski (1984). Unfortunately, to the author's knowledge, no results are available on nonparametric score vector estimation for multivariate distributions, as the above papers are concerned only with univariate distributions. Consequently, this topic is mentioned because of its natural importance, but relegated to future research.

3.4 Scale Free Inferences on $\beta$

The above results establish the strong consistency of the instrumental variables coefficients $d^*$ (and $d$) as an estimator of $\gamma\beta$. In this section the asymptotic distribution of $d^*$ (and $d$) is established, which allows scale free hypothesis tests on the true value of $\beta$ to be carried out. Examples of scale free hypotheses include zero restrictions ($\beta_j = 0$), equality restrictions ($\beta_i = \beta_j$) and ratio restrictions ($\beta_i/\beta_j = c$).

Because the data on observed variables and instruments represent i.i.d. drawings, the asymptotic distribution of $d^*$ can be established by very standard methods. We sketch the argument below, which is just the appropriate specialization of the results of White (1980, 1982) among others.

Consider the generalized setting where an $M$-vector of variables $W_k$ is observed, so that the full set of observations $(y_k, X_k', W_k')$, $k = 1,\dots,K$, represents a random sample from a joint distribution of $y$, $X$ and $W$. Of interest is the asymptotic distribution of the instrumental variables estimator $d_W$ of equation (3.1) obtained by instrumenting with $(1, W_k')'$, defined as

(3.16)  $d_W = (S_{WX})^{-1} S_{Wy}$

where $S_{WX} = \sum_k (W_k - \bar W)(X_k - \bar X)'/K$ and $S_{Wy} = \sum_k (W_k - \bar W)(y_k - \bar y)/K$ are the relevant sample covariance matrices. We collect a sufficient set of regularity conditions for the following results on $d_W$ as Assumption IV in the Appendix. Assumption 7 lists the requirements not covered by Assumptions 1-6 for the specific application of this paper.

If we define $\delta_W = (\Sigma_{WX})^{-1}\Sigma_{Wy}$, where $\Sigma_{WX}$ and $\Sigma_{Wy}$ are the covariance matrices between $W$ and $X$, and $W$ and $y$, respectively, then clearly $\lim d_W = \delta_W$ a.s. If we in turn define $u_{Wk} = (y_k - \bar y) - (X_k - \bar X)'\delta_W$, then we have immediately that $E_0(u_{Wk}) = 0$, $\lim \sum_k (W_k - \bar W)u_{Wk}/K = 0$ a.s., and

(3.17)  $\sqrt{K}\,(d_W - \delta_W) = (S_{WX})^{-1}\left[\dfrac{\sum_k (W_k - \bar W)\,u_{Wk}}{\sqrt{K}}\right]$

By applying plim $S_{WX} = \Sigma_{WX}$, and the Central Limit Theorem to the second term, we have

Theorem 3: Under Assumption IV, as $K \to \infty$, $\sqrt{K}(d_W - \delta_W)$ is asymptotically normal with mean 0 and covariance matrix $V_W = (\Sigma_{WX})^{-1}\Sigma_{Wu,Wu}(\Sigma_{WX}')^{-1}$, where $\Sigma_{Wu,Wu}$ is the covariance matrix of $(W - E(W))u_W$, with $u_W = (y - E(y)) - (X - E(X))'\delta_W$.

Following White (1980, 1982), $V_W$ is consistently estimated by $\hat V_W = (S_{WX})^{-1} S_{Wu,Wu} (S_{WX}')^{-1}$, where $S_{Wu,Wu} = \sum_k [(W_k - \bar W)(W_k - \bar W)'\hat u_{Wk}^2]/K$, and $\hat u_{Wk} = (y_k - \bar y) - (X_k - \bar X)'d_W$ is the estimated residual from (3.1). Theorem 3 establishes the asymptotic distributions for all of the linear coefficient estimators studied in Stoker (1982, 1983). For $d$ of (3.2) above, Theorem 3 is applied by setting $W_k = \ell_{0k}$, to yield

Corollary 4: Under Assumptions 1, 2, 3, 4, 5 and 7, as $K \to \infty$, $\sqrt{K}(d - \gamma\beta)$ is asymptotically normal with mean 0 and covariance matrix $V = \Sigma_{\ell u,\ell u}$, where $\Sigma_{\ell u,\ell u}$ is the covariance matrix of $\ell_0 u_0$, with $u_0 = (y - E_0(y)) - (X - E_0(X))'\gamma\beta$.

Proof: Lemma 1 of Stoker (1983) implies that $\lim S_{\ell X}$ is the Jacobian matrix of $E(X) = \mu_0 + \theta$ with respect to $\theta$, which is the $M \times M$ identity matrix. Theorem 3 above then yields the result. QED

Following Theorem 2 above for $d^*$, since plim $\hat S_{\ell X}$ = plim $S_{\ell X}$, we also have

Corollary 5: Under Assumptions 1, 2, 3, 4, 5, 6 and 7, as $K \to \infty$, $\sqrt{K}(d^* - \gamma\beta)$ is asymptotically normal with mean 0 and covariance matrix $V$.

As above, the asymptotic covariance matrix $V$ is consistently estimated by $\hat V = \hat S_{\ell u,\ell u} = \sum_k [\hat\ell_{0k}\hat\ell_{0k}'\hat u_{0k}^2]/K$ or $\hat V^* = (\hat S_{\ell X})^{-1}\hat S_{\ell u,\ell u}(\hat S_{\ell X}')^{-1}$, where $\hat u_{0k}$ is the estimated residual from (3.1) with coefficients $d^*$; namely $\hat u_{0k} = (y_k - \bar y) - (X_k - \bar X)'d^*$.

Corollaries 4 and 5 establish the asymptotic distribution required for testing hypotheses on the value of $\gamma\beta$. This facilitates the testing of certain hypotheses on the value of $\beta$, which are scale free in that they are unaffected by the true value of $\gamma$. For example, if $l$ is an $M$-vector of constants, the linear restriction $l'\beta = 0$ is equivalent to $l'(\gamma\beta) = 0$, under which the test statistic $\sqrt{K}\,l'd^*$ is asymptotically normal with mean 0 and variance $l'Vl$. Therefore, by choosing appropriate values of $l$, tests of zero restrictions (such as $\beta_j = 0$) and equality restrictions (such as $\beta_i = \beta_j$) can be carried out using $d^*$ and $\hat V$.
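The covariance estimate and the linear restriction test described above can be sketched in a few lines. This is a minimal illustration for $M = 2$ using only the standard library, not the author's implementation: the function names `iv_with_robust_cov` and `linear_restriction_stat` are hypothetical, and the simulated design (independent standard normal regressors, so that the score is $X$ itself, with a censored tobit outcome) is an assumption made for the example. The covariance follows the White-type form $(S_{WX})^{-1} S_{Wu,Wu} (S_{WX}')^{-1}$ with $W_k$ set to the score.

```python
import math
import random


def inv2(A):
    """Inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def iv_with_robust_cov(X, y, W):
    """Return the IV slope d_W and the estimate of its asymptotic covariance."""
    K = len(y)
    xbar = [sum(x[j] for x in X) / K for j in range(2)]
    wbar = [sum(w[j] for w in W) / K for j in range(2)]
    ybar = sum(y) / K
    Xc = [[x[j] - xbar[j] for j in range(2)] for x in X]
    Wc = [[w[j] - wbar[j] for j in range(2)] for w in W]
    yc = [yk - ybar for yk in y]
    S_wx = [[sum(w[i] * x[j] for w, x in zip(Wc, Xc)) / K for j in range(2)]
            for i in range(2)]
    S_wy = [sum(w[i] * yk for w, yk in zip(Wc, yc)) / K for i in range(2)]
    S_inv = inv2(S_wx)
    d = [S_inv[i][0] * S_wy[0] + S_inv[i][1] * S_wy[1] for i in range(2)]
    # Estimated residuals from (3.1), in deviations from sample means.
    u = [yk - x[0] * d[0] - x[1] * d[1] for x, yk in zip(Xc, yc)]
    S_wu = [[sum(w[i] * w[j] * uk * uk for w, uk in zip(Wc, u)) / K
             for j in range(2)] for i in range(2)]
    S_inv_T = [[S_inv[j][i] for j in range(2)] for i in range(2)]
    V = matmul2(matmul2(S_inv, S_wu), S_inv_T)
    return d, V


def linear_restriction_stat(d, V, l, K):
    """sqrt(K) l'd / sqrt(l'Vl); approximately N(0,1) when l'beta = 0."""
    num = math.sqrt(K) * (l[0] * d[0] + l[1] * d[1])
    var = sum(l[i] * V[i][j] * l[j] for i in range(2) for j in range(2))
    return num / math.sqrt(var)


if __name__ == "__main__":
    random.seed(1)
    K = 5000
    X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(K)]
    # Censored tobit outcome with beta = (1, 1)', an assumed design.
    y = [max(0.0, x[0] + x[1] + random.gauss(0, 1)) for x in X]
    scores = [x[:] for x in X]  # score of a standard normal density is X
    d, V = iv_with_robust_cov(X, y, scores)
    print("d:", d)
    print("equality test (beta1 = beta2):",
          linear_restriction_stat(d, V, [1.0, -1.0], K))
    print("zero test (beta1 = 0):",
          linear_restriction_stat(d, V, [1.0, 0.0], K))
```

In this design the equality restriction holds, so the first statistic should behave like a standard normal draw, while the zero restriction is false and its statistic should be large.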
Tests on the value of a nonlinear differentiable function of $\gamma\beta$ can be derived by the "delta" method in the usual way. As an example, for testing whether the ratio $\beta_i/\beta_j$ is equal to a specific value, we have that $\sqrt{K}\,[(\hat d_i/\hat d_j) - (\beta_i/\beta_j)]$ is asymptotically normal with mean 0 and variance $\sigma_{ij}$, where

(3.16)  $\sigma_{ij} = \dfrac{1}{d_j^2}\,V_{ii} - \dfrac{2 d_i}{d_j^3}\,V_{ij} + \dfrac{d_i^2}{d_j^4}\,V_{jj}$

where $d_i = \gamma\beta_i$, $d_j = \gamma\beta_j$ and $V_{ij}$ is the $i,j$ element of $V$. $\sigma_{ij}$ is consistently estimated by evaluating (3.16) using the appropriate components of $\hat d$ and $\hat V$.

4. Independent Variable Distribution Examples and Related Discussion

Here we present several examples based on specific forms of the marginal distribution of X, to illustrate the structure of the instruments $\ell_{0k}$ and relate our results to the previous literature.

4.1 Multivariate Normal Distributions

As indicated above, the implementation of Theorem 1 is particularly easy if the score vector $\ell_0$ is exactly collinear with X, for then $\hat d$ is the vector of OLS slope coefficients of $y_k$ regressed on $X_k$, k=1,...,K. It is easy to verify that this situation will occur if and only if $p_0(X)$ is of the multivariate normal form over $\Omega$, as follows. Suppose that $\ell_0$ can be written in the form

(4.1)  $\ell_0 = A + BX$

where A is an M-vector and B an M×M matrix of constants, where without loss of generality, B is symmetric and nonsingular. Since $E_0(\ell_0)=0$, we have that $A = -B\,E_0(X) = -B\mu_0$. Now, in view of (2.3) we integrate (4.1) with respect to X, which implies that $\ln p_0(X)$ must be of the form

(4.2)  $\ln p_0(X) = C - \tfrac{1}{2}(X-\mu_0)'B(X-\mu_0)$

for some constant C, which for $\Sigma^{-1}=B$ clearly implies that

(4.3)  $p_0(X) = \dfrac{1(X\in\Omega)\,p_N(X\mid\mu_0,\Sigma)}{\int_\Omega p_N(X\mid\mu_0,\Sigma)\,d\nu}$

where $p_N(X\mid\mu_0,\Sigma)$ is the multivariate normal density with mean $\mu_0$ and covariance matrix $\Sigma$, and $1(X\in\Omega)$ is the indicator function of the event $X\in\Omega$. Consequently, (4.1) implies that $p_0(X)$ is multivariate normal over the carrier $\Omega$, and when $\Omega=R^M$, $p_0(X)=p_N(X\mid\mu_0,\Sigma)$. The translation family computed via (2.4) is then $p(X\mid\theta)=p_N(X\mid\mu_0+\theta,\Sigma)$.

It is informative to reexamine the structure of Theorem 1 when $\Omega=R^M$ and $p_0(X)$ is multivariate normal. The induced marginal distributions of $Z=\alpha+X'\beta$ are determined by $\theta'\beta$, as $p^{*}(Z\mid\theta'\beta)=p_N(Z\mid\alpha+\mu_0'\beta+\theta'\beta,\ \beta'\Sigma\beta)$. The mean of y can be equivalently defined via $E(y)=\phi^{*}(\theta)$, using the marginal density of X, or as

(4.4)  $\phi^{*}(\theta) = E(y) = \int F(Z)\,p^{*}(Z\mid\theta'\beta)\,d\nu = \phi^{**}(\theta'\beta)$

The aggregate effects on E(y) of changing $\theta$ can therefore be written as

(4.5)  $\dfrac{\partial\phi^{*}}{\partial\theta} = \dfrac{\partial(\theta'\beta)}{\partial\theta}\,\dfrac{\partial\phi^{**}}{\partial(\theta'\beta)} = \beta\,\dfrac{\partial\phi^{**}}{\partial(\theta'\beta)}$

Now, since $p_N$ is an exponential family with driving variable X, the results of Stoker(1982) imply that the OLS slope coefficient vector $\hat d$ converges strongly to $\partial\phi^{*}/\partial\theta$, which from (4.5) gives the result (3.1) of Theorem 1. The earlier interpretation of the scaling factor $\gamma$ as the effect on E(y) of varying $\eta=E(Z)=\alpha+\mu_0'\beta+\theta'\beta$ is obvious from (4.5). Finally, since $p^{*}(Z\mid\theta'\beta)$ is in exponential family form with driving variable Z, another application of the same results from Stoker(1982) indicates that $\gamma$ is the a.s. limit of the OLS regression coefficient of $y_k$ on $Z_k=\alpha+X_k'\beta$.

It is useful at this point to discuss the results of Ruud(1983a), Deaton and Irish(1984) and Chung and Goldberger(1984). While Ruud(1983a) studies quasi-maximum likelihood estimators for binary discrete choice models and Deaton and Irish(1984) and Chung and Goldberger(1984) employ a generalized definition of the indicator Z, each paper utilizes a condition that E(X|Z) is linear in Z;

(4.6)  $E(X\mid Z) = G + HZ$

where G and H are M-vectors. For $Z=\alpha+X'\beta$, (4.6) is implied by multivariate normality of X. The value of condition (4.6) is that it makes X effectively one dimensional for the purpose of calculating covariances with y, so that the function F impacts only on a scalar covariance. This is easily seen from the following proof, which is basically Chung and Goldberger's(1984) result for censored model cases. Let y=F(Z), $\Sigma_{xx}$, $\Sigma_{xy}$, and $\Sigma_{xz}$ denote the respective covariance matrices between X, y and Z, and $\sigma_{zy}$ and $\sigma_z^2$ denote the respective scalar covariance values. Now, begin by expressing $\Sigma_{xy}$ as

(4.7)  $\Sigma_{xy} = E_0((X-\mu_0)y) = E_Z(E(X-\mu_0\mid Z)F(Z)) = H\,E_Z((Z-\eta_0)F(Z)) = H\sigma_{zy}$

where the second equality follows from (2.1) and the latter equalities follow from (4.6). Note that the value of $\sigma_{zy}$ entirely captures the impact of the function F. Now, recalling here that $\hat d$ is the OLS slope coefficient vector of y regressed on X, we have

(4.8)  $\lim \hat d = (\Sigma_{xx})^{-1}\Sigma_{xy} = (\Sigma_{xx})^{-1}H\sigma_{zy} = \left(\dfrac{\sigma_{zy}}{\sigma_z^2}\right)\beta$

where the latter equality follows from $H=\Sigma_{xz}/\sigma_z^2$ and $\beta=(\Sigma_{xx})^{-1}\Sigma_{xz}$.

This argument was recalled in order to indicate the identical role played by the linearity condition (4.6) and density translation in the normal distribution case. The fact that the marginal densities of $Z_k=\alpha+X_k'\beta$ depend only on $\theta'\beta$ suffices to reduce the dimensionality of the aggregate effects as in (4.5), which is exactly the impact of the linearity condition (4.6) on (4.7). Moreover, equality between (4.5) and (4.8) gives an alternative demonstration that $\gamma=\sigma_{zy}/\sigma_z^2$, the large sample OLS slope regression coefficient of $y_k$ on $Z_k$.

Given interest in conditions that depend only on the marginal distribution of X, it is natural to inquire how much more general than multivariate normality is the linearity condition (4.6) with $Z=\alpha+X'\beta$. We have no concrete answer here, although no obvious examples of nonnormal densities where (4.6) is valid for all $\alpha$, $\beta$ are immediate. It is true that if the individual components of X are independent or homoscedastic, then (4.6) implies multivariate normality (under some regularity conditions) (c.f. Kagan, Linnik and Rao(1973)), although the implications of (4.6) to more general circumstances are not known to the author.
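The proportionality result of this section can be checked numerically. The following is a minimal simulation sketch, not part of the original paper: it assumes a censored outcome with logistic disturbances as the unspecified F, and all names (`ols_slopes`, the parameter values, the sample size) are hypothetical choices made for illustration. With X drawn from a multivariate normal density, the OLS slopes should be close to a common scalar multiple of the true coefficients.

```python
import numpy as np

def ols_slopes(X, y):
    """OLS slope coefficients of y on X (an intercept is fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

# Illustrative (assumed) parameter values; none come from the paper.
rng = np.random.default_rng(0)
K = 200_000
alpha = 0.3
beta = np.array([1.0, -2.0, 0.5])
mu0 = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.5, -0.3],
                  [0.2, -0.3, 0.8]])

# Multivariate normal regressors, as in Section 4.1.
X = rng.multivariate_normal(mu0, Sigma, size=K)

# Censored outcome with a non-normal disturbance: the estimator below
# never uses the form of F or the disturbance distribution.
latent = alpha + X @ beta + rng.logistic(size=K)
y = np.maximum(latent, 0.0)

d = ols_slopes(X, y)
ratios = d / d[0]            # should be close to beta / beta[0]
print("OLS slopes:      ", d)
print("slope ratios:    ", ratios)
print("true beta ratios:", beta / beta[0])
```

In this design the common scale factor is the probability that the latent index is positive, so the slopes are attenuated toward zero while their ratios are preserved, in line with (4.8).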
4.2 Mixtures of Normals

An obvious circumstance where the X data was nonnormal would occur if the sample distribution displayed several pronounced modes. In this case, it might be appropriate to model the density of X via a mixture of normals. We indicate below that the large sample limit of $\hat d$ in this case is the appropriate weighted average of the limits of the OLS slope coefficients that would be obtained from regressions over each of the component normal distributions.

All the intuition of this example can be seen in the case of a two component normal mixture. Suppose that the marginal distribution of X is given as

(4.9)  $p_0(X) = \lambda p_1(X) + (1-\lambda)p_2(X)$

where $p_1(X)=p_N(X\mid\mu_1,\Sigma_1)$, $p_2(X)=p_N(X\mid\mu_2,\Sigma_2)$ are the normal component densities, and $0<\lambda<1$. The relevant score vector $\ell_0$ is

(4.10)  $\ell_0 = w(X)\,\Sigma_1^{-1}(X-\mu_1) + (1-w(X))\,\Sigma_2^{-1}(X-\mu_2)$

where $w(X) = \lambda p_1(X)/p_0(X)$. If $\hat d$ is the instrumental variables slope vector of (3.1) using $\ell_{0k}$ as instrument, then $\lim \hat d = \gamma\beta$ a.s. By direct computation, the limit of $\hat d$ can be written as

(4.11)  $\lim \hat d = \lambda\int \Sigma_1^{-1}(X-\mu_1)\,y\,q(y\mid X)\,p_1(X)\,d\nu + (1-\lambda)\int \Sigma_2^{-1}(X-\mu_2)\,y\,q(y\mid X)\,p_2(X)\,d\nu = \lambda d_1 + (1-\lambda)d_2$

where $d_1$ is the large sample value of the OLS coefficient of y on X if X was distributed with respect to the normal density $p_1(X)$, and $d_2$ is the large sample value of the OLS coefficient of y on X if X was distributed with respect to the normal density $p_2(X)$. Consequently, one can consider the proper slope estimator in this context as weighting together regression coefficients from samples distributed with respect to each of the component densities. From Section 4.1 we have that $d_1=\gamma_1\beta$ and $d_2=\gamma_2\beta$, where $\gamma=\lambda\gamma_1+(1-\lambda)\gamma_2$. This is consistent with the formula

$E(y) = \phi^{**}(\theta'\beta) = \lambda\phi_1^{**}(\theta'\beta) + (1-\lambda)\phi_2^{**}(\theta'\beta)$

where $\phi^{**}$, $\phi_1^{**}$ and $\phi_2^{**}$ are the aggregate functions derived as in (4.4) from translation families generated by $p_0(X)$, $p_1(X)$ and $p_2(X)$ respectively.
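The score vector (4.10) and the resulting instrumental variables slopes can be sketched in a few lines. This is an illustrative simulation only: the mixture parameters are treated as known, the outcome is an assumed binary probit-type response, and the slope formula is the standard just-identified IV estimator with instruments (1, score), which is assumed here to correspond to (3.1) since that equation is not reproduced in this section. The function names `mvn_logpdf`, `mixture_score` and `iv_slopes` are hypothetical.

```python
import numpy as np

def mvn_logpdf(X, mu, Sigma):
    """Log density of N(mu, Sigma) evaluated at each row of X."""
    diff = X - mu
    quad = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (len(mu) * np.log(2.0 * np.pi) + logdet + quad)

def mixture_score(X, lam, mu1, Sigma1, mu2, Sigma2):
    """Score vector of (4.10): w(X) S1^{-1}(X-mu1) + (1-w(X)) S2^{-1}(X-mu2)."""
    a = np.log(lam) + mvn_logpdf(X, mu1, Sigma1)
    b = np.log1p(-lam) + mvn_logpdf(X, mu2, Sigma2)
    w = np.exp(a - np.logaddexp(a, b))          # w(X) = lam p1(X) / p0(X)
    s1 = np.linalg.solve(Sigma1, (X - mu1).T).T
    s2 = np.linalg.solve(Sigma2, (X - mu2).T).T
    return w[:, None] * s1 + (1.0 - w)[:, None] * s2

def iv_slopes(X, y, L):
    """Slopes of y on X using instruments (1, L); the constant is removed by centering."""
    Xc, yc, Lc = X - X.mean(axis=0), y - y.mean(), L - L.mean(axis=0)
    return np.linalg.solve(Lc.T @ Xc, Lc.T @ yc)

# Illustrative (assumed) design; none of these values come from the paper.
rng = np.random.default_rng(1)
K = 200_000
lam = 0.4
mu1, mu2 = np.array([-1.5, 0.0]), np.array([1.5, 1.0])
Sigma1 = np.array([[1.0, 0.3], [0.3, 1.0]])
Sigma2 = np.array([[0.5, -0.2], [-0.2, 1.5]])
beta = np.array([1.0, -0.5])

from_first = rng.random(K) < lam
X = np.where(from_first[:, None],
             rng.multivariate_normal(mu1, Sigma1, size=K),
             rng.multivariate_normal(mu2, Sigma2, size=K))
y = (X @ beta + rng.standard_normal(K) > 0).astype(float)   # binary response

L = mixture_score(X, lam, mu1, Sigma1, mu2, Sigma2)
d_iv = iv_slopes(X, y, L)
print("IV slope ratio:  ", d_iv[1] / d_iv[0])
print("true beta ratio: ", beta[1] / beta[0])
```

The sample cross-moment matrix between the score and X should be close to the identity, which is why the IV slopes reduce in large samples to the covariance between the score and y, as used in (4.11).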
Of course, separate OLS estimates of $d_1$ and $d_2$ could not in general be computed with observed data, because it is not in general possible to identify which observations $X_k$ were drawn individually from $p_1(X)$ or from $p_2(X)$. Also, to compute $\hat d$, estimates $\hat\ell_{0k}$ of the true score vectors have to be constructed, which requires estimates of $\mu_1$, $\mu_2$, $\Sigma_1$, $\Sigma_2$ and $\lambda$. These could be obtained as the consistent roots of the likelihood equation for $X_k$, k=1,...,K implied by (4.9). Finally, the above weighted component regression interpretation clearly holds for the case of a mixture of more than two normal components.

4.3 Elliptical Distributions

In the same fashion as OLS slope estimators arise when independent variables are multivariate normally distributed, weighted least squares estimators are called for when the independent variables are elliptically distributed. Suppose that the marginal density of X has the elliptical form

(4.12)  $p_0(X) = p^{*}\big((X-\mu_0)'\Sigma^{-1}(X-\mu_0)\big)$

where $\mu_0=E_0(X)$ and $\Sigma$ is a positive definite matrix. Here the score vectors take the form

(4.13)  $\ell_0 = \omega(r(X))\,\Sigma^{-1}(X-\mu_0)$

where $r(X) = (X-\mu_0)'\Sigma^{-1}(X-\mu_0)$ is the distance measure and $\omega(r) = -2\,\partial\ln p^{*}(r)/\partial r$. The proper instrumental variables estimator $\hat d$ for proportionately estimating $\beta$ is weighted least squares, with weight $\omega(r(X_k))$ on the k-th observation. In the multivariate normal case we have $\omega(r)=1$ for all r. Note in general that $E_0(\ell_0)=0$ implies that the weights $\omega(r(X))$ are uncorrelated with X, however correlations with squares and cross products of X are possible. As above, one would require estimates of the parameters determining $p_0(X)$, in particular $\mu_0$ and $\Sigma$, in order to estimate the proper weights $\omega(r(X_k))$ for each observation $X_k$.

4.4 Multivariate Lognormal Distributions

Other cases where X is distributed in a nonnormal fashion occur when the sample distribution of X is skewed, as one would expect for variables measuring income or other wealth components. As a final example we consider the case where X is lognormally distributed; namely where $\ln(X-\xi)$ is distributed as a multivariate normal vector with mean $\mu^{*}$ and covariance matrix $\Sigma^{*}$, where $\xi$ is a vector of constants. Here $\Omega$ is defined as the set $\Omega = \{X \mid X>\xi\}$, with the standard definition of the lognormal density augmented by setting it to zero for $X\in\partial\Omega$. By direct computation, the appropriate score vector for this case is given as

(4.14)  $\ell_0 = [\mathrm{diag}(X-\xi)]^{-1}\,[\iota + (\Sigma^{*})^{-1}(\ln(X-\xi)-\mu^{*})]$

where $\iota$ is an M-vector of ones and $\mathrm{diag}(X-\xi)$ is the diagonal matrix with i-th diagonal element $X_i-\xi_i$. To construct the vectors $\hat\ell_{0k}$, one would evaluate (4.14) at $X_k$, and consistent estimates of $\xi$, $\mu^{*}$ and $\Sigma^{*}$.

This example points out the close connection between the proper score vectors and the specification of the variables in the index $Z=\alpha+X'\beta$. The proper score function is given by (4.14) when the index $Z=\alpha+X'\beta$ above is the correct specification in the behavioral equation (2.1). If, alternatively, we set $\xi=0$ and the index Z were defined as $Z=\alpha+\ln(X)'\beta$, then the results of Section 4.1 would apply, with the proper estimator the OLS slope coefficients of $y_k$ regressed on $\ln(X_k)$. While in many applications the precise form of the index Z may not significantly affect the coefficient ratio estimates, it is important for the correct application of our results.

5. Summary and Conclusion

In this paper a linear instrumental variables estimator $\hat d$ is proposed for estimating the ratio of coefficients in single index models. The framework is illustrated by several common examples of limited dependent variables models, as well as models involving a transformed dependent variable. Similar estimators are indicated for multiple index models, and models where extraneous variables are present. The construction of the instrumental variables is discussed, and illustrated by several examples of specific independent variable distributions.
The asymptotic distribution of $\hat d$ is established for purposes of statistical inference.

There are two major advantages to the proposed estimator $\hat d$. First, $\hat d$ is nonparametric to the extent that it is robust to many specific functional form and stochastic distribution assumptions. If a particular application requires only estimates of the ratios of components of $\beta$, then $\hat d$ will suffice. Scale free hypotheses on $\beta$ can be tested using $\hat d$. Moreover, in a general application where different sets of modeling assumptions produce substantively different estimated parameter values, $\hat d$ will provide useful information for choosing the best specification.

The other major advantage in using $\hat d$ is that it is a linear estimator, once the instruments are computed. Consequently, once the distribution of the independent variables is characterized, the computation of $\hat d$ is easy and relatively inexpensive, particularly for large data bases.

There are also two drawbacks to the results. First, to construct the proper instruments, the distribution of the independent variables must be modeled, and the score vectors derived from the assumed density. This problem can be overcome by further research on nonparametric estimation of multivariate score vectors, which given the current state of work on adaptive estimation, appears very promising. The second drawback is that our results apply only to estimating the coefficients of continuously distributed variables, but most serious applications to microeconomic data will require using discrete as well as continuous independent variables. While we have indicated above how discrete variables can be accommodated in the estimation of continuous variable coefficients, the question of how to nonparametrically estimate discrete variable coefficients up to scale remains open.

Appendix: Further Regularity Assumptions

For the purpose of differentiating under integral signs, define difference quotients as

$D_{yj}(y,X,h) = \dfrac{y\,q(y\mid X)\,[p_0(X-he_j)-p_0(X)]}{h}$

$D_{ij}(X,h) = \dfrac{X_i\,[p_0(X-he_j)-p_0(X)]}{h}$

for i,j=1,...,M, where $X_i$ is the i-th component of X, $e_j$ is the unit vector with j-th component 1 and h is a scalar. We now make

Assumption 5: There exist $\nu$-integrable functions $g_{yj}(y,X)$ and $g_{ij}(X)$, i,j=1,...,M, such that for all h where $0<|h|<h_0$, $|D_{yj}(y,X,h)| \le g_{yj}(y,X)$ and $|D_{ij}(X,h)| \le g_{ij}(X)$ for all i,j=1,...,M.

For the purpose of using estimated parameters to construct the score vector instruments, define

$\ell_0(\Lambda) = -\dfrac{\partial \ln p^{*}(X\mid\Lambda)}{\partial X}$

Denote the j-th component of $\ell_0(\Lambda)$ as $\ell_{0j}(\Lambda)$ and assume

Assumption 6: $p^{*}(X\mid\Lambda)$ is twice differentiable with respect to the components of $\Lambda$ in an open neighborhood of $\Lambda=\Lambda_0$. There exist measurable functions $G_{yj}(y,X)$ and $G_{ij}(X)$, i,j=1,...,M, such that $|y\,\ell_{0j}(\Lambda)| \le G_{yj}(y,X)$ and $|X_i\,\ell_{0j}(\Lambda)| \le G_{ij}(X)$ for all $\Lambda$ in an open neighborhood of $\Lambda_0$, where $E_0(G_{yj}^{1+\tau})$ and $E_0(G_{ij}^{1+\tau})$ are bounded for some $\tau>0$, i,j=1,...,M.

A sufficient set of conditions for establishing the asymptotic distribution of instrumental variables slope estimators is given as

Assumption IV: The means and covariance matrices of y, X and W exist, and the covariance matrix $\Sigma_{wx} = E[(W-E(W))(X-E(X))']$ is nonsingular. For $U_w = (y-E(y)) - (X-E(X))'\delta_w$, the covariance matrix of $(W-E(W))U_w$ exists.

For deriving the asymptotic distribution of the specific estimator of this paper, we require

Assumption 7: For $U_0 = (y-E_0(y)) - (X-E_0(X))'\gamma\beta$, the covariance matrix of $\ell_0 U_0$ exists.

Footnotes

1. The sensitivity of estimates to specific stochastic distribution assumptions in certain limited dependent variable contexts is well known. For example, Heckman and Singer(1984) illustrate such sensitivity for duration models, and establish an approach based on nonparametrically estimating the stochastic heterogeneity distribution.

2.
See also Greene(1981, 1983), Lawley(1943) and Stewart(1983).

3. Ruud(1983b) studies a similar estimation problem and proposes a different technique.

4. The behavioral modeling framework of Deaton and Irish(1984) and Chung and Goldberger(1984) is slightly different to that considered here, since it subsumes situations where $\epsilon$ (our notation) is uncorrelated with X, but possibly not independent.

5. Manski(1975) presents an alternative nonparametric method of estimating both $\alpha$ and $\beta$ for discrete choice models.

6. Note that Assumption 3 requires that $F(\alpha+X'\beta)$ is defined over the set $\Omega(\Theta) = \{X+\theta \mid X\in\Omega,\ \theta\in\Theta\}$.

7. There are a number of regression estimators that measure the effects of discrete variables, however none appear to estimate the coefficients of discrete variables up to the same scalar multiple as applicable to the continuous coefficients. For example, suppose that $X_2$ is a single discrete variable taking the values 0 and 1, and the behavioral model implies that $E(y\mid X_1,X_2)=F(\alpha+X_1'\beta_1+X_2\beta_2)$. The joint density of $X_1$ and $X_2$ can be written as

$p_0(X_1,X_2) = (1-\lambda)\,p^{0}(X_1)$ if $X_2=0$
$p_0(X_1,X_2) = \lambda\,p^{1}(X_1)$ if $X_2=1$

where $\lambda$ is the probability that $X_2=1$ and $p^{j}$ is the conditional density of $X_1$ given that $X_2=j$. Now suppose that one estimates the equation

$y_k = c + X_{1k}'d_1 + X_{2k}d_2 + u_k$

using instruments $(1, \hat\ell_{1k}, X_{2k})$, where $\ell_1 = -\partial\ln p_0(X_1,X_2)/\partial X_1$, so that $(\hat d_1', \hat d_2)$ is an estimator of the macroeconomic effects of varying $E(X_1)$ and $E(X_2)$ on E(y). It is easy to show that $\lim \hat d_1=\gamma_1\beta_1$ and that $\lim \hat d_2=E_0(y\mid X_2=1)-E_0(y\mid X_2=0)$. While $\hat d_2$ is a measure of the impact of the discrete variable $X_2$, the conditions under which $\lim \hat d_2=\gamma_1\beta_2$ appear to involve severe restrictions on the structure of the function F.

8. For instance, in the selection model of Example 4, if a variable $X_i$ was contained in both $X_1$ and $X_2$, then its coefficient $\hat d_i$ from (3.1) will consistently estimate $\beta_{1i}+\gamma_2\beta_{2i}$, the structural coefficient plus a selection term.

9. Manski(1984) also proposes similar work on multivariate extensions. It should be noted that the nonparametric estimation of $\ell_{0k}$ called for in the present paper is not as demanding as that proposed by Manski, because the X data is observed.

10. Moreover, $\hat V$ is just the "heteroscedasticity consistent" variance estimator of White(1980).

11. On this point, see the discussion in Deaton and Irish(1984).

References

Bickel, P.(1982), "On Adaptive Estimation," Annals of Statistics, 10, 647-671.

Chung, C-F. and A. S. Goldberger(1984), "Proportional Projections in Limited Dependent Variable Models," Econometrica, 52, 531-534.

Deaton, A. and M. Irish(1984), "Statistical Models for Zero Expenditures in Household Budgets," Journal of Public Economics, 23, 59-80.

Greene, W. H.(1981), "On the Asymptotic Bias of the Ordinary Least Squares Estimator of the Tobit Model," Econometrica, 49, 505-514.

Greene, W. H.(1983), "Estimation of Limited Dependent Variable Models by Ordinary Least Squares and the Method of Moments," Journal of Econometrics, 21, 195-212.

Goldberger, A. S.(1981), "Linear Regression After Selection," Journal of Econometrics, 15, 357-366.

Heckman, J.(1979), "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.

Heckman, J. and B. Singer(1984), "A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data," Econometrica, 52, 271-320.

Kagan, A. M., Y. V. Linnik and C. R. Rao(1973), Characterization Problems in Mathematical Statistics, Wiley, New York.

Lawley, D.(1943), "A Note on Karl Pearson's Selection Formulae," Proceedings of the Royal Society of Edinburgh, Section A, 62, 28-30.

Manski, C. F.(1975), "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

Manski, C. F.(1984), "Adaptive Estimation of Non-linear Regression Models," draft, Department of Economics, University of Wisconsin.

Ruud, P. A.(1983a), "Sufficient Conditions for the Consistency of Maximum Likelihood Estimation Despite Misspecification of Distribution in Multinomial Discrete Choice Models," Econometrica, 51, 225-228.

Ruud, P. A.(1983b), "Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution," Draft.

Stewart, M. B.(1983), "On Least Squares Estimation When the Dependent Variable is Grouped," Review of Economic Studies, 50, 737-753.

Stone, C.(1975), "Adaptive Maximum Likelihood Estimators of a Location Parameter," Annals of Statistics, 3, 267-284.

Stoker, T. M.(1982), "The Use of Cross Section Data to Characterize Macro Functions," Journal of the American Statistical Association, 77, 369-380.

Stoker, T. M.(1983), "Aggregation, Efficiency and Cross Section Regression," MIT Sloan School of Management Working Paper No. 1453-83, revised April 1984.

White, H.(1980), "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, 48, 817-838.

White, H.(1982), "Instrumental Variables Regression with Independent Observations," Econometrica, 50, 483-500.