WORKING PAPER
ALFRED P. SLOAN SCHOOL OF MANAGEMENT

NONLINEAR THREE STAGE LEAST SQUARES POOLING OF CROSS SECTION AND AVERAGE TIME SERIES DATA

Dale W. Jorgenson* and Thomas M. Stoker**

April 1982 (Revised: August 1983)

Sloan School of Management Working Paper #1293-82

MASSACHUSETTS INSTITUTE OF TECHNOLOGY, 50 MEMORIAL DRIVE, CAMBRIDGE, MASSACHUSETTS 02139

*Department of Economics, Harvard University, Cambridge, Massachusetts 02138
**Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

NONLINEAR THREE STAGE LEAST SQUARES POOLING OF CROSS SECTION AND TIME SERIES OBSERVATIONS
by Dale W. Jorgenson and Thomas M. Stoker

1. Introduction. The purpose of this paper is to discuss the pooling of cross section and average time series data by the method of nonlinear three stage least squares introduced by Jorgenson and Laffont (1974).$^{1}$ We consider applications of this method to exact aggregation models, where there is a unique correspondence between individual and aggregate behavior.$^{2}$ This correspondence makes exact aggregation models appropriate for the analysis of average data, individual data, or both in combination.

We consider observations on $K$ individuals, indexed by $k = 1, 2 \ldots K$, for $T$ time periods, indexed by $t = 1, 2 \ldots T$. We can represent the structural form of an exact aggregation model for the $k$th individual in the $t$th time period by:

$$ y_{nkt} = x_{kt}'\,\beta_n(p_t, \theta), \qquad (n = 1, 2 \ldots N). $$

The observations $y_{nkt}$ and $x_{kt}$ vary over both individuals and time periods, while the vector of observations $p_t$ varies over time periods, but is the same for all individuals in a given time period.
The coefficients $\beta_n(p_t, \theta)$ are functions of the observations $p_t$ and the vector of $L$ structural parameters $\theta' = (\theta_1, \theta_2 \ldots \theta_L)$. Restrictions on the parameters are embodied in the forms of these functions. We can write the exact aggregation model for the $k$th individual in vector form:

$$ y_{kt} = (I_N \otimes x_{kt}')\,\beta(p_t, \theta), \qquad (1) $$

where $y_{kt}$ is a vector of $N$ observations, $\beta(p_t, \theta)$ is the stacked vector of the $N$ coefficient vectors $\beta_n(p_t, \theta)$, and $I_N$ is the identity matrix of order $N$. By averaging the model (1) over all individuals for each time period, we obtain the structural form of the exact aggregation model for averaged data:

$$ \bar{y}_t = (I_N \otimes \bar{x}_t')\,\beta(p_t, \theta), \qquad (2) $$

where $\bar{y}_t$ and $\bar{x}_t$ are vectors of observations on averages of $y_{kt}$ and $x_{kt}$ over all individuals.

The models for individual cross section and average time series observations contain the same parameter vector $\theta$ and the same coefficient vector $\beta(p_t, \theta)$. This reflects the correspondence between individual and aggregate behavior that characterizes exact aggregation models. The forms of the individual and aggregate models (1) and (2) are necessary and sufficient for exact aggregation, provided that the population distribution of $x_{kt}$ is unrestricted.$^{3}$

As an example of exact aggregation models we first consider the linear model that underlies previous discussions of pooling cross section and time series data:$^{4}$

$$ y_{nkt} = p_t'\theta_{1n} + z_{kt}'\theta_{2n}, \qquad (n = 1, 2 \ldots N), $$

where $\theta_{1n}$ and $\theta_{2n}$ are vectors of parameters. In terms of models (1) and (2), the vector of observations $x_{kt}$ is $(1, z_{kt}')'$ and the vector of coefficients $\beta_n(p_t, \theta)$ is $(p_t'\theta_{1n}, \theta_{2n}')'$ $(n = 1, 2 \ldots N)$. In this example the vector of parameters $\theta$ includes the elements of $\theta_{1n}$ and $\theta_{2n}$ $(n = 1, 2 \ldots N)$.

Demand analysis provides many examples of nonlinear exact aggregation models. In each of these examples the theory of consumer behavior implies constraints on the parameters of the model that are incorporated through the form of the coefficients $\beta_n(p_t, \theta)$ $(n = 1, 2 \ldots N)$.
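The step from the individual model (1) to the averaged model (2) can be checked numerically for the linear example. The following is a minimal sketch, not taken from the paper: the dimensions, the simulated data and the helper names `beta` and `predict` are all assumptions made for illustration. It shows that averaging individual predictions gives the same result as evaluating the model at the averaged $x_{kt}$, because the coefficients depend only on $p_t$ and $\theta$.

```python
import numpy as np

def beta(p_t, theta1, theta2):
    """Coefficient vectors beta_n(p_t, theta) for the linear example.

    theta1 has shape (N, P) and theta2 has shape (N, J); row n of the
    result is (p_t' theta_1n, theta_2n')', matching x_kt = (1, z_kt')'.
    """
    intercept = theta1 @ p_t                      # shape (N,)
    return np.column_stack([intercept, theta2])   # shape (N, 1 + J)

def predict(x, b):
    """Evaluate x' beta_n for every equation n; x has last dimension 1 + J."""
    return x @ b.T

# Hypothetical sizes and simulated data (assumptions for illustration only).
rng = np.random.default_rng(0)
N, P, J, K = 3, 2, 4, 500
theta1 = rng.normal(size=(N, P))
theta2 = rng.normal(size=(N, J))
p_t = rng.normal(size=P)                          # common to all individuals
z = rng.normal(size=(K, J))                       # varies over individuals
x = np.column_stack([np.ones(K), z])              # rows are x_kt'

b = beta(p_t, theta1, theta2)
y_bar = predict(x, b).mean(axis=0)                # average of individual models (1)
y_from_avg = predict(x.mean(axis=0), b)           # aggregate model (2) at x-bar
max_gap = np.max(np.abs(y_bar - y_from_avg))      # zero up to rounding error
```

The same check would go through for any of the nonlinear examples, since nonlinearity enters only through $\beta_n(p_t, \theta)$ and the model remains linear in $x_{kt}$.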
Demand systems generated by the Gorman polar form of the indirect utility function are nonlinear exact aggregation models. Specific examples include the linear expenditure system introduced by Klein and Rubin (1947-1948) and implemented by Stone (1954), the S-branch utility tree of Brown and Heien (1972), and the generalization of the S-branch utility tree of Blackorby, Boyce, and Russell (1978). As an illustration, the linear expenditure system can be written in exact aggregation form as follows:

$$ y_{nkt} = p_{nt} c_n + b_n \Big( M_{kt} - \sum_j p_{jt} c_j \Big), \qquad (n = 1, 2 \ldots N), $$

where $y_{nkt}$ denotes expenditure on the $n$th commodity by the $k$th individual in period $t$, $p_{nt}$ is the price of this commodity, and $M_{kt}$ is total expenditure on all commodities. The vector of parameters $\theta$ includes the parameters $b_n$ and $c_n$ $(n = 1, 2 \ldots N)$; the vector of coefficients $\beta_n(p_t, \theta)$ is $(p_{nt} c_n - b_n \sum_j p_{jt} c_j,\; b_n)'$ and the vector of observations $x_{kt}$ is $(1, M_{kt})'$.

More complex nonlinear exact aggregation models have recently been introduced by Deaton and Muellbauer (1980a, 1980b) and by Jorgenson, Lau, and Stoker (1980, 1981, 1982). The AIDS models of Deaton and Muellbauer can be written:

$$ y_{nkt} = \Big( a_n + \sum_j c_{nj} \ln p_{jt} - b_n \ln P_t \Big) M_{kt} + b_n M_{kt} \ln M_{kt}, \qquad (n = 1, 2 \ldots N), $$

where $y_{nkt}$, $M_{kt}$ and $p_t$ are defined as in the linear expenditure system and:

$$ \ln P_t = a_0 + \sum_n a_n \ln p_{nt} + \tfrac{1}{2} \sum_n \sum_j c_{nj} \ln p_{nt} \ln p_{jt} $$

is a price index. The vector of parameters $\theta$ includes the parameters $a_n$, $b_n$, $c_{nj}$ $(n, j = 1, 2 \ldots N)$, the vector of coefficients $\beta_n(p_t, \theta)$ is $(a_n + \sum_j c_{nj} \ln p_{jt} - b_n \ln P_t,\; b_n)'$ and the vector of observations is $x_{kt} = (M_{kt}, M_{kt} \ln M_{kt})'$.

The translog model of Jorgenson, Lau and Stoker can be represented in the form:

$$ y_{nkt} = \frac{a_n + \sum_j b_{nj} \ln p_{jt}}{D(p_t)}\, M_{kt} - \frac{b_{Mn}}{D(p_t)}\, M_{kt} \ln M_{kt} + \sum_s \frac{b_{ns}}{D(p_t)}\, M_{kt} A_{skt}, \qquad (n = 1, 2 \ldots N), $$

where $y_{nkt}$, $M_{kt}$ and $p_t$ are defined as above, $A_{skt}$ $(s = 1, 2 \ldots S)$ represents demographic characteristics such as family size, age of head of household, and so on, and:

$$ D(p_t) = -1 + \sum_j b_{Mj} \ln p_{jt}. $$

In this example the vector $\theta$ consists of the parameters $a_n$, $b_{nj}$, $b_{Mn}$, $b_{ns}$ $(n, j = 1, 2 \ldots N;\; s = 1, 2 \ldots S)$, the vector of coefficients $\beta_n(p_t, \theta)$ is

$$ \frac{1}{D(p_t)} \Big( a_n + \sum_j b_{nj} \ln p_{jt},\; -b_{Mn},\; b_{n1} \ldots b_{nS} \Big)', $$

and the vector of observations $x_{kt}$ is $(M_{kt},\; M_{kt} \ln M_{kt},\; M_{kt} A_{1kt} \ldots M_{kt} A_{Skt})'$.

In this paper we focus on the implications of nonlinearity for the pooling of cross section and average time series data. In Section 2 we consider the stochastic specification of exact aggregation models (1) and (2). In Section 3 we present and characterize the nonlinear three stage least squares estimator for pooled time series and cross section observations. In Section 4 we discuss hypothesis testing and in Section 5 we consider estimation subject to inequality constraints. We close with a brief summary of the results and discussion of applications.

2. Stochastic Specification. We begin by considering average observations for $T$ time periods and a single cross section of individual observations. We assume that the observations are generated by exact aggregation models (1) and (2) with additive disturbance terms. Given the stochastic specification of the disturbance terms, the observations must be transformed to obtain disturbances that are homoscedastic and uncorrelated across observations.

For pooling of cross section and average time series data the transformation of observations to obtain homoscedastic and uncorrelated disturbances can be divided into two steps. The first step separates the data sets by transforming the average data so that time series disturbances are uncorrelated with cross section disturbances. The second step transforms the resulting data sets to a form where disturbances in each data set are homoscedastic and uncorrelated. We present the transformation for the first step explicitly, indicating the features of this transformation that result in increased efficiency. The second step involves standard techniques for transformation, which we illustrate by example.

We assume that individual observations are generated by the exact aggregation model (1) with an additive random component, say $\varepsilon_{kt}$:

$$ y_{kt} = (I_N \otimes x_{kt}')\,\beta(p_t, \theta) + \varepsilon_{kt}. \qquad (1') $$

We assume that the disturbance term $\varepsilon_{kt}$ is distributed with mean zero and is uncorrelated across individuals, so that:

$$ E(\varepsilon_{kt}\,\varepsilon_{k't'}') = 0, \qquad k \neq k'. $$

Any systematic correlation among individuals is assumed to be captured by selection of the variables $x_{kt}$. The disturbance term $\varepsilon_{kt}$ is assumed to have variance $\Omega_\varepsilon$ and time series covariance structure $E(\varepsilon_{kt}\,\varepsilon_{kt'}') = c^{\varepsilon}_{tt'}\,\Omega_\varepsilon$. A wide variety of alternative time series structures for $\varepsilon_{kt}$ can be represented by choosing an appropriate form for the matrix $C_\varepsilon = [c^{\varepsilon}_{tt'}]$.

We could obtain a stochastic version of the exact aggregation model (2) by averaging the individual observations in (1') for each time period. This would be the appropriate procedure if the average data were constructed by averaging the individual observations. However, we must allow for alternative methods for constructing the aggregate data. In demand analysis, for example, data on aggregate personal consumption expenditures are obtained from production accounts for the economy as a whole rather than by direct observation of quantities consumed by the entire population of individual households.

To allow for differences in methods of construction of the individual and aggregate data we introduce an additive random component $\nu_t$ into the exact aggregation model (2) for each time period. The model relating the averaged data $\bar{y}_t$ and $\bar{x}_t$ is then:$^{5}$

$$ \bar{y}_t = (I_N \otimes \bar{x}_t')\,\beta(p_t, \theta) + u_t, \qquad (2') $$

where $u_t = \bar{\varepsilon}_t + \nu_t$, $\bar{\varepsilon}_t$ is a vector of $N$ averaged disturbances $\varepsilon_{kt}$, and $\nu_t$ is a stochastic term. The term $\nu_t$ is assumed to be distributed independently of $\varepsilon_{kt}$, with mean zero, variance $\Omega_\nu$ and time series covariance structure $E(\nu_t\,\nu_{t'}') = c^{\nu}_{tt'}\,\Omega_\nu$ for $t \neq t'$. To accommodate a variety of time series covariance structures for $u_t$ we have:

$$ E(u_t\,u_{t'}') = c^{\nu}_{tt'}\,\Omega_\nu + \frac{1}{K}\, c^{\varepsilon}_{tt'}\,\Omega_\varepsilon. $$

In order to present methods for pooling cross section and time series data we consider a sample of $K'$ individual observations in a single time period $t_0$. We can "stack" the equations (1') to obtain:

$$ Y_0 = (I_N \otimes X_0)\,\beta(p_{t_0}, \theta) + \varepsilon_0, \qquad (3) $$

where $Y_0$ is the vector of observations $\{y_{nkt_0}\}$, $X_0$ is the matrix with rows $\{x_{kt_0}'\}$, and $\varepsilon_0$ is the vector of disturbances with mean zero and covariance matrix $\Omega_\varepsilon \otimes I_{K'}$. Similarly, we can represent the equations (2') in the form:

$$ \bar{Y} = f(\theta) + u, \qquad (4) $$

where $\bar{Y}$ is the vector of averaged observations $\{\bar{y}_{nt}\}$, $f(\theta)$ is the vector with elements $\{\bar{x}_t'\,\beta_n(p_t, \theta)\}$ and $u$ is the vector of disturbances.

The first step in the transformation of observations eliminates the correlation between $\varepsilon_0$ and $u$. The covariance of $u_t$ and $\varepsilon_{kt_0}$ is:

$$ E(u_t\,\varepsilon_{kt_0}') = \frac{1}{K}\, c^{\varepsilon}_{tt_0}\,\Omega_\varepsilon, \qquad (k = 1, 2 \ldots K';\; t = 1, 2 \ldots T). \qquad (5) $$

This correlation is removed by a nonsingular transformation of (3) and (4), which is equivalent to replacing $\bar{y}_t$, $\bar{x}_t$ and $u_t$ in (2') by:

$$ \bar{y}_t^{\,0} = \bar{y}_t - \frac{K'}{K}\, c^{\varepsilon}_{tt_0}\, \bar{y}_{cs}, \qquad \bar{x}_t^{\,0} = \bar{x}_t - \frac{K'}{K}\, c^{\varepsilon}_{tt_0}\, \bar{x}_{cs}, \qquad u_t^{0} = u_t - \frac{K'}{K}\, c^{\varepsilon}_{tt_0}\, \bar{\varepsilon}_{cs}, \qquad (6) $$

where $\bar{y}_{cs}$, $\bar{x}_{cs}$ and $\bar{\varepsilon}_{cs}$ denote the cross section averages of $y_{kt_0}$, $x_{kt_0}$ and $\varepsilon_{kt_0}$. The resulting disturbances $u_t^{0}$ are now uncorrelated with $\varepsilon_{kt_0}$ $(k = 1, 2 \ldots K')$, but have a more complicated time series structure than the original disturbances:

$$ E(u_t^{0}\,u_{t'}^{0\prime}) = c^{\nu}_{tt'}\,\Omega_\nu + \frac{1}{K} \Big[ c^{\varepsilon}_{tt'} - \frac{K'}{K}\, c^{\varepsilon}_{tt_0}\, c^{\varepsilon}_{t't_0} \Big] \Omega_\varepsilon. \qquad (7) $$

The second step in the transformation of observations is to apply a nonsingular transform to the average data in (4) to obtain disturbances that are homoscedastic and uncorrelated. We illustrate this transformation below by example. We assume that the transformation has been performed, altering the model (4) to:

$$ Y^* = f^*(\theta) + u^*, \qquad (8) $$

where $u^*$ is distributed with mean zero and variance $\Omega_{u^*} \otimes I_T$. For estimation, we stack the equation systems (3) and (8):

$$ Y = \Phi(\theta) + U, \qquad (9) $$

where $Y' = (Y_0', Y^{*\prime})$, $\Phi(\theta)$ stacks $(I_N \otimes X_0)\,\beta(p_{t_0}, \theta)$ and $f^*(\theta)$ in the same way, and $U' = (\varepsilon_0', u^{*\prime})$, which is distributed with mean zero and variance:

$$ \Sigma = \begin{bmatrix} \Omega_\varepsilon \otimes I_{K'} & 0 \\ 0 & \Omega_{u^*} \otimes I_T \end{bmatrix}. $$

The implementation of the transformations described above requires consistent estimates of the variances and covariances $\Omega_\nu$, $\Omega_\varepsilon$, $c^{\nu}_{tt'}$, $c^{\varepsilon}_{tt'}$ $(t, t' = 1, 2 \ldots T)$. In general, these estimates require specific models of the processes generating the disturbances.

The purpose of the transformations is to assure efficiency in estimation. Equation (2') shows that the contribution of the individual errors $\varepsilon_{kt}$ to the covariance structure of $u_t$ is likely to be negligible unless the matrices $c^{\nu}_{tt'}\,\Omega_\nu$ are the same order of magnitude as $c^{\varepsilon}_{tt'}\,\Omega_\varepsilon / K$, where $K$ is population size. The benefits of performing the transformation (6) depend on the size of the cross section relative to the population. In many applications, $K'/K$ will be extremely small so that the transformation (6) leaves the observations unaffected. Typical numbers for an analysis of U.S. household demand behavior are $K' = 10{,}000$ and $K = 70$ million. Consequently, only when the cross section sample size is of the same order of magnitude as the size of the population will the correction yield significant benefits; otherwise it can be ignored.

The following examples illustrate different error structures, where we assume $K'/K$ is very small. We take $c^{\varepsilon}_{tt'} = 0$, $t \neq t'$, for simplicity, deferring further discussion of this time series structure until we have presented the examples. In Examples 1 and 2 we take $c^{\nu}_{tt'} = 0$ for $t \neq t'$.

Example 1 (Random Individual Errors): Suppose that $\nu_t$ arises because of an additional random component at the individual level, $\nu_{kt}$, which is distributed with mean zero and variance $\Omega_\nu$, so that $\nu_t = \sum_k \nu_{kt}/K$. Then $u_t = \sum_k (\nu_{kt} + \varepsilon_{kt})/K$, with $\Omega_u = (\Omega_\nu + \Omega_\varepsilon)/K$. The second stage transformation is just a grouping correction, given as $u_t^* = \sqrt{K}\,u_t$, with $\Omega_{u^*} = \Omega_\nu + \Omega_\varepsilon$.

Example 2 (Common Time Effect): Suppose that $\nu_t$ represents a common disturbance in the aggregate data with variance $\Omega_\nu$ for all $t$.$^{6}$ In practice one will usually encounter $K\,\Omega_\nu \gg \Omega_\varepsilon$, so that $u_t = \nu_t$ for purposes of estimation. Here no second stage correction is necessary, with $\Omega_{u^*} = \Omega_\nu$.

Example 3 (Autocorrelated Common Time Effect): Suppose that Example 2 is modified to $\nu_t = \gamma\,\nu_{t-1} + w_t$, with $w_t$ distributed with mean zero, variance $\Omega_w$ and uncorrelated over time, so that $\Omega_\nu = \Omega_w/(1 - \gamma^2)$. As above, with $K\,\Omega_\nu \gg \Omega_\varepsilon$ we have $u_t = \nu_t$. The second stage correction is now quasi-first differencing, replacing $\bar{y}_t$ and $\bar{x}_t$ by $\bar{y}_t - \gamma\,\bar{y}_{t-1}$ and $\bar{x}_t - \gamma\,\bar{x}_{t-1}$ (with the standard adjustment to the first observation). Then $u_t^* = w_t$ and $\Omega_{u^*} = \Omega_w$ in this case.

Now suppose that $c^{\varepsilon}_{tt'} \neq 0$ for $t \neq t'$, so that we have a nontrivial time series correlation structure for $\varepsilon_{kt}$. In Examples 2 and 3 above, the effect of $c^{\varepsilon}_{tt'}$ would be negligible, due to the unimportance of $\sum_k \varepsilon_{kt}/K$ in $u_t$. In Example 1, however, the contribution of $\sum_k \varepsilon_{kt}/K$ to $u_t$ is not negligible, so the time series structure is potentially important: $u_t$ will contain the time series covariance structure of $\varepsilon_{kt}$, which would require consideration in the second stage of the transformation of observations.

Example 3 illustrates the cost of pooling with very general error structures. In particular, in Example 3 the parameter $\gamma$ is best relabeled as a component of $\theta$, with the transformed error covariance structure now determined by $\Omega_\varepsilon$ and $\Omega_{u^*} = \Omega_w$. The treatment of autocorrelation will involve augmenting the list of parameters to be estimated, with the remaining error structure characterized by $\Omega_\varepsilon$ and $\Omega_{u^*}$. This modeling approach is standard practice in time series analysis.
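The second stage correction of Example 3 can be sketched in a few lines. The code below is an illustration rather than the authors' implementation: the function name `quasi_difference` is hypothetical, $\gamma$ is treated as known, and the "standard adjustment to the first observation" is assumed to be the usual rescaling of the first observation by $\sqrt{1 - \gamma^2}$. The transform is written for any series stacked by time period, so it applies equally to the averaged observations and to fitted values.

```python
import numpy as np

def quasi_difference(series, gamma):
    """Quasi-first difference a series stacked by time period (axis 0).

    Rows t >= 1 become series[t] - gamma * series[t-1]; the first row is
    scaled by sqrt(1 - gamma**2) so that it has the same variance as the
    transformed later rows (assumed form of the first-observation adjustment).
    """
    a = np.asarray(series, dtype=float)
    out = np.empty_like(a)
    out[0] = np.sqrt(1.0 - gamma ** 2) * a[0]
    out[1:] = a[1:] - gamma * a[:-1]
    return out

# Check on the implied covariance: for nu_t = gamma * nu_{t-1} + w_t with unit
# innovation variance, Cov(nu_t, nu_s) = gamma**|t-s| / (1 - gamma**2).
T, gamma = 6, 0.6
idx = np.arange(T)
C = gamma ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - gamma ** 2)

# Applying the transform to the identity matrix recovers the transformation
# matrix P itself, because the transform is linear in the rows.
P = quasi_difference(np.eye(T), gamma)
whitened = P @ C @ P.T     # should equal the identity matrix up to rounding
```

In practice $\gamma$ would be estimated along with the other parameters, as the text notes, so the transform would be re-applied whenever the current value of $\gamma$ changes.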
Consequently, in Section 3 we discuss only the consistent estimation of the parameters $\theta$, $\Omega_\varepsilon$ and $\Omega_{u^*}$, where $\Omega_\varepsilon$ and $\Omega_{u^*}$ are regarded as positive definite but otherwise unrestricted.

Before discussing the additional assumptions required for estimation of the complete model, we introduce instrumental variables. It is often appropriate to treat the variables $x_{kt}$ and $p_t$ as endogenous for the individual observations, the aggregate observations, or both. This can occur when the model is a simultaneous equations model in exact aggregation form or part of a larger system of simultaneous equations. For example, in demand analysis observations on prices can reflect both supply and demand influences, requiring aggregate instruments. Alternatively, in a study of savings, errors in variables may necessitate instruments for the individual data, while in the average data such errors may be negligible.$^{7}$ We assume that there are vectors of observations on instrumental variables, say $z_{kt_0}$ and $\bar{z}_t$, respectively. Denote as $Z_0$ and $\bar{Z}$ the matrices with rows $z_{kt_0}'$ and $\bar{z}_t'$, and as $Z$ the matrix:

$$ Z = \begin{bmatrix} Z_0 & 0 \\ 0 & \bar{Z} \end{bmatrix}. $$

Finally, we must introduce regularity assumptions in order to characterize the NL3SLS estimator. We include these in the Appendix. The assumptions are that the coefficient functions $\beta(p_t, \theta)$ are twice continuously differentiable in the components of $\theta$, that the moment matrices defining the NL3SLS objective function converge to stable, well behaved limits,$^{8}$ and that the parameter vector $\theta$ is identified. We collect all components of $\theta$ identified in the cross section in a set $\theta_1$, all parameters identified in the time series in a set $\theta_2$, and all the remaining parameters in a set $\theta_3$.

3. The Nonlinear Three Stage Least Squares Estimator. The NL3SLS estimator $\hat{\theta}$ of $\theta^*$ is found as the value of $\theta$ which minimizes:

$$ S(\theta) = (Y - \Phi(\theta))'\,[\hat{\Sigma}^{-1} \otimes Z(Z'Z)^{-1}Z']\,(Y - \Phi(\theta)), \qquad (11) $$

where $\hat{\Sigma}$ is a consistent estimator of $\Sigma$ as $K', T \to \infty$. The objective function $S(\theta)$ can be written more explicitly as:

$$ S(\theta) = S_0(\theta) + \bar{S}(\theta), \qquad (12) $$

with:

$$ S_0(\theta) = (Y_0 - (I_N \otimes X_0)\,\beta(p_{t_0}, \theta))'\,[\hat{\Omega}_\varepsilon^{-1} \otimes Z_0(Z_0'Z_0)^{-1}Z_0']\,(Y_0 - (I_N \otimes X_0)\,\beta(p_{t_0}, \theta)), $$

$$ \bar{S}(\theta) = (Y^* - f^*(\theta))'\,[\hat{\Omega}_{u^*}^{-1} \otimes \bar{Z}(\bar{Z}'\bar{Z})^{-1}\bar{Z}']\,(Y^* - f^*(\theta)), $$

where $S_0(\theta)$ and $\bar{S}(\theta)$ are NL3SLS objective functions for the cross section and average models individually. Obviously, the function $S_0(\theta)$ could be minimized to estimate the elements of $\theta_1$ for fixed values of the remaining parameters; similarly, $\bar{S}(\theta)$ could be minimized to estimate the elements of $\theta_2$. If $\theta = \theta_1 = \theta_2$, then all parameters could be estimated from either data set. Minimizing (11) constrains the estimated values from cross section and time series data sets to be equal, which results in efficiency gains.

Note that the function $S_0(\theta)$ can be evaluated using only $\theta$, $\hat{\Omega}_\varepsilon$ and the moment matrices $Z_0'X_0$, $Z_0'Z_0$, and $(I_N \otimes Z_0)'Y_0$. Thus for estimating $\theta$ or other more restricted parameterizations of $\beta(p_{t_0}, \theta)$, only one pass through the cross section data is required to construct these moments. This computational simplification results from exact aggregation.

The estimation procedure consists of three steps: First, find consistent estimators of $\Omega_\varepsilon$ and $\Omega_{u^*}$; second, minimize (11) to obtain $\hat{\theta}$; third, calculate the asymptotic covariance matrix of $\hat{\theta}$.

If $\theta_3$ is not empty, then we cannot improve upon previous suggestions in the literature for finding consistent estimators of $\Omega_\varepsilon$ and $\Omega_{u^*}$. For example, Gallant (1977) suggests estimating each equation of the model by NL2SLS. This involves pooling both data sources on a single equation basis, forming $\hat{\Omega}_\varepsilon$ as the estimated residual covariance from the cross section data, and forming $\hat{\Omega}_{u^*}$ as the estimated residual covariance from the transformed average time series data.

The more usual situation is that $\theta_3$ is empty, which suggests a simpler procedure. First, obtain consistent estimates of $\beta(p_{t_0}, \theta)$ by linear 2SLS estimation of each equation using the cross section data. The estimated residual covariance matrix provides a consistent estimator of $\Omega_\varepsilon$. Using the consistent estimators of $\beta(p_{t_0}, \theta)$, solve for consistent estimates of the elements of $\theta_1$, say $\theta_1^0$. Holding these parameters fixed at $\theta_1^0$, estimate the remaining parameters of $\theta$ by applying NL2SLS to each equation or NL3SLS to the system as a whole, using only the time series data. The estimated covariance matrix of the NL2SLS residuals provides a consistent estimator of $\Omega_{u^*}$. In addition, this procedure usually produces good starting values for $\theta$ to use in minimizing (11), even if $\theta_3$ is not empty.

The objective function (11) can be minimized using a variety of well known computational methods. A convenient method that illustrates pooling cross section and time series data is the Gauss-Newton process.$^{9}$ To discuss this method we require the following notation: Let $B_0(\theta)$ and $\bar{\Phi}(\theta)$ denote the matrices

$$ B_0(\theta) = \begin{bmatrix} B_{01}(\theta) \\ B_{02}(\theta) \\ \vdots \\ B_{0N}(\theta) \end{bmatrix}, \qquad B_{0n}(\theta) = \frac{\partial \beta_n(p_{t_0}, \theta)}{\partial \theta'}, \qquad \bar{\Phi}(\theta) = \frac{\partial f^*(\theta)}{\partial \theta'}, $$

so that $\bar{\Phi}(\theta)$ is the matrix with elements $\{\bar{x}_t'\,\partial \beta_n(p_t, \theta)/\partial \theta_l\}$.$^{10}$

The Gauss-Newton process is an iterative procedure for finding $\hat{\theta}$ from an initial value $\hat{\theta}_0$. At the $i$th iteration, the current value $\hat{\theta}_i$ is updated to $\hat{\theta}_{i+1} = \hat{\theta}_i + \Delta\hat{\theta}_i$, first by linearizing the system (9) with respect to $\theta$ as:

$$ Y_0 - (I_N \otimes X_0)\,\beta(p_{t_0}, \hat{\theta}_i) = (I_N \otimes X_0)\,B_0(\hat{\theta}_i)\,\Delta\theta_i + \varepsilon_0, $$
$$ Y^* - f^*(\hat{\theta}_i) = \bar{\Phi}(\hat{\theta}_i)\,\Delta\theta_i + u^*. \qquad (13) $$

We then apply Zellner and Theil's (1962) linear three stage least squares method to the model (13), obtaining:

$$ \Delta\hat{\theta}_i = (M_{x0} + \bar{M}_x)^{-1}\,(M_{\varepsilon 0} + \bar{M}_u), \qquad (14) $$

where:

$$ M_{x0} = B_0(\hat{\theta}_i)'\,\big(\hat{\Omega}_\varepsilon^{-1} \otimes X_0'Z_0(Z_0'Z_0)^{-1}Z_0'X_0\big)\,B_0(\hat{\theta}_i), $$
$$ M_{\varepsilon 0} = B_0(\hat{\theta}_i)'\,\big(\hat{\Omega}_\varepsilon^{-1} \otimes X_0'Z_0(Z_0'Z_0)^{-1}Z_0'\big)\,\big(Y_0 - (I_N \otimes X_0)\,\beta(p_{t_0}, \hat{\theta}_i)\big), $$
$$ \bar{M}_x = \bar{\Phi}(\hat{\theta}_i)'\,\big(\hat{\Omega}_{u^*}^{-1} \otimes \bar{Z}(\bar{Z}'\bar{Z})^{-1}\bar{Z}'\big)\,\bar{\Phi}(\hat{\theta}_i), $$
$$ \bar{M}_u = \bar{\Phi}(\hat{\theta}_i)'\,\big(\hat{\Omega}_{u^*}^{-1} \otimes \bar{Z}(\bar{Z}'\bar{Z})^{-1}\bar{Z}'\big)\,\big(Y^* - f^*(\hat{\theta}_i)\big). $$

Convergence to $\hat{\theta}$ is achieved when $\Delta\hat{\theta}_i$ becomes sufficiently small. Following Hartley (1961) we check whether $S(\hat{\theta}_i + \Delta\hat{\theta}_i) < S(\hat{\theta}_i)$; if not, we shrink $\Delta\hat{\theta}_i$ by forming $\hat{\theta}_{i+1} = \hat{\theta}_i + \Delta\hat{\theta}_i/2$. We continue until an improvement in $S$ is found, at which point a new iteration is performed; alternatively, if the current increment falls below a convergence criterion, we have found the minimum.

Under our assumptions the NL3SLS estimator $\hat{\theta}$ is consistent for $\theta^*$ as $K', T \to \infty$, and asymptotically normal with asymptotic covariance matrix:

$$ \mathrm{AVAR}(\hat{\theta}) = (M^*_{x0} + \bar{M}^*_x)^{-1}, \qquad (15) $$

where the moment matrices are evaluated at the true values $\theta^*$, $\Omega_\varepsilon$ and $\Omega_{u^*}$. The precise form of the limiting normal distribution depends on the way that $K'$ and $T$ approach infinity; however, a consistent estimator of $\mathrm{AVAR}(\hat{\theta})$ is given by $(M_{x0} + \bar{M}_x)^{-1}$ in any case.

Closer inspection of (14) indicates the relationship of NL3SLS to linear pooling estimators. First, if $\theta = \theta_1 = \theta_2$, then:

$$ \Delta\hat{\theta}_i = (M_{x0} + \bar{M}_x)^{-1}\,\big(M_{x0}\,\Delta\hat{\theta}_i^{0} + \bar{M}_x\,\Delta\bar{\theta}_i\big), $$

where $\Delta\hat{\theta}_i^{0}$ and $\Delta\bar{\theta}_i$ are the Gauss-Newton increments from minimizing $S_0(\theta)$ and $\bar{S}(\theta)$ respectively. Thus, $\Delta\hat{\theta}_i$ is just a matrix weighted average of the individual increments.$^{11}$

Second, if the cross section data are exogenous, then $Z_0 = X_0$, and both $M_{x0}$ and $M_{\varepsilon 0}$ can be evaluated using the moments $X_0'X_0$ and $(I_N \otimes X_0)'Y_0$. In this case one can obtain $\hat{\Omega}_\varepsilon$ from the cross section residuals of each equation estimated by OLS.

Third, additional cross section data sets can be incorporated in a straightforward manner. If an additional cross section is available for time $t_1$ (or $t_0$, for that matter) with data $Y_1$, $X_1$, and $Z_1$, then $B_1(\theta)$, $M_{x1}$ and $M_{\varepsilon 1}$ are formed as above, and $M_{x1}$ and $M_{\varepsilon 1}$ enter additively into the first and second term of (14). The proper correction (7) must be applied to the aggregate data series.$^{12}$ In this way all of the available cross section information can be used in estimating the vector of parameters $\theta$.

4. Parametric Hypothesis Tests. Statistical hypotheses take the form $\theta = g(\rho)$, where $\rho$ is a vector with dimensionality $R$ less than that of $\theta$, say $L$. Our interest is in testing the hypothesis that $\theta = g(\rho)$ against the alternative $\theta \neq g(\rho)$. For this task, we require two additional assumptions, listed in the Appendix, which indicate that $\rho$ is identified and that the disturbances $\varepsilon_{kt}$ and $u_t$ are normally distributed.

The test statistic of interest is found as follows: Let $S_R(\rho)$ denote the objective function:

$$ S_R(\rho) = (Y - \Phi(g(\rho)))'\,[\hat{\Sigma}^{-1} \otimes Z(Z'Z)^{-1}Z']\,(Y - \Phi(g(\rho))). \qquad (16) $$

Denote by $\hat{\rho}$ the value of $\rho$ which minimizes $S_R(\rho)$. Under our assumptions Gallant and Jorgenson (1979) have shown that the statistic,

$$ \tau = S_R(\hat{\rho}) - S(\hat{\theta}), \qquad (17) $$

is asymptotically distributed as chi-square with $L - R$ degrees of freedom under the null hypothesis. The appropriate test statistic is provided by $\tau$.

The minimization of $S_R(\rho)$ to find $\hat{\rho}$ is analogous to the procedure for finding $\hat{\theta}$, and requires only moment matrices from the cross section. Although any consistent estimator of $\Sigma$ can be used in evaluating (16), the monotonicity condition $S_R(\hat{\rho}) - S(\hat{\theta}) \geq 0$ will be guaranteed only if the same $\hat{\Sigma}$ is used to evaluate both $S_R$ and $S$. Thus, the original consistent estimates $\hat{\Omega}_\varepsilon$ and $\hat{\Omega}_{u^*}$ used in estimating $\hat{\theta}$ should be used in finding estimators for restricted versions of the model.

5. Estimation Subject to Inequality Restrictions. The final topic we consider is the estimation of the parameter $\theta$ subject to inequality restrictions. For example, an integrable demand system must obey the condition that the Slutsky matrix of compensated price derivatives is negative semi-definite. The unconstrained estimator $\hat{\theta}$ need not obey these restrictions for finite samples; thus, it may be desirable to impose them. We represent such restrictions formally as:

$$ \phi_m(\theta) \geq 0, \qquad (m = 1, 2 \ldots M'), \qquad (18) $$

where we assume $\phi_m$ to be twice continuously differentiable in each component of $\theta$.

The inequality constrained estimator $\tilde{\theta}$ minimizes $S(\theta)$ subject to the constraints (18). This estimator corresponds to a saddlepoint of the Lagrangian function

$$ \mathcal{L} = S(\theta) + \lambda'\phi, \qquad (19) $$

where $\lambda$ is a vector of $M'$ Lagrange multipliers and $\phi$ is the vector of constraint functions. The Kuhn-Tucker (1951) conditions for a saddlepoint of this Lagrangian are:

$$ \frac{\partial \mathcal{L}}{\partial \theta} = \frac{\partial S(\theta)}{\partial \theta} + \lambda'\,\Phi_\phi(\theta) = 0, $$

and the complementary slackness condition:

$$ \lambda'\phi = 0, \qquad \lambda \leq 0, $$

where $\Phi_\phi(\theta)$ is the matrix with elements $\{\partial \phi_m / \partial \theta_l\}$.

To obtain the estimator $\tilde{\theta}$ we begin by linearizing the model as in (13). Next, we linearize the constraints as:

$$ \phi(\tilde{\theta}_{i+1}) = \Phi_\phi(\tilde{\theta}_i)\,\Delta\theta_i + \phi(\tilde{\theta}_i), $$

where $\tilde{\theta}_i$ is the current iteration value of the unknown parameters. We then apply Liew's (1976) inequality constrained linear three stage least squares method to the linear model, obtaining:

$$ \Delta\tilde{\theta}_i = \Delta\hat{\theta}_i + (M_{x0} + \bar{M}_x)^{-1}\,\Phi_\phi(\tilde{\theta}_i)'\,\lambda^*, $$

where $\Delta\hat{\theta}_i$ is given by (14) and $\lambda^*$ is the solution of the linear complementarity problem:

$$ \Phi_\phi(\tilde{\theta}_i)\,(M_{x0} + \bar{M}_x)^{-1}\,\Phi_\phi(\tilde{\theta}_i)'\,\lambda^* + [\Phi_\phi(\tilde{\theta}_i)\,\Delta\hat{\theta}_i + \phi(\tilde{\theta}_i)] \geq 0, $$
$$ [\Phi_\phi(\tilde{\theta}_i)\,\Delta\tilde{\theta}_i + \phi(\tilde{\theta}_i)]'\,\lambda^* = 0, \qquad \lambda^* \geq 0. $$

Given $\Delta\tilde{\theta}_i$ that satisfies the constraints (18), we update to $\tilde{\theta}_{i+1} = \tilde{\theta}_i + \Delta\tilde{\theta}_i$ and check that both $S(\tilde{\theta}_{i+1}) < S(\tilde{\theta}_i)$ and that $\phi_m(\tilde{\theta}_{i+1}) \geq 0$, $m = 1, 2 \ldots M'$. If not, we shrink the increment vector as before, until either improvement is found or the increment values fall in absolute value below a convergence criterion. This concludes our discussion of the NL3SLS estimator.

6. Conclusion and Applications. In this paper we have discussed the nonlinear three stage least squares method of pooling average time series and cross section data. There are two major advantages of this technique. The first is the identification of parameters and the gains in efficiency in estimation. For example, by pooling average time series and cross section data, models can be estimated that account for a large number of specific demographic effects in consumer behavior in both microeconomic and macroeconomic settings.
Such effects are difficult to identify or estimate precisely using aggregate time series data alone. Alternatively, the effects of time varying factors such as price levels that are constant across consumers in each time period may be impossible to identify using only data from tion survey. a single cross sec- Both effects can be estimated when cross section observations are pooled with average time series observations. The second major advantage of the nonlinear three stage least squares technique is ease of computation. for While exact aggregation models can allow on substantial nonl ineari ties in variables representing common influences - 19 - employed in pooled behavior as well as in parameters, cross section data are estimation through moment matrices. ing only one These matrices can be constructed utiliz- pass through each cross section data source. Tliis feature sub- iterations to estimate stantially reduces the time and expense of performing nonlinear model same model atri a the the cost of estimating several restricted versions of for hypothesis testing. this paper to models of We have applied the techniques described in aggregate consumer behavior for the United States. Models describing consumer by Jorgenson. budget allocation among broad commodity classes are presented Lau and Stoker (1980, 1981, series data from 1958-1974, 1982). These models are estimated from annual together with cross section data from 1972. time Ine- the resulting Slutquality constrained estimation is required to assure that sky matrices are negative semi-definite. to A model describing the allocation of total energy expenditures Tliis model specific energy types is presented by Jorgenson and Stoker (1983). is together estimated using annual time series average data from 1958-1978. with five cross section data bases. Parametric hypothesis tests are performed of strucusing the test for separability of preferences and the possibility tural change. Finally, Jorgenson, Slesnick. 
and Stoker (1983) have presented At the first stage the consumer budget is models of two stage budgeting. allocated between energy and nonenergy commodities. energy budget is allocated among types of energy. At the second stage the 20 - APPENDIX TECHNICAL ASSUMPTIONS : Below we list the assumptions required to establish consistency and asymptotic normality for the NL3SLS estimator. (1977) Assumptions 1-5 follow Oallaut and assumptions 6-7 follow Jorgenson and Gallant Assumpt ion The parameter space of €, say %, _1: is (1979). compact, with the true value an interior point. Assumption 2: The components of p^ (p^, e) (n = 1, 2 ... N) are twice continuously differential be in 6.. J For the next two assumptions, P ^ ^Pf, 6) to refer (^ ^Pf ,, «' J ae.' ae, 1 a 1 J 9) (p and se. ae. 1 J (p^., O) ' . The matrix Z'Z jj converges to The Cesaro sums, ^°'. Pn <Pt N ^ <ynkt - ^kt o o k Rpkt k o <ynkt - ^kt Pn Npkt o k (^kt pi <Pt iiAN.), J Assumption 3A (Cross Section). — •" e. where p^^ is the mth component of p^ o p-' J jLh^ _(i!AL - ^ae, definite matrix as N the notation to the vectors: J pij Pq we use • • ®)^<yjkt «)) (Pt • o - '^kt 00 Pj ^Pt «>> ' • • o «)> ' o converge almost surely uniformly in 6 (n, j = 1, 2 ... N) . The sums: a positive - 21 - k ^P"Pe" S'kt o k are bounded almost is ^lyt (^kt PJ' ^Pt o surely for all the sth component of Time Series ) ( definite matrix as T f^\ ^^it - ^t — 1, 2 ... N; s' = 1, 2 ... S) , where . K <Pf ®)^ <yjt Tj\Fr^ ^^nt - ^t Pn tJ\|^^ (I; Z'Z converges to =^ a positive ^Pf ®)>' - ^t pj (Pf «))• e)). pi (p,. -' 1^ fxsupQ I z^.^ (x^ 1t-_ ^isup^ I :' f.ij z^.^ (x^ P^J are bounded almost The matrix . The Cesaro sums: >°>. converge almost surely in 6 (n, al p;; j = 1, (p^. e))i. (p^. e))l. surely (n, j = 1, 2 2 ... — N) N) , . The sums. 
where z , is the sth component f Assumption lim N.T is = j o Assumption 3B z (n, z, o of «>^'' • o N The matrix: 4: + T xo nonsingular, where M xo X and ST are defined in equations (14) and (15) - 22 - Assumption mental variables 1 im i ts — f is the z ; — > k " solution of the almost sure e)) = .(n = 1. 2 ... N) . (API) P^ (p^, e)) = .(n = 1. 2 ... N) , (AP2) - x| t * 6: (Parameter Restriction). The function g(p) tinuously dif f erent iable mapping of There is only one point interior point of ^ P. 8n - , where g n a Pv twice con- is a compact set P into the parameter space in P which satisfies g(p) p The L x Pj component of . o true value ment of G is (p^ o ^1*1^7 Zj (y^t ^ Assumption ©. the only that is, (y^^^ - x'^ P^ kzj^t °° lim T and z identified by the instru- is : Um N © of (11) (Identification). 5.: matrix G(p*) has rank is the nth component R, of g(p) = and is an p where the n, and p. the is jtli j th J p. Assumption 7: (Normality). distributed for all k and t. The disturbances e and \. are normally ele- - 23 Footnotes 1. mators, 2. For detailed discussion of nonlinear three stage least squares esti(1977), Gallant see Amemiya The correspondence between individual and aggregate behavior is dis- cussed by Lau (1977. 3. 19G2) and Stoker (1982b). An alternative approach to aggregation is based on restrictions on the distribution of the variables 4. and Gallant and Jorgenson (1979). (1977), x^^^. for example. See, Stoker (1982a). See for example. Balestra and Nerlove (1966), Kmenta Mundlak (1978). Much of the discussion of the linear model focuses on the stochastic specification rather than the structural model; see, 5. and 3 a linear model has been surveyed by Dielman (1983). This stochastic specification is used in an exact aggregation model by Jorgenson, 6. for example, The literature on pooling cross section and average time Amemiya (1978). series data in and (1978), Lau, and Stoker (1980, The exclusion of \)^ 1981, 1982). 
from the cross section disturbances in the examples may appear to be somewhat arbitrary. Suppose instead that $\nu_{t_0} + \varepsilon_{kt_0}$ represent the cross section disturbances. The $\nu_{t_0}$ can be estimated as the difference between the estimate of the cross section constant term and the constant term applicable to the time series. Correlation between the resulting cross section and time series disturbances is then due only to the $\varepsilon$ terms, so that the effect of the transformation separating the two data sets is negligible.

7. This excludes the possibility that x is subject to measurement error; aggregate instruments would be required to deal with errors of measurement.

8. We assume that the variance of the disturbance, conditional on the instrumental variables, is constant for both cross section and average time series models. If this assumption is relaxed, efficiency gains are possible by adjusting the weighting matrix of equations (11) and (12). See White (1980a, 1980b, 1982) and Hansen (1982) for details.

9. The Gauss-Newton method for systems of nonlinear regression equations is discussed by Malinvaud (1980).

10. If the observations are transformed, the transformed data should be used here.

11. Matrix weighted averages are discussed in Chamberlain and Leamer (1976) and Mundlak (1978), among others.

12. This assumes that disturbances in different cross sections are uncorrelated, which requires transformations of the average data only. Overlapping cross sections require panel data techniques that are beyond the scope of this article.

References

Amemiya, T. (1977), "The Maximum Likelihood and Nonlinear Three-Stage Least Squares Estimator in the General Nonlinear Simultaneous Equations Model," Econometrica, Vol. 45, No. 4, May, pp. 955-968.

Amemiya, T. (1978), "A Note on a Random Coefficients Model," International Economic Review, Vol. 19, No. 3, October, pp. 793-796.

Balestra, P. and M. Nerlove (1966), "Pooling Cross Section and Time Series Data in the Estimation of a Dynamic Model: The Demand for Natural Gas,"
Econometrica, Vol. 34, No. 3, July, pp. 585-612.

Blackorby, C., R. Boyce, and R.R. Russell (1978), "Estimation of Demand Systems Generated by the Gorman Polar Form: A Generalization of the S-Branch Utility Tree," Econometrica, Vol. 46, No. 2, March, pp. 345-364.

Brown, M. and D.M. Heien (1972), "The S-Branch Utility Tree: A Generalization of the Linear Expenditure System," Econometrica, Vol. 40, No. 4, July, pp. 737-747.

Chamberlain, G. and E. Leamer (1976), "Matrix Weighted Averages and Posterior Bounds," Journal of the Royal Statistical Society, Series B, Vol. 38, pp. 73-84.

Deaton, A. and J. Muellbauer (1980a), "An Almost Ideal Demand System," American Economic Review, Vol. 70, No. 3, June, pp. 312-326.

Deaton, A. and J. Muellbauer (1980b), Economics and Consumer Behavior, Cambridge, Cambridge University Press.

Dielman, Terry E. (1983), "Pooled Cross-Sectional and Time Series Data: A Survey of Current Statistical Methodology," American Statistician, Vol. 37, No. 2, May, pp. 111-122.

Gallant, A.R. (1977), "Three-Stage Least-Squares Estimation for a System of Simultaneous, Nonlinear, Implicit Equations," Journal of Econometrics, Vol. 5, No. 1, January, pp. 71-88.

Gallant, A.R. and D.W. Jorgenson (1979), "Statistical Inference for a System of Simultaneous, Nonlinear, Implicit Equations in the Context of Instrumental Variable Estimation," Journal of Econometrics, Vol. 11, No. 2/3, October/December, pp. 275-302.

Hansen, L.P. (1982), "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, Vol. 50, No. 4, July, pp. 1029-1054.

Hartley, H.O. (1961), "The Modified Gauss-Newton Method for the Fitting of Non-Linear Regression Functions by Least Squares," Technometrics, Vol. 3, pp. 269-280.

Jorgenson, D.W. and J.J. Laffont (1974), "Efficient Estimation of Nonlinear Simultaneous Equations with Additive Disturbances," Annals of Economic and Social Measurement, Vol. 3, No. 4, October, pp. 615-640.

Jorgenson, D.W., L.J. Lau,
and T.M. Stoker (1980), "Welfare Comparison under Exact Aggregation," American Economic Review, Vol. 70, No. 2, May, pp. 268-272.

Jorgenson, D.W., L.J. Lau, and T.M. Stoker (1981), "Aggregate Consumer Behavior and Individual Welfare," in D. Currie, R. Nobay and D. Peel (eds.), Macroeconomic Analysis, London, Croom-Helm, pp. 35-61.

Jorgenson, D.W., L.J. Lau, and T.M. Stoker (1982), "The Transcendental Logarithmic Model of Aggregate Consumer Behavior," in R.L. Basmann and G. Rhodes (eds.), Advances in Econometrics, Vol. 1, Greenwich, JAI Press, pp. 97-238.

Jorgenson, D.W., D.T. Slesnick, and T.M. Stoker (1983), "Exact Aggregation over Individuals and Commodities," Discussion Paper 1005, Cambridge, Harvard Institute of Economic Research, August.

Jorgenson, D.W. and T.M. Stoker (1983), "Aggregate Consumer Expenditures on Energy," in J.R. Moroney (ed.), Advances in the Economics of Energy and Resources, Vol. 4, Greenwich, JAI Press, forthcoming.

Klein, L.R. and H. Rubin (1947-1948), "A Constant-Utility Index of the Cost of Living," Review of Economic Studies, Vol. 15(2), No. 38, pp. 84-87.

Kmenta, J. (1978), "Some Problems of Inference from Economic Survey Data," in N.K. Namboodiri (ed.), Survey Sampling and Measurement, New York, Academic Press, pp. 107-120.

Kuhn, H.W. and A.W. Tucker (1951), "Nonlinear Programming," in J. Neyman (ed.), Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, pp. 481-492.

Lau, L.J. (1977), "Existence Conditions for Aggregate Demand Functions," Technical Report No. 248, Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, California (revised February 1980).

Lau, L.J. (1982), "A Note on the Fundamental Theorem of Exact Aggregation," Economics Letters, Vol. 9, No. 2, pp. 119-126.

Liew, C.K. (1976), "A Two-Stage Least Squares Estimator with Inequality Restrictions on Parameters," Review of Economics and Statistics, Vol. 58, No. 2, May, pp. 234-238.

Malinvaud, E.
(1980), Statistical Methods of Econometrics, 3rd ed., Amsterdam, North-Holland.

Mundlak, Y. (1978), "On the Pooling of Time Series and Cross Section Data," Econometrica, Vol. 46, No. 1, January, pp. 69-86.

Stoker, T.M. (1982a), "The Use of Cross Section Data to Characterize Macro Functions," Journal of the American Statistical Association, June, pp. 369-380.

Stoker, T.M. (1982b), "Completeness, Distribution Restrictions and the Form of Aggregate Functions," M.I.T. Sloan School of Management Working Paper No. 1345-82, August.

Stone, R. (1954), "Linear Expenditure Systems and Demand Analysis: An Application to the Pattern of British Demand," Economic Journal, Vol. 64, No. 255, September, pp. 511-527.

White, H. (1980a), "Nonlinear Regression on Cross-Section Data," Econometrica, Vol. 48, No. 3, April, pp. 721-746.

White, H. (1980b), "A Heteroscedasticity-Consistent Covariance Matrix Estimator with a Direct Test for Heteroscedasticity," Econometrica, Vol. 48, No. 4, May, pp. 817-838.

White, H. (1982), "Instrumental Variables Regression with Independent Observations," Econometrica, Vol. 50, pp. 483-500.

Zellner, A. and H. Theil (1962), "Three-Stage Least Squares: Simultaneous Estimation of Simultaneous Equations," Econometrica, Vol. 30, No. 1, January, pp. 54-78.