Topics in Microeconometrics
William Greene
Department of Economics, Stern School of Business

Part 3: Basic Linear Panel Data Models

Cornwell and Rupert Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP   = work experience
WKS   = weeks worked
OCC   = occupation, 1 if blue collar
IND   = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA  = 1 if resides in a city (SMSA)
MS    = 1 if married
FEM   = 1 if female
UNION = 1 if wage set by union contract
ED    = years of education
BLK   = 1 if individual is black
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.

Balanced and Unbalanced Panels
• Distinction: Balanced vs. Unbalanced Panels
• A notation to help with mechanics: $z_{i,t},\ i = 1,\dots,N;\ t = 1,\dots,T_i$
• The role of the assumption
• Mathematical and notational convenience:
  • Balanced: $n = NT$
  • Unbalanced: $n = \sum_{i=1}^{N} T_i$
• Is the fixed Ti assumption ever necessary? Almost never.
• Is unbalancedness due to nonrandom attrition from an otherwise balanced panel? This would require special considerations.

Application: Health Care Usage
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
This is an unbalanced panel with 7,293 individuals. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).
(Downloaded from the JAE Archive)
Variables in the file are
DOCTOR   = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT     = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS   = number of doctor visits in last three months
HOSPVIS  = number of hospital visits in last calendar year
PUBLIC   = insured in public health insurance = 1; otherwise = 0
ADDON    = insured by add-on insurance = 1; otherwise = 0
HHNINC   = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped)
HHKIDS   = children under age 16 in the household = 1; otherwise = 0
EDUC     = years of schooling
AGE      = age in years
MARRIED  = marital status

An Unbalanced Panel: RWM's GSOEP Data on Health Care
N = 7,293 Households

Fixed and Random Effects
• Unobserved individual effects in regression: E[yit | xit, ci]
  Notation: $y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$
  $X_i = \begin{bmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT_i}' \end{bmatrix}$, with $T_i$ rows and $K$ columns
• Linear specification:
  Fixed Effects: E[ci | Xi] = g(Xi). Cov[xit, ci] ≠ 0; effects are correlated with included variables.
  Random Effects: E[ci | Xi] = 0. Cov[xit, ci] = 0

Convenient Notation
• Fixed Effects, the 'dummy variable model': $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$. Individual specific constant terms.
• Random Effects, the 'error components model': $y_{it} = x_{it}'\beta + \varepsilon_{it} + u_i$. Compound ("composed") disturbance.

Estimating β
• β is the partial effect of interest
• Can it be estimated (consistently) in the presence of (unmeasured) ci?
• Does pooled least squares "work?"
• Strategies for "controlling for ci" using the sample data

The Pooled Regression
• Presence of omitted effects
$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$, observation for person i at time t
$y_i = X_i\beta + c_i\mathbf{i} + \varepsilon_i$, $T_i$ observations in group i
$\phantom{y_i} = X_i\beta + \mathbf{c}_i + \varepsilon_i$, note $\mathbf{c}_i = (c_i, c_i, \dots, c_i)'$
$y = X\beta + \mathbf{c} + \varepsilon$, $\sum_{i=1}^{N} T_i$ observations in the sample
• Potential bias/inconsistency of OLS: depends on 'fixed' or 'random'

OLS with Individual Effects
$b = (X'X)^{-1}X'y$
$\phantom{b} = \beta + \left[\frac{1}{N}\sum_{i=1}^{N} X_i'X_i\right]^{-1}\left[\frac{1}{N}\sum_{i=1}^{N} X_i'\mathbf{c}_i\right]$ (part due to the omitted $c_i$)
$\phantom{b = \beta} + \left[\frac{1}{N}\sum_{i=1}^{N} X_i'X_i\right]^{-1}\left[\frac{1}{N}\sum_{i=1}^{N} X_i'\varepsilon_i\right]$ (covariance of X and ε will = 0)
The third term vanishes asymptotically by assumption.
$\text{plim } b = \beta + \text{plim}\left[\frac{1}{N}\sum_{i=1}^{N} X_i'X_i\right]^{-1}\left[\frac{1}{N}\sum_{i=1}^{N} T_i\,\bar{x}_i c_i\right]$ (left out variable formula)
So, what becomes of $\sum_{i=1}^{N} w_i\,\bar{x}_i c_i$?
plim b = β if the covariance of $\bar{x}_i$ and $c_i$ converges to zero.

Estimating the Sampling Variance of b
• $s^2(X'X)^{-1}$? Inappropriate because
  • Correlation across observations
  • (Possibly) Heteroscedasticity
• A 'robust' covariance matrix
  • Robust estimation (in general)
  • The White estimator
  • A Robust estimator for OLS.

Cluster Estimator
Robust variance estimator for Var[b]
$\text{Est.Var}[b] = (X'X)^{-1}\left[\sum_{i=1}^{N}\left(\sum_{t=1}^{T_i} x_{it}\hat{v}_{it}\right)\left(\sum_{t=1}^{T_i} x_{it}\hat{v}_{it}\right)'\right](X'X)^{-1}$
$\phantom{\text{Est.Var}[b]} = (X'X)^{-1}\left[\sum_{i=1}^{N}\sum_{t=1}^{T_i}\sum_{s=1}^{T_i}\hat{v}_{it}\hat{v}_{is}\,x_{it}x_{is}'\right](X'X)^{-1}$
$\hat{v}_{it}$ = a least squares residual, an estimate of $\varepsilon_{it} + c_i$
(If $T_i = 1$, this is the White estimator.)

Application: Cornwell and Rupert

Bootstrap variance for a panel data estimator
• Panel Bootstrap = Block Bootstrap
• Data set is N groups of size Ti
• Bootstrap sample is N groups of size Ti drawn with replacement.
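
The cluster estimator and the block bootstrap described above can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not the computation behind the Cornwell and Rupert results. The function names (`cluster_robust_cov`, `block_bootstrap_cov`), the simulated design, and the absence of any degrees-of-freedom correction are assumptions of the sketch.

```python
import numpy as np

def cluster_robust_cov(X, resid, groups):
    """Sandwich (X'X)^-1 [sum_i g_i g_i'] (X'X)^-1 with g_i = sum_t x_it * v_it.

    No degrees-of-freedom correction is applied (an assumption of this sketch).
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        rows = groups == g
        score = X[rows].T @ resid[rows]      # K-vector for group g
        meat += np.outer(score, score)
    return XtX_inv @ meat @ XtX_inv

def block_bootstrap_cov(X, y, groups, reps=200, seed=0):
    """Resample whole groups with replacement and recompute OLS each time."""
    rng = np.random.default_rng(seed)
    ids = np.unique(groups)
    rows_of = {g: np.flatnonzero(groups == g) for g in ids}
    draws = []
    for _ in range(reps):
        sampled = rng.choice(ids, size=ids.size, replace=True)
        rows = np.concatenate([rows_of[g] for g in sampled])
        draws.append(np.linalg.lstsq(X[rows], y[rows], rcond=None)[0])
    return np.cov(np.array(draws), rowvar=False)

# Simulated balanced panel (hypothetical data): N groups, T periods each.
rng = np.random.default_rng(42)
N, T = 60, 5
groups = np.repeat(np.arange(N), T)
c = rng.normal(size=N)                        # individual effects c_i
x = rng.normal(size=N * T)
y = 1.0 + 0.5 * x + c[groups] + rng.normal(size=N * T)
X = np.column_stack([np.ones(N * T), x])

b = np.linalg.lstsq(X, y, rcond=None)[0]      # pooled OLS
resid = y - X @ b
V_cluster = cluster_robust_cov(X, resid, groups)
V_boot = block_bootstrap_cov(X, y, groups)
```

Passing one group per observation to `cluster_robust_cov` reproduces the White estimator, which mirrors the remark that the two coincide when Ti = 1.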
Difference in Differences
With two periods,
$\Delta y_i = y_{i2} - y_{i1} = \delta_0 + (x_{i2} - x_{i1})'\beta + u_i$
Consider a "treatment, $D_i$," that takes place between time 1 and time 2 for some of the individuals
$\Delta y_i = \delta_0 + (\Delta x_i)'\beta + \delta_1 D_i + u_i$
$D_i$ = the "treatment dummy"
This is a linear regression model. If there are no regressors,
$\hat{\delta}_1 = \overline{\Delta y}\,|\,\text{treatment} - \overline{\Delta y}\,|\,\text{control}$ = "difference in differences" estimator.
$\hat{\delta}_0$ = average change in $y_i$ for the controls; $\hat{\delta}_0 + \hat{\delta}_1$ = average change in $y_i$ for the "treated"

Difference-in-Differences Model
With two periods and strict exogeneity of D and T,
$y_{it} = \delta_0 + \delta_1 D_{it} + \delta_2 T_t + \delta_3 T_t D_{it} + \varepsilon_{it}$
$D_{it}$ = dummy variable for a treatment that takes place between time 1 and time 2 for some of the individuals,
$T_t$ = a time period dummy variable, 0 in period 1, 1 in period 2.
This is a linear regression model. If there are no regressors, using least squares,
$b_3 = (\bar{y}_2 - \bar{y}_1)_{D=1} - (\bar{y}_2 - \bar{y}_1)_{D=0}$

Difference in Differences
$y_{it} = \delta_0 + \delta_1 D_{it} + \delta_2 T_t + \delta_3 D_{it}T_t + \beta'x_{it} + \varepsilon_{it},\quad t = 1, 2$
$\Delta y_{it} = \delta_2 + \delta_3 D_{i2} + (\beta'\Delta x_{it}) + \Delta\varepsilon_{it} = \delta_2 + \delta_3 D_{i2} + \beta'(\Delta x_{it}) + u_i$
$(\Delta y_{it}\,|\,D=1) - (\Delta y_{it}\,|\,D=0) = \delta_3 + \beta'\left[(\Delta x_{it}\,|\,D=1) - (\Delta x_{it}\,|\,D=0)\right]$
If the same individual is observed in both states, the second term is zero. If the effect is estimated by averaging individuals with D = 1 and different individuals with D = 0, then part of the 'effect' is explained by change in the covariates, not the treatment.

The Fixed Effects Model
$y_i = X_i\beta + d_i\alpha_i + \varepsilon_i$, for each individual
$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} X_1 & d_1 & 0 & \cdots & 0 \\ X_2 & 0 & d_2 & \cdots & 0 \\ \vdots & & & \ddots & \\ X_N & 0 & 0 & \cdots & d_N \end{bmatrix}\begin{bmatrix} \beta \\ \alpha \end{bmatrix} + \varepsilon = [X, D]\begin{bmatrix} \beta \\ \alpha \end{bmatrix} + \varepsilon = Z\delta + \varepsilon$
E[ci | Xi] = g(Xi); Effects are correlated with included variables. Cov[xit, ci] ≠ 0

Estimating the Fixed Effects Model
• The FEM is a plain vanilla regression model but with many independent variables
• Least squares is unbiased, consistent, efficient, but inconvenient if N is large.
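
To make the "plain vanilla regression with many independent variables" point concrete, the following sketch fits the dummy variable regression directly and compares it with least squares on group-mean deviations. The data are simulated and the names (`demean`, `b_lsdv`, `b_within`) are hypothetical; it only illustrates that the two computations return the same slopes and the same individual constants.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 30, 5, 2
groups = np.repeat(np.arange(N), T)
c = rng.normal(size=N)
# Regressors are built to be correlated with the individual effect.
X = rng.normal(size=(N * T, K)) + c[groups, None]
beta = np.array([0.5, -1.0])
y = X @ beta + c[groups] + rng.normal(scale=0.3, size=N * T)

# LSDV: regress y on X and a full set of N group dummies (no overall constant).
D = (groups[:, None] == np.arange(N)).astype(float)
Z = np.hstack([X, D])
delta = np.linalg.lstsq(Z, y, rcond=None)[0]
b_lsdv, a_lsdv = delta[:K], delta[K:]

counts = D.sum(axis=0)                        # T_i for each group

def demean(v):
    """Subtract group means from a vector or from each column of a matrix."""
    means = (D.T @ v) / (counts[:, None] if v.ndim == 2 else counts)
    return v - D @ means

# Within estimator: least squares on deviations from group means.
b_within = np.linalg.lstsq(demean(X), demean(y), rcond=None)[0]
# a_i = mean over t of (y_it - x_it'b)
a_within = (D.T @ (y - X @ b_within)) / counts
```

With N dummies the LSDV design matrix has N + K columns, which is why the deviations form is preferred when N is large.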
$\begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} X'X & X'D \\ D'X & D'D \end{bmatrix}^{-1}\begin{bmatrix} X'y \\ D'y \end{bmatrix}$
Using the Frisch-Waugh theorem, $b = [X'M_D X]^{-1}X'M_D y$

The Within Groups Transformation Removes the Effects
$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$
$\bar{y}_i = \bar{x}_i'\beta + c_i + \bar{\varepsilon}_i$
$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$
Use least squares to estimate β.

Least Squares Dummy Variable Estimator
• b is obtained by 'within' groups least squares (group mean deviations)
• a is estimated using the normal equations: D'Xb + D'Da = D'y, so $a = (D'D)^{-1}D'(y - Xb)$
  $a_i = (1/T_i)\sum_{t=1}^{T_i}(y_{it} - x_{it}'b) = \bar{e}_i$

Application Cornwell and Rupert
LSDV Results
Note huge changes in the coefficients. SMSA and MS change signs. Significance changes completely!
Pooled OLS

The Effect of the Effects

A Caution About Stata and R2
For the FE model above, R squared = 1 - (Residual Sum of Squares / Total Sum of Squares). Or is it? What is the total sum of squares?
Conventional: Total Sum of Squares = $\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - \bar{y})^2$, which gives R2 = 0.90542
"Within Sum of Squares" = $\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - \bar{y}_i)^2$, which gives R2 = 0.65142
Which should appear in the denominator of R2?

The coefficient estimates and standard errors are the same. The calculation of the R2 is different. In the areg procedure, you are estimating coefficients for each of your covariates plus each dummy variable for your groups. In the xtreg, fe procedure the R2 reported is obtained by only fitting a mean deviated model where the effects of the groups (all of the dummy variables) are assumed to be fixed quantities. So, all of the effects for the groups are simply subtracted out of the model and no attempt is made to quantify their overall effect on the fit of the model. Since the SSE is the same but the SST differs, the R2 = 1 − SSE/SST is very different. The difference is real in that we are making different assumptions with the two approaches. In the xtreg, fe approach, the effects of the groups are fixed and unestimated quantities are subtracted out of the model before the fit is performed.
In the areg approach, the group effects are estimated and affect the total sum of squares of the model under consideration.

Examining the Effects with a KDE
[Figure: kernel density estimate (with frequency histogram) of the estimated fixed effects, AI, from the Cornwell and Rupert wage model. Mean = 4.819, Standard deviation = 1.054.]

Robust Covariance Matrix for LSDV
Cluster Estimator for Within Estimator
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|OCC     |  -.02021     |   .01374007    | -1.471 |  .1412 |  .5111645|
|SMSA    |  -.04251**   |   .01950085    | -2.180 |  .0293 |  .6537815|
|MS      |  -.02946     |   .01913652    | -1.540 |  .1236 |  .8144058|
|EXP     |   .09666***  |   .00119162    | 81.114 |  .0000 | 19.853782|
+--------+--------------+----------------+--------+--------+----------+
+---------------------------------------------------------------------+
| Covariance matrix for the model is adjusted for data clustering.    |
| Sample of 4165 observations contained 595 clusters defined by       |
| 7 observations (fixed number) in each cluster.                      |
+---------------------------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|DOCC    |  -.02021     |   .01982162    | -1.020 |  .3078 |    .00000|
|DSMSA   |  -.04251     |   .03091685    | -1.375 |  .1692 |    .00000|
|DMS     |  -.02946     |   .02635035    | -1.118 |  .2635 |    .00000|
|DEXP    |   .09666***  |   .00176599    | 54.732 |  .0000 |    .00000|
+--------+--------------+----------------+--------+--------+----------+

Time Invariant Regressors
• A time invariant xit is one that does not vary over t, for every individual i.
E.g., sex dummy variable, FEM and ED (education in the Cornwell/Rupert data).
• If xit,k is invariant for all t, then the group mean deviations are all 0.

FE With Time Invariant Variables
+----------------------------------------------------+
| There are 3 vars. with no within group variation.  |
| FEM ED BLK                                         |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 EXP     |  .09671227   |   .00119137    | 81.177 |  .0000 | 19.8537815
 WKS     |  .00118483   |   .00060357    |  1.963 |  .0496 | 46.8115246
 OCC     | -.02145609   |   .01375327    | -1.560 |  .1187 |  .51116447
 SMSA    | -.04454343   |   .01946544    | -2.288 |  .0221 |  .65378151
 FEM     |  .000000     ......(Fixed Parameter).......
 ED      |  .000000     ......(Fixed Parameter).......
 BLK     |  .000000     ......(Fixed Parameter).......
+--------------------------------------------------------------------+
| Test Statistics for the Classical Model                            |
+--------------------------------------------------------------------+
| Model                     Log-Likelihood  Sum of Squares  R-squared |
|(1) Constant term only       -2688.80597       886.90494     .00000 |
|(2) Group effects only          27.58464       240.65119     .72866 |
|(3) X - variables only       -1688.12010       548.51596     .38154 |
|(4) X and group effects       2223.20087        83.85013     .90546 |
+--------------------------------------------------------------------+

Drop The Time Invariant Variables
Same Results
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 EXP     |  .09671227   |   .00119087    | 81.211 |  .0000 | 19.8537815
 WKS     |  .00118483   |   .00060332    |  1.964 |  .0495 | 46.8115246
 OCC     | -.02145609   |   .01374749    | -1.561 |  .1186 |  .51116447
 SMSA    | -.04454343   |   .01945725    | -2.289 |  .0221 |  .65378151
+--------------------------------------------------------------------+
| Test Statistics for the Classical Model                            |
+--------------------------------------------------------------------+
| Model                     Log-Likelihood  Sum of Squares  R-squared |
|(1) Constant term only       -2688.80597       886.90494     .00000 |
|(2) Group effects only          27.58464       240.65119     .72866 |
|(3) X - variables only       -1688.12010       548.51596     .38154 |
|(4) X and group effects       2223.20087        83.85013     .90546 |
+--------------------------------------------------------------------+
No change in the sum of squared residuals

Fixed Effects Vector Decomposition
Efficient Estimation of Time Invariant and Rarely Changing Variables in Finite Sample Panel Analyses with Unit Fixed Effects
Thomas Plümper and Vera Troeger, Political Analysis, 2007

Introduction
[T]he FE model … does not allow the estimation of time invariant variables. A second drawback of the FE model … results from its inefficiency in estimating the effect of variables that have very little within variance.
This article discusses a remedy to the related problems of estimating time invariant and rarely changing variables in FE models with unit effects.

The Model
$y_{it} = \alpha_i + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \varepsilon_{it}$
where $\alpha_i$ denote the N unit effects.

Fixed Effects Vector Decomposition
Step 1: Compute the fixed effects regression to get the "estimated unit effects." "We run this FE model with the sole intention to obtain estimates of the unit effects, αi."
$\hat{\alpha}_i = \bar{y}_i - \sum_{k=1}^{K} b_k^{FE}\,\bar{x}_{ki}$

Step 2
Regress $a_i$ on $z_i$ and compute residuals
$a_i = \sum_{m=1}^{M}\gamma_m z_{mi} + h_i$
$h_i$ is orthogonal to $z_i$ (since it is a residual).
Vector $h_i$ is expanded so each element $h_i$ is replicated $T_i$ times; h is the length of the full sample.

Step 3
Regress $y_{it}$ on a constant, X, Z and h using ordinary least squares to estimate α, β, γ, δ.
$y_{it} = \alpha + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \delta h_i + \varepsilon_{it}$
Notice that $\alpha_i$ in the original model has become $\alpha + h_i$ in the revised model.
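
The three steps can be traced on simulated data. The sketch below is an illustration under assumed names and an assumed data generating process, not the authors' code; Step 2 includes a constant, as the Step 2 output reported for the wage data does. It shows the mechanical outcome of the procedure: Step 3 returns the Step 1 slopes, the Step 2 coefficients, and a coefficient of exactly 1 on h.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 40, 4
groups = np.repeat(np.arange(N), T)            # data sorted by group, balanced
z = rng.normal(size=(N, 1))                    # one time invariant regressor
alpha = 1.0 + 0.8 * z[:, 0] + rng.normal(size=N)
X = rng.normal(size=(N * T, 2)) + alpha[groups, None]
y = X @ np.array([0.5, -1.0]) + alpha[groups] + rng.normal(scale=0.3, size=N * T)

def group_mean(v):
    """One row of means per group; relies on the balanced, sorted layout."""
    return v.reshape(N, T, -1).mean(axis=1)

# Step 1: fixed effects (within) regression and the estimated unit effects a_i.
xbar = group_mean(X)                           # N x 2
ybar = group_mean(y)[:, 0]                     # length N
b_fe = np.linalg.lstsq(X - xbar[groups], y - ybar[groups], rcond=None)[0]
a = ybar - xbar @ b_fe

# Step 2: regress a_i on a constant and z_i (N observations); keep residuals h_i.
Z1 = np.column_stack([np.ones(N), z])
g = np.linalg.lstsq(Z1, a, rcond=None)[0]
h = a - Z1 @ g

# Step 3: pooled OLS of y on a constant, X, expanded z and expanded h.
W = np.column_stack([np.ones(N * T), X, z[groups], h[groups]])
coef = np.linalg.lstsq(W, y, rcond=None)[0]
const3, beta3, gamma3, delta3 = coef[0], coef[1:3], coef[3], coef[4]
```

Because the LSDV residuals are orthogonal to every group-constant column, the Step 3 normal equations are solved by the Step 1 and Step 2 estimates with δ = 1, so Step 3 adds no new information about the coefficients.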
Step 1 (Based on full sample)
These 3 variables have no within group variation. FEM ED BLK
F.E. estimates are based on a generalized inverse.
--------+---------------------------------------------------------
        |               Standard            Prob.      Mean
   LWAGE|  Coefficient    Error       z     z>|Z|      of X
--------+---------------------------------------------------------
     EXP|    .09663***   .00119    81.13    .0000    19.8538
     WKS|    .00114*     .00060     1.88    .0600    46.8115
     OCC|   -.02496*     .01390    -1.80    .0724     .51116
     IND|    .02042      .01558     1.31    .1899     .39544
   SOUTH|   -.00091      .03457     -.03    .9791     .29028
    SMSA|   -.04581**    .01955    -2.34    .0191     .65378
   UNION|    .03411**    .01505     2.27    .0234     .36399
     FEM|    .000    .....(Fixed Parameter).....      .11261
      ED|    .000    .....(Fixed Parameter).....     12.8454
     BLK|    .000    .....(Fixed Parameter).....      .07227
--------+---------------------------------------------------------

Step 2 (Based on 595 observations)
--------+---------------------------------------------------------
        |               Standard            Prob.      Mean
     UHI|  Coefficient    Error       z     z>|Z|      of X
--------+---------------------------------------------------------
Constant|   2.88090***   .07172    40.17    .0000
     FEM|   -.09963**    .04842    -2.06    .0396     .11261
      ED|    .14616***   .00541    27.02    .0000    12.8454
     BLK|   -.27615***   .05954    -4.64    .0000     .07227
--------+---------------------------------------------------------

Step 3!
--------+---------------------------------------------------------
        |               Standard            Prob.      Mean
   LWAGE|  Coefficient    Error       z     z>|Z|      of X
--------+---------------------------------------------------------
Constant|   2.88090***   .03282    87.78    .0000
     EXP|    .09663***   .00061   157.53    .0000    19.8538
     WKS|    .00114***   .00044     2.58    .0098    46.8115
     OCC|   -.02496***   .00601    -4.16    .0000     .51116
     IND|    .02042***   .00479     4.26    .0000     .39544
   SOUTH|   -.00091      .00510     -.18    .8590     .29028
    SMSA|   -.04581***   .00506    -9.06    .0000     .65378
   UNION|    .03411***   .00521     6.55    .0000     .36399
     FEM|   -.09963***   .00767   -13.00    .0000     .11261
      ED|    .14616***   .00122   120.19    .0000    12.8454
     BLK|   -.27615***   .00894   -30.90    .0000     .07227
      HI|   1.00000***   .00670   149.26    .0000   -.103D-13
--------+---------------------------------------------------------

The Magic
Step 1, Step 2, Step 3

What happened here?
$y_{it} = \alpha_i + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \varepsilon_{it}$
where $\alpha_i$ denote the N unit effects.
An assumption is added along the way: $\text{Cov}(\alpha_i, Z_i) = 0$. This is exactly the number of orthogonality assumptions needed to identify γ. It is not part of the original model.

The Random Effects Model
• The random effects model
$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$, observation for person i at time t
$y_i = X_i\beta + c_i\mathbf{i} + \varepsilon_i$, $T_i$ observations in group i
$\phantom{y_i} = X_i\beta + \mathbf{c}_i + \varepsilon_i$, note $\mathbf{c}_i = (c_i, c_i, \dots, c_i)'$
$y = X\beta + \mathbf{c} + \varepsilon$, $\sum_{i=1}^{N} T_i$ observations in the sample
$\mathbf{c} = (\mathbf{c}_1', \mathbf{c}_2', \dots, \mathbf{c}_N')'$, a $\sum_{i=1}^{N} T_i$ by 1 vector
• ci is uncorrelated with xit for all t; E[ci | Xi] = 0; E[εit | Xi, ci] = 0

Error Components Model
A Generalized Regression Model
$y_{it} = x_{it}'\beta + \varepsilon_{it} + u_i$
$E[\varepsilon_{it} \mid X_i] = 0,\quad E[\varepsilon_{it}^2 \mid X_i] = \sigma_\varepsilon^2$
$E[u_i \mid X_i] = 0,\quad E[u_i^2 \mid X_i] = \sigma_u^2$
$y_i = X_i\beta + \varepsilon_i + u_i\mathbf{i}$ for $T_i$ observations
$\text{Var}[\varepsilon_i + u_i\mathbf{i}] = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2 \end{bmatrix} = \Omega_i$

Random vs. Fixed Effects
• Random Effects
  • Small number of parameters
  • Efficient estimation
  • Objectionable orthogonality assumption ($c_i \perp X_i$)
• Fixed Effects
  • Robust: generally consistent
  • Large number of parameters

Ordinary Least Squares
• Standard results for OLS in a GR model
  • Consistent
  • Unbiased
  • Inefficient
• True variance of the least squares estimator
$\text{Var}[b \mid X] = \frac{1}{\sum_{i=1}^{N} T_i}\left[\frac{X'X}{\sum_{i=1}^{N} T_i}\right]^{-1}\left[\frac{X'\Omega X}{\sum_{i=1}^{N} T_i}\right]\left[\frac{X'X}{\sum_{i=1}^{N} T_i}\right]^{-1} \rightarrow 0 \times Q^{-1}\,Q^{*}\,Q^{-1} \rightarrow 0$ as $N \rightarrow \infty$

Estimating the Variance for OLS
$\text{Var}[b \mid X] = \frac{1}{\sum_{i=1}^{N} T_i}\left[\frac{X'X}{\sum_{i=1}^{N} T_i}\right]^{-1}\left[\frac{X'\Omega X}{\sum_{i=1}^{N} T_i}\right]\left[\frac{X'X}{\sum_{i=1}^{N} T_i}\right]^{-1}$
In the spirit of the White estimator, use
$\frac{X'\hat{\Omega}X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i\,\frac{X_i'\hat{w}_i\hat{w}_i'X_i}{T_i},\quad \hat{w}_i = y_i - X_i b,\quad f_i = \frac{T_i}{\sum_{i=1}^{N} T_i}$
Hypothesis tests are then based on Wald statistics.
THIS IS THE 'CLUSTER' ESTIMATOR

OLS Results for Cornwell and Rupert
+----------------------------------------------------+
| Residuals  Sum of squares      =   522.2008        |
|            Standard error of e =   .3544712        |
| Fit        R-squared           =   .4112099        |
|            Adjusted R-squared  =   .4100766        |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant    5.40159723      .04838934      111.628   .0000
 EXP          .04084968      .00218534       18.693   .0000   19.8537815
 EXPSQ       -.00068788      .480428D-04    -14.318   .0000   514.405042
 OCC         -.13830480      .01480107       -9.344   .0000    .51116447
 SMSA         .14856267      .01206772       12.311   .0000    .65378151
 MS           .06798358      .02074599        3.277   .0010    .81440576
 FEM         -.40020215      .02526118      -15.843   .0000    .11260504
 UNION        .09409925      .01253203        7.509   .0000    .36398559
 ED           .05812166      .00260039       22.351   .0000   12.8453782

Alternative Variance Estimators
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Constant    5.40159723      .04838934      111.628   .0000
 EXP          .04084968      .00218534       18.693   .0000
 EXPSQ       -.00068788      .480428D-04    -14.318   .0000
 OCC         -.13830480      .01480107       -9.344   .0000
 SMSA         .14856267      .01206772       12.311   .0000
 MS           .06798358      .02074599        3.277   .0010
 FEM         -.40020215      .02526118      -15.843   .0000
 UNION        .09409925      .01253203        7.509   .0000
 ED           .05812166      .00260039       22.351   .0000
 Robust (Cluster)
 Constant    5.40159723      .10156038       53.186   .0000
 EXP          .04084968      .00432272        9.450   .0000
 EXPSQ       -.00068788      .983981D-04     -6.991   .0000
 OCC         -.13830480      .02772631       -4.988   .0000
 SMSA         .14856267      .02423668        6.130   .0000
 MS           .06798358      .04382220        1.551   .1208
 FEM         -.40020215      .04961926       -8.065   .0000
 UNION        .09409925      .02422669        3.884   .0001
 ED           .05812166      .00555697       10.459   .0000

Generalized Least Squares
GLS is equivalent to OLS regression of
$y_{it}^{*} = y_{it} - \theta_i\,\bar{y}_{i.}$ on $x_{it}^{*} = x_{it} - \theta_i\,\bar{x}_{i.}$, where $\theta_i = 1 - \dfrac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T_i\sigma_u^2}}$
$\text{Asy.Var}[\hat{\beta}] = [X'\Omega^{-1}X]^{-1} = \sigma_\varepsilon^2\,[X^{*\prime}X^{*}]^{-1}$

Estimators for the Variances
$y_{it} = x_{it}'\beta + \varepsilon_{it} + u_i$
Using the OLS estimator of β, $b_{OLS}$,
$\dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a - x_{it}'b)^2}{\sum_{i=1}^{N} T_i - 1 - K}$ estimates $\sigma_\varepsilon^2 + \sigma_u^2$
With the LSDV estimates, $a_i$ and $b_{LSDV}$,
$\dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - x_{it}'b)^2}{\sum_{i=1}^{N} T_i - N - K}$ estimates $\sigma_\varepsilon^2$
Using the difference of the two,
$\dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a - x_{it}'b)^2}{\sum_{i=1}^{N} T_i - 1 - K} - \dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - x_{it}'b)^2}{\sum_{i=1}^{N} T_i - N - K}$ estimates $\sigma_u^2$

Practical Problems with FGLS
The preceding regularly produce negative estimates of $\sigma_u^2$.
Estimation is made very complicated in unbalanced panels.
A bulletproof solution (originally used in TSP, now NLOGIT and others).
From the robust LSDV estimator: $\hat{\sigma}_\varepsilon^2 = \dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^{N} T_i}$
From the pooled OLS estimator: $\text{Est}(\sigma_\varepsilon^2 + \sigma_u^2) = \dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_{OLS} - x_{it}'b_{OLS})^2}{\sum_{i=1}^{N} T_i} \geq \hat{\sigma}_\varepsilon^2$
$\hat{\sigma}_u^2 = \dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_{OLS} - x_{it}'b_{OLS})^2 - \sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^{N} T_i} \geq 0$

Stata Variance Estimators
$\hat{\sigma}_\varepsilon^2 = \dfrac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^{N} T_i - K - N} > 0$, based on FE estimates
$\hat{\sigma}_u^2 = \text{Max}\left\{0,\ \dfrac{SSE(\text{group means})}{N - A} - \dfrac{(N - K)\,\hat{\sigma}_\varepsilon^2}{(N - A)\,\bar{T}}\right\} \geq 0$
where A = K or, if $\hat{\sigma}_u^2$ is negative, A = trace of a matrix that somewhat resembles $I_K$.
Many other adjustments exist. None guaranteed to be positive. No optimality properties or even guaranteed consistency.

Application
+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates: Var[e]               = .231188D-01    |
|            Var[u]               = .102531D+00    |
|            Corr[v(i,t),v(i,s)]  = .816006        |
| Variance estimators are based on OLS residuals.  |
+--------------------------------------------------+
No problems arise in this sample.
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 EXP          .08819204      .00224823       39.227   .0000   19.8537815
 EXPSQ       -.00076604      .496074D-04    -15.442   .0000   514.405042
 OCC         -.04243576      .01298466       -3.268   .0011    .51116447
 SMSA        -.03404260      .01620508       -2.101   .0357    .65378151
 MS          -.06708159      .01794516       -3.738   .0002    .81440576
 FEM         -.34346104      .04536453       -7.571   .0000    .11260504
 UNION        .05752770      .01350031        4.261   .0000    .36398559
 ED           .11028379      .00510008       21.624   .0000   12.8453782
 Constant    4.01913257      .07724830       52.029   .0000

Testing for Effects: An LM Test
Breusch and Pagan Lagrange Multiplier statistic
$y_{it} = x_{it}'\beta + u_i + \varepsilon_{it},\quad \begin{pmatrix} u_i \\ \varepsilon_{it} \end{pmatrix} \sim \text{Normal}\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_u^2 & 0 \\ 0 & \sigma_\varepsilon^2 \end{pmatrix}\right]$
$H_0: \sigma_u^2 = 0$
$LM = \dfrac{\left(\sum_{i=1}^{N} T_i\right)^2}{2\sum_{i=1}^{N} T_i(T_i - 1)}\left[\dfrac{\sum_{i=1}^{N}(T_i\bar{e}_i)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T_i} e_{it}^2} - 1\right]^2 \rightarrow \chi^2[1]$

Application: Cornwell-Rupert

Hausman Test for FE vs. RE
Estimator               Random Effects, E[ci|Xi] = 0    Fixed Effects, E[ci|Xi] ≠ 0
FGLS (Random Effects)   Consistent and Efficient        Inconsistent
LSDV (Fixed Effects)    Consistent, Inefficient         Consistent, Possibly Efficient

Computing the Hausman Statistic
$\text{Est.Var}[\hat{\beta}_{FE}] = \hat{\sigma}_\varepsilon^2\left[\sum_{i=1}^{N} X_i'\left(I - \frac{1}{T_i}\mathbf{i}\mathbf{i}'\right)X_i\right]^{-1}$
$\text{Est.Var}[\hat{\beta}_{RE}] = \hat{\sigma}_\varepsilon^2\left[\sum_{i=1}^{N} X_i'\left(I - \frac{\hat{\gamma}_i}{T_i}\mathbf{i}\mathbf{i}'\right)X_i\right]^{-1},\quad 0 \leq \hat{\gamma}_i = \dfrac{T_i\hat{\sigma}_u^2}{\hat{\sigma}_\varepsilon^2 + T_i\hat{\sigma}_u^2} \leq 1$
As long as $\hat{\sigma}_\varepsilon^2$ and $\hat{\sigma}_u^2$ are consistent, as $N \rightarrow \infty$, $\text{Est.Var}[\hat{\beta}_{FE}] - \text{Est.Var}[\hat{\beta}_{RE}]$ will be nonnegative definite. In a finite sample, to ensure this, both must be computed using the same estimate of $\hat{\sigma}_\varepsilon^2$. The one based on LSDV will generally be the better choice.
Note that columns of zeros will appear in $\text{Est.Var}[\hat{\beta}_{FE}]$ if there are time invariant variables in X.
β does not contain the constant term in the preceding.
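
A Hausman statistic along these lines can be sketched with NumPy on a simulated balanced panel. The data, the variable names, and the choice of variance estimators (LSDV residuals for the disturbance variance, and the pooled-minus-LSDV difference truncated at zero for the effect variance) are assumptions of this illustration. Both covariance matrices use the same LSDV-based estimate of the disturbance variance, as recommended above.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 200, 5, 2
groups = np.repeat(np.arange(N), T)            # sorted, balanced panel
c = rng.normal(size=N)
X = rng.normal(size=(N * T, K)) + c[groups, None]   # x is correlated with c
y = X @ np.array([0.5, -1.0]) + c[groups] + rng.normal(scale=0.3, size=N * T)

def expanded_means(v):
    """Group means repeated for every observation in the group."""
    return v.reshape(N, T, -1).mean(axis=1)[groups]

Xbar = expanded_means(X)
ybar = expanded_means(y)[:, 0]

# Fixed effects (within) estimator and its covariance matrix.
Xd, yd = X - Xbar, y - ybar
b_fe = np.linalg.lstsq(Xd, yd, rcond=None)[0]
e_fe = yd - Xd @ b_fe
s2_e = e_fe @ e_fe / (N * T - N - K)
V_fe = s2_e * np.linalg.inv(Xd.T @ Xd)

# Pooled OLS residual variance estimates sigma_e^2 + sigma_u^2.
X1 = np.column_stack([np.ones(N * T), X])
e_ols = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
s2_u = max(0.0, e_ols @ e_ols / (N * T - 1 - K) - s2_e)

# Random effects by FGLS: OLS on theta-differenced data (constant included).
theta = 1.0 - np.sqrt(s2_e / (s2_e + T * s2_u))
Xs = X1 - theta * np.column_stack([np.ones(N * T), Xbar])
ys = y - theta * ybar
b_re = np.linalg.lstsq(Xs, ys, rcond=None)[0][1:]       # drop the constant
V_re = (s2_e * np.linalg.inv(Xs.T @ Xs))[1:, 1:]

# Hausman statistic on the K slope coefficients.
d = b_fe - b_re
H = d @ np.linalg.solve(V_fe - V_re, d)
```

In this design the effects are correlated with the regressors by construction, so H should be far above the 95 percent chi-squared critical value for 2 degrees of freedom (about 5.99).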
Hausman Test
+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates: Var[e]               = .235368D-01    |
|            Var[u]               = .110254D+00    |
|            Corr[v(i,t),v(i,s)]  = .824078        |
| Lagrange Multiplier Test vs. Model (3) = 3797.07 |
| ( 1 df, prob value = .000000)                    |
| (High values of LM favor FEM/REM over CR model.) |
| Fixed vs. Random Effects (Hausman)     = 2632.34 |
| ( 4 df, prob value = .000000)                    |
| (High (low) values of H favor FEM (REM).)        |
+--------------------------------------------------+

Variable Addition
A Fixed Effects Model
$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
LSDV estimator, deviations from group means: To estimate β, regress $(y_{it} - \bar{y}_i)$ on $(x_{it} - \bar{x}_i)$
Algebraic equivalent: OLS regress $y_{it}$ on $(x_{it}, \bar{x}_i)$
Mundlak interpretation: $\alpha_i = \alpha + \bar{x}_i'\delta + u_i$
Model becomes $y_{it} = \alpha + \bar{x}_i'\delta + u_i + x_{it}'\beta + \varepsilon_{it} = \alpha + \bar{x}_i'\delta + x_{it}'\beta + \varepsilon_{it} + u_i$
= a random effects model with the group means. Estimate by FGLS.

A Variable Addition Test
• Asymptotic equivalent to Hausman
• Also equivalent to Mundlak formulation
• In the random effects model, using FGLS
  • Only applies to time varying variables
  • Add expanded group means to the regression (i.e., observation i,t gets same group means for all t).
  • Use Wald test to test for coefficients on means equal to 0. Large chi-squared weighs against random effects specification.

Fixed Effects
+----------------------------------------------------+
| Panel:Groups  Empty 0, Valid data 595              |
|               Smallest 7, Largest 7                |
|               Average group size 7.00              |
| There are 3 vars. with no within group variation.  |
| ED BLK FEM                                         |
| Look for huge standard errors and fixed parameters.|
| F.E. results are based on a generalized inverse.   |
| They will be highly erratic. (Problematic model.)  |
| Unable to compute std.errors for dummy var. coeffs.|
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|WKS     |   .00083     |   .00060003    |  1.381 |  .1672 | 46.811525|
|OCC     |  -.02157     |   .01379216    | -1.564 |  .1178 |  .5111645|
|IND     |   .01888     |   .01545450    |  1.221 |  .2219 |  .3954382|
|SOUTH   |   .00039     |   .03429053    |   .011 |  .9909 |  .2902761|
|SMSA    |  -.04451**   |   .01939659    | -2.295 |  .0217 |  .6537815|
|UNION   |   .03274**   |   .01493217    |  2.192 |  .0283 |  .3639856|
|EXP     |   .11327***  |   .00247221    | 45.819 |  .0000 | 19.853782|
|EXPSQ   |  -.00042***  |   .546283D-04  | -7.664 |  .0000 | 514.40504|
|ED      |   .000       ......(Fixed Parameter).......                |
|BLK     |   .000       ......(Fixed Parameter).......                |
|FEM     |   .000       ......(Fixed Parameter).......                |
+--------+--------------+----------------+--------+--------+----------+

Random Effects
+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates: Var[e]               = .235368D-01    |
|            Var[u]               = .110254D+00    |
|            Corr[v(i,t),v(i,s)]  = .824078        |
| Lagrange Multiplier Test vs. Model (3) = 3797.07 |
| ( 1 df, prob value = .000000)                    |
| (High values of LM favor FEM/REM over CR model.) |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|WKS     |   .00094     |   .00059308    |  1.586 |  .1128 | 46.811525|
|OCC     |  -.04367***  |   .01299206    | -3.361 |  .0008 |  .5111645|
|IND     |   .00271     |   .01373256    |   .197 |  .8434 |  .3954382|
|SOUTH   |  -.00664     |   .02246416    |  -.295 |  .7677 |  .2902761|
|SMSA    |  -.03117*    |   .01615455    | -1.930 |  .0536 |  .6537815|
|UNION   |   .05802***  |   .01349982    |  4.298 |  .0000 |  .3639856|
|EXP     |   .08744***  |   .00224705    | 38.913 |  .0000 | 19.853782|
|EXPSQ   |  -.00076***  |   .495876D-04  |-15.411 |  .0000 | 514.40504|
|ED      |   .10724***  |   .00511463    | 20.967 |  .0000 | 12.845378|
|BLK     |  -.21178***  |   .05252013    | -4.032 |  .0001 |  .0722689|
|FEM     |  -.24786***  |   .04283536    | -5.786 |  .0000 |  .1126050|
|Constant|  3.97756***  |   .08178139    | 48.637 |  .0000 |          |
+--------+--------------+----------------+--------+--------+----------+

The Hausman Test, by Hand
--> matrix; br=b(1:8) ; vr=varb(1:8,1:8)$
--> matrix ; db = bf - br ; dv = vf - vr $
--> matrix ; list ; h =db'<dv>db$
Matrix H has 1 rows and 1 columns.
               1
        +--------------
       1|  2523.64910
--> calc;list;ctb(.95,8)$
+------------------------------------+
| Listed Calculator Results          |
+------------------------------------+
Result = 15.507313

Means Added to REM - Mundlak
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|WKS     |   .00083     |   .00060070    |  1.380 |  .1677 | 46.811525|
|OCC     |  -.02157     |   .01380769    | -1.562 |  .1182 |  .5111645|
|IND     |   .01888     |   .01547189    |  1.220 |  .2224 |  .3954382|
|SOUTH   |   .00039     |   .03432914    |   .011 |  .9909 |  .2902761|
|SMSA    |  -.04451**   |   .01941842    | -2.292 |  .0219 |  .6537815|
|UNION   |   .03274**   |   .01494898    |  2.190 |  .0285 |  .3639856|
|EXP     |   .11327***  |   .00247500    | 45.768 |  .0000 | 19.853782|
|EXPSQ   |  -.00042***  |   .546898D-04  | -7.655 |  .0000 | 514.40504|
|ED      |   .05199***  |   .00552893    |  9.404 |  .0000 | 12.845378|
|BLK     |  -.16983***  |   .04456572    | -3.811 |  .0001 |  .0722689|
|FEM     |  -.41306***  |   .03732204    |-11.067 |  .0000 |  .1126050|
|WKSB    |   .00863**   |   .00363907    |  2.371 |  .0177 | 46.811525|
|OCCB    |  -.14656***  |   .03640885    | -4.025 |  .0001 |  .5111645|
|INDB    |   .04142     |   .02976363    |  1.392 |  .1640 |  .3954382|
|SOUTHB  |  -.05551     |   .04297816    | -1.292 |  .1965 |  .2902761|
|SMSAB   |   .21607***  |   .03213205    |  6.724 |  .0000 |  .6537815|
|UNIONB  |   .08152**   |   .03266438    |  2.496 |  .0126 |  .3639856|
|EXPB    |  -.08005***  |   .00533603    |-15.002 |  .0000 | 19.853782|
|EXPSQB  |  -.00017     |   .00011763    | -1.416 |  .1567 | 514.40504|
|Constant|  5.19036***  |   .20147201    | 25.762 |  .0000 |          |
+--------+--------------+----------------+--------+--------+----------+

Wu (Variable Addition) Test
--> matrix ; bm=b(12:19);vm=varb(12:19,12:19)$
--> matrix ; list ; wu = bm'<vm>bm $
Matrix WU has 1 rows and 1 columns.
               1
        +--------------
       1|  3004.38076
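
The variable addition idea can be illustrated on a simulated unbalanced panel. Note the substitution: the results above come from FGLS, whereas this sketch uses pooled OLS with a cluster-robust covariance matrix, chosen only to keep the example short. All names and the data generating process are hypothetical. The sketch shows two things: adding expanded group means makes the coefficients on the time varying regressors equal to the within estimates, and a Wald statistic on the mean coefficients has the quadratic form used in the Wu test.

```python
import numpy as np

rng = np.random.default_rng(11)
N, K = 100, 2
sizes = rng.integers(2, 7, size=N)             # T_i between 2 and 6 (unbalanced)
groups = np.repeat(np.arange(N), sizes)
n = groups.size
c = rng.normal(size=N)
X = rng.normal(size=(n, K)) + c[groups, None]  # regressors correlated with c
y = 1.0 + X @ np.array([0.5, -1.0]) + c[groups] + rng.normal(scale=0.3, size=n)

# Expanded group means: observation (i,t) gets the mean of group i.
sums = np.zeros((N, K))
np.add.at(sums, groups, X)
Xbar = (sums / sizes[:, None])[groups]
ybar = (np.bincount(groups, weights=y) / sizes)[groups]

# Within (fixed effects) slopes for comparison.
b_within = np.linalg.lstsq(X - Xbar, y - ybar, rcond=None)[0]

# Pooled OLS of y on a constant, x_it and the expanded group means.
W = np.column_stack([np.ones(n), X, Xbar])
coef = np.linalg.lstsq(W, y, rcond=None)[0]
b_x, b_means = coef[1:1 + K], coef[1 + K:]

# Cluster-robust covariance (no small-sample correction) and the Wald statistic.
resid = y - W @ coef
WtW_inv = np.linalg.inv(W.T @ W)
meat = np.zeros((W.shape[1], W.shape[1]))
for g in range(N):
    rows = groups == g
    score = W[rows].T @ resid[rows]
    meat += np.outer(score, score)
V = WtW_inv @ meat @ WtW_inv
V_means = V[1 + K:, 1 + K:]
wald = b_means @ np.linalg.solve(V_means, b_means)   # compare with chi-squared, K df
```

The equality of `b_x` and `b_within` is exact algebra, not a sampling result, which is why the time varying coefficients in the table with means added match the fixed effects estimates.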
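A Hierarchical Linear Model Interpretation of the FE Model

$$ y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + c_i + \varepsilon_{it}, \quad (\mathbf{x} \text{ does not contain a constant}) $$
$$ E[\varepsilon_{it} \mid \mathbf{X}_i, c_i] = 0, \quad \mathrm{Var}[\varepsilon_{it} \mid \mathbf{X}_i, c_i] = \sigma_\varepsilon^2 $$
$$ c_i = \alpha + \mathbf{z}_i'\boldsymbol{\delta} + u_i, $$
$$ E[u_i \mid \mathbf{z}_i] = 0, \quad \mathrm{Var}[u_i \mid \mathbf{z}_i] = \sigma_u^2 $$
$$ y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + [\alpha + \mathbf{z}_i'\boldsymbol{\delta} + u_i] + \varepsilon_{it} $$

Hierarchical Linear Model as REM

+--------------------------------------------------+
| Random Effects Model: v(i,t) = e(i,t) + u(i)     |
| Estimates:  Var[e]              = .235368D-01    |
|             Var[u]              = .110254D+00    |
|             Corr[v(i,t),v(i,s)] = .824078        |
| Sigma(u) = 0.3303                                |
+--------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
 OCC     |   -.03908144      .01298962     -3.009    .0026   .51116447
 SMSA    |   -.03881553      .01645862     -2.358    .0184   .65378151
 MS      |   -.06557030      .01815465     -3.612    .0003   .81440576
 EXP     |    .05737298      .00088467     64.852    .0000  19.8537815
 FEM     |   -.34715010      .04681514     -7.415    .0000   .11260504
 ED      |    .11120152      .00525209     21.173    .0000  12.8453782
 Constant|   4.24669585      .07763394     54.702    .0000

Evolution: Correlated Random Effects

Unknown parameters
$$ y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}, \quad [\alpha_1, \alpha_2, \ldots, \alpha_N, \boldsymbol{\beta}, \sigma_\varepsilon^2] $$
Standard estimation based on LS (dummy variables)
Ambiguous definition of the distribution of y_it

Effects model, nonorthogonality, heterogeneity
$$ y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}, \quad E[\alpha_i \mid \mathbf{X}_i] = g(\mathbf{X}_i) \neq 0 $$
Contrast to random effects: $E[\alpha_i \mid \mathbf{X}_i] = \alpha$
Standard estimation (still) based on LS (dummy variables)

Correlated random effects, more detailed model
$$ y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}, \quad P[\alpha_i \mid \mathbf{X}_i] = g(\mathbf{X}_i) \neq 0 $$
Linear projection?
$$ \alpha_i = \bar{\mathbf{x}}_i'\boldsymbol{\delta} + u_i, \quad \mathrm{Cor}(u_i, \bar{\mathbf{x}}_i) = 0 $$

Mundlak's Estimator

Mundlak, Y., "On the Pooling of Time Series and Cross Section Data," Econometrica, 46, 1978, pp. 69-85.

Write
$$ c_i = \bar{\mathbf{x}}_i'\boldsymbol{\delta} + u_i, \quad E[c_i \mid \mathbf{x}_{i1}, \mathbf{x}_{i2}, \ldots, \mathbf{x}_{iT_i}] = \bar{\mathbf{x}}_i'\boldsymbol{\delta} $$
Assume c_i contains all time invariant information. With i denoting a T_i x 1 column of ones,
$$ \mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + c_i\mathbf{i} + \boldsymbol{\varepsilon}_i, \quad T_i \text{ observations in group } i $$
$$ \phantom{\mathbf{y}_i} = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{i}\bar{\mathbf{x}}_i'\boldsymbol{\delta} + \boldsymbol{\varepsilon}_i + u_i\mathbf{i} $$
Looks like random effects.
$$ \mathrm{Var}[\boldsymbol{\varepsilon}_i + u_i\mathbf{i}] = \boldsymbol{\Omega}_i + \sigma_u^2\mathbf{i}\mathbf{i}' $$
This is the model we used for the Wu test.

Correlated Random Effects

Mundlak
$$ c_i = \bar{\mathbf{x}}_i'\boldsymbol{\delta} + u_i, \quad E[c_i \mid \mathbf{x}_{i1}, \mathbf{x}_{i2}, \ldots, \mathbf{x}_{iT_i}] = \bar{\mathbf{x}}_i'\boldsymbol{\delta} $$
Assume c_i contains all time invariant information
$$ \mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + c_i\mathbf{i} + \boldsymbol{\varepsilon}_i, \quad T_i \text{ observations in group } i $$
$$ \phantom{\mathbf{y}_i} = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{i}\bar{\mathbf{x}}_i'\boldsymbol{\delta} + \boldsymbol{\varepsilon}_i + u_i\mathbf{i} $$

Chamberlain / Wooldridge
$$ c_i = \mathbf{x}_{i1}'\boldsymbol{\delta}_1 + \mathbf{x}_{i2}'\boldsymbol{\delta}_2 + \cdots + \mathbf{x}_{iT}'\boldsymbol{\delta}_T + u_i $$
$$ \mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{i}\mathbf{x}_{i1}'\boldsymbol{\delta}_1 + \mathbf{i}\mathbf{x}_{i2}'\boldsymbol{\delta}_2 + \cdots + \mathbf{i}\mathbf{x}_{iT}'\boldsymbol{\delta}_T + \mathbf{i}u_i + \boldsymbol{\varepsilon}_i $$
Each of X_i, i x_i1', i x_i2', etc. is T x K.

Problems:
• Requires balanced panels
• Modern panels have large T; models have large K

Mundlak's Approach for an FE Model with Time Invariant Variables

$$ y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \mathbf{z}_i'\boldsymbol{\delta} + c_i + \varepsilon_{it}, \quad (\mathbf{x} \text{ does not contain a constant}) $$
$$ E[\varepsilon_{it} \mid \mathbf{X}_i, c_i] = 0, \quad \mathrm{Var}[\varepsilon_{it} \mid \mathbf{X}_i, c_i] = \sigma_\varepsilon^2 $$
$$ c_i = \alpha + \bar{\mathbf{x}}_i'\boldsymbol{\theta} + w_i, $$
$$ E[w_i \mid \mathbf{X}_i, \mathbf{z}_i] = 0, \quad \mathrm{Var}[w_i \mid \mathbf{X}_i, \mathbf{z}_i] = \sigma_w^2 $$
$$ y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \mathbf{z}_i'\boldsymbol{\delta} + \alpha + \bar{\mathbf{x}}_i'\boldsymbol{\theta} + w_i + \varepsilon_{it} $$
This is a random effects model that includes the group means of the time varying variables.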
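
The group-means device can be illustrated on simulated data. The following is a minimal sketch, not the estimator used for the tables in these slides: it assumes a balanced panel with one regressor, uses hypothetical parameter values (N = 200, T = 5, true slope 0.5), and fits the augmented equation by pooled OLS rather than FGLS random effects. Under pooled OLS, the coefficient on x_it in the regression that adds the group mean reproduces the within (fixed effects) estimator, while pooled OLS without the means is contaminated by the correlation between x_it and c_i.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 5                       # hypothetical panel dimensions
ids = np.repeat(np.arange(N), T)    # group index for each observation

c = rng.normal(size=N)                         # individual effects c_i
x = rng.normal(size=N * T) + c[ids]            # regressor correlated with c_i
y = 1.0 + 0.5 * x + c[ids] + rng.normal(scale=0.5, size=N * T)

# Group means of x and y, expanded back to observation level.
counts = np.bincount(ids)
xbar = (np.bincount(ids, weights=x) / counts)[ids]
ybar = (np.bincount(ids, weights=y) / counts)[ids]

# Mundlak regression: y on [1, x_it, xbar_i], here by pooled OLS.
X_mundlak = np.column_stack([np.ones_like(x), x, xbar])
coef_mundlak, *_ = np.linalg.lstsq(X_mundlak, y, rcond=None)
beta_mundlak = coef_mundlak[1]

# Within (fixed effects) estimator from deviations from group means.
xd, yd = x - xbar, y - ybar
beta_within = (xd @ yd) / (xd @ xd)

# Pooled OLS that ignores c_i altogether.
X_pooled = np.column_stack([np.ones_like(x), x])
beta_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

print(beta_mundlak, beta_within, beta_pooled)
```

The coefficient on the group mean (`coef_mundlak[2]`) plays the role of the "B" variables in the tables: a clearly nonzero value signals correlation between the effects and the regressors.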
Mundlak Form of FE Model

+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
x(i,t)=================================================================
 OCC     |   -.02021384      .01375165     -1.470    .1416   .51116447
 SMSA    |   -.04250645      .01951727     -2.178    .0294   .65378151
 MS      |   -.02946444      .01915264     -1.538    .1240   .81440576
 EXP     |    .09665711      .00119262     81.046    .0000  19.8537815
z(i)===================================================================
 FEM     |   -.34322129      .05725632     -5.994    .0000   .11260504
 ED      |    .05099781      .00575551      8.861    .0000  12.8453782
Means of x(i,t) and constant===========================================
 Constant|   5.72655261      .10300460     55.595    .0000
 OCCB    |   -.10850252      .03635921     -2.984    .0028   .51116447
 SMSAB   |    .22934020      .03282197      6.987    .0000   .65378151
 MSB     |    .20453332      .05329948      3.837    .0001   .81440576
 EXPB    |   -.08988632      .00165025    -54.468    .0000  19.8537815
Variance Estimates=====================================================
 Var[e]  |    .0235632
 Var[u]  |    .0773825

Panel Data Extensions

• Dynamic models: lagged effects of the dependent variable
• Endogenous RHS variables
• Cross country comparisons – large T
• More general parameter heterogeneity – not only the constant term
• Nonlinear models such as binary choice

The Hausman and Taylor Model

$$ y_{it} = \mathbf{x}_{1it}'\boldsymbol{\beta}_1 + \mathbf{x}_{2it}'\boldsymbol{\beta}_2 + \mathbf{z}_{1i}'\boldsymbol{\alpha}_1 + \mathbf{z}_{2i}'\boldsymbol{\alpha}_2 + \varepsilon_{it} + u_i $$
Model: x2 and z2 are correlated with u.
Taking deviations from group means removes all time invariant variables:
$$ y_{it} - \bar{y}_i = (\mathbf{x}_{1it} - \bar{\mathbf{x}}_{1i})'\boldsymbol{\beta}_1 + (\mathbf{x}_{2it} - \bar{\mathbf{x}}_{2i})'\boldsymbol{\beta}_2 + (\varepsilon_{it} - \bar{\varepsilon}_i) $$
Implication: β1 and β2 are consistently estimated by LSDV.

(x1it - x̄1i) = K1 instrumental variables
(x2it - x̄2i) = K2 instrumental variables
z1i          = L1 instrumental variables (uncorrelated with u)
?            = L2 instrumental variables (where do we get them?)
H&T: x̄1i     = K1 additional instrumental variables. Needs K1 ≥ L2.

H&T's 4 Step FGLS Estimator

(1) LSDV estimates of β1, β2, and σ_ε².
(2) Form the vector of group mean residuals, each repeated for every observation in its group:
$$ (\mathbf{e}^*)' = [(\bar{e}_1, \bar{e}_1, \ldots, \bar{e}_1), (\bar{e}_2, \bar{e}_2, \ldots, \bar{e}_2), \ldots, (\bar{e}_N, \bar{e}_N, \ldots, \bar{e}_N)] $$
IV regression of e* on Z* with instruments W_i consistently estimates α1 and α2.
(3) With fixed T, the residual variance in (2) estimates $\sigma_u^2 + \sigma_\varepsilon^2/T$. With an unbalanced panel, it estimates $\sigma_u^2 + \sigma_\varepsilon^2\,\overline{(1/T)}$ or something resembling this. Step (1) provided an estimate of $\sigma_\varepsilon^2$, so use the two to obtain estimates of $\sigma_u^2$ and $\sigma_\varepsilon^2$. For each group, compute
$$ \hat{\theta}_i = 1 - \sqrt{\frac{\hat{\sigma}_\varepsilon^2}{\hat{\sigma}_\varepsilon^2 + T_i\hat{\sigma}_u^2}} $$
(4) Transform the data:
$$ \mathbf{W}_i^* = [\mathbf{x}_{1it}, \mathbf{x}_{2it}, \mathbf{z}_{1i}, \mathbf{z}_{2i}] - \hat{\theta}_i[\bar{\mathbf{x}}_{1i}, \bar{\mathbf{x}}_{2i}, \mathbf{z}_{1i}, \mathbf{z}_{2i}], \qquad y_{it}^* = y_{it} - \hat{\theta}_i\bar{y}_i $$

H&T's 4 STEP IV Estimator

Instrumental Variables V_i
(x1it - x̄1i) = K1 instrumental variables
(x2it - x̄2i) = K2 instrumental variables
z1i          = L1 instrumental variables (uncorrelated with u)
x̄1i          = K1 additional instrumental variables.

Now do 2SLS of y* on W* with instruments V to estimate all parameters. I.e.,
$$ [\hat{\boldsymbol{\beta}}_1, \hat{\boldsymbol{\beta}}_2, \hat{\boldsymbol{\alpha}}_1, \hat{\boldsymbol{\alpha}}_2] = (\hat{\mathbf{W}}^{*\prime}\hat{\mathbf{W}}^*)^{-1}\hat{\mathbf{W}}^{*\prime}\mathbf{y}^* $$

Arellano/Bond/Bover's Formulation
Builds on Hausman and Taylor

$$ y_{it} = \mathbf{x}_{1it}'\boldsymbol{\beta}_1 + \mathbf{x}_{2it}'\boldsymbol{\beta}_2 + \mathbf{z}_{1i}'\boldsymbol{\alpha}_1 + \mathbf{z}_{2i}'\boldsymbol{\alpha}_2 + \varepsilon_{it} + u_i $$
Instrumental variables for period t
(x1it - x̄1i) = K1 instrumental variables
(x2it - x̄2i) = K2 instrumental variables
z1i          = L1 instrumental variables (uncorrelated with u)
x̄1i          = K1 additional instrumental variables. K1 ≥ L2.
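Let $v_{it} = \varepsilon_{it} + u_i$.
Let $\mathbf{z}_{it}' = [(\mathbf{x}_{1it} - \bar{\mathbf{x}}_{1i})', (\mathbf{x}_{2it} - \bar{\mathbf{x}}_{2i})', \mathbf{z}_{1i}', \bar{\mathbf{x}}_{1i}']$.
Then $E[\mathbf{z}_{it}v_{it}] = \mathbf{0}$.
We formulate this for the T_i observations in group i.

Arellano/Bond/Bover's Formulation
Adds a Lagged DV to H&T

$$ y_{it} = \delta y_{i,t-1} + \mathbf{x}_{1it}'\boldsymbol{\beta}_1 + \mathbf{x}_{2it}'\boldsymbol{\beta}_2 + \mathbf{z}_{1i}'\boldsymbol{\alpha}_1 + \mathbf{z}_{2i}'\boldsymbol{\alpha}_2 + \varepsilon_{it} + u_i $$
Parameters: $\boldsymbol{\theta} = [\delta, \boldsymbol{\beta}_1', \boldsymbol{\beta}_2', \boldsymbol{\alpha}_1', \boldsymbol{\alpha}_2']'$
The data:
$$ \mathbf{y}_i = \begin{bmatrix} y_{i,2} \\ y_{i,3} \\ \vdots \\ y_{i,T_i} \end{bmatrix}, \qquad \mathbf{X}_i = \begin{bmatrix} y_{i,1} & \mathbf{x}_{1i2}' & \mathbf{x}_{2i2}' & \mathbf{z}_{1i}' & \mathbf{z}_{2i}' \\ y_{i,2} & \mathbf{x}_{1i3}' & \mathbf{x}_{2i3}' & \mathbf{z}_{1i}' & \mathbf{z}_{2i}' \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ y_{i,T_i-1} & \mathbf{x}_{1iT_i}' & \mathbf{x}_{2iT_i}' & \mathbf{z}_{1i}' & \mathbf{z}_{2i}' \end{bmatrix} $$
X_i has T_i - 1 rows and 1 + K1 + K2 + L1 + L2 columns.
This formulation is the same as H&T with y_{i,t-1} contained in x2it.

Dynamic (Linear) Panel Data (DPD) Models

• Application
• Bias in Conventional Estimation
• Development of Consistent Estimators
• Efficient GMM Estimators

Dynamic Linear Model

Balestra-Nerlove (1966), 36 States, 11 Years
Demand for Natural Gas

Structure
New Demand:
$$ G_{i,t}^* = G_{i,t} - (1 - \delta)G_{i,t-1} $$
Demand Function:
$$ G_{i,t}^* = \beta_1 + \beta_2 P_{i,t} + \beta_3 \Delta N_{i,t} + \beta_4 N_{i,t} + \beta_5 \Delta Y_{i,t} + \beta_6 Y_{i,t} + \varepsilon_{i,t} $$
G = gas demand
N = population
P = price
Y = per capita income

Reduced Form
$$ G_{i,t} = \beta_1 + \beta_2 P_{i,t} + \beta_3 \Delta N_{i,t} + \beta_4 N_{i,t} + \beta_5 \Delta Y_{i,t} + \beta_6 Y_{i,t} + \beta_7 G_{i,t-1} + \alpha_i + \varepsilon_{i,t} $$

A General DPD model

$$ y_{i,t} = \mathbf{x}_{i,t}'\boldsymbol{\beta} + \delta y_{i,t-1} + c_i + \varepsilon_{i,t} $$
$$ E[\varepsilon_{i,t} \mid \mathbf{X}_i, c_i] = 0 $$
$$ E[\varepsilon_{i,t}^2 \mid \mathbf{X}_i, c_i] = \sigma_\varepsilon^2, \quad E[\varepsilon_{i,t}\varepsilon_{i,s} \mid \mathbf{X}_i, c_i] = 0 \text{ if } t \neq s $$
$$ E[c_i \mid \mathbf{X}_i] = g(\mathbf{X}_i) $$
No correlation across individuals.
OLS and GLS are both inconsistent.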
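
The inconsistency claim can be checked by simulation. The sketch below is an illustration under assumed values, not a result from the slides: it generates a pure autoregressive panel y_it = δ y_i,t-1 + c_i + ε_it with no x variables, δ = 0.5, N = 5000 and T = 6 (all hypothetical), then compares pooled OLS, the within (LSDV) estimator, and a simple just-identified IV estimator on first differences that uses y_i,t-2 as the instrument for Δy_i,t-1. The last of these is the simplest member of the family of first-difference IV estimators and is included only as a consistent benchmark.

```python
import numpy as np

rng = np.random.default_rng(12345)
N, T, delta = 5000, 6, 0.5   # hypothetical design values
burn = 50                    # burn-in periods so the retained data are close to stationary

c = rng.normal(size=N)                       # individual effects c_i
y = np.empty((N, T + burn))
y[:, 0] = c / (1.0 - delta) + rng.normal(size=N)
for t in range(1, T + burn):
    y[:, t] = delta * y[:, t - 1] + c + rng.normal(size=N)
y = y[:, burn:]                              # keep the last T periods, shape (N, T)

y_cur, y_lag = y[:, 1:], y[:, :-1]           # y_it and y_i,t-1

# Pooled OLS of y_it on y_i,t-1 with a common intercept (ignores c_i).
xl, yc = y_lag.ravel(), y_cur.ravel()
pooled = np.sum((xl - xl.mean()) * (yc - yc.mean())) / np.sum((xl - xl.mean()) ** 2)

# Within (LSDV) estimator: deviations from each individual's own means.
yc_d = y_cur - y_cur.mean(axis=1, keepdims=True)
yl_d = y_lag - y_lag.mean(axis=1, keepdims=True)
within = np.sum(yl_d * yc_d) / np.sum(yl_d * yl_d)

# First-difference IV: instrument the lagged difference with y_i,t-2.
dy = np.diff(y, axis=1)                      # dy[:, j] = y[:, j+1] - y[:, j]
dep, reg, inst = dy[:, 1:], dy[:, :-1], y[:, :-2]
iv = np.sum(inst * dep) / np.sum(inst * reg)

print(f"true delta={delta}, pooled OLS={pooled:.3f}, within={within:.3f}, FD-IV={iv:.3f}")
```

In this design pooled OLS overstates δ because y_i,t-1 is positively correlated with c_i, and the within estimator understates it because the demeaned lag is correlated with the demeaned disturbance when T is small.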
Arellano and Bond Estimator

Based on first differences
$$ y_{i,t} - y_{i,t-1} = (\mathbf{x}_{i,t} - \mathbf{x}_{i,t-1})'\boldsymbol{\beta} + \delta(y_{i,t-1} - y_{i,t-2}) + (\varepsilon_{i,t} - \varepsilon_{i,t-1}) $$
Instrumental variables
$$ y_{i,3} - y_{i,2} = (\mathbf{x}_{i,3} - \mathbf{x}_{i,2})'\boldsymbol{\beta} + \delta(y_{i,2} - y_{i,1}) + (\varepsilon_{i,3} - \varepsilon_{i,2}) $$
Can use y_i,1
$$ y_{i,4} - y_{i,3} = (\mathbf{x}_{i,4} - \mathbf{x}_{i,3})'\boldsymbol{\beta} + \delta(y_{i,3} - y_{i,2}) + (\varepsilon_{i,4} - \varepsilon_{i,3}) $$
Can use y_i,1 and y_i,2
$$ y_{i,5} - y_{i,4} = (\mathbf{x}_{i,5} - \mathbf{x}_{i,4})'\boldsymbol{\beta} + \delta(y_{i,4} - y_{i,3}) + (\varepsilon_{i,5} - \varepsilon_{i,4}) $$
Can use y_i,1 and y_i,2 and y_i,3

Arellano and Bond Estimator

More instrumental variables - Predetermined X
Equation for y_i,3 - y_i,2: can use y_i,1 and x_i,1, x_i,2
Equation for y_i,4 - y_i,3: can use y_i,1, y_i,2, x_i,1, x_i,2, x_i,3
Equation for y_i,5 - y_i,4: can use y_i,1, y_i,2, y_i,3, x_i,1, x_i,2, x_i,3, x_i,4

Arellano and Bond Estimator

Even more instrumental variables - Strictly exogenous X
Equation for y_i,3 - y_i,2: can use y_i,1 and x_i,1, x_i,2, ..., x_i,T (all periods)
Equation for y_i,4 - y_i,3: can use y_i,1, y_i,2, x_i,1, x_i,2, ..., x_i,T
Equation for y_i,5 - y_i,4: can use y_i,1, y_i,2, y_i,3, x_i,1, x_i,2, ..., x_i,T

The number of potential instruments is huge.
These define the rows of Z_i.
These can be used for simple instrumental variable estimation.

Application: Maquiladora
http://www.dallasfed.org/news/research/2005/05us-mexico_felix.pdf

Maquiladora Estimates