Part 11: Heterogeneity [ 1/36]

Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business

Agenda
- Random Parameter Models
  - Fixed effects
  - Random effects
- Heterogeneity in Dynamic Panels
- Random Coefficient Vectors: Classical vs. Bayesian
  - General RPM
  - Swamy/Hsiao/Hildreth/Houck
- Hierarchical and "Two Step" Models
- 'True' Random Parameter Variation
  - Discrete: Latent Class
  - Continuous
    - Classical
    - Bayesian

A Capital Asset Pricing Model
$$R_{it} = \gamma_{0t} + \gamma_{1t}\beta_i + \gamma_{2t}\beta_i^2 + \gamma_{3t}s_i + \varepsilon_{it}$$
- $R_{it}$ = one period percentage return
- $\gamma_{0t}$ = expected return on a riskless security (stochastic)
- $\gamma_{1t}$ = expected premium on the 'market' portfolio, $R_{Mt} - R_{0t}$
- $\gamma_{2t}$ = "nonlinear" risk effect
- $\gamma_{3t}$ = "nonbeta risk" term
Data are $[R_{it}, \beta_i, \beta_i^2, s_i]$, generated by auxiliary regressions.
Coefficients are 'random' through time.
Fama - MacBeth, "Risk, Return, and Equilibrium: Empirical Tests," Journal of Political Economy, 1973.

Heterogeneous Production Model
$$\text{Health}_{i,t} = \alpha_i + \beta_i\,\text{HEXP}_{i,t} + \gamma_i\,\text{EDUC}_{i,t} + \varepsilon_{i,t}$$
i = country, t = year
Health = health care outcome, e.g., life expectancy
HEXP = health care expenditure
EDUC = education
Parameter heterogeneity:
- Discrete? AIDS dominated vs. QOL dominated
- Continuous? Cross cultural heterogeneity
World Health Organization, "The 2000 World Health Report"

Parameter Heterogeneity
Unobserved Effects, Random Constants
$$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$$
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \qquad \alpha_i = \alpha + u_i$$
$E[u_i|X_i] = 0$ --> Random effects
$E[u_i|X_i] \neq 0$ --> Fixed effects
$E_X[E[u_i|X_i]] = 0$.
$\mathrm{Var}[u_i|X_i]$ not yet defined; so far, constant.

Parameter Heterogeneity
Generalize to Random Parameters
$$y_{it} = x_{it}'\beta_i + \varepsilon_{it}, \qquad \beta_i = \beta + u_i$$
$E[u_i|X_i]$ zero or nonzero, to be defined
$E_X[E[u_i|X_i]] = 0$
$\mathrm{Var}[u_i|X_i]$ to be defined, constant or variable
"The Pooling Problem": What is the consequence of estimating under the erroneous assumption of constant parameters?
(Theil, 1960, "The Aggregation Problem")
(Maddala, 1970s - 1990s, "The Pooling Problem")

Fixed Effects (Hildreth, Houck, Hsiao, Swamy)
$$y_{it} = x_{it}'\beta_i + \varepsilon_{it}, \quad \text{each observation}$$
$$y_i = X_i\beta_i + \varepsilon_i, \quad T_i \text{ observations}$$
$$\beta_i = \beta + u_i$$
Assume (temporarily) $T_i > K$.
$E[u_i|X_i] = g(X_i)$ (conditional mean)
$P[u_i|X_i] = (X_i - E[X_i])\theta$ (projection)
$E_X[E[u_i|X_i]] = E_X[P[u_i|X_i]] = 0$
$\mathrm{Var}[u_i|X_i] = \Gamma$, constant but nonzero

OLS and GLS Are Inconsistent
$$y_i = X_i\beta_i + \varepsilon_i, \quad T_i \text{ observations}, \qquad \beta_i = \beta + u_i$$
$$y_i = X_i\beta + X_iu_i + \varepsilon_i = X_i\beta + w_i$$
$$E[w_i|X_i] = X_iE[u_i|X_i] + E[\varepsilon_i|X_i] \neq 0$$

Estimating the Fixed Effects Model
$$\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_N\end{bmatrix} = \begin{bmatrix} X_1 & 0 & \cdots & 0\\ 0 & X_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & X_N\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_N\end{bmatrix} + \begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_N\end{bmatrix}$$
Estimator: Equation by equation OLS or (F)GLS
Estimate $\beta$? $\frac{1}{N}\sum_{i=1}^{N}\hat\beta_i$ is consistent for $E[\beta_i]$ in $N$.

Partial Fixed Effects Model
Some individual specific parameters
$$y_i = D_i\alpha_i + X_i\beta + \varepsilon_i, \quad T_i \text{ observations}$$
Use OLS and Frisch-Waugh
$$\hat\beta = \Big[\sum_{i=1}^{N} X_i'M_D^iX_i\Big]^{-1}\Big[\sum_{i=1}^{N} X_i'M_D^iy_i\Big], \qquad M_D^i = I - D_i(D_i'D_i)^{-1}D_i'$$
$$\hat\alpha_i = [D_i'D_i]^{-1}D_i'(y_i - X_i\hat\beta)$$
E.g., individual specific time trends, $y_{it} = \alpha_{i0} + \alpha_{i1}t + x_{it}'\beta + \varepsilon_{it}$; detrend individual data, then OLS.
E.g., individual specific constant terms, $y_{it} = \alpha_{i0} + x_{it}'\beta + \varepsilon_{it}$; individual group mean deviations, then OLS.

Heterogeneous Dynamic Models
$$\log Y_{i,t} = \alpha_i + \gamma_i\log Y_{i,t-1} + \beta_i x_{it} + \varepsilon_{i,t}$$
The long run effect of interest is
$$\theta_i = \frac{\beta_i}{1 - \gamma_i}$$
See:
Pesaran, H., Smith, R., Im, K., "Estimating Long-Run Relationships From Dynamic Heterogeneous Panels," Journal of Econometrics, 1995. (Repeated with further study in Matyas and Sevestre, The Econometrics of Panel Data.)
Smith, J., notes, Applied Econometrics, Dynamic Panel Data Models, University of Warwick. http://www2.warwick.ac.uk/fac/soc/economics/staff/faculty/jennifersmith/panel/
Weinhold, D., "A Dynamic "Fixed Effects" Model for Heterogeneous Panel Data," London School of Economics, 1999.
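A small simulation may help fix the two claims on the fixed effects slides: pooled OLS is inconsistent when E[u_i | X_i] is nonzero, while the average of the equation-by-equation OLS estimates is consistent for E[beta_i]. The sketch below is only an illustration under assumed values: the panel dimensions, the two-point distribution of the regressor level `m`, and the way the slope deviation depends on `m` are all hypothetical choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 1000, 10                       # illustrative panel dimensions (assumed)
beta = np.array([1.0, 0.5])           # E[beta_i], assumed values

X_list, y_list = [], []
for _ in range(N):
    m = rng.choice([0.0, 2.0])        # unit-specific level of x (hypothetical design)
    x = m + rng.normal(size=T)
    X = np.column_stack([np.ones(T), x])
    # u_i has mean zero over units but is correlated with X_i through m
    u = np.array([0.0, 0.4 * (m - 1.0) + 0.1 * rng.normal()])
    y = X @ (beta + u) + 0.5 * rng.normal(size=T)
    X_list.append(X)
    y_list.append(y)

# Equation-by-equation OLS, then average the N coefficient vectors
b_i = np.array([np.linalg.lstsq(X, y, rcond=None)[0]
                for X, y in zip(X_list, y_list)])
beta_bar = b_i.mean(axis=0)

# Pooled OLS under the erroneous assumption of constant parameters
b_pooled = np.linalg.lstsq(np.vstack(X_list), np.concatenate(y_list),
                           rcond=None)[0]

print("average of unit-by-unit OLS:", beta_bar)
print("pooled OLS:                 ", b_pooled)
```

In this design the averaged estimator should land near (1.0, 0.5), while the pooled slope drifts upward because units with larger x also have larger slopes.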
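Random Effects and Random Parameters
THE Random Parameters Model
$$y_{it} = x_{it}'\beta_i + \varepsilon_{it}, \quad \text{each observation}$$
$$y_i = X_i\beta_i + \varepsilon_i, \quad T_i \text{ observations}$$
$$\beta_i = \beta + u_i$$
Assume (temporarily) $T_i > K$.
$E[u_i|X_i] = 0$
$\mathrm{Var}[u_i|X_i] = \Gamma$, constant but nonzero
We differentiate the classical and Bayesian interpretations.
Randomness here is heterogeneity, not "uncertainty."
Bayesian approach to be considered later.

Estimating the Random Parameters Model
$$y_i = X_i\beta_i + \varepsilon_i, \quad T_i \text{ observations}, \qquad \beta_i = \beta + u_i$$
$$y_i = X_i\beta + X_iu_i + \varepsilon_i = X_i\beta + w_i$$
$$E[w_i|X_i] = X_iE[u_i|X_i] + E[\varepsilon_i|X_i] = 0$$
$$\mathrm{Var}[w_i|X_i] = X_i\Gamma X_i' + \sigma_{\varepsilon,i}^2 I$$
<== Should $\sigma_{\varepsilon,i}^2$ vary by $i$?
Objects of estimation: $\beta$, $\sigma_{\varepsilon,i}^2$, $\Gamma$
Second level estimation: $\beta_i$

Estimating the Random Parameters Model by OLS
$$y_i = X_i\beta_i + \varepsilon_i, \quad T_i \text{ observations}, \qquad \beta_i = \beta + u_i$$
$$y_i = X_i\beta + X_iu_i + \varepsilon_i = X_i\beta + w_i$$
$$b = \Big[\sum_{i=1}^{N}X_i'X_i\Big]^{-1}\Big[\sum_{i=1}^{N}X_i'y_i\Big] = \beta + \Big[\sum_{i=1}^{N}X_i'X_i\Big]^{-1}\Big[\sum_{i=1}^{N}X_i'w_i\Big]$$
$$\begin{aligned}
\mathrm{Var}[b|X] &= \Big[\sum_{i=1}^{N}X_i'X_i\Big]^{-1}\Big[\sum_{i=1}^{N}X_i'(X_i\Gamma X_i' + \sigma_\varepsilon^2 I)X_i\Big]\Big[\sum_{i=1}^{N}X_i'X_i\Big]^{-1}\\
&= \sigma_\varepsilon^2\Big[\sum_{i=1}^{N}X_i'X_i\Big]^{-1} + \Big[\sum_{i=1}^{N}X_i'X_i\Big]^{-1}\Big[\sum_{i=1}^{N}(X_i'X_i)\Gamma(X_i'X_i)\Big]\Big[\sum_{i=1}^{N}X_i'X_i\Big]^{-1}
\end{aligned}$$
= the usual + the variation due to the random parameters
Robust estimator
$$\text{Est.Var}[b] = \Big[\sum_{i=1}^{N}X_i'X_i\Big]^{-1}\Big[\sum_{i=1}^{N}X_i'\hat w_i\hat w_i'X_i\Big]\Big[\sum_{i=1}^{N}X_i'X_i\Big]^{-1}$$

Estimating the Random Parameters Model by GLS
$$y_i = X_i\beta_i + \varepsilon_i, \quad T_i \text{ observations}, \qquad \beta_i = \beta + u_i$$
$$y_i = X_i\beta + X_iu_i + \varepsilon_i = X_i\beta + w_i, \qquad \mathrm{Var}[w_i|X_i] = \Omega_i = X_i\Gamma X_i' + \sigma_{\varepsilon,i}^2 I$$
$$\hat\beta = \Big[\sum_{i=1}^{N}X_i'\Omega_i^{-1}X_i\Big]^{-1}\Big[\sum_{i=1}^{N}X_i'\Omega_i^{-1}y_i\Big]$$
For FGLS, we need $\hat\Gamma$ and $\hat\sigma_{\varepsilon,i}^2$.

Estimating the RPM
$$\begin{aligned}
b_i &= \beta + (X_i'X_i)^{-1}X_i'w_i, \qquad w_i = X_iu_i + \varepsilon_i\\
&= \beta + u_i + (X_i'X_i)^{-1}X_i'\varepsilon_i
\end{aligned}$$
$$\mathrm{Var}[b_i|X_i] = \Gamma + \sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}$$
$$\hat\sigma_{\varepsilon,i}^2 = \frac{\sum_{t=1}^{T_i}(y_{it} - x_{it}'b_i)^2}{T_i - K}$$
is unbiased (but not consistent because $T_i$ is fixed).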
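The "usual + variation due to the random parameters" decomposition for pooled OLS can be checked numerically. The following is a minimal sketch, assuming simulated data with illustrative values for N, T, beta, Gamma and the disturbance variance (none of these come from the slides); the function names `pooled_ols_robust` and `conventional_cov` are hypothetical helpers. It computes b, the robust (sandwich) estimator from the OLS slide, and the conventional s^2 [sum X_i'X_i]^{-1} for comparison.

```python
import numpy as np

def pooled_ols_robust(X_list, y_list):
    """Pooled OLS b with the sandwich covariance clustered by unit i."""
    XtX = sum(X.T @ X for X in X_list)
    Xty = sum(X.T @ y for X, y in zip(X_list, y_list))
    b = np.linalg.solve(XtX, Xty)
    bread = np.linalg.inv(XtX)
    meat = np.zeros_like(XtX)
    for X, y in zip(X_list, y_list):
        g = X.T @ (y - X @ b)              # X_i' w_hat_i
        meat += np.outer(g, g)             # X_i' w_hat_i w_hat_i' X_i
    return b, bread @ meat @ bread

def conventional_cov(X_list, y_list, b):
    """s^2 [sum X_i'X_i]^{-1}, which omits the random-parameter term."""
    XtX = sum(X.T @ X for X in X_list)
    ssr = sum(float((y - X @ b) @ (y - X @ b)) for X, y in zip(X_list, y_list))
    n = sum(len(y) for y in y_list)
    return ssr / (n - len(b)) * np.linalg.inv(XtX)

# Simulated random parameters data (all values are illustrative assumptions)
rng = np.random.default_rng(42)
N, T = 300, 10
beta = np.array([1.0, 0.5])
X_list, y_list = [], []
for _ in range(N):
    X = np.column_stack([np.ones(T), rng.normal(size=T)])
    u = rng.normal(scale=0.5, size=2)      # Gamma = 0.25 * I, E[u_i | X_i] = 0
    y = X @ (beta + u) + rng.normal(scale=0.5, size=T)
    X_list.append(X)
    y_list.append(y)

b, V_robust = pooled_ols_robust(X_list, y_list)
se_robust = np.sqrt(np.diag(V_robust))
se_conv = np.sqrt(np.diag(conventional_cov(X_list, y_list, b)))
print("b:", b)
print("robust s.e.:      ", se_robust)
print("conventional s.e.:", se_conv)
```

With E[u_i | X_i] = 0 the point estimate stays close to beta, but the conventional standard errors come out noticeably smaller than the robust ones, which is the practical cost of ignoring Gamma.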
An Estimator for Γ
$$E[b_i|X_i] = \beta$$
$$\mathrm{Var}[b_i|X_i] = \Gamma + \sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}$$
$$\begin{aligned}
\mathrm{Var}[b_i] &= \mathrm{Var}_X E[b_i|X_i] + E_X\mathrm{Var}[b_i|X_i]\\
&= 0 + E_X[\Gamma + \sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}]\\
&= \Gamma + E_X[\sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}]
\end{aligned}$$
Estimate $\mathrm{Var}[b_i]$ with $\frac{1}{N}\sum_{i=1}^{N}(b_i - \bar b)(b_i - \bar b)'$
Estimate $E_X[\sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}]$ with $\frac{1}{N}\sum_{i=1}^{N}\hat\sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}$
$$\hat\Gamma = \frac{1}{N}\sum_{i=1}^{N}(b_i - \bar b)(b_i - \bar b)' - \frac{1}{N}\sum_{i=1}^{N}\hat\sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}$$

A Positive Definite Estimator for Γ
$$\hat\Gamma = \frac{1}{N}\sum_{i=1}^{N}(b_i - \bar b)(b_i - \bar b)' - \frac{1}{N}\sum_{i=1}^{N}\hat\sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}$$
May not be positive definite. What to do?
(1) The second term converges (in theory) to 0 in $T_i$. Drop it.
(2) Various Bayesian "shrinkage" estimators
(3) An ML estimator

Estimating βi
$$\hat\beta_{GLS} = \sum_{i=1}^{N}W_ib_{i,OLS}$$
$$W_i = \Big\{\sum_{i=1}^{N}[\Gamma + \sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}]^{-1}\Big\}^{-1}[\Gamma + \sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}]^{-1}$$
Best linear unbiased predictor based on GLS is
$$\hat\beta_i = A_i\hat\beta_{GLS} + (I - A_i)b_{i,OLS} = b_{i,OLS} + A_i(\hat\beta_{GLS} - b_{i,OLS})$$
$$A_i = \{\Gamma^{-1} + [\sigma_{\varepsilon,i}^2(X_i'X_i)^{-1}]^{-1}\}^{-1}\Gamma^{-1}$$
$$\mathrm{Var}[\hat\beta_i\,|\,\text{all data}] = \begin{bmatrix}A_i & (I - A_i)\end{bmatrix}\begin{bmatrix}\mathrm{Var}[\hat\beta_{GLS}] & W_i\mathrm{Var}[b_{i,OLS}]\\ \mathrm{Var}[b_{i,OLS}]W_i' & \mathrm{Var}[b_{i,OLS}]\end{bmatrix}\begin{bmatrix}A_i'\\ (I - A_i)'\end{bmatrix}$$

Baltagi and Griffin's Gasoline Data
World Gasoline Demand Data, 18 OECD Countries, 19 years
Variables in the file are
COUNTRY  = name of country
YEAR     = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG    = log of real price of gasoline
LCARPCAP = log of per capita number of cars
See Baltagi (2001, p. 24) for analysis of these data. The article on which the analysis is based is Baltagi, B. and Griffin, J., "Gasoline Demand in the OECD: An Application of Pooling and Testing Procedures," European Economic Review, 22, 1983, pp. 117-137. The data were downloaded from the website for Baltagi's text.

OLS and FGLS Estimates
+----------------------------------------------------+
| Overall OLS results for pooled sample.             |
| Residuals       Sum of squares      = 14.90436     |
|                 Standard error of e = .2099898     |
| Fit             R-squared           = .8549355     |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Constant     2.39132562       .11693429     20.450    .0000
 LINCOMEP      .88996166       .03580581     24.855    .0000
 LRPMG        -.89179791       .03031474    -29.418    .0000
 LCARPCAP     -.76337275       .01860830    -41.023    .0000
+------------------------------------------------+
| Random Coefficients Model                      |
| Residual standard deviation      =    .3498    |
| R squared                        =    .5976    |
| Chi-squared for homogeneity test = 22202.43    |
| Degrees of freedom               =       68    |
| Probability value for chi-squared=  .000000    |
+------------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 CONSTANT     2.40548802       .55014979      4.372    .0000
 LINCOMEP      .39314902       .11729448      3.352    .0008
 LRPMG        -.24988767       .04372201     -5.715    .0000
 LCARPCAP     -.44820927       .05416460     -8.275    .0000

Country Specific Estimates
(figure omitted)

Estimated Γ
(figure omitted)

Two Step Estimation (Saxonhouse)
A Fixed Effects Model
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$$
Secondary Model
$$\alpha_i = z_i'\delta$$
Two approaches
(1) Reduced form is a linear model with time constant $z_i$
$$y_{it} = x_{it}'\beta + z_i'\delta + \varepsilon_{it}$$
(2) Two step
(a) FEM at step 1
(b) $a_i = \alpha_i + (a_i - \alpha_i) = z_i'\delta + v_i$
$$\mathrm{Var}[v_i] = \sigma_\varepsilon^2\Big[\frac{1}{T_i} + \bar x_i'(X_i'M_D^iX_i)^{-1}\bar x_i\Big]$$
Use weighted least squares regression of $a_i$ on $z_i$.

A Hierarchical Model
Fixed Effects Model
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$$
Secondary Model
$$\alpha_i = z_i'\delta + u_i \quad \text{<========}$$
Two approaches
(1) Reduced form is an REM with time constant $z_i$
$$y_{it} = x_{it}'\beta + z_i'\delta + u_i + \varepsilon_{it}$$
(2) Two step
(a) FEM at step 1
(b) $a_i = \alpha_i + (a_i - \alpha_i) = z_i'\delta + u_i + v_i$
$$\mathrm{Var}[u_i + v_i] = \sigma_u^2 + \sigma_\varepsilon^2\Big[\frac{1}{T_i} + \bar x_i'(X_i'M_D^iX_i)^{-1}\bar x_i\Big]$$
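The two step approach (fixed effects at step 1, then weighted least squares of the estimated constants a_i on z_i) can be sketched on simulated data. This is an illustration under stated assumptions, not the procedure used in any of the cited papers: there is a single regressor x, the secondary model holds exactly (no u_i), all parameter values are made up, and the variance of v_i is computed with the pooled within-group moment sum over all units, which is an assumption about what the expression on the slide denotes.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 300
T_i = rng.integers(3, 12, size=N)          # unbalanced panel, T_i in 3..11
z = rng.normal(size=N)                     # time-constant z_i
delta = np.array([2.0, -1.0])              # alpha_i = delta_0 + delta_1 z_i (assumed)
beta = 0.7                                 # assumed slope on x_it
alpha = delta[0] + delta[1] * z

# Step 1: within (fixed effects) estimator of beta, then a_i = ybar_i - b xbar_i
xbar, ybar = np.empty(N), np.empty(N)
Sxx = Sxy = Syy = 0.0
for i in range(N):
    x = z[i] + rng.normal(size=T_i[i])     # x_it is correlated with alpha_i
    y = alpha[i] + beta * x + rng.normal(size=T_i[i])
    xbar[i], ybar[i] = x.mean(), y.mean()
    xd, yd = x - xbar[i], y - ybar[i]      # group mean deviations
    Sxx += xd @ xd
    Sxy += xd @ yd
    Syy += yd @ yd
b = Sxy / Sxx
a = ybar - b * xbar
s2 = (Syy - b * Sxy) / (T_i.sum() - N - 1)  # within residual variance

# Step 2: WLS of a_i on (1, z_i) with weights 1 / Var[v_i]
var_v = s2 * (1.0 / T_i + xbar**2 / Sxx)
w = 1.0 / var_v
Z = np.column_stack([np.ones(N), z])
delta_hat = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * a))

print("b (step 1):", b)
print("delta_hat (step 2):", delta_hat)
```

For the hierarchical version with u_i, the same second step would need an estimate of sigma_u^2 added to each variance before forming the weights; that extension is not shown here.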
Analysis of Fannie Mae
Fannie Mae
The Funding Advantage
The Pass Through
Passmore, W., Sherlund, S., Burgess, G., "The Effect of Housing Government-Sponsored Enterprises on Mortgage Rates," 2005, Federal Reserve Board and Real Estate Economics

Two Step Analysis of Fannie-Mae
Fannie Mae's GSE Funding Advantage and Pass Through
$$RM_{i,s,t} = \beta^0_{s,t} + (\beta^1_{s,t}\,\text{LTV}_{i,s,t}) + \beta^2_{s,t}\,\text{Small}_{i,s,t} + \beta^3_{s,t}\,\text{Fees}_{i,s,t} + \beta^4_{s,t}\,\text{New}_{i,s,t} + \beta^5_{s,t}\,\text{MtgCo}_{i,s,t} + \alpha_{s,t}J_{i,s,t} + \varepsilon_{i,s,t}$$
i, s, t = individual, state, month
1,036,252 observations in 370 state-months.
RM = mortgage rate
LTV = 3 dummy variables for loan to value
Small = dummy variable for small loan
Fees = dummy variable for whether fees paid up front
New = dummy variable for new home
MtgCo = dummy variable for mortgage company
J = dummy variable for whether this is a JUMBO loan
THIS IS THE COEFFICIENT OF INTEREST.

Average of 370 First Step Regressions
Symbol  Variable     Mean   S.D.   Coeff   S.E.
RM      Rate %       7.23   0.79
J       Jumbo        0.06   0.23   0.16    0.05
LTV1    75%-80%      0.36   0.48   0.04    0.04
LTV2    81%-90%      0.15   0.35   0.17    0.05
LTV3    >90%         0.22   0.41   0.15    0.04
New     New Home     0.17   0.38   0.05    0.04
Small   < $100,000   0.27   0.44   0.14    0.04
Fees    Fees paid    0.62   0.52   0.06    0.03
MtgCo   Mtg. Co.     0.67   0.47   0.12    0.05
R2 = 0.77

Second Step
$$\begin{aligned}
\alpha_{s,t} = \beta_0 &+ \beta_1\,\text{GSE Funding Advantage}_{s,t} && \text{(estimated separately)}\\
&+ \beta_2\,\text{Risk free cost of credit}_{s,t}\\
&+ \beta_3\,\text{Corporate debt spreads}_{s,t} && \text{(estimated 4 different ways)}\\
&+ \beta_4\,\text{Prepayment spread}_{s,t}\\
&+ \beta_5\,\text{Maturity mismatch risk}_{s,t}\\
&+ \beta_6\,\text{Aggregate Demand}_{s,t}\\
&+ \beta_7\,\text{Long term interest rate}_{s,t}\\
&+ \beta_8\,\text{Market Capacity}_{s,t}\\
&+ \beta_9\,\text{Time trend}_{s,t}\\
&+ \beta_{10\text{-}13}\,\text{4 dummy variables for CA, NJ, MD, VA}_{s,t}\\
&+ \beta_{14\text{-}16}\,\text{3 dummy variables for calendar quarters}_{s,t}
\end{aligned}$$

Estimates of β1
Second step based on 370 observations. Corrected for "heteroscedasticity, autocorrelation, and monthly clustering."
Four estimates based on different estimates of corporate credit spread:
0.07 (0.11)   0.31 (0.11)   0.17 (0.10)   0.10 (0.11)
Reconcile the 4 estimates with a minimum distance estimator: minimize with respect to $\beta_1$
$$\begin{bmatrix}\hat\beta_1^1 - \beta_1 & \hat\beta_1^2 - \beta_1 & \hat\beta_1^3 - \beta_1 & \hat\beta_1^4 - \beta_1\end{bmatrix}\hat\Omega^{-1}\begin{bmatrix}\hat\beta_1^1 - \beta_1\\ \hat\beta_1^2 - \beta_1\\ \hat\beta_1^3 - \beta_1\\ \hat\beta_1^4 - \beta_1\end{bmatrix}$$
Estimated mortgage rate reduction: About 16 basis points. .16%.

The Minimum Distance Estimator
0.07 (0.11)   0.31 (0.11)   0.17 (0.10)   0.10 (0.11)
Reconcile the 4 estimates with a minimum distance estimator. With a diagonal $\hat\Omega$, minimizing the quadratic form gives the inverse-variance weighted average
$$\hat{\hat\beta}_1 = \frac{.07/.11^2 + .31/.11^2 + .17/.10^2 + .10/.11^2}{1/.11^2 + 1/.11^2 + 1/.10^2 + 1/.11^2}$$
Approximately .16%.

A Hierarchical Linear Model
German Health Data
$$\text{Hsat}_{it} = \beta_1 + \beta_2\,\text{AGE}_{it} + \gamma_i\,\text{EDUC}_{it} + \beta_4\,\text{MARRIED}_{it} + \varepsilon_{it}$$
$$\gamma_i = \alpha_1 + \alpha_2\,\text{FEMALE}_i + u_i$$
Sample ; all $
Reject ; _Groupti < 7 $
Regress ; Lhs = newhsat ; Rhs = one,age,educ,married
        ; RPM = female ; Fcn = educ(n)
        ; pts = 25 ; halton ; pds = _groupti ; Parameters $
Sample ; 1 - 887 $
Create ; betaeduc = beta_i $
Dstat ; rhs = betaeduc $
Histogram ; Rhs = betaeduc $

OLS Results
OLS Starting values for random parameters model...
Ordinary least squares regression ............
LHS=NEWHSAT  Mean                 =       6.69641
             Standard deviation   =       2.26003
             Number of observs.   =          6209
Model size   Parameters           =             4
             Degrees of freedom   =          6205
Residuals    Sum of squares       =   29671.89461
             Standard error of e  =       2.18676
Fit          R-squared            =        .06424
             Adjusted R-squared   =        .06378
Model test   F[  3,  6205] (prob) =  142.0(.0000)
--------+---------------------------------------------------------
        |               Standard             Prob.      Mean
 NEWHSAT|  Coefficient    Error       z      z>|Z|      of X
--------+---------------------------------------------------------
Constant|   7.02769***    .22099     31.80   .0000
     AGE|   -.04882***    .00307    -15.90   .0000    44.3352
 MARRIED|    .29664***    .07701      3.85   .0001     .84539
    EDUC|    .14464***    .01331     10.87   .0000    10.9409
--------+---------------------------------------------------------

Maximum Simulated Likelihood
Normal exit: 27 iterations. Status=0. F= 12584.28
------------------------------------------------------------------
Random Coefficients LinearRg Model
Dependent variable              NEWHSAT
Log likelihood function    -12583.74717
Estimation based on N = 6209, K = 7
Unbalanced panel has 887 individuals
LINEAR regression model
Simulation based on 25 Halton draws
--------+---------------------------------------------------------
        |               Standard             Prob.      Mean
 NEWHSAT|  Coefficient    Error       z      z>|Z|      of X
--------+---------------------------------------------------------
        |Nonrandom parameters
Constant|   7.34576***    .15415     47.65   .0000
     AGE|   -.05878***    .00206    -28.56   .0000    44.3352
 MARRIED|    .23427***    .05034      4.65   .0000     .84539
        |Means for random parameters
    EDUC|    .16580***    .00951     17.43   .0000    10.9409
        |Scale parameters for dists. of random parameters
    EDUC|   1.86831***    .00179   1044.68   .0000
        |Heterogeneity in the means of random parameters
cEDU_FEM|   -.03493***    .00379     -9.21   .0000
        |Variance parameter given is sigma
Std.Dev.|   1.58877***    .00954    166.45   .0000
--------+---------------------------------------------------------

"Individual Coefficients"
--> Sample ; 1 - 887 $
--> create ; betaeduc = beta_i $
--> dstat ; rhs = betaeduc $
Descriptive Statistics
All results based on nonmissing observations.
==============================================================================
Variable      Mean       Std.Dev.     Minimum      Maximum      Cases Missing
==============================================================================
All observations in current sample
--------+---------------------------------------------------------------------
BETAEDUC|   .161184      .132334     -.268006      .506677        887       0
(Histogram of BETAEDUC omitted: Frequency against BETAEDUC, horizontal axis from -.268 to .507)

A Hierarchical Linear Model
A hedonic model of house values
Beron, K., Murdoch, J., Thayer, M., "Hierarchical Linear Models with Application to Air Pollution in the South Coast Air Basin," American Journal of Agricultural Economics, 81, 5, 1999.

HLM
$y_{ijk}$ = log of home sale price $i$, neighborhood $j$, community $k$.
$$y_{ijk} = \sum_{m=1}^{M}\beta^m_{jk}x^m_{ijk} + \varepsilon_{ijk} \quad \text{(linear regression model)}$$
$x^m_{ijk}$ = sq.ft, #baths, lot size, central heat, AC, pool, good view, age, distance to beach
Random coefficients
$$\beta^m_{jk} = \sum_{q=1}^{Q}\gamma^{qm}_{j}N^q_{jk} + w^m_{jk}$$
$N^q_{jk}$ = %population poor, race mix, avg age, avg. travel to work, FBI crime index, school avg. CA achievement test score
$$\gamma^{qm}_{j} = \sum_{s=1}^{S}\delta^{s}_{qm}E^s_j + v^{qm}_j$$
$E^s_j$ = air quality measure, visibility
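Returning to the minimum distance slide for the four estimates of β1: the closed form shown there is an inverse-variance weighted average, and it is easy to check numerically. The sketch below assumes a diagonal Ω built from the four reported standard errors, that is, it treats the four estimates as uncorrelated, which is what the weighted-average formula on the slide implies but which is an assumption rather than something the source states. The function name `minimum_distance` is a hypothetical helper.

```python
import numpy as np

# Four second-step estimates of beta_1 and their standard errors (from the slides)
est = np.array([0.07, 0.31, 0.17, 0.10])
se = np.array([0.11, 0.11, 0.10, 0.11])

def minimum_distance(est, Omega):
    """Minimize (est - b*1)' Omega^{-1} (est - b*1) over the scalar b.

    Returns the minimizer and its implied variance 1 / (1' Omega^{-1} 1).
    """
    ones = np.ones_like(est)
    Oinv = np.linalg.inv(Omega)
    var = 1.0 / (ones @ Oinv @ ones)
    return var * (ones @ Oinv @ est), var

# Assumption: Omega is diagonal (estimates treated as uncorrelated)
b_md, var_md = minimum_distance(est, np.diag(se**2))
print(f"minimum distance estimate: {b_md:.3f}")
print(f"implied standard error:    {np.sqrt(var_md):.3f}")
```

Under this assumption the estimate is about 0.163, in line with the roughly 16 basis points quoted on the slides. A non-diagonal Ω can be passed to the same function if the correlations among the four estimates are available.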