(Fixed and Random Effects), Two Step Analysis of Panel Data Models

Part 11: Heterogeneity [ 1/36]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business
Agenda
Random Parameter Models
    Fixed effects
    Random effects
    Heterogeneity in Dynamic Panels
    Random Coefficient Vectors: Classical vs. Bayesian
    General RPM: Swamy/Hsiao/Hildreth/Houck
    Hierarchical and "Two Step" Models
    'True' Random Parameter Variation
        Discrete: Latent Class
        Continuous
            Classical
            Bayesian
A Capital Asset Pricing Model
$R_{it} = \gamma_{0t} + \gamma_{1t}\beta_i + \gamma_{2t}\beta_i^2 + \gamma_{3t}s_i + \varepsilon_{it}$
$R_{it}$ = one-period percentage return
$\gamma_{0t}$ = expected return on a riskless security (stochastic)
$\gamma_{1t}$ = expected premium on the 'market' portfolio, $R_{Mt} - R_{0t}$
$\gamma_{2t}$ = "nonlinear" risk effect
$\gamma_{3t}$ = "nonbeta risk" term
Data are $[R_{it}, \beta_i, \beta_i^2, s_i]$, generated by auxiliary regressions.
Coefficients are 'random' through time.
Fama and MacBeth, "Risk, Return, and Equilibrium: Empirical Tests," Journal of Political Economy, 1974.
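The period-by-period logic of Fama-MacBeth can be sketched in Python. This is a minimal illustration on simulated data, not the authors' code: it keeps only the constant and the $\beta_i$ term, and the values of N, T, the betas and the distributions of $\gamma_{0t}$ and $\gamma_{1t}$ are hypothetical. Each period gets its own cross-sectional OLS regression; the estimates are then averaged over time, and their time-series dispersion supplies the standard error, which is how the randomness of the coefficients through time enters the inference.

```python
import numpy as np

# Hypothetical simulated data: N assets with fixed betas, T periods in which
# the cross-sectional coefficients gamma_0t, gamma_1t are random draws.
rng = np.random.default_rng(0)
N, T = 200, 120
beta_i = rng.uniform(0.5, 1.5, size=N)          # asset betas (from a first pass)
gamma0 = 0.01 + 0.02 * rng.standard_normal(T)   # riskless return, random in t
gamma1 = 0.005 + 0.03 * rng.standard_normal(T)  # market premium, random in t

X = np.column_stack([np.ones(N), beta_i])       # cross-section design, N x 2
gammas = np.empty((T, 2))
for t in range(T):
    R_t = gamma0[t] + gamma1[t] * beta_i + 0.05 * rng.standard_normal(N)
    # One cross-sectional OLS regression of returns on betas per period
    gammas[t] = np.linalg.lstsq(X, R_t, rcond=None)[0]

# Average the period estimates; the standard error comes from their
# time-series dispersion, which reflects the randomness of gamma_t.
gamma_hat = gammas.mean(axis=0)
se = gammas.std(axis=0, ddof=1) / np.sqrt(T)
print(gamma_hat, se)
```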
Heterogeneous Production Model
$\text{Health}_{i,t} = \alpha_i + \beta_i\,\text{HEXP}_{i,t} + \delta_i\,\text{EDUC}_{i,t} + \varepsilon_{i,t}$
i = country, t = year
Health = health care outcome, e.g., life expectancy
HEXP = health care expenditure
EDUC = education
Parameter heterogeneity:
Discrete? AIDS-dominated vs. QOL-dominated
Continuous? Cross-cultural heterogeneity
World Health Organization, "The 2000 World Health Report"
Parameter Heterogeneity
Unobserved Effects as Random Constants
$y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$
$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
$\alpha_i = \alpha + u_i$
$E[u_i \mid X_i] = 0$ --> Random effects
$E[u_i \mid X_i] \neq 0$ --> Fixed effects
$E_X\{E[u_i \mid X_i]\} = 0$
$\text{Var}[u_i \mid X_i]$ not yet defined; so far, constant.
Parameter Heterogeneity
Generalize to Random Parameters
$y_{it} = x_{it}'\beta_i + \varepsilon_{it}$
$\beta_i = \beta + u_i$
$E[u_i \mid X_i]$ zero or nonzero, to be defined
$E_X\{E[u_i \mid X_i]\} = 0$
$\text{Var}[u_i \mid X_i]$ to be defined, constant or variable
"The Pooling Problem": What is the consequence of estimating under the erroneous assumption of constant parameters? (Theil, 1960, "The Aggregation Problem"; Maddala, 1970s-1990s, "The Pooling Problem")
Fixed Effects
(Hildreth, Houck, Hsiao, Swamy)
$y_{it} = x_{it}'\beta_i + \varepsilon_{it}$, each observation
$y_i = X_i\beta_i + \varepsilon_i$, $T_i$ observations
$\beta_i = \beta + u_i$
Assume (temporarily) $T_i > K$.
$E[u_i \mid X_i] = g(X_i)$ (conditional mean)
$P[u_i \mid X_i] = (X_i - E[X_i])\theta$ (projection)
$E_X\{E[u_i \mid X_i]\} = E_X\{P[u_i \mid X_i]\} = 0$
$\text{Var}[u_i \mid X_i] = \Gamma$, constant but nonzero
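OLS and GLS Are Inconsistent
$y_i = X_i\beta_i + \varepsilon_i$, $T_i$ observations
$\beta_i = \beta + u_i$
$y_i = X_i\beta + X_iu_i + \varepsilon_i$, $T_i$ observations
$y_i = X_i\beta + w_i$
$E[w_i \mid X_i] = X_iE[u_i \mid X_i] + E[\varepsilon_i \mid X_i] \neq 0$, so OLS and GLS, which treat $w_i$ as a mean-zero disturbance, are inconsistent for $\beta$.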
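Estimating the Fixed Effects Model
$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_N \end{bmatrix}\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_N \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}$
Estimator: equation-by-equation OLS or (F)GLS.
Estimate $\beta$? $\frac{1}{N}\sum_{i=1}^N \hat\beta_i$ is consistent for $E[\beta_i]$ as $N \to \infty$.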
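A minimal sketch of equation-by-equation OLS followed by averaging the $b_i$ (often called the mean group estimator), on a simulated panel. The dimensions, $\beta$ and the Cholesky factor of $\Gamma$ are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical simulated panel with heterogeneous slopes beta_i = beta + u_i
rng = np.random.default_rng(2)
N, T, K = 500, 20, 2
beta = np.array([1.0, 0.5])
Gamma_chol = np.array([[0.3, 0.0], [0.1, 0.2]])  # Cholesky factor of Gamma

b = np.empty((N, K))
for i in range(N):
    X_i = np.column_stack([np.ones(T), rng.standard_normal(T)])
    beta_i = beta + Gamma_chol @ rng.standard_normal(K)
    y_i = X_i @ beta_i + rng.standard_normal(T)
    # Equation-by-equation OLS: b_i = (X_i'X_i)^{-1} X_i'y_i
    b[i] = np.linalg.solve(X_i.T @ X_i, X_i.T @ y_i)

# Mean of the individual estimates, consistent for E[beta_i] as N grows
b_bar = b.mean(axis=0)
print(b_bar)
```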
Partial Fixed Effects Model
Some individual-specific parameters:
$y_i = D_i\alpha_i + X_i\beta + \varepsilon_i$, $T_i$ observations
Use OLS and Frisch-Waugh:
$\hat\beta = \left[\sum_{i=1}^N X_i'M_D^iX_i\right]^{-1}\left[\sum_{i=1}^N X_i'M_D^iy_i\right]$, $M_D^i = I - D_i(D_i'D_i)^{-1}D_i'$
$\hat\alpha_i = (D_i'D_i)^{-1}D_i'(y_i - X_i\hat\beta)$
E.g., individual-specific time trends,
$y_{it} = \alpha_{i0} + \alpha_{i1}t + x_{it}'\beta + \varepsilon_{it}$; detrend individual data, then OLS.
E.g., individual-specific constant terms,
$y_{it} = \alpha_{i0} + x_{it}'\beta + \varepsilon_{it}$; individual group mean deviations, then OLS.
Heterogeneous Dynamic Models
$\log Y_{i,t} = \alpha_i + \gamma_i\log Y_{i,t-1} + \beta_i x_{it} + \varepsilon_{i,t}$
Long-run effect of interest is $\theta_i = \dfrac{\beta_i}{1 - \gamma_i}$
See:
Pesaran, H., Smith, R., Im, K., "Estimating Long-Run Relationships From Dynamic Heterogeneous Panels," Journal of Econometrics, 1995. (Repeated with further study in Matyas and Sevestre, The Econometrics of Panel Data.)
Smith, J., notes, Applied Econometrics, Dynamic Panel Data Models, University of Warwick.
http://www2.warwick.ac.uk/fac/soc/economics/staff/faculty/jennifersmith/panel/
Weinhold, D., "A Dynamic 'Fixed Effects' Model for Heterogeneous Panel Data," London School of Economics, 1999.
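A sketch of how $\theta_i$ might be estimated unit by unit and then averaged (a mean group approach in the spirit of the Pesaran-Smith reference), assuming a scalar $x_{it}$, a single lag, and simulated data with hypothetical parameters. Under this design $E[\theta_i] = 0.5\ln(1.75)/0.3 \approx 0.93$. With finite $T$ the unit-level OLS estimates of $\gamma_i$ carry a small-sample bias, so the average is only approximately centered on that value.

```python
import numpy as np

# Hypothetical heterogeneous dynamic panel:
#   y_it = alpha_i + gamma_i * y_i,t-1 + beta_i * x_it + e_it
rng = np.random.default_rng(4)
N, T = 200, 100
theta = np.empty(N)
for i in range(N):
    alpha_i = rng.normal(1.0, 0.2)
    gamma_i = rng.uniform(0.3, 0.6)
    beta_i = rng.normal(0.5, 0.1)
    x = rng.standard_normal(T + 1)
    y = np.zeros(T + 1)                          # y_i0 = 0 as a start-up value
    for t in range(1, T + 1):
        y[t] = alpha_i + gamma_i * y[t - 1] + beta_i * x[t] + 0.5 * rng.standard_normal()
    # Unit-level OLS of y_t on (1, y_{t-1}, x_t)
    Z = np.column_stack([np.ones(T), y[:-1], x[1:]])
    _, g, b = np.linalg.lstsq(Z, y[1:], rcond=None)[0]
    theta[i] = b / (1.0 - g)                     # estimated long-run effect theta_i

theta_mg = theta.mean()                          # mean group estimate of E[theta_i]
print(theta_mg)
```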
Random Effects and Random Parameters
THE Random Parameters Model
$y_{it} = x_{it}'\beta_i + \varepsilon_{it}$, each observation
$y_i = X_i\beta_i + \varepsilon_i$, $T_i$ observations
$\beta_i = \beta + u_i$
Assume (temporarily) $T_i > K$.
$E[u_i \mid X_i] = 0$
$\text{Var}[u_i \mid X_i] = \Gamma$, constant but nonzero
We differentiate the classical and Bayesian interpretations.
Randomness here is heterogeneity, not "uncertainty."
The Bayesian approach will be considered later.
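Estimating the Random Parameters Model
$y_i = X_i\beta_i + \varepsilon_i$, $T_i$ observations
$\beta_i = \beta + u_i$
$y_i = X_i\beta + X_iu_i + \varepsilon_i$, $T_i$ observations
$y_i = X_i\beta + w_i$
$E[w_i \mid X_i] = X_iE[u_i \mid X_i] + E[\varepsilon_i \mid X_i] = 0$
$\text{Var}[w_i \mid X_i] = X_i\Gamma X_i' + \sigma^2_{\varepsilon,i}I$  <== Should $\sigma^2_{\varepsilon,i}$ vary by i?
Objects of estimation: $\beta$, $\sigma^2_{\varepsilon,i}$, $\Gamma$
Second-level estimation: $\beta_i$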
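Estimating the Random Parameters Model by OLS
$y_i = X_i\beta_i + \varepsilon_i$, $T_i$ observations
$\beta_i = \beta + u_i$
$y_i = X_i\beta + X_iu_i + \varepsilon_i$, $T_i$ observations
$y_i = X_i\beta + w_i$
$b = \left[\sum_{i=1}^N X_i'X_i\right]^{-1}\left[\sum_{i=1}^N X_i'y_i\right] = \beta + \left[\sum_{i=1}^N X_i'X_i\right]^{-1}\left[\sum_{i=1}^N X_i'w_i\right]$
$\text{Var}[b \mid X] = \left[\sum_{i=1}^N X_i'X_i\right]^{-1}\left[\sum_{i=1}^N X_i'(X_i\Gamma X_i' + \sigma^2_\varepsilon I)X_i\right]\left[\sum_{i=1}^N X_i'X_i\right]^{-1}$
$\quad = \sigma^2_\varepsilon\left[\sum_{i=1}^N X_i'X_i\right]^{-1} + \left[\sum_{i=1}^N X_i'X_i\right]^{-1}\left[\sum_{i=1}^N (X_i'X_i)\Gamma(X_i'X_i)\right]\left[\sum_{i=1}^N X_i'X_i\right]^{-1}$
$\quad$ = the usual + the variation due to the random parameters
Robust estimator:
$\text{Est.Var}[b] = \left[\sum_{i=1}^N X_i'X_i\right]^{-1}\left[\sum_{i=1}^N X_i'\hat w_i\hat w_i'X_i\right]\left[\sum_{i=1}^N X_i'X_i\right]^{-1}$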
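Estimating the Random Parameters Model by GLS
$y_i = X_i\beta_i + \varepsilon_i$, $T_i$ observations
$\beta_i = \beta + u_i$
$y_i = X_i\beta + X_iu_i + \varepsilon_i$, $T_i$ observations
$y_i = X_i\beta + w_i$, $\text{Var}[w_i \mid X_i] = \Omega_i = X_i\Gamma X_i' + \sigma^2_{\varepsilon,i}I$
$\hat\beta = \left[\sum_{i=1}^N X_i'\Omega_i^{-1}X_i\right]^{-1}\left[\sum_{i=1}^N X_i'\Omega_i^{-1}y_i\right]$
For FGLS, we need $\hat\Gamma$ and $\hat\sigma^2_{\varepsilon,i}$.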
Estimating the RPM
$b_i = \beta + (X_i'X_i)^{-1}X_i'w_i$, $w_i = X_iu_i + \varepsilon_i$
$\quad = \beta + u_i + (X_i'X_i)^{-1}X_i'\varepsilon_i$
$\text{Var}[b_i \mid X_i] = \Gamma + \sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}$
$\hat\sigma^2_{\varepsilon,i} = \dfrac{\sum_{t=1}^{T_i}(y_{it} - x_{it}'b_i)^2}{T_i - K}$ is unbiased
(but not consistent, because $T_i$ is fixed).
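An Estimator for Γ
$E[b_i \mid X_i] = \beta$
$\text{Var}[b_i \mid X_i] = \Gamma + \sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}$
$\text{Var}[b_i] = \text{Var}_X\{E[b_i \mid X_i]\} + E_X\{\text{Var}[b_i \mid X_i]\}$
$\quad = 0 + E_X[\Gamma + \sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}] = \Gamma + E_X[\sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}]$
Estimate $\text{Var}[b_i]$ with $\frac{1}{N}\sum_{i=1}^N(b_i - \bar b)(b_i - \bar b)'$, where $\bar b = \frac{1}{N}\sum_{i=1}^N b_i$
Estimate $E_X[\sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}]$ with $\frac{1}{N}\sum_{i=1}^N\hat\sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}$
$\hat\Gamma = \frac{1}{N}\sum_{i=1}^N(b_i - \bar b)(b_i - \bar b)' - \frac{1}{N}\sum_{i=1}^N\hat\sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}$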
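A sketch of this moment estimator of $\Gamma$ (Swamy's), assuming $T_i > K$ for every individual. `swamy_gamma` is a hypothetical helper name, and the simulated $\Gamma$ is an arbitrary choice for illustration.

```python
import numpy as np

def swamy_gamma(Xs, ys):
    """Moment estimator of Gamma from unit-by-unit OLS (requires T_i > K)."""
    N, K = len(Xs), Xs[0].shape[1]
    b = np.empty((N, K))
    V = np.zeros((K, K))                  # accumulates s2_i (X_i'X_i)^{-1}
    for i, (X, y) in enumerate(zip(Xs, ys)):
        XtX_inv = np.linalg.inv(X.T @ X)
        b[i] = XtX_inv @ X.T @ y
        e = y - X @ b[i]
        s2 = e @ e / (X.shape[0] - K)     # unbiased sigma^2_{e,i}
        V += s2 * XtX_inv
    b_bar = b.mean(axis=0)
    S = (b - b_bar).T @ (b - b_bar) / N   # (1/N) sum (b_i - b_bar)(b_i - b_bar)'
    return S - V / N, b, b_bar

# Hypothetical simulated panel
rng = np.random.default_rng(7)
beta, Gamma = np.array([1.0, 0.5]), np.array([[0.09, 0.02], [0.02, 0.16]])
Xs, ys = [], []
for i in range(1000):
    X = np.column_stack([np.ones(20), rng.standard_normal(20)])
    ys.append(X @ rng.multivariate_normal(beta, Gamma) + rng.standard_normal(20))
    Xs.append(X)

Gamma_hat, b, b_bar = swamy_gamma(Xs, ys)
print(Gamma_hat)
```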
A Positive Definite Estimator for Γ
$\hat\Gamma = \frac{1}{N}\sum_{i=1}^N(b_i - \bar b)(b_i - \bar b)' - \frac{1}{N}\sum_{i=1}^N\hat\sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}$
This may not be positive definite. What to do?
(1) The second term converges (in theory) to 0 as $T_i \to \infty$. Drop it.
(2) Various Bayesian "shrinkage" estimators.
(3) An ML estimator.
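A sketch of remedy (1): check the eigenvalues of the difference and fall back to the first term alone when the difference is not positive definite. The helper name `pd_gamma` and the two input matrices are hypothetical, chosen only to exercise both branches.

```python
import numpy as np

def pd_gamma(S, V_bar):
    """S = (1/N) sum (b_i - b_bar)(b_i - b_bar)'; V_bar = (1/N) sum s2_i (X_i'X_i)^{-1}.
    Keep S - V_bar if it is positive definite; otherwise drop the second term
    and return S, which is positive semidefinite by construction."""
    G = S - V_bar
    if np.all(np.linalg.eigvalsh(G) > 0):
        return G, False
    return S, True

# Hypothetical moment matrices
S = np.array([[0.10, 0.02], [0.02, 0.05]])
G1, dropped1 = pd_gamma(S, np.diag([0.01, 0.01]))   # difference is PD: kept
G2, dropped2 = pd_gamma(S, np.diag([0.04, 0.08]))   # negative (2,2) element: dropped
print(dropped1, dropped2)  # False True
```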
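Estimating βi
$\hat\beta_{GLS} = \sum_{i=1}^N W_i b_{i,OLS}$
$W_i = \left\{\sum_{j=1}^N\left[\Gamma + \sigma^2_{\varepsilon,j}(X_j'X_j)^{-1}\right]^{-1}\right\}^{-1}\left[\Gamma + \sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}\right]^{-1}$
Best linear unbiased predictor based on GLS is
$\hat\beta_i = A_i\hat\beta_{GLS} + (I - A_i)b_{i,OLS} = b_{i,OLS} + A_i(\hat\beta_{GLS} - b_{i,OLS})$
$A_i = \left\{\Gamma^{-1} + \left[\sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}\right]^{-1}\right\}^{-1}\Gamma^{-1}$
$\text{Var}[\hat\beta_i \mid \text{all data}] = A_i\text{Var}[\hat\beta_{GLS}]A_i' + (I - A_i)\text{Var}[b_{i,OLS}](I - A_i)' + A_iW_i\text{Var}[b_{i,OLS}](I - A_i)' + (I - A_i)\text{Var}[b_{i,OLS}]W_i'A_i'$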
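A sketch of the GLS weights and the shrinkage predictor, assuming $\Gamma$ and $V_i = \sigma^2_{\varepsilon,i}(X_i'X_i)^{-1}$ are given (in practice, estimated). `swamy_predict` is a hypothetical name. The toy example is scalar ($K = 1$) so the result can be checked by hand: with $\Gamma = V_i = 0.5$ the weights are equal, $\hat\beta_{GLS} = 2$, and $A_i = 1/2$, so each $b_i$ moves halfway toward 2.

```python
import numpy as np

def swamy_predict(b, V, Gamma):
    """b: (N, K) unit OLS estimates; V: (N, K, K) with V_i = s2_i (X_i'X_i)^{-1};
    Gamma: (K, K). Returns beta_GLS and the shrunken predictions beta_i."""
    P = np.linalg.inv(Gamma[None, :, :] + V)          # [Gamma + V_i]^{-1}, stacked
    var_gls = np.linalg.inv(P.sum(axis=0))            # Var[beta_GLS]
    W = var_gls[None, :, :] @ P                       # W_i (they sum to I)
    beta_gls = np.einsum("nij,nj->i", W, b)           # sum_i W_i b_i
    G_inv = np.linalg.inv(Gamma)
    beta_i = np.empty_like(b)
    for i in range(b.shape[0]):
        A_i = np.linalg.inv(G_inv + np.linalg.inv(V[i])) @ G_inv
        beta_i[i] = b[i] + A_i @ (beta_gls - b[i])    # b_i + A_i (beta_GLS - b_i)
    return beta_gls, beta_i

# Toy scalar example (K = 1) that can be checked by hand
b = np.array([[1.0], [2.0], [3.0]])
V = np.full((3, 1, 1), 0.5)
Gamma = np.array([[0.5]])
beta_gls, beta_i = swamy_predict(b, V, Gamma)
print(beta_gls, beta_i.ravel())
```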
Baltagi and Griffin’s Gasoline Data
World Gasoline Demand Data, 18 OECD Countries, 19 years
Variables in the file are
COUNTRY = name of country
YEAR = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG = log of real price of gasoline
LCARPCAP = log of per capita number of cars
See Baltagi (2001, p. 24) for analysis of these data. The article on which the
analysis is based is Baltagi, B. and Griffin, J., "Gasoline Demand in the OECD: An
Application of Pooling and Testing Procedures," European Economic Review, 22,
1983, pp. 117-137. The data were downloaded from the website for Baltagi's
text.
OLS and FGLS Estimates
+----------------------------------------------------+
| Overall OLS results for pooled sample.
| Residuals   Sum of squares      = 14.90436
|             Standard error of e = .2099898
| Fit         R-squared           = .8549355
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
|Constant |   2.39132562 |      .11693429 | 20.450 |  .0000  |
|LINCOMEP |    .88996166 |      .03580581 | 24.855 |  .0000  |
|LRPMG    |   -.89179791 |      .03031474 |-29.418 |  .0000  |
|LCARPCAP |   -.76337275 |      .01860830 |-41.023 |  .0000  |
+---------+--------------+----------------+--------+---------+
+------------------------------------------------+
| Random Coefficients Model
| Residual standard deviation       = .3498
| R squared                         = .5976
| Chi-squared for homogeneity test  = 22202.43
| Degrees of freedom                = 68
| Probability value for chi-squared = .000000
+------------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
|CONSTANT |   2.40548802 |      .55014979 |  4.372 |  .0000  |
|LINCOMEP |    .39314902 |      .11729448 |  3.352 |  .0008  |
|LRPMG    |   -.24988767 |      .04372201 | -5.715 |  .0000  |
|LCARPCAP |   -.44820927 |      .05416460 | -8.275 |  .0000  |
+---------+--------------+----------------+--------+---------+
Country Specific Estimates
Estimated Γ
Two Step Estimation (Saxonhouse)
A Fixed Effects Model
$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
Secondary Model
$\alpha_i = z_i'\delta$
Two approaches
(1) Reduced form is a linear model with time-constant $z_i$:
$y_{it} = x_{it}'\beta + z_i'\delta + \varepsilon_{it}$
(2) Two step
(a) FEM at step 1
(b) $a_i = \alpha_i + (a_i - \alpha_i) = z_i'\delta + v_i$
$\text{Var}[v_i] = \sigma^2_\varepsilon\left[\dfrac{1}{T_i} + \bar x_i'\left(\sum_{j=1}^N X_j'M_D^jX_j\right)^{-1}\bar x_i\right]$
Use weighted least squares regression of $a_i$ on $z_i$.
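A sketch of the two-step procedure with a scalar $x_{it}$, simulated data, and hypothetical parameter values. For the $\bar x_i$ term it uses the variance of the pooled within estimator, $\sigma^2_\varepsilon(\sum_j x_j'M_D^jx_j)^{-1}$, because $b$ in $a_i = \bar y_i - \bar x_i b$ is estimated from all groups.

```python
import numpy as np

# Hypothetical data: y_it = alpha_i + beta*x_it + e_it, alpha_i = z_i'delta
rng = np.random.default_rng(8)
N, beta, delta = 300, 1.5, np.array([0.5, 2.0])
T = rng.integers(4, 12, size=N)                          # unbalanced panel, T_i
z = np.column_stack([np.ones(N), rng.standard_normal(N)])
alpha = z @ delta
xs = [rng.standard_normal(T[i]) + 0.5 * alpha[i] for i in range(N)]  # x correlated with alpha
ys = [alpha[i] + beta * xs[i] + rng.standard_normal(T[i]) for i in range(N)]

# Step 1: FEM (within) estimator via group-mean deviations
dx = [x - x.mean() for x in xs]
dy = [y - y.mean() for y in ys]
SXX = sum(d @ d for d in dx)                             # sum_j x_j' M_D^j x_j
b = sum(u @ v for u, v in zip(dx, dy)) / SXX
s2 = sum(((v - b * u) ** 2).sum() for u, v in zip(dx, dy)) / (T.sum() - N - 1)
xbar = np.array([x.mean() for x in xs])
a = np.array([y.mean() for y in ys]) - b * xbar          # a_i = ybar_i - b * xbar_i

# Step 2: WLS of a_i on z_i with weights 1 / Var[v_i]
var_v = s2 * (1.0 / T + xbar ** 2 / SXX)
w = 1.0 / var_v
delta_hat = np.linalg.solve(z.T @ (w[:, None] * z), z.T @ (w * a))
print(b, delta_hat)
```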
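A Hierarchical Model
Fixed Effects Model
$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
Secondary Model
$\alpha_i = z_i'\delta + u_i$  <========
Two approaches
(1) Reduced form is an REM with time-constant $z_i$:
$y_{it} = x_{it}'\beta + z_i'\delta + u_i + \varepsilon_{it}$
(2) Two step
(a) FEM at step 1
(b) $a_i = \alpha_i + (a_i - \alpha_i) = z_i'\delta + u_i + v_i$
$\text{Var}[u_i + v_i] = \sigma^2_u + \sigma^2_\varepsilon\left[\dfrac{1}{T_i} + \bar x_i'\left(\sum_{j=1}^N X_j'M_D^jX_j\right)^{-1}\bar x_i\right]$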
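With $u_i$ present, the second-step weights also need $\sigma^2_u$. One simple possibility, assumed here rather than taken from the source, is a moment estimator: the OLS residuals $e_i$ of $a_i$ on $z_i$ satisfy $E[e_i^2] \approx \sigma^2_u + \text{Var}[v_i]$, so $\hat\sigma^2_u = \max(0, \sum e_i^2/(N - L) - \text{mean Var}[v_i])$ with $L$ columns in $z$, followed by weighted least squares with weights $1/(\hat\sigma^2_u + \text{Var}[v_i])$. The first-step output is simulated directly, with hypothetical values.

```python
import numpy as np

# Hypothetical first-step output: a_i = z_i'delta + u_i + v_i, with the
# sampling variances var_v[i] of v_i treated as known from step 1.
rng = np.random.default_rng(9)
N, delta, sig2_u = 400, np.array([0.5, 2.0]), 0.25
z = np.column_stack([np.ones(N), rng.standard_normal(N)])
var_v = rng.uniform(0.05, 0.5, size=N)
a = z @ delta + rng.normal(0.0, np.sqrt(sig2_u), N) + rng.normal(0.0, np.sqrt(var_v))

# Moment estimate of sigma_u^2 from OLS residuals: E[e_i^2] ~ sigma_u^2 + var_v_i
d_ols = np.linalg.lstsq(z, a, rcond=None)[0]
e = a - z @ d_ols
s2_u = max(0.0, e @ e / (N - z.shape[1]) - var_v.mean())

# FGLS (weighted least squares) with weights 1 / (sigma_u^2 + Var[v_i])
w = 1.0 / (s2_u + var_v)
d_fgls = np.linalg.solve(z.T @ (w[:, None] * z), z.T @ (w * a))
print(s2_u, d_fgls)
```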

Analysis of Fannie Mae



Fannie Mae
The Funding Advantage
The Pass Through
Passmore, W., Sherlund, S., Burgess, G.,
“The Effect of Housing Government-Sponsored
Enterprises on Mortgage Rates,” 2005,
Federal Reserve Board and Real Estate Economics
Two Step Analysis of Fannie-Mae
Fannie Mae's GSE Funding Advantage and Pass Through
$RM_{i,s,t} = \beta_{0,s,t} + \beta_{1,s,t}'\text{LTV}_{i,s,t} + \beta_{2,s,t}\text{Small}_{i,s,t} + \beta_{3,s,t}\text{Fees}_{i,s,t} + \beta_{4,s,t}\text{New}_{i,s,t} + \beta_{5,s,t}\text{MtgCo}_{i,s,t} + \alpha_{s,t}J_{i,s,t} + \varepsilon_{i,s,t}$
i, s, t = individual, state, month
1,036,252 observations in 370 state-months.
RM = mortgage rate
LTV = 3 dummy variables for loan to value
Small = dummy variable for small loan
Fees = dummy variable for whether fees were paid up front
New = dummy variable for new home
MtgCo = dummy variable for mortgage company
J = dummy variable for whether this is a JUMBO loan
THIS IS THE COEFFICIENT OF INTEREST.
Average of 370 First Step Regressions
Symbol  Variable     Mean   S.D.   Coeff  S.E.
RM      Rate %       7.23   0.79
J       Jumbo        0.06   0.23   0.16   0.05
LTV1    75%-80%      0.36   0.48   0.04   0.04
LTV2    81%-90%      0.15   0.35   0.17   0.05
LTV3    >90%         0.22   0.41   0.15   0.04
New     New Home     0.17   0.38   0.05   0.04
Small   < $100,000   0.27   0.44   0.14   0.04
Fees    Fees paid    0.62   0.52   0.06   0.03
MtgCo   Mtg. Co.     0.67   0.47   0.12   0.05
R2 = 0.77
Second Step
$\alpha_{s,t} = \beta_0$
$+ \beta_1$ GSE Funding Advantage$_{s,t}$ (estimated separately)
$+ \beta_2$ Risk-free cost of credit$_{s,t}$
$+ \beta_3$ Corporate debt spreads$_{s,t}$ (estimated 4 different ways)
$+ \beta_4$ Prepayment spread$_{s,t}$
$+ \beta_5$ Maturity mismatch risk$_{s,t}$
$+ \beta_6$ Aggregate demand$_{s,t}$
$+ \beta_7$ Long-term interest rate$_{s,t}$
$+ \beta_8$ Market capacity$_{s,t}$
$+ \beta_9$ Time trend$_{s,t}$
$+ \beta_{10}, \dots, \beta_{13}$: 4 dummy variables for CA, NJ, MD, VA$_{s,t}$
$+ \beta_{14}, \dots, \beta_{16}$: 3 dummy variables for calendar quarters$_{s,t}$
Estimates of β1
Second step based on 370 observations, corrected for "heteroscedasticity, autocorrelation, and monthly clustering."
Four estimates based on different estimates of the corporate credit spread:
0.07 (0.11)
0.31 (0.11)
0.17 (0.10)
0.10 (0.11)
Reconcile the 4 estimates with a minimum distance estimator:
Minimize $d'\hat\Omega^{-1}d$, where $d = \left[(\hat\beta_1^1 - \beta_1), (\hat\beta_1^2 - \beta_1), (\hat\beta_1^3 - \beta_1), (\hat\beta_1^4 - \beta_1)\right]'$
Estimated mortgage rate reduction: about 16 basis points (0.16%).
The Minimum Distance Estimator
0.07 (0.11)
0.31 (0.11)
0.17 (0.10)
0.10 (0.11)
Reconcile the 4 estimates with a minimum distance estimator:
Minimize $d'\hat\Omega^{-1}d$, where $d = \left[(\hat\beta_1^1 - \beta_1), (\hat\beta_1^2 - \beta_1), (\hat\beta_1^3 - \beta_1), (\hat\beta_1^4 - \beta_1)\right]'$
With a diagonal $\hat\Omega$ (squared standard errors on the diagonal), the solution is the inverse-variance weighted average:
$\hat\beta_1 = \dfrac{.07/.11^2 + .31/.11^2 + .17/.10^2 + .10/.11^2}{1/.11^2 + 1/.11^2 + 1/.10^2 + 1/.11^2} \approx .163$
Approximately .16%.
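The arithmetic can be checked with a short script. It assumes a diagonal $\hat\Omega$ built from the squared standard errors, which ignores the covariances among the four estimates (they come from the same data, so in practice those covariances are not zero). Under that assumption the minimizer is $(\iota'\hat\Omega^{-1}\iota)^{-1}\iota'\hat\Omega^{-1}\hat\beta$ with variance $(\iota'\hat\Omega^{-1}\iota)^{-1}$, where $\iota$ is a column of ones. The function name is hypothetical; passing a full $\hat\Omega$ would give the general minimum distance estimate.

```python
import numpy as np

# The four second-step estimates of beta_1 and their standard errors
est = np.array([0.07, 0.31, 0.17, 0.10])
se = np.array([0.11, 0.11, 0.10, 0.11])

def min_distance(est, Omega):
    """Minimize (est - b*iota)' Omega^{-1} (est - b*iota) over the scalar b."""
    iota = np.ones_like(est)
    Oi = np.linalg.inv(Omega)
    var = 1.0 / (iota @ Oi @ iota)
    return var * (iota @ Oi @ est), np.sqrt(var)

b_md, se_md = min_distance(est, np.diag(se ** 2))   # diagonal Omega assumed
print(round(b_md, 4), round(se_md, 4))  # 0.1629 0.0536
```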
A Hierarchical Linear Model
German Health Data
Hsat = β1 + β2AGEit + γi EDUCit + β4 MARRIEDit + εit
γi = α1 + α2FEMALEi + ui
Sample ; all$
Reject ; _Groupti < 7 $
Regress ; Lhs = newhsat ; Rhs = one,age,educ,married
; RPM = female ; Fcn = educ(n)
; pts = 25 ; halton
; pds = _groupti ; Parameters$
Sample ; 1 - 887 $
Create ; betaeduc = beta_i $
Dstat ; rhs = betaeduc $
Histogram ; Rhs = betaeduc $
OLS Results
OLS Starting values for random parameters model...
Ordinary least squares regression ............
LHS=NEWHSAT  Mean                 =     6.69641
             Standard deviation   =     2.26003
             Number of observs.   =        6209
Model size   Parameters           =           4
             Degrees of freedom   =        6205
Residuals    Sum of squares       = 29671.89461
             Standard error of e  =     2.18676
Fit          R-squared            =      .06424
             Adjusted R-squared   =      .06378
Model test   F[ 3, 6205] (prob)   = 142.0(.0000)
--------+--------------------------------------------------------
        |                Standard            Prob.      Mean
 NEWHSAT|  Coefficient     Error       z    z>|Z|       of X
--------+--------------------------------------------------------
Constant|   7.02769***    .22099    31.80   .0000
     AGE|   -.04882***    .00307   -15.90   .0000     44.3352
 MARRIED|    .29664***    .07701     3.85   .0001      .84539
    EDUC|    .14464***    .01331    10.87   .0000     10.9409
--------+--------------------------------------------------------
Maximum Simulated Likelihood
Normal exit: 27 iterations. Status=0. F=    12584.28
-----------------------------------------------------------------
Random Coefficients LinearRg Model
Dependent variable                NEWHSAT
Log likelihood function      -12583.74717
Estimation based on N =   6209, K =   7
Unbalanced panel has    887 individuals
LINEAR regression model
Simulation based on     25 Halton draws
--------+--------------------------------------------------------
        |                Standard            Prob.      Mean
 NEWHSAT|  Coefficient     Error       z    z>|Z|       of X
--------+--------------------------------------------------------
        |Nonrandom parameters
Constant|   7.34576***    .15415    47.65   .0000
     AGE|   -.05878***    .00206   -28.56   .0000     44.3352
 MARRIED|    .23427***    .05034     4.65   .0000      .84539
        |Means for random parameters
    EDUC|    .16580***    .00951    17.43   .0000     10.9409
        |Scale parameters for dists. of random parameters
    EDUC|   1.86831***    .00179  1044.68   .0000
        |Heterogeneity in the means of random parameters
cEDU_FEM|   -.03493***    .00379    -9.21   .0000
        |Variance parameter given is sigma
Std.Dev.|   1.58877***    .00954   166.45   .0000
--------+--------------------------------------------------------
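Reading this output through the second-level equation $\gamma_i = \alpha_1 + \alpha_2\,\text{FEMALE}_i + u_i$, and assuming the "Means for random parameters" row for EDUC is $\hat\alpha_1$ and the cEDU_FEM row is $\hat\alpha_2$, the implied average education coefficients for men and women work out as follows.

```latex
\hat E[\gamma_i \mid \text{FEMALE}_i = 0] = \hat\alpha_1 = .16580, \qquad
\hat E[\gamma_i \mid \text{FEMALE}_i = 1] = \hat\alpha_1 + \hat\alpha_2 = .16580 - .03493 = .13087
```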
“Individual Coefficients”
--> Sample ; 1 - 887 $
--> create ; betaeduc = beta_i $
--> dstat  ; rhs = betaeduc $
Descriptive Statistics
All results based on nonmissing observations.
==============================================================================
Variable      Mean     Std.Dev.     Minimum     Maximum     Cases  Missing
==============================================================================
All observations in current sample
--------+---------------------------------------------------------------------
BETAEDUC|   .161184    .132334    -.268006     .506677       887        0
[Histogram of BETAEDUC: vertical axis Frequency, horizontal axis from -.268 to .507]
A Hierarchical Linear Model


A hedonic model of house values
Beron, K., Murdoch, J., Thayer, M.,
“Hierarchical Linear Models with Application to
Air Pollution in the South Coast Air Basin,”
American Journal of Agricultural Economics, 81,
5, 1999.
HLM
$y_{ijk}$ = log of home sale price i, neighborhood j, community k.
$y_{ijk} = \sum_{m=1}^{M}\beta^m_{jk}x^m_{ijk} + \varepsilon_{ijk}$ (linear regression model)
$x^m_{ijk}$ = sq. ft., # baths, lot size, central heat, AC, pool, good view, age, distance to beach
Random coefficients:
$\beta^m_{jk} = \sum_{q=1}^{Q}\gamma^{qm}_{j}N^q_{jk} + w^m_{jk}$
$N^q_{jk}$ = % population poor, race mix, avg. age, avg. travel to work, FBI crime index, school avg. CA achievement test score
$\gamma^{qm}_j = \sum_{s=1}^{S}\delta^{qm}_s E^s_j + v^{qm}_j$
$E^s_j$ = air quality measure, visibility