12. Random Parameters, Discrete Random Parameter Variation

Part 12: Random Parameters [ 1/46]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business
12. Random Parameters Linear Models
Part 12: Random Parameters [ 3/46]
Parameter Heterogeneity
(1) Regression model
$y_{i,t} = \mathbf{x}_{i,t}'\boldsymbol{\beta}_i + \varepsilon_{i,t}$
(2) Conditional probability model
$f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_i)$
(3) Heterogeneity - how are the parameters distributed across individuals?
(a) Discrete - the population contains a mixture of Q types of individuals.
(b) Continuous - the parameters are part of the stochastic structure of the population.
Part 12: Random Parameters [ 4/46]
Agenda
‘True’ Random Parameter Variation
  - Discrete: Latent Class
  - Continuous
    - Classical
    - Bayesian
Part 12: Random Parameters [ 5/46]
Discrete Parameter Variation
The Latent Class Model
(1) The population is a (finite) mixture of Q types of individuals, q = 1,...,Q. The Q 'classes' are differentiated by $(\boldsymbol{\beta}_q, \sigma_{\varepsilon,q})$.
(a) The analyst does not know class memberships. ('Latent.')
(b) 'Mixing probabilities' (from the point of view of the analyst) are $\pi_1, \ldots, \pi_Q$, with $\sum_{q=1}^{Q} \pi_q = 1$.
(2) The conditional density is
$P(y_{i,t} \mid \text{class} = q) = f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_q, \sigma_{\varepsilon,q})$
Part 12: Random Parameters [ 6/46]
Log Likelihood for an LC Model
The conditional density for each observation is
$P(y_{i,t} \mid \mathbf{x}_{i,t}, \text{class} = q) = f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_q)$
The joint conditional density for the $T_i$ observations is
$f(y_{i,1}, y_{i,2}, \ldots, y_{i,T_i} \mid \mathbf{X}_i, \boldsymbol{\beta}_q) = \prod_{t=1}^{T_i} f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_q)$
($T_i$ may be 1. This is not only a 'panel data' model.)
Maximize this for each class if the classes are known. They aren't. The unconditional density for individual i is
$f(y_{i,1}, y_{i,2}, \ldots, y_{i,T_i} \mid \mathbf{X}_i) = \sum_{q=1}^{Q} \pi_q \prod_{t=1}^{T_i} f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_q)$
Log likelihood:
$\log L(\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_Q, \boldsymbol{\delta}_1, \ldots, \boldsymbol{\delta}_Q) = \sum_{i=1}^{N} \log \sum_{q=1}^{Q} \pi_q \prod_{t=1}^{T_i} f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_q)$
Part 12: Random Parameters [ 7/46]
Example: Mixture of Normals
Q normal populations, each with mean $\mu_q$ and standard deviation $\sigma_q$. For each individual in each class at each period,
$f(y_{i,t} \mid \text{class} = q) = \frac{1}{\sigma_q \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y_{i,t} - \mu_q}{\sigma_q}\right)^2\right] = \frac{1}{\sigma_q}\,\phi\!\left(\frac{y_{i,t} - \mu_q}{\sigma_q}\right)$
With panel data, T observations on each individual i,
$f(y_{i,1}, \ldots, y_{i,T} \mid \text{class} = q) = \left(\frac{1}{\sigma_q \sqrt{2\pi}}\right)^{T} \exp\left[-\frac{1}{2}\sum_{t=1}^{T}\left(\frac{y_{i,t} - \mu_q}{\sigma_q}\right)^2\right]$
Log likelihood:
$\log L = \sum_{i=1}^{N} \log \sum_{q=1}^{Q} \pi_q \left(\frac{1}{\sigma_q \sqrt{2\pi}}\right)^{T} \exp\left[-\frac{1}{2}\sum_{t=1}^{T}\left(\frac{y_{i,t} - \mu_q}{\sigma_q}\right)^2\right]$
Part 12: Random Parameters [ 9/46]
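The mixture log likelihood above is straightforward to compute directly. A minimal Python sketch, added for illustration (not from the original deck): it assumes a balanced N x T panel and works in logs for numerical stability; all names are mine.

    import numpy as np
    from scipy.stats import norm

    def lc_loglik(y, pi, mu, sigma):
        # y: (N, T) panel; pi, mu, sigma: length-Q class parameters
        # log prod_t f(y_it | class q) for each i and q -> (N, Q)
        logf = np.stack([norm.logpdf(y, mu[q], sigma[q]).sum(axis=1)
                         for q in range(len(pi))], axis=1)
        m = logf.max(axis=1, keepdims=True)      # stabilize the log-sum-exp
        # sum_i log sum_q pi_q * exp(logf_iq)
        return np.sum(m[:, 0] + np.log(np.exp(logf - m) @ pi))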
Unmixing a Mixed Sample
Sample  ; 1 - 1000$
Calc    ; Ran(123457)$
Create  ; lc1=rnn(1,1) ; lc2=rnn(5,1)$
Create  ; class=rnu(0,1)$
Create  ; if(class<.3)ylc=lc1 ; (else)ylc=lc2$
Kernel  ; rhs=ylc $
Regress ; lhs=ylc ; rhs=one ; lcm ; pts=2 ; pds=1$
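A rough Python equivalent of the data-generation commands above (a sketch only; NumPy's generator will not reproduce the same draws as NLOGIT's seed):

    import numpy as np

    rng = np.random.default_rng(123457)      # analogous to Calc ; Ran(123457)$
    lc1 = rng.normal(1.0, 1.0, 1000)         # rnn(1,1): N(1, 1) draws
    lc2 = rng.normal(5.0, 1.0, 1000)         # rnn(5,1): N(5, 1) draws
    cls = rng.uniform(0.0, 1.0, 1000)        # rnu(0,1)
    ylc = np.where(cls < 0.3, lc1, lc2)      # about 30% from the N(1,1) class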
(Figure: kernel density estimate for YLC.)
Part 12: Random Parameters [ 10/46]
Mixture of Normals
Part 12: Random Parameters [ 11/46]
Estimating Which Class
Prior probability: $\text{Prob}[\text{class} = q] = \pi_q$
The joint conditional density for the $T_i$ observations is
$P(y_{i,1}, y_{i,2}, \ldots, y_{i,T_i} \mid \text{class} = q) = \prod_{t=1}^{T_i} f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_q, \sigma_{\varepsilon,q})$
The joint density for the data and class membership is
$P(y_{i,1}, y_{i,2}, \ldots, y_{i,T_i}, \text{class} = q) = \pi_q \prod_{t=1}^{T_i} f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_q, \sigma_{\varepsilon,q})$
Use Bayes' theorem to compute the posterior probability for the class, given the data:
$w(q \mid \text{data}_i) = P(\text{class} = q \mid y_{i,1}, \ldots, y_{i,T_i}) = \frac{P(y_{i,1}, \ldots, y_{i,T_i}, \text{class} = q)}{P(y_{i,1}, \ldots, y_{i,T_i})} = \frac{\pi_q \prod_{t=1}^{T_i} f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_q, \sigma_{\varepsilon,q})}{\sum_{j=1}^{Q} \pi_j \prod_{t=1}^{T_i} f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_j, \sigma_{\varepsilon,j})}$
Best guess = the class with the largest posterior probability.
Part 12: Random Parameters [ 12/46]
Posterior for Normal Mixture
$\hat{w}(q \mid \text{data}_i) = \hat{w}(q \mid i) = \frac{\hat{\pi}_q \prod_{t=1}^{T_i} \frac{1}{\hat{\sigma}_q}\,\phi\!\left(\frac{y_{i,t} - \hat{\mu}_q}{\hat{\sigma}_q}\right)}{\sum_{q=1}^{Q} \hat{\pi}_q \prod_{t=1}^{T_i} \frac{1}{\hat{\sigma}_q}\,\phi\!\left(\frac{y_{i,t} - \hat{\mu}_q}{\hat{\sigma}_q}\right)} = \frac{c_{iq}\,\hat{\pi}_q}{\sum_{q=1}^{Q} c_{iq}\,\hat{\pi}_q}$
Part 12: Random Parameters [ 13/46]
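The posterior weights above take a few lines of Python. An illustrative sketch (names mine), computed in logs to avoid underflow when $T_i$ is large:

    import numpy as np
    from scipy.stats import norm

    def posterior_probs(y_i, pi_hat, mu_hat, sigma_hat):
        # log[ pi_q * prod_t (1/sigma_q) phi((y_it - mu_q)/sigma_q) ], each q
        logjoint = np.log(pi_hat) + np.array(
            [norm.logpdf(y_i, mu_hat[q], sigma_hat[q]).sum()
             for q in range(len(pi_hat))])
        w = np.exp(logjoint - logjoint.max())
        return w / w.sum()       # w(q|i); the best guess is w.argmax()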
Estimated Posterior Probabilities
Part 12: Random Parameters [ 14/46]
More Difficult When the Populations Are Close Together
Part 12: Random Parameters [ 15/46]
The Technique Still Works
---------------------------------------------------------------------Latent Class / Panel LinearRg Model
Dependent variable
YLC
Sample is 1 pds and
1000 individuals
LINEAR regression model
Model fit with 2 latent classes.
--------+------------------------------------------------------------Variable| Coefficient
Standard Error b/St.Er. P[|Z|>z]
Mean of X
--------+------------------------------------------------------------|Model parameters for latent class 1
Constant|
2.93611***
.15813
18.568
.0000
Sigma|
1.00326***
.07370
13.613
.0000
|Model parameters for latent class 2
Constant|
.90156***
.28767
3.134
.0017
Sigma|
.86951***
.10808
8.045
.0000
|Estimated prior probabilities for class membership
Class1Pr|
.73447***
.09076
8.092
.0000
Class2Pr|
.26553***
.09076
2.926
.0034
--------+-------------------------------------------------------------
Part 12: Random Parameters [ 16/46]
Predicting Class Membership
Means = 1 and 5 (Cross Tabulation):
 CLASS1 | Total |  CLASS=0  CLASS=1
      0 |   787 |      759       28
      1 |  1713 |       18     1695
  Total |  2500 |      777     1723

Means = 1 and 3 (Cross Tabulation):
 CLASS1 | Total |  CLASS=0  CLASS=1
      0 |   787 |      523       97
      1 |  1713 |      250     1622
  Total |  2500 |      777     1723

Note: This is generally not possible, as the true underlying class membership is not known.
Part 12: Random Parameters [ 17/46]
How Many Classes?
(1) Q is not a 'parameter': it cannot be 'estimated' along with $\pi$ and $\boldsymbol{\beta}$.
(2) One cannot 'test' down or up to Q by comparing log likelihoods: the degrees of freedom for Q+1 vs. Q classes are not well defined.
(3) Use the Akaike information criterion, AIC = $-2\log L + 2 \times$ (number of parameters). For our mixture of normals problem:
AIC(1) = 10827.88
AIC(2) = 9954.268  <-- minimum
AIC(3) = 9958.756
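The computation itself is trivial; a sketch (the parameter count for the normal mixture is my accounting: Q means, Q standard deviations, and Q-1 free mixing probabilities):

    def aic(loglik, n_params):
        # Akaike information criterion; pick the Q with the smallest value
        return -2.0 * loglik + 2.0 * n_params

    # for a Q-class normal mixture: n_params = 3*Q - 1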
Part 12: Random Parameters [ 18/46]
Latent Class Regression
Assume normally distributed disturbances:
$f(y_{i,t} \mid \text{class} = q) = \frac{1}{\sigma_{\varepsilon,q}}\,\phi\!\left(\frac{y_{i,t} - \mathbf{x}_{i,t}'\boldsymbol{\beta}_q}{\sigma_{\varepsilon,q}}\right)$
This extends the mixture of normals by setting $\mu_{it,q} = \mathbf{x}_{i,t}'\boldsymbol{\beta}_q$.
Part 12: Random Parameters [ 20/46]
An Extended Latent Class Model
(1) There are Q classes, unobservable to the analyst.
(2) Class-specific model: $f(y_{i,t} \mid \mathbf{x}_{i,t}, \text{class} = q) = g(y_{i,t}, \mathbf{x}_{i,t}, \boldsymbol{\beta}_q)$
(3) Prior class probabilities $\pi_q$. The common multinomial logit form constrains each probability to (0,1) and ensures they sum to 1:
$P(\text{class} = q \mid \boldsymbol{\delta}) = \pi_q = \frac{\exp(\theta_q)}{\sum_{j=1}^{Q} \exp(\theta_j)}, \qquad \theta_Q = 0, \qquad \sum_{q=1}^{Q} \pi_q = 1$
Note: $\theta_q = \log(\pi_q / \pi_Q)$.
Part 12: Random Parameters [ 21/46]
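In code, this parameterization is a softmax over Q-1 free parameters. A small illustrative sketch (the max-subtraction is a standard numerical-stability device, not part of the slide):

    import numpy as np

    def class_probs(theta_free):
        # theta_free holds theta_1,...,theta_{Q-1}; theta_Q is normalized to 0
        t = np.append(theta_free, 0.0)
        e = np.exp(t - t.max())
        return e / e.sum()       # pi_1,...,pi_Q, each in (0,1), summing to 1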
Baltagi and Griffin’s Gasoline Data
World Gasoline Demand Data, 18 OECD Countries, 19 years
Variables in the file are
COUNTRY = name of country
YEAR = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG = log of real price of gasoline
LCARPCAP = log of per capita number of cars
See Baltagi (2001, p. 24) for analysis of these data. The article on which the
analysis is based is Baltagi, B. and Griffin, J., "Gasoline Demand in the OECD: An
Application of Pooling and Testing Procedures," European Economic Review, 22,
1983, pp. 117-137. The data were downloaded from the website for Baltagi's
text.
Part 12: Random Parameters [ 22/46]
3 Class Linear Gasoline Model
Part 12: Random Parameters [ 23/46]
Estimating E[βi | Xi, yi, β1, ..., βQ]
(1) Use $\hat{\boldsymbol{\beta}}_q$ from the class with the largest estimated posterior probability.
(2) Probabilistic:
$\hat{\boldsymbol{\beta}}_i = \sum_{q=1}^{Q} \widehat{\text{Prob}}[\text{class} = q \mid \text{data}_i]\; \hat{\boldsymbol{\beta}}_q$
Part 12: Random Parameters [ 24/46]
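Both estimators are immediate given the posterior weights; a sketch (all names mine):

    import numpy as np

    def beta_i_hat(w_i, beta_q, probabilistic=True):
        # beta_q: (Q, K) class-specific estimates; w_i: (Q,) posterior probs
        if probabilistic:
            return w_i @ beta_q         # (2) posterior-probability-weighted average
        return beta_q[w_i.argmax()]     # (1) estimate from the most likely class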
Estimated Parameters: LCM vs. Gen1 RPM
Part 12: Random Parameters [ 25/46]
Heckman and Singer’s RE Model
A Random Effects Model: random constants with a discrete distribution.
(1) There are Q classes, unobservable to the analyst.
(2) Class-specific model: $f(y_{i,t} \mid \mathbf{x}_{i,t}, \text{class} = q) = g(y_{i,t}, \mathbf{x}_{i,t}, \alpha_q, \boldsymbol{\beta})$
(3) Prior class probabilities $\pi_q$, with the same multinomial logit form as before, constraining each probability to (0,1) and ensuring they sum to 1:
$P(\text{class} = q \mid \boldsymbol{\delta}) = \pi_q = \frac{\exp(\theta_q)}{\sum_{j=1}^{Q} \exp(\theta_j)}, \qquad \theta_Q = 0, \qquad \sum_{q=1}^{Q} \pi_q = 1$
Part 12: Random Parameters [ 26/46]
LC Regression for Doctor Visits
Part 12: Random Parameters [ 27/46]
3 Class Heckman-Singer Form
Part 12: Random Parameters [ 28/46]
The EM Algorithm
Latent class is a 'missing data' model. Let $d_{i,q} = 1$ if individual i is a member of class q. If $d_{i,q}$ were observed, the complete data log likelihood would be
$\log L_c = \sum_{i=1}^{N} \log\left[\sum_{q=1}^{Q} d_{i,q} \prod_{t=1}^{T_i} f(y_{i,t} \mid \text{data}_{i,t}, \text{class} = q)\right]$
(Only one of the Q terms would be nonzero.)
The Expectation-Maximization algorithm has two steps:
(1) Expectation step: form the 'expected log likelihood' given the data and a prior guess of the parameters.
(2) Maximization step: maximize the expected log likelihood to obtain a new guess for the model parameters.
(E.g., http://crow.ee.washington.edu/people/bulyko/papers/em.pdf)
Part 12: Random Parameters [ 29/46]
Implementing EM for LC Models
Given initial guesses $\pi_q^0 = \pi_1^0, \ldots, \pi_Q^0$ and $\boldsymbol{\beta}_q^0 = \boldsymbol{\beta}_1^0, \ldots, \boldsymbol{\beta}_Q^0$: e.g., use 1/Q for each $\pi_q$ and the MLE of $\boldsymbol{\beta}$ from a one-class model. (Each must be perturbed slightly: if all $\pi_q$ are equal and all $\boldsymbol{\beta}_q$ are the same, the model already satisfies the FOC.)
(1) Compute $\hat{F}(q \mid i)$ = posterior class probabilities, using $\hat{\boldsymbol{\beta}}^0, \hat{\boldsymbol{\delta}}^0$. Reestimate each $\boldsymbol{\beta}_q$ using a weighted log likelihood:
Maximize with respect to $\boldsymbol{\beta}_q$: $\sum_{i=1}^{N} \hat{F}_{iq} \sum_{t=1}^{T_i} \log f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_q)$
(2) Reestimate $\pi_q$ (by reestimating $\boldsymbol{\delta}$) as $\hat{\pi}_q = (1/N)\sum_{i=1}^{N} \hat{F}(q \mid i)$, using the old $\hat{\pi}$ and the new $\hat{\boldsymbol{\beta}}$.
Now, return to step 1. Iterate until convergence.
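Combining the two steps for the mixture-of-normals example gives a compact EM loop. A self-contained sketch, my own illustration under the slide's setup (balanced panel, known Q, perturbed starting values):

    import numpy as np
    from scipy.stats import norm

    def em_normal_mixture(y, Q, iters=200, seed=0):
        rng = np.random.default_rng(seed)
        N, T = y.shape
        pi = np.full(Q, 1.0 / Q)
        mu = y.mean() + rng.normal(0.0, y.std(), Q)   # perturbed starting values
        sg = np.full(Q, y.std())
        for _ in range(iters):
            # E step: posterior class probabilities F(q|i), shape (N, Q)
            logf = np.stack([norm.logpdf(y, mu[q], sg[q]).sum(axis=1)
                             for q in range(Q)], axis=1) + np.log(pi)
            F = np.exp(logf - logf.max(axis=1, keepdims=True))
            F /= F.sum(axis=1, keepdims=True)
            # M step: F-weighted MLEs; weight F_iq applies to all T obs of i
            W = F.sum(axis=0) * T
            mu = (F * y.sum(axis=1, keepdims=True)).sum(axis=0) / W
            sg = np.sqrt(np.array(
                [(F[:, q] * ((y - mu[q])**2).sum(axis=1)).sum()
                 for q in range(Q)]) / W)
            pi = F.mean(axis=0)                       # (1/N) sum_i F(q|i)
        return pi, mu, sg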
Part 12: Random Parameters [ 30/46]
Continuous Parameter Variation
(The Random Parameters Model)
$y_{i,t} = \mathbf{x}_{i,t}'\boldsymbol{\beta}_i + \varepsilon_{i,t}$, each observation
$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta}_i + \boldsymbol{\varepsilon}_i$, $T_i$ observations
$\boldsymbol{\beta}_i = \boldsymbol{\beta} + \mathbf{u}_i$
$E[\mathbf{u}_i \mid \mathbf{X}_i] = \mathbf{0}$
$\mathrm{Var}[\mathbf{u}_i \mid \mathbf{X}_i] = \boldsymbol{\Gamma}$, constant but nonzero
$f(\mathbf{u}_i \mid \mathbf{X}_i) = g(\mathbf{u}_i, \boldsymbol{\Gamma})$, a density that does not involve $\mathbf{X}_i$
Part 12: Random Parameters [ 31/46]
OLS and GLS Are Consistent
$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta}_i + \boldsymbol{\varepsilon}_i$, $T_i$ observations
$\boldsymbol{\beta}_i = \boldsymbol{\beta} + \mathbf{u}_i$
$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{X}_i\mathbf{u}_i + \boldsymbol{\varepsilon}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{w}_i$
$E[\mathbf{w}_i \mid \mathbf{X}_i] = \mathbf{X}_i E[\mathbf{u}_i \mid \mathbf{X}_i] + E[\boldsymbol{\varepsilon}_i \mid \mathbf{X}_i] = \mathbf{0}$
$\mathrm{Var}[\mathbf{w}_i \mid \mathbf{X}_i] = \sigma_\varepsilon^2 \mathbf{I} + \mathbf{X}_i \boldsymbol{\Gamma} \mathbf{X}_i'$
(Discussed earlier: two-step GLS.)
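A sketch of the implied (feasible) GLS computation, given estimates of $\sigma_\varepsilon^2$ and $\boldsymbol{\Gamma}$ from the two-step procedure (names illustrative):

    import numpy as np

    def gls_moments(X_i, y_i, sigma2, Gamma):
        # Omega_i = sigma2 * I + X_i Gamma X_i'
        Om = sigma2 * np.eye(X_i.shape[0]) + X_i @ Gamma @ X_i.T
        Oi = np.linalg.inv(Om)
        return X_i.T @ Oi @ X_i, X_i.T @ Oi @ y_i
        # beta_GLS = inv(sum_i of first term) @ (sum_i of second term)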
Part 12: Random Parameters [ 32/46]
ML Estimation of the RPM
Sample data generation: $y_{i,t} = \mathbf{x}_{i,t}'\boldsymbol{\beta}_i + \varepsilon_{i,t}$
Individual heterogeneity: $\boldsymbol{\beta}_i = \boldsymbol{\beta} + \mathbf{u}_i$
Conditional log likelihood:
$\log f(y_{i,1}, \ldots, y_{i,T_i} \mid \mathbf{X}_i, \boldsymbol{\beta}_i, \sigma_\varepsilon) = \log \prod_{t=1}^{T_i} f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_i, \sigma_\varepsilon)$
Unconditional log likelihood:
$\log L(\boldsymbol{\beta}, \boldsymbol{\Gamma}, \sigma_\varepsilon) = \sum_{i=1}^{N} \log \int_{\boldsymbol{\beta}_i} \left[\prod_{t=1}^{T_i} f(y_{i,t} \mid \mathbf{x}_{i,t}, \boldsymbol{\beta}_i, \sigma_\varepsilon)\right] g(\boldsymbol{\beta}_i \mid \boldsymbol{\beta}, \boldsymbol{\Gamma})\, d\boldsymbol{\beta}_i$
(1) Using simulated ML or quadrature, maximize to estimate $\boldsymbol{\beta}, \boldsymbol{\Gamma}, \sigma_\varepsilon$.
(2) Using the data and the estimated structural parameters, compute $E[\boldsymbol{\beta}_i \mid \text{data}_i, \boldsymbol{\beta}, \boldsymbol{\Gamma}, \sigma_\varepsilon]$.
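For the normal linear model, individual i's simulated log-likelihood contribution can be written as below (a sketch: plain normal draws stand in for the Halton sequences used later in the deck; L is the Cholesky factor of Gamma; names mine):

    import numpy as np
    from scipy.stats import norm

    def sim_loglik_i(y_i, X_i, beta, L, sigma, draws):
        # draws: (R, K) standard normal (or Halton-based) simulation draws
        b_r = beta + draws @ L.T                      # R draws of beta_i
        logf = norm.logpdf(y_i - b_r @ X_i.T, scale=sigma).sum(axis=1)
        m = logf.max()
        return m + np.log(np.exp(logf - m).mean())    # log of (1/R) sum_r prod_t f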
Part 12: Random Parameters [ 33/46]
RP Gasoline Market
Part 12: Random Parameters [ 34/46]
Parameter Covariance Matrix
Part 12: Random Parameters [ 35/46]
RP vs. Gen1
Part 12: Random Parameters [ 36/46]
Modeling Parameter Heterogeneity
Conditional linear regression: $y_{i,t} = \mathbf{x}_{i,t}'\boldsymbol{\beta}_i + \varepsilon_{i,t}$
Individual heterogeneity in the means of the parameters:
$\boldsymbol{\beta}_i = \boldsymbol{\beta} + \boldsymbol{\Delta}\mathbf{z}_i + \mathbf{u}_i, \qquad E[\mathbf{u}_i \mid \mathbf{X}_i, \mathbf{z}_i] = \mathbf{0}$
Heterogeneity in the variances of the parameters:
$\mathrm{Var}[u_{i,k} \mid \text{data}_i] = \sigma_k^2 \exp(\mathbf{h}_i'\boldsymbol{\delta}_k)$
Estimation by maximum simulated likelihood
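One simulation draw of $\boldsymbol{\beta}_i$ under this specification might look as follows (all names illustrative; Delta is K x M for a z_i of length M, and delta is K x H for an h_i of length H):

    import numpy as np

    def draw_beta_i(beta, Delta, z_i, sigma_k, delta, h_i, rng):
        sd = np.sqrt(sigma_k**2 * np.exp(delta @ h_i))   # heteroscedastic sd, (K,)
        return beta + Delta @ z_i + sd * rng.standard_normal(beta.shape)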
Part 12: Random Parameters [ 37/46]
Hierarchical Linear Model
COUNTRY = name of country
YEAR = year, 1960-1978
LGASPCAR = log of consumption per car (y)
LINCOMEP = log of per capita income (z)
LRPMG = log of real price of gasoline (x1)
LCARPCAP = log of per capita number of cars (x2)
$y_{it} = \beta_{1i} + \beta_{2i} x_{1,it} + \beta_{3i} x_{2,it} + \varepsilon_{it}$
$\beta_{1i} = \beta_1 + \gamma_1 z_i + u_{1i}$
$\beta_{2i} = \beta_2 + \gamma_2 z_i + u_{2i}$
$\beta_{3i} = \beta_3 + \gamma_3 z_i + u_{3i}$
Part 12: Random Parameters [ 38/46]
Estimated HLM
Part 12: Random Parameters [ 39/46]
RP vs. HLM
Part 12: Random Parameters [ 40/46]
A Hierarchical Linear Model
German Health Care Data
$\text{Hsat}_{it} = \beta_1 + \beta_2 \text{AGE}_{it} + \gamma_i \text{EDUC}_{it} + \beta_4 \text{MARRIED}_{it} + \varepsilon_{it}$
$\gamma_i = \alpha_1 + \alpha_2 \text{FEMALE}_i + u_i$
Sample    ; all $
Setpanel  ; Group = id ; Pds = ti $
Regress   ; For [ti = 7] ; Lhs = newhsat ; Rhs = one,age,educ,married
          ; RPM = female ; Fcn = educ(n)
          ; pts = 25 ; halton ; panel ; Parameters $
Sample    ; 1 - 887 $
Create    ; betaeduc = beta_i $
Dstat     ; rhs = betaeduc $
Histogram ; Rhs = betaeduc $
Part 12: Random Parameters [ 41/46]
OLS Results
OLS starting values for random parameters model...
Ordinary least squares regression
LHS=NEWHSAT  Mean                =      6.69641
             Standard deviation  =      2.26003
             Number of observs.  =         6209
Model size   Parameters          =            4
             Degrees of freedom  =         6205
Residuals    Sum of squares      =  29671.89461
             Standard error of e =      2.18676
Fit          R-squared           =       .06424
             Adjusted R-squared  =       .06378
Model test   F[ 3, 6205] (prob)  = 142.0(.0000)
--------+---------------------------------------------------------
NEWHSAT | Coefficient  Standard Error      z  Prob.z>|Z|  Mean of X
--------+---------------------------------------------------------
Constant|  7.02769***      .22099       31.80    .0000
     AGE|  -.04882***      .00307      -15.90    .0000     44.3352
 MARRIED|   .29664***      .07701        3.85    .0001      .84539
    EDUC|   .14464***      .01331       10.87    .0000     10.9409
--------+---------------------------------------------------------
Part 12: Random Parameters [ 42/46]
Maximum Simulated Likelihood
------------------------------------------------------------------
Random Coefficients LinearRg Model
Dependent variable               NEWHSAT
Log likelihood function     -12583.74717
Estimation based on N = 6209, K = 7
Unbalanced panel has 887 individuals
LINEAR regression model
Simulation based on 25 Halton draws
--------+---------------------------------------------------------
NEWHSAT | Coefficient  Standard Error      z  Prob.z>|Z|  Mean of X
--------+---------------------------------------------------------
        |Nonrandom parameters
Constant|  7.34576***      .15415       47.65    .0000
     AGE|  -.05878***      .00206      -28.56    .0000     44.3352
 MARRIED|   .23427***      .05034        4.65    .0000      .84539
        |Means for random parameters
    EDUC|   .16580***      .00951       17.43    .0000     10.9409
        |Scale parameters for dists. of random parameters
    EDUC|  1.86831***      .00179     1044.68    .0000
        |Heterogeneity in the means of random parameters
cEDU_FEM|  -.03493***      .00379       -9.21    .0000
        |Variance parameter given is sigma
Std.Dev.|  1.58877***      .00954      166.45    .0000
--------+---------------------------------------------------------
Part 12: Random Parameters [ 43/46]
Simulating Conditional Means for Individual Parameters
$\hat{E}[\boldsymbol{\beta}_i \mid \mathbf{y}_i, \mathbf{X}_i] = \frac{\frac{1}{R}\sum_{r=1}^{R} \hat{\boldsymbol{\beta}}_{ir} \prod_{t=1}^{T_i} \frac{1}{\hat{\sigma}}\,\phi\!\left(\frac{y_{it} - \hat{\boldsymbol{\beta}}_{ir}'\mathbf{x}_{it}}{\hat{\sigma}}\right)}{\frac{1}{R}\sum_{r=1}^{R} \prod_{t=1}^{T_i} \frac{1}{\hat{\sigma}}\,\phi\!\left(\frac{y_{it} - \hat{\boldsymbol{\beta}}_{ir}'\mathbf{x}_{it}}{\hat{\sigma}}\right)} = \frac{1}{R}\sum_{r=1}^{R} \text{Weight}_{ir}\, \hat{\boldsymbol{\beta}}_{ir}$
where $\hat{\boldsymbol{\beta}}_{ir} = \hat{\boldsymbol{\beta}} + \hat{\mathbf{L}}\mathbf{w}_{i,r}$ ($\hat{\mathbf{L}}$ is the Cholesky factor of $\hat{\boldsymbol{\Gamma}}$, and $\mathbf{w}_{i,r}$ is simulation draw r for individual i), and $\text{Weight}_{ir}$ is the ratio of draw r's likelihood to the average over the R draws.
Posterior estimates of E[parameters(i) | Data(i)]
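The same draws used for estimation deliver these conditional means. A sketch in the earlier notation (illustrative names):

    import numpy as np
    from scipy.stats import norm

    def e_beta_i(y_i, X_i, beta, L, sigma, draws):
        b_r = beta + draws @ L.T                  # (R, K) draws of beta_i
        logf = norm.logpdf(y_i - b_r @ X_i.T, scale=sigma).sum(axis=1)
        w = np.exp(logf - logf.max())
        w /= w.sum()                              # normalized weights
        return w @ b_r                            # simulated E[beta_i | data_i]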
Part 12: Random Parameters [ 44/46]
“Individual Coefficients”
--> Sample ; 1 - 887 $
--> Create ; betaeduc = beta_i $
--> Dstat  ; rhs = betaeduc $
Descriptive Statistics
All results based on nonmissing observations.
==============================================================================
Variable|     Mean     Std.Dev.    Minimum    Maximum    Cases  Missing
--------+---------------------------------------------------------------------
All observations in current sample
BETAEDUC|  .161184     .132334   -.268006    .506677      887        0
==============================================================================
(Figure: histogram of BETAEDUC, frequency on the vertical axis; values range from -.268 to .507.)
Part 12: Random Parameters [ 45/46]
Hierarchical Bayesian Estimation
Sample data generation: $y_{i,t} = \mathbf{x}_{i,t}'\boldsymbol{\beta}_i + \varepsilon_{i,t}$, $\varepsilon_{i,t} \sim N[0, \sigma^2]$
Individual heterogeneity: $\boldsymbol{\beta}_i = \boldsymbol{\beta} + \mathbf{u}_i$, $\mathbf{u}_i \sim N[\mathbf{0}, \boldsymbol{\Gamma}]$
What information exists about 'the model'?
Prior densities for the structural parameters:
$p(\log \sigma)$ = uniform density with (large) parameter $A_0$
$p(\boldsymbol{\beta}) = N[\boldsymbol{\beta}_0, \boldsymbol{\Sigma}_0]$, e.g., $\mathbf{0}$ and (large) $v_0\mathbf{I}$
$p(\boldsymbol{\Gamma})$ = inverse Wishart[...]
Priors for the parameters of interest:
$p(\boldsymbol{\beta}_i) = N[\boldsymbol{\beta}, \boldsymbol{\Gamma}]$
$p(\sigma)$ = as above.
Part 12: Random Parameters [ 46/46]
Estimation of Hierarchical Bayes Models
(1) Analyze the 'posteriors' for the hyperparameters $\boldsymbol{\beta}, \boldsymbol{\Gamma}, \sigma$.
(2) Analyze the posteriors for the group-level parameters $\boldsymbol{\beta}_i$.
The estimators are the means and variances of the posterior distributions.
Algorithm: generally, Gibbs sampling from the posteriors, with resort to laws of large numbers.
To be discussed later.
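To make the sampling step concrete, here is a stripped-down Gibbs sketch for this model that holds Gamma and sigma-squared fixed (the full sampler adds the inverse Wishart and variance draws; all names mine):

    import numpy as np

    def gibbs(y, X, Gamma, sigma2, v0=100.0, S=1000, seed=0):
        # y: list of (T_i,) arrays; X: list of (T_i, K) arrays; beta ~ N(0, v0*I)
        rng = np.random.default_rng(seed)
        N, K = len(y), X[0].shape[1]
        Gi = np.linalg.inv(Gamma)
        beta, b = np.zeros(K), np.zeros((N, K))
        keep = []
        for _ in range(S):
            for i in range(N):                  # beta_i | beta, data: normal
                V = np.linalg.inv(X[i].T @ X[i] / sigma2 + Gi)
                m = V @ (X[i].T @ y[i] / sigma2 + Gi @ beta)
                b[i] = rng.multivariate_normal(m, V)
            V = np.linalg.inv(N * Gi + np.eye(K) / v0)   # beta | {beta_i}: normal
            beta = rng.multivariate_normal(V @ (Gi @ b.sum(axis=0)), V)
            keep.append(beta.copy())
        # posterior means/variances of the kept draws are the estimators
        return np.array(keep), b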