Part 16: Nonlinear Effects

Part 16: Nonlinear Effects [ 1/95]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business
16. Nonlinear Effects Models and
Models for Binary Choice
Modeling a Binary Outcome




Did firm i produce a product or process innovation in year t?
yit : 1=Yes/0=No
Observed N=1270 firms for T=5 years, 1984-1988
Observed covariates: xit = Industry, competitive pressures,
size, productivity, etc.
How to model?



Binary outcome
Correlation across time
Heterogeneity across firms
Application
The “Panel Probit Model”
The German innovation data: T=5 (N=1270)
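$$y_{it}^* = \mathbf{x}_{it}'\boldsymbol\beta + \varepsilon_{it}, \qquad y_{it} = \mathbf{1}[\mathbf{x}_{it}'\boldsymbol\beta + \varepsilon_{it} > 0]$$
$$\begin{pmatrix}\varepsilon_{i1}\\ \varepsilon_{i2}\\ \vdots\\ \varepsilon_{iT}\end{pmatrix} \sim N\left[\begin{pmatrix}0\\ 0\\ \vdots\\ 0\end{pmatrix}, \begin{pmatrix}1 & \rho_{12} & \cdots & \rho_{1T}\\ \rho_{12} & 1 & \cdots & \rho_{2T}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{1T} & \rho_{2T} & \cdots & 1\end{pmatrix}\right]$$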
FIML
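$$\log L = \sum_{i=1}^{N}\log \text{Prob}[y_{i1},\ldots,y_{i5}] = \sum_{i=1}^{N}\log\int_{-\infty}^{(2y_{i5}-1)\mathbf{x}_{i5}'\boldsymbol\beta}\cdots\int_{-\infty}^{(2y_{i1}-1)\mathbf{x}_{i1}'\boldsymbol\beta} g(\mathbf{v}\,|\,\boldsymbol\Sigma^*)\,dv_1\cdots dv_5$$
$$g(\mathbf{v}\,|\,\boldsymbol\Sigma^*) = (2\pi)^{-5/2}\,|\boldsymbol\Sigma^*|^{-1/2}\exp\left[-\tfrac{1}{2}\mathbf{v}'(\boldsymbol\Sigma^*)^{-1}\mathbf{v}\right]$$
$$\boldsymbol\Sigma^* = \begin{pmatrix}1 & q_{i1}q_{i2}\rho_{12} & \cdots & q_{i1}q_{i5}\rho_{15}\\ q_{i1}q_{i2}\rho_{12} & 1 & \cdots & q_{i2}q_{i5}\rho_{25}\\ \vdots & \vdots & \ddots & \vdots\\ q_{i1}q_{i5}\rho_{15} & q_{i2}q_{i5}\rho_{25} & \cdots & 1\end{pmatrix}, \qquad q_{it} = 2y_{it}-1$$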
See Greene, W., “Convenient Estimators for the Panel Probit Model: Further Results,”
Empirical Economics, 29, 1, Jan. 2004, pp. 21-48.
GMM
From the marginal distributions:
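$$E[y_{it} - \Phi(\mathbf{x}_{it}'\boldsymbol\beta)\,|\,\mathbf{X}_i] = 0 \quad \text{(note: strict exogeneity)}$$
Suggests orthogonality conditions
$$E\begin{bmatrix}(y_{i1} - \Phi(\mathbf{x}_{i1}'\boldsymbol\beta))\mathbf{x}_{i1}\\ (y_{i2} - \Phi(\mathbf{x}_{i2}'\boldsymbol\beta))\mathbf{x}_{i2}\\ \vdots\\ (y_{i5} - \Phi(\mathbf{x}_{i5}'\boldsymbol\beta))\mathbf{x}_{i5}\end{bmatrix} = \begin{bmatrix}\mathbf{0}\\ \mathbf{0}\\ \vdots\\ \mathbf{0}\end{bmatrix} \qquad 5K \text{ moments.}$$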
GMM Estimation-1
Step 1. Pool the data and use probit to estimate β.
Compute weighting matrix.
$$\mathbf{W} = \frac{1}{1270}\sum_{i=1}^{1270}\hat{\mathbf{g}}_i\hat{\mathbf{g}}_i', \qquad \hat{\mathbf{g}}_i = \begin{bmatrix}(y_{i1} - \Phi(\mathbf{x}_{i1}'\hat{\boldsymbol\beta}))\mathbf{x}_{i1}\\ (y_{i2} - \Phi(\mathbf{x}_{i2}'\hat{\boldsymbol\beta}))\mathbf{x}_{i2}\\ \vdots\\ (y_{i5} - \Phi(\mathbf{x}_{i5}'\hat{\boldsymbol\beta}))\mathbf{x}_{i5}\end{bmatrix}$$
GMM Estimation-2
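Step 2. Minimize the GMM criterion $q = \mathbf{g}(\boldsymbol\beta)'\mathbf{W}^{-1}\mathbf{g}(\boldsymbol\beta)$, where
$$\mathbf{g}(\boldsymbol\beta) = \frac{1}{1270}\sum_{i=1}^{1270}\begin{bmatrix}(y_{i1} - \Phi(\mathbf{x}_{i1}'\boldsymbol\beta))\mathbf{x}_{i1}\\ (y_{i2} - \Phi(\mathbf{x}_{i2}'\boldsymbol\beta))\mathbf{x}_{i2}\\ \vdots\\ (y_{i5} - \Phi(\mathbf{x}_{i5}'\boldsymbol\beta))\mathbf{x}_{i5}\end{bmatrix}$$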
Note : 8 parameters, 5(8)=40 moment equations.
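The two GMM steps can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the estimator run on the German innovation data: the panel is simulated (hypothetical sizes N = 2000, T = 5, K = 2, equicorrelated errors with ρ = 0.4), step 1 is a pooled probit fit with `scipy.optimize.minimize`, W is the average outer product of the stacked moment vectors, and all function names are our own.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def stacked_moments(beta, y, X):
    """Per-unit moment vectors, shape (N, T*K).

    y: (N, T) binary outcomes; X: (N, T, K) regressors.
    Row i stacks (y_it - Phi(x_it'beta)) * x_it for t = 1..T.
    """
    resid = y - norm.cdf(X @ beta)                      # (N, T)
    return (resid[:, :, None] * X).reshape(y.shape[0], -1)


def pooled_probit(y, X):
    """Step 1: pooled probit MLE, ignoring correlation across periods."""
    yv, Xv = y.ravel(), X.reshape(-1, X.shape[-1])
    q = 2 * yv - 1
    nll = lambda b: -norm.logcdf(q * (Xv @ b)).sum()
    return minimize(nll, np.zeros(Xv.shape[1]), method="BFGS").x


def weighting_matrix(beta, y, X):
    """W = (1/N) sum_i g_i g_i', evaluated at the first-step estimate."""
    g = stacked_moments(beta, y, X)
    return g.T @ g / g.shape[0]


def gmm_criterion(beta, y, X, W):
    """Step 2 objective: q = gbar(beta)' W^{-1} gbar(beta)."""
    gbar = stacked_moments(beta, y, X).mean(axis=0)
    return float(gbar @ np.linalg.solve(W, gbar))


# Simulated panel (hypothetical design, not the innovation data)
rng = np.random.default_rng(0)
N, T, K, rho = 2000, 5, 2, 0.4
beta_true = np.array([-0.5, 1.0])
X = np.concatenate([np.ones((N, T, 1)), rng.normal(size=(N, T, 1))], axis=2)
# Equicorrelated unit-variance errors: Corr(e_it, e_is) = rho for t != s
eps = np.sqrt(rho) * rng.normal(size=(N, 1)) + np.sqrt(1 - rho) * rng.normal(size=(N, T))
y = (X @ beta_true + eps > 0).astype(float)

b_step1 = pooled_probit(y, X)
W = weighting_matrix(b_step1, y, X)
# Scaling q by N does not move the minimizer but helps the optimizer's tolerances
b_gmm = minimize(lambda b: N * gmm_criterion(b, y, X, W), b_step1, method="BFGS").x
print(b_step1, b_gmm)
```

Here T·K = 10 moments identify 2 parameters, so the model is overidentified, mirroring the slide's 40 moments for 8 parameters.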
GEE Estimation
Fractional Response
Fractional Response Model
Fractional Response Model
Many interesting qualitative variables such as health satisfaction, labor
outcomes, insurance, etc.
Application: Health Care Panel Data
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293
individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary
choice. There are altogether 27,326 observations. The number of observations ranges from 1 to
7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).
Variables in the file are
DOCTOR   = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT     = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS   = number of doctor visits in last three months
HOSPVIS  = number of hospital visits in last calendar year
PUBLIC   = insured in public health insurance = 1; otherwise = 0
ADDON    = insured by add-on insurance = 1; otherwise = 0
HHNINC   = household nominal monthly net income in German marks / 10000
           (4 observations with income = 0 were dropped)
HHKIDS   = children under age 16 in the household = 1; otherwise = 0
EDUC     = years of schooling
AGE      = age in years
MARRIED  = marital status
Unbalanced Panels
Most theoretical results are for balanced panels.
Most real-world panels are unbalanced.
Often the gaps are caused by attrition.
[Figure: Group Sizes]
The major question is whether the gaps are ‘missing completely at random.’ If not, the observation mechanism is endogenous, and at least some methods will produce questionable results.
Researchers rarely have any reason to treat the data as nonrandomly sampled. (This is good news.)
Unbalanced Panels and Attrition ‘Bias’

Test for ‘attrition bias.’ (Verbeek and Nijman, Testing for Selectivity Bias in Panel Data Models, International Economic Review, 1992, 33, 681-703.)
  Variable addition test using covariates of presence in the panel
  Nonconstructive: what to do next?
Do something about attrition bias. (Wooldridge, Inverse Probability Weighted M-Estimators for Sample Stratification and Attrition, Portuguese Economic Journal, 2002, 1: 117-139.)
  Stringent assumptions about the process
  Model based on the probability of being present in each wave of the panel
Panel Data Binary Choice Models
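Random Utility Model for Binary Choice
$U_{it} = \alpha + \boldsymbol\beta'\mathbf{x}_{it} + \varepsilon_{it}$ + Person i specific effect
Fixed effects using “dummy” variables
$U_{it} = \alpha_i + \boldsymbol\beta'\mathbf{x}_{it} + \varepsilon_{it}$
Random effects using omitted heterogeneity
$U_{it} = \alpha + \boldsymbol\beta'\mathbf{x}_{it} + \varepsilon_{it} + u_i$
Same outcome mechanism: $Y_{it} = \mathbf{1}[U_{it} > 0]$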
Pooled Model
Ignoring Unobserved Heterogeneity
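Assuming strict exogeneity: $\text{Cov}(\mathbf{x}_{it}, u_i + \varepsilon_{it}) = 0$
$$y_{it}^* = \mathbf{x}_{it}'\boldsymbol\beta + u_i + \varepsilon_{it}$$
$$\text{Prob}[y_{it}=1\,|\,\mathbf{x}_{it}] = \text{Prob}[u_i + \varepsilon_{it} > -\mathbf{x}_{it}'\boldsymbol\beta]$$
Using the same model format:
$$\text{Prob}[y_{it}=1\,|\,\mathbf{x}_{it}] = F\left(\mathbf{x}_{it}'\boldsymbol\beta\big/\sqrt{1+\sigma_u^2}\right) = F(\mathbf{x}_{it}'\boldsymbol\delta)$$
This is the 'population averaged model.'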
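A quick simulation makes the point concrete: a pooled probit fit to random effects data recovers δ = β/√(1 + σ_u²), not β. This is a sketch under assumed values (α = 0.2, β = 1, σ_u = 1, regressor independent of u_i, probit disturbance with unit variance); the helper `pooled_probit` is our own, built on `scipy.optimize.minimize`.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def pooled_probit(y, X):
    """Pooled probit MLE: treats every (i, t) observation as independent."""
    q = 2 * y - 1
    nll = lambda b: -norm.logcdf(q * (X @ b)).sum()
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x


rng = np.random.default_rng(1)
N, T = 5000, 4
alpha, beta, sigma_u = 0.2, 1.0, 1.0          # hypothetical true values
x = rng.normal(size=(N, T))                    # independent of u_i (strict exogeneity)
u = sigma_u * rng.normal(size=(N, 1))          # person effect, shared across t
y = ((alpha + beta * x + u + rng.normal(size=(N, T))) > 0).astype(float).ravel()
X = np.column_stack([np.ones(N * T), x.ravel()])

delta_hat = pooled_probit(y, X)
delta_theory = np.array([alpha, beta]) / np.sqrt(1 + sigma_u**2)
print(delta_hat)      # estimates delta, not (alpha, beta)
print(delta_theory)   # (alpha, beta) / sqrt(1 + sigma_u^2)
```

With σ_u = 1 the scale factor is 1/√2, so the pooled slope lands near 0.71 rather than 1.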
Ignoring Heterogeneity in the RE Model
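Ignoring heterogeneity, we estimate δ, not β.
Partial effects are $\boldsymbol\delta f(\mathbf{x}_{it}'\boldsymbol\delta)$, not $\boldsymbol\beta f(\mathbf{x}_{it}'\boldsymbol\beta)$.
δ understates β in magnitude, but $f(\mathbf{x}_{it}'\boldsymbol\delta)$ overstates $f(\mathbf{x}_{it}'\boldsymbol\beta)$.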
Which way does it go? Maybe ignoring u is ok?
Not if we want to compute probabilities or do
statistical inference about β. Estimated standard
errors will be too small.
Ignoring Heterogeneity (Broadly)




The presence of ignored heterogeneity will generally make parameter estimates look smaller than they would otherwise.
Ignoring heterogeneity will definitely distort standard errors.
Partial effects based on the parametric model may not be affected very much.
Is the pooled estimator ‘robust?’ Less so than in the linear model case.
Pooled vs. RE Panel Estimator
---------------------------------------------------------------------
Binomial Probit Model
Dependent variable                DOCTOR
--------+------------------------------------------------------------
Variable| Coefficient   Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
--------+------------------------------------------------------------
Constant|    .02159         .05307         .407     .6842
     AGE|    .01532***      .00071       21.695     .0000    43.5257
    EDUC|   -.02793***      .00348       -8.023     .0000    11.3206
  HHNINC|   -.10204**       .04544       -2.246     .0247     .35208
--------+------------------------------------------------------------
Unbalanced panel has 7293 individuals
--------+------------------------------------------------------------
Constant|   -.11819         .09280       -1.273     .2028
     AGE|    .02232***      .00123       18.145     .0000    43.5257
    EDUC|   -.03307***      .00627       -5.276     .0000    11.3206
  HHNINC|    .00660         .06587         .100     .9202     .35208
     Rho|    .44990***      .01020       44.101     .0000
--------+------------------------------------------------------------
Partial Effects
---------------------------------------------------------------------
Partial derivatives of E[y] = F[*] with respect to the vector of
characteristics. They are computed at the means of the Xs.
Observations used for means are All Obs.
--------+------------------------------------------------------------
Variable| Coefficient   Standard Error  b/St.Er.  P[|Z|>z]  Elasticity
--------+------------------------------------------------------------
        |Pooled
     AGE|    .00578***      .00027       21.720     .0000     .39801
    EDUC|   -.01053***      .00131       -8.024     .0000    -.18870
  HHNINC|   -.03847**       .01713       -2.246     .0247    -.02144
--------+------------------------------------------------------------
        |Based on the panel data estimator
     AGE|    .00620***      .00034       18.375     .0000     .42181
    EDUC|   -.00918***      .00174       -5.282     .0000    -.16256
  HHNINC|    .00183         .01829         .100     .9202     .00101
--------+------------------------------------------------------------
Effect of Clustering





Y_it must be correlated with Y_is across periods.
The pooled estimator ignores this correlation.
Broadly, y_it = E[y_it|x_it] + w_it, where
  E[y_it|x_it] = Prob(y_it = 1|x_it)
  w_it is correlated across periods
Assuming the marginal probability is the same, the pooled estimator is consistent. (We just saw that it might not be.)
Ignoring the correlation across periods generally leads to underestimating standard errors.
‘Cluster’ Corrected Covariance Matrix
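$$\text{Est.Var}[\hat{\boldsymbol\beta}] = \mathbf{H}^{-1}\left[\frac{C}{C-1}\sum_{c=1}^{C}\left(\sum_{i=1}^{n_c}\mathbf{g}_{ic}\right)\left(\sum_{i=1}^{n_c}\mathbf{g}_{ic}\right)'\right]\mathbf{H}^{-1}$$
C = the number of clusters
n_c = number of observations in cluster c
H^{-1} = negative inverse of the second derivatives matrix
g_ic = derivative of the log density for observation i in cluster c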
Cluster Corrected Logit Model Estimator
exp(xic )
1
P(yic | xic )  ( y  1) 
 ( y  0)
1  exp(xic )
1  exp(xic )
exp(t )
Let  (t)=
.
1  exp(t )
P(yic | xic )  ( y  1)   (xic )  ( y  0)[1   (xic )]
Algebra: [1 -  (t )]   (t )
P(yic | xic )  ( y  1)   (xic )  ( y  0)[ (xic )]
Let q ic  2yic  1
P(yic | xic )  ( qicxic )
More algebra: dΛ(t)/dt  Λ(t)[1-Λ(t)]. Let  ic   (qicxic )
Cluster Corrected Logit Model Estimator
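$$P(y_{ic}\,|\,\mathbf{x}_{ic}) = \Lambda_{ic} = \Lambda(q_{ic}\mathbf{x}_{ic}'\boldsymbol\beta)$$
Log likelihood:
$$\log L = \sum_{c=1}^{C}\sum_{i=1}^{N_c}\log\Lambda_{ic} = \sum_{c=1}^{C}\sum_{i=1}^{N_c}\log L_{ic}$$
$$\frac{\partial\log L_{ic}}{\partial\boldsymbol\beta} = \frac{1}{\Lambda_{ic}}\Lambda_{ic}[1-\Lambda_{ic}]q_{ic}\mathbf{x}_{ic} = [1-\Lambda_{ic}]q_{ic}\mathbf{x}_{ic} = \mathbf{g}_{ic}$$
$$\frac{\partial^2\log L_{ic}}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta'} = -\Lambda_{ic}[1-\Lambda_{ic}](q_{ic}\mathbf{x}_{ic})(q_{ic}\mathbf{x}_{ic})' = -\Lambda_{ic}[1-\Lambda_{ic}]\mathbf{x}_{ic}\mathbf{x}_{ic}' = \mathbf{H}_{ic} \quad (\text{since } q_{ic}^2 = 1)$$
$$\text{Est.Var}[\hat{\boldsymbol\beta}] = \left[\sum_{c=1}^{C}\sum_{i=1}^{N_c}\Lambda_{ic}[1-\Lambda_{ic}]\mathbf{x}_{ic}\mathbf{x}_{ic}'\right]^{-1}\left\{\frac{C}{C-1}\sum_{c=1}^{C}\left(\sum_{i=1}^{N_c}[1-\Lambda_{ic}]q_{ic}\mathbf{x}_{ic}\right)\left(\sum_{i=1}^{N_c}[1-\Lambda_{ic}]q_{ic}\mathbf{x}_{ic}\right)'\right\}\left[\sum_{c=1}^{C}\sum_{i=1}^{N_c}\Lambda_{ic}[1-\Lambda_{ic}]\mathbf{x}_{ic}\mathbf{x}_{ic}'\right]^{-1}$$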
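The logit derivation translates directly into code. The sketch below is our own NumPy implementation (not NLOGIT's): it fits the logit by Newton's method using g_ic and H_ic as defined for this slide, then forms both the conventional covariance [−Σ H_ic]^{-1} and the cluster-corrected sandwich with the C/(C−1) factor. The simulated data (hypothetical 1500 clusters of size 5, a regressor constant within clusters, and an unmodeled cluster effect) are only there to exercise it.

```python
import numpy as np


def logit_cluster(y, X, cluster, max_iter=50, tol=1e-10):
    """Logit MLE by Newton's method with conventional and cluster-corrected
    covariance matrices, using q = 2y - 1 and Lambda_ic = Lambda(q x'b)."""
    q = 2 * y - 1
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        lam = 1 / (1 + np.exp(-q * (X @ b)))             # Lambda_ic
        grad = X.T @ ((1 - lam) * q)                      # sum of g_ic
        H = -(X * (lam * (1 - lam))[:, None]).T @ X       # sum of H_ic
        step = np.linalg.solve(H, grad)
        b = b - step                                      # Newton update
        if np.max(np.abs(step)) < tol:
            break
    lam = 1 / (1 + np.exp(-q * (X @ b)))
    g = X * ((1 - lam) * q)[:, None]                      # rows are g_ic
    H = -(X * (lam * (1 - lam))[:, None]).T @ X
    V_conv = np.linalg.inv(-H)                            # [-sum H_ic]^{-1}
    ids, idx = np.unique(cluster, return_inverse=True)
    C = ids.size
    G = np.zeros((C, X.shape[1]))
    np.add.at(G, idx, g)                                  # sum g_ic within each cluster
    meat = (C / (C - 1)) * (G.T @ G)
    return b, V_conv, V_conv @ meat @ V_conv


rng = np.random.default_rng(2)
C, m = 1500, 5                                   # clusters and cluster size (hypothetical)
cluster = np.repeat(np.arange(C), m)
xc = np.repeat(rng.normal(size=C), m)            # regressor fixed within cluster
uc = np.repeat(rng.normal(size=C), m)            # shared cluster effect, ignored by the model
X = np.column_stack([np.ones(C * m), xc])
y = (0.5 * xc + uc + rng.logistic(size=C * m) > 0).astype(float)

b, V_conv, V_clu = logit_cluster(y, X, cluster)
se_conv, se_clu = np.sqrt(np.diag(V_conv)), np.sqrt(np.diag(V_clu))
print(b, se_conv, se_clu)
```

As in the DOCTOR example that follows, the corrected standard errors come out larger than the conventional ones when outcomes are positively correlated within clusters.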
Cluster Correction: Doctor
---------------------------------------------------------------------
Binomial Logit Model
Dependent variable                DOCTOR
Log likelihood function     -17457.21899
--------+------------------------------------------------------------
Variable| Coefficient   Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
--------+------------------------------------------------------------
        |Conventional Standard Errors
Constant|   -.25597***      .05481       -4.670     .0000
     AGE|    .01469***      .00071       20.686     .0000    43.5257
    EDUC|   -.01523***      .00355       -4.289     .0000    11.3206
  HHNINC|   -.10914**       .04569       -2.389     .0169     .35208
  FEMALE|    .35209***      .01598       22.027     .0000     .47877
--------+------------------------------------------------------------
        |Corrected Standard Errors
Constant|   -.25597***      .07744       -3.305     .0009
     AGE|    .01469***      .00098       15.065     .0000    43.5257
    EDUC|   -.01523***      .00504       -3.023     .0025    11.3206
  HHNINC|   -.10914*        .05645       -1.933     .0532     .35208
  FEMALE|    .35209***      .02290       15.372     .0000     .47877
--------+------------------------------------------------------------
Random Effects
exp(xit  ui )
1
P(yit | xit , ui )  ( yit  1) 
 ( yit  0)
1  exp(xit  ui )
1  exp(xit  ui )
LogL   i 1
n

Ti
t 1
log P(yit | xit , ui )
logL cannot be maximized because of the unobserved u i . We
maximize E u [logL] instead.
Quadrature – Butler and Moffitt (1982)
This method has been used in most commercial software since 1982.
$$\log L = \sum_{i=1}^{N}\log\int_{-\infty}^{\infty}\prod_{t=1}^{T_i}F(y_{it},\alpha+\mathbf{x}_{it}'\boldsymbol\beta+\sigma_u v_i)\,\phi(v_i)\,dv_i = \sum_{i=1}^{N}\log\int_{-\infty}^{\infty}g(v_i)\frac{1}{\sqrt{2\pi}}\exp\left(\frac{-v_i^2}{2}\right)dv_i$$
where $u_i \sim N[0,\sigma_u^2]$, $u_i = \sigma_u v_i$, and $v_i \sim N[0,1]$.
Make a change of variable to $w = v/\sqrt{2}$:
$$= \sum_{i=1}^{N}\log\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}g(\sqrt{2}\,w_i)\exp(-w_i^2)\,dw_i$$
The integral can be computed using Hermite quadrature:
$$\approx \sum_{i=1}^{N}\log\frac{1}{\sqrt{\pi}}\sum_{h=1}^{H}w_h\,g(\sqrt{2}\,z_h)$$
The values of w_h (weights) and z_h (nodes) are found in published tables such as Abramowitz and Stegun (or on the web). H is chosen by the analyst: a higher H produces greater accuracy (but takes longer).
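One individual's contribution to the Butler and Moffitt quadrature can be sketched with NumPy's `numpy.polynomial.hermite.hermgauss`, which returns the nodes z_h and weights w_h for the weight function exp(−z²). This assumes a probit F, so that F(y_it, ·) = Φ(q_it ·) with q_it = 2y_it − 1; the function names and the test values are hypothetical. The result is checked against brute-force integration with `scipy.integrate.quad`.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.integrate import quad
from scipy.stats import norm


def re_probit_loglik_i(y_i, index_i, sigma_u, H=32):
    """log[(1/sqrt(pi)) sum_h w_h g(sqrt(2) z_h)] for one person, where
    g(v) = prod_t Phi(q_it * (index_it + sigma_u * v)) and index_it = a + x_it'b."""
    z, w = hermgauss(H)                                   # nodes and weights for exp(-z^2)
    q = 2 * np.asarray(y_i) - 1
    arg = q[None, :] * (index_i[None, :] + sigma_u * np.sqrt(2) * z[:, None])
    g = np.exp(norm.logcdf(arg).sum(axis=1))              # g(sqrt(2) z_h), shape (H,)
    return np.log(w @ g / np.sqrt(np.pi))


def re_probit_loglik_i_quad(y_i, index_i, sigma_u):
    """Same quantity by adaptive integration against the N[0, 1] density."""
    q = 2 * np.asarray(y_i) - 1
    f = lambda v: np.prod(norm.cdf(q * (index_i + sigma_u * v))) * norm.pdf(v)
    return np.log(quad(f, -np.inf, np.inf)[0])


y_i = [1, 0, 1, 1, 0]
index_i = np.array([0.3, -0.2, 0.5, 0.1, -0.4])
gh = re_probit_loglik_i(y_i, index_i, sigma_u=0.9)
exact = re_probit_loglik_i_quad(y_i, index_i, sigma_u=0.9)
print(gh, exact)
```

Summing this function over individuals gives log L; an optimizer would then search over (α, β, σ_u).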
Quadrature Log Likelihood
After all the substitutions, the function to be maximized:
Not simple, but feasible.
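$$\log L = \sum_{i=1}^{N}\log\frac{1}{\sqrt{\pi}}\sum_{h=1}^{H}w_h\prod_{t=1}^{T_i}F(y_{it},\alpha+\mathbf{x}_{it}'\boldsymbol\beta+\sigma_u\sqrt{2}\,z_h)$$
$$= \sum_{i=1}^{N}\log\frac{1}{\sqrt{\pi}}\sum_{h=1}^{H}w_h\prod_{t=1}^{T_i}F(y_{it},\alpha+\mathbf{x}_{it}'\boldsymbol\beta+\theta z_h), \qquad \theta = \sigma_u\sqrt{2}$$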
Simulation Based Estimator
$$\log L = \sum_{i=1}^{N}\log\int_{-\infty}^{\infty}\prod_{t=1}^{T_i}F(y_{it},\alpha+\mathbf{x}_{it}'\boldsymbol\beta+\sigma_u v_i)\,\phi(v_i)\,dv_i = \sum_{i=1}^{N}\log\int_{-\infty}^{\infty}g(v_i)\frac{1}{\sqrt{2\pi}}\exp\left(\frac{-v_i^2}{2}\right)dv_i$$
This equals
$$\sum_{i=1}^{N}\log E[g(v_i)]$$
The expected value of the function of v_i can be approximated by drawing R random draws v_ir from the population N[0,1] and averaging the R functions of v_ir. We maximize
$$\log L_S = \sum_{i=1}^{N}\log\frac{1}{R}\sum_{r=1}^{R}\prod_{t=1}^{T_i}F(y_{it},\alpha+\mathbf{x}_{it}'\boldsymbol\beta+\sigma_u v_{ir})$$
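A sketch of one individual's simulated term, again assuming a probit F and hypothetical input values. It uses pseudo-random normal draws from NumPy for simplicity; the estimates reported in this section use Halton draws, which SciPy can generate with `scipy.stats.qmc.Halton`, but that variant is not shown. The simulated value is compared with direct numerical integration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def re_probit_simloglik_i(y_i, index_i, sigma_u, draws):
    """log[(1/R) sum_r prod_t Phi(q_it * (index_it + sigma_u * v_ir))] for one person."""
    q = 2 * np.asarray(y_i) - 1
    arg = q[None, :] * (index_i[None, :] + sigma_u * draws[:, None])
    return np.log(np.exp(norm.logcdf(arg).sum(axis=1)).mean())


y_i = [1, 0, 1, 1, 0]
index_i = np.array([0.3, -0.2, 0.5, 0.1, -0.4])
sigma_u = 0.9

rng = np.random.default_rng(4)
draws = rng.standard_normal(20000)        # v_ir ~ N[0, 1], held fixed during estimation
sim = re_probit_simloglik_i(y_i, index_i, sigma_u, draws)

q = 2 * np.asarray(y_i) - 1
integrand = lambda v: np.prod(norm.cdf(q * (index_i + sigma_u * v))) * norm.pdf(v)
exact = np.log(quad(integrand, -np.inf, np.inf)[0])
print(sim, exact)
```

In estimation the same draws must be reused at every iteration; redrawing them would make log L_S a noisy, nonsmooth function of the parameters.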
Random Effects Model: Quadrature
---------------------------------------------------------------------
Random Effects Binary Probit Model
Dependent variable                DOCTOR
Log likelihood function     -16290.72192  (Random Effects)
Restricted log likelihood   -17701.08500  (Pooled)
Chi squared [   1 d.f.]       2820.72616
Estimation based on N =  27326, K =   5
Unbalanced panel has   7293 individuals
--------+------------------------------------------------------------
Variable| Coefficient   Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
--------+------------------------------------------------------------
Constant|   -.11819         .09280       -1.273     .2028
     AGE|    .02232***      .00123       18.145     .0000    43.5257
    EDUC|   -.03307***      .00627       -5.276     .0000    11.3206
  HHNINC|    .00660         .06587         .100     .9202     .35208
     Rho|    .44990***      .01020       44.101     .0000
--------+------------------------------------------------------------
        |Pooled Estimates
Constant|    .02159         .05307         .407     .6842
     AGE|    .01532***      .00071       21.695     .0000    43.5257
    EDUC|   -.02793***      .00348       -8.023     .0000    11.3206
  HHNINC|   -.10204**       .04544       -2.246     .0247     .35208
--------+------------------------------------------------------------
Random Parameter Model
---------------------------------------------------------------------
Random Coefficients Probit Model
Dependent variable                DOCTOR   (Quadrature Based)
Log likelihood function     -16296.68110   (-16290.72192)
Restricted log likelihood   -17701.08500
Chi squared [   1 d.f.]       2808.80780
Simulation based on 50 Halton draws
--------+---------------------------------------------------------
Variable| Coefficient   Standard Error  b/St.Er.  P[|Z|>z]
--------+---------------------------------------------------------
        |Nonrandom parameters
     AGE|    .02226***      .00081       27.365     .0000   ( .02232)
    EDUC|   -.03285***      .00391       -8.407     .0000   (-.03307)
  HHNINC|    .00673         .05105         .132     .8952   ( .00660)
        |Means for random parameters
Constant|   -.11873**       .05950       -1.995     .0460   (-.11819)
        |Scale parameters for dists. of random parameters
Constant|    .90453***      .01128       80.180     .0000
--------+---------------------------------------------------------
Using quadrature, a = -.11819. The implied ρ from these estimates is .90453²/(1 + .90453²) = .449998, compared to .44990 using quadrature.
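The summary statistics in the two tables can be reproduced from the reported numbers. This small sketch recomputes the likelihood ratio statistic 2(log L_RE − log L_pooled) for the random effects probit, its chi-squared(1) p-value via `scipy.stats.chi2.sf`, and the implied ρ = σ²/(1 + σ²) from the simulation-based scale parameter.

```python
from scipy.stats import chi2

logl_re, logl_pooled = -16290.72192, -17701.08500   # from the quadrature table
lr = 2 * (logl_re - logl_pooled)                    # LR statistic, 1 restriction (rho = 0)
p_value = chi2.sf(lr, df=1)

sigma = 0.90453                                     # scale parameter of the random constant
rho = sigma**2 / (1 + sigma**2)                     # implied intra-person correlation
print(lr, p_value, rho)
```

The LR value matches the reported 2820.72616, and ρ rounds to the .449998 quoted in the text.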
A Dynamic Model
$$y_{it} = \mathbf{1}[\mathbf{x}_{it}'\boldsymbol\beta + \gamma y_{i,t-1} + \varepsilon_{it} + u_i > 0]$$
Two similar 'effects'
Unobserved heterogeneity
State dependence = state 'persistence'
$$\Pr(y_{it}=1\,|\,y_{i,t-1},\ldots,y_{i0},\mathbf{x}_{it},u_i) = F[\mathbf{x}_{it}'\boldsymbol\beta + \gamma y_{i,t-1} + u_i]$$
How to estimate β, γ, marginal effects, F(.), etc.?
(1) Deal with the latent common effect
(2) Handle the lagged effects: this encounters the initial conditions problem.
Dynamic Probit Model: A Standard Approach
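(1) Conditioned on all effects, the joint probability is
$$P(y_{i1},y_{i2},\ldots,y_{iT}\,|\,y_{i0},\mathbf{x}_i,u_i) = \prod_{t=1}^{T}F(\mathbf{x}_{it}'\boldsymbol\beta + \gamma y_{i,t-1} + u_i,\ y_{it})$$
(2) Unconditional density: integrate out the common effect
$$P(y_{i1},y_{i2},\ldots,y_{iT}\,|\,y_{i0},\mathbf{x}_i) = \int_{-\infty}^{\infty}P(y_{i1},y_{i2},\ldots,y_{iT}\,|\,y_{i0},\mathbf{x}_i,u_i)\,h(u_i\,|\,y_{i0},\mathbf{x}_i)\,du_i$$
(3) Density for heterogeneity
$$h(u_i\,|\,y_{i0},\mathbf{x}_i) = N[\eta + \theta y_{i0} + \mathbf{x}_i'\boldsymbol\delta,\ \sigma_u^2], \qquad \mathbf{x}_i = [\mathbf{x}_{i1},\mathbf{x}_{i2},\ldots,\mathbf{x}_{iT}],$$
so
$$u_i = \eta + \theta y_{i0} + \mathbf{x}_i'\boldsymbol\delta + \sigma_u w_i$$
(contains every period of x_it)
(4) Reduced form
$$P(y_{i1},y_{i2},\ldots,y_{iT}\,|\,y_{i0},\mathbf{x}_i) = \int_{-\infty}^{\infty}\prod_{t=1}^{T}F(\mathbf{x}_{it}'\boldsymbol\beta + \gamma y_{i,t-1} + \eta + \theta y_{i0} + \mathbf{x}_i'\boldsymbol\delta + \sigma_u w_i,\ y_{it})\,h(w_i)\,dw_i$$
This is a random effects model.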
Simplified Dynamic Model
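Projecting u_i on all observations expands the model enormously.
(3) Projection of heterogeneity only on group means
$$h(u_i\,|\,y_{i0},\bar{\mathbf{x}}_i) = N[\eta + \theta y_{i0} + \bar{\mathbf{x}}_i'\boldsymbol\delta,\ \sigma_u^2]$$
so
$$u_i = \eta + \theta y_{i0} + \bar{\mathbf{x}}_i'\boldsymbol\delta + \sigma_u w_i$$
(4) Reduced form
$$P(y_{i1},y_{i2},\ldots,y_{iT}\,|\,y_{i0},\mathbf{x}_i) = \int_{-\infty}^{\infty}\prod_{t=1}^{T}F(\eta + \mathbf{x}_{it}'\boldsymbol\beta + \gamma y_{i,t-1} + \theta y_{i0} + \bar{\mathbf{x}}_i'\boldsymbol\delta + \sigma_u w_i,\ y_{it})\,h(w_i)\,dw_i$$
Mundlak style correction with the initial value in the equation.
This is (again) a random effects model.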
A Dynamic Model for Public Insurance
Age
Household Income
Kids in the household
Health Status
Add initial value, lagged value, group means
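Building the reduced-form regressors is mostly bookkeeping. The sketch below, with a hypothetical helper name and a balanced panel assumed for simplicity, stacks for each period t = 1..T the columns [1, x_it, y_{i,t−1}, y_i0, x̄_i]; the resulting matrix would then be passed to a random effects probit (for example one fit by quadrature), which is not shown here.

```python
import numpy as np


def build_dynamic_design(y, x):
    """Rows for t = 1..T of the simplified dynamic reduced form (balanced panel).

    y: (N, T+1) binary outcomes; column 0 is the initial value y_i0.
    x: (N, T, K) time-varying regressors for t = 1..T.
    Returns (y_dep, Z) with Z columns [1, x_it, y_{i,t-1}, y_i0, xbar_i].
    """
    N, T, K = x.shape
    xbar = x.mean(axis=1, keepdims=True).repeat(T, axis=1)   # group means, repeated over t
    y_lag = y[:, :-1]                                        # y_{i,t-1} for t = 1..T
    y0 = np.repeat(y[:, :1], T, axis=1)                      # initial value y_i0
    Z = np.concatenate(
        [np.ones((N, T, 1)), x, y_lag[..., None], y0[..., None], xbar], axis=2
    )
    return y[:, 1:].reshape(-1), Z.reshape(N * T, 2 * K + 3)


# Two individuals, initial period plus T = 3 periods, one regressor (toy values)
y = np.array([[1, 0, 1, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.arange(6, dtype=float).reshape(2, 3, 1)
y_dep, Z = build_dynamic_design(y, x)
print(Z)
```

The same layout works for an unbalanced panel if the group means and initial values are computed per individual before stacking.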
Dynamic Common Effects Model
Fixed Effects
Fixed Effects Models


Estimate with dummy variable coefficients
$$U_{it} = \alpha_i + \boldsymbol\beta'\mathbf{x}_{it} + \varepsilon_{it}$$
Can be done by “brute force” for 10,000s of individuals
$$\log L = \sum_{i=1}^{N}\sum_{t=1}^{T_i}\log F(y_{it},\ \alpha_i + \boldsymbol\beta'\mathbf{x}_{it})$$
F(.) = appropriate probability for the observed outcome
Compute β and α_i for i = 1,…,N (may be large)
See FixedEffects.pdf in course materials.
Unconditional Estimation

Maximize the whole log likelihood

Difficult! Many (thousands) of parameters.

Feasible – NLOGIT (2004) (“Brute force”)
Fixed Effects Health Model
Groups in which yit is always = 0 or always = 1. Cannot compute αi.
Conditional Estimation




Principle: f(yi1,yi2,… | some statistic) is free
of the fixed effects for some models.
Maximize the conditional log likelihood, given
the statistic.
Can estimate β without having to estimate αi.
Only feasible for the logit model among discrete choice models. (It is also feasible for the Poisson model and a few continuous variable models.)
Binary Logit Conditional
Probabilities
$$\text{Prob}(y_{it}=1\,|\,\mathbf{x}_{it}) = \frac{e^{\alpha_i+\mathbf{x}_{it}'\boldsymbol\beta}}{1+e^{\alpha_i+\mathbf{x}_{it}'\boldsymbol\beta}}$$
$$\text{Prob}\left(Y_{i1}=y_{i1},Y_{i2}=y_{i2},\ldots,Y_{iT_i}=y_{iT_i}\,\Big|\,\sum_{t=1}^{T_i}y_{it}\right) = \frac{\exp\left(\sum_{t=1}^{T_i}y_{it}\mathbf{x}_{it}'\boldsymbol\beta\right)}{\sum_{\sum_t d_{it}=S_i}\exp\left(\sum_{t=1}^{T_i}d_{it}\mathbf{x}_{it}'\boldsymbol\beta\right)}$$
The denominator is summed over all the different ways that $\sum_t d_{it}$ can equal $S_i$, that is, over all combinations of T_i values of y_it that sum to the same sum as the observed $\sum_{t=1}^{T_i}y_{it}$. If S_i is this sum, there are $\binom{T_i}{S_i}$ terms. This may be a huge number. An algorithm by Krailo and Pike makes it simple.
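The denominator need not be computed by listing every sequence. The sketch below uses a standard dynamic-programming recursion over periods, B(t, k) = B(t−1, k) + exp(x_it'β)·B(t−1, k−1), which is in the spirit of, but not claimed to be identical to, the Krailo and Pike algorithm. The function names and the index values xb are hypothetical; a brute-force enumeration is included only to check the recursion.

```python
import itertools

import numpy as np


def cond_logit_denominator(xb, s):
    """Sum over all 0/1 sequences d with sum(d) = s of exp(sum_t d_t * xb_t),
    via B(t, k) = B(t-1, k) + exp(xb_t) * B(t-1, k-1)."""
    B = np.zeros(s + 1)
    B[0] = 1.0                              # B(0, 0) = 1: the empty sequence
    for e in np.exp(xb):
        # update k from high to low so B[k-1] still holds the period t-1 value
        for k in range(s, 0, -1):
            B[k] += e * B[k - 1]
    return B[s]


def cond_logit_loglik_i(y_i, xb):
    """log Prob(y_i | sum_t y_it, data) for one group; xb[t] = x_it'beta."""
    y_i = np.asarray(y_i)
    s = int(y_i.sum())
    return y_i @ xb - np.log(cond_logit_denominator(xb, s))


def brute_force_denominator(xb, s):
    """Direct enumeration of all C(T, s) sequences, for checking only."""
    return sum(np.exp(xb[list(c)].sum()) for c in itertools.combinations(range(len(xb)), s))


xb = np.array([0.3, -1.2, 0.5, 0.0, 0.8, -0.4, 1.1])     # hypothetical x_it'beta, T = 7
print(cond_logit_denominator(xb, 4), brute_force_denominator(xb, 4))
print(cond_logit_loglik_i([1, 0, 0, 0, 1, 1, 1], xb))
```

The recursion costs O(T·S_i) operations instead of C(T_i, S_i), and groups with S_i = 0 or S_i = T_i contribute log 1 = 0, as in the two-period example.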
Example: Two Period Binary Logit

e i  xitβ
Prob(y it  1 | xit ) 
.

1  e i  xitβ

Prob  Yi1  y i1 , Yi2  y i2 ,



Prob  Yi1


Prob  Yi1


Prob  Yi1


Prob  Yi1

, YiTi  y iTi

y

0
,
data


it
t 1

2

 1, Yi2  0  y it  1 , data 
t 1

2

 0, Yi2  1  y it  1 , data 
t 1

2

 1, Yi2  1  y it  2 , data 
t 1

 0, Yi2  0
2
 Ti


exp
y
x

  it it 
Ti

 t 1

y it , data  
.

Ti



t 1


exp
d
x


 tdit Si  
it it
 t 1

 1.
exp( x i1β)
exp( x i1β)  exp( x i2β)
exp( x i2β)

exp( x i1β)  exp( x i2β)

 1.
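Example: Seven Period Binary Logit
$$\text{Prob}[\mathbf{y}=(1,0,0,0,1,1,1)\,|\,\mathbf{X}_i] = \frac{\exp(\alpha_i+\mathbf{x}_{i1}'\boldsymbol\beta)}{1+\exp(\alpha_i+\mathbf{x}_{i1}'\boldsymbol\beta)}\cdot\frac{1}{1+\exp(\alpha_i+\mathbf{x}_{i2}'\boldsymbol\beta)}\cdots\frac{\exp(\alpha_i+\mathbf{x}_{i7}'\boldsymbol\beta)}{1+\exp(\alpha_i+\mathbf{x}_{i7}'\boldsymbol\beta)}$$
There are 35 different sequences of y_it (permutations) that sum to 4.
For example, $y^*_{it|p=1}$ might be (1,1,1,1,0,0,0), and so on.
$$\text{Prob}\left[\mathbf{y}=(1,0,0,0,1,1,1)\,\Big|\,\mathbf{X}_i,\ \sum_{t=1}^{7}y_{it}=4\right] = \frac{\exp\left(\sum_{t=1}^{7}y_{it}\mathbf{x}_{it}'\boldsymbol\beta\right)}{\sum_{p=1}^{35}\exp\left(\sum_{t=1}^{7}y^*_{it|p}\mathbf{x}_{it}'\boldsymbol\beta\right)}$$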
With T = 50, the number of permutations of sequences of y with a given sum (from 0 to 50) ranges from 1 for a sum of 0 or 50, to 2.3 × 10^12 for 15 or 35, up to a maximum of 1.3 × 10^14 for sum = 25. These are the numbers of terms that must be summed for a model with T = 50. In the application below, the sum ranges from 15 to 35.
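The counts quoted here are binomial coefficients C(T, s), the number of 0/1 sequences of length T with sum s. Python's standard-library `math.comb` reproduces them exactly.

```python
from math import comb

T = 50
counts = {s: comb(T, s) for s in range(T + 1)}   # number of 0/1 sequences with sum s
print(f"{counts[15]:.2e}")   # sums of 15 or 35
print(f"{counts[25]:.2e}")   # the largest count, at sum = 25
print(comb(7, 4))            # sequences in the seven-period example
```

Because these numbers explode with T, the conditional likelihood is computed recursively rather than by summing every term.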
The sample is 200 individuals each observed 50 times.
The data are generated from a probit process with b1 = b2 = .5. But, it is fit as a
logit model. The coefficients obey the familiar relationship, 1.6*probit.
Large T Results. T = 50
Chamberlain estimator is
always consistent.
Pooled estimator is consistent because
the true model actually has no effects.
Brute force estimator is consistent
because the incidental parameters (IP)
problem goes away with large T.
Estimating Partial Effects
“The fixed effects logit estimator of β immediately gives us the effect of each element of x_i on the log-odds ratio… Unfortunately, we cannot estimate the partial effects… unless we plug in a value for α_i. Because the distribution of α_i is unrestricted – in particular, E[α_i] is not necessarily zero – it is hard to know what to plug in for α_i. In addition, we cannot estimate average partial effects, as doing so would require finding E[Λ(x_it β + α_i)], a task that apparently requires specifying a distribution for α_i.”
(Wooldridge, 2002)
Logit Constant Terms
Step 1. Estimate β with Chamberlain's conditional estimator
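Step 2. Treating β as if it were known, estimate α_i from the first order condition
$$\bar{y}_i = \frac{1}{T_i}\sum_{t=1}^{T_i}\frac{e^{\alpha_i}e^{\mathbf{x}_{it}'\hat{\boldsymbol\beta}}}{1+e^{\alpha_i}e^{\mathbf{x}_{it}'\hat{\boldsymbol\beta}}} = \frac{1}{T_i}\sum_{t=1}^{T_i}\frac{c_{it}}{\gamma_i + c_{it}}, \qquad c_{it} = \exp(\mathbf{x}_{it}'\hat{\boldsymbol\beta})$$
Estimate $\gamma_i = 1/\exp(\alpha_i)$, so that $\alpha_i = -\log\gamma_i$. $\hat{\boldsymbol\beta}$ is treated as known data, so c_it is known.
Solve one equation in one unknown for each i.
Note there is no solution if $\bar{y}_i$ = 0 or 1.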
Iterating back and forth does not maximize logL.
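Step 2 is a one-dimensional root-finding problem per group. A minimal sketch, assuming β̂ has already been obtained (the x_it'β̂ values below are hypothetical): it solves the first order condition directly in α_i with `scipy.optimize.brentq`, which is equivalent to solving for γ_i = 1/exp(α_i), and returns `None` when ȳ_i is 0 or 1. The helper name is our own.

```python
import numpy as np
from scipy.optimize import brentq


def logit_constant(y_i, xb_i):
    """Solve ybar_i = (1/T_i) sum_t Lambda(alpha_i + x_it'b) for alpha_i, b fixed.

    Returns None when ybar_i is 0 or 1 (no finite solution exists)."""
    y_i, xb_i = np.asarray(y_i, float), np.asarray(xb_i, float)
    ybar = y_i.mean()
    if ybar in (0.0, 1.0):
        return None
    foc = lambda a: np.mean(1 / (1 + np.exp(-(a + xb_i)))) - ybar
    # foc is increasing in a; widen the bracket until it changes sign
    lo, hi = -1.0, 1.0
    while foc(lo) > 0:
        lo *= 2
    while foc(hi) < 0:
        hi *= 2
    return brentq(foc, lo, hi)


a_hat = logit_constant([1, 0, 1, 1, 0], [0.2, -0.1, 0.4, 0.0, -0.3])
print(a_hat)
print(logit_constant([1, 1, 1], [0.0, 0.5, -0.5]))   # all ones: no solution
```

Each group is solved separately, so the cost grows only linearly in N.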
Advantages and Disadvantages
of the FE Model

Advantages




Allows correlation of effect and regressors
Fairly straightforward to estimate
Simple to interpret
Disadvantages



Model may not contain time invariant variables
Not necessarily simple to estimate if very large
samples (Stata just creates the thousands of dummy
variables)
The incidental parameters problem: Small T bias
Incidental Parameters Problems:
Conventional Wisdom

General: The unconditional MLE is biased in
samples with fixed T except in special cases
such as linear or Poisson regression (even
when the FEM is the right model).
The conditional estimator (that bypasses
estimation of αi) is consistent.

Specific: Upward bias (experience with probit
and logit) in estimators of β
Maximum Likelihood Estimation
With normally distributed disturbances, the FE model is the
ordinary classical normal linear regression model. OLS is the
maximum likelihood estimator of β. The maximum likelihood
estimator of σ² is
$$\hat\sigma^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}e_{it}^2}{\sum_{i=1}^{N}T_i},$$
the usual mean squared residual, with no correction for degrees of freedom. From standard results for the linear model, the exact expectation is
$$E[\hat\sigma^2] = \frac{\left(\sum_{i=1}^{N}T_i\right) - N - K}{\sum_{i=1}^{N}T_i}\,\sigma^2 = \left[1 - \frac{N}{\sum_{i=1}^{N}T_i} - \frac{K}{\sum_{i=1}^{N}T_i}\right]\sigma^2,$$
which for a balanced panel ($T_i = T$) is $\left[1 - \frac{1}{T} - \frac{K}{NT}\right]\sigma^2$.
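A small Monte Carlo makes the expectation formula tangible. This sketch assumes a hypothetical balanced design (N = 500, T = 2, K = 1, σ² = 1, regressors correlated with the fixed effects) and computes the within (LSDV) estimate of σ² without a degrees-of-freedom correction; with T = 2 the average lands near one half of σ², as the formula predicts.

```python
import numpy as np


def fe_sigma2_mle(y, x):
    """Linear fixed effects (within) fit on a balanced panel; returns the
    MLE of sigma^2 = SSR / (N*T), with no degrees-of-freedom correction.

    y: (N, T); x: (N, T, K).
    """
    yd = y - y.mean(axis=1, keepdims=True)                 # within transformation
    xd = (x - x.mean(axis=1, keepdims=True)).reshape(-1, x.shape[2])
    b, *_ = np.linalg.lstsq(xd, yd.ravel(), rcond=None)
    e = yd.ravel() - xd @ b                                # equals the LSDV residuals
    return e @ e / y.size


rng = np.random.default_rng(3)
N, T, K, sigma2 = 500, 2, 1, 1.0                           # hypothetical design
est = []
for _ in range(200):
    a = rng.normal(size=(N, 1))                            # fixed effects alpha_i
    x = rng.normal(size=(N, T, K)) + a[:, :, None]         # regressors correlated with alpha_i
    y = a + x @ np.ones(K) + np.sqrt(sigma2) * rng.normal(size=(N, T))
    est.append(fe_sigma2_mle(y, x))

exact = sigma2 * (N * T - N - K) / (N * T)                 # [1 - 1/T - K/(NT)] sigma^2
print(np.mean(est), exact)
```

Increasing N leaves the average near (1 − 1/T)σ², which is the incidental parameters problem in its simplest form.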
The Incidental Parameters Problem



The model is correctly specified
The log likelihood is correctly specified and
maximized
The estimator is inconsistent




The number of parameters grows with N
The “bias” in the MLE gets smaller as T grows
At infinite T, the estimator is consistent in N
In the linear FEM, the MLE of σ² is affected by
this problem.
The Incidental Parameters Problem
Fixed Effects Logit Health Model:
Conditional vs. Unconditional. Small T
A Monte Carlo Study of the
FE Estimator: Probit vs. Logit
Estimates of Coefficients and Marginal
Effects at the Implied Data Means
Results are scaled so the desired quantities being estimated
(the coefficients and the marginal effects) all equal 1.0 in the population.
Fixed Effects


Attention mostly focused on index function models;
f(yit|xit) = some function of xit’β+αi.
Incidental parameters problems




Bias of estimator of β is O(1/T)
How do we estimate αi?
How can we compute interesting partial effects?
Models




Linear model: No problem
Poisson (nonlinear model): No problem
1 or 2 other models: No problem
Other nonlinear models: The literature speaks in generalities


The probit and logit models have been analyzed at length
Almost nothing is known about any other model save for Greene’s
(2002-2004) limited Monte Carlo studies (frontier, tobit, truncation,
ordered probit, probit, logit)
Bias Correction Estimators

Motivation: Undo the incidental parameters bias in the
fixed effects probit model:



Advantages




(1) Maximize a penalized log likelihood function, or
(2) Directly correct the estimator of β
For (1) estimates αi so enables partial effects
Estimator is consistent under some circumstances
(Possibly) corrects in dynamic models
Disadvantage



No time invariant variables in the model
Practical implementation
Extension to other models? (Ordered probit model (maybe) –
see JBES 2009)
Bias Reduction



Parametric (probit and logit) models with fixed effects
(We examine non- and semiparametric methods at the end of the
course.)
Recent references: All about probit and logit models.





[1] Carro, J., “Estimating dynamic panel data discrete choice models
with fixed effects,” JE, 140, 2007, pp. 503-528
[2] Val, F., “Fixed Effects estimation of structural parameters and
marginal effects in panel probit models,” JE, 2010
[3] Hahn, J. and G. Kuersteiner, “Bias reduction for dynamic nonlinear
panel models with fixed effects,” UCLA, 2003
[4] Hahn, J. and W. Newey, “Jackknife and Analytical Bias reduction
for nonlinear panel models,” Econometrica, 2004.
See, also, bibliographies and work of T. Woutersen, B. Honoré and E.
Kyriazidou.
Bias Reduction – 1: Hahn



All rely on a large T approximation to the bias, even
though T may be (very) small
All analyze the equivalent of the brute force,
unconditional estimator.
Hahn/Kuersteiner and Hahn/Newey




plim bMLE = β + B(T) where B(T) is O(1/T)
Derive an expression for B(T)
The bias corrected estimator is obtained by subtraction
No further analysis is done to estimate the fixed effects or
the partial effects.
Bias Reduction – 2: Val



Plim bMLE = β + B(T)
Find D(T), a large sample approximation such
that plim bMLE + D(T) = β + F(T), where F(T) is
O(1/T²)
Finds a counterpart approximation to the
marginal effects.
Bias Reduction – 3: Carro





Change the log likelihood: Maximum Modified
Likelihood Estimator = MMLE
Maximize the modified likelihood; its solution, bMMLE,
satisfies plim bMMLE = β + G(T), where G(T) is
O(1/T²).
Also obtains a solution for αi (unlike the others).
ai,MMLE = f(bMMLE)
(A problem? When yit is always the same, there
is no solution for ai.)
Bias Reduction?




Approximations rely on large T
Work “moderately well” when T is as low as 8
or 10.
Completely miss the mark when T = 2, 3, 4
Nothing is known about any other models.
A Mundlak Correction for the FE Model
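Fixed Effects Model:
$$y^*_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol\beta + \varepsilon_{it}, \quad i = 1,\ldots,N;\ t = 1,\ldots,T_i$$
$$y_{it} = 1 \text{ if } y^*_{it} > 0,\ 0 \text{ otherwise.}$$
Mundlak (Wooldridge, Heckman, Chamberlain),...
$$\alpha_i = \alpha + \bar{\mathbf{x}}_i'\boldsymbol\delta + u_i \quad \text{(Projection, not necessarily conditional mean)}$$
where u is normally distributed with mean zero and standard deviation σ_u and is uncorrelated with $\bar{\mathbf{x}}_i$ or $(\mathbf{x}_{i1},\mathbf{x}_{i2},\ldots,\mathbf{x}_{iT})$.
Reduced form random effects model
$$y^*_{it} = \alpha + \bar{\mathbf{x}}_i'\boldsymbol\delta + \mathbf{x}_{it}'\boldsymbol\beta + \varepsilon_{it} + u_i, \quad i = 1,\ldots,N;\ t = 1,\ldots,T_i$$
$$y_{it} = 1 \text{ if } y^*_{it} > 0,\ 0 \text{ otherwise.}$$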
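In practice the Mundlak correction amounts to appending each individual's group means to the regressors before fitting an ordinary random effects probit. A minimal sketch for an unbalanced panel, with a hypothetical helper name and toy data; the RE probit fit itself is not shown.

```python
import numpy as np


def mundlak_design(x, ids):
    """Return rows [1, xbar_i, x_it] for the reduced-form random effects model.

    x: (n_obs, K) stacked regressors; ids: (n_obs,) individual identifiers.
    Works for unbalanced panels: each xbar_i uses only person i's observations.
    """
    _, idx, counts = np.unique(ids, return_inverse=True, return_counts=True)
    sums = np.zeros((counts.size, x.shape[1]))
    np.add.at(sums, idx, x)                      # per-person sums of x_it
    xbar = sums[idx] / counts[idx, None]         # group means, repeated for each row
    return np.column_stack([np.ones(len(x)), xbar, x])


ids = np.array([1, 1, 2, 2, 2, 3])               # T_i = 2, 3, 1 (unbalanced)
x = np.array([[1.0], [3.0], [2.0], [4.0], [6.0], [5.0]])
Z = mundlak_design(x, ids)
print(Z)
```

Testing δ = 0 in the fitted model is the variable addition test for FE vs. RE discussed on the following slides.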
Arrived 6 PM April 13, 2015
Mundlak Correction
A Variable Addition Test for FE vs. RE
The Wald statistic of 45.27922 and
the likelihood ratio statistic of
40.280 are both far larger than the
critical chi squared with 5 degrees
of freedom, 11.07. This suggests
that for these data, the fixed
effects model is the preferred
framework.
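The critical value and the p-values behind this comparison can be checked with `scipy.stats.chi2`, using the two statistics reported on the slide and 5 degrees of freedom (one per group mean added).

```python
from scipy.stats import chi2

crit = chi2.ppf(0.95, df=5)        # 5% critical value, 5 restrictions
p_wald = chi2.sf(45.27922, df=5)   # Wald statistic from the slide
p_lr = chi2.sf(40.280, df=5)       # likelihood ratio statistic from the slide
print(round(crit, 2), p_wald, p_lr)
```

Both p-values are far below conventional significance levels, so the restriction δ = 0 is rejected.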
Fixed Effects Models Summary







Incidental parameters problem if T < 10 (roughly)
Inconvenience of computation
Appealing specification
Alternative semiparametric estimators?
 Theory not well developed for T > 2
 Not informative for anything but slopes (e.g.,
predictions and marginal effects)
Ignoring the heterogeneity definitely produces an
inconsistent estimator (even with cluster correction!)
A Hobson’s choice
Mundlak correction is a useful common approach.
Conditional vs. Unconditional
Dep. Var. = Healthy
Note, this estimator is not consistent –
Incidental Parameters Problem
Escaping the FE Assumptions
Chamberlain (again)
Structure
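$$\text{Prob}(y_{it}=1\,|\,\mathbf{x}_{it}) = F(\alpha_i + \mathbf{x}_{it}'\boldsymbol\beta), \qquad \alpha_i = \alpha + \mathbf{x}_i'\boldsymbol\delta + w_i$$
Reduced form is a random effects model
$$\text{Prob}(y_{it}=1\,|\,\mathbf{x}_{it}) = F(\alpha + \mathbf{x}_i'\boldsymbol\delta + \mathbf{x}_{it}'\boldsymbol\beta + w_i)$$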
(Does not allow time invariant effects (again).)
Estimation:
(1) FIML
(2) Period by period, then reconcile with minimum distance
Estimates of a Fixed Effects Probit Model
-----------------------------------------------------------------------------
FIXED EFFECTS Probit Model
Dependent variable                    IP
Log likelihood function      -2087.22475
Estimation based on N =   6350, K = 953
Inf.Cr.AIC  =   6080.4   AIC/N =    .958
Model estimated: Apr 16, 2013, 10:00:53
Unbalanced panel has   1270 individuals
Skipped 552 groups with inestimable ai
PROBIT (normal) probability model
--------+--------------------------------------------------------------------
        |                Standard             Prob.      95% Confidence
      IP| Coefficient      Error        z     |z|>Z*        Interval
--------+--------------------------------------------------------------------
        |Index function for probability
   EMPLP|   .12108D-05     .00015      .01    .9934   -.28688D-03  .28930D-03
LOGSALES|  -.53108         .34472    -1.54    .1234  -1.20672      .14457
    IMUM|  4.26647        2.87407     1.48    .1377  -1.36660     9.89953
   FDIUM| -7.34800**      3.31144    -2.22    .0265 -13.83830     -.85770
--------+--------------------------------------------------------------------
Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
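Pooled, Fixed Effects and Random Effects Probit
+---------------------------------------------+
| Probit Regression Start Values for IP       |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
|EMPLP    |    .00013619 |    .154784D-04 |  8.798 |  .0000  |580.944724|
|LOGSALES |    .13668535 |      .01792267 |  7.626 |  .0000  |10.5400961|
|IMUM     |    .89572747 |      .14180988 |  6.316 |  .0000  | .25275054|
|FDIUM    |   3.24193536 |      .38236984 |  8.479 |  .0000  | .04580618|
|Constant |  -1.63354524 |      .20737277 | -7.877 |  .0000  |          |
+---------+--------------+----------------+--------+---------+----------+
+---------------------------------------------+
| Random Effects Binary Probit Model          |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
|EMPLP    |    .00017616 |    .117150D-04 | 15.037 |  .0000  |580.944724|
|LOGSALES |    .21174534 |      .04309101 |  4.914 |  .0000  |10.5400961|
|IMUM     |   1.41657383 |      .34121909 |  4.152 |  .0000  | .25275054|
|FDIUM    |   4.41817066 |      .83712165 |  5.278 |  .0000  | .04580618|
|Constant |  -2.51015928 |      .49459030 | -5.075 |  .0000  |          |
|Rho      |    .58588783 |      .01864491 | 31.423 |  .0000  |          |
+---------+--------------+----------------+--------+---------+----------+
+---------------------------------------------+
| FIXED EFFECTS Probit Model                  |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
|EMPLP    |  .121081D-05 |      .00014700 |   .008 |  .9934  |419.786630|
|LOGSALES |   -.53108315 |      .34473601 | -1.541 |  .1234  |10.5368540|
|IMUM     |   4.26652343 |     2.87418573 |  1.484 |  .1377  | .25359436|
|FDIUM    |  -7.34808205 |     3.31155361 | -2.219 |  .0265  | .04444097|
+---------+--------------+----------------+--------+---------+----------+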
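One informal way to line up the pooled and random effects panels: under the usual random effects probit parameterization (an assumption here, since the output does not spell it out), εit has variance 1, ui has variance σu², and Rho = σu²/(1 + σu²). Pooled probit then targets β/√(1 + σu²) = β√(1 - Rho). The sketch below applies that scaling to the random effects estimates copied from the tables. It is a rough comparison of two different estimators, not a formal test.

```python
import math

# Estimates copied from the pooled and random effects panels above.
pooled = {"EMPLP": 0.00013619, "LOGSALES": 0.13668535, "IMUM": 0.89572747,
          "FDIUM": 3.24193536, "Constant": -1.63354524}
random_effects = {"EMPLP": 0.00017616, "LOGSALES": 0.21174534,
                  "IMUM": 1.41657383, "FDIUM": 4.41817066,
                  "Constant": -2.51015928}
rho = 0.58588783

# Assumed parameterization: rho = sigma_u^2 / (1 + sigma_u^2).
sigma_u = math.sqrt(rho / (1.0 - rho))
scale = math.sqrt(1.0 - rho)          # equals 1 / sqrt(1 + sigma_u^2)

for name, b in random_effects.items():
    print(f"{name:9s} RE * sqrt(1 - rho) = {b * scale: .5f}   pooled = {pooled[name]: .5f}")
print(f"implied sigma_u = {sigma_u:.4f}")
```

The rescaled LOGSALES and Constant estimates land close to their pooled counterparts. Other rows differ more, which is expected because the two estimators are not algebraically linked in finite samples.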
Fixed Effects
Advantages
- Allows correlation of effect and regressors
- Fairly straightforward to estimate
- Simple to interpret
Disadvantages
- Not necessarily simple to estimate in very large samples (Stata simply creates thousands of dummy variables)
- The incidental parameters problem: small T bias
- No time-invariant variables
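The "Skipped 552 groups with inestimable ai" line in the fixed effects output reflects a practical step in unconditional FE estimation. When a firm's outcome never changes over the T periods, its effect has no finite estimate (it diverges to plus or minus infinity), so the firm carries no information about β and is set aside. Below is a minimal sketch of that screening step, with a hypothetical function name and toy data:

```python
from collections import defaultdict

def estimable_groups(ids, y):
    """Return the ids of groups whose binary outcome varies over time.

    Groups with y_it = 0 in every period (or 1 in every period) have no
    finite MLE for their fixed effect, so they are dropped before
    estimating beta.
    """
    values = defaultdict(set)
    for i, y_it in zip(ids, y):
        values[i].add(y_it)
    return {i for i, seen in values.items() if len(seen) == 2}

# Toy panel: firm 1 switches, firm 2 always innovates, firm 3 never does.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [0, 1, 0, 1, 1, 1, 0, 0, 0]
print(sorted(estimable_groups(ids, y)))   # [1]
```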
Incidental Parameters Problems: Conventional Wisdom
- General: biased in samples with fixed T, except in special cases such as linear or Poisson regression
- Specific: upward bias (experience with probit and logit) in estimators of β
What We KNOW - Analytic
- Newey and Hahn: the MLE converges in probability to a vector of constants, which generally differs from β when T is fixed (the variance diminishes as N increases).
- Abrevaya and Hsiao: the logit estimator converges to 2β when T = 2.
- Han, Schmidt, Greene: the probit estimator converges to 2β when T = 2.
What We THINK We Know: Monte Carlo
- Heckman:
  - Bias in the probit estimator is small if T ≥ 8
  - Bias in the probit estimator is toward 0 in some cases
- Katz (et al., numerous others), Greene:
  - Bias in probit and logit estimators is large
  - Upward bias persists even as T rises to 20
Heckman’s Monte Carlo Study
Some Familiar Territory: A Monte Carlo Study of the FE Estimator, Probit vs. Logit
(Greene, The Econometrics Journal, 7, 2004, pp. 98-119)
Estimates of Coefficients and Marginal Effects at the Implied Data Means
Results are scaled so that the quantities being estimated (β, δ, marginal effects) all equal 1.0 in the population.
A Monte Carlo Study of the FE Probit Estimator
Percentage Biases in Estimates of Coefficients and
Marginal Effects at the Implied Data Means
Dynamic Models
$$y_{it} = \mathbf{1}[\mathbf{x}_{it}'\beta + \gamma y_{i,t-1} + u_i + \varepsilon_{it} > 0]$$
Two 'effects' with similar impact on observations:
- Unobserved time-persistent heterogeneity
- State dependence = state 'persistence'
$$\Pr(y_{it} = 1 \mid y_{i,t-1}, \ldots, y_{i0}, \mathbf{x}_{it}, u_i) = F(\mathbf{x}_{it}'\beta + \gamma y_{i,t-1} + u_i)$$
How to estimate β, γ, marginal effects, F(.), etc.?
(1) Deal with the latent common effect
  (a) Random effects approaches
  (b) Fixed effects approaches
(2) Handle the lagged effects: the initial conditions problem.
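A simulation can show how the two effects are confused. The sketch below generates data with no true state dependence (γ = 0) and no regressors, both simplifying assumptions. It runs once with a persistent effect ui ~ N(0, 1) and once without one. With ui present, the pooled rate of y_t = 1 after y_t-1 = 1 is well above the rate after y_t-1 = 0, even though γ = 0. Without ui, the two rates agree up to sampling noise.

```python
import random

def transition_rates(n, periods, sigma_u, gamma, seed=0):
    """Pooled P(y_t = 1 | y_{t-1} = 1) and P(y_t = 1 | y_{t-1} = 0).

    Simulates y_it = 1[gamma * y_{i,t-1} + u_i + eps_it > 0] with
    u_i ~ N(0, sigma_u^2), eps_it ~ N(0, 1) and no regressors.
    """
    rng = random.Random(seed)
    counts = {0: [0, 0], 1: [0, 0]}      # counts[y_{t-1}][y_t]
    for _ in range(n):
        u = rng.gauss(0.0, sigma_u)
        y_prev = int(u + rng.gauss(0.0, 1.0) > 0)   # initial period: no lag term
        for _ in range(periods):
            y = int(gamma * y_prev + u + rng.gauss(0.0, 1.0) > 0)
            counts[y_prev][y] += 1
            y_prev = y
    return counts[1][1] / sum(counts[1]), counts[0][1] / sum(counts[0])

# gamma = 0 in both runs: any gap between the two rates is not state dependence.
p_after_1, p_after_0 = transition_rates(5000, 6, sigma_u=1.0, gamma=0.0, seed=1)
q_after_1, q_after_0 = transition_rates(5000, 6, sigma_u=0.0, gamma=0.0, seed=1)
print(f"with u_i:    {p_after_1:.3f} vs {p_after_0:.3f}")
print(f"without u_i: {q_after_1:.3f} vs {q_after_0:.3f}")
```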
Application – Doctor Visits
Riphahn, Wambach and Million, JAE, 2003
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from the Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. The data can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set: there are altogether 27,326 observations. The number of observations per individual ranges from 1 to 7 (frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). Note that the variable NUMOBS tells how many observations there are for each person; it is repeated in each row of the data for that person.
Variables in the file are:
DOCTOR = 1(Number of doctor visits > 0)
HSAT = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income = 0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
MARRIED = marital status
Application: Innovations
Bertschek and Lechner, Journal of Econometrics, 1998
Application
Stewart, JAE, 2007
- British Household Panel Survey (1991-1996)
- 3060 households retained (balanced) out of 4739 total
- Unemployment indicator (0/1)
- Data features
  - Panel data: unobservable heterogeneity
  - State persistence: "Someone unemployed at t-1 is more than 20 times as likely to be unemployed at t as someone employed at t-1."
Application: Direct Approach
GHK Simulation/Estimation
The presence of autocorrelation and state dependence in the model invalidates the simple maximum likelihood procedures we examined earlier. The appropriate likelihood function is constructed by formulating the probabilities as
$$\text{Prob}(y_{i,0}, y_{i,1}, \ldots) = \text{Prob}(y_{i,0}) \times \text{Prob}(y_{i,1} \mid y_{i,0}) \times \cdots \times \text{Prob}(y_{i,T} \mid y_{i,T-1}).$$
This still involves a seven-fold (T = 7) normal integral, which is approximated in the study using a simulator similar to the GHK simulator.
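The GHK (Geweke-Hajivassiliou-Keane) simulator replaces a T-variate normal integral with an average of products of univariate normal probabilities. Each error is drawn in turn from a truncated normal. The sketch below is a minimal textbook version with pseudo-random draws. It is not the simulator used in the study. It estimates P(v1 < b1, ..., vT < bT) for v ~ N(0, Σ). For the panel probit, the bounds would be bt = qit xit'β, with Σ replaced by Σ* from the FIML formulation earlier in this part. The test case, a trivariate equicorrelated orthant probability, has the closed form 1/8 + 3 arcsin(ρ)/(4π).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def cholesky(a):
    """Lower-triangular L with L L' = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def ghk(upper, sigma, draws=5000, seed=0):
    """GHK simulator for P(v_1 < b_1, ..., v_T < b_T), v ~ N(0, sigma)."""
    L = cholesky(sigma)
    rng = random.Random(seed)
    T = len(upper)
    total = 0.0
    for _ in range(draws):
        e = [0.0] * T
        weight = 1.0
        for t in range(T):
            # v_t = sum_{k<=t} L[t][k] e_k, so v_t < b_t  <=>  e_t < bound
            bound = (upper[t] - sum(L[t][k] * e[k] for k in range(t))) / L[t][t]
            p = STD_NORMAL.cdf(bound)
            weight *= p
            if weight == 0.0:
                break
            # Draw e_t from N(0,1) truncated to (-inf, bound) by inverse CDF.
            uniform = min(max(rng.random() * p, 1e-300), 1.0 - 1e-16)
            e[t] = STD_NORMAL.inv_cdf(uniform)
        total += weight
    return total / draws

rho = 0.5
sigma = [[1.0, rho, rho], [rho, 1.0, rho], [rho, rho, 1.0]]
est = ghk([0.0, 0.0, 0.0], sigma, draws=5000, seed=42)
exact = 0.125 + 3.0 * math.asin(rho) / (4.0 * math.pi)   # 0.25 for rho = 0.5
print(f"GHK: {est:.4f}   exact: {exact:.4f}")
```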
A Dynamic RE Probit
Habit persistence and latent heterogeneity
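$$y_{i,t} = \mathbf{1}[\mathbf{x}_{i,t}'\beta + \gamma y_{i,t-1} + \alpha_i + \varepsilon_{i,t} > 0], \quad t = 1,\ldots,T;\ i = 1,\ldots,N$$
Fixed effects assumption: Cov[$\alpha_i, \mathbf{x}_{i,t}$] may not be 0.
Initial condition $y_{i,0}$ is observed.
Mundlak (1978), Chamberlain (1984): project $\alpha_i$ on $\mathbf{X}_i$:
$$\alpha_i = \bar{\mathbf{x}}_i'\delta + u_i, \quad u_i \sim N[0, \sigma_u^2]$$
Implies a random effects model:
$$y_{i,t} = \mathbf{1}[\mathbf{x}_{i,t}'\beta + \gamma y_{i,t-1} + \bar{\mathbf{x}}_i'\delta + u_i + \varepsilon_{i,t} > 0], \quad t = 1,\ldots,T;\ i = 1,\ldots,N$$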
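The Mundlak device only needs the group means x̄i added as extra regressors. Below is a minimal sketch for a single regressor. The function name is hypothetical.

```python
from collections import defaultdict

def group_means(ids, x):
    """Return x_bar_i for every row: the Mundlak/Chamberlain extra regressor."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, x_it in zip(ids, x):
        sums[i] += x_it
        counts[i] += 1
    return [sums[i] / counts[i] for i in ids]

ids = [1, 1, 1, 2, 2]
x = [1.0, 2.0, 3.0, 10.0, 20.0]
print(group_means(ids, x))        # [2.0, 2.0, 2.0, 15.0, 15.0]

# A time-invariant variable reproduces itself, so it is collinear with its mean.
w = [5.0, 5.0, 5.0, 7.0, 7.0]
print(group_means(ids, w) == w)   # True
```

The second call shows why time-invariant variables cause trouble here. For such a variable, x̄i is identical to the variable itself, so the two cannot both enter the index.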
Problems with Dynamic RE Probit
- Assumes $y_{i,0}$ and the effects are uncorrelated
- Assumes the initial conditions are exogenous: OK if the process and the observation begin at the same time, not if they begin at different times
- Doesn't allow time-invariant variables in the model
- The normality assumption in the projection
Heckman’s Solution
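Dynamic model
$$y_{i,t} = \mathbf{1}[\mathbf{x}_{i,t}'\beta + \gamma y_{i,t-1} + \alpha_i + \varepsilon_{i,t} > 0], \quad t = 1,\ldots,T;\ i = 1,\ldots,N$$
Use the Mundlak device as before: $\alpha_i = \bar{\mathbf{x}}_i'\delta + u_i$
$$y_{i,t} = \mathbf{1}[\mathbf{x}_{i,t}'\beta + \gamma y_{i,t-1} + \bar{\mathbf{x}}_i'\delta + u_i + \varepsilon_{i,t} > 0], \quad t = 1,\ldots,T;\ i = 1,\ldots,N$$
Explicit model for the initial condition ("reduced form"):
$$y_{i,0} = \mathbf{1}[\mathbf{z}_{i,0}'\pi + h_i > 0], \quad h_i \sim N[0, \sigma_h^2]$$
($\mathbf{z}_{i,0}$ includes $\mathbf{x}_{i,0}$ and instruments.)
$h_i$ is correlated with $u_i$ but uncorrelated with $\varepsilon_{i,t}$.
Project $h_i$ on $u_i$ so $h_i = \theta u_i + \varepsilon_{i,0}$, with $\varepsilon_{i,0}$ uncorrelated with $u_i$ and $\varepsilon_{i,t}$:
$$y_{i,0} = \mathbf{1}[\mathbf{z}_{i,0}'\pi + \theta u_i + \varepsilon_{i,0} > 0]$$
Random effects model, but the initial period is different.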
Dynamic Probit Model:
A “Simplified” Approach (Wooldridge, 2005)
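(1) Conditioned on all effects, the joint probability is
$$P(y_{i1}, y_{i2}, \ldots, y_{iT} \mid y_{i0}, \mathbf{X}_i, u_i) = \prod_{t=1}^{T} F(\mathbf{x}_{it}'\beta + \gamma y_{i,t-1} + u_i,\ y_{it})$$
where $F(s, y)$ is the probability of outcome $y$ given index $s$, e.g. $\Phi[(2y - 1)s]$ for the probit.
(2) Unconditional density; integrate out the common effect:
$$P(y_{i1}, y_{i2}, \ldots, y_{iT} \mid y_{i0}, \mathbf{X}_i) = \int_{u_i} P(y_{i1}, y_{i2}, \ldots, y_{iT} \mid y_{i0}, \mathbf{X}_i, u_i)\, h(u_i \mid y_{i0}, \mathbf{X}_i)\, du_i$$
(3) (The rabbit in the hat) Density for the heterogeneity:
$$h(u_i \mid y_{i0}, \mathbf{X}_i) = N[\eta + \theta y_{i0} + \bar{\mathbf{x}}_i'\delta,\ \sigma_w^2], \quad \text{so} \quad u_i = \eta + \theta y_{i0} + \bar{\mathbf{x}}_i'\delta + \sigma_w w_i, \quad w_i \sim N[0, 1]$$
(4) Reduced form:
$$P(y_{i1}, y_{i2}, \ldots, y_{iT} \mid y_{i0}, \mathbf{X}_i) = \int_{w_i} \prod_{t=1}^{T} F(\mathbf{x}_{it}'\beta + \gamma y_{i,t-1} + \eta + \theta y_{i0} + \bar{\mathbf{x}}_i'\delta + \sigma_w w_i,\ y_{it})\, h(w_i)\, dw_i$$
This is a Butler-Moffitt style random effects model.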
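Step (4) is a one-dimensional integral per individual, so ordinary quadrature can evaluate it. The sketch below assumes a probit F, a single regressor, and the parameter names η, θ, δ, σw used above. To stay dependency-free, it uses a simple trapezoid rule on [-8, 8] instead of Gauss-Hermite quadrature, the usual choice in Butler-Moffitt estimation. The parameter values are hypothetical. The final check relies on the identity E[Φ(c + σW)] = Φ(c/√(1 + σ²)) for W ~ N(0, 1).

```python
import math

def norm_cdf(s):
    return 0.5 * math.erfc(-s / math.sqrt(2.0))

def norm_pdf(w):
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

def wooldridge_contribution(y, x, xbar, beta, gamma, eta, theta, delta,
                            sigma_w, nodes=801, width=8.0):
    """P(y_1..y_T | y_0, X_i) for a probit F and one regressor (trapezoid rule).

    y = [y_0, y_1, ..., y_T] with y_0 the observed initial condition,
    x = [x_1, ..., x_T]; w ~ N(0,1) is integrated over [-width, width].
    """
    h = 2.0 * width / (nodes - 1)
    total = 0.0
    for j in range(nodes):
        w = -width + j * h
        prob = 1.0
        for t in range(1, len(y)):
            index = (x[t - 1] * beta + gamma * y[t - 1] + eta
                     + theta * y[0] + xbar * delta + sigma_w * w)
            prob *= norm_cdf((2 * y[t] - 1) * index)   # F(index, y_t), probit
        weight = 0.5 if j in (0, nodes - 1) else 1.0     # trapezoid end points
        total += weight * prob * norm_pdf(w)
    return total * h

val = wooldridge_contribution(y=[1, 1, 0, 1], x=[0.2, -0.1, 0.5], xbar=0.2,
                              beta=0.5, gamma=0.8, eta=-0.3, theta=0.4,
                              delta=0.1, sigma_w=0.9)
check = wooldridge_contribution(y=[0, 1], x=[0.0], xbar=0.0, beta=0.0,
                                gamma=0.0, eta=0.3, theta=0.0, delta=0.0,
                                sigma_w=0.8)
print(val, check, norm_cdf(0.3 / math.sqrt(1.0 + 0.8 ** 2)))
```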
Distributional Problem
- Normal distributions assumed throughout
  - Normal distribution for the unique component, εi,t
  - Normal distribution assumed for the heterogeneity, ui
  - Sensitive to the distribution?
- Alternative: discrete distribution for ui
  - Heckman and Singer style, latent class model
  - Conventional estimation methods
Why is the model not sensitive to normality for εi,t but sensitive to normality for ui?
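With a discrete distribution for ui, the integral over the heterogeneity becomes a finite sum over Q mass points aq with probabilities πq. This is why conventional estimation methods apply. Below is a minimal sketch of one individual's likelihood contribution. It assumes a probit F and uses fixed, hypothetical support points and class probabilities; in practice these are estimated along with β.

```python
import math

def norm_cdf(s):
    return 0.5 * math.erfc(-s / math.sqrt(2.0))

def latent_class_contribution(y, index, support, probs):
    """sum_q pi_q * prod_t Phi[(2y_t - 1)(index_t + a_q)] for one individual.

    index[t] holds x_it'beta (plus any lagged-y term); support = [a_1..a_Q]
    are the mass points for u_i and probs = [pi_1..pi_Q] their probabilities.
    """
    total = 0.0
    for a_q, pi_q in zip(support, probs):
        prod = 1.0
        for y_t, s in zip(y, index):
            prod *= norm_cdf((2 * y_t - 1) * (s + a_q))
        total += pi_q * prod
    return total

support, probs = [-1.0, 0.5], [1.0 / 3.0, 2.0 / 3.0]   # hypothetical two classes
index = [0.2, -0.4, 0.1]
print(latent_class_contribution([1, 0, 1], index, support, probs))
```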
Implementations of RE Models
- Linear, Probit, Logit, Poisson, 1 or 2 other models: SAS and about 10 other packages
- Linear, Probit, Logit, 4 or 5 others: MLwiN (using Bayesian MCMC methods)
- Linear, Probit, Logit, Poisson, 4 or 5 others: Stata (using quadrature; GLLAMM)
- Linear, Probit, Logit, Poisson, MNL, Tobit, about 50 others: LIMDEP/NLOGIT (using maximum simulated likelihood)