Extended Models for Binary Choice - NYU Stern

advertisement
Discrete Choice Modeling
William Greene
Stern School of Business
New York University
Part 4
Bivariate and Multivariate
Binary Choice Models
Multivariate Binary Choice Models

Bivariate Probit Models






Simultaneous Equations and Recursive Models
A Sample Selection Bivariate Probit Model
The Multivariate Probit Model





Analysis of bivariate choices
Marginal effects
Prediction
Specification
Simulation based estimation
Inference
Partial effects and analysis
The ‘panel probit model’
Application: Health Care Usage
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293
individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary
choice. There are altogether 27,326 observations. The number of observations ranges from 1 to 7.
(Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).
Variables in the file are
DOCTOR = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherswise = 0
HHNINC = household nominal monthly net income in German marks / 10000.
(4 observations with income=0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
MARRIED = marital status
Gross Relation Between Two Binary Variables
Cross Tabulation Suggests Presence or
Absence of a Bivariate Relationship
+-----------------------------------------------------------------+
|Cross Tabulation
|
|Row variable is DOCTOR
(Out of range 0-49:
0)
|
|Number of Rows = 2
(DOCTOR
= 0 to 1)
|
|Col variable is HOSPITAL (Out of range 0-49:
0)
|
|Number of Cols = 2
(HOSPITAL = 0 to 1)
|
+-----------------------------------------------------------------+
|
HOSPITAL
|
+--------+--------------+------+
|
| DOCTOR|
0
1| Total|
|
+--------+--------------+------+
|
|
0|
9715
420| 10135|
|
|
1| 15216
1975| 17191|
|
+--------+--------------+------+
|
|
Total| 24931
2395| 27326|
|
+-----------------------------------------------------------------+
Tetrachoric Correlation
A co rre la tio n m e a su re fo r tw o b in a ry v a ria b le s
C a n b e d e fin e d im p licitly
y 1 * = μ 1 + ε 1, y 1 = 1 (y 1 * > 0 )
y 2 * = μ 2 + ε 2 , y 2 = 1 (y 2 * > 0 )
 0   1
 ε1 
  ~ N   , 
 ε2 
 0   ρ
ρ 

1 
ρ is th e te tra c h o ric c o rre la tio n b e tw e e n y 1 a n d y 2
Log Likelihood Function
for Tetrachoric Correlation
lo g L =

=

n
i=1
lo g Φ 2  (2 y i1 - 1 )μ 1,(2 y i2 - 1 )μ 2 ,(2 y i1 - 1 )(2 y i2 - 1 )ρ 
n
i=1
lo g Φ 2  q i1μ 1, q i2 μ 2 , q i1q i2 ρ 
N o te : q i1 = (2 y i1 - 1 ) = -1 if y i1 = 0 a n d + 1 if y i1 = 1 .
Φ 2 = B iv a ria te n o rm a l C D F - m u st b e co m p u te d
u sin g q u a d ra tu re
M a xim ize d w ith re sp e ct to μ 1,μ 2 a n d ρ.
Estimation
+---------------------------------------------+
| FIML Estimates of Bivariate Probit Model
|
| Maximum Likelihood Estimates
|
| Dependent variable
DOCHOS
|
| Weighting variable
None
|
| Number of observations
27326
|
| Log likelihood function
-25898.27
|
| Number of parameters
3
|
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
Index
equation for DOCTOR
Constant
.32949128
.00773326
42.607
.0000
Index
equation for HOSPITAL
Constant
-1.35539755
.01074410 -126.153
.0000
Tetrachoric Correlation between DOCTOR
and HOSPITAL
RHO(1,2)
.31105965
.01357302
22.918
.0000
A Bivariate Probit Model
Two Equation Probit Model
 (More than two equations comes later)
 No bivariate logit – there is no
reasonable bivariate counterpart
 Why fit the two equation model?



Analogy to SUR model: Efficient
Make tetrachoric correlation conditional on
covariates – i.e., residual correlation
Bivariate Probit Model
y 1 * = β 1 x 1 + ε 1 , y 1 = 1 (y 1 * > 0 )
y 2 * = β 2 x 2 + ε 2 , y 2 = 1 (y 2 * > 0 )
 0   1
 ε1 
  ~ N   , 
 ε2 
 0   ρ
ρ 

1 
T h e v a ria b le s in x 2 a n d x 2 m a y b e th e sa m e o r
d iffe re n t. T h e re is n o n e e d fo r e a ch e q u a tio n to h a v e
its 'o w n v a ri a b le .'
ρ is th e c o n d itio n a l te tra ch o ric c o rre la tio n b e tw e e n y 1 a n d y 2 .
(T h e e q u a tio n s c a n b e fit o n e a t a tim e . U se F IM L fo r
(1 ) e fficie n cy a n d (2 ) to g e t th e e stim a te o f ρ .)
Estimation of the Bivariate Probit Model
 (2 y i1 - 1 )β 1 x i1,

n


lo g L =  i=1 lo g Φ 2 (2 y i2 - 1 )β 2 x i2 ,


 (2 y i1 - 1 )(2 y i2 - 1 )ρ 
=

n
i=1
lo g Φ 2  q i1β 1 x i1, q i2 β 2 x i2 , q i1q i2 ρ 
N o te : q i1 = (2 y i1 - 1 ) = -1 if y i1 = 0 a n d + 1 if y i1 = 1 .
Φ 2 = B iv a ria te n o rm a l C D F - m u st b e c o m p u te d
u sin g q u a d ra tu re
M a xim ize d w ith re sp e ct to β 1, β 2 a n d ρ .
Parameter Estimates
---------------------------------------------------------------------FIML Estimates of Bivariate Probit Model for DOCTOR and HOSPITAL
Dependent variable
DOCHOS
Log likelihood function
-25323.63074
Estimation based on N = 27326, K = 12
--------+------------------------------------------------------------Variable| Coefficient
Standard Error b/St.Er. P[|Z|>z]
Mean of X
--------+------------------------------------------------------------|Index
equation for DOCTOR
Constant|
-.20664***
.05832
-3.543
.0004
AGE|
.01402***
.00074
18.948
.0000
43.5257
FEMALE|
.32453***
.01733
18.722
.0000
.47877
EDUC|
-.01438***
.00342
-4.209
.0000
11.3206
MARRIED|
.00224
.01856
.121
.9040
.75862
WORKING|
-.08356***
.01891
-4.419
.0000
.67705
|Index
equation for HOSPITAL
Constant|
-1.62738***
.05430
-29.972
.0000
AGE|
.00509***
.00100
5.075
.0000
43.5257
FEMALE|
.12143***
.02153
5.641
.0000
.47877
HHNINC|
-.03147
.05452
-.577
.5638
.35208
HHKIDS|
-.00505
.02387
-.212
.8323
.40273
|Disturbance correlation (Conditional tetrachoric correlation)
RHO(1,2)|
.29611***
.01393
21.253
.0000
---------------------------------------------------------------------| Tetrachoric Correlation between DOCTOR
and HOSPITAL
RHO(1,2)|
.31106
.01357
22.918
.0000
--------+-------------------------------------------------------------
How to Measure Fit? Pseudo R2 is meaningless
Predictions: Success and Failure
Joint Frequency Table: Columns = HOSPITAL
Rows
= DOCTOR
(N) = Count of Fitted Values
Cell with largest probability
0
1
0
9715
420
( 4683)
(
0)
1
15216
1975
(22643)
(
0)
TOTAL
24931
2395
(27326)
(
0)
TOTAL
10135
( 4683)
17191
(22643)
27326
(27326)
Analysis of Predictions
+--------------------------------------------------------+
| Bivariate Probit Predictions for DOCTOR
and HOSPITAL |
| Predicted cell (i,j) is cell with largest probability |
| Neither DOCTOR
nor HOSPITAL predicted correctly
|
|
558 of
27326 observations |
| Only
DOCTOR
correctly predicted
|
|
DOCTOR
= 0:
69 of
10135 observations |
|
DOCTOR
= 1:
1768 of
17191 observations |
| Only
HOSPITAL correctly predicted
|
|
HOSPITAL = 0:
9510 of
24931 observations |
|
HOSPITAL = 1:
1768 of
2395 observations |
| Both
DOCTOR
and HOSPITAL correctly predicted
|
|
DOCTOR
= 0 HOSPITAL = 0:
2306 of
9715 |
|
DOCTOR
= 1 HOSPITAL = 0:
13115 of
15216 |
|
DOCTOR
= 0 HOSPITAL = 1:
0 of
420 |
|
DOCTOR
= 1 HOSPITAL = 1:
0 of
1975 |
+--------------------------------------------------------+
Marginal Effects

What are the marginal effects



Possible margins?




Effect of what on what?
Two equation model, what is the conditional mean?
Derivatives of joint probability = Φ2(β1’xi1, β2’xi2,ρ)
Partials of E[yij|xij] =Φ(βj’xij) (Univariate probability)
Partials of E[yi1|xi1,xi2,yi2=1] = P(yi1,yi2=1)/Prob[yi2=1]
Note marginal effects involve both sets of regressors.
If there are common variables, there are two effects
in the derivative that are added.
Bivariate Probit Conditional Means
P ro b [y i1 = 1 , y i2 = 1 ] = Φ 2 ( β 1 x i1 , β 2 x i2 ,ρ )
T h is is n o t a c o n d itio n a l m e a n . F o r a g e n e ric x th a t m ig h t a p p e a r in e ith e r in d e x fu n ctio n ,
 P ro b [y i1 = 1 , y i2 = 1 ]
x i
= g i1β 1 + g i2 β 2
 β  x - ρβ x
1 i1
g i1 = φ ( β 1 x i1 )Φ  2 i2
2

1- ρ


 β x - ρβ  x
2 i2
 , g i2 = φ ( β 2 x i2 )Φ  1 i1
2


1- ρ






T h e te rm in β 1 is 0 if x i d o e s n o t a p p e a r in x i1 a n d lik e w ise fo r β 2 .
E [y i1 | x i1 , x i2 , y i2 = 1 ] = P ro b [y i1 = 1 | x i1, x i2 , y i2 = 1 ] =
 E [y i1 | x i1, x i2 , y i2 = 1 ]
x i
=
1
Φ ( β 2 x i2 )
 g i1β 1 + g i2 β 2  -
Φ 2 ( β 1 x i1 , β 2 x i2 ,ρ )
Φ ( β 2 x i2 )
Φ 2 ( β 1 x i1 , β 2 x i2 ,ρ )φ ( β 2 x i2 )
2
[Φ ( β 2 x i2 )]
β2



g i1
g i2
Φ 2 ( β 1 x i1 , β 2 x i2 ,ρ )φ ( β 2 x i2 ) 
=
β
+
 1 
β2
2



Φ
(
β
x
)
Φ
(
β
x
)
[Φ
(
β
x
)]

2 i2 

2 i2
2 i2

Marginal Effects: Decomposition
+------------------------------------------------------+
| Marginal Effects for E[y1|y2=1]
|
+----------+----------+----------+----------+----------+
| Variable | Efct x1 | Efct x2 | Efct z1 | Efct z2 |
+----------+----------+----------+----------+----------+
| AGE
|
.00383 | -.00035 |
.00000 |
.00000 |
| FEMALE
|
.08857 | -.00835 |
.00000 |
.00000 |
| EDUC
| -.00392 |
.00000 |
.00000 |
.00000 |
| MARRIED |
.00061 |
.00000 |
.00000 |
.00000 |
| WORKING | -.02281 |
.00000 |
.00000 |
.00000 |
| HHNINC
|
.00000 |
.00217 |
.00000 |
.00000 |
| HHKIDS
|
.00000 |
.00035 |
.00000 |
.00000 |
+----------+----------+----------+----------+----------+
Direct Effects
Derivatives of E[y1|x1,x2,y2=1] wrt x1
+-------------------------------------------+
| Partial derivatives of E[y1|y2=1] with
|
| respect to the vector of characteristics. |
| They are computed at the means of the Xs. |
| Effect shown is total of 4 parts above.
|
| Estimate of E[y1|y2=1] = .819898
|
| Observations used for means are All Obs. |
| These are the direct marginal effects.
|
+-------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
AGE
.00382760
.00022088
17.329
.0000
43.5256898
FEMALE
.08857260
.00519658
17.044
.0000
.47877479
EDUC
-.00392413
.00093911
-4.179
.0000
11.3206310
MARRIED
.00061108
.00506488
.121
.9040
.75861817
WORKING
-.02280671
.00518908
-4.395
.0000
.67704750
HHNINC
.000000
......(Fixed Parameter).......
.35208362
HHKIDS
.000000
......(Fixed Parameter).......
.40273000
Indirect Effects
Derivatives of E[y1|x1,x2,y2=1] wrt x2
+-------------------------------------------+
| Partial derivatives of E[y1|y2=1] with
|
| respect to the vector of characteristics. |
| They are computed at the means of the Xs. |
| Effect shown is total of 4 parts above.
|
| Estimate of E[y1|y2=1] = .819898
|
| Observations used for means are All Obs. |
| These are the indirect marginal effects. |
+-------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
AGE
-.00035034
.697563D-04
-5.022
.0000
43.5256898
FEMALE
-.00835397
.00150062
-5.567
.0000
.47877479
EDUC
.000000
......(Fixed Parameter).......
11.3206310
MARRIED
.000000
......(Fixed Parameter).......
.75861817
WORKING
.000000
......(Fixed Parameter).......
.67704750
HHNINC
.00216510
.00374879
.578
.5636
.35208362
HHKIDS
.00034768
.00164160
.212
.8323
.40273000
Marginal Effects: Total Effects
Sum of Two Derivative Vectors
+-------------------------------------------+
| Partial derivatives of E[y1|y2=1] with
|
| respect to the vector of characteristics. |
| They are computed at the means of the Xs. |
| Effect shown is total of 4 parts above.
|
| Estimate of E[y1|y2=1] = .819898
|
| Observations used for means are All Obs. |
| Total effects reported = direct+indirect. |
+-------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
AGE
.00347726
.00022941
15.157
.0000
43.5256898
FEMALE
.08021863
.00535648
14.976
.0000
.47877479
EDUC
-.00392413
.00093911
-4.179
.0000
11.3206310
MARRIED
.00061108
.00506488
.121
.9040
.75861817
WORKING
-.02280671
.00518908
-4.395
.0000
.67704750
HHNINC
.00216510
.00374879
.578
.5636
.35208362
HHKIDS
.00034768
.00164160
.212
.8323
.40273000
Marginal Effects: Dummy Variables
Using Differences of Probabilities
+-----------------------------------------------------------+
| Analysis of dummy variables in the model. The effects are |
| computed using E[y1|y2=1,d=1] - E[y1|y2=1,d=0] where d is |
| the variable. Variances use the delta method. The effect |
| accounts for all appearances of the variable in the model.|
+-----------------------------------------------------------+
|Variable
Effect
Standard error
t ratio (deriv) |
+-----------------------------------------------------------+
FEMALE
.079694
.005290
15.065 (.080219)
MARRIED
.000611
.005070
.121 (.000511)
WORKING
-.022485
.005044
-4.457 (-.022807)
HHKIDS
.000348
.001641
.212 (.000348)
Computed using
difference of probabilities
Computed using scaled
coefficients
Heteroscedasticity in Bivariate Probit
y 1 * = β 1 x 1 + ε 1, y 1 = 1 (y 1 * > 0 )
y 2 * = β 2 x 2 + ε 2 , y 2 = 1 (y 2 * > 0 )
  0    i12
 ε1 
  ~ N   , 
  0   ρ  i1 i 2
 ε2 
ρ  i1 i 2  

2
 i 2  
σ ij = e xp [ γ j z ij ]
Modeling the covariance separately produces an internal inconsistency.
The correlation cannot vary freely from the variances and maintain the
Cauchy Schwarz inequality.
A Model with Heteroscedasticity
---------------------------------------------------------------------FIML Estimates of Bivariate Probit Model
Log likelihood function
-25282.46370 (-25323.63074. ChiSq = 82.336)
--------+------------------------------------------------------------Variable| Coefficient
Standard Error b/St.Er. P[|Z|>z]
Mean of X
--------+------------------------------------------------------------|Index
equation for DOCTOR
Constant|
-.16213**
.07022
-2.309
.0210
AGE|
.01745***
.00114
15.363
.0000
43.5257
FEMALE|
.94019***
.14039
6.697
.0000
.47877
EDUC|
-.02277***
.00419
-5.441
.0000
11.3206
MARRIED|
.00952
.02743
.347
.7284
.75862
WORKING|
-.19688***
.03041
-6.475
.0000
.67705
|Index
equation for HOSPITAL
Constant|
-1.75985***
.06894
-25.527
.0000
AGE|
.00987***
.00136
7.250
.0000
43.5257
FEMALE|
-16.5930
32.11025
-.517
.6053
.47877
HHNINC|
-.12331
.07497
-1.645
.1000
.35208
HHKIDS|
-.00425
.03374
-.126
.8997
.40273
|Variance equation for DOCTOR
FEMALE|
.81572***
.11624
7.017
.0000
.47877
MARRIED|
-.00893
.05285
-.169
.8659
.75862
|Variance equation for HOSPITAL
FEMALE|
2.66327
1.78858
1.489
.1365
.47877
MARRIED|
-.04095**
.02031
-2.016
.0438
.75862
|Disturbance correlation
RHO(1,2)|
.29295***
.01398
20.951
.0000
--------+-------------------------------------------------------------
Partial Effects Due to Means and Variances
+------------------------------------------------------+
| Marginal Effects for Ey1|y2=1
|
+----------+----------+----------+----------+----------+
| Variable | Efct x1 | Efct x2 | Efct h1 | Efct h2 |
+----------+----------+----------+----------+----------+
| AGE
|
.00190 | -.00012 |
.00000 |
.00000 |
| FEMALE
|
.10212 |
.20690 | -.05879 | -.30949 |
| EDUC
| -.00247 |
.00000 |
.00000 |
.00000 |
| MARRIED |
.00103 |
.00000 |
.00064 |
.00476 |
| WORKING | -.02139 |
.00000 |
.00000 |
.00000 |
| HHNINC
|
.00000 |
.00154 |
.00000 |
.00000 |
| HHKIDS
|
.00000 |
.00005 |
.00000 |
.00000 |
+----------+----------+----------+----------+----------+
Partial Effects: All 4 Sources
---------------------------------------------------------------------Partial derivatives of E[y1|y2=1] with
respect to the vector of characteristics.
They are computed at the means of the Xs.
Effect shown is total of 4 parts above.
Estimate of E[y1|y2=1] = .916928
Observations used for means are All Obs.
Total effects reported = direct+indirect.
--------+------------------------------------------------------------Variable| Coefficient
Standard Error b/St.Er. P[|Z|>z]
Mean of X
--------+------------------------------------------------------------AGE|
.00177
.00142
1.248
.2120
43.5257
FEMALE|
-.05926
.18914
-.313
.7540
.47877
EDUC|
-.00247
.00215
-1.152
.2494
11.3206
MARRIED|
.00644
.00422
1.525
.1272
.75862
WORKING|
-.02139
.01834
-1.166
.2435
.67705
HHNINC|
.00154
.00263
.584
.5592
.35208
HHKIDS| .53016D-04
.00043
.124
.9016
.40273
--------+-------------------------------------------------------------
A Simultaneous Equations Model
S im u lta n e o u s E q u a tio n s M o d e l
y 1 * = β 1 x 1 + γ 1 y 2 + ε 1 , y 1 = 1 (y 1 * > 0 )
y 2 * = β 2 x 2 + γ 2 y 1 + ε 2 , y 2 = 1 (y 2 * > 0 )
 0   1
 ε1 
  ~ N   ,
 ε2 
 0   ρ
ρ 

1 
T h is m o d e l is n o t id e n tifie d . (N o t e stim a b le .
T h e c o m p u te r c a n c o m p u te 'e stim a te s' b u t
th e y h a v e n o m e a n in g .)
bivariate probit;lhs=doctor,hospital
;rh1=one,age,educ,married,female,hospital
;rh2=one,age,educ,married,female,doctor$
Error
809: Fully simultaneous BVP model is not identified
Fully Simultaneous ‘Model’
(Obtained by bypassing internal control)
---------------------------------------------------------------------FIML Estimates of Bivariate Probit Model
Dependent variable
DOCHOS
Log likelihood function
-20318.69455
--------+------------------------------------------------------------Variable| Coefficient
Standard Error b/St.Er. P[|Z|>z]
Mean of X
--------+------------------------------------------------------------|Index
equation for DOCTOR
Constant|
-.46741***
.06726
-6.949
.0000
AGE|
.01124***
.00084
13.353
.0000
43.5257
FEMALE|
.27070***
.01961
13.807
.0000
.47877
EDUC|
-.00025
.00376
-.067
.9463
11.3206
MARRIED|
-.00212
.02114
-.100
.9201
.75862
WORKING|
-.00362
.02212
-.164
.8701
.67705
HOSPITAL|
2.04295***
.30031
6.803
.0000
.08765
|Index
equation for HOSPITAL
Constant|
-1.58437***
.08367
-18.936
.0000
AGE|
-.01115***
.00165
-6.755
.0000
43.5257
FEMALE|
-.26881***
.03966
-6.778
.0000
.47877
HHNINC|
.00421
.08006
.053
.9581
.35208
HHKIDS|
-.00050
.03559
-.014
.9888
.40273
DOCTOR|
2.04479***
.09133
22.389
.0000
.62911
|Disturbance correlation
RHO(1,2)|
-.99996***
.00048
********
.0000
--------+-------------------------------------------------------------
A Latent Simultaneous Equations Model
S im u lta n e o u s E q u a tio n s M o d e l in th e la te n t v a ria b le s
y 1 * = β 1 x 1 + γ 1 y 2 + ε 1 , y 1 = 1 (y 1 * > 0 )
*
y 2 * = β 2 x 2 + γ 2 y 1 + ε 2 , y 2 = 1 (y 2 * > 0 )
*
 0   1
 ε1 
  ~ N  ,
 ε2 
 0   ρ
ρ 

1 
N o te th e u n d e rlyin g (la te n t) s tru c tu ra l v a ria b le s in
e a c h e q u a tio n , n o t th e o b s e rv e d b in a ry v a ria b le s .
T h is m o d e l is id e n tifie d . It is h a rd to in te rp re t. It c a n
b e c o n s is te n tly e s tim a te d b y tw o s te p m e th o d s .
(A n a lyze d in A m e m iya (1 9 7 9 ) a n d M a d d a la (1 9 8 3 ).)
A Recursive Simultaneous Equations Model
R e cu rsiv e S im u lta n e o u s E q u a tio n s M o d e l
y 1 * = β 1 x 1 +
ε 1 , y 1 = 1 (y 1 * > 0 )
y 2 * = β 2 x 2 + γ 2 y 1 + ε 2 , y 2 = 1 (y 2 * > 0 )
 0   1
 ε1 
~
N
  ,
 
 ε2 
 0   ρ
ρ 

1 
T h is m o d e l is id e n tifie d . It ca n b e co n siste n tly a n d e fficie n tly
e stim a te d b y fu ll in fo rm a tio n m a xim u m lik e lih o o d . T re a te d a s
a b iv a ria te p ro b it m o d e l, ig n o rin g th e s im u lta n e ity.
Bivariate ; Lhs = y1,y2 ; Rh1=…,y2 ; Rh2 = … $
Application: Gender Economics at
Liberal Arts Colleges
Journal of Economic Education, fall, 1998.
Estimated Recursive Model
Estimated Effects: Decomposition
A Sample Selection Model
y 1 * = β 1 x 1 + ε 1 , y 1 = 1 (y 1 * > 0 )
y 2 * = β 2 x 2 + ε 2 , y 2 = 1 (y 2 * > 0 )
 0   1
 ε1 
  ~ N   , 
 ε2 
 0   ρ
ρ 

1 
y 1 is o n ly o b se rv e d w h e n y 2 = 1 .
f(y 1, y 2 ) = P ro b [y 1 = 1 | y 2 = 1 ] * P ro b [y 2 = 1 ]
(y 1 = 1 , y 2 = 1 )
= P ro b [y 1 = 0 | y 2 = 1 ] * P ro b [y 2 = 1 ] (y 1 = 0 , y 2 = 1 )
= P ro b [y 2 = 0 ]
(y 2 = 0 )
Sample Selection Model: Estimation
f(y 1, y 2 ) = P ro b [y 1 = 1 | y 2 = 1 ] * P ro b [y 2 = 1 ] (y 1 = 1 , y 2 = 1 )
= P ro b [y 1 = 0 | y 2 = 1 ] * P ro b [y 2 = 1 ] (y 1 = 0 , y 2 = 1 )
= P ro b [y 2 = 0 ]
(y 2 = 0 )
T e rm s in th e lo g lik e lih o o d :
(y 1 = 1 , y 2 = 1 ) Φ 2 ( β 1 x i1, β 2 x i2 ,ρ )
(B iv a ria te n o rm a l)
(y 1 = 0 , y 2 = 1 ) Φ 2 (-β 1 x i1 , β 2 x i2 , -ρ ) (B iv a ria te n o rm a l)
(y 2 = 0 )
Φ (-β 2 x i2 )
(U n iv a ria te n o rm a l)
E stim a tio n is b y fu ll in f o rm a tio n m a xim u m lik e lih o o d .
T h e re is n o "la m b d a " v a ria b le .
Application: Credit Scoring
American Express: 1992
 N = 13,444 Applications




Observed application data
Observed acceptance/rejection of application
N1 = 10,499 Cardholders


Observed demographics and economic data
Observed default or not in first 12 months
Full Sample is in AmEx.lpj and described in AmEx.lim
Application: Credit Scoring
Credit Scoring Data
The Multivariate Probit Model
M u ltip le E q u a tio n s A n a lo g to S U R M o d e l fo r M B in a ry V a ria b le s
y 1 * = β 1 x 1 + ε 1 , y 1 = 1 (y 1 * > 0 )
y 2 * = β 2 x 2 + ε 2 , y 2 = 1 (y 2 * > 0 )
...
y M * = β M x M + ε M , y M = 1 (y M * > 0 )
 ε1

ε
 2
 ...

 εM


~N
M



lo g L =

 0   1
  
  0  ,  ρ12
  ...   ...
  
  0   ρ 1 M
N
i=1
ρ12
...
1
...
...
...
ρ 2M
...
ρ1M  

ρ 2M 

...  

1  
lo g Φ M [q i1β 1 x i1 , q i2 β 2 x i2 ,..., q iM β M x iM | Σ *]
Σ m n *  1 if m = n o r q im q in ρ m n if n o t.
MLE: Simulation
Estimation of the multivariate probit model
requires evaluation of M-order Integrals
 The general case is usually handled with the
GHK simulator. Much current research focuses
on efficiency (speed) gains in this computation.
 The “Panel Probit Model” is a special case.



(Bertschek-Lechner, JE, 1999) – Construct a GMM
estimator using only first order integrals of the
univariate normal CDF
(Greene, Emp.Econ, 2003) – Estimate the integrals
with (GHK) simulation anyway.
---------------------------------------------------------------------Multivariate Probit Model: 3 equations.
Dependent variable
MVProbit
Log likelihood function
-4751.09039
--------+------------------------------------------------------------Variable| Coefficient
Standard Error b/St.Er. P[|Z|>z]
Mean of X
--------+------------------------------------------------------------|Index function for DOCTOR
Constant|
-.35527**
.16715
-2.125
.0335
AGE|
.01664***
.00194
8.565
.0000
43.9959
FEMALE|
.30931***
.04812
6.427
.0000
.47935
EDUC|
-.01566
.01024
-1.530
.1261
11.0909
MARRIED|
-.04487
.05112
-.878
.3801
.78911
WORKING|
-.14712***
.05075
-2.899
.0037
.63345
|Index function for HOSPITAL
Constant|
-1.61787***
.15729
-10.286
.0000
AGE|
.00717**
.00283
2.536
.0112
43.9959
FEMALE|
-.00039
.05995
-.007
.9948
.47935
HHNINC|
-.41050
.25147
-1.632
.1026
.29688
HHKIDS|
-.01547
.06551
-.236
.8134
.44915
|Index function for PUBLIC
Constant|
1.51314***
.18608
8.132
.0000
AGE|
.00661**
.00289
2.287
.0222
43.9959
HSAT|
-.06844***
.01385
-4.941
.0000
6.90062
MARRIED|
-.00859
.06892
-.125
.9008
.78911
|Correlation coefficients
R(01,02)|
.28381***
.03833
7.404
.0000
R(01,03)|
.03509
.03768
.931
.3517
R(02,03)|
-.04100
.04831
-.849
.3960
--------+-------------------------------------------------------------
Univariate
Estimates
[-0.29987
[ 0.01644
[ 0.30643
[-0.01936
[-0.04423
[-0.15390
.16195]
.00193]
.04767]
.00962]
.05139]
.05054]
[-1.58276
[ 0.00662
[-0.00407
[-0.41080
[-0.03688
.16119]
.00288]
.05991]
.22891]
.06615]
[ 1.53542
[ 0.00646
[-0.07069
[-.00813
.17060]
.00268]
.01266]
.06908]
[ was 0.29611
]
Marginal Effects




There are M equations: “Effect of what on
what?”
NLOGIT computes E[y1|all other ys, all xs]
Marginal effects are derivatives of this with
respect to all xs. (EXTREMELY MESSY)
Standard errors are estimated with
bootstrapping.
Application: The ‘Panel Probit Model’
M equations are M periods of the
same equation
 Parameter vector is the same in
every period (equation)
 Correlation matrix is unrestricted
across periods

Application: Innovation
Pooled Probit – Ignoring Correlation
Random Effects: Σ=(1- ρ)I+ρii’
Unrestricted Correlation Matrix
Download