LabPart4-OrderedChoice&CountData

Discrete Choice Modeling
William Greene
Stern School of Business
New York University
Lab Sessions
Lab 4
Ordered Choice and Count Data Models
Data Set
Data for this session are in
healthcare.lpj
Refer to healthcare.lim for the full list of
variables.
This is an unbalanced panel. The group
counter is already in the data set. Use
;PDS=_Groupti for panel models
Binary Dependent Variables
DOCTOR = visited the doctor at least once
HOSPITAL = went to the hospital at least once.
PUBLIC = has public health insurance (1=YES)
ADDON = additional health insurance (1=Yes)
ADDON is extremely unbalanced.
Dependent Variables: Ordered
HSAT = ordered reported health
satisfaction, coded 0,1,…,10.
Use with ORDERED
or
ORDERED ; Logit
Request marginal effects with
; Marginal
as usual.
Ordered Choice Models
Ordered ; Lhs = dependent variable
; Rhs = One, … independent variables $
Remember to include the constant term.
For ordered logit instead of ordered probit, use
Ordered ; Logit ; Lhs = dependent variable
; Rhs = One, … independent variables $
To get marginal effects, use ; Margin as usual.
There are fixed and random effects estimators for this model:
; FEM ; PDS = _Groupti
; Random ; PDS = _Groupti
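As an illustration of what the ORDERED command estimates, the following is a minimal Python sketch of the ordered probit cell probabilities, Prob(y = j) = Φ(μ_j − β'x) − Φ(μ_{j−1} − β'x). It assumes the standard normalization μ_0 = 0 with the outer cut points at −∞ and +∞. The function names and the numbers in the usage line are hypothetical, and this is not NLOGIT code.

```python
import math

def norm_cdf(v):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def ordered_probit_probs(xb, mu):
    """Cell probabilities for y = 0, 1, ..., J.

    xb : the index beta'x (x includes the constant term)
    mu : interior thresholds [mu_0, mu_1, ..., mu_{J-1}], with mu_0 = 0
    """
    # Add the outer cut points mu_{-1} = -inf and mu_J = +inf.
    cuts = [-math.inf] + list(mu) + [math.inf]
    return [norm_cdf(cuts[j + 1] - xb) - norm_cdf(cuts[j] - xb)
            for j in range(len(cuts) - 1)]

# Hypothetical values: four outcomes (J = 3), index beta'x = 0.3.
probs = ordered_probit_probs(0.3, [0.0, 1.0, 2.0])
print(probs)
```

The probabilities are differences of adjacent CDF values, so they are positive whenever the thresholds are strictly increasing, and they sum to one by construction.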
Sample Selection in Ordered Choice
Selection Model - The usual binary choice model
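\[ z^* = \gamma' z + w, \qquad z = 1 \text{ if } z^* > 0 \]
When (and only when) z = 1, we observe the ordered probit.
\[ y^* = \beta' x + \varepsilon \quad \text{(we assume x contains a constant term)} \]
\[ y = 0 \text{ if } y^* \le 0 \]
\[ y = 1 \text{ if } 0 < y^* \le \mu_1 \]
\[ y = 2 \text{ if } \mu_1 < y^* \le \mu_2 \]
\[ y = 3 \text{ if } \mu_2 < y^* \le \mu_3 \]
...
\[ y = J \text{ if } \mu_{J-1} < y^* \le \mu_J \]
In general:
\[ y = j \text{ if } \mu_{j-1} < y^* \le \mu_j, \quad j = 0, 1, \ldots, J \]
\[ \mu_{-1} = -\infty, \quad \mu_0 = 0, \quad \mu_J = +\infty, \quad \mu_{j-1} < \mu_j, \quad j = 1, \ldots, J \]
If \( \rho = \mathrm{Cor}(w, \varepsilon) \ne 0 \), the ML estimator of the ordered probit
model is inconsistent.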
Sample Selection Ordered Probit
PROBIT ; Lhs = … ; Rhs = … ; HOLD $
ORDERED ; Lhs = … ; Rhs = … ; Selection $
This is a maximum likelihood estimator, not a
least squares estimator. There is no
‘lambda’ variable. The various parameters
are present in the likelihood function.
Zero Inflated Ordered Probit
Zero Inflation Model - The usual binary choice model
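\[ z^* = \gamma' z + w, \qquad z = 1 \text{ if } z^* > 0 \]
\[ y^* = \beta' x + \varepsilon \quad \text{(we assume x contains a constant term)} \]
\[ y = 0 \text{ if } y^* \le 0, \text{ or if } z = 0 \text{ even if } y^* > 0 \quad \text{(the zero inflation)} \]
\[ y = 1 \text{ if } 0 < y^* \le \mu_1 \]
\[ y = 2 \text{ if } \mu_1 < y^* \le \mu_2 \]
\[ y = 3 \text{ if } \mu_2 < y^* \le \mu_3 \]
...
\[ y = J \text{ if } \mu_{J-1} < y^* \le \mu_J \]
In general:
\[ y = j \text{ if } \mu_{j-1} < y^* \le \mu_j, \quad j = 0, 1, \ldots, J \]
\[ \mu_{-1} = -\infty, \quad \mu_0 = 0, \quad \mu_J = +\infty, \quad \mu_{j-1} < \mu_j, \quad j = 1, \ldots, J \]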
Zero Inflated Ordered Probit Model
Zero inflated ordered probit model with correlation:
A probit model for the zero cell
(E.g., You can use DOCTOR for a model.)
Create ; y1 = y > 0 $
Probit ; … ; HOLD $
Ordered probit with excess zeros
Ordered ; Lhs = … ; Rhs = … ; ZIOP $
; CORRELATION is optional. It allows correlation between w (in the probit)
and ε (in the ordered probit model). Rho=0 is the default.
Hierarchical Ordered Probit
Hierarchical ordered probit. Ordered probit in which threshold
parameters depend on variables. Two forms:
HO1: μ(i,j) = exp[θ(j) + δ’z(i)].
HO2, different δ vector for each j.
Use ORDERED ; … ; HO1 = list of variables or
ORDERED ; … ; HO2 = list of variables.
Can combine with SELECTION models and zero inflation models.
This is also the Pudney and Shields generalized ordered probit from
Journal of Applied Econometrics, August 2000, with the
modification of using exp(…) and, internally, a way to make sure
that the thresholds are ordered.
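To make the HO1 form concrete, here is a small Python sketch that evaluates μ(i,j) = exp[θ(j) + δ'z(i)] for one individual. It also shows one common device for forcing thresholds to be ordered, which is to build them as a running sum of positive increments. That second function is only an assumption for illustration. The slides do not say how NLOGIT enforces the ordering internally, and all names and numbers here are hypothetical.

```python
import math

def ho1_thresholds(theta, delta, z):
    """HO1 thresholds mu(i,j) = exp[theta(j) + delta'z(i)] for one individual i."""
    dz = sum(d * v for d, v in zip(delta, z))
    return [math.exp(t + dz) for t in theta]

def ordered_thresholds(raw):
    """Hypothetical ordering device: mu_j = mu_{j-1} + exp(raw_j).

    Every increment exp(raw_j) is positive, so the thresholds are strictly
    increasing for any real-valued raw parameters.
    """
    thresholds, running_total = [], 0.0
    for r in raw:
        running_total += math.exp(r)
        thresholds.append(running_total)
    return thresholds

# Hypothetical parameter values.
print(ho1_thresholds([0.0, 0.5], [0.2], [1.0]))
print(ordered_thresholds([-1.0, -3.0, 0.2]))
```

In the HO1 form, the common term δ'z(i) scales every threshold by the same positive factor, so the thresholds are ordered for every individual whenever the θ(j) are increasing.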
Dependent Variables: Count
DOCVIS = count of visits to the doctor
HOSPVIS = count of visits to the hospital.
There are outliers. It helps to use truncated or censored samples.
(1) Truncated Data
SAMPLE ; All $
REJECT ; DocVis > 10 $ before using DocVis, or
REJECT ; HospVis > 10 $ before using HospVis.
Then, if using a panel data estimator, use
REGRESS ; Lhs = One ; Rhs = One ; Str = ID ; Panel $
to create the _GROUPTI count variable
(2) Censored Data
SAMPLE ; All $
CREATE ; DocVis10 = Min(10,DocVis) ; Hosp10 = Min(10,HospVis) $
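The difference between the two treatments of outliers can be seen on a toy sample. The Python sketch below mimics the effect of REJECT (observations dropped) and of Min(10, ·) (observations kept but capped). The visit counts are made up for illustration and are not from healthcare.lpj.

```python
# Hypothetical doctor-visit counts, including two outliers.
doc_vis = [0, 2, 5, 10, 14, 37]

# (1) Truncated sample: drop observations with DocVis > 10 (the effect of REJECT).
truncated = [v for v in doc_vis if not v > 10]

# (2) Censored variable: keep every observation but cap the value at 10.
doc_vis10 = [min(10, v) for v in doc_vis]

print(truncated)   # 4 observations remain
print(doc_vis10)   # 6 observations remain, three of them equal to 10
```

Truncation changes the sample size, which is why the group counter for panel estimators has to be rebuilt afterward. Censoring leaves the panel structure intact.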
Models for Count Data

Basic models: Poisson and negative binomial
POISSON ; Lhs = y ; Rhs = One,… $
NEGBIN ; Lhs = y ; Rhs = One,… $
Many extensions:
  Various heterogeneity forms
  Panel data
  Random parameters and latent class
  Zero inflation
  Sample selection
  Censoring and truncation
  Numerous others… (some, far from all, shown below)
Robust Covariance Matrix
“Robust” sandwich estimator is
appropriate for the Poisson and other
loglinear models
POISSON ; … ; ROBUST $
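To show what the sandwich form does, the sketch below works through the simplest possible case, a Poisson model with only a constant term, where everything is available in closed form. With λ = exp(b), the MLE satisfies exp(b) = ȳ, the negative Hessian is Σλ = nȳ, and the outer product of the scores is Σ(y − ȳ)². This is an illustrative Python calculation under that intercept-only assumption, with made-up data. It is not a description of how ;ROBUST is computed in NLOGIT.

```python
def poisson_intercept_variances(y):
    """Conventional and sandwich variance of b in lambda = exp(b), constant only."""
    n = len(y)
    ybar = sum(y) / n                      # MLE: exp(b) = sample mean
    neg_hessian = n * ybar                 # -H = sum of lambda_i
    opg = sum((v - ybar) ** 2 for v in y)  # B = sum of squared scores (y_i - lambda_i)
    conventional = 1.0 / neg_hessian       # (-H)^-1
    robust = opg / neg_hessian ** 2        # (-H)^-1 B (-H)^-1
    return conventional, robust

# Hypothetical overdispersed counts.
conventional, robust = poisson_intercept_variances([0, 0, 1, 1, 2, 8])
print(conventional, robust)
```

When the data are overdispersed relative to the Poisson, the sample variance exceeds the mean and the sandwich variance is larger than the conventional one.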
“Offset” Variables
Poisson (and NB) mean is a rate per unit of time.
In the sample, all observations should be observed (exposed) for the same length of time.
Otherwise, the appropriate model is as shown below.
Exposure Variable
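Probability of j occurrences in an interval of length \( T_i \):
\[ \mathrm{Prob}[y_i = j \mid x_i, T_i] = \frac{\exp(-\lambda_i T_i)\,(\lambda_i T_i)^j}{j!}, \qquad \lambda_i = \exp(\beta' x_i) \]
The model is equivalent to
\[ \mathrm{Prob}[y_i = j \mid x_i, T_i] = \frac{\exp(-\theta_i)\,\theta_i^{\,j}}{j!}, \qquad \theta_i = \lambda_i T_i = \exp(\beta' x_i + \log T_i) \]
Note that if \( T_i \) is constant, it disappears into the constant term.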
; Exposure = the name of T
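The equivalence between the exposure form and the offset form can be checked numerically. The Python sketch below evaluates the Poisson probability both ways. The function names and the values of β'x and T are hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability exp(-lam) lam^j / j!, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def prob_with_exposure(j, xb, T):
    """Rate exp(beta'x) per unit of time, observed for T units of time."""
    return poisson_pmf(j, math.exp(xb) * T)

def prob_with_offset(j, xb, T):
    """Same model with log T added to the index with a coefficient of one."""
    return poisson_pmf(j, math.exp(xb + math.log(T)))

# Hypothetical values: index 0.4, exposure of 2.5 time units.
print(prob_with_exposure(3, 0.4, 2.5), prob_with_offset(3, 0.4, 2.5))
```

The two functions agree up to rounding, which is why an exposure variable amounts to a regressor log T whose coefficient is fixed at one.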
Sample Selection
Selection Model - The usual binary choice model
\[ z^* = \gamma' z + w, \qquad z = 1 \text{ if } z^* > 0 \]
When (and only when) z = 1, we observe the Poisson or
negative binomial outcome variable and covariates.
y = 0,1,... with probability
\[ \mathrm{Prob}(y = j \mid x, z = 1, \varepsilon) = \frac{\exp[-\lambda(\varepsilon)]\,[\lambda(\varepsilon)]^j}{j!}, \qquad \lambda(\varepsilon) = \exp(\beta' x + \varepsilon) \]
"Selectivity" arises if \( \mathrm{Cor}(w, \varepsilon) = \rho \ne 0 \).
The model is fit by full information maximum likelihood, using
Hermite quadrature to integrate ε out of the likelihood.
Estimating a Selection Model
? Selection Equation
Probit ; Lhs = … ; Rhs = …
; Hold $
? Main Regression Equation
Poisson ; Lhs = … ; Rhs = …
; Selection $
NegBin with Heterogeneity in Alpha
In the negative binomial model, the
overdispersion parameter is α, with the
model assumption
λ = exp(β’x)
E[y|x] = λ
Var[y|x] = λ[1+ α λ]
We allow α to be heterogeneous:
α = exp(δ’z)
Use
; Hfn = … variables in z
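A short Python sketch of the moments implied by this specification, with λ = exp(β'x) and α = exp(δ'z). The function name and index values are hypothetical.

```python
import math

def negbin_moments(xb, dz):
    """Mean and variance of y given lambda = exp(beta'x) and alpha = exp(delta'z)."""
    lam = math.exp(xb)
    alpha = math.exp(dz)
    mean = lam
    variance = lam * (1.0 + alpha * lam)
    return mean, variance

# With both indexes equal to zero, lambda = 1 and alpha = 1.
print(negbin_moments(0.0, 0.0))
```

Because α = exp(δ'z) is always positive, the variance exceeds the mean for every observation, and the degree of overdispersion varies with z.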
Censoring and Truncation
Censoring limit C
  Right (upper): Values larger than C are set to C. The
  largest value in the sample is C. Use ;LIMIT=C;MAXIMUM
  Left (lower): Values less than C are set equal to C. The
  smallest value in the sample is C. Use ;LIMIT=C
Truncation limit C
  Right (upper): Values greater than or equal to C have
  been discarded. The largest value in the sample is C-1.
  Use ;LIMIT=C;TRUNCATION;UPPER
  Left (lower): Values less than or equal to C have been
  discarded. The smallest value in the sample is C+1. Use
  ;LIMIT=C;TRUNCATION
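The right-censored and right-truncated cases imply different probability functions for the observed count. The Python sketch below writes them out for a Poisson variable, following the definitions above: censoring piles the upper tail onto the value C, while truncation rescales the probabilities of 0,…,C−1. The function names and parameter values are hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability exp(-lam) lam^j / j!, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def right_censored_pmf(j, lam, c):
    """Values larger than c are recorded as c, so the support is 0, ..., c."""
    if j < c:
        return poisson_pmf(j, lam)
    if j == c:
        return 1.0 - sum(poisson_pmf(k, lam) for k in range(c))
    return 0.0

def right_truncated_pmf(j, lam, c):
    """Values >= c are discarded, so the support is 0, ..., c - 1."""
    if j >= c:
        return 0.0
    return poisson_pmf(j, lam) / sum(poisson_pmf(k, lam) for k in range(c))

# Hypothetical values: lambda = 3, limit C = 5.
print(right_censored_pmf(5, 3.0, 5), right_truncated_pmf(4, 3.0, 5))
```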
Zero Inflation
Two regime, latent class model
  Prob[Regime 1 => y=0] = q
  Prob[y = j|Regime 2] = Poisson or NegBin, λ=exp(β’x)
Reduced form:
  Prob[y=0] = q + (1-q)P(0)
  Prob[y=j > 0] = (1-q)P(j)
Regime models:
  q = Probit or Logit
Structures:
  ZIP: Probit or Logit F(γ’z); z can be any set of variables
  ZIP-tau: Probit or Logit F(τ β’x), with the same β’x as above
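The reduced form can be written out directly. The following Python sketch computes the zero inflated Poisson probabilities and the ZIP-tau regime probability with a logit F. The function names and parameter values are hypothetical, and the sketch only illustrates the formulas, not the estimator.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability exp(-lam) lam^j / j!, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def logit_cdf(v):
    """Logistic CDF, used here as the regime probability function F."""
    return 1.0 / (1.0 + math.exp(-v))

def zip_pmf(j, lam, q):
    """Prob[y=0] = q + (1-q)P(0); Prob[y=j>0] = (1-q)P(j)."""
    base = (1.0 - q) * poisson_pmf(j, lam)
    return q + base if j == 0 else base

def zip_tau_q(xb, tau):
    """ZIP-tau regime probability q = F(tau * beta'x) with logit F."""
    return logit_cdf(tau * xb)

# Hypothetical values: beta'x = 0.7, tau = -1.5.
xb = 0.7
q = zip_tau_q(xb, -1.5)
print(q, zip_pmf(0, math.exp(xb), q))
```

The zero cell always receives more probability than under the plain Poisson with the same λ, and the remaining cells are scaled down by (1 − q).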
ZIP and ZIP-tau Models
;ZIP: Logit, ZIP-tau
;ZIP = Normal: Probit, ZIP-tau
;ZIP [=Normal] ; Rh2 = variables in z
Alternative Models
Default is Poisson
;MODEL = NegBin
;MODEL = Gamma
Hurdle Model
Two Part Model:
  Prob[y=0] = Logit or Probit using β’x from the
  count model or γ’z as specified with ;RH2=list
  Prob[y=j|y>0] = Truncated Poisson or NegBin
Two part decision:
Drug or alcohol use, for example
POISSON ; … ; Hurdle $
POISSON ; … ; Hurdle ; Rh2 = List $
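The two parts of the hurdle model combine into a single probability function. The Python sketch below shows this for the Poisson case, taking the probability of a zero as given and using a Poisson truncated at zero for the positive counts. The function names and values are hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability exp(-lam) lam^j / j!, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def hurdle_pmf(j, lam, p0):
    """Hurdle probabilities.

    p0  : Prob[y = 0] from the binary (logit or probit) part
    lam : mean parameter of the untruncated Poisson in the count part
    """
    if j == 0:
        return p0
    # Positive counts follow a Poisson truncated at zero, scaled by 1 - p0.
    return (1.0 - p0) * poisson_pmf(j, lam) / (1.0 - poisson_pmf(0, lam))

# Hypothetical values: Prob[y = 0] = 0.4, lambda = 2.
print([round(hurdle_pmf(j, 2.0, 0.4), 4) for j in range(5)])
```

Unlike the zero inflation model, the zero probability here comes entirely from the binary part, so the model can describe either more or fewer zeros than the Poisson implies.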
Poisson and NB with Normal Heterogeneity
y = 0,1,... with probability
\[ \mathrm{Prob}(y = j \mid x, \varepsilon) = \frac{\exp[-\lambda(\varepsilon)]\,[\lambda(\varepsilon)]^j}{j!}, \qquad \lambda(\varepsilon) = \exp(\beta' x + \varepsilon) \]
The model is fit by full information maximum likelihood, using
Hermite quadrature to integrate ε out of the likelihood. Note,
this is precisely the negative binomial model if exp(ε) has a
gamma distribution with mean 1:
\[ f[\exp(\varepsilon)] = \frac{\theta^{\theta}}{\Gamma(\theta)} \exp[-\theta \exp(\varepsilon)]\,[\exp(\varepsilon)]^{\theta - 1} \]
In this model, ε has a normal distribution, so exp(ε) is lognormal.
Add
; NORMAL to the command.
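The integration step can be illustrated with a small Gauss-Hermite rule. The Python sketch below integrates ε out of the Poisson probability using five hard-coded quadrature points, under the assumption that ε is normal with mean zero and standard deviation σ. The parameter name sigma, the function names, and the choice of five points are assumptions for the sketch. The slides do not state how many points NLOGIT uses.

```python
import math

# 5-point Gauss-Hermite nodes and weights (weight function exp(-t^2)).
GH_NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.0201828704560856]
GH_WEIGHTS = [0.019953242059045913, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.019953242059045913]

def poisson_pmf(j, lam):
    """Poisson probability exp(-lam) lam^j / j!, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def poisson_normal_pmf(j, xb, sigma):
    """Approximate the integral of Poisson(j; exp(xb + eps)) over eps ~ N(0, sigma^2).

    The change of variable eps = sqrt(2) * sigma * t turns the normal integral
    into (1 / sqrt(pi)) * sum_k w_k * g(sqrt(2) * sigma * t_k).
    """
    total = 0.0
    for node, weight in zip(GH_NODES, GH_WEIGHTS):
        eps = math.sqrt(2.0) * sigma * node
        total += weight * poisson_pmf(j, math.exp(xb + eps))
    return total / math.sqrt(math.pi)

# Hypothetical values: beta'x = 0.5, sigma = 0.5.
print([round(poisson_normal_pmf(j, 0.5, 0.5), 4) for j in range(5)])
```

Each quadrature point contributes an ordinary Poisson probability, so the approximation is a finite mixture of Poissons with fixed weights. With a lognormal exp(ε), the implied mean is exp(β'x + σ²/2), which the five-point rule reproduces closely for moderate σ.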
Fixed and Random Effects
Poisson or NegBin
;PDS=setting
Fixed Effects
  Default is a conditional estimator
  ;FEM uses the unconditional estimator
  The two are algebraically identical but use
  different algorithms
Random Effects
  Use ; RANDOM
  Can be fit as a random parameters model with
  just a random constant
Random Parameters
Poisson
  ; LHS = dependent variable
  ; RHS = independent variable(s)
  ; PDS = setting (may be ;PDS=1)
  ; RPM ; PTS = number of points
  ; Halton (for smarter integration method)
  ; Correlated if desired to fit correlated
  parameters model
  ; FCN = variable(n) , variable(n), …
  to indicate which parameters are random
Latent Class
Poisson (or NEGBIN)
  ; LHS = dependent variable
  ; RHS = independent variable(s)
  ; LCM for a latent class model
  ; LCM = variables if probabilities are
  heterogeneous
  ; PDS = setting (may be ;PDS=1)
  ; PTS = number of latent classes
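In a latent class count model, the observed count is drawn from one of several class-specific distributions, and the class is not observed. The Python sketch below shows the resulting mixture probability for Poisson classes with fixed class probabilities. The function names, the two-class setup, and the numbers are hypothetical, and the sketch does not cover the case where class probabilities depend on variables.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability exp(-lam) lam^j / j!, computed on the log scale."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def latent_class_pmf(j, class_lams, class_probs):
    """Prob[y = j] = sum over classes c of pi_c * Poisson(j; lambda_c)."""
    return sum(p * poisson_pmf(j, lam) for lam, p in zip(class_lams, class_probs))

# Hypothetical two-class example: light users and heavy users.
lams = [0.8, 4.0]
probs = [0.6, 0.4]
print([round(latent_class_pmf(j, lams, probs), 4) for j in range(5)])
```

The unconditional mean is the probability-weighted average of the class means, here 0.6 × 0.8 + 0.4 × 4.0 = 2.08.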