Tobit and Heckman Cheat sheet

PBAF 529
Tobit Models and Heckman Selection: Dealing with selection bias
Cheat Sheet
Tobit Models
Developed by James Tobin, Tobit models are linear regression models with a normally distributed
error term in which the dependent variable is censored: values beyond a lower or upper limit are
recorded at the limit rather than at their true value.
Tobit models address problems that arise when your measurement or dataset does not
capture all the information (e.g. ceiling effects or censored data).
Example A: Effect of GRE Scores on Graduate School admissions
GRE scores have a minimum and a maximum (200 and 800, respectively). Two people may have the same score
at the maximum or minimum of the range, but not have the same abilities.
Example B: Effect of schooling on wages
You are trying to measure the effect of schooling on people's wages and you only have data on the wages of
those who are working. You are missing the data for those who are not working.
How Tobit models work for Example B (censored from below only):
y_i = observed outcome variable of interest (wages)
Because the data are censored, the outcome is only observed above a certain threshold (we only know the
wages of people who work). To get around this problem we assume that there is some latent outcome variable
y_i*, which is related to the observed outcomes in the following way. (You can think of y_i* as a variable that
captures the outcome of interest for every observation in the sample, even those for which no outcome was
observed in reality, such as wages for people who did not work.)
\[
y_i =
\begin{cases}
y_i^* & \text{if } y_i^* > 0 \\
0 & \text{if } y_i^* \le 0
\end{cases}
\]
where
\[
y_i^* = \beta x_i + \epsilon_i
\]
and
x_i = explanatory variables
β = parameters specifying the relationship between x and y
ε_i = error term
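The censoring rule above can be illustrated with a small simulation. This is only a sketch on made-up data: the true slope of 2.0, the standardized schooling variable, the unit-variance normal error, and all names in the code are assumptions chosen for illustration, not values from the course. It compares the OLS slope on the latent outcome with the OLS slope on the censored outcome.

```python
import random

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(529)   # fixed seed so the run is repeatable
TRUE_BETA = 2.0            # hypothetical effect of schooling on wages

# x_i: standardized schooling (assumed); y_i*: latent wage; y_i: censored wage
schooling = [rng.gauss(0, 1) for _ in range(5000)]
latent = [TRUE_BETA * x + rng.gauss(0, 1) for x in schooling]
observed = [max(0.0, y_star) for y_star in latent]  # y_i = y_i* if y_i* > 0, else 0

slope_latent = ols_slope(schooling, latent)      # possible only with complete data
slope_censored = ols_slope(schooling, observed)  # what OLS gives on censored data
share_censored = sum(y == 0.0 for y in observed) / len(observed)

print("slope on latent outcome:  ", round(slope_latent, 3))
print("slope on censored outcome:", round(slope_censored, 3))
print("share censored at zero:   ", round(share_censored, 3))
```

In this setup roughly half of the observations are censored, and the slope estimated from the censored outcome is pulled toward zero relative to the slope on the latent outcome.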
Alex Chew and Amelia Vader
What is the probability of being “observed”?
Censored data lead to biased estimates under ordinary OLS. You can try to correct for this bias by introducing
an adjustment to your equation that takes into account the probability of being “observed” (in Example B
this would mean the probability of working and therefore having a wage above zero). Conceptually, this implies
the following relationship between the outcome variable of interest, the explanatory
variables, and the observed outcomes in your sample.
E(y) = F(y) E(y*)
where
E(y) = expected value of the outcome of interest
F(y) = probability of being observed
E(y*) = expected value of the outcome of interest conditional upon being observed
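For the Tobit model with censoring at zero, this product has a standard closed form. The sketch below assumes normal errors with standard deviation σ (a symbol not used elsewhere in this sheet) and writes Φ and φ for the standard normal CDF and density:

\[
E[y_i \mid x_i]
= P(y_i > 0 \mid x_i) \cdot E[y_i \mid y_i > 0, x_i]
= \Phi\!\left(\frac{\beta x_i}{\sigma}\right)
\left[ \beta x_i + \sigma \, \frac{\phi(\beta x_i / \sigma)}{\Phi(\beta x_i / \sigma)} \right]
\]

The first factor is the probability of being observed, and the bracketed term is the expected outcome among those who are observed.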
Maximum Likelihood Estimator
Implementing the “adjustment” to correct for selection bias requires maximum likelihood estimation.
This means using an equation to model the probability of being “observed” in the sample. Although a
number of estimators of this kind exist, one of the more common is the Heckman selection estimator.
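As a rough illustration of what maximum likelihood means for the Tobit model described earlier, the sketch below writes out the log-likelihood for data censored from below at zero and maximizes it over a grid of candidate slopes. Everything here is an assumption made for illustration: the data are simulated, there is no intercept, the error standard deviation is treated as known and equal to 1, and a grid search stands in for the numerical optimizer that statistical software would use.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution; .cdf() is Phi

def tobit_loglik(beta, sigma, x, y):
    """Log-likelihood of a no-intercept Tobit model censored from below at 0."""
    total = 0.0
    for xi, yi in zip(x, y):
        if yi > 0:
            # Uncensored: log of the normal density of the residual.
            r = (yi - beta * xi) / sigma
            total += -0.5 * r * r - math.log(sigma * math.sqrt(2 * math.pi))
        else:
            # Censored: log of the probability that the latent y* is <= 0.
            total += math.log(max(STD_NORMAL.cdf(-beta * xi / sigma), 1e-300))
    return total

rng = random.Random(529)
TRUE_BETA, SIGMA = 2.0, 1.0   # hypothetical values used to simulate the data
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [max(0.0, TRUE_BETA * xi + rng.gauss(0, SIGMA)) for xi in x]

# Grid of candidate slopes from 0.5 to 3.5 in steps of 0.05.
grid = [0.5 + 0.05 * k for k in range(61)]
beta_hat = max(grid, key=lambda b: tobit_loglik(b, SIGMA, x, y))

print("grid-search Tobit estimate of beta:", round(beta_hat, 2))
```

The likelihood has two kinds of terms: a density term for each observed wage and a probability term for each censored observation, which is how the model uses the information that an outcome fell at or below the threshold.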
Heckman Selection Estimator
Heckman selection is a statistical model developed by James Heckman to correct for
selection bias. It is a means of correcting for not having a randomly selected sample (i.e.
your sample isn’t representative of the group you want to study).
The Heckman selection model is a type of Tobit model.
How Heckman works
1. Selection Equation (Maximum Likelihood Estimator)
First, you estimate the selection equation, or the probability that someone is working (their propensity to
be in the sample): fit a probit model for the determinants of being “observed” and record the predicted
probability for each observation.
2. Add exclusionary variables
This can make your selection equation better. Otherwise, identification of the selection effect may be weak.
For Example B, the effect of schooling on wages, pick exclusionary variables that would affect the likelihood
of working but not the wage rate (e.g. having younger children at home, student status).
3. OLS regression
Then, use a statistically adjusted value (the inverse Mills ratio) calculated from your selection equation as an
independent variable in the OLS regression for your outcome of interest. Heckman treats the selection bias as
an omitted variable bias, so including this term as an explanatory variable corrects for it.
The result is a better estimate than you would get by running the regression with those without wage
information included, or by running an unadjusted regression on the smaller sample of only those for whom
you have wage information.
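The three steps above can be sketched on simulated data. This is a minimal illustration, not a substitute for a statistics package: the data-generating values, the variable names (schooling, young_kids), and the hand-written probit and OLS helpers are all assumptions made so the example runs with the Python standard library alone. It does not compute corrected standard errors for the final regression.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def probit(rows, d, iterations=15):
    """Probit coefficients by Fisher scoring, starting from zero."""
    k = len(rows[0])
    gamma = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, di in zip(rows, d):
            index = sum(g * v for g, v in zip(gamma, r))
            p = min(max(STD_NORMAL.cdf(index), 1e-10), 1 - 1e-10)
            dens = STD_NORMAL.pdf(index)
            for i in range(k):
                score[i] += dens * (di - p) / (p * (1 - p)) * r[i]
                for j in range(k):
                    info[i][j] += dens * dens / (p * (1 - p)) * r[i] * r[j]
        gamma = [g + s for g, s in zip(gamma, solve(info, score))]
    return gamma

# Simulated data (all values hypothetical). True wage slope on schooling is 2.0.
rng = random.Random(529)
RHO = 0.7  # correlation between selection error and wage error
select_rows, wage_rows, works, wages = [], [], [], []
for _ in range(3000):
    schooling = rng.gauss(0, 1)
    young_kids = 1.0 if rng.random() < 0.5 else 0.0  # exclusionary variable
    v = rng.gauss(0, 1)                              # selection error
    eps = RHO * v + (1 - RHO ** 2) ** 0.5 * rng.gauss(0, 1)  # wage error
    select_rows.append([1.0, schooling, young_kids])
    wage_rows.append([1.0, schooling])
    works.append(1 if 0.5 + schooling - young_kids + v > 0 else 0)
    wages.append(1.0 + 2.0 * schooling + eps)  # treated as unobserved if not working

# Steps 1 and 2: probit selection equation including the exclusionary variable.
gamma_hat = probit(select_rows, works)

# Step 3: OLS on workers only, adding the inverse Mills ratio as a regressor.
heck_rows, naive_rows, obs_wages = [], [], []
for z, x, d, w in zip(select_rows, wage_rows, works, wages):
    if d == 1:
        index = sum(g * v for g, v in zip(gamma_hat, z))
        imr = STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)  # inverse Mills ratio
        heck_rows.append(x + [imr])
        naive_rows.append(x)
        obs_wages.append(w)

naive_slope = ols(naive_rows, obs_wages)[1]
heckman_coefs = ols(heck_rows, obs_wages)
heckman_slope, imr_coef = heckman_coefs[1], heckman_coefs[2]
print("naive OLS slope on workers:", round(naive_slope, 3))
print("two-step slope:            ", round(heckman_slope, 3))
```

Because the simulated selection error and wage error are positively correlated, the unadjusted regression on workers understates the schooling slope here, while the regression that includes the inverse Mills ratio lands closer to the value used to generate the data.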
Key Assumptions
• Error terms for the selection equation and the OLS regression are jointly normal.
• The selection-equation error V_i is normally distributed and E[ε_i | V_i] is linear in V_i.
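These assumptions are what turn the selection bias into a single omitted regressor. As a sketch of the standard result, suppose an observation is in the sample when γ z_i + V_i > 0, where z_i (the selection-equation variables), γ (their coefficients), σ_ε (the standard deviation of ε_i), and the normalization Var(V_i) = 1 are notation assumed here rather than taken from this sheet. Then:

\[
E[y_i \mid x_i, z_i, \text{observed}]
= \beta x_i + E[\epsilon_i \mid V_i > -\gamma z_i]
= \beta x_i + \rho \, \sigma_\epsilon \, \lambda(\gamma z_i),
\qquad
\lambda(c) = \frac{\phi(c)}{\Phi(c)}
\]

Here λ is the inverse Mills ratio and ρ is the correlation between ε_i and V_i. If ρ = 0, the extra term vanishes and OLS on the observed sample is not biased by selection.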
When to NOT use a Tobit:
• If you have heteroskedasticity in the error term.
• When you don’t have an instrumental variable or exclusion restriction (without these you are going off
of assumptions about the distribution).
• When you don’t have a theory about the selection bias. Your model is only as good as the assumptions
you make about the bias.
• If you have collinearity problems.
• If the ρ parameter is very sensitive.
Resources
Tobit
How to read a Tobit Output: http://www.ats.ucla.edu/stat/stata/output/stata_tobit.htm
How to use a Tobit model in Stata: http://en.wikibooks.org/wiki/Stata/Tobit_and_Selection_Models
Information on the five variations of Tobit models: http://en.wikipedia.org/wiki/Tobit_model
Information on censoring problems: http://en.wikipedia.org/wiki/Censoring_%28statistics%29
Tobit model setup in Stata: Microeconometrics Using Stata, Cameron and Trivedi, p. 536
Heckman
Heckman model setup in Stata: Microeconometrics Using Stata, Cameron and Trivedi, p. 558
How to use Heckman in Stata: http://www.stata.com/features/heckman-probit/heckprob.pdf
Basic information: http://en.wikipedia.org/wiki/Heckman_correction
Powerpoint on Heckman: http://rtm.wustl.edu/GMMC/heckman.pdf