Gujarati Chapter 6-- Statistical Inference and Hypothesis Testing

UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
Limited Dependent Variable Regression Models--The Probit Model
Recall that one of the assumptions of the OLS regression model is that the dependent (Y) variable is a continuous
measurement variable. The values of a continuous measurement variable are not limited--the variable can take on
fractional/decimal values, and it can range (potentially) from -∞ to +∞. What happens when this assumption is
violated? That is, what happens when the values of the dependent (Y) variable in the model are limited in some
way? For example, perhaps the values of the Y variable are limited to a range smaller than -∞ to +∞, or perhaps
the Y variable is categorical, taking discrete values that label various categories (green, blue, red, for example),
or, perhaps the Y variable is a binary/dummy/indicator variable, taking only 0 or 1 as possible values. In all of
these cases, we need to use a Limited Dependent (Y) Variable Model. There are several other names for
limited dependent variable models, including Categorical Models, Qualitative Choice Models, Binary Choice
Models, etc., depending on how the values of the dependent (Y) variable are limited.
Problems with OLS Regression When the Dependent Variable is Limited
What happens if we ignore the fact that our dependent (Y) variable is limited, and we go ahead and use OLS to
estimate a regression model when we have a limited dependent (Y) variable? Several bad things:
1. the OLS estimates of the β's are often biased
2. if the β's are biased, they are also inconsistent (the bias remains even when we increase sample size)
3. the error term (ehats) will be heteroskedastic (the heteroskedasticity problem)
4. when we use the OLS model to make predictions, we can obtain predictions greater than 100% or less than 0%, which make no sense
We can correct the heteroskedasticity problem by using weighted least squares (WLS) regression, but we would
still have the other problems. So, when we have a limited dependent variable situation, we need to avoid OLS
regression and use a modified/adjusted regression model, such as the . . . (drum roll) . . . Probit model!
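To make the prediction problem concrete, here is a minimal sketch that fits a one-variable OLS line to a 0/1 dependent variable. The data values and the function names `ols_fit` and `ols_predict` are hypothetical, made up for illustration; they do not come from the handout's dataset.

```python
# Hypothetical data (not from the handout): one X variable and a 0/1 Y variable.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

def ols_fit(x, y):
    """Return (intercept, slope) of the one-variable OLS regression line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

b0, b1 = ols_fit(x, y)

def ols_predict(x_value):
    """OLS prediction, read as an estimate of Prob(Y=1)."""
    return b0 + b1 * x_value

# Print the OLS "probability" at a low, a middle, and a high value of X.
for x_value in (1, 4, 8):
    print(x_value, round(ols_predict(x_value), 3))
```

With these made-up numbers, the fitted line dips below 0 at the smallest X and rises above 1 at the largest X, which is the fourth problem in the list above.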
The Probit Regression Model
In this handout, we explore the simplest example of a limited dependent variable model, one in which the Y
variable is a dummy/indicator variable, taking only 0 or 1 as possible values. This model is called the Probit
model.1 There are many situations in the real world in which you might want to build a model to predict a Y
variable that takes only two possible values. For example, many survey questions have only two possible
responses, such as “yes” or “no”, or “agree” or “disagree.” Also, many questions in economics, finance and
marketing involve only two possible answers, such as “raise interest rates” or “don’t raise interest rates,” “buy the
stock” or “don’t buy the stock”, “consumer will buy the shampoo” or “consumer won’t buy the shampoo,” etc. In
such situations, if we wanted to build a model to predict which of the two possible responses was chosen by an
individual in the population, and which factors (X variables) affect that choice, then we could use a Probit model.
In the Probit model, instead of constructing a model to directly predict whether the Y variable takes on a value of
0 or 1, we predict the probability (chances) that Y = 1; that is, the model attempts to predict “Prob(Y=1)”. If, for
example, the model predicts that Prob(Y=1) = 0.20, then, naturally, Prob(Y=0) = 0.80, because the Y variable can
only take on the values of 0 and 1, so the Prob(Y=1) and the Prob(Y=0) must add to 1.00 . This corresponds to
the fact that 20% plus 80% must add to 100%. In the Probit model, we want to explain how various independent
X variables affect the Prob(Y=1), and we want to predict Prob(Y=1).
We begin by defining an index variable, Xindex, based on the X variables in the model plus an error term. As an
example, let's consider a model with variables X1 and X2.
Xindex = β0 + β1X1 + β2X2 + e

[Footnote 1: There is another model that can be used in this situation, called the Logit model, but it is similar to the Probit model.]
Then, we say that if Xindex becomes greater than zero, then Y = 1; otherwise, Y = 0. We are using Xindex as an
indicator of whether or not Y will be equal to 1. So, we're saying:
If Xindex > 0, then Y = 1.
Otherwise, Y = 0.
So, when is Xindex > 0 ? It depends on the error term, e. Remember that the error term is assumed to have a
standard normal, bell-shaped distribution, centered on zero, as shown in the graph below.
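[Figure: bell-shaped standard normal distribution of the error term, centered on 0, with frequency(e) on the vertical axis and e on the horizontal axis. The point Xindex = β0 +β1X1+β2X2 is marked on the e axis. The shaded area to the left of that point is the probability that e < β0 +β1X1+β2X2.]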
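Because the index includes the error term, Xindex > 0 exactly when e > -(β0 +β1X1+β2X2). The standard normal distribution is symmetric around zero, so this event has the same probability as e < β0 +β1X1+β2X2. So, all we need to do is to find β0 +β1X1+β2X2 on the e axis and then ask ourselves, how likely is it that e < β0 +β1X1+β2X2 ? (On the graph, and in the F formulas below, the label "Xindex" marks this value of β0 +β1X1+β2X2, without the error term.)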
Recall that the area under a distribution curve is a probability. So, the area under the curve to the left of
β0 +β1X1+β2X2 gives the probability that e < β0 +β1X1+β2X2. This is the probability that Y = 1. So, if we can find
this area, then we've found Prob(Y=1).
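The link between the rule "Y = 1 when the index plus the error is greater than zero" and the area to the left of β0 +β1X1+β2X2 can be checked by simulation. The sketch below is only an illustration: the value 0.5 for β0 +β1X1+β2X2, the seed, and the number of draws are all arbitrary assumptions.

```python
import random
from statistics import NormalDist

# Hypothetical value of b0 + b1*X1 + b2*X2 (the part of the index without e).
index_value = 0.5

rng = random.Random(377)  # fixed seed so the run is repeatable
n_draws = 200_000

# Draw e from a standard normal and count how often index_value + e > 0,
# that is, how often the rule assigns Y = 1.
count_y1 = sum(1 for _ in range(n_draws) if index_value + rng.gauss(0.0, 1.0) > 0)
simulated_prob = count_y1 / n_draws

# Area under the standard normal curve to the left of index_value.
area_prob = NormalDist().cdf(index_value)

print(round(simulated_prob, 3), round(area_prob, 3))
```

The two printed numbers should be close to each other; they differ only by simulation noise.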
There is a formula that gives the shaded area in the graph above: The cumulative distribution function (cdf) of
the normal distribution, usually denoted “F”, shown below. Because this formula gives the shaded area, it also
gives Prob(Y=1) . . .

$$\text{Prob}(Y = 1) = F(X_{index}) = \int_{-\infty}^{X_{index}} \frac{1}{\sqrt{2\pi}} \exp(-0.5\,e^{2}) \, de$$

Note: F is the integral of a standard normal “bell-curve” equation.
(where "e" in the equation above is the value of e along the horizontal axis of the bell curve graph above)
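One way to see that F really is "area under the bell curve" is to add up the area numerically and compare it with a library value. The sketch below does this with the trapezoid rule. The function names, the lower limit of -10 (standing in for -∞), and the number of steps are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def bell_curve(e):
    """Height of the standard normal curve at e."""
    return math.exp(-0.5 * e ** 2) / math.sqrt(2 * math.pi)

def F_by_integration(x_index, lower=-10.0, steps=100_000):
    """Approximate F(x_index) with the trapezoid rule, starting at `lower`
    instead of minus infinity (the area to the left of -10 is negligible)."""
    width = (x_index - lower) / steps
    total = 0.5 * (bell_curve(lower) + bell_curve(x_index))
    for i in range(1, steps):
        total += bell_curve(lower + i * width)
    return total * width

approx = F_by_integration(0.0)   # area to the left of zero
exact = NormalDist().cdf(0.0)    # library value of the same cdf
print(round(approx, 4), round(exact, 4))
```

Half of the bell curve lies to the left of zero, so both numbers come out at 0.5.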
Yes, the formula above is pretty crazy. To understand it, let’s examine its graph below:
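[Figure: graph of F(Xindex), an S-shaped curve that rises from 0 to 1 on the vertical axis, plotted against e on the horizontal axis. F equals 0.5 at e = 0. The example point Xindex = 1.21 on the horizontal axis corresponds to a height of F(Xindex) = 0.89.]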
In the graph above, the horizontal axis shows the possible values of the error term, e. The value of Xindex is also
shown on the e axis. F(Xindex) is graphed on the vertical axis, and F(Xindex) can range from 0 to 1. The value of
Xindex is fed into the crazy F equation, which turns it into a value of F(Xindex) between 0 and 1 along the vertical
axis. This F(Xindex) value is Prob(Y=1).
That is, Prob(Y=1) is given by F(Xindex).
For example, in the graph above, suppose that we plug the β's and X’s into the Xindex equation and calculate Xindex
= 1.21. You find this value along the horizontal axis in the graph, and then you read the height up to the F graph.
For Xindex = 1.21, this height is 0.89. The value 0.89 is the value of F(Xindex). Because Prob(Y=1) is equal to
F(Xindex), we know that Prob(Y=1) is 0.89 . Therefore, the probability is 89% that the value of Y = 1. Thus,
naturally, the probability that Y = 0 is 11%, because 100% - 89% = 11%. Hence, Prob(Y=0) = 0.11 .
That is, Prob(Y=0) is given by [1 - F(Xindex)].
Importantly, the values of Prob(Y=1) and Prob(Y=0) in the example above resulted from the particular values of
the X’s that we plugged into β0 + β 1X1+ β 2X2. If we change the value of one or more of the X’s that we plug into
β 0 + β 1X1+ β 2X2, then we will get different answers for Prob(Y=1) and Prob(Y=0). This is the way in which the
values of the X’s affect the values of Prob(Y=1) and Prob(Y=0).
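The worked example above can be reproduced in a few lines. This is a sketch that uses Python's standard library as a stand-in for reading the F graph; the extra index values in the loop are arbitrary and only show that a different Xindex gives a different probability.

```python
from statistics import NormalDist

F = NormalDist().cdf  # cumulative distribution function of the standard normal

x_index = 1.21            # the value used in the worked example
prob_y1 = F(x_index)      # Prob(Y=1) = F(Xindex)
prob_y0 = 1 - prob_y1     # Prob(Y=0) = 1 - F(Xindex)
print(round(prob_y1, 2), round(prob_y0, 2))

# A different Xindex (from different X values) gives a different probability.
for other_index in (-1.0, 0.0, 2.0):
    print(other_index, round(F(other_index), 2))
```

The first line of output matches the 0.89 and 0.11 read from the graph.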
Maximum Likelihood Estimation—Finding the Values of the β's in the Probit Model
Because the Y variable only takes 0/1 values in the Probit model, violating one of the assumptions of the OLS
regression model, we can’t use OLS regression to estimate the β's of the Probit model. Instead, we use
“Maximum Likelihood Estimation” (MLE). In MLE, we find the β's that maximize the probability of obtaining
the data sample that we actually obtained when we collected our data. For example, suppose we collect the
following data sample:
Individual in sample    Y       X1      X2
1                       1       6       2
2                       1       1       10
3                       0       4       12
4                       1       5       1
5                       0       2       8
etc.                    etc.    etc.    etc.
When doing MLE, we ask ourselves: “Given the values of the X’s that are in my sample, what values of the β's
would maximize my chances of getting the values of Y that are in my sample?” These are the “most likely”
values of the β's; these are the MLE estimates of the β's.
To find the MLE estimates of the β's, we first find an equation that gives the probability of obtaining the Y values
in our sample, given the X's in our sample, and then we find the values of the β's that maximize the probability of
getting those Y's.
Recall from the basic laws of probability that, if the probability that Y =1 for the first individual in our sample is
Prob(Y1=1), and the probability that Y = 1 for the second individual in our sample is Prob(Y2=1), then the
probability that Y = 1 for both individuals at the same time is Prob(Y1=1)* Prob(Y2=1), assuming that the two
individuals are independent of one another (which we assume). In the same way, the probability that three
individuals have Y =1 at the same time would be three Prob(Y=1)’s multiplied together, and so on, for any
number of individuals. In the same way, the probability that Y = 0 for any number of individuals would be
all of the individuals’ Prob(Y=0)’s multiplied together. Now, take all of the individuals in the sample that have Y
= 1 and multiply their Prob(Y=1)’s together, then take all of the individuals in the sample with Y = 0 and multiply
their Prob(Y=0)’s together, and then, (the grand finale) multiply the two products by each other. This would give
the probability of getting the Y’s in the sample (both the 1’s and the 0’s) that we actually have in the sample.
Recall that Prob(Y=1) is given by F(Xindex), and Prob(Y=0) is given by [1 - F(Xindex)]. Then, the probability of
getting the actual Y’s in the data sample is given by the Likelihood Function:
The Likelihood Function
$$L = \text{Prob}(Y_1=1, Y_2=1, Y_3=0, Y_4=1, Y_5=0, \text{etc.}) = \prod_{Y_i=1} F(X_{index}) \cdot \prod_{Y_i=0} [1 - F(X_{index})]$$

where the capital Greek letter pi, “Π”, means “multiply together.” So, $\prod_{Y_i=1} F(X_{index})$ means, “multiply
together all the F(Xindex) values for all of the individuals in the sample who have Y =1,” and $\prod_{Y_i=0} [1 - F(X_{index})]$
means “multiply together all of the [1 - F(Xindex)] values for all of the individuals in the sample who
have Y = 0.”
Next, because Xindex has β's inside it (recall that Xindex = β 0 + β 1X1+ β 2X2+...), we could try various values of the
β's in the Likelihood Function equation until we find the values of the β's that maximize
Prob(Y1=1,Y2=1,Y3=0,Y4=1,Y5=0, etc.). (Typically, this is done by first logging the equation, then using the
“Classical Calculus Method” to find the first order conditions (FOC’s), and then finding the values of the β's that
solve the FOC’s.) The values of the β's that come out of this process are the “Maximum Likelihood Estimates”
of the β's for the Probit model. The Maximum Likelihood Estimates are the values of the β's that maximize the
chances of getting the Y's that we actually have in our sample, given the X's that we actually have in our sample.
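The sketch below evaluates the Likelihood Function, and its logged version, for the five individuals listed in the sample table. The two candidate sets of β's are hypothetical values picked for illustration; neither is claimed to be the Maximum Likelihood Estimate. A full MLE routine would search over the β's for the largest value.

```python
import math
from statistics import NormalDist

F = NormalDist().cdf

# The five individuals listed in the sample table: (Y, X1, X2).
sample = [(1, 6, 2), (1, 1, 10), (0, 4, 12), (1, 5, 1), (0, 2, 8)]

def likelihood(b0, b1, b2):
    """Multiply F(Xindex) for individuals with Y=1 and [1 - F(Xindex)] for Y=0."""
    L = 1.0
    for y, x1, x2 in sample:
        p1 = F(b0 + b1 * x1 + b2 * x2)
        L *= p1 if y == 1 else (1 - p1)
    return L

def log_likelihood(b0, b1, b2):
    """Log of the Likelihood Function: a sum of logs instead of a product."""
    total = 0.0
    for y, x1, x2 in sample:
        p1 = F(b0 + b1 * x1 + b2 * x2)
        total += math.log(p1) if y == 1 else math.log(1 - p1)
    return total

# Two hypothetical candidate sets of betas, chosen only for illustration.
candidates = [(0.0, 0.0, 0.0), (1.0, 0.1, -0.2)]
for betas in candidates:
    print(betas, round(likelihood(*betas), 4), round(log_likelihood(*betas), 4))
```

Of the two candidates, the one with the larger likelihood makes the observed Y's more probable. Logging does not change which candidate wins, because the log function is increasing.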
Prediction with the Probit Model
Actually, this is pretty simple. Recall that the purpose of the Probit model is to predict Prob(Y=1) for various
values of the X variables. After we find the MLE estimates of the β's, we plug them into
Xindex = β 0 + β 1X1+ β 2X2+..., along with the values of the X’s that we are interested in. After plugging in, we get
a number for Xindex. Then, we take the Xindex number and plug it into the graph of the F function to find F(Xindex).
Finally, recall that Prob(Y=1) is equal to F(Xindex), so we have found our prediction for Prob(Y=1).
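The prediction steps above can be sketched as a small function. The β values here are hypothetical placeholders standing in for MLE estimates (they are not results from any dataset), and the library cdf stands in for reading the F graph.

```python
from statistics import NormalDist

# Hypothetical MLE estimates of the betas (placeholders, not results from any data).
b0_hat, b1_hat, b2_hat = -1.0, 0.3, 0.05

def predict_prob_y1(x1, x2):
    """Plug the X values into Xindex, then feed Xindex into F."""
    x_index = b0_hat + b1_hat * x1 + b2_hat * x2
    return NormalDist().cdf(x_index)

# Predictions for two hypothetical individuals who differ only in X1.
p_low = predict_prob_y1(x1=2, x2=4)
p_high = predict_prob_y1(x1=6, x2=4)
print(round(p_low, 3), round(p_high, 3))
```

Because the placeholder β1 is positive, the individual with the larger X1 gets the larger predicted Prob(Y=1).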
Marginal Effects in the Probit Model
A “Marginal Effect” is the change in Prob(Y=1) that occurs when we change the value of an X variable by one
unit (a marginal amount). Each X variable has its own, different, marginal effect on Prob(Y=1). As an example,
let’s discuss the marginal effect of X1 on Prob(Y=1). In the Probit model, the marginal effect of X1 on Prob(Y=1)
is given by:

$$\frac{\partial \, \text{Prob}(Y = 1)}{\partial X_1} = \beta_1 \cdot [\text{slope of } F(X_{index})]$$
In a similar way, the marginal effect of X2 on Prob(Y=1) would be given by the same equation, but with β1
replaced by β2.
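The slope of F at a given Xindex is the height of the standard normal bell curve at that point, so the marginal effect can be computed directly. In the sketch below, the β's and the X values are hypothetical numbers chosen for illustration; the last step cross-checks the formula by nudging X1 by a tiny amount.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Hypothetical betas and X values, chosen only for illustration.
b0, b1, b2 = -1.0, 0.3, 0.05
x1, x2 = 2.0, 4.0

x_index = b0 + b1 * x1 + b2 * x2

# The slope of F at x_index is the height of the bell curve at x_index.
slope_of_F = std_normal.pdf(x_index)
marginal_effect_x1 = b1 * slope_of_F
marginal_effect_x2 = b2 * slope_of_F

# Cross-check: nudge X1 by a small amount and see how much Prob(Y=1) changes.
step = 1e-6
numeric_effect_x1 = (std_normal.cdf(b0 + b1 * (x1 + step) + b2 * x2)
                     - std_normal.cdf(x_index)) / step

print(round(marginal_effect_x1, 4), round(numeric_effect_x1, 4))
```

Because the slope of F depends on Xindex, the marginal effect of the same X variable differs from one individual to another. This is why software typically reports a marginal effect for each observation.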
Testing Significance of the Overall Model
In OLS regression, we use the F-test to test the statistical significance of the model as a whole. In the Probit
model, we use a Likelihood Ratio Test to accomplish a similar hypothesis test. The Likelihood Ratio Test is a
test of the following hypotheses:
H0: all β's are zero (none of the X’s help to predict Prob(Y=1) )
H1: one or more of the β's is not zero (one or more of the X’s helps to predict Prob(Y=1) )
In a Likelihood Ratio Test, we compare an LRtest number derived from the sample data to an LRcritical number from
the chi-square (χ2) table. The formula for LRtest is:

$$LR_{test} = 2 \cdot \ln\left(\frac{L_U}{L_R}\right) = 2 \cdot [\ln(L_U) - \ln(L_R)]$$
where ln(LU) = log of the maximized value of the Likelihood Function
ln(LR) = n·[P·ln(P) + (1-P)·ln(1-P)], and P = the proportion of individuals in the sample with Y =1
LU is the likelihood (chances) of getting the Y values in the sample when all β's in the model take the values that
maximize L = Prob(Y1=1,Y2=1,Y3=0,Y4=1,Y5=0, etc.).
LR is the likelihood (chances) of getting the Y values in the sample (for example, Y1=1,Y2=1,Y3=0,Y4=1,Y5=0,
etc.) when all of the β's in the model (except the intercept) are restricted to the value zero.
Typically, computer software programs will provide either LU or LRtest in the output of Probit model results.
LRcritical is found in the chi-square (χ2) table using d.f. = k – 1, where k is the number of β's in the model. This
is a one-sided test.
As is typical in hypothesis testing:
If LRtest > LRcritical, then Reject H0 and Accept H1.
If LRtest < LRcritical, then Accept H0 and Reject H1.
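The test can be carried out by hand once ln(LU) is known. In the sketch below every input is hypothetical: the sample size, the number of individuals with Y = 1, the number of β's, and the value of ln(LU) are made-up numbers. The 5% significance level is also an assumption, since the handout does not fix one; 12.592 is the 5% chi-square table value for 6 degrees of freedom.

```python
import math

# Hypothetical inputs, chosen only for illustration.
n = 100            # number of individuals in the sample
n_y1 = 60          # number of individuals with Y = 1
k = 7              # number of betas, including the intercept
ln_LU = -50.0      # log of the maximized Likelihood Function (from software output)

# Restricted log-likelihood: only the intercept is in the model.
P = n_y1 / n
ln_LR = n * (P * math.log(P) + (1 - P) * math.log(1 - P))

LR_test = 2 * (ln_LU - ln_LR)

# 5% critical value from a chi-square table for d.f. = k - 1 = 6 (assumed level).
df = k - 1
LR_critical = 12.592

decision = "Reject H0" if LR_test > LR_critical else "Accept H0"
print(df, round(ln_LR, 3), round(LR_test, 3), decision)
```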
Measuring Goodness of Fit of the Probit Model
In OLS regression, we use R2 to measure “Goodness of Fit,” that is, how well the regression equation fits the data
points. There are many measures of Goodness of Fit for limited dependent variable models such as the Probit
model. We will consider here a commonly-used measure of “Goodness of Fit” for the Probit model called
McFadden’s R2. The formula for McFadden’s R2 is:
R2McFadden = 1 – (ln(LU)/ln(LR))
The value of R2McFadden lies between 0 and 1, with a value of 0 indicating that none of the X variables in the model
helps to predict Prob(Y=1). The larger the value of R2McFadden, the better the model fits the data. Sometimes
R2McFadden is called the Likelihood Ratio Index (LRI), because its formula contains a ratio of likelihoods.
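A short sketch of the McFadden calculation follows. The function name `mcfadden_r2`, the sample size, the share of individuals with Y = 1, and the value of ln(LU) are all hypothetical and chosen only for illustration.

```python
import math

def mcfadden_r2(ln_LU, ln_LR):
    """McFadden's R-square (Likelihood Ratio Index): 1 - ln(LU)/ln(LR)."""
    return 1 - ln_LU / ln_LR

# Hypothetical sample: 100 individuals, 60 percent of them with Y = 1.
n, P = 100, 0.60
ln_LR = n * (P * math.log(P) + (1 - P) * math.log(1 - P))

# If the X's add nothing, ln(LU) equals ln(LR) and the measure is 0.
print(mcfadden_r2(ln_LR, ln_LR))

# A hypothetical maximized log-likelihood that improves on ln(LR).
print(round(mcfadden_r2(-50.0, ln_LR), 3))
```

Both log-likelihoods are negative, so the ratio ln(LU)/ln(LR) falls toward 0 as the model fits better, and the measure rises toward 1.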
Probit Models in SAS
In SAS, there are several different Procs that can be used to estimate Probit models. In this handout, we use
PROC QLIM to estimate a Probit model. (The "QLIM" stands for "qualitative and limited" dependent variable
models, that is, models that have a Y variable that can take only qualitative or limited values.) In the example
program below, the data from dataset cleanNCcounties.xls are used to estimate a Probit model in which the
dependent (Y) variable is a 0/1 variable that indicates whether or not a North Carolina county voted Republican in
the year 2000 presidential election. The independent (X) variables are the percent of county residents in poverty
(PctInPov), the county unemployment rate (UnempRate), an urbanization index (UrbanIndex), the percentage of
county residents age 65+ (OldIndex), the percentage of county residents who are college graduates (EducIndex),
and an index of manufacturing employment (EmpManfIndex).
/* SOFTWARE: SAS Statistical Software program, version 9.2 */
/* AUTHOR: Dr. Chris Dumas, UNC-Wilmington, April, 2015. */
/* TITLE: Probit model with marginal effects regression */
options helpbrowser=sas;
options number pageno=1 nodate nolabel font="SAS Monospace" 10;
options leftmargin=1.00 in rightmargin=1.00 in
topmargin=1.00 in bottommargin=1.00 in;
proc import datafile="v:\ECN377\cleanNCcounties.xls"
dbms=xls out=dataset01 replace;
run;
data dataset02;
set dataset01;
UrbanIndex = UrbanPop/PopCens;
OldIndex = Age65More/PopCens;
EducIndex = CoColGrads/PopCens;
EmpManfIndex = (EmpManf2000/PopCens)*100;
run;
proc means data=dataset02 n mean max min;
var VoteRepub PctInPov UnempRate UrbanIndex OldIndex EducIndex EmpManfIndex;
run;
/* In the PROC QLIM command below, the "discrete" option tells SAS that the Y
variable has discrete values rather than limited continuous values. The "marginal"
option in the output command tells SAS to calculate and save the marginal effects
in dataset03. */
/* Note: McFadden's LRI is McFadden's R-square */
proc qlim data=dataset02;
model VoteRepub = PctInPov UnempRate UrbanIndex OldIndex EducIndex EmpManfIndex /
discrete;
output out=dataset03 marginal;
run;
/* The PROC MEANS command below uses the marginal effects that were calculated for
each observation and variable by the PROC QLIM command above to calculate the mean
marginal effect for each variable. Recall that the “Marginal Effect” of variable X
is the change in Prob(Y=1) that occurs when we change the value of X by one unit (a
marginal amount). There is a different marginal effect for each X variable. */
proc means data=dataset03 n mean;
var Meff_P2_PctInPov Meff_P2_UnempRate Meff_P2_UrbanIndex
Meff_P2_OldIndex Meff_P2_EducIndex Meff_P2_EmpManfIndex;
run;