
Regression Models with Dummy Variables

Is There Evidence of Wage Discrimination?
◼ Three Seton Hall professors recently learned in a court decision that they could pursue their lawsuit alleging the University paid higher salaries to younger instructors and male professors.
◼ Mary Schweitzer works in human resources at another college and has been asked by the college to test for age and gender discrimination in salaries.
◼ She gathers data on 42 professors, including the salary, experience, gender, and age of each.
17-2
Is There Evidence of Wage Discrimination?
◼ Using this data set, Mary hopes to:
1. Test whether salary differs by a fixed amount between males and females.
2. Determine whether there is evidence of age discrimination in salaries.
3. Determine if the salary difference between males and females increases with experience.
17-3
17.1 Dummy Variables
LO 17.1 Use dummy variables to capture a shift of the intercept.
◼ In previous chapters, all the variables used in regression applications have been quantitative.
◼ In empirical work it is common to have some variables that are qualitative: their values represent categories that may have no implied ordering.
◼ We can include these factors in a regression through the use of dummy variables.
◼ A dummy variable for a qualitative variable with two categories assigns a value of 1 for one of the categories and a value of 0 for the other.
17-4
LO 17.1
Variables with Two Categories
◼ For example, suppose we are interested in determining the impact of gender on salary.
◼ We might first define a dummy variable d that has the following structure:
◼ Let d = 1 if gender = "female" and d = 0 if gender = "male."
◼ This allows us to include a measure for gender in a regression model and quantify the impact of gender on salary.
17-5
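A minimal sketch of this setup in Python, assuming a hypothetical data set with Salary, Experience, and Gender columns (the variable names and values below are illustrative, not the case data):

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical records; the actual case uses 42 professors.
df = pd.DataFrame({
    "Salary":     [68, 75, 62, 80, 71, 64],                       # in $1,000s (made up)
    "Experience": [5, 12, 4, 20, 9, 6],                           # years
    "Gender":     ["female", "male", "female", "male", "male", "female"],
})

# d = 1 if gender = "female", d = 0 if gender = "male"
df["d"] = (df["Gender"] == "female").astype(int)

# Salary regressed on experience plus the gender dummy
model = smf.ols("Salary ~ Experience + d", data=df).fit()
print(model.params)   # the coefficient on d estimates the fixed salary gap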
LO 17.1
Regression with a Dummy Variable
17-6
LO 17.1
Regression with a Dummy Variable
17-7
LO 17.1
Regression with a Dummy Variable
Graphically, we can see how the dummy variable shifts the
intercept of the regression line.
17-8
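The regression equations on these slides did not survive extraction. As a generic sketch of the standard formulation (not recovered from the slides), a model with one quantitative variable x and one dummy variable d is

\hat{y} = b_0 + b_1 x + b_2 d,

so that for d = 0 the line is \hat{y} = b_0 + b_1 x, while for d = 1 it is \hat{y} = (b_0 + b_2) + b_1 x; the dummy shifts the intercept by b_2 while the slope b_1 is common to both categories.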
LO 17.1
Salaries, Gender, and Age
17-9
LO 17.1
Estimation Results
17-10
Testing the Significance of Dummy Variables
LO 17.2 Test for differences between the categories of a qualitative variable.
◼ The statistical tests discussed in Chapter 15 remain valid for dummy variables as well.
◼ We can perform a t-test for individual significance, form a confidence interval using the parameter estimate and its standard error, and conduct a partial F test for joint significance.
17-11
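A sketch of how these tests can be read from a fitted model in statsmodels, continuing the hypothetical data frame df and imports from the earlier sketch:

fit = smf.ols("Salary ~ Experience + d", data=df).fit()

# t-test for individual significance of the dummy coefficient
print(fit.tvalues["d"], fit.pvalues["d"])

# confidence interval built from the estimate and its standard error
print(fit.conf_int().loc["d"])

# partial F test; with a single dummy it reproduces the t-test,
# but the same call handles a group of dummy variables jointly
print(fit.f_test("d = 0"))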
LO 17.2
Example 17.2
17-12
Multiple Categories
17-13
Multiple Categories
17-14
LO 17.2
Multiple Categories
17-15
Avoiding the Dummy Variable Trap
◼ Given the intercept term, we exclude one of the dummy variables from the regression; the excluded variable represents the reference category against which the others are assessed.
◼ If we included as many dummy variables as there are categories, this would create perfect multicollinearity in the data, and such a model cannot be estimated.
◼ So, we include one less dummy variable than the number of categories of the qualitative variable (see the sketch below).
17-16
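In practice the reference category is often dropped automatically; a sketch using pandas with a hypothetical three-category rank variable (the category names are illustrative):

import pandas as pd

ranks = pd.DataFrame({"Rank": ["Assistant", "Associate", "Full", "Associate", "Full"]})

# Three categories -> keep only two dummies; the dropped category ("Assistant")
# becomes the reference group absorbed by the intercept.
dummies = pd.get_dummies(ranks["Rank"], drop_first=True)
print(dummies)   # columns: Associate, Full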
17.2 Interactions with Dummy Variables
LO 17.3 Use dummy variables to capture a shift of the intercept and/or slope.
17-17
LO 17.3
Modeling Interaction
17-18
LO 17.3
Shifts in the Intercept and the Slope
Graphically, we can see how both the intercept and the
slope might be impacted.
17-19
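The model behind this graph did not survive extraction. As a generic sketch of the standard formulation (not recovered from the slide), a model with a quantitative variable x, a dummy d, and their interaction xd is

\hat{y} = b_0 + b_1 x + b_2 d + b_3 (x d),

so that for d = 0 the line is \hat{y} = b_0 + b_1 x, while for d = 1 it is \hat{y} = (b_0 + b_2) + (b_1 + b_3) x; b_2 shifts the intercept and b_3 shifts the slope.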
LO 17.3
Testing for Significance
17-20
LO 17.3
Example 17.4
◼ Our introductory case is about the impact of gender on salary. A further question is: "Does additional experience earn a higher reward for one gender than for the other?"
◼ Since age was not significant, we consider three models: one with a dummy variable for gender, one with an interaction variable between gender and experience, and one with both a dummy variable and an interaction variable (see the sketch after this slide).
◼ As before, we keep experience as a quantitative explanatory variable.
17-21
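A sketch of how the three models could be specified with statsmodels formulas, reusing the hypothetical data frame df (Salary, Experience, and gender dummy d) from the earlier sketch:

import statsmodels.formula.api as smf

m1 = smf.ols("Salary ~ Experience + d", data=df).fit()                 # dummy only
m2 = smf.ols("Salary ~ Experience + Experience:d", data=df).fit()      # interaction only
m3 = smf.ols("Salary ~ Experience + d + Experience:d", data=df).fit()  # dummy and interaction

for m in (m1, m2, m3):
    print(m.params.to_dict(), round(m.rsquared_adj, 3))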
LO 17.3
Predicted Salaries
◼ The interaction term allows male professors to have a different slope coefficient than female professors.
◼ Conceptually, experience impacts the salary of each gender differently.
17-25
17.3: Binary Choice Models
LO 17.4 Use a linear probability model to estimate a binary response variable.
◼ So far, we have been considering models where dummy variables are used as explanatory variables.
◼ There are, however, many applications where the variable of interest, the response variable, is binary.
◼ The consumer choice literature has many applications, including whether to buy a house, join a health club, or go to graduate school.
17-26
LO 17.4
The Linear Probability Model
17-27
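The slide's formulation did not survive extraction. The generic linear probability model (the standard form, offered here as a sketch rather than the slide's exact notation) is

P(y = 1 \mid x_1, \dots, x_k) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,

estimated by applying OLS to the 0/1 response variable; each coefficient is read as the change in the probability that y = 1 for a one-unit increase in the corresponding explanatory variable.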
LO 17.4
Weakness of LPM
17-28
Approval of Loan Application
17-29
◼ Here 0.0188 means that a 1 percent increase in the down payment will increase the predicted probability of approval by 0.0188.
◼ Similarly, a 1 percent increase in the income-to-loan ratio will increase the probability of getting a loan by 0.0258.
◼ If DP = 30% and IL = 30%, then the predicted probability is 1.0338, which exceeds 1 and therefore does not make sense as a probability (a toy illustration follows this slide).
17-30
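A toy illustration of the same weakness: fit a 0/1 approval outcome by OLS and predict at DP = 30 and IL = 30. The data, and hence the coefficients, below are made up rather than the case's estimates; the point is only that nothing in OLS restricts the fitted value to [0, 1].

import pandas as pd
import statsmodels.formula.api as smf

loans = pd.DataFrame({
    "Approved": [0, 0, 1, 1, 1, 0, 1, 0],           # 1 = loan approved (toy data)
    "DP":       [5, 10, 25, 30, 35, 8, 28, 12],     # down payment (%)
    "IL":       [20, 25, 40, 45, 50, 22, 42, 24],   # income-to-loan ratio (%)
})

lpm = smf.ols("Approved ~ DP + IL", data=loans).fit()
print(lpm.predict(pd.DataFrame({"DP": [30], "IL": [30]})))  # not constrained to [0, 1]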
The Logit Model
◼ To address the problem that the LPM's predicted probabilities may be negative or greater than 1, we consider an alternative called the logistic model, often referred to as a logit model.
◼ A logit model uses a nonlinear regression function that ensures that the predicted probability always lies in the interval [0,1].
◼ However, interpreting the coefficients becomes more complicated, and estimation cannot be done by OLS.
17-31
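A sketch of the standard logistic specification (the exact notation on the following slide was lost):

P(y = 1) = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}

Because the expression is a positive quantity divided by one plus that same quantity, the predicted probability always lies between 0 and 1. The coefficients are estimated by maximum likelihood rather than OLS.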
LO 17.5
Logistic Regression
17-32
LO 17.5
Logit versus Linear Probability Model
17-33
LO 17.5
Example 17.6
◼ An educator wants to determine if a student's interest in science is linked with the student's GPA.
◼ She uses Minitab to estimate a logit model where a student's choice of field (1 = science, 0 = other) is predicted by GPA.
◼ With a p-value of 0.0012, GPA is indeed a significant factor in predicting whether a student chooses science.
17-34
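A sketch of how such a model might be estimated outside Minitab, using statsmodels in Python; the student data below are made up and do not reproduce the slide's p-value of 0.0012:

import pandas as pd
import statsmodels.formula.api as smf

students = pd.DataFrame({
    "Science": [0, 0, 1, 1, 0, 1, 0, 1, 1, 0],                        # 1 = chose science
    "GPA":     [2.4, 2.8, 3.6, 3.9, 2.6, 2.9, 3.7, 3.4, 3.5, 3.0],
})

logit = smf.logit("Science ~ GPA", data=students).fit()
print(logit.pvalues["GPA"])                                 # significance of GPA
print(logit.predict(pd.DataFrame({"GPA": [3.0, 3.7]})))     # predicted probabilities in (0, 1)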
LO 17.5
Predicted Field Choice
17-35
LO 17.5
Example 17.7
17-36
LO 17.5
Prediction Comparison
◼ Compared to the linear probability model, the logit model does not predict probabilities less than zero or greater than one.
◼ Therefore, whenever possible, it is generally preferable to use the logit model rather than the linear probability model.
17-37
Chapter 17 Learning Objectives (LOs)
LO 17.1: Use dummy variables to capture a shift of the
intercept.
LO 17.2: Test for differences between the categories of a
qualitative variable.
LO 17.3: Use dummy variables to capture a shift of the
intercept and/or slope.
LO 17.4: Use a linear probability model to estimate a binary
response variable.
LO 17.5: Interpret the results from a logit model.
17-38