Econ 140
Multiple Regression Applications II & III
Lecture 17
Today’s Plan
• Two topics and how they relate to multiple regression
– Multicollinearity
– Dummy variables
Multicollinearity
• Suppose we have the following regression equation:
Y = a + b1X1 + b2X2 + e
• Multicollinearity occurs when some or all of the
independent X variables are linearly related
• Different forms of multicollinearity:
– Perfect: OLS estimation will not work
– Imperfect: common in applied work; OLS still runs, but the estimates are imprecise
Multicollinearity Example
• Again we’ll use returns to education where:
– the dependent variable Y is (log) wages
– the independent variables (X’s) are age, experience, and
years of schooling
• Experience is defined as years in the labor force, or the
difference between age and years of schooling
– this can be written: Experience = Age - Years of school
– What’s the problem with this?
Multicollinearity Example (2)
• Note that we’ve expressed experience as the difference of
two of our other independent variables
– by constructing experience in this manner we create a
collinear dependence between age and experience
– the relationship between age and experience is a linear
relationship such that: as age increases, for given years
of schooling, experience also increases
• We can write our regression equation for this example:
Wages = a + b1Experience + b2Age + e
Multicollinearity Example (3)
• Recall that our estimate for b1 is
$$\hat{b}_1 = \frac{\sum x_1 y \sum x_2^2 - \sum x_1 x_2 \sum x_2 y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$
where x1 = experience and x2 = age
• The problem is that x1 and x2 are linearly related
– as we get closer to a perfect linear relationship, the denominator goes to zero
– with perfect collinearity the denominator is exactly zero, and OLS won’t work!
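A minimal sketch of why the denominator vanishes, assuming the lower-case x's are deviations from sample means (the usual convention for this formula). The data and the helper name `ols_denominator` are made up for illustration and do not come from the course spreadsheets.

```python
import numpy as np

def ols_denominator(x1, x2):
    """Denominator of the two-regressor slope formula, using deviations from means."""
    d1 = x1 - x1.mean()
    d2 = x2 - x2.mean()
    return (d1 @ d1) * (d2 @ d2) - (d1 @ d2) ** 2

# Hypothetical ages and years of schooling for six people.
age = np.array([25.0, 32.0, 41.0, 38.0, 29.0, 55.0])
school = np.array([12.0, 16.0, 12.0, 14.0, 16.0, 10.0])

# Experience = age - schooling: strongly related to age, but not perfectly.
experience = age - school
denom_imperfect = ols_denominator(experience, age)

# Experience as an exact linear function of age: perfect collinearity.
experience_exact = age - 12.0
denom_perfect = ols_denominator(experience_exact, age)

print(denom_imperfect)  # clearly positive
print(denom_perfect)    # zero up to floating-point rounding
```

In the second case the slope formula would divide by (numerically) zero, which is the sense in which OLS "won't work".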
Multicollinearity Example (4)
• Recall that the estimated variance for $\hat{b}_1$ is:
$$\hat{\sigma}^2_{\hat{b}_1} = \hat{\sigma}^2_{YX} \, \frac{\sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$
– So as x1 and x2 approach perfect collinearity, the denominator will go to zero and the estimated variance of $\hat{b}_1$ will increase
• Implications:
– with multicollinearity, you will get large standard errors
on partial coefficients
– your t-ratios, given the null hypothesis that the value of
the coefficient is zero, will be small
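The effect on the variance can be sketched numerically. The example below uses synthetic data (not L16_1.xls) and a hypothetical helper, `slope_variance_factor`, that computes the term multiplying the residual variance in the formula for the variance of the b1 estimate. As x1 is made more nearly a copy of x2, the factor grows.

```python
import numpy as np

def slope_variance_factor(x1, x2):
    """Sum(x2^2) / [Sum(x1^2) Sum(x2^2) - (Sum(x1 x2))^2], in deviations from means."""
    d1 = x1 - x1.mean()
    d2 = x2 - x2.mean()
    denom = (d1 @ d1) * (d2 @ d2) - (d1 @ d2) ** 2
    return (d2 @ d2) / denom

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
n = 200
x2 = rng.normal(size=n)
noise = rng.normal(size=n)

factors = []
for noise_scale in (1.0, 0.1, 0.01):
    # Smaller noise_scale means x1 is closer to an exact copy of x2.
    x1 = x2 + noise_scale * noise
    corr = np.corrcoef(x1, x2)[0, 1]
    factor = slope_variance_factor(x1, x2)
    factors.append(factor)
    print(f"corr(x1, x2) = {corr:.4f}   variance factor = {factor:.4f}")
```

A larger factor means a larger standard error for the same residual variance, and therefore a smaller t-ratio.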
More Multicollinearity Examples
• On L16_1.xls we have individual data on age, years of
education, weekly earnings, school age, and experience
– we can perform a regression to calculate returns given
age and experience
– we can also estimate bivariate models including only
age, only experience, and only years of schooling
– we expect that the problem is that experience is related
to age (to test this, we can regress age on experience)
• if the slope coefficient on experience is 1 and the fit is exact (R2 = 1), there is perfect multicollinearity
More Multicollinearity Examples (2)
• On L16_2.xls there’s a made-up example of perfect
multicollinearity
– OLS is unable to calculate the slope coefficients
– calculating the products and cross-products, we find
that the denominator for the slope coefficients is zero as
predicted
– If we have an applied problem with (imperfect) multicollinearity, it has these properties:
1) OLS is still unbiased
2) Large variance, standard errors, and difficult
hypothesis testing
3) Few significant coefficients but a high R2
More Multicollinearity Examples (3)
• What to do with L16_2.xls?
– There’s simply not enough variation
– We can collect more data or rethink the model
– We can test for partial correlations between the X
variables (as demonstrated on L16_1.xls).
Dummy variables
• Dummy variables allow you to include qualitative
variables (or variables that otherwise cannot be quantified)
in your regression
– examples include: gender, race, marital status, and
religion
– they also become important when looking at “regime shifts”, which may be new policy initiatives, economic change, or seasonality
• We will look at some examples:
– using female as a qualitative variable
– using marital status as a qualitative variable
– using the Phillips curve to demonstrate a regime shift
Qualitative example: female
• We’ll construct a dummy variable for i = 1, …, n:
Di = 0 if not female
Di = 1 if female
– We can do this with any qualitative variable
– Note: assigning the values for the dummy variable is an
arbitrary choice
• On L17_1.xls there is a sample from the current CPS
– the CPS codes sex as 1 (male) or 2 (female), so to create the dummy variable “female” we assign the value one to a CPS code of two and the value zero to a CPS code of one
– we can include the dummy variable in the regression
equation like we would any other variable
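The recoding step can be sketched in a few lines. The `sex` array below is made up for illustration; only the coding rule (CPS code 2 becomes female = 1, code 1 becomes female = 0) is taken from the slide.

```python
import numpy as np

# Hypothetical CPS-style sex codes: 1 = male, 2 = female.
sex = np.array([1, 2, 2, 1, 2])

# female = 1 when the CPS code is 2, and 0 otherwise.
female = (sex == 2).astype(int)

print(female)  # [0 1 1 0 1]
```

The resulting column can then be used as a regressor like any other variable.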
Qualitative example: female (2)
• We estimate the following equation:
$$\hat{Y}_i = 5.975 - 0.485 D_i$$
• Now we can ask: what are the expected earnings given that
a person is male?
E(Yi | Di = 0) = a + b(0) = a
= 5.975
• Similarly, what are the expected earnings given that a
person is female?
E(Yi | Di = 1) = a + b(1) = a + b
= 5.975 - 0.485 = 5.490
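With a single dummy regressor, the intercept is the mean of the Di = 0 group and the intercept plus the slope is the mean of the Di = 1 group. A minimal sketch with made-up earnings data (not the L17_1.xls sample):

```python
import numpy as np

# Hypothetical log weekly earnings and a female dummy.
y = np.array([6.1, 5.4, 5.6, 5.9, 5.5, 6.0])
d = np.array([0, 1, 1, 0, 1, 0])

# Design matrix with a constant and the dummy; solve by least squares.
X = np.column_stack([np.ones_like(y), d])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

mean_male = y[d == 0].mean()
mean_female = y[d == 1].mean()

print(a_hat, mean_male)            # intercept matches the male mean
print(a_hat + b_hat, mean_female)  # intercept plus slope matches the female mean
```

This is the same calculation as on the slide, where a = 5.975 is the expected value for males and a + b = 5.490 is the expected value for females.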
Qualitative example: female (4)
• We can use other variables to extend our analysis
• for example we can include age to get the equation:
Y = a + b1Di + b2Xi + e
– where Xi can be any or all relevant variables
– Di and the related coefficient b1 will indicate how much less, on average, females earn than males, holding Xi constant
– for males the intercept will be $\hat{a}$
– for females the intercept will be $\hat{a} + \hat{b}_1$
Qualitative example: female (5)
• The estimated regression found on the spreadsheet is
$$\hat{Y}_i = 5.085 - 0.656 D_i + 0.023 X_i$$
• The expected weekly earnings for men are:
E(Yi | Di = 0) = a + b2Xi
• The expected weekly earnings for women are:
E(Yi | Di = 1) = (a + b1) + b2Xi
Qualitative example: female (6)
• An important note:
• We can’t include dummy variables for both male and female
in the same regression equation
– suppose we have Y = a + b1D1i + b2D2i + e
– where: D1i = 0 if male, D1i = 1 if female
D2i = 0 if female, D2i = 1 if male
– OLS won’t be able to estimate the regression coefficients because D1i + D2i = 1 for every observation, so the two dummies are perfectly multicollinear with the intercept a
• So if a qualitative variable has m categories, you should include (m-1) dummy variables for it in the regression equation
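A small numerical check of this point, using a made-up female dummy. The design matrix with a constant plus both dummies has three columns but only rank 2, so the OLS coefficients are not uniquely determined; dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical female dummy for six observations; male is its complement.
female = np.array([0, 1, 1, 0, 1, 0])
male = 1 - female
const = np.ones(female.size)

# Constant plus both dummies: female + male equals the constant column.
X_trap = np.column_stack([const, female, male])
# Constant plus one dummy only.
X_ok = np.column_stack([const, female])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(rank_trap, X_trap.shape[1])  # rank is below the number of columns
print(rank_ok, X_ok.shape[1])      # rank equals the number of columns
```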
Example: marital status
• The spreadsheet (L17_1.xls) also estimates the following
regression equation using two distinct dummy variables:
Y = a + b1D1i + b2D2i + b3Xi + e
– where: D1i = 0 if male, D1i = 1 if female
D2i = 0 if other, D2i = 1 if married
• Using the regression equation we can create four categories:
married males, unmarried males, married females, and
unmarried females
Example: marital status (2)
• Expected earnings for unmarried males:
E(Yi | D1i = 0, D2i = 0) = a + b3Xi
• Expected earnings for unmarried females:
E(Yi | D1i = 1, D2i = 0) = (a + b1) + b3Xi
• Expected earnings for married males:
E(Yi | D1i = 0, D2i = 1) = (a + b2) + b3Xi
• Expected earnings for married females:
E(Yi | D1i = 1, D2i = 1) = (a + b1 + b2) + b3Xi
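The four intercepts can be tabulated directly from the two dummies. The coefficient values below are hypothetical placeholders chosen for illustration; they are not the estimates from L17_1.xls.

```python
# Hypothetical coefficient values, for illustration only.
a, b1, b2 = 5.0, -0.5, 0.2

def intercept(female, married):
    """Intercept implied by Y = a + b1*D1 + b2*D2 + b3*X for a given pair of dummies."""
    return a + b1 * female + b2 * married

# (D1, D2) pairs for the four categories.
groups = {
    "unmarried males": (0, 0),
    "unmarried females": (1, 0),
    "married males": (0, 1),
    "married females": (1, 1),
}

intercepts = {name: intercept(*dummies) for name, dummies in groups.items()}
for name, value in intercepts.items():
    print(f"{name}: {value:.2f}")
```

Note that in this specification the gap between married and unmarried is b2 for both sexes, and the gap between females and males is b1 for both marital states; only the intercept shifts.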
Interactive terms
• So far we’ve only used dummy variables to change the
intercept
• We can also use dummy variables to alter the partial slope
coefficients
• Let’s think about this model:
Wi = a + b1Agei + b2Marriedi + ei
– we could argue that $\hat{b}_1$, $\hat{b}_2$ and $\hat{a}$ would be different for
males and females
– we want to think about two sub-sample groups: males
and females
– we can test the hypothesis that the partial slope
coefficients will be different for these 2 groups
Interactive terms (2)
• To test our hypothesis we’ll estimate the regression equation
for the whole sample and then for the two sub-sample
groups
• We test to see if our estimated coefficients are the same
between males and females
• Our null hypothesis is:
H0: aM = aF, b1M = b1F, b2M = b2F
Interactive terms (3)
• We have an unrestricted form and a restricted form
– unrestricted: used when we estimate for the sub-sample
groups separately
– restricted: used when we estimate for the whole sample
• What type of statistic will we use to carry out this test?
– F-statistic:
$$F = \frac{(SSR_R - SSR_U)/q}{SSR_U / \left[(n_1 - k) + (n_2 - k)\right]}$$
q = k, the number of parameters in the model
n = n1 + n2, where n is the complete sample size
Interactive terms (4)
• The sum of squared residuals for the unrestricted form will
be:
SSRU = SSRM + SSRF
• L17_2.xls
– the data is sorted according to the dummy variable
“female”
– there is a second dummy variable for marital status
– there are 3 estimated regression equations, one each for
the total sample, the male sub-sample, and the female sub-sample
Interactive terms (5)
• The output allows us to gather the necessary sums of squared
residuals and sample sizes to construct the estimate:
$$F^* = \frac{(SSR_R - SSR_U)/q}{SSR_U / \left[(n_1 - k) + (n_2 - k)\right]} = \frac{\left[16.261 - (7.495 + 5.093)\right]/3}{(7.495 + 5.093)/(33 - 6)} = \frac{1.224}{0.466} = 2.626$$
– Since F0.05,3,27 = 2.96 > F*, we cannot reject the null
hypothesis that the partial slope coefficients are the same
for males and females
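The arithmetic can be reproduced in a few lines. The function name `chow_f` is a hypothetical helper; the sums of squared residuals, sample size, and critical value are the ones quoted on the slide.

```python
def chow_f(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F = [(SSR_R - SSR_U) / q] / [SSR_U / df_unrestricted]."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator

ssr_r = 16.261           # pooled (restricted) regression
ssr_u = 7.495 + 5.093    # sum of the two sub-sample SSRs (unrestricted)
k = 3                    # parameters per equation: intercept, age, married
n = 33                   # complete sample size, n1 + n2

# Degrees of freedom: (n1 - k) + (n2 - k) = n - 2k = 27.
f_star = chow_f(ssr_r, ssr_u, q=k, df_unrestricted=n - 2 * k)
critical_value = 2.96    # F(0.05; 3, 27) as quoted on the slide

print(round(f_star, 3))          # 2.626
print(f_star > critical_value)   # False: cannot reject the null
```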
Interactive terms (6)
• What if F* > F0.05,3,27? How should we read the results?
– There’s a difference between the two sub-samples and
therefore we should estimate the wage equations
separately
– Or we could interact the dummy variables with the
other variables
• To interact the dummy variables with the age and marital
status variables, we multiply the dummy variable by the
age and marital status variables to get:
Wi = a + b1Agei + b2Marriedi + b3Di + b4(Di*Agei) +
b5(Di*Marriedi) + ei
Interactive terms (7)
• Using L17_2.xls you can construct the interactive terms by
multiplying the FEMALE column by the AGE and
MARRIED columns
– one way to see whether the two sub-samples differ is to
look at the t-ratios on the interactive terms
– in this example, neither of the t-ratios is statistically
significant, so we can’t reject the null hypothesis
• We now know how to use dummy variables to indicate the
importance of sub-sample groups within the data
– dummy variables are also useful for testing for
structural breaks or regime shifts
Interactive terms (8)
• If we want to estimate the equation for the first sub-sample
(males) we take the expectation of the wage equation
where the dummy variable for female takes the value of
zero:
E(Wi | Di = 0) = a + b1Agei + b2Marriedi
• We can do the same for the second sub-sample (females):
E(Wi | Di = 1) = (a + b3) + (b1 + b4)Agei + (b2 + b5)Marriedi
• We can see that by using only one regression equation, we
have allowed the intercept and partial slope coefficients to
vary by sub-sample
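A sketch of this equivalence on synthetic data (not L17_2.xls): the pooled regression with the female dummy and both interactive terms implies the same female-equation coefficients as a separate regression on the female sub-sample. All data-generating values below are made up.

```python
import numpy as np

rng = np.random.default_rng(1)   # synthetic data, fixed seed
n = 40
female = np.repeat([0, 1], n // 2)           # sorted by the dummy, as on the spreadsheet
age = rng.uniform(20, 60, size=n)
married = rng.integers(0, 2, size=n)
wage = 5 + 0.02 * age + 0.1 * married - 0.3 * female + rng.normal(scale=0.2, size=n)

def ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Pooled regression: a, b1 (age), b2 (married), b3 (D), b4 (D*age), b5 (D*married).
X_pooled = np.column_stack([ones, age, married, female, female * age, female * married])
a, b1, b2, b3, b4, b5 = ols(X_pooled, wage)

# Separate regression on the female sub-sample only.
f = female == 1
X_f = np.column_stack([ones[f], age[f], married[f]])
coef_female = ols(X_f, wage[f])

# Female intercept and slopes implied by the pooled equation.
implied_female = np.array([a + b3, b1 + b4, b2 + b5])
print(implied_female)
print(coef_female)
```

The two printed vectors agree up to rounding, which is why the single interacted equation can replace the two sub-sample regressions.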
Phillips Curve example
• Phillips curve as an example of a regime shift.
• Data points from 1950 - 1970: There is a downward
sloping, reciprocal relationship between wage inflation and
unemployment
[Figure: wage inflation (W) plotted against unemployment (UN), 1950 - 1970]
Phillips Curve example (2)
• But if we look at data points from 1971 - 1996:
[Figure: wage inflation (W) plotted against unemployment (UN), 1971 - 1996]
• From the data we can detect an upward sloping
relationship
Phillips Curve example (3)
• There seems to be a regime shift between the two periods
– note: this is an arbitrary choice of regime shift - it was
not dictated by a specific change
• We will use the Chow Test (F-test) to test for this regime
shift
– the test will use a restricted form:
$$W_t = a + b \, \frac{1}{UN}$$
– it will also use an unrestricted form:
$$W_t = a + b_1 D + b_2 \, \frac{1}{UN} + b_3 \left( \frac{1}{UN} \right) D$$
– D is the dummy variable for the regime shift, equal to 0
for 1950-1970 and 1 for 1971-1996
Phillips Curve example (4)
• L17_3.xls estimates the restricted and unrestricted regression equations and
calculates the F-statistic for the Chow Test:
• The null hypothesis will be:
H0 : b1 = b3 = 0
– we are testing to see if the dummy variable for the
regime shift alters the intercept or the slope coefficient
• The F-statistic is (* indicates restricted)
$$F = \frac{\left( \sum \hat{e}^{*2} - \sum \hat{e}^{2} \right) / q}{\sum \hat{e}^{2} / (n - k)}$$
where q = 2
Phillips Curve example (5)
• The expectation of wage inflation for the first time period:
$$E(W \mid D = 0) = a + b_2 \, \frac{1}{UN}$$
• The expectation of wage inflation for the second time
period:
$$E(W \mid D = 1) = (a + b_1) + (b_2 + b_3) \, \frac{1}{UN}$$
• You can use the spreadsheet data to carry out these
calculations
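For readers without the spreadsheet, the same steps can be sketched on a synthetic series. Everything numerical below is made up, and k is assumed to be the number of parameters in the unrestricted equation (four here). The sketch fits the restricted and unrestricted forms, computes the Chow F-statistic with q = 2, and checks that the unrestricted SSR equals the sum of the SSRs from fitting each period separately.

```python
import numpy as np

rng = np.random.default_rng(2)          # synthetic series, not the L17_3.xls data
years = np.arange(1950, 1997)
D = (years >= 1971).astype(float)       # 0 for 1950-1970, 1 for 1971-1996
inv_un = 1.0 / rng.uniform(3.0, 10.0, size=years.size)
# Made-up coefficients that differ across the two regimes.
W = 1.0 + 12.0 * inv_un + D * (8.0 - 30.0 * inv_un) + rng.normal(scale=0.5, size=years.size)

def ssr(X, y):
    """Sum of squared residuals from a least-squares fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(years.size)
X_r = np.column_stack([ones, inv_un])                  # restricted: a, b
X_u = np.column_stack([ones, D, inv_un, D * inv_un])   # unrestricted: a, b1, b2, b3

ssr_r, ssr_u = ssr(X_r, W), ssr(X_u, W)
q, n, k = 2, years.size, X_u.shape[1]
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))

# Fitting each period separately gives the same total SSR as the unrestricted form.
early, late = D == 0, D == 1
ssr_split = ssr(X_r[early], W[early]) + ssr(X_r[late], W[late])

print(F)
print(ssr_u, ssr_split)
```

The computed F is then compared with the critical value for (q, n - k) degrees of freedom, exactly as in the male/female example.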
What we’ve learned
• Multicollinearity
– linear relationship between independent variables
– examples
• Dummy variables
– way to include qualitative variables in regressions
– examples