Econ 140
Multiple Regression Applications III
Lecture 18
Dummy variables
• Include qualitative indicators in the regression: e.g.
gender, race, regime shifts.
• So far, we have only seen dummy variables shift the intercept of
the regression line.
• Suppose now we wish to investigate whether the slope changes as
well as the intercept.
• This can be written as a general equation:
Wi = a + b1Agei + b2Marriedi + b3Di + b4(Di*Agei) +
b5(Di*Marriedi) + ei
• Suppose first we wish to test for the difference between
males and females.
Interactive terms
• For females and males separately, the model would be:
Wi = a + b1Agei + b2Marriedi + ei
– in so doing we argue that â, b̂1 and b̂2 would be
different for males and females
– we want to think about two sub-sample groups: males
and females
– we can test the hypothesis that the intercept and partial
slope coefficients will be different for these two groups
Interactive terms (2)
• To test our hypothesis we’ll estimate the regression equation
above (Wi = a + b1Agei + b2Marriedi + ei) for the whole
sample and then for the two sub-sample groups
• We test to see if our estimated coefficients are the same
between males and females
• Our null hypothesis is:
H0 : aM = aF, b1M = b1F, b2M = b2F
Interactive terms (3)
• We have an unrestricted form and a restricted form
– unrestricted: used when we estimate for the sub-sample
groups separately
– restricted: used when we estimate for the whole sample
• What type of statistic will we use to carry out this test?
– F-statistic:
F = [(SSRR − SSRU) / q] / [SSRU / ((n1 − k) + (n2 − k))]
where q = k, the number of parameters in the model, and
n = n1 + n2 is the complete sample size
Interactive terms (4)
• The sum of squared residuals for the unrestricted form will
be:
SSRU = SSRM + SSRF
• L17_2.xls
– the data is sorted according to the dummy variable
“female”
– there is a second dummy variable for marital status
– there are 3 estimated regression equations, one each for
the total sample, the male sub-sample, and the female sub-sample
Interactive terms (5)
• The output allows us to gather the necessary sum of squared
residuals and sample sizes to construct the test statistic:
SSRR  SSRU  q
F
SSRU n1  k   n2  k 
16.261  7.495  5.093 3

7.495  5.093 33  6 
1.224

 2.626
0.466
– Since F0.05,3,27 = 2.96 > F* = 2.626, we cannot reject the null
hypothesis that the intercept and partial slope coefficients are the
same for males and females
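• A minimal numeric sketch of this calculation in Python, assuming scipy is available; the SSR values and n = 33, k = 3 are the ones reported above:

```python
from scipy.stats import f

# Sums of squared residuals from the three regressions (values from the slide)
ssr_r = 16.261                  # restricted: whole sample
ssr_m, ssr_f = 7.495, 5.093     # unrestricted: male and female sub-samples
ssr_u = ssr_m + ssr_f

n, k = 33, 3                    # total sample size and parameters per equation
q = k                           # number of restrictions
df_denom = n - 2 * k            # (n1 - k) + (n2 - k)

f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / df_denom)
f_crit = f.ppf(0.95, q, df_denom)

print(f"F* = {f_stat:.3f}, critical F(0.05, {q}, {df_denom}) = {f_crit:.2f}")
# F* is about 2.63 < 2.96, so we cannot reject equality of coefficients
```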
Interactive terms (6)
• What if F* > F0.05,3,27? How should we read the results?
– There’s a difference between the two sub-samples and
therefore we should estimate the wage equations
separately
– Or we could interact the dummy variables with the
other variables
• To interact the dummy variables with the age and marital
status variables, we multiply the dummy variable by the
age and marital status variables to get:
Wi = a + b1Agei + b2Marriedi + b3Di + b4(Di*Agei) +
b5(Di*Marriedi) + ei
Interactive terms (7)
• Using L17_2.xls you can construct the interactive terms by
multiplying the FEMALE column by the AGE and
MARRIED columns
– one way to see whether the two sub-samples are different
is to look at the t-ratios on the interactive terms
– in this example, neither of the t-ratios is statistically
significant, so we cannot reject the null hypothesis
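• A sketch of this construction in Python with statsmodels; the file name comes from the slide, but the column names (WAGE, AGE, MARRIED, FEMALE) are assumptions:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical load of the lecture's spreadsheet; adjust the column names
# to whatever the file actually contains (reading .xls may also need xlrd).
df = pd.read_excel("L17_2.xls")

# Build the interactive terms by multiplying the dummy by the other regressors
df["FEM_AGE"] = df["FEMALE"] * df["AGE"]
df["FEM_MARRIED"] = df["FEMALE"] * df["MARRIED"]

model = smf.ols("WAGE ~ AGE + MARRIED + FEMALE + FEM_AGE + FEM_MARRIED",
                data=df).fit()
print(model.summary())   # inspect the t-ratios on FEM_AGE and FEM_MARRIED
```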
Interactive terms (8)
• If we want to estimate the equation for the first sub-sample
(males) we take the expectation of the wage equation
where the dummy variable for female takes the value of
zero:
E(Wi|Di = 0) = a + b1Agei + b2Marriedi
• We can do the same for the second sub-sample (females):
E(Wi|Di = 1) = (a + b3) + (b1 + b4)Agei + (b2 + b5)Marriedi
• We can see that by using only one regression equation, we
have allowed the intercept and partial slope coefficients to
vary by sub-sample
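• Continuing the earlier statsmodels sketch, the implied sub-sample coefficients can be read off the single interacted regression (the parameter names follow the assumed column names, so adjust them to the fitted model):

```python
# Recover the implied male and female equations from the interacted regression
b = model.params

male_intercept = b["Intercept"]
male_age, male_married = b["AGE"], b["MARRIED"]

female_intercept = b["Intercept"] + b["FEMALE"]
female_age = b["AGE"] + b["FEM_AGE"]
female_married = b["MARRIED"] + b["FEM_MARRIED"]
```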
Phillips Curve example
• Phillips curve as an example of a regime shift.
• Data points from 1950 - 1970: There is a downward
sloping, reciprocal relationship between wage inflation and
unemployment
[Figure: wage inflation (W) plotted against unemployment (UN), 1950 - 1970]
Phillips Curve example (2)
• But if we look at data points from 1971 - 1996:
[Figure: wage inflation (W) plotted against unemployment (UN), 1971 - 1996]
• From the data we can detect an upward sloping
relationship
• ALWAYS graph the data for the two main variables of
interest
Phillips Curve example (3)
• There seems to be a regime shift between the two periods
– note: this is an arbitrary choice of regime shift - it was
not dictated by a specific change
• We will use the Chow Test (F-test) to test for this regime
shift
– the test will use a restricted form:
Wt = a + b(1/UN)
– it will also use an unrestricted form:
Wt = a + b1D + b2(1/UN) + b3D(1/UN)
– D is the dummy variable for the regime shift, equal to 0
for 1950-1970 and 1 for 1971-1996
Phillips Curve example (4)
• L17_3.xls estimates the restricted and unrestricted regression
equations and calculates the F-statistic for the Chow Test:
• The null hypothesis will be:
H0 : b1 = b3 = 0
– we are testing to see if the dummy variable for the
regime shift alters the intercept or the slope coefficient
• The F-statistic is (* indicates the restricted form):
F = [(Σê*2 − Σê2) / q] / [Σê2 / (n − k)], where q = 2
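• A sketch of this Chow test in Python; the file name comes from the slide, but the column layout (YEAR, W, UN) is an assumption:

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import f

# Hypothetical layout of L17_3.xls: year, wage inflation (W), unemployment (UN)
df = pd.read_excel("L17_3.xls")
df["INV_UN"] = 1.0 / df["UN"]
df["D"] = (df["YEAR"] >= 1971).astype(int)   # regime-shift dummy
df["D_INV_UN"] = df["D"] * df["INV_UN"]

restricted = smf.ols("W ~ INV_UN", data=df).fit()
unrestricted = smf.ols("W ~ D + INV_UN + D_INV_UN", data=df).fit()

q, n, k = 2, len(df), 4          # 4 parameters in the unrestricted form
f_stat = ((restricted.ssr - unrestricted.ssr) / q) / (unrestricted.ssr / (n - k))
print(f_stat, f.ppf(0.95, q, n - k))
```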
Phillips Curve example (5)
• The expectation of wage inflation for the first time period:
E(W|D = 0) = a + b2(1/UN)
• The expectation of wage inflation for the second time
period:
E(W|D = 1) = (a + b1) + (b2 + b3)(1/UN)
• You can use the spreadsheet data to carry out these
calculations
Relaxing Assumptions
Today’s Plan
• A review of what we have learned about regression so far and
a look forward to what happens when we relax the
assumptions around the regression line
• Introduction to new concepts:
– Heteroskedasticity
– Serial correlation (also known as autocorrelation)
– Non-independence of independent variables
CLRM Revision
• Calculating the linear regression model (using OLS)
• Use of the sum of square residuals: calculate the variance
for the regression line and the mean squared deviation
• Hypothesis tests: t-tests, F-tests, χ2 test.
• Coefficient of determination (R2) and the adjusted R2.
• Modeling: use of log-linear, logs, reciprocals.
• Relationship between F and R2
• Imposing linear restrictions: e.g. H0: b2 = b3 = 0 (q = 2);
H0: a + b = 1.
• Dummy variables and interactions; Chow test.
Relaxing assumptions
• What are the assumptions we have used throughout?
• Two assumptions about the population for the bi-variate
case: 1. E(Y|X) = a + bX (the conditional expectation
function is linear); 2. V(Y|X) = σ2 (conditional variances are
constant)
• Assumptions concerning the sampling procedure (i= 1..n)
1. Values of Xi (not all equal) are prespecified; 2. Yi is
drawn from the subpopulation having X = Xi; 3. Yi ‘s are
independent.
• Consequences are: 1. E(Yi) = a + bXi; 2. V(Yi) = σ2; 3.
C(Yh, Yi) = 0 for h ≠ i
– How can we test to see if these assumptions don’t hold?
– What can we do if the assumptions don’t hold?
Homoskedasticity
• We would like our estimates to be BLUE
• We need to look out for three potential violations of the
CLRM assumptions: heteroskedasticity, autocorrelation,
and non-independence of X (or simultaneity bias).
• Heteroskedasticity: usually found in cross-section data
(and longitudinal)
• In earlier lectures, we saw that the variance of b̂ is
V(b̂) = σ2 / Σxi2, where xi is Xi measured as a deviation from its mean
– This is an example of homoskedasticity, where the
variance is constant
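• A quick simulation sketch (purely illustrative, with made-up values for a, b and σ2) confirming that under homoskedasticity the sampling variance of b̂ matches σ2/Σxi2:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
sigma2 = 4.0
a, b = 1.0, 2.0

# Simulate many homoskedastic samples and collect the OLS slope each time
slopes = []
for _ in range(5000):
    y = a + b * x + rng.normal(0, np.sqrt(sigma2), size=x.size)
    slopes.append(np.polyfit(x, y, 1)[0])

theoretical = sigma2 / np.sum((x - x.mean()) ** 2)
print(np.var(slopes), theoretical)   # the two should be close
```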
Homoskedasticity (2)
• Homoskedasticity can be illustrated like this:
[Figure: Y plotted against X, with a constant variance around the regression line at X1, X2 and X3]
Heteroskedasticity
• But we don't always have constant variance σ2
– we may have a variance that varies with each
observation, σi2
• When there is heteroskedasticity, the variance around the
regression line varies with the values of X
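• A minimal simulation sketch (made-up numbers) of heteroskedastic errors whose variance grows with X:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)

# Error variance grows with X, so each observation has its own sigma_i^2
sigma_i = 0.5 * x
y = 1.0 + 2.0 * x + rng.normal(0, sigma_i)

# The spread of the residuals around the fitted line widens as X increases
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)
print(np.var(residuals[x < 5]), np.var(residuals[x >= 5]))
```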
Heteroskedasticity (2)
• The non-constant variance around the regression line can
be drawn like this:
[Figure: Y plotted against X, with a non-constant variance around the regression line at X1, X2 and X3]
Serial (auto) correlation
• Serial correlation can be found in time series data (and
longitudinal data)
• Under serial correlation, we have covariance terms:
V(b̂) = Σ ci2 σ2 + Σh≠i ch ci σhi
– where Yi and Yh are correlated, or each Yi is not
independently drawn
– this results in nonzero covariance terms σhi
Serial (auto) correlation (2)
• Example: We can think of this using time series data such
that unemployment at time t is related to unemployment in
the previous time period t-1
• If we have a model with unemployment as the dependent
variable Yt then
– Yt and Yt-1 are related
– et and et-1 are also related
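• A short simulation sketch (assumed ρ = 0.7) of AR(1) errors, showing that et and et-1 are correlated:

```python
import numpy as np

rng = np.random.default_rng(2)
T, rho = 200, 0.7          # rho is the assumed first-order autocorrelation

# Build AR(1) errors: e_t = rho * e_{t-1} + u_t
e = np.zeros(T)
u = rng.normal(0, 1, T)
for t in range(1, T):
    e[t] = rho * e[t - 1] + u[t]

# The sample correlation between e_t and e_{t-1} should be close to rho
print(np.corrcoef(e[1:], e[:-1])[0, 1])
```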
Non-independence
• The non-independence of independent variables is the third
violation of the ordinary least squares assumptions
• Remember from the OLS derivation that we minimized the
sum of the squared residuals Σe2 given that Cov(X, e) = 0
– we needed independence between the X variable and
the error term
– if not, the values of X are not pre-specified
– without independence, the estimates are biased
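• A small simulation sketch (entirely made-up numbers) of what happens when X is not independent of the error term: the OLS slope is biased away from the true value:

```python
import numpy as np

rng = np.random.default_rng(3)
n, true_b = 1000, 2.0

# X is not independent of the error: both share a common component z
z = rng.normal(0, 1, n)
x = z + rng.normal(0, 1, n)
e = z + rng.normal(0, 1, n)        # Cov(X, e) > 0 by construction
y = 1.0 + true_b * x + e

slope = np.polyfit(x, y, 1)[0]
print(slope)   # systematically above 2.0: the OLS estimate is biased
```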
Summary
• Heteroskedasticity and serial correlation
– make the estimates inefficient
– and therefore make the estimated standard errors incorrect
• Non-independence of independent variables
– makes estimates biased
– instrumental variables and simultaneous equations are
used to deal with this third type of violation
• Starting next lecture we’ll take a more in-depth look at the
three violations of the CLRM assumptions