Chapter 7: Inference in Multiple
Regression & Control Variables
Copyright © 2011 Pearson Addison-Wesley. All rights reserved.
Outline
1. Hypothesis tests and confidence intervals for one coefficient
2. Joint hypothesis tests on multiple coefficients
3. Other types of hypotheses involving multiple coefficients
4. Variables of interest, control variables, and how to decide which variables to include in a regression model
Hypothesis Tests and Confidence
Intervals for a Single Coefficient
• Hypothesis tests and confidence intervals for a single
coefficient in multiple regression follow the same logic and
recipe as for the slope coefficient in a single-regressor
model.
• (β̂1 − E(β̂1)) / √var(β̂1) is approximately distributed N(0,1) (CLT).
• Hypotheses on β1 can be tested using the usual t-statistic, and confidence intervals are constructed as {β̂1 ± 1.96×SE(β̂1)}.
• So too for β2,…, βk.
Example: The California class size data
1. TestScore = 698.9 – 2.28×STR
              (10.4)  (0.52)

2. TestScore = 686.0 – 1.10×STR – 0.650×PctEL
              (8.7)   (0.43)     (0.031)
• The coefficient on STR in (2) is the effect on TestScores of a
unit change in STR, holding constant the percentage of
English Learners in the district
• The coefficient on STR falls by about one-half (from –2.28 to –1.10)
• The 95% confidence interval for coefficient on STR in (2) is
{–1.10 ± 1.96×0.43} = (–1.95, –0.26)
• The t-statistic testing βSTR = 0 is t = –1.10/0.43 = –2.54, so
we reject the hypothesis at the 5% significance level
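A quick hand check of these numbers (a sketch in Stata, using the unrounded coefficient and standard error from the Stata output on the next slide; the printed interval differs slightly because Stata uses a t rather than a normal critical value):

* t-statistic and normal-approximation 95% CI for the coefficient on STR
display "t  = " -1.101296/.4328472
display "CI = (" -1.101296 - 1.96*.4328472 ", " -1.101296 + 1.96*.4328472 ")"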
Standard errors in STATA
reg testscr str pctel, robust;
Regression with robust standard errors            Number of obs =     420
                                                  F(  2,   417) =  223.82
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.4264
                                                  Root MSE      =  14.464
------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.101296   .4328472    -2.54   0.011     -1.95213   -.2504616
       pctel |  -.6497768   .0310318   -20.94   0.000     -.710775   -.5887786
       _cons |   686.0322   8.728224    78.60   0.000     668.8754     703.189
------------------------------------------------------------------------------
TestScore = 686.0 – 1.10×STR – 0.650×PctEL
            (8.7)   (0.43)     (0.031)

We use heteroskedasticity-robust standard errors. Stata's default is homoskedasticity-only SEs, which are wrong if the errors are heteroskedastic.
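To see what the option changes, the regression can be run both ways (a sketch, assuming the California class size data are loaded, with the variable names from the output above):

* Homoskedasticity-only SEs (Stata's default)
reg testscr str pctel
* Heteroskedasticity-robust SEs (what we report)
reg testscr str pctel, robust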
Tests of Joint Hypotheses
Let Expn = expenditures per pupil and consider the model:
TestScorei = β0 + β1STRi + β2Expni + β3PctELi + ui
By including Expn, the coefficient on STR measures the effect on TestScore of changing STR while holding Expn and PctEL fixed. Reason: a principal might want to hire more teachers (decrease STR) without increasing expenditures per pupil.
The skeptical taxpayer's hypotheses: the null is that "school resources don't matter," and the alternative is that they do:
H0: β1 = 0 and β2 = 0
vs. H1: either β1 ≠ 0 or β2 ≠ 0 or both
TestScorei = β0 + β1STRi + β2Expni + β3PctELi + ui
Tests of joint hypotheses, ctd.
• H0: β1 = 0 and β2 = 0
• vs. H1: either β1 ≠ 0 or β2 ≠ 0 or both
• A joint hypothesis specifies values for two or more coefficients, that is, it imposes restrictions on two or more coefficients.
• In general, a joint hypothesis will involve q restrictions.
Here, q = 2, and the two restrictions are β1 = 0 and β2 = 0.
• A “common sense” idea is to reject if either of the individual
t-statistics exceeds 1.96 in absolute value.
• But this “one at a time” test isn’t valid: the resulting test
rejects too often under the null hypothesis (more than 5%)!
Why can’t we just test the coefficients
one at a time?
Because the test’s rejection rate under the null isn’t 5%. To
see why, assume for simplicity that the t-statistics are
independently distributed:
t1 = (β̂1 − 0)/SE(β̂1)   and   t2 = (β̂2 − 0)/SE(β̂2)
The "one at a time" test is:
reject H0: β1 = β2 = 0 if |t1| > 1.96 and/or |t2| > 1.96
What is the probability that this "one at a time" test rejects H0 when H0 is actually true? (It should be 5%, since we are using the 5% critical value, 1.96.)
Using independence (to make the point)
Let's compute the "one at a time" test's size, that is, the probability of incorrectly rejecting the null hypothesis when it is true:

PrH0[|t1| > 1.96 and/or |t2| > 1.96]
  = 1 – PrH0[|t1| ≤ 1.96 and |t2| ≤ 1.96]
  = 1 – PrH0[|t1| ≤ 1.96] × PrH0[|t2| ≤ 1.96]      (by independence)
  = 1 – (.95)² = .0975 = 9.75% – which is not 5%!
• The size of the “common sense” test isn’t 5%!
• In general, its size depends on the correlation between t1 &
t2 (hence on the correlation between the estimated β’s)
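The 9.75% figure is just arithmetic; as a one-line check in Stata:

* size of the "one at a time" test when t1 and t2 are independent
display "rejection rate under H0 = " 1 - 0.95^2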
Two correct approaches that give the right size:
• Use a different critical value in this procedure, given by the "Bonferroni method" – see Appendix 7.1. (Rarely done.)
• Use a different test statistic designed to test both β1 and β2 at once: the F-statistic. (Usually done.)
The F-statistic
The F-statistic tests all parts of a joint hypothesis at once.
• Formula for more than two β’s: Ch18.
• Formula for two regressors (joint hypothesis β1 = β1,0, β2 = β2,0):

  F = ½ × (t1² + t2² − 2ρ̂t1,t2·t1·t2) / (1 − ρ̂²t1,t2)

  where ρ̂t1,t2 estimates the correlation between t1 and t2.
• The F-statistic is large when t1 and/or t2 is large
• The F-statistic corrects for any correlation between t1 and t2.
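A small numerical illustration of the formula (a sketch; the values of t1, t2, and the correlation below are made up, not taken from the data):

* illustrative only: F computed from two t-statistics and their estimated correlation
scalar t1  = 2.0
scalar t2  = 1.5
scalar rho = 0.3
display "F = " 0.5*(t1^2 + t2^2 - 2*rho*t1*t2)/(1 - rho^2)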
Large-sample distribution of F-statistic
Special case: t1 and t2 are independent, so ρ̂t1,t2 →p 0 in large samples. Then the formula becomes

  F = ½ × (t1² + t2² − 2ρ̂t1,t2·t1·t2) / (1 − ρ̂²t1,t2)  ≅  ½ (t1² + t2²)
• Under the null, t1 & t2 have std. normal distributions that, in
this special case, are independent.
• Recall: Fq,∞ := (z1² + … + zq²)/q, with the zi independent standard normals
• Conclude: the F-statistic is distributed F2,∞!
• This conclusion remains true even beyond this special case.
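A quick Monte Carlo check of this claim (a sketch; note that it clears whatever data are in memory):

* simulate F = (z1^2 + z2^2)/2 with independent standard normals and check that
* it exceeds the 5% critical value 3.00 about 5% of the time
clear
set seed 12345
set obs 100000
gen F = (rnormal()^2 + rnormal()^2)/2
count if F > 3
display "rejection rate = " r(N)/_N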
The Fq,∞ distribution
The Fq,∞ distribution with q degrees of freedom is defined to be the distribution of a sum of q independent squared standard normals, divided by q.
In large samples, the F-statistic is distributed Fq,∞ (Table 4, p. 753), where q = # restrictions.

Selected large-sample critical values of Fq,∞ (the case q = 2 is used above):

  q    5% critical value
  1         3.84
  2         3.00
  3         2.60
  4         2.37
  5         2.21
Computing the p-value using the F-statistic:
p-value = probability that the Fq,∞ distribution exceeds the computed F-statistic.
Implementation in STATA: use the "test" command after the regression.
Example: Test the joint hypothesis that the population coefficients on STR and expenditures per pupil (expn_stu) are both zero, against the alternative that at least one of them is nonzero.
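As a sketch, the large-sample critical value and p-value can also be computed directly (a very large denominator degrees of freedom stands in for ∞; the 5.43 is the F-statistic reported on the next slide):

* 5% critical value of F(2, infinity) and the p-value for an F-statistic of 5.43
display "5% critical value, q = 2: " invFtail(2, 1e8, .05)
display "p-value for F = 5.43:     " Ftail(2, 1e8, 5.43)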
F-test example, California class size data:
reg testscr str expn_stu pctel, r;

Regression with robust standard errors            Number of obs =     420
                                                  F(  3,   416) =  147.20
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.4366
                                                  Root MSE      =  14.353
------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.2863992   .4820728    -0.59   0.553    -1.234001     .661203
    expn_stu |   .0038679   .0015807     2.45   0.015     .0007607    .0069751
       pctel |  -.6560227   .0317844   -20.64   0.000    -.7185008   -.5935446
       _cons |   649.5779   15.45834    42.02   0.000     619.1917    679.9641
------------------------------------------------------------------------------

test str expn_stu;                  <- the test command follows the regression

 ( 1)  str = 0.0                    <- there are q = 2 restrictions being tested
 ( 2)  expn_stu = 0.0

       F(  2,   416) =    5.43      <- the 5% critical value for q = 2 is 3.00
            Prob > F =  0.0047      <- Stata computes the p-value for you
Intuition for F-statistic
There is a simple formula for the F-statistic that is valid only under homoskedasticity (so it is unreliable in practice), but it helps clarify what the F-statistic is doing.
The homoskedasticity-only F-statistic formula
Formula is in terms of the following R-squareds:
• Run two regressions, one under the null hypothesis (the
“restricted” regression) and one under the alternative
hypothesis (the “unrestricted” regression).
• Compare the fits of the regressions – the R2s – if the
“unrestricted” model fits sufficiently better, reject the null
The “restricted” and “unrestricted”
regressions
Example: are the coefficients on STR and Expn zero?
Unrestricted population regression (under H1):
TestScorei = β0 + β1STRi + β2Expni + β3PctELi + ui
Restricted population regression (that is, under H0):
TestScorei = β0 + β3PctELi + ui
• The number of restrictions under H0 is q = 2.
• The R² will be higher in the unrestricted regression (its SSR is lower, because it has more regressors)
• By how much must the R² increase for the coefficients on STR and Expn to be judged statistically significant?
Formula for homoskedasticity-only F-stat.
  F = [(R²unrestricted − R²restricted) / q] / [(1 − R²unrestricted) / (n − kunrestricted − 1)]

where:
  R²restricted   = the R² for the restricted regression
  R²unrestricted = the R² for the unrestricted regression
  q = the number of restrictions under the null
  kunrestricted = the number of regressors in the unrestricted regression.
• The formula shows that the null is less plausible (bigger F-statistic) the bigger the increase in R² from removing the null's restrictions.
Example:
Restricted regression:
  TestScore = 644.7 – 0.671×PctEL,          R²restricted = 0.4149
              (1.0)   (0.032)

Unrestricted regression:
  TestScore = 649.6 – 0.29×STR + 3.87×Expn – 0.656×PctEL
              (15.5)  (0.48)     (1.59)      (0.032)
  R²unrestricted = 0.4366,  kunrestricted = 3,  q = 2

So
  F = [(R²unrestricted − R²restricted) / q] / [(1 − R²unrestricted) / (n − kunrestricted − 1)]
    = [(.4366 − .4149) / 2] / [(1 − .4366) / (420 − 3 − 1)] = 8.01

Note: the heteroskedasticity-robust F = 5.43… (the neat homoskedasticity-only formula is unreliable)
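Plugging the reported R²'s into the formula is a one-line hand check:

* homoskedasticity-only F from the restricted and unrestricted R-squareds
display "F = " ((.4366 - .4149)/2) / ((1 - .4366)/(420 - 3 - 1))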
The homoskedasticity-only F-statistic –
summary
  F = [(R²unrestricted − R²restricted) / q] / [(1 − R²unrestricted) / (n − kunrestricted − 1)]
• The homoskedasticity-only F-statistic rejects when adding the two variables increases the R² by "enough"
• If the errors are homoskedastic, then the homoskedasticity-only F-statistic has a large-sample distribution that is Fq,∞
• But if the errors are heteroskedastic, the large-sample distribution of the homoskedasticity-only F-statistic is not Fq,∞
• Upshot: use Stata to compute the heteroskedasticity-robust F-statistic with the test command (as in the example above)
The F distribution
Your regression printouts might refer to the “F” distribution.
If the four multiple regression LS assumptions hold and if:
5. ui is homoskedastic: var(u|X1,…,Xk) does not depend on X’s
6. u1,…,un are normally distributed
then the homoskedasticity-only F-statistic has the Fq,n–k–1 distribution
even in small samples, where
q = # restrictions
k = # regressors under the alternative (unrestricted model)
The Fq,n–k–1 distribution:
• The Fq,n–k–1 distribution is tabulated on pp. 754–6
• Also, Fq,n–k–1 → Fq,∞ as n → ∞
• For q not too big and n ≥ 100, the two are essentially identical (see the quick check below)
• Stata computes p-values of F-statistics using the Fq,n–k–1 distribution
• The Fq,n–k–1 distribution appears in published empirical work
• But all of this is relevant only if assumptions 5 and 6 hold – and typically they don't! So why bother? It is a historical relic from the days when computers were unavailable and people computed probabilities by hand, so a simple formula was used despite the unrealistic assumptions.
• It does give larger critical values (conservative relative to Fq,∞)
• This is the reason some still use it when samples are small.
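The quick check referred to above (a sketch): compare the 5% critical values of Fq,n–k–1 and Fq,∞ for q = 2 and n – k – 1 = 416.

* 5% critical values: finite-sample vs. large-sample F distributions
display "F(2,416):      " invFtail(2, 416, .05)
display "F(2,infinity): " invFtail(2, 1e8, .05)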
Summary: testing joint hypotheses
• The "one at a time" approach of rejecting if either of the t-statistics exceeds 1.96 rejects more than 5% of the time under the null (its size exceeds the desired significance level)
• The heteroskedasticity-robust F-statistic is built in to STATA
(“test” command); this tests all q restrictions at once.
• For n large, the F-statistic is distributed Fq,∞
• The homoskedasticity-only F-statistic is important historically
(and thus in practice), and can help intuition, but isn’t valid
when there is heteroskedasticity
Testing Single Restrictions on Multiple
Coefficients
Yi = β0 + β1X1i + β2X2i + ui, i = 1,…,n
Consider the null and alternative hypotheses,
H0: β1 = β2
vs. H1: β1 ≠ β2
This null imposes a single restriction (q = 1) on multiple
coefficients – it is not a joint hypothesis with multiple
restrictions (compare with β1 = 0 and β2 = 0).
Testing single restrictions on multiple
coefficients, ctd.
Here are two methods for testing single restrictions on multiple coefficients:
1. Rearrange ("transform") the regression: rearrange the regressors so that the restriction becomes a restriction on a single coefficient in an equivalent regression; or,
2. Perform the test directly: Stata lets you test these restrictions directly.
Method 1: Rearrange (“transform”) the
regression
Yi = β0 + β1X1i + β2X2i + ui
H0: β1 = β2 vs. H1: β1 ≠ β2
Add and subtract β2X1i:
Yi = β0 + (β1 – β2) X1i + β2(X1i + X2i) + ui
or
  Yi = β0 + γ1X1i + β2Wi + ui
where γ1 = β1 – β2 and Wi = X1i + X2i
Rearrange the regression, ctd.
(a) Original equation:
Yi = β0 + β1X1i + β2X2i + ui
H0: β1 = β2
vs. H1: β1 ≠ β2
(b) Rearranged (“transformed”) equation:
Yi = β0 + γ1 X1i + β2Wi + ui
where γ1 = β1 – β2 and Wi = X1i + X2i
So
H0: γ1 = 0 vs. H1: γ1 ≠ 0
• These two regressions ((a) and (b)) have the same R², the same predicted values, and the same residuals.
• The testing problem is now a simple one: test whether γ1 = 0 in regression (b).
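A hedged sketch of Method 1 applied to the class size example used on the next slide (variable names as in the earlier Stata output; w is a generated regressor, so this assumes the data are loaded and no variable named w already exists):

* transformed regression: the coefficient on str is now gamma1 = beta1 - beta2,
* so the usual test on str tests H0: beta1 = beta2
gen w = str + expn_stu
reg testscr str w pctel, robust
test str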
Method 2: Perform the test directly
Yi = β0 + β1X1i + β2X2i + ui
H0: β1 = β2
vs. H1: β1 ≠ β2
Example:
TestScorei = β0 + β1STRi + β2Expni + β3PctELi + ui
In STATA, to test β1 = β2 vs. β1 ≠ β2 (two-sided):
regress testscore str expn pctel, r
test str=expn
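Alternatively (a sketch), Stata's lincom command, run after the regression above, reports the estimated difference β̂1 − β̂2 together with its standard error, t-statistic, and confidence interval:

* estimated beta_STR - beta_Expn with robust SE and 95% CI
lincom str - expn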
Confidence Sets for Multiple Coefficients
Yi = β0 + β1X1i + β2X2i + … + βkXki + ui, i = 1,…,n
What is a joint confidence set for β1 and β2?
A 95% joint confidence set is:
• A set-valued function of the data that contains the true
coefficient(s) in 95% of hypothetical repeated samples.
• Equivalently, the set of coefficient values that cannot be
rejected at the 5% significance level.
You can find a 95% confidence set as the set of (β1, β2) that
cannot be rejected at the 5% level using an F-test.
Joint confidence sets ctd.
• Let F(β1,0, β2,0) be the (heteroskedasticity-robust) F-statistic
testing the hypothesis that β1 = β1,0 and β2 = β2,0:
• 95% confidence set = {β1,0, β2,0: F(β1,0, β2,0) < 3.00}
• 3.00 is the 5% critical value of the F2,∞ distribution
• This set has coverage rate 95% because the test on which it
is based (the test it “inverts”) has size of 5%
• 5% of the time, the test incorrectly rejects the null when the
null is true, so 95% of the time it does not; therefore the
confidence set constructed as the nonrejected values
contains the true value 95% of the time (in 95% of all
samples).
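A sketch of how the set can be traced out in practice: pick a candidate point (β1,0, β2,0), test it, and compare the F-statistic to 3.00. The candidate values below are purely illustrative.

* is the point (beta_STR, beta_Expn) = (-1, 0.005) inside the 95% confidence set?
reg testscr str expn_stu pctel, robust
test (str = -1) (expn_stu = .005)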
The confidence set based on the F-statistic is an ellipse:

  {β1, β2 : F = ½ × (t1² + t2² − 2ρ̂t1,t2·t1·t2) / (1 − ρ̂²t1,t2) ≤ 3.00}

Now

  F = [1 / (2(1 − ρ̂²t1,t2))] × [ ((β̂1 − β1,0)/SE(β̂1))² + ((β̂2 − β2,0)/SE(β̂2))²
        − 2ρ̂t1,t2 × ((β̂1 − β1,0)/SE(β̂1)) × ((β̂2 − β2,0)/SE(β̂2)) ]

This is a quadratic form in β1,0 and β2,0 – thus the boundary of the set F = 3.00 is an ellipse.
Confidence set based on inverting the F-statistic
Regression Specification: variables of interest, control
variables, and conditional mean independence
We want to get an unbiased estimate of the effect on test
scores of changing class size, holding constant factors outside
the school committee’s control – such as outside learning
opportunities (museums, etc), parental involvement in
education (reading with mom at home?), etc.
If we could run an experiment, we would randomly assign
students (and teachers) to differently sized classes.
For then STRi would be independent of all the things that go
into ui, so E(ui|STRi) = 0 and the OLS slope estimator in the
regression of TestScorei on STRi would be unbiased.
But with observational data, ui depends on additional factors (museums, parental involvement, knowledge of English, etc.).
• If you can observe those factors, then include them in the regression.
• But what if you cannot (e.g., parental involvement in homework)?
• The idea is to include "control variables" – they are correlated with the causal OVs, but need not be causal themselves.
Control variables in multiple regression
Suppose X is the regressor of interest (e.g., STR): we are trying to understand how it causally influences the dependent variable.
If there is an omitted variable (OV), such as learning opportunities outside school, OV bias in the estimated coefficient on X may arise.
A control variable W is a variable correlated with a causal OV in the regression of Y on X. We do not care whether W itself has a causal effect on Y.
Examples: PctEL and LchPct, both correlated with the OV "outside learning opportunities."
The goal is to solve the OV bias problem by including, if not the OV itself, then at least a control variable.
Control variable: CA test scores
TestScore = 700.2 – 1.00×STR – 0.122×PctEL – 0.547×LchPct,   adjusted R² = 0.773
            (5.6)   (0.27)     (.033)        (.024)
• STR is the variable of interest
• PctEL probably has a direct causal effect (English tests are harder for English learners). But it is also a control variable: immigrant communities tend to have fewer outside learning opportunities, and PctEL is correlated with that causal OV.
PctEL is both a possible causal variable and a control variable.
• LchPct might have a causal effect (eating lunch helps learning);
it also is correlated with income-related outside learning
opportunities.
LchPct is both a possible causal variable and a control variable.
Control variables, ctd.
Equivalent statements of what it means for a control variable to be effective:
I. Among individuals (entities) with the same value of the control variable(s), the variable of interest is uncorrelated with the omitted determinants of Y.
II. When included in the regression, the control variable makes the error term uncorrelated with the variable of interest.
III. Holding constant the control variable(s), the variable of interest is "as if" randomly assigned.
IV. Conditional mean independence: E(ui|Xi, Wi) = E(ui|Wi).
Control variables & causality
Control variables need not be causal, and their coefficients need not have a causal interpretation. For example, consider LchPct:

TestScore = 700.2 – 1.00×STR – 0.122×PctEL – 0.547×LchPct,   adjusted R² = 0.773
            (5.6)   (0.27)     (.033)        (.024)

• Does the coefficient on LchPct have a causal interpretation? If so, then eliminating the school lunch program should boost test scores (given its negative coefficient).
• If anything, eliminating the program would lower test scores! Clearly, any causal effect LchPct has is overwhelmed by its correlation with an OV – which is exactly what we want from a control variable.
Conditional mean independence, ctd.
Consider the regression model
  Y = β0 + β1X + β2W + u
where X is the variable of interest and W is an effective control variable, so that E(ui|Xi, Wi) = E(ui|Wi).
In addition, suppose that LSA #2, #3, and #4 hold. Then:
1. β1 has a causal interpretation.
2. β̂1 is unbiased.
3. The coefficient on the control variable, β̂2, is generally biased.
The math of conditional mean
independence
1. β1 has a causal interpretation.
The expected change in Y resulting from a change in X,
holding (a single) W constant, is:
E(Y|X = x+∆x, W=w) – E(Y|X = x, W=w)
= [β0 + β1(x+∆x) + β2w + E(u|X = x+∆x, W=w)]
– [β0 + β1x + β2w + E(u|X = x, W=w)]
= β1∆x + [E(u|X = x+∆x, W=w) – E(u|X = x, W=w)]
= β1∆x
where the last line uses conditional mean independence:
E(u|X = x+∆x, W=w) = E(u|W=w) = E(u|X = x, W=w).
The math of conditional mean
independence, ctd.
Under conditional mean independence:
2. β̂1 is unbiased
3. β̂ 2 is in general biased
The math: Consider the regression model,
Y = β0 + β1X + β2W + u
where u satisfies the conditional mean independence
assumption. For convenience, suppose E(u|W) = γ0 + γ2W
(that is, that E(u|W) is linear in W).
Unbiased coeff. on var of interest
  E(u|X, W) = E(u|W) = γ0 + γ2W            (*)

Let
  v := u – E(u|X, W)                       (**)

so that E(v|X, W) = 0. Combining (*) and (**) yields

  u = E(u|X, W) + v = γ0 + γ2W + v         (***)

Now substitute (***) into the regression

  Y = β0 + β1X + β2W + u                   (+)
Substituting (***) into (+):
  Y = β0 + β1X + β2W + u                           (+)
    = β0 + β1X + β2W + (γ0 + γ2W + v)              [using (***)]
    = (β0 + γ0) + β1X + (β2 + γ2)W + v
    = δ0 + β1X + δ2W + v                           (++)

• Because E(v|X, W) = 0, equation (++) satisfies the first least squares assumption, so the OLS estimators of δ0, β1, and δ2 in (++) are unbiased.
• Because the regressors in (+) and (++) are the same, the OLS coefficients in regression (+) satisfy E(β̂1) = β1 and E(β̂2) = δ2 = β2 + γ2 ≠ β2 in general.
  E(β̂1) = β1   and   E(β̂2) = δ2 = β2 + γ2 ≠ β2

In summary, if W is an effective control variable, then:
• The OLS estimator of the effect of interest, β̂1, is unbiased.
• The OLS estimator of the coefficient on the control variable, β̂2, is biased. This bias arises because the control variable is correlated with the error term through the omitted variable.
Implications for variable selection and
“model specification”
1. Identify the variable of interest.
2. Think of causal omitted variables (OVs) that could bias the estimated coefficient of interest.
3. Include any causal OVs you can measure; otherwise, include variables correlated with them, provided conditional mean independence plausibly holds. This gives the "base" or "benchmark" model.
4. Specify various alternative models with other variables.
5. Estimate the base and alternative models ("sensitivity checks").
Model specification, ctd.
Questions to ask:
– Does a candidate variable change the coefficient of
interest (β1)?
– Is a candidate variable statistically significant?
– Use judgment, not a recipe such as maximizing R2
Digression about measures of fit…
It is tempting to try to maximize R², but the goal remains an unbiased estimator of the coefficient on the variable of interest.
• A high R2 means that the regressors explain Y’s variation
• A high R2 does not mean you have eliminated omitted
variable bias.
• A high R2 does not mean you have an unbiased estimator of
a causal effect (β1).
• A high R2 does not mean the included variables are
statistically significant.
Analysis of the Test Score Data Set
1. Identify the variable of interest: STR
2. Think of causal OVs that could bias the estimated coefficient: whether the students know English; outside learning opportunities; parental involvement; teacher quality (if teacher salary is correlated with district wealth)…
3. Include any causal OVs you can measure; otherwise, include variables correlated with them, provided conditional mean independence plausibly holds: many of the causal OVs are hard to measure, so control variables could be PctEL and measures of district wealth.
4. Specify various alternative models with other variables: it isn't clear which of the income-related variables will best control for the many omitted causal factors, such as outside learning opportunities, so the alternative specifications include regressions with PctEL, LchPct, and PctAssistance.
[Figure: test scores vs. the control variables]
5. Estimate the base and alternative models ("sensitivity checks").
Presenting regression results
• A table is a clear way to report a number of regressions
• A table of regression results should include:
– estimated coefficients and significance levels (* if 5%, ** if 1%)
– standard errors
– measures of fit
– number of observations
– relevant F-statistics, if any
– Any other pertinent information.
Discussion of specifications
• Including the control variables cuts the estimated effect of class size on test scores roughly in half, from –2.3 to about –1 in all specifications. (We had guessed that OVs caused the estimate to be overstated.) The estimate of about –1 is significant at the 5% level, so lowering STR by 1 is expected to raise test scores by about 1 point.
• The control variables greatly improve the fit: the adjusted R² jumps from .05 with STR as the sole regressor to .77 with PctEL and LchPct included as controls.
• The control variables are not always statistically significant; for example, PctPublicAssistance is not. Since including it is neither statistically significant nor changes the estimated coefficient on STR, this control seems redundant.
Summary: Multiple Regression
• Multiple regression allows you to estimate the effect on Y of
a change in X1, holding other included variables constant.
• If you can measure a variable, you can avoid omitted
variable bias from that variable by including it.
• If you can’t measure the omitted variable, you still might be
able to control for its effect by including a control variable.
• There is no simple recipe for deciding which variables belong
in a regression – you must exercise judgment.
• One approach is to specify a base model – relying on a priori reasoning – and then explore the sensitivity of the key estimate(s) in alternative specifications.