EEP/IAS 118 - Introductory Applied Econometrics
Spring 2015
Sylvan Herskowitz
Section Handout 8
Imagine you are estimating a model: $cdsales = \beta_0 + \beta_1\, radioplay + \beta_2\, price + \beta_3\, genre + u$. You are interested in knowing how getting lots of radio play impacts cd sales.
1. What omitted variables may be biasing an estimation of this model?
2. What assumption would this lead us to violate?
3. What direction of bias do you think omission of this variable might be introducing?
Imagine you want to improve this model and introduce a new variable, quality , which (somehow) measures the objective quality of a given album.
1. What is likely to happen to the SSR of the new model relative to the original?
2. What is likely to happen to the $R^2$ of the new model relative to the original?
3. What TWO effects might this have on the standard error of $\hat{\beta}_1$?
4. What is the equation for $Var(\hat{\beta}_1)$?
Write down the two-sentence size interpretation for the underlined $\hat{\beta}$ in each of the following regressions:
$\widehat{\log(wage)} = 1.056 - .254\, female + .117\, educ$
${} = 2.670 + .279\, cm + .001\, avgcons$
These four concepts—significance level, p-value, t-statistic, and critical value—are easy to mix up and are intimately related. Maybe a picture helps.
Just to help keep all of these tests straight:
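Test Type: Population mean, e.g. $H_0: \mu = \mu_0$
Test Statistic: $t = \dfrac{\bar{x} - \mu_0}{\sqrt{s^2/n}}$
Distribution: $t \sim t_{n-1}$

Test Type: Difference in population means, e.g. $H_0: \mu_1 - \mu_2 = \mu_0$
Test Statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
Distribution: $t \sim t_{n_1 + n_2 - 2}$

Test Type: Population proportion, e.g. $H_0: p = p_0$
Test Statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
Distribution: $z \sim N(0, 1)$

Test Type: Difference in population proportions, e.g. $H_0: p_1 - p_2 = p_0$
Test Statistic: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - p_0}{\sqrt{\hat{p}(1 - \hat{p})/n_1 + \hat{p}(1 - \hat{p})/n_2}}$
Distribution: $z \sim N(0, 1)$

Test Type: True regression parameter ($k$ other vars), e.g. $H_0: \beta = \beta_0$
Test Statistic: $t = \dfrac{\hat{\beta} - \beta_0}{SE(\hat{\beta})}$
Distribution: $t \sim t_{n-k-1}$

Test Type: Multiple restrictions in regression ($q$ restrictions, $k$ total variables in UR model)
Test Statistic: $F = \dfrac{(R^2_{UR} - R^2_{R})/q}{(1 - R^2_{UR})/(n - k - 1)}$
Distribution: $F \sim F_{q,\, n-k-1}$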
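As a rough illustration of the F statistic for multiple restrictions, the value can be computed directly from the two $R^2$ values. This is a minimal Python sketch: the function name `f_statistic` and all numbers in the example are hypothetical, chosen only to show the arithmetic, and do not come from the handout.

```python
def f_statistic(r2_ur, r2_r, q, n, k):
    """F statistic for q restrictions, computed from R-squared values.

    r2_ur: R-squared of the unrestricted model (k regressors)
    r2_r:  R-squared of the restricted model
    q:     number of restrictions being tested
    n:     number of observations
    """
    numerator = (r2_ur - r2_r) / q
    denominator = (1 - r2_ur) / (n - k - 1)
    return numerator / denominator


# Hypothetical inputs (not from the handout).
f_example = f_statistic(r2_ur=0.30, r2_r=0.25, q=2, n=103, k=2)
print(f"F = {f_example:.3f}, compared against F with (2, 100) degrees of freedom")
```

The resulting value would then be compared with the critical value from an F table with $q$ and $n - k - 1$ degrees of freedom.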
And one for confidence intervals:
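Confidence Interval for: Population mean (non-binary)
Interval: $\left[\bar{x} - c\,SE(\bar{x}),\ \bar{x} + c\,SE(\bar{x})\right]$
Standard Error: $\sqrt{s^2/n}$
Distribution of $c$: $t_{n-1}$

Confidence Interval for: Difference in population means (non-binary)
Interval: $\left[\hat{D} - c\,SE(\hat{D}),\ \hat{D} + c\,SE(\hat{D})\right]$, where $\hat{D} = \bar{x}_1 - \bar{x}_2$
Standard Error: $\sqrt{s_1^2/n_1 + s_2^2/n_2}$
Distribution of $c$: $t_{n_1 + n_2 - 2}$

Confidence Interval for: Population mean/proportion (binary)
Interval: $\left[\hat{p} - c\,SE(\hat{p}),\ \hat{p} + c\,SE(\hat{p})\right]$
Standard Error: $\sqrt{\hat{p}(1 - \hat{p})/n}$
Distribution of $c$: $t_{n-1}$

Confidence Interval for: Difference in population proportions (binary)
Interval: $\left[\hat{D} - c\,SE(\hat{D}),\ \hat{D} + c\,SE(\hat{D})\right]$, where $\hat{D} = \hat{p}_1 - \hat{p}_2$
Standard Error: $\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}$
Distribution of $c$: $t_{n_1 + n_2 - 2}$

Confidence Interval for: Regression population parameter
Interval: $\left[\hat{\beta} - c\,SE(\hat{\beta}),\ \hat{\beta} + c\,SE(\hat{\beta})\right]$
Standard Error: Stata Output
Distribution of $c$: $t_{n-k-1}$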
Consider the equation: $colGPA = \beta_0 + \beta_1\, hsGPA + \beta_2\, ACT + \beta_3\, skipped + u$, where colGPA is cumulative college grade point average, hsGPA is high school GPA, ACT is the achievement test score, and skipped is the average lectures skipped per week. What are your expectations for the coefficients in this equation?
1. Estimate the equation and report the results. Assume that $n = 141$. Test for the hypothesis $\beta_3 = 0$.
$\widehat{colGPA} = 1.3896 + .4118\, hsGPA + .0147\, ACT - .0831\, skipped$
Standard errors: intercept $(0.332)$, hsGPA $(0.094)$, ACT $(0.011)$, skipped $(0.026)$
• Step 1: State the hypotheses:
$H_0$:
$H_1$:
• Step 2: Compute the test statistic: $t =$
• Step 3: Choose significance level and critical value:
• Step 4: Reject the null hypothesis or fail to reject the null
• Step 5: Interpret:
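One way to check your work on this test numerically is sketched below in Python. The coefficient and standard error are the handout's estimates for skipped; the 5% two-sided significance level and the use of the standard normal as a large-sample stand-in for the t distribution with 137 degrees of freedom are assumptions made for this sketch.

```python
from statistics import NormalDist

beta3_hat = -0.0831   # estimated coefficient on skipped (from the handout)
se_beta3 = 0.026      # its standard error (from the handout)
beta3_null = 0.0      # value of beta_3 under H0

# Step 2: t statistic = (estimate - hypothesized value) / standard error.
t_stat = (beta3_hat - beta3_null) / se_beta3

# Step 3 (assumption): 5% two-sided test, normal approximation to t with 137 df.
alpha = 0.05
crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 4: reject H0 when |t| exceeds the critical value.
reject = abs(t_stat) > crit
print(f"t = {t_stat:.3f}, c = {crit:.3f}, reject H0: {reject}")
```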
2. Construct a 90% confidence interval for $\beta_3$. Interpret your results.
(a) Confidence level:
(b) $\bar{x}$ & $s$ (here, $\hat{\beta}_3$ and its standard error):
(c) Find $c_{90}$:
(d) Compute interval:
(e) Interpret:
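The interval arithmetic can be sketched as follows. The estimate and standard error are the handout's values for skipped; taking $c_{90}$ from the standard normal rather than from the t distribution with 137 degrees of freedom is an assumption of this sketch, so a t-table value would give a slightly wider interval.

```python
from statistics import NormalDist

beta3_hat = -0.0831   # estimated coefficient on skipped (from the handout)
se_beta3 = 0.026      # its standard error (from the handout)
confidence = 0.90

# Critical value leaving 5% in each tail (assumption: normal approximation).
c_90 = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = beta3_hat - c_90 * se_beta3
upper = beta3_hat + c_90 * se_beta3
print(f"c_90 = {c_90:.3f}, 90% CI for beta_3: [{lower:.4f}, {upper:.4f}]")
```

If the interval excludes zero, that agrees with rejecting $\beta_3 = 0$ in a two-sided test at the 10% level.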
3. Test for the hypothesis $\beta_1 = .4$ against the two-sided alternative at the 5% significance level.
• Step 1: State the hypotheses:
$H_0$:
$H_1$:
• Step 2: Compute the test statistic: $t =$
• Step 3: Choose significance level and critical value: Using the t-table for a two-sided test at the 0.05 significance level, with $141 - 3 - 1 = 137$ degrees of freedom, $c \approx 1.96$ (the large-sample row of the table; the exact value for 137 degrees of freedom is about 1.977).
• Step 4: Reject the null hypothesis or fail to reject the null
• Step 5: Interpret:
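When the null value is not zero, the only change is what gets subtracted in the numerator. A short Python sketch using the handout's estimates for hsGPA, with the critical value taken from the standard normal as an approximation:

```python
from statistics import NormalDist

beta1_hat = 0.4118    # estimated coefficient on hsGPA (from the handout)
se_beta1 = 0.094      # its standard error (from the handout)
beta1_null = 0.4      # hypothesized value under H0

t_stat = (beta1_hat - beta1_null) / se_beta1

# Two-sided test at the 5% level (normal approximation to t with 137 df).
crit = NormalDist().inv_cdf(0.975)
reject = abs(t_stat) > crit
print(f"t = {t_stat:.3f}, c = {crit:.3f}, reject H0: {reject}")
```

The estimate sits well within one standard error of .4, so the statistic is far from either critical value.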
4. Test for the hypothesis $\beta_1 = 1$ against $\beta_1 < 1$ at the 10% significance level.
• Step 1: State the hypotheses:
$H_0$:
$H_1$:
• Step 2: Compute the test statistic: $t =$
• Step 3: Choose significance level and critical value:
• Step 4: Reject the null hypothesis or fail to reject the null
• Step 5: Interpret:
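For a one-sided alternative, the whole significance level goes into a single tail, so the rejection rule compares the signed statistic to a negative critical value. A minimal Python sketch with the handout's estimates for hsGPA (the normal approximation for the critical value is an assumption of the sketch):

```python
from statistics import NormalDist

beta1_hat = 0.4118    # estimated coefficient on hsGPA (from the handout)
se_beta1 = 0.094      # its standard error (from the handout)
beta1_null = 1.0      # hypothesized value under H0

t_stat = (beta1_hat - beta1_null) / se_beta1

# One-sided test (H1: beta_1 < 1) at the 10% level: all 10% in the left tail.
alpha = 0.10
crit = NormalDist().inv_cdf(alpha)   # negative critical value

# Reject H0 when the t statistic falls below the left-tail critical value.
reject = t_stat < crit
print(f"t = {t_stat:.3f}, c = {crit:.3f}, reject H0: {reject}")
```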
Example: Now let's consider an example from actual data for a poverty alleviation program in Mexico. In 1997, 24,059 households in rural Mexico were randomly allocated between treatment and control groups for a conditional cash transfer program called Oportunidades to keep kids in school. When analyzing the results of a randomized experiment, the first step is to verify that the control group is, on average, very much like the treatment group in terms of characteristics that we observe and have data for. For example, data were collected on household assets. The data reveal that 14.47% of the 14,846 treatment households have a refrigerator, while 16.53% of the 9,213 control households have one. To check whether the same proportion of households in each group has a refrigerator, we need to perform a hypothesis test.
Call the sample proportion of households with a refrigerator in the treatment group $\hat{p}_t$ (and in the control group $\hat{p}_c$), the true treatment proportion $p_t$, and the true control proportion $p_c$. Also, call the whole sample proportion of households (in either treatment or control) with a refrigerator $\hat{p}$.
Step 1.
$H_0: p_t - p_c = D = 0$
$H_1: p_t - p_c = D \neq 0$
Step 2.
How do we compute this test statistic? We know that the null hypothesis specifies $E[\hat{p}_t - \hat{p}_c] = p_t - p_c = 0$, so what's left is the standard deviation. Whenever we're testing a difference of means from two independent samples, remember the formula: $Var(\bar{x} - \bar{y}) = Var(\bar{x}) + Var(\bar{y})$.
So applying the formula, we have that:
$Var(\hat{p}_t - \hat{p}_c) = Var(\hat{p}_t) + Var(\hat{p}_c)$
$Var(\hat{p}_t) = \dfrac{\hat{p}(1 - \hat{p})}{n_t}$
$Var(\hat{p}_c) = \dfrac{\hat{p}(1 - \hat{p})}{n_c}$
Which means $SD(\hat{p}_t - \hat{p}_c) = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_t} + \dfrac{\hat{p}(1 - \hat{p})}{n_c}}$
The trickiest part here is keeping track of what your null hypothesis is! Now we’re ready to calculate our z-statistic:
$\hat{D} = \hat{p}_t - \hat{p}_c = -.0206$
$\hat{p} = \dfrac{14846}{24059}(.1447) + \dfrac{9213}{24059}(.1653) = .1526$
$SD(\hat{D}) = \sqrt{\dfrac{.1526(1 - .1526)}{14846} + \dfrac{.1526(1 - .1526)}{9213}} = .00477$
$\Rightarrow z = \dfrac{-.0206 - 0}{.00477} = -4.32$
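The calculation above can be reproduced with a few lines of Python. All inputs are the figures reported in the example; the variable names are chosen for this sketch.

```python
import math

n_t, n_c = 14846, 9213               # treatment and control sample sizes
p_t_hat, p_c_hat = 0.1447, 0.1653    # sample proportions with a refrigerator

# Estimated difference in proportions.
d_hat = p_t_hat - p_c_hat

# Pooled proportion: under H0 both groups share one true proportion,
# so we estimate it from the whole sample (size-weighted average).
p_pooled = (n_t * p_t_hat + n_c * p_c_hat) / (n_t + n_c)

# Standard deviation of the difference under H0.
sd_d = math.sqrt(p_pooled * (1 - p_pooled) / n_t
                 + p_pooled * (1 - p_pooled) / n_c)

# z statistic for H0: D = 0.
z_stat = (d_hat - 0) / sd_d
print(f"p_pooled = {p_pooled:.4f}, SD = {sd_d:.5f}, z = {z_stat:.2f}")
```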
Step 3.
Because our alternative hypothesis is $D \neq 0$, we're doing a two-sided test. Let's choose the 5% significance level, as this is the level economists most commonly use. Check the normal table to find that $c = 1.96$.
Step 4.
Reject the null hypothesis: $|z| = 4.32 > c = 1.96$.
Step 5.
Interpret: At the 5% significance level, there is statistical evidence that the proportion of households with a refrigerator in the control group is not the same as the proportion of households with a refrigerator in the treatment group. What does this mean for the study?
Probably not much. In randomized experiments such as this one, many household characteristics are checked for “balance” across treatment and control.
Statistically, we expect that some of our hypothesis tests will reject the null simply because a 5% significance level indicates that 5% of the time we will reject the null even though it’s true.
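To get a feel for how often this happens, the sketch below computes the expected number of false rejections and the chance of at least one when several balance checks are run. The number of characteristics checked (`num_tests = 20`) is hypothetical, and the calculation assumes the tests are independent and that every null is true.

```python
alpha = 0.05      # significance level of each test
num_tests = 20    # hypothetical number of characteristics checked for balance

# Each true null is rejected with probability alpha, so on average:
expected_false_rejections = alpha * num_tests

# Probability that at least one of the independent tests rejects by chance.
prob_at_least_one = 1 - (1 - alpha) ** num_tests

print(f"Expected false rejections: {expected_false_rejections:.1f}")
print(f"P(at least one false rejection): {prob_at_least_one:.3f}")
```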
With this same example, how would we compute a confidence interval?
The KEY difference here is that now, instead of assuming a null hypothesis to be true, we are just taking our estimated variance from what we observe in our sample(s). Therefore, instead of constructing a $\hat{p}$ that represents the mean of all observations in our sample, we allow for the means of the two samples to be different. In fact, the confidence interval is centered on our estimated difference from the sample. We then use standard errors from these sub-samples to construct the standard error of their difference:
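$\hat{D} = \hat{p}_t - \hat{p}_c$
We can use the formula $Var(\bar{x} - \bar{y}) = Var(\bar{x}) + Var(\bar{y})$ to find $Var(\hat{D}) = Var(\hat{p}_t - \hat{p}_c)$:
$Var(\hat{D}) = Var(\hat{p}_t) + Var(\hat{p}_c)$
$\widehat{Var}(\hat{p}_t) = \dfrac{s_t^2}{n_t}$
$\widehat{Var}(\hat{p}_c) = \dfrac{s_c^2}{n_c}$
where, for a binary variable such as refrigerator ownership, $s_t^2 = \hat{p}_t(1 - \hat{p}_t)$ and $s_c^2 = \hat{p}_c(1 - \hat{p}_c)$.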
You can then take the square root of this estimated variance to get an estimate for the estimator’s standard error. Then, we can plug in our values in order to construct the confidence interval.
$CI_W = \left[\bar{x} - c_W \dfrac{s}{\sqrt{n}},\ \bar{x} + c_W \dfrac{s}{\sqrt{n}}\right]$
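Plugging in the refrigerator figures gives a concrete interval. In this Python sketch the sample sizes and proportions are those reported in the example; the 95% confidence level is an assumption (the handout does not fix one here), and the critical value is taken from the standard normal, which is essentially identical to the t distribution at this sample size.

```python
import math
from statistics import NormalDist

n_t, n_c = 14846, 9213               # treatment and control sample sizes
p_t_hat, p_c_hat = 0.1447, 0.1653    # sample proportions with a refrigerator

# The interval is centered on the estimated difference.
d_hat = p_t_hat - p_c_hat

# Unpooled standard error: each group keeps its own estimated variance.
se_d = math.sqrt(p_t_hat * (1 - p_t_hat) / n_t
                 + p_c_hat * (1 - p_c_hat) / n_c)

# Assumption: 95% confidence level, critical value from the normal table.
confidence = 0.95
c_w = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = d_hat - c_w * se_d
upper = d_hat + c_w * se_d
print(f"SE = {se_d:.5f}, 95% CI for D: [{lower:.4f}, {upper:.4f}]")
```

Note that this standard error differs slightly from the .00477 used in the hypothesis test, because the test pooled the two samples under the null while the interval does not.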