
EEP/IAS 118 - Introductory Applied Econometrics

Spring 2015

1 Pop Quiz: This quiz is worth 85% of your final grade.

You have 2 minutes. Go.

Sylvan Herskowitz

Section Handout 8

Imagine you are estimating a model: $cdsales = \beta_0 + \beta_1\, radioplay + \beta_2\, price + \beta_3\, genre + u$. You are interested in knowing how getting lots of radio play impacts cd sales.

1. What omitted variables may be biasing an estimation of this model?

2. What assumption would this lead us to violate?

3. In what direction do you think omitting this variable might bias the estimate?

Imagine you want to improve this model and introduce a new variable, quality, which (somehow) measures the objective quality of a given album.

1. What is likely to happen to the SSR of the new model relative to the original?

2. What is likely to happen to the $R^2$ of the new model relative to the original?

3. What TWO effects might this have on the standard error of $\hat{\beta}_1$?

4. What is the equation for $Var(\hat{\beta}_1)$?
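For reference when working on questions 3 and 4, the usual textbook expression (under the standard homoskedasticity assumptions) can be sketched as follows. The symbols $\sigma^2$, $SST_1$ and $R_1^2$ are not defined elsewhere in this handout and follow common textbook notation, so treat them as assumptions of this sketch.

```latex
\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_1 \, (1 - R_1^2)},
\qquad
SST_1 = \sum_{i=1}^{n} \left( radioplay_i - \overline{radioplay} \right)^2
```

Here $\sigma^2$ is the variance of the error term $u$ and $R_1^2$ is the $R^2$ from regressing radioplay on the other explanatory variables. Read this way, adding quality can pull the standard error in two directions: it may lower $\sigma^2$ by explaining more of cdsales, and it may raise $R_1^2$ if quality is correlated with radioplay.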

Warm-Up: Interpreting $\hat{\beta}$

Write down the two-sentence size interpretation for the underlined $\hat{\beta}$ in each of the following regressions:

$\widehat{\log(wage)} = 1.056 - .254\, female + .117\, educ$

$\ldots = 2.670 + .279\, cm + .001\, avgcons$
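For the log-level regression above, a common rule of thumb reads $100 \cdot \hat{\beta}$ as an approximate percentage change, while the exact change is $100 \cdot (e^{\hat{\beta}} - 1)$. The short sketch below compares the two for the printed coefficients. The helper name `exact_percent_change` is hypothetical, and the sketch takes no stand on which coefficient was underlined.

```python
from math import exp

def exact_percent_change(beta_hat):
    """Exact % change in y for a one-unit change in x when the model is log(y) = ... + beta*x."""
    return 100 * (exp(beta_hat) - 1)

# Coefficients taken from the log(wage) regression in the warm-up.
coefficients = {"female": -0.254, "educ": 0.117}

for name, beta_hat in coefficients.items():
    approx = 100 * beta_hat                   # rule-of-thumb reading
    exact = exact_percent_change(beta_hat)    # exact reading
    print(f"{name}: approx {approx:.1f}%, exact {exact:.1f}%")
```

The gap between the two readings grows with the size of the coefficient, which is why the approximation is usually reserved for small $\hat{\beta}$.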

Quick note on p-values and significance levels:

These four concepts—significance level, p-value, t-statistic, and critical value—are easy to mix up and are intimately related. Maybe a picture helps.
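A small numerical sketch can also tie the four concepts together. It assumes a two-sided z-test (standard normal reference distribution) purely for illustration; the function name `two_sided_test` and the statistic 2.3 are made up for this example and do not come from the handout.

```python
from statistics import NormalDist

def two_sided_test(z, alpha=0.05):
    """Return (critical value, p-value, reject?) for a two-sided z-test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Critical value c: leaves alpha/2 probability in each tail.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # p-value: probability under H0 of a statistic at least as extreme as |z|.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return critical, p_value, abs(z) > critical

c, p, reject = two_sided_test(2.3, alpha=0.05)  # 2.3 is a hypothetical statistic
print(f"c = {c:.3f}, p-value = {p:.4f}, reject H0: {reject}")

# The two decision rules agree: |z| > c exactly when p-value < alpha.
print((p < 0.05) == reject)
```

The significance level is chosen before looking at the data and fixes the critical value; the test statistic comes from the data and fixes the p-value. Comparing statistic to critical value, or p-value to significance level, gives the same decision.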


2 Hypothesis Testing and Confidence Interval Review

Just to help keep all of these tests straight:

| Test type | Test statistic | Distribution |
|---|---|---|
| Population mean, e.g. $H_0: \mu = \mu_0$ | $t = \dfrac{\bar{x} - \mu_0}{\sqrt{s^2/n}}$ | $t \sim t_{n-1}$ |
| Difference in population means, e.g. $H_0: \mu_1 - \mu_2 = \mu_0$ | $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ | $t \sim t_{n_1+n_2-2}$ |
| Population proportion, e.g. $H_0: p = p_0$ | $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$ | $z \sim N(0,1)$ |
| Difference in population proportions, e.g. $H_0: p_1 - p_2 = p_0$ | $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - p_0}{\sqrt{\hat{p}(1-\hat{p})/n_1 + \hat{p}(1-\hat{p})/n_2}}$ | $z \sim N(0,1)$ |
| True regression parameter ($k$ other vars), e.g. $H_0: \beta = \beta_0$ | $t = \dfrac{\hat{\beta} - \beta_0}{SE(\hat{\beta})}$ | $t \sim t_{n-k-1}$ |
| Multiple restrictions in regression ($q$ restrictions, $k$ total variables in UR model) | $F = \dfrac{(R^2_{UR} - R^2_{R})/q}{(1 - R^2_{UR})/(n-k-1)}$ | $F \sim F_{q,\,n-k-1}$ |
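To make the formulas above concrete, here is a minimal sketch of three of the test statistics as plain functions. The function names and all example numbers are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
from math import sqrt

def t_stat(estimate, hypothesized, std_error):
    """t = (estimate - hypothesized value) / SE(estimate)."""
    return (estimate - hypothesized) / std_error

def two_sample_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, mu0=0.0):
    """t for H0: mu1 - mu2 = mu0, using the two sample variances."""
    return ((xbar1 - xbar2) - mu0) / sqrt(s1_sq / n1 + s2_sq / n2)

def f_stat(r2_ur, r2_r, q, n, k):
    """F for q restrictions, from the unrestricted (UR) and restricted (R) R-squared."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))

# Made-up inputs, for illustration only.
t_single = t_stat(estimate=1.5, hypothesized=1.0, std_error=0.25)
t_diff = two_sample_t(xbar1=10, xbar2=8, s1_sq=4, s2_sq=9, n1=16, n2=36)
f_value = f_stat(r2_ur=0.30, r2_r=0.25, q=2, n=104, k=3)
print(round(t_single, 3), round(t_diff, 3), round(f_value, 3))
```

Each statistic is then compared with a critical value from the distribution listed in the table's last column.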

And one for confidence intervals:

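| Confidence interval for: | Interval | Standard error | Distribution of $c$ |
|---|---|---|---|
| Population mean (non-binary) | $[\bar{x} - c\,SE(\bar{x}),\ \bar{x} + c\,SE(\bar{x})]$ | $\sqrt{s^2/n}$ | $t_{n-1}$ |
| Difference in population means (non-binary) | $[\hat{D} - c\,SE(\hat{D}),\ \hat{D} + c\,SE(\hat{D})]$, with $\hat{D} = \bar{x}_1 - \bar{x}_2$ | $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ | $t_{n_1+n_2-2}$ |
| Population mean/proportion (binary) | $[\hat{p} - c\,SE(\hat{p}),\ \hat{p} + c\,SE(\hat{p})]$ | $\sqrt{\hat{p}(1-\hat{p})/n}$ | $t_{n-1}$ |
| Difference in population proportions (binary) | $[\hat{D} - c\,SE(\hat{D}),\ \hat{D} + c\,SE(\hat{D})]$, with $\hat{D} = \hat{p}_1 - \hat{p}_2$ | $\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$ | $t_{n_1+n_2-2}$ |
| Regression population parameter | $[\hat{\beta} - c\,SE(\hat{\beta}),\ \hat{\beta} + c\,SE(\hat{\beta})]$ | Stata output | $t_{n-k-1}$ |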


3 CI and HT Practice with Regressions

Consider the equation: $colGPA = \beta_0 + \beta_1\, hsGPA + \beta_2\, ACT + \beta_3\, skipped + u$, where colGPA is cumulative college grade point average, hsGPA is high school GPA, and skipped is the average lectures skipped per week. What are your expectations for the coefficients in this equation?

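1. Estimate the equation and report the results. Assume that n = 141. Test for the hypothesis $\beta_3 = 0$.

$\widehat{colGPA} = \underset{(0.332)}{1.3896} + \underset{(0.094)}{.4118}\, hsGPA + \underset{(0.011)}{.0147}\, ACT - \underset{(0.026)}{.0831}\, skipped$

(standard errors in parentheses)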

• Step 1: State the hypotheses:

$H_0$:

$H_1$:

• Step 2: Compute the test statistic: t =

• Step 3: Choose significance level and critical value:

• Step 4: Reject the null hypothesis or fail to reject the null

• Step 5: Interpret:

2. Construct a 90% confidence interval for $\beta_3$. Interpret your results.

(a) Confidence level:

(b) $\hat{\beta}_3$ & $SE(\hat{\beta}_3)$:

(c) Find $c_{90}$:

(d) Compute interval:

(e) Interpret:


3. Test for the hypothesis $\beta_1 = .4$ against the two-sided alternative at the 5% significance level.


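• Step 1: State the hypotheses:

$H_0$:

$H_1$:

• Step 2: Compute the test statistic: t =

• Step 3: Choose significance level and critical value: Using the t-table for a two-sided test at the 0.05 significance level, with 141 − 3 − 1 = 137 degrees of freedom, c ≈ 1.960 (the table's large-sample row; the exact value for 137 degrees of freedom is about 1.98).

• Step 4: Reject the null hypothesis or fail to reject the null

• Step 5: Interpret: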

4. Test for the hypothesis $\beta_1 = 1$ against $\beta_1 < 1$ at the 10% significance level.


• Step 1: State the hypotheses:

$H_0$:

$H_1$:

• Step 2: Compute the test statistic: t =

• Step 3: Choose significance level and critical value:

• Step 4: Reject the null hypothesis or fail to reject the null

• Step 5: Interpret:
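One way to check the arithmetic in the exercises above is sketched below, using the reported colGPA coefficients and standard errors. Two assumptions to flag: the critical values come from the standard normal distribution as a large-sample stand-in for the $t_{137}$ table, and the names `COEF`, `SE`, `t_stat` and `normal_critical` are hypothetical helpers, not anything from Stata.

```python
from statistics import NormalDist

N_OBS, K_VARS = 141, 3
# Estimates and standard errors as reported in the fitted colGPA equation.
COEF = {"const": 1.3896, "hsGPA": 0.4118, "ACT": 0.0147, "skipped": -0.0831}
SE = {"const": 0.332, "hsGPA": 0.094, "ACT": 0.011, "skipped": 0.026}

def t_stat(var, hypothesized):
    """t = (estimate - hypothesized value) / standard error."""
    return (COEF[var] - hypothesized) / SE[var]

def normal_critical(tail_prob):
    """Normal critical value with `tail_prob` in the upper tail (approximates t for large df)."""
    return NormalDist().inv_cdf(1 - tail_prob)

df = N_OBS - K_VARS - 1                  # 141 - 3 - 1 degrees of freedom

t_skipped = t_stat("skipped", 0.0)       # exercise 1: H0: beta_3 = 0
t_hs_04 = t_stat("hsGPA", 0.4)           # exercise 3: H0: beta_1 = .4, two-sided
t_hs_1 = t_stat("hsGPA", 1.0)            # exercise 4: H0: beta_1 = 1 vs beta_1 < 1

c_two_sided_5 = normal_critical(0.025)   # 5% two-sided
c_one_sided_10 = normal_critical(0.10)   # 10% one-sided; reject if t < -c

# Exercise 2: 90% confidence interval for beta_3 (5% in each tail).
c90 = normal_critical(0.05)
ci_skipped = (COEF["skipped"] - c90 * SE["skipped"],
              COEF["skipped"] + c90 * SE["skipped"])

print(f"df = {df}")
print(f"t(beta_3 = 0)  = {t_skipped:.3f}")
print(f"t(beta_1 = .4) = {t_hs_04:.3f}, c = {c_two_sided_5:.3f}")
print(f"t(beta_1 = 1)  = {t_hs_1:.3f}, reject if t < {-c_one_sided_10:.3f}")
print(f"90% CI for beta_3: [{ci_skipped[0]:.4f}, {ci_skipped[1]:.4f}]")
```

With 137 degrees of freedom the t critical values are slightly larger than the normal ones, so borderline cases should be checked against the t-table.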

4 Hypothesis Testing with Two Proportions

Example: Now let's consider an example from actual data for a poverty alleviation program in Mexico. In 1997, 24,059 households in rural Mexico were randomly allocated between treatment and control groups for a conditional cash transfer program called Oportunidades, designed to keep kids in school. When analyzing the results of a randomized experiment, the first step is to verify that the control group is, on average, very much like the treatment group in terms of characteristics that we observe and have data for. For example, data were collected on household assets. The data reveal that 14.47% of the 14,846 treatment households have a refrigerator, while 16.53% of the 9,213 control households have one. To check whether the same proportion of households in each group has a refrigerator, we need to perform a hypothesis test.


Call the sample proportion of households with a refrigerator in the treatment group $\hat{p}_t$ and in the control group $\hat{p}_c$, the true treatment proportion $p_t$, and the true control proportion $p_c$. Also, call the whole-sample proportion of households (in either treatment or control) with a refrigerator $\hat{p}$.

Step 1.

$H_0: p_t - p_c = D = 0$

$H_1: p_t - p_c = D \neq 0$

Step 2.

How do we compute this test statistic? We know that the null hypothesis specifies $E[\hat{p}_t - \hat{p}_c] = 0$, so what's left is the standard deviation. Whenever we're testing a difference of means from independent samples, remember the formula: $Var(\bar{x} - \bar{y}) = Var(\bar{x}) + Var(\bar{y})$.

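So applying the formula, we have that:

$Var(\hat{p}_t - \hat{p}_c) = Var(\hat{p}_t) + Var(\hat{p}_c)$

$Var(\hat{p}_t) = \dfrac{\hat{p}(1-\hat{p})}{n_t}$

$Var(\hat{p}_c) = \dfrac{\hat{p}(1-\hat{p})}{n_c}$

Which means $SD(\hat{p}_t - \hat{p}_c) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n_t} + \dfrac{\hat{p}(1-\hat{p})}{n_c}}$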

The trickiest part here is keeping track of what your null hypothesis is! Now we’re ready to calculate our z-statistic:

$\hat{D} = \hat{p}_t - \hat{p}_c = -.0206$

$\hat{p} = \dfrac{14846}{24059}(.1447) + \dfrac{9213}{24059}(.1653) = .1526$

$SD(\hat{D}) = \sqrt{\dfrac{.1526(1-.1526)}{14846} + \dfrac{.1526(1-.1526)}{9213}} = .00477$

$\Rightarrow z = \dfrac{-.0206 - 0}{.00477} = -4.32$

Step 3.

Given the hypotheses we chose, we're doing a two-sided test. Let's choose the 5% significance level, as this is the level economists most commonly use. Check the normal table to find that c = 1.96.

Step 4.

Reject or fail to reject? Since $|z| = 4.32 > 1.96 = c$, we reject the null hypothesis.

Step 5.

Interpret: At the 5% significance level, there is statistical evidence that the proportion of households with a refrigerator in the control group is not the same as the proportion of households with a refrigerator in the treatment group. What does this mean for the study?

Probably not much. In randomized experiments such as this one, many household characteristics are checked for “balance” across treatment and control.

Statistically, we expect that some of our hypothesis tests will reject the null simply because a 5% significance level indicates that 5% of the time we will reject the null even though it’s true.
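The five steps above can be retraced numerically. The sketch below uses the refrigerator figures from the example and the pooled-proportion standard deviation from Step 2; the function name `pooled_two_proportion_z` is a hypothetical helper, and the p-value line is an addition that the handout does not compute.

```python
from math import sqrt
from statistics import NormalDist

def pooled_two_proportion_z(p1, n1, p2, n2, null_diff=0.0):
    """z-test of H0: p1 - p2 = null_diff using the pooled sample proportion."""
    # Under H0 both groups share one proportion, so pool the two samples.
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    sd = sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = ((p1 - p2) - null_diff) / sd
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, sd, z, p_value

# Treatment: 14.47% of 14,846 households; control: 16.53% of 9,213 households.
pooled, sd, z, p_value = pooled_two_proportion_z(0.1447, 14846, 0.1653, 9213)
print(f"pooled p = {pooled:.4f}, SD = {sd:.5f}, z = {z:.2f}")
print(f"reject H0 at the 5% level: {p_value < 0.05}")
```

Rounded as in the text, this reproduces the pooled proportion .1526, the standard deviation .00477 and z = −4.32.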

Confidence Interval

With this same example, how would we compute a confidence interval?

The KEY difference here is that now, instead of assuming a null hypothesis to be true, we are just taking our estimated variance from what we observe in our sample(s). Therefore, instead of constructing a $\hat{p}$ that represents the mean of all observations in our sample, we allow for the means of the two samples to be different. In fact, the confidence interval is centered on our estimated difference from the sample. We then use standard errors from these sub-samples to construct the standard error of their difference:

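$\hat{D} = \hat{p}_t - \hat{p}_c$

We can use the formula $Var(\bar{x} - \bar{y}) = Var(\bar{x}) + Var(\bar{y})$ to find $Var(\hat{D}) = Var(\hat{p}_t - \hat{p}_c)$:

$Var(\hat{D}) = Var(\hat{p}_t) + Var(\hat{p}_c)$

$\widehat{Var}(\hat{p}_t) = \dfrac{s_t^2}{n_t}$

$\widehat{Var}(\hat{p}_c) = \dfrac{s_c^2}{n_c}$

For a binary variable, $s_t^2 = \hat{p}_t(1-\hat{p}_t)$ and $s_c^2 = \hat{p}_c(1-\hat{p}_c)$.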

You can then take the square root of this estimated variance to get an estimate for the estimator’s standard error. Then, we can plug in our values in order to construct the confidence interval.

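$CI_W = \left[\bar{x} - c_W \dfrac{s}{\sqrt{n}},\ \bar{x} + c_W \dfrac{s}{\sqrt{n}}\right]$

In this example, the estimated difference $\hat{p}_t - \hat{p}_c$ takes the place of $\bar{x}$, and the standard error of the difference takes the place of $s/\sqrt{n}$.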
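As a sketch of how the interval could be computed for the refrigerator example, the code below uses the unpooled standard error built from each group's own proportion. Two assumptions to note: the 95% level is chosen here only for illustration, and the critical value is taken from the standard normal distribution, which is practically identical to the t distribution with this many observations. The function name `diff_proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(p1, n1, p2, n2, level=0.95):
    """CI for p1 - p2 using each sample's own variance estimate (no pooling)."""
    diff = p1 - p2
    # Each group contributes its own estimated variance p(1 - p) / n.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical value with (1 - level) / 2 probability in each tail.
    c = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff - c * se, diff + c * se, se

# Treatment: 14.47% of 14,846 households; control: 16.53% of 9,213 households.
lower, upper, se = diff_proportion_ci(0.1447, 14846, 0.1653, 9213)
print(f"SE = {se:.5f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The interval is centered on the estimated difference of −.0206 and lies entirely below zero, which agrees with the rejection in the two-sided test.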
