Chapter 3 - Review of Statistics

Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
Review of Statistics
3.2 Hypothesis Testing
Hypothesis testing means using the evidence at hand to decide whether a
certain claim is true.
A. Start by specifying the null and alternative hypotheses.
$H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$ (a two-sided test) or $H_a: \mu > \mu_0$ (a one-sided test).
We want to use evidence in a randomly selected sample of data to
either accept the null hypothesis or to reject it in favor of the
alternative hypothesis.
“Accepting” the null does not mean that it is true; rather, it is accepted
tentatively, with the recognition that it might be rejected later
based on additional evidence. Therefore, we say that we “reject” the
null hypothesis or that we “fail to reject” it.
B. Form the test statistic. Here we use the t-statistic to test hypotheses about means:
$t = \dfrac{\bar{Y} - \mu_0}{SE(\bar{Y})}$, where $SE(\bar{Y}) = s_Y/\sqrt{n}$.
C. Specify the level of significance, $\alpha$ (for example, 5%).
D. Compute the p-value using the data, and compare the p-value to
the fixed level of significance.
P-value: the probability of drawing a statistic at least as adverse to
the null as the value actually computed with your data, assuming that
the null hypothesis is true.
Calculating the p-value based on $\bar{Y}$:
To compute the p-value, you need to know the sampling
distribution of $\bar{Y}$, which is complicated if n is small.
If n is large, you can use the normal approximation (CLT):
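Under the large-sample normal approximation, the two-sided p-value takes the standard form below. Here $\bar{Y}^{act}$ (notation introduced for this sketch) is the sample average actually computed from the data, and $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$ is the standard deviation of $\bar{Y}$:

$$
\text{p-value} = \Pr_{H_0}\left[\left|\frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}}\right| > \left|\frac{\bar{Y}^{act} - \mu_0}{\sigma_{\bar{Y}}}\right|\right] \approx 2\,\Phi\!\left(-\left|\frac{\bar{Y}^{act} - \mu_0}{\sigma_{\bar{Y}}}\right|\right)
$$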
If Y is unknown, how do we calculate the p-value?
Estimator of the variance of Y (sample variance of Y):
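The usual answer is to replace $\sigma_Y$ with the sample standard deviation $s_Y$, where $s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$, so that $SE(\bar{Y}) = s_Y/\sqrt{n}$. A minimal sketch of the whole calculation, assuming a large sample so the normal approximation applies; the function name `one_sample_t_test` is our own, hypothetical choice:

```python
import math
from statistics import NormalDist

def one_sample_t_test(sample, mu0):
    """Large-sample two-sided test of H0: mu = mu0 (normal approximation)."""
    n = len(sample)
    ybar = sum(sample) / n
    # Sample variance s_Y^2 divides by n - 1, not n.
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    se = math.sqrt(s2 / n)               # SE(Ybar) = s_Y / sqrt(n)
    t = (ybar - mu0) / se                # t-statistic
    p = 2 * NormalDist().cdf(-abs(t))    # two-sided p-value from the standard normal
    return ybar, s2, se, t, p
```

For example, `one_sample_t_test([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], 4.0)` gives $\bar{Y} = 5$ and $s_Y^2 = 32/7$. These are made-up numbers, and with n = 8 the normal approximation is only illustrative.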
What is the link between the p-value and significance level?
The significance level is prespecified. For example, if the
prespecified significance level is 5%,
• you reject the null hypothesis if |t| ≥ 1.96.
• Equivalently, you reject if p ≤ 0.05, because Φ(1.96) = 0.975,
so Pr(|Z| ≥ 1.96) = 2(1 − 0.975) = 0.05.
• The p-value is sometimes called the marginal
significance level.
• Often it is better to report the p-value than simply
whether the test rejects: the p-value contains more
information than the “yes/no” statement about whether
the test rejects.
• Think of the p-value as a measure of the credibility of H0: as the p-value gets smaller, H0 becomes less and less credible.
P < .01: highly significant; reject H0.
.01 ≤ P ≤ .05: statistically significant; reject H0.
P > .05: do not reject H0.
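A quick numerical check of the link between the 1.96 cutoff and the 5% p-value rule, using the standard normal CDF from Python's `statistics.NormalDist`; the helper `two_sided_p` is a hypothetical name for this sketch:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Phi(1.96) is about 0.975, so the two tails beyond +/-1.96 hold about 5%.
print(round(std_normal.cdf(1.96), 4))
print(round(2 * (1 - std_normal.cdf(1.96)), 4))

def two_sided_p(t):
    """Large-sample two-sided p-value for an observed t-statistic."""
    return 2 * std_normal.cdf(-abs(t))

# "Reject if |t| >= 1.96" and "reject if p <= 0.05" give the same decision
# (up to the rounding of the exact cutoff 1.95996... to 1.96).
for t in [0.5, 1.5, 1.95, 1.97, 2.5, -3.0]:
    print(t, abs(t) >= 1.96, two_sided_p(t) <= 0.05)
```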
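Again, the p-value depends on the alternative hypothesis:
(1) $H_a: \mu > \mu_0 \Rightarrow \text{p-value} = \Pr(Z > z^{obs})$
(2) $H_a: \mu < \mu_0 \Rightarrow \text{p-value} = \Pr(Z < z^{obs})$
(3) $H_a: \mu \neq \mu_0 \Rightarrow \text{p-value} = \Pr(|Z| > |z^{obs}|)$
where $z^{obs}$ is the value of the test statistic computed from the data.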
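The three cases can be written as one small function. This is a sketch assuming a large sample, so the statistic is approximately standard normal under H0; the function name and the `alternative` labels are our own choices:

```python
from statistics import NormalDist

def p_value(z_obs, alternative="two-sided"):
    """Large-sample p-value for an observed z (or large-n t) statistic.

    alternative: "greater" for Ha: mu > mu0, "less" for Ha: mu < mu0,
    "two-sided" for Ha: mu != mu0 (labels chosen for this sketch).
    """
    phi = NormalDist().cdf
    if alternative == "greater":
        return 1 - phi(z_obs)          # Pr(Z > z_obs)
    if alternative == "less":
        return phi(z_obs)              # Pr(Z < z_obs)
    if alternative == "two-sided":
        return 2 * phi(-abs(z_obs))    # Pr(|Z| > |z_obs|)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

With $z^{obs} = 1.8$, the one-sided p-value for $H_a: \mu > \mu_0$ is about 0.036, half of the two-sided value of about 0.072.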
Example: A neurologist is testing the effect of a drug on response
time by injecting 100 rats with a unit dose of the drug, subjecting
each to neurological stimulus, and recording its response time. The
neurologist knows that the mean response time for rats not injected
with the drug is 1.2 seconds. The mean of the 100 injected rats’
response times is 1.0 seconds with a sample standard deviation of 0.5
seconds. Do you think that the drug has an effect on response time?
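One way to work the example, assuming a two-sided test of $H_0: \mu = 1.2$ at the 5% level and using the large-sample normal approximation (n = 100):

```python
import math
from statistics import NormalDist

n, ybar, s, mu0 = 100, 1.0, 0.5, 1.2   # data from the example (seconds)

se = s / math.sqrt(n)                   # SE(Ybar) = 0.5 / 10 = 0.05
t = (ybar - mu0) / se                   # (1.0 - 1.2) / 0.05 = -4
p = 2 * NormalDist().cdf(-abs(t))       # two-sided p-value

print(f"SE = {se:.3f}, t = {t:.2f}, p = {p:.2g}")
print("reject H0 at 5%" if p <= 0.05 else "do not reject H0 at 5%")
```

Here t = −4, far beyond ±1.96, and the p-value is well below .01, so under these assumptions the data are strong evidence that the drug changes (shortens) response time.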
Example: The average cost of repairing cars after an accident is
$1000. The insurance company wants to test whether the St. Louis
office has a mean of $1000 for car repairs. The company uses a
significance level of .05. There are 5 cars in the sample with a mean cost
of $540, and s = $299.
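A sketch of a solution, assuming a two-sided test of $H_0: \mu = 1000$ and roughly normally distributed repair costs. With only n = 5 cars the normal approximation is unreliable, so the t-statistic is compared with the Student t critical value for n − 1 = 4 degrees of freedom (2.776, from a standard t table):

```python
import math

n, ybar, s, mu0 = 5, 540.0, 299.0, 1000.0
T_CRIT_4DF = 2.776  # two-sided 5% critical value, Student t with 4 df (table value)

se = s / math.sqrt(n)       # 299 / sqrt(5), about 133.7
t = (ybar - mu0) / se       # about -3.44

print(f"SE = {se:.1f}, t = {t:.2f}, critical value = {T_CRIT_4DF}")
print("reject H0" if abs(t) > T_CRIT_4DF else "do not reject H0")
```

Since |t| ≈ 3.44 > 2.776, H0 is rejected at the 5% level: the St. Louis mean appears to differ from $1000.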
One-sided Alternatives
To test whether education improves labor-market outcomes, one could
test whether graduates earn more than non-graduates, rather than
whether their earnings simply differ.
The test statistic is calculated the same way as in the two-sided
case; the p-value uses only the tail in the direction of the alternative.
Type I error: rejecting H0 when H0 is true (a false positive). Pr(Type I error) = α, the significance level.
Type II error: failing to reject H0 when H0 is false (a false negative).
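A small Monte Carlo sketch of both error rates. The setup is an arbitrary illustration: normal data with σ = 1, n = 50, a two-sided 5% test of $H_0: \mu = 0$, and μ = 0.2 as the "H0 false" case; the helper name `rejection_rate` is hypothetical:

```python
import math
import random
from statistics import NormalDist

def rejection_rate(mu_true, mu0=0.0, n=50, reps=5000, alpha=0.05, seed=1):
    """Share of simulated samples in which the two-sided test rejects H0: mu = mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu_true, 1.0) for _ in range(n)]   # Y ~ N(mu_true, 1)
        ybar = sum(sample) / n
        s = math.sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))
        t = (ybar - mu0) / (s / math.sqrt(n))
        rejections += abs(t) >= z_crit
    return rejections / reps

type_i = rejection_rate(mu_true=0.0)        # H0 true: each rejection is a Type I error
type_ii = 1 - rejection_rate(mu_true=0.2)   # H0 false: each non-rejection is a Type II error
print(f"Type I rate ~ {type_i:.3f}, Type II rate ~ {type_ii:.3f}")
```

The simulated Type I rate should land near α = 0.05, while the Type II rate depends on how far the true mean is from $\mu_0$ and on n.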
Confidence Intervals
A 95% confidence interval for $\mu_Y$ is an interval that contains the
true value of $\mu_Y$ in 95% of repeated samples.
Because of random sampling error, we cannot learn the exact
value of the population mean, but we can construct a set of
values that contains it with a prespecified probability.
A 95% confidence interval can always be constructed as the set
of values of $\mu_Y$ not rejected by a hypothesis test with a 5%
significance level.
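A minimal sketch of the large-sample interval $\bar{Y} \pm 1.96\, SE(\bar{Y})$, applied to the rat example (n = 100, $\bar{Y}$ = 1.0, $s_Y$ = 0.5). It also checks the test-inversion idea: a value $\mu_0$ lies inside the interval exactly when the 5% test of $H_0: \mu = \mu_0$ does not reject. The function name is our own:

```python
import math
from statistics import NormalDist

def confidence_interval(ybar, s, n, level=0.95):
    """Large-sample CI for mu: Ybar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    half_width = z * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width

lo, hi = confidence_interval(ybar=1.0, s=0.5, n=100)   # rat example
print(f"95% CI: ({lo:.3f}, {hi:.3f})")

# Test inversion: mu0 is inside the CI exactly when |t| < z (H0: mu = mu0 not rejected).
z = NormalDist().inv_cdf(0.975)
for mu0 in (0.95, 1.2):
    t = (1.0 - mu0) / (0.5 / math.sqrt(100))
    print(mu0, lo < mu0 < hi, abs(t) < z)
```

The interval is roughly (0.902, 1.098). It excludes 1.2, matching the rejection of $H_0: \mu = 1.2$ in the rat example.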
Comparing the Means of Two Populations (Large Sample):
Do men and women in the same occupation in the same company
earn the same wage on average? The sample size is assumed to be
large, so we’ll use the normal distribution.
How to set up the problem: Let $\bar{X}_m$ be the average male wage in the sample and
$\bar{X}_f$ the average female wage, with population means $\mu_m$ and $\mu_f$.
$\sigma_m$ is the standard deviation for men and $\sigma_f$ the standard deviation for women.
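Estimate of the standard error:
$SE(\bar{X}_m - \bar{X}_f) = \sqrt{\dfrac{s_m^2}{n_m} + \dfrac{s_f^2}{n_f}}$
where $s_m^2$ and $s_f^2$ are the sample variances and $n_m$ and $n_f$ the sample sizes.
A hypothesis test: $H_0: \mu_m = \mu_f$ (that is, $\mu_m - \mu_f = 0$), with test statistic
$t = \dfrac{(\bar{X}_m - \bar{X}_f) - (\mu_m - \mu_f)}{SE(\bar{X}_m - \bar{X}_f)}$
A confidence interval:
$\mu_m - \mu_f = (\bar{X}_m - \bar{X}_f) \pm z_{\alpha/2}\, SE(\bar{X}_m - \bar{X}_f)$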
A Numerical Example: Test for discrimination in the firm: 100 men and
64 women, where the men's mean salary is 3100 (standard deviation 200)
and the women's mean salary is 2900 (standard deviation 320).
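A worked version of the numerical example, assuming the large-sample normal approximation, a two-sided test of $H_0: \mu_m - \mu_f = 0$, and treating the reported standard deviations as sample standard deviations:

```python
import math
from statistics import NormalDist

# Summary statistics from the example
n_m, mean_m, sd_m = 100, 3100.0, 200.0
n_f, mean_f, sd_f = 64, 2900.0, 320.0

diff = mean_m - mean_f                           # 200
se = math.sqrt(sd_m**2 / n_m + sd_f**2 / n_f)    # sqrt(400 + 1600), about 44.72
t = (diff - 0.0) / se                            # H0: mu_m - mu_f = 0
p = 2 * NormalDist().cdf(-abs(t))                # two-sided, large-sample p-value
z = NormalDist().inv_cdf(0.975)                  # about 1.96
ci = (diff - z * se, diff + z * se)

print(f"diff = {diff:.0f}, SE = {se:.2f}, t = {t:.2f}, p = {p:.1e}")
print(f"95% CI for mu_m - mu_f: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Here t ≈ 4.47, well beyond 1.96, and the 95% interval (roughly 112 to 288) excludes zero, so the hypothesis of equal mean wages is rejected.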
Using the t-statistic when the sample size is small
Using the standard normal distribution together with the t-statistic for hypothesis testing is justified only when the sample size
is large.
When the sample size is small, the standard normal distribution can
provide a very poor approximation to the distribution of the t-statistic. However, if the population distribution is normal,
the exact distribution of the t-statistic is a Student t
distribution with n − 1 degrees of freedom.
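A simulation sketch of why this matters, assuming normal data with n = 5 (an arbitrary choice) and H0 true. Using the normal cutoff 1.96 rejects a true H0 far more often than 5%, while the Student t cutoff for 4 degrees of freedom (2.776, from a standard t table) keeps the rejection rate near 5%. The helper name `size_of_test` is hypothetical:

```python
import math
import random

def size_of_test(crit, n=5, reps=20000, seed=2):
    """Rejection rate of the two-sided test of H0: mu = 0 when H0 is true and Y ~ N(0, 1)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ybar = sum(sample) / n
        s = math.sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))
        rejections += abs(ybar / (s / math.sqrt(n))) >= crit
    return rejections / reps

size_z = size_of_test(1.96)    # normal critical value
size_t = size_of_test(2.776)   # Student t critical value, 4 df
print(f"cutoff 1.96: {size_z:.3f}   cutoff 2.776: {size_t:.3f}")
```

With these settings the 1.96 cutoff rejects roughly 12% of the time instead of 5%.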
The t-statistic testing the difference of two means does not have a
Student t distribution even if the population distribution of Y is
normal. This is because the variance estimator used to compute the
standard error does not produce a denominator with a chi-squared distribution.
A modified version of the difference-in-means t-statistic is
given in equation 3.23 of the book. However, it is valid only
when the two groups have exactly the same variance or the same number
of observations.
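For comparison, here is a sketch of the common pooled standard error, which we assume is the kind of modification equation 3.23 refers to: $s_{pooled}^2 = \frac{(n_m - 1)s_m^2 + (n_f - 1)s_f^2}{n_m + n_f - 2}$ and $SE_{pooled} = s_{pooled}\sqrt{1/n_m + 1/n_f}$. If both populations are normal with a common variance, the pooled t-statistic has a Student t distribution with $n_m + n_f - 2$ degrees of freedom. The function names are hypothetical:

```python
import math

def se_unpooled(sd1, n1, sd2, n2):
    """SE(X1bar - X2bar) allowing different variances in the two groups."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

def se_pooled(sd1, n1, sd2, n2):
    """Pooled SE, which assumes the two groups share one common variance."""
    s2_pooled = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return math.sqrt(s2_pooled * (1 / n1 + 1 / n2))

# Wage example: unequal variances and unequal sample sizes, so the two SEs differ.
print(round(se_unpooled(200, 100, 320, 64), 2), round(se_pooled(200, 100, 320, 64), 2))

# With equal sample sizes the two formulas give the same number.
print(round(se_unpooled(200, 80, 320, 80), 6), round(se_pooled(200, 80, 320, 80), 6))
```

In the wage example the pooled SE (about 40.58) understates the unpooled one (about 44.72), while with equal group sizes the two coincide, which is consistent with the conditions stated above.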