# Lecture 13: Hypothesis testing in linear regression models BUEC 333 ```Lecture 13: Hypothesis testing in linear regression
models
BUEC 333
Professor David Jacks
1
Previously, considered the sampling distribution
for the least squares estimator.
Specifically, saw that in the regression model,
Yi = β0 + β1X1i + β2X2i +...+ βkXki + εi with
εi ~ N(0,σ²), the OLS estimator β̂j has a normal
sampling distribution with mean βj and variance Var[β̂j].
This follows from the fact that a linear function of a
normally distributed variable is itself normally
distributed (from εi to Yi to β̂j).
Sampling and hypothesis testing
2
We also revisited the Central Limit Theorem.
Main implication from the CLT: as the sample size
gets larger (technically, as n → ∞), the sampling
distribution of the least squares estimator is well
approximated by a normal distribution.
So even if the errors are not normal, the sampling
distribution of the beta-hats is approximately
normal in large samples.
Sampling and hypothesis testing
3
The point was to get a sampling distribution for
the OLS estimator to do some hypothesis testing.
Here, hypothesis testing is the same as always:
1.) we formulate the null and alternative
hypotheses that we are interested in and
choose a level of significance α for the test.
2.) we construct a test statistic that has a known
sampling distribution
Sampling and hypothesis testing
4
3.) we compare the value of the test statistic to a
critical value (associated with our α) from
the sampling distribution of the test statistic.
4.) if the test statistic is larger than the critical
value, it is unlikely that the null is true and
we reject it (with Type I error probability α).
Sampling and hypothesis testing
5
Consider the case of IQ tests which are
constructed so the average score for adults is 100.
We would like to know whether university students are
smarter than average.
In this example, we can take a sample of
undergrads at SFU (say, n=6) and try to determine
if the average of IQ scores for all students at the
university is higher than 100 (above average).
How this works
6
The following scores are obtained for the 6
students in the sample: 110, 118, 110, 122, 110,
and 150.
It can easily be shown that the sample mean is 120
and the variance is 201.33 when dividing by n = 6 (or
241.6 when dividing by n − 1 = 5)…so it seems
to be the case that SFU students are smarter.
But is this finding likely to hold in repeated
samples?
How this works
7
Enter the t-test:
1.) We form our null and alternative hypotheses as
follows, H0 : μ ≤ 100 versus H1 : μ > 100,
and specify a significance level of 5%.
2.) If H0 is true, the t-statistic
t = (sample mean − 100) / (s / √n),
where s is the sample standard deviation,
is distributed as tn-1 = t5.
How this works
8
Think about what this means: if we repeatedly sample
6 students and calculate a t-statistic each time, the
histogram of those values sketches t’s sampling distribution.
If the population mean truly is 100, the most likely
value of t should be zero if our sample of 6
students is truly representative of the population.
How this works
9
How this works
10
3.) & 4.) But we need to be more precise than
“our sample does not look like it is representative
of the population”.
That is, our computed t-statistic is large in
magnitude, but how large is large enough?
We can calculate how often a computed sample t
will be far from the population mean of t = 0
based on our knowledge of the distribution.
How this works
11
The critical values of the t-distribution tell us
exactly how often we should find computed t-statistics
of large magnitude.
But we also need to know the degrees of freedom
(here, df = 5) to help us make this determination.
Why? If we sample only a few students, our
computed t-statistics are more likely to be large in
magnitude purely by chance.
How this works
12
Under a one-tailed test with significance level of
5% and 5 degrees of freedom, you should find a
critical value of t = 2.02.
Is our computed t-statistic larger than this critical
value? Obviously yes…
Thus, at the 5% level of significance, we reject the
null hypothesis that the population mean of IQ scores
for students is 100 (or less).
How this works
13
More precisely, based on our knowledge of the
t-distribution we know that, if the null is true, 95% of
the time the computed t-statistic should be less than 2.02.
But there will always be some uncertainty:
if we conclude that the population mean
intelligence (of students) is higher than 100, how
often will we be wrong in this conclusion?
How this works
14
So what kind of hypotheses do we test in a
regression context? Well, a lot of things…
1.) hypotheses about a single coefficient;
e.g., H0 : βj = 0 versus H1 : βj ≠ 0
2.) hypotheses related to the confidence intervals
for coefficients; e.g. Pr[L ≤ βj ≤ U] = 1 – α
As it turns out, these are very similar to what we
did for population means.
Hypothesis testing and regression
15
We can also test more complicated hypotheses
about the set of regression coefficients; e.g.,
H0 : β1 = 0 and β2 = 0 versus H1 : β1 ≠ 0 or β2 ≠ 0.
In the remainder of this lecture, we will see how to
do these tests.
And in weeks to come, we will see how to test for
correct specification, multicollinearity, serial
correlation, and heteroskedasticity; that is,
whether the classical assumptions are violated.
Hypothesis testing and regression
16
But before all that, we need to motivate the topic a
little more; that is, we can do hypothesis testing
but to what ultimate purpose?
Econometrics started out as an exercise in
explicitly testing economic theory.
For example, does quantity demanded decrease
with price or does international trade induce
economic growth?
Hypothesis testing and regression
17
So far, we have developed the best means to come
up with some reasonable guesses (i.e., estimates).
Now, we can return to the essence of econometrics:
learning about the real world from the sample at our disposal.
In particular, hypothesis testing allows us to answer
the question of whether our results are likely
to have arisen by chance.
Hypothesis testing and regression
18
In the context of regression analysis, we can never
prove that a theory is correct; all we can do is to
show that the sample data “fit” the theory.
However, you can often reject a hypothesis or
theory; “it is very unlikely this sample would have
been observed if the theory were true.”
Hypothesis testing and regression
19


We know that if εi ~ N(0,σ²), then β̂j ~ N(βj, Var[β̂j])
for each j = 0, 1, 2,..., k.
We also know how to standardize variables:
subtract their mean and divide by their standard
deviation so that the resulting distribution is
centered around 0 with a variance of 1, or
Z = (β̂j − βj) / √Var[β̂j] ~ N(0,1).
Returning to the t-test
20
Returning to the t-test
21
Returning to the t-test
22
Suppose we want to test a simple hypothesis like
H0 : βj = βH versus H1 : βj ≠ βH where βH is some
number (unspecified for now).
If we knew Var[β̂j], we could base our test on Z.
That is, if H0 is true, then:
Z = (β̂j − βH) / √Var[β̂j] ~ N(0,1).
Returning to the t-test
23
If Z is far from zero, then it is unlikely that H0 is
true, and we would reject the null.
And if Z is close to zero, then there is not enough
evidence against H0 to reject the null, so we would
fail to reject.
And, of course, we know whether a particular
value of Z is “close” or “far” from zero because Z
is standard normal when H0 is true.
Returning to the t-test
24
For better or worse, we will never know Var[β̂j].
Notice that it is a population quantity and,
therefore, we cannot actually use Z for testing.
But we can estimate Var[β̂j].
And we have seen this before…in the form of the
standard error of β̂j, s.e.(β̂j), the square root of
the estimated variance.
Luckily, this is easily calculated by a computer.
From the Z to the t
25
So instead, we can base our test on:
t = (β̂j − βH) / s.e.(β̂j)
When and only when εi ~ N(0,σ²), t has a t
distribution with n-k-1 degrees of freedom; or
more compactly, t ~ tn-k-1 when εi ~ N(0,σ²).
This can be shown using exactly the same kind of
argument
From the Z to the t
26
So now we have a test statistic that we can use for
testing simple (but informative) hypotheses about
regression coefficients.
We can (and will) test one- or two-sided
hypotheses using this statistic.
We can also build confidence intervals for
regression coefficients using this statistic like
Pr[β̂j − t*α/2 × s.e.(β̂j) ≤ βj ≤ β̂j + t*α/2 × s.e.(β̂j)] = 1 − α
27
All of this is operationally identical to the tests we
did for population means.
And just like when we were testing population
means, we need normality (here, of the error
terms) for the t-statistic to follow a t-distribution.
If the errors are not normally distributed, we can
rely on the CLT to help us out; consequently, in
large samples we can still use the t-statistic as a
good approximation.
28
People often (always?) test whether a particular
regression coefficient is “statistically significant”.
When they say this, they are testing whether βj is
statistically different from zero; that is: they are
considering H0 : βj = 0 versus H1 : βj ≠ 0.
We can test this hypothesis very easily using our
trusty t-statistic as, in this case, βH = 0 and
t = β̂j / s.e.(β̂j).
Statistical significance
29
This particular hypothesis test is so common that
every software package reports the result of this
test automatically (and the associated p-value).
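      Source |         SS     df          MS        Number of obs =     149
-------------+-------------------------------       F(  2,   146) =  153.60
       Model | 1123.39127      2  561.695635        Prob > F      =  0.0000
    Residual | 533.896273    146  3.65682379        R-squared     =  0.6778
-------------+-------------------------------       Adj R-squared =  0.6734
       Total | 1657.28754    148  11.1978888        Root MSE      =  1.9123

-----------------------------------------------------------------------------
             |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
   lngdpprod |  1.469513   .1002652    14.66    0.000    1.271355    1.667672
      lndist | -1.713246   .1351385   -12.68    0.000   -1.980326   -1.446165
       _cons |  18.09403   2.530274     7.15    0.000    13.09334    23.09473
-----------------------------------------------------------------------------
Statistical significance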
30
1.) Just because we reject the null hypothesis
βj = 0 does not mean Xj should be in the model.
2.) When n increases, you will automatically get
larger t-statistics, since standard errors shrink.
3.) A larger value of the test statistic does not
mean a particular independent variable is “more
important” in explaining the dependent variable.
Statistical significance
31
Suppose we have a regression model with k
independent variables.
A very common hypothesis to test is
H0 : β1 = β2 = ... = βk = 0 versus
H1 : at least one βj ≠ 0, where j = 1, 2, ... , k.
That is, we are testing the joint hypothesis that all
the slope coefficients are zero.
The F-test
32
Both the F-test and R² seek to answer the question of
whether the regression model fits the data well.
In the case of the F-test, the particular question of
interest is whether the regression fits the data
better than the sample mean.
If H0 is true, then Yi = β0 + εi, and the OLS
estimator of β0 in this model is simply β̂0 = Ȳ.
The F-test
33
The test statistic for this hypothesis is
F = (ESS / k) / (RSS / (n − k − 1)) = (ESS / RSS) × (n − k − 1) / k
F = (TSS × R²) / (TSS − TSS × R²) × (n − k − 1) / k
F = R² / (1 − R²) × (n − k − 1) / k ~ Fk,n−k−1
The F-test
34
Once again, you can look up critical values for the
F-distribution with k and (n – k – 1) degrees of
freedom in the back of the text or online.
This test is also routinely calculated by statistical
software packages (refer back to slide 30 for an
example of an F-test value as well as p-value).
But again be careful.
The F-test
35
A huge literature has emerged which tries to
explain the volume of trade between countries.
The workhorse of this literature in empirical
international trade is the gravity model.
The gravity model relates bilateral volumes of
trade to the GDPs of the two countries and the
distance between them.
Gravity: an extended example
36
Two nations with similar economies, history, and
with the United States.
Gravity: an extended example
37
Two nations with similar geography, history, and
institutions (Denmark and Germany) and their
Gravity: an extended example
38
But why gravity?
Fg = G × (M1 × M2) / d²
Trade12 = B × (GDP1 × GDP2) / dⁿ
Here, B is a “catch-all” term which will include
Gravity: an extended example
39
A typical model might look like the following:
ln(tradeij × tradeji) = β0 + β1 ln(GDPi × GDPj)
+ β2 ln(distanceij) + εij
Consider countries i and j and form all pairs.
But before estimation, should establish our priors
related to the sign of the estimated coefficients.
That is, β1 > 0 and β2 < 0
Gravity: an extended example
40
Gravity: an extended example
41
Consider the sign of the coefficient on GDP1 × GDP2…we expect
the GDP product to positively affect the amount of trade any
given pair of countries will have (ceteris paribus).
1.) Form the null and alternative hypotheses and
specify a level of significance (α) of 5%:
H0 : β1 ≤ 0 versus H1 : β1 > 0.
2.) If H0 is true, the t-statistic should be distributed
as a t with n − k − 1 = 146 degrees of freedom.
Gravity: an extended example
42
2.) continued
t = (β̂j − βH) / s.e.(β̂j) = (1.4695 − 0.00) / 0.1003 = 14.66
We can already note that the value of the t-statistic
is very large (“large enough” values are generally
in the range of two to three).
This suggests that it is very unlikely that the null
is true.
Gravity: an extended example
43
3.) We can improve on this intuition by comparing
the value of the test statistic to the critical value
from the sampling distribution of the test statistic.
In this case, with n = 149 and df = 146, the critical
value of a one-sided t-test at the 5% level of
significance is 1.66.
This tells us that, if the null is true, there is a 5%
probability of observing an estimated t-value
greater than 1.66.
Gravity: an extended example
44
4.) However, since the test statistic is so much
larger (in absolute value) than the critical value, it
is unlikely that the null is true and we reject it
(with Type I error probability α).
Another way of seeing this is the CI for β1:
Pr[β̂j − t*α/2 × s.e.(β̂j) ≤ βj ≤ β̂j + t*α/2 × s.e.(β̂j)] = 1 − α
Pr[1.4695 − 1.96 × 0.1003 ≤ βj ≤ 1.4695 + 1.96 × 0.1003] = 95%
Gravity: an extended example
45
Another potential hypothesis relates to the OLS
estimate on GDP1 × GDP2…a certain set of models
of international trade suggests β1 = 2.
1.) Form the null and alternative hypotheses and
specify a level of significance (α) of 5%:
H0 : β1 = 2 versus H1 : β1 ≠ 2.
2.) If H0 is true, the t-statistic should be distributed
as a t with n − k − 1 = 146 degrees of freedom.
Gravity: an extended example
46
Gravity: an extended example
47
2.) continued
Gravity: an extended example
48
3.) We can improve on that intuition by comparing
the value of the test statistic to the critical value
from the sampling distribution of the test statistic.
In this case, with n = 149 and df = 146, the critical
value of a two-sided t-test at the 5% level of
significance is approximately 1.96.
This tells us that, if the null is true, there is a 5%
probability of observing an estimated t-value
larger than 1.96 in absolute value.
Gravity: an extended example
49
4.) However, since the test statistic is so much
larger (in absolute value) than the critical value, it
is unlikely that the null is true and we reject it
(with Type I error probability α).
Another way of seeing this is the CI for β1, which
does not contain 2:
Pr[β̂j − t*α/2 × s.e.(β̂j) ≤ βj ≤ β̂j + t*α/2 × s.e.(β̂j)] = 1 − α
Pr[1.4695 − 1.96 × 0.1003 ≤ βj ≤ 1.4695 + 1.96 × 0.1003] = 95%
Gravity: an extended example
50
Finally, we can consider the overall significance of
our regression model by evaluating the F-test.
Under the null of the F-test, no explanatory
variable has any effect…if F is large, then the
unconstrained model fits the data much better than
the constrained model where Yi = β0 + εi.
In this case, where F = 153.60 and the associated
p-value is 0.0000, we reject the null.
Gravity: an extended example
51
For purposes of hypothesis testing in the CLRM,
there are precisely two ways to get at well-behaved
sampling distributions for test statistics:
1.) assume normal population errors, or
2.) invoke the Central Limit Theorem
Once these are in place, it all boils down to
calculating values
Conclusion
52
```
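
The calculations in the slides can be reproduced with a few lines of code. The following is a minimal sketch in Python using only the standard library, not part of the original lecture. The function names (`t_stat`, `conf_interval`, `f_from_r2`) are hypothetical helpers introduced for illustration, and the critical values (2.02, 1.66, 1.96) are simply copied from the slides rather than computed. Note that `statistics.variance` divides by n − 1, so for the IQ sample it gives 241.6 rather than the 201.33 quoted on the slide (which corresponds to dividing by n).

```python
import math
import statistics


def t_stat(estimate, hypothesized, std_err):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_err


def conf_interval(estimate, std_err, t_crit):
    """Interval: estimate +/- critical value * standard error."""
    half_width = t_crit * std_err
    return estimate - half_width, estimate + half_width


def f_from_r2(r2, n, k):
    """Overall F statistic: R^2 / (1 - R^2) * (n - k - 1) / k."""
    return (r2 / (1.0 - r2)) * (n - k - 1) / k


# --- IQ example: H0: mu <= 100 versus H1: mu > 100 ---
scores = [110, 118, 110, 122, 110, 150]
n_iq = len(scores)
mean_iq = statistics.mean(scores)
s2 = statistics.variance(scores)        # sample variance, divides by n - 1
se_mean = math.sqrt(s2 / n_iq)          # standard error of the sample mean
t_iq = t_stat(mean_iq, 100, se_mean)
reject_iq = t_iq > 2.02                 # one-sided 5% critical value from the slides

# --- Gravity example: coefficient on lngdpprod from the regression output ---
b1, se_b1 = 1.469513, 0.1002652
t_zero = t_stat(b1, 0.0, se_b1)         # H0: beta1 <= 0, compare with 1.66
t_two = t_stat(b1, 2.0, se_b1)          # H0: beta1 = 2, compare |t| with 1.96
ci_lo, ci_hi = conf_interval(b1, se_b1, 1.96)

# --- Overall F-test from the reported (rounded) R-squared ---
f_stat = f_from_r2(0.6778, n=149, k=2)

print(f"IQ: mean={mean_iq}, s^2={s2:.2f}, t={t_iq:.2f}, reject H0: {reject_iq}")
print(f"Gravity: t(beta1=0)={t_zero:.2f}, t(beta1=2)={t_two:.2f}")
print(f"Approximate 95% CI for beta1: [{ci_lo:.4f}, {ci_hi:.4f}]")
print(f"F from R^2: {f_stat:.2f}")
```

Because the F statistic here is computed from the rounded R-squared, it only approximately matches the 153.60 in the regression output. Likewise, the interval uses the slides' critical value of 1.96, so it is slightly narrower than the interval reported by the software.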