Lecture 12: The classical normal linear model
BUEC 333
Professor David Jacks
1
Previously, we explored the six assumptions of
linear regression which define the classical linear
regression model (CLRM).
We also learned that when those six assumptions
are satisfied, then the least squares estimator is
BLUE (Best Linear Unbiased Estimator).
This explains why OLS serves as the baseline estimator in so many applications in economics.
Recap on the classical model
2
Now, we will add one more assumption to the CLRM, namely that the error term is normally distributed; this gives us the CNLRM.
This also gives us an exact sampling distribution for the least squares estimator (reflecting that a different sample would yield different values).
Once we have a sampling distribution for our estimator, we can develop testable hypotheses.
Recap on the classical model
3
Seventh assumption: error terms are independent
and identically distributed as normal RVs with
mean 0 and variance σ2, or εi ~ N(0,σ2).
Actually, a pretty strong assumption.
Previously, we assumed that the errors have mean
zero (A2), are uncorrelated across observations
(A5), and have a constant variance (A6).
Assumption 7: Normal errors
4
But A7 gives us more than that; it also specifies the complete probability distribution of the errors, not just their mean and variance (due to normality).
NB: we do not need normality to use OLS; by the Gauss-Markov Theorem (GMT), OLS is BLUE even without it.
But when the errors are normally distributed, we get an even stronger result: OLS is BUE (Best Unbiased Estimator).
Assumption 7: Normal errors
5
The sampling distribution of the OLS estimator reflects the distribution of the values of $\hat{\beta}_j$ that we would observe if we drew many different samples from the same population and estimated the regression model on each sample.
OLS estimators are unbiased, so we already know
the mean of their sampling distribution:
the (true) population values of the coefficients…
The sampling distribution of the OLS estimator
6
Suppose we have the following simple linear
regression model, Yi = β0 + β1Xi + εi where
εi ~ N(0,σ2).
We also know the least squares estimators are
given as follows:
$$\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} \qquad \text{and} \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
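To make the formulas concrete, here is a minimal NumPy sketch (not from the lecture) that computes the two estimators on made-up data; the seed, the intercept of 2.0, and the slope of 0.5 are assumptions chosen purely for illustration.
```python
# Minimal sketch of the least squares formulas above (illustrative data only).
import numpy as np

rng = np.random.default_rng(333)
x = rng.uniform(0, 10, size=100)               # made-up regressor values
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100) # assumed DGP: beta0 = 2.0, beta1 = 0.5

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
print(beta0_hat, beta1_hat)                    # estimates should land near 2.0 and 0.5
```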
Now, we need to think about what the normality of the errors implies for these estimators.
The sampling distribution of the OLS estimator
7
Because εi ~ N(0,σ2) and Yi is a linear function of
εi, A7 implies that Yi is normally distributed too.
From Lecture 5: a linear function of a normally
distributed random variable is itself normally
distributed.
Likewise, Y-bar is also normally distributed
The sampling distribution of the OLS estimator
8
Notice what this will imply about OLS estimates:
these too will be normally distributed!
That is, $\hat{\beta}_0 \sim N\left(\beta_0, \operatorname{Var}(\hat{\beta}_0)\right)$ and $\hat{\beta}_1 \sim N\left(\beta_1, \operatorname{Var}(\hat{\beta}_1)\right)$.
Why? Refer to slide 41 in lecture 11…
As it turns out, this is a pretty powerful result: if the errors and, thus, the OLS estimators are normally distributed, we know their exact sampling distribution.
The sampling distribution of the OLS estimator
9
So, knowing the sampling distribution of the OLS
estimators means we can test hypotheses about
the population betas themselves without ever
knowing their “true” value.
NB: this result generalizes for any number of
independent variables.
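As a hedged illustration of this result, the sketch below repeatedly draws samples from a DGP with normal errors and re-estimates the slope; every parameter value (beta0, beta1, sigma, n, the number of replications) is an assumption made for the example.
```python
# Monte Carlo sketch: sampling distribution of the OLS slope under normal errors.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma, n, reps = 1.0, 2.0, 3.0, 50, 10_000  # assumed values
x = rng.uniform(0, 10, size=n)                 # hold X fixed across replications

slopes = np.empty(reps)
for r in range(reps):
    eps = rng.normal(0, sigma, size=n)         # A7: normally distributed errors
    y = beta0 + beta1 * x + eps
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(slopes.mean())   # close to beta1 = 2.0 (unbiasedness)
print(slopes.std())    # close to sigma / sqrt(sum((x - x.mean())**2))
```
Across the replications, the slope estimates average out near the true beta1 and their histogram is bell-shaped, which is exactly what the normal sampling distribution predicts.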
What normal errors buy us
10
It can be shown that the sampling variance of the
slope coefficient in the regression model with one
independent variable is
$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}$$
Intuition: when we have more variation in ε, the
OLS estimate is less precise.
The sampling variance of the OLS estimator
11
What about the opposite?
The sampling variance of the OLS estimator
12
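For a quick numeric check of the variance formula (using made-up numbers, not lecture data), the sketch below shows both channels: raising the error variance raises the sampling variance, while spreading out the X values lowers it.
```python
# Numeric check of Var(beta1_hat) = sigma^2 / sum((X_i - Xbar)^2) with made-up numbers.
import numpy as np

def var_beta1(sigma2, x):
    return sigma2 / np.sum((x - x.mean()) ** 2)

x_narrow = np.array([4.0, 4.5, 5.0, 5.5, 6.0])   # little variation in X
x_wide   = np.array([1.0, 3.0, 5.0, 7.0, 9.0])   # more variation in X

print(var_beta1(1.0, x_narrow))  # baseline: 0.4
print(var_beta1(2.0, x_narrow))  # more variation in the errors -> less precise (0.8)
print(var_beta1(1.0, x_wide))    # more variation in X -> more precise (0.025)
```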
Upon closer inspection, the sampling variance of
the OLS estimator contains an unknown, the
variance of the unobserved error term or σ2.
So, how do we get around this?
As it turns out, an unbiased estimator of σ2 is
$$s^2 = \frac{\sum_i e_i^2}{n - k - 1}$$
The sampling variance of the OLS estimator
13
With substitution, we get an unbiased estimator of
the variance of the OLS estimator:
$$\widehat{\operatorname{Var}}(\hat{\beta}_1) = \frac{s^2}{\sum_i (X_i - \bar{X})^2}$$
The square root of this estimator is called the
standard error of beta-hat (standard because of the
square root; error because…).
The sampling variance of the OLS estimator
14
Bringing it all together:
$$\widehat{\operatorname{Var}}(\hat{\beta}_1) = \frac{\sum_i e_i^2 / (n - k - 1)}{\sum_i (X_i - \bar{X})^2}$$
Naturally, we prefer smaller over larger values. Intuitively, this could entail:
1.) a smaller numerator (a higher value of n or lower values of k or ei)
2.) a larger denominator (more variation in the Xi)
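A sketch of this whole calculation, under the same made-up DGP used earlier (one regressor, so k = 1):
```python
# Sketch of the standard error calculation (illustrative data only; k = 1).
import numpy as np

rng = np.random.default_rng(333)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)                      # residuals
n, k = len(y), 1                           # one independent variable
s2 = np.sum(e ** 2) / (n - k - 1)          # unbiased estimator of sigma^2
var_hat_b1 = s2 / np.sum((x - x.mean()) ** 2)
se_b1 = np.sqrt(var_hat_b1)                # the standard error of beta1-hat
print(s2, se_b1)
```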
The sampling variance of the OLS estimator
15
Previously stated that when errors are normally
distributed, the sampling distribution of the OLS
estimator is also normal.
But what if the errors are not normally distributed?
As long as the errors are “well behaved”, then we
can rely on the Central Limit Theorem.
Implications from the CLT
16
Main implications from the CLT: as the sample
size gets larger (as n → ∞), the sampling
distribution of the least squares estimator is well
approximated by a normal distribution.
So even if the errors are not normal, the sampling
distribution of the beta-hats is approximately
normal in large samples.
Implications from the CLT
17
CLT: the sum (and hence, the mean) of a number
of independent, identically distributed random
variables will tend to be normally distributed,
regardless of their underlying distribution, if the
number of different RVs is large enough.
Consider (yet again) the case of a six-sided die:
There are 6 possible outcomes {1, 2, 3, 4, 5, 6}
Each with an associated probability of 1/6
The pdf looks like the following…
Demonstrations of the CLT
18
Demonstrations of the CLT
19
Simulation: using a computer to “throw the dice”
many times (N).
We can then look at the sampling distribution of
the average and consider what happens as N
increases.
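A minimal version of that simulation in NumPy; the number of replications (10,000) is an assumption, not something specified on the slides.
```python
# Sketch of the dice simulation: average of N throws, repeated many times.
import numpy as np

rng = np.random.default_rng(6)

def average_of_throws(N, reps=10_000):
    throws = rng.integers(1, 7, size=(reps, N))  # faces 1..6, each with probability 1/6
    return throws.mean(axis=1)                   # one sample average per replication

for N in (1, 100, 1000):
    means = average_of_throws(N)
    # The averages centre on 3.5 and their spread shrinks as N grows,
    # with a histogram that looks increasingly normal (the CLT at work).
    print(N, means.mean(), means.std())
```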
Demonstrations of the CLT
20
Run it one time:
Run it another time:
OK, one more time:
Demonstrations of the CLT
21
Give me a billion!
Now let’s plot the histogram…
Demonstrations of the CLT
22
We know the population mean is equal to 3.5… so pretty close, but how can we get closer?
Demonstrations of the CLT
23
For N = 100…
Demonstrations of the CLT
24
For N = 1000…
Demonstrations of the CLT
25
In fact, for a large enough (infinite) sample, the
histogram is the sampling distribution.
So what does this have to do with OLS?
Well, it would be nice if our beta-hats were normally distributed, as is the case when:
1.) the error term is normally distributed (A7), or
2.) the sample size is large enough to invoke the CLT.
Demonstrations of the CLT
26
Just as in the case of the dice, we can do a
simulation for OLS estimates.
That is, we draw a random sample of 100 observations on Y, X, and ε (which are not normally distributed).
The underlying data generating process for the
two variables is linear where β1 equals 1.02.
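A sketch of this exercise, assuming uniform (hence non-normal) errors, a zero intercept, and uniformly distributed X, since the slides only specify the sample size of 100 and the slope of 1.02:
```python
# Sketch of the OLS simulation with non-normal errors and beta1 = 1.02.
import numpy as np

rng = np.random.default_rng(12)
beta0, beta1, n, reps = 0.0, 1.02, 100, 10_000   # intercept and reps are assumptions

slopes = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, size=n)               # assumed distribution for X
    eps = rng.uniform(-3, 3, size=n)             # mean-zero but decidedly non-normal errors
    y = beta0 + beta1 * x + eps
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# By the CLT, the histogram of these slope estimates is approximately
# bell-shaped and centred near 1.02, even though the errors are uniform.
print(slopes.mean(), slopes.std())
```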
Demonstrations of the CLT
27
In the first sample,
Demonstrations of the CLT
28
In the second sample,
Demonstrations of the CLT
29
In the third sample,
Demonstrations of the CLT
30
Demonstrations of the CLT
31
Demonstrations of the CLT
32
For the purposes of hypothesis testing, it turns out that there are precisely two ways to “skin a cat”:
1.) Assume the population errors are normally distributed, or
2.) Invoke the Central Limit Theorem.
Both result in (approximately) normally distributed beta-hats.
Conclusion
33