Lecture 11: The classical linear regression model
BUEC 333
Professor David Jacks
Review of regression analysis
First half of this course dealt with the intuition and
statistical foundations underlying regression
analysis.
This week, we will explore the key assumptions
embedded in the classical (normal) linear
regression model, or C(N)LRM.
The second half of this course will deal with the practice of regression analysis.
But first, we need to review what we have seen before with respect to what constitutes regression.
At the heart of it, there is the regression function,
a conditional expectation expressed as E(Y|X).
Generally, our goal is to make predictions
regarding the value of Y or statistically summarize
the economic relationship between Y and X.
The simplest example of a regression model is the
case where the regression function f is a line:
E[Y|X] = β0 + β1X
This says that the conditional mean of Y is a linear
function of the independent variable X.
We call β0 and β1 the regression coefficients.
The regression function is never an exact
representation of the relationship between
dependent and independent variables.
There is always some variation in Y that cannot be
explained by the model; we think of this as error.
To soak up all that variation in Y that cannot be explained by X, we include a stochastic error term, ε, in the model.
Our simple linear regression model now has a deterministic (non-random) and a stochastic (random) component:
Y = β0 + β1X + ε
As β0 + β1X is the conditional mean of Y given X,
Y = E(Y|X) + ε implies ε = Y − E(Y|X).
This comprises that which is known, unknown, and assumed.
Our goal is to end up with a set of estimated coefficients (sample statistics that we compute from our data):
β̂0, β̂1, β̂2, …, β̂k
One sample would produce one set of estimated
coefficients, another sample a different set…
Thus, they are RVs with their own sampling
distribution (well-defined mean and variance…
how likely is it we end up with a certain estimate).
Example: class average on midterm vs. samples
From population regression to sample analog:
Yi = β0 + β1X1i + β2X2i + … + βkXki + εi
Yi = β̂0 + β̂1X1i + β̂2X2i + … + β̂kXki + ei
Yi = Ŷi + ei
Ordinary least squares (OLS) as a means to
minimize our “prediction mistakes”…
OLS minimizes the sum of squared residuals: Σi ei² = Σi (Yi − Ŷi)².
How to determine this minimum?
Differentiate w.r.t. β0 and β1, set to zero, and solve.
The solutions to this minimization problem are:
β̂0 = Ȳ − β̂1X̄

β̂1 = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)²

   = [ Σi (Xi − X̄)(Yi − Ȳ) / (n − 1) ] / [ Σi (Xi − X̄)² / (n − 1) ]

That is, the OLS slope is the sample covariance of X and Y divided by the sample variance of X.
OLS as the best estimator
Our question for the week: for which model does OLS give us the best estimator?
But what do we mean by best?
That is, how do we evaluate an estimator?
First, take a look at the sampling distribution and ask whether the estimator is:
1.) centered around the “true value”?
2.) tightly concentrated around that value?
An estimator is unbiased if it is centered at the true value, or E(β̂k) = βk.
This is our minimal requirement for an estimator.
Using only unbiased estimators, the one with the
smallest variance is called efficient (or best).
This means we are using our data/information optimally.
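To make "best" concrete, here is a minimal Monte Carlo sketch in Python (NumPy assumed; all numbers are illustrative) comparing OLS with a cruder unbiased estimator of the slope. Both are centered on the true value, but OLS uses the data more efficiently and so has the smaller variance:

```python
import numpy as np

rng = np.random.default_rng(333)
beta0, beta1 = 2.0, 0.5
n, reps = 100, 5000
X = np.linspace(0, 10, n)              # fixed design: same X's in every sample

ols, twopoint = np.empty(reps), np.empty(reps)
for r in range(reps):
    Y = beta0 + beta1 * X + rng.normal(0, 1, n)
    # OLS slope
    ols[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    # A cruder estimator: the slope through the first and last observations.
    # It is also unbiased, but it throws away the middle of the sample.
    twopoint[r] = (Y[-1] - Y[0]) / (X[-1] - X[0])

print(ols.mean(), ols.var())           # centered on 0.5, small variance
print(twopoint.mean(), twopoint.var()) # also centered on 0.5, much larger variance
```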
The classical assumptions
If the assumptions of the classical model hold,
then the OLS estimator is BLUE.
BLUE = Best Linear Unbiased Estimator.
But what is the purpose of demonstrating this?
1.) Provides the statistical argument for OLS.
1.) The regression model is correctly specified:
a.) has the correct functional form and
b.) has an additive error term.
2.) The error term has zero population mean,
or E(εi) = 0.
3.) All independent variables are uncorrelated
with the error term, or Cov(Xi,εi) = 0,
for each independent variable Xi .
4.) No independent variable is a perfect linear
function of any other independent variable.
5.) Errors are uncorrelated across observations,
or Cov(εi,εj) = 0 for any two observations i and j.
6.) The error term has a constant variance, or
Var(εi) = σ² for every i.
Assumption 1: specification
Remember that we wrote the regression model as
Yi = E(Yi|Xi) + εi
Assumption 1 says: a.) The regression has the
correct functional form for E[Yi|Xi].
This entails that:
a.) E[Yi|Xi] is linear in parameters,
b.) we have included all the correct X’s, and
c.) we have not included any irrelevant X’s.
Assumption 1 also says: b.) The regression model
has an additive error term, as in Yi = E(Yi|Xi) + εi.
This is not very restrictive: we can always write the
model this way if we define εi = Yi − E[Yi|Xi].
But if we fail on any one of these counts (so that
Assumption 1 is violated), OLS will give us the
wrong answer.
Assumption 2: Zero population mean
Also known as having a zero mean error term.
Pretty weak assumption as all it says is that there
is no expected error in the regression function.
Of course, if we expected a particular error value,
then (part of) the error term would be predictable,
and we could just add that to the regression model.
Suppose instead that E(εi) = 5. In this case,
E(Yi|Xi) = E(β0 + β1Xi + εi) = β0 + β1Xi + E(εi)
E(Yi|Xi) = β0 + β1Xi + 5
We could just define a new intercept β0* = β0 + 5
and a new error term εi* = εi – 5.
Then we have Yi = β0* + β1Xi + εi*, where the new error term has zero mean.
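A minimal simulation sketch of this point (NumPy assumed; the numbers are illustrative): shifting the error mean to 5 changes only the estimated intercept, not the slope:

```python
import numpy as np

rng = np.random.default_rng(333)
n = 10_000
X = rng.uniform(0, 10, n)

# Error term with a nonzero mean of 5, violating Assumption 2 as stated
eps = rng.normal(loc=5.0, scale=1.0, size=n)
Y = 2.0 + 0.5 * X + eps

slope, intercept = np.polyfit(X, Y, deg=1)
print(slope)      # still close to 0.5
print(intercept)  # close to 2 + 5 = 7: the intercept absorbs the error mean
```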
If Assumption 2 is violated, OLS will be biased in its estimate of the intercept, which absorbs the nonzero error mean; the slope estimates are unaffected.

Assumption 3: No correlation with the error term
We need Assumption 3 to be satisfied for all
independent variables Xi; that is, there is zero
covariance between the Xi and the error term.
When Assumption 3 is satisfied, we say Xi is
exogenous.
When Assumption 3 is violated, we say Xi is
endogenous.
Why are applied economists so worried about
endogeneity?
Remember we cannot observe εi—it is one of the
unknowns in our population regression function.
Furthermore, the error term should be something
that happens separately from everything else in the
model… it is, in a sense, a shock.
If Cov(Xi,εi) ≠ 0 and Xi is in our model, then OLS
attributes variation in Yi to Xi that is due to εi.
Our estimator should not “explain” this variation
in Y using X because it is due to error, not X.
But if Cov(Xi,εi) ≠ 0, then when ε moves around,
so does X.
Thus, we see X and Y moving together, and the
least squares estimator therefore “explains” some
of this variation in Y using X.
Consequently, OLS gives us the wrong answer;
we get a biased estimate of the coefficient on X
because it measures the effect of X and ε on Y.
This assumption is violated most frequently when
a researcher omits an important independent
variable from an equation.
We know from Assumption 2 that the influence of
omitted variables will show up in the error term;
if an omitted variable is uncorrelated with the
included X’s, this does little harm.
But if not, then we have omitted variable bias.
Example: suppose income depends on standard
variables like age, education, and IQ, but we do
not have the data on IQ.
We might also think that education and IQ are
correlated: smarter people tend to get more
education(?).
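A minimal sketch of this situation in Python (NumPy assumed; all parameter values are invented for illustration): income depends on both education and IQ, IQ is correlated with education, and omitting IQ biases the education coefficient upward:

```python
import numpy as np

rng = np.random.default_rng(333)
n = 10_000

# IQ and education are positively correlated (illustrative numbers)
iq = rng.normal(100, 15, n)
educ = 5 + 0.08 * iq + rng.normal(0, 2, n)   # more IQ -> more schooling

# True model: income depends on education AND IQ
income = 10 + 2.0 * educ + 0.5 * iq + rng.normal(0, 5, n)

# Short regression omitting IQ: IQ's influence hides in the error term,
# and that error term is now correlated with education
b_short = np.polyfit(educ, income, deg=1)[0]

# Long regression including IQ, via least squares on the full design matrix
Z = np.column_stack([np.ones(n), educ, iq])
b_long, *_ = np.linalg.lstsq(Z, income, rcond=None)

print(b_short)    # noticeably above 2.0: biased upward
print(b_long[1])  # close to 2.0 once IQ is controlled for
```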
How do we ever know if Assumption 3 is
satisfied?
There are some statistical tests available to tell us
that our independent variables are exogenous, but
they are not very convincing.
Sometimes we can rely on economic theory.
More often than not, we have to rely on our own judgment.

Unbiasedness
If Assumptions 1 through 3 are satisfied, then the
OLS estimator is unbiased.
Suppose we have the simple linear regression Yi = β0 + β1Xi + εi. The OLS slope estimator is

β̂1 = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)²

Substituting in Yi = β0 + β1Xi + εi, so that Yi − Ȳ = β1(Xi − X̄) + (εi − ε̄), we get

β̂1 = [ β1 Σi (Xi − X̄)² + Σi (Xi − X̄)(εi − ε̄) ] / Σi (Xi − X̄)²

β̂1 = β1 + Σi (Xi − X̄)(εi − ε̄) / Σi (Xi − X̄)²

Taking expectations of both sides, Assumptions 2 and 3 (the errors have zero mean and are uncorrelated with the X’s) imply that the expected value of the second term is zero, so

E(β̂1) = β1
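This result can be seen in a short Monte Carlo simulation. Below is a minimal Python sketch (NumPy assumed; β0 = 2 and β1 = 0.5 are purely illustrative): across many repeated samples, the average of the estimated slopes sits essentially on top of the true β1:

```python
import numpy as np

rng = np.random.default_rng(333)
beta0, beta1 = 2.0, 0.5
n, reps = 100, 5000

slopes = np.empty(reps)
for r in range(reps):
    X = rng.uniform(0, 10, n)
    eps = rng.normal(0, 1, n)          # zero mean, independent of X
    Y = beta0 + beta1 * X + eps
    slopes[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

# The sampling distribution of the slope estimates is centered on beta1
print(slopes.mean())                   # approximately 0.5
```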
Assumption 4: No perfect collinearity

This is more of a technical assumption… perfect
collinearity makes OLS coefficients incalculable.
With perfect (multi-)collinearity, one (or more)
independent variables is a perfect linear function
of other variables.
Suppose in our hockey data, we have POINTS,
GOALS, and ASSISTS as independent variables
in a regression model where, by definition,
POINTS = GOALS + ASSISTS.
Perfect collinearity is a problem because for OLS
there is no unique variation in X to explain Y.
If we regress SALARY on POINTS, GOALS, and
ASSISTS, whenever we see SALARY vary with
GOALS, it also varies with POINTS by exactly
the same amount.
Thus, we do not know whether the corresponding
variation in SALARY should be attributed to
GOALS or to POINTS.
Perfect collinearity between two independent
variables implies that:
1.) they are really the same variable,
2.) one is a multiple of the other, and/or
3.) a constant has been added to a variable
In any case, they are providing the same
information about population parameter values.
The solution is simple: drop one of the offending variables from the model.
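A minimal sketch of why OLS cannot produce unique coefficients here (NumPy assumed; the hockey numbers are invented for illustration): with POINTS = GOALS + ASSISTS, the design matrix loses a rank and X'X cannot be reliably inverted:

```python
import numpy as np

rng = np.random.default_rng(333)
n = 200
goals = rng.poisson(15, n).astype(float)
assists = rng.poisson(25, n).astype(float)
points = goals + assists               # perfect linear function of the others
salary = 1.0 + 0.3 * points + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), points, goals, assists])

# One column is an exact linear combination of the others, so X has
# rank 3 instead of 4 and (X'X)^{-1} does not exist
print(np.linalg.matrix_rank(X))        # 3, not 4: one column is redundant
print(np.linalg.cond(X.T @ X))         # astronomically large condition number

# Dropping the redundant column restores a unique answer
X2 = np.column_stack([np.ones(n), goals, assists])
beta2, *_ = np.linalg.lstsq(X2, salary, rcond=None)
print(beta2)   # intercept near 1, and near 0.3 on both GOALS and ASSISTS
```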
Assumption 5: No serial correlation
Observations of the error terms should not be
correlated with each other as they should be
independent draws, or Cov(εi,εj) = 0.
If not, we say the errors are serially correlated.
If they are serially correlated, this tells us that if
ε1 > 0, then ε2 is more likely to be positive also.
Suppose there is a big negative shock (ε < 0) to
GDP this year (e.g., oil prices rise or there is
another financial crisis).
This likely triggers a recession and we can expect
to see another negative shock in the path of GDP.
In the case of serial correlation, OLS is still
unbiased, but there are more efficient estimators.
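A small sketch of serially correlated errors (NumPy assumed; the persistence parameter is illustrative), generated by an AR(1) process in which this period's shock carries over into the next:

```python
import numpy as np

rng = np.random.default_rng(333)
T, rho = 500, 0.7                      # rho = persistence of the shock

# Build AR(1) errors: eps_t = rho * eps_{t-1} + innovation_t
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + rng.normal(0, 1)

# Adjacent errors are clearly correlated, violating Cov(eps_i, eps_j) = 0
print(np.corrcoef(eps[:-1], eps[1:])[0, 1])  # roughly 0.7
```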
Assumption 6: No heteroskedasticity
Error terms should possess a constant variance as
they should be independent, identical draws.
If true, we say the errors are homoskedastic.
If false, we say the errors are heteroskedastic.
(Hetero-skedasis = different-dispersion)
In general, heteroskedasticity arises when Var(εi)
depends on the value of one of the independent
variables or on some other characteristic of the
observation.
Suppose we regress income (Y) on education (X).
Although people with more education have higher
income on average, they also (as a group) have
more variability in their earnings.
That is, some people with PhDs get good jobs and
earn $$$, but some are “over-qualified” and have
a hard time finding a job.
In contrast, almost everyone that drops out of high
school earns very little…low levels of education,
low average and variance of earnings.
The residuals from our regression of earnings on
education will likely increase as education
increases.
Like before, OLS is still unbiased, but there are
more efficient estimators.
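A minimal sketch of this pattern (NumPy assumed; the earnings and schooling numbers are invented for illustration): the error variance grows with education, so the residuals fan out as X increases:

```python
import numpy as np

rng = np.random.default_rng(333)
n = 2_000
educ = rng.uniform(8, 22, n)                     # years of schooling

# Error variance grows with education: heteroskedasticity by construction
eps = rng.normal(0, 1, n) * (0.5 + 0.4 * (educ - 8))
income = 5.0 + 2.0 * educ + eps

slope, intercept = np.polyfit(educ, income, deg=1)
resid = income - (intercept + slope * educ)

# Residual spread is several times larger at high education than at low
print(resid[educ < 10].std())
print(resid[educ > 20].std())
```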
The Gauss-Markov Theorem (GMT)
When Assumptions 1 through 6 are satisfied, the
least squares estimator β̂j has the smallest
variance of all linear unbiased estimators of βj.
An important theorem: it allows us to say that
least squares is BLUE (remember, Best Linear
Unbiased Estimator).
What do we mean by a linear estimator?
From Lecture 8, we can write the OLS estimators as weighted averages of the Y’s:

β̂1 = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)² = Σi [ (Xi − X̄) / Σj (Xj − X̄)² ] Yi

β̂0 = Ȳ − β̂1X̄ = Σi [ 1/n − X̄ (Xi − X̄) / Σj (Xj − X̄)² ] Yi

Each estimator is linear in the Yi: a weighted sum Σi wi Yi whose weights wi are built only from the X’s.
The w’s just place more emphasis on observations of Y whose X values are far from X̄.
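As a quick numerical check, here is a minimal Python sketch (NumPy assumed; the data are simulated for illustration) showing that the weighted sum Σi wi Yi reproduces the OLS slope:

```python
import numpy as np

rng = np.random.default_rng(333)
n = 50
X = rng.uniform(0, 10, n)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, n)

# Weights depend only on the X's, not on the Y's
w = (X - X.mean()) / np.sum((X - X.mean()) ** 2)

# The slope estimator is a weighted sum of the Y's: linear in Y
print(np.sum(w * Y))            # same as...
print(np.polyfit(X, Y, 1)[0])   # ...the OLS slope
```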
Conclusion
The bottom line: we have established the
conditions necessary for OLS to be BLUE.
Assumptions 1 through 3: OLS is unbiased.
Given unbiasedness, what can we say about the
sampling variance of estimates using OLS?
That is where Assumptions 4 through 6 come in:
OLS has the smallest sampling variance possible;
consequently, the highest reliability possible.