Lecture 9: Goodness of fit
BUEC 333
Professor David Jacks

Explaining variation in Y

More than once, we have said that the goal of
regression analysis is to explain variation in the
dependent variable (Yi) on the basis of variation in
the independent variables (X1i, X2i,…, Xki).
But what exactly does this mean?
And how do we know whether we (or our
regression) are doing a good job?
Today’s topic revolves around the idea of the
goodness of fit.

The total sum of squares

When we talk about the variation in Yi to be
explained, we are implicitly talking about how Yi
varies around its mean.
Ultimately, our interest is in deviations of Yi from its population mean μY. Of course, we do not know μY, so we work with its sample counterpart, the mean Ȳ.
However, we always have

\[ \sum_{i=1}^{n} (Y_i - \bar{Y}) = 0 \quad \text{because} \quad \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} \bar{Y} = n\bar{Y} - n\bar{Y} = 0 \]

So, trying to explain the total of these deviations is pretty useless. Taking a cue from OLS, it might make sense then to work with squared deviations instead.
We will focus on what is (usually) called the
Total Sum of Squares (TSS):
\[ TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \]
Should look familiar to you: think “variance”.
Generally, this is not zero unless there is no
variation in Yi at all.
When TSS is large, there is lots of variation in Yi
around its mean…
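
As a concrete illustration, here is a minimal sketch in Python (using NumPy, with made-up numbers) of the TSS calculation:

```python
import numpy as np

# Hypothetical sample of the dependent variable (values made up).
y = np.array([450.0, 600.0, 520.0, 710.0, 480.0])

# TSS: sum of squared deviations of Y around its sample mean.
tss = np.sum((y - y.mean()) ** 2)
print(tss)  # the larger this is, the more variation there is to explain
```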

The decomposition of variance

We can always write: \( Y_i - \bar{Y} = Y_i - \bar{Y} + \hat{Y}_i - \hat{Y}_i = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) \)
Thus,

\[ TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} \left[ (\hat{Y}_i - \bar{Y}) + e_i \right]^2 \]

\[ TSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2 \]

(the cross term drops out because the OLS residuals sum to zero and are uncorrelated with the fitted values), so that

\[ TSS = ESS + RSS \]

This is a very convenient expression as it decomposes TSS into two components: the Explained Sum of Squares (ESS) and the Residual Sum of Squares (RSS).
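
To see the decomposition at work, here is a small simulation sketch in Python (NumPy only; the data-generating process is invented purely for illustration) that fits OLS and checks TSS = ESS + RSS:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: one regressor plus noise (purely illustrative).
n = 100
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# OLS fit of y on a constant and x.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
e = y - y_hat

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
rss = np.sum(e ** 2)

# With an intercept in the model, the identity holds up to rounding.
print(np.isclose(tss, ess + rss))  # True
```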

Explained variance

When we build a regression model, we want to
know how well it “fits” the data; that is, does our
model do a good job of explaining variation in Yi?
This suggests why our previous decomposition is
so useful: it gives us two parts, that which is
explained and that which is unexplained.
Thus, we use it to measure the proportion of the variation in Yi that our regression explains.
The proportion of variation in Yi around its mean
that is explained by the regression model is R2:
\[ R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_i e_i^2}{\sum_i (Y_i - \bar{Y})^2} \]
Therefore, R2 is a summary statistic of the variation explained which is bounded between 0 and 1 and is known as the coefficient of determination.
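
In code, R2 is a one-liner once you have fitted values; a minimal sketch (the fitted values below are invented for illustration, not taken from an actual regression):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - RSS/TSS."""
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - rss / tss

# Toy numbers for illustration only.
y     = np.array([3.0, 5.0, 4.0, 7.0, 6.0])
y_hat = np.array([3.4, 4.6, 4.4, 6.6, 6.0])
print(r_squared(y, y_hat))  # 0.936: about 94% of the variation explained
```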

Using R2 to assess model fit

If ESS = 0, then R2 = 0 and we have explained
none of the variation with our regression model. 
If ESS = TSS, then R2 = 1 and we have explained all of the variation with our regression model.
Typically, we do not encounter either of these extremes in the data.
Generally, larger values are better in the sense that
our model does a better job of predicting Yi.
So how big should R2 be to inspire confidence in
our model? The answer is context specific…

More about R2

There is a somewhat natural temptation to build a
model (i.e. choose your independent variables) to
maximize R2. Avoid this temptation!
If you add another independent variable, R2 never
decreases—even if the new variable has no real
relationship with the dependent variable.
Why? Adding more variables will not change TSS,
and it can either leave RSS unchanged or lower it.
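
A quick way to convince yourself: add a regressor of pure noise and watch R2 (weakly) rise. A sketch under invented data:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

def r_squared(X, y):
    """R^2 from an OLS fit of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([X1, rng.normal(size=n)])  # add a pure-noise regressor

# RSS can only stay the same or fall, so R^2 can only stay the same or rise.
print(r_squared(X1, y) <= r_squared(X2, y))  # True
```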

Motivating adjusted R2

Ultimately, we are looking for a set of
independent variables that have economic as well
as statistical significance.
Another reason to avoid maximizing R2: there is an associated loss of degrees of freedom. The degrees of freedom are defined as the number of observations (n) minus the number of estimated parameters (k + 1, with k slope coefficients plus an intercept).
When we add independent variables to the model,
we lose degrees of freedom, and our parameter
estimates are less precise.
So if we add extra variables to the model, we need
to trade off a better fit (in terms of R2) against
having a concise model.
Adjusted R2 takes this trade-off into account by measuring the share of Yi's variation explained by a model, adjusted for the degrees of freedom the model uses up.

Adjusted R2

With that in mind, adjusted R2 is defined as:
\[ \bar{R}^2 = 1 - \frac{\sum_i e_i^2 / (n - k - 1)}{\sum_i (Y_i - \bar{Y})^2 / (n - 1)} \]

\[ \bar{R}^2 = 1 - \frac{RSS}{TSS} \cdot \frac{n - 1}{n - k - 1} = 1 - (1 - R^2) \cdot \frac{n - 1}{n - k - 1} \]
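
The last formula makes adjusted R2 easy to compute by hand from a reported R2; a sketch with hypothetical numbers:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 from R^2, sample size n, and k slope coefficients."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical fits on n = 30 observations:
print(adjusted_r2(0.40, n=30, k=1))  # ~0.379: small penalty for one slope
print(adjusted_r2(0.42, n=30, k=6))  # ~0.269: higher R^2, lower adjusted R^2
```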
An example: our model of SALARY as a function
of POINTS from Lecture 8.
Naturally, we think performance will be linked to
pay, a result borne out in the estimates…
But what happens when we attempt to maximize R2 by adding extraneous independent variables? Naturally, we expect R2 might increase…
Thus, adjusted R2 (or R-bar-squared) penalizes for
having lots of independent variables (i.e. few
degrees of freedom).
It can increase, decrease, or stay the same when
we add an extra regressor to the model.
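
A small simulation sketch (invented data again) shows the two statistics moving apart: adding noise regressors nudges R2 up while adjusted R2 typically falls:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)

def fit_r2s(X, y):
    """Return (R^2, adjusted R^2) from an OLS fit; X includes a constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - rss / tss
    k = X.shape[1] - 1  # number of slope coefficients
    adj = 1.0 - (1.0 - r2) * (len(y) - 1) / (len(y) - k - 1)
    return r2, adj

X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([X1, rng.normal(size=(n, 3))])  # three noise regressors

print(fit_r2s(X1, y))  # baseline fit
print(fit_r2s(X2, y))  # R^2 weakly higher; adjusted R^2 usually lower
```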
Like R2, adjusted R2 is less than one, but it is not
necessarily positive (i.e., if R2 is very close to zero
to begin with, adjusted R2 can be negative).
Conveniently, it can be used to compare the fits of regressions with the same dependent variable and different numbers of independent variables. But it is not "the final word": we must also assess the economic and statistical significance of our estimates.