Underfitting and Overfitting
© 2012 Dan Nettleton (Iowa State University), Statistics 611
Underfitting
Suppose the true model is
y = Xβ + η + ε,
where η is an unknown fixed vector and ε satisfies the Gauss–Markov model (GMM) assumptions.
Suppose we incorrectly assume that
y = Xβ + ε.
This example of misspecifying the model is known as underfitting.
Note that η may equal Wα for some design matrix W whose columns
could contain explanatory variables excluded from X.
What are the implications of underfitting?
Find E(c′β̂).
Find E(σ̂²).
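These expectations can be checked numerically. Below is a minimal Python sketch, assuming X has full column rank so that β̂ = (X′X)⁻¹X′y; the design, η, and all numbers are arbitrary illustrations. It confirms that E(c′β̂) = c′β + c′(X′X)⁻¹X′η and E(σ̂²) = σ² + η′(I − P_X)η/(n − r).

```python
import numpy as np

# Minimal sketch, assuming X has full column rank; all numbers are
# arbitrary illustrations.
rng = np.random.default_rng(0)
n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
eta = rng.normal(size=n)              # omitted structure (unknown to the analyst)
sigma2 = 2.0
c = np.array([0.0, 1.0, -1.0])

Ey = X @ beta + eta                   # true mean of y under underfitting
H = np.linalg.inv(X.T @ X) @ X.T      # beta_hat = H y
M = np.eye(n) - X @ H                 # I - P_X

# E(c'beta_hat) = c'H E(y) = c'beta + c'H eta: biased in general.
print(c @ H @ Ey, c @ beta + c @ (H @ eta))

# E(sigma2_hat) = sigma^2 + eta'(I - P_X)eta / (n - r), using
# E(y'My) = tr(M Var(y)) + E(y)'M E(y) with Var(y) = sigma^2 I.
print((sigma2 * np.trace(M) + Ey @ M @ Ey) / (n - p),
      sigma2 + eta @ M @ eta / (n - p))
```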
Example 1
Consider an experiment with two experimental units (mice in this case)
for each of two treatments.
We might assume the GMM holds with

$$
E(y) = E\begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{bmatrix}
= X\beta
= \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix}
= \begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_2 \\ \mu + \tau_2 \end{bmatrix}.
$$
Example 1 (continued)
Suppose the person who conducted the experiment neglected to
mention that, in each treatment group, one of the experimental units
was male and the other was female.
Example 1 (continued)
Then the true model may require

$$
E(y) = \begin{bmatrix} \mu + \tau_1 + \alpha/2 \\ \mu + \tau_1 - \alpha/2 \\ \mu + \tau_2 + \alpha/2 \\ \mu + \tau_2 - \alpha/2 \end{bmatrix}
= \begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_2 \\ \mu + \tau_2 \end{bmatrix}
+ \begin{bmatrix} \alpha/2 \\ -\alpha/2 \\ \alpha/2 \\ -\alpha/2 \end{bmatrix}
= X\beta + \begin{bmatrix} 1/2 \\ -1/2 \\ 1/2 \\ -1/2 \end{bmatrix}\alpha
= X\beta + W\alpha = X\beta + \eta.
$$
Example 1 (continued)
If we analyze the data assuming the GMM with E(y) = Xβ, determine

1. E(τ̂₁ − τ̂₂), where τ̂₁ − τ̂₂ denotes the least squares estimator of τ₁ − τ₂, and
2. E(σ̂²).
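Here is a numerical sketch of where these two expectations land for Example 1 (the values of μ, τ₁, τ₂, α, σ² are arbitrary illustrations). The key structural fact is that X′η = 0, so the misspecification is orthogonal to C(X).

```python
import numpy as np

# Example 1 sketch; mu, tau1, tau2, alpha, sigma2 are arbitrary illustrations.
mu, tau1, tau2, alpha, sigma2 = 5.0, 2.0, -1.0, 3.0, 4.0
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
eta = alpha * np.array([0.5, -0.5, 0.5, -0.5])   # eta = W alpha
Ey = X @ np.array([mu, tau1, tau2]) + eta        # true mean of y

print(X.T @ eta)                                 # all zeros: eta is orthogonal to C(X)

# The LSE of tau1 - tau2 is ybar_1. - ybar_2. = a'y; unbiased since X'eta = 0.
a = np.array([0.5, 0.5, -0.5, -0.5])
print(a @ Ey, tau1 - tau2)

# E(sigma2_hat) = sigma^2 + eta'(I - P_X)eta/(n - r) = sigma^2 + alpha^2/2.
M = np.eye(4) - X @ np.linalg.pinv(X)            # I - P_X (X is not full rank)
nr = 4 - np.linalg.matrix_rank(X)                # n - r = 4 - 2 = 2
print(sigma2 + eta @ M @ eta / nr, sigma2 + alpha**2 / 2)
```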
Example 2
Once again consider an experiment with two experimental units (mice)
for each of two treatments.
Suppose we assume the GMM holds with

$$
E(y) = E\begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{bmatrix}
= X\beta
= \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix}
= \begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_2 \\ \mu + \tau_2 \end{bmatrix}.
$$
Example 2 (continued)
Suppose the person who conducted the experiment neglected to
mention that both experimental units in treatment group 1 were female
and that both experimental units in treatment group 2 were male.
Example 2 (continued)
Then the true model may require

$$
E(y) = \begin{bmatrix} \mu + \tau_1 + \alpha/2 \\ \mu + \tau_1 + \alpha/2 \\ \mu + \tau_2 - \alpha/2 \\ \mu + \tau_2 - \alpha/2 \end{bmatrix}
= \begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_1 \\ \mu + \tau_2 \\ \mu + \tau_2 \end{bmatrix}
+ \begin{bmatrix} \alpha/2 \\ \alpha/2 \\ -\alpha/2 \\ -\alpha/2 \end{bmatrix}
= X\beta + \begin{bmatrix} 1/2 \\ 1/2 \\ -1/2 \\ -1/2 \end{bmatrix}\alpha
= X\beta + W\alpha = X\beta + \eta.
$$
Example 2 (continued)
If we analyze the data assuming the GMM with E(y) = Xβ, determine

1. E(τ̂₁ − τ̂₂), and
2. E(σ̂²).
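The contrasting sketch for Example 2 (same arbitrary numbers): here η = Wα lies in C(X), so the bias lands on the treatment contrast rather than on σ̂².

```python
import numpy as np

# Example 2 sketch; same arbitrary numbers as in the Example 1 sketch.
mu, tau1, tau2, alpha, sigma2 = 5.0, 2.0, -1.0, 3.0, 4.0
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
eta = alpha * np.array([0.5, 0.5, -0.5, -0.5])   # now eta lies in C(X)
Ey = X @ np.array([mu, tau1, tau2]) + eta

# The sex effect is confounded with treatment: the contrast is biased by alpha.
a = np.array([0.5, 0.5, -0.5, -0.5])             # LSE of tau1 - tau2 is a'y
print(a @ Ey, tau1 - tau2 + alpha)

# (I - P_X)eta = 0, so sigma2_hat remains unbiased.
M = np.eye(4) - X @ np.linalg.pinv(X)
print(sigma2 + eta @ M @ eta / (4 - np.linalg.matrix_rank(X)), sigma2)
```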
Overfitting
Now suppose we consider the model
y = Xβ + ε,
where X = [X₁, X₂] and

$$
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix},
\quad\text{so that}\quad
X\beta = X_1\beta_1 + X_2\beta_2.
$$

Furthermore, suppose that (unknown to us) X₂β₂ = 0.
In this case, we say that we are overfitting.
Note that we are fitting a model that is more complicated than it needs
to be.
To examine the impact of overfitting, consider the case where X = [X₁, X₂] has full column rank.
If we were to fit the simpler and correct model y = X₁β₁ + ε, the LSE of β₁ would be β̃₁ = (X₁′X₁)⁻¹X₁′y. Then

$$
E(\tilde\beta_1) = (X_1'X_1)^{-1}X_1'E(y) = (X_1'X_1)^{-1}X_1'X_1\beta_1 = \beta_1.
$$
$$
\operatorname{Var}(\tilde\beta_1) = (X_1'X_1)^{-1}X_1'\operatorname{Var}(y)X_1(X_1'X_1)^{-1}
= \sigma^2(X_1'X_1)^{-1}X_1'X_1(X_1'X_1)^{-1}
= \sigma^2(X_1'X_1)^{-1}.
$$
If we were to fit the full model

y = X₁β₁ + X₂β₂ + ε,

which is correct but more complicated than it needs to be, then the LSE of β is

$$
\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}
= \bigl([X_1, X_2]'[X_1, X_2]\bigr)^{-1}[X_1, X_2]'y
= \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1}
\begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}.
$$
If X₁′X₂ = 0, then

$$
\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}
= \begin{bmatrix} X_1'X_1 & 0 \\ 0 & X_2'X_2 \end{bmatrix}^{-1}
\begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}
= \begin{bmatrix} (X_1'X_1)^{-1}X_1'y \\ (X_2'X_2)^{-1}X_2'y \end{bmatrix}
= \begin{bmatrix} \tilde\beta_1 \\ (X_2'X_2)^{-1}X_2'y \end{bmatrix}.
$$
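A minimal sketch of this orthogonal case, with an arbitrary X₁ and a synthetic X₂ residualized against X₁ so that X₁′X₂ = 0 holds exactly:

```python
import numpy as np

# Orthogonal-case sketch: residualizing X2 against X1 forces X1'X2 = 0.
rng = np.random.default_rng(0)
n = 12
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
P1 = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
X2 = (np.eye(n) - P1) @ rng.normal(size=(n, 2))   # so X1'X2 = 0
y = rng.normal(size=n)                            # arbitrary response

beta_tilde1, *_ = np.linalg.lstsq(X1, y, rcond=None)
beta_hat, *_ = np.linalg.lstsq(np.column_stack([X1, X2]), y, rcond=None)
print(np.allclose(beta_hat[:2], beta_tilde1))     # True: beta1_hat = beta1_tilde
```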
Now suppose X₁′X₂ ≠ 0.
$$
E\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}
= (X'X)^{-1}X'E(y) = (X'X)^{-1}X'X\beta = \beta
= \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}.
$$

Thus, E(β̂₁) = β₁.
$$
\operatorname{Var}(\hat\beta) = \operatorname{Var}\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}
= \sigma^2(X'X)^{-1}
= \sigma^2\begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1}.
$$
By Exercise A.72,

$$
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
= \begin{bmatrix} A^{-1} + A^{-1}BE^{-1}CA^{-1} & -A^{-1}BE^{-1} \\ -E^{-1}CA^{-1} & E^{-1} \end{bmatrix},
$$

where E = D − CA⁻¹B.
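A quick numerical check of this partitioned-inverse identity on an arbitrary positive definite matrix (standing in for X′X):

```python
import numpy as np

# Check of the partitioned-inverse identity on an arbitrary PD matrix.
rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 5))
M = Z.T @ Z + np.eye(5)                  # positive definite, like X'X
A, B = M[:3, :3], M[:3, 3:]
C, D = M[3:, :3], M[3:, 3:]

Ai = np.linalg.inv(A)
Ei = np.linalg.inv(D - C @ Ai @ B)       # E = D - C A^{-1} B
Minv = np.block([[Ai + Ai @ B @ Ei @ C @ Ai, -Ai @ B @ Ei],
                 [-Ei @ C @ Ai,              Ei]])
print(np.allclose(Minv, np.linalg.inv(M)))   # True
```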
Thus, Var(β̂₁) is σ² times

$$
(X_1'X_1)^{-1} + (X_1'X_1)^{-1}X_1'X_2\bigl(X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2\bigr)^{-1}X_2'X_1(X_1'X_1)^{-1}
$$
$$
= (X_1'X_1)^{-1} + (X_1'X_1)^{-1}X_1'X_2\bigl(X_2'(I - P_{X_1})X_2\bigr)^{-1}X_2'X_1(X_1'X_1)^{-1}.
$$
Thus,

$$
\operatorname{Var}(\hat\beta_1) - \operatorname{Var}(\tilde\beta_1)
= \sigma^2(X_1'X_1)^{-1}X_1'X_2\bigl(X_2'(I - P_{X_1})X_2\bigr)^{-1}X_2'X_1(X_1'X_1)^{-1}.
$$
In a homework problem, you will show that Var(β̂₁) − Var(β̃₁) is nonnegative definite (NND).
Thus, one cost of overfitting is increased variability of estimators of
regression coefficients.
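Both the partitioned variance formula and the NND claim can be checked numerically; in this sketch X₁ and X₂ are arbitrary and correlated in general:

```python
import numpy as np

# Check: Var(beta1_hat) - Var(beta1_tilde) (divided by sigma^2) is NND,
# and matches the top-left block of (X'X)^{-1}. X1, X2 are arbitrary.
rng = np.random.default_rng(0)
n = 20
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X2 = rng.normal(size=(n, 2))

G11 = np.linalg.inv(X1.T @ X1)           # sigma^{-2} Var(beta1_tilde)
P1 = X1 @ G11 @ X1.T
Diff = (G11 @ X1.T @ X2
        @ np.linalg.inv(X2.T @ (np.eye(n) - P1) @ X2)
        @ X2.T @ X1 @ G11)               # the inflation term from the slide

X = np.column_stack([X1, X2])
top_left = np.linalg.inv(X.T @ X)[:3, :3]        # sigma^{-2} Var(beta1_hat)
print(np.allclose(top_left, G11 + Diff))         # True
print(np.linalg.eigvalsh(Diff).min() >= -1e-12)  # True: NND
```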
How is estimation of σ² affected?
Let r = rank(X), r₁ = rank(X₁), and r₂ = rank(X₂).
If we fit the simpler model y = X₁β₁ + ε, then

$$
\tilde\sigma^2 = \frac{y'(I - P_{X_1})y}{n - r_1},
$$

and

$$
E\bigl(y'(I - P_{X_1})y\bigr) = (n - r_1)\sigma^2 \;\Rightarrow\; E(\tilde\sigma^2) = \sigma^2.
$$
If we overfit with the model y = Xβ + ε, then

$$
\hat\sigma^2 = \frac{y'(I - P_X)y}{n - r},
$$

and E(σ̂²) = σ².

Thus, overfitting does not lead to biased estimation of σ².
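A closing sketch, with arbitrary dimensions and coefficients, contrasting the two fits when X₂β₂ = 0: both estimators of σ² have expectation exactly σ², but the overfit model leaves fewer residual degrees of freedom.

```python
import numpy as np

# Both fits give E(sigma^2 estimate) = sigma^2 when X2*beta2 = 0,
# but the overfit model has fewer residual df. All numbers arbitrary.
rng = np.random.default_rng(0)
n = 15
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X2 = rng.normal(size=(n, 3))
sigma2 = 2.0
Ey = X1 @ np.array([1.0, -1.0, 0.5])     # X2 contributes nothing to E(y)

for Xm in (X1, np.column_stack([X1, X2])):    # correct fit, then overfit
    r = np.linalg.matrix_rank(Xm)
    M = np.eye(n) - Xm @ np.linalg.pinv(Xm)   # I - P
    # E(y'My)/(n - r) = sigma^2, since M Ey = 0 here.
    print(n - r, (sigma2 * np.trace(M) + Ey @ M @ Ey) / (n - r))
```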
However, as we will see later in the course, overfitting leads to a loss of
degrees of freedom (n − r < n − r1 ), which can lead to a loss of power
for testing hypotheses about β.