Proof of the rms error of regression formula

RMS Error of Regression
Larry Wang
February 23, 2009
We will prove that the rms error of regression is given by $\sqrt{1 - r^2}\,\sigma_y$.
Let $\bar{x}$ and $\bar{y}$ denote the means of lists $x$ and $y$, respectively, and let $\sigma_x$ and $\sigma_y$ denote their standard deviations. For convenience, we also define a sum of squares term
\[
ss_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2
\]
and likewise for $ss_{yy}$. Why two $x$'s and two $y$'s? Because we want to define a sum of products term
\[
ss_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y}.
\]
Note that $\sigma_x = \sqrt{\dfrac{ss_{xx}}{n}}$ and thus $ss_{xx} = n\sigma_x^2$. The same is of course true for $y$.
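The definitions above can be checked numerically. The following is a minimal Python sketch, not part of the original proof: the data lists `xs` and `ys` and the helper names `mean`, `ss`, and `ss_shortcut` are made up for illustration. It compares the deviation form of each sum with its shortcut form and confirms that the sum of squares equals n times the squared SD, where the SD uses divisor n.

```python
import math

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def mean(values):
    """Arithmetic mean of a list."""
    return sum(values) / len(values)


def ss(a, b):
    """Deviation form: sum of (a_i - mean(a)) * (b_i - mean(b))."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def ss_shortcut(a, b):
    """Shortcut form: sum of a_i * b_i, minus n * mean(a) * mean(b)."""
    return sum(ai * bi for ai, bi in zip(a, b)) - len(a) * mean(a) * mean(b)


n = len(xs)
ss_xx = ss(xs, xs)  # sum of squares for x
ss_yy = ss(ys, ys)  # sum of squares for y
ss_xy = ss(xs, ys)  # sum of products

# SD with divisor n, as in the text: sigma_x = sqrt(ss_xx / n).
sigma_x = math.sqrt(ss_xx / n)

print(math.isclose(ss_xx, ss_shortcut(xs, xs)))  # deviation form vs shortcut
print(math.isclose(ss_xy, ss_shortcut(xs, ys)))
print(math.isclose(ss_xx, n * sigma_x ** 2))
```

Calling `ss` with the same list twice gives a sum of squares, and calling it with two different lists gives the sum of products, which mirrors the "two x's and two y's" notation.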
The regression estimate is given by $\text{predicted } y = mx + \bar{y} - m\bar{x}$, where $m = \dfrac{r\sigma_y}{\sigma_x}$. Let's find the rms error of this estimate, $\sqrt{\dfrac{1}{n}\sum (y_i - \text{predicted } y)^2}$.
First, we find the sum of squares of the errors, $\sum (y_i - \text{predicted } y)^2$.
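\[
\begin{aligned}
\sum (y_i - \text{predicted } y)^2
&= \sum (y_i - \bar{y} + m\bar{x} - mx_i)^2 \\
&= \sum \bigl((y_i - \bar{y}) - m(x_i - \bar{x})\bigr)^2 \\
&= \sum (y_i - \bar{y})^2 + m^2 \sum (x_i - \bar{x})^2 - 2m \sum (x_i - \bar{x})(y_i - \bar{y}) \\
&= ss_{yy} + m^2\,ss_{xx} - 2m\,ss_{xy}
\end{aligned}
\]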
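The expansion above is pure algebra, so it holds for any slope m, not only the regression slope. A short Python sketch can confirm this numerically. The data and the trial slopes are arbitrary illustrative values, and the function names `sse_direct` and `sse_expanded` are hypothetical.

```python
import math

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))


def sse_direct(m):
    """Sum of squared errors for the line through (x_bar, y_bar) with slope m."""
    return sum((y - (m * x + y_bar - m * x_bar)) ** 2 for x, y in zip(xs, ys))


def sse_expanded(m):
    """The same quantity written as ss_yy + m^2 ss_xx - 2 m ss_xy."""
    return ss_yy + m ** 2 * ss_xx - 2 * m * ss_xy


# Arbitrary trial slopes: the identity does not depend on the choice of m.
for m in (-1.0, 0.0, 0.6, 2.5):
    print(m, math.isclose(sse_direct(m), sse_expanded(m)))
```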
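Then the rms error is given by $\sqrt{\dfrac{1}{n}\left(ss_{yy} + m^2\,ss_{xx} - 2m\,ss_{xy}\right)}$. We wish to show that this is equal to $\sqrt{1 - r^2}\,\sigma_y$. Since $\sigma_y^2 = ss_{yy}/n$, squaring both sides and multiplying by $n$ shows that we need $ss_{yy} + m^2\,ss_{xx} - 2m\,ss_{xy} = (1 - r^2)\,ss_{yy}$. This is equivalent to
\[
r^2\,ss_{yy} = 2m\,ss_{xy} - m^2\,ss_{xx}.
\]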
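\[
r = \frac{ss_{xy}}{n\sigma_x\sigma_y}
\]
\[
m = \frac{r\sigma_y}{\sigma_x} = \frac{ss_{xy}}{n\sigma_x^2} = \frac{ss_{xy}}{ss_{xx}}
\]
\[
r^2 = \left(\frac{ss_{xy}}{n\sigma_x\sigma_y}\right)^2 = \frac{ss_{xy}^2}{n\sigma_x^2 \, n\sigma_y^2} = \frac{ss_{xy}^2}{ss_{xx}\,ss_{yy}}
\]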
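Now take $2m\,ss_{xy} - m^2\,ss_{xx}$:
\[
2\,\frac{ss_{xy}}{ss_{xx}}\,ss_{xy} - \left(\frac{ss_{xy}}{ss_{xx}}\right)^2 ss_{xx}
= 2\,\frac{ss_{xy}^2}{ss_{xx}} - \frac{ss_{xy}^2}{ss_{xx}}
= \frac{ss_{xy}^2}{ss_{xx}}
= r^2\,ss_{yy}
\]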
This is exactly what we wanted. So the rms error of regression is indeed given by $\sqrt{1 - r^2}\,\sigma_y$.
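As a sanity check on the final result, the sketch below computes the rms error of the regression line directly from the residuals and compares it with the closed form. It is an illustration under stated assumptions rather than part of the proof: the data are hypothetical, and the SDs use divisor n, matching the definitions in the text.

```python
import math

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# SDs with divisor n, correlation coefficient, and regression slope.
sigma_x = math.sqrt(ss_xx / n)
sigma_y = math.sqrt(ss_yy / n)
r = ss_xy / (n * sigma_x * sigma_y)
m = r * sigma_y / sigma_x

# rms error computed directly from the residuals of the regression line.
residuals = [y - (m * x + y_bar - m * x_bar) for x, y in zip(xs, ys)]
rms_direct = math.sqrt(sum(e ** 2 for e in residuals) / n)

# rms error from the formula proved above.
rms_formula = math.sqrt(1 - r ** 2) * sigma_y

print(rms_direct, rms_formula, math.isclose(rms_direct, rms_formula))
```

Agreement on one data set does not prove the formula, but a mismatch would point to an algebra or transcription error.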