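RMS Error of Regression

Larry Wang

February 23, 2009

We will prove that the rms error of regression is given by $\sqrt{1 - r^2}\,\sigma_y$.

Let $\bar{x}$ and $\bar{y}$ denote the means of lists $x$ and $y$, respectively, and let $\sigma_x$ and $\sigma_y$ denote their standard deviations. For convenience, we also define a sum of squares term
\[
ss_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2
\]
and likewise for $ss_{yy}$. Why two x's and two y's? Because we want to define a sum of products term
\[
ss_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y}.
\]

Note that $\sigma_x = \sqrt{\dfrac{ss_{xx}}{n}}$ and thus $ss_{xx} = n\sigma_x^2$. The same is of course true for $y$.

The regression estimate is given by $\text{predicted } y = mx + \bar{y} - m\bar{x}$, where $m = \dfrac{r\sigma_y}{\sigma_x}$. Let's find the rms error of this estimate,
\[
\sqrt{\frac{1}{n}\sum (y_i - \text{predicted } y)^2}.
\]

First, we find the sum of squares of the errors, $\sum (y_i - \text{predicted } y)^2$.
\begin{align*}
\sum (y_i - \text{predicted } y)^2 &= \sum (y_i - \bar{y} + m\bar{x} - mx_i)^2 \\
&= \sum \bigl((y_i - \bar{y}) - m(x_i - \bar{x})\bigr)^2 \\
&= \sum (y_i - \bar{y})^2 + m^2 \sum (x_i - \bar{x})^2 - 2m \sum (x_i - \bar{x})(y_i - \bar{y}) \\
&= ss_{yy} + m^2 ss_{xx} - 2m\,ss_{xy}
\end{align*}

Then the rms error is given by
\[
\sqrt{\frac{1}{n}\left(ss_{yy} + m^2 ss_{xx} - 2m\,ss_{xy}\right)}.
\]
We wish to show that this is equal to $\sqrt{1 - r^2}\,\sigma_y$. Since $\sigma_y^2 = ss_{yy}/n$, we need to show that
\[
ss_{yy} + m^2 ss_{xx} - 2m\,ss_{xy} = (1 - r^2)\,ss_{yy}.
\]
This is equivalent to
\[
r^2 ss_{yy} = 2m\,ss_{xy} - m^2 ss_{xx}.
\]

To check this, we first write $r$, $m$, and $r^2$ in terms of the sums of squares and products:
\begin{align*}
r &= \frac{ss_{xy}}{n\sigma_x\sigma_y} \\
m &= \frac{r\sigma_y}{\sigma_x} = \frac{ss_{xy}}{n\sigma_x^2} = \frac{ss_{xy}}{ss_{xx}} \\
r^2 &= \left(\frac{ss_{xy}}{n\sigma_x\sigma_y}\right)^2 = \frac{ss_{xy}^2}{n\sigma_x^2 \cdot n\sigma_y^2} = \frac{ss_{xy}^2}{ss_{xx}\,ss_{yy}}
\end{align*}

Now take $2m\,ss_{xy} - m^2 ss_{xx}$.
\begin{align*}
2m\,ss_{xy} - m^2 ss_{xx} &= 2\,\frac{ss_{xy}}{ss_{xx}}\,ss_{xy} - \left(\frac{ss_{xy}}{ss_{xx}}\right)^2 ss_{xx} \\
&= \frac{2\,ss_{xy}^2}{ss_{xx}} - \frac{ss_{xy}^2}{ss_{xx}} \\
&= \frac{ss_{xy}^2}{ss_{xx}} = r^2 ss_{yy}
\end{align*}

This is exactly what we wanted. So the rms error of regression is indeed given by $\sqrt{1 - r^2}\,\sigma_y$.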
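
As a sanity check on the result, the identity can be tested numerically. The following is a minimal Python sketch, not part of the original proof: the function names and the sample lists are made up for illustration, and the standard deviations divide by n, as in the definitions used here. One function computes the rms error directly from the residuals of the regression line, the other evaluates the closed form, and the two should agree up to floating-point rounding.

```python
import math


def sums_of_squares(xs, ys):
    """Return (n, x_bar, y_bar, ss_xx, ss_yy, ss_xy) for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return n, x_bar, y_bar, ss_xx, ss_yy, ss_xy


def rms_error_direct(xs, ys):
    """RMS of the residuals y_i - (m*x_i + y_bar - m*x_bar), with m = ss_xy / ss_xx."""
    n, x_bar, y_bar, ss_xx, _, ss_xy = sums_of_squares(xs, ys)
    m = ss_xy / ss_xx
    residuals = [y - (m * x + y_bar - m * x_bar) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e * e for e in residuals) / n)


def rms_error_formula(xs, ys):
    """Closed form sqrt(1 - r^2) * sigma_y, with sigma_y = sqrt(ss_yy / n)."""
    n, _, _, ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    sigma_y = math.sqrt(ss_yy / n)
    return math.sqrt(1 - r * r) * sigma_y


if __name__ == "__main__":
    # Illustrative data only; any lists with non-constant x and y would do.
    xs = [1, 2, 3, 4, 5]
    ys = [2, 1, 4, 3, 5]
    print(rms_error_direct(xs, ys), rms_error_formula(xs, ys))
```

For these sample lists, $ss_{xx} = ss_{yy} = 10$ and $ss_{xy} = 8$, so $m = r = 0.8$ and both computations reduce to $\sqrt{0.72}$.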