Partitioning the Sums of Squares for Linear Regression
MA 223

Let $(x_k, y_k)$, $1 \le k \le n$, be any set of points in the plane. As we showed in class, the best fit line $y = a + bx$ through this data is obtained when
\[ b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}, \]
where as usual $\bar{x}$ and $\bar{y}$ denote the sample means for the $x_k$ and $y_k$ respectively and
\[ S_{xx} = \sum_k (x_k - \bar{x})^2, \qquad S_{xy} = \sum_k (x_k - \bar{x})(y_k - \bar{y}). \]
Recall also that the total sum of squares for the data is
\[ SS_{total} = \sum_k (y_k - \bar{y})^2 \]
and the residual sum of squares is
\[ SS_{res} = \sum_k (\hat{y}_k - y_k)^2 = \sum_k (b x_k + a - y_k)^2. \]
In the text it is claimed on page 119 that
\[ SS_{total} = SS_{res} + b S_{xy}. \qquad (1) \]
Note that since $b = S_{xy}/S_{xx}$, the quantity $b S_{xy} = S_{xy}^2/S_{xx}$ is always non-negative, so that $SS_{total} \ge SS_{res}$ is always true.

Equation (1) isn't too hard to prove. First, I claim that we may as well assume that $\bar{x} = 0$ and $\bar{y} = 0$. To see this, recall from class that we can add any constant $p$ to each $x_k$ and the quantity $x_k - \bar{x}$ is unchanged for any $k$. Similarly we can add any constant $q$ to each $y_k$ and the quantity $y_k - \bar{y}$ is unchanged for any $k$. This leaves $S_{xx}$, $S_{yy}$ and $S_{xy}$ unchanged, and of course then $b = S_{xy}/S_{xx}$ and $SS_{total}$ are also unchanged. However, $\bar{x}$ changes to $\bar{x} + p$ and $\bar{y}$ changes to $\bar{y} + q$, so that $a$ changes to $a + q - bp$. Then the quantity $(b x_k + a - y_k)$ changes to
\[ b(x_k + p) + (a + q - bp) - (y_k + q) = b x_k + a - y_k, \]
that is, remains unchanged. Thus $SS_{res}$ also remains unchanged.

The upshot of all this is that adding any constant to the $x_k$ and/or $y_k$ leaves the essential quantities of interest $b$, $S_{xx}$, $S_{xy}$, $SS_{total}$, and $SS_{res}$ unchanged. As a consequence we may as well assume, for simplicity, that $\bar{x} = \bar{y} = 0$, since we can always add appropriate constants to make this true. In this case we have
\[ b = \frac{\sum_k x_k y_k}{\sum_k x_k^2}, \qquad a = 0, \qquad SS_{total} = \sum_k y_k^2, \qquad SS_{res} = \sum_k (b x_k - y_k)^2, \qquad S_{xy} = \sum_k x_k y_k. \]
Then
\[ SS_{res} + b S_{xy} = \sum_k (b x_k - y_k)^2 + \frac{\left(\sum_k x_k y_k\right)^2}{\sum_k x_k^2} \]
\[ = b^2 \sum_k x_k^2 - 2b \sum_k x_k y_k + \sum_k y_k^2 + \frac{\left(\sum_k x_k y_k\right)^2}{\sum_k x_k^2} \]
\[ = \frac{\left(\sum_k x_k y_k\right)^2}{\sum_k x_k^2} - 2\,\frac{\left(\sum_k x_k y_k\right)^2}{\sum_k x_k^2} + \sum_k y_k^2 + \frac{\left(\sum_k x_k y_k\right)^2}{\sum_k x_k^2} \]
\[ = \sum_k y_k^2 = SS_{total}. \]
This demonstrates equation (1).
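As a numerical sanity check on equation (1) and on the shift argument used in the proof, here is a minimal Python sketch. The function name `fit_and_partition`, the synthetic data, and the shift constants `p` and `q` are illustrative assumptions and are not part of the handout; the formulas inside are the ones defined above.

```python
import random


def fit_and_partition(xs, ys):
    """Return b, a, S_xy, SS_total and SS_res for the least-squares line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx  # requires that the x values are not all equal
    a = y_bar - b * x_bar
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_res = sum((b * x + a - y) ** 2 for x, y in zip(xs, ys))
    return {"b": b, "a": a, "s_xy": s_xy, "ss_total": ss_total, "ss_res": ss_res}


# Synthetic data (an assumption for illustration): a noisy line.
random.seed(0)
xs = [random.uniform(-5, 5) for _ in range(20)]
ys = [2.0 * x + 1.0 + random.gauss(0, 1) for x in xs]

r = fit_and_partition(xs, ys)

# Equation (1): SS_total - (SS_res + b * S_xy) should vanish up to rounding.
gap = r["ss_total"] - (r["ss_res"] + r["b"] * r["s_xy"])

# Shift every x by p and every y by q; b, SS_total and SS_res should not move,
# while the intercept a should become a + q - b * p.
p, q = 3.0, -7.0
r_shifted = fit_and_partition([x + p for x in xs], [y + q for y in ys])

print("gap in equation (1):", gap)
print("change in SS_res after shift:", r_shifted["ss_res"] - r["ss_res"])
```

Both printed values should be zero up to floating-point rounding, which matches the claim that shifting the data changes only the intercept.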