The Coefficient of Determination
Lecture 46, Section 13.9
Robb T. Koether
Hampden-Sydney College
Tue, Apr 13, 2010

Outline
1. The Regression Identity
2. Sums of Squares on the TI-83
3. Explaining Variation
4. TI-83 - The Coefficient of Determination
5. Assignment

Explaining the Variation in y

Statisticians use regression models to “explain” y. More specifically, through the model they use variation in x to explain variation in y.

For example, why do some people weigh more than other people? One explanation is that some people weigh more than others because they are taller. That is, there is variation in weight because there is variation in height and because weight and height are correlated. But that is only a partial explanation.

Statisticians want to quantify how much of the variation in y is explained by the variation in x.

The Regression Identity

As always, variation is measured by calculating a sum of squared deviations. There are three different deviations that we can measure:
- Deviations of y from ȳ (variation in the data).
- Deviations of ŷ from ȳ (variation in the model).
- Deviations of y from ŷ (difference between the data and the model).
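As a brief aside that the slides do not spell out, the three deviations fit together point by point: the deviation of y from ȳ splits into a model part and a residual part. The second display relies on a standard property of least-squares lines with an intercept (the cross term sums to zero), which is assumed here rather than derived.

```latex
\[
  y - \bar{y} = (\hat{y} - \bar{y}) + (y - \hat{y})
\]
\[
  \sum (y - \bar{y})^2
    = \sum (\hat{y} - \bar{y})^2
    + \sum (y - \hat{y})^2
    + 2 \sum (\hat{y} - \bar{y})(y - \hat{y}),
  \qquad
  \sum (\hat{y} - \bar{y})(y - \hat{y}) = 0 .
\]
```

Because the cross term vanishes, the sum of squares for the data equals the sum of squares for the model plus the sum of squares for the residuals.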
The three sums of squares are:
- Variation in the data (total sum of squares): $\mathrm{SST} = \sum (y - \bar{y})^2$.
- Variation in the model (regression sum of squares): $\mathrm{SSR} = \sum (\hat{y} - \bar{y})^2$.
- Residuals (sum of squared errors): $\mathrm{SSE} = \sum (y - \hat{y})^2$.

Example - SST, SSR, and SSE

The following data represent the heights (in inches) and weights (in pounds) of 10 adult males.

Height (x)   Weight (y)
    70          185
    65          140
    71          180
    76          220
    68          150
    67          170
    68          185
    72          200
    74          210
    69          160

The regression line is ŷ = −310 + 7x. The model predicts, for example, that if a person is 70 inches tall, he will weigh 180 pounds. The model also predicts that a person will weigh an additional 7 pounds for each additional inch of height.

Compute the predicted weight: Y1(L1) → L3.

Height (x)   Weight (y)   Pred. Wgt. (ŷ)
    70          185            180
    65          140            145
    71          180            187
    76          220            222
    68          150            166
    67          170            159
    68          185            166
    72          200            194
    74          210            208
    69          160            173

[Figure: the regression line plotted over the data, height 64 to 76 on the horizontal axis and weight 140 to 220 on the vertical axis]
[Figure: the deviations of y from ȳ]
[Figure: the deviations of ŷ from ȳ]
[Figure: the deviations of y from ŷ]

Example: Compute SST

The mean weight is ȳ = 180. Compute the deviations with L2 − ȳ, square them with Ans², and add them with sum(Ans).

 x     y    y − ȳ   (y − ȳ)²
70   185      5        25
65   140    −40      1600
71   180      0         0
76   220     40      1600
68   150    −30       900
67   170    −10       100
68   185      5        25
72   200     20       400
74   210     30       900
69   160    −20       400
             Total    5950

So SST = 5950.

Example: Compute SSR

Store the predicted values with Y1(L1) → L3, then compute the deviations L3 − ȳ.
The predicted values ŷ, the deviations ŷ − ȳ, their squares (Ans²), and the total (sum(Ans)) are:

 x     y     ŷ    ŷ − ȳ   (ŷ − ȳ)²
70   185   180      0         0
65   140   145    −35      1225
71   180   187      7        49
76   220   222     42      1764
68   150   166    −14       196
67   170   159    −21       441
68   185   166    −14       196
72   200   194     14       196
74   210   208     28       784
69   160   173     −7        49
                   Total    4900

So SSR = 4900.

Example: Compute SSE

Store the predicted values with Y1(L1) → L3, compute the residuals with L2 − L3 → L4, and square them with Ans².

 x     y     ŷ    y − ŷ   (y − ŷ)²
70   185   180      5        25
65   140   145     −5        25
71   180   187     −7        49
76   220   222     −2         4
68   150   166    −16       256
67   170   159     11       121
68   185   166     19       361
72   200   194      6        36
74   210   208      2         4
69   160   173    −13       169

Finally, evaluate sum(Ans).
The squared residuals sum to SSE = 25 + 25 + 49 + 4 + 256 + 121 + 361 + 36 + 4 + 169 = 1050.

Example

We have now found that
- SSR = 4900.
- SSE = 1050.
- SST = 5950.
We see that SSR + SSE = SST (4900 + 1050 = 5950). This is called the regression identity.

TI-83 - Finding SSR, SSE, and SST

- Put the x values into L1 and the y values into L2.
- Use LinReg(a+bx) L1,L2,Y1.
- Enter Y1(L1)→L3.
- To get SSR, evaluate sum((L3-ȳ)²).
- To get SSE, evaluate sum((L2-L3)²).
- To get SST, evaluate sum((L2-ȳ)²).

Explaining Variation

One goal of regression is to “explain” the variation in y. For example, if y were weight, how would we explain the variation in weight? That is, why do some people weigh more than others? A partial answer is that some people weigh more because they are taller. That is, an explanatory variable is height x. What are some other partial answers?

How much of the variation in weight is explained by variation in height? The total variation in weight is SST.
The linear model (the regression line) explains some of the variation. The model predicts the variation SSR. The remainder is SSE, the variation not predicted by the model.

Statisticians consider the predicted variation SSR to be the amount of variation in y that is explained by the model. The residual variation SSE is the remaining variation in y that is not explained by the model. The accounting is complete because SST = SSR + SSE.

Variation Explained by the Model

[Figure: the regression line]
[Figure: the total variation in y (SST)]
[Figure: the variation in y that is explained by the model (SSR)]
[Figure: the variation in y that is unexplained by the model (SSE)]

Explaining Variation

It can be shown that
$r^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}$
and, therefore,
$1 - r^2 = \frac{\mathrm{SSE}}{\mathrm{SST}}$.
Therefore, r² is the proportion of variation in y that is explained by the model. It is called the coefficient of determination. 1 − r² is the proportion that is not explained by the model.
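The calculator steps and the ratio r² = SSR/SST can be mirrored in a few lines of Python. This is a minimal sketch, not part of the lecture: the function names fit_line and sums_of_squares are hypothetical, and the fit uses the textbook least-squares formulas rather than any TI-83 routine.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line through the data."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    y_hat = [a + b * x for x in xs]                        # predicted values
    sst = sum((y - y_bar) ** 2 for y in ys)                # variation in the data
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # variation in the model
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # residual variation
    return sst, ssr, sse


# Height (in.) and weight (lb) data from the lecture example.
heights = [70, 65, 71, 76, 68, 67, 68, 72, 74, 69]
weights = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]

a, b = fit_line(heights, weights)
sst, ssr, sse = sums_of_squares(heights, weights)
r_squared = ssr / sst  # coefficient of determination

print(f"y-hat = {a:.0f} + {b:.0f}x")
print(f"SST = {sst:.0f}, SSR = {ssr:.0f}, SSE = {sse:.0f}")
print(f"r^2 = {r_squared:.4f}")
```

For the height and weight data this gives SST = 5950, SSR = 4900, SSE = 1050, and r² ≈ 0.8235, so roughly 82% of the variation in weight is explained by variation in height.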
TI-83 - Coefficient of Determination

To calculate r² on the TI-83, follow the procedure that produces the regression line and r. In the same window, the TI-83 reports the value of r².

Practice

The data below represent crude oil prices (x) vs. gasoline prices (y).
- Draw the scatter plot.
- Find the equation of the regression line.
- Perform the residual analysis.
- Find the correlation coefficient.
- Find the coefficient of determination.
- Compute SST, SSR, and SSE.

Crude oil data: http://tonto.eia.doe.gov/dnav/pet/xls/PET_PRI_WCO_K_W.xls
Gasoline data: http://tonto.eia.doe.gov/oog/ftparea/wogirs/xls/pswrgvwrec.xls

Date     Crude Oil    Date     Gasoline
Jan 16     40.98      Jan 19    1.833
Jan 23     41.05      Jan 26    1.833
Jan 30     42.07      Feb 2     1.894
Feb 6      41.77      Feb 9     1.926
Feb 13     43.04      Feb 16    1.970
Feb 20     39.87      Feb 23    1.924
Feb 27     40.22      Mar 2     1.942
Mar 6      42.85      Mar 9     1.936
Mar 13     42.91      Mar 16    1.921
Mar 20     44.90      Mar 23    1.950
Mar 27     50.10      Mar 30    2.048
Apr 3      48.09      Apr 9     2.044

Find SST, SSR, and SSE. Find r² and interpret the value.
Assignment

Homework
- Read Section 13.9, pages 868-869.
- Work the practice problem above.

Answers to Even-Numbered Exercises

SST = 0.0490, SSR = 0.0321, SSE = 0.0169.
r² = 0.6544. About 65.44% of the variation in gas prices is explained by variation in oil prices.
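The practice answers can be checked with a short script. This is a sketch under two assumptions: the crude oil and gasoline columns pair up row by row as listed in the practice table (the dates differ by a few days), and the fit is an ordinary least-squares line. The helper name regression_summary is hypothetical.

```python
def regression_summary(xs, ys):
    """Return (SST, SSR, SSE, r^2) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    y_hat = [a + b * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    return sst, ssr, sse, ssr / sst


# Practice data, paired row by row (an assumption about the table layout).
crude_oil = [40.98, 41.05, 42.07, 41.77, 43.04, 39.87,
             40.22, 42.85, 42.91, 44.90, 50.10, 48.09]
gasoline = [1.833, 1.833, 1.894, 1.926, 1.970, 1.924,
            1.942, 1.936, 1.921, 1.950, 2.048, 2.044]

sst, ssr, sse, r2 = regression_summary(crude_oil, gasoline)
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"r^2 = {r2:.4f}")
```

Under these assumptions the results should agree, up to rounding, with the reported answers (SST = 0.0490, SSR = 0.0321, SSE = 0.0169, r² = 0.6544).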