Sec 10.3 Coefficient of Determination and Standard Error of the Estimate

Review concepts

Fill in the third row of the table for each x value.

x     1     2     3     4     5
y     10    8     12    16    20
y'

Completed table:

x     1     2     3     4     5
y     10    8     12    16    20
y'    7.6   10.4  13.2  16    18.8

[Figure: variation of a data point about the regression line]

Bluman, Chapter 10

Variations

The total variation is calculated by
\[ \sum (y - \bar{y})^2 \]
This is the sum of the squares of the vertical distances of the points from the mean.

2 parts of variation

The total variation is made up of two types of variation:
1. Explained variation: attributed to the relationship between x and y.
2. Unexplained variation: due to chance.

Explained variation
\[ \sum (y' - \bar{y})^2 \]
Most of the variation can be explained by the relationship.

Unexplained variation
\[ \sum (y - y')^2 \]
When this variation is small, the value of r will be close to 1 or -1.

The last few slides summarized:
\[ \underbrace{\sum (y - \bar{y})^2}_{\text{Total variation}} = \underbrace{\sum (y' - \bar{y})^2}_{\text{Explained variation}} + \underbrace{\sum (y - y')^2}_{\text{Unexplained variation}} \]

Residuals

The values of $(y - y')$ are called residuals. A residual is the difference between the actual value of y and the predicted value y' for a given x value. The mean of the residuals is always zero.

Coefficient of . . .
1. Determination, $r^2$
2. Nondetermination, $1 - r^2$

Coefficient of Determination, $r^2$

The coefficient of determination, $r^2$, is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable.

The coefficient of determination is the ratio of the explained variation to the total variation. The symbol for the coefficient of determination is $r^2$.
\[ r^2 = \frac{\text{explained variation}}{\text{total variation}} \]
Another way to arrive at the value for $r^2$ is to square the correlation coefficient.

Coefficient of Nondetermination

The coefficient of nondetermination measures the rest of the variation, the part that is not explained by $r^2$. It is the complement of $r^2$ and equals $1 - r^2$.
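As a rough numerical check of the variation decomposition, the sketch below recomputes the predicted values, the three sums of squares, and $r^2$ for the review data. The variable names are illustrative assumptions and do not come from the slides; the least-squares slope and intercept are computed from the data rather than taken from the text.

```python
# Review data from the "Review concepts" table.
x = [1, 2, 3, 4, 5]
y = [10, 8, 12, 16, 20]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least-squares slope b and intercept a for the line y' = a + b*x.
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_mean) ** 2 for xi in x)
b = sxy / sxx
a = y_mean - b * x_mean

# Predicted values y' and residuals y - y'.
y_pred = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, y_pred)]

# The three sums of squares from the slides.
total = sum((yi - y_mean) ** 2 for yi in y)           # total variation
explained = sum((pi - y_mean) ** 2 for pi in y_pred)  # explained variation
unexplained = sum(e ** 2 for e in residuals)          # unexplained variation

# Coefficient of determination and its complement.
r_squared = explained / total
nondetermination = 1 - r_squared

print("y' values:", [round(p, 1) for p in y_pred])
print("total, explained, unexplained:",
      round(total, 4), round(explained, 4), round(unexplained, 4))
print("r^2 and 1 - r^2:", round(r_squared, 4), round(nondetermination, 4))
```

For this data the total variation comes out near 92.8, split into about 78.4 explained and 14.4 unexplained, so $r^2$ is roughly 0.845. The residuals also sum to zero up to floating-point error, in line with the statement that their mean is zero.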
Coefficient of Nondetermination

The coefficient of nondetermination is a measure of the unexplained variation. The formula for the coefficient of nondetermination is $1.00 - r^2$.

Some facts:

The coefficient of determination is expressed as a percent. If $r^2 = 0.81$, then 81% of the variation in the dependent variable is explained by the variation in the independent variable. The coefficient of nondetermination is then 1 - 81% = 19%, and it means that 19% …

Example: Let r = 0.9123. Find the coefficients of determination and nondetermination. Explain the meaning of each.

Standard Error of the Estimate

Symbol: $s_{est}$

The standard error of the estimate, denoted by $s_{est}$, is the standard deviation of the observed y values about the predicted y' values. The formula for the standard error of the estimate is:
\[ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} \]
It can also be computed as:
\[ s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}} \]

Chapter 10 Correlation and Regression
Section 10-3
Example 10-12
Page #569

Example 10-12: Copy Machine Costs

A researcher collects the following data and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression equation is y' = 55.57 + 8.13x. Find the standard error of the estimate.
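The two formulas for $s_{est}$ can be compared numerically with a short sketch. It uses the six (age, cost) pairs from this example's data table and the rounded coefficients a = 55.57 and b = 8.13 from the regression equation; the function names are hypothetical, chosen only for illustration.

```python
import math

# Copy machine data from the example (age in years, monthly cost).
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]

# Rounded coefficients of the regression equation y' = a + b*x.
a, b = 55.57, 8.13


def s_est_from_residuals(x, y, a, b):
    """Standard error from the definition: sqrt(sum((y - y')^2) / (n - 2))."""
    squared = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(squared / (len(x) - 2))


def s_est_shortcut(x, y, a, b):
    """Standard error from sqrt((sum(y^2) - a*sum(y) - b*sum(xy)) / (n - 2))."""
    sum_y = sum(y)
    sum_y2 = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (len(x) - 2))


by_definition = s_est_from_residuals(ages, costs, a, b)
by_shortcut = s_est_shortcut(ages, costs, a, b)

print(round(by_definition, 2), round(by_shortcut, 2))
```

With exact least-squares coefficients the two formulas agree. Here they differ slightly in the second decimal place (about 6.51 versus 6.48) because a and b have been rounded to two decimals.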
Example 10-12: Copy Machine Costs

Machine   Age x (years)   Monthly cost y   y'        y - y'    (y - y')^2
A         1               62               63.70     -1.70     2.89
B         2               78               71.83      6.17     38.0689
C         3               70               79.96     -9.96     99.2016
D         4               90               88.09      1.91     3.6481
E         4               93               88.09      4.91     24.1081
F         6               103              104.35    -1.35     1.8225
                                                     Total     169.7392

The predicted values come from y' = 55.57 + 8.13x:
\[
\begin{aligned}
y' &= 55.57 + 8.13(1) = 63.70 \\
y' &= 55.57 + 8.13(2) = 71.83 \\
y' &= 55.57 + 8.13(3) = 79.96 \\
y' &= 55.57 + 8.13(4) = 88.09 \\
y' &= 55.57 + 8.13(6) = 104.35
\end{aligned}
\]
\[ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} = \sqrt{\frac{169.7392}{4}} = 6.51 \]

Chapter 10 Correlation and Regression
Section 10-3
Example 10-13
Page #570

Example 10-13: Copy Machine Costs

\[ s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}} \]

Machine   Age x (years)   Monthly cost y   xy      y^2
A         1               62               62      3,844
B         2               78               156     6,084
C         3               70               210     4,900
D         4               90               360     8,100
E         4               93               372     8,649
F         6               103              618     10,609
Total                     496              1778    42,186

\[ s_{est} = \sqrt{\frac{42{,}186 - (55.57)(496) - (8.13)(1778)}{4}} = 6.48 \]

Formula for the Prediction Interval about a Value y'

\[ y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - \left(\sum x\right)^2}} < y < y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - \left(\sum x\right)^2}} \]
with d.f. = n - 2

Chapter 10 Correlation and Regression
Section 10-3
Example 10-14
Page #571

Example 10-14: Copy Machine Costs

For the data in Example 10-12, find the 95% prediction interval for the monthly maintenance cost of a machine that is 3 years old.

Step 1: Find $\sum x$, $\sum x^2$, and $\bar{X}$.
\[ \sum x = 20 \qquad \sum x^2 = 82 \qquad \bar{X} = \frac{20}{6} = 3.3 \]

Step 2: Find y' for x = 3.
\[ y' = 55.57 + 8.13(3) = 79.96 \]

Step 3: Find $s_{est}$.
\[ s_{est} = 6.48 \quad \text{(as shown in Example 10-13)} \]

Step 4: Substitute in the formula and solve.
\[ y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - \left(\sum x\right)^2}} < y < y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - \left(\sum x\right)^2}} \]
\[ 79.96 - (2.776)(6.48) \sqrt{1 + \frac{1}{6} + \frac{6(3 - 3.3)^2}{6(82) - (20)^2}} < y < 79.96 + (2.776)(6.48) \sqrt{1 + \frac{1}{6} + \frac{6(3 - 3.3)^2}{6(82) - (20)^2}} \]
\[ 79.96 - 19.43 < y < 79.96 + 19.43 \]
\[ 60.53 < y < 99.39 \]
Hence, you can be 95% confident that the interval 60.53 < y < 99.39 contains the actual value of y.

Read section 10.3. Take notes on residuals. Review the calculator steps.
Page 574 #1-7 all, 9-17 odds
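The prediction interval from Example 10-14 can be sketched in a few lines as a way to check the arithmetic. The function name and argument names are assumptions made for illustration. The critical value 2.776 and $s_{est}$ = 6.48 are taken from the example rather than computed, since the Python standard library has no t-distribution.

```python
import math


def prediction_interval(x0, xs, y_pred, s_est, t_crit):
    """Return (low, high, margin) for the prediction interval about y' at x0.

    Implements y' +/- t * s_est * sqrt(1 + 1/n + n(x0 - xbar)^2 / (n*sum(x^2) - (sum x)^2)).
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(v * v for v in xs)
    x_bar = sum_x / n
    radicand = 1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    margin = t_crit * s_est * math.sqrt(radicand)
    return y_pred - margin, y_pred + margin, margin


# Machine ages from Example 10-12.
ages = [1, 2, 3, 4, 4, 6]

# Predicted cost for a 3-year-old machine, from y' = 55.57 + 8.13x.
y_pred = 55.57 + 8.13 * 3

# t = 2.776 is the value used in the example (95%, d.f. = n - 2 = 4).
low, high, margin = prediction_interval(3, ages, y_pred, s_est=6.48, t_crit=2.776)

print(round(low, 2), round(high, 2), round(margin, 2))
```

Carrying full precision for $\bar{X}$ and the square root gives a margin close to 19.49, slightly wider than the 19.43 obtained above with rounded intermediate values, so the endpoints differ from 60.53 and 99.39 by a few hundredths.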