Quarter 4: Part I Simple Linear Regression
Section 4: Notes - Coefficient of Determination ($r^2$) – Strength of a Linear Relationship

A fitted value $\hat{y}$ is simply another name for a predicted value, as it describes where a particular x-value fits the line of best fit. It is found by substituting a given value of $x$ into the regression equation $\hat{y} = b_0 + b_1 x$.

A residual, denoted $e$, is the difference (or error) between an observed value and a predicted or fitted value. Graphically, it is the vertical distance between a point and the line of best fit. It is found by subtracting the fitted $\hat{y}$-value from the observed y-value: $e = y - \hat{y}$.

An outlier, in the regression sense, is a value whose residual is large in absolute value. Graphically, it is a point that falls far from the regression line, not following the pattern apparent in the other points.

In regression, the residuals represent the natural or unexplained variation (the natural error), as they describe the deviations about the regression line.

The coefficient of correlation $r$ measures the strength of a relationship and also indicates its direction. It satisfies $-1 \le r \le 1$, where $r = 1$ indicates a perfect positive relationship and $r = -1$ indicates a perfect negative relationship. $r = 0$ indicates no relationship, or at least no linear relationship when employing the linear model.

The coefficient of determination $r^2$, like the coefficient of correlation, describes the strength of a relationship, but has a more concrete interpretation:

$r^2 = \dfrac{SSR}{SST} = \dfrac{SST - SSE}{SST}$, with $0 \le r^2 \le 1$.

SST is the total sum of squared deviations about the mean $\bar{y}$: $SST = \sum (y - \bar{y})^2$.
SSE is the total sum of squared deviations about the regression line $\hat{y}$: $SSE = \sum (y - \hat{y})^2 = \sum (\text{residual})^2$.
SSR is the total sum of squared deviations due to regression, i.e. $SSR = \sum (\hat{y} - \bar{y})^2$. SSR, however, is most easily found by computing the difference $SSR = SST - SSE$.

Interpreting $r^2$: ____ percent of the variation in the y variable is explained by the regression line.
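The definitions of SST, SSE, SSR and the coefficient of determination can be traced step by step in a short Python sketch. The function name `sums_of_squares` and the five data points are hypothetical, chosen only for illustration. The slope and intercept use the standard least squares formulas, which these notes do not state explicitly.

```python
def sums_of_squares(x, y):
    """Fit y-hat = b0 + b1*x by least squares and return the pieces of r^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Standard least squares slope and intercept (assumed, not given in the notes).
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # Fitted values and residuals e = y - y-hat.
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    sst = sum((yi - y_bar) ** 2 for yi in y)   # deviations about the mean
    sse = sum(e ** 2 for e in residuals)       # deviations about the line
    ssr = sst - sse                            # the easy route: SSR = SST - SSE
    return {"b0": b0, "b1": b1, "fitted": fitted, "residuals": residuals,
            "sst": sst, "sse": sse, "ssr": ssr, "r2": ssr / sst}


# Hypothetical toy data, used only to exercise the function.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = sums_of_squares(x, y)
print(f"SST = {result['sst']:.2f}, SSE = {result['sse']:.2f}, "
      f"SSR = {result['ssr']:.2f}, r^2 = {result['r2']:.2f}")
```

For this toy data, the interpretation would read: about 60 percent of the variation in y is explained by the regression line.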
A line of best fit or regression line is also called a least squares regression line because this line fits the points in such a way that minimizes the error or residual terms, and hence minimizes SSE, the sum of the squared error terms.

1. A researcher would like to know if the gestation period of an animal could be used to predict its life expectancy. She collects the following data.

| Animal   | Gestation (days) $x$ | Life Expectancy (years) $y$ | $(y - \bar{y})^2$ | Fitted $\hat{y}$ | Residual $y - \hat{y}$ | $(y - \hat{y})^2$ |
|----------|----------------------|-----------------------------|-------------------|------------------|------------------------|-------------------|
| Cat      | 63                   | 11                          |                   |                  |                        |                   |
| Chicken  | 22                   | 7.5                         |                   |                  |                        |                   |
| Dog      | 63                   | 11                          |                   |                  |                        |                   |
| Duck     | 28                   | 10                          |                   |                  |                        |                   |
| Goat     | 151                  | 12                          |                   |                  |                        |                   |
| Lion     | 108                  | 10                          |                   |                  |                        |                   |
| Parakeet | 18                   | 8                           |                   |                  |                        |                   |
| Pig      | 115                  | 10                          |                   |                  |                        |                   |
| Rabbit   | 31                   | 7                           |                   |                  |                        |                   |
| Squirrel | 44                   | 9                           |                   |                  |                        |                   |
| TOTAL    |                      |                             | SST =             |                  |                        | SSE =             |

(1) Fill in the chart to find SST, SSE, SSR, and ultimately the coefficient of determination $r^2$. Check your $r^2$ with that provided by the calculator.
(2) Interpret the coefficient of determination $r^2$.

2. Following are the lengths and grades of ten research papers for a sociology professor’s class.

| Length (pages): $x$ | 25 | 32 | 20 | 28 | 15 | 34 | 29 | 30 | 45 | 35 |
| Grade: $y$          | 69 | 81 | 72 | 75 | 64 | 89 | 84 | 73 | 92 | 86 |

(1) On the graph below, draw each residual (the vertical distance between the point and the line of best fit).
(2) Use your calculator (LinReg) to find the least squares regression line along with the coefficient of determination $r^2$.
(3) Interpret the coefficient of determination.

Regression Plot
grade = 49.2756 + 0.997420 pages
S = 4.38172    R-Sq = 80.1 %    R-Sq(adj) = 77.6 %
[Scatterplot of grade (vertical axis, 65 to 95) against pages (horizontal axis, 15 to 45) with the fitted regression line]