Regression Line Definition: The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible. Least-Squares Regression Different regression lines produce different residuals. The regression line we want is the one that minimizes the sum of the squared residuals. + Least-Squares Regression Line Definition: Equation of the least-squares regression line We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means and standard deviations of the two variables and their correlation. The least squares regression line is the line ŷ = a + bx with slope br and y intercept sy sx a y bx Least-Squares Regression We can use technology to find the equation of the leastsquares regression line. We can also write it in terms of the means and standard deviations of the two variables and their correlation. + Least-Squares Least-Squares Regression Line br sx and y intercept a y b x The number of miles (in thousands) for the 11 used Hondas have a mean of 50.5 and a standard deviation of 19.3. The asking prices had a mean of $14,425 and a standard deviation of $1,899. The correlation for these variables is r = -0.874. Find the equation of the least-squares regression line and explain what change in price we would expect for each additional 19.3 thousand miles. To find the slope: b 0 . 874 1899 86 . 0 19 . 3 To find the intercept: 14425 = a - 86(50.5) a = 18,768 For each additional 19.3 thousand miles, we expect the cost to change by r s y ( 0 . 874 )( 1899 ) 1660 . So, the CR-V will cost about $1660 fewer dollars for each additional 19.3 thousand miles. Note: Due to roundoff error, values for the slope and y intercept are slightly different from the values on page 166. Least-Squares Regression slope sy + . Definition: Equation of the least-squares regression line Least-Squares Regression Line + . Body Weight (lb): 120 187 109 103 131 165 158 116 Backpack Weight (lb): 26 30 26 24 29 35 31 28 You should get yˆ 16 . 3 0 . 0908 x Least-Squares Regression Find the least-squares regression equation for the following data using your calculator. Plots Definition: A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data. Least-Squares Regression One of the first principles of data analysis is to look for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between two variables. We see departures from this pattern by looking at the residuals. + Residual Residual Plots Pattern in residuals Linear model not appropriate Definition: If we use a least-squares regression line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals (s) is given by s residuals n2 2 2 ˆ ( y y ) i n2 Least-Squares Regression A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns. 1) The residual plot should show no obvious patterns 2) The residuals should be relatively small in size. + Interpreting Plots For the used Hondas data, the standard deviation is s = 8 , 499 ,851 = $972. So, when we use number of miles to 11 2 estimate the asking price, we will be off by an average of $972. Least-Squares Regression Here are the scatterplot and the residual plot for the used Hondas data: + Residual Role of r2 in Regression Definition: The coefficient of determination r2 is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r2 using the following formula: 2 r 1 where and SSE residual 2 SST (y i y) 2 SSE SST Least-Squares Regression The standard deviation of the residuals gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the leastsquares regression line predicts values of the response y. + The Role of r2 in Regression SSE/SST = 30.97/83.87 SSE/SST = 0.368 If we use the mean backpack Therefore, 36.8% of the variation weight as our prediction, the sum in pack is unaccounted for by of theweight squared residuals is 83.87. the least-squares regression line. SST = 83.87 Least-Squares Regression r 2 tells us how much better the LSRL does at predicting values of y than simply guessing the mean y for each value in the dataset. Consider the example on page 179. If we needed to predict a backpack weight for a new hiker, but didn’t know each hikers weight, we could use the average backpack weight as our prediction. + The 1 – SSE/SST = 1 – 30.97/83.87 r2 = 0.632 If we use the LSRL to make our 63.2 % of the variation in backpack weight predictions, the sum of the is accounted for by the linear model squared residuals is 30.90. relating pack weight to body weight. SSE = 30.90