Stat 101L: Lecture 14 Regression Wisdom Sifting Residuals for Groups –Display residuals versus the explanatory variable. –Look at the distribution of residuals. 1 Example y, Life Expectancy (years) x, Wealth Index r 0 . 874 yˆ 2 . 41 7 . 71 x 2 10 6 Count 8 4 2 -15 -10 -5 0 5 10 153 Residual Life Expectancy 1 Stat 101L: Lecture 14 Interpretation There appear to be several groups of residuals. – two large negative residuals. – a large group between –5 and 0. – another group between 5 and 10. 4 90 Life Expectancy 80 70 60 50 40 30 5 6 7 8 9 Wealth Index 10 7 10 11 5 15 Residual 10 5 0 -5 -10 -15 5 6 8 9 Wealth Index 11 6 2 Stat 101L: Lecture 14 Getting the “Bends” A fundamental assumption is that the relationship is a straight line. What looks straight on a scatter plot may show a curve when one looks at the plot of residuals versus the explanatory variable. 7 Example y, Stopping distance (feet) x, Speed (miles per hour) R 2 0 . 984 yˆ 62 . 8 3 . 48 x 8 200 Distance 150 100 50 0 0 10 20 30 40 Speed 50 60 70 80 9 3 Stat 101L: Lecture 14 10 Residual 5 Under predict Under predict 0 Over predict -5 -10 0 10 20 30 40 50 Speed 60 70 80 10 Interpretation There is a curved pattern in the residuals. – under predicts – over predicts – under predicts 11 Interpretation Although the straight line does a very good job explaining the variation in stopping distance, a curved relationship model would do even better. 12 4 Stat 101L: Lecture 14 Dangers of Extrapolation Suppose we use the least squares equation relating speed to stopping distance for a vehicle traveling at 5 mph? The predicted stopping distance is –45.4 feet. 13 Special Points Outlier – In regression, this is a point with a large residual. Leverage – In regression a point has high leverage if it is an extreme value for the explanatory variable. 14 Influence Outliers and high leverage points can be influential points, that is, they can greatly influence what the intercept and the slope of the least squares line will be. 15 5