Stat 101 – Lecture 15 Regression Wisdom • Sifting Residuals for Groups – Display residuals versus the explanatory variable. – Look at the distribution of residuals. 1 Example • y, Life Expectancy (years) • x, Wealth Index r = 0 . 874 yˆ = 2 . 41 + 7 . 71 x 2 10 6 Count 8 4 2 -15 -10 -5 0 5 10 Residual Life Expectancy 15 3 Stat 101 – Lecture 15 Interpretation • There appear to be several groups of residuals. – two large negative residuals. – a large group between –5 and 0. – another group between 5 and 10. 4 90 Life Expectancy 80 70 60 50 40 30 5 6 7 8 9 W ealth Index 10 11 5 15 Residual 10 5 0 -5 -10 -15 5 6 7 8 9 Wealth Index 10 11 6 Stat 101 – Lecture 15 Getting the “Bends” • A fundamental assumption is that the relationship is a straight line. • What looks straight on a scatter plot may show a curve when one looks at the plot of residuals versus the explanatory variable. 7 Example • y, Stopping distance (feet) • x, Speed (miles per hour) R 2 = 0 . 984 yˆ = − 62 . 8 + 3 . 48 x 8 200 Distance 150 100 50 0 0 10 20 30 40 50 Speed 60 70 80 9 Stat 101 – Lecture 15 10 Residual 5 0 -5 -10 0 10 20 30 40 50 Speed 60 70 80 10 Interpretation • There is a curved pattern in the residuals. – under predicts – over predicts – under predicts 11 Interpretation • Although the straight line does a very good job explaining the variation in stopping distance, a curved relationship model would do even better. 12 Stat 101 – Lecture 15 Dangers of Extrapolation • Suppose we use the least squares equation relating speed to stopping distance for a vehicle traveling at 5 mph? • The predicted stopping distance is – 45.4 feet. 13 Special Points • Outlier – In regression, this is a point with a large residual. • Leverage – In regression a point has high leverage if it is an extreme value for the explanatory variable. 14 Influence • Outliers and high leverage points can be influential points, that is, they can greatly influence what the intercept and the slope of the least squares line will be. 15