Stat 101 – Lecture 14

Least Squares Estimates

b1 = r(sy/sx) = 0.956(0.2812/4.636) = 0.058
b0 = ȳ − b1x̄ = 0.908 − 0.058(11.92) = 0.217
ŷ = 0.217 + 0.058x

Interpretation

• Slope – for every 1 mg increase in tar, the nicotine content increases, on average, by 0.058 mg.
• Intercept – there is no reasonable interpretation of the intercept in this context, because one would not see a cigarette with 0 mg of tar.

Nicotine Content vs. Tar Content

[Scatterplot of Nicotine (mg) vs. Tar (mg) with the fitted line Predicted Nicotine = 0.217 + 0.058 Tar]

Prediction

• Least squares line: ŷ = 0.217 + 0.058x
• For x = 13: ŷ = 0.217 + 0.058(13) = 0.97

Residual

• Tar, x = 13 mg
• Nicotine, y = 0.8 mg
• Predicted, ŷ = 0.97 mg
• Residual, y − ŷ = 0.8 − 0.97 = −0.17 mg

Residuals

• Residuals help us see whether the linear model makes sense.
• Plot the residuals versus the explanatory variable.
  – If the plot is a random scatter of points, then the linear model is the best we can do.

Plot of Residuals vs. Tar Content

[Plot of residuals (mg) vs. Tar (mg)]

Interpretation of the Plot

• The residuals are scattered randomly. This indicates that the linear model is an appropriate model for the relationship between the tar and nicotine content of cigarettes.

(r)² or R²

• The square of the correlation coefficient gives the proportion of the variation in y that is accounted for, or explained, by the linear relationship with x.

Tar and Nicotine

• r = 0.956
• (r)² = (0.956)² = 0.914, or 91.4%
• 91.4% of the variation in nicotine content can be explained by the linear relationship with tar content.

Regression Conditions

• Quantitative variables – both variables should be quantitative.
• Linear model – does the scatterplot show a reasonably straight line?
• Outliers – watch out for outliers, as they can be very influential.

Regression Cautions

• Beware of extraordinary points.
• Don't extrapolate beyond the data.
• Don't infer that x causes y just because there is a good linear model relating the two variables.
• Don't choose a model based on R² alone.
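As a recap of the calculations in this lecture, the sketch below reproduces the slope, intercept, prediction, residual, and R² from the summary statistics quoted on the slides (r = 0.956, sx = 4.636, sy = 0.2812, x̄ = 11.92, ȳ = 0.908). This is not part of the original handout; it is a minimal Python illustration assuming only the standard formulas b1 = r(sy/sx) and b0 = ȳ − b1x̄.

```python
# Minimal sketch: reproduce the lecture's least-squares numbers from the
# summary statistics quoted on the slides (the raw tar/nicotine data are
# not shown in the handout).

r     = 0.956    # correlation between tar (x) and nicotine (y)
s_x   = 4.636    # standard deviation of tar (mg)
s_y   = 0.2812   # standard deviation of nicotine (mg)
x_bar = 11.92    # mean tar content (mg)
y_bar = 0.908    # mean nicotine content (mg)

# Least-squares slope and intercept
b1 = r * s_y / s_x        # about 0.058 mg nicotine per mg tar
b0 = y_bar - b1 * x_bar   # about 0.217 mg

# Prediction and residual for a cigarette with 13 mg tar and 0.8 mg nicotine
y_hat    = b0 + b1 * 13   # about 0.97 mg
residual = 0.8 - y_hat    # about -0.17 mg

# Fraction of the variation in nicotine explained by the linear model
r_squared = r ** 2        # about 0.914

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
print(f"prediction at x = 13: {y_hat:.2f} mg, residual: {residual:.2f} mg")
print(f"R^2 = {r_squared:.3f}")
```

Running this prints b1 ≈ 0.058, b0 ≈ 0.217, a predicted nicotine content of about 0.97 mg at x = 13 with a residual of about −0.17 mg, and R² ≈ 0.914, matching the values on the slides above.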