Stat 101 – Lecture 14

Least Squares Estimates
b1 = 0.956 (0.2812 / 4.636) = 0.058
b0 = 0.908 − 0.058(11.92) = 0.217
ŷ = 0.217 + 0.058 x
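The arithmetic above matches the standard least squares formulas b1 = r(s_y / s_x) and b0 = ȳ − b1·x̄. A minimal Python sketch of that calculation follows. The slide does not label its numbers, so reading 0.2812 and 4.636 as the standard deviations of nicotine and tar, and 0.908 and 11.92 as their means, is an assumption, and the variable names are illustrative.

```python
# Summary statistics read off the slide; the labels are assumptions.
r = 0.956        # correlation between tar and nicotine
s_y = 0.2812     # assumed: standard deviation of nicotine (mg)
s_x = 4.636      # assumed: standard deviation of tar (mg)
y_bar = 0.908    # assumed: mean nicotine (mg)
x_bar = 11.92    # assumed: mean tar (mg)

# Slope: correlation times the ratio of standard deviations.
b1 = r * s_y / s_x

# Intercept: forces the line through the point of means (x_bar, y_bar).
# The slide rounds the slope to 0.058 before this step, so we do too.
b0 = y_bar - round(b1, 3) * x_bar

print(f"b1 = {b1:.3f}")
print(f"b0 = {b0:.3f}")
```

Rounding the slope before computing the intercept mirrors the slide; carrying full precision would change the intercept only slightly.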
Interpretation
• Slope – for every 1 mg increase in
tar, the nicotine content increases,
on average, by 0.058 mg.
• Intercept – there is not a reasonable
interpretation of the intercept in
this context because one wouldn’t
see a cigarette with 0 mg of tar.
Nicotine Content vs. Tar Content
[Scatterplot: Nicotine (mg) vs. Tar (mg), with the fitted line Predicted Nicotine = 0.217 + 0.058 Tar]
Prediction
• Least squares line
ŷ = 0.217 + 0.058 x
x = 13
ŷ = 0.217 + 0.058(13) = 0.97
Residual
• Tar, x = 13 mg
• Nicotine, y = 0.8 mg
• Predicted, ŷ = 0.97 mg
• Residual, y − ŷ = 0.8 − 0.97 = −0.17 mg
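The prediction and residual for this cigarette can be reproduced with a short Python sketch. The function name `predict_nicotine` is hypothetical; the coefficients and the observed values are the ones given on the slides.

```python
def predict_nicotine(tar_mg):
    """Predicted nicotine (mg) from the fitted line on the slides."""
    return 0.217 + 0.058 * tar_mg

tar = 13.0        # observed tar content (mg)
nicotine = 0.8    # observed nicotine content (mg)

# The slides round the prediction to two decimals before subtracting.
predicted = round(predict_nicotine(tar), 2)

# Residual = observed minus predicted.
residual = nicotine - predicted

print(f"predicted = {predicted:.2f} mg")
print(f"residual  = {residual:.2f} mg")
```

A negative residual means the observed value lies below the line, so the model overpredicts nicotine for this cigarette.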
Residuals
• Residuals help us see if the linear
model makes sense.
• Plot residuals versus the explanatory
variable.
– If the plot is a random scatter of
points, then the linear model is the
best we can do.
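As a rough illustration of how the points for such a plot are obtained, the sketch below fits a least squares line to raw data and pairs each residual with its x value. The data are made up for illustration and are not the cigarette data from the lecture; all names are hypothetical.

```python
from statistics import mean

# Hypothetical (made-up) data, NOT the cigarette data from the lecture.
x = [2.0, 5.0, 8.0, 11.0, 14.0, 17.0]
y = [0.30, 0.52, 0.66, 0.88, 1.01, 1.23]

x_bar, y_bar = mean(x), mean(y)

# Least squares slope and intercept from the raw data.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y, one per data point.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# These (x, residual) pairs are what a residual plot displays.
for xi, ei in zip(x, residuals):
    print(f"x = {xi:5.1f}   residual = {ei:+.3f}")
```

Residuals from a least squares line always sum to zero, so the overall level of the plot is uninformative. What matters is whether the points show a pattern, such as a curve or a fan shape.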
Plot of Residuals vs. Tar Content
[Residual plot: Residual vs. Tar (mg)]
Interpretation of the Plot
• The residuals are scattered
randomly. This indicates that the
linear model is an appropriate
model for the relationship between
tar and nicotine content of
cigarettes.
r² or R²
• The square of the correlation
coefficient gives the proportion of the
variation in y that is accounted for,
or explained, by the linear
relationship with x.
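One way to see why r² measures explained variation: for a least squares line, r² equals 1 minus the ratio of the residual variation (SSE) to the total variation in y (SST). The sketch below checks this numerically on made-up data, not the lecture's data; the names are illustrative.

```python
from statistics import mean

# Hypothetical (made-up) data, NOT the cigarette data from the lecture.
x = [1.0, 4.0, 6.0, 9.0, 12.0, 15.0]
y = [0.25, 0.41, 0.60, 0.70, 0.95, 1.05]

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # total variation in y (SST)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation coefficient.
r = sxy / (sxx * syy) ** 0.5

# Fit the least squares line and measure the variation left unexplained.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

explained_fraction = 1 - sse / syy

print(f"r^2           = {r ** 2:.4f}")
print(f"1 - SSE / SST = {explained_fraction:.4f}")
```

The two printed values agree up to floating-point error, which is the sense in which r² is the fraction of the variation in y explained by the line.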
Tar and Nicotine
• r = 0.956
• r² = (0.956)² = 0.914, or 91.4%
• 91.4% of the variation in nicotine
content can be explained by the
linear relationship with tar content.
Regression Conditions
• Quantitative variables – both
variables should be quantitative.
• Linear model – does the scatterplot
show a reasonably straight line?
• Outliers – watch out for outliers as
they can be very influential.
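To illustrate how influential a single outlier can be, the sketch below fits a line to made-up points that lie exactly on a straight line, then refits after adding one extraordinary point. The data and the helper `fit_line` are hypothetical and not from the lecture.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data lying exactly on y = 0.2 + 0.05 x.
x = [2.0, 6.0, 10.0, 14.0, 18.0]
y = [0.2 + 0.05 * xi for xi in x]

_, slope_clean = fit_line(x, y)

# Add one made-up point far to the right and well below the pattern.
_, slope_outlier = fit_line(x + [30.0], y + [0.3])

print(f"slope without the outlier: {slope_clean:.4f}")
print(f"slope with the outlier:    {slope_outlier:.4f}")
```

A point that is extreme in x pulls the line toward itself, so one such observation can change the slope substantially.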
Regression Cautions
• Beware of extraordinary points.
• Don’t extrapolate beyond the data.
• Don’t infer x causes y just because
there is a good linear model relating
the two variables.
• Don’t choose a model based on R²
alone.
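The extrapolation caution can be built into a prediction helper. The sketch below uses the fitted line from the lecture but refuses to predict outside a range of tar values. The range of 1 to 25 mg is an assumption chosen for illustration, since the lecture does not state the observed range, and the function name is hypothetical.

```python
# Fitted line from the lecture.
B0, B1 = 0.217, 0.058

# Assumed range of observed tar values (mg); NOT stated in the lecture.
TAR_MIN, TAR_MAX = 1.0, 25.0

def predict_nicotine(tar_mg):
    """Predict nicotine (mg), refusing to extrapolate beyond the data."""
    if not TAR_MIN <= tar_mg <= TAR_MAX:
        raise ValueError(
            f"tar = {tar_mg} mg is outside the assumed observed range "
            f"[{TAR_MIN}, {TAR_MAX}] mg; this would be an extrapolation"
        )
    return B0 + B1 * tar_mg

print(f"{predict_nicotine(13):.2f} mg")

try:
    predict_nicotine(40)
except ValueError as err:
    print(f"refused: {err}")
```

Raising an error is one possible design; issuing a warning and still returning the value would also fit the caution, depending on how strict the check should be.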