Chapter 8: Linear Regression

AP Statistics
Chapter 8
Practice Problems
Linear Regression

A diver is investigating a wreck under the water and has to
come up to the surface slowly. The following is a chart
detailing his depth from the time he starts ascending.
Time (sec)   Depth (ft)   Time (sec)   Depth (ft)
0            240          210          155
30           225          280          185
60           203          330          130
100          189          360          125
140          180          390          120
180          164
Linear Regression
1. We wish to perform regression on the data. What are the three conditions that we must check before we attempt to do regression?
a) The data is quantitative
b) The data is linear
c) There are no outliers
2. Graph the scatterplot and determine if linear regression seems appropriate.
a) No, regression doesn’t seem appropriate because there appears to be an outlier.
3. Which point is the outlier?
a) At a time of 280 seconds, he was at a depth of 185 feet.
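One way to back up the visual judgement numerically is to fit a line to all eleven points and look for the point with the largest residual. The sketch below is only an illustration: the helper name `fit_line` is hypothetical, and the "largest absolute residual" rule is an assumption made for this example, not a formal outlier test from the slides.

```python
# Dive data from the table: (time in seconds, depth in feet).
data = [(0, 240), (30, 225), (60, 203), (100, 189), (140, 180), (180, 164),
        (210, 155), (280, 185), (330, 130), (360, 125), (390, 120)]


def fit_line(points):
    """Return (intercept, slope) of the least-squares line (hypothetical helper)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


b0, b1 = fit_line(data)

# Residual = observed depth minus predicted depth.
residuals = [(x, y, y - (b0 + b1 * x)) for x, y in data]

# Flag the point that sits farthest from the fitted line.
suspect = max(residuals, key=lambda row: abs(row[2]))
print("Largest residual at time", suspect[0], "depth", suspect[1])
```

The flagged point should agree with what the scatterplot shows: the diver is deeper at 280 seconds than at 210 seconds, which breaks the otherwise steady ascent.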
Linear Regression

4. Determine the equation for the Least Squares Regression Line (LSRL).
a) $\widehat{\text{Depth}} = 225.642 - 0.272(\text{time})$
5. Describe the association.
a) There appears to be a strong, negative, linear association between time and depth, but there appears to be an outlier.
6. Determine the correlation.
a) r = -0.932
7. Eliminate the outlier. What is the new correlation? Why does it change?
a) r = -0.980; it’s stronger because, with the outlier removed, the remaining points lie closer to the line (the residuals are smaller).
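The regression line and correlation, with and without the point at 280 seconds, can be checked from the table with the standard least-squares formulas. This is a minimal sketch in plain Python rather than the calculator routine the slides presumably used; the function name `lsrl_and_r` is hypothetical.

```python
from math import sqrt

# Dive data from the table: (time in seconds, depth in feet).
data = [(0, 240), (30, 225), (60, 203), (100, 189), (140, 180), (180, 164),
        (210, 155), (280, 185), (330, 130), (360, 125), (390, 120)]


def lsrl_and_r(points):
    """Return (intercept, slope, r) for the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


full = lsrl_and_r(data)
trimmed = lsrl_and_r([p for p in data if p != (280, 185)])

for label, (b0, b1, r) in (("all points", full), ("outlier removed", trimmed)):
    print(f"{label}: depth = {b0:.3f} + ({b1:.3f})(time), r = {r:.3f}")
```

Removing the single point changes the slope only slightly but moves r noticeably closer to -1, which is the pattern the answers describe.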
Linear Regression
8. Determine the equation for the Least Squares Regression Line (LSRL) without the outlier.
a) $\widehat{\text{Depth}} = 225.698 - 0.292(\text{time})$
9. Explain the meaning of the slope of the line.
a) For every 1-second increase in time, our model predicts an average decrease of 0.292 feet in depth.
10. Explain the meaning of $b_0$ in the context of this problem.
a) At a time of 0 seconds, our model predicts a depth of 225.698 feet.
b) Although this makes sense, we know that the diver began to ascend at a depth of 240 ft.
Linear Regression
11. Describe the relationship between time and depth using $r^2$ to make your description more precise.
a) Since r = -0.980, $r^2$ = 0.960
b) 96% of the variation in the depth can be explained by the approximate linear relationship with the time.
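The "variation explained" reading of $r^2$ can be made concrete by comparing the variation left over after fitting the line with the total variation in depth. The sketch below assumes the modified data set (the point at 280 seconds removed) and uses the identity $r^2 = 1 - SSE/SST$, which holds for a least-squares line; the variable names are illustrative only.

```python
# Dive data with the outlier (280, 185) removed: (time in s, depth in ft).
data = [(0, 240), (30, 225), (60, 203), (100, 189), (140, 180),
        (180, 164), (210, 155), (330, 130), (360, 125), (390, 120)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Least-squares slope and intercept.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
intercept = mean_y - slope * mean_x

# SST: total variation in depth around its mean.
sst = sum((y - mean_y) ** 2 for _, y in data)
# SSE: variation left unexplained by the line (sum of squared residuals).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in data)

r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.3f}")
```

Only about 4% of the variation in depth is left in the residuals, which is what the 96% statement says from the other side.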
Linear Regression
12. Using the modified model, predict the depth of the diver at each of the following times and comment on the confidence of your prediction:
a) 2 min. 50 sec.: ≈ 176 feet
b) 5 min.: ≈ 138 feet
c) 6 min. 30 sec. What is the residual at this time? ≈ 112 feet. The residual is 120 - 112 = 8 (or, if we use the calculator, ≈ 8.27).
d) 10 min.: ≈ 50.4 feet. This time is well beyond the last observation at 390 seconds, so the prediction is an extrapolation and deserves much less confidence than the others.
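The predictions above can be reproduced by refitting the line without the outlier and evaluating it at each time in seconds. This is a sketch only: the `predict_depth` helper and the "beyond the last observed time" flag are assumptions for illustration, and unrounded coefficients can give slightly different last digits than the rounded equation.

```python
# Dive data with the outlier (280, 185) removed: (time in s, depth in ft).
data = [(0, 240), (30, 225), (60, 203), (100, 189), (140, 180),
        (180, 164), (210, 155), (330, 130), (360, 125), (390, 120)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
intercept = mean_y - slope * mean_x


def predict_depth(seconds):
    """Predicted depth in feet from the modified model (hypothetical helper)."""
    return intercept + slope * seconds


last_observed = max(x for x, _ in data)  # 390 seconds

# Question times converted to seconds.
times = {"2 min 50 s": 170, "5 min": 300, "6 min 30 s": 390, "10 min": 600}
for label, t in times.items():
    note = "extrapolation" if t > last_observed else "within the data"
    print(f"{label}: {predict_depth(t):.1f} ft ({note})")

# Residual = observed minus predicted, at the observed point (390, 120).
residual_390 = 120 - predict_depth(390)
print(f"residual at 390 s: {residual_390:.2f} ft")
```

A positive residual at 390 seconds means the diver was actually a little deeper than the model predicts at that time.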
Linear Regression

Using the following summary statistics of a
statistics class, determine the LSRL (assume that
IQ is the explanatory variable):
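$\overline{\text{IQ}} = 112$, $s_{\text{IQ}} = 10$, $r = 0.893$
$\overline{\text{SAT}} = 1821$, $s_{\text{SAT}} = 107$

slope: $b_1 = r \, \dfrac{s_{\text{SAT}}}{s_{\text{IQ}}} = 0.893 \cdot \dfrac{107}{10} \approx 9.556$
$b_0 = \bar{y} - b_1 \bar{x}$
$b_0 = \overline{\text{SAT}} - b_1 (\overline{\text{IQ}})$
$b_0 = 1821 - 9.556(112) = 750.728$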
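The same calculation can be written as a small function of the five summary statistics. This is a sketch with a hypothetical helper name, `lsrl_from_summary`. With the rounded inputs shown, the slope comes out near 9.555 and the intercept near 750.83, a small difference in the last digits from the slide's 9.556 and 750.728 that may come from rounding of r.

```python
def lsrl_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Return (intercept, slope) of the LSRL from summary statistics."""
    slope = r * sd_y / sd_x              # b1 = r * s_y / s_x
    intercept = mean_y - slope * mean_x  # b0 = y-bar - b1 * x-bar
    return intercept, slope


# IQ is the explanatory variable (x); SAT is the response (y).
b0, b1 = lsrl_from_summary(mean_x=112, sd_x=10, mean_y=1821, sd_y=107, r=0.893)
print(f"SAT-hat = {b0:.3f} + {b1:.3f} * IQ")

# The LSRL always passes through the point of means (x-bar, y-bar).
print(f"prediction at mean IQ: {b0 + b1 * 112:.1f}")
```

Checking that the line reproduces the mean SAT score at the mean IQ is a quick way to catch arithmetic slips in $b_0$.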
Linear Regression

LSRL: $\hat{y} = b_0 + b_1 x$
Since $b_0 = 750.728$ and $b_1 = 9.556$,
$\widehat{\text{SAT}} = 750.728 + 9.556\,\text{IQ}$
Linear Regression


With an LSRL of: $\widehat{\text{SAT}} = 750.728 + 9.556\,\text{IQ}$
Interpret $b_0$
With an IQ of 0, our model predicts an SAT score of 750.728.
This makes absolutely no sense. You can’t have an IQ of 0!
Interpret $b_1$
For every increase of 1 point in IQ, our model predicts an average increase of 9.556 points on the SAT.
Linear Regression


With an LSRL of: $\widehat{\text{SAT}} = 750.728 + 9.556\,\text{IQ}$
Interpret $r^2$
Since r = 0.893, $r^2$ = 0.797
Approximately 80% of the variation in SAT score can be explained by the approximate linear relationship with IQ.
Review Question

A researcher uses a regression equation to predict
home heating bills (dollar cost), based on home size
(square feet). The correlation between predicted bills
and home size is 0.70. What is the correct interpretation
of this finding?
a) 70% of the variability in home heating bills can be explained by home size.
b) 49% of the variability in home heating bills can be explained by home size.
c) For each added square foot of home size, heating bills increased by 70 cents.
d) For each added square foot of home size, heating bills increased by 49 cents.
e) None of the above.
The answer is b) since the coefficient of determination, $r^2 = 0.70^2 = 0.49$, measures the proportion of variation in the dependent variable that is predictable from the independent variable.
Review Question
A national consumer magazine reported the following
correlations:
The correlation between car weight and car reliability
is -0.30.
The correlation between car weight and annual
maintenance cost is 0.20.
Which of the following statements are true?

I. Heavier cars tend to be less reliable.
II. Heavier cars tend to cost more to maintain.
III. Car weight is related more strongly to reliability than to maintenance cost.
a) I only
b) II only
c) III only
d) I and II
e) I, II, and III
The answer is e) since reliability tends to decrease as car weight increases, costs tend to increase as car weight increases, and strength increases as correlation gets closer to ±1 (here |-0.30| > |0.20|).
Review Question
In the context of regression analysis, which of
the following statements are true?

I. When the sum of the residuals is greater than zero, the data set is nonlinear.
II. A random pattern of residuals supports a linear model.
III. A random pattern of residuals supports a non-linear model.
a) I only
b) II only
c) III only
d) I and II
e) I and III
The answer is b) since a random pattern of residuals supports a linear model; a non-random pattern supports a non-linear model. The sum of the residuals from a least-squares line is always zero, whether the data set is linear or nonlinear.
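Both points in this answer can be seen on a small made-up example. The sketch below fits a least-squares line to deliberately curved data (y = x squared, chosen purely for illustration and not taken from the slides): the residuals still sum to zero, yet they follow an obvious U-shaped pattern that signals a non-linear relationship.

```python
# Hypothetical curved data for illustration: y = x^2 for x = 0..10.
points = [(x, x ** 2) for x in range(11)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Least-squares line through the curved data.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
         / sum((x - mean_x) ** 2 for x, _ in points))
intercept = mean_y - slope * mean_x

# Residual = observed minus predicted.
residuals = [y - (intercept + slope * x) for x, y in points]

print("sum of residuals:", round(sum(residuals), 9))
print("residuals:", [round(e, 1) for e in residuals])
```

The residuals are positive at both ends and negative in the middle, so it is the pattern of the residuals, not their sum, that reveals the curvature.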
Assignment
Lesson:              Read:        Problems:
Linear Regression    Chapter 8    1 – 49 (odd)
Regression Wisdom    Chapter 9    1 – 31 (odd)