IM_chapter5

Chapter 5
Exercises 5.1 – 5.16
5.1
a.
A positive correlation would be expected, since as temperature increases cooling costs
would also increase.
b.
A negative correlation would be expected, since as interest rates climb fewer people would
be submitting applications for loans.
c.
A positive correlation would be expected, since husbands and wives tend to have jobs in
similar or related classifications. That is, a spouse would be reluctant to take a low-paying
job if the other spouse had a high-paying job.
d.
No correlation would be expected, because those people with a particular I.Q. level would
have heights ranging from short to tall.
e.
A positive correlation would be expected, since people who are taller tend to have larger feet
and people who are shorter tend to have smaller feet.
f.
A weak to moderate positive correlation would be expected. There are some who do well on
both, some who do poorly on both, and some who do well on one but not the other. It is
perhaps the case that those who score similarly on both tests outnumber those who don't.
g.
A negative correlation would be expected, since there is a fixed amount of time and as time
spent on homework increases, time in watching television would decrease.
h.
No correlation overall, because for small or substantially large amounts of fertilizer yield
would be small.
5.2
The statement is incorrect. The correlation coefficient measures the extent to which x and y are
linearly related. They may have a strong nonlinear relationship and yet have a correlation of zero.
5.3
5.4
Association ≠ Causation. For example, it could be age, or the amount they entertain, or even the age
of their children that has a more important effect on their drinking habits rather than the amount they
earn.
5.5
Sugar Consumption x   Depression Rate y     zx       zy      zxzy
150                   2.3                 -1.726   -1.470    2.537
300                   3.0                  -.369    -.947     .349
350                   4.4                   .083     .099     .008
375                   5.0                   .309     .548     .169
390                   5.2                   .444     .697     .309
480                   5.7                  1.259    1.071    1.348
                                                     Sum:    4.720
a.
r = Σ zxzy / (n − 1) = 4.720 / 5 = 0.944
The correlation is strong and positive.
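As a cross-check on the table, the z-score formula can be evaluated directly from the six (x, y) pairs. This is a minimal sketch using only Python's standard library; the function name `corr_from_z` is illustrative and not part of the original solution.

```python
from statistics import mean, stdev

def corr_from_z(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Data from the table in 5.5
sugar = [150, 300, 350, 375, 390, 480]
depression = [2.3, 3.0, 4.4, 5.0, 5.2, 5.7]

r = corr_from_z(sugar, depression)
print(round(r, 3))
```

The result should agree with the hand calculation up to rounding of the tabulated z-scores.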
b.
Increasing sugar consumption doesn't necessarily cause or lead to higher rates of depression;
some other factor may cause an increase in both. For instance, a high sugar consumption may
indicate a need for comfort food for a reason that also causes depression.
c.
These countries may not be representative of any other countries. It may be that only these
countries have a strong positive correlation between sugar consumption and depression rate
and other countries may have a different type of relationship between these factors. It is
therefore not a good idea to generalize these results to other countries.
5.6
a.
Inpatient x   Outpatient y     zx       zy      zxzy
80            62              0.258    0.246    0.064
76            66              0.022    0.663    0.014
75            63             -0.038    0.351   -0.013
62            51             -0.806   -0.900    0.726
100           54              1.441   -0.587   -0.846
100           75              1.441    1.601    2.307
88            65              0.731    0.559    0.409
64            56             -0.688   -0.379    0.261
50            45             -1.516   -1.525    2.312
54            48             -1.279   -1.213    1.552
83            71              0.435    1.184    0.516
                                        Sum:    7.30
r = Σ zxzy / (n − 1) = 7.30 / 10 = 0.73
[Scatterplot: Outpatient y versus Inpatient x]
There appears to be a reasonably strong positive linear relationship between the cost-to-charge ratio for inpatient and outpatient services at these Oregon hospitals.
b.
There is one hospital, Harney District, that has a lower outpatient cost-to-charge ratio or
higher inpatient cost-to-charge ratio than the other ten hospitals.
c.
If this observation was removed, the remaining points would all be much closer to a line and
so the correlation coefficient would be greater. The relationship would be stronger.
5.7
The correlation coefficient between household debt and corporate debt would be positive, and the
relationship would be strong. As the household debt increases, the corporate debt increases at a
similar rate; this can clearly be seen on the graph by a constant width between the two lines.
5.8
a.
r = 0.1178
b.
[Scatterplot: consumer debt versus household debt]
The scatterplot supports the correlation coefficient that indicates a very weak, or no linear
relationship between consumer and household debts.
5.10
a.
r = 0.335. There is a weak positive relationship between timber sales and the amount of
acres burned in forest fires.
b.
No. Correlation does not imply a causal relationship.
a.
[Scatterplot: Comparing Heart Rate Responses to Two Exercise Tests; 300 yd run (vertical axis) versus Shuttle (horizontal axis)]
5.9
There does not appear to be any linear relationship between peak heart rate during a shuttle
run and peak heart rate during a 300 yard run.
b.
n = 10, Σx = 1788, Σy = 1898, Σx² = 320400, Σy² = 360628, Σxy = 339472
Sxy = 339472 − (1788)(1898)/10 = 109.6
Sxx = 320400 − (1788)²/10 = 705.6
Syy = 360628 − (1898)²/10 = 387.6
r = Sxy / (√Sxx √Syy) = 109.6 / (√705.6 √387.6) = .2096
The value of .2096 suggests at best a very weak linear relationship between the two
variables. The conclusion is consistent with the one of part a.
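The computational formula used in part b can be checked from the summary statistics alone. The following is a small sketch in Python; the helper name `corr_from_sums` is an assumption made for illustration.

```python
from math import sqrt

def corr_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r from summary sums via Sxy / sqrt(Sxx * Syy)."""
    sxy = sum_xy - sum_x * sum_y / n
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    return sxy / sqrt(sxx * syy)

# Summary statistics quoted in part b
r_heart = corr_from_sums(10, 1788, 1898, 320400, 360628, 339472)
print(round(r_heart, 4))
```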
5.12
The value of r does not depend on which variable is labeled x. So switching the labels will
not change the value of r .
a.
Since the points tend to be close to the line, it appears that x and y are strongly correlated in
a positive way.
b.
An r value of .9366 indicates a strong positive linear relationship between x and y.
c.
If x and y were perfectly correlated with r = 1, then each point would lie exactly on a line.
The line would not necessarily have slope 1 and intercept 0.
a.
[Scatterplot: Exam Score versus Test Anxiety]
5.11
c.
There are several observations that have identical x-values yet different y-values. Thus, the
value of y is not determined solely by x, but also by various other factors. There is one data
point that is far removed from the remaining data points. The plot seems to indicate there
may be a tendency for exam scores to decrease as test anxiety increases.
b.
There appears to be a linear relationship between x and y. The scatter plot shows a
tendency for y to decrease as x increases. That is, as test anxiety increases, exam scores
decrease. The relationship may be characterized as a moderate negative relationship.
c.
x     x²     y     y²      xy
23    529    43    1849    989
14    196    59    3481    826
14    196    48    2304    672
0     0      77    5929    0
7     49     50    2500    350
20    400    52    2704    1040
20    400    46    2116    920
15    225    51    2601    765
21    441    51    2601    1071
Σx = 134, Σx² = 2436, Σy = 477, Σy² = 26085, Σxy = 6633
Sxx = 2436 − (134)(134)/9 = 2436 − 1995.11 = 440.89
Syy = 26085 − (477)(477)/9 = 26085 − 25281 = 804
Sxy = 6633 − (134)(477)/9 = 6633 − 7102 = −469
r = −469 / (√440.89 √804) = −469 / ((21)(28.35)) = −0.7878
r = -0.7878 indicates a moderate negative linear relationship between test anxiety and
exam score.
d.
Correlation measures the extent of association, but association does not imply causation.
It is possible that the two variables are not causally related but they may both be related
to a third variable.
5.13
r = [27918 − (9620)(7436)/2600] / [√(36168 − (9620)(9620)/2600) √(23145 − (7436)(7436)/2600)]
  = (27918 − 27513.2) / (√574 √1878.04)
  = 404.8 / ((23.9583)(43.336)) = .3899
There is a weak positive linear relationship between high school GPA and first-year college GPA.
5.14
No, because x, artist, is not a numerical variable.
5.15
No. An r value of −0.085 indicates an extremely weak relationship between support for
environmental spending and degree of belief in God.
5.16
The sample correlation coefficient would be closest to −.9. This is because there is an almost perfect
negative linear relationship between speed and time required to travel a fixed distance. That is, as
speed increases, time required to traverse the fixed distance decreases.
Exercises 5.17 – 5.35
5.17
a.
There is a weak negative association between pollution and the cost of medical care.
b.
x = Pollution and y = cost;
Σx = 191.1, Σx² = 6184.05, Σxy = 177807, Σy = 5597, n = 6, x̄ = 31.85, ȳ = 932.833
Sxy = Σxy − (Σx)(Σy)/n = 177807 − (191.1)(5597)/6 = −457.45
Sxx = Σx² − (Σx)²/n = 6184.05 − (191.1)²/6 = 97.515
The slope, b = Sxy/Sxx = −457.45/97.515 = −4.69
The intercept, a = ȳ − bx̄ = 932.833 − (−4.69)(31.85) = 1082.21
The equation: ŷ = 1082.21 − 4.69x
c.
The slope is negative, consistent with the description in part a.
d.
Yes, it does support the conclusion that elderly people that live in more polluted areas have
higher medical costs, but care must be taken not to state that the pollution causes the high
medical costs – or even the high medical costs causes the pollution!
5.18
a.
[Scatterplot: % who would buy lottery ticket versus Grade]
There appears to be positive linear relationship between x, the grade level, and y, the
percentage who said they were more likely to purchase a lottery ticket.
b.
x = Grade and y = %bought;
Σx = 36, Σx² = 344, Σxy = 2318.2, Σy = 237.4, n = 4, x̄ = 9, ȳ = 59.35
Sxy = Σxy − (Σx)(Σy)/n = 2318.2 − (36)(237.4)/4 = 181.6
Sxx = Σx² − (Σx)²/n = 344 − (36)²/4 = 20
The slope, b = Sxy/Sxx = 181.6/20 = 9.08
The intercept, a = ȳ − bx̄ = 59.35 − (9.08)(9) = −22.37
The equation: ŷ = −22.37 + 9.08x
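The slope and intercept calculation above follows the same pattern in every exercise of this set, so it can be sketched once in code. This is an illustrative Python helper (the name `least_squares_from_sums` is an assumption) applied to the summary sums of 5.18 b.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (intercept a, slope b) of the least squares line from summary sums."""
    sxy = sum_xy - sum_x * sum_y / n
    sxx = sum_x2 - sum_x ** 2 / n
    b = sxy / sxx                    # slope = Sxy / Sxx
    a = sum_y / n - b * sum_x / n    # intercept = y-bar - b * x-bar
    return a, b

# Summary sums quoted in 5.18 b (x = grade, y = % who would buy a ticket)
a, b = least_squares_from_sums(4, 36, 237.4, 344, 2318.2)
print(round(a, 2), round(b, 2))
```

The same function can be reused for the other exercises by substituting their summary sums.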
5.20
5.21
a.
The dependent variable is the number of fruit and vegetable servings per day. The
independent variable is the number of hours of TV viewed per day.
b.
Negative, because as the number of hours of TV watched increases, the number of servings
of fruit and vegetables decreases.
a.
For lower values of patient to nurse ratios, nurse job satisfaction might be low because up to
a point, the more patients a nurse has to look after, the more interesting the job would
become. After a certain number, however, the job would get difficult to do well and might get
frustrating. The relationship might be nonlinear.
b.
Patient satisfaction is probably related to the amount of attention received. The higher the
patient to nurse ratio, the less personal attention would be received, so the relationship would
be negative.
c.
In an ideal world, there would be no relationship between nurse to patient ratio and patient
quality care; it should be excellent, no matter how many patients each nurse has to care for!
However, quality of care probably declines as the number of patients a nurse must care for
increases. The relationship would be negative.
a.
[Scatterplot: Head circumference z-score vs. volume of cerebral grey matter; Cerebral Grey Matter (ml) 2-5 y versus Head circumference z-scores]
5.19
b.
n = 18, Σx = 24.35, Σy = 13890, Σx² = 49.6775, Σy² = 10767400, Σxy = 19501.75
x̄ = 1.3528, ȳ = 771.667
Sxy = 19501.75 − (24.35)(13890)/18 = 711.667
Sxx = 49.6775 − (24.35)²/18 = 16.737
Syy = 10767400 − (13890)²/18 = 48950
r = Sxy / (√Sxx √Syy) = 711.667 / (√16.737 √48950) = .7863
c.
The slope, b = Sxy/Sxx = 711.667/16.737 = 42.521
The intercept, a = ȳ − bx̄ = 771.667 − (42.521)(1.3528) = 714.145
The equation: ŷ = 714.145 + 42.521x
d.
When head circumference z-score is 1.8, the predicted volume of grey matter is 790.68 ml.
e.
The least-squares line was calculated using values of z-scores of between -0.75 and 2.8 and
therefore is only valid for values in this range. We don't know if the relationship between
cerebral grey matter and head circumference z-score remains the same outside these values
and so this equation cannot be used for prediction.
5.22
a.
The value of the y-intercept of the line ŷ = −147 + 6.175x is −147. The value of the
slope is 6.175. This means that each unit increase in x is associated with an increase
of 6.175 in y, on average. Thus, each 1 cm increase in snout-vent length is associated
with a 6.175 increase in clutch size, on average.
b.
This least squares line should not be used to predict clutch size of a salamander with a
snout-vent length of 22 cm because a 22 cm snout-vent length is well outside the 30–70 cm
snout-vent range of the data set.
5.23
a.
There is a moderately strong positive linear relationship between the percentage of public
schools who were at or above the proficient level in math in 4th and 8th grade in the 8 states.
b.
x = 4th grade and y = 8th grade;
Σx = 140, Σx² = 2586, Σxy = 3497, Σy = 188, n = 8, x̄ = 17.5, ȳ = 23.5
Sxy = Σxy − (Σx)(Σy)/n = 3497 − (140)(188)/8 = 207
Sxx = Σx² − (Σx)²/n = 2586 − (140)²/8 = 136
The slope, b = Sxy/Sxx = 207/136 = 1.522
The intercept, a = ȳ − bx̄ = 23.5 − 1.522(17.5) = −3.135
The equation: ŷ = −3.135 + 1.522x
c.
Predicted 8th grade = −3.135 + 1.522(4th grade percent) ⇒ −3.135 + 1.522(14) = 18 (rounded to
nearest integer). This is 2 percentage points lower than the actual 8th grade value of 20 for Nevada.
5.24
a.
ȳ = 7436/2600 = 2.86
x̄ = 9620/2600 = 3.7
b = [27918 − (9620)(7436)/2600] / [36168 − (9620)(9620)/2600] = 404.8/574 = 0.7052
a = 2.86 − 0.7052(3.7) = 0.2508
Therefore, the equation of the least squares regression line is ŷ = 0.2508 + 0.7052x
b.
The equation ŷ = 0.2508 + 0.7052x has slope b = 0.7052, so each one unit increase in x
is associated with an increase of 0.7052 in y, on average. This means that, on average,
each 1.0 increase in high school GPA is associated with an increase of 0.7052 in first
year college GPA.
c.
For x = 4.0, ŷ = 0.2508 + 0.7052(4.0) = 0.2508 + 2.8208 = 3.0716
5.25
a.
There appears to be a negative linear association between carbonation depth and the
strength of concrete for a sample of core specimens.
b.
Σx = 323, Σx² = 14339, Σxy = 3939.9, Σy = 130.8, n = 9, x̄ = 35.889, ȳ = 14.533
Sxy = Σxy − (Σx)(Σy)/n = 3939.9 − (323)(130.8)/9 = −754.367
Sxx = Σx² − (Σx)²/n = 14339 − (323)²/9 = 2746.889
The slope, b = Sxy/Sxx = −754.367/2746.889 = −0.275
The intercept, a = ȳ − bx̄ = 14.533 − (−0.275)(35.889) = 24.40
The equation: ŷ = 24.4 − 0.275x
c.
When depth is 25, predicted strength = 24.4 – 0.275(25) = 17.5
d.
The least squares line was calculated using values of “depth” of between 8 mm and 65 mm
and therefore is only valid for values in this range. We don’t know if the relationship between
depth and strength remains the same outside these values and so this equation cannot be
used. A depth of 100 mm is clearly outside these values and it would be unreasonable to
use this equation to predict strength.
5.26
It certainly seems that the sooner the paramedics get there, the higher your chances of survival. The
slope of the least squares line is – 9.30, which means that for every extra minute, on average, the
survival rate decreases by 9.30%.
5.27
The slope is the average increase in the y variable for an increase of one unit in the x variable.
Because the home prices (y variable) dropped by an average of $4000 (-4000) for every (1) mile (x
variable) from the Bay area, the slope is -4000/1 = -4000.
5.28
a.
r = 0.70 There is a moderately strong positive linear relationship between sale price and
property size.
b.
r = -0.333 There is a very weak negative linear relationship (if any!) between sale price and
land/building ratio.
c.
I would use size, as the absolute value of its correlation coefficient (r) is much closer to 1.
d.
Using x = size and y = sale price:
Σx = 16603, Σx² = 40097671, Σxy = 232691.5, Σy = 100.5, n = 10, x̄ = 1660.3, ȳ = 10.05
Sxy = Σxy − (Σx)(Σy)/n = 232691.5 − (16603)(100.5)/10 = 65831.35
Sxx = Σx² − (Σx)²/n = 40097671 − (16603)²/10 = 12531710.1
The slope, b = Sxy/Sxx = 65831.35/12531710.1 = 0.00525
The intercept, a = ȳ − bx̄ = 10.05 − 0.00525(1660.3) = 1.333
The equation: ŷ = 1.333 + 0.00525x
5.29
a.
Σx = 240, Σx² = 6750, Σxy = 199750, Σy = 7250, n = 11, x̄ = 21.818, ȳ = 659.091
Sxy = Σxy − (Σx)(Σy)/n = 199750 − (240)(7250)/11 = 41568.182
Sxx = Σx² − (Σx)²/n = 6750 − (240)²/11 = 1513.636
The slope, b = Sxy/Sxx = 41568.182/1513.636 = 27.462
The intercept, a = ȳ − bx̄ = 659.091 − 27.462(21.818) = 59.925
The equation: ŷ = 59.925 + 27.462x
b.
Concentration with 18% bare ground: 59.925 + 27.462(18) = 554 (to nearest integer)
c.
No, because the data used to obtain the least squares equation was from steeply sloped
plots, so it would not make sense to use it to predict runoff sediment from gradually sloped
plots. You would need to use data from gradually sloped plots to create a least squares
regression equation to predict runoff sediment from gradually sloped plots.
5.30
a.
slope = 244.9, intercept = −275.1
b.
244.9
c.
ŷ = −275.1 + 244.9(2) = −275.1 + 489.8 = 214.7
d.
No. When shell height (x) equals 1, the equation would result in a predicted breaking
strength of −275.1 + 244.9(1) = −30.2. It is impossible for breaking strength to be a negative
value, so the equation results in a predicted value which is not meaningful.
5.31
a.
The graph reveals a moderate linear relationship between x and y.
b.
b = [6933.48 − (1368.1)(80.9)/16] / [117123.85 − (1368.1)²/16]
  = (6933.48 − 6917.456) / (117123.85 − 116981.101)
  = 16.0244/142.7494 = 0.1123
a = 80.9/16 − 0.1123(1368.1/16) = 5.0563 − 0.1123(85.5063) = 5.0563 − 9.6024 = −4.5461
c.
The change in vital capacity associated with a 1 cm. increase in chest circumference is
.1123.
The change in vital capacity associated with a 10 cm. increase in chest circumference is
10(.1123) = 1.123.
d.
yˆ = −4.5461+ .1123(85) = 4.9994
e.
No; this is shown by the fact that there are two data points in the data set whose x values are
81.8, but these data points have different y values.
5.32
It is dangerous to use the least squares line to obtain predictions for x-values outside the range of
those contained in the sample, because there is no information in the sample about the relationship
that exists between x and y beyond the range of the data. The relationship may be the same or it
may change substantially. There is no data to support a conclusion either way.
5.33
a.
ŷ = 100 + .75(2)(sy) = 100 + 1.5(sy). That person's annual sales would be 1.5 standard
deviations above the mean sales of 100.
b.
(ŷ − ȳ) = r(sy/sx)(x − x̄), which implies (ŷ − ȳ)/sy = r(x − x̄)/sx.
Hence, −1.0 = r(−1.5) implies r = −1.0/−1.5 = .67.
5.34
The denominators of b and of r are always positive numbers. The numerator of b and r is
Σ(x − x̄)(y − ȳ). Since both b and r have the same numerator and positive denominators, they will
always have the same sign.
5.35
a.
ŷ = −424.7 + 3.891x
b.
Let y′ = cy. Then ȳ′ = cȳ.
b′ = Σ(x − x̄)(cy − cȳ) / Σ(x − x̄)² = cΣ(x − x̄)(y − ȳ) / Σ(x − x̄)² = cb
a′ = cȳ − cbx̄ = c(ȳ − bx̄) = ca
Both the slope and the y intercept are changed by the multiplicative factor c. Thus, the new
least squares line is the original least squares line multiplied by c.
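The algebra in part b can also be confirmed numerically. The sketch below is only an illustration: the data values and the constant c are made up for the demonstration and do not come from the exercise, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Made-up illustrative data and scale factor (not from the textbook exercise)
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
c = 2.54

a, b = fit_line(xs, ys)
a_scaled, b_scaled = fit_line(xs, [c * y for y in ys])
print(a_scaled / a, b_scaled / b)  # both ratios should be close to c
```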
Exercises 5.36 – 5.51
5.36
a.
Σx = 55, Σx² = 385, Σxy = 1086, Σy = 185.6, n = 10, x̄ = 5.5, ȳ = 18.56
Sxy = Σxy − (Σx)(Σy)/n = 1086 − (55)(185.6)/10 = 65.2
Sxx = Σx² − (Σx)²/n = 385 − (55)²/10 = 82.5
The slope, b = Sxy/Sxx = 65.2/82.5 = 0.7903
The intercept, a = ȳ − bx̄ = 18.56 − 0.7903(5.5) = 14.213
The equation: ŷ = 14.213 + 0.7903x
The number of transplants has increased steadily over time.
b.
x     y       ŷ          y − ŷ
1     15      15.0036    -.0036
2     15.7    15.7939    -.0939
3     16.1    16.5842    -.4842
4     17.6    17.3745     .2254
5     18.3    18.1648     .1351
6     19.4    18.9552     .4448
7     20      19.7455     .2545
8     20.3    20.5358    -.2358
9     21.4    21.3261     .0739
10    21.8    22.1164    -.3164
There does appear to be curvature in the residual plot which indicates that the relationship
between year and number of transplants may be better described by a curve rather than a
line.
5.37
a.
b.
Yes, there appear to be large residuals, those associated with the x-values of
40, 50 and 60.
c.
x      y     ŷ       y − ŷ
40     58    46.5    11.5
50     34    42.0    −8.0
60     32    37.5    −5.5
70     30    33.0    −3.0
80     28    28.5    −0.5
90     27    24.0    3.0
100    22    19.5    2.5
Yes, the residuals for small x-values and large x-values are positive, while the residuals for
the middle x-values are negative.
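The fitted values and residuals in the table can be reproduced from the seven (x, y) pairs. This is a minimal Python sketch; `fit_line` is an illustrative helper name, and the fit it produces should match the line implied by the ŷ column.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Data from the table in part c
xs = [40, 50, 60, 70, 80, 90, 100]
ys = [58, 34, 32, 30, 28, 27, 22]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # residual = y - y-hat
for x, res in zip(xs, residuals):
    print(x, round(res, 1))
```

Plotting `residuals` against `xs` gives the residual plot whose pattern is described above.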
5.38
a.
b.
If the equation of the least squares line is ŷ = 1082.2 − 4.691x :
x       y      ŷ         residual
30.0    915    941.47    -26.47
31.8    891    933.03    -42.03
32.1    968    931.62    36.38
26.8    972    956.48    15.52
30.4    952    939.59    12.41
40.0    899    894.56    4.44
The correlation coefficient, r = -0.581. It indicates a moderately strong negative linear
relationship between pollution and medical cost.
c.
It appears that the areas with the highest and lowest pollution rates have
smaller residuals than the areas with pollution rates in the middle range. This might warrant
further investigation.
5.39
d.
The observation is influential. With this observation deleted, the equation of the regression
line is ŷ = 974 – 1.35x, which is quite different than the line based on the complete data set.
a.
The equation of the least-squares line is ŷ = 94.33 − 15.388 x .
b.
x       y     ŷ          residual
.106    98    92.6989    5.30112
.193    95    91.3601    3.63988
.511    87    86.4667    0.53326
.527    85    86.2205    -1.22053
1.08    75    77.7110    -2.71096
1.62    72    69.4014    2.59856
1.73    64    67.7088    -3.70876
2.36    55    58.0143    -3.01432
2.72    44    52.4746    -8.47464
3.12    41    46.3194    -5.31945
3.88    37    34.6246    2.37544
4.18    40    30.0082    9.99184
There appears to be a pattern in the plot. It is like the graph of a quadratic equation.
5.40
a.
No: The least squares line with observation 11 is : ŷ = -1.1 + 1.29x,
without observation 11: ŷ = -17.59 + 1.59x (not a lot of difference in the slope)
b.
Yes. With a residual of 100 – 68.56 = 31.44, when se = 12.185, observation 11 can be
considered an outlier.
c.
No: The least squares line with observation 5 is : ŷ = -1.1 + 1.29x,
without observation 5: ŷ = 5.26 + 1.16x (not a lot of difference in the slope)
d.
No. With a residual of 100 – 95.65 = 4.35, when se = 12.185, observation 5 cannot be
considered an outlier.
5.41
a.
r² = 15.4%
b.
r² = 16%: No, only 16% of the variability in first-year grades can be attributed to an
approximate linear relationship between first-year college grades and SAT II score, so
SAT II score is not a good predictor.
5.42
a.
A value of r² = 0.7664 means that 76.64% of the observed variability in clutch size can
be explained by an approximate linear relationship between clutch size and snout-vent
length.
b.
To find the value of se, we need the value of SSResid.
We know r² = 1 − SSResid/SSTo.
Solving for SSResid, we get SSResid = 10266.954, so
se = √(10266.954/(14 − 2)) = 29.25
Thus, a typical amount by which an observation deviates from the least squares line is 29.25.
5.43
a.
[Scatterplot: Runoff versus Rainfall]
There does appear to be a positive linear relationship between x and y.
b.
Σx = 798, Σx² = 63040, Σy = 643, Σy² = 41999, Σxy = 51232
x̄ = 798/15 = 53.2
ȳ = 643/15 = 42.87
b = [51232 − (798)(643)/15] / [63040 − (798)(798)/15] = 17024.4/20586.4 = 0.827
a = 42.87 − 0.827(53.2) = −1.13
ŷ = −1.13 + 0.827x
c.
For x = 80, ŷ = −1.13 + 0.827(80) = 65.03
d.
SSResid = Σy² − aΣy − bΣxy
= 41999 − (−1.13)(643) − 0.827(51232)
= 356.726
se = √(356.726/(15 − 2)) = 5.238
e.
Rainfall    Runoff    Residual
5           4         0.99344
12          10        1.20463
14          13        2.55068
17          15        2.06976
23          15        -2.89208
30          25        1.31911
40          27        -4.95062
47          46        8.26057
55          38        -6.35522
67          46        -8.2789
72          53        -5.41376
81          70        4.14348
96          82        3.73888
112         99        7.50731
127         100       -3.89728
[Residual plot: Residual versus Rainfall]
Yes, the variability of the residuals appears to be increasing with x, indicating that a linear relationship
may not be appropriate.
5.44
a.
n = 38, Σx = 704, Σy = 45.48, Σx² = 14752, Σy² = 55.444, Σxy = 829.48
Sxy = 829.48 − (704)(45.48)/38 = −13.097
Sxx = 14752 − (704)²/38 = 1709.474
Syy = 55.444 − (45.48)²/38 = 1.012
the slope, b = −13.097/1709.474 = −0.00766
intercept, a = 1.1968 − (−.00766)(18.526) = 1.339
ŷ = 1.339 − 0.008x
b.
r² = (Sxy)² / ((Sxx)(Syy)) = (−13.097)² / ((1709.474)(1.012)) = .0992
c.
With r² close to 0, the linear relationship between perceived stress and telomere length
accounts for a very small proportion of variability in telomere length.
5.45
a.
ŷ = 766 + .015(9900) = 914.5
residual = 893 − 914.5 = −21.5
b.
The typical amount that average SAT score deviates from the least squares line is 53.7.
c.
Only about 16% of the observed variation in average SAT scores can be attributed to the
approximate linear relationship between average SAT scores and expenditure per pupil. The
least-squares line does not effectively summarize the relationship between average SAT
scores and expenditure per pupil.
5.46
a.
ŷ = −89.09 + .72907(375) = 184.31
residual = 165 − 184.31 = −19.31
b.
r² = .963
5.47
a.
The plot suggests that the least squares line will give fairly accurate predictions. The least
squares equation is y = 5.20683 − .03421x.
b.
The summary statistics for the data remaining after the point (143, .3) is deleted are:
n = 9
Σx = 1060 − 143 = 917
Σy = 15.8 − .3 = 15.5
Σx² = 114514 − (143)² = 94065
Σy² = 27.82 − (.3)² = 27.73
Σxy = 1601.1 − (143)(.3) = 1558.2
Σx² − (Σx)²/n = 94065 − (917)²/9 = 94065 − 93432.1111 = 632.8889
Σxy − (Σx)(Σy)/n = 1558.2 − (917)(15.5)/9 = 1558.2 − 1579.2778 = −21.0778
b = −21.0778/632.8889 = −.0333
a = 1.7222 − (−.0333)(101.8889) = 1.7222 + 3.3930 = 5.1151
The least squares equation with the point deleted is ŷ = 5.1151 − .0333x. The deletion of this
point does not greatly affect the equation of the line.
c.
For the full data set:
SSTo = 27.82 − (15.8)²/10 = 27.82 − 24.964 = 2.856
SSResid = 27.82 − 5.2068338(15.8) − (−.03421541)(1601.1)
= 27.82 − 82.2680 + 54.7823 = .3343
r² = 1 − .3343/2.856 = 1 − .1171 = .8829
For the data set with the point (143, .3) deleted:
SSTo = 27.73 − (15.5)²/9 = 27.73 − 26.6944 = 1.0356
SSResid = 27.73 − 5.1151(15.5) − (−.0333)(1558.2) = 27.73 − 79.28405 + 51.8881 = .33405
r² = 1 − .33405/1.0356 = 1 − .3226 = .6774
The value of r² becomes smaller when the point (143, 0.3) is deleted. The reason for
this is that in the equation r² = 1 – SSResid/SSTo, the value of SSTo is lowered by
dropping the point (143, 0.3) but the value of SSResid remains about the same.
5.48
r² = 1 − 1235.470/25321.368 = 0.9512
The coefficient of determination reveals that 95.12% of the total variation in hardness of molded
plastic can be explained by the linear relationship between hardness and the amount of time elapsed
since termination of the molding process.
5.49
a.
ŷ = 62.9476 − 0.54975(25) = 49.2
residual = 70 − 49.2 = 20.8
b.
Since the slope of the fitted line is negative, the value of r is the negative square root of
r². So r = −√r² = −√0.57 = −0.755
c.
r² = 1 − SSResid/SSTo
0.57 = 1 − SSResid/2520
Solving for SSResid gives
SSResid = 1083.6
se = √(1083.6/8) = 11.64
5.50
a.
Whether se is small or not depends upon the physical setting of the problem. An se of 2 feet
when measuring heights of people would be intolerable, while an se of 2 feet when measuring
distances between planets would be very satisfactory. It is possible for the linear association
between x and y to be such that r² is large and yet have a value of se that would be
considered large. Consider the following two data sets:
Set 1
x     y
5     14
6     16
7     17
8     18
9     19
10    21
Set 2
x     y
14    5
16    15
17    25
18    35
19    45
21    55
For set 1, r² = .981 and se = .378. For set 2, r² = .981 and se = 2.911.
Both sets have a large value for r², but se for data set 2 is 7.7 times larger than se for data set
1. Hence, it can be argued that data set 2 has a large r² and a large se.
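The r² and se values quoted for the two data sets can be recomputed directly. The following is a small Python sketch using the relationship SSResid = Syy − (Sxy)²/Sxx for a least squares fit; the function name `r2_and_se` is illustrative.

```python
from math import sqrt

def r2_and_se(xs, ys):
    """Return (r squared, se) for the least squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)          # SSTo
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_resid = syy - sxy ** 2 / sxx               # SSResid for the fitted line
    return 1 - ss_resid / syy, sqrt(ss_resid / (n - 2))

set1 = ([5, 6, 7, 8, 9, 10], [14, 16, 17, 18, 19, 21])
set2 = ([14, 16, 17, 18, 19, 21], [5, 15, 25, 35, 45, 55])

r2_1, se_1 = r2_and_se(*set1)
r2_2, se_2 = r2_and_se(*set2)
print(round(r2_1, 3), round(se_1, 3))
print(round(r2_2, 3), round(se_2, 3))
```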
b.
Now consider the data set
x     y
5     10.004
55    10.006
15    10.007
45    10.008
25    10.009
35    10.010
This data set has r² = .12 and se = .002266. So yes, it is possible for a bivariate data set to
have both r² and se small.
c.
When r² is large and se is small, then not only has a large proportion of the total variability in y
been explained by the linear association between x and y, but the typical error of prediction is
small.
5.51
a.
When r = 0, then se = sy. The least squares line in this case is a horizontal line with intercept
of ȳ.
b.
When r is close to 1 in absolute value, then se will be much smaller than sy.
c.
se = √(1 − (.8)²)(2.5) = .6(2.5) = 1.5
d.
Letting y denote height at age 6 and x height at age 18, the equation for the least squares
line for predicting height at age 6 from height at age 18 is
(height at age 6) = 46 + .8(1.7/2.5)[(height at age 18) − 70] = 7.92 + .544(height at age 18)
The value of se is
√(1 − (.8)²)(1.7) = .6(1.7) = 1.02.
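Parts c and d both rely on the relationship se ≈ √(1 − r²)·sy. A brief Python sketch of that relationship follows; the function name `approx_se` is an assumption for illustration, and the formula is the large-sample approximation used in the solution above rather than the exact SSResid-based definition.

```python
from math import sqrt

def approx_se(r, s_y):
    """Approximate se from r and the standard deviation of the response variable."""
    return sqrt(1 - r ** 2) * s_y

se_part_c = approx_se(0.8, 2.5)  # predicting height at age 18 (sy = 2.5)
se_part_d = approx_se(0.8, 1.7)  # predicting height at age 6 (sy = 1.7)
print(round(se_part_c, 2), round(se_part_d, 2))
```

With r = 0 the function returns sy, and as |r| approaches 1 it approaches 0, which matches parts a and b.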
Exercises 5.52 – 5.60
5.52
a.
Because of the substantial curvature in the plot, a straight line would not provide an effective
summary of the relationship.
b.
The plot of the transformed variables suggests that the relationship could be modeled by a
straight line.
c.
The coefficient of determination between y′ and x′ is .973. This suggests that a least-squares line might effectively summarize the relationship between x′ and y′.
d.
When x = 35, x′ = 1.54407
ŷ′ = 2.01780 − 1.05171(1.54407) = 0.393886
ŷ = 10^0.393886 = 2.47677
e.
Yes, this appears to be the case. To see this, predict y using both approaches. Compare the
values of Σ(y − ŷ)² for the two methods. The method of part c results in a lower value for
Σ(y − ŷ)².
5.53
a.
[Scatterplot: Fatality Rate vs Age; % of Drivers killed in Injury Crashes versus Age of Driver]
b.
Table suggests moving x up or y down, so let y′ = 1/y.
c.
Age x    Fatality Rate y    y′ = 1/y
20       1                  1
25       0.99               1.01
30       0.8                1.25
35       0.8                1.25
40       0.75               1.33
45       0.75               1.33
50       0.95               1.05
55       1.05               0.95
60       1.15               0.87
65       1.2                0.83
70       1.3                0.77
75       1.65               0.61
80       2.2                0.45
85       3                  0.33
90       3.2                0.31
[Scatterplot: Transformed Data vs Age]
d.
The scatterplot suggests there is a good linear transformation.
e.
Using transformed data:
Σx = 825, Σx² = 52375, Σxy′ = 643.31, Σy′ = 13.3603, n = 15, x̄ = 55, ȳ′ = 0.8907
Sxy′ = Σxy′ − (Σx)(Σy′)/n = 643.31 − (825)(13.3603)/15 = −91.5065
Sxx = Σx² − (Σx)²/n = 52375 − (825)²/15 = 7000
The slope, b = Sxy′/Sxx = −91.5065/7000 = −0.0131
The intercept, a = ȳ′ − bx̄ = 0.8907 − (−0.0131)(55) = 1.6112
The equation: ŷ′ = 1.6112 − 0.0131x, or 1/ŷ = 1.6112 − 0.0131x, where x = Age and
y = fatality rate. When x = 78, 1/ŷ = 1.6112 − 0.0131(78) = .5894, so ŷ = 1/.5894 = 1.697.
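Prediction with a transformed response takes two steps: predict on the transformed scale, then undo the transformation. The sketch below illustrates this for the reciprocal transformation in part e, using the fitted coefficients quoted there; the function name `predict_fatality_rate` is hypothetical.

```python
def predict_fatality_rate(age, a=1.6112, b=-0.0131):
    """Predict y by fitting 1/y = a + b * age and then inverting."""
    y_prime = a + b * age  # predicted value of 1/y
    if y_prime <= 0:
        # The reciprocal cannot be inverted to a positive rate here
        raise ValueError("predicted 1/y is not positive; age is outside the usable range")
    return 1 / y_prime

rate_78 = predict_fatality_rate(78)
print(round(rate_78, 3))
```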
5.54
a.
The plot is curvilinear, not linear.
b.
The plot looks like section 3 of figure 5.31, which suggests going down the ladder on y and/or
x. Both x and y are down the ladder.
c.
Yes, this plot is straighter than the plot in part a.
d.
Since there are x observations whose values are 0, neither log(x) nor 1/x can be employed.
Another transformation that might be helpful in straightening the plot is cube root of x and
cube root of y.
5.55
a.
n = 12, Σx = 22.4, Σy = 303.1, Σx² = 88.58, Σy² = 12039.27, Σxy = 241.29
r = [241.29 − (22.4)(303.1)/12] / [√(88.58 − (22.4)²/12) √(12039.27 − (303.1)²/12)]
  = −324.50 / (√46.767 √4383.47)
  = −324.5 / ((6.84)(66.208)) = −0.717
b.
n = 12, Σx = 13.5, Σy = 55.74, Σx² = 22.441, Σy² = 303.3626, Σxy = 47.7283
r = [47.7283 − (13.5)(55.74)/12] / [√(22.441 − (13.5)²/12) √(303.3626 − (55.74)²/12)]
  = −14.9792 / (√7.2535 √44.4503)
  = −14.9792 / ((2.693)(6.667)) = −0.835
The correlation between x and y is −.835. Since this correlation is larger in absolute
value than the correlation of part a, the transformation appears successful in straightening the
plot.
5.56
a.
From 1990 to 1999 the number of people waiting for a transplant has increased. Each year,
the number of people added to the waiting list increases.
b.
Using the transformation y’ where y’ = y :
Σx = 55, Σx² = 385, Σxy′ = 391.2, Σy′ = 64.72, n = 10, x̄ = 5.5, ȳ′ = 6.47
Sxy′ = Σxy′ − (Σx)(Σy′)/n = 391.2 − (55)(64.72)/10 = 35.24
Sxx = Σx² − (Σx)²/n = 385 − (55)²/10 = 82.5
The slope, b = Sxy′/Sxx = 35.24/82.5 = 0.427
The intercept, a = ȳ′ − bx̄ = 6.47 − 0.427(5.5) = 4.12
The equation: ŷ′ = 4.12 + 0.427x
c.
Using the transformed equation:
yˆ = 4.12 + 0.427 x, the predicted number of patients
waiting for an organ transplant in 2000 (year 11) is
ŷ = 4.12 + 0.427(11) = 8.817 ⇒
y = 2.269. As y is measured in thousands, we predict 2269 patients awaiting transplant
surgery in 2000.
d.
We are assuming the relationship between year and the number awaiting transplant stays the
same outside the range of x-values in the given data range. The further from the data range
a prediction is going to be made, the less accurate it may be. 2010 is further away from the
data used to create the least squares line and we don't know if the relationship between the
two variables is still the same. I would be less confident to make a prediction if the year was
2010.
5.57
a.
The relationship appears non-linear.
b.
Σx = 7.5, Σx² = 13.75, Σxy = 641.05, Σy = 370.1, n = 5, x̄ = 1.5, ȳ = 74.02
$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 641.05 - \frac{(7.5)(370.1)}{5} = 85.9$$
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 13.75 - \frac{(7.5)^2}{5} = 2.5$$
The slope, $b = \frac{S_{xy}}{S_{xx}} = \frac{85.9}{2.5} = 34.36$
The intercept, $a = \bar{y} - b\bar{x} = 74.02 - 34.36(1.5) = 22.48$
The equation: ŷ = 22.48 + 34.36x
There is a definite curvature in the residual plot confirming the conclusion in part a.
c.
The value of r² is higher and the residuals are smaller for the log transformation.
d.
y = a + b(x’) where x’ = log10(x)
values of x’ are: -0.30103, 0, 0.17609, 0.30103, 0.39794
Σx’ = .5740, Σ(x’)² = .3706, Σx’y = 73.2836, Σy = 370.1, n = 5, x̄’ = .1148, ȳ = 74.02
$$S_{x'y} = \sum x'y - \frac{(\sum x')(\sum y)}{n} = 73.2836 - \frac{(.5740)(370.1)}{5} = 30.796$$
$$S_{x'x'} = \sum (x')^2 - \frac{(\sum x')^2}{n} = .3706 - \frac{(.574)^2}{5} = 0.3047$$
The slope, $b = \frac{S_{x'y}}{S_{x'x'}} = \frac{30.796}{0.3047} = 101.07$
The intercept, $a = \bar{y} - b\bar{x}' = 74.02 - 101.07(.1148) = 62.417$
The equation: ŷ = 62.417 + 101.07x’ ⇒ ŷ = 62.417 + 101.07 log(x)
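The fit on the log scale can be reproduced as a quick check. This sketch assumes the x-values 0.5, 1.0, 1.5, 2.0, 2.5 (recovered from the listed base-10 logs and from Σx = 7.5) and uses the quoted sums Σy and Σx’y, since the individual y-values are not listed here. The helper `predict` is an illustrative name.

```python
import math

# Assumed x-values (energy of shock), consistent with the listed log10 values.
x = [0.5, 1.0, 1.5, 2.0, 2.5]
sum_y = 370.1        # quoted in the solution
sum_xpy = 73.2836    # quoted sum of x' * y

n = len(x)
x_prime = [math.log10(v) for v in x]
sum_xp = sum(x_prime)
sum_xp2 = sum(v * v for v in x_prime)

s_xy = sum_xpy - sum_xp * sum_y / n
s_xx = sum_xp2 - sum_xp ** 2 / n
b = s_xy / s_xx
a = sum_y / n - b * sum_xp / n

def predict(energy):
    """Predicted success percent from the line fitted to log10(energy)."""
    return a + b * math.log10(energy)

print(f"y-hat = {a:.3f} + {b:.2f} log10(x)")
print(f"prediction at x = 1.75: {predict(1.75):.1f}%")
```

Working from unrounded logs gives coefficients that agree with the hand calculation to about two decimal places.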
e.
When the energy of shock is x = 1.75, the predicted success percent is 62.417 + 101.07 log(1.75)
= 87.0%. When the energy of shock is 0.8, the predicted success percent is 62.417 + 101.07 log(0.8)
= 52.6%.
5.58
a.
b.
c.
d.
e.
I would recommend using either the transformation of part d or part e.
5.59
a.
The plot does appear to have a positive slope, so the scatter plot is compatible with the
"positive association" statement made in the paper.
b.
This transformation does straighten the plot, but it also appears that the variability of y
increases as x increases.
c.
The plot appears to be as straight as the plot in b, and has the desirable property that the
variability in y appears to be constant regardless of the value of x.
d.
This plot has curvature opposite of the plot in part a, suggesting that this transformation has
taken us too far along the ladder.
5.60
The relationship between age and canal length is not linear, but curvilinear. Transforming to
1/x produces a scatterplot that is much straighter than the plot above.
Exercises 5.61 – 5.65
5.61
Using x = peak intake and y’ = ln(p/(1 − p)): Σx = 250, Σx² = 16500, Σxy’ = 61.75,
Σy’ = 6.4958, n = 5, x̄ = 50, ȳ’ = 1.3
$$S_{xy'} = \sum xy' - \frac{(\sum x)(\sum y')}{n} = 61.75 - \frac{(250)(6.4958)}{5} = -263.04$$
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 16500 - \frac{(250)^2}{5} = 4000$$
The slope, $b = \frac{S_{xy'}}{S_{xx}} = \frac{-263.04}{4000} = -0.06576$
The intercept, $a = \bar{y}' - b\bar{x} = 1.3 - (-0.06576)(50) = 4.589$
The equation: ŷ’ = 4.589 − 0.0659x
Using the values of a and b from the logistic equation, the probability of survival for a hamster with a
peak intake of 40 μg is
$$p = \frac{e^{4.589 - 0.0659(40)}}{1 + e^{4.589 - 0.0659(40)}} = .876$$
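The back-transformation from the fitted line to a probability can be sketched as follows. The function name `logistic_probability` is an assumption for illustration; the coefficients are the ones obtained above.

```python
import math

def logistic_probability(a, b, x):
    """Convert a fitted logit line a + b*x back to a probability."""
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

# Probability of survival at a peak intake of 40, using the fitted coefficients.
p40 = logistic_probability(4.589, -0.0659, 40)
print(f"p = {p40:.3f}")
```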
5.62
a.
[Plot: Probability of Hatching for Low and Mid Elevation Treatments. Probability versus Days (0 to 8), with separate series for Low and Mid.]
The plots have the characteristic “S-shape” of the logistic plot.
b.
Days   Proportion p   p/(1 − p)   y’ = ln(p/(1 − p))
1      0.75           3           1.098612
2      0.67           2.030303    0.708185
3      0.36           0.5625      -0.57536
4      0.31           0.449275    -0.80012
5      0.14           0.162791    -1.81529
6      0.09           0.098901    -2.31363
7      0.06           0.06383     -2.75154
8      0.07           0.075269    -2.58669
The resulting best fit line is: ŷ’ = a + bx = 1.513 − 0.587x,
where y’ = ln(p/(1 − p)), p is the proportion of eggs hatched, and x = the days of exposure.
The negative slope (b < 0) indicates that the probability of hatching is highest for small x values
and decreases as x increases. In other words, the greater the exposure time, the lower the
probability of hatching.
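The logit transformation and the least squares fit in part b can be reproduced from the tabulated proportions. This is a minimal sketch in plain Python; variable names are illustrative.

```python
import math

days = [1, 2, 3, 4, 5, 6, 7, 8]
proportion = [0.75, 0.67, 0.36, 0.31, 0.14, 0.09, 0.06, 0.07]

# Transform each proportion to the log-odds scale.
logit = [math.log(p / (1 - p)) for p in proportion]

n = len(days)
x_bar = sum(days) / n
y_bar = sum(logit) / n

# Least squares fit of the log-odds on days of exposure.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, logit))
s_xx = sum((x - x_bar) ** 2 for x in days)
b = s_xy / s_xx
a = y_bar - b * x_bar

print(f"y' = {a:.3f} + ({b:.3f}) x")
```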
c.
For 3 days: $p = \frac{e^{1.513 - 0.587(3)}}{1 + e^{1.513 - 0.587(3)}} = .438$
For 5 days: $p = \frac{e^{1.513 - 0.587(5)}}{1 + e^{1.513 - 0.587(5)}} = .194$
d.
Somewhere between two days (p = .584) and three days (p = .438).
5.63
a.
It can be seen from the table that as the elevation increases, the lichen becomes less
common.
b.
Elevation   Proportion p   p/(1 − p)   y’ = ln(p/(1 − p))
400         0.99           99          4.59512
600         0.96           24          3.178054
800         0.75           3           1.098612
1000        0.29           0.408451    -0.89538
1200        0.077          0.083424    -2.48382
1400        0.035          0.036269    -3.31678
1600        0.01           0.010101    -4.59512
The resulting best fit line is: ŷ’ = a + bx = 7.537 − 0.0079x, where y’ = ln(p/(1 − p)), p is the
proportion of plots with lichen, and x = elevation.
c.
To estimate the proportion of plots of land where the lichen is classified as “common” at an
elevation of 900 m:
$$p = \frac{e^{7.537 - 0.0079(900)}}{1 + e^{7.537 - 0.0079(900)}} = .6052$$
5.64
The table with a row of proportions of mosquitoes killed:
Concentration       0.10    0.15    0.20    0.30    0.50    0.70    0.95
Proportion killed   .2083   .25     .4464   .6078   .8298   .9623   .9608
a.
[Plot: Proportion of Mosquitoes Killed with Different Concentrations of Pesticide. Prop. Killed versus Conc.]
b.
Concentration   Proportion p   p/(1 − p)   y’ = ln(p/(1 − p))
0.1             0.2083         0.263105    -1.3352
0.15            0.25           0.333333    -1.09861
0.2             0.4464         0.806358    -0.21523
0.3             0.6078         1.54972     0.438074
0.5             0.8298         4.875441    1.58421
0.7             0.9623         25.5252     3.239666
0.95            0.9608         24.5102     3.19909
The resulting best fit line is: ŷ’ = a + bx = −1.559 + 5.768x, where y’ = ln(p/(1 − p)), p is the
proportion of mosquitoes killed, and x = the concentration of pesticide.
The positive slope, b > 0, shows that as the concentration of the pesticide increases,
the proportion of the mosquitoes killed also increases.
c.
When the dose kills 50%, p = 0.5, so $\ln\left(\frac{p}{1-p}\right) = a + bx$ becomes
$$\ln\left(\frac{0.5}{1 - 0.5}\right) = -1.559 + 5.768x$$
$$x = \frac{\ln(1) + 1.559}{5.768} = \frac{1.559}{5.768} = 0.270$$
About 0.27 g/cc
5.65
a.
[Plot: Proportion failing vs. Load on the Fabric. Proportion failing versus Load.]
b.
Load   Prop. failing p   p/(1 − p)   y’ = ln(p/(1 − p))
5      0.02              0.020408    -3.89182
15     0.04              0.041667    -3.17805
35     0.2               0.25        -1.38629
50     0.23              0.298701    -1.20831
70     0.32              0.470588    -0.75377
80     0.34              0.515152    -0.66329
90     0.43              0.754386    -0.28185
The resulting best fit line is: ŷ’ = a + bx = −3.579 + 0.0397x, where y’ = ln(p/(1 − p)), p is the
proportion of fabrics failing, and x = the load applied.
The positive slope, b > 0, shows that as the load (force) increases, the proportion of the
fabrics that fail also increases.
c.
When the load is 60 lbs per sq. in.,
$$p = \frac{e^{-3.579 + 0.0397(60)}}{1 + e^{-3.579 + 0.0397(60)}} = .232$$
so about 23.2% of fabrics are predicted to fail at this load.
d.
When the failure rate is 5%, p = 0.05, so $\ln\left(\frac{p}{1-p}\right) = a + bx$ becomes
$$\ln\left(\frac{0.05}{1 - 0.05}\right) = -3.579 + 0.0397x$$
$$x = \frac{\ln(0.0526) + 3.579}{0.0397} = 15.98$$
To have less than a 5% chance of a wardrobe malfunction, a maximum force of 15.5 lbs/sq
in. might be suggested.
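Solving the logit equation for x, as in part d, can be written as a small helper. This is a sketch only; `x_for_probability` is a hypothetical name, and the coefficients are the fitted values quoted in part b.

```python
import math

def x_for_probability(a, b, p):
    """Solve ln(p / (1 - p)) = a + b*x for x."""
    return (math.log(p / (1 - p)) - a) / b

# Load at which the fitted failure probability equals 5%.
load_5pct = x_for_probability(a=-3.579, b=0.0397, p=0.05)
print(f"load for 5% failure: {load_5pct:.2f}")
```

The same helper applies to any fitted logit line, for example to find the concentration that gives p = 0.5.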
Exercises 5.66 – 5.78
5.66
a.
r = -0.981 There appears to be a strong negative linear relationship between the amount of
catalyst added to a chemical reaction and the resulting reaction time.
b.
There is a definite curvature to the plot, so a line does not seem the best description of this
relationship. This shows the importance of checking not only numerical measures of “a
good fit” but also graphical ones.
5.67
a.
For this data set, n = 13, Σx = 5.92, Σx² = 3.8114, Σy = 10.47,
Σy² = 9.885699, Σxy = 5.8464, x̄ = 0.455, ȳ = 0.805
$$b = \frac{5.8464 - \frac{(5.92)(10.47)}{13}}{3.8114 - \frac{(5.92)(5.92)}{13}} = \frac{5.8464 - 4.7679}{3.8114 - 2.6959} = \frac{1.0785}{1.1155} = 0.9668$$
a = 0.805 – 0.9668(0.455) = 0.3651
The least squares regression line is ŷ = 0.3651 + 0.9668x
b.
For a value of x = 0.5, ŷ = 0.3651 + 0.9668(0.5) = 0.8485
5.68
n = 15, ∑ x = 82.82, ∑ y = 12545, ∑ x2 = 459.9784, ∑ y2 = 12734425, ∑ xy = 67703.9
a.
$$\sum xy - \frac{(\sum x)(\sum y)}{n} = 67703.9 - \frac{(82.82)(12545)}{15} = -1561.227$$
$$\sum x^2 - \frac{(\sum x)^2}{n} = 459.9784 - \frac{(82.82)^2}{15} = 2.702$$
$$b = \frac{-1561.227}{2.702} = -577.895$$
a = 836.33 − (−577.8953)(5.5213) = 4027.083
ŷ = 4027.083 − 577.895x
b.
The b value of −577.895 is the estimate of the average change in myoglobin level associated
with a one unit increase in finishing time.
c.
ŷ = 4027.083 − 577.895(8) = −596.077
The least squares equation yields a negative value for the estimated level of myoglobin when
the finishing time is 8h. This is clearly unreasonable since myoglobin level cannot be a
negative value.
5.69
$$r^2 = 1 - \frac{5987.16}{17409.60} = 1 - .3439 = .6561$$
So 65.61% of the observed variation in age is explained by a linear relationship between percentage
of root with transparent dentine for premolars and age.
$$s_e^2 = \frac{5987.16}{36 - 2} = \frac{5987.16}{34} = 176.0929$$
$$s_e = \sqrt{176.0929} = 13.27$$
The typical amount by which an observed age deviates from the least squares line of percentage of
root with transparent dentine and age is 13.27.
5.70
a.
The least squares line is ŷ = 32.08 + 0.5549x
x    y    predicted   residual
15   23   40.4048     −17.4048
19   52   42.6245     9.3755
31   65   49.2837     15.7163
39   55   53.7231     1.2769
41   32   54.8330     −22.8330
44   60   56.4978     3.5022
47   78   58.1626     19.8374
48   59   58.7175     0.2825
55   61   62.6020     −1.6020
65   60   68.1513     −8.1513
b.
SSResid = (−17.4048)² + (9.3755)² + . . . + (−1.6020)² + (−8.1513)²
= 302.9271 + 87.9000 + . . . + 2.5664 + 66.4437
= 1635.6833
$$SSTo = 31993 - \frac{(545)^2}{10} = 31993 - 29702.5 = 2290.50$$
c.
$$r^2 = 1 - \frac{1635.6833}{2290.5000} = 1 - .7141 = .2859$$
Only 28.59% of the observed variation in age is explained by the linear relationship between
percent of root transparent dentine and age. Also, se = 14.3, so a typical prediction error is
quite large. The least squares line does not give very accurate predictions.
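The residuals, SSResid, SSTo, r² and s_e for this exercise can be recomputed from the ten (x, y) pairs in the table of part a. The sketch below is illustrative and uses unrounded coefficients, so predicted values may differ from a hand calculation with the rounded line in the last digit.

```python
import math

x = [15, 19, 31, 39, 41, 44, 47, 48, 55, 65]
y = [23, 52, 65, 55, 32, 60, 78, 59, 61, 60]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares coefficients.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual analysis.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
ss_resid = sum(e ** 2 for e in residuals)
ss_to = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_resid / ss_to
s_e = math.sqrt(ss_resid / (n - 2))

print(f"y-hat = {a:.2f} + {b:.4f} x")
print(f"SSResid = {ss_resid:.4f}, SSTo = {ss_to:.2f}")
print(f"r^2 = {r2:.4f}, s_e = {s_e:.1f}")
```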
5.71
a.
$$\sum x^2 - \frac{(\sum x)^2}{n} = 62.600235 - \frac{(22.027)^2}{12} = 62.600235 - 40.43239 = 22.16784$$
$$\sum xy - \frac{(\sum x)(\sum y)}{n} = 1114.5 - \frac{(22.027)(793)}{12} = 1114.5 - 1455.61758 = -341.11758$$
$$b = \frac{-341.11758}{22.16784} = -15.38795$$
a = 66.08333 − (−15.38795)(1.83558) = 66.08333 + 28.24586 = 94.32919
The least squares equation is ŷ = 94.33 − 15.388x
b.
$$SSTo = \sum y^2 - \frac{(\sum y)^2}{n} = 57939 - \frac{(793)^2}{12} = 57939 - 52404.08333 = 5534.91667$$
SSResid = 57939 − 94.32919(793) − (−15.38795)(1114.5)
= 57939 − 74803.04767 + 17149.87028 = 285.82261
c.
$$r^2 = 1 - \frac{285.82261}{5534.91667} = 1 - .05164 = .94836 \text{ or } 94.836\%$$
d.
$$s_e^2 = \frac{285.82261}{10} = 28.582261$$
$$s_e = \sqrt{28.582261} = 5.34624$$
A typical prediction error would be about 5.35 percent.
e.
Since the slope of the fitted line is negative, the value of r is the negative square root of r².
So $r = -\sqrt{r^2} = -\sqrt{.94836} = -.97384$.
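Parts b through e chain together: the shortcut formula SSResid = Σy² − aΣy − bΣxy, then r², then r with the sign of the slope. The sketch below uses the sums and coefficients quoted above; note that the shortcut formula subtracts large, nearly equal numbers, so it is sensitive to rounding in a and b, which is why the coefficients are carried to five decimals.

```python
import math

n = 12
sum_y, sum_y2, sum_xy = 793, 57939, 1114.5
a, b = 94.32919, -15.38795   # coefficients from part a, unrounded

# Shortcut (computational) formula for the residual sum of squares.
ss_resid = sum_y2 - a * sum_y - b * sum_xy
ss_to = sum_y2 - sum_y ** 2 / n
r2 = 1 - ss_resid / ss_to

# r takes the sign of the slope.
r = math.copysign(math.sqrt(r2), b)

print(f"SSResid = {ss_resid:.3f}, SSTo = {ss_to:.3f}")
print(f"r^2 = {r2:.5f}, r = {r:.5f}")
```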
5.72
a.
b.
It appears that log x, log y does the best job of producing an approximate linear relationship.
The least-squares equation for predicting y’ = log y from x’ = log x is ŷ’ = 1.61867 − .31646x’.
When x = 25, x’ = log(25) = 1.39794
ŷ’ = 1.61867 − .31646(1.39794) = 1.17628
$\hat{y} = 10^{1.17628} = 15.0064$
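Predicting on the log-log scale and transforming back can be sketched as follows, assuming base-10 logs as in the calculation above. The function name `predict_from_loglog` is an assumption for illustration.

```python
import math

def predict_from_loglog(x, a=1.61867, b=-0.31646):
    """Predict y from a line fitted to (log10 x, log10 y), then undo the log."""
    log_y_hat = a + b * math.log10(x)
    return 10 ** log_y_hat

y_hat_25 = predict_from_loglog(25)
print(f"predicted y at x = 25: {y_hat_25:.3f}")
```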
5.73
a.
A value of .11 for r indicates a weak linear relationship between annual raises and teaching
evaluations.
b.
r² = (.11)² = .0121
5.74
a.
The plot does not suggest a linear relationship. However, the one outlier value (51.3, 49.3)
prevents an accurate interpretation.
b.
ŷ = −11.37 + 1.0906(40) = 32.254
c.
The value of r² is not very large and the value of se is 4.70, which is large relative to the size
of the y-values in the sample. A straight line is not very effective in summarizing the
relationship.
d.
For the new data set, n = 9,
Σx = 388.8 − 51.3 = 337.5, Σy = 310.3 − 49.3 = 261.0
Σx² = 15338.54 − (51.3)² = 12706.85, Σy² = 10072.41 − (49.3)² = 7641.92
Σxy = 12306.58 − (51.3)(49.3) = 9777.49
$$\sum x^2 - \frac{(\sum x)^2}{n} = 12706.85 - \frac{(337.5)^2}{9} = 50.60$$
$$\sum xy - \frac{(\sum x)(\sum y)}{n} = 9777.49 - \frac{(337.5)(261.0)}{9} = -10.01$$
$$b = \frac{-10.01}{50.60} = -0.1978, \quad a = \frac{261}{9} - (-.1978)\left(\frac{337.5}{9}\right) = 36.4175$$
ŷ = 36.4175 − .1978x
$$\sum y^2 - \frac{(\sum y)^2}{n} = 7641.92 - \frac{(261)^2}{9} = 72.92, \quad r^2 = \frac{(-10.01)^2}{(50.6)(72.92)} = .027$$
Without the observation (51.3, 49.3) there is very little evidence of a linear relationship
between fire-simulation consumption and treadmill consumption. One would be very hesitant
to use the prediction equation based on the data set including this observation because this
observation is very influential.
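The effect of deleting the influential observation can be checked by adjusting the summary sums rather than refitting from raw data. This sketch assumes the full-data sums implied by the subtractions in part d (n = 10 before deletion); the function name `remove_point` is hypothetical.

```python
def remove_point(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy, x0, y0):
    """Return the summary sums after deleting the observation (x0, y0)."""
    return (n - 1, sum_x - x0, sum_y - y0,
            sum_x2 - x0 ** 2, sum_y2 - y0 ** 2, sum_xy - x0 * y0)

# Full-data sums, then delete the observation (51.3, 49.3).
n, sx, sy, sx2, sy2, sxy = remove_point(
    10, 388.8, 310.3, 15338.54, 10072.41, 12306.58, 51.3, 49.3)

s_xx = sx2 - sx ** 2 / n
s_yy = sy2 - sy ** 2 / n
s_xy = sxy - sx * sy / n

b = s_xy / s_xx
a = sy / n - b * sx / n
r2 = s_xy ** 2 / (s_xx * s_yy)

print(f"y-hat = {a:.4f} + ({b:.4f}) x, r^2 = {r2:.3f}")
```

The slope changes sign and r² drops to nearly zero, which is the numerical signature of an influential point.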
5.75
The summary values are: n = 13, Σx = 91, Σy = 470, Σx² = 819, Σy² = 19118, Σxy = 3867
a.
$$\sum xy - \frac{(\sum x)(\sum y)}{n} = 577, \quad \sum x^2 - \frac{(\sum x)^2}{n} = 182, \quad \sum y^2 - \frac{(\sum y)^2}{n} = 2125.6923$$
$$b = \frac{577}{182} = 3.1703$$
a = 36.1538 − 3.1703(7) = 13.9617
The equation of the estimated regression line is ŷ = 13.9617 + 3.1703x
b.
The plot with the line drawn in suggests that perhaps a simple linear regression model may
not be appropriate. The scatterplot suggests that a curvilinear relationship may exist
between flood depth and damage. The points for small x-values or large x-values are below
the line, while points for x-values in the middle range are above the line.
c.
When x = 6.5, ŷ = 13.9617 + 3.1703(6.5) = 34.5687
d.
The scatterplot in part b suggests that the value of damage levels off at between 45 and 50
when the depth of flooding is in excess of 10 feet. Using the least squares line to predict
flood damage when x = 18 would yield a very high value for damage and result in a predicted
value far in excess of actual damage. Since x = 18 is outside of the range of x-values for
which data has been collected, we have no information concerning the relationship in the
vicinity of x = 18. All of these reasons suggest that one would not want to use the least
squares line to predict flood damage when depth of flooding is 18 feet.
5.76
a.
Σ(x − x̄)² = 2, Σ(y − ȳ)² = 2, Σ(x − x̄)(y − ȳ) = 0
$$r = \frac{0}{\sqrt{2(2)}} = 0$$
b.
If y = 1, when x = 6, then r = .509.
(Comment: Any y value greater than .973 will work.)
c.
If y = −1, when x = 6, then r = −.509.
(Comment: any y value less than −.973 will work).
5.77
a.
b.
$$\sum x^2 - \frac{(\sum x)^2}{n} = .2157, \quad \sum y^2 - \frac{(\sum y)^2}{n} = 3.08, \quad \sum xy - \frac{(\sum x)(\sum y)}{n} = 0.474$$
$$b = \frac{.474}{.2157} = 2.1975$$
a = 7.6 − (2.1975)(.93286) = 5.550
The least squares line is ŷ = 5.550 + 2.1975x
c.
$$s_x = \sqrt{.2157/6} = .1896, \quad s_y = \sqrt{3.08/6} = .7165$$
$$r = \frac{.474}{6(.1896)(.7165)} = .5815$$
This value of r suggests a moderate positive linear relationship between x and y.
d.
x rank   y rank   (x rank)(y rank)
6.5      5.0      32.5
6.5      7.0      45.5
4.0      2.0      8.0
3.0      1.0      3.0
1.0      3.5      3.5
2.0      3.5      7.0
5.0      6.0      30.0
                  129.5
$$r_s = \frac{129.5 - \frac{7(8)^2}{4}}{\frac{7(6)(8)}{12}} = \frac{17.5}{28} = .625$$
This value is very close to the value of r in part c.
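The rank correlation in part d can be recomputed from the ranks. This sketch assumes the pairing of x ranks with y ranks that reproduces the listed products (32.5, 45.5, 8.0, 3.0, 3.5, 7.0, 30.0), and it applies the same formula as the hand calculation, which does not adjust for ties.

```python
# Ranks as read from part d; the pairing is inferred from the listed products.
x_rank = [6.5, 6.5, 4.0, 3.0, 1.0, 2.0, 5.0]
y_rank = [5.0, 7.0, 2.0, 1.0, 3.5, 3.5, 6.0]

n = len(x_rank)
sum_products = sum(rx * ry for rx, ry in zip(x_rank, y_rank))

# Spearman rank correlation via the computational formula used in the text.
numerator = sum_products - n * (n + 1) ** 2 / 4
denominator = n * (n - 1) * (n + 1) / 12
r_s = numerator / denominator

print(f"sum of products = {sum_products}, r_s = {r_s:.3f}")
```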
5.78
a.
b.
Based on the plot in part a and figure 5.34 a transformation going down the ladder on x or y is
suggested. The transformation log(time) will produce a reasonably straight plot.