# Thus, SSR = SST - SSE = 281.2 – 127.3 = 153.9

```Chapter 14
Simple Linear Regression
Learning Objectives
1. Understand how regression analysis can be used to develop an equation that estimates
   mathematically how two variables are related.
2. Understand the differences between the regression model, the regression equation, and the estimated
   regression equation.
3. Know how to fit an estimated regression equation to a set of sample data based upon the least
   squares method.
4. Be able to determine how good a fit is provided by the estimated regression equation and compute
   the sample correlation coefficient from the regression analysis output.
5. Understand the assumptions necessary for statistical inference and be able to test for a significant
   relationship.
6. Know how to develop confidence interval estimates of y given a specific value of x in both the case
   of a mean value of y and an individual value of y.
7. Learn how to use a residual plot to make a judgement as to the validity of the regression
   assumptions.
8. Know the definition of the following terms:
   independent and dependent variable
   simple linear regression
   regression model
   regression equation and estimated regression equation
   scatter diagram
   coefficient of determination
   standard error of the estimate
   confidence interval
   prediction interval
   residual plot
May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Solutions:
1. a.
[Scatter diagram: y versus x]
b.
There appears to be a positive linear relationship between x and y.
c.
Many different straight lines can be drawn to provide a linear approximation of the
relationship between x and y; in part (d) we will determine the equation of a straight line
that “best” represents the relationship according to the least squares criterion.
d.
x̄ = Σxi/n = 15/5 = 3
ȳ = Σyi/n = 40/5 = 8
Σ(xi - x̄)(yi - ȳ) = 26
Σ(xi - x̄)² = 10
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 26/10 = 2.6
b0 = ȳ - b1 x̄ = 8 - (2.6)(3) = 0.2
ŷ = 0.2 + 2.6x
e.
ŷ = 0.2 + 2.6(4) = 10.6
2. a.
[Scatter diagram: y versus x]
b.
There appears to be a negative linear relationship between x and y.
c.
Many different straight lines can be drawn to provide a linear approximation of the
relationship between x and y; in part (d) we will determine the equation of a straight line
that “best” represents the relationship according to the least squares criterion.
d.
x̄ = Σxi/n = 55/5 = 11
ȳ = Σyi/n = 175/5 = 35
Σ(xi - x̄)(yi - ȳ) = -540
Σ(xi - x̄)² = 180
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = -540/180 = -3
b0 = ȳ - b1 x̄ = 35 - (-3)(11) = 68
ŷ = 68 - 3x
e.
ŷ = 68 - 3(10) = 38
3. a.
[Scatter diagram: y versus x]
b.
x̄ = Σxi/n = 50/5 = 10
ȳ = Σyi/n = 83/5 = 16.6
Σ(xi - x̄)(yi - ȳ) = 171
Σ(xi - x̄)² = 190
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 171/190 = 0.9
b0 = ȳ - b1 x̄ = 16.6 - (0.9)(10) = 7.6
ŷ = 7.6 + 0.9x
c.
ŷ = 7.6 + 0.9(6) = 13
4. a.
[Scatter diagram: % Management versus % Working]
b.
There appears to be a positive linear relationship between the percentage of women working in the
five companies (x) and the percentage of management jobs held by women in that company (y).
c.
Many different straight lines can be drawn to provide a linear approximation of the
relationship between x and y; in part (d) we will determine the equation of a straight line
that “best” represents the relationship according to the least squares criterion.
d.
x̄ = Σxi/n = 300/5 = 60
ȳ = Σyi/n = 215/5 = 43
Σ(xi - x̄)(yi - ȳ) = 624
Σ(xi - x̄)² = 480
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 624/480 = 1.3
b0 = ȳ - b1 x̄ = 43 - 1.3(60) = -35
ŷ = -35 + 1.3x
e.
ŷ = -35 + 1.3x = -35 + 1.3(60) = 43%
5. a.
[Scatter diagram: Rating versus Price (\$)]
b.
There appears to be a positive relationship between price and rating. The sign that says “Quality:
You Get What You Pay For” does fairly reflect the price-quality relationship for ellipticals.
c.
Let x = price (\$) and y = rating.
x̄ = Σxi/n = 15,000/8 = 1875
ȳ = Σyi/n = 592/8 = 74
Σ(xi - x̄)(yi - ȳ) = 68,900
Σ(xi - x̄)² = 8,155,000
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 68,900/8,155,000 = .008449
b0 = ȳ - b1 x̄ = 74 - (.008449)(1875) = 58.158
ŷ = 58.158 + .008449x
d.
ŷ = 58.158 + .008449x = 58.158 + .008449(1500) = 70.83 or approximately 71
6. a.
[Scatter diagram: Win% versus Yds/Att]
b.
The scatter diagram indicates a positive linear relationship between x = average number of passing
yards per attempt and y = the percentage of games won by the team.
c.
x̄ = Σxi/n = 68/10 = 6.8
ȳ = Σyi/n = 464/10 = 46.4
Σ(xi - x̄)(yi - ȳ) = 121.6
Σ(xi - x̄)² = 7.08
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 121.6/7.08 = 17.1751
b0 = ȳ - b1 x̄ = 46.4 - (17.1751)(6.8) = -70.391
ŷ = -70.391 + 17.1751x
d.
The slope of the estimated regression line is approximately 17.2. So, for every increase of one yard
in the average number of passing yards per attempt, the percentage of games won by the team
increases by approximately 17.2 percentage points.
e.
With an average number of passing yards per attempt of 6.2, the predicted percentage of games won
is ŷ = -70.391 + 17.175(6.2) = 36%. With a record of 7 wins and 9 losses, the Kansas City Chiefs
won 43.8%, or approximately 44%, of their games. Considering the small data size, the
7. a.
[Scatter diagram: Annual Sales (\$1000s) versus Years of Experience]
b.
Let x = years of experience and y = annual sales (\$1000s)
x̄ = Σxi/n = 70/10 = 7
ȳ = Σyi/n = 1080/10 = 108
Σ(xi - x̄)(yi - ȳ) = 568
Σ(xi - x̄)² = 142
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 568/142 = 4
b0 = ȳ - b1 x̄ = 108 - (4)(7) = 80
ŷ = 80 + 4x
c.
ŷ = 80 + 4x = 80 + 4(9) = 116 or \$116,000
8. a.
[Scatter diagram: Satisfaction versus Speed of Execution]
b.
The scatter diagram indicates a positive linear relationship between x = speed of execution rating and
y = overall satisfaction rating for electronic trades.
c.
x̄ = Σxi/n = 36.3/11 = 3.3
ȳ = Σyi/n = 35.2/11 = 3.2
Σ(xi - x̄)(yi - ȳ) = 2.4
Σ(xi - x̄)² = 2.6
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 2.4/2.6 = .9077
b0 = ȳ - b1 x̄ = 3.2 - (.9077)(3.3) = .2046
ŷ = .2046 + .9077x
d.
The slope of the estimated regression line is approximately .9077. So, a one unit increase in the
speed of execution rating will increase the overall satisfaction rating by approximately .9 points.
e.
The average speed of execution rating for the other brokerage firms is 3.4. Using this as the new
value of x for Zecco.com, we can use the estimated regression equation developed in part (c) to
estimate the overall satisfaction rating corresponding to x = 3.4.
ŷ = .2046 + .9077x = .2046 + .9077(3.4) = 3.29
Thus, an estimate of the overall satisfaction rating when x = 3.4 is approximately 3.3.
9. a.
[Scatter diagram: Rating versus Price (\$)]
b.
The scatter diagram indicates a positive linear relationship between x = price (\$) and y = overall
rating.
c.
x̄ = Σxi/n = 4660/20 = 233
ȳ = Σyi/n = 1400/20 = 70
Σ(xi - x̄)(yi - ȳ) = 8100
Σ(xi - x̄)² = 127,420
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 8100/127,420 = .06357
b0 = ȳ - b1 x̄ = 70 - (.06357)(233) = 55.188
ŷ = 55.188 + .06357x
d.
We can use the estimated regression equation developed in part (c) to estimate the overall
rating corresponding to x = 200.
ŷ = 55.188 + .06357x = 55.188 + .06357(200) = 67.9
Thus, an estimate of the overall rating when x = \$200 is approximately 68.
10. a.
[Scatter diagram: % Gain in Options Value versus % Increase in Stock Price]
b.
The scatter diagram indicates a positive linear relationship between x = percentage increase in the
stock price and y = percentage gain in options value. In other words, options values increase as stock
prices increase.
c.
x̄ = Σxi/n = 2939/10 = 293.9
ȳ = Σyi/n = 6301/10 = 630.1
Σ(xi - x̄)(yi - ȳ) = 314,501.1
Σ(xi - x̄)² = 115,842.9
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 314,501.1/115,842.9 = 2.7149
b0 = ȳ - b1 x̄ = 630.1 - (2.7149)(293.9) = -167.81
ŷ = -167.81 + 2.7149x
d.
The slope of the estimated regression line is approximately 2.7. So, for every one percentage point
increase in the price of the stock, the gain in options value increases by about 2.7 percentage points.
e.
The rewards for the CEO do appear to be based upon performance increases in the stock value.
While the rewards may seem excessive, the executive is being rewarded for his/her role in increasing
the value of the company. This is why such compensation schemes are devised for CEOs by boards
of directors. A compensation scheme where an executive got a big salary increase when the
company stock went down would be bad. And, if the stock price for a company had gone down
during the periods in question, the value of the CEO's options would also have gone down.
11. a.
b.
There appears to be a positive linear relationship between x = price and y = road-test score.
c.
x̄ = Σxi/n = 339.6/12 = 28.3
ȳ = Σyi/n = 930/12 = 77.5
Σ(xi - x̄)(yi - ȳ) = 309.90
Σ(xi - x̄)² = 346.38
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 309.90/346.38 = .8947
b0 = ȳ - b1 x̄ = 77.5 - (.8947)(28.3) = 52.18
ŷ = 52.18 + .8947x
d. The slope is .8947. A sporty car that has a ten thousand dollar higher price can be expected to
have a 10(.8947) = 8.947, or approximately a 9 point higher road-test score.
e.
ŷ = 52.18 + .8947(36.7) = 85
12. a.
[Scatter diagram: Entertainment (\$) versus Hotel Room Rate (\$)]
b.
The scatter diagram indicates a positive linear relationship between x = hotel room rate and
y = the amount spent on entertainment.
c.
x̄ = Σxi/n = 945/9 = 105
ȳ = Σyi/n = 1134/9 = 126
Σ(xi - x̄)(yi - ȳ) = 4237
Σ(xi - x̄)² = 4100
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 4237/4100 = 1.0334
b0 = ȳ - b1 x̄ = 126 - (1.0334)(105) = 17.49
ŷ = 17.49 + 1.0334x
d.
With a value of x = \$128, the predicted value of y for Chicago is
ŷ = 17.49 + 1.0334x = 17.49 + 1.0334(128) = 150
Note: In The Wall Street Journal article the entertainment expense for Chicago was \$146. Thus, the
estimated regression equation provided a good estimate of entertainment expenses for Chicago.
13. a.
[Scatter diagram: Reasonable Amount of Itemized Deductions (\$1000s) versus adjusted gross income]
b.
Let x = adjusted gross income and y = reasonable amount of itemized deductions
x̄ = Σxi/n = 399/7 = 57
ȳ = Σyi/n = 97.1/7 = 13.8714
Σ(xi - x̄)(yi - ȳ) = 1233.7
Σ(xi - x̄)² = 7648
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 1233.7/7648 = 0.1613
b0 = ȳ - b1 x̄ = 13.8714 - (0.1613)(57) = 4.6773
ŷ = 4.68 + 0.16x
c.
ŷ = 4.68 + 0.16x = 4.68 + 0.16(52.5) = 13.08 or approximately \$13,080.
The agent's request for an audit appears to be justified.
14. a.
b.
There appears to be a positive linear relationship between x = features rating and y = PCW World
Rating.
c.
x̄ = Σxi/n = 784/10 = 78.4
ȳ = Σyi/n = 777/10 = 77.7
Σ(xi - x̄)(yi - ȳ) = 147.20
Σ(xi - x̄)² = 284.40
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 147.20/284.40 = .51758
b0 = ȳ - b1 x̄ = 77.7 - (.51758)(78.4) = 37.1217
ŷ = 37.1217 + .51758x
d.
ŷ = 37.1217 + .51758(70) = 73.35 or 73
15. a.
The estimated regression equation and the mean for the dependent variable are:
ŷi = 0.2 + 2.6xi
ȳ = 8
The sum of squares due to error and the total sum of squares are
SSE = Σ(yi - ŷi)² = 12.40
SST = Σ(yi - ȳ)² = 80
Thus, SSR = SST - SSE = 80 - 12.4 = 67.6
b.
r² = SSR/SST = 67.6/80 = .845
The least squares line provided a very good fit; 84.5% of the variability in y has been explained by
the least squares line.
c.
rxy = +√.845 = +.9192
16. a.
The estimated regression equation and the mean for the dependent variable are:
ŷi = 68 - 3xi
ȳ = 35
The sum of squares due to error and the total sum of squares are
SSE = Σ(yi - ŷi)² = 230
SST = Σ(yi - ȳ)² = 1850
Thus, SSR = SST - SSE = 1850 - 230 = 1620
b.
r² = SSR/SST = 1620/1850 = .876
The least squares line provided an excellent fit; 87.6% of the variability in y has been explained by
the estimated regression equation.
c.
rxy = -√.876 = -.936
Note: the sign for r is negative because the slope of the estimated regression equation is negative.
(b1 = -3)
17.
The estimated regression equation and the mean for the dependent variable are:
ŷi = 7.6 + .9xi
ȳ = 16.6
The sum of squares due to error and the total sum of squares are
SSE = Σ(yi - ŷi)² = 127.3
SST = Σ(yi - ȳ)² = 281.2
Thus, SSR = SST - SSE = 281.2 - 127.3 = 153.9
r² = SSR/SST = 153.9/281.2 = .547
We see that 54.7% of the variability in y has been explained by the least squares line.
rxy = +√.547 = +.740
18. a.
x̄ = Σxi/n = 600/6 = 100
ȳ = Σyi/n = 330/6 = 55
SST = Σ(yi - ȳ)² = 1800
SSE = Σ(yi - ŷi)² = 287.624
SSR = SST - SSE = 1800 - 287.624 = 1512.376
b.
r² = SSR/SST = 1512.376/1800 = .84
c.
r = √r² = √.84 = .917
19. a.
The estimated regression equation and the mean for the dependent variable are:
ŷ = 80 + 4x
ȳ = 108
The sum of squares due to error and the total sum of squares are
SSE = Σ(yi - ŷi)² = 170
SST = Σ(yi - ȳ)² = 2442
Thus, SSR = SST - SSE = 2442 - 170 = 2272
b.
r² = SSR/SST = 2272/2442 = .93
We see that 93% of the variability in y has been explained by the least squares line.
c.
rxy = +√.93 = +.96
20. a.
x̄ = Σxi/n = 160/10 = 16
ȳ = Σyi/n = 55,500/10 = 5550
Σ(xi - x̄)(yi - ȳ) = -31,284
Σ(xi - x̄)² = 21.74
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = -31,284/21.74 = -1439
b0 = ȳ - b1 x̄ = 5550 - (-1439)(16) = 28,574
ŷ = 28,574 - 1439x
b.
SST = 52,120,800
SSE = 7,102,922.54
SSR = SST - SSE = 52,120,800 - 7,102,922.54 = 45,017,877
r² = SSR/SST = 45,017,877/52,120,800 = .864
The estimated regression equation provided a very good fit.
c.
ŷ = 28,574 - 1439x = 28,574 - 1439(15) = 6989
Thus, an estimate of the price for a bike that weighs 15 pounds is \$6989.
21. a.
x̄ = Σxi/n = 3450/6 = 575
ȳ = Σyi/n = 33,700/6 = 5616.67
Σ(xi - x̄)(yi - ȳ) = 712,500
Σ(xi - x̄)² = 93,750
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 712,500/93,750 = 7.6
b0 = ȳ - b1 x̄ = 5616.67 - (7.6)(575) = 1246.67
ŷ = 1246.67 + 7.6x
b.
\$7.60
c.
The sum of squares due to error and the total sum of squares are:
SSE = Σ(yi - ŷi)² = 233,333.33
SST = Σ(yi - ȳ)² = 5,648,333.33
Thus, SSR = SST - SSE = 5,648,333.33 - 233,333.33 = 5,415,000
r² = SSR/SST = 5,415,000/5,648,333.33 = .9587
We see that 95.87% of the variability in y has been explained by the estimated regression equation.
d.
ŷ = 1246.67 + 7.6x = 1246.67 + 7.6(500) = \$5046.67
22. a.
ȳ = 74
SSE = 173.88
The total sum of squares is
SST = Σ(yi - ȳ)² = 756
Thus, SSR = SST - SSE = 756 - 173.88 = 582.12
r² = SSR/SST = 582.12/756 = .77
b.
The estimated regression equation provided a good fit because 77% of the variability in y has been
explained by the least squares line.
c.
rxy = +√.77 = +.88
This reflects a strong positive linear relationship between price and rating.
23. a.
s² = MSE = SSE / (n - 2) = 12.4 / 3 = 4.133
b.
s = √MSE = √4.133 = 2.033
c.
Σ(xi - x̄)² = 10
sb1 = s / √(Σ(xi - x̄)²) = 2.033 / √10 = 0.643
d.
t = b1 / sb1 = 2.6 / .643 = 4.044
Using t table (3 degrees of freedom), area in tail is between .01 and .025
p-value is between .02 and .05
Using Excel or Minitab, the p-value corresponding to t = 4.04 is .0272.
Because p-value ≤ α, we reject H0: β1 = 0
e.
MSR = SSR / 1 = 67.6
F = MSR / MSE = 67.6 / 4.133 = 16.36
Using F table (1 degree of freedom numerator and 3 denominator), p-value is between .025 and .05
Using Excel or Minitab, the p-value corresponding to F = 16.36 is .0272.
Because p-value ≤ α, we reject H0: β1 = 0

Source of      Sum of     Degrees of    Mean
Variation      Squares    Freedom       Square     F        p-value
Regression     67.6       1             67.6       16.36    .0272
Error          12.4       3             4.133
Total          80.0       4

24. a.
s² = MSE = SSE/(n - 2) = 230/3 = 76.6667
b.
s = √MSE = √76.6667 = 8.7560
c.
Σ(xi - x̄)² = 180
sb1 = s / √(Σ(xi - x̄)²) = 8.7560 / √180 = 0.6526
d.
t = b1 / sb1 = -3 / .653 = -4.59
Using t table (3 degrees of freedom), area in tail is less than .01; p-value is less than .02
Using Excel or Minitab, the p-value corresponding to t = -4.59 is .0193.
Because p-value ≤ α, we reject H0: β1 = 0
e.
MSR = SSR/1 = 1620
F = MSR/MSE = 1620/76.6667 = 21.13
Using F table (1 degree of freedom numerator and 3 denominator), p-value is less than .025
Using Excel or Minitab, the p-value corresponding to F = 21.13 is .0193.
Because p-value ≤ α, we reject H0: β1 = 0

Source of      Sum of     Degrees of    Mean
Variation      Squares    Freedom       Square     F        p-value
Regression     1620       1             1620       21.13    .0193
Error          230        3             76.6667
Total          1850       4
25. a.
s² = MSE = SSE/(n - 2) = 127.3/3 = 42.4333
s = √MSE = √42.4333 = 6.5141
b.
Σ(xi - x̄)² = 190
sb1 = s / √(Σ(xi - x̄)²) = 6.5141 / √190 = 0.4726
t = b1 / sb1 = .9 / .4726 = 1.90
Using t table (3 degrees of freedom), area in tail is between .05 and .10
p-value is between .10 and .20
Using Excel or Minitab, the p-value corresponding to t = 1.90 is .1530.
Because p-value > α, we cannot reject H0: β1 = 0; x and y do not appear to be related.
c.
MSR = SSR/1 = 153.9 /1 = 153.9
F = MSR/MSE = 153.9/42.4333 = 3.63
Using F table (1 degree of freedom numerator and 3 denominator), p-value is greater than .10
Using Excel or Minitab, the p-value corresponding to F = 3.63 is .1530.
Because p-value > α, we cannot reject H0: β1 = 0; x and y do not appear to be related.
26. a.
In the statement of exercise 18, ŷ = 23.194 + .318x
In solving exercise 18, we found SSE = 287.624
s² = MSE = SSE/(n - 2) = 287.624/4 = 71.906
s = √MSE = √71.906 = 8.4797
Σ(xi - x̄)² = 14,950
sb1 = s / √(Σ(xi - x̄)²) = 8.4797 / √14,950 = .0694
t = b1 / sb1 = .318 / .0694 = 4.58
Using t table (4 degrees of freedom), area in tail is between .005 and .01
p-value is between .01 and .02
Using Excel, the p-value corresponding to t = 4.58 is .010.
Because p-value ≤ α, we reject H0: β1 = 0; there is a significant relationship between price and
overall score.
b.
In exercise 18 we found SSR = 1512.376
MSR = SSR/1 = 1512.376/1 = 1512.376
F = MSR/MSE = 1512.376/71.906 = 21.03
Using F table (1 degree of freedom numerator and 4 denominator), p-value is between .01 and .025
Using Excel, the p-value corresponding to F = 21.03 is .010.
Because p-value ≤ α, we reject H0: β1 = 0
c.
Source of      Sum of      Degrees of    Mean
Variation      Squares     Freedom       Square      F        p-value
Regression     1512.376    1             1512.376    21.03    .010
Error          287.624     4             71.906
Total          1800        5

27. a.
Let x = number of megapixels and y = price (\$)
x̄ = Σxi/n = 95/10 = 9.5
ȳ = Σyi/n = 2190/10 = 219
Σ(xi - x̄)(yi - ȳ) = 2165
Σ(xi - x̄)² = 56.5
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 2165/56.5 = 38.31858
b0 = ȳ - b1 x̄ = 219 - (38.31858)(9.5) = -145.0265
ŷ = -145.0265 + 38.31858x
b.
SSE = Σ(yi - ŷi)² = 20,730.27
SST = Σ(yi - ȳ)² = 103,690
Thus, SSR = SST - SSE = 103,690 - 20,730.27 = 82,959.73
MSR = SSR/1 = 82,959.73
MSE = SSE/(n - 2) = 20,730.27/8 = 2591.28
F = MSR / MSE = 82,959.73/2591.28 = 32.015
Using F table (1 degree of freedom numerator and 8 denominator), p-value is less than .01
Using Excel, the p-value corresponding to F = 32.015 is .000.
Because p-value ≤ α, we reject H0: β1 = 0
Number of megapixels and price are related.
c.
r² = SSR/SST = 82,959.73/103,690 = .80
The estimated regression equation provided a good fit; we should feel comfortable using the
estimated regression equation to estimate the price given the number of megapixels.
d.
ŷ = -145.0265 + 38.31858(10) = 238.16 or approximately \$238
28.
The sum of squares due to error and the total sum of squares are
SSE = Σ(yi - ŷi)² = 1.4379
SST = Σ(yi - ȳ)² = 3.5800
Thus, SSR = SST - SSE = 3.5800 - 1.4379 = 2.1421
s² = MSE = SSE / (n - 2) = 1.4379 / 9 = .1598
s = √MSE = √.1598 = .3997
We can use either the t test or F test to determine whether speed of execution and overall satisfaction
are related.
We will first illustrate the use of the t test.
Σ(xi - x̄)² = 2.6
sb1 = s / √(Σ(xi - x̄)²) = .3997 / √2.6 = .2479
t = b1 / sb1 = .9077 / .2479 = 3.66
Using t table (9 degrees of freedom), area in tail is less than .005; p-value is less than .01
Using Excel or Minitab, the p-value corresponding to t = 3.66 is .000.
Because p-value ≤ α, we reject H0: β1 = 0
Because we can reject H0: β1 = 0 we conclude that speed of execution and overall satisfaction are
related.
Next we illustrate the use of the F test.
MSR = SSR / 1 = 2.1421
F = MSR / MSE = 2.1421 / .1598 = 13.4
Using F table (1 degree of freedom numerator and 9 denominator), p-value is less than .01
Using Excel or Minitab, the p-value corresponding to F = 13.4 is .000.
Because p-value ≤ α, we reject H0: β1 = 0
Because we can reject H0: β1 = 0 we conclude that speed of execution and overall satisfaction are
related.
The ANOVA table is shown below.

Source of      Sum of     Degrees of    Mean
Variation      Squares    Freedom       Square    F       p-value
Regression     2.1421     1             2.1421    13.4    .000
Error          1.4379     9             .1598
Total          3.5800     10

29.
SSE = Σ(yi - ŷi)² = 233,333.33
SST = Σ(yi - ȳ)² = 5,648,333.33
Thus, SSR = SST - SSE = 5,648,333.33 - 233,333.33 = 5,415,000
MSE = SSE/(n - 2) = 233,333.33/(6 - 2) = 58,333.33
MSR = SSR/1 = 5,415,000
F = MSR / MSE = 5,415,000 / 58,333.33 = 92.83

Source of      Sum of          Degrees of    Mean
Variation      Squares         Freedom       Square       F        p-value
Regression     5,415,000.00    1             5,415,000    92.83    .0006
Error          233,333.33      4             58,333.33
Total          5,648,333.33    5

Using F table (1 degree of freedom numerator and 4 denominator), p-value is less than .01
Using Excel or Minitab, the p-value corresponding to F = 92.83 is .0006.
Because p-value ≤ α, we reject H0: β1 = 0. Production volume and total cost are related.
30.
SSE = Σ(yi - ŷi)² = 173.88
SST = Σ(yi - ȳ)² = 756
Thus, SSR = SST - SSE = 756 - 173.88 = 582.12
s² = MSE = SSE/(n - 2) = 173.88/6 = 28.98
s = √28.98 = 5.3833
Σ(xi - x̄)² = 8,155,000
sb1 = s / √(Σ(xi - x̄)²) = 5.3833 / √8,155,000 = .001885
t = b1 / sb1 = .008449 / .001885 = 4.48
Using t table (6 degrees of freedom), area in tail is less than .005
p-value is less than .01
Using Excel or Minitab, the p-value corresponding to t = 4.48 is .0042.
Because p-value ≤ α, we reject H0: β1 = 0
There is a significant relationship between price and rating.
31.
SST = 52,120,800
SSE = 7,102,922.54
SSR = SST - SSE = 52,120,800 - 7,102,922.54 = 45,017,877
MSR = SSR/1 = 45,017,877
MSE = SSE/(n - 2) = 7,102,922.54/8 = 887,865.3
F = MSR / MSE = 45,017,877/887,865.3 = 50.7
Using F table (1 degree of freedom numerator and 8 denominator), p-value is less than .01
Using Excel, the p-value corresponding to F = 50.7 is .000.
Because p-value ≤ α, we reject H0: β1 = 0
Weight and price are related.
32. a.
s = 2.033
x̄ = 3
Σ(xi - x̄)² = 10
sŷ* = s √(1/n + (x* - x̄)²/Σ(xi - x̄)²) = 2.033 √(1/5 + (4 - 3)²/10) = 1.11
b.
ŷ* = .2 + 2.6x* = .2 + 2.6(4) = 10.6
ŷ* ± tα/2 sŷ*
10.6 ± 3.182(1.11) = 10.6 ± 3.53
or 7.07 to 14.13
c.
spred = s √(1 + 1/n + (x* - x̄)²/Σ(xi - x̄)²) = 2.033 √(1 + 1/5 + (4 - 3)²/10) = 2.32
d.
ŷ* ± tα/2 spred
10.6 ± 3.182(2.32) = 10.6 ± 7.38
or 3.22 to 17.98
33. a.
s = 8.7560
x̄ = 11
Σ(xi - x̄)² = 180
sŷ* = s √(1/n + (x* - x̄)²/Σ(xi - x̄)²) = 8.7560 √(1/5 + (8 - 11)²/180) = 4.3780
b.
ŷ* = 68 - 3x* = 68 - 3(8) = 44
ŷ* ± tα/2 sŷ*
44 ± 3.182(4.3780) = 44 ± 13.93
or 30.07 to 57.93
c.
spred = s √(1 + 1/n + (x* - x̄)²/Σ(xi - x̄)²) = 8.7560 √(1 + 1/5 + (8 - 11)²/180) = 9.7895
d.
ŷ* ± tα/2 spred
44 ± 3.182(9.7895) = 44 ± 31.15
or 12.85 to 75.15
34.
s = 6.5141
x̄ = 10
Σ(xi - x̄)² = 190
sŷ* = s √(1/n + (x* - x̄)²/Σ(xi - x̄)²) = 6.5141 √(1/5 + (12 - 10)²/190) = 3.0627
ŷ* = 7.6 + .9x* = 7.6 + .9(12) = 18.40
ŷ* ± tα/2 sŷ*
18.40 ± 3.182(3.0627) = 18.40 ± 9.75
or 8.65 to 28.15
spred = s √(1 + 1/n + (x* - x̄)²/Σ(xi - x̄)²) = 6.5141 √(1 + 1/5 + (12 - 10)²/190) = 7.1982
ŷ* ± tα/2 spred
18.40 ± 3.182(7.1982) = 18.40 ± 22.90
or -4.50 to 41.30
The two intervals are different because there is more variability associated with predicting an
individual value than there is a mean value.
35. a.
ŷ* = 2090.5 + 581.1x* = 2090.5 + 581.1(3) = 3833.8
b.
s = √MSE = √21,284 = 145.89
x̄ = 3.2
Σ(xi - x̄)² = 0.74
sŷ* = s √(1/n + (x* - x̄)²/Σ(xi - x̄)²) = 145.89 √(1/6 + (3 - 3.2)²/0.74) = 68.54
ŷ* ± tα/2 sŷ*
3833.8 ± 2.776(68.54) = 3833.8 ± 190.27
or \$3643.53 to \$4024.07
c.
spred = s √(1 + 1/n + (x* - x̄)²/Σ(xi - x̄)²) = 145.89 √(1 + 1/6 + (3 - 3.2)²/0.74) = 161.19
ŷ* ± tα/2 spred
3833.8 ± 2.776(161.19) = 3833.8 ± 447.46
or \$3386.34 to \$4281.26
d.
As expected, the prediction interval is much wider than the confidence interval. This is due to the
fact that it is more difficult to predict the starting salary for one new student with a GPA of 3.0 than
it is to estimate the mean for all students with a GPA of 3.0.
36. a.
sŷ* = s √(1/n + (x* - x̄)²/Σ(xi - x̄)²) = 4.6098 √(1/10 + (9 - 7)²/142) = 1.6503
ŷ* = 80 + 4x* = 80 + 4(9) = 116
ŷ* ± tα/2 sŷ*
116 ± 2.306(1.6503) = 116 ± 3.8056
or 112.19 to 119.81 (\$112,190 to \$119,810)
b.
spred = s √(1 + 1/n + (x* - x̄)²/Σ(xi - x̄)²) = 4.6098 √(1 + 1/10 + (9 - 7)²/142) = 4.8963
ŷ* ± tα/2 spred
116 ± 2.306(4.8963) = 116 ± 11.2909
or 104.71 to 127.29 (\$104,710 to \$127,290)
c.
As expected, the prediction interval is much wider than the confidence interval. This is due to the
fact that it is more difficult to predict annual sales for one new salesperson with 9 years of
experience than it is to estimate the mean annual sales for all salespersons with 9 years of
experience.
37. a.
x̄ = 57
Σ(xi - x̄)² = 7648
s² = 1.88
s = 1.37
sŷ* = s √(1/n + (x* - x̄)²/Σ(xi - x̄)²) = 1.37 √(1/7 + (52.5 - 57)²/7648) = 0.52
ŷ* = 4.68 + 0.16x* = 4.68 + 0.16(52.5) = 13.08
ŷ* ± tα/2 sŷ*
13.08 ± 2.571(.52) = 13.08 ± 1.34
or 11.74 to 14.42 or \$11,740 to \$14,420
b.
spred = 1.47
13.08 ± 2.571(1.47) = 13.08 ± 3.78
or 9.30 to 16.86 or \$9,300 to \$16,860
c.
Yes, \$20,400 is much larger than anticipated.
d.
Any deductions exceeding the \$16,860 upper limit could suggest an audit.
38. a.
ŷ* = 1246.67 + 7.6(500) = \$5046.67
b.
x̄ = 575
Σ(xi - x̄)² = 93,750
s² = MSE = 58,333.33
s = 241.52
spred = s √(1 + 1/n + (x* - x̄)²/Σ(xi - x̄)²) = 241.52 √(1 + 1/6 + (500 - 575)²/93,750) = 267.50
ŷ* ± tα/2 spred
5046.67 ± 4.604(267.50) = 5046.67 ± 1231.57
or \$3815.10 to \$6278.24
c.
Based on one month, \$6000 is not out of line since \$3815.10 to \$6278.24 is the prediction interval.
However, a sequence of five to seven months with consistently high costs should cause concern.
39. a.
Let x = miles of track and y = weekday ridership in thousands.
x̄ = Σxi/n = 203/7 = 29
ȳ = Σyi/n = 309/7 = 44.1429
Σ(xi - x̄)(yi - ȳ) = 1471
Σ(xi - x̄)² = 838
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 1471/838 = 1.7554
b0 = ȳ - b1 x̄ = 44.1429 - (1.7554)(29) = -6.76
ŷ = -6.76 + 1.755x
b.
SST = 3620.9    SSE = 1038.7    SSR = 2582.1
r² = SSR/SST = 2582.1/3620.9 = .713
The estimated regression equation explained 71.3% of the variability in y; a good fit.
c.
s² = MSE = 1038.7/5 = 207.7
s = √207.7 = 14.41
sŷ* = s √(1/n + (x* - x̄)²/Σ(xi - x̄)²) = 14.41 √(1/7 + (30 - 29)²/838) = 5.47
ŷ* = -6.76 + 1.755x* = -6.76 + 1.755(30) = 45.9
45.9 ± 2.571(5.47) = 45.9 ± 14.1
or 31.8 to 60
d.
spred = s √(1 + 1/n + (x* - x̄)²/Σ(xi - x̄)²) = 14.41 √(1 + 1/7 + (30 - 29)²/838) = 15.41
ŷ* ± tα/2 spred
45.9 ± 2.571(15.41) = 45.9 ± 39.6
or 6.3 to 85.5
The prediction interval is so wide that it would not be of much value in the planning process. A
larger data set would be beneficial.
40. a.
9
b.
ŷ = 20.0 + 7.21x
c.
1.3626
d.
SSE = SST - SSR = 51,984.1 - 41,587.3 = 10,396.8
MSE = 10,396.8/7 = 1,485.3
F = MSR / MSE = 41,587.3 /1,485.3 = 28.00
Using F table (1 degree of freedom numerator and 7 denominator), p-value is less than .01
Using Excel or Minitab, the p-value corresponding to F = 28.00 is .0011.
Because p-value ≤ α = .05, we reject H0: β1 = 0.
Selling price is related to annual gross rents.
e.
ŷ = 20.0 + 7.21(50) = 380.5 or \$380,500
41. a.
ŷ = 6.1092 + .8951x
b.
t = (b1 - β1) / sb1 = (.8951 - 0) / .149 = 6.01
Using the t table (8 degrees of freedom), area in tail is less than .005
p-value is less than .01
Using Excel or Minitab, the p-value corresponding to t = 6.01 is .0003.
Because p-value ≤ α = .05, we reject H0: β1 = 0
Maintenance expense is related to usage.
c.
ŷ = 6.1092 + .8951(25) = 28.49 or \$28.49 per month
42. a.
ŷ = 80.0 + 50.0x
b.
30
c.
F = MSR / MSE = 6828.6/82.1 = 83.17
Using F table (1 degree of freedom numerator and 28 denominator), p-value is less than .01
Using Excel or Minitab, the p-value corresponding to F = 83.17 is .000.
Because p-value < α = .05, we reject H0: β1 = 0.
Annual sales is related to the number of salespersons.
d.
ŷ = 80 + 50 (12) = 680 or \$680,000
43. a.
[Scatter diagram: Salary & Bonus (\$1000s) on the vertical axis versus Tuition & Fees (\$1000s) on the horizontal axis]
b.
There appears to be a positive relationship between the two variables. Students that graduate from
the schools with higher tuition and fees tend to receive a higher starting salary and bonus.
c.
The Minitab output is shown below:

The regression equation is
Salary & Bonus (\$1000s) = 33.8 + 1.92 Tuition & Fees (\$1000s)

Predictor                   Coef  SE Coef     T      P
Constant                  33.788    9.340  3.62  0.002
Tuition & Fees (\$1000s)  1.9154   0.2689  7.12  0.000

S = 7.60875   R-Sq = 73.8%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  2937.1  2937.1  50.73  0.000
Residual Error  18  1042.1    57.9
Total           19  3979.2
d.
The p-value = .000 < α = .05 (t or F); significant relationship
e.
r2 = .738. The least squares line provided a good fit; approximately 74% of the variability in salary
and bonus can be explained by the linear relationship with tuition and fees.
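The figures quoted in parts (d) and (e) can be cross-checked against the Minitab output; the following is an illustrative check rather than part of the original solution. In simple linear regression the F statistic equals the square of the slope's t statistic, so the t and F tests give the same p-value:
$r^2 = \dfrac{\text{SSR}}{\text{SST}} = \dfrac{2937.1}{3979.2} = .738$
$t^2 = (7.12)^2 = 50.69 \approx F = 50.73$ (the small difference comes from rounding t to two decimals)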
f.
ŷ = 33.788 + 1.9154(43) = 116.15 or approximately \$116,000.
Note to Instructor: The average starting salary and bonus reported by U.S. News & World Report for
the University of Virginia was \$121,000.
44. a.
Scatter diagram:
[Scatter diagram: Price (\$) on the vertical axis versus Weight (oz) on the horizontal axis]
b.
There appears to be a negative linear relationship between the two variables. The heavier helmets
tend to be less expensive.
c.
The Minitab output is shown below:
The regression equation is
Price = 2044 - 28.3 Weight

Predictor     Coef  SE Coef      T      P
Constant    2044.4    226.4   9.03  0.000
Weight     -28.350    3.826  -7.41  0.000

S = 91.8098   R-Sq = 77.4%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  462761  462761  54.90  0.000
Residual Error  16  134865    8429
Total           17  597626
ŷ = 2044.4 – 28.35 Weight
d.
Significant relationship: p-value = .000 < α = .05
e.
r2 = 0.774; A good fit
45. a.
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{70}{5} = 14 \qquad \bar{y} = \dfrac{\sum y_i}{n} = \dfrac{76}{5} = 15.2$
$\sum (x_i - \bar{x})(y_i - \bar{y}) = 200 \qquad \sum (x_i - \bar{x})^2 = 126$
$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{200}{126} = 1.5873$
$b_0 = \bar{y} - b_1\bar{x} = 15.2 - (1.5873)(14) = -7.0222$
$\hat{y} = -7.02 + 1.59x$
b.
The residuals are 3.48, -2.47, -4.83, -1.6, and 5.22
c.
[Residual plot: residuals on the vertical axis versus x on the horizontal axis]
With only 5 observations it is difficult to determine if the assumptions are satisfied.
However, the plot does suggest curvature in the residuals that would indicate that the error
term assumptions are not satisfied. The scatter diagram for these data also indicates that the
underlying relationship between x and y may be curvilinear.
d.
$s^2 = 23.78$
$h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2} = \dfrac{1}{5} + \dfrac{(x_i - 14)^2}{126}$
The standardized residuals are 1.32, -.59, -1.11, -.40, 1.49.
e.
The standardized residual plot has the same shape as the original residual plot. The
curvature observed indicates that the assumptions regarding the error term may not be
satisfied.
46. a.
ŷ = 2.32 + .64x
b.
[Residual plot: residuals on the vertical axis versus x on the horizontal axis]
The assumption that the variance is the same for all values of x is questionable. The variance appears
to increase for larger values of x.
47. a.
Let x = advertising expenditures and y = revenue
ŷ = 29.4 + 1.55x
b.
SST = 1002 SSE = 310.28 SSR = 691.72
MSR = SSR / 1 = 691.72
MSE = SSE / (n - 2) = 310.28/ 5 = 62.0554
F = MSR / MSE = 691.72/ 62.0554= 11.15
Using F table (1 degree of freedom numerator and 5 denominator), p-value is between .01 and .025
Using Excel or Minitab, the p-value corresponding to F = 11.15 is .0206.
Because p-value ≤ α = .05, we conclude that the two variables are related.
c.
[Residual plot: residuals on the vertical axis versus predicted values on the horizontal axis]
d.
The residual plot leads us to question the assumption of a linear relationship between x and y. Even
though the relationship is significant at the .05 level of significance, it would be extremely
dangerous to extrapolate beyond the range of the data.
48. a.
ŷ = 80 + 4x
[Residual plot: residuals on the vertical axis versus x on the horizontal axis]
b.
The assumptions concerning the error term appear reasonable.
49. a.
The Minitab output follows:
The regression equation is
Price (\$) = 22636 + 59.0 Square Footage

Predictor        Coef  SE Coef     T      P
Constant        22636    20460  1.11  0.283
Square Footage  58.96    12.08  4.88  0.000

S = 19166.0   R-Sq = 57.0%

Analysis of Variance
Source          DF           SS          MS      F      P
Regression       1   8748562231  8748562231  23.82  0.000
Residual Error  18   6612039769   367335543
Total           19  15360602000
b.
[Residual plot]
c.
The residual plot leads us to question the assumption of a linear relationship between square footage
and price. Therefore, even though the relationship is very significant (p-value = .000), using the
estimated regression equation to make predictions of the price for a house with square footage beyond
the range of the data is not recommended.
50. a.
The Minitab output follows:
The regression equation is
Y = 66.1 + 0.402 X

Predictor    Coef  SE Coef     T      p
Constant    66.10    32.06  2.06  0.094
X          0.4023   0.2276  1.77  0.137

S = 12.62   R-sq = 38.5%

Analysis of Variance
SOURCE          DF      SS     MS     F      p
Regression       1   497.2  497.2  3.12  0.137
Residual Error   5   795.7  159.1
Total            6  1292.9

Unusual Observations
Obs.    X       Y     Fit  SEFit  Residual  St.Resid
   1  135  145.00  120.42   4.87     24.58     2.11R
R denotes an observation with a large standardized residual.
The standardized residuals are: 2.11, -1.08, .14, -.38, -.78, -.04, -.41
The first observation appears to be an outlier since it has a large standardized residual.
b.
[Standardized residual plot: standardized residuals on the vertical axis versus fitted values on the horizontal axis]
The standardized residual plot indicates that the observation x = 135, y = 145 may be an outlier;
note that this observation has a standardized residual of 2.11.
c.
The scatter diagram is shown below
[Scatter diagram: y on the vertical axis versus x on the horizontal axis]
The scatter diagram also indicates that the observation x = 135, y = 145 may be an outlier; the
implication is that for simple linear regression an outlier can be identified by looking at the scatter
diagram.
51. a.
The Minitab output is shown below:
The regression equation is
Y = 13.0 + 0.425 X

Predictor    Coef  SE Coef     T      p
Constant   13.002    2.396  5.43  0.002
X          0.4248   0.2116  2.01  0.091

S = 3.181   R-sq = 40.2%

Analysis of Variance
SOURCE          DF      SS     MS     F      p
Regression       1   40.78  40.78  4.03  0.091
Residual Error   6   60.72  10.12
Total            7  101.50

Unusual Observations
Obs.     X      Y    Fit  Stdev.Fit  Residual  St.Resid
   7  12.0  24.00  18.10       1.20      5.90     2.00R
   8  22.0  19.00  22.35       2.78     -3.35    -2.16RX
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
The standardized residuals are: -1.00, -.41, .01, -.48, .25, .65, 2.00, -2.16
The last two observations in the data set appear to be outliers since the standardized residuals for
these observations are 2.00 and -2.16, respectively.
b.
Using Minitab, we obtained the following leverage values:
.28, .24, .16, .14, .13, .14, .14, .76
Minitab identifies an observation as having high leverage if hi > 6/n; for these data, 6/n =
6/8 = .75. Since the leverage for the observation x = 22, y = 19 is .76, Minitab would identify
observation 8 as a high leverage point. Thus, we conclude that observation 8 is an influential
observation.
c.
[Scatter diagram: y on the vertical axis versus x on the horizontal axis]
The scatter diagram indicates that the observation x = 22, y = 19 is an influential observation.
52. a.
[Scatter diagram: Program Expenses (%) on the vertical axis versus Fundraising Expenses (%) on the horizontal axis]
The scatter diagram does indicate potential influential observations. For example, the 22.2%
fundraising expense for the American Cancer Society and the 16.9% fundraising expense for the St.
Jude Children’s Research Hospital look like they may each have a large influence on the slope of the
estimated regression line. And, with a fundraising expense of only 2.6%, the percentage spent on
programs and services by the Smithsonian Institution (73.7%) seems to be somewhat lower than
would be expected; thus, this observation may need to be considered as a possible outlier.
b.
A portion of the Minitab output follows:
The regression equation is
Program Expenses (%) = 91.0 - 0.917 Fundraising Expenses (%)

Predictor                    Coef  SE Coef      T      P
Constant                   90.981    3.177  28.64  0.000
Fundraising Expenses (%)  -0.9172   0.3392  -2.70  0.027

S = 7.47387   R-Sq = 47.7%

Analysis of Variance
Source          DF      SS      MS     F      P
Regression       1  408.35  408.35  7.31  0.027
Residual Error   8  446.87   55.86
Total            9  855.22

Unusual Observations
     Fundraising   Program
Obs  Expenses (%)  Expenses (%)    Fit  SE Fit  Residual  St Resid
  3           2.6         73.70  88.60    2.67    -14.90    -2.13R
  5          22.2         71.60  70.62    5.90      0.98     0.21 X
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.
c.
The slope of the estimated regression equation is -0.917. Thus, for every 1% increase in the amount
spent on fundraising the percentage spent on program expenses will decrease by .917%; in other
words, just a little under 1%. The negative slope and value seem to make sense in the context of this
problem situation.
d.
The Minitab output in part (b) indicates that there are two unusual observations:
• Observation 3 (Smithsonian Institution) is an outlier because it has a large standardized residual.
• Observation 5 (American Cancer Society) is an influential observation because it has high
leverage.
Although fundraising expenses for the Smithsonian Institution are on the low side as compared to
most of the other super-sized charities, the percentage spent on program expenses appears to be
much lower than one would expect. It appears that the Smithsonian’s administrative expenses are too
high. But, thinking about the expenses of running a large museum like the Smithsonian, the
percentage spent on administrative expenses may not be unreasonable and is just due to the fact that
operating costs for a museum are in general higher than for some other types of organizations. The
very large value of fundraising expenses for the American Cancer Society suggests that this
observation has a large influence on the estimated regression equation. The following Minitab output
shows the results if this observation is deleted from the original data.
The regression equation is
Program Expenses (%) = 91.3 - 1.00 Fundraising Expenses (%)

Predictor                    Coef  SE Coef      T      P
Constant                   91.256    3.654  24.98  0.000
Fundraising Expenses (%)  -1.0026   0.5590  -1.79  0.116

S = 7.96708   R-Sq = 31.5%
The y-intercept has changed slightly, but the slope has changed from -.917 to -1.00.
53. a.
[Scatter diagram: Debt/GDP (%) on the vertical axis versus Gold Value (\$B) on the horizontal axis]
b.
There appears to be a positive relationship between the two variables. But, observation 9 (U.S.)
appears to be an observation with high leverage and may be very influential in terms of fitting a
linear model to the data.
c.
The Minitab output follows.
The regression equation is
Debt = 49.1 + 0.123 Gold Value

Predictor      Coef  SE Coef     T      P
Constant      49.08    15.12  3.25  0.014
Gold Value  0.12299  0.07847  1.57  0.161

S = 32.0394   R-Sq = 26.0%

Analysis of Variance
Source          DF    SS    MS     F      P
Regression       1  2522  2522  2.46  0.161
Residual Error   7  7186  1027
Total            8  9708

Unusual Observations
     Gold
Obs  Value  Debt    Fit  SE Fit  Residual  St Resid
  9    487  93.2  109.0    29.5     -15.8    -1.27 X
X denotes an observation whose X value gives it large leverage.
d.
The Minitab output identifies observation 9 as an observation whose x value gives it large leverage.
e.
Looking at the scatter diagram in part (a) it looks like observation 9 will have a lot of influence on
the estimated regression equation. To investigate this we can simply drop the observation from the
data set and fit a new estimated regression equation. The Minitab output we obtained follows.
The regression equation is
Debt = 30.8 + 0.342 Gold Value

Predictor     Coef  SE Coef     T      P
Constant     30.77    19.85  1.55  0.172
Gold Value  0.3422   0.1804  1.90  0.107

S = 30.3907   R-Sq = 37.5%

Analysis of Variance
Source          DF      SS      MS     F      P
Regression       1  3324.2  3324.2  3.60  0.107
Residual Error   6  5541.6   923.6
Total            7  8865.7
Note that the slope of the estimated regression equation is now .342 as compared to a value of .123
when this observation is included. Thus, we see that this observation has a big impact on the value of
the slope of the fitted line and hence we would say that it is an influential observation.
54. a.
The scatter diagram does indicate potential outliers and/or influential observations. For example, the
data for the Washington Redskins, New England Patriots, and the Dallas Cowboys not only have the
three highest revenues, they also have the highest team values.
b.
A portion of the Minitab output follows:
The regression equation is
Value = - 252 + 5.83 Revenue

Predictor    Coef  SE Coef      T      P
Constant   -252.1    130.8  -1.93  0.064
Revenue    5.8317   0.5863   9.95  0.000

S = 87.2441   R-Sq = 76.7%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  753008  753008  98.93  0.000
Residual Error  30  228346    7612
Total           31  981354

Unusual Observations
Obs  Revenue   Value     Fit  SE Fit  Residual  St Resid
  9      269  1612.0  1316.6    31.8     295.4     3.64R
 19      282  1324.0  1392.5    38.6     -68.5    -0.88 X
 21      214  1178.0   995.9    16.0     182.1     2.12R
 22      213  1170.0   990.1    16.2     179.9     2.10R
 32      327  1538.0  1654.9    63.7    -116.9    -1.96 X
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.
c.
The Minitab output indicates that there are five unusual observations:
• Observation 9 (Dallas Cowboys) is an outlier because it has a large standardized residual.
• Observation 19 (New England Patriots) is an influential observation because it has high leverage.
• Observation 21 (New York Giants) is an outlier because it has a large standardized residual.
• Observation 22 (New York Jets) is an outlier because it has a large standardized residual.
• Observation 32 (Washington Redskins) is an influential observation because it has high leverage.
55.
No. Regression or correlation analysis can never prove that two variables are causally related.
56.
The estimate of a mean value is an estimate of the average of all y values associated with the same x.
The estimate of an individual y value is an estimate of only one of the y values associated with a
particular x.
57.
The purpose of testing whether β1 = 0 is to determine whether or not there is a significant
relationship between x and y. However, rejecting β1 = 0 does not necessarily imply a good fit. For
example, if β1 = 0 is rejected and r2 is low, there is a statistically significant relationship between x
and y but the fit is not very good.
58. a.
[Scatter diagram: S&P 500 on the vertical axis versus DJIA on the horizontal axis]
b.
A portion of the Minitab output is shown below:
The regression equation is
S&P = - 669 + 0.157 DJIA

Predictor     Coef  SE Coef      T      P
Constant    -669.0    130.7  -5.12  0.000
DJIA       0.15727  0.01015  15.49  0.000

S = 9.60811   R-Sq = 94.9%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       1  22146  22146  239.89  0.000
Residual Error  13   1200     92
Total           14  23346
c.
Using the F test, the p-value corresponding to F = 239.89 is .000. Because the p-value ≤ α = .05, we
reject H0: β1 = 0; there is a significant relationship.
d.
With R-Sq = 94.9%, the estimated regression equation provided an excellent fit.
e.
ŷ = -669.0 + .15727(DJIA) = -669.0 + .15727(13,500) = 1454
f.
The DJIA is not that far beyond the range of the data. With the excellent fit provided by the
estimated regression equation, we should not be too concerned about using the estimated regression
equation to predict the S&P 500.
59. a.
The Minitab output is shown below:
The regression equation is
Share Price (\$) = - 2.99 + 0.911 Fair Value (\$)

Predictor           Coef  SE Coef      T      P
Constant          -2.987    5.791  -0.52  0.610
Fair Value (\$)  0.91128  0.09783   9.31  0.000

S = 12.0064   R-Sq = 76.9%

Analysis of Variance
Source          DF     SS     MS      F      P
Regression       1  12507  12507  86.76  0.000
Residual Error  26   3748    144
Total           27  16255
ŷ = -2.987 + .91128 Fair Value (\$)
b.
Significant relationship: p-value = .000 < α = .05
c.
ŷ = -2.987 + .91128 Fair Value (\$) = -2.987 + .91128(50) = 42.577 or approximately \$42.58
d.
The estimated regression equation should provide a good estimate because r2 = 0.769
60. a.
The scatter diagram indicates a positive linear relationship between the two variables. Online
universities with higher retention rates tend to have higher graduation rates.
b.
The Minitab output follows:
The regression equation is
GR(%) = 25.4 + 0.285 RR(%)

Predictor     Coef  SE Coef     T      P
Constant    25.423    3.746  6.79  0.000
RR(%)      0.28453  0.06063  4.69  0.000

S = 7.45610   R-Sq = 44.9%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  1224.3  1224.3  22.02  0.000
Residual Error  27  1501.0    55.6
Total           28  2725.3

Unusual Observations
Obs  RR(%)  GR(%)    Fit  SE Fit  Residual  St Resid
  2     51  25.00  39.93    1.44    -14.93    -2.04R
  3      4  28.00  26.56    3.52      1.44     0.22 X
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.
c.
Because the p-value = .000 < α = .05, the relationship is significant.
d.
The estimated regression equation is able to explain 44.9% of the variability in the graduation rate
based upon the linear relationship with the retention rate. It is not a great fit, but given the type of
data, the fit is reasonably good.
e.
In the Minitab output in part (b), South University is identified as an observation with a large
standardized residual. With a retention rate of 51% it does appear that the graduation rate of 25% is
low as compared to the results for other online universities. The president of South University should
be concerned after looking at the data. Using the estimated regression equation, we estimate that the
graduation rate at South University should be 25.4 + .285(51) = 40%.
f.
In the Minitab output in part (b), the University of Phoenix is identified as an observation whose x
value gives it large influence. With a retention rate of only 4%, the president of the University of
Phoenix should be concerned after looking at the data.
61.
The Minitab output is shown below:
The regression equation is
Expense = 10.5 + 0.953 Usage

Predictor    Coef  SE Coef     T      p
Constant   10.528    3.745  2.81  0.023
X          0.9534   0.1382  6.90  0.000

S = 4.250   R-sq = 85.6%

Analysis of Variance
SOURCE          DF       SS      MS      F      p
Regression       1   860.05  860.05  47.62  0.000
Residual Error   8   144.47   18.06
Total            9  1004.53

  Fit  Stdev.Fit        95% C.I.        95% P.I.
39.13       1.49  (35.69, 42.57)  (28.74, 49.52)
a.
ŷ = 10.528 + .9534 Usage
b.
Since the p-value corresponding to F = 47.62 is .000 < α = .05, we reject H0: β1 = 0.
c.
The 95% prediction interval is 28.74 to 49.52 or \$2874 to \$4952
d.
Yes, since the expected expense is ŷ = 10.528 + .9534(30) = 39.13 or \$3913.
62. a.
The Minitab output is shown below:
The regression equation is
Defects = 22.2 - 0.148 Speed

Predictor      Coef  SE Coef      T      P
Constant     22.174    1.653  13.42  0.000
Speed      -0.14783  0.04391  -3.37  0.028

S = 1.489   R-Sq = 73.9%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  25.130  25.130  11.33  0.028
Residual Error   4   8.870   2.217
Total            5  34.000

Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI         95.0% PI
      1  14.783   0.896  (12.294, 17.271)  (9.957, 19.608)
b.
Since the p-value corresponding to F = 11.33 is .028 < α = .05, the relationship is significant.
c.
r2 = .739; a good fit. The least squares line explained 73.9% of the variability in the number of
defects.
d.
Using the Minitab output in part (a), the 95% confidence interval is 12.294 to 17.271.
63. a.
[Scatter diagram: Days on the vertical axis versus Distance on the horizontal axis]
There appears to be a negative linear relationship between distance to work and number of days
absent.
b.
The Minitab output is shown below:
The regression equation is
Days = 8.10 - 0.344 Distance

Predictor      Coef  SE Coef      T      p
Constant     8.0978   0.8088  10.01  0.000
X          -0.34420  0.07761  -4.43  0.002

S = 1.289   R-sq = 71.1%

Analysis of Variance
SOURCE          DF      SS      MS      F      p
Regression       1  32.699  32.699  19.67  0.002
Residual Error   8  13.301   1.663
Total            9  46.000

  Fit  Stdev.Fit        95% C.I.        95% P.I.
6.377      0.512  (5.195, 7.559)  (3.176, 9.577)
c.
Since the p-value corresponding to F = 19.67 is .002 < α = .05, we reject H0: β1 = 0.
There is a significant relationship between the number of days absent and the distance to work.
d.
r2 = .711. The estimated regression equation explained 71.1% of the variability in y; this is a
reasonably good fit.
e.
The 95% confidence interval is 5.195 to 7.559 or approximately 5.2 to 7.6 days.
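The interval reported by Minitab can be reproduced by hand from the Fit and Stdev.Fit values. The following is an illustrative check, assuming the standard t-table value $t_{.025} = 2.306$ for 8 degrees of freedom (a value not quoted in the original solution to this problem):
$\hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*} = 6.377 \pm 2.306(.512) = 6.377 \pm 1.181$
or approximately 5.20 to 7.56, which agrees with the Minitab interval up to rounding.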
64. a.
The Minitab output is shown below:
The regression equation is
Cost = 220 + 132 Age

Predictor    Coef  SE Coef     T      p
Constant   220.00    58.48  3.76  0.006
X          131.67    17.80  7.40  0.000

S = 75.50   R-sq = 87.3%

Analysis of Variance
SOURCE          DF      SS      MS      F      p
Regression       1  312050  312050  54.75  0.000
Residual Error   8   45600    5700
Total            9  357650

  Fit  Stdev.Fit        95% C.I.        95% P.I.
746.7       29.8  (678.0, 815.4)  (559.5, 933.9)
b.
Since the p-value corresponding to F = 54.75 is .000 < α = .05, we reject H0: β1 = 0.
Maintenance cost and age of bus are related.
c.
r2 = .873. The least squares line provided a very good fit.
d.
The 95% prediction interval is 559.5 to 933.9 or \$559.50 to \$933.90
65. a.
The Minitab output is shown below:
The regression equation is
Points = 5.85 + 0.830 Hours

Predictor    Coef  SE Coef     T      p
Constant    5.847    7.972  0.73  0.484
X          0.8295   0.1095  7.58  0.000

S = 7.523   R-sq = 87.8%

Analysis of Variance
SOURCE          DF      SS      MS      F      p
Regression       1  3249.7  3249.7  57.42  0.000
Residual Error   8   452.8    56.6
Total            9  3702.5

  Fit  Stdev.Fit        95% C.I.         95% P.I.
84.65       3.67  (76.19, 93.11)  (65.35, 103.96)
b.
Since the p-value corresponding to F = 57.42 is .000 < α = .05, we reject H0: β1 = 0.
Total points earned is related to the hours spent studying.
c.
84.65 points
d.
The 95% prediction interval is 65.35 to 103.96
66. a.
The Minitab output is shown below:
The regression equation is
Horizon = 0.275 + 0.950 S&P 500

Predictor    Coef  SE Coef     T      P
Constant   0.2747   0.9004  0.31  0.768
S&P 500    0.9498   0.3569  2.66  0.029

S = 2.664   R-Sq = 47.0%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       1   50.255  50.255  7.08  0.029
Residual Error   8   56.781   7.098
Total            9  107.036
The market beta for Horizon is b1 = .95
b.
Since the p-value = 0.029 is less than α = .05, the relationship is significant.
c.
r2 = .470. The least squares line does not provide a very good fit.
d.
Xerox has higher risk with a market beta of 1.22.
67. a.
The Minitab output is shown below:
The regression equation is
Audit% = - 0.471 + 0.000039 Income

Predictor        Coef     SE Coef      T      P
Constant      -0.4710      0.5842  -0.81  0.431
Income     0.00003868  0.00001731   2.23  0.038

S = 0.2088   R-Sq = 21.7%

Analysis of Variance
Source          DF       SS       MS     F      P
Regression       1  0.21749  0.21749  4.99  0.038
Residual Error  18  0.78451  0.04358
Total           19  1.00200

Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
      1  0.8828  0.0523  (0.7729, 0.9927)  (0.4306, 1.3349)
b.
Since the p-value = 0.038 is less than α = .05, the relationship is significant.
c.
r2 = .217. The least squares line does not provide a very good fit.
d.
The 95% confidence interval is .7729 to .9927.
68. a.
[Scatter diagram: Price (\$1000s) on the vertical axis versus Miles (1000s) on the horizontal axis]
b.
There appears to be a negative relationship between the two variables that can be approximated by a
straight line. An argument could also be made that the relationship is perhaps curvilinear because at
some point a car has so many miles that its value becomes very small.
c.
The Minitab output is shown below.
The regression equation is
Price (\$1000s) = 16.5 - 0.0588 Miles (1000s)

Predictor          Coef  SE Coef      T      P
Constant        16.4698   0.9488  17.36  0.000
Miles (1000s)  -0.05877  0.01319  -4.46  0.000

S = 1.54138   R-Sq = 53.9%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  47.158  47.158  19.85  0.000
Residual Error  17  40.389   2.376
Total           18  87.547
d.
Significant relationship: p-value = 0.000 < α = .05.
e.
r2 = .539; a reasonably good fit considering that the condition of the car is also an important factor
in what the price is.
f.
The slope of the estimated regression equation is -.0588. Thus, a one-unit increase in the value of x
coincides with a decrease in the value of y equal to .0588. Because the data were recorded in
thousands, every additional 1000 miles on the car’s odometer will result in a \$58.80 decrease in the
predicted price.
g.
The predicted price for a 2007 Camry with 60,000 miles is ŷ = 16.5 -.0588(60) = 12.97 or
approximately \$13,000. Because of other factors, such as condition and whether the seller is a
private party or a dealer, this is probably not the price you would offer for the car. But, it should be a
good starting point in figuring out what to offer the seller.