1.
2.
3.
4.
5.
6.
LEARNING OBJECTIVES
The overall objective of this chapter is to give you an understanding of bivariate regression and correlation analysis, thereby enabling you to:
Compute the equation of a simple regression line from a sample of data and interpret the slope and intercept of the equation.
Understand the usefulness of residual analysis in testing the assumptions underlying regression analysis and in examining the fit of the regression line to the data.
Compute a standard error of the estimate and interpret its meaning.
Compute a coefficient of determination and interpret it.
Test hypotheses about the slope of the regression model and interpret the results.
Estimate values of y using the regression model.
CHAPTER OUTLINE
13.1 Introduction to Simple Regression Analysis
13.2 Determining the Equation of the Regression Line
13.3 Residual Analysis
Using Residuals to Test the Assumptions of the Regression Model
Using the Computer for Residual Analysis
13.4 Standard Error of the Estimate
13.5 Coefficient of Determination
Relationship between r and r 2
13.6 Hypothesis Tests for the Slope of the Regression Model and Testing the Overall Model
Testing the Slope
Testing the Overall Model
13.7 Estimation
Confidence Intervals to Estimate the Conditional Mean of y : µ y/x
Prediction Intervals to Estimate a Single Value of y
13.8 Interpreting the Output coefficient of determination ( confidence interval dependent variable deterministic model heteroscedasticity homoscedasticity independent variable least squares analysis outliers r 2 )
243
KEY WORDS prediction interval probabilistic model regression analysis residual residual plot scatter plot simple regression standard error of the estimate ( s e
) sum of squares of error (SSE)
244 Solutions Manual and Study Guide
STUDY QUESTIONS
1. The process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable is _______________.
2. Bivariate linear regression is often termed _______________ regression.
3. In regression, the variable being predicted is usually referred to as the _______________ variable.
4. In regression, the predictor is called the _______________ variable.
5. The first step in simple regression analysis often is to graph or construct a _______________.
6. In regression analysis,
7. In regression analysis,
b o
1
represents the population _______________.
represents the sample _______________.
8. A researcher wants to develop a regression model to predict the price of gold by the prime interest rate. The dependent variable is _______________.
9. In an effort to develop a regression model, the following data were gathered: x y
: 2, 9, 11, 19, 21, 25
: 26, 17, 18, 15, 15, 8
The slope of the regression line determined from these data is ____________.
The y intercept is ____________.
10. A researcher wants to develop a regression line from the data given below: x y
: 12, 11, 5, 6, 9
: 31, 25, 14, 12, 16
The equation of the regression line is ____________________.
11. In regression, the value of y
y is called the _______________.
12. Data points that lie apart from the rest of the points are called _______________.
13. The regression assumption of constant error variance is called _______________.
If the error variances are not constant, it is called _______________.
14. Suppose the following data are used to determine the equation of the regression line given below: x y
:
:
2, 5, 11, 24, 31
12, 13, 16, 14, 19 y = 12.224 + 0.1764 x
The residual for x = 11 is ____________________.
Chapter 13: Simple Regression Analysis 245
15. The total of the residuals squared is called the ______________________________.
16. A standard deviation of the error of the regression model is called the
__________________________________ and is denoted by _______________.
17. Suppose a regression model is developed for ten pairs of data resulting in S.S.E. = 1,203. The standard error of the estimate is _______________.
18. A regression analysis results in the following data: x y
= 276
= 77 x 2 = 12,014 y 2 = 1,183 xy n
= 2,438
= 7
The value of S.S.E. is _______________.
19. The value of S e
is computed from the data of question 18 is _______________.
20. Suppose a regression model results in a value of S e
= 27.9. 95% of the residuals should fall within
_______________.
21. Coefficient of determination is denoted by __________.
22. _________________________ is the proportion of variability of the dependent variable accounted for or explained by the independent variable.
23. The value of r 2 always falls between __________ and __________ inclusive.
24. Suppose a regression analysis results in the following: b b n
1
0
= .19364
= 59.4798
= 8 y = 1,019 y 2 = 134,451 xy = 378,932
The value of r 2 for this regression model is ________.
25. Suppose the data below are used to determine the equation of a regression line: x y
: 18, 14, 9, 6, 2
: 14, 25, 22, 23, 27
The value of r 2 associated with this model is __________.
26. A researcher has developed a regression model from sixteen pairs of data points. He wants to test to determine if the slope is significantly different from zero. He uses a two-tailed test and
= .01.
The critical table t value is _______________.
27. The following data are used to develop a simple regression model: x y
: 22, 20, 15, 15, 14, 9
: 31, 20, 12, 9, 10, 6
The observed t value used to test the slope of this regression model is _______________.
246 Solutions Manual and Study Guide
28. If
= .05 and a two-tailed test is being conducted, the critical table t value to test the slope of the model developed in question 27 is _______________.
29. The decision reached about the slope of the model computed in question 27 is to _______________ the null hypothesis.
Chapter 13: Simple Regression Analysis 247
ANSWERS TO STUDY QUESTIONS
1. Regression
2. Simple
3. Dependent
4. Independent
5. Scatter Plot
6. Slope
7. y Intercept
8. Price of Gold
9.
–0.626, 25.575
10.
–1.253 + 2.425 x
11. Residual
12. Outliers
13. Homoscedasticity,
Heteroscadasticity
14. 1.8356
15. Sum of Squares of Error
16. Standard Error of the Estimate,
17. 12.263
18. 20.015
19. 2.00
20. 0 + 55.8
21. r 2
22. Coefficient of Determination
23. 0, 1
24. .900
25. .578
26. 2.977
27. 4.72
28. + 2.776
29. Reject s e
248 Solutions Manual and Study Guide
13.1
SOLUTIONS TO ODD-NUMBERED PROBLEMS IN CHAPTER 13 x
12
21
28
8
20 x
17
15
22
19
24
x = 89
x 2 = 1,833
y = 97
y 2 = 1,935
xy = 1,767 n = 5 b
1
=
SS xy
SS x
xy
x
2
(
n x )
2 n y
=
1 , 767
( 89 )(
5
97
1 , 833
5
( 89 )
2
)
= 0.162
b
0
=
n y
b
1
x n
97
5
0 .
162
89
5
= 16.5 y = 16.5 + 0.162 x
13.3 (Advertising) x
12.5
3.7
21.6
60.0
37.6
6.1
16.8
41.2
(Sales)
148
55
338
994
541
89
126
379 y
x = 199.5
y = 2,670
xy = 107,610.4
x 2 = 7,667.15
y 2 = 1,587,328 n = 8
Chapter 13: Simple Regression Analysis 249 b
1
=
SS xy
SS x
xy
x
2
(
n x )
2 n y
=
107 , 610 .
4
( 199 .
5 )(
8
2 ,
7 , 667 .
15
8
( 199 .
5 )
2
670 )
= 15.24
b
0
=
n y
b
1
x n
2 , 670
8
15 .
24
199 .
5
8
= –46.29 y = –46.29 + 15.24 x
13.5 Starts Failures
233,710
199,091
57,097
50,361
181,645
158,930
155,672
60,747
88,140
97,069
164,086
166,154
188,387
168,158
170,475
166,740
x = 1,953,048
y 2 = 61,566,568,203
86,133
71,558
71,128
71,931
83,384
71,857
y = 809,405
x 2 = 351,907,107,960
xy = 141,238,520,688 n = 11 b
1
=
SS xy
SS x
xy
x
2
(
n x )
2 n y
=
141 , 238 , 520 , 688
( 1 , 953 , 048 )( 809 , 405 )
11
( 1 , 953 , 048 )
2
351 , 907 , 107 , 960
11
= b
1
=
–0.48042194
b
0
=
n y
b
1
x n
809 , 405
11
(
0 .
48042194 )
1 , 953 , 048
11
= 158,881.1 y = 158,881.1 – 0.48042194 x
250 Solutions Manual and Study Guide
13.7
x = 994.8
y 2 = 104.9815
Steel
99.9
97.9
98.9
87.9
92.9
97.9
100.6
104.9
105.3
108.6
New Orders
2.74
2.87
2.93
2.87
2.98
3.09
3.36
3.61
3.75
3.95
y = 32.15
xy = 3,216.652
x 2 = 99,293.28 n = 10 b
1
=
SS xy
SS x
xy
x
2
(
n x )
2 n y
=
3 , 216 .
652
( 994 .
8 )(
10
32 .
99 , 293 .
28
10
( 994 .
8 )
2
15 )
= 0.05557
b
0
=
n y
b
1
x n
32 .
15
10
( 0 .
05557 )
994 .
8
=
–2.31307
10
= –2.31307 + 0.05557 x
13.9 x y Predicted ( y ) Residuals ( y –
ˆ
)
12 17 18.4582 –1.4582
21 15 19.9196 –4.9196
28 22 21.0563 0.9437
8 19 17.8087 1.1913
20 24 19.7572 4.2428 y = 16.5 + 0.162 x
Chapter 13: Simple Regression Analysis 251
13.11 x y Predicted ( y ) Residuals ( y
– ˆ
)
12.5 148 144.2053
3.7 55 10.0953
21.6 338
60.0 994
282.8873
868.0945
37.6 541 526.7236
6.1 89 46.6708
16.8 126
41.2 379
209.7364
581.5868
3.7947
44.9047
55.1127
125.9055
14.2764
42.3292
–83.7364
–202.5868 y
ˆ
= –46.29 + 15.24
x
13.13 x y Predicted ( y
ˆ
) Residuals ( y
– y )
5 47 42.2756 4.7244
7 38 38.9836 –0.9836
11 32 32.3996
12 24 30.7537
–0.3996
–6.7537
19 22 19.2317 2.7683
25 10 9.3558 0.6442 y = 50.5056 – 1.6460 x
13.15
No apparent violation of assumptions
Error terms appear to be non independent
252 Solutions Manual and Study Guide
13.17
There appears to be nonlinear regression
13.19 SSE =
y 2 – b
0
y – b
1
xy = 1,935 – (16.51)(97) – 0.1624(1767) = 46.5692 s e
SSE n
2
46 .
5692
= 3.94
3
Approximately 68% of the residuals should fall within ±1 s e
.
3 out of 5 or 60% of the actually residuals in 13.1 fell within ± 1 s e
.
13.21 SSE =
y 2 – b
0
y – b
1
xy = 1,587,328 – (–46.29)(2,670) – 15.24(107,610.4) =
SSE = 70,940 s e
SSE n
2
70 , 940
= 108.7
6
13.23
Six out of eight (75%) of the sales estimates are within $108.7 million.
( y – y
ˆ
)
4.7244
–0.9836
–0.3996
–6.7537
22.3200
.9675
.1597
45.6125
2.7683 7.6635
0.6442 .4150
( y – ) 2 = 77.1382
SSE =
( y
y )
2
( y – y
ˆ
) 2
= 77.1382 s e
SSE n
2
77 .
1382
= 4.391
4
Chapter 13: Simple Regression Analysis 253
13.25 Volume ( x )
728.6
497.9
Sales ( y )
10.5
48.1
439.1 64.8
377.9 20.1 n
= 7 x 2
375.5 11.4
363.8 123.8
276.3 89.0
= 1,464,071.97
x = 3059.1
y 2 = 30,404.31 b
1
= –.1504 b
0
= 118.257
y = 367.7
xy = 141,558.6 y = 118.257 – .1504
x
SSE =
y 2 – b
0
y – b
1
XY
= 30,404.31 – (118.257)(367.7) – (–0.1504)(141,558.6) = 8211.6245 s e
n
SSE
2
8211 .
6245
= 40.5256
5
This is a relatively large standard error of the estimate given the sales values
(ranging from 10.5 to 123.8).
13.27 r 2 = 1
y
2
SSE
(
n y )
2
1
272 .
12
45 , 154
( 498 )
2
7
= .972
This is a high value of r 2
13.29 r 2 = 1
y
2
SSE
(
n y )
2
1
19 .
8885
524
( 48 )
2
5
= .685
This value of r 2 is a modest value.
68.5% of the variation of y is accounted for by x but 31.5% is unaccounted for.
254 Solutions Manual and Study Guide
13.31 CCI
116.8
91.5
68.5
61.6
65.9
90.6
100.0
104.6
125.4
x = 323.573
y 2 = 79,718.79
Median Income
37.415
36.770
35.501
35.047
34.700
34.942
35.887
36.306
37.005
y = 824.9
xy = 29,804.4505
x 2 = 11,640.93413 n = 9 b
1
=
SS xy
SS x
xy
x
2
(
n x )
2 n y b
1
= 19.2204
=
29 , 804 .
4505
( 323 .
573
9
)( 824
11 , 640 .
93413
9
( 323 .
573 )
2
.
9 )
= b
0
=
n y
b
1
x n
824 .
9
9
( 19 .
2204 )
323 .
573
9
= –599.3674 y =
–599.3674 + 19.2204 x
SSE =
y 2 – b
0
y – b
1
xy =
79,718.79 – (–599.3674)(824.9) – 19.2204(29,804.4505) = 1283.13435 s e
n
SSE
2
1283 .
13435
= 13.539
7 r 2 = 1
y
2
SSE
(
n y )
2
1
1283 .
13435
79 , 718 .
79
( 824 .
9 )
2
9
= .688
Chapter 13: Simple Regression Analysis 255
13.33 s b
=
x
2 s e
(
n x )
2 b
1
= –0.898
7 .
376
58 , 293
( 571 )
2
7
= .068145
H o
:
= 0
H a
:
0
Two-tail test,
/2 = .005
= .01 df = n – 2 = 7 – 2 = 5 t
.005,5
= ±4.032 t = b
1
s b
1
0 .
898
0
= –13.18
.
068145
Since the observed t = –13.18 < t
.005,5
= –4.032, the decision is to reject the null hypothesis.
13.35 s b
=
x
2 s e
(
n x )
2 b
1
= –0.715
2 .
575
421
( 41 )
2
5
= .27963
H o
:
= 0
H a
:
0
= .05
For a two-tail test,
/2 = .025 df = n – 2 = 5 – 2 = 3 t
.025,3
= ±3.182 t = b
1
s b
1
0 .
715
0
= –2.56
.
27963
Since the observed t = –2.56 > t
.025,3
= –3.182, the decision is to fail to reject the null hypothesis.
13.37 F = 8.26 with a p -value of .021. The overall model is significant at
= .05 but not at
= .01. For simple regression, t = F = 2.8674 t
.05,5
= 2.015 but t
.01,5
= 3.365. The slope is significant at
= .01.
= .05 but not at
256 Solutions Manual and Study Guide
13.39 x
0
= 100 df = n
For 90% confidence,
– 2 = 7 – 2 = 5
/2 = .05 t
.05,5
= ±2.015 x
n
x = 571 x
571
= 81.57143
7
x 2 = 58,293 y = 144.414 – .0898(100) = 54.614
S e
= 7.377 y ± t
/2,n–2 s e
1
1 n
( x
2 x
0
x )
(
2 n x )
2
=
54.614 ± 2.015(7.377) 1
1
7
( 100
81 .
57143 )
2
58 , 293
( 571 )
7
2
=
54.614 ± 2.015(7.377)(1.08252) = 54.614 ± 16.091
38.523 < y < 70.705
For x
0
= 130, y = 144.414 – .0898(130) = 27.674 y ± t
/2,n–2 s e
1
1 n
( x
2 x
0
x )
(
2 n x )
2
=
27.674 ± 2.015(7.377) 1
1
7
( 130
81 .
57143 )
2
58 , 293
( 571 )
2
7
=
27.674 ± 2.015(7.377)(1.1589) = 27.674 ± 17.227
10.447 < y < 44.901
The width of this confidence interval of y for x
0
= 130 is wider that the confidence interval of y for x
0
= 100 because x
0
= 100 is nearer to the value of x = 81.57 than is x
0
= 130.
Chapter 13: Simple Regression Analysis 257
13.41 x
0
= 10 df = n
For 99% confidence
– 2 = 5 – 2 = 3 t
.005,3
= 5.841 x
n
x = 41 x
41
= 8.20
5
x 2 = 421 y = 15.46 – 0.715(10) = 8.31 y ± t
/2,n–2 s e
1 n
( x x
2
0
( x )
2 n x ) 2
/2 = .005
S e
= 2.575
8.31 ± 5.841(2.575)
1
5
( 10
8 .
2 )
2
421
( 41 ) 2
5
=
8.31 ± 5.841(2.575)(.488065) = 8.31 ± 7.34
0.97 < E(y
10
) < 15.65
If the prime interest rate is 10%, we are 99% confident that the average bond rate is between 0.97% and 15.65%.
13.43 x
53
47
41
50
58
62
45
60
x = 416
y = 57
xy = 3,106 y
5
5
7
4
10
12
3
11
y 2 = 489 n x 2 = 22,032
= 8 b b
1
0
= 0.355
= –11.335
a) y = –11.335 + 0.355 x
258 Solutions Manual and Study Guide
b) y (Predicted Values)
7.48
5.35
3.22
6.415
9.255
10.675
4.64
9.965 c) ( y – y
ˆ
) 2
6.1504
.1225
14.2884
5.8322
.5550
1.7556
2.6896
1.0712
SSE = 32.4649
( y – y ) residuals
–2.48
–0.35
3.78
–2.415
0.745
1.325
–1.64
1.035
d) s e
= s e
SSE n
2
32 .
4649
6
= 2.3261
e) r 2 = 1
y
2
SSE
(
n y )
2
1
32 .
4649
489
( 57 )
2
8
= .608
f) H o
:
= 0
H a
:
0
= .05
Two-tailed test,
/2 = .025 t
.025,6
= ±2.447 df = n – 2 = 8 – 2 = 6 s b
=
x
2 s e
(
n x )
2
2 .
3261
22 , 032
( 416 )
2
8
= 0.116305 t = b
1
s b
1
0 .
3555
.
116305
0
= 3.05
Since the observed t = 3.05 > t
.025,6
= 2.447, the decision is to
The population slope is different from zero. reject the null hypothesis.
g) This model produces only a modest r 2 = .608. Almost 40% of the variance of Y is unaccounted
for by X . The range of Y values is 12 – 3 = 9 and the standard error of the estimate is 2.33. Given
this small range, the s e
is not small.
Chapter 13: Simple Regression Analysis 259
13.45 a) x
0
= 60
x = 524
y = 215
xy = 15,125 t
.
025,6
= ±2.447
x y n
2
2
= 36,224
= 6,411
= 8 s e
= 3.201 95% Confidence Interval df = n – 2 = 8 – 2 = 6
/2 = .025 b
1
= .5481 b
0
= –9.026 y = –9.026 + 0.5481(60) = 23.86 x
x n
524
8
= 65.5 y ± t
/2,n–2 s e
1 n
( x
2 x
0
( x )
2 n x )
2
23.86 + 2.447((3.201)
( 60
65 .
5 )
2
1
8
36 , 224
( 524 )
2
8
23.86 + 2.447(3.201)(.375372) = 23.86 + 2.94
20.92 < E(y
60
) < 26.8
b) x
0
= 70
70
= –9.026 + 0.5481(70) = 29.341 y + t
/2,n–2 s e
1
1 n
( x x
2
0
( x )
2 n x )
2
29.341 + 2.447(3.201) 1
1
8
( 70
65 .
5 )
2
36 , 224
( 524 )
2
8
29.341 + 2.447(3.201)(1.06567) = 29.341 + 8.347
20.994 < y < 37.688
260 Solutions Manual and Study Guide
c) The confidence interval for (b) is much wider because part (b) is for a single value of y which
produces a much greater possible variation. In actuality, x
0
= 70 in part (b) is slightly closer to the
mean ( x ) than x
0
= 60. However, the width of the single interval is much greater than that of the
average or expected y value in part (a).
13.47
n y
= 12
= 5940
x = 548
x 2 = 26,592
y 2 = 3,211,546
xy = 287,908 b
1
= 10.626383 b
0
= 9.728511 s y e
= 9.728511 + 10.626383 x
SSE =
y 2 – b
0
y – b
1
xy =
3,211,546 – (9.728511)(5940) – (10.626383)(287,908) = 94337.9762 n
SSE
2
94 , 337 .
9762
= 97.1277
10 r 2 = 1
y
2
SSE
(
n y )
2
1
94 , 337 .
9762
= .652
271 , 246 t =
10 .
626383
0
97 .
1277
26 , 592
( 548 )
2
12
= 4.33
If
= .01, then t
.005,10
= 3.169. Since the observed t = 4.33 > t
.005,10
= 3.169, the decision is to reject the null hypothesis.
13.49 1977
581
213
668
345
1476
1776
x = 5059
y 2 = 36,825,446
2000
571
220
492
221
1760
5750
y = 9014
xy = 13,593,272
x 2 = 6,280,931 n = 6
Chapter 13: Simple Regression Analysis 261 b
1
=
SS xy
SS x
xy
x
2
(
n x )
2 n y
=
13 , 593 , 272
( 5059 )( 9014 )
6
( 5059 )
2
6 , 280 , 931
6
= 2.97366 b
0
=
n y
b
1
x n
9014
6
( 2 .
97366 )
5059
6
= –1004.9575 y = –1004.9575 + 2.97366 x for x = 700: y = 1076.6044 y + t
/2,n–2 s e
1 n
( x
2 x
0
( x )
2 n x )
2
= .05, t
.025,4
= 2.776 x
0
= 700, n = 6 x = 843.167
SSE =
y 2 – b
0
y – b
1
xy =
36,825,446 – (–1004.9575)(9014) – (2.97366)(13,593,272) = 5,462,363.69 s e
n
SSE
2
5 , 462 , 363 .
69
= 1168.585
4
Confidence Interval =
1076.6044 + (2.776)(1168.585)
1
6
( 700
843 .
167 )
2
6 , 280 , 931
( 5059 )
2
6
=
1076.6044 + 1364.1632
–287.5588 to 2440.7676
262 Solutions Manual and Study Guide
H
0
:
H a
:
1
= 0
1
0
= .05 df = 4
Table t
.025,4
= 2.132 t = b
1
s b
0
2 .
9736
0
1168 .
585
2 , 015 , 350 .
833
2 .
9736
.
8231614
= 3.6124
Since the observed t = 3.6124 > t
.025,4
= 2.132, the decision is to reject the null hypothesis.
13.51
x = 44,754
y = 17,314
x 2 = 167,540,610
y 2 = 24,646,062 n = 13
xy = 59,852,571 b
1
=
SS xy
SS x
xy
x
2
(
n x )
2 n y
=
59 , 852 , 571
( 44 , 754 )( 17 , 314 )
13
( 44 , 754 )
2
167 , 540 , 610
13
= .01835 b
0
=
n y
b
1
x n
17 , 314
13
(.
01835 )
44 , 754
= 1268.685
13 y = 1268.685 + .01835 x r 2 for this model is .002858. There is no predictability in this model.
Test for slope: t = 0.18 with a p -value of 0.8623. Not significant
13.53 Let Water use = y and Temperature = x
x = 608
y = 1,025
xy = 86,006
n x y
2
2
= 49,584
= 152,711
= 8 y =
–54.35604 + 2.40107 x
100
= –54.35604 + 2.40107(100) = 185.751
b
1
= 2.40107 b
0
= –54.35604
SSE =
y 2 – b
0
y – b
1
xy
SSE = 152,711 – (–54.35604)(1,025) – (2.40107)(86,006) = 1919.5146
Chapter 13: Simple Regression Analysis 263 s e
SSE n
2
1 , 919 .
5146
= 17.886
6 r 2 = 1
y
2
SSE
(
n y )
2
1
1 , 919 .
5145
152 , 711
( 1025 )
2
8
= 1 – .09 = .91
Testing the slope:
H o
:
= 0
H a
:
0
= .01
Since this is a two-tailed test,
/2 = .005 df = n – 2 = 8 – 2 = 6 t
.005,6
= ±3.707 s b
=
x
2 s e
(
n x )
2
17 .
886
49 , 584
( 608 )
2
8
= .30783 t = b
1
s b
1
2 .
40107
0
= 7.80
.
30783
Since the observed t = 7.80 < t
.005,6
= 3.707, the decision is to reject the null hypothesis.
13.55 The F value for overall predictability is 7.12 with an associated p -value of .0205 which is significant at
= .05. It is not significant at alpha of .01. The coefficient of determination is .372 with an adjusted r 2 of .32. Thisrepresents very modest predictability. The standard error of the estimate is 982.219 which in units of 1,000 laborers means that about 68% of the predictions are within 982,219 of the actual figures. The regression model is:
Number of Union Members = 22,348.97 – 0.0524 Labor Force. For a labor force of 100,000
(thousand, actually 100 million), substitute x = 100,000 and get a predicted value of 17,108.97
(thousand) which is actually 17,108,970 union members.
264 Solutions Manual and Study Guide