Chapter 12

Chapter 13

Simple Regression Analysis

1.

2.

3.

4.

5.

6.

LEARNING OBJECTIVES

The overall objective of this chapter is to give you an understanding of bivariate regression and correlation analysis, thereby enabling you to:

Compute the equation of a simple regression line from a sample of data and interpret the slope and intercept of the equation.

Understand the usefulness of residual analysis in testing the assumptions underlying regression analysis and in examining the fit of the regression line to the data.

Compute a standard error of the estimate and interpret its meaning.

Compute a coefficient of determination and interpret it.

Test hypotheses about the slope of the regression model and interpret the results.

Estimate values of y using the regression model.

CHAPTER OUTLINE

13.1 Introduction to Simple Regression Analysis

13.2 Determining the Equation of the Regression Line

13.3 Residual Analysis

Using Residuals to Test the Assumptions of the Regression Model

Using the Computer for Residual Analysis

13.4 Standard Error of the Estimate

13.5 Coefficient of Determination

Relationship between r and r 2

13.6 Hypothesis Tests for the Slope of the Regression Model and Testing the Overall Model

Testing the Slope

Testing the Overall Model

13.7 Estimation

Confidence Intervals to Estimate the Conditional Mean of y : µ y/x

Prediction Intervals to Estimate a Single Value of y

13.8 Interpreting the Output coefficient of determination ( confidence interval dependent variable deterministic model heteroscedasticity homoscedasticity independent variable least squares analysis outliers r 2 )

243

KEY WORDS prediction interval probabilistic model regression analysis residual residual plot scatter plot simple regression standard error of the estimate ( s e

) sum of squares of error (SSE)

244 Solutions Manual and Study Guide

STUDY QUESTIONS

1. The process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable is _______________.

2. Bivariate linear regression is often termed _______________ regression.

3. In regression, the variable being predicted is usually referred to as the _______________ variable.

4. In regression, the predictor is called the _______________ variable.

5. The first step in simple regression analysis often is to graph or construct a _______________.

6. In regression analysis,

7. In regression analysis,

 b o

1

represents the population _______________.

represents the sample _______________.

8. A researcher wants to develop a regression model to predict the price of gold by the prime interest rate. The dependent variable is _______________.

9. In an effort to develop a regression model, the following data were gathered: x y

: 2, 9, 11, 19, 21, 25

: 26, 17, 18, 15, 15, 8

The slope of the regression line determined from these data is ____________.

The y intercept is ____________.

10. A researcher wants to develop a regression line from the data given below: x y

: 12, 11, 5, 6, 9

: 31, 25, 14, 12, 16

The equation of the regression line is ____________________.

11. In regression, the value of y

 y is called the _______________.

12. Data points that lie apart from the rest of the points are called _______________.

13. The regression assumption of constant error variance is called _______________.

If the error variances are not constant, it is called _______________.

14. Suppose the following data are used to determine the equation of the regression line given below: x y

:

:

2, 5, 11, 24, 31

12, 13, 16, 14, 19 y = 12.224 + 0.1764 x

The residual for x = 11 is ____________________.

Chapter 13: Simple Regression Analysis 245

15. The total of the residuals squared is called the ______________________________.

16. A standard deviation of the error of the regression model is called the

__________________________________ and is denoted by _______________.

17. Suppose a regression model is developed for ten pairs of data resulting in S.S.E. = 1,203. The standard error of the estimate is _______________.

18. A regression analysis results in the following data: x y

= 276

= 77 x 2 = 12,014 y 2 = 1,183 xy n

= 2,438

= 7

The value of S.S.E. is _______________.

19. The value of S e

is computed from the data of question 18 is _______________.

20. Suppose a regression model results in a value of S e

= 27.9. 95% of the residuals should fall within

_______________.

21. Coefficient of determination is denoted by __________.

22. _________________________ is the proportion of variability of the dependent variable accounted for or explained by the independent variable.

23. The value of r 2 always falls between __________ and __________ inclusive.

24. Suppose a regression analysis results in the following: b b n

1

0

= .19364

= 59.4798

= 8 y = 1,019 y 2 = 134,451 xy = 378,932

The value of r 2 for this regression model is ________.

25. Suppose the data below are used to determine the equation of a regression line: x y

: 18, 14, 9, 6, 2

: 14, 25, 22, 23, 27

The value of r 2 associated with this model is __________.

26. A researcher has developed a regression model from sixteen pairs of data points. He wants to test to determine if the slope is significantly different from zero. He uses a two-tailed test and



= .01.

The critical table t value is _______________.

27. The following data are used to develop a simple regression model: x y

: 22, 20, 15, 15, 14, 9

: 31, 20, 12, 9, 10, 6

The observed t value used to test the slope of this regression model is _______________.


28. If



= .05 and a two-tailed test is being conducted, the critical table t value to test the slope of the model developed in question 27 is _______________.

29. The decision reached about the slope of the model computed in question 27 is to _______________ the null hypothesis.


ANSWERS TO STUDY QUESTIONS

1. Regression

2. Simple

3. Dependent

4. Independent

5. Scatter Plot

6. Slope

7. y Intercept

8. Price of Gold

9.

–0.626, 25.575

10.

–1.253 + 2.425 x

11. Residual

12. Outliers

13. Homoscedasticity,

Heteroscadasticity

14. 1.8356

15. Sum of Squares of Error

16. Standard Error of the Estimate,

17. 12.263

18. 20.015

19. 2.00

20. 0 + 55.8

21. r 2

22. Coefficient of Determination

23. 0, 1

24. .900

25. .578

26. 2.977

27. 4.72

28. + 2.776

29. Reject s e


13.1

SOLUTIONS TO ODD-NUMBERED PROBLEMS IN CHAPTER 13 x

12

21

28

8

20 x

17

15

22

19

24

 x = 89

 x 2 = 1,833

 y = 97

 y 2 = 1,935

 xy = 1,767 n = 5 b

1

=

SS xy

SS x



 xy



 x

2 

(

 n x )

2 n y

=

1 , 767



( 89 )(

5

97

1 , 833



5

( 89 )

2

)

= 0.162

b

0

=

 n y

 b

1

 x n



97

5



0 .

162

89

5

= 16.5 y = 16.5 + 0.162 x

13.3 (Advertising) x

12.5

3.7

21.6

60.0

37.6

6.1

16.8

41.2

(Sales)

148

55

338

994

541

89

126

379 y

 x = 199.5

 y = 2,670

 xy = 107,610.4

 x 2 = 7,667.15

 y 2 = 1,587,328 n = 8

Chapter 13: Simple Regression Analysis 249 b

1

=

SS xy

SS x





 xy

 x

2 

(

 n x )

2 n y

=

107 , 610 .

4



( 199 .

5 )(

8

2 ,

7 , 667 .

15



8

( 199 .

5 )

2

670 )

= 15.24

b

0

=

 n y

 b

1

 x n



2 , 670

8



15 .

24

199 .

5

8

= –46.29 y = –46.29 + 15.24 x

13.5 Starts Failures

233,710

199,091

57,097

50,361

181,645

158,930

155,672

60,747

88,140

97,069

164,086

166,154

188,387

168,158

170,475

166,740

 x = 1,953,048

 y 2 = 61,566,568,203

86,133

71,558

71,128

71,931

83,384

71,857

 y = 809,405

 x 2 = 351,907,107,960

 xy = 141,238,520,688 n = 11 b

1

=

SS xy

SS x



 xy



 x

2 

(

 n x )

2 n y

=

141 , 238 , 520 , 688



( 1 , 953 , 048 )( 809 , 405 )

11

( 1 , 953 , 048 )

2

351 , 907 , 107 , 960



11

= b

1

=

–0.48042194

b

0

=

 n y

 b

1

 x n



809 , 405

11



(



0 .

48042194 )

1 , 953 , 048

11

= 158,881.1 y = 158,881.1 – 0.48042194 x


13.7

 x = 994.8

 y 2 = 104.9815

Steel

99.9

97.9

98.9

87.9

92.9

97.9

100.6

104.9

105.3

108.6

New Orders

2.74

2.87

2.93

2.87

2.98

3.09

3.36

3.61

3.75

3.95

 y = 32.15

 xy = 3,216.652

 x 2 = 99,293.28 n = 10 b

1

=

SS xy

SS x



 xy



 x

2 

(

 n x )

2 n y

=

3 , 216 .

652



( 994 .

8 )(

10

32 .

99 , 293 .

28



10

( 994 .

8 )

2

15 )

= 0.05557

b

0

=

 n y

 b

1

 x n



32 .

15

10



( 0 .

05557 )

994 .

8

=

–2.31307

10

= –2.31307 + 0.05557 x

13.9 x y Predicted ( y ) Residuals ( y –

ˆ

)

12 17 18.4582 –1.4582

21 15 19.9196 –4.9196

28 22 21.0563 0.9437

8 19 17.8087 1.1913

20 24 19.7572 4.2428 y = 16.5 + 0.162 x


13.11 x y Predicted ( y ) Residuals ( y

– ˆ

)

12.5 148 144.2053

3.7 55 10.0953

21.6 338

60.0 994

282.8873

868.0945

37.6 541 526.7236

6.1 89 46.6708

16.8 126

41.2 379

209.7364

581.5868

3.7947

44.9047

55.1127

125.9055

14.2764

42.3292

–83.7364

–202.5868 y

ˆ

= –46.29 + 15.24

x

13.13 x y Predicted ( y

ˆ

) Residuals ( y

– y )

5 47 42.2756 4.7244

7 38 38.9836 –0.9836

11 32 32.3996

12 24 30.7537

–0.3996

–6.7537

19 22 19.2317 2.7683

25 10 9.3558 0.6442 y = 50.5056 – 1.6460 x

13.15

No apparent violation of assumptions

Error terms appear to be non independent


13.17

There appears to be nonlinear regression

13.19 SSE =

 y 2 – b

0

 y – b

1

 xy = 1,935 – (16.51)(97) – 0.1624(1767) = 46.5692 s e



SSE n



2



46 .

5692

= 3.94

3

Approximately 68% of the residuals should fall within ±1 s e

.

3 out of 5 or 60% of the actually residuals in 13.1 fell within ± 1 s e

.

13.21 SSE =

 y 2 – b

0

 y – b

1

 xy = 1,587,328 – (–46.29)(2,670) – 15.24(107,610.4) =

SSE = 70,940 s e



SSE n



2



70 , 940

= 108.7

6

13.23

Six out of eight (75%) of the sales estimates are within $108.7 million.

( y – y

ˆ

)

4.7244

–0.9836

–0.3996

–6.7537

22.3200

.9675

.1597

45.6125

2.7683 7.6635

0.6442 .4150



( y – ) 2 = 77.1382

SSE =



( y

 y )

2

( y – y

ˆ

) 2

= 77.1382 s e



SSE n



2



77 .

1382

= 4.391

4


13.25 Volume ( x )

728.6

497.9

Sales ( y )

10.5

48.1

439.1 64.8

377.9 20.1 n



= 7 x 2

375.5 11.4

363.8 123.8

276.3 89.0

= 1,464,071.97

 x = 3059.1

 y 2 = 30,404.31 b

1

= –.1504 b

0

= 118.257

 y = 367.7

 xy = 141,558.6 y = 118.257 – .1504

x

SSE =

 y 2 – b

0

 y – b

1



XY

= 30,404.31 – (118.257)(367.7) – (–0.1504)(141,558.6) = 8211.6245 s e

 n

SSE



2



8211 .

6245

= 40.5256

5

This is a relatively large standard error of the estimate given the sales values

(ranging from 10.5 to 123.8).

13.27 r 2 = 1



 y

2

SSE



(

 n y )

2



1



272 .

12

45 , 154



( 498 )

2

7

= .972

This is a high value of r 2

13.29 r 2 = 1



 y

2

SSE



(

 n y )

2



1



19 .

8885

524



( 48 )

2

5

= .685

This value of r 2 is a modest value.

68.5% of the variation of y is accounted for by x but 31.5% is unaccounted for.


13.31 CCI

116.8

91.5

68.5

61.6

65.9

90.6

100.0

104.6

125.4

 x = 323.573

 y 2 = 79,718.79

Median Income

37.415

36.770

35.501

35.047

34.700

34.942

35.887

36.306

37.005

 y = 824.9

 xy = 29,804.4505

 x 2 = 11,640.93413 n = 9 b

1

=

SS xy

SS x





 xy

 x

2 

(

 n x )

2 n y b

1

= 19.2204

=

29 , 804 .

4505



( 323 .

573

9

)( 824

11 , 640 .

93413



9

( 323 .

573 )

2

.

9 )

= b

0

=

 n y

 b

1

 x n



824 .

9

9



( 19 .

2204 )

323 .

573

9

= –599.3674 y =

–599.3674 + 19.2204 x

SSE =

 y 2 – b

0

 y – b

1

 xy =

79,718.79 – (–599.3674)(824.9) – 19.2204(29,804.4505) = 1283.13435 s e

 n

SSE



2



1283 .

13435

= 13.539

7 r 2 = 1



 y

2

SSE



(

 n y )

2



1



1283 .

13435

79 , 718 .

79



( 824 .

9 )

2

9

= .688


13.33 s b

=

 x

2  s e

(

 n x )

2 b

1

= –0.898



7 .

376

58 , 293



( 571 )

2

7

= .068145

H o

:



= 0

H a

:

 

0



Two-tail test,



/2 = .005

= .01 df = n – 2 = 7 – 2 = 5 t

.005,5

= ±4.032 t = b

1

 s b



1 



0 .

898



0

= –13.18

.

068145

Since the observed t = –13.18 < t

.005,5

= –4.032, the decision is to reject the null hypothesis.

13.35 s b

=

 x

2  s e

(

 n x )

2 b

1

= –0.715



2 .

575

421



( 41 )

2

5

= .27963

H o

:



= 0

H a

:

 

0



= .05

For a two-tail test,



/2 = .025 df = n – 2 = 5 – 2 = 3 t

.025,3

= ±3.182 t = b

1

 s b



1 



0 .

715



0

= –2.56

.

27963

Since the observed t = –2.56 > t

.025,3

= –3.182, the decision is to fail to reject the null hypothesis.

13.37 F = 8.26 with a p -value of .021. The overall model is significant at



= .05 but not at



= .01. For simple regression, t = F = 2.8674 t

.05,5

= 2.015 but t

.01,5

= 3.365. The slope is significant at



= .01.



= .05 but not at


13.39 x

0

= 100 df = n

For 90% confidence,

– 2 = 7 – 2 = 5



/2 = .05 t

.05,5

= ±2.015 x



 n

 x = 571 x



571

= 81.57143

7

 x 2 = 58,293 y = 144.414 – .0898(100) = 54.614

S e

= 7.377 y ± t

/2,n–2 s e

1



1 n





( x

2 x

0



 x )

(



2 n x )

2

=

54.614 ± 2.015(7.377) 1



1

7



( 100



81 .

57143 )

2

58 , 293



( 571 )

7

2

=

54.614 ± 2.015(7.377)(1.08252) = 54.614 ± 16.091

38.523 < y < 70.705

For x

0

= 130, y = 144.414 – .0898(130) = 27.674 y ± t

/2,n–2 s e

1



1 n





( x

2 x

0



 x )

(



2 n x )

2

=

27.674 ± 2.015(7.377) 1



1

7



( 130



81 .

57143 )

2

58 , 293



( 571 )

2

7

=

27.674 ± 2.015(7.377)(1.1589) = 27.674 ± 17.227

10.447 < y < 44.901

The width of this confidence interval of y for x

0

= 130 is wider that the confidence interval of y for x

0

= 100 because x

0

= 100 is nearer to the value of x = 81.57 than is x

0

= 130.


13.41 x

0

= 10 df = n

For 99% confidence

– 2 = 5 – 2 = 3 t

.005,3

= 5.841 x



 n

 x = 41 x



41

= 8.20

5

 x 2 = 421 y = 15.46 – 0.715(10) = 8.31 y ± t

/2,n–2 s e

1 n





( x x

2

0





( x )



2 n x ) 2



/2 = .005

S e

= 2.575

8.31 ± 5.841(2.575)

1

5



( 10



8 .

2 )

2

421



( 41 ) 2

5

=

8.31 ± 5.841(2.575)(.488065) = 8.31 ± 7.34

0.97 < E(y

10

) < 15.65

If the prime interest rate is 10%, we are 99% confident that the average bond rate is between 0.97% and 15.65%.

13.43 x

53

47

41

50

58

62

45

60

 x = 416

 y = 57

 xy = 3,106 y

5

5

7

4

10

12

3

11



 y 2 = 489 n x 2 = 22,032

= 8 b b

1

0

= 0.355

= –11.335

a) y = –11.335 + 0.355 x


b) y (Predicted Values)

7.48

5.35

3.22

6.415

9.255

10.675

4.64

9.965 c) ( y – y

ˆ

) 2

6.1504

.1225

14.2884

5.8322

.5550

1.7556

2.6896

1.0712

SSE = 32.4649

( y – y ) residuals

–2.48

–0.35

3.78

–2.415

0.745

1.325

–1.64

1.035

d) s e

= s e



SSE n



2



32 .

4649

6

= 2.3261

e) r 2 = 1



 y

2

SSE



(

 n y )

2



1



32 .

4649

489



( 57 )

2

8

= .608

f) H o

:



= 0

H a

:

 

0



= .05

Two-tailed test,



/2 = .025 t

.025,6

= ±2.447 df = n – 2 = 8 – 2 = 6 s b

=

 x

2  s e

(

 n x )

2



2 .

3261

22 , 032



( 416 )

2

8

= 0.116305 t = b

1

 s b



1



0 .

3555



.

116305

0

= 3.05

Since the observed t = 3.05 > t

.025,6

= 2.447, the decision is to

The population slope is different from zero. reject the null hypothesis.

g) This model produces only a modest r 2 = .608. Almost 40% of the variance of Y is unaccounted

for by X . The range of Y values is 12 – 3 = 9 and the standard error of the estimate is 2.33. Given

this small range, the s e

is not small.


13.45 a) x

0

= 60

 x = 524

 y = 215

 xy = 15,125 t

.

025,6

= ±2.447



 x y n

2

2

= 36,224

= 6,411

= 8 s e

= 3.201 95% Confidence Interval df = n – 2 = 8 – 2 = 6



/2 = .025 b

1

= .5481 b

0

= –9.026 y = –9.026 + 0.5481(60) = 23.86 x



 x n



524

8

= 65.5 y ± t 

/2,n–2 s e

1 n





( x

2 x

0





( x )



2 n x )

2

23.86 + 2.447((3.201)

( 60



65 .

5 )

2

1

8



36 , 224



( 524 )

2

8

23.86 + 2.447(3.201)(.375372) = 23.86 + 2.94

20.92 < E(y

60

) < 26.8

b) x

0

= 70

70

= –9.026 + 0.5481(70) = 29.341 y + t 

/2,n–2 s e

1



1 n





( x x

2

0





( x )



2 n x )

2

29.341 + 2.447(3.201) 1



1

8

( 70



65 .

5 )

2



36 , 224



( 524 )

2

8

29.341 + 2.447(3.201)(1.06567) = 29.341 + 8.347

20.994 < y < 37.688


c) The confidence interval for (b) is much wider because part (b) is for a single value of y which

produces a much greater possible variation. In actuality, x

0

= 70 in part (b) is slightly closer to the

mean ( x ) than x

0

= 60. However, the width of the single interval is much greater than that of the

average or expected y value in part (a).

13.47

 n y

= 12

= 5940

 x = 548

 x 2 = 26,592

 y 2 = 3,211,546

 xy = 287,908 b

1

= 10.626383 b

0

= 9.728511 s y e

= 9.728511 + 10.626383 x

SSE =

 y 2 – b

0

 y – b

1

 xy =



3,211,546 – (9.728511)(5940) – (10.626383)(287,908) = 94337.9762 n

SSE



2



94 , 337 .

9762

= 97.1277

10 r 2 = 1



 y

2

SSE



(

 n y )

2



1



94 , 337 .

9762

= .652

271 , 246 t =

10 .

626383



0

97 .

1277

26 , 592



( 548 )

2

12

= 4.33

If



= .01, then t

.005,10

= 3.169. Since the observed t = 4.33 > t

.005,10

= 3.169, the decision is to reject the null hypothesis.

13.49 1977

581

213

668

345

1476

1776

 x = 5059

 y 2 = 36,825,446

2000

571

220

492

221

1760

5750

 y = 9014

 xy = 13,593,272

 x 2 = 6,280,931 n = 6

Chapter 13: Simple Regression Analysis 261 b

1

=

SS xy

SS x



 xy



 x

2 

(

 n x )

2 n y

=

13 , 593 , 272



( 5059 )( 9014 )

6

( 5059 )

2

6 , 280 , 931



6

= 2.97366 b

0

=

 n y

 b

1

 x n



9014

6



( 2 .

97366 )

5059

6

= –1004.9575 y = –1004.9575 + 2.97366 x for x = 700: y = 1076.6044 y + t 

/2,n–2 s e

1 n





( x

2 x

0





( x )



2 n x )

2



= .05, t

.025,4

= 2.776 x

0

= 700, n = 6 x = 843.167

SSE =

 y 2 – b

0

 y – b

1

 xy =

36,825,446 – (–1004.9575)(9014) – (2.97366)(13,593,272) = 5,462,363.69 s e

 n

SSE



2



5 , 462 , 363 .

69

= 1168.585

4

Confidence Interval =

1076.6044 + (2.776)(1168.585)

1

6



( 700



843 .

167 )

2

6 , 280 , 931



( 5059 )

2

6

=

1076.6044 + 1364.1632

–287.5588 to 2440.7676


H

0

:



H a

:



1

= 0

1



0



= .05 df = 4

Table t

.025,4

= 2.132 t = b

1

 s b

0



2 .

9736



0

1168 .

585

2 , 015 , 350 .

833



2 .

9736

.

8231614

= 3.6124

Since the observed t = 3.6124 > t

.025,4


13.51

 x = 44,754

 y = 17,314

 x 2 = 167,540,610

 y 2 = 24,646,062 n = 13

 xy = 59,852,571 b

1

=

SS xy

SS x



 xy



 x

2 

(

 n x )

2 n y

=

59 , 852 , 571



( 44 , 754 )( 17 , 314 )

13

( 44 , 754 )

2

167 , 540 , 610



13

= .01835 b

0

=

 n y

 b

1

 x n



17 , 314

13



(.

01835 )

44 , 754

= 1268.685

13 y = 1268.685 + .01835 x r 2 for this model is .002858. There is no predictability in this model.

Test for slope: t = 0.18 with a p -value of 0.8623. Not significant

13.53 Let Water use = y and Temperature = x

 x = 608

 y = 1,025

 xy = 86,006



 n x y

2

2

= 49,584

= 152,711

= 8 y =

–54.35604 + 2.40107 x

100

= –54.35604 + 2.40107(100) = 185.751

b

1

= 2.40107 b

0

= –54.35604

SSE =

 y 2 – b

0

 y – b

1

 xy

SSE = 152,711 – (–54.35604)(1,025) – (2.40107)(86,006) = 1919.5146

Chapter 13: Simple Regression Analysis 263 s e



SSE n



2



1 , 919 .

5146

= 17.886

6 r 2 = 1



 y

2

SSE



(

 n y )

2



1



1 , 919 .

5145

152 , 711



( 1025 )

2

8

= 1 – .09 = .91

Testing the slope:

H o

:



= 0

H a

:

 

0



= .01

Since this is a two-tailed test,



/2 = .005 df = n – 2 = 8 – 2 = 6 t

.005,6

= ±3.707 s b

=

 x

2  s e

(

 n x )

2



17 .

886

49 , 584



( 608 )

2

8

= .30783 t = b

1

 s b



1



2 .

40107



0

= 7.80

.

30783

Since the observed t = 7.80 < t

.005,6


13.55 The F value for overall predictability is 7.12 with an associated p -value of .0205 which is significant at



= .05. It is not significant at alpha of .01. The coefficient of determination is .372 with an adjusted r 2 of .32. Thisrepresents very modest predictability. The standard error of the estimate is 982.219 which in units of 1,000 laborers means that about 68% of the predictions are within 982,219 of the actual figures. The regression model is:

Number of Union Members = 22,348.97 – 0.0524 Labor Force. For a labor force of 100,000

(thousand, actually 100 million), substitute x = 100,000 and get a predicted value of 17,108.97

(thousand) which is actually 17,108,970 union members.


Chapter 12

Chapter 13

Simple Regression Analysis

Related documents

Products

Support

Chapter 12

Chapter 13

Simple Regression Analysis

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib