
Chapter 13

Simple Linear Regression Analysis

McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.

Simple Linear Regression

13.1 The Simple Linear Regression Model and the Least Squares Point Estimates

13.3 Testing the Significance of the Slope and y-Intercept

The Simple Linear Regression Model and the Least Squares Point Estimates

• The dependent (or response) variable is the variable we wish to understand or predict

• The independent (or predictor) variable is the variable we will use to understand or predict the dependent variable

• Regression analysis is a statistical technique that uses observed data to relate the dependent variable to one or more independent variables


Objective of Regression Analysis

The objective of regression analysis is to build a regression model (or predictive equation) that can be used to describe, predict and control the dependent variable on the basis of the independent variable

Example 13.1: Fuel Consumption Case #1

Week   Average Hourly Temperature, x (deg F)   Weekly Fuel Consumption, y (MMcf)
1      28.0                                    12.4
2      28.0                                    11.7
3      32.5                                    12.4
4      39.0                                    10.8
5      45.9                                     9.4
6      57.8                                     9.5
7      58.1                                     8.0
8      62.5                                     7.5


Example 13.1: Fuel Consumption Case #4

• The values of $\beta_0$ and $\beta_1$ determine the value of the mean weekly fuel consumption $\mu_{y|x}$

• Because we do not know the true values of $\beta_0$ and $\beta_1$, we cannot actually calculate the mean weekly fuel consumptions

• We will learn how to estimate $\beta_0$ and $\beta_1$ in the next section

• For now, when we say that $\mu_{y|x}$ is related to x by a straight line, we mean that the different mean weekly fuel consumptions and average hourly temperatures lie on a straight line

Form of the Simple Linear Regression Model

• $y = \beta_0 + \beta_1 x + \varepsilon$

• $\mu_{y|x} = \beta_0 + \beta_1 x$ is the mean value of the dependent variable y when the value of the independent variable is x

• $\beta_0$ is the y-intercept; the mean of y when x is 0

• $\beta_1$ is the slope; the change in the mean of y per unit change in x

• $\varepsilon$ is an error term that describes the effect on y of all factors other than x

Regression Terms

• $\beta_0$ and $\beta_1$ are called regression parameters

• $\beta_0$ is the y-intercept and $\beta_1$ is the slope

• We do not know the true values of these parameters

• So, we must use sample data to estimate them

• $b_0$ is the estimate of $\beta_0$ and $b_1$ is the estimate of $\beta_1$

The Simple Linear Regression Model Illustrated

The Least Squares Estimates, and Point Estimation and Prediction

• The true values of $\beta_0$ and $\beta_1$ are unknown

• Therefore, we must use observed data to compute statistics that estimate these parameters

• We will compute $b_0$ to estimate $\beta_0$ and $b_1$ to estimate $\beta_1$

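The Least Squares Point Estimates

• Estimation/prediction equation: $\hat{y} = b_0 + b_1 x$

• Least squares point estimate of the slope $\beta_1$:

$$b_1 = \frac{SS_{xy}}{SS_{xx}}$$

$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

$$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$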

The Least Squares Point Estimates Continued

• Least squares point estimate of the y-intercept $\beta_0$:

$$b_0 = \bar{y} - b_1 \bar{x}, \qquad \bar{y} = \frac{\sum y_i}{n}, \qquad \bar{x} = \frac{\sum x_i}{n}$$

Example 13.3: Fuel Consumption Case #1

         y        x        x^2         xy
      12.4     28.0     784.00     347.20
      11.7     28.0     784.00     327.60
      12.4     32.5    1056.25     403.00
      10.8     39.0    1521.00     421.20
       9.4     45.9    2106.81     431.46
       9.5     57.8    3340.84     549.10
       8.0     58.1    3375.61     464.80
       7.5     62.5    3906.25     468.75
Total 81.7    351.8   16874.76    3413.11
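The column totals in this table can be reproduced in a few lines of code. The following is a minimal sketch, not part of the original slides; the function name `summary_totals` is a hypothetical helper chosen for illustration, and the data are the eight weeks from Example 13.1.

```python
# Fuel consumption data from Example 13.1 (x in deg F, y in MMcf).
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]


def summary_totals(x, y):
    """Return (n, sum x, sum y, sum x^2, sum x*y) for paired data.

    Hypothetical helper name; it only adds up the table columns.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return n, sum_x, sum_y, sum_x2, sum_xy


n, sum_x, sum_y, sum_x2, sum_xy = summary_totals(x, y)
# Rounded to two decimals, these should agree with the "Total" row.
print(n, round(sum_x, 2), round(sum_y, 2), round(sum_x2, 2), round(sum_xy, 2))
```

Only these five numbers are needed for the least squares calculations that follow, which is why the raw data can be set aside once the totals are known.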

Example 13.3: Fuel Consumption Case #2

• From last slide,

– $\sum y_i = 81.7$

– $\sum x_i = 351.8$

– $\sum x_i^2 = 16{,}874.76$

– $\sum x_i y_i = 3{,}413.11$

• Once we have these values, we no longer need the raw data

• Calculation of $b_0$ and $b_1$ uses these totals

Example 13.3: Fuel Consumption Case #3 (Slope $b_1$)

$$SS_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = 3413.11 - \frac{(351.8)(81.7)}{8} = -179.6475$$

$$SS_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 16874.76 - \frac{(351.8)^2}{8} = 1404.355$$

$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-179.6475}{1404.355} = -0.1279$$

Example 13.3: Fuel Consumption Case #4 (y-Intercept $b_0$)

$$\bar{y} = \frac{\sum y_i}{n} = \frac{81.7}{8} = 10.2125, \qquad \bar{x} = \frac{\sum x_i}{n} = \frac{351.8}{8} = 43.98$$

$$b_0 = \bar{y} - b_1 \bar{x} = 10.2125 - (-0.1279)(43.98) = 15.84$$
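The hand calculation of the slope and y-intercept can be checked with a short script. This is a sketch under the assumption that only the four column totals and n = 8 are available; `least_squares_from_totals` is a hypothetical name introduced here, not something defined in the slides.

```python
# Column totals from the Example 13.3 table (n = 8 weeks).
n = 8
sum_x, sum_y = 351.8, 81.7
sum_x2, sum_xy = 16874.76, 3413.11


def least_squares_from_totals(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b0, b1, ss_xy, ss_xx) using the computational formulas.

    Hypothetical helper: SSxy and SSxx are computed from totals only,
    then b1 = SSxy / SSxx and b0 = y_bar - b1 * x_bar.
    """
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    if ss_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1, ss_xy, ss_xx


b0, b1, ss_xy, ss_xx = least_squares_from_totals(n, sum_x, sum_y, sum_x2, sum_xy)
print(f"SSxy = {ss_xy:.4f}, SSxx = {ss_xx:.3f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.2f}")
```

The printed values should agree with the slide results up to rounding. The script carries full precision for the slope and the mean of x, whereas the hand calculation rounds them to -0.1279 and 43.98 before computing the intercept.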

Example 13.3: Fuel Consumption Case #5

• Prediction (x = 28)

• $\hat{y} = b_0 + b_1 x = 15.84 + (-0.1279)(28)$

• $\hat{y} = 12.2588$ MMcf of gas


Example 13.3: The Danger of Extrapolation Outside the Experimental Region
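A prediction from the fitted line is only supported by the data for temperatures within the range that was actually observed, here 28.0 to 62.5 deg F. The sketch below evaluates the prediction equation with the rounded estimates from Example 13.3 and flags any request outside that range. The names `predict`, `X_MIN` and `X_MAX` are assumptions made for illustration.

```python
# Least squares estimates from Example 13.3 (rounded as on the slides).
B0, B1 = 15.84, -0.1279
# Experimental region: smallest and largest observed temperatures (deg F).
X_MIN, X_MAX = 28.0, 62.5


def predict(x):
    """Return (y_hat, inside_region) for an average hourly temperature x."""
    y_hat = B0 + B1 * x
    inside_region = X_MIN <= x <= X_MAX
    return y_hat, inside_region


for temp in (28.0, 40.0, 10.0):
    y_hat, ok = predict(temp)
    note = "" if ok else "  (extrapolation: outside 28.0 to 62.5 deg F)"
    print(f"x = {temp:5.1f}  y_hat = {y_hat:.4f} MMcf{note}")
```

The flag does not say the extrapolated value is wrong. It only records that the straight-line relationship has not been checked against data at that temperature.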

Testing the Significance of the Slope

• A regression model is not likely to be useful unless there is a significant relationship between x and y

• To test significance, we use the null hypothesis: $H_0: \beta_1 = 0$

• Versus the alternative hypothesis: $H_a: \beta_1 \neq 0$

Testing the Significance of the Slope #2

If the regression assumptions hold, we can reject $H_0: \beta_1 = 0$ at the $\alpha$ level of significance (probability of Type I error equal to $\alpha$) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than $\alpha$

Testing the Significance of the Slope #3

Alternative               Reject $H_0$ if            p-Value
$H_a: \beta_1 > 0$        $t > t_{\alpha}$           Area under t distribution right of t
$H_a: \beta_1 < 0$        $t < -t_{\alpha}$          Area under t distribution left of t
$H_a: \beta_1 \neq 0$     $|t| > t_{\alpha/2}$ *     Twice area under t distribution right of |t|

* That is, $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$
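The three rejection point conditions translate directly into a small decision function. This is an illustrative sketch only: `reject_h0` and its argument names are hypothetical, and the critical values must still be looked up in a t table for n - 2 degrees of freedom. The example call uses t = -7.33 (the slides report |t| = 7.33 for the fuel consumption data) and, as an assumption taken from a standard t table for 6 degrees of freedom, $t_{0.05} = 1.943$ and $t_{0.025} = 2.447$.

```python
def reject_h0(t, alternative, t_alpha, t_alpha_half):
    """Apply the rejection point rules for H0: beta1 = 0.

    alternative: "greater" (Ha: beta1 > 0), "less" (Ha: beta1 < 0),
                 or "two-sided" (Ha: beta1 != 0).
    t_alpha, t_alpha_half: critical values t_alpha and t_{alpha/2}
                 for n - 2 degrees of freedom, taken from a t table.
    """
    if alternative == "greater":
        return t > t_alpha
    if alternative == "less":
        return t < -t_alpha
    if alternative == "two-sided":
        return abs(t) > t_alpha_half
    raise ValueError(f"unknown alternative: {alternative!r}")


# Fuel consumption case at alpha = 0.05 with 6 degrees of freedom.
for alt in ("greater", "less", "two-sided"):
    print(alt, reject_h0(-7.33, alt, t_alpha=1.943, t_alpha_half=2.447))
```

With a strongly negative t, the two-sided and "less" alternatives lead to rejection while the "greater" alternative does not, which is what the table implies.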

Testing the Significance of the Slope #4

• Test statistic: $t = \dfrac{b_1}{s_{b_1}}$, where $s_{b_1} = \dfrac{s}{\sqrt{SS_{xx}}}$

• $100(1-\alpha)\%$ confidence interval for $\beta_1$: $[b_1 \pm t_{\alpha/2}\, s_{b_1}]$

• $t_{\alpha}$, $t_{\alpha/2}$ and p-values are based on n - 2 degrees of freedom
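The test statistic can be computed from the raw data as a check on software output. The sketch below assumes that s is the usual standard error, $s = \sqrt{SSE/(n-2)}$ with $SSE = \sum (y_i - \hat{y}_i)^2$; this excerpt does not define s, so treat that as an assumption. The function name `slope_t_statistic` is hypothetical.

```python
import math

# Fuel consumption data from Example 13.1 (x in deg F, y in MMcf).
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]


def slope_t_statistic(x, y):
    """Return (b1, s_b1, t) for testing H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # Assumption: s = sqrt(SSE / (n - 2)), the standard error of the model.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    s_b1 = s / math.sqrt(ss_xx)
    return b1, s_b1, b1 / s_b1


b1, s_b1, t = slope_t_statistic(x, y)
print(f"b1 = {b1:.5f}, s_b1 = {s_b1:.5f}, t = {t:.2f}")
```

For the fuel consumption data, the results should be close to the values quoted from the printouts in the later examples ($b_1 = -0.12792$, $s_{b_1} = 0.01746$, |t| = 7.33).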

Example 13.6: MINITAB Output of Regression on Fuel Consumption Data

Example 13.6: Excel Output of Regression on Fuel Consumption Data

Example 13.6: Fuel Consumption Case

• The p-value for testing $H_0$ versus $H_a$ is twice the area to the right of |t| = 7.33 with n - 2 = 6 degrees of freedom

• In this case, the p-value is 0.0003

• We can reject $H_0$ in favor of $H_a$ at level of significance 0.05, 0.01, or 0.001

• We therefore have strong evidence that x is significantly related to y and that the regression model is significant
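The p-value quoted above comes from statistical software, but the "twice the area to the right of |t|" definition can be checked numerically. The sketch below is not how MINITAB or Excel compute it; it integrates the Student's t density with Simpson's rule using only the standard library. The names `t_pdf` and `two_sided_p_value` are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Twice the area to the right of |t|, via Simpson's rule on [0, |t|]."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t|
    # The density is symmetric, so the right tail is 0.5 minus that area.
    return 2 * (0.5 - central_area)


p = two_sided_p_value(7.33, 6)
print(f"p-value = {p:.4f}")
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: reject H0 -> {p < alpha}")
```

A p-value this small is below all three significance levels listed, consistent with rejecting $H_0$ at each of them.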

A Confidence Interval for the Slope

• If the regression assumptions hold, a $100(1-\alpha)$ percent confidence interval for the true slope $\beta_1$ is

– $[b_1 \pm t_{\alpha/2}\, s_{b_1}]$

• Here $t_{\alpha/2}$ is based on n - 2 degrees of freedom

Example 13.7: Fuel Consumption Case

• An earlier printout tells us:

– $b_1 = -0.12792$

– $s_{b_1} = 0.01746$

• We have n - 2 = 6 degrees of freedom

– That gives us a t-value of 2.447 for a 95 percent confidence interval

• $[b_1 \pm t_{0.025} \cdot s_{b_1}] = [-0.12792 \pm 2.447(0.01746)]$

= [-0.1706, -0.0852]
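The interval arithmetic is easy to verify. A minimal sketch using only the three numbers given in this example (variable names are illustrative):

```python
b1 = -0.12792    # least squares slope from the printout
s_b1 = 0.01746   # standard error of the slope from the printout
t_crit = 2.447   # t_{0.025} with n - 2 = 6 degrees of freedom

half_width = t_crit * s_b1
lower, upper = b1 - half_width, b1 + half_width
print(f"half-width = {half_width:.4f}")
print(f"95% CI for the slope: [{lower:.4f}, {upper:.4f}]")
```

Because the whole interval lies below zero, it agrees with the two-sided test that rejects $H_0: \beta_1 = 0$ at the 0.05 level.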

Testing the Significance of the y-Intercept

If the regression assumptions hold, we can reject $H_0: \beta_0 = 0$ at the $\alpha$ level of significance (probability of Type I error equal to $\alpha$) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than $\alpha$

Testing the Significance of the y-Intercept #2

Alternative               Reject $H_0$ if            p-Value
$H_a: \beta_0 > 0$        $t > t_{\alpha}$           Area under t distribution right of t
$H_a: \beta_0 < 0$        $t < -t_{\alpha}$          Area under t distribution left of t
$H_a: \beta_0 \neq 0$     $|t| > t_{\alpha/2}$ *     Twice area under t distribution right of |t|

* That is, $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$

Testing the Significance of the y-Intercept #3

Test statistic: $t = \dfrac{b_0}{s_{b_0}}$, where $s_{b_0} = s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS_{xx}}}$

$100(1-\alpha)\%$ confidence interval for $\beta_0$: $[b_0 \pm t_{\alpha/2}\, s_{b_0}]$

$t_{\alpha}$, $t_{\alpha/2}$ and p-values are based on n - 2 degrees of freedom
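The same steps used for the slope apply to the y-intercept. The sketch below computes $s_{b_0}$, the t statistic and a 95 percent interval for $\beta_0$ from the fuel consumption data. It assumes $s = \sqrt{SSE/(n-2)}$, which this excerpt does not define, and reuses the critical value 2.447 for 6 degrees of freedom. The function name `intercept_inference` is hypothetical.

```python
import math

# Fuel consumption data from Example 13.1 (x in deg F, y in MMcf).
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]


def intercept_inference(x, y, t_crit):
    """Return (b0, s_b0, t, (lower, upper)) for the y-intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # Assumption: s = sqrt(SSE / (n - 2)), the standard error of the model.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    s_b0 = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)
    t = b0 / s_b0
    return b0, s_b0, t, (b0 - t_crit * s_b0, b0 + t_crit * s_b0)


b0, s_b0, t, ci = intercept_inference(x, y, t_crit=2.447)
print(f"b0 = {b0:.2f}, s_b0 = {s_b0:.4f}, t = {t:.2f}")
print(f"95% CI for the intercept: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

If |t| exceeds 2.447, the two-sided test rejects $H_0: \beta_0 = 0$ at the 0.05 level, following the rejection point rule for the y-intercept.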
