Simple Linear Regression Analysis
McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.
13.1 The Simple Linear Regression Model and the Least Squares Point Estimates
13.3 Testing the Significance of the Slope and y-Intercept
13-2
The Simple Linear Regression Model and the Least Squares Point Estimates
• The dependent (or response) variable is the variable we wish to understand or predict
• The independent (or predictor) variable is the variable we will use to understand or predict the dependent variable
• Regression analysis is a statistical technique that uses observed data to relate the dependent variable to one or more independent variables
13-3
The objective of regression analysis is to build a regression model (or predictive equation) that can be used to describe, predict and control the dependent variable on the basis of the independent variable
13-4
Example 13.1: Fuel Consumption
Case #1
Week   Average Hourly Temperature x (deg F)   Weekly Fuel Consumption y (MMcf)
 1     28.0                                    12.4
 2     28.0                                    11.7
 3     32.5                                    12.4
 4     39.0                                    10.8
 5     45.9                                     9.4
 6     57.8                                     9.5
 7     58.1                                     8.0
 8     62.5                                     7.5
13-5
Example 13.1: Fuel Consumption
Case #2
13-6
Example 13.1: Fuel Consumption
Case #3
13-7
Example 13.1: Fuel Consumption
Case #4
• The values of β0 and β1 determine the value of the mean weekly fuel consumption μy|x
• Because we do not know the true values of β0 and β1, we cannot actually calculate the mean weekly fuel consumptions
• We will learn how to estimate β0 and β1 in the next section
• For now, when we say that μy|x is related to x by a straight line, we mean the different mean weekly fuel consumptions and average hourly temperatures lie on a straight line
13-8
Form of The Simple Linear Regression Model
• y = β0 + β1x + ε
• μy|x = β0 + β1x is the mean value of the dependent variable y when the value of the independent variable is x
• β0 is the y-intercept; the mean of y when x is 0
• β1 is the slope; the change in the mean of y per unit change in x
• ε is an error term that describes the effect on y of all factors other than x
13-9
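To make the form of the model concrete, here is a minimal Python sketch that simulates data from y = β0 + β1x + ε; the parameter values, sample size, and NumPy usage are illustrative assumptions, not part of the original slides.

```python
# Illustrative sketch only: hypothetical parameter values, not from the slides
import numpy as np

rng = np.random.default_rng(seed=1)

beta0, beta1 = 15.0, -0.13                 # assumed true y-intercept and slope
x = rng.uniform(28.0, 62.5, size=8)        # independent variable values
epsilon = rng.normal(0.0, 0.5, size=8)     # error term: effect of all factors other than x
y = beta0 + beta1 * x + epsilon            # dependent variable generated by the model
```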
• β0 and β1 are called regression parameters
• β0 is the y-intercept and β1 is the slope
• We do not know the true values of these parameters
• So, we must use sample data to estimate them
• b0 is the estimate of β0 and b1 is the estimate of β1
13-10
The Simple Linear Regression Model
Illustrated
13-11
The Least Squares Estimates, and Point Estimation and Prediction
• The true values of β0 and β1 are unknown
• Therefore, we must use observed data to compute statistics that estimate these parameters
• We will compute b0 to estimate β0 and b1 to estimate β1
13-12
• Estimation/prediction equation: ŷ = b0 + b1x
• Least squares point estimate of the slope β1:
  b1 = SSxy / SSxx
  where
  SSxy = Σ(xi - x̄)(yi - ȳ) = Σxiyi - (Σxi)(Σyi)/n
  SSxx = Σ(xi - x̄)² = Σxi² - (Σxi)²/n
13-13
Continued
• Least squares point estimate of the y-intercept β0:
  b0 = ȳ - b1x̄
  where
  ȳ = Σyi / n and x̄ = Σxi / n
13-14
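The slope and intercept formulas above translate directly into code. Below is a minimal Python sketch (NumPy assumed; the function name least_squares_fit is ours, not from the slides) that computes b0 and b1 from SSxy and SSxx.

```python
# Minimal sketch of the least squares point estimates (NumPy assumed)
import numpy as np

def least_squares_fit(x, y):
    """Return (b0, b1) computed from the SSxy / SSxx formulas above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    ss_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n  # SSxy
    ss_xx = np.sum(x ** 2) - np.sum(x) ** 2 / n        # SSxx
    b1 = ss_xy / ss_xx                                 # slope point estimate
    b0 = np.mean(y) - b1 * np.mean(x)                  # y-intercept point estimate
    return b0, b1
```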
Example 13.3: Fuel Consumption Case #1

   y       x        x²          xy
  12.4    28.0      784.00     347.20
  11.7    28.0      784.00     327.60
  12.4    32.5     1056.25     403.00
  10.8    39.0     1521.00     421.20
   9.4    45.9     2106.81     431.46
   9.5    57.8     3340.84     549.10
   8.0    58.1     3375.61     464.80
   7.5    62.5     3906.25     468.75
  81.7   351.8    16874.76    3413.11   (totals)
13-15
Example 13.3: Fuel Consumption Case #2
• From the last slide,
  – Σyi = 81.7
  – Σxi = 351.8
  – Σxi² = 16,874.76
  – Σxiyi = 3,413.11
• Once we have these values, we no longer need the raw data
• Calculation of b0 and b1 uses these totals
13-16
Example 13.3: Fuel Consumption Case #3 (Slope b1)

SSxy = Σxiyi - (Σxi)(Σyi)/n = 3413.11 - (351.8)(81.7)/8 = -179.6475

SSxx = Σxi² - (Σxi)²/n = 16874.76 - (351.8)²/8 = 1404.355

b1 = SSxy / SSxx = -179.6475 / 1404.355 = -0.1279
13-17
Example 13.3: Fuel Consumption Case #4 (y-Intercept b0)

ȳ = Σyi / n = 81.7 / 8 = 10.2125

x̄ = Σxi / n = 351.8 / 8 = 43.98

b0 = ȳ - b1x̄ = 10.2125 - (-0.1279)(43.98) = 15.84
13-18
Example 13.3: Fuel Consumption Case #5
• Prediction (x = 40)
• ŷ = b0 + b1x = 15.84 + (-0.1279)(40)
• ŷ = 10.72 MMcf of Gas
13-19
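As a check on the hand calculations above, the least_squares_fit sketch from earlier can be run on the fuel consumption data of slide 13-5 (a usage sketch, not part of the original slides; printed digits depend on rounding).

```python
# Fuel consumption data (slide 13-5)
temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]  # x: average hourly temperature (deg F)
fuel = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]      # y: weekly fuel consumption (MMcf)

b0, b1 = least_squares_fit(temp, fuel)
print(round(b0, 2), round(b1, 4))   # expected: about 15.84 and -0.1279
print(round(b0 + b1 * 40, 2))       # point prediction at x = 40: about 10.72 MMcf
```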
Example 13.3: Fuel Consumption Case
#6
13-20
Example 13.3: The Danger of Extrapolation
Outside The Experimental Region
13-21
Testing the Significance of the Slope
• A regression model is not likely to be useful unless there is a significant relationship between x and y
• To test significance, we use the null hypothesis H0: β1 = 0
• versus the alternative hypothesis Ha: β1 ≠ 0
13-22
Testing the Significance of the Slope #2
If the regression assumptions hold, we can reject H0: β1 = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α
13-23
Testing the Significance of the Slope #3
Alternative      Reject H0 if      p-Value
Ha: β1 > 0       t > tα            Area under t distribution right of t
Ha: β1 < 0       t < -tα           Area under t distribution left of t
Ha: β1 ≠ 0       |t| > tα/2 *      Twice area under t distribution right of |t|

* That is, t > tα/2 or t < -tα/2
13-24
Testing the Significance of the Slope #4
• Test statistic: t = b1 / sb1, where sb1 = s / √SSxx
• 100(1 - α)% confidence interval for β1: [b1 ± tα/2 sb1]
• tα, tα/2, and p-values are based on n - 2 degrees of freedom
13-25
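A minimal Python sketch of this t test and confidence interval follows, assuming SciPy is available and reusing the least_squares_fit sketch from earlier; the function name slope_inference is ours, not from the slides.

```python
# Sketch of the slope t test and 100(1 - alpha)% confidence interval (SciPy assumed)
import numpy as np
from scipy import stats

def slope_inference(x, y, alpha=0.05):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    b0, b1 = least_squares_fit(x, y)                   # from the earlier sketch
    residuals = y - (b0 + b1 * x)
    s = np.sqrt(np.sum(residuals ** 2) / (n - 2))      # standard error of the estimate
    ss_xx = np.sum(x ** 2) - np.sum(x) ** 2 / n        # SSxx
    s_b1 = s / np.sqrt(ss_xx)                          # standard error of b1
    t = b1 / s_b1                                      # test statistic for H0: beta1 = 0
    p_value = 2 * stats.t.sf(abs(t), df=n - 2)         # two-sided p-value
    t_half_alpha = stats.t.ppf(1 - alpha / 2, df=n - 2)
    ci = (b1 - t_half_alpha * s_b1, b1 + t_half_alpha * s_b1)
    return t, p_value, ci
```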
Example 13.6: MINITAB Output of
Regression on Fuel Consumption Data
13-26
Example 13.6: Excel Output of
Regression on Fuel Consumption Data
13-27
Example 13.6: Fuel Consumption Case
• The p-value for testing H0 versus Ha is twice the area to the right of |t| = 7.33, based on n - 2 = 6 degrees of freedom
• In this case, the p-value is 0.0003
• We can reject H0 in favor of Ha at level of significance 0.05, 0.01, or 0.001
• We therefore have strong evidence that x is significantly related to y and that the regression model is significant
13-28
• If the regression assumptions hold, a 100(1 - α) percent confidence interval for the true slope β1 is
  – b1 ± tα/2 sb1
• Here t is based on n - 2 degrees of freedom
13-29
Example 13.7: Fuel Consumption Case
• An earlier printout tells us:
  – b1 = -0.12792
  – sb1 = 0.01746
• We have n - 2 = 6 degrees of freedom
  – That gives us a t-value of 2.447 for a 95 percent confidence interval
• [b1 ± t0.025 · sb1] = [-0.12792 ± 2.447(0.01746)] = [-0.1706, -0.0852]
13-30
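Running the slope_inference sketch from earlier on the fuel consumption data should reproduce these figures (a usage sketch under the same assumptions; printed digits depend on rounding).

```python
t, p_value, ci = slope_inference(temp, fuel, alpha=0.05)
print(round(t, 2), round(p_value, 4))       # expected: about -7.33 and 0.0003
print(round(ci[0], 4), round(ci[1], 4))     # expected: about -0.1706 and -0.0852
```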
Testing the Significance of the y-Intercept
If the regression assumptions hold, we can reject H0: β0 = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α
13-31
Alternative      Reject H0 if      p-Value
Ha: β0 > 0       t > tα            Area under t distribution right of t
Ha: β0 < 0       t < -tα           Area under t distribution left of t
Ha: β0 ≠ 0       |t| > tα/2 *      Twice area under t distribution right of |t|

* That is, t > tα/2 or t < -tα/2
13-32
• Test statistic: t = b0 / sb0, where sb0 = s √(1/n + x̄²/SSxx)
• 100(1 - α)% confidence interval for β0: [b0 ± tα/2 sb0]
• tα, tα/2, and p-values are based on n - 2 degrees of freedom
13-33
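A companion Python sketch for the y-intercept test and confidence interval, under the same assumptions as the slope sketch (the function name intercept_inference is ours, not from the slides):

```python
# Sketch of the y-intercept t test and 100(1 - alpha)% confidence interval (SciPy assumed)
import numpy as np
from scipy import stats

def intercept_inference(x, y, alpha=0.05):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    b0, b1 = least_squares_fit(x, y)                          # from the earlier sketch
    residuals = y - (b0 + b1 * x)
    s = np.sqrt(np.sum(residuals ** 2) / (n - 2))             # standard error of the estimate
    ss_xx = np.sum(x ** 2) - np.sum(x) ** 2 / n               # SSxx
    s_b0 = s * np.sqrt(1.0 / n + np.mean(x) ** 2 / ss_xx)     # standard error of b0
    t = b0 / s_b0                                             # test statistic for H0: beta0 = 0
    p_value = 2 * stats.t.sf(abs(t), df=n - 2)                # two-sided p-value
    t_half_alpha = stats.t.ppf(1 - alpha / 2, df=n - 2)
    ci = (b0 - t_half_alpha * s_b0, b0 + t_half_alpha * s_b0)
    return t, p_value, ci
```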