Chapter 12 Simple Linear Regression

Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation for Estimation and Prediction
Computer Solution
Residual Analysis: Validating Model Assumptions
Simple Linear Regression Model
The equation that describes how y is related to x and
an error term is called the regression model.
The simple linear regression model is:
y = β0 + β1x + ε
• β0 and β1 are called the parameters of the model.
• ε is a random variable called the error term.
Simple Linear Regression Equation
The simple linear regression equation is:
E(y) = β0 + β1x
• The graph of the regression equation is a straight line.
• β0 is the y intercept of the regression line.
• β1 is the slope of the regression line.
• E(y) is the expected value of y for a given x value.
Simple Linear Regression Equation
Positive Linear Relationship
[Graph: regression line E(y) plotted against x, with y intercept β0 and a positive slope β1]
Simple Linear Regression Equation
Negative Linear Relationship
[Graph: regression line E(y) plotted against x, with y intercept β0 and a negative slope β1]
Simple Linear Regression Equation
No Relationship
[Graph: horizontal regression line E(y) plotted against x, with y intercept β0 and slope β1 equal to 0]
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
ŷ = b0 + b1x
• The graph is called the estimated regression line.
• b0 is the y intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.
Estimation Process
[Flow diagram]
Regression model: y = β0 + β1x + ε
Regression equation: E(y) = β0 + β1x
Unknown parameters: β0, β1
Sample data: (x1, y1), ..., (xn, yn)
Estimated regression equation: ŷ = b0 + b1x
Sample statistics: b0, b1
b0 and b1 provide estimates of β0 and β1
Least Squares Method
Least Squares Criterion
min Σ(yi - ŷi)²
where:
yi = observed value of the dependent variable
for the ith observation
ŷi = estimated value of the dependent variable
for the ith observation
The Least Squares Method
Slope for the Estimated Regression Equation
b1 = (Σxiyi - (Σxi)(Σyi)/n) / (Σxi² - (Σxi)²/n)
where:
xi = value of independent variable for ith observation
yi = value of dependent variable for ith observation
n = total number of observations
The Least Squares Method
y-Intercept for the Estimated Regression Equation
b0 = ȳ - b1x̄
where:
x̄ = mean value of independent variable
ȳ = mean value of dependent variable
n = total number of observations
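These formulas translate directly into code. A minimal Python sketch (the function name least_squares_fit and all variable names are illustrative, not from the text):

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the estimated regression equation ŷ = b0 + b1x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    # Slope: b1 = (Σxiyi - (Σxi)(Σyi)/n) / (Σxi² - (Σxi)²/n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: b0 = ȳ - b1·x̄
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1
```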
Example: Reed Auto Sales
Simple Linear Regression
Reed Auto periodically has a special week-long
sale. As part of the advertising campaign, Reed runs
one or more television commercials during the
weekend preceding the sale. Data from a sample of 5
previous sales are shown on the next slide.
Example: Reed Auto Sales
Simple Linear Regression
Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27
Example: Reed Auto Sales
Slope for the Estimated Regression Equation
b1 = (220 - (10)(100)/5) / (24 - (10)²/5) = 20/4 = 5

y-Intercept for the Estimated Regression Equation
b0 = 20 - 5(2) = 10

Estimated Regression Equation
ŷ = 10 + 5x
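The same arithmetic as a short Python check, using the sums from the data above:

```python
x = [1, 3, 2, 1, 3]       # number of TV ads
y = [14, 24, 18, 17, 27]  # number of cars sold
n = len(x)

sum_x, sum_y = sum(x), sum(y)              # 10, 100
sum_xy = sum(a * b for a, b in zip(x, y))  # 220
sum_x2 = sum(a ** 2 for a in x)            # 24

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # (220 - 200)/(24 - 20) = 5.0
b0 = sum_y / n - b1 * sum_x / n                                # 20 - 5(2) = 10.0
print(b0, b1)  # 10.0 5.0
```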
Example: Reed Auto Sales
Scatter Diagram
[Scatter diagram: Cars Sold (0 to 30) plotted against TV Ads (0 to 4), with the estimated regression line ŷ = 10 + 5x]
The Coefficient of Determination
Relationship Among SST, SSR, SSE
SST = SSR + SSE
Σ(yi - ȳ)² = Σ(ŷi - ȳ)² + Σ(yi - ŷi)²
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
The Coefficient of Determination
The coefficient of determination is:
r² = SSR/SST
where:
SST = total sum of squares
SSR = sum of squares due to regression
Example: Reed Auto Sales
Coefficient of Determination
r² = SSR/SST = 100/114 = .8772
The regression relationship is very strong because
approximately 88% of the variation in the number of cars sold
can be explained by the linear relationship between
the number of TV ads and the number of cars sold.
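A short Python check of these sums of squares, assuming the fitted equation ŷ = 10 + 5x from the example above:

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_hat = [10 + 5 * xi for xi in x]  # fitted values from ŷ = 10 + 5x
y_bar = sum(y) / len(y)            # 20.0

sst = sum((yi - y_bar) ** 2 for yi in y)               # 114.0
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # 100.0
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # 14.0
r2 = ssr / sst
print(sst, ssr, sse, round(r2, 4))  # 114.0 100.0 14.0 0.8772
```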
The Correlation Coefficient
Sample Correlation Coefficient
rxy = (sign of b1)√(Coefficient of Determination)
rxy = (sign of b1)√r²
where:
b1 = the slope of the estimated regression
equation ŷ = b0 + b1x
Example: Reed Auto Sales
Sample Correlation Coefficient
rxy = (sign of b1)√r²
The sign of b1 in the equation ŷ = 10 + 5x is “+”.
rxy = +√.8772
rxy = +.9366
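The same computation in Python, using only the standard library:

```python
import math

b1 = 5          # slope of ŷ = 10 + 5x
r2 = 100 / 114  # coefficient of determination from the previous slide

r_xy = math.copysign(math.sqrt(r2), b1)  # sign of b1 times √r²
print(round(r_xy, 4))  # 0.9366
```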
Model Assumptions
Assumptions About the Error Term ε
1. The error ε is a random variable with mean of zero.
2. The variance of ε, denoted by σ², is the same for
all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.
Testing for Significance
To test for a significant regression relationship, we
must conduct a hypothesis test to determine whether
the value of β1 is zero.
Two tests are commonly used:
• t Test
• F Test
Both tests require an estimate of σ², the variance of ε
in the regression model.
Testing for Significance
An Estimate of σ²
The mean square error (MSE) provides the estimate
of σ², and the notation s² is also used.
s² = MSE = SSE/(n - 2)
where:
SSE = Σ(yi - ŷi)² = Σ(yi - b0 - b1xi)²
Testing for Significance
An Estimate of σ
• To estimate σ we take the square root of σ².
• The resulting s is called the standard error of the
estimate.
s = √MSE = √(SSE/(n - 2))
Testing for Significance: t Test
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0

Test Statistic
t = b1/sb1
Testing for Significance: t Test
Rejection Rule
Reject H0 if t < -tα/2 or t > tα/2
where:
tα/2 is based on a t distribution
with n - 2 degrees of freedom
Example: Reed Auto Sales
t Test
• Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
• Rejection Rule
For α = .05 and d.f. = 3, t.025 = 3.182
Reject H0 if t < -3.182 or t > 3.182
Example: Reed Auto Sales
t Test
• Test Statistic
t = 5/1.08 = 4.63
• Conclusion
t = 4.63 > 3.182, so we reject H0.
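A sketch of the full t test in Python, assuming SciPy is available for the critical value; the inputs are the Reed Auto sums from the earlier slides:

```python
import math
from scipy import stats

b1 = 5                         # estimated slope
mse = 14 / 3                   # s² from the previous slides
sum_x, sum_x2, n = 10, 24, 5   # Σxi, Σxi², sample size

s_b1 = math.sqrt(mse) / math.sqrt(sum_x2 - sum_x ** 2 / n)  # ≈ 1.08
t_stat = b1 / s_b1                                          # ≈ 4.63
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)                # 3.182

print(round(t_stat, 2), round(t_crit, 3))  # 4.63 3.182
if abs(t_stat) > t_crit:
    print("Reject H0: β1 = 0")
```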
Confidence Interval for β1

• We can use a 95% confidence interval for β1 to test
the hypotheses just used in the t test.
• H0 is rejected if the hypothesized value of β1 is not
included in the confidence interval for β1.
Confidence Interval for β1

The form of a confidence interval for β1 is:
b1 ± tα/2·sb1
where:
b1 is the point estimate
tα/2·sb1 is the margin of error
tα/2 is the t value providing an area of α/2 in the
upper tail of a t distribution with n - 2 degrees of freedom
Example: Reed Auto Sales
• Rejection Rule
Reject H0 if 0 is not included in
the confidence interval for β1.
• 95% Confidence Interval for β1
b1 ± tα/2·sb1 = 5 ± 3.182(1.08) = 5 ± 3.44
or 1.56 to 8.44
• Conclusion
0 is not included in the confidence interval.
Reject H0.
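The same interval computed in Python (a sketch, assuming SciPy for tα/2 and sb1 ≈ 1.08 from the t test):

```python
from scipy import stats

b1, s_b1, n = 5, 1.08, 5
t_half_alpha = stats.t.ppf(1 - 0.05 / 2, df=n - 2)  # 3.182

margin = t_half_alpha * s_b1                         # ≈ 3.44
print(round(b1 - margin, 2), round(b1 + margin, 2))  # 1.56 8.44
```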
Testing for Significance: F Test
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0

Test Statistic
F = MSR/MSE
Testing for Significance: F Test
Rejection Rule
Reject H0 if F > Fα
where:
Fα is based on an F distribution
with 1 d.f. in the numerator and
n - 2 d.f. in the denominator
Example: Reed Auto Sales
F Test
• Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
• Rejection Rule
For α = .05 and d.f. = 1, 3: F.05 = 10.13
Reject H0 if F > 10.13.
Example: Reed Auto Sales
F Test
• Test Statistic
F = MSR/MSE = 100/4.667 = 21.43
• Conclusion
F = 21.43 > 10.13, so we reject H0.
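A sketch of the F test in Python, assuming SciPy for the critical value; MSR = SSR/1 because simple linear regression has one regression degree of freedom:

```python
from scipy import stats

ssr, sse, n = 100, 14, 5
msr = ssr / 1        # 1 regression degree of freedom
mse = sse / (n - 2)  # n - 2 = 3 error degrees of freedom

f_stat = msr / mse                                # ≈ 21.43
f_crit = stats.f.ppf(1 - 0.05, dfn=1, dfd=n - 2)  # ≈ 10.13

print(round(f_stat, 2), round(f_crit, 2))  # 21.43 10.13
if f_stat > f_crit:
    print("Reject H0: β1 = 0")
```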
Some Cautions about the
Interpretation of Significance Tests
• Rejecting H0: β1 = 0 and concluding that the
relationship between x and y is significant does not
enable us to conclude that a cause-and-effect
relationship is present between x and y.
• Just because we are able to reject H0: β1 = 0 and
demonstrate statistical significance does not enable
us to conclude that there is a linear relationship
between x and y.
Example: Reed Auto Sales
Point Estimation
If 3 TV ads are run prior to a sale, we expect the
mean number of cars sold to be:
ŷ = 10 + 5(3) = 25 cars
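As a final check, the point estimate in Python:

```python
b0, b1 = 10, 5
x = 3                # number of TV ads
y_hat = b0 + b1 * x  # expected number of cars sold
print(y_hat)         # 25
```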