Chapter12

Chapter 12
Simple Regression & Correlation Analysis
© 2002 Thomson / South-Western
Slide 12-1
Learning Objectives
• Compute the equation of a simple
regression line from a sample of data, and
interpret the slope and intercept of the
equation.
• Understand the usefulness of residual
analysis in testing the assumptions
underlying regression analysis and in
examining the fit of the regression line to
the data.
• Compute a standard error of the estimate
and interpret its meaning.
Slide 12-2
Learning Objectives, continued
• Compute a coefficient of determination
and interpret it.
• Test hypotheses about the slope of the
regression model and interpret the
results.
• Estimate values of Y using the
regression model.
• Compute a coefficient of correlation
and interpret it.
Slide 12-3
Correlation and Regression
• Correlation is a measure of the degree
of relatedness of two variables.
• Regression analysis is the process of
constructing a mathematical model or
function that can be used to predict or
determine one variable by another
variable.
Slide 12-4
Simple Regression Analysis
• Bivariate (two variables) linear
regression -- the most elementary
regression model
– dependent variable, the variable to
be predicted, usually called Y
– independent variable, the predictor
or explanatory variable, usually called
X
Slide 12-5
Airline Cost Data

Number of Passengers (X)    Cost in $1,000 (Y)
61                          4.28
63                          4.08
67                          4.42
69                          4.17
70                          4.48
74                          4.30
76                          4.82
81                          4.70
86                          5.11
91                          5.13
95                          5.64
97                          5.56
Slide 12-6
Scatter Plot of Airline Cost Data
[Figure: scatter plot; x-axis: Number of Passengers (0 to 120); y-axis: Cost ($1000) (0 to 6)]
Slide 12-7
Regression Models
• Deterministic regression model: $Y = \beta_0 + \beta_1 X$
• Probabilistic regression model: $Y = \beta_0 + \beta_1 X + \varepsilon$
• $\beta_0$ and $\beta_1$ are population parameters
• $\beta_0$ and $\beta_1$ are estimated by sample statistics $b_0$ and $b_1$
Slide 12-8
Equation of the Simple Regression Line

$$\hat{Y} = b_0 + b_1 X$$

where:
$b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{Y}$ = the predicted value of Y
Slide 12-9
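Slope and Y Intercept of the Regression Line

$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$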
Slide 12-10
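The equivalence between the deviation form and the computational form of the slope can be checked with a short expansion. This is a sketch of the standard algebra, using only $\sum X = n\bar{X}$ and $\sum Y = n\bar{Y}$; it is not taken from the slides themselves.

```latex
\begin{aligned}
\sum (X - \bar{X})(Y - \bar{Y})
  &= \sum XY - \bar{X}\sum Y - \bar{Y}\sum X + n\bar{X}\bar{Y} \\
  &= \sum XY - n\bar{X}\bar{Y}
   = \sum XY - \frac{(\sum X)(\sum Y)}{n}, \\[4pt]
\sum (X - \bar{X})^2
  &= \sum X^2 - 2\bar{X}\sum X + n\bar{X}^2 \\
  &= \sum X^2 - n\bar{X}^2
   = \sum X^2 - \frac{(\sum X)^2}{n}.
\end{aligned}
```

Dividing the first result by the second gives each of the three expressions for $b_1$.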
Least Squares Analysis
$$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$

$$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$

$$b_1 = \frac{SS_{XY}}{SS_{XX}}$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$
Slide 12-11
Airline Cost Example: Solving for Slope and Y
Intercept of the Regression Line (Part 1)
Number of Passengers (X)    Cost in $1,000 (Y)    X²        XY
61                          4.28                  3,721     261.08
63                          4.08                  3,969     257.04
67                          4.42                  4,489     296.14
69                          4.17                  4,761     287.73
70                          4.48                  4,900     313.60
74                          4.30                  5,476     318.20
76                          4.82                  5,776     366.32
81                          4.70                  6,561     380.70
86                          5.11                  7,396     439.46
91                          5.13                  8,281     466.83
95                          5.64                  9,025     535.80
97                          5.56                  9,409     539.32
ΣX = 930                    ΣY = 56.69            ΣX² = 73,764    ΣXY = 4,462.22
Slide 12-12
Airline Cost Example: Solving for Slope and Y
Intercept of the Regression Line (Part 2)
$$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 4{,}462.22 - \frac{(930)(56.69)}{12} = 68.745$$

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 73{,}764 - \frac{(930)^2}{12} = 1689$$

$$b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{68.745}{1689} = .0407$$

$$b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \frac{56.69}{12} - (.0407)\frac{930}{12} = 1.57$$

$$\hat{Y} = 1.57 + .0407X$$
Slide 12-13
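The hand computation above can be reproduced in a few lines of Python. This is a minimal sketch: the function name `fit_simple_regression` and the variable names are illustrative assumptions, and only the data values and the computational formulas come from the slides.

```python
# Airline cost data from the slides: passengers (X) and cost in $1,000 (Y).
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]


def fit_simple_regression(x, y):
    """Return (b0, b1, ss_xy, ss_xx) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    ss_xy = sum_xy - sum_x * sum_y / n   # SSXY
    ss_xx = sum_x2 - sum_x ** 2 / n      # SSXX
    b1 = ss_xy / ss_xx                   # sample slope
    b0 = sum_y / n - b1 * sum_x / n      # sample intercept
    return b0, b1, ss_xy, ss_xx


b0, b1, ss_xy, ss_xx = fit_simple_regression(passengers, cost)
print(f"SSXY = {ss_xy:.3f}, SSXX = {ss_xx:.0f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.2f}")
```

The printed values should agree with the slide results (68.745, 1689, .0407, 1.57) to the rounding shown there.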
Graph of Regression Line
for the Airline Cost Example
[Figure: fitted regression line over the scatter plot; x-axis: Number of Passengers (0 to 120); y-axis: Cost ($1000) (0 to 6)]
Slide 12-14
Residual Analysis
• Residual is the difference between
the actual Y value and the Y value
predicted by the regression model
• It is the error of the regression model
in predicting each value of the
dependent variable.
Slide 12-15
Airline Cost Example:
Residual Analysis
Number of Passengers (X)    Cost in $1,000 (Y)    Predicted Value (Ŷ)    Residual (Y - Ŷ)
61                          4.28                  4.053                  .227
63                          4.08                  4.134                  -.054
67                          4.42                  4.297                  .123
69                          4.17                  4.378                  -.208
70                          4.48                  4.419                  .061
74                          4.30                  4.582                  -.282
76                          4.82                  4.663                  .157
81                          4.70                  4.867                  -.167
86                          5.11                  5.070                  .040
91                          5.13                  5.274                  -.144
95                          5.64                  5.436                  .204
97                          5.56                  5.518                  .042

Σ(Y - Ŷ) = -.001
Slide 12-16
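The residual table can be regenerated directly from the data. The following is a sketch under the assumption that the least squares coefficients are kept at full precision (the slide rounds them to 1.57 and .0407); all variable names are illustrative.

```python
# Airline cost data from the slides.
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]

n = len(passengers)
x_bar = sum(passengers) / n
y_bar = sum(cost) / n

# Least squares slope and intercept (deviation form).
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(passengers, cost))
      / sum((x - x_bar) ** 2 for x in passengers))
b0 = y_bar - b1 * x_bar

# Residual = actual Y minus the Y predicted by the regression line.
predicted = [b0 + b1 * x for x in passengers]
residuals = [y - y_hat for y, y_hat in zip(cost, predicted)]

for x, y, y_hat, e in zip(passengers, cost, predicted, residuals):
    print(f"{x:>3}  {y:5.2f}  {y_hat:6.3f}  {e:+.3f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

With unrounded coefficients the residuals sum to zero up to floating-point error; the -.001 on the slide is a by-product of rounding each entry to three decimals.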
Airline Cost Example:
Excel Graph of Residuals
[Figure: residuals plotted against Number of Passengers (60 to 100); y-axis: Residual (-0.3 to 0.2)]
Slide 12-17
Nonlinear Residual Plot
[Figure: residual plot against X with a horizontal reference line at 0]
Slide 12-18
Nonconstant Error Variance
[Figure: two residual plots against X, each with a horizontal reference line at 0]
Slide 12-19
Graphs of Nonindependent
Error Terms
[Figure: two residual plots against X, each with a horizontal reference line at 0]
Slide 12-20
Healthy Residual Plot
[Figure: residual plot against X with a horizontal reference line at 0]
Slide 12-21
Standard Error of the Estimate
Sum of Squares Error:

$$SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$$

Standard Error of the Estimate:

$$S_e = \sqrt{\frac{SSE}{n - 2}}$$
Slide 12-22
Airline Cost Example:
Determining SSE
Number of Passengers (X)    Cost in $1,000 (Y)    Residual (Y - Ŷ)    (Y - Ŷ)²
61                          4.28                  .227                .05153
63                          4.08                  -.054               .00292
67                          4.42                  .123                .01513
69                          4.17                  -.208               .04326
70                          4.48                  .061                .00372
74                          4.30                  -.282               .07952
76                          4.82                  .157                .02465
81                          4.70                  -.167               .02789
86                          5.11                  .040                .00160
91                          5.13                  -.144               .02074
95                          5.64                  .204                .04162
97                          5.56                  .042                .00176

Σ(Y - Ŷ) = -.001            Σ(Y - Ŷ)² = .31434

Sum of squares of error = SSE = .31434
Slide 12-23
Airline Cost Example:
Standard Error of the Estimate
Sum of Squares Error:

$$SSE = \sum (Y - \hat{Y})^2 = 0.31434$$

Standard Error of the Estimate:

$$S_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.31434}{10}} = 0.1773$$
Slide 12-24
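As a cross-check on the SSE and standard error of the estimate, the sketch below recomputes both from the raw data. It assumes unrounded coefficients and residuals, so the SSE it prints differs slightly from the slide's .31434, which was accumulated from residuals rounded to three decimals. Variable names are illustrative.

```python
import math

# Airline cost data from the slides.
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]

n = len(passengers)
x_bar = sum(passengers) / n
y_bar = sum(cost) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(passengers, cost))
      / sum((x - x_bar) ** 2 for x in passengers))
b0 = y_bar - b1 * x_bar

# SSE: sum of squared residuals around the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(passengers, cost))

# Standard error of the estimate, with n - 2 degrees of freedom.
s_e = math.sqrt(sse / (n - 2))

print(f"SSE = {sse:.5f}")
print(f"Se  = {s_e:.4f}")
```

Because Y is measured in $1,000, a standard error near 0.177 corresponds to a typical prediction error of roughly $177.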
Coefficient of Determination
• The proportion of variability of the
dependent variable accounted for
or explained by the independent
variable in a regression model
Slide 12-25
Coefficient of Determination
$$SS_{YY} = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$$

$$SS_{YY} = \text{explained variation} + \text{unexplained variation}$$

$$SS_{YY} = SSR + SSE$$

$$1 = \frac{SSR}{SS_{YY}} + \frac{SSE}{SS_{YY}}$$

$$r^2 = \frac{SSR}{SS_{YY}} = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{SSE}{\sum Y^2 - \dfrac{(\sum Y)^2}{n}}$$

$$0 \le r^2 \le 1$$
Slide 12-26
Airline Cost Example:
Coefficient of Determination
$$SSE = 0.31434$$

$$SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 270.9251 - \frac{(56.69)^2}{12} = 3.11209$$

$$r^2 = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{.31434}{3.11209} = .899$$

89.9% of the variability of the cost of flying a Boeing 737 is accounted for by the number of passengers.
Slide 12-27
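The coefficient of determination can be verified from the data in the same way. This sketch assumes full-precision arithmetic and illustrative variable names; it also prints SSR so the decomposition SSYY = SSR + SSE can be inspected.

```python
# Airline cost data from the slides.
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]

n = len(passengers)
x_bar = sum(passengers) / n
y_bar = sum(cost) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(passengers, cost))
      / sum((x - x_bar) ** 2 for x in passengers))
b0 = y_bar - b1 * x_bar

# Total variation in Y, computed with the shortcut formula.
ss_yy = sum(y * y for y in cost) - sum(cost) ** 2 / n

# Unexplained variation (SSE) and explained variation (SSR).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(passengers, cost))
ssr = ss_yy - sse

r_squared = 1 - sse / ss_yy
print(f"SSYY = {ss_yy:.5f}, SSR = {ssr:.5f}, SSE = {sse:.5f}")
print(f"r^2  = {r_squared:.3f}")
```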
Hypothesis Tests for the Slope
of the Regression Model
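Hypotheses:

$$H_0\colon \beta_1 = 0 \qquad H_1\colon \beta_1 \ne 0$$
$$H_0\colon \beta_1 \le 0 \qquad H_1\colon \beta_1 > 0$$
$$H_0\colon \beta_1 \ge 0 \qquad H_1\colon \beta_1 < 0$$

Test statistic:

$$t = \frac{b_1 - \beta_1}{S_b}$$

where:

$$S_b = \frac{S_e}{\sqrt{SS_{XX}}}$$

$$S_e = \sqrt{\frac{SSE}{n - 2}}$$

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$

$\beta_1$ = the hypothesized slope
$df = n - 2$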
Slide 12-28
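To illustrate the test statistic, the sketch below applies the two-tailed test of H0: β1 = 0 to the airline cost data. The critical value 2.228 (α = .05, df = 10) is the table value that appears in the confidence interval slide of this chapter; the choice of α and all variable names are assumptions made for the example.

```python
import math

# Airline cost data from the slides.
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]

n = len(passengers)
x_bar = sum(passengers) / n
y_bar = sum(cost) / n
ss_xx = sum((x - x_bar) ** 2 for x in passengers)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(passengers, cost))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(passengers, cost))
s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b = s_e / math.sqrt(ss_xx)     # standard error of the slope

hypothesized_slope = 0.0         # beta_1 under H0
t_stat = (b1 - hypothesized_slope) / s_b

T_CRIT = 2.228                   # assumed: two-tailed, alpha = .05, df = n - 2 = 10
reject_h0 = abs(t_stat) > T_CRIT

print(f"Sb = {s_b:.5f}, t = {t_stat:.2f}, df = {n - 2}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

A t value far beyond the critical value indicates that the slope differs significantly from zero, which is consistent with the high coefficient of determination for this data set.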
Airline Cost Example: Point Estimation

$$\hat{Y} = 1.57 + 0.0407X$$

For $X = 73$:

$$\hat{Y} = 1.57 + 0.0407(73) = 4.5411 \text{ or } \$4{,}541.10$$
Slide 12-29
Airline Cost Example: Confidence Interval to Estimate the Conditional Mean of Y

$$\hat{Y} \pm t_{\alpha/2,\, n-2}\; S_e \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$

where: $X_0$ = a particular value of X

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$

For $X_0 = 73$ and a 95% confidence level:

$$4.5411 \pm (2.228)(0.1773)\sqrt{\frac{1}{12} + \frac{(73 - 77.5)^2}{73{,}764 - \dfrac{(930)^2}{12}}} = 4.5411 \pm .1220$$

$$4.4191 \le E(Y_{73}) \le 4.6631$$
Slide 12-30
Airline Cost Example: Confidence
Interval to Estimate the Average Value
of Y for some Values of X
X     Ŷ ± margin          Confidence Interval
62    4.0934 ± .1876      3.9058 to 4.2810
68    4.3376 ± .1461      4.1915 to 4.4837
73    4.5411 ± .1220      4.4191 to 4.6631
85    5.0295 ± .1349      4.8946 to 5.1644
90    5.2330 ± .1656      5.0674 to 5.3986
Slide 12-31
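The interval formula for the conditional mean of Y can be wrapped in a small function and evaluated at several values of X. This is a sketch: the function name `mean_confidence_interval` is hypothetical, the critical value 2.228 is the table value used in the worked example, and the coefficients are kept unrounded, so the last printed digit may differ from the slide values.

```python
import math

# Airline cost data from the slides.
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
T_CRIT = 2.228  # t(.025, df = 10), as used in the worked example

n = len(passengers)
x_bar = sum(passengers) / n
y_bar = sum(cost) / n
ss_xx = sum((x - x_bar) ** 2 for x in passengers)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(passengers, cost))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(passengers, cost))
s_e = math.sqrt(sse / (n - 2))


def mean_confidence_interval(x0):
    """Return (y_hat, half_width) for the conditional mean of Y at x0."""
    y_hat = b0 + b1 * x0
    half_width = T_CRIT * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat, half_width


for x0 in (62, 68, 73, 85, 90):
    y_hat, hw = mean_confidence_interval(x0)
    print(f"X0 = {x0}: {y_hat:.4f} ± {hw:.4f}  ({y_hat - hw:.4f} to {y_hat + hw:.4f})")
```

The half-width is smallest near the mean of X (77.5) and grows as X0 moves away from it, which is why the intervals at 62 and 90 are wider than the one at 73.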
Prediction Interval to Estimate Y for a Given Value of X

$$\hat{Y} \pm t_{\alpha/2,\, n-2}\; S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$

where: $X_0$ = a particular value of X

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$
Slide 12-32
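A prediction interval for a single Y differs from the confidence interval for the mean only by the extra 1 under the square root. The sketch below computes both half-widths at X0 = 73 for comparison; the slides do not work this example, so the printed numbers are the output of this code rather than values from the source. Function and variable names are illustrative, and 2.228 is the table value for df = 10 at 95% confidence.

```python
import math

# Airline cost data from the slides.
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
T_CRIT = 2.228  # t(.025, df = 10)

n = len(passengers)
x_bar = sum(passengers) / n
y_bar = sum(cost) / n
ss_xx = sum((x - x_bar) ** 2 for x in passengers)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(passengers, cost))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(passengers, cost))
s_e = math.sqrt(sse / (n - 2))


def interval_half_widths(x0):
    """Return (ci_half_width, pi_half_width) at x0."""
    leverage = 1 / n + (x0 - x_bar) ** 2 / ss_xx
    ci = T_CRIT * s_e * math.sqrt(leverage)        # mean of Y at x0
    pi = T_CRIT * s_e * math.sqrt(1 + leverage)    # single value of Y at x0
    return ci, pi


x0 = 73
y_hat = b0 + b1 * x0
ci_hw, pi_hw = interval_half_widths(x0)
print(f"95% CI for E(Y | X = {x0}): {y_hat - ci_hw:.4f} to {y_hat + ci_hw:.4f}")
print(f"95% PI for Y at X = {x0}:   {y_hat - pi_hw:.4f} to {y_hat + pi_hw:.4f}")
```

The prediction interval is always wider, because it accounts for the scatter of an individual observation around the regression line in addition to the uncertainty in the line itself.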
Confidence Intervals for Estimation
[Figure: regression plot of Cost (4 to 6) against Number of Passengers (60 to 100), showing the regression line with 95% CI and 95% PI bands]
Slide 12-33
Pearson Product-Moment
Correlation Coefficient
A correlation measure used to determine the degree of relatedness of two variables that are at least of interval level.

$$r = \frac{SS_{XY}}{\sqrt{(SS_{XX})(SS_{YY})}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}$$

$$-1 \le r \le 1$$
Slide 12-34
Three Degrees of Correlation
[Figure: three scatter plots illustrating r < 0, r > 0, and r = 0]
Slide 12-35
Economics Example:
Computation of r (Part 1)
Day           Interest (X)    Futures Index (Y)    X²         Y²         XY
1             7.43            221                  55.205     48,841     1,642.03
2             7.48            222                  55.950     49,284     1,660.56
3             8.00            226                  64.000     51,076     1,808.00
4             7.75            225                  60.063     50,625     1,743.75
5             7.60            224                  57.760     50,176     1,702.40
6             7.63            223                  58.217     49,729     1,701.49
7             7.68            223                  58.982     49,729     1,712.64
8             7.67            226                  58.829     51,076     1,733.42
9             7.59            226                  57.608     51,076     1,715.34
10            8.07            235                  65.125     55,225     1,896.45
11            8.03            233                  64.481     54,289     1,870.99
12            8.00            241                  64.000     58,081     1,928.00
Summations    92.93           2,725                720.220    619,207    21,115.07
Slide 12-36
Economics Example:
Computation of r (Part 2)
$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}} = \frac{21{,}115.07 - \dfrac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \dfrac{(92.93)^2}{12}\right]\left[619{,}207 - \dfrac{(2725)^2}{12}\right]}} = .815$$
Slide 12-37
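The computation of r for the economics example can be checked with the computational formula. This is a minimal sketch; the function name `pearson_r` is an illustrative assumption, and the data are the twelve daily observations from the table.

```python
import math

# Economics example from the slides: interest rate (X) and futures index (Y).
interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]


def pearson_r(x, y):
    """Pearson product-moment correlation via the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = pearson_r(interest, futures)
print(f"r = {r:.3f}")
```

Note that r is symmetric in its arguments, so `pearson_r(futures, interest)` returns the same value; unlike regression, correlation does not designate a dependent variable.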
Economics Example:
Scatter Plot and Correlation Matrix
[Figure: scatter plot of Futures Index (220 to 245) against Interest (7.40 to 8.20)]

Correlation matrix:

                 Interest    Futures Index
Interest         1
Futures Index    0.815254    1
Slide 12-38