The Coefficient of Determination
Lecture 46
Section 13.9
Robb T. Koether
Hampden-Sydney College
Tue, Apr 13, 2010
Robb T. Koether (Hampden-Sydney College)
The Coefficient of Determination
Tue, Apr 13, 2010
1 / 48
Outline
1. The Regression Identity
2. Sums of Squares on the TI-83
3. Explaining Variation
4. TI-83 - The Coefficient of Determination
5. Assignment
Explaining the Variation in y
Statisticians use regression models to “explain” y.
More specifically, through the model they use variation in x to
explain variation in y.
For example, why do some people weigh more than other people?
One explanation is that some people weigh more than others
because they are taller.
That is, there is variation in weight because there is variation in
height and because weight and height are correlated.
But that is only a partial explanation.
Statisticians want to quantify how much of the variation in y is
explained by the variation in x.
The Regression Identity
As always, variation is measured by calculating a sum of squared
deviations.
There are three different deviations that we can measure.
  Deviations of y from ȳ (variation in the data).
  Deviations of ŷ from ȳ (variation in the model).
  Deviations of y from ŷ (difference between the data and the model).
Variation in the data (total sum of squares):
$SST = \sum (y - \bar{y})^2.$
Variation in the model (regression sum of squares):
$SSR = \sum (\hat{y} - \bar{y})^2.$
Residuals (sum of squared errors):
$SSE = \sum (y - \hat{y})^2.$
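As a minimal sketch of these three definitions, the Python functions below compute each sum of squares from plain lists. The function names and the four-point toy data set are illustrative assumptions and do not come from the lecture; the toy predictions are the least-squares values from ŷ = 1.3 + 0.8x for x = 0, 1, 2, 3.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def sst(y):
    """Total sum of squares: squared deviations of y from its mean."""
    y_bar = mean(y)
    return sum((yi - y_bar) ** 2 for yi in y)

def ssr(y, y_hat):
    """Regression sum of squares: squared deviations of y_hat from the mean of y."""
    y_bar = mean(y)
    return sum((fi - y_bar) ** 2 for fi in y_hat)

def sse(y, y_hat):
    """Sum of squared errors: squared deviations of y from y_hat."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

# Hypothetical toy data (not from the lecture).
y_toy = [1.0, 3.0, 2.0, 4.0]
y_hat_toy = [1.3, 2.1, 2.9, 3.7]

print(sst(y_toy), ssr(y_toy, y_hat_toy), sse(y_toy, y_hat_toy))
```

Up to floating-point rounding, the three printed values are 5, 3.2, and 1.8.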
Example - SST, SSR, and SSE
The following data represent the heights and weights of 10 adult
males.
Height (x)   Weight (y)
70           185
65           140
71           180
76           220
68           150
67           170
68           185
72           200
74           210
69           160
The regression line is
ŷ = −310 + 7x.
The model predicts, for example, that if a person is 70 inches tall,
he will weigh 180 pounds.
The model also predicts that a person will weigh an additional 7
pounds for each additional inch of height.
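The line can be reproduced from the ten (height, weight) pairs with the usual least-squares formulas. The sketch below is one way to do that in Python; the helper name `least_squares_line` is an assumption made for illustration, and the TI-83's LinReg(a+bx) command plays the same role on the calculator.

```python
heights = [70, 65, 71, 76, 68, 67, 68, 72, 74, 69]            # x, inches
weights = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]  # y, pounds

def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y_hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

a, b = least_squares_line(heights, weights)
print(a, b)        # -310.0 7.0
print(a + b * 70)  # 180.0, the predicted weight at 70 inches
```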
Compute the predicted weight: Y1(L1) → L3.
Height (x)   Weight (y)   Pred. Wgt. (ŷ)
70           185          180
65           140          145
71           180          187
76           220          222
68           150          166
67           170          159
68           185          166
72           200          194
74           210          208
69           160          173
[Figure: The regression line]
[Figure: The deviations of y from ȳ]
[Figure: The deviations of ŷ from ȳ]
[Figure: The deviations of y from ŷ]
Example
Compute SST on the TI-83 in three steps: L2 − ȳ, then Ans², then sum(Ans).
x     y      y − ȳ    (y − ȳ)²
70    185      5         25
65    140    −40       1600
71    180      0          0
76    220     40       1600
68    150    −30        900
67    170    −10        100
68    185      5         25
72    200     20        400
74    210     30        900
69    160    −20        400
             Total     5950
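The three calculator steps map directly onto list operations. The following Python sketch mirrors them for the weight data; the variable names are illustrative choices, not anything defined in the lecture.

```python
weights = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]  # L2 on the calculator

y_bar = sum(weights) / len(weights)        # mean weight, 180.0
deviations = [y - y_bar for y in weights]  # step 1: L2 - y_bar
squares = [d ** 2 for d in deviations]     # step 2: Ans squared
sst = sum(squares)                         # step 3: sum(Ans)

print(sst)  # 5950.0
```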
Example
Compute SSR on the TI-83: Y1(L1) → L3, then L3 − ȳ, then Ans², then sum(Ans).
x     y      ŷ      ŷ − ȳ    (ŷ − ȳ)²
70    185    180      0          0
65    140    145    −35       1225
71    180    187      7         49
76    220    222     42       1764
68    150    166    −14        196
67    170    159    −21        441
68    185    166    −14        196
72    200    194     14        196
74    210    208     28        784
69    160    173     −7         49
                    Total     4900
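A quick cross-check of this total, offered as a sketch rather than as part of the lecture: the least-squares line passes through (x̄, ȳ), so each ŷ − ȳ equals b(x − x̄), where b = 7 is the slope. For these heights x̄ = 70 and the squared deviations of x sum to 100.

```latex
\begin{align*}
SSR &= \sum (\hat{y} - \bar{y})^2
     = \sum \bigl(b\,(x - \bar{x})\bigr)^2
     = b^2 \sum (x - \bar{x})^2 \\
    &= 7^2 \cdot 100 = 4900.
\end{align*}
```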
Example
Compute SSE on the TI-83: Y1(L1) → L3, then L2 − L3 → L4, then Ans², then sum(Ans).
x     y      ŷ      y − ŷ    (y − ŷ)²
70    185    180      5         25
65    140    145     −5         25
71    180    187     −7         49
76    220    222     −2          4
68    150    166    −16        256
67    170    159     11        121
68    185    166     19        361
72    200    194      6         36
74    210    208      2          4
69    160    173    −13        169
                    Total     1050
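Residual columns are easy to mistype, so it helps to recompute them from the y and ŷ columns. The Python sketch below does that and also checks that the residuals sum to zero, which must hold for a least-squares line with an intercept. The variable names are illustrative.

```python
weights   = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]  # y, list L2
predicted = [180, 145, 187, 222, 166, 159, 166, 194, 208, 173]  # y_hat, list L3

residuals = [y - y_hat for y, y_hat in zip(weights, predicted)]  # L2 - L3
sse = sum(e ** 2 for e in residuals)

print(residuals)       # [5, -5, -7, -2, -16, 11, 19, 6, 2, -13]
print(sum(residuals))  # 0
print(sse)             # 1050
```

The residual for the point (74, 210) is 210 − 208 = 2, and only with that value do the squared residuals add up to 1050.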
Example
We have now found that
SSR = 4900.
SSE = 1050.
SST = 5950.
We see that
SSR + SSE = SST.
This is called the regression identity.
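The identity is not a coincidence of this data set. A sketch of why it holds for any least-squares line with an intercept: split each deviation y − ȳ into (ŷ − ȳ) + (y − ŷ) and square.

```latex
\begin{align*}
\sum (y - \bar{y})^2
  &= \sum \bigl[(\hat{y} - \bar{y}) + (y - \hat{y})\bigr]^2 \\
  &= \sum (\hat{y} - \bar{y})^2 + \sum (y - \hat{y})^2
     + 2 \sum (\hat{y} - \bar{y})(y - \hat{y}).
\end{align*}
```

The cross term is zero because the least-squares residuals y − ŷ sum to zero and are uncorrelated with x, which leaves SST = SSR + SSE. In the example, 4900 + 1050 = 5950.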
TI-83 - Finding SSR, SSE, and SST
TI-83 SSR, SSE, and SST
Put the x values into L1 and the y values into L2.
Use LinReg(a+bx) L1,L2,Y1.
Enter Y1(L1)→L3.
To get SSR, evaluate sum((L3-ȳ)²).
To get SSE, evaluate sum((L2-L3)²).
To get SST, evaluate sum((L2-ȳ)²).
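For readers without a calculator at hand, the same procedure can be sketched in Python. The function name `sums_of_squares` is a hypothetical helper, and the lists are named L1 and L2 only to echo the calculator steps above.

```python
def sums_of_squares(x, y):
    """Return (SSR, SSE, SST) for the least-squares line fitted to x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Fit y_hat = a + b*x, the job LinReg(a+bx) does on the calculator.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Predicted values, the analogue of Y1(L1) -> L3.
    y_hat = [a + b * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return ssr, sse, sst

L1 = [70, 65, 71, 76, 68, 67, 68, 72, 74, 69]            # heights
L2 = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]  # weights
print(sums_of_squares(L1, L2))  # (4900.0, 1050.0, 5950.0)
```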
Explaining Variation
One goal of regression is to “explain” the variation in y .
For example, if y were weight, how would we explain the variation
in weight?
That is, why do some people weigh more than others?
A partial answer is that some people weigh more because they
are taller.
That is, an explanatory variable is height x.
What are some other partial answers?
How much of the variation in weight is explained by variation in
height?
The total variation in weight is SST.
The linear model (the regression line) explains some of the
variation.
The model predicts the variation SSR.
The remainder is SSE, the variation not predicted by the model.
Statisticians consider the predicted variation SSR to be the
amount of variation in y that is explained by the model.
The residual variation SSE is the remaining variation in y that is
not explained by the model.
It all checks out because SST = SSR + SSE.
Variation Explained by the Model
[Figure: The regression line]
[Figure: The total variation in y (SST)]
[Figure: The variation in y that is explained by the model (SSR)]
[Figure: The variation in y that is unexplained by the model (SSE)]
Explaining Variation
It can be shown that
$r^2 = \dfrac{SSR}{SST}$
and, therefore,
$1 - r^2 = \dfrac{SSE}{SST}.$
Therefore, r² is the proportion of variation in y that is explained by
the model. It is called the coefficient of determination.
1 − r² is the proportion that is not explained by the model.
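For the height and weight example, the claim can be checked numerically. The Python sketch below computes the correlation coefficient r from its usual formula and compares r² with SSR/SST, using the sums of squares found earlier (4900 and 5950). The variable names are illustrative.

```python
import math

heights = [70, 65, 71, 76, 68, 67, 68, 72, 74, 69]
weights = [185, 140, 180, 220, 150, 170, 185, 200, 210, 160]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_yy = sum((y - y_bar) ** 2 for y in weights)  # same quantity as SST
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
r_squared_from_sums = 4900 / 5950  # SSR / SST from the example

print(round(r, 4))                    # 0.9075
print(round(r ** 2, 4))               # 0.8235
print(round(r_squared_from_sums, 4))  # 0.8235
```

So about 82% of the variation in weight in this sample is explained by the linear model in height, and about 18% is not.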
TI-83 - Coefficient of Determination
TI-83 Coefficient of Determination
To calculate r² on the TI-83, follow the procedure that produces
the regression line and r.
In the same window, the TI-83 reports the value of r².
TI-83 - Finding SSR, SSE, and SST
Practice
The data on the next slide represent crude oil prices [a] (x) vs.
gasoline prices [b] (y).
Draw the scatter plot.
Find the equation of the regression line.
Perform the residual analysis.
Find the correlation coefficient.
Find the coefficient of determination.
Compute SST, SSR, and SSE.
[a] http://tonto.eia.doe.gov/dnav/pet/xls/PET_PRI_WCO_K_W.xls
[b] http://tonto.eia.doe.gov/oog/ftparea/wogirs/xls/pswrgvwrec.xls
Date     Crude Oil (x)   Date     Gasoline (y)
Jan 16   40.98           Jan 19   1.833
Jan 23   41.05           Jan 26   1.833
Jan 30   42.07           Feb 2    1.894
Feb 6    41.77           Feb 9    1.926
Feb 13   43.04           Feb 16   1.970
Feb 20   39.87           Feb 23   1.924
Feb 27   40.22           Mar 2    1.942
Mar 6    42.85           Mar 9    1.936
Mar 13   42.91           Mar 16   1.921
Mar 20   44.90           Mar 23   1.950
Mar 27   50.10           Mar 30   2.048
Apr 3    48.09           Apr 9    2.044
Find SST, SSR, and SSE.
Find r² and interpret the value.
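One way to check calculator work on this practice problem is the Python sketch below. It uses shortcut formulas rather than the list-by-list steps: for a least-squares line, SSR equals S_xy² / S_xx, and SSE then follows from the regression identity. The variable names are illustrative assumptions.

```python
crude = [40.98, 41.05, 42.07, 41.77, 43.04, 39.87,
         40.22, 42.85, 42.91, 44.90, 50.10, 48.09]     # x
gasoline = [1.833, 1.833, 1.894, 1.926, 1.970, 1.924,
            1.942, 1.936, 1.921, 1.950, 2.048, 2.044]  # y

n = len(crude)
x_bar = sum(crude) / n
y_bar = sum(gasoline) / n
s_xx = sum((x - x_bar) ** 2 for x in crude)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(crude, gasoline))

sst = sum((y - y_bar) ** 2 for y in gasoline)
ssr = s_xy ** 2 / s_xx  # equals the sum of (y_hat - y_bar)^2 for the fitted line
sse = sst - ssr         # regression identity: SST = SSR + SSE
r_squared = ssr / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, r^2 = {r_squared:.4f}")
```

The printed values should agree with a TI-83 run up to rounding in the last displayed digit.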
Assignment
Homework
Read Section 13.9, pages 868 - 869.
Work the practice problem on the previous slide.
Answers to Even-Numbered Exercises
SST = 0.0490, SSR = 0.0321, SSE = 0.0169.
r² = 0.6544. About 65.44% of the variation in gasoline prices is explained by
variation in crude oil prices.