Rsquare

Descriptive measures of the degree of linear association
R-squared and correlation
Regression Plot
y = 54.4758 - 0.764016 x
S = 7.81137
R-Sq = 6.5 %
R-Sq(adj) = 3.2 %
[Figure: fitted line plot of y versus x, with the fitted line ŷ and the horizontal line ȳ marked]
$$ SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = 119.1 $$
$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 1708.5 $$
$$ SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 1827.6 $$
Regression Plot
y = 75.5458 - 5.76402 x
S = 7.81137
R-Sq = 79.9 %
R-Sq(adj) = 79.2 %
[Figure: fitted line plot of y versus x, with the fitted line ŷ and the horizontal line ȳ marked]
$$ SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = 6779.3 $$
$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 1708.5 $$
$$ SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 8487.8 $$
Coefficient of determination
$$ R^2 = r^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO} $$
• R² is a number (a proportion!) between 0 and 1.
• If R² = 1:
– all data points fall perfectly on the regression line
– predictor X accounts for all of the variation in Y
• If R² = 0:
– the fitted regression line is perfectly horizontal
– predictor X accounts for none of the variation in Y
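The two boundary cases above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data and the helper names `sums_of_squares` and `r_squared` are hypothetical and are not taken from the slides.

```python
import numpy as np

def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by least squares; return (SSR, SSE, SSTO)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, 1)          # polyfit returns slope, then intercept
    y_hat = b0 + b1 * x                   # fitted values
    y_bar = y.mean()
    ssr = np.sum((y_hat - y_bar) ** 2)    # variation explained by the line
    sse = np.sum((y - y_hat) ** 2)        # variation left around the line
    ssto = np.sum((y - y_bar) ** 2)       # total variation in y
    return ssr, sse, ssto

def r_squared(x, y):
    ssr, _, ssto = sums_of_squares(x, y)
    return ssr / ssto

# Hypothetical data, chosen only to illustrate the two extremes
x = np.array([1, 2, 3, 4, 5])
y_on_line = 3.0 + 2.0 * x                 # every point lies on a line
y_no_trend = np.array([2, 0, 1, 0, 2])    # least-squares slope is zero

print(round(r_squared(x, y_on_line), 6))
print(round(r_squared(x, y_no_trend), 6))
```

Up to floating-point rounding, the first value should be 1 and the second 0. For any data set fitted this way, SSR + SSE should also equal SSTO.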
Interpretations of R²
• R² × 100 percent of the variation in Y is reduced by taking into account predictor X.
• R² × 100 percent of the variation in Y is “explained by” the variation in predictor X.
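As a worked check, the formula R² = 1 - SSE/SSTO can be applied to the two regression plots shown earlier. Both have the same SSE (1708.5) and the same S, but very different SSTO, and that difference alone accounts for the gap in R²:

$$ R^2 = 1 - \frac{1708.5}{1827.6} \approx 0.065 \qquad \text{and} \qquad R^2 = 1 - \frac{1708.5}{8487.8} \approx 0.799 $$

In the first plot, taking x into account reduces the variation in y by only about 6.5 percent; in the second, by about 79.9 percent.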
R-sq on Minitab fitted line plot
Regression Plot
Mort = 389.189 - 5.97764 Lat
S = 19.1150
R-Sq = 68.0 %
R-Sq(adj) = 67.3 %
[Figure: fitted line plot of Mortality versus Latitude (at center of state)]
R-sq on Minitab regression output
The regression equation is
Mort = 389.189 - 5.97764 Lat
S = 19.1150
R-Sq = 68.0 %
R-Sq(adj) = 67.3 %
Analysis of Variance
Source        DF        SS        MS         F       P
Regression     1   36464.2   36464.2   99.7968   0.000
Error         47   17173.1     365.4
Total         48   53637.3
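The summary lines of the Minitab output can be reproduced from the ANOVA table by simple arithmetic. The sketch below uses only the DF and SS values printed above. The adjusted R-squared formula, 1 - MSE / (SSTO / (n - 1)), is the standard definition and is an assumption here, since the slides do not state it.

```python
import math

# DF and SS values copied from the Minitab ANOVA table
df_reg, df_err, df_tot = 1, 47, 48
ss_reg, ss_err, ss_tot = 36464.2, 17173.1, 53637.3

ms_reg = ss_reg / df_reg                      # mean square for regression
ms_err = ss_err / df_err                      # mean square error (MSE)
f_stat = ms_reg / ms_err                      # F statistic
s = math.sqrt(ms_err)                         # S, the residual standard error
r_sq = ss_reg / ss_tot                        # R-Sq = SSR / SSTO
r_sq_adj = 1 - ms_err / (ss_tot / df_tot)     # R-Sq(adj), standard definition (assumed)

print(f"MSE = {ms_err:.1f}, F = {f_stat:.2f}, S = {s:.3f}")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
# MSE = 365.4, F = 99.80, S = 19.115
# R-Sq = 68.0%, R-Sq(adj) = 67.3%
```

These agree with the values Minitab reports (S = 19.1150, R-Sq = 68.0 %, R-Sq(adj) = 67.3 %, F = 99.7968) up to the rounding of the printed sums of squares.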
Correlation coefficient
$$ r = \pm\sqrt{R^2} = \pm\sqrt{r^2} $$
• r is a number between -1 and 1, inclusive.
• Sign of the correlation coefficient:
– positive if the slope of the fitted regression line is positive
– negative if the slope of the fitted regression line is negative
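Correlation coefficient formulas
$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$
$$ r = \left( \frac{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}{\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \right) b_1 $$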
Interpretation of correlation coefficient
• Unlike R², r has no clear-cut operational interpretation.
• r = -1 is a perfect negative linear relationship.
• r = 1 is a perfect positive linear relationship.
• r = 0 is no linear relationship.
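The different routes to r described above (the product-moment formula, the rescaled slope b1, and the signed square root of R²) should all give the same number. Here is a minimal sketch that checks this, assuming NumPy; the data and the function name `correlation_three_ways` are hypothetical.

```python
import numpy as np

def correlation_three_ways(x, y):
    """Return r from the definition, from the slope b1, and from R-squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    sxx, syy, sxy = np.sum(dx ** 2), np.sum(dy ** 2), np.sum(dx * dy)

    r_def = sxy / np.sqrt(sxx * syy)            # product-moment formula
    b1 = sxy / sxx                              # least-squares slope
    r_slope = np.sqrt(sxx) / np.sqrt(syy) * b1  # slope rescaled by the spreads
    r_sq = b1 ** 2 * sxx / syy                  # SSR / SSTO, since SSR = b1^2 * Sxx
    r_sign = np.sign(b1) * np.sqrt(r_sq)        # sign of slope times sqrt(R^2)
    return r_def, r_slope, r_sign

# Hypothetical data with a downward trend
x = [1, 2, 3, 4, 5, 6]
y = [9.1, 7.8, 8.2, 5.9, 5.1, 3.6]

r_def, r_slope, r_sign = correlation_three_ways(x, y)
print(r_def, r_slope, r_sign)
print(np.corrcoef(x, y)[0, 1])                  # NumPy's built-in value, for comparison
```

All four printed values should agree up to floating-point rounding, and they are negative here because the fitted slope is negative.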
R² = 100% and r = +1
[Figure: scatterplot of Fahrenheit versus Celsius]
R² = 2.9% and r = 0.17
Lengths of left forearms and head circumferences of Spring 1998 Stat 250 Students
[Figure: scatterplot; horizontal axis: Head circumference (in cm); n = 89 students]
R² = 70.1% and r = -0.84
Annual Wine Consumption versus Death
[Figure: scatterplot; horizontal axis: Liters of wine per person per year; labeled points: Norway, Finland, U.S., Italy, France]
R² = 82.8% and r = 0.91
Weights of Females
[Figure: scatterplot with reference line Actual = Ideal; horizontal axis: Actual weight (lbs)]
R² = 50.4% and r = 0.71
Weights of Males
[Figure: scatterplot with reference line Actual = Ideal; horizontal axis: Actual weight (lbs)]
R² = 0% and r = 0
A Perfect Quadratic Relationship
[Figure: plot of y versus x]
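A quick numerical illustration of this case, assuming NumPy. The exact function behind the slide's plot is not given, so y = x² on x-values symmetric about zero is used here as a hypothetical stand-in.

```python
import numpy as np

# Hypothetical stand-in for the slide's plot: y is an exact function of x,
# but the relationship is quadratic rather than linear.
x = np.arange(-5, 6)          # -5, -4, ..., 5 (symmetric about zero)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]   # sample correlation coefficient
print(r, r ** 2)              # both are zero up to floating-point rounding
```

Although y is perfectly determined by x, the positive and negative halves cancel in the cross-product sum, so the fitted line is horizontal and both r and R² are zero.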
Cautions about R² and r
• Both are summary measures of linear association. It is possible to get R² = 0 with a perfect curvilinear relationship.
• A large R² does not necessarily imply that the estimated regression line fits the data well.
• Both measures can be greatly affected by one (outlying) data point.
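The effect of a single outlying point can be seen in a small experiment. This is a sketch on hypothetical data (not from the slides), assuming NumPy: ten points with almost no linear trend, then the same ten points plus one point far from the rest.

```python
import numpy as np

# Hypothetical data: ten points with essentially no linear trend
x = np.arange(1, 11)
y = np.array([5, 3, 6, 4, 5, 6, 3, 5, 4, 6])
r_before = np.corrcoef(x, y)[0, 1]

# Add a single point far from the rest of the data
x_out = np.append(x, 30)
y_out = np.append(y, 30)
r_after = np.corrcoef(x_out, y_out)[0, 1]

print(f"without outlier: r = {r_before:.2f}, R-sq = {r_before ** 2:.1%}")
print(f"with outlier:    r = {r_after:.2f}, R-sq = {r_after ** 2:.1%}")
```

In this example r moves from roughly 0.14 to roughly 0.93 when the extra point is added, even though the other ten observations are unchanged.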
Cautions about R² and r (continued)
• A “statistically significant R²” does not imply that the slope is meaningfully different from 0.
• A large R² does not necessarily mean that useful predictions can be made. Prediction intervals can still be wide.