Descriptive measures of the degree of linear association

R-squared and correlation

[Figure: regression plot of y versus x with fitted line ŷ. y = 54.4758 - 0.764016 x; S = 7.81137, R-Sq = 6.5 %, R-Sq(adj) = 3.2 %]

$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = 119.1$
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 1708.5$
$SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 1827.6$

[Figure: regression plot of y versus x with fitted line ŷ. y = 75.5458 - 5.76402 x; S = 7.81137, R-Sq = 79.9 %, R-Sq(adj) = 79.2 %]

$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = 6779.3$
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 1708.5$
$SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 8487.8$

Coefficient of determination

$R^2 = r^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$

• R² is a number (a proportion!) between 0 and 1.
• If R² = 1:
  – all data points fall perfectly on the regression line
  – predictor X accounts for all of the variation in Y
• If R² = 0:
  – the fitted regression line is perfectly horizontal
  – predictor X accounts for none of the variation in Y

Interpretations of R²

• R² × 100 percent of the variation in Y is reduced by taking into account predictor X.
• R² × 100 percent of the variation in Y is “explained by” the variation in predictor X.

R-sq on Minitab fitted line plot

[Figure: Minitab fitted line plot of Mortality versus Latitude (at center of state). Mort = 389.189 - 5.97764 Lat; S = 19.1150, R-Sq = 68.0 %, R-Sq(adj) = 67.3 %]

R-sq on Minitab regression output

The regression equation is
Mort = 389.189 - 5.97764 Lat

S = 19.1150   R-Sq = 68.0 %   R-Sq(adj) = 67.3 %

Analysis of Variance

Source        DF        SS        MS         F       P
Regression     1   36464.2   36464.2   99.7968   0.000
Error         47   17173.1     365.4
Total         48   53637.3

Correlation coefficient r

$r = \pm\sqrt{R^2}$

• r is a number between -1 and 1, inclusive.
• Sign of coefficient of correlation:
  – plus sign if slope of fitted regression line is positive
  – negative sign if slope of fitted regression line is negative

Correlation coefficient formulas

$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$

$r = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \times b_1$

Interpretation of correlation coefficient

• No clear-cut operational interpretation as for R-squared value.
• r = -1 is perfect negative linear relationship.
• r = 1 is perfect positive linear relationship.
• r = 0 is no linear relationship.

R² = 100% and r = +1
[Figure: Fahrenheit versus Celsius]

R² = 2.9% and r = 0.17
[Figure: Lengths of left forearms and head circumferences of Spring 1998 Stat 250 Students. x-axis: Head circumference (in cm); n = 89 students]

R² = 70.1% and r = -0.84
[Figure: Annual Wine Consumption versus Death. x-axis: Liters of wine per person per year; labeled points: Norway, Finland, U.S., Italy, France]

R² = 82.8% and r = 0.91
[Figure: Weights of Females. x-axis: Actual weight (lbs); reference line: Actual = Ideal]

R² = 50.4% and r = 0.71
[Figure: Weights of Males. x-axis: Actual weight (lbs); reference line: Actual = Ideal]

R² = 0% and r = 0
[Figure: A Perfect Quadratic Relationship. y versus x]

Cautions about R² and r

• Summary measures of linear association. Possible to get R² = 0 with a perfect curvilinear relationship.
• Large R² does not necessarily imply that estimated regression line fits the data well.
• Both measures can be greatly affected by one (outlying) data point.
• A “statistically significant R²” does not imply that slope is meaningfully different from 0.
• A large R² does not necessarily mean that useful predictions can be made. Can still get wide intervals.
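
The quantities on these slides (SSR, SSE, SSTO, R², and r) can be checked numerically. The following is a minimal sketch in plain Python, not the Minitab procedure itself. The function name `linear_association` and the small data sets are illustrative assumptions; only the Mortality sums of squares are taken from the Minitab output above. It fits a least-squares line, computes the three sums of squares, and obtains r from the slope-based formula.

```python
import math


def linear_association(x, y):
    """Least-squares fit of y on x; returns sums of squares, R^2 and r.

    Hypothetical helper for illustration. Assumes x and y have the same
    length and that neither is constant (otherwise a division by zero occurs).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated slope
    b0 = y_bar - b1 * x_bar             # estimated intercept
    y_hat = [b0 + b1 * xi for xi in x]  # fitted values

    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
    ssto = sum((yi - y_bar) ** 2 for yi in y)              # total sum of squares

    r2 = ssr / ssto
    r = math.sqrt(sxx / ssto) * b1      # takes the sign of the slope
    return {"b0": b0, "b1": b1, "ssr": ssr, "sse": sse,
            "ssto": ssto, "r2": r2, "r": r}


# Exact linear relationship: Fahrenheit as a function of Celsius.
celsius = [0, 25, 50, 75, 100]
fahrenheit = [32 + 1.8 * c for c in celsius]
exact = linear_association(celsius, fahrenheit)

# Perfect quadratic relationship on a symmetric x range: no linear association.
xs = list(range(-5, 6))
quad = linear_association(xs, [v ** 2 for v in xs])

# R^2 from the Minitab ANOVA table: SSR / SSTO; r is negative because the slope is.
r2_mort = 36464.2 / 53637.3
r_mort = -math.sqrt(r2_mort)

print(exact["r2"], exact["r"])
print(quad["r2"], quad["r"])
print(round(r2_mort, 3), round(r_mort, 3))
```

The quadratic case illustrates the first caution: y is determined exactly by x, yet both measures are 0 because the association is not linear.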