“Quiz 13” Practice Question on Linear Regression

Practice Problem (from p. 593, #8). Let $x$ be per capita income in thousands of dollars, and let $y$ be the death rate per 1000 residents. Six small cities in Oregon (Albany, Bend, Corvallis, Grants Pass, Klamath Falls, and Roseburg) gave the following $x$ and $y$ values.

    x:   8.6   9.3   10.1    8.0   8.3   8.7
    y:   8.4   7.6    5.4   10.6   8.3   9.3

For this data: $\sum x = 53$, $\sum x^2 = 471.04$, $\sum y = 49.6$, $\sum y^2 = 425.22$, $\sum xy = 432.06$.

Find: (a) the line of best fit; (b) the correlation coefficient; (c) the coefficient of determination, and explain what it means.

Answer.

(a) First we compute
\[ SS_x = \sum x^2 - \frac{(\sum x)^2}{n} = 471.04 - \frac{53^2}{6} = 2.8733333 \]
and
\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 432.06 - \frac{(53)(49.6)}{6} = -6.0733333. \]
Therefore, the slope is
\[ b = \frac{SS_{xy}}{SS_x} = \frac{-6.0733333}{2.8733333} = -2.1137 \]
and the $y$-intercept is
\[ a = \bar{y} - b\bar{x} = \frac{49.6}{6} - (-2.1137)\,\frac{53}{6} = 26.938. \]
Therefore, the equation of the least squares line is $y = -2.1137x + 26.938$.

(b) Most of the information we need for computing
\[ r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} \]
was found in (a). We additionally compute
\[ SS_y = \sum y^2 - \frac{(\sum y)^2}{n} = 425.22 - \frac{49.6^2}{6} = 15.1933333. \]
Therefore,
\[ r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} = \frac{-6.0733333}{\sqrt{(2.8733333)(15.1933333)}} = -0.9191948. \]
Because $r$ is reasonably close to $-1$, this indicates a strong negative linear correlation: cities with higher per capita income tend to have lower death rates.

(c) The coefficient of determination is $r^2 = (-0.9191948)^2 = 0.8449$. Therefore, approximately 84.5% of the variation in $y$ is explained by the regression line, while about 15.5% is unexplained.
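
The arithmetic in parts (a) through (c) can be checked numerically. The following is a minimal Python sketch, not part of the original worksheet; the function name `least_squares_summary` and the dictionary keys are illustrative assumptions. It uses the same sum-of-squares formulas as the worked answer.

```python
import math

def least_squares_summary(x, y):
    """Return slope, intercept, r, and r^2 using the SS formulas from the answer.

    The function name and dictionary keys are hypothetical, chosen for this sketch.
    """
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    # Sums of squares, as defined in parts (a) and (b).
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n

    b = ss_xy / ss_x                    # slope
    a = sum_y / n - b * (sum_x / n)     # intercept: y-bar minus b times x-bar
    r = ss_xy / math.sqrt(ss_x * ss_y)  # correlation coefficient
    return {"slope": b, "intercept": a, "r": r, "r2": r ** 2}

# Data from the problem: per capita income (x) and death rate (y).
x = [8.6, 9.3, 10.1, 8.0, 8.3, 8.7]
y = [8.4, 7.6, 5.4, 10.6, 8.3, 9.3]

result = least_squares_summary(x, y)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The printed values should agree with the worked answer up to rounding. Small differences in the last digit of the intercept can appear because the worksheet rounds the slope to four decimal places before computing the intercept, whereas the sketch carries full precision.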