Solutions - Simple Regression

12.1 Car mileage and weight:
a) The response variable is mileage, and the explanatory variable is weight.
b) ŷ = 45.6 - 0.0052x; the y-intercept is 45.6 and the slope is -0.0052.
c) For each 1000-pound increase in vehicle weight, the predicted mileage decreases by 5.2 miles per gallon.
d) The y-intercept is the predicted miles per gallon for a car that weighs 0 pounds. This is far outside the range of the car weights in this database and, therefore, does not have contextual meaning for these data.

12.3 Children of working females:
a) ŷ = 5.2 - 0.044(28) = 3.968 (rounds to 4.0)
b) ŷ = 5.2 - 0.044(91) = 1.196 (rounds to 1.2). Residual: y - ŷ = 2.3 - 1.196 = 1.104 (rounds to 1.1)
c)
d) The y-intercept indicates that for nations with no female economic activity, the predicted fertility rate is 5.2. As x increases from 0 to 100, the predicted fertility rate decreases from 5.2 to 0.8.

12.11 Dollars and thousands of dollars:
Slope when income is in dollars: 1.50/1000 = 0.0015

12.13 When can you compare slopes?:
a) For a $1000 increase in GDP, the predicted percentage using cell phones increases by 2.62, and the predicted percentage using the Internet increases by 1.55.
b) Because the slope for predicting cell phone use from GDP is larger than the slope for predicting Internet use from GDP, an increase in GDP would have a slightly greater impact on the percentage using cell phones than on the percentage using the Internet.

12.16 Weight, height, and fat:
a) (i) Percentage of body fat and body mass index have the strongest association. (ii) Height and body mass index have the weakest association.
b) There is a fairly strong, positive association between height and weight. As one goes up, the other tends to go up.
c) r^2 = (0.553)(0.553) = 0.306 (rounds to 0.31). r^2 summarizes the reduction in the sum of squared errors in predicting y using the regression line instead of using the mean of y. In this case, the sum of squared errors is 31% less when we use the regression equation.
d) None of these results would differ if height and weight were instead measured in metric units.

12.18 Verbal and Math SAT:
a) ŷ = 250 + 0.5(500) = 500. Generally, at the x-value equal to its mean, the predicted value of y is equal to its mean.
b) We can find the correlation as follows: r = b(s_x/s_y) = 0.5(100/100) = 0.5. When the x and y variables have the same spread, the correlation equals the slope.
c) r^2 = (0.5)(0.5) = 0.25. The sum of squared errors is 25% less when we use the regression equation instead of the mean of y.

12.19 SAT regression toward mean:
a) ŷ = 250 + 0.5(800) = 650
b) The predicted y value will be 0.5 standard deviations above the mean for every one standard deviation that x is above its mean. Here, x = 800 is three standard deviations above the mean, so the predicted y value is 0.5(3) = 1.5 standard deviations above the mean.

12.20 GPAs and TV watching:
a) The correlation of -0.353 (rounds to -0.35) indicates that there is a negative relation between the two variables. The more one watches television, the lower his or her college GPA tends to be. The proportional reduction in error of 0.125 (rounds to 0.13) indicates that the sum of squared errors is 13% less when we use the regression equation instead of the mean of y.
b) We would expect that student to be (2)(0.505) = 1.01 standard deviations above the mean on high school GPA. With regression to the mean, the predicted y is relatively closer to its mean than x is to its mean.

12.32 t-score?:
a) df = n - 2 = 25 - 2 = 23
b) -2.069 (rounds to -2.07) and 2.069 (rounds to 2.07)
c) We’d use 2.07.

12.36 More boys are bad?:
a) The negative slope indicates a negative association between life length and number of sons, so having more sons is associated with a shorter life.
b) i) Assumptions: Assume randomization, a linear trend with a normal conditional distribution for y, and the same standard deviation at different values of x.
ii) Hypotheses: The null hypothesis that the variables are independent is H0: β = 0.
The two-sided alternative hypothesis of dependence is Ha: β ≠ 0.
iii) Test statistic: t = b/se = -0.65/0.29 = -2.241.
iv) P-value: The P-value is 0.026.
v) Conclusion: If H0 were true that the population slope β = 0, it would be unusual to get a sample slope at least as far from 0 as b = -0.65. In fact, the probability would be 0.026. The P-value gives very strong evidence that an association exists between number of sons and life length.
c) The 95% confidence interval is b ± t.025(se) = -0.651 ± 1.966(0.29). The confidence interval is (-1.220, -0.080), which rounds to (-1.2, -0.1). The plausible values for the true population slope range from -1.2 to -0.1. It is not plausible that the true slope is 0.

12.42 Student GPAs:
a) i) Assumptions: Assume randomization, a linear trend with a normal conditional distribution for y, and the same standard deviation at different values of x.
ii) Hypotheses: The null hypothesis that the variables are independent is H0: β = 0. The two-sided alternative hypothesis of dependence is Ha: β ≠ 0.
iii) Test statistic: t = b/se = 0.6369/0.1442 = 4.42 (or just look at the printout for the test statistic).
iv) P-value: The P-value is 0.000.
v) Conclusion: If H0 were true that the population slope β = 0, it would be very unusual (the probability would be almost 0) to get a sample slope at least as far from 0 as b = 0.6369. The P-value is below the significance level of 0.05, so we can reject the null hypothesis. We have very strong evidence that an association exists between high school and college GPA.
b) The 95% confidence interval is b ± t.025(se) = 0.6369 ± 2.002(0.1442). The confidence interval is (0.348, 0.926), which rounds to (0.3, 0.9). Zero is not a plausible value for this slope; as was concluded in the significance test, it is not plausible that there is no association.

12.46 Predicting house prices:
a) The residual df, 98, equals n - 2; therefore, the sample size was 100.
b) The sample predicted mean selling price was ŷ = 9.2 + 77.0(1.53) = 127.010, or $127,010.
c) The estimated residual standard deviation of y is the square root of the MS Error, 1349. The square root of 1349 is 36.729.
d) The prediction interval is ŷ ± 2s, or 127.010 ± 2(36.729) = (53.552, 200.468), which rounds to (53.6, 200.5).

12.47 Predicting clothes purchases:
a) The value under “Fit,” 448, is the predicted amount spent on clothes in the past year for those in the 12th grade of school.
b) The 95% confidence interval of (427, 469) is the range of plausible values for the population mean of dollars spent on clothes for 12th grade students in the school.
c) The 95% prediction interval of (101, 795) is the range of plausible values for the individual observations (dollars spent on clothes) for all the 12th grade students at the school.

12.56 Savings grow exponentially:
a) y = αβ^x = (100)(1.10)^1 = 110
b) y = αβ^x = (100)(1.10)^5 = 161.05
c) y = αβ^x = (100)(1.10)^x
d) The first year after which you’ll have more than $200 is the 8th: y = αβ^x = (100)(1.10)^8 = 214.36

12.58 U.S. population growth:
a) ŷ = 68.33 × 1.1418^0 = 68.33 million; ŷ = 68.33 × 1.1418^11 = 293.83 million
b) 1.1418 is the multiplicative effect on ŷ for a one-unit increase in x.
c) This suggests a very good fit of data to model. The high correlation indicates a linear relation between the log of the y values and the x values.
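
The exponential-model arithmetic in 12.56 and 12.58 can be checked with a short script. What follows is only a minimal sketch in Python: the function names exponential_model and first_year_exceeding are illustrative and do not come from the original solutions, and the search for the first year assumes α > 0 and β > 1 so that y grows without bound.

```python
def exponential_model(alpha, beta, x):
    """Evaluate the exponential regression form y = alpha * beta**x."""
    return alpha * beta ** x


def first_year_exceeding(alpha, beta, target):
    """Smallest whole x with alpha * beta**x > target.

    Assumes alpha > 0 and beta > 1 (hypothetical helper, not from the text);
    otherwise the loop would never terminate.
    """
    x = 0
    while exponential_model(alpha, beta, x) <= target:
        x += 1
    return x


# 12.56: $100 growing by 10% per year
savings_year_1 = exponential_model(100, 1.10, 1)
savings_year_5 = exponential_model(100, 1.10, 5)
first_year_over_200 = first_year_exceeding(100, 1.10, 200)

# 12.58: predicted U.S. population (millions) at x = 0 and x = 11
population_x0 = exponential_model(68.33, 1.1418, 0)
population_x11 = exponential_model(68.33, 1.1418, 11)

print(round(savings_year_1, 2), round(savings_year_5, 2), first_year_over_200)
print(round(population_x0, 2), round(population_x11, 2))
```

Because each one-unit increase in x multiplies y by β, the search in first_year_exceeding simply steps x upward until the target is passed; solving x > log(target/α)/log(β) directly would be an equivalent approach.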