HW 9 SOLUTIONS
Regression and Correlation

1. 12.41. The three residual plots, (i), (ii), and (iii), were generated after fitting regression lines to the three scatterplots, (a), (b), and (c). Which residual plot goes with which scatterplot? How do you know?

Correct: Scatterplot (b) shows curvature, so it goes with residual plot (ii). In scatterplot (a), the points fan out as X increases, so this scatterplot goes with residual plot (iii). Finally, there are no unusual features in scatterplot (c), which goes with residual plot (i).

2. 12.5 (modified). Twenty plots, each 10 x 4 meters, were randomly chosen in a large field of corn. For each plot, the plant density (number of plants in the plot) and the mean cob weight (g of grain per cob) were observed. The results are given in the table.

Plant Density X   Cob Weight Y   Plant Density X   Cob Weight Y
137               212            173               194
107               241            124               241
132               215            157               196
135               225            184               193
115               250            112               224
103               241            80                257
102               237            165               200
65                282            160               190
149               206            157               208
85                246            119               224

a. Calculate the linear regression of Y on X.
Correct: TI-84: After entering the x's in L1 and the y's in L2, STAT -> TESTS -> LinRegTTest -> ENTER, with Xlist: L1, Ylist: L2, then Calculate -> ENTER yields Y = 316.376 - 0.7206X.

b. Calculate s_Y and specify the units.
TI-84: STAT -> CALC -> ENTER -> L2 -> ENTER yields s_Y = 24.954 g.

c. Calculate the value of s_{Y|X} and specify the units.
Correct: TI-84: From the LinRegTTest output we find s_{Y|X} = 8.619254138 = 8.619 g.

d. Interpret the value of s_{Y|X} in the context of this setting.
Correct: Predictions of cob weight based on the regression model tend to be off by about 8.6 g, on average.

e. Calculate the value of r².
Correct: TI-84: From the LinRegTTest output, we find r² = 0.887.

f. Interpret the value of r² in the context of the setting.
Correct: 88.7% of the variability in grams of grain per cob is explained by the linear regression on the number of plants per plot.

g. Now consider the QQ plot of the residuals and a residual vs. predicted (fitted) values plot.
Use these plots to comment on the assumptions (that can be checked here).
Correct: The residuals appear to be centered around 0 at each slice of X, giving no indication that the errors do not have mean zero. The residual plot shows a fairly even spread at each "slice" of X. The points are a little more spread out where there are more points, but this is to be expected. There is little evidence against the errors having equal variance. The QQ plot of the residuals shows no systematic departure from the line. There is little to no indication of nonnormality in the errors here.

h. Assuming the linear model is correctly specified, compute a 95% confidence interval for β₁.
Correct: With standard error of the slope SE = 0.0606 and t multiplier 2.101 (18 degrees of freedom), the CI is
-0.7206 ± (2.101)(0.0606)
-0.7206 ± 0.1273
(-0.8479, -0.5933), or -0.8479 < β₁ < -0.5933.
TI-84: STAT -> TESTS -> LinRegTInt -> ENTER, with Xlist: L1, Ylist: L2, C-Level: 0.95, then Calculate -> ENTER yields (-0.848, -0.593).
TI-83: Using the output from LinRegTTest, we have SE = s_{Y|X}/(s_X (n - 1)^(1/2)) = 0.0606, and the CI is then computed by hand as shown above.

i. Interpret the interval you just computed in part (h) in the context of the setting.
Correct: We are 95% confident that each additional plant per plot is associated with a decrease in mean cob weight of as little as 0.5933 or as much as 0.8479 grams of grain per cob.

3. 12.6. Laetisaric acid is a compound that holds promise for control of fungus diseases in crop plants. The accompanying data show the results of growing the fungus Pythium ultimum in various concentrations of laetisaric acid. Each growth value is the average of four radial measurements of a P. ultimum colony grown in a petri dish for 24 hours; there were two petri dishes at each concentration.

Laetisaric Acid Concentration X (μg/ml)   Fungus Growth Y (mm)
0                                         33.3
0                                         31.0
3                                         29.8
3                                         27.8
6                                         28.0
6                                         29.0
10                                        25.5
10                                        23.8
20                                        18.3
20                                        15.5
30                                        11.7
30                                        10.0

a. Calculate the linear regression of Y on X.
TI-84: After entering the x's in L1 and the y's in L2, STAT -> TESTS -> LinRegTTest -> ENTER, with Xlist: L1, Ylist: L2, then Calculate -> ENTER yields Y = 31.83 - 0.712X.

b. Calculate s_{Y|X}. What are the units of s_{Y|X}?
Correct: TI-84: From the LinRegTTest output we find s_{Y|X} = 1.295 mm.

c. Calculate the value of r².
Correct: TI-84: From the LinRegTTest output, r² = 0.975.

d. Interpret the value of r² in the context of the setting.
Correct: 97.5% of the variability in Pythium ultimum growth can be explained by the variability in laetisaric acid concentration.

e. Suppose a second investigator were to replicate the experiment, using concentrations of 0, 2, 4, 6, 8, and 10 mg, with two petri dishes at each concentration. Would you predict that the value of r calculated by this second investigator would be about the same as that calculated in part (a), smaller in magnitude, or larger in magnitude? Explain.
Correct: The second investigator would have less spread in the X values, so we would expect the second investigator to obtain a smaller correlation (a value of r closer to zero).

f. Consider the QQ plot of the residuals and the residual vs. X plot. Use these plots to comment on the assumptions (that can be checked here).
Correct: The residual plot shows a fairly even spread, centered around 0, at each "slice" of X. I do not see an obvious shape in this plot. There is little indication that the assumptions of the errors having mean zero and equal variance have been violated. The QQ plot of the residuals shows what could be a slight pattern, but not a systematic departure, since every two or three points the points return to the line. With only 12 residuals, it is hard to tell whether this is a pattern. For now, there are not enough points to say this departure is systematic. There is little evidence against normality.

g. Consider the null hypothesis that laetisaric acid has no effect on growth of the fungus.
Assuming that the linear model is applicable, formulate this as a hypothesis about the true regression line, and test the hypothesis against the alternative that laetisaric acid inhibits growth of the fungus. Let α = 0.05.
Correct:
(1) α = 0.05
(2) H₀: Laetisaric acid has no effect on fungus growth (β₁ = 0)
    H_A: Laetisaric acid inhibits fungus growth (β₁ < 0)
TI-84: After entering the x's in L1 and the y's in L2, STAT -> TESTS -> LinRegTTest -> ENTER, with Xlist: L1, Ylist: L2, β & ρ: < 0, then Calculate -> ENTER yields
(3) t = -19.840
(4) P = 0.00000000116
(5) P < α, reject H₀
(6) Conclude that mean Pythium ultimum growth decreases significantly as laetisaric acid concentration is increased.

h. Assuming that the linear model is applicable, find estimates of the mean and standard deviation of fungus growth at a laetisaric acid concentration of 15 μg/ml.
Correct: Substituting X = 15 into the fitted regression equation yields Y = 31.83 - (0.7120)(15) = 21.15. Thus, we estimate that the mean radial measurement of the Pythium ultimum colony would be 21.15 mm at a laetisaric acid concentration of 15 μg/ml. According to the linear model, the standard deviation of fungus growth does not depend on X. Our estimate of this standard deviation, σ_{Y|X}, is the residual standard deviation from the regression line, s_{Y|X}. Thus, we estimate that the standard deviation of fungus growth would be 1.295 mm at a laetisaric acid concentration of 15 μg/ml.

4. In a study of the tufted titmouse (Parus bicolor), an ecologist captured seven male birds, measured their wing lengths and other characteristics, and then marked and released them. During the ensuing winter, he repeatedly observed the marked birds as they foraged for insects and seeds on tree branches. He noted the branch diameter on each occasion, and calculated (from 50 observations) the average branch diameter for each bird. The results are shown in the table.
Bird   Wing Length X (mm)   Branch Diameter Y (cm)
1      79.0                 1.02
2      80.0                 1.04
3      81.5                 1.20
4      84.0                 1.51
5      79.5                 1.21
6      82.5                 1.56
7      83.5                 1.29

a. Calculate the correlation coefficient between wing length and branch diameter.
Correct: r = 0.803

b. Construct a 90% confidence interval for the population correlation coefficient, ρ.
Correct: We utilize the Fisher Z-Transform from the Chapter 12 slides. From part (a), we have r = 0.803344979. So,
Z(r) = (1/2) ln((1 + r)/(1 - r)) = 0.5 ln(1.803344979/0.196655021) = 0.5 ln(9.170093734) = 1.107973755.
invNorm(.95,0,1) = z_{0.05} = 1.645.
Now, calculate (1/(n - 3))^(1/2) = (1/(7 - 3))^(1/2) = 0.5.
We find the 90% CI on Z(ρ) as 1.107973755 ± (1.645)(0.5) = 1.107973755 ± 0.8225, or equivalently, 0.285473755 < Z(ρ) < 1.930473755.
Lower limit on ρ: (e^(2(0.285473755)) - 1)/(e^(2(0.285473755)) + 1) = 0.769943296/2.769943296 = 0.277963559.
Upper limit on ρ: (e^(2(1.930473755)) - 1)/(e^(2(1.930473755)) + 1) = 46.51034658/48.51034658 = 0.958771682.
Finally, we report the 90% CI for ρ as (0.278, 0.959), or equivalently 0.278 < ρ < 0.959.
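
The Fisher Z-Transform interval in Problem 4(b) can be cross-checked numerically. The following is a minimal Python sketch, not part of the TI-84 workflow used in these solutions. The function names pearson_r and fisher_ci are illustrative choices, and the code uses the exact normal quantile (about 1.6449) instead of the rounded 1.645, so digits past the third decimal may differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist

# Titmouse data from the table in Problem 4
wing_length = [79.0, 80.0, 81.5, 84.0, 79.5, 82.5, 83.5]      # X, mm
branch_diameter = [1.02, 1.04, 1.20, 1.51, 1.21, 1.56, 1.29]  # Y, cm


def pearson_r(x, y):
    """Sample correlation coefficient r (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.90):
    """Confidence interval for rho via the Fisher Z-transform."""
    z = math.atanh(r)  # same as 0.5 * ln((1 + r) / (1 - r))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_crit / math.sqrt(n - 3)
    # tanh is the inverse transform: (e^(2z) - 1) / (e^(2z) + 1)
    return math.tanh(z - half_width), math.tanh(z + half_width)


r = pearson_r(wing_length, branch_diameter)
lower, upper = fisher_ci(r, n=len(wing_length), level=0.90)
print(f"r = {r:.3f}, 90% CI for rho: ({lower:.3f}, {upper:.3f})")
```

Using atanh and tanh avoids typing the logarithm and exponential forms by hand, which is where transcription slips are most likely in the manual calculation.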