Homework Solution (#8)

1. (a) Using indicator variables for humorous and musical commercials, the fitted regression equation is

$$\hat{\mu}\{\text{memory score} \mid \text{length}, \text{TYPE}\} = 2.54 + 2.91\, I_{\text{Humorous}} + 5.50\, I_{\text{Musical}} + 0.22 \times \text{length}$$

Equivalently,

$$\hat{\mu}\{\text{memory score} \mid \text{length}, \text{TYPE} = \text{Humorous}\} = 5.45 + 0.22 \times \text{length}$$

$$\hat{\mu}\{\text{memory score} \mid \text{length}, \text{TYPE} = \text{Musical}\} = 8.04 + 0.22 \times \text{length}$$

$$\hat{\mu}\{\text{memory score} \mid \text{length}, \text{TYPE} = \text{Serious}\} = 2.54 + 0.22 \times \text{length}$$

[Note: You can choose any two of the three possible indicators; the last three equations, which give the regression line for each fixed type of commercial, will always be the same.]

(b) [Figures: Residual by Predicted Plot (residual vs. predicted test score); Bivariate Fit of Residual Test By Length (residual vs. length)]

Both plots look approximately like random scatter. Thus, neither a transformation nor adding powers of length as explanatory variables appears to be needed.

(c) Longer commercials are remembered more than shorter commercials of the same type if the coefficient on length is greater than zero. As the Parameter Estimates table below shows, the p-value for testing the null hypothesis that the coefficient on length equals zero is 0.0001. Thus, there is strong evidence that longer commercials are remembered more than shorter commercials of the same type.

Parameter Estimates

Term          Estimate    Std Error   t Ratio   Prob>|t|
Intercept     2.5347388   2.154979    1.18      0.2445
Length        0.2226932   0.054327    4.10      0.0001
I-Humorous    2.9075313   1.80602     1.61      0.1130
I-Musical     5.5012221   1.827852    3.01      0.0039

(d) When the two indicator variables are $I_{\text{Humorous}}$ and $I_{\text{Musical}}$, serious commercials are remembered more than humorous commercials of the same length if the coefficient on $I_{\text{Humorous}}$ is less than zero. As the table above shows, the p-value for testing the null hypothesis that the coefficient on $I_{\text{Humorous}}$ equals zero is 0.1130, and the estimated coefficient (2.91) is positive, which points in the opposite direction. Thus, there is not much evidence that serious commercials are remembered more than humorous commercials of the same length.
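As an arithmetic check on parts (a) through (d), the sketch below rebuilds the three type-specific lines and the t ratios from the Parameter Estimates table. It is a minimal illustration, not JMP output: the helper names `type_line` and `predicted_mean` are hypothetical, and Serious is assumed to be the reference level, as in part (a).

```python
# Estimates and standard errors copied from the Problem 1 Parameter Estimates table.
COEF = {"Intercept": 2.5347388, "Length": 0.2226932,
        "I-Humorous": 2.9075313, "I-Musical": 5.5012221}
SE = {"Intercept": 2.154979, "Length": 0.054327,
      "I-Humorous": 1.80602, "I-Musical": 1.827852}


def type_line(commercial_type: str) -> tuple[float, float]:
    """Return (intercept, slope) of the fitted line for one commercial type.

    Serious is the reference level, so both indicators are 0 for it.
    """
    i_humorous = 1 if commercial_type == "Humorous" else 0
    i_musical = 1 if commercial_type == "Musical" else 0
    intercept = (COEF["Intercept"]
                 + COEF["I-Humorous"] * i_humorous
                 + COEF["I-Musical"] * i_musical)
    # The slope on length is shared by all three types (parallel lines).
    return intercept, COEF["Length"]


def predicted_mean(commercial_type: str, length: float) -> float:
    """Estimated mean memory score for a commercial of the given type and length."""
    intercept, slope = type_line(commercial_type)
    return intercept + slope * length


# t ratio = estimate / standard error (the t Ratio column of the table).
T_RATIO = {term: COEF[term] / SE[term] for term in COEF}

if __name__ == "__main__":
    for kind in ("Serious", "Humorous", "Musical"):
        a, b = type_line(kind)
        print(f"{kind:9s} intercept = {a:.2f}, slope = {b:.2f}")
    print(f"Humorous, 40 s: {predicted_mean('Humorous', 40):.2f}")
```

Working from the unrounded estimates, the intercepts can differ from the equations in part (a) in the second decimal place because of rounding. The predicted mean for a 40-second humorous commercial, about 14.35, is the midpoint of the prediction interval (2.94, 25.76) reported in part (e), as expected for an interval of the form estimate ± margin.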
(e) Using JMP, a 95% prediction interval for a person's memory test score after seeing a humorous commercial that lasts 40 seconds is (2.94, 25.76).

2. (a) The percentage of the total variation in student GPA that can be explained by a simple linear regression on an explanatory variable X is the $R^2$ from the simple linear regression of student GPA on X. Using JMP, we find that the $R^2$ values from the simple linear regressions of student GPA on each of the three explanatory variables are 0.40 for IQ, 0.29 for self concept and 0.01 for gender. Thus, IQ is the best single predictor of student GPA.

(b) The formula for a multiple regression model that can be used to answer this question is

$$\mu\{\text{GPA} \mid \text{IQ}, \text{self concept}, \text{GENDER}\} = \beta_0 + \beta_1\,\text{IQ} + \beta_2\,\text{self concept} + \beta_3\, I_{\text{female}}$$

The coefficient $\beta_2$ measures how much the mean GPA increases for a one-point increase in self-concept when IQ and gender are held fixed.

(c) The question of interest is whether a change in self-concept is associated with a change in GPA when IQ and gender are held fixed. The null hypothesis is that a change in self-concept is not associated with a change in GPA when IQ and gender are held fixed, $H_0\!: \beta_2 = 0$. The alternative hypothesis is that it is associated, $H_a\!: \beta_2 \neq 0$. The p-value for this hypothesis test is 0.0016, as indicated in the Fit Model output below. Thus, there is strong evidence that changes in self-concept are associated with changes in GPA when IQ and gender are held fixed. A 95% confidence interval for the increase in mean GPA associated with a one-point increase in self-concept, with IQ and gender held fixed, is (0.02, 0.08). This is an observational study, so no causal conclusions about the impact of changing self-concept on GPA can be made.
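The 95% intervals in the Fit Model output have the form estimate ± t* × (standard error). The sketch below checks that relationship for the GPA model: it backs out the critical value t* implied by each reported interval, then rebuilds the self-concept interval and t ratio. The value 1.9925 used for t* is an assumption inferred from the reported intervals, since the solution does not state the residual degrees of freedom, and the helper names are hypothetical.

```python
# (estimate, std error, lower 95%, upper 95%) from the Problem 2 Fit Model output.
ROWS = {
    "Intercept":    (-5.02236,  1.469531, -7.950465, -2.094255),
    "IQ":           (0.0841168, 0.014955,  0.0543188, 0.1139147),
    "Self Concept": (0.0512928, 0.015646,  0.0201165, 0.0824691),
    "Gender Dummy": (0.968521,  0.349484,  0.2721584, 1.6648836),
}

# Assumed critical value, inferred from the reported intervals (not stated in the source).
T_STAR = 1.9925


def implied_t_star(estimate, std_error, lower, upper):
    """Critical value implied by a symmetric interval estimate +/- t* * SE."""
    return (upper - lower) / (2 * std_error)


def confidence_interval(estimate, std_error, t_star=T_STAR):
    """Return (lower, upper) for estimate +/- t_star * std_error."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin


if __name__ == "__main__":
    for term, (est, se, lo, hi) in ROWS.items():
        print(f"{term:13s} t ratio = {est / se:6.2f}, "
              f"implied t* = {implied_t_star(est, se, lo, hi):.4f}")
    est, se, _, _ = ROWS["Self Concept"]
    lo, hi = confidence_interval(est, se)
    print(f"Self Concept 95% CI: ({lo:.2f}, {hi:.2f})")
```

All four rows imply a critical value of about 1.99, which is why a single t* reproduces every interval; the self-concept interval rounds to the (0.02, 0.08) quoted in part (c), and its t ratio is 0.0512928 / 0.015646 ≈ 3.28.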
Parameter Estimates

Term           Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%    Upper 95%
Intercept      -5.02236    1.469531    -3.42     0.0010     -7.950465    -2.094255
IQ             0.0841168   0.014955    5.62      <.0001     0.0543188    0.1139147
Self Concept   0.0512928   0.015646    3.28      0.0016     0.0201165    0.0824691
Gender Dummy   0.968521    0.349484    2.77      0.0071     0.2721584    1.6648836

3. (a) [Figure: Overlay Plot of Deep Percent Inhibition and Surface Percent Inhibition versus UVB]

Yes, it appears that there is a straight-line relationship between UVB exposure and mean percent inhibition at each depth.

(b) The formula for the parallel regression lines model is

$$\mu\{\text{percent inhibition} \mid \text{UVB exposure}, \text{DEPTH}\} = \beta_0 + \beta_1 I_{\text{Surface}} + \beta_2\,\text{UVB exposure}$$

(c) The formula for the separate regression lines model is

$$\mu\{\text{percent inhibition} \mid \text{UVB exposure}, \text{DEPTH}\} = \beta_0 + \beta_1 I_{\text{Surface}} + \beta_2\,\text{UVB exposure} + \beta_3\,(I_{\text{Surface}} \times \text{UVB exposure})$$

(d) To test whether the parallel regression lines model or the separate regression lines model is more appropriate, we test $H_0\!: \beta_3 = 0$ (parallel regression lines model) versus $H_a\!: \beta_3 \neq 0$ (separate regression lines model). We carry out this test by fitting the separate regression lines model and looking at the p-value for the test of $H_0\!: \beta_3 = 0$.

Parameter Estimates

Term           Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%    Upper 95%
Intercept      1.4999316   4.056453    0.37      0.7175     -7.263503    10.263367
I-Surface      1.4673181   10.53841    0.14      0.8914     -21.29953    24.234168
UVB            1238.9756   222.9629    5.56      <.0001     757.29369    1720.6576
UVB*ISurface   -980.0395   381.5392    -2.57     0.0234     -1804.305    -155.7742

The p-value for the test is 0.0234. Thus, there is moderate evidence against the parallel regression lines model; we should use the separate regression lines model.

(e) There is moderate evidence that the effect of UVB exposure on the distribution of percent inhibition differs at the surface and in the deep. This is equivalent to testing $H_0\!: \beta_3 = 0$ vs. $H_a\!: \beta_3 \neq 0$ in the separate regression lines model, and the p-value for this test is 0.0234. A 95% confidence interval for the difference between the effect of a one-unit increase in UVB exposure on mean percent inhibition at the surface and the corresponding effect in the deep is (-1804.31, -155.77). Because this interval lies entirely below zero, the effect of increasing UVB exposure on mean percent inhibition appears to be greater in the deep than at the surface.

4. No, the researchers should not conclude that coffee drinking causes heart disease. This is an observational study, and there may be confounding variables. In particular, cigarette smoking is probably a confounding variable because it is known to be associated with heart disease and is probably also associated with coffee drinking (people who smoke cigarettes tend to drink more coffee). We could use multiple regression to better address the question of whether coffee drinking causes heart disease by fitting a multiple regression of heart disease on coffee drinking and cigarette smoking. The coefficient on coffee drinking in that model would measure the change in mean heart disease associated with a one-cup increase in coffee drinking when cigarette smoking is held fixed. The multiple regression could not prove or disprove that coffee drinking causes heart disease, because there could be other confounding variables besides cigarette smoking, but it would provide more relevant evidence than the simple linear regression, which does not control for cigarette smoking.
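The confounding argument in Problem 4 can be illustrated with a small simulation. The sketch below uses entirely hypothetical data, not the study's data: smoking raises both coffee intake and heart-disease risk, while coffee has no effect at all. The effect sizes (0.8 and 1.0), the sample size and the function names are illustrative assumptions. The simple regression of disease on coffee still shows a clearly positive slope, while the coefficient on coffee in the multiple regression that also includes smoking is close to zero.

```python
import random


def simulate(n=5000, seed=1):
    """Hypothetical data in which smoking confounds coffee and heart disease.

    Assumed data-generating process (illustrative only):
      smoking ~ N(0, 1)
      coffee  = 0.8 * smoking + N(0, 1)   # smokers drink more coffee
      disease = 1.0 * smoking + N(0, 1)   # coffee has NO effect on disease
    """
    rng = random.Random(seed)
    coffee, smoking, disease = [], [], []
    for _ in range(n):
        s = rng.gauss(0, 1)
        coffee.append(0.8 * s + rng.gauss(0, 1))
        smoking.append(s)
        disease.append(1.0 * s + rng.gauss(0, 1))
    return coffee, smoking, disease


def centered_cross(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v))."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simple_slope(y, x):
    """Least-squares slope from the simple regression y ~ 1 + x."""
    return centered_cross(x, y) / centered_cross(x, x)


def partial_slope(y, x1, x2):
    """Least-squares coefficient on x1 in the multiple regression y ~ 1 + x1 + x2."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    # Solve the 2x2 centered normal equations for the x1 coefficient.
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)


if __name__ == "__main__":
    coffee, smoking, disease = simulate()
    print(f"coffee slope, smoking ignored:    {simple_slope(disease, coffee):.3f}")
    print(f"coffee slope, smoking held fixed: {partial_slope(disease, coffee, smoking):.3f}")
```

Under these assumptions the first slope should land near 0.8 / 1.64 ≈ 0.49 and the second near 0. Holding one confounder fixed removes only that confounder's distortion, which is why the multiple regression still cannot settle causation on its own.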