Homework Solution (#8)
1. (a) Using indicator variables for humorous and musical commercials, the fitted
regression equation is
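$$\hat{\mu}\{\text{memory score} \mid \text{length}, \text{TYPE}\} = 2.53 + 2.91\, I_{\text{Humorous}} + 5.50\, I_{\text{Musical}} + 0.22\,\text{length}$$
Equivalently,
$$\hat{\mu}\{\text{memory score} \mid \text{length}, \text{TYPE} = \text{Humorous}\} = 5.44 + 0.22\,\text{length}$$
$$\hat{\mu}\{\text{memory score} \mid \text{length}, \text{TYPE} = \text{Musical}\} = 8.04 + 0.22\,\text{length}$$
$$\hat{\mu}\{\text{memory score} \mid \text{length}, \text{TYPE} = \text{Serious}\} = 2.53 + 0.22\,\text{length}$$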
[Note: Any two of the three possible indicators can be used; the three equations giving the
regression line for each fixed type of commercial will be the same regardless of the choice.]
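The per-type equations follow from setting the indicators to 0 or 1 in the fitted equation. A minimal Python sketch of that arithmetic, using the unrounded estimates from the Parameter Estimates table below (the constant and function names are illustrative only):

```python
# Unrounded estimates from the Parameter Estimates table
# (serious commercials are the reference type).
INTERCEPT = 2.5347388
B_LENGTH = 0.2226932
B_HUMOROUS = 2.9075313
B_MUSICAL = 5.5012221

def type_lines():
    """Return {type: (intercept, slope)} for each commercial type's fitted line."""
    return {
        "Serious": (INTERCEPT, B_LENGTH),                # both indicators = 0
        "Humorous": (INTERCEPT + B_HUMOROUS, B_LENGTH),  # I_Humorous = 1
        "Musical": (INTERCEPT + B_MUSICAL, B_LENGTH),    # I_Musical = 1
    }

for name, (a, b) in type_lines().items():
    print(f"{name}: {a:.2f} + {b:.2f} * length")
```

All three lines share the slope on length; only the intercept shifts with the type, which is why this is a parallel-lines model.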
(b)
[Figure: Residual by Predicted Plot, Test Residual vs. Test Predicted]
[Figure: Bivariate Fit of Residual Test By Length, Residual Test vs. Length]
Both plots look approximately like random scatter. Thus, neither a transformation nor
adding powers of length as explanatory variables appears to be needed.
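The two diagnostic plots are built from the fitted values and residuals of the indicator-variable model. A minimal sketch of how those could be computed with NumPy least squares, assuming the data are available as parallel sequences of scores, lengths and type labels; `design_matrix` and `fitted_and_residuals` are hypothetical helper names, not JMP functions:

```python
import numpy as np

def design_matrix(length, ctype):
    """Columns: intercept, length, I_Humorous, I_Musical ('Serious' is the reference)."""
    length = np.asarray(length, dtype=float)
    ctype = np.asarray(ctype)
    return np.column_stack([
        np.ones_like(length),                 # intercept
        length,                               # commercial length
        (ctype == "Humorous").astype(float),  # I_Humorous
        (ctype == "Musical").astype(float),   # I_Musical
    ])

def fitted_and_residuals(score, length, ctype):
    """Least-squares fit; returns (coefficients, fitted values, residuals)."""
    X = design_matrix(length, ctype)
    y = np.asarray(score, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return beta, fitted, y - fitted
```

With the real data, scattering the residuals against the fitted values and against length (for example with `matplotlib.pyplot.scatter`) gives the two plots above. Least-squares residuals are always uncorrelated with each column of the design matrix, so any pattern in these plots points to curvature or non-constant spread rather than a missed linear trend.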
(c) Longer commercials are remembered better than shorter commercials of the same type
if the coefficient on length is greater than zero. As the table below shows, the p-value for
testing the null hypothesis that the coefficient on length equals zero is 0.0001. Thus,
there is strong evidence that longer commercials are remembered better than shorter
commercials of the same type.
Parameter Estimates
Term          Estimate    Std Error   t Ratio   Prob>|t|
Intercept     2.5347388   2.154979    1.18      0.2445
Length        0.2226932   0.054327    4.10      0.0001
I-Humorous    2.9075313   1.80602     1.61      0.1130
I-Musical     5.5012221   1.827852    3.01      0.0039
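Each t Ratio in the table is the estimate divided by its standard error, the statistic for testing that the coefficient equals zero. A short Python check of that arithmetic against the reported values (the `TABLE` dictionary simply transcribes the output above):

```python
# (estimate, std error, reported t ratio) transcribed from the table
TABLE = {
    "Intercept": (2.5347388, 2.154979, 1.18),
    "Length": (0.2226932, 0.054327, 4.10),
    "I-Humorous": (2.9075313, 1.80602, 1.61),
    "I-Musical": (5.5012221, 1.827852, 3.01),
}

def t_ratio(estimate, std_error):
    """t statistic for H0: coefficient = 0."""
    return estimate / std_error

for term, (est, se, reported) in TABLE.items():
    print(f"{term:11s} t = {t_ratio(est, se):.2f} (reported {reported:.2f})")
```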
(d) Serious commercials are remembered better than humorous commercials of the same
length if the coefficient on $I_{\text{Humorous}}$ is less than zero when the two indicator variables are
$I_{\text{Humorous}}$ and $I_{\text{Musical}}$. As the table above shows, the p-value for testing the null
hypothesis that the coefficient on $I_{\text{Humorous}}$ equals zero is 0.1130, and the estimated
coefficient is actually positive (2.91). Thus, there is not much evidence that serious
commercials are remembered better than humorous commercials of the same length.
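JMP's Prob>|t| is a two-sided p-value, while the alternative in (d) is one-sided (coefficient on $I_{\text{Humorous}}$ less than zero). Because the t distribution is symmetric, the one-sided p-value is half the two-sided value when the estimate points toward the alternative, and one minus that half otherwise. A minimal sketch (the function name is illustrative):

```python
def one_sided_p(two_sided_p, estimate, alternative="less"):
    """Convert a two-sided t-test p-value to a one-sided one.

    alternative="less" tests Ha: beta < 0; "greater" tests Ha: beta > 0.
    Relies on the symmetry of the t distribution.
    """
    half = two_sided_p / 2
    if alternative == "less":
        return half if estimate < 0 else 1 - half
    if alternative == "greater":
        return half if estimate > 0 else 1 - half
    raise ValueError("alternative must be 'less' or 'greater'")

# Ha: coefficient on I_Humorous < 0, but the estimate is +2.9075313
print(round(one_sided_p(0.1130, 2.9075313, "less"), 4))  # prints 0.9435
```

A one-sided p-value of about 0.94 reinforces the conclusion above.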
(e) Using JMP, a 95% prediction interval for a person’s memory test score after seeing a
humorous commercial that lasts 40 seconds is (2.94, 25.76).
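A prediction interval is centered at the estimated mean for the given explanatory values. As a sanity check on the JMP interval, a sketch computing that point prediction from the table estimates (a humorous commercial has $I_{\text{Humorous}} = 1$ and $I_{\text{Musical}} = 0$):

```python
# Estimates from the Parameter Estimates table (serious = reference type)
INTERCEPT = 2.5347388
B_LENGTH = 0.2226932
B_HUMOROUS = 2.9075313

def predicted_humorous(length):
    """Estimated mean memory score for a humorous commercial of the given length."""
    return INTERCEPT + B_HUMOROUS + B_LENGTH * length

point = predicted_humorous(40)
pi_lower, pi_upper = 2.94, 25.76  # 95% prediction interval reported by JMP
midpoint = (pi_lower + pi_upper) / 2
print(f"point prediction {point:.2f}, interval midpoint {midpoint:.2f}")
# prints: point prediction 14.35, interval midpoint 14.35
```

The width of the interval depends on the residual standard deviation and the standard error of the estimated mean, which are not shown in the output above, so the sketch does not try to reproduce the endpoints.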
2. (a) The percentage of the total variation in student GPAs that can be explained by the simple
linear regression on an explanatory variable X is 100 times the $R^2$ from the simple linear regression of
student GPA on X. Using JMP, we find that the $R^2$ values from the simple linear regressions of
student GPA on the three explanatory variables are 0.40 for IQ, 0.29 for self-concept
and 0.01 for gender. Thus, IQ is the best single predictor of student GPA.
(b) The formula for a multiple regression model that can be used to answer this question
is
$$\mu\{\text{GPA} \mid \text{IQ}, \text{self-concept}, \text{GENDER}\} = \beta_0 + \beta_1\,\text{IQ} + \beta_2\,\text{self-concept} + \beta_3\, I_{\text{female}}$$
The coefficient $\beta_2$ measures how much the mean GPA increases for a one-point increase
in self-concept when IQ and gender are held fixed.
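Writing out the difference between two mean GPAs that share the same IQ and gender but whose self-concept scores differ by one point shows why $\beta_2$ has this interpretation; every other term cancels:

```latex
\begin{aligned}
&\mu\{\text{GPA} \mid \text{IQ}, \text{self-concept} + 1, \text{GENDER}\}
  - \mu\{\text{GPA} \mid \text{IQ}, \text{self-concept}, \text{GENDER}\} \\
&\quad = \bigl[\beta_0 + \beta_1\,\text{IQ} + \beta_2(\text{self-concept} + 1) + \beta_3\, I_{\text{female}}\bigr]
  - \bigl[\beta_0 + \beta_1\,\text{IQ} + \beta_2\,\text{self-concept} + \beta_3\, I_{\text{female}}\bigr] \\
&\quad = \beta_2
\end{aligned}
```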
(c) The question of interest is whether a change in self-concept is associated with a
change in GPA when IQ and gender are held fixed. The null hypothesis is that a change
in self-concept is not associated with a change in GPA when IQ and gender are held
fixed, $H_0: \beta_2 = 0$. The alternative hypothesis is that a change in self-concept is
associated with a change in GPA when IQ and gender are held fixed, $H_a: \beta_2 \neq 0$. The
p-value for this hypothesis test is 0.0016, as indicated in the Fit Model output below.
Thus, there is strong evidence that changes in self-concept are associated with changes
in GPA when IQ and gender are held fixed. A 95% confidence interval for the increase
in mean GPA that is associated with a one-point increase in self-concept when IQ and gender
are held fixed is (0.02, 0.08). This is an observational study, so no causal conclusions
about the impact of changing self-concept on GPA can be made.
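JMP's 95% limits have the form estimate ± t* × SE, where t* is the 97.5th percentile of the t distribution on the residual degrees of freedom (not shown in the output). A minimal sketch that backs out the t* implied by the reported self-concept limits and rebuilds the interval; `implied_critical_value` is an illustrative name:

```python
# Self-concept row of the Fit Model Parameter Estimates table
EST, SE = 0.0512928, 0.015646
LOWER, UPPER = 0.0201165, 0.0824691  # reported 95% limits

def implied_critical_value(lower, upper, se):
    """t* such that (lower, upper) = estimate -/+ t* * se."""
    return (upper - lower) / (2 * se)

t_star = implied_critical_value(LOWER, UPPER, SE)
print(f"t ratio = {EST / SE:.2f}, implied t* = {t_star:.3f}")
print(f"rebuilt interval: ({EST - t_star * SE:.2f}, {EST + t_star * SE:.2f})")
```

The implied t* is just under 2, as expected for a t distribution with a moderate number of degrees of freedom.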
Parameter Estimates
Term            Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%    Upper 95%
Intercept       -5.02236    1.469531    -3.42     0.0010     -7.950465    -2.094255
IQ              0.0841168   0.014955    5.62      <.0001     0.0543188    0.1139147
Self Concept    0.0512928   0.015646    3.28      0.0016     0.0201165    0.0824691
Gender Dummy    0.968521    0.349484    2.77      0.0071     0.2721584    1.6648836
3. (a)
[Figure: Overlay Plot of Deep Percent Inhibition and Surface Percent Inhibition (Y) vs. UVB]
Yes, there appears to be a straight-line relationship between UVB exposure and mean
percent inhibition at each depth.
(b) The formula for the parallel regression lines multiple regression model is
$$\mu\{\text{percent inhibition} \mid \text{UVB exposure}, \text{DEPTH}\} = \beta_0 + \beta_1\, I_{\text{Surface}} + \beta_2\,\text{UVB exposure}$$
(c) The formula for the separate regression lines multiple regression model is
$$\mu\{\text{percent inhibition} \mid \text{UVB exposure}, \text{DEPTH}\} = \beta_0 + \beta_1\, I_{\text{Surface}} + \beta_2\,\text{UVB exposure} + \beta_3\,(I_{\text{Surface}} \times \text{UVB exposure})$$
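The only difference between the two models is the extra product column $I_{\text{Surface}} \times \text{UVB exposure}$. A sketch of one row of each model's explanatory variables, plus the corresponding mean; `design_row` and `mean_inhibition` are hypothetical names, and `surface` is assumed to be coded 1 for surface samples and 0 for deep ones:

```python
def design_row(uvb, surface, separate=True):
    """Explanatory-variable row for one sample.

    Parallel lines model: [1, I_Surface, UVB].
    Separate lines model adds the product I_Surface * UVB.
    """
    i_surface = float(surface)  # 1.0 for surface, 0.0 for deep
    row = [1.0, i_surface, float(uvb)]
    if separate:
        row.append(i_surface * float(uvb))
    return row

def mean_inhibition(beta, uvb, surface, separate=True):
    """Mean percent inhibition implied by coefficients beta for one sample."""
    return sum(b * x for b, x in zip(beta, design_row(uvb, surface, separate)))
```

Under the separate lines model the slope in UVB is $\beta_2$ in the deep and $\beta_2 + \beta_3$ at the surface; dropping the product column forces both slopes to equal $\beta_2$, which is why testing $\beta_3 = 0$ compares the two models.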
(d) To test whether the parallel regression lines model or the separate regression lines
model is more appropriate, we test $H_0: \beta_3 = 0$ (parallel regression lines model) versus
$H_a: \beta_3 \neq 0$ (separate regression lines model). We carry out this hypothesis test by fitting
the separate regression lines model and looking at the p-value for the test of $H_0: \beta_3 = 0$.
Parameter Estimates
Term            Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%    Upper 95%
Intercept       1.4999316   4.056453    0.37      0.7175     -7.263503    10.263367
I-Surface       1.4673181   10.53841    0.14      0.8914     -21.29953    24.234168
UVB             1238.9756   222.9629    5.56      <.0001     757.29369    1720.6576
UVB*ISurface    -980.0395   381.5392    -2.57     0.0234     -1804.305    -155.7742
The p-value for the test is 0.0234. Thus, there is moderate evidence against the parallel
regression lines model; we should use the separate regression lines model.
(e) Whether the effect of UVB exposure on mean percent inhibition differs at the surface
and in the deep is equivalent to testing $H_0: \beta_3 = 0$ vs. $H_a: \beta_3 \neq 0$ in the separate
regression lines model. The p-value for this test is 0.0234, so there is moderate evidence
that the effect differs. A 95% confidence interval for the difference between the effect of a
one-unit increase in UVB exposure on mean percent inhibition at the surface and the effect
of a one-unit increase in UVB exposure on mean percent inhibition in the deep is (-1804.31,
-155.77). Because this interval lies entirely below zero, the effect of increasing UVB exposure
on increasing mean percent inhibition appears to be greater in the deep than at the surface
(estimated slopes of about 1239 in the deep and 1239 - 980 = 259 at the surface).
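Plugging the estimates from the table in (d) into the separate regression lines model gives the two estimated slopes directly. A short sketch of that arithmetic:

```python
# Estimates and 95% limits from the separate regression lines fit
B2, B3 = 1238.9756, -980.0395           # UVB and UVB*ISurface rows
B3_LOWER, B3_UPPER = -1804.305, -155.7742

deep_slope = B2          # I_Surface = 0: slope is beta_2
surface_slope = B2 + B3  # I_Surface = 1: slope is beta_2 + beta_3

print(f"deep slope    {deep_slope:.2f}")
print(f"surface slope {surface_slope:.2f}")
print("CI for beta_3 excludes 0:", B3_UPPER < 0 or B3_LOWER > 0)
```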
4. No, the researchers should not conclude that coffee drinking causes heart disease.
This is an observational study and there may be confounding variables. In particular,
cigarette smoking is probably a confounding variable because it is known to be
associated with heart disease and is probably also associated with coffee drinking (people
who smoke cigarettes tend to drink more coffee). We could use multiple regression to
better address the question of whether coffee drinking causes heart disease by fitting a
multiple regression of heart disease on coffee drinking and cigarette smoking. The
coefficient on coffee drinking in the multiple regression would measure the mean change
in heart disease that is associated with a one-cup increase in coffee drinking when
cigarette smoking is held fixed. The multiple regression could not be used to prove or
disprove that coffee drinking causes heart disease because there could be other
confounding variables besides cigarette smoking, but it would provide more relevant
evidence than the simple linear regression that does not control for cigarette smoking.
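A small illustration of why controlling for smoking matters, using purely hypothetical, noise-free numbers (not data from the study) in which heart disease depends only on smoking while coffee drinking rises with smoking. The `ols` helper is an illustrative name wrapping NumPy least squares:

```python
import numpy as np

# Hypothetical data: disease depends ONLY on smoking,
# but coffee drinking tends to rise with smoking.
smoking = np.array([0, 0, 1, 1, 2, 2], dtype=float)
coffee = np.array([1, 2, 3, 4, 5, 6], dtype=float)
disease = 3.0 * smoking

def ols(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

simple = ols(disease, coffee)             # [intercept, coffee]
multiple = ols(disease, coffee, smoking)  # [intercept, coffee, smoking]
print("coffee slope, simple regression:  ", round(simple[1], 4))
print("coffee slope, controlling smoking:", round(multiple[1], 4))
```

In the simple regression coffee picks up a positive slope (about 1.37) only because it tracks smoking; once smoking is in the model the coffee coefficient is zero. With real data, an unmeasured confounder could still bias the coffee coefficient, as noted above.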