Practice Exam (weeks 7–11) – Sample Solutions
Attempt all questions. You must support all answers with reasons – correct
answers with incorrect or missing reasons will receive NO CREDIT.
1. A study is conducted to investigate depression in adolescents. Among the factors
considered are worry and satisfaction with the immediate environment. High scores
indicate high levels of depression, worry, or satisfaction. The following statements are
made:
‘Depression is positively correlated with worry, r = .3.’
‘Depression is negatively correlated with satisfaction, r = -.36.’
‘Satisfaction and worry scores are negatively correlated, r = -.16.’
a. Is this study observational or experimental?
Observational; individuals are not assigned to groups by researchers.
b. Sketch scatterplots to illustrate what you expect the data to look like in each case.
[Figure: three scatterplots of y against x, one panel each for r = .3, r = -.36, and r = -.16.]
c. A friend who does not know anything about statistics asks you to interpret these
statements in a practical sense. What would you say? As worry increases, there
is a tendency for depression to increase (on average). As satisfaction increases,
there is a tendency for depression to decrease. As worry increases, there is a
tendency for satisfaction to decrease.
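To get a feel for what correlations of this size look like, one option is to simulate data with a chosen correlation and check the sample value. The sketch below is only an illustration: it assumes standardized, bivariate normal scores, which the question does not state, and the names `simulate_pairs` and `sample_correlation` are made up for this example.

```python
import math
import random


def simulate_pairs(r, n, rng):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation r."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Mixing x with independent noise gives Var(y) = 1 and Corr(x, y) = r.
        y = r * x + math.sqrt(1.0 - r * r) * noise
        pairs.append((x, y))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for r in (0.3, -0.36, -0.16):
        data = simulate_pairs(r, 5000, rng)
        print(f"target r = {r:+.2f}, sample r = {sample_correlation(data):+.2f}")
```

Plotting each simulated data set would show diffuse clouds with only a slight upward or downward tilt, which is why the practical interpretation is phrased in terms of tendencies.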
2. If gene frequencies are in equilibrium, the genotypes AA, Aa, and aa occur with
probabilities $(1-\theta)^2$, $2\theta(1-\theta)$, and $\theta^2$, respectively. The following data were published on
haptoglobin type in a sample of 190 people:
Haptoglobin Type
Hp1-1    Hp1-2    Hp2-2
10       68       112
The MLE of $\theta$ is about .77. Test the goodness of fit of the data to the hypothesis of
equilibrium.
Under the null, the expected numbers are 10.051, 67.298, and 112.651. The test statistic is
$\chi^2 = (10 - 10.051)^2/10.051 + (68 - 67.298)^2/67.298 + (112 - 112.651)^2/112.651 = .01$.
The p-value is obtained from the $\chi^2$ distribution with 3 – 1 – 1 = 1 df (subtract an extra df for
estimating one parameter). The p-value is quite big, about .92. So, do not reject the null
hypothesis; the data are consistent with Hardy-Weinberg equilibrium.
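The arithmetic in this solution can be checked with a short script. This is a minimal sketch rather than a general-purpose routine: the function names are hypothetical, the value .77 is the rounded MLE used above, and the p-value uses the identity P(X > x) = erfc(sqrt(x/2)), which holds only for a chi-square distribution with 1 df.

```python
import math


def allele_frequency_mle(counts):
    """MLE of theta for counts ordered (AA, Aa, aa): share of 'a' alleles."""
    n_AA, n_Aa, n_aa = counts
    return (n_Aa + 2 * n_aa) / (2 * (n_AA + n_Aa + n_aa))


def hardy_weinberg_gof(counts, theta):
    """Expected counts, chi-square statistic, and 1-df p-value under equilibrium."""
    n = sum(counts)
    probs = [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Valid only for 1 df: chi-square(1) is the square of a standard normal.
    p_value = math.erfc(math.sqrt(stat / 2))
    return expected, stat, p_value


if __name__ == "__main__":
    observed = [10, 68, 112]  # Hp1-1, Hp1-2, Hp2-2
    print("MLE of theta:", round(allele_frequency_mle(observed), 4))
    expected, stat, p_value = hardy_weinberg_gof(observed, 0.77)
    print("expected:", [round(e, 3) for e in expected])
    print("chi-square:", round(stat, 4), "p-value:", round(p_value, 3))
```

Using the unrounded MLE instead of .77 shifts the expected counts slightly but does not change the conclusion.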
3. Twenty-two patients undergoing cardiac bypass surgery were randomized to one of three
ventilation groups:
a. 50% nitrous oxide and 50% oxygen mixture continuously for 24 hours
b. 50% nitrous oxide and 50% oxygen mixture only during the operation
c. no nitrous oxide but 35-50% oxygen for 24 hours
The question of interest is whether the three ventilation methods result in a different
mean red cell folate level. The planned data analysis is ANOVA.
a. What assumptions should be satisfied to obtain a valid p-value from ANOVA?
Random samples (i.e. random assignment to groups here), equal variance of the
red cell folate levels in each group, and either normally distributed levels in
each group or sufficiently large samples so that the sample means can be
assumed to be approximately normally distributed.
b. If you had all the data, what graphical and numerical examinations would you
make to check for violations of the assumptions?
Could look at separate normal QQ plots for each group, although these samples
are really too small for this. As a rough check on equal variance, the three group
SDs should be within a factor of 2 of each other.
c. What null hypothesis is being tested? That the mean red cell folate levels are
equal for the three groups.
d. Use the ANOVA table below (obtained from R) to determine whether the null is
rejected at the 5% level. Interpret the result.
The p-value of the F statistic (.044) is less than .05, so reject the null hypothesis.
There is evidence that at least one group mean differs from the others.
> redcell.aov <- aov(Folate ~ Group)
> summary(redcell.aov)
            Df Sum Sq Mean Sq F value  Pr(>F)
Group        2  15516    7758  3.7113 0.04359 *
Residuals   19  39716    2090
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
e. Explain why performing a joint test (ANOVA) is preferable to performing several
pair-wise tests.
The more tests that are done, the larger the overall probability of a Type I error
(falsely rejecting a true null).
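Two of the numbers in this question can be reproduced by hand. The sketch below recomputes the F statistic and p-value from the sums of squares and degrees of freedom in the R output, then shows how the overall Type I error rate grows with the number of tests. It rests on two labeled assumptions: the closed-form p-value is valid only when the numerator has 2 df (as here), and the familywise calculation treats the tests as independent, which pairwise comparisons on shared groups are not, so it is a rough guide only. The function names are hypothetical.

```python
def anova_f_test(ss_group, df_group, ss_resid, df_resid):
    """F statistic and p-value from an ANOVA table (numerator df must be 2)."""
    if df_group != 2:
        raise ValueError("closed-form p-value below requires 2 numerator df")
    ms_group = ss_group / df_group
    ms_resid = ss_resid / df_resid
    f_stat = ms_group / ms_resid
    # For F(2, d): P(F > f) = (1 + 2 f / d) ** (-d / 2).
    p_value = (1 + 2 * f_stat / df_resid) ** (-df_resid / 2)
    return f_stat, p_value


def familywise_error(alpha, n_tests):
    """Chance of at least one false rejection, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    f_stat, p_value = anova_f_test(15516, 2, 39716, 19)
    print("F =", round(f_stat, 4), "p =", round(p_value, 5))
    # Three groups give three pairwise comparisons.
    print("familywise error for 3 tests:", round(familywise_error(0.05, 3), 4))
```

The recomputed F can differ from R's value in the last digit because the table shows rounded sums of squares.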
4. A new drug is being developed for use in the treatment of skin cancer. It is hoped that it
will be effective on a majority of those patients on whom it is tested. The company
developing the drug wants to get statistical evidence to support such a claim, and tests the
drug on 25 people, finding that it is effective for 15 of them.
a. What is the parameter of interest? The population proportion p of skin cancer
patients for whom the treatment would be effective.
b. What assumptions should be satisfied in order to make a CI for the parameter?
Unknown population parameter, random sample from the population of interest,
and sufficiently large sample so that the CLT holds (i.e. the sample proportion is
approximately normally distributed).
c. If appropriate, make an approximate 95% CI for the parameter. If not, explain
why not. Since there are at least 10 successes and 10 failures, the CLT should
provide a good approximation. 15/25 = .6, so the CI is .6 +/- 2*sqrt(.6*.4/25) (or
use 1.96 instead of 2), i.e. about .6 +/- .2, or roughly (.40, .80).
d. Set up the null and alternative hypotheses for testing whether the finding is
statistically significant. NULL H: p = .5; ALT A: p > .5
e. Carry out the hypothesis test, giving the value of the test statistic and its p-value.
Is the result significant at the 10% level? The TS is z = (.6 – .5)/sqrt(.5*.5/25) =
1. The p-value is P(Z > 1) = 16%, which is greater than 10%; do not reject the null.
f. If the treatment will really work for a majority of patients in the population, what
might the company do to strengthen its evidence? The company should carry out
a larger trial, preferably using a control and randomizing patients to groups.
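The interval in part c and the test in part e can be checked numerically. This is a minimal sketch using only the normal approximation described above; the helper names are hypothetical, and the standard normal tail probability is computed from the complementary error function.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def approx_ci(successes, n, z_crit=1.96):
    """Normal-approximation CI: p_hat +/- z_crit * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def one_sided_z_test(successes, n, p0):
    """z statistic and p-value for H: p = p0 against A: p > p0."""
    p_hat = successes / n
    # The standard error uses p0 because the test is carried out under the null.
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, normal_upper_tail(z)


if __name__ == "__main__":
    low, high = approx_ci(15, 25)
    print("95% CI:", round(low, 3), "to", round(high, 3))
    z, p_value = one_sided_z_test(15, 25, 0.5)
    print("z =", round(z, 3), "p-value =", round(p_value, 3))
```

Note that the interval uses the sample proportion in its standard error while the test uses the null value .5, which is why the two standard errors differ slightly.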