Practice Exam (weeks 7–11) – Sample Solutions

Attempt all questions. You must support all answers with reasons – correct answers with incorrect or missing reasons will receive NO CREDIT.

1. A study is conducted to investigate depression in adolescents. Among the factors considered are worry and satisfaction with the immediate environment. High scores indicate high levels of depression, worry, or satisfaction. The following statements are made:
   ‘Depression is positively correlated with worry, r = .3.’
   ‘Depression is negatively correlated with satisfaction, r = -.36.’
   ‘Satisfaction and worry scores are negatively correlated, r = -.16.’

a. Is this study observational or experimental?
   Observational; individuals are not assigned to groups by researchers.

b. Sketch scatterplots to illustrate what you expect the data to look like in each case.
   [Figure: three scatterplots of yy against xx, with panel titles r = .3, r = -.36, and r = -.16.]

c. A friend who does not know anything about statistics asks you to interpret these statements in a practical sense. What would you say?
   As worry increases, there is a tendency for depression to increase (on average). As satisfaction increases, there is a tendency for depression to decrease. As worry increases, there is a tendency for satisfaction to decrease.

2. If gene frequencies are in equilibrium, the genotypes AA, Aa, and aa occur with probabilities (1-θ)², 2θ(1-θ), and θ², respectively. The following data were published on haptoglobin type in a sample of 190 people:

   Haptoglobin type   Hp1-1   Hp1-2   Hp2-2
   Count                 10      68     112

The MLE of θ is about .77. Test the goodness of fit of the data to the hypothesis of equilibrium.
   Under the null, with θ = .77, the expected counts are 190(1-θ)² = 10.051, 190·2θ(1-θ) = 67.298, and 190θ² = 112.651. The test statistic is χ² = (10 - 10.051)²/10.051 + (68 - 67.298)²/67.298 + (112 - 112.651)²/112.651 = .01. The p-value is obtained from the χ² distribution with 3 - 1 - 1 = 1 df (subtract an extra df for estimating one parameter).
   The p-value is quite large, about .92, so do not reject the null hypothesis; the data are consistent with Hardy-Weinberg equilibrium.

3. Twenty-two patients undergoing cardiac bypass surgery were randomized to one of three ventilation groups:
   a. 50% nitrous oxide and 50% oxygen mixture continuously for 24 hours
   b. 50% nitrous oxide and 50% oxygen mixture only during the operation
   c. no nitrous oxide but 35-50% oxygen for 24 hours
The question of interest is whether the three ventilation methods result in different mean red cell folate levels. The planned data analysis is ANOVA.

a. What assumptions should be satisfied to obtain a valid p-value from ANOVA?
   Random samples (i.e. random assignment to groups here), equal variance of the red cell folate levels in each group, and either normally distributed levels in each group or samples large enough that the sample means can be assumed to be normally distributed.

b. If you had all the data, what graphical and numerical examinations would you make to check for violations of the assumptions?
   You could look at separate normal QQ plots for each group, although these samples are really too small for this. You should check that the 3 SDs are within a factor of 2 of each other.

c. What null hypothesis is being tested?
   That the mean red cell folate levels are equal for the three groups.

d. Use the ANOVA table below (obtained from R) to determine whether the null is rejected at the 5% level. Interpret the result.
   The p-value of the F statistic (0.04359) is less than 5%, so reject the null hypothesis. At least one group mean is different from the others.

   > redcell.aov <- aov(Folate ~ Group)
   > summary(redcell.aov)
               Df Sum Sq Mean Sq F value  Pr(>F)
   Group        2  15516    7758  3.7113 0.04359 *
   Residuals   19  39716    2090
   ---
   Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

e. Explain why performing a joint test (ANOVA) is preferable to performing several pair-wise tests.
   The more tests that are done, the larger the overall probability of a Type I error (falsely rejecting a true null). A single joint test keeps that probability at the stated level.

4. A new drug is being developed for use in the treatment of skin cancer. It is hoped that it will be effective on a majority of those patients on whom it is tested. The company developing the drug wants to get statistical evidence to support such a claim, and tests the drug on 25 people, finding that it is effective for 15 of them.

a. What is the parameter of interest?
   The population proportion p of skin cancer patients for whom the treatment would be effective.

b. What assumptions should be satisfied in order to make a CI for the parameter?
   Unknown population parameter, random sample from the population of interest, and a sample large enough that the CLT holds (i.e. the sample proportion is approximately normally distributed).

c. If appropriate, make an approximate 95% CI for the parameter. If not, explain why not.
   Since there are at least 10 successes (15) and 10 failures (10), the CLT should provide a good approximation. The sample proportion is 15/25 = .6, so the CI is .6 +/- 2*sqrt(.6*.4/25) (or use 1.96 instead of 2), which is about .6 +/- .2, i.e. roughly (.40, .80).

d. Set up the null and alternative hypotheses for testing whether the finding is statistically significant.
   NULL H: p = .5; ALT A: p > .5

e. Carry out the hypothesis test, giving the value of the test statistic and its p-value. Is the result significant at the 10% level?
   The test statistic is z = (.6 - .5)/sqrt(.5*.5/25) = 1. The p-value is P(Z > 1) = 16%, which is larger than 10%, so do not reject the null; the result is not significant at the 10% level.

f. If the treatment will really work for a majority of patients in the population, what might the company do to strengthen its evidence?
   The company should carry out a larger trial, preferably using a control and randomizing patients to groups.
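
The arithmetic in questions 2, 3d, and 4 can be checked numerically. The following is a minimal sketch in Python using only the standard library (the exam itself uses R); the function names and argument layouts are illustrative choices, not part of the original solutions. The 1-df chi-square tail probability is computed from the identity P(X > x) = erfc(sqrt(x/2)), which holds only for 1 degree of freedom.

```python
from math import erfc, sqrt
from statistics import NormalDist


def hardy_weinberg_gof(counts, theta):
    """Chi-square goodness-of-fit statistic for question 2.

    counts are ordered to match the probabilities
    (1-theta)^2, 2*theta*(1-theta), theta^2.
    Assumes exactly one estimated parameter, so df = 3 - 1 - 1 = 1.
    """
    n = sum(counts)
    probs = [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]
    expected = [n * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Upper tail of a chi-square with 1 df (valid for df = 1 only).
    p_value = erfc(sqrt(chi2 / 2))
    return expected, chi2, p_value


def anova_f(ss_group, df_group, ss_resid, df_resid):
    """F statistic for question 3d: ratio of the two mean squares."""
    return (ss_group / df_group) / (ss_resid / df_resid)


def proportion_ci_and_test(successes, n, p0, z_crit=1.96):
    """Normal-approximation CI and one-sided z test for question 4."""
    p_hat = successes / n
    # The CI uses the estimated standard error sqrt(p_hat*(1-p_hat)/n).
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    # The test uses the standard error under the null, sqrt(p0*(1-p0)/n).
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # upper tail for ALT: p > p0
    return (p_hat - half_width, p_hat + half_width), z, p_value


if __name__ == "__main__":
    print(hardy_weinberg_gof([10, 68, 112], 0.77))
    print(anova_f(15516, 2, 39716, 19))
    print(proportion_ci_and_test(15, 25, 0.5))
```

Run as a script, this should reproduce the figures quoted in the solutions up to rounding: a chi-square statistic of about .01 with a p-value of about .92, an F statistic of about 3.71, and for question 4 a z statistic of 1 with a one-sided p-value of about 16%.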