Comprehensive Take-Home Exam STAT 210 (83 pts.) SOLUTIONS

1 – Risk Factors for Low Birth Weight

These data come from a study of infant birth weights. Two of the factors of interest to researchers are maternal smoking during pregnancy and the age of the mother, and the role these factors might play in the birth weight of the infant. It is believed that smoking mothers and mothers over the age of 35 are at increased risk of having an infant with low birth weight, defined as a birth weight below 6 lbs. Data from this study are contained in the file Lowbirthweight.JMP. The variables in the data file are as follows:

In Parts (a) – (c) you will examine whether or not women over 35 years of age and women who smoke during pregnancy have a significantly greater chance of having an infant with low birth weight.

a) Find the following conditional probabilities: (4 pts.)

P(LBW=Yes|Over 35=Yes) = .1186
P(LBW=Yes|Over 35=No) = .0549
P(LBW=Yes|Smoker=Yes) = .0936
P(LBW=Yes|Smoker=No) = .0341

b) Use the conditional probabilities from part (a) to find the Relative Risk (RR) of having an infant with low birth weight associated with both of these potential risk factors. Give the correct interpretations of both of these RR’s. (4 pts.)

RR for smoking mothers = .0936/.0341 = 2.74. Smokers are 2.74 times more likely to have an infant with a low birth weight than non-smokers.

RR for over 35 mothers = .1148/.0549 = 2.10. Mothers over 35 years of age are 2.10 times more likely to have an infant with a low birth weight than mothers under 35.

c) Use an appropriate inferential method to determine if these two risk factors are statistically significant. Specifically, test whether or not having the risk factor increases the chance of having an infant with a low birth weight. Summarize your findings, including any appropriate computer output you used. (8 pts.)

See Fisher’s Exact Test results above.

Notation: $p_i$ = the proportion of mothers in the population that have infants with low birth weights.
You can also think of this as the probability or chance that a mother in the population has an infant with low birth weight.

Smoking:
$H_o: p_{\text{Smokers}} = p_{\text{Non-smokers}}$
$H_a: p_{\text{Smokers}} > p_{\text{Non-smokers}}$
From Fisher’s Exact Test we have a p-value = .0011. We have strong evidence to suggest that women who smoke during pregnancy are at increased risk of having an infant with a low birth weight.

Over 35:
$H_o: p_{\text{Over 35}} = p_{\text{Under 35}}$
$H_a: p_{\text{Over 35}} > p_{\text{Under 35}}$
From Fisher’s Exact Test we have a p-value = .0551. We have weak evidence to suggest that women who are over 35 years of age are at increased risk of having an infant with a low birth weight. Using a significance level of .05 we would retain the null hypothesis and fail to conclude that having a child past the age of 35 presents an increased risk of having an infant with a low birth weight.

In Parts (d) – (f) you will be comparing the actual birth weights of infants.

d) Construct a comparative display that shows the birth weights plotted vs. smoking status of the mother. Obtain summary statistics (mean, median, quantiles, SD, etc.) for the birth weights of infants from both groups. How do the birth weights compare on the basis of these statistics? (3 pts.)

Smokers vs. Non-Smokers
The sample mean birth weight for non-smoking mothers is 7.73 lbs. with a standard deviation of 1.05 lbs., compared to a mean of 7.24 lbs. and standard deviation of 1.08 lbs. for smoking mothers. The sample means suggest non-smoking mothers have babies that are on average about .5 lbs. heavier than babies born to smoking mothers. The variation in the observed birth weights for both groups is very similar. Birth weights for both groups are approximately normal.

Over 35 years of age vs. Under 35 years of age
The sample mean birth weight for mothers under the age of 35 is 7.53 lbs. with a standard deviation of 1.08 lbs., compared to a sample mean of 7.37 lbs. and standard deviation of 1.22 lbs. for mothers over 35 years of age.
The sample means suggest mothers under the age of 35 have babies that are on average about .16 lbs. heavier than babies born to mothers over 35. The variation in the observed birth weights for both groups is very similar. Birth weights for both groups are approximately normal.

e) Conduct a test to determine if smoking mothers have significantly smaller infants on average when compared to mothers who did not smoke during pregnancy. Summarize your findings. (4 pts.)

Because the sample standard deviations for the risk vs. non-risk group in both cases are very similar, we will use a pooled t-test to compare the population means. (Note: you can formally test the equality of variance assumption by selecting the Unequal Variances option in JMP, see handout.)

Smokers vs. Non-smokers:
$H_o: \mu_{\text{Non-smokers}} = \mu_{\text{Smokers}}$
$H_a: \mu_{\text{Non-smokers}} > \mu_{\text{Smokers}}$
t = 5.977 and p-value < .0001
We have extremely strong evidence to suggest that the mean birth weight of infants born to non-smoking mothers exceeds that for smoking mothers (p < .0001).

Over 35 vs. Under 35:
$H_o: \mu_{\text{Under 35}} = \mu_{\text{Over 35}}$
$H_a: \mu_{\text{Under 35}} > \mu_{\text{Over 35}}$
t = 1.107 and p-value = .1344
We have insufficient evidence to suggest that the mean birth weight of infants born to mothers under 35 years of age exceeds that for mothers over 35 years of age (p = .1344).

f) Construct a 95% CI for the difference in mean birth weight for infants born to smoking vs. non-smoking mothers. Interpret this interval. (3 pts.)

Non-smokers vs. Smokers:
A 95% CI for $(\mu_{\text{Non-smokers}} - \mu_{\text{Smokers}})$ is given by the interval (.33 lbs., .65 lbs.). We estimate that the mean birth weight of infants born to non-smoking mothers exceeds the mean birth weight of infants born to smoking mothers by between .33 and .65 lbs. (with 95% confidence, i.e. there is a 95% chance that this interval covers the true difference in these population means).

Under 35 vs. Over 35:
A 95% CI for $(\mu_{\text{Under 35}} - \mu_{\text{Over 35}})$ is given by the interval (-.13 lbs., .46 lbs.).
We estimate that the mean birth weight of infants born to mothers under the age of 35 could be between .13 lbs. lower and .46 lbs. higher than the mean birth weight of infants born to mothers over the age of 35. We have no evidence to suggest that the mean birth weights of infants born to these two populations of women significantly differ.

g) Repeat parts (d) – (f) using the age over 35 as the potential risk factor. Briefly summarize your findings. (5 pts.)

See above.

2 – WSU Student Survey Results

Two of the variables examined in a recent WSU student survey regarded smoking and drinking (alcohol) habits. Specifically, amongst students who smoke: How many cigarettes do they smoke per day? And amongst students who drink alcohol: How many drinks do they have per drinking episode? Which has more variation: the number of cigarettes smoked per day by WSU students who smoke, or the number of drinks per episode consumed by WSU students who regularly drink? Explain. (3 pts.)

Cigarettes per day: $CV_{\text{cigs}} = \frac{s}{\bar{x}} \times 100\% = \frac{7.68}{10.14} \times 100\% = 75.74\%$

Drinks per episode: $CV_{\text{drinks}} = \frac{s}{\bar{x}} \times 100\% = \frac{4.16}{7.07} \times 100\% = 58.84\%$

The variation in the number of cigarettes smoked per day by the smokers sampled exceeds the variation in the number of drinks per episode by the drinkers sampled.

3 – Effectiveness of a New Reading Program

A new reading program may reduce the number of elementary students who read below grade level. The company that developed this program supplied materials and teacher training for a large-scale test involving nearly 8500 children in several different school districts. Statistical analysis of the results showed the percentage of students who did not attain the grade level standard was reduced from 15.9% to 15.1%. The hypothesis that the new reading program produced no improvement was rejected with a p-value of .023.

a) Explain what the p-value means in this context. (2 pts.)
This is the probability that we obtain a sample percentage of 15.1% or less by chance variation alone if in fact the actual percentage of students reading below grade level was still 15.9%. It is the probability we would see this drop in percent by dumb luck alone if in fact the new reading program has no “effect”.

b) Even though this reading method has been shown to be significantly better, why might you not recommend that your local school adopt it? Explain. (2 pts.)

With a sample of 8500 children, even a very small change in the percent of students reading below grade level will be statistically significant. The cost of training teachers and buying the reading materials is unlikely to be worth it for such a minimal change in the percentage of students reading below the desired level.

4 – Salaries of Minnesota Teachers

Analysis of the salaries of a sample of 288 Minnesota teachers produced the following 90% confidence interval for the mean teacher salary in Minnesota.

90% CI for Mean MN Teacher Salary ($38,944 , $42,893)

Which conclusion below is correct? What’s wrong with the others? (1 pt. each)

a) If we took many random samples of MN teachers, about 9 out of 10 of them would produce this confidence interval.

FALSE. The sample mean, $\bar{x}$, and sample standard deviation, $s$, are going to vary from sample to sample. In general each sample of n = 288 MN teachers will produce a different CI.

b) If we took many samples of MN teachers, about 9 out of 10 of them would produce a confidence interval that would cover the true mean salary of all MN teachers.

TRUE. This is exactly what we mean by 90% confidence when we are constructing confidence intervals. If you were to take 10 samples from a population and construct a 90% confidence interval for the mean from each one, you would expect 9 of them to cover the true population mean and 1 of them to miss the population mean.
The problem is we don’t know if any given interval actually covers the mean we are trying to estimate; we just know that the process of constructing a CI will produce an interval that covers the population mean 90% of the time. We normally think of this as meaning there is a 90% chance that a CI we construct from our sample will cover the parameter we are trying to estimate.

c) About 9 out of 10 MN teachers earn between $38,944 and $42,893.

FALSE. This is a confidence interval for the mean. If you wanted an interval that covered approximately 90% of all teachers’ salaries in MN you would have to have a much wider interval. For example, suppose we assumed that MN teachers’ salaries were approximately normally distributed; then the interval $\bar{x} \pm 1.645s$ would cover approximately 90% of the teachers’ salaries state-wide. Look back at the Empirical Rule discussion from Chapter 2 to get a better feel for this.

d) About 9 out of 10 of the teachers surveyed earn between $38,944 and $42,893.

FALSE, same reason as (c) above.

e) We are 90% confident that the average teacher salary in the United States is between $38,944 and $42,893.

FALSE. The pay scale for MN teachers is not the same as the pay scale for teachers across the U.S. There are states that pay their teachers substantially more and many states where teachers earn less. If you wanted a 90% CI for the mean salary of teachers in the U.S. you would need to take a random sample of teachers from across the U.S., not just from MN.

5 – Vaccination Rates for Children in the United States

Public health officials believe that 90% of children have been vaccinated against measles. A random survey of medical records at many schools across the country found that among the more than 13,000 children only 89.4% had been vaccinated. A statistician would reject the 90% hypothesis with a p-value of .011.

a) Explain what the p-value means in this context. (2 pts.)
If the vaccination rate was still 90%, this is the probability that, for a sample of 13,000 children, we would find a sample percentage of 89.4% or less. As this is not very likely to occur, we must logically conclude that the vaccination rate is no longer 90% and has gone down. Realize of course that this conclusion could be wrong. However, with a probability this small that chance variation produced a percentage this far below 90% based upon a sample of 13,000 children, we have fairly strong evidence that the measles vaccination rate has decreased.

b) The result is statistically significant, but is it important? Explain. (2 pts.)

A change this small in the percentage of children being vaccinated for measles is not likely to be of practical importance.

6 – Number of Eggs per Duck Nest

A random sample of n = 9 duck nests in Mississippi Wildlife Refuge yielded the following data for the number of eggs in the nest:

13 11 8 6 6 4 7 9 8

a) Find the sample mean (2 pts.)

$\bar{x} = 8$ eggs

b) Find the sample median (2 pts.)

$m = 8$ eggs

c) Find the sample mode (1 pt.)

6 and 8 eggs. The mode need not be unique.

d) Find the sample variance and sample standard deviation (6 pts.)

$s^2 = \frac{\sum_{i=1}^{9} (x_i - \bar{x})^2}{9 - 1} = \frac{(13-8)^2 + (11-8)^2 + \cdots + (9-8)^2 + (8-8)^2}{8} = 7.5 \text{ eggs}^2$

$s = \sqrt{s^2} = \sqrt{7.5} = 2.74$ eggs

e) Find the standard error of the mean (1 pt.)

$SE(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{2.74}{\sqrt{9}} = .913$ eggs

f) Construct and interpret a 90% CI for $\mu$ = mean number of eggs per duck nest in the Mississippi Wildlife Refuge. (4 pts.)

$\bar{x} \pm (t\text{-table}) \cdot SE(\bar{x}) = 8 \pm (1.86)(.913) = 8 \pm 1.70 = (6.3, 9.7)$

We estimate that the mean number of eggs per duck nest in Mississippi Wildlife Refuge is somewhere between 6.3 eggs and 9.7 eggs with 90% confidence.

g) Explain why you think this CI is not appropriate based upon assumptions that might be violated. (2 pts.)

The random variable we are examining, X = # of eggs per duck nest, is discrete. It cannot possibly have a normal distribution, which is a required assumption when using the t-distribution for making inferences about the mean of a population.
This normality assumption is less critical when our sample size is “large”, but here n = 9, which cannot be viewed as a large sample.

7 – Body Mass Index and Anorexia Nervosa

These data come from a study of women who are receiving treatment for anorexia nervosa. One measure of the effectiveness of the treatment is weight gain, which in this analysis will be represented by the change in body mass index (BMI).

Data File: Anorexia-bodymass.JMP

a) Is there evidence to suggest that for anorexic women receiving this treatment there is an increase in their mean body mass index (BMI)? (4 pts.)

7. a) & b) Let $d = \text{BMI}_{\text{discharge}} - \text{BMI}_{\text{admission}}$
$H_o: \mu_d = 0$
$H_a: \mu_d > 0$
$\bar{d} = 3.301$, $s_d = 3.59$, t = 4.11, p-value = .0003
We have very strong evidence to suggest that the mean change in BMI is positive. In particular we estimate that the mean increase in the BMI of patients is between 1.62 and 4.98 units with 95% confidence.

b) Estimate with 95% confidence the mean change in their BMI. Interpret this interval in practical terms. (4 pts.)

c) Is there evidence to suggest that the BMI at discharge significantly differs from the preferred BMI determined at the start of treatment? (4 pts.)

7. c) Let $d = \text{BMI}_{\text{discharge}} - \text{BMI}_{\text{preferred}}$
Note: A negative difference indicates that the patient did not reach their preferred BMI during the course of treatment.
$H_o: \mu_d = 0$
$H_a: \mu_d \neq 0$
$\bar{d} = .415$, $s_d = 3.90$, t = .476 and p-value = .6393
We have no evidence to suggest that the mean difference between the discharge BMI and the preferred BMI for patients in this treatment program is different from 0. On average patients appear to be meeting their preferred BMI at the time of discharge.

d) True or False. All women in this study were at or above the preferred BMI at the time of discharge. Explain. (2 pts.)

FALSE. There are several patients for whom the difference $d = \text{BMI}_{\text{discharge}} - \text{BMI}_{\text{preferred}}$ is negative.

e) Estimate with 95% confidence the percent increase in the BMI for women receiving this treatment.
Interpret this interval in practical terms. (4 pts.)

$\%\text{ Improvement in BMI} = \frac{\text{BMI increase}}{\text{BMI admission}} \times 100\%$

We estimate that the mean percent increase in BMI, for the population of anorexia patients who undergo treatment in this program, is between 11.56% and 36.32% (with 95% confidence).
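
As a cross-check on the hand calculations in Problem 6 (duck nests), the summary statistics and the 90% CI can be reproduced in a few lines. This is only a sketch in Python using the standard library; the exam itself works by hand and in JMP, so the variable names here are illustrative and not part of the original solutions. The t-table value 1.86 (df = 8) is copied from the solution to part (f) rather than computed.

```python
import math
import statistics

# Egg counts for the n = 9 nests listed in Problem 6.
eggs = [13, 11, 8, 6, 6, 4, 7, 9, 8]

n = len(eggs)
mean = statistics.mean(eggs)                 # part (a): sample mean
median = statistics.median(eggs)             # part (b): sample median
modes = sorted(statistics.multimode(eggs))   # part (c): all modes, need not be unique
variance = statistics.variance(eggs)         # part (d): sample variance, divisor n - 1
sd = math.sqrt(variance)                     # part (d): sample standard deviation
se = sd / math.sqrt(n)                       # part (e): standard error of the mean

# Part (f): 90% CI using the t-table value quoted in the solution (df = n - 1 = 8).
T_TABLE = 1.86
margin = T_TABLE * se
ci = (mean - margin, mean + margin)

print(f"mean = {mean}, median = {median}, modes = {modes}")
print(f"s^2 = {variance:.2f}, s = {sd:.2f}, SE = {se:.3f}")
print(f"90% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

The code carries full precision through each step instead of rounding s to 2.74 first; the results agree with the hand calculation to the precision reported. As part (g) points out, the interval still depends on a normality assumption that a small sample of counts cannot really satisfy.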