take home exam solutions - Winona State University

Comprehensive Take-Home Exam STAT 210 (83 pts.)
SOLUTIONS
1 – Risk Factors for Low Birth Weight
These data come from a study of infant birth weights. Two of the factors of interest to
researchers are maternal smoking during pregnancy and age of the mother and the role
these factors might play in the birth weight of the infant. It is believed that smoking
mothers and mothers over the age of 35 are at increased risk of having an infant with low
birth weight, which is a birth weight below 6 lbs.
Data from this study are contained in the file Lowbirthweight.JMP. The variables in the
data file are as follows:
In Parts (a) – (c) you will examine whether or not women over 35 years of age and
women who smoke during pregnancy have a significantly greater chance of having an
infant with low birth weight.
a) Find the following conditional probabilities: (4 pts.)
P(LBW=Yes | Over 35=Yes) = .1148    P(LBW=Yes | Over 35=No) = .0549
P(LBW=Yes | Smoker=Yes) = .0936     P(LBW=Yes | Smoker=No) = .0341
b) Use the conditional probabilities from part (a) to find the Relative Risk (RR) of
having an infant with low birth weight associated with both of these potential risk factors.
Give the correct interpretations of both of these RR’s. (4 pts.)
RR for smoking mothers = .0936/.0341 = 2.74; smokers are 2.74 times as likely to
have an infant with a low birth weight as non-smokers.
RR for over 35 mothers = .1148/.0549 = 2.10; mothers over 35 years of age are 2.10
times as likely to have an infant with a low birth weight as mothers under 35.
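The relative risk arithmetic in part (b) can be checked with a few lines of Python. This is only a sketch: the function name `relative_risk` is an illustrative choice, and the inputs are the rounded conditional probabilities used in the RR calculations above.

```python
def relative_risk(p_exposed, p_unexposed):
    """Ratio of the outcome probability with the risk factor to that without it."""
    return p_exposed / p_unexposed

# Rounded conditional probabilities taken from the RR calculations above
rr_smoking = relative_risk(0.0936, 0.0341)
rr_over35 = relative_risk(0.1148, 0.0549)

print(f"RR for smoking mothers: {rr_smoking:.2f}")
print(f"RR for over 35 mothers: {rr_over35:.2f}")
```

With these rounded inputs the second ratio comes out near 2.09, which differs from the reported 2.10 only in the last digit, as can happen when the inputs themselves are rounded.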
c) Use an appropriate inferential method to determine if these two risk factors are
statistically significant, specifically test whether or not having the risk factor increases
the chance of having an infant with a low birth weight. Summarize your findings
including any appropriate computer output you used. (8 pts.)
See Fisher’s Exact Test results above.
Notation: $p_i$ = the proportion of mothers in population $i$ that have infants with
low birth weights. You can also think of this as the probability or chance that a
mother in the population has an infant with low birth weight.
Smoking:
$H_o: p_{Smokers} = p_{Nonsmokers}$
$H_a: p_{Smokers} > p_{Nonsmokers}$
From Fisher’s Exact Test we have a p-value=.0011. We have strong evidence to
suggest that women who smoke during pregnancy are at increased risk of having an
infant with a low birth weight.
Over 35:
$H_o: p_{Over35} = p_{Under35}$
$H_a: p_{Over35} > p_{Under35}$
From Fisher’s Exact Test we have a p-value=.0551. We have weak evidence to
suggest that women who are over 35 years of age are at increased risk of having an
infant with a low birth weight. Using a significance level $\alpha = .05$ we would retain the
null hypothesis and fail to conclude that having a child past the age of 35 presents
an increased risk of having an infant with a low birth weight.
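The p-values above come from JMP output, and the underlying 2x2 counts are not reproduced in this text. As a rough sketch of what a one-sided Fisher's Exact Test computes, the function below sums hypergeometric probabilities for tables at least as extreme as the observed one. The function name and the tiny table in the example are hypothetical toy values chosen for illustration; they are not the study's data.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: risk factor present / absent. Columns: low birth weight yes / no.
    Returns P(X >= a) under the hypergeometric null with all margins fixed.
    """
    row1 = a + b          # number of mothers with the risk factor
    col1 = a + c          # number of low birth weight infants
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, col1)

# TOY counts for illustration only (not from Lowbirthweight.JMP)
p_toy = fisher_exact_greater(3, 1, 1, 3)
print(f"one-sided p-value for the toy table: {p_toy:.4f}")
```

A small p-value from this calculation indicates that a table with so many low birth weight infants in the risk group would be unusual if the two groups had the same underlying proportion.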
In Parts (d) – (f) you will be comparing the actual birth weights of infants.
d) Construct a comparative display that shows the birth weights plotted vs. smoking
status of the mother. Obtain summary statistics (mean, median, quantiles, SD, etc.) for
the birth weights of infants from both groups. How do the birth weights compare on the
basis of these statistics? (3 pts.)
Smokers vs. Non-Smokers
The sample mean birth weight for non-smoking mothers is 7.73 lbs. with a standard
deviation of 1.05 lbs. compared to a mean of 7.24 lbs. and standard deviation of 1.08
lbs. for smoking mothers. The sample means suggest non-smoking mothers have
babies that are on average about .5 lbs. heavier than babies born to smoking
mothers. The variation in the observed birth weights for both groups is very
similar. Birth weights for both groups are approximately normal.
Over 35 years of age vs. Under 35 years of age
The sample mean birth weight for mothers under the age of 35 is 7.53 lbs. with a
standard deviation of 1.08 lbs. compared to a sample mean of 7.37 lbs. and standard
deviation of 1.22 lbs. for mothers over 35 years of age. The sample means suggest
mothers under the age of 35 have babies that are on average about .16 lbs. heavier
than babies born to mothers over 35. The variation in the observed birth weights for both
groups is very similar. Birth weights for both groups are approximately normal.
e) Conduct a test to determine if smoking mothers have significantly smaller infants on
average when compared to mothers who did not smoke during pregnancy. Summarize
your findings. (4 pts.)
Because the sample standard deviations for the risk vs. non-risk group in both cases
are very similar we will use a pooled t-test to compare the population means. (Note:
you can formally test the equality of variance assumption by selecting the Unequal
Variances option in JMP, see handout).
Smokers vs. Non-smokers:
$H_o: \mu_{Nonsmokers} = \mu_{Smokers}$
$H_a: \mu_{Nonsmokers} > \mu_{Smokers}$
t = 5.977 and p-value < .0001
We have extremely strong evidence to suggest that the mean birth weight of infants
born to non-smoking mothers exceeds that for smoking mothers (p < .0001).
Over 35 vs. Under 35:
$H_o: \mu_{Under35} = \mu_{Over35}$
$H_a: \mu_{Under35} > \mu_{Over35}$
t = 1.107 and p-value = .1344
We have insufficient evidence to suggest that the mean birth weight of infants born
to mothers under 35 years of age exceeds that for mothers over 35 years of age (p =
.1344).
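The t statistics above were produced by JMP from the full data set, which is not reproduced here. As a minimal sketch of how a pooled t statistic is formed, the function below combines the two sample variances into a pooled variance and divides the difference in means by its standard error. The function name and the two short lists of birth weights are hypothetical toy values, not the study data.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Pooled two-sample t statistic and degrees of freedom for mean1 - mean2."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / se
    return t, df

# TOY birth weights in lbs., for illustration only
nonsmokers = [8.0, 7.5, 7.0]
smokers = [7.0, 6.5, 6.0]

t_stat, df = pooled_t(nonsmokers, smokers)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The one-sided p-value is then the upper-tail area of the t-distribution with `df` degrees of freedom beyond `t_stat`, which JMP reports directly.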
f) Construct a 95% CI for the difference in mean birth weight for infants born to
smoking vs. non-smoking mothers. Interpret this interval. (3 pts.)
Non-smokers vs. Smokers:
A 95% CI for ($\mu_{Nonsmokers} - \mu_{Smokers}$) is given by the interval (.33 lbs. , .65 lbs.).
We estimate that the mean birth weight of infants born to non-smoking mothers
exceeds the mean birth weight of infants born to smoking mothers by between .33
and .65 lbs. (with 95% confidence, i.e. there is a 95% chance that this interval covers
the true difference in these population means.)
Under 35 vs. Over 35:
A 95% CI for ($\mu_{Under35} - \mu_{Over35}$) is given by the interval (-.13 lbs. , .46 lbs.).
We estimate that the mean birth weight of infants born to mothers under the age of
35 could be between .13 lbs. lower and .46 lbs. higher than the mean birth weight of
infants born to mothers over the age of 35. Because this interval contains 0, we have no
evidence to suggest that the mean birth weights of infants born to these two populations
of women significantly differ.
g) Repeat parts (d) – (f) using the age over 35 as the potential risk factor. Briefly
summarize your findings. (5 pts.)
See above
2 – WSU Student Survey Results
Two of the variables examined in a recent WSU student survey regarded smoking and
drinking (alcohol) habits. Specifically, amongst students who smoke:
• How many cigarettes do they smoke per day?
and amongst students who drink alcohol:
• How many drinks do they have per drinking episode?
Which has more variation: the number of cigarettes smoked per day by WSU students
who smoke or the number of drinks per episode consumed by WSU students who regularly
drink? Explain. (3 pts.)
Cigarettes per day: standard deviation = 7.68, mean = 10.14
Drinks per episode: standard deviation = 4.16, mean = 7.07
$$CV_{cigs} = \frac{7.68}{10.14} \times 100\% = 75.74\% \qquad CV_{drinks} = \frac{4.16}{7.07} \times 100\% = 58.84\%$$
The variation in the number of cigarettes smoked per day by the smokers sampled
exceeds the variation in the number of drinks per episode by the drinkers sampled.
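The coefficient of variation comparison can be verified with a short calculation. In this sketch 7.68 and 4.16 are taken as the sample standard deviations and 10.14 and 7.07 as the sample means, which is what the CV formula (standard deviation divided by mean) implies; the function name is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_cigs = coefficient_of_variation(7.68, 10.14)
cv_drinks = coefficient_of_variation(4.16, 7.07)

print(f"CV cigarettes per day: {cv_cigs:.2f}%")
print(f"CV drinks per episode: {cv_drinks:.2f}%")
```

Because the two variables are measured in different units, comparing the CVs rather than the raw standard deviations puts the variation on a common, unit-free scale.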
3 – Effectiveness of a New Reading Program
A new reading program may reduce the number of elementary students who read below
grade level. The company that developed this program supplied materials and teacher
training for a large-scale test involving nearly 8500 children in several different school
districts. Statistical analysis of the results showed the percentage of students who did not
attain the grade level standard was reduced from 15.9% to 15.1%. The hypothesis that
the new reading program produced no improvement was rejected with a p-value of .023.
a) Explain what the p-value means in this context. (2 pts.)
This is the probability that we would obtain a sample percentage of 15.1% or less by chance
variation alone if in fact the actual percentage of students reading below grade level
was still 15.9%. It is the probability we would see this drop in percent by dumb
luck alone if in fact the new reading program has no “effect”.
b) Even though this reading method has been shown to be significantly better, why might
you not recommend that your local school adopt it? Explain. (2 pts.)
With a sample of 8500 children even a very small change in the percent of students
reading below grade level will be statistically significant. The cost of training teachers
and buying the reading materials is unlikely to be worth it for such a minimal
change in the percentage of students reading below the desired level.
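The role of sample size in part (b) can be illustrated with a rough calculation. The sketch below assumes a one-sample z-test of the observed 15.1% against a null value of 15.9%, and takes n = 8500 as an approximation of "nearly 8500"; the exact test and sample size used in the study are not given, so the p-value will only be roughly in line with the reported .023. The second sample size, 850, is a hypothetical smaller study included for contrast.

```python
from math import erf, sqrt

def lower_tail_p_value(p_hat, p0, n):
    """z statistic and lower-tail p-value for H0: p = p0 vs Ha: p < p0."""
    se = sqrt(p0 * (1 - p0) / n)            # standard error under the null
    z = (p_hat - p0) / se
    p_value = 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z
    return z, p_value

results = {}
for n in (850, 8500):  # 850 is a hypothetical smaller study
    results[n] = lower_tail_p_value(0.151, 0.159, n)
    z, p = results[n]
    print(f"n = {n}: z = {z:.2f}, p-value = {p:.3f}")
```

The same 0.8 percentage point drop is far from significant with 850 children but falls below .05 with 8500, which is why statistical significance alone says little about whether the change is large enough to matter.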
4 – Salaries of Minnesota Teachers
Analysis of a sample of 288 Minnesota teachers' salaries produced the following 90%
confidence interval for the mean teacher salary in Minnesota.
90% CI for Mean MN Teacher Salary
($38,944 , $42,893)
Which conclusion below is correct? What’s wrong with the others? (1 pt. each)
a) If we took many random samples of MN teachers, about 9 out of 10 of them would
produce this confidence interval.
FALSE, the sample mean, $\bar{x}$, and sample standard deviation, $s$, are going to vary
from sample to sample. In general each sample of n=288 MN teachers will produce
a different CI.
b) If we took many samples of MN teachers, about 9 out of 10 of them would produce a
confidence interval that would cover the true mean salary of all MN teachers.
TRUE, this is exactly what we mean by 90% confidence when we are constructing
confidence intervals. If you were to take 10 samples from a population and
construct a 90% confidence interval for the mean from each one, you would expect
9 of them to cover the true population mean and 1 of them to miss the population
mean. The problem is we don’t know if any given interval actually covers the mean
we are trying to estimate, we just know that the process of constructing a CI will
produce an interval that covers the population mean 90% of the time. We normally
think of this as meaning there is a 90% chance that a CI we construct from our
sample, will cover the parameter we are trying to estimate.
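The "9 out of 10 intervals" idea in (b) can be illustrated by simulation. This is a sketch under stated assumptions: the population mean and standard deviation below are hypothetical values chosen to be in the general range of the reported interval, salaries are drawn from a normal distribution, and the normal multiplier 1.645 is used in place of the t-table value, which is nearly identical for n = 288.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(210)  # fixed seed so the run is repeatable

TRUE_MEAN = 41000  # hypothetical population mean salary
TRUE_SD = 20000    # hypothetical population standard deviation
N = 288            # sample size from the problem
Z90 = 1.645        # multiplier for 90% confidence
REPS = 1000        # number of simulated samples

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    centre = mean(sample)
    half_width = Z90 * stdev(sample) / sqrt(N)
    # Each sample gives a different interval; count how many cover the truth
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        covered += 1

coverage = covered / REPS
print(f"Share of intervals covering the true mean: {coverage:.3f}")
```

The share should land near 0.90, even though every simulated sample produces its own interval with different endpoints.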
c) About 9 out of 10 MN teachers earn between $38,944 and $42,893.
FALSE, this is a confidence interval for the mean. If you wanted an interval that
covered approximately 90% of all teachers' salaries in MN you would have to have a
much wider interval. For example, suppose we assumed that MN teachers' salaries
were approximately normally distributed, then the interval $\bar{x} \pm 1.645 s$ would cover
approximately 90% of the teachers' salaries state-wide. Look back at the Empirical
Rule discussion from Chapter 2 to get a better feel for this.
d) About 9 out of 10 of the teachers surveyed earn between $38,944 and $42,893.
FALSE, same reason as (c) above.
e) We are 90% confident that the average teacher salary in the United States is between
$38,944 and $42,893.
FALSE, the pay scale for MN teachers is not the same as the pay scale for teachers
across the U.S. There are states that pay their teachers substantially more and
many states where teachers earn less. If you wanted a 90% CI for the mean salary
of teachers in the U.S. you would need to take a random sample of teachers from across the
U.S., not just from MN.
5 – Vaccination Rates for Children in the United States
Public health officials believe that 90% of children have been vaccinated against measles.
A random survey of medical records at many schools across the country found that
among the more than 13,000 children only 89.4% had been vaccinated. A statistician
would reject the 90% hypothesis with a p-value of .011.
a) Explain what the p-value means in this context. (2 pts.)
If the vaccination rate was still 90%, this is the probability that, for a sample of
13,000 children, we would find a sample percentage of 89.4% or less. As this is not
very likely to occur we must logically conclude that the vaccination rate is no longer
90% and has gone down.
Realize of course that this conclusion could be wrong. However, with a probability
this small that chance variation produced a percentage this far below 90% based
upon a sample of 13,000 children, we have fairly strong evidence that the measles
vaccination rate has decreased.
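The reported p-value can be approximately reproduced with a one-sample z-test for a proportion. This sketch assumes n = 13,000 exactly (the problem says "more than 13,000") and a lower-tail alternative, so it is a plausibility check rather than the exact calculation behind the reported figure.

```python
from math import erf, sqrt

p0 = 0.90      # hypothesized vaccination rate
p_hat = 0.894  # observed sample proportion
n = 13000      # assumption: the problem only says "more than 13,000"

se = sqrt(p0 * (1 - p0) / n)            # standard error under the null
z = (p_hat - p0) / se
p_value = 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z (lower tail)

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Under these assumptions the lower-tail p-value agrees with the reported .011 to three decimal places.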
b) The result is statistically significant, but is it important? Explain. (2 pts.)
A change in the percentage of children being vaccinated for measles this small is not
likely to be of practical importance.
6 – Number of Eggs per Duck Nest
A random sample of n = 9 duck nests in Mississippi Wildlife Refuge yielded the
following data for the number of eggs in the nest:
13 11 8 6 6 4 7 9 8
a) Find the sample mean (2 pts.) $\bar{x} = 8$ eggs
b) Find the sample median (2 pts.) $m = 8$ eggs
c) Find the sample mode (1 pt.) 6 and 8 eggs. The mode need not be unique.
d) Find the sample variance and sample standard deviation (6 pts.)
$$s^2 = \frac{\sum_{i=1}^{9} (x_i - \bar{x})^2}{9 - 1} = \frac{(13-8)^2 + (11-8)^2 + \cdots + (9-8)^2 + (8-8)^2}{8} = 7.5 \text{ eggs}^2$$
$$s = \sqrt{s^2} = \sqrt{7.5} = 2.74 \text{ eggs}$$
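The hand calculations in parts (a) through (d) can be checked with Python's standard `statistics` module, using the nine egg counts from the problem. This is simply a verification sketch.

```python
from statistics import mean, median, multimode, variance, stdev

eggs = [13, 11, 8, 6, 6, 4, 7, 9, 8]

x_bar = mean(eggs)        # sample mean
m = median(eggs)          # sample median
modes = multimode(eggs)   # all values tied for most frequent
s2 = variance(eggs)       # sample variance (divides by n - 1)
s = stdev(eggs)           # sample standard deviation

print(f"mean = {x_bar}, median = {m}, modes = {sorted(modes)}")
print(f"variance = {s2}, standard deviation = {s:.2f}")
```

Note that `variance` and `stdev` use the n - 1 divisor, matching the formula above, whereas `pvariance` and `pstdev` would divide by n.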
e) Find the standard error of the mean (1 pt.)
$$SE(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{2.74}{\sqrt{9}} = .913 \text{ eggs}$$
f) Construct and interpret a 90% CI for $\mu$ = mean number of eggs per duck nest in the
Mississippi Wildlife Refuge. (4 pts.)
$$\bar{x} \pm (t\text{-table}) \, SE(\bar{x}) = 8 \pm (1.86)(.913) = 8 \pm 1.70 = (6.3, 9.7)$$
We estimate that the mean number of eggs per duck nest in Mississippi Wildlife
Refuge is somewhere between 6.3 eggs and 9.7 eggs with 90% confidence.
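The standard error and the 90% confidence interval can be reproduced from the raw data as follows. The multiplier 1.860 is the t-table value for 90% confidence with 8 degrees of freedom, entered by hand here rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

eggs = [13, 11, 8, 6, 6, 4, 7, 9, 8]
T_CRIT = 1.860  # t-table value for 90% confidence, df = n - 1 = 8

n = len(eggs)
x_bar = mean(eggs)
se = stdev(eggs) / sqrt(n)   # standard error of the mean
margin = T_CRIT * se         # margin of error
ci = (x_bar - margin, x_bar + margin)

print(f"SE = {se:.3f}, margin of error = {margin:.2f}")
print(f"90% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```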
g) Explain why you think this CI is not appropriate based upon assumptions that might
be violated. (2 pts.)
The random variable we are examining X = # of eggs per duck nest is discrete. It
cannot possibly have a normal distribution, which is a required assumption when
using the t-distribution for making inferences about the mean of a population. This
normality assumption is less critical when our sample size is “large” but here n = 9,
which cannot be viewed as a large sample.
7 – Body Mass Index and Anorexia Nervosa
These data come from a study of women who are receiving treatment for anorexia
nervosa. One measure of the effectiveness of the treatment is a weight gain which in the
case of this analysis will be represented by change in body mass index (BMI).
Data File: Anorexia-bodymass.JMP
a) Is there evidence to suggest that for anorexic women receiving this treatment there is
an increase in their mean body mass index (BMI)? (4 pts.)
7. a) & b)
Let d = BMI discharge – BMI admission
$H_o: \mu_d = 0$
$H_a: \mu_d > 0$
$\bar{d} = 3.301$, $s_d = 3.59$
t = 4.11, p-value = .0003
We have very strong evidence to suggest that the mean change in BMI is positive.
In particular we estimate that the mean increase in the BMI of patients is between
1.62 and 4.98 units with 95% confidence.
b) Estimate with 95% confidence the mean change in their BMI. Interpret this interval
in practical terms. (4 pts.)
c) Is there evidence to suggest that the BMI index at discharge significantly differs from
the preferred BMI determined at the start of treatment? (4 pts.)
7. c)
Let d = BMI discharge – BMI preferred
Note: A negative difference indicates that the patient did not
reach their preferred BMI during the course of treatment.
$H_o: \mu_d = 0$
$H_a: \mu_d \neq 0$
$\bar{d} = .415$, $s_d = 3.90$
t = .476 and p-value = .6393
We have no evidence to suggest that the mean difference in discharge BMI and the
preferred BMI for patients in this treatment program is different from 0. On average
patients appear to be meeting their preferred BMI at the time of discharge.
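The paired t statistics and the 95% interval for the mean change can be rebuilt from the summary statistics. The number of patients is not stated in this text, so `N = 20` below is an assumption; it is used only because it reproduces the reported t values and interval. The multiplier 2.093 is the t-table value for 95% confidence with 19 degrees of freedom, which follows from that assumed sample size.

```python
from math import sqrt

N = 20             # ASSUMPTION: sample size is not given in the text
T_CRIT_95 = 2.093  # t-table value for 95% confidence, df = N - 1 = 19

def paired_t(d_bar, s_d, n):
    """t statistic and standard error for a mean paired difference vs. 0."""
    se = s_d / sqrt(n)
    return d_bar / se, se

# Parts (a) and (b): discharge minus admission
t_gain, se_gain = paired_t(3.301, 3.59, N)
ci_gain = (3.301 - T_CRIT_95 * se_gain, 3.301 + T_CRIT_95 * se_gain)

# Part (c): discharge minus preferred
t_pref, _ = paired_t(0.415, 3.90, N)

print(f"gain: t = {t_gain:.2f}, 95% CI = ({ci_gain[0]:.2f}, {ci_gain[1]:.2f})")
print(f"discharge vs. preferred: t = {t_pref:.3f}")
```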
d) True or False. All women in this study were at or above the preferred BMI at the time
of discharge. Explain. (2 pts.)
FALSE, there are several patients for which the difference d = BMI discharge – BMI
preferred is negative.
e) Estimate with 95% confidence the percent increase in the BMI for women receiving
this treatment. Interpret this interval in practical terms. (4 pts.)
$$\%\text{ Improvement in BMI} = \frac{\text{BMI increase}}{\text{BMI admission}} \times 100\%$$
We estimate that the mean percent increase in BMI, for the population of anorexia
patients who undergo treatment in this program, is between 11.56% and 36.32%
(with 95% confidence).