Solution to Final Exam

University of Massachusetts
Department of Biostatistics and Epidemiology
Public Health 540 Introductory Biostatistics
Fall 2009
Final Exam-Solution
Instructions:
This is an open book exam. You may use any references you brought with you such as notes,
books, calculators, internet, etc. You must work independently, however. You will need to use
SAS to answer the last question.
The exam consists of 20 multiple choice questions. Unless otherwise specified, circle the letter of
the one best answer for each question. You may show any work in the margins or on the back of
the exam.
After completing the exam, please sign the academic honesty pledge.
I have complied with the University policy on Academic Honesty in completing
this exam.
________________________________________________________________
Questions 1 to 4 are based on a data set called the "Low Birth Weight" dataset. This dataset was
created for a study designed to identify the risk factors associated with giving birth to a low birth
weight baby (weighing less than 2500 grams). Data for the study were collected on 189 women,
59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. Four
variables thought to be important were age, weight of the mother at her last
menstrual period, smoking status, and number of physician visits during the first trimester of
pregnancy. Data were collected at Baystate Medical Center, Springfield, Massachusetts, during
1986.
Following is some output from the program Minitab regarding the mean age of the mothers
selected for the study.
One-Sample T: AGE

Variable    N    Mean  StDev  SE Mean          80.0% CI
AGE       189  23.238  5.299    0.385  (22.742, 23.734)
1. What is the formula used to compute the confidence interval (22.742, 23.734)?
Note: We will follow the convention used in Pagano and Gauvreau (2000) that $z_{\alpha/2}$ is the percentile from a normal distribution such that $P(Z > z_{\alpha/2}) = \alpha/2$.
A. $\bar{x} \pm z_{0.025}\,\sigma/\sqrt{189}$
B. $\bar{x} \pm z_{0.1}\,\sigma/\sqrt{189}$
C. $\bar{x} \pm t_{188,0.025}\,s/\sqrt{189}$
D. $\bar{x} \pm t_{188,0.1}\,s/\sqrt{189}$
E. Something else or not enough information

2. Researchers are interested in testing the null hypothesis that the mean age of mothers at time of delivery is 23 years vs. a two-sided alternative hypothesis. What can you conclude about this test from the output given in Question 1? (CIRCLE ALL THAT APPLY)
A. The null hypothesis that μ = 23 will be rejected at the 5% significance level
B. The null hypothesis that μ = 23 will be rejected at the 20% significance level
C. The null hypothesis that μ = 23 won't be rejected at the 5% significance level
D. The null hypothesis that μ = 23 won't be rejected at the 20% significance level
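The arithmetic behind Questions 1 and 2 can be checked with a short Python sketch that rebuilds the 80% interval from N, Mean and StDev in the Minitab output, once with a normal multiplier and once with a t multiplier. Python's standard library has no Student's t quantile, so `t_quantile` is a hypothetical helper based on the Cornish-Fisher expansion of the t quantile in powers of 1/df; it is an approximation (very close at 188 degrees of freedom), not an exact routine.

```python
from math import sqrt
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate p-quantile of Student's t with df degrees of freedom.

    Hypothetical helper: Cornish-Fisher expansion around the normal quantile,
    truncated after the 1/df**3 term (accurate for large df, not exact).
    """
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    return x + g1 / df + g2 / df**2 + g3 / df**3


# Summary statistics copied from the Minitab output for AGE.
n, mean, sd = 189, 23.238, 5.299
se = sd / sqrt(n)          # estimated standard error of the mean
alpha = 0.20               # an 80% interval leaves 10% in each tail

z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{0.1}
t = t_quantile(1 - alpha / 2, n - 1)       # t_{188,0.1}

z_ci = (round(mean - z * se, 3), round(mean + z * se, 3))
t_ci = (round(mean - t * se, 3), round(mean + t * se, 3))
print("SE:", round(se, 3))
print("normal-based interval:", z_ci)
print("t-based interval:     ", t_ci)

# Question 2: a two-sided test of mu = 23 at level alpha rejects exactly
# when 23 lies outside the 100(1 - alpha)% interval.
mu0 = 23
print("23 inside the 80% t interval:", t_ci[0] <= mu0 <= t_ci[1])
```

Only the t-based interval reproduces Minitab's (22.742, 23.734). Because 23 lies inside the 80% interval, and any 95% interval is wider still, H0: μ = 23 is not rejected at either the 20% or the 5% level.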
J:\Courses\be540\yr2009-stanek\assignments\finalexam-2009-solution.doc
Page 2 of 11
Following is some output from Minitab regarding the proportion of mothers who smoked
where SMOKE is coded as: (0 = non-smoking, 1 = smoking). The output includes the number
of smokers, X, the sample size, N, the sample proportion of smoking mothers, a 95% confidence
interval for the proportion of smoking mothers in the population, a test statistic for the null
hypothesis that p=0.5 (z-value), and the p-value for the two-sided test.
Test and CI for One Proportion: SMOKE

Test of p = 0.5 vs p not = 0.5
Success = 1

Variable   X    N  Sample p              95.0% CI  Z-Value  P-Value
SMOKE     74  189  0.391534  (0.321949, 0.461120)    -2.98    0.003

3. What is the formula used to compute the confidence interval (0.32, 0.46)?
A. $\bar{x} \pm z_{0.025}\,\sigma/\sqrt{189}$
B. $\bar{x} \pm t_{188,0.025}\,s/\sqrt{189}$
C. $\hat{p} \pm z_{0.025}\sqrt{\hat{p}(1-\hat{p})/189}$
D. $\hat{p} \pm z_{0.05}\sqrt{\hat{p}(1-\hat{p})/189}$
E. Something else or not enough information

4. Researchers are interested in testing the null hypothesis that half of the mothers are smokers. What can you conclude about this test from the output above Question 3? (CIRCLE ALL THAT APPLY)
A. The null hypothesis that p = 0.5 will be rejected at the 1% significance level
B. The null hypothesis that p = 0.5 will be rejected at the 5% significance level
C. The null hypothesis that p = 0.5 will be rejected at the 10% significance level
D. The null hypothesis that p = 0.5 will be rejected at the 20% significance level
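A minimal Python sketch of the large-sample calculations behind Questions 3 and 4, using only X = 74 and N = 189 from the output. The interval uses $\hat{p}$ in the standard error, while the test statistic uses the null value p = 0.5; with these choices the sketch reproduces Minitab's interval, Z-Value and P-Value.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 74, 189, 0.5
p_hat = x / n

# Large-sample (Wald) interval: p_hat +/- z_{0.025} * sqrt(p_hat(1 - p_hat)/n)
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

# Test of H0: p = 0.5, with the standard error evaluated at the null value
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * NormalDist().cdf(-abs(z_stat))   # two-sided p-value

print("sample p:", round(p_hat, 6))
print("95% CI:", tuple(round(v, 3) for v in ci))
print("z:", round(z_stat, 2), "p-value:", round(p_value, 3))
for alpha in (0.01, 0.05, 0.10, 0.20):
    print(f"alpha = {alpha:.2f}: reject H0 -> {p_value < alpha}")
```

Since the p-value (about 0.003) is below every listed significance level, the null hypothesis is rejected at 1%, 5%, 10% and 20%.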
5. When testing a hypothesis, let A be the probability of type I error and B the probability of type II error. What must always be true?
A. A = B
B. A < B
C. A > B
D. A + B = 1
E. Something else or not enough information

6. Following is some descriptive output regarding a variable named "time (T)."

Descriptive Statistics: time

Variable  N   Mean  Median  TrMean  StDev  SE Mean
time      8  3.250   3.000   3.250  1.909    0.675

Variable  Minimum  Maximum     Q1     Q3
time        1.000    7.000  2.000  4.500

(Note that "TrMean" stands for "trimmed mean," another statistic sometimes used to represent the middle of the distribution. Also, Q1 is the first quartile and Q3 is the third quartile.)

What is the value of $\sum_{i=1}^{8} T_i$?
A. 8
B. 24
C. 26
D. 56
E. Something else or not enough information
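Because the mean is the total divided by N, the sum can be recovered from two numbers in the output; the same output also lets us confirm that SE Mean equals StDev divided by the square root of N. A minimal Python check:

```python
from math import sqrt

# Values copied from the descriptive output for "time"
n, mean, sd = 8, 3.250, 1.909

total = n * mean        # sum of T_i, since mean = (sum of T_i) / n
se = sd / sqrt(n)       # should match the printed SE Mean of 0.675

print("sum of T_i:", total)
print("SE Mean:", round(se, 3))
```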
7. The authors of a recent study on the effect of exposure to agent A on the proportion of pregnancies that end in miscarriage claim that one of the tests in their study has a power of 80% against a certain alternative hypothesis. If that is really the case, what else must be true?
A. Pr(error of type I) = 0.2
B. Pr(error of type II) = 0.2
C. Significance level = 0.2
D. Confidence level = 0.2
E. Something else or not enough information

8. A researcher would like to use a data set on trauma to test the null hypothesis that the mean age for patients who died is the same as for patients who survived. She does not know the variance in age, but is willing to assume that the variance is the same for patients who died and those who survived. What type of test should she use to decide whether or not to reject such a null hypothesis?
A. A Z-test based on $\bar{X}/(\sigma/\sqrt{n})$
B. A t-test based on $\bar{X}/(s/\sqrt{n})$
C. A Z-test based on $(\bar{X}_1 - \bar{X}_2)/\sqrt{\sigma^2/n_1 + \sigma^2/n_2}$
D. A t-test based on $(\bar{X}_1 - \bar{X}_2)/\sqrt{s_p^2/n_1 + s_p^2/n_2}$
E. Something else or not enough information
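The statistic in option D uses the pooled variance $s_p^2 = [(n_1-1)s_1^2 + (n_2-1)s_2^2]/(n_1+n_2-2)$ and is referred to a t distribution with $n_1+n_2-2$ degrees of freedom. A minimal Python sketch of that calculation follows; the ages are made up for illustration and are not the trauma data.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical ages, invented for illustration only.
died = [70, 65, 80, 75]
survived = [50, 60, 55, 45, 40]

n1, n2 = len(died), len(survived)
df = n1 + n2 - 2

# Pooled variance: sample variances weighted by their degrees of freedom
sp2 = ((n1 - 1) * variance(died) + (n2 - 1) * variance(survived)) / df

# Two-sample t statistic under the equal-variance assumption
t_stat = (mean(died) - mean(survived)) / sqrt(sp2 / n1 + sp2 / n2)
print(f"s_p^2 = {sp2:.4f}, t = {t_stat:.3f} on {df} df")
```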
Questions 9-12: IQ scores are designed so that in the general population they have a mean of
100 and a standard deviation (σ) of 10. We collect a simple random sample of 100 IQ scores
from the entering UMass freshman class and we want to test the null hypothesis (H0) that the
average IQ score for the entering UMass freshman class is less than or equal to 110 versus the
alternative hypothesis (HA) that it is more than 110.
9. Assuming that σ = 10 holds for the entering UMass freshman class, on what statistic should we base our decision whether or not to reject H0?
A. $(\hat{P}_1 - \hat{P}_2)/\sqrt{\hat{P}(1-\hat{P})/n_1 + \hat{P}(1-\hat{P})/n_2}$, where $\hat{P} = (n_1\hat{P}_1 + n_2\hat{P}_2)/(n_1 + n_2)$
B. $(\bar{X} - 110)/(\sigma/\sqrt{n})$
C. $(\bar{X} - 110)/(s/\sqrt{n})$
D. $\hat{P}/\sqrt{\hat{P}(1-\hat{P})/n}$
E. Something else or not enough information

10. Suppose the p-value for such a test happens to be 0.08 and the desired significance level (α) is 5%. What should the researcher's decision be?
A. Reject the null hypothesis that the average IQ score for the entering UMass freshman class is less than or equal to 110
B. Do not reject the null hypothesis that the average IQ score for the entering UMass freshman class is less than or equal to 110
C. Something else or not enough information
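With σ known, option B's z statistic and its upper-tail p-value decide the one-sided test, and the decision compares that p-value with α. A minimal Python sketch, assuming the rule "reject when p < α"; `xbar_08` is a hypothetical sample mean, back-calculated only so that the p-value equals the 0.08 quoted in the question.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 110, 10, 100
se = sigma / sqrt(n)        # known-sigma standard error, equal to 1 here


def z_statistic(xbar):
    """Option B: (xbar - 110) / (sigma / sqrt(n))."""
    return (xbar - mu0) / se


def upper_p_value(xbar):
    """One-sided p-value for H_A: mu > 110."""
    return 1 - NormalDist().cdf(z_statistic(xbar))


def decide(p_value, alpha):
    """Assumed decision rule: reject H0 when p < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Hypothetical sample mean that yields a p-value of exactly 0.08
xbar_08 = mu0 + NormalDist().inv_cdf(1 - 0.08) * se
print("sample mean giving p = 0.08:", round(xbar_08, 3))
for alpha in (0.05, 0.10):
    print(f"alpha = {alpha:.2f}: {decide(0.08, alpha)}")
```

The same p-value of 0.08 leads to different decisions at α = 5% and α = 10%, which is the contrast between Questions 10 and 11.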
11. Suppose the p-value for such a test happens to be 0.08 and the desired significance level (α) is 10%. What should the researcher's decision be?
A. Reject the null hypothesis that the average IQ score for the entering UMass freshman class is less than or equal to 110
B. Do not reject the null hypothesis that the average IQ score for the entering UMass freshman class is less than or equal to 110
C. Something else or not enough information

12. Suppose the desired significance level (α) is 1%. What should the rejection region be?
A. $\bar{X} > 112.576$
B. $\bar{X} > 112.576$ or $\bar{X} < 107.424$
C. $\bar{X} > 110 + 2.576\,s/\sqrt{100}$
D. $\bar{X} > 110 + 1.96\,s/\sqrt{100}$ or $\bar{X} < 110 - 1.96\,s/\sqrt{100}$
E. $\bar{X} > 112.326$
F. $\bar{X} > 112.326$ or $\bar{X} < 107.674$
G. Something else or not enough information
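The numeric cutoffs in the answer choices have the form $110 \pm z\,\sigma/\sqrt{n}$ with $\sigma/\sqrt{n} = 10/\sqrt{100} = 1$. A minimal Python sketch computing both the one-sided cutoff (all of α = 1% in the upper tail, matching H_A: μ > 110) and, for comparison, the two-sided cutoffs:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 110, 10, 100, 0.01
se = sigma / sqrt(n)

z_one = NormalDist().inv_cdf(1 - alpha)       # z_{0.01}, about 2.326
z_two = NormalDist().inv_cdf(1 - alpha / 2)   # z_{0.005}, about 2.576

upper_one_sided = mu0 + z_one * se
lower_two_sided, upper_two_sided = mu0 - z_two * se, mu0 + z_two * se

print("one-sided region: Xbar >", round(upper_one_sided, 3))
print("two-sided region: Xbar <", round(lower_two_sided, 3),
      "or Xbar >", round(upper_two_sided, 3))
```

Because the alternative is one-sided and σ is known, only the upper cutoff 112.326 applies; the 2.576 multiplier belongs to a two-sided test.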
Questions 13-14: A researcher wants to test if age is a risk factor for death after traumatic aortic
rupture. She hypothesizes that patients who die of traumatic aortic rupture are older. To save
time, the researcher decides to use the following procedure to test her hypothesis: she will flip a
fair coin and if it lands heads she will reject the null hypothesis, otherwise she won’t reject the
null hypothesis.
13. What would the probability of type I error be for such a test?
A. 5%
B. 10%
C. 50%
D. 95%
E. Something else or not enough information

14. If the researcher is considering two-sided alternative hypotheses, what would be the power of such a test?
A. 5%
B. 10%
C. 50%
D. 95%
E. Something else or not enough information
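The coin-flip procedure rejects H0 with probability 0.5 whatever the data look like, so P(reject | H0 true) and P(reject | H_A true) are both 0.5. A small Python simulation can make this concrete; the normal age model (SD 15, 5 patients per group) and the group means are hypothetical choices, and any others would give the same result because the rule never looks at the ages.

```python
import random


def coin_flip_test(died_ages, survived_ages, rng):
    """The researcher's rule: reject H0 on heads. The ages are never used."""
    return rng.random() < 0.5


def rejection_rate(died_mean, survived_mean=40.0, trials=20_000, seed=1):
    """Fraction of simulated studies in which the coin-flip rule rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Hypothetical ages drawn from normal distributions
        died = [rng.gauss(died_mean, 15) for _ in range(5)]
        survived = [rng.gauss(survived_mean, 15) for _ in range(5)]
        rejections += coin_flip_test(died, survived, rng)
    return rejections / trials


type_i_rate = rejection_rate(died_mean=40.0, seed=1)   # H0 true: equal means
power = rejection_rate(died_mean=55.0, seed=2)          # H_A true: died older
print(f"estimated type I error rate: {type_i_rate:.3f}")
print(f"estimated power:             {power:.3f}")
```

Both estimates land near 0.5, whether the alternative is framed as one-sided or two-sided.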
Questions 15-16: The following are some partial results on height in centimeters of a simple
random sample of UMass students.
One-Sample T: Height

Variable    N  Mean  StDev  SE Mean          95.0% CI
Height    100     ?      ?        ?  (152.00, 176.00)

15. Given this information, what has to be true? (CIRCLE ALL THAT APPLY)
A. H0: μ = 160 is rejected when the significance level (α) is 1%
B. H0: μ = 160 is not rejected when the significance level (α) is 1%
C. H0: μ = 180 is not rejected when the significance level (α) is 5%
D. H0: μ = 180 is rejected when the significance level (α) is 10%
E. Something else or not enough information

16. Given this information, what is the value of the sample mean height ($\bar{X}$)?
A. 152
B. 164
C. 174
D. 176
E. Something else or not enough information
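The t interval is symmetric about $\bar{X}$, so the sample mean is the midpoint, and the half-width gives back the standard error. A minimal Python sketch then builds the 90% and 99% intervals to check the statements about 160 and 180; `t_quantile` is a hypothetical helper (a Cornish-Fisher approximation to the Student's t quantile), since the standard library has none.

```python
from statistics import NormalDist


def t_quantile(p, df):
    """Hypothetical helper: Cornish-Fisher approximation to the t quantile."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    return x + g1 / df + g2 / df**2 + g3 / df**3


lower, upper, n = 152.00, 176.00, 100
xbar = (lower + upper) / 2                        # midpoint of the interval
se = (upper - xbar) / t_quantile(0.975, n - 1)    # back out SE from half-width


def t_interval(conf):
    """Two-sided t interval at the given confidence level."""
    q = t_quantile(1 - (1 - conf) / 2, n - 1)
    return (xbar - q * se, xbar + q * se)


print("sample mean:", xbar)
for conf in (0.90, 0.95, 0.99):
    lo, hi = t_interval(conf)
    print(f"{conf:.0%} CI: ({lo:.2f}, {hi:.2f})",
          "| contains 160:", lo <= 160 <= hi,
          "| contains 180:", lo <= 180 <= hi)
```

The conclusions do not actually depend on the approximation: a 99% interval contains the 95% interval (so 160 stays inside), and a 90% interval lies within it (so 180 stays outside).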
Questions 17-18: According to the 2000 census, 10% of California residents were born outside
of the US. We choose at random 10 California residents.
17. Assuming the proportion from the census is still valid, what is the probability that at least half of the 10 people selected were born outside of the US?
A. 0.001
B. 0.002
C. 0.1
D. 0.999
E. 1.0
F. Something else or not enough information

18. Assuming the proportion from the census is still valid, what is the expected number of people among the 10 selected who were born outside of the US?
A. 0.1
B. 1
C. 5
D. 9
E. Something else or not enough information
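The number of foreign-born residents among the 10 selected is Binomial(n = 10, p = 0.10), so Question 17 asks for P(X ≥ 5) and Question 18 for E[X] = np. A minimal Python sketch using the binomial probability mass function:

```python
from math import comb

n, p = 10, 0.10

# P(X = k) = C(n, k) p^k (1 - p)^(n - k) for k = 0, ..., n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

p_at_least_half = sum(pmf[5:])                          # P(X >= 5)
expected = n * p                                        # E[X] = np
expected_check = sum(k * pk for k, pk in enumerate(pmf))  # same value from the pmf

print("P(X >= 5):", round(p_at_least_half, 4))
print("E[X]:", expected, round(expected_check, 6))
```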
Questions 19-20: Suppose that the distribution of total serum cholesterol in men who eventually
develop coronary artery disease follows a normal distribution with mean μ = 244 mg/dl and
standard deviation σ = 48 mg/dl. For men who do not develop the disease the corresponding
distribution is also normal with mean 219 and standard deviation 41 mg/dl.
Health educators propose using an initial serum cholesterol level of 260 mg/dl as a screening test for
predicting future coronary artery disease (i.e., if a man’s total serum cholesterol is greater than
260 mg/dl, he would be considered at elevated risk of the disease).
19. What proportion of men would be classified as having elevated risk using this criterion among
men who eventually develop coronary artery disease, and among men who do not develop
coronary artery disease?
a. Men with eventual disease: 0.348
Men with no disease: 0.159
b. Men with eventual disease: 0.348
Men with no disease: 0.198
c. Men with eventual disease: 0.371
Men with no disease: 0.159
d. Men with eventual disease: 0.371
Men with no disease: 0.198
e. None of the above.
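Each proportion is an upper-tail normal probability, P(X > 260) = 1 − Φ((260 − μ)/σ). A minimal Python sketch that computes it both with z carried to full precision and with z rounded to two decimals, as happens when reading a printed normal table:

```python
from statistics import NormalDist

cutoff = 260
groups = {"eventual disease": (244, 48), "no disease": (219, 41)}

results = {}
for name, (mu, sigma) in groups.items():
    z = (cutoff - mu) / sigma
    exact = 1 - NormalDist().cdf(z)              # full-precision z
    table = 1 - NormalDist().cdf(round(z, 2))    # z rounded to 2 decimals
    results[name] = (exact, table)
    print(f"{name}: z = {z:.3f}, P(X > 260) = {exact:.3f} "
          f"(z rounded to 2 decimals: {table:.3f})")
```

For men with eventual disease, z = 16/48 = 1/3 gives about 0.369 at full precision and 0.371 with z rounded to 0.33; for men without disease, z = 41/41 = 1 gives 0.159 either way.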
20. In the data set basev2.sas7bdat in the Seasons study, total cholesterol (TC) was reported for
men (FEMALE=0). In order for a subject to be eligible to participate in the Seasons study, there
must have been no evidence of previous coronary artery disease. The program ejs09b540p53.sas
(see Course Website) describes TC for men. Using this program as a starting point, determine
the proportion of men in the Seasons study data that would be classified as having elevated risk
of coronary artery disease, and circle all responses that you feel are correct.
a. The proportion of men classified as having elevated risk in the Seasons study is smaller than
the expected proportion for men who do not develop the disease (based on question 19), and
therefore we do not expect men in the Seasons study to develop coronary artery disease.
b. The men classified as having elevated risk in the Seasons study are not at higher risk of
coronary artery disease since the men had no evidence of previous coronary artery disease.
c. If a man is classified as having elevated risk in the Seasons study, they will go on to develop
coronary artery disease.
d. If a man is not classified as having elevated risk in the Seasons study, they will not develop
coronary artery disease.
e. The proportion of men in the Seasons study not classified as having elevated risk of coronary
artery disease is larger than the proportion expected (based on question 19).
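The question expects SAS and the program ejs09b540p53.sas, whose contents are not shown here. As a language-neutral illustration of the calculation only, here is a minimal Python sketch that assumes the data have been exported as records with the TC and FEMALE variables named in the question; `elevated_risk_proportion` is a hypothetical helper, skipping missing TC values is an assumption, and the records are made up, not Seasons study data.

```python
def elevated_risk_proportion(records, cutoff=260):
    """Share of men (FEMALE == 0) whose total cholesterol TC exceeds cutoff.

    Hypothetical helper. Records with a missing TC (None) are skipped,
    which is an assumption about how missing values should be handled.
    """
    men_tc = [r["TC"] for r in records if r["FEMALE"] == 0 and r["TC"] is not None]
    if not men_tc:
        raise ValueError("no male records with a TC value")
    return sum(tc > cutoff for tc in men_tc) / len(men_tc)


# Made-up records for illustration only; they are not Seasons study data.
example = [
    {"FEMALE": 0, "TC": 200},
    {"FEMALE": 0, "TC": 270},
    {"FEMALE": 1, "TC": 300},   # a woman: excluded
    {"FEMALE": 0, "TC": 260},   # exactly 260 is not "greater than 260"
    {"FEMALE": 0, "TC": None},  # missing TC: skipped
]
print(elevated_risk_proportion(example))
```

The proportion obtained from the actual Seasons data can then be compared with the 0.159 expected for men who do not develop the disease (Question 19).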