Statistics 512
Final Exam
Spring 2002
Statistics 512
Final Exam
April 29 - May 1, 2002
This is a take home exam. The following rules apply.
1. You may use any textbook, including textbooks on the Web.
2. You may use any statistical software.
3. Answer the questions on the exam paper in the space provided.
Labeled computer output should be attached as needed. You may also
attach additional pages of computations for any problem, but I will
check these only if you direct me to them.
4. You may not seek help from any person except Prof. Altman and Lan
Wang. This includes assistance from internet sources such as e-mail,
bulletin boards and chat rooms.
5. Since students will be taking the exam at any time during the week,
you may not discuss this exam with anyone until Friday May 3.
6. Failure to comply with items 4 or 5 could lead to reduction in your
grade, or disciplinary action.
7. When you have completed the exam, slide it under my door.
I have read the rules above and agree to comply with them.
Signature ________________________________________________
Name (printed) ___________________________________________
1. Ozone has become an important air pollutant, with many adverse effects. Ozone is
known to affect growth of many plant species. However, even within species, some
varieties show more sensitivity than others, indicating a genetic susceptibility. As well,
other factors such as soil moisture may affect ozone sensitivity.
In a laboratory study of the effects of soil moisture and ozone on lettuce, 3 varieties were
selected from the 28 available, based on genetic susceptibility to ozone (high, medium,
low).
To maintain the ozone level, the plants must be grown in special chambers. Only one
ozone level can be used in each chamber. However, the soil moisture level can be
controlled by the use of special pots.
The response variable is leaf dry weight, which is a measure of yield of this leafy
vegetable. Ozone will be maintained at 30, 50 or 80 ppb. The soil moisture treatments
are: constant moisture and fluctuating moisture on a dry/damp cycle. One plant from
each variety is used as the sampling unit.
The objectives of the study are:
1. to assess the effects of ozone and soil moisture on leaf dry weight
2. to determine if there is an interaction between ozone and soil moisture
3. to determine if there are significant differences in the way that the varieties of lettuce
respond to ozone and soil moisture
a. Fill out the following table regarding the factors:
Name of treatment    Categorical or Quantitative    Fixed or Random
ozone                ___________________________    _______________
moisture             ___________________________    _______________
variety              ___________________________    _______________
b. The investigator is considering 2 designs:
A. There will be 18 chambers in all, with 6 chambers at each level of ozone. Each
chamber will hold 3 pots with the same moisture treatment. Each pot will hold one plant.
All 3 varieties will be placed in each chamber.
B. Each pot will hold 1 plant. Each chamber will have one variety of plant only. There
will be 27 chambers in all, with 3 chambers of each variety at each level of ozone.
Each of the designs above is a split plot experiment with 54 plants. Fill out the
following table. Include the following factors: ozone, moisture, variety, and all of the
2-way and 3-way interactions of these 3 factors.
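Design A
                            Whole plot              Subplot
Experimental Unit           ____________________    ____________________
Blocking factor (if any)    ____________________    ____________________
Factors                     ____________________    ____________________

Design B
                            Whole plot              Subplot
Experimental Unit           ____________________    ____________________
Factors                     ____________________    ____________________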
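The chamber and plant counts quoted for Designs A and B can be checked by enumeration. The Python sketch below is only an illustration. It assumes that in Design A the 6 chambers at each ozone level are split 3 and 3 between the two moisture treatments, and that in Design B each chamber holds 2 pots, one per moisture treatment. Neither detail is stated in the problem, and the function names are hypothetical.

```python
from itertools import product

OZONE = (30, 50, 80)                    # ppb, from the problem statement
MOISTURE = ("constant", "fluctuating")  # soil moisture treatments
VARIETY = ("high", "medium", "low")     # ozone susceptibility of the 3 varieties


def design_a():
    """Design A: 6 chambers per ozone level, one moisture treatment per chamber,
    one plant of each variety in every chamber.
    ASSUMPTION: 3 chambers per moisture treatment at each ozone level."""
    plants = []
    chamber = 0
    for ozone, moisture, _rep in product(OZONE, MOISTURE, range(3)):
        chamber += 1
        for variety in VARIETY:
            plants.append((chamber, ozone, moisture, variety))
    return plants


def design_b():
    """Design B: one variety per chamber, 3 chambers per variety at each ozone level.
    ASSUMPTION: 2 pots per chamber, one for each moisture treatment."""
    plants = []
    chamber = 0
    for ozone, variety, _rep in product(OZONE, VARIETY, range(3)):
        chamber += 1
        for moisture in MOISTURE:
            plants.append((chamber, ozone, moisture, variety))
    return plants


for name, plants in (("A", design_a()), ("B", design_b())):
    n_chambers = len({p[0] for p in plants})
    print(f"Design {name}: {n_chambers} chambers, {len(plants)} plants")
```

Under these assumptions both layouts give 54 plants, in 18 chambers for Design A and 27 chambers for Design B, which matches the counts in the problem.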
c. The random effects of the blocking factors create correlation among observations.
Fill in the following table for the 2 most highly correlated observations with the stated
effects.
Observation                                                           Correlated Design A    Correlated Design B
                                                                      (yes/no)               (yes/no)
same level of ozone, moisture and variety                             ________               ________
different levels of ozone, same levels of moisture and variety        ________               ________
same levels of ozone, different levels of moisture, same variety      ________               ________
same level of ozone, same level of moisture, different variety        ________               ________
same level of ozone, different level of moisture, different variety   ________               ________
d. Major interest in this experiment focuses on the ozone by moisture interaction within
variety. Which design provides more information about this? Justify your answer.
e. Another major focus of this experiment is the ozone by moisture by variety interaction.
Which design provides more information about this? Justify your answer.
f. The unit supervisor points out that use of the chambers is very expensive. Each
chamber can hold many pots of plants. She suggests that the experiment can be run with
only 9 chambers, 3 at each level of ozone. Explain how to design the experiment so that
it is a split plot design with 54 plants in all. Call this Design C.
g. Is Design C more or less powerful than Design A for detecting the ozone by moisture
interaction? Explain your answer.
2. For a nutrition project, students investigated the effect of time of meal on glucose
response. They provided a pasta dinner for 6 subjects at each of 0200, 0600, 1000, 1400,
1600 and 2000 (on a 24 hour clock). The serum glucose measurement for each subject
was taken prior to the meal time, and then at intervals following the meal. Most of the
subjects were measured at 10 times.
Data for this problem are on the class Web page. The columns of data are:
Glucose - the serum glucose level
Time - the measurement times coded from 1 to 10
Subject - subject coded from 1 to 6 for each start time (36 subjects in all)
Start - start time of the meal on a 24 hour clock
a. What are the fixed and random factors for this study?
b. Subject should be coded as subject(start) because different subjects were used for each
dinner time. Run an analysis which allows for time series error within subject. Is the
time series error statistically significant?
c. The SAS statement:
MODEL Y=A B/OUTPRED=filename;
saves all of the original data plus predicted values (for the fixed effects of the model) in
SAS file "filename". Add this statement to your analysis, and then obtain an interaction
plot of START by TIME, so that you can assess the shape of the response curves. Do
they appear to be linear? (Please include your curve in the analysis.) You may need to
recode the start times in order to obtain unique plotting symbols.
d. Based on your analysis, test for the statistical significance of the following effects:
meal start
time
time by meal start interaction
Your analysis should include the AR(1) term only if it is statistically significant. Each test
should include the test statistic, p-value, and your conclusion in a written statement.
e. Total glucose is an important measure of response. Total glucose is the sum of times 2
through 9. Compute 95% confidence intervals for the total glucose for the meals starting
at 2 a.m. and at 6 a.m.
(Hint: Remember to use the DIVISOR= option after the slash in the "ESTIMATE"
statement. You will need to have coefficients for the overall mean (INTERCEPT), start
time, time and start*time.)
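As a rough illustration of how such coefficients combine, the Python sketch below builds the coefficient vector for the sum of the cell means over times 2 through 9 at one meal start. It assumes an effects parameterization, cell mean = intercept + start effect + time effect + start*time effect. The function names and parameter values are hypothetical and made up, not estimates from the class data.

```python
import random

STARTS = ("0200", "0600", "1000", "1400", "1600", "2000")  # meal start times
TIMES = tuple(range(1, 11))        # measurement times coded 1..10
TOTAL_TIMES = tuple(range(2, 10))  # total glucose sums times 2 through 9


def total_glucose_coefficients(start):
    """Coefficients on intercept, start, time and start*time for the sum of
    the cell means over TOTAL_TIMES at the given meal start."""
    n = len(TOTAL_TIMES)
    return {
        "intercept": n,
        "start": {s: (n if s == start else 0) for s in STARTS},
        "time": {t: (1 if t in TOTAL_TIMES else 0) for t in TIMES},
        "start*time": {(s, t): (1 if (s == start and t in TOTAL_TIMES) else 0)
                       for s in STARTS for t in TIMES},
    }


def apply_coefficients(coef, mu, alpha, tau, gamma):
    """Linear combination of the model parameters defined by coef."""
    return (coef["intercept"] * mu
            + sum(c * alpha[s] for s, c in coef["start"].items())
            + sum(c * tau[t] for t, c in coef["time"].items())
            + sum(c * gamma[st] for st, c in coef["start*time"].items()))


# Made-up parameter values, used only to check the bookkeeping.
rng = random.Random(0)
mu = rng.gauss(100, 10)
alpha = {s: rng.gauss(0, 5) for s in STARTS}
tau = {t: rng.gauss(0, 5) for t in TIMES}
gamma = {(s, t): rng.gauss(0, 2) for s in STARTS for t in TIMES}

for start in ("0200", "0600"):
    direct = sum(mu + alpha[start] + tau[t] + gamma[(start, t)] for t in TOTAL_TIMES)
    via_coef = apply_coefficients(total_glucose_coefficients(start), mu, alpha, tau, gamma)
    assert abs(direct - via_coef) < 1e-9
    print(start, "intercept coefficient:", total_glucose_coefficients(start)["intercept"])
```

The check confirms that the coefficient vector reproduces the directly summed cell means. The ordering of the coefficients in an actual software statement depends on how the software orders the factor levels, so that ordering should be confirmed from the output.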
f. Hormones secreted at about 3:00 a.m. in most humans delay the metabolism of glucose
by the body. As a result, total glucose is expected to be higher for the 2 a.m. meal than
for the 6 a.m. meal. Test whether or not there is a significant difference in total glucose
for these two meal times.
The remainder of the problem does not require computing.
g. The times of measurement for each subject are the same, but they are not equally
spaced. In fact, the measurements 1 and 2 are 15 minutes apart, while 10 and 11 are 2
hours apart. The AR(1) analysis assumes that measurements with adjacent
orderings (1,2) (2,3) ... (10,11) have the same correlation. Give one reason why it still
makes sense to fit the AR(1) correlation structure, and one reason why it does not. (Your
explanation should consider the option of equal correlations within subject, versus
correlations that are smaller for observations which are further apart in time.)
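The three correlation patterns mentioned in this question can be compared numerically. The Python sketch below is illustrative only. The value 0.5 is a hypothetical correlation parameter, the function names are made up, and the only time gaps used are the two given in the question (15 minutes and 2 hours).

```python
def compound_symmetry_corr(rho):
    """Equal correlation for every pair of measurements on a subject."""
    return rho


def ar1_corr(order_i, order_j, rho):
    """AR(1) on measurement order: correlation depends only on the lag in order."""
    return rho ** abs(order_i - order_j)


def time_decay_corr(gap_hours, rho_per_hour):
    """Correlation that decays with elapsed time rather than with order."""
    return rho_per_hour ** gap_hours


RHO = 0.5  # hypothetical value, chosen only for illustration

# Adjacent pairs named in the question, with their gaps in hours.
adjacent_pairs = {"(1,2)": (1, 2, 0.25), "(10,11)": (10, 11, 2.0)}

for label, (i, j, gap_hours) in adjacent_pairs.items():
    print(label,
          "equal:", compound_symmetry_corr(RHO),
          "AR(1) by order:", ar1_corr(i, j, RHO),
          "decay by elapsed time:", round(time_decay_corr(gap_hours, RHO), 3))
```

With order-based AR(1) the two adjacent pairs get the same correlation, while a structure based on elapsed time gives the pair 15 minutes apart a much higher correlation than the pair 2 hours apart.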
h. A diabetes nurse looking at the data said, "Since total glucose is of most interest, why
not compute the total glucose for each subject and analyse that?"
What are the pros and cons of this approach for this particular set of data?
3. Answer each of the following questions in a few sentences.
a. In a study of asthma drugs, lung capacity was measured on each patient at the start of
the study. The patients were then assigned to one of 3 drug treatments and lung capacity
was measured again after 2 weeks. The experiment was analyzed as a 2-way design with
times 0 and 2 weeks (WEEK), and 3 drug treatments (DRUG). There was a drug main
effect but no drug by week interaction. The biostatistician who analyzed the data calls
for an investigation into the experiment protocol. Why?
b. 27 patients participate in a study of asthma drugs. There are 2 drugs and a placebo.
The experiment is planned as a replicated Latin square design, with patients as "rows"
and time order as "columns". The physician on the project recommends a 3 week
washout period between treatments to avoid carry-over effects.
What are carry-over effects?
c. 27 patients participate in a study of asthma drugs. There are 2 drugs and a placebo.
The experiment is planned as a replicated Latin square design, with patients as "rows"
and time order as "columns".
Are the treatments time varying or time invariant? Justify your answer.
d. 27 patients participate in a study of asthma drugs. There are 2 drugs and a placebo.
The experiment is planned as a replicated Latin square design, with patients as "rows"
and time order as "columns".
The biostatistician suggests that randomization could be done by writing out the rows of
the Latin square on 27 tokens and allowing each patient to select a token. The physician
suggests instead that they just write out 3 tokens with A, B and C, and allow each patient
to pick a token at random during each time period. Assuming that mixing can be done so
that the patients can choose a token "at random" discuss briefly whether or not these two
methods will lead to a balanced Latin Square experiment.
e. 27 patients participate in a study of asthma drugs. There are 2 drugs and a placebo.
The experiment is planned as a replicated Latin square design, with patients as "rows"
and time order as "columns". The physician expects that different patients will react
differently to the 3 treatments.
How can the physician determine from this design whether there is a patient by treatment
interaction?
f. 27 patients participate in a study of asthma drugs. There are 2 drugs and a placebo.
The experiment is planned as a replicated Latin square design, with patients as "rows"
and time order as "columns".
The physician plans to compare both drug treatments to the placebo. The overall F-test
for the drug treatments is not statistically significant. Should a multiple comparisons
procedure be applied to the two tests of contrasts? Explain your answer briefly.
g. A drug company is considering a combination drug with 8 components all known to
alleviate asthma symptoms. The company hopes to find an optimal combination of
components. The biostatistician suggests that this can be done by fitting a polynomial
and solving for the maximum. The project leader points out that fitting a quadratic
requires 3 levels for each factor, and suggests running several experiments, each looking
at only 4 of the 8 components. What is at least one alternative method that uses fewer
than 3^8 treatments? Assuming that the total sample size is comparable for the two
methods, which do you think would be preferable?
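For scale, the treatment counts behind this question are simple to compute. The Python lines below are only arithmetic on the numbers given in the question (3 levels, 8 components, 4 components per smaller experiment). The variable names are hypothetical.

```python
from math import comb

LEVELS = 3       # levels per component needed to fit a quadratic
COMPONENTS = 8   # components in the combination drug

full_factorial = LEVELS ** COMPONENTS   # every combination of all 8 components
sub_factorial = LEVELS ** 4             # full factorial in 4 of the 8 components
four_subsets = comb(COMPONENTS, 4)      # number of ways to choose 4 of the 8

print("full 3^8 factorial treatments:", full_factorial)
print("3^4 treatments per 4-component experiment:", sub_factorial)
print("possible 4-component subsets:", four_subsets)
```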
h. A drug company is considering a combination drug with 8 components all known to
alleviate asthma symptoms. The company hopes to find an optimal combination of
components. The biostatistician suggests that this can be done by fitting a quadratic
polynomial. This involves fitting the intercept, 8 linear terms, 8 quadratic terms, and 28
linear*linear terms. 45 regression coefficients are estimated. An appropriate
experimental design is used so that the 44 contrasts used to estimate these coefficients are
orthogonal. (The estimate of the intercept is not a contrast.) The biostatistician wishes to
start by testing to determine if any of the coefficients are zero. Should a Bonferroni
correction be used in combination with the 45 t-tests?
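The coefficient count quoted in this question can be reproduced as follows, together with the per-test level a Bonferroni correction would imply. This is a sketch only. The familywise level of 0.05 is an assumption (the question does not state one), and the function name is hypothetical.

```python
from math import comb


def quadratic_term_counts(k):
    """Number of terms of each type in a full quadratic polynomial in k factors."""
    return {
        "intercept": 1,
        "linear": k,
        "quadratic": k,
        "linear*linear": comb(k, 2),
    }


counts = quadratic_term_counts(8)
n_coefficients = sum(counts.values())

ALPHA = 0.05  # ASSUMPTION: familywise error rate, not given in the question
per_test_level = ALPHA / n_coefficients

print(counts)
print("regression coefficients:", n_coefficients)
print("Bonferroni per-test level for that many tests:", round(per_test_level, 5))
```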