Statistics 512 Final Exam Spring 2002

Statistics 512 Final Exam
April 29 - May 1, 2002

This is a take home exam. The following rules apply.

1. You may use any textbook, including textbooks on the Web.
2. You may use any statistical software.
3. Answer the questions on the exam paper in the space provided. Labeled computer output should be attached as needed. You may also attach additional pages of computations for any problem, but I will check these only if you direct me to them.
4. You may not seek help from any person except Prof. Altman and Lan Wang. This includes assistance from internet sources such as e-mail, bulletin boards and chat rooms.
5. Since students will be taking the exam at any time during the week, you may not discuss this exam with anyone until Friday May 3.
6. Failure to comply with items 4 or 5 could lead to reduction in your grade, or disciplinary action.
7. When you have completed the exam, slide it under my door.

I have read the rules above and agree to comply with them.

Signature ________________________________________________

Name (printed) ___________________________________________

1. Ozone has become an important air pollutant, with many adverse effects. Ozone is known to affect growth of many plant species. However, even within species, some varieties show more sensitivity than others, indicating a genetic susceptibility. As well, other factors such as soil moisture may affect ozone sensitivity.

In a laboratory study of the effects of soil moisture and ozone on lettuce, 3 varieties were selected from the 28 available, based on genetic susceptibility to ozone (high, medium, low). To maintain the ozone level, the plants must be grown in special chambers. Only one ozone level can be used in each chamber. However, the soil moisture level can be controlled by the use of special pots. The response variable is leaf dry weight, which is a measure of yield of this leafy vegetable.
Ozone will be maintained at 30, 50 or 80 ppb. The soil moisture treatments are: constant moisture and fluctuating moisture on a dry/damp cycle. One plant from each variety is used as the sampling unit.

The objectives of the study are:

   1. to assess the effects of ozone and soil moisture on leaf dry weight
   2. to determine if there is an interaction between ozone and soil moisture
   3. to determine if there are significant differences in the way that the varieties of lettuce respond to the ozone and soil moisture

a. Fill out the following table regarding the factors:

   Name of treatment    Categorical or Quantitative    Fixed or Random
   ozone                ___________________________    _______________
   moisture             ___________________________    _______________
   variety              ___________________________    _______________

b. The investigator is considering 2 designs:

   A. There will be 18 chambers in all, with 6 chambers at each level of ozone. Each chamber will hold 3 pots with the same moisture treatment. Each pot will hold one plant. All 3 varieties will be placed in each chamber.

   B. Each pot will hold 1 plant. Each chamber will have one variety of plant only. There will be 27 chambers in all, with 3 chambers of each variety at each level of ozone.

Each of the designs above is a split plot experiment with 54 plants. Fill out the following table. Include the following factors: ozone, moisture, variety and all of the 2 and 3-way interactions of these 3 factors.

   Design A                    Whole plot          Subplot
   Experimental Unit           ________________    ________________
   Blocking factor (if any)    ________________    ________________
   Factors                     ________________    ________________

   Design B                    Whole plot          Subplot
   Experimental Unit           ________________    ________________
   Factors                     ________________    ________________

c. The random effects of the blocking factors create correlation among observations. Fill in the following table for the 2 most highly correlated observations with the stated effects.
   Observation                                                            Correlated           Correlated
                                                                          Design A (yes/no)    Design B (yes/no)
   same level of ozone, moisture and variety                              _________________    _________________
   different levels of ozone, same levels of moisture and variety         _________________    _________________
   same levels of ozone, different levels of moisture, same variety       _________________    _________________
   same level of ozone, same level of moisture, different variety         _________________    _________________
   same level of ozone, different level of moisture, different variety    _________________    _________________

d. Major interest in this experiment focuses on the ozone by moisture interaction within variety. Which design provides the most information about this? Justify your answer.

e. Another major focus of this experiment is the ozone by moisture by variety interaction. Which design provides the most information about this? Justify your answer.

f. The unit supervisor points out that use of the chambers is very expensive. Each chamber can hold many pots of plants. She suggests that the experiment can be run with only 9 chambers, 3 at each level of ozone. Explain how to design the experiment so that it is a split plot design with 54 plants in all. Call this Design C.

g. Is Design C more or less powerful than Design A for detecting the ozone by moisture interaction? Explain your answer.

2. For a nutrition project, students investigated the effect of time of meal on glucose response. They provided a pasta dinner for 6 subjects at each of 0200, 0600, 1000, 1400, 1600 and 2000 (on a 24-hour clock). The serum glucose measurement for each subject was taken prior to the meal time, and then at intervals following the meal. Most of the subjects were measured at 10 times.

Data for this problem are on the class Web page. The columns of data are:

   Glucose - the serum glucose level
   Time - the measurement times coded from 1 to 10
   Subject - subject coded from 1 to 6 for each start time (36 subjects in all)
   Start - start time of the meal on a 24-hour clock

a. What are the fixed and random factors for this study?
b. Subject should be coded as subject(start) because different subjects were used for each dinner time. Run an analysis which allows for time series error within subject. Is the time series error statistically significant?

c. The SAS statement:

   MODEL Y=A B/OUTPRED=filename;

saves all of the original data plus predicted values (for the fixed effects of the model) in SAS file "filename". Add this statement to your analysis, and then obtain an interaction plot of START by TIME, so that you can assess the shape of the response curves. Do they appear to be linear? (Please include your curve in the analysis.) You may need to recode the start times in order to obtain unique plotting symbols.

d. Based on your analysis, test for the statistical significance of the following effects:

   meal start time
   time by meal start interaction

Your analysis should include the AR(1) term only if it is statistically significant. Each test should include the test statistic, p-value, and your conclusion in a written statement.

e. Total glucose is an important measure of response. Total glucose is the sum of times 2 through 9. Compute 95% confidence intervals for the total glucose for the meals starting at 2 a.m. and at 6 a.m. (Hint: Remember to use the DIVISOR= option after the slash in the "ESTIMATE" statement. You will need to have coefficients for the overall mean (INTERCEPT), start time, time and start*time.)

f. Hormones secreted at about 3:00 a.m. in most humans delay the metabolism of glucose by the body. As a result, total glucose is expected to be higher for the 2 a.m. meal than for the 6 a.m. meal. Test whether or not there is a significant difference in total glucose for these two meal times.

The remainder of the problem does not require computing.

g. The times of measurement for each subject are the same, but they are not equally spaced.
In fact, the measurements 1 and 2 are 15 minutes apart, while 10 and 11 are 2 hours apart. The AR(1) analysis assumes that measurements with adjacent orderings (1,2), (2,3), ..., (10,11) have the same correlation. Give one reason why it still makes sense to fit the AR(1) correlation structure, and one reason why it does not. (Your explanation should consider the option of equal correlations within subject, versus correlations that are smaller for observations which are further apart in time.)

h. A diabetes nurse looking at the data said "Since total glucose is of most interest, why not compute the total glucose for each subject and analyse that?" What are the pros and cons of this approach for this particular set of data?

3. Answer each of the following questions in a few sentences.

a. In a study of asthma drugs, lung capacity was measured on each patient at the start of the study. The patients were then assigned to one of 3 drug treatments and lung capacity was measured again after 2 weeks. The experiment was analyzed as a 2-way design with times 0 and 2 weeks (WEEK), and 3 drug treatments (DRUG). There was a drug main effect but no drug by week interaction. The biostatistician who analyzed the data calls for an investigation into the experiment protocol. Why?

b. 27 patients participate in a study of asthma drugs. There are 2 drugs and a placebo. The experiment is planned as a replicated Latin square design, with patients as "rows" and time order as "columns". The physician on the project recommends a 3 week washout period between treatments to avoid carry-over effects. What are carry-over effects?

c. 27 patients participate in a study of asthma drugs. There are 2 drugs and a placebo. The experiment is planned as a replicated Latin square design, with patients as "rows" and time order as "columns". Are the treatments time varying or time invariant? Justify your answer.

d.
27 patients participate in a study of asthma drugs. There are 2 drugs and a placebo. The experiment is planned as a replicated Latin square design, with patients as "rows" and time order as "columns". The biostatistician suggests that randomization could be done by writing out the rows of the Latin square on 27 tokens and allowing each patient to select a token. The physician suggests instead that they just write out 3 tokens with A, B and C, and allow each patient to pick a token at random during each time period. Assuming that mixing can be done so that the patients can choose a token "at random", discuss briefly whether or not these two methods will lead to a balanced Latin square experiment.

e. 27 patients participate in a study of asthma drugs. There are 2 drugs and a placebo. The experiment is planned as a replicated Latin square design, with patients as "rows" and time order as "columns". The physician expects that different patients will react differently to the 3 treatments. How can the physician determine from this design whether there is a patient by treatment interaction?

f. 27 patients participate in a study of asthma drugs. There are 2 drugs and a placebo. The experiment is planned as a replicated Latin square design, with patients as "rows" and time order as "columns". The physician plans to compare both drug treatments to the placebo. The overall F-test for the drug treatments is not statistically significant. Should a multiple comparisons procedure be applied to the two tests of contrasts? Explain your answer briefly.

g. A drug company is considering a combination drug with 8 components all known to alleviate asthma symptoms. The company hopes to find an optimal combination of components. The biostatistician suggests that this can be done by fitting a polynomial and solving for the maximum.
The project leader points out that fitting a quadratic requires 3 levels for each factor, and suggests running several experiments, each looking at only 4 of the 8 components. What is at least one alternative method that uses fewer than 3^8 treatments? Assuming that the total sample size is comparable for the two methods, which do you think would be preferable?

h. A drug company is considering a combination drug with 8 components all known to alleviate asthma symptoms. The company hopes to find an optimal combination of components. The biostatistician suggests that this can be done by fitting a quadratic polynomial. This involves fitting the intercept, 8 linear terms, 8 quadratic terms, and 28 linear*linear terms. 45 regression coefficients are estimated. An appropriate experimental design is used so that the 44 contrasts used to estimate these coefficients are orthogonal. (The estimate of the intercept is not a contrast.) The biostatistician wishes to start by testing to determine if any of the coefficients are zero. Should a Bonferroni correction be used in combination with the 45 t-tests?
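
Supplementary note (not part of the original exam): the counts quoted in question 3, parts g and h, can be checked with a few lines of arithmetic. The Python sketch below is illustrative only, and the function names are hypothetical. It counts the terms of a full second-order polynomial in k factors and the number of treatments in a complete three-level factorial. It does not address the design or multiple-testing questions themselves.

```python
from math import comb


def quadratic_term_counts(k: int) -> dict:
    """Count the coefficients of a full second-order polynomial in k factors."""
    counts = {
        "intercept": 1,
        "linear": k,                     # one slope per factor
        "quadratic": k,                  # one squared term per factor
        "linear_by_linear": comb(k, 2),  # one cross-product per pair of factors
    }
    counts["total"] = sum(counts.values())
    return counts


def full_factorial_size(levels: int, k: int) -> int:
    """Number of treatment combinations in a complete factorial design."""
    return levels ** k


if __name__ == "__main__":
    print(quadratic_term_counts(8))
    print(full_factorial_size(3, 8))
```

With k = 8 this reproduces the 28 linear*linear terms and 45 coefficients stated in part h, and gives the size of the complete three-level factorial in all 8 components.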