Biost 536, Fall 2014 Homework #1 September 26, 2014, Page 1 of 5 Biost 536: Categorical Data Analysis in Epidemiology Emerson, Fall 2014 Homework #1 September 26, 2014 Written problems: To be submitted as a MS-Word compatible file to the class Catalyst dropbox by 1:30 pm on Thursday, September 26, 2014. See the instructions for peer grading of the homework that are posted on the web pages. On this (as all homeworks) Stata / R code and unedited Stata / R output is TOTALLY unacceptable. Instead, prepare a table of statistics gleaned from the Stata output. The table should be appropriate for inclusion in a scientific report, with all statistics rounded to a reasonable number of significant digits. (I am interested in how statistics are used to answer the scientific question.) In all problems requesting “statistical analyses” (either descriptive or inferential), you should present both Methods: A brief sentence or paragraph describing the statistical methods you used. This should be using wording suitable for a scientific journal, though it might be a little more detailed. A reader should be able to reproduce your analysis. DO NOT PROVIDE Stata OR R CODE. Inference: A paragraph providing full statistical inference in answer to the question. Please see the supplementary document relating to “Reporting Associations” for details. Keys to past homeworks from quarters that I taught Biost 517 (e.g. HW #8 from 2012) or Biost 518 (e.g., HW #1 from 2014) or Biost 536 (e.g. HW #3 from 2013) might be consulted for the presentation of inferential results. Note that the requirement to provide a paragraph describing your statistical methods was new this year in Biost 518, and other past keys do not give explicit examples of a separate paragraph. However, many past keys provide this information as an introductory sentence. All questions of treatment benefit relate to any benefit of idarubicin to more frequently induce complete remission in patients with acute myelogenous leukemia (AML) relative to daunorubicin. The data can be found on the class web page (follow the link to Datasets) in the file labeled leukemia.txt. Documentation is in the file leukemia.doc. The data is in free-field format. The file initleuk.txt contains Stata and R code that might be useful in reading in the file. In this homework, we are ultimately interested in inference for the primary endpoint (induction of complete remission) by treatment and sex. We will ignore the sequential aspect of the clinical trial and just analyze the data as if we had always planned on getting data on 130 subjects. We will use an intent to treat approach that analyzes all subjects according to how they were randomized. 1. Provide suitable descriptive statistics for this dataset as might be presented in Table 1 of a manuscript appearing in the medical literature. Methods: Dichotomous variables were created for attainment of complete remission (Complete Remission), status at follow-up (Alive at Follow-Up), and whether subjects received a bone marrow transplant (Bone Marrow Transplant). Descriptive statistics are restricted to the treatment arm of the study (Idarubicin), the control arm (Daunorubicin), Biost 536, Fall 2014 Homework #1 September 26, 2014, Page 2 of 5 and finally presented overall. For continuous and ordered categorical variables (age, French American British classification of acute myeloid leukemia (FAB classification of AML), Karnofsky score, white blood cell count, platelet count, hemoglobin, chemotherapy courses to complete remission, and observation time) we include the mean, standard deviation, minimum and maximum. For binary variables (sex and indicators of complete remission, status at follow up, and bone marrow transplant) we present percentages. Results: Data is available on 130 subjects; however, data is missing on FAB AML Subtype for 8 subjets on the daunorubicin arm and 4 subjects on the idarubicin arm. Additionally one subject has missing data for white blood cell count, platelet count, and hemoglobin at baseline in the daunorubicin arm. Since none of these covariates will be adjusted for in this initial analysis, this won’t be an immediate problem but should we should take care to note how this may affect further analyses. This randomized control trial has 65 of the 130 subjects assigned to each arm. Gender is fairly evenly distributed with males making up about 54% of the arm receiving daunorubicin treatment and 46% of the idarubicin arm, age distribution was similarly evenly distributed ranging from late teens to early 60s with subjects on the daunorubicin arm having an average age of 40 and subjects on the idarubicin arm averaging 38 years of age. Among indicators of health and outcomes, subjects on the daunorubicin arm tended to have higher baseline white blood cell and platelet counts, subects on the idarubicin arm had a higher proportion achieving complete remission (78-59%), a higher proportion were alive at follow up (42-25%) and had a longer mean observation time (574-421). Treatment Arm Daunorubicin Idarubicin Total N=65 N=65 N=130 Male (%) 53.8% 46.2% 50.0% 40(13.4; 19 38(12.5; 17 39(12.9; 17 Age (yrs)1 60) 61) 61) FAB Classification of AML 3(1.5; 0 - 6) 3(1.5; 1 - 6) 3(1.5; 0 - 6) Subtype1 80(12.6; 40 80(11.6; 30 80(12.1; 30 Karnofsky Score 100) 100) 100) Baseline White Blood 43(55.0; 0.7 - 29(36.3; 0.4 36(46.9; 0.4 Cells (103/mm3)1 215) 154.1) 215) Baseline Platelets 94(92.4;11 67(57.8; 11 80(77.8; 11 (103/mm3)1 457) 370) 457) Baseline Hemoglobin 9.6(1.5; 6.4 9.2(1.8; 2.8 9.4(17; 2.8 1 (g/dl) 13.9) 13.7) 13.9) Complete Remission (%) 58.5% 78.46% 68.5% Chemotherapy Courses to Complete Remission2 1.5(0.5; 1 - 2) 1.3(0.5; 1 - 2) 1.4(0.5;1 - 2) 24.6% 41.5% 33.1% 9.2% 12.3% 10.8% Alive at Follow-Up (%) Bone Marrow Transplant (%) Biost 536, Fall 2014 Homework #1 September 26, 2014, Page 3 of 5 Observation Time 421(322.2; 3 - 574(445.2; 9 - 497.5(394.6; 3 (Days)1 1848) 1800) 1848) 1 Summary statistics are of the form Mean(SD; Min-Max) 2This only applies to subjects achieving complete remission, so the sample sizes are as follows: Daunorubicin Arm=38, Idarubicin Arm=51, Total=89 2. Perform an analysis to assess whether subjects taking idarubicin have better primary clinical outcomes than patients on daunorubicin. Methods: The proportion of subjects achieving complete remission was compared between groups receiving the idarubicin treatment and those receiving the daunorubicin treatment. Because Wald-based statistics behave the worst at more extreme proportions than were observed by treatment arm in table 1, I felt comfortable using a chi-square test and taking a 95% confidence interval using Wald-based statistics. Results: The proportion of subjects on the idarubicin treatment arm that achieved complete remission was .78 compared to .58 on the daunoribicin arm. The observed 20% absolute difference in remission is one that would not be surprising if the true difference were between 4.4% and 35.6% with the idarubicin arm having the more favorable outcomes. With a one sided chi-square p-value of 0.014, we can assert with confidence that we can reject the null hypothesis of no difference in outcomes in favor of the hypothesis that subjects taking idarubicin have more favorable clinical outcomes. 3. Is the analysis of treatment effect confounded by sex? Provide your reasoning. One would not expect there to be confounding by sex, as this is a randomized control trial. We do see a higher proportion of males in the daunorubicin arm than in the idarubicin arm but the difference in proportions is pretty small. In the analyses that follow; however, we will see that males have worse remission rates than females independent of treatment so the concern about confounding is nonnegligible. 4. Perform an analysis to assess any treatment benefit of idarubicin over daunorubicin adjusted for sex. Methods: Odds of achieving complete remission were compared between subjects taking idarubicin and subjects taking daunorubicin using logistic regression holding sex constant and allowing for unequal variances. Results: Subjects, controlling for sex who took idarubicin were observed to have odds of complete remission 2.5 times higher than those taking daunorubicin. This observed odds ratio would not be surprising if the true odds ratio comparing subjects in the treatment group and the control group were between 1.14 and 5.53. With a one-sided p-value of 0.022 we can reject with confidence the hypothesis that the odds of achieving complete remission do not differ between treatment arms when holding sex constant. Biost 536, Fall 2014 Homework #1 September 26, 2014, Page 4 of 5 5. Perform an analysis to assess whether males taking idarubicin have more frequent complete remission than males taking daunorubicin. Methods: Odds of achieving complete remission were compared between male subjects taking idarubicin and subjects taking daunorubicin using logistic regression allowing for unequal variances. Results: Male subjects who took idarubicin were observed to have odds of complete remission 2.5 times higher than those taking daunorubicin. This observed odds ratio would not be surprising if the true odds ratio comparing subjects in the treatment group and the control group were between 0.89 and 6.88. With a one-sided p-value of 0.084 and a 95% confidence interval that includes an odds ratio of 1, we cannot reject the hypothesis that the odds of achieving complete remission do not differ between treatment arms for males. 6. Repeat problem 5 for females. Methods: Odds of achieving complete remission were compared between female subjects taking idarubicin and subjects taking daunorubicin using logistic regression allowing for unequal variances. Results: Female subjects who took idarubicin were observed to have odds of complete remission 2.6 times higher than those taking daunorubicin. This observed odds ratio would not be surprising if the true odds ratio comparing subjects in the treatment group and the control group were between 0.75 and 8.77. With a onesided p-value of 0.131 and a 95% confidence interval that includes an odds ratio of 1, we cannot reject the hypothesis that the odds of achieving complete remission do not differ between treatment arms for males. 7. Perform an analysis to assess whether any treatment benefit of idarubicin over daunorubicin differs by sex. Methods: Odds ratios of achieving complete remission between subjects taking idarubicin and subjects taking daunorubicin were compared by sex using logistic regression controlling for sex, including an interaction term, and allowing for unequal variances. Results: Male subjects who took idarubicin were observed to have odds ratios of complete remission compared to their daunorubicin treated counterparts that was 4% less than the odds ratio comparing idarubicin and daunorubicin taking females. This difference in odds ratios would not be surprising if the difference in odds ratio for males was between one fifth (20%) that for females and 4.8 times larger than that for females. With a one-sided p-value of 0.961 and a 95% confidence interval that includes a no difference, we cannot reject the hypothesis that the treatment effect of idarubicin compared to daunorubicin is the same between men and women. 8. Use the analysis you performed in problem 7 to answer the question of whether idarubicin use is associated with more frequent induction of remission. Using a logistic regression comparing odds of remission between subjects treated with idarubicin and daunorubicin that controlled for sex and allowed for a differential effect between men and women, we find an estimated odds ratio for Biost 536, Fall 2014 Homework #1 September 26, 2014, Page 5 of 5 women of 2.57 that would not be atypical if the true odds ratio were between 0.75 and 8.81. Additionally, the estimated odds ratio for males of 2.47 could easily occur if the true odds ratio were between 0.5 and 12.3. Given that these all contain the null hypothesis of no difference, we cannot confidently say that idarubicin is associated with more frequent induction of remission. 9. For each of the analysis models used in problems 4 and 7, provide estimates of the probability of inducing a complete remission by all combinations of treatment group and sex. How do these estimates compare to descriptive statistics for those groups? Methods: Odds are simply p/(1-p) where p is the probability of event. To get probabilities from odds we simply take the odds estimated by multiplying the baseline odds (females on the daunoribicin arm) by whichever treatment or gender effects apply and applying the interaction term for males for the model that employs it and take the resulting odds/(1+odds). Results: The table below shows the predicted probability of inducing a complete remission by sex and treatment arm. Probability of Remission Idarubicin Daunorubicin Males Females Males Females Model Type No Interaction Interaction 0.702 0.700 0.855 0.857 0.484 0.476 0.702 0.700 Observed 0.700 0.857 0.486 0.700 Both the model that does not include interaction and the model that does include an interaction term predict the clinical outcomes in the dataset reasonably well. Each is within 1% absolute difference from the observed probabilities of remission. 10. Which of the above analyses should be used to decide whether idarubicin should be approved for the indication of AML? What problems exist with the use of the other analyses you performed? If I were asked to perform a single analysis to determine whether idarubicin should be used for AML I would have used the analysis that controls for sex. As mentioned before, we have different proportions of males and females in each arm with a difference in baseline remission rates that would make the uncontrolled model susceptible to confounding. In addition, without performing any statistical analyses beyond summary statistics, we can see a substantial difference in treatment effects. Ignoring this would be scientifically dishonest and would lead to reporting an aggregate effect that we cannot confidently state is uniform between sexes.