Statistical Reasoning in Public Health
Biostatistics 612, 2009, HW#3
Copyright © 2009 The Johns Hopkins University and John McGready. Creative Commons BY-NC-SA.

1. A random sample of 200 patients admitted to an adult intensive care unit (ICU) was collected to examine factors associated with death during hospital stay for ICU patients. Data were also collected on each patient’s age (in years), race, whether the patient had an infection at the time of ICU admission, and whether the patient had CPR administered prior to the hospital admission. Of specific interest is whether infection at the time of admission is associated with an increased probability of death during hospital stay. Logistic regression was employed to help answer the substantive question. Below find the estimated coefficients for infection status at time of admission from 4 different logistic regression models, all relating the probability of death in the ICU to patient characteristics. (For those interested in playing with the data, I have placed a Stata file with the data on the course website homework page.)

a. What is the direction of the relationship between the probability of death and patients’ infection status in this sample of 200 patients? Is this direction consistent across the four logistic regression models presented above?
b. Compute a 95% CI for the (unadjusted, i.e., from the first model listed) coefficient of infection status (at the population level) based on the above results.
c. Compute the estimated unadjusted odds ratio of death in the ICU for patients admitted with an infection relative to patients admitted with no infection. Give a 95% confidence interval for this odds ratio, and interpret it in words.
d. For all 3 regression models which include infection status and other patient characteristics as predictors/covariates:
   i. estimate the adjusted odds ratio of death for patients with infection at the time of ICU admission relative to patients without infection at the time of admission
   ii. compute the 95% confidence interval for each of the adjusted odds ratios
e. Which of the 3 adjusted odds ratios computed in part (d) are “statistically significant”?
f. Is the relationship between death in ICU patients and infection at the time of admission confounded by other patient characteristics? Give numerical evidence to justify your answer.
g. How were subjects selected for inclusion in the study sample? Would it be possible to use the results from the 4th logistic regression model (with infection status, age, CPR, and race as predictors), assuming you were given all slope estimates and the intercept, to estimate the probability (risk) of death for various groups of patients based on the reported patient characteristics?
h. What additional information would you need to see to assess whether the relationship between death in the ICU and infection was modified by patient’s age?

2. The following exercise involves the results from a case-control study published in the American Journal of Epidemiology in 2001¹ (full article in .pdf on course website). The article abstract is as follows:

“A case-control study design was used to determine and quantify all-terrain vehicle (ATV) risk factors. The analysis was based on the results of two national probability surveys conducted in 1997: a survey of injured ATV drivers treated in hospital emergency departments and a survey of the general population of ATV users. Cases were drawn from the injury survey; controls (ATV drivers who had not been injured) were drawn from the user survey. Risk factors were quantified by means of a binary logistic regression analysis. After adjustment for covariates, injury risks were systematically related to a number of driver characteristics (age, gender, driving experience), driver use patterns (monthly driving times, recreational vs.
nonrecreational use), and vehicle characteristics (number of wheels, engine size). The results of the analysis suggest that future safety efforts should focus on reducing child injuries, getting new drivers to participate in hands-on training programs, and encouraging consumers to dispose of the three-wheel ATVs still in use. Am J Epidemiol 2001;153:1112–18.”

¹ Rodgers G, Adler P. Risk Factors for All-Terrain Vehicle Injuries: A National Case-Control Study. (2001) American Journal of Epidemiology. Vol 153, No 11: pp. 1112–1118.

The authors use logistic regression analyses to estimate unadjusted and adjusted associations between risk of injury and subject and ATV riding characteristics. The following table displays the results of the unadjusted analyses relating to individual predictors of interest:

For the categorical variables, the “reference” group to which each of the other levels of the predictor is compared is indicated by an odds ratio of 1.0 (the relative odds of injury for any group of persons compared to themselves is 1.0). So, for example, for the predictor “Gender”, females are the reference group.

a) For the predictor Age, the authors categorized persons into 5 mutually exclusive age groups. Which of these groups is the reference group for the age comparisons?
b) In these results from the unadjusted analysis, which age group has the highest risk of injury? What is the crude odds ratio for this group compared to the reference? Interpret this odds ratio in words.
c) What is the 95% CI for the odds ratio in part b?
d) What is the estimated crude odds ratio of injury for those ≤ 15 years old compared to the 16-25 year olds? (Hint: this requires some minor computation using information in the table.)
e) What is the general relationship between the risk of injury and age in this sample?
f) Can the results relating injury to age be used to estimate the prevalence/incidence of injury by age group? Why or why not?
g) Which sex is at higher risk of injuries based on the crude association between injury and gender? What is the odds ratio of injury for males relative to females?
h) What information would you need to see to ascertain whether the relationship between injury and sex was modified by age?

The following table displays the results of the multiple logistic regression relating injury to the predictors given in the table:

i) Why is there no corresponding adjusted odds ratio accompanying the intercept in the table?
j) Which sex had a higher risk of injury after adjusting for the other predictors in the model?
k) How does the adjusted odds ratio of injury for males to females compare in value to the unadjusted estimate? Do these data suggest that the relationship between injury and sex was confounded by at least some of the other predictors used in the multiple logistic regression model?
l) In these results from the adjusted analysis, which age group has the highest risk of injury? What is the adjusted odds ratio for this group compared to the reference? Interpret this odds ratio in words.
m) Interpret the confidence interval for the quantity whose estimate you reported in part l.
n) Generally speaking, how do the adjusted results relating injury to age compare to the unadjusted results? Is there any evidence that the injury/age relationship was confounded by at least some of the predictors used in the multiple regression model?
o) In what units is the “x” for ATV driving experience measured in the above multiple logistic regression model?
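Part d) asks for an odds ratio between two groups when neither of them is the reference group. The arithmetic can be sketched as follows, using hypothetical odds ratios rather than the values in the article's table. If groups A and B are each compared to the same reference group, the reference odds cancel, so OR(A vs. B) = OR(A vs. ref) / OR(B vs. ref). The function name and the numbers below are assumptions made purely for illustration.

```python
def odds_ratio_between(or_a_vs_ref, or_b_vs_ref):
    """Odds ratio of group A relative to group B, given each group's
    odds ratio relative to a shared reference group.

    (odds_A / odds_ref) / (odds_B / odds_ref) = odds_A / odds_B
    """
    if or_a_vs_ref <= 0 or or_b_vs_ref <= 0:
        raise ValueError("odds ratios must be positive")
    return or_a_vs_ref / or_b_vs_ref


# Hypothetical crude odds ratios (NOT taken from the article's table):
or_a = 3.0  # group A vs. reference
or_b = 1.5  # group B vs. reference

ratio = odds_ratio_between(or_a, or_b)
print(f"OR, group A vs. group B: {ratio:.2f}")
```

Note that a confidence interval for this derived odds ratio cannot be obtained by dividing the endpoints of the two reported intervals. It would require refitting the model with group B as the reference, or knowing the covariance between the two estimated coefficients.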
3. The following exercise involves information from the July 2004 AJPH article “Asian/Pacific Islander Adolescent Sexual Orientation and Suicide Risk in Guam”². The authors used survey results based on information collected from 1,381 adolescents in Guam. The full text of the article is posted on the course website. A summary of the authors’ research motives and results can be found in the abstract:

The authors performed both simple and multiple logistic regressions to estimate the association between having thoughts of suicide (“Suicide Ideation”) and adolescents’ self-described sexual orientation. The authors also performed simple and multiple logistic regressions to estimate the association between attempting suicide and sexual orientation. All regressions were run separately for boys and girls. The results of the regressions are given in table 2, shown on the next page.

The predictor “same sex” is described by the authors as follows: “The key independent variable for the analysis was sexual orientation. This measure was coded 1 for gay, lesbian and bisexual adolescents (heterosexual, not sure and don’t know responses were coded as 0). …… Rates of reporting same sex orientation were 3.5% for both boys and girls”

“Model 1” refers to simple logistic regression analyses, and “Model 2” refers to multiple logistic regression analyses.

² Pinney T, Millman S. Asian/Pacific Islander Adolescent Sexual Orientation and Suicide Risk in Guam (2004). American Journal of Public Health. Vol 94, No 7: pp. 1204–1206.

a) Interpret the estimated odds ratio and 95% CI for the odds ratio associated with the predictor “same sex” in “Model 1” for boys, with outcome “Suicide Ideation”.
b) Do the results given in Table 1 allow you to investigate whether the suicide ideation/same-sex association is confounded by subject’s race, use of alcohol, sense of hopelessness and involvement in a physically abusive relationship for both boys and girls? If so, how could you do so?
c) Do you think it was necessary that the authors report odds ratios out to three decimal places? Why or why not?
d) For whom in the sample (boys or girls) does same-sex orientation appear to be riskier in terms of attempting suicide? Explain your answer.
e) Is enough information given in table 1 to ascertain whether sex modifies the association between same-sex orientation and risk of attempted suicide?

Below find a diagram that may help you parse some of the results in the article tables:

Predictor 5: “hopelessness”: indicator of whether respondent had experienced feelings of hopelessness (defined in article text) in past year: a single x taking on a value of 1 if yes, 0 if not. The adjusted odds ratio for “hopelessness” compares the odds of the outcome for adolescents who had experienced hopelessness to those who had not, but were otherwise the same on the other predictors (Predictors 1, 2, 3, 4).

Sample Exam Questions:

Choose the correct answer from the following multiple choice questions.

A case-control study is performed to identify risk factors for a certain rare birth outcome. The study includes 200 infants with this birth outcome and 200 healthy infants without the birth outcome. A multiple logistic regression analysis is performed to relate the probability of giving birth to a child with the outcome to characteristics of the mother. The following results are presented.
                                          Adjusted odds ratio    95% Confidence interval
mother smoked during pregnancy
    no                                    1.0
    yes                                   2.0                    (1.5, 2.5)
mother drank alcohol during pregnancy
    no                                    1.0
    yes                                   1.2                    (0.8, 1.6)
maternal age
    > 20 years (old)                      1.0
    ≤ 20 years (young)                    1.5                    (1.2, 1.8)

4. After controlling for smoking and maternal age, which statement best describes the relationship between alcohol and risk of the birth outcome, as estimated by the above regression model?
(a) After accounting for sampling variability, alcohol is positively associated with risk of the birth outcome (p < .05)
(c) After accounting for sampling variability, alcohol is negatively associated with the risk of the birth outcome (p < .05)
(d) After accounting for sampling variability, alcohol is not associated with the risk of the birth outcome (p > .05)

5. Use only the information provided above. A younger mother who both smokes and drinks alcohol has about how many times higher odds of having a child with the birth outcome relative to a younger, non-drinking mother who smokes?
(a) about 1.2 times higher odds
(b) about 1.5 times higher odds
(c) about 2 times higher odds
(d) about 3.6 times higher odds
(e) about 4.7 times higher odds
(f) about 9.6 times higher odds

6. Use only the information provided above. A younger mother who both smokes and drinks alcohol has about how many times higher odds of having a child with the birth outcome than an older non-smoking, non-drinking mother?
(a) about 1.2 times higher odds
(b) about 1.5 times higher odds
(c) about 2 times higher odds
(d) about 3.6 times higher odds
(e) about 4.7 times higher odds
(f) about 9.6 times higher odds
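The questions in this assignment rest on three facts about logistic regression. Coefficients live on the log-odds scale, so a 95% CI for an odds ratio comes from exponentiating the endpoints of the CI for the coefficient. An odds ratio is statistically significant at the 5% level when its 95% CI excludes 1. In a model without interaction terms, the odds ratio for changing several predictors at once is the product of the individual adjusted odds ratios. The following is a minimal Python sketch of these ideas. The coefficient, standard error, and odds ratios are hypothetical (none are taken from the tables above), and the helper names are assumptions for illustration only.

```python
import math


def wald_odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, (lower, upper)) from a log-odds coefficient.

    The CI is built on the log-odds scale (beta +/- z * se) and then
    exponentiated, which is why it is not symmetric around the OR.
    """
    lower = beta - z * se
    upper = beta + z * se
    return math.exp(beta), (math.exp(lower), math.exp(upper))


def excludes_null(ci, null_value=1.0):
    """True if the interval does not contain the null odds ratio of 1."""
    lower, upper = ci
    return not (lower <= null_value <= upper)


def combined_odds_ratio(*odds_ratios):
    """Odds ratio for differing on several predictors at once, assuming a
    model with no interaction terms: log-odds add, so odds ratios multiply.
    """
    return math.exp(sum(math.log(r) for r in odds_ratios))


# Hypothetical coefficient and standard error (not from the assignment):
beta_hat, se_hat = 0.9, 0.4
or_hat, ci = wald_odds_ratio_ci(beta_hat, se_hat)
print(f"OR = {or_hat:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("significant at the 5% level:", excludes_null(ci))

# Hypothetical adjusted odds ratios for two predictors in the same model:
print(f"combined OR = {combined_odds_ratio(1.4, 2.5):.2f}")
```

The combined odds ratio is only a point estimate. Its confidence interval depends on the covariance between the coefficient estimates, which a table of odds ratios does not report.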