Statistical Reasoning in Public Health Biostatistics 612, 2009, HW#3

1. A random sample of 200 patients admitted to an adult intensive care unit (ICU) was
collected to examine factors associated with death during hospital stay for ICU
patients. Data were also collected on each patient’s age (in years), race, whether the patient
had an infection at the time of ICU admission, and whether the patient had CPR
administered prior to the hospital admission. Of specific interest is whether or not
infection at the time of admission is associated with increased probability of death
during hospital stay. Logistic regression was employed to help answer the
substantive question. Below find the estimated coefficients for infection status at time
of admission from 4 different logistic regression models all relating the probability of
death in the ICU to patient characteristics. (For those interested in playing with the
data, I have placed a Stata file with the data on the course website homework page.)
a. What is the direction of the relationship between the probability of death and
patients’ infection status in this sample of 200 patients? Is this direction
consistent across the four logistic regression models presented above?
b. Compute a 95% CI for the unadjusted coefficient of infection status (i.e., the
coefficient from the first model listed) at the population level, based on the above
results.
c. Compute the estimated unadjusted odds ratio of death in the ICU for patients
admitted with an infection relative to patients admitted with no infection.
Copyright © 2009 The Johns Hopkins University and John McGready. Creative Commons BY-NC-SA.
Give a 95% confidence interval for this odds ratio, and interpret in words.
d. For all 3 regression models which include infection status and other patient
characteristics as predictors/covariates:
i. estimate the adjusted odds ratio of death for patients with
infection at the time of ICU admission relative to patients
without infection at the time of admission
ii. compute the 95% confidence interval for each of the adjusted
odds ratios
e. Which of the 3 adjusted odds ratios computed in part (d) are “statistically
significant”?
f. Is the relationship between death in ICU patients and infection at the time of
admission confounded by other patient characteristics? Give numerical evidence to
justify your answer.
g. How were subjects selected for inclusion in the study sample? Would it be
possible to use the results from the 4th logistic regression model (with
infection status, age, CPR, and race as predictors) assuming you were given
all slope estimates and the intercept, to estimate the probability (risk) of death
for various groups of patients based on the reported patient characteristics?
h. What additional information would you need to see to assess whether the
relationship between death in the ICU and infection was modified by patient’s
age?
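Parts (b) through (d) rest on the same arithmetic: a logistic regression coefficient is a log odds ratio, so a 95% CI is built on the coefficient scale and then exponentiated. The Python sketch below shows one way to do this. The function name, the coefficient, and the standard error are hypothetical illustrations, not values from the homework table.

```python
import math

def odds_ratio_ci(coef, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit).

    coef is an estimated logistic regression coefficient (a log odds ratio)
    and se is its standard error. z=1.96 gives an approximate 95% interval
    based on large-sample normal theory.
    """
    lower_coef = coef - z * se   # CI limits on the log odds scale
    upper_coef = coef + z * se
    # Exponentiate the estimate and both limits to move to the odds ratio scale.
    return math.exp(coef), math.exp(lower_coef), math.exp(upper_coef)

# Hypothetical inputs, chosen only to illustrate the calculation.
or_hat, lower, upper = odds_ratio_ci(coef=0.50, se=0.20)
print(f"OR = {or_hat:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 0 on the coefficient scale excludes 1 on the odds ratio scale, so either version can be used to judge statistical significance.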
2. The following exercise involves the results from a case-control study published in the
American Journal of Epidemiology in 2001 [1]. (The full article is available as a PDF on the course website.)
The article abstract is as follows:
“A case-control study design was used to determine and quantify all-terrain vehicle
(ATV) risk factors. The analysis was based on the results of two national probability
surveys conducted in 1997: a survey of injured ATV drivers treated in hospital
emergency departments and a survey of the general population of ATV users. Cases
were drawn from the injury survey; controls (ATV drivers who had not been injured)
were drawn from the user survey. Risk factors were quantified by means of a binary
logistic regression analysis. After adjustment for covariates, injury risks were
systematically related to a number of driver characteristics (age, gender, driving
experience), driver use patterns (monthly driving times, recreational vs.
nonrecreational use), and vehicle characteristics (number of wheels, engine size).
The results of the analysis suggest that future safety efforts should focus on reducing
child injuries, getting new drivers to participate in hands-on training programs, and
encouraging consumers to dispose of the three-wheel ATVs still in use. Am J
Epidemiol 2001;153:1112–18.”
[1] Rodgers G, Adler P. Risk Factors for All-Terrain Vehicle Injuries: A National Case-Control Study. (2001)
American Journal of Epidemiology. Vol 153, No 11: pp 1112–1118.
The authors use logistic regression analyses to estimate unadjusted and adjusted
associations between risk of injury and subject and ATV riding characteristics.
The following table displays the results of the unadjusted analyses relating to
individual predictors of interest:
For the categorical variables, the “reference” group that each of the other levels of the
predictor is being compared to by the odds ratios listed is indicated by an odds ratio
of 1.0 (the relative odds of injury for any group of persons compared to themselves is
1.0). So for example, for the predictor “Gender”, females are the reference group.
a) For the predictor Age, the authors categorized persons into 5 mutually
exclusive age groups. Which of these groups is the reference group for the age
comparisons?
b) In these results from the unadjusted analysis, which age group has the highest
risk of injury? What is the crude odds ratio for this group compared to the
reference? Interpret this odds ratio in words.
c) What is the 95% CI for the odds ratio in part b?
d) What is the estimated crude odds ratio of injury for those ≤ 15 years old
compared to the 16-25 year olds? (hint: this requires some minor computation
using information in the table)
e) What is the general relationship between the risk of injury and age in this sample?
f) Can the results relating injury to age be used to estimate the
prevalence/incidence of injury by age group? Why or why not?
g) Which sex is at higher risk of injuries based on the crude association between
injury and gender? What is the odds ratio of injury for males relative to females?
h) What information would you need to see to ascertain whether the relationship
between injury and sex was modified by age?
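The "minor computation" hinted at in part (d) relies on the fact that when two groups are each compared to the same reference group, the reference odds cancel in the ratio. A minimal Python sketch of that idea follows. The function name and the two odds ratios are hypothetical, not values from the article's table.

```python
def odds_ratio_between(or_a_vs_ref, or_b_vs_ref):
    """Odds ratio of group A relative to group B.

    Both inputs must be odds ratios against the SAME reference group:
    (odds_A / odds_ref) / (odds_B / odds_ref) = odds_A / odds_B.
    """
    return or_a_vs_ref / or_b_vs_ref

# Hypothetical odds ratios, each relative to a common reference group.
print(odds_ratio_between(3.0, 1.5))  # prints 2.0
```

The same shortcut does not extend to confidence intervals: dividing the CI endpoints does not give a valid interval for the new odds ratio, because that requires the covariance between the two coefficient estimates.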
The following table displays the results of the multiple logistic regression relating
injury to the predictors given in the table:
i) Why is there no corresponding adjusted odds ratio accompanying the intercept
in the table?
j) Which sex had a higher risk of injury after adjusting for the other predictors in
the model?
k) How does the adjusted odds ratio of injury for males to females compare in
value to the unadjusted estimate? Does this data suggest that the relationship
between injury and sex was confounded by at least some of the other predictors
used in the multiple logistic regression model?
l) In these results from the adjusted analysis, which age group has the highest
risk of injury? What is the adjusted odds ratio for this group compared to the
reference? Interpret this odds ratio in words.
m) Interpret the confidence interval for the quantity whose estimate you reported in
part l.
n) Generally speaking, how do the adjusted results relating injury to age compare
to the unadjusted results? Is there any evidence that the injury/age relationship
was confounded by at least some of the predictors used in the multiple regression
model?
o) What units is the “x” in for ATV driving experience in the above multiple
logistic regression model?
3. The following exercise involves information from the July 2004 AJPH article
“Asian/Pacific Islander Adolescent Sexual Orientation and Suicide Risk in Guam” [2].
The authors used survey results based on information collected from 1,381
adolescents in Guam. The full text of the article is posted on the course website.
A summary of the authors’ research motives and results can be found in the abstract:
The authors performed both simple and multiple logistic regressions to estimate the
association between having thoughts of suicide (“Suicide Ideation”) and adolescents’
self-described sexual orientation. The authors also performed simple and multiple
logistic regressions to estimate the association between attempting suicide and sexual
orientation. All regressions were run separately for boys and girls. The results of the
regressions are given in table 2, shown on the next page. The predictor “same sex” is
described by the authors by:
“The key independent variable for the analysis was sexual orientation.
This measure was coded 1 for gay, lesbian and bisexual adolescents
(heterosexual, not sure and don’t know responses were coded as 0). ……
Rates of reporting same sex orientation were 3.5% for both boys and
girls”
“Model 1” refers to simple logistic regression analyses, and “Model 2” refers to multiple
logistic regression analyses.
[2] Pinhey T, Millman S. Asian/Pacific Islander Adolescent Sexual Orientation and Suicide Risk in Guam (2004).
American Journal of Public Health. Vol 94, No 7: pp 1204–1206.
a) Interpret the estimated odds ratio and the 95% CI for the odds ratio associated with the
predictor “same sex” in “Model 1” for boys, with outcome “Suicide Ideation”.
b) Do the results given in Table 1 allow you to investigate whether the suicide
ideation/same-sex association is confounded by subject’s race, use of alcohol, sense
of hopelessness and involvement in a physically abusive relationship for both boys
and girls? If so, how could you do so?
c) Do you think it was necessary that the authors report odds ratios out to three decimal
places? Why or why not?
d) For whom in the sample (boys or girls) does same-sex orientation appear to be riskier
in terms of attempting suicide? Explain your answer.
e) Is enough information given in table 1 to ascertain whether sex modifies the
association between same-sex orientation and risk of attempted suicide?
Below find a diagram that may help you parse some of the results in the article tables:
Predictor 5: “hopelessness”: indicator of whether the respondent had
experienced feelings of hopelessness (defined in the article text) in the past year: a
single x taking on a value of 1 if yes, 0 if not. The adjusted odds ratio for
“hopelessness” compares the odds of the outcome for adolescents who had
experienced hopelessness to those who had not, but were otherwise the
same on the other predictors (Predictors 1, 2, 3, 4).
Sample Exam Questions: Choose the correct answer from the following multiple
choice questions.
A case-control study is performed to identify risk factors for a certain rare birth
outcome. The study includes 200 infants with this birth outcome and 200 healthy
infants without the birth outcome. A multiple logistic regression analysis is
performed to relate the probability of giving birth to a child with the outcome to
characteristics of the mother. The following results are presented.
                                         Adjusted odds ratio    95% Confidence interval
mother smoked during pregnancy
    no                                   1.0
    yes                                  2.0                    (1.5, 2.5)
mother drank alcohol during pregnancy
    no                                   1.0
    yes                                  1.2                    (0.8, 1.6)
maternal age
    > 20 years (old)                     1.0
    ≤ 20 years (young)                   1.5                    (1.2, 1.8)
4. After controlling for smoking and maternal age, which statement best
describes the relationship between alcohol and risk of the birth outcome, as
estimated by the above regression model?
(a) After accounting for sampling variability, alcohol is positively
associated with risk of the birth outcome (p < .05)
(c) After accounting for sampling variability, alcohol is negatively
associated with the risk of the birth outcome (p < .05)
(d) After accounting for sampling variability, alcohol is not associated
with the risk of the birth outcome (p > .05)
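Question 4 turns on the link between a 95% confidence interval and a test at the .05 level: when both come from the same large-sample (Wald) calculation, the interval for an odds ratio excludes the null value of 1 exactly when p < .05. The Python sketch below encodes that rule. The function name and the intervals are hypothetical and deliberately differ from those in the table above.

```python
def excludes_null(ci_lower, ci_upper, null_value=1.0):
    """True when a confidence interval for an odds ratio excludes the null value.

    For a 95% CI this corresponds to p < .05 for the matching two-sided test,
    assuming the interval and the test use the same standard error.
    """
    return not (ci_lower <= null_value <= ci_upper)

# Hypothetical 95% confidence intervals for three adjusted odds ratios.
print(excludes_null(1.3, 2.9))   # True: entirely above 1, positive association
print(excludes_null(0.7, 1.9))   # False: contains 1, not statistically significant
print(excludes_null(0.4, 0.8))   # True: entirely below 1, negative association
```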
5. Use only the information provided above. A younger mother who both
smokes and drinks alcohol has about how many times higher odds of having a
child with the birth outcome relative to a younger, non-drinking mother who
smokes?
(a) about 1.2 times higher odds
(b) about 1.5 times higher odds
(c) about 2 times higher odds
(d) about 3.6 times higher odds
(e) about 4.7 times higher odds
(f) about 9.6 times higher odds
6. Use only the information provided above. A younger mother who both
smokes and drinks alcohol has about how many times higher odds of having a
child with the birth outcome than an older non-smoking, non-drinking
mother?
(a) about 1.2 times higher odds
(b) about 1.5 times higher odds
(c) about 2 times higher odds
(d) about 3.6 times higher odds
(e) about 4.7 times higher odds
(f) about 9.6 times higher odds
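Questions 5 and 6 compare two covariate profiles using a multiple logistic regression model. Assuming the model contains no interaction terms (an assumption, since the handout does not list the fitted model), characteristics shared by both profiles cancel, and the adjusted odds ratios for the characteristics that differ multiply. The Python sketch below illustrates this. The function name and the odds ratios are hypothetical and are not taken from the table above.

```python
import math

def combined_odds_ratio(odds_ratios_for_differing_factors):
    """Odds ratio comparing two covariate profiles in a model without interactions.

    Pass only the adjusted odds ratios for predictors on which the two
    profiles differ; predictors with the same value in both profiles
    contribute a factor of 1 and can be left out.
    """
    return math.prod(odds_ratios_for_differing_factors)

# Hypothetical adjusted odds ratios for two predictors that differ between profiles.
print(combined_odds_ratio([3.0, 1.25]))  # prints 3.75

# If the profiles differ on only one predictor, the result is that predictor's odds ratio.
print(combined_odds_ratio([1.25]))       # prints 1.25
```

Multiplying odds ratios is the same as adding the corresponding coefficients on the log odds scale and then exponentiating the sum.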