JVMgrading2384 - Emerson Statistics

Biost 536: Categorical Data Analysis in Epidemiology
Emerson, Fall 2014
Homework #2
October 5, 2014
Written problems:
In all problems requesting “statistical analyses” (either descriptive or inferential), you should
present both
• Methods: A brief sentence or paragraph describing the statistical methods you used.
This should use wording suitable for a scientific journal, though it might be a
little more detailed. A reader should be able to reproduce your analysis. DO NOT
PROVIDE Stata OR R CODE.
• Inference: A paragraph providing full statistical inference in answer to the question.
Please see the supplementary document relating to “Reporting Associations” for
details.
For these questions, we will be considering adjustment for age and sex using both stratified and
regression analyses. For the stratified analyses, it will be necessary to use an appropriate categorization of
age.
1. We are interested in analyzing associations between 5 year mortality and prevalence of
ASCVD at study enrollment using statistical methods appropriate for binary response
variables. The observation time for death among these subjects is potentially subject to
censoring. Provide a statistical analysis demonstrating that such methods as logistic
regression can be used to answer this question.
Methods: To evaluate the association between 5 year mortality and prevalence of
ASCVD, logistic regression was used to model the odds of death within 5 years (a binary
outcome) as a function of the binary predictor of interest, history of CHD (ASCVD).
Robust standard errors were used to adjust for the heteroscedasticity of the binary
outcome variable, mortality in 5 years. 95% confidence intervals were calculated using
Wald type CI based on the approximate normal distribution for the maximum likelihood
estimates for a binomial distribution. Age, as a continuous variable, and sex, as a binary
variable, were included in the model to adjust for age and sex differences in the
relationship between ASCVD and 5-year mortality. To account for right-censored data
(i.e., patients whose follow-up ended before 5 years without an observed death), a post hoc
analysis was completed using tobit regression.
Inference: Adjusted for sex and age, the odds of death in 5 years in individuals with a
history of ASCVD is 3.57 times the odds of death in 5 years in individuals without a
history of ASCVD (95% CI 2.36-5.39). Based on this data, we can reject the null
hypothesis that there is equal likelihood of death in 5 years among individuals with and
without a history of ASCVD (p-value <.001), after adjusting for both age and sex.
**Noting question 3b below, I wonder whether, despite the introductory paragraphs
indicating that all analyses should be adjusted for age and sex, we were supposed to present
unadjusted estimates here. Those would be 4.0 (2.6-6.1), p-value <.001.
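The argument that binary-response methods are valid here rests on every subject's 5 year vital status being known: a subject either died within 5 years or was followed for at least 5 years. A minimal sketch of that check follows. The record layout (observation time in years, death indicator) and the example values are hypothetical and are not taken from the course dataset.

```python
def five_year_status(obstime, death, horizon=5.0):
    """Return 1 (died within horizon), 0 (alive at horizon), or None (undetermined)."""
    if death == 1 and obstime <= horizon:
        return 1  # death observed within the horizon
    if obstime >= horizon:
        return 0  # followed past the horizon, so known to be alive at 5 years
    return None  # censored before the horizon: 5 year status is unknown


# Hypothetical records: (observation time in years, death indicator)
records = [(2.3, 1), (5.6, 0), (6.1, 1), (4.0, 0)]
statuses = [five_year_status(t, d) for t, d in records]
undetermined = sum(s is None for s in statuses)
print(statuses, undetermined)
```

If no subject has an undetermined status, the binary indicator of death within 5 years is fully observed and censoring does not interfere with logistic regression.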
2. Using the risk difference (RD) as a measure of association, provide statistical inference
regarding an association between 5 year survival and baseline prevalence of ASCVD,
adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cs or
an equivalent analysis in R).
Methods: A standard Mantel-Haenszel analysis was used to calculate stratum-specific
risk estimates for the dependent variable, 5-year mortality, based on the
binary predictor of interest (POI), history of ASCVD, and adjusted for/stratified by the
binary variable, sex, and the categorized variable, age (by 5 year increments). 95% confidence
intervals were calculated using Wald type CI based on the approximate normal
distribution for the maximum likelihood estimates for a binomial distribution, and
exact methods were used to calculate the appropriate p-value due to small sample
sizes in stratum-specific age categories.
Inference: The absolute risk difference was 21.1% (95% CI 14.4%-27.8%) in the
stratum-specific adjusted model. That is to say, adjusted for sex and age, the
risk of death in 5 years in individuals with a history of ASCVD is 21.1 percentage
points higher than the risk of death in 5 years in individuals without a
history of ASCVD.
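The pooled estimate from a stratified analysis of this kind can be sketched as a weighted average of stratum-specific risk differences. The sketch below uses one common form of the Mantel-Haenszel weights, n1*n0/N per stratum; whether a given software routine uses exactly these weights is an assumption, and the counts are hypothetical.

```python
def mh_risk_difference(strata):
    """Pooled risk difference across strata.

    strata: list of (a, n1, c, n0) = exposed events, exposed total,
    unexposed events, unexposed total.
    """
    num = 0.0
    den = 0.0
    for a, n1, c, n0 in strata:
        w = n1 * n0 / (n1 + n0)        # Mantel-Haenszel style weight
        num += w * (a / n1 - c / n0)   # weight times stratum risk difference
        den += w
    return num / den


# Hypothetical counts for two age-by-sex strata
example_strata = [(10, 40, 20, 160), (30, 60, 30, 120)]
rd_pooled = mh_risk_difference(example_strata)
print(round(rd_pooled, 4))
```

Note that the weights depend only on the stratum sample sizes, not on the estimated risks.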
b. Answer the question using an appropriate regression model.
Methods: To evaluate the association between 5 year mortality and prevalence of
ASCVD, linear regression was used to estimate the absolute risk difference of
mortality in 5 years by the binary predictor of interest, history of CHD (ASCVD).
Robust standard errors were used to adjust for the heteroscedasticity of the binary
outcome variable, mortality in 5 years. 95% confidence intervals were calculated
using Wald type CI based on the approximate normal distribution for the
maximum likelihood estimates for a binomial distribution. Age, as a continuous
variable, and sex, as a binary variable, were included in the model to adjust for
age and sex differences in the relationship between ASCVD and 5-year mortality.
To account for right-censored data (i.e., patients whose follow-up ended before 5 years
without an observed death), a post hoc analysis was completed using tobit regression.
Inference: Adjusted for sex and age, the risk of death in 5 years in individuals
with a history of ASCVD is 18.9 percentage points higher than the risk
of death in 5 years in individuals without a history of ASCVD (95% CI
12.2%-25.7%). Based on these data, we can reject the null hypothesis that there is no
difference in the risk of death in 5 years among individuals with and without a
history of ASCVD (p-value <.001), after adjusting for both age and sex.
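To illustrate what linear regression with robust standard errors estimates, consider the simplified unadjusted case. With only an intercept and a binary predictor, the slope is the difference in sample proportions, and the sandwich variance (without a small-sample correction) reduces to the familiar unequal-variance formula. The sketch below shows only this special case, not the age- and sex-adjusted model, and the counts are hypothetical.

```python
import math


def risk_difference_wald(a, n1, c, n0, z=1.96):
    """Risk difference with a Wald interval using group-specific variances.

    a, n1: events and total in the exposed group
    c, n0: events and total in the unexposed group
    """
    p1, p0 = a / n1, c / n0
    rd = p1 - p0
    # Each group contributes its own binomial variance (no pooling)
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return rd, se, (rd - z * se, rd + z * se)


# Hypothetical counts
rd, se, (ci_low, ci_high) = risk_difference_wald(40, 100, 60, 300)
print(round(rd, 3), round(se, 4), round(ci_low, 3), round(ci_high, 3))
```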
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
The MH approach (former) weights the data by sample sizes, but ignores the
mean-variance relationship in choosing those weights. The regression approach
takes into account the mean-variance relationship when weighting the data.
3. Using the odds ratio (OR) as a measure of association, provide statistical inference
regarding an association between 5 year survival and baseline prevalence of ASCVD,
adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cc or
an equivalent analysis in R).
Methods: A standard Mantel-Haenszel analysis was used to calculate stratum-specific
odds ratios for mortality in 5 years based on the binary POI, history of ASCVD. Stratification
by the binary variable, sex, and the categorized variable, age (by 5 year increments) was
completed. 95% confidence intervals were calculated using Wald type CI based on the
approximate normal distribution for the maximum likelihood estimates for a binomial
distribution, and exact methods were used to calculate the appropriate p-value due to
small sample sizes in stratum-specific age categories.
Inference: Using this approach, adjusted for age, the odds of death in 5 years in females
with a history of ASCVD is 3.36 times the odds of death in 5 years in females without a
history of ASCVD (95% CI 1.64-6.81) and 4.02 times (95% CI 2.31-7.03) among males.
Based on these data, we can reject the null hypothesis that there is equal likelihood of
death in 5 years among both males and females with and without a history of ASCVD
(p-value <.001), after adjusting for age. (I could not figure out the Stata commands to
calculate these together. The cc command with the istandard, estandard, or user-supplied
weight options did not work.)
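A single summary odds ratio across both sex and age can be obtained by treating each age-by-sex combination as one stratum and pooling with the Mantel-Haenszel estimator. A minimal sketch follows; the counts are hypothetical and the function name is illustrative only.

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio.

    strata: list of (a, b, c, d) = exposed cases, exposed non-cases,
    unexposed cases, unexposed non-cases.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical 2x2 tables, one per age-by-sex stratum
example_tables = [(10, 30, 20, 140), (30, 30, 30, 90)]
or_pooled = mh_odds_ratio(example_tables)
print(or_pooled)
```

The pooled value lies between the stratum-specific odds ratios (here 2.33 and 3.0).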
b. Answer the question using an appropriate regression model.
Methods: To evaluate the association between 5 year mortality and prevalence of
ASCVD, logistic regression was used to estimate the binary dependent variable, odds of
mortality in 5 years based on the binary predictor of interest, history of CHD (ASCVD).
Robust standard errors were used to adjust for the heteroscedasticity of the binary
outcome variable, mortality in 5 years. 95% confidence intervals were calculated using
Wald type CI based on the approximate normal distribution for the maximum likelihood
estimates for a binomial distribution. Age, as a continuous variable, and sex, as a binary
variable, were included in the model to adjust for age and sex differences in the
relationship between ASCVD and 5-year mortality.
Inference: Adjusted for sex and age, the odds of death in 5 years in individuals with a
history of ASCVD is 3.57 times the odds of death in 5 years in individuals without a
history of ASCVD (95% CI 2.36-5.39). Based on this data, we can reject the null
hypothesis that there is equal likelihood of death in 5 years among individuals with and
without a history of ASCVD (p-value <.001), after adjusting for both age and sex.
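Because the Wald interval is built on the log odds scale, the reported estimate and interval can be checked for internal consistency. The sketch below back-calculates the implied standard error and z statistic from the values reported above, assuming a 1.96 multiplier was used.

```python
import math

# Values reported in the Inference paragraph above
or_hat, lower, upper = 3.57, 2.36, 5.39

log_or = math.log(or_hat)
# Implied standard error of the log odds ratio from the interval width
se_log_or = (math.log(upper) - math.log(lower)) / (2 * 1.96)
z_stat = log_or / se_log_or
# Two-sided p-value from the standard normal distribution
p_two_sided = math.erfc(abs(z_stat) / math.sqrt(2))
# The interval should be symmetric about the estimate on the log scale
midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)

print(round(se_log_or, 3), round(z_stat, 2), p_two_sided < 0.001, round(midpoint, 2))
```

The geometric midpoint of the interval matches the point estimate to rounding, and the implied p-value is consistent with the reported p-value <.001.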
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
The MH approach weights the data by sample sizes, but ignores the mean-variance
relationship in choosing those weights. The regression approach takes into account
the mean-variance relationship when weighting the data.
4. Using the risk ratio (RR) as a measure of association, provide statistical inference
regarding an association between 5 year survival and baseline prevalence of ASCVD,
adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command ir or
an equivalent analysis in R).
Methods: A standard Mantel-Haenszel analysis was used to calculate stratum-specific
relative risks of mortality in 5 years based on the binary POI, history of ASCVD. Stratification
by the binary variable, sex, and the categorized variable, age (by 5 year increments) was
completed. 95% confidence intervals were calculated using Wald type CI based on the
approximate normal distribution for the maximum likelihood estimates for a binomial
distribution.
Inference: Using this approach, adjusted for age, the risk of death in 5 years in
females with a history of ASCVD is 2.5 times the risk of death in 5 years in females
without a history of ASCVD (95% CI 1.64-6.81), and 3.16 times (95% CI
2.04-5.38) in males. Based on these data, we can reject the null hypothesis that there is
equal likelihood of death in 5 years among both males and females with and without a
history of ASCVD (p-value <.001), after adjusting for age. (As above, with MH analyses,
I could not find the Stata commands to calculate point estimates with more than 1
stratification.)
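As with the odds ratio, a point estimate pooled over more than one stratification variable can be computed by defining one stratum per age-by-sex combination. A minimal sketch of the Mantel-Haenszel pooled risk ratio follows, with hypothetical counts.

```python
def mh_risk_ratio(strata):
    """Mantel-Haenszel pooled risk ratio.

    strata: list of (a, n1, c, n0) = exposed events, exposed total,
    unexposed events, unexposed total.
    """
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


# Hypothetical counts: both strata have a stratum-specific risk ratio of 2
example_counts = [(10, 40, 20, 160), (30, 60, 30, 120)]
rr_pooled = mh_risk_ratio(example_counts)
print(rr_pooled)
```

When the stratum-specific risk ratios are all equal, the pooled estimate returns that common value.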
b. Answer the question using an appropriate regression model.
Methods: A Poisson regression was used to calculate the relative risk of the
dependent binary variable, death in 5 years, in individuals with a history of
ASCVD compared to those without, adjusting for the linear variable, age, and the
binary variable, sex. A Poisson regression was chosen assuming its
approximation to the binomial in instances of a rare outcome. Robust standard
errors were used to account for the mean-variance relationship.
Inference: Adjusting for age and sex, the risk of death in 5 years in
individuals with a history of ASCVD is 2.72 (95% CI 1.89-3.91) times the risk of
death in 5 years among individuals without a history of ASCVD. These data
provide evidence to reject the null hypothesis that there is no difference in risk of
death based on ASCVD status (p-value <.001).
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
The Poisson regression approach accounts for the mean-variance relationship with
robust SE estimates, while the MH technique ignores the mean-variance
relationship of binary variables.
5. Comment very briefly on the similarity or differences among the three approaches.
Which would you tend to prefer in general? Why?
They all quantify how risk differs between groups. I prefer the RR or OR from a clinician
perspective, as patients can understand “relative risks” easily. I think the RD helps from an
epidemiologic and public health/policy perspective. I tend to prefer the regression
approach to analyses in that you don’t lose characteristics of the data in categorization of
linear variables.
Question 6 pertains to the analysis of colorectal cancer incidence for whites living in the U.S. as
a function of birthplace (U.S. born vs foreign born) (see datafile surveillance.txt and
documentation surveillance.doc on the class web pages).
6. Using the incidence ratio as a measure of association, provide inference for an association
between incidence of colorectal cancer and birthplace, after adjustment for age, sex, and
SEER site.
a. Answer the question using directly standardized rates, with standardization to the
U.S. population.
Methods: Using stratified analyses, an incidence rate ratio was calculated, in
addition to 95% confidence intervals for the incidence rate ratio, for the incidence
of CRC in whites living in the US as a function of birthplace. An incidence rate
was chosen as the estimand of interest because the data are counts with a
person-year denominator. Strata of interest included age and sex. The data were
weighted with the sum over the values within strata.
Inference: The standardized incidence rate ratio (IRR) comparing CRC incidence
among US-born individuals to that among foreign-born individuals was 1.02
(95% CI .99-1.05) over combined sex strata. These data do not provide sufficient
evidence to reject the null hypothesis that there is no difference in incidence
rates among all individuals when adjusted for age. When stratified by
sex, the IRR for males was 1.05 (95% CI 1.00-1.09) and the IRR for females was
.99 (95% CI .95-1.03), so we again fail to reject the null hypothesis within
each sex stratum, as well as within the combined sex
strata.
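Direct standardization applies one standard population's stratum weights to each group's stratum-specific rates, and the ratio of the two standardized rates is the summary measure. A minimal sketch follows; the rates and standard population counts are hypothetical and are not drawn from the surveillance data.

```python
def directly_standardized_rate(stratum_rates, standard_weights):
    """Weighted average of stratum-specific rates using standard population weights."""
    total = sum(standard_weights)
    return sum(r * w for r, w in zip(stratum_rates, standard_weights)) / total


# Hypothetical rates per 100,000 person-years in three age strata
us_born_rates = [10.0, 50.0, 200.0]
foreign_born_rates = [12.0, 50.0, 180.0]
standard_pop = [500, 300, 200]  # hypothetical standard population counts

rate_us = directly_standardized_rate(us_born_rates, standard_pop)
rate_foreign = directly_standardized_rate(foreign_born_rates, standard_pop)
standardized_ratio = rate_us / rate_foreign
print(rate_us, rate_foreign, round(standardized_ratio, 3))
```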
b. Answer the question using an appropriate regression model.
Methods: A Poisson regression was used to calculate the IRR for our outcome of
interest, CRC incidence, based on our binary predictor of interest, birthplace,
while adjusted for covariates age, sex, and SEER site. 95% Wald type CI were
calculated with 3 degrees of freedom.
Inference: After adjusting for age, sex, and SEER site, the incidence of CRC in
foreign born individuals is .99 times the incidence of CRC in US born
individuals (95% CI .89-1.09). These data do not provide evidence to support our
(alternative) hypothesis of a difference in incidence. Therefore,
we fail to reject the null hypothesis of no difference (p-value = .811).
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
The difference is that the stratified analysis with standardized weights
computed the ratio of the weighted averages of the US born and foreign
born CRC incidence rates. Conversely, the Poisson regression used a log link to
first estimate the rates, then averaged the differences on the log scale, then
exponentiated that difference to estimate the geometric mean (rather than
arithmetic mean) of the stratum specific IRRs. In the first analysis above, I
averaged over possible effect modification (EM) by sex, though when the analysis was
completed for individual sex strata, there was no meaningful difference in adjusted IRR.
Hence, averaging over EM by sex, as done in Poisson regression, is not a
scientifically meaningful issue here, and I will thus conclude it is “ok”: the
Poisson regression is an acceptable method for this analysis.
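The distinction between the two summaries can be sketched numerically. The first function takes the ratio of weighted average rates, as in direct standardization. The second takes a weighted geometric mean of stratum-specific ratios, which is the kind of quantity averaged on the log scale. The equal weights and rates are hypothetical, and the weights a fitted Poisson regression effectively applies would differ, so this is an illustration only.

```python
import math


def ratio_of_standardized_rates(rates_a, rates_b, weights):
    """Ratio of weighted arithmetic means of the stratum-specific rates."""
    num = sum(w * r for w, r in zip(weights, rates_a))
    den = sum(w * r for w, r in zip(weights, rates_b))
    return num / den


def weighted_geometric_mean_ratio(rates_a, rates_b, weights):
    """Weighted geometric mean of the stratum-specific rate ratios."""
    total = sum(weights)
    log_mean = sum(
        w * math.log(a / b) for w, a, b in zip(weights, rates_a, rates_b)
    ) / total
    return math.exp(log_mean)


# Hypothetical rates with strong effect modification: stratum ratios 0.5 and 2.0
rates_a, rates_b, weights = [10.0, 200.0], [20.0, 100.0], [1.0, 1.0]
print(ratio_of_standardized_rates(rates_a, rates_b, weights))
print(weighted_geometric_mean_ratio(rates_a, rates_b, weights))
```

When the stratum-specific ratios are all equal, the two summaries agree, which is why the absence of meaningful effect modification makes the choice between them unimportant.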