Random Number: 3225
Biost 536
Homework 3
Due 10-30-14
1.
Methods: To determine if censoring is present in our data, I restricted the data to
those who did not die to see what the minimum observation time was. Those that
died during the study left the study due to the achievement of the outcome of
interest, and they would not be considered censored. If the minimum observation
time for those who did not die is greater than or equal to 4 years, then we will know
that no one left the study during the 4-year period for reasons other than the
outcome of death.
Results: The minimum observation time for participants who did not die during the
study was 1480 days, which is equivalent to 4.05 years. This provides evidence that
there was no censoring present in our data and that such methods as logistic
regression can be used to answer the question of interest. We have at least 4 years
of follow up on all patients and we will have no incomplete data on a binary
indicator of death within 4 years.
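The check described above can be sketched as follows. This is a minimal illustration, not the analysis code actually used: the record layout, the function names, and the 365.25 days-per-year conversion are assumptions, and the toy records are hypothetical apart from the 1480-day minimum reported in the Results.

```python
DAYS_PER_YEAR = 365.25  # assumed conversion factor


def min_survivor_followup_years(records):
    """Shortest observation time (in years) among participants who did not die.

    `records` is an iterable of (observation_days, died) pairs.
    """
    survivor_days = [days for days, died in records if not died]
    if not survivor_days:
        raise ValueError("no surviving participants in the data")
    return min(survivor_days) / DAYS_PER_YEAR


def censoring_free(records, horizon_years=4.0):
    """True when every survivor was observed for at least `horizon_years`."""
    return min_survivor_followup_years(records) >= horizon_years


# Hypothetical toy data; only the 1480-day minimum comes from the report.
toy_records = [(1480, False), (2100, False), (300, True), (1750, False), (900, True)]
print(round(min_survivor_followup_years(toy_records), 2))
print(censoring_free(toy_records))
```

Deaths are excluded before taking the minimum because a participant who died reached the outcome and is not censored, however short the observation time.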
2. a.
Methods: Descriptive statistics are provided by AAI category for variables
measured at study enrollment. Continuous variables are summarized by means,
standard deviations, minima, and maxima. Binary and categorical data are
summarized by their frequency distribution as the frequency and percent.
Descriptive statistics are also provided by AAI category for the proportions of 4-year
mortality.
Results: Table 1 displays descriptive statistics for the baseline variables of the
4879 participants in the study by AAI category (74 in AAI category 0.25 to 0.55, 230
in AAI category >0.55 to 0.75, 577 in AAI category >0.75 to 0.95, 2589 in AAI
category >0.95 to 1.15, 1295 in AAI category >1.15 to 1.35, 97 in AAI category >1.35
to 1.55, and 17 in AAI category >1.55 to 2.4). Table 1 summarizes the non-missing
data, and missing values are assumed to be missing completely at random (MCAR).
Participants ranged in age from 65 to 95. The baseline variables are not equally
distributed among AAI categories. The AAI categories farther from 1 tend to have a
higher proportion of males and of participants with diabetes, and to be older on
average. As the AAI categories increase numerically, there are generally lower
proportions of black participants, smokers, and participants with prevalent ASVD,
and lower average systolic blood pressure, serum cholesterol, blood C-reactive
protein, and blood fibrinogen. BMI is approximately equal on average among the groups.
Table 2 displays the proportions of 4-year mortality by AAI category. The proportions
follow a U-shaped trend: mortality is highest in the lowest AAI category (36.5%),
lowest in the >1.15 to 1.35 category (6.1%), and higher again above 1.35.
Table 1:
AAI Category | 0.25-0.55 (n=74) | >0.55-0.75 (n=230) | >0.75-0.95 (n=577) | >0.95-1.15 (n=2589) | >1.15-1.35 (n=1295) | >1.35-1.55 (n=97) | >1.55-2.4 (n=17)
Age (yrs) [1] | 76.7 (6.1; 66-92) | 75.0 (6.2; 65-93) | 75.3 (6.3; 65-95) | 72.5 (5.3; 65-94) | 71.6 (5.0; 65-95) | 72.6 (5.8; 65-91) | 73.7 (5.0; 67-86)
Male (%) | 43 (58.1) | 118 (51.3) | 224 (38.8) | 909 (35.1) | 681 (52.6) | 76 (78.4) | 15 (88.2)
Black Race (%) | 20 (27.0) | 62 (27.0) | 114 (19.8) | 357 (13.8) | 177 (13.7) | 17 (17.5) | 0 (0)
Smoker (%) [2] | 12 (16.2) | 55 (23.9) | 88 (15.3) | 324 (12.5) | 103 (8.0) | 3 (3.1) | 0 (0)
Uses Estrogen (%) [2] | 1 (1.4) | 8 (3.5) | 29 (5.0) | 231 (8.9) | 62 (4.8) | 0 (0) | 0 (0)
Prevalent ASVD disease (%) | 41 (55.4) | 101 (43.9) | 173 (30.0) | 540 (20.9) | 236 (18.2) | 16 (16.5) | 8 (47.1)
Diabetes (%) [2] | 18 (24.7) | 55 (24.3) | 125 (22.2) | 370 (14.4) | 163 (12.7) | 26 (26.8) | 4 (23.5)
BMI (kg/m2) [1,2] | 25.1 (4.1; 14.8-37.4) | 26.0 (4.3; 15.6-43.1) | 25.9 (4.4; 15.1-44.6) | 26.4 (4.7; 14.7-58.8) | 27.4 (4.6; 16.9-48) | 29.3 (5.8; 16.5-48.1) | 26.2 (3.8; 20.8-37)
Systolic blood pressure (mmHg) [1,2] | 153.6 (24.9; 95-207) | 147.5 (24.5; 96-212) | 143.1 (22.7; 93-235) | 136.3 (21.5; 77-230) | 131.3 (19.2; 84-227) | 129.5 (17.0; 93-172) | 126.7 (18.1; 102-174)
Serum Cholesterol (mg/dl) [1,2] | 208.4 (40.9; 122-317) | 218.5 (43.2; 112-392) | 216.8 (40.6; 111-396) | 213.5 (38.7; 93-430) | 206.1 (37.3; 73-407) | 198.7 (42.3; 123-330) | 199.5 (30.6; 144-251)
Blood C reactive protein (mg/l) [1,2] | 6.2 (10.9; 0-63) | 5.2 (8.2; 0-83) | 4.0 (5.8; 0-51) | 3.5 (6.2; 0-108) | 3.1 (5.3; 0-86) | 3.5 (6.1; 0-48) | 3.5 (4.0; 0-13)
Blood fibrinogen (mg/dl) [1,2] | 360.6 (92.2; 210-695) | 346.2 (76.3; 171-674) | 333.4 (69.9; 132-614) | 321.5 (65.5; 109-872) | 314.9 (62.4; 138-696) | 315.3 (71.5; 167-584) | 326.6 (100.0; 232-614)
[1] Descriptive statistics presented are the mean (standard deviation; minimum-maximum)
[2] Missing Data: (Smoker: 6, Uses Estrogen: 5, Diabetes: 54, BMI: 12, Systolic Blood Pressure: 7,
Cholesterol: 45, Blood C Reactive protein: 63, Blood fibrinogen: 81)
Table 2
AAI Category | 0.25-0.55 (n=74) | >0.55-0.75 (n=230) | >0.75-0.95 (n=577) | >0.95-1.15 (n=2589) | >1.15-1.35 (n=1295) | >1.35-1.55 (n=97) | >1.55-2.4 (n=17)
4-year Mortality (%) | 27 (36.5) | 55 (23.9) | 105 (18.2) | 202 (7.8) | 79 (6.1) | 12 (12.4) | 2 (11.8)
b.
Methods: Participants were compared based on a continuous measure of AAI with
respect to differences in the incidence of death within 4 years of study enrollment.
No subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the probability of death within 4 years were estimated for AAI levels
based on a linear regression of the binary indicator of death in 4 years on a model
that included a continuous indicator of AAI. Differences in mortality rates were
compared using 95% confidence intervals computed using Wald-type maximum
likelihood methods. Two-sided p values were based on the chi squared statistic and
95% confidence intervals were computed assuming the asymptotic normal
distribution for the linear regression parameter estimates, with standard errors
computed using Huber-White sandwich estimator.
Inference: Comparing two groups of participants whose AAI differs by 0.1 units, the
risk of death within 4 years is estimated to be 2.8 percentage points lower in the
group with the higher AAI (95% CI: 2.2 to 3.5 percentage points lower; p<0.001).
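As a rough sketch of the estimator described in the Methods, the slope of a linear regression of a binary outcome on one predictor, together with a sandwich standard error, can be written in closed form. The function names and toy data are hypothetical, and the HC0 form of the sandwich estimator (no small-sample correction) is an assumption; statistical packages often apply a correction factor, so their standard errors may differ slightly.

```python
import math


def lpm_slope(x, y):
    """OLS slope of binary y on x with an HC0 sandwich standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich variance: squared residuals weight each observation's leverage.
    robust_var = sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, residuals)) / sxx ** 2
    return slope, math.sqrt(robust_var)


def risk_difference_per(delta, slope, se, z=1.96):
    """Risk difference and Wald 95% CI for a `delta`-unit difference in x."""
    return delta * slope, delta * (slope - z * se), delta * (slope + z * se)


# Hypothetical toy data (not the study data).
x = [0.6, 0.6, 1.6, 1.6]
died = [1, 1, 1, 0]
slope, se = lpm_slope(x, died)
print(risk_difference_per(0.1, slope, se))
```

Rescaling by `delta = 0.1` is what turns a per-unit slope into the per-0.1-unit risk difference quoted in the Inference; the reported value of -0.028 per 0.1 units corresponds to a slope of -0.28 per unit of AAI.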
c.
Methods: Participants were compared based on a continuous measure of log
transformed AAI with respect to differences in the incidence of death within 4 years
of study enrollment. No subjects’ observations were censored prior to four years, so
four-year mortality rates were calculable based on sample proportions at all levels
of AAI. Crude estimates for the probability of death within 4 years were estimated
for AAI levels based on a linear regression of the binary indicator of death in 4 years
on a model that included a continuous indicator of the log transformed AAI.
Differences in mortality rates were compared using 95% confidence intervals
computed using Wald-type maximum likelihood methods. Two-sided p values were
based on the chi squared statistic and 95% confidence intervals were computed
assuming the asymptotic normal distribution for the linear regression parameter
estimates, with standard errors computed using Huber-White sandwich estimator.
Inference: Comparing two groups of participants whose AAI differs 2-fold, the risk of
death within 4 years is estimated to be 19.3 percentage points lower in the group
with the higher AAI (95% CI: 15.1 to 23.4 percentage points lower; p<0.001).
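The per-doubling interpretation depends on which logarithm was used. As a small sketch, assuming the model was fit on the natural log of AAI (the report does not say which base was used), the slope is multiplied by ln(2) to obtain the risk difference for a 2-fold higher AAI; with a base-2 log the slope itself is the per-doubling difference. The function name is hypothetical.

```python
import math


def per_fold_effect(beta_ln, fold=2.0):
    """Risk difference for a `fold`-fold higher AAI, given the slope on ln(AAI)."""
    return beta_ln * math.log(fold)


# Natural-log slope that would correspond to the reported -0.193 per doubling.
beta_ln = -0.193 / math.log(2)
print(round(beta_ln, 3))
print(round(per_fold_effect(beta_ln), 3))
```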
d.
Methods: Participants were compared based on a continuous measure of AAI with
respect to differences in the incidence of death within 4 years of study enrollment.
No subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the probability of death within 4 years were estimated for AAI levels
based on a linear regression of the binary indicator of death in 4 years on a model
that included a continuous indicator of AAI, and a quadratic indicator of AAI.
Differences in mortality rates were compared using 95% confidence intervals
computed using Wald-type maximum likelihood methods. Two-sided p values were
based on the global F-test and 95% confidence intervals were computed assuming
the asymptotic normal distribution for the linear regression parameter estimates,
with standard errors computed using Huber-White sandwich estimator.
Inference: AAI modeled with linear and quadratic terms is significantly
associated with 4-year mortality (F(2, 4876)=43.3; p<0.001).
e.
Methods: Participants were compared based on a categorical measure of AAI with
respect to differences in the incidence of death within 4 years of study enrollment.
No subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the probability of death within 4 years were estimated for AAI groups
based on a linear regression of the binary indicator of death in 4 years on a model
that included a categorical indicator of AAI (categorized at the scientific cutpoints of
0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4). Differences in mortality rates were
compared using 95% confidence intervals computed using Wald-type maximum
likelihood methods. Two-sided p values were based on the global F-test and 95%
confidence intervals were computed assuming the asymptotic normal distribution
for the linear regression parameter estimates, with standard errors computed using
Huber-White sandwich estimator.
Inference: AAI categorized with scientifically relevant cutpoints is significantly
associated with 4-year mortality (F(6, 4872)=17.91; p<0.001). The global test does
not by itself indicate the direction of the association.
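To illustrate the dummy variable coding, the sketch below assigns an AAI value to one of the seven categories and builds the indicator variables. When a linear regression contains only these indicators, its fitted values are the category proportions and its coefficients are the risk differences from the reference category, so both can be computed directly from the Table 2 counts. The function names and the choice of the lowest category as the reference are assumptions.

```python
from bisect import bisect_left

CUTPOINTS = [0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4]
DEATHS = [27, 55, 105, 202, 79, 12, 2]          # from Table 2
GROUP_N = [74, 230, 577, 2589, 1295, 97, 17]    # from Table 2


def aai_category(aai):
    """Return 0..6 for the intervals [0.25, 0.55], (0.55, 0.75], ..., (1.55, 2.4]."""
    if not CUTPOINTS[0] <= aai <= CUTPOINTS[-1]:
        raise ValueError("AAI outside the range covered by the cutpoints")
    return max(bisect_left(CUTPOINTS, aai) - 1, 0)


def dummy_row(aai, reference=0):
    """Indicator variables for every category except the reference (assumed: lowest)."""
    cat = aai_category(aai)
    return [int(cat == k) for k in range(len(GROUP_N)) if k != reference]


# Category-specific risks and their differences from the reference category.
risks = [d / n for d, n in zip(DEATHS, GROUP_N)]
risk_differences = [r - risks[0] for r in risks[1:]]
print([round(100 * r, 1) for r in risks])
```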
f.
Methods: Participants were compared based on a categorical measure of AAI with
respect to differences in the incidence of death within 4 years of study enrollment.
No subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the probability of death within 4 years were estimated for AAI levels
based on a linear regression of the binary indicator of death in 4 years on a model
that included a categorical indicator of AAI (categorized into 7 groups by quantile).
Differences in mortality rates were compared using 95% confidence intervals
computed using Wald-type maximum likelihood methods. Two-sided p values were
based on the global F-test and 95% confidence intervals were computed assuming
the asymptotic normal distribution for the linear regression parameter estimates,
with standard errors computed using Huber-White sandwich estimator.
Inference: AAI categorized by quantile is significantly associated with 4-year
mortality (F(6, 4872)=17.89; p<0.001).
g.
Methods: Participants were compared based on a linear spline function of AAI with
respect to differences in the incidence of death within 4 years of study enrollment.
No subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the probability of death within 4 years were estimated for AAI levels
based on a linear regression of the binary indicator of death in 4 years on a model
that included linear spline terms for AAI (with the range 0.25 to 2.4 divided by
knots at 0.55, 0.75, 0.95, 1.15, 1.35, and 1.55). Differences in mortality rates were
compared using 95% confidence intervals computed using Wald-type maximum
likelihood methods. Two-sided p values were based on the global F-test and 95%
confidence intervals were computed assuming the asymptotic normal distribution
for the linear regression parameter estimates, with standard errors computed using
the Huber-White sandwich estimator.
Inference: AAI modeled with a linear spline is significantly associated with
4-year mortality (F(7, 4871)=15.66; p<0.001).
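A linear spline can be built from simple "hinge" terms. The sketch below assumes knots at the six interior cutpoints, which gives seven regression terms and is consistent with the 7 numerator degrees of freedom in the reported F-test; the function names are hypothetical. In this parameterization the first coefficient is the slope below the first knot and each later coefficient is the change in slope at its knot, whereas some software reports the slope within each segment instead.

```python
KNOTS = [0.55, 0.75, 0.95, 1.15, 1.35, 1.55]  # assumed: interior cutpoints only


def linear_spline_basis(aai, knots=KNOTS):
    """Return [aai, (aai - k1)+, ..., (aai - k6)+], the piecewise-linear terms."""
    return [aai] + [max(aai - k, 0.0) for k in knots]


def spline_value(aai, intercept, coefs, knots=KNOTS):
    """Fitted value for a given intercept and one coefficient per basis term."""
    basis = linear_spline_basis(aai, knots)
    return intercept + sum(c * b for c, b in zip(coefs, basis))


print(linear_spline_basis(1.0))
```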
h.
The graph of the fitted values from the linear, log transformed, quadratic, dummy
variable, and linear spline models shows that the linear spline fits the data best,
since it allows for the most flexibility. From our descriptive statistics, we know that
the relationship is U-shaped, and the linear model cannot take that shape. The
log transformed AAI model curves slightly but does not form a complete U-shape.
Another problem with the linear and log transformed AAI models is that they produce
fitted values below 0. Fitted probabilities must stay between 0 and 1, so these models
do not describe the data accurately in that range. The dummy variables categorized at
the scientific cutpoints show the U-shape, but the steps may not model
the data as closely if the cutpoints are not where meaningful changes take
place. The quantile categorized dummy variables do not take on a U-shape like the
scientific cutpoint dummy variables. The quadratic model does fit a
U-shape, but it also forces the fit to be U-shaped.
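The out-of-range problem with the linear model can be illustrated with a short sketch. The slope of -0.28 per unit of AAI follows from the reported -0.028 per 0.1 units; the intercept of 0.40 is hypothetical (it is not reported in the text), so the specific AAI values flagged here are illustrative only.

```python
def fitted_linear(aai, intercept, slope):
    """Fitted 'probability' from a linear regression on untransformed AAI."""
    return intercept + slope * aai


def out_of_range(aai_values, intercept, slope):
    """AAI values whose fitted value falls outside the interval [0, 1]."""
    return [a for a in aai_values
            if not 0.0 <= fitted_linear(a, intercept, slope) <= 1.0]


SLOPE = -0.28      # per 1 unit of AAI, i.e. the reported -0.028 per 0.1 unit
INTERCEPT = 0.40   # hypothetical; not reported in the text
grid = [0.25, 0.75, 1.15, 1.55, 2.4]
print(out_of_range(grid, INTERCEPT, SLOPE))
```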
When modeling a predictor of interest, an important factor to consider is
interpretability. While we want to have a model that fits the data, we also want to be
able to interpret our results in a meaningful way. This is the advantage of using an
untransformed linear or log transformed model, because each yields a single,
interpretable risk difference. A quadratic model may fit the data better, but the
risk difference then depends on which AAI values are being compared, so no single
risk difference summarizes the association. When we use dummy variables (either with
scientific cutpoints or quantiles), we have to compare each of our groups to a
reference group, which also loses some interpretability. A linear spline
may be flexible and provide a better fit to the data, but we lose interpretability as
well. For these models, the simplest overall statement we can make is whether or not
there is a significant association between the two variables.
When modeling a confounder, an important factor to consider is the accuracy of the
model to fit the data, because we need to fully adjust for the confounder to avoid
residual confounding. Since we do not need to interpret the results of the
relationship between the confounder and outcome, interpretability is less
important. Since getting the best fit is the goal, choosing a more flexible model may
be ideal. Since linear, log transformed, and quadratic force the model to take on a
certain shape, they may not be ideal for a confounder if we know that it is a U-shape
for example. Dummy variables (either with scientific cutpoints or quantiles) can
allow for meaningful comparisons between groups, but we lose flexibility as well.
The linear spline may be the best fit to model a confounder since it is the most
flexible and may be the best at reducing residual confounding.
When modeling a precision variable, neither interpretability nor accuracy of the
model's fit is very important. It is often not worth the effort to fit the data
exactly, and it may be better to choose a simple model to avoid losing precision or
degrees of freedom. In this case, a linear untransformed model may be best unless
we have scientific knowledge that would make us think another model would be
better. The quadratic model, dummy variables, and spline would all add in
potentially unnecessary components and make the model much more complicated
than it needs to be.
3. a.
Methods: Descriptive statistics are provided by AAI category for variables
measured at study enrollment. Continuous variables are summarized by means,
standard deviations, minima, and maxima. Binary and categorical data are
summarized by their frequency distribution as the frequency and percent.
Descriptive statistics are also provided by AAI category for the proportions of 4-year
mortality.
Results: Table 1 displays descriptive statistics for the baseline variables of the
4879 participants in the study by AAI category (74 in AAI category 0.25 to 0.55, 230
in AAI category >0.55 to 0.75, 577 in AAI category >0.75 to 0.95, 2589 in AAI
category >0.95 to 1.15, 1295 in AAI category >1.15 to 1.35, 97 in AAI category >1.35
to 1.55, and 17 in AAI category >1.55 to 2.4). Table 1 summarizes the non-missing
data, and missing values are assumed to be missing completely at random (MCAR).
Participants ranged in age from 65 to 95. The baseline variables are not equally
distributed among AAI categories. The AAI categories farther from 1 tend to have a
higher proportion of males and of participants with diabetes, and to be older on
average. As the AAI categories increase numerically, there are generally lower
proportions of black participants, smokers, and participants with prevalent ASVD,
and lower average systolic blood pressure, serum cholesterol, blood C-reactive
protein, and blood fibrinogen. BMI is approximately equal on average among the groups.
Table 2 displays the proportions of 4-year mortality by AAI category. The proportions
follow a U-shaped trend: mortality is highest in the lowest AAI category (36.5%),
lowest in the >1.15 to 1.35 category (6.1%), and higher again above 1.35.
Table 1:
AAI Category | 0.25-0.55 (n=74) | >0.55-0.75 (n=230) | >0.75-0.95 (n=577) | >0.95-1.15 (n=2589) | >1.15-1.35 (n=1295) | >1.35-1.55 (n=97) | >1.55-2.4 (n=17)
Age (yrs) [1] | 76.7 (6.1; 66-92) | 75.0 (6.2; 65-93) | 75.3 (6.3; 65-95) | 72.5 (5.3; 65-94) | 71.6 (5.0; 65-95) | 72.6 (5.8; 65-91) | 73.7 (5.0; 67-86)
Male (%) | 43 (58.1) | 118 (51.3) | 224 (38.8) | 909 (35.1) | 681 (52.6) | 76 (78.4) | 15 (88.2)
Black Race (%) | 20 (27.0) | 62 (27.0) | 114 (19.8) | 357 (13.8) | 177 (13.7) | 17 (17.5) | 0 (0)
Smoker (%) [2] | 12 (16.2) | 55 (23.9) | 88 (15.3) | 324 (12.5) | 103 (8.0) | 3 (3.1) | 0 (0)
Uses Estrogen (%) [2] | 1 (1.4) | 8 (3.5) | 29 (5.0) | 231 (8.9) | 62 (4.8) | 0 (0) | 0 (0)
Prevalent ASVD disease (%) | 41 (55.4) | 101 (43.9) | 173 (30.0) | 540 (20.9) | 236 (18.2) | 16 (16.5) | 8 (47.1)
Diabetes (%) [2] | 18 (24.7) | 55 (24.3) | 125 (22.2) | 370 (14.4) | 163 (12.7) | 26 (26.8) | 4 (23.5)
BMI (kg/m2) [1,2] | 25.1 (4.1; 14.8-37.4) | 26.0 (4.3; 15.6-43.1) | 25.9 (4.4; 15.1-44.6) | 26.4 (4.7; 14.7-58.8) | 27.4 (4.6; 16.9-48) | 29.3 (5.8; 16.5-48.1) | 26.2 (3.8; 20.8-37)
Systolic blood pressure (mmHg) [1,2] | 153.6 (24.9; 95-207) | 147.5 (24.5; 96-212) | 143.1 (22.7; 93-235) | 136.3 (21.5; 77-230) | 131.3 (19.2; 84-227) | 129.5 (17.0; 93-172) | 126.7 (18.1; 102-174)
Serum Cholesterol (mg/dl) [1,2] | 208.4 (40.9; 122-317) | 218.5 (43.2; 112-392) | 216.8 (40.6; 111-396) | 213.5 (38.7; 93-430) | 206.1 (37.3; 73-407) | 198.7 (42.3; 123-330) | 199.5 (30.6; 144-251)
Blood C reactive protein (mg/l) [1,2] | 6.2 (10.9; 0-63) | 5.2 (8.2; 0-83) | 4.0 (5.8; 0-51) | 3.5 (6.2; 0-108) | 3.1 (5.3; 0-86) | 3.5 (6.1; 0-48) | 3.5 (4.0; 0-13)
Blood fibrinogen (mg/dl) [1,2] | 360.6 (92.2; 210-695) | 346.2 (76.3; 171-674) | 333.4 (69.9; 132-614) | 321.5 (65.5; 109-872) | 314.9 (62.4; 138-696) | 315.3 (71.5; 167-584) | 326.6 (100.0; 232-614)
[1] Descriptive statistics presented are the mean (standard deviation; minimum-maximum)
[2] Missing Data: (Smoker: 6, Uses Estrogen: 5, Diabetes: 54, BMI: 12, Systolic Blood Pressure: 7,
Cholesterol: 45, Blood C Reactive protein: 63, Blood fibrinogen: 81)
Table 2
AAI Category | 0.25-0.55 (n=74) | >0.55-0.75 (n=230) | >0.75-0.95 (n=577) | >0.95-1.15 (n=2589) | >1.15-1.35 (n=1295) | >1.35-1.55 (n=97) | >1.55-2.4 (n=17)
4-year Mortality (%) | 27 (36.5) | 55 (23.9) | 105 (18.2) | 202 (7.8) | 79 (6.1) | 12 (12.4) | 2 (11.8)
b.
Methods: Participants were compared based on a continuous measure of AAI with
respect to ratios of odds of mortality within 4 years of study enrollment. No
subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the odds of death within 4 years were estimated for AAI levels based
on a logistic regression of the binary indicator of death in 4 years on a model that
included a continuous indicator of AAI. Differences in the ratios of odds of mortality
were compared using 95% confidence intervals computed using Wald-type
maximum likelihood methods. Two-sided p values were based on the chi-squared
statistic and 95% confidence intervals were computed assuming the asymptotic
normal distribution for the logistic regression parameter estimates.
Inference: The odds of dying within 4 years of study enrollment for participants
with a 0.1 unit higher AAI are 0.76 times the odds for those with a 0.1 unit lower
AAI (95% CI: 0.72, 0.79; p<0.001).
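The odds ratio and its Wald interval are computed on the log-odds scale and then exponentiated. The sketch below shows that step under stated assumptions: the per-unit slope is back-calculated from the reported odds ratio of 0.76 per 0.1 units, and the standard error is back-calculated from the reported interval, so both are approximations affected by rounding rather than the actual model output. The function names are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, delta=1.0, z=1.96):
    """Odds ratio and Wald 95% CI for a `delta`-unit difference in the predictor."""
    point = math.exp(delta * beta)
    lower = math.exp(delta * (beta - z * se))
    upper = math.exp(delta * (beta + z * se))
    return point, lower, upper


def se_from_ci(lower, upper, delta=1.0, z=1.96):
    """Per-unit standard error implied by a reported Wald CI for the odds ratio."""
    return (math.log(upper) - math.log(lower)) / (2 * z * delta)


# Back-calculated from the reported results (approximate because of rounding).
beta = math.log(0.76) / 0.1             # log-odds slope per 1 unit of AAI
se = se_from_ci(0.72, 0.79, delta=0.1)  # per 1 unit of AAI
print(odds_ratio_ci(beta, se, delta=0.1))
```

The interval is symmetric around the estimate only on the log scale, which is why the reported bounds (0.72, 0.79) are not equidistant from 0.76.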
c.
Methods: Participants were compared based on a continuous measure of AAI with
respect to ratios of odds of mortality within 4 years of study enrollment. No
subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the odds of death within 4 years were estimated for AAI levels based
on a logistic regression of the binary indicator of death in 4 years on a model that
included a continuous indicator of log transformed AAI. Differences in the ratios of
odds of mortality were compared using 95% confidence intervals computed using
Wald-type maximum likelihood methods. Two-sided p values were based on the
chi-squared statistic and 95% confidence intervals were computed assuming the
asymptotic normal distribution for the logistic regression parameter estimates.
Inference: The odds of dying within 4 years of study enrollment for participants
with a 2-fold higher AAI are 0.20 times the odds for those with a 2-fold lower AAI
(95% CI: 0.15, 0.26; p<0.001).
d.
Methods: Participants were compared based on a continuous measure of AAI with
respect to ratios of odds of mortality within 4 years of study enrollment. No
subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the odds of death within 4 years were estimated for AAI levels based
on a logistic regression of the binary indicator of death in 4 years on a model that
included a continuous indicator of AAI, and a quadratic indicator of AAI. Differences
in the ratios of odds of mortality were compared using 95% confidence intervals
computed using Wald-type maximum likelihood methods. Two-sided p values were
based on the chi-squared statistic and 95% confidence intervals were computed
assuming the asymptotic normal distribution for the logistic regression parameter
estimates.
Inference: AAI modeled with linear and quadratic terms is significantly
associated with 4-year mortality (LR chi2(2)=128.5; p<0.001).
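For reference, the likelihood ratio statistic is twice the difference in log-likelihood between the fitted model and the intercept-only model, and its p-value is the upper tail of a chi-squared distribution. The sketch below uses a closed form for the tail probability that is exact only when the degrees of freedom are even (as for the 2 df and 6 df tests reported here); the function names are hypothetical, and the log-likelihoods themselves are not reported in the text.

```python
import math


def lr_statistic(loglik_null, loglik_full):
    """Likelihood ratio statistic comparing a full model with a nested null model."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; exact for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^j / j! for j = 0 .. df/2 - 1.
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


print(chi2_sf_even_df(128.5, 2) < 0.001)
```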
e.
Methods: Participants were compared based on a categorical measure of AAI with
respect to ratios of odds of mortality within 4 years of study enrollment. No
subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the odds of death within 4 years were estimated for AAI levels based
on a logistic regression of the binary indicator of death in 4 years on a model that
included a categorical indicator of AAI (categorized at the scientific cutpoints of
0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4). Differences in the ratios of odds of
mortality were compared using 95% confidence intervals computed using Wald-type
maximum likelihood methods. Two-sided p values were based on the chi-squared
statistic and 95% confidence intervals were computed assuming the
asymptotic normal distribution for the logistic regression parameter estimates.
Inference: AAI categorized with scientifically relevant cutpoints is significantly
associated with 4-year mortality (LR chi2(6)=150.4; p<0.001).
f.
Methods: Participants were compared based on a categorical measure of AAI with
respect to ratios of odds of mortality within 4 years of study enrollment. No
subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the odds of death within 4 years were estimated for AAI levels based
on a logistic regression of the binary indicator of death in 4 years on a model that
included a categorical indicator of AAI (categorized into 7 groups by quantile).
Differences in the ratios of odds of mortality were compared using 95% confidence
intervals computed using Wald-type maximum likelihood methods. Two-sided p
values were based on the chi-squared statistic and 95% confidence intervals were
computed assuming the asymptotic normal distribution for the logistic regression
parameter estimates.
Inference: AAI categorized by quantile is significantly associated with 4-year
mortality (LR chi2(6)=149.3; p<0.001).
g.
Methods: Participants were compared based on a linear spline function of AAI with
respect to ratios of odds of mortality within 4 years of study enrollment. No
subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the odds of death within 4 years were estimated for AAI levels based
on a logistic regression of the binary indicator of death in 4 years on a model that
included linear spline terms for AAI (with the range 0.25 to 2.4 divided by knots at
0.55, 0.75, 0.95, 1.15, 1.35, and 1.55). Differences in the ratios of odds of mortality
were compared using 95% confidence intervals computed using Wald-type
maximum likelihood methods. Two-sided p values were based on the chi-squared
statistic and 95% confidence intervals were computed assuming the asymptotic
normal distribution for the logistic regression parameter estimates.
Inference: AAI modeled with a linear spline is significantly associated with
4-year mortality (LR chi2(7)=152.99; p<0.001).
h.
The graph of the fitted values from the linear, log transformed, quadratic, dummy
variable, and linear spline models shows that the linear spline fits the data best,
since it allows for the most flexibility. From our descriptive statistics, we know that
the relationship is U-shaped. The linear, log transformed, and quadratic
models start the U-shape but level off at an AAI greater than 1.5, whereas the linear
spline continues to increase at that point. The dummy variables
categorized at the scientific cutpoints show the U-shape, but the steps may not
model the data as closely if the cutpoints are not where meaningful changes
take place. The quantile categorized dummy variables do not take on a U-shape
like the scientific cutpoint dummy variables.
When modeling a predictor of interest, an important factor to consider is
interpretability. While we want to have a model that fits the data, we also want to be
able to interpret our results in a meaningful way. This is the advantage of using an
untransformed linear or log transformed model, because each yields a single,
interpretable odds ratio. A quadratic model may fit the data better, but the
odds ratio then depends on which AAI values are being compared, so no single
odds ratio summarizes the association. When we use dummy variables (either with
scientific cutpoints or quantiles), we have to compare each of our groups to a
reference group, which also loses some interpretability. A linear spline
may be flexible and provide a better fit to the data, but we lose interpretability as
well. For these models, the simplest overall statement we can make is whether or not
there is a significant association between the two variables.
When modeling a confounder, an important factor to consider is the accuracy of the
model to fit the data, because we need to fully adjust for the confounder to avoid
residual confounding. Since we do not need to interpret the results of the
relationship between the confounder and outcome, interpretability is less
important. Since getting the best fit is the goal, choosing a more flexible model may
be ideal. Since linear, log transformed, and quadratic force the model to take on a
certain shape, they may not be ideal for a confounder if we know that it is a U-shape
for example. Dummy variables (either with scientific cutpoints or quantiles) can
allow for meaningful comparisons between groups, but we lose flexibility as well.
The linear spline may be the best fit to model a confounder since it is the most
flexible and may be the best at reducing residual confounding.
When modeling a precision variable, neither interpretability nor accuracy of the
model's fit is very important. It is often not worth the effort to fit the data
exactly, and it may be better to choose a simple model to avoid losing precision or
degrees of freedom. In this case, a linear untransformed model may be best unless
we have scientific knowledge that would make us think another model would be
better. The quadratic model, dummy variables, and spline would all add in
potentially unnecessary components and make the model much more complicated
than it needs to be.
4. a.
Methods: Descriptive statistics are provided by AAI category for variables
measured at study enrollment. Continuous variables are summarized by means,
standard deviations, minima, and maxima. Binary and categorical data are
summarized by their frequency distribution as the frequency and percent.
Descriptive statistics are also provided by AAI category for the proportions of 4-year
mortality.
Results: Table 1 displays descriptive statistics for the baseline variables of the
4879 participants in the study by AAI category (74 in AAI category 0.25 to 0.55, 230
in AAI category >0.55 to 0.75, 577 in AAI category >0.75 to 0.95, 2589 in AAI
category >0.95 to 1.15, 1295 in AAI category >1.15 to 1.35, 97 in AAI category >1.35
to 1.55, and 17 in AAI category >1.55 to 2.4). Table 1 summarizes the non-missing
data, and missing values are assumed to be missing completely at random (MCAR).
Participants ranged in age from 65 to 95. The baseline variables are not equally
distributed among AAI categories. The AAI categories farther from 1 tend to have a
higher proportion of males and of participants with diabetes, and to be older on
average. As the AAI categories increase numerically, there are generally lower
proportions of black participants, smokers, and participants with prevalent ASVD,
and lower average systolic blood pressure, serum cholesterol, blood C-reactive
protein, and blood fibrinogen. BMI is approximately equal on average among the groups.
Table 2 displays the proportions of 4-year mortality by AAI category. The proportions
follow a U-shaped trend: mortality is highest in the lowest AAI category (36.5%),
lowest in the >1.15 to 1.35 category (6.1%), and higher again above 1.35.
Table 1:
AAI Category | 0.25-0.55 (n=74) | >0.55-0.75 (n=230) | >0.75-0.95 (n=577) | >0.95-1.15 (n=2589) | >1.15-1.35 (n=1295) | >1.35-1.55 (n=97) | >1.55-2.4 (n=17)
Age (yrs) [1] | 76.7 (6.1; 66-92) | 75.0 (6.2; 65-93) | 75.3 (6.3; 65-95) | 72.5 (5.3; 65-94) | 71.6 (5.0; 65-95) | 72.6 (5.8; 65-91) | 73.7 (5.0; 67-86)
Male (%) | 43 (58.1) | 118 (51.3) | 224 (38.8) | 909 (35.1) | 681 (52.6) | 76 (78.4) | 15 (88.2)
Black Race (%) | 20 (27.0) | 62 (27.0) | 114 (19.8) | 357 (13.8) | 177 (13.7) | 17 (17.5) | 0 (0)
Smoker (%) [2] | 12 (16.2) | 55 (23.9) | 88 (15.3) | 324 (12.5) | 103 (8.0) | 3 (3.1) | 0 (0)
Uses Estrogen (%) [2] | 1 (1.4) | 8 (3.5) | 29 (5.0) | 231 (8.9) | 62 (4.8) | 0 (0) | 0 (0)
Prevalent ASVD disease (%) | 41 (55.4) | 101 (43.9) | 173 (30.0) | 540 (20.9) | 236 (18.2) | 16 (16.5) | 8 (47.1)
Diabetes (%) [2] | 18 (24.7) | 55 (24.3) | 125 (22.2) | 370 (14.4) | 163 (12.7) | 26 (26.8) | 4 (23.5)
BMI (kg/m2) [1,2] | 25.1 (4.1; 14.8-37.4) | 26.0 (4.3; 15.6-43.1) | 25.9 (4.4; 15.1-44.6) | 26.4 (4.7; 14.7-58.8) | 27.4 (4.6; 16.9-48) | 29.3 (5.8; 16.5-48.1) | 26.2 (3.8; 20.8-37)
Systolic blood pressure (mmHg) [1,2] | 153.6 (24.9; 95-207) | 147.5 (24.5; 96-212) | 143.1 (22.7; 93-235) | 136.3 (21.5; 77-230) | 131.3 (19.2; 84-227) | 129.5 (17.0; 93-172) | 126.7 (18.1; 102-174)
Serum Cholesterol (mg/dl) [1,2] | 208.4 (40.9; 122-317) | 218.5 (43.2; 112-392) | 216.8 (40.6; 111-396) | 213.5 (38.7; 93-430) | 206.1 (37.3; 73-407) | 198.7 (42.3; 123-330) | 199.5 (30.6; 144-251)
Blood C reactive protein (mg/l) [1,2] | 6.2 (10.9; 0-63) | 5.2 (8.2; 0-83) | 4.0 (5.8; 0-51) | 3.5 (6.2; 0-108) | 3.1 (5.3; 0-86) | 3.5 (6.1; 0-48) | 3.5 (4.0; 0-13)
Blood fibrinogen (mg/dl) [1,2] | 360.6 (92.2; 210-695) | 346.2 (76.3; 171-674) | 333.4 (69.9; 132-614) | 321.5 (65.5; 109-872) | 314.9 (62.4; 138-696) | 315.3 (71.5; 167-584) | 326.6 (100.0; 232-614)
[1] Descriptive statistics presented are the mean (standard deviation; minimum-maximum)
[2] Missing Data: (Smoker: 6, Uses Estrogen: 5, Diabetes: 54, BMI: 12, Systolic Blood Pressure: 7,
Cholesterol: 45, Blood C Reactive protein: 63, Blood fibrinogen: 81)
Table 2
AAI Category | 0.25-0.55 (n=74) | >0.55-0.75 (n=230) | >0.75-0.95 (n=577) | >0.95-1.15 (n=2589) | >1.15-1.35 (n=1295) | >1.35-1.55 (n=97) | >1.55-2.4 (n=17)
4-year Mortality (%) | 27 (36.5) | 55 (23.9) | 105 (18.2) | 202 (7.8) | 79 (6.1) | 12 (12.4) | 2 (11.8)
b.
Methods: Participants were compared based on a continuous measure of AAI with
respect to ratios of the risk of death within 4 years of study enrollment.
No subjects’ observations were censored prior to four years, so four-year mortality
rates were calculable based on sample proportions at all levels of AAI. Crude
estimates for the risk ratio of death within 4 years were estimated for AAI levels
based on a Poisson regression of the binary indicator of death in 4 years on a model
that included a continuous indicator of AAI. Risk ratios for 4-year mortality rates
were compared using 95% confidence intervals computed using Wald-type
maximum likelihood methods. Two-sided p values were based on the chi-squared
statistic and 95% confidence intervals were computed assuming the asymptotic
normal distribution for the Poisson regression parameter estimates, with standard
errors computed using Huber-White sandwich estimator.
Inference: Participants with a 0.1 unit higher AAI are 0.79 times as likely as those
with a 0.1 unit lower AAI of dying within 4 years of study enrollment (95% CI: 0.76,
0.82; p<0.001).
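The 0.1-unit comparison above is a rescaling of the per-unit Poisson regression coefficient. The following is a minimal sketch of that rescaling, not the code used for the analysis: the function name `rr_for_difference` is hypothetical, and the per-unit coefficient and standard error are back-calculated from the reported estimate and interval rather than read from regression output.

```python
import math


def rr_for_difference(beta, se, delta, z=1.96):
    """Risk ratio and Wald 95% CI for a `delta`-unit difference in a predictor.

    beta: log risk ratio per 1 unit of the predictor
    se:   (robust) standard error of beta
    """
    est = math.exp(delta * beta)
    bounds = sorted(math.exp(delta * (beta + sign * z * se)) for sign in (-1, 1))
    return est, bounds[0], bounds[1]


# Assumed inputs: per-unit values implied by RR = 0.79 (0.76, 0.82) per 0.1 unit.
beta_per_unit = math.log(0.79) / 0.1
se_per_unit = (math.log(0.82) - math.log(0.76)) / (2 * 1.96) / 0.1

rr, lo, hi = rr_for_difference(beta_per_unit, se_per_unit, delta=0.1)
print(round(rr, 2), round(lo, 2), round(hi, 2))
```

Because the model is linear on the log scale, the same function gives the risk ratio for any other difference in AAI by changing `delta`.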
c.
Methods: Participants were compared across a continuous measure of AAI with
respect to the incidence of death within 4 years of study enrollment. No
observations were censored before four years, so four-year mortality could be
calculated from sample proportions at all levels of AAI. Crude risk ratios for
death within 4 years were estimated from a Poisson regression of the binary
indicator of death within 4 years on continuous, log transformed AAI. Two-sided
p values were based on the Wald chi-squared statistic, and 95% confidence
intervals were computed assuming an asymptotic normal distribution for the
Poisson regression parameter estimates, with standard errors computed using the
Huber-White sandwich estimator.
Inference: Participants whose AAI is 2-fold higher are estimated to be 0.28
times as likely to die within 4 years of study enrollment as participants with
the lower AAI (95% CI: 0.23, 0.34; p<0.001).
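With AAI on the log scale, the coefficient describes a multiplicative (k-fold) difference in AAI rather than an additive one. A small sketch of that conversion follows; it assumes the natural log was used in the model, the function name `rr_for_fold_change` is hypothetical, and the coefficient is back-calculated from the reported 2-fold estimate.

```python
import math


def rr_for_fold_change(beta_log, fold):
    """Risk ratio comparing groups whose predictor differs by a factor of `fold`,
    when the model contains the natural log of the predictor."""
    return math.exp(beta_log * math.log(fold))


# Assumed coefficient: the value implied by RR = 0.28 per doubling of AAI.
beta_log = math.log(0.28) / math.log(2)

rr_double = rr_for_fold_change(beta_log, 2)     # recovers the reported 0.28
rr_ten_pct = rr_for_fold_change(beta_log, 1.1)  # a 10% higher AAI
print(round(rr_double, 2), round(rr_ten_pct, 2))
```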
d.
Methods: Participants were compared across a continuous measure of AAI with
respect to the incidence of death within 4 years of study enrollment. No
observations were censored before four years, so four-year mortality could be
calculated from sample proportions at all levels of AAI. Crude risk ratios for
death within 4 years were estimated from a Poisson regression of the binary
indicator of death within 4 years on untransformed, continuous AAI and a
quadratic (squared) AAI term. Two-sided p values were based on the Wald
chi-squared statistic, and 95% confidence intervals were computed assuming an
asymptotic normal distribution for the Poisson regression parameter estimates,
with standard errors computed using the Huber-White sandwich estimator.
Inference: AAI modeled with untransformed and quadratic terms is significantly
associated with 4-year mortality (Wald chi2(2)=191.1; p<0.001).
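The quadratic model is summarized with a joint Wald test rather than a single risk ratio because the risk ratio for a fixed difference in AAI depends on where along the AAI scale the comparison is made. The sketch below illustrates this; the coefficients are hypothetical values chosen only to give a U-shaped curve and are not the fitted estimates.

```python
import math


def rr_quadratic(b1, b2, x0, x1):
    """Risk ratio comparing AAI = x1 to AAI = x0 when
    log(risk) = b0 + b1 * x + b2 * x**2 (b0 cancels in the ratio)."""
    return math.exp(b1 * (x1 - x0) + b2 * (x1 ** 2 - x0 ** 2))


# Hypothetical coefficients, NOT estimates from the study data.
b1, b2 = -6.0, 2.5

rr_low = rr_quadratic(b1, b2, 0.5, 0.6)   # 0.1-unit difference at low AAI
rr_high = rr_quadratic(b1, b2, 1.5, 1.6)  # same difference at high AAI
print(round(rr_low, 2), round(rr_high, 2))
```

With these illustrative values the same 0.1-unit difference corresponds to lower risk at low AAI and higher risk at high AAI.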
e.
Methods: Participants were compared across a categorical measure of AAI with
respect to the incidence of death within 4 years of study enrollment. No
observations were censored before four years, so four-year mortality could be
calculated from sample proportions within AAI groups. Crude risk ratios for
death within 4 years were estimated from a Poisson regression of the binary
indicator of death within 4 years on categorical AAI (categorized at the
scientific cutpoints of 0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4).
Two-sided p values were based on the Wald chi-squared statistic, and 95%
confidence intervals were computed assuming an asymptotic normal distribution
for the Poisson regression parameter estimates, with standard errors computed
using the Huber-White sandwich estimator.
Inference: AAI categorized with scientifically relevant cutpoints is significantly
associated with decreased 4-year mortality (Wald chi2(6)=188.3; p<0.001).
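As an illustration of how the scientific cutpoints translate into model terms, the sketch below assigns an AAI value to one of the seven categories and builds the six indicator variables that correspond to the 6 degrees of freedom in the Wald test. The function names and the choice of reference category are assumptions made for the example.

```python
from bisect import bisect_left

# Upper bounds of the seven AAI categories. Each interval is (lower, upper],
# except the first, which also includes its lower bound of 0.25.
UPPER_BOUNDS = [0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4]
REFERENCE = 3  # assumed reference category: >0.95 to 1.15 (the largest group)


def aai_category(aai):
    """Return the 0-based index of the category containing `aai`."""
    if not 0.25 <= aai <= 2.4:
        raise ValueError("AAI outside the range covered by the cutpoints")
    return bisect_left(UPPER_BOUNDS, aai)


def dummy_row(aai):
    """Six indicator variables, omitting the reference category."""
    cat = aai_category(aai)
    return [int(cat == k) for k in range(len(UPPER_BOUNDS)) if k != REFERENCE]


print(aai_category(0.55), aai_category(0.56), dummy_row(1.0), dummy_row(2.0))
```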
f.
Methods: Participants were compared across a categorical measure of AAI with
respect to the incidence of death within 4 years of study enrollment. No
observations were censored before four years, so four-year mortality could be
calculated from sample proportions within AAI groups. Crude risk ratios for
death within 4 years were estimated from a Poisson regression of the binary
indicator of death within 4 years on categorical AAI (categorized into 7 groups
by quantile). Two-sided p values were based on the Wald chi-squared statistic,
and 95% confidence intervals were computed assuming an asymptotic normal
distribution for the Poisson regression parameter estimates, with standard
errors computed using the Huber-White sandwich estimator.
Inference: AAI categorized by quantile is significantly associated with
decreased 4-year mortality (Wald chi2(6)=177.4; p<0.001).
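Quantile cutpoints are chosen so that the groups are of roughly equal size, unlike the scientific categories, which range from 17 to 2589 participants. The sketch below shows one way to form seven quantile groups; the AAI values are simulated for illustration only and are not the study data, and the resulting cutpoints are not those used in the analysis.

```python
import random
from bisect import bisect_left
from statistics import quantiles

random.seed(0)
# Simulated AAI-like values (hypothetical), clipped to the observed AAI range.
aai = [min(2.4, max(0.25, random.gauss(1.07, 0.18))) for _ in range(700)]

cuts = quantiles(aai, n=7)                    # six cutpoints define seven groups
groups = [bisect_left(cuts, x) for x in aai]  # group index 0..6 for each value
counts = [groups.count(k) for k in range(7)]
print([round(c, 2) for c in cuts], counts)
```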
g.
Methods: Participants were compared across a continuous measure of AAI, modeled
with linear splines, with respect to the incidence of death within 4 years of
study enrollment. No observations were censored before four years, so four-year
mortality could be calculated from sample proportions within AAI groups. Crude
risk ratios for death within 4 years were estimated from a Poisson regression of
the binary indicator of death within 4 years on linear spline terms for AAI
(with knots at the cutpoints of 0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4).
Two-sided p values were based on the Wald chi-squared statistic, and 95%
confidence intervals were computed assuming an asymptotic normal distribution
for the Poisson regression parameter estimates, with standard errors computed
using the Huber-White sandwich estimator.
Inference: AAI modeled with linear splines is significantly associated with
decreased 4-year mortality (Wald chi2(7)=189.7; p<0.001).
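A linear spline lets the slope of log risk in AAI change at each knot. One way to sketch the seven terms behind the 7-degree-of-freedom test is a linear term plus one hinge term per interior knot. This assumes that 0.25 and 2.4 are the ends of the AAI range rather than interior knots; the function name is hypothetical, and statistical packages may use a different parameterization that gives the same fitted values.

```python
# Assumed interior knots; 0.25 and 2.4 are treated as the ends of the AAI range.
INTERIOR_KNOTS = [0.55, 0.75, 0.95, 1.15, 1.35, 1.55]


def linear_spline_terms(aai, knots=INTERIOR_KNOTS):
    """Return [x, (x - k1)+, ..., (x - k6)+], where (u)+ = max(0, u).

    The coefficient on x is the slope below the first knot; each hinge
    coefficient is the change in slope at its knot.
    """
    return [aai] + [max(0.0, aai - k) for k in knots]


terms = linear_spline_terms(1.0)
print(len(terms), [round(t, 2) for t in terms])
```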
h.
The graph of the fitted values from the linear, log transformed, quadratic, dummy
variable, and linear spline models shows that the linear spline fits the data best,
since it allows the most flexibility. From our descriptive statistics, we know that
the relationship between AAI and mortality is U-shaped. The linear, log transformed,
and quadratic models start the U-shape but level off at an AAI greater than 1.5,
whereas the linear spline continues to increase at that point. The log transformed
model also has predicted values above 1, which is not possible for a probability.
The dummy variables based on scientific cutpoints show that the data take a U-shape,
but the steps may not model the data as closely if the cutpoints are not where
meaningful changes take place. The quantile-based dummy variables do not take on a
U-shape like the scientific cutpoint dummy variables.
When modeling a predictor of interest, an important factor to consider is
interpretability. While we want to have a model that fits the data, we also want to be
able to interpret our results in a meaningful way. This is the advantage of using an
untransformed linear or log transformed model, because they are both
interpretable. Using a quadratic model may fit the data better, but the risk
difference will not be interpretable. When we use dummy variables (either with
scientific cutpoints or quantiles), we have to compare each of our groups to a
reference group, which also loses some interpretability of the data. A linear spline
may be flexible and provide a better fit to the data, but we lose interpretability as
well. The only interpretation we can make is that there is or isn’t a significant
association between the two variables.
When modeling a confounder, an important factor to consider is how accurately the
model fits the data, because we need to fully adjust for the confounder to avoid
residual confounding. Since we do not need to interpret the results of the
relationship between the confounder and outcome, interpretability is less
important. Since getting the best fit is the goal, choosing a more flexible model may
be ideal. Since linear, log transformed, and quadratic force the model to take on a
certain shape, they may not be ideal for a confounder if we know that it is a U-shape
for example. Dummy variables (either with scientific cutpoints or quantiles) can
allow for meaningful comparisons between groups, but we lose flexibility as well.
The linear spline may be the best fit to model a confounder since it is the most
flexible and may be the best at reducing residual confounding.
When modeling a precision variable, neither interpretability nor accuracy of the
model's fit to the data is very important. It is often not worth the effort to fit
the data exactly, and it may be better to choose a simple model to avoid losing
precision or degrees of freedom. In this case, a linear untransformed model may be
best unless we have scientific knowledge that would make us think another model would be
better. The quadratic model, dummy variables, and spline would all add in
potentially unnecessary components and make the model much more complicated
than it needs to be.
5. RD vs. OR vs. RR for Untransformed Linear Model
When looking at the untransformed linear model, the risk ratio and odds ratio
models do a better job of modeling the U-shaped data than the risk difference
model. The fitted values from the risk difference model also extend below 0, while
those from the OR and RR models stay between 0 and 1.
RD vs. OR vs. RR on the Log Transformed Model
When looking at the log transformed AAI model, the risk ratio and odds ratio
models do a better job of modeling the U-shaped data than the risk difference
model. The fitted values from the risk difference model also extend below 0, and
those from the RR model extend above 1, while the OR model stays between 0 and 1.
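The out-of-range fitted values follow from how each model maps its linear predictor back to a risk: the risk difference model uses the linear predictor directly, the risk ratio model exponentiates it, and the odds ratio model applies the inverse logit. A minimal sketch with hypothetical coefficients (not the fitted estimates) shows each behavior:

```python
import math


def fitted_risk(link, b0, b1, x):
    """Back-transform the linear predictor b0 + b1 * x to a fitted risk."""
    eta = b0 + b1 * x
    if link == "identity":  # risk difference model: unbounded
        return eta
    if link == "log":       # risk ratio model: positive, but can exceed 1
        return math.exp(eta)
    if link == "logit":     # odds ratio model: always strictly between 0 and 1
        return 1 / (1 + math.exp(-eta))
    raise ValueError(f"unknown link: {link}")


# Hypothetical coefficients chosen to show each behavior.
rd_fit = fitted_risk("identity", 0.45, -0.25, 2.0)         # high untransformed AAI
rr_fit = fitted_risk("log", -2.5, -1.84, math.log(0.25))   # low AAI on the log scale
or_fit = fitted_risk("logit", -2.5, -1.84, math.log(0.25))
print(round(rd_fit, 3), round(rr_fit, 3), round(or_fit, 3))
```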
RD vs. OR vs. RR on the Quadratic Model
When looking at the quadratic model, the risk ratio and odds ratio models curve less
after 1.5 than the risk difference model, and the risk difference model does a better
job of fitting U-shaped data.
RD vs. OR vs. RR on the Dummy Variable Model Based on Scientific Cutpoints
When looking at the dummy variable model based on scientific cutpoints, the risk
difference, odds ratio, and risk ratio all present the same model. They all model the
U-shape, but the model is less flexible than the spline model.
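The three models coincide here because a model with one indicator per category places no constraint across categories, so each fitted risk is simply that category's sample proportion whichever contrast is used. The sketch below checks this with the death counts and group sizes from Table 2; the choice of reference category is an assumption made for the example.

```python
# Deaths within 4 years and group sizes by AAI category (counts from Table 2).
deaths = [27, 55, 105, 202, 79, 12, 2]
sizes = [74, 230, 577, 2589, 1295, 97, 17]
risks = [d / n for d, n in zip(deaths, sizes)]

REF = 3  # assumed reference category: AAI >0.95 to 1.15


def odds(p):
    return p / (1 - p)


fitted = []
for p in risks:
    rd = p - risks[REF]               # risk difference vs. reference
    rr = p / risks[REF]               # risk ratio vs. reference
    or_ = odds(p) / odds(risks[REF])  # odds ratio vs. reference

    # Back-transform each contrast to a fitted risk for the category.
    from_rd = risks[REF] + rd
    from_rr = risks[REF] * rr
    o = odds(risks[REF]) * or_
    from_or = o / (1 + o)
    fitted.append((from_rd, from_rr, from_or))

for p, triple in zip(risks, fitted):
    print(round(p, 3), [round(v, 3) for v in triple])
```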
RD vs. OR vs. RR on the Dummy Variable Model Based on Quantiles
When looking at the dummy variable model based on quantile cutpoints, the risk
difference, odds ratio, and risk ratio all present the same model. The quantile
cutpoints don’t model the data as well as the scientific cutpoints. This does show a
U-shape, but a lot of information is lost in the way the cutpoints were formed.
RD vs. OR vs. RR on the Linear Spline Model
When looking at the linear spline model, the risk difference, odds ratio, and risk
ratio all present the same model. This model is the most flexible and shows the
U-shaped nature of the data.