Random Number: 3225
Biost 536 Homework 3
Due 10-30-14

1. Methods: To determine whether censoring is present in our data, I restricted the data to participants who did not die and found their minimum observation time. Participants who died during the study left it by reaching the outcome of interest, so they are not considered censored. If the minimum observation time among those who did not die is greater than or equal to 4 years, then no one left the study during the 4-year period for a reason other than death.

Results: The minimum observation time for participants who did not die during the study was 1480 days, which is equivalent to 4.05 years. This provides evidence that no observations were censored within the first 4 years and that methods such as logistic regression can be used to answer the question of interest. We have at least 4 years of follow-up on every surviving participant, so the binary indicator of death within 4 years is complete for all participants.

2.
a. Methods: Descriptive statistics are provided by AAI category for variables measured at study enrollment. Continuous variables are summarized by means, standard deviations, minima, and maxima. Binary and categorical variables are summarized by frequency and percent. Descriptive statistics are also provided by AAI category for the proportions of 4-year mortality.

Results: Table 1 displays the descriptive statistics for the baseline variables for the 4879 participants in the study by AAI category (74 in AAI category 0.25 to 0.55, 230 in AAI category >0.55 to 0.75, 577 in AAI category >0.75 to 0.95, 2589 in AAI category >0.95 to 1.15, 1295 in AAI category >1.15 to 1.35, 97 in AAI category >1.35 to 1.55, 17 in AAI category >1.55 to 2.4). Table 1 summarizes the nonmissing data, and any missing data are assumed to be missing completely at random (MCAR). Participants ranged in age from 65 to 95.
The baseline variables are not equally distributed among AAI categories. The AAI categories farther from 1 tend to have a higher proportion of males, a higher proportion of participants with diabetes, and older participants on average. As the AAI categories increase numerically, there are lower proportions of Black participants, smokers, and participants with prevalent ASVD, and lower average systolic blood pressure, serum cholesterol, blood C-reactive protein, and blood fibrinogen. BMI is approximately equal on average among the groups. Table 2 displays the proportions of 4-year mortality by AAI category. The trend in these proportions suggests that the relationship between AAI and 4-year mortality is U-shaped.

Table 1: Baseline characteristics by AAI category

AAI category | 0.25-0.55 (n=74) | >0.55-0.75 (n=230) | >0.75-0.95 (n=577) | >0.95-1.15 (n=2589) | >1.15-1.35 (n=1295) | >1.35-1.55 (n=97) | >1.55-2.4 (n=17)
Age (yrs)¹ | 76.7 (6.1; 66-92) | 75.0 (6.2; 65-93) | 75.3 (6.3; 65-95) | 72.5 (5.3; 65-94) | 71.6 (5.0; 65-95) | 72.6 (5.8; 65-91) | 73.7 (5.0; 67-86)
Male (%) | 43 (58.1) | 118 (51.3) | 224 (38.8) | 909 (35.1) | 681 (52.6) | 76 (78.4) | 15 (88.2)
Black race (%) | 20 (27.0) | 62 (27.0) | 114 (19.8) | 357 (13.8) | 177 (13.7) | 17 (17.5) | 0 (0)
Smoker (%)² | 12 (16.2) | 55 (23.9) | 88 (15.3) | 324 (12.5) | 103 (8.0) | 3 (3.1) | 0 (0)
Uses estrogen (%)² | 1 (1.4) | 8 (3.5) | 29 (5.0) | 231 (8.9) | 62 (4.8) | 0 (0) | 0 (0)
Prevalent ASVD disease (%) | 41 (55.4) | 101 (43.9) | 173 (30.0) | 540 (20.9) | 236 (18.2) | 16 (16.5) | 8 (47.1)
Diabetes (%)² | 18 (24.7) | 55 (24.3) | 125 (22.2) | 370 (14.4) | 163 (12.7) | 26 (26.8) | 4 (23.5)
BMI (kg/m²)¹,² | 25.1 (4.1; 14.8-37.4) | 26.0 (4.3; 15.6-43.1) | 25.9 (4.4; 15.1-44.6) | 26.4 (4.7; 14.7-58.8) | 27.4 (4.6; 16.9-48) | 29.3 (5.8; 16.5-48.1) | 26.2 (3.8; 20.8-37)
Systolic blood pressure (mmHg)¹,² | 153.6 (24.9; 95-207) | 147.5 (24.5; 96-212) | 143.1 (22.7; 93-235) | 136.3 (21.5; 77-230) | 131.3 (19.2; 84-227) | 129.5 (17.0; 93-172) | 126.7 (18.1; 102-174)
Serum cholesterol (mg/dl)¹,² | 208.4 (40.9; 122-317) | 218.5 (43.2; 112-392) | 216.8 (40.6; 111-396) | 213.5 (38.7; 93-430) | 206.1 (37.3; 73-407) | 198.7 (42.3; 123-330) | 199.5 (30.6; 144-251)
Blood C-reactive protein (mg/l)¹,² | 6.2 (10.9; 0-63) | 5.2 (8.2; 0-83) | 4.0 (5.8; 0-51) | 3.5 (6.2; 0-108) | 3.1 (5.3; 0-86) | 3.5 (6.1; 0-48) | 3.5 (4.0; 0-13)
Blood fibrinogen (mg/dl)¹,² | 360.6 (92.2; 210-695) | 346.2 (76.3; 171-674) | 333.4 (69.9; 132-614) | 321.5 (65.5; 109-872) | 314.9 (62.4; 138-696) | 315.3 (71.5; 167-584) | 326.6 (100.0; 232-614)

¹ Descriptive statistics presented are the mean (standard deviation; minimum-maximum).
² Missing data: Smoker: 6, Uses Estrogen: 5, Diabetes: 54, BMI: 12, Systolic Blood Pressure: 7, Cholesterol: 45, Blood C-reactive protein: 63, Blood fibrinogen: 81.

Table 2: 4-year mortality by AAI category

AAI category | 0.25-0.55 (n=74) | >0.55-0.75 (n=230) | >0.75-0.95 (n=577) | >0.95-1.15 (n=2589) | >1.15-1.35 (n=1295) | >1.35-1.55 (n=97) | >1.55-2.4 (n=17)
4-year mortality (%) | 27 (36.5) | 55 (23.9) | 105 (18.2) | 202 (7.8) | 79 (6.1) | 12 (12.4) | 2 (11.8)

b. Methods: Participants were compared based on a continuous measure of AAI with respect to differences in the incidence of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the probability of death within 4 years were obtained from a linear regression of the binary indicator of death within 4 years on a model that included AAI as a continuous variable. Differences in mortality rates were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the linear regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.

Inference: Comparing two groups whose AAI differs by 0.1 units, the group with the higher AAI has an absolute risk of death within 4 years that is 2.8 percentage points lower (95% CI for the risk difference: -0.035, -0.022; p<0.001).

c.
Methods: Participants were compared based on a continuous measure of log transformed AAI with respect to differences in the incidence of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the probability of death within 4 years were obtained from a linear regression of the binary indicator of death within 4 years on a model that included log transformed AAI as a continuous variable. Differences in mortality rates were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the linear regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.

Inference: Comparing two groups whose AAI differs 2-fold, the group with the higher AAI has an absolute risk of death within 4 years that is 19.3 percentage points lower (95% CI for the risk difference: -23.4%, -15.1%; p<0.001).

d. Methods: Participants were compared based on a continuous measure of AAI with respect to differences in the incidence of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the probability of death within 4 years were obtained from a linear regression of the binary indicator of death within 4 years on a model that included a linear term and a quadratic term for continuous AAI. Differences in mortality rates were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the global F-test, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the linear regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.

Inference: AAI modeled with untransformed linear and quadratic terms is significantly associated with 4-year mortality (F(2, 4876)=43.3; p<0.001).

e. Methods: Participants were compared based on a categorical measure of AAI with respect to differences in the incidence of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the probability of death within 4 years were obtained for AAI groups from a linear regression of the binary indicator of death within 4 years on a model that included a categorical indicator of AAI (categorized at the scientific cutpoints of 0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4). Differences in mortality rates were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the global F-test, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the linear regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.

Inference: AAI categorized at scientifically relevant cutpoints is significantly associated with 4-year mortality (F(6, 4872)=17.91; p<0.001).

f. Methods: Participants were compared based on a categorical measure of AAI with respect to differences in the incidence of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the probability of death within 4 years were obtained for AAI groups from a linear regression of the binary indicator of death within 4 years on a model that included a categorical indicator of AAI (categorized into 7 groups by quantile). Differences in mortality rates were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the global F-test, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the linear regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.

Inference: AAI categorized by quantile is significantly associated with 4-year mortality (F(6, 4872)=17.89; p<0.001).

g. Methods: Participants were compared based on a continuous measure of AAI, modeled with linear splines, with respect to differences in the incidence of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the probability of death within 4 years were obtained from a linear regression of the binary indicator of death within 4 years on a model that included linear spline terms for AAI (with knots defined by the cutpoints of 0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4). Differences in mortality rates were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the global F-test, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the linear regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.

Inference: AAI modeled with a linear spline is significantly associated with 4-year mortality (F(7, 4871)=15.66; p<0.001).

h.
The graph of the fitted values from the linear, log transformed, quadratic, dummy variable, and linear spline models shows that the linear spline fits the data best, since it allows the most flexibility. From our descriptive statistics, we know that the relationship between AAI and 4-year mortality is U-shaped, and the linear model cannot take that shape. The log transformed AAI model allows the fit to curve slightly, but not into a complete U-shape. Another problem with the linear and log transformed AAI models is that some of their fitted values fall below 0; because fitted probabilities must stay between 0 and 1, these models do not describe the data accurately. The dummy variables based on the scientific cutpoints show that the data take a U-shape, but the steps may not track the data closely if the cutpoints are not where meaningful changes take place. The quantile-based dummy variables do not take on a U-shape like the scientific cutpoint dummy variables. The quadratic model does fit a U-shape, but it also forces the fit to be U-shaped.

When modeling a predictor of interest, an important factor to consider is interpretability. While we want a model that fits the data, we also want to be able to interpret our results in a meaningful way. This is the advantage of using an untransformed linear or log transformed model: both are interpretable. A quadratic model may fit the data better, but the risk difference will not be interpretable, because the difference associated with a given change in AAI depends on the starting value of AAI. When we use dummy variables (either with scientific cutpoints or quantiles), we have to compare each group to a reference group, which also costs some interpretability. A linear spline may be flexible and provide a better fit to the data, but we lose interpretability as well. The only interpretation we can make is whether or not there is a significant association between the two variables.
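Part of why a linear spline is harder to interpret is that its coefficients are changes in slope at each knot rather than a single difference. The sketch below illustrates this in Python. It assumes the six interior cutpoints act as knots, which is consistent with the 7 numerator degrees of freedom reported for the spline model; the function names and the example coefficients are hypothetical and are not estimates from the homework data.

```python
from itertools import accumulate

# Interior cutpoints from the text, assumed here to be the spline knots.
KNOTS = [0.55, 0.75, 0.95, 1.15, 1.35, 1.55]


def linear_spline_basis(aai, knots=KNOTS):
    """Return the 7 spline terms for one AAI value: AAI itself plus one
    hinge term max(0, AAI - knot) for each interior knot."""
    return [aai] + [max(0.0, aai - k) for k in knots]


def segment_slopes(coefs):
    """Convert spline coefficients into the slope within each segment.

    coefs[0] is the slope below the first knot; each later coefficient is
    the CHANGE in slope at its knot, so segment slopes are running sums.
    """
    return list(accumulate(coefs))


# Hypothetical coefficients (NOT fitted values), chosen only to give a
# U shape: risk falls at low AAI and rises at high AAI.
example_coefs = [-0.60, 0.10, 0.20, 0.25, 0.10, 0.10, 0.05]
slopes = segment_slopes(example_coefs)

for lower_edge, slope in zip([0.25] + KNOTS, slopes):
    print(f"AAI above {lower_edge:.2f}: slope {slope:+.2f} per unit AAI")
```

No single coefficient describes the association over the whole range of AAI, which is why the inference for the spline model is reported as a global test.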
When modeling a confounder, an important factor to consider is how accurately the model fits the data, because we need to fully adjust for the confounder to avoid residual confounding. Since we do not need to interpret the relationship between the confounder and the outcome, interpretability is less important. Since getting the best fit is the goal, choosing a more flexible model may be ideal. Because the linear, log transformed, and quadratic models force the fit to take a certain shape, they may not be ideal for a confounder if we know, for example, that its relationship with the outcome is U-shaped. Dummy variables (either with scientific cutpoints or quantiles) allow meaningful comparisons between groups, but they treat risk as constant within each group, so we lose some flexibility as well. The linear spline may be the best choice for modeling a confounder, since it is the most flexible and may be the best at reducing residual confounding.

When modeling a precision variable, neither interpretability nor the accuracy of the model’s fit to the data is very important. It is often not worth the effort to fit the data exactly, and it may be better to choose a simple model to avoid losing precision or degrees of freedom. In this case, an untransformed linear model may be best unless scientific knowledge suggests that another model would be better. The quadratic model, dummy variables, and spline would all add potentially unnecessary components and make the model much more complicated than it needs to be.

3.
a. Methods: Descriptive statistics are provided by AAI category for variables measured at study enrollment. Continuous variables are summarized by means, standard deviations, minima, and maxima. Binary and categorical variables are summarized by frequency and percent. Descriptive statistics are also provided by AAI category for the proportions of 4-year mortality.
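The grouping behind these summaries can be sketched in a few lines of Python. This is only an illustration: it assumes the intervals are closed on the right, as labels such as ">0.55 to 0.75" suggest, and the function name `aai_category` is hypothetical.

```python
from bisect import bisect_left

# Upper edge of each AAI category, taken from the cutpoints in the text.
UPPER_EDGES = [0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4]
LOWEST = 0.25


def aai_category(aai):
    """Return the 0-based category index for an AAI value.

    Intervals are treated as right-closed, e.g. (0.55, 0.75]. This is an
    assumption read off labels such as ">0.55 to 0.75".
    """
    if not LOWEST <= aai <= UPPER_EDGES[-1]:
        raise ValueError(f"AAI {aai} is outside {LOWEST} to {UPPER_EDGES[-1]}")
    # bisect_left finds the first upper edge that is >= aai.
    return bisect_left(UPPER_EDGES, aai)


for value in (0.55, 0.56, 1.0, 2.4):
    print(value, "-> category", aai_category(value))
```

Each participant’s index can then be used to tabulate frequencies, means, and 4-year mortality within category.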
Results: Table 1 displays the descriptive statistics for the baseline variables for the 4879 participants in the study by AAI category (74 in AAI category 0.25 to 0.55, 230 in AAI category >0.55 to 0.75, 577 in AAI category >0.75 to 0.95, 2589 in AAI category >0.95 to 1.15, 1295 in AAI category >1.15 to 1.35, 97 in AAI category >1.35 to 1.55, 17 in AAI category >1.55 to 2.4). Table 1 summarizes the nonmissing data, and any missing data are assumed to be missing completely at random (MCAR). Participants ranged in age from 65 to 95. The baseline variables are not equally distributed among AAI categories. The AAI categories farther from 1 tend to have a higher proportion of males, a higher proportion of participants with diabetes, and older participants on average. As the AAI categories increase numerically, there are lower proportions of Black participants, smokers, and participants with prevalent ASVD, and lower average systolic blood pressure, serum cholesterol, blood C-reactive protein, and blood fibrinogen. BMI is approximately equal on average among the groups. Table 2 displays the proportions of 4-year mortality by AAI category. The trend in these proportions suggests that the relationship between AAI and 4-year mortality is U-shaped.

Tables 1 and 2 are identical to those presented in Problem 2a.

b. Methods: Participants were compared based on a continuous measure of AAI with respect to ratios of odds of mortality within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the odds of death within 4 years were obtained from a logistic regression of the binary indicator of death within 4 years on a model that included AAI as a continuous variable. Odds ratios for 4-year mortality were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the logistic regression parameter estimates.

Inference: Comparing two groups whose AAI differs by 0.1 units, the group with the higher AAI has odds of dying within 4 years of study enrollment that are 0.76 times those of the group with the lower AAI (95% CI: 0.72, 0.79; p<0.001).

c. Methods: Participants were compared based on a continuous measure of log transformed AAI with respect to ratios of odds of mortality within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the odds of death within 4 years were obtained from a logistic regression of the binary indicator of death within 4 years on a model that included log transformed AAI as a continuous variable. Odds ratios for 4-year mortality were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods.
Two-sided p values were based on the chi-squared statistic, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the logistic regression parameter estimates.

Inference: Comparing two groups whose AAI differs 2-fold, the group with the higher AAI has odds of dying within 4 years of study enrollment that are 0.20 times those of the group with the lower AAI (95% CI: 0.15, 0.26; p<0.001).

d. Methods: Participants were compared based on a continuous measure of AAI with respect to ratios of odds of mortality within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the odds of death within 4 years were obtained from a logistic regression of the binary indicator of death within 4 years on a model that included a linear term and a quadratic term for continuous AAI. Odds ratios for 4-year mortality were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the logistic regression parameter estimates.

Inference: AAI modeled with untransformed linear and quadratic terms is significantly associated with 4-year mortality (LR chi2(2)=128.5; p<0.001).

e. Methods: Participants were compared based on a categorical measure of AAI with respect to ratios of odds of mortality within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the odds of death within 4 years were obtained for AAI groups from a logistic regression of the binary indicator of death within 4 years on a model that included a categorical indicator of AAI (categorized at the scientific cutpoints of 0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4). Odds ratios for 4-year mortality were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the logistic regression parameter estimates.

Inference: AAI categorized at scientifically relevant cutpoints is significantly associated with 4-year mortality (LR chi2(6)=150.4; p<0.001).

f. Methods: Participants were compared based on a categorical measure of AAI with respect to ratios of odds of mortality within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the odds of death within 4 years were obtained for AAI groups from a logistic regression of the binary indicator of death within 4 years on a model that included a categorical indicator of AAI (categorized into 7 groups by quantile). Odds ratios for 4-year mortality were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the logistic regression parameter estimates.

Inference: AAI categorized by quantile is significantly associated with 4-year mortality (LR chi2(6)=149.3; p<0.001).

g. Methods: Participants were compared based on a continuous measure of AAI, modeled with linear splines, with respect to ratios of odds of mortality within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the odds of death within 4 years were obtained from a logistic regression of the binary indicator of death within 4 years on a model that included linear spline terms for AAI (with knots defined by the cutpoints of 0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4). Odds ratios for 4-year mortality were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the logistic regression parameter estimates.

Inference: AAI modeled with a linear spline is significantly associated with 4-year mortality (LR chi2(7)=152.99; p<0.001).

h. The graph of the fitted values from the linear, log transformed, quadratic, dummy variable, and linear spline models shows that the linear spline fits the data best, since it allows the most flexibility. From our descriptive statistics, we know that the relationship between AAI and 4-year mortality is U-shaped. The linear, log transformed, and quadratic models start the U-shape but level off at an AAI greater than 1.5, whereas the linear spline continues to increase at that point. The dummy variables based on the scientific cutpoints show that the data take a U-shape, but the steps may not track the data closely if the cutpoints are not where meaningful changes take place. The quantile-based dummy variables do not take on a U-shape like the scientific cutpoint dummy variables.

When modeling a predictor of interest, an important factor to consider is interpretability. While we want a model that fits the data, we also want to be able to interpret our results in a meaningful way.
This is the advantage of using an untransformed linear or log transformed model: both are interpretable. A quadratic model may fit the data better, but the odds ratio will not be interpretable, because the ratio associated with a given change in AAI depends on the starting value of AAI. When we use dummy variables (either with scientific cutpoints or quantiles), we have to compare each group to a reference group, which also costs some interpretability. A linear spline may be flexible and provide a better fit to the data, but we lose interpretability as well. The only interpretation we can make is whether or not there is a significant association between the two variables.

When modeling a confounder, an important factor to consider is how accurately the model fits the data, because we need to fully adjust for the confounder to avoid residual confounding. Since we do not need to interpret the relationship between the confounder and the outcome, interpretability is less important. Since getting the best fit is the goal, choosing a more flexible model may be ideal. Because the linear, log transformed, and quadratic models force the fit to take a certain shape, they may not be ideal for a confounder if we know, for example, that its relationship with the outcome is U-shaped. Dummy variables (either with scientific cutpoints or quantiles) allow meaningful comparisons between groups, but they treat the odds as constant within each group, so we lose some flexibility as well. The linear spline may be the best choice for modeling a confounder, since it is the most flexible and may be the best at reducing residual confounding.

When modeling a precision variable, neither interpretability nor the accuracy of the model’s fit to the data is very important. It is often not worth the effort to fit the data exactly, and it may be better to choose a simple model to avoid losing precision or degrees of freedom. In this case, an untransformed linear model may be best unless scientific knowledge suggests that another model would be better. The quadratic model, dummy variables, and spline would all add potentially unnecessary components and make the model much more complicated than it needs to be.

4.
a. Methods: Descriptive statistics are provided by AAI category for variables measured at study enrollment. Continuous variables are summarized by means, standard deviations, minima, and maxima. Binary and categorical variables are summarized by frequency and percent. Descriptive statistics are also provided by AAI category for the proportions of 4-year mortality.

Results: Table 1 displays the descriptive statistics for the baseline variables for the 4879 participants in the study by AAI category (74 in AAI category 0.25 to 0.55, 230 in AAI category >0.55 to 0.75, 577 in AAI category >0.75 to 0.95, 2589 in AAI category >0.95 to 1.15, 1295 in AAI category >1.15 to 1.35, 97 in AAI category >1.35 to 1.55, 17 in AAI category >1.55 to 2.4). Table 1 summarizes the nonmissing data, and any missing data are assumed to be missing completely at random (MCAR). Participants ranged in age from 65 to 95. The baseline variables are not equally distributed among AAI categories. The AAI categories farther from 1 tend to have a higher proportion of males, a higher proportion of participants with diabetes, and older participants on average. As the AAI categories increase numerically, there are lower proportions of Black participants, smokers, and participants with prevalent ASVD, and lower average systolic blood pressure, serum cholesterol, blood C-reactive protein, and blood fibrinogen. BMI is approximately equal on average among the groups. Table 2 displays the proportions of 4-year mortality by AAI category. The trend in these proportions suggests that the relationship between AAI and 4-year mortality is U-shaped.
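The U shape can be checked directly from the counts in Table 2. The short Python sketch below recomputes the percentages and locates the lowest-risk category; the variable names are illustrative, and the simple rule used to call the pattern U-shaped is an assumption made for this example.

```python
# Deaths within 4 years and group sizes by AAI category, copied from Table 2.
labels = [
    "0.25-0.55", ">0.55-0.75", ">0.75-0.95", ">0.95-1.15",
    ">1.15-1.35", ">1.35-1.55", ">1.55-2.4",
]
deaths = [27, 55, 105, 202, 79, 12, 2]
sizes = [74, 230, 577, 2589, 1295, 97, 17]

# Percent mortality in each category, rounded as in the table.
percent = [round(100 * d / n, 1) for d, n in zip(deaths, sizes)]
lowest = percent.index(min(percent))

for label, pct in zip(labels, percent):
    print(f"{label:>11}: {pct:4.1f}%")

# Working definition of a U shape (an assumption for this sketch): the
# minimum lies in an interior category and both end categories are higher.
is_u_shaped = 0 < lowest < len(percent) - 1 and (
    percent[0] > percent[lowest] < percent[-1]
)
print("lowest-risk category:", labels[lowest], "| U-shaped:", is_u_shaped)
```

The two highest AAI categories contain only 97 and 17 participants, so the upturn at high AAI is estimated with much less precision than the decline at low AAI.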
Tables 1 and 2 are identical to those presented in Problem 2a.

b. Methods: Participants were compared based on a continuous measure of AAI with respect to ratios of the risk of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the risk ratio of death within 4 years were obtained from a Poisson regression of the binary indicator of death within 4 years on a model that included AAI as a continuous variable. Risk ratios for 4-year mortality were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic, and 95% confidence intervals were computed assuming the asymptotic normal distribution of the Poisson regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.

Inference: Comparing two groups whose AAI differs by 0.1 units, the group with the higher AAI is 0.79 times as likely to die within 4 years of study enrollment as the group with the lower AAI (95% CI: 0.76, 0.82; p<0.001).

c. Methods: Participants were compared based on a continuous measure of log transformed AAI with respect to ratios of the risk of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the risk ratio of death within 4 years were obtained from a Poisson regression of the binary indicator of death within 4 years on a model that included log transformed AAI as a continuous variable. Risk ratios for 4-year mortality were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods.
Two-sided p values were based on the chi-squared statistic and 95% confidence intervals were computed assuming the asymptotic normal distribution for the Poisson regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.
Inference: Participants with a 2-fold higher AAI are 0.28 times as likely to die within 4 years of study enrollment as those with a 2-fold lower AAI (95% CI: 0.23, 0.34; p<0.001).

d. Methods: Participants were compared based on a continuous measure of AAI with respect to differences in the incidence of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions at all levels of AAI. Crude estimates of the risk ratio of death within 4 years were obtained from a Poisson regression of the binary indicator of death within 4 years on a model that included a linear term and a quadratic term for AAI. Risk ratios for 4-year mortality rates were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic and 95% confidence intervals were computed assuming the asymptotic normal distribution for the Poisson regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.
Inference: AAI modeled with an untransformed linear term and a quadratic term is significantly associated with 4-year mortality (Wald chi2(2)=191.1; p<0.001).

e. Methods: Participants were compared based on a categorical measure of AAI with respect to differences in the incidence of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions of AAI groups.
Crude estimates of the risk ratio of death within 4 years were obtained from a Poisson regression of the binary indicator of death within 4 years on a model that included a categorical indicator of AAI (categorized at the scientific cutpoints of 0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4). Risk ratios for 4-year mortality rates were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic and 95% confidence intervals were computed assuming the asymptotic normal distribution for the Poisson regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.
Inference: AAI categorized with scientifically relevant cutpoints is significantly associated with decreased 4-year mortality (Wald chi2(6)=188.3; p<0.001).

f. Methods: Participants were compared based on a categorical measure of AAI with respect to differences in the incidence of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions of AAI groups. Crude estimates of the risk ratio of death within 4 years were obtained from a Poisson regression of the binary indicator of death within 4 years on a model that included a categorical indicator of AAI (categorized into 7 groups by quantile). Risk ratios for 4-year mortality rates were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic and 95% confidence intervals were computed assuming the asymptotic normal distribution for the Poisson regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.
Inference: AAI categorized by quantile is significantly associated with decreased 4-year mortality (Wald chi2(6)=177.4; p<0.001).

g.
Methods: Participants were compared based on a categorical measure of AAI with respect to differences in the incidence of death within 4 years of study enrollment. No subjects’ observations were censored prior to four years, so four-year mortality rates were calculable based on sample proportions of AAI groups. Crude estimates of the risk ratio of death within 4 years were obtained from a Poisson regression of the binary indicator of death within 4 years on a model that included a linear spline of AAI (with knots at the cutpoints of 0.25, 0.55, 0.75, 0.95, 1.15, 1.35, 1.55, 2.4). Risk ratios for 4-year mortality rates were compared using 95% confidence intervals computed using Wald-type maximum likelihood methods. Two-sided p values were based on the chi-squared statistic and 95% confidence intervals were computed assuming the asymptotic normal distribution for the Poisson regression parameter estimates, with standard errors computed using the Huber-White sandwich estimator.
Inference: AAI modeled with a linear spline is significantly associated with decreased 4-year mortality (Wald chi2(7)=189.7; p<0.001).

h. The graph of the fitted values from the linear, log transformed, quadratic, dummy variable, and linear spline models shows that the linear spline fits the data best, since it allows for the most flexibility. From our descriptive statistics, we know that the data are U-shaped. The linear, log transformed, and quadratic models start the U-shape but level off at an AAI greater than 1.5, whereas the linear spline continues to increase at that point. The log transformed model also has predicted values above 1, which are not possible for a probability. The dummy variables based on scientific cutpoints show that the data take a U-shape, but the steps may not model the data closely if the cutpoints are not where meaningful changes take place.
The quantile categorized dummy variables do not take on a U-shape like the scientific cutpoint dummy variables.

When modeling a predictor of interest, an important factor to consider is interpretability. While we want a model that fits the data, we also want to be able to interpret our results in a meaningful way. This is the advantage of using an untransformed linear or log transformed model, because both are interpretable. Using a quadratic model may fit the data better, but the risk difference will not be interpretable. When we use dummy variables (either with scientific cutpoints or quantiles), we have to compare each of our groups to a reference group, which also loses some interpretability. A linear spline may be flexible and provide a better fit to the data, but we lose interpretability as well. The only interpretation we can make is that there is or isn’t a significant association between the two variables.

When modeling a confounder, an important factor to consider is how accurately the model fits the data, because we need to fully adjust for the confounder to avoid residual confounding. Since we do not need to interpret the relationship between the confounder and the outcome, interpretability is less important. Since getting the best fit is the goal, choosing a more flexible model may be ideal. Since the linear, log transformed, and quadratic models force the fit to take on a certain shape, they may not be ideal for a confounder if we know, for example, that the relationship is U-shaped. Dummy variables (either with scientific cutpoints or quantiles) can allow for meaningful comparisons between groups, but we lose flexibility as well. The linear spline may be the best choice for modeling a confounder, since it is the most flexible and may be the best at reducing residual confounding.

When modeling a precision variable, neither interpretability nor accuracy of the fit is very important.
It is often not worth the effort to fit the data exactly, and it may be better to choose a simple model to avoid losing precision or degrees of freedom. In this case, an untransformed linear model may be best unless we have scientific knowledge that would make us think another model would be better. The quadratic model, dummy variables, and spline would all add potentially unnecessary components and make the model much more complicated than it needs to be.

5.
RD vs. OR vs. RR for the Untransformed Linear Model
When looking at the untransformed linear model, the risk ratio and odds ratio models do a better job of modeling the U-shaped data than the risk difference model. The risk difference model also extends below 0, while the OR and RR models stay between 0 and 1.

RD vs. OR vs. RR on the Log Transformed Model
When looking at the log transformed AAI model, the risk ratio and odds ratio models do a better job of modeling the U-shaped data than the risk difference model. The risk difference model also extends below 0, and the RR model extends above 1, while the OR model stays between 0 and 1.

RD vs. OR vs. RR on the Quadratic Model
When looking at the quadratic model, the risk ratio and odds ratio models curve less after 1.5 than the risk difference model, and the risk difference model does a better job of fitting the U-shaped data.

RD vs. OR vs. RR on the Dummy Variable Model Based on Scientific Cutpoints
When looking at the dummy variable model based on scientific cutpoints, the risk difference, odds ratio, and risk ratio all present the same model. They all model the U-shape, but the model is less flexible than the spline model.

RD vs. OR vs. RR on the Dummy Variable Model Based on Quantiles
When looking at the dummy variable model based on quantile cutpoints, the risk difference, odds ratio, and risk ratio all present the same model. The quantile cutpoints don’t model the data as well as the scientific cutpoints.
This does show a U-shape, but a lot of information is lost in the way the cutpoints were formed.

RD vs. OR vs. RR on the Linear Spline Model
When looking at the linear spline model, the risk difference, odds ratio, and risk ratio all present the same model. This model is the most flexible and shows the U-shaped nature of the data.
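
The agreement of the RD, OR, and RR fits for the dummy variable model based on scientific cutpoints can be checked by hand. The following is a minimal sketch in Python, not the code used for the analysis. It uses the deaths and group sizes reported in Table 2; the choice of the >0.95-1.15 group as reference and the function name `saturated_fits` are assumptions made only for illustration.

```python
# Deaths within 4 years and group sizes by AAI category (counts from Table 2).
CATEGORIES = ["0.25-0.55", ">0.55-0.75", ">0.75-0.95", ">0.95-1.15",
              ">1.15-1.35", ">1.35-1.55", ">1.55-2.4"]
DEATHS = [27, 55, 105, 202, 79, 12, 2]
SIZES = [74, 230, 577, 2589, 1295, 97, 17]

REF = 3  # assumption: >0.95-1.15 is the reference group (illustrative choice)


def saturated_fits(deaths, sizes, ref):
    """Return (observed, via_rd, via_rr, via_or) fitted risks for each group.

    With one dummy variable per non-reference group, each contrast
    (risk difference, risk ratio, odds ratio) is a function of the group
    proportion and the reference proportion, so back-transforming any of
    them recovers the observed group proportion.
    """
    risks = [d / n for d, n in zip(deaths, sizes)]
    p0 = risks[ref]
    odds0 = p0 / (1 - p0)
    rows = []
    for p in risks:
        rd = p - p0                          # risk difference vs. reference
        rr = p / p0                          # risk ratio vs. reference
        odds_ratio = (p / (1 - p)) / odds0   # odds ratio vs. reference
        via_rd = p0 + rd                     # fitted risk on the RD scale
        via_rr = p0 * rr                     # fitted risk on the RR scale
        odds = odds0 * odds_ratio
        via_or = odds / (1 + odds)           # fitted risk on the OR scale
        rows.append((p, via_rd, via_rr, via_or))
    return rows


if __name__ == "__main__":
    for label, row in zip(CATEGORIES, saturated_fits(DEATHS, SIZES, REF)):
        print(f"{label:>11}", "  ".join(f"{value:.3f}" for value in row))
```

Because the dummy variable model has one parameter per group, each fitted value equals the observed group proportion on all three scales, which is consistent with the three curves coinciding in the graphs.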