Biost 518: Applied Biostatistics II
Biost 515: Biostatistics II
Emerson, Winter 2015

Homework #5
February 20, 2015

On this (as all homeworks) Stata / R code and unedited Stata / R output is TOTALLY unacceptable. Instead, prepare a table of statistics gleaned from the Stata output. The table should be appropriate for inclusion in a scientific report, with all statistics rounded to a reasonable number of significant digits. (I am interested in how statistics are used to answer the scientific question.)

Unless explicitly told otherwise in the statement of the problem, in all problems requesting “statistical analyses” (either descriptive or inferential), you should present both

1. Methods: A brief sentence or paragraph describing the statistical methods you used. This should be using wording suitable for a scientific journal, though it might be a little more detailed. A reader should be able to reproduce your analysis. DO NOT PROVIDE Stata OR R CODE.
2. Inference: A paragraph providing full statistical inference in answer to the question. Please see the supplementary document relating to “Reporting Associations” for details.

All problems of the homework relate to the clinical trial of DFMO and suppression of polyamines. In this homework I ask you to sometimes use dummy variables to analyze the data. There are two approaches to performing this analysis:

1. Provide suitable descriptive statistics.

Answer:

Methods: In this study, we are mainly interested in the effect of DFMO on the mucosal polyamines. Specifically, in this homework, we are mainly interested in the effect of DFMO on the mucosal spermidine levels at baseline and after 12 months of treatment. Therefore, descriptive statistics of spermidine levels at baseline, spermidine levels after 12 months of treatment, age and sex are presented within groups defined by dose levels of DFMO (0, 0.075, 0.2 or 0.4 g/sq m/day), as well as for the entire sample.
For continuous variables, we presented the mean, standard deviation, minimum and maximum. These variables included age and spermidine level at baseline and after 12 months of treatment. For the binary variable, i.e. female sex in this sample, we presented percentages. We are also interested in whether DFMO affects the chance that spermidine levels decrease after 12 months of treatment. Therefore, the probability and odds of a decrease in spermidine level after 12 months of treatment are presented in the following table, too.

Inference: There are 114 subjects in the dataset. However, 1 of those has missing age, and only 95 of the 114 subjects have available spermidine levels after 12 months of treatment. We simply omitted these missing values, but it should be remembered that we cannot assess the influence that such omissions might have on our results. Of the 114 subjects, 32 subjects took doses of 0 g/sq m/day, 29 subjects took doses of 0.075 g/sq m/day, 25 subjects took doses of 0.2 g/sq m/day and 28 subjects took doses of 0.4 g/sq m/day. After 12 months of treatment, there are 28 subjects in the dose 0 group, 26 subjects in the dose 0.075 group, 21 subjects in the dose 0.2 group and 20 subjects in the dose 0.4 group. The following table presents descriptive statistics within these groups and for the entire sample. For each dose group except the placebo group (i.e. the dose 0 group), the mean spermidine level after 12 months of treatment is lower than the mean spermidine level at baseline. For the placebo group, the mean spermidine level after 12 months of treatment is similar to the mean spermidine level at baseline. In addition, the probability of a decrease in spermidine level after 12 months of treatment increases with the dose level, as do the odds. Therefore, we could say that there might be a trend toward lower spermidine levels after 12 months of treatment with increasing dose level of DFMO.
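The probability and odds of a decrease reported for each dose group are linked by odds = p / (1 - p). As a minimal sketch (not the analysis code used for the homework), the snippet below recovers the odds from the reported proportions and group sizes. The integer counts are an assumption: they are inferred by rounding proportion times n, not read from the data.

```python
# Group sizes with a 12-month measurement and the reported proportion of
# subjects whose spermidine level decreased (values from the descriptive table).
n_at_12_months = {"0": 28, "0.075": 26, "0.2": 21, "0.4": 20}
p_decrease = {"0": 0.464, "0.075": 0.615, "0.2": 0.619, "0.4": 0.800}


def odds_from_counts(decreased, total):
    """Odds = (number with a decrease) / (number without a decrease)."""
    return decreased / (total - decreased)


odds = {}
for dose, n in n_at_12_months.items():
    # Assumption: the integer count is recovered from the rounded proportion.
    decreased = round(p_decrease[dose] * n)
    odds[dose] = odds_from_counts(decreased, n)

for dose, value in odds.items():
    print(f"dose {dose}: odds of decrease = {value:.4f}")
```

Working from counts rather than from the rounded proportions avoids the small discrepancy that arises when 0.464 / (1 - 0.464) is used in place of 13 / 15.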
The probability of a decrease in spermidine level after 12 months might likewise be larger at higher dose levels. There is no obvious trend in sex or age across groups defined by different doses of DFMO.

Descriptive statistics by dose level of DFMO (g/sq m/day) (see notes 1 and 2)

Female (%)
    0 (n=32)             18.8
    0.075 (n=29)         17.2
    0.2 (n=25)           0
    0.4 (n=28)           21.4
    Any level (n=114)    14.9

Age (yrs)
    0 (n=32)             65.9 (8.51; 45.5 – 77.2)
    0.075 (n=28)         61.3 (7.69; 47.8 – 76.9)
    0.2 (n=25)           62.8 (8.28; 45.4 – 77.6)
    0.4 (n=28)           63.9 (7.81; 48.5 – 81.0)
    Any level (n=113)    63.6 (8.16; 45.4 – 81.0)

Spermidine at baseline
    0 (n=32)             3.26 (1.45; 1.40 – 7.05)
    0.075 (n=29)         3.47 (1.55; 1.51 – 7.02)
    0.2 (n=25)           3.35 (1.33; 1.70 – 6.22)
    0.4 (n=28)           3.56 (1.88; 0.66 – 7.6)
    Any level (n=114)    3.41 (1.55; 0.66 – 7.6)

Spermidine after 12 months
    0 (n=28)             3.26 (1.31; 1.01 – 5.91)
    0.075 (n=26)         2.92 (0.994; 1.35 – 4.92)
    0.2 (n=21)           2.71 (1.40; 0.293 – 6.45)
    0.4 (n=20)           1.95 (0.799; 0 – 3.42)
    Any level (n=95)     2.77 (1.23; 0 – 6.45)

Probability of decrease of spermidine after 12 months
    0 (n=28)             0.464
    0.075 (n=26)         0.615
    0.2 (n=21)           0.619
    0.4 (n=20)           0.800
    Any level (n=95)     0.611

Odds of decrease of spermidine after 12 months
    0 (n=28)             0.8667
    0.075 (n=26)         1.600
    0.2 (n=21)           1.625
    0.4 (n=20)           4.000
    Any level (n=95)     1.568

1 For continuous variables, descriptive statistics are presented in the form: mean (sd; min – max).
2 For the level of the polyamines in the table, the units are micromole/mg protein.

2. For each of the following models, provide inference (P values, and where appropriate, 95% confidence intervals with scientific interpretation of the parameters) regarding the effect of DFMO on the mucosal spermidine levels after 12 months of treatment. (Recall that when multiple modeled covariates are derived from the same scientific factor, you need to test all those covariates simultaneously. When no other covariates are in the model, the “overall F” or “overall chi squared” test can do this for us.) Note that part h asks you to provide a table of predicted values for each of these models.

a. Model dose as dummy variables using the dose 0 group as the reference group.
Answer:

Methods: To investigate the association between DFMO and mucosal spermidine levels after 12 months of treatment, we used a linear regression model with robust standard errors, modeling dose as dummy variables with the dose 0 group as the reference group and spermidine level as an untransformed continuous variable. The spermidine level is our response and the dose of DFMO is the predictor. The Huber-White sandwich estimator was used to compute standard errors. Based on the approximately normal distribution of the parameter estimates in linear regression and Wald statistics, 95% confidence intervals and two-sided p values were computed. Inference was based on the p value from the overall F test.

Inference: Based on the p value from the overall F test, which is 0.0001, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment. The regression model includes 4 parameters: one intercept, which represents the estimate of mean spermidine level after 12 months of treatment for the reference group (i.e. the dose 0 group), and three slopes for the three dose groups (dose 0.075, dose 0.2 and dose 0.4), each of which represents the estimated difference in mean spermidine level between the corresponding dose group and the dose 0 group. The following table provides the point estimate, a 95% confidence interval and a two-sided p value for each parameter from this model.

                         Point Estimate    95% CI                Two-sided P
Intercept                3.256             (2.761, 3.751)        < 0.001
Slope for dose 0.075     -0.3363           (-0.9649, 0.2924)     0.291
Slope for dose 0.2       -0.5443           (-1.324, 0.2358)      0.169
Slope for dose 0.4       -1.306            (-1.914, -0.6983)     < 0.001

The difference in mean spermidine level after 12 months of treatment between the dose 0.075 group and the dose 0 group is not statistically significant, nor is the difference between the dose 0.2 group and the dose 0 group.
The difference in mean spermidine level after 12 months of treatment between the dose 0.4 group and the dose 0 group is statistically significant (based on the two-sided p < 0.001). Because of the concern about an inflated Type 1 error, it might be inappropriate to give interpretations of the 95% confidence intervals of these parameters.

b. Model dose as dummy variables using the dose 0.075 group as the reference group. You do not have to provide a formal description of the methods or inference for this part. Instead comment on how the regression parameters from this model relate to those obtained in part a. Suppose we were to completely ignore the major multiple comparison issues and to instead trust the individual p values listed in the coefficient table, what conclusions would we reach about differences among the dose groups in part a vs in part b?

Answer: When using the dose 0.075 group as the reference group, the intercept, which is 2.92, is the estimate of mean spermidine level after 12 months for the dose 0.075 group; it equals the sum of the intercept and the slope for dose 0.075 in part a. The slope for dose 0, which is 0.336, is the estimated difference between the mean levels for the dose 0 group and the dose 0.075 group. It has the same absolute value as the slope for dose 0.075 in part a, but with a positive sign. The slope for dose 0.2, which is -0.208, is the estimated difference between the mean levels for the dose 0.2 group and the dose 0.075 group. It is equal to the slope for dose 0.2 minus the slope for dose 0.075 in part a. The slope for dose 0.4, which is -0.970, is the estimated difference between the mean levels for the dose 0.4 group and the dose 0.075 group. It is equal to the slope for dose 0.4 minus the slope for dose 0.075 in part a.
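The relations just described amount to simple arithmetic on the part a coefficients. The following is only an illustrative sketch: the function name `rereference` and the dictionary keys are hypothetical, and the inputs are the rounded estimates from the part a table rather than a refit of the model.

```python
# Rounded coefficients from the part a model (reference group: dose 0).
part_a = {
    "intercept": 3.256,
    "dose_0.075": -0.3363,
    "dose_0.2": -0.5443,
    "dose_0.4": -1.306,
}


def rereference(coefs, new_ref, old_ref_name="dose_0"):
    """Re-express dummy-variable coefficients relative to a new reference group."""
    shift = coefs[new_ref]
    # The new intercept is the mean of the new reference group.
    new = {"intercept": coefs["intercept"] + shift}
    # The old reference group now gets its own slope, with the sign flipped.
    new[old_ref_name] = -shift
    # Every other slope is measured against the new reference group.
    for name, value in coefs.items():
        if name not in ("intercept", new_ref):
            new[name] = value - shift
    return new


part_b = rereference(part_a, "dose_0.075")
for name, value in part_b.items():
    print(f"{name}: {value:.3f}")
```

Only the point estimates transform this way. The standard errors and p values for the new contrasts depend on the covariance of the estimates and cannot be obtained from the part a table alone.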
If we ignored the major multiple comparison issues, based on the individual two-sided p values for the parameters from the linear regression using the dose 0.075 group as the reference group, the difference between the mean levels for the dose 0 group and the dose 0.075 group (two-sided p = 0.291), as well as the difference between the mean levels for the dose 0.2 group and the dose 0.075 group (two-sided p = 0.566), are not statistically significant. The difference between the mean levels for the dose 0.4 group and the dose 0.075 group (two-sided p < 0.001) is statistically significant. In part a, we would conclude that the difference in mean spermidine level after 12 months of treatment between the dose 0 group and either the dose 0.075 group or the dose 0.2 group is not statistically significant. The difference between the mean levels for the dose 0 group and the dose 0.4 group is statistically significant.

c. Model dose continuously as a linear predictor.

Answer:

Methods: I used a linear regression model with robust standard errors and modeled dose continuously as a linear predictor to investigate the association between dose level and spermidine levels after 12 months of treatment. The Huber-White sandwich estimator was used to compute robust standard errors. Based on the approximately normal distribution of the parameter estimates in linear regression and Wald statistics, a 95% confidence interval and a two-sided p value were computed. Inference was based on these results.

Inference: Based on the results, we estimated that the mean level of spermidine after 12 months of treatment is 3.23 (micromole/mg protein) for the dose 0 group, with a 95% confidence interval of (2.89, 3.58), and that for every 1 g/sq m/day higher dose level (a difference that is not meaningful in this study), the mean level of spermidine is 3.13 lower.
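Because a 1 g/sq m/day difference lies far outside the doses studied (0 to 0.4), it can help to rescale the slope to a contrast that does occur in the trial. The sketch below simply multiplies the reported slope and its reported 95% confidence limits by the chosen dose difference. The choice of 0.1 and 0.4 as illustrative differences is an assumption made here, not part of the homework answer.

```python
# Reported slope and 95% CI for a 1 g/sq m/day difference in dose
# (difference in mean spermidine, micromole/mg protein).
slope_per_unit = -3.13
ci_per_unit = (-4.50, -1.75)


def rescale(value, dose_difference):
    """A linear slope scales in direct proportion to the dose difference."""
    return value * dose_difference


for dose_difference in (0.1, 0.4):
    estimate = rescale(slope_per_unit, dose_difference)
    lower, upper = (rescale(bound, dose_difference) for bound in ci_per_unit)
    print(
        f"{dose_difference} g/sq m/day higher dose: "
        f"{estimate:.3f} ({lower:.3f}, {upper:.3f})"
    )
```

The p value is unchanged by this rescaling, since the estimate and its standard error are multiplied by the same constant.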
Based on the 95% confidence interval of the slope, our data would not be unusual if the true difference between two groups differing by 1 unit in dose level is between 4.50 lower and 1.75 lower in the group with the higher dose. Based on the two-sided p < 0.001, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment.

d. Model dose as two variables: a continuous linear predictor along with a quadratic term (so an additional predictor equal to the square of dose).

Answer:

Methods: I used a linear regression model with robust standard errors to investigate the association between dose level and spermidine levels after 12 months of treatment. In the regression model, the spermidine level is the response, and the continuous linear term of dose and a quadratic term of dose are the predictors. The Huber-White sandwich estimator was used to compute robust standard errors. Based on the approximately normal distribution of the parameter estimates in linear regression and Wald statistics, 95% confidence intervals and two-sided p values were computed. Inference was based on the p value from the overall F test.

Inference: Based on the results from the regression, the p value from the overall F test is less than 0.0001. Therefore, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment.

e. Model dose as a binary variable indicating whether dose was greater than 0.

Answer:

Methods: I generated an indicator variable for dose greater than 0, and fit a linear regression with robust standard errors using the indicator variable as the predictor and the level of spermidine after 12 months as the response to investigate the association between dose and spermidine levels. The Huber-White sandwich estimator was used to compute robust standard errors.
Based on the approximately normal distribution of the parameter estimates in linear regression and Wald statistics, a 95% confidence interval and a two-sided p value were computed. Inference was based on these results.

Inference: Based on the results from the regression, we estimated that the mean level of spermidine after 12 months in the dose 0 group is 3.26 (micromole/mg protein), with a 95% confidence interval of (2.77, 3.75). The mean level of spermidine in the group with dose higher than 0 is 0.691 micromole/mg protein lower than the mean level in the dose 0 group. Our data would not be unusual if the true difference in mean level of spermidine between the dose 0 group and the group with dose higher than 0 is between 0.128 and 1.25, with the dose 0 group having the higher mean level of spermidine. Based on the two-sided p = 0.017, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment.

f. Model dose as two variables: a binary variable indicating whether dose was greater than 0 and a continuous linear term.

Answer:

Methods: I used a linear regression model with robust standard errors to investigate the association between dose level and spermidine levels after 12 months of treatment. In the regression model, the spermidine level is the response, and the continuous linear term of dose and the binary variable indicating whether dose was greater than 0 are the predictors. The Huber-White sandwich estimator was used to compute robust standard errors. Based on the approximately normal distribution of the parameter estimates in linear regression and Wald statistics, 95% confidence intervals and two-sided p values were computed. Inference was based on the p value from the overall F test.

Inference: Based on the results from the linear regression including the two variables, the p value from the overall F test is 0.0001.
Therefore, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment.

g. Model dose as three variables: a continuous linear predictor, a quadratic term, and a cubic term (a term equal to dose raised to the third power).

Answer:

Methods: I used a linear regression model with robust standard errors to investigate the association between dose level and spermidine levels after 12 months of treatment. In the regression model, the spermidine level is the response, and the continuous linear term of dose, a quadratic term of dose and a cubic term of dose are the predictors. The Huber-White sandwich estimator was used to compute robust standard errors. Based on the approximately normal distribution of the parameter estimates in linear regression and Wald statistics, 95% confidence intervals and two-sided p values were computed for the parameters. Inference was based on the p value from the overall F test.

Inference: Based on the results from the regression, the p value from the overall F test is 0.0001. Therefore, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment.

h. Provide a table of the fitted values for each dose group from the above models. Comment on the similarities / differences between those fitted values (and the descriptive statistics).

Answer: For the models from parts a, b, c, d, f and g, the fitted values for the mean level of spermidine after 12 months of treatment for each dose group are presented in the following table. To show the similarities and differences more clearly, the fitted values are presented to 4 significant digits.
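As a rough illustration of where such fitted values come from, the sketch below evaluates two of the models at each dose using the rounded coefficients reported in parts a and c. It is a hedged reconstruction from published estimates, not a refit: the part c slope is known only to three significant digits, so its fitted values can differ from a full-precision fit in the last digit.

```python
doses = [0.0, 0.075, 0.2, 0.4]

# Part a: intercept and dummy-variable slopes (reference group: dose 0).
intercept_a = 3.256
slopes_a = {0.075: -0.3363, 0.2: -0.5443, 0.4: -1.306}

# Part c: intercept and slope for dose as a continuous linear predictor.
intercept_c = 3.234
slope_c = -3.13


def fitted_dummy(dose):
    """Saturated model: intercept plus the slope for that dose group (0 for the reference)."""
    return intercept_a + slopes_a.get(dose, 0.0)


def fitted_linear(dose):
    """Linear-dose model: all groups share one straight line."""
    return intercept_c + slope_c * dose


for dose in doses:
    print(f"dose {dose}: model a = {fitted_dummy(dose):.3f}, model c = {fitted_linear(dose):.3f}")
```

The dummy-variable model returns each group's own sample mean, whereas the linear model pools information across all four groups through a single slope.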
           Dose level of DFMO (g/sq m/day)
Method     0         0.075     0.2       0.4
a          3.256     2.920     2.712     1.950
b          3.256     2.920     2.712     1.950
c          3.234     3.000     2.609     1.984
d          3.213     3.011     2.642     1.964
f          3.256     2.976     2.599     1.995
g          3.256     2.920     2.712     1.950

For method e, we can only estimate the mean level of spermidine for the dose 0 group and for the group with dose higher than 0. For the dose 0 group, the fitted value is 3.256; for the group with dose higher than 0, the fitted value is 2.565.

The fitted values from methods a, b and g are the same as the descriptive statistics in problem 1. This is because these models have four parameters for four dose groups, so the models are saturated. For method e, there are two parameters in the model and two groups, so it is a saturated model, too. Therefore, the fitted value for the dose 0 group in method e is the same as the descriptive statistic. If we had provided the mean spermidine level for the group with dose greater than 0 among the descriptive statistics, it would be the same as the fitted value in model e. For method f, since it includes the binary variable indicating whether dose was greater than 0, the fitted value for the dose 0 group is the same as the descriptive statistic. However, the fitted values for the other three groups in method f borrow information across groups. Hence, they differ from the descriptive statistics. Methods c and d are not saturated. Therefore, the fitted values in c and d differ from the descriptive statistics.

3. Repeat the analyses in problem 2 adjusting for the baseline mucosal spermidine levels. (Note that the Stata functions "test" and "testparm" can be used to perform Wald tests of multiple parameters adjusted for other covariates.) You do not need to consider the descriptive statistics or the fitted values for this problem.

For these problems, I just added an adjustment for the baseline mucosal spermidine levels to the previous models in problem 2.
Since the baseline spermidine level is not associated with the predictor of interest, i.e. the dose, I made inference based on the p value from the multiple partial F test.

a. Model dose as dummy variables using the dose 0 group as the reference group.

Answer:

Inference: Based on the p value from the multiple partial F test, which is 0.0002, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment with adjustment for the baseline mucosal spermidine levels. The regression model includes 5 parameters: one intercept, which represents the estimate of mean spermidine level after 12 months of treatment for the reference group (i.e. the dose 0 group) with baseline spermidine level equal to 0; three slopes for the three dose groups (dose 0.075, dose 0.2 and dose 0.4), each of which represents the difference in mean spermidine level after 12 months of treatment between the corresponding dose group and the dose 0 group with similar baseline spermidine levels; and one slope for the baseline spermidine level, which represents the estimated difference in mean spermidine level after 12 months of treatment between two groups differing by 1 micromole/mg protein in baseline spermidine level with the same level of dose. The following table provides the point estimate, a 95% confidence interval and a two-sided p value for each parameter from this model.

                         Point Estimate    95% CI                Two-sided P
Intercept                2.670             (2.036, 3.303)        < 0.001
Slope for dose 0.075     -0.3425           (-0.9325, 0.2476)     0.252
Slope for dose 0.2       -0.5411           (-1.299, 0.2168)      0.160
Slope for dose 0.4       -1.379            (-2.010, -0.7484)     < 0.001
Slope for baseline       0.1779            (0.03185, 0.3239)     0.018

The difference in mean spermidine level after 12 months of treatment between the dose 0.075 group and the dose 0 group is not statistically significant after adjustment for the baseline spermidine level, nor is the difference between the dose 0.2 group and the dose 0 group.
The difference in mean spermidine level after 12 months of treatment between the dose 0.4 group and the dose 0 group with similar baseline spermidine levels is statistically significant (based on the two-sided p < 0.001). Because of the concern about an inflated Type 1 error, it might be inappropriate to give interpretations of the 95% confidence intervals of these parameters.

b. Model dose as dummy variables using the dose 0.075 group as the reference group. You do not have to provide a formal description of the methods or inference for this part. Instead comment on how the regression parameters from this model relate to those obtained in part a. Suppose we were to completely ignore the major multiple comparison issues and to instead trust the individual p values listed in the coefficient table, what conclusions would we reach about differences among the dose groups in part a vs in part b?

Answer: When using the dose 0.075 group as the reference group, the intercept, which is 2.33, is the estimate of mean spermidine level after 12 months for the dose 0.075 group with baseline spermidine level equal to 0; it equals the sum of the intercept and the slope for dose 0.075 in part a. The slope for dose 0, which is 0.342, is the estimated difference between the mean spermidine levels after 12 months of treatment for the dose 0 group and the dose 0.075 group with similar baseline spermidine levels. It has the same absolute value as the slope for dose 0.075 in part a, but with a positive sign. The slope for dose 0.2, which is -0.199, is the estimated difference between the mean spermidine levels after 12 months of treatment for the dose 0.2 group and the dose 0.075 group with similar baseline spermidine levels. It is equal to the slope for dose 0.2 minus the slope for dose 0.075 in part a.
The slope for dose 0.4, which is -1.04, is the estimated difference between the mean spermidine levels after 12 months of treatment for the dose 0.4 group and the dose 0.075 group with similar baseline spermidine levels. It is equal to the slope for dose 0.4 minus the slope for dose 0.075 in part a. The slope for baseline spermidine level, which is 0.178, represents the estimated difference in mean spermidine level after 12 months of treatment between two groups differing by 1 micromole/mg protein in baseline spermidine level with the same level of dose. It is the same as the point estimate of the slope for baseline in part a.

If we ignored the major multiple comparison issues, based on the individual two-sided p values for the parameters from the linear regression using the dose 0.075 group as the reference group, the difference between the mean spermidine levels after 12 months of treatment for the dose 0 group and the dose 0.075 group with similar baseline spermidine levels (two-sided p = 0.252), as well as the difference between the mean levels for the dose 0.2 group and the dose 0.075 group with similar baseline (two-sided p = 0.573), are not statistically significant. The difference between the mean spermidine levels after 12 months of treatment for the dose 0.4 group and the dose 0.075 group with similar baseline is statistically significant (two-sided p < 0.001). In part a, we would conclude that the difference in mean spermidine level after 12 months of treatment between the dose 0.075 group and the dose 0 group is not statistically significant after adjustment for the baseline spermidine level, nor is the difference between the dose 0.2 group and the dose 0 group. The difference in mean spermidine level after 12 months of treatment between the dose 0.4 group and the dose 0 group with similar baseline spermidine levels is statistically significant (based on the two-sided p < 0.001).

c. Model dose continuously as a linear predictor.
Answer:

Inference: The results of the linear regression are presented in the following table:

                         Point Estimate    95% CI                Two-sided P
Intercept                2.664             (2.133, 3.196)        < 0.001
Slope for dose           -3.291            (-4.723, -1.859)      < 0.001
Slope for baseline       0.1754            (0.03274, 0.3181)     0.017

Based on the results from the linear regression, we estimated that the mean level of spermidine after 12 months of treatment is 2.66 (micromole/mg protein) for the dose 0 group with baseline spermidine level equal to 0, with a 95% confidence interval of (2.13, 3.20), and that for every 1 g/sq m/day higher dose level (a difference that is not meaningful in this study), the mean level of spermidine is 3.29 lower for groups with similar baseline spermidine levels. Based on the 95% confidence interval of the slope, our data would not be unusual if the true difference between two groups differing by 1 unit in dose level and with similar baseline spermidine levels is between 4.72 lower and 1.86 lower in the group with the higher dose. (We should notice that the 95% CI might be unreliable.) Based on the two-sided p < 0.001, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment with adjustment for the baseline spermidine level.

d. Model dose as two variables: a continuous linear predictor along with a quadratic term (so an additional predictor equal to the square of dose).

Answer:

Inference: Based on the results from the regression, the p value from the multiple partial F test is less than 0.0001. Therefore, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment with adjustment for the baseline spermidine level.

e. Model dose as a binary variable indicating whether dose was greater than 0.
Answer:

Inference: The results of the linear regression are presented in the following table:

                         Point Estimate    95% CI                Two-sided P
Intercept                2.748             (2.081, 3.416)        < 0.001
Slope for indicator      -0.7110           (-1.254, -0.1679)     0.011
Slope for baseline       0.1539            (-0.05704, 0.3136)    0.059

Based on the results from the linear regression, we estimated that the mean level of spermidine after 12 months of treatment is 2.75 (micromole/mg protein) for the dose 0 group with baseline spermidine level equal to 0, with a 95% confidence interval of (2.08, 3.42), and that the mean level of spermidine in the group with dose higher than 0 is 0.711 micromole/mg protein lower than the mean level in the dose 0 group with similar baseline spermidine levels. Based on the 95% confidence interval of the slope, our data would not be unusual if the true difference between the dose 0 group and the dose greater than 0 group with similar baseline spermidine levels is between 0.168 and 1.25, with the dose 0 group having the higher mean spermidine level after 12 months of treatment. (We should notice that the 95% CI might be unreliable.) Based on the p value from the multiple partial F test, which is 0.0109, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment with adjustment for the baseline spermidine level.

f. Model dose as two variables: a binary variable indicating whether dose was greater than 0 and a continuous linear term.

Answer:

Inference: Based on the results from the linear regression including the two variables, the p value from the multiple partial F test is 0.0001. Therefore, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment with adjustment for the baseline spermidine level.

g. Model dose as three variables: a continuous linear predictor, a quadratic term, and a cubic term (a term equal to dose raised to the third power).
Answer:

Inference: Based on the results from the regression, the p value from the multiple partial F test is 0.0002. Therefore, we can with high confidence reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels after 12 months of treatment with adjustment for the baseline spermidine level.

4. For each of the following models, provide inference (P values, and where appropriate, 95% confidence intervals with scientific interpretation of the parameters) regarding the effect of DFMO on the odds of decreased spermidine levels after 12 months of treatment (i.e., a lower spermidine level at 12 months than at baseline). Note that in part g you are asked to provide a table of predicted values for the odds of decreased spermidine as well as the probability of decreased spermidine for each of these models.

For these problems, I created an indicator variable for a decrease in spermidine level after 12 months of treatment.

a. Model dose as dummy variables.

Answer:

Methods: I used a logistic regression model with robust standard errors to compare the odds of decreased spermidine levels after 12 months of treatment, and modeled the predictor dose as dummy variables using the dose 0 group as the reference group. Based on Wald statistics, the Huber-White sandwich estimator and the approximately normal distribution of the regression parameter estimates, I computed the point estimates of the slope parameters with standard errors, two-sided p values and 95% confidence intervals. Since all modeled covariates are derived from the predictor of interest (POI), we could make inference based on the p value from the overall chi squared test.

Inference: Based on the p value from the overall chi squared test, which is 0.1594, we cannot reject the null hypothesis that there is no effect of DFMO on the odds of a decrease in mucosal spermidine level after 12 months of treatment.
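Because this model has one parameter per dose group, its exponentiated coefficients should reproduce the group odds from the descriptive table in problem 1: the intercept is the odds in the dose 0 group and each slope is the ratio of a group's odds to the dose 0 odds. The following sketch checks that arithmetic from the reported odds. It is an illustration using rounded table values, not output from the fitted model.

```python
# Odds of a decrease in spermidine by dose group, as reported in problem 1.
group_odds = {"0": 0.8667, "0.075": 1.600, "0.2": 1.625, "0.4": 4.000}


def odds_ratio(odds_group, odds_reference):
    """Odds ratio comparing a dose group with the reference group."""
    return odds_group / odds_reference


def probability_from_odds(odds):
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


reference = group_odds["0"]
odds_ratios = {
    dose: odds_ratio(value, reference)
    for dose, value in group_odds.items()
    if dose != "0"
}

for dose, ratio in odds_ratios.items():
    print(f"dose {dose} vs dose 0: odds ratio = {ratio:.3f}")
print(f"probability of decrease, dose 0: {probability_from_odds(reference):.3f}")
```

The same conversion from odds to probability applies to fitted odds from any of the logistic models in this problem.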
The regression model includes 4 parameters: one intercept, which represents the estimated odds of a decrease in spermidine level after 12 months of treatment for the reference group (i.e. the dose 0 group), and three slopes for the three dose groups (dose 0.075, dose 0.2 and dose 0.4), each of which represents the estimated odds ratio for a decrease in spermidine level after 12 months of treatment comparing the corresponding dose group with the dose 0 group. The following table provides the point estimate, a 95% confidence interval and a two-sided p value for each parameter from this model. (I used the logistic command in Stata, so the parameters are already exponentiated.)

                         Point Estimate    95% CI                Two-sided P
Intercept                0.8667            (0.4108, 1.829)       0.707
Slope for dose 0.075     1.846             (0.6206, 5.492)       0.270
Slope for dose 0.2       1.875             (0.5889, 5.970)       0.287
Slope for dose 0.4       4.615             (1.220, 17.46)        0.024

The odds ratio for a decrease in spermidine level after 12 months of treatment comparing the dose 0.075 group with the dose 0 group is not statistically significant, nor is the odds ratio comparing the dose 0.2 group with the dose 0 group. The odds ratio comparing the dose 0.4 group with the dose 0 group is statistically significant (based on the two-sided p = 0.024). Because of the concern about an inflated Type 1 error, it might be inappropriate to give interpretations of the 95% confidence intervals of these parameters.

b. Model dose continuously as a linear predictor.

Answer:

Methods: I used a logistic regression model with robust standard errors and modeled dose continuously as a linear predictor to investigate the association between dose level and a decrease in spermidine level after 12 months of treatment. The Huber-White sandwich estimator was used to compute robust standard errors. Based on the approximately normal distribution of the parameter estimates and Wald statistics, a 95% confidence interval and a two-sided p value were computed.
An inference was made based on these results.

Inference: Based on the results, we estimate that the odds of a decrease in spermidine level after 12 months of treatment are 0.973 in the dose 0 group (95% confidence interval: 0.545, 1.74). For two groups differing by 1 g/sq m/day in dose level, the odds ratio for a decrease in spermidine level after 12 months of treatment is estimated to be 30.9, with the higher-dose group having the larger odds. Based on the 95% confidence interval for the slope, our data would not be unusual if the true odds ratio for two groups differing by 1 unit in dose level were between 1.40 and 682, with the higher-dose group having the larger odds. Based on the two-sided p = 0.030, we can with high confidence reject the null hypothesis that there is no effect of DFMO on whether mucosal spermidine levels decrease after 12 months of treatment.

c. Model dose as two variables: a continuous linear predictor along with a quadratic term (so an additional predictor equal to the square of dose).

Answer: Methods: I used a logistic regression model with robust standard errors to investigate the association between dose level and a decrease in spermidine level after 12 months of treatment. In the regression model, the indicator of a decrease in spermidine level after 12 months of treatment is the response, and the continuous linear term of dose and a quadratic term of dose are the predictors. The Huber-White sandwich estimator was used to compute robust standard errors. Based on the approximately normal distribution of the parameter estimates in logistic regression and Wald statistics, 95% confidence intervals and two-sided p values were computed. Since all modeled covariates are derived from the POI, inference is based on the p value from the overall chi-squared test.

Inference: Based on the results from the regression, the p value from the overall chi-squared test is 0.0931.
Therefore, we cannot reject the null hypothesis that there is no effect of DFMO on whether mucosal spermidine levels decrease after 12 months of treatment.

d. Model dose as a binary variable indicating whether dose was greater than 0.

Answer: Methods: I generated an indicator variable for dose greater than 0 and fit a logistic regression with robust standard errors, using the indicator variable as the predictor and the indicator of a decrease in spermidine level after 12 months as the response, to investigate the association between DFMO and a decrease in spermidine level. The Huber-White sandwich estimator was used to compute robust standard errors. Based on the approximately normal distribution of the parameter estimates in logistic regression and Wald statistics, a 95% confidence interval and a two-sided p value were computed. An inference was made based on these results.

Inference: Based on the results from the regression, we estimate that the odds of a decrease in spermidine level after 12 months in the dose 0 group are 0.867 (95% confidence interval: 0.411, 1.83). Comparing the group with dose greater than 0 with the dose 0 group, the odds ratio for a decrease in spermidine level after 12 months of treatment is estimated to be 2.36, with the group with dose greater than 0 having the larger odds. Based on the 95% confidence interval for the slope, our data would not be unusual if the true odds ratio between the two groups were between 0.954 and 5.84, with the group with dose greater than 0 having the larger odds. Based on the two-sided p = 0.063, we cannot reject the null hypothesis that there is no effect of DFMO on whether mucosal spermidine levels decrease after 12 months of treatment.

e. Model dose as two variables: a binary variable indicating whether dose was greater than 0 and a continuous linear term.

Answer: Methods: I used a logistic regression model with robust standard errors to investigate the association between dose level and a decrease in spermidine level after 12 months of treatment.
In the regression model, the indicator of a decrease in spermidine level is the response, and the continuous linear term of dose and the binary variable indicating whether dose was greater than 0 are the predictors. The Huber-White sandwich estimator was used to compute robust standard errors. Based on the approximately normal distribution of the parameter estimates in logistic regression and Wald statistics, 95% confidence intervals and two-sided p values were computed. Since all modeled covariates are derived from the POI, inference is based on the p value from the overall chi-squared test.

Inference: Based on the results from the logistic regression including the two variables, the p value from the overall chi-squared test is 0.0765. Therefore, we cannot reject the null hypothesis that there is no effect of DFMO on whether mucosal spermidine levels decrease after 12 months of treatment.

f. Model dose as three variables: a continuous linear predictor, a quadratic term, and a cubic term (a term equal to dose raised to the third power).

Answer: Methods: I used a logistic regression model with robust standard errors to investigate the association between dose level and a decrease in spermidine level after 12 months of treatment. In the regression model, the indicator of a decrease in spermidine level is the response, and the continuous linear term of dose, a quadratic term of dose, and a cubic term of dose are the predictors. The Huber-White sandwich estimator was used to compute robust standard errors. Based on the approximately normal distribution of the parameter estimates in logistic regression and Wald statistics, 95% confidence intervals and two-sided p values were computed for the parameters. Since all modeled covariates are derived from the POI, inference is based on the p value from the overall chi-squared test.

Inference: Based on the results from the regression, the p value from the overall chi-squared test is 0.1594.
Therefore, we cannot reject the null hypothesis that there is no effect of DFMO on whether mucosal spermidine levels decrease after 12 months of treatment.

g. Provide a table of the fitted values for each dose group from the above models. Comment on the similarities / differences between those fitted values (and the descriptive statistics).

Answer: For the models in parts a, b, c, e, and f, the fitted values for the odds of a decrease in spermidine level after 12 months of treatment in each dose group are presented in the following table. To show the similarities and differences more clearly, the fitted values are presented to 4 significant digits.

         Dose Level of DFMO (g/sq m/day)
Model    0         0.075     0.2       0.4
a        0.8667    1.600     1.625     4.000
b        0.9731    1.259     1.932     3.836
c        0.9650    1.265     1.958     3.792
e        0.8667    1.439     2.032     3.532
f        0.8667    1.600     1.625     4.000

For model d, we can only estimate the odds of a decrease in spermidine level for the dose 0 group and for the group with dose greater than 0. For the dose 0 group, the fitted value is 0.8667; for the group with dose greater than 0, the fitted value is 2.045.

The fitted values from models a and f are the same as the descriptive statistics in problem 1. This is because these models have four parameters for four dose groups, so they are saturated. Model d has two parameters and distinguishes two groups, so it is also saturated with respect to those two groups. Therefore, the fitted value for the dose 0 group in model d is the same as the descriptive statistic. If we had provided the descriptive odds of a decrease in spermidine level after 12 months of treatment for the combined group with dose greater than 0, it would be the same as the fitted value in model d. Model e includes the binary variable indicating whether dose was greater than 0, so its fitted value for the dose 0 group is the same as the descriptive statistic. However, the fitted values for the other three groups in model e borrow information across groups.
Hence, they are different from the descriptive statistics. Models b and c are not saturated. Therefore, the fitted values in b and c are different from the descriptive statistics.

5. Which of the above analyses would you prefer a priori to test for an effect of DFMO on mucosal levels of polyamines?

Answer: A priori, I would prefer the linear regression model with robust standard errors, using continuous linear dose as the predictor and adjusting for the baseline spermidine level, to test for an effect of DFMO on mucosal levels of spermidine. Since the dose of DFMO is an ordered variable and dummy variables would ignore that ordering, I would not prefer dummy variables. Using the binary indicator variable only gives us information about the dose 0 group versus the group with dose greater than 0, which loses a lot of information. Adding quadratic or cubic terms to the model might fit the data better, but the resulting parameters are hard to interpret.
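
The fitted odds reported in problem 4g can be checked by hand from the parameter estimates in problems 4a and 4b. The following Python sketch is not part of the original analysis (which was run in Stata); it only redoes the arithmetic under the assumption that the rounded estimates quoted above are used as inputs. The function names are hypothetical, and the probabilities are obtained from the odds as odds / (1 + odds).

```python
# Rounded estimates copied from the answers to problems 4a and 4b.
# All names below are illustrative, not from the original analysis.
INTERCEPT_ODDS_A = 0.8667                                 # odds in dose 0 group (model a)
ODDS_RATIOS_A = {0.075: 1.846, 0.2: 1.875, 0.4: 4.615}    # vs. dose 0 (model a)

INTERCEPT_ODDS_B = 0.9731                                 # odds at dose 0 (model b)
ODDS_RATIO_PER_UNIT_B = 30.9                              # per 1 g/sq m/day (model b)


def fitted_odds_dummy(dose):
    """Model a: reference-group odds times the odds ratio for that dose group."""
    if dose == 0:
        return INTERCEPT_ODDS_A
    return INTERCEPT_ODDS_A * ODDS_RATIOS_A[dose]


def fitted_odds_linear(dose):
    """Model b: odds at dose 0 times the per-unit odds ratio raised to the dose."""
    return INTERCEPT_ODDS_B * ODDS_RATIO_PER_UNIT_B ** dose


def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for dose in (0, 0.075, 0.2, 0.4):
        odds_a = fitted_odds_dummy(dose)
        odds_b = fitted_odds_linear(dose)
        print(
            f"dose {dose}: model a odds {odds_a:.3f} "
            f"(prob {odds_to_probability(odds_a):.3f}), "
            f"model b odds {odds_b:.3f} "
            f"(prob {odds_to_probability(odds_b):.3f})"
        )
```

Because the inputs are rounded, the results agree with the table in 4g only to about three significant digits, most visibly for model b at the highest dose.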