Biost 518: Applied Biostatistics II
Biost 515: Biostatistics II
Emerson, Winter 2015
Homework #5
February 20, 2015
On this (as on all homeworks), Stata / R code and unedited Stata / R output are TOTALLY unacceptable.
Instead, prepare a table of statistics gleaned from the Stata output. The table should be appropriate
for inclusion in a scientific report, with all statistics rounded to a reasonable number of significant
digits. (I am interested in how statistics are used to answer the scientific question.)
Unless explicitly told otherwise in the statement of the problem, in all problems requesting
“statistical analyses” (either descriptive or inferential), you should present both
1. Methods: A brief sentence or paragraph describing the statistical methods you used. This
should use wording suitable for a scientific journal, though it might be a little more
detailed. A reader should be able to reproduce your analysis. DO NOT PROVIDE Stata
OR R CODE.
2. Inference: A paragraph providing full statistical inference in answer to the question.
Please see the supplementary document relating to “Reporting Associations” for details.
All problems of the homework relate to the clinical trial of DFMO and suppression of polyamines. In this
homework I ask you to sometimes use dummy variables to analyze the data. There are two approaches to
performing this analysis:
1. Provide suitable descriptive statistics.
Answer:
Methods: In this study, we are mainly interested in the effect of DFMO on the mucosal
polyamines. Specifically, in this homework we are interested in the effect of DFMO on
mucosal spermidine levels after 12 months of treatment, relative to baseline.
Therefore, descriptive statistics for baseline spermidine level, spermidine level after
12 months of treatment, age and sex are presented within groups defined by the dose of
DFMO (0, 0.075, 0.2 or 0.4 g/sq m/day), as well as for the entire sample. For the
continuous variables (age, and spermidine level at baseline and after 12 months of
treatment), we present the mean, standard deviation, minimum and maximum. For the
binary variable sex, we present the percentage of subjects who are female. Because we
are also interested in whether DFMO is associated with a decrease in spermidine level
over 12 months of treatment, the table also presents the probability and odds of a
decrease (a lower spermidine level at 12 months than at baseline) in each group.
Inference: There are 114 subjects in the dataset. However, 1 of those has a missing
age, and only 95 of the 114 subjects have a spermidine level measured after 12 months
of treatment. We omitted these missing values from the corresponding summaries, but it
should be remembered that we cannot assess the influence that such omissions might
have on our results.
Of the 114 subjects, 32 received a dose of 0 g/sq m/day, 29 received 0.075 g/sq m/day,
25 received 0.2 g/sq m/day and 28 received 0.4 g/sq m/day. After 12 months of
treatment, spermidine measurements are available for 28 subjects in the dose 0 group,
26 in the dose 0.075 group, 21 in the dose 0.2 group and 20 in the dose 0.4 group. The
following table presents descriptive statistics within these groups and for the
entire sample.
For every dose group except the placebo group (i.e. the dose 0 group), the mean
spermidine level after 12 months of treatment is lower than the mean spermidine level
at baseline; in the placebo group, the two means are the same (3.26). In addition,
both the probability and the odds of a decrease in spermidine level over 12 months
tend to increase with dose (probability 0.464, 0.615, 0.619 and 0.800 for doses 0,
0.075, 0.2 and 0.4 g/sq m/day, respectively). These descriptive results suggest that
higher doses of DFMO may be associated with lower spermidine levels after 12 months of
treatment and with a higher probability of a decrease from baseline. There is no
obvious trend in sex or age across the groups defined by dose of DFMO.
                          Dose level of DFMO (g/sq m/day)
                          0                  0.075              0.2                0.4                Any level
                          (n=32)             (n=29)             (n=25)             (n=28)             (n=114)
Female (%)                18.8               17.2               0                  21.4               14.9
Age (yrs)                 65.9               61.3               62.8               63.9               63.6
                          (8.51; 45.5-77.2)  (7.69; 47.8-76.9)  (8.28; 45.4-77.6)  (7.81; 48.5-81.0)  (8.16; 45.4-81.0)
                                             [n=28]                                                   [n=113]
Spermidine at baseline    3.26               3.47               3.35               3.56               3.41
                          (1.45; 1.40-7.05)  (1.55; 1.51-7.02)  (1.33; 1.70-6.22)  (1.88; 0.66-7.6)   (1.55; 0.66-7.6)
Spermidine at 12 months   3.26               2.92               2.71               1.95               2.77
                          (1.31; 1.01-5.91)  (0.994; 1.35-4.92) (1.40; 0.293-6.45) (0.799; 0-3.42)    (1.23; 0-6.45)
                          [n=28]             [n=26]             [n=21]             [n=20]             [n=95]
Probability of decrease   0.464              0.615              0.619              0.800              0.611
Odds of decrease          0.8667             1.600              1.625              4.000              1.568
Note: For the level of the polyamines in the table, the units are micromole/mg protein.
Note: For continuous variables, descriptive statistics are presented in the form:
mean (sd; min-max).
Note: The probability and odds of a decrease in spermidine after 12 months are computed
among subjects with a 12-month measurement (n = 28, 26, 21, 20 and 95, respectively).
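The odds row follows from the probability row as odds = p / (1 - p). As a rough arithmetic check, the Python sketch below back-calculates the implied number of subjects with a decrease in each group as round(p × n). These counts are an assumption (they are not reported above), but they reproduce the tabulated odds and add up to the overall count; the helper name `implied_odds` is hypothetical.

```python
# Sketch: reproduce the "odds of decrease" row from the reported probabilities
# and 12-month group sizes. Counts are back-calculated as round(p * n); they are
# an assumption, not values reported in the text.
groups = {  # dose (g/sq m/day) -> (reported probability of decrease, n at 12 months)
    "0": (0.464, 28),
    "0.075": (0.615, 26),
    "0.2": (0.619, 21),
    "0.4": (0.800, 20),
    "Any": (0.611, 95),
}

def implied_odds(p, n):
    """Return (count, odds): implied number with a decrease, and decreases / non-decreases."""
    k = round(p * n)          # implied number of subjects whose spermidine decreased
    return k, k / (n - k)     # odds = k / (n - k), equivalent to p / (1 - p)

for dose, (p, n) in groups.items():
    k, odds = implied_odds(p, n)
    print(f"dose {dose}: {k}/{n} decreased, odds = {odds:.4g}")
```

Working from counts rather than from the rounded probabilities avoids rounding error: 0.464 / 0.536 gives 0.8657, whereas 13/15 gives the tabulated 0.8667.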
2. For each of the following models, provide inference (P values, and where appropriate, 95%
confidence intervals with scientific interpretation of the parameters) regarding the effect of
DFMO on the mucosal spermidine levels after 12 months of treatment. (Recall that when
multiple modeled covariates are derived from the same scientific factor, you need to test
all those covariates simultaneously. When no other covariates are in the model, the
“overall F” or “overall chi squared” test can do this for us.) Note that part h asks you to
provide a table of predicted values for each of these models.
a. Model dose as dummy variables using the dose 0 group as the reference group.
Answer:
Methods: To investigate the association between DFMO and mucosal spermidine
levels after 12 months of treatment, we used linear regression with robust
standard errors, with the untransformed spermidine level after 12 months as the
response and dose modeled as dummy variables using the dose 0 group as the
reference group. Standard errors were computed using the Huber-White sandwich
estimator. 95% confidence intervals and two-sided p values were computed from
Wald statistics, based on the approximate normal distribution of the regression
parameter estimates. Inference about the effect of DFMO was based on the overall
F test, which tests the three dose dummy variables simultaneously.
Inference:
Based on the p value from the overall F test, which is 0.0001, we can with high
confidence reject the null hypothesis that there is no effect of DFMO on the
mucosal spermidine levels after 12 months of treatment.
The regression model includes 4 parameters: an intercept, which is the estimated
mean spermidine level after 12 months of treatment in the reference group (the
dose 0 group), and three slopes for the dose 0.075, 0.2 and 0.4 groups, each of
which is the estimated difference in mean spermidine level between the
corresponding dose group and the dose 0 group. The following table provides the
point estimate, a 95% confidence interval and a two-sided p value for each
parameter from this model.
                          Point Estimate   95% CI               Two-sided P
Intercept                 3.256            (2.761, 3.751)       < 0.001
Slope for dose 0.075      -0.3363          (-0.9649, 0.2924)    0.291
Slope for dose 0.2        -0.5443          (-1.324, 0.2358)     0.169
Slope for dose 0.4        -1.306           (-1.914, -0.6983)    < 0.001
The difference in mean spermidine level after 12 months of treatment between the
dose 0.075 group and the dose 0 group is not statistically significant (p = 0.291),
nor is the difference between the dose 0.2 group and the dose 0 group (p = 0.169).
The difference between the dose 0.4 group and the dose 0 group is statistically
significant (two-sided p < 0.001).
Because these individual comparisons are multiple tests of the same scientific
factor, and thus carry an inflated type I error rate, it might be inappropriate to
interpret the 95% confidence intervals for the individual parameters.
b. Model dose as dummy variables using the dose 0.075 group as the reference group.
You do not have to provide a formal description of the methods or inference for this
part. Instead comment on how the regression parameters from this model relate to
those obtained in part a. Suppose we were to completely ignore the major multiple
comparison issues and to instead trust the individual p values listed in the coefficient
table, what conclusions would we reach about differences among the dose groups in
part a vs in part b?
Answer:
When the dose 0.075 group is used as the reference group, the intercept (2.92) is
the estimated mean spermidine level after 12 months in the dose 0.075 group; it
equals the sum of the intercept and the slope for dose 0.075 in part a.
The slope for dose 0 (0.336) is the estimated difference in mean level between the
dose 0 group and the dose 0.075 group. It has the same absolute value as the slope
for dose 0.075 in part a, but the opposite (positive) sign.
The slope for dose 0.2 (-0.208) is the estimated difference in mean level between
the dose 0.2 group and the dose 0.075 group. It equals the slope for dose 0.2 minus
the slope for dose 0.075 in part a.
The slope for dose 0.4 (-0.970) is the estimated difference in mean level between
the dose 0.4 group and the dose 0.075 group. It equals the slope for dose 0.4 minus
the slope for dose 0.075 in part a.
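Because the dummy-variable model is saturated, changing the reference group only re-expresses the same four group means. A minimal Python sketch of that arithmetic, using the rounded part a estimates (the function name `rereference` is hypothetical; small discrepancies reflect rounding):

```python
# Sketch: re-express the part (a) coefficients (reference = dose 0) with
# dose 0.075 as the reference group. Inputs are the rounded part (a) estimates.
part_a = {"intercept": 3.256, "0.075": -0.3363, "0.2": -0.5443, "0.4": -1.306}

def rereference(coefs, new_ref):
    """Return the same saturated dummy model's coefficients with new_ref as reference."""
    shift = coefs[new_ref]                           # mean(new_ref) - mean(dose 0)
    out = {"intercept": coefs["intercept"] + shift,  # mean of the new reference group
           "0": -shift}                              # mean(dose 0) - mean(new_ref)
    for dose, b in coefs.items():
        if dose not in ("intercept", new_ref):
            out[dose] = b - shift                    # mean(dose) - mean(new_ref)
    return out

part_b = rereference(part_a, "0.075")
for name, value in part_b.items():
    print(f"{name}: {value:.3f}")
```

The printed values (2.920, 0.336, -0.208, -0.970) match the coefficients described above.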
If we ignored the major multiple comparison issues and trusted the individual
two-sided p values from the model using the dose 0.075 group as the reference, we
would conclude that the differences in mean levels between the dose 0 group and the
dose 0.075 group (p = 0.291) and between the dose 0.2 group and the dose 0.075 group
(p = 0.566) are not statistically significant, whereas the difference between the
dose 0.4 group and the dose 0.075 group (p < 0.001) is statistically significant.
In part a, we concluded that the differences in mean spermidine level after 12
months of treatment between the dose 0 group and either the dose 0.075 group or the
dose 0.2 group are not statistically significant, whereas the difference between the
dose 0 group and the dose 0.4 group is statistically significant. Taken together,
the naive conclusion would be that the dose 0.4 group differs from both the dose 0
and dose 0.075 groups, while none of the other differences examined is significant.
c. Model dose continuously as a linear predictor.
Answer:
Methods: I used linear regression with robust standard errors, modeling dose
continuously as a linear predictor, to investigate the association between dose
level and spermidine levels after 12 months of treatment. Robust standard errors
were computed using the Huber-White sandwich estimator. A 95% confidence interval
and a two-sided p value were computed from Wald statistics, based on the
approximate normal distribution of the regression parameter estimates. Inference
was based on these results.
Inference: We estimated that the mean level of spermidine after 12 months of
treatment is 3.23 micromole/mg protein in the dose 0 group (95% CI: 2.89, 3.58), and
that the mean level of spermidine is 3.13 lower for each 1 g/sq m/day higher dose
(a contrast larger than the 0 to 0.4 g/sq m/day range of doses studied, so it is not
directly meaningful in this study). Based on the 95% confidence interval for the
slope, our data would not be unusual if the true difference between two groups
differing by 1 g/sq m/day in dose were between 1.75 and 4.50 lower in the group with
the higher dose. Based on the two-sided p < 0.001, we can with high confidence reject
the null hypothesis that there is no effect of DFMO on the mucosal spermidine levels
after 12 months of treatment.
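Because the 1 g/sq m/day contrast lies outside the range of doses studied, it may help to rescale the slope to a smaller contrast. In a linear model, the estimate and both confidence limits scale linearly with the contrast. The Python sketch below assumes a 0.1 g/sq m/day contrast (an illustrative choice, not one used in the analysis above) and uses the rounded estimates reported above.

```python
# Sketch: rescale the linear-dose slope from part (c) to a 0.1 g/sq m/day contrast.
# Inputs are the rounded estimates reported above (per 1 g/sq m/day).
slope, ci_low, ci_high = -3.13, -4.50, -1.75
delta = 0.1   # contrast of interest in g/sq m/day (assumed, for illustration)

# For a linear predictor, the estimate and its CI limits are multiplied by delta.
est, lo, hi = (delta * x for x in (slope, ci_low, ci_high))
print(f"per {delta} g/sq m/day: {est:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

With these inputs, the estimated difference is about 0.31 micromole/mg protein lower per 0.1 g/sq m/day higher dose (95% CI roughly 0.18 to 0.45 lower).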
d. Model dose as two variables: a continuous linear predictor along with a quadratic
term (so an additional predictor equal to the square of dose).
Answer:
Methods: I used linear regression with robust standard errors to investigate the
association between dose level and spermidine levels after 12 months of treatment.
In the regression model, the spermidine level is the response, and a continuous
linear term for dose and a quadratic term for dose are the predictors. Robust
standard errors were computed using the Huber-White sandwich estimator. 95%
confidence intervals and two-sided p values were computed from Wald statistics,
based on the approximate normal distribution of the regression parameter estimates.
Inference was based on the p value from the overall F test, which tests the linear
and quadratic terms simultaneously.
Inference: Based on the results from the regression, the p value from overall F
test is less than 0.0001. Therefore, we can with high confidence reject the null
hypothesis that there is no effect of DFMO on the mucosal spermidine levels
after 12 months of treatment.
e. Model dose as a binary variable indicating whether dose was greater than 0.
Answer:
Methods: I generated an indicator variable for dose greater than 0 and fitted a
linear regression with robust standard errors, using the indicator variable as the
predictor and the spermidine level after 12 months as the response, to investigate
the association between dose and spermidine levels. Robust standard errors were
computed using the Huber-White sandwich estimator. A 95% confidence interval and a
two-sided p value were computed from Wald statistics, based on the approximate
normal distribution of the regression parameter estimates. Inference was based on
these results.
Inference: We estimated that the mean level of spermidine after 12 months in the
dose 0 group is 3.26 micromole/mg protein (95% CI: 2.77, 3.75), and that the mean
level in the group with dose greater than 0 is 0.691 micromole/mg protein lower than
in the dose 0 group. Our data would not be unusual if the true mean spermidine level
in the dose 0 group were between 0.128 and 1.25 micromole/mg protein higher than in
the group with dose greater than 0. Based on the two-sided p = 0.017, we reject the
null hypothesis that there is no effect of DFMO on the mucosal spermidine levels
after 12 months of treatment at the 0.05 level of significance.
f. Model dose as two variables: a binary variable indicating whether dose was greater
than 0 and a continuous linear term.
Answer:
Methods: I used linear regression with robust standard errors to investigate the
association between dose level and spermidine levels after 12 months of treatment.
In the regression model, the spermidine level is the response, and a continuous
linear term for dose and a binary variable indicating whether dose was greater than
0 are the predictors. Robust standard errors were computed using the Huber-White
sandwich estimator. 95% confidence intervals and two-sided p values were computed
from Wald statistics, based on the approximate normal distribution of the regression
parameter estimates. Inference was based on the p value from the overall F test.
Inference: Based on the results from the linear regression including the two
variables, the p value from the overall F test is 0.0001. Therefore, we can with
high confidence reject the null hypothesis that there is no effect of DFMO on the
mucosal spermidine levels after 12 months of treatment.
g. Model dose as three variables: a continuous linear predictor, a quadratic term, and a
cubic term (a term equal to dose raised to the third power).
Answer:
Methods: I used linear regression with robust standard errors to investigate the
association between dose level and spermidine levels after 12 months of treatment.
In the regression model, the spermidine level is the response, and a continuous
linear term, a quadratic term and a cubic term for dose are the predictors. Robust
standard errors were computed using the Huber-White sandwich estimator. 95%
confidence intervals and two-sided p values were computed for the parameters from
Wald statistics, based on the approximate normal distribution of the regression
parameter estimates. Inference was based on the p value from the overall F test.
Inference: Based on the results from the regression, the p value from the overall
F test is 0.0001. Therefore, we can with high confidence reject the null
hypothesis that there is no effect of DFMO on the mucosal spermidine levels
after 12 months of treatment.
h. Provide a table of the fitted values for each dose group from the above models.
Comment on the similarities / differences between those fitted values (and the
descriptive statistics).
Answer:
For the models in parts a, b, c, d, f and g, the fitted mean levels of spermidine
after 12 months of treatment for each dose group are presented in the following
table. To make the similarities and differences easier to see, the fitted values are
presented to 4 significant digits.
           Dose level of DFMO (g/sq m/day)
Model      0         0.075     0.2       0.4
a          3.256     2.920     2.712     1.950
b          3.256     2.920     2.712     1.950
c          3.234     3.000     2.609     1.984
d          3.213     3.011     2.642     1.964
f          3.256     2.976     2.599     1.995
g          3.256     2.920     2.712     1.950
For model e, we can only estimate the mean level of spermidine for the dose 0 group
and for the group with dose greater than 0. The fitted value is 3.256 for the dose 0
group and 2.565 for the group with dose greater than 0.
The fitted values from models a, b and g are the same as the descriptive statistics
(the sample means) in problem 1. This is because each of these models has four
parameters for four dose groups, so the models are saturated. Model e has two
parameters for two groups, so it is also saturated: its fitted value for the dose 0
group equals the sample mean, and if we computed the sample mean of the combined group
with dose greater than 0, it would equal the fitted value from model e. In model f,
the binary indicator for dose greater than 0 gives the dose 0 group its own mean, so
the fitted value for that group also equals the sample mean. However, the fitted
values for the other three groups in model f are constrained to lie on a line in
dose, so they borrow information across groups and differ from the sample means.
Models c and d are not saturated (two and three parameters for four groups), so their
fitted values also differ from the sample means.
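The claim about model e can be checked from the tables: the sample mean of the combined dose > 0 group is the sample-size-weighted average of the three treated group means. A Python sketch using the model a fitted values (the group sample means) and the 12-month sample sizes from problem 1; the helper name `pooled_mean` is hypothetical.

```python
# Sketch: check that the saturated models reproduce sample means.
# Model (a) fitted values (= group sample means) and 12-month sample sizes
# are taken from the tables above.
means = {0.0: 3.256, 0.075: 2.920, 0.2: 2.712, 0.4: 1.950}
n12 = {0.0: 28, 0.075: 26, 0.2: 21, 0.4: 20}

def pooled_mean(doses):
    """Sample-size-weighted mean of the group means for the given doses."""
    total = sum(n12[d] for d in doses)
    return sum(means[d] * n12[d] for d in doses) / total

treated = [d for d in means if d > 0]
print(f"dose > 0 group mean: {pooled_mean(treated):.3f}")       # model (e) fitted value
print(f"overall mean:        {pooled_mean(list(means)):.3f}")   # descriptive table: 2.77
```

The first line reproduces the model e fitted value of 2.565, and the second agrees with the overall 12-month mean of 2.77 in the descriptive table.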
3. Repeat the analyses in problem 2 adjusting for the baseline mucosal spermidine levels.
(Note that the Stata functions "test" and "testparm" can be used to perform Wald tests of
multiple parameters adjusted for other covariates.) You do not need to consider the
descriptive statistics or the fitted values for this problem.
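For these problems, I added the baseline mucosal spermidine level as a covariate to
each of the models in problem 2; the methods are otherwise the same as in problem 2.
Because the overall F test would also include the baseline term, inference about
DFMO was based on the multiple partial F test of all the dose-derived covariates
simultaneously. Since the baseline spermidine level is not associated with the
predictor of interest (dose), adjusting for it serves mainly to improve precision.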
a. Model dose as dummy variables using the dose 0 group as the reference group.
Answer:
Inference:
Based on the p value from the multiple partial F test, which is 0.0002, we can
with high confidence reject the null hypothesis that there is no effect of DFMO
on the mucosal spermidine levels after 12 months of treatment with adjustment
for the baseline mucosal spermidine levels.
The regression model includes 5 parameters: an intercept, which is the estimated
mean spermidine level after 12 months of treatment in the reference group (the dose
0 group) for subjects with a baseline spermidine level of 0; three slopes for the
dose 0.075, 0.2 and 0.4 groups, each of which is the estimated difference in mean
spermidine level after 12 months of treatment between the corresponding dose group
and the dose 0 group among subjects with the same baseline spermidine level; and one
slope for baseline spermidine level, which is the estimated difference in mean
spermidine level after 12 months of treatment between two groups that differ by 1
micromole/mg protein in baseline spermidine level and receive the same dose. The
following table provides the point estimate, a 95% confidence interval and a
two-sided p value for each parameter from this model.
                          Point Estimate   95% CI               Two-sided P
Intercept                 2.670            (2.036, 3.303)       < 0.001
Slope for dose 0.075      -0.3425          (-0.9325, 0.2476)    0.252
Slope for dose 0.2        -0.5411          (-1.299, 0.2168)     0.160
Slope for dose 0.4        -1.379           (-2.010, -0.7484)    < 0.001
Slope for baseline        0.1779           (0.03185, 0.3239)    0.018
After adjustment for baseline spermidine level, the difference in mean spermidine
level after 12 months of treatment between the dose 0.075 group and the dose 0 group
is not statistically significant (p = 0.252), nor is the difference between the dose
0.2 group and the dose 0 group (p = 0.160). The difference between the dose 0.4 group
and the dose 0 group among subjects with similar baseline spermidine levels is
statistically significant (two-sided p < 0.001).
Because these individual comparisons are multiple tests of the same scientific
factor, and thus carry an inflated type I error rate, it might be inappropriate to
interpret the 95% confidence intervals for the individual parameters.
b. Model dose as dummy variables using the dose 0.075 group as the reference group.
You do not have to provide a formal description of the methods or inference for this
part. Instead comment on how the regression parameters from this model relate to
those obtained in part a. Suppose we were to completely ignore the major multiple
comparison issues and to instead trust the individual p values listed in the coefficient
table, what conclusions would we reach about differences among the dose groups in
part a vs in part b?
Answer:
When the dose 0.075 group is used as the reference group, the intercept (2.33) is
the estimated mean spermidine level after 12 months in the dose 0.075 group for
subjects with a baseline spermidine level of 0; it equals the sum of the intercept
and the slope for dose 0.075 in part a.
The slope for dose 0 (0.342) is the estimated difference in mean spermidine level
after 12 months of treatment between the dose 0 group and the dose 0.075 group among
subjects with similar baseline spermidine levels. It has the same absolute value as
the slope for dose 0.075 in part a, but the opposite (positive) sign.
The slope for dose 0.2 (-0.199) is the estimated difference in mean spermidine level
after 12 months of treatment between the dose 0.2 group and the dose 0.075 group
among subjects with similar baseline spermidine levels. It equals the slope for dose
0.2 minus the slope for dose 0.075 in part a.
The slope for dose 0.4 (-1.04) is the estimated difference in mean spermidine level
after 12 months of treatment between the dose 0.4 group and the dose 0.075 group
among subjects with similar baseline spermidine levels. It equals the slope for dose
0.4 minus the slope for dose 0.075 in part a.
The slope for baseline spermidine level (0.178) is the estimated difference in mean
spermidine level after 12 months of treatment between two groups that differ by 1
micromole/mg protein in baseline spermidine level and receive the same dose. It is
the same as the point estimate of the slope for baseline in part a.
If we ignored the major multiple comparison issues and trusted the individual
two-sided p values from the model using the dose 0.075 group as the reference, we
would conclude that, among subjects with similar baseline spermidine levels, the
differences in mean spermidine level after 12 months of treatment between the dose 0
group and the dose 0.075 group (p = 0.252) and between the dose 0.2 group and the
dose 0.075 group (p = 0.573) are not statistically significant, whereas the
difference between the dose 0.4 group and the dose 0.075 group is statistically
significant (p < 0.001).
In part a, we concluded that, after adjustment for baseline spermidine level, the
differences in mean spermidine level after 12 months of treatment between the dose
0.075 group and the dose 0 group and between the dose 0.2 group and the dose 0 group
are not statistically significant, whereas the difference between the dose 0.4 group
and the dose 0 group is statistically significant (p < 0.001).
c. Model dose continuously as a linear predictor.
Answer:
Inference: The results of the linear regression are presented in the following
table:
                          Point Estimate   95% CI               Two-sided P
Intercept                 2.664            (2.133, 3.196)       < 0.001
Slope for dose            -3.291           (-4.723, -1.859)     < 0.001
Slope for baseline        0.1754           (0.03274, 0.3181)    0.017
We estimated that the mean level of spermidine after 12 months of treatment is 2.66
micromole/mg protein in the dose 0 group for subjects with a baseline spermidine
level of 0 (95% CI: 2.13, 3.20), and that, among subjects with similar baseline
spermidine levels, the mean level of spermidine is 3.29 lower for each 1 g/sq m/day
higher dose (a contrast larger than the range of doses studied, so it is not directly
meaningful in this study). Based on the 95% confidence interval for the slope, our
data would not be unusual if the true difference between two groups differing by 1
g/sq m/day in dose and having similar baseline spermidine levels were between 1.86
and 4.72 lower in the group with the higher dose. (We should note that this 95% CI
might be unreliable.) Based on the two-sided p < 0.001, we can with high confidence
reject the null hypothesis that there is no effect of DFMO on the mucosal spermidine
levels after 12 months of treatment with adjustment for baseline spermidine level.
d. Model dose as two variables: a continuous linear predictor along with a quadratic
term (so an additional predictor equal to the square of dose).
Answer:
Inference: Based on the results from the regression, the p value from multiple
partial F test is less than 0.0001. Therefore, we can with high confidence reject
the null hypothesis that there is no effect of DFMO on the mucosal spermidine
levels after 12 months of treatment with adjustment for the baseline of
spermidine level.
e. Model dose as a binary variable indicating whether dose was greater than 0.
Answer:
Inference: The results of the linear regression are presented in the following
table:
                          Point Estimate   95% CI               Two-sided P
Intercept                 2.748            (2.081, 3.416)       < 0.001
Slope for indicator       -0.7110          (-1.254, -0.1679)    0.011
Slope for baseline        0.1539           (-0.005704, 0.3136)  0.059
We estimated that the mean level of spermidine after 12 months of treatment is 2.75
micromole/mg protein in the dose 0 group for subjects with a baseline spermidine
level of 0 (95% CI: 2.08, 3.42), and that, among subjects with similar baseline
spermidine levels, the mean level in the group with dose greater than 0 is 0.711
micromole/mg protein lower than in the dose 0 group. Based on the 95% confidence
interval for the slope, our data would not be unusual if the true mean spermidine
level after 12 months of treatment in the dose 0 group were between 0.168 and 1.25
micromole/mg protein higher than in the group with dose greater than 0, among
subjects with similar baseline spermidine levels. (We should note that this 95% CI
might be unreliable.) Based on the p value from the multiple partial F test
(p = 0.0109), we reject the null hypothesis that there is no effect of DFMO on the
mucosal spermidine levels after 12 months of treatment with adjustment for baseline
spermidine level, at the 0.05 level of significance.
f. Model dose as two variables: a binary variable indicating whether dose was greater
than 0 and a continuous linear term.
Answer:
Inference: Based on the results from the linear regression including the two
variables, the p value from the multiple partial F test is 0.0001. Therefore, we
can with high confidence reject the null hypothesis that there is no effect of
DFMO on the mucosal spermidine levels after 12 months of treatment with
adjustment for the baseline of spermidine level.
g. Model dose as three variables: a continuous linear predictor, a quadratic term, and a
cubic term (a term equal to dose raised to the third power).
Answer:
Inference: Based on the results from the regression, the p value from the
multiple partial F test is 0.0002. Therefore, we can with high confidence reject
the null hypothesis that there is no effect of DFMO on the mucosal spermidine
levels after 12 months of treatment with adjustment for the baseline of
spermidine level.
4. For each of the following models, provide inference (P values, and where appropriate, 95%
confidence intervals with scientific interpretation of the parameters) regarding the effect of
DFMO on the odds of decreased spermidine levels after 12 months of treatment (i.e., a
lower spermidine level at 12 months than at baseline). Note that in part g you are asked to
provide a table of predicted values for the odds of decreased spermidine as well as the
probability of decreased spermidine for each of these models.
For these problems, I created an indicator variable for a decrease in spermidine
level after 12 months of treatment (a lower level at 12 months than at baseline).
a. Model dose as dummy variables.
Answer:
Methods: I used logistic regression with robust standard errors to compare the odds
of a decrease in spermidine level after 12 months of treatment across dose groups,
modeling dose as dummy variables with the dose 0 group as the reference group. Using
the Huber-White sandwich estimator, Wald statistics and the approximate normal
distribution of the regression parameter estimates, I computed point estimates,
standard errors, two-sided p values and 95% confidence intervals for the
parameters. Because all modeled covariates are derived from the predictor of
interest (POI), dose, inference was based on the p value from the overall
chi-squared test.
Inference:
Based on the p value from the overall chi-squared test (p = 0.1594), we cannot
reject the null hypothesis that there is no effect of DFMO on the odds of a decrease
in mucosal spermidine levels after 12 months of treatment.
The regression model includes 4 parameters. The exponentiated intercept is the
estimated odds of a decrease in spermidine level after 12 months of treatment in the
reference group (the dose 0 group), and the exponentiated slopes for the dose 0.075,
0.2 and 0.4 groups are the estimated odds ratios of a decrease comparing the
corresponding dose group with the dose 0 group. The following table provides the
point estimate, a 95% confidence interval and a two-sided p value for each parameter
from this model, reported on the odds and odds ratio scale (i.e., already
exponentiated).
                          Point Estimate   95% CI               Two-sided P
Odds, dose 0 (intercept)  0.8667           (0.4108, 1.829)      0.707
OR, dose 0.075 vs 0       1.846            (0.6206, 5.492)      0.270
OR, dose 0.2 vs 0         1.875            (0.5889, 5.970)      0.287
OR, dose 0.4 vs 0         4.615            (1.220, 17.46)       0.024
The odds ratio of a decrease in spermidine level after 12 months of treatment
comparing the dose 0.075 group with the dose 0 group is not significantly different
from 1 (p = 0.270), nor is the odds ratio comparing the dose 0.2 group with the dose
0 group (p = 0.287). The odds ratio comparing the dose 0.4 group with the dose 0
group is statistically significant (two-sided p = 0.024).
Because these individual comparisons are multiple tests of the same scientific
factor, and thus carry an inflated type I error rate, it might be inappropriate to
interpret the 95% confidence intervals for the individual parameters.
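Because the dummy-variable logistic model is saturated, its estimates should be functions of the sample odds in problem 1: the exponentiated intercept is the dose 0 odds, and each odds ratio is a group's odds divided by the dose 0 odds. A Python sketch of that check, using the odds from the descriptive table (small differences reflect rounding):

```python
# Sketch: in the saturated logistic model, each exponentiated slope should equal
# a ratio of sample odds. Odds are taken from the descriptive table in problem 1.
odds = {"0": 0.8667, "0.075": 1.600, "0.2": 1.625, "0.4": 4.000}

# Odds ratio for each dose group relative to the dose 0 (reference) group.
odds_ratios = {d: odds[d] / odds["0"] for d in odds if d != "0"}
for d, ratio in odds_ratios.items():
    print(f"OR, dose {d} vs 0: {ratio:.3f}")
```

The printed ratios (1.846, 1.875, 4.615) match the odds ratios in the table above, and the exponentiated intercept (0.8667) is the dose 0 odds itself.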
b. Model dose continuously as a linear predictor.
Answer:
Methods: I used logistic regression with robust standard errors, modeling dose continuously as a linear predictor, to investigate the association between dose level and a decrease in spermidine level after 12 months of treatment. Robust standard errors were computed with the Huber-White sandwich estimator. Assuming approximate normality of the estimated regression parameters, a Wald-based 95% confidence interval and two-sided p value were computed, and inference was based on these results.
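The Wald intervals and p values are computed on the log-odds scale as estimate ± 1.96 × SE and then exponentiated, so a reported odds ratio, its 95% CI and its p value should be mutually consistent. The sketch below, a minimal illustration assuming a normal-theory Wald interval, back-calculates the log-scale (robust) standard error from the part a interval for dose 0.4 vs. dose 0 and recomputes the two-sided p value; the helper names are my own.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_from_ci(estimate, lower, upper):
    """Back-calculate the log-scale SE of a ratio from its 95% Wald CI.

    Returns (se, z, two_sided_p) for H0: ratio = 1.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p


def wald_ci(estimate, se):
    """95% CI for a ratio: symmetric on the log scale, then exponentiated."""
    log_est = math.log(estimate)
    return math.exp(log_est - Z_975 * se), math.exp(log_est + Z_975 * se)


# Part a: odds ratio for dose 0.4 vs. dose 0 was 4.615, 95% CI (1.220, 17.46), p = 0.024
se, z, p = wald_from_ci(4.615, 1.220, 17.46)
lo, hi = wald_ci(4.615, se)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided p = {p:.3f}")
print(f"reconstructed 95% CI: ({lo:.3f}, {hi:.2f})")
```

The recomputed p value and interval agree with the table to the reported precision, which is a quick way to catch transcription errors when moving numbers from software output into a report table.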
Inference: We estimate that the odds of a decrease in spermidine level after 12 months of treatment in the dose 0 group are 0.973 (95% CI: 0.545 to 1.74). For two groups differing by 1 g/sq m/day in dose, the odds ratio of a decrease in spermidine level after 12 months of treatment is estimated to be 30.9, with the higher-dose group having the larger odds. Based on the 95% confidence interval for this odds ratio, our data would not be unusual if the true odds ratio for a 1 g/sq m/day difference in dose were between 1.40 and 682, with the higher-dose group having the larger odds. Because the two-sided p = 0.030, we reject, at the 0.05 significance level, the null hypothesis that DFMO has no effect on decreases in mucosal spermidine levels after 12 months of treatment.
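Under the linear-dose model the log odds are linear in dose, so the fitted odds in a dose group are the dose 0 odds multiplied by the per-unit odds ratio raised to the power of the dose. This sketch uses the rounded estimates reported above (0.9731 and 30.9), so it reproduces the model b row of the fitted-values table in part g only to about the third decimal. Because a 1 g/sq m/day difference exceeds the range of doses studied (0 to 0.4), the sketch also rescales the odds ratio to a 0.1 g/sq m/day difference, which is just 30.9 raised to the power 0.1.

```python
# Rounded estimates reported for the linear-dose logistic model (part b)
odds_at_dose0 = 0.9731  # exp(intercept): odds of a decrease at dose 0
or_per_unit = 30.9      # exp(slope): odds ratio per 1 g/sq m/day difference in dose


def fitted_odds_linear(dose):
    """Odds of a decrease implied by logit = b0 + b1 * dose."""
    return odds_at_dose0 * or_per_unit ** dose


for dose in (0, 0.075, 0.2, 0.4):
    print(f"dose {dose}: fitted odds = {fitted_odds_linear(dose):.3f}")

# Odds ratio for a 0.1 g/sq m/day difference, a contrast inside the studied dose range
or_per_tenth = or_per_unit ** 0.1
print(f"odds ratio per 0.1 g/sq m/day = {or_per_tenth:.3f}")
```

The confidence limits rescale the same way: (1.40^0.1, 682^0.1) is roughly (1.03, 1.92) per 0.1 g/sq m/day, a range that may be easier for readers to interpret than 1.40 to 682.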
c. Model dose as two variables: a continuous linear predictor along with a quadratic
term (so an additional predictor equal to the square of dose).
Answer:
Methods: I used logistic regression with robust standard errors to investigate the association between dose level and a decrease in spermidine level after 12 months of treatment. The response was an indicator of a decrease in spermidine level after 12 months of treatment, and the predictors were dose (a continuous linear term) and the square of dose (a quadratic term). Robust standard errors were computed with the Huber-White sandwich estimator. Assuming approximate normality of the estimated logistic regression parameters, Wald-based 95% confidence intervals and two-sided p values were computed. Because both modeled covariates are functions of dose, the predictor of interest (POI), inference about the dose effect was based on the overall chi-squared test of both dose terms.
Inference: The overall chi-squared test from this regression gives p = 0.0931. Therefore, we cannot reject the null hypothesis that DFMO has no effect on decreases in mucosal spermidine levels after 12 months of treatment.
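The overall chi-squared test jointly tests all dose coefficients. Assuming it is a Wald test based on the robust covariance matrix (the usual choice when robust standard errors are requested), for the two dose terms in part c the statistic is W = b' V^-1 b, where b holds the two log-odds-scale coefficients and V their estimated covariance matrix, compared with a chi-squared distribution on 2 degrees of freedom. The coefficient estimates and covariance matrix are not reported in this write-up, so the function below is only a sketch and the numbers in its usage line are hypothetical.

```python
import math


def wald_chi2_2df(beta, cov):
    """Joint Wald test of H0: beta1 = beta2 = 0 for two coefficients.

    beta: (b1, b2) estimates on the log-odds scale.
    cov:  2x2 (robust) covariance matrix of (b1, b2).
    Returns (W, p) with W = beta' cov^-1 beta, chi-squared with 2 df under H0.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    inv = ((d / det, -b / det), (-c / det, a / det))
    b1, b2 = beta
    w = (b1 * (inv[0][0] * b1 + inv[0][1] * b2)
         + b2 * (inv[1][0] * b1 + inv[1][1] * b2))
    p = math.exp(-w / 2)  # the chi-squared(2) survival function has this closed form
    return w, p


# Hypothetical inputs, for illustration only (not the DFMO estimates)
w, p = wald_chi2_2df((0.8, -0.3), ((0.25, -0.05), (-0.05, 0.10)))
print(f"W = {w:.2f}, p = {p:.4f}")
```

Because the 2-df chi-squared survival function is exactly exp(-W/2), the reported p = 0.0931 for part c corresponds to W of about 4.75. Models with more dose terms (parts e and f) use the same quadratic form with a larger covariance matrix and more degrees of freedom.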
d. Model dose as a binary variable indicating whether dose was greater than 0.
Answer:
Methods: I created an indicator variable for dose greater than 0 and fitted a logistic regression with robust standard errors, using this indicator as the predictor and a decrease in spermidine level after 12 months as the response, to investigate the association between DFMO and decreases in spermidine levels. Robust standard errors were computed with the Huber-White sandwich estimator. Assuming approximate normality of the estimated logistic regression parameters, a Wald-based 95% confidence interval and two-sided p value were computed, and inference was based on these results.
Inference: We estimate that the odds of a decrease in spermidine level after 12 months in the dose 0 group are 0.867 (95% CI: 0.411 to 1.83). The odds ratio of a decrease in spermidine level after 12 months of treatment comparing the group with dose greater than 0 with the dose 0 group is estimated to be 2.36, with the dosed group having the larger odds. Based on the 95% confidence interval for this odds ratio, our data would not be unusual if the true odds ratio were between 0.954 and 5.84. Because the two-sided p = 0.063, we cannot reject the null hypothesis that DFMO has no effect on decreases in mucosal spermidine levels after 12 months of treatment.
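The part d estimates can also be expressed as fitted odds for the group with dose greater than 0 (the 2.045 reported in part g) and, where probabilities are easier to communicate, as estimated proportions with a decrease via p = odds / (1 + odds). This small sketch uses the rounded estimates 0.8667 and 2.36, so its output is accurate only to rounding.

```python
# Rounded estimates from the binary-indicator model (part d)
odds_dose0 = 0.8667  # exp(intercept): odds of a decrease with dose 0
or_any_dose = 2.36   # exp(slope): odds ratio, dose > 0 vs. dose 0


def odds_to_prob(odds):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1 + odds)


odds_dosed = odds_dose0 * or_any_dose
print(f"fitted odds, dose > 0:  {odds_dosed:.3f}")
print(f"P(decrease), dose 0:    {odds_to_prob(odds_dose0):.3f}")
print(f"P(decrease), dose > 0:  {odds_to_prob(odds_dosed):.3f}")
```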
e. Model dose as two variables: a binary variable indicating whether dose was greater
than 0 and a continuous linear term.
Answer:
Methods: I used logistic regression with robust standard errors to investigate the association between dose level and a decrease in spermidine level after 12 months of treatment. The response was an indicator of a decrease in spermidine level, and the predictors were dose (a continuous linear term) and a binary indicator of whether dose was greater than 0. Robust standard errors were computed with the Huber-White sandwich estimator. Assuming approximate normality of the estimated logistic regression parameters, Wald-based 95% confidence intervals and two-sided p values were computed. Because both modeled covariates are functions of dose, the predictor of interest (POI), inference about the dose effect was based on the overall chi-squared test.
Inference: For the logistic regression including these two variables, the overall chi-squared test gives p = 0.0765. Therefore, we cannot reject the null hypothesis that DFMO has no effect on decreases in mucosal spermidine levels after 12 months of treatment.
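Writing model e (the symbols are my own) as logit = b0 + b1 × I(dose > 0) + b2 × dose, the log odds among the three positive-dose groups are (b0 + b1) + b2 × dose, a straight line with common slope b2. The sketch below checks this using the model e fitted odds from the fitted-values table in part g: the slope of log odds against dose should be the same for each pair of adjacent positive doses, up to rounding of the tabulated values.

```python
import math

# Fitted odds for model e, taken from the fitted-values table in part g
fitted_e = {0.075: 1.439, 0.2: 2.032, 0.4: 3.532}

# Among positive doses the model is linear in log odds, so every pair of
# positive-dose groups should imply the same slope b2 (per g/sq m/day).
doses = sorted(fitted_e)
slopes = []
for lo, hi in zip(doses, doses[1:]):
    slope = (math.log(fitted_e[hi]) - math.log(fitted_e[lo])) / (hi - lo)
    slopes.append(slope)
    print(f"doses {lo} to {hi}: implied log odds ratio per g/sq m/day = {slope:.2f}")
```

Both pairs give essentially the same slope, which is the constraint that makes the model e fitted values for the positive doses differ from the observed group odds.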
f. Model dose as three variables: a continuous linear predictor, a quadratic term, and a
cubic term (a term equal to dose raised to the third power).
Answer:
Methods: I used logistic regression with robust standard errors to investigate the association between dose level and a decrease in spermidine level after 12 months of treatment. The response was an indicator of a decrease in spermidine level, and the predictors were dose (a continuous linear term), the square of dose (a quadratic term) and the cube of dose (a cubic term). Robust standard errors were computed with the Huber-White sandwich estimator. Assuming approximate normality of the estimated logistic regression parameters, Wald-based 95% confidence intervals and two-sided p values were computed for the parameters. Because all three modeled covariates are functions of dose, the predictor of interest (POI), inference about the dose effect was based on the overall chi-squared test.
Inference: The overall chi-squared test from this regression gives p = 0.1594. Therefore, we cannot reject the null hypothesis that DFMO has no effect on decreases in mucosal spermidine levels after 12 months of treatment.
g. Provide a table of the fitted values for each dose group from the above models.
Comment on the similarities / differences between those fitted values (and the
descriptive statistics).
Answer:
For the models in parts a, b, c, e and f, the fitted odds of a decrease in spermidine level after 12 months of treatment in each dose group are given in the following table. To make the similarities and differences easier to see, the fitted values are shown to 4 significant digits.
Fitted odds of a decrease, by dose level of DFMO (g/sq m/day)
Model    0        0.075    0.2      0.4
a        0.8667   1.600    1.625    4.000
b        0.9731   1.259    1.932    3.836
c        0.9650   1.265    1.958    3.792
e        0.8667   1.439    2.032    3.532
f        0.8667   1.600    1.625    4.000
Model d yields only two fitted odds of a decrease in spermidine level: one for the dose 0 group and one for the group with dose greater than 0. The fitted value is 0.8667 for the dose 0 group and 2.045 for the group with dose greater than 0.
The fitted values from models a and f are the same as the descriptive statistics in problem 1. This is because each of these models has four parameters for four dose groups, so both models are saturated. Model d has two parameters for its two groups, so it is also saturated. Therefore, the fitted value for the dose 0 group in model d equals the descriptive statistic, and if we computed the descriptive odds of a decrease in spermidine level after 12 months of treatment for the group with dose greater than 0, it would equal the fitted value from model d. Model e includes the binary indicator of dose greater than 0, so its fitted value for the dose 0 group also equals the descriptive statistic. However, the fitted values for the other three groups in model e are constrained to be linear in dose on the log-odds scale, so they borrow information across groups and differ from the descriptive statistics. Models b and c are not saturated, so their fitted values also differ from the descriptive statistics.
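A cubic in dose has four coefficients and the study has four distinct doses, so the design matrix is a nonsingular (Vandermonde) matrix and the log odds of the four groups can be matched exactly; that is what makes models a and f saturated. The sketch below solves for the cubic coefficients that reproduce the observed group odds (taken from the model a row of the fitted-values table) and confirms that the fitted cubic returns those odds. The small Gaussian-elimination helper is written out only to keep the example dependency-free.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular design: the model cannot be saturated")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back-substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


doses = [0, 0.075, 0.2, 0.4]
group_odds = [0.8667, 1.600, 1.625, 4.000]  # observed odds per dose group (model a row)

# Cubic model: log odds = c0 + c1*d + c2*d^2 + c3*d^3, i.e. 4 unknowns for 4 doses
design = [[d ** k for k in range(4)] for d in doses]
coefs = solve(design, [math.log(o) for o in group_odds])

for d, o in zip(doses, group_odds):
    fitted = math.exp(sum(c * d ** k for k, c in enumerate(coefs)))
    print(f"dose {d}: observed odds {o:.4f}, cubic fit {fitted:.4f}")
```

Dropping the cubic term leaves three coefficients for four groups, so the quadratic model in part c generally cannot match all four group odds, consistent with its row in the table.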
5. Which of the above analyses would you prefer a priori to test for an effect of DFMO on
mucosal levels of polyamines?
Answer:
A priori, to test for an effect of DFMO on mucosal levels of spermidine, I would prefer a linear regression model with robust standard errors that uses dose as a continuous linear predictor and adjusts for baseline spermidine level. Because DFMO dose is ordered, dummy variables would ignore that ordering, so I would not prefer them. A binary indicator of dose greater than 0 only compares the dose 0 group with all dosed groups combined, which discards a lot of information. Adding quadratic or cubic terms might fit the data better, but the resulting coefficients are hard to interpret. For these reasons, the adjusted linear-dose model is my a priori choice.
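The preferred analysis uses linear regression with Huber-White robust standard errors. In matrix form the robust covariance is the "sandwich" (X'X)^-1 [sum of e_i^2 x_i x_i'] (X'X)^-1, where e_i are the ordinary least squares residuals. The sketch below implements this basic HC0 form in plain Python; statistical packages commonly multiply it by a small-sample factor such as n/(n - k) (the HC1 variant), so their standard errors would be slightly larger. The data in the usage lines are made up and only show the call with an intercept, a dose column and a baseline column; they are not the DFMO data.

```python
def mat_inv(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for collinear predictors")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def ols_hc0(X, y):
    """OLS estimates with Huber-White (HC0) sandwich standard errors.

    X is a list of rows whose first entry is 1.0 (intercept); y is the response.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    bread = mat_inv(xtx)
    beta = [sum(bread[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    # "Meat": sum over subjects of e_i^2 * x_i x_i'
    meat = [[sum(e * e * row[i] * row[j] for row, e in zip(X, resid)) for j in range(p)]
            for i in range(p)]
    cov = matmul(matmul(bread, meat), bread)  # bread * meat * bread
    return beta, [cov[i][i] ** 0.5 for i in range(p)]


# Toy usage with made-up numbers: columns are intercept, dose, baseline level
X = [[1.0, 0.0, 5.1], [1.0, 0.0, 4.0], [1.0, 0.2, 4.8],
     [1.0, 0.2, 5.5], [1.0, 0.4, 4.2], [1.0, 0.4, 5.0]]
y = [5.0, 4.2, 4.1, 5.0, 3.1, 3.8]
beta, se = ols_hc0(X, y)
print("estimates:", [round(b, 3) for b in beta])
print("robust SEs:", [round(s, 3) for s in se])
```

Unlike the model-based standard error, the sandwich form does not assume equal residual variance across dose groups, which is the reason for preferring robust standard errors here.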