Stat 565 Final Exam Solutions Fall 2005


1. (a) [4pts] The Kaplan-Meier estimator of a survival curve uses the censored observations. At the j-th failure time, the estimated probability of surviving beyond the previous failure time is multiplied by

$$\frac{n_j - d_j}{n_j} = \frac{\text{number who survive beyond the } j\text{-th failure time}}{\text{number at risk just prior to the } j\text{-th failure time}}.$$

Both the numerator and denominator of this ratio include individuals who are censored after the j-th failure time.
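The product described in part (a) can be sketched in a few lines of Python. This is a minimal illustration on made-up data: the function name `kaplan_meier` and the sample times are hypothetical and are not part of the exam. Censored individuals stay in the risk set (and so in both numerator and denominator) until their censoring time.

```python
def kaplan_meier(times, events):
    """Return [(failure_time, survival_estimate)] for right-censored data.

    times  : observed follow-up times
    events : 1 if the time is a failure, 0 if it is a censoring time
    """
    data = list(zip(times, events))
    surv = 1.0
    curve = []
    for t in sorted({u for u, e in data if e == 1}):
        n_j = sum(1 for u, _ in data if u >= t)             # at risk just prior to t
        d_j = sum(1 for u, e in data if u == t and e == 1)  # failures at t
        surv *= (n_j - d_j) / n_j                           # multiply by (n_j - d_j) / n_j
        curve.append((t, surv))
    return curve

# Hypothetical data: failures at 2, 5 and 8; censored at 3 and 9.
curve = kaplan_meier([2, 3, 5, 8, 9], [1, 0, 1, 1, 0])
```

In this toy example the individual censored at time 3 counts in the risk set at time 2 but not at time 5, which is how the censored observations enter the estimate.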

(b) [4pts] Several estimators for the baseline survivor function have been proposed. The most commonly used estimator is based on the Nelson-Aalen estimator of the baseline cumulative hazard function

$$\hat{S}_0(t) = \exp\left(-\hat{H}_0(t)\right) = \exp\left(\sum_{j:\, t_{(j)} \le t} \frac{-d_j}{\sum_{i \in R(t_{(j)})} \exp(x_i^T \hat{\beta})}\right)$$

where $d_j$ is the number of individuals who fail at the j-th ordered failure time $t_{(j)}$, $R(t_{(j)})$ is the set of individuals at risk just prior to $t_{(j)}$, and $\hat{\beta}$ is the estimator of the regression coefficients obtained by maximizing the Cox partial likelihood.

(c) [4pts] The hazard function, h(t), is not the conditional probability of dying by time t. It is a rate computed as the conditional probability of failing in the time interval $(t, t + \Delta]$, given survival up to time t, divided by $\Delta$, and taking the limit as $\Delta \to 0$.
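The limiting definition in part (c) can be checked numerically. The sketch below assumes an exponential survivor function with a hypothetical rate of 2 (chosen only for illustration); the helper name `hazard_approx` is not from the exam.

```python
import math

def hazard_approx(survivor, t, delta):
    """Conditional probability of failing in (t, t + delta], divided by delta."""
    cond_prob = (survivor(t) - survivor(t + delta)) / survivor(t)
    return cond_prob / delta

rate = 2.0  # hypothetical exponential failure rate


def S(t):
    """Survivor function of an exponential distribution with the rate above."""
    return math.exp(-rate * t)


# Shrinking the interval width moves the ratio toward the hazard.
approximations = [hazard_approx(S, 1.0, d) for d in (0.1, 0.01, 0.001)]
```

As the interval shrinks, the ratio approaches the constant hazard of 2. A value above 1 is possible because the hazard is a rate, not a probability.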

2. (a) [4pts] Based on the estimated survivor curves for those treated with the drug (group 1),

(i) the median survival time for group 1 is approximately 150 days;

(ii) the probability that an infected patient treated with the drug lives at least 100 days is about 0.67.

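(b) [4pts] The null and alternative hypotheses for the log-rank test are

$$H_0: S_1(t) = S_2(t) \text{ for all } t \quad \text{versus} \quad H_a: S_1(t) \ne S_2(t) \text{ for some } t,$$

where $S_1(t)$ and $S_2(t)$ denote the survivor functions for patients treated with the drug and with the placebo, respectively.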

(c) [4pts] A Cox proportional hazards model for these data is

$$h(t) = h_0(t)\, e^{\beta x},$$

where $h_0(t)$ is the baseline hazard function and

$$x = \begin{cases} 1 & \text{if the patient is treated with the drug} \\ 0 & \text{if the patient is treated with the placebo.} \end{cases}$$

For any time t, $\beta$ is the natural logarithm of the ratio of hazards for patients treated with the drug versus treatment with the placebo. Consequently, $\beta$ is the natural logarithm of the relative risk for dying shortly after time t for patients treated with the drug versus treatment with the placebo, given survival up to time t.

(d) [4pts] The proportional hazards assumption implies that $S(t) = [S_0(t)]^{e^{\beta x}}$. Since the estimated survivor curves do not cross, this could be satisfied. Closer inspection of the estimated curves, however, reveals that the estimated survivor curve for the placebo group is zero after about 160 days, while the estimated survivor curve for the patients treated with the drug is above 0.20 after 400 days. This suggests that the ratio of hazards may not be constant across all time points.
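The relation between the two survivor functions follows from the cumulative hazard. A short derivation, writing $H_0(t) = \int_0^t h_0(u)\,du$ for the baseline cumulative hazard (notation introduced here only for illustration):

$$S(t) = \exp\left(-\int_0^t h_0(u)\, e^{\beta x}\, du\right) = \exp\left(-e^{\beta x} H_0(t)\right) = \left[\exp\left(-H_0(t)\right)\right]^{e^{\beta x}} = [S_0(t)]^{e^{\beta x}}.$$

One consequence: with the placebo group as baseline (x = 0), $S_0(t) = 0$ forces $[S_0(t)]^{e^{\beta}} = 0$ for any finite $\beta$. Under exactly proportional hazards, the survivor function for the drug group could not stay above 0.20 once the placebo survivor function has reached zero.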

3. (a) [4pts] In this model $\beta_1$ represents the monthly change in mean total cholesterol level for patients given the standard treatment.

(b) [4pts] The proposed model appears to be consistent with the information provided by the plots. The lower left plot indicates nearly straight line decreases in the sample means for total cholesterol for both treatments. So the systematic part of the model appears to be a good approximation. The first two plots are more difficult to read, but they indicate variability about the treatment lines that could be largely explained by variation in lines for individual patients about the treatment lines.


(c) [4pts] This model could be written in matrix form as $Y = X\beta + Zb + e$. Suppose that patient 1 was assigned to the standard treatment and patient 2 was assigned to the combined treatment and each patient was measured at all six time points. Fill in the rows of X, Z, and b for those two patients.

$$
\begin{bmatrix} Y_{11}\\ Y_{12}\\ Y_{13}\\ Y_{14}\\ Y_{15}\\ Y_{16}\\ Y_{21}\\ Y_{22}\\ Y_{23}\\ Y_{24}\\ Y_{25}\\ Y_{26} \end{bmatrix}
=
\begin{bmatrix} 1&0&0\\ 1&0.5&0\\ 1&1&0\\ 1&3&0\\ 1&6&0\\ 1&9&0\\ 1&0&0\\ 1&0.5&0.5\\ 1&1&1\\ 1&3&3\\ 1&6&6\\ 1&9&9 \end{bmatrix}
\begin{bmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{bmatrix}
+
\begin{bmatrix} 1&0&0&0\\ 1&0.5&0&0\\ 1&1&0&0\\ 1&3&0&0\\ 1&6&0&0\\ 1&9&0&0\\ 0&0&1&0\\ 0&0&1&0.5\\ 0&0&1&1\\ 0&0&1&3\\ 0&0&1&6\\ 0&0&1&9 \end{bmatrix}
\begin{bmatrix} b_{01}\\ b_{11}\\ b_{02}\\ b_{12} \end{bmatrix}
+
\begin{bmatrix} e_{11}\\ e_{12}\\ e_{13}\\ e_{14}\\ e_{15}\\ e_{16}\\ e_{21}\\ e_{22}\\ e_{23}\\ e_{24}\\ e_{25}\\ e_{26} \end{bmatrix}
$$
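The rows of X and Z in part (c) follow a simple pattern that can be generated programmatically. The sketch below is only an illustration: the helper names `x_rows` and `z_rows` are hypothetical, and the time points (0, 0.5, 1, 3, 6, 9 months) are read off the matrices in the solution.

```python
times = [0, 0.5, 1, 3, 6, 9]  # measurement times in months

def x_rows(combined):
    """Fixed-effects rows (1, t, delta * t); delta = 1 for the combined treatment."""
    delta = 1 if combined else 0
    return [[1, t, delta * t] for t in times]

def z_rows(patient_index, n_patients):
    """Random-effects rows: (1, t) in the patient's own two columns, zeros elsewhere."""
    rows = []
    for t in times:
        row = [0] * (2 * n_patients)
        row[2 * patient_index] = 1        # random intercept column
        row[2 * patient_index + 1] = t    # random slope column
        rows.append(row)
    return rows

# Patient 1: standard treatment; patient 2: combined treatment.
X = x_rows(combined=False) + x_rows(combined=True)
Z = z_rows(0, 2) + z_rows(1, 2)
```

Each patient contributes a block of six rows to X and a block of (1, t) columns to Z, which is why Z is block diagonal.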

(d) [8pts] Relative to the proposed model, the null hypothesis that the trends in mean total cholesterol levels across time are the same for the standard and combination treatments is $H_0: \beta_2 = 0$. Test against the two-sided alternative $H_a: \beta_2 \ne 0$. You could either use a t-test or an F-test. The value of the t-test is

$$t = \frac{-3.32 - 0}{0.715} = -4.64$$

with 349 degrees of freedom. The value for the F-test is 21.54, the square of the t-test, with (1,349) degrees of freedom. Since the p-value is less than .0001, the null hypothesis is rejected.

Mean total cholesterol tends to decrease more rapidly under the combination treatment than with the standard treatment. The estimated rate of decrease during the first nine months of the standard treatment is 4.54 mg/dL per month. The estimated rate of decrease during the first nine months of the combination treatment is 4.54 + 3.32 = 7.86 mg/dL per month.
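The arithmetic in part (d) can be reproduced from the rounded values quoted in the solution. This is a sketch only: the variable names are hypothetical, and the standard-treatment slope of -4.54 is taken from the t statistic in part (e). Because the inputs are rounded, the squared t statistic comes out near 21.56 rather than the reported 21.54.

```python
beta2_hat = -3.32        # estimated difference in slopes (combination minus standard)
se_beta2 = 0.715         # standard error of that difference
t_stat = (beta2_hat - 0) / se_beta2
f_stat = t_stat ** 2     # the F statistic with (1, 349) df is the square of t

slope_standard = -4.54   # estimated monthly change, standard treatment
slope_combination = slope_standard + beta2_hat  # monthly change, combination treatment
```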


(e) [8pts] Relative to the proposed model, the null hypothesis is $H_0: \beta_1 + \beta_2 = 0$. Test against the two-sided alternative $H_a: \beta_1 + \beta_2 \ne 0$. You could either use a t-test or an F-test. The value of the t-test is

$$t = \frac{(-4.54 - 3.32) - 0}{S_{\hat{\beta}_1 + \hat{\beta}_2}} = -15.71.$$

The Kenward-Roger approximation to degrees of freedom is not provided by the output, but the degrees of freedom would be at least 120-3=117. The estimated slope is more than 15 standard errors away from zero, the p-value is less than .0001, and the null hypothesis is rejected. Mean total cholesterol tends to decrease under the combination treatment with an estimated rate of decrease during the first nine months of 7.86 mg/dL per month.

(f) [8pts] The REML log-likelihood values and corresponding AIC and BIC values shown on page 9 can be used to compare random parts of models with the same model for changes in mean responses. Consequently, model A can be compared with model B because the systematic part of each model is

$$E(Y_{ij}) = \beta_0 + \beta_1 t_{ij} + \beta_2 \delta_i t_{ij},$$

and model C can be compared with model D because the systematic part of each model is

$$E(Y_{ij}) = \beta_0 + \beta_1 t_{ij} + \beta_2 \delta_i t_{ij} + \beta_3 t_{ij}^2 + \beta_4 \delta_i t_{ij}^2.$$

Model A cannot be compared with either model C or D because the REML likelihoods correspond to different sets of residuals. Similarly, model B cannot be compared to either model C or D. Since the AIC and BIC values are lower for model A than model B, including random intercepts and slopes for individual patients appears to provide a better model for patient-to-patient variability of repeated measures than a random intercepts model. The difference in the REML log-likelihoods for those two models is 5334.5-5283.9=50.6 with (4-2)=2 degrees of freedom. This result is significant at the .0001 level and also supports the conclusion that the model with random slopes and intercepts provides a significant improvement over the model with random intercepts.

A similar conclusion can be reached in the comparison of models C and D. The AIC and BIC values are lower for model C than model D. The difference in the REML log-likelihoods for those two models is 5334.5-5283.9=50.6 with (4-2)=2 degrees of freedom. This result is significant at the .0001 level and also supports the conclusion that including random intercepts and slopes for individual patients provides a significantly better model for patient-to-patient variability of repeated measures than a random intercepts model.
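The significance claim in part (f) can be checked by hand. The sketch below follows the solution in treating the difference of 50.6 as a chi-square statistic with 2 degrees of freedom; that reading (the reported values being on a scale where their difference is the likelihood ratio statistic) is an assumption, and the helper name `chi2_sf_2df` is hypothetical. With 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2).

```python
import math

def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)

statistic = 5334.5 - 5283.9   # difference quoted in the solution
p_value = chi2_sf_2df(statistic)
```

The resulting p-value is far below .0001, in line with the conclusion above.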


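4. (a) [4pts] The formula for the GEE estimating equations is

$$0 = \sum_{i=1}^{100} D_i^T V_i^{-1} (Y_i - \pi_i)$$

where for the i-th child

$$Y_i = \begin{bmatrix} Y_{i1} \\ Y_{i2} \\ Y_{i3} \end{bmatrix}$$

are the binary results (1 for safe and 0 for elevated) at 1, 4 and 6 weeks,

$$\pi_i = \begin{bmatrix} \pi_{i1} \\ \pi_{i2} \\ \pi_{i3} \end{bmatrix} = \begin{bmatrix} \dfrac{e^{\beta_0 + \beta_2 \text{trt}_i}}{1 + e^{\beta_0 + \beta_2 \text{trt}_i}} \\[2ex] \dfrac{e^{\beta_0 + 3\beta_1 + (\beta_2 + 3\beta_3)\text{trt}_i}}{1 + e^{\beta_0 + 3\beta_1 + (\beta_2 + 3\beta_3)\text{trt}_i}} \\[2ex] \dfrac{e^{\beta_0 + 5\beta_1 + (\beta_2 + 5\beta_3)\text{trt}_i}}{1 + e^{\beta_0 + 5\beta_1 + (\beta_2 + 5\beta_3)\text{trt}_i}} \end{bmatrix}$$

are the probabilities of "safe" blood lead levels at 1, 4 and 6 weeks,

$$V_i = \mathrm{Var}(Y_i) = A_i^{1/2} \begin{bmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{bmatrix} A_i^{1/2}, \qquad A_i = \begin{bmatrix} \pi_{i1}(1-\pi_{i1}) & 0 & 0 \\ 0 & \pi_{i2}(1-\pi_{i2}) & 0 \\ 0 & 0 & \pi_{i3}(1-\pi_{i3}) \end{bmatrix},$$

and

$$D_i = \begin{bmatrix} \dfrac{\partial \pi_{i1}}{\partial \beta_0} & \dfrac{\partial \pi_{i1}}{\partial \beta_1} & \dfrac{\partial \pi_{i1}}{\partial \beta_2} & \dfrac{\partial \pi_{i1}}{\partial \beta_3} \\[2ex] \dfrac{\partial \pi_{i2}}{\partial \beta_0} & \dfrac{\partial \pi_{i2}}{\partial \beta_1} & \dfrac{\partial \pi_{i2}}{\partial \beta_2} & \dfrac{\partial \pi_{i2}}{\partial \beta_3} \\[2ex] \dfrac{\partial \pi_{i3}}{\partial \beta_0} & \dfrac{\partial \pi_{i3}}{\partial \beta_1} & \dfrac{\partial \pi_{i3}}{\partial \beta_2} & \dfrac{\partial \pi_{i3}}{\partial \beta_3} \end{bmatrix} = A_i \begin{bmatrix} 1 & 0 & \text{trt}_i & 0 \\ 1 & 3 & \text{trt}_i & 3\,\text{trt}_i \\ 1 & 5 & \text{trt}_i & 5\,\text{trt}_i \end{bmatrix}.$$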
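To make the pieces of the estimating equations in part (a) concrete, the sketch below evaluates the probabilities, the working covariance matrix and the derivative matrix for one child. It assumes the usual GEE form in which the exchangeable correlation matrix is pre- and post-multiplied by the square roots of the binomial variances. The coefficient values, the correlation of 0.3 and all function names are hypothetical and are not the exam estimates.

```python
import math

TIMES = [0, 3, 5]  # time codes for weeks 1, 4 and 6 in the linear predictors

def probabilities(beta, trt):
    """Logistic probabilities of a "safe" level at the three time points."""
    b0, b1, b2, b3 = beta
    etas = [b0 + b1 * t + (b2 + b3 * t) * trt for t in TIMES]
    return [math.exp(e) / (1 + math.exp(e)) for e in etas]

def working_covariance(pi, rho):
    """A^{1/2} R A^{1/2} with exchangeable R (1 on the diagonal, rho elsewhere)."""
    sd = [math.sqrt(p * (1 - p)) for p in pi]
    return [[sd[j] * sd[k] * (1.0 if j == k else rho) for k in range(3)]
            for j in range(3)]

def derivative_matrix(pi, trt):
    """Row j holds d pi_ij / d beta = pi_ij (1 - pi_ij) * (1, time, trt, trt * time)."""
    return [[p * (1 - p) * x for x in (1, t, trt, trt * t)]
            for p, t in zip(pi, TIMES)]

beta = (0.5, -0.1, -0.8, 0.2)   # hypothetical values, not the exam estimates
pi = probabilities(beta, trt=1)
V = working_covariance(pi, rho=0.3)
D = derivative_matrix(pi, trt=1)
```

Summing the products of the transposed derivative matrix, the inverse working covariance and the residuals over all 100 children, and setting the result to zero, gives the equations that are solved for the regression coefficients.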

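(b) [8pts] The GEE estimate of the proportion of children treated with succimer who have "safe" blood lead levels (<20 μg/dL) after 6 weeks of treatment is

$$\hat{\pi} = \frac{\exp(\hat{\beta}_0 + 5\hat{\beta}_1 + \hat{\beta}_2 + 5\hat{\beta}_3)}{1 + \exp(\hat{\beta}_0 + 5\hat{\beta}_1 + \hat{\beta}_2 + 5\hat{\beta}_3)} = 0.41314.$$

Using the delta method, a large sample standard error for $\hat{\pi}$ is

$$S_{\hat{\pi}} = \hat{\pi}(1 - \hat{\pi}) \left( \begin{bmatrix} 1 & 1 & 5 & 5 \end{bmatrix} \begin{bmatrix} .13037 & -.13037 & -.01689 & .01689 \\ -.13037 & .24977 & .01689 & -.03297 \\ -.01689 & .01689 & .00508 & -.00508 \\ .01689 & -.03297 & -.00508 & .00963 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 5 \\ 5 \end{bmatrix} \right)^{1/2} = 0.0652$$

and an approximate 95% confidence interval is

$$0.413 \pm (1.96)(0.0652) \Rightarrow (0.285, 0.541).$$

The vector (1, 1, 5, 5) reflects the order of the estimates in the covariance matrix, apparently (intercept, trt, time, trt × time).

Alternatively, a 95% confidence interval could be constructed for $\beta_0 + 5\beta_1 + (\beta_2 + 5\beta_3)$ using

$$S_{\hat{\beta}_0 + 5\hat{\beta}_1 + (\hat{\beta}_2 + 5\hat{\beta}_3)} = \left( \begin{bmatrix} 1 & 1 & 5 & 5 \end{bmatrix} \begin{bmatrix} .13037 & -.13037 & -.01689 & .01689 \\ -.13037 & .24977 & .01689 & -.03297 \\ -.01689 & .01689 & .00508 & -.00508 \\ .01689 & -.03297 & -.00508 & .00963 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 5 \\ 5 \end{bmatrix} \right)^{1/2} = 0.2689.$$

Then

$$\hat{\beta}_0 + 5\hat{\beta}_1 + (\hat{\beta}_2 + 5\hat{\beta}_3) \pm (1.96)\, S_{\hat{\beta}_0 + 5\hat{\beta}_1 + (\hat{\beta}_2 + 5\hat{\beta}_3)} \Rightarrow (-0.8787, 0.1757).$$

Transform the endpoints of the interval to the probability scale:

$$\left( \frac{e^{-0.8787}}{1 + e^{-0.8787}},\ \frac{e^{0.1757}}{1 + e^{0.1757}} \right) \Rightarrow (0.293, 0.544).$$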
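Both intervals in part (b) can be recomputed from the covariance matrix quoted in the solution. The sketch below uses only the reported numbers; the variable names are hypothetical, and the logit `eta_hat` is recovered from the reported probability 0.41314 because the value of the linear predictor is not shown, so the last digits may differ slightly from the solution.

```python
import math

# Covariance matrix of the GEE estimates, as quoted in the solution.
cov = [
    [ 0.13037, -0.13037, -0.01689,  0.01689],
    [-0.13037,  0.24977,  0.01689, -0.03297],
    [-0.01689,  0.01689,  0.00508, -0.00508],
    [ 0.01689, -0.03297, -0.00508,  0.00963],
]
c = [1, 1, 5, 5]      # coefficients of the linear combination
pi_hat = 0.41314      # estimated probability from the solution

var_eta = sum(c[j] * cov[j][k] * c[k] for j in range(4) for k in range(4))
se_eta = math.sqrt(var_eta)               # standard error on the logit scale
se_pi = pi_hat * (1 - pi_hat) * se_eta    # delta method standard error

# Interval built directly on the probability scale.
ci_prob = (pi_hat - 1.96 * se_pi, pi_hat + 1.96 * se_pi)

# Interval built on the logit scale, then transformed back.
eta_hat = math.log(pi_hat / (1 - pi_hat))

def expit(x):
    """Inverse of the logit transformation."""
    return math.exp(x) / (1 + math.exp(x))

ci_logit = (expit(eta_hat - 1.96 * se_eta), expit(eta_hat + 1.96 * se_eta))
```

The interval built on the logit scale always stays inside (0, 1), which is one reason to prefer it when the estimated probability is near 0 or 1.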

(c) [8pts] With respect to the proposed model, the odds of "safe" blood lead levels change differently across time for the succimer and placebo treatments if $\beta_3 \ne 0$. Test the null hypothesis $H_0: \beta_3 = 0$ against the alternative $H_a: \beta_3 \ne 0$. From the GEE output with the model-based standard errors, a large sample normal statistic is

$$Z = \frac{\hat{\beta}_3 - 0}{S_{\hat{\beta}_3}} = 3.52$$

with p-value=.0004. This is sufficient evidence to reject the null hypothesis. There is a decreasing trend in the log-odds for "safe" blood lead level for the placebo, but an increasing trend for the succimer treatment. The estimated model is

$$\log\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) = (\hat{\beta}_0 + \hat{\beta}_2) + (\hat{\beta}_1 + \hat{\beta}_3)\,\text{time}$$

for the succimer treatment, and

$$\log\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) = \hat{\beta}_0 + \hat{\beta}_1\,\text{time}$$

for the placebo.


(d) [4pts] If the proposed logistic regression model

$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \beta_0 + \beta_1\,\text{time}_{ij} + \beta_2\,\text{trt}_i + \beta_3\,(\text{trt}_i)(\text{time}_{ij})$$

is correct, the empirical (robust sandwich) estimator for the covariance matrix of the GEE estimates of regression coefficients is a consistent estimator. The model-based estimator of the covariance matrix for the GEE parameter estimates may not be consistent if the proposed exchangeable covariance structure for repeated measures is not correct.

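(e) [4pts] The IWM estimates are obtained by maximizing a likelihood function for the logistic regression model

$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \beta_0 + \beta_1\,\text{time}_{ij} + \beta_2\,\text{trt}_i + \beta_3\,(\text{trt}_i)(\text{time}_{ij})$$

where the binary outcomes $\{Y_{ij}\}$ are all assumed to be independent. The estimating equations are

$$0 = \sum_{i=1}^{100} (V_i X_i)^T V_i^{-1} (Y_i - \pi_i) = \sum_{i=1}^{100} X_i^T (Y_i - \pi_i)$$

where for the i-th child

$$Y_i = \begin{bmatrix} Y_{i1} \\ Y_{i2} \\ Y_{i3} \end{bmatrix}$$

are the binary results (1 for safe and 0 for elevated) at 1, 4 and 6 weeks,

$$\pi_i = \begin{bmatrix} \pi_{i1} \\ \pi_{i2} \\ \pi_{i3} \end{bmatrix} = \begin{bmatrix} \dfrac{e^{\beta_0 + \beta_2 \text{trt}_i}}{1 + e^{\beta_0 + \beta_2 \text{trt}_i}} \\[2ex] \dfrac{e^{\beta_0 + 3\beta_1 + (\beta_2 + 3\beta_3)\text{trt}_i}}{1 + e^{\beta_0 + 3\beta_1 + (\beta_2 + 3\beta_3)\text{trt}_i}} \\[2ex] \dfrac{e^{\beta_0 + 5\beta_1 + (\beta_2 + 5\beta_3)\text{trt}_i}}{1 + e^{\beta_0 + 5\beta_1 + (\beta_2 + 5\beta_3)\text{trt}_i}} \end{bmatrix},$$

$$V_i = \begin{bmatrix} \pi_{i1}(1-\pi_{i1}) & 0 & 0 \\ 0 & \pi_{i2}(1-\pi_{i2}) & 0 \\ 0 & 0 & \pi_{i3}(1-\pi_{i3}) \end{bmatrix}$$

and

$$X_i = \begin{bmatrix} 1 & 0 & \text{trt}_i & 0 \\ 1 & 3 & \text{trt}_i & 3\,\text{trt}_i \\ 1 & 5 & \text{trt}_i & 5\,\text{trt}_i \end{bmatrix}.$$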


(f) [4pts] Both the IWM and GEE estimators are consistent estimators for the parameters in the model

$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \beta_0 + \beta_1\,\text{time}_{ij} + \beta_2\,\text{trt}_i + \beta_3\,(\text{trt}_i)(\text{time}_{ij}),$$

if that model is correct for changes in the log-odds across time and treatments. Both estimators have large sample normal distributions. If the sample size is large enough and the specified exchangeable covariance structure is close enough to the correct covariance matrix for the binary outcomes at the three time points for each child, the GEE estimator for the regression coefficients will be more efficient (have smaller standard errors) than the IWM estimator.

There were 100 total points for this exam. Check your papers to be sure that a score was written on your paper for each part and your total score was correctly recorded. The scores on this exam for all students in the class are shown in the following stem-leaf display.

9 | 0 0

8 |

8 | 0 0

7 | 5 6 8 9

7 | 0 1 2 3 4

6 | 1 2 2

5 |

5 |

4 | 0 2 4
