Stat 565 Assignment 2 Solutions Fall 2005

1) Exact randomization test for a 2x2 table of counts:
The following data were taken from Table 3.11 on page 73 in the book, Categorical Data
Analysis, by Alan Agresti.
                          Surgery   Radiation Therapy
Cancer Controlled           21             15
Cancer not Controlled        2              3
The 41 larynx cancer patients were randomly assigned to the two treatments. Use Fisher's
exact test to test the null hypothesis that the two treatments are equally effective in
controlling cancer against the alternative that the treatments are not equally effective.
Ho: The two treatments are equally effective in controlling cancer
Ha: not Ho
The p-value for the two-sided alternative is computed as
p-value = [C(36,21)C(5,2) + C(36,22)C(5,1) + C(36,23)C(5,0) + C(36,19)C(5,4) + C(36,18)C(5,5)] / C(41,23)

        = 1 - C(36,20)C(5,3)/C(41,23) = 0.6384,

where C(n,k) denotes the binomial coefficient "n choose k".
There is not enough evidence to reject the null hypothesis that the two treatments are equally
effective in controlling cancer.
Note that there are more patients in the surgery group than in the radiation group, so you
must examine proportions to determine which tables are less consistent with the null
hypothesis than the observed table. For the observed data the overall proportion of patients
with controlled cancer is 36/41=0.878 and the proportion of surgery patients for which
cancer was controlled is 21/23=0.913. The difference is 0.913-0.878=0.035. Clearly 22/23 and 23/23
are farther above 0.878. In the other direction, 19/23=0.826 and 18/23=0.783 are farther
from 0.878, but 20/23=0.8695 is closer to 0.878 than 21/23=0.913.
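As a numerical check, the hypergeometric probabilities above can be reproduced with a short Python sketch (Python is used only for the check; the assignment code is in SAS and R). The two-sided criterion here, summing all tables no more probable than the observed one, is the criterion used by PROC FREQ and fisher.test:

```python
from math import comb

# Hypergeometric probability of a 2x2 table with fixed margins:
# 36 controlled / 5 not controlled, 23 surgery / 18 radiation patients.
def table_prob(x):
    """P(x surgery patients with controlled cancer), x in 18..23."""
    return comb(36, x) * comb(5, 23 - x) / comb(41, 23)

observed = table_prob(21)   # probability of the observed table

# Two-sided p-value: sum probabilities of all tables whose probability
# does not exceed that of the observed table.
p_value = sum(table_prob(x) for x in range(18, 24) if table_prob(x) <= observed)

print(round(observed, 4))   # 0.2755
print(round(p_value, 4))    # 0.6384
```

The only table excluded from the sum is x = 20, which is why the p-value equals 1 minus that table's probability.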
SAS CODE AND OUTPUT:
data set1;
input row col count;
cards;
1 1 21
1 2 2
2 1 15
2 2 3
run;
proc freq data=set1;
tables row*col/exact chisq;
weight count;
run;
The FREQ Procedure
Statistics for Table of row by col
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        21
Left-sided Pr <= F          0.8947
Right-sided Pr >= F         0.3808
Table Probability (P)       0.2755
Two-sided Pr <= P           0.6384

Sample Size = 41
R CODE AND OUTPUT:
> fisher.test(matrix(c(21,2,15,3),nrow=2,byrow=T))

        Fisher's Exact Test for Count Data

data:  matrix(c(21, 2, 15, 3), nrow = 2, byrow = T)
p-value = 0.6384
2) Exact randomization test for a 2x3 table of counts:
Perform an "exact" randomization test of the null hypothesis
Ho: the IFN-B treatment produces the same results as the treatment given to the controls
against the one sided alternative
Ha: the IFN-B treatment gives better results. The data are
Result of treatment   Treated with IFN-B   Controls
Improved                      5                1
Unchanged                     4                4
Worsened                      1                5
Totals                       10               10
Using the criteria that better results for the IFN-B treatment must yield at least as many
improved cases as in the observed data and the total number of improved and unchanged cases
for the IFN-B treatment must be at least as large as the total for the observed data, the
following tables of counts are at least as inconsistent with Ho as the observed table of counts:
Result of treatment   Treated with IFN-B   Controls
Improved                      5                1
Unchanged                     4                4
Worsened                      1                5
Totals                       10               10
Result of treatment   Treated with IFN-B   Controls
Improved                      6                0
Unchanged                     4                4
Worsened                      0                6
Totals                       10               10
Result of treatment   Treated with IFN-B   Controls
Improved                      6                0
Unchanged                     3                5
Worsened                      1                5
Totals                       10               10
Result of treatment   Treated with IFN-B   Controls
Improved                      5                1
Unchanged                     5                3
Worsened                      0                6
Totals                       10               10
The p-value is
p-value = [C(6,5)C(8,4)C(6,1) + C(6,6)C(8,4)C(6,0) + C(6,6)C(8,3)C(6,1) + C(6,5)C(8,5)C(6,0)] / C(20,10)

        = 0.0176557
This p-value provides enough evidence to reject the null hypothesis and conclude the
IFN-B treatment was more effective in controlling larynx cancer than the control
treatment.
Another possibility is to use the more stringent criteria that the IFN-B treatment must
yield as many improved cases and as many unchanged cases as for the observed data.
The following tables of counts are at least as inconsistent with Ho as the observed table of
counts with respect to those criteria:
Result of treatment   Treated with IFN-B   Controls
Improved                      5                1
Unchanged                     4                4
Worsened                      1                5
Totals                       10               10
Result of treatment   Treated with IFN-B   Controls
Improved                      6                0
Unchanged                     4                4
Worsened                      0                6
Totals                       10               10
Result of treatment   Treated with IFN-B   Controls
Improved                      5                1
Unchanged                     5                3
Worsened                      0                6
Totals                       10               10
The p-value is

p-value = [C(6,5)C(8,4)C(6,1) + C(6,6)C(8,4)C(6,0) + C(6,5)C(8,5)C(6,0)] / C(20,10) = 0.0158371.
This p-value provides enough evidence to reject the null hypothesis and conclude the
IFN-B treatment was more effective in controlling larynx cancer than the control
treatment.
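Both p-values can be verified directly from the binomial coefficients with a short Python sketch (the helper ptab is defined here for the check, not part of the assignment):

```python
from math import comb

N = comb(20, 10)  # number of equally likely assignments of 10 patients to IFN-B

def ptab(i, u, w):
    """Probability of an IFN-B row with (improved, unchanged, worsened) counts."""
    return comb(6, i) * comb(8, u) * comb(6, w) / N

# First criterion: four tables at least as favorable to IFN-B as the observed one.
p_first = ptab(5, 4, 1) + ptab(6, 4, 0) + ptab(6, 3, 1) + ptab(5, 5, 0)
# More stringent criterion: three tables.
p_second = ptab(5, 4, 1) + ptab(6, 4, 0) + ptab(5, 5, 0)

print(round(p_first, 7), round(p_second, 7))  # 0.0176557 0.0158371
```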
Under the null hypothesis, the IFN-B and control treatments would be equivalent and a
patient would respond the same way, regardless of the treatment that patient was given.
This fixes the column totals in the table. The row totals are fixed at 10 for each row due
to the decision to put 10 patients into each treatment group. Consequently, if the null
hypothesis is true, there are 43 possible tables of counts that could have been obtained
from random assignment of 10 patients into each treatment group. These are shown
below.
Summing the probabilities for those tables with values of the Pearson statistic that are at
least 5.33, the value for the observed table, yields a total probability of 0.0642. This is
the p-value produced by the exact option in PROC FREQ in SAS and also by the
fisher.test() function in R and S-Plus. Since we are testing against a one-sided
alternative and there are the same number of patients in each group, a more appropriate
p-value is obtained by dividing by 2 to obtain p-value=0.0321. This p-value also leads to
the rejection of the null hypothesis at the 0.05 level of significance. This procedure is not
quite satisfactory, however, because it includes a number of tables that are not more
extreme than the observed table in the sense that they contain fewer improved and
unchanged patients under the IFN-B treatment than the observed table. For example, the
table with 5 improved and 0 unchanged patients in the IFN-B group is considered “more
extreme” by the Pearson chi-square criterion. Some students failed to clearly state their
test criteria.
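The enumeration of all tables with these fixed margins, together with the Pearson-statistic criterion described above, can be sketched in Python; the count of 43 tables and the 0.0642 total follow from the same multivariate hypergeometric probabilities:

```python
from math import comb

N = comb(20, 10)  # equally likely assignments of 10 patients to the IFN-B group

# Enumerate all 2x3 tables with column totals 6 (improved), 8 (unchanged),
# 6 (worsened) and row totals 10, 10; row 1 is the IFN-B group.
tables = []
for i in range(7):           # improved in IFN-B group
    for u in range(9):       # unchanged in IFN-B group
        w = 10 - i - u       # worsened in IFN-B group
        if 0 <= w <= 6:
            prob = comb(6, i) * comb(8, u) * comb(6, w) / N
            # Pearson statistic; every expected count is 10 * (column total) / 20.
            chi2 = sum((o - e) ** 2 / e
                       for o, e in [(i, 3), (u, 4), (w, 3),
                                    (6 - i, 3), (8 - u, 4), (6 - w, 3)])
            tables.append((chi2, prob))

chi2_obs = 16 / 3            # Pearson statistic of the observed table, about 5.33
p_value = sum(p for c, p in tables if c >= chi2_obs - 1e-9)

print(len(tables))           # 43 possible tables
print(round(p_value, 4))     # 0.0642
```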
3) Randomization, adherence and intention to treat.
A. Since the subjects will come from two different populations (California and Iowa)
and these two populations may differ with respect to climate and environment and other
factors such as racial composition and consumption of soy, separate randomization
assignments of women to the three treatments should be done in California and Iowa,
with equal numbers of women assigned to each of the three treatment groups in Iowa and
equal numbers of women assigned to the three treatment groups in California. This
balance avoids confounding with location differences. To minimize the number of
medical personnel needed to perform the initial screening tests and control initial
screening and follow-up costs, a few women will be admitted to the study each week over
a two year period. Therefore it is not possible to assign all participants to the treatment
groups at one time. Instead, we can randomize women to treatment groups as they enter
the study. One way to do this is to use a uniform random number generator to generate a
list of random numbers for each location. Each woman who is admitted to the study in
Iowa would be assigned the next available random number on the list. Starting with the
first number on the list, divide each list into 40 sets of numbers with three consecutive
numbers per set. For any set of three numbers, the woman with the smallest number is
assigned the placebo, the woman with the middle number is assigned the lower level of
isoflavones, and the women with the largest number is assigned the higher level of
isoflavones. Repeat this process in California. In this way, women can be randomized to
a treatment group as soon as they are admitted to the study, and by using consecutive
blocks of three at each location, the study will be nearly balanced with respect to
treatment groups, location and season (or time of entry into the study). The study should
be double blind, so neither the participating women nor the people administering the
treatments and taking measurements should have any knowledge of the treatment
assignments. Therefore the pills should be identical in shape and color and come in
identical bottles (each bottle would have only the name of the patient). One person at
each location would label the bottles and keep the list of treatment assignments secret
until the end of the study.
Strengths of this randomization procedure are that it maintains equal sample sizes for the
three treatments at each location, it nearly maintains balance with respect to season or
time of entry into the study, and women can be randomized to treatment groups as soon
as they are admitted to the study.
One weakness of this randomization scheme is that it does not block with respect to
other possible confounding factors. If there really are no seasonal differences, blocking
with respect to season would result in a slight loss of power to detect treatment
differences, but this potential loss would be quite small relative to the potential gain in
power if there are substantial seasonal differences. Randomization within blocks of three
consecutive women at each site is complicated and personnel must be well trained to
carry this out correctly.
B. If your protocol indicated intent to treat, you would not be allowed to exclude these
34 women from the analysis for lack of adherence.
Intention to treat preserves randomization: The validity of a randomized control
trial depends greatly on the process of randomization. Randomization insures that
both measurable and immeasurable factors will balance out “on average”. If a factor other
than the treatment itself could possibly influence bone loss measurements, then
randomization insures that patients with different levels of that factor are equally likely to
receive either of two treatments or the placebo. This prevents many types of bias that can
occur in a non-randomized trial.
An analysis that excludes noncompliant patients is no longer randomized and might
cause serious bias. Suppose noncompliant women have something in common that
affects the treatment outcome. For example, suppose they are all heavy users of alcohol
and most were assigned to the high level of isoflavone treatment group. Suppose heavy
use of alcohol causes bone density to decline. If we exclude these patients from the
analysis, we are eliminating a higher percentage of poorly performing patients from one
treatment group, but not from the control group or the other treatment group. This could
seriously bias comparison of outcome means. The intention to treat principle maintains
unbiased estimates of the treatment differences for the patients used in the study by
keeping all patients who were randomized to treatment in the study. Note that this does
not exactly coincide with unbiased estimation of treatment differences for a theoretical
population of fully compliant patients.
Intention to treat analysis is more realistic. There are many factors that influence
whether a patient complies or not with a treatment. Some of the factors that influence
compliance might also influence the outcome measure. Noncompliant patients, for
example, may tend to have worse outcomes than compliant patients, even in a placebo
group. Perhaps patients who forget to follow a prescribed treatment will also neglect
other things important for their health. Thus an analysis that excludes non-compliant
patients may produce a study sample that is healthier than the patients in the overall
population and this could bias estimates of treatment effects, although it may produce less
biased estimates of the subpopulation that would be compliant.
Intention to treat analysis is especially important for medications that are difficult
to tolerate. If you exclude noncompliant patients, you are ignoring the influence of poor
tolerability on the efficacy of a treatment.
Consequences of intention to treat:
1. Power of tests for treatment effects is generally reduced because the existence of non-compliant subjects tends to increase variation of observed outcomes within treatment
groups and also tends to diminish observed treatment differences. Standardized estimates
of differences in treatment means tend to be pulled toward zero.
2. Conversely, in an equivalence trial (attempting to prove that two treatments do not
differ by more than a certain amount), using an intention to treat analysis will tend to
favor equality of treatments.
3. While estimates of treatment effects would be unbiased for this mixture of compliant
and non-compliant women, estimates of treatment effects may be distorted relative to
what they are for compliant women.
C. Most biostatisticians would refer to the intention to treat principle to argue that the 22
ineligible women should be kept in the study. Arguments would follow those given in part B.
Potential advantages: By not deleting any subjects from the study, the largest possible
sample sizes are maintained and this could provide more power for tests of treatment
effects. Also, one might be able to make inferences to a wider population, but this is
limited because no real effort would have been made to sample subjects from ineligible
parts of the wider population.
Potential disadvantages: The ineligible women have different characteristics and may
respond to treatments or placebo in a somewhat different manner than eligible women.
Generally, the enrollment criteria are imposed to reduce variability by providing more
homogeneous subjects, although this restricts the population to which inferences can be
made. Including ineligible women may increase variation in responses within treatment groups
and decrease power of tests of significance, in spite of the larger sample sizes. It may
also distort estimates of treatment effects relative to what they would be for the
population of “eligible” women.
D. One could argue that since these ineligible women had to be removed from the study
for safety reasons, they should be treated as dropouts and any data collected after they
were removed (just to monitor safety) should not be included in the analysis. Many
biostatisticians would appeal to the intention to treat principle to argue that all data
collected on those women should be included in the analysis. I believe that that is an
incorrect use of the intention to treat principle, but this debate is certainly unresolved at
the present time. Since these ineligible women have been forced to stop using any
treatment, their follow-up data will generally not conform to the follow-up data for
women remaining in the study and this will tend to dilute estimates of treatment effects
and increase variability within treatment groups.
One strategy is to run several different analyses: an analysis of data for all women
randomized to treatment, another analysis that excludes ineligible women who were
removed from the study for safety reasons, a third analysis that further excludes data on
all non-compliant women. Hopefully, all three analyses will produce the same
inferences. If not, careful explanations of possible reasons for differences in inferences
must be given.
4) Use simulation to explore randomization tests.
Results will vary, depending on the permutations selected by your simulations, but you
should see the following pattern:
Number of   95th Percentile of   95th Percentile of a     Ratio: Column 2   Proportion of Simulated
Blocks      the Simulated        Central F Distribution   divided by        F-values that exceed
            F-Values             with (3, 3(b-1)) df      Column 3          the value in Column 3
  2            48.375                9.277                   5.215               0.247
  4             2.932                3.863                   0.759               0.0204
  6             3.029                3.287                   0.922               0.0414
  8             2.875                3.072                   0.936               0.0405
 10             2.904                2.960                   0.981               0.0464
As the number of blocks increases, the simulated 95th percentile of the randomization
distribution of the ratio of mean squares (F-ratio) approaches the 95th percentile of
the central F-distribution, and the actual type I error level for the randomization test
approaches the nominal 0.05 value. As the number of blocks increases, the number of
possible permutations increases. For these types of experimental units, as few as 6 blocks
are enough for the central F approximation to provide a reasonable approximation to the
exact randomization distribution of the F-ratio, at least near the upper .05 percentile of
the distribution. The quality of this approximation will vary across experiments,
depending on how the experimental units differ from each other. The approximation will be
better for larger sample sizes and when variation among units more nearly follows a
normal distribution. A theoretical argument can be based on a central limit theorem.
The approximation is not very good for two blocks because there are only eight
experimental units and the distribution of their responses is quite discrete and
non-normal. The central limit theorem is a large sample result.
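A simulation of this kind can be sketched in Python. This is only a sketch under stated assumptions: the actual experimental units are not given in the problem, so the unit responses below are hypothetical values drawn once from a skewed (exponential) distribution, and treatments are then repeatedly re-randomized within blocks:

```python
import random
from statistics import mean

def simulated_f95(b, n_sim=2000, seed=0):
    """Approximate the 95th percentile of the randomization distribution of the
    F-ratio for a randomized complete block design with b blocks of 4 units,
    4 treatments, and no true treatment effects."""
    rng = random.Random(seed)
    # Hypothetical skewed unit responses, fixed across all permutations.
    units = [[rng.expovariate(0.2) for _ in range(4)] for _ in range(b)]
    fs = []
    for _ in range(n_sim):
        # Randomly assign the four treatments to the four units of each block:
        # position t in the shuffled row is the response under treatment t.
        rows = [rng.sample(row, 4) for row in units]
        grand = mean(y for row in rows for y in row)
        trt_means = [mean(row[t] for row in rows) for t in range(4)]
        blk_means = [mean(row) for row in rows]
        ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
        ss_blk = 4 * sum((m - grand) ** 2 for m in blk_means)
        ss_tot = sum((y - grand) ** 2 for row in rows for y in row)
        ss_err = ss_tot - ss_trt - ss_blk      # block-by-treatment interaction
        fs.append((ss_trt / 3) / (ss_err / (3 * (b - 1))))
    fs.sort()
    return fs[int(0.95 * n_sim)]

q2, q10 = simulated_f95(2), simulated_f95(10)
print(q2, q10)
```

Exact percentiles will vary with the seed and the hypothetical unit values, but the qualitative pattern in the table above (an erratic, heavy-tailed percentile for b = 2, settling near the central F percentile as b grows) should be reproducible.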
5) Sample Size Determination
A. Using n1 = n2 = n and α = 0.05, find the value of n that provides
power = 0.8 for rejecting the null hypothesis H0: μ2 − μ1 = 0 when
the true difference in the means is given by the alternative
δ = μ2 − μ1 = 5 and the standard deviation is approximately 2δ.
In this case, α = .05 and β = 1 − power = 1 − 0.8 = 0.2, and an initial estimate of the sample size is

    n1 = n2 = 2(z.05 + z.20)² s² / δ² = 2(1.645 + 0.84)² (2δ)² / δ² ≈ 50

Then, a more accurate value for the sample size, based on the central t-distribution with
50 + 50 − 2 = 98 degrees of freedom, is

    n1 = n2 = 2(t(98),.05 + t(98),.20)² s² / δ² = 2(1.661 + 0.8453)² (2δ)² / δ² = 51
The sample size calculation is rounded up to the next largest integer. It makes no sense
to request a sample size of 50.6, for example. You must use whole subjects.
B. Test the null hypothesis Ho: μ1 = μ2 against the two-sided
alternative Ha: μ1 ≠ μ2. Using n1 = n2 = n and α = 0.05, find the
value of n that provides power = 0.8 for rejecting the null
hypothesis when the alternative is δ = 5 and the standard deviation
is approximately 2δ.
In this case, α = .05 and β = 1 − power = 1 − 0.8 = 0.2, and an initial estimate of the sample size is

    n1 = n2 = 2(z.025 + z.20)² s² / δ² = 2(1.96 + 0.84)² (2δ)² / δ² ≈ 63

Then, a more accurate value for the sample size, based on the central t-distribution with
63 + 63 − 2 = 124 degrees of freedom, is

    n1 = n2 = 2(t(124),.025 + t(124),.20)² (2δ)² / δ² = 64
The sample size calculation is rounded up to the next largest integer.
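The initial normal-approximation sample sizes for parts A and B can be checked with a short Python sketch (the t-distribution refinement to 51 and 64 needs a t quantile function, such as qt in R, which is not in the Python standard library):

```python
from statistics import NormalDist
from math import ceil

z = NormalDist().inv_cdf  # standard normal quantile function

def initial_n(z_alpha, z_beta, s_over_delta=2.0):
    """First-pass two-sample size: n = 2 (z_a + z_b)^2 s^2 / delta^2,
    here with s = 2*delta so that s/delta = 2."""
    return ceil(2 * (z_alpha + z_beta) ** 2 * s_over_delta ** 2)

n_one_sided = initial_n(z(0.95), z(0.80))   # part A: alpha = .05, one-sided
n_two_sided = initial_n(z(0.975), z(0.80))  # part B: alpha = .05, two-sided

print(n_one_sided, n_two_sided)  # 50 63
```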
C. For an equivalence test of H0: |μ1 − μ2| ≥ θ versus Ha: |μ1 − μ2| < θ, the
rejection region will be inside the interval of possible alternatives (−θ, θ). We reject the
null hypothesis if x̄1 − x̄2 is enough standard errors above −θ and x̄1 − x̄2 is
enough standard errors below θ to be convinced this outcome would be sufficiently
unlikely to occur just because of sampling variability when the null hypothesis is actually
true. Consequently, reject the null hypothesis of non-equivalent treatments if

    x̄1 − x̄2 > −θ + tα,n1+n2−2 √(s²(1/n1 + 1/n2))   and   x̄1 − x̄2 < θ − tα,n1+n2−2 √(s²(1/n1 + 1/n2))
Let n1 = λn 2 and assume that the actual difference in response means is δ = μ1 − μ 2 ,
where 0 ≤ δ < θ . Then, the power of the test is
    P( −θ + tα,n1+n2−2 √(s²(1/n1 + 1/n2)) < x̄1 − x̄2 < θ − tα,n1+n2−2 √(s²(1/n1 + 1/n2)) | μ1 − μ2 = δ )
D. Calculate the sample sizes when α = .05 and β =1-power=1-0.8=0.2, δ = θ / 2 , and
s = θ.
    1 − β = P( (−θ + tα,n1+n2−2 √(s²(1/n1 + 1/n2)) − δ) / √(s²(1/n1 + 1/n2))
               < (x̄1 − x̄2 − δ) / √(s²(1/n1 + 1/n2))
               < (θ − tα,n1+n2−2 √(s²(1/n1 + 1/n2)) − δ) / √(s²(1/n1 + 1/n2)) | μ1 − μ2 = δ )

          = P( tα,n1+n2−2 − (θ + δ)/√(s²(1/n1 + 1/n2)) < t(n1 + n2 − 2) < −tα,n1+n2−2 + (θ − δ)/√(s²(1/n1 + 1/n2)) )
where t(n1 + n2 − 2) denotes a random variable with a central t-distribution with n1 + n2 − 2
degrees of freedom and tα,n1+n2−2 denotes an upper tail percentile of that central
t-distribution. Get an approximate solution by assuming that the sample sizes are large
enough for a standard normal random variable (Z) to be a good approximation to a
random variable with a central t-distribution. Then,
    1 − β ≈ P( Zα − (θ + δ)/√(s²(1/n1 + 1/n2)) < Z < −Zα + (θ − δ)/√(s²(1/n1 + 1/n2)) )

          = P( Z < −Zα + (θ − δ)/√(s²(1/n1 + 1/n2)) ) − P( Z < Zα − (θ + δ)/√(s²(1/n1 + 1/n2)) )

          ≈ P( Z < −Zα + (θ − δ)/√(s²(1/n1 + 1/n2)) )
since the second probability in the previous equation will be close to zero when the
sample sizes are large enough. Consequently, set

    Zβ = −Zα + (θ − δ)/√(s²(1/n1 + 1/n2))

Solving for n2 (with n1 = λn2), we obtain

    n2 ≈ (Zα + Zβ)² s² (1 + λ⁻¹) / (θ − δ)²
Substituting λ = 1, δ = μ1 − μ2 = θ/2, α = .05, β = 1 − power = 1 − 0.8 = 0.2, and s = θ, we obtain

    n1 = n2 ≈ 8(Z.05 + Z.20)² s² / θ² = 8(1.64485 + 0.84162)² θ² / θ² = 50

Then, make an adjustment for the t-distribution:

    n1 = n2 ≈ 8(t.05,98 + t.20,98)² s² / θ² = 8(1.66055 + 0.84540)² θ² / θ² = 51
Alternatively, we numerically find the smallest n1 = n2 = n that satisfies

    1 − β = P( tα,2(n−1) − (θ + δ)/√(2s²/n) < t(2(n−1)) < −tα,2(n−1) + (θ − δ)/√(2s²/n) )

With δ = θ/2 and s = θ, this becomes

    0.8 = P( t.05,2(n−1) − (3θ/2)/√(2θ²/n) < t(2(n−1)) < −t.05,2(n−1) + (θ/2)/√(2θ²/n) )

i.e.

    0.8 = P( t.05,2(n−1) − 3√n/(2√2) < t(2(n−1)) < −t.05,2(n−1) + √n/(2√2) )
The following R function yields n = 51.

fsize <- function(alpha) {
  power <- 0.8
  n <- 1
  repeat {
    n <- n + 1
    df <- 2*n - 2
    upper <- -qt(1 - alpha, df) + sqrt(n)/(2*sqrt(2))
    lower <-  qt(1 - alpha, df) - 3*sqrt(n)/(2*sqrt(2))
    pwr <- pt(upper, df) - pt(lower, df)
    if (pwr > power) break
  }
  return(n)
}
E. Find the sample size needed so that the half length of a 95%
confidence interval for the difference in the means will be about
0.5 when the standard deviation is 5.0.
A large sample approximation of the 95% CI for the difference in means is

    (x̄1 − x̄2) ± zα/2 s √(1/n1 + 1/n2)

Given that the half length of the 95% confidence interval for the difference in the means
should be about 0.5 when the standard deviation is 5.0, and assuming n1 = n2 = n, we can
solve for n:

    n ≥ ( s zα/2 √2 / 0.5 )² = ( (5)(1.96)√2 / 0.5 )² = 768.32
Hence n should be rounded up to 769.
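This calculation is a one-liner to check in Python:

```python
from statistics import NormalDist
from math import ceil, sqrt

# Half-length of the large-sample 95% CI: z * s * sqrt(2/n) = 0.5
z = NormalDist().inv_cdf(0.975)   # about 1.96
s, half = 5.0, 0.5
n = ceil((z * s * sqrt(2) / half) ** 2)
print(n)  # 769
```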
6) Inferences for relative risk
A. Use the information from the pilot study to estimate π2/π1,
the relative risk of relapse for the standard therapy relative to
treatment with the new compound. Also construct an approximate
95% confidence interval for the relative risk of relapse.
Treatment          Relapse   Not relapse
New compound          24         56
Standard therapy      30         50

P2 = P(relapse | standard therapy) = 30/80, P1 = P(relapse | new compound) = 24/80.

Hence the estimated relative risk is RR̂ = P2/P1 = 30/24 = 1.25.
From the previous assignment, we know that the variance of the large sample normal
approximation to the distribution of log(p2/p1) is

    var(log(p2/p1)) = (1 − π2)/(nπ2) + (1 − π1)/(mπ1)

We can estimate this by

    (1 − p2)/(np2) + (1 − p1)/(mp1) = (1 − 30/80)/30 + (1 − 24/80)/24 = 0.05,

so the standard error is approximately SE(log(p2/p1)) = √0.05 ≈ 0.224. Then, an
approximate 95% CI for log(RR) is

    log(p2/p1) ± (1.96) SE(log(p2/p1)) ⇒ log(1.25) ± (1.96)√0.05 ⇒ (−0.2151, 0.6614).

Finally, an approximate 95% CI for RR is (e^−0.2151, e^0.6614) = (0.806, 1.938).
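The estimate and interval can be reproduced with a short Python sketch:

```python
from math import log, exp, sqrt

p1, p2 = 24 / 80, 30 / 80   # relapse rates: new compound, standard therapy
rr = p2 / p1                # estimated relative risk, 1.25

# Large-sample variance of log(p2/p1); both groups have 80 patients.
var_log_rr = (1 - p2) / (80 * p2) + (1 - p1) / (80 * p1)   # 0.05

lo = exp(log(rr) - 1.96 * sqrt(var_log_rr))
hi = exp(log(rr) + 1.96 * sqrt(var_log_rr))
print(rr, round(var_log_rr, 3), round(lo, 3), round(hi, 3))
```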
B. Consider a new study in which the researchers want to test
Ho: π1 = π2 against the alternative Ha: π1 < π2. Determine the
sample size needed in order to achieve power of at least 0.80
when the relative risk (π2/π1) is 1.20. If you can, present a
formula in addition to a numerical value for the sample size.
Assume n1 = n2 = n. The sample size formula for a test involving the difference
between two proportions is

    n = [ zα √(2π(1 − π)) + zβ √(π1(1 − π1) + π2(1 − π2)) ]² / (π1 − π2)²

where π2 = P(relapse | standard compound), π1 = P(relapse | new compound), and
π = (π1 + π2)/2. Using the information given in the problem, specify π2 = 30/80 = 0.375.
In a real situation you may have a good deal of information on the standard compound
from previous studies and may be able to specify an accurate estimate of π2 = P(relapse |
standard compound). Since RR = (π2/π1) = 1.2, the value for π1 must be
π1 = π2/1.2 = 0.375/1.2 = 0.3125, and it follows that
π = (π1 + π2)/2 = (0.375 + 0.3125)/2 = 0.34375. Using α = .05 and β = 1 − power =
1 − 0.8 = 0.2, we obtain n1 = n2 = n = 714 observations.
If you initially selected π1 = 24/80 = 0.3 from the information from the pilot study, then
RR = (π2/π1) = 1.2 implies that the value for π2 must be π2 = 1.2π1 = 0.36. These values
require sample sizes of n1 = n2 = n = 759 observations.
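Both sample sizes follow from evaluating the formula above; a Python sketch:

```python
from statistics import NormalDist
from math import ceil, sqrt

z = NormalDist().inv_cdf  # standard normal quantile function

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a one-sided test comparing two proportions,
    using the pooled-variance formula from the text."""
    pbar = (p1 + p2) / 2
    num = (z(1 - alpha) * sqrt(2 * pbar * (1 - pbar))
           + z(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

print(n_two_proportions(0.375 / 1.2, 0.375))  # 714
print(n_two_proportions(0.30, 0.36))          # 759
```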
You could also do this test in terms of the relative risk: Ho: log(π2/π1) = 0 versus
Ha: log(π2/π1) > 0. The large sample normal approximation to the distribution of the
estimate of log(relative risk) is

    log(p2/p1) ~ N( log(π2/π1), (1 − π2)/(nπ2) + (1 − π1)/(nπ1) )

Then, the null hypothesis is rejected at the α = 0.05 level if

    log(p2/p1) > 0 + Zα √( (1 − π2)/(nπ2) + (1 − π1)/(nπ1) ).
The power of the test is

    power = P( z = [log(p2/p1) − log(1.2)] / √((1 − π2)/(nπ2) + (1 − π1)/(nπ1))
               > Zα − log(1.2)/√((1 − π2)/(nπ2) + (1 − π1)/(nπ1)) )
Solving for n, we obtain

    n = (Zα + Zβ)² [ (1 − π2)/π2 + (1 − π1)/π1 ] / (0 − log(1.2))²

      = (1.645 + 0.84)² [ (1 − 0.3)/0.3 + (1 − 0.375)/0.375 ] / (log(1.2))²

      = 744
The values used for π1 and π2 in evaluating (1 − π2)/π2 + (1 − π1)/π1 make a substantial
difference. Using π1 = 0.5 and π2 = 0.5 instead of π1 = 0.30 and π2 = 0.375, for example, in
the previous formula changes the estimated sample size to 372. The variance of
log(p2/p1) becomes smaller as π1 and π2 become larger. It would be wise to make these
calculations for several sets of values for π1 and π2.
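Evaluating this log relative-risk formula directly (a sketch; z-values taken from the normal inverse CDF):

```python
from statistics import NormalDist
from math import ceil, log

z = NormalDist().inv_cdf  # standard normal quantile function

def n_log_rr(p1, p2, rr=1.2, alpha=0.05, power=0.80):
    """Per-group sample size from the log relative-risk test statistic."""
    var_term = (1 - p2) / p2 + (1 - p1) / p1
    return ceil((z(1 - alpha) + z(power)) ** 2 * var_term / log(rr) ** 2)

print(n_log_rr(0.30, 0.375))  # 744, using the pilot-study proportions
print(n_log_rr(0.50, 0.50))   # 372
```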
C. Suppose that the new compound and the standard therapy are
considered to be equivalent if the absolute value of the
logarithm of the relative risk does not exceed 0.1. Determine the
sample size needed in order to achieve power of at least 0.8 for
establishing equivalence when log(π2/π1) = 0.05. Assume the relapse
rate for the standard therapy is equal to the observed rate in
the pilot study (30/80 = 0.375). Assume n1 = n2 = n.
Assuming n1 = n2 = n, the large sample normal approximation to the natural logarithm
of the estimate for relative risk is

    log(p2/p1) ~ N( log(π2/π1), (1 − π2)/(nπ2) + (1 − π1)/(nπ1) )
Using a derivation similar to that used in part C of problem 5, the null hypothesis of
non-equivalence is rejected if

    log(p2/p1) > −θ + Zα √((1 − π2)/(nπ2) + (1 − π1)/(nπ1))   and
    log(p2/p1) <  θ − Zα √((1 − π2)/(nπ2) + (1 − π1)/(nπ1))
where θ = 0.1 is the boundary value of log(π2/π1) that determines equivalence in this
case. When the alternative δ = log(π2/π1) = .05 is true, the power of the test is

    P( (−θ − δ + Zα σθ/√n)/(σδ/√n) < (log(p2/p1) − δ)/(σδ/√n) < (θ − δ − Zα σθ/√n)/(σδ/√n) )

where

    σθ/√n = (1/√n) √( (1 − π2)/π2 + (1 − π2 e^−θ)/(π2 e^−θ) )

is the large sample standard deviation for log(p2/p1) when θ = log(π2/π1), and

    σδ/√n = (1/√n) √( (1 − π2)/π2 + (1 − π2 e^−δ)/(π2 e^−δ) )

is the large sample standard deviation for log(p2/p1) when δ = log(π2/π1).
A sample size is determined by stepping through possible values of n until the smallest n
is found for which

    power = 1 − β = P( (−θ − δ + Zα σθ/√n)/(σδ/√n) < (log(p2/p1) − δ)/(σδ/√n) < (θ − δ − Zα σθ/√n)/(σδ/√n) )
This is done by the following R code, which yields n = 8816:

alpha <- 0.05
power <- 0.8
p2 <- 0.375
theta <- 0.1
delta <- 0.05
p1 <- p2*exp(-theta)
stheta <- sqrt(((1-p1)/p1) + ((1-p2)/p2))
p1a <- p2*exp(-delta)
sdelta <- sqrt(((1-p1a)/p1a) + ((1-p2)/p2))
n <- 1
repeat {
  n <- n + 1
  upper <- (theta-delta)/(sdelta/sqrt(n)) - qnorm(1-alpha)*stheta/sdelta
  lower <- (-theta-delta)/(sdelta/sqrt(n)) + qnorm(1-alpha)*stheta/sdelta
  pZ <- pnorm(upper) - pnorm(lower)
  if (pZ > power) break
}
n
An approximate sample size formula is

    n = [ (Zα σθ + Zβ σδ) / (θ − δ) ]²
For θ = 0.10, δ = 0.05, π2 = 30/80 = 0.375, power = 0.8 = 1 − β, and α = 0.05, this formula
yields n = 8816 subjects for both the standard and new treatment groups. R code is given
below:
alpha <- 0.05
power <- 0.8
p2 <- 0.375
theta <- 0.1
delta <- 0.05
p1 <- p2*exp(-theta)
stheta <- sqrt(((1-p1)/p1) + ((1-p2)/p2))
p1a <- p2*exp(-delta)
sdelta <- sqrt(((1-p1a)/p1a) + ((1-p2)/p2))
n <- ((qnorm(power)*sdelta + qnorm(1-alpha)*stheta)/(theta-abs(delta)))^2
n <- ceiling(n)
n
D. Determine the sample sizes needed to construct a 95%
confidence interval for the relative risk of relapse for the new
therapy versus the standard treatment, such that the length of
the confidence interval does not exceed 5% of the actual value of
the relative risk. Assume n1 = n2 = n.
From part A, we know an approximate 95% CI for log(RR) is

    log(p2/p1) ± 1.96 √( (1 − p2)/(n2 p2) + (1 − p1)/(n1 p1) )

and an approximate 95% CI for RR is

    ( exp[ log(p2/p1) − 1.96 √((1 − p2)/(n2 p2) + (1 − p1)/(n1 p1)) ] ,
      exp[ log(p2/p1) + 1.96 √((1 − p2)/(n2 p2) + (1 − p1)/(n1 p1)) ] )
The length of the confidence interval should not exceed 5% of the actual value of the
relative risk. Writing S = √((1 − p2)/(n2 p2) + (1 − p1)/(n1 p1)), this requires

    exp[ log(p2/p1) + 1.96 S ] − exp[ log(p2/p1) − 1.96 S ] ≤ 0.05 (p2/p1)

    ⇒ (p2/p1) ( e^(1.96 S) − e^(−1.96 S) ) ≤ 0.05 (p2/p1)

    ⇒ e^(1.96 S) − e^(−1.96 S) ≤ 0.05
−e
Using n1 = n 2 = n and using the information from the pilot study to select p1 =
p2 =
24
and
80
30
(1− p2 ) (1− p1 )
, we have
+
= 4 and
p2
p1
80
⎛
1 ⎛ (1− p 2 ) (1− p1 ) ⎞
⎟
⎜ 1.96 ⎜⎜
+
−1.96
n ⎝ p2
p1 ⎟⎠
⎜e
−e
⎜
⎜
⎝
⎛ 3.92
⎜
⇒
⎜e
⎜
⎝
1 ⎛ (1− p 2 ) (1− p1 ) ⎞ ⎞
⎜
⎟⎟
+
n ⎜⎝ p 2
p1 ⎟⎠ ⎟
⎟
⎟
⎠
1
n
−e
− 3.92
1
n
≤ 0.05
⎞
⎟
⎟ ≤ 0.05
⎟
⎠
We can solve for n by stepping through a sequence of possibilities. The following R
function evaluated at alpha = 0.05 yields n = 24591.

Nsize <- function(alpha) {
  n <- 0
  qn <- qnorm(1 - alpha/2)
  repeat {
    n <- n + 1
    diff <- exp(qn*2*sqrt(1/n)) - exp(-qn*2*sqrt(1/n))
    if (diff < 0.05) break
  }
  return(n)
}