Sample Size Calculations for the Rate of
Changes in Repeated Measures Designs
Chul Ahn, Ph.D.
UT Southwestern Medical Center
at Dallas
(Joint work with Sinho Jung at Duke)
Normal outcomes
1. Univariate summary statistics
Kirby et al. (1994)
Overall and Doyle (1994)
2. Univariate split-plot ANOVA
Bloch (1986)
Lui and Cumberland (1992)
3. Hotelling’s $T^2$
Vonesch and Schork (1986)
Rochon (1991)
4. Multivariate ANOVA
Muller and Barton (1989)
Muller et al. (1992)
Binary Outcomes
1. Extension of univariate split-plot model
Lui (1999)
2. Weighted least squares
Rochon (1989)
Lipsitz and Fitzmaurice (1994)
GEE






Liu and Liang (1997): Score test, no closed
form formula except for some special cases
Rochon (1998), Wald test
{Pan (2001), Z-test, a special case of Wald
test, SAS and S-Plus use Wald test}
Jung and Ahn (2003, 2004, 2005)
Ahn and Jung (2003, 2005), Z-test
Dahmen et al. (2004)
Kim et al. (2005)
Other Approaches
Hedeker et al. (1999)
 Yi and Panzarella (2002)
 Gastanaga et al. (2006)
 Tu et al. (2004, 2007)

Problem formulation
Diggle et al. (2002): “Correlation between
repeated observations affects the sample size
estimates in a different way depending on the
problem.”
 Leon (2004) and Rochon (1998): As
correlation increases, the required sample size
increases when comparing group averages.
 Jung and Ahn (2003, 2005) and Ahn and Jung
(2005) show that it may not be the case when
comparing the rates of changes over time
within subjects
GEE (Jung and Ahn, 2003, 2005)
A closed form formula for sample size
and power for comparing the rate of
changes between two groups
 Sample size can be computed using a
scientific calculator

GEE for continuous outcomes
Let $y_{ij}$ be a continuous variable at
measurement time $t_{ij}$ ($j = 1, \ldots, K_i$) for
subject $i$.
Let $r_i = 0$ for the control group and $r_i = 1$ for
the experimental group.
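We assume missing completely at
random (MCAR)
 The model is $y_{ij} = \beta_1 + \beta_2 r_i + \beta_3 t_{ij} + \beta_4 r_i t_{ij} + \varepsilon_{ij}$
 $\beta_4$, the difference in slopes between the two groups, is the parameter of interest
 Let $x_{ij} = (1, r_i, t_{ij}, r_i t_{ij})'$ and $\beta = (\beta_1, \beta_2, \beta_3, \beta_4)'$

Let $b$ be the GEE estimator of $\beta$ under a working independence structure, which solves

$S_n(b) = 0$

where $S_n(b) = n^{-1/2} \sum_{i=1}^{n} \sum_{j=1}^{K_i} (y_{ij} - x_{ij}' b)\, x_{ij}$

$\sqrt{n}\,(b - \beta)$ is approximately normal with mean
0 and variance $\Sigma_n = A_n^{-1} V_n A_n^{-1}$, where
$A_n = n^{-1} \sum_{i} \sum_{j} x_{ij} x_{ij}'$
and
$V_n = n^{-1} \sum_{i} \big(\sum_{j} \hat\varepsilon_{ij} x_{ij}\big)\big(\sum_{j} \hat\varepsilon_{ij} x_{ij}\big)'$ with $\hat\varepsilon_{ij} = y_{ij} - x_{ij}' b$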
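Reject $H_0: \beta_4 = 0$
if the absolute value of $\sqrt{n}\, b_4 / \hat\sigma_4$
is larger than $z_{1-\alpha/2}$,
where $b_4$ is the fourth component of $b$ and $\hat\sigma_4^2$
is the (4,4)-component of $\Sigma_n$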
Sample size estimation
Sample size estimate to detect
$H_1: \beta_4 = \beta_{40}$ with a two-sided $\alpha$ test and
power $1-\gamma$
 Assume that the visits are either made at
scheduled times or missing, and the
missing probability depends on
measurement time only.

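Let $A$ and $V$ denote the limits of $A_n$ and
$V_n$.
Then, $\Sigma_n$ converges to $\Sigma = A^{-1} V A^{-1}$
 Let $\sigma_4^2$ denote the (4,4) component of $\Sigma$
 Then, the required sample size is
$n = \sigma_4^2 (z_{1-\alpha/2} + z_{1-\gamma})^2 / \beta_{40}^2$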
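We need expressions for $A$
and $V$ to obtain $\sigma_4^2$ and calculate the sample
size
 Let $\delta_{ij} = 0$ for a missing observation, and $\delta_{ij} = 1$
otherwise
 Under MCAR, $(\delta_{i1}, \ldots, \delta_{iK})$ is independent
of $(y_{i1}, \ldots, y_{iK})$
 Let visit times be fixed, $(t_1, \ldots, t_K)$

Let $\sigma^2 = \mathrm{var}(\varepsilon_{ij})$, $\rho_{jj'} = \mathrm{corr}(\varepsilon_{ij}, \varepsilon_{ij'})$,
 $p_j = E(\delta_{ij}) = P(\text{observation at } t_j)$
 $p_{jj'} = E(\delta_{ij} \delta_{ij'}) = P(\text{observation at both } t_j \text{ and } t_{j'})$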



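$A$ and $V$ depend on the design through
$\mu_0 = \sum_{j=1}^{K} p_j$, $\mu_k = \mu_0^{-1} \sum_{j=1}^{K} p_j t_j^k$ ($k = 1, 2$), and
$s_t^2 = \sum_{j=1}^{K} \sum_{j'=1}^{K} p_{jj'} \rho_{jj'} (t_j - \mu_1)(t_{j'} - \mu_1)$,
with $p_{jj} = p_j$ and $\rho_{jj} = 1$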
Sample Size Formula
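$\sigma_4^2$ is the (4,4) component of $\Sigma = A^{-1} V A^{-1}$
$\sigma_4^2 = \sigma^2 s_t^2 / (\mu_0^2 \sigma_r^2 \sigma_t^4)$,
where $\sigma_t^2 = \mu_2 - \mu_1^2$

The required sample size is given by
$n = \sigma_4^2 (z_{1-\alpha/2} + z_{1-\gamma})^2 / \beta_{40}^2$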
Note that we do not have to specify the
true values of $\beta_1$, $\beta_2$, and $\beta_3$ in the sample
size calculation for testing $\beta_4$
Calculation of $\sigma_4^2$ requires projection of
the missing probabilities and the true
correlation structure
 As a special case, we consider two
missing patterns:
independent missing ($p_{jj'} = p_j p_{j'}$) and
monotone missing ($p_{jj'} = p_{j'}$ for $j < j'$)

Any correlation structure can be used.
The commonly used correlation
structures are AR(1) with $\rho_{jj'} = \rho^{|j-j'|}$, and
compound symmetry (exchangeable)
with $\rho_{jj'} = \rho$ for $j \ne j'$
 The sample size calculation can be done
easily with a scientific calculator
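
The two missing patterns and the two correlation structures named above can be written as small matrix builders. This is a minimal sketch in Python; the function names `joint_observation_probs` and `correlation_matrix` are hypothetical and not from the talk, and indices are 0-based, so `p[0]` corresponds to $p_1$.

```python
def joint_observation_probs(p, pattern):
    """Matrix of p_jj' = P(observed at both t_j and t_j'); diagonal is p_j."""
    K = len(p)

    def entry(j, k):
        if j == k:
            return p[j]
        if pattern == "independent":
            return p[j] * p[k]          # p_jj' = p_j * p_j'
        if pattern == "monotone":
            return p[max(j, k)]         # p_jj' = p_j' for j < j'
        raise ValueError(f"unknown missing pattern: {pattern}")

    return [[entry(j, k) for k in range(K)] for j in range(K)]


def correlation_matrix(K, rho, structure):
    """Matrix of rho_jj' under AR(1) ('ar1') or compound symmetry ('cs')."""

    def entry(j, k):
        if j == k:
            return 1.0
        if structure == "ar1":
            return rho ** abs(j - k)    # rho_jj' = rho^|j-j'|
        if structure == "cs":
            return rho                  # rho_jj' = rho for j != j'
        raise ValueError(f"unknown correlation structure: {structure}")

    return [[entry(j, k) for k in range(K)] for j in range(K)]


# Illustrative values only (not from the talk).
P_mono = joint_observation_probs([1.0, 0.9, 0.8], "monotone")
P_ind = joint_observation_probs([1.0, 0.9, 0.8], "independent")
R_ar1 = correlation_matrix(3, 0.5, "ar1")
R_cs = correlation_matrix(3, 0.5, "cs")
```

The monotone builder assumes the observation proportions are non-increasing over time, which is what a dropout pattern implies.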

Example
Davis (1991, SIM)
 83 women in labor were randomized to
receive a pain medication (43 women) or
placebo (40 women). The amount of pain
was self-reported (0 = no pain, 100 =
extreme pain)
 K=6, maximum number of
measurements
 Monotone missing pattern

Sample size calculation
From the data, we got $\sigma^2 = 815.84$
 $H_1: \beta_{40} = 5.71$ in a new study
 Assign an equal number of subjects to each
group: $\sigma_r^2 = 0.25$ ($= r(1-r)$ with $r = 0.5$)
 Proportion of observed measurements
$(p_1, \ldots, p_6) = (1, 0.9, 0.78, 0.67, 0.54, 0.41)$
 From these, we get $\mu_0 = 4.31$, $\mu_1 = 2.02$,
$\mu_2 = 6.73$, $\sigma_t^2 = 2.65$

Under CS, we get $\rho = 0.64$ and $s_t^2 = 8.30$
from the data
 We need $n = 67$ to detect $\beta_{40} = 5.71$ with
$\alpha = 0.05$ and 90% power
 Under AR(1), we get $\rho = 0.80$ and
$s_t^2 = 13.73$ from the data
 We need $n = 111$ to detect $\beta_{40} = 5.71$ with
$\alpha = 0.05$ and 90% power
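
The calculation for this example can be sketched in Python as follows. Several things here are assumptions rather than statements from the talk: the visit times are coded as 0, 1, ..., 5 (this coding reproduces $\mu_1 = 2.02$ and $\mu_2 = 6.73$), $\mu_0 = \sum_j p_j$, $\mu_k = \mu_0^{-1} \sum_j p_j t_j^k$, and $s_t^2 = \sum_j \sum_{j'} p_{jj'} \rho_{jj'} (t_j - \mu_1)(t_{j'} - \mu_1)$ with $p_{jj} = p_j$ and $\rho_{jj} = 1$. The function name `sample_size_slope` is hypothetical, and the result is rounded up to the next integer.

```python
from math import ceil
from statistics import NormalDist


def sample_size_slope(sigma2, beta40, p, times, rho, structure,
                      pattern="monotone", r=0.5, alpha=0.05, power=0.90):
    """n = sigma_4^2 (z_{1-alpha/2} + z_{power})^2 / beta40^2 (rounded up)."""
    K = len(p)

    def corr(j, k):                      # rho_jj'
        if j == k:
            return 1.0
        return rho ** abs(j - k) if structure == "ar1" else rho

    def p_joint(j, k):                   # p_jj'
        if j == k:
            return p[j]
        return p[max(j, k)] if pattern == "monotone" else p[j] * p[k]

    mu0 = sum(p)
    mu1 = sum(pj * t for pj, t in zip(p, times)) / mu0
    mu2 = sum(pj * t ** 2 for pj, t in zip(p, times)) / mu0
    sigma_t2 = mu2 - mu1 ** 2
    s_t2 = sum(p_joint(j, k) * corr(j, k)
               * (times[j] - mu1) * (times[k] - mu1)
               for j in range(K) for k in range(K))
    sigma_r2 = r * (1 - r)
    sigma4_2 = sigma2 * s_t2 / (mu0 ** 2 * sigma_r2 * sigma_t2 ** 2)

    z = NormalDist().inv_cdf
    return ceil(sigma4_2 * (z(1 - alpha / 2) + z(power)) ** 2 / beta40 ** 2)


p_obs = [1.0, 0.9, 0.78, 0.67, 0.54, 0.41]
visit_times = [0, 1, 2, 3, 4, 5]         # assumed coding of the six visits

n_cs = sample_size_slope(815.84, 5.71, p_obs, visit_times, 0.64, "cs")
n_ar1 = sample_size_slope(815.84, 5.71, p_obs, visit_times, 0.80, "ar1")
print(n_cs, n_ar1)
```

Under these assumptions the intermediate quantities agree with the values quoted above up to rounding of the observation proportions.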

Simulation study
With the same ρ value, sample size under
AR(1) is larger than that under CS for testing
the rates of changes between two groups
 A conservative approach is to use AR(1)
 With the same ρ value, sample size under CS
is larger than that under AR(1) when
comparing marginal means between two
groups
 A conservative approach is to use CS
(Rochon, 1998)

K group comparisons
Jung and Ahn (2004)
 Two group comparisons can be
extended to K (K≥3) group comparisons
 Use of non-central chi-square distribution
Increase n or m?
Ahn and Jung (2004)
 Efficiency of the slope estimator in
repeated measurements
 Relative benefit of adding subjects (n)
versus adding measurements (m) on a
specified fixed study period [0,T]
 n and m will affect the standard error of
β4 estimate

Given $m$, let $g(m) = n^{1/2}\, \mathrm{se}(\hat\beta)$
 The effect of an increase from $m$ to $m+1$
on $\mathrm{se}(\hat\beta)$ is the same as that of an increase from $n$ to $n'$,
where $n'$ satisfies
$g(m+1)/n^{1/2} = g(m)/(n')^{1/2}$
That is, $n' = n\{g(m)/g(m+1)\}^2$

True correlation, CS

With no missing data, $p_j = p_{jj'} = 1$ and
$\sigma_m^2 = 12\sigma^2(1-\rho)m / \{(m+1)(m+2)T^2\}$
$\sigma_{m+1}/\sigma_m$ does not depend on $\rho$ in the
complete data case, while it depends on
$\rho$ in the missing data case
Adding one more measurement in $[0, T]$
is equivalent to adding $n(m-1)/(m+1)^2$
more subjects in the complete data case.
 That is, we can reduce the number of patients by $n(m-1)/\{m(m+3)\}$
by adding one more
assessment and still achieve the same
precision in the complete data case
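
The two complete-data statements above follow from the $\sigma_m^2$ expression and $n' = n\{g(m)/g(m+1)\}^2$, taking $g(m) = \sigma_m$. A small Python check, with hypothetical function names and illustrative values $n = 100$, $m = 4$, $\rho = 0.5$ that are not from the talk:

```python
def sigma_m2(m, sigma2=1.0, rho=0.0, T=1.0):
    """Complete-data variance factor under CS, as given on the slide."""
    return 12.0 * sigma2 * (1.0 - rho) * m / ((m + 1) * (m + 2) * T ** 2)


def equivalent_subjects(n, m, **kw):
    """n' such that n' subjects with m match n subjects with m + 1."""
    return n * sigma_m2(m, **kw) / sigma_m2(m + 1, **kw)


def subjects_saved(n, m, **kw):
    """Reduction in n that keeps the precision when m grows to m + 1."""
    return n - n * sigma_m2(m + 1, **kw) / sigma_m2(m, **kw)


n, m = 100, 4
extra = equivalent_subjects(n, m, rho=0.5) - n
saved = subjects_saved(n, m, rho=0.5)

# Compare with the closed forms n(m-1)/(m+1)^2 and n(m-1)/{m(m+3)}.
print(extra, n * (m - 1) / (m + 1) ** 2)
print(saved, n * (m - 1) / (m * (m + 3)))
```

Changing `rho`, `sigma2`, or `T` leaves both results unchanged, since these factors cancel in the ratio $\sigma_m^2/\sigma_{m+1}^2$.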


If we increase the number of
measurements from $m$ to $m+1$, the
relative reduction in the standard error of
the slope is
$\{\mathrm{se}(\hat\beta_m) - \mathrm{se}(\hat\beta_{m+1})\} / \mathrm{se}(\hat\beta_m)$
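
As a worked case, assume complete data and compound symmetry, so that $\mathrm{se}(\hat\beta_m) = \sigma_m / \sqrt{n}$ with $\sigma_m^2 = 12\sigma^2(1-\rho)m/\{(m+1)(m+2)T^2\}$. Under that assumption the relative reduction has a simple closed form:

```latex
\[
\frac{\mathrm{se}(\hat\beta_m) - \mathrm{se}(\hat\beta_{m+1})}{\mathrm{se}(\hat\beta_m)}
  = 1 - \frac{\sigma_{m+1}}{\sigma_m}
  = 1 - \sqrt{\frac{(m+1)/\{(m+2)(m+3)\}}{m/\{(m+1)(m+2)\}}}
  = 1 - \frac{m+1}{\sqrt{m(m+3)}}
\]
```

For example, $m = 2$ gives $1 - 3/\sqrt{10} \approx 0.051$, a reduction of about 5%, and $m = 1$ gives exactly 0. With missing data the ratio depends on $\rho$ and on the missing pattern, so this closed form no longer applies.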
Effect of dropout on sample size
estimate
Monotone missing
 Let N be the estimated total sample size
under no missing data, and q be the
proportion of dropout at the end of the
study
 Can we estimate the sample size using
N/(1-q)?

Dropout patterns
Binary Repeated Measurements
Jung and Ahn (2005, SIM)
 $g(p_{kij}) = a_k + b_k t_{kij}$
where $g(p) = \log\{p/(1-p)\}$
 $p_{kij}(a_k, b_k) = g^{-1}(a_k + b_k t_{kij})$
$= \exp(a_k + b_k t_{kij}) / \{1 + \exp(a_k + b_k t_{kij})\}$
 A closed-form sample size formula can be
derived in the same way as for
continuous outcomes

Sample size to test $H_1: |b_1 - b_2| = d$
$n = (z_{1-\alpha/2} + z_{1-\beta})^2 (v_1/r_1 + v_2/r_2) / d^2$
Steps for sample size calculation
1. Choose type I error $\alpha$ and power $1-\beta$
2. Schedule measurement times $(t_1, \ldots, t_m)$
3. Choose allocation proportions $r_1$ and $r_2$
4. Given $p_{k1}$ and $p_{km}$, calculate $(a_k, b_k)$ and $p_{kj}$; set $d = b_2 - b_1$
5. Specify non-missing proportions $(\delta_1, \ldots, \delta_m)$ and a missing pattern for $\delta_{jj'}$
6. Specify the true correlation structure and the associated correlation parameter $\rho$
7. Calculate the variance $v_k$ and the sample size $n$
Example
75% of scleroderma patients do not have
pulmonary fibrosis at baseline in the
ongoing GENOSIS trial
 A new clinical trial will examine the effect
of a new drug in preventing the
occurrence of pulmonary fibrosis
 Presence or absence of pulmonary
fibrosis will be assessed at baseline, and
at months 6, 12, 18, 24 and 30.


Compare the occurrence of pulmonary fibrosis
from baseline to 30 months for placebo versus
a new drug
 Within-group correlation structure: AR(1) with
$\rho = 0.8$, $\rho_{jj'} = 0.8^{|j-j'|}$
 Assign an equal number of patients to each
group, $r_1 = r_2 = 0.5$
 We project that the proportion of subjects without
pulmonary fibrosis is $p_{11} = 0.75$ at baseline,
and $p_{16} = 0.5$ at 30 months in the placebo group
We assume that a new therapy will
prevent further occurrence of pulmonary
fibrosis
 That is, $p_{21} = p_{26} = 0.75$
 $b_1 = \{g(0.5) - g(0.75)\}/(6-1) = -0.220$
$a_1 = g(0.75) = 1.099$
 Similarly, we obtain $(a_2, b_2) = (1.099, 0)$
 So, $d = 0 - (-0.220) = 0.220$


The probabilities of no pulmonary fibrosis can
be estimated from the logistic regression
equation
(0.750, 0.707, 0.659, 0.608, 0.555, 0.500) for
the placebo group
(0.750, 0.750, 0.750, 0.750, 0.750, 0.750) for
the treatment group
 The proportions of observed measurements
are expected to be
$(\delta_1, \ldots, \delta_6) = (1.0, 0.95, 0.90, 0.85, 0.80, 0.75)$
Suppose that we expect independent
missing
 Now, we have all the parameter values
needed to compute the sample size $n$
 From the parameters, we obtain $v_1 = 0.305$
and $v_2 = 0.353$
 Finally,

$n = (1.96 + 0.84)^2 (0.305/0.5 + 0.353/0.5) / 0.220^2 = 214$
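
The arithmetic of this example can be reproduced with a short Python sketch. The variances $v_1 = 0.305$ and $v_2 = 0.353$ are taken as given from the slide and are not recomputed here, since the expression for $v_k$ is not shown. Time is coded as the visit index 0, ..., 5, matching the divisor $(6-1)$ used for $b_1$, and the power is taken as 80%, matching $z = 0.84$. The helper names are hypothetical.

```python
from math import ceil, exp, log
from statistics import NormalDist


def logit(p):
    return log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + exp(-x))             # same as exp(x) / (1 + exp(x))


def intercept_slope(p_first, p_last, m):
    """(a, b) of g(p_j) = a + b * (j - 1) through the first and last visit."""
    a = logit(p_first)
    b = (logit(p_last) - logit(p_first)) / (m - 1)
    return a, b


m = 6
a1, b1 = intercept_slope(0.75, 0.50, m)  # placebo
a2, b2 = intercept_slope(0.75, 0.75, m)  # new drug
d = b2 - b1

placebo = [round(inv_logit(a1 + b1 * t), 3) for t in range(m)]
treated = [round(inv_logit(a2 + b2 * t), 3) for t in range(m)]

v1, v2 = 0.305, 0.353                    # taken from the slide
r1 = r2 = 0.5
alpha, power = 0.05, 0.80

z = NormalDist().inv_cdf
n = ceil((z(1 - alpha / 2) + z(power)) ** 2 * (v1 / r1 + v2 / r2) / d ** 2)
print(placebo, treated, round(d, 3), n)
```

Using unrounded normal quantiles and an unrounded $d$ gives a value slightly above the hand calculation before rounding up, but the same $n = 214$.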
Software for sample size estimate
GEESIZE version 3.1
http://www.imbs.uniluebeck.de/pub/Geesize/
“GEESIZE computes the minimum sample
size in studies with correlated response
data based on GEE. These correlated
response data arise e.g. in repeated
measurement designs, family studies or
studies involving paired organs.”


RMASS2: Repeated measures with
attrition: sample size for 2 groups
http://tigger.uic.edu/~hedeker/ml.html