Baseline_Measurements_in_RCTs_JS_2011-05

Baseline Measurements in RCTs
To adjust or not to adjust, that is the question
The first table of the results section of most
journal articles contains a summary of baseline
characteristics, as below:

                    Group 1   Group 2      p
N per group              25        25
Age (mean)             38.4      41.2    .24
Sex (% male)            72%       64%    .55
Hemoglobin (g/L)      103.4     104.8    .81
Hypertension            56%       24%    .02
Previous MI             48%       24%    .08
Assuming that this is a randomized trial, what
question are these p-values answering?
Are they useful?
In a randomized experiment, we know:
1. Outcome differences that occur may occur by chance
or because there is a true difference between treatments.
2. Baseline differences that occur must occur by chance
or because someone has been “fiddling” with the
randomization process.
What is the point of determining the probability
that the difference occurred by chance when we
know it occurred by chance?
The only time this might be useful would be when
we suspect that somebody has been cheating and
not randomizing fairly.
If randomizing has been fair, are the p-values of
any usefulness?
No, not really.
Why not?
Because with very large sample sizes,
inconsequential differences between treatments
may be statistically significant.
At the same time, with very small sample sizes,
large differences at baseline may not be
statistically significant but may have a huge
bearing on the outcome.
In the previous example, one group had twice the
proportion of patients with a previous MI as the
other, but the difference was not significant.
What should one look at in the baseline table?
The magnitude of the difference between
treatment groups on variables known or highly
suspected of being related to the outcome of
interest.
What if there is a large difference on a variable not
related to outcome?
It will not have any impact on the result of the
trial.
Example: variable highly correlated with outcome
Rate of MI or Cardiac Death:
  Previous MI: 50%
  No previous MI: 25%

                Previous MI   No previous MI
Treatment 1     N=80          N=20
Treatment 2     N=40          N=60

Number of Poor Outcomes Expected:

                Previous MI    No previous MI   Total
Treatment 1     80 × .5 = 40   20 × .25 = 5     45
Treatment 2     40 × .5 = 20   60 × .25 = 15    35
Example: variable not correlated with outcome
Rate of MI or Cardiac Death:
  Male: 50%
  Female: 50%

                Male       Female
Treatment 1     N=80       N=20
Treatment 2     N=40       N=60

Number of Poor Outcomes Expected:

                Male           Female         Total
Treatment 1     80 × .5 = 40   20 × .5 = 10   50
Treatment 2     40 × .5 = 20   60 × .5 = 30   50
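The arithmetic behind both examples can be checked in a few lines of Python. This is only a sketch of the calculation above; the `expected_events` helper and its dictionary inputs are illustrative names, and the numbers are the slide figures, not data from any trial.

```python
# Expected poor outcomes under baseline imbalance, using the event rates above.

def expected_events(counts, rates):
    """Sum n * event rate over covariate strata for one treatment arm."""
    return sum(n * rates[stratum] for stratum, n in counts.items())

# Prognostic covariate: previous MI doubles the event rate (50% vs 25%).
mi_rates = {"prev_mi": 0.50, "no_mi": 0.25}
t1 = expected_events({"prev_mi": 80, "no_mi": 20}, mi_rates)  # 40 + 5 = 45
t2 = expected_events({"prev_mi": 40, "no_mi": 60}, mi_rates)  # 20 + 15 = 35
print(t1, t2)  # 45.0 35.0 -- imbalance alone creates an apparent treatment effect

# Non-prognostic covariate: sex does not change the event rate (50% vs 50%).
sex_rates = {"male": 0.50, "female": 0.50}
t1s = expected_events({"male": 80, "female": 20}, sex_rates)
t2s = expected_events({"male": 40, "female": 60}, sex_rates)
print(t1s, t2s)  # 50.0 50.0 -- the same imbalance has no impact on the comparison
```

The same 80/20 vs 40/60 split produces a 10-event artefact in the first case and none in the second, which is the whole point of the two examples.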
Baseline Characteristics
p-values on baseline characteristics indicate only
whether randomization worked; by chance alone, one
expects one of every 20 characteristics to have p < .05.
Lack of balance on baseline characteristics is only
important if those characteristics are associated
with outcome.
The absence of statistically significant differences
on baseline characteristics does not assure a lack of
confounding: in small studies, the power to detect
such differences is low.
More attention should be paid to the size and
distribution of the difference rather than the
p-value.
If one is concerned about important baseline
characteristics, then an adjusted analysis should be
planned and implemented whether or not
significant baseline differences between
treatments occur.
Adjustment for important baseline characteristics
will result in improved precision and power even
when there is perfect balance on such factors.
Adjustment for factors unassociated with outcome
will decrease power very slightly.
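As a sanity check on the precision claim, here is a minimal simulation sketch using ordinary least squares in NumPy. All numbers (sample size, effect sizes, the `ols` helper) are illustrative assumptions, not taken from the studies cited below: randomization balances the prognostic covariate, yet adjusting for it still shrinks the standard error of the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
treat = rng.integers(0, 2, n).astype(float)      # 1:1 randomization
x = rng.normal(size=n)                           # prognostic baseline covariate
y = 1.0 * treat + 2.0 * x + rng.normal(size=n)   # outcome; true effect = 1.0

def ols(X, y):
    """Classical OLS estimates and standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

ones = np.ones(n)
_, se_unadj = ols(np.column_stack([ones, treat]), y)   # ignore the covariate
_, se_adj = ols(np.column_stack([ones, treat, x]), y)  # adjust for the covariate

print(se_unadj[1], se_adj[1])  # adjusted SE is markedly smaller
```

With these assumed effect sizes, the unadjusted residual variance is about 1 + 2² = 5 versus about 1 after adjustment, so the standard error shrinks by roughly a factor of √5.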
RCTs with Survival Outcomes
How does adjustment for important
baseline covariates affect power?
Hernandez et al. (Annals of Epidemiology, 2006)
Varied the predictive effect of the covariate, the
covariate's prevalence, and the treatment effect.

Modeling no treatment effect:
Adjustment for the covariate did not affect type 1 error.

Modeling a treatment effect (high-prevalence covariate):
For treatment hazard ratios of 1.4 and 1.7, adjusting
for a moderate covariate effect (HR=2) reduced the
required sample size by 15%; adjusting for a strong
covariate effect (HR=5) reduced it by 45%.

Modeling a treatment effect (low-prevalence covariate):
For the same treatment hazard ratios, adjusting for a
moderate covariate effect (HR=2) reduced sample size
by 4%; adjusting for a strong covariate effect (HR=5)
reduced it by 12%.
Conclusions
Adjustment for covariates results in
increased power without effect on type 1
error
Covariate adjustment with binary outcomes
(logistic regression) in RCTs (Hernandez,
JCE, 2004)
Varied treatment effects, covariate effects,
outcome incidences, and covariate
prevalences
1. Type 1 error was unaffected by adjustment.
2. Adjustment for a non-predictive covariate has a
trivial negative effect on sample size.
3. For a strong covariate (OR=5) with 50% prevalence,
adjustment decreases sample size by 13%; for a very
strong covariate (OR=30), adjustment decreases sample
size by 45% for both weak (OR=1.4) and moderate
(OR=1.7) treatment effects.
4. As covariate prevalence decreases, the sample size
savings decrease:

   Prevalence   Savings
   50%          13%
   25%          10%
   12.5%        6%
Conclusion
Adjustment results in greater power without
a negative effect on type 1 error
Simulations used a single covariate;
savings are potentially greater with multiple
important predictors
Sample size with continuous outcomes:
What is the best way to deal with baseline
measurements?
Alternatives:
1. Ignore
2. Subtract (differences)
3. Regression
Let’s examine the relationship between
the estimates of variance using different
strategies
The difference between two groups (parallel
design)
The variance of the difference is equal to the sum
of the variances of the two groups. Assuming that
variance is homogeneous (i.e., the same in both
groups), the variance of the difference is twice
the variance of an individual group.
In addition, if one takes two measurements on
a group of individuals and calculates the
variance of the difference between the first
and second measurements, the variance will
depend on the correlation between the two
measures.
If the two measurements are completely independent
(correlation = 0), the variance of their difference
is the same as for two independent groups, i.e.,
as above, twice the variance at the individual
time points.
However, if there is a correlation (r) between
the two time points:

s²difference = s²time1 + s²time2 − 2r(stime1 × stime2)
Using these facts as a basis, one can determine
the relative size of the variance for the three
different designs, and hence determine relative
sample sizes.
Assumption: Variance is homogeneous across
treatments and time.
Parallel Design (Final Scores Only)
s²total = s²group1 + s²group2 = 2s²group1
Parallel Design
(Change from Baseline: Group 1 vs Group 2)
For group 1:
s²diff = s²baseline + s²followup − 2r(sbaseline × sfollowup)
For group 2:
s²diff = s²baseline + s²followup − 2r(sbaseline × sfollowup)
For a comparison of change from baseline in
group 1 vs group 2, simply add the two together.
Assuming s²baseline = s²followup = s²,
the sum is equivalent to: 4s²(1 − r)
Now let's compare the two designs:

Parallel groups (final scores only): 2s²
Parallel groups (change scores): 4s²(1 − r)

If r = .5, the variance of the change scores equals
the variance of the final scores.
If r > .5, the variance of the change scores is less
than the variance of the final scores.
If r < .5, the change scores are actually more
variable than the final scores.
Change scores in a parallel group design can help
control variance and decrease sample size when the
correlation between pre and post scores is high.
This should be the case when the study group is
relatively heterogeneous at baseline (i.e., high
variability between subjects) on the measurement of
interest. The variance will be even more tightly
controlled if one uses analysis of covariance.
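The variance comparison above can be checked numerically. This sketch just encodes the two formulas, with s2 the common variance and r the pre-post correlation (the function names are my own labels):

```python
def var_final_scores(s2):
    # Parallel design, final scores only: sum of the two group variances.
    return 2 * s2

def var_change_scores(s2, r):
    # Each arm's change score has variance 2*s2*(1 - r); two arms double it.
    return 4 * s2 * (1 - r)

s2 = 1.0
for r in (0.2, 0.5, 0.8):
    print(r, var_final_scores(s2), var_change_scores(s2, r))
# r = 0.5: break-even (2.0 vs 2.0)
# r = 0.8: change scores win (2.0 vs ~0.8)
# r = 0.2: change scores lose (2.0 vs ~3.2)
```

The crossover at r = .5 falls straight out of 4s²(1 − r) = 2s².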
What are we assuming when we use change scores?
That there is less variance in the change scores
than in the final scores.
Why this assumption?
If the study sample is relatively heterogeneous
at baseline, expect significant variance in final
scores.
On the other hand, the amount of change might
be relatively homogeneous, i.e. patients at
various levels might experience similar change.
Example:

Initial   Final   Change
10        20      10
10        20      10
10        20      10
15        25      10
15        25      10
15        25      10
20        30      10
20        30      10
20        30      10
If the final score were used, there would be
quite a bit of variability, but if the change
score were used, there would be little
variability.
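The toy data above make the point directly; a short sketch using Python's statistics module:

```python
from statistics import pvariance

initial = [10, 10, 10, 15, 15, 15, 20, 20, 20]
final   = [20, 20, 20, 25, 25, 25, 30, 30, 30]
change  = [f - i for f, i in zip(final, initial)]

# Final scores inherit all of the baseline spread; change scores have none.
print(pvariance(final))   # substantial variance, carried over from baseline
print(pvariance(change))  # 0 -- every patient changed by exactly 10
```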
We use our knowledge of the initial score
to predict the patients’ final scores
In fact, it is unusual for change scores to be
unrelated to baseline scores
In other words, the amount of change often
is influenced by patients’ scores at baseline
Other alternatives?
The change score approach assumes that the best
equation relating the final score (Y) to the
initial score (X) is of the form:
Y = a + X
where a is some constant amount.
In the previous example, the final score was
predicted by the equation:
Final score = 10 + 1*Initial Score
It is often the case that the amount of change
expected varies depending on the initial score,
especially when the variable is measured with
considerable error. The highest scores probably
reflect random variation above the true score, and
the lowest scores random variation below it.
Thus, we expect greater change in the lower scores
(regression to the mean).
[Figure: final score (Y) plotted against initial score (X)]
Solution:
Analysis of covariance
Fits the best linear relationship between
initial score and final score
Limitations:
If comparing two groups, analysis of
covariance makes the assumption that the
coefficient of X (i.e. the slope of the line)
is the same for the two treatment groups.
If not, ANCOVA should not be used.
Example:
Final score = 10 + .5 × Initial Score

(1)       (2)     (3)      (4)                (5)
Initial   Final   Change   Change Predicted   Col 3 −
Score     Score   Score    by ANCOVA          Col 4
0         10      10       10                 0
2         11      9        9                  0
4         12      8        8                  0
6         13      7        7                  0
8         14      6        6                  0
16        18      2        2                  0
20        20      0        0                  0
In this example, there is considerable variation in
the final score (it varies between 10 and 20), and
there would still be considerable variation if we
used simple change scores (they vary between 0 and
10). However, if we predicted the final score by
Y = 10 + .5X, we could explain all the variation
in the change scores (Col 3 − Col 4). In other
words, all of the variation in the outcome is
explained by our knowledge of the initial score.
In general, analysis of covariance will tend
to explain more of the variance in the final
score than simple change scores.
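Using the table above, one can verify this numerically with NumPy. The data are generated exactly by the slides' rule Final = 10 + .5 × Initial, so the fitted line absorbs the variation that change scores (which force a slope of 1) leave behind:

```python
import numpy as np

initial = np.array([0, 2, 4, 6, 8, 16, 20], dtype=float)
final = 10 + 0.5 * initial   # exactly the slides' rule

# Change-score analysis: forces a slope of 1 on the initial score.
change = final - initial
var_change = change.var()

# ANCOVA: estimates the slope from the data instead.
slope, intercept = np.polyfit(initial, final, 1)
resid = final - (intercept + slope * initial)
var_ancova = resid.var()

print(var_change)   # > 0: change scores leave variation unexplained
print(var_ancova)   # ~0: the fitted line (slope .5) explains it all
```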