Change from baseline or analysis of
covariance?: Lord's Paradox and other
matters.
Stephen Senn
(C) Stephen Senn 2004
Outline
• Adjustment in Randomised Clinical Trials
– The argument for ANCOVA
• Lord’s Paradox
– ANCOVA versus simple analysis of change scores
(SACS)
• Observational studies
– The argument against ANCOVA
• Resolution?
– Why ANCOVA, although not perfect, may be best after
all
SACS and ANCOVA
Consider a simple randomised clinical trial with two
treatment groups and only two measurements per patient:
a baseline measurement, X, and an outcome
measurement, Y.
Popular choices of outcome measure are
1) raw outcomes Y
2) change score d = Y - X
3) covariance-adjusted outcomes Y - βX (where β is
chosen appropriately)
NB As Laird (Am Stat., 37, 329-330, 1983) has shown,
covariate adjusted change scores are the same as 3)
The Estimators Associated
with the Measures
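If subscript t stands for treatment and c for control we have:
1) \[ \hat{\tau}_{raw} = \bar{Y}_t - \bar{Y}_c \]
2) \[ \hat{\tau}_{change} = (\bar{Y}_t - \bar{Y}_c) - (\bar{X}_t - \bar{X}_c) \]
3) \[ \hat{\tau}_{\beta} = (\bar{Y}_t - \bar{Y}_c) - \beta(\bar{X}_t - \bar{X}_c) \]
1) and 2) are just special cases of 3), with β = 0 and β = 1 respectively. If β is chosen to be the
regression of Y on X, then 3) corresponds to analysis of covariance.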
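The three estimators can be sketched as one function with different choices of the multiplier on the baseline difference. This is a minimal illustration, not the author's code: the function names `pooled_slope` and `tau_hat` and the toy data are assumptions made for the example, and the slope used for ANCOVA is taken to be the pooled within-group regression of Y on X.

```python
from statistics import mean

def pooled_slope(x_t, y_t, x_c, y_c):
    """Pooled within-group regression slope of Y on X (the ANCOVA multiplier)."""
    sxy = sxx = 0.0
    for xs, ys in ((x_t, y_t), (x_c, y_c)):
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def tau_hat(x_t, y_t, x_c, y_c, beta):
    """General estimator: (Ybar_t - Ybar_c) - beta * (Xbar_t - Xbar_c)."""
    return (mean(y_t) - mean(y_c)) - beta * (mean(x_t) - mean(x_c))

# Toy data, made up purely for illustration.
x_t, y_t = [10, 12, 14, 16], [13, 14, 17, 18]
x_c, y_c = [9, 11, 13, 15], [10, 11, 14, 15]

raw = tau_hat(x_t, y_t, x_c, y_c, beta=0.0)      # raw outcomes
change = tau_hat(x_t, y_t, x_c, y_c, beta=1.0)   # change scores
ancova = tau_hat(x_t, y_t, x_c, y_c,
                 beta=pooled_slope(x_t, y_t, x_c, y_c))  # ANCOVA
print(raw, change, ancova)
```

All three calls target the same treatment effect; they differ only in how much of the observed baseline difference is subtracted.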
Warning
• These three measures measure the same thing
• No question of choosing between them on the basis
of clinical relevance
• Can only choose between them on the basis of either
variance or statistical philosophy
• ANCOVA may generally be expected to have the
lowest variance
• Baseline is irrelevant to the definition of the
treatment effect.
Variances for Three Approaches to Analysis
[Figure: variance (vertical axis, 0.0 to 2.0) plotted against the correlation coefficient (horizontal axis, 0.0 to 1.0), with curves for change_score, outcomes and ANCOVA]
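The comparison of variances can be sketched numerically. The factors below are the standard large-sample results under the assumption (not stated on the slide) that baseline and outcome have the same variance, with correlation rho between them, and ignoring the small cost of estimating the slope; the function name `variance_factors` is hypothetical.

```python
def variance_factors(rho):
    """Variance of each estimator relative to that of the raw outcomes.

    Assumes equal variance at baseline and outcome and correlation rho.
    """
    return {
        "outcomes": 1.0,
        "change_score": 2.0 * (1.0 - rho),
        "ANCOVA": 1.0 - rho ** 2,
    }

for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    f = variance_factors(rho)
    print(f"rho={rho:.2f}  outcomes={f['outcomes']:.3f}  "
          f"change_score={f['change_score']:.3f}  ANCOVA={f['ANCOVA']:.3f}")
```

Under these assumptions change scores beat raw outcomes only when the correlation exceeds 0.5, while ANCOVA is never worse than either.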
ANCOVA and Baseline by Treatment
Interaction
• It is often stated that ANCOVA relies on the
parallelism assumption.
• This is not true.
• If the effect of treatment varies with baseline it
varies whether or not ANCOVA is used.
• ANCOVA is a first approximation and better than
either doing nothing or using change scores.
Not to use ANCOVA because you fear
parallelism may not apply is like saying
“crossing the Channel in a rowing boat is
dangerous; I prefer to swim”.
Dichotomania
• Obsessive compulsive disorder
– Cochrane Collaboration has a galloping case
• Numbers Needed to Treat should have been strangled at birth
• Division of patients into sheep and goats
– Ignoring existence of geep and shoats
• Use of difference from baseline
– Sin number one
• Destruction of information
• Arbitrary division into responders and non-responders
– Sin number two
• Further destruction of information
• Unjustified causal interpretation
A Red Herring
• It is sometimes claimed that measurement
error invalidates ANCOVA
– The reason is that if baseline is measured
with error the regression of outcome on
baseline is attenuated
• However this claim is incorrect
• ANCOVA is still valid
– The reason is that it is appropriate to correct
for an observed imbalance using an observed
regression
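The point about measurement error can be checked with a small simulation of a randomised trial. This is a sketch under assumed values: the true effect, sample sizes, error variances and the helper names `ancova` and `simulate_arm` are all invented for illustration. The baseline is observed with error, so the observed slope is attenuated, yet the adjusted estimate should still centre on the true effect.

```python
import random
from statistics import mean

def ancova(x_t, y_t, x_c, y_c):
    """Return (pooled within-group slope, covariate-adjusted treatment estimate)."""
    sxy = sxx = 0.0
    for xs, ys in ((x_t, y_t), (x_c, y_c)):
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    tau = (mean(y_t) - mean(y_c)) - beta * (mean(x_t) - mean(x_c))
    return beta, tau

def simulate_arm(rng, n, effect):
    """One randomised arm: observed baseline = true baseline + measurement error."""
    xs, ys = [], []
    for _ in range(n):
        true_baseline = rng.gauss(0.0, 1.0)
        xs.append(true_baseline + rng.gauss(0.0, 1.0))            # error-prone baseline
        ys.append(true_baseline + effect + rng.gauss(0.0, 0.5))   # outcome
    return xs, ys

rng = random.Random(2004)
TRUE_EFFECT = 2.0          # assumed for illustration
slopes, estimates = [], []
for _ in range(200):       # 200 simulated trials, 100 patients per arm
    x_t, y_t = simulate_arm(rng, 100, TRUE_EFFECT)
    x_c, y_c = simulate_arm(rng, 100, 0.0)
    beta, tau = ancova(x_t, y_t, x_c, y_c)
    slopes.append(beta)
    estimates.append(tau)

mean_slope = mean(slopes)        # attenuated: theory gives 0.5 rather than 1
mean_estimate = mean(estimates)  # should remain close to TRUE_EFFECT
print(round(mean_slope, 2), round(mean_estimate, 2))
```

The slope on the true baseline is 1 in this set-up, but the slope on the observed baseline is about half that; the adjustment uses the observed slope to correct the observed imbalance, which is why the treatment estimate is unaffected.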
Counter-Claims
• There is a significant minority of papers
arguing against ANCOVA as a means of
dealing with bias
– E.g. Liang and Zeger (2000), Sankhyā;
Samuelson (1986), American Statistician
• The variance claims are accepted
• Claims are made that ANCOVA is biased
unless there is balance at baseline
Justification of the Counter-Claim
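\[ E[\bar{X}_c] = \mu_c \]
\[ E[\bar{X}_t] = \mu_t = \mu_c + \Delta \]
\[ E[\bar{Y}_c] = \mu_c + \delta_c \]
\[ E[\bar{Y}_t] = \mu_t + \delta_t = \mu_c + \Delta + \delta_t = \mu_c + \Delta + \delta_c + \tau \]
Hence
\[ E[\bar{X}_t - \bar{X}_c] = \Delta \]
\[ E[\bar{Y}_t - \bar{Y}_c] = \Delta + \tau \]
\[ E[(\bar{Y}_t - \bar{Y}_c) - (\bar{X}_t - \bar{X}_c)] = (\Delta + \tau) - \Delta = \tau \]
\[ E[(\bar{Y}_t - \bar{Y}_c) - \beta(\bar{X}_t - \bar{X}_c)] = (\Delta + \tau) - \beta\Delta = \tau + (1 - \beta)\Delta \]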
Lord’s Paradox
• Lord, F.M. (1967) “A paradox in the
interpretation of group comparisons”,
Psychological Bulletin, 68, 304-305.
“A large university is interested in investigating the
effects on the students of the diet provided in the
university dining halls….Various types of data are
gathered. In particular the weight of each student at
the time of his arrival in September and his weight in
the following June are recorded”
Two Statisticians
Statistician One
• Calculates difference
in weight for each hall
• Finds non-significant
difference in each
case
• (Also no difference
between halls)
Statistician Two
• Adjusts for initial
weight
• Finds significant hall
effect
• Concludes
difference between
halls
A Simulated Example
• Starting and final weights for two groups of
students
• Males and females
• 300 in each group
• Analysis illustrated with S-PLUS
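A simulation of this kind can be sketched in Python rather than S-PLUS. The group means, standard deviation and correlation below are assumptions chosen for illustration, not the values behind the slides. Within each group the initial and final weights have the same mean, so neither group changes on average, but the correlation between them is less than 1.

```python
import math
import random
from statistics import mean

def simulate_group(rng, n, mu, sd=5.0, rho=0.8):
    """Initial (X) and final (Y) weights with equal means and correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mu + sd * z1)
        ys.append(mu + sd * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return xs, ys

rng = random.Random(1967)
x_f, y_f = simulate_group(rng, 300, mu=60.0)   # hypothetical mean for females
x_m, y_m = simulate_group(rng, 300, mu=75.0)   # hypothetical mean for males

# Statistician One: compare mean change scores between the groups.
change_diff = (mean(y_m) - mean(x_m)) - (mean(y_f) - mean(x_f))

# Statistician Two: adjust for initial weight using the pooled within-group slope.
sxy = sxx = 0.0
for xs, ys in ((x_m, y_m), (x_f, y_f)):
    mx, my = mean(xs), mean(ys)
    sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx += sum((x - mx) ** 2 for x in xs)
beta = sxy / sxx
ancova_diff = (mean(y_m) - mean(y_f)) - beta * (mean(x_m) - mean(x_f))

print(round(change_diff, 2), round(beta, 2), round(ancova_diff, 2))
```

With these assumed values the change-score comparison should sit near zero while the adjusted comparison should sit near (75 - 60) x (1 - 0.8) = 3, which is the disagreement between the two statisticians.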
[Figure: scatter plot of Final Weight against Initial Weight (both axes 40 to 100); points are marked f (females) and m (males)]
Statistician One’s Analysis
Paired t-Test
data: Y.males and X.males
t = 0.662, df = 299, p-value = 0.5085
Paired t-Test
data: Y.females and X.females
t = -0.0512, df = 299, p-value = 0.9592
Standard Two-Sample t-Test
data: diff.males and diff.females
t = 0.5017, df = 598, p-value = 0.6161
[Figure: scatter plot of Final Weight against Initial Weight (both axes 40 to 100); points are marked f (females) and m (males)]
Statistician Two’s analysis
Call: lm(formula = Y ~ X + sex)
Coefficients:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)  12.4987      1.5135   8.2584    0.0000
X             0.8087      0.0232  34.9156    0.0000
sex           1.9816      0.2833   6.9945    0.0000
What People Usually Conclude
• Where baseline values are not expected to
be equal between groups ANCOVA can
mislead
• Therefore even though SACS will have a
higher variance it should be preferred
under such circumstances since it is
obviously unbiased
A Counter Counter-Example
• Suppose we design a bizarre clinical trial
• Only persons with diastolic blood pressure
at baseline equal to 95 mmHg or
105 mmHg may enter
• In the first stratum they are randomised 3
to 1 and in the second 1 to 3
• Situation as follows
A Stupid Trial
Numbers of Patients by dbp and Treatment
                                      Treatment
Baseline diastolic blood pressure     A      B    Total
95 mmHg                             300    100      400
105 mmHg                            100    300      400
Total                               400    400      800
Approach to Analysis
• Stratify by baseline dbp
• Produce treatment estimate for each
stratum
• Overall estimate is average of the two
• Stratification deals with the imbalance
An Equivalent Approach
• Create a dummy stratum variable S:
S = -1 if baseline dbp X = 95 mmHg
S = 1 if baseline dbp X = 105 mmHg
• Regress dbp at outcome, Y, on treatment
indicator T and on stratum indicator S
• Estimate will be same as by stratification
• If you want variance estimate to be exactly the
same you need to include interaction also
An Equivalent Equivalent
Approach
• Regress Y on T and X rather than on T and S
– This is called ANCOVA!
• Note that S= (X-100)/5
• Hence this approach is equivalent to the
previous one, which is equivalent to
stratification, which is unbiased
• On the other hand SACS is biased
• Hence we have produced a counter-example
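The chain of equivalences can be checked numerically on simulated data with the patient numbers of this trial. Everything about the outcome model is an assumption made for illustration: the slope `BETA`, the true effect `TAU` of B relative to A, the intercept and the noise level. The ANCOVA estimate is computed from the pooled within-treatment slope of Y on X.

```python
import random
from statistics import mean

rng = random.Random(95105)
BETA, TAU = 0.5, -5.0   # assumed slope and true effect of B relative to A

# (baseline dbp, treatment) -> number of patients, as in the trial described
design = {(95, "A"): 300, (95, "B"): 100, (105, "A"): 100, (105, "B"): 300}

data = []   # tuples (baseline x, treatment, outcome y)
for (x, trt), n in design.items():
    for _ in range(n):
        y = 40.0 + BETA * x + (TAU if trt == "B" else 0.0) + rng.gauss(0.0, 5.0)
        data.append((x, trt, y))

def ys(trt, x=None):
    """Outcomes for one treatment, optionally restricted to one stratum."""
    return [y for (xi, ti, y) in data if ti == trt and (x is None or xi == x)]

def xs(trt):
    """Baselines for one treatment."""
    return [xi for (xi, ti, _) in data if ti == trt]

# Stratification: simple average of the two within-stratum differences.
stratified = mean([mean(ys("B", x)) - mean(ys("A", x)) for x in (95, 105)])

# ANCOVA: adjust the raw difference using the pooled within-treatment slope.
sxy = sxx = 0.0
for trt in ("A", "B"):
    mx, my = mean(xs(trt)), mean(ys(trt))
    sxy += sum((xi - mx) * (y - my) for (xi, ti, y) in data if ti == trt)
    sxx += sum((xi - mx) ** 2 for xi in xs(trt))
beta_hat = sxy / sxx
raw = mean(ys("B")) - mean(ys("A"))
baseline_gap = mean(xs("B")) - mean(xs("A"))   # 102.5 - 97.5 by design
ancova = raw - beta_hat * baseline_gap

# SACS: subtract the whole baseline gap instead.
sacs = raw - baseline_gap

print(round(stratified, 3), round(ancova, 3), round(sacs, 3))
```

In this design the ANCOVA and stratified estimates agree to rounding error, while SACS differs from them by (beta_hat - 1) times the baseline gap of 5 mmHg.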
Conclusion
• Contrary to what is often claimed there are
cases where ANCOVA is unbiased but
SACS is biased.
• No simple statement of the form “ANCOVA
is more efficient but SACS is unbiased” is
possible.
• In fact it is very difficult to imagine cases
where SACS is the preferred analysis
Lord’s Paradox Revisited
• Statistician one assumes that in the absence of
any differential treatment effect the two groups
despite different baselines would show
equivalent changes
• Statistician two assumes that in the absence of
any differential treatment effect the change of
the groups as a whole is the same as the
change within groups
• Both of these causal assumptions are untestable
However
• It is easy to design
trials for which
– ANCOVA is unbiased
– SACS is biased
– A causal interpretation
can be given
• It is rather difficult to
design trials for which
– SACS is unbiased
– ANCOVA is biased
– A causal interpretation
can be given
The Necessary Condition for
ANCOVA to be Unbiased
\[ E[(\bar{Y}_t - \bar{Y}_c) - \beta(\bar{X}_t - \bar{X}_c)] = \tau \]
\[ E[\bar{Y}_t - \bar{Y}_c] = \tau + \beta(\bar{X}_t - \bar{X}_c) \]
\[ E[\bar{Y}_t - \bar{Y}_c] = \tau + \beta\Delta \]
Or in everyday language, that the bias in the raw
comparison at outcome should be β times the bias at
baseline, where β is the individual regression effect
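A small arithmetic sketch may help to contrast this condition with the one under which SACS is unbiased. The function name `bias` and the numerical values (a baseline imbalance of 5 and a regression slope of 0.5) are assumptions for illustration only.

```python
def bias(multiplier, baseline_gap, outcome_gap_excess):
    """Bias of (Ybar_t - Ybar_c) - multiplier * (Xbar_t - Xbar_c).

    outcome_gap_excess is the expected raw difference at outcome minus the
    true treatment effect, that is, the bias of the raw comparison.
    """
    return outcome_gap_excess - multiplier * baseline_gap

DELTA, BETA = 5.0, 0.5   # assumed baseline imbalance and regression slope

# World 1: the baseline imbalance carries over to outcome in full.
sacs_world1 = bias(1.0, DELTA, DELTA)
ancova_world1 = bias(BETA, DELTA, DELTA)

# World 2: the imbalance carries over multiplied by the regression slope.
sacs_world2 = bias(1.0, DELTA, BETA * DELTA)
ancova_world2 = bias(BETA, DELTA, BETA * DELTA)

print(sacs_world1, ancova_world1, sacs_world2, ancova_world2)
```

Which estimator is unbiased therefore depends entirely on how much of the baseline imbalance is expected to persist to outcome, which is a causal assumption rather than something the estimators themselves can settle.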
Cut-off Designs
Trochim and Cappelleri have suggested that in many clinical
trials randomisation will be unethical because some patients by
the nature of their illness may be unwilling to assume the risks
associated with an experimental treatment. They propose a class
of designs called “cut-off” designs in which some patients are
assigned to treatment in a deterministic manner on the basis of
baseline values.
The position, for example, might be as given in the diagram
below.
A Cut-off Design in Hypertension
Mild hypertension: standard treatment only
Moderate hypertension: randomise to standard or experimental
Severe hypertension: experimental treatment only
Cut-off Designs
• Provided that the relationship between
baseline and outcome is linear ANCOVA is
valid
• Cut-off designs are thus a wide class of
design for which ANCOVA is unbiased
• SACS will be biased
• Thus we have more counterexamples to
the claims of Liang and Zeger
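A cut-off design of this kind can be simulated to see both claims at once. The cut-off values (95 and 105 mmHg), the baseline distribution, the slope, the true effect and the noise level are all assumptions made for this sketch; the only structural requirement is that outcome depends linearly on baseline.

```python
import random
from statistics import mean

rng = random.Random(31)
BETA, TAU = 0.6, -8.0        # assumed slope and true treatment effect
LOW, HIGH = 95.0, 105.0      # assumed cut-offs for mild / severe hypertension

x_c, y_c, x_t, y_t = [], [], [], []
for _ in range(2000):
    x = rng.gauss(100.0, 8.0)                 # baseline dbp
    if x < LOW:
        treated = False                       # mild: standard treatment only
    elif x > HIGH:
        treated = True                        # severe: experimental only
    else:
        treated = rng.random() < 0.5          # moderate: randomise
    y = 35.0 + BETA * x + (TAU if treated else 0.0) + rng.gauss(0.0, 4.0)
    (x_t if treated else x_c).append(x)
    (y_t if treated else y_c).append(y)

# Pooled within-group slope of outcome on baseline.
sxy = sxx = 0.0
for xs, ys in ((x_t, y_t), (x_c, y_c)):
    mx, my = mean(xs), mean(ys)
    sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx += sum((x - mx) ** 2 for x in xs)
beta_hat = sxy / sxx

raw = mean(y_t) - mean(y_c)
baseline_gap = mean(x_t) - mean(x_c)          # large by construction
ancova = raw - beta_hat * baseline_gap
sacs = raw - baseline_gap

print(round(baseline_gap, 2), round(ancova, 2), round(sacs, 2))
```

Because assignment depends only on the observed baseline and the relationship is linear, the ANCOVA estimate should land near the assumed effect of -8, whereas SACS is pulled away from it by roughly (0.6 - 1) times the baseline gap.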
A Challenge
• Can you design a trial for which
– SACS is unbiased
– ANCOVA is biased
– A causal interpretation can be given?
Some Schemes That Won’t Work
• Select patients according to true baseline values
– Not possible in practice since not known
– Still won’t work since correlation of true values is not 1
• Select patients according to average of values at
baseline and outcome
– You need a crystal ball
• Select according to some other value
– ANCOVA will be biased but so will SACS
• Select on binary covariate
– Either this is permanent (e.g. sex), in which case causal
inference doubtful
– Or it varies over time in which case there will be a regression
Conclusion
• In RCTs ANCOVA is the appropriate way
to use baseline information
– SACS, responder analysis, NNTs all wasteful
• A hallmark of second rate analysis
• In observational studies things are more
complex
– ANCOVA may not be perfect but it may be the
best you can do
Here there be tygers!