Randomised Controlled Trials in the
Social Sciences
Cluster randomised trials
Martin Bland
Professor of Health Statistics
University of York
www-users.york.ac.uk/~mb55/
Cluster randomised trials
Also called group randomised trials.
Research subjects are not sampled independently, but in
a group.
For example:
• all the patients in a general practice are allocated to
the same intervention, the general practice forming a
cluster,
• all pupils in a school class are allocated to the same
intervention, the class forming a cluster.
Members of a cluster will be more like one
another than they are like members of other
clusters.
We need to take this into account in the analysis
and design.
Methods of analysis which ignore clustering:
• two sample t method,
• chi-squared test for a two way table,
• difference between two proportions,
• relative risk,
• analysis of covariance,
• logistic regression.
Methods which ignore clustering may mislead, because
they assume that all subjects are independent
observations.
Observations within the same cluster are correlated.
Ignoring this correlation may lead to standard errors
which are too small, confidence intervals which are too
narrow, and P values which are too small.
A little simulation
Four cluster means, two in each group, from a Normal
distribution with mean 10 and standard deviation 2.
Generated 10 members of each cluster by adding a
random number from a Normal distribution with mean zero
and standard deviation 1.
The null hypothesis, that there is no difference between the
means in the two populations, is true.
Two-sample t test comparing the means, ignoring the
clustering.
Repeated 1000 times:
600 significant differences, with P<0.05
502 highly significant, with P<0.01.
If the t test ignoring the clustering were valid, we would
expect about 50 significant differences (5%) and 10 highly
significant ones (1%).
The analysis assumes that we have 20 independent
observations in each group. This is not true.
We have two independent clusters of observations, but the
observations in those clusters are really the same thing
repeated ten times.
A valid statistical analysis.
Possible analysis:
• find the means for the four clusters
• carry out a two-sample t test using these four means
only.
1000 simulation runs:
53 (5.3%) significant at P<0.05
14 (1.4%) highly significant at P<0.01
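The simulation described above can be sketched in a few lines. This is a minimal reconstruction, not the original program: the seed, the function names, and the use of tabulated critical values (2.024 for 38 degrees of freedom, 4.303 for 2) in place of exact P values are all assumptions made here. The exact counts will differ from the 600 and 53 quoted above, but the pattern should be the same.

```python
import random
import statistics

# Two-sided 5% points of the t distribution (standard table values).
T_CRIT_DF38 = 2.024  # 20 + 20 observations treated as independent
T_CRIT_DF2 = 4.303   # 2 + 2 cluster means


def pooled_t(a, b):
    """Two-sample t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate_group(rng, n_clusters=2, cluster_size=10):
    """Return one treatment group as a list of clusters."""
    clusters = []
    for _ in range(n_clusters):
        cluster_mean = rng.gauss(10, 2)  # between-cluster variation
        clusters.append([cluster_mean + rng.gauss(0, 1)  # within-cluster
                         for _ in range(cluster_size)])
    return clusters


def run(n_runs=1000, seed=1):
    """Proportion of runs significant at 5% under each analysis."""
    rng = random.Random(seed)
    naive_hits = cluster_hits = 0
    for _ in range(n_runs):
        g1, g2 = simulate_group(rng), simulate_group(rng)
        # Analysis 1: pool all observations, ignoring the clustering.
        flat1 = [x for c in g1 for x in c]
        flat2 = [x for c in g2 for x in c]
        if abs(pooled_t(flat1, flat2)) > T_CRIT_DF38:
            naive_hits += 1
        # Analysis 2: one summary statistic (the mean) per cluster.
        means1 = [statistics.mean(c) for c in g1]
        means2 = [statistics.mean(c) for c in g2]
        if abs(pooled_t(means1, means2)) > T_CRIT_DF2:
            cluster_hits += 1
    return naive_hits / n_runs, cluster_hits / n_runs


naive_rate, cluster_rate = run()
print(f"ignoring clustering: {naive_rate:.1%} significant at 5%")
print(f"cluster means only:  {cluster_rate:.1%} significant at 5%")
```

Because the null hypothesis is true by construction, any rate well above 5% is spurious significance.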
Simulation is very extreme.
Two groups of two clusters and a very large cluster
effect.
Have seen a proposed study with two groups of two
clusters.
A smaller cluster effect would only reduce the shrinking of
the P values; it would not remove it.
Simulation shows that spurious significant differences
can occur if we ignore the clustering.
Example: GP Education Trial
Trial of General Practitioner education to
improve treatment of asthma.
Educate GPs in small groups, or not, and
evaluate this education by giving repeated
questionnaires to their asthmatic patients.
Asked for my views on the sample size
calculations.
Original: ignored the clustering and the
GPs, and treated the design as a
comparison of two groups of patients.
Revised: produced a sample size
calculation based primarily on the
number of GPs, not patients.
The trial was funded and a research
fellow, a GP, appointed.
The cluster nature of the study was
self-evident to me. It was not self-evident to
the research fellow!
Many researchers find the importance of
clustering very hard to understand.
The study appeared including the following
description of the analysis:
‘For each general practitioner a score was
calculated for each questionnaire item. Analysis
of variance was then carried out for each
questionnaire item to compare the three groups . . .’
How big is the effect of clustering?
The design effect is the factor by which we must multiply
the sample size of a trial which is not clustered to achieve
the same power in a clustered trial.
Alternatively, the power of a cluster randomised trial is the
power of an individually randomised trial whose size is the
cluster trial's size divided by the design effect.
Design effect:
Deff = 1 + (m − 1)×ICC
where m is the number of observations in a cluster and ICC
is the intra-cluster correlation coefficient, the correlation
between pairs of subjects chosen at random from the same
cluster.
ICC is usually quite small, 0.04 is a
typical figure.
If m = 1 (cluster size one, no clustering),
then Deff = 1; otherwise, with a positive ICC,
Deff will exceed 1.
If we estimate the required sample size
ignoring clustering, we must multiply it by
the design effect to get the sample size
required for the clustered sample.
Alternatively, if the sample size is estimated
ignoring the clustering, the clustered sample
has the same power as a simple sample
whose size is our sample size divided by
the design effect.
If we analyse the data as if there were no
clusters, the variances of the estimates
must be multiplied by Deff, hence the
standard error must be multiplied by the
square root of Deff.
Clustering may have a large effect if the ICC is large OR if
the cluster size is large.
E.g., if ICC = 0.001 and cluster size = 500, the design effect
will be 1 + (500 − 1) × 0.001 = 1.499, or about 1.5.
We need to increase the sample size by 50% to achieve the
same power as an unclustered trial.
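The design effect arithmetic is easy to wrap in small helper functions. The sketch below is illustrative only: the function names are invented here, and the unclustered sample size of 200 is a made-up figure used to show the inflation.

```python
import math


def design_effect(m, icc):
    """Deff = 1 + (m - 1) * ICC for clusters of size m."""
    return 1 + (m - 1) * icc


def clustered_sample_size(n_unclustered, m, icc):
    """Sample size a clustered trial needs for the same power."""
    return math.ceil(n_unclustered * design_effect(m, icc))


def effective_sample_size(n_clustered, m, icc):
    """Size of the individually randomised trial with the same power."""
    return n_clustered / design_effect(m, icc)


def se_inflation(m, icc):
    """Factor by which a standard error that ignores clustering is too small."""
    return math.sqrt(design_effect(m, icc))


# Tiny ICC, large clusters: the example in the text.
deff = design_effect(500, 0.001)
print(round(deff, 3))  # 1.499

# Hypothetical unclustered requirement of 200 subjects.
print(clustered_sample_size(200, 500, 0.001))
print(round(se_inflation(500, 0.001), 3))
```

Keeping the cluster size and the ICC as separate arguments makes it plain that either one can drive the design effect up.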
Need to estimate variances both within and between
clusters.
If the number of clusters is small, the between clusters
variance will have few degrees of freedom and we will be
using the t distribution in inference rather than the Normal.
This too will cost in terms of power.
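One common way to estimate the within and between cluster variation is a one-way analysis of variance, from which the ICC can be estimated as (MSB − MSW) / (MSB + (m − 1) × MSW) when all clusters have the same size m. The sketch below is an illustration and not a method taken from these slides: it assumes equal cluster sizes, and it uses simulated data with the same standard deviations as the little simulation (2 between clusters, 1 within), so the true ICC is 4 / (4 + 1) = 0.8. The 200 clusters are an arbitrary choice made so that the estimate is stable.

```python
import random
import statistics


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equal-sized clusters."""
    k = len(clusters)
    m = len(clusters[0])
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand_mean = statistics.mean([x for c in clusters for x in c])
    cluster_means = [statistics.mean(c) for c in clusters]
    # Between-cluster mean square, k - 1 degrees of freedom.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    # Within-cluster mean square, k * (m - 1) degrees of freedom.
    msw = sum((x - cm) ** 2
              for c, cm in zip(clusters, cluster_means)
              for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


rng = random.Random(0)
clusters = []
for _ in range(200):  # arbitrary number of simulated clusters
    cluster_mean = rng.gauss(10, 2)
    clusters.append([cluster_mean + rng.gauss(0, 1) for _ in range(10)])

icc_hat = anova_icc(clusters)
print(f"estimated ICC: {icc_hat:.2f} (true value 0.8)")
```

With only a handful of clusters the between-cluster mean square has very few degrees of freedom, so the estimate would be far less stable than it is here.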
Example: a grant application
An evaluation of a peer-led health
education intervention.
A comparison of two groups each of
two clusters (counties) of about 750
people each.
Applicants were aware of the problem of
cluster randomisation, but did not give
any assessment of its likely impact on
the power of the study, except to say
that the intra-cluster correlation was
"small", i.e. 0.005 based on a US study.
Deff = 1 + (m − 1)×ICC
For the proposed design, the mean
number of subjects in a cluster was
about 750, so
Deff = 1 + (750 − 1) × 0.005 = 4.745, or about 4.75
Thus the estimated sample size for any
given comparison should be multiplied
by 4.75.
With four clusters of about 750 people,
3000 in total, we have the same power as
an individually randomised sample of about
3000/4.75 = 630.
Degrees of freedom
In large sample approximation sample
size calculations, power 80% and alpha
5% are embodied in the multiplier
(0.85 + 1.96)² = 7.90.
For a small sample calculation using the t
test, 1.96 must be replaced by the
corresponding 5% point of the t distribution
with the appropriate degrees of freedom.
With four clusters in two groups there are
2 degrees of freedom, which gives t = 4.30.
Hence the sample size multiplier is
(0.85 + 4.30)² = 26.52,
3.36 times that for the large sample.
This will reduce the effective sample size
even more, down to 630/3.36 = 188.
Thus the 3000 men in two groups of two
clusters will give the same power to detect
the same difference as 188 men
randomised individually.
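The whole chain of arithmetic for this grant application can be checked in a few lines. The figures are those quoted above (3000 subjects, clusters of about 750, ICC 0.005, and the multipliers 0.85, 1.96 and 4.30); the variable names are invented here. The unrounded design effect is 4.745 rather than 4.75, so the intermediate values differ slightly from the rounded ones on the slides.

```python
def design_effect(m, icc):
    """Deff = 1 + (m - 1) * ICC for clusters of size m."""
    return 1 + (m - 1) * icc


n_total = 3000      # four clusters of about 750
cluster_size = 750
icc = 0.005

deff = design_effect(cluster_size, icc)
n_after_deff = n_total / deff  # about 630 after allowing for clustering

# Sample size multipliers for 80% power and 5% significance.
z_power = 0.85
z_alpha = 1.96       # large sample (Normal) 5% point
t_alpha_2df = 4.30   # 5% point of t with 2 degrees of freedom

large_sample_mult = (z_power + z_alpha) ** 2
small_sample_mult = (z_power + t_alpha_2df) ** 2
df_penalty = small_sample_mult / large_sample_mult

n_effective = n_after_deff / df_penalty

print(f"design effect:         {deff:.3f}")
print(f"after design effect:   {n_after_deff:.0f}")
print(f"df penalty:            {df_penalty:.2f}")
print(f"effective sample size: {n_effective:.0f}")
```

The two penalties multiply: the design effect cuts the 3000 to about 630, and the loss of degrees of freedom cuts that again by a factor of more than three.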
This proposal came back with many
more clusters.
Cluster size small, large number of clusters, small ICC:
Design effect close to one.
Little effect if the clustering is ignored.
E.g. randomised controlled trial of the effects of
coordinating care for terminally ill cancer patients
(Addington-Hall et al., 1992).
554 patients randomised by GP. About 200 GPs, so
most clusters had only a few patients.
Ignored the clustering.
Several approaches can be used to allow for clustering:
• summary statistic for each cluster
• adjust standard errors using the design effect
• robust variance estimates
• generalised estimating equation models (GEEs)
• multilevel modelling
• Bayesian hierarchical models
• others
Any method which takes into account the
clustering will be a vast improvement
compared to methods which do not.
A refereeing case study
Paper sent to me for refereeing in 1997 by the BMJ.
Study of the impact of a specialist
outreach team on the quality of
nursing and residential home care.
Intervention carried out at the
residential home level.
Eligible homes were put into matched
pairs and one of each pair
randomised to intervention.
Thus the randomisation was
clustered.
Intervention was applied to the care staff,
not to the patients.
The residents in the home were used to
monitor the effect of the intervention on the
staff.
Clustering was totally ignored in the
analysis.
Used the patient as the unit of analysis.
Carried out a Mann-Whitney test of the scores between the
two groups at baseline. This was not significant.
Mann-Whitney test at follow-up, completely ignoring the
baseline measurements.
Wilcoxon matched pairs test for each group separately and
found that one was significant and the other not.
Possible approaches
Summary statistic for the home, e.g. the mean change in
score. These could then be compared using a t method.
As the homes were randomised within pairs, I suggested the
paired t method. (This may not be right, as the matching
variables may not be informative and the loss of degrees of
freedom may be a problem.)
The results should be given as a difference in mean change,
with a confidence interval as recommended in the BMJ’s
guidelines to authors, rather than as a P value.
Alternative: fit a multi-level model, with homes as one level of
variability, subjects another, and variation within subjects a
third. A job for a professional statistician.
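The summary statistic approach suggested above, with homes analysed in their matched pairs, might look like the sketch below. The numbers are hypothetical and chosen only for illustration; they are not data from the study. The sketch assumes six matched pairs, so the 5% point of the t distribution with 5 degrees of freedom (2.571) is used; a different number of pairs would need a different critical value.

```python
import math
import statistics

# Two-sided 5% point of t with 5 degrees of freedom (6 pairs - 1).
T_CRIT_DF5 = 2.571

# HYPOTHETICAL mean change in score for each home, one tuple per
# matched pair: (intervention home, control home).
pairs = [(2.1, 0.4), (1.3, 0.9), (0.2, -0.5),
         (1.8, 1.1), (0.9, 0.6), (2.4, 0.3)]


def paired_difference_ci(pairs, t_crit):
    """Mean within-pair difference and its confidence interval.

    t_crit must be the t value for len(pairs) - 1 degrees of freedom.
    """
    diffs = [intervention - control for intervention, control in pairs]
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff - t_crit * se, mean_diff + t_crit * se


mean_diff, lower, upper = paired_difference_ci(pairs, T_CRIT_DF5)
print(f"difference in mean change: {mean_diff:.2f} "
      f"(95% CI {lower:.2f} to {upper:.2f})")
```

The unit of analysis is the home, so the sample size here is the number of pairs of homes, not the number of residents.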
What happened next?
The paper was rejected.
Study reported in the Lancet!
Extra author, a well-known medical statistician.
‘The unit of randomisation in the study was the
residential home and not the resident. Thus, all data
were analysed by use of general estimated equation
models to adjust for clustering effects within homes.
. . . Clinical data are presented as means with 95% CIs
calculated with Huber variance estimates.’
I looked for the acknowledgement
to an unknown referee, in vain.
Reviews of published trials
There have been several reviews of published
cluster randomised trials in medical
applications.
Some reviews of published cluster randomised trials

| Authors | Source | Years | Clustering allowed for in sample size | Clustering allowed for in analysis |
|---|---|---|---|---|
| Donner et al. (1990) | 16 non-therapeutic intervention trials | 1979-1989 | <20% | <50% |
| Simpson et al. (1995) | 21 trials from American Journal of Public Health and Preventive Medicine | 1990-1993 | 19% | 57% |
| Isaakidis and Ioannidis (2003) | 51 trials in Sub-Saharan Africa | 1973-2001 (half post 1995) | 20% | 37% |
| Puffer et al. (2003) | 36 trials in British Medical Journal, Lancet, and New England Journal of Medicine | 1997-2002 | 56% | 92% |
| Eldridge et al. (in press 2003) | 152 trials in primary health care | 1997-2000 | 20% | 59% |
Importance for the evidence base
Incorrect analyses may produce false conclusions.
Sample sizes may be too small.
Key references
Murray DM. (1998) The Design and Analysis of Group-Randomized Trials. Oxford University Press.
Donner A, Klar N. (2000) Design and Analysis of Cluster
Randomised Trials in Health Research. London, Arnold.
Many papers by Alan Donner and colleagues.
Campbell MK, Elbourne DR, Altman DG for the
CONSORT Group. The CONSORT statement: extension
to cluster randomised trials. Submitted for publication.
Bland JM, Kerry SM, Altman DG. Statistics Notes series
in British Medical Journal, numbers 29-34
– www-users.york.ac.uk/~mb55/
Publications on cluster designs
How-to-do-it papers.
Statistics notes in the BMJ.
Articles in GP journals.
Special editions of Statistical Methods in
Medical Research and Statistics in Medicine.
Papers reporting intraclass correlation
coefficients to help others to design clustered
studies.
Web of Knowledge search on: randomi* in
clusters OR cluster randomi*
[Figure: number of papers found per year, 1981 to 2005 (vertical axis 0 to 140), plotted for all papers, trials, and methods papers.]
This is not a thorough search and will have
missed many studies.
2001 includes special issues of Statistics in
Medicine and Statistical Methods in
Medical Research on cluster
randomisation.
Ignores papers using clusters in
observational studies.
Ignores other terms e.g. ‘group randomised’.
Cornfield (1978) ‘Randomisation by group: A formal
analysis’ includes the following:
‘Randomization by cluster accompanied by an analysis
appropriate to randomization by individual is an exercise in
self-deception, however, and should be discouraged.’
Murray (1998). The Design and Analysis of Group-Randomized Trials. Oxford University Press.
Are any of these trials “social science”?
van der Molen HF, Sluiter JK, Hulshof CTJ, Vink P, van
Duivenbooden C, Holman R, Frings-Dresen MHW.
Implementation of participatory ergonomics intervention in
construction companies. Scandinavian Journal of Work
Environment & Health 31, 191-204.
Study objective: The effectiveness of the implementation
of participatory ergonomics intervention to reduce physical
work demands in construction work was studied.
Shemilt I, Harvey I, Shepstone L, Swift L, Reading R,
Mugford M, Belderson P, Norris N, Thoburn J, Robinson J.
(2004) A national evaluation of school breakfast clubs:
evidence from a cluster randomized controlled trial and an
observational analysis. Child Care Health and
Development 30, 413-427.
Study objective: To measure the health, educational and
social impacts of breakfast club provision in schools
serving deprived areas across England.
Also Shemilt I, Mugford M, Moffatt P, Harvey I, Reading R,
Shepstone L, Belderson P. (2004) A national evaluation of
school breakfast clubs: where does economics fit in? Child
Care Health and Development 30, 429-437.
Strang J, McCambridge J. (2004) Can the practitioner
correctly predict outcome in motivational interviewing?
Journal of Substance Abuse Treatment 27, 83-88.
Study objective: We have examined whether practitioner
ratings (immediately post-intervention) or other recorded
characteristics of a single-session 1-hour motivational
intervention were predictive of 3-month cannabis use
outcome.
Stephenson JM, Strange V, Forrest S, Oakley A, Copas A,
Allen E, Babiker A, Black S, Ali M, Monteiro H, Johnson
AM. (2004) Pupil-led sex education in England (RIPPLE
study): cluster-randomised intervention trial. Lancet 364,
338-346.
Study objective: Improvement of sex education in schools
is a key part of the UK government's strategy to reduce
teenage pregnancy in England. We examined the
effectiveness of one form of peer-led sex education in a
school-based randomised trial of over 8000 pupils.
Kendrick D, Royal S (2004) Cycle helmet ownership and
use; a cluster randomised controlled trial in primary
school children in deprived areas. Archives of Disease in
Childhood 89, 330-335.
Study objective: To assess the effectiveness of two
different educational interventions plus free cycle helmets,
in increasing cycle helmet ownership and use.
Conclusions
• The effects of clustering can be large, inflating Type I
errors.
• This may not be obvious to researchers, even to
statisticians. (Quandoque bonus dormitat Homerus: even the
worthy Homer sometimes nods; even the greatest get it
wrong.)
• There are many ways to allow for clustering.
• The number of cluster randomised trials published has
increased greatly.
• The effects of clustering have often been ignored.
• The situation has improved.
Recommendations
• Keep up the pressure.
• Extend to specialist journals.