CONSORT extension for non-inferiority and equivalence trials
Title and Abstract
1a – Title
Description:
Identification as a noninferiority randomised trial in the title
Explanation:
Readers should be able to easily identify from the title or abstract that the study was a noninferiority or
equivalence randomised trial. Including the design in the title or abstract also ensures ease of identification of
these studies in a literature search for inclusion in systematic reviews.
Example:
“Dabigatran Etexilate Versus Enoxaparin for Prevention of Venous Thromboembolism After Total Hip Replacement:
A Randomised, Double-Blind, Non-Inferiority Trial.”42
1b – Abstract
Description:
Table 1: Information to Include in the Abstract of a Report of a Noninferiority or Equivalence Randomised Trial: Extension of CONSORT for Abstracts Checklist7,8,a,b

| Item | Standard Checklist Item | Extension for Noninferiority Trials |
| --- | --- | --- |
| Title | Identification of study as randomised | Identification of study as a noninferiority trial |
| Trial design | Description of the trial design (e.g., parallel, cluster, noninferiority) | |
| Methods | | |
| Participants | Eligibility criteria for participants and the settings in which the data were collected | |
| Interventions | Interventions intended for each group | |
| Objective | Specific objective or hypothesis | Specific hypothesis concerning noninferiority, including noninferiority margin |
| Outcome | Clearly defined primary outcome for this report | Clarify for all reported outcomes whether noninferiority or superiority |
| Randomisation | How participants were allocated to interventions | |
| Blinding (masking) | Whether participants, caregivers, and those assessing the outcomes were blinded to group assignment | |
| Results | | |
| Numbers randomised | Numbers of participants randomised to each group | |
| Recruitment | Trial status | |
| Numbers analyzed | Number of participants analyzed in each group | |
| Outcome | For the primary outcome, a result for each group and the estimated effect size and its precision | For the primary noninferiority outcome, results in relation to noninferiority margin |
| Harms | Important adverse events or side effects | |
| Conclusions | General interpretation of the results | Interpretation taking into account the noninferiority hypotheses and any superiority hypotheses |
| Trial registration | Registration number and name of trial register | |
| Funding | Source of funding | |

Abbreviation: CONSORT, Consolidated Standards of Reporting Trials.
a This checklist relates to noninferiority trials, but the same issues apply to equivalence trials.
b This checklist may be republished without restriction.
Explanation:
Clear, transparent, and sufficiently detailed abstracts are important. Readers may only have access to the abstract,
and many others skim it before deciding whether to read further. A well-written abstract also helps in retrieval of
relevant reports from electronic databases. In 2008, a CONSORT extension for reporting abstracts was published,7,8
and those recommendations were incorporated into CONSORT 2010. For noninferiority studies, the study
design24 and the noninferiority margin32 are poorly reported in abstracts. In addition to the items recommended
for all trials, abstracts for noninferiority RCTs should specify the noninferiority hypothesis, identify the primary
outcome and noninferiority margin, and make clear whether hypotheses for other reported outcomes are
noninferiority or superiority. The results should relate the primary noninferiority outcome to the noninferiority
margin. The overall interpretation should take account of noninferiority and also any superiority hypotheses (Table 1).
Example:
This example details only those parts relevant to noninferiority.43
TITLE: IDENTIFICATION OF STUDY AS A NONINFERIORITY TRIAL. “Duloxetine, Pregabalin, and Duloxetine plus Gabapentin for
Diabetic Peripheral Neuropathic Pain Management in Patients With Inadequate Pain Response to Gabapentin: An
Open-Label, Randomised, Noninferiority Comparison.”
METHODS-OBJECTIVE: SPECIFIC HYPOTHESIS CONCERNING NONINFERIORITY, INCLUDING NONINFERIORITY MARGIN. “To determine
whether duloxetine is noninferior to (as good as) pregabalin in the treatment of pain associated with diabetic
peripheral neuropathy. . . . Noninferiority would be declared if the mean improvement [in the weekly mean of the
diary-based daily pain score] for duloxetine was no worse than the mean improvement for pregabalin, within
statistical variability, by a margin of -0.8 unit.”
METHODS-OUTCOME: CLARIFY FOR ALL REPORTED OUTCOMES WHETHER NONINFERIORITY OR SUPERIORITY. “The primary objective
was a noninferiority comparison between duloxetine and pregabalin on improvement in the weekly mean of the
diary-based daily pain score (0- to 10-point scale) at end point.” “. . . adverse effects, nausea, insomnia,
hyperhidrosis, and decreased appetite [were secondary outcomes to be assessed for superiority].”
RESULTS-OUTCOME: FOR THE PRIMARY NONINFERIORITY OUTCOME, RESULTS IN RELATION TO NONINFERIORITY MARGIN. “The 97.5%
lower confidence limit was a -0.05 difference in means, establishing noninferiority.”
CONCLUSIONS: INTERPRETATION TAKING INTO ACCOUNT THE NONINFERIORITY HYPOTHESES AND ANY SUPERIORITY HYPOTHESES.
“Duloxetine was noninferior to pregabalin for the treatment of pain in patients with diabetic peripheral
neuropathy who had an inadequate pain response to gabapentin.”
Introduction
2a – Background
Description:
Rationale for using noninferiority design
Explanation:
The rationale for using a noninferiority design should include evidence for the efficacy of the reference treatment
in a similar context. If previous trials (preferably as part of a systematic review) demonstrated the superiority of
the reference treatment relative to placebo (or an equivalent, such as “usual care” for nonpharmacological
interventions) they should be cited, preferably with effect sizes and CIs. If no such trials exist, other evidence for
efficacy of the reference treatment should be given. Evidence for other potential advantages of the new treatment
over the reference treatment should be summarised, to justify use of the new treatment if it should be shown to
be noninferior. One aim of the current trial might be to provide or support such evidence. (See also checklist items
4a, 5, and 6.)
Example:
“German guidelines consider adjuvant fluorouracil the standard of care [for locally advanced rectal
cancer]. Optimisation of local tumour control has meant that distant metastases now represent the most common
type of treatment failure in rectal cancer. . . . Capecitabine is an oral fluoropyrimidine derivative that was as
effective as fluorouracil plus folinic acid for adjuvant treatment of stage III colon cancer. It was also non-inferior to
infusional fluorouracil in combination with oxaliplatin for first-line treatment of metastatic colorectal cancer . . . no
randomised trial has compared capecitabine with perioperative fluorouracil in locally advanced disease. Our choice
of a non-inferiority trial design was based on the expectation that non-inferiority of capecitabine, given orally on
an outpatient basis, would be sufficient to tip the risk-benefit ratio in its favour.”44
2b – Objectives
Description:
Hypotheses concerning noninferiority, specifying the noninferiority margin with the rationale for its choice.
Explanation:
The authors should specify for which outcomes noninferiority hypotheses apply and for which superiority
hypotheses apply. Usually the noninferiority hypothesis refers to the primary end point, whereas the new
treatment is expected to offer other advantages, e.g., fewer adverse effects or lower cost. If the trial is multigroup
or the treatments have a factorial structure, the comparisons to which the noninferiority hypothesis applies should
be specified. If sequential testing of noninferiority and superiority hypotheses was planned, that should also be
reported. The rationale for the choice of the noninferiority margin and whether the margin is based on a relative
or absolute scale should be specified, because relative measures tend to make it harder to conclude
noninferiority, particularly when observed rates turn out to be smaller than the expected rates.46,47 The method
used to set the margin of noninferiority should be reported. Conventionally, the margin is taken as the size of the
effect considered clinically irrelevant. That approach might show an ineffective new treatment as noninferior if the
margin is too large in relation to the effect of the reference treatment compared with placebo. To prove that the
new treatment is effective, the effect retention or putative placebo method has been proposed (eAppendix),36 and
it should be used where possible if the noninferiority trial is intended to support drug approval.35,47
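To make the effect-retention approach concrete, here is a minimal Python sketch using the figures quoted in the item 7a example below (clopidogrel vs placebo odds ratio 1.377, 95% CI 1.155 to 1.645, with at least half the effect to be retained). The variable names are ours, and this is one common variant of the method, not a definitive recipe.

```python
# Sketch of the "effect retention" (putative placebo) approach to setting a
# noninferiority margin. Figures are those quoted in the item 7a example:
# reference-vs-placebo OR 1.377 (95% CI 1.155 to 1.645), retaining >= 50%.
import math

lower_ci_reference_effect = 1.155  # conservative (lower) CI limit of the reference effect
retention_fraction = 0.5           # fraction of the reference effect that must be preserved

# Work on the log-odds scale, where treatment effects combine additively.
margin = math.exp(retention_fraction * math.log(lower_ci_reference_effect))
print(f"Noninferiority margin (odds ratio scale): {margin:.3f}")  # ~1.075, as in the trial
```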
Example:
“A sequential analysis for the antiplatelet comparison was developed and planned to first test the noninferiority of
aspirin plus extended release dipyridamole as compared with clopidogrel. If this condition was satisfied, then the
superiority of aspirin plus extended-release dipyridamole over clopidogrel could be assessed in a second test of the
conventional null hypothesis of no difference between the two treatments. Confirmation of noninferiority in this
trial involved the prespecification of a hazard ratio for aspirin plus extended release dipyridamole, as compared
with clopidogrel, that is below a predefined margin. The margin was defined in the following way. . . .”45 (See item 7a.)
Methods
3a – Trial design
Description:
Description of trial design (such as parallel, factorial) including allocation ratio
Explanation:
The word “design” is often used to refer to all aspects of how a trial is set up, but it also has a narrower
interpretation. Many specific aspects of the broader trial design, including details of randomisation and blinding,
are addressed elsewhere in the CONSORT checklist. Here we seek information on the type of trial, such as parallel
group or factorial, and the conceptual framework, such as superiority or non-inferiority, and other related issues
not addressed elsewhere in the checklist.
The CONSORT statement focuses mainly on trials with participants individually randomised to one of two “parallel”
groups. In fact, little more than half of published trials have such a design.(16) The main alternative designs are
multi-arm parallel, crossover, cluster,(40) and factorial designs.(39) Also, most trials are set to identify the
superiority of a new intervention, if it exists, but others are designed to assess non-inferiority or equivalence. It is
important that researchers clearly describe these aspects of their trial, including the unit of randomisation (such as
patient, GP practice, lesion). It is desirable also to include these details in the abstract (see item 1b).
If a less common design is employed, authors are encouraged to explain their choice, especially as such designs
may imply the need for a larger sample size or more complex analysis and interpretation.
Although most trials use equal randomisation (such as 1:1 for two groups), it is helpful to provide the allocation
ratio explicitly. For drug trials, specifying the phase of the trial (I-IV) may also be relevant.
Example:
“This was a multicenter, stratified (6 to 11 years and 12 to 17 years of age, with imbalanced randomisation [2:1]),
double-blind, placebo-controlled, parallel-group study conducted in the United States (41 sites).”(85)
3b – Changes to trial design
Description:
Important changes to methods after trial commencement (such as eligibility criteria), with reasons
Explanation:
A few trials may start without any fixed plan (that is, are entirely exploratory), but most will have a protocol
that specifies in great detail how the trial will be conducted. There may be deviations from the original protocol, as
it is impossible to predict every possible change in circumstances during the course of a trial. Some trials will
therefore have important changes to the methods after trial commencement.
Changes could be due to external information becoming available from other studies, or internal financial
difficulties, or could be due to a disappointing recruitment rate. Such protocol changes should be made without
breaking the blinding on the accumulating data on participants’ outcomes. In some trials, an independent data
monitoring committee will have as part of its remit the possibility of recommending protocol changes based on
seeing unblinded data. Such changes might affect the study methods (such as changes to treatment regimens,
eligibility criteria, randomisation ratio, or duration of follow-up) or trial conduct (such as dropping a centre with
poor data quality).(87)
Some trials are set up with a formal “adaptive” design. There is no universally accepted definition of these designs,
but a working definition might be “a multistage study design that uses accumulating data to decide how to modify
aspects of the study without undermining the validity and integrity of the trial.”(88) The modifications are usually
to the sample sizes and the number of treatment arms and can lead to decisions being made more quickly and
with more efficient use of resources. There are, however, important ethical, statistical, and practical issues in
considering such a design.(89) (90)
Whether the modifications are explicitly part of the trial design or in response to changing circumstances, it is
essential that they are fully reported to help the reader interpret the results. Changes from protocols are not
currently well reported. A review of comparisons with protocols showed that about half of journal articles
describing RCTs had an unexplained discrepancy in the primary outcomes.(57) Frequent unexplained discrepancies
have also been observed for details of randomisation, blinding,(91) and statistical analyses.(92)
Example:
“Patients were randomly assigned to one of six parallel groups, initially in 1:1:1:1:1:1 ratio, to receive either one of
five otamixaban … regimens … or an active control of unfractionated heparin … an independent Data Monitoring
Committee reviewed unblinded data for patient safety; no interim analyses for efficacy or futility were done.
During the trial, this committee recommended that the group receiving the lowest dose of otamixaban (0·035
mg/kg/h) be discontinued because of clinical evidence of inadequate anticoagulation. The protocol was
immediately amended in accordance with that recommendation, and participants were subsequently randomly
assigned in 2:2:2:2:1 ratio to the remaining otamixaban and control groups, respectively.”(86)
4a – Participants
Description:
Whether participants in the noninferiority trial are similar to those in any trial(s) that established efficacy of the
reference treatment.
Explanation:
Because an inference of noninferiority relies on evidence that the reference treatment is effective (see “Assay
Sensitivity” in eAppendix), relevant differences in participants’ characteristics compared with previous trials should
be reported and explained. Such description should concentrate on differences that might affect response to
treatments. For continuous variables it is important to provide not just the mean values but also an indication of
variability (e.g., standard deviation).
Example:
“[We] enrolled 6628 men and women in 312 health centres in Sweden . . . who had hypertension (blood pressure
>180 mm Hg systolic, >105 mm Hg diastolic, or both), aged 70-84 years. The only difference in inclusion criteria
between this trial and the STOP-Hypertension trial was that patients with isolated systolic hypertension could be
included in STOP Hypertension-2, based on previous positive findings in patients with isolated systolic
hypertension treated with diuretics and calcium antagonists.”48
4b – Study settings
Description:
Settings and locations where the data were collected
Explanation:
Along with the eligibility criteria for participants (see item 4a) and the description of the interventions (see item
5), information on the settings and locations is crucial to judge the applicability and generalisability of a trial. Were
participants recruited from primary, secondary, or tertiary health care or from the community? Healthcare
institutions vary greatly in their organisation, experience, and resources and the baseline risk for the condition
under investigation. Other aspects of the setting (including the social, economic, and cultural environment and the
climate) may also affect a study’s external validity.
Authors should report the number and type of settings and describe the care providers involved. They should
report the locations in which the study was carried out, including the country, city if applicable, and immediate
environment (for example, community, office practice, hospital clinic, or inpatient unit). In particular, it should be
clear whether the trial was carried out in one or several centres (“multicentre trials”). This description should
provide enough information so that readers can judge whether the results of the trial could be relevant to their
own setting. The environment in which the trial is conducted may differ considerably from the setting in which the
trial’s results are later used to guide practice and policy.(94) (99) Authors should also report any other information
about the settings and locations that could have influenced the observed results, such as problems with
transportation that might have affected patient participation or delays in administering interventions.
Example:
“The study took place at the antiretroviral therapy clinic of Queen Elizabeth Central Hospital in Blantyre, Malawi,
from January 2006 to April 2007. Blantyre is the major commercial city of Malawi, with a population of 1 000 000
and an estimated HIV prevalence of 27% in adults in 2004.”(93)
5 – Interventions
Description:
Whether the reference treatment in the noninferiority trial is identical (or very similar) to that in any trial(s) that
established efficacy.
Explanation:
Any differences between the control intervention in the current trial and in the previous trial(s) in which efficacy
was established should be reported and explained. For example, differences may exist because patient
management changes with time and concomitant therapies may differ.50 Doses may differ: if the dose of the
reference treatment is reduced, it might result in reduced efficacy; if it is increased, possibly leading to tolerability
problems, the advantages of the new treatment could be overestimated.
Example:
“The current international definition [of active management of the third stage of labour (AMTSL)] comprises:
administration of oxytocin soon after delivery of the baby; controlled cord traction; and uterine massage after
delivery of the placenta. . . . Randomised trials of [AMTSL] . . . included early clamping and cutting of the cord [full
package, the reference treatment]. The experimental intervention assessed in the trial was the simplified package,
in which placental delivery was allowed to occur with the aid of gravity and maternal effort [full package without
controlled cord traction]. The full package practised in the trial was similar to the way it has been executed in other
AMTSL trials except for delayed cord clamping.”49
6a – Outcomes
Description:
Specify the noninferiority outcome(s) and whether hypotheses for main and secondary outcome(s) are
noninferiority or superiority. Whether the outcomes in the noninferiority trial are identical (or very similar) to
those in any trial(s) that established efficacy of the reference treatment.
Explanation:
Any differences in outcome measures in the new trial compared with trials that established efficacy of the
reference treatment should be noted and explained. In particular, authors should note any differences in the
timing of evaluation. Ideally, outcomes should not be changed, but changes may be indicated by improvements in
the understanding, management, and prognosis of a disease. For example, early acquired immunodeficiency
syndrome (AIDS) trials had death as the primary outcome, but as deaths became uncommon, the focus shifted to
AIDS clinical events, then shifted again to surrogate markers as clinical events became uncommon.
Example:
“[S]even large, randomised, placebo-controlled trials involving a total of 16,770 patients who underwent
percutaneous interventions have established that the overall reduction in the risk of death or nonfatal myocardial
infarction 30 days after adjunctive inhibition of platelet glycoprotein IIb/IIIa receptors is 38 percent [relative
reduction]. . . . The primary end point [in the present trial] was a composite of death, nonfatal myocardial
infarction, or urgent target-vessel revascularisation within 30 days after the index procedure.”51
6b – Changes to outcomes
Description:
Any changes to trial outcomes after the trial commenced, with reasons
Explanation:
There are many reasons for departures from the initial study protocol (see item 24). Authors should report all
major changes to the protocol, including unplanned changes to eligibility criteria, interventions, examinations, data
collection, methods of analysis, and outcomes. Such information is not always reported.
As indicated earlier (see item 6a), most trials record multiple outcomes, with the risk that results will be reported
for only a selected subset (see item 17). Pre-specification and reporting of primary and secondary outcomes
(see item 6a) should remove such a risk. In some trials, however, circumstances require a change in the way an
outcome is assessed or even, as in the example above, a switch to a different outcome. For example, there may be
external evidence from other trials or systematic reviews suggesting the end point might not be appropriate, or
recruitment or the overall event rate in the trial may be lower than expected.(112) Changing an end point based
on unblinded data is much more problematic, although it may be specified in the context of an adaptive trial
design.(88) Authors should identify and explain any such changes. Likewise, any changes after the trial began of
the designation of outcomes as primary or secondary should be reported and explained.
A comparison of protocols and publications of 102 randomised trials found that 62% of trials reports had at least
one primary outcome that was changed, introduced, or omitted compared with the protocol.(55) Primary
outcomes also differed between protocols and publications for 40% of a cohort of 48 trials funded by the Canadian
Institutes of Health Research.(113) Not one of the subsequent 150 trial reports mentioned, let alone explained,
changes from the protocol. Similar results from other studies have been reported recently in a systematic review
of empirical studies examining outcome reporting bias.(57)
Example:
“The original primary endpoint was all-cause mortality, but, during a masked analysis, the data and safety
monitoring board noted that overall mortality was lower than had been predicted and that the study could not be
completed with the sample size and power originally planned. The steering committee therefore decided to adopt
co-primary endpoints of all-cause mortality (the original primary endpoint), together with all-cause mortality or
cardiovascular hospital admissions (the first prespecified secondary endpoint).”(112)
7a – Sample size
Description:
Whether the sample size was calculated using a noninferiority criterion and, if so, what the noninferiority margin
was.
Explanation:
The margin of noninferiority Δ should be specified and preferably justified on clinical grounds. If Δ is too large,
there will be too great a risk of accepting a truly inferior treatment as noninferior. This concern is especially
relevant for serious outcomes such as mortality. On the other hand, defining a very small Δ might produce
inconclusive results, requiring an extremely large trial if adequate power is to be achieved. If Δ is chosen to be a
proportion of the difference between reference treatment and placebo in previous trials (ratio approach),53 that
should be noted. Calculation of power requires that the investigators stipulate the expected response in each
group. It is common for these values to be set equal so that the power of the trial corresponds to the case in which
there is a zero difference between the 2 groups. The power can be higher if the new treatment is assumed to be
more effective than the reference treatment or lower if it is assumed to be less effective.54 Two reviews of
published trials found that less than three-quarters of reports of noninferiority and equivalence trials reported a
sample-size calculation that incorporated Δ.24,33
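To illustrate how Δ enters the calculation, the following is a minimal sketch for a continuous outcome under the common convention of assuming a zero true difference between treatments. The values of α, power, σ, and Δ are illustrative and not taken from any cited trial.

```python
# Minimal sample-size sketch for a noninferiority trial with a continuous
# outcome, assuming equal group sizes and a zero true difference.
import math
from scipy.stats import norm

alpha = 0.025    # one-sided significance level
power = 0.80
sigma = 2.0      # assumed standard deviation of the outcome in each group
delta = 0.8      # noninferiority margin, in the same units as the outcome
true_diff = 0.0  # assumed true difference (new minus reference)

z_alpha = norm.ppf(1 - alpha)
z_beta = norm.ppf(power)

# Two-group comparison of means: n per group grows as the margin shrinks.
n_per_group = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / (delta - true_diff) ** 2
print(f"n per group: {math.ceil(n_per_group)}")  # 99 with these illustrative values
```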
Example:
a) Noninferiority: “Using data from the nonfatal stroke outcomes from the Clopidogrel versus Aspirin in Patients at
Risk of Ischemic Events trial and from the meta-analysis by the Antithrombotic Trialists’
Collaboration . . . , we derived an estimated odds ratio for clopidogrel being better than placebo
for the outcome of nonfatal stroke: 1.377 (95% confidence interval [CI], 1.155 to 1.645). Thus, to
ensure that the aspirin plus extended-release dipyridamole preserved at least half the effect of
clopidogrel, the noninferiority margin was set at 1.075, an effect size equal to half the lower limit
of the confidence interval. . . . With 1715 recurrent strokes, we would have a statistical power of
82% to reject the inferiority null hypothesis, assuming a 6.5% relative risk reduction with aspirin
plus extended-release dipyridamole as compared with clopidogrel.”45
b) Equivalence: “The margin of equivalence, Δ, was 5% and the range −5% to 5% was predefined as an
acceptable range of completion rates [of medical abortion] between the two types of providers.
The margin was based on clinically and statistically important differences as well as ethical
criteria, cost, and feasibility. The sample size of 1086 women was calculated to be sufficient (with
a two-sided 95% CI and 80% power) to establish equivalence. The sample size calculation allowed
for 10% loss to follow-up. . . .”52
7b – Interim analyses and stopping guidelines
Description:
To which outcome(s) they apply and whether related to a noninferiority hypothesis.
Explanation:
In superiority trials, if an interim analysis shows clear evidence of the efficacy of the new treatment, it may be
considered unethical to continue the trial and deny the new effective treatment to the control group. In contrast,
in noninferiority trials, if noninferiority is demonstrated for the primary outcome (using the prestated
noninferiority margin) before completion of the trial, there is less ethical need to stop the trial because the control
group is already receiving the standard treatment and the experimental treatment is not appearing appreciably
worse. Also, if noninferiority is evident at interim analysis and the point estimate is favorable, the investigators or
the data monitoring committee may then wish to continue in the hope of demonstrating superiority.38 In
noninferiority trials it is therefore often more appropriate to base stopping rules on safety outcomes and
superiority hypotheses.56 Stopping rules for efficacy in noninferiority trials may be asymmetric,57 i.e., may favor
stopping early if the new treatment is appearing worse than the standard but continuing longer if the new
treatment is appearing better. Formal stopping rules for futility may be particularly important for noninferiority
trials (given that the comparison is with a proven standard therapy). It has been suggested that relating the
observed effect to the point of “no effect” rather than the noninferiority margin may be more appropriate for
considering futility and harm in noninferiority trials and that “the data would have to show convincing evidence of
harm before the trial would be stopped for futility.”58
Example:
“A data and safety monitoring board reviewed the data periodically for safety and efficacy. They could recommend
stopping the study if a benefit in favour of oral anticoagulation therapy was shown, such that the hazard ratio for
clopidogrel plus aspirin versus oral anticoagulation therapy exceeded 1.0 by more than 3 SD at either of two formal
interim analyses, timed to occur when 50% or 75% of events had occurred. . . .”55
8a – Randomisation: sequence generation
Description:
Method used to generate the random allocation sequence
Explanation:
Participants should be assigned to comparison groups in the trial on the basis of a chance (random) process
characterised by unpredictability (see box 1). Authors should provide sufficient information that the reader can
assess the methods used to generate the random allocation sequence and the likelihood of bias in group
assignment. It is important that information on the process of randomisation is included in the body of the main
article and not as a separate supplementary file, where it can be missed by the reader.
The term “random” has a precise technical meaning. With random allocation, each participant has a known
probability of receiving each intervention before one is assigned, but the assigned intervention is determined by a
chance process and cannot be predicted. However, “random” is often used inappropriately in the literature to
describe trials in which non-random, deterministic allocation methods were used, such as alternation, hospital
numbers, or date of birth. When investigators use such non-random methods, they should describe them precisely
and should not use the term “random” or any variation of it. Even the term “quasi-random” is unacceptable for
describing such trials. Trials based on non-random methods generally yield biased results.(2) (3) (4) (136) Bias
presumably arises from the inability to conceal these allocation systems adequately (see item 9).
Many methods of sequence generation are adequate. However, readers cannot judge adequacy from such terms
as “random allocation,” “randomisation,” or “random” without further elaboration. Authors should specify the
method of sequence generation, such as a random-number table or a computerised random number generator.
The sequence may be generated by the process of minimization, a non-random but generally acceptable method
(see box 2).
In some trials, participants are intentionally allocated in unequal numbers to each intervention: for example, to
gain more experience with a new procedure or to limit costs of the trial. In such cases, authors should report the
randomisation ratio (for example, 2:1 or two treatment participants per control participant) (see item 3a).
In a representative sample of PubMed indexed trials in 2000, only 21% reported an adequate approach to random
sequence generation (16); this increased to 34% for a similar cohort of PubMed indexed trials in 2006.(17) In more
than 90% of these cases, researchers used a random number generator on a computer or a random number table.
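For concreteness, a minimal sketch of computer-generated simple randomisation of the kind the examples below report; the seed, group labels, and list length are illustrative.

```python
# Simple (unrestricted) randomisation: each allocation is an independent,
# equal-probability draw, so the sequence is unpredictable.
import random

rng = random.Random(20100324)  # fixed seed so the list can be regenerated for audit
allocation_list = [rng.choice(["A", "B"]) for _ in range(200)]
print(allocation_list[:10])
```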
Example:
“Independent pharmacists dispensed either active or placebo inhalers according to a computer generated
randomisation list.”(63)
“For allocation of the participants, a computer-generated list of random numbers was used.”(135)
8b – Randomisation: type
Description:
Type of randomisation; details of any restriction (such as blocking and block size)
Explanation:
In trials of several hundred participants or more, simple randomisation can usually be trusted to generate similar
numbers in the two trial groups (139) and to generate groups that are roughly comparable in terms of known and
unknown prognostic variables.(140) For smaller trials (see item 7a)—and even for trials that are not intended to
be small, as they may stop before reaching their target size—some restricted randomisation (procedures to help
achieve balance between groups in size or characteristics) may be useful (see box 2).
It is important to indicate whether no restriction was used, by stating such or by stating that “simple
randomisation” was done. Otherwise, the methods used to restrict the randomisation, along with the method
used for random selection, should be specified. For block randomisation, authors should provide details on how
the blocks were generated (for example, by using a permuted block design with a computer random number
generator), the block size or sizes, and whether the block size was fixed or randomly varied. If the trialists became
aware of the block size(s), that information should also be reported as such knowledge could lead to code
breaking. Authors should specify whether stratification was used, and if so, which factors were involved (such as
recruitment site, sex, disease stage), the categorisation cut-off values within strata, and the method used for
restriction. Although stratification is a useful technique, especially for smaller trials, it is complicated to implement
and may be impossible if many stratifying factors are used. If minimization (see box 2) was used, it should be
explicitly identified, as should the variables incorporated into the scheme. If used, a random element should be
indicated.
Only 9% of 206 reports of trials in specialty journals (23) and 39% of 80 trials in general medical journals reported
use of stratification.(32) In each case, only about half of the reports mentioned the use of restricted
randomisation. However, these studies and that of Adetugbo and Williams(8) found that the sizes of the treatment
groups in many trials were the same or quite similar, yet blocking or stratification had not been mentioned. One
possible explanation for the close balance in numbers is underreporting of the use of restricted randomisation.
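A minimal sketch of stratified, permuted-block randomisation with randomly varied block sizes, in the spirit of the first example below; the strata, seed, and block sizes are illustrative.

```python
# Permuted-block randomisation: each block contains equal numbers of A and B
# in random order, so group sizes can never diverge by more than half the
# largest block. Randomly varying the block size makes code-breaking harder.
import random

rng = random.Random(42)

def permuted_block_sequence(n, block_sizes=(2, 4, 6)):
    sequence = []
    while len(sequence) < n:
        size = rng.choice(block_sizes)  # block sizes must be even for 1:1 allocation
        block = ["A"] * (size // 2) + ["B"] * (size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n]

# Stratification: one independent sequence per stratum (e.g., per centre).
sequences = {centre: permuted_block_sequence(60) for centre in ("centre_1", "centre_2")}
print(sequences["centre_1"][:12])
```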
Example:
“Randomisation sequence was created using Stata 9.0 (StataCorp, College Station, TX) statistical software and was
stratified by center with a 1:1 allocation using random block sizes of 2, 4, and 6.”(137)
“Participants were randomly assigned following simple randomisation procedures (computerised random
numbers) to 1 of 2 treatment groups.”(138)
9 – Randomisation: allocation concealment mechanism
Description:
Mechanism used to implement the random allocation sequence (such as sequentially numbered containers),
describing any steps taken to conceal the sequence until interventions were assigned
Explanation:
Item 8a discussed generation of an unpredictable sequence of assignments. Of considerable importance is how
this sequence is applied when participants are enrolled into the trial (see box 1). A generated allocation schedule
should be implemented by using allocation concealment,(23) a critical mechanism that prevents foreknowledge of
treatment assignment and thus shields those who enroll participants from being influenced by this knowledge. The
decision to accept or reject a participant should be made, and informed consent should be obtained from the
participant, in ignorance of the next assignment in the sequence.(148)
The allocation concealment should not be confused with blinding (see item 11). Allocation concealment seeks to
prevent selection bias, protects the assignment sequence until allocation, and can always be successfully
implemented.(2) In contrast, blinding seeks to prevent performance and ascertainment bias, protects the
sequence after allocation, and cannot always be implemented.(23) Without adequate allocation concealment,
however, even random, unpredictable assignment sequences can be subverted.(2) (149)
Centralised or “third-party” assignment is especially desirable. Many good allocation concealment mechanisms
incorporate external involvement. Use of a pharmacy or central telephone randomisation system are two common
techniques. Automated assignment systems are likely to become more common.(150) When external involvement
is not feasible, an excellent method of allocation concealment is the use of numbered containers. The
interventions (often drugs) are sealed in sequentially numbered identical containers according to the allocation
sequence.(151) Enclosing assignments in sequentially numbered, opaque, sealed envelopes can be a good
allocation concealment mechanism if it is developed and monitored diligently. This method can be corrupted,
however, particularly if it is poorly executed. Investigators should ensure that the envelopes are opaque when held
to the light, and opened sequentially and only after the participant’s name and other details are written on the
appropriate envelope.(143)
A number of methodological studies provide empirical evidence to support these precautions.(152) (153) Trials in
which the allocation sequence had been inadequately or unclearly concealed yielded larger estimates of treatment
effects than did trials in which authors reported adequate allocation concealment. These findings provide strong
empirical evidence that inadequate allocation concealment contributes to bias in estimating treatment effects.
Despite the importance of the mechanism of allocation concealment, published reports often omit such details.
The mechanism used to allocate interventions was omitted in reports of 89% of trials in rheumatoid arthritis,(108)
48% of trials in obstetrics and gynaecology journals,(23) and 44% of trials in general medical journals.(32) In a
more broadly representative sample of all randomised trials indexed on PubMed, only 18% reported any allocation
concealment mechanism, but some of those reported mechanisms were inadequate.(16)
Example:
“The doxycycline and placebo were in capsule form and identical in appearance. They were prepacked in bottles
and consecutively numbered for each woman according to the randomisation schedule. Each woman was assigned
an order number and received the capsules in the corresponding prepacked bottle.”(146)
“The allocation sequence was concealed from the researcher (JR) enrolling and assessing participants in
sequentially numbered, opaque, sealed and stapled envelopes. Aluminum foil inside the envelope was used to
render the envelope impermeable to intense light. To prevent subversion of the allocation sequence, the name
and date of birth of the participant was written on the envelope and a video tape made of the sealed envelope
with participant details visible. Carbon paper inside the envelope transferred the information onto the allocation
card inside the envelope and a second researcher (CC) later viewed video tapes to ensure envelopes were still
sealed when participants’ names were written on them. Corresponding envelopes were opened only after the
enrolled participants completed all baseline assessments and it was time to allocate the intervention.”(147)
10 – Randomisation: implementation
Description:
Who generated the allocation sequence, who enrolled participants, and who assigned participants to interventions
Explanation:
As noted in item 9, concealment of the allocated intervention at the time of enrolment is especially important.
Thus, in addition to knowing the methods used, it is also important to understand how the random sequence was
implemented—specifically, who generated the allocation sequence, who enrolled participants, and who assigned
participants to trial groups.
The process of randomising participants into a trial has three different steps: sequence generation, allocation
concealment, and implementation (see box 3). Although the same people may carry out more than one process
under each heading, investigators should strive for complete separation of the people involved with generation
and allocation concealment from the people involved in the implementation of assignments. Thus, if someone is
involved in the sequence generation or allocation concealment steps, ideally they should not be involved in the
implementation step.
Even with flawless sequence generation and allocation concealment, failure to separate creation and concealment
of the allocation sequence from assignment to study group may introduce bias. For example, the person who
generated an allocation sequence could retain a copy and consult it when interviewing potential participants for a
trial. Thus, that person could bias the enrolment or assignment process, regardless of the unpredictability of the
assignment sequence. Investigators must then ensure that the assignment schedule is unpredictable and locked
away (such as in a safe deposit box in a building rather inaccessible to the enrolment location) from even the
person who generated it. The report of the trial should specify where the investigators stored the allocation list.
Example:
“Determination of whether a patient would be treated by streptomycin and bed-rest (S case) or by bed-rest alone
(C case) was made by reference to a statistical series based on random sampling numbers drawn up for each sex at
each centre by Professor Bradford Hill; the details of the series were unknown to any of the investigators or to the
co-ordinator … After acceptance of a patient by the panel, and before admission to the streptomycin centre, the
appropriate numbered envelope was opened at the central office; the card inside told if the patient was to be an S
or a C case, and this information was then given to the medical officer of the centre.”(24)
“Details of the allocated group were given on coloured cards contained in sequentially numbered, opaque, sealed
envelopes. These were prepared at the NPEU and kept in an agreed location on each ward. Randomisation took
place at the end of the 2nd stage of labour when the midwife considered a vaginal birth was imminent. To enter a
woman into the study, the midwife opened the next consecutively numbered envelope.”(154)
“Block randomisation was by a computer generated random number list prepared by an investigator with no
clinical involvement in the trial. We stratified by admission for an oncology related procedure. After the research
nurse had obtained the patient’s consent, she telephoned a contact who was independent of the recruitment
process for allocation consignment.”(155)
11a – Blinding
Description:
If done, who was blinded after assignments to interventions (for example, participants, care providers, those
assessing outcomes) and how
Explanation:
The term “blinding” or “masking” refers to withholding information about the assigned interventions from people
involved in the trial who may potentially be influenced by this knowledge. Blinding is an important safeguard
against bias, particularly when assessing subjective outcomes.(153)
Benjamin Franklin has been credited as being the first to use blinding in a scientific experiment.(158) He
blindfolded participants so they would not know when he was applying mesmerism (a popular “healing fluid” of
the 18th century) and in so doing showed that mesmerism was a sham. Based on this experiment, the scientific
community recognised the power of blinding to reduce bias, and it has remained a commonly used strategy in
scientific experiments.
Box 4, on blinding terminology, defines the groups of individuals (that is, participants, healthcare providers, data
collectors, outcome adjudicators, and data analysts) who can potentially introduce bias into a trial through
knowledge of the treatment assignments. Participants may respond differently if they are aware of their treatment
assignment (such as responding more favourably when they receive the new treatment).(153) Lack of blinding
may also influence compliance with the intervention, use of co-interventions, and risk of dropping out of the trial.
Unblinded healthcare providers may introduce similar biases, and unblinded data collectors may differentially
assess outcomes (such as frequency or timing), repeat measurements of abnormal findings, or provide
encouragement during performance testing. Unblinded outcome adjudicators may differentially assess subjective
outcomes, and unblinded data analysts may introduce bias through the choice of analytical strategies, such as the
selection of favourable time points or outcomes, and by decisions to remove patients from the analyses. These
biases have been well documented.(71) (153) (159) (160) (161) (162)
Blinding, unlike allocation concealment (see item 10), may not always be appropriate or possible. An example is a
trial comparing levels of pain associated with sampling blood from the ear or thumb.(163) Blinding is particularly
important when outcome measures involve some subjectivity, such as assessment of pain. Blinding of data
collectors and outcome adjudicators is unlikely to matter for objective outcomes, such as death from any cause.
Even then, however, lack of participant or healthcare provider blinding can lead to other problems, such as
differential attrition.(164) In certain trials, especially surgical trials, blinding of participants and surgeons is often
difficult or impossible, but blinding of data collectors and outcome adjudicators is often achievable. For example,
lesions can be photographed before and after treatment and assessed by an external observer.(165) Regardless of
whether blinding is possible, authors can and should always state who was blinded (that is, participants, healthcare
providers, data collectors, and outcome adjudicators).
Unfortunately, authors often do not report whether blinding was used.(166) For example, reports of 51% of 506
trials in cystic fibrosis,(167) 33% of 196 trials in rheumatoid arthritis,(108) and 38% of 68 trials in dermatology(8)
did not state whether blinding was used. Until authors of trials improve their reporting of blinding, readers will
have difficulty in judging the validity of the trials that they may wish to use to guide their clinical practice.
The term masking is sometimes used in preference to blinding to avoid confusion with the medical condition of
being without sight. However, “blinding” in its methodological sense seems to be understood worldwide and is
acceptable for reporting clinical trials.(165) (168)
Example:
“Whereas patients and physicians allocated to the intervention group were aware of the allocated arm, outcome
assessors and data analysts were kept blinded to the allocation.”(156)
“Blinding and equipoise were strictly maintained by emphasizing to intervention staff and participants that each
diet adheres to healthy principles, and each is advocated by certain experts to be superior for long-term weight
loss. Except for the interventionists (dieticians and behavioural psychologists), investigators and staff were kept
blind to diet assignment of the participants. The trial adhered to established procedures to maintain separation
between staff that take outcome measurements and staff that deliver the intervention. Staff members who
obtained outcome measurements were not informed of the diet group assignment. Intervention staff, dieticians
and behavioural psychologists who delivered the intervention did not take outcome measurements. All
investigators, staff, and participants were kept masked to outcome measurements and trial results.”(157)
11b – Similarity of interventions
Description:
If relevant, description of the similarity of interventions
Explanation:
Just as we seek evidence of concealment to assure us that assignment was truly random, we seek evidence of the
method of blinding. In trials with blinding of participants or healthcare providers, authors should state the
similarity of the characteristics of the interventions (such as appearance, taste, smell, and method of
administration).(35) (173)
Some people have advocated testing for blinding by asking participants or healthcare providers at the end of a trial
whether they think the participant received the experimental or control intervention.(174) Because participants
and healthcare providers will usually know whether the participant has experienced the primary outcome, this
makes it difficult to determine if their responses reflect failure of blinding or accurate assumptions about the
efficacy of the intervention.(175) Given the uncertainty this type of information provides, we have removed
advocating reporting this type of testing for blinding from the CONSORT 2010 Statement. We do, however,
advocate that the authors report any known compromises in blinding. For example, authors should report if it was
necessary to unblind any participants at any point during the conduct of a trial.
Example:
“Jamieson Laboratories Inc provided 500-mg immediate release niacin in a white, oblong, bisect caplet. We
independently confirmed caplet content using high performance liquid chromatography … The placebo was
matched to the study drug for taste, color, and size, and contained microcrystalline cellulose, silicon dioxide,
dicalcium phosphate, magnesium stearate, and stearic acid.”(172)
12a – Statistical methods
Description:
Whether a 1- or 2-sided confidence interval approach was used.
Explanation:
Tests of noninferiority need to be related to the Δ and α as prespecified in the noninferiority hypothesis. It should
be specified whether an absolute difference between treatments or a relative measure, or both, will be used.
Judgment of the results in relation to the study hypothesis is based on the location of the whole CI in relation to
Δ (Figure 1). For noninferiority trials, the upper bound of the 2-sided (1 − 2α) × 100% CI for the (deleterious)
treatment effect or the upper bound of the 1-sided (1 − α) × 100% CI has to be below the margin Δ to declare that
noninferiority has been shown, with a significance level α. The 2-sided CI provides additional information, in
particular for the situation in which the new treatment is superior to the reference treatment. For equivalence
trials, equivalence is demonstrated if the entire 2-sided (1 − α) × 100% CI lies within −Δ and Δ. If noninferiority has
been demonstrated, it is then acceptable to assess whether the new treatment appears superior to the reference
treatment, using an appropriate test or CI, with a significance level or confidence, respectively, defined a priori in
the protocol and with an ITT analysis. Conversely, occasionally a trial protocol may specify that if superiority is not
demonstrated, a noninferiority analysis will be performed.61 Such sequential testing should be fully explained.
Figure 1: Possible Scenarios of Observed Treatment Differences for Adverse Outcomes (Harms) in Noninferiority
Trials
Error bars indicate 2-sided 95% CIs. The blue dashed line at x = Δ indicates the noninferiority margin; the blue tinted region to
the right of x = Δ indicates the zone of inferiority. A, If the CI lies wholly to the left of zero, the new treatment is superior. B and
C, If the CI lies to the left of Δ and includes zero, the new treatment is noninferior but not shown to be superior. D, If the CI lies
wholly to the left of Δ and wholly to the right of zero, the new treatment is noninferior in the sense already defined but also
inferior in the sense that a null treatment difference is excluded. This puzzling circumstance is rare, because it requires a very
large sample size. It also can result from a noninferiority margin that is too wide. E and F, If the CI includes Δ and zero, the
difference is nonsignificant but the result regarding noninferiority is inconclusive. G, If the CI includes Δ and is wholly to the
right of zero, the difference is statistically significant but the result is inconclusive regarding possible inferiority of magnitude Δ
or worse. H, If the CI is wholly above Δ, the new treatment is inferior.
a This CI indicates noninferiority in the sense that it does not include Δ, but the new treatment is significantly worse than the
standard. Such a result is unlikely because it would require a very large sample size.
b This CI is inconclusive in that it is still plausible that the true treatment difference is less than Δ, but the new treatment is
significantly worse than the standard. Adapted from Piaggio et al.6
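The decision logic of the figure can be written down in a few lines. This sketch assumes a deleterious outcome (positive differences favour the reference treatment) and a two-sided CI for the treatment difference; the panel letters in the returned labels paraphrase the figure.

```python
# Classify a two-sided CI (lower, upper) for a harmful treatment difference
# against the noninferiority margin Delta, following Figure 1.
def classify(lower: float, upper: float, margin: float) -> str:
    if upper < 0:
        return "new treatment superior (A)"
    if upper < margin:
        if lower > 0:
            return "noninferior, but a null difference is excluded (D)"
        return "noninferior, not shown to be superior (B/C)"
    if lower > margin:
        return "new treatment inferior (H)"
    if lower > 0:
        return "significantly worse; inconclusive regarding the margin (G)"
    return "inconclusive (E/F)"

print(classify(lower=-0.4, upper=0.3, margin=0.8))  # noninferior, not shown to be superior (B/C)
```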
Example:
a) Noninferiority, continuous outcome: “The primary efficacy end point was the mean change in pain intensity. . . .
Study endpoints were analysed primarily for the per protocol population and repeated, for sensitivity reasons, for
the intention-to-treat (ITT) population. For most efficacy endpoints, a confidence interval (CI) approach was used
on an analysis of covariance (ANCOVA) model, with a two-sided 5% level of significance. . . . For the primary
efficacy endpoint, non-inferiority of lumiracoxib to indomethacin could be claimed if the lower limit of the CI [for
the difference in mean change of pain intensity assessed on a 5-point Likert scale] was greater than −0.5. This test
for non-inferiority was only performed for the primary efficacy variable; all other secondary variables were tests of
superiority of lumiracoxib versus indomethacin.”59
b) Noninferiority, binary outcome: “The trial was powered for separate comparisons between the control group
[unfractionated heparin or enoxaparin plus a glycoprotein IIb/IIIa inhibitor] and each of the two investigational
groups. We used sequential noninferiority and superiority analyses with hierarchical end-point testing, with the
type I error controlled by the Benjamini and Hochberg procedure, as previously described. Noninferiority was
declared if the upper limit of the one-sided 97.5% confidence interval (CI) for the event rate in the investigational
group did not exceed a relative margin of 25% from the event rate in the control group [risk ratio = 1.25],
equivalent to a one-sided test with an alpha value of 0.025. A two-sided alpha value of 0.05 was used for
superiority testing.”60
c) Equivalence, binary outcome: “To assess the equivalence between midlevel healthcare providers and doctors,
the risk difference between the two provider types together with their [two-sided] 95% CI was derived by use of a
generalised estimating equation (GEE) model . . . [The primary endpoint was complete abortion] . . . If the CI of the
risk difference between the two groups falls within the predetermined margin of equivalence (−5% to 5%), the two
types of providers can be considered equivalent. . . . The analyses for the primary and secondary endpoints were
on an intention-to-treat basis, supplemented by per-protocol analysis of the primary endpoint.”52
12b – Additional analyses
Description:
Methods for additional analyses, such as subgroup analyses and adjusted analyses
Explanation:
As is the case for primary analyses, the method of subgroup analysis should be clearly specified. The strongest
analyses are those that look for evidence of a difference in treatment effect in complementary subgroups (for
example, older and younger participants), a comparison known as a test of interaction.(182) (183) A common but
misleading approach is to compare P values for separate analyses of the treatment effect in each group. It is
incorrect to infer a subgroup effect (interaction) from one significant and one non-significant P value.(184) Such
inferences have a high false positive rate.
Because of the high risk for spurious findings, subgroup analyses are often discouraged.(14) (185) Post hoc
subgroup comparisons (analyses done after looking at the data) are especially likely not to be confirmed by further
studies. Such analyses do not have great credibility.
In some studies, imbalances in participant characteristics are adjusted for by using some form of multiple
regression analysis. Although the need for adjustment is much less in RCTs than in epidemiological studies, an
adjusted analysis may be sensible, especially if one or more variables is thought to be prognostic.(186) Ideally,
adjusted analyses should be specified in the study protocol (see item 24). For example, adjustment is often
recommended for any stratification variables (see item 8b) on the principle that the analysis strategy should
follow the design. In RCTs, the decision to adjust should not be determined by whether baseline differences are
statistically significant (see item 16).(183) (187) The rationale for any adjusted analyses and the statistical
methods used should be specified.
Authors should clarify the choice of variables that were adjusted for, indicate how continuous variables were
handled, and specify whether the analysis was planned or suggested by the data.(188) Reviews of published
studies show that reporting of adjusted analyses is inadequate with regard to all of these aspects.(188) (189)
(190) (191)
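A minimal sketch of a test of interaction with statsmodels; the dataset, variable names, and simulated effects are hypothetical, chosen only to show where the single interaction P value comes from (in contrast to comparing two subgroup-specific P values).

```python
# Test of interaction: fit one model with a treatment-by-subgroup term and
# read off its P value, rather than comparing per-subgroup significance.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "older": rng.integers(0, 2, n),  # complementary subgroups: older vs younger
})
# Simulated outcome: a treatment effect, a subgroup effect, no true interaction.
df["outcome"] = 1.0 * df["treated"] + 0.5 * df["older"] + rng.normal(0, 2, n)

model = smf.ols("outcome ~ treated * older", data=df).fit()
print(model.pvalues["treated:older"])  # the single P value for the interaction
```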
Example:
“Proportions of patients responding were compared between treatment groups with the Mantel-Haenszel χ2 test,
adjusted for the stratification variable, methotrexate use.”(103)
“Pre-specified subgroup analyses according to antioxidant treatment assignment(s), presence or absence of prior
CVD, dietary folic acid intake, smoking, diabetes, aspirin, hormone therapy, and multivitamin use were performed
using stratified Cox proportional hazards models. These analyses used baseline exposure assessments and were
restricted to participants with nonmissing subgroup data at baseline.”(181)
Results
13a – Participant flow
Description:
For each group, the numbers of participants who were randomly assigned, received intended treatment, and were
analysed for the primary outcome
Explanation:
The design and conduct of some RCTs is straightforward, and the flow of participants, particularly where there are
no losses to follow-up or exclusions, through each phase of the study can be described adequately in a few
sentences. In more complex studies, it may be difficult for readers to discern whether and why some participants
did not receive the treatment as allocated, were lost to follow-up, or were excluded from the analysis(51). This
information is crucial for several reasons. Participants who were excluded after allocation are unlikely to be
representative of all participants in the study. For example, patients may not be available for follow-up evaluation
because they experienced an acute exacerbation of their illness or harms of treatment.(22) (192)
Attrition as a result of loss to follow-up, which is often unavoidable, needs to be distinguished from investigator-determined exclusion for such reasons as ineligibility, withdrawal from treatment, and poor adherence to the trial
protocol. Erroneous conclusions can be reached if participants are excluded from analysis, and imbalances in such
omissions between groups may be especially indicative of bias.(192) (193) (194) Information about whether the
investigators included in the analysis all participants who underwent randomisation, in the groups to which they
were originally allocated (intention-to-treat analysis (see item 16 and box 6)), is therefore of particular
importance. Knowing the number of participants who did not receive the intervention as allocated or did not
complete treatment permits the reader to assess to what extent the estimated efficacy of therapy might be
underestimated in comparison with ideal circumstances.
If available, the number of people assessed for eligibility should also be reported. Although this number is relevant
to external validity only and is arguably less important than the other counts,(195) it is a useful indicator of
whether trial participants were likely to be representative of all eligible participants.
A review of RCTs published in five leading general and internal medicine journals in 1998 found that reporting of
the flow of participants was often incomplete, particularly with regard to the number of participants receiving the
allocated intervention and the number lost to follow-up.(51) Even information as basic as the number of
participants who underwent randomisation and the number excluded from analyses was not available in up to 20%
of articles.(51) Reporting was considerably more thorough in articles that included a diagram of the flow of
participants through a trial, as recommended by CONSORT. This study informed the design of the revised flow
diagram in the revised CONSORT statement.(52) (53) (54) The suggested template is shown in fig 1, and the
counts required are described in detail in table 2.
Table 2 - Information required to document the flow of participants through each stage of a
randomised trial

Enrolment
• Number of people included: people evaluated for potential enrolment.
• Number of people not included or excluded: people who did not meet the inclusion criteria, or who met the inclusion criteria but declined to be enrolled.
• Rationale: these counts indicate whether trial participants were likely to be representative of all patients seen; they are relevant to assessment of external validity only, and they are often not available.

Randomisation
• Number of people included: participants randomly assigned.
• Rationale: crucial count for defining trial size and assessing whether a trial has been analysed by intention to treat.

Treatment allocation
• Number of people included: participants who received treatment as allocated, by study group.
• Number of people not included or excluded: participants who did not receive treatment as allocated, by study group.
• Rationale: important counts for assessment of internal validity and interpretation of results; reasons for not receiving treatment as allocated should be given.

Follow-up
• Number of people included: participants who completed treatment as allocated, by study group; participants who completed follow-up as planned, by study group.
• Number of people not included or excluded: participants who did not complete treatment as allocated, by study group; participants who did not complete follow-up as planned, by study group.
• Rationale: important counts for assessment of internal validity and interpretation of results; reasons for not completing treatment or follow-up should be given.

Analysis
• Number of people included: participants included in main analysis, by study group.
• Number of people not included or excluded: participants excluded from main analysis, by study group.
• Rationale: crucial count for assessing whether a trial has been analysed by intention to treat; reasons for excluding participants should be given.
Some information, such as the number of individuals assessed for eligibility, may not always be known,(14) and,
depending on the nature of a trial, some counts may be more relevant than others. It will sometimes be useful or
necessary to adapt the structure of the flow diagram to a particular trial. In some situations, other information
may usefully be added. For example, the flow diagram of a parallel group trial of minimal surgery compared with
medical management for chronic gastro-oesophageal reflux also included a parallel non-randomised preference
group (see fig).(196)
The exact form and content of the flow diagram may be varied according to specific features of a trial. For
example, many trials of surgery or vaccination do not include the possibility of discontinuation. Although CONSORT
strongly recommends using this graphical device to communicate participant flow throughout the study, there is
no specific, prescribed format.
Example:
Figure 2: Flow diagram of a multicentre trial of fractional flow reserve versus angiography for guiding percutaneous
coronary intervention (PCI) (adapted from Tonino et al(313)). The diagram includes detailed information
on the excluded participants.
Figure 3: Flow diagram of minimal surgery compared with medical management for chronic gastro-oesophageal
reflux disease (adapted from Grant et al(196)). The diagram shows a multicentre trial with a parallel nonrandomised preference group.
13b – Losses and exclusions
Description:
For each group, losses and exclusions after randomisation, together with reasons
Explanation:
Some protocol deviations may be reported in the flow diagram (see item 13a)—for example, participants who did
not receive the intended intervention. If participants were excluded after randomisation (contrary to the intention-to-treat principle) because they were found not to meet eligibility criteria (see item 16), they should be included in
the flow diagram. Use of the term “protocol deviation” in published articles is not sufficient to justify exclusion of
participants after randomisation. The nature of the protocol deviation and the exact reason for excluding
participants after randomisation should always be reported.
Example:
“There was only one protocol deviation, in a woman in the study group. She had an abnormal pelvic measurement
and was scheduled for elective caesarean section. However, the attending obstetrician judged a trial of labour
acceptable; caesarean section was done when there was no progress in the first stage of labour.”(197)
“The monitoring led to withdrawal of nine centres, in which existence of some patients could not be proved, or
other serious violations of good clinical practice had occurred.”(198)
14a – Recruitment
Description:
Dates defining the periods of recruitment and follow-up
Explanation:
Knowing when a study took place and over what period participants were recruited places the study in historical
context. Medical and surgical therapies, including concurrent therapies, evolve continuously and may affect the
routine care given to participants during a trial. Knowing the rate at which participants were recruited may also be
useful, especially to other investigators.
The length of follow-up is not always a fixed period after randomisation. In many RCTs in which the outcome is
time to an event, follow-up of all participants is ended on a specific date. This date should be given, and it is also
useful to report the minimum, maximum, and median duration of follow-up.(200) (201)
A review of reports in oncology journals that used survival analysis, most of which were not RCTs,(201) found that
nearly 80% (104 of 132 reports) included the starting and ending dates for accrual of patients, but only 24% (32 of
132 reports) also reported the date on which follow-up ended.
Example:
“Age-eligible participants were recruited … from February 1993 to September 1994 … Participants attended clinic
visits at the time of randomisation (baseline) and at 6-month intervals for 3 years.”(199)
14b – Reason for stopped trial
Description:
Why the trial ended or was stopped
Explanation:
Arguably, trialists who arbitrarily conduct unplanned interim analyses after very few events have accrued, using no
statistical guidelines, run a high risk of “catching” the data at a random extreme, which likely represents a large
overestimate of treatment benefit.(204)
Readers will likely draw weaker inferences from a trial that was truncated in a data-driven manner versus one that
reports its findings after reaching a goal independent of results. Thus, RCTs should indicate why the trial came to
an end (see box 5). The report should also disclose factors extrinsic to the trial that affected the decision to stop
the trial, and who made the decision to stop the trial, including reporting the role the funding agency played in the
deliberations and in the decision to stop the trial.(134)
A systematic review of 143 RCTs stopped earlier than planned for benefit found that these trials reported stopping
after accruing a median of 66 events, estimated a median relative risk of 0.47 and a strong relation between the
number of events accrued and the size of the effect, with smaller trials with fewer events yielding the largest
treatment effects (odds ratio 31, 95% confidence interval 12 to 82).(134) While an increasing number of trials
published in high impact medical journals report stopping early, only 0.1% of trials reported stopping early for
benefit, which contrasts with estimates arising from simulation studies(205) and surveys of data safety and
monitoring committees.(206) Thus, many trials accruing few participants and reporting large treatment effects
may have been stopped earlier than planned but failed to report this action.
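The role of a formal statistical stopping guideline can be illustrated numerically. A minimal sketch (Python, SciPy assumed available) of a Lan-DeMets O’Brien-Fleming-type alpha-spending function, of the kind referred to in the first example below; this shows only the cumulative two-sided type I error “spent” by a given information fraction, not the quoted trial’s actual boundary, which depends on its full monitoring schedule:

```python
from scipy.stats import norm

def obrien_fleming_spent(t, alpha=0.05):
    """Cumulative two-sided type I error spent by information fraction t
    under a Lan-DeMets O'Brien-Fleming-type spending function.
    Illustration only; boundary p values at each look depend on the
    trial's actual monitoring schedule."""
    return 2.0 * (1.0 - norm.cdf(norm.ppf(1.0 - alpha / 2.0) / t ** 0.5))

for t in (0.25, 0.50, 0.63, 1.00):
    print(f"information fraction {t:.2f}: cumulative alpha spent = {obrien_fleming_spent(t):.4f}")
```

The function spends almost no alpha at early looks and the full 0.05 at the end, which is why very extreme early results are required before stopping for benefit is justified.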
Example:
“At the time of the interim analysis, the total follow-up included an estimated 63% of the total number of patient-years that would have been collected at the end of the study, leading to a threshold value of 0.0095, as
determined by the Lan-DeMets alpha-spending function method … At the interim analysis, the RR was 0.37 in the
intervention group, as compared with the control group, with a p value of 0.00073, below the threshold value. The
Data and Safety Monitoring Board advised the investigators to interrupt the trial and offer circumcision to the
control group, who were then asked to come to the investigation centre, where MC (medical circumcision) was
advised and proposed … Because the study was interrupted, some participants did not have a full follow-up on that
date, and their visits that were not yet completed are described as “planned” in this article.”(202)
“In January 2000, problems with vaccine supply necessitated the temporary nationwide replacement of the whole
cell component of the combined DPT/Hib vaccine with acellular pertussis vaccine. As this vaccine has a different
local reactogenicity profile, we decided to stop the trial early.”(203)
15 – Baseline data
Description:
A table showing baseline demographic and clinical characteristics for each group
Explanation:
Although the eligibility criteria (see item 4a) indicate who was eligible for the trial, it is also important to know the
characteristics of the participants who were actually included. This information allows readers,
especially clinicians, to judge how relevant the results of a trial might be to an individual patient.
Randomised trials aim to compare groups of participants that differ only with respect to the intervention
(treatment). Although proper random assignment prevents selection bias, it does not guarantee that the groups
are equivalent at baseline. Any differences in baseline characteristics are, however, the result of chance rather
than bias.(32) The study groups should be compared at baseline for important demographic and clinical
characteristics so that readers can assess how similar they were. Baseline data are especially valuable for
outcomes that can also be measured at the start of the trial (such as blood pressure).
Baseline information is most efficiently presented in a table (see table 3). For continuous variables, such as weight
or blood pressure, the variability of the data should be reported, along with average values. Continuous variables
can be summarised for each group by the mean and standard deviation. When continuous data have an
asymmetrical distribution, a preferable approach may be to quote the median and a centile range (such as the
25th and 75th centiles).(177) Standard errors and confidence intervals are not appropriate for describing
variability—they are inferential rather than descriptive statistics. Variables with a small number of ordered
categories (such as stages of disease I to IV) should not be treated as continuous variables; instead, numbers and
proportions should be reported for each category.(48) (177)
Unfortunately, significance tests of baseline differences are still common(23) (32) (210); they were reported in half
of 50 RCTs published in leading general journals in 1997.(183) Such significance tests assess the probability
that observed baseline differences could have occurred by chance; however, we already know that any differences
are caused by chance. Tests of baseline differences are not necessarily wrong, just illogical.(211) Such hypothesis
testing is superfluous and can mislead investigators and their readers. Rather, comparisons at baseline should be
based on consideration of the prognostic strength of the variables measured and the size of any chance
imbalances that have occurred.(211)
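As a minimal sketch of these recommendations (hypothetical data; pandas assumed available), continuous variables can be summarised per group as mean (SD), or median and centiles if skewed, and categorical variables as number (%):

```python
import pandas as pd

# Hypothetical baseline data; in practice this comes from the trial dataset.
df = pd.DataFrame({
    "group": ["new", "new", "control", "control", "new", "control"],
    "age": [64.1, 70.3, 66.0, 68.2, 59.8, 72.5],
    "female": [True, False, True, True, False, True],
})

# Continuous variable: mean (SD) per group (use median/centiles if skewed).
print(df.groupby("group")["age"].agg(["mean", "std"]))

# Categorical variable: number (%) per group.
counts = df.groupby("group")["female"].agg(["sum", "size"])
counts["%"] = 100 * counts["sum"] / counts["size"]
print(counts)
```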
Example:
Table 3 - Example of reporting baseline demographic and clinical characteristics*
(Adapted from table 1 of Yusuf et al (209))

                              Telmisartan (N=2954)       Placebo (N=2972)
Age (years)                   66.9 (7.3)                 66.9 (7.4)
Sex (female)                  1280 (43.3%)               1267 (42.6%)
Smoking status
  Current                     293 (9.9%)                 289 (9.7%)
  Past                        1273 (43.1%)               1283 (43.2%)
Ethnic origin
  Asian                       637 (21.6%)                624 (21.0%)
  Arab                        37 (1.3%)                  40 (1.3%)
  African                     51 (1.7%)                  55 (1.9%)
  European                    1801 (61.0%)               1820 (61.2%)
  Native or Aboriginal        390 (13.2%)                393 (13.2%)
  Other                       38 (1.3%)                  40 (1.3%)
Blood pressure (mm Hg)        140.7 (16.8)/81.8 (10.1)   141.3 (16.4)/82.0 (10.2)
Heart rate (beats per min)    68.8 (11.5)                68.8 (12.1)
Cholesterol (mmol/l)
  Total                       5.09 (1.18)                5.08 (1.15)
  LDL                         3.02 (1.01)                3.03 (1.02)
  HDL                         1.27 (0.37)                1.28 (0.41)
Coronary artery disease       2211 (74.8%)               2207 (74.3%)
Myocardial infarction         1381 (46.8%)               1360 (45.8%)
Angina pectoris               1412 (47.8%)               1412 (47.5%)
Peripheral artery disease     349 (11.8%)                323 (10.9%)
Hypertension                  2259 (76.5%)               2269 (76.3%)
Diabetes                      1059 (35.8%)               1059 (35.6%)

*Data are means (SD) or numbers (%).
16 – Numbers analysed
Description:
For each group, number of participants (denominator) included in each analysis and whether the analysis was by
original assigned groups
Explanation:
The number of participants in each group is an essential element of the analyses. Although the flow diagram
(see item 13a) may indicate the numbers of participants analysed, these numbers often vary for different
outcome measures. The number of participants per group should be given for all analyses. For binary outcomes
(such as risk ratio and risk difference), the denominators or event rates should also be reported. Expressing results
as fractions also aids the reader in assessing whether some of the randomly assigned participants were excluded
from the analysis. It follows that results should not be presented solely as summary measures, such as relative
risks.
Participants may sometimes not receive the full intervention, or some ineligible patients may have been randomly
allocated in error. One widely recommended way to handle such issues is to analyse all participants according to
their original group assignment, regardless of what subsequently occurred (see box 6). This “intention-to-treat”
strategy is not always straightforward to implement. It is common for some patients not to complete a study—
they may drop out or be withdrawn from active treatment—and thus are not assessed at the end. If the outcome
is mortality, such patients may be included in the analysis based on register information, whereas imputation
techniques may need to be used if other outcome data are missing. The term “intention-to-treat analysis” is often
inappropriately used—for example, when those who did not receive the first dose of a trial drug are excluded from
the analyses.(18)
Conversely, analysis can be restricted to only participants who fulfill the protocol in terms of eligibility,
interventions, and outcome assessment. This analysis is known as an “on-treatment” or “per protocol” analysis.
Excluding participants from the analysis can lead to erroneous conclusions. For example, in a trial that compared
medical with surgical therapy for carotid stenosis, analysis limited to participants who were available for follow-up
showed that surgery reduced the risk for transient ischaemic attack, stroke, and death. However, intention-to-treat analysis based on all participants as originally assigned did not show a superior effect of surgery.(214)
Intention-to-treat analysis is generally favoured because it avoids bias associated with non-random loss of
participants.(215) (216) (217) Regardless of whether authors use the term “intention-to-treat,” they should make
clear which and how many participants are included in each analysis (see item 13). Non-compliance with assigned
therapy may mean that the intention-to-treat analysis underestimates the potential benefit of the treatment, and
additional analyses, such as a per protocol analysis, may therefore be considered.(218) (219) It should be noted,
however, that such analyses are often considerably flawed.(220)
In a review of 403 RCTs published in 10 leading medical journals in 2002, 249 (62%) reported the use of intention-to-treat analysis for their primary analysis. This proportion was higher for journals adhering to the CONSORT
statement (70% v 48%). Among articles that reported the use of intention-to-treat analysis, only 39% actually
analysed all participants as randomised, with more than 60% of articles having missing data in their primary
analysis.(221) Other studies show similar findings.(18) (222) (223) Trials with no reported exclusions are
methodologically weaker in other respects than those that report on some excluded participants,(173) strongly
indicating that at least some researchers who have excluded participants do not report it. Another study found
that reporting an intention-to-treat analysis was associated with other aspects of good study design and reporting,
such as describing a sample size calculation.(224)
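The practical consequence of the choice of analysis population is a change of both numerator and denominator. A minimal sketch with hypothetical counts:

```python
# Contrast intention-to-treat (ITT) and per-protocol denominators for a
# binary outcome: ITT analyses everyone as randomised; per protocol
# excludes protocol deviators. All counts below are hypothetical.
randomised    = {"new": 100, "reference": 100}
responders    = {"new": 60,  "reference": 55}  # outcomes among all randomised
deviators     = {"new": 12,  "reference": 3}   # e.g., never received allocated drug
responders_pp = {"new": 55,  "reference": 54}  # responders after excluding deviators

for group in randomised:
    itt = responders[group] / randomised[group]
    n_pp = randomised[group] - deviators[group]
    pp = responders_pp[group] / n_pp
    print(f"{group}: ITT {responders[group]}/{randomised[group]} = {itt:.1%}; "
          f"per protocol {responders_pp[group]}/{n_pp} = {pp:.1%}")
```

Reporting the fractions themselves, as here, lets the reader see immediately how many randomised participants each analysis actually includes.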
Example:
“The primary analysis was intention-to-treat and involved all patients who were randomly assigned.”(212)
“One patient in the alendronate group was lost to follow up; thus data from 31 patients were available for the
intention-to-treat analysis. Five patients were considered protocol violators … consequently 26 patients remained
for the per-protocol analyses.”(213)
17a – Outcomes and estimations
Description:
For the outcome(s) for which noninferiority was hypothesised, a figure showing CIs and the noninferiority margin
may be useful.
Explanation:
A figure helps readers to interpret the result based on the CI, because it shows graphically where the CI lies with
respect to the null value (if a risk difference is used) or to 1 (if a relative measure is used) and with respect to the
margin of noninferiority or the margins of equivalence. In the example the new treatment was noninferior. The
figure can be used to show graphically the results of different analyses, e.g., with or without adjustment (Figure 2)
or ITT and per protocol.63 Only 1 of 47 published equivalence or noninferiority trials in ophthalmology evaluating
prostaglandins depicted the CI graphically with the prespecified noninferiority or equivalence margin. 24
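Such a figure is straightforward to produce. The following sketch (matplotlib assumed available) plots the hazard ratios and 90% CIs from the example that follows against the prespecified margin of 1.3:

```python
import matplotlib.pyplot as plt

# HRs and 90% CIs from the Giuliano et al example below; margin 1.3.
labels = ["Unadjusted", "Adjusted"]
hr, lo, hi = [0.79, 0.87], [0.56, 0.62], [1.10, 1.23]
margin = 1.3

fig, ax = plt.subplots(figsize=(6, 2.5))
y = range(len(hr))
xerr = [[h - l for h, l in zip(hr, lo)],   # distance to lower CI limit
        [u - h for h, u in zip(hr, hi)]]   # distance to upper CI limit
ax.errorbar(hr, list(y), xerr=xerr, fmt="o", color="black", capsize=4)
ax.axvline(1.0, linestyle=":", color="grey")      # null value (HR = 1)
ax.axvline(margin, linestyle="--", color="blue")  # noninferiority margin
ax.axvspan(0.4, margin, color="blue", alpha=0.1)  # zone of noninferiority
ax.set_yticks(list(y)); ax.set_yticklabels(labels)
ax.set_xlabel("Hazard ratio (90% CI)"); ax.set_xlim(0.4, 1.6)
plt.tight_layout(); plt.show()
```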
Example:
“The unadjusted HR comparing overall survival between the SLND [sentinel lymph node dissection]-alone group
and the ALND [axillary lymph node dissection] group was 0.79 (90% CI, 0.56-1.10), which did not cross the specified
boundary of 1.3. The HR for overall survival adjusting for adjuvant therapy . . . and age for the SLND alone group
compared with the ALND group was 0.87 (90% CI, 0.62-1.23)” (FIGURE 4).62
Figure 4: Confidence Intervals and the Noninferiority Margin (Item 17a, Example 1)
Hazard ratios (HRs) comparing overall survival between the axillary lymph node dissection (ALND) and sentinel lymph node
dissection (SLND)–alone groups. Blue dashed line at HR=1.3 indicates noninferiority margin; blue-tinted region to the left of
HR=1.3 indicates values for which SLND alone would be considered noninferior to SLND plus ALND. Reproduced from Giuliano
et al.62
17b – Binary outcomes
Description:
For binary outcomes, presentation of both absolute and relative effect sizes is recommended
Explanation:
When the primary outcome is binary, both the relative effect (risk ratio (relative risk) or odds ratio) and the
absolute effect (risk difference) should be reported (with confidence intervals), as neither the relative measure nor
the absolute measure alone gives a complete picture of the effect and its implications. Different audiences may
prefer either relative or absolute risk, but both doctors and lay people tend to overestimate the effect when it is
presented in terms of relative risk.(243) (244) (245) The size of the risk difference is less generalisable to other
populations than the relative risk since it depends on the baseline risk in the unexposed group, which tends to vary
across populations. For diseases where the outcome is common, a relative risk near unity might indicate clinically
important differences in public health terms. In contrast, a large relative risk when the outcome is rare may not be
so important for public health (although it may be important to an individual in a high risk category).
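Both measures follow directly from the four cell counts. The sketch below uses standard large-sample confidence interval formulas and, applied to the OSIRIS counts in table 4, reproduces the published figures to rounding:

```python
import math

def binary_effects(e1, n1, e2, n2, z=1.96):
    """Risk ratio and risk difference (with large-sample 95% CIs) for
    events e1/n1 in the treated group vs e2/n2 in the comparison group."""
    p1, p2 = e1 / n1, e2 / n2
    rr = p1 / p2
    se_log_rr = math.sqrt(1/e1 - 1/n1 + 1/e2 - 1/n2)
    rr_ci = (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr))
    rd = p1 - p2
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    rd_ci = (rd - z * se_rd, rd + z * se_rd)
    return rr, rr_ci, rd, rd_ci

# OSIRIS trial counts (table 4): 429/1344 early vs 514/1346 delayed.
rr, rr_ci, rd, rd_ci = binary_effects(429, 1344, 514, 1346)
print(f"risk ratio {rr:.2f} (95% CI {rr_ci[0]:.2f} to {rr_ci[1]:.2f})")
print(f"risk difference {rd:.1%} (95% CI {rd_ci[0]:.1%} to {rd_ci[1]:.1%})")
print(f"number needed to treat = {1 / abs(rd):.0f}")  # about 16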
Example:
“The risk of oxygen dependence or death was reduced by 16% (95% CI 25% to 7%). The absolute difference was -6.3% (95% CI -9.9% to -2.7%); early administration to an estimated 16 babies would therefore prevent 1 baby dying
or being long-term dependent on oxygen” (also see table 4).(242)
Table 4 - Example of reporting both absolute and relative effect sizes
(Adapted from table 3 of The OSIRIS Collaborative Group(242))

Primary outcome: death or oxygen dependence at “expected date of delivery”, percentage (No)
  Early administration (n=1344):              31.9 (429)
  Delayed selective administration (n=1346):  38.2 (514)
  Risk ratio (95% CI):                        0.84 (0.75 to 0.93)
  Risk difference (95% CI):                   -6.3 (-9.9 to -2.7)
18 – Ancillary analyses
Description:
Results of any other analyses performed, including subgroup analyses and adjusted analyses, distinguishing prespecified from exploratory
Explanation:
Multiple analyses of the same data create a risk for false positive findings.(246) Authors should resist the
temptation to perform many subgroup analyses.(183) (185) (247) Analyses that were prespecified in the trial
protocol (see item 24) are much more reliable than those suggested by the data, and therefore authors should
report which analyses were prespecified. If subgroup analyses were undertaken, authors should report which
subgroups were examined, why, if they were prespecified, and how many were prespecified. Selective reporting of
subgroup analyses could lead to bias.(248) When evaluating a subgroup the question is not whether the subgroup
shows a statistically significant result but whether the subgroup treatment effects are significantly different from
each other. To determine this, a test of interaction is helpful, although the power for such tests is typically low. If
formal evaluations of interaction are undertaken (see item 12b) they should be reported as the estimated
difference in the intervention effect in each subgroup (with a confidence interval), not just as P values.
In one survey, 35 of 50 trial reports included subgroup analyses, of which only 42% used tests of interaction.(183)
It was often difficult to determine whether subgroup analyses had been specified in the protocol. In another
survey of surgical trials published in high impact journals, 27 of 72 trials reported 54 subgroup analyses, of which
91% were post hoc and only 6% of subgroup analyses used a test of interaction to assess whether a subgroup
effect existed.(249)
Similar recommendations apply to analyses in which adjustment was made for baseline variables. If done, both
unadjusted and adjusted analyses should be reported. Authors should indicate whether adjusted analyses,
including the choice of variables to adjust for, were planned. Ideally, the trial protocol should state whether
adjustment is made for nominated baseline variables by using analysis of covariance.(187) Adjustment for
variables because they differ significantly at baseline is likely to bias the estimated treatment effect.(187) A survey
found that unacknowledged discrepancies between protocols and publications were found for all 25 trials
reporting subgroup analyses and for 23 of 28 trials reporting adjusted analyses.(92)
Example:
“On the basis of a study that suggested perioperative β-blocker efficacy might vary across baseline risk, we
prespecified our primary subgroup analysis on the basis of the revised cardiac risk index scoring system. We also
did prespecified secondary subgroup analyses based on sex, type of surgery, and use of an epidural or spinal
anaesthetic. For all subgroup analyses, we used Cox proportional hazard models that incorporated tests for
interactions, designated to be significant at p<0.05 … Figure 3 shows the results of our prespecified subgroup
analyses and indicates consistency of effects … Our subgroup analyses were underpowered to detect the modest
differences in subgroup effects that one might expect to detect if there was a true subgroup effect.”(100)
19 – Harms
Description:
All important harms or unintended effects in each group
Explanation:
Readers need information about the harms as well as the benefits of interventions to make rational and balanced
decisions. The existence and nature of adverse effects can have a major impact on whether a particular
intervention will be deemed acceptable and useful. Not all reported adverse events observed during a trial are
necessarily a consequence of the intervention; some may be a consequence of the condition being treated.
Randomised trials offer the best approach for providing safety data as well as efficacy data, although they cannot
detect rare harms.
Many reports of RCTs provide inadequate information on adverse events. A survey of 192 drug trials published
from 1967 to 1999 showed that only 39% had adequate reporting of clinical adverse events and 29% had adequate
reporting of laboratory defined toxicity.(72) More recently, a comparison between the adverse event data
submitted to the trials database of the National Cancer Institute, which sponsored the trials, and the information
reported in journal articles found that low grade adverse events were underreported in journal articles. High grade
events (Common Toxicity Criteria grades 3 to 5) were reported inconsistently in the articles, and the information
regarding attribution to investigational drugs was incomplete.(251) Moreover, a review of trials published in six
general medical journals in 2006 to 2007 found that, although 89% of 133 reports mentioned adverse events, no
information on severe adverse events and withdrawal of patients due to an adverse event was given on 27% and
48% of articles, respectively.(252)
An extension of the CONSORT statement has been developed to provide detailed recommendations on the
reporting of harms in randomised trials.(42) Recommendations and examples of appropriate reporting are freely
available from the CONSORT website (www.consort-statement.org). They complement the CONSORT 2010
Statement and should be consulted, particularly if the study of harms was a key objective. Briefly, if data on
adverse events were collected, events should be listed and defined, with reference to standardised criteria where
appropriate. The methods used for data collection and attribution of events should be described. For each study
arm the absolute risk of each adverse event, using appropriate metrics for recurrent events, and the number of
participants withdrawn due to harms should be presented. Finally, authors should provide a balanced discussion of
benefits and harms.(42)
Example:
“The proportion of patients experiencing any adverse event was similar between the rBPI21 [recombinant
bactericidal/permeability-increasing protein] and placebo groups: 168 (88.4%) of 190 and 180 (88.7%) of 203,
respectively, and it was lower in patients treated with rBPI21 than in those treated with placebo for 11 of 12 body
systems … the proportion of patients experiencing a severe adverse event, as judged by the investigators, was
numerically lower in the rBPI21 group than the placebo group: 53 (27.9%) of 190 versus 74 (36.5%) of 203 patients,
respectively. There were only three serious adverse events reported as drug-related and they all occurred in the
placebo group.”(250)
Discussion
20 – Limitations
Description:
Trial limitations, addressing sources of potential bias, imprecision, and, if relevant, multiplicity of analyses
Explanation:
The discussion sections of scientific reports are often filled with rhetoric supporting the authors’ findings(254) and
provide little measured argument of the pros and cons of the study and its results(255) (256). Some journals have
attempted to remedy this problem by encouraging more structure to authors’ discussion of their results.(255)
(256) For example, Annals of Internal Medicine recommends that authors structure the discussion section by
presenting (1) a brief synopsis of the key findings, (2) consideration of possible mechanisms and explanations, (3)
comparison with relevant findings from other published studies (whenever possible including a systematic review
combining the results of the current study with the results of all previous relevant studies), (4) limitations of the
present study (and methods used to minimize and compensate for those limitations), and (5) a brief section that
summarises the clinical and research implications of the work, as appropriate.(255) We recommend that authors
follow these sensible suggestions, perhaps also using suitable subheadings in the discussion section.
Although discussion of limitations is frequently omitted from research reports,(257) identification and discussion
of the weaknesses of a study have particular importance.(258) For example, a surgical group reported that
laparoscopic cholecystectomy, a technically difficult procedure, had significantly lower rates of complications than
the more traditional open cholecystectomy for management of acute cholecystitis.(259) However, the authors
failed to discuss an obvious bias in their results. The study investigators had completed all the laparoscopic
cholecystectomies, whereas 80% of the open cholecystectomies had been completed by trainees.
Authors should also discuss any imprecision of the results. Imprecision may arise in connection with several
aspects of a study, including measurement of a primary outcome (see item 6a) or diagnosis (see item 4a). Perhaps
the scale used was validated on an adult population but used in a paediatric one, or the assessor was not trained in
how to administer the instrument.
The difference between statistical significance and clinical importance should always be borne in mind. Authors
should particularly avoid the common error of interpreting a non-significant result as indicating equivalence of
interventions. The confidence interval (see item 17a) provides valuable insight into whether the trial result is
compatible with a clinically important effect, regardless of the P value.(120)
Authors should exercise special care when evaluating the results of trials with multiple comparisons. Such
multiplicity arises from several interventions, outcome measures, time points, subgroup analyses, and other
factors. In such circumstances, some statistically significant findings are likely to result from chance alone.
Example:
“The preponderance of male patients (85%) is a limitation of our study … We used bare-metal stents, since drug-eluting stents were not available until late during accrual. Although the latter factor may be perceived as a
limitation, published data indicate no benefit (either short-term or long-term) with respect to death and
myocardial infarction in patients with stable coronary artery disease who receive drug-eluting stents, as compared
with those who receive bare-metal stents.”(253)
21 – Generalisability
Description:
Generalisability (external validity, applicability) of the trial findings
Explanation:
External validity, also called generalisability or applicability, is the extent to which the results of a study can be
generalised to other circumstances.(262) Internal validity, the extent to which the design and conduct of the trial
eliminate the possibility of bias, is a prerequisite for external validity: the results of a flawed trial are invalid and
the question of its external validity becomes irrelevant. There is no absolute external validity; the term is
meaningful only with regard to clearly specified conditions that were not directly examined in the trial. Can results
be generalised to an individual participant or groups that differ from those enrolled in the trial with regard to age,
sex, severity of disease, and comorbid conditions? Are the results applicable to other drugs within a class of similar
drugs, to a different dose, timing, and route of administration, and to different concomitant therapies? Can similar
results be expected at the primary, secondary, and tertiary levels of care? What about the effect on related
outcomes that were not assessed in the trial, and the importance of length of follow-up and duration of treatment,
especially with respect to harms? (263)
External validity is a matter of judgment and depends on the characteristics of the participants included in the trial,
the trial setting, the treatment regimens tested, and the outcomes assessed.(5) (136) It is therefore crucial that
adequate information be described about eligibility criteria and the setting and location (see item 4b), the
interventions and how they were administered (see item 5), the definition of outcomes (see item 6), and the period
of recruitment and follow-up (see item 14). The proportion of control group participants in whom the outcome
develops (control group risk) is also important. The proportion of eligible participants who refuse to enter the trial
as indicated on the flowchart (see item 13) is relevant for the generalisability of the trial, as it may indicate
preferences for or acceptability of an intervention. Similar considerations may apply to clinician preferences. (264)
(265)
Several issues are important when results of a trial are applied to an individual patient.(266) (267) (268) Although
some variation in treatment response between an individual patient and the patients in a trial or systematic review
is to be expected, the differences tend to be in magnitude rather than direction.
Although there are important exceptions,(268) therapies (especially drugs (269)) found to be beneficial in a
narrow range of patients generally have broader application in actual practice. Frameworks for the evaluation of
external validity have been proposed, including qualitative studies, such as in integral “process evaluations”(270)
and checklists.(271) Measures that incorporate baseline risk when calculating therapeutic effects, such as the
number needed to treat to obtain one additional favourable outcome and the number needed to treat to produce
one adverse effect, are helpful in assessing the benefit-to-risk balance in an individual patient or group with
characteristics that differ from the typical trial participant.(268) (272) (273) Finally, after deriving patient centred
estimates for the potential benefit and harm from an intervention, the clinician must integrate them with the
patient’s values and preferences for therapy. Similar considerations apply when assessing the generalisability of
results to different settings and interventions.
Example:
“As the intervention was implemented for both sexes, all ages, all types of sports, and at different levels of sports,
the results indicate that the entire range of athletes, from young elite to intermediate and recreational senior
athletes, would benefit from using the presented training programme for the prevention of recurrences of ankle
sprain. By including non-medically treated and medically treated athletes, we covered a broad spectrum of injury
severity. This suggests that the present training programme can be implemented in the treatment of all athletes.
Furthermore, as it is reasonable to assume that ankle sprains not related to sports are comparable with those in
sports, the programme could benefit the general population.”(260)
“This replicates and extends the work of Clarke and colleagues and demonstrates that this CB (cognitive
behavioural) prevention program can be reliably and effectively delivered in different settings by clinicians outside
of the group who originally developed the intervention. The effect size was consistent with those of previously
reported, single-site, indicated depression prevention studies and was robust across sites with respect to both
depressive disorders and symptoms … In this generalisability trial, we chose a comparison condition that is
relevant to public health—usual care … The sample also was predominantly working class to middle class with
access to health insurance. Given evidence that CB therapy can be more efficacious for adolescents from homes
with higher incomes, it will be important to test the effects of this prevention program with more economically
and ethnically diverse samples.”(261)
22 – Interpretation
Description:
Interpret results in relation to the noninferiority hypothesis. If a superiority conclusion is drawn for outcome(s) for
which noninferiority was hypothesised, provide justification for switching.
Explanation:
The results of any trial must be interpreted in relation to its aims. As shown in Figure 1, assuming an adverse
outcome calculated as new vs. reference, if the upper bound of the 2-sided (1−2α)×100% CI for the difference
between treatments is below ▲, noninferiority may be claimed. Alternative explanations such as poor adherence,
dropouts, recruitment of patients unlikely to respond, and treatment crossovers may need to be considered
(see “Conduct” in eAppendix). If instead the upper bound is above the noninferiority margin ▲, the null hypothesis
of inferiority remains plausible. If the 2-sided CI for the treatment difference is entirely to the left of zero as in case
A of Figure 1, then it can be sensibly concluded that there is statistically significant evidence that the new
treatment is superior to reference, if the superiority hypothesis is defined a priori in the protocol and the analysis
is ITT. It should be indicated whether the conclusion relating to noninferiority or equivalence is based on ITT or per
protocol analysis or both and whether those conclusions are stable with respect to different types of analyses (e.g.,
ITT, per-protocol). Conclusions should preferably be stated in terms of the prespecified noninferiority or
equivalence margin using language consistent with the aim of the trial (e.g., treatment A is “noninferior to” or
“equivalent to” treatment B).32,33,47
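The interpretive logic of Figure 1 can be stated mechanically. A minimal sketch, classifying a two-sided CI for the treatment difference (new minus reference, adverse outcome, so smaller is better) against the margin ▲; for a ratio measure such as a hazard ratio the same rules apply on the log scale:

```python
import math

def noninferiority_conclusion(ci_lower, ci_upper, margin):
    """Classify a two-sided CI for the treatment difference (new minus
    reference, adverse outcome) against the noninferiority margin,
    following the scenarios of Figure 1."""
    if ci_upper < margin:
        if ci_upper < 0:
            return "noninferior and superior (scenario A)"
        if ci_lower > 0:
            return "noninferior, yet significantly worse than reference (scenario D)"
        return "noninferior, not shown to be superior (scenarios B/C)"
    if ci_lower > margin:
        return "inferior (scenario H)"
    if ci_lower > 0:
        return "significantly worse; inconclusive about inferiority by >= margin (scenario G)"
    return "inconclusive (scenarios E/F)"

# Example (a) below, on the log scale: HR CI 1.01 to 1.57, margin 1.47.
print(noninferiority_conclusion(math.log(1.01), math.log(1.57), math.log(1.47)))
```

Applied to example (a) below, this returns scenario G: tirofiban was significantly worse than abciximab, and noninferiority was not shown.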
Example:
a) Concluding inferiority of new drug or conventional superiority of reference drug
• “Although the trial was intended to assess the non-inferiority of tirofiban as compared with
abciximab, the findings demonstrated that tirofiban offered less protection from major ischemic
events than did abciximab. . . . In order to meet the present definition of equivalence, the upper
bound of the 95% confidence interval of the hazard ratio for the comparison of tirofiban with
abciximab had to be less than 1.47. . . . The primary end point occurred more frequently among
the 2398 patients in the tirofiban group than among the 2411 patients in the abciximab group
(7.6 percent vs 6.0 percent; hazard ratio, 1.26; . . . two-sided 95 percent confidence interval of 1.01
to 1.57, demonstrating the superiority of abciximab over tirofiban; P=0.038).”51
b) Concluding noninferiority of new drug from a trial designed to assess superiority
• “The SYNERGY protocol prespecified that if enoxaparin was not demonstrated to be superior to
unfractionated heparin, a non-inferiority analysis was to be performed. . . . Enoxaparin was not
superior to unfractionated heparin but was noninferior for the treatment of high-risk patients
with non-ST-segment elevation [acute coronary syndromes].”61
c) Concluding equivalence
• “The risk difference for complete abortion was 1.24% (95% CI -0.53 to 3.02), which falls within the
predefined equivalence range (-5% to 5%). . . . The provision of medical abortion up to 9 weeks’
gestation by midlevel providers and doctors was similar in . . . effectiveness.”52
Other Information
23 – Registration
Description:
Registration number and name of trial registry
Explanation:
The consequences of non-publication of entire trials,(281) (282) selective reporting of outcomes within trials, and
of per protocol rather than intention-to-treat analysis have been well documented.(55) (56) (283) Covert
redundant publication of clinical trials can also cause problems, particularly for authors of systematic reviews when
results from the same trial are inadvertently included more than once.(284)
To minimize or avoid these problems there have been repeated calls over the past 25 years to register clinical trials
at their inception, to assign unique trial identification numbers, and to record other basic information about the
trial so that essential details are made publicly available.(285) (286) (287) (288) Provoked by recent serious
problems of withholding data,(289) there has been a renewed effort to register randomised trials. Indeed, the
World Health Organisation states that “the registration of all interventional trials is a scientific, ethical and moral
responsibility” (www.who.int/ictrp/en). By registering a randomised trial, authors typically report a minimal set of
information and obtain a unique trial registration number.
In September 2004 the International Committee of Medical Journal Editors (ICMJE) changed their policy, saying
that they would consider trials for publication only if they had been registered before the enrolment of the first
participant.(290) This resulted in a dramatic increase in the number of trials being registered.(291) The ICMJE
gives guidance on acceptable registries (http://www.icmje.org/faq_clinical.html).
In a recent survey of 165 high impact factor medical journals’ instructions to authors, 44 journals specifically stated
that all recent clinical trials must be registered as a requirement of submission to that journal.(292)
Authors should provide the name of the register and the trial’s unique registration number. If authors had not
registered their trial they should explicitly state this and give the reason.
Example:
“The trial is registered at ClinicalTrials.gov, number NCT00244842.” (280)
24 – Protocol
Description:
Where the full trial protocol can be accessed, if available
Explanation:
A protocol for the complete trial (rather than a protocol of a specific procedure within a trial) is important because
it pre-specifies the methods of the randomised trial, such as the primary outcome (see item 6a). Having a protocol
can help to restrict the likelihood of undeclared post hoc changes to the trial methods and selective outcome
reporting (see item 6b). Elements that may be important for inclusion in the protocol for a randomised trial are
described elsewhere.(294)
There are several options for authors to consider in ensuring their trial protocol is accessible to interested readers. As
described in the example below, journals reporting a trial’s primary results can make the trial protocol available on
their web site. Accessibility to the trial results and protocol is enhanced when the journal is open access. Some
journals (such as Trials) publish trial protocols, and such a publication can be referenced when reporting the trial’s
principal results. Trial registration (see item 23) will also ensure that many trial protocol details are available, as
the minimum trial characteristics included in an approved trial registration database includes several protocol
items and results (www.who.int/ictrp/en). Trial investigators may also be able to post their trial protocol on a
website through their employer. Whatever mechanism is used, we encourage all trial investigators to make their
protocol easily accessible to interested readers.
Example:
“Full details of the trial protocol can be found in the Supplementary Appendix, available with the full text of this
article at www.nejm.org.”(293)
25 – Funding
Description:
Sources of funding and other support (such as supply of drugs), role of funders
Explanation:
Authors should report the sources of funding for the trial, as this is important information for readers assessing a
trial. Studies have shown that research sponsored by the pharmaceutical industry is more likely to produce
results favouring the product made by the company sponsoring the research than studies funded by other
sources.(297) (298) (299) (300) A systematic review of 30 studies on funding found that research funded by the
pharmaceutical industry had four times the odds of having outcomes favouring the sponsor than research funded
by other sources (odds ratio 4.05, 95% confidence interval 2.98 to 5.51).(297) A large proportion of trial
publications do not currently report sources of funding. The degree of underreporting is difficult to quantify. A
survey of 370 drug trials found that 29% failed to report sources of funding.(301) In another survey, of PubMed
indexed randomised trials published in December 2000, source of funding was reported for 66% of the 519
trials.(16)
The level of involvement by a funder and their influence on the design, conduct, analysis, and reporting of a trial
varies. It is therefore important that authors describe in detail the role of the funders. If the funder had no such
involvement, the authors should state so. Similarly, authors should report any other sources of support, such as
supply and preparation of drugs or equipment, or assistance with the analysis of data and writing of the manuscript.(302)
Example:
“Grant support was received for the intervention from Plan International and for the research from the Wellcome
Trust and Joint United Nations Programme on HIV/AIDS (UNAIDS). The funders had no role in study design, data
collection and analysis, decision to publish, or preparation of the manuscript.”(295)
“This study was funded by GlaxoSmithKline Pharmaceuticals. GlaxoSmithKline was involved in the design and
conduct of the study and provided logistical support during the trial. Employees of the sponsor worked with the
investigators to prepare the statistical analysis plan, but the analyses were performed by the University of Utah.
The manuscript was prepared by Dr Shaddy and the steering committee members. GlaxoSmithKline was permitted
to review the manuscript and suggest changes, but the final decision on content was exclusively retained by the
authors.”(296)
Authors
Gilda Piaggio, PhD
Diana R. Elbourne, PhD
Stuart J. Pocock, PhD
Stephen J. W. Evans, MSc
Douglas G. Altman, DSc
for the CONSORT Group
The original CONSORT (CONsolidated Standards of Reporting Trials) Statement was developed to help alleviate the
problem of inadequate reporting of randomised controlled trials (RCTs).1-3 The statement, recently updated as
CONSORT 2010,4,5 comprises evidence-based recommendations for reporting RCTs, including a diagram showing the
flow of participants through the trial.
The initial focus of the CONSORT Statement was on parallel-group trials,1-3 aiming to identify treatment superiority
if it exists. Most CONSORT recommendations apply equally to other trial designs, but some need adaptation. We
therefore extended the CONSORT recommendations to noninferiority and equivalence trials in 2006.6 The present
article updates those recommendations to reflect the new CONSORT 2010 Statement and the 2008 CONSORT
Statement for the reporting of abstracts, together with recent methodological publications.7,8 The main changes from
the 2006 article are shown in the BOX. We generally focus on noninferiority trials throughout, but the same
principles apply to equivalence trials.
First, the article explains the rationale for such trials. Second, it considers how commonly noninferiority trials are
published. Third, it provides empirical evidence about their quality. Fourth, it explains the approach used to update
the CONSORT Statement to include noninferiority trials. Fifth, it presents the updated CONSORT checklist for
reporting noninferiority trials and provides illustrative examples (and further elaboration) for those items that have
been amended.
For convenience, the article will refer to treatments and patients, although not all interventions evaluated in RCTs
are technically treatments, and the participants in trials are not always patients.
Box: Major changes from the 2006 extension of the CONSORT statement
for reporting noninferiority and equivalence trials
• Based on the standard CONSORT 2010 checklist, which incorporates changes to the CONSORT 2001 checklist
described in detail in the CONSORT 2010 Statement9
• Uses a 2-column display to show more clearly the additional information to report for noninferiority trials
• New checklist for abstracts to apply to noninferiority trials
• Expanded checklist items for objectives, outcomes, and interpretation
• Most examples of good reporting practice updated, including 10 new examples of good reporting from publications after
2006. Kept 3 examples from the 2006 extension that illustrate specific points
• Recent methodological developments are summarized in the eAppendix
• Empirical evidence of reporting of noninferiority trials updated
Rationale for noninferiority or equivalence designs
Most RCTs aim to determine whether one intervention is superior to another. Failure to show a difference does
not mean they are equivalent. By contrast, equivalence trials10 aim to determine whether one (typically new)
intervention is therapeutically similar to another (usually an existing) treatment. We use “new” to refer to the
treatment under evaluation, and the comparison or standard or reference treatment is often called an “active
control.” We will generally use the term “reference treatment” for consistency.
A noninferiority trial seeks to determine whether a new treatment is not worse than a reference treatment by
more than an acceptable amount. Because proof of exact equivalence is impossible, a prestated margin of
noninferiority (▲) for the treatment effect in a primary patient outcome is defined. Equivalence trials are very
similar, except that equivalence is defined as the treatment effect lying between −▲ and ▲. For therapeutic or
prophylactic trials the noninferiority approach is much more common than a true (2-sided) equivalence approach.
However, equivalence trials are more common in pharmacokinetics, in which a difference in either direction from
the reference treatment is of importance.
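In formal terms, writing Δ for the prespecified margin (the ▲ of this article) and d for the treatment difference (new minus reference) in an adverse outcome, the hypotheses can be stated as:

```latex
% Noninferiority (one-sided question):
\[
  H_0\colon d \ge \Delta
  \quad\text{vs}\quad
  H_1\colon d < \Delta
\]
% Equivalence (two-sided question):
\[
  H_0\colon \lvert d \rvert \ge \Delta
  \quad\text{vs}\quad
  H_1\colon -\Delta < d < \Delta
\]
```

Rejecting the null hypothesis in the first case establishes noninferiority; rejecting it in the second establishes equivalence within ±Δ.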
Noninferiority of the new treatment with respect to the reference treatment is of interest on the premise that the
new treatment has some other advantage, such as greater availability, reduced cost, less invasiveness,11,12 fewer
adverse effects (harms),13 or greater ease of administration.14 In trials that investigate noninferiority, therefore, the
question of interest is not symmetric.15 The new treatment will be recommended if it is similar to the reference
treatment for a prespecified primary outcome but not if it is worse by more than ▲. Superiority of the new
treatment for the primary outcome would be an additional benefit. Some noninferiority trials have been criticised
for merely studying a new marketable product (“me-too” drugs) without offering any advantages over existing
products.16 The use of noninferiority or equivalence trials has been criticised on the grounds that they ask “no
relevant clinical questions” and are therefore unethical.17 But some observers argue that this view is misplaced.18,19
This article focuses mainly on noninferiority trials but applies also to the less common 2-sided equivalence trials
(eAppendix, available at http://www .jama.com).
How common are noninferiority trials?
Assessing the frequency of noninferiority trials is complicated, because not all noninferiority or equivalence trials
use these words, and the term “equivalence” is often inappropriately used when reporting “negative” (null) results
of superiority trials; such trials often lack statistical power to rule out important differences.20,21 A recent review of
583 noninferiority trials of drug therapies published between 1989 and 2009 showed an increasing trend, with only
1 trial published before 1999 and more than 100 trials published per year from 2007.22 A third of these were in the
fields of infectious diseases or cardiology. An earlier review found the same 2 specialties had the greatest number
of noninferiority and equivalence trials.23 Surveys in ophthalmology24 and oncology25 also found increases in the
number of such trials.
Quality of reporting of noninferiority trials
Early reviews of the quality of trials claiming equivalence found that important deficiencies were common.
Equivalence was inappropriately claimed in 67% of 88 studies published from 1992 to 1996 on the basis of
nonsignificant tests for superiority.21 Fifty-one percent stated equivalence as an aim, but only 23% reported that
they were designed with a preset margin of equivalence. Other disease- or field-specific reviews had similar
findings.26-29
More recent reviews have found that the quality of reports of noninferiority and equivalence trials remains poor.
In one review (covering the years 1990 to 2000) only about one fifth of 332 noninferiority and equivalence trials
provided a suitable rationale for the noninferiority margin.23 In another review covering noninferiority trials
indexed in PubMed as of February 5, 2009, almost all of 232 published reports of equivalence and noninferiority
drug trials specified the noninferiority margin, but only 24% explained how it was determined.30,31 Other reviews
had broadly similar findings.32-34 An increasing quality of reporting of noninferiority trials in oncology was observed
from 2001 to 2010.35
Updating the CONSORT statement extension for noninferiority
randomised trials
The updated CONSORT 2010 Statement comprises a 25-item checklist and a participant flow diagram.4 In the 2010
update, some new items and subitems were introduced, wording was simplified and clarified, and the specificity of
some items was made more explicit by breaking them into subitems. Methodological advances reported in the
literature since the 2001 Statement were reviewed and taken into consideration. This noninferiority extension was
undertaken to reflect the updated CONSORT Statement and to integrate any significant advances in noninferiority
trials methodology since 2006.
The updating process
An electronic search of publications citing the original CONSORT extension for noninferiority and equivalence
trials6 was conducted using Web of Science (October 14, 2010). The search yielded 260 publications. An initial
assessment of the titles and abstracts was made for relevance, yielding 142 articles. After excluding repeated
publications, 137 articles remained, of which 85 were trial reports, 47 were methodological papers, and 5 were
reviews of published reports of trials potentially relevant to the update of the CONSORT extension. The
methodological studies and reviews were assessed for material that might influence the update. In addition, we
reviewed publications from 2006 and later, including guidelines issued by both the Food and Drug Administration36
and the European Medicines Agency37,38 for sponsors to consider when designing and reporting noninferiority trials
(whether for prelicensing pivotal trials or postlicensing safety trials). The citation search was rerun on October 8,
2012, and an additional 149 articles were considered for relevance.
Three authors (G.P., D.R.E., D.G.A.) met face to face on several occasions to discuss the revision of the extension
and also discussed multiple drafts on conference calls and by e-mail. A draft of the revised checklist and
accompanying text was distributed to other coauthors, and the subsequent revision was circulated to the larger
CONSORT group for feedback. After consideration of their comments the final version was prepared and approved
by the CONSORT Executive (http://www.consort-statement.org/about-consort/the-consort-group/the-consort-group-executive/).
Methodology
Methodological considerations in noninferiority trials are discussed in the eAppendix. Key issues include the need
to state the trial hypotheses in relation to the noninferiority margin; the choice of this margin; analysis using a
confidence interval (CI) approach; and the presentation and interpretation of the results using the CI in relation to
the noninferiority margin (Figure 1; a brief illustrative sketch follows the figure legend).
Figure 1: Possible scenarios of observed treatment differences for adverse outcomes (harms) in noninferiority trials
Error bars indicate 2-sided 95% CIs. The blue dashed line at x = Δ indicates the noninferiority margin; the blue
tinted region to the right of x = Δ indicates the zone of inferiority. A, If the CI lies wholly to the left of zero, the new
treatment is superior. B and C, If the CI lies to the left of Δ and includes zero, the new treatment is noninferior but
not shown to be superior. D, If the CI lies wholly to the left of Δ and wholly to the right of zero, the new treatment
is noninferior in the sense already defined but also inferior in the sense that a null treatment difference is
excluded. This puzzling circumstance is rare, because it requires a very large sample size; it can also result from a
noninferiority margin that is too wide. E and F, If the CI includes Δ and zero, the difference is nonsignificant but
the result regarding noninferiority is inconclusive. G, If the CI includes Δ and is wholly to the right of zero, the
difference is statistically significant but the result is inconclusive regarding possible inferiority of magnitude Δ or
worse. H, If the CI is wholly above Δ, the new treatment is inferior.
a This CI indicates noninferiority in the sense that it does not include Δ, but the new treatment is significantly
worse than the standard. Such a result is unlikely because it would require a very large sample size.
b This CI is inconclusive in that it is still plausible that the true treatment difference is less than Δ, but the new
treatment is significantly worse than the standard.
Adapted from Piaggio et al.6
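To make the CI logic of Figure 1 concrete, the following minimal sketch (in Python; it is not part of the CONSORT materials, and all function names and inputs are illustrative) computes a two-sided 95% Wald CI for the difference in the proportion experiencing a harm (new minus standard) and maps the interval onto the qualitative conclusions of the figure for a prespecified margin Δ:

    # Illustrative sketch only: a Wald CI for a risk difference in a harm
    # outcome, read against a noninferiority margin as in Figure 1.
    from math import sqrt

    def wald_ci(events_new, n_new, events_std, n_std, z=1.96):
        """Two-sided 95% CI for p_new - p_std (harm: smaller is better)."""
        p_new, p_std = events_new / n_new, events_std / n_std
        diff = p_new - p_std
        se = sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
        return diff - z * se, diff + z * se

    def interpret(lower, upper, margin):
        """Map the CI to the scenarios of Figure 1 (margin > 0)."""
        if upper < 0:
            return "superior (CI wholly below zero)"                     # A
        if upper < margin and lower > 0:
            return "noninferior by the margin, yet worse than standard"  # D
        if upper < margin:
            return "noninferior, not shown to be superior"               # B, C
        if lower > margin:
            return "inferior (CI wholly above the margin)"               # H
        if lower > 0:
            return "significantly worse; inconclusive about the margin"  # G
        return "inconclusive (CI spans both zero and the margin)"        # E, F

    low, high = wald_ci(events_new=30, n_new=500, events_std=25, n_std=500)
    print(interpret(low, high, margin=0.05))

With these hypothetical counts the interval runs from roughly −0.018 to 0.038, which lies wholly below a margin of 0.05 while including zero, corresponding to scenarios B and C.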
Revised extension of CONSORT statement
To accommodate noninferiority trials, an extension of the CONSORT Statement should encompass the following
main issues: (1) the rationale for adopting a noninferiority design; (2) how study hypotheses were incorporated into
the design; (3) choice of participants, interventions (especially the reference treatment), and outcomes; (4)
statistical methods, including sample size calculation (an illustrative sketch follows this paragraph); and (5) how the
design affects interpretation and conclusions. Consequences for the CONSORT checklist, including specific changes,
are described below. The flow diagram was not considered to require any specific modification.
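As a purely illustrative aside on item 4, the sketch below (assuming equal true event rates, a normal approximation, a one-sided significance level, and hypothetical inputs; it is not a formula mandated by CONSORT) computes an approximate per-group sample size for demonstrating noninferiority on a binary outcome:

    # Illustrative sketch only: approximate per-group sample size for a
    # noninferiority comparison of two proportions, assuming equal true event
    # rates, so the whole margin is available to be excluded by the CI.
    from math import ceil

    Z = {0.025: 1.959964, 0.05: 1.644854, 0.10: 1.281552, 0.20: 0.841621}

    def n_per_group(p_event, margin, alpha=0.025, beta=0.10):
        """n per group for a one-sided test at level alpha with power
        1 - beta to declare noninferiority when the true difference is 0."""
        z_a, z_b = Z[alpha], Z[beta]
        variance = 2 * p_event * (1 - p_event)  # sum of the two Bernoulli variances
        return ceil(variance * (z_a + z_b) ** 2 / margin ** 2)

    # e.g., a 10% event rate and a margin of 4 percentage points:
    print(n_per_group(p_event=0.10, margin=0.04))  # about 1183 per group

Because the required sample size scales with the inverse square of the margin, halving the margin roughly quadruples the number of participants needed, which is why the choice of margin dominates the design.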
Comment
Available efficacious active treatments can make the use of placebo controls unethical.64 Noninferiority trials,
comparing a new treatment with a standard, are becoming more frequent because of the need to replace standard
treatments with treatments of comparable efficacy that offer other advantages. Even when a treatment is
efficacious on some measures (e.g., depression scales), it may not be efficacious for a rarer but arguably more
important outcome (e.g., suicide).65
It is not our intent to promote noninferiority or equivalence trials but to contribute to better reporting and
understanding of these trials: the design of a trial should be appropriate to the question to be answered.66 Reports
of noninferiority and equivalence trials must be clear enough to allow readers to interpret results reliably.
Accordingly, we have provided an updated extension to the CONSORT Statement to facilitate appropriate
reporting of noninferiority and equivalence trials. The present recommendations are among a series of extensions
to the CONSORT Statement. The current versions of all CONSORT recommendations are available at
http://www.consort-statement.org.
Author contributions
Dr Piaggio had full access to all of the data in the study and takes responsibility for the integrity of the data and the
accuracy of the data analysis.
Study concept and design: Piaggio, Elbourne, Pocock, Evans, Altman.
Acquisition of data: Piaggio, Elbourne, Altman.
Analysis and interpretation of data: Piaggio, Elbourne, Altman.
Drafting of the manuscript: Piaggio, Elbourne, Pocock, Altman.
Critical revision of the manuscript for important intellectual content: Piaggio, Pocock, Evans, Altman.
Obtained funding: Altman.
Funding/Support
The CONSORT Group receives funding from the UK Medical Research Council. Dr Piaggio and Dr Elbourne received
a small contribution from the CONSORT group toward the time spent on this and other CONSORT work. Dr Piaggio
was supported by the CONSORT group to attend meetings in London. Dr Altman is supported by a Cancer Research
UK programme grant (C5529).
References
1. Begg C, Cho M, Eastwood S, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT
statement. JAMA. 1996;276(8):637- 639.
2. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of
reports of parallel-group randomised trials. Lancet. 2001;357(9263):1191-1194.
3. Altman DG, Schulz KF, Moher D, et al; CONSORT GROUP (Consolidated Standards of Reporting Trials). The revised
CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med.
2001;134(8):663-694.
4. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting
parallel group randomised trials. BMJ. 2010;340:c869.
5. Schulz KF, Altman DG, Moher D; CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting
parallel group randomised trials. PLoS Med. 2010;7(3):e1000251.
6. Piaggio G, Elbourne DR, Altman DG, Pocock SJ, Evans SJ; CONSORT Group. Reporting of noninferiority and
equivalence randomized trials: an extension of the CONSORT statement. JAMA. 2006;295(10):1152-1160.
7. Hopewell S, Clarke M, Moher D, et al; CONSORT Group. CONSORT for reporting randomised trials in journal and
conference abstracts. Lancet. 2008;371(9609): 281-283.
8. Hopewell S, Clarke M, Moher D, et al; CONSORT Group. CONSORT for reporting randomized controlled trials in
journal and conference abstracts: explanation and elaboration. PLoS Med. 2008;5(1):e20.
9. Schulz KF, Altman DG, Moher D; CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting
parallel group randomised trials. BMJ. 2010; 340:c332.
10. Wellek S. Testing Statistical Hypotheses of Equivalence. Boca Raton, Fla: Chapman & Hall/CRC; 2003.
11. Durkalski VL, Palesch YY, Pineau BC, Vining DJ, Cotton PB. The virtual colonoscopy study: a large multicenter clinical
trial designed to compare two diagnostic screening procedures. Control Clin Trials. 2002; 23(5):570-583.
12. Clinical Outcomes of Surgical Therapy Study Group. A comparison of laparoscopically assisted and open colectomy for
colon cancer. N Engl J Med. 2004;350 (20):2050-2059.
13. Chadwick D; Vigabatrin European Monotherapy Study Group. Safety and efficacy of vigabatrin and carbamazepine in
newly diagnosed epilepsy: a multicentre randomised double-blind study. Lancet. 1999; 354(9172):13-19.
14. Van De Werf F, Adgey J, Ardissino D, et al; Assessment of the Safety and Efficacy of a New Thrombolytic (ASSENT-2)
Investigators. Single-bolus tenecteplase compared with front-loaded alteplase in acute myocardial infarction: the
ASSENT-2 double-blind randomised trial. Lancet. 1999;354(9180):716-722.
15. Committee for Proprietary Medicinal Products. Note for Guidance on Evaluation of Medicinal Products Indicated for
Treatment of Bacterial Infections. London, England: European Medicines Agency (EMA); 2004.
16. Pocock SJ. The pros and cons of non-inferiority (equivalence) trials. In: Guess HA, Kleinman A, Kusek JW, Engel LW,
eds. The Science of Placebo: Towards an Interdisciplinary Research Agenda. London, England: BMJ Books;
2000:236-248.
17. Garattini S, Bertele’ V. Non-inferiority trials are unethical because they disregard patients’ interests. Lancet.
2007;370(9602):1875-1877.
18. Nunn AJ, Meredith SK, Spigelman MK, Ginsberg AM, Gillespie SH. The ethics of non-inferiority trials. Lancet.
2008;371(9616):895; author reply 896-897.
19. Gandjour A. The ethics of non-inferiority trials. Lancet. 2008;371(9616):895; author reply 896-897.
20. Moher D, Dulberg CS, Wells GA. Statistical power, sample size, and their reporting in randomized controlled trials.
JAMA. 1994;272(2):122-124.
21. Greene WL, Concato J, Feinstein AR. Claims of equivalence in medical research: are they supported by the evidence?
Ann Intern Med. 2000;132(9):715-722.
22. Suda KJ, Hurley AM, McKibbin T, Motl Moroney SE. Publication of noninferiority clinical trials: changes over a 20-year interval. Pharmacotherapy. 2011;31(9):833-839.
23. Lange S, Freitag G. Choice of delta: requirements and reality—results of a systematic review. Biom J. 2005;47(1):12-27,
99-107.
24. Eyawo O, Lee CW, Rachlis B, Mills EJ. Reporting of noninferiority and equivalence randomized trials for major
prostaglandins: a systematic survey of the ophthalmology literature. Trials. 2008;9:69.
25. Chuang-Stein C, Beltangady M, Dunne M, Morrison B. The ethics of non-inferiority trials. Lancet. 2008;371(9616):895-896, author reply 896-897.
26. Krysan DJ, Kemper AR. Claims of equivalence in randomized controlled trials of the treatment of bacterial meningitis in
children. Pediatr Infect Dis J. 2002; 21(8):753-758.
27. Dimick JB, Diener-West M, Lipsett PA. Negative results of randomized clinical trials published in the surgical literature:
equivalency or error? Arch Surg. 2001; 136(7):796-800.
28. Costa LJ, Xavier AC, del Giglio A. Negative results in cancer clinical trials—equivalence or poor accrual? Control Clin
Trials. 2004;25(5):525-533.
29. Piaggio G, Pinol AP. Use of the equivalence approach in reproductive health clinical trials. Stat Med. 2001;20(23):3571-3577.
30. Wangge G, Klungel OH, Roes KCB, de Boer A, Hoes AW, Knol MJ. Interpretation and inference in noninferiority
randomized controlled trials in drug research. Clin Pharmacol Ther. 2010;88(3):420-423.
31. Wangge G, Klungel OH, Roes KC, de Boer A, Hoes AW, Knol MJ. Room for improvement in conducting and reporting
non-inferiority randomized controlled trials on drugs: a systematic review. PLoS One. 2010;5(10): e13550.
32. Parienti JJ, Verdon R, Massari V. Methodological standards in non-inferiority AIDS trials: moving from adherence to
compliance. BMC Med Res Methodol. 2006;6:46.
33. Le Henanff A, Giraudeau B, Baron G, Ravaud P. Quality of reporting of noninferiority and equivalence randomized
trials. JAMA. 2006;295(10):1147-1151.
34. Schiller P, Burchardi N, Niestroj M, Kieser M. Quality of reporting of clinical non-significance and equivalence
randomized trials—update and extension. Trials. 2012;13:214.
35. Tanaka S, Kinjo Y, Kataoka Y, Yoshimura K, Teramukai S. Statistical issues and recommendations for noninferiority
trials in oncology: a systematic review. Clin Cancer Res. 2012;18(7):1837-1847.
36. US Food and Drug Administration (FDA). Guidance for Industry, Non-Inferiority Clinical Trials. FDA website.
http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM202140.pdf.
March 2010. Accessed November 19, 2012.
37. Committee for Medicinal Products for Human Use; Efficacy Working Party; Committee for Release for Consultation.
Committee for Medicinal Products for Human Use (CHMP) guideline on the choice of the noninferiority margin.
Stat Med. 2006;25(10):1628-1638.
38. European Medicines Agency (EMA). Points to Consider on Switching Between Superiority and Non-inferiority. London,
England: EMA; 2000.
39. Boutron I, Moher D, Altman DG, Schulz KF, Ravaud P; CONSORT Group. Extending the CONSORT statement to
randomized trials of nonpharmacologic treatment: explanation and elaboration. Ann Intern Med. 2008;148(4):295-309.
40. Zwarenstein M, Treweek S, Gagnier JJ, et al; CONSORT Group; Pragmatic Trials in Healthcare (Practihc) Group.
Improving the reporting of pragmatic trials: an extension of the CONSORT statement. BMJ. 2008; 337:a2390.
41. Campbell MK, Piaggio G, Elbourne DR, Altman DG; CONSORT Group. Consort 2010 statement: extension to cluster
randomised trials. BMJ. 2012;345:e5661.
42. Eriksson BI, Dahl OE, Rosencher N, et al;RE-NOVATE Study Group. Dabigatran etexilate versus enoxaparin for
prevention of venous thromboembolism after total hip replacement: a randomised, double-blind, noninferiority trial.
Lancet. 2007;370(9591):949-956.
43. Tanenberg RJ, Irving GA, Risser RC, et al. Duloxetine, pregabalin, and duloxetine plus gabapentin for diabetic peripheral
neuropathic pain management in patients with inadequate pain response to gabapentin: an open-label, randomized,
noninferiority comparison. Mayo Clin Proc. 2011;86(7):615- 626.
44. Hofheinz RD, Wenz F, Post S, et al. Chemoradiotherapy with capecitabine versus fluorouracil for locally advanced rectal
cancer: a randomised, multicentre, non-inferiority, phase 3 trial. Lancet Oncol. 2012; 13(6):579-588.
45. Sacco RL, Diener HC, Yusuf S, et al; PRoFESS Study Group. Aspirin and extended-release dipyridamole versus
clopidogrel for recurrent stroke. N Engl J Med. 2008;359(12):1238-1251.
46. Kaul S, Diamond GA, Weintraub WS. Trials and tribulations of non-inferiority: the ximelagatran experience. J Am Coll
Cardiol. 2005;46(11):1986-1995.
47. Kaul S, Diamond GA. Good enough: a primer on the analysis and interpretation of noninferiority trials. Ann Intern Med.
2006;145(1):62-69.
48. Hansson L, Lindholm LH, Ekbom T, et al. Randomised trial of old and new antihypertensive drugs in elderly patients:
cardiovascular mortality and morbidity the Swedish Trial in Old Patients with Hypertension-2 study. Lancet.
1999;354(9192):1751-1756.
49. Gülmezoglu AM, Lumbiganon P, Landoulsi S, et al. Active management of the third stage of labour with and without
controlled cord traction: a randomised, controlled, non-inferiority trial. Lancet. 2012;379 (9827):1721-1727.
50. Rothmann M, Li N, Chen G, Chi GY, Temple R, Tsou HH. Design and analysis of non-inferiority mortality trials in
oncology. Stat Med. 2003;22(2): 239-264.
51. Topol EJ, Moliterno DJ, Herrmann HC, et al; TARGET Investigators. Comparison of two platelet glycoprotein IIb/IIIa
inhibitors, tirofiban and abciximab, for the prevention of ischemic events with percutaneous coronary
revascularization. N Engl J Med. 2001;344(25):1888-1894.
52. Warriner IK, Wang D, Huong NT, et al. Can midlevel health-care providers administer early medical abortion as safely
and effectively as doctors? A randomised controlled equivalence trial in Nepal. Lancet. 2011;377(9772):1155-1161.
53. Brown D, Volkers P, Day S. An introductory note to the CHMP guidelines. Stat Med. 2007;26(1): 230-235.
54. Jones B, Jarvis P, Lewis JA, Ebbutt AF. Trials to assess equivalence: the importance of rigorous methods. BMJ.
1996;313(7048):36-39.
55. Connolly S, Pogue J, Hart R, et al; ACTIVE Writing Group of the ACTIVE Investigators. Clopidogrel plus aspirin versus
oral anticoagulation for atrial fibrillation in the Atrial fibrillation Clopidogrel Trial with Irbesartan for prevention of
Vascular Events (ACTIVE W): a randomised controlled trial. Lancet. 2006;367(9526):1903-1912.
56. D’Agostino RB Sr, Massaro JM, Sullivan LM. Noninferiority trials: design concepts and issues—the encounters of
academic consultants in statistics. Stat Med. 2003;22(2):169-186.
57. Durrleman S, Simon R. Planning and monitoring of equivalence studies. Biometrics. 1990;46(2): 329-336.
58. Cairns JA, Wittes J, Wyse DG, et al. Monitoring the ACTIVE-W trial: some issues in monitoring a noninferiority trial.
Am Heart J. 2008;155(1):33-41.
59. Willburger RE, Mysler E, Derbot J, et al. Lumiracoxib 400 mg once daily is comparable to indomethacin 50 mg three
times daily for the treatment of acute flares of gout. Rheumatology (Oxford). 2007;46(7):1126-1132.
60. Stone GW, McLaurin BT, Cox DA, et al; ACUITY Investigators. Bivalirudin for patients with acute coronary syndromes.
N Engl J Med. 2006;355(21): 2203-2216.
61. Ferguson JJ, Califf RM, Antman EM, et al; SYNERGY Trial Investigators. Enoxaparin vs. unfractionated heparin in
high-risk patients with non–ST-segment elevation acute coronary syndromes managed with an intended early
invasive strategy: primary results of the SYNERGY randomized trial. JAMA. 2004; 292(1):45-54.
62. Giuliano AE, Hunt KK, Ballman KV, et al. Axillary dissection vs. no axillary dissection in women with invasive breast
cancer and sentinel node metastasis: a randomized clinical trial. JAMA. 2011;305(6):569-575.
63. Lavender T, Furber C, Campbell M, et al. Effect on skin hydration of using baby wipes to clean the napkin area of
newborn babies: assessor-blinded randomised controlled equivalence trial. BMC Pediatr. 2012;12:59.
64. Rothman KJ. Placebo mania. BMJ. 1996;313 (7048):3-4.
65. Gunnell D, Saperia J, Ashby D. Selective serotonin reuptake inhibitors (SSRIs) and suicide in adults: meta-analysis of
drug company data from placebo controlled, randomised controlled trials submitted to the MHRA’s safety review.
BMJ. 2005;330(7488):385.
66. Djulbegovic B, Clarke M. Scientific and ethical issues in equivalence trials. JAMA. 2001;285(9):1206-1208.