Dear editor,
We would like to make use of the possibility of appealing against the BMJ's decision, because we are convinced that
misunderstandings and/or misinterpretations underlie the rejection of our manuscript.
Below we deal point by point with all the remarks made by the editorial board and the external reviewers.
Response to the remarks of the editorial board
* We thought this an interesting paper
Comment: large studies investigating the diagnostic accuracy of clinical and contextual information to
discriminate influenza from other influenza-like infections are difficult to perform and may never be undertaken
again. These studies are extremely important as a basis for the evaluation of the additional value
of any other test (such as rapid point-of-care tests). Taking into account the number of illness days and the pre/post-
epidemic period is innovative. The decision rules formulated in our study can be useful, especially in ruling out
influenza, for primary care physicians all over the world. This fact is recognized by all the external reviewers.
* However we have some concerns about the multiple imputation method. A note from our statistician
says: you must always include your outcome (here the reference standard) when building an imputation
model and all the factors you look at in your main analysis (here they omit interaction terms).
Comment:
We agree that these interactions need to be present when the multiple imputations are generated. However, the
way the SAS procedure MI is designed ensures that this concern is properly taken into account. Indeed, the VAR
statement is used as input to construct a so-called full multivariate model based on these variables. As a
consequence, all pairwise and higher-order interactions are taken into account. We agree that this is a confusing
feature, and we have therefore updated the wording in the manuscript accordingly.
* It's also not clear how you built your model.
Comment: In the methods section we tried to explain the different steps we took to reach the final model. It may
be that our concise description is not clear and detailed enough for the statistician. Evidently we are willing to
provide every additional detail necessary for a better understanding and review. Please ask us more detailed
questions about the text below so that we can respond more accurately.
“Each imputed data set was analyzed using a GEE (Generalized Estimating Equations) model with influenza-
positive PCR as the dependent variable and GP code as a cluster variable (to control for possible clustering of
inclusion criteria and symptom registration within GPs). A backward regression analysis was performed, starting
from a model with all symptoms and interaction terms (pre-planned on clinical relevance) between all symptoms
and influenza epidemic, RSV epidemic, vaccine use, number of illness days and age. After stepwise elimination
of interaction terms at the threshold p<0.01, a forward introduction of interaction terms with borderline p-values
was executed. When convergence problems occurred the responsible variable was eliminated. To deal with
multiple comparisons, only symptoms, signs, contextual information and interaction terms with a p-value less
than 0.001 were retained in the final model. Parameter estimates of relevant variables and interaction terms were
then averaged across data sets by using a bootstrap technique (SAS macro).”
* We have some concerns about the patient selection to avoid bias, and would have liked to know more
about those not recruited
Comment: the data we analyzed were provided by the sentinel network of the Scientific Institute of Public
Health, whose main goal is to gather information about the epidemiology of each influenza epidemic.
Consequently we have no clinical or contextual information about the non-included ILI patients (nor even their
exact total number), which we recognize as a weak point of our study. On the other hand, the GPs were not
focused on the diagnostic part of the data they collected and were unaware of the outcome of the PCR test at the
moment they registered the different symptoms and signs. The curves of ILI consultations per 100,000 inhabitants
versus positive swabs match well in most seasons, so no specific selection took place
(cf. http://www.euroflu.org/index.php/).
The first external reviewer (Bruce Arroll) agrees with us on this point.
The inclusion criteria were broadly interpretable and in fact especially patients with high fever were
included (cf. table 2). Consequently the sensitivity of a temperature above 37°C is high and its specificity very
low. Our results are therefore only applicable in a similar starting situation: a patient probably suffering from
influenza, with fever and respiratory symptoms, a case definition every GP deals with each winter.
A remark was added to the discussion section to draw the reader's attention to this fact.
* We also thought the likelihood ratios created by your model very modest in their values - ones above
10 or below 0.1 would be regarded as more useful.
Comment:
We fully agree with this statement, but we would like to make the following remarks. First of all, we do reach
negative likelihood ratios pre/post an epidemic lower than 0.1 (Table 4: prediction rule 1: 0.04 (0.03 to 0.05);
prediction rule 2: 0.07 (0.05 to 0.08)).
Secondly, the pretest probability of influenza is already high (52.6%) in our study population. Consequently a
smaller positive LR already gives a striking gain in posttest probability. During and outside an influenza
epidemic a post-test probability of 79% and 60% respectively is reached using prediction rule 1.
Thirdly, our prediction rules can be considered a starting point for future diagnostic studies of the additive
diagnostic value of, for instance, rapid point-of-care influenza tests. In our study rapid influenza tests
were performed only at the laboratory and not at the GPs' practices; in addition, the rapid tests used changed
over the study period (2002-2007). Therefore we did not include the diagnostic performance of these tests in our
prediction rule.
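For readers who wish to check such figures, the link between a 2x2 table, sensitivity/specificity and the
likelihood ratios can be sketched in a few lines of Python (a minimal illustration; the counts and function name
are our own, not data from our tables):

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity and likelihood ratios from a 2x2 table."""
    sens = tp / (tp + fn)          # true positives among diseased
    spec = tn / (tn + fp)          # true negatives among non-diseased
    lr_pos = sens / (1 - spec)     # LR+: how much a positive result raises the odds
    lr_neg = (1 - sens) / spec     # LR-: how much a negative result lowers the odds
    return sens, spec, lr_pos, lr_neg

# Hypothetical counts, for illustration only (not from our tables):
sens, spec, lr_pos, lr_neg = diagnostic_measures(tp=400, fp=200, fn=100, tn=300)
print(round(sens, 2), round(spec, 2), round(lr_pos, 2), round(lr_neg, 2))  # 0.8 0.6 2.0 0.33
```

An LR- of 0.04, as for prediction rule 1 pre/post an epidemic, shrinks the pretest odds by a factor of 25, which
is why ruling out influenza works so well.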
* We would also have liked to see your prediction rule validated
Comment:
We could have chosen to divide our database into two parts: one part to define the prediction rules and their
diagnostic values, another part to validate these rules. We decided not to do so because validation should
preferably be done in other populations and settings.
The second prediction rule, consisting of fever and cough, was already proposed and studied in other publications,
especially during an influenza epidemic. The diagnostic performance of this rule is therefore validated in our
study.
We performed sensitivity analyses in 12 different subgroup datasets (cf. table 5) to underline the robustness of
our data.
Response to the external reviewers:
Reviewer 1 Comments...
Name: Bruce Arroll
Position:Professor and Head of General Practice and Primary Health Care
University of Auckland New Zealand
Research question — The research question was to develop a clinical
prediction rule for symptoms and signs of influenza.
Overall design of study — The overall design of the study was appropriate.
However the authors did not mention the STARD statement, and although they seem
to have complied with it, a flow chart of patient selection would improve the
paper, or at least make figure 2 easier to understand.
Participants — The patients are as adequately described as they could be.
They all had ILI and there appeared to be no exclusion criteria. The authors
were not able to say how representative they were of patients in general, but as
mentioned this is unlikely to create any systematic variation from the "truth".
Methods — The methods are adequately described. The main outcome measures
are measures of validity which is appropriate. They had ethical approval.
Results — The results are well presented and are appropriate.
Interpretation and conclusions — The results are warranted by and
sufficiently derived from the data. It would be helpful to describe how they
derived a pretest probability of 62% in line 212 (discussion). They point out
that they have epidemic and non epidemic information and also have combined
symptoms/signs to give greater likelihood ratios. Their messages are clear.
The references are up to date and relevant, with no glaring omissions.
I have read the reviewers' comments from JAMA and while I don't understand some
of the finer statistical points I feel the authors have answered the general
criticisms in a satisfactory way.
I agree with the authors that their paper does add new information to the
literature on influenza, in particular their day 1 data and their combination of
symptoms and signs to increase post-test probabilities. Given the concern about
influenza in the modern world this paper will be of use to primary care physicians.
Comment:
The STARD statement was followed as much as possible, as we mentioned in the letter to the editor. In the
following table we state in detail where every item can be found in our manuscript.
STARD checklist: Section and Topic; Item # and description; location in our manuscript (on page #).

TITLE/ABSTRACT/KEYWORDS
1. Identify the article as a study of diagnostic accuracy (recommend MeSH heading 'sensitivity and specificity'). (Title page, Line 1-3)

INTRODUCTION
2. State the research questions or study aims, such as estimating diagnostic accuracy or comparing accuracy between tests or across participant groups. (Introduction, P3, Line 93-95)

METHODS
Participants
3. The study population: the inclusion and exclusion criteria, setting and locations where data were collected. (Methods, P4, Line 105-107)
4. Participant recruitment: was recruitment based on presenting symptoms, results from previous tests, or the fact that the participants had received the index tests or the reference standard? (Methods, P4, Line 105-107)
5. Participant sampling: was the study population a consecutive series of participants defined by the selection criteria in items 3 and 4? If not, specify how participants were further selected. (Discussion, P8, Line 222-224)
6. Data collection: was data collection planned before the index test and reference standard were performed (prospective study) or after (retrospective study)? (Methods, P4, Line 107-109)
Test methods
7. The reference standard and its rationale. (Methods, P4, Line 123-126)
8. Technical specifications of material and methods involved, including how and when measurements were taken, and/or cite references for index tests and reference standard. (Methods, P4, Line 107/109-110)
9. Definition of and rationale for the units, cut-offs and/or categories of the results of the index tests and the reference standard. (Methods, P5, Line 154-155)
10. The number, training and expertise of the persons executing and reading the index tests and the reference standard. (Methods, P4, Line 99-111)
11. Whether or not the readers of the index tests and reference standard were blind (masked) to the results of the other test; describe any other clinical information available to the readers. (Methods, P4, Line 109-116)
Statistical methods
12. Methods for calculating or comparing measures of diagnostic accuracy, and the statistical methods used to quantify uncertainty (e.g. 95% confidence intervals). (Methods, P5, Line 133-155)
13. Methods for calculating test reproducibility, if done. (Not applicable)

RESULTS
Participants
14. When the study was performed, including beginning and end dates of recruitment. (Methods, P4, Line 98-104)
15. Clinical and demographic characteristics of the study population (at least information on age, gender, spectrum of presenting symptoms). (Results, P6, Line 159-168; Table 1)
16. The number of participants satisfying the criteria for inclusion who did or did not undergo the index tests and/or the reference standard; describe why participants failed to undergo either test (a flow diagram is strongly recommended). (Results, P6, Line 159-163)
17. Time interval between the index tests and the reference standard, and any treatment administered in between. (Methods, P4, Line 77)
Test results
18. Distribution of severity of disease (define criteria) in those with the target condition; other diagnoses in participants without the target condition. (Not registered)
19. A cross tabulation of the results of the index tests (including indeterminate and missing results) by the results of the reference standard; for continuous results, the distribution of the test results by the results of the reference standard. (Figure 2)
20. Any adverse events from performing the index tests or the reference standard. (Not registered)
Estimates
21. Estimates of diagnostic accuracy and measures of statistical uncertainty (e.g. 95% confidence intervals). (Tables 3 and 4)
22. How indeterminate results, missing data and outliers of the index tests were handled. (Methods, P5, Line 124-125 and 133-138)
23. Estimates of variability of diagnostic accuracy between subgroups of participants, readers or centers, if done. (Results, P7, Line 205-208; Table 5)
24. Estimates of test reproducibility, if done. (Not applicable)

DISCUSSION
25. Discuss the clinical applicability of the study findings. (Discussion, P8, Line 210-215 and 264-268)
As mentioned earlier, we have neither the exact number of nor clinical information about all eligible patients.
About 30% of the ILI patients were swabbed. We started with 4597 patient records, of which 13 were considered
unusable because all clinical information was missing. Of the remaining 4584 records, 18.7% had missing values
in at least one of the following items: age, number of illness days and/or temperature. We possessed 3738
complete records (cf. table 1 and the results section). Figure 2 gives the flowchart of the 4584 analyzed records.
Extra information will be noted in a footnote to figure 2.
pretest probability of 62% in line 212 (discussion):
pretest odds = 1.11 (pretest probability = 52.6%)
positive likelihood ratio of an influenza epidemic = 1.5
posttest odds = 1.11 * 1.5 = 1.67
posttest probability = 1.67 / (1 + 1.67) * 100% = 62%
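The same conversion can be written out explicitly (a minimal Python sketch of the standard odds conversion; the
function name is our own):

```python
def posttest_probability(pretest_prob, lr):
    """Posttest probability from a pretest probability and a likelihood ratio."""
    pretest_odds = pretest_prob / (1 - pretest_prob)   # probability -> odds
    posttest_odds = pretest_odds * lr                  # apply the likelihood ratio
    return posttest_odds / (1 + posttest_odds)         # odds -> probability

# Figures from the discussion: pretest probability 52.6%, LR of an influenza epidemic 1.5
p = posttest_probability(0.526, 1.5)
print(round(p * 100))  # 62
```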
Reviewer 2 Name: Chris Del Mar
Position: Prof primary care research, Bond University, Gold Coast, Australia
This is a difficult paper. The subject is very topical and important. But the
methods have some flaws and the results are difficult to understand.
Background: influenza is important to diagnose because some of the public health
responses to 'pandemics' is to isolate or otherwise manage people suspected of
influenza separately. There have been several studies to look for algorithms to
diagnose influenza clinically – none of them very accurate.
Description: the study is described as a cross-sectional diagnosis analysis of
>4,500 people who presented to GPs in France and Belgium with suspected flu in
several periods over several years at epidemic and non-epidemic times. Half
were found to have influenza by the blood test gold standard. The analysis is
complicated. The main finding is that it is difficult to diagnose influenza. No
item was >10 for LR+ or <0.1 for LR-. Two 'Prediction Rules' are derived from
the data. Neither performs particularly well. Ruling out is easier than ruling
in. Things that were the best predictors were:
* Whether there was an epidemic or not;
* Previous ILI contact;
* Cough;
* Sputum with the cough on Day 1;
* Temp >37.8.
Comment: The gold standard (blood tests) is appropriate.
There is a problem with the selection of patients: these were those who might
have had influenza, as determined by the surveillance GPs. These will have
varied by threshold for making the diagnosis – and this might also be a reason
for the differences in 'epidemic-or-not', and 'Previous ILI contact'. The
authors fully acknowledge this problem.
I note the reviews by JAMA reviewers and the responses.
The paper is very well written, (although some copy-editing to sort out the
tricky English idioms is necessary) and a lot of data provided. This will be
daunting to most readers, (in particular the analysis is beyond me, for one, and
I hope a statistical reviewer is engaged), but it is hard to see how this can be
summarised better. Perhaps the Figure 1 could be left out – it is not central to
the study.
It would be good to either ask the Authors to discuss further how this is useful
for practising clinicians, (and perhaps health administrators facing future
problems with influenza), (or alternatively, have an accompanying editorial
doing this). The conclusions seem to be missing is that influenza is hard to
separate out from the ubiquitous ILI, even when GPs are already thinking of it.
All the rest seems to be small print to me.
Comment:
Our study was performed only in Belgium (Flemish- and French-speaking regions), not in France.
Our reference test was not a blood test but a PCR test on nose and throat swabs.
Regarding the problem with the selection of patients: the data we analyzed were provided by the sentinel network
of the Scientific Institute of Public Health, whose main goal is to gather information about the epidemiology of
each influenza epidemic. Consequently we have no clinical or contextual information about the non-included ILI
patients (nor even their exact total number), which we recognize as a weak point of our study. On the other hand,
the GPs were not focused on the diagnostic part of the data they collected and were unaware of the outcome of
the PCR test at the moment they registered the different symptoms and signs. The curves of ILI consultations
per 100,000 inhabitants versus positive swabs match well in most seasons, so no specific selection took place
(cf. http://www.euroflu.org/index.php/).
The first external reviewer (Bruce Arroll) agrees with us on this point.
Regarding the suggestion that this might also be a reason for the differences in 'epidemic-or-not' and 'Previous
ILI contact': these differences reflect the real situation, in which the pretest probability changes from low to
high and back to low again. Distinguishing these differences in the analysis is therefore appropriate and mandatory.
Figure 1 may be omitted.
Regarding the usefulness for practising clinicians (and perhaps health administrators facing future problems with
influenza), and the conclusion that influenza is hard to separate from the ubiquitous ILI even when GPs are
already thinking of it: the pretest probability of influenza is already high (52.6%) in our study population.
Consequently a smaller positive LR already gives a striking gain in posttest probability. During and outside an
influenza epidemic a post-test probability of 79% and 60% respectively is reached using prediction rule 1.
We will add to the conclusion that clinical and contextual information alone is not sufficient to confirm
influenza. Ruling out influenza when a temperature above 37.8°C and cough are absent is more reliable.
________________________________________
Reviewer 3 Comments...
Name: Dr Matthew Thompson
Position: GP and Senior Clinical Scientist, University of Oxford Department of
Primary Health Care.
Thank you for asking me to review this article.
Originality
This is a large cross sectional study carried out in GP settings in Belgium
which sought to determine diagnostic value of single and combinations of
symptoms for diagnosis of influenza in patients of all ages. There are several
clinical criteria for influenza, and there have been several attempts to try
to determine validity of diagnostic criteria (particularly in the recent swine
flu pandemic). Most have been small, and conflicting. At the same time there
have been several studies looking at point of care rapid influenza tests,
which do not show particularly good sensitivity to be used as screening tests
alone. In the recent pandemic of course, this lack of clinical diagnostic
criteria led to all sorts of problems with over/mis-diagnosis. A large well
conducted study, such as this manuscript, adds significantly to the literature
on diagnosing influenza.
Validity of study
This is a pragmatic study, conducted for sentinel purposes. Recruitment was
non-consecutive of some of the GPs' patients attending with new illness. This
limitation is noted by the authors. GPs had fairly broad inclusion criteria
(fever + respiratory symptom + systemic symptom). Obviously this selects
patients who may have a more severe illness, and ones in which respiratory
symptoms are present. For example in the recent pandemic, GI symptoms were not
uncommon, it is possible that such presentations would not have been captured
by their inclusion criteria - this should be commented on. Unfortunately fewer
younger children were recruited, due to the swabbing involved, limiting
generalisability. The clinical features were apparently collected in a
standardised way. The gold/reference standard was PCR for influenza; however,
they did do rapid diagnostic tests too. No results comparing the rapid and
PCR tests are presented. This would be interesting to present, as would the
clinical predictors of both true positive rapid flu tests and, particularly,
false negative rapid flu tests. It would be a pity not to include these data in
this paper, and I would consider adding them, if available.
The statistical methods seem robust: they dealt with missing variables and used
bootstrapping to strengthen internal validity (they did not have separate
derivation and validation data sets).
Applicability and interest
Their findings are presented appropriately and fairly clearly, and they have
used clear methods to examine their prediction rules both during, as well as
pre/post influenza epidemics. The main finding is that their clinical
prediction rules are better at ruling out than ruling in. Some of the clinical
features that they identify as being good/not good predictors are very
interesting, and challenge current clinical thinking of many clinicians I
would suspect. For example myalgia does not seem useful, whereas cough
producing sputum on the first day of illness is.
These findings would be valuable for ongoing yearly influenza diagnosis in
primary/emergency care worldwide. Indeed, the way the authors present the pre
and post-test probabilities in the discussion would be clear for most readers
to understand.
I enjoyed reading this paper; it was well written and presented, and I think it
adds to the literature and challenges current diagnostic thinking.
Comment:
Regarding GI symptoms in the recent pandemic possibly not being captured by our inclusion criteria: GI symptoms
were recorded in our study and were not an exclusion criterion. Influenza patients in the recent pandemic suffered
from fever and cough as well as from GI symptoms, and would therefore have been included in our study.
Regarding the absence of results comparing the rapid and PCR tests: our prediction rules can be considered a
starting point for future diagnostic studies of the additive diagnostic value of, for instance, rapid
point-of-care influenza tests. In our study rapid influenza tests were performed only at the laboratory and not at
the GPs' practices; in addition, the rapid tests used changed over the study period (2002-2007). Therefore we did
not include the diagnostic performance of these tests in our prediction rule.