Dear editor,

We would like to make use of the possibility of submitting an appeal against the BMJ decision, because we are convinced that misunderstandings and/or misinterpretations underlie the rejection of our manuscript. We will deal point by point with all the remarks made by the editorial board and the external reviewers.

Response to the remarks of the editorial board

* We thought this an interesting paper

Comment: Large studies investigating the diagnostic accuracy of clinical and contextual information to discriminate influenza from other influenza-like infections are difficult to perform and might not be undertaken any more in the future. Such studies are extremely important as a basis for evaluating the additional value of any other test (such as rapid point-of-care tests). Taking into account the number of illness days and the pre/post-epidemic period is innovative. The decision rules formulated in our study can be useful, especially in ruling out influenza, for primary care physicians all over the world. This fact is recognized by all the external reviewers.

* However we have some concerns about the multiple imputation method. A note from our statistician says: you must always include your outcome (here the reference standard) when building an imputation model, as well as all the factors you look at in your main analysis (here they omit interaction terms).

Comment: We agree that these interactions need to be present when the multiple imputations are generated. However, the design of the SAS procedure MI ensures that this concern is properly taken into account. The VAR statement is used as input to construct a so-called full multivariate model based on these variables; as a consequence, all pairwise and higher-order interactions are taken into account. We agree that this is a confusing feature, and we have therefore updated the wording in the manuscript accordingly.

* It's also not clear how you built your model.
Comment: In the methods section we tried to explain the different steps we took to reach the final model. It may be that our concise description is not clear and detailed enough for the statistician. Evidently we are willing to provide every additional detail necessary for a better understanding and review. Please point us to the specific part of the text below that is unclear, so that we can respond more accurately.

"Each imputed data set was analyzed using a GEE (Generalized Estimating Equations) model with influenza-positive PCR as the dependent variable and GP code as a cluster variable (to control for possible clustering of inclusion criteria and symptom registration within GPs). A backward regression analysis was performed, starting from a model with all symptoms and interaction terms (pre-planned on clinical relevance) between all symptoms and influenza epidemic, RSV epidemic, vaccine use, number of illness days and age. After stepwise elimination of interaction terms with the threshold p<0.01, a forward introduction of interaction terms with borderline p-values was executed. When convergence problems occurred, the responsible variable was eliminated. To deal with multiple comparisons, only symptoms, signs, contextual information and interaction terms with a p-value less than 0.001 were retained in the final model. Parameter estimates of relevant variables and interaction terms were then averaged across data sets by using a bootstrap technique (SAS macro)."

* We have some concerns about the patient selection to avoid bias, and would have liked to know more about those not recruited

Comment: The data we analyzed were provided by the sentinel network of the Scientific Institute of Public Health, whose main goal is to gather information about the epidemiology of each influenza epidemic. Consequently we have neither clinical nor contextual information about the non-included ILI patients (nor even their exact total number), which we recognize as a weak point of our study.
On the other hand, the GPs were not focused on the diagnostic part of the data they collected and were unaware of the outcome of the PCR test at the moment they registered the different symptoms and signs. Comparing the curves of ILI consultations per 100,000 inhabitants with the curves of positive swabs shows a good match between both in most seasons, so no specific selection took place (cf. http://www.euroflu.org/index.php/). The first external reviewer (Bruce Arroll) agrees with us on this point. The inclusion criteria were broadly interpretable, and in fact especially patients with high fever were included (cf. table 2). In consequence the sensitivity of a temperature above 37°C is high and its specificity very low. Our results are therefore only applicable in a similar starting situation: a patient probably suffering from influenza, with fever and respiratory symptoms. This is a case definition every GP deals with during each winter period. A remark was added to the discussion section to draw the reader's attention to this fact.

* We also thought the likelihood ratios created by your model very modest in their values - ones above 10 or below 0.1 would be regarded as more useful.

Comment: We fully agree with this statement, but we would like to make the following remarks. First of all, pre/post an epidemic we do reach negative likelihood ratios lower than 0.1 (Table 4: prediction rule 1: 0.04 (0.03 to 0.05); prediction rule 2: 0.07 (0.05 to 0.08)). Secondly, the pretest probability of influenza is already high (52.6%) in our study population. In consequence, even a modest positive LR gives a striking gain in post-test probability: during and outside an influenza epidemic, post-test probabilities of 79% and 60% respectively are reached using prediction rule 1. Thirdly, our prediction rules can be considered as a starting point for future diagnostic studies assessing the additive diagnostic value of, for instance, rapid point-of-care influenza tests.
In our study rapid influenza tests were performed only at the laboratory and not at the GPs' practices; in addition, the rapid tests used changed over the study period (2002-2007). Therefore we did not include the diagnostic performance of these tests in our prediction rule.

* We would also have liked to see your prediction rule validated

Comment: We could have chosen to divide our database into two parts: one part to define the prediction rules and their diagnostic values, and another part to validate these rules. We decided not to do so because validation should preferably be done in other populations and settings. The second prediction rule, consisting of fever and cough, was already proposed and studied in other publications, especially during an influenza epidemic; the diagnostic performance of this rule is therefore validated in our study. We performed sensitivity analyses in 12 different subgroup datasets (cf. table 5) to underline the robustness of our data.

Response to the external reviewers

Reviewer 1 Comments
Name: Bruce Arroll
Position: Professor and Head of General Practice and Primary Health Care, University of Auckland, New Zealand

Research question — The research question was to develop a clinical prediction rule for symptoms and signs of influenza.
Overall design of study — The overall design of the study was appropriate. However, the authors did not mention the STARD statement, and although they seem to have complied with it, a flow chart of patient selection would improve the paper, or at least make figure 2 easier to understand.
Participants — The patients are as adequately described as they could be. They all had ILI and there appeared to be no exclusion criteria. The authors were not able to say how representative they were of patients in general, but as mentioned this is unlikely to create any systematic variation from the "truth".
Methods — The methods are adequately described. The main outcome measures are measures of validity, which is appropriate.
They had ethical approval.
Results — The results are well presented and appropriate.
Interpretation and conclusions — The results are warranted by and sufficiently derived from the data. It would be helpful to describe how they derived a pretest probability of 62% in line 212 (discussion). They point out that they have epidemic and non-epidemic information and have also combined symptoms/signs to give greater likelihood ratios. Their messages are clear. The references are up to date and relevant, with no glaring omissions. I have read the reviewers' comments from JAMA, and while I don't understand some of the finer statistical points, I feel the authors have answered the general criticisms in a satisfactory way. I agree with the authors that their paper does add new information to the literature on influenza, in particular their day 1 data and their combination of symptoms and signs to increase post-test probabilities. Given the concern about influenza in the modern world, this paper will be of use to primary care physicians.

Comment: The STARD statement was followed as much as possible, as we mentioned in the letter to the editor. In the following table we state in detail where every item can be found in our manuscript (locations in square brackets).

TITLE/ABSTRACT/KEYWORDS
Item 1. Identify the article as a study of diagnostic accuracy (recommend MeSH heading 'sensitivity and specificity'). [Title page, lines 1-3]

INTRODUCTION
Item 2. State the research questions or study aims, such as estimating diagnostic accuracy or comparing accuracy between tests or across participant groups. [Introduction, p. 3, lines 93-95]

METHODS: Participants
Item 3. The study population: the inclusion and exclusion criteria, setting and locations where data were collected. [Methods, p. 4, lines 105-107]
Item 4. Participant recruitment: was recruitment based on presenting symptoms, results from previous tests, or the fact that the participants had received the index tests or the reference standard? [Methods, p. 4, lines 105-107; Discussion, p. 8, lines 222-224]
Item 5. Participant sampling: was the study population a consecutive series of participants defined by the selection criteria in items 3 and 4? If not, specify how participants were further selected. [Methods, p. 4, lines 107-109]
Item 6. Data collection: was data collection planned before the index test and reference standard were performed (prospective study) or after (retrospective study)? [Methods, p. 4, lines 123-126]

METHODS: Test methods
Item 7. The reference standard and its rationale. [Methods, p. 4, lines 107, 109-110]
Item 8. Technical specifications of material and methods involved, including how and when measurements were taken, and/or cite references for index tests and reference standard. [Methods, p. 5, lines 154-155]
Item 9. Definition of and rationale for the units, cut-offs and/or categories of the results of the index tests and the reference standard. [Methods, p. 4, lines 99-111]
Item 10. The number, training and expertise of the persons executing and reading the index tests and the reference standard.
Item 11. Whether or not the readers of the index tests and reference standard were blind (masked) to the results of the other test, and describe any other clinical information available to the readers. [Methods, p. 4, lines 109-116]

METHODS: Statistical methods
Item 12. Methods for calculating or comparing measures of diagnostic accuracy, and the statistical methods used to quantify uncertainty (e.g. 95% confidence intervals). [Methods, p. 5, lines 133-155]
Item 13. Methods for calculating test reproducibility, if done. [Not applicable]

RESULTS: Participants
Item 14. When the study was performed, including beginning and end dates of recruitment. [Methods, p. 4, lines 98-104]
Item 15. Clinical and demographic characteristics of the study population (at least information on age, gender, spectrum of presenting symptoms). [Results, p. 6, lines 159-168; Table 1]
Item 16. The number of participants satisfying the criteria for inclusion who did or did not undergo the index tests and/or the reference standard; describe why participants failed to undergo either test (a flow diagram is strongly recommended). [Results, p. 6, lines 159-163]
Item 17. Time interval between the index tests and the reference standard, and any treatment administered in between. [Methods, p. 4, line 77]

RESULTS: Test results
Item 18. Distribution of severity of disease (define criteria) in those with the target condition; other diagnoses in participants without the target condition. [Not registered]
Item 19. A cross tabulation of the results of the index tests (including indeterminate and missing results) by the results of the reference standard; for continuous results, the distribution of the test results by the results of the reference standard. [Figure 2]
Item 20. Any adverse events from performing the index tests or the reference standard. [Not registered]

RESULTS: Estimates
Item 21. Estimates of diagnostic accuracy and measures of statistical uncertainty (e.g. 95% confidence intervals). [Tables 3 and 4]
Item 22. How indeterminate results, missing data and outliers of the index tests were handled. [Methods, p. 5, lines 124-125 and 133-138]
Item 23. Estimates of variability of diagnostic accuracy between subgroups of participants, readers or centers, if done. [Results, p. 7, lines 205-208; Table 5]
Item 24. Estimates of test reproducibility, if done. [Not applicable]

DISCUSSION
Item 25. Discuss the clinical applicability of the study findings. [Discussion, p. 8, lines 210-215 and 264-268]

As mentioned earlier, we have no exact number of, nor clinical information about, all the eligible patients. About 30% of the ILI patients were swabbed. We started with 4597 patient records, of which 13 were considered useless because they lacked any clinical information. Of the remaining 4584 records, 18.7% had missing values in at least one of the following items: age, number of illness days and/or temperature. We possessed 3738 complete records (cf. table 1 and the results section). Figure 2 gives the flowchart of the 4584 analyzed records.
Extra information will be noted in a footnote to figure 2.

Pretest probability of 62% in line 212 (discussion):
pretest odds = 1.11 (pretest probability = 52%)
positive likelihood ratio of an influenza epidemic = 1.5
post-test odds = 1.11 x 1.5 = 1.67
post-test probability = 1.67/(1 + 1.67) x 100% = 62%

Reviewer 2
Name: Chris Del Mar
Position: Professor of Primary Care Research, Bond University, Gold Coast, Australia

This is a difficult paper. The subject is very topical and important, but the methods have some flaws and the results are difficult to understand.

Background: influenza is important to diagnose because some of the public health responses to 'pandemics' are to isolate or otherwise manage people suspected of influenza separately. There have been several studies looking for algorithms to diagnose influenza clinically – none of them very accurate.

Description: the study is described as a cross-sectional diagnostic analysis of >4,500 people who presented to GPs in France and Belgium with suspected flu in several periods over several years, at epidemic and non-epidemic times. Half were found to have influenza by the blood test gold standard. The analysis is complicated. The main finding is that it is difficult to diagnose influenza. No item was >10 for LR+ or <0.1 for LR-. Two 'prediction rules' are derived from the data. Neither performs particularly well. Ruling out is easier than ruling in. The best predictors were:
* whether there was an epidemic or not;
* previous ILI contact;
* cough;
* sputum with the cough on day 1;
* temp >37.8.

Comment: The gold standard (blood tests) is appropriate. There is a problem with the selection of patients: these were those who might have had influenza, as determined by the surveillance GPs. These will have varied by threshold for making the diagnosis – and this might also be a reason for the differences in 'epidemic-or-not' and 'previous ILI contact'. The authors fully acknowledge this problem.
I note the reviews by JAMA reviewers and the responses. The paper is very well written (although some copy-editing to sort out the tricky English idioms is necessary) and a lot of data is provided. This will be daunting to most readers (in particular, the analysis is beyond me, for one, and I hope a statistical reviewer is engaged), but it is hard to see how this can be summarised better. Perhaps Figure 1 could be left out – it is not central to the study. It would be good either to ask the authors to discuss further how this is useful for practising clinicians (and perhaps health administrators facing future problems with influenza), or alternatively to have an accompanying editorial doing this. What seems to be missing from the conclusions is that influenza is hard to separate out from the ubiquitous ILI, even when GPs are already thinking of it. All the rest seems to be small print to me.

Comment: Our study was performed only in Belgium (Flemish- and French-speaking regions), not in France. Our reference test was not a blood test but a PCR test on nose and throat swabs.

Problem with the selection of patients: the data we analyzed were provided by the sentinel network of the Scientific Institute of Public Health, whose main goal is to gather information about the epidemiology of each influenza epidemic. Consequently we have neither clinical nor contextual information about the non-included ILI patients (nor even their exact total number), which we recognize as a weak point of our study. On the other hand, the GPs were not focused on the diagnostic part of the data they collected and were unaware of the outcome of the PCR test at the moment they registered the different symptoms and signs.
Comparing the curves of ILI consultations per 100,000 inhabitants with the curves of positive swabs shows a good match between both in most seasons, so no specific selection took place (cf. http://www.euroflu.org/index.php/). The first external reviewer (Bruce Arroll) agrees with us on this point.

This might also be a reason for the differences in 'epidemic-or-not' and 'previous ILI contact': these differences reflect the real situation, in which the pretest probability changes from low to high and back to low again. Distinguishing these differences in the analysis is therefore appropriate and mandatory.

Figure 1 may be omitted.

Usefulness for practising clinicians (and perhaps health administrators), and the conclusion that influenza is hard to separate out from the ubiquitous ILI: The pretest probability of influenza is already high (52.6%) in our study population. In consequence, even a modest positive LR gives a striking gain in post-test probability. During and outside an influenza epidemic, post-test probabilities of 79% and 60% respectively are reached using prediction rule 1. We will add to the conclusion that clinical and contextual information alone is not sufficient to confirm influenza; ruling out influenza when a temperature above 37.8°C and cough are absent is more reliable.

________________________________________

Reviewer 3 Comments
Name: Dr Matthew Thompson
Position: GP and Senior Clinical Scientist, University of Oxford, Department of Primary Health Care

Thank you for asking me to review this article.

Originality
This is a large cross-sectional study carried out in GP settings in Belgium, which sought to determine the diagnostic value of single symptoms and combinations of symptoms for the diagnosis of influenza in patients of all ages.
There are several clinical criteria for influenza, and there have been several attempts to determine the validity of diagnostic criteria (particularly in the recent swine flu pandemic). Most have been small, and conflicting. At the same time there have been several studies looking at point-of-care rapid influenza tests, which do not show particularly good sensitivity to be used as screening tests alone. In the recent pandemic, of course, this lack of clinical diagnostic criteria led to all sorts of problems with over/mis-diagnosis. A large, well-conducted study such as this manuscript adds significantly to the literature on diagnosing influenza.

Validity of study
This is a pragmatic study, conducted for sentinel purposes. Recruitment was non-consecutive, from some of the GPs' patients attending with new illness. This limitation is noted by the authors. GPs had fairly broad inclusion criteria (fever + respiratory symptom + systemic symptom). Obviously this selects patients who may have a more severe illness, and ones in which respiratory symptoms are present. For example, in the recent pandemic GI symptoms were not uncommon; it is possible that such presentations would not have been captured by their inclusion criteria - this should be commented on. There was unfortunately less recruitment of younger children, due to the swabbing involved, limiting generalisability. The clinical features were collected in an apparently standardised way. The gold/reference standard was PCR for influenza; however, they did do a rapid diagnostic test too. No results comparing the rapid and PCR tests are presented - this would be interesting to present, as would the clinical predictors of true positive rapid flu tests, and particularly of false negative rapid flu tests. It would be a pity not to include these data in this paper, and I would consider adding them, if available.
The statistical methods seem robust: they dealt with missing variables and used bootstrapping to strengthen internal validity (they did not have separate derivation and validation data sets).

Applicability and interest
Their findings are presented appropriately and fairly clearly, and they have used clear methods to examine their prediction rules both during, as well as pre/post, influenza epidemics. The main finding is that their clinical prediction rules are better at ruling out than ruling in. Some of the clinical features that they identify as being good/not good predictors are very interesting, and I suspect they challenge the current clinical thinking of many clinicians. For example, myalgia does not seem useful, whereas cough producing sputum on the first day of illness is. These findings would be valuable for ongoing yearly influenza diagnosis in primary/emergency care worldwide. Indeed, the way the authors present the pre- and post-test probabilities in the discussion would be clear for most readers to understand. I enjoyed reading this paper; it was well written and presented, and I think it adds to the literature and challenges current diagnostic thinking.

Comment: In the recent pandemic GI symptoms were not uncommon, and it is possible that such presentations would not have been captured by the inclusion criteria: GI symptoms were recorded in our study and were not an exclusion criterion. Influenza patients in the recent pandemic suffered from fever and cough as well as from GI symptoms and would therefore have been included in our study.

No results comparing the rapid and PCR tests are presented: our prediction rules can be considered as a starting point for future diagnostic studies assessing the additive diagnostic value of, for instance, rapid point-of-care influenza tests. In our study rapid influenza tests were performed only at the laboratory and not at the GPs' practices; in addition, the rapid tests used changed over the study period (2002-2007).
Therefore we did not include the diagnostic performance of these tests in our prediction rule.
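For completeness, the post-test probabilities quoted in this letter follow the standard odds form of Bayes' theorem (post-test odds = pretest odds x likelihood ratio). A minimal sketch reproducing the 62% figure derived for reviewer 1 (pretest probability 52.6%, positive likelihood ratio 1.5 for an influenza epidemic); the function name is ours, for illustration only:

```python
def post_test_probability(pretest_p, lr):
    """Convert a pretest probability and a likelihood ratio into a
    post-test probability via the odds form of Bayes' theorem."""
    pretest_odds = pretest_p / (1.0 - pretest_p)   # p / (1 - p)
    post_odds = pretest_odds * lr                  # odds updated by the LR
    return post_odds / (1.0 + post_odds)           # back to a probability

# Figures from the letter: pretest probability 52.6% and the positive
# likelihood ratio of an influenza epidemic (1.5).
p = post_test_probability(0.526, 1.5)
print(round(p * 100))  # 62
```

The same function applied to the negative likelihood ratios in Table 4 (0.04 and 0.07) shows how strongly the rules lower the probability of influenza when they are negative.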