ESM Appendix 2 - Method for grading the quality of evidence

Prognostication in comatose survivors of cardiac arrest: An Advisory Statement from the European Resuscitation Council and the European Society of Intensive Care Medicine

Intensive Care Medicine, 2014.

Claudio Sandroni, Alain Cariou, Fabio Cavallaro, Tobias Cronberg, Hans Friberg, Cornelia Hoedemaekers, Janneke Horn, Jerry P. Nolan, Andrea O. Rossetti and Jasmeet Soar

Corresponding author:

Claudio Sandroni

Department of Anaesthesiology and Intensive Care

Catholic University School of Medicine,

Largo Gemelli, 8 - 00168 Rome, Italy

sandroni@rm.unicatt.it

Quality of evidence

According to GRADE, the quality of evidence is graded as high, moderate, low or very low according to the study design and to the presence of the following factors: 1) limitations; 2) indirectness; 3) inconsistency; 4) imprecision and 5) publication bias. Publication bias was not considered, given the difficulty of measuring it in prognostic studies [1].

Study design

In the GRADE system the ideal study design for informing recommendations is a randomised trial; observational studies are deemed low-level evidence [2]. However, valid studies of test accuracy may start as high quality [3]. These studies involve a comparison between the test under consideration and an appropriate reference. For diagnostic accuracy studies this reference is the test considered to be the gold standard, while for prognostic accuracy studies the comparison is made between the predicted outcome and the real outcome of the patient at a given time point, assessed by blinded evaluators.

Limitations

Limitations (risk of bias), indirectness, inconsistency and imprecision decrease the quality of evidence by one level if serious, or by two levels if very serious.

Given the importance of the risk of self-fulfilling prophecy, limitations were graded as serious when the treating team was not blinded to the results of the predictor of poor outcome under study, and very serious when the investigated predictor was used to decide on withdrawal of life-sustaining treatment (WLST). Other factors considered when evaluating the presence of limitations were: blinding of outcome evaluators; exclusion of non-neurological causes of death (or description of the best Cerebral Performance Category [CPC] in patients who died by the end of the study period); exclusion of previous neurological disease; exclusion of sedation (for indices based on clinical examination or EEG); exclusion of patients receiving neuromuscular blocking drugs (for indices based on clinical examination); and length of follow-up.
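As a rough illustration, the downgrading rule described above can be sketched in Python. The function and variable names are hypothetical, only the two self-fulfilling-prophecy criteria are modelled (the other limitation factors are left to judgment), and flooring the result at "very low" is an assumption of this sketch rather than something stated in the text.

```python
# Quality levels ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

# Levels subtracted for each domain rating (serious: 1, very serious: 2).
PENALTY = {"not serious": 0, "serious": 1, "very serious": 2}


def grade_limitations(team_blinded: bool, used_for_wlst: bool) -> str:
    """Grade limitations from the two self-fulfilling-prophecy criteria only."""
    if used_for_wlst:
        return "very serious"
    if not team_blinded:
        return "serious"
    return "not serious"


def downgrade(start: str, ratings: dict) -> str:
    """Lower the starting quality by the summed penalties of all domains."""
    index = LEVELS.index(start) - sum(PENALTY[r] for r in ratings.values())
    return LEVELS[max(index, 0)]  # assumption: never go below "very low"


ratings = {
    "limitations": grade_limitations(team_blinded=False, used_for_wlst=False),
    "indirectness": "not serious",
    "inconsistency": "not serious",
    "imprecision": "serious",
}
print(downgrade("high", ratings))
```

In this example an accuracy study starting as high quality, with an unblinded treating team and serious imprecision, loses two levels.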

Indirectness

Indirectness was deemed present when the described outcome did not completely correspond to that described in the inclusion criteria; in practice, when the poor outcome was defined as CPC 4-5 (i.e., vegetative state or death) instead of 3-5 (severe neurological disability, vegetative state, or death).

Inconsistency

Inconsistency across studies was evaluated after pooling. Inconsistency was graded as serious when heterogeneity was significant (p < 0.1 or I² > 50%) for either sensitivity or specificity, and it was graded as very serious when heterogeneity was significant for both of them.
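The inconsistency rule can be written as a short Python sketch. The function names and the (p-value, I² in percent) tuple layout are hypothetical choices for illustration; the heterogeneity statistics themselves are assumed to come from the pooled analysis.

```python
def is_heterogeneous(p_value: float, i_squared: float) -> bool:
    """Significant heterogeneity: p < 0.1 or I-squared > 50 (percent)."""
    return p_value < 0.1 or i_squared > 50.0


def grade_inconsistency(sensitivity: tuple, specificity: tuple) -> str:
    """Each argument is a (p_value, i_squared_percent) pair for one measure."""
    count = sum(is_heterogeneous(*m) for m in (sensitivity, specificity))
    return ["not serious", "serious", "very serious"][count]


# Heterogeneous sensitivity, homogeneous specificity.
print(grade_inconsistency(sensitivity=(0.03, 62.0), specificity=(0.45, 0.0)))
```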

Imprecision

When evaluating the accuracy of predictors of poor outcome in critically ill patients, avoiding false positives (i.e., falsely pessimistic predictions) is particularly important. Ideally, the rate of false positives (FPR) should be zero. However, even a zero FPR has little value when the precision of its estimate is low, i.e., when the point estimate has a large confidence interval (CI). Imprecision was therefore graded as serious when the upper limit of the 95% CI of the FPR estimate was greater than 5%, and it was graded as very serious when this value was more than 10%. CIs were calculated using the F distribution method, according to Blyth [4].
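To make the thresholds concrete, the following Python sketch computes the upper limit of an exact (Clopper-Pearson) binomial interval and applies the 5% and 10% cut-offs. It assumes that the F distribution method refers to these exact limits, which can be expressed through F quantiles, and that the 95% CI is two-sided; here the limit is found by bisection on the binomial CDF instead, so that only the standard library is needed. Function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_upper_limit(fp: int, n: int, conf: float = 0.95) -> float:
    """Upper exact limit for the FPR, with fp false positives among
    n patients who had a good outcome (n = FP + TN)."""
    if fp == n:
        return 1.0
    tail = (1 - conf) / 2  # two-sided interval (assumption)
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection: the CDF decreases as p increases
        mid = (lo + hi) / 2
        if binom_cdf(fp, n, mid) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grade_imprecision(fp: int, n: int) -> str:
    upper = exact_upper_limit(fp, n)
    if upper > 0.10:
        return "very serious"
    if upper > 0.05:
        return "serious"
    return "not serious"


for n in (30, 50, 100):
    print(n, round(exact_upper_limit(0, n), 4), grade_imprecision(0, n))
```

For example, zero false positives among 50 patients with a good outcome gives an upper limit of about 7.1%, which would be graded as serious even though the point estimate of the FPR is zero.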

Recommendations

GRADE has adopted a four-category classification for recommendations. A recommendation can be for or against a given management approach and its strength can be strong or weak.

Criteria for determining the strength of a recommendation

The four domains that contribute to the strength of a GRADE recommendation are:[5]

1) The balance between desirable and undesirable outcomes;

2) The confidence and variability in values and preferences that patients (or population) apply to those outcomes;

3) The confidence in the magnitude of the estimates of effect (i.e., the quality of evidence);

4) The resource use, i.e. the cost of the strategy under evaluation.

Domain 1 - The balance between desirable and undesirable effects for a test predicting poor outcome depends not only on the quality of these effects, but also on the test performance (i.e. on the balance between true and false predictions given by that test). Table 1 below summarizes the patient-important outcomes associated with the four possible results of a test for predicting poor neurological outcome.

Sensitivity (TP/[TP+FN]) and specificity (TN/[TN+FP]) of a test summarize the balance between the desirable and the undesirable test results and help to assess the balance between alternative treatment strategies based on them.
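As a minimal sketch, these measures can be computed from the four cell counts of a 2x2 table as follows. The function name and the example counts are hypothetical; the false positive rate is taken as FP/[FP+TN], i.e., 1 minus specificity.

```python
def accuracy_measures(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and false positive rate for a test
    predicting poor outcome (positive = poor outcome predicted)."""
    return {
        "sensitivity": tp / (tp + fn),  # poor outcomes correctly predicted
        "specificity": tn / (tn + fp),  # good outcomes correctly predicted
        "fpr": fp / (fp + tn),          # falsely pessimistic predictions
    }


# Hypothetical counts: 100 patients with poor outcome, 50 with good outcome.
print(accuracy_measures(tp=40, fp=0, tn=50, fn=60))
```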

When our panel was highly confident of the balance between desirable and undesirable consequences, we made a strong recommendation for (desirable outweighs undesirable) or against (undesirable outweighs desirable) a prognostication strategy. If we were less confident of the balance between desirable and undesirable consequences, we offered a weak recommendation.

Table 1. Expected outcomes of results of a test predicting poor neurological outcome in a patient who is comatose after having been resuscitated from cardiac arrest

Population: Adult patients who are comatose after resuscitation from cardiac arrest

Intervention: Prognostic test under evaluation

Comparison: Standard care (no prognostication)

Outcomes:

TP

For the patient: The patient is correctly predicted to have a poor outcome and in most cases will undergo limitations or withdrawal of life sustaining treatment (WLST). Inappropriate treatments will be avoided.

For the family: Family stress due to uncertainty about patient outcome will be avoided.

For the community: Cost reduction since unnecessary diagnostic procedures or treatments will be avoided.

TN

For the patient: The patient is correctly predicted to have a good neurological outcome (1). Appropriate treatments will be continued even in the presence of persistent unresponsiveness (2).

For the family: Knowing there is a reasonable chance of recovery will comfort the patient’s family.

For the community: Appropriate use of resources in patients with a reasonable chance of recovery.

FP

For the patient: The patient is predicted to have a poor outcome but will ultimately recover with only mild or no neurological sequelae. Risk of inappropriate treatment limitation or WLST and consequent poor outcome because of a falsely pessimistic prediction (self-fulfilling prophecy).

For the family: Unnecessary suffering for patient’s family caused by a falsely pessimistic prediction.

For the community: Burden of death secondary to incorrect prognostication and inappropriate WLST.

FN

For the patient: The patient will continue to receive treatment despite having an eventually poor outcome, due to a prediction of good neurological recovery. Risk of unnecessary prolonged treatment in a patient with irreversible brain injury.

For the family: Stress and suffering for patient’s family because of unrealised hope of good recovery.

For the community: Resources spent on treating a patient with no chance of recovery.

Complications of the test; resource utilization (cost): They could be relevant for some tests, such as MRI.

Notes

(1) Interference from residual sedation or paralysis may cause persistent unconsciousness in patients with no or minimal brain damage.

(2) Correct prediction of good neurological outcome will not rule out subsequent death due to cardiovascular complications or multiorgan failure, which tend to occur mostly in the early post resuscitation phase.

Domain 2 - This domain concerns the confidence in the values and preferences that patients or a community apply to the consequences and outcomes described above, and the knowledge of their variability. Given the paucity of relevant studies on patients’ values and preferences, the panel made assumptions about them and refined these through discussion and consensus.

Domain 3 - The confidence in the magnitude of the estimates of effect corresponds to the quality of evidence of the test that can be found in the relevant Evidence Profile Tables. The higher the quality of evidence, the more likely a strong recommendation is warranted [5].

Domain 4 - The higher the costs of a prognostic test, the less likely a strong recommendation is warranted. For example, additional costs may be low in the case of a clinical examination but may be high for MRI. Domain 4 includes other variables, such as complications (e.g., a test that can be performed only outside the ICU may imply an additional risk for the patient) and feasibility (e.g., based on the availability of a test).

Process for grading evidence and recommendations

The process included the following steps:

1. Guideline panel members rated the relative importance of all outcomes and consequences using Table 1 as a basis. This was done a priori, i.e., before examining the Evidence Profile tables of the various predictors. Ratings were assigned on a Likert scale from a minimum value (informative but not important for decision making) to a maximum (critical for decision making).

2. The panel reviewed the Evidence Profile tables of all predictors to be included in the recommendations, in order to evaluate the balance between favourable and unfavourable outcomes. These tables included the timing of the test, its performance (sensitivity, specificity, false positive rates and their respective confidence intervals), whether the test had been used for decisions of WLST, and the quality of evidence.

3. Finally, the panel members voted on the overall strength of recommendation for the included predictors. This was accomplished using web-based survey software (SurveyMonkey®, www.surveymonkey.com), followed by reports and web-based discussion when needed.

References

1. Rifai N, Altman DG, Bossuyt PM (2008) Reporting bias in diagnostic and prognostic studies: time for action. Clinical Chemistry 54: 1101-1103

2. Balshem H, Helfand M, Schunemann HJ, Oxman AD, Kunz R, Brozek J, Vist GE, Falck-Ytter Y, Meerpohl J, Norris S, Guyatt GH (2011) GRADE guidelines: 3. Rating the quality of evidence. Journal of Clinical Epidemiology 64: 401-406

3. Schunemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, Vist GE, Williams JW Jr, Kunz R, Craig J, Montori VM, Bossuyt P, Guyatt GH (2008) Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ 336: 1106-1110

4. Blyth CR (1986) Approximate Binomial Confidence Limits. Journal of the American Statistical Association 81: 843-855

5. Andrews JC, Schunemann HJ, Oxman AD, Pottie K, Meerpohl JJ, Coello PA, Rind D, Montori VM, Brito JP, Norris S, Elbarbary M, Post P, Nasser M, Shukla V, Jaeschke R, Brozek J, Djulbegovic B, Guyatt G (2013) GRADE guidelines: 15. Going from evidence to recommendation-determinants of a recommendation's direction and strength. Journal of Clinical Epidemiology 66: 726-735
