DECISION MAKING IN CLINICAL PRACTICE

Incorporating Uncertainty Into Medical Decision Making: An Approach to Unexpected Test Results

Matt T. Bianchi, MD, PhD, Brian M. Alexander, MD, MPH, Sydney S. Cash, MD, PhD
The utility of diagnostic tests derives from the ability to
translate the population concepts of sensitivity and specificity into information that will be useful for the individual
patient: the predictive value of the result. As the array of
available diagnostic testing broadens, there is a temptation
to de-emphasize history and physical findings and defer
to the objective rigor of technology. However, diagnostic
test interpretation is not always straightforward. One significant barrier to routine use of probability-based test
interpretation is the uncertainty inherent in pretest probability estimation, the critical first step of Bayesian reasoning. The context in which this uncertainty presents the
greatest challenge is when test results oppose clinical
judgment. It is this situation when decision support would
be most helpful. The authors propose a simple graphical
approach that incorporates uncertainty in pretest probability and has specific application to the interpretation of
unexpected results. This method quantitatively demonstrates how uncertainty in disease probability may be
amplified when test results are unexpected (opposing clinical judgment), even for tests with high sensitivity and
specificity. The authors provide a simple nomogram for
determining whether an unexpected test result suggests
that one should "switch diagnostic sides." This graphical
framework overcomes the limitation of pretest probability
uncertainty in Bayesian analysis and guides decision
making when it is most challenging: interpretation of
unexpected test results. Key words: pretest probability;
uncertainty; Bayes; unexpected; decision theory. (Med
Decis Making 2009;29:116–124)
Received 6 April 2008 from Partners Neurology, Massachusetts General Hospital and Brigham and Women's Hospital, Wang Ambulatory Center, Boston, Massachusetts (MTB, SSC) and Harvard Radiation Oncology Program, Massachusetts General Hospital, Brigham and Women's Hospital, and Beth Israel Medical Center, Boston, Massachusetts (BMA). MTB and BMA performed literature searches and designed figures; all 3 authors contributed to the text. The authors thank Dr. Allan Ropper, Dr. Amy Tindell, and Manuel Botzolakis for critical comments and discussion. Revision accepted for publication 18 April 2008. Address correspondence to Matt T. Bianchi, MD, PhD, Massachusetts General Hospital, Neurology Clinic, Wang Ambulatory Center, 8th Floor, Boston, MA 02114; e-mail: thebianchi@gmail.com. DOI: 10.1177/0272989X08323620

Each clinical scenario presents unique challenges related to risks and benefits of testing and therapy, and delivery of appropriate care depends heavily on accurate diagnosis of disease. The increasingly evidence-based context of medicine provides clinicians with important tools to facilitate the diagnostic process, most commonly in the form of the sensitivity and specificity of various tests. Evidence-based diagnosis involves consideration of many factors, including whether a particular patient is sufficiently similar to those reported in studies. Some have advocated incorporating threshold strategies (for deciding whether to test or to treat) into the decision-making process.1,2 The challenge of test interpretation is to translate the population statistics of sensitivity and specificity into information relevant to the individual patient. Although formal probability theory is infrequently used in routine clinical practice, the likelihood ratio (LR) method of Bayesian reasoning is a relatively straightforward approach to patient-specific diagnostic interpretations.3

One important obstacle to implementing Bayesian reasoning to guide test interpretation is that there is considerable uncertainty in the first and fundamental step—estimation of the pretest probability (pre-TP) of disease.4,5 Routine Bayesian strategies (e.g., using the LR nomogram shown in Figure 1) require designation of a single pre-TP value, even if the clinical impression is qualitative only, or if the epidemiological data are expressed as ranges or intervals. This limitation is crucial because pre-TP estimations are often more appropriately (and intuitively) represented as ranges or intervals of uncertainty. Pre-TP can be estimated from clinical experience, and translation of this
subjective strategy into probability is better described
by a range (e.g., high, intermediate, or low probability)
than an exact value. Even the rigorous determination
of population pre-TP provided by epidemiological
studies is commonly presented as a confidence interval, reflecting the uncertainty inherent even in these
more formal probability estimations. Finally, aspects
of the history and physical findings shaping the clinical assessment may themselves be uncertain or consistent with multiple possible diagnoses. Considering
ranges or intervals as a representation of uncertainty
is therefore a more intuitive approach that captures
the potential complexity of clinical encounters. This
framework is compatible with threshold strategies,1
and it is our specific goal here to demonstrate that it
is also compatible with formal diagnostic logic, particularly a Bayesian approach.
Despite the absence of formal probability theory in
routine test ordering and interpretation, diagnostic evaluations seem to proceed in the correct direction most
of the time. Test interpretation is relatively straightforward when the results agree with the clinical assessment based on history and physical information (i.e.,
the pre-TP of disease). Although in such circumstances, there is little practical need for detailed statistical decision theory, the interpretation of test results
performed in the setting of pre-TP uncertainty may not
be so intuitive, particularly those that oppose clinical
suspicion. In fact, there are substantial data indicating
that variability in physician-generated pre-TP estimations (one type of uncertainty) is not uncommon among
physicians.6–13 These studies also demonstrate the
consequences of test result misinterpretation, particularly when physicians are presented with unexpected
test results. The primary source of misunderstanding
was either poor estimation of pre-TP or failure to incorporate this information into test interpretation.
We attempt to address two fundamental aspects of
diagnostic reasoning that complicate diagnostic test
interpretation: uncertainty in the pre-TP and unexpected test results. Decision making under difficult
but routinely encountered diagnostic circumstances
would benefit from a mechanism for dealing with
uncertainty and a guide for weighing clinical suspicion against a test result when the two disagree.
SENSITIVITY AND SPECIFICITY:
FROM POPULATION TO INDIVIDUAL
The familiar 2 × 2 illustration of disease status versus test result provides the foundation for understanding the clinical utility of diagnostic tests (Figure 1A).
Figure 1 Diagnostic tools. (A) The familiar 2 × 2 box showing the presence or absence of disease and the diagnostic test result (positive or negative). Positive test results (top row) can either be true (disease present; TP) or false (disease absent; FP). Likewise, negative test results can either be true (disease absent; TN) or false (disease present; FN). Sensitivity describes the proportion of affected patients (left column) testing positive: TP/(TP + FN). Specificity describes the proportion of unaffected patients testing negative: TN/(TN + FP). Positive and negative predictive values are calculated "horizontally" for positive tests, TP/(TP + FP), and negative tests, TN/(TN + FN), respectively. PPV, positive predictive value; NPV, negative predictive value. (B, C) The Bayesian nomogram (adapted with permission from www.CEBM.net) converts pre-TP (left column) to post-TP (right column) via the likelihood ratio (LR; middle column) using a straight line as shown. Lines refer to text examples. Note that positive test results use LR values > 1 (LR(+)), whereas negative test results use LR values < 1 (LR(−)). Probability is considered here in the form of percentages (0%–100%). LR(+) = sensitivity/(100 − specificity). LR(−) = (100 − sensitivity)/specificity.
It assumes that disease status can be dichotomized as
present or absent and that test results can be either
positive or negative. Disease prevalence, or pre-TP, is reflected by the proportion of patients in the "disease present" column relative to the whole population. Whereas sensitivity and specificity address the epidemiologically relevant question of how likely a positive or negative test result is given known disease status, predictive value addresses the clinically pertinent question: how likely is a given test result to be true, given unknown disease status? Thinking "horizontally" leads to easy calculation of both positive predictive value (PPV) and negative predictive value (NPV) (Figure 1A).
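The arithmetic behind these definitions is small enough to verify directly. The sketch below (Python here, since the article presents no code; the cell counts are hypothetical, chosen only to illustrate the calculation) computes the "vertical" and "horizontal" metrics of Figure 1A from a 2 × 2 table.

```python
# Minimal sketch of the Figure 1A calculations; counts are hypothetical.
def two_by_two(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),  # vertical: affected column
        "specificity": tn / (tn + fp),  # vertical: unaffected column
        "ppv": tp / (tp + fp),          # horizontal: positive-test row
        "npv": tn / (tn + fn),          # horizontal: negative-test row
    }

print(two_by_two(tp=90, fp=10, fn=10, tn=90))
# {'sensitivity': 0.9, 'specificity': 0.9, 'ppv': 0.9, 'npv': 0.9}
```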
Unfortunately, the extent to which predictive
value depends on disease prevalence in a population
(or, for the individual patient, the pre-TP) is not easily apparent using a 2 × 2 box. Indeed, predictive
value depends so strongly on pre-TP that the results
of a relatively powerful test (e.g., with sensitivity and
specificity both > 90%) can be rendered meaningless, and the results of even a random coin toss can
seem to have good predictive power (e.g., > 90%)
when paired with the appropriate pre-TP.5,14,15 This
distinction is also emphasized by the pitfalls of using
the common mnemonic "SPin/SNout" as a diagnostic aid (Specific test + Positive result = rule in; Sensitive test + Negative result = rule out).16–18 Thus, pre-TP
is a critical component of Bayesian reasoning that is
required for transitioning from the population concepts of sensitivity and specificity to the predictive
value of test results for individual patients.
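To make the prevalence dependence concrete, a minimal sketch computing PPV from sensitivity, specificity, and pre-TP; the specific numbers below are hypothetical, chosen to mirror the two extremes described above (a strong test rendered nearly meaningless, and a coin toss made to look predictive).

```python
# Sketch of how predictive value tracks pre-TP (prevalence).
def ppv(sens, spec, prev):
    tp = sens * prev                 # expected true positives per patient
    fp = (1 - spec) * (1 - prev)     # expected false positives per patient
    return tp / (tp + fp)

# A strong test (sensitivity = specificity = 0.90) at 1% prevalence:
print(round(ppv(0.90, 0.90, 0.01), 3))  # 0.083 -- most positives are false
# A coin toss (sensitivity = specificity = 0.50) at 95% prevalence:
print(round(ppv(0.50, 0.50, 0.95), 3))  # 0.95 -- "predictive" only via the pre-TP
```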
An alternative to performing predictive value calculations using the 2 × 2 box is to use the Bayes nomogram (Figure 1B). This approach combines test power
(sensitivity and specificity) in the form of a single metric, called the likelihood ratio, with the pre-TP, also
known as the a priori disease probability.19,20 In other
words, test results are combined with the best estimation of disease probability (before the test was performed), which is usually based on clinical assessment
such as history and physical exam findings, combined
with population data from epidemiological studies.
We attempt to merge this Bayesian approach with
graphical representations of two important clinical
challenges to test interpretation: uncertainty in the
pre-TP and test results that oppose clinical judgment.
THE LIKELIHOOD RATIO AND BAYES THEOREM
Bayesian analysis uses the LR, which contains
information about both sensitivity and specificity of
a test, to adjust the probability of disease in an individual patient based on test results. By definition,
every dichotomous test has two LRs: one used to
adjust disease probability upward (if the result is
positive) and one to adjust disease probability downward (if the result is negative). The so-called positive
LR, or LR(+), is a value greater than 1 defined by the ratio [sensitivity/(100 − specificity)], whereas the negative LR, or LR(−), is a fraction less than 1 defined by the ratio [(100 − sensitivity)/specificity]. The LR can be considered an adjustment factor applied to the pre-TP to generate a "revised" probability, known as the posttest probability (post-TP). A simple nomogram (Figure 1B) facilitates the use of test results to transform pre-TP into post-TP (circumventing a cumbersome conversion between probability and odds).
From the initial assessment of pre-TP (left column),
a straight line through the appropriate LR value
(corresponding to the test result) leads to the post-TP
of disease. Note that although probabilities are often
expressed as fractions between 0 and 1, the nomogram
and subsequent discussion will use the corresponding
percentage values (0–100). We will consider an example to illustrate the critical importance of pre-TP in
Bayesian reasoning.
You are performing a stress echocardiogram on
a 42-year-old woman presenting to the emergency
department with chest pain, and the result is positive. She asks whether this means that she has acute
heart disease, and you correctly defer the answer to
this question to the evaluating team for "clinical correlation." Without more clinical information to establish a pre-TP, the test result cannot be interpreted. If
she had no cardiac risk factors and the pain was
sharp, brief, and began at rest, her pre-TP might be
∼ 3%.21 However, if she smokes, has diabetes and
hypertension, and the pain was prolonged, exertional, and resolved with rest, her pre-TP could be
quite high (we will assume 70% here). The nomogram in Figure 1B shows a graphical representation
of the Bayesian approach to this case and the statistical meaning of clinical correlation. The LR(+) of stress
echo is estimated at ∼ 6,22 which is then applied in
the nomogram to the low and high pre-TP values that
define a potential range of uncertainty about the
patient’s baseline risk of coronary disease. This range
of pre-TP projects through the LR(+) of the echocardiogram result to define the post-TP window: 15% to
93%. This interval is clearly too large to interpret
(and it spans the 50% mark, a potential point of
"indifference") and confirms your decision to defer
to clinical correlation. Even if the test had much better discriminative power, with LR(+) = 20 (assuming
95% sensitive and 95% specific), a similar initial
window of pre-TP uncertainty would yield a post-TP
window of 33% to 98%—still too wide (uncertain)
for meaningful conclusions (Figure 1C). Clearly, a single test result in a given patient (e.g., positive stress
test) can yield vastly different certainty of disease
depending on the pre-TP. Additional clinical information, combined with relevant epidemiological
data, is critical to focus the pre-TP estimation, reduce
the uncertainty of this starting point, and thus facilitate accurate interpretation of test results.
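The update that the nomogram performs graphically can also be written as two lines of arithmetic. The sketch below (Python assumed) uses the pre-TP values and LR(+) of roughly 6 from this example and reproduces the approximately 15% to 93% post-TP window read from the nomogram.

```python
# Odds-form Bayes update behind Figure 1B: post_odds = pre_odds * LR.
def posttest(pre_tp, lr):
    """Convert pre-TP (%) to post-TP (%) via the likelihood ratio."""
    pre_odds = pre_tp / (100 - pre_tp)
    post_odds = pre_odds * lr
    return 100 * post_odds / (1 + post_odds)

for pre in (3, 70):  # low-risk vs. high-risk pre-TP from the text
    print(pre, "->", round(posttest(pre, lr=6), 1))
# 3 -> 15.7 and 70 -> 93.3, i.e., the ~15% to 93% window in the text
```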
UNCERTAINTY IN PRETEST
PROBABILITY ESTIMATES
Unfortunately, epidemiological data are not always
available or applicable to particular patient presentations, and sometimes the clinical history is poor or
complex. Senior clinicians face this sort of uncertainty, where data are lacking, by using intuition
based on experience.23 However, this uncertainty in
pre-TP need not represent an obstacle to probabilistic
reasoning. Considering a confidence interval or "window" of pre-TP provides an intuitive mechanism for
dealing with uncertainty while still implementing
Bayesian reasoning. In particular, we emphasize the
unexpected result: when the test disagrees with clinical suspicion (e.g., obtaining a positive test result
despite a low estimated pre-TP). Such an approach
would be of potential benefit at the bedside when
faced with an unexpected result.24
For visual convenience, we consider a transformed
version of the Bayes nomogram such that the pre-TP
axis is linear (note that in the standard Bayes nomogram, the pre-TP values are distributed in a nonlinear
fashion).1,5 This transformed graph contains the same
information as the standard nomogram, but the probability axes are linear (Figure 2). In each panel, curved
lines represent different LRs (thus reflecting a range of
possible test results), and 1-to-1 (linear) visual comparisons of pre-TP (on the x-axis) to post-TP (on the y-axis) can be made easily (see legend). Similar to using
the standard Bayesian nomogram, one draws a line
starting from a given pre-TP, upward to the appropriate test result curve, followed by a horizontal line leftward to the y-axis, where the post-TP is obtained.
To incorporate uncertainty, we can specify a range
of pre-TP that reflects the degree of confidence in the
clinical assessment (prior to obtaining the test result).
Consider a scenario in which you do not think that
a patient has a given disease, but uncertainty allows
for a 5% to 25% pre-TP range. Here, this would be
indicated by a shaded region spanning this uncertainty interval on the x-axis. We next examine how
using a range of pre-TP, instead of a single point,
affects test interpretation in a Bayesian framework.
When considering LR values as an indication of
a test’s power, the closer the value is to 1, the less
powerful the test. Conversely, smaller fractions and
larger whole numbers indicate increased power for
LR(−) and LR(+) values, respectively. If a relatively
poor test, defined here as having an LR(−) of only
0.5, agrees with the clinical suspicion (in this case,
a negative result), the uncertainty range is reduced
from a baseline pre-TP range of 5% to 25% to a post-TP range of 3% to 15% (Figure 2A). This new range
was obtained by using the edges of the pre-TP interval, reading from these 2 points upward to the
LR = 0.5 curve for this hypothetical test result, and
then reading leftward to specify the post-TP range.
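A minimal sketch of this window method, propagating both edges of the 5% to 25% interval through the LR values used in Figure 2; the printed ranges are exact arithmetic and therefore may differ by a few points from values read off the graph.

```python
# Propagate a pre-TP uncertainty window through a test result.
def posttest(pre_tp, lr):
    pre_odds = pre_tp / (100 - pre_tp)
    post_odds = pre_odds * lr
    return 100 * post_odds / (1 + post_odds)

def propagate(window, lr):
    lo, hi = window
    return (round(posttest(lo, lr), 1), round(posttest(hi, lr), 1))

window = (5, 25)                   # disease thought unlikely
print(propagate(window, lr=0.5))   # (2.6, 14.3): ~3% to 15%, as in Figure 2A
print(propagate(window, lr=0.1))   # (0.5, 3.2): collapsed, as in Figure 2C
print(propagate(window, lr=2))     # (9.5, 40.0): widened, cf. Figure 2B
print(propagate(window, lr=10))    # (34.5, 76.9): widened further, cf. Figure 2D
```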
A more accurate (or narrow) pre-TP estimate
would have helped to identify the ‘‘true’’ disease
probability. A more powerful test also would have
helped if it agreed with our clinical suspicion: a negative result from a more powerful test, defined here as
having an LR(−) = 0.1, would collapse the 5% to 25% pre-TP uncertainty window to a small range of post-TP (Figure 2C). This simple graphical method allows
one to easily incorporate uncertainty into pre-TP by
means of a window or confidence interval. Any test
result that agrees with the clinical suspicion will
cause that interval to shrink when the post-TP is
obtained, and more powerful tests cause more marked
collapse of uncertainty ranges (as expected).
Next we consider the impact of an unexpected result using this "window" approach to uncertainty in the pre-TP of disease. The uncertainty expressed in the pre-TP range of 5% to 25% can increase when an unexpected (positive) result is obtained: a poor test (defined here as having an LR(+) = 2) that disagrees with the clinical suspicion expands the uncertainty window to 8% to 40% (Figure 2B).
One might conclude from this hypothetical case that
simply using a more powerful test might overcome
the issues of uncertainty caused by an unexpected
test result (i.e., one could place more confidence in
the result of a more powerful test, even if it disagreed
with the clinical suspicion). However, when a more
powerful test, defined here as having an LR(+) = 10,
opposes the clinical expectation, we see that the
range of uncertainty is expanded even more than for
the less powerful test, to a post-TP range of 18% to 78% (Figure 2D).
[Figure 2 near here: four panels plotting posttest probability (%) against pretest probability (%) for curves of constant LR; see caption below.]
Figure 2 Uncertainty in pretest probability estimation. Each panel illustrates the conversion of pretest probability (x-axis) to posttest probability (y-axis) using various likelihood ratio (LR) values representing different test results (curved lines). A "reference" diagonal line is LR = 1 (no change in probability). Better tests for ruling in disease (higher LR(+)) have curves above this line, whereas better tests for ruling out disease (smaller LR(−)) have curves below this line. To illustrate a hypothetical patient for whom a disease is thought unlikely, we indicate a range of pretest probability (pre-TP) by a shaded area spanning 5% to 25%. The impact of test results from "poor" (A and B) and "good" tests (C and D) is shown under conditions of agreement with (A and C) or opposition to (B and D) this pre-TP assessment. (A) When a poor test (defined here as having an LR(−) = 0.5) agrees (is negative) with the expectation (that the patient is unlikely to have disease), the pre-TP uncertainty is narrowed somewhat in the resulting post-TP range. (B) When a poor test (here defined as an LR(+) = 2) opposes the expectation, the initial range of uncertainty is expanded in the resulting post-TP range. (C) A good test (defined here as having an LR(−) = 0.1) that agrees with the expectation significantly shrinks the uncertainty window. (D) When the result of a good test (here defined as an LR(+) = 10) opposes the expectation, the uncertainty range is expanded substantially (cf. C).
Note also that this expanded range of uncertainty spanned the 50% mark, which could be viewed as the point of maximal uncertainty.
The reason for the seemingly counterintuitive
increase in uncertainty (from the unexpected result of
a more powerful test) is easily visualized by considering the shape of the LR curves in the graph (Figure 2).
Unexpected results involve the steep portions of the
graph: positive test results opposing low clinical suspicion involve the steep portion of the curves near the
top left corner, whereas negative test results opposing
high clinical suspicion involve the steep portions of
curves near the bottom right corner. As the LR value
improves with better test power, the slopes become steeper, and thus even small windows of pre-TP uncertainty (on the x-axis) can be amplified into larger post-TP uncertainty ranges (on the y-axis).
[Figure 3 near here: four panels plotting Post − Pre (%) against pretest probability (%) for LR values of 5, 20, 0.2, and 0.05; see caption below.]
Figure 3 Determining when unexpected test results increase uncertainty. The change in magnitude of uncertainty by an unexpected test result was determined over a range of likelihood ratio (LR) values. In the left column (A, B), a positive result was unexpected because the pretest probability estimation was initially below 50%, whereas in the right column (C, D), a negative result was unexpected with pretest probability estimates > 50%. In each panel, a "window" of uncertainty was moved along the x-axis (4%, 8%, or 20% in magnitude; see Figure 2), and the difference between the posttest and pretest ranges of uncertainty is plotted on the y-axis. Positive values indicate greater uncertainty given the test result (posttest window > pretest window; see Figure 2D, for example), whereas negative values indicate less uncertainty despite the unexpected result. For LR(+) = 5, crossover occurs at a pretest probability of ∼31%; for LR(+) = 20, ∼18%; for LR(−) = 0.2, ∼67%; for LR(−) = 0.05, ∼82%. These crossover points are listed as approximations to reflect small differences depending on the pre-TP window size (4%, 8%, or 20%).
In contrast,
results that agree with clinical suspicion involve the
flatter parts of the curves (e.g., a positive test in the setting of high pre-TP, associated with the upper right
corner). Thus, in cases where the test result agrees with
clinical suspicion, disease uncertainty expressed in
the pre-TP will collapse when the post-TP is obtained,
in proportion to the power of the test.
DO ALL UNEXPECTED RESULTS
INCREASE UNCERTAINTY?
The phenomenon of amplified uncertainty in disease probability when unexpected test results are
encountered depends on 3 factors: 1) the steepness of
the LR curve (how powerful the test is), 2) the x-axis
position of the "shoulder" of the curve (where the slope of the curve = 1), and 3) the x-axis position of the pre-TP uncertainty window relative to the shoulder (Figure 2). Better tests have LR curves that approach "right angles" at the upper left and lower right corners of this type of graph, and the shoulder moves progressively closer to the extreme probabilities of 0% or 100%. For example, the LR curve of a "superior" test with LR(+) = 500 would approach
a right angle in the left upper corner (see Figure 2).
This suggests that in certain circumstances, the
pre-TP uncertainty window might not be close
enough to the shoulder of the LR curve to produce
the paradoxical amplification of uncertainty seen in
Figure 2D. Therefore, not all unexpected test results
necessarily increase the window of uncertainty. To
address this issue, Figure 3 illustrates conditions (combinations of pre-TP estimates and test power) in which
an unexpected test result would decrease rather than
increase uncertainty in disease probability. We consider 3 window sizes of pre-TP uncertainty—4%, 8%,
and 20%—to reflect potential levels of uncertainty that
might be reasonably encountered in practice. We did
not consider smaller windows (which approach exact
values used in the LR nomogram) or larger windows
(which render any meaningful interpretation difficult;
see Figure 1B).
To generate the plot of Figure 3, we compared the
pre-TP and post-TP uncertainty windows in multiple hypothetical conditions. Each of the pre-TP windows was moved in 1% increments along the x-axis,
and the corresponding post-TP window range was
obtained for each of 4 hypothetical LR values:
LR(+) = 5 (Figure 3A), LR(+) = 20 (Figure 3B), LR(−) = 0.2 (Figure 3C), and LR(−) = 0.05 (Figure 3D).
The curves represent the calculated differences
between the width of the pre-TP and post-TP windows. Positive values indicate increased uncertainty
caused by the test result, whereas negative values
represent less uncertainty. In other words, each panel
represents a single hypothetical test result, and the
curves illustrate how that result affects disease uncertainty depending on the pre-TP estimate (defined by
its location on the x-axis and its window size).
The x-intercept of these curves (where the difference crosses zero) indicates the point at which the test result did not change the range of uncertainty (identical window size before and after the test result). This transition point moves toward
the probability extremes (0% or 100%) for higher performance tests, essentially tracking the movement of
the shoulder toward the extreme values for more
powerful tests, as discussed earlier. The size of the
uncertainty window (between 4% and 20%) did not
substantially alter the crossover point. As expected,
higher LR(+) values left-shift the crossover point from
∼ 30% to ∼ 25% to ∼ 20% for LR(+) values of 5, 10,
and 20, respectively (Figure 3A, B).
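This construction is easy to reproduce numerically. The sketch below slides an 8% window in 1% increments (both choices taken from the text) and reports the approximate crossover position for the four LR values of Figure 3; the exact value shifts slightly depending on window size and on how the window position is indexed, as the legend notes.

```python
# Sketch of the Figure 3 construction: slide a pre-TP window along the
# axis and find where the post-TP window stops being wider than it.
def posttest(p, lr):
    """Odds-form Bayes update, probabilities as fractions."""
    return lr * p / (lr * p + (1 - p))

def crossover(lr, width=0.08):
    """Window midpoint where post-width minus pre-width changes sign
    (the zero crossing of the curves in Figure 3)."""
    prev_sign = None
    for i in range(1, int(100 * (1 - width))):
        lo = i / 100
        diff = posttest(lo + width, lr) - posttest(lo, lr) - width
        sign = diff >= 0
        if prev_sign is not None and sign != prev_sign:
            return round(lo + width / 2, 3)
        prev_sign = sign
    return None

for lr in (5, 20, 0.2, 0.05):
    print(lr, crossover(lr))
# ~0.32, 0.19, 0.69, 0.82 -- within a point or two of the ~31%, ~18%,
# ~67%, and ~82% crossovers quoted in the Figure 3 legend.
```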
These graphs illustrate that when the window of
uncertainty flanks the shoulder of the LR curves (see
Figure 2), an unexpected test result will increase
disease uncertainty. More powerful tests may not
necessarily circumvent the potential for increased
uncertainty from unexpected results, depending on
where the pre-TP window of uncertainty lies.
WHEN SHOULD AN UNEXPECTED RESULT
COMPEL YOU TO SWITCH DIAGNOSTIC SIDES?
We have seen that unexpected test results can
increase uncertainty about the presence or absence
of disease. From the example in Figure 2D, an
unexpected result adjusted disease probability from
a range of 5% to 25% to a range of 18% to 78%. Not
only did the range increase with the unexpected
result, but the post-TP window range also spanned
the 50% mark, straddling the diagnostic fence of
whether disease is absent or present. To know
whether a test result is expected, we used the 50%
mark here as a reference point: for pre-TP values
above 50%, we would consider a negative result
unexpected and vice versa for values below 50%.
What if the unexpected result shifted the uncertainty
window entirely across this 50% mark or any other
defined threshold? When we encounter an unexpected result, we must compare our confidence in the
test result with our confidence in the clinical suspicion of disease probability.
This sort of decision making occurs in a qualitative
and subjective manner routinely in clinical practice.
However, a quantitative approach can be helpful
when faced with the challenge of weighing an unexpected result against one’s clinical suspicion. As
mentioned earlier, there is abundant evidence from
clinical studies that this scenario is precisely where
physicians are most prone toward errors of interpretation. One way of phrasing this question is as follows: how good does a test have to be for an
unexpected result to move the entire uncertainty
window across the 50% mark? This threshold is chosen for illustration here, but any value could be used,
tailored to the clinical scenario. In such cases, even if
the uncertainty window actually increased with the
unexpected test result, the result would nevertheless
be helpful in that it would compel the diagnostic suspicion to "switch sides." In other words, such a situation might represent a criterion for an unexpected
test result to outweigh one’s clinical suspicion.
Therefore, we sought a simple graphical method to
determine, for any uncertain pre-TP window, how
powerful a test would have to be for an unexpected
result to force the entire resulting post-TP uncertainty
window to the opposite side of the 50% probability
threshold. Figure 4 provides a graphical nomogram
solution to this question (see figure legend for derivation). One uses the graph in 3 simple steps: 1) define
the pre-TP window of uncertainty, 2) draw a vertical
line from the more extreme boundary of that window
(the end closer to 0% for low pre-TP cases or the end
closer to 100% for high pre-TP cases) to the curve,
and 3) draw a horizontal line leftward to obtain the
minimum power of a hypothetical test that would
statistically compel a switch of diagnostic sides.
The graph illustrates the concepts that the closer
the pre-TP range is to 0%, the more powerful a test must be (greater LR(+)) for an unexpected test result to move the uncertainty window entirely over the 50% mark (i.e., to switch diagnostic sides).
Figure 4 Determining when unexpected test results compel switching diagnostic sides. For any pretest probability (pre-TP), there is one hypothetical likelihood ratio (LR) value that would correspond to a posttest probability (post-TP) of 50%. LR is plotted (note log y-axis) against pre-TP %. Because post-TP is derived from posttest odds = pretest odds × LR, we solve this relationship for post-TP = 50% to generate the following equation: LR = 1/x − 1 (where x = pre-TP expressed as a fraction rather than a percentage), which yields the curve shown. For threshold analysis other than the 50% cutoff, the general formula to calculate the LR required to transform any pre-TP to any post-TP is as follows: LR = [y(1 − x)]/[x(1 − y)], where x = pre-TP and y = post-TP, expressed as fractions. The x-axis crossover point occurs when LR = 1, as expected. The shaded regions correspond to combinations of pre-TP and LR power such that an unexpected result would shift disease probability across the 50% mark.
Similarly, the higher the pre-TP is, the smaller the LR(−)
must be to compel switching sides. The dotted line
indicates the 50% threshold used in this explanation, but any threshold can be used in principle.2
The gray-shaded region to the left of this threshold
indicates LR(+) values that will satisfy this side-switching criterion when the pre-TP is low, whereas
the shaded region to the right indicates LR(−) values
when the pre-TP is high. In summary, the nomogram provides a simple tool to compare one’s clinical suspicion (in the form of a pre-TP range of
uncertainty) with the power of an unexpected test
result, to aid in determining which has more weight
when there is disagreement.
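As a worked sketch, the function below inverts the formula from the Figure 4 legend to give the minimum test power needed to switch sides; the 5% to 25% and 75% to 95% windows are hypothetical illustrations, not cases from the text.

```python
# Minimum LR for an unexpected result to push a pre-TP boundary across
# a post-TP threshold (default 50%); probabilities as fractions.
def required_lr(pre_tp, post_tp=0.5):
    return post_tp * (1 - pre_tp) / (pre_tp * (1 - post_tp))

# Low-suspicion window of 5% to 25%: use the more extreme boundary (5%).
print(required_lr(0.05))            # 19.0 -- LR(+) >= 19 moves the whole window past 50%
# High-suspicion window of 75% to 95%: the extreme boundary is 95%.
print(round(required_lr(0.95), 3))  # 0.053 -- LR(-) <= ~0.05 compels switching sides
```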
DISCUSSION
In a time when medical practice is increasingly
limited by scarce resources, careful consideration
must be given to how diagnostic test results will
guide management. Understanding Bayes theorem is
critical to recognize potential pitfalls of diagnostic
interpretations, particularly when unexpected test
results are encountered. This important perspective
requires that decision making extend beyond the
population concepts of sensitivity and specificity, to
focus on the predictive value of a given test result for
the individual patient. The LR method of Bayesian
reasoning incorporates sensitivity and specificity into
a strategy that focuses on using tests to adjust disease
probability. Given the potential impact of test result
misinterpretation on health care delivery and cost,
diagnostic strategies should be equipped to deal with
the uncertain and the unexpected. Because accurate
pre-TP assessment (including clinical features and
epidemiological data) remains the cornerstone of
diagnosis, progress in epidemiology and improved
clinical skills represent important pathways to
improve diagnostic acumen at the point of care by
decreasing uncertainty in pre-TP estimation. A combination of a threshold framework for testing (or
treating) and an appreciation for how tests of various
power (sensitivity and specificity) can alter disease
probability may help clinicians decide whether to
order a diagnostic test, as well as interpret test results
in a manner that considers the inherent uncertainty
in diagnostic evaluations.
Despite the critical influence of pre-TP on test result interpretation, using this information clinically remains a challenge across multiple levels of medical training.3–6,16 Adding to the complexity is the challenge of accurate pre-TP estimation, which may be better reflected as an interval than as a single number, particularly when epidemiological data are scarce or when a given patient is not similar to published cohorts.
We provide a simple graphical method to incorporate
uncertainty into pre-TP estimation that is compatible
with a Bayesian approach to diagnostic reasoning.
This method is applied to situations in which decision support would be most helpful: unexpected test
results. The nomogram provides a visual tool for
weighing clinical suspicion against diagnostic test
power when there is disagreement. Like the standard
Bayes nomogram, it can be used as a "pocket guide"
for interpretation of any unexpected test result, provided there is some knowledge of the test’s sensitivity
and specificity. Simple calculations allow adaptation
of our nomogram to accommodate any threshold
value. Although it cannot substitute for the intricacies
of clinical reasoning, it provides a semi-quantitative
framework to encourage discussion and take advantage of probability theory in situations when decision
support might prove most helpful. We can think of
diagnostic tests as serving several statistical purposes:
adjustment of disease probability, narrowing windows of uncertainty, and potentially compelling us to
change our minds. A graphical aid to this approach
complements the wisdom of experience in facing the
challenges of diagnostic uncertainty and unexpected
results.
REFERENCES
1. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science
for Clinical Medicine. 2nd ed. Boston: Little, Brown; 1991.
2. Pauker SG, Kassirer JP. The threshold approach to clinical
decision making. N Engl J Med. 1980;302:1109–17.
3. Glasziou P. Which methods for bedside Bayes? ACP J Club.
2001;135:A11–2.
4. Richardson WS. Five uneasy pieces about pretest probability.
J Gen Intern Med. 2002;17:882–3.
5. Baron JA. Uncertainty in Bayes. Med Decis Making. 1994;14:
46–51.
6. Attia JR, Nair BR, Sibbritt DW, et al. Generating pretest probabilities: a neglected area in clinical decision making. Med J Aust.
2004;180:449–54.
7. Cahan A, Gilon D, Manor O, Paltiel O. Probabilistic reasoning
and clinical decision-making: do doctors overestimate diagnostic
probabilities? QJM. 2003;96:763–9.
8. Ghosh AK, Ghosh K, Erwin PJ. Do medical students and physicians understand probability? QJM. 2004;97:53–5.
9. Green ML. Evidence-based medicine training in graduate medical education: past, present and future. J Eval Clin Pract. 2000;6:
121–38.
10. Heller RF, Sandars JE, Patterson L, McElduff P. GPs’ and physicians’ interpretation of risks, benefits and diagnostic test results.
Fam Pract. 2004;21:155–9.
11. Lyman GH, Balducci L. Overestimation of test effects in clinical judgment. J Cancer Educ. 1993;8:297–307.
12. Lyman GH, Balducci L. The effect of changing disease risk on
clinical reasoning. J Gen Intern Med. 1994;9:488–95.
13. Phelps MA, Levitt MA. Pretest probability estimates: a pitfall
to the clinical utility of evidence-based medicine? Acad Emerg
Med. 2004;11:692–4.
14. Weissler AM. A perspective on standardizing the predictive
power of noninvasive cardiovascular tests by likelihood ratio
computation: 2. Clinical applications. Mayo Clin Proc. 1999;74:
1072–87.
15. Weissler AM. A perspective on standardizing the predictive
power of noninvasive cardiovascular tests by likelihood ratio
computation: 1. Mathematical principles. Mayo Clin Proc. 1999;
74:1061–71.
16. Bianchi MT, Alexander BM. Evidence based diagnosis: does
the language reflect the theory? BMJ. 2006;333:442–5.
17. Boyko EJ. Ruling out or ruling in disease with the most sensitive or specific diagnostic test: short cut or wrong turn? Med Decis
Making. 1994;14:175–9.
18. Pewsner D, Battaglia M, Minder C, et al. Ruling a diagnosis in
or out with "SpPIn" and "SnNOut": a note of caution. BMJ. 2004;
329:209–13.
19. Gallagher EJ. Clinical utility of likelihood ratios. Ann Emerg
Med. 1998;31:391–7.
20. Halkin A, Reichman J, Schwaber M, et al. Likelihood ratios:
getting diagnostic testing into perspective. QJM. 1998;91:247–58.
21. Diamond GA, Forrester JS. Analysis of probability as an aid in
the clinical diagnosis of coronary-artery disease. N Engl J Med.
1979;300:1350–8.
22. Pocket Guide to Diagnostic Tests. Norwalk, CT: Appleton &
Lange; 1992.
23. Parker M. Whither our art? Clinical wisdom and evidence-based medicine. Med Health Care Philos. 2002;5:273–80.
24. Benish WA. The use of information graphs to evaluate and
compare diagnostic tests. Methods Inf Med. 2002;41:114–8.