The clinical value of diagnostic tests
A well-explored but underdeveloped continent
J. Hilden – March 2006
The clinical value of diagnostic tests
The diagnostic test and some neglected aspects of its statistical evaluation.
Some aspects were covered in my seminar in spring 2003.
Plan of my talk
Historical & ”sociological” observations
Clinicometric framework
Displays and measures of diagnostic power
Appendix: math. peculiarities
Plan of my talk
Historical & ”sociological” observations
- Hits & misses (”skud & vildskud”)
- Diagnostic vs. therapeutic research
- 3 key innovations & some pitfalls
Clinicometric framework
Displays and measures of diagnostic power
Appendix: math. peculiarities
A quantitative framework for diagnostics is much
harder to devise than for therapeutic trials.
• Trials concern what happens observably
  …concern 1st order entities (mean effects)
• Diagnostic activities aim at changing the doc’s mind
  …concern 2nd order entities (uncertainty / ”entropy” change)
CONSORT (CC ~1993) >> 10 yrs >> STARD (CC ~2003)
In the 1970s
medical decision theory established itself
– but few first-rate statisticians took notice.
Were they preoccupied with other topics, …
Cox, prognosis, … trial follow-up ?
Sophisticated models became available for
describing courses of disease conditionally
on diagnostic data.
Fair to say that the diagnostic data themselves remained
‘a vector of covariates’?
Early history
Yerushalmy ~1947:
studies of observer variation*
Vecchio: the ”BLACK & WHITE” model (1966)
- simplistic but indispensable
- simple yet often misunderstood?!
Warner ~1960:
congenital heart dis. via BFCI
* important but not part of my topic today
Other topics not mentioned
Location (anatomical diagnoses)
and multiple lesions
Monitoring, repeated events, prognosis
Systematic reviews & meta-analyses
Interplay between diagnostic test data &
knowledge from e.g. physiology
Tests with a therapeutic potential
Non-existence of ”prevalence-free”
figures of merit
Patient involvement, consent
BFCI (Bayes’ Formula w.
Conditional Independence**)
”based on the assumption of CI”:
what does that mean?
Do you see why it was misunderstood?
** Indicant variables independent cond’lly on pt’s true condition
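Spelled out, for concreteness (a standard statement of the formula; the notation x1,…,xk for the indicant variables and D for the patient’s true condition is mine):

P(D \mid x_1,\dots,x_k) \;=\; \frac{P(D)\,\prod_{i=1}^{k} P(x_i \mid D)}{\sum_{D'} P(D')\,\prod_{i=1}^{k} P(x_i \mid D')}

Under CI the product of the marginal conditionals equals the joint conditional, which is what makes the formula exact.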
BFCI (Bayes’ Formula w.
Conditional Independence)
”Bayes based on the assumption of CI”
- what does that mean?
1) ”There is no Bayes Theorem without CI”
2) ”The BFCI formulae presuppose CI
(CI is a necessary condition for correctness)”
No, CI is a sufficient condition; whether it is
also necessary is a matter to be determined
– and the answer is No.
Counterexample: next picture !
Joint conditional distribution of two
tests* in two diseases (red, green)

Red disease (rows: the 3 outcomes of test 1; columns: the 2 outcomes of test 2):
          test2=1   test2=2   sum
test1=1    .0625     .0375     .1
test1=2    .25       .15       .4
test1=3    .4375     .0625     .5
sum        .75       .25      1

Green disease:
          test2=1   test2=2   sum
test1=1    .125      .075      .2
test1=2    .1875     .1125     .3
test1=3    .4375     .0625     .5
sum        .75       .25      1

*with 3 and 2 qualitative test outcomes
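A quick numerical check of the counterexample (a minimal Python sketch using the numbers as laid out above, with equal priors assumed purely for illustration): conditional independence fails in both diseases, yet BFCI reproduces the exact posteriors, because the cell-wise likelihood ratios factor into the product of the marginal likelihood ratios.

```python
import numpy as np

# P(test1, test2 | disease); rows = 3 outcomes of test 1, columns = 2 outcomes of test 2
red   = np.array([[.0625, .0375], [.25,   .15  ], [.4375, .0625]])
green = np.array([[.125,  .075 ], [.1875, .1125], [.4375, .0625]])

# 1) Conditional independence fails in both diseases:
for name, tab in (("red", red), ("green", green)):
    ci_tab = np.outer(tab.sum(axis=1), tab.sum(axis=0))  # product of the marginals
    print(name, "CI holds:", np.allclose(tab, ci_tab))   # -> False, False

# 2) Yet BFCI gives the exact posteriors (with equal priors the priors cancel):
for i in range(3):
    for j in range(2):
        exact = red[i, j] / (red[i, j] + green[i, j])
        num   = red.sum(1)[i] * red.sum(0)[j]
        den   = num + green.sum(1)[i] * green.sum(0)[j]
        print((i, j), round(exact, 4), round(num / den, 4))  # identical in every cell
```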
Vecchio’s ”BLACK & WHITE” model (1966)
Common misunderstandings:
1) ”The sensitivity and specificity are
properties of the diagnostic test
[rather than of the patient population]”
2) They are closely connected with the
ability of the test to rule out & in
True only when the ”prevalence” is intermediate
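A quick numerical illustration of the prevalence point (my own sketch with invented numbers, not from the talk): with sens = spec = 0.95 the test rules in well at intermediate prevalence but hardly at all when the prevalence is very low.

```python
# Predictive values from sens, spec and prevalence (illustrative numbers only)
def predictive_values(sens: float, spec: float, prev: float) -> tuple:
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    fn = (1 - sens) * prev
    tn = spec * (1 - prev)
    return tp / (tp + fp), tn / (tn + fn)   # (PVpos, PVneg)

for prev in (0.001, 0.05, 0.5):
    pvpos, pvneg = predictive_values(sens=0.95, spec=0.95, prev=prev)
    print(f"prevalence {prev}: PVpos = {pvpos:.3f}, PVneg = {pvneg:.3f}")
```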
Plan of my talk
Historical & ”sociological” observations
Clinicometric framework
Displays and measures of diagnostic power
Appendix: math. peculiarities
You cannot discuss Diagnostic Tests without:
Some conceptual framework*
A Case, the unit of experience in the
clinical disciplines,
is a case of a Clinical Problem,
defined by the who-how-where-why-what of a clinical encounter
– or Decision Task.
We have a case population or:
case stream (composition!) with a
case flow (rate, intensity).
*Clini[co]metrics, rationel klinik, …
Examples
Each time the doc sees the patient we
have a new encounter / case, to be
compared with suitable ”statistical”
precedents – and physio- &
pharmacology.
Prognostic outlook at discharge from
hospital: a population of cases =
discharges, not patients (CPR Nos., i.e. Danish Citizen Nos.).
Diagnosis?
Serious diagnostic endeavours are always action-oriented
– or at least counselling-oriented –
i.e., directed towards what should be done so as to influence
the future (action-conditional prognosis).
The ”truth” is either
(i) a gold standard test (a ”facitliste”, i.e. an answer key), or
(ii) syndromatic (when the tests define the ”disease”,*
e.g. rheum. syndromes, diabetes)
*in clinimetrics there is little need for that word!
Example
The acute abdomen:
there is no need to discriminate between
appendicitis and non-app. (though it is
fun to run an ”appendicitis contest”)
What is relevant for action is the
decision: open up or wait-and-see?
<This is frequently not recognized in the literature>
In clinical studies the choice of sample,
and of the variables on which to base
one's prediction,
must match the clinical problem
as it presents itself
at the time of decision making.
In particular, one mustn't
discard subgroups (impurities?)
that did not become identifiable
until later: prospective recognizability !
Purity vs. representativeness:
A meticulously filtered case stream
('proven infarctions')
may be needed for patho- and
pharmaco-physiological research,
but is inappropriate as a basis
for clinical decision rules
[incl. cost studies].
Consecutivity as a safeguard against
selection bias.
Standardization:
(Who examines the patient? Where?
When? With access to clin. data?)
Gold standard … the big problem !!
w. blinding, etc.
Safeguards against change of data after
the fact.
If the outcome is FALSE negative or
positive,
you apply an ”arbiter” test
”in order to resolve the discrepant finding,”
i.e. a 2nd, 3rd, … reference test.
If TRUE negative or positive, accept !
~ The defendant decides who shall be allowed
to give testimony and when
Digression…
Randomized trials of diagn. tests
…theory under development
Purpose & design: many variants
Sub(-set-)randomization, depending on the
pt.’s data so far collected.
”Non-disclosure”: some data are kept under
seal until analysis. No parallel in therapeutic trials!
Main purposes…
…Randomized trials of diagn. tests
1) when the diagnostic intervention is itself
potentially therapeutic;
2) when the new test is likely to redefine the
disease(s) ( cutting the cake in a
completely new way );
3) when there is no obvious rule of
translation from the outcomes of the new
test to existing treatment guidelines;
4) when clinician behaviour is part of the
research question…
…end of digression
Plan of my talk
Historical & ”sociological” observations
Clinicometric framework
Displays and measures of diagnostic power
Appendix: math. peculiarities
Displays & measures of
diagnostic power
1) The Schism – between:
2) ROCography
3) VOIography
ROCography
~ classical discriminant analysis /
pattern recognition
Focus on disease-conditional
distribution of test results (e.g., ROC)
AuROC (the area under the ROC) is
popular … despite the 1991 paper (see the counterexample below)
VOI (value of information)
~ decision theory.
VOI = increase in expected utility afforded
by an information source such as a
diagnostic test
Focus on the posttest conditional distribution of
disorders, the range of actions and the
associated expected utility –
and on its preposterior quantification.
Less concerned with math structure, more
with medical realism.
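In symbols (a standard decision-theoretic rendering; the notation is mine: T the test result, θ the patient’s true condition, a the action, U the utility):

\mathrm{VOI}(T) \;=\; \mathbb{E}_{T}\Big[\max_{a}\,\mathbb{E}\{U(a,\theta)\mid T\}\Big] \;-\; \max_{a}\,\mathbb{E}\{U(a,\theta)\}

The first term is the preposterior expectation of the best attainable posttest utility; the second is the best attainable utility without the test.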
VOI
Do we have a canonical guideline?
1) UTILITY
2) UTILITY / COST
Even if we don't have the utilities
as actual numbers, we can use this
paradigm as a filter:
evaluation methods that violate it are
wasteful of lives or resources.
Stylized utility (pseudo-regret functions) as a
(math. convenient) substitute.
VOI
Def. diagnostic uncertainty as expected regret
(the utility loss relative to knowing what ails the pt.)
Diagnosticity measures (DMs):
Diagnostic tests
should be evaluated in terms of
pretest-posttest difference
in diagnostic uncertainty.
Auxiliary quantities like sens and spec
… go into the above
– they are NOT diagnosticity measures in themselves.
…so much for the VOI principles
Diagnosticity measures and auxiliary quantities
Sens (TP rate), spec (TN rate): the nosographic distributions
PVpos, PVneg: the diagnostic distribution given the test result
Youden’s Index:
Y = sens + spec – 1 = 1 – (FN) – (FP)
  = det(nosographic 2×2) = (TP)(TN) – (FP)(FN)
  = 2(AuROC – ½)
[ROC diagram for a single binary test; labels: Y = 1, TP, FN, FP, TN; AuROC = (sens + spec)/2]
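A small check of these identities with arbitrary example values (sens and spec taken as rates, as on the slide):

```python
sens, spec = 0.8, 0.9                      # arbitrary illustrative rates
fn, fp = 1 - sens, 1 - spec

youden  = sens + spec - 1                  # Y
det_2x2 = sens * spec - fp * fn            # (TP)(TN) - (FP)(FN)
auroc   = (sens + spec) / 2                # area under the two-segment ROC
print(youden, det_2x2, 2 * (auroc - 0.5))  # all three equal 0.7
```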
Diagnosticity measures and auxiliary quantities
Sens, spec: the nosographic distribution
LRpos, LRneg = slopes of segments
The ”Likelihood ratio” term is o.k. when diagnostic
hypotheses are likened to scientific hypotheses
[ROC diagram as before; LRpos and LRneg are the slopes of its two segments]
Diagnosticity measures and auxiliary quantities
«Utility index» = (sens) × Y
… is nonsense
Diagnosticity measures and auxiliary quantities
DOR (diagnostic odds ratio) =
[(TP)(TN)] / [(FP)(FN)]
= infinity in this example
even if TP is only 0.0001.
... careful!
[ROC diagram: the case FP = 0]
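A numeric illustration of the warning (invented numbers): with FP = 0 the DOR is infinite no matter how small the sensitivity, whereas Youden’s index stays honest.

```python
sens, spec = 0.0001, 1.0                   # FP = 1 - spec = 0, as in the picture
fn, fp = 1 - sens, 1 - spec

dor = float("inf") if fp * fn == 0 else (sens * spec) / (fp * fn)
print(dor, sens + spec - 1)                # inf vs. Youden's Y = 0.0001
```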
Three test outcomes:
a FREQUENCY-WEIGHTED ROC implies constant misclassification.
Continuous test:
the cutoff at x = c minimizes misclassification.
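For the continuous-test case, a minimal sketch of picking the misclassification-minimizing cutoff numerically (the Gaussian test distributions and the prevalence are my own assumptions, not from the talk):

```python
import numpy as np
from scipy.stats import norm

p = 0.3                                          # assumed prevalence of D
f_D, f_nonD = norm(2.0, 1.0), norm(0.0, 1.0)     # assumed test-result distributions

cutoffs = np.linspace(-3.0, 5.0, 1601)
# misclassification rate = FN mass among D + FP mass among non-D ("call positive if x >= c")
misclass = p * f_D.cdf(cutoffs) + (1 - p) * f_nonD.sf(cutoffs)
c = cutoffs[np.argmin(misclass)]
print(c)   # sits where p*f_D(c) = (1-p)*f_nonD(c), about 1.42 with these numbers
```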
Two binary tests and
their 6 most important
joint rules of interpretation
”Overhull” implies
superiority
Essence of the proof
that ”overhull”
implies superiority
Utility-based evaluation in general

\int (p\,dy + q\,dx)\;\min_{a}\left\{\frac{L_{a,D}\,p\,dy + L_{a,\mathrm{nonD}}\,q\,dx}{p\,dy + q\,dx}\right\}

is how it looks when applied to the ROC*
*(the ROC contains the required information about the disease-conditional distributions).
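A minimal discrete sketch of the same evaluation (my own toy numbers and regret values; p and q are read as the pretest probabilities of D and non-D, dy and dx as the disease-conditional probabilities of each test outcome, and L as the regret of an action in each true state):

```python
p, q = 0.3, 0.7                                      # pretest probabilities of D, non-D
dy = {"neg": 0.10, "dubious": 0.20, "pos": 0.70}     # P(outcome | D)
dx = {"neg": 0.75, "dubious": 0.20, "pos": 0.05}     # P(outcome | non-D)
L  = {"treat": {"D": 0.0, "nonD": 0.25},             # regrets (utility losses), not utilities
      "wait":  {"D": 1.0, "nonD": 0.0}}

posttest = 0.0
for t in dy:
    mass = p * dy[t] + q * dx[t]                     # the (p dy + q dx) factor
    best = min((L[a]["D"] * p * dy[t] + L[a]["nonD"] * q * dx[t]) / mass for a in L)
    posttest += mass * best                          # min_a { ... } weighted by the mass

pretest = min(p * L[a]["D"] + q * L[a]["nonD"] for a in L)
print(pretest, posttest)    # 0.175 vs 0.07375: testing lowers the expected regret
```

The pretest–posttest difference is exactly the kind of diagnosticity / VOI quantity described above.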
Utility-based evaluation
in general
The area under the ROC
(AuROC) is misleading
You have probably seen my
counterexample* before.
Assume D and non-D
equally frequent and also
utilitywise symmetric …
*Medical Decision Making 1991; 11: 95-101
Two Investigations
Expected regret (utility drop relative to perfect diagnoses)
[The tent graph: pretest probability on the horizontal axis; curve labels B×sens and C×spec]
Good & bad pseudoregret functions
Shannon-like
Brier-like
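For reference, the two uncertainty functions these labels usually stand for (my own formulas, not taken from the slide), written as functions of the probability p of D:

U_{\text{Shannon}}(p) = -\,p\log p - (1-p)\log(1-p), \qquad U_{\text{Brier}}(p) = p\,(1-p)

Both are concave and vanish at p = 0 and p = 1, which is the behaviour expected of a pseudo-regret function.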
Plan of my talk
Historical & ”sociological” observations
Clinicometric framework
Displays and measures of diagnostic power
Appendix: math. peculiarities
LRpos = LRneg = 1
End of my talk
Thank you !
Tak for i dag ! (Thanks for today!)