The clinical value of diagnostic tests
A well-explored but underdeveloped continent
J. Hilden : March 2006

The diagnostic test and some neglected aspects of its statistical evaluation.
Some aspects were covered in my seminar, spring 2003.

Plan of my talk
• Historical & "sociological" observations
  "Shots & stray shots" ("Skud & vildskud")
  - Diagnostic vs. therapeutic research
  - 3 key innovations & some pitfalls
• Clinicometric framework
• Displays and measures of diagnostic power
• Appendix: math. peculiarities

A quantitative framework for diagnostics is much harder to devise than for therapeutic trials.
• Trials concern what happens observably
• …concern 1st-order entities (mean effects)
• Diagnostic activities aim at changing the doc's mind
• …concern 2nd-order entities (uncertainty / "entropy" change)
CONSORT >> 10 yrs >> STARD
CC ~1993        CC ~2003

In the 1970s medical decision theory established itself, but few first-rate statisticians took notice.
Were they preoccupied with other topics, … Cox, prognosis, … trial follow-up?
Sophisticated models became available for describing courses of disease conditionally on diagnostic data.
Fair to say that the diagnostic data themselves remained "a vector of covariates"?

Early history
Yerushalmy ~1947: studies of observer variation*
Vecchio: /:BLACK&WHITE:/ Model 1966
  - simplistic but indispensable
  - simple yet often misunderstood?!
Warner ~1960: congenital heart dis. via BFCI
* important but not part of my topic today

Other topics not mentioned
• Location (anatomical diagnoses) and multiple lesions
• Monitoring, repeated events, prognosis
• Systematic reviews & meta-analyses
• Interplay between diagnostic test data & knowledge from e.g. physiology
• Tests with a therapeutic potential
• Non-existence of "prevalence-free" figures of merit
• Patient involvement, consent

BFCI (Bayes' Formula w. Conditional Independence**)
"Bayes based on the assumption of CI": what does that mean? Do you see why it was misunderstood?
1) "There is no 'Bayes Theorem' without CI"
2) "The BFCI formulae presuppose CI (CI is a necessary condition for correctness)"
No, CI is a sufficient condition; whether it is also necessary is a matter to be determined, and the answer is No.
Counterexample: next picture!
** Indicant variables independent conditionally on the pt's true condition

Joint conditional distribution of two tests* in two diseases (red, green)
(rows: outcomes of the 3-outcome test; columns: outcomes of the 2-outcome test; margins included)

            Left panel                   Right panel
        .0625   .0375   .1           .125    .075    .2
        .25     .15     .4           .1875   .1125   .3
        .4375   .0625   .5           .4375   .0625   .5
        .75     .25     1            .75     .25     1

* with 3 and 2 qualitative test outcomes
In every cell the likelihood ratio (left : right) equals the product of the two marginal ratios (.5, 4/3, 1 for the rows; 1 for both columns), so BFCI gives the correct posterior although the tests are not conditionally independent within either disease.

Vecchio's /:BLACK&WHITE:/ Model 1966
Common misunderstandings:
1) "The sensitivity and specificity are properties of the diagnostic test [rather than of the patient population]"
2) "They are closely connected with the ability of the test to rule out & in"
   True only when the "prevalence" is intermediate

You cannot discuss Diagnostic Tests without: Some conceptual framework*
A Case, the unit of experience in the clinical disciplines, is a case of a Clinical Problem, defined by the who-how-where-why-what of a clinical encounter, or Decision Task.
We have a case population, or: case stream (composition!) with a case flow (rate, intensity).
* Clini[co]metrics, rational clinical practice ("rationel klinik"), …

Examples
Each time the doc sees the patient we have a new encounter / case, to be compared with suitable "statistical" precedents, and with physio- & pharmacology.
Prognostic outlook at discharge from hospital: a population of cases = discharges, not patients (CPR Nos., i.e. Danish citizen numbers).

Diagnosis?
Serious diagnostic endeavours are always action-oriented, or at least counselling-oriented, i.e. directed towards what should be done so as to influence the future (action-conditional prognosis).
The "truth" is either
(i) a gold standard test (an "answer key"), or
(ii) syndromatic (when tests define the "disease",* e.g. rheum. syndromes, diabetes)
* in clinimetrics there is little need for that word!

Example
The acute abdomen: there is no need to discriminate between appendicitis and non-app. (though it is fun to run an "appendicitis contest").
What is action-wise relevant is the decision: open up or wait-and-see?
<This is frequently not recognized in the literature>

In clinical studies the choice of sample, and of the variables on which to base one's prediction, must match the clinical problem as it presents itself at the time of decision making.
In particular, one mustn't discard subgroups (impurities?) that did not become identifiable until later: prospective recognizability!

Purity vs. representativeness:
A meticulously filtered case stream ("proven infarctions") may be needed for patho- and pharmaco-physiological research, but is inappropriate as a basis for clinical decision rules [incl. cost studies].

• Consecutivity as a safeguard against selection bias.
• Standardization: Who examines the patient? Where? When? With access to clin. data?
• Gold standard … the big problem!! With blinding, etc.
• Safeguards against change of data after the fact.

If the outcome is FALSE negative or positive, you apply an "arbiter" test "in order to resolve the discrepant finding," i.e. a 2nd, 3rd, … reference test.
If TRUE negative or positive, accept!
~ The defendant decides who shall be allowed to give testimony, and when.

Digression… Randomized trials of diagn. tests
…theory under development
Purpose & design: many variants
• Sub(-set-)randomization, depending on the pt.'s data collected so far.
• "Non-disclosure": some data are kept under seal until analysis.
No parallel in therapeutic trials!

…Randomized trials of diagn. tests
Main purposes…
1) when the diagnostic intervention is itself potentially therapeutic;
2) when the new test is likely to redefine the disease(s) (cutting the cake in a completely new way);
3) when there is no obvious rule of translation from the outcomes of the new test to existing treatment guidelines;
4) when clinician behaviour is part of the research question…
…end of digression

Displays & measures of diagnostic power
1) The Schism between:
2) ROCography
3) VOIography

ROCography ~ classical discriminant analysis / pattern recognition
Focus on the disease-conditional distribution of test results (e.g., ROC).
AuROC (the area under the ROC) is popular … despite the 1991 paper.

VOI (value of information) ~ decision theory
VOI = increase in expected utility afforded by an information source such as a diagnostic test.
Focus on the posttest conditional distribution of disorders, the range of actions and the associated expected utility, and its preposterior quantification.
Less concerned with math structure, more with medical realism.

VOI
Do we have a canonical guideline?
1) UTILITY
2) UTILITY / COST
Even if we don't have the utilities as actual numbers, we can use this paradigm as a filter: evaluation methods that violate it are wasteful of lives or resources.
Stylized utility (pseudo-regret functions) as a (mathematically convenient) substitute.

VOI
Def.: diagnostic uncertainty = expected regret (utility loss, relative to what you would achieve if you knew what ailed the pt.).
Diagnosticity measures (DMs): diagnostic tests should be evaluated in terms of the pretest-posttest difference in diagnostic uncertainty.
Auxiliary quantities like sens and spec … go into the above.
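
The definition above (diagnostic uncertainty as expected regret, a test's worth as the pretest-posttest difference) can be sketched numerically. This is only an illustration under stated assumptions, not a computation from the talk: the two-action regret table, the prevalences and the sens/spec figures are hypothetical, and the function names are invented for the example.

```python
from typing import Sequence, Tuple

def expected_regret(p_d: float,
                    lik_d: Sequence[float],
                    lik_nond: Sequence[float],
                    regret: Sequence[Tuple[float, float]]) -> float:
    """Preposterior expected regret when the best action is chosen per test result.

    p_d      : pretest probability of D
    lik_d    : P(result | D), one entry per test result
    lik_nond : P(result | non-D), one entry per test result
    regret   : regret[a] = (regret of action a if D, regret of action a if non-D)
    """
    q = 1.0 - p_d
    total = 0.0
    for f_d, f_n in zip(lik_d, lik_nond):
        # joint probabilities of (result, D) and (result, non-D)
        w_d, w_n = p_d * f_d, q * f_n
        # the optimal action minimises the regret accumulated in this result stratum
        total += min(r_d * w_d + r_n * w_n for r_d, r_n in regret)
    return total

def value_of_information(p_d: float, sens: float, spec: float,
                         regret: Sequence[Tuple[float, float]]):
    # a "test" with one uninformative outcome reproduces the pretest uncertainty
    pre = expected_regret(p_d, [1.0], [1.0], regret)
    # binary test: results are (positive, negative)
    post = expected_regret(p_d, [sens, 1.0 - sens], [1.0 - spec, spec], regret)
    return pre, post, pre - post

# Hypothetical, symmetric regrets (assumption for illustration only)
REGRET = [(0.0, 1.0),   # "treat": no regret if D, regret 1 if non-D
          (1.0, 0.0)]   # "wait":  regret 1 if D, no regret if non-D

pre, post, voi = value_of_information(0.5, 0.8, 0.9, REGRET)
_, _, voi_rare = value_of_information(0.01, 0.8, 0.9, REGRET)
print(pre, post, voi, voi_rare)
```

With these made-up numbers the gain at 50% prevalence is about 0.35, i.e. half of sens + spec - 1, whereas at 1% prevalence the same test never changes the optimal action and the gain is essentially zero: sens and spec enter the calculation but do not by themselves determine the value of the test.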
…so much for VOI principles

Diagnosticity measures and auxiliary quantities
Sens (TP), spec (TN): nosographic distribution
PVpos, PVneg: diagnostic distribution | test result
Youden's Index:
Y = sens + spec - 1
  = 1 - (FN) - (FP)
  = det(nosographic 2x2) = (TP)(TN) - (FP)(FN)
  = 2(AuROC - ½), where AuROC = [sens + spec] / 2
[Figure: ROC plot of a binary test, with Y, TP, FN, FP and TN marked]

Diagnosticity measures and auxiliary quantities
Sens, spec: nosographic distribution
LRpos, LRneg = slopes of the ROC segments
The "likelihood ratio" term is o.k. when diagnostic hypotheses are likened to scientific hypotheses.

Diagnosticity measures and auxiliary quantities
"Utility index" = (sens) x Y … is nonsense.

Diagnosticity measures and auxiliary quantities
DOR (diagnostic odds ratio) = [(TP)(TN)] / [(FP)(FN)]
= infinity in this example even if TP is only 0.0001 … careful!
[Figure: ROC plot with FP = 0]

Three test outcomes
[Figure]

FREQUENCY-WEIGHTED ROC
implies constant misclassification
[Figure]

Continuous test
Cutoff at x = c minimizes misclassification
[Figure]

Two binary tests and their 6 most important joint rules of interpretation
"Overhull" implies superiority
[Figure]

Essence of the proof that "overhull" implies superiority
[Figure]

Utility-based evaluation in general
$\int (p\,dy + q\,dx)\,\min_a \{ (L_{aD}\,p\,dy + L_{a,\mathrm{nonD}}\,q\,dx) / (p\,dy + q\,dx) \}$
is how it looks when applied to the ROC*
* (which contains the required information about the disease-conditional distributions)

Utility-based evaluation in general
The area under the ROC (AuROC) is misleading.
You have probably seen my counterexample* before.
Assume D and non-D equally frequent and also utility-wise symmetric …
* Medical Decision Making 1991; 11: 95-101

Two Investigations
[Figure]

The tent graph
[Figure: expected regret (utility drop relative to perfect diagnoses) against pretest probability; labels: B x sens, C x spec]

Good & bad pseudoregret functions
[Figure: curves labelled Shannon-like and Brier-like]

Appendix: math. peculiarities
LRpos = LRneg = 1

End of my talk
Thank you! Thanks for today!