Sensitivity, Specificity and ROC Curve Analysis

Criteria for Evaluating a Screening Test
• Validity: provides a good indication of who does and does not have the disease
  - Sensitivity of the test
  - Specificity of the test
• Reliability (precision): gives consistent results when given to the same person under the same conditions
• Yield: amount of disease detected in the population, relative to the effort
  - Prevalence of disease/predictive value

Validity of Screening Test (Accuracy)
- Sensitivity: Is the test detecting true cases of disease? Ideal is 100%: 100% of cases are detected; = Pr(T+|D+)
- Specificity: Is the test excluding those without disease? Ideal is 100%: 100% of non-cases are negative; = Pr(T-|D-)
- See Gehlbach, Chp. 10

Example: Screening for Glaucoma using IOP

             True Cases of Glaucoma
IOP > 22:    Yes      No
Yes           50     100
No            50    1900
(total)      100    2000

Sensitivity = 50% (50/100)       False Negative = 50%
Specificity = 95% (1900/2000)    False Positive = 5%

Where do we set the cut-off for a screening test?
Consider:
- The impact of a high number of false positives: anxiety, cost of further testing
- Importance of not missing a case: seriousness of disease, likelihood of re-screening

Yield from the Screening Test: Predictive Value
• Relationship between Sensitivity, Specificity, and Prevalence of Disease: when prevalence is low, even a highly specific test will give large numbers of false positives
• Predictive Value of a Positive Test (PPV): likelihood that a person with a positive test has the disease
• Predictive Value of a Negative Test (NPV): likelihood that a person with a negative test does not have the disease

Screening for Glaucoma using IOP (same table as in the example above)
Specificity = 95% (1900/2000)    False Positive = 5%
Positive Predictive Value = 33% (50/150): of the 150 people with IOP > 22, only 50 are true cases

How Good does a Screening Test have to be?
IT DEPENDS
- Seriousness of disease, consequences of a high false-positive rate:
  - Rapid HIV test should have >90% sensitivity, 99.9% specificity
  - Screen for nearsighted children proposes 80% sensitivity, >95% specificity
  - Pre-natal genetic questionnaire could be 99% sensitive, 80% specific

Choosing a cut-point: receiver operating characteristic curves
• Situation where the screening test yields results as a continuous value (e.g., intraocular pressure for glaucoma)
• Want to select a value above (or below) which to call "diseased" or "at risk"
• How do we select that value?

[Figure: distributions of non-diseased cases and diseased cases with a threshold between them. Horizontal axis: test result value or subjective judgment of likelihood that case is diseased.]

More typically:
[Figure: overlapping distributions of non-diseased cases and diseased cases on the same axis, with a threshold.]

[Figures: the threshold placed for a less aggressive mindset, a moderate mindset, and a more aggressive mindset. Each panel marks the resulting point on a plot of TP Fraction (sensitivity) against FP Fraction (1-specificity).]

[Figure: entire ROC curve, TP Fraction (sensitivity) against FP Fraction (1-specificity), traced as the threshold moves across the two distributions.]

[Figure: entire ROC curves by reader skill and/or level of technology: highly discriminating (good), somewhat discriminating (not as good), non-informative (no better than chance).]

Use the area under the curve (AUC) to judge discriminating ability.
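The threshold sweep pictured in the ROC panels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test values and disease labels are hypothetical (they are not from the slides), `roc_points` is a name introduced here, and a case is called test-positive when its value is at or above the cut-off.

```python
def roc_points(values, diseased):
    """Return (threshold, FP fraction, TP fraction) for every candidate cut-off.

    A case is test-positive when its value is >= the threshold.
    Thresholds are visited from strictest (highest) to most lenient (lowest).
    """
    n_pos = sum(diseased)                # number of truly diseased cases
    n_neg = len(diseased) - n_pos        # number of non-diseased cases
    points = []
    for t in sorted(set(values), reverse=True):
        tp = sum(1 for v, d in zip(values, diseased) if d and v >= t)
        fp = sum(1 for v, d in zip(values, diseased) if not d and v >= t)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


# Hypothetical test results (illustration only, not data from the slides).
values = [12, 15, 17, 19, 20, 22, 24, 26, 28, 31]
diseased = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]   # 1 = diseased, 0 = non-diseased

points = roc_points(values, diseased)
for threshold, fp_fraction, tp_fraction in points:
    print(f"cut-off >= {threshold}: "
          f"FP fraction = {fp_fraction:.2f}, TP fraction = {tp_fraction:.2f}")
```

Lowering the cut-off (a more aggressive mindset) can only raise both fractions, so the printed points move from the lower left of the ROC plot toward the upper right.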
Gehlbach: want AUC > 80%

Luke Neff: Refractory Burn Shock Data
Logistic Regression and ROC Curve Analysis

Response Profile
Ordered Value   PET   Total Frequency
1               0     22
2               1     20

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   20.2651      1    <.0001
Score              15.3270      1    <.0001
Wald               10.1930      1    0.0014

Analysis of Maximum Likelihood Estimates
Parameter           DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept           1    -3.0649    0.9514           10.3771           0.0013
Admission Lactate   1     0.8436    0.2642           10.1930           0.0014

Odds Ratio Estimates
Effect              Point Estimate   95% Wald Confidence Limits
Admission Lactate   2.325            1.385    3.902

Area     Standard Error   95% Wald Confidence Limits
0.8489   0.0633           0.7249   0.9729

Point that maximizes sum of sensitivity and specificity: corresponds to lactate value of about 3.0

Pred Prob  True Pos  True Neg  False Pos  False Neg  Se    1 - Sp
0.9995     1         22        0          19         0.05  0
0.9863     2         22        0          18         0.1   0
0.9838     3         22        0          17         0.15  0
0.96       4         22        0          16         0.2   0
0.9402     6         22        0          14         0.3   0
0.9353     7         22        0          13         0.35  0
0.9182     8         22        0          12         0.4   0
0.889      9         22        0          11         0.45  0
0.8401     10        22        0          10         0.5   0
0.8284     11        22        0          9          0.55  0
0.7894     12        22        0          8          0.6   0
0.675      12        21        1          8          0.6   0.05
0.637      12        20        2          8          0.6   0.09
0.5767     12        18        4          8          0.6   0.18
0.5351     13        17        5          7          0.65  0.23
0.493      14        17        5          6          0.7   0.23
0.4302     14        16        6          6          0.7   0.27
0.4096     15        16        6          5          0.75  0.27
0.3894     16        16        6          4          0.8   0.27
0.3695     17        16        6          3          0.85  0.27
0.3312     18        15        7          2          0.9   0.32
0.3127     18        14        8          2          0.9   0.36
0.2611     18        13        9          2          0.9   0.41
0.2299     18        12        10         2          0.9   0.45
0.1881     19        10        12         1          0.95  0.55
0.1637     19        8         14         1          0.95  0.64
0.1525     19        7         15         1          0.95  0.68
0.1419     19        5         17         1          0.95  0.77
0.1226     19        4         18         1          0.95  0.82
0.1056     19        2         20         1          0.95  0.91
0.0907     19        1         21         1          0.95  0.95
0.0718     20        0         22         0          1     1
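As a rough check on the output above, the reported quantities can be recomputed from the classification table and the fitted coefficients. The Python sketch below is not the SAS procedure itself. It assumes the area is obtained with the trapezoidal rule over the tabulated (1 - Sp, Se) points, and that the model is logit(p) = intercept + slope x admission lactate. The names `ROWS`, `trapezoid_auc` and `lactate_at` are introduced here for illustration.

```python
import math

# (predicted probability cut-off, true positives, false positives),
# transcribed from the classification table: 20 events, 22 non-events.
ROWS = [
    (0.9995, 1, 0), (0.9863, 2, 0), (0.9838, 3, 0), (0.96, 4, 0),
    (0.9402, 6, 0), (0.9353, 7, 0), (0.9182, 8, 0), (0.889, 9, 0),
    (0.8401, 10, 0), (0.8284, 11, 0), (0.7894, 12, 0), (0.675, 12, 1),
    (0.637, 12, 2), (0.5767, 12, 4), (0.5351, 13, 5), (0.493, 14, 5),
    (0.4302, 14, 6), (0.4096, 15, 6), (0.3894, 16, 6), (0.3695, 17, 6),
    (0.3312, 18, 7), (0.3127, 18, 8), (0.2611, 18, 9), (0.2299, 18, 10),
    (0.1881, 19, 12), (0.1637, 19, 14), (0.1525, 19, 15), (0.1419, 19, 17),
    (0.1226, 19, 18), (0.1056, 19, 20), (0.0907, 19, 21), (0.0718, 20, 22),
]
N_POS, N_NEG = 20, 22

# Fitted coefficients from the maximum likelihood table.
INTERCEPT, SLOPE = -3.0649, 0.8436


def trapezoid_auc(rows, n_pos, n_neg):
    """Area under the ROC curve by the trapezoidal rule (an assumption here)."""
    xs = [0.0] + [fp / n_neg for _, _, fp in rows]   # FP fraction (1 - Sp)
    ys = [0.0] + [tp / n_pos for _, tp, _ in rows]   # TP fraction (Se)
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
    return area


def lactate_at(prob, intercept=INTERCEPT, slope=SLOPE):
    """Invert logit(p) = intercept + slope * lactate to get the lactate value."""
    return (math.log(prob / (1 - prob)) - intercept) / slope


auc = trapezoid_auc(ROWS, N_POS, N_NEG)
odds_ratio = math.exp(SLOPE)            # odds ratio per unit of lactate
lactate_cutoff = lactate_at(0.3695)     # row with Se = 0.85, 1 - Sp = 0.27

print(f"AUC = {auc:.4f}")
print(f"odds ratio per unit lactate = {odds_ratio:.3f}")
print(f"lactate at predicted probability 0.3695 = {lactate_cutoff:.2f}")
```

Under these assumptions the area should land close to the reported 0.8489, the odds ratio close to 2.325, and the predicted probability 0.3695 close to a lactate of 3.0, the cut-off named in the output.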