Addressing common analytical challenges in the evaluation of screening tests using Stata
Eduardo Ortiz-Panozo
EUSMEX 2013

Evaluation of screening tests
• Frequent in public health
  – Early stages of disease
• Types:
  – Comparing results to a gold standard
  – Comparing results among tests

Common analytical challenges
• Verification bias
  – Positive results are more likely to be confirmed by a gold standard
• Correlated observations
  – More than one test applied in each individual

Summary statistics
(Y: test result, D: disease status; 1 = positive/present, 0 = negative/absent)
Sensitivity: $\mathrm{Sens} = \Pr[Y = 1 \mid D = 1]$
Specificity: $\mathrm{Spec} = \Pr[Y = 0 \mid D = 0]$
Positive predictive value: $\mathrm{PPV} = \Pr[D = 1 \mid Y = 1]$
Negative predictive value: $\mathrm{NPV} = \Pr[D = 0 \mid Y = 0]$
Positive likelihood ratio: $LR^{+} = \dfrac{\Pr[Y = 1 \mid D = 1]}{\Pr[Y = 1 \mid D = 0]}$
Negative likelihood ratio: $LR^{-} = \dfrac{\Pr[Y = 0 \mid D = 1]}{\Pr[Y = 0 \mid D = 0]}$

Comparison of tests by log-linear models
(T: test type; T = 0 is the reference test)
Relative sensitivity:
$\log \Pr[Y = 1 \mid D = 1, T] = \beta_0 + \beta_1 T$
$\exp(\beta_1) = \dfrac{\Pr[Y = 1 \mid D = 1, T = 1]}{\Pr[Y = 1 \mid D = 1, T = 0]} = \mathrm{rSens}$
Relative positive predictive value:
$\log \Pr[D = 1 \mid Y, T] = \beta_0 + \beta_1 Y + \beta_2 T + \beta_3 YT$
$\exp(\beta_2 + \beta_3) = \dfrac{\Pr[D = 1 \mid Y = 1, T = 1]}{\Pr[D = 1 \mid Y = 1, T = 0]} = \mathrm{rPPV}$
Pepe (2003)

Pros of the regression approach
• Weighting (IPW, probability of verification)
• Robust standard errors (correlation)
• Efficiency
• Adjustment for covariates
Pepe (2003)

Numerical example: Cervical Cancer Detection Program, Morelos, 2009
• First HPV, then Pap to the same women
• Two-stage sampling for verification:
  1) Health facilities
  2) Women
• Prob. verification among HPV+ = 1
• Prob. verification among HPV- ~0.05
• Comparison of 4 strategies
• n=5,980

Naïve estimation -diagt-

    . diagt BIO23 hpv

               |          hpv
        biopsy |      Pos.       Neg. |     Total
    -----------+----------------------+----------
      Abnormal |        79          3 |        82
        Normal |       399        144 |       543
    -----------+----------------------+----------
         Total |       478        147 |       625

    True abnormal diagnosis defined as BIO23 = 1 (labelled +)

                                                    [95% Confidence Interval]
    ---------------------------------------------------------------------------
    Prevalence                   Pr(A)               13%        11%       16%
    ---------------------------------------------------------------------------
    Sensitivity                  Pr(+|A)           96.3%      89.7%     99.2%
    Specificity                  Pr(-|N)           26.5%      22.9%     30.4%
    ROC area                     (Sens. + Spec.)/2  .614       .587      .642
    ---------------------------------------------------------------------------
    Likelihood ratio (+)         Pr(+|A)/Pr(+|N)    1.31       1.23       1.4
    Likelihood ratio (-)         Pr(-|A)/Pr(-|N)    .138       .045      .423
    Odds ratio                   LR(+)/LR(-)         9.5       3.12      28.9
    Positive predictive value    Pr(A|+)           16.5%      13.3%     20.2%
    Negative predictive value    Pr(N|-)             98%      94.2%     99.6%
    ---------------------------------------------------------------------------

Sampling design

    . svydes

    Survey: Describing stage 1 sampling units

          pweight: pw2
              VCE: linearized
      Single unit: certainty
         Strata 1: strata
             SU 1: nocs
            FPC 1: fpc

                                          #Obs per Unit
                                  ----------------------------
    Stratum    #Units     #Obs       min      mean       max
    --------  --------  --------  --------  --------  --------
           1        31      2599        29      83.8       442
           2         8      1563        66     195.4       649
           3        27      1818        10      67.3       377
    --------  --------  --------  --------  --------  --------
           3        66      5980        10      90.6       649

    . tab hpv, sum(pw2)

                |          Summary of pw2
            hpv |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
              - |    19.14039    12.69373        1795
              + |           1           0        4185
    ------------+------------------------------------
          Total |   6.4451505   10.839113        5980

Estimation considering verification bias
    . svy: tab BIO23 hpv, row ci
    (running tabulate on estimation sample)

    Number of strata   =         3        Number of obs      =       625
    Number of PSUs     =        57        Population size    =      3531
                                          Design df          =        54

    -----------------------------------------------------------
              |                      hpv
       biopsy |              -               +          Total
    ----------+------------------------------------------------
            - |          .8828           .1172              1
              |  [.8526,.9075]   [.0925,.1474]
              |
            + |           .373            .627              1
              |  [.2863,.4688]   [.5312,.7137]
              |
        Total |          .8646           .1354              1
              |  [.8304,.8928]   [.1072,.1696]
    -----------------------------------------------------------
      Key:  row proportions
            [95% confidence intervals for row proportions]

      Pearson:
        Uncorrected   chi2(1)     =   47.7550
        Design-based  F(1, 54)    =  179.8197     P = 0.0000

HPV testing vs biopsy, by -diagt- and -svy: tab- (n=625)

                                     -diagt-              -svy: tab-
    Summary statistic              Pr     CI95%          Pr     CI95%
    Sensitivity                   0.96   0.90, 0.99     0.63   0.53, 0.71
    Specificity                   0.27   0.23, 0.30     0.88   0.85, 0.91
    Positive predictive value     0.17   0.13, 0.20     0.17   0.15, 0.19
    Negative predictive value     0.98   0.94, 1.00     0.98   0.98, 0.99

Comparing screening strategies, including adjustments for verification bias and correlated observations

    . svy: glm y i.test if BIO23==1, link(log) family(bin) eform nolog /*search diff*/
    (running glm on estimation sample)

    Survey: Generalized linear models

    Number of strata   =         3        Number of obs      =       323
    Number of PSUs     =        35        Population size    =       499
                                          Design df          =        32

    ------------------------------------------------------------------------------
                 |             Linearized
               y |     exp(b)   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            test |
              2  |   1.204208   .1023847     2.19   0.036     1.012717    1.431907
              3  |   .8383724   .0574411    -2.57   0.015     .7291663    .9639341
              4  |   1.326153   .0609956     6.14   0.000     1.207551    1.456403
                 |
           _cons |   .5206612   .0802986    -4.23   0.000     .3802978    .7128309
    ------------------------------------------------------------------------------

Comparison of screening strategies, by log-linear modelling (n=625)

    Relative statistic    Pap    HPV       Combined   Sequential
    rSens                 1      1.2*      0.8*       1.3***
    r(1-Spec)             1      5.8***    0.3***     6.5***
    rPPV                  1      0.3***    1.5**      0.3***
    rNPV                  1      1.0       1.0        1.0

    Reference test: Pap smear
    * p<.05, ** p<.01, *** p<.001

Conclusion
Stata's GLM module supports the specifications needed to evaluate screening tests while adjusting for two common challenges: correlation between observations and verification bias.

References
• Seed PT, Tobias A. Summary statistics for diagnostic tests. Stata Technical Bulletin 2001;59:9-12.
• Pepe MS. The statistical evaluation of medical tests for classification and prediction. Oxford Statistical Science Series. Oxford University Press; 2003.
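
Worked check: naive vs. weighted estimates
As a check on the -diagt- output for HPV testing vs biopsy, the naive summary statistics can be recomputed by hand from the 2×2 counts (79 and 3 among abnormal biopsies, 399 and 144 among normal ones). The second part is a rough illustration only. It assumes a single flat weight of 1/0.05 = 20 for every verified HPV-negative woman and a weight of 1 for every HPV-positive woman. That flat weight is an assumption made here for illustration, not the pw2 weight used in the talk, which varies between women, so the result approximates but does not reproduce the -svy: tab- estimates.

```latex
% Naive estimates from the 2x2 table (counts as reported by -diagt-)
\begin{align*}
\mathrm{Sens} &= \frac{79}{79 + 3} = \frac{79}{82} \approx 0.963, &
\mathrm{Spec} &= \frac{144}{144 + 399} = \frac{144}{543} \approx 0.265, \\
\mathrm{PPV}  &= \frac{79}{79 + 399} = \frac{79}{478} \approx 0.165, &
\mathrm{NPV}  &= \frac{144}{144 + 3} = \frac{144}{147} \approx 0.980, \\
LR^{+} &= \frac{79/82}{399/543} \approx 1.31, &
LR^{-} &= \frac{3/82}{144/543} \approx 0.138.
\end{align*}

% Illustration only: assumed flat weight w = 1/0.05 = 20 for verified
% HPV-negative women, weight 1 for HPV-positive women
\begin{align*}
\mathrm{Sens}_{w} &= \frac{79}{79 + 20 \cdot 3} = \frac{79}{139} \approx 0.568, &
\mathrm{Spec}_{w} &= \frac{20 \cdot 144}{20 \cdot 144 + 399} = \frac{2880}{3279} \approx 0.878.
\end{align*}
```

Under this simplification, weighting lowers sensitivity and raises specificity, which is the same direction as the shift from the -diagt- figures (0.96 and 0.27) to the -svy: tab- figures (0.63 and 0.88).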