Clin Chem Lab Med 2022; 60(6): 801–807 Review Peter Schlattmann* Statistics in diagnostic medicine https://doi.org/10.1515/cclm-2022-0225 Received March 10, 2022; accepted March 10, 2022; published online March 31, 2022 Abstract: This tutorial gives an introduction into statistical methods for diagnostic medicine. The validity of a diagnostic test can be assessed using sensitivity and specificity which are defined for a binary diagnostic test with known reference or gold standard. As an example we use Procalcitonin with a cut off value ≥ 0.5 g/L as a test and Sepsis2 criteria as a reference standard for the diagnosis of sepsis. Next likelihood ratios are introduced which combine the information given by sensitivity and specificity. For these measures the construction of confidence intervals is demonstrated. Then, we introduce predictive values using Bayes’ theorem. Predictive values are sometimes difficult to communicate. This can be improved using natural frequencies which are applied to our example. Procalcitonin is actually a continuous biomarker, hence we introduce the use of receiver operator curves (ROC) and the area under the curve (AUC). Finally we discuss sample size estimation for diagnostic studies. In order to show how to apply these concepts in practice we explain how to use the freely available software R. Keywords: likelihood ratio; predictive values; receiver operator curve; sample size estimation; sensitivity; software R; specificity. Motivating example Worldwide, sepsis and its sequelae still remain a frequent cause of acute illness and death in patients with community and nosocomial acquired infections [1]. Sepsis may be seen as systemic inflammatory response due to infection. However, a gold standard for the proof of infection is *Corresponding author: Peter Schlattmann, Institute of Medical Statistics, Computer and Data Sciences Jena Bachstr. 18, 07743 Jena, Germany, E-mail: peter.schlattmann@med.uni-jena.de. https://orcid.org/0000-0001-7420-7707 missing. Depending on prior antibiotic therapy, bacteremia is found only in approximately 30% of patients with sepsis. Furthermore, early clinical signs of sepsis, like fever, tachycardia, and leucocytosis, are unspecific and overlap with signs also seen in a multitude of systemic inflammatory response syndromes (SIRS) in the absence of infection, especially in surgical patients. Other signs, such as arterial hypotension, thrombocytopenia, or elevated lactate levels indicate, too late, the progression to organ dysfunction [2]. Thus, delay in diagnosis and treatment of sepsis causes increased mortality. In sepsis numerous humoral and cellular systems are activated, followed by a release of a multitude of mediators and other molecules that mediate the host response to infection. Several potential diagnostic indicators measured in the bloodstream have been evaluated for their clinical ability to assess the diagnosis and severity of sepsis. One of these, the 116 amino acid polypeptide procalcitonin (PCT) is frequently used when it comes to identify bacterial infections. In this tutorial we will use Procalcitonin as an example for the use of statistics in diagnostic medicine using data from a study by Ljungstroem et al. (2017) [3]. Also we will show how to use the freely available software R [4] for the necessary calculations. Sensitivity and specificity of a diagnostic test Sensitivity and specificity are key parameters when evaluating the validity of a binary diagnostic test [5] which requires knowledge of a reference or gold standard which denotes the disease status D+ if sepsis is present and D− otherwise. The potential outcomes of a 2 × 2 table showing the disease status D in the columns and test results (T ) in the rows are shown in Table 1. The establishment of such a reference standard is difficult when diagnosing sepsis [6, 7]. For our example we will apply the Sepsis-2 criteria as used by Ljungstroem et al. (2017) i.e. verified bacterial infection and systemic inflammatory response syndrome (SIRS). As diagnostic test we apply PCT ≥0.5 g/L indicating a positive test result (T +) and values <0.5 g/L indicating a negative test result (T −). 802 Schlattmann: Statistics in diagnostic medicine Table : Potential outcomes of a diagnostic test. Test positive T + Test negative T − Then a 95% confidence interval (CI) is given by Disease present D+ Disease absent D− True positive (TP) False negative (FN) False positive (FP) True negative (TN) When evaluating a diagnostic test a population of diseased persons and a population of healthy individuals is considered. Since no test is perfect a 2 × 2 table is constructed as shown in Table 1 which shows the potential outcomes of a diagnostic test. In a perfect world a diagnostic test would identify all diseased person as ill. That is, we would have only true positives (TP). In our example a patient who suffers from sepsis according to the Sepsis-2 criteria and the corresponding PCT-value of that patient is ≥ 0.5 g/L denotes a true positive result. From Table 1 we can define the sensitivity (sens) of our diagnostic test TP sens = TP+FN . This implies the probability to identify diseased persons correctly using a PCT level with at least 0.5 g/L as diagnostic test. Likewise, a non-diseased person should be identified correctly as well. This leads to the specificity (spec) of a TN . diagnostic test given by spec = TN+FP Based on Table 2 we obtain 296 true positives (TP) and a total of 667 diseased persons (TP + FN) according to Sepsis-2 criteria. Thus, the sensitivity is given as 296 667 = 0.44 = 44%. Likewise the specificity is, given as the ratio of 664 true negatives (TN) with a total of 870 healthy 641 persons. This leads to a specificity of 870 = 0.74 = 74%. In order to quantify statistical uncertainty sensitivity and specificity should be reported together with a confidence interval. Usually a 95% interval is applied. Statistically speaking sensitvity and specificity are proportions and confidence intervals can be constructed accordingly. ̂ = nk , k=number The estimate of a proportion p is given as p of events and n=total number. The corresponding standard error σ is given by √̅̅̅̅̅̅̅ ̂ (1 − p ̂) p σp = (1) n (2) For sensitivity a 95% CI is constructed as follows with TP ̂ = TP+FN p √̅̅̅̅̅̅̅̅̅̅̅̅̅ sens(1 − sens) (3) 95%CI = sens ± 1.96 TP + FN 296 For our data we obtain sens = 667 = 0.44. The 95% CI is √̅̅̅̅̅̅̅̅̅ 0.44( 1−0.44) given by 0.44 ± 1.96 = (0.41, 0.48). For the 667 specificity equal to 0.74 we obtain a 95% CI (0.71, 0.77). There a are several ways to construct a confidence interval for a binomial proportion with different statistical properties [8]. Likelihood ratios In order to create an overall measure of diagnostic performance frequently likelihood ratios are considered [9]. These have the advantage that they combine the information obtained from sensitivity and specificiy. One way to do sensitivity this is the positive likelihood ratio LR+ = 1−specificity . The LR+ summarizes how many times more likely patients with the disease are to have that particular result than patients without the disease. More formally this is the ratio of the proportion of true positives divided by the proportion of false positives. A LR+>1 indicates that the test result is associated with the presence of the disease. For our data we obtain LR+=0.44/(1–0.74)=1.70. What does this mean? According to Jaeschke et al. (1994) [10] a LR+≥10 would be conclusive. The likelihood ratio equal to 1.70 observed here thus adds little information. Again the corresponding uncertainty should be addressed using 95% confidence intervals. Formally the LR+ is a ratio of binomial proportions [11] where a confidence interval can be constructed on the scale of the natural logarithm. On the log scale the variance of the LR+ is given by var(log(LR+ )) = σ2LR+ = 1 1 1 1 + + + (4) TP TP + FN FP FP + TN 1 1 1 + 296+371 + 229 + For our example we obtain σ2LR+ = 296 Table : Procalcitonin (PCT) and Sepsis- with cut off value . g/L. PCT ≥. g/L PCT <. g/L Total ̂ ± 1.96σ p 95% CI = p = 0.0051. Then a 95% CI is given by 1 229+641 Sepsis-+ Sepsis-− Total , , LR+ ± 1.96*exp(σ LR+ ) (5) For our example a 95% CI is obtained as √̅̅̅̅̅̅ 1.70 ± 1.96*exp( 0.0051) = (1.47, 1.95). 803 Schlattmann: Statistics in diagnostic medicine probability is obtained using Bayes’s theorem. In order to apply Bayes’ theorem we need to define the prior probability of disease which is given by the prevalence TP + FN (pre) = P (D+ ) = [TP + FN + FP + TN]. P (D+ |T + ) = P (D+ )*P (T + |D+ ) P (D )*P(T + |D+ ) + P (D− )*P (T + |D−) + (6) The PPV can be expressed in terms of sensitivity (sens) and specificity (spec): PPV = pre * sens pre* sens + (1 − pre)*(1 − spec) (7) 667 = 0.43 = For our data we obtain a prevalence pre = 1537 43%. Plugging this into the formula leads to PPV = 0.43*0.44 = 0.57 = 57% 0.43*0.44 + (1 − 0.43)*(1 − 0.74) (8) Thus, our diagnostic PCT test does little to improve our prior knowledge which is given by the prevalence equal to 43%. Performing the test tells us that the probability that the patient suffers from sepsis given that the PCT value is larger than 0.5 g/L is 57%. As Figure 1 shows the PPV depends on the prevalence of disease as well on sensitivity and specificity combined as LR+. The PPV increases with increasing prevalence. Also a larger positive likelihood ratio leads to higher predictive values. Fagan plot A Fagan plot [12] uses the pretest (prior) probability together with the LR+ and is a graphical tool for estimating how much the result of a diagnostic test changes the probability that a patient has a disease. The rationale for a 0.6 0.4 0.2 Sensitivity and specificity are used to describe the validity of a diagnostic test. These quantities can be expressed as conTP . That is ditional probabilities: sensitivity = P(T + |D+ ) = TP+FN the probability that a diseased person will have a positive TN . This is test result. The specificity equals = P(T − |D− ) = TN+FP the probability that a healthy person will have a negative test result. From a clinical perspective the important question is whether a positive test indicates a diseased person. This is TP . This the positive predictive value PPV = P(D+ |T + ) = [TP+FP] LR+=1.7 LR+=5 LR+=10 0.0 Bayes’ theorem Positiv predicitve value 0.8 1.0 Positive and negative predictive values 0.0 0.2 0.4 0.6 0.8 1.0 Prevalence of Sepsis Figure 1: Bayes theorem: relationship between positive predictive value and prevalence for various values of LR+. Fagan plot is given by a formulation of Bayes’s theorem as follows: posterior ∝ prior × likelihood ratio. Again, this describes how our prior knowledge is improved by applying a diagnostic test and leads to the posterior knowlegde after performing the test. In order to create a Fagan nomogramm we need to apply odds. p Odds are defined as odds = 1−p where p is a proba0.5 bility. If we have a fair coin the odds for head are 1−0.5 =1 meaning that head and tail are equally likely. The prior 0.43 odds in our example are given by 1−0.43 =0.75. Formulating Bayes’s theorem in terms of odds leads to posterior odds = prior odds × likelihood ratio. In our example we obtain: posterior odds = prior odds × likelihood ratio = 0.75 × 1.70 = 1.28 Odds can be transformed into probabilities using the odds following formula p = 1+odds . For our data we obtain the 1.28 PPV = ( 1+1.28) = 0.56. A Fagan plot works on the log scale. Thus we use log (posterior odds) = log (prior odds) + log(LR+ ) (9) Thus taking logs creates a linear equation. Hence, a Fagan plot consists of a vertical axis on the left with the pretest probability, an axis in the middle representing the LR+ and a vertical line showing the PPV. By connecting the pretest probability and the LR+ the post test probability is obtained. Please note, that although the labels on the left and right are written in terms of probability, the tick marks are spaced at the log odds scale. The Fagan plot of our example is shown in Figure 2. The PPV tells us the probability whether a patient with positive test result really has the disease. The negative predictive value (NPV) is the probability that a person 804 Schlattmann: Statistics in diagnostic medicine Figure 2: Fagan plot: relationship between prevalence, likelihood ratio and positive predictive value. with a negative test result is really healthy. That is TN . In our example we obtain a NPV = P(D− |T − ) = [TN+FN] NPV=641/1012=0.63. Communicating predictive probabilities Looking at Eq. (7) the use of Bayes’s appears tedious, sometimes hard to understand and difficult to communicate. Thus, Hoffrage and Gigerenzer [13] introduced the concept of natural frequencies. Traditionally medical doctors are told: The prevalence of sepsis is 43%, the sensitivity is 44% and the specificity is 74%. Please tell me the probability that the patient suffers from sepsis. This is error prone especially if the disease of interest is rare and sensitivity and specificity of the test are high [14]. Thus, we proceed as follows. Assume that we have 1,000 subjects where 434 suffer from sepsis (prevalence 43%). Of these 190 will be test positive (sensitivity=44%). Of the 570 patients without sepsis 148 will be false positive 191 (specificity=74%). Then, the PPV equals 191+147 = 0.565. This means that of 100 patients with a positive test 57 suffer from sepsis and 43 are false positive. The application of natural frequencies is also shown in Figure 3. Receiver operator curves Until now we have assumed that we are dealing with a binary diagnostic test. By using a cut off 0.5 g/L we have transformed the continuous marker Procalcitonin into a binary test. Obviously, other cut-off values could be used. For example we could apply a cut off value ≥ 2.0 g/L. Figure 3: Application of natural frequencies to calculate predictive values. Then we would obtain Table 3. This leads to a sensitivity 176 771 sens = 667 = 0.26 and a specificity spec = 870 = 0.89. As a result increasing the cut off value form 0.5 g/L to 2.0 g/L led to a decreased sensitivity and an increased specificity. Looking at descriptive statistics of the Procalcitonin data we observe a median PCT value equal to 0.2 g/L with a minimum equal to 0.01 g/L and a maximum of 200 g/L. Obviously, we could use any value between minimum and maximum as a cut off value and calculate the corresponding sensitivity and specificity. This is done when we create a receiver operator curve (ROC) [15] which is obtained by calculating the sensitivity and specificity of every observed data value and plotting Table : Procalcitonin (PCT) and Sepsis- with cut off value . ng/mL. PCT ≥. g/L PCT <. g/L Total Sepsis-+ Sepsis-− Total , , Schlattmann: Statistics in diagnostic medicine sensitivity against 1-specificity. A test that perfectly discriminates between the two groups would yield a “curve” that coincided with the left and top sides of the plot since we would not have any false negative (FN) are false positive (FP) values. A useless test would give a straight line from the bottom left corner to the top right. This implies that a true positive and a false positive test result are equally likely. The performance of the test can be assessed by using the area under the receiver operating characteristic curve (AUC). This area may be interpreted as the probability that a random person with the disease has a higher value of the measurement than a random person without the disease. A perfect test would have an AUC=1 and a useless test has an AUC=0.5. For our example we obtain an AUC=0.64 with 95% CI (0.61, 0.67). A good review for the construction of confidence intervals for the AUC is given by Cho et al. (2018) [16]. In conclusion Procalcitonin offers moderate diagnostic discrimination at best as shown in Figure 4. However, after having determined that a test provides good discrimination the best cut off point for clinical needs has to be chosen. One possible approach is given by maximizing the sum of the sensitivity and specificity. This leads to the so called Youden Index J=sens + spec − 1. Hence for each possible cut off value J is calculated and the value which leads to a maximum of J is chosen. For the data at hand we obtain a cut off value equal to 0.175 g/L with a sensitivity of 0.65 and a specificity of 0.56. An approach which is just data driven is not helpful because also the clinical situation needs to taken into account. Schuetz et al. (2019) [17] for example “refined the established PCT algorithms by incorporating severity of illness and probability of bacterial infection and reducing the fixed cut-offs to only one for mild to 805 moderate and one for severe disease 0.25 g/L and 0.5 g/L, respectively”. Sample size estimation Like in therapeutic clinical trials sample size estimation should be performed for diagnostic studies. Knottnerus and Muris (2003) [18] present the whole strategy needed for the development of diagnostic tests. This involves the selection of cases and controls and ensuring that a correct reference standard is defined. From a statistical point of view the allowable Type I and Type II errors, the primary outcome of interest together with a relevant effect size and is variability need to be defined in advance. Formulas and tables for the planning of binary tests may be found in the paper by Flahault et al. (2003) [19]. For continuous biomarkers sample size estimation can be based on the AUC of the ROC curve. Let us consider a phase I diagnostic study where we want to determine whether the new diagnostic test has any ability to discriminate diseased patients from healthy controls. Then the null hypothesis is that the AUC equals 0.5 vs. the alternative hypothesis that the AUC is ≠0.5. Formulas for sample size estimation may be found in Obuchowski et al. (2004, page 1123 Eqs. (2) and (3)) [20]. Let us assume that our new biomarker performs better than Procalcitonin with an AUC=0.7. We accept a Type I error of 5% (two-sided) and we want a power of 90%. Then we need 41 cases and 41 controls. Using R for the calculations The freely statistical available package R [4] may be used to perform the necessary calculations for our example. The package can be obtained at https://cran.r-project. org. A useful integrated software environment is given by by RStudio https://www.rstudio.com/. Using RStudio R scripts can easily be used to run the respective R commands. Data and a R script for our example are given in the supplementary material. Importing and manipulating data Figure 4: Procaclition as biomarker for sepsis: receiver operator curve. The data from our example are read from an Excel csv file and stored as an object named “diag.data”. The command “read.csv2” reads Excel files in .csv format. First comes the name of the file. Next “header=T” implies that the first line of the file contains the variable names. Finally 806 Schlattmann: Statistics in diagnostic medicine “na.string=. ‘means missing values are indicated by to “.” the function “epi.tests” which calculates sensitivity etc. diag.data<-read.csv2(“pct_example.csv”,header=T, na.string=“.”) install.packages("epiR"} library(epiR) epi.tests(table05) This object “diag.data” contains the data and can be modified. Here, the data column “Procalcitonin” contains the biomarker values in g/L. In a first step we create a new binary indicator named “PCT” which is a new column of our data. This indicator takes the value “1” if the Procalcitonin level is ≥ 0.5 g/L and 0 otherwise. Then, we attach value labels. First, we declare the variable as a “factor” and assign the value labels. ## Apply the widely used cutoff value 0.5 g/L diag.data$PCT<ifelse(diag.data$Procalcitonin<0.5,0,1) # Define the variable PCT as a factor diag.data$PCT<-as.factor(diag.data$PCT) # assign value labels levels(diag.data$PCT)<-c(‘<0.5 g/L’,‘>=0.5 g/L’) attach(diag.data) The command “attach” provides access to the individual elements of the data object “diag.data”. Now we can perform basic descriptive statistics using the command “summary”. For the variable “Proacalcitonin” we obtain summary(Procalcitonin) Min. 1st Qu. Median 0.010 0.060 0.200 This gives the (shortened) result Point estimates and 95% CIs. True prevalence * Sensitivity * Specificity * Positive predictive value * Negative predictive value * Positive likelihood ratio Negative likelihood ratio 0.43 (0.41, 0.46) 0.44 (0.41, 0.48) 0.74 (0.71, 0.77) 0.56 (0.52, 0.61) 0.63 (0.60, 0.66) 1.69 (1.47, 1.94) 0.75 (0.70, 0.82) —————————————————————————————— Construction of a Fagan plot A Fagan plot is constructed using the library “Teachingdemos” and submitting a prevalence of 0.43 and LR+=1.7 to the function “fagan.plot”. library(TeachingDemos) fagan.plot(0.43,1.7) The result is shown in Figure 2. Mean 3.376 3rd Qu. 1.070 Max. 200.000 Receiver operator curves Calculating diagnostic parameters In the next step we create a 2 × 2 table with our diagnostic test variable “PCT” vs. “sepsis2”. We exclude missing values indicated by “.” and change the ordering of the table and obtain table05<-table(PCT,sepsis2,exclude=“.”)[2:1,2:1] table05 sepsis2 PCT Yes No >=0.5 g/L 296 229 <0.5 g/L 371 641 The package “epiR” is used to calculate sensitivity specificity etc. First we need to install the package (only once) and then load its functionality using the command “library”. Finally we submit our table named “table05” to In the next step we use Procalcitonin as a continuous biomarker and construct a ROC-curve and calculate the AUC together with a 95% CI using the package pROC [21]. library(pROC) cut1<-roc(~sepsis2∼Procalcitonin,data=diag.data, percent=F,print.auc=T,ci=T) print(cut1) Data: Procalcitonin in 870 controls (sepsis No) < 667 cases (sepsis Yes). Area under the curve: 0.6407 95% CI: 0.613–0.6683 (DeLong) In the next step we obtain the optimal cut off value based on Youden’s index J and plot the ROC curve Schlattmann: Statistics in diagnostic medicine coords(cut1, “best”, “threshold”,best.method=“youden”) plot.roc(sepsis~Procalcitonin, print.auc=T,ci=T,data= pct,percent=F,legacy.axes=T,grid=T) The ROC curve is shown in Figure 4 and the cut off value is given by threshold specificity sensitivity 0.175 0.562069 0.6521739 Sample size estimation If we want to estimate the necessary sample size for a diagnostic Phase I study assuming an AUC = 0.7, 90% power and two sided significance level of 5% we can use: library(pROC) power.roc.test(auc=0.70, sig.level=0.05, power=0.90, alternative=“two.sided”) One ROC curve power calculation ncases = 40.21369 ncontrols = 40.21369 auc = 0.7 sig.level = 0.05 power = 0.9 Hence we would include 82 subjects into our study. Acknowledgements: I would like to thank Dr. Ljungstroem [3] and colleagues for the allowance to use the data from their study as an example for this article. Research funding: None declared. Author contributions: Single author statement. Competing interests: Author states no conflict of interest. Informed consent: Not applicable. Ethical approval: Not applicable. References 1. Fleischmann C, Scherag A, Adhikari NKJ, Hartog CS, Tsaganos T, Schlattmann P, et al. Assessment of global incidence and mortality of hospital-treated sepsis. Current estimates and limitations. Am J Respir Crit Care Med 2016;193: 259–72. 2. Wacker C, Prkno A, Brunkhorst FM, Schlattmann P. Procalcitonin as a diagnostic marker for sepsis: a systematic review and metaanalysis. Lancet Infect Dis 2013;13:426–35. 807 3. Ljungström L, Pernestig AK, Jacobsson G, Andersson R, Usener B, Tilevik D. Diagnostic accuracy of procalcitonin, neutrophil-lymphocyte count ratio, C-reactive protein, and lactate in patients with suspected bacterial sepsis. PLoS One 2017;12:e0181704. 4. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2019. Available from: https://www. R-project.org/. 5. Altman DG, Bland JM. Statistics Notes: diagnostic tests 1: sensitivity and specificity. BMJ 1994;308:1552. 6. Levy MM, Fink MP, Marshall JC, Abraham E, Angus D, Cook D, et al. 2001 SCCM/ESICM/ACCP/ATS/SIS international sepsis definitions conference. Crit Care Med 2003;31:1250–6. 7. Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA 2016; 315:801. 8. Agresti A, Coull BA. Approximate is better than “exact” for interval estimation of binomial proportions. Am Statistician 1998;52: 119–26. 9. Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ 2004;329:168–9. 10. Jaeschke R. Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The EvidenceBased Medicine Working Group. JAMA: J Am Med Assoc 1994; 271:703–7. 11. Koopman PAR. Confidence intervals for the ratio of two binomial proportions. Biometrics 1984;40:513–7. 12. Fagan T. Nomogram for Bayes’s theorem. N Engl J Med 1975;293: 257. 13. Gigerenzer G, Hoffrage U. How to improve Bayesian reasoning without instruction – frequency formats [journal article]. Psychol Rev 1995;102:684–704. 14. Gigerenzer G What are natural frequencies? BMJ (Clinical research ed). 2011;343:d6386. 15. Altman DG, Bland JM. Statistics Notes: diagnostic tests 3: receiver operating characteristic plots. BMJ 1994;309:188. 16. Cho H, Matthews GJ, Harel O. Confidence intervals for the area under the receiver operating characteristic curve in the presence of ignorable missing data. Int Stat Rev 2018;87: 152–77. 17. Schuetz P, Beishuizen A, Broyles M, Ferrer R, Gavazzi G, Gluck EH, et al. Procalcitonin (PCT)-guided antibiotic stewardship: an international experts consensus on optimized clinical use. Clin Chem Lab Med 2019;57:1308–18. 18. Knottnerus JA, Muris JW. Assessment of the accuracy of diagnostic tests: the cross-sectional study. J Clin Epidemiol 2003; 56:1118–28. 19. Flahault A, Cadilhac M, Thomas G. Sample size calculation should be performed for design accuracy in diagnostic test studies. J Clin Epidemiol 2005;58:859–62. 20. Obuchowski NA, Lieber ML, Wians FH. ROC curves in clinical chemistry: uses, misuses, and possible solutions. Clin Chem 2004;50:1118–25. 21. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinf 2011;12:77.