Sensitivity and Specificity

Learning Objectives

The student will be able to
- define and discuss the concepts of
  - sensitivity and specificity in the context of diagnostic medical tests
  - false positive and false negative
- define and produce a receiver operating characteristic (ROC) curve and discuss its role in assessing how well a test can discriminate between categories

Knowledge and Skills
- produce graphs in Excel (scatter plot)
- calculate the likelihood of a patient having a disease based on the result of a diagnostic medical test with given sensitivity and specificity and the prevalence of the disease in the general population
- calculate sensitivity and specificity based on a data set, and graph the corresponding ROC curve
- interpret ROC and AUC

Prerequisites
- frequency interpretation of probability
- conditional probability and Bayes' Theorem

Citation: Neuhauser, C. Sensitivity and Specificity.
Created: November 8, 2009
Revisions:
Copyright: © 2009 Neuhauser. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows others to translate, make remixes, and produce new stories based on this work, provided the original author and source are credited and the new work will carry the same license.
Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute.

Before You Come to Class…

Hypothyroidism is a disease that affects about 4% of the population in the U.S. Read about hypothyroidism on the web before class using the Mayo Clinic's web page: http://mayoclinic.com/health/hypothyroidism/DS00353

Answer the following questions:
1. What are common symptoms of hypothyroidism? (See http://mayoclinic.com/health/hypothyroidism/DS00353/DSECTION=symptoms)
2. Describe the role of the two hormones, thyroxine (T-4) and triiodothyronine (T-3), in human physiology.
   Which hormone regulates T-3 and T-4, and where is it produced? (See http://mayoclinic.com/health/hypothyroidism/DS00353/DSECTION=causes)
3. Which diagnostic tests are used to diagnose hypothyroidism? (See http://mayoclinic.com/health/hypothyroidism/DS00353/DSECTION=tests-and-diagnosis)

In-class Activity

A patient underwent a diagnostic test for hypothyroidism. The diagnostic test correctly identifies patients who in fact have the disease in 93% of the cases and correctly labels healthy patients as healthy in 81% of the cases. Assume that about 4% of all patients have the disease and the test for this particular patient comes back positive (i.e., the test indicates that the patient has the disease). Without any calculations, how likely is it in your opinion that the patient in fact has the disease? Write down your guess. (After completion of this module, you will be able to use the rules of probability to calculate the probability of this event.)

We will start with a case study to address two questions regarding the performance of a diagnostic medical test:
1. What percentage of patients is correctly identified as having/not having a disease?
2. Given that a test result comes back positive/negative, what is the likelihood that the patient in fact has/does not have the disease?

We will see that the answer to Question 1 depends on the test characteristics, whereas the answer to Question 2 also requires knowledge of the prevalence of the disease in the population.
What percentage of patients is correctly identified as having/not having a disease?

Data

A retrospective study on hypothyroidism was conducted in a primary care internal medicine practice group at the University of Rochester School of Medicine and Dentistry, Rochester NY, between August 1983 and February 1984 (Goldstein and Mushlin 1987). A total of 27 patients in the study were known to be hypothyroid, and a total of 97 patients had normal thyroid gland function, i.e., they were euthyroid. A diagnostic test for T-4 values was performed on all patients to determine the predictive value of this test. (See spreadsheet.)

Table 1: T-4 values

T-4 value (μg/dl)  <1.1  1.1-2  2.1-3  3.1-4  4.1-5  5.1-6  6.1-7  7.1-8  8.1-9  9.1-10  10.1-11  11.1-12  >12  Total
Hypothyroid           2      3      1      8      4      4      3      2      0       0        0        0    0     27
Euthyroid             0      0      0      0      1      6     11     20     17      20       13        5    4     97

In-class Activity

1. Use a scatter plot to graph the number of patients in each group (hypothyroid and euthyroid) as a function of T-4 values.
2. Based on the data, what is the likelihood that a patient with a T-4 value of more than 4 μg/dl is hypothyroid? Euthyroid?
3. For a diagnostic test that yields numerical values, a range of values is defined that suggests the presence of the disease.
   In the case of hypothyroidism, the presence of disease is indicated by low T-4 values. We can thus define a cut point to distinguish normal from abnormal: values below the cut point indicate presence of the disease; values above the cut point indicate absence of the disease. Based on the data, define a cut point and explain your reasoning. For your choice, what percentage of patients would be correctly/incorrectly identified as having the disease? What percentage of patients would be correctly/incorrectly identified as healthy?

True/False Positive and True/False Negative

When a diagnostic test is performed and the test result is positive, the patient is classified as having the disease. If the test is negative, the patient is classified as healthy. However, diagnostic tests may yield a wrong result, either by falsely classifying a person as having the disease when in fact the person is healthy, or by classifying the person as healthy when in fact the person has the disease. This can be summarized in a two-by-two table:

                 Disease present       Disease absent
Test positive    True positive (TP)    False positive (FP)
Test negative    False negative (FN)   True negative (TN)

The following figure is another way to understand the different outcomes of a diagnostic test as a function of disease state. It also explains two important concepts, specificity and sensitivity, that are used to determine how well a test can discriminate between patients who have the disease and those who do not.
Assume that a patient with a T-4 value of 7 or less is considered to be hypothyroid, i.e., set the cut point equal to 7. Using the data in Table 1, show that the numbers of patients who are classified as TN, FP, TP, and FN are

Euthyroid (total 97):    TN = 79, FP = 18
Hypothyroid (total 27):  TP = 25, FN = 2

We can fill the two-by-two table with numbers:

                 Disease present    Disease absent
Test positive    25                 18
Test negative    2                  79

In-class Problem: Assume that a patient with a T-4 value of 5 or less is considered to be hypothyroid. Use the data in Table 1 to complete the two-by-two table:

                 Disease present    Disease absent
Test positive    ____               ____
Test negative    ____               ____

Sensitivity and Specificity

The specificity or true negative rate (TNR) is defined as the percentage of healthy patients who are correctly identified as being healthy:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

The quantity 1 − specificity is the false positive rate: the percentage of healthy patients who are incorrectly identified as having the disease.

The sensitivity or true positive rate (TPR) is defined as the percentage of patients with the disease who are correctly identified as having the disease:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

We will use the data in Table 1 to see that there is a trade-off between sensitivity and specificity.
We will calculate sensitivity and specificity for different cut points for hypothyroidism. When the cut point is 7, the specificity is

$$\frac{79}{79 + 18} \approx 0.81$$

and the sensitivity is

$$\frac{25}{25 + 2} \approx 0.93.$$

In-class Problem: Calculate specificity and sensitivity when the cut point is 5.

In-class Activity

Complete Table 2 using the data in Table 1 (see spreadsheet). From the table you can see that there is a trade-off between specificity and sensitivity. Describe this trade-off.

Table 2

             Hypothyroid    Euthyroid
Cut Point    TP     FN      FP     TN     Specificity   Sensitivity
4
5
6
7            25     2       18     79     0.81          0.93
8
9
10
11
12
13

Receiver Operating Characteristic (ROC) Curve

The receiver operating characteristic (ROC) curve is a graph in which sensitivity is plotted as a function of 1 − specificity. The area under the ROC curve is denoted AUC. Suppose we pick a pair of patients at random, one from the group of healthy patients and one from the group of patients that have the disease. One can show that AUC is the probability that the healthy patient will have a higher T-4 value than the patient with the disease. The AUC is thus used as a measure of how well the test can discriminate between the two groups. The larger the value of the AUC, the better the test can discriminate between the two groups.

In-class Problem: Graph the ROC curve for the data set in Table 1 using your calculations in Table 2.
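The cut-point calculations in this section can also be sketched in Python rather than in a spreadsheet. The block below is only an illustration under stated assumptions: the function names are hypothetical, each bin of Table 1 is represented by its upper edge (so "a T-4 value of c or less" means all bins whose upper edge is at most c), and the area under the curve is approximated with the trapezoid rule on the binned data, which is a swapped-in technique and not a method given in this module.

```python
import math

# Table 1 (Goldstein and Mushlin 1987): patient counts per T-4 bin.
# Assumption: each bin is represented by its upper edge; the last bin (>12) is open-ended.
UPPER_EDGES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, math.inf]
HYPOTHYROID = [2, 3, 1, 8, 4, 4, 3, 2, 0, 0, 0, 0, 0]
EUTHYROID = [0, 0, 0, 0, 1, 6, 11, 20, 17, 20, 13, 5, 4]


def confusion_counts(cut_point):
    """Return (TP, FN, FP, TN) when a T-4 value <= cut_point is a positive test."""
    tp = sum(n for edge, n in zip(UPPER_EDGES, HYPOTHYROID) if edge <= cut_point)
    fp = sum(n for edge, n in zip(UPPER_EDGES, EUTHYROID) if edge <= cut_point)
    fn = sum(HYPOTHYROID) - tp
    tn = sum(EUTHYROID) - fp
    return tp, fn, fp, tn


def sensitivity(cut_point):
    """True positive rate: TP / (TP + FN)."""
    tp, fn, _, _ = confusion_counts(cut_point)
    return tp / (tp + fn)


def specificity(cut_point):
    """True negative rate: TN / (TN + FP)."""
    _, _, fp, tn = confusion_counts(cut_point)
    return tn / (tn + fp)


def roc_points(cut_points):
    """One (1 - specificity, sensitivity) pair per cut point."""
    return [(1 - specificity(c), sensitivity(c)) for c in cut_points]


def trapezoid_auc(points):
    """Trapezoid-rule area under the curve through the points, with (0, 0) and (1, 1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


print(confusion_counts(7))        # (25, 2, 18, 79)
print(round(specificity(7), 2))   # 0.81
print(round(sensitivity(7), 2))   # 0.93
```

Calling `roc_points(range(1, 13))` gives the points to plot for the ROC curve, and passing that list to `trapezoid_auc` gives a rough estimate of the AUC. Because the data are binned, this estimate depends on the chosen cut points and should be read as an approximation.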
Given that a test result comes back positive/negative, what is the likelihood that the patient in fact has/does not have the disease?

Problem: A patient underwent a diagnostic test for hypothyroidism. The diagnostic test correctly identifies patients who in fact have the disease in 93% of the cases and correctly labels healthy patients as healthy in 81% of the cases. Assume that about 4% of all patients have the disease and the test for this particular patient comes back positive (i.e., the test indicates that the patient has the disease).
(a) Determine the sensitivity and the specificity of the test.
(b) If a randomly picked individual from the population undergoes this test and the test result is positive, what is the probability that this person in fact has the disease?

Answer:
(a) The sensitivity is 0.93 and the specificity is 0.81.
(b) We use Bayes' Theorem to calculate the probability in question. The tree diagram illustrates the probabilities of prevalence and test outcomes. We are interested in calculating the probability that a patient has the disease given that the test came back positive. Using Bayes' Theorem, we find

$$P(\text{Disease} \mid \text{Test Positive}) = \frac{P(\text{Disease \& Test Positive})}{P(\text{Test Positive})} = \frac{(0.04)(0.93)}{(0.04)(0.93) + (0.96)(0.19)} \approx 0.17,$$

where 0.19 = 1 − 0.81 is the false positive rate.

That is, the likelihood that this randomly picked person in fact has the disease is only about 17%.
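The Bayes' Theorem calculation above can be written as a short Python function. This is a minimal sketch, not part of the original module; the function name `prob_disease_given_positive` and its parameter names are hypothetical and chosen only for illustration.

```python
def prob_disease_given_positive(sensitivity, specificity, prevalence):
    """P(Disease | Test Positive) by Bayes' Theorem.

    All three arguments are probabilities between 0 and 1.
    """
    # P(Disease & Test Positive)
    true_positive = prevalence * sensitivity
    # P(No Disease & Test Positive); 1 - specificity is the false positive rate
    false_positive = (1 - prevalence) * (1 - specificity)
    # P(Test Positive) is the sum of the two ways to get a positive result
    return true_positive / (true_positive + false_positive)


# Values from the worked problem: sensitivity 0.93, specificity 0.81, prevalence 4%
print(round(prob_disease_given_positive(0.93, 0.81, 0.04), 2))  # 0.17
```

Changing only the `prevalence` argument shows how strongly the answer depends on how common the disease is in the tested population.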
In-class Problem: Repeat the calculation when the prevalence is 2%. How does your answer change when the prevalence is 20%?

Homework Problems

1. Produce a graph where P(Disease|Test Positive) is on the vertical axis and prevalence is on the horizontal axis for a diagnostic test that correctly identifies patients who in fact have the disease in 93% of the cases and correctly labels healthy patients as healthy in 81% of the cases. What does this graph tell you about random testing for a relatively uncommon disease?

2. You work at a diagnostic lab. Last year, the lab was asked by a local clinic to perform a certain diagnostic medical test 1347 times. If the test of a patient came back positive, the clinic treated the patient for the disease. You wish to determine the number of patients that were treated unnecessarily based on the testing done at your lab. What information would you need to determine this number and how would you go about calculating this number?

3. There are often multiple diagnostic medical tests available to test whether a patient has a certain disease. For instance, to test asymptomatic men for the presence of Chlamydia trachomatis, a sexually transmitted organism that can cause serious problems in women, including infertility, various tests have been developed that detect the presence of C. trachomatis in urine. Chernesky et al. (1999) compared the diagnostic performance of three rapid tests and LE dipstick assay and found the following performance:

Rapid test                      % Sensitivity   % Specificity
Testpack Chlamydia (Abbott)     70.9            95.5
Surecell Chlamydia (Kodak)      62.9            100
Clearview Chlamydia (Unipath)   67.7            95.5
LE dipstick (Boehringer)        87.5            92.4

(a) Which test has the highest sensitivity, which the highest specificity?
(b) Suppose the prevalence in the population that is tested is 18% and 1000 patients are being tested. For each test, determine the number of patients that are missed and the number of patients that are unnecessarily treated.
    Determine the probability that a positive test is a true positive and that a negative test is a true negative. How do your answers change when the prevalence is only 2%?

4. Assume that there is a trade-off between sensitivity and specificity: a test with a higher sensitivity has a lower specificity and vice versa. If your goal is to minimize the number of patients who are treated unnecessarily, should you maximize sensitivity or specificity? What about if your goal is to minimize the number of patients that are missed by the test?

5. Pregnancy Test: The Clearblue® Easy pregnancy test lists the "% of samples from pregnant women giving a pregnant result" as a function of "Number of days before the expected period":

Number of days before the expected period                    -1     -2     -3     -4
% of samples from pregnant women giving a pregnant result    95%    90%    82%    51%

(a) Which of the two quantities, sensitivity and specificity, can you calculate from the data provided? Perform the calculations.
(b) If a pregnant woman uses the test four days before the expected period, what is the likelihood that she will have at least one positive test result if she takes two pregnancy tests?

References

Chernesky, M., D. Jang, J. Krepel, J. Sellors, and J. Mahony. 1999. Journal of Clinical Microbiology, Sept. 1999, p. 2777-2780.

Goldstein, B.J. and A.I. Mushlin. 1987.
Use of a single thyroxine test to evaluate ambulatory medical patients for suspected hypothyroidism. Journal of General Internal Medicine 2: 20-24.