Sensitivity and Specificity

Learning Objectives
The student will be able to
- define and discuss the concepts of sensitivity and specificity in the context of diagnostic medical tests
- define false positive and false negative
- define and produce a receiver operating curve and discuss its role in assessing how well a test can
  discriminate between categories
Knowledge and Skills
- produce graphs in Excel (scatter plot)
- calculate the likelihood of a patient having a disease based on the result of a diagnostic medical
  test with given sensitivity and specificity and the prevalence of the disease in the general population
- calculate sensitivity and specificity based on a data set, and graph the corresponding receiver
  operating curve (ROC)
- interpret ROC and AUC
Prerequisites
- frequency interpretation of probability
- conditional probability and Bayes' theorem
Citation: Neuhauser, C. Sensitivity and Specificity.
Created: November 8, 2009 Revisions:
Copyright: © 2009 Neuhauser. This is an open-access article distributed under the terms of the Creative Commons Attribution
Non-Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows
others to translate, make remixes, and produce new stories based on this work, provided the original author and source are
credited and the new work will carry the same license.
Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute.
Before You Come to Class…
Hypothyroidism is a disease that affects about 4% of the population in the U.S. Read about
hypothyroidism on the web before class using the Mayo Clinic’s web page:
http://mayoclinic.com/health/hypothyroidism/DS00353
Answer the following questions:
1. What are common symptoms of hypothyroidism? (See
http://mayoclinic.com/health/hypothyroidism/DS00353/DSECTION=symptoms)
2. Describe the role of the two hormones, thyroxine (T-4) and triiodothyronine (T-3) in human
physiology. Which hormone regulates T-3 and T-4 and where is it produced? (See
http://mayoclinic.com/health/hypothyroidism/DS00353/DSECTION=causes)
3. Which diagnostic tests are used to diagnose hypothyroidism? (See
http://mayoclinic.com/health/hypothyroidism/DS00353/DSECTION=tests-and-diagnosis)
In-class Activity
A patient underwent a diagnostic test for hypothyroidism. The diagnostic test correctly identifies
patients who in fact have the disease in 93% of the cases and correctly labels healthy patients as healthy
in 81% of the cases. Assume that about 4% of all patients have the disease and the test for this particular
patient comes back positive (i.e., the test indicates that the patient has the disease). Without any
calculations, how likely is it in your opinion that the patient in fact has the disease? Write down your
guess. (After completion of this module, you will be able to use the rules of probability to calculate the
probability of this event.)
We will start with a case study to address two questions regarding the performance of a diagnostic
medical test:
1. What percentage of patients is correctly identified as having/not having a disease?
2. Given a test result comes back positive/negative, what is the likelihood that the patient in fact
has/does not have the disease?
We will see that the answer to Question 1 depends on the test characteristics, whereas the answer to
Question 2 requires knowledge of the prevalence of the disease in the population.
What percentage of patients is correctly identified as having/not having a disease?
Data
A retrospective study on hypothyroidism was conducted in a primary care internal medicine practice
group at the University of Rochester School of Medicine and Dentistry, Rochester NY, between August
1983 and February 1984 (Goldstein and Mushlin 1987). A total of 27 patients in the study were known to
be hypothyroid and a total of 97 patients had normal thyroid gland function, i.e., they were euthyroid. A
diagnostic test for T-4 values was performed on all patients to determine the predictive value of this
test. (See spreadsheet.)
Table 1: T-4 values
T-4 value (μg/dl)   Hypothyroid   Euthyroid
<1.1                2             0
1.1-2               3             0
2.1-3               1             0
3.1-4               8             0
4.1-5               4             1
5.1-6               4             6
6.1-7               3             11
7.1-8               2             20
8.1-9               0             17
9.1-10              0             20
10.1-11             0             13
11.1-12             0             5
>12                 0             4
Total               27            97
In-class Activity
1. Use a scatter plot to graph the number of patients in each group (hypothyroid and euthyroid) as a
function of T-4 values.
2. Based on the data, what is the likelihood that a patient with a T-4 value of more than 4μg/dl is
hypothyroid? Euthyroid?
3. For a diagnostic test that yields numerical values, a range of values is defined that suggests the
presence of the disease. In the case of hypothyroidism, the presence of disease is indicated by low
T-4 values. We can thus define a cut point to distinguish normal from abnormal: values below the cut
point indicate presence of the disease; values above the cut point indicate absence of the disease.
Based on the data, define a cut point and explain your reasoning. For your choice, what percentage of
patients would be correctly/incorrectly identified as having the disease? What percentage of patients
would be correctly/incorrectly identified as healthy?
True/False Positive and True/False Negative
When a diagnostic test is performed and the test result is positive, the patient is classified as having the
disease. If the test is negative, the patient is classified as healthy. However, diagnostic tests may yield a
wrong result by either falsely classifying a person as having the disease if in fact the person is healthy, or
by classifying the person as healthy if in fact the person has the disease. This can be summarized in a
two-by-two table:
                 Disease present       Disease absent
Test positive    True positive (TP)    False positive (FP)
Test negative    False negative (FN)   True negative (TN)
The following figure is another way to understand the different outcomes of a diagnostic test as a
function of disease state. It also explains two important concepts, specificity and sensitivity, that are
used to determine how well a test can discriminate between patients who have the disease and those
who do not.
Assume that a patient with a T-4 value of 7 or less is considered to be hypothyroid, i.e., set the cut point
equal to 7. Using the data in Table 1, show that the numbers of patients who are classified as TN, FP, TP,
and FN are
TN = 79 and FP = 18 (total 97 euthyroid patients)
TP = 25 and FN = 2 (total 27 hypothyroid patients)
We can fill the two-by-two table with numbers:
                 Disease present   Disease absent
Test positive    25                18
Test negative    2                 79
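The counting behind this table can be sketched in a few lines of Python. This is only an illustration, not part of the original spreadsheet: the names `UPPER_EDGES`, `HYPOTHYROID`, `EUTHYROID`, and `confusion_counts` are made up here, and each bin of Table 1 is represented by its upper edge (the open-ended ">12" bin is given an infinite upper edge, which is an assumption).

```python
import math

# Upper edge of each T-4 bin in Table 1; the ">12" bin is treated as unbounded (assumption).
UPPER_EDGES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, math.inf]
HYPOTHYROID = [2, 3, 1, 8, 4, 4, 3, 2, 0, 0, 0, 0, 0]      # patients with the disease
EUTHYROID = [0, 0, 0, 0, 1, 6, 11, 20, 17, 20, 13, 5, 4]   # healthy patients


def confusion_counts(cut_point):
    """Return (TP, FN, FP, TN) when a T-4 value <= cut_point counts as a positive test."""
    tp = sum(n for edge, n in zip(UPPER_EDGES, HYPOTHYROID) if edge <= cut_point)
    fp = sum(n for edge, n in zip(UPPER_EDGES, EUTHYROID) if edge <= cut_point)
    fn = sum(HYPOTHYROID) - tp   # diseased patients the test misses
    tn = sum(EUTHYROID) - fp     # healthy patients the test labels healthy
    return tp, fn, fp, tn


print(confusion_counts(7))  # (25, 2, 18, 79)
```

Because the data are binned, only integer cut points from 1 to 12 are unambiguous in this sketch.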
In-class Problem: Assume that a patient with a T-4 value of 5 or less is considered to be hypothyroid.
Use the data in Table 1 to complete the two-by-two table:
                 Disease present   Disease absent
Test positive
Test negative
Sensitivity and Specificity
The specificity or true negative rate (TNR) is defined as the percentage of healthy patients who are
correctly identified as being healthy:
\[
\text{Specificity} = \frac{TN}{TN + FP}
\]
The quantity 1-specificity is the false positive rate (FPR) and is the percentage of healthy patients who
are incorrectly identified as having the disease.
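The step from specificity to the false positive rate is a one-line calculation. As a short worked sketch, using the cut point 7 counts from Table 1 (TN = 79, FP = 18):

\[
1 - \text{Specificity} = 1 - \frac{TN}{TN + FP} = \frac{FP}{TN + FP},
\qquad
\frac{18}{79 + 18} = \frac{18}{97} \approx 0.19
\]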
The sensitivity or true positive rate (TPR) is defined as the percentage of patients with the disease who
are correctly identified as having the disease:
\[
\text{Sensitivity} = \frac{TP}{TP + FN}
\]
We will use the data in Table 1 to see that there is a trade-off between sensitivity and specificity. We will
calculate sensitivity and specificity for different cut points for hypothyroidism.
When the cut point is 7, the specificity and the sensitivity are
\[
\text{Specificity} = \frac{79}{79 + 18} \approx 0.81
\qquad \text{and} \qquad
\text{Sensitivity} = \frac{25}{25 + 2} \approx 0.93 .
\]
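The same arithmetic can be checked with a short Python sketch. The function names `specificity` and `sensitivity` are illustrative choices, not something the handout or its spreadsheet defines; the counts are the cut point 7 values from the text.

```python
def specificity(tn, fp):
    """True negative rate: share of healthy patients correctly labeled healthy."""
    return tn / (tn + fp)


def sensitivity(tp, fn):
    """True positive rate: share of diseased patients correctly labeled diseased."""
    return tp / (tp + fn)


# Counts for cut point 7, taken from the two-by-two table in the text.
tp, fn, fp, tn = 25, 2, 18, 79

print(round(specificity(tn, fp), 2))  # 0.81
print(round(sensitivity(tp, fn), 2))  # 0.93
```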
In-class Problem: Calculate specificity and sensitivity when the cut point is 5.
In-class Activity
Complete Table 2 using the data in Table 1 (see spreadsheet). From the table you can see that there is a
tradeoff between specificity and sensitivity. Describe this tradeoff.
Table 2
             Hypothyroid     Euthyroid
Cut Point    TP      FN      FP      TN      Specificity   Sensitivity
4
5
6
7            25      2       18      79      0.81          0.93
8
9
10
11
12
13
Receiver Operating Curve (ROC)
The receiver operating curve (more commonly called the receiver operating characteristic curve) is a
graph in which sensitivity is plotted as a function of 1-specificity, with one point for each cut point. The
area under the ROC is denoted AUC. Suppose we pick a pair of patients at random, one from the group of
healthy patients and one from the group of patients who have the disease. One can show that the AUC is
the probability that the healthy patient will have a higher T-4 value than the patient with the disease. The
AUC is thus used as a measure of how well the test can discriminate between the two groups: an AUC of
0.5 corresponds to a test that does no better than chance, and the closer the AUC is to 1, the better the
test can discriminate between the two groups.
In-class Problem: Graph the ROC for the data set in Table 1 using your calculations in Table 2.
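For readers who want to check a hand-drawn ROC numerically, the following Python sketch computes the ROC points and the AUC in two ways: as the area under the curve (trapezoid rule) and as the pairwise probability described above. It is a sketch under stated assumptions, not the handout's method: the function names are made up, the Table 1 bins are used in place of individual T-4 values, and a healthy/diseased pair that falls in the same bin is counted as one half.

```python
HYPOTHYROID = [2, 3, 1, 8, 4, 4, 3, 2, 0, 0, 0, 0, 0]      # diseased, per T-4 bin
EUTHYROID = [0, 0, 0, 0, 1, 6, 11, 20, 17, 20, 13, 5, 4]   # healthy, per T-4 bin


def roc_points(diseased, healthy):
    """(1 - specificity, sensitivity) as the cut point moves up one bin at a time."""
    n_d, n_h = sum(diseased), sum(healthy)
    points = [(0.0, 0.0)]            # cut point below all values: nobody tests positive
    tp = fp = 0
    for d, h in zip(diseased, healthy):
        tp += d                      # diseased patients at or below the cut point
        fp += h                      # healthy patients at or below the cut point
        points.append((fp / n_h, tp / n_d))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(diseased, healthy):
    """P(healthy patient has the higher T-4 bin), same-bin pairs counted as 1/2."""
    wins = 0.0
    for i, d in enumerate(diseased):
        for j, h in enumerate(healthy):
            if j > i:
                wins += d * h
            elif j == i:
                wins += 0.5 * d * h
    return wins / (sum(diseased) * sum(healthy))


points = roc_points(HYPOTHYROID, EUTHYROID)
print(auc_trapezoid(points), auc_pairwise(HYPOTHYROID, EUTHYROID))
```

The two numbers agree, which is one way to see why the area under the curve can be read as a probability.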
Given a test result comes back positive/negative, what is the likelihood that the patient in
fact has/does not have the disease?
Problem: A patient underwent a diagnostic test for hypothyroidism. The diagnostic test correctly
identifies patients who in fact have the disease in 93% of the cases and correctly labels healthy patients
as healthy in 81% of the cases. Assume that about 4% of all patients have the disease and the test for
this particular patient comes back positive (i.e., the test indicates that the patient has the disease).
(a) Determine the sensitivity and the specificity of the test.
(b) If a randomly picked individual from the population undergoes this test and the test result is positive,
what is the probability that this person in fact has the disease?
Answer:
(a) The sensitivity is 0.93 and the specificity is 0.81.
(b) We use Bayes Theorem to calculate the probability in question:
The tree diagram illustrates the probabilities of prevalence and test outcomes. We are interested in
calculating the probability that a patient has the disease given the test came back positive. Using Bayes
Theorem, we find
\[
P(\text{Disease} \mid \text{Test Positive})
= \frac{P(\text{Disease} \;\&\; \text{Test Positive})}{P(\text{Test Positive})}
= \frac{(0.04)(0.93)}{(0.04)(0.93) + (0.96)(0.19)}
\approx 0.17
\]
That is, the likelihood that this randomly picked person in fact has the disease is only about 17%.
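The Bayes calculation above can be written as a small Python function. This is a minimal sketch: the function name `positive_predictive_value` is chosen here (it is the standard name for P(Disease|Test Positive), though the handout does not use the term), and the inputs are the prevalence, sensitivity, and specificity from the problem.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(Disease | Test Positive) from Bayes' theorem."""
    true_pos = prevalence * sensitivity                # P(Disease & Test Positive)
    false_pos = (1 - prevalence) * (1 - specificity)   # P(No Disease & Test Positive)
    return true_pos / (true_pos + false_pos)


ppv = positive_predictive_value(prevalence=0.04, sensitivity=0.93, specificity=0.81)
print(round(ppv, 2))  # 0.17
```

Changing only the `prevalence` argument shows how strongly the result depends on how common the disease is.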
In-class Problem: Repeat the calculation when the prevalence is 2%. How does your answer change
when the prevalence is 20%?
Homework Problems
1. Produce a graph where P(Disease|Test Positive) is on the vertical axis and prevalence is on the
horizontal axis for a diagnostic test that correctly identifies patients who in fact have the disease in
93% of the cases and correctly labels healthy patients as healthy in 81% of the cases. What does this
graph tell you about random testing for a relatively uncommon disease?
2. You work at a diagnostic lab. Last year, the lab was asked by a local clinic to perform a certain
diagnostic medical test 1347 times. If the test of a patient came back positive, the clinic treated the
patient for the disease. You wish to determine the number of patients that were treated
unnecessarily based on the testing done at your lab. What information would you need to
determine this number and how would you go about calculating this number?
3. There are often multiple diagnostic medical tests available to test whether a patient has a certain
disease. For instance, to test asymptomatic men for the presence of Chlamydia trachomatis, a
sexually transmitted organism that can cause serious problems in women, including infertility,
various tests have been developed that detect the presence of C. trachomatis in urine. Chernesky et
al. (1999) compared the diagnostic performance of three rapid tests and LE dipstick assay and found
the following performance:
Rapid test                      % Sensitivity   % Specificity
Testpack Chlamydia (Abbott)     70.9            95.5
Surecell Chlamydia (Kodak)      62.9            100
Clearview Chlamydia (Unipath)   67.7            95.5
LE dipstick (Boehringer)        87.5            92.4
(a) Which test has the highest sensitivity, which the highest specificity?
(b) Suppose the prevalence in the population that is tested is 18% and 1000 patients are being
tested. For each test, determine the number of patients that are missed and the number of patients
that are unnecessarily treated. Determine the probability that a positive test is a true positive and
that a negative test is a true negative. How do your answers change when the prevalence is only
2%?
4. Assume that there is a trade-off between sensitivity and specificity: a test with a higher sensitivity
has a lower specificity and vice versa. If your goal is to minimize the number of patients who are
treated unnecessarily, should you maximize sensitivity or specificity? What about if your goal is to
minimize the number of patients that are missed by the test?
5. Pregnancy Test: The Clearblue® Easy pregnancy test lists the “% of samples from pregnant women giving
a pregnant result” as a function of “Number of days before the expected period”:
Number of days before     % of samples from pregnant women
the expected period       giving a pregnant result
-1                        95%
-2                        90%
-3                        82%
-4                        51%
(a) Which of the two quantities, sensitivity and specificity, can you calculate from the data provided?
Perform the calculations.
(b) If a pregnant woman uses the test four days before the expected period, what is the likelihood
that she will have at least one positive test result if she takes two pregnancy tests?
References
Chernesky, M., D. Jang, J. Krepel, J. Sellors, and J. Mahony. 1999. Journal of Clinical Microbiology, Sept.
1999, p. 2777-2780.
Goldstein, B.J. and A.I. Mushlin. 1987. Use of a single thyroxine test to evaluate ambulatory medical
patients for suspected hypothyroidism. Journal of General Internal Medicine 2: 20-24.