Studies and Systematic Reviews of Diagnostic Tests

Welcome Back From Lunch
Thursday Afternoon
1:30-2:15 Studies and Systematic Review of
Diagnostic Test Accuracy (Tom)
2:15-3:00 Prognostic and Genetic Tests
(Mark)
3:00-3:45 Combining Tests (Michael)
3:45-4:00 Break
4:00-6:00 Small Groups
6:00 Meet in 6702 to head to Giants game
Studies of Diagnostic Test
Accuracy
Checklist




Was there an independent, blind
comparison with a reference (“gold”)
standard of diagnosis?
Was the diagnostic test evaluated in an
appropriate spectrum of patients (like those
in whom we would use it in practice)?
Was the reference standard applied
regardless of the diagnostic test result?
Was the test (or cluster of tests) validated
in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY:
Churchill Livingstone), 2000. p 68
Beyond the Checklist

Consider not only possibility of bias, but
WHY it may occur and DIRECTION it
would affect results




Incorporation bias
Spectrum bias
Verification bias
Double gold standard bias
Incorporation Bias


When the test itself can be incorporated
into the gold standard
Prevented by blinding
Example: Study of BNP as a test for
congestive heart failure (CHF)*
Gold standard: determination of CHF by two
cardiologists blinded to BNP
 “The best clinical predictor of congestive
heart failure was an increased heart size on
chest roentgenogram (accuracy, 81 percent)”
 Is there a problem with assessing accuracy
of chest x-rays to diagnose CHF in this
study?
*Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et
al. Rapid measurement of B-type natriuretic peptide in the emergency
diagnosis of heart failure. N Engl J Med 2002;347(3):161-7. Problem 4.3
Incorporation bias



Cardiologists not blinded to Chest X-ray
Used (incorporated) Chest x-ray for CHF
diagnosis
Incorporation bias for assessment of
Chest X-ray, not BNP
Spectrum of Disease and
Nondisease


Disease is often easier to diagnose if
severe
“Nondisease” is easier to diagnose if
patient is well than if the patient has
other diseases
Spectrum Bias



Sensitivity depends on the spectrum of
disease in the population being tested.
Specificity depends on the spectrum of
non-disease in the population being
tested.
Example: Absence of Nasal Bone (on 13-week ultrasound) as a Test for
Chromosomal Abnormality
Spectrum Bias Example: Absence of Nasal
Bone as a Test for Trisomy 21*
Nasal Bone Absent    D+      D-      LR
Yes                  229     129     27.8
No                   104     5094    0.32
Total                333     5223
Sensitivity = 229/333 = 69%
Specificity = 5094/5223 = 97.5%
BUT the D- group only included chromosomally
normal fetuses
Cicero et al., Ultrasound Obstet Gynecol 2004; 23: 218-23
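The figures in this table can be re-derived from the four cell counts. A minimal sketch; the function name and argument names are illustrative only and are not from the study:

```python
def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity, specificity, LR+ and LR- from the cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)          # positives among D+
    specificity = tn / (tn + fp)          # negatives among D-
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }

# Cell counts from the nasal bone table (D- = chromosomally normal fetuses only).
nasal_bone = accuracy_from_2x2(tp=229, fp=129, fn=104, tn=5094)
for name, value in nasal_bone.items():
    print(f"{name}: {value:.3f}")
```

The likelihood ratios here inherit the same restriction as the specificity: they describe a D- group with no other chromosomal abnormalities.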
Spectrum Bias: Absence of Nasal
Bone as a Test for Chromosomal
Abnormality



D- group excluded 295 fetuses with other
chromosomal abnormalities (esp. Trisomy 18)
Among these fetuses, 32% had absent nasal
bone (not 2.5%)
What decision is this test supposed to help
with?

If it is whether to test chromosomes using
chorionic villus sampling or amniocentesis, these
295 fetuses should be included in D+ group!
Spectrum Bias:
Absence of Nasal Bone as a Test for
Chromosomal Abnormality, effect of including
other trisomies in D+ group
Nasal Bone Absent    D+                 D-      LR
Yes                  229 + 95 = 324     129     20.4
No                   104 + 200 = 304    5094    0.50
Total                333 + 295 = 628    5223
Sensitivity = 324/628 = 52%
NOT 69% obtained when the D+ group only included
fetuses with Trisomy 21
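The drop in sensitivity follows directly from pooling the two D+ groups. A small sketch using the counts on these slides; the variable names are illustrative:

```python
def sensitivity(tp, fn):
    """Fraction of diseased subjects with a positive test."""
    return tp / (tp + fn)

# Trisomy 21 only.
t21_absent, t21_present = 229, 104
# Other chromosomal abnormalities: 95 of 295 had absent nasal bone.
other_absent, other_present = 95, 200

sens_t21_only = sensitivity(t21_absent, t21_present)
sens_all_abnormal = sensitivity(t21_absent + other_absent,
                                t21_present + other_present)
frac_other_absent = other_absent / (other_absent + other_present)

print(f"Trisomy 21 only:             {sens_t21_only:.0%}")
print(f"All chromosomal abnormality: {sens_all_abnormal:.0%}")
print(f"Absent nasal bone, other abnormalities: {frac_other_absent:.0%}")
```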
Verification bias: Example

Visual assessment of jaundice in
newborns


Study patients who are getting a
bilirubin measurement
Ask clinicians to estimate extent of
jaundice at time of blood draw
Visual Assessment of jaundice*:
Results


Sensitivity of jaundice below the nipple line
for bilirubin ≥ 12 mg/dL = 97%
Specificity = 19%
Editor’s Note: The take-home message for me is that no
jaundice below the nipple line equals no bilirubin test, unless
there’s some other indication.
--Catherine D. DeAngelis, MD

What is the problem?
*Moyer et al., Archives Pediatr Adol Med 2000; 154:391
Verification Bias*

Inclusion criterion for study: gold standard
test was done


Subjects with positive index tests are more
likely to get the gold standard and to be
included in the study


in this case, blood test for bilirubin
clinicians usually don’t order blood test for
bilirubin if there is little or no jaundice
How does this affect sensitivity and
specificity?
*AKA Work-up, Referral Bias, or Ascertainment Bias
Verification Bias Effects
                            TSB >12   TSB < 12
Jaundice below nipple          a         b
No jaundice below nipple       c         d
Sensitivity, a/(a+c), is biased ___.
Specificity, d/(b+d), is biased ___.
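One way to see the direction of verification bias is to start from a full cohort and keep only the subjects who get the gold standard. A sketch under stated assumptions: every count and both verification fractions below are hypothetical, chosen for illustration, and are not taken from Moyer et al.

```python
def sens_spec(a, b, c, d):
    """Sensitivity a/(a+c) and specificity d/(b+d) for the 2x2 layout above."""
    return a / (a + c), d / (b + d)

# Hypothetical full cohort, as if every newborn had a bilirubin measured.
full = {"a": 90, "b": 300, "c": 10, "d": 600}

# Hypothetical verification fractions: all index-positive newborns get the
# blood test, but only 10% of index-negative newborns do.
verify_pos, verify_neg = 1.0, 0.10

# Expected cell counts among subjects who actually enter the study.
observed = {
    "a": full["a"] * verify_pos,
    "b": full["b"] * verify_pos,
    "c": full["c"] * verify_neg,
    "d": full["d"] * verify_neg,
}

true_sens, true_spec = sens_spec(**full)
obs_sens, obs_spec = sens_spec(**observed)
print(f"Full cohort:   sensitivity {true_sens:.2f}, specificity {true_spec:.2f}")
print(f"Verified only: sensitivity {obs_sens:.2f}, specificity {obs_spec:.2f}")
```

Cells c and d shrink while a and b do not, so a/(a+c) moves up and d/(b+d) moves down.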
Double Gold Standard Bias-1

Two different “gold standards”
 One gold standard (e.g., surgery, invasive
test) is more likely to be applied in
patients with positive index test
 Other gold standard (e.g., clinical follow-up)
is more likely to be applied in patients
with a negative index test.
Double Gold Standard Bias-2

There are some patients in whom the two
“gold standards” do not give the same
answer


Spontaneously resolving disease (positive with
immediate invasive test, but not with follow-up)
Newly occurring or newly detectable disease
(positive with follow-up but not with immediate
invasive test)
Double Gold Standard Bias, example





Study Population: All patients presenting to the ED
who received a V/Q scan
Test: V/Q Scan
Disease: Pulmonary embolism (PE)
Gold Standards:
 1. Pulmonary arteriogram (PA-gram) if done (more
likely with more abnormal V/Q scan)
 2. Clinical follow-up in other patients (more likely
with normal V/Q scan)
What happens if some PE resolve spontaneously?
*PIOPED. JAMA 1990;263(20):2753-9.
Effect of Double Gold Standard Bias 1:
Spontaneously resolving disease




Test result will always agree with gold standard
Both sensitivity and specificity increase
Example: Joe has a small pulmonary embolus (PE)
that will resolve spontaneously.
 If his VQ scan is positive, he will get an
angiogram that shows the PE (true positive)
 If his VQ scan is negative, his PE will resolve and
we will think he never had one (true negative)
VQ scan can’t be wrong!
Effect of Double Gold Standard Bias 2:
Newly occurring or newly detectable disease




Test result will always disagree with gold standard
Both sensitivity and specificity decrease
Example: Jane has or will soon get a nasty breast
cancer that is currently undetectable
 If her mammogram is positive, she will get
biopsies that will not find the tumor (mammogram
will look falsely positive)
 If her mammogram is negative, she will return in
several months and we will think the tumor was
initially missed (mammogram will look falsely
negative)
Mammogram can’t be right!
Effect of Double Gold
Standard Bias

Spontaneously
resolving disease


Sensitivity falsely
increased
Specificity falsely
increased

Newly occurring or
newly detectable
disease


Sensitivity falsely
decreased
Specificity falsely
decreased
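Both directions can be checked with a toy cohort. A sketch under stated assumptions: all counts are hypothetical, and the comparison is against applying the immediate invasive test to every patient.

```python
def sens_spec(tp, fn, fp, tn):
    return tp / (tp + fn), tn / (tn + fp)

def add(table, **extra):
    """Return a copy of the 2x2 table with extra patients added to cells."""
    merged = dict(table)
    for cell, n in extra.items():
        merged[cell] += n
    return merged

# Hypothetical patients on whom both gold standards agree.
base = {"tp": 80, "fn": 20, "fp": 50, "tn": 850}
# Hypothetical patients on whom they disagree, split by index test result.
test_pos, test_neg = 20, 20

# Spontaneously resolving disease (present now, gone at follow-up).
# Invasive test for all: test-positives are TP, test-negatives are FN.
resolving_single = sens_spec(**add(base, tp=test_pos, fn=test_neg))
# Double gold standard: test-negatives go to follow-up and look like TN.
resolving_double = sens_spec(**add(base, tp=test_pos, tn=test_neg))

# Newly detectable disease (not found now, found at follow-up).
# Invasive test for all: test-positives are FP, test-negatives are TN.
new_single = sens_spec(**add(base, fp=test_pos, tn=test_neg))
# Double gold standard: test-negatives go to follow-up and look like FN.
new_double = sens_spec(**add(base, fp=test_pos, fn=test_neg))

for label, pair in [("resolving, single", resolving_single),
                    ("resolving, double", resolving_double),
                    ("new, single", new_single),
                    ("new, double", new_double)]:
    print(f"{label}: sensitivity {pair[0]:.3f}, specificity {pair[1]:.3f}")
```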
Bias                  Description                                           Sensitivity is falsely …   Specificity is falsely …
Incorporation         Gold standard incorporates index test.
Spectrum              D+ only includes “sickest of the sick”
                      D- only includes “wellest of the well”
Verification          Positive index test makes gold standard more likely.
Double Gold Standard  Disease resolves spontaneously
                      Disease becomes detectable during follow-up
Systematic Reviews of
Diagnostic Accuracy Studies
Meta-analyses of Diagnostic
Tests






Systematic and reproducible approach to finding
studies
Summary of results of each study
Investigation into heterogeneity
Summary estimate of results, if appropriate
Unlike other meta-analyses (risk factors,
treatments), results aren’t summarized with a
single number (e.g., RR), but with two related
numbers (sensitivity and specificity)
These can be plotted on an ROC plane
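On the ROC plane each study contributes one point, with 1 - specificity on the x-axis and sensitivity on the y-axis. A minimal sketch; the three studies and their values are hypothetical and do not come from any review cited here:

```python
# Hypothetical (sensitivity, specificity) pairs, one per study.
studies = {
    "Study A": (0.90, 0.60),
    "Study B": (0.80, 0.75),
    "Study C": (0.65, 0.90),
}

def roc_point(sensitivity, specificity):
    """Map one study to ROC-plane coordinates (x, y)."""
    return (1 - specificity, sensitivity)

roc_points = {name: roc_point(*pair) for name, pair in studies.items()}
for name, (x, y) in roc_points.items():
    print(f"{name}: 1 - specificity = {x:.2f}, sensitivity = {y:.2f}")
```

Points scattered along a curve suggest studies using different thresholds; points scattered off any single curve suggest other sources of heterogeneity.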
MRI for the diagnosis of MS
Whiting et al. BMJ 2006;332:875-84
Dermoscopy vs Naked Eye for Diagnosis of
Malignant Melanoma
Br J Dermatol. 2008 Sep;159(3):669-76
Example: A clinical decision rule to identify
children at low risk for appendicitis (Problem
5.6)


Study design: prospective cohort study
Subjects





4140 patients 3-18 years presenting to Boston
Children’s Hospital ED with abdominal pain
Of these, 767 (19%) received surgical consultation
for possible appendicitis
113 excluded (chronic diseases, recent imaging)
53 missed
601 included in the study (425 in derivation set)
Kharbanda et al. Pediatrics 2005; 116(3): 709-16
A clinical decision rule to identify children at low
risk for appendicitis

Predictor variable



Standardized assessment by pediatric ED attending
Focus on “Pain with percussion, hopping or cough”
(complete data in N=381)
Outcome variable:


Pathologic diagnosis of appendicitis (or not) for
those who received surgery (37%)
Follow-up telephone call to family or pediatrician
2-4 weeks after the ED visit for those who did not
receive surgery (63%)
A clinical decision rule to identify children at low
risk for appendicitis


Results: Pain with percussion, hopping or
cough
78% sensitivity and 83% NPV seem low to
me. Are they valid for me in deciding whom
to image?
Checklist




Was there an independent, blind
comparison with a reference (“gold”)
standard of diagnosis?
Was the diagnostic test evaluated in an
appropriate spectrum of patients (like those
in whom we would use it in practice)?
Was the reference standard applied
regardless of the diagnostic test result?
Was the test (or cluster of tests) validated
in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed.
(NY: Churchill Livingstone), 2000. p 68
In what direction would these
biases affect results?




Sample not representative (population
referred to pedi surgery)?
Verification bias?
Double-gold standard bias?
Spectrum bias?
For children presenting with
abdominal pain to SFGH 6-M

Sensitivity probably valid (not
falsely low)





But whether all of them tried to hop is not
clear
Specificity probably low
PPV is too high
NPV is too low
Does not address surgical consultation
decision
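The PPV and NPV statements depend on prevalence: predictive values from a population referred for surgical consultation do not carry over to all children with abdominal pain if appendicitis is less common among them. A sketch under stated assumptions: sensitivity is the 78% from the slide, while the specificity and both prevalences are hypothetical values chosen only to show the direction of the effect.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from Bayes' rule."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

SENSITIVITY = 0.78            # from the slide
SPECIFICITY = 0.80            # hypothetical
PREVALENCE_REFERRED = 0.30    # hypothetical: surgical consultation group
PREVALENCE_ALL_PAIN = 0.05    # hypothetical: all ED abdominal pain

ppv_ref, npv_ref = predictive_values(SENSITIVITY, SPECIFICITY, PREVALENCE_REFERRED)
ppv_all, npv_all = predictive_values(SENSITIVITY, SPECIFICITY, PREVALENCE_ALL_PAIN)
print(f"Referred population: PPV {ppv_ref:.2f}, NPV {npv_ref:.2f}")
print(f"All abdominal pain:  PPV {ppv_all:.2f}, NPV {npv_all:.2f}")
```

With the same sensitivity and specificity, the lower-prevalence population has a lower PPV and a higher NPV than the referred one.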
Prognostic and Genetic Tests
(Mark)