Digital Mammography Computer-aided Diagnosis Multiple-reader

advertisement
Digital Mammography,
Computer-aided
Diagnosis,
Multiple-reader studies and the Holy Grail of
Imaging Physics
RF Wagner, SV Beiden (OST, CDRH, FDA)
Gregory Campbell (OSB, CDRH, FDA)
RM Gagne, RJ Jennings (OST, CDRH, FDA)
CE Metz, Y Jiang (Radiology Research, U of Chicago)
H-P Chan (Radiology Research, U of Michigan)
OUTLINE
I)
A Preamble: What we mean by the metaphor
of the Holy Grail of Imaging Physics
II)
Some Fundamentals
of the Imaging Physics
III)
Some Fundamentals
of Scorekeeping for
Diagnostic Imaging in Clinical Studies
-- Why ROC & Multiple-Reader
ROC Studies
(from Public Panel Meeting on Digital Mammo)
IV)
The Next Step: How (Some) Systems for
Computer-Aided
Diagnosis
may help us toward our goals
PREAMBLE
Two Cultures
A:
Physical laboratory-elaborate
physical measurements
on performance of diagnostic imaging systems
-- and models of observers
& lesion detection
B:
The Clinical setting - measurements
of diagnostic
performance of imaging systems on patients
i.e., measurements
on real observers
& real patients
The "Holy Grail"
(as the metaphor
is used in these communities)
To predict the clinical ranking of imaging systems
from the ranking obtained in physical laboratory
To use the info from A to reduce the burden of B.
World's shortest course on
physical
image quality
Image quality depends
Contrast, Resolution,
and the Task
on
Noise
e
More photons or quanta allow for finer tasks Physical measures:
generalizedconcept
of "quanta"
Family of Required Physical Measurements
Physical composition of
input background & "lesions"
X-ray source distribution
X-ray spectral
content
Input exposure
Spectral contrast
Gray-scale
Scatter-rejection
Modulation
transfer
transfer
mechanism
Transfer Function
Noise Power Spectrum
(MTF)
Summary of Signal Detection Theory &
ICRU Report #54: The Assessment of Image Quality
Detectability SNR in additive Gaussian noise:
A = separation of class means;
X = covariance matrix of noise
d2=At
Y.dA
Fourier domain measurements
in medical imaging:
G 2 MTF2(f)
NEQ(f) =
NPS(f)
DQE(f) =
NEQ(f)/Q
For a discrimination
d 2=
Contemporary
task characterized
by AS(f):
J [AS(f) l2NEQ(f) df
studies (non-stationary
d z = At _.1 A
... in spatial domain
statistics):
"Contrast-detail"
,
.:"°- "NQ"_
o
•
.
• o,..._,_,m
o
,
.
curves
;
!
(required
Contrast for
vs diameter
detection)
.
Symbols = Observed
°-°'
.
,
,
_
,_
°
,
Lines = predicted from
Laboratory measurements
LO
0.1
Diameter
Exposure
........
|,_Q
Latitude
i
(ram)
- Conventional
........
lm*ging
i
_,___,
°"'
System
.......
Contrast vs exposure
(conventional
mammo)
"
PIII
II
0,01
10
........
ts
10
........
t6
!0
........
10
X_ay Quantum Fluence (_Ummz)
Exposure
........
1.00
Lautitude - Optimized
,
........
Imaging
'
System
_4.10
E
(optimized
o
_
O,Ol
104
........
digital system)
Contrast vs exposure
_
|05
.......
i
|0_
X_ruy Qu*mtum Flutuce {#1ram_)
......
lO 7
A slightly longer course on
Clinical scorekeeping
in diagnostic radiology:
I) The ROC Paradigm
II) Two Classical Papers on
Variability in Mammography
III) Multiple-reader
studies what's involved
IV) Contem vorary example - digital mammography
1)
ACTUALLY
The ROC Paradigm
w,,ll
NEGATIVE
t CASES
OISTRIBUTION
FOR
ONE POSSIBLE
THRESHOLD
DECISION
ACTUALLY POSITIVE
CASES
DISTRIBUTION FOR
I
-_,
FNF
DECISION
[
AXIS
>
TEST
RESULT
VALOE.
O. SUS ECT,VE
]
[_JUDGEMENT
OF LIKELIHOOD
t.O
'
,
'
'
I
Z
THAT
,
CASE
'
'
IS
"
POSITIVEJ
i
<___L _
O+
u_+
Q_<_
Q
,,/:,,._
_o..o_^
" THRES.o_o
_/
>,= o..,___o-_o
_"
CONVENTIONAL
[& ROC
CURVE I
LU Q:_ IF-.
Q_
t-"
0 .C0 .0
,
,
f
FALSE
[FPF
Fig.
possible
A typical
operating
I
POSITIVE
,
f
I
1.0
FRACTION
OR P(T+IO-)]
conventional
points.
0.5I . ,
ROC
curve,
showing
three
2)
Classic
(a)
papers on Variability in Mammography
Elmore et al. Study of 10 Radiologists
(i,_.)
True Positive Fraction vs False Positive Fraction
Based on Recommendation
for Biopsy
(123 noncancers/27
proven cancers)
D'Orsi & Swets' Plotted in ROC Space
True Negative
( _qq,_-)
Proportion
1.0 0.9 0.8 0_7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
1.0
,
,
,
,
,
,
,
v
,
0.0
¢-
0.9'
•0.1
0.8-
-0.2
0
_
0
Q-
c0
°_
0.7-
o
0.6nL_
_) 0.5.
0.3 o
0.4
•
Q.
0
_13_
(I)
.0.5 o_>
:=_
-_
m
_3)
Z
O 0.4'
fla) 0.3-
-0.6
t-- 0.2-
-0.8 u_
-0.7
0.1
.0.9
0.0
.......
1.0
0.0 0.1 0.2 0.3 0.4 015 0.6 0.7 0.8 019 1.0
False
Positive
Proportion
RAtnOLOOtST
pA_N'rt
ANT
wtu¢ CAt'_'f.g (N _ 2"/)_
|MME,_A'I'_
WOglKU_*
'Ot.TIgAo
atoes¥
_RINO
, _
PA_ENTt W'tTHOOTCAN_'__, O'l " 123)
ADOCTIONAL
VIEWS
percentage
ANY
|MMEtMAT
WO_KUI._
of patients
Ig
Itt¢_¥
ULTtRA-.
Ag)O_TIO_A|
_Ut_u
v_¢I
J¢
A
96
B2 :
Il
41
64
20 i
8
$3
B
96
7o
i!
$2
40
7
II
34
C
D
93
93
63
78
19
iI
67
22
65
57
9
t3
24
15
55
42
E
F
G
H
1
89
85
85.
81
78
70
48
67
$2
48
!5
19
7
t5
4
26
70
30
37
41
41
45
44
38
30
t4
_
.'
l0
"
14
II
lI
t4
7
24
43
38
25
25
j
74
133
7
37
lI
3
5
6
*tw
*Any imng, dlJtc woAup wit defined u • t.c._'mxgndat_
r J blowy.
Elmore et at.
to obtain **dditlontl nutmmogra_L¢
"dcw't. ullrtsound t_w_ct
(b)
Beam, Layde, Sullivan 1996
Study of 108 rand omly selected Radiologists
Sensitivity vs _Specificity
Based on Recommendation
for Biopsy
(45 cancers, 19 normal, 15 benign).
100 _
uu "
" _,
°
,¢b
_A
t
qJ.
o 8*
,0. *
¢b
41.
'W, q_tW
*
O
q_
o_.
OqUW
o
0
.8-
*¢p
9
qt.
41.4'
¢*
4w
_
o
o
.8- o
Mb*
¢. ¢.
41" _g*Ct*
q*
9
qP
q*
7U
o
0,0
r_
50
"o
311
211"
IO"
$
!
|
|
I
i
I
i
l
|
10
20
30
40
50
60
'70
80
90
100
Specificity
%
Figure2. Scattecplot
of_
vssi_cirgfty_,mong
ffTe108US
r_oTotog_ts_o. p_tic_ted in thestudy.
Beam later showed
that
this is not consistent with a single ROC curve
and finite sampling statistics
Cxl
"x3 "
Two Correlated Diagnostic Modalities
J v-,,l
0
g"_
,.Q
Ca
o
k
NonCa
I
:
I
"
of m_gnancy
/ mod_
1
3) The Multiple-Reader,
Multiple-Case
(MRMC) ROC Paradigm:
Every reader reads every case
(& where practical) in both modalities
Can then model and account for the following
multivariate Roc. Accuracy Parameters:
Variance due to case sampling
Variance due to reader sampling
Correlation of case variation across modalities
Correlation
of reader variation
Within reader variability,
across modalities
or reader "jitter"
(Most models involve additional
parameters)
Using Multiple Readers, Multiple Cases
& Elaborate MRMC Software (refs. at end)
We are interested in assessing mean performance
and the uncertainty (confidence intervals, C.I.)
in estimates of the difference in performance over
Multiple (#) Readers & Multiple (#) Cases
The uncertainty of a diff. in average ROC parameters
(between two modalities) =
[Var (C) x (1 - pc)l/(# Cases)
+ Var (Btw R) x (1 - pr )/(# Readers)
+ e/(# Cases x # Readers)
(i.e.)
(Uncorrelated
portion of Case Var)/(#
Cases)
+
(Uncorrelated
portion of Reader Var)/(#
Readers)
+
(Within reader jitter)/(#
Cases x # Readers)
4) Sponsor's
Multiple-Reader,
Multiple-Case
Study
Cases:
625 women (one or both breasts)
997 breasts
44 breasts with cancer (no known bilateral Ca)
Readers:
5 MQSA-qualified
radiologists
Modalities:
All cases imaged with both modalities
All 5 readers, read all images from both modalities:
Readings of Digital & Analog separated by 30 days Balanced Reading (half of cases:Digital first; &v.v.)
Individual
Reader Results
Mean (over 5 readers) ROC Area SFM (Analog): 0.77
Mean(over 5 readers) ROC Area FFDM (Digital): 0.76
Mean 95% C.I. about the difference of 0.01:+/-0.11
Multiple-Reader
Analysis
(Generalizable to a Pop. of Readers & a Pop. of Cases)
Mean 95% C.I. about the difference of 0.01:+/-0.064
Implications
of the present Variance Structure
for a post-approval
assumed
trial
Drawn from a screening population to have same sampling properties as current
(a)
with goal of 95% C.I. ~ +/- 0.05
44 cancers, 10 readers
50 cancers, 6 or 7 readers
59 cancers, 5 readers
(b) with goal of 95% C.I. ~ +/- 0.03
78 cancers, 100 readers
100 cancers, 20 readers
... and screening yields about 5 cancers/1000
screened
The Clinical Studies
Hendrick RE, Lewin JM, D'Orsi CJ, Kopans DM, Conant
E, Cutter GR, Sitzler A.
Non-inferiority study of FFDM in an enriched diagnostic
cohort: Comparison with screen-film mammography in
625 women.
Proc. of the 5th International Workshop on Digital
Mammography, Toronto CA (Medical Physics Publ. Co.,
Madison WI, 2001).
Lewin JM, Hendrick RE, D'Orsi CJ, Isaacs P, Moss L,
Karellas A, Sisney GA, Kuni CK, Cutter GR.
Interim clinical evaluation of FFDM in a screening cohort:
Comparison with screen-film mammography in 4,965
exams.
Radiology 2001; 218 (in press - March issue).
We gratefully acknowledge
R. Edward Hendrick, Radiology Department,
& General Electric Medical Systems
for permission to present these results
prior to publication
NWU
Formal concepts
underlying
& required
Analysis
the previous
analysis --
to take the next step:
of computer-aided
diagnosis
Components
of Variability
(most general linear model for the present problem)
Components
correlated
across modalities:
c - range of case difficulty & finite-sample
r
- range of variability
effect
of reader skill
r x c - dependence of case difficulty on reader
(or vice versa: of reader skill on case)
Components
uncorrelated
m xc
-
mxr
-
m x(r xc)
across modalities:
"interaction"
II
-
"
Variance
components,..
0.0020
,
,
,
,
r
re
0.0015
0
o0
0.0010
0
._
0.0005
0.0000
0
e
Analysis of Clinical Study of
Jiang, Nishikawa, Schmidt, Metz, Giger, Doi
Discriminating 46 Ca vs 58 Bn _tcalc dust/10 readers
(Acad Radio 6, 1999)
Model B
Study #1 (Jiang et al.)
28
u.uuzu
Variance
'
components
,
& splitting
,
0.0015
0
o
0.0010
0
._
0.0005
>
0.0000
0
,
c
r
re
Analysis of Clinical Study of
Jiang, Nishikawa, Schmidt, Metz, Giger, Doi
Discriminating 46 Ca vs 58 Bn Bcalc clust/10 readers
(Acad Radio 6, 1999)
Model B
Study # 1 (Jiang et al.)
28
Toward the Holy Grail
If mature systems for computer-aided
diagnosis
can greatly reduce reader components
of variance
•.. this will leave only the case component...
and this carries the physics/technology
One possible scenario at that point:
Limit clinical studies to special cohorts
who will challenge/stress
physically
determined
differences
the
between
systems
"Field Test"
A study that attempts to sample the modality
under completely
realistic conditions
- attempts to reduce any sources of bias
"Stress Test"
A study that samples the application
under conditions
of the modality
that challenge
one or another features of the system
When setting up a Stress Test
of Modality A vs Modality B How to generate an "enriched"
(i.e., a set with challenging
image set?
images)
Caveat:
Selecting images using Modality
A will:
Favorably
bias Sensitivity
of Modality
A;
Favorably
bias Specificity
of Modality
B;
Possibly indeterminate
Alternative
solutions
bias on ROC curve itself
to the enrichment
must be found
problem
Principal
International
Commission
references
for this talk
for Radiation Units and Measurements.
Medical Imaging: The Assessment of Image Quality. ICRU Report
No. 54 (International Commission for Radiation Units and
Measurements,
Inc., Bethesda MD, 1996).
SV Beiden, RF Wagner, G Campbell. Components-of-variance
models and multiple-bootstrap
experiments: An alternative method
for random-effects,
receiver operating characteristic analysis. Acad.
Radiol. 2000; 7: 341-349.
RF Wagner, SV Beiden, G Campbell. Multiple-reader
studies, digital
mammography,
computer-aided
diagnosis--and
the Holy Grail of
imaging physics (I). Proc of the SPIE - Medical Imaging 2001; v 4320,
no. 200 (in press).
SV Beiden, RF Wagner, G Campbell, CE Metz, Y Jiang, H-P Chan.
Multiple-reader
studies, digital mammography,
computer-aided
diagnosis--and
the Holy Grail of imaging physics (II). Proc of the
SPIE - Medical Imaging 2001; v 4320, no. 201 (in press).
Other Key Sources
Dorfman DD, Berbaum KS, Metz CE. Receiver operating
characteristic rating analysis: generalization to the population of
readers and patients with the jackknife method. Invest Radiol 1992;
27: 723-731.
Elmore JG, Wells CK, Lee C, Howard DH, Feinstein AR. Variability
in radiologists' interpretations of mammograms. New England
Journal of Medicine 1994; 331: 1493-1499.
D'Orsi CJ and Swets JA. Letter to the Editor: Variability in the
interpretation of mammograms. New England Journal of Medicine
1995; 332: 1172.
Beam C, Layde PM, Sullivan DC. Variability in the interpretation of
screening mammograms by US radiologists. Arch Intern Med 1996;
156: 209-213.
Figure Credits
"More photons or quanta allow for finer tasks - " (six panels of photographs
made using increasing numbers of photons); from Vision: Human and
Electronic by Albert Rose, p 38: Copyright 1973, Plenum Press (New York and
London).
"Contrast-detail" curves; adapted from RM Gagne, H Jafroudi, RJ Jennings, et al., "Digital
mammography using storage phosphor plates and a computer-designed x-ray system,"
Digital Mammography '96: Proc. of the 3rd International Workshop on Digital
Mammography. K Doi, ML Giger, RM Nishikawa, RA Schmidt Eds. Elsevier
(Amsterdam, 1996).
"The ROC Paradigm" from CE Metz, "Basic principles of ROC analysis,"
Seminars in Nuclear Medicine 1978; 8: 283-298.
"Table of 10 Radiologists' recommendations for biopsy" from JG Elmore, CK Wells, CH
Lee et al, "Variability in radiologists' interpretations of mammograms." New England
Journal of Medicine 1994; 331: 1493-99. Copyright 1994, Massachusetts Medical Society.
All rights reserved.
"D'Orsi & Swets plotted in ROC Space" from CJ D'Orsi and JA Swets, "Letter to the
Editor: Variability in the interpreation of mamograms." New England Journal of
Medicine 1995; 332: 1172. Copyright 1995, Massachusetts Medical Society. All rights
reserved.
"Study of 108 randomly selected Radiologists" from C Beam, PM Layde, DC Sullivan,
"Variability in the interpretation of screening mammograms by US radiologists." Arch
Intern Med 1996;156: 209-213. Copyrighted 1996, American Medical Association.
"Variance components & splitting" from SV Beiden, RF Wagner, G Campbell, CE Metz,
Y Jiang, "Components-of-variance models for random-effects ROC analysis: The case of
unequal variance structures across modalities." Acad Radiol (in press, July 2001).
Academic Radiology is the official journal of the Association of University Radiologists
(copyright holder), the Society of Chairmen of Academic Radiology Departments, the
Association of Program Directors in Radiology, and the American Association of
Academic Chief Residents in Radiology.
Download