Virtual Clinical Trial

advertisement
Virtual Clinical Trial
1
ive Fra
True posit
ction
0.8
0.6
System A
System B
System C
0.4
✤
What is the relevant task?
Detect tumors, detect and locate tumors, estimate parameters!
✤
What objects will be imaged?
The patient population and the statistics that govern it.!
✤
What is the imaging system(s)?
The physics and statistics of the system which determine how an
object is mapped to image data.!
✤
How will the information be extracted?
The observer.!
✤
What measure of performance will be used?
The figure of merit
0.2
0
0
0.2
0.4
0.8
0.6
1
ction
sitive Fra
False po
Observer Models and
Realism Testing
Matthew Kupinski
July 22, 2014
Virtual Clinical Trial
✤
What is the relevant task?
Detect tumors, detect and locate tumors, estimate parameters!
✤
What objects will be imaged?
The patient population and the statistics that govern it.!
✤
What is the imaging system(s)?
The physics and statistics of the system which determine how an
object is mapped to image data.!
✤
How will the information be extracted?
The observer.!
✤
What measure of performance will be used?
The figure of merit
Observers
✤
Decision-making strategy!
✤
Options
-Human
-Model
-Anthropomorphic model
-Computer-aided diagnosis (CAD)
Observer review
Classification
Signal-absent images
Classification
✤
t = T (g)
Estimation
✤
✤
=
(g)
Signal-present images
Combination
t = T (g)
= (g)
ROC analysis
Hotelling observer
t = w† g
w = Kg 1 g
✤
Is the ideal observer when the image data are Gaussian
distributed!
✤
Maximizes the SNR among all linear observers!
✤
Requires knowledge of the first- and second-order statistics!
✤
A linear template applied to the image
Ideal linear observers
Channelized Hotelling observer
v = Tg
t = ⇠† v
=t
Ideal linear observers
⇠ = K⇠
1
⇠
✤
Dimension of v is much smaller than g!
✤
Ideal linear observer on channel outputs!
✤
Channels can be chosen to mimic human visual system!
✤
Internal noise model is needed to match absolute performance
Limitations of Hotelling observers
=t
✤
Signal location is fixed!
✤
Performance is limited in the presence of signal-shape variability!
✤
Signal must often be very weak to achieve AUC < 1.0
Limitations of Hotelling observers
Combined tasks
D1
✤
Signal location is fixed!
✤
Performance is limited in the presence of signal-shape variability!
✤
Signal must often be very weak to achieve AUC < 1.0
Task is li
mited
D2
EROC
EROC
EROC Curve
Traditional ROC Curve
1
✤
TPF vs. FPF!
✤
AUC from 2AFC test!
✤
Ideal observer is likelihood
ratio
True−positive Fraction
0.8
0.7
✤
Expected utility for TPF vs. FPF!
✤
AEROC from 2AFC test!
✤
Ideal observer is known!
✤
Specializes to LROC for signal
localization tasks
0.6
0.5
0.4
0.3
Expected Utility for TPF
0.9
0.2
0.1
0
0
0.2
0.4
0.6
False−positive Fraction
0.8
1
0
0.2
0.4
0.6
False−positive Fraction
0.8
1
Scanning linear observer
Scanning linear observer
Estimating signal position
t(g, ) =
g( )† Kg 1 (g
t(g) = max t(g, )
g)
1
g( )† Kg
2
1
g( )
(g) = arg max t(g, )
✤
Maximizes area under the EROC curve when the image
statistics are Gaussian distributed and a delta utility function
is assumed!
✤
Non-linear observer due to the maximization steps
Scanning linear observer
= t(g, )
Anthropomorphic models?
Estimating signal position
= t(g, )
✤
Courtesy of E. Krupinski
Human scan pattern is difficult to model
Anthropomorphic models?
Problem
Observer realism
ObseHuman
rverscanispattern
to model
limisitdifficult
ed
Low
High
High
Low
✤
Task realism
Courtesy of E. Krupinski
Problem
Problem
Observer realism
Low
Observer realism
High
X
High
Low
Task realism
Low
High
X
High
Low
Task realism
Rank ordering
✤
Does absolute performance matter?!
✤
What matters is that the same decision is made using the model
observer!
✤
Rank ordering!
✤
May allow for simpler tasks and observers
Human performance
Observer Performance
Solution
Model performance
System parameter
Rank ordering
Ranking Systems
Observer Performance
Human performance
✤
A judge is a program producing the estimates of the
performance for a system (may include multiple readers)!
✤
Judge j produces a rank Rjs for system s !
✤
We want to know how consistent these rankings are
among the judges
Model performance
System parameter
Ranking Systems
R̄s =
J
1X
Rjs
J j=1
SSt = J
S
X
Ranking Systems
S
1X
S+1
R̄ =
R̄s =
S s=1
2
R̄s
R̄
✤
The Friedman statistic is usually used in hypothesis testing
where the null hypothesis is that the rankings of the judges
are uncorrelated!
✤
Under the null hypothesis the distribution of Q can be
approximated by a chi-square distribution with S - 1
degrees of freedom!
✤
A large value for Q indicates agreement among the judges
2
s=1
SSe =
J
S
XX
1
Rjs
J (S 1) j=1 s=1
R̄
The Friedman statistic is
2
=
S (S + 1)
12
SSt
Q=
SSe
Ranking Systems
Ranking Systems
1
✤
We want a large value for Q, but how large?!
0.9
0.8
The maximum value for Q is
Qmax = J (S
1)
Define a sample variance for each system
2
s
J
1X
=
Rjs
J j=1
R̄s
2
Correct Order Index
✤
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
2000
4000
6000
Training Samples
8000
10000
Task-Based Turing Test
Task-Based Turing Test
✤
Problem: Is the simulated data realistic enough?!
✤
Question: Realistic enough for what?!
✤
Answer: Realistic enough to give results for estimators in
simulation that match what we get with real data!
✤
We must quantify how well the results match
Task-Based Turing Test
✤
Consider parameter µ with estimator µest !
✤
µest has a PDF pr1(µest) with the simulated data!
✤
µest has a PDF pr2(µest) with the real data!
✤
Want to measure how well these PDFs match
Task-Based Turing Test
To measure the match between the two PDFs we will use the Wicoxon
statistic
Nr
1 X
W =
Wr
Nr r=1
N1 X
N2
1 X
Wr =
step [µest (g2j )
N1 N2 i=1 j=1
=
W is usually used to reject the null hypothesis that the two
distributions are the same!
✤
W is also an estimate of the AUC for the two PDFs!
✤
If the two PDFs are the same, then the AUC is 0.5
µest (g1i )]
MRMC Analysis applies
2
W
✤
C1
C2
C3
C4
C5
C6
C7
+
+
+
+
+
+
N1
N2
Nr
N1 N2
N1 Nr
N2 Nr
N1 N2 Nr
Task-Based Turing Test
Task-Based Turing Test
✤
If 0.5 is within one standard deviation of W we will
consider the two PDFs to be a good match!
✤
As the number of cases and readers increases the standard
deviation decreases and this becomes a more demanding
test!
✤
Using MRMC analysis we can compute the numbers of
readers and cases needed to reach the level of accuracy
desired!
Task-Based Turing Test
Task-Based Turing Test
Single'signal'present'image:'
Simulation Signal Present ROI (single image): 3mm 14HU object
Real Signal Present ROI (single image): 3mm 14HU object
1120
10
1120
1100
20
Signal'present'image'(500'averaged):'
'
Simulation Signal Present ROI (averaged by 500 images): 3mm 14HU object Real Signal Present ROI(averaged by 500 images):3mm 14HU object
10
1078
1100
20
10
1080
30
1080
40
1060
50
20
1074
1040
1020
30
40
1070
40
50
1068
50
1066
60
80
1000
80
90
980
90
1020
40
50
60
70
80
90
100
1068
1066
60
1064
70
1062
1062
1000
80
80
1060
100
30
1070
1064
70
20
1072
1072
70
10
1074
30
60
100
1076
20
1060
50
70
10
40
1040
60
1076
30
10
20
30
40
50
60
70
80
90
100
90
1060
90
1058
100
1058
100
10
20
30
40
50
60
70
80
90
100
10
20
30
40
50
60
70
80
90
100
Task-Based Turing Test
Summary
1
0.9
True−Positive Fraction
0.8
0.7
✤
Observer models can predict human performance for relatively
simple tasks!
✤
Consistent ranking may be more important than absolute quantitation!
✤
Task-based methods can be used to test model realism
0.6
0.5
0.4
0.3
0.2
0.1
0
0
0.2
0.4
0.6
False−Positive Fraction
0.8
1
Download