Observer Models and Realism Testing
Matthew Kupinski
July 22, 2014

Virtual Clinical Trial
✤ What is the relevant task? Detect tumors, detect and locate tumors, estimate parameters.
✤ What objects will be imaged? The patient population and the statistics that govern it.
✤ What is the imaging system(s)? The physics and statistics of the system, which determine how an object is mapped to image data.
✤ How will the information be extracted? The observer.
✤ What measure of performance will be used? The figure of merit.
[Figure: ROC curves, true-positive fraction versus false-positive fraction, for System A, System B, and System C.]

Observers
✤ Decision-making strategy
✤ Options
  - Human
  - Model
  - Anthropomorphic model
  - Computer-aided diagnosis (CAD)

Observer review
✤ Classification: $t = T(g)$
✤ Estimation: $\hat{\theta} = \hat{\theta}(g)$
✤ Combination: $t = T(g)$ and $\hat{\theta} = \hat{\theta}(g)$
[Figure: signal-absent and signal-present images, with ROC analysis of the classification output.]

Ideal linear observers

Hotelling observer
$$t = w^\dagger g, \qquad w = K_g^{-1} \Delta\bar{g}$$
✤ Is the ideal observer when the image data are Gaussian distributed
✤ Maximizes the SNR among all linear observers
✤ Requires knowledge of the first- and second-order statistics
✤ A linear template applied to the image

Channelized Hotelling observer
$$v = T g, \qquad t = \xi^\dagger v, \qquad \xi = K_v^{-1} \Delta\bar{v}$$
✤ Dimension of $v$ is much smaller than that of $g$
✤ Ideal linear observer on channel outputs
✤ Channels can be chosen to mimic the human visual system
✤ Internal noise model is needed to match absolute performance

Limitations of Hotelling observers
✤ Signal location is fixed
✤ Performance is limited in the presence of signal-shape variability
✤ Signal must often be very weak to achieve AUC < 1.0
Task is limited.

Combined tasks
[Figure: diagram with labels D1 and D2.]

EROC
Traditional ROC curve
✤ TPF vs. FPF
✤ AUC from 2AFC test
✤ Ideal observer is the likelihood ratio
EROC curve
✤ Expected utility for TPF vs. FPF
✤ AEROC from 2AFC test
✤ Ideal observer is known
✤ Specializes to LROC for signal localization tasks
[Figure: two panels. Left: true-positive fraction versus false-positive fraction. Right: expected utility for TPF versus false-positive fraction.]

Scanning linear observer
$$t(g, \theta) = \Delta\bar{g}(\theta)^\dagger K_g^{-1} (g - \bar{g}) - \frac{1}{2} \Delta\bar{g}(\theta)^\dagger K_g^{-1} \Delta\bar{g}(\theta)$$
$$t(g) = \max_{\theta} t(g, \theta)$$
$$\hat{\theta}(g) = \arg\max_{\theta} t(g, \theta)$$
✤ Maximizes area under the EROC curve when the image statistics are Gaussian distributed and a delta utility function is assumed
✤ Non-linear observer due to the maximization steps

Scanning linear observer
Estimating signal position
[Figure: map of $t(g, \theta)$.]

Anthropomorphic models?
✤ Human scan pattern is difficult to model
[Figure courtesy of E. Krupinski.]
Observer is limited.

Problem
[Figure: chart of observer realism (low to high) against task realism (low to high), with X marks.]

Rank ordering
✤ Does absolute performance matter?
✤ What matters is that the same decision is made using the model observer
✤ Rank ordering
✤ May allow for simpler tasks and observers

Solution
[Figure: observer performance versus system parameter, with curves for human performance and model performance.]

Ranking Systems
✤ A judge is a program producing the estimates of the performance for a system (may include multiple readers)
✤ Judge $j$ produces a rank $R_{js}$ for system $s$
✤ We want to know how consistent these rankings are among the judges

Ranking Systems
$$\bar{R}_s = \frac{1}{J} \sum_{j=1}^{J} R_{js}, \qquad \bar{R} = \frac{1}{S} \sum_{s=1}^{S} \bar{R}_s = \frac{S+1}{2}$$
$$SS_t = J \sum_{s=1}^{S} \left( \bar{R}_s - \bar{R} \right)^2$$
$$SS_e = \frac{1}{J(S-1)} \sum_{j=1}^{J} \sum_{s=1}^{S} \left( R_{js} - \bar{R} \right)^2 = \frac{S(S+1)}{12}$$
The Friedman statistic is
$$Q = \frac{SS_t}{SS_e}$$

Ranking Systems
✤ The Friedman statistic is usually used in hypothesis testing where the null hypothesis is that the rankings of the judges are uncorrelated
✤ Under the null hypothesis the distribution of $Q$ can be approximated by a chi-square distribution with $S-1$ degrees of freedom
✤ A large value for $Q$ indicates agreement among the judges

Ranking Systems
✤ We want a large value for $Q$, but how large?
The maximum value for $Q$ is
$$Q_{\max} = J(S-1)$$
Define a sample variance for each system
$$\sigma_s^2 = \frac{1}{J} \sum_{j=1}^{J} \left( R_{js} - \bar{R}_s \right)^2$$
[Figure: correct order index (0 to 1) versus number of training samples (0 to 10000).]

Task-Based Turing Test
✤ Problem: Is the simulated data realistic enough?
✤ Question: Realistic enough for what?
✤ Answer: Realistic enough to give results for estimators in simulation that match what we get with real data
✤ We must quantify how well the results match

Task-Based Turing Test
✤ Consider parameter $\mu$ with estimator $\mu_{\text{est}}$
✤ $\mu_{\text{est}}$ has a PDF $\mathrm{pr}_1(\mu_{\text{est}})$ with the simulated data
✤ $\mu_{\text{est}}$ has a PDF $\mathrm{pr}_2(\mu_{\text{est}})$ with the real data
✤ Want to measure how well these PDFs match

Task-Based Turing Test
To measure the match between the two PDFs we will use the Wilcoxon statistic
$$W = \frac{1}{N_r} \sum_{r=1}^{N_r} W_r$$
$$W_r = \frac{1}{N_1 N_2} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \mathrm{step}\left[ \mu_{\text{est}}(g_{2j}) - \mu_{\text{est}}(g_{1i}) \right]$$
✤ $W$ is usually used to reject the null hypothesis that the two distributions are the same
✤ $W$ is also an estimate of the AUC for the two PDFs
✤ If the two PDFs are the same, then the AUC is 0.5
MRMC analysis applies:
$$\sigma_W^2 = \frac{C_1}{N_1} + \frac{C_2}{N_2} + \frac{C_3}{N_r} + \frac{C_4}{N_1 N_2} + \frac{C_5}{N_1 N_r} + \frac{C_6}{N_2 N_r} + \frac{C_7}{N_1 N_2 N_r}$$

Task-Based Turing Test
✤ If 0.5 is within one standard deviation of $W$ we will consider the two PDFs to be a good match
✤ As the number of cases and readers increases, the standard deviation decreases and this becomes a more demanding test
✤ Using MRMC analysis we can compute the numbers of readers and cases needed to reach the level of accuracy desired

Task-Based Turing Test
Single signal-present image:
[Figure: two panels, 100 x 100 pixel ROIs. Simulation signal-present ROI (single image), 3 mm 14 HU object; real signal-present ROI (single image), 3 mm 14 HU object.]

Task-Based Turing Test
Signal-present image (500 averaged):
[Figure: two panels, 100 x 100 pixel ROIs. Simulation signal-present ROI (average of 500 images), 3 mm 14 HU object; real signal-present ROI (average of 500 images), 3 mm 14 HU object.]

Task-Based Turing Test
[Figure: ROC curve, true-positive fraction versus false-positive fraction.]

Summary
✤ Observer models can predict human performance for relatively simple tasks
✤ Consistent ranking may be more important than absolute quantitation
✤ Task-based methods can be used to test model realism
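
Sketch for the scanning linear observer slide. The slide defines a test statistic t(g, θ) that is maximized over candidate signal positions. The Python below is a minimal illustration under a strong simplifying assumption that is not in the slides: white noise, so K_g is noise_var times the identity and K_g^{-1} reduces to a division. The mean image ḡ is taken to be the signal-absent mean, which is also an assumption. The function name, the 1-D toy image, and the two-pixel bump signal are all hypothetical.

```python
from typing import List, Sequence, Tuple


def scanning_linear_observer(
    g: Sequence[float],
    g_bar: Sequence[float],
    signals: Sequence[Sequence[float]],
    noise_var: float,
) -> Tuple[float, int]:
    """Return (max over candidates of t(g, theta), index of the maximizer).

    signals[k] is the mean signal image for candidate position k.
    Assumption: K_g = noise_var * I, so K_g^{-1} is division by noise_var.
    """
    scores: List[float] = []
    for dg in signals:
        # First term: signal template applied to the mean-subtracted data.
        data_term = sum(d * (gi - bi) for d, gi, bi in zip(dg, g, g_bar)) / noise_var
        # Second term: half the signal energy for this candidate.
        energy = sum(d * d for d in dg) / noise_var
        scores.append(data_term - 0.5 * energy)
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


def bump(position: int, n_pixels: int = 8, amplitude: float = 3.0) -> List[float]:
    """Hypothetical signal: a two-pixel bump starting at `position`."""
    image = [0.0] * n_pixels
    image[position] = amplitude
    image[position + 1] = amplitude
    return image


background = [10.0] * 8                    # signal-absent mean image
candidates = [bump(k) for k in range(7)]   # one mean signal per position
# Noise-free toy image with the signal at position 4.
g = [b + s for b, s in zip(background, bump(4))]

t_max, theta_hat = scanning_linear_observer(g, background, candidates, noise_var=1.0)
print(t_max, theta_hat)
```

With a full covariance matrix the division would become a linear solve against K_g; the two maximization steps, which make the observer non-linear, stay the same.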
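
Sketch for the Ranking Systems slides. The Friedman statistic Q = SS_t / SS_e can be computed directly from a table of ranks. The Python below follows the slide formulas term by term and assumes untied ranks 1..S from every judge. The function name and the two example rank tables are hypothetical, chosen only to show that full agreement among judges reaches Q_max = J(S-1).

```python
from typing import List


def friedman_q(ranks: List[List[float]]) -> float:
    """ranks[j][s] is the rank R_js that judge j assigns to system s.

    Assumes every judge assigns the untied ranks 1..S.
    """
    J = len(ranks)
    S = len(ranks[0])
    # Mean rank of each system over the judges.
    r_bar_s = [sum(ranks[j][s] for j in range(J)) / J for s in range(S)]
    # Grand mean; equals (S + 1) / 2 for untied ranks.
    r_bar = sum(r_bar_s) / S
    ss_t = J * sum((r - r_bar) ** 2 for r in r_bar_s)
    ss_e = sum(
        (ranks[j][s] - r_bar) ** 2 for j in range(J) for s in range(S)
    ) / (J * (S - 1))
    return ss_t / ss_e


# Three judges ranking four systems.
agree = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]   # identical rankings
mixed = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 1, 4, 3]]   # conflicting rankings

q_agree = friedman_q(agree)
q_mixed = friedman_q(mixed)
q_max = len(agree) * (len(agree[0]) - 1)   # J * (S - 1)

print(q_agree, q_mixed, q_max)
```

Dividing Q by Q_max gives a value between 0 and 1, which is one way to answer "how large?" on a common scale; that normalization is a suggestion here, not something the slides state.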
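
Sketch for the Task-Based Turing Test slides. The test computes the Wilcoxon statistic W between estimates from simulated and real data, then checks whether 0.5 lies within one standard deviation of W. The Python below is a minimal illustration. The slides do not say how ties are handled, so step(0) = 1/2 is an assumption. The coefficients C_1 to C_7 come out of MRMC analysis and are not given in the slides, so the values used here are arbitrary placeholders, as are all function names and data.

```python
import math
from typing import Sequence, Tuple


def step(x: float) -> float:
    """Step function; counting ties as 1/2 is an assumption."""
    if x > 0:
        return 1.0
    if x == 0:
        return 0.5
    return 0.0


def wilcoxon_single_reader(sim: Sequence[float], real: Sequence[float]) -> float:
    """W_r: fraction of (simulated, real) pairs where the real estimate is larger."""
    total = sum(step(r - s) for s in sim for r in real)
    return total / (len(sim) * len(real))


def wilcoxon_w(readers: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """W: average of W_r over readers; each entry is (sim estimates, real estimates)."""
    return sum(wilcoxon_single_reader(s, r) for s, r in readers) / len(readers)


def mrmc_variance(c: Sequence[float], n1: int, n2: int, nr: int) -> float:
    """Seven-term expansion of the variance of W from the MRMC slide."""
    c1, c2, c3, c4, c5, c6, c7 = c
    return (c1 / n1 + c2 / n2 + c3 / nr + c4 / (n1 * n2)
            + c5 / (n1 * nr) + c6 / (n2 * nr) + c7 / (n1 * n2 * nr))


def is_good_match(w: float, variance: float) -> bool:
    """Slide criterion: 0.5 lies within one standard deviation of W."""
    return abs(w - 0.5) <= math.sqrt(variance)


# Hypothetical estimates of mu from two readers: (simulated, real).
readers = [
    ([1.0, 2.0, 3.0], [1.5, 2.5, 3.5]),
    ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
]
# Placeholder coefficients, NOT values from the slides.
coeffs = (0.02, 0.02, 0.01, 0.05, 0.03, 0.03, 0.10)

w = wilcoxon_w(readers)
var_small = mrmc_variance(coeffs, n1=3, n2=3, nr=2)
var_large = mrmc_variance(coeffs, n1=10000, n2=10000, nr=500)

print(w, is_good_match(w, var_small))
print(is_good_match(0.53, var_large))
```

With the same placeholder coefficients, increasing the numbers of cases and readers shrinks the variance, so a fixed offset of W from 0.5 eventually fails the criterion. This is the sense in which the slide calls the test more demanding.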