Computational Radiology Laboratory
Harvard Medical School, www.crl.med.harvard.edu
Children's Hospital, Department of Radiology, Boston, Massachusetts

A Survey of Validation Techniques for Image Segmentation and Registration, with a Focus on the STAPLE Algorithm
Simon K. Warfield, Ph.D.
Associate Professor of Radiology, Harvard Medical School

Outline
• Validation of image segmentation:
– Overview of approaches.
– STAPLE.
• Validation of image registration.
• The STAPLE algorithm is available as open source software from:
– http://www.nitrc.org/projects/staple
– http://crl.med.harvard.edu/
Computational Radiology Laboratory. Slide 2

Segmentation
• Goal: identify or label structures present in the image.
• Many methods:
– Interactive or manual delineation,
– Supervised approaches with user initialization,
– Alignment with a template,
– Statistical pattern recognition.
• Applications:
– Quantitative measurement of the volume, shape or location of structures,
– Provides boundaries for visualization by surface rendering.
[Figure: newborn MRI segmentation.]
Slide 3

Validation of Image Segmentation
• A spectrum of accuracy versus realism in the reference standard:
• Digital phantoms:
– Ground truth known accurately.
– Not very realistic.
• Acquisitions and careful segmentation:
– Some uncertainty in the ground truth.
– More realistic.
• Autopsy/histopathology:
– Addresses pathology directly; resolution.
• Clinical data?
– Ground truth hard to know.
– Most realistic model.
Slide 4

Validation of Image Segmentation
• Comparison to digital and physical phantoms:
– Excellent for testing against the anatomy, noise and artifacts that are modeled.
– Typically lacks the range of normal or pathological variability encountered in practice.
[Figure: MRI of the brain phantom from Styner et al., IEEE TMI 2000.]
Slide 5

Comparison to Higher Resolution MRI
[Figure: photograph and MRI. Provided by Peter Ratiu and Florin Talos.]
Slide 6

Comparison to Higher Resolution Photograph
[Figure: MRI, photograph, microscopy. Provided by Peter Ratiu and Florin Talos.]
Slide 7

Comparison to Autopsy Data
• Neonate gyrification index (GI):
– Ratio of the length of the cortical boundary to the length of a smooth contour enclosing the brain surface.
Slide 8

Staging
• Stage 3 (at 28 w GA): shallow indentations of the inferior frontal and superior temporal gyrus (1 infant at 30.6 w GA; normal range: 28.6 ± 0.5 w GA).
• Stage 4 (at 30 w GA): 2 indentations divide the frontal lobe into 3 areas; the superior temporal gyrus is clearly detectable (3 infants, 30.6 ± 0.4 w GA; normal range: 29.9 ± 0.3 w GA).
• Stage 5 (at 32 w GA): the frontal lobe is clearly divided into three parts: the superior, middle and inferior frontal gyrus (4 infants, 32.1 ± 0.7 w GA; normal range: 31.6 ± 0.6 w GA).
• Stage 6 (at 34 w GA): the temporal lobe is clearly divided into 3 parts: the superior, middle and inferior temporal gyrus (8 infants, 33.5 ± 0.5 w GA; normal range: 33.8 ± 0.7 w GA).
"Assessment of cortical gyrus and sulcus formation using MR images in normal fetuses", Abe S. et al., Prenat Diagn 2003.
Slide 9

Neonate GI: MRI vs Autopsy
[Plot: gyrification index versus post-conceptional age in days (200 to 340), comparing MRI scan 1, MRI scan 2, and Armstrong's autopsy data.]
Slide 10

GI Increase Is Proportional to Change in Age
[Plot: change in GI versus days of growth before the final scan (time interval between scans of 50 to 90 days), with a linear fit to the change of total brain GI.]
Slide 11

GI Versus Qualitative Staging
[Plot: total brain GI versus staging grade (3 to 9), for MRI scan 1 and MRI scan 2.]
Slide 12

Neonate Gyrification
GI: interactive versus automatic segmentation.
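As a concrete illustration of the gyrification index defined on slide 8, the sketch below computes a 2-D analogue on a binary mask. The choice of the convex hull of the mask as the "smooth enclosing contour", and the counting of exposed pixel edges as the boundary length, are illustrative assumptions only, not the measurement protocol used in the study.

```python
from math import dist

def boundary_length(mask):
    """Count exposed unit edges of foreground pixels (4-neighbour boundary)."""
    h, w = len(mask), len(mask[0])
    total = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    total += 1
    return total

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def gyrification_index(mask):
    """2-D GI analogue: folded boundary length / smooth enclosing contour length."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    hull = convex_hull(pts)
    hull_perimeter = sum(dist(hull[i], hull[(i + 1) % len(hull)])
                         for i in range(len(hull)))
    return boundary_length(mask) / hull_perimeter
```

A convex shape gives a ratio near 1; the more folded the boundary, the larger the index, mirroring the increase of GI with gyral development.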
[Plot: GI from automatic segmentation versus GI from hand segmentation, with the regression line y = 1.2241x + 0.4443 and the line of equality.]
Slide 13

Validation of Image Segmentation
• Comparison to expert performance; to other algorithms.
• Why compare to experts?
– Experts currently perform the segmentation tasks for which we seek algorithms.
– Surgical planning.
– Neuroscience research.
• What is the appropriate measure for such comparisons?
Slide 14

Measures of Expert Performance
• Repeated measures of volume:
– Intra-class correlation coefficient.
• Spatial overlap:
– Jaccard: area of intersection over area of union.
– Dice: increased weight on the intersection.
– Vote counting: majority rule, etc.
• Boundary measures:
– Hausdorff distance, 95% Hausdorff distance.
• Bland-Altman methodology:
– Requires a reference standard.
• Measures of correct classification rate:
– Sensitivity, specificity ( Pr(D=1|T=1), Pr(D=0|T=0) ).
– Positive predictive value and negative predictive value (posterior probabilities Pr(T=1|D=1), Pr(T=0|D=0) ).
Slide 15

Validation of Image Segmentation
• STAPLE (Simultaneous Truth and Performance Level Estimation):
– An algorithm for estimating performance and ground truth from a collection of independent segmentations.
Slide 16

STAPLE Papers
• Image segmentation with labels:
– Warfield, Zou, Wells, ISBI 2002.
– Warfield, Zou, Wells, MICCAI 2002.
– Warfield, Zou, Wells, IEEE TMI 2004.
– Commowick and Warfield, IPMI 2009.
• Image segmentation with boundaries:
– Warfield, Zou, Wells, MICCAI 2006.
– Warfield, Zou, Wells, PTRSA 2008.
• Diffusion data and vector fields:
– Commowick and Warfield, IEEE TMI 2009.
Slide 17

STAPLE: Estimation Problem
• Complete data density: f(D, T | p, q).
• Binary ground truth Ti for each voxel i.
• Expert j makes segmentation decisions Dij.
• Expert performance is characterized by sensitivity p and specificity q.
– We observe the expert decisions D. If we knew the ground truth T, we could construct maximum likelihood estimates of each expert's sensitivity (true positive fraction) and specificity (true negative fraction):

  $(\hat{p}, \hat{q}) = \arg\max_{p, q} \ln f(D, T \mid p, q)$

Slide 18

Expectation-Maximization
• Since we do not know the ground truth T, we treat T as a random variable and solve for the expert performance parameters that maximize:

  $Q(\theta \mid \theta^{(t-1)}) = E\left[ \ln f(D, T \mid \theta) \mid D, \theta^{(t-1)} \right]$

• The parameter values $\theta_j = [p_j \; q_j]^T$ that maximize this conditional expectation of the log-likelihood are found by iterating two steps:
– E-step: estimate the probability of the hidden ground truth T given the previous estimate of the expert quality parameters, and take the expectation.
– M-step: estimate the expert performance parameters by comparing D to the current estimate of T.
Slide 19

Probability Estimate of the True Labels
Estimate the probability of each tissue class s in the reference standard:

  $W_{si}^{(k)} = f(T_i = s \mid D_i, \theta^{(k)}) = \frac{f(T_i = s) \prod_j f(D_{ij} \mid T_i = s, \theta_j^{(k)})}{\sum_{s'} f(T_i = s') \prod_j f(D_{ij} \mid T_i = s', \theta_j^{(k)})}$

Slide 20

Binary Input: True Segmentation

  $W_i^{(k)} = f(T_i = 1 \mid D_i, p^{(k)}, q^{(k)}) = \frac{f(T_i = 1) \prod_j f(D_{ij} \mid T_i = 1, p_j^{(k)}, q_j^{(k)})}{\sum_{T_i} f(T_i) \prod_j f(D_{ij} \mid T_i, p_j^{(k)}, q_j^{(k)})}$

  $= \frac{f(T_i = 1) \prod_{j : D_{ij} = 1} p_j^{(k)} \prod_{j : D_{ij} = 0} (1 - p_j^{(k)})}{f(T_i = 1) \prod_{j : D_{ij} = 1} p_j^{(k)} \prod_{j : D_{ij} = 0} (1 - p_j^{(k)}) + f(T_i = 0) \prod_{j : D_{ij} = 0} q_j^{(k)} \prod_{j : D_{ij} = 1} (1 - q_j^{(k)})}$

• $f(T_i = 1)$: prior probability that the true label at voxel i is 1.
• $W_i^{(k)}$: conditional probability that the true label is 1.
Slide 21

Expert Performance Estimates

  $p_j^{(k+1)} = \frac{\sum_{i : D_{ij} = 1} W_i^{(k)}}{\sum_i W_i^{(k)}} \qquad q_j^{(k+1)} = \frac{\sum_{i : D_{ij} = 0} (1 - W_i^{(k)})}{\sum_i (1 - W_i^{(k)})}$

• p (sensitivity, true positive fraction): ratio of the expert-identified class 1 to the total class 1 in the image.
• q (specificity, true negative fraction): ratio of the expert-identified class 0 to the total class 0 in the image.
• For multiple labels, the analogous update for expert j reporting label s' when the true label is s is:

  $\theta_{j s' s}^{(k+1)} = \frac{\sum_{i : D_{ij} = s'} W_{si}^{(k)}}{\sum_i W_{si}^{(k)}}$

Slide 22

Newborn MRI Segmentation
[Figure.]
Slide 23

Newborn MRI Segmentation
• Summary of segmentation quality (posterior probability Pr(T=t|D=t)) for each tissue type, for repeated manual segmentations.
• Indicates the limits of accuracy of interactive segmentation.
Slide 24

Expert and Student Segmentations
[Figure: test image, expert consensus, and segmentations by students 1, 2 and 3.]
Slide 25

Phantom Segmentation
[Figure: phantom image, expert segmentation, student segmentations, voting, and STAPLE.]
Slide 26

STAPLE Summary
• Key advantages of STAPLE:
– Estimates the "true" segmentation.
– Assesses expert performance.
• A principled mechanism that enables:
– Comparison of different experts.
– Comparison of algorithms and experts.
• Extensions for the future:
– Prior distributions or extended models for expert performance characteristics.
– Estimation of bounds on the parameters.
Slide 27

Image Registration
• A metric measures the similarity of the images given an estimate of the transformation.
• The best metric depends on the nature of the images.
• The alignment quality ultimately achievable depends on the model of the transformation.
• The transformation is identified by solving an optimization problem:
– Seek the transform parameters that maximize the metric of image similarity.
Slide 28

Validation of Registration
• Compare transformations:
– Take some images and apply a known transformation to them.
– Estimate the transformation using registration.
– How well does the estimated transformation match the applied one?
• Check the alignment of key image features:
– Fiducial alignment.
– Spatial overlap: segment structures, then assess their overlap after alignment.
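The binary STAPLE iteration of slides 19 to 22 can be sketched in a few lines of plain Python. The initial performance estimates of 0.9 and the flat prior f(T_i = 1) = 0.5 are illustrative assumptions, and the function name and data layout are hypothetical; the published algorithm operates on full image volumes.

```python
def staple_binary(D, iters=30, prior=0.5):
    """Binary STAPLE sketch.

    D: list of R rater segmentations, each a list of N binary decisions.
    Returns (W, p, q): per-voxel probabilities that the true label is 1,
    and per-rater sensitivity / specificity estimates.
    """
    R, N = len(D), len(D[0])
    p = [0.9] * R  # initial sensitivity guesses (assumption)
    q = [0.9] * R  # initial specificity guesses (assumption)
    W = [0.0] * N
    for _ in range(iters):
        # E-step: W_i = P(T_i = 1 | D_i, p, q), as on slide 21.
        for i in range(N):
            a = prior        # accumulates f(T_i=1) * prod_j f(D_ij | T_i=1)
            b = 1.0 - prior  # accumulates f(T_i=0) * prod_j f(D_ij | T_i=0)
            for j in range(R):
                if D[j][i] == 1:
                    a *= p[j]
                    b *= 1.0 - q[j]
                else:
                    a *= 1.0 - p[j]
                    b *= q[j]
            W[i] = a / (a + b)
        # M-step: performance updates, as on slide 22.
        sumW = sum(W)
        for j in range(R):
            p[j] = sum(W[i] for i in range(N) if D[j][i] == 1) / sumW
            q[j] = sum(1 - W[i] for i in range(N) if D[j][i] == 0) / (N - sumW)
    return W, p, q
```

With two raters in perfect agreement and one noisy rater, the estimated truth follows the majority and the noisy rater receives lower performance estimates.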
Slide 29

Intraoperative Nonrigid Registration
• Fast: the registration should take no more than 1 minute.
• Robust: the registration should work despite poor image quality, artifacts, tumor, etc.
• Physics-based: we are concerned not only with intensity matching, but also with recovering the physical (mechanical) deformation of the brain.
• Accurate: neurosurgery requires precise knowledge of the position of structures.
• Archip et al., NeuroImage 2007.
Slide 30

Block Matching Algorithm
• Similarity measure: correlation coefficient, in [0 : 1].
• Divides a global optimization problem into many simple local ones.
• Highly parallelizable, as blocks can be matched independently.
Slide 31

Block Matching Algorithm
• Displacement estimates are noisy.
Slide 32

Patient-Specific Biomechanical Model
• Pre-operative image.
• Automatic brain segmentation.
• Brain finite element model (linear elastic).
Slide 33

Registration Validation
• Landmark matching assessment in six cases.
• The parallel version runs in 35 seconds on a cluster of 10 dual 2 GHz PCs:
– 7x7x7 block size, 11x11x25 search window, 1x1x1 step, 50,000 blocks, 10,000 tetrahedra.
• 60 landmarks (patients 1 to 6):
– Average error = 0.75 mm.
– Maximum error = 2.5 mm.
– Data voxel size: 0.8 x 0.8 x 2.5 mm3.
[Plot: registration error evaluated using landmark correspondences; measured error versus displacement for patients 1 to 6.]
Slide 34

Registration Validation
• 11 prospective consecutive cases.
• Alignment computed during the surgery.
• Estimate of the registration accuracy:
– 95% Hausdorff distance between the edges of the registered preoperative MRI and the intraoperative MRI.
Slide 35

Automatic Selection of Fiducials
(1) Non-rigid alignment of the preoperative MPRAGE.
(2) Intraoperative whole brain SPGR at 0.5 T.
• Contours extracted from (1) with the Canny edge detector.
• Contours extracted from (2) with the Canny edge detector.
• 95% Hausdorff metric computed between the two contour sets.
Slide 36

Alignment Improvement
Non-rigid registration of preop to intraop scans (95% Hausdorff distance):

Case | Tumor position | Tumor pathology | Max displacement measured (mm) | Rigid registration accuracy (mm) | Non-rigid registration accuracy (mm) | Ratio rigid/non-rigid
Case 1 | right posterior frontal | oligoastrocytoma, Grade II | 10.68 | 5.95 | 1.90 | 3.13
Case 2 | left posterior temporal | glioblastoma, Grade IV | 21.03 | 10.71 | 2.90 | 3.69
Case 3 | left medial temporal | glioblastoma, Grade IV | 15.27 | 7.65 | 1.70 | 4.50
Case 4 | left temporal | anaplastic oligoastrocytoma, Grade III | 10.00 | 6.80 | 0.85 | 8.00
Case 5 | right frontal | oligoastrocytoma, Grade II | 9.87 | 5.10 | 1.27 | 4.01
Case 6 | left frontal | anaplastic astrocytoma, Grade III | 17.48 | 10.20 | 3.57 | 2.85
Case 7 | right medial temporal | anaplastic astrocytoma, Grade III | 19.96 | 9.35 | 2.55 | 3.66
Case 8 | right frontal | oligoastrocytoma, Grade II | 17.44 | 8.33 | 1.19 | 7.00
Case 9 | right frontotemporal | oligoastrocytoma, Grade II | 15.08 | 7.14 | 1.87 | 3.81
Case 10 | right occipital | anaplastic oligodendroglioma, Grade III | 9.48 | 5.95 | 1.44 | 4.13
Case 11 | left frontotemporal | oligodendroglioma, Grade II | 10.74 | 4.76 | 0.85 | 5.60
AVG | | | 14.27 | 7.44 | 1.82 | 4.58

Slide 37

Visualization of Aligned Data
• Matched preoperative fMRI and DT-MRI aligned with the intraoperative MRI.
• Tensor alignment: Ruiz et al. 2000.
Slide 38

Conclusion
• Validation strategies for registration:
– Comparison of transformations.
– Fiducials: manual, automatic.
– Overlap statistics, as for segmentation.
• Validation strategies for segmentation:
– Digital and physical phantoms.
– Comparison to domain experts.
– STAPLE.
Slide 39

Acknowledgements
Collaborators:
• Neil Weisenfeld.
• Andrea Mewes.
• Richard Robertson.
• Joseph Madsen.
• Karol Miller.
• Michael Scott.
• William Wells.
• Kelly H. Zou.
• Frank Duffy.
• Arne Hans.
• Olivier Commowick.
• Alexandra Golby.
• Vicente Grau.
This study was supported by: R01 RR021885, R01 EB008015, R01 GM074068.
Slide 40
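For reference, the agreement measures listed on slide 15 (and reused as overlap statistics for registration validation in the conclusion) can be sketched for a pair of binary segmentations. The function name and the flattened-list data layout are illustrative assumptions.

```python
def overlap_measures(truth, seg):
    """Jaccard, Dice, sensitivity and specificity for two binary masks,
    given as flattened lists of 0/1 labels (truth T, decisions D)."""
    tp = sum(1 for t, d in zip(truth, seg) if t == 1 and d == 1)
    fp = sum(1 for t, d in zip(truth, seg) if t == 0 and d == 1)
    fn = sum(1 for t, d in zip(truth, seg) if t == 1 and d == 0)
    tn = sum(1 for t, d in zip(truth, seg) if t == 0 and d == 0)
    jaccard = tp / (tp + fp + fn)          # intersection over union
    dice = 2 * tp / (2 * tp + fp + fn)     # doubled weight on the intersection
    sensitivity = tp / (tp + fn)           # Pr(D=1 | T=1)
    specificity = tn / (tn + fp)           # Pr(D=0 | T=0)
    return jaccard, dice, sensitivity, specificity
```

Dice is always at least as large as Jaccard for the same pair of masks, which is the "increased weight of intersection" noted on slide 15.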