Multilevel Segmentation and Integrated Bayesian Model Classification

Jason J. Corso
Postdoctoral Research Fellow
Laboratory of Neuro Imaging, Center for Computational Biology
Center for Imaging and Vision Science
Departments of Neuroscience and Statistics
University of California, Los Angeles
jcorso@ucla.edu

Motivation
Segmentation of brain tumor is important but difficult.
• Important:
  • Would quantify tumor growth.
  • Currently, only manual or interactive.
  • Aid in surgical planning.
  • Provide data for treatment analysis.
• Difficult (glioblastoma multiforme; image labels: enhancement, necrosis, ring, debris, edema):
  • Highly varying in appearance, size and shape.
  • Can deform nearby structures.
  • Impossible with simple thresholding.
  • Multiple modalities necessary.
© Jason Corso UCLA LONI/CCB and CIVS

Motivation
Two Typical Examples

Graph Affinity-Based Segmentation
• Define the problem on a graph:
  G = {V, E}
• Each pixel/voxel is a node. Edges are sparse, to neighbors.
• Augment nodes with statistics s_v for v ∈ V, and a class label c_v.
  [Fig. 2: an example graph from brain MR showing the measurable node properties as squares and the random model variables as circles; a 2D graph is shown for presentation only — all processing occurs in 3D.]
• Define affinity between u, v ∈ V:
  w_uv = exp(−D(s_u, s_v; θ)),
  where D is some non-negative distance function and θ are some predetermined values.
• Regions are defined by cuts.

Graph Affinity-Based Segmentation
Segmentation by Affinities
• Define a region saliency measure:
  Γ(R) = Σ_{u∈R, v∉R} w_uv / Σ_{u,v∈R} w_uv.
• Low Γ(R) means good saliency:
  • low affinity on boundaries, high affinity in the interior.
• Normalized Cut (Shi & Malik):
  • Affinities at a pixel scale only.
  • Can take filters of varying size, but erroneous at borders.
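On a toy graph, the affinity and region-saliency definitions above can be sketched as follows. Everything here is a made-up illustration: the six intensities, the 1D chain neighborhood, and the scale θ = 10 are not from the talk.

```python
import numpy as np

# Toy 1D "image": six pixel intensities forming two homogeneous regions.
s = np.array([0.10, 0.12, 0.11, 0.90, 0.88, 0.91])
theta = 10.0  # assumed scale parameter

def affinity(su, sv):
    # w_uv = exp(-D(s_u, s_v; theta)) with D a scaled L1 distance.
    return np.exp(-theta * abs(su - sv))

# Candidate region R = the first three pixels; edges connect chain neighbors.
R = {0, 1, 2}
edges = [(u, u + 1) for u in range(len(s) - 1)]

# Saliency Gamma(R): affinity leaving R divided by affinity inside R.
boundary = sum(affinity(s[u], s[v]) for u, v in edges if (u in R) != (v in R))
interior = sum(affinity(s[u], s[v]) for u, v in edges if u in R and v in R)
gamma = boundary / interior  # low value => well-separated, salient region
```

Because the intensity jump at the region border drives the boundary affinity toward zero while the interior affinities stay near one, Γ(R) comes out very small for this well-separated region.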
Graph Affinity-Based Segmentation
Segmentation by Weighted Aggregation
• Invented in the natural image domain by Sharon et al. (CVPR 2000, 2001).
• First extended to the medical imaging domain by Akselrod-Ballin (CVPR 2006).
• Efficient, multiscale process.
• Results in a pyramid of recursively coarsened graphs that capture multiscale properties of the data.
• Affinities are calculated at each level of the graph.
• Statistics in each graph node are agglomerated up the hierarchy.
  [Figure: two-level graph; fine nodes u, v with statistics s_u, s_v and affinity w_uv in G^0; coarse nodes U, V with statistics s_U, s_V and affinity W_UV in G^1; interpolation weight p_uU.]

Segmentation with Generative Models
• Define a likelihood function
  P({s_u} | {c_u}),
• and define priors on the class labels (models):
  P({c_u}).
• Goal is to seek estimates of the class labels that maximize the posterior:
  P({c_u} | {s_u}) ∝ P({s_u} | {c_u}) P({c_u}).
• Segmentation and classification solved jointly.
• Exemplified by the DDMCMC algorithm of Tu and Zhu.
  [Image from Tu et al. IJCV 2005: tumor outline with edema, necrotic, and active regions; topology with grammatical relations.]
• Stochastic grammars are another example.

Comparison
• Graph Affinity-Based Methods
  • Rapid, feed-forward methods giving quick results.
  • Evidence of much power in these methods.
  • Not guaranteed to get meaningful answers.
  • Very memory intensive (especially with 3D medical data).
• Model-Based MAP Methods
  • Guaranteed to get an answer from the posterior.
  • Very computationally expensive.
  • The models are hard to design and train.
• Suggests a combination of the two.
• Research Agenda: leverage the efficiency of graph-based bottom-up methods with the power and optimality of top-down generative models.

Combining Affinities and Models
• Define class-dependent affinities in terms of probabilities:
  P̂(X_uv | s_u, s_v) = w_uv.
• X_uv is the binary event that both u and v are in the same region.
  [Figure: X_uv = 1 when u and v lie in the same region; X_uv = 0 otherwise.]
• X_uv is not deterministic on class labels (c_u, c_v):
  • pixels in the same region may have different labels,
  • pixels in different regions may have the same labels.

Combining Affinities and Models
The Conventional Affinity Function
  w_uv = exp(−D(s_u, s_v; θ))

Combining Affinities and Models
Model-Aware Affinity
• Built from node likelihoods and model-specific measurements: the node likelihoods weigh the model-specific measurements.

Combining Affinities and Models
Model-Aware Affinity
• Treat class labels as hidden variables and sum them out:
  P(X_uv | s_u, s_v) = Σ_{c_u} Σ_{c_v} P(X_uv | s_u, s_v, c_u, c_v) P(c_u, c_v | s_u, s_v)
                     ∝ Σ_{c_u} Σ_{c_v} P(X_uv | s_u, s_v, c_u, c_v) P(s_u, s_v | c_u, c_v) P(c_u, c_v)
                     = Σ_{c_u} Σ_{c_v} P(X_uv | s_u, s_v, c_u, c_v) P(s_u | c_u) P(s_v | c_v) P(c_u, c_v).
  (model-specific measurement × node likelihoods × class prior)

Combining Affinities and Models
Model-Aware Affinity
• P(X_uv | s_u, s_v, c_u, c_v) = exp(−D(s_u, s_v; θ[c_u, c_v]))
• Defined as a function of class-label pairs θ[c_u, c_v]; suitable for heterogeneous data.
• Model-aware affinities are modulated by evidence P(s_u | c_u):
  • At pixel scales, evidence for any class is low.
  • At coarser scales, evidence is high and models improve affinities.

Segmentation by Weighted Aggregation
Overview
• Multiscale graph-based algorithm.
• Inspired by methods in Algebraic Multigrid.
• First extended to the 3D medical domain by Akselrod-Ballin (2006).
• Again, define a graph G^t = (V^t, E^t).
  • Superscript denotes level in a pyramid of graphs: G = {G^t : t = 0, ..., T}.
• Finest layer induced by the voxel lattice:
  • 6-neighbor connectivity,
  • s_v set according to multimodal image intensities,
  • affinities initialized by L1 distance: w_uv = exp(−θ |s_u − s_v|_1).
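The model-aware affinity above, with the class labels summed out, can be sketched numerically. The two classes, the Gaussian node likelihoods, the pair weights θ[c_u, c_v], and the uniform class prior are all invented for illustration; they stand in for the learned quantities.

```python
import numpy as np

classes = ["brain", "tumor"]
# Hypothetical class-pair distance weights theta[c_u, c_v] (symmetric):
# same-class pairs penalize intensity differences more strongly.
theta = {("brain", "brain"): 4.0, ("tumor", "tumor"): 4.0,
         ("brain", "tumor"): 1.0, ("tumor", "brain"): 1.0}

def lik(sv, c):
    # Toy node likelihood P(s | c): unnormalized Gaussians with assumed means.
    mu = {"brain": 0.3, "tumor": 0.8}[c]
    return np.exp(-0.5 * ((sv - mu) / 0.1) ** 2)

def prior(cu, cv):
    return 0.25  # uniform over the four class pairs

def model_aware_affinity(su, sv):
    # P(X_uv | s_u, s_v) proportional to
    #   sum_{c_u, c_v} exp(-D(s_u, s_v; theta[c_u, c_v]))
    #                  * P(s_u | c_u) P(s_v | c_v) P(c_u, c_v)
    total = 0.0
    for cu in classes:
        for cv in classes:
            measurement = np.exp(-theta[(cu, cv)] * abs(su - sv))
            total += measurement * lik(su, cu) * lik(sv, cv) * prior(cu, cv)
    return total

w_same = model_aware_affinity(0.80, 0.82)  # both measurements look like tumor
w_diff = model_aware_affinity(0.30, 0.80)  # brain-like next to tumor-like
```

The likelihoods concentrate the sum on the plausible class pair, so two tumor-like measurements receive a higher affinity than a brain/tumor pair, even though the class labels are never observed.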
Segmentation by Weighted Aggregation
• Select a representative set of nodes R^t ⊂ V^t satisfying
  Σ_{v∈R^t} w_uv ≥ β Σ_{v∈V^t} w_uv,
  i.e., all nodes in the finer level have strong affinity to nodes in the coarser level.
• Begin to define the graph G^1 = (V^1, E^1).
• Compute interpolation weights between coarse and fine levels:
  p_uU = w_uU / Σ_{V∈V^{t+1}} w_uV.
• Accumulate statistics at the coarse level:
  s_U = Σ_{u∈V^t} p_uU s_u / Σ_{v∈V^t} p_vU.
• Interpolate affinity from the finer levels:
  ŵ_UV = Σ_{(u≠v)∈V^t} p_uU w_uv p_vV.
• Use coarse affinity to modulate the interpolated affinity:
  W_UV = ŵ_UV exp(−D(s_U, s_V; θ)).
• Repeat ...

Segmentation by Weighted Aggregation
Incorporating Model-Aware Affinities
• Alter the way coarse affinities are modulated.
• Currently: W_UV = ŵ_UV exp(−D(s_U, s_V; θ)).
• Change to: W_UV = ŵ_UV P(X_UV | s_U, s_V).
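One coarsening step of the weighted aggregation described above — representative selection, interpolation weights, coarse statistics, and interpolated affinities — can be sketched on a tiny graph. The 4-node chain, β = 0.2, and the greedy selection order are assumptions of this sketch, not details from the talk.

```python
import numpy as np

# Fine-level graph: 4 nodes on a chain, statistics s and affinities w.
n = 4
s = np.array([0.10, 0.15, 0.90, 0.85])
w = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    w[u, v] = w[v, u] = np.exp(-10.0 * abs(s[u] - s[v]))

# 1. Greedily pick representatives so every node sends at least a beta
#    fraction of its total affinity into the representative set.
beta = 0.2
reps = []
for u in range(n):
    if not any(w[u, r] >= beta * w[u].sum() for r in reps):
        reps.append(u)

# 2. Interpolation weights p[u, U] = w[u, rep_U] / sum_V w[u, rep_V];
#    a representative interpolates fully to itself.
p = np.zeros((n, len(reps)))
for u in range(n):
    if u in reps:
        p[u, reps.index(u)] = 1.0
    else:
        row = np.array([w[u, r] for r in reps])
        p[u] = row / row.sum()

# 3. Coarse statistics: interpolation-weighted average of fine statistics.
s_coarse = (p.T @ s) / p.sum(axis=0)

# 4. Interpolated coarse affinities w_hat_UV = sum_{u,v} p_uU w_uv p_vV
#    (w has zero diagonal, so the u != v restriction holds automatically).
w_hat = p.T @ w @ p
```

On this chain the two intensity clusters each yield one representative, and the coarse statistics come out near the cluster means (≈0.125 and ≈0.875), which is exactly the agglomeration behavior the pyramid relies on.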
• Associate the most-likely class with each node:
  c*_U = arg max_{c∈C} P(s_U | c).

Example of the Segmentation Pyramid

Extracting Information from the Hierarchy
The Problem:
• The algorithm gives a hierarchy; we need a final answer.
• Original SWA suggests saliency.
• Model information provides more information for extraction.
Current Model-Based Extraction:
• Compute the class for each voxel at each level in the pyramid.
  • Use interpolation weights.
  • Accumulate class membership.
• Assign the class the voxel had for most of the levels.

Extracting Information from the Hierarchy
Energy-Based Methods -- Saliency Only
• Treat the hierarchy as an input datum to a minimizer.
• Define a set of level variables {l_v}, v ∈ V over the voxel lattice.
• A final segmentation is induced by an instance:
  H_s({l}) = α_1 Σ_{i} Γ(v_i^{(l_i)}) − α_2 Σ_{<i,j>} 1(l_i = l_j)
  ("external" potential and pair potential).
• Can be minimized with various techniques:
  • Formulate as a Gibbs field and use simulated annealing.
  • Use the popular min-cut/max-flow graph cuts method.

Extracting Information from the Hierarchy
Energy-Based Methods -- Model-Based
• Models provide stronger information for extraction.
• Define a set of model variables {m_v}, v ∈ V over the voxel lattice.
• Bayesian estimate of model likelihood over the hierarchy:
  P(m_i, ŝ_i) = Σ_{t=0,...,T} P(ŝ_i, m_i, t)
              = Σ_{t=0,...,T} P(s_{v_i(t)} | m_i) E(ŝ_i, t).
• E(ŝ_i, t) is the level evidence, computed from the entropy of the likelihood distribution:
  E(ŝ_i, t) = exp(−κ Σ_{m∈M} P(s_{v_i(t)} | m) ln P(s_{v_i(t)} | m))
            / Σ_{z=0,...,T} exp(−κ Σ_{m∈M} P(s_{v_i(z)} | m) ln P(s_{v_i(z)} | m)).

Extracting Information from the Hierarchy
Energy-Based Methods -- Model-Based
• Define a similar Potts-type energy model:
  H_m(M) = −α_1 Σ_{i} P(m_i, ŝ_i) − α_2 Σ_{<i,j>} 1(m_i = m_j).
• Use energy minimization or stochastic optimization techniques:
  • Min-cut/max-flow methods.
  • Simulated annealing on the Gibbs field:
    Π_M(M) = (1/Z) exp(−τ H_m(M)).
  • Swendsen-Wang cuts, similar to Barbu and Zhu.

Extracting Information from the Hierarchy
Activate Top-Down Generative Models
• Treat the hierarchy as a set of model "proposals."
• Becomes plausible to use global object properties (e.g., shape).
• Towards a comprehensive medical image parsing framework.
• Keys to making these methods more robust and efficient.
Integrate Top-Down Generative Models
• Drive hierarchical agglomeration by the top-down models.
• Need to define a new multiscale generative model incorporating appearance and shape properties.
• New learning algorithms for the models and agglomeration.

Application to Brain Tumor Segmentation
  [Shown: paper "On the Dimensionality of Enhancing Brain Tumor Appearance in Multimodal MR Images," Jason J. Corso, Medical Imaging Informatics, UCLA.]
• Dataset of 20 glioblastoma multiforme patients.
• 3D, 256x256x25.
• Images pre-processed:
  • noise removal,
  • skull removal,
  • spatial registration,
  • standardization.
• Use 4 modalities: T1, T1 w/contrast, FLAIR, and T2.
• Expert annotated.
• Data graciously provided by Dr. Cloughesy of the UCLA Henry E. Singleton Brain Cancer Research Program, and preprocessed by Shishir Dube using FSL tools.

Model-Aware Affinity and Class Prior
• For the model-aware affinity, use a class-dependent weighted distance:
  P(X_uv | s_u, s_v, c_u, c_v) = exp(−Σ_{m=1}^{M} θ^m_{c_u c_v} |s^m_u − s^m_v|).
• Coefficients θ^m_{c_u c_v} are set based on expert, domain knowledge:

  Class pair    | T1  | T1CE | FLAIR | T2
  ND, Brain     | 1/4 | 1/4  | 1/4   | 1/4
  Brain, Brain  | 1/4 | 1/4  | 1/4   | 1/4
  Brain, Tumor  | 0   | 1    | 0     | 0
  Tumor, Tumor  | 0   | 1    | 0     | 0
  Brain, Edema  | 0   | 0    | 1/2   | 1/2
  Tumor, Edema  | 0   | 1/2  | 1/4   | 1/4
  Edema, Edema  | 0   | 0    | 1/2   | 1/2

  TABLE I. Coefficients used in the model-aware affinity calculation. Rows are (symmetric) class pairs and columns are modalities.
• The feature statistic is simply average intensity.
• The class prior term encodes obvious hard constraints (e.g. tumor cannot be next to non-data); the remaining entries are set to uniform.

Class Likelihood Model
• Each class is modeled with a Gaussian mixture model.
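Evaluating a class likelihood P(s | c) against such a mixture can be sketched in one dimension. The two-component weights, means, and standard deviations below are hypothetical stand-ins for the learned per-class mixtures.

```python
import numpy as np

# Hypothetical 1D two-component Gaussian mixture for one class (e.g. tumor):
weights = np.array([0.6, 0.4])
means = np.array([0.70, 0.90])
stds = np.array([0.05, 0.08])

def class_likelihood(s):
    """P(s | c): weighted sum of Gaussian component densities."""
    comps = weights * np.exp(-0.5 * ((s - means) / stds) ** 2) \
            / (stds * np.sqrt(2 * np.pi))
    return comps.sum()

p_in = class_likelihood(0.72)   # intensity near a mixture mode
p_out = class_likelihood(0.20)  # intensity far from both modes
```

An intensity near a mixture mode scores a much larger likelihood than one far from both modes, which is what drives the node-likelihood terms in the model-aware affinity.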
• Likelihood is computed directly against this model.
  [Fig. 5. Class intensity histograms in each channel (T1, T2, FLAIR, T1CE) independently: (a) tumor class, (b) edema class.]
• For some structures, more node statistics must be used:
  • Shape moments
  • Surface curvature
  • Location
  • Relational

Example of the Segmentation Pyramid

Example of the Segmentation Pyramid
  [Levels 4-12: Caudate, Ventricle, Putamen.]
• Can see the relevant structures emerging.
• Preliminary results from the sub-cortical project (began Sept. 2006).

Example of the Segmentation Pyramid
  [Levels 5-8: Hippocampus.]
• Hard-to-segment structures emerging too.

Quantitative Results
Volume Overlap Comparison

  Results on Training Set
  Algorithm                | Tumor Jac | Tumor Prec | Tumor Rec | Edema Jac | Edema Prec | Edema Rec
  Single Voxel Classifier  | 42%       | 48%        | 85%       | 43%       | 49%        | 78%
  Saliency-Based Extractor | 44%       | 51%        | 64%       | 47%       | 55%        | 76%
  Model-Based Extractor    | 62%       | 75%        | 81%       | 54%       | 66%        | 72%

  Results on Testing Set
  Algorithm                | Tumor Jac | Tumor Prec | Tumor Rec | Edema Jac | Edema Prec | Edema Rec
  Single Voxel Classifier  | 49%       | 55%        | 81%       | 56%       | 66%        | 76%
  Saliency-Based Extractor | 48%       | 61%        | 63%       | 56%       | 66%        | 71%
  Model-Based Extractor    | 66%       | 80%        | 79%       | 61%       | 78%        | 71%

  TABLE II. Volume overlap results (mean scores).

Quantitative Results
Fig. 8. Example hierarchy on image TE01 from the test set. The top row is comprised of the following channels: T1, T1 with contrast, FLAIR, and T2.

Quantitative Results
Volume Measurement (mm^3)

  Study ID | Tumor Truth | Tumor Auto | Edema Truth | Edema Auto
  TR01     | 93335       | 79894      | 138972      | 151524
  TR02     | 17344       | 12516      | 20709       | 38145
  TR03     | 71211       | 30769      | 122168      | 188732
  TR04     | 798         | 1552       | 10143       | 11730
  TR05     | 25411       | 34776      | 122829      | 118247
  TR06     | 7643        | 13826      | 10768       | 16470
  TR07     | 31829       | 34699      | 34733       | 38670
  TR08     | 24594       | 26570      | 134487      | 162264
  TR09     | 64486       | 79718      | 253682      | 268981
  TR10     | 220243      | 227251     | 90120       | 36970
  TE01     | 24851       | 38817      | 262421      | 213046
  TE02     | 54352       | 51054      | 171869      | 153000
  TE03     | 46651       | 41242      | 65919       | 70784
  TE04     | 10499       | 8177       | 81740       | 74495
  TE05     | 46783       | 57454      | 101198      | 84437
  TE06     | 41395       | 39328      | 37355       | 29709
  TE07     | 36826       | 24079      | 72625       | 36251
  TE08     | 52256       | 39348      | 140799      | 146390
  TE09     | 16503       | 19989      | 220270      | 241509
  TE10     | 95422       | 92311      | 183315      | 186080

  TABLE III.

Quantitative Results
Surface Measurement (mm)

  Study ID | Tumor Mean | Tumor Median | Edema Mean | Edema Median
  TR01     | 1.23       | 0            | 1.01       | 0
  TR02     | 7.73       | 0            | 76.40      | 97.68
  TR03     | 3.71       | 1.8          | 3.03       | 0
  TR04     | 5.11       | 0.86         | 1.95       | 0
  TR05     | 1.56       | 0            | 0.56       | 0
  TR06     | 7.63       | 1.22         | 13.60      | 6.5
  TR07     | 10.28      | 0            | 7.02       | 0
  TR08     | 1.17       | 0            | 1.12       | 0
  TR09     | 4.11       | 0            | 0.71       | 0
  TR10     | 0.68       | 0            | 5.57       | 1.72
  TE01     | 29.97      | 0.78         | 2.52       | 0
  TE02     | 1.56       | 0            | 2.49       | 0
  TE03     | 1.12       | 0            | 0.54       | 0
  TE04     | 0.39       | 0            | 1.24       | 0
  TE05     | 1.02       | 0            | 0.77       | 0
  TE06     | 0.74       | 0            | 19.69      | 26.78
  TE07     | 20.23      | 2.32         | 10.25      | 1.83
  TE08     | 2.23       | 0            | 1.63       | 0
  TE09     | 2.14       | 0            | 2.04       | 0
  TE10     | 4.54       | 0            | 3.36       | 0

  TABLE IV.

Classification Comparison
  [Slices 1-6; rows: Manual, Single Voxel, Saliency, Models.]

Classification Comparison
  [Slices 7-11; rows: Manual, Single Voxel, Saliency, Models.]

Results
Comparing SWA with Model-Aware Affinities to Original SWA
  [Model-Aware vs. Original.]

Implementation Details
• Built with Java and Swing.
• Uses ImageIO: supports many image formats using the LONI medical image plugins.
• Developed new core data structures to support transparent out-of-core data manipulation, which is often necessary when working with medical imaging (especially hierarchical).
• Developed with principled OO techniques:
  • Polymorphic Model-Class design for "plug-and-play" affinity functions and likelihood measurements based on the specific application.
  • Design patterns make it easy to generate an application-specific segmentation algorithm built atop the same core software.
• Computational time: a 90x60x90 volume is segmented in about 2 minutes on a 2GHz PowerPC G5. Extraction time depends on the method.

Results - Tool for Interactive Analysis Using Models
  [Screenshot: select slices to view (3 axes: axial, sagittal, coronal), select modality, select hierarchy level, select transparency, overlay manual contours, 3D rendering.]

Conclusions
• Main contribution:
  • Classes as hidden variables for bridging graph-based affinities and model-based techniques.
  • Incorporated into the SWA algorithm.
• Software contribution:
  • Computational time: a 90x60x90 volume segmented & classified in about 2 minutes on a PowerPC G5 (in Java).
  • A complete system for experimenting with the Bayesian computational methods, including learning, segmentation, visualization, and analysis.
• Future:
  • Better models with more feature statistics,
  • Learning the model-aware affinity class-dependent parameters,
  • Leveraging the efficiency of graph-based bottom-up methods with the comprehensiveness and optimality of top-down generative models.
Thanks for listening.

Acknowledgements
• Funding by NLM Medical Informatics Training Grant #LM07356 and the NIH NCBC Center for Computational Biology (LONI).
• Dr. Cloughesy of the Henry E. Singleton Brain Cancer Research Program, University of California, Los Angeles, for providing the data.
• Shishir Dube for helping with and performing the data preprocessing.
• Drs. Yuille, Sharon, Taira and Sinha for guidance.

Research Overview
• Concepts
  • Automatic knowledge discovery
  • Domain expert knowledge
• Models
  • Statistical learning: Bayesian methodology (generative modeling)
  • Manual creation
  • Model discovery (e.g. learning grammars, manifold learning)
• Inference
  • Estimate the models (find instances of the concepts) in data
  • Efficient and robust
  • Integrate bottom-up, feed-forward methods with top-down models
• Applications
  • Medical informatics
  • Surgical planning
  • Various related biomedical applications (e.g. tissue microarrays)