Tutorial on Medical Image Segmentation: Beyond Level-Sets MICCAI, 2014 Yuri Boykov, UWO Western University Canada www.csd.uwo.ca/faculty/yuri/miccai14_MIS 1: Basics of optimization-based segmentation - continuous and discrete approaches 2 : Exact and approximate techniques - non-submodular and high-order problems 3: Multi-region segmentation (Milan) - high-dimensional applications Yuri Boykov, UWO Introduction to Image Segmentation Part 1 implicit/explicit representation of boundaries • active contours, level-sets, graph cut, etc. Basic low-order objective functions (energies) • physics, geometry, statistics, information theory Part 2 Set functions, submodularity • Exact methods Approximation methods • Higher-order and non-submodular objectives • Comparison to gradient descent (level-sets) Yuri Boykov, UWO Thresholding T Yuri Boykov, UWO Thresholding T S={ p : Ip < T } Yuri Boykov, UWO Thresholding - = Background I= Iobj - Ibkg Subtraction ? Threshold intensities above T better segmentation? Yuri Boykov, UWO Good segmentation S ? Objective function must be specified Quality function Cost function Loss function E(S) : 2P “Energy” Regularization functional Segmentation becomes an optimization problem: S = arg min E(S) Yuri Boykov, UWO Good segmentation S ? Objective function must be specified Quality function Cost function Loss function E(S) = E1(S)+…+ En(S) “Energy” Regularization functional combining different constraints e.g. on region and boundary Segmentation becomes an optimization problem: S = arg min E(S) Yuri Boykov, UWO Beyond linear combination of terms Ratios are also used • • • • E1( S ) E( S ) E2 ( S ) Normalized cuts [Shi, Malik, 2000] Minimum Ratio cycles [Jarmin Ishkawa, 2001] Ratio regions [Cox et al, 1996] Parametric max-flow applications [Kolmogorov et al 2007] Not in this tutorial Yuri Boykov, UWO Segmentation principles interactive Boundary seeds vs. • Livewire (intelligent scissors) Region seeds • Graph cuts (intelligent paint) • Distance (Voronoi-like cells) Bounding box • Grabcut [Rother et al] Center seeds • Star shape [Veksler] Many other options… unsupervised Normalized cuts [Shi Malik] Mean-shift [Comaniciu] MDL [Zhu&Yuille] Entropy of appearance Add enough constraints: Saliency Shape Known appearance Texture Yuri Boykov, UWO 1. We won’t cover everything… 2. We will not emphasize the differences between interactive and unsupervised (too easy to convert one into the other) interactive Boundary seeds vs. • Livewire (intelligent scissors) Region seeds • Graph cuts (intelligent paint) • Distance (Voronoi-like cells) Bounding box • Grabcut [Rother et al] Center seeds • Star shape [Veksler] Many other options… unsupervised Normalized cuts [Shi Malik] Mean-shift [Comaniciu] MDL [Zhu&Yuille] Entropy of appearance Add enough constraints: Saliency Shape Known appearance Texture Yuri Boykov, UWO Common segmentation techniques boundary-based region-growing intelligent scissors (live-wire) region-based thresholding both region & boundary geodesic active contours (e.g. level-sets) MRF active contours (snakes) (e.g. graph-cuts) watersheds random walker optimization-based Yuri Boykov, UWO Common surface representations continuous optimization sp mesh level-sets combinatorial optimization mixed optimization s p {0,1} sp Z p graph labeling point cloud labeling on complex on grid Yuri Boykov, UWO Active contours (e.g. snakes) [Kass, Witkin, Terzopoulos 1987] Given: initial contour (model) near desirable object Goal: evolve the contour to fit exact object boundary Yuri Boykov, UWO Tracking via active contours Tracking Heart Ventricles 5-14 Yuri Boykov, UWO Active contours - snakes Parametric Curve Representation (continuous case) A curve can be represented by 2 functions C {ν(s) | s [0,1]} ν(s) ( x(s), y(s)) open curve parameter 0 s 1 closed curve 5-15 Yuri Boykov, UWO Snake Energy E( C ) Ein( C ) Eex( C ) internal energy encourages smoothness or any particular shape external energy encourages curve onto image structures (e.g. image edges) 5-16 Yuri Boykov, UWO Active contours - snakes (continuous case) internal energy (physics of elastic band) 1 1 2 2 Ein ( C ) dν ds d 2 ν ds ds d s 0 0 elasticity / stretching 2 stiffness / bending external energy (from image) 1 Eex ( C ) | I (v( s)) |2 ds 0 proximity to image edges 5-17 Yuri Boykov, UWO Active contours – snakes (discrete case) elastic energy (elasticity) d vi 1 i 1 ds 2 v3 C (ν0 , ν1 , ν 2 ,....,ν n1 ) 2n v4 v5 νi ( xi , yi ) v2 v1 v6 v7 v 10 v9 v8 bending energy (stiffness) d 2 ( i 1 i ) ( i i 1 ) i 1 2 i i 1 2 ds Yuri Boykov, UWO Basic Elastic Snake 1 1 dv 2 2 case E | | ds | I ( v( s ))| ds Ccontinuous { ν( s ) | s [ 0 ,1 ]} ds 0 0 n 1 E | vi 1 vi | 2 i 0 elastic smoothness term (interior energy) n 1 | I ( vi ) | 2 discrete case C { νi | 0 i n } i 0 image data term (exterior energy) 5-19 Yuri Boykov, UWO Snakes - gradient descent C n 1 simple elastic snake energy E( x0 , , xn1 , y0 , , yn1 ) | I x ( xi , yi ) |2 | I y ( xi , yi ) |2 here, energy is a function of 2n variables i 0 n 1 ( xi 1 xi ) 2 ( yi 1 yi ) 2 i 0 update equation for the whole snake C' C E t C E x'0 x0 x0 E y ' y y0 0 0 ... ... ... t E x ' x n 1 n 1 xn1 y' n 1 yn 1 ynE1 5-20 Yuri Boykov, UWO Snakes - gradient descent C n 1 simple elastic snake energy E( x0 , , xn1 , y0 , , yn1 ) | I x ( xi , yi ) |2 | I y ( xi , yi ) |2 here, energy is a function of 2n variables i 0 n 1 ( xi 1 xi ) 2 ( yi 1 yi ) 2 i 0 ν'i νi Fi t update equation for each node Fi C E x'0 x0 x0 E y ' y y0 0 0 ... ... ... t E x ' x n 1 n 1 xn1 y' n 1 yn 1 ynE1 xEi Fi E yi 5-21 Yuri Boykov, UWO Snakes - gradient descent E(C) energy function E(C) for contours C C 2 n cˆ c2 local minima for E(C) step size could be tricky c1 c0 gradient descent steps Ci 1 Ci t E second derivative of image intensities 5-22 Yuri Boykov, UWO Implicit (region-based) surface representation via level-sets z u ( x, y ) C {( x, y) : u( x, y) 0} (implicit contour representation) [Dervieux, Thomasset, 79, 81] [Osher, Sethian, 89] Yuri Boykov, UWO Implicit (region-based) surface representation via level-sets dC N Normal contour motion can be represented by evolution of level-set function u du | u| [Dervieux, Thomasset, 79, 81] [Osher, Sethian, 89] u (x) The scaling by | u | is easily verified in one dimension du | u | dC dC x p Note 1: - commonly used for gradient descent evolution dC E dt Note 2: - level sets can not represent tangential motion of contour points ??? Yuri Boykov, UWO Tangential vs. normal motion of contour points - normal motion of a contour point visibly changes shape (geometry) - tangential motion generates no “visible” shape change A simple example of tangential motion of contour points (rotation) A simple example of normal motion of contour points (expansion) Comments: - geometric “energy” of a contour measures “visible” shape properties (length, curvature, area, e.t.c.). Thus, gradient descent w.r.t. geometric objective generates only ”visible” (normal) motion. dC dt E - level sets can represent contour gradient descent evolution only for geometric “energies” E(C) (s.t. E is collinear with contour normal N ) Level sets (implicit contour representation) Geodesic active contours Yuri Boykov, UWO Tangential vs. normal motion of contour points - normal motion of a contour point visibly changes shape (geometry) - tangential motion generates no “visible” shape change Np E p Q: in what medical applications tangential motion of segment boundary matters? Comments: - gradient descent for physics-based “energy” of a contour (e.g. elasticity) may produce geometrically “invisible” tangential motion of contour points - physics-based energy of a contour depends on its parameterization, while geometrically it could be the same contour (compare two shapes above) Parametric contours (explicit contour representation) Physics-based active contours Yuri Boykov, UWO Physics vs. Geometry continuous optimization mesh snakes, balloons, active contours • • • • explicit or parametric contour representation physics-based objectives (typically) gradient descent could use dynamic programming in 2D [Amini, Weymouth, Jain, 1990] level-sets geodesic active contours • • • • implicit or non-parametric representation geometry-based objectives gradient descent can use convex formulations (TV-based) [Chan, Esidoglu, Nikolova 2006] Yuri Boykov, UWO Most common geometric functionals for segmentation with level-sets dC N Functional E( C ) weighted length (boundary alignment to intensity edges) E (C ) g ds E (C ) (oriented boundary alignment) ~ g g , N f da ~ f int(C ) (region alignment to appearance model) flux du | u | C E (C ) weighted area or C v, N ds ~ div(v) Yuri Boykov, UWO Most common geometric functionals for segmentation with level-sets dC N Functional E( C ) E(C) | S |g weighted length or du | u | ~ g g , N (boundary alignment to intensity edges) E (C ) weighted area E (C ) (oriented boundary alignment) da ~ f int(C ) (region alignment to appearance model) flux f C v, N ds ~ div(v) Yuri Boykov, UWO Towards discrete geometry: weighted boundary length on a graph [Barrett and Mortensen 1996] “Live wire” or “intelligent scissors” w(| I |) | I | pixels | I| image-based edge weights B A shortest path algorithm (Dijkstra) Yuri Boykov, UWO shortest path on a 2D graph graph cut Example: find the shortest closed contour in a given domain of a graph Shortest paths approach Graph Cuts approach p Compute the shortest path p ->p for a point p. Repeat for all points on the gray line. Then choose the optimal contour. Compute the minimum cut that separates red region from blue region Yuri Boykov, UWO graph cuts vs. shortest paths On 2D grids graph cuts and shortest paths give optimal 1D contours. B A A Path connects points A Cut separates regions Shortest paths still give optimal 1-D contours on N-D grids Min-cuts give optimal hyper-surfaces on N-D grids Yuri Boykov, UWO Graph cut [Boykov and Jolly 2001] hard constraint t n-links a cut hard constraint s Minimum cost cut can be computed in polynomial time (max-flow/min-cut algorithms) I pq wpq exp 2 2 I pq Yuri Boykov, UWO Minimum s-t cuts algorithms Augmenting paths [Ford & Fulkerson, 1962] - heuristically tuned to grids [Boykov&Kolmogorov 2003] Push-relabel [Goldberg-Tarjan, 1986] - good choice for denser grids, e.g. in 3D Preflow [Hochbaum, 2003] - also competitive Yuri Boykov, UWO Optimal boundary in 2D “max-flow = min-cut” Yuri Boykov, UWO Optimal boundary in 3D 3D bone segmentation (real time screen capture, year 2000) Yuri Boykov, UWO ‘Smoothness’ of segmentation boundary - snakes (physics-based contours) - geodesic contours (geometry) - graph cuts (discrete geometry) NOTE: many distance-to-seed methods optimize segmentation boundary only indirectly, they compute some analogue of optimum Voronoi cells [fuzzy connectivity, random walker, geodesic Voronoi cells, etc.] Yuri Boykov, UWO Discrete vs. continuous boundary cost Geodesic contours E ( S ) ws ds S Graph cuts E(S ) wpq [S p Sq ] p ,q S p {0,1} C [Caselles, Kimmel, Sapiro, 1997] (level-sets) [Chan, Esidoglu, Nikolova, 2006] (convex) [Boykov and Jolly 2001] [Boykov and Kolmogorov 2003] Both incorporate segmentation boundary smoothness and alignment to image edges Yuri Boykov, UWO Graph cuts on a grid and boundary of S B( S ) S p {0,1} |e| eC Severed n-links can approximate geometric length of contour C [Boykov&Kolmogorov, ICCV 2003] This result fundamentally relies on ideas of Integral Geometry (also known as Probabilistic Geometry) originally developed in 1930’s. • e.g. Blaschke, Santalo, Gelfand Yuri Boykov, UWO Integral geometry approach to length 2 C a subset of lines L intersecting contour C LC a set of all lines L 0 probability that a “randomly drown” line intersects C Euclidean length of C : || C || 1 2 n d d L Cauchy-Crofton formula the number of times line L intersects C Yuri Boykov, UWO Graph cuts and integral geometry Graph nodes are imbedded in R2 in a grid-like fashion C Edges of any regular neighborhood system generate families of lines { , , , } || C || Euclidean length 1 2 n k k k k the number of edges of family k intersecting C || C ||gc graph cut cost for edge weights: k k wk 2 Length can be estimated without computing any derivatives Yuri Boykov, UWO Metrication errors Euclidean metric “standard” 4-neighborhoods (Manhattan metric) Riemannian metric 8-neighborhoods larger-neighborhoods Yuri Boykov, UWO Metrication errors 4-neighborhood 8-neighborhood Yuri Boykov, UWO Differential geometry Differential vs. integral approach to geometric boundary length || C || || C || 1 | u | dx 0 Ct dt Parametric (explicit) contour representation Level-set function representation Integral geometry || C || 1 2 n L d d Cauchy-Crofton formula implicit (region-based) representation of contours Yuri Boykov, UWO Implicit (region-based) surface representation via level-sets • Level set function u(p) is normally stored on image pixels • Values of u(p) can be interpreted as distances or heights of image pixels -1.7 -0.8 -0.8 0.2 A contour may be approximated from u(x,y) with subpixel accuracy u( p) u( x , y) -0.6 -0.4 -0.5 0.5 C -0.2 0.6 0.3 0.7 Yuri Boykov, UWO Implicit (region-based) surface representation via graph-cuts • Graph cuts represent surfaces via binary labeling Sp of each graph node • Binary values of Sp indicate interior or exterior points (e.g. pixel centers) 0 0 0 1 There are many contours satisfying interior/exterior labeling of points S p S ( x , y) 0 0 0 1 Question: Is this a contour to be reconstructed from binary labeling Sp ? C Answer: NO 0 1 1 1 Yuri Boykov, UWO Contour/surface representations (summary) Implicit (area-based) Level sets (geodesic active contours) Graph cuts (minimum cost cuts) What else besides boundary length |∂S| ? Explicit (boundary-based) Snakes (physics-based band model) Live-wire (shortest paths on graphs) Yuri Boykov, UWO From seeds to more general region constraints [Boykov and Jolly 2001] Dp (s) t n-links a cut S w pq S Dp (t ) segmentation s assume I and I are known “expected” intensities of object and background s t Dp (s) | I p I s | Dp (t ) | I p I t | cost of severed t-links E(S) = s | I I p | pS t | I I p | + pS | S | cost of severed n-links Yuri Boykov, UWO From seeds to more general region constraints [Boykov and Jolly 2001] t n-links a cut Dp (s) S w pq S segmentation re-estimate Dp (s) | I p I s | unknown intensities of object and background Chan-Vese model s I s and I t I s and I t could be Block-Coord.Descent Dp (t ) Dp (t ) | I p I t | cost of severed t-links E(S, Is,It) = | I p I s | pS t | I I p | pS + | S | cost of severed n-links Yuri Boykov, UWO Block-coordinate descent for E(S , I 0 , I 1 ) 0 Minimize over labeling S for fixed I , I E (S , I 0 , I 1 ) (I 0 p:S p 0 I p )2 (I 1 p:S p 1 I p )2 w { pq}N pq 1 [S p Sq ] optimal L can be computed using graph cuts 0 1 Minimize over I , I for fixed labeling S E( S, I , I ) 0 1 (I p:S p 0 0 Ip) 2 (I p:S p 1 1 Ip) fixed for 2 S=const wpq [S p Sq ] { pq}N optimal I 1, I 0 can be computed by minimizing squared errors inside object and background segments Iˆ 0 1 |S | I p:S p 0 p Iˆ1 1 |S | I p :S p 1 p Yuri Boykov, UWO Chan-Vese segmentation (binary case S p {0,1} ) E (S , I 0 , I 1 ) 0 2 ( I I ) p p :S p 0 w { pq}N pq 1 2 ( I I ) p p :S p 1 [S p Sq ] Yuri Boykov, UWO Chan-Vese segmentation (could be used for more than 2 labels S p {0,1,2,...} ) can be used for segmentation, to reduce color-depth, or to create a cartoon E ( S , I 0 , I 1 ,...) 0 2 ( I I ) p p :S p 0 w { pq}N pq 1 2 ( I I ) ... p p :S p 1 [S p Sq ] Yuri Boykov, UWO Chan-Vese segmentation (could be used for more than 2 labels S p {0,1,2,...} ) can be used for segmentation, to reduce color-depth, or to create a cartoon E ( S , I 0 , I 1 ,...) joint optimization over S and I0, I1,… is NP-hard 0 2 ( I I ) p p :S p 0 1 2 ( I I ) ... p p :S p 1 without the smoothing term, this is like “K-means” clustering in the color space Yuri Boykov, UWO From fixed intensity segments to general intensity distributions [Boykov and Jolly 2001] t n-links a cut Dp (s) S w pq S Dp (t ) segmentation s Appearance models can be given by intensity distributions of object and background Dp (s) ln Pr( I p | s) Dp (t ) ln Pr(I p | t ) cost of severed t-links E(S) = ln Pr(I pS p | s) ln Pr(I pS p | t ) + | S | cost of severed n-links Yuri Boykov, UWO Graph cut (region + boundary) Yuri Boykov, UWO Graph cut as energy optimization for S [Boykov and Jolly 2001] t n-links a cut Dp (s) S w pq S Dp (t ) segmentation cut s S p {0,1} E(S) = cost(cut) (1)D ( S)D (0) D pS p p p p pS p + w pqN pq [ S p Sq ] cost of severed t-links cost of severed n-links unary terms pair-wise terms regional properties of S boundary smoothness for S Yuri Boykov, UWO Unary potentials as linear term wrt. S p {0,1} unary terms D p ( S p ) Dp (1) Dp (0) pS p pS Dp (1) S p Dp (0) (1- Sp ) p const Dp (1) Dp (0) S p p f ( p) f p s p f, S p Linear region term analogous to geodesic active contours E ( S ) f S Yuri Boykov, UWO Unary potentials as linear term wrt. f,S f p sp p unary terms (linear) Examples of potential functions f • Log-likelihoods f p ln Pr ( I p ) • Chan-Vese f p ( I p c)2 • Volume Ballooning f p 1 • Attention f p filter responseat p Yuri Boykov, UWO In general,… k-arity potentials are k-th order polynomial pair-wise terms w pqN pq [ S p Sq ] w S pqN pq p (1 Sq ) (1 Sp ) Sq quadratic polynomial wrt.S p Quadratic term analogous to boundary length in geodesic active contours E ( S ) | S |w ws ds S Yuri Boykov, UWO Basic (quadratic) boundary regularization | S |w w pqN pq [s p sq ] s p {0,1} second-order terms (quadratic) Examples of discontinuity penalties w • Euclidean boundary length 1 w pq ~ | pq| [Boykov&Kolmogorov, 2003], via integral geometry • contrast-weighted boundary length [Boykov&Jolly, 2001] wpq ~ exp ( I p I q )2 Yuri Boykov, UWO Basic second-order segmentation energy includes linear and quadratic terms E(S) f, S | S |w f pS p w pqN pq [ S p Sq ] Yuri Boykov, UWO Basic second-order segmentation energy includes linear and quadratic terms E(S) f, S | S |w MAIN ADVANTAGE: guaranteed global optimum (t.e. best segmentation w.r.t. objective) (discrete case) via graph cuts [Boykov&Jolly’01; Boykov&Kolmogorov’03] public code [BK’2004], fast on CPU (continuous case) via convex TV formulations [Chen,Esidoglu,Nikolova’06; Chambolle,Pock,Cremers’08] public code [C. Nieuwenhuis’2014], comparably fast on GPU NOTE: this formulation is different from basic level-sets [Osher&Sethian’89] Yuri Boykov, UWO Optimization vs Thresholding E(S) f, Sf p B(S) E(S) pS S thresholding Pr(I | Fg) e.g. graph cut [BJ, 2001] Pr(I | Bg) Pr(I(p)| fg) f(p) ln Pr(I(p)| bg) I Yuri Boykov, UWO Other examples of useful globally optimizable segmentation objectives Flux [Boykov&Kolmogorov 2005] Color consistency [“One Cut”, Tang et al. 2014] Distance ||S-S0|| from template shape • Hamming, L2,… [e.g. Boykov,Cremers,Kolmogorov, 2006] Star-shape prior [Veksler 2008] NOT ALLOWED Yuri Boykov, UWO Many more example of useful hard-to-optimize segmentation objectives Typical Problems: Continuous case • Non-convexity Typical Solutions: gradient descent (linearization) + level sets Discrete case • Non-submodularity (more later) • High-order • Density (too many terms) to be discussed Yuri Boykov, UWO Examples of useful higher-order energies Cardinality potentials E(S) (constraints on segment size) V0 | S| E(S) V0 s p can not be represented as a sum of simpler (unary or quadratic) terms high-order potential |S| V0 Yuri Boykov, UWO Examples of useful higher-order energies Cardinality potentials E(S) (constraints on segment size) V0 | S| E(S) 2 V s 0 2 p can be represented as a sum of unary and quadratic terms NOTE: 2nd-order potential |S| V0 still difficult to optimize (completely connected graph) Yuri Boykov, UWO Examples of useful higher-order energies Cardinality potentials E(S) (constraints on segment size) V0 | S| E(S) m V s 0 m p can not be represented as a sum of simpler (unary or quadratic) terms high-order potential |S| V0 Yuri Boykov, UWO Examples of useful higher-order energies Cardinality potentials (constraints on segment size) Curvature of the boundary Shape convexity Segment connectivity Appearance entropy, color consistency Distribution consistency High-order shape moments … Yuri Boykov, UWO Summary Not-Covered: Covered basics of: Thresholding, region growing Snakes, active contours Geodesic contours Graph cuts (binary labeling, MRF) Implicit surface representation Global optimization is possible To be covered later: High-order models Ratio functionals Normalized cuts Watersheds Random walker Many others…