Computational Anatomy & Statistical Shape Models
John Ashburner
john@fil.ion.ucl.ac.uk
Functional Imaging Lab, 12 Queen Square, London, UK.

Why?
o The Wellcome Trust is keen that there is a translational component to the work in the FIL.
o E.g. develop some potentially useful diagnostic stuff.
o For proper generative models of brain shape differences.
o More accurate spatial normalisation.
o More accurate shape characterisations.
o To use these models for proper characterisation of population differences.
o These may be multivariate.
o Join the mainstream.
o How do more established fields of biology compare shapes?

NeuroImage Volume 23, Supplement 1, Pages CO2-S299 (2004)
Mathematics in Brain Imaging
Edited by P.M. Thompson, M.I. Miller, T. Ratnanather, R.A. Poldrack and T.E. Nichols
o Mapping cortical change in Alzheimer's disease, brain development, and schizophrenia. Paul M. Thompson, Kiralee M. Hayashi, Elizabeth R. Sowell, Nitin Gogtay, Jay N. Giedd, Judith L. Rapoport, Greig I. de Zubicaray, Andrew L. Janke, Stephen E. Rose, James Semple et al.
o Computational anatomy: shape, growth, and atrophy comparison via diffeomorphisms. Michael I. Miller
o Geometric strategies for neuroanatomic analysis from MRI. James S. Duncan, Xenophon Papademetris, Jing Yang, Marcel Jackowski, Xiaolan Zeng and Lawrence H. Staib
o Variational, geometric, and statistical methods for modeling brain anatomy and function. Olivier Faugeras, Geoffray Adde, Guillaume Charpiat, Christophe Chefd'Hotel, Maureen Clerc, Thomas Deneux, Rachid Deriche, Gerardo Hermosillo, Renaud Keriven, Pierre Kornprobst et al.
o Computational anatomy and neuropsychiatric disease: probabilistic assessment of variation and statistical inference of group difference, hemispheric asymmetry, and time-dependent change. John G. Csernansky, Lei Wang, Sarang C. Joshi, J. Tilak Ratnanather and Michael I. Miller
o Sequence-independent segmentation of magnetic resonance images. Bruce Fischl, David H. Salat, André J.W. van der Kouwe, Nikos Makris, Florent Ségonne, Brian T. Quinn and Anders M. Dale
o Expert knowledge-guided segmentation system for brain MRI. Alain Pitiot, Hervé Delingette, Paul M. Thompson and Nicholas Ayache
o Surface-based approaches to spatial localization and registration in primate cerebral cortex. David C. Van Essen
o Cortical surface segmentation and mapping. Duygu Tosun, Maryam E. Rettmann, Xiao Han, Xiaodong Tao, Chenyang Xu, Susan M. Resnick, Dzung L. Pham and Jerry L. Prince
o Cortical cartography using the discrete conformal approach of circle packings. Monica K. Hurdal and Ken Stephenson
o A framework to study the cortical folding patterns. J.-F. Mangin, D. Rivière, A. Cachia, E. Duchesnay, Y. Cointepas, D. Papadopoulos-Orfanos, P. Scifo, T. Ochiai, F. Brunelle and J. Régis
o Geodesic estimation for large deformation anatomical shape averaging and interpolation. Brian Avants and James C. Gee
o Unbiased diffeomorphic atlas construction for computational anatomy. S. Joshi, Brad Davis, Matthieu Jomier and Guido Gerig
o Statistics on diffeomorphisms via tangent space representations. M. Vaillant, M.I. Miller, L. Younes and A. Trouvé
o Soliton dynamics in computational anatomy. Darryl D. Holm, J. Tilak Ratnanather, Alain Trouvé and Laurent Younes
o Implicit brain imaging. Facundo Mémoli, Guillermo Sapiro and Paul Thompson

Training and Classifying
[Figure: patient training data and control training data, with unlabelled points marked "?".]

Classifying
[Figure: the "?" points assigned to patients or controls by $y = f(a^T x + b)$.]

Support Vector Classifier (SVC)
[Figure: separating boundary with the support vectors marked.]
o a is a weighted linear combination of the support vectors

Some Equations
o Linear classification is by $y = f(a^T x + b)$
o where a is a weighting vector, x is the test data, b is an offset, and f(.) is a thresholding operation
o a is a linear combination of SVs: $a = \sum_i w_i x_i$
o So $y = f(\sum_i w_i x_i^T x + b)$

Going Nonlinear
o Nonlinear classification is by $y = f(\sum_i w_i \phi(x_i, x))$
o where $\phi(x_i, x)$ is some function of $x_i$ and $x$.
o e.g. RBF classification: $\phi(x_i, x) = \exp(-\|x_i - x\|^2 / (2\sigma^2))$
o Requires a matrix of distance measures (metrics) between each pair of images.

Nonlinear SVC
[Figure: nonlinear decision boundary.]

What is a Metric?
[Figure: triangle with vertices A, B and C.]
o Positive
  o Dist(A,B) ≥ 0
  o Dist(A,A) = 0
o Symmetric
  o Dist(A,B) = Dist(B,A)
o Satisfy triangle inequality
  o Dist(A,B) + Dist(B,C) ≥ Dist(A,C)

Concise representations
o Information reduction/compression
o Most parsimonious representation - best generalisation
o Occam’s Razor
o Registration compresses data
o signal is partitioned into
  o deformations
  o residuals

The Small deformation setting
o Most “nonlinear” registration is done in the small-deformation setting.
o Involves adding a smooth displacement field to an identity transform: $y = x + u$
o No one-to-one constraint
o Inverse from: $x = y - u$
o Can be a poor approximation to the real inverse
o Adding and subtracting displacements doesn’t work properly
o Smoothing and averaging displacements doesn’t work properly
o Not the most parsimonious model
o Unrealistic generative model

Small deformation: displacements are linear within the Eulerian framework

Small-deformation setting
[Figure panels: Forward transform; Backward transform; Small def. approx. to backward transform; Small def. approx. to forward transform.]

Illustrating some concepts with rotations
o Consider a 2D rotation $y = Rx$, where
$$R = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$
o This can be formulated as the solution of a differential equation at time t=1
o $\dot{x}_1(t) = \theta\, x_2(t)$
o $\dot{x}_2(t) = -\theta\, x_1(t)$
o or $\dot{x}(t) = A x(t)$, where
$$A = \begin{pmatrix} 0 & \theta \\ -\theta & 0 \end{pmatrix}$$

Flow field for rigid 2D rotation
[Figure: flow field.]

Exponentials
o The solution can be obtained by
$$x(1) = R\,x(0) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} x(0) = e^{A} x(0) = \exp\begin{pmatrix} 0 & \theta \\ -\theta & 0 \end{pmatrix} x(0)$$
o The exponential is defined as:
$$R = e^{A} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \frac{A^4}{4!} + \dots$$
o There are many ways of computing R from A, but one of the easiest is by scaling and squaring

Averaging rotations
o It makes no sense to average the rotation matrices themselves.
  o The result may not be a rotation
o The elements of a rotation matrix lie on a manifold.
o Average by minimising the sum of squared distances tangential to the manifold
  o Distance derived from velocity (distance travelled in unit time)
  o Shortest distances are geodesics, which require constant velocity
[Figure: the manifold plotted against matrix elements r11 and r12.]

Groups
o 2D rotation matrices form a Lie group under multiplication (SO(2)).
o Group Requirements
  o Composition of group members is another group member
  o The members have inverses
  o There is an identity member
  o The composition operations are associative
o Lie Group requirements
  o Continuous and differentiable manifold

3D Rotations
o 3D rotations defined by
$$R = e^{A} = \exp\begin{pmatrix} 0 & \theta_1 & \theta_2 \\ -\theta_1 & 0 & \theta_3 \\ -\theta_2 & -\theta_3 & 0 \end{pmatrix}$$
o These do not commute ($R_1 R_2 \neq R_2 R_1$)
o Similarly $\exp(A_1)\exp(A_2) \neq \exp(A_2)\exp(A_1)$
o Both differ from $\exp(A_1 + A_2)$
o Makes life more difficult
o Iterative schemes needed for averaging etc.

Lie Algebra
o A is an element of the Lie algebra of the rotation group.
o The amount of non-commutativity is measured by the Lie bracket
  o Results from curvature of the manifold
o $[A,B] = AB - BA$
[Figure: composing $e^A$ and $e^B$ in either order, with the mismatch labelled in terms of $AB - BA$.]

How much rotation is in a rotation matrix?
o Given two rotation matrices, R and S, the relative difference between the rotations can be found by computing $C = \log(R^{-1}S)$ and then computing the RMS of C: $(\theta_1^2 + \theta_2^2 + \theta_3^2)^{1/2}$, where $\theta_1$, $\theta_2$ and $\theta_3$ are the three independent elements of C.

Cartan decomposition
o A matrix can be decomposed into $A = (A + A^T)/2 + (A - A^T)/2$
o $(A + A^T)/2$ is symmetric
  o Encodes zooms and shears
o $(A - A^T)/2$ is skew symmetric
  o Encodes rigid rotations
o A can be converted into a column vector (a)
o There is a matrix L, such that La gives the elements of $(A - A^T)/2$.
o The square of the rotation angle is then given by $(La)^T(La) = a^T(L^T L)a$
o This excludes any zooming and shearing from the measure
o Similarly, and more usefully, the amount of zooming and shearing can be computed in a way that is independent of the rotations.

Nonlinear Registration
[Figure: Mapping.]
[Figure: Flow field for nonlinear deformation … and the resulting deformation.]
[Figure: A diffeomorphism and its inverse.]
Diffeomorphisms have curved trajectories (variable velocity) if followed in the Eulerian reference frame (fixed). If followed within the Lagrangian frame (moves over time), they appear to have constant velocity.

Partial Differential Equations
Model one image as it deforms to match another.
$$\dot{x}(t) = u(x(t))$$
$$x(1) = e^{u}(x(0))$$

Matrix representations of diffeomorphisms
$$x(1) = e^{U} x(0)$$
$$x(0) = e^{-U} x(1)$$
For large k, $e^{U} \approx (I + U/k)^k$

Compositions
Large deformations generated from compositions of small deformations
$$S_1 = S_{1/8} \circ S_{1/8} \circ S_{1/8} \circ S_{1/8} \circ S_{1/8} \circ S_{1/8} \circ S_{1/8} \circ S_{1/8}$$
Recursive formulation
$S_1 = S_{1/2} \circ S_{1/2}$, $S_{1/2} = S_{1/4} \circ S_{1/4}$, $S_{1/4} = S_{1/8} \circ S_{1/8}$
Small deformation approximation
$S_{1/8} \approx I + U/8$

The shape metric
o Don’t use the straight distance (i.e. $\sqrt{u^T u}$)
o Distance = $\sqrt{u^T L^T L u}$
o What’s the best form of L?
  o Membrane Energy
  o Bending Energy
  o Linear Elastic Energy

$L^T L$ for “membrane energy”
$L^T L u$ for “membrane energy” is generated by convolving with [kernel shown as a figure; not recoverable]

$L^T L$ for “bending energy”
$L^T L u$ for “bending energy” is generated by convolving with [kernel shown as a figure; not recoverable]

Registration with different models
[Figure: registration results.]

Consistent registration
Register to a mean shaped image
[Figure: images A, B and C registered to each other, and images A, B and C each registered to a mean-shaped image µ.]
Totally impractical for lots of scans
Problem: How can the distance between e.g. A and B be computed?
Inverse exponentiating is iterative and slow.

Baker-Campbell-Hausdorff series
o $$\mathrm{Exp}^{-1}(\mathrm{Exp}(A)\,\mathrm{Exp}(B)) = A + B + \tfrac{1}{2}[A,B] + \tfrac{1}{12}[A,[A,B]] - \tfrac{1}{12}[B,[A,B]] - \tfrac{1}{48}[B,[A,[A,B]]] - \tfrac{1}{48}[A,[B,[A,B]]] + \dots$$
o Where [A,B] is the Lie bracket applied to flow fields
[Figure: composing $e^A$ and $e^B$ in either order, with the mismatch labelled in terms of $AB - BA$.]
Sometimes unstable.
Looks like proper nonlinear methods would be impractical.
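
The Baker-Campbell-Hausdorff series can be checked numerically on small 3D rotations, where the flow fields reduce to skew-symmetric matrices. The following is only a minimal sketch, assuming NumPy is available: the helper names (`skew`, `expm`, `log_rotation`, `bracket`, `bch`) and the example angles are illustrative and not from the slides, and the closed-form rotation logarithm stands in for a general (iterative) inverse exponentiation.

```python
import numpy as np

def skew(t1, t2, t3):
    """Skew-symmetric 3x3 matrix built from three rotation parameters."""
    return np.array([[0.0, t1, t2],
                     [-t1, 0.0, t3],
                     [-t2, -t3, 0.0]])

def expm(A, k=6, terms=8):
    """Matrix exponential by scaling and squaring (illustrative settings)."""
    scaled = A / 2.0**k
    S = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for n in range(1, terms + 1):   # short Taylor series for exp(A / 2^k)
        term = term @ scaled / n
        S = S + term
    for _ in range(k):              # square k times: exp(A) = exp(A / 2^k)^(2^k)
        S = S @ S
    return S

def log_rotation(R):
    """Closed-form log of a 3D rotation; assumes 0 < angle < pi."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    return angle / (2.0 * np.sin(angle)) * (R - R.T)

def bracket(A, B):
    """Lie bracket [A, B] = AB - BA."""
    return A @ B - B @ A

def bch(A, B, order):
    """BCH series for Exp^-1(Exp(A) Exp(B)), truncated at the given order (1 to 4)."""
    AB = bracket(A, B)
    out = A + B
    if order >= 2:
        out = out + AB / 2
    if order >= 3:
        out = out + bracket(A, AB) / 12 - bracket(B, AB) / 12
    if order >= 4:
        out = out - bracket(B, bracket(A, AB)) / 48 - bracket(A, bracket(B, AB)) / 48
    return out

A = skew(0.3, 0.1, -0.2)            # arbitrary small rotations
B = skew(-0.1, 0.25, 0.15)
exact = log_rotation(expm(A) @ expm(B))
errors = [np.linalg.norm(exact - bch(A, B, order)) for order in range(1, 5)]
commutator_gap = np.linalg.norm(expm(A) @ expm(B) - expm(B) @ expm(A))
print("rotations fail to commute by:", commutator_gap)
print("truncation error for orders 1 to 4:", errors)
```

For rotations this small the truncation error shrinks as terms are added; the slide's warning about instability concerns larger deformations, where the series need not behave so well.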

Alternative strategy
o Assume the manifold is locally flat around some point (the template image).
o The results depend on the point on the manifold that is chosen.
o Ideally use a mean shape as the template.
  o Best approximation.

Visualisation
o The results of multivariate analyses are difficult to visualise
o For linear classifiers, it can be done by “caricaturing” the difference
o E.g. for the separation of two groups, it would be possible to show two exaggerated versions of the mean image.
[Figure: controls and patients separated by $y = f(a^T x + b)$.]
[Figure: caricatures at t=-1.0, t=-0.75, t=-0.5, t=-0.25, t=0.0, t=0.25, t=0.5, t=0.75, t=1.0.]

I could say more about the registration algorithm next time
But let’s just say that it shows potential…
[Figure: Average of 452 images. Only affine registered.]
[Figure: 2D average of 471 images.]
Registration of each 2D image takes about 3 seconds per iteration, and about 16 iterations. I see no problems scaling it to 3D.

Over-fitting
[Figure: an over-fitted classifier applied to test data.]
A simpler model can often do better...

Cross-validation
o Methods must be able to generalise to new data
o Various control parameters
  o More complexity -> better separation of training data
  o Less complexity -> better generalisation
o Optimal control parameters determined by cross-validation
  o Test with data not used for training
  o Use control parameters that work best for these data

Two-fold Cross-validation
Use half the data for training, and the other half for testing.
Then swap around the training and test data.

Leave One Out Cross-validation
Use all data except one point for training. The one that was left out is used for testing.
Then leave another point out. And so on...

Interpretation??
o Significance assessed from accuracy based on cross-validation.
o Main problems:
  o No simple interpretation.
  o Mechanism of classification is difficult to visualise
    o especially for nonlinear classifiers
  o Difficult to understand (not like blobs)
o May be able to use the separation to derive simple (and more publishable) hypotheses.

Group Theory
o Diffeomorphisms (smooth continuous one-to-one mappings) form a Group.
  o Closure
    o $A \circ B$ remains in the same group.
  o Associativity
    o $(A \circ B) \circ C = A \circ (B \circ C)$
  o Identity
    o Identity transform I exists.
  o Inverse
    o $A^{-1}$ exists, and $A^{-1} \circ A = A \circ A^{-1} = I$
o It is a Lie Group.
  o The group of diffeomorphisms constitutes a smooth manifold.
  o The operations are differentiable.

Lie Groups
o Simple Lie Groups include various classes of affine transform matrices.
  o E.g. SO(2) : Special Orthogonal 2D (rigid-body rotation in 2D).
  o Manifold is a circle
o Lie Algebra is exponentiated to give Lie group. For square matrices, this involves a matrix exponential.

Relevance to Diffeomorphisms
o Parameterise with velocities, rather than displacements.
o Velocities are the Lie Algebra. These are exponentiated to a deformation by recursive application of tiny displacements, over a period of time=0..1.
  o $A(1) = A(1/2) \circ A(1/2)$
  o $A(1/2) = A(1/4) \circ A(1/4)$
o Don’t actually use matrices.
o For tiny deformations, things are almost linear.
  o $x(1/1024) \approx x(0) + v_x/1024$
  o $y(1/1024) \approx y(0) + v_y/1024$
  o $z(1/1024) \approx z(0) + v_z/1024$
o Recursive application by
  o $x(1/2) = x(1/4)\,(x(1/4), y(1/4), z(1/4))$
  o $y(1/2) = y(1/4)\,(x(1/4), y(1/4), z(1/4))$
  o $z(1/2) = z(1/4)\,(x(1/4), y(1/4), z(1/4))$

Working with Diffeomorphisms
o Averaging Warps.
  o Distances on the manifold are given by geodesics.
  o Average of a number of deformations is a point on the manifold with the shortest sum of squared geodesic distances.
  o E.g. average position of London, Sydney and Honolulu.
o Inversion.
  o Negate the velocities, and exponentiate.
  o $x(1/1024) \approx x(0) - v_x/1024$
  o $y(1/1024) \approx y(0) - v_y/1024$
  o $z(1/1024) \approx z(0) - v_z/1024$
o Priors for registration
  o Based on smoothness of the velocities.
  o Velocities relate to distances from origin.
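
The recursive exponentiation and the "negate the velocities" inversion described above can be sketched in one dimension. This is a hedged illustration rather than the registration code itself: it assumes NumPy, uses a made-up stationary velocity field, and the function name `exponentiate`, the grid size and linear interpolation via `np.interp` are all choices made for the example.

```python
import numpy as np

def exponentiate(v, x, k=10):
    """Integrate a stationary 1D velocity field over unit time by scaling and squaring."""
    phi = x + v / 2.0**k               # tiny deformation: x(1/1024) ~ x(0) + v/1024
    for _ in range(k):                 # A(1/512) = A(1/1024) o A(1/1024), ... up to A(1)
        phi = np.interp(phi, x, phi)   # compose phi with itself by linear interpolation
    return phi

x = np.linspace(0.0, 1.0, 513)         # sample positions
v = 0.1 * np.sin(2.0 * np.pi * x)      # illustrative velocity field, zero at both ends

phi = exponentiate(v, x)               # forward deformation
phi_inv = exponentiate(-v, x)          # inversion: negate the velocities and exponentiate

# Composing the deformation with its inverse should give back the identity.
round_trip = np.interp(phi_inv, x, phi)
err_diffeo = np.max(np.abs(round_trip - x))

# Small-deformation alternative: treat x - u as the inverse of x + u.
u = phi - x
small_def = np.interp(x - u, x, phi)
err_small = np.max(np.abs(small_def - x))

print("one-to-one (monotonic):", bool(np.all(np.diff(phi) > 0)))
print("round-trip error, exponentiated inverse:", err_diffeo)
print("round-trip error, small-deformation inverse:", err_small)
```

In this toy case the exponentiated inverse returns to the identity far more closely than subtracting the displacement does, which is the point of parameterising with velocities. A 3D version would replace `np.interp` with trilinear resampling of each coordinate field.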