The Geometry of Random Fields in Astrophysics and Brain Mapping

Keith Worsley, McGill
Jonathan Taylor, Stanford and Université de Montréal
Robert Adler, Technion
Frédéric Gosselin, Université de Montréal
Philippe Schyns, Fraser Smith, Glasgow
Arnaud Charil, Montreal Neurological Institute

Astrophysics

Sloan Digital Sky Survey, release 6, Aug. '07; survey data smoothed with FWHM = 19.8335.

[Figure: observed vs. expected Euler characteristic (EC) of the smoothed galaxy density field, plotted against the Gaussian threshold from −5 to 5. High positive thresholds give a "meat ball" topology, thresholds near zero a "sponge" topology, and low negative thresholds a "bubble" topology.]

Nature (2005)

The subject is shown one of 40 faces chosen at random (Happy, Sad, Fearful or Neutral), but the face is only revealed through random "bubbles".

First trial: "Sad" expression. 75 random bubble centres are smoothed by a Gaussian "bubble" to give the mask through which the subject sees the face. The subject is asked the expression; response: "Neutral" (incorrect).

Your turn …
Trial 2: subject response "Fearful" (CORRECT)
Trial 3: subject response "Happy" (INCORRECT: Fearful)
Trial 4: subject response "Happy" (CORRECT)
Trial 5: subject response "Fearful" (CORRECT)
Trial 6: subject response "Sad" (CORRECT)
Trial 7: subject response "Happy" (CORRECT)
Trial 8: subject response "Neutral" (CORRECT)
Trial 9: subject response "Happy" (CORRECT)
…
Trial 3000: subject response "Happy" (INCORRECT: Fearful)

Bubbles analysis

E.g. Fearful (3000/4 = 750 trials): sum the bubble masks over trials 1 + 2 + 3 + … + 750, for the correct trials and for all trials.

Proportion of correct bubbles = (sum of correct bubbles) / (sum of all bubbles).

Threshold this image at the proportion of correct trials = 0.68, scale to [0, 1], and use the result as a bubble mask.

Results: the mask applied to the average face, for Happy, Sad, Fearful and Neutral. But are these features real or just noise?
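The proportion-of-correct-bubbles computation above is just a few array operations. A minimal numpy sketch; the mask size, trial count and the random stand-ins for the real bubble masks and responses are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, h, w = 750, 64, 64    # e.g. the 750 "Fearful" trials (sizes illustrative)

# One smoothed bubble mask per trial; in the experiment each mask is 75
# Gaussian bubbles summed over the face image, here just random fields.
bubbles = rng.random((n_trials, h, w))

# 1 = correct response, 0 = incorrect, at roughly the observed 0.68 accuracy.
correct = rng.random(n_trials) < 0.68

# Proportion of correct bubbles = (sum of correct bubbles) / (sum of all bubbles).
prop = bubbles[correct].sum(axis=0) / bubbles.sum(axis=0)

# Threshold at the proportion of correct trials and use as a bubble mask
# on the average face.
mask = prop >= correct.mean()
```

Thresholding `prop` at the overall accuracy mirrors the slides' cutoff of 0.68; the statistical version that follows replaces `prop` by a 2-sample Z-statistic.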
Need statistics …

Statistical analysis

Correlate the bubbles with the response (correct = 1, incorrect = 0), separately for each expression. This is equivalent to a 2-sample Z-statistic for correct vs. incorrect bubbles, e.g. for the 750 Fearful trials. The resulting Z ~ N(0,1) statistic image is very similar to the proportion of correct bubbles.

Results

Thresholded at Z = 1.64 (P = 0.05, uncorrected), shown over the average face for Happy, Sad, Fearful and Neutral. Multiple comparisons correction? Need random field theory …

Euler Characteristic Heuristic

Euler characteristic (EC) = #blobs − #holes (in 2D), for the excursion set X_t = {s : Z(s) ≥ t}.

[Figure: excursion sets of the neutral-face Z statistic at a sequence of thresholds, with EC = 0, 30, 20, 0, −7, −11, 13, 14, 9, 0; observed vs. expected EC(X_t) plotted against the threshold t.]

Heuristic: at high thresholds t the holes disappear, EC ≈ 1 or 0, and E(EC) ≈ P(max Z ≥ t).

• There is an exact expression for E(EC) at all thresholds.
• The approximation E(EC) ≈ P(max Z ≥ t) is extremely accurate.

The result

If Z(s) ~ N(0,1) is an isotropic Gaussian random field, s ∈ ℝ², with λ² I_{2×2} = Var(∂Z/∂s), then

  P(max_{s∈S} Z(s) ≥ t) ≈ E(EC(S ∩ {s : Z(s) ≥ t}))
    = L₀(S) ∫_t^∞ (2π)^{−1/2} e^{−z²/2} dz
    + λ L₁(S) (2π)^{−1} e^{−t²/2}
    + λ² L₂(S) (2π)^{−3/2} t e^{−t²/2},

where L₀(S) = EC(S), L₁(S) = ½ Perimeter(S) and L₂(S) = Area(S) are the Lipschitz-Killing curvatures of S, and the three functions of t are the EC densities ρ₀(Z ≥ t), ρ₁(Z ≥ t), ρ₂(Z ≥ t) of Z above t.

If Z(s) is white noise convolved with an isotropic Gaussian filter of Full Width at Half Maximum FWHM, then λ = √(4 log 2) / FWHM.

[Figure: Z as white noise convolved with a Gaussian filter of the given FWHM.]

Results, corrected for search

Random field theory threshold: Z = 3.92 (P = 0.05), shown for Happy, Sad, Fearful and Neutral over the average face.
Bonferroni threshold: Z = 4.87 (P = 0.05): nothing survives.

Theory (1981, 1995)

Let T(s), s ∈ S ⊂ ℝ^D, be a smooth isotropic random field. Let X_t = {s : T(s) ≥ t} be the excursion set, and let R_t = {z : f(z) ≥ t} be the rejection region of T. Then

  E(EC(S ∩ X_t)) = Σ_{d=0}^{D} L_d(S) ρ_d(R_t).

Proof. (Hadwiger, 1930s): suppose φ(S), S ⊂ ℝ^D, is a set functional that is invariant under translations and rotations of S, and satisfies the additivity property

  φ(A ∪ B) = φ(A) + φ(B) − φ(A ∩ B).

Then φ(S) must be a linear combination of the intrinsic volumes L_d(S):

  φ(S) = Σ_{d=0}^{D} L_d(S) c_d.

The choice φ(S) = E(EC(S ∩ X_t)) is invariant under translations and rotations because the random field is isotropic, and is additive because the EC is additive:

  EC(A ∪ B) = EC(A) + EC(B) − EC(A ∩ B).

Lipschitz-Killing curvature L_d(S): Steiner-Weyl Tube Formula (1930)

• Put a tube of radius r about the search region λS, where λ = Sd(∂Z/∂s).
• Find its volume, expand it as a power series in r, and pull off the coefficients:

  |Tube(λS, r)| = Σ_{d=0}^{D} π^{d/2} / Γ(d/2 + 1) L_{D−d}(S) r^d.

EC density ρ_d(R_t): Morse theory method (1981, 1995)

The EC has a point-set representation:

  EC(S ∩ X_t) = Σ_s 1{T(s) ≥ t} 1{∂T(s)/∂s = 0} sign det(−∂²T(s)/∂s∂s′) + boundary terms,

so that

  ρ_D(R_t) = λ^{−D} E( 1{T ≥ t} det(−∂²T/∂s∂s′) | ∂T/∂s = 0 ) p_{∂T/∂s}(0),

where p_{∂T/∂s}(0) is the density of ∂T/∂s at zero. For a Gaussian random field,

  ρ_d(Z ≥ t) = ( −(2π)^{−1/2} ∂/∂t )^d P(Z ≥ t).

Lipschitz-Killing curvature L_d(S) of a triangle: Steiner-Weyl Volume of Tubes Formula (1930)

  Area(Tube(λS, r)) = Σ_{d=0}^{D} π^{d/2} / Γ(d/2 + 1) L_{D−d}(S) r^d
    = L₂(S) + 2 L₁(S) r + π L₀(S) r²
    = Area(λS) + Perimeter(λS) r + π EC(λS) r²,

so

  L₀(S) = EC(λS), L₁(S) = ½ Perimeter(λS), L₂(S) = Area(λS).

Lipschitz-Killing curvatures are just "intrinsic volumes" or "Minkowski functionals" in the (Riemannian) metric of the variance of the derivative of the process.

Lipschitz-Killing curvature L_d(S) of any set S: triangulate S and measure the triangles in the metric λ = Sd(∂Z/∂s).

[Figure: a region S filled with small triangles, edge lengths scaled by λ.]
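The Steiner-Weyl tube formula can be checked numerically for a single triangle. A sketch assuming λ = 1 (so λS = S) and a hypothetical 3-4-5 right triangle: a grid count of the tube's area is compared with Area(S) + Perimeter(S) r + π EC(S) r².

```python
import numpy as np

# 3-4-5 right triangle: Area = 6, Perimeter = 12, EC = 1 (lambda = 1 assumed).
a, b, c = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.0, 3.0])
r = 0.5

def dist_to_segment(p, u, v):
    """Euclidean distance from each point in p (N x 2) to the segment uv."""
    uv, up = v - u, p - u
    t = np.clip(up @ uv / (uv @ uv), 0.0, 1.0)
    return np.linalg.norm(up - np.outer(t, uv), axis=1)

def left_of(u, v, p):
    """True where p lies on or left of the directed line u -> v."""
    return (v[0]-u[0])*(p[:,1]-u[1]) - (v[1]-u[1])*(p[:,0]-u[0]) >= 0

# Regular grid covering the tube; each point stands for a cell of area step^2.
step = 0.005
x = np.arange(-1.0, 5.0, step)
y = np.arange(-1.0, 4.0, step)
X, Y = np.meshgrid(x, y)
p = np.column_stack([X.ravel(), Y.ravel()])

inside = left_of(a, b, p) & left_of(b, c, p) & left_of(c, a, p)  # CCW triangle
dist = np.minimum.reduce([dist_to_segment(p, a, b),
                          dist_to_segment(p, b, c),
                          dist_to_segment(p, c, a)])
tube_area_grid = np.count_nonzero(inside | (dist <= r)) * step**2

# Steiner-Weyl: Area + Perimeter * r + pi * EC * r^2.
tube_area_formula = 6.0 + 12.0 * r + np.pi * 1 * r**2
```

With this step size the grid count and the formula agree to within the grid resolution; the coefficients of r and r² are exactly the L₁ and L₀ terms of the expansion.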
Lipschitz-Killing curvatures of the pieces of a triangulation:

  L₀(vertex) = 1, L₀(edge) = 1, L₀(triangle) = 1
  L₁(edge) = edge length, L₁(triangle) = ½ perimeter
  L₂(triangle) = area

Lipschitz-Killing curvatures of the union S of the triangles, by inclusion-exclusion:

  L₀(S) = Σ L₀(vertex) − Σ L₀(edge) + Σ L₀(triangle)
  L₁(S) = Σ L₁(edge) − Σ L₁(triangle)
  L₂(S) = Σ L₂(triangle)

Non-isotropic data

We must restrict T(s) to T(s) = f(Z(s)), s ∈ S ⊂ ℝ^D, where Z(s) = (Z₁(s), …, Z_n(s)) and the Z_i(s) are independent and identically distributed non-isotropic Gaussian random fields with Z_i(s) ~ N(0, 1). Luckily this covers many of the usual test statistics, such as T, χ² and F for testing contrasts in a linear model.

Heuristic: if we know the spatial correlation function of Z(s), can we warp (deform) the space so that the data become isotropic? Obviously not globally, but perhaps locally … We may need to embed S in a higher-dimensional space … How many dimensions are needed? The Nash Embedding Theorem says it is finite.

Better idea: replace the local Euclidean distance by the variogram

  d(s₁, s₂) = Var(Z(s₁) − Z(s₂)),

or the Riemannian metric by λ(s)² = Var(∂Z(s)/∂s).

[Figure: a non-isotropic Z ~ N(0,1) field on a triangulated region; edge lengths are measured in the local metric λ(s) = Sd(∂Z(s)/∂s), and the same triangle and inclusion-exclusion rules as above give the L_d(S).]

Estimating Lipschitz-Killing curvature L_d(S)

We need independent and identically distributed random fields, e.g. residuals from a linear model, Z₁, Z₂, …, Z_n. Replace the coordinates of the triangles in ℝ² by the normalised residuals Z/‖Z‖ ∈ ℝ^n, Z = (Z₁, …, Z_n), then apply the same triangle and inclusion-exclusion rules.

Scale space

How much to smooth the data? Why not try all smooths in a range, then take the maximum over smooths as well. Suppose f is a kernel with ∫ f² = 1 and B is a Brownian sheet. Then the scale space random field is

  Z(s, w) = w^{−D/2} ∫ f((s − h)/w) dB(h) ~ N(0, 1)

(note that f is scaled to preserve the variance, not the mean). The Lipschitz-Killing curvatures are

  L_i(S × [w₁, w₂]) = (w₁^{−1} + w₂^{−1})/2 L_i(S)
    + Σ_{j=0}^{⌊(D−i+1)/2⌋} (w₁^{1−i−2j} − w₂^{1−i−2j}) / (i + 2j − 1)
      × κ^{(1−2j)/2} (−1)^j (i + 2j − 1)! / ( (1 − 2j) (4π)^j j! (i − 1)! ) L_{i+2j−1}(S),

where κ = ∫ ( s′ ∂f(s)/∂s + D f(s)/2 )² ds.

Beautiful symmetry:

  E(EC(S ∩ X_t)) = Σ_{d=0}^{D} L_d(S) ρ_d(R_t)

• Lipschitz-Killing curvature L_d(S): Steiner-Weyl Tube Formula (1930). Put a tube of radius r about the search region λS, find its volume, expand as a power series in r, and pull off the coefficients:

  |Tube(λS, r)| = Σ_{d=0}^{D} π^{d/2} / Γ(d/2 + 1) L_{D−d}(S) r^d.

• EC density ρ_d(R_t): Taylor's Gaussian Tube Formula (2003). Put a tube of radius r about the rejection region R_t, find its probability, expand as a power series in r, and pull off the coefficients:

  P(Tube(R_t, r)) = Σ_{d=0}^{∞} (2π)^{d/2} / d! ρ_d(R_t) r^d.

[Figure: a tube of radius r about λS in the search space, and about R_t in the (Z₁, Z₂) plane, reaching down from t to t − r.]

EC density ρ_d(χ̄ ≥ t) of the χ̄ statistic

  χ̄(s) = max_{0 ≤ θ ≤ π/2} ( Z₁(s) cos θ + Z₂(s) sin θ )

[Figure: rejection region R_t of χ̄ in the (Z₁, Z₂) plane and its tube of radius r.]

Taylor's Gaussian Tube Formula (2003):
  P((Z₁, Z₂) ∈ Tube(R_t, r)) = Σ_{d=0}^{∞} (2π)^{d/2} / d! ρ_d(χ̄ ≥ t) r^d
    = ρ₀(χ̄ ≥ t) + (2π)^{1/2} ρ₁(χ̄ ≥ t) r + (2π) ρ₂(χ̄ ≥ t) r²/2 + ···
    = ∫_{t−r}^∞ (2π)^{−1/2} e^{−z²/2} dz + e^{−(t−r)²/2}/4.

Expanding in powers of r and matching coefficients:

  ρ₀(χ̄ ≥ t) = ∫_t^∞ (2π)^{−1/2} e^{−z²/2} dz + e^{−t²/2}/4
  ρ₁(χ̄ ≥ t) = (2π)^{−1} e^{−t²/2} + (2π)^{−1/2} t e^{−t²/2}/4
  ρ₂(χ̄ ≥ t) = (2π)^{−3/2} t e^{−t²/2} + (2π)^{−1} (t² − 1) e^{−t²/2}/4
  ⋮

EC densities for some standard test statistics

Using the Morse theory method (1981, 1995):
• T, χ², F (1994)
• Scale space (1995, 2001)
• Hotelling's T² (1999)
• Correlation (1999)
• Roy's maximum root, maximum canonical correlation (2007)
• Wilks' Lambda (2007) (approximation only)

Using the Gaussian Kinematic Formula:
• T, χ², F are now one line …
• Likelihood ratio tests for cone alternatives (e.g. chi-bar, beta-bar) and non-negative least squares (2007) …

Accuracy of the P-value approximation

If Z(s) ~ N(0,1) is an isotropic Gaussian random field, s ∈ ℝ², with λ² I_{2×2} = Var(∂Z/∂s), then

  P(max_{s∈S} Z(s) ≥ t) ≈ E(EC(S ∩ {s : Z(s) ≥ t}))
    = EC(S) ∫_t^∞ (2π)^{−1/2} e^{−z²/2} dz + λ ½ Perimeter(S) (2π)^{−1} e^{−t²/2} + λ² Area(S) (2π)^{−3/2} t e^{−t²/2}
    = c₀ ∫_t^∞ (2π)^{−1/2} e^{−z²/2} dz + (c₁ + c₂ t + ··· + c_D t^{D−1}) e^{−t²/2},

and

  | P(max_{s∈S} Z(s) ≥ t) − E(EC(S ∩ {s : Z(s) ≥ t})) | = O(e^{−αt²/2}), α > 1.

The expected EC gives all the polynomial terms in the expansion of the P-value.

Bubbles task in the fMRI scanner

Correlate the bubbles with the BOLD signal at every voxel over the 3000 trials, and calculate Z for each pair (bubble pixel, fMRI voxel): a 5D "image" of Z statistics … Thresholding?

Cross correlation random field

Correlation between two fields at two different locations, searched over all pairs of locations, one in S and one in T:

  P(max_{s∈S, t∈T} C(s, t) ≥ c) ≈ E(EC{s ∈ S, t ∈ T : C(s, t) ≥ c})
    = Σ_{i=0}^{dim(S)} Σ_{j=0}^{dim(T)} L_i(S) L_j(T) ρ_{ij}(C ≥ c),

where, with h = i + j,

  ρ_{ij}(C ≥ c) = ( 2^{n−2−h} (i − 1)! j! / π^{h/2+1} )
    × Σ_{k=0}^{⌊(h−1)/2⌋} (−1)^k c^{h−1−2k} (1 − c²)^{(n−1−h)/2+k}
    × Σ_{l=0}^{k} Σ_{m=0}^{k} Γ((n−i)/2 + l) Γ((n−j)/2 + m)
      / ( l! m! (k−l−m)! (n−1−h+l+m+k)! (i−1−k−l+m)! (j−k−m+l)! ).

Cao & Worsley, Annals of Applied Probability (1999).

Bubbles data: P = 0.05, n = 3000, c = 0.113, T = 6.22.

MS lesions and cortical thickness

Idea: MS lesions interrupt neuronal signals, causing thinning in downstream cortex.
Data: n = 425 mild MS patients; lesion density smoothed 10 mm, cortical thickness smoothed 20 mm.
Find connectivity, i.e. find voxels in 3D and nodes in 2D with high correlation(lesion density, cortical thickness); look for high negative correlations …
Threshold: P = 0.05, c = 0.300, T = 6.48.

[Figure: average cortical thickness against average lesion volume for the n = 425 subjects; correlation = −0.568.]

Summary

• Points are in a low-dimensional, physically meaningful space; smooth (choice of kernel? scale space …), then threshold.
• Galaxies: look for sheets, strings and clusters of high density. The EC is used to measure "topology"; other intrinsic volumes (diameter, surface area, volume) are also used. Compare the observed EC with the expected EC under some model (e.g. "inflation").
• Bubbles, MS lesions and brain mapping data: detect sparse high-density clusters or "activations" with a very low signal to noise. Thresholding gives maximum likelihood estimates under certain conditions. The EC is merely a device for getting an extremely accurate approximation to the false positive rate (the P-value of the maximum).
• The Riemannian metric is usually unknown, but we only need to estimate the Lipschitz-Killing curvatures: fill the search region with small simplices, work out the LKC of each component, and sum using the inclusion-exclusion formula.
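The last summary point can be sketched in a few lines. A minimal sketch assuming λ(s) ≡ 1, so plain Euclidean edge lengths stand in for the estimated metric (in practice the triangle coordinates would be replaced by the normalised residuals); the unit-square triangulation is purely illustrative:

```python
import numpy as np
from itertools import combinations

def lkc(vertices, triangles):
    """Lipschitz-Killing curvatures L0, L1, L2 of a union of 2D triangles,
    by inclusion-exclusion over vertices, edges and triangles."""
    vertices = np.asarray(vertices, dtype=float)
    edges = {tuple(sorted(e)) for t in triangles for e in combinations(t, 2)}
    points = {v for t in triangles for v in t}

    def length(e):
        return np.linalg.norm(vertices[e[1]] - vertices[e[0]])

    def half_perimeter(t):
        return sum(length(e) for e in combinations(t, 2)) / 2.0

    def area(t):
        (ax, ay), (bx, by), (cx, cy) = vertices[list(t)]
        return abs((bx - ax) * (cy - ay) - (by - ay) * (cx - ax)) / 2.0

    L0 = len(points) - len(edges) + len(triangles)   # EC by inclusion-exclusion
    L1 = sum(map(length, edges)) - sum(map(half_perimeter, triangles))
    L2 = sum(map(area, triangles))
    return L0, L1, L2

# Unit square as two triangles: EC = 1, half perimeter = 2, area = 1.
L0, L1, L2 = lkc([(0, 0), (1, 0), (1, 1), (0, 1)],
                 [(0, 1, 2), (0, 2, 3)])
```

The interior diagonal's length enters once positively (as an edge) and once negatively (inside the two half-perimeters), so it cancels, leaving L₁ equal to half the boundary perimeter, as it should.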