ISI Platinum Jubilee, Jan 1-4, 2008

Roy's union-intersection principle, random fields, and brain imaging
Keith Worsley, McGill
Jonathan Taylor, Stanford and Université de Montréal
Sankhya (1937) 3:65-72

Deformation Based Morphometry (DBM) (Tomaiuolo et al., 2004)
n1 = 19 non-missile brain trauma patients, 3-14 days in coma;
n2 = 17 age- and gender-matched controls.
Data: the non-linear vector deformations needed to warp each MRI to an atlas standard.
  Y1i(s) = deformation vector of trauma patient i at point s
  Y2i(s) = deformation vector of control subject i at point s
Locate damage: find the regions where the deformations differ.
Test statistic: Hotelling's T² (1931),
  T(s)² = (Ȳ1(s) − Ȳ2(s))′ Σ̂⁻¹ (Ȳ1(s) − Ȳ2(s)).
[Photo: P.C. Mahalanobis with Harold Hotelling, at Gupta Nivas, 1939.]

Multivariate linear models for random field data
Y(s)_{n×q} is the observations × variables data matrix at point s ∈ S ⊂ R^D. At every point s we have a multivariate linear model
  Y(s)_{n×q} = X_{n×p} B(s)_{p×q} + E(s)_{n×q},   E(s) ~ N(0, I ⊗ Σ).
We detect the sparse points s where B(s) ≠ 0 by testing H0 : B(s) = 0 at every point s. Test statistics T(s):

                      # variables q = 1    q > 1
  # regressors p = 1:     T               Hotelling's T²
               p > 1:     F               Wilks' Λ, Pillai's trace, Roy's max root, etc.

We need the null distribution of the maximum: P(max_{s∈S} T(s) ≥ t).

The EC heuristic
At high thresholds t the excursion set X_t = {s : T(s) ≥ t} is either a single connected component (EC = 1) or empty (EC = 0), so
  P( max_{s∈S} T(s) ≥ t ) ≈ E( EC(S ∩ X_t) ).

Theorem (1981, 1995). If T(s) is a smooth isotropic random field, then
  E( EC(S ∩ X_t) ) = Σ_{d=0}^{D} μ_d(S) ρ_d(t),
a sum of intrinsic volumes μ_d(S) times EC densities ρ_d(t).

For S with smooth boundary ∂S having curvature matrix C(s),
  μ_d(S) = Γ((D−d)/2) / (2 π^{(D−d)/2}) ∫_{∂S} detr_{D−1−d}{C(s)} ds,   and μ_D(S) = |S|.
For D = 3:
  μ0(S) = EC(S),  μ1(S) = 2 Diameter(S),  μ2(S) = ½ Surface area(S),  μ3(S) = Volume(S).

Morse theory gives
  EC(S ∩ X_t) = Σ_s 1{T(s) ≥ t} 1{∂T(s)/∂s = 0} sign det( −∂²T/∂s∂s′ ) + boundary terms,
so that the EC density is
  ρ_D(t) = E( 1{T ≥ t} det( −∂²T/∂s∂s′ ) | ∂T/∂s = 0 ) P(∂T/∂s = 0).

For a Gaussian random field Z with λ = Sd(∂Z/∂s),
  ρ_d(t) = ( −(λ/√(2π)) ∂/∂t )^d P(Z ≥ t).

After lots of messy algebra … the EC densities are known for χ², T and F fields (1994), and for Hotelling's T² with ν df (1999):
  ρ0(t) = Γ((ν+1)/2) / ( Γ((ν−q+1)/2) Γ(q/2) ) ∫_t^∞ (1+u)^{−(ν+1)/2} u^{(q−2)/2} du,
  ρ1(t) = π^{−1/2} Γ((ν+1)/2) / ( Γ((ν−q+2)/2) Γ(q/2) ) (1+t)^{−(ν−1)/2} t^{(q−1)/2},
  ρ2(t) = π^{−1} Γ((ν+1)/2) / ( Γ((ν−q+1)/2) Γ(q/2) ) (1+t)^{−(ν−1)/2} t^{(q−2)/2} ( t − (q−1)/(ν−q+1) ),
  ρ3(t) = π^{−3/2} Γ((ν+1)/2) / ( Γ((ν−q)/2) Γ(q/2) ) (1+t)^{−(ν−1)/2} t^{(q−3)/2} ( t² − (2q−1)/(ν−q) t + (q−1)(q−2)/((ν−q+2)(ν−q)) ).

The last case:

                      # variables q = 1    q > 1
  # regressors p = 1:     T ✔             Hotelling's T² ✔
               p > 1:     F ✔             Wilks' Λ, Pillai's trace, Roy's max root, etc. ?

We shall now find a P-value approximation (though not quite the EC density) for the maximum of Roy's maximum root,
  T(s) = maximum eigenvalue of Y(s)′X(X′X)⁻¹X′Y(s) ( Y(s)′(I − X(X′X)⁻¹X′)Y(s) )⁻¹.
The messy algebra above is just too complicated. Instead …

Roy's union-intersection principle
Make the model univariate by multiplying by a vector u_{q×1}:
  (Y(s)u)_{n×1} = X_{n×p} (B(s)u)_{p×1} + (E(s)u)_{n×1},   H0u : (B(s)u)_{p×1} = 0.
Let F(s, u) be the usual F-statistic for testing H0u, and let γ_q ⊂ R^q be the unit sphere. Roy's maximum root is
  T(s) = max_{u∈γ_q} F(s, u).
Now F(s, u) is an F-field on the search region S × γ_q, and we already know the EC density ρ_d(F ≥ t) of the F-field, so
  P( max_{s∈S} T(s) ≥ t ) = P( max_{s∈S, u∈γ_q} F(s, u) ≥ t ) ≈ ½ Σ_{d=0}^{D+q} μ_d(S × γ_q) ρ_d(F ≥ t). ✔
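The expected Euler characteristic in the theorem above is easy to evaluate numerically. Below is a minimal sketch for a Gaussian random field over a ball-shaped search region, using the D = 3 intrinsic volumes listed above and the Gaussian EC densities obtained from ρ_d(t) = (−(λ/√(2π)) ∂/∂t)^d P(Z ≥ t). The roughness λ = 0.1 mm⁻¹, the 50 mm radius, and the thresholds are illustrative assumptions, not values from the talk.

```python
import numpy as np
from scipy.stats import norm

def gaussian_ec_densities(t, lam=1.0):
    """EC densities rho_d(t) = (lam/sqrt(2*pi))**d * (-d/dt)**d P(Z >= t), d = 0..3."""
    e = np.exp(-t**2 / 2)
    return np.array([
        norm.sf(t),                                # d = 0: tail probability
        lam * e / (2 * np.pi),                     # d = 1
        lam**2 * t * e / (2 * np.pi)**1.5,         # d = 2
        lam**3 * (t**2 - 1) * e / (2 * np.pi)**2,  # d = 3
    ])

def ball_intrinsic_volumes(radius):
    """mu_0..mu_3 of a ball, using the D = 3 conventions above."""
    return np.array([1.0,                         # EC
                     4 * radius,                  # 2 * diameter
                     2 * np.pi * radius**2,       # half the surface area
                     4 / 3 * np.pi * radius**3])  # volume

def expected_ec(t, radius, lam=1.0):
    return ball_intrinsic_volumes(radius) @ gaussian_ec_densities(t, lam)

# P-value approximation for the maximum over a 50 mm ball, with lam in mm^-1
p_approx = expected_ec(4.5, radius=50, lam=0.1)
```

The same three lines of linear algebra apply to any of the fields above once their EC densities ρ_d are substituted.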
Why the factor ½? Because F(s, u) = F(s, −u), so each maximum is counted twice. Expanding the intrinsic volumes of the product S × γ_q,
  P( max_{s∈S} T(s) ≥ t ) ≈ ½ Σ_{d=0}^{D} μ_d(S) Σ_{k=0}^{q} μ_k(γ_q) ρ_{d+k}(F ≥ t), ✔
which is almost the EC density of the Roy's maximum root field.

Non-negative least squares random field theory

fMRI data: 120 scans, 3 scans each of hot, rest, warm, rest, …
[Figure: the first scan of the fMRI data; time courses at two voxels, one with a highly significant effect (T = 6.59) and one with drift but no significant effect (T = −0.74); the T statistic image for the hot − warm effect.]
  T = (hot − warm effect) / Sd ~ t_110 if there is no effect.

Linear model regressors
Alternating hot and warm stimuli separated by rest (9 seconds each). The hemodynamic response function (HRF) delays and disperses the stimuli by about 6 s. The regressors are x(t) = stimuli * HRF, sampled every 3 seconds.

Linear model for fMRI time series with AR(p) errors:
  Y(t) = x(t)β + z(t)γ + ε(t),
  ε(t) = a1 ε(t−1) + ··· + ap ε(t−p) + σ WN(t),
where t = time, Y(t) = fMRI data, x(t) = stimuli convolved with the HRF, z(t) = drift etc., ε(t) = error, WN(t) ~ N(0, 1) independently, and β, γ, a1, …, ap, σ are unknown parameters.

Allowing for an unknown delay of the HRF
Replace x(t) by two responses with extreme HRF delays,
  x1(t) = x(t) with the HRF shifted by +2 seconds,
  x2(t) = x(t) with the HRF shifted by −2 seconds,
  Y(t) = x1(t)β1 + x2(t)β2 + z(t)γ + ε(t).
When the coefficients are non-negative, β1 ≥ 0 and β2 ≥ 0, we get the range of responses in between.

Non-negative coefficients give a cone alternative
Recall that the model is Y(t) = x1(t)β1 + x2(t)β2 + z(t)γ + ε(t), with β1 ≥ 0, β2 ≥ 0. Let z1, z2 be an orthogonal basis for x1, x2.
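The regressor construction just described can be sketched in a few lines: convolve the 9-second block design with an HRF and sample every 3 seconds (giving 120 scans, 3 per block). The gamma-shaped HRF peaking near 5-6 s is an illustrative stand-in, not necessarily the HRF used in the talk.

```python
import numpy as np

dt = 0.1                                   # fine time grid (seconds)
t = np.arange(0, 360, dt)                  # 360 s = 120 scans at 3 s each

# Block design: hot, rest, warm, rest, ... each block 9 seconds long
block = (t // 9).astype(int) % 4
hot = (block == 0).astype(float)
warm = (block == 2).astype(float)

# Illustrative gamma-shaped HRF peaking near 5 s: delays and disperses the stimulus
h = np.arange(0, 30, dt)
hrf = h**5 * np.exp(-h)
hrf /= hrf.sum()                           # normalise so responses stay in [0, 1]

# Regressors: stimuli convolved with the HRF, sampled every 3 seconds
x_hot = np.convolve(hot, hrf)[: len(t)][:: int(3 / dt)]
x_warm = np.convolve(warm, hrf)[: len(t)][:: int(3 / dt)]
```

Shifting `hrf` by ±2 s before convolving gives the two extreme regressors x1(t), x2(t) of the cone model above.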
Then E(Y) falls inside a cone alternative whose apex is the null β1 = β2 = 0 and whose cone angle θ is the angle between x1 and x2.

Example with three extremes:
  Y(t) = x1(t)β1 + x2(t)β2 + x3(t)β3 + z(t)γ + ε(t),   β1, β2, β3 ≥ 0,
with z1, z2, z3 an orthogonal basis for x1 (standard HRF), x2 (delayed 4 seconds) and x3 (spread 4 seconds): a 3D cone.

General non-negative least squares problem
In general we may have p constrained regressors and q unconstrained regressors. The constrained regressors could be the extremes of the HRF, e.g. min/max latency shift, min/max spread, etc.; the model is then all HRFs in between. In vector form:
  Y_{n×1} = X_{n×p} β_{p×1} + Z_{n×q} γ_{q×1} + ε_{n×1},
  β_{p×1} ≥ 0 (component-wise),   γ_{q×1} unconstrained,   ε_{n×1} ~ N(0_{n×1}, I_{n×n} σ²) (without loss of generality).
We might also have far more regressors than observations(!), i.e. p >> n: pick a range of, say, p = 150 plausible values ν1, …, νp of the non-linear parameter ν.
Footnote: Woolrich et al. (2004) replace the "hard" constraints by "soft" constraints through a prior distribution on β, taking a Bayesian approach.

Fitting the NNLS model
Simple:
• Do "all subsets" regression.
• Throw out any model that does not satisfy the non-negativity constraints.
• Among those left, pick the model with the smallest error sum of squares.
For larger models there are more efficient methods, e.g. Lawson & Hanson (1974).
The non-negativity constraints tend to enforce sparsity, even if the regressors are highly correlated (e.g. PET). Why? Highly correlated regressors get huge positive and negative unconstrained coefficients, and non-negativity suppresses the negative ones.

Example: n = 20, p = 150, but surprisingly it does not overfit.
[Figure: tracer concentration Y against time, with the fit Ŷ and its two components; β̂41 = 107.4, β̂40 = 46.9, β̂87 = 5.4, β̂86 = 71.7, and β̂j = 0 for the rest.]
We tend to get sparse pairs of adjacent regressors, suggesting that the best regressor is somewhere in between.
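The all-subsets recipe above is implemented far more efficiently by the Lawson & Hanson (1974) active-set algorithm, available as `scipy.optimize.nnls`. A sketch of the n = 20, p = 150 situation with highly correlated regressors; the bump-shaped regressors and the two nonzero "true" coefficients are invented for illustration.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n, p = 20, 150

# p highly correlated bump-shaped regressors evaluated at n time points
times = np.linspace(0, 1, n)
centers = np.linspace(0, 1, p)
X = np.exp(-(times[:, None] - centers[None, :])**2 / (2 * 0.1**2))

beta_true = np.zeros(p)
beta_true[[40, 90]] = [50.0, 70.0]      # two active regressors (illustrative)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_hat, resid_norm = nnls(X, y)       # Lawson-Hanson active-set NNLS
n_active = int(np.sum(beta_hat > 1e-8))
```

Even with p >> n, the non-negative fit activates only a handful of regressors (never more than n), illustrating the sparsity-inducing effect described above.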
P-values for testing the cone alternative?
  H0 : β_{p×1} = 0 → error sum of squares SSE0;
  H1 : β_{p×1} ≥ 0 (component-wise) → error sum of squares SSE1.
The likelihood ratio test statistic is the Beta-bar statistic, equal to the coefficient of determination or squared multiple correlation R²:
  B̄ = (SSE0 − SSE1) / SSE0.
If there were no constraints, i.e. H1 : β ≠ 0, the null distribution would be
  P(B̄ ≥ t) = P( Beta(p/2, (ν−p)/2) ≥ t ),   ν = n − q.
With the constraints, the null distribution is a weighted average of Beta distributions, hence the name Beta-bar (Lin & Lindsay; Takemura & Kuriki, 1997):
  P(B̄ ≥ t) = Σ_{j=1}^{ν} wj P( Beta(j/2, (ν−j)/2) ≥ t )   (interpret Beta(ν/2, 0) ≡ 1 when j = ν),
  wj = P( #{unconstrained β̂'s > 0} = j ).

P-values for the PET data at a single voxel
The observed B̄ is t = 0.9524, with ν = 20 and p = 2. The cone weights are w1 = ½ and w2 = θ/2π. The easiest way to find the weights wj is by simulation (10,000 iterations):
  P(B̄ ≥ t) = Σ_{j=1}^{ν} wj P( Beta(j/2, (ν−j)/2) ≥ t ) = 2.35 × 10⁻¹².

The Beta-bar random field
Recall that at a single point (voxel)
  P(B̄ ≥ t) = Σ_{j=1}^{ν} wj P( Beta(j/2, (ν−j)/2) ≥ t ),
and that if T(s), s ∈ S ⊂ R^D, is an isotropic random field,
  P( max_{s∈S} T(s) ≥ t ) ≈ E( EC(S ∩ {s : T(s) ≥ t}) ) = Σ_{d=0}^{D} μ_d(S) ρ_d(T ≥ t),
with ρ0 ≡ P. Taylor & Worsley (2007):
  ρ_d(B̄ ≥ t) = Σ_{j=1}^{ν} wj ρ_d( Beta(j/2, (ν−j)/2) ≥ t ),
the same linear combination: the EC densities of the Beta (F-related) fields are well known, and the weights wj come from simulations.
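The simulation recipe for the weights w_j can be sketched directly for p = 2 orthogonal constrained regressors (cone angle θ = π/2, so the theory above gives w1 = ½ and w2 = θ/2π = ¼). In that case the non-negative fit of pure noise is just coordinate-wise truncation at zero. Note that with these orthogonal-cone weights the resulting P-value is not the PET figure quoted above, which used the PET cone angle.

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)
nu, nsim = 20, 10_000

# p = 2 orthogonal constrained regressors: the NNLS fit of N(0, I) noise is
# componentwise truncation at zero, so count how many coefficients are positive
b = np.maximum(rng.standard_normal((nsim, 2)), 0)
counts = np.bincount((b > 0).sum(axis=1), minlength=3)
w = counts / nsim                 # w[j] = P(#{unconstrained beta-hats > 0} = j)

# Weighted Beta mixture tail probability (the j = 0 term contributes nothing)
t = 0.9524
p_value = sum(w[j] * beta_dist.sf(t, j / 2, (nu - j) / 2) for j in (1, 2))
```

For a general cone one would replace the truncation step by an NNLS fit against the constrained regressors; the counting and mixing steps are unchanged.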
Maximizing over the search region then gives
  P( max_{s∈S} B̄(s) ≥ t ) ≈ Σ_{j=1}^{ν} wj P( max_{s∈S} Beta(j/2, (ν−j)/2)(s) ≥ t ).

Roy's union-intersection principle
  Y_{n×1} = X_{n×p} β_{p×1} + ε_{n×1},   β_{p×1} ≥ 0 (component-wise),   ε_{n×1} ~ N(0_{n×1}, I_{n×n} σ²) (without loss of generality).
Write the model in terms of the orthonormalised regressors:
  Y_{n×1} = (Zu)_{n×1} α + ε_{n×1},   α ≥ 0,   Z_{n×p} = X(X′X)^{−1/2},
  u_{p×1} ∈ U = { (X′X)^{1/2} β / ||(X′X)^{1/2} β|| : β ≥ 0 } ⊂ γ_p ⊂ R^p.
Let T(u) = α̂(u)/Sd(α̂(u)) be the usual T-statistic for testing H0u : α = 0. Roy's UIP gives the test statistic
  T̄ = max_{u∈U} T(u),   B̄ = T̄² / (n − 1 + T̄²).

Can we use the same trick of adding U to the search space? Let's put back the parameter s. The T-bar statistic is
  T̄(s) = max_{u∈U} T(s, u),   T(s, u) ~ t_{n−1} under H0,
  P( max_{s∈S} T̄(s) ≥ t ) = P( max_{s∈S, u∈U} T(s, u) ≥ t )
    ≈ Σ_{d=0}^{D+q} μ_d(S × U) ρ_d(T ≥ t) = Σ_{d=0}^{D} μ_d(S) Σ_{k=0}^{q} μ_k(U) ρ_{d+k}(T ≥ t)?
Sadly this doesn't work: although T(s, u) is a T-field in s for each u, it is not a T-field in (s, u), despite the fact that α̂(s, u) is a Gaussian random field in (s, u). The reason is that (n−1) Sd(α̂(s, u))² is not a χ²_{n−1} field in (s, u).

Functions of Gaussian fields
Example: χ̄ = max_{0≤θ≤π/2} ( Z1 cos θ + Z2 sin θ ),   Z1 ~ N(0, 1), Z2 ~ N(0, 1).
[Figure: the search region S with excursion sets X_t = {s : χ̄ ≥ t}, and the (Z1, Z2) plane showing the null, the cone alternative, and the rejection regions R_t = {Z : χ̄ ≥ t}.]

The Euler characteristic heuristic again
EC = #blobs − #holes.
[Figure: excursion sets X_t over the search region S as the threshold t increases, with EC falling from 10 towards 0; observed and expected EC curves against t.]
Heuristic: P( max_{s∈S} χ̄(s) ≥ t ) ≈ E(EC) = 0.05 ⇒ t = 3.75. EXACT!
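The χ̄ example has a closed form: the maximum over the quarter circle is the norm of (Z1, Z2) when both coordinates are positive (interior maximum) and max(Z1, Z2) otherwise (an endpoint). Its exact null tail probability, P(χ̄ ≥ t) = P(Z ≥ t) + e^{−t²/2}/4, can therefore be checked by Monte Carlo:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal((2, 1_000_000))

# max over 0 <= theta <= pi/2 of Z1*cos(theta) + Z2*sin(theta):
# the interior maximum (the norm) occurs iff both coordinates are positive,
# otherwise the maximum is at an endpoint of the arc
both_pos = (z1 > 0) & (z2 > 0)
chibar = np.where(both_pos, np.hypot(z1, z2), np.maximum(z1, z2))

t = 2.0
p_mc = (chibar >= t).mean()
p_exact = norm.sf(t) + np.exp(-t**2 / 2) / 4
```

This is exactly why the heuristic is exact here: the tail probability is itself the d = 0 EC density of the rejection region.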
  E(EC(S ∩ X_t)) = Σ_{d=0}^{D} L_d(S) ρ_d(R_t).

A beautiful symmetry (with λ = Sd(∂Z/∂s)):
  Lipschitz-Killing curvature L_d(S) ↔ Steiner-Weyl tube formula (1930),
  EC density ρ_d(R_t) ↔ Taylor's Gaussian tube formula (2003).
• Put a tube of radius r about the search region λS and about the rejection region R_t.
• Find its volume or probability, expand as a power series in r, and pull off the coefficients:
  |Tube(λS, r)| = Σ_{d=0}^{D} π^{d/2} / Γ(d/2 + 1) · L_{D−d}(S) r^d,
  P(Tube(R_t, r)) = Σ_{d=0}^{∞} (2π)^{d/2} / d! · ρ_d(R_t) r^d.

Lipschitz-Killing curvature L_d(S) of a triangle
The Steiner-Weyl volume of tubes formula (1930) gives
  Area(Tube(λS, r)) = Σ_{d=0}^{D} π^{d/2} / Γ(d/2 + 1) · L_{D−d}(S) r^d
    = L2(S) + 2 L1(S) r + π L0(S) r²
    = Area(λS) + Perimeter(λS) r + EC(λS) π r²,
so
  L0(S) = EC(λS),   L1(S) = ½ Perimeter(λS),   L2(S) = Area(λS).
Lipschitz-Killing curvatures are just "intrinsic volumes" or "Minkowski functionals" in the (Riemannian) metric of the variance of the derivative of the process.

Lipschitz-Killing curvature L_d(S) of any set S
Triangulate S, with edge lengths multiplied by λ. For points, edges and triangles:
  L0(point) = 1,   L0(edge) = 1,   L0(triangle) = 1,
  L1(edge) = edge length,   L1(triangle) = ½ perimeter,
  L2(triangle) = area.
For S a union of triangles, by inclusion-exclusion:
  L0(S) = Σ L0(points) − Σ L0(edges) + Σ L0(triangles),
  L1(S) = Σ L1(edges) − Σ L1(triangles),
  L2(S) = Σ L2(triangles).

Non-isotropic data?
Can we warp the data to isotropy, i.e. multiply the edge lengths by λ? Globally no, but locally yes, though we may need extra dimensions (Nash Embedding Theorem: #dimensions ≤ D + D(D+1)/2; for D = 2, #dimensions ≤ 5). A better idea: replace Euclidean distance by the variogram,
  d(s1, s2)² = Var( Z(s1) − Z(s2) ),
i.e. use the Riemannian metric of Var(∇Z).

Estimating Lipschitz-Killing curvature L_d(S)
We need independent and identically distributed random fields, e.g.
residuals from a linear model, Z1, Z2, …, Zn (Taylor & Worsley, JASA, 2007). Replace the coordinates of the triangles in S ⊂ R² by the normalised residuals
  Z / ||Z||,   Z = (Z1, …, Zn) ∈ R^n,
and apply the same point/edge/triangle inclusion-exclusion formulas for L0(S), L1(S), L2(S) to the resulting edge lengths and areas.

EC density ρ_d(χ̄ ≥ t) of the χ̄ statistic
Taylor's Gaussian tube formula (2003):
  P( (Z1, Z2) ∈ Tube(R_t, r) ) = Σ_{d=0}^{∞} (2π)^{d/2} / d! · ρ_d(χ̄ ≥ t) r^d
    = ρ0(χ̄ ≥ t) + (2π)^{1/2} ρ1(χ̄ ≥ t) r + (2π) ρ2(χ̄ ≥ t) r²/2 + ···
    = ∫_{t−r}^{∞} (2π)^{−1/2} e^{−z²/2} dz + e^{−(t−r)²/2}/4.
Expanding in powers of r and matching coefficients:
  ρ0(χ̄ ≥ t) = ∫_{t}^{∞} (2π)^{−1/2} e^{−z²/2} dz + e^{−t²/2}/4,
  ρ1(χ̄ ≥ t) = (2π)^{−1} e^{−t²/2} + (2π)^{−1/2} e^{−t²/2} t/4,
  ρ2(χ̄ ≥ t) = (2π)^{−3/2} e^{−t²/2} t + (2π)^{−1} e^{−t²/2} (t² − 1)/4,
  …

EC density ρ_d(B̄ ≥ t) of the B̄ statistic
Recall that at a single point (voxel)
  P(B̄ ≥ t) = Σ_{j=1}^{n} wj P( Beta(j/2, (ν−j)/2) ≥ t ),
and that if Z(s), s ∈ S ⊂ R^D, is an isotropic random field,
  P( max_{s∈S} Z(s) ≥ t ) ≈ E( EC(S ∩ {s : Z(s) ≥ t}) ) = Σ_{d=0}^{D} L_d(S) ρ_d(Z ≥ t),
with ρ0 ≡ P. Taylor & Worsley (2007):
  ρ_d(B̄ ≥ t) = Σ_{j=1}^{n} wj ρ_d( Beta(j/2, (ν−j)/2) ≥ t ),
the same linear combination: the EC densities of the Beta (F-related) fields are well known, and the weights wj come from simulations.
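The residual-based estimate of L_d(S) above can be sketched end to end: triangulate a grid, replace each vertex by its normalised residual vector in R^n, and apply the point/edge/triangle inclusion-exclusion formulas with edge lengths (and Heron areas) measured in R^n. The smoothed white-noise fields standing in for linear-model residuals are synthetic.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
n, nx, ny = 40, 12, 12

# n i.i.d. smooth fields on a grid, standing in for linear-model residuals
Z = np.stack([gaussian_filter(rng.standard_normal((nx, ny)), 2.0).ravel()
              for _ in range(n)])
U = Z / np.linalg.norm(Z, axis=0)     # normalised residuals: one unit vector per vertex

# Triangulate the grid: two triangles per cell
tris = [(i * ny + j, (i + 1) * ny + j, (i + 1) * ny + j + 1)
        for i in range(nx - 1) for j in range(ny - 1)]
tris += [(i * ny + j, (i + 1) * ny + j + 1, i * ny + j + 1)
         for i in range(nx - 1) for j in range(ny - 1)]
edges = {frozenset(e) for a, b, c in tris for e in [(a, b), (b, c), (a, c)]}

dist = lambda a, b: np.linalg.norm(U[:, a] - U[:, b])  # edge length in residual space

def heron(a, b, c):
    """Triangle area from its three edge lengths."""
    s = (a + b + c) / 2
    return np.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))

L0 = nx * ny - len(edges) + len(tris)  # points - edges + triangles
L1 = (sum(dist(*e) for e in edges)
      - sum((dist(a, b) + dist(b, c) + dist(a, c)) / 2 for a, b, c in tris))
L2 = sum(heron(dist(a, b), dist(b, c), dist(a, c)) for a, b, c in tris)
```

L0 is a pure count, so it always recovers the EC of the search region (here 1); L1 and L2 carry the Riemannian metric of Var(∇Z) through the residual-space edge lengths.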
Maximizing over the search region gives
  P( max_{s∈S} B̄(s) ≥ t ) ≈ Σ_{j=1}^{n} wj P( max_{s∈S} Beta(j/2, (ν−j)/2)(s) ≥ t ).

Power: S = 1000 cc brain, FWHM = 10 mm, P = 0.05
[Figure: power of the T-test on β1, the F-test on (β1, β2) and the Beta-bar test against the HRF shift d (0 to 3 seconds), for an event design (cone angle θ = 78.4°) and a 20-second block design (cone angle θ = 38.1°), with the corresponding responses x1(t), x2(t); cone weights w1 = ½, w2 = θ/2π.]

Cross correlation random field
Let X(s), s ∈ S ⊂ R^M, and Y(t), t ∈ T ⊂ R^N, be n × 1 vectors of Gaussian random fields, and define the cross correlation random field as
  C(s, t) = X(s)′Y(t) / ( X(s)′X(s) · Y(t)′Y(t) )^{1/2}.
Then
  P( max_{s∈S, t∈T} C(s, t) ≥ c ) ≈ E( EC{ s ∈ S, t ∈ T : C(s, t) ≥ c } ) = Σ_{i=0}^{dim(S)} Σ_{j=0}^{dim(T)} L_i(S) L_j(T) ρ_{ij}(C ≥ c),
where, writing h = i + j,
  ρ_{ij}(C ≥ c) = ( 2^{n−2−h} (i−1)! j! / π^{h/2+1} ) Σ_{k=0}^{⌊(h−1)/2⌋} (−1)^k c^{h−1−2k} (1 − c²)^{(n−1−h)/2+k}
      × Σ_{l=0}^{k} Σ_{m=0}^{k} Γ( (n−i)/2 + l ) Γ( (n−j)/2 + m ) / ( l! m! (k−l−m)! (n−1−h+l+m+k)! (i−1−k−l+m)! (j−k−m+l)! ).

Maximum canonical cross correlation random field
Let X(s)_{n×p}, s ∈ S ⊂ R^M, and Y(t)_{n×q}, t ∈ T ⊂ R^N, be matrices of Gaussian random fields, and define the maximum canonical cross correlation random field as
  C(s, t) = max_{u,v} u′X(s)′Y(t)v / ( u′X(s)′X(s)u · v′Y(t)′Y(t)v )^{1/2},
the maximum of the canonical correlations between X and Y, defined as the singular values of (X′X)^{−1/2} X′Y (Y′Y)^{−1/2}. Then
  P( max_{s∈S, t∈T} C(s, t) ≥ c ) ≈ ½ Σ_{i=0}^{M} L_i(S) Σ_{j=0}^{N} L_j(T) Σ_{k=0}^{p} L_k(γ_p) Σ_{l=0}^{q} L_l(γ_q) ρ_{i+k, j+l}(C ≥ c),
where
  L_k(γ_p) = 2^{k+1} π^{k/2} Γ( (p+1)/2 ) / ( k! Γ( (p+1−k)/2 ) )
if p − 1 − k is even, and zero otherwise, k = 0, …, p − 1.

Mann-Whitney random field
[Figure: the Mann-Whitney statistic MW = sum of the ranks of n/2 random fields, n = 10, displayed as an image.]
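At a single (s, t) the maximum canonical correlation can be computed exactly as stated above: the largest singular value of (X′X)^{−1/2} X′Y (Y′Y)^{−1/2}. A minimal sketch on random matrices (the sizes n, p, q are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 3, 4
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))

def inv_sqrt(A):
    """Symmetric inverse square root via the eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(w**-0.5) @ V.T

M = inv_sqrt(X.T @ X) @ (X.T @ Y) @ inv_sqrt(Y.T @ Y)
canon = np.linalg.svd(M, compute_uv=False)  # all canonical correlations
C = canon.max()                             # the maximum canonical correlation

# Sanity check: C dominates the cosine u'X'Yv / (|Xu| |Yv|) for any u, v
u, v = rng.standard_normal(p), rng.standard_normal(q)
cos_uv = (u @ X.T @ Y @ v) / (np.linalg.norm(X @ u) * np.linalg.norm(Y @ v))
```

Since M = A′B with A and B having orthonormal columns, every singular value lies in [0, 1], as a correlation must.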