Nonnegative least squares for imaging data

Keith Worsley, McGill
Jonathan Taylor, Stanford and Université de Montréal
John Aston, Academia Sinica, Taipei

The "bubbles" experiment (Nature, 2005)

The subject is shown one of 40 faces chosen at random, with a Happy, Sad, Fearful or Neutral expression, but the face is only revealed through random 'bubbles'.

First trial: "Sad" expression. 75 random bubble centres are smoothed by a Gaussian 'bubble' to give the mask through which the subject sees the face. The subject is asked the expression. Response: "Neutral" (incorrect).

Your turn ...
Trial 2: subject response "Fearful" (correct)
Trial 3: subject response "Happy" (incorrect: Fearful)
Trial 4: "Happy" (correct)
Trial 5: "Fearful" (correct)
Trial 6: "Sad" (correct)
Trial 7: "Happy" (correct)
Trial 8: "Neutral" (correct)
Trial 9: "Happy" (correct)
...
Trial 3000: "Happy" (incorrect: Fearful)

Bubbles analysis

For each expression, e.g. Fearful (3000/4 = 750 trials), sum the bubble masks over trials 1 + 2 + ... + 750, separately for the correct trials and for all trials. Then

  proportion of correct bubbles = (sum of correct bubbles) / (sum of all bubbles).

Threshold this image at the overall proportion of correct trials (0.68), scale to [0, 1], and use the result as a bubble mask.

Results: masked average faces for Happy, Sad, Fearful and Neutral. But are these features real or just noise? We need statistics ...

Statistical analysis

Correlate the bubbles with the response (correct = 1, incorrect = 0), separately for each expression. This is equivalent to a two-sample Z-statistic comparing correct vs. incorrect bubbles, e.g.
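The proportion-of-correct-bubbles mask can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the original analysis code: `bubble_masks`, `correct` and the image size are synthetic, and the threshold is taken from the observed proportion of correct trials as described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 750 trials of 64x64 Gaussian-bubble masks,
# and a 0/1 correct/incorrect response per trial.
bubble_masks = rng.random((750, 64, 64))
correct = rng.integers(0, 2, size=750)

# Proportion of correct bubbles = (sum of correct bubbles) / (sum of all bubbles).
prop_correct = bubble_masks[correct == 1].sum(axis=0) / bubble_masks.sum(axis=0)

# Threshold at the overall proportion of correct trials (0.68 in the slides),
# then rescale to [0, 1] to obtain the bubble mask.
thresh = correct.mean()
mask = np.clip(prop_correct - thresh, 0, None)
if mask.max() > 0:
    mask = mask / mask.max()
```

The mask is then multiplied pixel-wise into the average face to show which regions drove correct responses.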
Fearful: trials 1, 2, 3, ..., 750 with responses 0, 1, 1, 0, 1, 1, 1, ..., 1. The resulting Z ~ N(0,1) statistic image is very similar to the proportion of correct bubbles.

Results: thresholded at Z = 1.64 (P = 0.05), shown for the average face and for Happy, Sad, Fearful and Neutral.

But what about a multiple comparisons correction for 91,200 pixels? We need random field theory ...

Euler characteristic (EC) = #blobs − #holes

Excursion set {Z > threshold} for the neutral face (EC = 0 at the threshold shown). Heuristic: at high thresholds t the holes disappear, the EC is ~1 or 0, and E(EC) ≈ P(max Z > t). (Plot: observed and expected EC against threshold from −4 to 4.) In fact there is an exact expression for E(EC) at all thresholds, and E(EC) ≈ P(max Z > t) is extremely accurate.

Random field theory

If Z(s) ~ N(0,1) is an isotropic Gaussian random field, s ∈ ℝ², with λ² I₂ₓ₂ = Var(∂Z/∂s), then

P(max_{s∈S} Z(s) ≥ t) ≈ E(EC(S ∩ {s : Z(s) ≥ t}))
  = L₀(S) ∫_t^∞ (2π)^(−1/2) e^(−z²/2) dz
  + L₁(S) λ e^(−t²/2) / (2π)
  + L₂(S) λ² t e^(−t²/2) / (2π)^(3/2),

where the Lipschitz-Killing curvatures of S are L₀(S) = EC(S), L₁(S) = ½ Perimeter(S), L₂(S) = Area(S) (= Resels(S) × c), multiplying the EC densities ρ₀(Z ≥ t), ρ₁(Z ≥ t), ρ₂(Z ≥ t) of Z above t. If Z(s) is white noise convolved with an isotropic Gaussian filter of full width at half maximum FWHM, then

λ = √(4 log 2) / FWHM.

Results, corrected for search

Random field theory threshold: Z = 3.92 (P = 0.05). A saddle-point approximation (Rabinowitz, 1997; Chamandy, 2007) gives slightly lower thresholds, around 3.80 to 3.82. Bonferroni: Z = 4.87 (P = 0.05) detects nothing.

fMRI data: 120 scans, 3 scans each of hot, rest, warm, rest, ...

First scan of the fMRI data. One voxel shows a highly significant effect (T = 6.59); another shows no significant effect (T = −0.74); note the drift over time (time axis 0-300 seconds). The T statistic image for the hot − warm effect is T = (hot − warm effect) / s.d.,
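The corrected threshold can be found numerically by inverting the expected-EC formula above. This sketch uses λ = √(4 log 2)/FWHM and solves E(EC) = 0.05 by bisection; the search-region numbers (a 380 × 240 pixel image, FWHM of 20 pixels) are illustrative assumptions, not the actual face mask used in the talk.

```python
import math

def expected_ec(t, ec, perimeter, area, fwhm):
    """E(EC) for a 2D isotropic Gaussian field thresholded at t."""
    lam = math.sqrt(4 * math.log(2)) / fwhm
    p_z = 0.5 * math.erfc(t / math.sqrt(2))              # rho_0 = P(Z >= t)
    rho1 = lam * math.exp(-t**2 / 2) / (2 * math.pi)
    rho2 = lam**2 * t * math.exp(-t**2 / 2) / (2 * math.pi)**1.5
    # L0 = EC, L1 = perimeter / 2, L2 = area
    return ec * p_z + 0.5 * perimeter * rho1 + area * rho2

def rft_threshold(alpha, ec, perimeter, area, fwhm, lo=1.0, hi=10.0):
    """Solve E(EC) = alpha for t by bisection (E(EC) decreases in t for t > 1)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if expected_ec(mid, ec, perimeter, area, fwhm) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical search region: 380 x 240 pixels, FWHM ~ 20 pixels.
t05 = rft_threshold(0.05, ec=1, perimeter=2 * (380 + 240), area=380 * 240, fwhm=20)
```

With these illustrative numbers the threshold comes out close to 4, the same order as the Z = 3.92 quoted for the real data.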
which is ~ t₁₁₀ if there is no effect.

Linear model regressors

Alternating hot and warm stimuli separated by rest (9 seconds each). Hemodynamic response function (HRF): the difference of two gamma densities. Regressors = stimuli ⋆ HRF, sampled every 3 seconds.

Linear model for fMRI time series with AR(p) errors

Y(t) = (s ⋆ h)(t) β + z(t) γ + ε(t),
ε(t) = a₁ ε(t−1) + ··· + a_p ε(t−p) + σ WN(t),

where t = time, Y(t) = fMRI data, s(t) = stimulus, h(t) = hemodynamic response function (HRF), z(t) = drift etc., ε(t) = error, WN(t) ~ N(0,1) independently, and ⋆ = convolution; β, γ, a₁, ..., a_p, σ are the unknown parameters.

Unknown latency δ of the HRF

h(t; δ) = h₀(t − δ), where h₀(t) is a known canonical HRF and the latency shift δ is unknown. This is a hard non-linear regression problem, so Friston et al. (1998) linearized the problem by expanding h(t; δ) in a Taylor series:

h(t; δ) ≈ h₀(t) − δ ḣ₀(t).

The model now becomes another linear model with an extra regressor:

Y(t) = (s ⋆ h₀)(t) β − (s ⋆ ḣ₀)(t) δβ + z(t) γ + ε(t)
     = x₁(t) β₁ + x₂(t) β₂ + z(t) γ + ε(t),

where x₁ = s ⋆ h₀, x₂ = −s ⋆ ḣ₀, and β₁ = β, β₂ = δβ are the unknown parameters.

Example

Y(t) = (s ⋆ (h₀ − δ ḣ₀))(t) β + z(t) γ + ε(t) = (x₁(t) + x₂(t) δ) β + z(t) γ + ε(t).

Linearized HRF h₀(t) − δ ḣ₀(t) and linearized regressors x₁(t) + δ x₂(t), shown for δ = −2, 0, 2 seconds.

Two interesting problems:
• Estimate the latency shift δ and its standard error;
• Test for the magnitude β of the stimulus allowing for unknown latency.

Test for the magnitude β of the stimulus allowing for unknown latency

Y(t) = (s ⋆ h₀)(t) β − (s ⋆ ḣ₀)(t) δβ + z(t) γ + ε(t)
     = x₁(t) β + x₂(t) δβ + z(t) γ + ε(t)
     = x₁(t) β₁ + x₂(t) β₂ + z(t) γ + ε(t).

We could do either:
• a T-test on β₁ > 0, allowing for β₂, which loses sensitivity if δ is far from 0; or
• an F-test on (β₁, β₂) ≠ 0, which wastes sensitivity on unrealistic HRFs.

Cone alternative

We know that the magnitude of the response is positive, and that the latency shift must lie in some restricted interval, say [−2, 2] seconds. This implies that β ≥ 0 and −Δ ≤ δ ≤ Δ, where Δ = 2 seconds. This specifies a cone alternative for (β₁ = β, β₂ = δβ) (Friman et al., 2003), bounded by the rays δ = −Δ and δ = Δ with δ = 0 in the middle, and with cone angle

θ = 2 atan(Δ ‖x₂‖ / ‖x₁‖).

Non-negative least squares

Express the model in terms of the two extremes:

x₁*(t) = x₁(t) − Δ x₂(t),  x₂*(t) = x₁(t) + Δ x₂(t),
Y(t) = x₁*(t) β₁* + x₂*(t) β₂* + z(t) γ + ε(t).

Then the coefficients are non-negative:

β₁* ≥ 0,  β₂* ≥ 0,

and the cone angle θ is the angle between x₁* and x₂*.

Example of three extremes

Y(t) = x₁*(t) β₁* + x₂*(t) β₂* + x₃*(t) β₃* + z(t) γ + ε(t),  β₁* ≥ 0, β₂* ≥ 0, β₃* ≥ 0,

with x₁* = the standard HRF, x₂* = delayed 4 seconds, x₃* = spread 4 seconds, giving a 3D cone.

General non-negative least squares problem

In general we may have p constrained regressors and q unconstrained regressors. The constrained regressors could be the extremes of the HRF, e.g. min/max latency shift, min/max spread, etc.; the model is then all HRFs in between. In vector form:

Y_{n×1} = X_{n×p} β_{p×1} + Z_{n×q} γ_{q×1} + ε_{n×1},
β_{p×1} ≥ 0 (component-wise),  γ_{q×1} unconstrained,
ε_{n×1} ~ N(0_{n×1}, I_{n×n} σ²) (without loss of generality).

We might also have far more regressors than actual observations(!), i.e. p >> n.
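The regressor construction can be sketched numerically. The gamma-density parameters and timings below are illustrative assumptions (not the exact canonical HRF), but the structure follows the slides: h₀ as a difference of two gamma densities, x₁ = s ⋆ h₀, x₂ = −s ⋆ ḣ₀, and the two cone extremes for Δ = 2 seconds.

```python
import numpy as np

dt = 0.1
t = np.arange(0, 30, dt)                       # HRF support, seconds

def gamma_pdf(t, shape, scale):
    """Gamma density shape, normalised to integrate to 1 on the grid."""
    g = t**(shape - 1) * np.exp(-t / scale)
    return g / (g.sum() * dt)

# Canonical HRF h0: difference of two gamma densities (illustrative parameters).
h0 = gamma_pdf(t, 6, 0.9) - 0.35 * gamma_pdf(t, 12, 0.9)
h0_dot = np.gradient(h0, dt)                   # numerical derivative of h0

# Boxcar stimulus s(t): 9 s stimulus / 27 s off, repeated over 360 s.
time = np.arange(0, 360, dt)
s = ((time % 36) < 9).astype(float)

# Linearized regressors: x1 = s * h0, x2 = -s * h0_dot (convolutions).
x1 = np.convolve(s, h0)[:len(time)] * dt
x2 = -np.convolve(s, h0_dot)[:len(time)] * dt

# Two extremes of the cone for latency shifts in [-Delta, Delta], Delta = 2 s:
# x1 + delta * x2 evaluated at delta = -Delta and delta = +Delta.
Delta = 2.0
x1_star = x1 - Delta * x2
x2_star = x1 + Delta * x2
```

In practice the regressors would then be subsampled at the 3-second scan times before fitting.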
Footnote: Woolrich et al. (2004) replace "hard" constraints by "soft" constraints through a prior distribution on β, taking a Bayesian approach.

Pick a range of, say, p = 150 plausible values of the non-linear parameter ν: ν₁, ..., ν_p.

Fitting the NNLS model

A simple method:
• Do "all subsets" regression;
• Throw out any model that does not satisfy the non-negativity constraints;
• Among those left, pick the model with the smallest error sum of squares.

For larger models there are more efficient methods, e.g. Lawson & Hanson (1974).

The non-negativity constraints tend to enforce sparsity, even if the regressors are highly correlated (e.g. PET). Why? Highly correlated regressors have huge positive and negative unconstrained coefficients, and non-negativity suppresses the negative ones.

Example: n = 20, p = 150, but surprisingly it does not overfit. The fitted values Ŷ decompose into two components, with β̂₄₀ = 46.9, β̂₄₁ = 107.4, β̂₈₆ = 71.7, β̂₈₇ = 5.4, and the rest β̂_j = 0. We tend to get sparse pairs of adjacent regressors, suggesting the best regressor is somewhere in between.

P-values?

H₀: β_{p×1} = 0  →  error sum of squares SSE₀;
H₁: β_{p×1} ≥ 0 (component-wise)  →  error sum of squares SSE₁.

The likelihood ratio test statistic is the Beta-bar statistic, equal to the coefficient of determination or multiple correlation R²:

B̄ = (SSE₀ − SSE₁) / SSE₀.

The null distribution if there are no constraints, i.e. H₁: β_{p×1} ≠ 0, is

P(B̄ ≥ t) = P(Beta(p/2, (ν − p)/2) ≥ t),  ν = n − q.

The null distribution with constraints is a weighted average of Beta distributions, hence the name Beta-bar (Lin & Lindsay; Takemura & Kuriki, 1997):

P(B̄ ≥ t) = Σ_{j=1}^{ν} w_j P(Beta(j/2, (ν − j)/2) ≥ t),
w_j = P(#{unconstrained β̂'s > 0} = j)

(with no constraints, all the weight falls on j = p, recovering the formula above).

P-values for the PET data at a single voxel: the observed B̄ is t = 0.9524, with ν = 20.
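The Lawson-Hanson algorithm is available in SciPy as `scipy.optimize.nnls`. The sketch below, on synthetic data (the regressors and true coefficients are assumptions for illustration), shows the sparsity NNLS induces among highly correlated regressors and computes the Beta-bar statistic B̄ = (SSE₀ − SSE₁)/SSE₀.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)

# n = 20 observations, p = 150 highly correlated regressors
# (shifted copies of a smooth bump, as with PET tracer curves).
n, p = 20, 150
grid = np.linspace(0, 1, n)
centres = np.linspace(0, 1, p)
X = np.exp(-(grid[:, None] - centres[None, :])**2 / 0.02)

# Hypothetical sparse truth: two active regressors.
beta_true = np.zeros(p)
beta_true[[40, 87]] = [50.0, 70.0]
Y = X @ beta_true + rng.normal(0, 0.5, n)

beta_hat, _ = nnls(X, Y)                 # Lawson & Hanson (1974)

# Beta-bar statistic: SSE0 under beta = 0, SSE1 under beta >= 0.
sse0 = Y @ Y
sse1 = np.sum((Y - X @ beta_hat)**2)
beta_bar = (sse0 - sse1) / sse0
```

Note that the NNLS solution can have at most n positive coefficients, which is one way to see why it cannot wildly overfit even with p >> n.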
For p = 2 the cone weights are w₁ = ½ and w₂ = θ/2π. In general, the easiest way to find the Beta-bar weights w_j is by simulation (10,000 iterations). Plugging the simulated weights into

P(B̄ ≥ t) = Σ_{j=1}^{ν} w_j P(Beta(j/2, (ν − j)/2) ≥ t) = 2.35 × 10⁻¹²

gives the P-value for the PET data.

The Beta-bar random field

Recall that at a single point (voxel):

P(B̄ ≥ t) = Σ_{j=1}^{ν} w_j P(Beta(j/2, (ν − j)/2) ≥ t).

Recall that if F(s), s ∈ S ⊂ ℝ^D, is an isotropic random field:

P(max_{s∈S} F(s) ≥ t) ≈ E(EC(S ∩ {s : F(s) ≥ t})) = Σ_{d=0}^{D} L_d(S) ρ_d(F ≥ t),

with ρ₀ ≡ P. Taylor & Worsley (2007) show that the EC densities of B̄ are the same linear combination, with the w_j from simulations and the Beta EC densities well known:

ρ_d(B̄ ≥ t) = Σ_{j=1}^{ν} w_j ρ_d(Beta(j/2, (ν − j)/2) ≥ t),

so that

P(max_{s∈S} B̄(s) ≥ t) ≈ Σ_{j=1}^{ν} w_j P(max_{s∈S} Beta(j/2, (ν − j)/2)(s) ≥ t).

Proof, in the simplest case ν = 1: the chi-bar statistic

χ̄ = max_{0≤θ≤π/2} (Z₁ cos θ + Z₂ sin θ),  Z₁, Z₂ ~ N(0,1).

Excursion sets X_t = {s : χ̄ ≥ t} in the search region S correspond to rejection regions R_t = {Z : χ̄ ≥ t} in the (Z₁, Z₂) plane, bounded by the cone alternative around the null.

Euler characteristic heuristic again

Heuristic: P(max_{s∈S} χ̄(s) ≥ t) ≈ E(EC) = 0.05 ⇒ t = 3.75. In fact the expected EC is EXACT:
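The weights can be simulated exactly as described: fit the NNLS model to pure noise many times and count how often each number of coefficients comes out positive. A minimal sketch for p = 2 orthogonal constrained regressors (an assumed toy design, where the cone angle is θ = 90°, so theory gives w₀ = ¼, w₁ = ½, w₂ = θ/2π = ¼), followed by the Beta-tail mixture for the P-value:

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(2)

n, p = 20, 2
# Two orthogonal constrained regressors -> cone angle theta = 90 degrees.
X = np.zeros((n, p))
X[0, 0] = 1.0
X[1, 1] = 1.0

iters = 10_000
counts = np.zeros(p + 1)
for _ in range(iters):
    z = rng.standard_normal(n)          # pure noise: the null model
    beta_hat, _ = nnls(X, z)
    counts[np.sum(beta_hat > 0)] += 1

w = counts / iters                      # w_j = P(#{positive coefficients} = j)

# Weighted Beta tail probability for an observed Beta-bar value t_obs.
nu = n                                  # here q = 0, so nu = n - q
t_obs = 0.9524
p_value = sum(w[j] * beta_dist.sf(t_obs, j / 2, (nu - j) / 2)
              for j in range(1, p + 1))
```

With this orthogonal design the simulated w should come out close to (0.25, 0.5, 0.25), and the P-value is astronomically small at t = 0.9524, as on the slide.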
E(EC(S ∩ X_t)) = Σ_{d=0}^{D} L_d(S) ρ_d(R_t).

Proof. Theorem (Hadwiger, 1930s): suppose φ(S), S ⊂ ℝ^D, is a set functional that is invariant under translations and rotations of S, and satisfies the additivity property

φ(A ∪ B) = φ(A) + φ(B) − φ(A ∩ B).

Then φ(S) must be a linear combination of the intrinsic volumes L_d(S):

φ(S) = Σ_{d=0}^{D} L_d(S) c_d.

The choice φ(S) = E(EC(S ∩ X_t)) is invariant under translations and rotations because the random field is isotropic, and is additive because the EC is additive:

EC(A ∪ B) = EC(A) + EC(B) − EC(A ∩ B).

Lipschitz-Killing curvature L_d(S), with λ = sd(∂Z/∂s)

Steiner-Weyl tube formula (1930): put a tube of radius r about the search region λS, find its volume, expand as a power series in r, and pull off the coefficients:

|Tube(λS, r)| = Σ_{d=0}^{D} (π^{d/2} / Γ(d/2 + 1)) L_{D−d}(S) r^d.

EC density ρ_d(R_t)

Morse theory approach (1995): for a random field F,

ρ_d(R_t) = (1/λ^d) E( 1_{F ≥ t} det(−∂²F/∂s∂s′) | ∂F/∂s = 0 ) P(∂F/∂s = 0).

For a Gaussian field there is a closed form:

ρ_d(Z ≥ t) = ( −(2π)^{−1/2} ∂/∂t )^d P(Z ≥ t).

But what about a chi-bar random field???

Lipschitz-Killing curvature L_d(S) of a triangle

Steiner-Weyl volume of tubes formula (1930):

Area(Tube(λS, r)) = Σ_{d=0}^{D} (π^{d/2} / Γ(d/2 + 1)) L_{D−d}(S) r^d
  = L₂(S) + 2 L₁(S) r + π L₀(S) r²
  = Area(λS) + Perimeter(λS) r + EC(λS) π r²,

so that

L₀(S) = EC(λS),  L₁(S) = ½ Perimeter(λS),  L₂(S) = Area(λS).

Lipschitz-Killing curvatures are just "intrinsic volumes" or "Minkowski functionals" in the (Riemannian) metric of the variance of the derivative of the process.

Lipschitz-Killing curvature L_d(S) of any set S: triangulate S, with edge lengths scaled by λ = sd(∂Z/∂s).
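The Steiner-Weyl formula is easy to check numerically: Monte-Carlo the area of a tube around a triangle and compare with Area + Perimeter·r + π r². A sketch with λ = 1 and an assumed right triangle (for convex sets the Steiner expansion is exact):

```python
import numpy as np

rng = np.random.default_rng(3)

# Right triangle with vertices (0,0), (1,0), (0,1); lambda = 1.
area = 0.5
perimeter = 2 + np.sqrt(2)

def dist_to_triangle(pts):
    """Distance from each point to the (filled) triangle."""
    x, y = pts[:, 0], pts[:, 1]
    inside = (x >= 0) & (y >= 0) & (x + y <= 1)
    segs = [((0, 0), (1, 0)), ((1, 0), (0, 1)), ((0, 1), (0, 0))]
    d = np.full(len(pts), np.inf)
    for (ax, ay), (bx, by) in segs:
        ab = np.array([bx - ax, by - ay], float)
        ap = pts - np.array([ax, ay], float)
        u = np.clip(ap @ ab / (ab @ ab), 0, 1)      # projection onto the segment
        proj = np.array([ax, ay], float) + u[:, None] * ab
        d = np.minimum(d, np.linalg.norm(pts - proj, axis=1))
    return np.where(inside, 0.0, d)

# Monte Carlo estimate of |Tube(S, r)| inside a bounding box.
r = 0.5
box_lo, box_hi = -r, 1 + r
pts = rng.uniform(box_lo, box_hi, size=(400_000, 2))
box_area = (box_hi - box_lo)**2
mc_area = box_area * np.mean(dist_to_triangle(pts) <= r)

# Steiner-Weyl: L2 + 2 L1 r + pi L0 r^2.
steiner_area = area + perimeter * r + np.pi * r**2
```

The two estimates agree to Monte-Carlo accuracy, pulling off exactly the coefficients L₂ = area, 2L₁ = perimeter and πL₀ = π.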
Lipschitz-Killing curvature of triangles:

L₀(point) = 1, L₀(edge) = 1, L₀(triangle) = 1;
L₁(edge) = edge length, L₁(triangle) = ½ perimeter;
L₂(triangle) = area.

Lipschitz-Killing curvature of a union of triangles, by inclusion-exclusion:

L₀(S) = Σ L₀(points) − Σ L₀(edges) + Σ L₀(triangles);
L₁(S) = Σ L₁(edges) − Σ L₁(triangles);
L₂(S) = Σ L₂(triangles).

Non-isotropic data? Use the Riemannian metric of Var(∇Z): replace each edge length by its length λ in this metric, and apply the same triangle formulas.

Estimating Lipschitz-Killing curvature L_d(S)

We need independent and identically distributed random fields, e.g.
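The inclusion-exclusion rule above can be coded directly for a triangular mesh. A sketch (the mesh here is an assumed unit square split into two triangles, for which L₀ = 1, L₁ = half the perimeter = 2, L₂ = 1):

```python
import numpy as np

def lkc_mesh(vertices, triangles, lam=1.0):
    """Lipschitz-Killing curvatures L0, L1, L2 of a union of triangles,
    with all lengths scaled by lam = sd(dZ/ds)."""
    vertices = np.asarray(vertices, float) * lam
    edges = set()
    for a, b, c in triangles:
        for e in [(a, b), (b, c), (c, a)]:
            edges.add(tuple(sorted(e)))

    def length(e):
        return np.linalg.norm(vertices[e[0]] - vertices[e[1]])

    def tri_area(tri):
        p, q, r = vertices[list(tri)]
        return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1])
                         - (q[1] - p[1]) * (r[0] - p[0]))

    def tri_perim(tri):
        a, b, c = tri
        return length((a, b)) + length((b, c)) + length((c, a))

    # L0 = #points - #edges + #triangles (the Euler characteristic)
    L0 = len(vertices) - len(edges) + len(triangles)
    # L1 = sum of edge lengths - sum of triangle half-perimeters
    L1 = sum(length(e) for e in edges) - sum(0.5 * tri_perim(t) for t in triangles)
    # L2 = total area
    L2 = sum(tri_area(t) for t in triangles)
    return L0, L1, L2

# Unit square as two triangles.
verts = [(0, 0), (1, 0), (1, 1), (0, 1)]
tris = [(0, 1, 2), (0, 2, 3)]
L0, L1, L2 = lkc_mesh(verts, tris)
```

Note how the interior diagonal cancels in L₁: its length appears once in the edge sum and once (split between the two half-perimeters) in the subtraction.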
residuals from a linear model, Z₁, Z₂, Z₃, ..., Z_n. Replace the coordinates of the triangles in S ⊂ ℝ² by the normalised residuals

Z / ‖Z‖,  Z = (Z₁, ..., Z_n) ∈ ℝⁿ

(Taylor & Worsley, JASA, 2007), and apply the same triangle formulas to the resulting edge lengths and areas.

Beautiful symmetry:

E(EC(S ∩ X_t)) = Σ_{d=0}^{D} L_d(S) ρ_d(R_t),  λ = sd(∂Z/∂s).

The Lipschitz-Killing curvatures L_d(S) come from the Steiner-Weyl tube formula (1930); the EC densities ρ_d(R_t) come from Taylor's Gaussian tube formula (2003). Put a tube of radius r about the search region λS and about the rejection region R_t, find the volume or probability, expand as a power series in r, and pull off the coefficients:

|Tube(λS, r)| = Σ_{d=0}^{D} (π^{d/2} / Γ(d/2 + 1)) L_{D−d}(S) r^d,
P(Tube(R_t, r)) = Σ_{d=0}^{∞} ((2π)^{d/2} / d!) ρ_d(R_t) r^d.

EC density ρ_d(χ̄ ≥ t) of the χ̄ statistic

The tube of radius r about R_t is bounded by the threshold t − r, so Taylor's Gaussian tube formula (2003) gives

P(Z₁, Z₂ ∈ Tube(R_t, r)) = Σ_{d=0}^{∞} ((2π)^{d/2} / d!) ρ_d(χ̄ ≥ t) r^d
  = ρ₀(χ̄ ≥ t) + (2π)^{1/2} ρ₁(χ̄ ≥ t) r + (2π) ρ₂(χ̄ ≥ t) r²/2 + ···
  = ∫_{t−r}^∞ (2π)^{−1/2} e^{−z²/2} dz + e^{−(t−r)²/2}/4,

so that, matching powers of r:

ρ₀(χ̄ ≥ t) = ∫_t^∞ (2π)^{−1/2} e^{−z²/2} dz + e^{−t²/2}/4,
ρ₁(χ̄ ≥ t) = (2π)^{−1} e^{−t²/2} + (2π)^{−1/2} e^{−t²/2} t/4,
ρ₂(χ̄ ≥ t) = (2π)^{−3/2} e^{−t²/2} t + (2π)^{−1} e^{−t²/2} (t² − 1)/8,
...

EC density ρ_d(B̄ ≥ t) of the B̄ statistic

Recall that at a single point (voxel):

P(B̄ ≥ t) = Σ_{j=1}^{ν} w_j P(Beta(j/2, (ν − j)/2) ≥ t),

and that if F(s), s ∈ S ⊂ ℝ^D, is an isotropic random field, then P(max_{s∈S} F(s) ≥ t) ≈ E(EC(S ∩ {s : F(s) ≥ t})) = Σ_{d=0}^{D} L_d(S) ρ_d(F ≥ t), with ρ₀ ≡ P. Taylor & Worsley (2007): the EC densities of B̄ are the same linear combination of the well-known Beta EC densities, with the w_j from simulations:

ρ_d(B̄ ≥ t) = Σ_{j=1}^{ν} w_j ρ_d(Beta(j/2, (ν − j)/2) ≥ t).
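Estimating the curvatures from data then amounts to replacing each mesh vertex's coordinates by its normalised residual vector before measuring edge lengths. A sketch under assumed inputs (`residuals` is a hypothetical n-vector of i.i.d. residual fields sampled at each vertex):

```python
import numpy as np

rng = np.random.default_rng(4)

n_fields, n_verts = 50, 4
# Hypothetical residual fields: n i.i.d. residuals at each mesh vertex.
residuals = rng.standard_normal((n_verts, n_fields))

# Replace each vertex by its normalised residual vector Z / ||Z||.
U = residuals / np.linalg.norm(residuals, axis=1, keepdims=True)

def intrinsic_edge_length(i, j):
    """Edge length between vertices i and j in the residual metric."""
    return np.linalg.norm(U[i] - U[j])

edge_01 = intrinsic_edge_length(0, 1)
```

These intrinsic edge lengths (and the areas of the triangles they bound) are then plugged into the union-of-triangles formulas above in place of the geometric lengths.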
Hence

P(max_{s∈S} B̄(s) ≥ t) ≈ Σ_{j=1}^{ν} w_j P(max_{s∈S} Beta(j/2, (ν − j)/2)(s) ≥ t).

The proof for n = 3 is similar.

Power? S = 1000 cc brain, FWHM = 10 mm, P = 0.05

For an event design the cone angle is θ = 78.4°; for a block design (20 seconds) it is θ = 38.1°. Compare the power of the T-test on β₁, the F-test on (β₁, β₂), and the Beta-bar test (cone weights w₁ = ½, w₂ = θ/2π), plotted against the shift δ of the HRF from 0 to 3 seconds.

Bubbles task in the fMRI scanner

Correlate bubbles with BOLD at every voxel, over trials 1, 2, 3, ..., 3000. Calculate Z for each pair (bubble pixel, fMRI voxel): a 5D "image" of Z statistics. Thresholding? We need the cross correlation random field.

Cross correlation random field

Correlation between two fields at two different locations, searched over all pairs of locations, one in S, one in T:

P(max_{s∈S, t∈T} C(s, t) ≥ c) ≈ E(EC{s ∈ S, t ∈ T : C(s, t) ≥ c})
  = Σ_{i=0}^{dim(S)} Σ_{j=0}^{dim(T)} L_i(S) L_j(T) ρ_{ij}(C ≥ c),

where, with h = i + j,

ρ_{ij}(C ≥ c) = (2^{n−2−h} (i−1)! j! / π^{h/2+1})
  × Σ_{k=0}^{⌊(h−1)/2⌋} (−1)^k c^{h−1−2k} (1 − c²)^{(n−1−h)/2+k}
  × Σ_{l=0}^{k} Σ_{m=0}^{k} Γ((n−i)/2 + l) Γ((n−j)/2 + m)
      / [ l! m! (k−l−m)! (n−1−h+l+m+k)! (i−1−k−l+m)! (j−k−m+l)! ]

(Cao & Worsley, Annals of Applied Probability, 1999).

Bubbles data: P = 0.05, n = 3000, c = 0.113, T = 6.22.

NNLS for bubbles?

At the moment, we are correlating Y(t) = fMRI data at each voxel with each of the 240 × 380 = 91,200 face pixels as regressors x₁(t), ..., x₉₁₂₀₀(t) separately:

Y(t) = x_j(t) β_j + z(t) γ + ε(t).

We should be doing this simultaneously:

Y(t) = x₁(t) β₁ + ··· + x₉₁₂₀₀(t) β₉₁₂₀₀ + z(t) γ + ε(t).

Obviously impossible: #observations (3000) << #regressors (91,200). Maybe we can use NNLS: β₁ ≥ 0, ..., β₉₁₂₀₀ ≥ 0.
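Computing the raw 5D correlation image is just a standardised matrix product: correlate every bubble-pixel time course with every voxel time course over trials. A sketch with synthetic data (the sizes are illustrative; thresholding the result at c would then use the cross-correlation-field formula):

```python
import numpy as np

rng = np.random.default_rng(5)

n_trials, n_pixels, n_voxels = 300, 40, 25
bubbles = rng.standard_normal((n_trials, n_pixels))  # bubble value per trial/pixel
bold = rng.standard_normal((n_trials, n_voxels))     # BOLD response per trial/voxel

def standardise(A):
    """Centre each column and scale it to unit norm."""
    A = A - A.mean(axis=0)
    return A / np.sqrt((A**2).sum(axis=0))

# C[i, j] = correlation between bubble pixel i and fMRI voxel j over trials.
C = standardise(bubbles).T @ standardise(bold)
```

For the real data this matrix has 91,200 × (#voxels) entries, which is why a closed-form threshold, rather than permutation, is essential.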
It should enforce sparsity over β = activation at face pixels, provided #observations (3000) >> #dimensions of the cone ≈ #resels of the face (146.2). We can threshold Beta-bar over brain voxels at P < 0.05 using the above. The result will be a face image of isolated "local maxima" for each voxel. It will tell you which brain voxels are activated, but not which face pixels. It might be a huge computational task! Interactions? Y ~ x₁ + ··· + x₉₁₂₀₀ + x₁x₂ + z.

MS lesions and cortical thickness

Idea: MS lesions interrupt neuronal signals, causing thinning in the down-stream cortex. Data: n = 425 mild MS patients. Plotting average cortical thickness (mm) against total lesion volume (cc): correlation = −0.568, T = −14.20 (423 df) (Charil et al., NeuroImage, 2007).

MS lesions and cortical thickness at all pairs of points

The correlation is dominated by total lesions and average cortical thickness, so remove these effects as follows:

CT = cortical thickness, smoothed 20 mm
ACT = average cortical thickness
LD = lesion density, smoothed 10 mm
TLV = total lesion volume

Find the partial correlation(LD, CT − ACT) removing TLV via the linear model

CT − ACT ~ 1 + TLV + LD

and test for LD. Repeat for all voxels in 3D and all nodes in 2D: ~1 billion correlations, so thresholding is essential! Look for high negative correlations. Threshold: P = 0.05, c = 0.300, T = 6.48.

Cluster extent rather than peak height (Friston, 1994)

Choose a lower level, e.g. t = 3.11 (P = 0.001). Find clusters, i.e. connected components of the excursion set, and measure cluster extent by resels, L_D(cluster). Distribution of maximum cluster extent: Bonferroni on N = #clusters ~ E(EC). Peak height distribution: fit a quadratic to the peak (Cao & Worsley, Advances in Applied Probability, 1999).
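The voxel-wise test is ordinary least squares with a T statistic on the LD coefficient. A sketch on synthetic data (all effect sizes and distributions here are assumptions; in practice this is repeated over ~a billion (voxel, node) pairs):

```python
import numpy as np

rng = np.random.default_rng(6)

n = 425
TLV = rng.gamma(2.0, 10.0, n)                        # total lesion volume (cc)
LD = 0.5 * TLV + rng.standard_normal(n)              # lesion density at one voxel
CT_minus_ACT = -0.5 * LD + rng.standard_normal(n)    # thickness minus its average

# Model: CT - ACT ~ 1 + TLV + LD; test the LD coefficient.
X = np.column_stack([np.ones(n), TLV, LD])
beta, *_ = np.linalg.lstsq(X, CT_minus_ACT, rcond=None)
resid = CT_minus_ACT - X @ beta
df = n - X.shape[1]
sigma2 = resid @ resid / df
cov = sigma2 * np.linalg.inv(X.T @ X)
T_LD = beta[2] / np.sqrt(cov[2, 2])                  # T statistic, df degrees of freedom
```

A strongly negative T_LD at a (voxel, node) pair is the signature of lesions in that voxel thinning that patch of cortex.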