A New, Nonparametric Information-Splitting Image Analysis Technique
Mark Inlow
Jing Wan, Sungeun Kim, Kwansik Nho, Shannon Risacher, Andrew Saykin, Li Shen

Life as a Statistics Professor…

Image Analysis Setup
• Data: $Y_{i,j}$ = image value at location $i$ for subject $j$.
• Question: Does the image mean depend on predictor $x$ at any location $i$?
• Methods:
1. Parametric: Random Field Theory
   • Con: Assumptions
2. Nonparametric: Permutation
   • Con: Slow
3. New Approach: weaker assumptions, faster?

Theoretical Basis
• One-sample case: test $H_0: \mu_i = 0$, $i = 1, \dots, k$ vs. $H_1: \mu_i \neq 0$ for at least one location $i$.
• Theorem 1 (New Result):
– Let $t_i$ be the t-test statistic for location $i$.
– Let $\delta_i = \sum_{j=1}^{n} Y_{i,j}^2$.
– If $Y_j$ is $MVN(0, \Sigma)$, then $t_i$ and $\delta_i$ are independent under $H_0$.
• Note: $E[\delta_i]$ is an increasing function of $\mu_i$.

Information Splitting
Suppose we have a continuous predictor:
$Y_{i,j} = \beta_i x_{i,j} + \text{Covariates} + \varepsilon_{i,j}$
1. Partition the sample into $m$ subsamples.
2. Let $t_{\beta_i, l}$ be the t-statistic for $H_{0,i}: \beta_i = 0$ in subsample $l$.
3. Define $Y_{i,l} = t_{\beta_i, l}$, $l = 1, \dots, m$; $i = 1, \dots, k$.
4. If $n/m$ is large, $Y_l \approx MVN$.
5. Compute $t_i$ and $\delta_i$; apply Theorem 1.

One Monotonic Recipe
1. $s_i = 1$ if $t_i > 1.886$; $s_i = -1$ if $t_i < -1.886$; otherwise $s_i = 0$.
2. Let $sa_1$ = average of $s_i$ for the smallest 1% of $\delta_i$;
   let $sa_2$ = average of $s_i$ for the next smallest 1% of $\delta_i$;
   …
   let $sa_{100}$ = average of $s_i$ for the largest 1% of $\delta_i$.
3. Fit the model $sa_p = \beta p + \varepsilon_p$, $p = 1, \dots, 100$.
4. Test $H_0: \beta = 0$ using permutation.
5. If the estimate of $\beta$ is approximately normal, use a permutation t-test.

Hippocampus Surface Normal Data
• $L_{i,j}$ = value of normal at left hippocampus at location $i$ for subject $j$
• $R_{i,j}$ = value of normal for right hippocampus
• $n = 582$ subjects; $k = 6611$ locations
• Let $S_{i,j} = L_{i,j} + R_{i,j}$ (assume bilateral symmetry)
• Is there a relationship between $L$ (or $S$) and a given SNP at one or more locations?

SA vs. P for $S$ (LR Hippo Sum)
[Figure panels: APOE, BIN1]

New Approach vs.
RFT Results
Hippo Data | Left, L | Left, L | LR Sum, S | LR Sum, S
SNP | APOE | BIN1 | APOE | BIN1
New Approach | 1.2 x 10^-5 | 1.3 x 10^-1 | 3.7 x 10^-7 | 7.0 x 10^-2
RFT Peak Amplitude | 1.3 x 10^-3 | 1.4 x 10^-1 | 6.6 x 10^-5 | 3.4 x 10^-1

Permutation Distribution Normality
[Figure panels: APOE, BIN1]

SurfStat APOE T-Map for LR Sum
[Figure]

SurfStat BIN1 T-Map for LR Sum
[Figure]

Comments
• Information splitting: info at location $i$ is shared by $t_i$ and $\delta_i$, which are independent under $H_0$.
• Performance/properties: seem favorable compared to RFT and permutation methods
• Going forward:
– Incorporate spatial information!
– Apply to larger images
– Do formal simulation studies

Acknowledgements
1. Andrew Saykin, Li Shen, and the Department of Radiology and Imaging Sciences, IU School of Medicine, who supported and financed my 2010-2011 sabbatical.
2. My main coauthor: Jing Wan, who did the SurfStat statistical analyses and data management.
3. My other coauthors/colleagues: Sungeun Kim, Kwansik Nho, and Shannon Risacher.

Hippocampus Surface Data
• FreeSurfer and Large Deformation Diffeomorphic Metric Mapping (FS+LDDMM) were used to segment hippocampal surfaces from MRI scans.
• To remove size effect, total intracranial volume (ICV) was adjusted to a constant and each hippocampus was scaled accordingly.
• Rigid body transformation was applied to register each hippocampus to a template.
• 6611 surface signals were extracted as the deformation along the surface normal direction of the template and were adjusted for baseline age, gender, education and handedness.

Genetic (SNP) Data
• Single Nucleotide Polymorphism (SNP): DNA sequence location possessing nucleotide variants of length one, i.e., T vs. C or A vs. G.
• The SNP data were genotyped using the Human 610-Quad BeadChip.
• Top 23 SNPs from AlzGene database and a SNP from the TOMM40 gene were considered.
• After quality controls, 20 SNPs remained.

Random Field Theory
• Suppose we want to test the global composite null $H_0: \beta_{1,j} = 0$ for all $j$ for a given SNP.
• By the Bonferroni inequality: $P(\max t > a) \le 6611 \, P(t_{df} > a)$
• Gaussian Random Field Theory (RFT) provides a much less conservative estimate: $P(\max t > a) \approx \sum_{d=0}^{D} \mathrm{Resels}_d \, EC_d(a)$, where the sum runs over $d = 0, \dots, D$ and $D$ is the number of dimensions of the image (K. Worsley)

Random Field Theory, Cont.:
• RFT p-value for the maximum $t$ statistic: $P(\max t > a) \approx \sum_{d=0}^{D} \mathrm{Resels}_d \, EC_d(a)$
• $\mathrm{Resels}_d$ is the number of $d$-dimensional resels (resolution elements); it depends on the smoothness (correlation) of the image, e.g. $\mathrm{Resels}_D = V / \mathrm{FWHM}^D$
• $EC_d(\zeta)$ is the $d$-dimensional Euler characteristic density. For large values of $\zeta$, the Euler characteristic is 0 or 1 depending on whether $t_j > \zeta$ for any $j$.

Random Field Theory Varieties
• Maximum Test Statistic: P-value = $P(\max t > a)$
• Spatial Extent of Suprathreshold $t$'s: P-value = $P(H > h_\tau)$, where $H$ is the number of connected suprathreshold $t$'s and $h_\tau$ is the observed number exceeding threshold $\tau$.
• Cluster Maximum and Spatial Extent

Left Spherical Distribution Theory
Theorem: Let $Y$ be a $p$ by $n$ matrix of $n$ $p$-dimensional observations which is multivariate normal, $N_{p \times n}(0, \Sigma \otimes I_n)$. Let $d$ be a $p$-dimensional vector of weights determined uniquely by $YY'$.
• Let $z' = (z_j)' = d'Y$.
• Let $\bar{z} = z' 1_n / n$.
• Let $S_z^2 = (z'z - n\bar{z}^2)/(n - 1)$.
• Then $t = \sqrt{n}\,\bar{z}/S_z$ has a $t_{n-1}$ distribution.

Comparison of Maps
[Figure panels: Information-Splitting: $\delta_j$; Statistical Parametric Map: $t_j$]

Materials
• 582 non-Hispanic Caucasian participants: 166 healthy controls (HCs), 287 mild cognitive impairment (MCI), and 129 AD
• Magnetic resonance imaging (MRI) data
• 20 SNPs were selected from the AlzGene database and TOMM40 gene and coded to test additive genetic effect (i.e. dose dependent effect of the minor allele).
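
Illustrative sketch of the "One Monotonic Recipe"

The steps on the "Information Splitting" and "One Monotonic Recipe" slides can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It reads δ_i in Theorem 1 as the sum of squared values at location i. It uses simulated null data with m = 3 observations per location and independent locations. The permutation step is taken to mean shuffling the bin averages across the bin index p, and the slope is compared with the center of its permutation distribution. All function names are hypothetical.

```python
import numpy as np


def t_and_delta(Y):
    """Per-location one-sample t statistic and sum of squares.

    Y has shape (k, m): k locations, m values per location
    (for example, the m subsample t-statistics from the slides).
    """
    m = Y.shape[1]
    mean = Y.mean(axis=1)
    sd = Y.std(axis=1, ddof=1)
    t = np.sqrt(m) * mean / sd
    delta = (Y ** 2).sum(axis=1)  # assumed form of delta_i
    return t, delta


def binned_sign_averages(t, delta, threshold=1.886, n_bins=100):
    """Steps 1-2: threshold t into signs, then average within delta percentile bins."""
    s = np.where(t > threshold, 1.0, np.where(t < -threshold, -1.0, 0.0))
    order = np.argsort(delta)  # smallest delta first
    return np.array([b.mean() for b in np.array_split(s[order], n_bins)])


def slope_through_origin(sa):
    """Step 3: least-squares slope for sa_p = beta * p + error, p = 1..len(sa)."""
    p = np.arange(1, sa.size + 1, dtype=float)
    return (p @ sa) / (p @ p)


def permutation_pvalue(sa, n_perm=2000, seed=0):
    """Step 4 (assumed scheme): shuffle sa across bins and recompute the slope."""
    rng = np.random.default_rng(seed)
    observed = slope_through_origin(sa)
    perm = np.array([slope_through_origin(rng.permutation(sa))
                     for _ in range(n_perm)])
    center = perm.mean()
    extreme = np.sum(np.abs(perm - center) >= abs(observed - center))
    return (1 + extreme) / (n_perm + 1)


# Simulated null data: 5000 locations, m = 3 values per location.
rng = np.random.default_rng(42)
Y = rng.standard_normal((5000, 3))
t, delta = t_and_delta(Y)
sa = binned_sign_averages(t, delta)
p_value = permutation_pvalue(sa)
print(f"slope = {slope_through_origin(sa):.4g}, permutation p-value = {p_value:.3f}")
```

Under the null, Theorem 1 implies the signs s_i carry no information about the ordering by δ_i, so the bin averages should show no trend in p. With real data, Y would instead hold the subsample t-statistics for the predictor of interest.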