A New, Nonparametric Information-Splitting Image Analysis Technique
Mark Inlow

Jing Wan, Sungeun Kim, Kwansik Nho,
Shannon Risacher, Andrew Saykin, Li Shen
Life as a Statistics Professor…
Image Analysis Setup
• Data: 𝒀𝑖,𝑗 = image value at location 𝑖 for
subject 𝑗.
• Question: Does the image mean depend on
predictor 𝑥 at any location 𝑖?
• Methods:
1. Parametric: Random Field Theory
• Con: Assumptions
2. Nonparametric: Permutation
• Con: Slow
3. New Approach: weaker assumptions, faster?
Theoretical Basis
• One-sample case: test 𝐻0 : μ𝑖 = 0, 𝑖 = 1, … , 𝑘
vs. 𝐻1 : 𝜇𝑖 ≠ 0 for at least one location 𝑖.
• Theorem 1 (New Result):
– Let 𝑡𝑖 be the t-test statistic for location 𝑖.
– Let $\delta_i = \sum_{j=1}^{n} Y_{i,j}^2$.
– If 𝑌𝑗 is 𝑀𝑉𝑁(0, Σ) then 𝑡𝑖 and δ𝑖 are
independent under 𝐻0 .
• Note: 𝐸[δ𝑖] is an increasing function of |μ𝑖|.
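A small simulation can illustrate Theorem 1 at a single location. The sketch below is not part of the original analysis: it assumes unit-variance normal data, and the function names, sample size, and seed are hypothetical. It estimates the correlation between |t| and δ under the null (independence implies this is zero, although zero correlation alone does not prove independence) and compares the average δ for μ = 0 and μ = 1.

```python
import math
import random
import statistics

def t_and_delta(y):
    """One-sample t statistic and delta = sum of squares for one location."""
    n = len(y)
    t = statistics.mean(y) / (statistics.stdev(y) / math.sqrt(n))
    delta = sum(v * v for v in y)
    return t, delta

def simulate(mu, n=10, reps=5000, seed=0):
    """Draw `reps` samples of size n from N(mu, 1); return lists of |t| and delta."""
    rng = random.Random(seed)
    abs_t, deltas = [], []
    for _ in range(reps):
        y = [rng.gauss(mu, 1.0) for _ in range(n)]
        t, d = t_and_delta(y)
        abs_t.append(abs(t))
        deltas.append(d)
    return abs_t, deltas

def correlation(a, b):
    """Pearson correlation, written out by hand."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

abs_t0, delta0 = simulate(mu=0.0)   # null case
abs_t1, delta1 = simulate(mu=1.0)   # hypothetical alternative

corr_null = correlation(abs_t0, delta0)     # expected to be near 0 under H0
mean_delta_null = statistics.mean(delta0)   # expected near n * 1
mean_delta_alt = statistics.mean(delta1)    # expected near n * (1 + mu^2)
```

With unit variance, E[δ] = n(1 + μ²), so the second average should be roughly twice the first here.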
Information Splitting
Suppose we have a continuous predictor:
𝒀𝑖,𝑗 = β𝑖 𝑥𝑖,𝑗 + 𝐶𝑜𝑣𝑎𝑟𝑖𝑎𝑡𝑒𝑠 + ε𝑖,𝑗
1. Partition the sample into 𝑚 subsamples
2. Let 𝑡β𝑖 ,𝑙 be t-stat for 𝐻0,𝑖 : β𝑖 = 0, subsample 𝑙.
3. Define 𝒀𝑖,𝑙 = 𝑡β𝑖 ,𝑙 , 𝑙 = 1, … , 𝑚; 𝑖 = 1, … , 𝑘
4. If 𝑛/𝑚 large, 𝑌𝑙 ≈ 𝑀𝑉𝑁.
5. Compute 𝑡𝑖 and δ𝑖 ; apply Theorem 1
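The five steps above can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a simple regression with an intercept and no other covariates, a random partition into equal-sized subsamples, and hypothetical names (`slope_t_stat`, `split_information`) and simulated values throughout.

```python
import math
import random
import statistics

def slope_t_stat(x, y):
    """t statistic for H0: slope = 0 in the simple regression y = a + b*x + e."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b / math.sqrt(rss / (n - 2) / sxx)

def split_information(x, images, m, seed=0):
    """images[i][j] is the value at location i for subject j.

    Returns one (t_i, delta_i) pair per location, both computed from
    the m subsample t statistics.
    """
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    subsamples = [order[l::m] for l in range(m)]   # step 1: partition
    results = []
    for y in images:
        # steps 2-3: one regression t statistic per subsample
        t_sub = [slope_t_stat([x[j] for j in idx], [y[j] for j in idx])
                 for idx in subsamples]
        # step 5: one-sample t and sum of squares across subsamples
        t_i = statistics.mean(t_sub) / (statistics.stdev(t_sub) / math.sqrt(m))
        delta_i = sum(t * t for t in t_sub)
        results.append((t_i, delta_i))
    return results

# Synthetic example (all values hypothetical): 300 subjects, 4 locations,
# a predictor coded 0/1/2, and an effect at location 0 only.
rng = random.Random(1)
n, m = 300, 3
x = [rng.choice([0, 1, 2]) for _ in range(n)]
betas = [1.0, 0.0, 0.0, 0.0]
images = [[b * xj + rng.gauss(0.0, 1.0) for xj in x] for b in betas]
results = split_information(x, images, m)
```

In this toy setting the location with a real effect should show a much larger δ than the null locations, which is the property the later steps exploit.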
One Monotonic Recipe
1. 𝑠𝑖 = 1, 𝑡𝑖 > 1.886; −1, 𝑡𝑖 < −1.886; else 0
2. Let 𝑠𝑎1 = average of 𝑠𝑖 for smallest 1% of δ𝑖 ;
Let 𝑠𝑎2 = average of 𝑠𝑖 for next smallest 1% of δ𝑖 ;
…
Let 𝑠𝑎100 = average of 𝑠𝑖 for largest 1% of δ𝑖 .
3. Fit model 𝑠𝑎𝑝 = β·𝑝 + ε𝑝 , 𝑝 = 1, … , 100.
4. Test 𝐻0 : β = 0 using permutation.
5. If the permutation distribution of the estimate of β is approximately normal, use a permutation t-test.
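The recipe can be sketched as follows. Several choices here are assumptions rather than the authors' code: the slope is fitted through the origin because the model in step 3 shows no intercept, the permutation p-value is two-sided, and all function names and the synthetic data are hypothetical. The cutoff 1.886 matches the upper 10% point of a t distribution with 2 degrees of freedom, which would correspond to 𝑚 = 3 subsamples, although the slide does not say so.

```python
import math
import random

def sign_indicator(t, cutoff=1.886):
    """Step 1: +1, -1, or 0 depending on where t falls relative to the cutoff."""
    return 1 if t > cutoff else (-1 if t < -cutoff else 0)

def binned_sign_averages(t_stats, deltas, bins=100):
    """Step 2: average the indicators within percentile bins of delta."""
    order = sorted(range(len(deltas)), key=lambda i: deltas[i])
    k = len(order)   # assumes k >= bins so that no bin is empty
    averages = []
    for p in range(bins):
        chunk = order[p * k // bins:(p + 1) * k // bins]
        averages.append(sum(sign_indicator(t_stats[i]) for i in chunk) / len(chunk))
    return averages

def slope_through_origin(sa):
    """Step 3: least-squares slope of sa_p on p = 1..len(sa), no intercept."""
    num = sum(p * v for p, v in enumerate(sa, start=1))
    den = sum(p * p for p in range(1, len(sa) + 1))
    return num / den

def permutation_p_value(sa, n_perm=2000, seed=0):
    """Step 4: permute the bin averages across p and recompute the slope."""
    rng = random.Random(seed)
    observed = abs(slope_through_origin(sa))
    shuffled = list(sa)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope_through_origin(shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Synthetic example (hypothetical): 2000 locations, 3 subsample values each,
# with a positive effect at the first 200 locations.
rng = random.Random(1)
t_stats, deltas = [], []
for i in range(2000):
    effect = 3.0 if i < 200 else 0.0
    y = [effect + rng.gauss(0.0, 1.0) for _ in range(3)]
    mean = sum(y) / 3
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / 2)
    t_stats.append(mean / (sd / math.sqrt(3)))
    deltas.append(sum(v * v for v in y))

sa = binned_sign_averages(t_stats, deltas)
p_value = permutation_p_value(sa)
```

Permuting the bin averages is reasonable under the null because 𝑡𝑖 and δ𝑖 are then independent, so the ordering by δ𝑖 carries no information about the signs.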
Hippocampus Surface Normal Data
• 𝑳𝑖,𝑗 = value of normal at left hippocampus at
location 𝑖 for subject j
• 𝑹𝑖,𝑗 = value of normal for right hippocampus
• n = 582 subjects; k = 6611 locations
• Let 𝑺𝑖,𝑗 = 𝑳𝑖,𝑗 + 𝑹𝑖,𝑗 (assume bilateral symmetry)
• Is there a relationship between 𝑳 (or 𝑺) and a
given SNP at one or more locations?
SA vs. P for 𝑆 (LR Hippo Sum)
[Figure panels: APOE, BIN1]
New Approach vs. RFT Results
Hippo Data | SNP  | New Approach | RFT Peak Amplitude
Left, 𝐿    | APOE | 1.2 × 10^−5  | 1.3 × 10^−3
Left, 𝐿    | BIN1 | 1.3 × 10^−1  | 1.4 × 10^−1
LR Sum, 𝑆  | APOE | 3.7 × 10^−7  | 6.6 × 10^−5
LR Sum, 𝑆  | BIN1 | 7.0 × 10^−2  | 3.4 × 10^−1
Permutation Distribution Normality
[Figure panels: APOE, BIN1]
SurfStat APOE T-Map for LR Sum
SurfStat BIN1 T-map for LR Sum
Comments
• Information splitting: the information at location 𝑖 is split between 𝑡𝑖 and δ𝑖, which are independent under 𝐻0.
• Performance/properties: seem favorable
compared to RFT and permutation methods
• Going forward:
– Incorporate spatial information!
– Apply to larger images
– Do formal simulation studies
Acknowledgements
1. Andrew Saykin, Li Shen, and the Department
of Radiology and Imaging Sciences, IU School
of Medicine, who supported and financed my
2010-2011 sabbatical.
2. My main coauthor: Jing Wan, who did the
SurfStat statistical analyses and data
management.
3. My other coauthors/colleagues: Sungeun
Kim, Kwansik Nho, and Shannon Risacher.
Hippocampus Surface Data
• FreeSurfer and Large Deformation Diffeomorphic
Metric Mapping (FS+LDDMM) were used to segment
hippocampal surfaces from MRI scans
• To remove size effect, total intracranial volume (ICV)
was adjusted to a constant and each hippocampus was
scaled accordingly.
• Rigid body transformation was applied to register each
hippocampus to a template.
• 6611 surface signals were extracted as the deformation
along the surface normal direction of the template and
were adjusted for baseline age, gender, education and
handedness.
Genetic (SNP) Data
• Single Nucleotide Polymorphism (SNP) – DNA
sequence location possessing nucleotide
variants of length one, i.e., T vs. C or A vs. G.
• The SNP data were genotyped using the
Human 610-Quad BeadChip.
• Top 23 SNPs from AlzGene database and a SNP
from the TOMM40 gene were considered.
• After quality controls, 20 SNPs remained.
Random Field Theory
• Suppose we want to test the global composite
null 𝐻0: β1,𝑗 = 0 for all 𝑗 for a given SNP.
• By the Bonferroni inequality:
$P(\max_j t_j > a) \le 6611 \, P(t_{df} > a)$
• Gaussian Random Field Theory (RFT) provides a
much less conservative estimate:
$P(\max_j t_j > a) \approx \sum_{d=0}^{D} \mathrm{Resels}_d \, EC_d(a)$
where the sum runs over 𝑑 = 0, … , 𝐷 and 𝐷 is the number of
dimensions of the image (K. Worsley)
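To give a sense of scale for the Bonferroni bound, the sketch below evaluates it for 6611 locations. It assumes the standard normal as a stand-in for the 𝑡 distribution, which is reasonable only when the degrees of freedom are large; the function names are hypothetical.

```python
from statistics import NormalDist

def bonferroni_bound(a, k):
    """Upper bound on P(max t > a) over k locations (normal approximation)."""
    tail = 1.0 - NormalDist().cdf(a)
    return min(1.0, k * tail)

def bonferroni_threshold(alpha, k):
    """Threshold a at which the Bonferroni bound equals alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / k)

k = 6611
threshold = bonferroni_threshold(0.05, k)   # roughly 4.3 under this approximation
bound_at_threshold = bonferroni_bound(threshold, k)
```

The bound ignores correlation between neighbouring locations, which is why it becomes conservative for smooth images.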
Random Field Theory, Cont.:
• RFT p-value for the maximum 𝑡 statistic:
$P(\max_j t_j > a) \approx \sum_{d=0}^{D} \mathrm{Resels}_d \, EC_d(a)$
• $\mathrm{Resels}_d$ is the number of 𝑑-dimensional
resels (resolution elements); it depends on the
smoothness (correlation) of the image, e.g.
$\mathrm{Resels}_D = V / \mathrm{FWHM}^D$
• $EC_d(\zeta)$ is the 𝑑-dimensional Euler
characteristic density. For large values of ζ, the Euler
characteristic of the set of locations with 𝑡𝑗 > ζ is 0 or 1,
according to whether 𝑡𝑗 > ζ for any 𝑗.
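The resel sum can be made concrete with a small numerical sketch. It rests on several assumptions that are not from the slides: it uses the Euler characteristic densities of a Gaussian field rather than a 𝑡 field, covers only 𝑑 = 0, 1, 2, and plugs in made-up resel counts (a volume of 6611 units and an FWHM of 5 units are purely illustrative). The function names are hypothetical.

```python
import math
from statistics import NormalDist

def top_resels(volume, fwhm, dims):
    """Resels_D = V / FWHM^D for the highest dimension D."""
    return volume / fwhm ** dims

def gaussian_ec_density(d, a):
    """EC density of a Gaussian random field at threshold a, for d = 0, 1, 2."""
    if d == 0:
        return 1.0 - NormalDist().cdf(a)
    c = 4.0 * math.log(2.0)
    tail = math.exp(-a * a / 2.0)
    if d == 1:
        return math.sqrt(c) / (2.0 * math.pi) * tail
    if d == 2:
        return c / (2.0 * math.pi) ** 1.5 * a * tail
    raise ValueError("only d = 0, 1, 2 are implemented in this sketch")

def rft_p_value(a, resels):
    """Approximate P(max > a) as the sum over d of Resels_d * EC_d(a)."""
    total = sum(r * gaussian_ec_density(d, a) for d, r in enumerate(resels))
    return min(1.0, total)

# Hypothetical resel counts for a 2-D surface: [Resels_0, Resels_1, Resels_2].
resels = [1.0, 0.0, top_resels(volume=6611.0, fwhm=5.0, dims=2)]
a = 4.33
p_rft = rft_p_value(a, resels)
p_bonferroni = min(1.0, 6611 * (1.0 - NormalDist().cdf(a)))
```

With these illustrative numbers the RFT approximation is smaller than the Bonferroni bound at the same threshold, reflecting the smoothness encoded in the resel count.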
Random Field Theory Varieties
• Maximum Test Statistic:
P-value = 𝑃(max 𝑡 > 𝑎)
• Spatial Extent of Suprathreshold 𝑡’s:
P-value = 𝑃(𝐻 > ℎ𝜏 )
where 𝐻 is the number of connected
suprathreshold 𝑡’s; ℎ𝜏 is observed number
exceeding threshold τ.
• Cluster Maximum and Spatial Extent
Left Spherical Distribution Theory
Theorem: Let 𝒀 be a 𝑝 by 𝑛 matrix of 𝑛
𝑝-dimensional observations which is
multivariate normal, $Y \sim N_{p \times n}(0, \Sigma \otimes I_n)$. Let 𝒅 be
a 𝑝-dimensional vector of weights determined
uniquely by 𝒀𝒀′.
• Let $z' = (z_j)' = d'Y$.
• Let $\bar{z} = z' 1_n / n$.
• Let $S_z^2 = (z'z - n\bar{z}^2)/(n - 1)$.
• Then $t = \sqrt{n}\,\bar{z}/S_z$ has a $t_{n-1}$ distribution.
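The theorem can be checked by simulation. In the sketch below the weights are taken as the reciprocal square roots of the diagonal of 𝒀𝒀′, which is one possible choice that depends on the data only through 𝒀𝒀′; that choice, the correlation structure, the function names, and the simulation settings are all assumptions. The critical value 2.262 is the two-sided 5% point of a 𝑡 distribution with 9 degrees of freedom, matching 𝑛 = 10.

```python
import math
import random

def left_spherical_t(Y):
    """Y is a list of p rows, each holding n observations."""
    p, n = len(Y), len(Y[0])
    # Weights from the diagonal of YY' (an assumed, illustrative choice).
    d = [1.0 / math.sqrt(sum(v * v for v in row)) for row in Y]
    z = [sum(d[k] * Y[k][j] for k in range(p)) for j in range(n)]
    z_bar = sum(z) / n
    s2 = (sum(v * v for v in z) - n * z_bar ** 2) / (n - 1)
    return math.sqrt(n) * z_bar / math.sqrt(s2)

def null_rejection_rate(p=4, n=10, reps=4000, critical=2.262, seed=0):
    """Fraction of null data sets with |t| above the t_{n-1} critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # A shared component per observation makes the p variables correlated
        # while keeping the n observations independent.
        shared = [rng.gauss(0.0, 1.0) for _ in range(n)]
        Y = [[rng.gauss(0.0, 1.0) + 0.8 * shared[j] for j in range(n)]
             for _ in range(p)]
        if abs(left_spherical_t(Y)) > critical:
            rejections += 1
    return rejections / reps

rate = null_rejection_rate()   # should be close to the nominal 0.05
```

The point of the check is that the rejection rate stays near 5% even though the weights are computed from the same data that the test uses.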
Comparison of Maps
Information-Splitting: 𝛿𝒋
Statistical Parametric Map: 𝒕𝒋
Materials
• 582 non-Hispanic Caucasian participants
166 healthy controls (HCs), 287 mild cognitive impairment (MCI),
and 129 AD
• Magnetic resonance imaging (MRI) data
• 20 SNPs were selected from the AlzGene database and
TOMM40 gene and coded to test additive genetic effect (i.e.
dose dependent effect of the minor allele).
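Additive coding can be illustrated with a short sketch: each genotype is replaced by the number of copies of the minor allele it carries. The genotype strings and function names below are hypothetical and are not taken from the study data.

```python
from collections import Counter

def minor_allele(genotypes):
    """Return the less frequent allele across all genotypes in the sample."""
    counts = Counter(allele for g in genotypes for allele in g)
    return min(counts, key=counts.get)

def additive_code(genotype, minor):
    """Number of copies (0, 1, or 2) of the minor allele in one genotype."""
    return sum(1 for allele in genotype if allele == minor)

# Hypothetical genotypes for six subjects at one SNP.
genotypes = ["AA", "AG", "GG", "AA", "AG", "AA"]
minor = minor_allele(genotypes)                            # "G" (4 copies vs. 8 of "A")
dosages = [additive_code(g, minor) for g in genotypes]     # [0, 1, 2, 0, 1, 0]
```

Using the 0/1/2 dosage as the predictor means each extra copy of the minor allele is assumed to shift the image mean by the same amount.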