Fast, accurate methods for heritability and association in high-dimensional connectomics data
Thomas E. Nichols
University of Warwick

Plan
•  Heritability: What/Why
•  Three Two Methods
   –  Boxplot Heritability
   –  1-step estimators
   –  No-step estimators

Ganjgahi, H., Winkler, A. M., Glahn, D. C., Blangero, J., Kochunov, P., & Nichols, T. E. (2015). Fast and powerful heritability inference for family-based neuroimaging studies. NeuroImage. online.
Ge, T., Nichols, T. E., Lee, P. H., Holmes, A. J., Roffman, J. L., Buckner, R. L., … Smoller, J. W. (2015). Massively expedited genome-wide heritability analysis (MEGHA). Proceedings of the National Academy of Sciences, 112, 201415603.

Heritability: A variance decomposition
Total Variance
   A: "A"dditive genetic variance
   D: "D"ominant genetic variance
   C: Environmental effects, "C"ommon to twins/family members
   E: Measurement "E"rrors, or other effects unique to each individual
Variance Decomposition
\[ V = A + D + C + E \]
•  Narrow sense heritability $h^2 = A/V$
   –  Proportion of all variance attributed to additive genetic variance
•  Common environment $c^2 = C/V$
   –  Proportion of all variance attributable to shared environmental variance
•  Why bother?
   –  A phenotype is defined as a heritable trait
   –  Indicates potential for genetic associations
   –  Unitless measure, can be used for comparing different biological quantities

Estimating Heritability
•  Requires varying relatedness… like with twins
Cov(Y_Twin1, Y_Twin2) =
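MZ Twins, Raised Together:
\[ \begin{bmatrix} A + D + C + E & A + D + C \\ A + D + C & A + D + C + E \end{bmatrix} \]
MZ Twins, Raised Apart:
\[ \begin{bmatrix} A + D + C + E & A + D \\ A + D & A + D + C + E \end{bmatrix} \]
DZ Twins, Raised Together:
\[ \begin{bmatrix} A + D + C + E & \tfrac{1}{2}A + \tfrac{1}{4}D + C \\ \tfrac{1}{2}A + \tfrac{1}{4}D + C & A + D + C + E \end{bmatrix} \]
DZ Twins, Raised Apart:
\[ \begin{bmatrix} A + D + C + E & \tfrac{1}{2}A + \tfrac{1}{4}D \\ \tfrac{1}{2}A + \tfrac{1}{4}D & A + D + C + E \end{bmatrix} \]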
•  But you never have twins raised apart!
•  With only MZ & DZ twins, can only estimate 2 unknowns!
   –  Must either assume D=0 or C=0; generally assume D=0

Heritability Modeling
•  Requires varying relatedness… like with twins
Cov(Y_Twin1, Y_Twin2) =
MZ Twins, Raised Together:
\[ \begin{bmatrix} A + D + C + E & A + D + C \\ A + D + C & A + D + C + E \end{bmatrix} \]
DZ Twins, Raised Together:
\[ \begin{bmatrix} A + D + C + E & \tfrac{1}{2}A + \tfrac{1}{4}D + C \\ \tfrac{1}{2}A + \tfrac{1}{4}D + C & A + D + C + E \end{bmatrix} \]
•  ACE Model
   –  Comes from assuming D=0
   –  Estimation
      •  Variance components model, maximum likelihood
      •  Structural Equation Model, maximum likelihood
      •  Method-of-moments

Estimating Heritability
Corr(Y_Twin1, Y_Twin2) =
MZ Twins:
\[ \begin{bmatrix} 1 & h^2 + c^2 \\ h^2 + c^2 & 1 \end{bmatrix} \]
DZ Twins, Raised Together:
\[ \begin{bmatrix} 1 & \tfrac{1}{2}h^2 + c^2 \\ \tfrac{1}{2}h^2 + c^2 & 1 \end{bmatrix} \]
•  Estimate $h^2$, $c^2$ with maximum likelihood
   –  Iterative optimization required
   –  Likelihood can be nasty… flat or highly ridged
   –  Convergence failures common
   –  For high-resolution images & connectomes, need something fast and reliable

Estimating Heritability
•  Method-of-Moments, aka Falconer's Estimate
\[ h^2_F = 2\,(r_{MZ} - r_{DZ}) \]
   –  Where
      •  $r_{MZ}$ is MZ intra-class correlation
      •  $r_{DZ}$ is DZ intra-class correlation
•  Falconer's is easy but not optimal
   –  E.g. it can be negative!
   –  Generally a higher variance estimate than ML methods

Aggregate (Boxplot) Heritability
•  $h^2$ estimation driven by correlations over subjects, for each phenotype
Joint work with Steve Smith, Xu Chen
[Figure: scatter plots of Twin i(2) against Twin i(1) at Voxel #1, one panel for DZ twins and one for MZ twins; 75 twin pairs per panel]
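The per-phenotype estimate described above (correlate the two twins over pairs, then apply Falconer's formula) can be sketched in a few lines of Python. This is a minimal illustration and not the speaker's code: the function names, the simulation settings (75 pairs per group, $h^2 = 0.5$, $c^2 = 0.2$) and the use of a double-entry Pearson correlation as a simple stand-in for the intra-class correlation are all assumptions.

```python
import numpy as np

def simulate_twin_pairs(n_pairs, h2, c2, kin, rng):
    """Simulate unit-variance ACE phenotypes (D = 0) for n_pairs twin pairs.

    kin is the additive genetic correlation: 1.0 for MZ, 0.5 for DZ.
    Returns an (n_pairs, 2) array, one column per twin.
    """
    e2 = 1.0 - h2 - c2
    shared_a = rng.standard_normal(n_pairs)   # genetic part shared by both twins
    common = rng.standard_normal(n_pairs)     # common environment
    y = np.empty((n_pairs, 2))
    for t in range(2):
        own_a = rng.standard_normal(n_pairs)  # genetic part unique to this twin
        a = np.sqrt(kin) * shared_a + np.sqrt(1.0 - kin) * own_a
        e = rng.standard_normal(n_pairs)      # unique environment / error
        y[:, t] = np.sqrt(h2) * a + np.sqrt(c2) * common + np.sqrt(e2) * e
    return y

def twin_correlation(y):
    """Double-entry Pearson correlation (assumed stand-in for the ICC)."""
    first = np.concatenate([y[:, 0], y[:, 1]])
    second = np.concatenate([y[:, 1], y[:, 0]])
    return np.corrcoef(first, second)[0, 1]

def falconer_h2(y_mz, y_dz):
    """Falconer's estimate: 2 * (r_MZ - r_DZ)."""
    return 2.0 * (twin_correlation(y_mz) - twin_correlation(y_dz))

rng = np.random.default_rng(0)
y_mz = simulate_twin_pairs(75, h2=0.5, c2=0.2, kin=1.0, rng=rng)
y_dz = simulate_twin_pairs(75, h2=0.5, c2=0.2, kin=0.5, rng=rng)
h2_f = falconer_h2(y_mz, y_dz)
print(f"Falconer estimate from 75 + 75 pairs: {h2_f:.2f} (simulated h2 = 0.5)")
```

With only 75 pairs per group the estimate is noisy and can fall below zero, which is the weakness of Falconer's estimate noted on the slide.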
Aggregate (Boxplot) Heritability
•  With high-dimensional phenotype, can compute pairwise correlations
[Figure: scatter plots of one twin against the co-twin over all voxels, one panel per twin pair: Twin pair 1 (MZ), Twin pair 2 (DZ), Twin pair 3 (DZ), …; 1000 voxels per panel]
•  Then compare over subjects
•  $2(\langle r_{MZ} \rangle - \langle r_{DZ} \rangle)$
   –  Is that average heritability?
   –  Surely the difference between $\langle r_{MZ} \rangle$ and $\langle r_{DZ} \rangle$ must tell you something about heritability?
[Figure: boxplots of pairwise correlations r for MZ and DZ pairs; r axis from 0.68 to 0.8]
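The pairwise-correlation recipe above can be sketched as follows: for every twin pair, correlate the two twins' images across voxels, average the correlations within the MZ and DZ groups, and take twice the difference. The array layout `(n_pairs, 2, n_voxels)`, the function names and the toy simulation (independent voxels, equal variances, zero voxel means, $h^2 = 0.5$ everywhere, no common environment) are assumptions made for illustration only.

```python
import numpy as np

def pairwise_correlations(data):
    """Correlate twin 1 with twin 2 across voxels, giving one r per twin pair.

    data has shape (n_pairs, 2, n_voxels).
    """
    t1 = data[:, 0, :] - data[:, 0, :].mean(axis=1, keepdims=True)
    t2 = data[:, 1, :] - data[:, 1, :].mean(axis=1, keepdims=True)
    num = (t1 * t2).sum(axis=1)
    den = np.sqrt((t1 ** 2).sum(axis=1) * (t2 ** 2).sum(axis=1))
    return num / den

def aggregate_heritability(data_mz, data_dz):
    """2 * (mean pairwise r over MZ pairs - mean pairwise r over DZ pairs)."""
    r_mz = pairwise_correlations(data_mz)
    r_dz = pairwise_correlations(data_dz)
    return 2.0 * (r_mz.mean() - r_dz.mean())

def simulate_images(n_pairs, n_voxels, h2, kin, rng):
    """Toy AE data: independent voxels, unit variance, zero mean at every voxel."""
    shared = rng.standard_normal((n_pairs, 1, n_voxels))  # shared genetic part
    own = rng.standard_normal((n_pairs, 2, n_voxels))     # twin-specific genetic part
    a = np.sqrt(kin) * shared + np.sqrt(1.0 - kin) * own
    e = rng.standard_normal((n_pairs, 2, n_voxels))
    return np.sqrt(h2) * a + np.sqrt(1.0 - h2) * e

rng = np.random.default_rng(0)
data_mz = simulate_images(75, 1000, h2=0.5, kin=1.0, rng=rng)
data_dz = simulate_images(75, 1000, h2=0.5, kin=0.5, rng=rng)
ag_he = aggregate_heritability(data_mz, data_dz)
print(f"2(<r_MZ> - <r_DZ>) = {ag_he:.3f}")
```

In this idealized setting the statistic lands close to the simulated heritability. Real images have voxel-specific means, unequal variances and correlated voxels, which is why the question of what the statistic actually estimates needs the analysis that follows.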
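What exactly do these mean!?
\[ E(r_{MZ}) \approx \frac{\mathrm{Var}(\mu)/\bar{\sigma}^2 + \tilde{h}^2 + \tilde{c}^2 - \widetilde{ERV}}{\mathrm{Var}(\mu)/\bar{\sigma}^2 + 1 - \tilde{\rho}_P} \]
\[ E(r_{DZ}) \approx \frac{\mathrm{Var}(\mu)/\bar{\sigma}^2 + \tfrac{1}{2}\tilde{h}^2 + \tilde{c}^2 - \tfrac{1}{2}\widetilde{ERV}}{\mathrm{Var}(\mu)/\bar{\sigma}^2 + 1 - \tilde{\rho}_P} \]
\[ E(r_{UR}) \approx \frac{\mathrm{Var}(\mu)/\bar{\sigma}^2}{\mathrm{Var}(\mu)/\bar{\sigma}^2 + 1 - \tilde{\rho}_P} \]
•  Huge influence of phenotype mean
   –  Variance of mean common to all r's
   –  If we demean, then Var(µ) = 0
Where
   –  $\mu_j$: mean of voxel j
   –  $\mathrm{Var}(\mu)$: variance of mean $\mu_j$ of voxel j
   –  $\bar{\sigma}^2 = \frac{1}{J}\sum_j \sigma_j^2$: average of voxel variance

What do these mean (after de-meaning)?
\[ E(r_{MZ}) \approx \frac{\tilde{h}^2 + \tilde{c}^2 - \widetilde{ERV}}{1 - \tilde{\rho}_P} \]
\[ E(r_{DZ}) \approx \frac{\tfrac{1}{2}\tilde{h}^2 + \tilde{c}^2 - \tfrac{1}{2}\widetilde{ERV}}{1 - \tilde{\rho}_P} \]
\[ E(r_{UR}) = 0 \]
•  So a group comparison gives…
\[ E(r_{MZ} - r_{DZ}) \approx \frac{1}{2}\,\frac{\tilde{h}^2 - \widetilde{ERV}}{1 - \tilde{\rho}_P} \]
Where
   –  Variance-weighted average heritability
\[ \tilde{h}^2 = \frac{1}{J}\sum_j \left(\frac{\sigma_j^2}{\bar{\sigma}^2}\right) h_j^2 \]
   –  Variance-weighted avg. common var.
\[ \tilde{c}^2 = \frac{1}{J}\sum_j \left(\frac{\sigma_j^2}{\bar{\sigma}^2}\right) c_j^2 \]
   –  Variance-weighted ERV
\[ \widetilde{ERV} = \frac{2}{J(J-1)}\sum_{j>j'} \left(\frac{\sigma_j \sigma_{j'}}{\bar{\sigma}^2}\right) ERV_{jj'} \]
   –  ERV: Endophenotype Ranking Value, heritability × genetic correlation
\[ ERV_{jj'} = h_j\, h_{j'}\, \rho^G_{jj'} \]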
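To make the approximations concrete, the sketch below evaluates them numerically from per-voxel parameters. It assumes the forms E(r_MZ) ≈ (m + h̃² + c̃² − ERV~)/(m + 1 − ρ̃_P), E(r_DZ) ≈ (m + ½h̃² + c̃² − ½ERV~)/(m + 1 − ρ̃_P) and E(r_UR) ≈ m/(m + 1 − ρ̃_P), with m = Var(µ)/σ̄², and takes ρ̃_P to be the variance-weighted average of the inter-voxel phenotypic correlations. The function name, argument names and toy inputs are hypothetical.

```python
import numpy as np

def expected_pairwise_r(mu, sigma, h2, c2, rho_g, rho_p):
    """Evaluate the approximate expected pairwise correlations.

    mu, sigma, h2, c2 : length-J arrays of voxel means, standard deviations,
                        heritabilities and common-environment fractions.
    rho_g, rho_p      : J x J genetic and phenotypic correlation matrices.
    """
    n_vox = len(sigma)
    sigma_bar2 = np.mean(sigma ** 2)               # average voxel variance
    w = sigma ** 2 / sigma_bar2                    # per-voxel variance weights
    h2_tilde = np.mean(w * h2)                     # variance-weighted mean h2
    c2_tilde = np.mean(w * c2)                     # variance-weighted mean c2
    pairs = np.triu_indices(n_vox, k=1)            # every voxel pair once
    pair_w = np.outer(sigma, sigma)[pairs] / sigma_bar2
    erv = (np.sqrt(np.outer(h2, h2)) * rho_g)[pairs]   # ERV_jj' = h_j h_j' rho_G
    erv_tilde = np.mean(pair_w * erv)
    rho_p_tilde = np.mean(pair_w * rho_p[pairs])
    m = np.var(mu) / sigma_bar2                    # contribution of voxel means
    den = m + 1.0 - rho_p_tilde
    return {
        "h2_tilde": h2_tilde,
        "E_r_MZ": (m + h2_tilde + c2_tilde - erv_tilde) / den,
        "E_r_DZ": (m + 0.5 * h2_tilde + c2_tilde - 0.5 * erv_tilde) / den,
        "E_r_UR": m / den,
    }

# Toy inputs: four independent voxels, with and without a mean pattern.
n = 4
common = dict(sigma=np.ones(n), h2=np.full(n, 0.4), c2=np.full(n, 0.1),
              rho_g=np.eye(n), rho_p=np.eye(n))
demeaned = expected_pairwise_r(mu=np.zeros(n), **common)
with_mean = expected_pairwise_r(mu=np.array([0.0, 1.0, 2.0, 3.0]), **common)
print(demeaned)
print(with_mean)
```

Comparing the two calls shows the point made on the slide: a non-zero Var(µ) raises every expected correlation, including the unrelated-pair one, and shrinks the MZ minus DZ gap.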
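What does a difference in means mean?
•  What if this effect is significant?
\[ E(r_{MZ} - r_{DZ}) \approx \frac{1}{2}\,\frac{\tilde{h}^2 - \widetilde{ERV}}{1 - \tilde{\rho}_P} > 0 \]
[Figure: boxplots of pairwise r's for MZ and DZ pairs; r axis from 0.68 to 0.8]
•  Indicates significant heritability (-ish)
   –  Shifted by $\widetilde{ERV}$, attenuated by $1 - \tilde{\rho}_P$
   –  $\tilde{\rho}_P$: variance-weighted average inter-voxel correlation
\[ \tilde{\rho}_P = \frac{2}{J(J-1)}\sum_{j>j'} \left(\frac{\sigma_j \sigma_{j'}}{\bar{\sigma}^2}\right) \rho^P_{jj'} \]
•  But! It's a valid test for any heritability!
\[ h_j^2 = 0 \;\forall j \;\Rightarrow\; ERV_{jj'} = 0 \;\forall j, j' \;\Rightarrow\; E(r_{MZ} - r_{DZ}) = 0 \]

Applications?
•  "Aggregate Heritability" (AgHe)
\[ \mathrm{AgHe} = 2\,(\langle r_{MZ} \rangle - \langle r_{DZ} \rangle) \approx \frac{\tilde{h}^2 - \widetilde{ERV}}{1 - \tilde{\rho}_P} \approx \tilde{h}^2 \quad \text{(demeaned result)} \]
   –  Biased estimate of variance-weighted heritability
•  Regardless of AgHe, Suggests Approach to High-dimensional Phenotype Heritability Ranking
   –  Mean $h^2$, $\bar{h}^2$
      •  $h^2$ computed at each element/voxel, then averaged
   –  Var-Weighted Mean $h^2$, $\tilde{h}^2$
      •  For BOLD phenotypes, not so crazy!
      •  Most active voxels most variable
\[ \tilde{h}^2 = \frac{1}{J}\sum_j \left(\frac{\sigma_j^2}{\bar{\sigma}^2}\right) h_j^2 \]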
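The two ranking summaries differ only in how voxels are weighted. The short sketch below contrasts the plain mean of per-voxel heritabilities with the variance-weighted mean, which equals the sum of σ_j² h_j² divided by the sum of σ_j². The function names and the three-voxel toy values are illustrative assumptions.

```python
import numpy as np

def mean_h2(h2):
    """Plain average of per-voxel heritabilities."""
    return float(np.mean(h2))

def var_weighted_mean_h2(h2, sigma2):
    """Variance-weighted average: (1/J) * sum_j (sigma_j^2 / sigma_bar^2) * h_j^2."""
    h2 = np.asarray(h2, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    return float(np.sum(sigma2 * h2) / np.sum(sigma2))

# Toy case: the most variable voxel is also the most heritable one.
h2 = [0.0, 0.2, 0.8]
sigma2 = [1.0, 1.0, 8.0]
print(mean_h2(h2), var_weighted_mean_h2(h2, sigma2))
```

When high-variance voxels are also the heritable ones, as suggested for BOLD phenotypes, the weighted mean is pulled toward their heritability.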
Validation Against Falconer's
•  1000 voxels, 500 heritable, wildly varying variance
•  Estimate $h^2_F$ at each voxel, then average or weighted average
•  AgHe always has smaller bias and variance
   –  Not as good as ML methods (of course)
[Figure: three panels of simulation results over the settings Null, (0,.2), (0,.3), (0,.5), (0,.7), (.2,0), (.3,0), (.5,0), (.7,0), (.2,.2), (.3,.2), (.2,.3), (.5,.2), (.3,.3), (.2,.5)]
HCP Phenotype Ranking
•  22 HCP Phenotypes…
   –  nElm = 3k-60k
   Structural
      Sulcal Depth, Cortical Thickness, Myelin, Cortical location
      Areal registration measures (Area ratio, displacement)
   Resting-State Functional
      ICA-based 100- & 200-dimensional
      Dual-regression vs. Direct PCA
      Partial correlation vs Full correlation with Mean GM corr.
      Partial correlation: various regularization parameters
   Task-Based Functional
      All task data (86 contrasts on 7 tasks), on 100 ROI set
•  For each
   –  Compute AgHe, $\bar{h}^2$ & $\tilde{h}^2$
•  APACE used to find P-values & CI's
•  Hypothesis:
   –  Ranking will be similar between the 3 methods
   –  AgHe most similar to $\tilde{h}^2$ (Var-Wt mean $h^2$)

HCP Phenotype Ranking: Estimates
•  Good monotonic relationship
   –  Tighter for $\tilde{h}^2$ (variance-weighted mean $h^2$)
[Figure: scatter plots of AgHe vs. $\bar{h}^2$ and AgHe vs. $\tilde{h}^2$; panel correlations r = 0.81 and r = 0.89]

HCP Phenotype Ranking: P-values
•  Good agreement for strong significance
   –  AgHe more optimistic… possibly due to $(1 - \tilde{\rho}_P)$
[Figure: P-values, AgHe vs. $\bar{h}^2$ and AgHe vs. $\tilde{h}^2$]

Non-High-Dimensional Phenotypes: Not so good
•  Ranking of each row of a 200-dimensional ICA netmat
   –  200 phenotypes, each with 199 elements: Connection strength to each other node
•  AgHe not so biased, but huge variance
[Figure: AgHe vs. $\bar{h}^2$ and AgHe vs. $\tilde{h}^2$]
Aggregate Heritability Conclusions
•  Cheap and cheerful approach to heritability
•  Gives a (biased) estimate of variance-weighted mean heritability
\[ E(r_{MZ} - r_{DZ}) \approx \frac{1}{2}\,\frac{\tilde{h}^2 - \widetilde{ERV}}{1 - \tilde{\rho}_P} \]
\[ \tilde{h}^2 = \frac{1}{J}\sum_j \left(\frac{\sigma_j^2}{\bar{\sigma}^2}\right) h_j^2 \]
•  But valid test of $H_0$: All $h^2 = 0$
•  Easy to implement
•  Potentially useful for phenotype ranking
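One simple way to turn the MZ versus DZ comparison into a test of $H_0$: all $h^2 = 0$ is to permute the MZ/DZ labels of the pairwise correlations. The slides report that APACE was used for P-values and confidence intervals; the sketch below is not APACE, only an assumed illustration of a one-sided label-permutation test, and the function name and toy correlation values are made up.

```python
import numpy as np

def permutation_pvalue(r_mz, r_dz, n_perm=5000, seed=0):
    """One-sided permutation P-value for mean(r_mz) - mean(r_dz) > 0.

    Under H0 (no heritability at any voxel) MZ and DZ pairwise correlations
    share the same expectation, so group labels are treated as exchangeable.
    """
    rng = np.random.default_rng(seed)
    observed = r_mz.mean() - r_dz.mean()
    pooled = np.concatenate([r_mz, r_dz])
    n_mz = len(r_mz)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)         # relabel the pairs at random
        stat = shuffled[:n_mz].mean() - shuffled[n_mz:].mean()
        if stat >= observed:
            exceed += 1
    # Count the observed labelling itself so the P-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)

# Made-up pairwise correlations for 75 MZ and 75 DZ pairs.
rng = np.random.default_rng(1)
r_mz = 0.75 + 0.02 * rng.standard_normal(75)
r_dz = 0.72 + 0.02 * rng.standard_normal(75)
p_value = permutation_pvalue(r_mz, r_dz)
print(f"permutation P-value: {p_value:.4f}")
```

Because the test only needs one correlation per twin pair, each permutation costs a couple of means, however many voxels went into the correlations.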
Back to Maximum Likelihood
•  Iterative maximization of likelihood
   –  E.g. Model when C=0… for family
      •  Φ is kinship matrix, allowing arbitrary families
      •  Must jointly estimate β and A and E variances
   –  Make inference on $H_0$: $h^2 = 0$ with likelihood ratio test, $\chi^2$ statistic
•  Blangero (& others) suggested eigen-simplification
•  This turns a correlated-data problem into an independent but heteroscedastic one
Ganjgahi, et al. (2015). Fast and powerful heritability inference for family-based … NeuroImage.

Fast Permutation Heritability Inference
•  Eigen-simplified ML update equations simple... Just regression (OLS & WLS)
•  We stop after just one iteration
   –  1-step estimators are consistent… Amemiya (1977)
   –  Result is so fast, allows permutation
•  End result... Non-iterative, easy to implement solution to twin or arbitrary family data!

Connectome Application
•  HCP's PTN
   –  Parcellations, Time series, Network Matrices (NetMats)
   –  d = 25, 50, 100, 200
•  Use FPHI to measure $h^2$ on netmats
   –  w/ 300, 1225, 4950 and 19,900 elements

$h^2 = 0$ Fraction
•  If $H_0$ true everywhere, about 50% $h^2 = 0$
   –  Always far less than 50%

Non-zero $h^2$ Distribution
•  Median decreases with d
   –  But mean FWE significant $h^2$ increases

Association - Inference on β
•  Netmat connectivity regressed on English Reading ability
•  FWE-significant edges
   –  None for d>50
[Figure: panels for d = 50 and d = 25]

Conclusions
•  When twin/family data available, rich additional information available
•  Standard tools can be slow, error-prone
•  These fast, approximate methods provide useful alternatives for massive connectomic data
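As a rough illustration of the eigen-simplification and one-step idea from the Fast Permutation Heritability Inference slide, the sketch below assumes the model Cov(Y) = 2Φ σ_A² + I σ_E² with C = 0. Rotating the data by the eigenvectors of 2Φ gives independent observations with variance σ_A² λ_i + σ_E², so squared OLS residuals can be regressed on the eigenvalues λ_i, first by OLS and then once by WLS. This is a simplified sketch under those assumptions, not the FPHI implementation; all function names and simulation settings are hypothetical, and permutation inference is not shown.

```python
import numpy as np

def twin_two_phi(n_mz, n_dz):
    """Block-diagonal 2*Phi (twice the kinship matrix) for MZ then DZ pairs."""
    n = 2 * (n_mz + n_dz)
    two_phi = np.zeros((n, n))
    for i in range(n_mz + n_dz):
        off = 1.0 if i < n_mz else 0.5             # MZ share all, DZ half
        two_phi[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[1.0, off], [off, 1.0]]
    return two_phi

def one_step_h2(y, X, two_phi):
    """OLS start plus a single WLS update on the eigen-transformed data."""
    lam, S = np.linalg.eigh(two_phi)               # 2*Phi = S diag(lam) S^T
    y_s, X_s = S.T @ y, S.T @ X                    # rotated data are independent
    beta, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
    f = (y_s - X_s @ beta) ** 2                    # squared residuals
    U = np.column_stack([np.ones_like(lam), lam])  # E[f_i] ~ s2_e + s2_a * lam_i
    theta0, *_ = np.linalg.lstsq(U, f, rcond=None)
    theta0 = np.clip(theta0, 0.0, None)            # variances cannot be negative
    var0 = np.maximum(U @ theta0, 1e-6 * f.mean()) # guard against zero variance
    sw = 1.0 / var0                                # sqrt of weights 1 / var0**2
    theta1, *_ = np.linalg.lstsq(U * sw[:, None], f * sw, rcond=None)
    s2_e, s2_a = np.clip(theta1, 0.0, None)
    total = s2_a + s2_e
    return float(s2_a / total) if total > 0 else 0.0

def simulate_phenotype(two_phi, h2, rng):
    """Draw y with Cov = h2 * 2*Phi + (1 - h2) * I and a constant mean."""
    lam, S = np.linalg.eigh(two_phi)
    g = S @ (np.sqrt(np.clip(lam, 0.0, None)) * rng.standard_normal(len(lam)))
    e = rng.standard_normal(len(lam))
    return 10.0 + np.sqrt(h2) * g + np.sqrt(1.0 - h2) * e

rng = np.random.default_rng(0)
two_phi = twin_two_phi(n_mz=200, n_dz=200)
y = simulate_phenotype(two_phi, h2=0.6, rng=rng)
X = np.ones((len(y), 1))                           # intercept-only design
h2_hat = one_step_h2(y, X, two_phi)
print(f"one-step h2 estimate: {h2_hat:.2f} (simulated h2 = 0.6)")
```

Everything here is an eigendecomposition followed by three small least-squares fits, which is what makes repeating the estimate over many permutations and many connectome edges affordable.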