modQTL: modularized multi-tissue eQTL discovery
by Bayesian matrix factorization and completion
Confounder correction / Bayesian model selection
[Diagram: B × C with J factors; factor selection on the 12K genes × tissues matrix with K factors]
$\tau_j^2 \sim \mathrm{Gam}\left(\frac{m+1}{2}, \frac{\lambda^2}{2}\right)$, $P(\lambda) \sim \mathrm{Gam}(1, 1)$
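A minimal sketch of one Gibbs update for these scales, assuming the Bayesian group-LASSO hierarchy of Kyung et al., in which $c_j \mid \tau_j^2, \sigma^2 \sim \mathcal{N}(0, \tau_j^2 \sigma^2 I)$ and the Gam(1, 1) prior is placed on λ² (the poster writes P(λ)). Under this hierarchy the marginal prior on each row is the group-LASSO form, proportional to exp(−λ‖c_j‖/σ). The conditional of 1/τ_j² is inverse Gaussian (NumPy's `wald`) and λ² has a Gamma conditional. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def sample_group_lasso_scales(C, sigma2, lam2, rng):
    """One Gibbs sweep over the group-LASSO scales of the rows of C (a sketch).

    Assumed hierarchy (each row c_j of C, of length m, is one group):
        c_j | tau_j^2, sigma^2 ~ N(0, tau_j^2 * sigma^2 * I_m)
        tau_j^2 ~ Gam((m + 1)/2, rate = lam^2 / 2)
        lam^2   ~ Gam(1, 1)        # assumption: the Gam(1, 1) prior sits on lam^2
    Returns a fresh draw of tau^2 (one value per row) and of lam^2.
    """
    J, m = C.shape
    # Squared row norms; the floor keeps the inverse-Gaussian mean finite
    norms2 = np.maximum(np.sum(C ** 2, axis=1), 1e-12)
    # 1/tau_j^2 | rest ~ InverseGaussian(mean = sqrt(lam^2 sigma^2 / ||c_j||^2), shape = lam^2)
    inv_tau2 = rng.wald(np.sqrt(lam2 * sigma2 / norms2), lam2)
    tau2 = 1.0 / inv_tau2
    # lam^2 | tau^2 ~ Gam(1 + J(m + 1)/2, rate = 1 + sum_j tau_j^2 / 2)
    shape = 1.0 + J * (m + 1) / 2.0
    rate = 1.0 + tau2.sum() / 2.0
    lam2_new = rng.gamma(shape, 1.0 / rate)   # NumPy uses scale = 1 / rate
    return tau2, lam2_new


rng = np.random.default_rng(0)
# Row 0 is nearly zero (uninformative), row 1 is large (informative)
C = np.array([[0.01, 0.01, 0.01], [3.0, -2.0, 1.0]])
avg_tau2 = np.mean([sample_group_lasso_scales(C, 1.0, 1.0, rng)[0] for _ in range(5000)], axis=0)
```

Rows with a small norm, like row 0, receive a smaller average τ_j² and are therefore shrunk harder; this is the row-wise sparsity the model relies on.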
$Y_{ti} \mid u_t, v_i \sim \mathcal{N}(u_t v_i, \sigma^2)$
2. Re-estimate the factors by sampling from $P(u \mid Y, V)$ and $P(v \mid Y, U)$
$P(v_j) \propto \exp(-\lambda \lVert v_j \rVert)$
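Step 2 can be sketched for a single individual i. Each row v_j (one factor) has its own group-LASSO scale τ_j², and conditional on those scales the prior on the column v_i is Gaussian, N(0, σ² diag(τ²)). The conditional v_i | Y, U is then Gaussian, with a precision that adds the observed-tissue design to the prior precision. This is a sketch under that conditional-Gaussian assumption; `sample_v_column` and `tau2` are hypothetical names, and the symmetric update for u_t is omitted because its prior is not stated here.

```python
import numpy as np


def sample_v_column(y, observed, U, sigma2, tau2, rng):
    """Draw v_i | Y, U for one individual i (one column of Y).

    y        : (T,) expression of individual i across T tissues (NaN where missing)
    observed : (T,) boolean mask, True where Y_ti is observed
    U        : (T, K) current tissue factors u_t (rows)
    sigma2   : noise variance in Y_ti ~ N(u_t v_i, sigma2)
    tau2     : (K,) group-LASSO scales, so conditionally v_i ~ N(0, sigma2 * diag(tau2))
    """
    Uo, yo = U[observed], y[observed]          # only observed tissues enter the likelihood
    prec = Uo.T @ Uo / sigma2 + np.diag(1.0 / (sigma2 * tau2))  # posterior precision
    mean = np.linalg.solve(prec, Uo.T @ yo / sigma2)
    L = np.linalg.cholesky(prec)               # prec = L L^T
    z = rng.standard_normal(len(tau2))
    return mean + np.linalg.solve(L.T, z)      # covariance (L L^T)^{-1} = prec^{-1}
```

A factor whose τ_j² is small contributes a large prior precision, so its entry in v_i is pulled toward zero regardless of the data.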
451 individuals
Efficient parallel processing, even with extensive Gibbs sampling
Remark 2: other imputation techniques did not work at all when ≥ 80% of values were missing
Simulation results
[Figure: imputed vs. true values in simulation; results for 1 .. 10 factors]
$Y \approx \langle B \rangle \langle C \rangle$; imputed values use the Gibbs-sampled averages ⟨B⟩, ⟨C⟩
1. Generate data with a ground-truth rank of 3 (fitted with K = 1 .. 10)
2. Remove 80% of the elements (set to NaN)
3. Fit the matrix factorization
4. Estimate the missing values by sampling
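The four simulation steps can be mimicked with a small data-augmentation Gibbs sampler that alternates the two P2 steps (sample the unobserved entries, then re-draw the factors). This is a sketch, not the MFGL code: it assumes plain N(0, I) priors on u_t and v_i instead of the group-LASSO prior, an InvGamma(1, 1) prior on σ², and made-up sizes (60 × 200, noise SD 0.1).

```python
import numpy as np


def gibbs_complete(Y, K, n_iter=500, burn_in=200, seed=0):
    """Gibbs-sampling matrix completion for Y (T x m, NaN = missing).

    Assumed model: Y_ti ~ N(u_t . v_i, sigma2), u_t ~ N(0, I_K), v_i ~ N(0, I_K),
    sigma2 ~ InvGamma(1, 1). Returns the posterior mean and standard deviation
    of every entry over the kept (post burn-in) samples.
    """
    rng = np.random.default_rng(seed)
    T, m = Y.shape
    obs = ~np.isnan(Y)
    U = 0.1 * rng.standard_normal((T, K))
    V = 0.1 * rng.standard_normal((m, K))
    sigma2 = 1.0
    s1, s2, kept = np.zeros((T, m)), np.zeros((T, m)), 0

    def draw_factors(Yf, F):
        # Rows x_r | Yf, F ~ N(P^{-1} F^T y_r / sigma2, P^{-1}) with P = F^T F / sigma2 + I.
        # Because Yf has no gaps, every row shares P, so one Cholesky serves all rows.
        P = F.T @ F / sigma2 + np.eye(K)
        L = np.linalg.cholesky(P)
        mean = np.linalg.solve(P, F.T @ Yf.T / sigma2).T
        noise = np.linalg.solve(L.T, rng.standard_normal((K, Yf.shape[0]))).T
        return mean + noise

    for it in range(n_iter):
        # Step 1: sample the unobserved entries from N(u_t . v_i, sigma2)
        fill = U @ V.T + np.sqrt(sigma2) * rng.standard_normal((T, m))
        Yf = np.where(obs, Y, fill)
        # Step 2: re-estimate the factors given the completed matrix
        U = draw_factors(Yf, V)
        V = draw_factors(Yf.T, U)
        # Noise variance: InvGamma(1 + Tm/2, 1 + SSE/2), drawn as 1 / Gamma
        r = Yf - U @ V.T
        sigma2 = 1.0 / rng.gamma(1.0 + r.size / 2.0, 1.0 / (1.0 + np.sum(r * r) / 2.0))
        if it >= burn_in:
            s1 += Yf
            s2 += Yf ** 2
            kept += 1
    mean = s1 / kept
    std = np.sqrt(np.maximum(s2 / kept - mean ** 2, 0.0))
    return mean, std


# Simulation following steps 1-4 (matrix sizes and noise level are assumptions)
rng = np.random.default_rng(1)
T, m, rank = 60, 200, 3
Y_true = rng.standard_normal((T, rank)) @ rng.standard_normal((rank, m))
Y_obs = Y_true + 0.1 * rng.standard_normal((T, m))
missing = rng.random((T, m)) < 0.8               # remove ~80% of the entries
Y_obs[missing] = np.nan
Y_hat, Y_sd = gibbs_complete(Y_obs, K=5)
rmse = np.sqrt(np.mean((Y_hat[missing] - Y_true[missing]) ** 2))
```

Because the completed matrix has no gaps, all rows share one K × K precision matrix, so each factor update is a single batched solve; this is one way to keep extensive Gibbs sampling cheap. Refitting with K = 1 .. 10 and comparing `rmse` and the mean of `Y_sd` over the missing entries repeats the factor sweep.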
[Figure: MFGL-corrected cis-eQTL vs. ∆ = MFGL − PEER (−log10 p-value); results vs. number of hidden factors for MFGL, PEER (J = 5 .. 100) and Vanilla; p-value distribution]
• Similar pattern in tissues with ≥ 100 samples
Imputation improves signals for downstream analysis
Improved eQTL calling in 13 brain tissues
[Figure: DDX11 expression vs. genotype at rs4931424 (0, 1, 2) in each of the 53 GTEx tissues, Adipose − Subcutaneous through Whole Blood; two panels]
Imputation doesn't create fantasy: effect sizes stay consistent
Ignore weakly associated (brain) tissues.
[Figure: ZFP57 expression vs. genotype at rs145352923 (0, 1, 2) across GTEx tissues]
modQTL discovery
[Figure: per-tissue genotype panels across the 53 GTEx tissues]
Remark 1: Bayesian model averaging
• Kyung .. Casella (2010)
P2: 53 tissues
$c_j \mid \tau_j^2, \sigma^2 \sim \mathcal{N}(0, \tau_j^2 \sigma^2)$, with prior $P(\tau_j^2)$
• Shrink row j if not informative
• Treat rows/columns as groups
• Yuan & Lin (2006)
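The bullets above describe row-wise shrinkage. The Gibbs sampler does this softly through τ_j², but the intuition is clearest in the penalized group LASSO of Yuan & Lin (2006), whose proximal step (group soft-thresholding) zeroes a whole row at once. In the simplest case of an identity design with Gaussian noise of variance σ², the MAP estimate under P(c_j) ∝ exp(−λ‖c_j‖) is this operator with threshold λσ². A sketch of that non-Bayesian counterpart, not of the MFGL sampler; `group_soft_threshold` is a hypothetical name.

```python
import numpy as np


def group_soft_threshold(C, thresh):
    """Row-wise proximal operator of thresh * sum_j ||c_j||_2.

    Row c_j is multiplied by max(0, 1 - thresh / ||c_j||): rows with norm at or
    below `thresh` become exactly zero, the others shrink toward zero.
    """
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return scale * C


C = np.array([[3.0, 4.0],    # norm 5.0 -> scaled by 0.8
              [0.3, 0.4],    # norm 0.5 -> zeroed
              [0.0, 1.0]])   # norm 1.0 -> zeroed (exactly at the threshold)
C_shrunk = group_soft_threshold(C, thresh=1.0)
```

An uninformative factor thus drops out as a whole row rather than entry by entry, which is what "treat rows as groups" buys.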
Group LASSO on rows prevents overfitting
Bayesian group LASSO
[Figure: MFGL vs. MFGL+imputation]
1. Sample the unobserved expression $Y_{ti}$ in tissue t of individual i (column)
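Step 1 draws each unobserved Y_ti given the current factors. When the tissue factors U are held fixed, v_i can also be integrated out in closed form, which makes the cross-tissue transfer explicit: the observed tissues of individual i pin down v_i, and v_i predicts the unobserved tissues. A sketch assuming a Gaussian prior v_i ~ N(0, prior_var · I); `predict_missing_tissues` and `prior_var` are hypothetical names.

```python
import numpy as np


def predict_missing_tissues(y, observed, U, sigma2, prior_var=1.0):
    """Predictive mean and variance of the unobserved tissues of one individual.

    Holds the tissue factors U (T x K) fixed and assumes v_i ~ N(0, prior_var * I).
    Returns arrays over the tissues where `observed` is False.
    """
    K = U.shape[1]
    Uo, yo = U[observed], y[observed]
    prec = Uo.T @ Uo / sigma2 + np.eye(K) / prior_var
    Sigma = np.linalg.inv(prec)                 # posterior covariance of v_i
    mu = Sigma @ (Uo.T @ yo) / sigma2           # posterior mean of v_i
    Um = U[~observed]
    mean = Um @ mu                              # E[Y_ti] = u_t . E[v_i]
    var = sigma2 + np.einsum("tk,kl,tl->t", Um, Sigma, Um)  # noise + uncertainty in v_i
    return mean, var
```

A single step-1 sample for the missing tissues is then `rng.normal(mean, np.sqrt(var))`.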
Group LASSO: $P(c_j) \propto \exp(-\lambda \lVert c_j \rVert)$
➔ sparsity on rows in C
➔ sparsity on columns in B
Matrix factorization
• Strong confounders: 1-5 dominant factors
• Nonlinearity: fine-tuned by ~10 minor factors
• Unrelated factors are muted
≈ 80% missing
J factors
B: batch effects; C: confounders
http://www.gtexportal.org
m individuals
ZFP57 ~ rs145352923 (p < 1e-178)
$v \approx \beta_0 + \beta_1 \cdot \mathrm{SNP}$, where v is the modular activity of a gene
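The modQTL test regresses a module's activity on genotype. A minimal least-squares sketch with a Wald test for β1, assuming dosages coded 0/1/2 and using a normal approximation to the t distribution for the p-value (reasonable with hundreds of individuals, such as the 451 here). `modqtl_test` is a hypothetical name, and this is not necessarily the exact test used on the poster.

```python
import math

import numpy as np


def modqtl_test(v, snp):
    """Fit v ~ b0 + b1 * snp by least squares and test b1 != 0.

    v   : (m,) activity of one module/factor across m individuals
    snp : (m,) genotype dosages coded 0, 1, 2
    Returns (b0, b1, t, p); p is two-sided under a normal approximation.
    """
    snp = np.asarray(snp, dtype=float)
    v = np.asarray(v, dtype=float)
    X = np.column_stack([np.ones_like(snp), snp])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    resid = v - X @ beta
    s2 = resid @ resid / (len(v) - 2)                   # residual variance
    se_b1 = math.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    t = beta[1] / se_b1
    p = math.erfc(abs(t) / math.sqrt(2.0))              # 2 * (1 - Phi(|t|))
    return beta[0], beta[1], t, p
```

For exact small-sample p-values, the normal tail would be replaced by a Student t tail with m − 2 degrees of freedom.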
[Figure panel: Posterior StdDev]
P2 Imputation of tissues on gene matrix
T tissues
n genes
Background - GTEx
P1 Confounder correction: genes on tissue matrix
Heart − Left Ventricle (218)
• MFGL: this work, with Gibbs sampling + Bayesian group LASSO
• PEER: Stegle et al., variational Bayes, grid search over J = 5 .. 100
• Vanilla: no correction
1. Computer Science and Artificial Intelligence Lab, MIT & Broad Institute, Cambridge, MA
2. Department of Mathematics, MIT, Cambridge, MA
{biriarte,ypp,manoli}@mit.edu; * equal contribution
Boosted statistical power
in eQTL calling (Matrix eQTL)
Benjamin Iriarte (1,2,*), Yongjin Park (1,*), Manolis Kellis (1)
$Y \approx U \times V$
• Knowledge of eGene ~ eQTL associations in one tissue can transfer to other tissues.
• "Your heart can tell us about your thyroid."