modQTL: modularized multi-tissue eQTL discovery by Bayesian matrix factorization and completion

Benjamin Iriarte 1,2,*, Yongjin Park 1,*, Manolis Kellis 1
1. Computer Science and Artificial Intelligence Lab, MIT & Broad Institute, Cambridge, MA
2. Department of Mathematics, MIT, Cambridge, MA
{biriarte,ypp,manoli}@mit.edu; * equal contribution

Background - GTEx
• http://www.gtexportal.org
• Figure labels: 53 tissues, 451 individuals, 12K genes; ≈ 80% missing
• Knowledge of an eGene ~ eQTL association in one tissue can transfer to other tissues.
• "Your heart can tell us about your thyroid."

P1 Confounder correction: genes on tissue matrix
Matrix factorization: $Y \approx B \times C$, with n genes, m individuals, and J factors. B: batch effects; C: confounders.

Bayesian model selection / Factor selection
Group LASSO: $P(c_j) \propto \exp(-\lambda \lVert c_j \rVert)$
➔ sparsity on rows in C
➔ sparsity on columns in B
• Shrink row j if not informative
• Treat rows/columns as groups
• Yuan & Lin (2006)

Bayesian group LASSO:
$c_j \sim N(0, \tau_j^2 \sigma^2)$, $\tau_j^2 \sim \mathrm{Gam}([m+1]/2, \lambda^2/2)$, $\lambda \sim \mathrm{Gam}(1, 1)$
• Kyung .. Casella (2010)

Remark1: Bayesian model averaging: $Y \approx \langle B \rangle \times \langle C \rangle$ ➔ used Gibbs sampled avg B, C

[Figure: Group LASSO on rows prevents overfitting. Axis label: Posterior StdDev.]
• Strong confounder: 1-5 dominant factors
• Nonlinearity: fine-tuned by ~ 10 minor factors
• Unrelated factors are muted

Boosted statistical power in eQTL calling (Matrix eQTL)
• MFGL: this work with Gibbs sampling + Bayesian group LASSO
• PEER: Stegle et al., Variational Bayes, grid search, J = 5 .. 100
• Vanilla: no correction
[Figure: Heart − Left Ventricle (218). Axis and legend labels: # eGenes, Hidden Factors, Density, p−value, −log10 p−value, MFGL, PEER (J=5..100), Vanilla, ∆ = MFGL − PEER, MFGL−corrected cis−eQTL.]
• Similar pattern on tissues with ≥ 100 samples

P2 Imputation of tissues on gene matrix
$Y \approx U \times V$ with K factors. Figure labels: T tissues × n genes, m individuals.
1. Sample unobserved expression Y in tissue t of individual i (column): $Y_{ti} \mid u_t, v_i \sim N(u_t v_i, \sigma^2)$
2. Re-estimate factorizers $P(u \mid Y, V)$, $P(v \mid Y, U)$, with $P(v_j) \propto \exp(-\lambda \lVert v_j \rVert)$
• Efficient parallel processing even using extensive Gibbs sampling
Remark2: other imputation techniques didn't work at all if missing values ≥ 80%

Simulation results
1. Generate data with ground-truth with rank = 3 (tested in K = 1 .. 10)
2. Remove 80% of elements as NaN
3. Fit matrix factorization
4. Estimate missing values by sampling
[Figure: imputed vs. true values; τ by factors (1 .. 10).]

Imputation improves signals for downstream analysis
[Figure: Improvement in 13 Brain eQTL calling. Legend: Vanilla, MFGL, MFGL+imputation.]
[Figure: DDX11, GENOTYPE (rs4931424) and ZFP57, GENOTYPE (rs145352923). Expression by genotype (0, 1, 2), one panel per tissue; imputed values are marked "Imputed".]
Tissues shown: Adipose − Subcutaneous, Adipose − Visceral (Omentum), Adrenal Gland, Artery − Aorta, Artery − Coronary, Artery − Tibial, Bladder, Brain − Amygdala, Brain − Anterior cingulate cortex (BA24), Brain − Caudate (basal ganglia), Brain − Cerebellar Hemisphere, Brain − Cerebellum, Brain − Cortex, Brain − Frontal Cortex (BA9), Brain − Hippocampus, Brain − Hypothalamus, Brain − Nucleus accumbens (basal ganglia), Brain − Putamen (basal ganglia), Brain − Spinal cord (cervical c−1), Brain − Substantia nigra, Breast − Mammary Tissue, Cells − EBV−transformed lymphocytes, Cells − Transformed fibroblasts, Cervix − Ectocervix, Cervix − Endocervix, Colon − Sigmoid, Colon − Transverse, Esophagus − Gastroesophageal Junction, Esophagus − Mucosa, Esophagus − Muscularis, Fallopian Tube, Heart − Atrial Appendage, Heart − Left Ventricle, Kidney − Cortex, Liver, Lung, Minor Salivary Gland, Muscle − Skeletal, Nerve − Tibial, Ovary, Pancreas, Pituitary, Prostate, Skin − Not Sun Exposed (Suprapubic), Skin − Sun Exposed (Lower leg), Small Intestine − Terminal Ileum, Spleen, Stomach, Testis, Thyroid, Uterus, Vagina, Whole Blood
• Imputation doesn't create fantasy (consistent effect size); ignore weakly associated (brain) tissues

modQTL discovery
$v \approx \beta_0 + \beta_1 \cdot \mathrm{SNP}$, where v is the modular activity of a gene
• ZFP57 ~ rs145352923, p < 1e−178
[Figure: modular activity by genotype, GENOTYPE (rs145352923).]
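
The modQTL association v ≈ β0 + β1 * SNP can be illustrated with an ordinary least-squares fit. The sketch below is only an illustration on simulated data: the genotype dosages, effect size, noise level, and the helper name `fit_modqtl` are all assumptions, not values or code from the poster, and the poster itself reports using Matrix eQTL for eQTL calling.

```python
import numpy as np

def fit_modqtl(v, snp):
    """Least-squares fit of v ≈ beta0 + beta1 * snp (hypothetical helper).

    Returns the intercept, the slope, and the slope's t-statistic.
    """
    X = np.column_stack([np.ones_like(snp, dtype=float), snp])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    resid = v - X @ beta
    dof = len(v) - X.shape[1]
    s2 = resid @ resid / dof                      # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)             # covariance of the estimates
    t_stat = beta[1] / np.sqrt(cov[1, 1])
    return beta[0], beta[1], t_stat

# Simulated example (all numbers are assumptions for illustration only).
rng = np.random.default_rng(0)
n_individuals = 451                               # matches the cohort size on the poster
snp = rng.binomial(2, 0.3, size=n_individuals)    # genotype dosage 0, 1, or 2
true_beta1 = 0.8
v = 0.2 + true_beta1 * snp + rng.normal(scale=0.5, size=n_individuals)

beta0_hat, beta1_hat, t_stat = fit_modqtl(v, snp)
print(f"beta0={beta0_hat:.3f} beta1={beta1_hat:.3f} t={t_stat:.1f}")
```

Here `v` stands in for one module's activity across individuals; in the poster's setting it would come from the fitted factorization rather than from a simulation.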
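
The simulation protocol (rank-3 ground truth, 80% of entries removed as NaN, factorization fit, missing values estimated by sampling) can be sketched as follows. This is a simplified stand-in, not the poster's MFGL model: it assumes plain Gaussian priors on the factors instead of the Bayesian group LASSO, a known noise variance, and factor updates conditioned on observed entries only. Matrix sizes, noise level, iteration counts, and the names `sample_factor` and `gibbs_impute` are likewise assumptions.

```python
import numpy as np

def sample_factor(Y, mask, F, sigma2, prior_var, rng):
    """Draw every row of the left factor given the right factor F (K x m).

    Each row has a Gaussian conditional that uses only its observed entries.
    """
    K = F.shape[0]
    out = np.empty((Y.shape[0], K))
    for t in range(Y.shape[0]):
        obs = mask[t]
        Fo = F[:, obs]                                  # K x (#observed in row t)
        prec = Fo @ Fo.T / sigma2 + np.eye(K) / prior_var
        mean = np.linalg.solve(prec, Fo @ Y[t, obs] / sigma2)
        L = np.linalg.cholesky(prec)                    # prec = L @ L.T
        noise = np.linalg.solve(L.T, rng.normal(size=K))  # covariance = inv(prec)
        out[t] = mean + noise
    return out

def gibbs_impute(Y, K, n_iter=100, burn_in=40, sigma2=0.09, prior_var=1.0, seed=0):
    """Fill NaN entries of Y with the average of sampled values after burn-in."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(Y)
    T, m = Y.shape
    V = rng.normal(size=(K, m))
    acc = np.zeros((T, m))
    for it in range(n_iter):
        # Re-estimate the factorizers: U | Y, V and then V | Y, U.
        U = sample_factor(Y, mask, V, sigma2, prior_var, rng)
        V = sample_factor(Y.T, mask.T, U.T, sigma2, prior_var, rng).T
        if it >= burn_in:
            # Sample expression values Y_ti ~ N(u_t v_i, sigma2) and accumulate.
            acc += U @ V + rng.normal(scale=np.sqrt(sigma2), size=(T, m))
    imputed = acc / (n_iter - burn_in)
    return np.where(mask, Y, imputed)

# 1. Generate data with a rank-3 ground truth (sizes are assumptions).
rng = np.random.default_rng(1)
T, m, rank = 100, 300, 3
truth = rng.normal(size=(T, rank)) @ rng.normal(size=(rank, m))
Y_full = truth + rng.normal(scale=0.3, size=(T, m))

# 2. Remove 80% of the elements as NaN.
missing = rng.random((T, m)) < 0.8
Y_obs = np.where(missing, np.nan, Y_full)

# 3-4. Fit the factorization and estimate the missing values by sampling.
Y_hat = gibbs_impute(Y_obs, K=3)
r = np.corrcoef(truth[missing], Y_hat[missing])[0, 1]
print(f"correlation between imputed and true values on held-out entries: {r:.3f}")
```

Averaging the sampled values over the retained iterations mirrors the poster's use of Gibbs-sampled averages; rerunning with other values of `K` would correspond to the K = 1 .. 10 sweep described in the protocol.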