eQTLs and reverse engineering approaches in the rat: exploiting multiple tissues
Enrico Petretto, Imperial College Faculty of Medicine

Outline
• Specialized tools for genetic mapping
  – The rat as a model system
• Expression QTL mapping in the rat
  – Single and multiple tissue analysis
• Co-expression networks
  – Tissue-specific regulatory networks
• eQTL applications to medicine
  – Ogn, a key regulator of left ventricular mass

Specialized tools for genetic mapping: rat recombinant inbred (RI) strains
• Cumulative, renewable resource for phenotypes and genetic mapping
[Figure (Pravenec et al. J Hypertension, 1989): Spontaneously Hypertensive Rat (SHR, genotype H) × normotensive rat (BN, genotype B) → F1 → F2 → RI strains; the strain distribution pattern (SDP) for gene X across the RI strains reads, e.g., H H B B B H H]

Genetic mapping in RI strains
[Figure: linkage between the SDP for gene X across RI strains (e.g. B B H B B H H) and cardiac mass; in genetical genomics the same linkage analysis is applied to mRNA levels (gene expression)]

Genetical genomics
• Genetic mapping of quantitative variation of mRNA levels in a population
• Expression QTLs (eQTLs): genetic determinants of gene expression

eQTL mapping in the rat: why the rat?
The rat is among the leading model species for research in physiology, pharmacology and toxicology, and for the study of genetically complex human diseases.

Spontaneously Hypertensive Rat (SHR): a model of the metabolic syndrome
• Spontaneous hypertension
• Decreased insulin action
• Hyperinsulinaemia
• Central obesity
• Defective fatty acid metabolism
• Hypertriglyceridaemia

Microarray data generation in the rat
• 30 RI strains + 2 parental strains
• 4 animals per strain (no pooling)
• Expression profiling: fat, kidney, adrenal, heart, skeletal muscle, aorta, liver, brain, …
• > 1,000 genetic markers + 800k SNPs

eQTLs: cis- and trans-acting
• cis-acting eQTL gene → candidate genes for physiological traits
• trans-acting eQTL gene → regulatory gene networks

Genetic architecture of genetic variation in gene expression
[Figure (Petretto et al. 2006 PLoS Genet, Supplementary Figure 2): h2QTL (y-axis, 0 to 0.6) against eQTL allelic effect (x-axis, 0 to 1.50) for cis- and trans-eQTLs at PGW < 0.05, in four panels (heart LV, fat, kidney, adrenal). cis-eQTLs: big genetic effect, highly heritable; trans-eQTLs: small genetic effect.]

eQTL mapping methods
• Linear regression (within single tissues): QTL Reaper, with empirical genome-wide significance by permutations (Hubner et al. Nat. Genet. 2005; Petretto et al. PLoS Genet. 2006; Petretto et al. Nat. Genet. 2008)
• Bayesian multiple regression models (within single tissues): fully multivariate, model based
• Bayesian multiple response models (across multiple tissues): borrow strength across tissues

Bayesian models
• Fully Bayesian variable selection in the "large p, small n" paradigm for unidimensional outcomes Y (n × 1): Evolutionary Stochastic Search (ESS)*
• Bayesian multiple regression model
  – Provides evidence of polygenic control (for each transcript)
• Bayesian multiple response model
  – Provides evidence of shared genetic regulation across tissues
  – An improved estimate of the variance-covariance matrix across tissues enhances detection of small effects (trans-eQTLs)
n, number of observations; q, number of probesets; p, number of genetic markers
* Bottolo and Richardson 2008, submitted

Bayesian eQTL analysis in multiple tissues
• Data from four tissues were pooled and normalized using RMA
• Gene expression measurements were standardized across tissues to avoid potential batch effects
• Pilot study
  – 2,000 transcripts with the highest joint variation in gene expression across fat, kidney, adrenal and heart
  – 1,000 genetic markers
  1. Bayesian multiple regression model
  2. Bayesian multiple response model

Bayesian multiple regression model: polygenic control
Bayesian multiple response model: shared genetic regulation
[Figure: % shared eQTLs (y-axis, 0 to 60%) by number of eQTLs shared across fat, kidney, adrenal and heart (no eQTL, 1, 2, 3, 4, 5 eQTLs), at Pr(marker is a true positive | probeset) = 0.95; cis/trans proportions 47%/53%]
Single tissue analysis (Bayesian regression):
Tissue | cis | trans
Fat | 70% | 30%
Kidney | 71% | 29%
Adrenal | 71% | 29%
Heart | 69% | 31%

Single vs multiple tissue analysis
• Bayesian regression within single tissues has high power to detect cis-acting effects
• Detection of a significant proportion of mRNA levels under polygenic control
• Multiple response model
  – Pooling information across tissues greatly enhances identification of shared genetic regulation of gene expression
  – 33% of transcripts are under shared monogenic control
  – 17% of transcripts are under shared polygenic control
  – Increased power to detect small eQTL effects shared across tissues (typically trans-eQTLs)

From eQTLs to gene regulatory networks
• Co-expression analysis in trans-eQTL clusters, taking into account the underlying genetic architecture (Grieve et al. submitted)
• Joint co-expression analysis in 4 tissues
  – detects specific co-expression patterns within and across tissues
• Graphical Gaussian models (GGM) to model linear dependencies between genes within and across tissues

Co-expression analysis across tissues
[Figure: samples × transcripts matrices (Transcript 1, Transcript 2, Transcript 3, … Transcript 2000) for heart, skeletal muscle, liver and aorta, analysed within tissues and across tissues]
Test for:
• Functional enrichment using GO classification
• Genetic control of the co-expression modules (Monti et al. Nat. Genet. 2008)
→ Tissue-specific modules
→ Cross-tissue modules

GGMs to model linear dependencies between genes
• Partial correlation matrix $\Pi = (\pi_{ij})$
• Inverse of the variance-covariance matrix $P$: $\Omega = (\omega_{ij}) = P^{-1}$, with $\pi_{ij} = -\omega_{ij} / \sqrt{\omega_{ii}\,\omega_{jj}}$
• Small n, large p: regularized covariance matrix estimator by shrinkage (Ledoit-Wolf approach), which guarantees positive definiteness
(Schäfer and Strimmer 2004; Opgen-Rhein and Strimmer 2007)

Partial correlation graphs
• Multiple testing on all partial correlations
  – Fit a mixture distribution to the observed partial correlations $\pi$: $f(\pi) = \eta_0 f_0(\pi; \kappa) + \eta_A f_A(\pi)$, with $\eta_0 + \eta_A = 1$ and $\eta_0 \gg \eta_A$; $f_0$ is the null density with parameter $\kappa$ and $f_A$ is uniform on $[-1, 1]$
  – Estimate $\hat\eta_0$ and $\hat\kappa$, then $\Pr(\text{non-zero edge} \mid \pi) = 1 - \hat\eta_0 f_0(\pi; \hat\kappa) / \hat f(\pi)$
(Schäfer and Strimmer 2004; Opgen-Rhein and Strimmer 2007)

Hypothesis-driven analysis
1. Co-expression graphs point to gene regulatory networks
2. Co-expression graphs under genetic control suggest common regulation by a single gene (or a few genes)

Graphical Gaussian models
• Detect conditionally dependent co-expression components (modules):
  – within tissues
  – across tissues

Example 1.
Tissue-specific component (skeletal muscle)
[Figure: co-expression graph, posterior probability for non-zero edge = 0.95; node legend: transcription factor activity, cis-eQTL, trans-eQTL, no genome-wide significance]
Module mapping in skeletal muscle: chromosome 18 (15,889,013 bp), P = 0.00086
Enriched in inflammatory response genes:
• GO:0002376 immune system process (7.5 × 10^-12)
• GO:0006955 immune response (2.1 × 10^-11)

Example 2.
Tissue-specific component (aorta)
[Figure: co-expression graph, posterior probability for non-zero edge = 0.95; node legend: transcription factor activity, cis-eQTL, trans-eQTL, no genome-wide significance]
Module mapping in aorta: chromosome 1 (102,366,482 bp), P < 0.001
• GO:0003012 muscle system process (9.0 × 10^-4)
• GO:0006936 muscle contraction (9.0 × 10^-4)

Example 3.
Multi-tissue component (liver, heart)
[Figure: co-expression graph, posterior probability for non-zero edge = 0.95; node legend: transcription factor activity, cis-eQTL, trans-eQTL, no genome-wide significance]
Module mapping in both tissues: chromosome 20 (34,232,001 bp), P = 0.0008
• GO:0030097 hemopoiesis (5.1 × 10^-4)
• GO:0002520 immune system development (6.4 × 10^-4)
• GO:0042592 homeostatic process (9.1 × 10^-4)

Example 4.
Multi-tissue component (skeletal muscle, heart)
[Figure: co-expression graph, posterior probability for non-zero edge = 0.95; node legend: transcription factor activity, cis-eQTL, trans-eQTL, no genome-wide significance]
Module mapping in both tissues: chromosome 17 (79,885,972 bp), P = 0.002
• GO:0048545 response to steroid hormone stimulus (5.9 × 10^-4)
• GO:0051384 response to glucocorticoid stimulus (6.2 × 10^-4)

cis-eQTLs: candidate genes for physiological traits
• cis-eQTL genes: candidates for human hypertension … (Hubner et al. Nature Genetics 2005)

A successful "eQTL story"…
[Figure: annual risk of cardiovascular disease (0 to 20%) in men and women by left ventricular mass (LVM) category: < 90, 90-114, 115-139 and > 140 g/m (Levy et al. (1990) New Engl J Med 322: 1561-66)]

cis-eQTL genes: candidates for left ventricular mass (LVM)
• Look for cis-eQTLs associated with LVM
[Figure: LOD score (0 to 4) for LVM along rat chromosome 17 (genetic distance 0 to 80 cM), showing the LVM QTL]

Limited correlation of LVM with blood pressure in the RI strains
[Figure: scatterplot matrix of LVM (left ventricular mass), SBP (systolic blood pressure), DBP (diastolic blood pressure), PP (pulse pressure) and HR (heart rate) across the RI strains, with pairwise Pearson correlations (r from -0.24 to 0.80) and linear-fit R² ("R Sq Linear") values; only two correlations are significant (r = 0.51*, r = 0.80**). * P < 0.01; ** P < 10^-5]
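The correlation slide reports strain-level Pearson coefficients together with the R² of a linear fit ("R Sq Linear"). A minimal Python sketch of how such values could be computed follows, assuming one mean phenotype value per RI strain: Pearson's r, the R² of a simple least-squares line (which equals r²), and a two-sided empirical P-value obtained by shuffling strain labels. The function names, the add-one correction and the demo data are illustrative assumptions, not the code or data used in the study.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """R^2 of a simple least-squares line of y on x, which equals r^2."""
    return pearson_r(x, y) ** 2


def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Two-sided empirical P-value for r, obtained by shuffling y.

    Counts permutations whose |r| is at least the observed |r| and applies
    an add-one correction so the P-value is never exactly zero.
    """
    rng = random.Random(seed)
    r_obs = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= r_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical strain means, for illustration only (not data from the study)
    rng = random.Random(42)
    sbp = [rng.gauss(150, 15) for _ in range(30)]
    lvm = [2.0 + 0.002 * s + rng.gauss(0, 0.1) for s in sbp]
    r = pearson_r(sbp, lvm)
    p = permutation_pvalue(sbp, lvm, n_perm=2_000)
    print(f"r = {r:.2f}, R^2 = {r_squared(sbp, lvm):.3f}, P = {p:.4f}")
```

Shuffling one trait keeps each trait's distribution fixed while breaking the pairing between traits, so the empirical P-value needs no normality assumption; note that the smallest attainable value is 1/(n_perm + 1).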
QTT: genome-wide association between cis-eQTLs and LVM
[Figure: (a) correlation with LVM of cis-eQTL genes and differentially expressed genes across rat chromosomes 1 to 20; (b) correlation with DBP across chromosomes 1 to 20; (c) -log10(P-value) against fold change (-3 to 3). Inset: LOD score (0 to 4) of the left ventricular mass QTL along chromosome 17 (0 to 80 cM). Hbld2 and Ogn are highlighted.]

Fine mapping of LVM in RI strains
• Using informative SNPs in the region
[Figure: LOD score (0 to 4) for LVM along rat chromosome 17, by genetic distance (0 to 60 cM) and physical position (20 to 60 Mb), with informative SNPs (D) and the positions of Hbld2 and Ogn]

Ogn KO mouse
[Figure: LVM / body weight (%) (mean ± SE, 0 to 0.5) in Ogn+/+, Ogn+/- and Ogn-/- mice at baseline and after hypertrophic stimulation (angiotensin II infusion); differences marked ns (not significant), * P = 0.01 and ** P = 2 × 10^-3]

Ogn is a strong candidate gene for LVM
✓ co-localises with rat cardiac mass QTLs
✓ correlates with variation in LVM (independent of BP)
✓ dynamically regulated in response to hypertrophic stimulation
→ in vivo regulation in the Ogn KO mouse

Translational studies
• Genome-wide expression analysis in heart biopsies collected from 20 aortic stenosis patients (↑ LVM) and 7 controls (↓ LVM)

Top differentially expressed genes (out of 22k probesets) in human cardiac hypertrophy and associated with LVM
Probe ID | Gene symbol | Gene name | Fold change (1) | FDR (%) (2) | Correlation with LVM (3) | P-value (4)
218730_s_at | OGN | Osteoglycin | 2.2 | 2.7 | 0.62 | 1.2E-03
202766_s_at | FBN1 | Fibrillin 1 | 2.0 | 2.7 | 0.55 | 4.5E-03
209621_s_at | PDLIM3 | PDZ and LIM domain 3 | 1.6 | 5.0 | 0.52 | 6.5E-03
219087_at | ASPN | Asporin | 1.9 | 5.0 | 0.52 | 7.5E-03
213646_x_at | TUBA1B | Tubulin, alpha 1b | 1.5 | 2.7 | 0.52 | 5.7E-03
213765_at | MFAP5 | Microfibrillar associated protein 5 | 1.8 | 5.0 | 0.51 | 6.8E-03
203570_at | LOXL1 | Lysyl oxidase-like 1 | 1.5 | 5.0 | 0.51 | 8.1E-03
208782_at | FSTL1 | Follistatin-like 1 | 1.5 | 2.7 | 0.51 | 1.1E-02
213867_x_at | ACTB | Actin, beta | 1.5 | 2.7 | 0.49 | 1.1E-02
212614_at | ARID5B | AT rich interactive domain 5B | 1.6 | 2.7 | 0.49 | 9.9E-03
216442_x_at | FN1 | Fibronectin 1 | 1.9 | 5.0 | 0.49 | 1.1E-02
211750_x_at | TUBA1C | Tubulin, alpha 1c | 1.6 | 2.7 | 0.48 | 1.3E-02
219922_s_at | LTBP3 | Latent transforming growth factor beta binding protein 3 | 1.5 | 5.0 | 0.40 | 4.3E-02
202119_s_at | CPNE3 | Copine III | 1.5 | 3.9 | 0.40 | 4.8E-02
210095_s_at | IGFBP3 | Insulin-like growth factor binding protein 3 | 1.7 | 2.7 | 0.39 | 5.0E-02
… | … | … | … | … | … | …
(1) Fold change of differential expression between patients with low (≤ 93 g/m2) and high (≥ 142 g/m2) LVM in the study population
(2) False discovery rate for differential expression was estimated by SAM analysis
(3) Data are ranked according to decreasing values of the Pearson correlation with LVM (determined non-invasively by echocardiography)
(4) Empirical P-values for the correlations were calculated by 10,000 permutations

[Figure: summary model linking primary genetic control of Ogn (rat studies), hypertrophic stimulation, the Ogn KO mouse, OGN protein, the TGF-β signaling pathway and LVM]

QTT approach for BP and cis-eQTLs in the kidney

Summary
• The eQTL approach is a powerful tool for the identification of:
  – Candidate genes for complex traits
  – Regulatory gene networks
• Developed novel, integrated and fully multivariate methods for eQTL analysis across multiple tissues
• Using the eQTL approach we identified Ogn as a primary determinant of cardiac mass in rats, mice and humans

Acknowledgments
Tim Aitman, Ian Grieve, Sarah Langley, Jon Mangion
Matthias Hening and Norbert Hubner (MDC, Berlin); Michael Pravenec (Institute of Physiology, Prague); Gary Conrad (Kansas State University, USA); Ted Kurtz (University of California, USA); Yigal Pinto (Cardiovascular Research I., Maastricht)
Stuart Cook, Riswan Sarwar, Han Lu, Blanche Schroen, Sylvia Richardson, Leonardo Bottolo

Clusters of trans-eQTLs
[Figure: trans-eQTLs (PGW < 0.05) on rat chromosome 8 in heart, fat, adrenal and kidney, forming tissue-specific clusters and one cluster that is not tissue-specific; trans-eQTL hot spots, showing the number of trans-eQTLs (0 to 160) per locus (chromosome.Mb, e.g. c17.6, c17.38, c15.108, …) in kidney, heart, fat and adrenal; labelled hot spot: chromosome 15, 108 Mb, D15Rat29]

Ogn protein expression in adult rat cardiac myocytes
[Figure: top left, labelled with rhodamine-conjugated phalloidin; top right, DAPI*; bottom left, Ogn antibody visualized using Alexa Fluor 488 donkey anti-goat; bottom right, merged image]
* 4',6-diamidino-2-phenylindole

Data mining of eQTL datasets: eQTLexplorer database
• Relational database (MySQL)
• Located on the Codon server (Imperial College, London)
• Advantages of a relational database
  – reduced redundancy & increased consistency
  – improved access & security
  – facilitated data integration & mining
(Mueller et al. Bioinformatics 2006)
Main screen: cis & trans eQTLs; physiological QTLs
Browsing the data…

eQTL mapping in the heart (left ventricle)
[Figure: numbers of cis- and trans-eQTLs (axis 0 to 2000) at genome-wide corrected P-value thresholds of 0.05, 0.01, 0.001, 0.0001 and 0.00001 (Petretto et al., Nature Genetics 2008, under revision)]

Characterization of Ogn
[Figure: dynamic regulation in response to hypertrophic stimulation (neonatal rat ventricular myocytes stimulated with phenylephrine, 100 μM): fold change (mean ± SE) of Ogn and Hbld2 in BN and SHR at 0, 1, 3, 6 and 24 h; sequence analysis of Ogn and Hbld2 (BN vs SHR), showing variants in the 5'UTR, exon 3 and 3'UTR, including 47 bp and 67 bp insertions]
* P < 0.05; ** P < 0.01; ns, not significant

Alternative splicing of the Ogn 3'UTR
[Figure: fold change of total mRNA, short isoform and long isoform of Ogn in parental and RI strains (circles, BN; triangles, SHR); luciferase activity (0 to 1.0) of BN and SHR long (BN-L, SHR-L) and short (BN-S, SHR-S) constructs; Ogn protein expression (arbitrary optical units, 0 to 30) in BN and SHR; significance marked ns, *, ** and ***]

OGN protein expression
[Figure: western blot of human heart samples (lanes 1 to 8; CAD, AS, HF, HTN) for pre-OGN (50 kDa) and OGN/mimecan (20 kDa), with LVM indicated]
CAD, coronary heart disease; AS, concentric hypertrophy secondary to aortic stenosis; HF, ischemic heart failure; HTN, hypertensive heart disease

[Figure: model posterior probability for each transcript by model size (0 to 5) in fat, heart, kidney and adrenal]
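The backup figure plots, per tissue, the posterior probability of models of each size, and the multiple-tissue results use a threshold of Pr(marker is a true positive | probeset) = 0.95. As a minimal sketch, assuming the variable-selection sampler (ESS or any other MCMC scheme) records the set of included markers at each post-burn-in iteration, visit frequencies give an estimate of each marker's marginal posterior inclusion probability and of the posterior distribution of model size. This is generic post-processing written for illustration: it is not the ESS implementation, the function names are hypothetical, and treating the inclusion probability as the slide's "true positive" probability is an assumption.

```python
from collections import Counter, defaultdict


def summarise_models(sampled_models):
    """Summarise models visited by a Bayesian variable-selection sampler.

    sampled_models: iterable of sets, each holding the indices of the genetic
    markers included in the model visited at one post-burn-in iteration.
    Returns (ppi, size_dist): the marginal posterior probability of inclusion
    for every marker that was ever included, and the posterior distribution
    of model size (number of markers), both estimated as visit frequencies.
    """
    models = [frozenset(m) for m in sampled_models]
    if not models:
        raise ValueError("need at least one sampled model")
    n = len(models)
    inclusion = defaultdict(int)
    sizes = Counter()
    for model in models:
        sizes[len(model)] += 1
        for marker in model:
            inclusion[marker] += 1
    ppi = {marker: count / n for marker, count in sorted(inclusion.items())}
    size_dist = {size: count / n for size, count in sorted(sizes.items())}
    return ppi, size_dist


def selected_markers(ppi, threshold=0.95):
    """Markers whose marginal posterior inclusion probability meets the threshold."""
    return sorted(marker for marker, p in ppi.items() if p >= threshold)


if __name__ == "__main__":
    # Hypothetical sampler output for one probeset (not data from the study)
    draws = [{3, 7}] * 15 + [{3, 7, 12}] * 4 + [{3, 12}]
    ppi, size_dist = summarise_models(draws)
    print("inclusion probabilities:", ppi)
    print("model-size distribution:", size_dist)
    print("selected at 0.95:", selected_markers(ppi))
```

Frequency-based estimates are the simplest choice; a sampler that also returns posterior model weights could average inclusion indicators over those weights instead.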