eQTLs and reverse engineering approaches in the rat: exploiting multiple tissues Enrico Petretto

eQTLs and reverse engineering
approaches in the rat:
exploiting multiple tissues
Enrico Petretto
Imperial College Faculty of Medicine
Outline
• Specialized tools for genetic mapping
– The rat as a model system
• Expression QTL mapping in the rat
– Single and multiple tissues analysis
• Co-expression networks
– Tissue specific regulatory networks
• eQTL applications to medicine
– Ogn, a key regulator of left ventricular mass
Specialized tools for genetic mapping:
rat Recombinant Inbred (RI) strains
• Cumulative, renewable resource for phenotypes and genetic mapping
[Figure: breeding scheme from the parental strains, Spontaneously Hypertensive Rat (SHR, genotype H) and Normotensive Rat (BN, genotype B), through F1 and F2 to the RI strains, with the Strain Distribution Pattern of H and B genotypes for Gene X across the RI strains]
Pravenec et al. J Hypertension, 1989
Genetic mapping in RI strains
[Figure: the SDP of B and H genotypes for Gene X across the RI strains is tested for linkage with a trait such as cardiac mass]
mRNA
Gene expression
Genetical Genomics
Genetic mapping
quantitative variation of mRNA
levels in a population
Expression QTLs
genetic determinants
of gene expression
eQTL mapping in the rat
why the rat ?
The rat is among the leading model species for research
in physiology, pharmacology, toxicology
and for the study of genetically complex human diseases
Spontaneously Hypertensive Rat (SHR):
A model of the metabolic syndrome
• Spontaneous hypertension
• Decreased insulin action
• Hyperinsulinaemia
• Central obesity
• Defective fatty acid metabolism
• Hypertriglyceridaemia
Microarray data generation in the rat
30 RI strains + 2 parental strains
4 animals per strain (no pooling)
Expression profiling
fat, kidney, adrenal, heart, skeletal muscle, aorta, liver,
brain, …
> 1,000 genetic markers + 800k SNPs
eQTLs
cis- and trans-acting eQTLs
• cis-acting eQTL: candidate genes for physiological traits
• trans-acting eQTL: regulatory gene networks
Genetic architecture of genetic variation in gene expression
[Figure: h2QTL (0.0 to 0.6) against eQTL allelic effect (0.00 to 1.50) for cis-eQTLs and trans-eQTLs (PGW < 0.05), one panel each for heart (LV), fat, kidney and adrenal. Supplementary Figure 2]
• trans-eQTLs: small genetic effect
• cis-eQTLs: big genetic effect, highly heritable
Petretto et al. 2006 PLoS Genet
eQTL Mapping
Methods
• Linear regression (within single tissues)
– QTL Reaper, empirical genome-wide significance by permutations
(Hubner et al. Nat. Genet. 2005; Petretto et al. PLoS Genet. 2006; Petretto et al. Nat. Genet. 2008)
• Bayesian multiple regression models (within single tissues)
– Fully multivariate, model based
• Bayesian multiple response models (across multiple tissues)
– Borrow strength across tissues
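The first method on this slide, single-marker linear regression with empirical genome-wide significance by permutations, can be sketched as follows. This is a minimal illustration and not QTL Reaper itself: the function names, the 0/1 genotype coding and the toy data are assumptions made for the example. For one marker, the regression R² equals the squared Pearson correlation, so the maximum r² across markers is used as the genome-wide statistic.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def max_r2(genotypes, expression):
    """Largest single-marker regression R^2 across all markers."""
    return max(pearson_r(marker, expression) ** 2 for marker in genotypes)


def genome_wide_p(genotypes, expression, n_perm=1000, seed=0):
    """Empirical genome-wide P-value: shuffle expression across strains and
    compare the permuted genome-wide maximum with the observed maximum."""
    rng = random.Random(seed)
    observed = max_r2(genotypes, expression)
    shuffled = list(expression)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_r2(genotypes, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: 3 markers x 10 RI strains (0 = BN allele, 1 = SHR allele).
markers = [
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1, 0],
]
expression = [1.0, 1.2, 0.9, 1.1, 1.0, 3.0, 3.1, 2.9, 3.2, 3.0]

observed, p_gw = genome_wide_p(markers, expression)
print(f"max R^2 = {observed:.3f}, genome-wide P = {p_gw:.3f}")
```

Taking the maximum over all markers inside each permutation is what makes the resulting P-value genome-wide rather than per-marker.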
Bayesian models
• Fully Bayesian variable selection in the “large p, small n”
paradigm for unidimensional outcomes Y (n × 1)
Evolutionary Stochastic Search (ESS)*
• Bayesian multiple regression model
– Providing evidence of polygenic control (for each transcript)
• Bayesian multiple response model
– Providing evidence of shared genetic regulation across tissues
– improved estimate of the variance-covariance matrix across
tissues enhances detection of small effects (trans-eQTLs)
* Bottolo and Richardson 2008, submitted
n, observations
q, number of probesets
p, number of genetic markers
Bayesian eQTL analysis in
multiple tissues
• Data from four tissues were pooled and normalized
using RMA
• Standardize gene expression measurements across
tissues to avoid potential batch effects
• Pilot study
– 2,000 transcripts having the highest variation in gene expression
jointly in fat, kidney, adrenal and heart tissues
– 1,000 genetic markers
1. Bayesian multiple regression model
2. Bayesian multiple response model
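The standardization step mentioned on this slide can be sketched as below. The slide does not say how the standardization was done, so this assumes a per-transcript z-score computed separately within each tissue; the function names and the nested-dictionary layout are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score a list of expression values (mean 0, sample SD 1)."""
    m, s = mean(values), stdev(values)
    if s == 0:
        # A constant transcript carries no variation to standardize.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def standardize_by_tissue(expression):
    """expression maps tissue -> {transcript -> values across animals}.
    Each transcript is standardized within its own tissue (an assumption)."""
    return {
        tissue: {name: standardize(vals) for name, vals in transcripts.items()}
        for tissue, transcripts in expression.items()
    }


# Hypothetical toy input: one transcript measured in two tissues.
toy = {
    "fat": {"transcript_1": [7.1, 7.4, 6.9, 7.2]},
    "heart": {"transcript_1": [9.8, 10.3, 10.1, 9.9]},
}
print(standardize_by_tissue(toy))
```

After this step each transcript has the same scale in every tissue, so differences in overall expression level between tissues do not dominate a joint analysis.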
Bayesian multiple regression model: polygenic control
Bayesian multiple response model:
shared genetic regulation
[Figure: bar chart of % shared eQTLs (0% to 60%) by number of shared eQTLs across fat, kidney, adrenal and heart (No eQTL, 1 eQTL, 2 eQTLs, 3 eQTLs, 4 eQTLs, 5 eQTLs); cis - trans: 47% - 53%]
Bayesian regression, single tissue analysis:
Tissue | cis | trans
Fat | 70% | 30%
Kidney | 71% | 29%
Adrenal | 71% | 29%
Heart | 69% | 31%
Pr (marker to be true positive | probeset) = 0.95
Single vs multiple tissues analysis
• Bayesian regression within single tissues has high
power to detect cis-acting effects
• Detection of a significant proportion of mRNA
levels under polygenic control
• Multiple response model
– Pooling information across tissues greatly enhances
identification of shared genetic regulation of gene
expression
– 33% of transcripts are under shared monogenic control
– 17% of transcripts are under shared polygenic control
– Increased power to detect small eQTL effects shared
across tissues (i.e., trans-eQTLs)
From eQTLs to gene regulatory
networks
• Co-expression analysis in trans-eQTL clusters
taking into account the underlying genetic
architecture (Grieve et al. submitted)
• Joint co-expression analysis in 4 tissues
– detect specific co-expression patterns within and
across tissues
• Graphical Gaussian Models (GGM) to model
linear dependencies between genes within and
across tissues
Co-expression analysis across
tissues
[Figure: expression matrices of samples by transcripts (Transcript 1 to Transcript 2000) for heart, skeletal muscle, liver and aorta, analysed within tissues and across tissues]
Test for:
• Functional enrichment analysis using GO classification
• Genetic control of the co-expression modules (Monti et al. Nat. Genet. 2008)
→ Tissue specific modules
→ Cross-tissue modules
GGMs to model linear
dependencies between genes
• Partial correlation matrix
$\Pi = (\pi_{ij})$
• Inverse of the variance-covariance matrix $P$
$\Omega = (\omega_{ij}) = P^{-1}$
$\pi_{ij} = -\omega_{ij} / \sqrt{\omega_{ii}\,\omega_{jj}}$
• small n, large p
• Regularized covariance matrix estimator by
shrinkage (Ledoit-Wolf approach)
• Guarantees positive definiteness
Schafer and Strimmer 2004, Rainer and Strimmer 2007
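The route from a covariance matrix to partial correlations described on this slide can be sketched as below. This is an illustration under stated assumptions: the shrinkage intensity `lam` is passed in by hand and the target is the diagonal of the sample covariance, whereas the cited approach estimates the intensity analytically. All function names are hypothetical.

```python
def covariance(data):
    """Sample covariance; data is a list of samples (rows) by genes (columns)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]


def shrink_to_diagonal(cov, lam):
    """Shrink off-diagonal entries toward zero with intensity lam in (0, 1].
    With positive variances the result is positive definite."""
    p = len(cov)
    return [[cov[i][j] if i == j else (1 - lam) * cov[i][j] for j in range(p)]
            for i in range(p)]


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """pi_ij = -omega_ij / sqrt(omega_ii * omega_jj), with omega = cov^-1."""
    omega = invert(cov)
    p = len(omega)
    return [[1.0 if i == j else -omega[i][j] / (omega[i][i] * omega[j][j]) ** 0.5
             for j in range(p)] for i in range(p)]


# Hypothetical toy data: 4 samples x 3 genes.
samples = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.7], [3.0, 6.2, 0.2], [4.0, 7.8, 0.9]]
shrunk = shrink_to_diagonal(covariance(samples), lam=0.2)
for row in partial_correlations(shrunk):
    print([round(v, 3) for v in row])
```

Shrinking before inversion matters in the small n, large p setting because the raw sample covariance is then singular and cannot be inverted.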
Partial correlation graphs
• Multiple testing on all partial correlations
– Fitting a mixture distribution to the observed partial correlations (p)
$f(p) = \eta_0 f_0(p; \kappa) + \eta_A f_A(p)$
$\eta_0 + \eta_A = 1$, $\eta_0 \gg \eta_A$
$f_A$: uniform on $[-1, 1]$
– Estimates $\hat{\eta}_0$, $\hat{\kappa}$
$\mathrm{Prob}(\text{non-zero edge} \mid p) = 1 - \hat{\eta}_0 f_0(p; \hat{\kappa}) / f(p)$
Schafer and Strimmer 2004, Rainer and Strimmer 2007
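The posterior probability of a non-zero edge under the two-component mixture on this slide can be sketched as below. Two things are assumptions: the null density is taken to be the standard density of a correlation coefficient under zero correlation with κ degrees of freedom, and η0 and κ are passed in as known values rather than estimated from the observed partial correlations. The function names are hypothetical.

```python
import math


def null_density(r, kappa):
    """Assumed null density f0(r; kappa) of a correlation coefficient when
    the true correlation is zero, with kappa degrees of freedom."""
    norm = math.gamma(kappa / 2) / (math.sqrt(math.pi) * math.gamma((kappa - 1) / 2))
    return norm * (1 - r * r) ** ((kappa - 3) / 2)


def edge_probability(r, eta0, kappa):
    """Prob(non-zero edge | r) = 1 - eta0 * f0(r; kappa) / f(r), where
    f(r) = eta0 * f0(r; kappa) + (1 - eta0) * fA(r) and fA is uniform on [-1, 1]."""
    f0 = null_density(r, kappa)
    f_alt = 0.5  # density of the uniform distribution on [-1, 1]
    mixture = eta0 * f0 + (1 - eta0) * f_alt
    return 1 - eta0 * f0 / mixture


# Hypothetical values: 90% of edges null, 30 degrees of freedom.
for r in (0.0, 0.3, 0.6, 0.9):
    print(r, round(edge_probability(r, eta0=0.9, kappa=30), 4))
```

An edge would then be kept when this probability exceeds a chosen cut-off, such as the 0.95 used in the examples later in the deck.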
Hypothesis driven analysis
1. Co-expression graphs point to gene regulatory
networks
2. Co-expression graphs under genetic control are
suggestive of common regulation by a single
gene(s)
Graphical Gaussian models
• Detect conditionally dependent co-expression components (modules):
– Within tissues
– Across tissues
Example 1. tissue specific component (skeletal muscle)
[Figure: co-expression module, posterior probability for non-zero edge = 0.95; legend: transcription factor activity, cis, trans, no genome-wide significance]
Module mapping in skeletal muscle:
Chromosome 18 (15,889,013 bp)
P = 0.00086
Enriched in inflammatory response genes
GO:0002376 | immune system | 7.5 x 10^-12
GO:0006955 | immune response | 2.1 x 10^-11
Example 2. tissue specific component (aorta)
[Figure: co-expression module, posterior probability for non-zero edge = 0.95; legend: transcription factor activity, cis, trans, no genome-wide significance]
Module mapping in aorta:
Chromosome 1 (102,366,482 bp)
P < 0.001
GO:0003012 | muscle system process | 9.0 x 10^-4
GO:0006936 | muscle contraction | 9.0 x 10^-4
Example 3. multi-tissue component (liver, heart)
[Figure: co-expression module, posterior probability for non-zero edge = 0.95; legend: transcription factor activity, cis, trans, no genome-wide significance]
Module mapping in both tissues:
Chromosome 20 (34,232,001 bp)
P = 0.0008
GO:0030097 | hemopoiesis | 5.1 x 10^-4
GO:0002520 | immune system development | 6.4 x 10^-4
GO:0042592 | homeostatic process | 9.1 x 10^-4
Example 4. multi-tissue component (skeletal muscle, heart)
[Figure: co-expression module, posterior probability for non-zero edge = 0.95; legend: transcription factor activity, cis, trans, no genome-wide significance]
Module mapping in both tissues:
Chromosome 17 (79,885,972 bp)
P = 0.002
GO:0048545 | response to steroid hormone stimulus | 5.9 x 10^-4
GO:0051384 | response to glucocorticoid stimulus | 6.2 x 10^-4
cis-eQTLs
candidate genes for
physiological traits
cis eQTL genes: candidates for human hypertension
…
Hubner et al Nature Genetics 2005
A successful “eQTL story”…
left ventricular mass (LVM)
[Figure: annual risk of cardiovascular disease (0% to 20%) in men and women by LVM category (< 90, 90 - 114, 115 - 139, > 140 g/m)]
Levy et al (1990) New Engl J Med 322: 1561-66
cis eQTL genes: candidates for left
ventricular mass (LVM)
• Look for cis-eQTLs associated with LVM
[Figure: LOD score (0 to 4) of the LVM QTL against genetic distance (0 to 80 cM) on rat chromosome 17]
Limited correlation of LVM with
blood pressure in the RI strains
[Figure: scatter-plot matrix of pairwise correlations among LVM, DBP, SBP, PP and HR in the RI strains, each panel annotated with r and a linear R²; the r values range from -0.24 to 0.80, and only r = 0.51* and r = 0.80** are flagged as significant]
LVM, left ventricular mass
SBP, systolic blood pressure
DBP, diastolic blood pressure
PP, pulse pressure
* P < 0.01
** P < 10^-5
QTT: Genome-wide association between cis-eQTLs and LVM
[Figure, panels a) to c): correlation with LVM and correlation with DBP for cis-eQTLs and differentially expressed genes across chromosomes 1 to 20; left ventricular mass QTL on chromosome 17 (LOD score against genetic distance, 0 to 80 cM); -Log10(P-value) against fold change; Hbld2 and Ogn highlighted]
Fine mapping of LVM in RI strains
Using informative SNPs in the region
[Figure: LOD score (0 to 4) along rat chromosome 17 against genetic distance and physical position (Mb), with the positions of Hbld2 and Ogn marked]
Ogn KO mouse
[Figure: LVM / body weight (%) (mean ± SE) in Ogn+/+, Ogn+/- and Ogn-/- mice at baseline and after hypertrophic stimulation (angiotensin II infusion); comparisons marked ns, * and **]
* P = 0.01
** P = 2 x 10^-3
Ogn is a strong candidate gene for LVM
• co-localises with rat cardiac mass QTLs
• correlates with variation in LVM (BP independent)
• dynamically regulated in response to hypertrophic stimulation
→ in vivo regulation in the Ogn KO mouse
Translational studies
Genome-wide expression analysis in the heart
biopsies collected from 20 aortic stenosis
patients ( ↑ LVM) and 7 controls ( ↓ LVM)
Top differentially expressed genes (out of 22k probesets)
in human cardiac hypertrophy and associated with LVM
Probe ID | Gene Symbol | Gene Name | Fold Change (1) | FDR (%) (2) | Correlation with LVM (3) | P-value (4)
218730_s_at | OGN | Osteoglycin | 2.2 | 2.7 | 0.62 | 1.2E-03
202766_s_at | FBN1 | Fibrillin 1 | 2.0 | 2.7 | 0.55 | 4.5E-03
209621_s_at | PDLIM3 | PDZ and LIM domain 3 | 1.6 | 5.0 | 0.52 | 6.5E-03
219087_at | ASPN | Asporin | 1.9 | 5.0 | 0.52 | 7.5E-03
213646_x_at | TUBA1B | Tubulin, alpha 1b | 1.5 | 2.7 | 0.52 | 5.7E-03
213765_at | MFAP5 | Microfibrillar associated protein 5 | 1.8 | 5.0 | 0.51 | 6.8E-03
203570_at | LOXL1 | Lysyl oxidase-like 1 | 1.5 | 5.0 | 0.51 | 8.1E-03
208782_at | FSTL1 | Follistatin-like 1 | 1.5 | 2.7 | 0.51 | 1.1E-02
213867_x_at | ACTB | Actin, beta | 1.5 | 2.7 | 0.49 | 1.1E-02
212614_at | ARID5B | AT rich interactive domain 5B | 1.6 | 2.7 | 0.49 | 9.9E-03
216442_x_at | FN1 | Fibronectin 1 | 1.9 | 5.0 | 0.49 | 1.1E-02
211750_x_at | TUBA1C | Tubulin, alpha 1c | 1.6 | 2.7 | 0.48 | 1.3E-02
219922_s_at | LTBP3 | Latent transforming growth factor beta binding protein 3 | 1.5 | 5.0 | 0.40 | 4.3E-02
202119_s_at | CPNE3 | Copine III | 1.5 | 3.9 | 0.40 | 4.8E-02
210095_s_at | IGFBP3 | Insulin-like growth factor binding protein 3 | 1.7 | 2.7 | 0.39 | 5.0E-02
… | … | … | … | … | … | …
1 Fold change of differential expression between patients with low (≤ 93 g/m2) and high (≥ 142 g/m2) LVM in the study population
2 False discovery rate for differential expression was estimated by SAM analysis
3 Data are ranked according to decreasing values of the Pearson correlation with LVM (determined non-invasively by echocardiography)
4 Empirical P-values for the correlations were calculated by 10,000 permutations
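The empirical P-values in footnote 4 can be illustrated with a permutation test on the Pearson correlation. This is a sketch only: the two-sided comparison, the function names and the toy numbers are assumptions, since the slide states just that 10,000 permutations were used.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(expression, lvm, n_perm=10000, seed=0):
    """Empirical two-sided P-value for the expression-LVM correlation,
    obtained by shuffling LVM values across individuals."""
    rng = random.Random(seed)
    observed = abs(pearson_r(expression, lvm))
    shuffled = list(lvm)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(expression, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: expression of one probeset and LVM (g/m2) in 8 individuals.
expr = [6.1, 6.4, 6.0, 7.2, 7.5, 6.8, 7.9, 7.1]
lvm = [88.0, 95.0, 90.0, 130.0, 150.0, 120.0, 160.0, 125.0]
print(round(pearson_r(expr, lvm), 2), permutation_p_value(expr, lvm, n_perm=2000))
```

Adding one to both the hit count and the number of permutations keeps the estimate away from an exact zero, which a finite number of permutations cannot support.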
[Diagram relating primary genetic control, Ogn, OGN protein, hypertrophic stimulation, the TGF-β signaling pathway and LVM, drawing on the rat studies and the Ogn KO mouse]
QTT approach for BP and
cis-eQTLs in the kidney
Summary
• The eQTL approach is a powerful tool for the
identification of:
– Candidate genes for complex traits
– Regulatory gene networks
• Developed novel, integrated and fully multivariate
methods for eQTL analysis across multiple tissues
• Using the eQTL approach we identified Ogn as
primary determinant of cardiac mass in rats, mice
and humans
Acknowledgments
Tim Aitman
Ian Grieve
Sarah Langley
Jon Mangion
Matthias Hening (MDC, Berlin)
Norbert Hubner (MDC, Berlin)
Michael Pravenec (Institute of Physiology, Prague)
Gary Conrad (Kansas State University, USA)
Ted Kurtz (University of California, USA)
Yigal Pinto (Cardiovascular Research I., Maastricht)
Stuart Cook
Riswan Sarwar
Han Lu
Blanche Schroen
Sylvia Richardson
Leonardo Bottolo
Clusters of trans-eQTLs
[Figure: trans-eQTLs (PGW < 0.05) on rat chromosome 8 in heart, fat, adrenal and kidney, showing tissue-specific clusters and a cluster that is not tissue-specific]
trans-eQTLs hot spots
[Figure: bar chart (axis 0 to 160) of trans-eQTL hot spots by locus (chromosome.Mb) in kidney, heart, fat and adrenal; highlighted locus: Chromosome 15, 108 Mb, D15Rat29]
Ogn protein expression in adult
rat cardiac myocytes
top left - labelled with rhodamine-conjugated phalloidin
top right - DAPI*
bottom left - Ogn antibody visualized
using Alexa fluor 488 donkey anti-goat
bottom right - merged image
* 4',6-DIAMIDINO-2-PHENYLINDOLE
Data Mining of eQTL datasets
eQTLexplorer database
• Relational database (MySQL)
• Located on Codon server (Imperial
College, London)
• Advantages of relational database
– reduced redundancy & increased consistency
– improved access & security
– facilitated data integration & mining
Mueller et al. Bioinformatics 2006
Main Screen
cis & trans eQTLs
physiological QTLs
Browsing the data…
eQTL mapping in the heart
(left ventricle)
[Figure: cis-eQTLs and trans-eQTLs (axis 0 to 2000) by genome-wide corrected P-value threshold (0.05, 0.01, 0.001, 0.0001, 0.00001)]
Petretto et al, Nature Genetics 2008 (under revision)
Characterization of Ogn
[Figure: dynamic regulation in response to hypertrophic stimulation*; fold change (mean ± SE) of Ogn and Hbld2 in BN and SHR at 0 h, 1 h, 3 h, 6 h and 24 h]
[Figure: sequence analysis; BN versus SHR sequence variants in the 5'UTR, exon 3 and 3'UTR of Ogn and Hbld2, including 47 bp and 67 bp insertions]
* P < 0.05
** P < 0.01
ns, not significant
* neonatal rat ventricular myocytes were stimulated with phenylephrine (100 μM)
Alternative splicing of the Ogn 3’UTR
[Figure: fold change of total mRNA, short isoform and long isoform in the parental strains (BN, SHR) and in the RI strains; luciferase activity for the BN-L, SHR-L, BN-S and SHR-S constructs; Ogn protein expression (arbitrary optical units) in BN and SHR; comparisons marked ns, *, ** and ***]
Circles, BN
Triangles, SHR
OGN protein expression
[Figure: Western blot, lanes 1 to 8, from CAD, AS, HF and HTN samples, showing Mimecan bands for Pre-OGN (50 kDa) and OGN (20 kDa), together with LVM]
CAD, coronary heart disease
AS, concentric hypertrophy secondary to aortic stenosis
HF, ischemic heart failure
HTN, hypertensive heart disease
[Figure: model posterior probability for each transcript against model size (0 to 5), one panel each for fat, heart, kidney and adrenal]