Regulatory variation and its functional consequences
Chris Cotsapas
cotsapas@broadinstitute.org
Motivating questions
• How do phenotypes vary across individuals?
– Regulatory changes drive cellular and organismal
traits
– Likely also drive evolutionary differences
• How are genes (co)regulated?
– Pathways, processes, contexts
Regulatory variation
• What do “interesting” variants do?
• Genetic changes to:
– Coding sequence **
– Gene expression levels
– Splice isoform levels
– Methylation patterns
– Chromatin accessibility
– Transcription factor binding kinetics
– Cell signaling
– Protein-protein interactions
~88% of GWAS hits are regulatory
Genetic variation alters regulation
• Protein levels
– Maize (Damerval 94)
• Expression levels
– Yeast, maize, mouse, humans (Brem 02, Schadt
03, Stranger 05, Stranger 07)
• RNA splicing
– Humans (Pickrell 12, Lappalainen 13)
• Methylation and DNase I peak strength
– Humans (Degner 12; Gibbs 12)
Genetics of gene expression (eQTL)
• cis-eQTL
– The position of the eQTL maps
near the physical position of the
gene.
– Promoter polymorphism?
– Insertion/Deletion?
– Methylation, chromatin
conformation?
• trans-eQTL
– The position of the eQTL does
not map near the physical
position of the gene.
– Regulator?
– Direct or indirect?
Modified from Cheung and Spielman 2009 Nat Gen
Cis-eQTL analysis:
Test SNPs within a pre-defined distance of gene
[Figure: a gene and its expression probe, with the SNPs inside the 1Mb window around the gene being tested]
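A minimal sketch of the window selection described above. The slide does not say whether the 1Mb distance is measured from the gene boundaries or from the TSS; this sketch assumes gene boundaries. The `Gene` and `Snp` records, the function name `cis_snps`, and all coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # bp
    end: int    # bp


class Snp(NamedTuple):
    rsid: str
    chrom: str
    pos: int    # bp


def cis_snps(gene: Gene, snps: List[Snp], window: int = 1_000_000) -> List[Snp]:
    """Return SNPs on the gene's chromosome within `window` bp of the gene.

    Assumption: distance is measured from the gene boundaries.
    """
    lo = gene.start - window
    hi = gene.end + window
    return [s for s in snps if s.chrom == gene.chrom and lo <= s.pos <= hi]


# Hypothetical example data (not real coordinates).
gene_a = Gene("geneA", "5", 40_000_000, 40_100_000)
candidates = [
    Snp("snpA", "5", 39_500_000),  # inside the window
    Snp("snpB", "5", 41_200_000),  # more than 1Mb past the gene end
    Snp("snpC", "2", 40_050_000),  # wrong chromosome
]
tested = cis_snps(gene_a, candidates)
```

Only the SNPs returned here would go into the cis association test for that gene; everything else is left to the trans scan.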
QT association
• Analysis of the relationship between a dependent or outcome variable (phenotype) with one or more independent or predictor variables (SNP genotype)
Linear Regression Equation
$Y_i = b_0 + b_1 X_i + e_i$
Logistic Regression Equation
$\ln\left(\frac{p_i}{1 - p_i}\right) = b_0 + b_1 X_i + e_i$
[Figure: continuous trait value (y-axis) against number of A1 alleles (x-axis: 0, 1, 2), with a fitted line of intercept b0 and slope b1]
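A minimal sketch of the linear model above, fitted by ordinary least squares for one SNP and one quantitative trait. Genotypes are assumed to be coded as the number of A1 alleles (0, 1, 2). The function name `qt_association` and the example values are hypothetical; a real analysis would use an established tool and add covariates.

```python
import math
from typing import Sequence, Tuple


def qt_association(genotypes: Sequence[float],
                   trait: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fit Y_i = b0 + b1 * X_i + e_i by least squares.

    Returns (b0, b1, standard error of b1, t statistic for b1).
    """
    n = len(genotypes)
    if n != len(trait) or n < 3:
        raise ValueError("need matching inputs with at least 3 samples")
    mean_x = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, trait))
    b1 = sxy / sxx                     # slope: effect per A1 allele
    b0 = mean_y - b1 * mean_x          # intercept
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(genotypes, trait))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    t_stat = b1 / se_b1 if se_b1 > 0 else math.inf
    return b0, b1, se_b1, t_stat


# Hypothetical example: six individuals, two per genotype class.
x = [0, 0, 1, 1, 2, 2]
y = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
b0, b1, se_b1, t_stat = qt_association(x, y)
```

The t statistic would be compared against a t distribution with n - 2 degrees of freedom to get the association p-value.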
eQTL analysis: a GWAS for every gene
[Figure: one genome-wide association scan per gene, for gene 1 through gene N]
cis-eQTLs are rather common
Nica et al PLoS Genet 2011
Cis-eQTLs cluster around TSS
Stranger et al
PLoS Genet 2012
trans hotspots (yeast)
Brem et al Science 2002
Yvert et al Nat Genet 2003
Candidate genes, perturbations underlying organismal phenotypes
DOES REGULATORY VARIATION ALTER
PHENOTYPE? APPLICATION TO GWAS
Rationale
• How do disease/trait variants actually alter
biology?
• If they change regulation, then:
– Change in gene expression/isoform use
– Phenotypic consequence*
Compare patterns of association
[Figure: association tracks across one locus: GWAS peak, eQTL for gene 1, eQTL for gene 2; CD GWAS p against eQTL p]
• Pearson’s covariance for windows of 51 SNPs between –log(p) in 2 traits
• Detect a peak when effect is the same
• No peak when there are independent hits near each other
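A minimal sketch of the windowed comparison, under stated assumptions: SNPs are ordered by position and shared between the two scans, the log is base 10, windows are centred on each SNP, and the statistic is the sample covariance (n - 1 denominator). The slide does not specify these details, and `window_covariance` is a hypothetical name.

```python
import math
from typing import List, Sequence


def window_covariance(p_gwas: Sequence[float],
                      p_eqtl: Sequence[float],
                      window: int = 51) -> List[float]:
    """Covariance of -log10(p) between two traits in sliding SNP windows.

    Returns one value per window centre, so the output has
    len(p_gwas) - window + 1 entries.
    """
    if len(p_gwas) != len(p_eqtl):
        raise ValueError("both scans must cover the same SNPs")
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number >= 3")
    x = [-math.log10(p) for p in p_gwas]
    y = [-math.log10(p) for p in p_eqtl]
    half = window // 2
    out = []
    for centre in range(half, len(x) - half):
        xs = x[centre - half:centre + half + 1]
        ys = y[centre - half:centre + half + 1]
        mean_x = sum(xs) / window
        mean_y = sum(ys) / window
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
        out.append(cov / (window - 1))
    return out
```

When the two traits share one signal, their -log(p) profiles rise and fall together and the covariance peaks; two independent nearby hits have differently shaped profiles and give no clear peak.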
Crohn’s/eQTL analysis
• CD meta analysis (GWAS only)
• CEU Hapmap LCL eQTL data
• Overlapping SNPs only (eQTL data has 610K
SNPs, most in CD meta-analysis)
• Test 133 associations (total 1054 tests)
Crohn’s/eQTL analysis
SNP         CHR  Gene
rs11742570  5    PTGER4
rs12994997  2    ATG16L1
rs11401     16   SPNS1
rs10781499  9    INPP5E
rs2266959   2    C22orf29
A peak implies that the same effect drives GWAS and eQTL
PTGER4 eQTL covaries with CD GWAS signal at rs11742570 on chr 5
[Figure: chr 5 locus, x-axis from 39.45 to 41.45, 100kb scale bar. Upper track: GWAS/eQTL covariance (axis 0 to 101). Lower track: GWAS −log(p) (axis 0 to 35). Genes in the region: C6, PTGER4, SNORD72, CARD6, PRKAA1, RPL37, TTC33, LOC100506548, C7, PLCXD3, MROH2B.]
MS/eQTL analysis
SNP          CHR  Gene
rs6880778    5    PTGER4
rs7132277    12   CDK2AP
rs7665090    4    CISD2
rs2255214    3    GOLGB1 & EAF2
rs201202118  12   METTL1 & TSFM
rs12946510   17   ORMDL3, STARD3 & ZPBP2
rs2283792    22   PPM1F
rs7552544    1    SLC30A7
rs34536443   19   SLC44A2
A peak implies that the same effect drives GWAS and eQTL
A PTGER4 eQTL covaries with MS GWAS signal at rs6880778 on chr 5
[Figure: chr 5 locus, x-axis from 39.41 to 41.38, 100kb scale bar. Upper track: GWAS/eQTL covariance (axis 0 to 19). Lower track: GWAS −log(p) (axis 0 to 16). Genes in the region: PRKAA1, CARD6, TTC33, SNORD72, MROH2B, C6, LOC100506548, RPL37, PLCXD3, PTGER4, DAB2, C7.]
[Figure: CD/PTGER4 covariance (y-axis 0 to 100) and MS/PTGER4 covariance (y-axis 0 to 15), each plotted against position from 39500000 to 41500000]
Open question
DOES REGVAR REVEAL CO-REGULATION?
A.K.A. WHERE ARE THE TRANS eQTLS?
Whole-genome eQTL analysis is an independent GWAS for expression of each gene
[Figure: one genome-wide association scan per gene, for gene 1 through gene N]
Issues with trans mapping
• Power
– Genome-wide significance is 5e-8
– Multiple testing on ~20K genes
– Sample sizes clearly inadequate
• Data structure
– Bias corrections deflate variance
– Non-normal distributions
• Sample sizes
– Far too small
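A rough worked example of the power problem, assuming a plain Bonferroni correction across genes on top of the usual genome-wide threshold. Other corrections are possible and the slide does not name one; the variable names are hypothetical.

```python
gwas_alpha = 5e-8   # genome-wide significance for a single trait
n_genes = 20_000    # roughly 20K expression traits, one scan each

# Bonferroni across genes: every gene's scan must share the same error budget.
trans_alpha = gwas_alpha / n_genes
print(f"per-test threshold: {trans_alpha:.1e}")  # per-test threshold: 2.5e-12
```

With around a hundred individuals per population, very few single SNP-gene pairs can reach a threshold of this order.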
But…
• Assume that trans eQTLs affect many genes…
• …and you can use cross-trait methods!
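Association data
$$
Z = \begin{pmatrix}
Z_{1,1} & Z_{1,2} & \cdots & Z_{1,p} \\
Z_{2,1} & Z_{2,2} & \cdots & Z_{2,p} \\
\vdots  & \vdots  & \ddots & \vdots  \\
Z_{s,1} & Z_{s,2} & \cdots & Z_{s,p}
\end{pmatrix}
$$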
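Cross-phenotype meta-analysis
[Figure: three distributions of −log(p), one labelled λ = 1 and two labelled λ ≠ 1]
$S_{\mathrm{CPMA}} \sim \frac{L(\mathrm{data} \mid \lambda \neq 1)}{L(\mathrm{data} \mid \lambda = 1)}$
Cotsapas et al, PLoS Genetics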
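A minimal sketch of a likelihood-ratio statistic of this form for one SNP across many traits. It assumes that −ln(p) follows an exponential distribution with rate λ, so that λ = 1 corresponds to no association beyond chance. That modelling choice and the name `cpma_statistic` are assumptions of this sketch, not details stated on the slide.

```python
import math
from typing import Sequence


def cpma_statistic(p_values: Sequence[float]) -> float:
    """Likelihood-ratio statistic comparing lambda = MLE against lambda = 1.

    Assumes x = -ln(p) ~ Exponential(rate=lambda). The log-likelihood is
    n * ln(lambda) - lambda * sum(x), maximised at lambda_hat = n / sum(x).
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    x = [-math.log(p) for p in p_values]
    n = len(x)
    total = sum(x)
    if total == 0:
        raise ValueError("all p-values equal 1; lambda is not estimable")
    lam_hat = n / total

    def loglik(lam: float) -> float:
        return n * math.log(lam) - lam * total

    return 2.0 * (loglik(lam_hat) - loglik(1.0))
```

For independent traits a statistic like this would be referred to a chi-square distribution with one degree of freedom; correlated traits need an empirical null.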
CPMA for correlated traits
• Empirical assessment to account for
correlation
• Simulate Z scores under covariance,
recalculate CPMA
• Construct distribution of CPMA for dataset,
call significance
with Ben Voight, U Penn
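A minimal sketch of the empirical assessment outlined above, for a single SNP: draw correlated Z scores from a multivariate normal with the trait covariance, convert them to two-sided p-values, recompute the statistic, and compare the observed value with the simulated distribution. The exponential-rate statistic, the add-one p-value estimate, and all function names are assumptions of this sketch.

```python
import math
import random
from typing import List, Sequence


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular L with L * L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cpma(p_values: Sequence[float]) -> float:
    """Likelihood ratio of rate = MLE vs rate = 1 for x = -ln(p) (assumed model)."""
    x = [-math.log(max(p, 1e-300)) for p in p_values]
    n, total = len(x), sum(x)
    lam_hat = n / total
    return 2.0 * ((n * math.log(lam_hat) - lam_hat * total) - (-total))


def empirical_p(observed: float, cov: Sequence[Sequence[float]],
                n_sim: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated null statistics at least as large as `observed`."""
    rng = random.Random(seed)
    low = cholesky(cov)
    k = len(cov)
    exceed = 0
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Correlated null Z scores: z = L * e
        z = [sum(low[i][j] * e[j] for j in range(i + 1)) for i in range(k)]
        p_sim = [math.erfc(abs(v) / math.sqrt(2.0)) for v in z]  # two-sided
        if cpma(p_sim) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)  # add-one estimate avoids p = 0
```

The covariance matrix would be estimated from the expression data itself, so that co-expressed transcripts do not inflate the statistic.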
Experimental design
• 610,180 SNPs: MAF >0.15 in CEU and YRI; LD pruned (r2 < 0.2)
• 8368 transcripts detectable on Illumina arrays
• 108 CEU individuals* and 109 YRI individuals*
• plink, per population: Transcript ~ SNP, sex, giving CEU p-values and YRI p-values
• CPMA on each set of p-values, giving CEU CPMA scores and YRI CPMA scores
• >95%ile sim CPMA
* Stranger et al Nat Genet 2007 (LCL data; publicly available)
Target sets of genes
• trans-acting variant: SNP with CPMA evidence
• Target genes: genes affected by trans-acting
variant (i.e. regulon)
Prediction 1
• Allelic effects should be conserved between
two populations
– Binomial test on paired observations for all genes with P < 0.05 in at least one population
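[Figure: genes with pCEU < 0.05 and genes with pYRI < 0.05, with the sign (+/−) of the allelic effect in CEU set against the sign in YRI, shown once with matching signs and once with opposite signs]
True for 1124/1311 SNPs (binomial p < 0.05)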
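A minimal sketch of the paired sign test for one trans-acting SNP. It assumes a one-sided binomial test against a 50% chance of matching direction, applied to the effect estimates of the selected genes in each population. The slide does not state the sidedness, and `sign_concordance_test` is a hypothetical name.

```python
from math import comb
from typing import Sequence, Tuple


def sign_concordance_test(effects_a: Sequence[float],
                          effects_b: Sequence[float],
                          p_null: float = 0.5) -> Tuple[int, int, float]:
    """One-sided binomial test that effect directions agree more than chance.

    Returns (number concordant, number of usable pairs, upper-tail p-value).
    Pairs with a zero effect in either population are dropped.
    """
    if len(effects_a) != len(effects_b):
        raise ValueError("effects must be paired gene by gene")
    pairs = [(a, b) for a, b in zip(effects_a, effects_b) if a != 0 and b != 0]
    n = len(pairs)
    k = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    # P(X >= k) for X ~ Binomial(n, p_null)
    p_value = sum(comb(n, i) * p_null ** i * (1 - p_null) ** (n - i)
                  for i in range(k, n + 1))
    return k, n, p_value


# Five genes with matching direction in both populations (+ + - - +).
k, n, p_value = sign_concordance_test([1, 1, -1, -1, 1], [2, 1, -3, -1, 1])
```

Under these assumptions, five out of five concordant genes is the smallest fully concordant set that falls below 0.05.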
Prediction 2
• Target genes should overlap
– Identify by mixture of Gaussians classification
– Empirical p from distribution of overlaps between N_CEU and N_YRI genes across SNPs.
[Figure: overlap between genes with pCEU < 0.05 and genes with pYRI < 0.05]
True for 600/1311 SNPs (empirical p < 0.05)
What about the target genes?
• Regulons:
– Encode proteins more
connected than expected by
chance
www.broadinstitute.org/mpg/dapple.php
Rossin et al 2011 PLoS Genetics
What about the target genes?
• Regulons:
– Encode proteins enriched for
TF targets (ENCODE LCL data)
– 24/67 filtered TFs significant
– Binomial overlap test
[Figure: overlap between trans target genes and ChIP-seq LCL target genes]
TF      p-value
CEBPB   3.7 x 10^-142
HDAC8   7.8 x 10^-122
FOS     2.5 x 10^-96
JUND    3.7 x 10^-88
NFYB    3.3 x 10^-71
ETS1    3.8 x 10^-63
FAM48A  2.1 x 10^-61
FOXA1   1.4 x 10^-33
GATA1   4.6 x 10^-33
HEY1    7.8 x 10^-32
Summary
• Regulatory variation is common
• It affects gene expression levels
• Likely many other types:
– DNA accessibility, chromatin states
– Transcript splicing, processing, turnover
• Has phenotypic consequences
– GWAS
– Some cellular assays (not discussed here)
Open questions
• Discover regulatory elements (cis)
– Promoters, enhancers etc
• Gene regulatory circuits (trans)
• Dynamics of regulation
– Splicing variation, processing, degradation
• Phenotypic consequences
– Cellular assays required
• Tie in to organismal phenotype
RNAseq, GTEx
NEXT-GEN SEQUENCING DATA
GTEx – Genotype-Tissue EXpression
An NIH common fund project
Current: 35 tissues from 50 donors
Scale up: 20K tissues from 900 donors.
Novel methods groups: 5 current + RFA
How can we make RNAseq useful?
• Standard eQTLs
– Montgomery et al, Pickrell et al Nature 2010
• Isoform eQTLs
– Depth of sequence!
• Long genes are preferentially sequenced
• Abundant genes/isoforms ditto
• Power!?
• Mapping biases due to SNPs
RNAseq combined with other techs
• Regulons: TF gene sets via ChIP-seq
– Look for trans effects
• Open chromatin states (DNase I; methylation)
– Find active genes
– Changes in epigenetic marks correlated to RNA
– Genetic effects
• RNA/DNA comparisons
– Simultaneous SNP detection/genotyping
– RNA editing ???