Analysis of Promoter Shifting Using CAGE data An insight into transcription regulation 11/11/2014

Jack Binysh MRC Internship
Outline
• Background
 Introduction to Promoters and Transcription Start Sites (TSS’s)
 Classification of Promoters
 Motivation for Project
• Project
 CAGE data
 Previous work
 CAGEr
 Results
 Future Work
Promoters and TSS’s
How is regulation achieved?
• Promoter region
 Contains regulatory elements (binding motifs, CG enrichment…)
 Controls gene expression
• Context specific
• Dynamic
• Associated epigenetics
 Histone placement
 Histone ‘marks’
Classification
• Correlation between several features
 Broad vs Sharp
 CG islands vs TATA
 Ordered vs Disordered Histones
 General vs Specific function
CAGE Data
• Cap Analysis of Gene Expression
• mRNA captured, first ~20 bp sequenced from 5’ end → Tags
 Full length estimated
 Tags mapped to Genome
• TSS determination at bp resolution.
• Genome wide mapping of mRNA transcription
• FANTOM5: CAGE datasets for many cell types
A ‘TSS Profile’
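As a rough illustration of how mapped tags become a TSS profile, the sketch below counts the 5’ end position of each tag and rescales the counts to tags per million. The tuple layout (chromosome, strand, position) and the function names are assumptions made for this example; they are not taken from the slides or from any particular tool.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical representation of one mapped tag: (chromosome, strand, 5' end position).
Site = Tuple[str, str, int]

def ctss_counts(tag_starts: Iterable[Site]) -> Counter:
    """Count how many tags start at each single-bp position (a raw TSS profile)."""
    return Counter(tag_starts)

def tags_per_million(counts: Dict[Site, int]) -> Dict[Site, float]:
    """Rescale raw counts so that they sum to one million (simple library-size scaling)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tags to normalise")
    return {site: n * 1e6 / total for site, n in counts.items()}

# Toy input: three tags start at position 100 and one at position 101.
tags = [("chr1", "+", 100)] * 3 + [("chr1", "+", 101)]
profile = tags_per_million(ctss_counts(tags))
print(profile)
```

Simple scaling is only one possible normalisation; it is used here because it is the easiest to read.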
Motivation for Project
• Already known that:
 One gene may have multiple types of promoter → regulated in several ways
 Variants of transcription factors may exist in different cells
Do the rules governing transcription change between cell types?
Both temporally (embryonic development) and spatially (in adult tissues)?
• Focus on housekeeping genes, which are always expressed
• Look for changes in TSS profile between cell types…
CAGEr
[Workflow diagram labels: Input (available resources, custom input, CAGE bam files, CTSS files); CAGEset; Methods; Output (TSS, tag clusters (TC), normalized expression)]
Clustering in CAGEr
• Two levels of clustering
 TSS profiles → Tag clusters
 Tag clusters → Consensus clusters
• Tag clusters are sample specific
• Consensus clusters are the same for all samples
[Figure: CTSSs and TCs for samples S1, S2 and S3, combined into a consensus cluster]
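The two clustering levels can be sketched as follows. This is a minimal illustration, not CAGEr's implementation: it assumes one chromosome and strand, groups CTSS positions whose neighbours lie within a hypothetical `max_dist` (the value 20 is arbitrary), and then merges overlapping tag clusters from all samples into shared consensus intervals.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, on one chromosome and strand

def tag_clusters(ctss_positions: List[int], max_dist: int = 20) -> List[Interval]:
    """Level 1: group CTSS positions when neighbouring sites are <= max_dist apart."""
    clusters: List[Interval] = []
    for pos in sorted(ctss_positions):
        if clusters and pos - clusters[-1][1] <= max_dist:
            clusters[-1] = (clusters[-1][0], pos)   # extend the current cluster
        else:
            clusters.append((pos, pos))             # start a new cluster
    return clusters

def consensus_clusters(per_sample: Dict[str, List[Interval]]) -> List[Interval]:
    """Level 2: merge overlapping tag clusters from all samples into shared intervals."""
    merged: List[Interval] = []
    all_tcs = sorted(iv for tcs in per_sample.values() for iv in tcs)
    for start, end in all_tcs:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Toy CTSS positions for three samples (made-up numbers).
ctss = {"S1": [100, 105, 110, 400], "S2": [108, 120, 125], "S3": [395, 402, 410]}
tcs = {name: tag_clusters(positions) for name, positions in ctss.items()}
print(tcs)
print(consensus_clusters(tcs))
```

The tag clusters differ per sample, while the consensus intervals give a common set of regions in which profiles from different samples can be compared.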
Extending CAGEr
• Large datasets
• 1 sample ~ 46 million tag sites
• FANTOM5 has hundreds of samples
• Pairwise comparisons of datasets scale as O(n²)
• If 1 comparison takes ~1 hour, 60 samples take ~10 weeks!
• Need to speed things up, avoid doing every comparison, etc.
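The 10-week estimate follows from counting unordered pairs, n(n-1)/2. A short check, assuming the comparisons run one after another at the stated one hour each (function names are illustrative):

```python
from math import comb

def pairwise_comparisons(n_samples: int) -> int:
    """Number of unordered sample pairs, n(n-1)/2."""
    return comb(n_samples, 2)

def total_weeks(n_samples: int, hours_per_comparison: float = 1.0) -> float:
    """Wall-clock weeks if every pair is compared sequentially."""
    hours = pairwise_comparisons(n_samples) * hours_per_comparison
    return hours / (24 * 7)

print(pairwise_comparisons(60))   # 1770
print(round(total_weeks(60), 1))  # 10.5
```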
Dendrogram
• 67 cell types compared
• Most show very little shifting: the ‘bulk’
• ~8 ‘outliers’:
Cardiac Myocytes
Sertoli Cells
Hepatocytes
Hair follicle papilla
CD 19
Renal Glomerular
Neurons
Aortic Endothelial
Heatmap
• Each outlier is separated from every other cell type
• The difference between two outliers is greater than the difference between one outlier and the ‘bulk’
• Suggests a different set of shifting promoters in every outlier
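A toy model may help make this inference concrete. Assume, purely for illustration, that each sample is described by the set of promoters that shift relative to the bulk, and that the distance between two samples is the number of promoters shifted in exactly one of them. This is not the distance used to produce the heatmap, and the promoter names are invented. If two outliers shift disjoint sets of promoters, they end up further from each other than either is from the bulk:

```python
from itertools import combinations
from typing import Dict, Set

def shift_distance(a: Set[str], b: Set[str]) -> int:
    """Promoters shifted in exactly one of the two samples (symmetric difference)."""
    return len(a ^ b)

# Hypothetical data: promoters shifted relative to the bulk profile.
shifted: Dict[str, Set[str]] = {
    "bulk": set(),
    "outlier_A": {"p1", "p2", "p3"},
    "outlier_B": {"p4", "p5", "p6"},
}

for x, y in combinations(shifted, 2):
    print(x, y, shift_distance(shifted[x], shifted[y]))
```

Had the two outliers shared most of their shifting promoters, their mutual distance would instead be smaller than their distance to the bulk.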
Scatter plots
Dinucleotide Density plots
• Each cluster has two dominant TSS’s: modal and sample specific
• Look for dinucleotide enrichment in sequences
• Initiator sequence at modal CTSS visible
• No obvious motif at the TSS of the outlier
[Plots: Cardiac Myocytes, centered on modal TSS; Cardiac Myocytes, centered on outlier TSS]
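A dinucleotide density plot of this kind can be sketched as the fraction of TSS-aligned sequences that carry a given dinucleotide at each offset. The function below is an illustrative assumption about how such a profile might be computed, not the code behind the plots; the three short sequences are invented.

```python
from typing import List

def dinucleotide_density(seqs: List[str], dinuc: str) -> List[float]:
    """Fraction of sequences with `dinuc` starting at each position.

    All sequences must have the same length and be aligned on the same
    reference point (for example the modal or the outlier TSS).
    """
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and of equal length")
    dinuc = dinuc.upper()
    return [
        sum(s[i:i + 2].upper() == dinuc for s in seqs) / len(seqs)
        for i in range(length - 1)
    ]

# Toy aligned sequences: every one has "CA" at offset 2.
aligned = ["GGCATT", "TTCAGG", "ACCAAC"]
print(dinucleotide_density(aligned, "CA"))
```

A sharp peak at one offset, as in this toy input, is the kind of signal an initiator-like dinucleotide would produce; a flat profile corresponds to "no obvious motif".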
Motif discovery
• Motif discovery finds no specific motifs within 500 bp either side of the outlier TSS in any of the samples.
• All of the samples show general GC enrichment
• ~80% of clusters overlap one annotated CpG island, ~20% overlap none
Cardiac Myocytes
Gene Ontology
• Gene Ontology analysis
 Each cluster associated with nearest annotated TSS & entrezgene ID
 Keywords tagged to each entrezgene ID
 Statistics on over/under-representation of keywords
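The slides do not say which statistic was used for over-representation. A common choice is the hypergeometric test, sketched here under that assumption: given N background genes of which K carry a keyword, it returns the probability of seeing at least k annotated genes in a list of n. The variable names are illustrative, and no multiple-testing correction is applied.

```python
from math import comb

def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the keyword."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    total = comb(N, n)
    # comb() returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy numbers: 10 background genes, 5 annotated, 5 in the list, all 5 annotated.
print(hypergeom_pvalue(k=5, n=5, K=5, N=10))
```

Under-representation can be tested in the same way by summing the lower tail instead of the upper one.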
Cardiac Myocytes
Gene Ontology
• Significantly over-represented biological functions tend to be housekeeping, not cell specific
• Perhaps the shifting promoters are not involved with cell-specific gene function at all?
Cardiac Myocytes
Future Work
• Repeat analysis using different consensus clusters
 Problems with thresholds within analysis
 More promising recent dinucleotide maps
Future Work
• Analysis of non-shifting promoters
 Looking at more general changes in shape
 E.g. dot product, linear scaling
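One way to read "dot product, linear scaling" is a normalised dot product (cosine similarity) between two TSS profiles over the same consensus cluster. This interpretation is an assumption, as is the function name. The measure equals 1 when one profile is a positive multiple of the other, so it responds to changes in shape rather than in overall expression level.

```python
from math import sqrt
from typing import Sequence

def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Normalised dot product of two TSS profiles over the same positions."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same positions")
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    if norm == 0:
        raise ValueError("a profile with no signal has no defined shape")
    return dot / norm

# Same shape at twice the expression level, versus a completely shifted profile.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(cosine_similarity([5, 0, 0], [0, 0, 5]))
```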
Extra slides…
Previous Results
• Zebrafish embryonic development
• Initial RNA transcriptome inherited from mother, zygotic gene activation at Mid Blastula Transition
• Corresponds to change in TSS profile
Sharp Broad
Position of TSS’s shift
Shifting Promoters
• “Differential promoter interpretation by the maternal and zygotic transcription machinery”
Shifting Promoters
Search for genetic structure correlated with this shifting
• TATA-like enrichment always found ~30 bp upstream in Maternal
• In Zygote, boundary 50 bp downstream of TSS
• Majority of TATA-like motifs not canonical TATA boxes (W box)
Two Independent Mechanisms for Transcription Initiation
Nucleosome Location
• H3K4me3 nucleosome locations estimated at 4 developmental stages
• Alignment with Zygotic, but not Maternal, TSS, 50 bp downstream
 Same location as boundary
• Suggests Zygotic mechanism for positioning nucleosomes after MBT
Internucleosomal Phasing Patterns
• 10 bp AA/TT dinucleotide enrichment periodicity downstream of zygotic TSS, but not maternal
• Weaker GC/AT enrichment pattern matching nucleosome free and wrapped DNA
• Zygotic, not maternal, TSS associated with nucleosome positioning
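A simple way to look for a 10 bp AA/TT periodicity is to mark every position where AA or TT starts and count how often two marks are separated by each lag. The sketch below is an illustration on a synthetic sequence and makes no claim about how the original analysis was done; the function names are invented.

```python
from typing import Dict, List

def ww_indicator(seq: str) -> List[int]:
    """1 where an AA or TT dinucleotide starts, 0 elsewhere."""
    seq = seq.upper()
    return [1 if seq[i:i + 2] in ("AA", "TT") else 0 for i in range(len(seq) - 1)]

def lag_counts(seq: str, max_lag: int = 20) -> Dict[int, int]:
    """Number of AA/TT pairs separated by each lag from 1 to max_lag."""
    ind = ww_indicator(seq)
    return {
        lag: sum(ind[i] * ind[i + lag] for i in range(len(ind) - lag))
        for lag in range(1, max_lag + 1)
    }

# Synthetic sequence with an AA exactly every 10 bp.
periodic = "AACGCGCGCG" * 5
counts = lag_counts(periodic)
print(counts[10], counts[5])
```

On real data the counts would be summed over many TSS-aligned sequences and compared with a shuffled background before reading anything into a peak at lag 10.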