Analysis of Promoter Shifting Using CAGE Data
An insight into transcription regulation
11/11/2014 Jack Binysh MRC Internship

Outline
• Background
    Introduction to Promoters and Transcription Start Sites (TSSs)
    Classification of Promoters
    Motivation for Project
• Project
    CAGE data
    Previous work
    CAGEr
    Results
    Future Work

Promoters and TSSs
How is regulation achieved?
• Promoter region
    Contains regulatory elements (binding motifs, CG enrichment…)
    Controls gene expression
• Context specific
• Dynamic
• Associated epigenetics
    Histone placement
    Histone ‘marks’

Classification
• Correlation between several features
    Broad vs Sharp
    CG islands vs TATA
    Ordered vs Disordered Histones
    General vs Specific function

CAGE Data
• Cap Analysis of Gene Expression
• mRNA captured, first ~20 bp sequenced from 5’ end
• TSS determination at bp resolution
• Genome-wide mapping of mRNA transcription
• FANTOM5: CAGE datasets for many cell types
[Diagram labels: Tags; Full length estimated; Tags mapped to Genome]
[Figure: a ‘TSS profile’]

Motivation for Project
• Already known that:
    One gene may have multiple types of promoter → regulated in several ways
    Variants of transcription factors may exist in different cells
Do the rules governing transcription change between cell types,
both temporally (embryonic development) and spatially (in adult tissues)?
• Focus on housekeeping genes, which are always expressed
• Look for changes in TSS profile between cell types…

CAGEr
[Diagram: CAGEr workflow. Input: available resources, or custom input (CAGE bam files, CTSS files), loaded into a CAGEset. Output of the methods: TSSs, tag clusters (TC), normalized expression.]

Clustering in CAGEr
• Two levels of clustering
    TSS profiles → Tag clusters
    Tag clusters → Consensus clusters
• Tag clusters are sample specific
• Consensus clusters are the same for all samples
[Figure: CTSSs and TCs for samples S1, S2, S3, and the resulting consensus cluster]

Extending CAGEr
• Large datasets
    1 sample ~ 46 million tag sites
• FANTOM5 has hundreds of samples
• Pairwise comparison of datasets is O(n²)
    If 1 comparison takes ~1 hour, 60 samples (60 × 59 / 2 = 1,770 comparisons) take ~10 weeks!
• Need to speed things up, avoid doing every comparison, etc.

Dendrogram
• 67 cell types compared
• Most show very little shifting: the ‘bulk’
• ~8 ‘outliers’
    Cardiac Myocytes
    Sertoli Cells
    Hepatocytes
    Hair follicle papilla
    CD19
    Renal Glomerular
    Neurons
    Aortic Endothelial

Heatmap
• Each outlier is separated from every other cell type
• The difference between two outliers is greater than the difference between one outlier and the ‘bulk’
• Suggests a different set of shifting promoters in every outlier

Scatter plots

Dinucleotide Density plots
• Each cluster has two dominant TSSs: modal and sample specific
• Look for dinucleotide enrichment in sequences
• Initiator sequence visible at the modal CTSS
• No obvious motif at the TSS of the outlier
[Figure panels: Cardiac Myocytes, centered on modal TSS; Cardiac Myocytes, centered on outlier TSS]

Motif discovery
• Motif discovery finds no specific motifs within 500 bp either side of the outlier TSS in any of the samples
• All of the samples show general GC enrichment
• ~80% of clusters overlap 1 annotated CpG island, ~20% overlap none
[Figure: Cardiac Myocytes]

Gene Ontology
• Gene Ontology analysis
    Each cluster associated with nearest annotated TSS & entrezgene ID
    Keywords tagged to each entrezgene ID
    Statistics on over/under-representation of keywords
[Figure: Cardiac Myocytes]

Gene Ontology
• Significantly over-represented biological functions tend to be housekeeping, not cell specific
• Perhaps the shifting promoters are not involved with cell-specific gene function at all?
[Figure: Cardiac Myocytes]

Future Work
• Repeat analysis using different consensus clusters
    Problems with thresholds within analysis
    More promising recent dinucleotide maps

Future Work
• Analysis of non-shifting promoters
    Looking at more general changes in shape
    E.g. dot product, linear scaling

Extra slides…

Previous Results
• Zebrafish embryonic development
• Initial RNA transcriptome inherited from mother, zygotic gene activation at Mid Blastula Transition (MBT)
• Corresponds to change in TSS profile
• “Differential promoter interpretation by the maternal and zygotic transcription machinery”
[Figure labels: Sharp; Broad; Position of TSSs shift; Shifting Promoters]

Shifting Promoters
Search for genetic structure correlated with this shifting
• TATA-like enrichment always found ~30 bp upstream in maternal
• In zygote, boundary 50 bp downstream of TSS
• Majority of TATA-like motifs are not canonical TATA boxes (W box)
Two independent mechanisms for transcription initiation

Nucleosome Location
• H3K4me3 nucleosome locations estimated at 4 developmental stages
• Alignment with zygotic, but not maternal, TSS, 50 bp downstream
    Same location as boundary
• Suggests zygotic mechanism for positioning nucleosomes after MBT

Internucleosomal Phasing Patterns
• 10 bp AA/TT dinucleotide enrichment periodicity downstream of zygotic TSS, but not maternal
• Weaker GC/AT enrichment pattern matching nucleosome-free and wrapped DNA
• Zygotic, not maternal, TSS associated with nucleosome positioning
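
Illustration (not from the original slides): a minimal sketch of how a 10 bp AA/TT periodicity like the one above could be checked. It assumes sequences already aligned on the TSS, builds a per-position AA/TT frequency profile, and looks for the lag with the strongest autocorrelation. The function names, the toy sequences and the choice of autocorrelation are all assumptions made for this example; the slides do not say how the periodicity was measured.

```python
def ww_profile(seqs, dinucs=("AA", "TT")):
    """Fraction of sequences with an AA or TT dinucleotide starting at each position."""
    length = min(len(s) for s in seqs)
    profile = []
    for i in range(length - 1):
        hits = sum(1 for s in seqs if s[i:i + 2].upper() in dinucs)
        profile.append(hits / len(seqs))
    return profile


def autocorrelation(profile, lag):
    """Autocorrelation of the profile at the given lag (0.0 for a flat profile)."""
    n = len(profile)
    mean = sum(profile) / n
    var = sum((x - mean) ** 2 for x in profile)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((profile[i] - mean) * (profile[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def dominant_period(profile, min_lag=2, max_lag=20):
    """Lag in [min_lag, max_lag] with the highest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(profile, lag))


# Hypothetical toy data: an AA dinucleotide every 10 bp on a GC background.
# Real input would be genomic sequence downstream of each TSS.
toy_seqs = [("AA" + "GCGCGCGC") * 8 for _ in range(3)]
profile = ww_profile(toy_seqs)
print(dominant_period(profile))
```

On real data the profile would be noisy, so smoothing it or comparing against shuffled sequences before reading off a period would be a sensible extra step.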