Regulomics I: Methods to read out regulatory functions

Identifying regulatory functions in genomes
[Figure: Chr5: 133,876,119 – 134,876,119; tracks: Genes, Transcription]
• Regulatory elements are not easily detected by sequence analysis
• Examine biochemical correlates of RE activity in cells/tissues:
  • Chromatin immunoprecipitation (ChIP-seq)
  • DNase-seq and FAIRE
  • Methylated DNA immunoprecipitation (MeDIP)

Identifying regulatory functions in genomes
Noonan and McCallion, Ann Rev Genomics Hum Genet 11:1 (2010)

Biochemical indicators of regulatory function
1. TF binding
2. Histone modification (H3K27ac, H3K4me3)
3. Chromatin modifiers & coactivators (p300, MLL)
4. DNA looping factors (cohesin)

Regulatory functions are tissue/cell type/time point-specific
From Visel et al. (2009) Nature 461:199

Identifying regulatory functions in genomes
[Figure: Chr5: 133,876,119 – 134,876,119; tracks: Genes, Transcription, Histone mods, TF binding]

Methods
• ChIP-seq: TFs, histone mods
• Chromatin accessibility: DNase, FAIRE
From Furey (2012) Nat Rev Genet 13:840

ChIP-seq
• Align reads to reference
• Use peaks of mapped reads to identify binding events
[Figure labels: ChIP, Input, PCR, Signal, Peak call]

Calling peaks in ChIP-seq data
[Figure labels: ChIP, Input, Peak call, Enrichment relative to control]
• ChIP-seq is an enrichment method
• Requires a statistical framework for determining the significance of enrichment
• ChIP-seq ‘peaks’ are regions of enriched read density relative to an input control
• Input = sonicated chromatin collected prior to immunoprecipitation

There are many ChIP-seq peak callers available
Wilbanks and Facciotti PLoS ONE 5:e11471 (2010)

Generating ChIP-seq peak profiles
Artifacts:
• Repeats
• PCR duplicates
From Park (2009) Nat Rev Genet 10:669

Assessing statistical significance
• Assume read distribution follows a Poisson distribution
• Many sites in input data will have some reads by chance
• Some sites will have many reads
[Figure axis: # of reads at a site (S)]
• Empirical FDR: call peaks in input (using ChIP as control)
• FDR = (# of peaks of a given enrichment value called in input) / (# of such peaks called in ChIP)
From Pepke et al. (2009) Nat Meth 6:S22

Assessing statistical significance
• Sequencing depth matters
[Figure axis: # of reads at a site (S)]
From Park (2009) Nat Rev Genet 10:669

ChIP-seq signal profiles vary depending on factor
[Figure panels: Transcription factors, Pol II, Histone mods]
From Park (2009) Nat Rev Genet 10:669

Quantitative analysis of ChIP-seq signal profiles
[Figure: ChIP-seq signal at 20,000 bound sites in HeLa and K562, clustered into sites strongly marked in HeLa, sites strongly marked in both, and sites strongly marked in K562]

ChIP-seq analysis workflow
From Park (2009) Nat Rev Genet 10:669

Interpreting ChIP-seq datasets
Requires some prior knowledge
• TF function
• Histone modification
• Potential target genes
Exploit existing annotation
• Promoter locations
• Known binding sites
• Known histone modification maps

Example from PS1: CTCF and RAD21 (cohesin)
CTCF and cohesin co-occupy many sites
[Figure labels: Promoters, Insulators, Enhancers]
From Kagey et al. (2010) Nature 467:430
[Figure labels: Promoter, Enhancers?]
• CTCF: marks insulators and promoters
• RAD21 (cohesin): marks insulators, promoters and enhancers

Discovering regulatory functions specific to a biological state
[Figure labels: Limb, Brain, Function?]
• Assign enhancers to genes based on proximity (not ideal)
• GREAT: bejerano.stanford.edu/great/
• Gene ontology annotation assigned to regulatory sequences

TF motif elicitation from ChIP-seq data
• CTCF: ~20,000 binding sites identified by ChIP
• MEME suite: http://meme.nbcr.net/meme/
From Furey (2012) Nat Rev Genet 13:840

Single TF binding events may not indicate regulatory function
• Many TFs are present at high concentrations in the nucleus
• TF motifs are abundant in the genome
• Single TF binding events may be incidental
[Figure label: Enhancer-associated histone modification]

Mapping chromatin accessibility
[Figure panels: DNase I, FAIRE]
From Furey (2012) Nat Rev Genet 13:840

DNase I hypersensitivity identifies TF binding events
From Furey (2012) Nat Rev Genet 13:840

DNase I hypersensitivity identifies regulatory elements
[Figure label: DNase I hypersensitive sites]
Song et al., Genome Res 21:1757 (2011)

De novo TF motif discovery by DNase I hypersensitivity mapping
In human ES cells
From Neph (2012) Nature 489:83

De novo TF motif discovery by DNase I hypersensitivity mapping
Across tissue types
From Neph (2012) Nature 489:83

Capturing long-range regulatory interactions
From Visel et al. (2009) Nature 461:199

Chromosome Conformation Capture Methods
• ChIP for specific factors, then sequence: ChIA-PET
• Sequence: Hi-C

Long-range regulatory interactions mediated by specific factors: RNA PolII
From Kieffer-Kwon et al. (2013) Cell 155:1507

Long-range regulatory interactions mediated by specific factors: Cohesin
Int – Intergenic or intronic
Pr – Promoter
Ex – Exonic
From DeMare et al. (2013) Genome Res. 23:1224

Summary
• Relevant overview papers on ChIP-seq and DNase-seq posted on class wiki
• Wednesday: Epigenetics and the histone code
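
Worked sketch for "Assessing statistical significance"

The Poisson test and the empirical FDR from the significance slides can be illustrated with a few lines of Python. This is a minimal sketch, not how any particular peak caller works. The function names, the fixed-window layout, the `background_lam` floor and the depth scaling (input counts multiplied by the ratio of total ChIP reads to total input reads) are all assumptions made for illustration, and the numbers in the demo are invented toy values.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), with lam > 0.

    The upper tail is summed directly, starting at the pmf of k, so that
    very small p-values are not lost to rounding in 1 - CDF.
    """
    if k <= 0:
        return 1.0
    # pmf(k) computed in log space to avoid overflow in lam**k / k!
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Add successive pmf terms until they no longer change the sum.
    while term > 0.0 and term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def call_enriched_windows(chip_counts, input_counts, chip_total, input_total,
                          background_lam=1.0, alpha=1e-5):
    """Return (window index, p-value) for windows enriched over input.

    Assumption: the expected ChIP count in a window is the input count
    rescaled for sequencing depth, with a floor of background_lam so that
    windows with no input reads do not get an expected count of zero.
    """
    scale = chip_total / input_total
    called = []
    for idx, (k, c) in enumerate(zip(chip_counts, input_counts)):
        lam = max(c * scale, background_lam)
        p = poisson_upper_tail(k, lam)
        if p < alpha:
            called.append((idx, p))
    return called


def empirical_fdr(chip_scores, swapped_scores, threshold):
    """Empirical FDR at a given enrichment threshold.

    chip_scores: enrichment of peaks called in ChIP (input as control).
    swapped_scores: enrichment of peaks called in input (ChIP as control).
    FDR = (# swapped peaks >= threshold) / (# ChIP peaks >= threshold).
    """
    n_chip = sum(1 for s in chip_scores if s >= threshold)
    n_swapped = sum(1 for s in swapped_scores if s >= threshold)
    if n_chip == 0:
        return float("nan")  # undefined when no ChIP peaks pass
    return min(1.0, n_swapped / n_chip)


if __name__ == "__main__":
    # Toy data (invented): read counts in four windows.
    chip = [50, 6, 3, 40]
    inp = [4, 5, 2, 30]
    hits = call_enriched_windows(chip, inp, chip_total=2_000_000,
                                 input_total=1_000_000, background_lam=2.0)
    for idx, p in hits:
        print(f"window {idx}: p = {p:.3g}")

    # Toy enrichment scores (invented) for the swap-based FDR.
    fdr = empirical_fdr([12.0, 9.5, 8.0, 6.0, 5.5], [6.2, 5.1], threshold=6.0)
    print(f"empirical FDR at enrichment >= 6.0: {fdr:.2f}")
```

Rescaling the input counts before testing is one simple way to account for the point that sequencing depth matters: without it, a deeper ChIP library would look enriched everywhere. Raising the enrichment threshold in `empirical_fdr` trades fewer called peaks for a lower estimated false discovery rate.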