How can so many different cell types arise from just one genome?

Brad Bernstein
MGH Pathology, Center for Cancer Research & Center for Systems Biology
Massachusetts General Hospital
Harvard Medical School

Parsing 3 billion bases of the human genome

ctggaggtgcaatggctgtcttgtcctggccttggacatgg
gctgaaatactgggttcacccatatctaggactctagacgg
gtgggtaagcaagaactgaggagtggccccagaaataattg
gcacacgaacattcaatggatgttttaggctctccagagga
tggctgagtgggctgtaaggacaggccgagagggtgcagtg
ccaacaggctttgtggtgcgatggggcatccgagcaactgg
tttgtgaggtgtccggtgacccaaggcaggggtgagaggac
cttgaaggttgaaaatgaaggcctcctggggtcccgtccta
agggttgtcctgtccagacgtccccaacctccgtctggaag
acacaggcagatagcgctcgcctcagtttctcccaccccca
cagctctgctcctccacccacccagggggcggggccagagg
tcaaggctagagggtgggattggggagggagaggtgaaacc
gtccctagGTGAGCCGTCTTTCCACCAGGCCCCCGGCTCGG
GGTGCCCACCTTCCCCATGGCTGGACACCTGGCTTCA

• Protein-coding genes account for just 1% of the human genome
• How do we find the genome's other working parts?
• How do these parts interact to give rise to >200 different cell types in the human?

One genome, many cell types

Diagram of stem cell division and differentiation.
A - stem cell
B - progenitor cell
C - differentiated cell
1 - symmetric stem cell division
2 - asymmetric stem cell division
3 - progenitor division
4 - terminal differentiation
http://en.wikipedia.org/wiki/File:Stem_cell_division_and_differentiation.svg
http://en.wikipedia.org/wiki/File:Stem_cells_diagram.png

Cell type-specific gene expression

Figure: Development & lineage specification; cell type-specific gene expression programs (Cell Type A, Cell Type B; Gene X, Gene Y, Gene Z, ..., Gene AA, ..., Gene ZZ); chromatin structure & the epigenome.
http://en.wikipedia.org/wiki/File:Epigenetic_mechanisms.jpg

Cell type-specific gene expression

Cell differentiation can be controlled at many levels of gene expression
http://openi.nlm.nih.gov/detailedresult.php?img=3110863_ejn0033-1563-f1&req=4

Genomic DNA in the nucleus is packaged into chromatin

Two meters of DNA in a nucleus smaller than the head of a pin
http://en.wikipedia.org/wiki/File:DNA_to_Chromatin_Formation.jpg#file

Chromatin packaging affects the accessibility of genes to the gene expression machinery

Figure labels: nucleosome, DNA
• Closed DNA, unreadable genes: DNA can be tightly wound and closely packed around nucleosomes, blocking gene expression
• Open DNA, readable genes: other parts of the genome are less tightly wound, allowing the gene expression machinery more access to genes

A variety of chromatin modifications can affect gene expression

Some modifications recruit factors that stop gene expression, while others open up chromatin to allow gene expression
Figure labels: chromatin binding factors, histone modifications

Chemical modifications ('tags') in chromatin

• DNA methylation
• Histone modifications
http://en.wikipedia.org/wiki/File:Epigenetic_mechanisms.jpg

Chemical tags are added and removed by enzymes

Histone modifications: acetyl groups
• Histone acetylase adds acetyl groups: chromatin decondenses, transcription activated
• Histone deacetylase removes acetyl groups: chromatin compacts, transcription repressed

Chemical tags recruit proteins that turn genes 'on' or 'off'

Figure labels: "On", "Off"
Polycomb compaction (Francis et al, Science 2004)

The field of epigenetics seeks to understand non-genetic changes that influence gene activity and cell behavior

• These changes may include modifications of DNA, but also encompass many other entities
• Changes are remembered across a lifetime and may, rarely, be inherited across generations

X-chromosome inactivation

Figure labels: Barr body (inactive Xi), calico cat
http://en.wikipedia.org/wiki/File:Sd4hi-unten-crop.jpg
http://en.wikipedia.org/wiki/File:Calico_cat_-_Phoebe.jpg

Epigenomics of human disease

• Cancer is a genetic and epigenetic disease
• Aberrant DNA methylation is a hallmark
• Mutations of many genes with epigenetic functions
• Neuropsychiatric, metabolic, developmental disorders
• Long-term health consequences of early environmental exposures may be mediated through epigenomic changes

Genome-wide maps of epigenomic features

Next-generation sequencing has transformed epigenomics research
• RNA-seq (RNA)
• Whole-genome bisulfite sequencing
• DNase-seq
• ChIP-seq
http://en.wikipedia.org/wiki/File:DNA_to_Chromatin_Formation.jpg#file

Genome-wide chromatin state maps

ChIP-seq
1 - Fixed cells/tissue
2 - Enrich chromatin with modified histones (histone 3 lysine 4 methyl antibody)
3 - Deep sequence the enriched DNA
4 - Histone modification map
Mikkelsen et al, Nature 2007

Chromatin structure is dynamic in development

Figure: the PAX5 CpG island in ES cells, hematopoietic progenitors (HSCs, CD34) and B-cells; chromatin states labeled 'poised' and 'active'.

Bivalent chromatin domains

Some chromatin contains both activating and repressing epigenetic modifications in the same areas, affecting the accessibility of chromatin to RNA polymerase.
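
The bivalent-domain idea above can be sketched in a few lines of Python. This is only an illustration, not the analysis used in the cited studies: it assumes each promoter already has one enrichment score for an activating mark (H3K4me3) and one for a repressive mark (H3K27me3), and it calls a mark "present" when the score reaches a cutoff. The class name, gene names, scores and the cutoff of 2.0 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoterSignal:
    """Hypothetical per-promoter ChIP-seq enrichment scores."""
    gene: str
    h3k4me3: float   # activating mark (assumed score, arbitrary units)
    h3k27me3: float  # repressive mark (assumed score, arbitrary units)


def classify(promoter: PromoterSignal, cutoff: float = 2.0) -> str:
    """Label a promoter by which marks reach the (assumed) cutoff."""
    has_activating = promoter.h3k4me3 >= cutoff
    has_repressive = promoter.h3k27me3 >= cutoff
    if has_activating and has_repressive:
        return "bivalent (poised)"
    if has_activating:
        return "active"
    if has_repressive:
        return "repressed"
    return "unmarked"


# Made-up example values, for illustration only.
examples = [
    PromoterSignal("GENE_A", h3k4me3=5.0, h3k27me3=4.0),
    PromoterSignal("GENE_B", h3k4me3=6.5, h3k27me3=0.3),
    PromoterSignal("GENE_C", h3k4me3=0.2, h3k27me3=3.8),
    PromoterSignal("GENE_D", h3k4me3=0.1, h3k27me3=0.4),
]

for promoter in examples:
    print(promoter.gene, classify(promoter))
```

A single fixed cutoff is the simplest possible rule; real chromatin state maps are built from sequencing read densities and would need a proper statistical model to decide where a mark is present.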
http://openi.nlm.nih.gov/imgs/rescaled512/2634711_6604771f1.png

Systematic detection of DNA elements and their cell-type specificities

Determinants of cellular state

Figure labels: cell type, DNA, TF, silenced locus
ES cell
• Endoderm: Liver, Lung, Pancreas
• Mesoderm: Blood, Heart, Skeletal Muscle
• Ectoderm: CNS, Skin

Enhancers and their regulators mediate tissue-specific gene expression programs

Figure labels: enhancer (E), promoter (P), gene (e.g., Sonic hedgehog), transcription factor (TF)
• Yet conventional approaches have yielded few examples in human.
• How do we parse 3 Gb of DNA to identify enhancers systematically?

Chromatin maps identify genome regulatory elements

Figure labels: MLL, bivalent domains, EZH2, regulatory elements
Meissner et al, Nature 2008

Incorporating enhancers into regulatory networks

Human genome contains ~1 million enhancers
Figure labels: TF1 (+), TF2 (-), TF3, enhancer (E)

Enhancer elements exhibit high cell type-specificity

Figure: enhancer clusters and proximal genes; lymphoblast cell-specific, hepatic cell-specific, endothelial cell-specific

Functional annotations and regulatory predictions for GWAS

Most disease-associated SNPs are non-coding

>900 GWAS for 165 phenotypes (>3400 associated SNPs)
GWAS variants by category (Coding, Promoter, Intron, Intergenic): 7%, 2%, 47%, 44%
Hindorff et al: A Catalog of Published Genome-Wide Association Studies. http://www.genome.gov/gwastudies/

Disease variants (SNPs) are enriched within enhancer states

GWAS SNPs are enriched within enhancers that are active in related cell types
• Erythrocyte phenotypes: Erythroid (K562)
• Blood lipids: Hepatoma cells
• Rheumatoid arthritis: Lymphoblastoids
• Primary biliary cirrhosis: Lymphoblastoids
• Systemic lupus erythematosus: Lymphoblastoids
• Lipoprotein cholesterol/triglycer.: Hepatoma cells
• Haematological traits: Erythroid (K562)
• Haematological parameters: Erythroid (K562)
• Colorectal cancer: Hepatoma cells
• Blood pressure: Erythroid (K562)

Gene activity and human disease

• Human genome contains ~1 million enhancers that act like switches to control the activity of individual genes
• Human diseases such as diabetes are complex traits influenced by many sites in our genome (unlike Mendelian disorders)
• Genetic studies have identified thousands of DNA sites that vary between individuals and influence disease risk
• These sites of disease risk coincide with enhancers. This suggests complex diseases may be caused by defects in many switches…

Chromatin state transition upon developmental specification

• ES cells: repressive H3K27me3 confined to 'poised' promoters
• Differentiated cells: ~70% of the genome sequestered by compact/repressive chromatin
Polycomb compaction (Francis et al, Science 2004)

The global organization of DNA is dynamic

Acknowledgements

MGH lab: Oren Ram, Alon Goren, Esther Rheinbay, Mario Suva, Mazhar Adli, Shawn Gillespie, Birgit Knoechel, Richard Koche, Manching Ku, Eric Mendenhall, Rusty Ryan, Kaylyn Williamson, Vicky Zhou, Jiang Zhu, James Zou
Broad Institute: Chuck Epstein, Noam Shoresh, Tim Durham, Robbyn Issner, Xiaolan Zhang, Nir Yosef, Ido Amit, Aviv Regev
MIT Computer Science: Manolis Kellis, Jason Ernst, Pouya Kheradpour, Luke Ward
MGH collaborators: Miguel Rivera, Hiro Wakimoto, Samuel Rabkin, David Louis, Andrew Chi