Epigenomics: A Practical Guide
Benjamin Rodriguez, PhD
Wei Li Lab, Baylor College of Medicine
Molecular Biology Refresher Course with Bioinformatics
August 30th 2013

Software, Sites, Materials
Course Materials: http://dldcc-web.brc.bcm.edu/lilab/benji/MBRB_2013/index.html
I will upload the most up-to-date slides for all three of my lectures
Browsers:
http://genome.ucsc.edu/
http://epigenomegateway.wustl.edu/
Web-based analysis:
http://bejerano.stanford.edu/great/public/html/
http://david.abcc.ncifcrf.gov

Outline
• DNA methylation
• Histone modifications
• DNase hypersensitivity
• Aberrant methylation in cancer
• Epigenetic inheritance in development and disease

DNA is Packaged in Chromatin
• Chromatin consists of nucleosomes: DNA wrapped around histone proteins
[Figure labels: nucleosome, histone, DNA, chromatin]
• Chromatin organizes genes to be accessible for transcription, replication, and repair

CpG Islands and Promoters
• Although C and G together constitute 42% of the human genome, fewer than 1% of dinucleotides are CpG
– Less than ¼ of the expected frequency of about 0.04 (0.21 × 0.21)
• A ‘CpG island’ is a run of “CpG-rich” sequence
– Minimum 200 bp in length
– GC content > 50%
– Observed : Expected CpG ratio > 0.6
– This definition is not precise
• Many CpG islands occur within promoters

DNA Methylation
• Methylation at CpG islands often represses nearby gene expression
• Many highly expressed genes have CpG methylation on their exons
• Some genes are imprinted, so maternal and paternal copies have different DNA methylation
• In embryonic stem cells, there is also CHG methylation
• Recently, another type of DNA modification, hydroxymethylation (hmC), has been found

Epigenetic Mechanisms: DNA Methylation
[Figure: CpG island and CpG sites in a normal cell. C: cytosine; mC: methylcytosine]

DNA Methylation and Gene Silencing
[Figure: CpG island methylation in a normal cell vs. a cancer cell. C: cytosine; mC: methylcytosine]

DNA Methylation and Regulation
• Cytosine methylation blocks DNA-binding proteins’ access to
regulatory sites and creates binding sites for repressive proteins
• Methylation often follows a decrease in site use
From Thurman et al., Nature 2012

Methylation and Expression
• Some genes (e.g. HOXB13 in breast cancer) show a strong correlation of promoter methylation with expression
R² = 0.7817, P < 0.0005
From Rodriguez et al., Carcinogenesis 2008

Methylation, Retroviruses and Repeats
• Bacteria use DNA methylation to limit invasive DNA from viruses
• A large fraction of the human genome consists of the carcasses of retroviruses and transposons
• Almost all DNA repeats are heavily methylated
• If they lose methylation, they are more likely to be expressed

DNA Methylation and Development
• Almost all DNA is de-methylated in the embryo
• Increasing methylation at various times during fetal development restricts functionality
– This is why cloning is difficult
• Wave of methylation in adolescence
• Gradual de-methylation in old age

DNA Methylation and Inheritance
• Most DNA is de-methylated during gametogenesis and embryogenesis
• Methylation persists in some DNA regions
• Humans and mice show epigenetic inheritance apparently mediated by DNA methylation

Agouti Mice and DNA Methylation

Genomic Imprinting
• Epigenetic mechanism of transcriptional regulation
• Expression of a subset of mammalian genes is restricted to one parental allele
[Figure labels: maternal allele, paternal allele]

Genomic Imprinting
• Parental chromosomes are differentially marked by DNA methylation

Genomic Imprinting
• Imprinting is regulated by cis-acting elements (Imprinting Control Regions) and non-coding RNAs
• Imprinting Control Regions act over long distances and control the imprinting of multiple genes
• We will examine a recent study of the IGF2 DMR in individuals exposed to famine in utero

Epigenetic Mechanisms: Post-Translational Modification to Histones
Histone Acetylation
Histone Methylation
[Figure labels: Ac, Me]
• Epigenetic modifications of histones include histone acetylation and methylation

Histone Modifications
• Different modifications at different locations by different enzymes
• Potential temporal and spatial specificity

Histone Modifications
• Gene body mark: H3K36me3, H3K79me3
• Active promoter (TSS) mark: H3K4me3
• Active enhancer (TF binding) mark: H3K4me1, H3K27ac
• Both enhancers and promoters: H3K4me2, H3/H4ac, H2AZ
• Repressive promoter mark: H3K27me3
• Repressive mark for DNA methylation: H3K9me3

Genes, regulatory DNA, and epigenetic features
Graphic from NIH Roadmap Epigenomics site

Genes, regulatory DNA, and epigenetic features
DNaseI:
- promoters
- enhancers
- silencers
- insulators
- etc.

DNase Hypersensitive (HS) Mapping
• DNase cuts the genome at random, but more often in open chromatin regions
• Select short fragments (two nearby cuts) to sequence
• Map to active promoters and enhancers

DNaseI hypersensitive sites mark regulatory DNA
• DNaseI hypersensitive sites (DHSs) mark promoters and enhancers
• ~100,000 – 250,000 DHSs per cell type (0.5-1.5% of genome)
genome.ucsc.edu
www.epigenomebrowser.org

Epigenetic Modifications to Histones and DNA Can Cooperate to Silence Gene Expression
[Figure labels: HDAC, DNMT, HMT, Me, Ac, gene expression]
• Coordinated activities of chromatin-modifying enzymes lead to condensation of chromatin and inhibition of gene expression

Roles in Normal Development and Cancer
[Figure labels: normal epigenetic mechanisms (progenitor cell, differentiated cells); deregulated epigenetic mechanisms (malignant progenitor cell, tumor)]
• Regulation of genes involved in differentiation, cell cycle, and cell survival
• Through epigenetic silencing of certain genes, affected cells may acquire new phenotypes which
promote tumorigenesis

HOXB13 hypermethylation in breast cancer cells
R² = 0.7817, P < 0.0005
• Strong inverse association between promoter CpG island hypermethylation and HOXB13 gene expression
From Rodriguez et al., Carcinogenesis 2008

HOXB13 hypermethylation in breast cancer cells
• Bisulfite sequencing (Sanger, clone-based, very laborious)
From Rodriguez et al., Carcinogenesis 2008

Inhibition of DNA methyltransferase activity restores expression of HOXB13
From Rodriguez et al., Carcinogenesis 2008

HOXB13 hypermethylation strongly associates with patient ER status (OR = 3.75, 95% CI 1.41-9.96; P = 0.008)
[Figure labels: Patient ER Status; Paired Tumor and Adjacent Normal Tissues]
From Rodriguez et al., Carcinogenesis 2008

HOXB13 hypermethylation associates with poor disease-free survival in ERα-positive patients
From Rodriguez et al., Carcinogenesis 2008

Epigenetic inheritance and human development
• Epidemiologic studies suggest adult disease risk is associated with adverse environmental conditions early in development
• Involvement of epigenetic dysregulation has been hypothesized
• Can early-life environmental conditions cause epigenetic changes in humans that persist throughout life? Is there a role for clinical intervention?
1. Periconceptional exposure to famine
2. Offspring born before vs.
after maternal gastrointestinal bypass surgery

Persistent epigenetic differences associated with prenatal exposure to famine in humans
• Individuals who were prenatally exposed to famine during the Dutch Hunger Winter in 1944–45 had, six decades later, less DNA methylation of the imprinted IGF2 gene than their unexposed, same-sex siblings
• The association was specific for periconceptional exposure, reinforcing that very early mammalian development is a crucial period for establishing and maintaining epigenetic marks
Heijmans et al., PNAS 2008

Insulin-like growth factor II (IGF2)
• One of the best-characterized epigenetically regulated loci
• Key factor in human growth and development
• Maternally imprinted
• Imprinting is maintained through the IGF2 differentially methylated region (DMR)
• Hypomethylation of the DMR leads to bi-allelic expression of IGF2
• IGF2 DMR methylation is a normally distributed quantitative trait largely determined by genetic factors
• The methylation mark is stable up to middle age
• If affected by environmental conditions early in human development, altered methylation may be detected many years later

Difference in IGF2 DMR methylation between individuals prenatally exposed to famine and their same-sex sibling
• Fig. A displays the difference in IGF2 DMR methylation within sibships according to the estimated conception date of the famine-exposed individual
• IGF2 DMR methylation was lower in the famine-exposed individual in 72% (43/60) of sibships; this lower methylation was observed in conceptions across the famine period.
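
To get a feel for how unusual 43 of 60 sibships is, a simple sign test can be worked by hand or in a few lines of Python. This is only an illustrative sketch: it assumes that, with no famine effect, each sibling is equally likely to have the lower methylation value, and it is not presented as the analysis used in the study. The function name `sign_test_two_sided` is hypothetical.

```python
from math import comb


def sign_test_two_sided(k, n):
    """Exact two-sided sign-test p-value for k 'successes' out of n pairs.

    Assumption (for illustration only): under the null hypothesis each
    sibling in a pair is equally likely to have the lower methylation.
    """
    tail = max(k, n - k)  # count on the more extreme side
    # Probability of a count at least this extreme in one tail of Binomial(n, 0.5)
    upper = sum(comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail (capped at 1)
    return min(1.0, 2 * upper)


# 43 of 60 sibships had lower methylation in the famine-exposed sibling
p_value = sign_test_two_sided(43, 60)
print(f"43 of 60 sibships: two-sided sign-test p = {p_value:.2g}")
```

A sign test ignores the size of each within-pair difference, so it is a conservative first look rather than a replacement for a paired analysis of the methylation values.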
IGF2 DMR methylation among individuals periconceptionally exposed to famine and their unexposed, same-sex siblings

IGF2 DMR methylation among individuals exposed to famine late in gestation and their unexposed, same-sex siblings
• 62 individuals were exposed to famine late in gestation for at least 10 weeks; they were born in or shortly after the famine
• No difference in IGF2 DMR methylation between the exposed individuals and their unexposed siblings

Timing of famine exposure during gestation and IGF2 DMR methylation
• Periconceptional and late exposure groups, plus 122 controls
• Periconceptional exposure is associated with lower methylation
• Statistically significant association between timing and exposure

Differential methylation in offspring born before versus after maternal gastrointestinal bypass surgery
• Obesity during pregnancy affects fetal programming of adult disease
• Children born after maternal surgery (AMS) are less obese and exhibit improved cardiometabolic risk profiles carried into adulthood
• Aim: analyze the impact of maternal weight loss surgery on methylation levels in offspring born before (BMS) and after (AMS) maternal surgery
• Statistically significant correlations between gene methylation levels and gene expression and plasma markers of insulin resistance
• Effective treatment of a maternal phenotype is durably detectable in the methylome and transcriptome of subsequent offspring
Guenard et al., PNAS 2013

Offspring born before vs. after maternal gastrointestinal bypass surgery
BMS offspring:
– Higher weight, height, and waist and hip girth (P < 0.05)
AMS offspring:
– Lower body fat % (P = 0.07)
– Improved fasting insulin levels (P = 0.03)
– Improved homeostatic model assessment of insulin resistance (HOMA-IR) index (P = 0.03)
– Lower blood pressure (P < 0.05)

Differential methylation analysis of offspring born before vs.
after maternal gastrointestinal bypass surgery
• 14,466 CpG sites (2.9% of sites analyzed) exhibited significant differences
• These corresponded to 5,698 unique genes
• Significant biological functions related to autoimmune disease, pancreas disorders, diabetes mellitus, and disorders of glucose metabolism

Any questions? On to the Laboratory!

Laboratory Exercises
• We will work with the significant differentially methylated CpG sites published as supporting data from the Guenard study
• The MGBS_study.xlsx and AMS.probes.bed files are available from the class web site
• We will perform our own mapping and significance testing of the CpG sites (in relation to genes) using GREAT
• We will analyze the published gene list and our custom gene list in DAVID
• Finally, we will analyze last week’s gene list (from the MLL-AF9 fusion protein study) in DAVID

Do hyper- and hypo-methylated sites in AMS offspring have different distributions?
• Open MGBS_study.xlsx and examine the “DMC list” worksheet
• The study used a poorly described algorithm, DiffScore, to assess statistical differences and to rank CpG sites
• It also applied a loose threshold for the change cutoff
• In Excel, we can easily compute summary stats
• The average and standard deviation of the delta beta values are quite similar

Mapping significant CpG sites to genes with GREAT
• Under species assembly, choose human GRCh37
• Under test regions, upload the BED file AMS.probes.bed
• Set background regions to whole genome
• Choose submit

Compare gene lists from publication to those obtained by GREAT
http://jura.wi.mit.edu/cgi-bin/bioc/tools/compare.cgi
Choose “compare 2 lists”, paste the lists of genes, press submit
• GREAT recovers 170 of 198 genes from the publication (present in both the AMS and GREAT lists)
• GREAT identifies 170 additional genes (because by default it searches a wider space of genomic distances)
• The missing 28 genes may result from gene name synonyms

Mapping significant CpG sites to genes with GREAT
On “Region-Gene Association Graphs”
• We see 3/4 of the
CpG sites are assigned two genes
• Orientation and distance to TSS: the upstream distribution is fairly flat, but there is a spike in predictions when the distance is > 5 kb from the TSS
• Could that be the reason we don’t see any significance test results? Let us find out

Modifying the genomic region search range in GREAT
• Open the “Association rule settings” dialog box
• Change downstream to 5 kb and distal to 5 kb
• Resubmit the job

What happened with a smaller genomic search interval?
• We only returned 64 genes! Crap.
• But we did finally return a single significant test result: InterPro (protein sequence analysis and classification)

Functional enrichment analyses with DAVID
• With GREAT, we were able to identify the majority of genes published in the original study
• We do not have sufficient information to repeat the study’s original analyses
• We can use DAVID to analyze the study gene list and our gene list from GREAT
Open http://david.abcc.ncifcrf.gov and choose “Start Analysis”

Functional enrichment analyses with DAVID
• Open the “Upload Gene List” dialog box
• Copy and paste the list from the MGBS_study.xlsx worksheet “Study Genes”
• On “Select Identifier”, choose “Official Gene Symbol” and choose “Gene List” on “List Type”
• Then Submit List

Functional enrichment analyses with DAVID
• For species, highlight Homo sapiens and click “Select Species”
• Rename the list
• Choose “Functional Annotation Tool”

Functional enrichment analyses with DAVID
• Each Annotation Category on the left can be expanded to reveal a number of optional databases to query
• This allows for powerful customization
• For this exercise, we will accept the default options
• Choose “Functional Annotation Chart”

Functional enrichment analyses with DAVID
• Functional Annotation Chart fields are: category, term, related term (RT), genes, count, percentage, p-value (univariate, modified Fisher’s exact test), and Benjamini p-value (correction for multiple testing)
• Fields with arrows can be sorted
• Shown above are
the first three results, the only ones to pass multiple-testing correction
• They reference the same group of genes

Functional enrichment analyses with DAVID
• Clicking on the link for the term “Pleckstrin homology” opens the corresponding entry at InterPro
• Proteins containing this domain can bind to and interact with membrane-bound proteins, potentially mediating various signal transduction pathways in the cell

Functional enrichment analyses with DAVID
Let’s now perform the analysis with our list of differentially methylated genes obtained via GREAT
• Open the “Upload Gene List” dialog box
• Copy and paste the list from the MGBS_study.xlsx worksheet “Great Analysis”
• On “Select Identifier”, choose “Official Gene Symbol” and choose “Gene List” on “List Type”
• Submit List, choose “Homo sapiens”
• Select “Functional Annotation Chart”
Note: Entrez Gene IDs are a preferred way to search for gene functions. They account for the fact that a gene may go by several different names

Functional enrichment analyses with DAVID
• We see the same first three results as before, but now they do not pass multiple-testing correction
• Why? One explanation: we introduced “noisy” genes with GREAT
• Why did we not see any significant biological functions related to autoimmune disease, pancreas disorders, diabetes mellitus, or disorders of glucose metabolism?
1. The study authors gave us only a small piece of the data they likely used
2. Methodological issues
3. The commercial IPA (Ingenuity Pathway Analysis) tool is very different from publicly curated databases and search tools

Functional enrichment analyses with DAVID
Finally, let’s analyze the list of genes from last week’s MLL-AF9 fusion gene study.
The file “MLL-AF9_promoters.bed” is available from the course website
• Open the “Upload Gene List” dialog box
• Open the BED file in Excel, then copy and paste the fifth column into DAVID
• On “Select Identifier”, choose “Entrez Gene ID” and choose “Gene List” on “List Type”
• Submit List, choose “Mus musculus”
• Select “Functional Annotation Chart”
Note: Entrez Gene IDs are a preferred way to search for gene functions. They account for the fact that a gene may go by several different names

Functional enrichment analyses with DAVID
• Jackpot! We have dozens of highly enriched terms for the genes bound by oncogenic MLL-AF9 in mouse leukemia stem cells
• Enriched functions include transcription regulation and cell cycle
• More than 40% of targets are phosphoproteins

Laboratory Summary
• The Guenard study was not very fruitful, so to speak
• I have some issues with their methodology
• Sharing only limited (published) data is poor practice
• DNA methylation data are difficult to interpret
• GREAT and DAVID are powerful tools for functional enrichment analyses of genome-wide studies
• With the right tools and a little patience, you can make novel discoveries and draw meaningful biological interpretations from genomics datasets
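
The gene-list comparison step of the lab (published genes vs. genes returned by GREAT) was done with a web tool, but the same bookkeeping can be sketched in a few lines of Python. This is a minimal illustration, not the web tool's implementation: the function name `compare_gene_lists` is hypothetical, the gene symbols are placeholders rather than the study's lists, and matching is assumed to be by exact symbol after trimming whitespace and ignoring case.

```python
def compare_gene_lists(published, recovered):
    """Split two gene-symbol lists into shared and list-specific symbols.

    Assumption: symbols match if they are identical after stripping
    whitespace and upper-casing. Synonyms are NOT resolved.
    """
    def normalize(genes):
        return {g.strip().upper() for g in genes if g.strip()}

    pub, rec = normalize(published), normalize(recovered)
    return {
        "shared": sorted(pub & rec),          # in both lists
        "only_published": sorted(pub - rec),  # missed by the second list
        "only_recovered": sorted(rec - pub),  # additional genes in the second list
    }


# Placeholder symbols for illustration only (not the study's gene lists)
published_genes = ["IGF2", "HOXB13", "TP53", "Brca1 "]
great_genes = ["igf2", "TP53", "MYC"]

result = compare_gene_lists(published_genes, great_genes)
for label, genes in result.items():
    print(label, len(genes), genes)
```

Because synonyms are not resolved, a gene listed under two different names will show up as missing from one list and additional in the other, which is one reason Entrez Gene IDs are preferred over symbols.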