Towards the Human Methylome

Megan Hitchins, PhD
Head, Medical Epigenetics Laboratory
Senior Lecturer, University of New South Wales
Lowy Cancer Research Centre

The epigenetic code
• Genetic code - 3 x 10^9 bases of genome
• Epigenetic code - superimposed on the genetic code
• Provides functional relevance by influencing transcriptional activity
• Conveyed by the composition of modifications on the DNA backbone:
  - Covalent cytosine methylation
  - Covalent modifications to amino acids of histone tails
  - Variants of histone proteins
  - Chromatin remodelling enzymes
  - Small and non-coding RNAs

Epigenetic programming
[Figure: germ cells. Sperm: 23 chromosomes (22 autosomes + X or Y). Oocyte: 23 chromosomes (22 autosomes + X).]
Germ cell epigenetic profile
• Oocyte: unknown
• Sperm: protamines
1. Paternal pronucleus is demethylated within zygote
2. Preimplantation embryo
3. Inner cell mass (ES cells)
4. Embryonic cell lineages
5. Somatic differentiation
6. Germline
Same genetic code within all somatic cells; different epigenetic code

The epigenome(s)
• The totality of all epigenetic modifications within a given cell type that influence gene activity
  - Uniform genetic information across all nucleated cells within an individual (except somatically acquired genetic mosaicism & recombinational events in immune cells)
  - Unique epigenome per cell type (>250 normal human epigenomes), providing it with its cell identity
• Altered epigenomes in disease states

Epigenetic modifications - architecture of the epigenome
[Figure: conglomeration of histone tail modifications on the histone H3 tail (methyl, M, and acetyl, Ac, marks at lysines including positions 4, 9 and 27; active and repressive states); nucleosome composition and position on helix; chemical structures of cytosine, methylcytosine (CH3 at carbon 5) and hydroxymethylcytosine (CH2OH at carbon 5).]

The Methylome
• The collective profile of cytosine methylation across the entire genome
• Methylation in humans occurs
predominantly at 5'-CpG-3' dinucleotides (mCpG)
• 28.6 million CpG sites in the human genome, of which 7% occur within CpG islands
• Non-CpG methylation has been discovered in human ES cells, including mCHG & mCHH (where H = A/C/T)

Detection of methylcytosine by "sodium bisulphite sequencing"
Genomic DNA:                          A CAT CACGT GATTACA
1. Sodium bisulphite treatment
   Unmethylated:                      A UAT UAUGT GATTAUA
   Methylated (mCpG):                 A UAT UACGT GATTAUA
2. PCR amplification
   Unmethylated:                      A TAT TATGT GATTATA
   Methylated (mCpG):                 A TAT TACGT GATTATA
   Non-CpG methylation:               A TAT CACGT GATTATA   (m on both cytosines of CACGT)
3. Sequence (single locus or whole-genome)

Detection of regions with dense methylcytosine by MeDIP-Seq
Genomic DNA (the cytosine of the CpG is methylated, m, on each strand):
5'-A CAT CACGT GATTACA-3'
3'-T GTA GTGCA CTAATGT-5'
1. Sonicate
2. Denature
3. Affinity purify with antibody
4. Deep-sequence DNA enriched with mC

Methods for deciphering a methylome
• BS-seq/MethylC-Seq: interrogates methylation status at every cytosine across the entire genome to single-base resolution; most comprehensive
• MeDIP-seq: isolates regions with a high density of methylated CpGs; detection of differential methylation, e.g. normal versus cancer

Study designs to determine the epigenetic basis for disease in humans
• Cancer: Compare neoplastic versus adjacent normal tissues and precursor lesions / intermediates
• Complex disease:
  1. Case versus control from the same tissue type
  2. Identical twins with discordant phenotypes
• Environmental factors:
  1. Identical twins with similar versus diverse lifestyles
  2. Longitudinal epigenetic-epidemiological study

Deciphering the human methylomes
Whole-genome bisulphite sequencing (BS-Seq):
• Human ES cells
• Fetal lung fibroblasts
Lister et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature (2009); 462: 315-322
• Peripheral blood mononuclear cells (PBMC)
Li et al. The DNA methylome of human peripheral blood mononuclear cells.
PLoS Biology (2010); 8: e1000533

• Similar mC levels
• Predominantly mCpG
• ES cells > significant non-CpG methylation; antisense strand
• Differentiated cells > none
• Lost upon ES cell differentiation
• Gain upon iPS
• Epigenetic marker of pluripotency?
• Subtelomeric regions highly methylated
Lister et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature (2009); 462: 315-322

Differential methylation patterns > differentiate cell identity
Lister et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature (2009); 462: 315-322

Consistent methylation pattern across protein-coding genes
[Figure: methylation level across gene regions; x-axis: Upstream, Exon 1, Intron 1, Exons, Introns, Last exon, Downstream.]
Li et al. The DNA methylome of human peripheral blood mononuclear cells. PLoS Biology (2010); 8: e1000533

PBMC: Haploid differentially methylated regions
• "YH" genome sequenced (1st Asian)
• "YH" methylome - same individual & same sample
• Integration of genetic & epigenetic data (SNP-methylation tags)
Genetic variation > epigenetic variation > variable gene expression
[Figure labels: SNP (G/A alleles), transcription factor (TF), allele-specific methylation, allele-specific expression, exonic SNP.]
Li et al. PLoS Biology (2010); 8: e1000533

Do epigenetic differences in monozygotic twins underlie phenotypic discordance?
• Genetically identical (MZ)
• Phenotypic discordances observed for imprinting disorders and common "multifactorial" diseases
• Epigenetic basis?

Epigenetic divergence in MZ twin pairs (traditional epigenetic methods)
• Epigenetic profile of 40 MZ twin pairs aged 3-74 years
• Similar epigenetic profiles at age 3y
• Numerous epigenetic differences resulting in altered gene expression between co-twins at 50y
• More differences between co-twins that had spent less time living together and had differing lifestyles
• Epigenetic diversity increased with both age & lifestyle differences; implicates the interaction between environmental factors and epigenetics
Fraga et al., Esteller.
Epigenetic differences arise during the lifetime of MZ twins. PNAS (2005); 102: 10604-9

"EpiTwin" - largest methylome study
• Methylomes of 5000 twins (MZ & DZ) aged 18-85y
• Compare patterns with co-twin
• Identify differences that may underlie discordance in common (multifactorial) diseases including diabetes, obesity, atopy, cardiovascular disease, osteoporosis & longevity
• Collaboration: TwinsUK registry (King's College London) and Beijing Genomics Institute (BGI, Shenzhen, China)
• Cost - $30M
• Initiated 2010 - expected completion 2015

Cancer Methylomes
Neurofibromatosis:
• Schwann cells > benign neurofibromas > malignant peripheral nerve sheath tumours (5-10%)
• Acquired methylome by MeDIP-Seq & genome-wide CNV; integrated this with existing gene expression profiles for all 3 cell types > comparative epigenomics
• Identified a complex pattern of epigenetic changes during neoplastic progression; most occurred outside regions previously considered most important in changing gene expression during carcinogenesis
Feber et al., Beck. Comparative methylome analysis of benign and malignant peripheral nerve sheath tumours. Genome Research (2011); 21: 515-524

Key epigenetic changes in neurofibromatosis
• Satellite repeats progressively demethylated
• Clustering of MPNST and NF using expression of genes based on differential methylation patterns
• CpG island "shores"
• Non-CpG island promoters
Feber et al., Beck. Comparative methylome analysis of benign and malignant peripheral nerve sheath tumours.
Genome Research (2011); 21: 515-524

Direction and applicability to molecular/genetic pathology
• Provide the framework of the epigenome with multiple reference epigenomes from normal and disease cellular states
• Define individual epigenetic markers that are representative of specific cell or disease states
• Much whole-epigenome scale information will be superfluous: return to the study of individual loci or selected sequence types with the transition from investigative research > clinical application

Future role of the methylome in Genetic & Molecular Pathology
• Formulation of (minimal) panels of methylation biomarkers:
• Diagnostic markers for the early detection & monitoring of disease states
• Differential markers, e.g. to distinguish cancer subtypes
• Prognostic & predictive markers, e.g. drug sensitivity; survival outcomes
• Stratification of patient populations for clinical trials
• Stem cell and regenerative therapies
• Reproductive medicine

Further information
www.ihec-epigenomes.org
"EpiTwin": www.twinsuk.ac.uk/projects/epitwin.html
Jones et al. Moving AHEAD with an international human epigenome project. Nature (2008); 454: 711-715
Lister et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature (2009); 462: 315-322
Li et al. The DNA methylome of human peripheral blood mononuclear cells. PLoS Biology (2010); 8: e1000533
Feber et al., Beck. Comparative methylome analysis of benign and malignant peripheral nerve sheath tumours. Genome Research (2011); 21: 515-524

Towards defining the human methylome
Megan P. Hitchins, PhD

Definition of terms
Epigenetics: "Somatically heritable changes in gene activity that do not involve changes in the DNA base sequence".
DNA Methylation: Covalent attachment of a methyl (CH3) group at the 5-carbon position of the cytosine pyrimidine ring. Methylcytosine is sometimes referred to as the 'fifth' or 'minor' base. In humans, methylation of DNA occurs primarily at cytosines immediately preceding a guanine (mCpG).
Other forms of methylation, including hydroxymethyl (CH2OH) cytosine (hmCpG) and non-CpG methylation (mCHG and mCHH, where H is A, T or C), have been identified in human embryonic stem cells.
Methylome: The collective profile of methylation of all cytosines across the entire genome of a cell.
Epigenome: The totality of epigenetic marks across the entire genome of a cell (or the entirety of the 'epigenetic code' within a cell). This includes all modifications superimposed on the genome that collectively comprise the 'epigenetic code', including cytosine methylation, covalent modifications of histone tails, histone variants present within the nucleosome core, and small and non-coding RNAs.
Biological role: The epigenetic state of the DNA directs chromatin structure and influences gene activity. The epigenome determines cell identity, since it differs from one cell type to another, and changes dynamically during cell differentiation through "epigenetic programming".

The "International Human Epigenome Project"
Completion of the Human Genome Project a decade ago represents one of the greatest scientific achievements. It defined the sequence of all 3 x 10^9 bases, allowing us to determine gene number and structure and to identify regulatory regions embedded within the DNA sequence. Since then, sequencing of additional human genomes has led to the realisation of the extent of genetic variation (SNPs and copy number variants) amongst us. Within any one individual, the DNA sequence in all nucleated cells is essentially identical (excepting recombination events in immune cells and occasional somatically acquired genetic changes). Yet in humans there are over 250 different cell types, each displaying distinct physical and behavioural phenotypes. The activity of the genome is carefully orchestrated throughout the life-cycle by the epigenetic code that is superimposed on the genomic 'skeleton' and differs from one cell type to another.
The epigenome, comprising the genome-wide conglomerate of DNA modifications, including cytosine methylation, covalent attachments to the amino acids of histone tails, core histone protein variants, and small and non-coding RNAs, directs chromatin packaging and influences gene expression (Figure 1). There is no single methylome or epigenome: each varies between cell types and changes dynamically, for instance as stem cells become committed to a particular cell lineage and differentiate. The epigenome thus provides a framework for the functional expression of the genetic code.

Figure 1. Epigenetic mechanisms. The coding information in the base sequence of DNA is organised within a chromatin structure to form cell-specific epigenomes. DNA cytosine methylation, covalent modification of histone tails and histone protein variants contribute information to the nucleosomal remodelling machinery that influences gene repression or activation. Adapted from [1].

In recent years, it has become apparent that aberrations in the process of epigenetic programming also lead to disease states, most notably cancer and congenital disease due to disrupted genomic imprinting. Environmental factors and nutrition can also induce epigenetic alteration. Furthermore, epigenomic changes are potentially reversible through treatment with drugs that inhibit chromatin-modifying enzymes, of which histone deacetylase inhibitors and methyltransferase inhibitors have received FDA or EU approval for clinical use. In view of the importance of the epigenome in conferring cell identity, its role in disease aetiology and the technical capability we now have to define it on a genome-wide scale, it was considered timely to undertake an international effort to decode the epigenome.
The International Human Epigenome Project (IHEP) was conceived to decipher and catalogue the epigenome of various normal cell types and disease states, beginning with defining the methylome [1-2]. The scope of the IHEP is to provide high-resolution reference epigenome maps for key cellular states in human and mouse (and other model organisms), including, but not limited to, embryonic and adult stem cells, proliferative and differentiated cell types, and corresponding cells that show an altered disease state. For example, in the haematopoietic system, naive CD34+ progenitors and differentiated cells, such as leukocytes and lymphoid cells, would provide the reference for acute myelogenous leukaemia, acute lymphoblastic leukaemia, chronic myelogenous leukaemia and myelodysplastic syndrome. Another objective of the IHEP is to develop the bioinformatics infrastructure to support the curation and integration of epigenomic data over and above the genome sequence, and to provide user-friendly interfaces for free public access to the data.

The human methylome
Cytosine methylation (mC) is most readily detected following treatment of genomic DNA with sodium bisulphite, whereby unmethylated cytosines are converted to uracil, and thence to thymine following PCR amplification, whereas methylcytosines (mC) are resistant to conversion and remain as C. Thus mC may be differentiated from unmethylated cytosine on the basis of the presence of a C or a T in bisulphite-converted DNA, relative to the original DNA sequence, by DNA sequencing (or other methods), in the same manner as SNPs are identified. The application of automated high-throughput deep sequencing to the entire sodium bisulphite-converted genome (MethylC-seq or BS-seq) has the capability to provide full coverage of the methylome at single-base resolution.
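
The C-versus-T logic described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the pipeline used in the cited studies: the function names (`bisulphite_convert`, `cytosine_context`, `call_methylation`) are hypothetical, conversion is assumed to be 100% efficient, only one strand is modelled, the converted read is assumed to align to the reference base-for-base, and sequences contain only A, C, G and T. The 16-base test sequence is the one shown on the bisulphite sequencing slide.

```python
# Toy model of bisulphite conversion + PCR, and of calling methylation by
# comparing the converted read with the original (reference) sequence.
# All names here are hypothetical and chosen for illustration only.

GENOMIC = "ACATCACGTGATTACA"  # example sequence from the slide (spaces removed)


def bisulphite_convert(seq, methylated_positions):
    """Return the sequence read after bisulphite treatment and PCR.

    Unmethylated C -> U (bisulphite) -> T (PCR); methylated C stays C.
    `methylated_positions` is a set of 0-based indices of methylated cytosines.
    Assumes complete conversion, which real experiments only approximate.
    """
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def cytosine_context(seq, i):
    """Classify the cytosine at index i as CG, CHG or CHH (H = A, C or T)."""
    nxt = seq[i + 1:i + 2]
    nxt2 = seq[i + 2:i + 3]
    if nxt == "G":
        return "CG"
    if not nxt or not nxt2:
        return "unknown"  # too close to the end of the sequence to classify
    return "CHG" if nxt2 == "G" else "CHH"


def call_methylation(reference, read):
    """Call each reference cytosine from the aligned, converted read.

    A retained C is reported as 'methylated'; note that bisulphite data
    alone cannot tell methylcytosine from hydroxymethylcytosine.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            status = "methylated"
        elif read_base == "T":
            status = "unmethylated"
        else:
            status = "ambiguous"  # e.g. sequencing error or a true variant
        calls[i] = (cytosine_context(reference, i), status)
    return calls
```

With the CpG cytosine (index 6) methylated, `bisulphite_convert(GENOMIC, {6})` returns `ATATTACGTGATTATA`, and with no methylation it returns `ATATTATGTGATTATA`, matching the two post-PCR sequences on the slide. Reporting the sequence context alongside each call is a design choice made here so that CpG and non-CpG (CHG, CHH) methylation can be tallied separately.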
Full methylome analysis should ideally take into account the occurrence of CpG methylation, hydroxymethylation of CpGs (hmCpG) and non-CpG methylation, although the biological functions of the latter two modifications remain to be determined. However, sodium bisulphite treatment does not distinguish between mC and hydroxymethylcytosine (hmC), since both remain unconverted. (Currently the only means of differentiating between mC and hmC is through the use of specific antibodies, which bind either mC or hmC and allow the DNA sequences with which these modifications are associated to be analysed, though this is a blunt tool capable only of examining these modifications on a global scale.)

Whole-genome MethylC-seq to single-base resolution has now been successfully implemented for three human cell types: embryonic stem (ES) cells, fetal lung fibroblasts and, of significant clinical importance, peripheral blood mononuclear cells (PBMCs). Defining the methylomes of human ES cells and fetal lung fibroblasts by the same group [3] led to significant revelations. Most notably, while both cell types had similar levels of mCpG occurring as a mirror image on both DNA strands, human ES cells also had significant levels of mCHG and mCHH, which occurred asymmetrically and preferentially on the antisense strand. When the ES cells were induced to differentiate, these non-CpG methylation marks were lost, whereas they were gained by the fetal lung fibroblasts following induction to pluripotency (iPS). Thus non-CpG methylation appears to be characteristic of the pluripotent state.

The methylome of PBMCs was defined in an anonymous individual of Han Chinese heritage, whose complete genome had previously been sequenced, allowing for the integration of genomic and epigenomic information [4].
Interestingly, a significant number of regions showing methylation of a single allele were identified, only a proportion of which were attributable to genomic imprinting (methylation and silencing of a single copy of a gene on the basis of parental origin of inheritance). Some of these 'monoallelically' methylated sites were associated with genetic polymorphisms and were also expressed from a single allele. This study thus illustrates that the epigenome within any particular cell type is also likely to differ between individuals according to genomic sequence variation, owing to genome-epigenome interactions.

Comparison of the methylomes from these three different tissues also identified regions of differential CpG methylation between the cell types, confirming tissue-specific methylation differences. On the other hand, common patterns of CpG methylation across the genome also emerged. These included diminished methylation at the transcription start sites and the first exon of genes (irrespective of whether these occurred within CpG islands), with a sudden increase in methylation thereafter at the first intron. Exons within the body of genes were then methylated at higher levels than introns. On a chromosomal scale, subtelomeric regions were comparably hypermethylated.

Future applications of the human methylomes
As delineation of the methylomes of various normal and disease-related cells becomes commonplace in the near future, cellular and disease states will be more definitively categorised on the basis of unique or confined epigenetic marks. A panel of methylation marks used to define colorectal cancers on the basis of the "CpG island methylator phenotype" is already in use, although this was developed following single-gene analyses. With the scaling up of methylation analyses to encompass entire genomes, a proportional scaling in the identification of methylation biomarkers will ensue.
These will include diagnostic methylation biomarkers that detect or define disease states (or particular subtypes of diseases that we are currently unable to distinguish on the basis of existing pathological tests), as well as prognostic markers for particular treatment regimens. Defining the various human methylomes is likely to lead to a revolution in molecular pathology practice.

References and further reading
1. Moving AHEAD with an international human epigenome project. Nature, 2008. 454(7205): p. 711-5.
2. Jones, P.A. and R. Martienssen, A blueprint for a Human Epigenome Project: the AACR Human Epigenome Workshop. Cancer Res, 2005. 65(24): p. 11241-6.
3. Lister, R., et al., Human DNA methylomes at base resolution show widespread epigenomic differences. Nature, 2009. 462(7271): p. 315-22.
4. Li, Y., et al., The DNA methylome of human peripheral blood mononuclear cells. PLoS Biol, 2010. 8(11): p. e1000533.

Contact details:
Dr Megan P. Hitchins, Medical Epigenetics Laboratory, Adult Cancer Program, Lowy Cancer Research Centre, University of New South Wales, Randwick High Street, Randwick NSW 2052
Email: M.Hitchins@unsw.edu.au
Phone: 02 9385 1431