Towards the Human Methylome
Megan Hitchins, PhD
Head, Medical Epigenetics Laboratory
Senior Lecturer, University of New South Wales
Lowy Cancer Research Centre
The epigenetic code
• Genetic code - 3e9 bases of genome
• Epigenetic code – superimposed on
the genetic code
• Provides functional relevance by
influencing transcriptional activity
• Conveyed by the composition of
modifications on the DNA backbone:
Covalent cytosine methylation
Covalent modifications to amino acids of
histone tails
Variants of histone proteins
Chromatin remodelling enzymes
Small and non-coding RNAs
Epigenetic programming
Sperm: 23 chromosomes (22 autosomes + X or Y)
Oocyte: 23 chromosomes (22 autosomes + X)
Germ cell epigenetic profile
Oocyte: unknown
Sperm: protamines
1. Paternal pronucleus is
demethylated within zygote
2. Preimplantation embryo
3. Inner cell mass (ES cells)
4. Embryonic cell lineages
5. Somatic differentiation
6. Germline
Same genetic code within all somatic cells; different epigenetic code
The epigenome(s)
• The totality of all epigenetic modifications within
a given cell type which influences gene activity
Uniform genetic information across all nucleated cells
within an individual (except somatically acquired genetic
mosaicism & recombinational events in immune cells)
Unique epigenome per cell type (>250 normal human
epigenomes) providing it with its cell identity
• Altered epigenomes in disease states
Epigenetic modifications - architecture of
the epigenome
Conglomeration of histone tail modifications
[Figure: histone H3 N-terminal tail, N-A R T K Q T A R K S T G G K A P R K Q L A T K A A R K S P A T G G V K K, with active marks (Ac, M) and repressive marks (M) at labelled residues 4, 9 and 27]
Nucleosome composition and position on the helix
[Figure: chemical structures of cytosine, methylcytosine (CH3 at carbon 5) and hydroxymethylcytosine (CH2OH at carbon 5)]
The Methylome
• The collective profile of cytosine methylation
across the entire genome
• Methylation in humans occurs predominantly at
5’-CpG-3’ dinucleotides (mCpG)
• 28.6 million CpG sites in the human genome, of
which 7% occur within CpG islands
• Non-CpG methylation has been discovered in human
ES cells including mCHG & mCHH (where H = A/C/T)
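The three sequence contexts named above (CpG, CHG and CHH) can be told apart by looking at the one or two bases downstream of each cytosine. The following is a minimal sketch, assuming a plain A/C/G/T string read 5' to 3' on one strand; the function name `cytosine_contexts` is hypothetical and introduced only for illustration.

```python
def cytosine_contexts(seq):
    """Classify each cytosine on the given strand as CG, CHG or CHH.

    H is any base other than G (A, C or T). Cytosines too close to the
    3' end to be classified unambiguously are skipped.
    """
    seq = seq.upper()
    contexts = {}
    for i, base in enumerate(seq):
        if base != "C":
            continue
        downstream = seq[i + 1:i + 3]  # up to two bases after the C
        if len(downstream) >= 1 and downstream[0] == "G":
            contexts[i] = "CG"
        elif len(downstream) == 2 and downstream[1] == "G":
            contexts[i] = "CHG"
        elif len(downstream) == 2:
            contexts[i] = "CHH"
    return contexts


if __name__ == "__main__":
    # Toy sequence; positions are 0-based.
    print(cytosine_contexts("ACATCACGTGATTACA"))
```

Only the strand supplied is examined; cytosines on the opposite strand would be classified by running the same function on the reverse complement.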
Detection of methylcytosine by
“sodium bisulphite sequencing”
(m = methylated cytosine)
Genomic DNA:
A CAT CACGT GATTACA
Sodium bisulphite treatment
Unmethylated: A UAT UAUGT GATTAUA
Methylated: A UAT UACGT GATTAUA
PCR amplification
Unmethylated: A TAT TATGT GATTATA
Methylated: A TAT TACGT GATTATA
Non-CpG methylation: A TAT CACGT GATTATA
Sequence (single locus or whole-genome)
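The conversion on this slide can be reproduced in a few lines. This is an illustrative sketch only, assuming a single strand, complete conversion and error-free PCR; the function names `bisulphite_convert` and `call_methylation` are hypothetical. It uses the slide's example sequence, A CAT CACGT GATTACA, written without spaces.

```python
def bisulphite_convert(seq, methylated=frozenset()):
    """Simulate bisulphite treatment followed by PCR on one strand.

    Unmethylated C is read as T; a C at a 0-based position listed in
    `methylated` is protected and is still read as C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference, converted):
    """Compare converted reads to the reference, as for SNP calling.

    Returns {position: True/False} for every reference cytosine, where
    True means the C was retained (methylated).
    """
    return {
        i: converted[i] == "C"
        for i, base in enumerate(reference.upper())
        if base == "C"
    }


if __name__ == "__main__":
    genomic = "ACATCACGTGATTACA"
    print(bisulphite_convert(genomic))                     # unmethylated
    print(bisulphite_convert(genomic, methylated={6}))     # mCpG only
    print(bisulphite_convert(genomic, methylated={4, 6}))  # plus non-CpG mC
```

The three printed sequences correspond to the unmethylated, methylated and non-CpG methylation rows of the slide.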
Detection of regions with dense
methylcytosine by MeDIP-Seq
Genomic DNA (m marks methylcytosine on each strand):
5’-A CAT CACGT GATTACA-3’
3’-T GTA GTGCA CTAATGT-5’
1. Sonicate
2. Denature
3. Affinity purify with antibody
4. Deep-sequence DNA enriched with mC
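The double-stranded example on this slide shows why a CpG site can carry a methyl group on both strands: the reverse complement of CG is again CG, so each CpG on the top strand sits opposite a CpG on the bottom strand. A small sketch, assuming plain A/C/G/T input (the helper names are hypothetical):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cpg_positions(seq):
    """0-based positions of the C in every CG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


if __name__ == "__main__":
    top = "ACATCACGTGATTACA"
    bottom = reverse_complement(top)
    print(bottom)
    # The bottom-strand C at index j pairs with top-strand index len - 1 - j,
    # i.e. with the G of the same CpG site.
    for j in cpg_positions(bottom):
        print("bottom C", j, "pairs with top index", len(top) - 1 - j)
    print("top CpG C positions:", cpg_positions(top))
```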
Methods for deciphering a methylome
BS-seq/MethylC-Seq
Interrogates methylation status at every
cytosine across the entire genome to
single-base resolution
– most comprehensive
MeDIP-seq
Isolates regions of dense CpG methylation; detects differential methylation,
e.g. normal versus cancer
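For BS-seq data, a common way to summarise a site is the fraction of reads that still show a C after conversion, and two samples can then be compared site by site. The sketch below is hedged: the input layout (position mapped to a pair of C and T read counts), the function names and the thresholds `min_delta` and `min_depth` are assumptions chosen for illustration, and a real analysis would use a statistical test rather than fixed cut-offs. MeDIP-seq yields enrichment signal rather than per-base counts, so this does not apply to it directly.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads retaining C at one cytosine (None if no coverage)."""
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


def differential_sites(sample_a, sample_b, min_delta=0.5, min_depth=5):
    """Report sites whose methylation level differs between two samples.

    Each sample maps position -> (c_reads, t_reads). Returns
    {position: level_b - level_a} for sites covered by at least
    `min_depth` reads in both samples.
    """
    hits = {}
    for pos in sorted(set(sample_a) & set(sample_b)):
        a_c, a_t = sample_a[pos]
        b_c, b_t = sample_b[pos]
        if a_c + a_t < min_depth or b_c + b_t < min_depth:
            continue  # too few reads to trust either estimate
        delta = methylation_level(b_c, b_t) - methylation_level(a_c, a_t)
        if abs(delta) >= min_delta:
            hits[pos] = delta
    return hits


if __name__ == "__main__":
    # Made-up counts for illustration only.
    normal = {100: (1, 9), 200: (8, 2), 300: (2, 1)}
    tumour = {100: (9, 1), 200: (8, 2), 300: (3, 0)}
    print(differential_sites(normal, tumour))
```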
Study designs to determine the epigenetic
basis for disease in humans
• Cancer:
Compare neoplastic versus adjacent normal tissues
and precursor lesions / intermediates
• Complex disease:
1. Case versus Control from the same tissue-type
2. Identical twins with discordant phenotypes
• Environmental factors:
1. Identical twins with similar versus diverse life-styles
2. Longitudinal epigenetic-epidemiological study
Deciphering the human methylomes
Whole-genome bisulphite sequencing (BS-Seq):
• Human ES cells
• Fetal lung fibroblasts
Lister et al. Human DNA methylomes at base resolution show widespread
epigenomic differences. Nature (2009); 462: 315-322
• Peripheral blood mononuclear cells (PBMC)
Li et al. The DNA methylome of human peripheral blood mononuclear cells.
PLoS Biology (2010); 8: e1000533
• Similar mC levels
• Predominantly mCpG
• ES cells > significant non-CpG methylation; antisense strand
• Differentiated cells > none
• Lost upon ES cell differentiation
• Gain upon iPS
• Epigenetic marker of pluripotency?
• Subtelomeric regions highly methylated
Lister et al. Human DNA methylomes at base resolution show widespread epigenomic
differences. Nature (2009); 462: 315-322
Differential methylation patterns > differentiate cell identity
Lister et al. Human DNA methylomes at base resolution show widespread epigenomic
differences. Nature (2009); 462: 315-322
Consistent methylation pattern across protein-coding genes
[Figure: methylation level across gene regions: Upstream, Exon 1, Intron 1, Exons, Introns, Last exon, Downstream]
Li et al. The DNA methylome of human peripheral blood mononuclear cells.
PLoS Biology (2010); 8: e1000533
PBMC: Haploid differentially methylated regions
• “YH” genome sequenced (1st Asian)
• “YH” methylome - same individual &
same sample
• Integration of genetic & epigenetic
data (SNP-methylation tags)
Genetic variation > epigenetic
variation > variable gene expression
[Figure: SNP (G/A alleles) at a TF site, allele-specific methylation, allele-specific expression, exonic SNP]
Li et al. PLoS Biology (2010); 8:e1000533
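The idea of a SNP-methylation tag can be sketched as grouping reads by the allele they carry at a heterozygous SNP and computing the methylation fraction of a nearby CpG for each allele separately. This is an illustrative sketch only: it assumes each read has already been reduced to a pair (allele at the SNP, whether the CpG was read as methylated), and the function name is hypothetical.

```python
from collections import defaultdict


def allele_specific_methylation(reads):
    """Methylation fraction of one CpG, split by linked SNP allele.

    `reads` is an iterable of (snp_allele, cpg_methylated) pairs, one per
    sequencing read covering both the SNP and the CpG.
    """
    counts = defaultdict(lambda: [0, 0])  # allele -> [methylated, total]
    for allele, methylated in reads:
        counts[allele][1] += 1
        if methylated:
            counts[allele][0] += 1
    return {allele: m / n for allele, (m, n) in counts.items()}


if __name__ == "__main__":
    # Made-up reads: the G allele is always methylated, the A allele rarely.
    reads = [("G", True)] * 4 + [("A", False)] * 3 + [("A", True)]
    print(allele_specific_methylation(reads))
```

A large difference between the two alleles, as in the toy data, is the pattern the slide calls allele-specific methylation.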
Do epigenetic differences in monozygotic
twins underlie phenotypic discordance?
• Genetically identical (MZ)
• Phenotypic discordances
observed for imprinting
disorders and common
“multifactorial” diseases
• Epigenetic basis?
Epigenetic divergence in MZ twin pairs
(Traditional epigenetic methods)
• Epigenetic profile of 40 MZ twin-pairs of 3-74 years
• Similar epigenetic profiles at age 3y
• Numerous epigenetic differences resulting in altered
gene expression between co-twins at 50y
• More differences between co-twins that had spent
less time living together and had differing life-styles
• Epigenetic diversity increased with both age &
lifestyle differences; implicates the interaction
between environmental factors and epigenetics
Fraga et al., Esteller. Epigenetic differences arise during the lifetime of MZ twins.
PNAS (2005) 102: 10604-9.
“EpiTwin”– largest methylome study
• Methylomes of 5000 twins (MZ & DZ) aged 18-85y
• Compare patterns with co-twin
• Identify differences that may underlie discordance in
common (multifactorial) diseases including diabetes,
obesity, atopy, cardiovascular, osteoporosis & longevity
• Collaboration: TwinsUK registry (King’s College London) and
Beijing Genomics Institute (BGI, Shenzhen, China)
• Cost - $30M
• Initiated 2010 – expected completion 2015
Cancer Methylomes
Neurofibromatosis:
• Schwann cells > benign neurofibromas > malignant
peripheral nerve sheath tumours (5-10%)
• Acquired methylome by MeDIP-Seq & genome-wide CNV;
integrated this with existing gene expression profiles for all
3 cell types > comparative epigenomics
• Identified a complex pattern of epigenetic changes during
neoplastic progression; most occurred outside regions
previously considered most important in changing gene
expression during carcinogenesis.
Feber et al., Beck. Comparative methylome analysis of benign and malignant peripheral
nerve sheath tumours. Genome Research (2011); 21: 515-524
Key epigenetic changes in neurofibromatosis
Satellite repeats progressively demethylated
Clustering of MPNST and NF using expression of
genes based on differential methylation patterns
CpG island “shores”
Non CpG island promoters
Feber et al., Beck. Comparative methylome analysis of benign and malignant peripheral
nerve sheath tumours. Genome Research (2011); 21: 515-524
Direction and applicability to
molecular/genetic pathology
• Provide the framework of the epigenome with
multiple reference epigenomes from normal and
disease cellular states
• Define individual epigenetic markers that are
representative of specific cell or disease states
• Much whole-epigenome scale information will be
superfluous: return to the study of individual loci
or selected sequence types with the transition
from investigative research > clinical application
Future role of the methylome in
Genetic & Molecular Pathology
• Formulation of (minimal) panels of methylation
biomarkers:
• Diagnostic markers for the early detection & monitoring of
disease states
• Differential markers eg to distinguish cancer subtypes
• Prognostic & predictive markers eg drug sensitivity;
survival outcomes
• Stratification of patient populations for clinical trials
• Stem cell and regenerative therapies
• Reproductive medicine
Further information
www.ihec-epigenomes.org
“EpiTwin” www.twinsuk.ac.uk/projects/epitwin.html
Jones et al. Moving AHEAD with an international human epigenome project.
Nature (2008); 454: 711-715
Lister et al. Human DNA methylomes at base resolution show widespread epigenomic
differences. Nature (2009); 462: 315-322
Li et al. The DNA methylome of human peripheral blood mononuclear cells.
PLoS Biology (2010); 8: e1000533
Feber et al., Beck. Comparative methylome analysis of benign and malignant peripheral
nerve sheath tumours. Genome Research (2011); 21: 515-524
Towards defining the human methylome
Megan P. Hitchins, PhD
Definition of terms
Epigenetics: “Somatically heritable changes in gene activity that do not involve changes
in the DNA base sequence”.
DNA Methylation: Covalent attachment of a methyl (CH3) group at the 5-carbon position of the
cytosine pyrimidine ring. Methylcytosine is sometimes referred to as the ‘fifth’ or ‘minor’ base.
In humans, methylation of DNA occurs primarily at cytosines immediately preceding a guanine
(mCpG). Other forms of methylation, including hydroxymethyl (CH2OH) cytosine
(hmCpG) and non-CpG methylation (mCHG and mCHH, where H is A, T or C), have
been identified in human embryonic stem cells.
Methylome: The collective profile of methylation of all cytosines across the entire
genome of a cell.
Epigenome: The totality of epigenetic marks across the entire genome of a cell (or the
entirety of the ‘epigenetic code’ within a cell). This includes all modifications
superimposed on the genome that collectively comprise the ‘epigenetic code’, including
cytosine methylation, covalent modifications of histone tails, histone variants present
within the nucleosome core, small and non-coding RNAs.
Biological role: The epigenetic state of the DNA directs chromatin structure and
influences gene activity. The epigenome determines cell identity, since it differs from
one cell type to another, and changes dynamically during cell differentiation through
“epigenetic programming”.
The “International Human Epigenome Project”
Completion of the Human Genome Project a decade ago represents one of the greatest
scientific achievements. It defined the sequence of all 3 x 10^9 bases, allowing us to determine
gene number and structure and to identify regulatory regions embedded within the DNA sequence.
Since then, sequencing of additional human genomes led to the realisation of the extent of
genetic variation (SNPs and copy number variants) amongst us. Within any one individual,
the DNA sequence in all nucleated cells is essentially identical (excepting recombination
events in immune cells and occasional somatically acquired genetic changes). Yet, in humans
there are over 250 different cell types, each displaying distinct physical and behavioural
phenotypes. The activity of the genome is carefully orchestrated throughout the life-cycle by
the epigenetic code that is superimposed on the genomic ‘skeleton’ and differs from one cell
type to another. The epigenome, comprised of the genome-wide conglomerate of DNA
modifications, including cytosine methylation, covalent attachments to the amino acids of
histone tails, core histone protein variants, small and non-coding RNAs, directs chromatin
packaging and influences gene expression (Figure 1). There is no single methylome or
epigenome. These vary between cell types and change dynamically, for instance as stem
cells become committed to a particular cell lineage and differentiate. The epigenome thus
provides a framework for the functional expression of the genetic code.
Figure 1. Epigenetic mechanisms. The coding
information in the base sequence of DNA is organised
within a chromatin structure to form cell-specific
epigenomes. DNA cytosine methylation and covalent
modification of histone tails and histone protein
variants contribute information to nucleosomal
remodelling machinery that influences gene repression
or activation. Adapted from [1].
In recent years, it has become apparent that aberrations in the process of epigenetic
programming also lead to disease states, most notably congenital disease due to disrupted
genomic imprinting and cancer. Environmental factors and nutrition can also induce
epigenetic alteration. Furthermore, epigenomic changes are potentially reversible through
treatment with drugs that inhibit chromatin modifying enzymes, of which histone deacetylase
inhibitors and methyltransferase inhibitors have been FDA or EU approved for clinical use.
In view of the importance of the epigenome in conferring cell identity, its role in disease
aetiology and the technical feasibility we now have to define it on a genome-wide scale, it
was considered timely to undertake an international effort to decode the epigenome.
The International Human Epigenome Project (IHEP) was conceived to decipher and
catalogue the epigenome of various normal cell types and disease states, beginning with
defining the methylome [1-2]. The scope of the IHEP is to provide high resolution reference
epigenome maps for key cellular states in humans and mouse (and other model organisms),
including but not limited to, embryonic and adult stem cells, proliferative and differentiated
cell types, and corresponding cells that show an altered disease state. For example, in the
haematopoietic system, naive CD34+ progenitors, differentiated cells such as leukocytes and
lymphoid cells, would provide the reference for acute myelogenous leukaemia, acute
lymphoblastic leukaemia, chronic myelogenous leukaemia and myelodysplastic syndrome.
Another objective of the IHEP is to develop the bioinformatics infrastructure to support the
curation and integration of epigenomic data over and above the genome sequence, and
provide user-friendly interfaces for free public access to the data.
The human methylome
Cytosine methylation (mC) is most readily detected following conversion of genomic DNA
with sodium bisulphite treatment, whereby unmethylated cytosines are converted to uracil
and thence to thymine following PCR amplification, whereas methyl cytosines (mC) are inert
and remain unconverted as a C. Thus mC may be differentiated from unmethylated cytosines
on the basis of the presence of a C or a T in bisulphite-converted DNA, relative to the
original DNA sequence, by DNA sequencing (or other methods) in the same manner as SNPs
are identified. The application of automated high throughput deep-sequencing to the entire
sodium bisulphite-converted genome (MethylC-seq or BS-seq) has the capability to provide
full coverage of the methylome at single base resolution. Full methylome analysis should
ideally take into account the occurrence of CpG methylation, hydroxymethylation of CpGs
(hmCpG) as well as non-CpG methylation, although the biological functions of these latter
modifications remain to be determined. However, sodium bisulphite treatment does not
distinguish between mC and hydroxymethyl cytosine (hmC), since both remain unconverted.
(Currently the only means of differentiating between mC and hmC is through the use of
specific antibodies, which bind either mC or hmC and allow DNA sequences with which
these modifications are associated to be analysed - though this is a blunt tool capable only of
examining these modifications on a global scale).
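The limitation described above can be stated compactly: bisulphite sequencing reports only whether a cytosine was read as C or T, so two modification states that give the same readout cannot be separated. A minimal sketch, with the state labels "C", "mC" and "hmC" and the function names chosen here purely for illustration:

```python
def bisulphite_readout(state):
    """Base observed after bisulphite treatment and PCR for one cytosine."""
    if state == "C":
        return "T"  # unmethylated cytosine is converted
    if state in ("mC", "hmC"):
        return "C"  # both modified forms remain unconverted
    raise ValueError(f"unknown cytosine state: {state!r}")


def distinguishable(state_a, state_b):
    """True if bisulphite sequencing alone can tell the two states apart."""
    return bisulphite_readout(state_a) != bisulphite_readout(state_b)


if __name__ == "__main__":
    print(distinguishable("C", "mC"))    # unmethylated versus methylated
    print(distinguishable("mC", "hmC"))  # methyl versus hydroxymethyl
```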
Whole-genome methylC-seq to single base resolution has now been successfully
implemented for three human cell types: embryonic stem (ES) cells, fetal lung fibroblasts,
and of significant clinical importance, peripheral blood mononuclear cells (PBMCs).
Defining the methylomes of human ES and fetal lung fibroblast cells by the same group [3]
led to significant revelations. Most notably, while both cell types had similar levels of mCpG
occurring as a mirror image on both DNA strands, human ES cells had significant levels of
mCHG and mCHH as well, which occurred asymmetrically and preferentially on the
antisense strand. When the ES cells were induced to differentiate, these non-CpG methylation
marks were lost, whereas they were gained by the fetal lung fibroblasts following induction
to pluripotency (iPS). Thus non-CpG methylation appears to be characteristic of the
pluripotent state. The methylome of PBMCs was defined in an anonymous individual of Han
Chinese heritage, whose complete genome had previously been sequenced, allowing for the
integration of genomic and epigenomic information [4]. Interestingly, a significant number of
regions showing methylation of a single allele were identified, only a proportion of which
were attributable to genomic imprinting (methylation and silencing of a single copy of a gene
on the basis of parental origin of inheritance). Some of these ‘monoallelically’ methylated
sites were associated with genetic polymorphisms and were also expressed from a single
allele. This study thus illustrates that the epigenome within any particular cell type is also
likely to differ between individuals according to genomic sequence variation due to
genome-epigenome interactions. Comparison of the methylomes from these three different tissues
also identified regions of differential CpG methylation between the cell types, confirming
tissue-specific methylation differences. On the other hand, common patterns of CpG methylation
across the genome also emerged. These included diminished methylation at the transcription
start sites and the first exon of genes (irrespective of whether these occurred within CpG
islands), with a sudden increase in methylation thereafter at the first intron. Exons within the
body of genes were then methylated at higher levels than introns. On a chromosomal scale,
subtelomeric regions were comparably hypermethylated.
Future applications of the human methylomes
As delineation of the methylomes of various normal and disease-related cells becomes
commonplace in the near future, cellular and disease states will be more definitively
categorised on the basis of unique or confined epigenetic marks. A panel of methylation
marks used to define colorectal cancers on the basis of the “CpG island methylator phenotype” is
already utilised, although this was developed following single-gene analyses. With the
scaling up of methylation analyses to encompass entire genomes, a proportional scaling in the
identification of methylation biomarkers will ensue. These will include diagnostic
methylation biomarkers that detect or define disease states (or particular subtypes of diseases
that we are currently unable to distinguish on the basis of existing pathological tests), as well
as prognostic markers for particular treatment regimens. Defining the various human
methylomes is likely to lead to a revolution in molecular pathology practice.
References and further reading
1. Moving AHEAD with an international human epigenome project. Nature, 2008.
454(7205): p. 711-5.
2. Jones, P.A. and R. Martienssen, A blueprint for a Human Epigenome Project: the
AACR Human Epigenome Workshop. Cancer Res, 2005. 65(24): p. 11241-6.
3. Lister, R., et al., Human DNA methylomes at base resolution show widespread
epigenomic differences. Nature, 2009. 462(7271): p. 315-22.
4. Li, Y., et al., The DNA methylome of human peripheral blood mononuclear cells.
PLoS Biol, 2010. 8(11): p. e1000533.
Contact details:
Dr Megan P. Hitchins,
Medical Epigenetics Laboratory,
Adult Cancer Program,
Lowy Cancer Research Centre,
University of New South Wales,
Randwick High Street,
Randwick NSW 2052
Email: M.Hitchins@unsw.edu.au
Phone: 02 9385 1431