Power point format lecture slides - Wei Li Lab

Epigenomics: A Practical Guide
Benjamin Rodriguez, PhD
Wei Li Lab, Baylor College of Medicine
Molecular Biology Refresher Course with Bioinformatics
August 30th 2013
Software, Sites, Materials
Course Materials:
http://dldcc-web.brc.bcm.edu/lilab/benji/MBRB_2013/index.html
Most up-to-date slides
I will upload slides for all three of my lectures
Browsers:
http://genome.ucsc.edu/
http://epigenomegateway.wustl.edu/
Web-based analysis:
http://bejerano.stanford.edu/great/public/html/
http://david.abcc.ncifcrf.gov
Outline
• DNA methylation
• Histone modifications
• DNase hypersensitivity
• Aberrant methylation in cancer
• Epigenetic inheritance in development and disease
DNA is Packaged in Chromatin
Chromatin consists of nucleosomes: DNA wrapped around histone proteins
[Figure labels: nucleosome, histone, DNA, chromatin]
• Chromatin organizes genes to be accessible for transcription,
replication, and repair
CpG Islands and Promoters
• Although C & G constitute 42% of the human
genome, less than 1% of dinucleotides are CpG
– Less than ¼ of the expected frequency of 0.04 (0.21 × 0.21 ≈ 0.044)
• A ‘CpG island’ is a run of “CpG-rich” sequence
– min 200 bp in length
– GC content > 50%
– Observed : Expected ratio > 0.6
– This definition is not precise
• Many CpG islands occur within promoters
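The three criteria above can be checked on any sequence. The sketch below is one common way to compute the observed : expected ratio (CpG count divided by the count expected if C and G were placed independently). The function names are made up for illustration, and the slide itself notes the island definition is not precise.

```python
def cpg_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were independent: (#C * #G) / length
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return n, gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the slide's thresholds: length >= 200 bp, GC > 50%, O/E > 0.6."""
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


# Genome-wide back-of-envelope from the slide:
# 42% GC content means P(C) = P(G) = 0.21, so expected CpG frequency is about 0.044
expected_cpg_freq = 0.21 * 0.21
```

A real island finder would slide a window along the genome and merge passing windows; this only scores a single candidate sequence.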
DNA Methylation
• Methylation at CpG islands often represses nearby
gene expression
• Many highly expressed genes have CpG
methylation on their exons
• Some genes could be imprinted, so maternal and
paternal copies have different DNA methylation
• In embryonic stem cells, there is also CHG
methylation
• Recently, another type of DNA methylation, called
hydroxymethylation (hmC), has been found
Epigenetic Mechanisms: DNA Methylation
[Figure: gene with regions 1–4 in a normal cell; the CpG island is unmethylated (CG) and some other CpG sites are methylated (mCG). C: cytosine; mC: methylcytosine]
DNA Methylation and Gene Silencing
[Figure: the same gene (regions 1–4) in normal and cancer cells. Normal: CpG island unmethylated (CG). Cancer: CpG island methylated (mCG) and the gene silenced (X). C: cytosine; mC: methylcytosine]
DNA Methylation and Regulation
• Cytosine methylation blocks DNA-binding proteins’
access to regulatory sites and creates binding sites
for repressive proteins
• Methylation often follows a decrease in site use
From Thurman et al Nature 2012
Methylation and Expression
Some genes (e.g. HOXB13 in breast cancer) show strong
correlation of promoter methylation with expression
R² = 0.7817, P < 0.0005
From Rodriguez et al Carcinogenesis 2008
Methylation, Retroviruses and Repeats
• Bacteria use DNA methylation to limit invasive
DNA from viruses
• A large fraction of the human genome consists
of carcasses of retro-viruses and transposons
• Almost all DNA repeats are heavily methylated
• If they lose methylation they are more likely to
be expressed
DNA Methylation and Development
• Almost all DNA is demethylated in the embryo
• Increasing methylation at various times during
fetal development restricts functionality
– This is why cloning is difficult
• Wave of methylation in adolescence
• Gradual demethylation in old age
DNA Methylation and Inheritance
• Most DNA is de-methylated during
gametogenesis and embryogenesis
• Methylation persists in some DNA regions
• Humans and mice show epigenetic
inheritance apparently mediated by DNA
methylation
Agouti Mice and DNA Methylation
Genomic Imprinting
Epigenetic mechanism of transcriptional regulation
[Figure: maternal and paternal alleles]
Expression of a subset of mammalian genes is
restricted to one parental allele
Genomic Imprinting
[Figure: maternal and paternal alleles]
Parental chromosomes are differentially marked by
DNA methylation
Genomic Imprinting
Imprinting regulated by cis-acting elements
(Imprinting Control Regions) and non-coding RNAs
[Figure: maternal and paternal alleles]
Imprinting Control Regions act over long distances and
control the imprinting of multiple genes
We will examine a recent study of the IGF2 DMR in
individuals exposed to famine in utero
Epigenetic Mechanisms: Post-Translational
Modification to Histones
[Figure: histone acetylation (Ac) and histone methylation (Me)]
• Epigenetic modifications of histones include histone
acetylation and methylation
Histone Modifications
• Different modifications at different locations by
different enzymes
• Potential temporal and spatial specificity
Histone Modifications
• Gene body mark: H3K36me3, H3K79me3
• Active promoter (TSS) mark: H3K4me3
• Active enhancer (TF binding) mark: H3K4me1, H3K27ac
• Both enhancers and promoters: H3K4me2, H3/H4ac, H2AZ
• Repressive promoter mark: H3K27me3
• Repressive mark for DNA methylation: H3K9me3
Genes, regulatory DNA, and epigenetic features
Graphic from NIH RoadMap Epigenomics Site
Genes, regulatory DNA, and epigenetic features
DNaseI
- promoters
- enhancers
- silencers
- insulators
- etc.
DNase Hypersensitive (HS) Mapping
• DNase cuts the genome at random (more often in
open chromatin regions)
• Select short fragments (two nearby cuts) to
sequence
• Map to active promoters and enhancers
DNaseI hypersensitive sites mark regulatory DNA
DNaseI Hypersensitive site (DHS)
Promoters
Enhancers
~100,000 – 250,000 DHSs per cell type (0.5-1.5% of genome)
genome.ucsc.edu
www.epigenomebrowser.org
Epigenetic Modifications to Histones and DNA Can
Cooperate to Silence Gene Expression
[Figure: DNA methyltransferase (DNMT), histone methyltransferase (HMT), and histone deacetylase (HDAC) activity on methyl (Me) and acetyl (Ac) marks, and the effect on gene expression]
• Coordinated activities of chromatin modifying enzymes lead to
condensation of chromatin and inhibition of gene expression
Roles in Normal Development and Cancer
[Figure: EPIGENETICS. Normal epigenetic mechanisms: progenitor cell to differentiated cells]
• Regulation of genes involved in differentiation, cell cycle, and cell
survival
Roles in Normal Development and Cancer
[Figure: EPIGENETICS. Normal epigenetic mechanisms: progenitor cell to differentiated cells. Deregulated epigenetic mechanisms: malignant progenitor cell to tumor]
• Regulation of genes involved in differentiation, cell cycle, and cell
survival
• Through epigenetic silencing of certain genes, affected cells may
acquire new phenotypes which promote tumorigenesis
HOXB13 hypermethylation in breast cancer cells
R² = 0.7817, P < 0.0005
Strong inverse association between promoter CpG island
hypermethylation and HOXB13 gene expression
From Rodriguez et al Carcinogenesis 2008
HOXB13 hypermethylation in breast cancer cells
Bisulfite sequencing
(Sanger, clone-based, very laborious)
From Rodriguez et al Carcinogenesis 2008
Inhibition of DNA methyltransferase activity
restores expression of HOXB13
From Rodriguez et al Carcinogenesis 2008
HOXB13 hypermethylation strongly
associates with patient ER status
(OR=3.75, 95% CI 1.41-9.96; P = 0.008)
Patient ER Status
Paired Tumor and Adjacent Normal Tissues
From Rodriguez et al Carcinogenesis 2008
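For readers new to odds ratios, here is a minimal sketch of how an OR and a Wald-type 95% confidence interval come out of a 2×2 table. The counts are hypothetical; the slide does not give the study's table or say which method produced its interval. The last line only checks that the reported interval (1.41 to 9.96) is centred on the reported OR on a log scale.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

                     methylated   unmethylated
        ER-positive      a             b
        ER-negative      c             d

    All four cells must be non-zero for this simple formula.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (NOT the study's data)
example_or, example_lower, example_upper = odds_ratio_wald_ci(30, 10, 10, 10)

# Consistency check on the numbers shown on the slide:
# the geometric mean of the CI bounds should sit close to the reported OR of 3.75
reported_midpoint = math.sqrt(1.41 * 9.96)
```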
HOXB13 hypermethylation associates with poor
disease free survival in ERα-positive patients
From Rodriguez et al Carcinogenesis 2008
Epigenetic inheritance and human development
• Epidemiologic studies suggest adult disease risk is
associated with adverse environmental conditions early
in development
• Involvement of epigenetic dysregulation has been
hypothesized
• Can early-life environmental conditions cause
epigenetic changes in humans that persist throughout
life? Is there a role for clinical intervention?
1. Periconceptional exposure to famine
2. Offspring born before vs. after maternal gastrointestinal
bypass surgery
Persistent epigenetic differences associated with
prenatal exposure to famine in humans
Individuals who were prenatally exposed to famine during the
Dutch Hunger Winter in 1944–45 had, 6 decades later, less DNA
methylation of the imprinted IGF2 gene compared with their
unexposed, same-sex siblings
Association was specific for periconceptional exposure, reinforcing
that very early mammalian development is a crucial period for
establishing and maintaining epigenetic marks
Heijmans et al PNAS 2008
Insulin-like growth factor II (IGF2)
• One of the best-characterized epigenetically regulated loci
• Key factor in human growth and development
• Maternally imprinted
• Imprinting is maintained through the IGF2 differentially
methylated region (DMR)
• Hypomethylation of DMR leads to bi-allelic expression of IGF2
• IGF2 DMR methylation is a normally distributed quantitative trait
largely determined by genetic factors
• The methylation mark is stable up to middle age
• If affected by environmental conditions early in human
development, altered methylation may be detected many years
later
Difference in IGF2 DMR methylation between individuals
prenatally exposed to famine and their same-sex sibling
Fig A displays the difference in IGF2 DMR methylation within sibships according to the
estimated conception date of the famine-exposed individual
IGF2 DMR methylation was lowest in the famine-exposed individual among 72% (43/60)
of sibships; this lower methylation was observed in conceptions across the famine period.
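To get a feel for how unusual 43 of 60 is, one option is an exact sign test against a 50/50 split. This is only an illustration; it is not necessarily the test used in the paper, and the function name is made up.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    the observed count k under the null proportion p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so outcomes tied with the observed one are included
    total = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, total)


# Sibships where the famine-exposed individual had the lower methylation
lower_in_exposed = 43
sibships = 60
percent = 100 * lower_in_exposed / sibships  # rounds to 72, as on the slide
p_value = binomial_two_sided_p(lower_in_exposed, sibships)
```

A sign test ignores the size of each within-pair difference, so it is weaker than a paired test on the methylation values themselves.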
IGF2 DMR methylation among individuals
periconceptionally exposed to famine and their
unexposed, same-sex siblings
IGF2 DMR methylation among individuals exposed to famine
late in gestation and their unexposed, same-sex siblings
• 62 individuals were exposed to famine late in gestation for at least 10 weeks;
they were born during or shortly after the famine
• No difference in IGF2 DMR methylation between the exposed individuals and
their unexposed siblings
Timing of famine exposure during gestation
and IGF2 DMR methylation
• Periconceptional, late exposure groups and 122 controls
• Periconceptional exposure associated with lower methylation
• Statistically significant association between timing and exposure
Differential methylation in offspring born before versus
after maternal gastrointestinal bypass surgery
• Obesity during pregnancy affects fetal programming of adult
disease
• Children born after maternal surgery (AMS) are less obese and exhibit
improved cardiometabolic risk profiles carried into adulthood,
compared with children born before maternal surgery (BMS)
• The study analyzes the impact of maternal weight loss surgery on
methylation levels in BMS and AMS offspring
• Statistically significant correlations between gene methylation
levels and gene expression and plasma markers of insulin
resistance
• Effective treatment of a maternal phenotype is durably
detectable in the methylome and transcriptome of subsequent
offspring
Guenard et al. PNAS 2013
Offspring born before vs. after maternal gastrointestinal bypass surgery
BMS offspring
• Higher weight, height, and waist and hip girth (P < 0.05)
AMS offspring
• Lower body fat % (P = 0.07)
• Improved fasting insulin levels (P = 0.03) and homeostatic model of
insulin resistance (HOMA-IR) index (P = 0.03)
• Lower blood pressure (P < 0.05)
Differential methylation analysis of offspring born before vs. after
maternal gastrointestinal bypass surgery
• 14,466 CpG sites (2.9% of sites analyzed) exhibited significant differences
• These corresponded to 5,698 unique genes
• Significant biological functions were related to autoimmune disease, pancreas
disorders, diabetes mellitus, and disorders of glucose metabolism
Any questions?
On to the Laboratory!
Laboratory Exercises
• We will work with the significant differentially methylated CpG
sites published as supporting data from the Guenard study
• The MGBS_study.xlsx and AMS.probes.bed files are available
from the class web site
• We will perform our own mapping and significance testing of
the CpG sites (in relation to genes) using GREAT
• We will analyze the published gene list and our custom gene list
in DAVID
• Finally, we will analyze last week’s gene list (from the MLL-AF9
fusion protein study) in DAVID
Do hyper- and hypo-methylated sites in AMS offspring
have different distributions?
Open MGBS_study.xlsx and examine the “DMC list” worksheet
The study used a poorly described algorithm, DiffScore, to assess
statistical differences and to rank CpG sites
It also applied a loose threshold for the change cutoff
In Excel, we can easily compute summary stats
The average and standard deviation of the delta beta values
are quite similar
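The same summary can be done outside Excel. The sketch below assumes the delta beta column has been exported as a plain list of numbers and that a positive value means hypermethylation; both are assumptions, and the example values are invented.

```python
from statistics import mean, stdev


def summarize_delta_beta(delta_betas):
    """Mean and standard deviation of |delta beta|, split by direction of change."""
    groups = {
        "hyper": [d for d in delta_betas if d > 0],  # assumed: positive = hypermethylated
        "hypo": [d for d in delta_betas if d < 0],   # assumed: negative = hypomethylated
    }
    summary = {}
    for label, values in groups.items():
        magnitudes = [abs(v) for v in values]
        summary[label] = {
            "n": len(values),
            "mean_abs": mean(magnitudes) if magnitudes else float("nan"),
            "sd_abs": stdev(magnitudes) if len(magnitudes) > 1 else float("nan"),
        }
    return summary


# Invented values for illustration only (NOT taken from MGBS_study.xlsx)
example_summary = summarize_delta_beta([0.12, 0.08, 0.10, -0.09, -0.11, -0.10])
```

Similar means and standard deviations do not prove the two distributions match; a histogram of each group is a quick extra check.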
Mapping significant CpG sites to genes with GREAT
• For species assembly, choose human GRCh37
• Under Test regions, upload the BED file AMS.probes.bed
• Set Background regions
to whole genome
• Click Submit
Compare gene lists from publication to those obtained by GREAT
http://jura.wi.mit.edu/cgi-bin/bioc/tools/compare.cgi
Choose compare 2 lists, Paste lists of genes, press submit
• GREAT recovers 170 of 198 genes from the publication (AMS and GREAT)
• GREAT identifies 170 additional genes (because by default it searches a wider
space of genomic distances)
• The missing 28 genes may result from gene name synonyms
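The web tool above is doing a set comparison, which is easy to reproduce locally. A minimal sketch, with a made-up function name and placeholder gene symbols; upper-casing the symbols is an assumption and will not resolve true synonyms.

```python
def compare_gene_lists(published, great):
    """Return shared and list-specific gene symbols (case-insensitive)."""
    pub = {g.strip().upper() for g in published if g.strip()}
    grt = {g.strip().upper() for g in great if g.strip()}
    return {
        "shared": sorted(pub & grt),          # recovered by GREAT
        "published_only": sorted(pub - grt),  # missing from GREAT
        "great_only": sorted(grt - pub),      # additional genes from GREAT
    }


# Placeholder symbols for illustration only
result = compare_gene_lists(
    ["IGF2", "HOXB13 ", "tp53"],
    ["igf2", "TP53", "MYC"],
)

# Arithmetic from the slide: 198 published genes, 170 recovered
missing = 198 - 170
```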
Mapping significant CpG sites to genes with GREAT
On “Region-Gene Association Graphs”
We see that about 3/4 of the CpG sites are assigned to two genes
The orientation and distance to TSS graphs look fairly flat upstream, but show a spike in
assignments when the distance is > 5 kb from the TSS
Could that be the reason we don’t see any significance test results?
Let us find out
Modifying the genomic region search range in GREAT
Open the “Association rule settings” dialog box
Change downstream to 5kb and distal to 5kb
Resubmit the job
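Why does a smaller search range return fewer genes? The sketch below is a deliberately simplified stand-in, not GREAT's actual association rule: it assigns a CpG site to every gene whose TSS lies within a fixed distance. Gene names and coordinates are invented.

```python
def genes_near_site(site_pos, tss_list, max_dist=5000):
    """List genes whose TSS is within max_dist bp of a site on the same chromosome.

    tss_list: iterable of (gene, tss_position, strand) tuples.
    Returns (gene, offset) pairs sorted by distance; a negative offset
    means the site is upstream of the TSS relative to the gene's strand.
    """
    hits = []
    for gene, tss, strand in tss_list:
        offset = site_pos - tss if strand == "+" else tss - site_pos
        if abs(offset) <= max_dist:
            hits.append((gene, offset))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Invented coordinates for illustration only
tss_table = [("GENE_A", 10_000, "+"), ("GENE_B", 20_000, "-"), ("GENE_C", 100_000, "+")]

narrow = genes_near_site(12_000, tss_table, max_dist=5_000)
wide = genes_near_site(12_000, tss_table, max_dist=1_000_000)
```

With the narrow window only the nearest gene survives; with the wide window the same site picks up distant genes too, which is the trade-off explored in this exercise.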
What happened with a smaller genomic search interval?
We only returned 64 genes! Crap.
But we did finally return a single significant test result:
InterPro (protein sequence analysis and classification)
Functional enrichment analyses with DAVID
• With GREAT, we were able to identify the majority of genes
published in the original study
• We do not have sufficient information to repeat the study’s
original analyses
• We can use DAVID to analyze the study gene list and our gene
list from GREAT
Open http://david.abcc.ncifcrf.gov and choose “Start Analysis”
Functional enrichment analyses with DAVID
•
•
•
•
“Upload Gene List” Dialog box
Copy and Paste the list from MGBS_study.xlsx
worksheet “Study Genes”
On “Select Identifier”, choose “Official Gene
Symbol” and choose “Gene List” on “List Type”
Then Submit List
Functional enrichment analyses with DAVID
For species, highlight Homo sapiens and click
“Select Species”
Rename the list
Choose “Functional Annotation Tool”
Functional enrichment analyses with DAVID
Each Annotation Category on the left
can be expanded to reveal a number of
optional databases to query
This allows for powerful customization
For this exercise, we will accept the
default options
Choose “Functional Annotation Chart”
Functional enrichment analyses with DAVID
Functional Annotation Chart fields are: category, term, related term (RT), genes,
count, percentage, p-value (univariate modified Fisher’s), and Benjamini p-value
(correction for multiple testing)
Terms with arrows can be sorted
Shown above are the first three results, the only ones to pass multiple-testing correction
They reference the same group of genes
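The two p-value columns can be sketched in a few lines. This assumes the "modified Fisher's" value is the EASE score (a one-sided Fisher exact test after removing one gene from the list hits) and that the Benjamini column follows the Benjamini-Hochberg procedure. The counts are hypothetical and the function names are made up.

```python
from math import comb


def fisher_upper_tail(a, b, c, d):
    """One-sided Fisher exact p-value: P(top-left cell >= a) with margins fixed.

    a: list genes with the term      b: list genes without the term
    c: background genes with it      d: background genes without it
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    upper = min(row1, col1)
    tail = sum(comb(col1, i) * comb(n - col1, row1 - i) for i in range(a, upper + 1))
    return tail / total


def ease_score(a, b, c, d):
    """Fisher exact p-value with one list hit removed (more conservative)."""
    return fisher_upper_tail(max(a - 1, 0), b, c, d)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 3 of 300 list genes vs 40 of 30,000 background genes
plain_p = fisher_upper_tail(3, 297, 40, 29960)
ease_p = ease_score(3, 297, 40, 29960)
```

Because the correction scales with the number of terms tested, the same raw p-value can pass in one run and fail in another that tests more terms.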
Functional enrichment analyses with DAVID
Clicking on the link for term “Pleckstrin homology” opens the corresponding
entry at InterPro
Proteins containing this domain can bind to and interact with membrane bound
proteins, potentially mediating various signal transduction pathways in the cell
Functional enrichment analyses with DAVID
Let’s now perform the analysis with our list of
differentially methylated genes obtained via GREAT
• “Upload Gene List” Dialog box
• Copy and Paste the list from MGBS_study.xlsx
worksheet “Great Analysis”
• On “Select Identifier”, choose “Official Gene
Symbol” and choose “Gene List” on “List Type”
• Submit List, choose “Homo sapiens”
• Select “Functional Annotation Chart”
Note: Entrez Gene ID’s are a preferred way
to search for gene functions
They can account for the fact that a gene
may go by several different names
Functional enrichment analyses with DAVID
We see the same first three results as before, but now they do not pass
multiple-testing correction
Why? One explanation, we introduced “noisy” genes with GREAT
Why did we not see any significant biological functions related to
autoimmune disease, pancreas disorders, diabetes mellitus, or disorders of
glucose metabolism?
1. Study authors gave us a small piece of the data they likely used
2. Methodological issues
3. Commercial IPA is very different from publicly curated databases and
search tools
Functional enrichment analyses with DAVID
Finally, let’s analyze the list of genes from last
week’s MLL-AF9 fusion gene study.
The file “MLL-AF9_promoters.bed” is available
from the course website
• “Upload Gene List” Dialog box
• Open the BED file in Excel, copy and paste the
fifth column into DAVID
• On “Select Identifier”, choose “Entrez Gene ID”
and choose “Gene List” on “List Type”
• Submit List, choose “Mus musculus”
• Select “Functional Annotation Chart”
Note: Entrez Gene ID’s are a preferred way
to search for gene functions
They can account for the fact that a gene
may go by several different names
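If Excel is not handy, the fifth column can be pulled out with a short script. A minimal sketch, assuming the file is tab-delimited and that column five holds the Entrez Gene IDs as described above; the function name is made up.

```python
def bed_column(lines, column=5):
    """Return the unique, non-empty values of one BED column, in file order."""
    values = []
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and common BED header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < column:
            continue  # row too short to have the requested column
        value = fields[column - 1].strip()
        if value and value not in seen:
            seen.add(value)
            values.append(value)
    return values


# Usage, once the file has been downloaded from the course website:
# with open("MLL-AF9_promoters.bed") as handle:
#     print("\n".join(bed_column(handle)))
```

The printed IDs can be pasted straight into the DAVID upload box.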
Functional enrichment analyses with DAVID
Jackpot! We have dozens of highly enriched terms for the genes bound by
oncogenic MLL-AF9 in mouse leukemia stem cells
Enriched functions include transcription regulation and cell cycle
More than 40% of targets are phosphoproteins
Laboratory Summary
• The Guenard study was not very fruitful, so to speak
• I have some issues with their methodology
• Limited sharing of (published) data is poor practice
• DNA methylation data is difficult to interpret
• GREAT and DAVID are powerful tools for functional enrichment
analyses of genome-wide studies
• With the right tools and a little patience, you can make novel
discoveries and draw meaningful biological interpretation from
genomics datasets