Computational epigenetics and chromatin regulation

ISMB/ECCB07 Vienna, Austria
Special Session: Michael Zhang (Chair)
I’d like to start by quoting (Callinan & Feinberg, 2006): “One of the most exciting
frontiers in both epigenetics and genome sciences is the new field of epigenomics or the
study of epigenetic modification at a level much larger than a single gene. Epigenetics is
the study of heritable changes other than those in the DNA sequence and encompasses
two major modifications of DNA or chromatin: DNA methylation, the covalent
modification of cytosine, and post-translational modification of histones including
methylation, acetylation, phosphorylation and sumoylation. Functionally, epigenetic
marks act to regulate gene expression, silence the activity of transposable elements and
stabilize adjustments of gene dosage, as seen in X inactivation and genomic imprinting.
Curiously, the term epigenetics has evolved from its original definition by Waddington,
meaning essentially developmental biology, yet epigenetics in the current meaning of the
word may in fact be critical to understanding developmental biology. This is because the
DNA sequence is invariant across tissues with the exception of rearranged genes such as
immunoglobulin family members, yet the epigenome shows tissue-specific variation.” In
this Special Session, I’ve sampled a few genome-scale studies from different aspects of this
emerging field in which computational methods have played an important role in
understanding the epigenome.
Mon. July 23, 2007
2:40 p.m. – 3:10 p.m.
Insulator (CTCF) binding site motif and its distribution in the human genome (M. Q. Zhang)
3:10 p.m. – 3:40 p.m.
A genomic code for nucleosome positioning (E. Segal)
3:40 p.m. – 4:10 p.m.
Coffee Break
4:10 p.m. – 4:40 p.m.
Mapping the structure of human chromatin (W. S. Noble)
4:40 p.m. – 5:10 p.m.
Allelic expression and genomic imprinting (A. Hartemink)
5:10 p.m. – 5:40 p.m.
Realizing the medical potential of epigenomics by tailored algorithms and software (T. Lengauer/C. Bock)
Abstracts:
Insulator (CTCF) binding site motif and its distribution in the human genome
Michael Q. Zhang,
Cold Spring Harbor Laboratory, USA and Tsinghua University, China
CTCF, or CCCTC-binding factor, is a ubiquitous 11-zinc-finger transcription
factor with highly versatile functions that plays important roles in both normal
development and cancer progression. CTCF is highly conserved and can serve as a
transcriptional activator (e.g., APBβ), a repressor (e.g., c-myc), or a chromatin insulator
(e.g., β-globin); it may also have a widespread function in imprinting, with binding sites
now identified within H19/Igf2, Rasgrf1, Meg3, and the BWS locus. Interactions of CTCF
with Sin3 and YB-1 have been shown to modulate its function as a transcriptional repressor.
Cooperation of CTCF with nucleophosmin, Kaiso, and the helicase protein CHD8 has been
linked to the control of CTCF insulator function and to epigenetic regulation. More
recently, a CTCF-YY1-Tsix complex has been reported to function as a key component of
the X-chromosome binary switch. Because CTCF recognizes long and diverse nucleotide
sequences, whether a well-defined consensus binding motif exists has been
controversial. Recent genome-wide ChIP-chip mapping of CTCF binding sites (Kim
et al. 2007) has allowed us to characterize its motif and genomic distribution, both of
which are consistent with a major role in chromatin insulation. Together with our
analyses of large-scale DNA methylation data (Rollins et al. 2006, Das et al. 2006, and
Fan et al., submitted) and Polycomb silencing data (Lee et al. 2006, Cuddapah et al.,
submitted), these results suggest that CTCF also plays important roles at the boundaries
of unmethylated CpG islands and Polycomb group (PcG) binding domains.
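Characterizing a binding motif from mapped sites usually starts with a position weight matrix. As a minimal sketch, assuming binding sites that are already aligned to equal length and a uniform base background, one can build a log-odds matrix and scan it over both strands. The toy sites below are hypothetical and are not the CTCF motif; the function names are illustrative and do not come from the authors' software.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds position weight matrix from aligned, equal-length A/C/G/T sites."""
    pwm = []
    for column in zip(*sites):
        total = len(column) + len(BASES) * pseudocount
        pwm.append({b: math.log((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def score(window, pwm):
    """Sum of per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan_both_strands(seq, pwm, threshold):
    """Return (start, strand, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            s = score(site, pwm)
            if s >= threshold:
                hits.append((start, strand, s))
    return hits


pwm = build_pwm(["CCGCG", "CCGCG", "CCGCA"])  # hypothetical toy sites
print(scan_both_strands("TTCCGCGTT", pwm, threshold=4.0))
```

With these toy inputs the scan reports a single forward-strand hit starting at position 2. Scoring the reverse complement of each window lets one matrix find sites on either strand; the threshold is an arbitrary illustrative value.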
References
1. Kim TH, Abdullaev Z, Smith AD, Ching KA, Loukinov D, Green RD, Zhang
MQ, Lobanenkov V, Ren B (2007) Analysis of the vertebrate insulator protein
CTCF binding sites in the human genome. Cell, accepted.
2. Rollins RA, Haghighi FG, Edwards JR, Das R, Zhang MQ, Ju J, Bestor TH
(2006) Large-scale structure of genomic methylation patterns. Genome Res.
16:157-63.
3. Das R, Dimitrova N, Xuan Z, Rollins RA, Haghighi F, Edwards JR, Ju J, Bestor
TH, Zhang MQ (2006) Computational prediction of methylation status in human
genomic sequences. Proc Natl Acad Sci U S A. 103:10713-6.
4. Lee TI, Jenner RG, Boyer LA, Guenther MG, Levine SS, Kumar RM, Chevalier
B, Johnstone SE, Cole MF, Isono K, Koseki H, Fuchikami T, Abe K, Murray HL,
Zucker JP, Yuan B, Bell GW, Herbolsheimer E, Hannett NM, Sun K, Odom DT,
Otte AP, Volkert TL, Bartel DP, Melton DA, Gifford DK, Jaenisch R, Young RA
(2006) Control of developmental regulators by Polycomb in human embryonic
stem cells. Cell. 125:301-13.
A genomic code for nucleosome positioning
E. Segal
Weizmann Institute, Israel
Eukaryotic genomes are packaged into nucleosome particles that prevent most
DNA-binding proteins from accessing the DNA. Nucleosomes are remarkable
from a physical perspective because in each nucleosome one persistence length of DNA
is wrapped in nearly two complete superhelical turns around a protein core. As a
consequence of this extreme DNA bending, nucleosomes have higher affinity for
particular DNA sequences that are best able to bend sharply as required by the
nucleosome. We have discovered that genomes care where their nucleosomes are located
on average, and that genomes manifest this care by encoding an additional layer of
genetic information, superimposed on top of other kinds of regulatory and coding
information that were previously recognized. By constructing a statistical profile from
nucleosome-bound sequences that we isolated in vivo, we developed a partial ability to
read this nucleosome positioning code and predict the in vivo locations of nucleosomes.
Our results suggest that genomes use the nucleosome positioning code to facilitate
specific chromosome functions, including transcription factor binding and transcription
initiation, and they further imply that genomes encode even higher levels of
chromosome architecture.
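The abstract does not spell out the form of the statistical profile. As a simplified sketch, assuming nucleosome-bound sequences that are already aligned to a common length, one can estimate a position-specific first-order Markov model (each base conditioned on the preceding base at that position) and score every window of a genomic sequence as a log-likelihood ratio against a uniform background. This is an illustrative stand-in, not the published model: a full predictor would also need to handle steric exclusion between neighboring nucleosomes, which this sketch ignores, and all names and training data here are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_profile(bound_seqs, pseudocount=1.0):
    """Position-specific log P(x[i+1] | x[i]) from aligned, equal-length A/C/G/T sequences."""
    profile = []
    for i in range(len(bound_seqs[0]) - 1):
        pairs = Counter(s[i:i + 2] for s in bound_seqs)
        step = {}
        for prev in BASES:
            prev_total = sum(pairs[prev + nxt] for nxt in BASES)
            for nxt in BASES:
                p = (pairs[prev + nxt] + pseudocount) / (prev_total + 4 * pseudocount)
                step[prev + nxt] = math.log(p)
        profile.append(step)
    return profile


def window_score(window, profile):
    """Log-likelihood ratio vs. a uniform background (the first base cancels out)."""
    return sum(step[window[i:i + 2]] - math.log(0.25) for i, step in enumerate(profile))


def scan(genome, profile):
    """Score every start position at which a full profile-length window fits."""
    width = len(profile) + 1
    return [window_score(genome[i:i + width], profile)
            for i in range(len(genome) - width + 1)]


profile = train_profile(["AAAA", "AAAA"])  # hypothetical toy training set
scores = scan("CCAAAACC", profile)
best_start = max(range(len(scores)), key=scores.__getitem__)
```

Here `scores` peaks at start position 2, the window that matches the toy training sequences. Real nucleosome-bound fragments are about 147 bp long, so the profile would have correspondingly more positions.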
Mapping the structure of human chromatin
William Stafford Noble, Robert Thurman, John Stamatoyannopoulos
Department of Genome Sciences, University of Washington, Seattle, WA, USA
DNA in the nucleus is packaged in a complex molecular structure known as
chromatin. At the finest scale, the DNA strand is wound around histone octamers
(complexes of eight histone proteins) to form nucleosomes. The positioning of
nucleosomes along the DNA, as well as superhelical and other higher-level chromatin
structures, directly affects the accessibility of the DNA to DNA-binding proteins and
hence modulates the expression of nearby genes.
Low-throughput methods for assaying local chromatin structure have been
available for decades. Recently, however, a variety of moderate- to high-throughput
techniques have been developed that allow us to understand chromatin on a larger scale
and across a variety of cellular conditions.
In this talk, I will compare and contrast some of these methods, including those
based on Southern blots, PCR, microarrays and capillary sequencing machines. I
will describe some of our observations derived from these assays, and I will discuss the
computational tasks that naturally arise from these new types of data. For example, our
data provide evidence for large-scale chromatin domains, and we show that these
domains are correlated with other high-throughput data sets generated by the ENCODE
consortium.
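Detecting large-scale domains from such data is often framed as a segmentation problem. The abstract does not say which method was used, so the following is only a generic sketch: a two-state Gaussian hidden Markov model decoded with the Viterbi algorithm, assuming one accessibility score per genomic window and fixed, hypothetical parameters (`means`, `sd`, `stay`) rather than parameters learned from data.

```python
import math


def viterbi_two_state(signal, means=(0.0, 1.0), sd=0.5, stay=0.9):
    """Most likely state path (0 = closed, 1 = open) under a two-state Gaussian HMM."""
    log_trans = [[math.log(stay), math.log(1 - stay)],
                 [math.log(1 - stay), math.log(stay)]]

    def emit(state, x):
        # Gaussian log-density up to a constant shared by both states (same sd)
        return -((x - means[state]) ** 2) / (2 * sd ** 2)

    scores = [math.log(0.5) + emit(s, signal[0]) for s in (0, 1)]
    backpointers = []
    for x in signal[1:]:
        new_scores, pointers = [], []
        for s in (0, 1):
            best_prev = max((0, 1), key=lambda p: scores[p] + log_trans[p][s])
            new_scores.append(scores[best_prev] + log_trans[best_prev][s] + emit(s, x))
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    state = max((0, 1), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def to_domains(path):
    """Collapse a state path into (start, end, state) runs; end is exclusive."""
    domains, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            domains.append((start, i, path[start]))
            start = i
    return domains


signal = [0.1, -0.2, 0.0, 0.9, 1.1, 1.0, 0.95, 0.05, 0.0, -0.1]  # made-up values
print(to_domains(viterbi_two_state(signal)))
```

For these made-up values the decoder returns one open domain covering windows 3 to 6. The `stay` probability acts as a switching penalty: raising it favors fewer, longer domains, which is the behavior one wants when looking for large-scale structure rather than individual sites.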
Computational Methods for Genome-wide Prediction of Imprinted Genes
A. Hartemink
Duke University, USA
Imprinted genes are epigenetically modified genes that are expressed
monoallelically according to their parent of origin. They are involved in embryonic
development, and imprinting dysregulation is linked to cancer, obesity, diabetes, and
behavioral disorders such as autism and bipolar disorder. Experimental evidence suggests
that genomic imprinting evolved ~180 million years ago in a common ancestor of
viviparous mammals after their divergence from the egg-laying monotremes. We adopt a
machine learning approach for both identifying imprinted gene candidates and predicting
their parental expression preference. We collect a series of DNA sequence features
within and flanking each locus, such as statistics on repetitive elements, transcription
factor binding sites, and CpG islands. Based on these features, we subsequently train a
classifier employing a two-tier prediction strategy. Each gene in the mouse genome is
first predicted to be either imprinted or nonimprinted, and then the parental allele
preferentially expressed is predicted for all candidate imprinted genes. Of 23,788
annotated autosomal mouse genes, our model identifies 600 (2.5%) to be potentially
imprinted, 64% of which are predicted to exhibit maternal expression. These predictions
allow us to identify putative candidate genes for complex conditions where parent-of-origin
effects are involved, including Alzheimer disease, autism, bipolar disorder,
diabetes, obesity, and schizophrenia. We observe that the number, type, and relative
orientation of repeated elements flanking a gene are particularly important in predicting
whether a gene is imprinted.
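The two-tier strategy can be sketched as two chained classifiers, where the second only sees genes the first calls imprinted. The abstract does not name the classifier, so this sketch substitutes a deliberately simple nearest-centroid rule; the two features per locus (a flanking repeat count and a CpG island count) and all training values are hypothetical.

```python
import math


def fit_nearest_centroid(examples):
    """examples maps each class label to a list of numeric feature vectors."""
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in examples.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], x))


def two_tier(tier1, tier2, genes):
    """Tier 1: imprinted vs. nonimprinted; tier 2 runs only on imprinted calls."""
    calls = {}
    for name, x in genes.items():
        if predict(tier1, x) == "imprinted":
            calls[name] = ("imprinted", predict(tier2, x))
        else:
            calls[name] = ("nonimprinted", None)
    return calls


# Hypothetical features per locus: [flanking repeat count, CpG island count]
tier1 = fit_nearest_centroid({"imprinted": [[5, 3], [6, 4]], "nonimprinted": [[1, 1], [0, 1]]})
tier2 = fit_nearest_centroid({"maternal": [[5, 4], [6, 4]], "paternal": [[5, 2], [6, 2]]})
calls = two_tier(tier1, tier2, {"g1": [5.5, 3.9], "g2": [0.2, 0.8], "g3": [5.4, 2.1]})
```

Splitting the task this way means the parental-allele model is trained only on imprinted examples, so it never has to learn a third "neither" class.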
Realizing the medical potential of epigenomics by tailored algorithms and software
Christoph Bock (1), Jörn Walter (2), Thomas Lengauer (1)
(1) Max-Planck-Institut für Informatik, Saarbrücken, Germany and (2) Universität des
Saarlandes, FR 8.3 Biowissenschaften, Genetik/Epigenetik, Saarbrücken, Germany
The demand for computational support and bioinformatic tools in the field of medical
epigenetics is rapidly increasing, driven by complex experimental methods, increasingly
genome-wide analyses and the pressure to quickly translate scientific results into clinical
practice. Our goal is to develop bioinformatic methodology for addressing these issues
and to implement a set of web services that make powerful algorithms available to
typical bench scientists. This talk surveys three of our contributions to the field of
computational epigenetics.
1. We developed software tools that support curation and low-level analysis of DNA
methylation data generated by bisulfite sequencing. This technology is commonly regarded
as the gold standard for assaying DNA methylation, but it requires careful control of
data quality and tedious data preparation. Our BiQ Analyzer software
automates these tasks and provides an easy step-by-step analysis workflow to facilitate
reproducible data analysis (Bock et al. (2005). Bioinformatics 21, 4067-8). BiQ Analyzer
has recently been selected by ABI to be part of the Applied Biosystems Software
Community Program.
2. We developed a method for predicting epigenetic modifications from the DNA sequence.
Applying it first to DNA methylation, we showed that the methylation state of CpG
islands in human lymphocytes is highly correlated with DNA sequence, repeats and
predicted DNA structure (Bock et al. (2006). PLoS Genet 2(3): e26). We subsequently
extended the method to the prediction of other epigenetic modifications such as histone
modifications and DNA accessibility, and we applied it to refine the annotation of CpG
islands for the human genome (Bock et al. submitted). Given the generality of the method
and a wide range of potential applications in analyzing large-scale epigenome datasets
and prioritization of candidate regions, we implemented it as a publicly available web
service called EpiGRAPH, which is currently in beta testing.
3. We developed a workflow and a set of statistical methods that optimize candidate
epigenetic biomarkers for applicability in clinical settings. This
approach accounts for the fact that some widely used methods for pre-clinical research
(such as methylation-specific PCR and bisulfite sequencing of multiple clones) are less
appropriate for clinical settings, due to lack of robustness, complicated handling and
time-consuming analysis. Using a combination of experimental and statistical analysis,
we were able to determine optimal pyrosequencing targets for assessing aberrant
promoter methylation of the MGMT gene, an important biomarker for resistance
to chemotherapy (Mikeska et al. submitted).
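For the first contribution listed above, the core calls made on bisulfite-sequenced clones can be sketched as follows. This is not BiQ Analyzer's code. It assumes a gap-free alignment between the unconverted genomic reference and each clone, treats a C at a reference CpG as methylated and a T as unmethylated, and uses the fraction of converted non-CpG cytosines as a conversion-rate quality check; the 0.95 cutoff is a hypothetical choice.

```python
def analyze_clone(reference, read):
    """Methylation calls at reference CpGs plus the non-CpG C-to-T conversion rate."""
    if len(reference) != len(read):
        raise ValueError("expected a gap-free alignment of equal length")
    calls, converted, non_cpg = {}, 0, 0
    for i, base in enumerate(reference):
        if base != "C":
            continue
        if reference[i + 1:i + 2] == "G":  # CpG site
            calls[i] = {"C": "methylated", "T": "unmethylated"}.get(read[i], "unknown")
        else:  # non-CpG cytosine, assumed unmethylated and so converted to T
            non_cpg += 1
            converted += read[i] == "T"
    rate = converted / non_cpg if non_cpg else None
    return calls, rate


def methylation_levels(reference, reads, min_conversion=0.95):
    """Fraction of methylated calls per CpG across clones that pass the conversion check."""
    totals, methylated = {}, {}
    for read in reads:
        calls, rate = analyze_clone(reference, read)
        if rate is not None and rate < min_conversion:
            continue  # likely incomplete bisulfite conversion: drop this clone
        for pos, call in calls.items():
            if call == "unknown":
                continue
            totals[pos] = totals.get(pos, 0) + 1
            methylated[pos] = methylated.get(pos, 0) + (call == "methylated")
    return {pos: methylated[pos] / totals[pos] for pos in totals}


reference = "ACGTCCGA"
clones = ["ACGTTTGA", "ATGTTCGA", "ACGTCCGA"]  # made-up clones; the last one is unconverted
print(methylation_levels(reference, clones))  # {1: 0.5, 5: 0.5}
```

The third clone is discarded because its non-CpG cytosine was not converted, which is exactly the kind of quality problem that makes manual bisulfite analysis tedious.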
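For the second contribution listed above, the idea of predicting methylation from sequence-derived features can be illustrated with two classic CpG-island statistics, GC content and the CpG observed/expected ratio, fed into a logistic regression trained by batch gradient descent. The published method used a much richer feature set (including repeats and predicted DNA structure); logistic regression is a substitute chosen for brevity, not the authors' method, and the labeled training points are invented.

```python
import math


def cpg_features(seq):
    """GC content and CpG observed/expected ratio (CpG count * length / (C count * G count))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    obs_exp = cpg * len(seq) / (c * g) if c and g else 0.0
    return [(c + g) / len(seq), obs_exp]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean logistic loss; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_methylated(w, x):
    """Probability that a region with feature vector x is methylated."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


# Hypothetical training set: [GC content, CpG obs/exp]; label 1 = methylated.
X = [[0.70, 0.90], [0.65, 0.80], [0.50, 0.20], [0.45, 0.30]]
y = [0, 0, 1, 1]
w = train_logistic(X, y)
```

In this toy set, regions with lower CpG density are labeled methylated, so the trained model assigns them higher probabilities. Any real use would need held-out evaluation rather than four hand-made points.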
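For the third contribution listed above, one ingredient of choosing clinically useful targets is ranking candidate CpG positions by how well their methylation separates two sample groups. The sketch below converts pyrosequencing C and T signal heights into a methylation fraction and ranks CpGs by the area under the ROC curve, computed by pairwise comparison. The statistic, the group names, the CpG labels, and the values are illustrative assumptions, not necessarily what the study used.

```python
def pyro_methylation(c_signal, t_signal):
    """Methylation fraction at one CpG from pyrosequencing C and T peak heights."""
    return c_signal / (c_signal + t_signal)


def auc(cases, controls):
    """Probability that a random case exceeds a random control (ties count one half)."""
    wins = 0.0
    for a in cases:
        for b in controls:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(cases) * len(controls))


def rank_cpgs(case_levels, control_levels):
    """Sort candidate CpGs by AUC, best separating position first."""
    scores = {cpg: auc(case_levels[cpg], control_levels[cpg]) for cpg in case_levels}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical methylation fractions per CpG in tumor (cases) and normal (controls) samples.
tumor = {"cpg1": [0.8, 0.7, 0.9], "cpg2": [0.5, 0.2, 0.6]}
normal = {"cpg1": [0.1, 0.2, 0.05], "cpg2": [0.4, 0.3, 0.5]}
ranking = rank_cpgs(tumor, normal)
```

Here `cpg1` separates the groups perfectly (AUC 1.0) and ranks first, while `cpg2` overlaps heavily. AUC is used because it depends only on the ordering of values, which suits small clinical sample sets with unknown distributions.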