Notes on Chromatin Conformation Capture
And Related Techniques
https://en.wikipedia.org/wiki/Chromosome_conformation_capture
- 3C is a molecular biology technique used to analyze the spatial organization of
chromosomes in a cell’s natural state; derived methods (4C, 5C, Hi-C) extend it to
higher throughput
- Studying these properties is important for understanding the regulation of gene
expression, DNA replication, DNA repair and recombination
o One example is chromosomal folding that brings an enhancer and its
associated transcription factors into close proximity with a gene, as
was first shown at the beta-globin locus
The 3C technique has five experimental steps
- Step 1 Cross-linking: Addition of formaldehyde results in the cross-linking of DNA
segments to proteins and the cross-linking of proteins with each other
o This leads to cross-linking of interacting DNA segments (e.g.
enhancers and promoters)
o A cross-link is a bond that links one polymer chain to another; these
bonds can be covalent or ionic
o For example, disulfide and isopeptide bonds are natural cross-links
that occur in organisms
o Cross-linking agents can stabilize protein-protein interactions in a
cell’s natural state so that they can be detected, since protein
interactions are often too weak or transient to be detected on their own
o Formaldehyde is a commonly used reagent whose cross-linking effects
can be reversed by incubation at 70 °C
- Step 2 Restriction digest: A restriction enzyme is added in excess to the crosslinked DNA, separating the non-cross-linked DNA from the cross-linked
chromatin
o The selection of the restriction enzyme depends on the locus being
analyzed
o Frequently cutting enzymes (4 bp recognition sites) are used to study
smaller loci (<10-20 kb), while larger loci demand the use of 6 bp cutters
o The idea is that, with a suitable choice of enzyme, the restriction enzyme
will not cut within the two interacting sequences, so each stays intact and
the pair remains held together by the cross-links, while everything else is
chopped into small pieces
- Step 3 Intramolecular ligation: Ligation at very low DNA concentration favors
ligation between cross-linked fragments (intramolecular) over ligation of random
fragments (intermolecular)
o The sticky ends produced by the restriction enzyme at the ends of the
relevant cross-linked sequences can be joined by DNA ligase (similar to the
way that Okazaki fragments are joined on the lagging strand during DNA
replication), which forms two covalent phosphodiester bonds, one on each
strand
o If both pairs of ends ligate, the DNA forms a circle around the
cross-linked protein complex; if only one pair of ends ligates, the product
is linear, with a central restriction site corresponding to the site of
ligation and specific restriction ends
o However, randomly cross-linked fragments will also be ligated together
(due to incomplete restriction digestion); these represent about 20-30% of
all junctions. This number can be decreased by reducing the cross-linking
stringency in the first step so that only very close DNA segments (i.e.
those that are interacting) cross-link
o Furthermore, the two ends of a single fragment can ligate to each other
(self-circularization), which prevents the relevant cross-linked fragments
from ligating to each other; self-circularization contributes up to 30% of
all junctions formed
- Step 4 Reverse cross-links: High temperature will result in the reversal of crosslinks formed in step 1
o The resulting linear DNA fragment has specific restriction ends as well
as a central restriction site corresponding to the site of ligation
o The pool of these fragments is collectively referred to as the 3C library
- Step 5 Quantitation: PCR with primers on either side of the ligation junction
semiquantitatively assesses the frequency of a ligation product of interest
o Quantitative PCR using TaqMan probes (3C-qPCR) provides a more
quantitative measurement of the fragment of interest
o Quantitative PCR monitors the amplification of a targeted DNA
molecule in real time, using either non-specific fluorescent dyes that
intercalate into any double-stranded DNA (e.g. SYBR Green) or
sequence-specific fluorescent probes such as TaqMan probes
o Because relevant fragments are ligated together more often than random
fragments, their ligation products are more abundant, reach detectable
levels in fewer PCR cycles, and so can be picked out using qPCR
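One common way to turn qPCR cycle thresholds (Ct) into relative ligation frequencies is to compare each primer pair's Ct in the 3C library with its Ct on a control template in which all ligation products are equally abundant, which corrects for primer pairs that amplify differently. The sketch below assumes ideal doubling per cycle; the function names, the loading-control normalization and all Ct values are hypothetical.

```python
def relative_frequency(ct_3c: float, ct_control: float, efficiency: float = 2.0) -> float:
    """Abundance of one ligation junction in the 3C library relative to the control template.

    Each PCR cycle multiplies the product by `efficiency` (2.0 = perfect doubling),
    so crossing the threshold one cycle earlier means `efficiency`-fold more template.
    """
    return efficiency ** (ct_control - ct_3c)


def normalized_frequency(ct_3c: float, ct_control: float,
                         ct_loading_3c: float, ct_loading_control: float) -> float:
    """Divide by a loading-control junction (hypothetical) to correct for template amount."""
    return (relative_frequency(ct_3c, ct_control)
            / relative_frequency(ct_loading_3c, ct_loading_control))


# Hypothetical Ct values for a candidate enhancer-promoter junction and an intervening fragment.
enhancer = normalized_frequency(28.0, 25.0, 22.0, 22.0)
intervening = normalized_frequency(30.0, 25.0, 22.0, 22.0)
print(enhancer, intervening)  # 0.125 0.03125
```

With these made-up numbers the enhancer junction comes out four times more frequent than the intervening fragment, because it crosses the threshold two cycles earlier relative to the same control.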
Circularized Chromosome Conformation Capture (4C)
- This technique has an advantage over 3C in that only the sequence of one of the
sites of interest needs to be known – this fragment is known as the “bait”
- Steps 1-4: These steps are identical to 3C. The goal is to produce many copies of
one sequence (for example, a promoter) ligated to one or more unknown sequences
(for example, enhancers)
- Step 5a Second restriction digest: After the cross-links are reversed, the
restriction fragments are subjected to another round of restriction digest, this
time with a frequent cutter that produces smaller fragments with restriction
ends that differ from the central restriction site
- Step 5b Self-circularization: Self-circularization of the DNA fragments is now
favored because they are no longer bound to proteins or other fragments
o Intramolecular ligation occurs to induce the formation of circular
fragments which become the 4C library
- Step 5c Inverse PCR and quantitation: Outward-facing primers are designed on the
“bait” sequence, near its outer restriction sites, so that inverse PCR amplifies the
small unknown captured fragment
o Large-scale sequencing can be used to sequence the 4C library –
custom microarrays can also be made using probes designed against
the adjacent upstream and downstream regions of all genomic sites of
the restriction enzyme used in step 2
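The inverse PCR in Step 5c works because the primers face outward on the bait: on a circle, extending outward from the bait runs through the captured fragment and back into the bait. A minimal string sketch, assuming the circle is written as a single top strand with the bait first and that each primer site occurs once; the function name and all sequences are hypothetical.

```python
def inverse_pcr_product(circle: str, fwd_site: str, rev_site: str) -> str:
    """Top-strand sequence amplified by outward-facing primers on a circular template.

    fwd_site: where the forward primer binds, near the 3' end of the bait.
    rev_site: where the reverse primer binds, near the 5' start of the bait
    (the primer itself is the reverse complement of this site).
    """
    start = circle.find(fwd_site)
    if start < 0:
        raise ValueError("forward primer site not found")
    # Open the circle at the forward primer, so the product runs bait -> captured -> bait.
    rotated = circle[start:] + circle[:start]
    end = rotated.find(rev_site, len(fwd_site))
    if end < 0:
        raise ValueError("reverse primer site not found")
    return rotated[:end + len(rev_site)]


bait = "ACGTAC" + "GGCCTA"   # hypothetical bait fragment
captured = "TTTTAAAA"        # unknown fragment ligated to the bait
circle = bait + captured
product = inverse_pcr_product(circle, fwd_site="GGCCTA", rev_site="ACGTAC")
print(product)               # GGCCTATTTTAAAAACGTAC
print(product[6:-6])         # TTTTAAAA, the captured fragment between the primer sites
```

Only the bait sequence is needed to design both primers; the captured fragment is read out without knowing it in advance, which is the advantage of 4C over 3C noted above.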
Carbon-Copy Chromosome Conformation Capture (5C)
- The 5C technique builds on 3C and allows the parallel analysis of
interactions between many selected loci
- Steps 1-4: Same as in 3C
- Step 5 Ligation-mediated amplification and quantitation
o Multiplex ligation-mediated amplification (LMA) is performed on the 3C
library with multiplex primers that combine a universal primer sequence
(such as T7 or T3) with sequence matching one side of a ligation junction
(all 3C library fragments should still have the same central restriction
site at the ligation junction)
o The primers anneal to the 3C fragments and are ligated together by DNA
ligase; the ligated primers then serve as templates that are amplified
to generate the 5C library
o Because of the universal primer sequences, all 5C fragments can be
amplified together and analyzed on microarrays, and their small size is
also compatible with analysis by high-throughput sequencing
o Multiplex ligation-mediated amplification permits multiple targets to be
amplified with only a single primer pair
 Each probe consists of two oligonucleotides that recognize
adjacent sequences in the DNA; when both bind, a ligase joins
the two oligonucleotides together
 Then, PCR is performed with a primer pair that requires sequences
from both oligonucleotides, so amplification only occurs upon ligation
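The logic of ligation-mediated amplification (two oligonucleotides must anneal directly next to each other before ligase can join them, and only the joined product carries both universal tails needed for PCR) can be sketched with strings. Assumptions: exact string matching stands in for hybridization, the tails are placeholders rather than the real universal primer sequences, and the function names are hypothetical.

```python
from typing import Optional

LEFT_TAIL = "<univA>"   # hypothetical placeholder for one universal primer sequence
RIGHT_TAIL = "<univB>"  # hypothetical placeholder for the other universal sequence


def probes_ligate(template: str, left_arm: str, right_arm: str) -> bool:
    """Ligase can only join the two arms if they anneal directly next to each other.

    Arms are written as the template sequence they recognize.
    """
    return (left_arm + right_arm) in template


def lma_amplicon(template: str, left_arm: str, right_arm: str) -> Optional[str]:
    """Product amplified by the universal primer pair, or None if no ligation occurred."""
    if not probes_ligate(template, left_arm, right_arm):
        return None  # unligated oligos carry only one tail each, so nothing is amplified
    return LEFT_TAIL + left_arm + right_arm + RIGHT_TAIL


junction = "GGGAAGCTTCCC"  # hypothetical 3C ligation junction with a HindIII site in the middle
print(lma_amplicon(junction, "GGGA", "AGCTTCCC"))  # <univA>GGGAAGCTTCCC<univB>
print(lma_amplicon(junction, "GGGA", "CTTCCC"))    # None, because a gap separates the arms
```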
http://www.lcg.unam.mx/frontiers/files/frontiers/deWit_GenesDev2012.pdf
A decade of 3C technologies: insights into nuclear organization
- 3C technologies are based on the remarkably simple idea that digestion and religation
of fixed chromatin in cells, followed by the quantification of ligation junctions, allows
for the determination of DNA contact frequencies and insight into chromosome
topology
- The first step is to establish a representation of the 3D organization of the DNA
o Chromatin is fixed using a fixative agent, most often formaldehyde
o Next, the fixed chromatin is cut with a restriction enzyme recognizing 6
bp (such as HindIII) or with more frequent cutters
o Then, the sticky ends of the cross-linked DNA fragments are religated
under diluted conditions to promote intramolecular ligations (i.e. between
cross-linked fragments)
o DNA fragments that are far away on the linear template, but that
colocalize in space (e.g. enhancers and promoters) can be ligated to each
other
o In this way, a one-dimensional linear DNA segment serves as a template
of the 3D nuclear structure
To establish the 3D conformation of a locus or chromosome, one must measure the
number of ligation events between non-neighboring sites
o In 3C, this is done by quantitative or semiquantitative amplification of
selected ligation junctions
o Primers are designed near and toward the ends of all restriction fragments
of interest and the amplification efficiencies of different primer
combinations are compared in a matrix of ligation frequencies that serve
as proxies for pairwise interaction frequencies
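The matrix of ligation frequencies described above is symmetric, since a contact between fragments i and j is the same as one between j and i. A minimal sketch of building it from pairwise measurements and reading off an anchor's strongest partner; the function names and frequency values are hypothetical.

```python
def frequency_matrix(n_fragments: int,
                     measurements: dict[tuple[int, int], float]) -> list[list[float]]:
    """Symmetric matrix of ligation frequencies; unmeasured pairs stay at 0.0."""
    matrix = [[0.0] * n_fragments for _ in range(n_fragments)]
    for (i, j), value in measurements.items():
        matrix[i][j] = value
        matrix[j][i] = value  # interactions are unordered pairs
    return matrix


def strongest_partner(matrix: list[list[float]], anchor: int) -> int:
    """Fragment (other than the anchor itself) with the highest ligation frequency."""
    others = [j for j in range(len(matrix)) if j != anchor]
    return max(others, key=lambda j: matrix[anchor][j])


# Hypothetical normalized frequencies for four restriction fragments.
freqs = {(0, 1): 0.8, (0, 2): 0.1, (0, 3): 0.6, (1, 2): 0.5, (1, 3): 0.2, (2, 3): 0.7}
m = frequency_matrix(4, freqs)
print(m[3][0])                  # 0.6
print(strongest_partner(m, 0))  # 1
```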
In the original study by Dekker et al. (2002), this matrix was used to determine
the average 3D conformation of yeast chromosome III, showing that it forms a
contorted ring
o The method was then adapted for mammalian systems to show that
chromatin loops exist in vivo between regulatory DNA elements and their
target genes via studies of the beta-globin locus
3C can also pick up enhancers that were not previously known to regulate a gene
o A survey of the spatial environment of the CFTR gene identified a number of
cell type-specific DNA-DNA interactions
o Some of these sequences showed enhancer activity in a reporter assay,
suggesting that they may activate CFTR expression
Figure 2 shows how enhancer looping at the mouse beta-globin locus was
demonstrated
o The relative cross-linking frequencies of several sequences were found to be
particularly high in fetal liver cells, where the gene is expressed, while the
cross-linking frequencies were uniformly low in fetal brain, where the
gene is silent
o This suggests that the sequences with high cross-linking frequency are
regulatory in nature (and, in particular, are enhancers)
Enhancer activity on gene expression can be blocked by insulator sequences, which
are bound by proteins such as CTCF; 3C technology has been used to demonstrate that
the function of certain insulators depends on spatial organization
o CTCF sites form chromatin loops by contacting each other in the beta-globin locus
o CTCF also recruits additional factors such as cohesin, which may facilitate DNA
loop formation
A number of recent studies have pointed to the existence of loops between the start
and end of a gene
o 3C experiments in mouse liver cells showed that ribosomal DNA
promoters have an increased propensity to interact with terminator
sequences and that these loops are associated with increased rDNA
expression
o A mechanistic explanation is that gene looping facilitates reloading of
RNA polymerase and thereby increases expression throughput
o In yeast, loops form on genes when they are active or poised, but not when
they are repressed
- Technical issues that arise when interpreting 3C data
o Sequences that are near each other on the linear chromosome are also close
in space, so sequences within hundreds of kilobases of the anchor frequently
cross-link and ligate to it, independently of the chromatin’s specific 3D
conformation
o To appreciate loops visualized by 3C-based technologies, one needs to
find the anchor interacting with a distant sequence more frequently than
with intervening sequences
o Therefore, 3C methods intrinsically rely on quantitative rather than
qualitative measurements, using qPCR to quantify a given ligation junction
o At most alleles, cross-linking will result in larger chromatin aggregates
with many DNA fragments together within which all DNA ends compete
with each other for ligation to the anchor fragment
o Therefore, even a very stable enhancer-promoter interaction will only
occasionally result in the corresponding ligation junction and, since every
diploid cell contributes at most two ligation junctions of interest,
3C PCR requires faithful and quantitative amplification of very rare
ligation junctions from many genome equivalents
o Consequently, 3C-qPCR is notoriously difficult and requires strict controls
and careful experimental design
o The advent of genome-scale methods such as microarrays and high-throughput
sequencing has enabled the development of more unbiased methods that offer a
solution for assessing the relative abundance of long-range DNA-DNA contacts
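The rarity argument above can be made concrete with back-of-the-envelope arithmetic: template mass gives the number of genome equivalents, and each diploid cell contributes at most two copies of a given junction. Assumptions: about 6.6 pg per diploid human genome (an approximate figure), and a 1% chance per allele that the interaction yields this particular ligation junction, which is purely hypothetical; the function name is also hypothetical.

```python
def junction_copies(template_ng: float, diploid_genome_pg: float = 6.6,
                    ligation_probability: float = 0.01) -> tuple[float, float, float]:
    """Rough count of how many copies of one ligation junction a 3C template contains.

    diploid_genome_pg: approximate mass of one diploid human genome (assumption).
    ligation_probability: hypothetical fraction of alleles in which the interaction
    actually produces this particular ligation junction.
    """
    cells = template_ng * 1000 / diploid_genome_pg  # 1 ng = 1000 pg
    max_junctions = 2 * cells                       # at most two alleles per diploid cell
    expected = max_junctions * ligation_probability
    return cells, max_junctions, expected


cells, max_j, expected = junction_copies(100)
print(round(cells), round(max_j), round(expected))  # 15152 30303 303
```

Under these assumptions, 100 ng of template holds only a few hundred copies of the junction of interest, which is why faithful quantitative amplification is demanding.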
Chromosome conformation capture-on-chip (4C) technology
- 4C-seq uses next-generation sequencing (NGS) to analyze contacting sequences, much as
the original 4C used microarrays to analyze the contacts of a selected genomic site with
all of the genomic fragments on the array
- It is a “one versus all” strategy because a single viewpoint is defined and the genome
is screened for sequences that contact this selected site
- In 4C technology, the ligated 3C template is processed with a second round of DNA
digestion and ligation to create small DNA circles (some of which contain 3C ligation
junctions)
o Using viewpoint-specific primers (that bind to our sequence of interest),
inverse PCR specifically amplifies all sequences contacting this
chromosomal site, and the products can be analyzed by microarrays or NGS
o The latter is cheaper and enables more accurate quantification of DNA
interaction frequencies and has a larger dynamic range
o The idea is that the viewpoint-specific primers also contain the Illumina
sequencing primers so that the PCR products can be sequenced without
further processing
o The reads contain the primer and the ligation junction, and after trimming
the primer sequence, the remainder of the reads are aligned to the genome
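The read-processing step above (check for the viewpoint primer, trim it, keep the remainder for alignment) can be sketched as follows. Assumptions: reads are plain strings, the primer sequence and reads are hypothetical, and the minimum remainder length is an arbitrary illustrative threshold; the alignment itself is not shown.

```python
from typing import Iterable, Iterator, Optional


def trim_viewpoint(read: str, primer: str) -> Optional[str]:
    """Remove the viewpoint primer from the start of a read; None if the read lacks it."""
    if not read.startswith(primer):
        return None  # not a product of this viewpoint's primers
    return read[len(primer):]


def captured_sequences(reads: Iterable[str], primer: str, min_length: int = 20) -> Iterator[str]:
    """Yield the part of each read that would be aligned to the genome."""
    for read in reads:
        rest = trim_viewpoint(read, primer)
        # Very short remainders map ambiguously, so they are dropped (threshold is an assumption).
        if rest is not None and len(rest) >= min_length:
            yield rest


primer = "CTGAAGCTT"  # hypothetical viewpoint primer ending in a HindIII site
reads = ["CTGAAGCTTGGACCTTAG", "CTGAAGCTTAC", "TTTTTTTTTTTTTTTTTT"]
print(list(captured_sequences(reads, primer, min_length=5)))  # ['GGACCTTAG']
```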
4C was first used to investigate the DNA interaction profiles of a tissue-specific gene
embedded in an inactive chromosomal region (beta-globin) and a housekeeping gene
(Rad23a) present in an active gene-rich region
o Rad23a made contacts with active regions on its own chromosome and on
other chromosomes that were largely conserved between both tissues
o The tissue-specific gene, however, made contacts with other active regions
in erythroid cells, while in fetal brains (where it is not expressed), inactive
regions were contacted
4C studies have also shown that coregulated genes preferentially meet at dedicated
transcription sites in the nucleus, implying that genes dynamically move to specific
nuclear locations for transcription, rather than the transcription machinery moving to
genes
Another 4C study focused on dosage compensation of the mammalian X chromosome
o Using an allele-specific 4C strategy, it was shown that the active and inactive
X chromosomes adopt distinct topologies, and that the noncoding RNA Xist,
which drives X inactivation, is not required to maintain gene silencing
4C technology is the preferred approach to assess the DNA contact profile of individual
genomic sites, but lack of resolution limits it to describing long-range contacts with
larger regions elsewhere on the same chromosome (in cis) or on other chromosomes (in
trans), rather than local interactions such as those between a gene and its enhancer
50 kb away
Most 4C strategies use restriction enzymes with a 6-nucleotide recognition sequence
that cut once every few kilobases, creating fragments that are much larger than the
average regulatory sequence (typically less than several hundred base pairs)
Using more frequent cutters (i.e. those that recognize 4-nt sequences) can potentially
pick up more local interactions
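The fragment sizes above follow from recognition-site length: in random sequence with equal base frequencies, a site of n bases occurs about once every 4^n bp, so roughly every 4 kb for a 6-cutter and every 256 bp for a 4-cutter. Real genomes deviate from this, so fragment sizes vary locally. A minimal in silico digest sketch, assuming complete digestion of a linear sequence and scanning only the top strand; the function names and example sequences are hypothetical.

```python
import re


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Lengths of the fragments produced by fully digesting a linear sequence.

    site: recognition sequence, e.g. "AAGCTT" (HindIII, 6 bp) or "GATC" (DpnII, 4 bp).
    cut_offset: cut position within the site on the top strand (HindIII cuts A^AGCTT -> 1).
    Scanning one strand is enough here because both example sites are palindromic.
    """
    seq = sequence.upper()
    # The lookahead also reports overlapping occurrences of the site.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def expected_spacing(site_length: int) -> int:
    """Average bp between sites in random sequence with equal base frequencies."""
    return 4 ** site_length


print(fragment_lengths("TTTAAGCTTGGGGAAGCTTCC", "AAGCTT", 1))  # [4, 10, 7]
print(fragment_lengths("AAGATCTTGATCAA", "GATC", 0))           # [2, 6, 6]
print(expected_spacing(6), expected_spacing(4))                  # 4096 256
```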
Chromosome conformation capture carbon copy (5C technology)
- “Many versus many” technology allowing concurrent determination of interactions
between multiple sequences
- In 5C, the 3C template is hybridized to a mix of oligonucleotides, each of which
partially overlaps a different restriction site in the genomic region of interest
o Pairs of oligonucleotides that correspond to interacting fragments are
juxtaposed on the 3C template and can be ligated together
o Since all 5C oligos carry one of two universal primer sequences at their 5’
ends, all ligation products can subsequently be amplified simultaneously
in a multiplex PCR reaction and then analyzed through high-throughput
sequencing
- The resolution of the technique is determined by the spacing between neighboring
oligonucleotides on the linear chromosome template
- It can never reach the resolution of 4C, as not every unique end of a restriction
fragment will allow the design of a 5C oligonucleotide
- However, it provides a matrix of interaction frequencies for many pairs of sites
allowing the reconstruction of the average 3D conformation of larger genomic regions
Interpretation of chromosome capture experiments: further considerations
- The resolution of all 3C-based methods is limited by the choice of the first restriction
enzyme
- For a six-cutter like HindIII, there are ~800,000 HindIII sites in the mouse
genome, so the average resolution across the genome will be ~4 kb
- Local distribution of restriction sites can vary between different genomic regions,
resulting in different resolutions at different genomic locations
- An additional factor that can influence the results is the presence of repeats in the
genome
o For 3C, this is a relatively minor problem because one can be quite
flexible in the selection of primers for PCR
o For NGS-based methods, this can be more challenging, especially in
4C-seq and 5C, which rely on the sequence directly adjacent to the
restriction site
o This can be partially circumvented by increasing the length of the
sequencing reads, which gives higher mapping specificity
- A key characteristic of 3C-based methods is the very high capture probability
between neighboring fragments, due to their close spatial proximity
o Moving further away from a given fragment leads to a steep (roughly
power-law) decrease of the capture probability until it reaches a baseline level
o The rapid decline in contact probability makes it so that specific
ligation junctions between two given sites far apart on the
chromosome will be rare
o This makes 3C unsuitable for the analysis of long-range contacts
o For far cis and trans DNA contacts, 3C and Hi-C data sets are not
reproducible at the single-fragment resolution, but are highly
reproducible over genomic windows
o When a long-range interaction within or between chromosomes is
described, this is often a statistical definition, meaning two regions
have a higher probability for making contacts compared with other
regions at a similar distance on the same chromosome
- A further issue to consider is the number of contacts a given gene appears to have
o In 4C, a single locus can be engaged in tens or hundreds of contacts
(depending on the threshold)
o These contacts are collected from many cells and will not all be
present in the same cell, implying that the large number of contacts
reflects cell-to-cell differences in genome topology
o After mitosis, each chromosome probably adopts one of a limited
number of energetically favored conformations that will position a
given gene next to a few other genes
- The dynamics of chromatin structure and cell-to-cell variation cannot be captured
by 3C-based methods, which also cannot determine whether two different
interactions of A with B and C occur simultaneously or sequentially and/or
whether they are mutually exclusive
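The statistical definition of a long-range interaction given above (a higher contact probability than other regions at a similar distance) can be sketched by averaging contact counts at each linear separation to get a distance-dependent expected value, then dividing each observed count by it. This observed/expected normalization is assumed here for illustration; the matrix and function name are hypothetical, and no other biases are corrected.

```python
def observed_over_expected(matrix: list[list[float]]) -> list[list[float]]:
    """Divide each contact count by the mean count at the same linear separation.

    Values above 1 mark pairs that contact more often than other pairs the same
    distance apart; values below 1 mark pairs that contact less often.
    """
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [[matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)]
            for i in range(n)]


# Hypothetical symmetric contact counts for four consecutive fragments.
counts = [
    [10, 4, 1, 1],
    [4, 10, 4, 3],
    [1, 4, 10, 4],
    [1, 3, 4, 10],
]
oe = observed_over_expected(counts)
print(oe[1][3], oe[0][2])  # 1.5 0.5
```

Fragments 1 and 3 have the same raw separation as fragments 0 and 2, but three times the contact count, so only the former pair stands out after normalization.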
Original 3C method: Dekker et al. (2002)
Capturing Chromosome Conformation
- Analysis of chromosome conformation is complicated by technical limitations
- Electron microscopy, while affording high resolution, cannot be applied to studies of
specific loci
- Light microscopy does not offer high enough resolution
- DNA binding proteins fused with GFP allow visualization of individual loci, but do
not permit a multi-loci analysis
- Intact nuclei are isolated and subjected to formaldehyde fixation, which cross-links
proteins to other proteins and to DNA
o The overall result is cross-linking of physically touching segments
throughout the genome via contacts between their DNA-bound proteins
o The relative frequencies with which different sites have become crosslinked are then determined
 This is done using restriction enzymes to cut around the areas of
the DNA that are cross-linked and then using DNA ligase to seal
these nearby segments together
 Very low concentrations of DNA are used to favor ligation of
cross-linked fragments, which is intramolecular, over ligation of
random fragments, which is intermolecular
 The cross-links are reversed, and the resulting linear DNA
fragment with a restriction site in the middle is quantified using
qPCR
 A control template is generated in which all possible ligation
products are present in equal abundance
- Explanation of Figure 1
o When using primers 5 and 6, which are centered around the central
restriction site, greater qPCR amplification was observed, compared to
using primers 6 and 13, where 13 is located much farther from the
restriction site
o Ligation product formation increased linearly with formaldehyde
concentration
- In general, cross-linking frequency (X) decreases with increasing separation distance
in kb along chromosome III
- Centromere relationships were probed by analyzing the frequencies with which the
centromere of chromosome IV (CEN4) became cross-linked to each of 10 sites along
the length of chromosome III
o In premeiotic cells, CEN4 interacted strongly with CEN3
Oncogene-mediated alterations in chromatin conformation (2012)
http://www.pnas.org/content/109/23/9083.short
Abstract
- They performed unbiased high-resolution mapping of intra- and interchromosome
interactions upon overexpression of ERG, an oncogenic transcription factor frequently
overexpressed in prostate cancer
- Through integration of data from genome-wide chromosome conformation capture
(Hi-C), ERG binding and gene expression, they demonstrate that oncogenic
transcription factor overexpression is associated with global, reproducible and
functionally coherent changes in chromatin organization
- The findings have broad implications, since genomic alterations in other cancer types
frequently give rise to aberrant transcription factor expression, which can alter the 3D
topology of chromatin
Topological domains in mammalian genomes identified by analysis of chromatin
interactions (2012)
http://www.nature.com/nature/journal/v485/n7398/abs/nature11082.html
- They investigate the three-dimensional organization of the human and mouse genomes
in embryonic stem cells and terminally differentiated cell types at unprecedented
resolution
- They identify large, megabase-sized local chromatin interaction domains, which they term
‘topological domains’, as a pervasive structural feature of genome organization
- These domains correlate with regions of the genome that constrain the spread of
heterochromatin
o Stable across different cell types and highly conserved across species,
indicating that topological domains are an inherent property of
mammalian genomes
- They found that the boundaries of topological domains are enriched for the insulator
binding protein CTCF, housekeeping genes, and SINEs, indicating that these factors may have
a role in establishing the topological domain structure of the genome
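Topological domains are called from Hi-C contact matrices. One approach described by Dixon et al. is a directionality index (DI) that asks, for each bin, whether it contacts upstream or downstream regions more; moving along the chromosome, the DI flips from upstream bias (negative) to downstream bias (positive) at a domain boundary. The notes above do not describe the method, so this is a sketch of the formula as we understand it, under simplifying assumptions: a tiny hypothetical matrix, a window of one bin, and no statistical segmentation of the resulting track.

```python
def directionality_index(upstream: float, downstream: float) -> float:
    """Directionality index for one bin.

    upstream (A) / downstream (B): contacts the bin makes within a fixed window
    on either side. Positive values mean a downstream bias.
    """
    a, b = upstream, downstream
    if a == b:
        return 0.0  # no bias (also avoids dividing by zero when both are 0)
    e = (a + b) / 2  # expected count if there were no directional bias
    sign = 1.0 if b > a else -1.0
    return sign * ((a - e) ** 2 / e + (b - e) ** 2 / e)


def di_track(matrix: list[list[float]], window: int) -> list[float]:
    """DI for every bin of a symmetric contact matrix, using `window` bins on each side."""
    n = len(matrix)
    track = []
    for i in range(n):
        a = sum(matrix[i][j] for j in range(max(0, i - window), i))
        b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        track.append(directionality_index(a, b))
    return track


# Hypothetical matrix with two 2-bin domains (bins 0-1 and bins 2-3).
toy = [
    [0, 9, 1, 1],
    [9, 0, 1, 1],
    [1, 1, 0, 9],
    [1, 1, 9, 0],
]
print(directionality_index(10, 30), directionality_index(30, 10))  # 10.0 -10.0
print([round(x, 1) for x in di_track(toy, window=1)])              # [9.0, -6.4, 6.4, -9.0]
```

The sign change between bins 1 and 2 marks the boundary between the two toy domains.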
https://books.google.ca/books?id=uKDHBgAAQBAJ&pg=PA58&lpg=PA58&dq=can+cross-linked+DNA+be+cleaved+with+restriction+enzymes&source=bl&ots=MH7Bo3OV2&sig=UKKuN7xNWSdiWW_jObicAxkeTWA&hl=en&sa=X&ved=0CDAQ6AEwA2oVChMI46ffibSNyAIVVpyICh26UgtZ#v=onepage&q=can%20crosslinked%20DNA%20be%20cleaved%20with%20restriction%20enzymes&f=false
Print out copies of the question!
Make question by the end of the day and send to everybody!
Read the write-up and add contributions if you need to!