Notes on Chromosome Conformation Capture and Related Techniques
https://en.wikipedia.org/wiki/Chromosome_conformation_capture
- 3C is a high-throughput molecular biology technique used to analyze the organization of chromosomes in a cell’s natural state
- Studying this organization is important for understanding and evaluating the regulation of gene expression, DNA replication, repair and recombination
  o One example is chromosomal folding that brings an enhancer and its associated transcription factors into close proximity with a gene, as was first shown at the beta-globin locus
The 3C technique has five experimental steps
- Step 1 Cross-linking: Addition of formaldehyde cross-links DNA segments to proteins and proteins to each other
  o This leads to cross-linking of interacting DNA segments (e.g. enhancers and promoters)
  o A cross-link is a bond that links one polymer chain to another; these bonds can be covalent or ionic
  o For example, disulfide and isopeptide bonds are natural cross-links that occur in organisms
  o With cross-linking agents, protein-protein interactions can be stabilized in a cell’s natural state and then detected, since they are often too weak or transient to be detected on their own
  o Formaldehyde is a common reagent of choice, and its cross-linking effects can be reversed by incubation at 70 °C
- Step 2 Restriction digest: A restriction enzyme is added in excess to the cross-linked DNA, separating the non-cross-linked DNA from the cross-linked chromatin
  o The choice of restriction enzyme depends on the locus being analyzed
  o Frequently cutting enzymes (4 bp recognition sites) are used to study smaller loci (<10-20 kb), while larger loci call for less frequent cutters (6 bp recognition sites)
  o The idea is that the restriction enzyme will not cut within the two interacting sequences (this depends on the choice of restriction enzyme), so that the cross-linked DNA remains intact, while everything
    else is chopped into small pieces
- Step 3 Intramolecular ligation: Using very low DNA concentrations favors ligation between cross-linked DNA fragments over ligation between random fragments
  o The sticky ends left by the restriction enzyme on the cross-linked sequences are joined by DNA ligase (similar to the way Okazaki fragments are joined on the lagging strand during DNA replication) through the formation of two covalent phosphodiester bonds
  o If both ends ligate, this forms a circular fragment around the formaldehyde cross-link; if only one end ligates, the resulting fragment is linear, with a central restriction site at the site of ligation and specific restriction ends
  o However, randomly cross-linked fragments will also be ligated together, due to incomplete restriction digestion, which represents about 20-30% of all junctions. This number can be decreased by reducing the cross-linking stringency in the first step so that only very close DNA segments cross-link (i.e. those that are interacting)
  o Furthermore, one end of a fragment can ligate with the other end of the same fragment (i.e.
    self-circularization), which prevents the relevant cross-linked DNA fragments from ligating to each other and contributes up to 30% of all junctions formed
- Step 4 Reverse cross-links: High temperature reverses the cross-links formed in step 1
  o The resulting linear DNA fragment has specific restriction ends as well as a central restriction site corresponding to the site of ligation
  o The pool of these fragments is collectively referred to as the 3C library
- Step 5 Quantitation: PCR with primers flanking the site of ligation is used to semiquantitatively assess the frequency of a ligation product of interest
  o Quantitative PCR using TaqMan probes (3C-qPCR) provides a more quantitative measurement of the fragment of interest
  o Quantitative PCR monitors the amplification of a targeted DNA molecule in real time, using either sequence-specific probes such as TaqMan or non-specific fluorescent dyes that intercalate with any double-stranded DNA
  o Since relevant ligation products are more abundant than randomly ligated ones, they reach the detection threshold in fewer cycles and so can be picked out using qPCR
Circularized Chromosome Conformation Capture (4C)
- This technique has an advantage over 3C in that only the sequence of one of the sites of interest needs to be known; this fragment is known as the “bait”
- Steps 1-4: These steps are identical to 3C.
  The idea is to produce many copies of one known sequence (for example, a promoter) ligated to one or more unknown sequences (for example, enhancers)
- Step 5a Second restriction digest: After the cross-links are reversed, the fragments are subjected to another round of restriction digestion, this time with a frequent cutter that results in smaller fragments with restriction ends that differ from the central restriction site
- Step 5b Self-circularization: Self-circularization of the DNA fragments is more favored now that they are no longer bound to proteins or other fragments
  o Intramolecular ligation produces the circular fragments that make up the 4C library
- Step 5c Inverse PCR and quantitation: Primers are designed against the outer restriction sites of the “bait” sequence, so that the small unknown captured fragment is amplified
  o Large-scale sequencing can be used to sequence the 4C library; custom microarrays can also be made using probes designed against the regions immediately upstream and downstream of all genomic sites of the restriction enzyme used in step 2
Carbon-Copy Chromosome Conformation Capture (5C)
- The 5C technique extends 3C and allows for the parallel analysis of interactions between many selected loci
- Steps 1-4: Same as in 3C
- Step 5 Ligation-mediated amplification and quantitation
  o Multiplex ligation-mediated amplification (LMA) on the 3C library requires multiplex primers that consist of a universal primer sequence (such as T7 or T3) and a ligation junction sequence (all 3C library fragments should still have the same central restriction site at the ligation junction)
  o The primers anneal to the 3C fragments and are ligated together by a DNA ligase; the ligated primers serve as templates that are amplified to generate the 5C library
  o The use of universal primer sequences means these 5C fragments can be analyzed on microarrays, and the small
    size of the 5C fragments is also compatible with high-throughput sequencing
  o Ligation-dependent multiplex PCR permits multiple targets to be amplified with only a single primer pair. Each probe consists of two oligonucleotides that recognize adjacent sequences in the DNA; when both bind, a ligase joins them. PCR then uses primers that require sequences from both oligonucleotides, so amplification only occurs after ligation
http://www.lcg.unam.mx/frontiers/files/frontiers/deWit_GenesDev2012.pdf
A decade of 3C technologies: insights into nuclear organization
- 3C technologies are based on the remarkably simple idea that digestion and religation of fixed chromatin in cells, followed by the quantification of ligation junctions, allows for the determination of DNA contact frequencies and insight into chromosome topology
- The first step is to establish a representation of the 3D organization of the DNA
  o Chromatin is fixed using a fixative agent, most often formaldehyde
  o Next, the fixed chromatin is cut with a restriction enzyme recognizing 6 bp (such as HindIII) or with more frequent cutters
  o Then, the sticky ends of the cross-linked DNA fragments are religated under dilute conditions to promote intramolecular ligations (i.e. between cross-linked fragments)
  o DNA fragments that are far apart on the linear template, but that colocalize in space (e.g.
    enhancers and promoters) can be ligated to each other
  o In this way, the resulting linear DNA (the 3C template) serves as a one-dimensional record of the 3D nuclear structure
- To establish the 3D conformation of a locus or chromosome, one must measure the number of ligation events between non-neighboring sites
  o In 3C, this is done by quantitative or semiquantitative amplification of selected ligation junctions
  o Primers are designed near and toward the ends of all restriction fragments of interest, and the amplification efficiencies of different primer combinations are assembled into a matrix of ligation frequencies that serve as proxies for pairwise interaction frequencies
- In the original study by Dekker et al. (2002), the average 3D conformation of yeast chromosome III was determined from this matrix, showing that it forms a contorted ring
  o The method was then adapted for mammalian systems to show that chromatin loops exist in vivo between regulatory DNA elements and their target genes, via studies of the beta-globin locus
- With 3C, it is also possible to pick up enhancers that were previously unknown to regulate a gene
  o A survey of the spatial environment of the CFTR gene identified a number of cell type-specific DNA-DNA interactions
  o Some of these sequences showed enhancer activity in a reporter assay, suggesting they may activate CFTR expression
- Figure 2 shows how enhancer looping at the mouse beta-globin locus was demonstrated
  o The relative cross-linking frequencies of several sequences were found to be particularly high in fetal liver cells, where the gene is expressed, while cross-linking frequencies were low everywhere in fetal brain, where the gene is silent
  o This suggests that the sequences with high cross-linking frequency are regulatory in nature (in particular, enhancers)
- Enhancer activity on gene expression can be blocked by insulator sequences
  o They are bound by proteins such as CTCF, and 3C technology has been used to demonstrate that the function of certain insulators is
    dependent on the spatial organization
  o CTCF sites form chromatin loops by contacting each other in the beta-globin locus
  o They also recruit additional factors such as cohesin, which may facilitate DNA loop formation
- A number of recent studies have pointed to the existence of loops between the start and end of a gene
  o 3C experiments in mouse liver cells showed that ribosomal DNA promoters have an increased propensity to interact with terminator sequences and that these loops are associated with increased rDNA expression
  o A mechanistic explanation is that gene looping facilitates reloading of RNA polymerase and thereby increases expression throughput
  o In yeast, loops form on genes when they are active or poised, but not when they are repressed
- Technical issues that arise when interpreting 3C data
  o Any two sequences that are nearby on the linear chromosome are also close in space, so sequences within hundreds of kilobases of the anchor frequently cross-link and ligate to it, independently of the chromatin’s 3D conformation
  o To identify a loop with 3C-based technologies, one needs to find the anchor interacting with a distant sequence more frequently than with the intervening sequences
  o Therefore, 3C methods intrinsically rely on quantitative rather than qualitative measurements, using qPCR for the quantitative detection of a given ligation junction
  o At most alleles, cross-linking results in large chromatin aggregates containing many DNA fragments, within which all DNA ends compete with each other for ligation to the anchor fragment
  o Therefore, even a very stable enhancer-promoter interaction will only occasionally result in the corresponding ligation junction and, since every diploid cell contributes at most two ligation junctions of interest, 3C PCR requires faithful and quantitative amplification of very rare ligation junctions from many genome equivalents
  o Consequently, 3C-qPCR is notoriously difficult and requires strict controls and careful experimental
    design
  o The advent of genome-scale methods such as microarrays and high-throughput sequencing has enabled the development of less biased methods for assessing the relative abundance of long-range DNA-DNA contacts
Chromosome conformation capture-on-chip (4C) technology
- 4C-seq uses next-generation sequencing (NGS) to analyze contacting sequences, in the same way that microarray-based 4C analyzes the contacts of a selected genomic site with all of the genomic fragments on the array
- It is a “one versus all” strategy because a single viewpoint is defined and the genome is screened for sequences that contact this selected site
- In 4C technology, the ligated 3C template is processed with a second round of DNA digestion and ligation to create small DNA circles (some of which contain 3C ligation junctions)
  o Using viewpoint-specific primers (which bind to the sequence of interest), inverse PCR specifically amplifies all sequences contacting this chromosomal site, and the products can be analyzed by microarrays or NGS
  o NGS is cheaper, enables more accurate quantification of DNA interaction frequencies and has a larger dynamic range
  o The viewpoint-specific primers also contain the Illumina sequencing primer sequences, so that the PCR products can be sequenced without further processing
  o The reads contain the primer and the ligation junction; after trimming the primer sequence, the remainder of each read is aligned to the genome
- 4C was first used to investigate the DNA interaction profiles of a tissue-specific gene embedded in an inactive chromosomal region (beta-globin) and a housekeeping gene (Rad23a) present in an active gene-rich region
  o Rad23a made contacts with active regions on its own chromosome and on other chromosomes, and these contacts were largely conserved in both tissues
  o The tissue-specific gene, however, made contacts with other active regions in erythroid cells, while in fetal brain (where it is not expressed), inactive
    regions were contacted
- 4C studies have also shown that coregulated genes preferentially meet at dedicated transcription sites in the nucleus, implying that genes dynamically move to specific nuclear locations for transcription, rather than the transcription machinery moving to genes
- Another 4C study focused on dosage compensation of the mammalian X chromosome
  o Using an allele-specific 4C strategy, it was shown that the active and inactive X chromosomes adopt distinct topologies, and that the noncoding RNA Xist, which drives X inactivation, is not required to maintain gene silencing
- 4C technology is preferred for assessing the DNA contact profile of individual genomic sites, but for lack of resolution it is limited to describing long-range contacts with larger regions elsewhere on the same chromosome (in cis) or on other chromosomes (in trans), rather than local interactions such as those between a gene and an enhancer 50 kb away
- Most 4C strategies use restriction enzymes with a 6-nucleotide recognition sequence that cut once every few kilobases, creating fragments that are much larger than the average regulatory sequence (which is less than several hundred base pairs)
  o Using more frequent cutters (i.e.
    those that recognize 4-nt sequences) can potentially pick up more local interactions
Chromosome conformation capture carbon copy (5C technology)
- “Many versus many” technology allowing concurrent determination of interactions between multiple sequences
- In 5C, the 3C template is hybridized to a mix of oligonucleotides, each of which partially overlaps a different restriction site in the genomic region of interest
  o Pairs of oligonucleotides that correspond to interacting fragments are juxtaposed on the 3C template and can be ligated together
  o Since all 5C oligos carry one of two universal primer sequences at their 5’ ends, all ligation products can subsequently be amplified simultaneously in a multiplex PCR reaction and then analyzed through high-throughput sequencing
- The resolution of the technique is determined by the spacing between neighboring oligonucleotides on the linear chromosome template
- It can never reach the resolution of 4C, as not every unique end of a restriction fragment will allow the design of a 5C oligonucleotide
- However, it provides a matrix of interaction frequencies for many pairs of sites, allowing the reconstruction of the average 3D conformation of larger genomic regions
Interpretation of chromosome capture experiments: further considerations
- The resolution of all 3C-based methods is limited by the choice of the first restriction enzyme
- For a six-cutter like HindIII, there are ~800,000 HindIII sites in the mouse genome, and the average resolution throughout the genome will be ~4 kb.
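As a rough check on the ~4 kb figure, here is a minimal Python sketch of the arithmetic. It assumes a mouse genome of about 2.7 Gb (a value not given in these notes) and, for the expected spacing, that all four bases are equally likely and independent. The function names are made up for illustration.

```python
# Back-of-the-envelope estimate of restriction-site spacing, which sets the
# best-case resolution of a 3C-type experiment. All values are approximate.

def expected_spacing_bp(recognition_length: int) -> int:
    """Expected distance between sites for an n-bp recognition sequence,
    assuming equal and independent base frequencies (a simplification)."""
    return 4 ** recognition_length


def observed_spacing_bp(genome_size_bp: float, n_sites: int) -> float:
    """Average distance between sites given a genome size and a site count."""
    return genome_size_bp / n_sites


MOUSE_GENOME_BP = 2.7e9   # assumption: approximate mouse genome size
HINDIII_SITES = 800_000   # site count quoted in the notes

six_cutter = expected_spacing_bp(6)    # 4**6 = 4096 bp
four_cutter = expected_spacing_bp(4)   # 4**4 = 256 bp
hindiii = observed_spacing_bp(MOUSE_GENOME_BP, HINDIII_SITES)

print(f"6-bp cutter, expected spacing: {six_cutter} bp")
print(f"4-bp cutter, expected spacing: {four_cutter} bp")
print(f"HindIII in mouse, average spacing: {hindiii:.0f} bp")
```

Both the theoretical 6-bp value and the site-count estimate land in the 3-4 kb range, while a 4-bp cutter would in principle give fragments roughly 16 times smaller, which is why frequent cutters are suggested for picking up local interactions.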
- Local distribution of restriction sites can vary between genomic regions, resulting in different resolutions at different genomic locations
- An additional factor that can influence the results is the presence of repeats in the genome
  o For 3C, this is a relatively minor problem because one can be quite flexible in the selection of primers for PCR
  o For NGS-based methods, this can be more challenging, especially in 4C-seq and 5C, which rely on the sequence directly adjacent to the restriction site
  o This can be partially circumvented by increasing the length of the sequencing reads, which gives higher mapping specificity
- A key characteristic of 3C-based methods is the very high capture probability between neighboring fragments, due to their close spatial proximity
  o Moving further away from a given fragment leads to an exponential decrease in capture probability until it reaches a baseline level
  o Because of this rapid decline in contact probability, specific ligation junctions between two given sites far apart on the chromosome will be rare
  o This makes 3C unsuitable for the analysis of long-range contacts
  o For far-cis and trans DNA contacts, 3C and Hi-C data sets are not reproducible at single-fragment resolution, but are highly reproducible over genomic windows
  o When a long-range interaction within or between chromosomes is described, this is often a statistical definition, meaning two regions have a higher probability of making contact compared with other regions at a similar distance on the same chromosome
- A further issue to consider is the number of contacts a given gene appears to have
  o In 4C, a single locus can be engaged in tens or hundreds of contacts (depending on the threshold)
  o These contacts are collected from many cells and will not all be present in the same cell, implying that the large number of contacts reflects cell-to-cell differences in genome topology
  o During mitosis, each chromosome probably adopts one of a
    limited number of energetically favored conformations that will position a given gene next to a few other genes
- The dynamics of chromatin structure and cell-to-cell variation cannot be resolved by 3C-based methods, and it cannot be determined whether two interactions, of A with B and of A with C, occur simultaneously or sequentially, or whether they are mutually exclusive
Original 3C method: Dekker et al. (2002) Capturing Chromosome Conformation
- Analysis of chromosome conformation is complicated by technical limitations
- Electron microscopy, while affording high resolution, cannot be applied to studies of specific loci
- Light microscopy does not offer high enough resolution
- DNA-binding proteins fused with GFP allow visualization of individual loci, but do not permit multi-locus analysis
- Intact nuclei are isolated and subjected to formaldehyde fixation, which cross-links proteins to other proteins and to DNA
  o The overall result is cross-linking of physically touching segments throughout the genome via contacts between their DNA-bound proteins
  o The relative frequencies with which different sites have become cross-linked are then determined
  o This is done by using restriction enzymes to cut the DNA around the cross-linked regions and then using DNA ligase to join the cross-linked segments together
  o Very low concentrations of DNA are used to favor ligation of cross-linked fragments, which is intramolecular, over ligation of random fragments, which is intermolecular
  o The cross-links are reversed, and the resulting linear DNA fragment with a restriction site in the middle is quantified using qPCR
  o A control template is generated in which all possible ligation products are present in equal abundance
- Explanation of Figure 1
  o With primers 5 and 6, which flank the central restriction site, greater qPCR amplification was observed than with primers 6 and 13, where 13 is located much farther from the restriction site
  o Ligation product
    formation increased linearly with formaldehyde concentration
- In general, cross-linking frequency (X) decreases with increasing separation distance (in kb) along chromosome III
- Centromere relationships were probed by analyzing the frequencies with which the centromere of chromosome IV (CEN4) became cross-linked to each of 10 sites along the length of chromosome III
  o In premeiotic cells, CEN4 interacted strongly with CEN3
Oncogene-mediated alterations in chromatin conformation (2012)
http://www.pnas.org/content/109/23/9083.short
Abstract
- They performed unbiased high-resolution mapping of intra- and interchromosome interactions upon overexpression of ERG, an oncogenic transcription factor frequently overexpressed in prostate cancer
- Through integration of data from genome-wide chromosome conformation capture (Hi-C), ERG binding and gene expression, they demonstrate that oncogenic transcription factor overexpression is associated with global, reproducible and functionally coherent changes in chromatin organization
- Broad implications, since genomic alterations in other cancer types frequently give rise to aberrant transcription factor expression, which can alter the 3D topology of chromatin
Topological domains in mammalian genomes identified by analysis of chromatin interactions (2012)
http://www.nature.com/nature/journal/v485/n7398/abs/nature11082.html
- They investigate the three-dimensional organization of the human and mouse genomes in embryonic stem cells and terminally differentiated cell types at unprecedented resolution
- They identify large, megabase-sized local chromatin interaction domains, which they term ‘topological domains’, as a pervasive structural feature of genome organization
- These domains correlate with regions of the genome that constrain the spread of heterochromatin
  o Stable across different cell types and highly conserved across species, indicating that topological domains are an inherent property of mammalian genomes
- They found that the
    boundaries of topological domains are enriched for the insulator-binding protein CTCF, housekeeping genes, and SINEs, indicating that these factors may have a role in establishing the topological domain structure of the genome
- https://books.google.ca/books?id=uKDHBgAAQBAJ&pg=PA58&lpg=PA58&dq=can+cross-linked+DNA+be+cleaved+with+restriction+enzymes&source=bl&ots=MH7Bo3OV2&sig=UKKuN7xNWSdiWW_jObicAxkeTWA&hl=en&sa=X&ved=0CDAQ6AEwA2oVChMI46ffibSNyAIVVpyICh26UgtZ#v=onepage&q=can%20crosslinked%20DNA%20be%20cleaved%20with%20restriction%20enzymes&f=false
Print out copies of the question!
Make question by the end of the day and send to everybody!
Read the write-up and add contributions if you need to!