Giacomo Cavalli
In vivo protein-DNA interactions
UE Méthodologie, 11 April 2014
Institut de Génétique Humaine, CNRS, Montpellier, France

DNA compaction in the nucleus
[Figure: levels of DNA compaction, with scale labels 1 bp (0.3 nm), 11 nm, 30 nm and 10,000 nm]

Level: description; compact size; DNA length; compaction
nucleus (human): 2 x 23 = 46 chromosomes, 92 DNA molecules; 10 µm ball; 12,000 Mbp, 4 m DNA; 400,000 x
mitotic chromosome: 2 chromatids, 1 µm thick, 2 DNA molecules; 10 µm long X; 2 x 130 Mbp, 2 x 43 mm DNA; 10,000 x
DNA domain: anchored DNA loop, 1 replicon?; 60 nm x 0.5 µm; 60 kbp, 20 µm DNA; 35 x
chromatin fiber: approx. 6 nucleosomes per 'turn' of 11 nm; 30 nm diameter; 1200 bp, 400 nm DNA; 35 x
nucleosome: disk, 1¾ turns of DNA (146 bp) + linker DNA; 6 x 11 nm; 200 bp, 66 nm DNA; 6 - 11 x
base pair: 0.33 x 1.1 nm; 1 bp, 0.33 nm DNA; 1 x

Compaction of DNA by histones
Compaction by higher order determinants

Implication of PcG proteins in dynamic gene regulation
[Figure labels: Cellular Memory System; PcG proteins; HOX genes; Signaling genes; Cell cycle genes; TFs of developmental networks; Differentiation; Cell fate determination; Proliferation; Cell cycle control; Developmental pathways; Stem cell plasticity; Cancer development]

Schematic mechanism of Polycomb mediated silencing
[Figure, built up over four slides: PcG complexes bound at a PRE act on the H3/H2A nucleosomes of the target gene]
Core PRC2 (E(z), Esc, Su(z)12, Nurf-55): histone methyltransferase, H3 K27 Me3
Core PRC1 (Psc, Ph, dRing, Pc): Ub E3 ligase, H2A K119 Ub
ATP-dependent chromatin remodeling

PcG and trxG proteins associate with multiple genomic loci
Polytene chromosome staining shows around 100 bands for each PcG protein

Genome-wide identification of downstream PcG target genes
"ChIP-on-chip" approach

The ChIP on chip approach
ChIP:
• Cross-link chromatin
• Produce soluble chromatin
• IP step (protein IP and control IP)
• Produce fluorescently labeled probes
• Hybridize to the DNA chip
• Obtain the profile
DNA chip:
• 1st generation microarrays: produce 2 kb PCR fragments of overlapping genomic DNA
• 2nd generation microarrays: whole genome coverage with 1,000,000 long oligonucleotides, i.e. 1 oligo per 120 bp of euchromatin

Dynamic function of Polycomb proteins and cell proliferation
[Figure: ChIP on chip profiles, scale bar 200 kb. Embryos (Schuettengruber et al. 2009): H3K27me3, PC, PH. S2 cells (Schwartz et al. 2006): H3K27me3, PC, Psc]
http://www.purl.org/NET/polycomb

ChIP on chip validation
ChIP on chip data are compared with chromatin profiles obtained using an independent technology called DamID.
In DamID, the chromatin protein of interest is fused to the bacterial Dam methylase and the construct is transfected into the cells of interest. The protein of interest drives the Dam partner to its targets, and the methylase puts a methyl mark on the "A" of GATC sequences.
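As a rough illustration of the GATC rule described above, the sketch below lists the positions of the adenine in every GATC site of a sequence, i.e. the bases a tethered Dam methylase could mark. The function name and the example sequence are invented for this illustration and are not part of the lecture material; only the top strand is reported, although GATC is palindromic and the opposite strand carries a target adenine as well.

```python
def gatc_adenine_positions(seq):
    """Return 0-based positions of the A in every GATC site of seq (top strand)."""
    seq = seq.upper()
    positions = []
    start = seq.find("GATC")
    while start != -1:
        positions.append(start + 1)  # the A is the second base of GATC
        start = seq.find("GATC", start + 1)
    return positions


# Hypothetical sequence, for illustration only
example = "TTGATCAAGGATCGATCC"
print(gatc_adenine_positions(example))
```

In a DamID experiment only the GATC sites close to where the fusion protein binds are expected to be methylated; the code simply enumerates the candidate sites.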
Methylated DNA is then isolated and hybridized onto the microarrays of interest.

Correspondence between ChIP on chip and DamID data

PcG target genes regulate genes at multiple layers of transcriptional cascades
[Figure: pie charts per gene class, legend "No target" / "PcG target"]
Maternal genes (N=53): 26.4% / 73.6%
Gap genes (N=13): 23.1% / 76.9%
Pair-rule genes (N=11): 27.3% / 72.7%
Segment polarity genes (N=54): 40.7% / 59.3%
Homeotic genes (N=8): 100%
Direct Hox gene targets (N=21): 52.4% / 47.6%

Eye specification
[Figure: RDGN gene network, fly gene / vertebrate homolog: toy / PAX6; ey / PAX6; eyg / PAX6(5A); eya / EYA1-4; so / SIX1/2; Optix / SIX3/6; shf / WIF1; dac / DACH1-2]
Signaling pathways interacting with RDGN genes:
[Table as extracted; trailing digits on fly gene names are superscript annotations from the slide]
FLY: toy2, ey1, eyg (toe)1, Optix1, shf, eya2, so1, dac1, hh1, dpp1
MOUSE: Pax6, Pax6, Six6, Wif1, Eya1-4, Six1, Dach1, Shh, Bmp2
HUMAN: PAX6, PAX6, SIX6, WIF1, EYA1-4, SIX1, DACH1, SHH, BMP2
Additional factors involved in eye development (fly / mouse / human):
oc1 / Otx1 / OTX1
ato3 / Atoh1-8 / ATOH1-8
tsh2
bi1 / Tbx2 / TBX2

The evolution of ChIP: massive sequencing of the immunoprecipitated chromatin DNA
[Figure: H3K36 ChIP DNA, size labels 100 bp and 1 kb+]

ChIP-Seq library construction
• Start from ~5-10 ng of DNA
• Polish ends
• Taq extend (adds an A overhang)
• Ligate Solexa linkers
[Figure: gel with 100-600 bp size ladder]

Illumina sequencing
• Linker-ligated DNA
• Amplify to form clusters
• Sequence one base at a time (laser; A, C, G, T)
Flow cell imaging by microscopy, 60 X objective: thousands to hundreds of thousands of TIFF images per run.

Chromatin Immunoprecipitation Tag Sequencing
After the sequences are obtained, they are positioned on the genome by automated algorithms (like BLAST but quicker), and each tag is thus assigned its position on the genome.
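Once every tag has a genomic position, a profile is essentially a count of tags per genomic bin. The sketch below shows that counting step only, under the assumption that the tags are already mapped; the tag coordinates, the bin size and the function name are invented for illustration, and real analyses rely on dedicated alignment and peak-calling tools.

```python
from collections import Counter


def tag_profile(tag_positions, bin_size):
    """Count mapped tags per fixed-size bin along one chromosome."""
    counts = Counter(pos // bin_size for pos in tag_positions)
    if not counts:
        return []
    n_bins = max(counts) + 1
    # Bins without any tag get an explicit zero so the profile is contiguous
    return [counts.get(b, 0) for b in range(n_bins)]


# Hypothetical tag start coordinates (bp) on a single chromosome
tags = [12, 95, 101, 130, 180, 199, 420]
print(tag_profile(tags, bin_size=100))
```

A control sample binned the same way gives the denominator for an enrichment ratio, which is what makes the result comparable to a ChIP on chip profile.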
These profiles can then be quantified and analyzed just like normal ChIP on chip profiles.

Identification of new PcG target genes
• PcG targets (PC/PH/H3K27me3)
[Figure: PH, PC and H3K27me3 profiles in embryos and in eye discs around fd96Ca, fd96Cb, danr and dan, strands (+) and (-); panel labels "maintained" and "New domain"; numeric labels as extracted: 305, 181, 145, 350, 275, 353]
Anna Delest

PRE position is highly conserved in Drosophila species
D. melanogaster vs D. yakuba
[Figure: paired Mel / Yak profiles for PH, PC, K27, PHO, DSP1 and K4 around Wnt4 and wg]
→ species-specific differences can be used to study PRE sequence features
Bernd Schüttengruber

• Exploiting in vivo protein-DNA interactions to learn about the three-dimensional conformation of chromatin

PREs are sometimes located at positions overlapping the proximal gene promoter, but in other instances they can be tens of kilobases away from it. How can PcG proteins repress transcription in all these cases?
[Figure: PcG proteins at PREs located 28 kb and > 30 kb from the promoter. Mechanisms?]

"SPREADING versus LOOPING"
Two models have been proposed to explain how PcG proteins repress their target genes:
1. They might spread from the PRE into flanking chromatin, covering the whole domain including the target promoter.
2. Alternatively, they might reach the promoter via direct looping of the PRE and establishment of protein-protein contacts.
[Figure: PcG proteins in the "SPREADING" and "LOOPING" configurations around a PRE]

Interestingly, at some endogenous target genes, PREs are located at a very large distance from the promoter and are flanked by elements called "chromatin insulators".

Insulators
• Insulators are divided into three classes depending on their abilities:
  Enhancer blockers
  Chromatin boundaries
  Insulators that can be "bypassed"
• One insulator can have many of these properties
[Figure: arrangements of enhancers (En.), insulators (Ins.) and genes for each class]

The gypsy insulator
● DNA element isolated from the Drosophila gypsy retrotransposon
● This sequence contains 12 binding sites for the Su(Hw) protein, which is required for insulator function
Insulator bypass model
[Figure: two insulators (Ins.) placed between an enhancer and a gene]

Model of nuclear chromosomal architecture based on insulator interactions
[Figure: Domain A and Domain B separated by insulating proteins]
Gerasimova et al, Mol. Cell, 2000

Bypass of the gypsy insulator by a PRE
[Figure: transgene constructs combining yellow, PRE, enhancer, one or two insulators and white; expression of white is scored by eye color: red, brown, orange, yellow, white]
Yes! The PRE can bypass 2 insulators

ChIP analysis of the molecular landmarks of insulator bypass
[Figure: PC and PH ChIP at pupal stage across the yellow - PRE - Insulator - white - Insulator construct; y-axis: fold enrichment, 0 to 35; scale bar 1 kb]
● PcG proteins bound to the PRE can reach a downstream promoter without coating an insulated chromatin domain
● PcG proteins are able to spread from a PRE into a neighboring region of several kb. This spreading is blocked by one insulator
● Two insulators build a chromatin domain fully shielded from invasion by PcG proteins

The data shown before provide good evidence for a spreading process.
Can we get direct evidence for looping?

Chromosome Conformation Capture (3C) technology
3C technology converts chromosomal interaction events into DNA ligation events that can be analyzed by PCR.

Main steps of 3C technology
Starting from biological material:
1. Formaldehyde-fixed nuclei preparation
2. Chromatin digestion
3. Extensive dilution
4. Ligation
5. DNA purification and quantitative PCR analysis

Two gypsy insulators build a chromatin loop
[Figure: H3C, distal gypsy insulator anchor, (P)(S)YSW-22E lines, adult. y-axis: interaction level in percentage of input, 0.0% to 0.5%; x-axis: -15 kb to +15 kb. Annotated features include PRE, CG4238-RF, Prox. yellow Ins., Nplp4-RA and Dist. mini-white Ins.]
[Figure labels, continued: CG33543-RC, CG15353-RA, (P)(S)YSW transposon, tRNA:CR31940-RA, tRNA:CR31669-RA, tRNA:CR31939-RA, tRNA:CR31943-RA, tRNA:CR31944-RA, Anchor; x-axis ends at +15 kb]

In summary, both spreading and looping models could be correct, each one accounting for a particular context
[Figure: yellow - PRE - mini-white constructs with proximal and distal insulators (Prox. Ins., Dist. Ins.) and PcG proteins]
PRE close to its target promoter: "SPREADING"
PRE distant from its target promoter: "LOOPING"
Comet et al, Dev. Cell 2006

How PcG proteins and insulators might work in the cell nucleus
[Figure: nucleus with PcG bodies, insulator bodies and insulator-binding protein complexes]

High-resolution 3C is appropriate to study chromatin conformation

Analysis of Hox gene contacts by 4C
• We developed a new 4C method based on "biotinylated primer extension"
• The amplified material is then hybridized to a microarray (Roche Nimblegen)
• We used as a bait the Fab-7 PRE sequence, which negatively regulates the Abd-B gene in the BX-C
[Figure: biotinylated primer captured on a streptavidin bead; GGGGG / CCCCC tail]
Itys COMET

Modification and control of the 4C procedure
Starting from a 3C preparation (anchor fragment ligated to an unknown partner):
1. Biotinylated-primer extension
2. Affinity purification on streptavidin beads
3. "In situ" linker synthesis
4. Quantitative amplification by real-time PCR
5. Genomic DNA-chip hybridization
[Figure: bar charts of copy number ratio (2 Ins. / 1 Ins.), y-axis 0 to 20, for 3C, 4C input before amplification, 4C before amplification and 4C after amplification]
[Figure: denaturing 4% agarose gel, ladder 100 bp to 1 kb, showing unknown partners ligated to the anchor fragment, the anchor fragment, and primer dimers]

4C data analysis by generation of domainograms
Normalized profile intensities for each probe i are transformed into rank-based scores Qi, which are combined into multiscale scores Siw and transformed into probabilities Piw using Fisher's chi-square law.
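The transformation described above can be sketched with Fisher's method for combining probabilities: for a window of w probes starting at probe i, Siw = -2 Σ ln(Qj) is compared to a chi-square law with 2w degrees of freedom to give Piw. The slide text does not give the exact formula for Qi, so the rank transform used here, Qi = 1 - (ri - 0.5)/N with ri the ascending rank, is an assumption made for illustration, as are the function names and the example intensities.

```python
import math


def rank_scores(intensities):
    """Rank-based scores Qi in (0, 1); the strongest probe gets the smallest Qi.

    Assumed form (not given in the source): Qi = 1 - (ri - 0.5) / N,
    where ri is the ascending rank of probe i and N the number of probes.
    """
    n = len(intensities)
    order = sorted(range(n), key=lambda i: intensities[i])
    q = [0.0] * n
    for rank, i in enumerate(order, start=1):
        q[i] = 1.0 - (rank - 0.5) / n
    return q


def chi2_sf_even(s, w):
    """Survival function of a chi-square law with 2*w degrees of freedom."""
    half = s / 2.0
    term, total = 1.0, 1.0
    for k in range(1, w):
        term *= half / k
        total += term
    return math.exp(-half) * total


def domainogram(intensities, max_w):
    """Return {w: [Piw for each window start i]} for window sizes 1..max_w."""
    log_q = [math.log(x) for x in rank_scores(intensities)]
    rows = {}
    for w in range(1, max_w + 1):
        row = []
        for i in range(len(log_q) - w + 1):
            s_iw = -2.0 * sum(log_q[i:i + w])  # Fisher's combined score
            row.append(chi2_sf_even(s_iw, w))
        rows[w] = row
    return rows


# Hypothetical normalized intensities with a three-probe enriched domain
signal = [1.0, 2.0, 9.0, 10.0, 8.0, 3.0, 1.5, 2.5]
for w, row in domainogram(signal, 3).items():
    print(w, ["%.3g" % p for p in row])
```

At w = 1 the probability is simply Qi, while at larger w a run of high-ranking probes yields a much smaller Piw than any single probe, which is what lets a domainogram display enrichment as a function of domain size.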
[Figure: domainogram construction, from probabilities at scale = 1 probe, through scale = 3 probes, up to Piw at scale = N probes]
Legend: N is the total number of probes; ri is the rank of probe i
Benjamin LEBLANC

The Piw values represent probabilities of 4C events as a function of chromatin domain size
[Figure: Piw in false color; vertical axis: log scale of window size, up to N probes]

Major Fab-7 4C hits are Polycomb bound regions
[Figure: 4C domainogram and Polycomb ChIP domainogram along chromosome 3R, scale bar 1 Mb; labeled loci: ANT-C, grn, prospero, hth, E5-ems, Fab-7, BX-C, srp-pnr, ss, NK-C, pnt, Drop; color scale from 10^-1 to 10^-5000]

Simplified Hi-C procedure
• Fix nuclei of 16-18 hr embryos
• Digestion with the 4-cutter DpnII
• Ligation and DNA purification as in 3C
• Sonication and selection of ~800 bp fragments
• Deep paired-end sequencing

Hi-C efficiently reproduces known 3C contacts

Chromatin contact features
2. Matrix diagonal is not homogeneous

Polycomb-mediated interactions
Bantignies et al., 2011

http://www.igh.cnrs.fr/equip/cavalli/link.PolycombTeaching.html

References:
Schüttengruber et al. (2009) PLoS Biol 7(1): e1000013
Comet et al. Dev Cell 11, 117-124 and PNAS 108(6): 2294-9
Bantignies et al. Cell 144, 214-26
Sexton et al. Cell 148, 458-472

Bernd SCHÜTTENGRUBER, Nicolas NEGRE, Benjamin LEBLANC, Anna DELEST, Itys COMET, Tom SEXTON

ERC, EU - 7FP, CNRS, ARC, French ministry of research