science.sciencemag.org/cgi/content/full/science.abb0074/DC1

Supplementary Materials for

Mechanisms of OCT4-SOX2 motif readout on nucleosomes

Alicia K. Michael, Ralph S. Grand, Luke Isbel, Simone Cavadini, Zuzanna Kozicka, Georg Kempf, Richard D. Bunker, Andreas D. Schenk, Alexandra Graff-Meyer, Ganesh R. Pathare, Joscha Weiss, Syota Matsumoto, Lukas Burger, Dirk Schübeler*, Nicolas H. Thomä*

*Corresponding author. Email: nicolas.thoma@fmi.ch (N.H.T.); dirk@fmi.ch (D.S.)

Published 23 April 2020 on Science First Release
DOI: 10.1126/science.abb0074

This PDF file includes:
Materials and Methods
Figs. S1 to S17
Table S1
Caption for Movie S1
References

Other Supplementary Material for this manuscript includes the following:
(available at science.sciencemag.org/cgi/content/full/science.abb0074/DC1)
MDAR Reproducibility Checklist (.pdf)
Movie S1 (.mov)

Materials and Methods

Human octamer histones expression, purification and reconstitution
Human histones were expressed and purified as described previously (40). Lyophilized histones were mixed at equimolar ratios in 20 mM Tris-HCl (pH 7.5) buffer, containing 7 M guanidine hydrochloride and 20 mM 2-mercaptoethanol. Samples were dialyzed against 10 mM Tris-HCl (pH 7.5) buffer, containing 2 M NaCl, 1 mM EDTA, and 2 mM 2-mercaptoethanol. The resulting histone complexes were purified by size exclusion chromatography (Superdex 200; GE Healthcare).

DNA preparation
DNA for medium to large scale individual nucleosome purifications was generated by Phusion (Thermo Fisher) PCR amplification (on average 2 × 96-well plates) or large-scale plasmid purification (GigaPrep, Invitrogen) followed by EcoRV-HF (New England Biolabs) blunt-end restriction enzyme cleavage. The resulting DNA fragment was purified on a monoQ column (GE Healthcare). The unmodified 601 Widom sequence was purified with a large-scale plasmid purification using a high copy plasmid containing 32 copies of the 601 sequence previously cloned into pUC19 vector (41, 42).
All purified DNA was concentrated and stored at -20°C in 10 mM Tris-HCl pH 7.5 until use.

Nucleosome assembly
The DNA and the histone octamer complex were mixed in a 1:1.5 molar ratio in the presence of 2 M KCl. The samples were dialyzed against refolding buffer (RB) high (10 mM Tris-HCl (pH 7.5), 2 M KCl, 1 mM EDTA, and 1 mM DTT). The KCl concentration was gradually reduced from 2 M to 0.25 M using a peristaltic pump with RB low (10 mM Tris-HCl (pH 7.5), 250 mM KCl, 1 mM EDTA, and 1 mM DTT) at 4°C. Samples were further dialyzed against RB low buffer at 4°C overnight. After dialysis, nucleosomes were incubated at 55°C for 2 hours. The reconstituted nucleosome pools used for SeEN-seq were then purified by native polyacrylamide gel electrophoresis using a Prep Cell apparatus (Bio-Rad) in TCS buffer (10 mM Tris-HCl (pH 7.5) and 500 µM TCEP). Large scale assemblies of individual nucleosomes were purified on a monoQ 5/50 ion exchange gradient (GE Healthcare), desalted using a Zeba spin column (Thermo Fisher) into 20 mM Tris-HCl pH 7.5 and 500 µM TCEP, and stored at 4°C.

Protein expression and purification of OCT4 and SOX2
Human full-length OCT4 (residues 1–360), OCT4 DNA binding domain (residues 134–290) or human SOX2 DNA binding domain (residues 37–118) were subcloned into pAC-derived vectors (43) containing an N-terminal StrepII tag. An additional N-terminal EGFP tag and C-terminal sortase-6XHIS tag (LPETGGHHHHHH) were fused in frame with full-length OCT4 to improve purification; this construct was also used for cryo-EM sample preparation. Recombinant proteins were expressed in 2–4 L cultures of Trichoplusia ni High Five cells using the Bac-to-Bac system (Thermo Fisher). Cells were cultured at 27°C, harvested 2 days after infection, resuspended in lysis buffer (50 mM Tris-HCl pH 8.0, 1 M NaCl, 100 µM phenylmethylsulfonyl fluoride, 1 × protease inhibitor cocktail (Sigma), 250 µM TCEP), and lysed by sonication.
The supernatant was harvested and the proteins were purified by Streptactin affinity chromatography (IBA), followed by Heparin ion exchange chromatography (GE Healthcare). All proteins were further purified by size exclusion chromatography (Superdex 200; GE Healthcare) in GF buffer (20 mM HEPES pH 7.4, 150 mM NaCl, 5% glycerol, 500 µM TCEP) as a last purification step. The purified proteins were concentrated and stored at -80°C.

SeEN-seq library pool preparation
DNA sequences were generated by replacing Widom 601 sequence with the canonical consensus JASPAR OCT4-SOX2 motif (CTTTGTTATGCAAAT, MA0142.1) (25) at 1 bp intervals across the entire W601. The W601-OCT4-SOX2 variant DNA sequences were flanked by EcoRV sites and adapter sequences and ordered as gene fragments from TWIST Biosciences. The individual gene fragments were suspended, pooled equally and cut with EcoRV-HF (NEB), and the W601-OCT4-SOX2 variant DNA fragments (153 bp) were purified from an agarose gel using the QIAquick Gel Extraction kit (Qiagen). The W601-OCT4-SOX2 DNA pool was spiked with an excess of W601 DNA (1:30 molar ratio; pool:601). The nucleosome pool was assembled and purified as described above.

SeEN-seq assay
For SeEN-seq EMSAs, nucleosomes (100 nM) were incubated with increasing amounts of full-length OCT4, SOX2 (residues 37-118) or OCT4 and SOX2 (100 nM, 200 nM and 400 nM) in 20 µL reactions containing 20 mM Tris-HCl pH 7.5, 75 mM NaCl, 10 mM KCl, 1 mM MgCl2, 0.1 mg/ml BSA, and 1 mM DTT. The reactions were incubated at room temperature for ~1 hour, loaded onto a 6% non-denaturing polyacrylamide gel (acrylamide:bis = 37.5:1) in 0.5X TGE and run for 1 hour (150 V, room temperature). At least 2 technical replicate experiments were generated for all TF conditions tested. Gels were then stained with SYBR gold nucleic acid stain (~10 min, Invitrogen).
DNA bands corresponding to the size of TF bound and unbound nucleosome complexes were imaged and excised using a C300 gel doc UV transilluminator (Azure Biosystems). Gel slices were incubated with acrylamide gel extraction buffer (100 µL; 500 mM ammonium acetate, 10 mM magnesium acetate, 1 mM EDTA, 0.1% SDS) and heated (50°C, 30 min). H2O (50 µL) and QIAquick Gel Extraction kit QG buffer (450 µL, Qiagen) were added and the samples heated (50°C, 30 min). Samples were briefly spun and the supernatant containing DNA fragments was transferred to QIAquick Gel Extraction spin columns. Samples were purified according to the manufacturer's instructions, eluted in H2O (22 µL) and DNA quantified by Qubit reagent (Thermo Fisher Scientific). Purified DNA (20 µL, ~2-20 ng DNA) was used for next-generation sequencing (NGS) library preparation (NEBNext ChIP-seq, E6240S) with dual indexing (E7600S) and no more than 10 cycles of PCR amplification. Purified sequencing libraries were quantified by Qubit reagent (Thermo Fisher) and the library size checked on the Bioanalyzer platform (Agilent) before sequencing on an Illumina MiSeq or NextSeq platform (300 bp paired-end). Sequencing fragments were mapped to the W601 sequence and OCT4-SOX2 motif containing variants (153 bp) using the Bioconductor package QuasR with default settings (44), which internally uses bowtie for read mapping (45). The number of sequence reads aligned to each construct was quantified by the QuasR function qCount with every construct represented. SeEN-seq enrichments are calculated by determining the fold change between library-size normalized read counts for each 601-OCT4-SOX2 variant in the TF-bound and unbound nucleosome fractions. These fold changes represent a relative affinity difference between all positions. In all replicates we were able to capture every motif position, suggesting that the OCT4-SOX2 motif does not dramatically affect nucleosome stability.
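The enrichment calculation described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R/QuasR code: the function name, the construct labels and the read counts are hypothetical, and it assumes that each fraction is normalized to its own total read count and that the fold change is reported on a log2 scale.

```python
import math

def seen_seq_enrichment(bound_counts, unbound_counts):
    """Return the log2 fold change (TF-bound / unbound) per construct.

    Counts are first divided by the total reads of their own fraction
    (library-size normalization, an assumption about the exact scheme).
    """
    bound_total = sum(bound_counts.values())
    unbound_total = sum(unbound_counts.values())
    enrichment = {}
    for construct, bound in bound_counts.items():
        unbound = unbound_counts[construct]
        if bound == 0 or unbound == 0:
            # The text reports that every motif position was captured,
            # so zero counts are treated here as an input error.
            raise ValueError(f"no reads for construct {construct!r}")
        ratio = (bound / bound_total) / (unbound / unbound_total)
        enrichment[construct] = math.log2(ratio)
    return enrichment

# Hypothetical read counts for the W601 spike-in and two motif positions.
bound = {"W601": 600, "pos_10": 300, "pos_56": 100}
unbound = {"W601": 900, "pos_10": 90, "pos_56": 10}
result = seen_seq_enrichment(bound, unbound)
```

In this toy input, `pos_56` makes up 10% of the bound fraction but only 1% of the unbound fraction, so its fold change is 10 (log2 of about 3.32).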
For autocorrelation analysis, SeEN-seq profiles (averages of technical replicates) were de-trended by subtracting an 11 bp running mean and autocorrelation values were calculated from the residuals using default parameters of the acf function from the R TSA package (46).

In vivo analysis of Oct4 binding and accessibility at full and partial motifs

Comparison of in vivo binding of Oct4 and Sox2
Previously published ChIP-seq data for Oct4 and Sox2 (34) were downloaded from GEO: GSM1910640 and GSM1910642 (two replicates of Sox2 ChIP-seq), GSM1910644 and GSM1910646 (two replicates of Oct4 ChIP-seq) and GSM1910648 (input chromatin control; only the first replicate was used due to the limited number of reads in the second replicate). ChIP-seq datasets were selected from this particular study (34) due to the existence of replicates for both factors, a well-matched control and their limited GC bias, as GC bias can confound the analysis of NGS data if not carefully controlled for (47-49). Datasets were downloaded from GEO using the SRAdb R package (50) and aligned to the mm10 assembly of the mouse genome using Bowtie (45) within the QuasR (44) package. Bowtie was run using QuasR default parameters, returning only unique alignments. All read counting in given genomic regions was done using the QuasR function qCount, whereby reads were shifted by 80 base-pairs. For all replicates across TF datasets, peaks were identified using MACS2 (51) with default parameters and with corresponding control samples as a background. Resulting peaks were then filtered requiring at least 80% mappability. Here we define mappability as the fraction of all possible 50-mers in a given region that are uniquely mappable using default QuasR parameters for Bowtie (read lengths of the ChIP-seq datasets used in this study are 45 and 50). The library-size normalized counts were determined as $n^{s}_{\mathrm{IP}} = \min(N_{\mathrm{IP}}, N_{\mathrm{control}}) \cdot (n_{\mathrm{IP}} / N_{\mathrm{IP}})$ and $n^{s}_{\mathrm{control}} = \min(N_{\mathrm{IP}}, N_{\mathrm{control}}) \cdot (n_{\mathrm{control}} / N_{\mathrm{control}})$, where $n_{\mathrm{IP}}$ and $n_{\mathrm{control}}$ are the raw counts per peak and $N_{\mathrm{IP}}$ and $N_{\mathrm{control}}$ are the total number of reads mapping to the genome in the IP and control sample, respectively. Thus, counts were in each case scaled down to the smaller library. For each dataset, enrichment over input in peaks was defined as $\log_2(n^{s}_{\mathrm{IP}} + 8) - \log_2(n^{s}_{\mathrm{control}} + 8)$, where a pseudo-count of 8 is used to reduce noise levels at small read counts. Only peaks with a log2 enrichment of at least 1 were retained for further analysis. The joint peak set was defined as the union of all peaks identified in any of the samples. In cases where two or more peaks overlapped, a new peak region was defined containing all the nucleotides of the overlapping peaks. The 500 top-enriched peaks of each sample were used for de novo motif finding using HOMER (52). HOMER was run using the function findMotifsGenome.pl with 5 different motif lengths (6, 10, 14, 18 and 22) and 200-nt-long sequences centered on each peak as input. Resulting weight matrices were, if necessary, reverse-complemented so they all had the same orientation as the reference weight matrix in the Jaspar database (MA0142.1) (25). Each inferred weight matrix was then used to scan the genome using the matchPWM function from the Biostrings R package (53). Matching sequences were determined by requiring a log2-odds score of at least 10 over a uniform background. In cases where two (or multiple) matches overlapped (ignoring their strands), only the match with the highest log2-odds score was retained.
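The library-size normalization and enrichment score defined above can be written out as a short worked example. This is a sketch in Python rather than the R code used for the analysis; the function names and the example read counts are hypothetical, while the scaling to the smaller library, the pseudo-count of 8 and the log2 cut-off of 1 follow the text.

```python
import math

PSEUDOCOUNT = 8        # pseudo-count stated in the text
MIN_LOG2_ENRICHMENT = 1  # peaks below this value were discarded

def scaled_counts(n_ip, n_control, total_ip, total_control):
    """Scale raw peak counts down to the smaller of the two libraries."""
    smaller = min(total_ip, total_control)
    return smaller * n_ip / total_ip, smaller * n_control / total_control

def log2_enrichment(n_ip, n_control, total_ip, total_control,
                    pseudocount=PSEUDOCOUNT):
    """log2(ns_IP + pseudocount) - log2(ns_control + pseudocount)."""
    ns_ip, ns_control = scaled_counts(n_ip, n_control, total_ip, total_control)
    return math.log2(ns_ip + pseudocount) - math.log2(ns_control + pseudocount)

# Hypothetical peak: 120 IP reads in a 20 M read library,
# 20 control reads in a 10 M read library.
ns_ip, ns_control = scaled_counts(120, 20, 20_000_000, 10_000_000)
score = log2_enrichment(120, 20, 20_000_000, 10_000_000)
keep_peak = score >= MIN_LOG2_ENRICHMENT
```

Here the IP count is halved to 60 because the IP library is twice as large as the control, and the score is log2(68/28), about 1.28, so this hypothetical peak would be retained.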
In vivo analysis of Oct4 binding and accessibility at full and partial motifs
For the analysis focusing on full and partial motifs, we downloaded an additional previously published ChIP-seq dataset for Oct4 (33) as well as ATAC-seq data for untreated and doxycycline-treated mouse ESCs that contain doxycycline-sensitive Oct4 and Sox2 transgenes (36, 54) from GEO: accessions GSM2417142 (Oct4 ChIP-seq), GSM2417127 (whole cell extract control), GSM2341271-6 (ATAC-seq) and GSE134652 (ATAC-seq). We used an additional Oct4 dataset (GSM2417142) to ensure that the results were reproducible across different labs. This additional ChIP-seq dataset was again selected due to its limited GC bias and well-matched control. The previously used Oct4 replicates (GSM1910644 and GSM1910646, see above) were merged for this analysis. ATAC-seq reads were trimmed using cutadapt (55). Both ChIP-seq and ATAC-seq data were aligned as described for the Oct4 and Sox2 samples in the previous paragraph. The weight matrix for Oct4-Sox2 was downloaded from Jaspar (MA0142.1) (25) and Oct4-Sox2 sites (full motif) were predicted by scanning the genome and selecting for sites with a log2-odds score (probability of a given sequence under the weight matrix model versus a uniform background) of at least 10 using the matchPWM function from the Biostrings package (53). HMG-POUS partial motif matches were determined by scanning with only the first 11 bases of the weight matrix, excluding the last 4 bases which model the specificity of the POUHD, using the same cut-off of 10. Only those partial motif matches where the 4 bases of sequence downstream of the predicted site had a log2-odds score < 0, using the last 4 bases of the weight matrix, were retained for further analysis. Analogously, only those predicted full motifs were retained that had a log2-odds score >= 0 for the last 4 bases of the predicted site.
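The full versus partial motif rule above can be illustrated with a small scoring sketch. It is written in Python rather than with Biostrings, scores the forward strand only, and uses a toy weight matrix (0.85 for the consensus base and 0.05 for every other base at each position) that is an assumption for illustration and not the JASPAR MA0142.1 matrix; the function names are likewise hypothetical.

```python
import math

# Toy 15-bp matrix built from the consensus; NOT the real MA0142.1 values.
CONSENSUS = "CTTTGTTATGCAAAT"
TOY_PWM = [{b: (0.85 if b == c else 0.05) for b in "ACGT"} for c in CONSENSUS]
BACKGROUND = 0.25  # uniform background, as in the text

def log2_odds(seq, pwm):
    """Sum of per-position log2(P(base | matrix) / P(base | background))."""
    if len(seq) != len(pwm):
        raise ValueError("sequence and matrix must have the same length")
    return sum(math.log2(col[base] / BACKGROUND) for base, col in zip(seq, pwm))

def classify_site(seq15, pwm=TOY_PWM, cutoff=10.0, n_partial=11):
    """Label a 15-bp window as 'full', 'partial' or None.

    head: first 11 positions (HMG and POUS part of the matrix)
    tail: last 4 positions (POUHD part of the matrix)
    """
    head = log2_odds(seq15[:n_partial], pwm[:n_partial])
    tail = log2_odds(seq15[n_partial:], pwm[n_partial:])
    if tail >= 0 and head + tail >= cutoff:
        return "full"      # full matrix passes and the 3' end fits POUHD
    if tail < 0 and head >= cutoff:
        return "partial"   # first 11 bases pass, downstream 4 bases do not fit
    return None

full_site = classify_site("CTTTGTTATGCAAAT")
partial_site = classify_site("CTTTGTTATGCCCCC")
no_site = classify_site("GGGGGGGGGGGGGGG")
```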
In this way, we ensured that full motif matches contained a 3’ sequence that can be bound by the POUHD and partial motif matches did not, using the logic that a log2-odds of 0 means that the sequence is equally likely to come from the weight matrix and a uniform background and thus represents a natural cut-off for motif distinction. Each predicted site was enlarged to a window of 251 bp centered at the start of the motif and only predicted sites for which at least 80% of all possible overlapping 50-mers within the enlarged sequence were mappable using QuasR default parameters were retained for further analysis. In addition, only predicted sites for a full motif that did not overlap with a HMG-POUS motif, and vice versa, within the 251 bp window were used. Finally, promoters were defined as the regions ±1000 bp around transcription starts of all genes in the UCSC knownGene table (http://genome.ucsc.edu), via the R package TxDb.Mmusculus.UCSC.mm10.knownGene (56), and only distal predicted sites not overlapping with any promoter in this set were kept. ChIP-seq as well as ATAC-seq counts on the (enlarged) predicted sites were determined using the QuasR function qCount, using a shift of 80 bp for the ChIP-seq data and no shift for ATAC-seq. Log2 ChIP enrichments over the control were determined as described in the previous paragraph. Library-size normalization was performed for both ChIP and ATAC-seq data by normalizing to the smallest library as described above.

Fluorescence polarization (FP) assays
Flc-labelled DNA containing the canonical OCT4-SOX2 motif (5ʹ-Flc-GACCTTTGTTATGCAAATTAA-3ʹ) was used as a fluorescent tracer. Increasing amounts of OCT4 or SOX2 (0.3-2500 nM) were mixed with tracer (10 nM final concentration) in a 384-well microplate (Greiner, 784076) and incubated for 15 min at room temperature. The interaction was measured in a buffer containing 15 mM HEPES pH 7.4, 250 µM TCEP, 75 mM NaCl, 10 mM KCl, 1 mM MgCl2, 0.1% (v/v) pluronic acid.
Changes in fluorescence polarization were monitored by a PHERAstar FS microplate reader (BMG Labtech) equipped with a fluorescence polarization filter unit. The polarization units were converted to fraction bound as described previously (57). The fraction bound was plotted versus OCT4 or SOX2 concentration and fitted assuming a one-to-one binding model to determine the dissociation constant (Kd) using Prism 7 (GraphPad). Since the oligonucleotide that was used contained a fluorescent label, we refer to these as apparent Kd (Kd (app)). All measurements were performed in triplicate. For the competitive titration assays, the OCT4 or SOX2 bound to the fluorescent oligo tracer was back-titrated with unlabeled oligo or nucleosomes containing the canonical motif at different sites. The competitive titration experiments were carried out by mixing tracer (10 nM), OCT4 (300 nM) or SOX2 (150 nM) and increasing concentrations of different nucleosomes or DNA (0-3.2 µM). The fraction bound versus the nucleosome or DNA concentration was fitted with a nonlinear regression curve to obtain the IC50 values in Prism 7 (GraphPad). Two to three technical replicates were measured for each reaction. We note that the assay does not allow us to differentiate between a specific and nonspecific contribution to the binding.

Thermal stability assay of nucleosomes
Thermal stability assays (TSAs) of the nucleosomes were performed by the previously described method (58). The nucleosomes (final concentration 1 µM) or complexes (1 µM nucleosome or DNA:2 µM OCT4-SOX2) were incubated with a temperature gradient from 26°C to 95°C, in steps of 1°C/min, using a StepOnePlus Real-Time PCR unit (Applied Biosystems), in 1X binding buffer (BB) containing 5 × SYPRO Orange (Sigma-Aldrich). The buffer only background control was subtracted from the raw fluorescence data and then normalized and plotted.
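The background subtraction and normalization of the thermal stability curves can be sketched as below. The text does not state which normalization was applied, so min-max scaling to the range 0 to 1 is an assumption here, as are the function name and the example fluorescence values.

```python
def normalize_melt_curve(raw, buffer_only):
    """Subtract the buffer-only control, then min-max scale to [0, 1].

    raw and buffer_only are fluorescence readings taken at the same
    temperatures; min-max scaling is an assumed normalization scheme.
    """
    if len(raw) != len(buffer_only):
        raise ValueError("curves must have one reading per temperature")
    corrected = [r - b for r, b in zip(raw, buffer_only)]
    lowest, highest = min(corrected), max(corrected)
    if highest == lowest:
        raise ValueError("flat curve cannot be normalized")
    return [(c - lowest) / (highest - lowest) for c in corrected]

# Hypothetical readings at five increasing temperatures.
raw = [120.0, 150.0, 400.0, 900.0, 700.0]
buffer_only = [100.0, 100.0, 110.0, 120.0, 130.0]
normalized = normalize_melt_curve(raw, buffer_only)
```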
Cryo-EM sample preparation
After nucleosome assembly, the nucleosomes were purified using a Mono Q 5/50 column (GE Healthcare) and desalted into 20 mM HEPES pH 7.4, 0.5 mM TCEP. The NCP SHL6 (~65 µL, 5 µM) was then mixed with a molar excess of GFP-OCT4 and SOX2 in ~100 µL volume and incubated at room temperature for 30 minutes (1:3:3; NCP:OCT4:SOX2 molar ratio) in a binding buffer containing 20 mM HEPES pH 7.4, 1 mM MgCl2, 10 mM KCl, and 0.5 mM TCEP. The sample was then purified using a Superose 6 3.2/300 column (GE Healthcare) into a buffer containing 20 mM HEPES pH 7.4, 50 mM NaCl, 10 mM KCl, 1 mM MgCl2 and 0.5 mM TCEP or directly used for a GraFix gradient (29). Peak fractions were analysed by SDS-PAGE stained with Coomassie and native PAGE stained with SYBR gold (Thermo Fisher) to identify protein-DNA complexes. The sample was concentrated using an Amicon Ultra-0.5 mL centrifugal filter (Merck Millipore) and either prepared directly for electron microscopy (non-crosslinked sample) or subjected to crosslinking using the GraFix method (29). For GraFix crosslinking, the OCT4-SOX2-NCPSHL6 complexes were layered on top of a 10%–30% (w/v) sucrose gradient (50 mM HEPES pH 7.4, 50 mM NaCl, 0.2 mM TCEP) with an increasing concentration (0.18%–0.36% w/v) of glutaraldehyde (EMS) and subjected to ultracentrifugation (Beckman SW40Ti rotor, 30000 rpm, 18 h, 4°C). After centrifugation, 300 µL fractions were collected from the top of the gradient and peak fractions were analyzed by both native PAGE and SDS-PAGE. Samples were quenched with 50 mM Tris-HCl pH 7.5. The peak fractions were combined and desalted to remove sucrose using a Zeba spin column (Thermo Fisher). The resulting sample was concentrated with an Amicon Ultra-0.5 mL centrifugal filter to ~1 µM nucleosomes as determined by measuring the DNA concentration at Abs260. After concentration, 3 µL of sample was applied to Quantifoil holey carbon grids (R 1.2/1.3 200-mesh, Quantifoil Micro Tools).
Glow discharging was carried out in a Solarus plasma cleaner (Gatan) for 15 s in a H2/O2 environment. Grids were blotted for 3 s at 4°C at 100% humidity in a Vitrobot Mark IV (Thermo Fisher), and then immediately plunged into liquid ethane.

Cryo-EM data collection
Data were collected automatically with EPU (Thermo Fisher) on a Cs-corrected (CEOS GmbH, Heidelberg, Germany) Titan Krios (Thermo Fisher) electron microscope at 300 keV. Zero-energy loss micrographs were recorded using a Gatan K2 summit direct electron detector (Gatan) in counting mode located after a Quantum-LS energy filter (slit width of 20 eV). The acquisition was performed at a nominal magnification of 130,000 × in EFTEM nanoprobe mode yielding a pixel size of 0.86 Å at the specimen level. The objective aperture was 100 µm. All datasets were recorded with exposure rates between 3.5-5 e-/(px·s) and the exposures were fractionated into 40 frames. The targeted defocus values ranged from -0.25 to -2 µm.

Cryo-EM image processing
Real-time evaluation along with acquisition with EPU (Thermo Fisher) was performed with CryoFLARE (59). Drift correction was performed with the RELION 3.0 (60) motioncor implementation, where a motion corrected sum of all 40 frames was generated with and without applying a dose weighting scheme, and the CTF was fitted using GCTF (61) on the non-dose-weighted sums. Particles were picked using crYOLO on the dose-weighted sums (62). All datasets were processed in RELION 3.0 (60) including: 2D and 3D classification, 3D refinement, particle polishing and CTF refinement. The resulting particles were imported into cryoSPARCv2 (63) and a final non-uniform refinement was performed. The resolution values reported for all reconstructions are based on the gold-standard Fourier shell correlation curve (FSC) at 0.143 criterion (64, 65) and all the related FSC curves are corrected for the effects of soft masks using high-resolution noise substitution (66).
A negative B factor was applied to sharpen the maps automatically in PHENIX (phenix.auto_sharpen) (67). Before sharpening, all maps were filtered based on local resolution estimated with MonoRes (Xmipp) (68). All datasets were analysed for anisotropic effects and relative angular distribution using both cryoEF (69) and MonoDir (70). For cryoEF analysis a particle size of 190 Å was used. The box size and FSC resolution were taken from the final refinement. Final 3D density maps with an efficiency Eod > 0.7, as determined by cryoEF, were judged as having only limited angular distribution defects (as described in (69)). For both TF-bound nucleosome reconstructions, OCT4-SOX2-NCPSHL-6 and OCT4-SOX2-NCPSHL+6, the Eod value was within an acceptable range (Eod = 0.73 and 0.78, respectively) and no additional normalization of angular distribution was performed. The non-crosslinked nucleosome only sample (NCPSHL-6) showed significant anisotropy, as determined by cryoEF (Eod = 0.67). In order to decrease anisotropy and improve the quality of the map, a 2D classification into 50 classes was performed on these particles (28,138 particles). Junk classes containing only a small number of particles were discarded. The remaining 16 2D classes were balanced by selecting a random subset of particles in each class corresponding to the number of particles in the smallest class. This resulted in 6,302 particles. After subsequent refinement in cryoSPARCv2 of these limited particles, cryoEF was performed and the Eod increased to 0.79, with only a nominal decrease in resolution (PSF best, 3.38 Å; PSF worst, 5.50 Å). Directional resolution and radial averages versus resolution for all maps were calculated using MonoDir (70) (fig. S12).
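The class-balancing step used to reduce anisotropy in the nucleosome-only dataset can be sketched as follows. This is a minimal Python illustration of drawing a random subset from each 2D class down to the size of the smallest class; the function name, class labels and particle identifiers are hypothetical, and the toy numbers are not intended to reproduce the particle counts reported above.

```python
import random

def balance_classes(class_to_particles, seed=0):
    """Randomly subsample every class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    smallest = min(len(p) for p in class_to_particles.values())
    return {
        class_id: rng.sample(particles, smallest)
        for class_id, particles in class_to_particles.items()
    }

# Hypothetical 2D classes with unequal numbers of particle IDs,
# e.g. an over-represented side view and rarer tilted and top views.
classes = {
    "side": list(range(0, 1000)),
    "tilted": list(range(1000, 1400)),
    "top": list(range(1400, 1650)),
}
balanced = balance_classes(classes)
total_particles = sum(len(p) for p in balanced.values())
```

Balancing trades particle number for a more even distribution of views, which is why the text reports only a nominal loss of resolution alongside the improved Eod.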
Model building and refinement
A nucleosome template model was extracted from PDB entry 6NJ9 for subsequent interpretation of the cryo-EM maps and was identified as having highest correlation with the cryo-EM map of the nucleosome in a search of all available nucleosome models in the PDB (based on the highest cross correlation coefficient calculated with PHENIX) (67). The template DNA was replaced with the specific 601-OCT4-SOX2 sequence based on the SOX2 motif and confirmed by the differences in purine and pyrimidine densities (in regions with adequate resolution). Human histone coordinates were extracted from PDB entry 5Y0C and rigid body docked. For the OCT4-SOX2-NCPSHL-6 structure, a homology model of human SOX2 HMG in complex with bent DNA was prepared from a Mus musculus SOX2-HMG/OCT1-POU/DNA ternary complex (PDB entry 1GT0; chains A, B, and D; 100% sequence identity for relevant part, see also fig. S4), docked, and merged with the DNA from PDB entry 6NJ9. A homology model of human OCT4 POUs was prepared from a Mus musculus OCT4-POU/DNA complex (PDB entry 3L1P; chain B; 93% sequence identity for relevant part), and rigid body docked. Initial fitting of the template models into the cryo-EM maps and model building were carried out interactively with COOT (71) and ROSETTA (72). For all-atom refinement, dihedral angle restraints for OCT4 and SOX2 were generated from the corresponding template models (same as above) using PHENIX (67). Initial all-atom real-space refinement was carried out with PHENIX applying reference model restraints (sidechain and backbone torsions) for OCT4 and SOX2 and secondary structure restraints for protein (hydrogen bonds) and DNA (hydrogen bonds, planarity, stacking) (67).
Later refinement steps (including B factor fitting) were carried out with ROSETTA (72) using the ‘FastRelax’ protocol in combination with a density scoring function (73) and reference model restraints (sidechain and backbone torsions) for OCT4 and SOX2 (converted from PHENIX). The weight for the dihedral angle restraints was adjusted to allow a certain degree of freedom in order to prevent clashes as well as geometry and density-fit outliers. Restraints for the covalently-attached crosslinker (modified PTD) were generated with PHENIX and JLigand (74). MOLPROBITY (75) and PHENIX were used for model validation. For the non-crosslinked OCT4-SOX2-NCPSHL-6 structure, the model of the crosslinked OCT4-SOX2-NCPSHL-6 structure was rigid body docked, followed by all-atom refinement as described above. For the OCT4-SOX2-NCPSHL+6 structure, the OCT4 and SOX2 chains together with respective DNA fragments from the OCT4-SOX2-NCPSHL-6 and NCPSHL-6 only structures were rigid body docked and re-combined with COOT. A 3.45 Å mouse OCT4/SOX2/DNA complex structure (PDB: 6HT5) was used for validation of the overall arrangement of OCT4-SOX2 on a juxtaposed OCT4-SOX2 DNA motif (see fig. S11C). The sequence register was confirmed by purine/pyrimidine density patterns. All-atom refinement was carried out as described above. In all models, sidechain atoms with missing or ambiguous density were marked by setting their occupancies to zero (see fig. S14).

Density maps segmentation, figure preparation
Structural figures and cryo-EM segmented maps were produced with PyMOL (The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC) and UCSF Chimera (version 1.13).

Calculation of clash scores and contact surface area
Clash scores for OCT4–nucleosome models were calculated using a PyMOL script (scanFactor.py) as described previously (13).
In brief, an OCT1 probe (1O4X) containing an appropriately positioned DNA fragment for superimposing on a nucleosome template model was placed in all possible binding positions, and the clash score for each was taken as the total number of atoms in OCT1 closer than an adjustable threshold distance (1 Å default) to nucleosome atoms. The PyMOL script has also been deposited here: https://github.com/aliciamichael/amichael/blob/master/scanFactor_var_super.py

DNase I nucleosome footprinting assay
Nucleosome core particles (NCP) reconstituted with Widom 601 DNA containing an OCT4-SOX2 motif 51 bp from the dyad and purified full-length human OCT4 or OCT4 and SOX2 (residues 37-118) were mixed in a 1:2 molar ratio in BB buffer (20 mM HEPES pH 7.4, 1 mM MgCl2, 10 mM KCl, and 0.5 mM TCEP) and incubated on ice for ~30 minutes. Nucleosomes in the presence or absence of OCT4 and/or SOX2 were treated with a titration (0.12 U, 0.06 U, 0.03 U) of DNaseI (NEB M0303S) in the presence of MgCl2 (2.5 mM) and CaCl2 (0.5 mM) for 5 minutes at 37°C. The reaction was stopped by adding an equal volume of Stop Buffer (200 mM NaCl, 30 mM EDTA, 1% SDS) and chilled on ice for 10 min. Samples were treated with Proteinase K (10 µg) for 2 hours and DNA retrieved using Ampure Beads (A63881). DNA was used for sequencing library preparation (NEBNext ChIP-seq, E6240S) with dual indexing and sequenced on an Illumina MiSeq (300 bp paired-end). Sequences were mapped to the Widom 601 sequence (147 bp) containing the OCT4-SOX2 motif using the Bioconductor package QuasR with default settings (44), which internally uses bowtie for read mapping (45). The start position of mapped reads, the DNaseI cut site, was extracted and the counts were binned into 1 bp bins across the length of the W601 sequence. Plots and comparisons were done using 100,000 reads per replicate.
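The cut-site counting described above can be sketched as follows. This is a minimal Python illustration that assumes the 5ʹ start position of each mapped read has already been extracted as a 1-based coordinate on the 147 bp template; the function name and example positions are hypothetical, and random downsampling is an assumed way of restricting each replicate to 100,000 reads.

```python
import random

def cut_site_profile(read_starts, template_length=147, n_reads=100_000, seed=0):
    """Count DNase I cut sites (read start positions) in 1-bp bins.

    read_starts: 1-based start positions of mapped reads on the template.
    If more than n_reads are supplied, a random subset of n_reads is used.
    """
    rng = random.Random(seed)
    starts = list(read_starts)
    if len(starts) > n_reads:
        starts = rng.sample(starts, n_reads)
    profile = [0] * template_length
    for position in starts:
        if 1 <= position <= template_length:
            profile[position - 1] += 1  # bin i holds cuts at position i + 1
    return profile

# Hypothetical read start positions on the 147 bp W601 template.
starts = [5, 5, 5, 60, 60, 147]
profile = cut_site_profile(starts)
```

Using the same number of reads per replicate keeps the per-position counts directly comparable between conditions with and without OCT4 and SOX2.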
Generation of single motif mES cell lines
The Recombinase-mediated Cassette Exchange (RMCE) insertion protocol was used to generate clonal lines with variant sequences inserted at the same position (35, 76). Briefly, TC-1 ES cells (background 129S6/SvEvTac) carrying an RMCE selection cassette (described in (35)) were selected under hygromycin (250 μg/ml, Roche, Switzerland) for 10 days. Next, 4 million cells were electroporated (Amaxa nucleofection, Lonza, Switzerland) with 25 μg of L1-601/motif-1L plasmid and 15 μg of pIC-Cre. Negative selection with 3 μM Ganciclovir (Roche, Switzerland) was started 2 days after transfection and continued for 10 days. Pools of selected cells were clonally expanded and tested for successful insertion of the DNA construct by PCR using primers recognizing the locus flanking the insertion site. For motif insertion, DNA fragments containing the Widom 601 sequence were cloned into a plasmid containing a multiple cloning site flanked by two inverted L1 Lox sites. All motifs were inserted into the SHL -6 position as used for structural studies (see above). The optimal match to the weight matrix for Oct4-Sox2 from the Jaspar database (MA0142.1) (25) was used to construct the full-length motif while those nucleotides having the most detrimental contribution to a score corresponding to the POUHD domain-contacted nucleotides were used to construct the partial HMG-POUS motif. Control sequences were constructed by using only those nucleotides of the weight matrix model contacted by the HMG domain of Sox2 in the motif. For each construct at least two independent clones were expanded and used for analysis.

Chromatin immunoprecipitation
ChIP experiments were carried out as described (77), starting with 70 μg of chromatin and 5 µl of Oct-4A (C30A3C1) Rabbit mAb (Cell Signaling Technology, cat #5677S). Real-time PCR was performed using SYBR Green chemistry (Applied Biosystems) and 1/20 of the ChIP reaction or 1/40 of input chromatin per PCR reaction.
Two overlapping primer sets were used targeting the RMCE insertion locus to test enrichment, as well as a negative control primer set for ChIP enrichment (Mm10 genome build: chr12 47,899,688-47,899,802). All primer sequences are available upon request.

Fig. S1. SeEN-seq is highly reproducible and identifies specific sites of TF binding to nucleosomes. A, SeEN-seq profiles are highly reproducible across a range of TF concentrations (Pearson correlation >0.8). B, SeEN-seq enrichment profiles are highly reproducible between technical replicates at single TF concentrations (Pearson correlation >0.8). C, SeEN-seq enrichment profiles shown across the nucleosome, indicated by SHLs that describe where the minor groove faces away (±1, 2, etc.) or towards (±1.5, 2.5, etc.) the histone octamer. Asterisks (*) indicate positions tested in (27). Values are the average of technical replicates for the nucleosome pool only (no TF), or the nucleosome pool in the presence of OCT4, OCT4-SOX2, or SOX2 and the difference between OCT4-SOX2 and OCT4 alone. Error bars indicate s.d.

Fig. S2. SeEN-seq validation by independent methods and periodic binding to nucleosomal DNA. A, Schematic of the 601 sequence indicating histone interactions. The location of the OCT4-SOX2 motif (SOX2, orange; OCT4, red) for individual affinity measurements is indicated below. Grey bars indicate regions of histone-DNA interactions known as the ‘TA steps’ in the Widom 601 nucleosome positioning sequence (12). B, 10 nM of a fluorescein (Flc) labeled 21-bp DNA containing the OCT4-SOX2 motif (OCT4-SOX2 oligo, 5’-Flc-GACCTTTGTTATGCAAATTAA) was mixed with 300 nM OCT4 full-length protein and counter-titrated with nucleosomes (Methods). Relative affinities are indicated as IC50 values. All data include two to three technical replicates and are shown as mean ± s.d. C, 10 nM of a Flc-21-bp-DNA containing the OCT4-SOX2 motif was mixed with 150 nM SOX2 (residues 37-118) and counter-titrated with nucleosomes.
Relative affinities are indicated as IC50 values. All data include two to three technical replicates and are shown as mean ± s.d. D, Fluorescence polarization forward titration experiments using 10 nM Flc-OCT4-SOX2 oligo in the presence of increasing amounts of either full-length OCT4, SOX2 (residues 37-118) or OCT4-SOX2 mixed at equimolar concentrations and titrated as indicated. For OCT4-SOX2, the concentration indicates the concentration of the heterodimer. Kd (app): apparent dissociation constant. E, As in B and C but counter-titrations were performed with the indicated unlabeled oligonucleotides. Relative affinities are indicated as IC50 values. We note that these measurements are unable to distinguish between non-specific and specific binding events and were fitted using a total binding curve. All data include three technical replicates (n = 3) and are shown as mean ± s.d. F, SeEN-seq enrichments for OCT4 (grey, three technical replicates per position) and the 3-bp running average across the nucleosome (blue). Indicated are observed regions of periodic binding at 10-bp intervals. G, Autocorrelation analysis of the SeEN-seq enrichments across the nucleosome for the nucleosome pool only (no TF), or in the presence of OCT4, OCT4-SOX2 and SOX2. The x-axis gives the lag in bp and dashed lines indicate the 95% confidence interval used to assess statistical significance. H, SeEN-seq enrichments for SHL-6.5 to SHL5.5, with the corresponding solvent accessible domain of OCT4 indicated above. Solvent accessibility was determined using the atom clash score (see Methods). Asterisk (*) indicates the SHL-6 position used for structure determination. I, ChIP-seq enrichments (log2 of immunoprecipitated/input) on the joint set of peaks from two replicates of Sox2 (Sox2 R1 and Sox2 R2) and Oct4 (Oct4 R1 and Oct4 R2). Bottom panels show scatter plots of ChIP enrichments, top panels show the Pearson’s correlation coefficients. Peaks bound strongly by Oct4 tend to also be strongly bound by Sox2.
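The autocorrelation analysis in panel G can be illustrated with a short calculation. The sketch below is not the authors' implementation: the function names are hypothetical, the profile is synthetic (a pure 10-bp cosine standing in for an enrichment profile), and the dashed-line threshold is assumed to be the conventional large-sample white-noise bound of ±1.96/√n, which may differ from the interval plotted in the figure.

```python
import math


def autocorrelation(values, max_lag):
    """Sample autocorrelation of a 1D profile for lags 0..max_lag."""
    n = len(values)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(values) - 1")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("profile is constant; autocorrelation is undefined")
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def white_noise_bound(n, z=1.96):
    """Approximate 95% bound for the autocorrelation of white noise (assumed)."""
    return z / math.sqrt(n)


# Synthetic stand-in for an enrichment profile: 140 positions, 10-bp period.
profile = [math.cos(2 * math.pi * i / 10) for i in range(140)]
acf = autocorrelation(profile, max_lag=20)
bound = white_noise_bound(len(profile))

# Lags whose autocorrelation falls outside the assumed 95% band.
significant_lags = [lag for lag, r in enumerate(acf) if lag > 0 and abs(r) > bound]
```

For a profile with 10-bp periodicity, the autocorrelation is strongly positive at lags of 10 and 20 bp and strongly negative at 5 and 15 bp, which is the pattern that a periodic binding preference would be expected to produce.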
J, Sequence logos of the top weight matrix of each dataset as inferred by HOMER on the top 500 peaks (ranked by enrichment over input). For easier visual comparison, starting or trailing positions with very low information content were removed (first 3 positions of the Sox2 R1 and the first two and last position of the Oct4 R2 logo (information content < 0.041)) or positions with uniform nucleotide distributions were added at the beginning or end (1 at the end for Sox2 R2 and 3 at the beginning and 2 at the end for Oct4 R1). K, The percentage of the top 500 peaks possessing their respective motif (log2-odds motif score >= 10). L, SeEN-seq enrichment for those positions bound at least 2-fold in the OCT4 only SeEN-seq profile, shown as box and whisker (25th percentiles) for OCT4, OCT4-SOX2 and delta between them. Addition of SOX2 at these OCT4-bound loci increases the SeEN-seq enrichment by 1.7 log2-fold on average.

Fig. S3. Classification and refinement procedures for the OCT4-SOX2-NCPSHL-6 complex. A, SeEN-seq enrichment of the specific position used for structural study (SHL-6), shown are values of OCT4, OCT4-SOX2 and delta between them. B, Size-exclusion trace of GFP-OCT4, SOX2 (residues 37-118) and NCPSHL-6. Peak fractions were analyzed by SDS-PAGE and Coomassie blue stain. C, Representative cryo-EM micrograph and reference-free 2D class averages for the OCT4-SOX2-NCPSHL-6 complex. The micrograph was denoised by JANNI (62). D, A Titan Krios 300 keV microscope was used to collect 5,702 micrographs. All dose-fractionated micrograph stacks were subjected to beam-induced motion correction with MotionCorr in RELION 3.0 (60). Particles were picked with crYOLO and subjected to only one round of 2D classification to clean up particles before 3D classification in RELION 3.0 (60). A nucleosome model (PDB: 6R94) was low-pass filtered to 60 Å and used as the initial model for the first round of 3D classification.
Several rounds of 3D classification, including local searches, were necessary to obtain homogeneous datasets. The last 3D classification divided the dataset into four models. Refinement of the best particles with cryoSPARC using non-uniform refinement led to a 3.1 Å resolution map. E, Gold-standard Fourier shell correlation curve. F, Local resolution filtered map (MonoRes) (68). G, Angular distribution for the OCT4-SOX2-NCPSHL-6. H, Local resolution filtered map (MonoRes) colored by protein chain identity.

Fig. S4. OCT and SOX2 homology modeling. A, OCT4 from the OCT4-SOX2-NCPSHL-6 structure aligned with chain B (residues 3-73) of the OCT4 structure on free DNA (PDB: 3L1P). The backbone (Cα) of each model was used to calculate an r.m.s.d. = 0.60 Å using the PyMOL align function. B, SOX2-DNA alignment from the OCT4-SOX2-NCPSHL-6 structure and the free DNA high-resolution crystal structure (PDB: 1GT0). SOX2 was aligned to chain D, residues 1-79 of 1GT0. The SOX-bound NCP DNA was aligned to chain A, DNA bases 1-10 and chain B 40-48 of 1GT0. The backbone of the protein chain (Cα) and the DNA (phosphoribose only) was used to calculate an r.m.s.d. = 0.97 Å using the PyMOL align function. C, Multiple sequence alignment of OCT1 and OCT4 (human) and D, SOX2 (human and mouse) DNA binding domain sequences using T-Coffee (78) and visualized in Jalview (79). The conserved SOX2 Phe48 and Met49 reported to induce DNA bending are indicated (30, 80).

Fig. S5. Analysis and validation of the OCT4-SOX2-NCPSHL-6 structure. Representative, sharpened, local-resolution filtered density with corresponding model cut-out segments of the OCT4-SOX2-NCPSHL-6 complex map using the PyMOL 2.0 carve function, including A, DNA segment (contour level, 9), B, H4 of the histone core (contour level, 4), C, OCT4 POUS domain (contour level, 4), and D, SOX2 density (contour level, 4). Nucleotide and amino acid boundaries used for the cut-out are indicated in each panel.
E, Overlay of the OCT1-SOX2 structure (PDB: 1O4X) including only the POUS domain and SOX2 (PDB: 1O4X also includes the POUHD domain on free DNA) with the nucleosome-bound structure (28). The alignment was performed by superimposing on the DNA. The free-DNA bound structure is shown in solid and the nucleosome-bound structure in transparent. F, Close-up view of OCT4 POUS only engaged on DNA. Nucleotides corresponding to the POUS motif are shown in red and the entire OCT4-SOX2 motif is shown with ribose and base rings. Residues at the DNA-protein interface are shown in sticks. G, 10 nM of a Flc-21-bp-DNA containing the OCT4-SOX2 motif was mixed with 300 nM full-length OCT4 or OCT4 DNA binding domain only (residues 134-290) and counter-titrated with nucleosomes. All data include three technical replicates (n = 3) and are shown as mean ± s.d.

Fig. S6. OCT-SOX2 free DNA binding modality clashes with the nucleosome architecture. A, Residue clashes for the OCT1 DNA binding domain (PDB: 1O4X, chain A) with the unbound nucleosome structure determined by aligning the 2nd base of the OCT4 motif (ATGCAAT) with the nucleosome DNA at 1-bp intervals and calculating the residue clash score with a 1 Å cut-off. B, OCT4 POUHD domain is blocked by the native, non-crosslinked H2A-H2B dimer. Lysine crosslink between H2A and H2B molecules of distinct H2A:H2B dimer pairs across the nucleosome dyad axis (H2A, chain C and residue 37; H2B, chain H and residue 86). The inter-lysine crosslink was modelled with a pentanediol moiety (PTD, pink) and is included in the OCT4-SOX2-NCPSHL-6 model. Contour level is 4.

Fig. S7. Classification and refinement procedures for the non-crosslinked OCT4-SOX2-NCPSHL-6 complex and the unbound NCPSHL-6. A, Representative cryo-EM micrograph and reference-free 2D class averages for the non-crosslinked NCPSHL-6 and OCT4-SOX2-NCPSHL-6 complex. The micrograph was denoised by JANNI (62). B, A Titan Krios 300 keV microscope was used to collect 4,905 micrographs.
All dose-fractionated micrograph stacks were subjected to beam-induced motion correction with MotionCorr in RELION 3.0 (60). Particles were picked with crYOLO and subjected to only one round of 2D classification to clean up particles before 3D classification in RELION 3.0 (60). A nucleosome model (PDB: 6R94) was low-pass filtered to 60 Å and used as the initial model for the first round of 3D classification. Several rounds of 3D classification, including local searches, were necessary to obtain homogeneous datasets. Two classification routes were followed for the bound and free nucleosome after the first round of 3D classification. The 28,138 particles contributing to the NCPSHL-6 reconstruction were filtered by further 2D classifications and subset selection of limiting classes to decrease anisotropy (EOD, 0.66). The resulting 6,302 particles were used in the final refinement and showed an increased EOD value of 0.79, consistent with a relatively isotropic reconstruction. C, Refinement of the best particles with cryoSPARCv2 using non-uniform refinement led to a 3.49 Å resolution map (free nucleosome, NCPSHL-6). D, Free nucleosome local resolution filtered map, NCPSHL-6 (MonoRes) (68). E, Angular distribution for the free nucleosome, NCPSHL-6. F, Fit of the crosslinked OCT4-SOX2-NCPSHL-6 model into the 4.15 Å resolution map (FSC, 0.143) for the non-crosslinked OCT4-SOX2-NCPSHL-6 map. The non-crosslinked map was deblurred with ccpem 1.3.0, using refmac5 (81, 82). The individual models were built and refined with the corresponding cryo-EM map, and the backbone of the entire crosslinked model was compared with the backbone of the entire non-crosslinked model (overall root mean square deviation, r.m.s.d., 1.3 Å). When comparing isolated OCT4-SOX2 and histone core chains between maps, the r.m.s.d. is equal to 1.4 Å and 1.8 Å, respectively. G, Cut-out density segment of the H2A:H2B dimer at the OCT4-SOX2 site in the non-crosslinked map.
Contour level is 5.5.

Fig. S8. OCT4-SOX2-induced DNA release does not induce rearrangements in the histone octamer core. A, Zoom of SOX2-induced DNA kink highlighting the residues involved in intercalating the TT-step (M49 and F48), consistent with previously reported crystal structures of SOX2 on free DNA (18). B, Sharpened, local-resolution filtered density with corresponding model cut-out segment of the H3 N-terminal α-helix near the OCT4-SOX2 binding site (contour level, 3). C, Overlay of the unbound nucleosome histone octamer core (grey) and the OCT4-SOX2 bound histone octamer core (colored). The root mean square deviation (r.m.s.d.) of the histones when the structures are aligned on the DNA is equal to 1.8 Å.

Fig. S9. DNase I and thermal stability assays show OCT4 and OCT4-SOX2 destabilize the nucleosome. A, DNase I digestion profile across the nucleosome only (SHL-6), OCT4 and OCT4-SOX2 (OS). Two replicates are shown per protein condition across a range of enzyme concentrations (0.12-0.03 Units DNase I). B, Pairwise correlations of DNase I measurements (from A), separated by protein condition. C, Thermal shift assay of the Widom 601 containing the OCT4-SOX2 motif used for structure determination (NCPSHL-6) or the Widom 601 nucleosome only (NCP601) in the presence or absence of OCT4 or OCT4-SOX2. Nucleosomes were mixed with buffer only or in a molar ratio of 1:2 (nucleosome:OCT4 or OCT4-SOX2). The raw values were normalized to one and plotted (Methods). These data are representative of two separate experiments, each with two technical replicates (n = 2), and are shown as mean ± s.d. (solid line). The first peak at a melting temperature of ~78 °C corresponds to release of the H2A:H2B dimer and the 2nd peak at ~83 °C corresponds to release of H3:H4 and DNA (58). The grey bar indicates the temperature of the peaks in the nucleosome only samples for reference.
The DNASHL-6 sample is the corresponding 153-bp NCP DNA containing the OCT4-SOX2 motif (no histones) in the presence of OCT4-SOX2 as a control for the individual melting curve of OCT4-SOX2 and DNA.

Fig. S10. Classification and refinement procedures for the OCT4-SOX2-NCPSHL+6 complex. A, Representative cryo-EM micrograph and reference-free 2D class averages for Dataset 1 of the OCT4-SOX2-NCPSHL+6 complex. The micrograph was denoised by JANNI (62). B, Representative cryo-EM micrograph for Dataset 2 of the OCT4-SOX2-NCPSHL+6 complex. The micrograph was denoised by JANNI (62). C, A Titan Krios 300 keV microscope was used to collect 3,669 (Dataset 1) and 4,387 (Dataset 2) micrographs. All dose-fractionated micrograph stacks were subjected to beam-induced motion correction with MotionCorr in RELION 3.0 (60). Particles were picked with crYOLO and subjected to only one round of 2D classification to clean up particles before 3D classification in RELION 3.0 (60). In the case of Dataset 2, 3D classification was performed directly after particle picking and extraction. A nucleosome model (PDB: 6R94) was low-pass filtered to 60 Å and used as the initial model for the first round of 3D classification. Again, several rounds of 3D classification, including local searches, were necessary to obtain homogeneous datasets. 3D variability of the final refinement of Dataset 1 was then assessed using cryoSPARCv2, and a volume that showed the best resolution for both TFs (as visualized and extracted by 3D display within cryoSPARCv2) was used as a reference map for a subsequent 3D classification in RELION 3.0. The last 3D refinements from each independent dataset (Datasets 1 and 2) (RELION 3.0) were joined and a final 3D classification was performed with eight classes. Final refinement of the best particles with the most continuous OCT4 density with cryoSPARCv2 using non-uniform refinement led to a 3.42 Å resolution map.
D, Gold-standard Fourier shell correlation curve from the final cryoSPARC non-uniform refinement of joined datasets. E, Local resolution filtered map colored by local resolution (MonoRes) (68). F, Local resolution filtered map colored by chain (MonoRes) (68). G, Angular distribution for the OCT4-SOX2-NCPSHL+6.

Fig. S11. Model details of the OCT4-SOX2-NCPSHL+6 cryo-EM structure. A, OCT4 from the OCT4-SOX2-NCPSHL+6 structure aligned with chain B (residues 3-73) of the OCT4 structure on free DNA (PDB: 3L1P). The backbone (Cα) of each model was used to calculate an r.m.s.d. = 0.59 Å using the PyMOL align function. B, SOX2-DNA alignment from the OCT4-SOX2-NCPSHL+6 structure and the free DNA high-resolution crystal structure (PDB: 1GT0). SOX2 was aligned to chain D, residues 1-79 of 1GT0. The SOX-bound NCP DNA was aligned to chain A, DNA bases 1-10 and chain B 40-48 of 1GT0. The backbone of the protein chain (Cα) and the DNA (phosphoribose only) was used to calculate an r.m.s.d. = 0.95 Å using the PyMOL align function. C, Alignment of an OCT4-SOX2 crystal structure (PDB: 6HT5, 3.45 Å resolution) to the OCT4-SOX2-NCPSHL+6 structure. OCT4 (residues 130–201) and SOX2 chains from PDB: 6HT5, together with the DNA, were aligned to OCT4 and SOX2 chains of the SHL+6-bound structure. The backbone of the protein chains for OCT4 and SOX2 (Cα) was used to calculate an r.m.s.d. = 0.69 Å using the PyMOL align function. Representative, sharpened, local-resolution filtered density with corresponding model cut-out segments of the OCT4-SOX2-NCPSHL+6 complex map using the PyMOL 2.0 carve function, including D, OCT4 POUS domain (contour level, 3), E, SOX2 density (contour level, 5) and F, H4 of the histone core (contour level, 2).

Fig. S12. Local-directional resolution measurement and local anisotropy analysis with MonoDir.
A, D and G, Average directional resolution plots for the crosslinked OCT4-SOX2-NCPSHL-6, non-crosslinked NCPSHL-6 and crosslinked OCT4-SOX2-NCPSHL+6 complexes, respectively. B, E and H, Polar angular distribution plots showing the distribution of the highest local-directional resolutions for the crosslinked OCT4-SOX2-NCPSHL-6, non-crosslinked NCPSHL-6 and crosslinked OCT4-SOX2-NCPSHL+6 complexes, respectively. C, F and I, Radial average of local-directional resolution maps (tangential-resolution map, radial-resolution map, high- and low-resolution maps and MonoRes map) of the crosslinked OCT4-SOX2-NCPSHL-6, non-crosslinked NCPSHL-6 and crosslinked OCT4-SOX2-NCPSHL+6 complexes, respectively. Plots were generated using the MonoDir implementation and visualized using Scipion 2.0 (Xmipp3) (68, 70).

Fig. S13. The OCT4-POUHD domain is not engaged with its motif at SHL+6. A, Representative cryo-EM micrograph and reference-free 2D class averages for Dataset 3 of the OCT4-SOX2-NCPSHL+6 complex. The micrograph was denoised by JANNI (62). The acquisition was performed at a nominal magnification of 105,000 × in EFTEM nanoprobe mode, yielding a pixel size of 1.06 Å at the specimen level. B, A Titan Krios 300 keV microscope was used to collect 7,170 micrographs. All dose-fractionated micrograph stacks were subjected to beam-induced motion correction with MotionCorr in RELION 3.0 (60). Particles were picked with crYOLO and subjected to only one round of 2D classification to clean up particles before 3D classification in RELION 3.0 (60). Varied classification of this additional dataset yielded a 3D reconstruction of an OCT4-SOX2-bound nucleosome at SHL+6 at 4.5 Å containing 54,648 particles. In addition to reasonably well-resolved density for OCT4(POUS)-SOX2-HMG, this reconstruction also shows evidence of additional density emanating from the docked OCT4 POUS that we tentatively assign to the POUHD (see also panel C).
C, Cryo-EM map of a 3D classification showing the density we tentatively assign to the POUHD that is ~17 Å away from the most C-terminal residue of the POUS domain in the OCT4-SOX2-NCPSHL+6 model. D, Model depicting the clash of the POUHD motif in this OCT4-SOX2 orientation with the neighboring DNA gyre. Binding of the POUHD to its motif juxtaposed to the POUS motif on the same DNA strand, as observed in the free DNA structure (PDB: 1O4X), would result in clashes with the nucleosome DNA gyre.

Fig. S14. Density correlation plots of the OCT4-SOX2-NCP bound models with corresponding maps. Density correlation (CC) of the A, OCT4-SOX2-NCPSHL-6 map and B, OCT4-SOX2-NCPSHL+6 map with the corresponding model. The DNA (chains I and J) shows increased flexibility and correspondingly lower CC values at the ends. The histone proteins (chains A-H) for both models do not show significant variation, with a CC value near 1.0 for all residues. For OCT4 and SOX2, sidechains that were not resolved by density have been set to zero occupancy in the atomic model.

Fig. S15. The HMG-POUS partial motif is bound in vivo and requires Oct4 and Sox2 for accessibility. A, Representation of SOX2 HMG and OCT4 POU domains interaction with the preferred motif sequence, as the Oct4-Sox2 position weight matrix (MA0142.1). The dashed lines indicate the DNA bases contacted by OCT4-SOX2 in the nucleosome-bound structure. B, Oct4 ChIP-qPCR data at the ectopic insertion locus using a non-overlapping second primer set and endogenous control locus (cont. primer data as in Fig. 5B) (*P < 0.05, error bars indicate SEM of at least two biological replicates). C, Heatmaps showing accessibility (measured by ATAC-seq) at Oct4-bound sites in cells in the presence and absence of Oct4 (36). Displayed are the top thousand Oct4-bound loci (ranked by accessibility in Oct4-expressing cells) for the canonical and partial HMG-POUS motif.
Library size-normalized read densities (51-bp smoothed) are shown ± 0.5 kbp around the motif. D, A metaplot of ATAC-seq signal in mES cells before and after Oct4 knockdown (36). Data as in C. E, Heatmaps showing accessibility (measured by ATAC-seq) at Oct4-bound sites in cells upon knockdown of Oct4 in additional datasets (37). Displayed are loci as in C. F, As in E but showing accessibility in cells before and after knockdown of Sox2 (37). G and H, Metaplot of OCT4 ATAC-seq signal in mES cells before and after Oct4 and Sox2 knockdown (37). Data as in E and F, respectively.

Fig. S16. The OCT4 POUS clash score negatively correlates with SeEN-seq binding. A, The OCT4-alone SeEN-seq 3-bp running mean profile is plotted (solid red line) together with the POUS nucleosome atom clash score (grey bars) or the B, POUHD nucleosome atom clash score. C, OCT4-SOX2 SeEN-seq 3-bp running mean profile is plotted (solid purple line) together with the POUS nucleosome atom clash score (grey bars) or the D, POUHD nucleosome atom clash score (see Materials and Methods).

Fig. S17. Model of OCT4-SOX2 binding in higher order chromatin structure. Model of the OCT4-SOX2-NCPSHL-6 nucleosome within a tetranucleosome (PDB: 5OY7) (39, 83). The OCT4-SOX2-NCPSHL-6 was aligned to the tetranucleosome structure using the histone chains. Only the DNA bound by the factors of the OCT4-SOX2-NCPSHL-6 model is shown for clarity.

Table S1.
Cryo-EM data collection, refinement and validation statistics

| | OCT4-SOX2-NCPSHL-6 (EMD-10406) (PDB 6T90) | NCPSHL-6 (EMD-10408) (PDB 6T93) | OCT4-SOX2-NCPSHL+6 (EMD-10864) (PDB 6YOV) |
|---|---|---|---|
| Data collection and processing | | | |
| Microscope | Titan Krios | Titan Krios | Titan Krios |
| Camera | K2 | K2 | K2 |
| Magnification, nominal | 130,000 | 130,000 | 130,000 |
| Magnification, calibrated | 58,140 | 58,140 | 58,140 |
| Voltage (keV) | 300 | 300 | 300 |
| Total dose (e–/Å2) | 45 | 45 | 45 |
| Number of frames | 40 | 40 | 40 |
| Defocus range (μm) | -0.25 – -2.0 | -0.2 – -2.0 | -0.2 – -2.0 |
| Pixel size (Å) | 0.86 | 0.86 | 0.86 |
| Energy filter slit width | 20 eV | 20 eV | 20 eV |
| Acquisition software | EPU | EPU | EPU |
| No. of micrographs | 5,702 | 4,905 | Dataset 1: 3,669; Dataset 2: 4,387 |
| Symmetry imposed | C1 | C1 | C1 |
| Initial particle images (no.) | 1,391,576 | 853,266 | 2,472,943 (2 datasets) |
| Final particle images (no.) | 94,282 | 6,302 | 71,284 |
| Map resolution, masked (Å), FSC threshold (0.5, 0.143) | 3.05 (0.143); 3.30 (0.5) | 3.49 (0.143); 4.15 (0.5) | 3.42 (0.143); 4.12 (0.5) |
| Map resolution, unmasked (Å), FSC threshold (0.5, 0.143) | 3.8 (0.143); 6.03 (0.5) | 6.7 (0.143); 9.37 (0.5) | 3.7 (0.143); 6.03 (0.5) |
| Resolution range due to anisotropy (Å) (best, worst PSF) | 2.03 – 4.14 | 2.42 – 5.49 | 2.44 – 4.37 |
| Efficiency (Eod) | 0.73 | 0.79 | 0.78 |
| Map resolution range (Å) | 3.0–11 | 3.0–9 | 3.0–12 |
| Refinement | | | |
| Refinement package | Phenix, Rosetta real space | Phenix, Rosetta real space | Phenix, Rosetta real space |
| Resolution cut-off (Å) | 3.05 | 3.49 | 3.42 |
| Initial models used (PDB codes) | 6NJ9, 1GT0, 3L1P | 6NJ9 | 6T90 |
| Model resolution (Å) | 3.05 | 3.49 | 3.42 |
| FSC threshold | 0.143 | 0.143 | 0.143 |
| Map sharpening B factor (Å2) | -98 | -69 | -104 |
| Model composition | | | |
| Non-hydrogen atoms | 13,031 | 12,199 | 12,298 |
| Protein residues | 903 | 758 | 898 |
| Nucleotides | 282 | 302 | 262 |
| Ligands | 9 (PTD) | | 8 (PTD) |
| B factors (Å2) | | | |
| Protein | 124 | 82 | 167 |
| DNA | 181 | 146 | 225 |
| R.m.s. deviations | | | |
| Bond lengths (Å) | 0.022 | 0.014 | 0.011 |
| Bond angles (°) | 1.56 | 1.087 | 1.056 |
| Validation | | | |
| MolProbity score | 0.66 | 0.73 | 0.87 |
| Clashscore | 0.46 | 0.73 | 1.39 |
| Poor rotamers (%) | 0.00 | 0.00 | 0.00 |
| Ramachandran plot | | | |
| Favored (%) | 99.66 | 99.73 | 100.0 |
| Allowed (%) | 0.34 | 0.27 | 0.00 |
| Disallowed (%) | 0.00 | 0.00 | 0.00 |
| C-beta deviations | 0.0 | 0.0 | 0.0 |
| EMringer score | 4.3 | 2.9 | 1.8 |
| CaBLAM outliers (%) | 0.7 | 0.6 | 0.7 |
| Model-to-data fit* | | | |
| CCmask | 0.81 | 0.84 | 0.82 |
| CCbox | 0.84 | 0.87 | 0.88 |
| CCpeaks | 0.79 | 0.79 | 0.78 |
| CCvolume | 0.80 | 0.82 | 0.82 |

Movie S1. OCT4-SOX2 binding at SHL-6 removes DNA from the histone core. A morph video modelling the structural change induced in the nucleosome upon OCT4-SOX2 binding at SHL-6. Morph is between the DNA of the NCP-SHL-6 and OCT4-SOX2-NCP-SHL-6 models.

References and Notes

1. A. Soufi, M. F. Garcia, A. Jaroszewicz, N. Osman, M. Pellegrini, K. S. Zaret, Pioneer transcription factors target partial DNA motifs on nucleosomes to initiate reprogramming. Cell 161, 555–568 (2015). doi:10.1016/j.cell.2015.03.017 Medline 2. D. J. Rodda, J.-L. Chew, L.-H. Lim, Y.-H. Loh, B. Wang, H.-H. Ng, P. Robson, Transcriptional regulation of nanog by OCT4 and SOX2. J. Biol. Chem. 280, 24731–24737 (2005). doi:10.1074/jbc.M502573200 Medline 3. K. Takahashi, S. Yamanaka, Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006). doi:10.1016/j.cell.2006.07.024 Medline 4. V. Malik, L. V. Glaser, D. Zimmer, S. Velychko, M. Weng, M. Holzner, M. Arend, Y. Chen, Y. Srivastava, V. Veerapandian, Z. Shah, M. A. Esteban, H. Wang, J. Chen, H. R. Schöler, A. P. Hutchins, S. H. Meijsing, S. Pott, R. Jauch, Pluripotency reprogramming by competent and incompetent POU factors uncovers temporal dependency for Oct4 and Sox2. Nat. Commun. 10, 3477 (2019). doi:10.1038/s41467-019-11054-7 Medline 5. D. C. Ambrosetti, C. Basilico, L. Dailey, Synergistic activation of the fibroblast growth factor 4 enhancer by Sox2 and Oct-3 depends on protein-protein interactions facilitated by a specific spatial arrangement of factor binding sites. Mol. Cell. Biol. 17, 6321–6329 (1997). doi:10.1128/MCB.17.11.6321 Medline 6. T. Kumar Mistri, C. S. Lam, W. Arindrarto, D. Rodda, Y.
H. Foo, W. Ping Ng, S. Ahmed, P. Robson, T. Wohland, Quantitative Determination of Oct4-Sox2 Heterodimer Formation with Nanog Promoter Element. Biophys. J. 100, 74a (2011). doi:10.1016/j.bpj.2010.12.607 7. C. C. Adams, J. L. Workman, Binding of disparate transcriptional activators to nucleosomal DNA is inherently cooperative. Mol. Cell. Biol. 15, 1405–1421 (1995). doi:10.1128/MCB.15.3.1405 Medline 8. L. A. Mirny, Nucleosome-mediated cooperativity between transcription factors. Proc. Natl. Acad. Sci. U.S.A. 107, 22534–22539 (2010). doi:10.1073/pnas.0913805107 Medline 9. K. S. Zaret, J. S. Carroll, Pioneer transcription factors: Establishing competence for gene expression. Genes Dev. 25, 2227–2241 (2011). doi:10.1101/gad.176826.111 Medline 10. F. Zhu, L. Farnung, E. Kaasinen, B. Sahu, Y. Yin, B. Wei, S. O. Dodonova, K. R. Nitta, E. Morgunova, M. Taipale, P. Cramer, J. Taipale, The interaction landscape between transcription factors and the nucleosome. Nature 562, 76–81 (2018). doi:10.1038/s41586-018-0549-5 Medline 11. M. Fernandez Garcia, C. D. Moore, K. N. Schulz, O. Alberto, G. Donague, M. M. Harrison, H. Zhu, K. S. Zaret, Structural Features of Transcription Factors Associating with Nucleosome Binding. Mol. Cell 75, 921–932.e6 (2019). doi:10.1016/j.molcel.2019.06.009 Medline 12. R. K. McGinty, S. Tan, Nucleosome structure and function. Chem. Rev. 115, 2255–2273 (2015). doi:10.1021/cr500373h Medline 13. S. Matsumoto, S. Cavadini, R. D. Bunker, R. S. Grand, A. Potenza, J. Rabl, J. Yamamoto, A. D. Schenk, D. Schübeler, S. Iwai, K. Sugasawa, H. Kurumizaka, N. H. Thomä, DNA damage detection in nucleosomes involves DNA register shifting. Nature 571, 79–84 (2019). doi:10.1038/s41586-019-1259-3 Medline 14. L. A. Cirillo, C. E. McPherson, P. Bossard, K. Stevens, S. Cherian, E. Y. Shim, K. L. Clark, S. K. Burley, K. S. Zaret, Binding of the winged-helix transcription factor HNF3 to a linker histone site on the nucleosome. EMBO J. 17, 244–254 (1998).
doi:10.1093/emboj/17.1.244 Medline 15. B. Fierz, M. G. Poirier, Biophysics of Chromatin Dynamics. Annu. Rev. Biophys. 48, 321– 345 (2019). doi:10.1146/annurev-biophys-070317-032847 Medline 16. G. Li, M. Levitus, C. Bustamante, J. Widom, Rapid spontaneous accessibility of nucleosomal DNA. Nat. Struct. Mol. Biol. 12, 46–53 (2005). doi:10.1038/nsmb869 Medline 17. J. Huertas, C. M. MacCarthy, H. R. Schöler, V. Cojocaru, Nucleosomal DNA dynamics mediate Oct4 pioneer factor binding. Biophys. J. S0006-3495(20)30032-1 (2020). doi:10.1016/j.bpj.2019.12.038 Medline 18. A. Reményi, K. Lins, L. J. Nissen, R. Reinbold, H. R. Schöler, M. Wilmanns, Crystal structure of a POU/HMG/DNA ternary complex suggests differential assembly of Oct4 and Sox2 on two enhancers. Genes Dev. 17, 2048–2059 (2003). doi:10.1101/gad.269303 Medline 19. D. Esch, J. Vahokoski, M. R. Groves, V. Pogenberg, V. Cojocaru, H. Vom Bruch, D. Han, H. C. A. Drexler, M. J. Araúzo-Bravo, C. K. L. Ng, R. Jauch, M. Wilmanns, H. R. Schöler, A unique Oct4 interface is crucial for reprogramming to pluripotency. Nat. Cell Biol. 15, 295–301 (2013). doi:10.1038/ncb2680 Medline 20. X. Yu, M. J. Buck, Defining TP53 pioneering capabilities with competitive nucleosome binding assays. Genome Res. 29, 107–115 (2019). doi:10.1101/gr.234104.117 Medline 21. G. D. Stormo, Z. Zuo, Y. K. Chang, Spec-seq: Determining protein-DNA-binding specificity by sequencing. Brief. Funct. Genomics 14, 30–38 (2015). doi:10.1093/bfgp/elu043 Medline 22. L. A. Boyer, T. I. Lee, M. F. Cole, S. E. Johnstone, S. S. Levine, J. P. Zucker, M. G. Guenther, R. M. Kumar, H. L. Murray, R. G. Jenner, D. K. Gifford, D. A. Melton, R. Jaenisch, R. A. Young, Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956 (2005). doi:10.1016/j.cell.2005.08.020 Medline 23. X. Chen, H. Xu, P. Yuan, F. Fang, M. Huss, V. B. Vega, E. Wong, Y. L. Orlov, W. Zhang, J. Jiang, Y.-H. Loh, H. C. Yeo, Z. X. Yeo, V. Narang, K. R. Govindarajan, B. 
Leong, A. Shahab, Y. Ruan, G. Bourque, W.-K. Sung, N. D. Clarke, C.-L. Wei, H.-H. Ng, Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117 (2008). doi:10.1016/j.cell.2008.04.043
24. N. Tapia, C. MacCarthy, D. Esch, A. Gabriele Marthaler, U. Tiemann, M. J. Araúzo-Bravo, R. Jauch, V. Cojocaru, H. R. Schöler, Dissecting the role of distinct OCT4-SOX2 heterodimer configurations in pluripotency. Sci. Rep. 5, 13533 (2015). doi:10.1038/srep13533
25. A. Khan, O. Fornes, A. Stigliani, M. Gheorghe, J. A. Castro-Mondragon, R. van der Lee, A. Bessy, J. Chèneby, S. R. Kulkarni, G. Tan, D. Baranasic, D. J. Arenillas, A. Sandelin, K. Vandepoele, B. Lenhard, B. Ballester, W. W. Wasserman, F. Parcy, A. Mathelier, JASPAR 2018: Update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. 46, D260–D266 (2018). doi:10.1093/nar/gkx1126
26. P. T. Lowary, J. Widom, New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning. J. Mol. Biol. 276, 19–42 (1998). doi:10.1006/jmbi.1997.1494
27. S. Li, E. B. Zheng, L. Zhao, S. Liu, Nonreciprocal and Conditional Cooperativity Directs the Pioneer Activity of Pluripotency Transcription Factors. Cell Rep. 28, 2689–2703.e4 (2019). doi:10.1016/j.celrep.2019.07.103
28. D. C. Williams Jr., M. Cai, G. M. Clore, Molecular basis for synergistic transcriptional activation by Oct1 and Sox2 revealed from the solution structure of the 42-kDa Oct1.Sox2.Hoxb1-DNA ternary transcription factor complex. J. Biol. Chem. 279, 1449–1457 (2004). doi:10.1074/jbc.M309790200
29. H. Stark, GraFix: Stabilization of fragile macromolecular complexes for single particle cryoEM. Methods Enzymol. 481, 109–126 (2010). doi:10.1016/S0076-6879(10)81005-5
30. P. Scaffidi, M. E. Bianchi, Spatially precise DNA bending is an essential activity of the Sox2 transcription factor. J. Biol. Chem. 276, 47296–47302 (2001). doi:10.1074/jbc.M107619200
31. K. Luger, A. W. Mäder, R. K. Richmond, D. F. Sargent, T. J. Richmond, Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251–260 (1997). doi:10.1038/38444
32. M. P. Meers, D. H. Janssens, S. Henikoff, Pioneer Factor-Nucleosome Binding Events during Differentiation Are Motif Encoded. Mol. Cell 75, 562–575.e5 (2019). doi:10.1016/j.molcel.2019.05.025
33. C. Chronis, P. Fiziev, B. Papp, S. Butz, G. Bonora, S. Sabri, J. Ernst, K. Plath, Cooperative Binding of Transcription Factors Orchestrates Reprogramming. Cell 168, 442–459.e20 (2017). doi:10.1016/j.cell.2016.12.016
34. Z. Liu, W. L. Kraus, Catalytic-Independent Functions of PARP-1 Determine Sox2 Pioneer Activity at Intractable Genomic Loci. Mol. Cell 65, 589–603.e9 (2017). doi:10.1016/j.molcel.2017.01.017
35. F. Lienert, C. Wirbelauer, I. Som, A. Dean, F. Mohn, D. Schübeler, Identification of genetic elements that autonomously determine DNA methylation states. Nat. Genet. 43, 1091–1097 (2011). doi:10.1038/ng.946
36. H. W. King, R. J. Klose, The pioneer factor OCT4 requires the chromatin remodeller BRG1 to support gene regulatory element function in mouse embryonic stem cells. eLife 6, e22631 (2017). doi:10.7554/eLife.22631
37. E. T. Friman, C. Deluz, A. C. A. Meireles-Filho, S. Govindan, V. Gardeux, B. Deplancke, D. M. Suter, Dynamic regulation of chromatin accessibility by pluripotency transcription factors across the cell cycle. eLife 8, e50087 (2019). doi:10.7554/eLife.50087
38. M. A. Hall, A. Shundrovsky, L. Bai, R. M. Fulbright, J. T. Lis, M. D. Wang, High-resolution dynamic mapping of histone-DNA interactions in a nucleosome. Nat. Struct. Mol. Biol. 16, 124–129 (2009). doi:10.1038/nsmb.1526
39. T. Schalch, S. Duda, D. F. Sargent, T. J. Richmond, X-ray structure of a tetranucleosome and its implications for the chromatin fibre. Nature 436, 138–141 (2005). doi:10.1038/nature03686
40. A. Osakabe, H. Tachiwana, W. Kagawa, N. Horikoshi, S. Matsumoto, M. Hasegawa, N. Matsumoto, T. Toga, J. Yamamoto, F. Hanaoka, N. H. Thomä, K. Sugasawa, S. Iwai, H. Kurumizaka, Structural basis of pyrimidine-pyrimidone (6-4) photoproduct recognition by UV-DDB in the nucleosome. Sci. Rep. 5, 16330 (2015). doi:10.1038/srep16330
41. B. Fierz, C. Chatterjee, R. K. McGinty, M. Bar-Dagan, D. P. Raleigh, T. W. Muir, Histone H2B ubiquitylation disrupts local and higher-order chromatin compaction. Nat. Chem. Biol. 7, 113–119 (2011). doi:10.1038/nchembio.501
42. A. J. Ruthenburg, H. Li, T. A. Milne, S. Dewell, R. K. McGinty, M. Yuen, B. Ueberheide, Y. Dou, T. W. Muir, D. J. Patel, C. D. Allis, Recognition of a mononucleosomal histone modification pattern by BPTF via multivalent interactions. Cell 145, 692–706 (2011). doi:10.1016/j.cell.2011.03.053
43. W. Abdulrahman, M. Uhring, I. Kolb-Cheynel, J.-M. Garnier, D. Moras, N. Rochel, D. Busso, A. Poterszman, A set of baculovirus transfer vectors for screening of affinity tags and parallel expression strategies. Anal. Biochem. 385, 383–385 (2009). doi:10.1016/j.ab.2008.10.044
44. D. Gaidatzis, A. Lerch, F. Hahne, M. B. Stadler, QuasR: Quantification and annotation of short reads in R. Bioinformatics 31, 1130–1132 (2015). doi:10.1093/bioinformatics/btu781
45. B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009). doi:10.1186/gb-2009-10-3-r25
46. K.-S. Chan, B. Ripley, TSA: Time Series Analysis (R package version 1.2, 2018); https://CRAN.R-project.org/package=TSA.
47. Y. Benjamini, T. P. Speed, Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 40, e72 (2012). doi:10.1093/nar/gks001
48. C. A. Meyer, X. S. Liu, Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat. Rev. Genet. 15, 709–721 (2014). doi:10.1038/nrg3788
49. M. Teng, R. A. Irizarry, Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data. Genome Res. 27, 1930–1938 (2017). doi:10.1101/gr.220673.117
50. Y. Zhu, R. M. Stephens, P. S. Meltzer, S. R. Davis, SRAdb: Query and use public next-generation sequencing data from within R. BMC Bioinformatics 14, 19 (2013). doi:10.1186/1471-2105-14-19
51. Y. Zhang, T. Liu, C. A. Meyer, J. Eeckhoute, D. S. Johnson, B. E. Bernstein, C. Nusbaum, R. M. Myers, M. Brown, W. Li, X. S. Liu, Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008). doi:10.1186/gb-2008-9-9-r137
52. S. Heinz, C. Benner, N. Spann, E. Bertolino, Y. C. Lin, P. Laslo, J. X. Cheng, C. Murre, H. Singh, C. K. Glass, Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010). doi:10.1016/j.molcel.2010.05.004
53. H. Pagès, P. Aboyoun, R. Gentleman, S. DebRoy, Biostrings: Efficient manipulation of biological strings (R package version 2.52.0, 2019).
54. S. Masui, Y. Nakatake, Y. Toyooka, D. Shimosato, R. Yagi, K. Takahashi, H. Okochi, A. Okuda, R. Matoba, A. A. Sharov, M. S. H. Ko, H. Niwa, Pluripotency governed by Sox2 via regulation of Oct3/4 expression in mouse embryonic stem cells. Nat. Cell Biol. 9, 625–635 (2007). doi:10.1038/ncb1589
55. M. Martin, Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12 (2011). doi:10.14806/ej.17.1.200
56. Team BC, Maintainer BP, TxDb.Mmusculus.UCSC.mm10.knownGene: Annotation package for TxDb object(s) (R package version 3.4.7, 2019).
57. B. D. Marks, N. Qadir, H. C. Eliason, M. S. Shekhani, K. Doering, K. W. Vogel, Multiparameter analysis of a screen for progesterone receptor ligands: Comparing fluorescence lifetime and fluorescence polarization measurements. Assay Drug Dev. Technol. 3, 613–622 (2005). doi:10.1089/adt.2005.3.613
58. H. Taguchi, N. Horikoshi, Y. Arimura, H. Kurumizaka, A method for evaluating nucleosome stability with a protein-binding fluorescent dye. Methods 70, 119–126 (2014). doi:10.1016/j.ymeth.2014.08.019
59. A. D. Schenk, S. Cavadini, N. H. Thomä, C. Genoud, Live analysis and reconstruction of single-particle cryo-electron microscopy data with CryoFLARE. bioRxiv 861740 [Preprint]. 2 December 2019. https://doi.org/10.1101/861740.
60. J. Zivanov, T. Nakane, B. O. Forsberg, D. Kimanius, W. J. H. Hagen, E. Lindahl, S. H. W. Scheres, New tools for automated high-resolution cryo-EM structure determination in RELION-3. eLife 7, e42166 (2018). doi:10.7554/eLife.42166
61. K. Zhang, Gctf: Real-time CTF determination and correction. J. Struct. Biol. 193, 1–12 (2016). doi:10.1016/j.jsb.2015.11.003
62. T. Wagner, F. Merino, M. Stabrin, T. Moriya, C. Antoni, A. Apelbaum, P. Hagel, O. Sitsel, T. Raisch, D. Prumbaum, D. Quentin, D. Roderer, S. Tacke, B. Siebolds, E. Schubert, T. R. Shaikh, P. Lill, C. Gatsogiannis, S. Raunser, SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Commun. Biol. 2, 218 (2019). doi:10.1038/s42003-019-0437-z
63. A. Punjani, J. L. Rubinstein, D. J. Fleet, M. A. Brubaker, cryoSPARC: Algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017). doi:10.1038/nmeth.4169
64. S. H. Scheres, RELION: Implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012). doi:10.1016/j.jsb.2012.09.006
65. P. B. Rosenthal, R. Henderson, Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol. 333, 721–745 (2003). doi:10.1016/j.jmb.2003.07.013
66. S. Chen, G. McMullan, A. R. Faruqi, G. N. Murshudov, J. M. Short, S. H. W. Scheres, R. Henderson, High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy 135, 24–35 (2013). doi:10.1016/j.ultramic.2013.06.004
67. P. D. Adams, P. V. Afonine, G. Bunkóczi, V. B. Chen, I. W. Davis, N. Echols, J. J. Headd, L.-W. Hung, G. J. Kapral, R. W. Grosse-Kunstleve, A. J. McCoy, N. W. Moriarty, R. Oeffner, R. J. Read, D. C. Richardson, J. S. Richardson, T. C. Terwilliger, P. H. Zwart, PHENIX: A comprehensive Python-based system for macromolecular structure solution. Acta Cryst. D66, 213–221 (2010). doi:10.1107/S0907444909052925
68. J. M. de la Rosa-Trevín, J. Otón, R. Marabini, A. Zaldívar, J. Vargas, J. M. Carazo, C. O. S. Sorzano, Xmipp 3.0: An improved software suite for image processing in electron microscopy. J. Struct. Biol. 184, 321–328 (2013). doi:10.1016/j.jsb.2013.09.015
69. K. Naydenova, C. J. Russo, Measuring the effects of particle orientation to improve the efficiency of electron cryomicroscopy. Nat. Commun. 8, 629 (2017). doi:10.1038/s41467-017-00782-3
70. P. R. Baldwin, D. Lyumkis, Non-uniformity of projection distributions attenuates resolution in Cryo-EM. Prog. Biophys. Mol. Biol. 150, 160–183 (2020). doi:10.1016/j.pbiomolbio.2019.09.002
71. P. Emsley, K. Cowtan, Coot: Model-building tools for molecular graphics. Acta Cryst. D60, 2126–2132 (2004). doi:10.1107/S0907444904019158
72. T. C. Terwilliger, F. Dimaio, R. J. Read, D. Baker, G. Bunkóczi, P. D. Adams, R. W. Grosse-Kunstleve, P. V. Afonine, N. Echols, phenix.mr_rosetta: Molecular replacement and model rebuilding with Phenix and Rosetta. J. Struct. Funct. Genomics 13, 81–90 (2012). doi:10.1007/s10969-012-9129-3
73. F. DiMaio, Y. Song, X. Li, M. J. Brunner, C. Xu, V. Conticello, E. Egelman, T. Marlovits, Y. Cheng, D. Baker, Atomic-accuracy models from 4.5-Å cryo-electron microscopy data with density-guided iterative local refinement. Nat. Methods 12, 361–365 (2015). doi:10.1038/nmeth.3286
74. A. A. Lebedev, P. Young, M. N. Isupov, O. V. Moroz, A. A. Vagin, G. N. Murshudov, JLigand: A graphical tool for the CCP4 template-restraint library. Acta Cryst. D68, 431–440 (2012). doi:10.1107/S090744491200251X
75. I. W. Davis, A. Leaver-Fay, V. B. Chen, J. N. Block, G. J. Kapral, X. Wang, L. W. Murray, W. B. Arendall 3rd, J. Snoeyink, J. S. Richardson, D. C. Richardson, MolProbity: All-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 35, W375–W383 (2007). doi:10.1093/nar/gkm216
76. Y. Q. Feng, J. Seibler, R. Alami, A. Eisen, K. A. Westerman, P. Leboulch, S. Fiering, E. E. Bouhassira, Site-specific chromosomal integration in mammalian cells: Highly efficient CRE recombinase-mediated cassette exchange. J. Mol. Biol. 292, 779–785 (1999). doi:10.1006/jmbi.1999.3113
77. F. Mohn, M. Weber, M. Rebhan, T. C. Roloff, J. Richter, M. B. Stadler, M. Bibel, D. Schübeler, Lineage-specific polycomb targets and de novo DNA methylation define restriction and potential of neuronal progenitors. Mol. Cell 30, 755–766 (2008). doi:10.1016/j.molcel.2008.05.007
78. C. Magis, J.-F. Taly, G. Bussotti, J.-M. Chang, P. Di Tommaso, I. Erb, J. Espinosa-Carrasco, C. Notredame, T-Coffee: Tree-based consistency objective function for alignment evaluation. Methods Mol. Biol. 1079, 117–129 (2014). doi:10.1007/978-1-62703-646-7_7
79. A. M. Waterhouse, J. B. Procter, D. M. Martin, M. Clamp, G. J. Barton, Jalview Version 2—A multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009). doi:10.1093/bioinformatics/btp033
80. L. Hou, Y. Srivastava, R. Jauch, Molecular basis for the genome engagement by Sox proteins. Semin. Cell Dev. Biol. 63, 2–12 (2017). doi:10.1016/j.semcdb.2016.08.005
81. T. Burnley, C. M. Palmer, M. Winn, Recent developments in the CCP-EM software suite. Acta Cryst. D73, 469–477 (2017). doi:10.1107/S2059798317007859
82. G. N. Murshudov, P. Skubák, A. A. Lebedev, N. S. Pannu, R. A. Steiner, R. A. Nicholls, M. D. Winn, F. Long, A. A. Vagin, REFMAC5 for the refinement of macromolecular crystal structures. Acta Cryst. D67, 355–367 (2011). doi:10.1107/S0907444911001314
83. B. Ekundayo, T. J. Richmond, T. Schalch, Capturing Structural Heterogeneity in Chromatin Fibers. J. Mol. Biol. 429, 3031–3042 (2017). doi:10.1016/j.jmb.2017.09.002