Online supporting materials for: On the validation of crystallographic symmetry and the quality of structures

Jimin Wang*

Contents: 7 sections on specific responses to reviewers, supplementary analytic procedures, discussion, and notes; 3 supplemental Figures; 9 supplemental Tables; and 46 supplemental references.

Supplemental Text

1. Introduction

The intended readership of this manuscript is the entire structural biology community, not just experts in crystallography. All highly technical, non-essential details have been removed from the main text for clarity and are included in this online supporting materials section where necessary. This material includes technical responses to specific requests by reviewers of this manuscript as well as supplementary information to the main text, as stated in the first sentence of each section.

2. Direct methods and symmetry downshifting

One reviewer requested more information on direct methods for phasing in regard to Miao’s algorithm. There are two classes of direct methods: a pure probability approach in reciprocal space, and a class of dual-space “charge-flipping” algorithms that includes the classic solvent-flattening procedure used in macromolecular crystallography. Miao’s procedure belongs to the dual-space “charge-flipping” algorithms.1,2 More specifically, Miao’s procedure numerically determines the unknown values of electron density for properly sized pixels inside the protein boundary, using about 2.6 equations on average that relate the observed amplitudes to every unknown density value through Fourier transformation. This procedure can determine the solvent boundary itself, but not very effectively; it works better when a prior solvent boundary is supplied. Different dual-space procedures differ in speed by several orders of magnitude, and some become impractical when dealing with millions of independent observations, as in the C2 crystals of E.
coli tetrameric catalase, for which phase solutions are currently being pursued in the author’s laboratory. Many other programs, such as ACORN, do not require any prior information about the structure but often require high resolution.3 When a subset of positioned atoms is used for starting trial phases, these programs can work with data at much lower resolution. For example, the C2 crystal of E. coli catalase diffracted to a resolution of 1.4 Å with an <I/σI> value of 0.5 in the highest resolution shell. We processed the data to a resolution of 1.3 Å and extended them to 1.0 Å resolution using Wilson-expected average values for phase solution. In this case, the direct method using the program ACORN readily produced interpretable electron density maps when starting phases were seeded with as few as 6.7% of the atoms positioned (out of about 30,000 atoms in the asymmetric unit). The success of this example suggests that a large fraction of the structures in the PDB can be phased using a variety of direct methods, a subject that is beyond the scope of this manuscript.

Another application of direct methods for phase retrieval, highly relevant to this study on the quality of data, is that of 4HYO.4 Using the re-processed data in the correct P4212 space group at a resolution of 1.65 Å, extended to 1.0 Å using Wilson-expected average values, complete phase retrieval was possible using ACORN starting from a properly positioned, single generic poly-alanine helix model. This was not possible using the original P1 4HYO data. When positioned Cα atoms for the entire chain in P4212, or those for 8 chains in P1, were used for starting trial phases, complete phase retrieval using ACORN was possible for both space groups. However, the maps in P4212, with the better quality of data, were clearly better than those in P1 (Fig. S1).
This difference disappeared when the reprocessed P4212 data were re-expanded to P1, because the 8-fold reciprocal-space averaging improved the quality of the resulting P1 data relative to the original P1 data. Details of this study will be described elsewhere.

Symmetry downshifting is often a preferred approach for dual-space direct methods of phasing because crystals of macromolecules have a solvent region, where randomly placed atoms cannot provide good starting trial phases that would eventually lead to the correct solution.2 Such impossible trial phases lead to a time-consuming, very slowly converging process. In fact, these impossible trials often consume over 95% of the total CPU time, whereas possible trials consume less than 5%. After symmetry downshifting to P1, any point in the unit cell becomes a possible-solution trial, which greatly speeds up the computation. Even when this specific symmetry downshifting procedure is used, the data must first be processed in the correct symmetry of the space group before they are symmetry downshifted. Of course, once a solution for the phases is found, the resulting electron density maps will reproduce the correct symmetry, and structure refinement has to be carried out in the correct space group.

2. Intensity analysis for symmetry and errors in calculated anomalous amplitudes

This section is supplementary to the main text. Self-rotation functions (SRF) were calculated using CCP4 MolRep, and the results were sorted in descending order for further analysis (Table S1).5,6 The top entries in the resulting list were selected for individual examination and structure re-refinement with properly selected CV sets after the data were rescaled in the correct higher-symmetry space groups (Table S2). During atomic re-refinement, it was found that the quality of the repositioned original models was rather poor and refinement often stalled at an early stage.
Re-refinement succeeded only after alternating steps of geometry idealization and atomic refinement were carried out.

Some of the entries retrieved from the PDB included observed anomalous signals, with which better experimental maps could be calculated using symmetry-averaged data in the correct space groups than was possible in the incorrect P1. However, many entries include an extra column of estimated anomalous signals calculated from refined coordinates, which the cif2mtz script incorrectly treated as observed signals.5,7 Difference Fourier maps using such estimated anomalous signals and calculated phases from re-refined models in the correct space groups showed that these estimated signals could not provide correct information about the locations of heavy-atom sites in most cases tested. This is consistent with the fact that adding small anomalous differences to the observed amplitudes to generate estimated amplitudes for Friedel mates requires accurate phases, which are not obtainable with symmetry downshifting. Such estimated amplitudes vary from one run of structure refinement to the next, and indirectly contain phase information from models. When such estimated amplitudes are used for structure refinement, the refinement is actually against the calculated structure factors, even though the mean values of the Friedel amplitudes are constrained to the observed amplitudes. Such a procedure is completely invalid!

All entries with Rsymm<0.15, calculated using XPREP of the SHELX suite,8 are tabulated for statistical analysis (Table S3). Of course, entries with Rsymm>0.15 can still be in higher-symmetry space groups.
For example, in a recent paper, all weak-intensity high-resolution data were included for NCS averaging, for which Rmeas is as high as 0.24;9 in that case, Rmeas was carefully monitored as a function of resolution to ensure that the correct space group was identified on the basis of low Rsymm values at low resolution. On the other hand, many entries in the database assigned to higher-symmetry space groups with Rmerging higher than 0.20 may result from incorrect symmetry averaging and actually belong to lower-symmetry space groups; this is on the opposite side of the spectrum of problems discussed in this study and is beyond its scope. Such incorrect symmetry merging would result in unusual twinning statistics in space groups (such as non-special orthogonal space groups), even when the original crystals may not have twinning problems.

3. The closely related isomorphous structures of 3LDC, 4HYO, and 4HZ3

This section is supplemental to the main text. The re-refined 4HYO structure at 1.65 Å resolution is nearly identical to the 3LDC structure at 1.45 Å resolution published by Jiang and colleagues in the same journal two years earlier, except that it is at lower resolution.4,10 In fact, the overall amplitude difference R-factor between the 4HYO and 3LDC data is only 8.2% for all the data to 1.65 Å resolution (Fig. 4). The overall isomorphous difference between 4HYO and 4HZ3 is 13.0% (Fig. 4).4 With such small isomorphous differences, the proper way to determine these closely related structures is the isomorphous difference Fourier method, not independent molecular replacement. An isomorphous difference Fourier map between the observed 4HYO and 4HZ3 data showed that (i) there was no evidence for the claimed binding of tetraammonium antimony in 4HZ3,4 and (ii) K+ sites in the central channel of the 4HYO structure were not fully occupied (Fig. 4).
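The overall amplitude differences quoted above (8.2% between 4HYO and 3LDC, 13.0% between 4HYO and 4HZ3) are isomorphous difference R-factors between two sets of observed amplitudes. The text does not spell out the exact formula, so the following is only a minimal sketch under stated assumptions: both data sets share the same indexing, a single least-squares scale factor k is applied (no resolution-dependent scaling), and the summed absolute differences are normalized by the summed amplitudes of the first set. The function name and the dictionary layout are hypothetical.

```python
def isomorphous_r_factor(f_a, f_b):
    """Return (R, k) for two dicts mapping (h, k, l) -> observed amplitude |F|.

    Assumed definition (not taken from the source):
        R = sum | |Fa| - k*|Fb| | / sum |Fa|
    over the reflections common to both sets, where k is the single
    least-squares scale factor that puts set b on the scale of set a.
    """
    common = sorted(set(f_a) & set(f_b))
    if not common:
        raise ValueError("the two data sets share no reflections")
    # Least-squares scale: minimizes sum (|Fa| - k*|Fb|)^2 over common reflections
    scale = sum(f_a[r] * f_b[r] for r in common) / sum(f_b[r] ** 2 for r in common)
    numerator = sum(abs(f_a[r] - scale * f_b[r]) for r in common)
    denominator = sum(f_a[r] for r in common)
    return numerator / denominator, scale


if __name__ == "__main__":
    # Toy amplitudes, purely illustrative (not real 4HYO/3LDC data)
    set_a = {(1, 0, 0): 100.0, (0, 1, 0): 100.0}
    set_b = {(1, 0, 0): 110.0, (0, 1, 0): 90.0}
    r_iso, k = isomorphous_r_factor(set_a, set_b)
    print(f"R_iso = {r_iso:.3f}, scale = {k:.3f}")
```

With real data, the two sets would first need to be brought to a common origin and indexing convention; this sketch does not attempt that step.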
An isomorphous difference Fourier map between the observed 4HYO and 3LDC data again showed that K+ sites were not fully occupied in 4HYO (Fig. 4). These results are consistent with earlier observations made by Jiang and colleagues that the occupancies of K+ in binding sites can be reduced by lowering the K+ concentration in the crystal stabilization solution, initially at the third K+ site.10 When K+ was modeled as fully occupying the third site in the 4HYO model during this re-refinement, a hole at this site in residual Fobs-Fcalc maps exceeded 7 σ, where Fobs and Fcalc denote the observed and calculated amplitudes, respectively. The hole was reduced below the noise level when the occupancy was reduced by 20%.

4HYO is one of many structures in the PDB that simply reproduced already-published structures, using the symmetry downshifting procedure with improved Rfree values, as new structures. These structures were often determined using independent molecular replacement rather than the isomorphous difference Fourier method. More often, different crystallographic origins were preferentially chosen, which makes such structures look really different from each other to non-specialists even when the pairs of structures are actually nearly identical to one another; some such entries are truly different and have been properly categorized as having closely related entries in the PDB.11-14

4. Experimental errors and selection of reflections for the CV set in P1

This section is a response to a request of one reviewer on why and how the thin slice method of selecting reflections for the CV set does not work in P1, in addition to the evidence that has already been provided in the main text.
An analysis of the 3MUU data for a viral glycoprotein E2 structure was carried out using the procedures described in the main text at the time when the 3MUU structure was published, because we attempted to use this E2 model to determine the structure of an E2 protein from bovine viral diarrhea virus.15,16 Symmetry R-factors for 3MUU were analyzed for each of the three potential dyads and the 3-fold axis, and are summarized in Table S9. It is clear that the crystal actually belongs to C2, not P1.

To further explain why the selection of reflections for the CV set using a thin slice method of constant resolution shells in P1 did not work for 4HYO (Fig. 4), the 3MUU entry was chosen for demonstration because, owing to its lower resolution, 3MUU had larger variations in the unconstrained unit cell parameters than 4HYO.4,15 The crystals used to obtain 3MUU belonged to C2, a subgroup of the closely related hexagonal crystal form P321, but the structure they yielded was refined in P1.15 This high-to-low symmetry downshifting made it appear that some CS- or NCS-related reflections actually differ in resolution, because unit cell dimensions were allowed to vary when the data were reprocessed. The P1 unit cell parameters obtained for 3MUU are a=67.482 Å, b=158.430 Å, c=160.676 Å, α=60.42°, β=89.80°, and γ=89.65°; for the closely related P321 unit cell, which was initially used to generate experimental maps, the parameters are a=b=159.8 Å, c=68.0 Å, α=β=90°, and γ=120°. In 3MUU, an appropriate selection of reflections for inclusion in the CV set should be done in the highest pseudo-P321 symmetry, not in P1. If the CV set were selected in the P1 3MUU data, one ought to take into account experimental errors in cell parameters.
Because b=158.430 Å and c=160.676 Å differ by about 1% in P1, the resolution of 3-fold NCS-related mates of reflections that ought to have the same resolution now also differs by as much as 1% of their resolution values, for example between pairs of (0k0) and (00l) reflections with the same index number (i.e., k=l). This resolution difference is amplified rapidly at high resolution. For example, the resolution of the reflection (0,10,0) is 15.8 Å and that of its 3-fold NCS-related mate (0,0,10) is 16.1 Å. The resolution of the reflection (0,50,0) is 3.16 Å and that of its 3-fold NCS-related mate (0,0,50) is 3.35 Å. If one does not use the NCS geometric relationship of indices for selecting reflections for the CV set (i.e., does not use the pseudo-P321 symmetry), and instead uses a thin slice method of a constant resolution shell (i.e., in P1 symmetry), the thickness of the resolution shell must cover all reflections within the resolution range between (0,50,0) and (0,0,50), i.e., between 3.16 and 3.35 Å. With such a large unit cell, where the Bragg reflections overlap severely in resolution in reciprocal space, the reflections in the shell between 3.35 and 3.16 Å make up far more than 5% of the data. At the same time, all other reflections within this resolution shell will have their own 3-fold NCS-related mates located outside this shell, and they demand a further expansion of the boundaries of the resolution shell to include all of their 3-fold NCS-related mates, and so on. This is why the thin slice method of selecting reflections for inclusion in the CV set in P1 will never work as long as experimental errors in cell parameters exist. Of course, it might work if experimental errors did not exist or if the unit cell parameters were properly constrained to be identical.
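The alternative recommended above, selecting the CV set with the pseudo-P321 symmetry rather than with thin resolution shells in P1, amounts to assigning the free flag per group of symmetry-equivalent reflections so that all mates fall into the same set. A minimal sketch follows, assuming the reflections have already been reindexed from the P1 cell into the hexagonal setting of P321 (the reindexing itself is not shown) and that Friedel mates are treated as equivalent. The function names and the 5% default are illustrative and are not taken from any refinement program.

```python
import random


def equivalents_321(hkl):
    """Reflections equivalent to (h, k, l) under point group 321 plus Friedel's law.

    Hexagonal setting assumed, with i = -(h + k).
    """
    h, k, l = hkl
    i = -(h + k)
    rotations = [
        (h, k, l), (k, i, l), (i, h, l),      # 3-fold axis along c
        (k, h, -l), (h, i, -l), (i, k, -l),   # 2-fold axes perpendicular to c
    ]
    friedel = [(-a, -b, -c) for (a, b, c) in rotations]
    return rotations + friedel


def canonical(hkl):
    """Pick one representative per orbit so that all mates share the same key."""
    return max(equivalents_321(hkl))


def assign_free_flags(reflections, fraction=0.05, seed=0):
    """Map each reflection to True (CV set) or False (working set).

    Whole orbits are drawn at random, so symmetry mates are never split
    between the two sets, whatever resolution each mate appears to have.
    """
    orbits = sorted({canonical(r) for r in reflections})
    if not orbits:
        return {}
    rng = random.Random(seed)
    n_free = max(1, round(fraction * len(orbits)))
    free_orbits = set(rng.sample(orbits, n_free))
    return {r: canonical(r) in free_orbits for r in reflections}


if __name__ == "__main__":
    # Hypothetical index range, for illustration only
    refl = [(h, k, l) for h in range(-4, 5) for k in range(-4, 5)
            for l in range(-3, 4) if (h, k, l) != (0, 0, 0)]
    flags = assign_free_flags(refl)
    print(sum(flags.values()), "of", len(flags), "reflections flagged for the CV set")
```

Because the flag is attached to the orbit rather than to a resolution value, small differences in the refined cell parameters cannot separate NCS-related mates between the working and CV sets.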
Thus, in the presence of experimental errors in cell parameters as explained in the above example, the probability of each pair of 2-fold NCS-related mates in high-resolution shells simultaneously being selected into the CV set is (5%)^2, the probability of each triplet of 3-fold NCS-related mates being in the same CV set is (5%)^3, and so on, when an ultra-thin slice method is used. Thus, the probability of any reflection in the CV set being related to one in the working set is 1-(5%)^2=99.75% or higher. This violates the requirement that the two sets of reflections be independent.17

Experimental errors in cell parameters are often ignored in structure refinement in macromolecular crystallography. For example, the overall amplitude difference between the 4HYO and 3LDC data is only 8.2% for all the data to a resolution of 1.65 Å. Yet, their unit cell parameters differ by as much as 0.6%. This cell-parameter difference was due to numerical instability in data processing, not actual differences. This numerical error can cause the entire molecule to expand or contract, including bond distances that would then be incorrectly used as restraints in structure refinement. Many investigators have over-trusted unit cell parameters produced by data processing programs, even with diffraction data at a resolution of about 5.0 Å.12 We recently analyzed the unit cell parameters and detector distance during the processing of the 1.3-Å resolution data from the C2 crystal of E. coli catalase. The data were collected using an ultra-thin oscillation range of 0.2° per image and processed in sectors of 25 images, each of which corresponds to 5° of reciprocal space. We found that the detector distance changed by as much as 0.5% from the first zone to the last zone, with corresponding expansion and contraction of the unit-cell parameters.
The numerical instability in data processing is intrinsic to currently implemented data collection and processing methods: one cannot obtain accurate three-dimensional cell parameters from two-dimensional data covering only 5° of reciprocal space, even at a resolution of 1.3 Å. These procedures need to improve.

5. Asymptotic R-factor values

This section is supplementary to the main text. Evans and Murshudov have shown in an elegant theoretical study that when a model is refined against the Wilson-expected average values of the data in the highest resolution shells, the asymptotic value of the R-factor is about 42.3%.18 This implies that when a model has successfully explained the Wilson-expected values of the data, the asymptotic value of the R-factor is about 42.3%. This value is reduced to 29.1% in the presence of perfect merohedral twinning, and is typically between these two values in the presence of any other special or non-special NCS. Thus, the original TTHRNP 3DXJ structure discussed in the text had an Rfree value of 28.9% in the presence of incorrectly proposed perfect merohedral twinning,19 and this value is only 0.2% better than the asymptotic value obtained using the Wilson-expected amplitudes for structure refinement instead of real data. Thus, it is almost certain that, in the presence of such special NCS, a structure with an R-factor of about 30% can be completely wrong.

6. NCS restraints versus CS constraints in structure refinement, and multi-start and multi-copy structure refinement

This section is mainly a response to a request of one reviewer on how special NCS and non-special NCS behave differently in structure refinement, for example in composite-omit maps. There are several important distinctions between non-special and special NCS. Downshifted symmetry cannot distinguish a correct model from an incorrect one, and always results in model bias, even when the model is incorrect.
For example, a special translational NCS relating molecules by a vector of approximately (0,0,1/2) will cause the intensities of reflections in l=odd layers to be near zero and those in l=even layers to have the same distribution as for a structure without any NCS. A placement of any unrelated molecule in the asymmetric unit (for example, RB69 DNA polymerase in a hypothetical TTHRNP unit cell) will result in completely model-biased densities for this molecule in a new location that differs by this vector, even though this molecule is not physically present (or is omitted) in the new location. Non-special NCS, such as 222-NCS in the asymmetric unit, does not have such model bias in omit maps. In fact, with sufficiently high solvent content, averaging using non-special 222-NCS can result in correct phases even when the starting models were completely wrong. Thus, there is no model bias in composite-omit maps in the presence of non-special 222-NCS.

NCS restraints after symmetry downshifting differ from strict CS or true NCS constraints in two important respects. First, molecules that are related by true NCS restraints have conformations that are similar but not necessarily identical. Second, true NCS restraints do not impose any geometric constraints on where the NCS axis is located or how it is oriented. However, the symmetry in data sets that is caused by CS will not be faithfully reproduced in the amplitudes computed for the structures obtained after CS has been reassigned as NCS unless strict NCS constraints are imposed, which requires that NCS-related molecules be all but perfectly identical and that NCS axes be very closely aligned with unit cell axes.
To resolve this somewhat paradoxical problem, some authors introduced twinning to account for the CS that is lost when they expand data into lower-symmetry space groups.19-29 Another group attempted to restore the lost CS by introducing multiple conformations of the entire asymmetric unit.30 Symmetry downshifting is another type of multi-copy refinement that does not involve rotational symmetry, and multi-copy refinement is known to reduce both working and free R-factors even when the model being refined is completely wrong.31 In the case of the first symmetry downshifting issue I addressed,17 the original authors quadrupled the number of atoms in the model: 2-fold from symmetry downshifting and 2-fold from multiple conformations of the entire asymmetric unit (or the unit cell), resulting in four copies of the model.30

There is an important distinction between multi-start refinement32 and multi-copy refinement33: the former generates better calculated phases that are not biased towards individual models, so that a better model can be rebuilt. The latter assumes that an ensemble of models is a better representation of what is in the crystal, including an averaged structure. However, an averaged structure has no physical meaning at all.

During the last steps of re-refinement of the 1EGW and 1JXO structures (Tables S5-8),34,35 Rfree values varied by about 0.2%. An attempt was made in the last steps of the 1JXO re-refinement to automatically refine 5 residues at a time throughout the entire structure by real-space refinement using graphics programs. This resulted in a decrease of Rfree values by more than 2% without a substantial decrease of Rwork, so that the gap between them was reduced. This decrease of Rfree values does not suggest improved quality, but rather results from an invalid use of reflections in the CV set: when reflections in the CV set are included in map calculations for real-space refinement, a serious problem arises, which will be addressed elsewhere.
These calculations suggest that the surfaces of the multi-parameter minimization functions for reflections in both the working and CV sets have many local minima, and that reductions in both working and free R-factors may result from being trapped inside these local minima, not necessarily from improved quality.

In 2005, I first addressed problems caused by symmetry downshifting in a brief communication arising from an original publication.17,30 It is unfortunate that the original authors and many other structural biologists mistook the journal policy that the original authors should have the last word in reply, in the brief-communication-arising format, as a vindication of symmetry downshifting based on improved Rfree values and improved composite-omit maps.17,30 These authors continued to carry out symmetry downshifting and quadrupling of atoms in the asymmetric unit to publish additional structures from the same crystal form afterwards,36 followed shortly by an explosion of symmetry-downshifted entries in the PDB (Table S1). Now, there are more than 200 entries in the PDB whose symmetry has been downshifted to P1, over 560 to P21, over 180 to C2, over 240 to P212121, and so on (Table 1, S3).

Supplemental Figures

Figure S1. Electron density maps generated using ACORN with normalized extended structure factors. (a) Using merged P4212 data for 4HYO, starting with Cα atoms only. (b) Using the original P1 4HYO data, starting with Cα atoms for all 8 chains. Maps were contoured at 3.0 σ, and arrows indicate noticeable differences between the two maps.

Figure S2. Comparison of observed (blue) and calculated (red) axial reflections along the (00l) direction. The calculated amplitudes from the incorrectly assigned P32 space group failed to reproduce the observed P65 systematic absences in the 1SMY structure.

Figure S3. Distribution of intensities (A) and intensity/sigma ratios (B) of (0k0) axial reflections in the re-merged 1EGW data.
Re-refinement has confirmed that the correct space group is P21, as supported by the intensity distribution (A), but not by the I/σI distribution (B). Violations of systematic absences in (B) are likely due to errors in the original data processing.

Figure S4. The re-refined 1EGW structure in P21 with one complex per asymmetric unit in two orthogonal views. Three additional residues were built in the protein. When the DNA duplex is rotated around its pseudo-dyad, resulting in two different orientations, only three bases per strand have different identities.

Supplemental Tables

Table S1. P1 unit cell parameters, and location and heights of self-rotation function peaks(a)

PDB(year)    Res(Å)   a(Å)      b(Å)      c(Å)      α(°)     β(°)     γ(°)     θ(°)    φ(°)     χ(°)    SR-Fun
1Z9O(2005)   1.9      50.048    50.033    90.287    90.01    90.00    60.03    90.0    -180.0   180.0   1.00
4HZ3(2013)   1.7      43.965    63.396    63.311    89.99    89.95    90.00    90.0    180.0    180.0   0.99
1WW5(2005)   2.2      47.623    57.648    64.643    89.96    111.65   90.14    90.0    -89.9    180.0   0.99
4NHZ(2013)   1.9      96.998    99.359    108.734   89.98    89.97    89.99    90.0    180.0    180.0   0.98
4HYO(2013)   1.65     44.035    63.452    63.477    90.03    89.99    89.99    90.0    0.0      180.0   0.98
3EBN(2009)   2.4      51.395    51.350    51.390    112.22   112.00   104.36   90.0    -121.5   180.0   0.98
1WW4(2005)   2.3      47.264    57.078    64.306    89.98    111.73   90.22    90.0    -90.0    180.0   0.98
4EY0(2012)   2.8      54.900    59.200    79.200    90.00    90.00    90.00    90.0    -0.0     180.0   0.97
3TQS(2011)   1.98     38.007    57.895    112.938   90.00    90.01    104.34   180.0   0.0      180.0   0.97
3RL0(2011)   3.8      53.745    127.357   142.725   107.49   90.01    90.05    90.0    -180.0   180.0   0.97
3OVJ(2011)   1.8      9.536     26.008    25.803    62.28    88.59    88.45    148.9   89.9     180.0   0.97
3MU6(2012)   2.43     41.567    61.622    61.478    114.12   89.99    89.95    90.0    180.0    180.0   0.97
3I3U(2009)   2.8      43.706    60.287    60.319    120.04   89.96    90.03    90.0    0.0      60.0    0.97
3H0R(2009)   3.0      127.377   130.411   153.973   89.89    90.21    89.95    90.0    89.9     180.0   0.97
3H0M(2009)   2.8      128.248   129.856   155.069   90.01    89.96    90.11    90.0    -89.9    180.0   0.97
2CJF(2007)   1.95     195.755   195.730   239.680   65.84    65.89    89.97    180.0   0.0      180.0   0.97
1ZBL(2005)   2.2      36.897    44.983    62.280    83.63    89.90    65.69    90.0    179.8    180.0   0.97
1JXO(2001)   2.3      45.009    53.410    63.262    89.96    90.00    91.78    180.0   0.0      180.0   0.97
1EGW(2000)   1.5      41.371    60.696    63.987    115.18   89.99    90.00    90.0    -180.0   180.0   0.97
4LOH(2013)   2.25     36.500    59.209    59.205    83.98    85.83    85.87    137.9   90.1     180.0   0.96
4IT7(2013)   2.10     43.510    44.499    44.582    90.00    89.99    90.00    180.0   0.0      180.0   0.96
3VXO(2012)   2.61     51.120    66.382    87.316    112.30   89.88    90.01    90.0    90.0     180.0   0.96
3RFU(2011)   3.2      44.150    72.980    329.950   89.96    90.04    90.22    90.0    -180.0   180.0   0.96
3HTM(2009)   2.5      36.800    88.700    88.700    90.80    89.30    89.90    90.0    180.0    180.0   0.96
3CS2(2008)   1.95     55.295    68.299    90.030    90.05    100.42   89.96    90.0    -90.0    180.0   0.96
2O4Q(2008)   1.95     55.295    68.299    90.030    90.05    100.42   89.96    90.0    -90.0    180.0   0.96
1ZLJ(2005)   2.00     33.069    60.488    74.226    89.90    89.91    90.99    180.0   0.0      180.0   0.96
1WW6(2005)   2.2      47.506    57.344    64.471    90.16    111.62   89.88    180.0   0.0      180.0   0.96
1R52(2003)   2.89     57.816    75.599    91.327    114.44   108.43   89.98    180.0   0.0      180.0   0.96
4INH(2013)   1.70     56.335    66.917    100.459   89.99    90.09    75.93    180.0   0.0      180.0   0.95
4GM3(2013)   3.39     48.943    105.985   120.907   90.00    89.76    90.03    90.0    -89.8    180.0   0.95
4F21(2013)   2.5      50.997    64.416    139.039   94.89    90.12    89.96    90.0    -0.0     180.0   0.95
4DX9(2013)   2.99     75.622    122.207   135.269   89.97    89.99    108.11   90.0    90.1     180.0   0.95
3NY2(2010)   2.61     29.390    61.456    72.806    65.05    89.98    90.01    90.0    -90.0    180.0   0.95
3MGO(2010)   2.30     63.211    79.322    87.146    90.00    90.00    90.02    180.0   0.0      180.0   0.95
3MD2(2010)   2.20     29.246    37.748    122.221   89.98    90.00    112.53   180.0   0.0      180.0   0.95
3FGQ(2009)   2.09     46.017    51.953    80.316    90.09    90.05    97.12    180.0   0.0      180.0   0.95
4LL7(2014)   2.31     28.069    49.576    149.950   89.96    84.67    89.98    180.0   0.0      180.0   0.94
3HZ2(2009)   1.86     29.300    54.202    54.179    85.81    74.31    74.32    90.0    0.3      90.0    0.94
3EVX(2009)   2.54     41.368    64.537    66.422    90.06    89.95    105.18   180.0   0.0      180.0   0.94
2PIH(2007)   2.10     29.322    52.201    55.448    61.85    90.64    90.01    90.0    89.8     180.0   0.94
2DES(1993)   1.50     18.010    18.830    26.750    92.60    100.50   94.90    88.9    -89.4    180.0   0.94
4K4X(2013)   2.37     61.001    61.013    195.137   90.02    89.98    78.36    180.0   0.0      180.0   0.93
4J1R(2013)   2.70     67.516    67.441    116.422   90.13    90.08    81.11    180.0   0.0      180.0   0.93
4H82(2012)   1.90     39.900    98.860    47.130    90.03    111.95   89.98    90.0    90.0     180.0   0.93
3S2G(2011)   2.30     68.080    92.472    116.855   106.28   90.00    89.97    90.0    -0.0     180.0   0.93
3RK2(2011)   2.20     27.627    39.769    102.275   83.38    89.94    89.87    90.0    -0.0     180.0   0.93
3R05(2011)   2.95     61.030    114.574   160.129   89.60    89.98    87.96    90.0    -180.0   180.0   0.93
3O37(2010)   2.00     35.667    63.848    79.299    89.92    89.99    89.80    90.0    179.8    180.0   0.93
3MUU(2010)   3.29     67.482    158.430   160.676   60.42    89.80    89.65    90.0    -90.0    180.0   0.93
3K9M(2010)   2.61     62.122    31.078    70.937    89.95    104.45   89.90    90.0    -90.1    180.0   0.93
3FIX(2011)   2.30     42.229    60.915    72.116    101.19   90.10    89.97    90.0    180.0    180.0   0.93
3FD2(2009)   2.69     41.932    41.933    70.960    107.21   95.43    109.44   90.0    56.5     180.0   0.93
3F7F(2009)   2.60     52.680    117.478   146.287   89.94    89.77    89.89    90.0    -0.0     180.0   0.93
2PMC(2008)   2.69     34.760    53.672    65.641    90.22    102.92   90.18    90.0    -90.0    180.0   0.93

Footnotes: (a) The date of each structure entry corresponds either to the publication date or, when there is no publication associated with the entry, to the deposition date. The self-rotation peak heights (SR-Fun) are relative to the crystallographic origin peak. X-ray fiber diffraction structures and the inverse Fourier transformed structure factors from electron microscopic structures were excluded from the analysis. Errors in some deposited structure factor/intensity data existed but should not affect the overall conclusions of the systematic analysis. For example, the first entry in the table, 1Z9O, has two deposited sets of structure factors, the first including the cross-validation set and intensity data. However, the cross-validation set had a uniform value and intensity data fields were blank.
Moreover, the second set was a duplicate of the first set without intensity fields, and the deposited structure factors were actually only at 2.78 Å resolution and did not correspond to the resolution of the reported structure.

Table S2. Identification of correct space groups from incorrect P1 space groups

Accession Code Reference citation number 3O37 37 4HYO 4 4HZ3 4 1Z9Ob 38 1WW5 39 3EBN 40 4EY0 41 3RL0 42 4LL7 43 3MUU 15 3OVJ 44 1ZBL 45 1JXO 34 1EGW 35 4LOH 46 3RFU 47 3HTM 48 4DX9 49 3NY2 50 3RK2 42 3F7F 43 Original Incorrect P1 a,b,c (Å) α,β,γ (°) Resolution (Å)/Rwork (%) 35.667; 63.848; 79.299 89.92; 89.99; 89.80 2.0 Å/22.7% 44.035; 63.452; 63.477 90.03; 89.99; 89.99 1.65Å/16.5% 43.965; 63.396; 63.311 89.99; 89.95; 90.00 1.70Å/17.4% 50.048; 50.033; 90.287 90.01; 90.00; 60.03 1.90Å/21.9% 47.623; 57.648; 64.643 89.96; 111.65; 90.14 2.20Å/19.8% 51.395; 51.350; 51.390 112.22, 112.00, 104.46 2.4Å/20.8% 54.900; 59.200; 79.200 90; 90; 90 2.8Å/19.9% 53.745; 127.357; 142.725 107.49; 90.01; 90.05 3.80Å/30.8% 28.069; 49.576; 149.950 89.96; 84.67; 89.98 2.31Å/23.1% 67.482; 158.430; 160.676 60.42; 89.80; 89.65 3.29Å/24.0% 9.536; 26.008; 25.803 62.28; 88.59; 88.45 1.80Å/20.7% 36.897; 44.983; 62.280 83.63; 89.90; 65.69 2.20Å/21.7% 45.009; 53.410; 63.262 89.96; 90.00; 91.78 2.30Å/22.0% 41.371; 60.696; 63.987 115.18; 89.99; 90.00 1.50Å/20.6% 36.500; 59.209; 59.205 83.98; 85.83; 85.87 2.25Å/18.8% 44.150; 72.980; 329.950 89.96; 90.04; 90.22 3.20Å/23.6% 36.800; 88.70; 88.700 90.80; 89.30; 89.90 2.50Å/21.7% 75.622; 122.207; 135.269 89.97; 89.99; 108.11 2.99Å/25.5% 29.390; 61.456; 72.806 65.05; 89.98; 90.01 2.61Å/23.3% 27.627; 39.769; 102.275 83.38; 89.94; 89.87 2.20Å/23.0% 52.680; 117.478; 146.287 89.94; 89.77; 89.89 2.60Å/23.2% Transformation Matrix [1 0 0] [0 1 0] [0 0 1] [0 1 0] [0 0 1] [1 0 0] [0 1 0] [0 0 1] [1 0 0] [0 1 0] [-1 0 0] [0 0 1] [0 1 0] [-1 0 -2] [0 1 0] [0 -1 -1] [1 0 1] [-1 -1 0] [1 0 0] [0 1 0] [0 0 1] [0 1 0] [-1 0 0 ] [0 0 1] [1 0 0] [1 0 -2] [0 1 0] [0
-1 2] [0 1 0] [-1 0 0] [0 1 1] [0 1 -1] [-1 0 0] [1 -2 0] [1 0 0] [0 0 1] [-1 0 0 ] [0 0 -1] [0 -1 0] [0 -1 0] [-1 0 0 ] [0 0 -1] [0 1 1] [0 1 -1] [-1 0 0] [1 0 0] [0 1 0] [0 0 1] [0 1 1] [0 1 -1] [-1 0 0] [-1 0 0] [-1 -2 0] [0 0 1] [0 -1 0] [0 -1 2] [-1 0 0] [0 1 0] [1 0 0] [0 0 -1] [1 0 0] [0 1 0] [0 0 1] New Cell Parameters a,b,c (Å) ,, (°) Rwork/Rfree (%) 35.667; 63.848; 79.299 90; 90; 90 14.2%/22.7% 63.452; 63.477; 44.035 90; 90; 90 12.6%/17.5% 63.452; 63.477; 44.035 90; 90; 90 14.7%/19.1% 50.040; 50.040; 90.287 90; 90; 120 19.7%/29.0% (2.7Å)b 47.623; 120.166; 57.648 90; 90; 90 19.0/24.5% 57.288; 57.477; 63.001 90; 90; 90 18.1%/25.7% 54.900; 59.200; 79.200 90; 90; 90 19.9%/29.2% 142.725; 53.745; 127.357 90; 107.493; 90. Correct Space Groupa RPIM(%) RSymm(%) 279.478; 158.430; 67.482 90; 90.32; 90 19.8%/25.9% 44.345; 26.794; 9.536 90; 91.73; 90 P212121 5.8% 9.3% P4212 1.0% 2.5% P4212 1.5% 3.7% P3121 2.2% 7.1% C2221 3.1% 5.1% I222 3.0% 4.8% P21212 5.0% 8.0% P21 4.5% 7.4% C2221 6.0% 9.4% C2 7.1% 12.1% C2 10.2% 81.989; 36.897; 62.280 90; 96.95; 90 C2 4.6% 45.009; 63.262; 53.410 90; 91.78; 90 17.6%/23.5% 60.696; 41.371; 63.987 90; 115.18; 90 14.2%/18.3% 88.011; 79.221; 36.500 90; 95.59; 90 P21 1.8% 44.150; 72.980; 329.950 90; 90; 90 P212121 5.8% 124.562; 126.313; 36.80 90; 90; 90 C2221 5.2% 75.622; 232.306; 135.269 90; 90; 90 C2221 7.0% 61.456; 132.023; 29.30 90; 90; 90 C222 8.7% 39.769; 27.627; 102.275 90; 96.62; 90 P2 4.3% 52.680; 117.478; 146.287 90; 90; 90 P22121 6.7% 28.069; 298.603; 49.576 90; 90; 90 P2 5.3% C2 2.5% Footnotes: a The top seven entries and three other entries have been re-refined, additional five are still being partially re-refined, and the last ten entries included only Rsymm values. b. See footnote a of Table S1. Table S3. 
R-symmetry of the experimental data of crystals assigned to P1 that have special NCS^a

PDB  <I/σI>  Lattice  Rsymm    PDB  <I/σI>  Lattice  Rsymm    PDB  <I/σI>  Lattice  Rsymm    PDB  <I/σI>  Lattice  Rsymm
1W9S  23.97  C2       0.018    3MER  15.72  C2       0.046    4K4X   7.16  C222     0.066    3B4V   9.97  P2       0.094
3VQR  35.16  C2       0.019    1T91  11.29  C2       0.047    1SED  10.42  R3       0.067    4F21   5.68  P2       0.094
1TQE  12.96  C2       0.022    3S19  21.97  C2       0.047    3F9X  20.32  P2       0.067    4GM3   6.80  P222     0.094
2DDS  16.82  C2       0.023    4IT7   8.90  P222     0.047    3WHK  11.94  P2       0.067    1FFP   7.93  C2       0.095
1TKU  11.25  C2       0.024    4LGG  12.59  P2       0.047    4K4U   7.89  C2       0.067    4EIL   8.11  P222     0.095
4LOH  29.33  C2       0.024    1AAR   0.52  C2       0.048    4K4V   8.95  C2       0.067    3CJH   9.72  C2       0.096
3UXJ  18.65  C2       0.025    1U0D  18.73  P2       0.048    3R6V   8.08  P2       0.068    4J1R   7.18  C222     0.096
1P7H  14.92  C2       0.026    3D6E  15.70  C2       0.048    1YNF   8.56  C2       0.069    4JYP   6.25  P2       0.096
2YWV  15.65  C2       0.026    2OQY  13.68  I422     0.049    2XH0   9.61  P2       0.069    4LL7  13.36  C222     0.097
1W9T  14.38  C2       0.027    2DYK  12.64  C2       0.050    4J81  15.46  C2       0.069    4BI5   5.63  P2       0.100
1JXO  15.59  P2       0.028    3BS0  12.63  P2       0.050    2C91   8.24  C2       0.070    2RH0   4.88  C2       0.101
2D3E  29.80  C2       0.029    1WW5  32.88  C222     0.051    3L89   9.01  P2       0.071    2XI9  15.18  C2       0.101
3SJO  13.31  P2       0.029    3BFV   9.28  C2       0.051    4INH  10.47  P2       0.072    4K50   5.17  P2       0.101
2PMC  14.72  P2       0.030    3GUW   8.16  P2       0.051    3WGJ  18.46  C2       0.073    3OVJ   3.55  C2       0.102
3U5P  17.30  P422     0.030    3SQP  29.02  C2       0.051    4EFR  13.93  C2       0.073    3S2G   6.53  P2       0.103
3UXV  14.45  C2       0.030    4OIJ   9.03  C2       0.051    2X1G   5.07  P2       0.074    4N9G  10.75  P2       0.107
3TQS  12.97  P2       0.031    1SQL   9.70  P2       0.052    4JYM   4.61  P2       0.074    1FFO   5.36  C2       0.108
1U8T  21.39  C2       0.032    2H6L  11.81  R3       0.052    2DRM  14.01  P222     0.075    4K4S   6.41  C2       0.108
2O4Q  11.60  P2       0.033    3HTM  23.16  C222     0.052    4E4M   5.72  C2       0.075    2XTQ   6.04  C2       0.109
3OJV  22.73  C2       0.033    3PQJ  13.35  P2       0.052    1HV4  20.75  C2       0.076    3UKF   5.84  P2       0.109
1TKS  13.66  C2       0.034    1EGW   9.95  P2       0.053    4PW5   6.35  P2       0.076    3VO8  16.98  C2       0.109
1YUP   9.00  P2       0.034    2DDT  15.40  C2       0.053    2ZDC  10.12  R3       0.077    4HLY  12.15  C2       0.109
2DDR   9.38  C2       0.034    3GND   6.85  C222     0.053    3CFK  11.29  P222     0.077    4K4W   8.84  C2       0.110
1V40  14.43  P2       0.035    3H7N  25.87  P222     0.053    3R0R   8.76  P2       0.077    1FFN   4.07  C2       0.111
1VG8   9.91  C2       0.035    3O33  18.35  C2       0.053    3GK0  10.18  C2       0.078    1UKO  27.43  P2       0.113
3KDM  14.11  C2       0.035    4IFC   9.08  C2       0.053    4EY0   6.36  P222     0.079    3OA8   7.01  C2       0.113
4GUN  10.15  P2       0.035    2AG5  11.98  C2       0.054    2YHP   4.51  P2       0.081    4PW7   6.83  P422     0.113
4NMU  15.89  P2       0.036    2C7N  13.33  C2       0.055    2YHQ   4.51  P2       0.081    4J82  11.88  C2       0.115
3BX8  10.64  C2       0.037    2QGQ  10.70  C222     0.055    3WGN  11.91  C2       0.081    1YAJ   6.34  P2       0.117
3I4W   8.31  P2       0.037    2YZJ   6.32  R3       0.056    1WW4  26.88  C222     0.082    4LY6   5.34  P2       0.117
2AEN  11.19  P2       0.038    2Z66   8.70  P222     0.056    2PIH  10.47  C2       0.082    2WH8  11.71  C2       0.118
4J4Y  15.84  C2       0.038    1C03  11.68  I222     0.057    3EVX  10.42  P2       0.083    4LZZ   6.56  P2       0.119
1WW6  37.87  C222     0.039    2Q5C  15.16  C2       0.057    2IVK  13.70  C2       0.084    3MGT  25.91  P222     0.120
3N9R   7.31  P2       0.039    3HQM  20.68  C2       0.058    4H82   6.65  P2       0.085    1M9X   9.94  P2       0.122
1ZLJ  10.70  P2       0.040    3RFU  11.59  P222     0.058    4OIK   4.19  C2       0.085    3V4E  22.88  R3       0.124
2R8E  19.65  I422     0.040    4KMP  10.57  C2       0.058    2A1F   6.96  C2       0.086    2VCV   7.53  P2       0.126
3D4G  11.02  P422     0.040    4N0O  18.93  C2       0.058    2BS1  12.28  C2       0.086    3DA7  22.49  C2       0.127
3KSE  14.29  R3       0.040    2W89  13.00  C2       0.059    3R6Q  10.21  P2       0.087    2X7L   7.44  C2       0.129
3U5O  25.64  P422     0.040    3V1D  11.11  P222     0.059    4K4T   7.10  C2       0.087    2IJZ   4.01  R3       0.130
3BP9  10.51  C2       0.041    4DRW   7.03  C2       0.059    4K4Z   6.62  C222     0.087    4FMY   9.87  C2       0.130
3FD2   9.71  C2       0.041    4DX9  10.66  C222     0.059    2XLL  12.21  P222     0.088    3OA6   6.67  P2       0.131
3IA0   4.88  P222     0.041    1UKP   5.71  P2       0.060    3K9M   8.69  P2       0.088    2POS  17.96  C2       0.133
1ZBL  12.73  C2       0.042    3LK4  13.95  P222     0.061    3RZP   8.09  C2       0.088    2VI5   9.79  C2       0.134
3MFB  11.48  P2       0.042    2WVA  11.42  P2       0.062    2CJF   6.08  C2       0.089    4GEZ   5.92  C2       0.134
4BBF  10.51  C2       0.042    3VEK   9.17  C2       0.062    3ES8   9.01  I422     0.089    4N9F  10.07  C2       0.141
4BBE   7.51  C2       0.043    4MJM  10.49  I422     0.062    3S2F   8.22  P2       0.089    3HZ2  21.24  F222     0.142
1R52  16.12  I222     0.044    3EXA  11.70  P2       0.063    3UDU  11.94  P2       0.089    4M6D   6.48  R3       0.142
3K2C  11.95  P2       0.044    4N21   8.06  R3       0.063    3H2V  12.30  P2       0.090    3RCD   7.65  P2       0.144
4NHZ   7.52  P2       0.044    1R19  11.79  C2       0.066    2V8V   6.67  C2       0.091    2GLJ   2.94  P2       0.145
3IHF  23.39  P222     0.045    3F7F  11.34  P222     0.066    3NMW   5.41  P2       0.092    3EHR  14.39  I222     0.149
2A8Y   8.67  C2       0.046    3MGO  19.88  P222     0.066    3O37  14.94  P222     0.092
3EBN  11.41  I222     0.046    3VXO  14.87  C2       0.066    3U58  16.29  I422     0.092

Footnotes:
a. Entries with Rsymm of zero using symmetry-expanded data for atomic refinement were omitted in this table.

Table S4. Refinement statistics of the revised TthRNAP holoenzyme structures

                                Myxopyronin complex        Alarmone ppGpp complex
PDB ID                          [3DXJ]^a                   [1SMY]
Space Group                     P65 [P32]                  P65 [P32]
Resolution (Å)                  3.0 (3.00-3.07)^b          2.71 (2.71-2.78)
# Reflections                   149,560                    198,576
RO2A^c                          5.33 [3.33]                6.99 [4.67]
R-factor (%)                    20.3 (41.6) [23.5]         26.6 (38.5) [18.6]
Free R-factor (%)               23.4 (39.7) [28.9]         28.6 (38.6) [26.6]
Total # Atoms                   28,078 [56,018]            28,419 [53,552]
# Complex                       1 [2]                      1 [2]
# Myx                           1 [2]                      0
# ppGpp                         0                          1 [2]
# Protein Atoms                 27,948 [55,882]            28,098 [42,518]
# H2O/Ions                      62 [68]                    285 [10,962^d]
rmsd Bond (Å)^e                 0.009 [0.008]              0.009 [0.013]
rmsd Angle (°)^e                1.31 [1.57]                1.28 [2.08]
Twin Operation                  (-h-k,k,-l) [(-h,-k,l)]    (-h-k,k,-l) [(-h,-k,l)]
Twin Fractions (%)^f            (4.4) [50]                 (12.8) [50]
Rmerging for twinning (%)^g     45.6                       41.0

Footnotes:
a. Statistics for the original structures are shown in square brackets.
b. Statistics for the highest resolution shell are shown in parentheses.
c. RO2A is the observation-to-atom ratio.
d. 1SMY entry contains 1,897 Mg2+ and 34 Zn2+ in addition to 9,031 water molecules, which is 20% of all atoms, but is missing 16% protein residues.
e. Root-mean-square deviations (rmsd) to ideal bond length or bond angle values.
f. Only fractions of the minor twinning domain are shown.
g. For fair comparison, we used the original selection of cross-validation set in P65.
Because the twinning operation, which relates pairs of reflections (h, k, l) and (-h-k, k, -l) and results in the higher-symmetry pseudo-P6522, introduces correlation between the working and cross-validation sets, free R-factors are generally underestimated. The underestimation increases in proportion to the fraction of the minor twinning domain, or in inverse proportion to the merging R-factor between reflections paired by the twin operation (h, k, l) and (-h-k, k, -l). The merging R-factors of the twinning operation are defined between the (h, k, l) and (-h-k, k, -l) domains.

Table S5. P21 Symmetry R-factors of the 1JXO data

Resolution Range (Å)   I/σI    Rsymm   Rmeas   CC1/2
99.0-4.95              32.58   0.029   0.041   0.987
4.95-3.93              36.41   0.023   0.032   0.995
3.93-3.43              30.65   0.026   0.036   0.992
3.43-3.12              26.19   0.025   0.035   0.995
3.12-2.90              22.54   0.032   0.046   0.989
2.90-2.73              20.07   0.034   0.048   0.993
2.73-2.59              17.91   0.041   0.057   0.991
2.59-2.48              15.60   0.045   0.064   0.988
2.48-2.38              15.57   0.049   0.070   0.986
2.38-2.30              12.86   0.054   0.076   0.988
All                    27.93   0.028   0.040

Table S6. P21 1JXO re-refinement statistics

Space group                     P21
Unit Cell parameters            a=45.009 Å, b=63.262 Å, c=53.410 Å, β=91.78°
Resolution                      54-2.30 Å (2.359-2.300 Å)
Number of reflections           12,667 (925)
Rworking                        17.3% (17.3%)
Rfree                           23.6% (22.7%)
Number of atoms^a               2,656
Observation/Atom Ratio, RO2A    4.8
<B>                             29.8 Å²
Bond length deviations          0.008 Å
Bond angle deviations           1.18°
Ramachandran Plots
  Preferred regions             97.6%
  Allowed regions               2.1%
  Disallowed regions            0.4% (1 residue)

Footnotes:
a. For comparison, the original P1 structure had 4,207 atoms.

Table S7. P21 Symmetry R-factors of the 1EGW data

Resolution Range (Å)   I/σI    Rsymm   Rmeas   CC1/2
99.0-3.23              16.99   0.045   0.050   0.993
3.23-2.56              19.58   0.053   0.075   0.995
2.56-2.24              18.07   0.069   0.097   0.979
2.24-2.04              14.54   0.077   0.109   0.974
2.04-1.89              13.88   0.082   0.116   0.979
1.89-1.78              13.52   0.091   0.129   0.970
1.78-1.69              12.33   0.089   0.126   0.970
1.69-1.62              11.89   0.089   0.126   0.985
1.62-1.55              11.14   0.095   0.135   0.981
1.55-1.50               9.31   0.114   0.161   0.963
All                    16.93   0.053   0.075

Table S8. P21 1EGW re-refinement statistics

Space group                     P21
Unit Cell parameters            a=60.696 Å, b=41.371 Å, c=63.987 Å, β=115.18°
Resolution                      58-1.5 Å (1.539-1.500 Å)
Number of reflections           43,991 (3,224)
Rworking                        14.2% (14.9%)
Rfree                           18.2% (18.4%)
Number of atoms^a               3,165
Observation/Atom Ratio, RO2A    13.9
<B>                             23.5 Å²
Bond length deviations          0.004 Å
Bond angle deviations           1.10°
Ramachandran Plots
  Preferred regions             99.2%
  Allowed regions               0.8%
  Disallowed regions            0%

Footnotes:
a. For comparison, the original P1 structure had 5,996 atoms, which would not allow individual anisotropic B-factors to be refined. With the correct symmetry, one can now refine individual anisotropic B-factors. Since the two orientations of the DNA duplex resulted in only three nucleotides per strand having two identities for the given sequence, the total number of atoms can be reduced to 2,591 when all alternative conformers having the same identity are removed for re-refinement. This resulted in slightly increased Rworking and Rfree of 15.3% and 19.2%, respectively. In this case, the observation/parameter ratio increased to 1.88 from 1.54. If the observation/parameter ratio is included in validation, it is most likely that the structure refined with the fewest atomic parameters has the highest quality. Unfortunately, current validation does not take the observation/parameter ratio into account, for example in Rfree calculations.

Table S9.
Symmetry R-factors of the 3MUU data^a

                       C2               P3               P321
Resolution Range (Å)   Rsymm   I/σI     Rsymm   I/σI     Rsymm   I/σI
99.0-7.10              0.063   54.7     0.133   70.0     0.134   88.4
7.10-5.63              0.087   28.6     0.191   34.6     0.192   46.4
5.63-4.92              0.103   21.5     0.230   25.9     0.232   35.2
4.92-4.47              0.119   17.6     0.252   21.1     0.255   28.6
4.47-4.15              0.146   13.5     0.288   16.2     0.292   22.1
4.15-3.91              0.213    9.4     0.364   11.4     0.371   15.5
3.91-3.71              0.279    6.6     0.436    8.2     0.446   11.4
3.71-3.55              0.383    4.4     0.523    5.8     0.539    8.0
3.55-3.41              0.447    3.3     0.605    3.9     0.624    5.3
3.41-3.29              0.510    2.1     0.615    2.7     0.649    3.2
All reflections        0.121   17.0     0.224   20.3     0.227   27.4

Footnotes:
a. Reported Rmerging in P1 was 12.8%. Even without taking multiplicity into account, one of the three 2-fold axes is clearly retained as the crystallographic axis.

References

1. Miao J, Sayre D, Chapman HN (1998) Phase retrieval from the magnitude of the Fourier transforms of nonperiodic objects. Journal of the Optical Society of America A-Optics Image Science and Vision 15:1662-1669.
2. Palatinus L (2013) The charge-flipping algorithm in crystallography. Acta Crystallogr B 69:1-16.
3. Yao JX, Dodson EJ, Wilson KS, Woolfson MM (2006) ACORN: a review. Acta Crystallogr D Biol Crystallogr 62:901-908.
4. Posson DJ, McCoy JG, Nimigean CM (2013) The voltage-dependent gate in MthK potassium channels is located at the selectivity filter. Nat Struct Mol Biol 20:159-166.
5. Winn MD, Ballard CC, Cowtan KD, Dodson EJ, Emsley P, Evans PR, Keegan RM, Krissinel EB, Leslie AG, McCoy A, McNicholas SJ, Murshudov GN, Pannu NS, Potterton EA, Powell HR, Read RJ, Vagin A, Wilson KS (2011) Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Crystallogr 67:235-242.
6. Vagin A, Teplyakov A (2010) Molecular replacement with MOLREP. Acta Crystallogr D Biol Crystallogr 66:22-25.
7. Joosten RP, Salzemann J, Bloch V, Stockinger H, Berglund AC, Blanchet C, Bongcam-Rudloff E, Combet C, Da Costa AL, Deleage G, Diarena M, Fabbretti R, Fettahi G, Flegel V, Gisel A, Kasam V, Kervinen T, Korpelainen E, Mattila K, Pagni M, Reichstadt M, Breton V, Tickle IJ, Vriend G (2009) PDB_REDO: automated re-refinement of X-ray structure models in the PDB. J Appl Crystallogr 42:376-384.
8. Sheldrick GM (2008) A short history of SHELX. Acta Crystallogr A 64:112-122.
9. Wang J, Wing R (2014) Diamonds in the rough: a strong case for the inclusion of weak-intensity X-ray diffraction data. Acta Crystallogr D Biol Crystallogr 70:1491-1497.
10. Ye S, Li Y, Jiang Y (2010) Novel insights into K+ selectivity from high-resolution structures of an open K+ channel pore. Nat Struct Mol Biol 17:1019-1023.
11. Kern J, Tran R, Alonso-Mori R, Koroidov S, Echols N, Hattne J, Ibrahim M, Gul S, Laksmono H, Sierra RG, Gildea RJ, Han G, Hellmich J, Lassalle-Kaiser B, Chatterjee R, Brewster AS, Stan CA, Glockner C, Lampe A, DiFiore D, Milathianaki D, Fry AR, Seibert MM, Koglin JE, Gallo E, Uhlig J, Sokaras D, Weng TC, Zwart PH, Skinner DE, Bogan MJ, Messerschmidt M, Glatzel P, Williams GJ, Boutet S, Adams PD, Zouni A, Messinger J, Sauter NK, Bergmann U, Yano J, Yachandra VK (2014) Taking snapshots of photosynthetic water oxidation using femtosecond X-ray diffraction and spectroscopy. Nat Commun 5:4371.
12. Kupitz C, Basu S, Grotjohann I, Fromme R, Zatsepin NA, Rendek KN, Hunter MS, Shoeman RL, White TA, Wang D, James D, Yang JH, Cobb DE, Reeder B, Sierra RG, Liu H, Barty A, Aquila AL, Deponte D, Kirian RA, Bari S, Bergkamp JJ, Beyerlein KR, Bogan MJ, Caleman C, Chao TC, Conrad CE, Davis KM, Fleckenstein H, Galli L, Hau-Riege SP, Kassemeyer S, Laksmono H, Liang M, Lomb L, Marchesini S, Martin AV, Messerschmidt M, Milathianaki D, Nass K, Ros A, Roy-Chowdhury S, Schmidt K, Seibert M, Steinbrener J, Stellato F, Yan L, Yoon C, Moore TA, Moore AL, Pushkar Y, Williams GJ, Boutet S, Doak RB, Weierstall U, Frank M, Chapman HN, Spence JC, Fromme P (2014) Serial time-resolved crystallography of photosystem II using a femtosecond X-ray laser. Nature 513:261-265.
13. Koua FH, Umena Y, Kawakami K, Shen JR (2013) Structure of Sr-substituted photosystem II at 2.1 Å resolution and its implications in the mechanism of water oxidation. Proc Natl Acad Sci U S A 110:3889-3894.
14. Himmel DM, Das K, Clark AD Jr, Hughes SH, Benjahad A, Oumouch S, Guillemont J, Coupa S, Poncelet A, Csoka I, Meyer C, Andries K, Nguyen CH, Grierson DS, Arnold E (2005) Crystal structures for HIV-1 reverse transcriptase in complexes with three pyridinone derivatives: a new class of non-nucleoside inhibitors effective against a broad range of drug-resistant strains. J Med Chem 48:7582-7591.
15. Li L, Jose J, Xiang Y, Kuhn RJ, Rossmann MG (2010) Structural changes of envelope proteins during alphavirus fusion. Nature 468:705-708.
16. Wang J, Li Y, Modis Y (2014) Exploiting subtle structural differences in heavy atom derivatives for experimental phasing. Acta Crystallogr D Biol Crystallogr, in press.
17. Wang J (2005) DNA polymerases: Hoogsteen base-pairing in DNA replication? Nature 437:E6-7; discussion E7.
18. Evans PR, Murshudov GN (2013) How good are my data and what is the resolution? Acta Crystallogr D Biol Crystallogr 69:1204-1214.
19. Mukhopadhyay J, Das K, Ismail S, Koppstein D, Jang M, Hudson B, Sarafianos S, Tuske S, Patel J, Jansen R, Irschik H, Arnold E, Ebright RH (2008) The RNA polymerase "switch region" is a target for inhibitors. Cell 135:295-307.
20. Artsimovitch I, Patlan V, Sekine S, Vassylyeva MN, Hosaka T, Ochi K, Yokoyama S, Vassylyev DG (2004) Structural basis for transcription regulation by alarmone ppGpp. Cell 117:299-310.
21. Vassylyeva MN, Lee J, Sekine SI, Laptenko O, Kuramitsu S, Shibata T, Inoue Y, Borukhov S, Vassylyev DG, Yokoyama S (2002) Purification, crystallization and initial crystallographic analysis of RNA polymerase holoenzyme from Thermus thermophilus. Acta Crystallogr D Biol Crystallogr 58:1497-1500.
22. Vassylyev DG, Sekine S, Laptenko O, Lee J, Vassylyeva MN, Borukhov S, Yokoyama S (2002) Crystal structure of a bacterial RNA polymerase holoenzyme at 2.6 Å resolution. Nature 417:712-719.
23. Artsimovitch I, Vassylyeva MN, Svetlov D, Svetlov V, Perederina A, Igarashi N, Matsugaki N, Wakatsuki S, Tahirov TH, Vassylyev DG (2005) Allosteric modulation of the RNA polymerase catalytic reaction is an essential component of transcription control by rifamycins. Cell 122:351-363.
24. Vassylyev DG, Svetlov V, Vassylyeva MN, Perederina A, Igarashi N, Matsugaki N, Wakatsuki S, Artsimovitch I (2005) Structural basis for transcription inhibition by tagetitoxin. Nat Struct Mol Biol 12:1086-1093.
25. Vassylyev DG, Vassylyeva MN, Zhang J, Palangat M, Artsimovitch I, Landick R (2007) Structural basis for substrate loading in bacterial RNA polymerase. Nature 448:163-168.
26. Vassylyev DG, Vassylyeva MN, Perederina A, Tahirov TH, Artsimovitch I (2007) Structural basis for transcription elongation by bacterial RNA polymerase. Nature 448:157-162.
27. Belogurov GA, Vassylyeva MN, Sevostyanova A, Appleman JR, Xiang AX, Lira R, Webber SE, Klyuyev S, Nudler E, Artsimovitch I, Vassylyev DG (2009) Transcription inactivation through local refolding of the RNA polymerase structure. Nature 457:332-335.
28. Feklistov A, Mekler V, Jiang Q, Westblade LF, Irschik H, Jansen R, Mustaev A, Darst SA, Ebright RH (2008) Rifamycins do not function by allosteric modulation of binding of Mg2+ to the RNA polymerase active center. Proc Natl Acad Sci U S A 105:14820-14825.
29. Tuske S, Sarafianos SG, Wang X, Hudson B, Sineva E, Mukhopadhyay J, Birktoft JJ, Leroy O, Ismail S, Clark AD Jr, Dharia C, Napoli A, Laptenko O, Lee J, Borukhov S, Ebright RH, Arnold E (2005) Inhibition of bacterial RNA polymerase by streptolydigin: stabilization of a straight-bridge-helix active-center conformation. Cell 122:541-552.
30. Aggarwal A, Nair D, Johnson R, Prakash L, Prakash S (2005) Reply to Wang: Hoogsteen base-pairing in DNA replication? Nature 437:E7; discussion E7.
31. Chang G, Roth CB, Reyes CL, Pornillos O, Chen YJ, Chen AP (2006) Retraction. Science 314:1875.
32. Rice LM, Shamoo Y, Brunger AT (1998) Phase improvement by multi-start simulated annealing refinement and structure-factor averaging. Journal of Applied Crystallography 31:798-805.
33. Pellegrini M, Gronbech-Jensen N, Kelly JA, Pfluegl GMU, Yeates TO (1997) Highly constrained multiple-copy refinement of protein crystal structures. Proteins: Structure, Function, and Genetics 29:426-432.
34. Tavares GA, Panepucci EH, Brunger AT (2001) Structural characterization of the intramolecular interaction between the SH3 and guanylate kinase domains of PSD-95. Mol Cell 8:1313-1325.
35. Santelli E, Richmond TJ (2000) Crystal structure of MEF2A core bound to DNA at 1.5 Å resolution. J Mol Biol 297:437-449.
36. Nair DT, Johnson RE, Prakash L, Prakash S, Aggarwal AK (2005) Human DNA polymerase iota incorporates dCTP opposite template G via a G.C + Hoogsteen base pair. Structure 13:1569-1577.
37. Tsai WW, Wang Z, Yiu TT, Akdemir KC, Xia W, Winter S, Tsai CY, Shi X, Schwarzer D, Plunkett W, Aronow B, Gozani O, Fischle W, Hung MC, Patel DJ, Barton MC (2010) TRIM24 links a non-canonical histone signature to breast cancer. Nature 468:927-932.
38. Kaiser SE, Brickner JH, Reilein AR, Fenn TD, Walter P, Brunger AT (2005) Structural basis of FFAT motif-mediated ER targeting. Structure 13:1035-1045.
39. Ban M, Yoon HJ, Demirkan E, Utsumi S, Mikami B, Yagi F (2005) Structural basis of a fungal galectin from Agrocybe cylindracea for recognizing sialoconjugate. J Mol Biol 351:695-706.
40. Zhong N, Zhang S, Xue F, Kang X, Zou P, Chen J, Liang C, Rao Z, Jin C, Lou Z, Xia B (2009) C-terminal domain of SARS-CoV main protease can form a 3D domain-swapped dimer. Protein Sci 18:839-844.
41. Bunney TD, Esposito D, Mas-Droux C, Lamber E, Baxendale RW, Martins M, Cole A, Svergun D, Driscoll PC, Katan M (2012) Structural and functional integration of the PLCgamma interaction domains critical for regulatory mechanisms and signaling deregulation. Structure 20:2062-2075.
42. Kummel D, Krishnakumar SS, Radoff DT, Li F, Giraudo CG, Pincet F, Rothman JE, Reinisch KM (2011) Complexin cross-links prefusion SNAREs into a zigzag array. Nat Struct Mol Biol 18:927-933.
43. Shi H, Singh N, Esselborn F, Blobel G (2014) Structure of a myosin*adaptor complex and pairing by cargo. Proc Natl Acad Sci U S A 111:E1082-1090.
44. Landau M, Sawaya MR, Faull KF, Laganowsky A, Jiang L, Sievers SA, Liu J, Barrio JR, Eisenberg D (2011) Towards a pharmacophore for amyloid. PLoS Biol 9:e1001080.
45. Nowotny M, Gaidamakov SA, Crouch RJ, Yang W (2005) Crystal structures of RNase H bound to an RNA/DNA hybrid: substrate specificity and metal-dependent catalysis. Cell 121:1005-1016.
46. Gao P, Ascano M, Zillinger T, Wang W, Dai P, Serganov AA, Gaffney BL, Shuman S, Jones RA, Deng L, Hartmann G, Barchet W, Tuschl T, Patel DJ (2013) Structure-function analysis of STING activation by c[G(2',5')pA(3',5')p] and targeting by antiviral DMXAA. Cell 154:748-762.
47. Gourdon P, Liu XY, Skjorringe T, Morth JP, Moller LB, Pedersen BP, Nissen P (2011) Crystal structure of a copper-transporting PIB-type ATPase. Nature 475:59-64.
48. Zhuang M, Calabrese MF, Liu J, Waddell MB, Nourse A, Hammel M, Miller DJ, Walden H, Duda DM, Seyedin SN, Hoggard T, Harper JW, White KP, Schulman BA (2009) Structures of SPOP-substrate complexes: insights into molecular architectures of BTB-Cul3 ubiquitin ligases. Mol Cell 36:39-50.
49. Liu W, Draheim KM, Zhang R, Calderwood DA, Boggon TJ (2013) Mechanism for KRIT1 release of ICAP1-mediated suppression of integrin activation. Mol Cell 49:719-729.
50. Matta-Camacho E, Kozlov G, Li FF, Gehring K (2010) Structural basis of substrate recognition and specificity in the N-end rule pathway. Nat Struct Mol Biol 17:1182-1187.
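Supplementary note: a worked sketch of the symmetry R-factors and observation ratios

The symmetry statistics reported in Tables S5, S7, and S9 and the observation ratios in Tables S6 and S8 can be illustrated with a short sketch. The Python below is not the procedure used to generate the tables. It assumes the standard definitions of Rsymm (Rmerge), Rmeas, and Rpim, it assumes that every intensity has already been mapped to a unique index in the asymmetric unit of the trial space group, and it assumes that an anisotropically refined atom contributes 9 parameters (3 coordinates plus 6 displacement terms). The function names and the toy intensities are hypothetical.

```python
import math
from collections import defaultdict


def symmetry_r_factors(observations):
    """Return Rsymm, Rmeas and Rpim for (unique_hkl, intensity) pairs.

    Assumption: unique_hkl is the index after reduction to the asymmetric
    unit of the trial space group, so symmetry mates share the same key.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    dev_sum = dev_meas = dev_pim = int_sum = 0.0
    for values in groups.values():
        n = len(values)
        if n < 2:  # a lone observation carries no symmetry information
            continue
        mean = sum(values) / n
        dev = sum(abs(v - mean) for v in values)
        dev_sum += dev
        dev_meas += math.sqrt(n / (n - 1)) * dev  # multiplicity-corrected
        dev_pim += math.sqrt(1 / (n - 1)) * dev   # precision-indicating
        int_sum += sum(values)

    if int_sum == 0.0:
        raise ValueError("no reflection has two or more symmetry mates")
    return {
        "rsymm": dev_sum / int_sum,
        "rmeas": dev_meas / int_sum,
        "rpim": dev_pim / int_sum,
    }


def observation_to_atom_ratio(n_reflections, n_atoms):
    """RO2A: number of unique reflections divided by number of atoms."""
    return n_reflections / n_atoms


def observation_to_parameter_ratio(n_reflections, n_atoms, params_per_atom=9):
    """Observations per refined parameter (9 per atom if anisotropic)."""
    return n_reflections / (n_atoms * params_per_atom)


if __name__ == "__main__":
    # Hypothetical intensities: two symmetry mates for each of two reflections.
    toy = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
           ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
    print(symmetry_r_factors(toy))
    # Reflection and atom counts taken from Table S8 (1EGW in P21).
    print(observation_to_atom_ratio(43991, 3165))
    print(observation_to_parameter_ratio(43991, 3165))
    print(observation_to_parameter_ratio(43991, 2591))
```

When every reflection has exactly two symmetry mates, as is approximately the case for P1 data merged in P21, Rmeas is √2 times Rsymm under these definitions, which is close to the ratio seen in most resolution shells of Tables S5 and S7. Under the 9-parameter assumption, the counts in Table S8 give values consistent with the RO2A of 13.9 and the observation/parameter ratios of about 1.54 and 1.88 quoted in its footnote.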