Supplementary material

AWSEM

In the coarse-grained scheme of AWSEM, each amino acid residue of a protein is represented by three beads: the Cα, Cβ, and O atoms (glycine is an exception because it lacks a Cβ). Assuming an ideal geometry for the peptide bond, the positions of the remaining backbone atoms can be calculated, so the stereochemistry is considerably more accurate than in a Cα-only model. The energy function of standard AWSEM [1] can be written schematically as

$$V_{\text{AWSEM}} = V_{\text{backbone}} + V_{\text{non-backbone}} + V_{\text{FM}} \qquad \text{(S1)}$$

In Eq. (S1), $V_{\text{backbone}}$ refers to the force field that maintains the backbone geometries of protein chains, while $V_{\text{non-backbone}}$ includes physically motivated potentials that reflect the protein’s chemical and/or physical properties in the context of secondary and/or tertiary interactions. These two energy terms are completely transferable among different proteins. The term $V_{\text{FM}}$ (FM denotes “fragment memory”), on the other hand, uses similarity in local sequence to encode local structural tendencies, drawing on structures of peptide fragments available in the Protein Data Bank (PDB). In the following, we describe these energy terms in more detail.

$V_{\text{backbone}}$ consists of five terms,

$$V_{\text{backbone}} = V_{\text{con}} + V_{\text{chain}} + V_{\chi} + V_{\text{rama}} + V_{\text{excl}},$$

which respectively fix the connectivity of the chain ($V_{\text{con}}$), the bond angles around the Cα atom ($V_{\text{chain}}$), the chirality that gives the correct orientation of the Cβ atoms ($V_{\chi}$), the backbone dihedral angles ($V_{\text{rama}}$, the Ramachandran potential), and the excluded volume interaction ($V_{\text{excl}}$). Note that $V_{\text{con}}$ and $V_{\text{chain}}$ are functions of distances, which are constrained by a combination of harmonic potentials.

$V_{\text{non-backbone}}$ includes three terms,

$$V_{\text{non-backbone}} = V_{\text{contact}} + V_{\text{burial}} + V_{\text{helical}},$$

which individually represent different aspects of protein chemistry and physics. $V_{\text{contact}}$ refers to the contact interactions of a protein’s tertiary fold.
It is defined in terms of the Cβ-Cβ distances between residues that are far apart in sequence. The contact potential includes both direct and water-mediated (or protein-mediated) interactions between the residues. $V_{\text{burial}}$ is a non-additive potential that accounts for the preference of a residue to be buried inside the protein or exposed at the protein surface; this preference depends on residue type. $V_{\text{helical}}$ is associated with the propensity to form helical structure, which requires the formation of an explicit hydrogen bond between the carbonyl oxygen of residue i and the amide hydrogen of residue i+4 in the backbone. This helical propensity also depends on residue type, being determined by the types of the residues participating in the bond.

The use of $V_{\text{FM}}$ biases local structure to resemble the local structures of protein fragments with closely related local sequence. This term thus uses a database of short fragments called the “fragment memory” library. This strategy has been used in various forms [2] and resembles the associative memory Hamiltonian of neural network models [3]. The fragment memory term takes into account local steric effects of side chain packing that are modulated by the local sequence of proteins. This form of associative memory term has been successfully combined with physically based interactions to provide de novo structure prediction of protein tertiary folds [1, 2, 4, 5].

When predicting tertiary structure using AWSEM, the quality of the predictions depends on whether global homologues are included in the fragment memory library. When homologues are available and used in the code, the tertiary predictions improve substantially [1]. In this study, the structure predictions of 12 α monomeric proteins were performed under the somewhat artificial constraint of using a database that excludes close homologues. This strategy mimics the situation where a truly novel fold is being encountered.
The homologues excluded (HE) fragment memory library (HE fragment library hereafter) refers to a database of selected sequences in which no homologue having more than 20% sequence identity with the target sequence is included. If one uses only the information of the native protein itself as the memory, the folding landscape is strongly biased (funneled) towards this single structure; this approach is therefore called a “single fragment memory simulation”. Under the premise that folding or binding is biased at the local level towards the appropriate single structure, we use single fragment memory AWSEM, in addition to mimicking de novo structure prediction with the homologues excluded library, to pinpoint the effects of physical forces on the folding and binding landscapes. Single fragment memory AWSEM was used for both protein S6 and the dimers, since our aim here was to highlight electrostatic effects on forming native structure. More detailed descriptions of all the energy terms mentioned above can be found in the original AWSEM paper [1] and the supplemental information therein.

Molecular dynamics simulations: simulated annealing and free energy calculations

All the simulations were carried out using the LAMMPS simulation package [6], in which the AWSEM force field has been implemented [1]. The simulation protocol generates trajectories of protein folding and binding by numerically solving the equations of motion. According to the ergodic hypothesis, the time average of a property is statistically equivalent to the ensemble average of that property. Here, we adopted a canonical ensemble for the system, using a Nosé-Hoover thermostat to control the temperature. To fold proteins, we performed simulated annealing by gradually lowering the temperature for both the monomeric proteins and the dimers. Simulated annealing gradually biases unfolded proteins towards their native structure by slowly changing the temperature from high to low.
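As a rough illustration of the annealing idea, the sketch below computes a target temperature for each MD step. It assumes a linear ramp, which is how LAMMPS thermostat fixes interpolate between their start and stop temperatures during a run, but the text itself only states that the temperature is lowered gradually. The function name `linear_annealing_temperature` is hypothetical and is not part of AWSEM or LAMMPS; the example values (600 K to 300 K over 6 million steps) are the ones quoted in this section for the monomeric proteins.

```python
def linear_annealing_temperature(step, n_steps, t_start, t_end):
    """Target temperature (K) at MD step `step` for a linear ramp.

    Assumption: the schedule is linear in the step index; the source
    only says the temperature is lowered gradually.
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    # Clamp the progress to [0, 1] so that steps before or after the
    # ramp stay at the initial or final temperature, respectively.
    frac = min(max(step / n_steps, 0.0), 1.0)
    return t_start + (t_end - t_start) * frac


if __name__ == "__main__":
    # Monomer protocol quoted in the text: 600 K -> 300 K, 6 million steps.
    for step in (0, 3_000_000, 6_000_000):
        temperature = linear_annealing_temperature(step, 6_000_000, 600.0, 300.0)
        print(step, temperature)
```

Under this assumed schedule, the midpoint of the monomer run (step 3 million) sits at 450 K.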
For each of the monomeric proteins and dimers, we generated 30 simulated annealing trajectories. The simulation time step is 3 fs for the 12 α proteins and 5 fs for both protein S6 and the dimers. For the monomeric proteins simulated with the homologues excluded fragment library of AWSEM (described above), we started from an extended conformation at a temperature (600 K) well above the folding temperature of all proteins studied, and the system was cooled to 300 K over 6 million steps. The same annealing schedule was used for the single fragment memory calculation of protein S6. For the protein dimers, we started from their native structures, first pulled the two monomers apart (by 25 Å for 1CTA and 1KDX; by 60 Å for 1F36 and 1VKX) by applying a biasing force, and then followed their motions by carrying out simulated annealing (from 450 K to 300 K over 6 million steps) with a Langevin thermostat. These simulations thus document how well binding interfaces can be predicted by the force field.

Free energy profiles (i.e., potentials of mean force) for the dimers were calculated by umbrella sampling, using a harmonic bias applied to Q, which is defined in Methods. Equilibrium data were collected at constant temperature and used to generate free energy profiles with the weighted histogram analysis method (WHAM) [7].

Figures

Figure S1. Simulated annealing results (using single fragment memory) for 1R69 and 3ICB with different electrostatic strengths. The annealing results ($Q_{\text{avg}}$) are shown as a function of the annealing index. $Q_{\text{avg}}$ is the average of Q over the last 100 snapshots of each annealing trajectory (30 trajectories in total). The corresponding native structures of the proteins are shown below the curves (see Fig. 1 for color descriptions). (top) 1R69. No difference in annealing profiles can be identified between annealing sets with different $\varepsilon_r$ values. (bottom) 3ICB.
The simulations were conducted using the single fragment memory of AWSEM, which serves as a comparison to verify the robustness of the structure prediction. The results suggest that electrostatics does not have a substantial effect on the native structure and does not affect the quality of structure prediction.

[Figure S2, panel (a): "WT vs CD/SC"; $Q_{\text{avg}}$ versus annealing index for WT, CD, and SC.]

Figure S2. Simulated annealing results for different charge variants of protein S6 (1RIS). The results ($Q_{\text{avg}}$) are shown as a function of the annealing index. (a) Comparison of the wild-type (WT), super-charged (SC), and charge-depleted (CD) variants. The native structure of the protein is shown. The dependence of the simulated annealing results on the electrostatic strength employed ($\varepsilon_r$ = ∞ to 16.6) is shown in (b) for WT and in (c) for CD/SC. The fraction of successful annealing events shows the trend WT >> CD > SC. This observation suggests that the folding stability of WT is higher than that of CD/SC, consistent with Oliveberg’s results. Note that the results presented in (a) refer to the wild-type ($\varepsilon_r$ = 33.2), charge-depleted ($\varepsilon_r$ = ∞), and super-charged ($\varepsilon_r$ = 33.2) cases chosen from (b) and (c).

References

1. Davtyan A, Schafer NP, Zheng W, Clementi C, Wolynes PG, Papoian GA (2012) AWSEM-MD: protein structure prediction using coarse-grained physical potentials and bioinformatically based local structure biasing. J Phys Chem B 116:8494-8503.
2. Friedrichs MS, Wolynes PG (1989) Toward protein tertiary structure recognition by means of associative memory hamiltonians. Science 246:371-373.
3. Hopfield JJ (1984) Neurons with graded response have collective computational properties like those of 2-state neurons. Proc Natl Acad Sci USA 81:3088-3092.
4. Papoian GA, Ulander J, Eastwood MP, Luthey-Schulten Z, Wolynes PG (2004) Water in protein structure prediction. Proc Natl Acad Sci USA 101:3352-3357.
5. Hegler JA, Latzer J, Shehu A, Clementi C, Wolynes PG (2009) Restriction versus guidance in protein structure prediction. Proc Natl Acad Sci USA 106:15302-15307.
6. Plimpton S (1995) Fast parallel algorithms for short-range molecular-dynamics. J Comput Phys 117:1-19.
7. Kumar S, Bouzida D, Swendsen RH, Kollman PA, Rosenberg JM (1992) The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. J Comput Chem 13:1011-1021.