Design

    Design Quick Start
        The Rosetta Design Server
    Concepts
        Design sub-protocols
    Algorithms
        The Energy Function
    Running Design
        Design sub-protocols
        Common to multiple design sub-protocols
        Fixed backbone design
        Flexible backbone design
        Rotamer packing
        Multi-state design (designing conformational switches)
        Second-site suppressor (altering protein-protein binding specificities)
        Loops design
        Interface Minimization
        Tail design
        Grow termini design
    Interpreting Results
    Design Tendencies
    New Features

This section was authored by members of the Kuhlman Lab at UNC and edited by Ion Yannopoulos.

Design Quick Start

The Rosetta Design Server

Rosetta Design can also be run through the RosettaDesign Web Server. Useful documentation pertinent to RosettaDesign is also available on the RosettaDesign Web Server website.

Concepts

The design protocol searches for amino acid sequences that are compatible with a target protein structure or complex. You can choose to optimize all sequence positions in the protein, or select a subset of positions to vary. You can also specify which amino acids to consider at each sequence position. Design simulations are most often performed either with the protein backbone held fixed, or with the backbone treated as flexible and optimized along with the sequence. In addition to the general protocols for fixed and flexible backbone design, there are several design sub-protocols that allow for more specialized operations.

Design sub-protocols

Fixed backbone design
    This sub-protocol carries out design on a fixed protein backbone: only the side-chains (their identity and geometry) are varied.

Flexible backbone design
    This sub-protocol carries out design on a flexible protein backbone: it varies the backbone geometry and not just the side-chains. The perturbations to the backbone are generally small and will not change the overall fold of the starting structure.

Rotamer packing
    This sub-protocol does not perform any backbone manipulation or sequence optimization; it is limited to packing side-chain rotamers.

Multistate design (e.g. designing conformational switches, homo-oligomers, etc.)
    This sub-protocol simultaneously optimizes a single amino acid sequence for multiple target structures.

Second-site suppressor (altering protein-protein binding specificities)
    This sub-protocol works on protein complexes. It searches for point mutations that will destroy binding, then attempts to compensate for them by allowing the neighboring residues to vary. The result is an orthogonal binding interface in which the redesigned proteins still bind to each other, but no longer bind to their wild-type partners. This sub-protocol is generally combined with a separate use of the interface analysis protocol: the second-site suppressor run generates a list of mutations that are predicted to alter binding specificity, and the interface analysis is used to calculate binding energies for the suggested mutations.

Interface design with a flexible backbone
    This sub-protocol works on paired protein complexes. It optimizes the relative orientation of the two proteins, as well as the backbone torsion angles and the amino acid sequences at the interface.

Tail design
    This sub-protocol limits the movement of a flexible backbone design to a subset of the total backbone ending at either terminus.

Loop design
    This sub-protocol restricts design to the variable regions of the protein. It is effectively flexible backbone design limited to particular (arbitrary) regions of the backbone. It is called loop design because these flexible regions are usually loops on the surface of the protein, which vary tremendously from protein to protein. Loops that can vary both in sequence and in structure are identified by not being part of any other existing secondary structure.

Energy function optimization
    This sub-protocol is used for optimizing the weights used in the Rosetta energy function. It works with a list of PDB files. For each residue position, it tries each amino acid in each possible rotamer and outputs the various Rosetta energies to a text file. Apart from the residue being varied, all sequence positions are held in their native conformation. The output energies are then used by accessory scripts to find weights that maximize the probability that the native amino acid will occupy a sequence position.

Algorithms

All design simulations make use of an algorithm for packing amino acid side-chains. To simplify the search, side-chains are only considered in a discrete set of favorable conformations, called rotamers. Rosetta uses Roland Dunbrack's backbone-dependent library as well as a collection of side-chain conformational libraries assembled by Steven Mayo. Rotamers built with Dunbrack's description have ideal geometries, whereas the side-chain conformational libraries contain rotamers harvested directly from the PDB, with non-ideal bond lengths and angles.

Like other protocols that have been developed for protein design and rotamer packing, Rosetta's packing algorithm has two key components: an energy function for evaluating the favorability of a specific sequence and structure, and an optimization protocol for scanning through sequence space.

Design sub-protocols

Fixed backbone design

1. Pick a random sequence to start the simulation, or start from a specified sequence. In general either approach gives similar results.
2. Make a single amino acid substitution, or change the conformation (rotamer) of a single amino acid.
3. Apply the energy function (see the section called “The Energy Function”) to determine the new energy of the protein.
4. Accept or reject the substitution based on the Metropolis criterion: substitutions that raise the energy are accepted with a probability that depends on the local "temperature". The sequence optimization algorithm starts at a permissive "temperature" and cools until it reaches "0 degrees".
5. Repeat. The number of repetitions is empirically derived. For a hundred-residue protein, a few hundred thousand rotamer substitutions are attempted.

Flexible backbone design

1. Perform sequence optimization (as described under Fixed backbone design).
2. Perform backbone refinement. This is done with the same optimization algorithms that are used for full-atom refinement during ab initio structure prediction (see the section called “Ab Initio: Algorithms”).
3. Repeat the above two steps in alternation.

Rotamer packing

Uses the same Monte Carlo optimization protocol as that used for fixed backbone design.

Multi-state design (designing conformational switches)

Multi-state design uses Monte Carlo optimization with simulated annealing at two levels: an outer loop in which amino acid substitutions are made, and an inner loop, similar to fixed backbone design, in which rotamer substitutions are made. Starting from a random amino acid sequence, single random amino acid mutations are evaluated by threading the new sequence onto the multiple target structures and determining whether the sum of the threading energies compares favorably (based on the Metropolis criterion) to the energy of the previous sequence. Threading is performed using a predetermined sequence-structure mapping and the side-chain repacking protocol. The threading energy function is the standard Rosetta energy function. As the simulation progresses, the temperature is lowered to ensure convergence to low-energy sequences.

Second-site suppressor (altering protein-protein binding specificities)

1. Search a protein-protein interface for all point mutations that destabilize binding.
2. Find additional mutations that can compensate for the destabilization caused by the above point mutation. The resulting designs will contain both the initial point mutation and the compensating mutations.
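
The Metropolis acceptance step and the cooling schedule described under Fixed backbone design can be illustrated with a minimal Python sketch. It is not Rosetta's implementation: the function names, the exp(-dE/T) acceptance form, the temperature schedule, and the toy mismatch-count energy and single-letter move are all assumptions chosen only to make the idea runnable.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept moves that do not raise the
    energy; accept uphill moves with probability exp(-delta_e / T)."""
    if delta_e <= 0.0:
        return True
    if temperature <= 0.0:
        return False  # at "0 degrees" no uphill move is accepted
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(energy, propose, state, temperatures, steps_per_temperature, seed=0):
    """Attempt single substitutions at each temperature of a cooling schedule."""
    rng = random.Random(seed)
    current_e = energy(state)
    for temperature in temperatures:
        for _ in range(steps_per_temperature):
            candidate = propose(state, rng)
            candidate_e = energy(candidate)
            if metropolis_accept(candidate_e - current_e, temperature, rng):
                state, current_e = candidate, candidate_e
    return state, current_e


# Toy stand-ins (assumptions): NOT the Rosetta energy function or move set.
TARGET = "ABBAABAB"


def mismatch_energy(sequence):
    """Energy = number of positions that differ from a hidden target."""
    return float(sum(a != b for a, b in zip(sequence, TARGET)))


def mutate_one(sequence, rng):
    """Substitute one randomly chosen position with a random letter."""
    position = rng.randrange(len(sequence))
    letter = rng.choice("AB")
    return sequence[:position] + letter + sequence[position + 1:]
```

A call such as anneal(mismatch_energy, mutate_one, "AAAAAAAA", [2.0, 1.0, 0.5, 0.0], 400) starts permissively and finishes at zero temperature, where only substitutions that do not raise the energy are kept. In a real design run the energy would be the full-atom energy function and each move a rotamer or amino acid substitution.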

Note: Due to limitations of the original design, only the first interface detected will be redesigned. It is possible to change which interface is detected by modifying the order in which the chains appear in the input PDB file.

Interface design with flexible backbone

Two protocols are available. The first is used when designing loops; the second is used with the interface minimization protocol.

The first protocol iterates between the following steps. The decision about when to alternate is made heuristically.

1. Fixed backbone design
2. Docking/rigid-body minimization
3. Loop minimization

The second variation of flexible backbone interface design works as follows:

1. Fixed backbone design
2. Simultaneous minimization of backbone and side-chain torsions of interface residues, as well as small rigid-body docking minimizations.

Loop design

Minimize backbone torsion angles in the defined region of the "loop".

Tail design

Tail design has no special algorithms. It simply limits the range of residues to which fixed backbone design is applied.

The Energy Function

The energy function used by the design protocol is the same as that used for full-atom structure prediction. It contains the following terms:

Lennard-Jones potential
    This term favors atoms being close to each other, though not closer than a certain threshold distance.

Lazaridis-Karplus implicit solvation protocol
    This term penalizes the burial of polar groups and favors the burial of methyl groups.

Hydrogen bonding (orientation-dependent)
    The relative strength of the solvation potential and the hydrogen-bonding potential determines how favorable it is to bury a polar amino acid that forms a hydrogen bond.

Torsion potentials (of backbone and side-chain)
    These values are knowledge-based: they are derived from the probabilities observed in the protein database.

Pair energy
    This is a knowledge-based term that accounts for the likelihood of finding two amino acid types within a certain distance of each other.
    It is included as a low-resolution term to capture electrostatic interactions.

Reference energies
    These values control, on average, how often each amino acid is chosen during a design simulation.

Running Design

This section describes the basic elements of running the design protocol, and contains example command lines that illustrate common uses.

Design sub-protocols

The option that sets Rosetta to run the design protocol is -design. On its own, however, -design is insufficient: one of the following sub-protocols must also be chosen, as explained under the section called “Design: Concepts”.

-fixbb
    Run Fixed backbone design.

-mvbb
    Run Flexible backbone design.

-onlypack
    Run Rotamer packing.

-pack_in_parallel
    Run Multistate design (designing conformational switches).

-alter_spec
    Run Second-site suppressor (altering protein-protein binding specificities).

-design_inter
    Run Interface design with a fixed backbone.

-design_loops_dock <dock | hold> -loops
    Run Loop design. There are two sub-options:
    dock: Run loop design iterated with rigid-body docking movements.
    hold: Run loop design on an otherwise fixed structure.
    Requires -loops.

-design_min_inter
    Run design with small rigid-body minimizations, minimization of the backbone of interface residues, and side-chain torsion minimization.

-tail | -tail_fix_helix
    Run Tail design.

Common to multiple design sub-protocols

Command-line options

-s <pdb file>
    The PDB file describing the protein that will be redesigned.

-l <pdb list file>
    The file describing the list of proteins that will be redesigned. Each protein must have a corresponding PDB file.

-pdbout <base name>
    Base name for the output PDB files. If this is not specified, the name of the protein is used as the base name. This option is only meaningful for the Fixed backbone design and Rotamer packing sub-protocols.

-resfile <residue file>
    Specifies a file describing which residues will be allowed to vary during the design simulation.
    This file can be created using the Perl script makeresfile.pl found in the rosetta_scripts/resfiles directory. There is a README that explains how to run the script. If this file is not specified, all residues will be varied.
    Note: -resfile does not combine with -l, because a resfile refers to only one PDB.

-ndruns <# of runs>
    Set the number of design simulations to perform. Some tasks require considerably more than one run to generate useful data. The default is 1. This option applies to the fixed backbone sub-protocol only.

-profile
    Generates a table of amino acid distributions for the designed residues and a fasta file with the designed sequences.

-ex1, -ex2, -ex3, -ex4
    Increase the size of the rotamer library that is used for particular chi angles. The number indicates which chi angle (from chi1 to chi4) uses extra sub-rotamers. These options can be used in any combination. Note, however, that each of them increases the computation time considerably, and using them all in combination may drive your calculation time into the stratosphere. Deciding when to use the extended rotamer sets is a matter of experience. The deciding factor in which rotamers are actually considered is -extra_chi_cutoff.

-ex1aro
    As -ex1, but uses an even bigger rotamer library for aromatics.

-ex2aro_only
    As -ex2, but only considers extra rotamers for aromatics.

-extra_chi_cutoff <neighbors>
    Sets the number of neighbors a residue needs in order for extra rotamers requested by any of the -ex options to be considered. A neighbor is defined as being within 10 Angstroms of C-beta. The default is 18, which means that only residues in the core of the protein are likely to get extra rotamers. Setting this value to 1 will result in all residues getting extra rotamers.

-use_electrostatic_repulsion
    Re-weights the pair potential used to model the electrostatic terms of the energy function, so that it is less favorable to place like-charged amino acids near each other.

-soft_rep_design
    Specifies the use of an alternate weight set that dampens the Lennard-Jones potential. This option is often used during fixed backbone design, since the backbone cannot relax to accommodate small steric clashes.

-soft_rep
    Same as -soft_rep_design, except that the weights are explicitly optimized for side-chain packing with a fixed sequence.

-favor_native_residue <energy>
    Favor the starting amino acids by the specified <energy> in kcal/mol. Negative values are bonuses while positive values are penalties.

-favor_polar <energy>
    Analogous to -favor_native_residue, except that it applies to polar amino acids.

-favor_nonpolar <energy>
    Analogous to -favor_native_residue, except that it applies to non-polar amino acids.

-favor_aromatic <energy>
    Analogous to -favor_native_residue, except that it applies to aromatic amino acids.

-rot_opt
    Optimize the one-body energy (energy with a fixed environment) by minimizing chi angles before entering Monte Carlo rotamer optimization.

Fixed backbone design

Command-line options

    rosetta -design -fixbb -mcmin_trials [ -s <PDB file> | -l <PDB list file> ]

-fixbb
    Enable fixed backbone design.

-mcmin_trials
    Perform a rotamer/sequence optimization procedure in which, following each rotamer substitution, the chi angles of the new rotamer and of the neighboring residues are minimized before the substitution is evaluated with the Metropolis criterion. This procedure is slow, so it should be used as a follow-up to the standard design protocol.

Examples

Example 1. Simplest fixed-backbone design simulation.

    rosetta -design -fixbb -s 2ptl.pdb

Example 2. Limit which residues to vary in a fixed-backbone simulation using a resfile.

    rosetta -design -fixbb -s 2ptl.pdb -resfile 2ptl.res

Example 3. Design sequences for a set of PDB files specified in a list.

    rosetta -design -fixbb -l 2ptl.pdb_list

Example 4. Design sequences for a set of PDB files specified in a list that includes chain id information.

    rosetta -design -fixbb -l 2ptl.pdb_list -chain_inc

Example 5. Expand the rotamer library by including small deviations in chi2. Use the default neighbor cutoff, to consider only well-packed residues.

    rosetta -design -fixbb -s 2ptl.pdb -ex2

Example 6. Expand the rotamer library by including small deviations in chi1 and chi2. Use the default neighbor cutoff, to consider only well-packed residues.

    rosetta -design -fixbb -s 2ptl.pdb -ex1 -ex2

Example 7. Expand the rotamer library by including small deviations in chi1 and chi2. Use a neighbor cutoff low enough to apply to all residues, regardless of packing.

    rosetta -design -fixbb -s 2ptl.pdb -ex1 -ex2 -extrachi_cutoff 1

Example 8. Perform three fixed-backbone simulations instead of one.

    rosetta -design -fixbb -s 2ptl.pdb -ndruns 3

Example 9. Perform gradient-based minimization of side-chain torsion angles during design.

    rosetta -design -fixbb -mcmin_trials -s 2ptl.pdb

Flexible backbone design

Command-line options

    rosetta -design -mvbb

-mvbb
    Enable flexible backbone design.

Inputs

Fragment file
    See the section called “Fragments”.

Examples

Example 10. Simplest possible flexible-backbone simulation.

    rosetta -design -mvbb -s 2ptl.pdb

Example 11. Limit which residues to vary in a flexible-backbone simulation using a resfile.

    rosetta -design -mvbb -s 2ptl.pdb -resfile 2ptl.res

Example 12. Expand the rotamer library by including small deviations in chi2. Use the default neighbor cutoff, to consider only well-packed residues.

    rosetta -design -mvbb -s 2ptl.pdb -ex2

Example 13. Expand the rotamer library by including small deviations in chi2. Use a neighbor cutoff low enough to apply to all residues, regardless of packing.

    rosetta -design -mvbb -s 2ptl.pdb -ex2 -extrachi_cutoff 1

Example 14. Expand the rotamer library by including small deviations in chi1 and, for aromatics only, in chi2. Use a neighbor cutoff low enough to apply to all residues, regardless of packing.

    rosetta -design -mvbb -s 2ptl.pdb -ex1 -ex2aro_only -extrachi_cutoff 1

Example 15. Perform three flexible-backbone simulations instead of one.

    rosetta -design -mvbb -s 2ptl.pdb -ndruns 3

Rotamer packing

Command-line options

    rosetta -design -onlypack

-onlypack
    Enable the rotamer packing sub-protocol.

Examples

Example 16. Simplest possible side-chain repacking simulation. The sequence is fixed.

    rosetta -design -onlypack -s 2ptl.pdb

Multi-state design (designing conformational switches)

Command-line options

    rosetta -design -pack_in_parallel -equiv_resfile <equiv_resfile> -conv_limit_mod <loop count>

-pack_in_parallel
    Enable the multi-state design sub-protocol.

-equiv_resfile <equiv_resfile>
    Specifies an equivalency resfile, <equiv_resfile>.

-conv_limit_mod <loop count>
    The number of annealing loops in the mutation-generating loop is <loop count> multiplied by 5. Increasing this value may improve sequence convergence, at the cost of increased run time.

Inputs

Equivalency resfile (equiv_resfile)
    Multi-state design requires the user to create a file listing the residues that must have the same amino acid type. The format of an equiv_resfile looks like:

        A 1 32 B 1 32 C 1 32
        A 1 29 D 1 29

    Here residues 1-32 of chain A correspond to residues 1-32 of chains B and C. A similar correspondence exists between residues 1-29 of chains A and D.

Examples

Example 17. Simple multi-state design.

    rosetta -design -pack_in_parallel -s 2ptl.pdb -resfile 2ptl.res -equiv_resfile 2ptl.equiv_res

Second-site suppressor (altering protein-protein binding specificities)

This sub-protocol works on protein complexes and searches for mutations that will destroy binding, but which can be compensated for by mutation on the partner protein.

Command-line options

    rosetta -design -alter_spec [ -alter_spec_mutlist <mutation list file> ] [ -fix <file> ] [ -pmut <file> ]

-alter_spec
    Enables the alter specificity sub-protocol.

-alter_spec_mutlist <file>
    Changes the name of the generated output file to <file>.

-fix <file>
    Fixes the residues specified in <file> so that they will not be redesigned.

-pmut <file>
    Limits the residues that are examined for point mutations to the ones specified in <file>.

Inputs

Note: Due to limitations of the original design, only the first interface detected will be redesigned. It is possible to change which interface is detected by modifying the order in which the chains appear in the input PDB file. -alter_spec does not differentiate chains by chain id. Instead it looks for the first and second PDB termination markers (TER) in the PDB file. To work on different sets of chains it is necessary to reorder the contents of the PDB file.

Outputs

Mutation list file
    -alter_spec will cause the list of each mutated residue for each of the four complexes generated (wild-wild, wild-mutant, mutant-wild, mutant-mutant) to be stored in the file mutlist. The name of this file can be altered with -alter_spec_mutlist.

Examples

Example 18. Simplest possible alter specificity simulation. Implements the second-site suppressor strategy for designing altered-specificity protein-protein interfaces.

    rosetta -design -alter_spec -s 2ptl.pdb

Example 19. Alter specificity and rename the output file.

    rosetta -design -alter_spec -s 2ptl.pdb -alter_spec_mutlist 2ptl.mutlist

Example 20. Alter specificity, renaming the output file and enabling use of the expanded rotamer library for chi1 and chi2.

    rosetta -design -alter_spec -s 2ptl.pdb -ex1 -ex2 -alter_spec_mutlist 2ptl.mutlist

Example 21. Alter specificity using a softened repulsion term to compensate for the fact that amino acids are represented in discrete space.

    rosetta -design -alter_spec -s 2ptl.pdb -soft_rep_design

Example 22. Alter specificity, taking into account extended rotamer sets. Use an altered energy function that more strongly disfavors like charges being near each other.

    rosetta -design -alter_spec -s 2ptl.pdb -use_electrostatic_repulsion

Loops design

Protocols for iterative docking, design, and loop modeling.

Command-line options

    rosetta -design -loops -design_loops [ dock | hold ] [ -s <PDB file> | -l <PDB list> ] -read_all_chains

-loops
    Do setup for a task involving loops. Required for this sub-protocol.

-design_loops dock
    Enable design with flexible loops and rigid-body docking movements. As with any docking operation, multiple chains must exist in the PDB file, and they must be delimited with a TER record.

-design_loops hold
    Enable design with flexible loops on an otherwise fixed backbone. Without a resfile, this mode automatically designs the loop regions and repacks the contact neighbors.

-read_all_chains
    Read all the chains in the PDB file, not just the first. This is turned on by default if -design_loops dock is specified. With -design_loops hold, this option must be provided in order to read all the chains.

Inputs

Loops file
    -loops looks for a file called <protein>.loops. The loops file specifies which residues are in loops and are therefore allowed to move during the simulation (backbone and side-chain motion). There is currently no way to change the prefix of the loops file to something other than "<protein>". The extension can be changed with loop_library, but this is rarely useful.
    Note: The loops file must have the following printf-style syntax:

        "%3d %4d %4d\n", $looplength, $begin, $end

    The loops file is also covered in other places in the manual.

Fragment libraries
    Fragments are specified by the 'fragments' entry of paths.txt.

Outputs

Full-atom score file
    The .fasc file is generated as described under the section called “Outputs”.

Examples

Example 23. Simplest possible loop design run.

    rosetta -design -loops -design_loops dock -s 2ptl.pdb

Example 24. Simplest possible loop design run for a single protein (no docking).

    rosetta -design -loops -design_loops hold -s 2ptl.pdb

Example 25. Loop design run enabling use of extended rotamers for chi1 and chi2.

    rosetta -design -loops -design_loops dock -ex1 -ex2 -s 2ptl.pdb

Interface Minimization

This sub-protocol intersperses interface minimization (backbone, side-chain, and small rigid-body motions) with design. The interface is defined as the residues within 5.0 Angstroms of the binding partner.

Command-line options

    rosetta -design -design_min_inter

-design_min_inter
    Enables design with interface minimization.

Examples

Example 26. Simplest possible minimize interface run.

    rosetta -design -design_min_inter -s 2ptl.pdb

Tail design

This sub-protocol is for optimizing the sequence and conformation of residues at the N- or C-terminus. Design is restricted to a tail: a series of residues at either terminus.

Command-line options

    rosetta -design [ -tail | -tail_fix_helix ] -begin <residue id> -end <residue id>

-tail
    Performs Flexible backbone design on the protein terminal region only (called the "tail"). It requires -begin and -end in order to work. It will allow fairly large perturbations to the entire tail region.

-tail_fix_helix
    As with -tail, except that every residue in the tail is perturbed apart from those with helical backbone torsions. This method requires fragments.

-begin <residue id>
    Specifies which <residue id> should be treated as the start of the tail.

-end <residue id>
    Specifies which <residue id> should be treated as the end of the tail.

Examples

Example 27. Simplest possible tail design run.

    rosetta -design -tail -s 2ptl.pdb -begin 2 -end 10

Example 28. Simplest possible tail design run with fixed helical regions.

    rosetta -design -tail_fix_helix -s 2ptl.pdb -begin 2 -end 10

Grow termini design

Similar to the extension protocol described in Sood, V. D. and Baker, D. (2006). J Mol Biol 357(3): 917-27, this sub-protocol will extend the N- or C-terminus of a PDB by some number of residues; alternatively, it may be used to remodel the N- or C-terminus of a protein. It uses Rosetta's centroid mode, so it will strip off all side chains.
A library of starting structures with diverse conformations of an N- or C-terminal extension will be output. These may be used as the inputs to fixed backbone design, after the side-chains of the constant regions have been pasted back on. Note that in addition to the usual input files, a fasta file and a loop file are required. This protocol may be streamlined and made more user-friendly in future Rosetta releases; for the time being, several support scripts are provided to help the user. These support scripts are found in rosetta_scripts/peptide_extensions.

Command-line options

    rosetta <series> <protein> <chain> -design -loops -grow -atom_vdw_set <atom radii> [ -vdw_max <vdw filter> ] [ -rg_max <rg filter> ] [ -cenlist_values ] [ -wiggle_jxn ]

-loops
    Enable use of loops. Necessary for the grow termini sub-protocol.

-grow
    Enable the grow termini sub-protocol.

-atom_vdw_set <atom radii>
    Should be set to "highres" to obtain the highest quality structures.

-vdw_max <vdw filter>
    To avoid printing out structures with clashes, set this to the vdw score of the starting structure, or a little higher. Structures with scores worse than the filter will be discarded.

-rg_max <rg filter>
    To avoid printing out structures in which the terminus being modeled has little interaction with the rest of the structure, set this to the rg score of the starting structure, or a little higher. Structures with scores worse than the filter will be discarded.

-cenlist_values
    Fill the last two columns in the "complete" lines at the end of the PDB with the number of centroid neighbors each residue has within 6.0 Angstroms (cen6) or 10.0 Angstroms (cen10). Useful for identifying output structures with interactions to a particular residue, if you would like your extension to be targeted to a certain site.

-wiggle_jxn
    Sample an even larger conformational space by adding backbone flexibility at the first residue of the extension.
    Warning: This option will significantly increase computational time, and has not been shown to be definitively useful in producing higher quality models.

Inputs

Loops file
    Contains the backbone conformational space that will be explored by the grow termini protocol. The more lines (conformations) there are in this file, the larger the conformational space that will be explored. A loop file in the proper format may be produced either from the vall or from a list of idealized PDB structures, using scripts found in rosetta_scripts/peptide_extensions/.

FASTA file
    The file must contain the entire amino acid sequence of the PDB, including placeholder alanines.

Outputs

PDB
    One PDB file will be output for every line in the loop file, as long as the output structure passes the vdw_max and rg_max filters. The PDBs are named by the PDB code of the structure from which the torsion angles of the extension came, and by the line number in the loop file.

<Ignored>
    Two additional files, nnXXXX_0001.pdb and nnXXXX.sc (where nn is the series code and XXXX is the PDB name), will also be output. These should be ignored.

Examples

Example 29. Simple run of grow termini.

    rosetta aa 1kka _ -loops -grow -atom_vdw_set highres -read_all_chains -s 1kka.pdb

Interpreting Results

The energies, structure and output of a Rosetta design simulation are placed in the output PDB file. The PDB file has the following sections:

1) Coordinates of the design structure.

2) A list of scores. Many of these are used in Ab Initio Structure Prediction and are not particularly relevant to protein design. The main score is Wbk_tot * bk_tot + Wother * other. The other terms that contribute to the score evaluate the backbone structure, e.g. the Ramachandran score. The score is intended to be the total energy, with bk_tot as just one part of it. For the sake of consistency, the score can be used instead of bk_tot when judging the quality of output structures.

Note: There are many different scoring functions.
The standard score is score12. See score.cc.

The scores used during design with the default protocols are:

bk_tot
    The total score using the design energy function. Lower is better.
fa_atr
    The attractive portion of the Lennard-Jones potential. Rewards close contacts.
fa_rep
    Lennard-Jones repulsive term. Penalizes overlaps.
fa_sol
    Lazaridis-Karplus solvation model. Penalizes buried polars.
fa_dun
    Internal energy of side-chain rotamers as derived from Dunbrack's statistics.
fa_intrares
    Intra-residue clashes.
fa_pair
    Statistics-based pair term. Favors salt bridges.
fa_prob
    Probabilistic term: P(aa | phi, psi) and Ramachandran preferences.
hb_sc
    Side-chain/side-chain and side-chain/backbone hydrogen bond energy.
hb_srbb
    Backbone-backbone hydrogen bonds close in primary sequence.
hb_lrbb
    Backbone-backbone hydrogen bonds distant in primary sequence.

3) A table of energies for each residue in the protein.

res
    The residue index.
aa
    The three-letter amino acid code.
nb
    The count of neighbors.
Eatr
    Lennard-Jones attractive term.
Erep
    Lennard-Jones repulsive term.
Esol
    Lazaridis-Karplus solvation.
Eaa
    Probability of an amino acid given the particular phi and psi angles: P(aa | phi, psi).
Edun
    Rotamer preferences from the Dunbrack library.
Eintra
    Intra-residue clashes.
Ehbnd
    Hydrogen bonding.
Epair
    Statistics-based pair term.
Elj
    Lennard-Jones total.
Eres
    Total energy per residue.

Table 1. Example of Residue Energies

    res_aa Eatr Erep Esol Eh2o Eh2o_sol Eaa Edun Eintra Ehbnd Epair Eref Egb Eh2o Eh2o_bb Ecst Eres
    1 MET -4.0 0.3 1.4 0.0 0.0 2.5 0.3 -0.8 0.0 0.3 0.0 0.0 0.0 0.0 -0.7
    2 GLN -2.5 0.1 1.5 0.0 0.0 2.9 0.0 -0.7 -0.1 1.0 0.0 0.0 0.0 0.0 0.3
    3 ILE -4.1 0.1 1.2 0.0 -0.2 0.1 0.4 -1.6 0.0 -0.2 0.0 0.0 0.0 0.0 -3.9
    4 PHE -4.4 0.5 1.8 0.0 -0.3 0.2 0.0 -1.6 0.0 -0.6 0.0 0.0 0.0 0.0 -3.2
    5 THR -3.5 0.0 1.8 0.0 0.0 0.0 0.0 -1.4 0.0 0.3 0.0 0.0 0.0 0.0 -3.3
    6 LYS -3.0 0.0 1.5 0.0 0.1 0.9 0.1 -1.7 0.0 0.6 0.0 0.0 0.0 0.0 -2.8
    7 THR -3.1 0.1 1.9 0.0 -0.7 0.3 0.0 -1.2 0.0 0.3 0.0 0.0 0.0 0.0 -3.0
    8 LEU -1.3 0.0 0.5 0.0 0.0 1.6 0.5 0.0 0.0 0.1 0.0 0.0 0.0 0.0 1.2
    9 THR -1.4 0.1 1.1 0.0 -0.1 0.8 0.0 -0.2 0.0 0.3 0.0 0.0 0.0 0.0 0.1
    10 GLY -0.8 0.1 0.6 0.0 -1.5 -0.0 0.0 -0.3 0.0 0.2 0.0 0.0 0.0 0.0 -2.2
    11 LYS -2.8 0.1 2.4 0.0 0.0 4.4 -0.7 -0.3 0.6 0.0 0.0 0.0 0.0 2.6 0.2

4) A table of measured energies minus expected energies. Expected energies are derived by calculating the average energies of the different amino acids with a certain number of neighbors in a large set of proteins in the PDB. The table is useful for determining how well-packed a residue is. The column Elj compares the actual Lennard-Jones energy of each residue to the expected value. Well-packed residues should have Elj scores near zero or negative.

res
    The residue index.
aa
    The three-letter amino acid code.
nb
    The count of neighbors.
Eatr
    Lennard-Jones attractive term.
Erep
    Lennard-Jones repulsive term.
Esol
    Lazaridis-Karplus solvation.
Eaa
    Probability of an amino acid given the particular phi and psi angles: P(aa | phi, psi).
Edun
    Rotamer preferences from the Dunbrack library.
Eintra
    Intra-residue clashes.
Ehbnd
    Hydrogen bonding.
Epair
    Statistics-based pair term.
Elj
    Lennard-Jones total.
Eres
    Total energy per residue.
SASApack
    SASApack is related to the void volume in a protein.
    Surface areas are computed with a 1.4 Angstrom probe and a 0.5 Angstrom probe, and the difference (ASA_0.5 - ASA_1.4) is compared to the expected difference for a particular residue type in a particular environment. A negative value is favorable and indicates that the residue is more tightly packed than is seen in average PDB files.

Table 2. Example of Average Energies + RMSD SASApack score

    res aa nb Eatr Erep Esol Eaa Edun Eintra Ehbnd Epair Elj Eres SASApack
    1 MET 15 -0.8 -0.1 0.0 0.0 -0.2 0.2 -0.1 0.0 -1.0 -1.7 2.38
    2 GLN 11 0.2 -0.2 -0.3 0.1 0.3 0.0 -0.1 0.0 0.0 -0.4 8.93
    3 ILE 22 0.3 -0.2 -0.3 0.0 -1.2 0.4 -0.6 0.0 0.1 -1.8 18.02
    4 PHE 14 -0.3 0.1 -1.0 0.0 -0.2 -2.5 1.89
    5 THR 22 0.2 -0.3 -0.5 0.1 -0.5 0.0 -0.3 0.1 0.0 -1.0 11.62
    6 LYS 15 0.3 -0.4 -0.5 0.1 -2.2 0.1 -0.9 0.2 -0.1 -3.6 5.48
    7 THR 18 0.1 -0.2 -0.2 -0.6 -0.3 0.0 -0.2 0.1 -0.1 -1.1 10.38
    8 LEU 10 0.9 -0.2 -0.6 0.1 0.1 0.5 0.0 0.8 0.9 1.33
    0.4 -0.2 -1.0 -0.1 0.5

5) A table of measured energies minus expected energies, for residues in different environments: buried, middle and surface. When creating novel structures we have found it difficult to get Elj numbers that are zero or negative for the buried residues.

Note: Most values in this table are only meaningful with the default energy function.

Table 3. Example of Measured Energies - Expected Energies

                Eatr    Erep    Elj
    buried      -0.1    -0.2    -0.3
    middle      -0.3    -0.1    -0.3
    surface      0.1    -0.1     0.0

6) A table of starting chi angles minus finishing chi angles, and of absolute chi angles.

7) A table of phi, psi and omega angles for each residue.

Design Tendencies

In some cases RosettaDesign does appear to make odd choices, and it helps to know beforehand what some of these tendencies are. In these situations it is probably best to use a resfile to try to steer Rosetta away from these pitfalls.

1. The program likes to put amino acids with similar chemical properties near each other. This is primarily because polar residues can hydrogen bond with each other, and hydrophobic residues can pack without burying hydrogen-bonding groups. As a result, in some cases you may observe a large cluster of hydrophobic residues on the surface of a protein, or a cluster of polar residues in the core. In some cases this can be avoided by forcing key residues to be polar or hydrophobic.
2. Sometimes polar groups are buried without a hydrogen-bonding partner. The energy function has been parameterized to try to avoid this, but there is no filter that prevents it.

New Features

1. design minimize inter: -dock_des_min_inter
   Dock, Design Minimize Interface is a protocol that first docks two proteins using Rosetta's centroid-mode docking algorithm. Once it has found a suitable docking arrangement, it designs the interface (using the des_min_inter protocol) by iterating between rounds of fixed-backbone sequence and structure optimization and rounds of gradient-based minimization of the degrees of freedom at the interface. Several flags go along with dock_des_min_inter; their names begin with "ddmi" (e.g. ddmi_dG_dSASA_ratio_filter, ddmi_dUns_filter).

2. point mutation: -point_mutation
   The point mutation submode of design mode alters the sequence of a single residue and repacks that residue's neighbors. The two flags used in this submode are "-point_mutation < int = resid to change >" and "-new_aa < char = 1 letter amino acid code for the new amino acid >".
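
As a hedged illustration of the point mutation flags above, the following Python sketch assembles a command line from a residue id and a one-letter amino acid code. The helper name point_mutation_command is hypothetical, and combining the two flags with -design and -s in the same way as the sub-protocol examples elsewhere in this section is an assumption, not something stated here.

```python
import shlex

# The twenty standard one-letter amino acid codes.
ONE_LETTER_CODES = set("ACDEFGHIKLMNPQRSTVWY")


def point_mutation_command(pdb_file, resid, new_aa, executable="rosetta"):
    """Build an argument list for a point mutation run (hypothetical helper).

    Assumption: -point_mutation and -new_aa are used together with
    -design and -s, mirroring the other design examples.
    """
    if new_aa not in ONE_LETTER_CODES:
        raise ValueError("new_aa must be a single one-letter amino acid code")
    return [
        executable,
        "-design",
        "-point_mutation", str(int(resid)),  # residue id to change
        "-new_aa", new_aa,                   # replacement amino acid
        "-s", pdb_file,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection before running it.
    print(shlex.join(point_mutation_command("2ptl.pdb", 12, "A")))
```

Returning a list rather than a single string keeps each argument separate, so the result can be passed to a process launcher without further quoting.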