Design

Design Quick Start
The Rosetta Design Server
Concepts
Algorithms
The Energy Function
Running Design
Design sub-protocols
Common to multiple design sub-protocols
Fixed backbone design
Flexible backbone design
Rotamer packing
Multi-state design (designing conformational switches)
Second-site suppressor (altering protein-protein binding specificities)
Loop design
Interface Minimization
Tail design
Grow termini design
Interpreting Results
Design Tendencies
New Features
This section was authored by members of the Kuhlman Lab at UNC and edited by Ion Yannopoulos.
Design Quick Start
The Rosetta Design Server
Rosetta Design can also be run through the RosettaDesign Web Server. The RosettaDesign Web Server website also has useful documentation pertinent to RosettaDesign.
Concepts
The design protocol searches for amino acid sequences that are compatible with a target protein structure or complex. You can choose to optimize all sequence positions in the protein or select a subset of positions to vary. You can also specify which amino acids to consider at each sequence position. Design simulations are most often performed either by fixing a protein backbone, or by allowing the backbone to be treated as flexible and optimizing it along with the sequence.
In addition to the general protocols for fixed and flexible backbone design, there are several design sub-protocols that allow
for more specialized operations:
Design sub-protocols
Fixed backbone design
This sub-protocol carries out design on a fixed protein backbone: only the amino acid identities and side-chain conformations are varied.
Flexible backbone design
This sub-protocol carries out design on a flexible protein backbone: it varies the backbone geometry and not just
the side-chains. The perturbations to the backbone are generally small and will not change the overall fold of the
starting structure.
Rotamer packing
This sub-protocol does not perform any backbone manipulation or sequence optimization, limiting itself to the packing of rotamers on side-chains.
Multistate design (e.g. designing conformational switches, homo-oligomers, etc.)
This sub-protocol simultaneously optimizes a single amino acid sequence for multiple target structures.
Second-site suppressor (altering protein-protein binding specificities)
This protocol works on protein complexes and searches for point mutations that will destroy binding, then attempts to compensate for them by allowing the neighboring residues to vary. It creates an orthogonal binding interface in which the redesigned proteins still bind to each other, but no longer bind to their wild type partners. This sub-protocol is generally combined with a separate run of the interface analysis protocol: the second-site suppressor run generates a list of mutations that are predicted to alter binding specificity, and the interface analysis is used to calculate binding energies for the suggested mutations.
Interface design with a flexible backbone
This sub-protocol works on paired protein complexes and optimizes the relative orientation of the two proteins, as well as the backbone torsion angles and the amino acid sequences at that interface.
Tail design
This sub-protocol limits the movement of a flexible backbone design to a subset of the total backbone ending at
either terminus.
Loop design
This sub-protocol restricts design to the variable regions of the protein. It is effectively flexible backbone design limited to particular (arbitrary) regions of the backbone. It is called loop design because these flexible regions are usually loops on the surface of the protein, which vary tremendously from protein to protein. Loops that can vary both in sequence and in structure are identified as regions that are not part of any existing secondary structure element.
Energy function optimization
This sub-protocol is used for optimizing the weights used in the Rosetta energy function. It works with a list of PDB files. For each residue position, it tries each amino acid in each possible rotamer and outputs the various Rosetta energies to a text file. Besides the residue being varied, all other sequence positions are held in their native conformation. The output energies are then used by accessory scripts to find weights that maximize the probability that the native amino acid will occupy each sequence position.
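The accessory scripts are not described here. As a minimal sketch only, assuming a Boltzmann (softmax) model in which the probability of amino acid aa at a position is proportional to exp(-sum_k w_k * E_k(aa)), the weights can be fit by gradient ascent on the log-probability of the native residues. The data layout and all function names are hypothetical, not part of Rosetta.

```python
import math

def native_probabilities(weights, energies):
    """Softmax over amino acids: P(aa) proportional to exp(-sum_k w_k * E_k(aa))."""
    scores = {aa: -sum(w * e for w, e in zip(weights, terms))
              for aa, terms in energies.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {aa: math.exp(s - top) for aa, s in scores.items()}
    z = sum(exps.values())
    return {aa: v / z for aa, v in exps.items()}

def log_likelihood(weights, positions):
    """Sum of log P(native amino acid) over all sequence positions."""
    return sum(math.log(native_probabilities(weights, energies)[native])
               for energies, native in positions)

def fit_weights(positions, n_terms, learning_rate=0.1, steps=200):
    """Plain gradient ascent on the native log-likelihood.

    positions: list of (energies, native), where energies maps each amino acid
    to its list of per-term energies at that position (hypothetical layout).
    """
    weights = [1.0] * n_terms
    for _ in range(steps):
        grad = [0.0] * n_terms
        for energies, native in positions:
            probs = native_probabilities(weights, energies)
            for k in range(n_terms):
                # d log P(native) / d w_k = <E_k> under the model - E_k(native)
                mean_k = sum(probs[aa] * terms[k] for aa, terms in energies.items())
                grad[k] += mean_k - energies[native][k]
        weights = [w + learning_rate * g for w, g in zip(weights, grad)]
    return weights
```

When only the first energy term separates native from non-native residues, its weight grows during fitting, while a term that is identical for every amino acid has zero gradient and keeps its starting weight.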
Algorithms
All design simulations make use of an algorithm for packing amino acid side-chains. To simplify the search, side-chains are only considered in a discrete set of favorable conformations, called rotamers. Rosetta uses Roland Dunbrack's backbone-dependent rotamer library as well as a collection of side-chain conformational libraries assembled by Steven Mayo. Rotamers built with Dunbrack's description have ideal geometries, whereas the side-chain conformational libraries contain rotamers harvested directly from the PDB, with non-ideal bond lengths and angles.
Like other protocols that have been developed for protein design and rotamer packing, Rosetta's packing algorithm has two key components: an energy function for evaluating the favorability of a specific sequence and structure, and an optimization protocol for scanning through sequence space.
Design sub-protocols
Fixed backbone design
1. Pick a random sequence to start the simulation, or start from a specified sequence. In general either approach gives similar results.
2. Make a single amino acid substitution, or change the conformation (rotamer) of a single amino acid.
3. Apply the energy function (see the section called “The Energy Function”) to determine the new energy of the protein.
4. Accept or reject the substitution based on the Metropolis criterion: substitutions that lower the energy are always accepted, while substitutions that raise the energy by dE are accepted with probability exp(-dE/T), which depends on the current "temperature" T. The sequence optimization algorithm starts at a permissive "temperature" and cools until it reaches "0 degrees".
5. Repeat. The number of repetitions is empirically derived. For a hundred-residue protein, a few hundred thousand rotamer substitutions are attempted.
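The steps above can be sketched as a Metropolis Monte Carlo search with simulated annealing. This is an illustration under stated assumptions, not Rosetta's code: the energy callback, the (amino acid, rotamer) option lists, the geometric cooling schedule and the parameter values are all hypothetical, and the final temperature is a small positive number rather than exactly zero, since the acceptance probability exp(-dE/T) is undefined at T = 0.

```python
import math
import random

def anneal(choices, energy, n_steps=20000, t_start=10.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with simulated annealing over discrete choices.

    choices[i] lists the (amino acid, rotamer) options allowed at position i;
    energy(state) scores a full assignment, where state[i] indexes choices[i].
    """
    rng = random.Random(seed)
    # Step 1: random starting sequence/rotamer assignment.
    state = [rng.randrange(len(opts)) for opts in choices]
    current = energy(state)
    best_state, best_energy = list(state), current
    for step in range(n_steps):
        # Geometric cooling from a permissive temperature toward (nearly) zero.
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        # Step 2: change the amino acid and/or rotamer at one position.
        i = rng.randrange(len(choices))
        old = state[i]
        state[i] = rng.randrange(len(choices[i]))
        # Step 3: rescore the whole assignment.
        trial = energy(state)
        delta = trial - current
        # Step 4: Metropolis criterion.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = trial
            if current < best_energy:
                best_state, best_energy = list(state), current
        else:
            state[i] = old  # rejected: restore the previous choice
    return [choices[i][k] for i, k in enumerate(best_state)], best_energy
```

Keeping the best assignment seen, rather than only the final one, is a common safeguard; the text above does not say whether Rosetta does this. A practical implementation would also precompute energies instead of rescoring the whole protein after every substitution.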
Flexible backbone design
1. Perform sequence optimization (as described under Fixed backbone design).
2. Perform backbone refinement. This is done with the same optimization algorithms that are used for full-atom refinement during ab initio structure prediction (see the section called “Ab Initio: Algorithms”).
3. Repeat the above two steps in alternation.
Rotamer packing
Uses the same Monte Carlo optimization protocol as that used for fixed backbone design.
Multi-state design (designing conformational switches)
Multi-state design uses Monte Carlo optimization with simulated annealing in two nested loops: an outer loop in which amino acid substitutions are made, and an inner loop, similar to fixed backbone design, in which rotamer substitutions are made.
Starting from a random amino acid sequence, single random amino acid mutations are evaluated by threading the new sequence onto the multiple target structures and determining whether the sum of the threading energies compares favorably (based on the Metropolis criterion) to the energy of the previous sequence. Threading is performed using a predetermined sequence-structure mapping and the side-chain repacking protocol.
The threading energy function is the standard Rosetta energy function. As the simulation progresses, the temperature is lowered to ensure convergence to low-energy sequences.
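A minimal sketch of the outer loop, assuming each target structure is summarized by a table of rotamer energies per position and amino acid. This is a hypothetical simplification with no pair terms, in which taking the best rotamer stands in for the inner repacking loop. The point it illustrates is that every candidate mutation is scored by the sum of its threading energies over all target structures.

```python
import math
import random

def threading_energy(seq, structure):
    """Energy of `seq` threaded onto one target structure.

    structure[i][aa] lists rotamer energies of amino acid `aa` at position i;
    the minimum stands in for the inner rotamer-repacking loop.
    """
    return sum(min(structure[i][aa]) for i, aa in enumerate(seq))

def multistate_design(structures, alphabet, n_steps=5000,
                      t_start=5.0, t_end=0.05, seed=0):
    """Outer Monte Carlo loop over sequences, scored by summed threading energy."""
    rng = random.Random(seed)
    n_pos = len(structures[0])
    seq = [rng.choice(alphabet) for _ in range(n_pos)]  # random start

    def total(s):
        return sum(threading_energy(s, st) for st in structures)

    current = total(seq)
    best_seq, best_energy = list(seq), current
    for step in range(n_steps):
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        i = rng.randrange(n_pos)
        old = seq[i]
        seq[i] = rng.choice(alphabet)  # single random point mutation
        trial = total(seq)
        delta = trial - current
        if delta <= 0 or rng.random() < math.exp(-delta / t):  # Metropolis
            current = trial
            if current < best_energy:
                best_seq, best_energy = list(seq), current
        else:
            seq[i] = old
    return "".join(best_seq), best_energy
```

Because the score is a sum over states, a residue that is strongly preferred in one structure can win even if another structure mildly prefers a different residue at that position.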
Second-site suppressor (altering protein-protein binding specificities)
1. Search a protein-protein interface for all point mutations that destabilize binding.
2. Find additional mutations that can compensate for the destabilization caused by each such point mutation.
The resulting designs will contain both the initial point mutation and the compensating mutations. Note: due to limitations of the original design, only the first interface detected will be redesigned. It is possible to change which interface is detected by modifying the order in which the chains appear in the PDB file.
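The two steps can be illustrated with a toy search over precomputed binding energies. This is a simplified sketch, not the Rosetta implementation: it assumes a single compensating mutation on the partner protein, a hypothetical bind(ma, mb) energy lookup, and a hypothetical threshold. A pair is kept only when each mutation alone weakens binding to the wild-type partner while the mutant-mutant complex binds about as well as wild type, which is the orthogonal interface described under Concepts.

```python
def second_site_suppressors(bind, muts_a, muts_b, threshold=1.0):
    """Toy search for specificity-switching mutation pairs.

    bind(ma, mb) returns the binding energy of the complex carrying mutation
    ma on partner A and mb on partner B (None = wild type); lower is tighter.
    Returns (ma, mb, ddG mutant-wild, ddG mutant-mutant) tuples.
    """
    wild_wild = bind(None, None)
    hits = []
    for ma in muts_a:
        # Step 1: keep only point mutations on A that destabilize binding.
        ddg_mw = bind(ma, None) - wild_wild
        if ddg_mw < threshold:
            continue
        # Step 2: look for the best compensating mutation on B.
        best = None
        for mb in muts_b:
            ddg_wm = bind(None, mb) - wild_wild   # mb alone must also break binding
            ddg_mm = bind(ma, mb) - wild_wild     # but the pair must still bind
            if ddg_wm >= threshold and ddg_mm < threshold:
                if best is None or ddg_mm < best[1]:
                    best = (mb, ddg_mm)
        if best is not None:
            hits.append((ma, best[0], ddg_mw, best[1]))
    return hits
```

The mutation labels used to exercise it (for example "K45E") are made up for illustration.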
Interface design with flexible backbone
Two protocols are available. The first is used when designing loops; the second is used with the interface minimization protocol.
The first protocol iterates between the following steps; the decision about when to alternate is made heuristically:
1. Fixed backbone design
2. Docking/rigid-body minimization
3. Loop minimization
A second variation of flexible backbone interface design works as follows:
1. Fixed backbone design
2. Simultaneous minimization of backbone and side-chain torsions of interface residues, as well as small rigid-body docking minimizations.
Loop design
Minimize backbone torsion angles in the defined region of the "loop".
Tail design
Tail design has no special algorithms. It simply limits the range of residues to which flexible backbone design is applied.
The Energy Function
The energy function used by the design protocol is the same as that used for full-atom structure prediction.
Lennard Jones potential
This term favors atoms being close to each other, though not closer than a certain threshold distance.
Lazaridis-Karplus implicit solvation model
This term penalizes the burial of polar groups and favors the burial of methyl groups.
Hydrogen bonding (orientation-dependent)
The relative strength of the solvation potential and the hydrogen-bonding potential determines how favorable the
burial of a polar amino acid is, in the case where it forms a hydrogen bond.
Torsion potentials (of backbone and side-chain)
These values are knowledge-based: they are derived from the probabilities observed in the Protein Data Bank (PDB).
Pair energy
This is a knowledge-based term that accounts for the likelihood of finding two amino acid types within a certain distance of each other. It is included as a low-resolution term to capture electrostatic interactions.
Reference energies
These values control, on average, how often the amino acids are chosen during a design simulation.
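Two of the terms above can be sketched directly. The first function writes a 12-6 Lennard-Jones potential in terms of its minimum distance r0 (well depth -eps) and splits it at that minimum into an attractive part and a repulsive part; scaling the repulsive part down is one simple way to picture a softened potential. The second applies the inverse-Boltzmann idea behind knowledge-based terms, E = -kT ln(P_observed / P_reference), to contact counts between amino acid types. Rosetta's actual functional forms, atom radii, distance bins and reference states are not given in this section; kT = 1 and all inputs are assumptions.

```python
import math

def lj_split(r, r0, eps):
    """12-6 Lennard-Jones with its minimum (-eps) at r0, split at r0 into
    (attractive, repulsive) parts that sum to the full potential."""
    x = (r0 / r) ** 6
    full = eps * (x * x - 2.0 * x)
    if r >= r0:
        return full, 0.0          # beyond the minimum: all attraction
    return -eps, full + eps        # inside it: flat well plus repulsive wall

def lj_energy(r, r0, eps, w_rep=1.0):
    """Total LJ energy; w_rep < 1 gives a crude 'softened' repulsion."""
    atr, rep = lj_split(r, r0, eps)
    return atr + w_rep * rep

def pair_energies(pair_counts, kT=1.0):
    """Inverse-Boltzmann pair term E(a, b) = -kT ln(P_obs(a, b) / (P(a) P(b))).

    pair_counts maps ordered (a, b) amino-acid pairs observed in contact to
    counts, with every contact counted in both orders.
    """
    total = sum(pair_counts.values())
    marginal = {}
    for (a, _), n in pair_counts.items():
        marginal[a] = marginal.get(a, 0) + n
    p = {a: n / total for a, n in marginal.items()}
    return {(a, b): -kT * math.log((n / total) / (p[a] * p[b]))
            for (a, b), n in pair_counts.items()}
```

Pairs that are seen in contact more often than their individual frequencies predict (for example opposite charges) receive negative, favorable energies; pairs seen exactly as often as chance predicts score zero.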
Running Design
This section describes the basic elements of running the design protocol, and contains example command lines which
illustrate common uses.
Design sub-protocols
The option which sets Rosetta to run the design protocol is -design. However, -design alone is insufficient: one of the following sub-protocols must also be chosen, as explained under the section called “Design: Concepts”.
-fixbb
Run Fixed backbone design.
-mvbb
Run Flexible backbone design.
-onlypack
Run Rotamer packing.
-pack_in_parallel
Run Multistate design (designing conformational switches).
-alter_spec
Run Second-site suppressor (altering protein-protein binding specificities).
-design_inter
Run Interface design with a fixed backbone.
-design_loops <dock | hold> -loops
Run Loop design. There are two sub-options:
dock: Run loop design iterated with rigid-body docking movements.
hold: Run loop design on an otherwise fixed structure. Requires -loops.
-design_min_inter
Run design with small rigid body minimizations, minimization of interface residues' backbone, and side chain
torsion minimization.
-tail | -tail_fix_helix
Run Tail design.
Common to multiple design sub-protocols
Command-line options
-s <pdb file>
The PDB file describing the protein that will be redesigned.
-l <pdb list file>
The file describing the list of proteins that will be redesigned. Each protein must have a corresponding PDB file.
-pdbout <base name>
Base name for the output PDB files. If this is not specified, the name of the protein is used as the base name. This
option is only meaningful for the Fixed backbone design and Rotamer packing sub-protocols.
-resfile <residue file>
Specifies a file describing which residues will be allowed to vary during the design simulation. This file can be created using the Perl script makeresfile.pl found in the rosetta_scripts/resfiles directory; a README there explains how to run the script. If this file is not specified, all residues will be varied.
Note: -resfile cannot be combined with -l, because a resfile refers to only one PDB.
-ndruns <# of runs>
Set the number of design simulations that are done. Some tasks require considerably more than one run to
generate useful data. The default is 1. This option applies to the fixed backbone sub-protocol only.
-profile
Generates a table of amino acid distributions at the designed residues and a FASTA file with the designed sequences.
-ex1, -ex2, -ex3, -ex4
Increase the size of the rotamer library that is used for particular chi angles. The number indicates which chi angle (from chi1 to chi4) uses extra sub-rotamers. These options can be used in any combination. Note, however, that each of them increases the computation time considerably, and using them all in combination may drive your calculation time into the stratosphere. Deciding when to use the extended rotamer sets is a matter of experience. Which residues actually receive the extra rotamers is decided by -extrachi_cutoff.
-ex1aro
As -ex1, but uses an even bigger rotamer library for aromatics.
-ex2aro_only
As -ex2, but only considers extra rotamers for aromatics.
-extrachi_cutoff <neighbors>
Sets the number of neighbors a residue needs in order for extra rotamers requested by any of the -ex options to be considered. A neighbor is defined as a residue within 10 Angstroms, measured from C-beta. The default is 18, which means that only residues in the core of the protein are likely to get extra rotamers. Setting this value to 1 will result in all residues getting extra rotamers.
-use_electrostatic_repulsion
Re-weights the pair potential used to model electrostatic terms of the energy function, so that it is less favorable to place like-charged amino acids near each other.
-soft_rep_design
Specifies the use of an alternate weight set that dampens the Lennard-Jones potential. This option is often used
during fixed backbone design since the backbone cannot relax to accommodate small steric clashes.
-soft_rep
Same as -soft_rep_design except that weights are explicitly optimized for side-chain packing with a fixed
sequence.
-favor_native_residue <energy>
Applies the specified <energy>, in kcal/mol, to the starting amino acid at each designed position. Negative values are bonuses that favor keeping the starting amino acid, while positive values are penalties.
-favor_polar <energy>
Analogous to -favor_native_residue, except that it applies to polar amino acids.
-favor_nonpolar <energy>
Analogous to -favor_native_residue, except that it applies to non-polar amino acids.
-favor_aromatic <energy>
Analogous to -favor_native_residue, except that it applies to aromatic amino acids.
-rot_opt
Optimize the one-body energy (energy with a fixed environment) by minimizing chi angles before entering Monte
Carlo rotamer optimization.
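The neighbor rule used by the extra-rotamer cutoff option above can be sketched as follows. Assumptions: the residue itself is not counted, a C-beta coordinate is available for every residue (glycine would need a virtual C-beta), and a residue qualifies when its neighbor count is at least the cutoff. The function names are hypothetical.

```python
import math

def neighbor_counts(cb_coords, radius=10.0):
    """For each residue, count the other residues whose C-beta lies within
    `radius` Angstroms of its own C-beta."""
    counts = [0] * len(cb_coords)
    for i in range(len(cb_coords)):
        for j in range(i + 1, len(cb_coords)):
            if math.dist(cb_coords[i], cb_coords[j]) <= radius:
                counts[i] += 1
                counts[j] += 1
    return counts

def residues_with_extra_rotamers(cb_coords, min_neighbors=18, radius=10.0):
    """Indices of residues whose neighbor count reaches the cutoff."""
    return [i for i, n in enumerate(neighbor_counts(cb_coords, radius))
            if n >= min_neighbors]
```

With the default cutoff of 18 only densely surrounded core residues qualify, while a cutoff of 1 admits every residue that has at least one neighbor.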
Fixed backbone design
Command-line options
rosetta -design -fixbb -mcmin_trials [ -s <PDB file> | -l <PDB list file> ]
-fixbb
Enable fixed backbone design.
-mcmin_trials
Perform a rotamer/sequence optimization procedure in which, following each rotamer substitution, the chi angles of the new rotamer and the neighboring residues are minimized before the substitution is evaluated with the Metropolis criterion. This procedure is slow, so it should be used as a follow-up to the standard design protocol.
Examples
Example 1. Simplest fixed-backbone design simulation.
rosetta -design -fixbb -s 2ptl.pdb
Example 2. Limit which residues to vary in a fixed-backbone simulation using a resfile.
rosetta -design -fixbb -s 2ptl.pdb -resfile 2ptl.res
Example 3. Design sequences for a set of PDB files specified in a list.
rosetta -design -fixbb -l 2ptl.pdb_list
Example 4. Designing sequences for a set of PDB files specified in a list that includes chain id information.
rosetta -design -fixbb -l 2ptl.pdb_list -chain_inc
Example 5. Expand the rotamer library by including small deviations in chi2. Use the default neighbor cutoff, to consider only well-packed residues.
rosetta -design -fixbb -s 2ptl.pdb -ex2
Example 6. Expand the rotamer library by including small deviations in chi1 and chi2. Use the default neighbor cutoff, to consider only well-packed residues.
rosetta -design -fixbb -s 2ptl.pdb -ex1 -ex2
Example 7. Expand the rotamer library by including small deviations in chi2. Use a neighbor cutoff low enough to apply to all residues, regardless of packing.
rosetta -design -fixbb -s 2ptl.pdb -ex2 -extrachi_cutoff 1
Example 8. Perform three fixed-backbone simulations instead of one.
rosetta -design -fixbb -s 2ptl.pdb -ndruns 3
Example 9. Perform gradient-based minimization of side-chain torsion angles during design.
rosetta -design -fixbb -mcmin_trials -s 2ptl.pdb
Flexible backbone design
Command-line options
rosetta -design -mvbb
-mvbb
Enables flexible backbone design.
Inputs
Fragment file
See the section called “Fragments”.
Examples
Example 10. Simplest possible flexible-backbone simulation.
rosetta -design -mvbb -s 2ptl.pdb
Example 11. Limit which residues to vary in a flexible-backbone simulation using a resfile.
rosetta -design -mvbb -s 2ptl.pdb -resfile 2ptl.res
Example 12. Expand the rotamer library by including small deviations in chi2. Use the default neighbor cutoff, to consider only well-packed residues.
rosetta -design -mvbb -s 2ptl.pdb -ex2
Example 13. Expand the rotamer library by including small deviations in chi2. Use a neighbor cutoff low
enough to apply to all residues, regardless of packing.
rosetta -design -mvbb -s 2ptl.pdb -ex2 -extrachi_cutoff 1
Example 14. Expand the rotamer library by including small deviations in chi1, and in chi2 for aromatics only. Use a neighbor cutoff low enough to apply to all residues, regardless of packing.
rosetta -design -mvbb -s 2ptl.pdb -ex1 -ex2aro_only -extrachi_cutoff 1
Example 15. Perform three flexible-backbone simulations instead of one.
rosetta -design -mvbb -s 2ptl.pdb -ndruns 3
Rotamer packing
Command-line options
rosetta -design -onlypack
-onlypack
Enable rotamer packing sub-protocol.
Examples
Example 16. Simplest possible sidechain repacking simulation. The sequence is fixed.
rosetta -design -onlypack -s 2ptl.pdb
Multi-state design (designing conformational switches)
Command-line options
rosetta -design -pack_in_parallel -equiv_resfile <equiv_resfile> -conv_limit_mod <loop count>
-pack_in_parallel
Enable multi-state design sub-protocol.
-equiv_resfile <equiv_resfile>
Specifies an equivalency resfile, <equiv_resfile>.
-conv_limit_mod <loop count>
The number of annealing loops in the mutation-generating loop is <loop count> multiplied by 5. Increasing this value may improve sequence convergence, at the cost of increased run time.
Inputs
Equivalency resfile (equiv_resfile)
Multi-state design requires that the user create a file listing the residues that need to have the same amino acid type. The format of an equiv_resfile looks like:
A 1 32 B 1 32 C 1 32
A 1 29 D 1 29
Here residues 1-32 of chain A correspond to residues 1-32 of chains B and C. A similar correspondence exists between residues 1-29 of chains A and D.
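A minimal parser, assuming that each line of the equivalency resfile lists one group of chain/start/end ranges of equal length, and that the k-th residue of every range in a group must share an amino acid type. Because the same residue can appear in more than one group (residues 1-29 of chain A in the example), the groups are then merged, since "same amino acid type" is transitive. The function names are hypothetical.

```python
def parse_equiv_resfile(text):
    """Return groups of (chain, residue) pairs that must share an amino acid."""
    groups = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) % 3:
            raise ValueError(f"line {line_no}: expected chain/start/end triples")
        ranges = [(fields[k], int(fields[k + 1]), int(fields[k + 2]))
                  for k in range(0, len(fields), 3)]
        lengths = {end - start + 1 for _, start, end in ranges}
        if len(lengths) != 1:
            raise ValueError(f"line {line_no}: ranges differ in length")
        for offset in range(lengths.pop()):
            groups.append([(chain, start + offset) for chain, start, _ in ranges])
    return groups

def merge_groups(groups):
    """Union-find merge of groups that share a residue."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for group in groups:
        root = find(group[0])
        for res in group[1:]:
            parent[find(res)] = root
    merged = {}
    for res in list(parent):
        merged.setdefault(find(res), []).append(res)
    return sorted(sorted(members) for members in merged.values())
```

For the two-line example, the merge yields 32 groups: residues 1-29 are tied across chains A, B, C and D, and residues 30-32 across chains A, B and C.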
Examples
Example 17. Simple multi-state design
rosetta -design -pack_in_parallel -s 2ptl.pdb -resfile 2ptl.res -equiv_resfile 2ptl.equiv_res
Second-site suppressor (altering protein-protein binding specificities)
This sub-protocol works on protein complexes and searches for mutations that will destroy binding, but which can be
compensated for by mutation on the partner protein.
Command-line options
rosetta -design -alter_spec [-alter_spec_mutlist <mutation list file> ] [-fix <file> ] [-pmut <file> ]
-alter_spec
Enables alter specificity sub-protocol.
-alter_spec_mutlist <file>
Changes the name of the generated output file to <file>.
-fix <file>
Fixes residues specified in <file> so they will not be redesigned.
-pmut <file>
Limits the residues which are examined for point mutations to the ones specified in <file>.
Inputs
Note: due to limitations of the original design, only the first interface detected will be redesigned. It is possible to change which interface is detected by modifying the order in which the chains appear in the PDB file.
-alter_spec does not differentiate chains by chain id. Instead it looks for the first and second PDB termination
markers (TER) in the PDB file. To work on different sets of chains it is necessary to reorder the contents of the PDB file.
Outputs
Mutation list file
-alter_spec will cause the list of each mutated residue for each of the four complexes generated (wild-wild, mutant-wild, wild-mutant, mutant-mutant) to be stored in the file mutlist. The name of this file can be changed with -alter_spec_mutlist.
Examples
Example 18. Simplest possible alter specificity simulation. Implements the second-site suppressor strategy for designing protein-protein interfaces with altered specificity.
rosetta -design -alter_spec -s 2ptl.pdb
Example 19. Alter specificity and rename the output file.
rosetta -design -alter_spec -s 2ptl.pdb -alter_spec_mutlist 2ptl.mutlist
Example 20. Alter specificity, renaming the output file and enabling use of expanded rotamer library for chi1
and chi2.
rosetta -design -alter_spec -s 2ptl.pdb -ex1 -ex2 -alter_spec_mutlist 2ptl.mutlist
Example 21. Alter specificity using a softened repulsion term to compensate for the fact that amino acids are
represented in discrete space.
rosetta -design -alter_spec -s 2ptl.pdb -soft_rep_design
Example 22. Alter specificity using an altered energy function that more strongly disfavors like charges being near each other.
rosetta -design -alter_spec -s 2ptl.pdb -use_electrostatic_repulsion
Loop design
Protocols for iterative docking, design, and loop modeling.
Command-line options
rosetta -design -loops -design_loops [ dock | hold ] [ -s <PDB file> | -l <PDB list> ] -read_all_chains
-loops
Do setup for a task involving loops. Required for this sub-protocol.
-design_loops dock
Enable design with flexible loops and rigid body docking movements. As with any docking operation multiple
chains must exist in the PDB file, and they must be delimited with a TER record.
-design_loops hold
Enable design with flexible loops on an otherwise fixed backbone. Without a resfile, this mode automatically designs the loop regions and repacks the contact neighbors.
-read_all_chains
Read all the chains in the PDB file, not just the first. This is turned on by default if -design_loops dock is specified. In the case of -design_loops hold, this option must be provided to read all the chains.
Inputs
Loops file
-loops looks for a file called <protein>.loops. The loops file specifies which residues are in loops and are therefore allowed to move during the simulation (backbone and side-chain motion). There is currently no way to change the prefix of the loops file to something other than "<protein>". The extension can be changed with loop_library, but this is rarely useful.
Note: the loops file must have the following printf-style syntax:
"%3d %4d %4d\n", $looplength, $begin, $end
The loops file is also covered in other places in the manual.
Fragment libraries
Fragments are specified by the 'fragments' entry of paths.txt.
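A small helper that writes loop definitions in the printf-style layout required for the loops file. It assumes that $looplength is the inclusive number of residues from $begin to $end; the function name is hypothetical.

```python
def format_loops(loops):
    """Render (begin, end) residue ranges as '%3d %4d %4d' loops-file lines."""
    lines = []
    for begin, end in loops:
        if end < begin:
            raise ValueError(f"loop {begin}-{end}: end precedes begin")
        looplength = end - begin + 1  # assumption: inclusive residue count
        lines.append("%3d %4d %4d\n" % (looplength, begin, end))
    return "".join(lines)
```

For example, format_loops([(10, 15), (40, 52)]) returns "  6   10   15\n 13   40   52\n", which could be saved as 2ptl.loops for a run on 2ptl.pdb.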
Outputs
Full-atom score file
The .fasc is generated as described under the section called “Outputs”.
Examples
Example 23. Simplest possible loop design run.
rosetta -design -loops -design_loops dock -s 2ptl.pdb
Example 24. Simplest possible loop design run for a single protein (no docking).
rosetta -design -loops -design_loops hold -s 2ptl.pdb
Example 25. Loop design run enabling use of extended rotamers for chi1 and chi2.
rosetta -design -loops -design_loops dock -ex1 -ex2 -s 2ptl.pdb
Interface Minimization
This sub-protocol intersperses interface minimization (backbone, side-chain, and small rigid-body motions) with design. The interface is defined as residues within 5.0 Angstroms of the binding partner.
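The 5.0 Angstrom interface definition can be sketched as a distance search between the two partners. The sketch assumes that a residue is at the interface when any of its atoms lies within the cutoff of any atom of the partner; whether Rosetta measures from all atoms, heavy atoms only, or a representative atom is not stated here. The input layout is hypothetical.

```python
import math

def interface_residues(chain_a, chain_b, cutoff=5.0):
    """chain_a / chain_b map residue ids to lists of (x, y, z) atom coordinates.

    Returns the residue ids in each chain that have any atom within `cutoff`
    Angstroms of any atom of the other chain.
    """
    iface_a, iface_b = set(), set()
    for ra, atoms_a in chain_a.items():
        for rb, atoms_b in chain_b.items():
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                iface_a.add(ra)
                iface_b.add(rb)
    return iface_a, iface_b
```

This brute-force double loop is fine for illustration; for full-size complexes a spatial grid or neighbor list would avoid comparing every atom pair.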
Command-line options
rosetta -design -design_min_inter
-design_min_inter
Enables design with interface minimization.
Examples
Example 26. Simplest possible minimize interface run.
rosetta -design -design_min_inter -s 2ptl.pdb
Tail design
This sub-protocol is for optimizing the sequence and conformation of residues at the N- or C-terminus. Design is restricted to
a tail: a series of residues at either terminus.
Command-line options
rosetta -design [ -tail | -tail_fix_helix ] -begin <residue id> -end <residue id>
-tail
Performs Flexible backbone design on the protein terminal region only (called the "tail"). It requires -begin and
-end in order to work. It will allow fairly large perturbations to the entire tail region.
-tail_fix_helix
As -tail, except that residues with helical backbone torsions are not perturbed; every other residue in the tail is. This method requires fragments.
-begin <residue id>
Specifies which <residue id> should be treated as the start of the tail.
-end <residue id>
Specifies which <residue id> should be treated as the end of the tail.
Examples
Example 27. Simplest possible tail design run
rosetta -design -tail -s 2ptl.pdb -begin 2 -end 10
Example 28. Simplest possible tail design run with fixed helical regions
rosetta -design -tail_fix_helix -s 2ptl.pdb -begin 2 -end 10
Grow termini design
Similar to the extension protocol described in Sood, V. D. and Baker, D. (2006). J Mol Biol 357(3): 917-27, this subprotocol will extend the N or C terminus of a PDB by some number of residues; alternatively, it may be used to remodel the
N- or C-terminus of a protein. It uses Rosetta's centroid mode, so it will strip off all side chains. A library of starting
structures with diverse conformations of an N- or C-terminal extension will be output, and these may be used as the inputs
to fixed backbone design, after the side-chains of the constant regions have been pasted back on. Note that in addition to
the usual input files, a fasta file and a loop file are required.
This protocol may be streamlined and made more user-friendly in future Rosetta releases; for the time being, several
support scripts are provided to help the user. These support scripts are found in
rosetta_scripts/peptide_extensions.
Command-line options
rosetta <series> <protein> <chain> -design -loops -grow -atom_vdw_set <atom radii> [ -vdw_max <vdw filter> ] [ -rg_max <rg filter> ] [ -cenlist_values ] [ -wiggle_jxn ]
-loops
Enable use of loops. Necessary for the grow termini sub-protocol.
-grow
Enable the grow termini sub-protocol.
-atom_vdw_set <atom radii>
Should be set to "highres" to obtain the highest quality structures.
-vdw_max <vdw filter>
To avoid printing out structures with clashes set this to the vdw score of the starting structure, or a little higher.
Structures with scores worse than the filter will be discarded.
-rg_max <rg filter>
To avoid printing out structures in which the terminus being modeled has little interaction with the rest of the
structure, set this to the rg score of the starting structure, or a little higher. Structures with scores worse than the
filter will be discarded.
-cenlist_values
Fill the last two columns in the "complete" lines at the end of the PDB with the number of centroid neighbors
each residue has within 6.0 Angstroms (cen6) or 10.0 Angstroms (cen10). Useful for identifying output structures
with interactions to a particular residue, if you would like your extension to be targeted to a certain site.
-wiggle_jxn
Sample an even larger conformational space by adding backbone flexibility at the first residue of the extension.
Warning: This option will significantly increase computational time, and has not been shown to be definitively
useful in producing higher quality models.
Inputs
Loops file
Contains the backbone conformational space that will be explored by the grow termini protocol. The more lines
(conformations) there are in this file, the larger the conformational space that will be explored. A loop file in
proper format may be produced either from the vall or from a list of idealized PDB structures, using scripts found
in rosetta_scripts/peptide_extensions/.
FASTA file
The file must contain the entire amino acid sequence of the PDB, including placeholder alanines.
Outputs
PDB
One PDB file will be output for every line in the loop file, as long as the output structure passes the vdw_max
and rg_max filters. The PDBs are named by the PDB code of the structure from which the torsion angles of the
extension came, and by the line number in the loop file.
Ignored files
Two additional files, nnXXXX_0001.pdb and nnXXXX.sc (where nn is the series code and XXXX is the PDB name), will also be output. These should be ignored.
Examples
Example 29. Simple run of grow termini
rosetta aa 1kka _ -loops -grow -atom_vdw_set highres -read_all_chains -s 1kka.pdb
Interpreting Results
The energies, structure and output of a Rosetta design simulation are placed in the output PDB file. The PDB file has the
following sections:
1) Coordinates of the design structure.
2) A list of scores. Many of these are used in Ab Initio Structure Prediction and are not particularly relevant to protein design.
The main score is Wbk_tot * bk_tot + Wother * other, where Wbk_tot and Wother are weights. The other terms that contribute to the score evaluate the backbone structure, e.g. the Ramachandran score, so bk_tot is just one part of the score. For the sake of consistency, the score can be used instead of bk_tot when judging the quality of output structures.
Note: there are many different scoring functions. The standard score is score12; see score.cc.
The scores used during design with the default protocols are:
bk_tot
The total score using the design energy function. Lower is better.
fa_atr
The attractive portion of the Lennard-Jones potential. Rewards close contacts.
fa_rep
Lennard-Jones repulsive term. Penalizes overlaps.
fa_sol
Lazaridis-Karplus solvation model. Penalizes buried polars.
fa_dun
Internal energy of side-chain rotamers as derived from Dunbrack's statistics.
fa_intrares
Intra-residue clashes.
fa_pair
Statistics-based pair term. Favors salt bridges.
fa_prob
Probabilistic term: P(aa | phi, psi) and Ramachandran preferences.
hb_sc
Side-chain-side-chain and side-chain-backbone hydrogen bond energy.
hb_srbb
Backbone-backbone hydrogen bonds close in primary sequence.
hb_lrbb
Backbone-backbone hydrogen bonds distant in primary sequence.
3) A table of energies for each residue in the protein.
res
The residue index.
aa
The three-letter amino acid code.
nb
The count of neighbors.
Eatr
Lennard-Jones attractive term.
Erep
Lennard-Jones repulsive term.
Esol
Lazaridis-Karplus solvation.
Eaa
Probability of an amino acid given the particular phi and psi angles: P(aa | phi, psi).
Edun
Rotamer preferences from the Dunbrack library.
Eintra
Intra-residue clashes.
Ehbnd
Hydrogen bonding.
Epair
Statistics-based pair term.
Elj
Lennard-Jones total.
Eres
Total energy per residue.
Table 1. Example of Residue Energies
res aa Eatr Erep Esol Eh2o Eh2o_sol Eaa Edun Eintra Ehbnd Epair Eref Egb Eh2o Eh2o_bb Ecst Eres
1 MET -4.0 0.3 1.4 0.0 0.0 2.5 0.3 -0.8 0.0 0.3 0.0 0.0 0.0 0.0 -0.7
2 GLN -2.5 0.1 1.5 0.0 0.0 2.9 0.0 -0.7 -0.1 1.0 0.0 0.0 0.0 0.0 0.3
3 ILE -4.1 0.1 1.2 0.0 -0.2 0.1 0.4 -1.6 0.0 -0.2 0.0 0.0 0.0 0.0 -3.9
4 PHE -4.4 0.5 1.8 0.0 -0.3 0.2 0.0 -1.6 0.0 -0.6 0.0 0.0 0.0 0.0 -3.2
5 THR -3.5 0.0 1.8 0.0 0.0 0.0 0.0 -1.4 0.0 0.3 0.0 0.0 0.0 0.0 -3.3
6 LYS -3.0 0.0 1.5 0.0 0.1 0.9 0.1 -1.7 0.0 0.6 0.0 0.0 0.0 0.0 -2.8
7 THR -3.1 0.1 1.9 0.0 -0.7 0.3 0.0 -1.2 0.0 0.3 0.0 0.0 0.0 0.0 -3.0
8 LEU -1.3 0.0 0.5 0.0 0.0 1.6 0.5 0.0 0.0 0.1 0.0 0.0 0.0 0.0 1.2
9 THR -1.4 0.1 1.1 0.0 -0.1 0.8 0.0 -0.2 0.0 0.3 0.0 0.0 0.0 0.0 0.1
10 GLY -0.8 0.1 0.6 0.0 -1.5 -0.0 0.0 -0.3 0.0 0.2 0.0 0.0 0.0 0.0 -2.2
11 LYS -2.8 0.1 2.4 0.0 0.0 4.4 -0.7 -0.3 0.6 0.0 0.0 0.0 0.0 2.6 0.2
4) A table of measured energies minus expected energies. Expected energies are derived by calculating the average energies of the different amino acids with a certain number of neighbors in a large set of proteins from the PDB. The table is useful for determining how well packed a residue is. The column Elj compares the actual Lennard-Jones energy of each residue to the expected value. Well-packed residues should have Elj scores near zero or negative.
res
The residue index.
aa
The three-letter amino acid code.
nb
The count of neighbors.
Eatr
Lennard-Jones attractive term.
Erep
Lennard-Jones repulsive term.
Esol
Lazaridis-Karplus solvation.
Eaa
Probability of an amino acid given the particular phi and psi angles: P(aa | phi, psi).
Edun
Rotamer preferences from the Dunbrack library.
Eintra
Intra-residue clashes.
Ehbnd
Hydrogen bonding.
Epair
Statistics-based pair term.
Elj
Lennard-Jones total.
Eres
Total energy per residue.
SASApack
SASApack is related to the void volume in a protein. Surface areas are computed with a 1.4 Angstrom probe and a 0.5 Angstrom probe, and the difference (ASA_0.5 - ASA_1.4) is compared to the expected difference for a particular residue type in a particular environment. A negative value is favorable and indicates that the residue is more tightly packed than is seen in average PDB files.
Table 2. Example of Average Energies + RMSD SASApack score
res aa nb Eatr Erep Esol Eaa Edun Eintra Ehbnd Epair Elj Eres SASApack
1 MET 15 -0.8 -0.1 0.0 0.0 -0.2 0.2 -0.1 0.0 -1.0 -1.7 2.38
2 GLN 11 0.2 -0.2 -0.3 0.1 0.3 0.0 -0.1 0.0 0.0 -0.4 8.93
3 ILE 22 0.3 -0.2 -0.3 0.0 -1.2 0.4 -0.6 0.0 0.1 -1.8 18.02
4 PHE 14 -0.3 0.1 -1.0 0.0 -0.2 -2.5 1.89
5 THR 22 0.2 -0.3 -0.5 0.1 -0.5 0.0 -0.3 0.1 0.0 -1.0 11.62
6 LYS 15 0.3 -0.4 -0.5 0.1 -2.2 0.1 -0.9 0.2 -0.1 -3.6 5.48
7 THR 18 0.1 -0.2 -0.2 -0.6 -0.3 0.0 -0.2 0.1 -0.1 -1.1 10.38
8 LEU 10 0.9 -0.2 -0.6 0.1 0.1 0.5 0.0 0.8 0.9 1.33
0.4 -0.2 -1.0 -0.1
0.5
5) A table of measured energies minus expected energies for residues in different environments: buried, middle and surface. When creating novel structures we have found it difficult to get Elj numbers that are zero or negative for the buried residues. Note: most values in this table are only meaningful with the default energy function.
Table 3. Example of Measured Energies - Expected Energies
Eatr Erep Elj
buried -0.1 -0.2 -0.3
middle -0.3 -0.1 -0.3
surface 0.1 -0.1 0.0
6) A table of starting chi angles minus finishing chi angles, and of absolute chi angles.
7) A table of phi, psi and omega angles for each residue.
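The SASApack quantity in item 4) compares two surface areas. The sketch below computes the raw per-atom difference ASA_0.5 - ASA_1.4 with the Shrake-Rupley point-sampling method, which is swapped in here as a standard SASA algorithm; Rosetta's own SASA routine, its atom radii, and the expected-difference tables that the reported score is compared against are not described in this section. For an isolated atom the difference is negative, because the larger probe enlarges the accessible sphere; atoms lining small cavities that only the 0.5 Angstrom probe can enter push it upward.

```python
import math

def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        radius = math.sqrt(1.0 - z * z)
        theta = golden_angle * k
        points.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return points

def atom_sasa(centers, radii, probe, n_points=500):
    """Shrake-Rupley solvent-accessible area per atom for one probe radius."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(centers, radii)):
        expanded = ri + probe
        free = 0
        for ux, uy, uz in unit:
            point = (ci[0] + expanded * ux, ci[1] + expanded * uy,
                     ci[2] + expanded * uz)
            # A test point is accessible if no other expanded sphere contains it.
            if all(math.dist(point, cj) >= rj + probe
                   for j, (cj, rj) in enumerate(zip(centers, radii)) if j != i):
                free += 1
        areas.append(4.0 * math.pi * expanded ** 2 * free / n_points)
    return areas

def sasapack_raw(centers, radii):
    """Per-atom ASA with a 0.5 A probe minus ASA with a 1.4 A probe."""
    small = atom_sasa(centers, radii, 0.5)
    large = atom_sasa(centers, radii, 1.4)
    return [s - l for s, l in zip(small, large)]
```

Summing these values over a residue's atoms and subtracting an expected value for that residue type and environment would give a score in the spirit of the SASApack column.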
Design Tendencies
In some cases RosettaDesign does appear to make odd choices, and it helps to know beforehand what some of these tendencies are. In these situations it is probably best to use a resfile to steer Rosetta away from these pitfalls.
1. The program likes to put amino acids with similar chemical properties near each other. This is primarily because polar residues can hydrogen bond with each other, and hydrophobics can pack without burying hydrogen-bonding groups. The result is that in some cases you may observe a large cluster of hydrophobic residues on the surface of a protein, or a cluster of polar residues in the core. In some cases this can be avoided by forcing key residues to be polar or hydrophobic.
2. Sometimes polar groups are buried without a hydrogen-bonding partner. The energy function has been parameterized to try to avoid this, but there is no filter that prevents it.
New Features
1. design minimize inter
-dock_des_min_inter
Dock, Design Minimize Interface is a protocol that first docks two proteins using Rosetta's centroid-mode docking algorithm and, once it has found a suitable docking arrangement, designs the interface (using the des_min_inter protocol) by iterating between rounds of fixed-backbone sequence and structure optimization and rounds of gradient-based minimization of the degrees of freedom at the interface. Several flags go along with dock_des_min_inter; their names begin with "ddmi" (e.g. ddmi_dG_dSASA_ratio_filter, ddmi_dUns_filter).
2. point mutation
-point_mutation
The point mutation submode of design mode alters the sequence of a single residue and repacks the residue's neighbors. The two flags used in this submode are "-point_mutation <int = resid to change>" and "-new_aa <char = 1-letter amino acid code for the new amino acid>".