Rosetta Scripts

Ab Initio Scripts
These scripts and executables are used with the RosettaAbinitio package. Run any Perl script without arguments to print
its usage. Some scripts require editing configuration values before use. The lib directory contains Perl modules that are
used by some of the scripts.
An example protocol
1. Run Rosetta to generate a silent mode file. Make sure the configuration values are set in rosettaAB.pl.
./bin/rosettaAB.pl -fasta test_.fasta -nstruct 1000
2. Run Rosetta with a different executable (built with Intel C++ instead of g++).
./bin/rosettaAB.pl -fasta test_.fasta -nstruct 1000 -binary ../../rosetta++/rosetta.intel
3. Run cluster.pl to cluster the decoys in the resulting silent mode file, aatest.out, and extract the top 10 cluster
centers.
./bin/cluster.pl -silentfile aatest.out -get_centers 10
Scripts and Executables
bin/rosettaAB.pl
RosettaAbinitio wrapper. Generates a silent mode file with nstruct decoys.

Depends on rosetta++/rosetta.gcc.
bin/cluster.pl
Clusters silent mode file decoys and extracts cluster centers.

Depends on RosettaAbinitio/src/rosetta_cluster/rosetta_cluster.
src/rosetta_cluster/rosetta_cluster
Clustering executable.
bin/extract.pl
Extracts decoys from a silent mode file.

Depends on rosetta++/rosetta.gcc.
bin/findhomologues.pl
Gets homologs from a psi-blast search against the NCBI NR database.

Make sure the configuration values are set to point to blastpgp executable and the NCBI non-redundant database
(nr).
bin/reconstruct_PDB_by_index
Decoy extraction executable.

Requires the GNU Scientific Library (http://www.gnu.org/software/gsl/). Shared libraries must be installed in
/usr/lib/.
bin/make_fragments_from_server.pl
Makes fragment libraries from the robetta fragment server.

Requires libwww-perl (http://search.cpan.org/search?dist=libwww-perl) or individual modules, HTTP and LWP, for
HTTP requests.


If you want to build fragment libraries yourself, use the RosettaFragments package.
THIS SCRIPT IS NOT FOR COMMERCIAL USE!! If you are a commercial user and would like to make fragment libraries, get a
license to use the Rosetta Fragments (NNMAKE) package.
Barcode Scripts
Scripts used in processing barcodes. For an explanation of Rosetta's use of the term 'barcode', see Barcode Constraints.
amino_acids.py                    Data on amino_acids.
barcode_bb_silent.py
bar_code_chi_hydro.py             Bar codes for chi angles with hydrogens.
barcode_chisq.py                  Chi-square values from barcodes.
barcode_flavors.py                Barcodes for flavors.
barcode_frags.py                  Barcodes for fragments.
barcode_frags_ss.py               Barcodes for fragments of secondary structures.
barcode_graph.py                  Generate graph of barcode data.
barcode.py                        Extract barcodes ???
barcode_tree_flavors.py           Generate score trees from barcodes for each flavor.
barcode_tree.py                   Generate score trees from barcodes.
barcode_util_CA.py                I/O routines for reading decoy files and silent files.
barcode_util.py                   I/O routines for reading decoy files and silent files.
basic_util.py                     Basic utility routines.
cluster2barcode.pl                Convert cluster files to barcode files.
clusters2cst.py                   Generate constraint files from cluster data.
compare_deviation_lists.py        Compare deviations.
condense_cst.py                   Collapse constraints files together.
fig_devel.py                      Figure drawing functions.
fisher_contact.py                 Calculate Fisher discriminant.
frag_ss_dev.py                    ???
make_scop_barcode_cst.pl          ???
plot_flavor_torsions.py           Plot torsions of flavor files.
plot_outfile_torsions.py          Plot torsions of resulting decoy.
pstat.py                          Module for array and list manipulation.
res_barcode_script.py             ???
resflavor2barcode.pl              Generate a barcode file from a residue flavor file.
residue_flavor_code.py            Extract flavors from residues.
residue_flavor_code_natlabel.py   Extract flavors from residues, looking for a native consensus.
residue_phipsi_features.py        Extract phipsi features of particular residues.
run_barcode_scripts.py            Shorthand for executing Rosetta using barcodes.
score_trees_devel.py              Module to generate trees from distances.
stats.py                          A collection of basic statistical functions.
Clustering Scripts
C Scripts
cluster_info_silent
Cluster data in silent files.
Python Scripts
amino_acids.py             Data on amino_acids.
blast.py                   Process BLAST data.
compose_score_silent.py    Compose the scores from multiple PDBs into a silent file.
fig_devel.py               Figure drawing functions.
make_color_trees.py        Generate trees with data colored by ???
make_new_plot.py           Generate data plots
pdb.py                     Generate coord files for scop_family
score_trees_devel.py       Module to plot score trees
How to do clustering:
step 1: If you have PDB files, you have to make them into a
silent-mode output file. Make a list of the decoys:
/bin/ls aa*pdb > tmp.list
then compose into .out file:
python/compose_score_silent.py tmp.out tmp.list
step 2: (optional) Pre-process your native-pdb-file for reading by the
clustering program.
python/make_coords_file.py nat.pdb A tmp.out > nat.coords
step 3: Cluster
C/cluster_info_silent.out tmp.out nat.coords cluster/tmp 5,15,45,75 3,4
If you don't have a native, replace "nat.coords" with "-". This will create
a large number of files in the directory cluster/ whose names start with
the characters "tmp" (that's what the 3rd argument specifies).
step 4: Make a contacts plot.
python/make_new_plot.py cluster/tmp.contacts
step 5: Make a dendrogram of the clusters.
python/make_color_trees.py cluster/tmp 1 25
You can get some info about the scripts by running each one without
any arguments.
The clustering is a very simple algorithm: given an RMSD threshold,
find the decoy with the most neighbors within this threshold. This is
your first cluster. Now delete all the members of this cluster, and
repeat: find the decoy with the most neighbors within this threshold.
This is your second cluster. Continue until no decoys remain.
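The greedy procedure described above can be sketched in a few lines of Python. This is an illustration only, not the
code of cluster_info_silent: the function name greedy_cluster, the use of a precomputed pairwise distance matrix, and
the reading of "within this threshold" as less-than-or-equal are all assumptions.

```python
def greedy_cluster(dist, threshold):
    """Greedy clustering over a symmetric distance matrix (list of lists).

    Repeatedly pick the remaining decoy with the most remaining neighbors
    within `threshold`, report it and its neighbors as one cluster, then
    remove those members and repeat. Returns [(center, members), ...].
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best_center, best_members = None, []
        for i in sorted(remaining):
            # A decoy is always its own neighbor because dist[i][i] == 0.
            members = [j for j in sorted(remaining) if dist[i][j] <= threshold]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters


# Toy data: 1-D positions stand in for decoys, |a - b| stands in for RMSD.
positions = [0.0, 1.0, 2.0, 10.0, 11.0]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
print(greedy_cluster(toy_dist, 1.5))
```

Ties between equally populated candidates are broken here by taking the lowest index; the real program may choose
differently.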
The only tricky part is how you decide what the clustering threshold
should be. You could say you want N decoys in the top cluster. Or you
could say you want the threshold to be 3 Angstroms. The complicated
command line arguments to the clustering program are designed to
allow the program to make a smart decision:
C/cluster_info_silent.out <silent-file> <coords-file> <prefix> a,b,c,d e,f
a is the smallest cluster you want to see.
b,c, and d bound the size of the top cluster, and e and f bound the
clustering threshold.
The program will try to get a top cluster of size c. This will define
some initial clustering threshold t. If t >= e and t <= f, you're done.
If t<e then the program will try a top cluster of size c+1, increasing
the top cluster size until either the clustering threshold falls within [e,f]
or a top cluster size of d is reached. If the initial threshold had been
greater than f, the program would instead have decreased the top cluster
size until the threshold fell below f or a top cluster of size b was reached.
In short: the top cluster size will lie between b and d, and if possible the
clustering threshold will lie between e and f.
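The threshold search described above can be sketched as follows. This is a hedged reading of the text, not the actual
implementation: the function names are hypothetical, the threshold for a top cluster of a given size is assumed to be
the smallest radius at which some decoy has that many members (itself included), and the search is assumed to stop as
soon as the threshold crosses e (when growing) or f (when shrinking). The parameter a only controls which clusters are
reported, so it does not appear here.

```python
def threshold_for_size(dist, size):
    """Smallest threshold at which some decoy has `size` members, itself included."""
    return min(sorted(row)[size - 1] for row in dist)


def choose_threshold(dist, b, c, d, e, f):
    """Return (top_cluster_size, threshold) following the b,c,d / e,f rules."""
    n = len(dist)
    b, c, d = min(b, n), min(c, n), min(d, n)  # sizes cannot exceed the decoy count
    size = c
    t = threshold_for_size(dist, size)
    if t < e:
        # Threshold too tight: grow the top cluster, up to size d.
        while t < e and size < d:
            size += 1
            t = threshold_for_size(dist, size)
    elif t > f:
        # Threshold too loose: shrink the top cluster, down to size b.
        while t > f and size > b:
            size -= 1
            t = threshold_for_size(dist, size)
    return size, t


# Toy data: 1-D positions stand in for decoys, |a - b| stands in for RMSD.
positions = [0.0, 1.0, 2.0, 3.0, 10.0, 20.0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
print(choose_threshold(toy_dist, b=2, c=3, d=5, e=1.5, f=2.5))
```

Because the required threshold can only grow with the requested cluster size, the loop may step past f in a single
increment; how the real program handles that case is not stated in the text.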
The memory and speed are most sensitive to the setting "d" as well as the
total number of decoys, so try not to set "d" too big. If it's too small the
first time you can always run it again.
For 1000 decoys and a smallish protein you might use:
5,10,50,150 3,4
Of course the thresholds 3,4 should be scaled with the length of the protein.
Decoy Stats Scripts
Scripts for processing the results of runs of Rosetta using -decoystats. These scripts are not well documented.
cluster_trees.py    Module for generating trees from clustered distances.
decoystats.py       Process results of decoystats runs.
Docking Scripts
This is a collection of scripts used with the RosettaDock package. Some are essential (such as post-processing tools),
others are handy for working with or creating Rosetta input and output files, and others are obscure routines for dealing
with specific types of, say, calibration runs.
If you add files:
1. enter descriptions here
2. all scripts should have a help message if arguments are not entered properly
3. comment scripts and list an author/email
Enjoy
STARTERS
rrun.sh         main script for invoking Rosetta in dock mode
ppk.bash        Creates a prepacked starting structure
rosettarc       Setup file for using RosettaDock
testrun.bash    Mike Daily's example script for testing a dock run
POST-PROCESSING ROUTINES for large runs on computing clusters
pp_pdb2.sh
Second half of post-processing, usually done
on the lab intranet where R is present
pp_pdb.sh
Post-processes a docking run. Calls several
other pp_ scripts. Usually done on the
cluster, then data files are pushed to a
desktop machine for further processing.
pp_compile_scorefiles.sh Merges multiple scorefiles (aa,ab...)
together for an analysis of the superset
pp_extract_set.sh
Extracts topN structures from multiple subdirectories
pp_push_set.sh
Pushes top decoy sets off the cluster
pp_cluster_set.sh
Create clusters of decoys (uses R)
pp_calc_contacts.bash
Calculate the number of correct residue-residue
contacts (obsolete now that Fnat is
calculated by Rosetta)
pp_dwindle_byfile.sh
Observe how many correct structures pass
various filters
pp_dwindle.sh
same as above
pp_set.sh
calls pp_pdb.sh for all targets in the
current directory
pp_summarize_clusters.sh Details the results from clustering
pp_summarize_lowscores.sh Details the results from final score
pp_zip_pdb.sh
Zips up completed sets of runs.
HANDY SCRIPTS for looking at scorefile output
filter_bumps.pl       Remove structures with bad bumps
filter_column.pl      Filter a scorefile on a particular column (by column number)
filter_on.pl          Filter by a particular score (by name)
sort_on.bash          Sort by a particular score (by name)
findColumn.pl         Find the column number for a particular score
findIndex.pl          Find the column number for a particular score
find_max.pl           Find the max value in a particular column
find_min.pl           Find the min value in a particular column
find_percent.pl       Filter at a certain percentage
findRank.pl           Determine rank of first decoy fulfilling a criterion
checkCol.pl           Be sure all lines of a scorefile have the same specified number of columns
column_filter.pl      Remove scorefile lines with the wrong number of columns
column_unfilter.pl    Show offending scorefile lines with the wrong number of columns
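For readers who want to see what these column operations amount to, here is a minimal Python sketch. It is not a port
of the Perl scripts: it assumes a whitespace-delimited scorefile whose first line names the columns, and the function
names and the example column names (score, rms) are hypothetical.

```python
def find_column(header_line, score_name):
    """Return the 1-based column number of `score_name` in the header line."""
    return header_line.split().index(score_name) + 1


def split_by_column_count(lines, expected):
    """Separate lines with the expected number of columns from offending lines."""
    good = [ln for ln in lines if len(ln.split()) == expected]
    bad = [ln for ln in lines if len(ln.split()) != expected]
    return good, bad


def filter_on(data_lines, column, max_value):
    """Keep data lines whose value in the 1-based `column` is <= max_value."""
    return [ln for ln in data_lines if float(ln.split()[column - 1]) <= max_value]


# Hypothetical scorefile contents; the second decoy line is truncated.
scorefile = [
    "filename score rms",
    "aa0001.pdb -10.5 2.0",
    "aa0002.pdb -3.0",
    "aa0003.pdb -8.0 5.5",
]
good, bad = split_by_column_count(scorefile, 3)
score_col = find_column(good[0], "score")
print(filter_on(good[1:], score_col, -9.0))
print(bad)
```

Checking the column count first matters because a truncated line would otherwise raise an error, or silently shift
values, when a later filter indexes into it.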
MISCELLANEOUS and supporting scripts
analyzeclusters.sh
Contacts and relative rms of top clusters
calc_score_for_cluster.pl
Finds the best scoring decoy in a cluster
clean_scratch.sh
Removes temp files from computer cluster scratch drives
docking_make_scorefilter.sh Find cutoff values for filtering decoys by score
do_voids.sh
Calculate voids at interfaces using voidoo
energy_diff.py
Determine energy changes from ppk
structure (monomer) to bound (decoy) structure
getAllscores_complete.pl
Combine scorefiles from calibration
(perturbation) runs for input into R for
regressions
getAllscores_withTargets.pl, getAllscores.pl: ditto above
histjoin.pl
Join two histograms
makeprefix.sh
Convert a number to a two-letter code
(useful for large runs)
pdb_dir_maker.pl
Create output directories for dock runs
adddirs.pl
From a list of decoys, add prefix subdirectories
rms2avglink.csh
For clustering, calculate rms between all
pairs of decoys
rms2.pl
Clustering guts
also:
pdb_scripts/    contains scripts for manipulating pdbs, including preparation for docking runs
R_scripts/      contains scripts for plotting scores from scorefiles and fitting weights from scorefiles
PDB Manipulation Scripts
The argument usage for these scripts can be found by running the scripts with no arguments.
addChain.pl
add a new chain to a pdb file that previously did not have one
extractChains.pl
extract given chains from a pdb file, followed by a TER. You can extract multiple chains at once, but there
will be only one TER at the end.
changeChain.pl
change the chainID of a given chain.
listChains.pl
list chains in a pdb file, with the position of existing TERs marked by a '-' after the appropriate
chain.
openPDB.pl
Separate two docking partners (delimited by TERs) along the line of centers by a given
distance.
pdb_detail.pl
Breaks pdb file down completely by chains and residue numbers.
pdb_fasta.pl
Generate a FASTA format sequence file from a pdb.
pdb_remove_missing_bb.bl
remove missing backbone atoms from a pdb. Missing backbone atoms in a pdb will cause Rosetta to crash.
renumberPDB.pl
change the numbering in a pdb, starting over at TERs. Warning: discontinuous numbering (e.g.
chain breaks) will become continuous.
translate_xyz.pl
translate given chain of a pdb file by given deltaX, deltaY, and deltaZ.
truncate.pl
truncate a given chain of a pdb before or after a given residue number.
zapHs.pl
Removes hydrogens from a pdb (usually to remove Hs that Rosetta has inserted).
Disulfides
makefixdisulf.py
python script to make .fixdisulf files for using the -fix_disulf option in RosettaDock

calccontacts.py, coordlib.py, and loadPDB.py are python scripts upon which makefixdisulf.py depends. Do not
remove them.
More
zapChain.pl
remove a chain from a pdb file
pdb_sequence.pl
extract the sequence from the pdb
pdb_add_insert_codes.pl
add insert letters to repeated pdb residue numbers
orientPDB.pl
rotate the PDB based on a particular residue
renumberPDBandchains.pl
starts each chain numbering at res 1
renumberPDBatoms.pl
renumber the atoms from 1
homogenizeChain.pl
identifies all ATOMs with given chain and removes TERs
pdb_subtract_scores.pl
Compare scores residue by residue
pdb_only_ATOM_TER.pl
remove everything from file except ATOM and TER lines
truncateFABs.pl <-LH -H> <-truncL x> <-truncH x>
Truncate light chain L at 112 and heavy chain H at 119
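Several of the scripts above (pdb_only_ATOM_TER.pl, pdb_sequence.pl, pdb_fasta.pl, listChains.pl) work on the fixed
columns of PDB ATOM records. The following Python sketch shows the idea; it is not a translation of the Perl scripts.
The function names are hypothetical, the column positions follow the standard PDB format (residue name in columns
18-20, chain ID in column 22, residue number plus insertion code in columns 23-27), and residues outside the 20
standard amino acids are assumed to be written as X.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def only_atom_ter(lines):
    """Keep only ATOM and TER records."""
    return [ln for ln in lines if ln.startswith(("ATOM", "TER"))]


def chain_sequences(lines):
    """Return {chain_id: one-letter sequence} built from ATOM records."""
    sequences = {}
    seen = set()
    for ln in lines:
        if not ln.startswith("ATOM"):
            continue
        chain = ln[21]
        residue_key = (chain, ln[22:27])  # residue number + insertion code
        if residue_key in seen:
            continue  # further atoms of a residue already counted
        seen.add(residue_key)
        sequences[chain] = sequences.get(chain, "") + THREE_TO_ONE.get(ln[17:20], "X")
    return sequences


example = [
    "ATOM      1  N   ALA A   1      11.104  13.207   2.100",
    "ATOM      2  CA  ALA A   1      12.560  13.207   2.100",
    "ATOM      3  N   GLY A   2      13.000  14.000   2.500",
    "TER",
    "ATOM      4  N   TRP B   1      20.000  14.000   2.500",
    "HETATM    5  O   HOH B   2      25.000  14.000   2.500",
]
print(chain_sequences(example))
print(len(only_atom_ter(example)))
```

Keying residues on the number together with the insertion code keeps residues such as 52 and 52A distinct, which is
the situation pdb_add_insert_codes.pl is meant to create.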
Ligand Scripts
This directory contains scripts for using the results of ligandcode in this version of Rosetta. It contains the following files:
molecule.exe is a C++ program executable written by Jens Meiler. The scripts described below rely on this
program. The executable in this directory runs on an x86 Linux box ONLY.
pdb2mdl.inp is a script that translates a pdb format file into an mdl format file, which is written to stdout (the screen,
unless redirected).
Usage: pdb2mdl.inp pdbfile
mdl2rosetta.inp translates an mdl format file into a pdb format file with rosetta atomtype names.
Usage: mdl2rosetta.inp mdlfile
addhydrogens.inp will add hydrogens to a PDB file to fill missing valences.
Usage: addhydrogens.inp
Ligand Scripts Unix Session gives an example of how to use the commands in this directory.
example_files contains example files for use with the scripts.
Recommendations on how to use these scripts to generate a small molecule protein input pdb for the ligand mode
of rosetta follow.
First, create a file with only the HETATM or ATOM statements of the small molecule or ligand that you want. For the
example in the example_files folder we can grep the LOV residue HETATM statements using the following line.
grep LOV 2ER7.pdb | grep HETATM > 2ER7_hetatm_start.pdb
With the pdb file we can now use the pdb2mdl.inp script to call molecule.exe to produce an mdlfile.
pdb2mdl.inp 2ER7_hetatm_start.pdb > 2ER7_hetatm.mdl
The mdlfile allows us to check that molecule.exe recognizes the bonding network of the molecule correctly. The fourth line of
the file has two numbers of importance in the first columns. The first number is the number of atoms and the second is
the number of bonds. If you manually edit this file and remove or add a line, these need to be updated. The next lines are
the atoms in the molecule, with cartesian coordinates followed by the atom name. The Bond block follows the Atom block.
The first two numbers are the line numbers in the atom block that the bond connects. The third number is the bond type:
1=single bond, 2=double bond, 3=triple bond, 4=aromatic/conjugated bond. Looking at the example mdlfile we see that
there is a bond type 4 present between two carbons that should in fact be a single bond. Change the 4 to a 1. Now run the
addhydrogens.inp script to add hydrogens to the molecule.
addhydrogens.inp 2ER7_hetatm.mdl
The resulting 2ER7_hetatm.mdl file will look like 2ER7_hetatm_w_hydrogens.mdl. This mdlfile has all the atom and bond
descriptions needed to allow molecule.exe to determine the appropriate rosetta atom types. Now run the mdl2rosetta.inp
script to generate the HETATM statements that can be used in the input file to rosetta.
mdl2rosetta.inp 2ER7_hetatm_w_hydrogens.mdl | grep HETATM > 2er7_hetam_rosetta.pdb
This final file should be added to the end of the input pdb file, separated by a TER statement from the PDB ATOM records. If
the molecule is charged, add a CHARGE record after the HETATM statements.
All example files are found in the example_files/ directory.
NOTE If the hydrogens in the final pdb file appear to be misplaced, the bondlengths or bondtypes for heteroatoms in the
original files are off. There is no fix other than converting the ligand from pdb to mdl (pdb2mdl) and correcting the mdl file
(it contains bond types). After these corrections, hydrogens will be added properly.
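When hand-correcting an mdlfile it helps to list the bonds and flag the type-4 ones for review. The Python sketch below
does that under stated assumptions: it reads the layout described above (counts on the fourth line, then the atom
block, then the bond block), it assumes the standard fixed-width MDL layout in which counts and bond fields are three
characters wide, and it assumes the atom name is the fourth whitespace-separated field of each atom line. The function
names are hypothetical, and the output of molecule.exe has not been checked against this reader.

```python
def read_mdl(lines):
    """Return (atom_names, bonds) from the lines of an MDL file.

    bonds is a list of (atom_index_1, atom_index_2, bond_type) with
    1-based indices into the atom block.
    """
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    if len(atom_lines) != n_atoms or len(bond_lines) != n_bonds:
        raise ValueError("counts line does not match the atom/bond blocks")
    atoms = [ln.split()[3] for ln in atom_lines]  # x, y, z, then the atom name
    bonds = [(int(ln[0:3]), int(ln[3:6]), int(ln[6:9])) for ln in bond_lines]
    return atoms, bonds


def bonds_to_review(atoms, bonds, bond_type=4):
    """List bonds of the given type as (index1, name1, index2, name2)."""
    return [(a, atoms[a - 1], b, atoms[b - 1]) for a, b, t in bonds if t == bond_type]


# A made-up three-atom file, used only to exercise the reader.
example = [
    "example", "  generated by hand", "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0",
    "    2.2000    1.2000    0.0000 O   0  0  0  0  0  0",
    "  1  2  4  0  0  0  0",
    "  2  3  2  0  0  0  0",
    "M  END",
]
atoms, bonds = read_mdl(example)
print(bonds_to_review(atoms, bonds))
```

The length check catches the case mentioned above where a line was added or removed by hand without updating the two
counts on the fourth line.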
Misc Scripts
countIntCont.pl Count the number of heavy-atom contacts at the interface between chains.
rosettaRadii.pl Put Rosetta radii in the B-factor field of a PDB file. To display these, use "spacefill temperature" in Rasmol.
Peptide Extension Scripts
These scripts help generate input files for the peptide extension protocol. Run without args for usage.
addDummyAlanines.pl will add placeholder alanines with zero coords to a pdbfile. They mark the spot where the
extension will go. The output pdbfile will be renumbered sequentially (atoms and residues). Note that if your input pdbfile is
not sequentially numbered for residues, some odd things may happen.
makeLoopLibraryFromVall.pl and makeLoopLibraryFromPDBlist.pl As the names suggest, these scripts will generate
loop library format files from either a vall style file, or from a list of idealized pdbs. You may use this loop library for your
favorite loop modeling protocol and it is also the correct format for peptide extensions.
Resfile Scripts
These scripts make resfiles in the proper format for either Rosetta design mode or for designing with Tanja's interface code.
Run script with no arguments to see what the proper usage is.
makeResfiles.pl
Pass in comma separated lists of residues to be designed and also (optionally) to be repacked, or to be designed
only as hydrophobic, charged, aromatic or polar.
Pass in the pdbfile
Specify rosetta or interface type resfile to be made
Optionally give a name for the output resfile; otherwise it is written to STDOUT
Outputs a resfile in the appropriate format
makeAllPointSubs.pl
Makes a resfile for all 19 amino acids (no CYS) at the specified design residue
Pass in the residue to be designed
Optional: pass in a file with comma separated residues for repacking
Optional: pass in a file with comma separated one-letter amino acids
(these are the only substitutions that will then be made)
Pass in the pdb file
Specify rosetta or interface type resfile to be made
Seqparam Scripts
These scripts parameterize rosetta using sequence profile data, with the result being a soft core potential.
Usage
1. Modify rosetta and set ddG weight to OPTE weight set.
2. Run a psi-blast for a set of proteins to get a series of multiple sequence alignments.
3. Run the script to put the natural amino acid probabilities and all the energy terms into a single file.
4. Run make to get rosetta_profile_param.
Fragments
Making Fragments
WEBSERVER FOR FRAGMENTS
To make fragments locally with make_fragments.pl:
Setup
DATABASES:
nr — downloadable from ftp://ftp.ncbi.nih.gov/blast/db/
nnmake_database — included in release.
chemshift_database — include in release.
PROGRAMS:
PSI_BLAST — ftp://ftp.ncbi.nih.gov/blast/executables/release/
PSIPRED — http://bioinf.cs.ucl.ac.uk/psipred/
JUFO — http://www.meilerlab.org/
PROFphd — http://www.predictprotein.org/newwebsite/download/index.php
SAM — http://www.soe.ucsc.edu/research/compbio/sam.html
nnmake — include in release
chemshift — include in release
Configure paths at the top of nnmake/make_fragments.pl to point to these databases and programs. PSI-BLAST must be
installed locally.
After PSI-BLAST and PSIPRED are installed, refer to the PSIPRED README or see the quick directions below on how to create
a filtered "NR" sequence data bank, called "filtnr", which is also used by make_fragments.pl.
Quick directions for creating filtnr:
tcsh% pfilt nr.fasta > filtnr
tcsh% formatdb -t filtnr -i filtnr
tcsh% cp filtnr.p?? $BLASTDB
1. Obtain a fasta file for the desired sequence. This file
must have 60 characters/line with no white space. The first
line can be a comment starting with the '>' character.
2. Obtain secondary structure predictions from web servers, or
set up shareware locally so that make_fragments.pl can run
secondary structure predictions locally.
The fragment maker can use predictions from psipred (.jones or
.psipred extension), PhD (.phd), SAM-T99 rdb format (.rdb)
and jufo (.jufo). Up to three predictions can be used. At least
one must be used.
The getSSpred.pl script can be used to obtain predictions
off the web. Edit the config portion of this script to include
your email address and the correct path to the httpget
script. To use this script, provide the fasta filename and the desired
method (invoke the command without arguments to see the
usage explanation). Retrieve the secondary structure predictions
from your email mailbox.
3. (Optional) Prepare files with NMR data if available. These include the .cst and
.dpl files, which are the same files that rosetta uses, and the .chsft_in file
that contains chemical shift information. The information from these files can
help Rosetta better pick fragments. See the file 'data_formats.README'
for the formatting information.
4. Run make_fragments.pl. Invoke without arguments for usage options.
Likely the only argument you need to provide is the fasta file.
$> make_fragments.pl -verbose 2ptl_.fasta
If you want to exclude homologous sequences from the fragment search,
add the -nohoms argument.
$> make_fragments.pl -verbose -nohoms 2ptl_.fasta
Note that if you want to exclude homologs from the chemical shift/TALOS
search, you need to edit the talos database. See the README in the
chemshift_source directory for instructions.
If you do not have a particular type of secondary structure
prediction (say the .jufo file) and you do NOT want make_fragments to
try to run the method locally, use the -nojufo option.
$> make_fragments.pl -verbose -nohoms -nojufo 2ptl_.fasta
Two fragment files will be generated with names like aa2ptl_03_05.200_v1_3
and aa2ptl_09_05.200_v1_3.
The prefix "aa" can be changed by the -xx option. "2ptl_" is the five-letter
base name, which can be specified by the -id option or is derived from the
name of the fasta file. 03 and 09 indicate the lengths of the fragments.
5. Generate a loop library in addition to fragment files. Run make_fragments.pl
with the -template option, such as (five-letter code is 2ptl_ for example):
$> make_fragments.pl -template 2ptl_ 2ptl_.fasta
It requires 2ptl_.pdb and 2ptl_.zones to be present in your run dir. This pdb
is a template pdb file which has been generated by createTemplate.pl, described
in README.loops. From the zone file, loops can be defined, and a library of loop
conformations for each defined loop is compiled into a file called
"2ptl_.loops_all" (which usually contains 2000 loop conformations) based on
fragment picking. Then the script "trimLoopLibrary.pl" is automatically called
to reduce the size of the loop library and output the file as "2ptl_.loops". This
file is later used in the Rosetta loop modeling mode to build variable loops onto
the template structure. A loop library differs from a fragment library mainly in
that geometrical information is considered in order to pick "loop" fragments of
the desired length which can roughly close the gap based on the "take-off"
stub positions.
A newer version of the vall database (2006-05-05) has been provided in
nnmake_database together with the original version (2001-02-02). You can make
fragments using either version of the database by modifying make_fragments.pl
to point to the version you want to use. Currently, making a loop library only
works with the 2001-02-02 version, as some newly developed loop modeling methods
do not need a loop library any more (see README.loops for more information).
NOTES:
1. Name all your files with a five character base name followed by the appropriate extension. The base-name should
be the four-letter pdb code and 1 letter chain id.
2. See also pNNMAKE for a listing of the files involved in the fragment process.
3. If a pdb file is in the directory you're making fragments in, nnmake will evaluate the fragment match to the pdb.
Note that if the pdb file disagrees with the fasta file, the program will detect an error and stop.
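The fasta requirement from step 1 above (60 characters per line, no white space, optional '>' comment line first) is
easy to get wrong with sequences pasted from other sources. The Python sketch below normalizes such a file. It is not
part of the release: the function name format_fasta is hypothetical, and it assumes the file holds a single sequence
and that "60 characters/line" means every line except the last is exactly 60 residues long.

```python
def format_fasta(text, width=60):
    """Rewrap a single-sequence FASTA string to `width` residues per line.

    An optional leading '>' comment line is preserved; all white space
    inside the sequence is removed.
    """
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None or chunks:
                raise ValueError("expected a single-sequence FASTA file")
            header = line
            continue
        chunks.append("".join(line.split()))  # drop internal white space
    sequence = "".join(chunks)
    lines = [header] if header is not None else []
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


# A made-up sequence with stray spaces and an over-long line.
raw = ">2ptl_ example\nMEEV TIKA\n" + "G" * 70 + "\n"
print(format_fasta(raw), end="")
```

Writing the result to a file named with the five character base name (for example 2ptl_.fasta) keeps it consistent
with the naming note above.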
Fragment Making Tutorials
James Thompson's Tutorial: How to Pick Fragments
Making Fragments as Part of Loop Modeling
How to Make a vall Without Knowing What You are Doing
Jack Schonbrun May 27, 2004
This is a completely unguaranteed description of the process I went through to make a new vall from a specific set of
proteins.
A "vall" is what we call the list of idealized protein structures from which pNNMAKE picks fragments. It must contain the
secondary structure (H, E or L) of each amino acid, the idealized backbone torsion angles (phi, psi, omega) of each amino
acid, and a sequence profile for each position. Naturally these must be placed in a specific format.
1. Secondary structure is usually generated by dssp. I used /users/jack/bin/dsspcmbi.lnx
2. Idealized torsion angles are obtained by running rosetta in idealize mode:
/users/jack/rosetta++/rosetta.gcc -idealize -l <protein.list> >& idealize.log
This will give you a new set of pdbs. They will have names like "1pdbA_0001.pdb", if 1pdbA.pdb was your starting
structure. At the end of each of these new structures will be the idealized torsion angles.
Also, if you have placed your dssp files for each protein where rosetta can find them, it will include the dssp
secondary structure assignments in the table of torsion angles. This is nice, because rosetta has done the parsing
for you. You can check that rosetta is finding and reading your dssp files by examining your stderr log file from
the idealization run.
3. Profiles are made using multiple sequence alignments from psiblast. This part is a bit tricky, because you need to
have databases of sequence files. There is a set on shampoo.baker in /scratch/shared/genomes. I don't know
exactly which of the files in there you need. It is recommended that you run with these files on a scratch drive.
But shampoo is a single processor machine, and I didn't have room on the /scratch partition of peake. So I
put a copy in /dump/jack/genomes. Because some scripts expect things to be in /scratch/shared/genomes, I
made a symbolic link on peake:
ln -s /dump/jack/genomes /scratch/shared/
But I was only putting 11 proteins in my vall. If you are doing more, it is recommended you actually find a
computer with space on its scratch drive. If you're lucky, there may already be a copy of genomes/ on it. You
might want to know that your genomes directory is as current as possible, but you'd have to talk to Dylan about
how to get an updated one.
You will need fasta files for proteins to submit to psiblast. You can make these with Dylan's script (available from cvs
co pdbUtil):
/users/dylan/src/pdbUtil/getFastaFromCoords.pl -p <pdbfile> -chain <chain> > pdbfile.fasta
Now you should be able to run another script of Dylan's to make your profiles. This takes a little while (I found
~20 minutes per 300 residue protein). There are a few things you should set up first. I set my BAKER_HOME
environment variable to /users/dylan. Until everything is standardized, this is useful. In tcsh:
setenv BAKER_HOME /users/dylan
You run the script as:
/users/dylan/src/msaUtil/quickblast.pl 1pdbA.fasta outdir
As far as I know there is no batch processing, so I did
/bin/ls -1 *fasta | awk '{system ("./quickblast.pl "$1" outdir")}'
Where outdir is a previously created directory for the output. When this is all done, you should have many files in
your outdir, some of which have the suffix .checkPROFILE. These contain the profiles that you want to use in your
vall. They contain the residue preferences from psiblast, plus blosum substitutions for positions with no information.
4. Now you have all the information you need to make your vall. I have a primitive awk script that will put it
together for you. It is available via 'cvs co rosetta_scripts/vall'. To run, I recommend you make a directory
containing all your idealized pdbs, and all your .checkPROFILE files, and nothing else. Go to that directory and
run:
~/rosetta_scripts/vall/assemble.awk * > vall.dat.whatever
Your vall name *must* start with vall.dat or pNNMAKE will get mad. I believe you can have whatever you want
after that.
5. Now you just need to know how to make make_fragments.pl work! There should be a readme for that too.
6. Caveats: I have not talked about discontinuous chains, or making the files needed for homolog detection, because
I don't really know how to do either.
7. Please let me know if you try this protocol, and where it fails.
Rosetta Databases
avgE_from_pdb
bb21sdep06.Jan.sortlib
bbdep02.May.sortlib
bb_hbW
bbind00.Nov.lib
disulf_jumps.dat
DunbrackBBDepRots12.dat The Dunbrack Backbone-Dependent Rotamer Library
dunsd
energy_quantile__atre__aa_ss_sf_nb.data
energy_quantile__dune__aa_ss_sf_nb.data
energy_quantile__hbe__aa_ss_sf_nb.data
energy_quantile__intrae__aa_ss_sf_nb.data
energy_quantile__paire__aa_ss_sf_nb.data
energy_quantile__probe__aa_ss_sf_nb.data
energy_quantile__repe__aa_ss_sf_nb.data
energy_quantile__rese__aa_ss_sf_nb.data
energy_quantile__sole__aa_ss_sf_nb.data
energy_quantile__spk__aa_ss_sf_nb.data
energy_quantile__tlje__aa_ss_sf_nb.data
Equil_AM.mean.dat
Equil_AM.stddev.dat
Equil_bp_AM.mean.dat
Equil_bp_AM.stddev.dat
Fij_AM.dat
Fij_bp_AM.dat
jump_templates.dat
jump_templates_v2.dat
Paa
Paa_n
Paa_pp
paircutoffs
pdbpairstats_fine
phi.theta.36.HS.resmooth
phi.theta.36.SS.resmooth
plane_data_table_1015.dat
Rama_smooth_dyn.dat_ss_6.4
SASA-angles.dat
SASA-masks.dat
sasa_offsets.txt
sasa_prob_cdf.txt
sc_hbW
smart_scorefilter.pl
template.pdb
unsatisfied_buried_polar__pdb__aa_at.data
unsatisfied_buried_polar__pdb__aa_at_ss.data