Practical: Comparative modelling of Gdh using MODELLER

Coordinator: Muhammed Sayed
June 2004
Initial requirements:
Before you start running modeller, you will need one or more template structures (pdb
format) and a sequence alignment of the target against the template sequence. The
alignment should be in PIR format.
An example of a PIR format alignment:
>P1;target
sequence:target:@:@: 76 :@:target: :-1.00:-1.00
MIVFVRFNSSHGFPVEVDSDTSIFQLKEVVAKRQGVPADQLRVIFAGKELRNDW
TVQNCDLDQQSIVHIVQRPWRK-*
>P1;1aar
structureX:1aar:@:@: 76 :@:1aar : :-1.00:-1.00
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLS
DYNIQKESTLHLVLR-LRGG*
A PIR format file is easily confused with other formats. It has the following features:
- Each new protein begins with TWO header lines.
- The first header line begins with >P1;
- Immediately after this comes a code, normally a four-letter code, that allows programs
  to find the corresponding PDB file.
- The second header line begins with a "structure" or "sequence" type record, and it is
  the last header line.
- Next comes your sequence, in one-letter codes, with appropriate gaps indicated by
  dashes.
- The sequence ends with a *
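The features listed above can be checked mechanically. The following is a minimal Python sketch and is not part of MODELLER; the function name parse_pir and the dictionary keys it returns are illustrative assumptions. It splits a PIR alignment into entries and raises an error if a header line or the terminating * is missing.

```python
def parse_pir(text):
    """Split PIR-format text into entries, checking the listed features.

    Returns a list of dicts with the (hypothetical) keys
    'code', 'kind' and 'sequence'.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = []
    i = 0
    while i < len(lines):
        # The first header line begins with >P1; followed by a code.
        if not lines[i].startswith(">P1;"):
            raise ValueError(f"expected a '>P1;' header, got {lines[i]!r}")
        code = lines[i][4:].strip()
        # The second header line starts with 'structure...' or 'sequence'.
        if i + 1 >= len(lines):
            raise ValueError(f"entry {code!r} has no second header line")
        kind = lines[i + 1].split(":")[0].strip()
        if not kind.startswith(("structure", "sequence")):
            raise ValueError(f"entry {code!r}: bad record type {kind!r}")
        # Collect sequence lines up to the next entry.
        i += 2
        chunks = []
        while i < len(lines) and not lines[i].startswith(">"):
            chunks.append(lines[i])
            i += 1
        sequence = "".join(chunks)
        # The sequence ends with '*'.
        if not sequence.endswith("*"):
            raise ValueError(f"entry {code!r}: sequence does not end with '*'")
        entries.append({"code": code, "kind": kind, "sequence": sequence[:-1]})
    return entries


# The example alignment from this handout.
EXAMPLE = """\
>P1;target
sequence:target:@:@: 76 :@:target: :-1.00:-1.00
MIVFVRFNSSHGFPVEVDSDTSIFQLKEVVAKRQGVPADQLRVIFAGKELRNDW
TVQNCDLDQQSIVHIVQRPWRK-*
>P1;1aar
structureX:1aar:@:@: 76 :@:1aar : :-1.00:-1.00
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLS
DYNIQKESTLHLVLR-LRGG*
"""

entries = parse_pir(EXAMPLE)
for entry in entries:
    print(entry["code"], entry["kind"], len(entry["sequence"]))
```

Aligned sequences must have the same length once gaps are included, so comparing the printed lengths is a quick sanity check on a hand-edited alignment.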
Part 1: Search for homologues (templates) and obtain structure-based
alignment of target against template sequences
Introduction and background:
You have a sequence and you want to do homology modelling. The first thing is to find some
3-D structures which have similar sequence (sequence identity of 30% or above will do,
but a lower identity might be OK if there are strong conserved features such as a
conserved cysteine pattern). Homologue structures and a structure-based alignment can
be obtained in a variety of ways, but for the purposes of this practical we’ll be using the
FUGUE server.
NB: You will need to align the target sequence in such a way that no gaps or insertions
fall within conserved protein secondary structure (helices and strands), i.e. you want
your gaps to be placed in the loop regions.
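To judge whether a candidate template meets the rough 30% identity guideline, the identity can be computed from an existing pairwise alignment. This is a minimal Python sketch; the function name percent_identity and the convention of counting only columns where neither sequence has a gap are assumptions, and other definitions of sequence identity are in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy alignment: 4 comparable columns, 3 of them identical.
print(percent_identity("ACDE-", "ACDFG"))
```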
What does FUGUE do ?
FUGUE is a program for recognizing distant homologues by sequence-structure
comparison. It utilizes environment-specific substitution tables and structure-dependent
gap penalties, where scores for amino acid matching and insertions/deletions are
evaluated depending on the local environment of each amino acid residue in a known
structure. Given a query sequence (or a sequence alignment), FUGUE scans a database of
structural profiles, calculates the sequence-structure compatibility scores and produces a
list of potential homologues and alignments.
Exercise :
1. Submit the target sequence (gdh) to the FUGUE server at http://www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html
2. Analyze fugue output and identify potential templates for comparative modelling.
Hits with Z-scores above 6.0 are normally highly significant.
3. Have a look at the “Joy” alignment produced from fugue for each ‘homologue’.
Joy displays 3D structural information in a sequence alignment and helps one
understand the conservation of amino acids in their specific local environments.
4. Copy alignment (target against best template structure) from the PIR format
option in Fugue and save as gdh.ali. For now, we’ll only use one template with
the highest Z-score.
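Once the hits and their Z-scores have been copied out of the FUGUE results by hand, picking the template can be sketched as follows. This is an illustration only: the function name significant_hits is an assumption, and the template codes and Z-scores in the example list are made up, not real FUGUE output.

```python
def significant_hits(hits, threshold=6.0):
    """Return (code, zscore) pairs with zscore above threshold, best first."""
    keep = [(code, z) for code, z in hits if z > threshold]
    return sorted(keep, key=lambda hit: hit[1], reverse=True)


# Made-up template codes and Z-scores, for illustration only.
example_hits = [("tmpA", 4.2), ("tmpB", 21.7), ("tmpC", 8.9)]

ranked = significant_hits(example_hits)
best_code = ranked[0][0] if ranked else None
print(ranked, best_code)
```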
Part 2: Prepare input files for modeller
Create a new directory where modeller will be run and download or create the following
files:
1) Protein Data Bank atom files for templates - code.atm
Coordinates for the template structures. The choice of templates is based on the FUGUE
results. Each atom file is named code.atm, where code is a short protein code, preferably
the PDB code; for example, 1aar.atm for the template 1aar used in the sample alignment.
The code must be used as that protein's identifier throughout the modelling.
2) Alignment file – gdh.ali
Modeller needs an alignment file in PIR format, just like the gdh.ali created earlier from
the FUGUE alignment. However, it also needs additional information, which it reads from
the comment line (the second header line of each entry). Edit gdh.ali as necessary so
that it complies with the PIR format in the example below.
>P1;gdh
sequence:gdh:@:@: @ :@:gdh: :-1.00:-1.00
MIVFVRFNSSHGFPVEVDSDTSIFQLKEVVAKRQGVPADQLRVIFAGKELRNDW
TVQNCDLDQQSIVHIVQRPWRK-*
>P1;1aar
structureX:1aar:@:@: 76 :@:1aar : :-1.00:-1.00
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLS
DYNIQKESTLHLVLR-LRGG*
The first line has the four-letter name of the PDB file after the semicolon. The second
line contains the PDB name, the number of amino acids, and the name of the protein. Enter
the data in the same format as above.
3) Script file – gdh.top
The script file contains commands for MODELLER in the TOP language. Modeller will
take an alignment (gdh.ali) that you give it and model your sequence against that
alignment. All the parameters for this run are set up in a file ending with ".top". A sample
script file called gdh.top is given below. Cut and paste this script into a file called gdh.top
and edit as necessary.
# PRIMER: STEP 5
#
# This script should produce two models, gdh.B999901 and gdh.B999902.
#
#
#
INCLUDE                             # Include the predefined TOP routines
SET ALNFILE = 'gdh.ali'             # alignment filename
SET KNOWNS = 'Xxxx'                 # codes of the templates, normally pdb code
SET SEQUENCE = 'gdh'                # code of the target
SET ATOM_FILES_DIRECTORY = '.'      # directories for input atom files
SET STARTING_MODEL = 1              # index of the first model
SET ENDING_MODEL = 2                # index of the last model
                                    # (determines how many models to calculate)
SET DEVIATION = 4.0                 # have to be >0 if more than 1 model
CALL ROUTINE = 'model'              # do homology modelling
Set "ALNFILE" to your alignment file, in this case gdh.ali. Set "KNOWNS" to the list of
four-letter codes of the PDB files that you are modelling your sequence against (with only
a space between each). Set "SEQUENCE" to the 3 to 4 letter name that you want the
sequence you are modelling to be called (gdh, for example). "STARTING_MODEL" and
"ENDING_MODEL" determine how many minimization runs are done. In the above example
two runs will be performed.
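Because the same codes must appear in the alignment, in the script and in the atom file names, a quick consistency check before running can save a failed run. The sketch below is plain Python and does not call MODELLER; the function names pir_codes and check_inputs are hypothetical, and it assumes the atom files follow the code.atm naming described above.

```python
import os


def pir_codes(ali_text):
    """Return the codes that follow '>P1;' in a PIR alignment."""
    return [ln.strip()[4:].strip() for ln in ali_text.splitlines()
            if ln.strip().startswith(">P1;")]


def check_inputs(ali_text, knowns, sequence, atom_dir="."):
    """Return a list of problems found; an empty list means none were found.

    knowns   -- template codes as given to KNOWNS in the script
    sequence -- target code as given to SEQUENCE in the script
    atom_dir -- directory as given to ATOM_FILES_DIRECTORY
    """
    problems = []
    codes = pir_codes(ali_text)
    if sequence not in codes:
        problems.append(f"target code {sequence!r} not found in alignment")
    for code in knowns:
        if code not in codes:
            problems.append(f"template code {code!r} not found in alignment")
        atom_file = os.path.join(atom_dir, code + ".atm")
        if not os.path.isfile(atom_file):
            problems.append(f"missing atom file {atom_file}")
    return problems
```

A possible use, assuming gdh.ali and 1aar.atm sit in the current directory, is check_inputs(open("gdh.ali").read(), ["1aar"], "gdh"), which returns an empty list when the codes and files line up.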
Part 3: Run modeller and evaluate quality of final models
Exercise:
1) Run modeller and examine log file for best model
Now you're ready to run modeller. At the prompt, type "mod gdh.top", where gdh is the
name that you gave your top file in Part 2. Modeller, if run correctly, will run for a while
as it models the sequence and then does the minimization steps. The initial output is your
3 or 4 letter name for your protein followed by ".ini", and the actual models are followed
by "B9999??", where ?? is the number of the minimization.
Modeller will now build between 1 and 15 models, depending on the number you asked for
in the .top file. Which model do you choose? The one with the lowest energy. A table is
produced for each model. The energy value is the number associated with the term
Value of Ln (Molecular pdf). The model with a combination of the lowest energy
number and the lowest number of restraint violations is the one you want.
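The selection rule can be sketched in Python once the energy value and the number of restraint violations have been read off the log for each model. The function name pick_best_model is hypothetical, the numbers below are placeholders rather than real MODELLER output, and ranking by energy first with violations as a tie-breaker is only one way to interpret "a combination" of the two criteria.

```python
def pick_best_model(models):
    """Pick the model with the lowest energy, then the fewest violations.

    models -- list of (filename, energy_value, n_violations) tuples,
              copied by hand from the log file.
    """
    if not models:
        raise ValueError("no models given")
    # Tuples compare element by element: energy first, then violations.
    return min(models, key=lambda m: (m[1], m[2]))[0]


# Placeholder numbers, not real MODELLER output.
models = [("gdh.B999901", 1250.4, 12), ("gdh.B999902", 1198.7, 15)]
best = pick_best_model(models)
print(best)
```

If two models are close in energy but differ widely in violations, it is worth inspecting both rather than relying on the ranking alone.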
2) Evaluate the model
Examine your best models by producing and assessing the Ramachandran plot. This can
be done using the RAMPAGE server (http://raven.bioc.cam.ac.uk/rampage.php). Simply
upload your model coordinates and press submit. Residues in the uploaded PDB file that
fall into the "allowed" and "outlier" regions are listed, and a picture of the Ramachandran
plot is displayed.
Note:
If models are of poor quality, go back to the alignment and see if it can be improved and
rerun modeller. Repeat this process until you are happy with the quality of the models.
Try using more than one template during the modelling process.
Useful links
Fugue: A fold recognition method using structural environment-specific substitution
tables and structure-dependent gap penalties
http://www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html
Joy: Protein structure and alignment analysis
http://www-cryst.bioc.cam.ac.uk/cgi-bin/joy.cgi
Modeller: Program for Comparative Protein Structure Modelling
http://www.salilab.org/modeller/modeller.html
Rampage: Structural validation by assessment of the Ramachandran plot
http://raven.bioc.cam.ac.uk/rampage.php