Manual

Programs for Sharing
Prepared by: Chua Gek Huey
Last updated: 24 March 2006
Content Page
1. GetSignal
2. KinaseHMM
3. RQA
4. APBS
5. AMBER
6. Autodock
GetSignal
For all programs in C, key in the command without any input arguments
and a help message will be displayed. For example, if you do not know how
to use GetSignal, just key in ‘GetSignal’ at the command prompt and you will
get a message on how to use GetSignal.
For all MATLAB functions, key in ‘help <function>’ and a message about the
function will be displayed. For example, to get help for
ExtractGetSignal, key in ‘help ExtractGetSignal’ at the MATLAB
command prompt.

GetSignal (C)
o Description: Converts all sequences in <input fasta file> to signals
(raw signals, windowed signals, wavelet-reconstructed signals) using all
indices listed in <input index file>
o Usage: GetSignal <input fasta file> <input index file> <signal file>
<window file> <wave file> <WIN> <LEVEL> <type(a/d)>
   ▪ (input) input fasta file = your sequences in fasta format
   ▪ (input) input index file = list of the aa indices that you want to use
     (using their AAIndex) (for the format, refer to aaid_489.out)
   ▪ (output) signal file = raw signals translated from sequences using
     aa indices
   ▪ (output) window file = WIN-length window-averaged signal
     obtained from the raw signal (if WIN=1, the windowed signal is
     the same as the raw signal)
   ▪ (output) wave file = reconstructed wavelet signal determined by
     LEVEL and type (a/d). In this program, ‘db10’ wavelets are used.
   ▪ (parameter) WIN = rectangular window size
   ▪ (parameter) LEVEL = wavelet level
   ▪ (parameter) type (a/d) = approximation (a) or detail (d)
o Example:
GetSignal kinase.fasta aaid_489.out kinase_sig kinase_win kinase_wave 1 2 a
   ▪ This command will give you 3 output files: kinase_sig,
     kinase_win and kinase_wave
   ▪ The wave file contains the reconstructed signal obtained from the
     level 2 approximation wavelet transformation using wavelet ‘db10’
o Output file format:
   ▪ The first line contains 2 integers: <no of sequences> N and
     <no of aa indices> M
   ▪ Subsequent lines contain the information for each sequence. If you
     have N sequences in your input fasta file, then you will have N+1
     lines in your output files.
   ▪ From the 2nd line onwards, each line contains the information of a
     single sequence:
      ▪ The first value is an integer which tells you the length of the
        sequence, L
      ▪ Subsequently, there are M x L float values which
        represent the signal in the signal file. Note that if WIN is not 1,
        there will be M x (L-WIN+1) values in the window and
        wave files.
o To load the values into MATLAB, use ExtractGetSignal.m
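
For readers without MATLAB, the output file format described above can also be read with a few lines of Python. This is only a minimal sketch, not part of the distributed programs. It assumes that values are separated by whitespace and that the values on each line are grouped by aa index (all values for the first index, then all values for the second, and so on); the manual does not state the ordering, so check it against ExtractGetSignal.m before relying on it. The function name parse_getsignal and the demo data are made up for illustration.

```python
from typing import List, Tuple


def parse_getsignal(text: str) -> Tuple[List[int], List[List[List[float]]]]:
    """Parse the text of a GetSignal output file.

    Returns (seq_lengths, signals), where signals[n][m] holds the values
    for sequence n and aa index m.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line: <no of sequences> N and <no of aa indices> M.
    n_seq, n_idx = (int(tok) for tok in lines[0].split()[:2])
    seq_lengths: List[int] = []
    signals: List[List[List[float]]] = []
    for line in lines[1:1 + n_seq]:
        tokens = line.split()
        seq_lengths.append(int(tokens[0]))  # sequence length L
        values = [float(tok) for tok in tokens[1:]]
        if len(values) % n_idx != 0:
            raise ValueError("value count is not a multiple of M")
        # L values per index in the signal file, L-WIN+1 in win/wave files.
        per_index = len(values) // n_idx
        # ASSUMPTION: values are stored index by index (index-major order).
        signals.append([values[m * per_index:(m + 1) * per_index]
                        for m in range(n_idx)])
    return seq_lengths, signals


# Made-up demo content: 2 sequences, 2 indices.
demo = "2 2\n3 0.1 0.2 0.3 1.0 2.0 3.0\n2 0.5 0.6 5.0 6.0\n"
lengths, sig = parse_getsignal(demo)
print(lengths, sig[0][1])
```

To read a real file, pass its contents instead of the demo string, for example parse_getsignal(open('kinase_sig').read()).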



ExtractGetSignal.m
o Description: Reads signals generated by the C program GetSignal
o Usage: [sig,sig_len]=ExtractGetSignal(filename)
   ▪ (input) filename = the signal filename generated by GetSignal
     (kinase_sig, kinase_win or kinase_wave)
   ▪ (output) sig = values of the signals; N x M x max(L) array where N
     is the number of sequences, M is the number of indices and L is
     the length of the signal
   ▪ (output) sig_len = length of each sequence; N x 1 array where N is
     the number of sequences (needed because sequences may be of
     different lengths)
o Example: [kinase_sig,kinase_len]=ExtractGetSignal('kinase_sig')
   ▪ This will give you an N x M x max(L) array for kinase_sig and an N x 1
     array for kinase_len
o To plot the signals, use plotwave.m
plotwave.m
o Description: Plots 2D signals from 3D arrays. The plot function in
MATLAB cannot handle 3D data; that is, it can’t do plot(kinase_sig(1,1,:))
o Usage: plotwave(seq_choice,index_choice,seq_wave,holdgraph,color)
   ▪ seq_choice = which sequence you want to plot
   ▪ index_choice = which index you want to plot (note that in this
     case, you have to know the order of the indices in your <input
     index file>)
   ▪ seq_wave = the variable that holds the 3D data
   ▪ holdgraph = a flag that tells the plotwave function whether you want to
     overlay the graph or start a new figure (1 = overlay, 0 = start a new
     figure)
   ▪ color = color of the graph you want to plot
o Example:
plotwave(1,1,kinase_sig,1,'r')
   ▪ The signal of the first sequence of kinase_sig, converted with the first
     aa index listed in the <input index file>, is plotted as a red line
GetSignalOneStop.m
o Description:
   ▪ Get all wanted AA indices from 'aa_filename'; store accessions into
     aa_id and values into norm_aa_index.
   ▪ Read in sequences from 'fasta_filename' and store in seq_name,
     seq and seq_len.
   ▪ Obtain raw signal with all the AA indices
   ▪ Perform windowing with size 'win' and store in sig_win
   ▪ Perform wavelet transformation and store in wave
   ▪ The coefficients stored in wave are based on 'level' and 'choice'.
o Usage:
[aa_id,norm_aa_index,seq_name,seq,seq_len,signal,sig_win,wave]=GetSignalOneStop(aa_filename,fasta_filename,win,level,choice,wavelet)
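
The first two processing steps (raw signal, then rectangular window averaging) can be illustrated with a short Python sketch. It is not the MATLAB or C implementation, only an illustration of the idea under stated assumptions: the index values in toy_index are invented, the function names are hypothetical, and the wavelet step is left out. The window average is written so that a signal of length L gives L-WIN+1 values, which matches the output sizes given for GetSignal.

```python
from typing import Dict, List


def raw_signal(seq: str, aa_index: Dict[str, float]) -> List[float]:
    """Translate a sequence into a signal by looking up each residue."""
    return [aa_index[res] for res in seq]


def window_average(signal: List[float], win: int) -> List[float]:
    """Rectangular window average; returns len(signal) - win + 1 values."""
    if win < 1 or win > len(signal):
        raise ValueError("win must be between 1 and the signal length")
    return [sum(signal[i:i + win]) / win
            for i in range(len(signal) - win + 1)]


# Invented index values for three residues (NOT a real AAIndex entry).
toy_index = {"A": 1.0, "C": 2.0, "G": 4.0}

sig = raw_signal("ACGA", toy_index)  # one value per residue
win1 = window_average(sig, 1)        # WIN=1 leaves the signal unchanged
win2 = window_average(sig, 2)        # 4 - 2 + 1 = 3 values
print(sig, win2)
```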
GetRawSignal.m
o Description:
   ▪ Read in the wanted indices from 'aa_filename'; store accessions into
     aa_id and the values of these indices into norm_aa_index
   ▪ Read raw sequences from 'fasta_filename' and store into
     seq_name, seq, seq_len
   ▪ Convert sequences to raw signals using these indices and store in
     signal
o Usage:
[aa_id,norm_aa_index,seq_name,seq,seq_len,signal]=GetRawSignal(aa_filename,fasta_filename)
EuclideanDistance.m
o Description: Returns a single value for the Euclidean distance between x
and y
o Usage: [dist]=EuclideanDistance(x,y)
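
For reference, the Euclidean distance between two equal-length vectors is the square root of the sum of squared differences. The Python sketch below shows the calculation; it is an illustration only, and whether EuclideanDistance.m checks that x and y have the same length is an assumption made here, not something the manual states.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return sqrt(sum((x_i - y_i)^2)) for two equal-length vectors."""
    if len(x) != len(y):
        # ASSUMPTION: unequal lengths are treated as an error.
        raise ValueError("x and y must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


dist = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(dist)
```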
Download C Sourcecode
Download Matlab files
KinaseHMM

HMM_build
o Usage: HMM_build <train set> <profile outfile>
o Parameters in the HMM_build.h file that must be set before running the
program:
   ▪ num_indices
   ▪ DOMAIN_NUM
o Example: HMM_build ntrain ntrainprofile
   ▪ In this case, num_indices=3 and DOMAIN_NUM=12
   ▪ If <train set> = ntrain, you must have the fasta format
     files ntrain.d1, ntrain.d2, ..., ntrain.d12
   ▪ ntrain.d1 contains the training sequences of domain 1
   ▪ There must exist a file named “index36.out” which contains the
     indices for each domain. The indices come in sets of num_indices
     (that is, 3 in this case). The first set corresponds to the indices
     used for domain 1, the second set to domain 2, and so on.
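
The way the indices map to domains can be sketched in Python as follows. This is only an illustration of the grouping rule (sets of num_indices, one set per domain); the actual layout of index36.out is not described in this manual, so the sketch works on a plain list of index IDs, and the IDs shown are placeholders, not real AAIndex accessions.

```python
from typing import List


def group_indices(ids: List[str], num_indices: int,
                  domain_num: int) -> List[List[str]]:
    """Split a flat list of index IDs into one set per domain."""
    expected = num_indices * domain_num
    if len(ids) != expected:
        raise ValueError(f"expected {expected} indices, got {len(ids)}")
    return [ids[d * num_indices:(d + 1) * num_indices]
            for d in range(domain_num)]


# Placeholder IDs IDX01..IDX36 (hypothetical): 12 domains x 3 indices.
placeholder_ids = [f"IDX{i:02d}" for i in range(1, 37)]
groups = group_indices(placeholder_ids, num_indices=3, domain_num=12)
print(groups[0])   # indices used for domain 1
print(groups[11])  # indices used for domain 12
```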

HMM_run
o Usage: HMM_run <profile file> <test file> <result file>
o Example: HMM_run ntrainprofile yeast.fasta yeast_result.txt
   ▪ ntrainprofile is the profile that you have obtained by running
     HMM_build
   ▪ yeast.fasta contains the test sequences in fasta format
   ▪ yeast_result.txt is the output file containing the scores obtained by
     running HMM_run
Download sourcecode
RQA

Eval3DStruct
o Usage: Eval3DStruct <pdbfile>
   ▪ -c: chain_flag (default: evaluation performed on all chains)
   ▪ -d: draw_flag (default: no contact map drawn)
o Example: Eval3DStruct 1H24.pdb -c AB -d
   ▪ Performs evaluation on the structure of chains A and B.
   ▪ Contact maps 1H24.pdb_A.bmp and 1H24.pdb_B.bmp are generated.

RQA3D
o Usage: RQA3D <filename>
   ▪ -c: chain flag (default: RQA performed on all chains)
      -c ABC means perform RQA on chains A, B and C
   ▪ -d: draw flag for contact map (default: no bmp map generated)
   ▪ -f: result flag (default: only %rec displayed)
o Example: RQA3D 1H24.pdb -c AB -d -f
   ▪ RQA is performed on chains A and B of 1H24.pdb.
   ▪ Contact maps 1H24.pdb_A.bmp and 1H24.pdb_B.bmp are generated.
   ▪ Full display of RQA results.

RQA1D
o Usage: RQA1D
   ▪ -ip: input type is pdb format
   ▪ -if: input type is fasta format
   ▪ -I: AAIndex ID
   ▪ -c: chain_flag (default: perform RQA on all chains)
      This option is only available for -ip
   ▪ -d: draw_flag (default: contact maps not drawn)
   ▪ -r: result_flag (default: display only %rec)
   ▪ -F: factor_flag (default: 0.2 of the mean of the distance matrix)
o Example: RQA1D -ip 1H24.pdb -c AB -I MIYS850101
Download sourcecode
Structure of how RQA source code files link
[Diagram not reproduced. It shows the three programs Eval3DStruct
(Eval3DStruct.c, Eval3DStruct.h), RQA3D (RQA3D.c, RQA3D.h) and RQA1D
(RQA1D.c, RQA1D.h) together with the source files DrawImage.c/.h,
RQA_lib.c/.h, pdb_lib.c/.h, AAIndex_lib.c/.h, fasta.c/.h and basic.c/.h.]
APBS for Dummies (Version 0.4.0)
(Acknowledgement: Thanks to Sandeep who provided guidance)
Setup
Download pdb2pqr
Step Through
1. pdb2pqr.py --ff="amber" --apbs-input <filename.pdb> <filename.pqr> [This will
   generate filename.in]
2. prun -n 1 apbs filename.in [This will generate pot.dx]
3. Note: Most probably you will get the following error message with APBS 0.4.0:
   o PBEparm_check: SDENS not set!
   o NOsh: MG parameters not set correctly!
   o Error while parsing input file.
4. To fix this, add the following line to your .in file [the sdens keyword sets
   the sphere density for the APBS Vacc accessibility object]:
   o sdens 10.0
5. Another way is to edit inputgen.py in the pdb2pqr/src directory:
   o Add a line anywhere after text += self.getCenter() and before text +=
     "end\n". Note that this line has to be added at two locations in the file.
      ▪ text += " sdens 10.0\n"
6. Caution: Ensure that your .dx and .pdb files do not have the same filename.
   PyMOL can’t differentiate between the two if they have the same filename,
   even with different extensions.
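
If you have many .in files to fix, the sdens line can be added with a small script instead of by hand. The Python sketch below is one possible way to do it, not part of pdb2pqr or APBS. It assumes that the keyword belongs inside every elec ... end block of the input file (mirroring the inputgen.py edit described above, which places it before "end") and that a block which already has an sdens line should be left alone. The function name add_sdens and the demo input are made up for illustration.

```python
from typing import List


def add_sdens(in_text: str, value: str = "10.0") -> str:
    """Insert 'sdens <value>' before the 'end' of each elec block."""
    out: List[str] = []
    in_elec = False
    has_sdens = False
    for line in in_text.splitlines():
        tokens = line.split()
        word = tokens[0].lower() if tokens else ""
        if not in_elec and word == "elec":
            in_elec, has_sdens = True, False
        elif in_elec and word == "sdens":
            has_sdens = True  # already present, do not add a second one
        elif in_elec and word == "end":
            if not has_sdens:
                out.append(f"    sdens {value}")
            in_elec = False
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up, heavily shortened input file for demonstration only.
demo_in = ("read\n    mol pqr demo.pqr\nend\n"
           "elec\n    mg-auto\n    mol 1\nend\nquit\n")
patched = add_sdens(demo_in)
print(patched)
```

Running the function a second time on its own output changes nothing, so it is safe to apply to files that were already fixed.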
AMBER for Dummies
(Acknowledgment: Thanks to Ms Lee Hui Jun for giving me a
crash course on AMBER)
Setup
Remember to append these lines to your .bashrc:
export PATH=$PATH:/usr/local/amber/exe
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/mkl/lib/32
export AMBERHOME=/usr/local/amber
Note: You have to change the paths according to where AMBER is installed.
The following step-through uses p53.pdb as an example.
Step Through
1. Open a Cygwin window and type ‘startx’
2. Type ‘xhost +<ipaddress>’, where <ipaddress> is the address of the server on
   which you are going to run AMBER. Example: xhost +viper.bii-sg.org
3. At your computer, check your own IP address (Start->Run->type ‘cmd’; at the
   command prompt, type ‘ipconfig’; note down the IP Address)
4. Log into the server where you run AMBER
5. After you log in, type ‘export DISPLAY=<IP Address>:0’, where <IP Address>
   is the address that you noted down in step 3
6. Type ‘xleap’ and you should see the xleap window
7. At the xleap window:
   o source leaprc.ff94
   o p=loadpdb p53.pdb [Initialize object p to p53]
   o solvatebox p TIP3PBOX 8 [add water to p where the box size is 8; you
     may change to 10 or 12]
   o edit p [You will see your protein in a solvatebox. Check if your protein is
     entirely in the box.]
   o check p [Check the net charge of p. In this case, p53 is negatively charged]
   o addions p Na+ 0 [If the charge is negative, add Na+; if positive, add Cl-;
     the last digit 0 indicates we want a net charge of 0]
   o saveamberparm p p53.top p53.crd [save the protein in the solvatebox into
     .top and .crd]
   o quit
8. You are supposed to check the residue ID of the last residue of the protein
   and the total number of residues in p. Download amber_script.sh.
9. amber_script.sh will do the following:
   o Generate p53_box.pdb and check for the last residue of the protein and
     the total number of residues in p
   o Generate .in and .s files for minimization, heating, equilibration
     and the production run, and also a runmd file for submitting the jobs
     through LSF
   o Suppose your .top and .crd files are in the p53 directory
   o Usage: amber_script.sh <directory> <filename>
   o Example: amber_script.sh p53 p53
   o Files generated:
      ▪ p53_min1.in, p53_min1.s: initial energy minimization with
        solute frozen
      ▪ p53_min2.in, p53_min2.s: initial energy minimization with
        solvent frozen
      ▪ p53_min3.in, p53_min3.s: initial energy minimization on the whole
        system
      ▪ p53_heat1.in, p53_heat1.s: heat from 0 to 100 (10ps)
      ▪ p53_heat2.in, p53_heat2.s: heat from 100 to 200 (10ps)
      ▪ p53_heat3.in, p53_heat3.s: heat from 200 to 300 (10ps)
      ▪ p53_md2.in, p53_md2.s: equilibration (100ps)
      ▪ p53_md<X>.in, p53_md<X>.s: production run (2ns each)
        [p53_md3 to p53_md7 are generated for the purpose of running 5 x 2ns]
      ▪ p53_runmd: just type ‘p53_runmd’ and it will run
        from p53_min1.s to p53_md7.s in sequence
   o Alternatively, if LSF is not working, you can use prun (go to your .s file
     and copy and paste from there; copy from the sander command to the end of
     the double quotes)
      ▪ Example: prun -n 16 /usr/local/amber/exe/sander -O -i
        1H24BE6_min1.in -o 1H24BE6_min1.out -p 1H24BE6.top -c
        1H24BE6.crd -r 1H24BE6_min1.rst -ref 1H24BE6.crd &
      ▪ Note: In this case, you will have to check that each job has finished
        running before starting the next one.
   o You may want to check the progress by checking the .out files. For
     example, ‘tail -40 1H24BE6_min1.out’
Autodock for Dummies
Setup
Download autodock_share.zip
Download pmol2q
Remember to append these lines to your .bashrc:
export AUTODOCK_UTI="/home/chuagh/bin/autodock/share"
export PATH=$PATH:/bin/pmol2q_2.3.0_win/src
export PATH=$PATH:/autodock/dist305/bin/linux
Note: You have to change the paths according to where you put your share and pmol2q
directories and where your Autodock is installed
Step Through (How to do autodock without using ADT tools)
1. As an example, suppose you have macro.pdb, which contains the macromolecule,
   and ligand.pdb, which is the ligand
2. pmol2q macro.pdb macro.pdbqs
3. pmol2q ligand.pdb ligand.mol2
4. deftors ligand.mol2
5. mkgpf3 ligand.pdbq macro.pdbqs [macro.gpf generated]
6. mkdpf3 ligand.pdbq macro.pdbqs [ligand.macro.dpf generated]
7. autogrid3 -p macro.gpf -l macro.glg
8. autodock3 -p ligand.macro.dpf -l ligand.macro.dlg
9. get-docked ligand.macro.dlg [ligand.macro.dlg.pdb will be generated. This file
   contains the pdb format of all dockings]
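
When docking several ligands, it can help to generate the command sequence above from the two base filenames. The Python sketch below only builds the list of commands, using exactly the program names and file naming shown in the steps; it does not run them. The function name autodock_commands is made up, and the sketch assumes the intermediate files (for example ligand.pdbq) are named as in the steps above.

```python
from typing import List


def autodock_commands(macro: str, ligand: str) -> List[List[str]]:
    """Return the step-through commands for the given base filenames."""
    dpf = f"{ligand}.{macro}.dpf"
    dlg = f"{ligand}.{macro}.dlg"
    return [
        ["pmol2q", f"{macro}.pdb", f"{macro}.pdbqs"],
        ["pmol2q", f"{ligand}.pdb", f"{ligand}.mol2"],
        ["deftors", f"{ligand}.mol2"],
        ["mkgpf3", f"{ligand}.pdbq", f"{macro}.pdbqs"],   # writes <macro>.gpf
        ["mkdpf3", f"{ligand}.pdbq", f"{macro}.pdbqs"],   # writes the .dpf
        ["autogrid3", "-p", f"{macro}.gpf", "-l", f"{macro}.glg"],
        ["autodock3", "-p", dpf, "-l", dlg],
        ["get-docked", dlg],                              # writes <dlg>.pdb
    ]


cmds = autodock_commands("macro", "ligand")
for cmd in cmds:
    print(" ".join(cmd))
```

Each entry is a list of arguments, so it could be passed to subprocess.run(cmd, check=True) to execute the steps one after another, provided the programs are on your PATH.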
Notes:
   ▪ If you want, change rmstol for the cluster analysis by editing ligand.macro.dpf
Matlab functions to extract coordinates (download)

autodock_get_best_cluster_pdb_from_dlg.m
o Usage: autodock_get_best_cluster_pdb_from_dlg(filename)
o Description: Extracts all docked conformations from the best cluster and puts
them into a single pdb file
o Note that for this function, both the .dlg and .dlg.pdb files must be available
o Example: autodock_get_best_cluster_pdb_from_dlg('1H24E6')
   ▪ In this example, 1H24E6.dlg and 1H24E6.dlg.pdb must be
     available
   ▪ An output file 1H24E6_best_cluster.pdb will be generated. It
     contains all pdb coordinates of the docking conformations in the
     best cluster.
autodock_get_min_rmsd_from_dlg.m
o Usage: [outline]=autodock_get_min_rmsd_from_dlg(filename)
o Description: Extracts the docked conformation which has the lowest rmsd
from the starting position
o Note that for this function, both the .dlg and .dlg.pdb files must be available
o Example: autodock_get_min_rmsd_from_dlg('1H24E6')
   ▪ 1H24E6.dlg and 1H24E6.dlg.pdb must be available
   ▪ Output file: 1H24E6_min_ref_rmsd.pdb
autodock_get_min_energy_from_dlg.m
o Usage: [outline]=autodock_get_min_energy_from_dlg(filename)
o Description: Extracts the docked conformation with the lowest docked
energy and saves it as a pdb file
o Example: autodock_get_min_energy_from_dlg('1H24E6')
   ▪ Input files: 1H24E6.dlg and 1H24E6.dlg.pdb
   ▪ Output file: 1H24E6_min_energy.pdb