Programs for Sharing
Prepared by: Chua Gek Huey
Last updated: 24 March 2006

Content Page
1. GetSignal
2. KinaseHMM
3. RQA
4. APBS
5. AMBER
6. Autodock

GetSignal

For all programs in C, key in the command without any input arguments and a help message will be displayed. For example, if you do not know how to use GetSignal, just key in 'GetSignal' at the command prompt and you will get a message on how to use it.

For all MATLAB functions, key in 'help <function>' and a message about the function will be displayed. For example, to get help for ExtractGetSignal, key in 'help ExtractGetSignal' at the MATLAB command prompt.

GetSignal (C)
o Description: Converts all sequences in <input fasta file> to signals (raw signals, windowed signals, wavelet-reconstructed signals) using all indices listed in <input index file>.
o Usage: GetSignal <input fasta file> <input index file> <signal file> <window file> <wave file> <WIN> <LEVEL> <type(a/d)>
    (input) input fasta file = your sequences in fasta format
    (input) input index file = list of aa indices that you want to use, given by their AAIndex IDs (for the format, refer to aaid_489.out)
    (output) signal file = raw signals translated from the sequences using the aa indices
    (output) window file = WIN-length window-averaged signal obtained from the raw signal (if WIN=1, the windowed signal is the same as the raw signal)
    (output) wave file = reconstructed wavelet signal determined by LEVEL and type (a/d). In this program, 'db10' wavelets are used.
    (parameter) WIN = rectangular window size
    (parameter) LEVEL = wavelet level
    (parameter) type (a/d) = approximation (a) or detail (d)
o Example: GetSignal kinase.fasta aaid_489.out kinase_sig kinase_win kinase_wave 1 2 a
    This command will give you 3 output files: kinase_sig, kinase_win and kinase_wave.
    The wave file contains the reconstructed signal obtained from the level 2 approximation wavelet transformation using wavelet 'db10'.
o Output file format:
    The first line contains 2 integers: <no of sequences> N and <no of aa indices> M.
    Subsequent lines contain the information for each sequence. If you have N sequences in your input fasta file, you will have N+1 lines in your output files.
    From the 2nd line onwards, each line contains the information of a single sequence:
    The first value is an integer giving the length of the sequence, L.
    Subsequently, there are M x L float values which represent the signal for the sig file.
    Note that if WIN is not 1, there will be M x (L-WIN+1) values for the win and wave files.
o To load the values into MATLAB, use ExtractGetSignal.m

ExtractGetSignal.m
o Description: Reads the signal generated by the C program GetSignal.
o Usage: [sig,sig_len]=ExtractGetSignal(filename)
    (input) filename = the signal filename generated by GetSignal (kinase_sig, kinase_win or kinase_wave)
    (output) sig = values of the signals; N x M x max(L) array where N is the number of sequences, M is the number of indices and L is the length of the signal
    (output) sig_len = length of each sequence; N x 1 array where N is the number of sequences (needed since sequences may be of different lengths)
o Example: [kinase_sig,kinase_len]=ExtractGetSignal('kinase_sig')
    This will give you an N x M x max(L) array for kinase_sig and an N x 1 array for kinase_len.
o To plot the signals, use plotwave.m

plotwave.m
o Description: Plots 2D signals from 3D arrays.
    The plot function in MATLAB cannot handle 3D data; that is, it cannot do plot(kinase_sig(1,1,:)).
o Usage: plotwave(seq_choice,index_choice,seq_wave,holdgraph,color)
    seq_choice = which sequence you want to plot
    index_choice = which index you want to plot (note that you have to know the order of the indices in your <input index file>)
    seq_wave = the variable that holds the 3D data
    holdgraph = a flag that tells plotwave whether to overlay the graph or start a new figure (1 = overlay, 0 = start a new figure)
    color = color of the graph you want to plot
o Example: plotwave(1,1,kinase_sig,1,'r')
    The first sequence of kinase_sig, converted with the first aa index listed in the <input index file>, is plotted as a red line.

GetSignalOneStop.m
o Description:
    Get all wanted AA indices from 'aa_filename'; store the accessions into aa_id and the values into norm_aa_index.
    Read in sequences from 'fasta_filename' and store them in seq_name, seq and seq_len.
    Obtain the raw signal with all the AA indices.
    Perform windowing with size 'win' and store the result in sig_win.
    Perform wavelet transformation and store the result in wave.
    The coefficients stored in wave are based on 'level' and 'choice'.
o Usage: [aa_id,norm_aa_index,seq_name,seq,seq_len,signal,sig_win,wave]=GetSignalOneStop(aa_filename,fasta_filename,win,level,choice,wavelet)

GetRawSignal.m
o Description:
    Read in the wanted indices from 'aa_filename'; store the accessions into aa_id and the values of these indices into norm_aa_index.
    Read raw sequences from 'fasta_filename' and store them in seq_name, seq and seq_len.
    Convert the sequences to raw signals using these indices and store them in signal.
o Usage: [aa_id,norm_aa_index,seq_name,seq,seq_len,signal]=GetRawSignal(aa_filename,fasta_filename)

EuclideanDistance.m
o Description: Returns a single value for the Euclidean distance between x and y.
o Usage: [dist]=EuclideanDistance(x,y)

Download C Sourcecode
Download Matlab files

KinaseHMM

HMM_build
o Usage: HMM_build <train set> <profile outfile>
o Parameters in the HMM_build.h file that must be set before running the program:
    num_indices
    DOMAIN_NUM
o Example: HMM_build ntrain ntrainprofile
    In this case, num_indices=3 and DOMAIN_NUM=12.
    If <train set> = ntrain, you must have the fasta format files ntrain.d1, ntrain.d2, ..., ntrain.d12, where ntrain.d1 contains the training sequences of domain 1.
    There must exist a file named "index36.out" which contains the indices for each domain. The indices come in sets of num_indices (that is, 3 in this case). The first set corresponds to the indices used for domain 1, the second set to domain 2, and so on.

HMM_run
o Usage: HMM_run <profile file> <test file> <result file>
o Example: HMM_run ntrainprofile yeast.fasta yeast_result.txt
    ntrainprofile is the profile that you obtained by running HMM_build.
    yeast.fasta contains the test sequences in fasta format.
    yeast_result.txt is the output file containing the scores obtained by running HMM_run.

Download sourcecode

RQA

Eval3DStruct
o Usage: Eval3DStruct <pdbfile>
o -c: chain_flag. (default: evaluation performed on all chains)
o -d: draw_flag. (default: no contact map drawn)
o Example: Eval3DStruct 1H24.pdb -c AB -d
    Evaluation is performed on the structure of chains A and B.
    Contact maps 1H24.pdb_A.bmp and 1H24.pdb_B.bmp are generated.

RQA3D
o Usage: RQA3D <filename>
o -c: chain flag. (default: RQA performed on all chains)
    -c ABC means perform RQA on chains A, B and C.
o -d: draw flag for contact map. (default: no bmp map generated)
o -f: result flag. (default: only %rec displayed)
o Example: RQA3D 1H24.pdb -c AB -d -f
    RQA is performed on chains A and B of 1H24.pdb.
    Contact maps 1H24.pdb_A.bmp and 1H24.pdb_B.bmp are generated.
    The full RQA results are displayed.

RQA1D
o Usage: RQA1D
o -ip: input type is pdb format.
o -if: input type is fasta format.
o -I: AAIndex ID
o -c: chain_flag. (default: perform RQA on all chains)
    This option is only available with -ip.
o -d: draw_flag. (default: contact maps not drawn)
o -r: result_flag. (default: display only %rec)
o -F: factor_flag (default: 0.2 of the mean of the distance matrix)
o Example: RQA1D -ip 1H24.pdb -c AB -I MIYS850101

Download sourcecode

Structure of how RQA source code files link
    [Diagram linking the programs Eval3DStruct, RQA3D and RQA1D to their source files]
    Files shown: Eval3DStruct.c, Eval3DStruct.h, DrawImage.c, DrawImage.h, RQA_lib.c, RQA_lib.h, RQA3D.c, RQA3D.h, RQA1D.c, RQA1D.h, pdb_lib.c, pdb_lib.h, AAIndex_lib.c, AAIndex_lib.h, fasta.c, fasta.h, basic.c, basic.h

APBS for Dummies (Version 0.4.0)
(Acknowledgement: Thanks to Sandeep who provided guidance)

Setup
Download pdb2pqr

Step Through
    pdb2pqr.py --ff="amber" --apbs-input <filename.pdb> <filename.pqr> [This will generate filename.in]
    prun -n 1 apbs filename.in [This will generate pot.dx]

Note: Most probably you will get the following error message for APBS 0.4.0:
o PBEparm_check: SDENS not set!
o NOsh: MG parameters not set correctly!
o Error while parsing input file.
Add the following line to your .in file. [The sdens keyword sets the sphere density for the APBS Vacc accessibility object]
o sdens 10.0
Another way is to edit inputgen.py in the pdb2pqr/src directory:
o Add the line below anywhere after text += self.getCenter() and before text += "end\n". Note that this line has to be added at two locations in the file.
    text += " sdens 10.0\n"

Caution: Ensure that your .dx and your .pdb files do not have the same filename. PyMol cannot differentiate between the two if they have the same filename, even with different extensions.

AMBER for Dummies
(Acknowledgment: Thanks to Ms Lee Hui Jun for giving me a crash course on AMBER)

Setup
Remember to append these lines to your .bashrc:
    export PATH=$PATH:/usr/local/amber/exe
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/mkl/lib/32
    export AMBERHOME=/usr/local/amber
Note: You have to change the paths according to where AMBER is installed.
In the following step-through, I use p53.pdb as an example.

Step Through
1. Open a Cygwin window and type 'startx'.
2. Type 'xhost +<ipaddress>', where <ipaddress> is the address of the server on which you are going to run AMBER. Example: xhost +viper.bii-sg.org
3. At your computer, check your own IP address (Start->Run->type 'cmd'; at the command prompt, type 'ipconfig'; note down the IP Address).
4. Log into the server where you run AMBER.
5. After you log in, type 'export DISPLAY=<IP Address>:0', where <IP Address> is the address that you noted down in step 3.
6. Type 'xleap' and you should see the xleap window.
7. At the xleap window:
    o source leaprc.ff94
    o p=loadpdb p53.pdb [Initialize object p to p53]
    o solvatebox p TIP3PBOX 8 [Add water to p, where the box size is 8; you may change it to 10 or 12]
    o edit p [You will see your protein in a solvatebox. Check that your protein is entirely in the box.]
    o check p [Check the net charge of p. In this case, p53 is negatively charged]
    o addions p Na+ 0 [If the charge is negative, add Na+; if positive, add Cl-. The last digit 0 indicates that we want a net charge of 0]
    o saveamberparm p p53.top p53.crd [Save the protein in the solvatebox into .top and .crd files]
    o quit
8. You are supposed to check the residue ID of the last residue of the protein and the total number of residues in p. Download amber_script.sh.
amber_script.sh will do the following:
    o Generate p53_box.pdb and check for the last residue of the protein and the total number of residues in p.
    o Generate .in and .s files for the minimization, heating, equilibration and production runs, and also a runmd file for submitting the jobs through LSF.
    o It assumes your .top and .crd files are in the p53 directory.
    o Usage: amber_script.sh <directory> <filename>
    o Example: amber_script.sh p53 p53
    o Files generated:
        p53_min1.in, p53_min1.s: Initial energy minimization with solute frozen
        p53_min2.in, p53_min2.s: Initial energy minimization with solvent frozen
        p53_min3.in, p53_min3.s: Initial energy minimization on whole system
        p53_heat1.in, p53_heat1.s: Heat from 0 to 100 (10ps)
        p53_heat2.in, p53_heat2.s: Heat from 100 to 200 (10ps)
        p53_heat3.in, p53_heat3.s: Heat from 200 to 300 (10ps)
        p53_md2.in, p53_md2.s: Equilibration (100ps)
        p53_md<X>.in, p53_md<X>.s: Production run (2ns each) [p53_md3 to p53_md7 are generated for the purpose of running 5 x 2ns]
        p53_runmd: just type 'p53_runmd' and it will run everything from p53_min1.s to p53_md7.s in sequence
    o Alternatively, if LSF is not working, you can use prun. Go to your .s file and copy from the sander command to the end of the double quotes.
        Example: prun -n 16 /usr/local/amber/exe/sander -O -i 1H24BE6_min1.in -o 1H24BE6_min1.out -p 1H24BE6.top -c 1H24BE6.crd -r 1H24BE6_min1.rst -ref 1H24BE6.crd &
        Note: In this case, you will have to check that each job has finished running before starting the next one.
    o You may want to check the progress by checking the .out files.
        For example, 'tail -40 1H24BE6_min1.out'

Autodock for Dummies

Setup
Download autodock_share.zip
Download pmol2q
Remember to append these lines to your .bashrc:
    export AUTODOCK_UTI="/home/chuagh/bin/autodock/share"
    export PATH=$PATH:/bin/pmol2q_2.3.0_win/src
    export PATH=$PATH:/autodock/dist305/bin/linux
Note: You have to change the paths according to where you put your share directory and pmol2q, and where Autodock is installed.

Step Through (how to run Autodock without using ADT tools)
Example: you have macro.pdb, which contains the macromolecule, and ligand.pdb, which is the ligand.
    pmol2q macro.pdb macro.pdbqs
    pmol2q ligand.pdb ligand.mol2
    deftors ligand.mol2
    mkgpf3 ligand.pdbq macro.pdbqs [macro.gpf generated]
    mkdpf3 ligand.pdbq macro.pdbqs [ligand.macro.dpf generated]
    autogrid3 -p macro.gpf -l macro.glg
    autodock3 -p ligand.macro.dpf -l ligand.macro.dlg
    get-docked ligand.macro.dlg [ligand.macro.dlg.pdb will be generated. This file contains all dockings in pdb format]
Note: To change rmstol for the cluster analysis, edit ligand.macro.dpf.

Matlab functions to extract coordinates (download)

autodock_get_best_cluster_pdb_from_dlg.m
o Usage: autodock_get_best_cluster_pdb_from_dlg(filename)
o Description: Extracts all docked conformations from the best cluster and puts them into a single pdb file.
o Note that for this function, both the .dlg and .dlg.pdb files must be available.
o Example: autodock_get_best_cluster_pdb_from_dlg('1H24E6')
    In this example, 1H24E6.dlg and 1H24E6.dlg.pdb must be available.
    An output file 1H24E6_best_cluster.pdb will be generated. It contains all pdb coordinates of the docking conformations in the best cluster.
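
Since the function needs both the .dlg and the .dlg.pdb file, it can be convenient to confirm that they exist before starting MATLAB. The following is a minimal Python sketch of such a pre-check. The function name missing_dlg_inputs and its directory argument are hypothetical helpers introduced here for illustration; only the two file suffixes come from the note above.

```python
from pathlib import Path


def missing_dlg_inputs(base_name, directory="."):
    """Return the required input files for base_name that do not exist.

    Hypothetical helper: it only checks for <base_name>.dlg and
    <base_name>.dlg.pdb, the two files the MATLAB function expects.
    """
    folder = Path(directory)
    required = [folder / f"{base_name}.dlg", folder / f"{base_name}.dlg.pdb"]
    return [str(path) for path in required if not path.is_file()]


if __name__ == "__main__":
    # Base name taken from the example above.
    missing = missing_dlg_inputs("1H24E6")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("Both input files found.")
```

An empty list means both inputs are present, so the MATLAB call should at least find its files.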

autodock_get_min_rmsd_from_dlg.m
o Usage: [outline]=autodock_get_min_rmsd_from_dlg(filename)
o Description: Extracts the docked conformation which has the lowest rmsd from the starting position.
o Note that for this function, both the .dlg and .dlg.pdb files must be available.
o Example: autodock_get_min_rmsd_from_dlg('1H24E6')
    1H24E6.dlg and 1H24E6.dlg.pdb must be available.
    Output file: 1H24E6_min_ref_rmsd.pdb

autodock_get_min_energy_from_dlg.m
o Usage: [outline]=autodock_get_min_energy_from_dlg(filename)
o Description: Extracts the docked conformation with the lowest docked energy and saves it as a pdb file.
o Example: autodock_get_min_energy_from_dlg('1H24E6')
    Input files: 1H24E6.dlg and 1H24E6.dlg.pdb
    Output file: 1H24E6_min_energy.pdb
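
The Autodock Step Through earlier in this section is a fixed sequence of commands that differ only in the file names, so it can be generated for other macromolecule and ligand pairs. The Python sketch below builds that command list. The function name autodock_commands is hypothetical, and extending the file-name pattern (<ligand>.<macro>.dpf and so on) to names other than macro and ligand is an assumption based on the single example given. The commands and flags themselves are copied from the Step Through.

```python
import shlex


def autodock_commands(macro="macro", ligand="ligand"):
    """Build the Step Through commands for the given base names.

    Hypothetical helper: the file-name pattern is generalized from the
    macro.pdb / ligand.pdb example and may not hold for every setup.
    """
    docking = f"{ligand}.{macro}"
    return [
        ["pmol2q", f"{macro}.pdb", f"{macro}.pdbqs"],
        ["pmol2q", f"{ligand}.pdb", f"{ligand}.mol2"],
        ["deftors", f"{ligand}.mol2"],
        ["mkgpf3", f"{ligand}.pdbq", f"{macro}.pdbqs"],
        ["mkdpf3", f"{ligand}.pdbq", f"{macro}.pdbqs"],
        ["autogrid3", "-p", f"{macro}.gpf", "-l", f"{macro}.glg"],
        ["autodock3", "-p", f"{docking}.dpf", "-l", f"{docking}.dlg"],
        ["get-docked", f"{docking}.dlg"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in autodock_commands():
        print(" ".join(shlex.quote(part) for part in command))
```

The sketch only prints the commands. To run them, each list could be passed to subprocess.run(command, check=True) in order, so that a failing step stops the sequence before the next one starts.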