Receptor-based virtual screening
Lab version 2

Virtual screening
• Goal: identify ligands that bind tightly to a protein
• Requirements: a computer database of random potential ligands and a structure of the target protein
• Repeatedly dock new ligands to the protein
• Score how tightly each ligand may bind
• Keep the best ‘hits’; discard the other ligands
• Find the best

Ligand database
• Often databases of commercially available compounds are used, with up to 2 million compounds
• These take some time to analyze
• We will use the NCI diversity set, about 1800 diverse compounds available from the National Cancer Institute
• This database contains many interesting compounds but is not exhaustive

Protein target
• We need a structure to serve as a target for ligand binding
• This can be an X-ray crystallographic structure or a high-quality homology model
• We also need some idea of where the ligand binding site is
• If the protein has multiple conformations, choose the appropriate one

Scoring
• To find the best ligands we must score the docked complexes
• Vina does this, giving a ΔG score
• Other scoring methods are available, such as Xscore and DrugScore

Automation
• Virtual screening involves docking new ligands repeatedly
• We will dock with Vina and automate the docking with a Perl script
• Automation includes selecting a new ligand from the database, running Vina, recording the docking score, etc.

Output
• You will get a list of hits (ligand numbers)
• You can select in advance how many hits you want to look at; for a database of 2000, 20 hits is a reasonable number
• You can recover these hits as PDB files from the docked_pdb folder and view them docked to your protein

Set up
• Patience!
• We are trying to emulate much more functional systems
• Expect delays

Preparing your computer
• Copy the folder VirtualScreen2 into the C: directory
• VirtualScreen2 contains most of the files and many of the folders you will need

Installing Perl
• Google ‘CPAN’ (the site for Perl)
• Download a ‘binary’ for Perl
• For PCs this will probably be ActivePerl
• Install Perl
• Test Perl: get a ‘Command Prompt’ from start;Programs;accessories;CommandPrompt
• Type: perl -v
• You should get information about the Perl version

Look at a PDBQT file
• Ligands have torsion (twist and bend) features
• Look in the database folder db_pdbqt
• Look at ligand1.pdbqt
• Open the file by right-clicking and using ‘open with, wordpad’
• ‘BRANCH’ data indicates where ligand1 can rotate (3 places)

Check Vina
• Test files are present in \lm\VirtualScreen
• These are for a receptor and a drug ligand
• 2rhnh.pdbqt, carh.pdbqt, config2.txt
• To run Vina, type at the command prompt:
• \lm\downloads\vina.exe --config config2.txt
• The program takes a minute or so to run
• Test_vina.txt should give a list of energies for 9 alternative docked conformations

Check ligand database
• Go to the VirtualScreen2\db_pdbqt directory
• NCI diversity set = about 1800 chemicals
• The parent DB from NCI is called Ncidiv_p0.0
• These are chemicals available from NCI for testing
• We have about 1800 .pdbqt files, one per chemical

Target protein
• Much of VirtualScreen2 relies on the target protein for binding
• A single name (ideally the PDB code) should be used throughout
• Any name variation will stop the program

Prepare target
• In VirtualScreen2, make a new directory with a one_word name for your target protein (example: 2rht_a)
• In your target directory place two files:
• rech.pdbqt = your receptor/protein; must be called ‘rech.pdbqt’
• xtal-lig.pdb = a reference ligand that will be used to define the binding site
• Look in the folder 2rht_a to see an example

Making rech.pdbqt
• Start with your receptor/protein without any ligand
• Make a copy of the PDB file and delete the lines referring to your ligand's 3-letter code
• Save

Making your rech.pdbqt file
• Add hydrogens
• There are two methods
• Open your protein in DS Viewer, click on ‘tools’, then ‘hydrogens’, ‘add’
• You should see H’s added
• Or use OpenBabel on the command line
• Babel.exe -ipdb 2nht.pdb -opdb 2nhtH.pdb -h
• (substitute the name of your protein)

Making your rech.pdbqt file
• Now convert the PDB file to PDBQT, adding hydrogen bonding information
• Use MGLtools (AutoDock tools)
• Install it if you do not have it
• Start the program; you will get a window
• In the middle of the lower bar is ‘Grid’
• Click ‘Macromolecule’ on the menu and open your pdb+hydrogens file
• Then choose ‘output’ and save as a .pdbqt file

Making your rech.pdbqt file
• The file should be ready at this point
• Check that the file contains hydrogens (only polar hydrogens are included)
• Check that the file has hydrogen bonding info on the right margin, with entries like HD (hydrogen donor), OA (oxygen hydrogen bond acceptor), or C (carbon, which does nothing)

Reference ligand
• The reference ligand PDB file serves only one purpose:
• It defines the region of the protein that Vina will search
• If the ligand is in the wrong place, Vina will search the wrong place
• Copy the ligand from a trusted protein-ligand complex file

Editing the Virtual2.pl script
• Information on how the virtual screen should run is included in the script
• You must tell the script what to do
• At runtime this information is used

VS adjustable features
• Edit Virtual2.pl
• You can adjust:
• Target_name: must match a folder name
• Filenum (file number): use a new number to avoid deleting previous experiments
• Number of ligands to screen: use ‘stop’ and ‘start’

Target_name
• $target_name defines the target for analysis
• It should equal the name of the folder that holds rech.pdbqt
• E.g. $target_name = "2rht_a";
• For the example search, there is a matching folder called 2rht_a that has the files needed for the search

Number of ligands
• You can adjust the start and stop points for searching the database
• Do only 5 to start; 1800 may take days on your machine (21 hours on my machine)
• Time how long 5 ligands take and multiply by 360 (1800 / 5) to estimate the time required for the whole database
• The database can be split up using ‘stop’ and ‘start’ and run at different times

Editing the script
• Right click on Virtual2.pl and choose open with Wordpad
• At the top of the script is information
• The section labeled for editing can be changed
• If you are going to make big changes, save a copy of the original script
• You must enter the name of your protein exactly as the folder is named
• Edit carefully; do not delete #’s or ;’s

Before you begin VS
• Have you set the number of ligands to 5? (0-5)
• This should take 3 to 30 minutes (you should time it)
• If something goes wrong the first time (it usually does), no harm done
• To stop the program, use ctrl-C (repeat if necessary)

Running VS
• Get a command prompt (start;programs;accessories;command prompt)
• Type: cd \virtualscreen2
• (this gets you to the right directory if needed)
• Type: virtual2.pl
• The program should run and stop in less than an hour if you are doing 5 ligands (2-10 minutes is likely)

Looking at the results
• The results are in the vs_log folder (\virtualscreen2\vs_log)
• The output file has the file numbers of the hits, ranked from best to worst
• Results files are marked with filenum to avoid overwriting
• Sample file: 2rht_a_results2.txt

Looking at hits
• Open your hits results file or open the example file 2rht_a_results.txt
• The predicted ΔG of binding is shown along with the ligand number
• A more negative ΔG indicates tighter binding
• The average ΔG for all ligands is shown
• For my data, ligand 438 is best

Looking at one ligand
• We can look at the best hit from 2rht_a
• In db_pdb look for ligand438.pdb, the best hit for the example
• (db_pdb contains un-docked molecules)
• Look at this file with RasMol
• It has a symmetric set of fused rings. This type of molecule is usually an artefact because it binds to everything, so other hits may be better

Looking for a good pose
• A ‘pose’ is a ligand conformation bound to a protein
• To view the conformation of a docked ligand after VS, look in the docked_pdb folder
• These files can also be added to a protein file to view docking
• Save molecules you like, because they can be overwritten

Viewing complexes
• The ligand .pdb file contents can be spliced onto the end of a copy of the receptor file used in virtual screening
• The complex can be viewed in RasMol
• Especially note which receptor residues the ligand contacts

Ligand-protein contacts
• Splice the ligand onto the receptor in the PDB file
• The ligand should be named LIG in the PDB file
• Run the contact12.pl script
• Example: contact12.pl 2rht_lig438.pdb LIG
• Contacts appear on screen and in the file ‘contact_output.txt’

The role of good judgment
• The value of virtual screening is that one can go from thousands or millions of candidate drugs with 0.01%-0.1% leads to tens or hundreds of hits with 1%-10% leads
• Hits are not leads
• They are a step toward getting leads
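
Sketch: ranking hits by predicted ΔG

The ranking described under ‘Looking at the results’ and ‘Looking at hits’ (hits listed from best to worst, more negative ΔG meaning tighter binding, plus an average ΔG for all ligands) can be illustrated with a few lines of code. This is only a minimal sketch in Python, not the course's Perl script: the function name rank_hits, the n_hits parameter, and the example scores are all hypothetical, and the scores are made up for illustration rather than taken from any real screen.

```python
def rank_hits(scores, n_hits=20):
    """Return (best hits, mean score) from a {ligand_number: predicted_dG} mapping.

    Hypothetical helper for illustration; not part of Virtual2.pl.
    """
    if not scores:
        raise ValueError("no docking scores supplied")
    # A more negative predicted dG means tighter binding, so sort ascending.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    mean_score = sum(scores.values()) / len(scores)
    return ranked[:n_hits], mean_score


# Made-up scores (kcal/mol) for five arbitrary ligand numbers.
example_scores = {3: -6.1, 17: -10.4, 42: -7.9, 8: -5.2, 25: -8.3}

hits, mean_score = rank_hits(example_scores, n_hits=3)
for ligand, dg in hits:
    print(f"ligand{ligand}\t{dg:.1f}")
print(f"mean\t{mean_score:.2f}")
```

Comparing each hit with the mean gives a rough sense of how far it stands out from the rest of the database, which is presumably why the results file reports the average.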