Receptor-based virtual screening

Lab version 2
Virtual screening
• Goal: identify ligands that tightly bind to a
protein
• Requirements: a computer database of
random potential ligands and a structure of
the target protein
• Repetitively dock new ligands to protein
• Score how tightly each ligand may bind
• Keep best ‘hits’; discard other ligands
Find the best
Ligand database
• Often databases of commercially available
compounds are used – up to 2 million
compounds
• These take some time to analyze
• We will use an NCI diversity set of about 1800
diverse compounds available from the
National Cancer Institute
• This database contains many interesting
compounds but is not exhaustive
Protein target
• We need a structure to serve as a target for
ligand binding
• This can be an X-ray crystallographic structure
or a high-quality homology model
• We need some idea of where the binding site
for ligands is as well
• If the protein has multiple conformations,
choose the appropriate one
Scoring
• To find the best ligands we must score the
docked complexes
• Vina does this, giving a ΔG score (predicted binding free energy, in kcal/mol)
• Other scoring methods are available such as Xscore and DrugScore
Automation
• Virtual screening involves docking new ligands
repetitively
• We will dock with Vina and automate the
docking with a Perl script
• Automation includes selecting a new ligand
from the database, running Vina, recording
the docking score etc.
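The automation loop described above can be sketched as follows. This is a minimal Python illustration, not the lab's Perl script: the function name, the ligandN.pdbqt numbering, the config.txt file and the output file names are assumptions made for the example. It only builds the Vina command lines; --receptor, --ligand, --config and --out are standard Vina options.

```python
def build_vina_commands(target, start, stop, vina="vina"):
    """Return one Vina command (as an argument list) per ligand.

    Assumption: 'start' and 'stop' behave like (0, 5) -> ligands 1..5.
    """
    commands = []
    for n in range(start + 1, stop + 1):
        commands.append([
            vina,
            "--receptor", f"{target}/rech.pdbqt",
            "--ligand", f"db_pdbqt/ligand{n}.pdbqt",
            "--config", f"{target}/config.txt",  # hypothetical config file
            "--out", f"ligand{n}_out.pdbqt",      # hypothetical output name
        ])
    return commands


cmds = build_vina_commands("2rht_a", 0, 5)
print(len(cmds))
print(" ".join(cmds[0]))
```

Each argument list could then be passed to subprocess.run, with the best score from each run appended to a results list.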
Output
• You will get a list of hits (ligand numbers)
• You can select in advance how many hits you
want to look at – for a database of 2000,
maybe 20 hits is a reasonable number
• You can recover these hits as PDB files from
the docked_pdb folder and view them
docked to your protein
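Picking the hit list amounts to sorting ligands by score and keeping the first few. A minimal Python sketch, assuming the scores are held in a dictionary of ligand number to best ΔG; the function name and the score values are made up for illustration.

```python
def top_hits(scores, n_hits=20):
    """Return (ligand_number, score) pairs, most negative score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:n_hits]


example_scores = {1: -6.2, 2: -8.9, 3: -5.0, 4: -7.4}  # made-up values
best = top_hits(example_scores, n_hits=2)
print(best)
```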
Set up
• Patience!
• We are trying to emulate much more
functional systems
• Expect delays
Preparing your computer
• In the C: directory, copy the folder
VirtualScreen2
• VirtualScreen2 contains most of the files you
will need and many of the folders
Installing Perl
• Google ‘CPAN’ (the site for Perl)
• Download a ‘binary’ for Perl
• For PCs this will probably be ActivePerl
• Install Perl
• Test Perl; get a ‘Command Prompt’ from
start;Programs;accessories;CommandPrompt
• Type: perl -v
• You should get information about perl version
Look at a PDBQT file
• Ligands have torsion (twist and bend) features
• Look in the database folder db_pdbqt
• Look at ligand1.pdbqt
• Open file by right-clicking and using ‘open
with, wordpad’
• ‘BRANCH’ data indicates where ligand1 can
rotate (3 places)
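The same check can be done without reading the file by eye. A minimal Python sketch that counts BRANCH records in PDBQT text; the sample text is a made-up skeleton with atom lines omitted, not the contents of ligand1.pdbqt.

```python
def count_branches(pdbqt_text):
    """Count BRANCH records (ENDBRANCH lines do not start with 'BRANCH')."""
    return sum(1 for line in pdbqt_text.splitlines()
               if line.startswith("BRANCH"))


sample = """ROOT
ENDROOT
BRANCH   1   5
BRANCH   5   9
ENDBRANCH   5   9
ENDBRANCH   1   5
TORSDOF 2
"""
n_branches = count_branches(sample)
print(n_branches)
```

To use it on a real file, read the text first, for example with open("db_pdbqt/ligand1.pdbqt").read().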
Check Vina
• Test files are present in \lm\VirtualScreen
• These are for a receptor and drug ligand
• 2rhnh.pdbqt, carh.pdbqt, config2.txt
• To run Vina type at command prompt:
• \lm\downloads\vina.exe --config config2.txt
• The program takes a minute or so to run
• Test_vina.txt should give a list of energies for 9
alternative docked conformations
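The energies can also be pulled out of the Vina results table programmatically. A minimal Python sketch, assuming the file holds Vina's usual table of mode, affinity and two RMSD columns; the sample text and its values are made up, and whether Test_vina.txt has exactly this layout is an assumption.

```python
def parse_vina_table(text):
    """Return the affinity (kcal/mol) of each docked mode, in table order."""
    energies = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: mode, affinity, rmsd l.b., rmsd u.b.
        if len(parts) == 4 and parts[0].isdigit():
            try:
                energies.append(float(parts[1]))
            except ValueError:
                continue
    return energies


sample = """mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1         -9.2      0.000      0.000
   2         -8.8      1.513      2.117
   3         -8.1      2.004      3.356
"""
energies = parse_vina_table(sample)
print(energies)
```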
Check ligand database
• Go to VirtualScreen2\db_pdbqt directory
• NCI diversity set = about 1800 chemicals
• Parent DB from NCI is called Ncidiv_p0.0
• These are chemicals available from NCI for
testing
• We have about 1800 .pdbqt files, one per
chemical
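A quick way to confirm the database is complete is to count the .pdbqt files. A minimal Python sketch; the function name is made up, and the demonstration uses a temporary folder with three empty files rather than the real db_pdbqt directory.

```python
import tempfile
from pathlib import Path


def count_ligand_files(folder):
    """Count files ending in .pdbqt directly inside 'folder'."""
    return sum(1 for p in Path(folder).iterdir()
               if p.is_file() and p.suffix.lower() == ".pdbqt")


# Demonstration on a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2, 3):
        (Path(tmp) / f"ligand{n}.pdbqt").write_text("")
    (Path(tmp) / "notes.txt").write_text("")
    demo_count = count_ligand_files(tmp)
print(demo_count)
```

On the lab machine the call would be something like count_ligand_files(r"C:\VirtualScreen2\db_pdbqt"), which should report about 1800.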
Target protein
• Much of VirtualScreen2 relies on the target
protein for binding
• A single name (ideally the PDB code) should
be used throughout
• Any name variation will stop the program
Prepare target
• In VirtualScreen2
• Make a new directory with a one_word name
for your target protein (example: 2rht_a)
• In your target directory place two files:
• rech.pdbqt = your receptor/protein; must be
called ‘rech.pdbqt’
• xtal-lig.pdb = a reference ligand that will be
used to define the binding site
• Look in folder 2rht_a to see example
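Because a missing or misnamed file stops the run, it can help to check the target folder first. A minimal Python sketch under the naming rules above; the function name is made up, and the demonstration builds a temporary folder instead of touching VirtualScreen2.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("rech.pdbqt", "xtal-lig.pdb")


def missing_target_files(base_dir, target_name):
    """Return the required file names that are absent from the target folder."""
    target = Path(base_dir) / target_name
    return [name for name in REQUIRED_FILES if not (target / name).is_file()]


# Demonstration: a target folder that has the receptor but no reference ligand.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "2rht_a").mkdir()
    (Path(tmp) / "2rht_a" / "rech.pdbqt").write_text("")
    missing = missing_target_files(tmp, "2rht_a")
print(missing)
```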
Making rech.pdbqt
• Start with your receptor/protein without any
ligand
• Make a copy of the PDB file and delete lines
referring to your ligand 3-letter code
• Save
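Deleting the ligand lines by hand is error-prone for large files, so the step can be scripted. A minimal Python sketch that drops ATOM/HETATM records whose residue name (PDB columns 18-20) matches the ligand code; the helper pdb_line, the residue code "XYZ" and the coordinates are made up for the example.

```python
def pdb_line(record, serial, atom, res, chain, resseq, x, y, z):
    """Format a minimal fixed-column PDB coordinate record (illustration only)."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def remove_residue(pdb_text, res_name):
    """Return the PDB text without coordinate records for 'res_name'."""
    kept = []
    for line in pdb_text.splitlines():
        is_coord = line.startswith(("ATOM", "HETATM"))
        if is_coord and line[17:20].strip() == res_name:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = "\n".join([
    pdb_line("ATOM", 1, "CA", "SER", "A", 10, 0.0, 0.0, 0.0),
    pdb_line("HETATM", 2, "C1", "XYZ", "A", 301, 3.0, 0.0, 0.0),
    "END",
])
cleaned = remove_residue(sample, "XYZ")
print(cleaned)
```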
Making your rech.pdbqt file
• Add hydrogens
• There are two methods
• Open your protein in DS Viewer
• Click on ‘tools’ then ‘hydrogens’, ‘add’
• You should see H’s added
• Or use OpenBabel on the Command Line
• Babel.exe -ipdb 2nht.pdb -opdb 2nhtH.pdb -h
• (substitute the name of your protein)
Making your rech.pdbqt file
• Now convert the PDB file to PDBQT, adding
hydrogen bonding information
• Use MGLtools (AutoDock tools)
• Install if you do not have it
• Start program; you will get a window
• In the middle of the lower bar is ‘Grid’
• Click ‘Macromolecule’ on the menu and open
your pdb+hydrogens file.
• Then choose ‘output’ and save as a .pdbqt file
Making your rech.pdbqt file
• The file should be ready at this point
• Check that the file contains hydrogens (only polar
hydrogens are included)
• Check that the file has hydrogen bonding info on
the right margin, with entries like HD
(hydrogen bond donor hydrogen), OA (oxygen
hydrogen bond acceptor) or C (carbon, no hydrogen bonding role)
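This check can be automated by tallying the atom type at the end of each coordinate line. A minimal Python sketch, assuming the atom type is the last whitespace-separated field of each ATOM/HETATM line; the sample lines and their charges are made up.

```python
from collections import Counter


def count_atom_types(pdbqt_text):
    """Tally the last field (atom type) of every ATOM/HETATM line."""
    types = Counter()
    for line in pdbqt_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            types[line.split()[-1]] += 1
    return types


sample = """ATOM      1  N   SER A   1      10.000  10.000  10.000  1.00  0.00    -0.066 N
ATOM      2  HN  SER A   1      10.500  10.800  10.200  1.00  0.00     0.275 HD
ATOM      3  CA  SER A   1      11.200   9.300  10.100  1.00  0.00     0.100 C
ATOM      4  OG  SER A   1      12.000   9.900  11.500  1.00  0.00    -0.398 OA
"""
types = count_atom_types(sample)
print(dict(types))
```

A receptor file with no HD or OA entries at all would suggest that the hydrogen or conversion step did not work.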
Reference ligand
• The reference ligand PDB file serves only one
purpose:
• It defines the region of the protein that Vina
will search
• If the ligand is in the wrong place, Vina will
search the wrong place.
• Copy the ligand from a trusted protein-ligand
complex file
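One common way to turn a reference ligand into a search region is to take the bounding box of its atoms and add some padding. The Python sketch below illustrates that idea; how Virtual2.pl actually derives the box is not shown in these slides, so the 5 Å padding, the function names and the two sample atoms are assumptions. center_x/size_x and their y and z counterparts are the standard Vina option names.

```python
def pdb_line(record, serial, atom, res, chain, resseq, x, y, z):
    """Format a minimal fixed-column PDB coordinate record (illustration only)."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def search_box(ligand_pdb_text, padding=5.0):
    """Box centre and size (in Angstroms) enclosing the ligand plus padding."""
    coords = [
        (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        for line in ligand_pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]
    if not coords:
        raise ValueError("no ATOM/HETATM records found")
    box = {}
    for axis, values in zip("xyz", zip(*coords)):
        low, high = min(values), max(values)
        box[f"center_{axis}"] = round((low + high) / 2, 3)
        box[f"size_{axis}"] = round(high - low + 2 * padding, 3)
    return box


ligand = "\n".join([
    pdb_line("HETATM", 1, "C1", "LIG", "A", 1, 0.0, 0.0, 0.0),
    pdb_line("HETATM", 2, "C2", "LIG", "A", 1, 4.0, 2.0, 6.0),
])
box = search_box(ligand)
print(box)
```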
Editing the Virtual2.pl script
• Information on how the virtual screen should
run is included in the script
• You must tell the script what to do
• At runtime this information is used
VS adjustable features
• Edit Virtual2.pl
• You can adjust:
• Target_name – must match a folder name
• Filenum (file number) – use new number to
avoid deleting previous experiments
• Number of ligands to screen – use ‘stop’ and
‘start’
Target_name
• $target_name defines the target for analysis
• It should = the name of the folder that holds
rech.pdbqt
• E.g. $target_name = "2rht_a"; for the example search
• There is a folder called 2rht_a that matches
and has the files needed for the search
Number of ligands
• You can adjust the start and stop point for
searching the database
• Do only 5 to start; 1800 may take days on
your machine (21 hours on my machine)
• Time how long 5 ligands take and multiply
by 360 (1800 / 5) to estimate the
time required for the whole database
• The database can be split up using ‘stop’ and
‘start’ and run at different times
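The time estimate and the split into batches are simple arithmetic. A minimal Python sketch; the function names, the 3.5 minute timing and the batch size of 600 are made-up examples, and whether the script treats ‘stop’ as inclusive is not stated here, so the ranges are shown as half-open (start, stop) pairs.

```python
def estimate_total_hours(minutes_for_sample, sample_size=5, total_ligands=1800):
    """Scale a timed sample run up to the whole database, in hours."""
    return minutes_for_sample * (total_ligands / sample_size) / 60


def split_range(total_ligands, chunk_size):
    """Return (start, stop) pairs that cover 0..total_ligands in batches."""
    return [(start, min(start + chunk_size, total_ligands))
            for start in range(0, total_ligands, chunk_size)]


hours = estimate_total_hours(3.5)   # made-up timing: 3.5 minutes for 5 ligands
batches = split_range(1800, 600)
print(hours)
print(batches)
```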
Editing the script
• Right click on Virtual2.pl and choose open with
Wordpad
• At the top of the script is information
• The section labeled for editing can be changed
• If you are going to make big changes, save a copy
of the original script
• You must enter the name of your protein exactly
as the folder is named
• Edit carefully, do not delete #’s or ;’s
Before you begin VS
• Have you set the number of ligands to 5? (0-5)
• This should take 3 – 30 minutes (you should
time it)
• If something goes wrong the first time (it
usually does) no harm done.
• To stop the program, use ctrl-C (repeat if
necessary)
Running VS
• Get a command prompt
(start;programs;accessories;command
prompt)
• Type: cd \virtualscreen2
• (this gets you to the right directory if needed)
• Type: virtual2.pl
• The program should run and stop in less than
an hour if you are doing 5 ligands (2-10
minutes is likely)
Looking at the results
• The results are in the vs_log folder
(\virtualscreen2\vs_log)
• The output file has the file numbers of the
hits, ranked from best to worst.
• Results files are marked with filenum to avoid
overwriting
• Sample file: 2rht_a_results2.txt
Looking at hits
• Open your hits results file or open the
example file 2rht_a_results.txt
• The predicted ΔG of binding is shown and the
ligand number
• A more negative ΔG indicates tighter binding
• The average ΔG for all ligands is shown
• For my data, ligand 438 is best
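To see why a more negative score means tighter binding, the standard relation ΔG = RT ln Kd can be used to convert a free energy into a dissociation constant. A minimal Python sketch, assuming 298.15 K; the function name and the example energies are made up, and docking scores are rough estimates, so the resulting Kd values are only order-of-magnitude guides.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g_to_kd(delta_g, temperature=298.15):
    """Dissociation constant (mol/L) implied by a binding free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature))


for dg in (-6.0, -8.0, -10.0):  # made-up example scores
    print(f"{dg:6.1f} kcal/mol -> Kd ~ {delta_g_to_kd(dg):.1e} M")
```

Each additional 1.4 kcal/mol or so of binding energy corresponds to roughly a tenfold smaller Kd at room temperature.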
Looking at one ligand
• We can look at the best hit from 2rht_a
• In db_pdb look for ligand438.pdb, the best hit
for the example
• (db_pdb contains un-docked molecules)
• Look at this file with RasMol
• It has a symmetric set of fused rings; this
type of molecule is usually an artefact that binds
to everything, so other hits may be better
Looking for a good pose
• A ‘pose’ is a ligand conformation bound to a
protein
• To view the conformation of a docked ligand
after VS, look in the docked_pdb folder
• These files can also be added to a protein file
to view docking
• Save molecules you like, because they can be
overwritten
Viewing complexes
• The ligand .pdb file contents can be spliced
onto the end of a copy of the receptor file
used in virtual screening
• The complex can be viewed in RasMol
• Especially note what receptor residues the
ligand contacts
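The splicing step can be done in a text editor or scripted. A minimal Python sketch that appends the ligand's coordinate records to the receptor records and closes the file with END; renaming the ligand residue to LIG, the function names and the sample atoms are assumptions for the example.

```python
def pdb_line(record, serial, atom, res, chain, resseq, x, y, z):
    """Format a minimal fixed-column PDB coordinate record (illustration only)."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def splice_complex(receptor_text, ligand_text, lig_name="LIG"):
    """Receptor records, then ligand records renamed to lig_name, then END."""
    out = [line for line in receptor_text.splitlines()
           if line.strip() and not line.startswith("END")]
    for line in ligand_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Residue name occupies PDB columns 18-20.
            out.append("HETATM" + line[6:17] + f"{lig_name:>3}" + line[20:])
    out.append("END")
    return "\n".join(out) + "\n"


receptor = pdb_line("ATOM", 1, "CA", "SER", "A", 10, 0.0, 0.0, 0.0) + "\nEND\n"
ligand = pdb_line("ATOM", 1, "C1", "UNK", "A", 1, 3.0, 0.0, 0.0) + "\n"
complex_text = splice_complex(receptor, ligand)
print(complex_text)
```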
Ligand – protein contacts
• Splice ligand onto receptor in PDB file
• Ligand should be named LIG in PDB file
• Run contact12.pl script
• Example:
• contact12.pl 2rht_lig438.pdb LIG
• Contacts appear on screen and in file
‘contact_output.txt’
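The idea behind a contact listing can be sketched as a distance search. This Python example is not contact12.pl, whose method is not shown in these slides; it simply reports receptor residues that have any atom within an assumed 4 Å of any ligand atom. The function names, the cutoff and the sample atoms are made up for illustration.

```python
import math


def pdb_line(record, serial, atom, res, chain, resseq, x, y, z):
    """Format a minimal fixed-column PDB coordinate record (illustration only)."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def find_contacts(complex_text, lig_name="LIG", cutoff=4.0):
    """Return sorted (residue name, chain, number) within cutoff of the ligand."""
    ligand, receptor = [], []
    for line in complex_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residue = (line[17:20].strip(), line[21], int(line[22:26]))
        (ligand if residue[0] == lig_name else receptor).append((residue, xyz))
    contacts = set()
    for residue, atom_xyz in receptor:
        if any(math.dist(atom_xyz, lig_xyz) <= cutoff for _, lig_xyz in ligand):
            contacts.add(residue)
    return sorted(contacts)


complex_text = "\n".join([
    pdb_line("ATOM", 1, "OG", "SER", "A", 10, 0.0, 0.0, 0.0),
    pdb_line("ATOM", 2, "CB", "LEU", "A", 50, 20.0, 0.0, 0.0),
    pdb_line("HETATM", 3, "C1", "LIG", "A", 301, 3.0, 0.0, 0.0),
    "END",
])
contacts = find_contacts(complex_text)
print(contacts)
```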
The role of good judgment
• The value of virtual screening is that one can
go from thousands or millions of candidate
drugs with 0.01% - 0.1% leads to tens or
hundreds of hits with 1% -10% leads
• Hits are not leads
• They are a step toward getting leads