In silico drug design
Dinesh Gupta
Structural and Computational Biology Group, ICGEB

Modern drug discovery process
[Figure: pipeline of target identification, target validation, lead identification, lead optimization and preclinical phase, with timeline labels "2-5 years", "Drug discovery" and "6-9 years"]
• Drug discovery is an expensive process involving high R&D costs and extensive clinical testing.
• A typical development time is estimated at 10-15 years.

Drug discovery technologies
• Target identification: genomics, gene expression profiling and proteomics
• Target validation: gene knock-out, inhibition assays
• Lead identification: high-throughput screening, fragment-based screening, combinatorial libraries
• Lead optimization: medicinal-chemistry-driven optimization, X-ray crystallography, QSAR, ADME profiling (bioavailability)
• Preclinical phase: pharmacodynamics (PD), pharmacokinetics (PK), ADME and toxicity testing in animals
• Clinical phase: human trials

Rational approach to drug discovery
• Identify and validate target
• Clone gene encoding target
• Express target
• Crystal structures/MM of target and target/inhibitor complexes
• Identify lead compounds
• Synthesize modified lead compounds
• Toxicity and pharmacokinetic studies
• Preclinical trials

Bioinformatics tools in DD
• Comparison of sequences: identify targets
• Homology modelling: active site prediction
• Systems biology: identify targets
• Databases: manage information
• In silico screening (ligand based, receptor based): iterative steps of molecular docking
• Pharmacogenomic databases: assist with safety-related issues

Currently used drug targets
[Figure: currently used drug targets, from J. Drews, Science 287, 1960-1964 (2000); published by AAAS]
• This information is used by bioinformaticians to narrow the search within the target groups.

In silico methods in drug discovery
• Molecular docking
• Virtual high-throughput screening (vHTS)
• QSAR (quantitative structure-activity relationship)
• Pharmacophore mapping
• Fragment-based screening

Molecular docking
• Docking is the computational determination of the binding affinity between molecules (a protein structure and a ligand).
• Given a protein and a ligand, find the binding free energy of the complex formed by docking them.
[Figure: ligand (L) and receptor (R) forming a complex]

Molecular docking: classification
• Docking, or computer-aided drug design, can be broadly classified into:
  - Receptor-based methods, which make use of the structure of the target protein.
  - Ligand-based methods, which are based on the known inhibitors.

Receptor-based methods
• Use the 3D structure of the target receptor to search for candidate compounds that can modulate the target's function.
• Each compound in the chemical database is docked into the binding site of the target, and the electrostatic fit between them is predicted.
• The compounds are ranked using an appropriate scoring function, chosen so that the scores correlate with binding affinity.
• Receptor-based methods have been applied successfully to many targets.

Ligand-based strategy
• In the absence of structural information on the target, ligand-based methods make use of the information provided by known inhibitors of the target receptor.
• Structures similar to the known inhibitors are identified from chemical databases by a variety of methods.
• Widely used methods include similarity and substructure searching, pharmacophore matching and 3D shape matching.
• Numerous successful applications of ligand-based methods have been reported.
[Figure: ligand-based strategy: known actives are used to search a database for similar compounds; structures found]

Binding free energy
• Binding free energy is calculated as the sum of the following energies:
  - Electrostatic energy
  - van der Waals energy
  - Internal energy change due to flexible deformations
  - Translational and rotational energy
• The lower the binding free energy of a complex, the more stable it is.

Basic binding mechanism
Complementarity between the ligand and the binding site:
• Steric complementarity, i.e. the shape of the ligand is mirrored in the shape of the binding site.
• Physicochemical complementarity

Components of molecular docking
A) Search algorithm
• Finds the best conformation of the ligand and protein system.
• Rigid and flexible docking
B) Scoring function
• Ranks the ligands according to their interaction energy.
• Based on an energy force-field function.

Success with vHTS
• Dihydrofolate reductase inhibitor (1992)
• HIV protease (1992)
• Phospholipase A2 (1994)
• Thrombin (1996)
• Carbonic anhydrase inhibitors (2002)

Virtual high-throughput screening
• Less expensive than high-throughput screening (HTS)
• Faster than conventional screening
• Scans a large number of potential drug-like molecules in very little time.
• HTS itself is a trial-and-error approach, but it can be complemented by virtual screening.

QSAR
• QSAR is a statistical approach that attempts to relate the physical and chemical properties of molecules to their biological activities.
• Descriptors such as molecular weight, number of rotatable bonds and LogP are commonly used.
• Many QSAR approaches are in practice, classified by the dimensionality of the data.
• They range from 1D QSAR to 6D QSAR.

Pharmacophore mapping
• A pharmacophore map is a 3D description of a pharmacophore, developed by specifying the nature of the key pharmacophoric features and the 3D distance map among all the key features.
• A pharmacophore map can be generated by superposition of active compounds to identify their common features.
• Based on the pharmacophore map, either de novo design or 3D database searching can be carried out.

Modeling and informatics in drug design
Increased application of structure-based drug design is facilitated by:
• Growth in the number of targets
• Growth in 3D structure determination (PDB database)
• Growth in computing power
• Growth in the prediction quality of protein-compound interactions

Summary: role of bioinformatics?
• Identification of homologs of functional proteins (motifs, protein families, domains)
• Identification of targets by cross-species examination
• Visualization of molecular models
• Docking, vHTS
• QSAR, pharmacophore mapping

Example: use of bioinformatics in drug discovery
Identification of novel drug targets against human malaria

Malaria: a global problem!
• Malaria causes at least 500 million clinical cases and more than one million deaths each year.
• A child dies of malaria every 30 seconds.
• Of the four Plasmodium species causing human malaria, P. falciparum poses the most serious threat because of its virulence, prevalence and drug resistance.
• Malaria takes an economic toll, cutting economic growth rates by as much as 1.3% in countries with high disease rates.
• There are four types of human malaria:
  - Plasmodium falciparum
  - Plasmodium vivax
  - Plasmodium malariae
  - Plasmodium ovale
• Approximately half of the world's population is at risk of malaria, particularly those living in lower-income countries.
• Today, there are 109 malaria-affected countries in 4 regions.

Chemical structures of drugs widely used for treatment of malaria
a) Chloroquine
b) Quinine
c) Artemether
d) Sodium artesunate
e) Dihydroartemisinin
f) Pyrimethamine
g) Sulfadoxine
h) Mefloquine
i) Halofantrine
j) Primaquine
k) Tafenoquine
l) Chlorproguanil
m) Dapsone
Source: http://malaria.who.int/docs/adpolicy_tg2003.pdf

Problems with the existing drugs
• Drug resistance is the most common problem.
• Adverse effects (shock and cardiac arrhythmias caused by chloroquine)
• Poor patient compliance (quinine tastes very unpleasant and causes dizziness, nausea etc.)
• High cost of production for some effective drugs (atovaquone)
• There is an urgent need to identify novel drug targets that lead to effective and affordable drugs.

Strategies for drug target identification in P. falciparum
• Parasite culture for functional assays is difficult and expensive, making computational approaches more relevant.
• Malaria remains a neglected disease: very few stakeholders!
• Availability of the genomic data of P. falciparum and H. sapiens has facilitated the effective application of comparative genomics.
• Comparative genomics helps in identifying and exploiting features that differ between the host and the parasite.
• Identifying metabolic pathways specific to P. falciparum and targeting their crucial proteins is an attractive approach to target-based drug discovery.

Comparison of proteomes helps in identifying important indispensable parasite proteins
[Figure: comparison of the predicted proteomes of A. gambiae, P. falciparum and H. sapiens]
• Out of 5334 predicted proteins in P. falciparum, 60% did not show any similarity to known proteins.
• Hence, assigning a physiological functional role to these hypothetical proteins using bioinformatics approaches remains a challenge.
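
The proteome comparison above can be illustrated with a small filter over BLASTP results: parasite proteins that have no sufficiently similar host protein are kept as candidates. This is only a minimal sketch, not the pipeline used in the study. It assumes BLAST tabular output in which the first three tab-separated columns are query ID, subject ID and percent identity; the function name, the 30% cutoff and all protein identifiers are hypothetical.

```python
def parasite_specific(all_ids, blast_tabular, max_identity=30.0):
    """Return parasite protein IDs whose best host hit is below max_identity.

    Proteins with no hit at all are treated as having 0% identity.
    The 30% default is an illustrative value, not taken from the slides.
    """
    best = {}
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, identity = fields[0], float(fields[2])
        best[query] = max(best.get(query, 0.0), identity)
    return sorted(p for p in all_ids if best.get(p, 0.0) < max_identity)


# Hypothetical identifiers and hits, for illustration only.
parasite_ids = ["PF_A", "PF_B", "PF_C"]
hits = "\n".join([
    "PF_A\tHS_1\t72.5\t300\t80\t2\t1\t300\t5\t304\t1e-90\t410",
    "PF_B\tHS_2\t28.1\t150\t100\t6\t10\t160\t20\t170\t0.002\t38",
])
candidates = parasite_specific(parasite_ids, hits)
print(candidates)
```

In this toy input, PF_A has a close human homolog and is dropped, while PF_B (weak hit) and PF_C (no hit) remain as candidates for the literature and function checks.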

Novel drug target identification in P. falciparum
[Flowchart: comparative genomics studies]
• BlastP of P. falciparum proteins against the human proteome
• Relational database of homology models (~40% identity threshold for three-dimensional modeling)
• 476 P. falciparum proteins
• Large set of proteins with no/low similarity
• Literature search for all these proteins; check for physiological and biochemical functions, etc.
• Putative drug targets in P. falciparum

Proteasome machinery (ClpQY and ClpAP) in P. falciparum
Targets identified by comparison of protein models
• Identification of two proteasomal proteins of prokaryotic origin that are not present in the host.
• Protein degradation is an important process in parasite development inside host RBCs.

Eukaryotic and prokaryotic proteasome machinery
26S proteasome: eukaryotic type
• 19S regulatory + 20S proteolytic particle
• Present only in eukaryotes and archaea
• Degrades ubiquitinated proteins
• More than 20 different proteins involved
ClpQY system: prokaryotic type
• ClpY cap + ClpQ core particle
• Present only in prokaryotes
• No ubiquitination in prokaryotes
• Substrate specificity is not known
• Only two proteins: ClpQ and ClpY
[Figure: substrate protein enters the ClpY-ClpQ-ClpY complex and leaves as peptides]

ATP-dependent protease machinery ClpQY (PfHslUV system)
• The HslUV complex in prokaryotes is composed of HslV, a threonine protease, and HslU, the ATP-dependent component, a chaperone of the Clp/Hsp100 family.
• HslV (ClpQ) subunits are arranged as two stacked hexameric rings and are capped by HslU (ClpY) hexamers at both ends.
• In an ATP-dependent process, the HslU (ClpY) hexamer recognizes and unfolds peptide substrates and translocates them into HslV for degradation.
[Figures: crystal structure of the HslUV complex in H. influenzae; PfClpQY complex model in P. falciparum]

PfClpQ component
MFIRNFVNIIGSQKSITKTIARNYFSDNSKLIIPRHGTTILCVRKNN
EVCLIGDGMVSQGTMIVKGNAKKIRRLKDNILMGFAGATADCFTLLDKFETKIDEYPNQL
LRSCVELAKLWRTDRYLRHLEAVLIVADKDILLEVTGNGDVLEPSGNVLGTGSGGPYAMA
AARALYDVENLSAKDIAYKAMNIAADMCCHTNNNFICETL
• Length: 207 aa full length (170 aa for the mature, active protein)
• Pro-domain: 37 aa
Important motifs found:
• TT at the N-terminus of the mature protein
• GSGG, a common chymotrypsin protease signal
• Lys(28) and Arg(35) are two conserved amino acids that play some role in the activity.

Homologs of PfClpQ protein in other Plasmodium spp.
PK_ClpQ  TTILCVRKNNEVCLIGDGMVSQGTMIVKGNAKKIRRLKDNILMGFAGATADCFTLLDKFE
PV_ClpQ  TTILCVRKNNEVCLIGDGMVSQGTMIVKGNAKKIRRLKDNILMGFAGATADCFTLLDKFE
PF_ClpQ  TTILCVRKNNEVCLIGDGMVSQGTMIVKGNAKKIRRLKDNILMGFAGATADCFTLLDKFE
PY_ClpQ  TTILCVRKNNEVCLIGDGMVSQGTMIVKGNAKKIRRLKDNILMGFAGATADCFTLLDKFE
PB_ClpQ  TTILCVRKNNEVCLIGDGMVSQGTMIVKGNAKKIRRLKDNILMGFAGATADCFTLLDKFE
         ************************************************************

PK_ClpQ  TKIDEYPDQLLRSCVELAKLWRTDRYLRHLEAVLIVADKDVLLEVTGNGDVLEPSGNVLG
PV_ClpQ  TKIDEYPDQLLRSCVELAKLWRTDRYLRHLEAVLIVADKDVLLEVTGNGDVLEPSGNVLG
PF_ClpQ  TKIDEYPNQLLRSCVELAKLWRTDRYLRHLEAVLIVADKDILLEVTGNGDVLEPSGNVLG
PY_ClpQ  TKIDEYPDQLLRSCVELAKLWRTDRYLRHLEAVLIVADKDTLLEVTGNGDVLEPSGNVLG
PB_ClpQ  TKIDEYPDQLLRSCVELAKLWRTDRYLRHLEAVLIVADKDTLLEVTGNGDVLEPSGNVLG
         *******:******************************** *******************

PK_ClpQ  TGSGGPYAIAAARALYDVENLSAKDIAYKAMNIAADMCCHTNNNFICETL
PV_ClpQ  TGSGGPYAIAAARALYDVENLSAKDIAYKAMNIAADMCCHTNNNFICETL
PF_ClpQ  TGSGGPYAMAAARALYDVENLSAKDIAYKAMNIAADMCCHTNNNFICETL
PY_ClpQ  TGSGGPYAMAAARALYDIENLSAKDIAYKAMNIAADMCCHTNHNFICETL
PB_ClpQ  TGSGGPYAIAAARALYDIENLSAKDIAYKAMNIAADMCCHTNHNFICETL
         ********:********:************************:*******

Homology modeling of PfClpQ
[Figure labels: PfClpQ, 1kyi]
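
The sequence features quoted for PfClpQ (207 aa, a 37 aa pro-domain, TT at the mature N-terminus, the GSGG motif, Lys(28) and Arg(35)) can be checked directly against the sequence shown above. The following is a minimal sketch; the helper names are hypothetical, and it assumes that the residue numbers on the slide refer to the mature protein.

```python
# PfClpQ sequence as shown on the slide (full length, including pro-domain).
PFCLPQ = (
    "MFIRNFVNIIGSQKSITKTIARNYFSDNSKLIIPRHGTTILCVRKNN"
    "EVCLIGDGMVSQGTMIVKGNAKKIRRLKDNILMGFAGATADCFTLLDKFETKIDEYPNQL"
    "LRSCVELAKLWRTDRYLRHLEAVLIVADKDILLEVTGNGDVLEPSGNVLGTGSGGPYAMA"
    "AARALYDVENLSAKDIAYKAMNIAADMCCHTNNNFICETL"
)
PRO_DOMAIN_LEN = 37  # pro-domain length stated on the slide


def split_pro_domain(seq, pro_len):
    """Split a precursor sequence into (pro-domain, mature protein)."""
    return seq[:pro_len], seq[pro_len:]


def find_motif(seq, motif):
    """Return the 1-based start positions of every occurrence of motif."""
    width = len(motif)
    return [i + 1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


def residue(seq, position):
    """Return the residue at a 1-based position."""
    return seq[position - 1]


pro, mature = split_pro_domain(PFCLPQ, PRO_DOMAIN_LEN)
print("full length:", len(PFCLPQ), "mature:", len(mature))
print("mature N-terminus:", mature[:2])
print("GSGG start(s) in mature protein:", find_motif(mature, "GSGG"))
# Assumption: Lys(28) and Arg(35) use mature-protein numbering.
print("residue 28:", residue(mature, 28), "residue 35:", residue(mature, 35))
```

The same helpers could be applied to each row of the Plasmodium alignment to confirm that the motifs are present in every species.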

Conservation of catalytic residues
• Catalytic residues: S125-G45-T1-K33
[Figure: structural alignment of PfClpQ and HslV (H. influenzae)]

Homology modeling of PfClpQ
[Figure: multiple sequence alignment of E. coli, S. enterica, H. influenzae, X. campestris, W. pipientis, P. falciparum, T. brucei, T. cruzi and L. infantum]
• Most of the residues conserved across the different bacterial species were either identical or similar in PfClpQ.

Biochemical characterization of PfClpQ protein
Protease activity assay for PfClpQ protein (fluorogenic peptide substrates, fluorescence readout)
• Threonine protease-like activity: substrate Cbz-GGL-AMC, inhibitor lactacystin
• Chymotrypsin-like activity: substrate Suc-LLVY-AMC, inhibitor chymostatin
• Peptidyl glutamyl hydrolase activity: substrate Z-LLE-AMC, inhibitor MG132
[Figure: plots of AMC released (m moles) against time and against substrate concentration (mM); Km values shown on the plots: 19.18 mM, 58.22 mM and 37.79 mM]

In silico identification of novel inhibitors against PfClpQ, a novel drug target of P. falciparum, by high-throughput docking
• Drug-like compound library (1,000,00) and PfClpQ
• Molecular docking: ligand docked into the protein's active site
• Top 100 solutions
• Out of the top 40, only 10 compounds were available for purchase

ClpQ interaction with ligand identified by virtual screening
[Figure: residues labelled Phe46, Gly49, Gly48, Arg36, Thr2, Thr50, Val21 and Ser22; shown alongside the crystal structure of HslV complexed with a vinyl sulfone inhibitor]

Compound   GOLD score   FlexX score
1          52.54        -25.14
2          54.76        -17.37
3          54.66        -24.43
4          52.84        -24.47
[Table column "Chemical structure": images not reproduced]

Identification of P. falciparum ClpY (PfClpY) gene
• ClpY is a regulatory component of the ClpQY system.
• ClpY recognizes the substrate, unfolds it and feeds it into the degradation machine (ClpQ).
• Belongs to the AAA+ family of proteins.
• PfClpY (~1.3 kb) contains all three ClpY domains: N, I and C.
[Figure: PfClpY domain map showing the ATPase domain with Walker A and Walker B motifs and the N-domain, I-domain and C-domain]

Homology of PfClpY protein with homologs in other organisms
• Variation in the I domain plays a role in the recognition of different substrates.

Targeting the ClpQY interaction
[Figures: crystal structure of HslUV in H. influenzae; modeled ClpQY interaction in P. falciparum]
J Biomol Struct Dyn. 2009 Feb;26(4):473-9

Identification of drug targets using interaction networks
• Extract the microarray data from NCBI GEO.
• Normalize if necessary; otherwise prepare Excel files for WGCNA analysis.
• Excel sheet of normalized data and gene significance.
• Analyse these files in R using the R package "WGCNA".
• Find hub genes and modules that can be used as drug targets by referring to these networks.
• Visualize the networks with different graphs and software in R.
• The principle behind constructing the network is that genes which are co-expressed are related and can be connected to make a network, using the Pearson correlation coefficient.
• These networks can be used for finding drug targets.
• They can also be used for annotation of proteins and genes by comparing them through interactome studies.
• They can be used for pathway annotation better than other studies, as they are based on microarray data.

Tools used:
• Sequence analysis: pairwise and multiple sequence alignments, Pfam
• Molecular modelling: Modeller
• Docking: Tripos FlexX, GOLD, ArgusLab
• PP network: R package and VisANT

Molecular docking hands-on
• Download and install ArgusLab on Windows
• Load a PDB file and practice with the ArgusLab tools
• Follow the tutorial at http://www.arguslab.com/tutorials/tutorial_docking_1.htm

Molecular docking using ArgusLab
Example: benzamidine inhibitor docked into beta-trypsin
• Create a binding site from the bound ligand
• Set the docking parameters
• Analyze the docking results
• Polypeptide builder
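
The step "create a binding site from the bound ligand" can be pictured as selecting every protein residue that has an atom within some distance of the ligand. The sketch below illustrates that idea only; it is not ArgusLab's implementation. It assumes the standard fixed-column PDB layout for ATOM/HETATM records; the 4.0 Å cutoff, the function names and the toy coordinates are hypothetical and do not come from a real structure.

```python
import math


def pdb_line(record, serial, name, res, chain, seq, x, y, z):
    """Build a fixed-column PDB ATOM/HETATM line (toy data helper)."""
    return (f"{record:<6}{serial:>5} {name:<4} {res:>3} {chain}{seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_pdb_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed PDB column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "hetero": line.startswith("HETATM"),
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def binding_site_residues(atoms, ligand_name, cutoff=4.0):
    """Return protein residues with any atom within cutoff (Å) of the ligand."""
    ligand = [a for a in atoms if a["hetero"] and a["res_name"] == ligand_name]
    site = set()
    for atom in atoms:
        if atom["hetero"]:
            continue  # only protein atoms can define site residues
        if any(math.dist(atom["xyz"], lig["xyz"]) <= cutoff for lig in ligand):
            site.add((atom["chain"], atom["res_name"], atom["res_seq"]))
    return sorted(site, key=lambda r: (r[0], r[2]))


# Toy coordinates, invented for illustration; "BEN" stands in for the ligand.
toy_pdb = "\n".join([
    pdb_line("ATOM", 1, "CA", "GLY", "A", 19, 10.0, 10.0, 10.0),
    pdb_line("ATOM", 2, "CG", "ASP", "A", 189, 3.0, 0.0, 0.0),
    pdb_line("ATOM", 3, "OG", "SER", "A", 195, 0.0, 3.5, 0.0),
    pdb_line("HETATM", 4, "C1", "BEN", "A", 301, 0.0, 0.0, 0.0),
])
site = binding_site_residues(parse_pdb_atoms(toy_pdb), "BEN")
print(site)
```

A larger cutoff gives a more generous site definition; docking programs usually expose a comparable setting, so the value here should be treated as a tunable parameter.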