Thinking Outside the Box: Applications Including Finding Off-targets for Major Pharmaceuticals
Philip E. Bourne
pbourne@ucsd.edu

Agenda
• Overall Theme - Thinking differently about proteins:
– Spherical harmonics and phylogeny
– The Gaussian Network Model and new modes of motion
– The Geometric Potential for Describing Ligand Binding Sites
– SOIPPA for finding off-site targets

The Curse of the Ribbon
The conventional view of a protein (left) has had a remarkable impact on our understanding of living systems, but it's time for new views. It is not how a ligand sees a protein, after all.

Limitations
• A local viewpoint does not capture the global properties of a protein
• Cartesian coordinates do not necessarily capture the properties of the protein
• Comparative analysis is limited

Agenda – Spherical harmonics and phylogeny

Protein Kinase A – Open Book View
Scheeff & Bourne 2005 PLoS Comp. Biol. 1(5): e49

Superfamily Members – The Same But Different
Scheeff & Bourne 2005 PLoS Comp. Biol. 1(5): e49

An Alternative Approach: Multipolar Representation
• Roots in spherical harmonics
• Parameter space and boundary conditions can be a variety of properties
• Order of the multipoles defines the granularity of the descriptors
• Bottom line – interpreted as shape descriptors
Gramada & Bourne 2006 BMC Bioinformatics 7:242

Geometric Comparison Does Not Reflect Biological Reality
Gramada & Bourne 2006 BMC Bioinformatics 7:242

Results – Protein Kinase Like Superfamily Alignment
Clear distinction between families. Some clustering is seen inside the TPKs that resembles the various groups, even though there is little shape discrimination at this level.
Gramada & Bourne 2006 BMC Bioinformatics 7:242

Results – Protein Kinase Like Superfamily Alignment
Gramada & Bourne 2006 BMC Bioinformatics 7:242

Possibilities – Structure Based Phylogenetic Analysis
[Figure panels: Scheeff & Bourne; Multipoles]
Gramada & Bourne 2007 PLoS ONE submitted

Agenda – The Gaussian Network Model and new modes of motion

Protein Motion
[Figure panels: Ordered Structures; Disordered Structures]
Structures exist in a spectrum from order to disorder
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7): e90

Obtaining Protein Dynamic Information
Protein Structures Treated as a 3-D Elastic Network
Bahar, I., A.R. Atilgan, and B. Erman. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design, 1997. 2(3): p. 173-181.

Gaussian Network Model
• Each Cα is a node in the network.
• Each node undergoes Gaussian-distributed fluctuations influenced by neighboring interactions within a given cutoff distance (7 Å).
• Protein fluctuation is decomposed into a summation of different modes.

Functional Flexibility Score
• Utilize correlated movements to help define regional flexibility with functional importance.

Functionally Flexible Score
For each residue:
1. Find the maximum and minimum correlation.
2. Use these to scale the normalized fluctuation to determine functional importance.
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7): e90

Identifying FFRs in HIV Protease
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7): e90

Other Examples: BPTI and Calmodulin
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol.
2(7): e90

Side Note: Gaussian Network Model vs Molecular Dynamics
• GNM is relatively coarse grained
• GNM is fast to compute vs MD
– Look over larger time scales
– Suitable for high throughput

Agenda – The Geometric Potential for Describing Ligand Binding Sites

Motivation
• What if we can characterize a protein-ligand binding site from a 3D structure (primary site) and search for that site on a proteome-wide scale?
• We could perhaps find alternative binding sites (secondary sites) for existing pharmaceuticals.
• We could use it for lead optimization and possible ADME/Tox prediction

Background – PDB Contains Major Pharmaceuticals Bound to Receptors
Generic Name   Other Name           Treatment                             PDBid
Lipitor        Atorvastatin         High cholesterol                      1HWK, 1HW8…
Testosterone   Testosterone         Osteoporosis                          1AFS, 1I9J ..
Taxol          Paclitaxel           Cancer                                1JFF, 2HXF, 2HXH
Viagra         Sildenafil citrate   ED, pulmonary arterial hypertension   1TBF, 1UDT, 1XOS..
Digoxin        Lanoxin              Congestive heart failure              1IGJ

Background – Superfamily (Derived from Structure) Covers 38% of the Human Proteome
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY

Background – Advantage to Using Functional Site Similarity
Small Molecule Similarity
• Poor correlation between structure and activity
• Infinite chemical space
Protein Sequence/Structure Similarity
• Not adequately reflecting functional relationship
• Not directly addressing drug design problem
Protein Functional Site Similarity
• Build closer structure-function relationships
• Limit chemical space through co-evolution

Overview of Algorithm
Protein structure is represented with Cα atoms only and is characterized with a geometric potential
• tolerant to protein flexibility and model uncertainty
Optimum superimposition is achieved with a maximum-weight sub-graph algorithm with geometric constraints
• sequence order independent to detect cross-fold relationships
• to identify sub-site similarity
Functional site similarity is measured with both evolutionary correlation and physicochemical similarity
• to distinguish divergent and convergent evolution
Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9

Characterization of the Ligand Binding Site - Conceptual
1. Represent the protein structure
2. Determine the environmental boundary
3. Determine the protein boundary
4. Computation of the geometric potential
5. Computation of the virtual ligand
Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9

Characterization of the Ligand Binding Site - Conceptual
Conceptually similar to a hydrophobicity or electrostatic potential that is dependent on both global and local environments
• Initially assign each Cα atom a value that is its distance to the environmental boundary
• Update the value with those of surrounding Cα atoms, dependent on distances and orientation – atoms within a 10 Å radius define the neighbors i
GP = P + \sum_{\text{neighbors}} \frac{P_i}{D_i + 1.0} \times \frac{\cos(\alpha_i) + 1.0}{2.0}
Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9

Discrimination Power of the Geometric Potential
[Figure: binding site and non-binding site distributions; axes labelled Geometric Potential and Geometric Potential Scale]
• Geometric potential can distinguish binding and non-binding sites

Boundary Accuracy of Ligand Binding Site Prediction
[Figure: two histograms, Distribution (%) against Sensitivity (%) and Distribution (%) against Specificity (%)]
• ~90% of the binding sites can be identified with above 50% sensitivity
• The
specificity is above 90% for ~70% of the binding sites identified

So Far…
• Geometric potential is dependent on the local environment of a residue – relative to other residues and the environmental boundary
• Geometric potential is reasonably good at discriminating between ligand binding sites and non-ligand binding sites
• Boundary of the binding site is reasonably well defined
• How to compare sites?

Agenda – SOIPPA for finding off-site targets

Identification of Functional Similarity with Local Sequence Order Independent Alignment
• Geometric and graph characterization of the protein structure
• Chemical similarity matrix and evolutionary relationship with profile-profile comparison
• Optimum alignment with maximum-weight subgraph algorithm
Xie and Bourne 2007 PNAS, Submitted

Similarity Matrix of Alignment
Chemical Similarity
• Amino acid grouping: (LVIMC), (AGSTP), (FYW), and (EDNQKRH)
• Amino acid chemical similarity matrix
Evolutionary Correlation
• Amino acid substitution matrix such as BLOSUM45
• Similarity score between two sequence profiles:
d = \sum_i f_a^i S_b^i + \sum_i f_b^i S_a^i
f_a, f_b are the 20 amino acid target frequencies of profiles a and b, respectively
S_a, S_b are the PSSMs of profiles a and b, respectively
Xie and Bourne 2007 PNAS, Submitted

Local Sequence-order Independent Alignment with Maximum-Weight Sub-Graph Algorithm
[Figure: Structure A and Structure B, each with residue fragments LER and VKDL]
• Build an associated graph from the graph representations of the two structures being compared.
Each of the nodes is assigned a weight from the similarity matrix
• The maximum-weight clique corresponds to the optimum alignment of the two structures

Efficient Functional Site Comparison with Evolutionary and Geometric Constraints
• The search space is segmented with the residue clusters determined from the geometric potential
• The nodes and edges are greatly reduced with the robust residue boundary orientation and neighbors
[Figure: graph nodes labelled a1, a2, b1, b2, 1c, 2c]
The time complexity is almost linearly dependent on the number of residues

Improved Performance of Alignment Quality and Search Sensitivity and Specificity
[Figure: Frequency (%) against RMSD (Angstroms), bins <1.0 to <11.0, for four scores: Amino Acid Grouping, Chemical Similarity, Substitution Matrix, Profile-Profile]
RMSD distribution of the aligned common fragments of ligands from 247 test cases showing four scores: amino acid grouping, chemical similarity, substitution matrix and profile-profile.
[Figure: False Positive Ratio against True Positive Ratio for the same four scores]

So What is the Potential of this Methodology?

Lead Discovery from Fragment Assembly
• Privileged molecular moieties in medicinal chemistry
• Structural genomics and high throughput screening generate a large number of protein-fragment complexes
• Similar sub-site detection enhances the application of fragment assembly strategies in drug discovery
1HQC: Holliday junction migration motor protein from Thermus thermophilus
1ZEF: Rio1 atypical serine protein kinase from A. fulgidus

Lead Optimization from Conformational Constraints
• The same ligand can bind to different proteins, but with different conformations
• By recognizing the conformational changes in the binding site, it is possible to improve the binding specificity with conformational constraints placed on the ligand
1ECJ: amido-phosphoribosyltransferase from E.
coli
1H3D: ATP-phosphoribosyltransferase from E. coli

Finding Secondary Binding Sites for Major Pharmaceuticals
• Scan known binding sites for major pharmaceuticals bound to their receptors against the human proteome
• Try to correlate strong hits with known data from the literature, databases, clinical trials etc. to provide molecular evidence of secondary effects

A Case Study
Selective Estrogen Receptor Modulators (SERM)
• One of the largest classes of drugs
• Breast cancer, osteoporosis, birth control etc.
• Amine and benzene moiety
Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.

Adverse Effects of SERMs
cardiac abnormalities
thromboembolic disorders
loss of calcium homeostasis
?????
ocular toxicities
Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.

Ligand Binding Site Similarity Search On a Proteome Scale
[Figure: Density against Score, with SERCA and ERα marked]
• Searching human proteins covering ~38% of the druggable genome against the SERM binding site
• Matching the Sarcoplasmic Reticulum (SR) Ca2+ ion channel ATPase (SERCA) TG1 inhibitor site
• ERα ranked top with p-value < 0.0001 from the reversed search against SERCA
Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.

Structure and Function of SERCA
• Regulating cytosolic calcium levels in cardiac and skeletal muscle
• Cytosolic and transmembrane domains
• Predicted SERM binding site is located in the TM domain, inhibiting Ca2+ uptake
Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.

Binding Poses of SERMs in SERCA from Docking Studies
• Salt bridge interaction between amine group and GLU
• Aromatic interactions for both N- and C-moiety
6 SERMs A-F (red)

Off-Target of SERMs
cardiac abnormalities
thromboembolic disorders
loss of calcium homeostasis
SERCA !
ocular toxicities
in vivo and in vitro Studies
• TAM plays a role in regulating the calcium uptake activity of cardiac SR
• TAM reduces intracellular calcium concentration and release in platelets
• Cataract results from TG1-inhibited SERCA up-regulation
• EDS increases intracellular calcium in lens epithelial cells by inhibiting SERCA
in silico Studies
• Ligand binding site similarity
• Binding affinity correlation

Conclusion
• By thinking differently about how to represent proteins we have seen potential value in:
– Phylogenetic analysis
– The study of the dynamics of proteins
– Improvements to the drug discovery process

Acknowledgements
Lei Xie
Jian Yang
Jenny Gu – Protein Motions
Apostol Gramada – Multipole Analysis
Support Open Access
www.pdb.org • info@rcsb.org

Implications on Drug Development
Drug                        Affinity (ER Site)   Affinity (SERCA)   Affinity Difference
Bazedoxifene (BAZ)          -9.44 +/- 0.54       -7.23 +/- 0.13     2.21
Lasofoxifene (LAS)          -8.66 +/- 0.40       -6.54 +/- 0.20     2.12
Ormeloxifene (ORM)          -8.67 +/- 0.18       -5.84 +/- 0.33     2.83
Raloxifene (RAL)            -8.08 +/- 0.64       -5.78 +/- 0.23     2.30
4-hydroxytamoxifen (OHT)    -7.67 +/- 0.47       -5.40 +/- 0.15     2.27
Tamoxifen (TAM)             -7.30 +/- 0.28       -5.64 +/- 0.28     1.66
• Taking account of both target and off-target for lead optimization
• Drug delivery and administration regime

Swiss-Prot - 20 Year Celebration

A Protein is More than the Union of its Parts
• Breaking the protein into parts changes the object of the comparison
• This is interpreted in many cases to imply that the RMSD measure is inadequate.
• The reality is that it is the aligning of structure that breaks the triangle inequality and not the measure per se. The reason for failure is that we effectively compare different objects than we say we do.
From Røgen & Fain (2003), PNAS 100:119-124
New Tricks – Protein Representation

An Alternative Approach: Multipolar Representation
Roots in Spherical Harmonics
• Parameterization: charge distribution (i.e.
structure) + boundary conditions
Spatial distribution of a scalar quantity – Scalar potential
Multipoles: q_lm (out), M_lm (in)
Gramada & Bourne 2006 BMC Bioinformatics 7:242

An Alternative Approach: Multipolar Representation
• "Out" Multipoles
q_{lm} = \sum_{i=1}^{N} r_i^{l} \, Y_{lm}^{*}(\theta_i, \phi_i), \quad l = 0, \dots, \infty; \; m = -l, \dots, l
For a given rank l, they form a (2l+1)-dimensional vector under 3D rotations:
q_l = \{ q_{l,m} \}_{m = -l, \dots, l}
Vector algebra applies => metric properties
Gramada & Bourne 2006 BMC Bioinformatics 7:242

An Alternative Approach: Multipolar Representation
The multipoles can be interpreted as shape descriptors
In principle, from the entire series of multipoles one can reconstruct the scalar field and therefore the density, i.e. the entire set of Cartesian coordinates, i.e. the structure with a geometric level of detail
The partitioning of the multipole series according to the various representations of the rotation group allows for a multi-scale description of the structure
Gramada & Bourne 2006 BMC Bioinformatics 7:242
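
As a rough illustration of the "Out" multipoles on the slides above, the sketch below evaluates q_lm for a small set of points and compares the length of each rank-l vector before and after a rotation, which is the property that lets the multipoles act as shape descriptors. It is not the authors' implementation. Unit weight for every point, coordinates taken relative to the origin, the complex-conjugate convention for Y_lm, and the helper names (`assoc_legendre`, `sph_harm`, `multipoles`, `multipole_norm`) are all assumptions made for this example.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x), 0 <= m <= l, Condon-Shortley phase."""
    pmm = 1.0
    s = math.sqrt(max(0.0, 1.0 - x * x))
    for k in range(1, m + 1):
        pmm *= -(2 * k - 1) * s                  # builds P_m^m
    if l == m:
        return pmm
    prev, curr = pmm, x * (2 * m + 1) * pmm      # P_m^m and P_{m+1}^m
    for n in range(m + 2, l + 1):                # upward recurrence in the degree
        prev, curr = curr, ((2 * n - 1) * x * curr - (n + m - 1) * prev) / (n - m)
    return curr


def sph_harm(l, m, theta, phi):
    """Complex spherical harmonic Y_lm; theta is the polar angle, phi the azimuth."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    y = norm * assoc_legendre(l, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    if m < 0:
        y = (-1) ** am * y.conjugate()           # Y_{l,-m} = (-1)^m conj(Y_{l,m})
    return y


def to_spherical(x, y, z):
    """Cartesian to (r, theta, phi); theta is set to 0 at the origin."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0.0 else 0.0
    return r, theta, math.atan2(y, x)


def multipoles(points, l):
    """q_lm = sum_i r_i^l conj(Y_lm(theta_i, phi_i)) for m = -l..l (unit weights assumed)."""
    q = []
    for m in range(-l, l + 1):
        total = 0j
        for x, y, z in points:
            r, theta, phi = to_spherical(x, y, z)
            total += (r ** l) * sph_harm(l, m, theta, phi).conjugate()
        q.append(total)
    return q


def multipole_norm(points, l):
    """Euclidean length of the rank-l multipole vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in multipoles(points, l)))


if __name__ == "__main__":
    # Hypothetical coordinates standing in for C-alpha positions.
    pts = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5), (-1.5, 0.3, 1.0), (0.2, -0.7, -2.0)]
    rotated = [(x, -z, y) for x, y, z in pts]    # 90 degree rotation about the x axis
    for rank in range(4):
        print(rank, multipole_norm(pts, rank), multipole_norm(rotated, rank))
```

Truncating the series at a low rank keeps only coarse shape information, and higher ranks add finer detail, which is one way to read the "multi-scale description" mentioned on the slide. Using the per-rank length as the descriptor is a choice made for this sketch only.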