Protein-ligand docking: A case study of DEF docking motif interactions in MAP kinases
Yong Kong
Bioinformatics Resource
Yale University

Outline
• Available programs in Bioinformatics Resource
• Introduction to molecular docking
• AutoDock 4: a free docking software
• Substrate discrimination among MAP kinases through distinct docking motifs
• Modeling DEF docking motif interactions in MAP kinases using AutoDock 4

Available commercial software
• DNA/protein sequence analysis
  – Lasergene
  – Gene Construction Kits
• Microarray analysis
  – Genespring GX
  – Partek Genomics Suite
• Pathway Analysis
  – Ingenuity Pathway Analysis
  – MetaCore
• Genotyping analyses
  – Genespring GT
  – HelixTree

Available commercial software
• Protein structure modeling and visualization
  – SYBYL 8
• Pipelining programs
  – Pipeline Pilot
  – VIBE
• Mass spectrometry data analysis
  – GPMAW

SYBYL
• SYBYL Base: Comprehensive tools for molecular modeling
  – structure building, optimization, and comparison;
  – visualization of structures and associated data;
  – annotation, hardcopy, and screen capture capabilities;
  – a wide range of force fields
Figure: Electrostatic potential for the inhibitor methotrexate bound to dihydrofolate reductase

SYBYL
• Receptor Based Design: docking and de novo design
Figure: Docked inhibitor (yellow) superimposed with crystal structure (purple)
• Ligand Based Design: QSAR, ADME, pharmacophore, structure alignment, etc.
Figure: Left: pharmacophore model; right: X-ray structure (CDK2 inhibitors)

SYBYL
• Protein Modeling:
  – A database of detailed structural profiles of all known protein families
  – Structural homologs identified by sequence-structure comparison
  – Comparative models built from a target sequence using single or multiple structural homologs
Figure: A set of structurally aligned oxidoreductase structures of 8% sequence identity.

Molecular docking
• Computationally predict, for the intermolecular complex formed between two or more constituent molecules:
  – the structure (pose)
  – the binding free energy

Questions and Goals
• The questions we are interested in are:
  – Do two biomolecules bind each other?
  – If so, how and where do they bind?
  – What is the binding free energy or affinity?
  – What chemical groups determine the binding?
• The goals we have are:
  – Searching for lead compounds
  – Estimating the effect of modifications
  – General understanding of binding
  – Designing directed libraries
  – …

Docking: input data
• The starting point:
  – the atomic coordinates of the two molecules
• Additional data:
  – biochemical
  – mutational
  – conservational
  – …
• These additional data can significantly improve performance, but they are not strictly required

Docking: two components
Two related components of docking:
1. Search algorithm: sample sufficiently and efficiently the degrees of freedom of the protein–ligand system (position, orientation, and conformation)
2. Scoring function: represent the thermodynamics of interactions so as to distinguish the true binding modes from all the other possible solutions, and to rank them accordingly

Figure: Flowchart of docking algorithms

Rigid or flexible molecules
• Protein + ligand:
  – Rigid protein + rigid ligand
  – Rigid protein + flexible ligand
  – Flexible protein + flexible ligand
• Protein + protein:
  – Rigid protein + rigid protein (still the standard)
  – Introducing flexibility into protein-protein docking is challenging

Figure: Docking software: total number of citations through 2005. Sousa et al. (2006)
Figure: Docking programs: citations per year. Sousa et al. (2006)
Figure: Docking programs: percentage of citations per year (annotation: freely available for academic users). Sousa et al. (2006)

AutoDock
• Developed in Arthur Olson’s lab at the Scripps Research Institute
• Free academic license
• The most widely used program for molecular docking
• The latest version is AutoDock 4

AutoDock features
• Pre-calculates atomic affinity potentials for each atom type in the ligand
• Supports different search methods
  – Lamarckian genetic algorithm (LGA)
  – traditional genetic algorithm (GA)
  – Monte Carlo simulated annealing
• Reasonably accurate binding free energy: the scaling factors are empirically calibrated from experimental data

Pre-calculated grid maps
• A grid map consists of a three-dimensional lattice of regularly spaced points, surrounding (either entirely or partly) and centered on some region of interest of the macromolecule under study.
• The probe's energy at each grid point is determined by the set of parameters supplied for that particular atom type, and is the summation over all atoms of the macromolecule, within a nonbonded cutoff radius, of all pairwise interactions.
From the AutoDock manual

Pre-calculated grid maps
• After the grid map is calculated, it can be used repeatedly in the docking calculations
• The time to perform an energy calculation using the grids is proportional only to the number of atoms in the ligand, and is independent of the number of atoms in the receptor

Genetic Algorithm (GA)
• Computational method based on the ideas and language of natural genetics and evolution
• State variables (translation, orientation, and conformation of the ligand) → “genes”
  – Gene 1: translation (x, y, z)
  – Gene 2: orientation quaternion (q0, q1, q2, q3)
  – Gene 3: torsions (t1, t2, t3, …), one for each torsion
• These “genes” make up the “genotype” (the “chromosome”)
• Atomic coordinates are the “phenotype”
• “Fitness” is the total interaction energy

Genetic Algorithm (GA)
• The evolution starts from a population of randomly generated “individuals”
• Individuals are “mated” at random
• New “individuals” inherit genes from either parent through “crossover”: ABC/abc → AbC/aBc
• Some offspring undergo random “mutation” (one gene is changed by a random amount)
• Selection of offspring is based on fitness

Genetic Algorithm (GA)
1. Create a random population
2. Fitness evaluation
3. Selection: choose the best individuals to reproduce, and their number of offspring
4. Crossover: ABC/abc → AbC/aBc
5. Mutation (based on a Cauchy distribution)
6. Elitist selection (top individuals survive into the next generation)
7. Termination: number of generations reached, OR number of energy evaluations reached?

Lamarckian Genetic Algorithm
• Most GAs mimic Darwinian evolution: one-way transfer of information from genotype → phenotype (right-hand side of the figure)
• This corresponds to the global search for the minima
Figure labels: “fitness”; Lamarckian; Darwinian

Lamarckian Genetic Algorithm
• One novel improvement in AutoDock is the incorporation of local search (left-hand side of the figure)
• This is called the Lamarckian Genetic Algorithm (LGA), in allusion to Lamarck’s discredited assertion that acquired phenotypic characteristics can become heritable.
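
The GA steps and the Lamarckian local search can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: the quadratic `energy` function stands in for a real docking score, the local search is a plain random hill climb rather than AutoDock's own local search method, and every name here (`energy`, `local_search`, `cauchy_step`, `lamarckian_ga`) is hypothetical and not part of AutoDock.

```python
import math
import random


def energy(genes):
    """Toy stand-in for the interaction energy (lower is fitter)."""
    return sum((g - 1.0) ** 2 for g in genes)


def local_search(genes, rng, steps=20, step_size=0.1):
    """Random hill climb applied directly to the genes (genotypic space)."""
    best, best_e = list(genes), energy(genes)
    for _ in range(steps):
        trial = [g + rng.uniform(-step_size, step_size) for g in best]
        trial_e = energy(trial)
        if trial_e < best_e:  # keep only improving moves
            best, best_e = trial, trial_e
    return best


def cauchy_step(rng, scale=0.1):
    """Draw a Cauchy-distributed mutation amount via the inverse CDF."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def lamarckian_ga(n_genes=7, pop_size=30, generations=40, n_elite=2,
                  mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Create a random population of gene vectors.
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: the locally refined genes REPLACE the originals,
        # so what an individual "acquired" is passed on to its offspring.
        pop = [local_search(ind, rng) for ind in pop]
        pop.sort(key=energy)                              # fitness evaluation
        next_pop = [list(ind) for ind in pop[:n_elite]]   # elitist selection
        parents = pop[:pop_size // 2]                     # fitter half reproduces
        while len(next_pop) < pop_size:
            a, b = rng.sample(parents, 2)
            # Simplified crossover: each gene comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Mutation: some genes change by a Cauchy-distributed amount.
            child = [g + cauchy_step(rng) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=energy)
```

Calling `lamarckian_ga()` returns the lowest-energy gene vector found. Dropping the `local_search` line turns the sketch into a plain (Darwinian) GA, which makes the two search strategies easy to compare on the same toy energy.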

Lamarckian Genetic Algorithm
Figure labels: Genotype; Phenotype; “fitness”; Lamarckian; Darwinian
• A Lamarckian GA is normally possible only if the mapping function genotype → phenotype is invertible (phenotype → genotype)
• Another novel feature of AutoDock: the local search is done in genotypic space rather than phenotypic space
• So there is no need for the mapping to be inverted
• Performance: LGA > GA > SA (simulated annealing)

AutoDock: Scoring Function
• The program uses a five-term, force-field-based function loosely based on the AMBER force field
• The scaling factor for each of these five terms is empirically calibrated from a set of 30 structurally known protein–ligand complexes
• The five terms:
  – dispersion/repulsion
  – H-bond
  – electrostatic
  – ΔGtor: entropic term
  – ΔGsol: intermolecular pairwise desolvation term

Protein kinases
• Phosphorylation is the most common reversible post-translational protein modification in eukaryotes
• Protein kinases are key players in signal transduction networks
• Many cancers are characterized by uncontrolled kinase activity
Figure: The human kinome (groups labeled: TK, TKL, STE, CMGC, CAMK, AGC)

Kinase specificity
• Tight control of the specificity of protein kinases is required to maintain normal physiology
• Specificity is determined in part through recognition of consensus sequences around the site of phosphorylation
• However, active-site recognition alone is not enough: short amino acid sequence motifs occur at high frequency in proteomes (~700,000 potentially phosphorylatable residues)
Ubersax and Ferrell (2007)

Docking interactions ensure specificity
• Combinatorial docking interactions are a widely used mechanism to ensure kinase specificity
  – The docking sites are distal from the phosphorylation site in the substrates
  – They lie outside the active site in the kinase

MAP kinases
• Mediate cellular responses to a wide variety of extracellular stimuli: growth factors, cytokines, UV, oxidative stress, etc.
• Regulate many important cellular activities: gene expression, mitosis, movement, metabolism, cell death, etc.
• MAP kinases lie at the bottom of conserved three-component phosphorylation cascades

Figure: MAP kinase cascade (MAPKs). Raman et al. (2007)

MAP kinases
• Three major subfamilies:
  – ERK (extracellular signal-regulated kinases): ERK1 and ERK2
  – p38: p38α, p38β, p38γ, p38δ
  – JNK (c-Jun N-terminal kinases): JNK1-3
• Each MAPK subfamily phosphorylates a distinct set of protein substrates

Consensus sequence
• Consensus sequence for ERK1, ERK2 and p38α: P-X-S/T-P
• ~700,000 potentially phosphorylatable residues
• Other mechanisms are needed to ensure specificity

MAPK’s common phos-site
• Positional scanning peptide library: systematic substitution of 20 amino acids + pT + pY at each of the 9 positions surrounding a central phosphorylation site (9 × 22)
• Confirmed the P-X-S/T-P consensus previously found for ERK2 and p38α
• No significant differences among the four representative MAPKs
Sheridan et al.

D-site
• Two docking interactions: D-site & DEF site
• The first one: D-site (also referred to as the D-domain, δ-domain, or DEJL domain)
• Two or more basic residues followed by a short linker and a cluster of hydrophobic residues
• Docking occurs along a groove on the face of the MAPK opposite the active site

D-site
• Well-characterized by:
  – Mutagenesis
  – Hydrogen-exchange mass spectrometry (HX-MS)
  – X-ray crystallography
Lee et al. (2004)

DEF site
• DEF site (docking site for ERK, FXF; also called the F-site)
• Best characterized in ERK
• Consensus: F-X-F/Y-P
• Located between 6 and 20 amino acids C-terminal to the phosphorylation site

DEF motif
• Peptide: derived from Elk1 386–399 (phos-site + DEF site)
• 19 amino acid (excluding Cys) substitutions at each of four positions (Z)
• The extent of phosphorylation was quantified
Sheridan et al.

DEF motif
Figure: selectivity at the DEF motif positions; values shown when selectivity > 1.5 (bold when > 3.0); groups labeled aromatic, aliphatic, and no preference. Sheridan et al.

DEF site selectivity
Figure: phos-site and DEF-site selectivity for p38α and p38δ. Sheridan et al.

DEF interacting pocket - HX-MS
Figure: green: decreased exchange rate upon DEF peptide binding (solvent protection). Lee et al. (2004)

DEF interacting pocket - HX-MS
Figure: strongest protected regions; pT183 and pY185 marked; yellow: surface hydrophobic residues. Lee et al. (2004)

Docking with AutoDock
• Ligand: a capped pentapeptide DEF site ligand, acetyl-SFQFP-amide
• Receptor: published structure of diphosphorylated ERK2 (PDB code 2ERK)
• Grid map: 50 × 50 × 50 points with a spacing of 0.375 Å, centered on the previously identified hydrophobic pocket on the ERK2 surface
• 256 independent docking runs

Figure: Grid map

AutoDock results: model clusters
• Clustering threshold: RMSD 2 Å

Model of DEF site interaction
Figure: orange: peptide ligand; green: hydrophobic pocket

Structural determinants – mutagenesis studies
Figure: highlighted: residues surrounding the DEF pocket
• Alanine substitutions of key residues in the binding pocket significantly attenuate phosphorylation (except for L195A of p38δ)

Mutagenesis studies
• Mutants that swap DEF site specificity
Figure labels: WT: aromatic → DM: aliphatic; WT: aliphatic → DM: aromatic. Sheridan et al.

Mutagenesis studies
• Collectively, these mutagenesis experiments and the molecular docking support a binding mode in which:
  – the P1 residue contacts residues analogous to Ile196, Met197 and Leu198 of ERK2
  – the P3 residue makes contact with Leu235

Acknowledgements
Dr. Turk
Dr. Sheridan
Department of Pharmacology, Yale University