"Protein-ligand docking:  A case study of DEF docking motif interactions in MAP kinases”

Protein-ligand docking:
A case study of DEF docking motif
interactions in MAP kinases
Yong Kong
Bioinformatics Resource
Yale University
Outline
• Available programs in Bioinformatics Resource
• Introduction to molecular docking
• Autodock 4: a free docking software
• Substrate discrimination among MAP kinases
through distinct docking motifs
• Modeling DEF docking motif interactions in
MAP kinases using Autodock 4
Available commercial software
• DNA/protein sequence analysis
– Lasergene
– Gene Construction Kits
• Microarray analysis
– Genespring GX
– Partek Genomics Suite
• Pathway Analysis
– Ingenuity Pathway Analysis
– MetaCore
• Genotyping analyses
– Genespring GT
– HelixTree
Available commercial software
• Protein structure modeling and visualization
– SYBYL 8
• Pipelining programs
– Pipeline Pilot
– VIBE
• Mass spectrometry data analysis
– GPMAW
SYBYL
• SYBYL Base: Comprehensive tools for
molecular modeling
– structure building, optimization, and
comparison;
– visualization of structures and associated
data;
– annotation, hardcopy, and screen
capture capabilities;
– a wide range of force fields
Electrostatic potential for inhibitor methotrexate bound to dihydrofolate reductase
SYBYL
• Receptor Based Design: docking and de novo
design
Docked inhibitor (yellow) superimposed with crystal structure (purple)
• Ligand Based Design: QSAR, ADME,
pharmacophore, structure alignment, etc
Left: pharmacophore model; right: X-ray structure (CDK2 inhibitors)
SYBYL
• Protein Modeling:
– A database of detailed structural profiles of all
known protein families
– Structural homologs identified by sequence-structure comparison
– Comparative models built from a target sequence
using single or multiple structural homologs
A set of structurally aligned oxidoreductase structures
of 8% sequence identity.
Molecular docking
• Computationally predict
– the structure (pose)
– binding free energy
of the intermolecular complex formed
between two or more constituent molecules
Questions and Goals
• The questions we are interested in are:
– Do two biomolecules bind each other?
– If so, how and where do they bind?
– What is the binding free energy or affinity?
– What chemical groups determine the binding?
• The goals we have are:
– Searching for lead compounds
– Estimating effect of modifications
– General understanding of binding
– Design directed libraries
– …
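The question about binding free energy or affinity can be made concrete with the standard relation ΔG = RT ln Kd (1 M standard state). The sketch below converts between the two; the function names and the default temperature of 298.15 K are illustrative choices, not part of the talk.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_from_delta_g(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (M) for a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) for a dissociation constant in M."""
    return R_KCAL * temperature_k * math.log(kd_molar)

# A micromolar binder corresponds to roughly -8 kcal/mol at room temperature.
micromolar_dg = delta_g_from_kd(1e-6)
```

More negative ΔG means tighter binding: each factor of 10 in Kd is worth about 1.4 kcal/mol at room temperature.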
Docking: input data
• The starting point:
– the atomic coordinates of the two molecules
• Additional data:
– biochemical
– mutational
– conservational
– …
• These additional data can significantly improve the
performance; however, this extra information is not
absolutely necessary
Docking: two components
Two related components of docking:
1. Search algorithm: sample sufficiently and
efficiently the degrees of freedom of the
protein–ligand system (position, orientation, and
conformation)
2. Scoring function: represent the thermodynamics
of interactions so as to distinguish the true
binding modes from all the other possible
solutions, and to rank them accordingly
Flowchart of docking algorithms
Rigid or flexible molecules
• Protein + ligand:
– Rigid protein + rigid ligand
– Rigid protein + flexible ligand
– Flexible protein + flexible ligand
• Protein + protein:
– Rigid protein + rigid protein (still the standard)
– Introducing flexibilities into protein-protein
docking is challenging
Docking software: total number of citations till 2005
Sousa et al. (2006)
Docking programs: citations per year
Sousa et al. (2006)
Docking programs: percentage of citations per year
freely available for academic users
Sousa et al. (2006)
Autodock
• Developed in Arthur Olson’s lab in the Scripps
Research Institute
• Free academic license
• The most cited program for molecular docking (Sousa et al., 2006)
• The latest version is Autodock 4
Autodock features
• Pre-calculate atomic affinity potentials for
each atom type in the ligand
• Support different search methods
– Lamarckian genetic algorithm (LGA)
– traditional genetic algorithm (GA)
– Monte Carlo simulated annealing
• Reasonably accurate binding free energy: the
scaling factors are empirically calibrated from
experimental data
Pre-calculated grid maps
• A grid map consists of a three
dimensional lattice of regularly
spaced points, surrounding (either
entirely or partly) and centered on
some region of interest of the
macromolecule under study.
• The probe's energy at each grid
point is determined by the set of
parameters supplied for that
particular atom type, and is the
summation over all atoms of the
macromolecule, within a nonbonded cutoff radius, of all
pairwise interactions.
From AutoDock manual
Pre-calculated grid maps
• After the grid map is
calculated, it can be used
repeatedly in the docking
calculations
• The time to perform an
energy calculation using the
grids is proportional only to
the number of atoms in the
ligand, and is independent
of the number of atoms in
the receptor
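The grid-map idea can be sketched in a few lines: compute a probe energy once at every lattice point, then score a ligand by interpolating the grid at each ligand atom. This is a minimal illustration, not AutoDock's code. It assumes a single probe atom type, a toy 12-6 pair term with made-up parameters `a` and `b`, and trilinear interpolation between lattice points; all function names are hypothetical.

```python
import math

def pair_energy(r, a=1000.0, b=50.0):
    """Toy 12-6 pairwise term; a and b are made-up parameters."""
    r = max(r, 0.5)  # clamp very short distances to keep the value finite
    return a / r**12 - b / r**6

def build_grid(receptor_atoms, origin, npts, spacing, cutoff=8.0):
    """Probe energy at every lattice point, summed over receptor atoms within the cutoff."""
    grid = [[[0.0] * npts for _ in range(npts)] for _ in range(npts)]
    for i in range(npts):
        for j in range(npts):
            for k in range(npts):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                grid[i][j][k] = sum(
                    pair_energy(math.dist(point, atom))
                    for atom in receptor_atoms
                    if math.dist(point, atom) <= cutoff)
    return grid

def interpolate(grid, origin, spacing, point):
    """Trilinear interpolation of the grid at a point inside the box."""
    cells, fracs = [], []
    for coord, start in zip(point, origin):
        u = (coord - start) / spacing
        cell = max(0, min(int(math.floor(u)), len(grid) - 2))
        cells.append(cell)
        fracs.append(u - cell)
    (i, j, k), (fx, fy, fz) = cells, fracs
    energy = 0.0
    for di, wx in ((0, 1 - fx), (1, fx)):
        for dj, wy in ((0, 1 - fy), (1, fy)):
            for dk, wz in ((0, 1 - fz), (1, fz)):
                energy += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return energy

def ligand_energy(grid, origin, spacing, ligand_atoms):
    """One grid lookup per ligand atom, whatever the receptor size."""
    return sum(interpolate(grid, origin, spacing, atom) for atom in ligand_atoms)
```

The expensive loop over receptor atoms appears only in `build_grid`. `ligand_energy` never touches the receptor, which is why the per-pose cost depends only on the number of ligand atoms.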
Genetic Algorithm (GA)
• Computational method based on the ideas and
language of natural genetics and evolution
• State variables (translation, orientation, and
conformation of ligands) → “genes”
“Chromosome”: x y z | q0 q1 q2 q3 | t1 t2 t3 …
– x, y, z: translation (Gene1, Gene2, Gene3)
– q0 q1 q2 q3: quaternion (orientation)
– t1, t2, t3, …: one for each torsion
• These “genes” make up the “genotypes”
• Atomic coordinates are “phenotypes”
• “Fitness” is the total interaction energy
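The chromosome described on this slide (three translation genes, four quaternion genes, one gene per torsion) can be written down as a small data structure. This is a sketch under assumptions: the class and function names are hypothetical, torsions are stored in radians, and the random quaternion is drawn by normalizing four Gaussian numbers.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Chromosome:
    translation: tuple   # x, y, z
    quaternion: tuple    # q0, q1, q2, q3 (unit length)
    torsions: list       # one angle per rotatable bond, in radians

    def genes(self):
        """Flat gene list: x y z q0 q1 q2 q3 t1 t2 ..."""
        return [*self.translation, *self.quaternion, *self.torsions]

def random_unit_quaternion(rng):
    """Random orientation: normalize four Gaussian draws to unit length."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def random_chromosome(n_torsions, box_min, box_max, rng=random):
    """Random individual with its translation inside the given box."""
    translation = tuple(rng.uniform(lo, hi) for lo, hi in zip(box_min, box_max))
    torsions = [rng.uniform(-math.pi, math.pi) for _ in range(n_torsions)]
    return Chromosome(translation, random_unit_quaternion(rng), torsions)
```

A ligand with three rotatable bonds therefore has 3 + 4 + 3 = 10 genes.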
Genetic Algorithm (GA)
• The evolution starts from a population of randomly
generated “individuals”
• Random individuals are “mated” randomly
• New “individuals” inherit genes from either parent
through “crossover”:
ABC/abc → AbC/aBc
• Some offspring undergo random “mutation” (one
gene is changed by a random amount)
• Selection of offspring is based on fitness
Genetic Algorithm (GA)
Create a random population
Fitness evaluation
Select the best individuals to reproduce, and their number of offspring
Crossover: ABC/abc → AbC/aBc
Mutation (based on Cauchy distribution)
Elitist selection (top individuals survive into next generation)
Termination:
number of generations? OR
number of energy evaluations?
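The flowchart steps can be sketched as a small genetic algorithm on a toy energy function. This is an illustration under assumptions, not AutoDock's implementation: selection is simplified to keeping the better half of the population as parents, the energy is a made-up sum of squares, and every parameter value and name here is hypothetical.

```python
import math
import random

def toy_energy(genes):
    """Made-up energy surface with its minimum at the origin."""
    return sum(g * g for g in genes)

def cauchy(rng, scale=1.0):
    """Cauchy-distributed random deviate."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))

def two_point_crossover(p1, p2, rng):
    """Swap the middle segment: ABC/abc -> AbC/aBc."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def run_ga(energy, n_genes, pop_size=50, max_gens=200, max_evals=20000,
           crossover_rate=0.8, mutation_rate=0.02, elitism=1, seed=0):
    rng = random.Random(seed)
    # Create a random population
    pop = [[rng.uniform(-5, 5) for _ in range(n_genes)] for _ in range(pop_size)]
    evals = 0
    for _ in range(max_gens):                      # termination: generations
        # Fitness evaluation (lower energy is fitter)
        scored = sorted(((energy(ind), ind) for ind in pop), key=lambda t: t[0])
        evals += len(pop)
        if evals >= max_evals:                     # termination: energy evaluations
            break
        elite = [ind[:] for _, ind in scored[:elitism]]
        parents = [ind for _, ind in scored[:pop_size // 2]]  # simplified selection
        children = []
        while len(children) < pop_size - elitism:
            a, b = rng.sample(parents, 2)
            if rng.random() < crossover_rate:
                a, b = two_point_crossover(a, b, rng)
            for child in (a, b):
                # Mutation: each gene changes by a Cauchy-distributed amount
                children.append([g + cauchy(rng, 0.1) if rng.random() < mutation_rate
                                 else g for g in child])
        # Elitism: the top individuals survive unchanged
        pop = elite + children[:pop_size - elitism]
    best = min(pop, key=energy)
    return energy(best), best
```

Because the elite individual is copied unchanged into each new generation, the best energy can never get worse from one generation to the next.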
Lamarckian Genetic Algorithm
• Most GAs mimic
Darwinian evolution:
one-way transfer of
information from
genotype → phenotype
(right-side)
• This corresponds to the
global search of the
minima
[Figure: genotype → phenotype mapping scored by “fitness”; Lamarckian (left), Darwinian (right)]
Lamarckian Genetic Algorithm
• One novel improvement of
Autodock is the
incorporation of local
search (left-side)
• This is called Lamarckian
Genetic Algorithm (LGA), in
allusion to Lamarck’s
discredited assertion that
acquired phenotypic characteristics
can become heritable.
[Figure: genotype → phenotype mapping scored by “fitness”; Lamarckian (left), Darwinian (right)]
Lamarckian Genetic Algorithm
• In general, an LGA is only possible if the mapping function
from genotype → phenotype is invertible:
phenotype → genotype
Genotype ⇄ Phenotype
• Another novel feature of Autodock:
the local search is done in the genotypic space rather
than phenotypic space
• So there is no need for the mapping to be inverted
• Performance: LGA > GA > SA
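The Lamarckian step can be sketched as a local search that works directly on the genes and writes the improved genes back into the individual, so no phenotype → genotype inversion is needed. This is a simplified random-perturbation search in the spirit of the Solis-Wets method, not AutoDock's routine; the toy energy, step size, iteration count, and search frequency are illustrative assumptions.

```python
import random

def toy_energy(genes):
    """Made-up energy surface with its minimum at the origin."""
    return sum(g * g for g in genes)

def local_search(genes, energy, rng, step=0.2, max_iters=50):
    """Random-perturbation search in gene space; never returns a worse point."""
    best, best_e = list(genes), energy(genes)
    for _ in range(max_iters):
        delta = [rng.gauss(0.0, step) for _ in best]
        for sign in (1, -1):              # try the step, then its opposite
            trial = [g + sign * d for g, d in zip(best, delta)]
            e = energy(trial)
            if e < best_e:
                best, best_e = trial, e
                break
    return best, best_e

def lamarckian_step(population, energy, rng, ls_frequency=0.06):
    """Locally optimize a random fraction; the improved genes replace the originals."""
    out = []
    for genes in population:
        if rng.random() < ls_frequency:
            genes, _ = local_search(genes, energy, rng)
        out.append(genes)
    return out
```

In a full LGA this step would run once per generation, after crossover and mutation, so offspring inherit the locally optimized genes.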
Autodock: Scoring Function
• The program uses a five-term force field-based
function loosely based on
the AMBER force field
• The scaling factor for each
of these five terms is
empirically calibrated from
a set of 30 structurally
known protein–ligand
complexes.
• The five terms:
– dispersion/repulsion
– H-bond
– electrostatic
– ΔGtor: entropic term
– ΔGsol: intermolecular pairwise desolvation term
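The equation on the original slide did not survive extraction. For orientation, the five terms listed here correspond to the general shape of the published AutoDock free-energy function, which can be written roughly as below. Treat this as a sketch from the AutoDock literature rather than the slide's exact equation: the symbols (A, B, C, D, E(t), S, V, σ, N_tor) follow the usual convention, and the precise form of the desolvation term differs between AutoDock releases.

```latex
\Delta G =
    \Delta G_{vdW}   \sum_{i,j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
  + \Delta G_{hbond} \sum_{i,j} E(t) \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right)
  + \Delta G_{elec}  \sum_{i,j} \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}}
  + \Delta G_{tor}\, N_{tor}
  + \Delta G_{sol}   \sum_{i,j} \left( S_i V_j + S_j V_i \right) e^{-r_{ij}^{2} / 2\sigma^{2}}
```

Each leading ΔG coefficient is one of the five empirically calibrated scaling factors; N_tor is the number of rotatable bonds in the ligand, which is why the ΔGtor term acts as an entropic penalty.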
Protein kinases
• Phosphorylation is the most common
reversible post-translational protein
modification in eukaryotes
• Protein kinases are key players in signal
transduction networks
• Many cancers are characterized by
uncontrolled kinase activity
TK
TKL
STE
The human kinome
CMGC
CAMK
AGC
Kinase specificity
• Tight control of the specificity of protein kinases is
required to maintain normal physiology
• Specificity is determined in part through recognition
of consensus sequences around the site of
phosphorylation
• However, the active site alone is not enough: short
amino acid sequence motifs occur at high
frequency in proteomes
(~700,000 potentially phosphorylatable residues)
Ubersax and Ferrell (2007)
Docking interactions ensure specificity
• Combinatorial docking interactions are a
generally-used mechanism to ensure kinase
specificity
– The docking sites are distal from the
phosphorylation site in the substrates
– Outside the active site in the kinase
MAP kinases
• Mediate cellular responses to a wide variety of
extracellular stimuli: growth factor, cytokines,
UV, oxidative stress, etc.
• Regulate many important cellular activities:
gene expression, mitosis, movement,
metabolism, cell death, etc.
• MAP kinases lie at the bottom of conserved
three-component phosphorylation cascades
MAP Kinase cascade
MAPKs
Raman et al. (2007)
MAP kinases
• Three major subfamilies:
– ERK (extracellular regulated kinases): ERK1 and
ERK2
– p38: p38α, p38β, p38γ, p38δ
– JNK (c-Jun N-terminal kinases): JNK1-3
• The different MAPK subfamilies phosphorylate
a distinct set of protein substrates
Consensus sequence
• Consensus sequence for ERK1, ERK2 and
p38α:
P-X-S/T-P
• ~700,000 potentially phosphorylatable
residues
• Other mechanisms are needed to ensure specificity
MAPK’s common phos-site
• Positional scanning peptide
library: systematic
substitution of the 20 a.a. + pT
+ pY at the 9 positions
surrounding a central
phosphorylation site (9 x
22)
• Confirmed the P-X-S/T-P
previously found for ERK2
and p38α
• No significant differences
among any of the four
representative MAPKs
Sheridan et al.
D-site
• Two docking interactions: D-site & DEF site
• The first one: D-site (also referred to as the D-domain, δ-domain, or DEJL domain)
• Two or more basic residues followed by a
short linker and a cluster of hydrophobic
residues
• Docking occurs along a groove on the opposite
face of the active site of MAPK
D-site
• Well-characterized
• Mutagenesis
• Hydrogen-exchange
mass spectrometry (HX-MS)
• X-ray crystallography
Lee et al. (2004)
DEF site
• DEF site (docking site for ERK FXF, also called
the F-site)
• Best characterized in ERK
• F-X-F/Y-P
• Located between 6 and 20 amino acids C-terminal to the
phosphorylation site
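The DEF site rule can be turned into a simple sequence scan: find F-X-F/Y-P motifs that lie 6 to 20 residues C-terminal to a P-X-S/T-P phosphorylation site. This is a sketch under assumptions: the distance is measured here from the S/T residue to the first F of the motif, which is one reading of the slide, and the function name is hypothetical.

```python
import re

PHOS = re.compile(r"(?=P.([ST])P)")   # group 1 = phosphoacceptor S/T
DEF = re.compile(r"(?=(F.[FY]P))")    # group 1 starts at the first F

def find_def_pairs(sequence, min_gap=6, max_gap=20):
    """(phos-site position, DEF motif start) pairs, 1-based, within the gap window."""
    seq = sequence.upper()
    sites = [m.start(1) for m in PHOS.finditer(seq)]
    motifs = [m.start(1) for m in DEF.finditer(seq)]
    return [(s + 1, d + 1)
            for s in sites for d in motifs
            if min_gap <= d - s <= max_gap]
```

For the made-up sequence `AAPLSPAAAAAAFQFPAA`, the serine at position 5 and the FQFP motif starting at position 13 are 8 residues apart, so the pair `(5, 13)` is reported.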
DEF motif
• Peptide: derived from Elk1 386–399 (phos-site + DEF site)
• 19 a.a. (excluding Cys)
substitutions at each of four
positions (Z)
• The extent of phosphorylation
was quantified
Sheridan et al.
DEF motif
Selectivity > 1.5
(bold when > 3.0)
[Figure labels: aromatic; aliphatic; no preference]
Sheridan et al.
DEF site selectivity
Phos-site
DEF site
p38α
p38δ
Sheridan et al.
DEF interacting pocket - HX-MS
green: decreased exchange rate upon DEF peptide binding →
solvent protection
Lee et al. (2004)
DEF interacting pocket - HX-MS
Strongest protected regions
pT183, pY185
yellow: surface hydrophobic residues
Lee et al. (2004)
Docking with Autodock
• Ligand: a capped pentapeptide DEF site ligand:
acetyl-SFQFP-amide
• Receptor: published structure of
diphosphorylated ERK2 (PDB code 2ERK)
• Grid map: 50 x 50 x 50 points with a spacing of
0.375 Å, centered on the previously identified
hydrophobic pocket on the ERK2 surface
• 256 independent docking runs
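The grid settings translate into a physical box size with a little arithmetic. The sketch below assumes the AutoDock convention in which the grid count gives the number of intervals along each axis, so there is one more lattice point than intervals; if the slide's 50 is instead the number of points, the edge is 49 × 0.375 = 18.375 Å. The function name is hypothetical.

```python
def grid_box(npts, spacing):
    """Edge length (in Å) and total lattice points of a cubic grid.

    Assumes npts counts intervals, so each edge has npts + 1 points.
    """
    edge = npts * spacing
    points_per_edge = npts + 1
    return edge, points_per_edge ** 3

edge, n_points = grid_box(50, 0.375)
```

With these settings the box is just under 19 Å on a side, a compact search volume centered on the hydrophobic pocket.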
Grid map
Autodock results: model clusters
Clustering threshold: RMSD 2 Å
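Grouping the docked models with a 2 Å RMSD threshold can be sketched as a greedy pass over poses sorted by energy: each pose joins the first cluster whose lowest-energy member is within the threshold, otherwise it starts a new cluster. This is an illustration of the idea, not AutoDock's exact procedure; the data layout (energy, coordinate list) and the function names are assumptions. No superposition is applied, because all poses share the receptor's coordinate frame.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def cluster_poses(poses, tolerance=2.0):
    """poses: list of (energy, coords). Returns clusters, best-energy cluster first."""
    clusters = []
    for pose in sorted(poses, key=lambda p: p[0]):
        for cluster in clusters:
            # Compare against the cluster seed (its lowest-energy member).
            if rmsd(pose[1], cluster[0][1]) <= tolerance:
                cluster.append(pose)
                break
        else:
            clusters.append([pose])
    return clusters
```

A large, low-energy cluster among the independent runs is the usual sign that the search has converged on a consistent binding mode.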
Model of DEF site interaction
Orange: peptide ligand
Green: hydrophobic pocket
Structural determinants – mutagenesis
studies
Highlighted: residues surrounding the DEF pocket
• Alanine substitutions of key residues in the binding
pocket significantly attenuate phosphorylation
(except for L195A of p38δ)
Mutagenesis studies
• Mutants that swap DEF site specificity
WT: aromatic → DM: aliphatic
WT: aliphatic → DM: aromatic
Sheridan et al.
Mutagenesis studies
• Collectively, these mutagenesis experiments
and molecular docking support a mode of
binding in which:
– P1 residue contacts residues analogous to Ile196,
Met197 and Leu198 of ERK2
– P3 residue makes contact with Leu235
Acknowledgements
Dr. Turk
Dr. Sheridan
Department of Pharmacology, Yale University