SMA5233
Particle Methods and Molecular Dynamics
Lecture 5: Applications in Biomolecular Simulation and
Drug Design
A/P Chen Yu Zong
Tel: 6516-6877
Email: phacyz@nus.edu.sg
http://bidd.nus.edu.sg
Room 08-14, level 8, S16
National University of Singapore
Proteins
Proteins are life’s machines, tools and structures
– Many jobs, many shapes, many sizes
Proteins
Proteins are life’s machines, tools and structures
– Nature reuses designs for similar jobs
(Figure: structures with PDB IDs 1enh, 1hdd, 1bw5, 1f43, 1du6, 1ftt, 1cqt)
Proteins
Proteins are hetero-polymers of specific sequence
(Figure: a chain of residues labelled M, K, L, V, D, Y, A, G, E)
– There are 20 common polymeric units (amino acids)
Composed of a variety of basic chemical moieties
– Chain lengths range from 40 amino acids on up
Proteins
Proteins are hetero-polymers that adopt a
unique fold
(Figure: a chain of residues labelled M, K, L, V, D, Y, A, G, E)
Proteins
Protein folding as a reaction
(Figure: free-energy diagram. Free energy runs from good (low) to bad (high); reactants and products are separated by a transition state.)
Proteins
Protein folding …
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states)
Proteins
Folded proteins
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states)
Native state: folded, active, functional, biologically relevant state (ensemble of conformers)
Proteins
Folded proteins
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states)
Static 3D coordinates of some proteins' atoms are available from X-ray crystallography & NMR
Proteins
Folded proteins
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states)
Static 3D coordinates of some proteins' atoms are available from the PDB: http://www.pdb.org
Proteins
Folded proteins are complex and dynamic
molecules
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states)
Molecular Dynamics
MD provides atomic resolution of native
dynamics
PDB ID: 3chy, E. coli CheY 1.66 Å X-ray crystallography
(Figure: 3chy, hydrogens added)
(Figure: 3chy, waters added, i.e. solvated)
(Figure: 3chy, waters and hydrogens hidden)
(Movie: native state simulation of 3chy at 298 Kelvin, waters and hydrogens hidden)
Proteins
Folding & unfolding at atomic resolution
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states)
Denatured state: disordered, non-functional, heterogeneous ensemble of conformers
Proteins
Protein folding, why we care how it happens
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states, with mutations marked on the diagram)
Many diseases are related to protein folding and / or misfolding in response to genetic mutation.
Proteins
Protein folding, why we care how it happens
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states, with mutations marked on the diagram)
We need to comprehend folding to build nano-scale biomachines (that could produce energy, etc.)
Proteins
Protein folding takes > 10 μs (often much
longer)
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states)
Proteins
Protein folding is the reverse of protein
unfolding
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states)
Proteins
The pathway of protein unfolding is relatively invariant to temperature
(Figure: free-energy diagram of the denatured / partially unfolded, transition and native states, with temperature indicated)
Molecular Dynamics
MD provides atomic resolution of folding /
unfolding
(Movie: unfolding simulation (reversed) of 3chy at 498 Kelvin, waters & hydrogens hidden)
Forces Involved in Protein Folding
Electrostatic interactions
van der Waals interactions
Hydrogen bonds
Hydrophobic interactions
(Hydrophobic molecules associate with each other in water as if the water molecules repelled them, like the separation of oil and water. The presence of water is essential for this interaction.)
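Energy Functions used in Molecular Simulation
$$
V_{total} = \sum_{bonds} K_b (r - r_0)^2
 + \sum_{angles} K_\theta (\theta - \theta_0)^2
 + \sum_{dihedrals} K_\phi \left[ 1 + \cos(n\phi - \delta) \right]
 + \sum_{Hbonds} \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right)
 + \sum_{\substack{van\ der\ Waals \\ i,j\ pairs}} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
 + \sum_{\substack{electrostatic \\ i,j\ pairs}} \frac{q_i q_j}{\varepsilon_r r_{ij}}
$$
Terms, in order: bond stretching term (bond length r), angle bending term (angle Θ), dihedral term (torsion angle Φ), H-bonding term (O···H distance r), van der Waals term (pair distance r), electrostatic term (charges + and − at distance r).
The non-bonded pair sums (van der Waals and electrostatic) are the most time-demanding part.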
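A minimal sketch of why the non-bonded terms dominate the cost: the 12-6 van der Waals and Coulomb sums run over every i<j atom pair. The function name, the pairwise parameter tables A and B, and the unit system (Coulomb prefactor taken as 1) are assumptions for illustration, not taken from any particular MD package.

```python
import math

def nonbonded_energy(coords, charges, A, B, eps_r=1.0):
    """Sum the 12-6 van der Waals and Coulomb terms over all i<j pairs.

    coords  : list of (x, y, z) tuples
    charges : list of partial charges q_i
    A, B    : pairwise coefficient tables, A[i][j] and B[i][j] (assumed layout)
    eps_r   : relative dielectric constant
    Units are assumed to be chosen so that the Coulomb prefactor is 1.
    """
    n = len(coords)
    e_vdw = 0.0
    e_elec = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):  # N(N-1)/2 pairs in total
            r = math.dist(coords[i], coords[j])
            e_vdw += A[i][j] / r**12 - B[i][j] / r**6
            e_elec += charges[i] * charges[j] / (eps_r * r)
    return e_vdw, e_elec
```

Production codes avoid the full double loop with cutoffs and neighbour lists; the sketch keeps the plain pair sum so that it matches the formula term by term.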
System for MD Simulations
Without water molecules: 304 atoms
With water molecules: 304 + 7,377 = 7,681 atoms
MD Requires Huge Computational Cost
The time step of MD (Δt) is limited to about 1 fsec (10^-15 sec).
← Δt should be approximately one-tenth of the period of the fastest motion in the system. In biomolecular simulations the fastest motions are the bond stretching motions of light atoms (e.g. O-H, C-H), whose periods are about 10^-14 sec, so Δt is usually set to about 1 fsec.
A huge number of water molecules has to be included in biomolecular MD simulations.
← The number of atom pairs evaluated for non-bonded interactions (van der Waals, electrostatic) increases as N^2 (N is the number of atoms).
Long simulations are therefore difficult. Usually a simulation of a few tens of nanoseconds is performed.
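The two cost factors can be put into numbers with a short calculation. This is only a back-of-envelope sketch: the helper names are illustrative, the atom counts are those of the peptide system with and without water, and 10 μs is used as a folding time scale.

```python
def n_steps(total_time_s, dt_s=1e-15):
    """Number of MD steps needed to cover total_time_s with time step dt_s."""
    return round(total_time_s / dt_s)

def n_pairs(n_atoms):
    """Number of distinct atom pairs, N(N-1)/2, without any cutoff."""
    return n_atoms * (n_atoms - 1) // 2

steps_for_10_us = n_steps(10e-6)   # steps of 1 fsec needed to reach 10 microseconds
pairs_without_water = n_pairs(304)
pairs_with_water = n_pairs(7681)
pair_ratio = pairs_with_water / pairs_without_water
```

Adding the water raises the pair count by a factor of several hundred, and each of the 10^10 steps has to evaluate those pairs.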
Time Scales of Protein Motions and MD
(Figure: time axis from 10^-15 s (fs) through 10^-12 s (ps), 10^-9 s (ns), 10^-6 s (μs) and 10^-3 s (ms) to 10^0 s, marking bond stretching, elastic vibrations of proteins, permeation of an ion in the porin channel, α-helix folding, β-hairpin folding and protein folding, together with the range covered by MD)
It is still difficult to simulate the whole process of protein folding using the conventional MD method.
Much Faster, Much Larger!
Special-purpose computer
– Non-bonded interactions are calculated on a special chip developed only for this purpose.
– For example:
MDM (Molecular Dynamics Machine) or MD-Grape: RIKEN
MD Engine: Taisho Pharmaceutical Co. and Fuji Xerox Co.
Parallelization
– A single job is divided into several smaller ones, which are calculated on multiple CPUs simultaneously.
– Today, almost all MD programs for biomolecular simulations (e.g. AMBER, CHARMm, GROMOS, NAMD, MARBLE) can run on parallel computers.
Brownian Dynamics (BD)
The dynamic contributions of the solvent are incorporated as a dissipative random force (Einstein's derivation in 1905). Therefore, water molecules are not treated explicitly.
Since the BD algorithm is derived under the conditions that solvent damping is large and the inertial memory is lost in a very short time, longer time steps can be used.
The BD method is suitable for long simulations.
System for BD Simulations
Without water molecules: 304 atoms
With water molecules: 304 + 7,377 = 7,681 atoms
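Algorithm of BD
The Langevin equation can be expressed as
$$ m_i \frac{d^2 \mathbf{r}_i}{dt^2} = -\zeta_i \frac{d\mathbf{r}_i}{dt} + \mathbf{F}_i + \mathbf{R}_i \qquad (7) $$
Here, $\mathbf{r}_i$ and $m_i$ represent the position and mass of atom i, respectively. $\zeta_i$ is a frictional coefficient determined by Stokes' law, $\zeta_i = 6\pi a_i^{Stokes} \eta$, in which $a_i^{Stokes}$ is the Stokes radius of atom i and $\eta$ is the viscosity of water. $\mathbf{F}_i$ is the systematic force on atom i. $\mathbf{R}_i$ is a random force on atom i with zero mean, $\langle \mathbf{R}_i(t) \rangle = 0$, and variance $\langle \mathbf{R}_i(t) \cdot \mathbf{R}_j(t') \rangle = 6 \zeta_i k_B T \delta_{ij} \delta(t - t')$; it derives from the effects of the solvent.
In the overdamped limit, we set the left-hand side of eq. 7 to zero:
$$ \zeta_i \frac{d\mathbf{r}_i}{dt} = \mathbf{F}_i + \mathbf{R}_i \qquad (8) $$
Integrating eq. 8 gives the Brownian dynamics algorithm:
$$ \mathbf{r}_i(t + \Delta t) = \mathbf{r}_i(t) + \frac{\mathbf{F}_i(t)}{\zeta_i} \Delta t + \sqrt{\frac{2 k_B T}{\zeta_i} \Delta t}\; \boldsymbol{\omega}_i \qquad (9) $$
where Δt is a time step and $\boldsymbol{\omega}_i$ is a random noise vector whose components are drawn from a Gaussian distribution with zero mean and unit variance.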
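The Brownian dynamics update (eq. 9) can be sketched in a few lines. This is a minimal illustration, not the code used for the timings in this lecture: the function name and argument layout are assumptions, forces are passed in precomputed, and units are left to the caller.

```python
import math
import random

def bd_step(positions, forces, zeta, kT, dt, rng):
    """Advance every atom by one Brownian dynamics step.

    positions, forces : lists of (x, y, z) tuples, one per atom
    zeta              : list of friction coefficients, one per atom
    kT                : thermal energy k_B * T
    dt                : time step
    rng               : random.Random instance supplying the Gaussian noise
    """
    new_positions = []
    for r, f, z in zip(positions, forces, zeta):
        # Standard deviation of the random displacement per coordinate.
        sigma = math.sqrt(2.0 * kT * dt / z)
        new_positions.append(tuple(
            x + fx / z * dt + sigma * rng.gauss(0.0, 1.0)
            for x, fx in zip(r, f)
        ))
    return new_positions
```

With no systematic force, repeated calls produce free diffusion with a mean squared displacement of 6 (kT / ζ) Δt per step, which is a convenient sanity check for an implementation.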
Computational Time of BD
Computational time required for 1 nsec simulation of a peptide

Algorithm    Computer             # of atoms   Time (sec)   Efficiency
MD           Pentium4 2.8 GHz     7,681        2,057        1.00
BD           Pentium4 2.8 GHz     304          38.8         53.0
BD + MTS†    Pentium4 2.8 GHz     304          12.8         161
BD + MTS†    IBM Regatta 8 CPU    304          3.4          605

†MTS (multiple time step) algorithm: this method reduces the frequency of calculation of the most time-demanding part (non-bonded energy terms).
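The Efficiency column appears to be the MD run time divided by the run time of each method. That reading is an inference from the numbers, not something stated on the slide; the short check below reproduces the column under that assumption.

```python
# Run times in seconds for a 1 nsec simulation, as listed in the table.
timings = {
    "MD": 2057.0,
    "BD": 38.8,
    "BD + MTS": 12.8,
    "BD + MTS, 8 CPU": 3.4,
}

def relative_efficiency(t_reference, t_method):
    """Speed-up of a method relative to the reference run time."""
    return t_reference / t_method

efficiency = {
    name: relative_efficiency(timings["MD"], t) for name, t in timings.items()
}
```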
Folding Simulation of an α-Helical Peptide using BD
(Figure: fraction of native contacts, 0 to 1.0, versus simulation time, 0 to 400 nsec)
Folding Simulation of a β-Hairpin Peptide using BD
(Figure: fraction of native contacts, 0 to 1.0, versus simulation time, 0 to 400 nsec)
Time Scales of Protein Motions and BD
(Figure: time axis from 10^-15 s (fs) through 10^-12 s (ps), 10^-9 s (ns), 10^-6 s (μs) and 10^-3 s (ms) to 10^0 s, marking bond stretching, elastic vibrations of proteins, permeation of an ion in the porin channel, α-helix folding, β-hairpin folding and protein folding, together with the ranges covered by MD and BD)
The BD method allows us to simulate for a long time.
EXAMPLE: Unfolding of Staphylococcal
protein A through high temperature MD
simulations
D. Alonso and V. Daggett, PNAS 2000; 97: 133-138.
Simulation Methodology
Starting structures: NMR structures 1edk and 1bbd
ENCAD program
The protein was initially minimized for 1,000 steps in vacuo.
The minimized protein was then solvated with water in a box (approximately 1 g/ml) extending a minimum of 10 Å from the protein.
The box dimensions were then increased uniformly to yield the experimental liquid water density for the temperature of interest (0.997 g/ml at 298 K and 0.829 g/ml at 498 K).
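The uniform box expansion reduces to a single scale factor: density is inversely proportional to volume, so scaling every box edge by (ρ_current / ρ_target)^(1/3) yields the target density. The sketch below assumes the mass in the box stays fixed; the function name is illustrative and the starting density of 1.0 g/ml is the approximate value quoted above.

```python
def box_scale_factor(rho_current, rho_target):
    """Factor by which each box edge is multiplied to reach rho_target.

    Assumes the mass in the box is unchanged, so density scales as 1 / volume.
    """
    return (rho_current / rho_target) ** (1.0 / 3.0)

# Edge scaling needed to go from about 1.0 g/ml to the 498 K water density.
scale_498K = box_scale_factor(1.0, 0.829)
```

For the 498 K system this expands each edge by a little over 6%.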
Simulation Methodology (cont)
The systems were then equilibrated by minimizing the
water for 2,000 steps, minimizing water and protein for
100 steps, performing MD of the water for 4,000 steps,
minimizing the water for 2,000 steps, minimizing the
protein for 500 steps, and minimizing the protein and water
for 1,000 steps.
Production MD simulations were then run using a 2-fs
time step for several ns (T=298K, T=498K)
(Figure: RMSD and RG as a function of simulation time)
(Figure: snapshots along the unfolding trajectory)
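RMSD and the radius of gyration (RG) are the two quantities followed along the trajectory. A minimal sketch of both follows, with simplifying assumptions: all atoms are weighted equally (no mass weighting), and the RMSD is taken without first superimposing the structure on the reference, a step that real analyses normally perform. The function names are illustrative.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the atoms from their geometric centre."""
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, centre) ** 2 for c in coords) / n)

def rmsd(coords, reference):
    """Root-mean-square deviation between matched atoms, no superposition."""
    n = len(coords)
    return math.sqrt(
        sum(math.dist(a, b) ** 2 for a, b in zip(coords, reference)) / n
    )
```

Evaluated on every saved frame against the starting structure, these give the two curves plotted against simulation time.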
EXAMPLE 2: Identification of the N-terminal peptide
binding site of GRP94
GRP94: glucose-regulated protein 94
VSV8 peptide: derived from vesicular stomatitis virus
Gidalevitz T, Biswas C, Ding H, Schneidman-Duhovny D, Wolfson HJ, Stevens F, Radford S, Argon Y. J Biol Chem. 2004
Biological and Drug Design Motivation
The complex between the two molecules strongly stimulates the response of the T-cells of the immune system.
The grp94 protein alone does not have this property. The activity that stimulates the immune response is due to the ability of grp94 to bind different peptides.
Characterization of the peptide binding site is highly important for drug design.
Either the peptides or non-peptide inhibitors derived from them can be developed into drugs for treating immune-related or immune-regulating diseases.
GRP94 molecule
No experimental structure of the grp94 protein was available. Homology modeling was used to predict a structure from another protein with 52% sequence identity.
The structure of grp94 has since been published. The RMSD between the crystal structure and the model is 1.3 Å.
Docking and Structure Optimization
PatchDock was applied to dock the two molecules without any binding site constraints, followed by MD or MM simulation to optimize the docked structure.
Docking results were clustered in the two cavities:
GRP94 molecule
There is a binding site for inhibitors between the helices.
There is another cavity, formed by a beta sheet, on the opposite side.
Advantages of Fully Atomic Models
– Detailed level of description of protein and solvent
Disadvantages of Fully Atomic Models
– Computationally very costly
– Cannot reach the long time and length scales of biological interest
Can use enhanced sampling techniques (see talks by Andrij and Arturo)
Can use simplified (coarse-grained) models
Molecular Dynamics
Scalable, parallel MD & analysis software: ilmm¹ (in lucem Molecular Mechanics)
1. Beck, Alonso, Daggett (2004) University of Washington, Seattle
Molecular Dynamics
ilmm is written in C (ANSI / POSIX)
64 bit math
POSIX threads / MPI
– POSIX threads (multiprocessor machines)
– Message Passing Interface (multiple machines)
(Figure: CPUs within a machine linked by VERY high bandwidth, combined with message passing between machines)
Software design philosophy:
– Kernel
Compiles user's molecular mechanics programs
Schedules execution across processors and machines
– Modules, e.g.
Molecular Dynamics
Analysis
Dynameomics
Simulate representative protein from all folds
(Figure: population and coverage by fold)
150 folds represent ~ 75% of known protein structures¹
1. Day R., Beck D. A. C., Armen R., Daggett V. Protein Science (2003) 12: 2150-2160.
Dynameomics
Simulate representative protein from all folds
– Native (folded) dynamics
20 nanosecond simulation at 298 Kelvin
– Folding / unfolding pathway
3 x 2 ns simulations at 498 K
2 x 20 ns simulations at 498 K
– Each target requires 6 simulations = MANY CPU HOURS
Dynameomics
NERSC DOE INCITE award
– 2,000,000 + hours
– 906 simulations of 151 protein folds on Seaborg
– One to two simulations per node (8 – 16 CPUs /
simulation)
– Opportunity to tune ilmm for maximum
performance
Dynameomics
Load balancing
– Even distribution of non-bonded pairs to processors: ~20% faster
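One simple way to distribute non-bonded pairs evenly is to enumerate the i<j pairs and cut the list into chunks whose sizes differ by at most one. This is only an illustrative sketch; how ilmm actually balances its pair lists is not described here, and the function name is an assumption.

```python
def split_pairs_evenly(n_atoms, n_procs):
    """Split all i<j atom pairs into n_procs chunks of near-equal size."""
    pairs = [(i, j) for i in range(n_atoms) for j in range(i + 1, n_atoms)]
    base, extra = divmod(len(pairs), n_procs)
    chunks = []
    start = 0
    for p in range(n_procs):
        # The first `extra` processors take one additional pair each.
        size = base + (1 if p < extra else 0)
        chunks.append(pairs[start:start + size])
        start += size
    return chunks
```

Splitting by atom index instead would give the processor holding the first atoms far more pairs than the one holding the last, since atom i contributes N - 1 - i pairs to the i<j loop.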
Dynameomics
Parallel efficiency
– Threaded computations on 16 CPU IBM Nighthawk
$$ e(p) = \frac{1}{p} \, \frac{t(1)}{t(p)} $$
where p is the number of processors and t(p) is the run-time using p processors.
(Figure: parallel efficiency, 0 to 1, versus number of CPUs: 1, 2, 4, 8, 12, 16)
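The parallel efficiency e(p) = t(1) / (p t(p)) is straightforward to evaluate from measured run times. The timings in the usage lines below are hypothetical, chosen only to show the calculation; they are not the Nighthawk measurements.

```python
def parallel_efficiency(t1, tp, p):
    """Parallel efficiency e(p) = t(1) / (p * t(p))."""
    return t1 / (p * tp)

# Hypothetical run times in seconds (illustration only).
ideal = parallel_efficiency(100.0, 12.5, 8)    # perfect scaling on 8 CPUs
halved = parallel_efficiency(100.0, 25.0, 8)   # only a 4x speed-up on 8 CPUs
```

An efficiency of 1 means the run time falls exactly as 1/p; values below 1 measure the fraction of the added processors that is actually put to use.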
Dynameomics
Simulate representative from top 151 folds
– 151 folds represent about 75% of known
proteins
~ 11 μs of combined sim. time from 906 sims!
~ 2 terabytes of data (w/ 40 to 60% compression!)
~ 75 / 151 have been analyzed
Validated against experiment where possible
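The simulation count follows from the per-target protocol listed earlier (one 20 ns native run, three 2 ns runs and two 20 ns runs at 498 K). The sketch below does the bookkeeping under the assumption that every run has exactly its nominal length; the variable names are illustrative.

```python
NATIVE_RUNS_NS = [20]                    # 298 K native-state run
UNFOLDING_RUNS_NS = [2, 2, 2, 20, 20]    # 498 K unfolding runs
N_TARGETS = 151

runs_per_target = len(NATIVE_RUNS_NS) + len(UNFOLDING_RUNS_NS)
total_runs = runs_per_target * N_TARGETS
ns_per_target = sum(NATIVE_RUNS_NS) + sum(UNFOLDING_RUNS_NS)
total_microseconds = ns_per_target * N_TARGETS / 1000.0
```

The run count matches the 906 simulations quoted. The nominal total comes to about 10 μs, slightly below the ~11 μs quoted; the slides do not say where the difference comes from.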
Dynameomics
Now what?
– Simulate the top 1130 folds (>90%)
More CPU time
– Share simulation data from top 151 folds w/ world:
www.dynameomics.org
Coordinates, analyses, available via WWW
Microsoft SQL database w/ On-Line Analytical Processing (OLAP)
End-user queries of coordinate data, analyses, etc.
– Data mining
More CPU time, clever statistical algorithms, etc.