Valeri Barsegov
Department of Chemistry
University of Massachusetts Lowell
YITP, Kyoto University, Japan (2008)
I.
Introduction:
•
• single molecule spectroscopy of protein unfolding: biological relevance; pulling experiments (AFM, laser/optical tweezers, force protocols) single molecule spectroscopy of unbinding: biological relevance; experimental probes; resolution of forces, lifetimes, and extension
II. Molecular simulations of proteins:
• proteins: structure, fold types, examples
•
• all-atom Molecular Dynamics (MD) simulations: force fields, examples, simulations of IR spectra coarse-grained description of proteins: approximations, examples
III. New direction - computer simulations using graphics cards:
•
• basic facts, computer architecture, algorithms applications
I.1 Single-molecule dynamic force spectroscopy of forced unfolding of proteins: biological relevance
Fact 1: “mechanically active” proteins perform their biological function in linear tandems of
“head-to-tail” (C-terminal-to-N-terminal) connected protein domains
Examples:
•
•
•
•
Titin contains tandems of immunoglobulin (Ig) domains, separated by short linkers sequences (muscle function)
Actin-crosslinking filamins contain rod-like tandem of ddFLN domains (cellular
locomotion)
Fibronectin tandems consist of nonidentical Fn domains (extracellular matrix, cell
elasticity)
Ubiquitin is a multimeric protein (Ub) n of n=9 identical Ub repeats (protein degradation,
signaling pathways)
I.2 Single-molecule dynamic force spectroscopy of forced unfolding of proteins: AFM experiment
0 f
S force-clamp mode f
S
f
0
const t
0 f
S force-ramp mode f
S
( )
r t t
J. Brujic, R. Hermans, K. Walther & J.
Fernandez, Nature Phys., 2, 282 (2006); J.
Fernandez & H. Li, Science, 303, 1674 (2004)
M. Rief, M. Gautel, F. Oesterhelt, J. Fernandez
& H. Gaub, Science, 276, 1109 (1997); R.
Zinober, D. Brockwell, G. Beddard, A. Blake, P.
Olmsted, S. Radford & D. Smith, Protein Sci.,
11, 2759 (2002)
I.3 Single-molecule dynamic force spectroscopy of forced unbinding of proteins: biological relevance
I.4 Single-molecule dynamic force spectroscopy of forced unbinding of proteins: leukocyte rolling on endothelium
J.-G. Geng, M. Chen, K.-C. Chou, Curr Med Chem, 11,
2153 (2004); L. M. Coussens, Z. Werb, Nature, 420, 860
(2002); Y. J. Kim, L. Borgis, N. M. Varki, A. Varki,
Proc. Natl. Acad. Sci. USA, 95, 9325 (1998) ; J. Weisel, H.
Shuman, R. Litvinov, Curr Opin Struct Biol, 13, 227
(2003)
I.5 Single-molecule dynamic force spectroscopy of forced unfolding of proteins: pulling force AFM experiment f-constant f(t)=r f t t, s f, pN
J. Weisel, H. Shuman, R. Litvinov, Curr Opin Struct Biol, 13, 227 (2003); M. Schlierf, H. Li, J. Fernandez, PNAS,
101, 7299 (2004); J. Liphardt, D. Smith, C. Bustamante, Curr Opin Struct Biol, 19, 279 (2000); J.-F. Allemand, D.
Bensimon, V. Croquette, ibid, 13, 266 (2003); S. Weiss, Science, 283, 1676 (1999); E. Evans, PNAS, 98, 3784 (2001)
I.6 Single-molecule dynamic force spectroscopy of proteins: experimental resolution of unfolding forces, times, and distances
Experimental resolution:
•protein extension
X ~1 nm;
•stretching force f
S
100pN
•force-quench f
Q
5-10pN
•relaxation interval T 10-100
s
J. Fernandez & H. Li, Science, 303, 1674 (2004);
I. Schwaiger, M. Schleicher, A. Noegel & M. Rief,
EMBO Reports, 6, 46 (2005); J. Brujic, R. Hermans,
K. Walther & J. Fernandez, Nature Phys., 2, 282
(2006)
II.1 Molecular simulations of proteins: levels of structure of proteins o Amino acids in proteins (or polypeptides) are joined together by peptide bonds. o The sequence of R-groups along the chain is called the primary structure . o Secondary structure refers to the local folding of the polypeptide chain. o Tertiary structure is the arrangement of secondary structure elements in 3D o Quaternary structure describes the arrangement of a protein's subunits.
The PDB is the single worldwide repository of 3D structure data of proteins and nucleic acids: ~35,000 structures as of August 2005.
(www.rcsb.org/pdb)
Other Web Resources:
1.
NCBI
2.
The European Bioinformatics
Institute (EBI) (www.ebi.ac.uk)
3.
The RNA world (www.imbjena.de/RNA.html)
II.2 Molecular simulations of proteins: secondary and tertiary structure of proteins
Φ = -57 o , Ψ = -47 o right handed alpha-helix
Chain has directionality!
II.3 Molecular simulations of proteins: secondary and tertiary structure of proteins
Φ = (-110 o , -140 o ), Ψ = (110 o , -135 o )=> beta-sheet
II.4 Molecular simulations of proteins: quaternary structure of proteins
Alpha-beta folds
Multi-domain proteins a) Control protein b) Immunoglobulin (muscles) c) Fibronectin d) Growth factor
Knotted proteins
II.5 All-atom classical Molecular Dynamics (MD) simulations: force fields
I. Potential for bonded interactions:
V
B
V
BL
V
BA
V
DIH
V
SS
V
BL
-bondlength potential,
V
BA
-bond-angle potential,
V
DIH
-dihedral angle potential,
V
SS
– disulfide bond potential
II. Potential for non-bonded interactions:
V
NB
V
PP
V
WW
V
WP
V
PP
- protein-protein interaction potential,
V
WW
- water-water potential,
V
WP
- water-protein interaction potential
III. Software (open-source):
•GROMACS (force field: OPLS and GROMOS )
•NAMD (force fields: CHARMM22, CHARMM27)
IV. Water models:
•GROMACS (SPC, SPC/E, SPC-fw)
•NAMD (TIP, TIP3P)
GROMACS (Univ. of Groeningen, Netherlands): ftp://ftp.gromacs.org/pub/
NAMD (Univ. of Illinois at Urbana Shampaign, USA): http://www.ks.uiuc.edu/Research/namd/
II.6 All-atom MD simulations of proteins: examples of fibrinogen and
A-knob-a-hole complex of fibrin
Fibrin polymerisation: ~2,400 a.a., ~48nm
• essential for blood clotting
• implicated in heart attack and stroke
II.7
All-atom MD simulations of proteins: IR spectroscopy of proteins
Amide I
- infrared light (vibrations of bonds)
Amide I & Amide II are the major bands:
- conformationally sensitive
- localized at individual a.a site
Amide I :
C=O-stretching (90%)+C-N-stretch (10%)
Amide II :
N-H-bending (60%)+C-N-stretch (40%)
Krimm & Bandekar, Adv. Prot. Chem., 38, 181 (1986); Woutersen & Hamm, J. Phys: Cond. Matt. 14, R1035 (2002);
Venyaminov & Kalnin, Biopolymers, 30, 1243 (1990); Chergadze $ Nevskaya, ibid, 15, 637 (1976)
II.8
All-atom MD simulations of proteins: IR spectroscopy of proteins
1. Vibrational exciton Hamiltonian:
H
L ij
i
b b i
i d
2
b b b b i
L b b j frequency (
1 ); d
anharmonicity
electrostatic coupling b b j
;[ ]
ij
2. Transition dipole coupling (TDC):
L ij
C
i j r ij
3
3
(
)(
r ij
5
)
i
dipole moment unit vector r ij
vector connecting dipoles
C
coupling magnitude
3. Linear absorption spectrum:
Cheatum et al, JCP, 120, 8201 (2004);
Torii & Tasumi, JCP, 96, 3379 (1992);
S. Mukamel, Principles of Nonlinear Spectroscopy
I
e
Im
e
e
2
E e
i
;
e
e
i
i
b g i
;
line broadening
II.9
All-atom MD simulations of proteins: IR spectroscopy of proteins
Assumptions used in the vibrational exciton Hamiltonian:
- dynamics of in the near-equilibrium state
- fast bath relaxation (fixed line broadening,
)
- fitting parameters (diagonal energies, peak amplitudes, frequency splitting)
- energies/amplitudes are from ab initio maps of N-methylacetamide, glycine dipeptide analogs
- transferability of ab initio maps to larger proteins
Direct calculation of IR spectra of Amide I from MD:
Amide I
CO-vibration with | q
C
| | q
O
|
- Correction Factor due to assumptions/harmonic force field
q r
q r
q r
r
x n x
y n y
z n z
)
qr
M
CO
( )
i
V
I
j
M
CO
d
M
CO
classical current
2 dt
3 VkTn
dt e j t j 0
refractive index
QCF
1
exp[
]
Advantages of correlation functions:
- IR obtained directly from classical MD
- beyond ensemble average
- far-from-equilibrium regime
II.10
All-atom MD simulations of proteins: IR spectroscopy of proteins
Ubiquitin (1UBQ, 76 a.a):
- water box (4,600 TIP3P, 47Å
51Å
57Å)
- 8 trajectories (t=4ps, dt=0.1fs, NVE) at T=300K
- Ewald sum method (long range electrostatics)
- 12Å cutoff for L-J forces
A
16-22
(3
KLVFFAE, 21 a.a):
- water box (2000 TIP3P, 44Å
41Å
36Å)
- - Ewald sum method; 12Å cutoff for L-J forces
- 12 trajectories (t=8ps, dt=0.1fs, NVE) at T=300K
Correction Factor=0.985
(CHARMM22)
Chung et al, PNAS, 102, 612 (2005) Cheatum et al, JCP, 120, 8201 (2004)
II.11
Coarse-grained (CG) descriptions of proteins: building the CG model
I. Coarse grained model for P-selectin:
Step 1: creating structure file of C a & centers of mass of residues from PDB structure of P-selectin ( www.rcsb.org
)
•mimicking hydrogen bonds
•modeling S-S bonds
Step 2: computing potential energy of obtained conformation of P-selectin:
V
TOT
V
BL
V
SBC
V
BA
V
DIH
V
HB
V
NON
V
SS
Step 3: follow Langevin Dynamics
d dt
V
X j
V
V
X
j
TOT
K. Dill et al, Protein Sci, 4, 561 (1995);
D. Thirumalai, D. Klimov, PNAS, 97, 2544 (2000);
J. Bryngelson et al, Protein, 21, 167 (1995);
M. Karplus, A. Sali, Curr Opin Struct Biol, 5, 58 (1995);
Kolinski, J. Skolnick, Polymer, 45, 511 (2004)
II.11
Coarse-grained (CG) descriptions of proteins: force field
I. Scales of energy/length/mass/time:
m h
- hydrophobic interaction (1.25 kcal/mol);
22
- residue mass ( );
ma
2
/
h a
-bond length (3.8 Å)
- the timescale (~3ps )
II. Harmonic connectivity potentials:
V
BL
i k r
2
X i
1
X i
a
2
; V
SBC
i
2 k s
X
X
a
2
; V
BA
i k
2
0
2
; V
SS
i k
SS
r i
Cys
2
r
0
Cys
2 k r
100
h
/ a 2 ; k s
200
h
/ a 2 ; k
20
h
/ ( rad ) ;
0
105 0 ; k
SS
k s
/ 5
III. Dihedral angle potential:
V
DIH
A i
( 1
cos[
i
i i
0
])
B i
( 1
cos[ 3
i
3
i 0
])
C i
( 1
sin[
i
i 0
])
turn β-sheet α-helix
B i
0 2
h
, A i
C i
0 A i
B i
12
h
, C i
0 A i
B i
12
h
, C i
h
II.11
Coarse-grained (CG) descriptions of proteins: force field
IV. Hydrogen bond potential:
V
HB
h
3
i exp[
a
(cos
2
1 , i
cos
2
2 , i
)] cos
1 , i
( X
OH
X
X
OH
X a
2
1
)
;cos
2 , i
1
( X
OH
X i 3 , i 4
)
;
X
OH
X i 3 , i 4
V. Potential for native contacts:
V
S
(
b ij
)
h
X ij
0
X ij
12
X ij
0
X ij
6
X ij
0
X ij
5
0 b ij
contact interaction matrix; X ij
contact distance (Kolinski et al, JCP, 98, 7420 (1993))
VI. Nonbonded potential: V
S
V
B
V
BS
8
h
3
a
X ij
12
VII. Unfolding/unbinding trajectories : V
V
TOT
d dt
X j
V
X j
( );
( )
0 ,
( ) ( ' )
6 kT
( t
t ' )
II.12
Coarse-grained (CG) descriptions of proteins: forced rupture of the
P-selectin-sPSGL noncovalent bond f
( )
r t r f
s
0 t
N-terminus of P-selectin
( )
r t
C-terminus of sPSGL-1
III.1
Computer simulations using graphics cards: basic facts
CPU:
Advantages:
• can perform very sophisticated flow control (IF/THEN – cycles, conditionals, etc.)
• single CPU cores are faster (3.0GHz) or faster
• a lot of well-tested (commercial) software is available
Disadvantages:
• has no more than 6 cores (today)
• parallel programming on CPU is difficult
• data exchange b/w nodes in a cluster occurs through relatively slow network
GPU:
Advantages:
• up to 240 cores (GeForce 280, Tesla C1060)
• easy to write parallel codes with CUDA language (extension of C)
• memory bandwidth is high because all cores are local
Disadvantages:
• single core clock is not as fast as CPU core (0.5GHz)
• can’t be used for applications with sophisticated flow control
• not many software available for GPU (started in ~2006)
III.2
Computer simulations using graphics cards: hardware
GPU:
• highly parallel
• multythreaded
• manycore processor
Historically, GPU
• was designed for compute-intensive, highly parallel computation
• more transistors are devoted to data processing rather than data caching and flow control
• well-suited for problems that involve data-parallel computations, i.e. the same program is executed on many data elements in parallel (MD, coarse-grained simulations).
III.3
Computer simulations using graphics cards: programming mode
CUDA:
• consist of a minimal extension to the C language
• parallel programming model and software environment
• designed to overcome the challenge of creating software that transparently scales on manycore processors
Example:
• vecAdd() function is called N times on GPU.
• <<<1, N>>> means that the procedure runs in one 1D block with N threads.
• i = threadIdx.x
is a way for thread to identify, which element of the vector it should work with.
III.4
Computer simulations using graphics cards: software organization
Thread hierarchy :
1.
thread index is a 3D vector, so that threads can be identified using a 1D, 2D or 3D index forming 1D, 2D or 3D thread block
2.
multiple blocks can be organized into 1D or
2D grid. Each block can be identified within grid using 1D, 2D or 3D block index
3.
all threads in one block are doing the same thing with different data
4.
threads can synchronize and pass data to each other within block using shared memory
5.
threads can pass the data to the CPU through the GPU global memory
III.5
Computer simulations using graphics cards: software organization
Memory hierarchy :
1.
each thread has it own local memory (in cache) for storing temporary variables
2.
each block has shared memory (in cache) for synchronizing the threads within block
3.
device has global memory that can be accessed from any thread on the GPU
4.
local memory and shared memory are much faster than global, but they are available only locally and exists only during the lifetime of a thread or a block.
5.
global memory is relatively slow, and can be also accessed from CPU.
III.6
Computer simulations using graphics cards: hardware model
Hardware organization:
1.
each device have N multiprocessors
2.
multiprocessors can share data only through device (global) memory
3.
A multiprocessor have M processors ( ALU s)
4.
number of threads that can run at the same time is equal to N x M ; for GeForce 8800GT,
M =8, N =14 (number of processors = 112!!!)
5.
one block can run only on one multiprocessor so the number of blocks in program should be at least equal to the number of multiprocessors on the device.
III.7
Computer simulations using graphics cards: applications
MD and CG simulations are suitable for GPU:
• same potential (force field) for all atoms
(beads)
• integration scheme is explicit
• systems have huge number of atoms (beads)
Example:
• Rouse chain model of homopolymer
• Lennard-Jones potential (self-avoidance)
• 1,000,000 time steps for each chain
• Intel Xeon 2GHz Dual Core (CPU,
~$350) vs GeForce 8800 GT (GPU, ~$130)