LARGE SCALE BIOMOLECULAR SIMULATIONS: CURRENT STATUS AND FUTURE
PROSPECTS
Yalini Arinaminpathy, Oliver Beckstein, Philip C. Biggin, Peter J. Bond, Carmen Domene,
Andrew Pang and Mark S.P. Sansom*
Department of Biochemistry, University of Oxford, OX1 3QU, U.K.
*to whom correspondence should be addressed: mark.sansom@biop.ox.ac.uk
Keywords: molecular dynamics; protein; membrane; ion channel; hpc; grid; capability
computing; capacity computing; computational biology
Abstract
Large-scale biomolecular simulations form an increasingly important component of a number
of areas of biological investigation, including bionanoscience, structural bioinformatics and
systems biology. Future trends in biomolecular simulations will emphasise greater depth (more
detailed physico-chemical models), greater breadth (comparative simulations across families of
biomolecules), and greater complexity (simulations of large, multi-component systems). These
classes of simulation will place increasing demands on different aspects of high performance
computing, namely capability, capacity and GRID-enabled computing. These developments are
explored via examples of simulations from the authors’ laboratory, including ion channels,
model nanopores, ligand binding proteins, and bacterial outer membranes.
1. Introduction
Biomolecular simulations enable us to explore the dynamics and energetics of complex
biological molecules and systems, starting from e.g. the structure of a protein determined via
X-ray diffraction or NMR studies. Most such simulations use molecular dynamics (MD), in which
the classical equations of motion of the atoms in a system (interacting with one another via an
empirical forcefield) are solved by numerical integration, yielding a trajectory (i.e. a ‘movie’)
of the system over a time period of ~10 ns. MD simulations of biomolecules have been in use
for ~25 years [1] and have yielded valuable results in a number of areas of macromolecular
function, structure and stability. Such simulations are of particular interest in that they enable
us to extrapolate from the essentially static X-ray structure of a protein to a more dynamic
picture of the protein in its physiological environment. This in turn provides us with enhanced
insights into the relationship between protein structure, dynamics and function.
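As an illustration of what "solved by numerical integration" means in practice, the following minimal sketch integrates the classical equations of motion with the velocity Verlet scheme. The choice of integrator, the one-dimensional harmonic "forcefield", and all names and parameter values are assumptions made for illustration; production MD codes use far more elaborate forcefields and may use other integrators.

```python
import numpy as np

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme.

    Returns the positions at every step (the 'trajectory') and the final velocity.
    """
    traj = np.empty((n_steps + 1,) + np.shape(x))
    traj[0] = x
    f = force(x)
    for step in range(1, n_steps + 1):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new positions
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        traj[step] = x
    return traj, v

# Hypothetical toy forcefield: a single harmonic spring (arbitrary units).
SPRING_CONSTANT = 1.0

def harmonic_force(x):
    return -SPRING_CONSTANT * x

x0 = np.array([1.0])
v0 = np.array([0.0])
traj, v_final = velocity_verlet(x0, v0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)

# Total energy should stay close to its initial value of 0.5 for a stable integrator.
energy = 0.5 * v_final[0] ** 2 + 0.5 * SPRING_CONSTANT * traj[-1, 0] ** 2
print(f"final position {traj[-1, 0]:.4f}, total energy {energy:.6f}")
```

In a real simulation `x` would hold the coordinates of all atoms and `force` would evaluate the empirical forcefield; the loop structure is otherwise the same.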
In the early days of MD simulations of biomolecules, very large-scale computational
facilities were required to run even short simulations. This limited the application of these
methods to a few studies of rather small proteins. More recently biomolecular simulations have
benefited from advances in computer technology. Increases in supercomputer capability have
enabled us to explore much larger molecules for much longer timescales, thus increasing the
biological impact of such studies. Simultaneously, advances in capacity computing (e.g.
commodity clusters) have greatly increased the numbers of research groups running
biomolecular simulations, and thus expanded the range of proteins and other systems being
studied.
As a consequence of these and other advances, MD simulations now provide an
important complement to experimental studies of biological macromolecules and systems.
Simulations enable us to explore conformational dynamics in systems that are difficult to probe
experimentally, such as the behaviour of water within pores of nanoscopic dimensions (see
below). Simulations may also be used in modelling studies in order to aid extrapolation from
the structure and dynamics of bacterial proteins to the behaviour of their human homologues
[2]. Simulations may also help us to understand the effects of mutations (both in vitro and in
disease states) on the function of proteins [3].
In this paper, we will discuss the current and future state of biomolecular simulations,
with a focus on leading edge applications that need HPC and related high-end resources. This
discussion will be illustrated with examples from the authors’ laboratory. We apologise to our
colleagues for the inevitable bias and omissions that this will introduce.
2. Three key directions
There are three key directions in which current MD simulations must develop if they are
to address contemporary biomolecular problems. These are: (i) greater depth, i.e. more detailed
and realistic physico-chemical models than those offered by current molecular mechanics
forcefields; (ii) greater breadth, i.e. increasing the range of simulations, in order to meet the
challenges offered by the post-genomic expansion in structural biology; and (iii) greater
complexity, i.e. addressing ever more complex biological assemblies and systems via atomistic
simulation.
All three directions will make great demands on computational facilities. However, the
computational needs of these different categories of simulation are not all the same. An
appreciation of the differences will be the key to providing an optimal infrastructure for the
next wave of computational structural and systems biology.
A further aspect that must not be ignored is the development of improved methods for
storage, analysis and archival of biomolecular simulation data. However, this aspect will not be
addressed here; the interested reader is referred elsewhere in this volume for a description of
the BioSimGRID project (www.biosimgrid.org).
3. Greater depth
As an example of the need for greater physico-chemical complexity we will consider an
ion channel. Ion channels are membrane proteins that form pores in cell membranes. Selected
ions flow rapidly (~10^7 ions sec^-1) through these pores. Ion channels play important roles in
most cells, but especially in cells of the nervous system. The structures of a number of bacterial
homologues of mammalian ion channels have been determined by X-ray diffraction (reviewed
in [4]). One such channel, a bacterial potassium selective channel KcsA [5, 6], has been the
subject of numerous MD simulations [7-10].
MD simulations have addressed a number of aspects of KcsA function, including ion
permeation, selectivity and gating (for reviews see [11, 12]). For several of these areas, current
MD approaches do provide a suitable methodology, although the biological significance of the
results may benefit from longer simulations. However, some aspects of channel function may
require more sophisticated (and computationally expensive) approaches. In particular, the
question of the ion selectivity of KcsA (i.e. why K+ and Rb+ ions may pass freely through the
channel whereas Na+ ions move through very slowly if at all) has pushed conventional MD
approaches to their limits. A number of MD studies have attempted to address the question of
the relative stability of K+ vs. Na+ ions within the selectivity filter region of KcsA (e.g. [7,
13-15]). What is evident from these studies is that an accurate treatment of the energetics of
this system (an essential prerequisite to an understanding of ion selectivity) requires accurate
calculation of the energetics of ion-protein interactions, of ion-water interactions, and of
ion-induced protein distortions.
There are some indications that a more detailed physico-chemical model may be
necessary for an accurate treatment of ion selectivity in KcsA, and by extension in other ion
channels. Firstly, it remains uncertain which is the ‘best’ set of molecular mechanics
parameters to use for ion channel simulations [16]. This is an aspect of a more general problem
of the transferability of molecular mechanics forcefields. In particular, it is likely that when
cations are within the selectivity filter of KcsA there will be a degree of electronic polarisation
of the oxygen atoms that surround the ions in the filter. This polarisation will depend upon the
ionic species and also on the exact configuration of the system, i.e. the location of the ions.
Some evidence for changes in electronic polarisation has emerged from preliminary density
functional calculations of K+ ions in the KcsA filter [17]. However, it is important that such
studies are extended to more ion/filter configurations and to different ionic species. A more
rigorous treatment would require e.g. CPMD simulations (www.cpmd.org) in order to treat
dynamic changes in electronic polarisation during ion movement. Such calculations will require
an extended region of the KcsA system to be treated quantum mechanically, thus making rather
large computational demands.
Figure 1: A KcsA channel (blue cylinders) in a lipid bilayer. The expanded region shows K+ ions
(cyan) and water molecules within the selectivity filter.
A more in-depth approach to simulations of ion channels and related systems presents a major
challenge in terms of multiple scale biomolecular simulations (see Fig. 1). The channel protein
is embedded in a lipid + water environment. To fully represent the slow (> 5 ns) fluctuations
in this environment, MD
simulations of systems of at least 50,000 atoms are needed. The environmental fluctuations
may be coupled to changes in the conformation of the protein (~3000 atoms, neglecting
hydrogens), which in turn may be coupled to changes in atomic positions and electronic
polarisation within the selectivity filter (~100 atoms).
As a further example of how in-depth simulations can extend our understanding of the
physico-chemical properties of biological systems, let us consider the case of water confined
within a pore of nanoscopic dimensions. This is relevant to our understanding of ion channels,
and of aquaporins (biological water pores [18, 19]). We have simulated the behaviour of water
within model nanopores (see Fig. 2) of radii ranging from 3 to 10 Å [20]. One of the
unexpected properties of water in such pores is that at intermediate radii (~5 Å) the water
within the pore oscillates between a liquid and a vapour state. These oscillations are relatively
slow and so extended simulations (~50 ns) are needed to capture their behaviour. Indeed, the
characterisation of the behaviour of water in this relatively simple system required a total of
~1500 cpu days. To extend such studies to a more complex water model (the current study used
a relatively simple three point fixed charge model) clearly will require a considerable increase
in computational power.
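The liquid-vapour oscillations described above are typically followed by counting the water molecules inside the pore in each trajectory frame. The sketch below shows one simple way such a time series might be reduced to a two-state description. The threshold, the function name and the synthetic counts are all hypothetical and are not taken from the study cited in [20].

```python
import numpy as np

def two_state_summary(n_water, threshold):
    """Classify each frame as liquid-filled or vapour-like from the pore water count.

    Returns the fraction of frames in the liquid-filled state and the number of
    transitions between the two states.
    """
    n_water = np.asarray(n_water)
    filled = n_water >= threshold                        # True = liquid-filled frame
    n_transitions = int(np.count_nonzero(filled[1:] != filled[:-1]))
    return float(filled.mean()), n_transitions

# Synthetic water counts per frame (illustrative only, not simulation data).
counts = np.array([12, 11, 13, 1, 0, 0, 2, 12, 14, 13, 0, 1])
fraction_filled, n_transitions = two_state_summary(counts, threshold=6)
print(fraction_filled, n_transitions)
```

Because the transitions are slow, a reliable estimate of the filled fraction needs many such events, which is why long (~50 ns) trajectories are required.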
Figure 2: A simple model nanopore (blue) embedded in a
membrane mimetic slab (gold) with water molecules on
either side of the membrane and within the pore.
What are the likely computational requirements of in-depth simulations of ion channels and
related pores? To date, MD simulations have been performed mainly on clusters and similar
capacity machines. However, large scale ab initio calculations will require access to HPC
facilities, and to suitably scalable codes. If such calculations are to be coupled to
conventional MD simulations for the remainder of the protein and its environment then
GRID-based approaches (such as those in e.g. the RealityGrid project, www.realitygrid.org) will
be required in order to synchronise the different aspects of the calculations.
4. Greater breadth
It is important that biomolecular simulation studies are responsive to the challenges of a
post-genomic era. Much progress in biology is made by comparisons. For example, in the
context of protein structures, the ongoing expansion in the number of protein structures being
determined (see www.rcsb.org) has led to the development of the discipline of structural
bioinformatics [21], based on comparative analysis of protein structures. A great opportunity
thus arises to derive general results by applying MD simulations to a wide range of proteins. In
particular, by applying MD simulations to different protein folds it will be possible to correlate
aspects of protein flexibility with the different protein architectures. Given the importance of
protein flexibility and conformational change in protein function, it is essential that we
approach this aspect of biomolecular simulations in a more wide-ranging and systematic
fashion than has been possible to date.
As an example of the biological importance of this approach, we will consider two
classes of proteins that one might not expect to be related in their conformational dynamics,
namely glutamate receptors and bacterial periplasmic binding proteins. Glutamate receptors
(GluRs) are complex neurotransmitter-activated ion channels present in the central nervous
systems of mammals. X-ray structures of the neurotransmitter (glutamate) binding fragment of
two related mammalian GluRs (GluR2 and NR1) and of a bacterial homologue (GluR0) have
been determined and shown to have similar structures [22-24]. Structural comparisons revealed
that a similar protein fold is found in a functionally unrelated class of proteins, the bacterial
periplasmic binding proteins, which include the glutamine-binding protein (GlnBP) and the
lysine-arginine-ornithine binding protein (LAOBP). All of these proteins share a common fold,
with a ligand-binding site in between two domains (see Fig. 3). From comparison of multiple
static X-ray structures it has been suggested that these domains move together upon ligand
binding.
Comparative MD simulations have been performed on four of these proteins: GluR2
[25], GluR0 (Arinaminpathy, Sansom and Biggin, unpublished data), GlnBP [26] and LAOBP
(Pang, Sansom and Biggin, unpublished data). The results reveal some interesting similarities
in the dynamics of the different proteins. In particular a number of the simulations provide
evidence for dynamic hinge-bending motions of the protein that enable the two domains to
move together/apart. This is illustrated in Fig. 3 for LAOBP. Thus, these comparative
simulations suggest conservation of a pattern of inter-domain dynamics across a family of
protein folds. In the GluRs, these dynamic changes are exploited in the mechanism of receptor
activation; in the periplasmic binding proteins they play a role in ligand transport across the
bacterial cell membrane.
Figure 3: The folds of GluR2 and LAOBP compared (the bound ligands are shown in red). The
diagram on the right shows the principal motions (shown as blue/green cones) corresponding to
the first eigenvector derived from analysis of a 20 ns simulation of LAOBP in the absence of
bound ligand. Hinge-bending is evident.
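The "first eigenvector" referred to in Fig. 3 comes from a principal component analysis of the atomic fluctuations in a trajectory. A minimal sketch of that calculation is given below, assuming the frames have already been superimposed to remove overall rotation and translation. The function name and the synthetic trajectory (one dominant collective motion plus small random noise) are illustrative assumptions, not the authors' analysis code or data.

```python
import numpy as np

def principal_motions(coords):
    """Principal component analysis of a trajectory.

    coords: array of shape (n_frames, n_atoms, 3), assumed already superimposed.
    Returns eigenvalues in descending order and the matching eigenvectors (as columns).
    """
    n_frames = coords.shape[0]
    flat = coords.reshape(n_frames, -1)          # (n_frames, 3 * n_atoms)
    fluct = flat - flat.mean(axis=0)             # fluctuations about the mean structure
    cov = fluct.T @ fluct / (n_frames - 1)       # covariance matrix of atomic positions
    evals, evecs = np.linalg.eigh(cov)           # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

# Synthetic trajectory: one large-amplitude collective motion plus small noise.
rng = np.random.default_rng(0)
n_frames, n_atoms = 500, 10
hinge = rng.normal(size=n_atoms * 3)
hinge /= np.linalg.norm(hinge)                   # unit vector for the collective motion
amplitude = 2.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, n_frames))
noise = 0.1 * rng.normal(size=(n_frames, n_atoms * 3))
coords = (amplitude[:, None] * hinge[None, :] + noise).reshape(n_frames, n_atoms, 3)

evals, evecs = principal_motions(coords)
overlap = abs(evecs[:, 0] @ hinge)               # sign of an eigenvector is arbitrary
print(f"overlap of first eigenvector with the imposed motion: {overlap:.3f}")
```

For a real protein, the per-atom components of the first eigenvector are what would be drawn as the cones in a figure such as Fig. 3.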
The computational requirements for such simulations are non-trivial. The example given
corresponds to a family of relatively small (~250 residue) proteins. The simulation system size,
once sufficient water molecules are included, is ~50,000 atoms. A meaningful comparative
study would require a minimum of 4 simulations (e.g. with and without bound ligands, each
starting from an open and a closed conformation) for each of e.g. 5 proteins within a family.
Each simulation would need to be run for a minimum of 20 ns. A typical simulation cost for
this size of system is ~14 cpu days/ns (on a Pentium III using the GROMACS code,
www.gromacs.org). Thus, a wide-ranging comparative study, encompassing perhaps 100
different (small) folds would require ~560,000 cpu days. Whilst not requiring the largest HPC
resources such a study therefore would need a substantial allocation of high-end capacity time.
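The estimate above follows from multiplying the quantities quoted in the text. The short sketch below reproduces that arithmetic; the function and variable names are our own, and the figures are simply those given in the paragraph.

```python
# Figures quoted in the text for a ~50,000 atom system.
CPU_DAYS_PER_NS = 14        # Pentium III, GROMACS
NS_PER_SIMULATION = 20      # minimum simulation length
SIMS_PER_PROTEIN = 4        # +/- ligand, open and closed starting conformations
PROTEINS_PER_FAMILY = 5
N_FOLDS = 100               # number of (small) folds in a wide-ranging study

def cpu_days(n_folds, proteins_per_family, sims_per_protein, ns_per_sim, cpu_days_per_ns):
    """Total cost in cpu days of a comparative simulation study."""
    return n_folds * proteins_per_family * sims_per_protein * ns_per_sim * cpu_days_per_ns

per_family = cpu_days(1, PROTEINS_PER_FAMILY, SIMS_PER_PROTEIN,
                      NS_PER_SIMULATION, CPU_DAYS_PER_NS)
total = cpu_days(N_FOLDS, PROTEINS_PER_FAMILY, SIMS_PER_PROTEIN,
                 NS_PER_SIMULATION, CPU_DAYS_PER_NS)
print(f"per family: {per_family} cpu days; {N_FOLDS} folds: {total} cpu days")
```

Each of the 2,000 simulations is independent of the others, which is why this workload suits capacity rather than capability computing.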
5. Greater complexity
So far we have restricted our attention to single proteins and to relatively small systems.
However, from a biophysical and biological perspective, there is much interest in large
multi-subunit proteins such as molecular machines, and in complex multi-component systems, such
as cell membranes. In the context of the former, a number of investigators [27, 28] have started
to use HPC to simulate the dynamic properties of relatively simple molecular machines such as
the F-ATPase. These simulations, of just part of the machine, contain ~100,000 atoms and thus
may start to raise considerations of scalability of simulation codes on large numbers of
processors (see below).
In addition to large single systems, it is important to apply biomolecular simulations to
complex, multi-component systems. This is a step towards computational systems biology at a
molecular/sub-cellular level. The aim is to provide a rigorous description of how the
conformational dynamics of the individual components contribute to the emergent properties of
a more complex biological system. We are in the early stages of such studies, but we can make a
preliminary estimate of their computational complexity, and the HPC infrastructure that will be
required. A first step towards such studies is to select a suitable test system. For simulations of
complex membranes, this is provided by the outer membrane of Gram negative bacteria such as
E. coli. The structures of several bacterial outer membrane proteins are known, and it is
possible to generate plausible homology models of other members of this family of proteins.
Also, the outer membrane is complex, but not as complex as e.g. a mammalian nerve cell
membrane, and so provides a suitable test system for developing a new approach.
The first stage of this approach is to build a library of simulations of the individual
components. This, in terms of computational resource, is similar to the comparative simulation
studies described above. Several research groups [29-33] have embarked upon MD simulations
of bacterial outer membrane proteins. An additional complexity for membrane proteins is that
one needs to perform simulations to explore the effects of environment on protein dynamics, at
least for some well-studied outer membrane proteins. Simulations may be used to explore how
the dynamics of such proteins in the environments used in experimental studies (e.g. crystal or
micelle) compare with the dynamics of the same protein in a cell membrane [34] (see Fig. 4).
Figure 4: A simple outer membrane protein, OmpA (blue)
simulated in a detergent micelle (green). The surrounding water
molecules are omitted for clarity.
The next stage of these investigations is to generate a prototype virtual outer membrane
for E. coli and related bacteria (see Fig. 5) based on a multi-nanosecond atomistic MD
simulation of e.g. a 3x3 array of bacterial outer membrane proteins (this would correspond to
~10^6 atoms). This will be the first time such a complex simulation has been performed, and will
provide a test case for using MD simulations in systems biology approaches in order to bridge
the molecular and cellular levels. In particular, we wish to explore how the long length- and
time-scale properties of the simulated membrane may be analysed in order to develop suitably
parameterised mesoscale methods for simulating even larger subcellular assemblies.
Figure 5: Schematic of a virtual outer membrane, showing some of the bacterial outer membrane
proteins (blue) embedded in a model outer membrane (grey).
The computational needs of these very large scale simulations (10^6 atoms or more) are
substantial. Access to 1000 cpu resources (e.g. HPCx) is essential in order to perform such
simulations, as is suitable scalable MD code (e.g. NAMD - www.ks.uiuc.edu/research/namd/).
Efficient approaches to simulation data analysis and visualisation will also have to be
developed in order to cope with the output of such simulations.
6. Future Directions
Biomolecular simulations will play an increasingly important role in modern structural
and systems biology. In particular, simulations will aid the interpretation of biophysical and
functional experiments at the single molecule level. Simulations on increasingly complex
systems will provide a component for systems biology, helping to link molecular and cellular
descriptions of function.
All of these applications will place considerable demands upon computing infrastructure,
and will benefit from ongoing developments in e-science and high performance computing. In
particular, access to GRID-enabled HPC resources will be the key to effective multi-scale
simulations of complex biological systems. There will also be important roles for aspects not
discussed above, such as computational steering and visualisation. Depending on the particular
biological application, both capability and capacity HPC resources will be needed.
A major technical challenge for the future of biomolecular simulations will be to match
the software and infrastructure to the changing nature of key biological applications. Very
large-scale simulations will require good scalability of codes on large numbers (> 256) of
processors. Complex, multi-scale simulations will require synchronised access to multiple,
heterogeneous GRID-enabled resources.
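To make the scalability requirement concrete, the sketch below uses Amdahl's law, a standard textbook model that is not discussed in this paper, to show how the non-parallel fraction of a code limits the speedup attainable on many processors. The parallel fractions used are hypothetical and are not measurements of any MD code.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Ideal speedup under Amdahl's law for a given parallelisable fraction of the work."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

# Hypothetical parallel fractions, for illustration only.
for fraction in (0.95, 0.99, 0.999):
    speedups = {n: amdahl_speedup(fraction, n) for n in (256, 1000)}
    summary = ", ".join(f"{n} cpus: {s:.0f}x" for n, s in speedups.items())
    print(f"parallel fraction {fraction}: {summary}")
```

Under this simple model, even a code that is 99% parallel gains comparatively little in going from 256 to 1000 processors, which illustrates why code scalability matters as much as raw processor count.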
Acknowledgements
Many thanks to all of our colleagues for their encouragement and advice. Research in MSPS’s
laboratory is funded by grants from the BBSRC, EPSRC, MRC and the Wellcome Trust.
References
[1] Karplus, M. and McCammon, J.A. (2002) Nature Struct. Biol., 9, 646-652.
[2] Capener, C.E., Kim, H.J., Arinaminpathy, Y. and Sansom, M.S.P. (2002) Human Molec.
Genet., 11, 2425-2433.
[3] Capener, C.E., Proks, P., Ashcroft, F.M. and Sansom, M.S.P. (2003) Biophys. J., 84,
2345-2356.
[4] Domene, C., Haider, S. and Sansom, M.S.P. (2003) Curr. Opin. Drug Discov. Develop., (in
press).
[5] Doyle, D.A., Cabral, J.M., Pfuetzner, R.A., Kuo, A., Gulbis, J.M., Cohen, S.L., Chait, B.T.
and MacKinnon, R. (1998) Science, 280, 69-77.
[6] Zhou, Y., Morais-Cabral, J.H., Kaufman, A. and MacKinnon, R. (2001) Nature, 414, 43-48.
[7] Guidoni, L., Torre, V. and Carloni, P. (1999) Biochem., 38, 8599-8604.
[8] Shrivastava, I.H. and Sansom, M.S.P. (2000) Biophys. J., 78, 557-570.
[9] Bernèche, S. and Roux, B. (2000) Biophys. J., 78, 2900-2917.
[10] Bernèche, S. and Roux, B. (2001) Nature, 414, 73-77.
[11] Domene, C., Bond, P. and Sansom, M.S.P. (2003) Adv. Prot. Chem., (in press).
[12] Sansom, M.S.P., Shrivastava, I.H., Bright, J.N., Tate, J., Capener, C.E. and Biggin, P.C.
(2002) Biochim. Biophys. Acta, 1565, 294-307.
[13] Åqvist, J. and Luzhkov, V. (2000) Nature, 404, 881-884.
[14] Shrivastava, I.H., Tieleman, D.P., Biggin, P.C. and Sansom, M.S.P. (2002) Biophys. J., 83,
633-645.
[15] Domene, C. and Sansom, M.S.P. (2003) Biophys. J., (in press), ms. 2002/018044.
[16] Tieleman, D.P., Biggin, P.C., Smith, G.R. and Sansom, M.S.P. (2001) Quart. Rev.
Biophys., 34, 473-561.
[17] Guidoni, L. and Carloni, P. (2002) Biochim. Biophys. Acta, 1563, 1-6.
[18] de Groot, B.L. and Grubmuller, H. (2001) Science, 294, 2353-2357.
[19] Tajkhorshid, E., Nollert, P., Jensen, M.O., Miercke, L.J.W., O'Connell, J., Stroud, R.M.
and Schulten, K. (2002) Science, 296, 525-530.
[20] Beckstein, O. and Sansom, M.S.P. (2003) Proc. Nat. Acad. Sci. USA, 100, 7063-7068.
[21] Bourne, P.E. and Weissig, H. (2003) Structural Bioinformatics, Wiley-Liss, Hoboken.
[22] Armstrong, N., Sun, Y., Chen, G.-Q. and Gouaux, E. (1998) Nature, 395, 913 - 917.
[23] Mayer, M.L., Olson, R. and Gouaux, E. (2001) J. Mol. Biol., 311, 815-836.
[24] Gouaux, E. and Furukawa, H. (2003) EMBO J., 22, 2873-2875.
[25] Arinaminpathy, Y., Sansom, M.S.P. and Biggin, P.C. (2002) Biophys. J., 82, 676-683.
[26] Pang, A., Arinaminpathy, Y., Sansom, M.S.P. and Biggin, P.C. (2003) FEBS Lett., (in
press).
[27] Bockmann, R.A. and Grubmuller, H. (2002) Nature Struct. Biol., 9, 198-202.
[28] Dittrich, M., Hayashi, S. and Schulten, K. (2003) Biophys. J., (in press).
[29] Tieleman, D.P. and Berendsen, H.J.C. (1998) Biophys. J., 74, 2786-2801.
[30] Im, W. and Roux, B. (2002) J. Mol. Biol., 319, 1177-1197.
[31] Bond, P., Faraldo-Gómez, J. and Sansom, M.S.P. (2002) Biophys. J., 83, 763-775.
[32] Baaden, M., Meier, C. and Sansom, M.S.P. (2003) J. Mol. Biol., 331, 177-189.
[33] Faraldo-Gómez, J., Smith, G.R. and Sansom, M.S.P. (2003) Biophys. J., (in press), ms.
2002/017228.
[34] Bond, P. and Sansom, M.S.P. (2003) J. Mol. Biol., 329, 1035-1053.