LARGE SCALE BIOMOLECULAR SIMULATIONS: CURRENT STATUS AND FUTURE PROSPECTS

Yalini Arinaminpathy, Oliver Beckstein, Philip C. Biggin, Peter J. Bond, Carmen Domene, Andrew Pang and Mark S.P. Sansom*

Department of Biochemistry, University of Oxford, OX1 3QU, U.K.

*to whom correspondence should be addressed: mark.sansom@biop.ox.ac.uk

Keywords: molecular dynamics; protein; membrane; ion channel; hpc; grid; capability computing; capacity computing; computational biology

Abstract

Large-scale biomolecular simulations form an increasingly important component of a number of areas of biological investigation, including bionanoscience, structural bioinformatics and systems biology. Future trends in biomolecular simulations will emphasise greater depth (more detailed physico-chemical models), greater breadth (comparative simulations across families of biomolecules), and greater complexity (simulations of large, multi-component systems). These classes of simulation will place increasing demands on different aspects of high performance computing, namely capability, capacity and GRID-enabled computing. These developments are explored via examples of simulations from the authors’ laboratory, including ion channels, model nanopores, ligand binding proteins, and bacterial outer membranes.

1. Introduction

Biomolecular simulations enable us to explore the dynamics and energetics of complex biological molecules and systems, starting from e.g. the structure of a protein determined via X-ray diffraction or NMR studies. Most such simulations use molecular dynamics (MD), in which the classical equations of motion of the atoms in a system (interacting with one another via an empirical forcefield) are solved by numerical integration, yielding a trajectory (i.e. a ‘movie’) of the system over a time period of ~10 ns. MD simulations of biomolecules have been in use for ~25 years [1] and have yielded valuable results in a number of areas of macromolecular function, structure and stability.
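The numerical integration at the heart of MD can be illustrated with a minimal sketch. The example below applies the velocity Verlet scheme, one widely used integrator, to a single particle in a one-dimensional harmonic well. The harmonic force, the reduced units and all function names are assumptions chosen for illustration; they do not correspond to any forcefield or code discussed in this paper.

```python
import math


def harmonic_force(x, k=1.0):
    """Force from a harmonic well, F = -k x (a stand-in for a real forcefield)."""
    return -k * x


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation of motion for one particle in 1D.

    Returns the trajectory as a list of (time, position, velocity) tuples.
    """
    trajectory = [(0.0, x, v)]
    f = force(x)
    for step in range(1, n_steps + 1):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((step * dt, x, v))
    return trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the harmonic oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Reduced units: mass = 1, k = 1, so the exact solution is x(t) = cos(t).
traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                       mass=1.0, dt=0.01, n_steps=1000)
t_end, x_end, v_end = traj[-1]
energy_drift = abs(total_energy(x_end, v_end) - total_energy(1.0, 0.0))
print(f"t = {t_end:.2f}, x = {x_end:.4f}, exact = {math.cos(10.0):.4f}")
print(f"energy drift = {energy_drift:.2e}")
```

A production MD code performs the same kind of update for every atom at every step, with forces evaluated from the full forcefield; it is this force evaluation that dominates the computational cost.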
Such simulations are of particular interest in that they enable us to extrapolate from the essentially static X-ray structure of a protein to a more dynamic picture of the protein in its physiological environment. This in turn provides us with enhanced insights into the relationship between protein structure, dynamics and function.

In the early days of MD simulations of biomolecules, very large-scale computational facilities were required to run even short simulations. This limited the application of these methods to a few studies of rather small proteins. More recently biomolecular simulations have benefited from advances in computer technology. Increases in supercomputer capability have enabled us to explore much larger molecules over much longer timescales, thus increasing the biological impact of such studies. Simultaneously, advances in capacity computing (e.g. commodity clusters) have greatly increased the numbers of research groups running biomolecular simulations, and thus expanded the range of proteins and other systems being studied.

As a consequence of these and other advances, MD simulations now provide an important complement to experimental studies of biological macromolecules and systems. Simulations enable us to explore conformational dynamics in systems that are difficult to probe experimentally, such as the behaviour of water within pores of nanoscopic dimensions (see below). Simulations may also be used in modelling studies in order to aid extrapolation from the structure and dynamics of bacterial proteins to the behaviour of their human homologues [2]. Simulations may also help us to understand the effects of mutations (both in vitro and in disease states) on the function of proteins [3].

In this paper, we will discuss the current and future state of biomolecular simulations, with a focus on leading edge applications that need HPC and related high-end resources. This discussion will be illustrated with examples from the authors’ laboratory.
We apologise to our colleagues for the inevitable bias and omissions that this will introduce.

2. Three key directions

There are three key directions in which current MD simulations must develop if they are to address contemporary biomolecular problems. These are: (i) greater depth, i.e. more detailed and realistic physico-chemical models than those offered by current molecular mechanics forcefields; (ii) greater breadth, i.e. increasing the range of simulations, in order to meet the challenges offered by the post-genomic expansion in structural biology; and (iii) greater complexity, i.e. addressing ever more complex biological assemblies and systems via atomistic simulation. All three directions will make great demands on computational facilities. However, the computational needs of these different categories of simulation are not all the same. An appreciation of the differences will be the key to providing an optimal infrastructure for the next wave of computational structural and systems biology.

A further aspect that must not be ignored is the development of improved methods for storage, analysis and archival of biomolecular simulation data. However, this aspect will not be addressed here; the interested reader is referred elsewhere in this volume for a description of the BioSimGRID project (www.biosimgrid.org).

3. Greater depth

As an example of the need for greater physico-chemical complexity we will consider an ion channel. Ion channels are membrane proteins that form pores in cell membranes. Selected ions flow rapidly (~10^7 ions sec^-1) through these pores. Ion channels play important roles in most cells, but especially in cells of the nervous system. The structures of a number of bacterial homologues of mammalian ion channels have been determined by X-ray diffraction (reviewed in [4]). One such channel, a bacterial potassium selective channel KcsA [5, 6], has been the subject of numerous MD simulations [7-10].
MD simulations have addressed a number of aspects of KcsA function, including ion permeation, selectivity and gating (for reviews see [11, 12]). For several of these areas, current MD approaches do provide a suitable methodology, although the biological significance of the results may benefit from longer simulations. However, some aspects of channel function may require more sophisticated (and computationally expensive) approaches. In particular, the question of the ion selectivity of KcsA (i.e. why K+ and Rb+ ions may pass freely through the channel whereas Na+ ions move through very slowly if at all) has pushed conventional MD approaches to their limits.

A number of MD studies have attempted to address the question of the relative stability of K+ vs. Na+ ions within the selectivity filter region of KcsA (e.g. [7, 13-15]). What is evident from these studies is that an accurate treatment of the energetics of this system (an essential prerequisite to an understanding of ion selectivity) requires accurate calculation of the energetics of ion-protein interactions, of ion-water interactions, and of ion-induced protein distortions.

There are some indications that a more detailed physico-chemical model may be necessary for an accurate treatment of ion selectivity in KcsA, and by extension in other ion channels. Firstly, it remains uncertain which is the ‘best’ set of molecular mechanics parameters to use for ion channel simulations [16]. This is an aspect of a more general problem of the transferability of molecular mechanics forcefields. In particular, it is likely that when cations are within the selectivity filter of KcsA there will be a degree of electronic polarisation of the oxygen atoms that surround the ions in the filter. This polarisation will depend upon the ionic species and also on the exact configuration of the system, i.e. the location of the ions.
Some evidence for changes in electronic polarisation has emerged from preliminary density functional calculations of K+ ions in the KcsA filter [17]. However, it is important that such studies are extended to more ion/filter configurations and to different ionic species. A more rigorous treatment would require e.g. CPMD simulations (www.cpmd.org) in order to treat dynamic changes in electronic polarisation during ion movement. Such calculations will require an extended region of the KcsA system to be treated quantum mechanically, thus making rather large computational demands.

Figure 1: A KcsA channel (blue cylinders) in a lipid bilayer. The expanded region shows K+ ions (cyan) and water molecules within the selectivity filter.

A more in-depth approach to simulations of ion channels and related systems presents a major challenge in terms of multiple scale biomolecular simulations (see Fig. 1). The channel protein is embedded in a lipid + water environment. To fully represent the slow (> 5 ns) fluctuations in this environment, MD simulations of systems of at least 50,000 atoms are needed. The environmental fluctuations may be coupled to changes in the conformation of the protein (~3000 atoms, neglecting hydrogens), which in turn may be coupled to changes in atomic positions and electronic polarisation within the selectivity filter (~100 atoms).

As a further example of how in-depth simulations can extend our understanding of the physico-chemical properties of biological systems, let us consider the case of water confined within a pore of nanoscopic dimensions. This is relevant to our understanding of ion channels, and of aquaporins (biological water pores [18, 19]). We have simulated the behaviour of water within model nanopores (see Fig. 2) of radii ranging from 3 to 10 Å [20]. One of the unexpected properties of water in such pores is that at intermediate radii (~5 Å) the water within the pore oscillates between a liquid and a vapour state.
These oscillations are relatively slow and so extended simulations (~50 ns) are needed to capture their behaviour. Indeed, the characterisation of the behaviour of water in this relatively simple system required a total of ~1500 cpu days. To extend such studies to a more complex water model (the current study used a relatively simple three point fixed charge model) clearly will require a considerable increase in computational power.

Figure 2: A simple model nanopore (blue) embedded in a membrane mimetic slab (gold) with water molecules on either side of the membrane and within the pore.

What are the likely computational requirements of in-depth simulations of ion channels and related pores? To date, MD simulations have been performed mainly on clusters and similar capacity machines. However, large scale ab initio calculations will require access to HPC facilities, and to suitably scalable codes. If such calculations are to be coupled to conventional MD simulations for the remainder of the protein and its environment then GRID-based approaches (such as those in e.g. the RealityGrid project www.realitygrid.org) will be required in order to synchronise the different aspects of the calculations.

4. Greater breadth

It is important that biomolecular simulation studies are responsive to the challenges of a post-genomic era. Much progress in biology is made by comparisons. For example, in the context of protein structures, the ongoing expansion in the number of protein structures being determined (see www.rcsb.org) has led to the development of the discipline of structural bioinformatics [21], based on comparative analysis of protein structures. A great opportunity thus arises to derive general results by applying MD simulations to a wide range of proteins. In particular, by applying MD simulations to different protein folds it will be possible to correlate aspects of protein flexibility with the different protein architectures.
Given the importance of protein flexibility and conformational change in protein function, it is essential that we approach this aspect of biomolecular simulations in a more wide-ranging and systematic fashion than has been possible to date.

As an example of the biological importance of this approach, we will consider two classes of proteins that one might not expect to be related in their conformational dynamics, namely glutamate receptors and bacterial periplasmic binding proteins. Glutamate receptors (GluRs) are complex neurotransmitter-activated ion channels present in the central nervous systems of mammals. X-ray structures of the neurotransmitter (glutamate) binding fragment of two related mammalian GluRs (GluR2 and NR1) and of a bacterial homologue (GluR0) have been determined and shown to have similar structures [22-24]. Structural comparisons revealed that a similar protein fold is found in a functionally unrelated class of proteins, the bacterial periplasmic binding proteins, which include the glutamine-binding protein (GlnBP) and the lysine-arginine-ornithine binding protein (LAOBP). All of these proteins share a common fold, with a ligand-binding site in between two domains (see Fig. 3). From comparison of multiple static X-ray structures it has been suggested that these domains move together upon ligand binding.

Comparative MD simulations have been performed on four of these proteins: GluR2 [25], GluR0 (Arinaminpathy, Sansom and Biggin, unpublished data), GlnBP [26] and LAOBP (Pang, Sansom and Biggin, unpublished data). The results reveal some interesting similarities in the dynamics of the different proteins. In particular a number of the simulations provide evidence for dynamic hinge-bending motions of the protein that enable the two domains to move together/apart. This is illustrated in Fig. 3 for LAOBP. Thus, these comparative simulations suggest conservation of a pattern of inter-domain dynamics across a family of protein folds.
In the GluRs, these dynamic changes are exploited in the mechanism of receptor activation; in the periplasmic binding proteins they play a role in ligand transport across the bacterial cell membrane.

Figure 3: The folds of GluR2 and LAOBP compared (the bound ligands are shown in red). The diagram on the right shows the principal motions (shown as blue/green cones) corresponding to the first eigenvector derived from analysis of a 20 ns simulation of LAOBP in the absence of bound ligand. Hinge-bending is evident.

The computational requirements for such simulations are non-trivial. The example given corresponds to a family of relatively small (~250 residue) proteins. The simulation system size, once sufficient water molecules are included, is ~50,000 atoms. A meaningful comparative study would require a minimum of 4 simulations (e.g. with and without bound ligands, each starting from an open and a closed conformation) for each of e.g. 5 proteins within a family. Each simulation would need to be run for a minimum of 20 ns. A typical simulation cost for this size of system is ~14 cpu days/ns (on a Pentium III using the GROMACS code www.gromacs.org). Thus a single family would require 4 x 5 x 20 ns x 14 cpu days/ns = 5,600 cpu days, and a wide-ranging comparative study, encompassing perhaps 100 different (small) folds, would require ~560,000 cpu days. Whilst not requiring the largest HPC resources, such a study would therefore need a substantial allocation of high-end capacity time.

5. Greater complexity

So far we have restricted our attention to single proteins and to relatively small systems. However, from a biophysical and biological perspective, there is much interest in large multisubunit proteins such as molecular machines, and in complex multi-component systems, such as cell membranes. In the context of the former, a number of investigators [27, 28] have started to use HPC to simulate the dynamic properties of relatively simple molecular machines such as the F-ATPase.
These simulations, of just part of the machine, contain ~100,000 atoms and thus may start to raise considerations of scalability of simulation codes on large numbers of processors (see below).

In addition to large single systems, it is important to apply biomolecular simulations to complex, multi-component systems. This is a step towards computational systems biology at a molecular/sub-cellular level. The aim is to provide a rigorous description of how the conformational dynamics of the individual components contribute to the emergent properties of a more complex biological system. We are in the early stages of such studies, but we can make a preliminary estimate of their computational complexity, and of the HPC infrastructure that will be required.

A first step towards such studies is to select a suitable test system. For simulations of complex membranes, this is provided by the outer membrane of Gram negative bacteria such as E. coli. The structures of several bacterial outer membrane proteins are known, and it is possible to generate plausible homology models of other members of this family of proteins. Also, the outer membrane is complex, but not as complex as e.g. a mammalian nerve cell membrane, and so provides a suitable test system for developing a new approach.

The first stage of this approach is to build a library of simulations of the individual components. This, in terms of computational resource, is similar to the comparative simulation studies described above. Several research groups [29-33] have embarked upon MD simulations of bacterial outer membrane proteins. An additional complexity for membrane proteins is that one needs to perform simulations to explore the effects of environment on protein dynamics, at least for some well-studied outer membrane proteins. Simulations may be used to explore how the dynamics of such proteins in the environments used in experimental studies (e.g.
crystal or micelle) compare with the dynamics of the same protein in a cell membrane [34] (see Fig. 4).

Figure 4: A simple outer membrane protein, OmpA (blue) simulated in a detergent micelle (green). The surrounding water molecules are omitted for clarity.

The next stage of these investigations is to generate a prototype virtual outer membrane for E. coli and related bacteria (see Fig. 5) based on a multi-nanosecond atomistic MD simulation of e.g. a 3x3 array of bacterial outer membrane proteins (this would correspond to ~10^6 atoms). This will be the first time such a complex simulation has been performed, and will provide a test case for using MD simulations in systems biology approaches in order to bridge the molecular and cellular levels. In particular, we wish to explore how the long length- and time-scale properties of the simulated membrane may be analysed in order to develop suitably parameterised mesoscale methods for simulating even larger subcellular assemblies.

Figure 5: Schematic of a virtual outer membrane, showing some of the bacterial outer membrane proteins (blue) embedded in a model outer membrane (grey).

The computational needs of these very large scale simulations (10^6 atoms or more) are substantial. Access to 1000-cpu resources (e.g. HPCx) is essential in order to perform such simulations, as is suitable scalable MD code (e.g. NAMD - www.ks.uiuc.edu/research/namd/). Efficient approaches to simulation data analysis and visualisation will also have to be developed in order to cope with the output of such simulations.

6. Future Directions

Biomolecular simulations will play an increasingly important role in modern structural and systems biology. In particular, simulations will aid the interpretation of biophysical and functional experiments at the single molecule level. Simulations on increasingly complex systems will provide a component for systems biology, helping to link molecular and cellular descriptions of function.
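To give a feel for the scale of resource involved, the capacity estimate from Section 4 can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted there (4 simulations per protein, 5 proteins per family, 20 ns per simulation, ~14 cpu days/ns, 100 folds). The function names are illustrative, and the 256-processor cluster with perfect throughput is a purely hypothetical assumption, not a measured benchmark.

```python
def cpu_days(n_proteins, sims_per_protein, ns_per_sim, cpu_days_per_ns):
    """Total cpu days for a set of independent MD simulations."""
    return n_proteins * sims_per_protein * ns_per_sim * cpu_days_per_ns


def wall_clock_days(total_cpu_days, n_cpus):
    """Elapsed days if independent runs keep n_cpus fully busy.

    Assumes ideal throughput (no queueing or scaling losses), which is
    optimistic and is used here only to indicate orders of magnitude.
    """
    return total_cpu_days / n_cpus


# Figures quoted in Section 4.
per_family = cpu_days(n_proteins=5, sims_per_protein=4,
                      ns_per_sim=20, cpu_days_per_ns=14)
n_folds = 100
total = n_folds * per_family

print(per_family)                   # 5600 cpu days for one family
print(total)                        # 560000 cpu days for 100 folds
print(wall_clock_days(total, 256))  # 2187.5 days on a hypothetical 256-cpu cluster
```

Because the individual simulations are independent of one another, this workload maps naturally onto capacity resources, in contrast to the single ~10^6 atom simulation described in Section 5, which needs a scalable code on a capability machine.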
All of these applications will place considerable demands upon computing infrastructure, and will benefit from ongoing developments in e-science and high performance computing. In particular, access to GRID-enabled HPC resources will be the key to effective multi-scale simulations of complex biological systems. There will also be important roles for aspects not discussed above, such as computational steering and visualisation. Depending on the particular biological application, both capability and capacity HPC resources will be needed.

A major technical challenge for the future of biomolecular simulations will be to match the software and infrastructure to the changing nature of key biological applications. Very large-scale simulations will require good scalability of codes on large numbers (> 256) of processors. Complex, multi-scale simulations will require synchronised access to multiple, heterogeneous GRID-enabled resources.

Acknowledgements

Many thanks to all of our colleagues for their encouragement and advice. Research in MSPS’s laboratory is funded by grants from the BBSRC, EPSRC, MRC and the Wellcome Trust.

References

[1] Karplus, M.J. and McCammon, J.A. (2002) Nature Struct. Biol., 9, 646-652.
[2] Capener, C.E., Kim, H.J., Arinaminpathy, Y. and Sansom, M.S.P. (2002) Human Molec. Genet., 11, 2425-2433.
[3] Capener, C.E., Proks, P., Ashcroft, F.M. and Sansom, M.S.P. (2003) Biophys. J., 84, 2345-2356.
[4] Domene, C., Haider, S. and Sansom, M.S.P. (2003) Curr. Opin. Drug Discov. Develop., (in press).
[5] Doyle, D.A., Cabral, J.M., Pfuetzner, R.A., Kuo, A., Gulbis, J.M., Cohen, S.L., Chait, B.T. and MacKinnon, R. (1998) Science, 280, 69-77.
[6] Zhou, Y., Morais-Cabral, J.H., Kaufman, A. and MacKinnon, R. (2001) Nature, 414, 43-48.
[7] Guidoni, L., Torre, V. and Carloni, P. (1999) Biochem., 38, 8599-8604.
[8] Shrivastava, I.H. and Sansom, M.S.P. (2000) Biophys. J., 78, 557-570.
[9] Bernèche, S. and Roux, B. (2000) Biophys. J., 78, 2900-2917.
[10] Bernèche, S. and Roux, B. (2001) Nature, 414, 73-77.
[11] Domene, C., Bond, P. and Sansom, M.S.P. (2003) Adv. Prot. Chem., (in press).
[12] Sansom, M.S.P., Shrivastava, I.H., Bright, J.N., Tate, J., Capener, C.E. and Biggin, P.C. (2002) Biochim. Biophys. Acta, 1565, 294-307.
[13] Åqvist, J. and Luzhkov, V. (2000) Nature, 404, 881-884.
[14] Shrivastava, I.H., Tieleman, D.P., Biggin, P.C. and Sansom, M.S.P. (2002) Biophys. J., 83, 633-645.
[15] Domene, C. and Sansom, M.S.P. (2003) Biophys. J., (in press), ms. 2002/018044.
[16] Tieleman, D.P., Biggin, P.C., Smith, G.R. and Sansom, M.S.P. (2001) Quart. Rev. Biophys., 34, 473-561.
[17] Guidoni, L. and Carloni, P. (2002) Biochim. Biophys. Acta, 1563, 1-6.
[18] de Groot, B.L. and Grubmuller, H. (2001) Science, 294, 2353-2357.
[19] Tajkhorshid, E., Nollert, P., Jensen, M.O., Miercke, L.J.W., O'Connell, J., Stroud, R.M. and Schulten, K. (2002) Science, 296, 525-530.
[20] Beckstein, O. and Sansom, M.S.P. (2003) Proc. Nat. Acad. Sci. USA, 100, 7063-7068.
[21] Bourne, P.E. and Weissig, H. (2003) Structural Bioinformatics, Wiley-Liss, Hoboken.
[22] Armstrong, N., Sun, Y., Chen, G.-Q. and Gouaux, E. (1998) Nature, 395, 913-917.
[23] Mayer, M.L., Olson, R. and Gouaux, E. (2001) J. Mol. Biol., 311, 815-836.
[24] Gouaux, E. and Furukawa, H. (2003) EMBO J., 22, 2873-2875.
[25] Arinaminpathy, Y., Sansom, M.S.P. and Biggin, P.C. (2002) Biophys. J., 82, 676-683.
[26] Pang, A., Arinaminpathy, Y., Sansom, M.S.P. and Biggin, P.C. (2003) FEBS Lett., (in press).
[27] Bockmann, R.A. and Grubmuller, H. (2002) Nature Struct. Biol., 9, 198-202.
[28] Dittrich, M., Hayashi, S. and Schulten, K. (2003) Biophys. J., (in press).
[29] Tieleman, D.P. and Berendsen, H.J.C. (1998) Biophys. J., 74, 2786-2801.
[30] Im, W. and Roux, B. (2002) J. Mol. Biol., 319, 1177-1197.
[31] Bond, P., Faraldo-Gómez, J. and Sansom, M.S.P. (2002) Biophys. J., 83, 763-775.
[32] Baaden, M., Meier, C. and Sansom, M.S.P. (2003) J. Mol. Biol., 331, 177-189.
[33] Faraldo-Gómez, J., Smith, G.R. and Sansom, M.S.P. (2003) Biophys. J., (in press), ms. 2002/017228.
[34] Bond, P. and Sansom, M.S.P. (2003) J. Mol. Biol., 329, 1035-1053.
