Constructing Chained Molecular Dynamics Simulations of HIV-1 Protease Using the Application Hosting Environment

P. V. Coveney, S. K. Sadiq, R. S. Saksena, and S. J. Zasada

Centre for Computational Science, Department of Chemistry, University College London, Christopher Ingold Laboratories, 20 Gordon Street, London, WC1H 0AJ

Abstract

Many crystal structures of HIV-1 protease exist, but the number of clinically interesting drug resistant mutational patterns is far larger than the number of available crystal structures. Mutational protocols convert one protease sequence with an available crystal structure into another that diverges by a small number of mutations. It is important that such mutational algorithms are followed by suitable multi-step equilibration protocols, e.g. using chained molecular dynamics simulations, to ensure that the desired mutant structure is an accurate representation. Previously these have been difficult to perform on computational grids due to the need to keep track of large numbers of simulations. Here we present a simple way to construct a chained MD simulation using the Application Hosting Environment.

I Introduction

Computational grids [1, 2] provide an ideal environment to perform compute intensive tasks such as molecular dynamics simulations, but many scientists have been deterred from using them due to the perceived difficulty of using the grid middleware [3]. The Application Hosting Environment (AHE) [4] is a lightweight, WSRF [5] compliant middleware system designed to allow a scientist to easily run applications on remote grid resources. We have successfully used it to host the NAMD [6] molecular dynamics code, and run jobs on both the UK National Grid Service and the US TeraGrid. Here we present a case study using the AHE to construct chained application workflows in an investigation into the HIV-1 protease.

II Molecular Dynamics of HIV-1 Protease

Our case study is on the use of the AHE to manage molecular dynamics simulations of the HIV-1 protease.
The protease encoded by HIV is responsible for the cleavage of viral polyprotein precursors and subsequent maturation of the virus. The protease is a symmetric dimer (each monomer has 99 amino acids) that encloses a pair of catalytic aspartic acid residues in the active site. The active site is bound by a pair of highly flexible flaps that allow the substrate access to the aspartic acid dyad [9, 10]. The enzyme has been a key target for antiretroviral inhibitors and an example of structure assisted drug design [11]. Unfortunately, therapy is limited by the emergence and proliferation of drug resistant mutations in various enzymes of HIV [12]. HIV-1 protease also exhibits tolerance to a significant number of non-drug resistant mutations as part of its natural variability [13]. Comparisons of resolved crystal structures of HIV-1 protease support the stability of the tertiary structure to many mutations [14].

Although many such crystal structures of HIV-1 protease exist, the scope and extent of both clinically interesting drug resistant mutational patterns [15] and non-drug resistant mutations is far larger than the set of structures available from crystallographic methods. It is therefore necessary when modeling HIV-1 protease mutants to employ mutational protocols that convert one protease sequence with an available crystal structure into another that diverges by a small number of mutations, but which has no crystal structure. It is also important that such mutational algorithms are followed by suitable multi-step equilibration protocols to ensure that the desired mutant structure is an accurate representation of the actual structure. Whilst standard protocols exist that employ several steps, including gentle annealing to physiologically relevant temperatures, removal of force constraints on the protease and establishing a relevant thermodynamic ensemble [10], more extensive protocols are required to cope with the implementation of divergent mutations from a crystal structure.
Here we present an equilibration protocol composed of a chained sequence of molecular dynamics simulations that implements standard protocol requirements as well as including steps that allow for conformational sampling and relaxation of the incorporated mutations within the framework of their surrounding protease structure. Furthermore, we show how use of the AHE both automates such a chained protocol and facilitates deployment of such simulations across distributed grid resources.

Eq Step  Procedure               Sim Duration (ps)  Force Constant (kcal/mol)  Constrained Atoms
eq0      minimization            2000 iterations    1                          A
eq1      annealing: 50K - 100K   10                 1                          A
eq2      annealing: 100K - 300K  20                 1                          A
eq3*     NVT**                   200                1                          A
eq4      NVT                     50                 0.8                        A
eq5      NVT                     50                 0.6                        A
eq6      NVT                     50                 0.4                        A
eq7      NVT                     50                 0.2                        A
eq8      NVT                     50                 0.2                        B
eq9      NVT                     50                 0.2                        C
eq10     NVT                     470                0                          -
eq11     NPT***                  1000               0                          -

A = all non-hydrogen protease atoms
B = class 'A' except atoms of all amino acids within 5 Å of and including N25D mutations
C = class 'A' except atoms of all amino acids within 5 Å of and including I84V mutations
* This step prevents premature flap collapse [7]
** NVT ensemble temperature maintained using Langevin thermostat with coupling coefficient of 5 /ps
*** NPT ensemble maintained using Berendsen barostat [8] at 1 bar and with pressure coupling of 0.1 ps

Table 1: Equilibration protocol for molecular dynamics simulation of HIV-1 protease incorporating relaxation of mutated amino acid residues.

III The Application Hosting Environment

The Application Hosting Environment (AHE) is a lightweight, WSRF [5] compliant, web services based environment for hosting unmodified scientific applications on the grid. The AHE is designed to allow scientists to quickly and easily run unmodified, legacy applications on grid resources, manage the transfer of files to and from the grid resource and monitor the status of the application.
The philosophy of the AHE is based on the fact that very often a group of researchers will all want to access the same application, but not all of them will possess the skill or inclination to install the application on a remote grid resource. In the AHE, an expert user installs the application and configures the AHE server, so that all participating users can share the same application. For a discussion of the architecture and implementation of the AHE, see [4].

The AHE provides users with both GUI and command line clients to interact with a hosted application. To run an application using the AHE command line clients, the user must first issue the ahe-listapps command to find the end point of the application factory of the application she wants to run. Next she issues the ahe-prepare command to create a new WS-Resource to manage the state of her application instance. Finally she issues the ahe-start command, which will parse her application configuration file to find any input files that need to be staged to the remote grid resource, stage the necessary files, and launch the application. The user can then use the ahe-monitor command to check on the progress of her application and, once complete, the ahe-getoutput command to stage the output files back to her local machine. By calling these simple commands from a shell or Perl script the user is able to create complex application workflows, starting one application execution using the output files from a previous run.

IV Implementation

The 1TSU crystal structure was used as the starting point for the molecular dynamics equilibration protocol. This structure contains inactive wildtype protease complexed to a substrate. VMD [16] was used for the initial preparation of the system prior to simulation. The coordinates of the substrate were removed from the structure, all missing hydrogen atoms were inserted and the structure was solvated and neutralized.
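
The client sequence described in Section III lends itself to a short driver loop. The following is a minimal sketch in Python of how such a loop might look; it is not the Perl script used in this work. Only the client names (ahe-prepare, ahe-start, ahe-monitor, ahe-getoutput) and the step names eq0 to eq11 are taken from the text. The arguments passed to each client, the configuration file naming scheme, the "done" status text, and the names shell_runner and run_chain are hypothetical placeholders, and the initial ahe-listapps lookup is omitted.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# Step names follow Table 1 (eq0 .. eq11).
STEPS: List[str] = [f"eq{i}" for i in range(12)]


def shell_runner(argv: Sequence[str]) -> str:
    """Run one command line client and return its standard output."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def run_chain(
    steps: Sequence[str],
    run: Callable[[Sequence[str]], str] = shell_runner,
    poll_seconds: float = 60.0,
    done_marker: str = "done",  # ASSUMPTION: placeholder status text
) -> List[str]:
    """Execute the steps one after another and return those completed."""
    completed: List[str] = []
    for step in steps:
        config = f"{step}.conf"  # ASSUMPTION: hypothetical naming convention
        # Client names come from the text; every argument is a placeholder.
        run(["ahe-prepare", step])
        run(["ahe-start", config])
        # Poll at regular intervals until the step reports completion.
        while done_marker not in run(["ahe-monitor", step]).lower():
            time.sleep(poll_seconds)
        # Stage the output back so that it can seed the next step.
        run(["ahe-getoutput", step])
        completed.append(step)
    return completed
```

A caller would replace the placeholder argument lists with the options of the installed clients and then invoke run_chain(STEPS). Passing a stub function in place of shell_runner allows the control flow to be checked without access to a grid resource.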
The N25D mutation was incorporated to restore catalytic activity to the protease, and the I84V mutation was incorporated because it is a primary drug resistant mutation for several inhibitors. The molecular dynamics package NAMD2 [6] was used for all equilibration simulations. The equilibration protocol was adapted from Perryman et al. [10] with several important modifications and is presented in Table 1.

NAMD configuration files corresponding to each step of the equilibration protocol were set up such that the output files of each step would serve as the input files of the next step. The files were generated automatically using a Perl script designed to set up such systems, and a naming convention was used for the NAMD configuration file at each step of the equilibration protocol to ease scripting of the workflow.

A Perl script was created to execute the desired equilibration chain on a remote grid resource, using the AHE middleware to manage the state of each of the steps in the chain. The script executed the ahe-prepare command followed by the ahe-start command sequentially for each step of the equilibration protocol. This had the effect of preparing a WS-Resource to manage the step, staging input files necessary for the step, and executing the application. The script then polled the AHE server at regular intervals using the ahe-monitor command until the simulation step had completed. Once complete, the script staged the files back to the local machine and used them to initiate the next step of the equilibration protocol. The script terminated after sequentially executing all desired steps in the chained protocol.

V Results & Conclusion

Analysis of the root-mean-square deviation (RMSD) of the backbone atoms of HIV-1 protease was performed (Figure 1 (a)). A slow relaxation of the backbone occurs across the first 0.4 ns of equilibration due to the gradual reduction of the force constant from 1 kcal/mol to 0.2 kcal/mol. The RMSD plateau of 0.5 Å between 0.4 ns and 0.6 ns is a signature of that part of the equilibration protocol where the force constant is maintained at 0.2 kcal/mol for most of the protease and stepwise relaxation of the mutated amino acid positions and local environment is allowed. As soon as the force constant is removed there is a rapid rise in the RMSD to approximately 1 Å away from the X-ray structure as all amino acid positions along the backbone move towards optimal conformations. The mean RMSD across the last 1 ns of equilibration is 1.11±0.11 Å and describes equilibration of the protease at a relatively low distance from the initial X-ray structure with small fluctuations.

The RMSD of the backbone and sidechain atoms (excluding hydrogen) of the mutated amino acids was also calculated (Figure 1 (b)). The positions of both residues change relatively little during the period in which their force constants are set to zero. Once the whole protease is free from constraints, residue 25 undergoes an abrupt change in RMSD to approximately 1 Å whilst residue 84 changes more gradually to the same value. This may be due to the fact that although both sets of residues are in the active site, the D25 dyad is more exposed to water than V84 and thus more prone to moving once constraints have been lifted. The mean RMSD during the last 1 ns of simulation is 1.02±0.15 Å and 0.92±0.10 Å for D25 and V84 residues respectively, which is smaller than the backbone RMSD of the whole protease.

Figure 1: Root-mean-square deviation (RMSD) of protease amino acid backbone atoms excluding hydrogen, with respect to the initial X-ray structure (a) and of the dimeric pair of mutated amino acid atoms excluding hydrogen, with respect to the initial X-ray structure (b).

The simulation has shown that in this case, the change in RMSD of mutated amino acids is similar to that of the protease backbone as a whole. Whilst this is indicative of a good initial mutational protocol, such as that used in VMD, differences in the RMSD of mutated residues during minimization and force relaxation show that an equilibration protocol that allows conformational change of mutated amino acids assists in the achievement of equilibration. Furthermore, as a significant degree of simulation using a multi-step protocol is necessary to achieve equilibration, the ability to automate implementation of such a protocol using the AHE is greatly beneficial when considering the need to do such equilibrations for a large number of protease mutations.

We have also shown that due to the flexible nature of the AHE, a complex workflow can be orchestrated by scripting the AHE command line clients; in this case we have conducted a chained molecular dynamics simulation using fewer than forty lines of Perl code.

References

[1] P. V. Coveney, editor. Scientific Grid Computing. Phil. Trans. R. Soc. A, 2005.
[2] I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the grid: Enabling scalable virtual organizations. Intl J. Supercomputer Applications, 15:3–23, 2001.
[3] J. Chin and P. V. Coveney. Towards tractable toolkits for the grid: a plea for lightweight, useable middleware. Technical report, UK e-Science Technical Report UKeS-2004-01, 2004. http://nesc.ac.uk/technical_papers/UKeS-2004-01.pdf.
[4] P. V. Coveney, S. K. Sadiq, R. S. Saksena, M. Thyveetil, S. J. Zasada, M. Mc Keown, and S. Pickles. A lightweight application hosting environment for grid computing. 5th UK e-Science All Hands Meeting, 2006.
[5] S. Graham, A. Karmarkar, J. Mischkinsky, I. Robinson, and I. Sedukin. Web Services Resource Framework. Technical report, OASIS Technical Report, 2006. http://docs.oasis-open.org/wsrf/wsrf-ws_resource-1.2-spec-os.pdf.
[6] L. Kale, R. Skeel, M. Bhandarkar, R. Brunner, A. Gursoy, N. Krawetz, J. Phillips, A. Shinozaki, K. Varadarajan, and K. Schulten. NAMD2: Greater scalability for parallel molecular dynamics. J. Comp. Phys., 151:283–312, 1999.
[7] K. L. Meagher and H. A. Carlson. Solvation Influences Flap Collapse in HIV-1 Protease. Proteins: Struct. Funct. Bioinf., 58:119–125, 2005.
[8] H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, A. DiNola, and J. R. Haak. Molecular dynamics with coupling to an external bath. J. Chem. Phys., 81:3684–3690, 1984.
[9] W. R. P. Scott and C. A. Schiffer. Curling of Flap Tips in HIV-1 Protease as a Mechanism for Substrate Entry and Tolerance of Drug Resistance. Structure, 8:1259–1265, 2000.
[10] A. L. Perryman, J. Lin, and J. A. McCammon. HIV-1 protease molecular dynamics of a wild-type and of the V82F/I84V mutant: Possible contributions to drug resistance and a potential new target site for drugs. Protein Sci., 13:1108–1123, 2004.
[11] A. Wlodawer and J. Vondrasek. Inhibitors of HIV-1 Protease: A Major Success of Structure-Assisted Drug Design. Annu. Rev. Biophys. Biomol. Struct., 27:249–284, 1998.
[12] V. A. Johnson, F. Brun-Vezinet, B. Clotet, B. Conway, D. R. Kuritzkes, D. Pillay, J. Schapiro, A. Telenti, and D. Richman. Update of the Drug Resistance Mutations in HIV-1: 2005. Int. AIDS Soc. - USA, 13:51–57, 2005.
[13] N. G. Hoffman, C. A. Schiffer, and R. Swanstrom. Covariation of amino acid positions in HIV-1 protease. Virology, 314:536–548, 2003.
[14] V. Zoete, O. Michielin, and M. Karplus. Relation between Sequence and Structure of HIV-1 Protease Inhibitor Complexes: A Model System for the Analysis of Protein Flexibility. J. Mol. Biol., 315:21–52, 2002.
[15] T. D. Wu, C. A. Schiffer, M. Gonzales, J. Taylor, R. Kantor, S. Chou, D. Israelski, A. R. Zolopa, W. J. Fessel, and R. W. Shafer. Mutation Patterns and Structural Correlates in Human Immunodeficiency Virus Type 1 Protease following Different Protease Inhibitor Treatments. J. Virol., 77:4836–4847, 2003.
[16] W. Humphrey, A. Dalke, and K. Schulten. VMD - Visual Molecular Dynamics. J. Mol. Graph., 14:33–38, 1996.