Applying e-Science to Computational Chemistry

D.S. Coombes(a), B. Butchart(b), C.R.A. Catlow(a), S.A. French(a), H. Nowell(c), S.L. Price(c) and W. Emmerich(b)

(a) Davy Faraday Research Laboratory, The Royal Institution of Great Britain, 21 Albemarle Street, London, W1S 4BS, UK
(b) Dept. of Computing Science, University College London, Gower Street, London, WC1E 6BT, UK
(c) Dept. of Chemistry, University College London, 20 Gordon Street, London, WC1H 0AJ, UK

Abstract
We summarise our work on the EPSRC e-Science project ‘E-Science Technologies in the Simulation of Complex Materials’. The aim of this project has been to apply e-Science/Grid technologies to two important areas of computational chemistry, namely combinatorial catalytic chemistry and crystal structure prediction.

Introduction
The EPSRC-funded e-Science project ‘E-Science Technologies in the Simulation of Complex Materials’ has set far-reaching goals in both application and development work. Emphasis has been placed on improving the computational environment available to researchers and on achieving maximum efficiency from computational hardware, both in-house and within the national Grid network, which will greatly increase the resources available to all institutions. The project brings together researchers specialising in applications, code development and computer science, with the aim of producing designs and innovations that will help standardise practice in modern computational chemistry. We describe two important areas of computational chemistry that have benefited greatly from the tools and techniques developed within the e-Science framework.

Applications

I Combinatorial Chemistry
Combinatorial techniques are important in many areas of chemistry and will play a major role in the development of new materials with specific properties.
One of the most important is catalyst development, where modelling will allow the screening of a large number of zeolites in which the constituents are varied systematically. One application is the optimisation of metal-modified zeolite catalysts for selective oxidation catalysis (e.g. the oxidation of alkenes to their oxides [1]). Here, the aim is to select the metal/framework combination that gives the best activity and selectivity. The ‘e-Science challenge’ to be addressed is to harness large, distributed computational tasks effectively. Applications that require many calculations of similar duration are well suited to a grid-based infrastructure, a prime example being Monte Carlo simulations. We have developed and used a combined Monte Carlo and energy minimisation approach to model zeolitic materials with low and medium Si/Al ratios. First, Al is inserted into a siliceous unit cell and a charge-compensating cation, such as Na, is added between two of the oxygens coordinated to Al. Once the lowest energy positions of Al have been confirmed, the cation is exchanged for a proton and the structure is again energy-minimised, as shown in Figure 1. This method, together with the exploitation of low-specification computational resources, will allow us to construct realistic models of low and medium Si/Al zeolites. Such structures can be used for further simulations and will aid the interpretation of experimental data. All the calculations have been performed on the UCL Condor pool, which consists of approximately 1000 low-specification desktop teaching PCs running Windows 2000. These machines act solely as clients for a Windows Terminal Server, so their processors are largely unused and can be made available for calculations, as shown by the statistics given in Table 1.
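The Monte Carlo part of the Al-siting approach can be illustrated with a minimal Metropolis sketch. It is not the authors' code and rests on loudly stated assumptions: the "framework" is a hypothetical ring of T sites, and `toy_energy` is a hypothetical stand-in that simply penalises Al–O–Al linkages (Löwenstein's rule) instead of the cation placement and lattice energy minimisation performed for each configuration in the real method. Each seed plays the role of one independent Condor job, which is why such calculations farm out so easily.

```python
import math
import random


def ring_neighbours(n_sites):
    """Hypothetical toy framework: T sites on a ring, each linked (via O) to two neighbours."""
    return [((i - 1) % n_sites, (i + 1) % n_sites) for i in range(n_sites)]


def toy_energy(al_sites, neighbours, penalty=10.0):
    """Hypothetical stand-in for a full energy minimisation.

    Adds `penalty` for every Al-O-Al linkage (each Al-Al pair counted once).
    """
    energy = 0.0
    for i in al_sites:
        for j in neighbours[i]:
            if j in al_sites and i < j:
                energy += penalty
    return energy


def metropolis_al_siting(n_sites, n_al, n_steps, kT, seed):
    """Metropolis Monte Carlo over Al placements; returns the lowest-energy configuration seen."""
    rng = random.Random(seed)
    neighbours = ring_neighbours(n_sites)
    al = set(rng.sample(range(n_sites), n_al))
    energy = toy_energy(al, neighbours)
    best_energy, best_al = energy, frozenset(al)
    for _ in range(n_steps):
        # Symmetric proposal: move one randomly chosen Al onto a randomly chosen Si site.
        old = rng.choice(sorted(al))
        new = rng.choice([s for s in range(n_sites) if s not in al])
        trial = (al - {old}) | {new}
        trial_energy = toy_energy(trial, neighbours)
        delta = trial_energy - energy
        # Metropolis acceptance criterion.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            al, energy = trial, trial_energy
            if energy < best_energy:
                best_energy, best_al = energy, frozenset(al)
    return best_energy, sorted(best_al)


def run_independent_chains(seeds, **kwargs):
    """Each seed stands for one independent job that could be sent to a Condor pool."""
    return [metropolis_al_siting(seed=s, **kwargs) for s in seeds]


if __name__ == "__main__":
    for e, sites in run_independent_chains(range(4), n_sites=24, n_al=6, n_steps=2000, kT=1.0):
        print(e, sites)
```

Because the chains share no state, running them sequentially here or as separate pool jobs gives the same results; only the wall-clock time changes.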
Figure 1 An example of Al substitution and proton charge compensation.

Table 1 Condor statistics
Number of nodes          950
Number of simulations    150,000
Number of particles      75,000,000
Total CPU time           150 years

II Crystal Structure Prediction
Crystal structure prediction is of great importance in the development of pigments and dyes, as well as of energetic materials. However, it is of most benefit in the pharmaceutical industry, where a drug can only be marketed in its licensed crystal form. The appearance of new crystalline forms (polymorphs) of a pharmaceutical compound can cause considerable problems during development, scale-up, production and storage. The discovery of a new polymorph is also important for patent protection, where it can prolong a drug company's exclusive right to manufacture a particular drug. Pharmaceutical molecules are generally flexible; the complexity of polymorph prediction [2] is therefore increased by the need not only to search through the huge range of possible crystal packings but also to consider the range of energetically plausible conformers (different shapes of the molecule). Our method for predicting polymorphs involves a number of programs that have traditionally been run sequentially, with manual editing of input and output files. Using this manual method, a polymorph prediction study typically takes several months of work for each flexible molecule. The search for possible crystal structures using crystallographic relationships is implemented in the program MOLPAK [3], which currently searches 13 space groups represented by 29 of the most common packing types. For a flexible molecule it is necessary to perform a thorough conformational analysis to produce a series of energetically plausible conformers. Up to 200 densely packed structures are found for each packing type, and each is input to DMAREL [4] for lattice energy minimisation.
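The numbers above bound the size of the minimisation stage: with 29 packing types and up to 200 structures each, one conformer can generate up to 29 × 200 = 5,800 DMAREL minimisations, each independent of the others. The sketch below only enumerates those tasks; `MinimisationJob` and `enumerate_jobs` are hypothetical names, and packing types are represented by plain indices rather than real MOLPAK labels.

```python
from dataclasses import dataclass
from itertools import product

# Figures quoted in the text for the MOLPAK search.
N_PACKING_TYPES = 29
MAX_STRUCTURES_PER_TYPE = 200


@dataclass(frozen=True)
class MinimisationJob:
    """One hypothetical DMAREL lattice energy minimisation task."""
    conformer: str
    packing_type: int  # index 0..28 standing in for a MOLPAK packing type
    structure: int     # index of the densely packed structure within that type


def enumerate_jobs(conformers, structures_found=None):
    """Build one job per (conformer, packing type, structure).

    `structures_found` maps (conformer, packing_type) to the number of structures
    the search returned; missing entries default to the cap of 200.
    """
    structures_found = structures_found or {}
    jobs = []
    for conformer, ptype in product(conformers, range(N_PACKING_TYPES)):
        n = min(structures_found.get((conformer, ptype), MAX_STRUCTURES_PER_TYPE),
                MAX_STRUCTURES_PER_TYPE)
        jobs.extend(MinimisationJob(conformer, ptype, k) for k in range(n))
    return jobs


if __name__ == "__main__":
    print(len(enumerate_jobs(["conformer_A"])))  # upper bound for a single conformer
```

Since no job depends on another, the list could in principle be dispatched to any pool of workers; that independence is what the web-service workflow exploits.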
The next stage of the process involves removing duplicate crystal structures, as many of those found in the search represent the same minimum. The remaining structures are then ranked by energy. Finally, property calculations are performed to assess which structures are more likely to be observed experimentally. We can calculate properties such as elastic constants and phonon frequencies using DMAREL. Subsequent processes allow us to calculate the morphology (shape) of the hypothetical crystal structures as well as their powder X-ray diffraction patterns.

The ‘e-Science challenge’ to be addressed here involves developing and optimising the workflow so that the MOLPAK and DMAREL codes are linked as a set of loosely coupled web services. The property prediction codes are at present run as a separate process, but could easily be integrated into the same package.

We have investigated the polymorphism of the nootropic drug piracetam, whose molecular structure is shown in Figure 2. Prior to this study there were three known polymorphs of piracetam [5-7], which we refer to as forms I, II and III. The predicted morphologies of these three forms are shown in Figure 3 and their powder patterns in Figure 4.

Figure 2 The piracetam molecule. Arrows indicate flexible torsion angles.
Figure 3 Left to right: predicted morphologies for piracetam forms I, II and III.
Figure 4 Left to right: predicted powder patterns for piracetam forms I, II and III.

We have previously reported that our preliminary calculations suggested the known polymorphs were unlikely to be located by a search using a gas-phase optimised molecular structure. Our ‘interactive searching’ methodology allows us to search a large area of conformational space quickly, thus increasing the chances of finding a different polymorph, which could be a more thermodynamically stable crystal structure than the known one. First, we search for low energy crystal structures using a large number of rigid conformers, to explore systematically which regions of conformational space could give rise to low energy hydrogen-bonded crystal structures. The search is then refined using crystallographic insight to optimise particular intermolecular interactions. Currently a search on one conformation takes about one hour. Using this method we were able to locate forms I, II and III easily. All of these crystal structures contain molecules whose conformations are very different from the gas-phase optimised molecule.

During the course of this work, a new experimental polymorph (form IV) was obtained by recrystallisation and data collection at high pressure [8]. Six computed crystal structures from the low energy region (within 5 kJ mol⁻¹ of the global minimum) were sent to the experimental team, and the lowest energy structure proved to be a good approximation to form IV.

Use of e-Science Tools
In this project we have made extensive use of various e-Science tools. A Condor pool of around 1000 desktop PCs has been used for the calculations on zeolite systems. The ‘interactive search’ system for crystal structure prediction uses a BPEL (Business Process Execution Language) web service system to orchestrate the complex multi-program workflows as a grid application. We are also using the CCLRC dataportal and the SRB (Storage Resource Broker) to store low energy structures and their properties, and we are developing a database to allow data mining of the results.

Conclusions
We have shown how existing programs and processes for both combinatorial chemistry and crystal structure prediction have been grid-enabled. This has made it possible to study important processes in catalytic chemistry that would have been impossible without a Condor pool to carry out the calculations in a reasonable amount of time.
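To put the role of the Condor pool in perspective, the figures in Table 1 (950 nodes, 150,000 simulations, 150 CPU-years) can be turned into a back-of-envelope estimate. The sketch assumes a 365.25-day year and perfect load balancing across every node; a real pool loses time to scheduling and interruptions, so the wall-clock figure is an idealised lower bound rather than a measured value.

```python
def condor_summary(nodes, simulations, cpu_years, days_per_year=365.25):
    """Idealised throughput figures derived from Table 1.

    Assumes every node is busy for the whole run (perfect load balancing).
    """
    total_cpu_hours = cpu_years * days_per_year * 24
    return {
        # Average CPU time consumed by one simulation.
        "cpu_hours_per_simulation": total_cpu_hours / simulations,
        # Wall-clock time if the work were spread evenly over all nodes.
        "ideal_wallclock_days": cpu_years * days_per_year / nodes,
    }


if __name__ == "__main__":
    summary = condor_summary(nodes=950, simulations=150_000, cpu_years=150)
    for key, value in summary.items():
        print(f"{key}: {value:.2f}")
```

With these inputs the average simulation takes about 8.8 CPU hours, and the whole campaign would ideally finish in just under 58 days on 950 nodes, against roughly 150 years on a single PC.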
The crystal structure prediction methodology has also benefited from being grid-enabled, which has reduced the time and manpower required to study a particular molecule. We are currently developing a database of known and calculated crystal structures and properties, which will eventually be data-mined to develop techniques for, and increase our knowledge of, crystal structure prediction.

Acknowledgements
This work was funded by the project ‘E-Science Technologies in the Simulation of Complex Materials’.

References
(1) Notari, B. Microporous crystalline titanium silicates. In Advances in Catalysis; Academic Press: San Diego, 1996; Vol. 41, pp 253-334.
(2) Ouvrard, C.; Price, S. L. Cryst. Growth Des. 2004, 4, 1119-1127.
(3) Holden, J. R.; Du, Z. Y.; Ammon, H. L. J. Comput. Chem. 1993, 14, 422-437.
(4) Willock, D. J.; Price, S. L.; Leslie, M.; Catlow, C. R. A. J. Comput. Chem. 1995, 16, 628-647.
(5) Céolin, R.; Agafonov, V.; Louër, D.; Dzyabchenko, V. A.; Toscani, S.; Cense, J. M. J. Solid State Chem. 1996, 122, 186-194.
(6) Louër, D.; Louër, M.; Dzyabchenko, V. A.; Agafonov, V.; Céolin, R. Acta Crystallogr. Sect. B-Struct. Sci. 1995, 51, 182-187.
(7) Admiraal, G.; Eikelenboom, J. C.; Vos, A. Acta Crystallogr. Sect. B-Struct. Sci. 1982, 38, 2600-2605.
(8) Fabbiani, F. P. A.; Allan, D. R.; Parsons, S.; Pulham, C. R. CrystEngComm 2005, 7, 179-186.