e-Science Technologies in the Simulation of Complex Materials

L. Blanshard, R. Tyer, K. Kleese
S. A. French, D. S. Coombes, C. R. A. Catlow
eMaterials
B. Butchart, W. Emmerich – CS
H. Nowell, S. L. Price – Chem

Polymorphism: prediction of polymorphs. A drug substance may exist as two or more crystalline phases in which the molecules are packed differently.

Combinatorial Computational Catalysis (Acid Sites in Zeolites): exploring which sites are involved in catalysis. Catalysis is used in diverse industries including petroleum, chemicals, polymers, agrochemicals and environmental applications.

Polymorph Prediction
Different crystal structures of the same molecule are called polymorphs. Polymorphs may have considerably different properties (e.g. bioavailability, solubility, morphology). Polymorph prediction is of great importance to the pharmaceutical industry, where the discovery of a new polymorph during production or storage of a drug may be disastrous. Drug molecules are often flexible, which makes polymorph prediction more challenging.

Polymorph Prediction Workflow
For flexible molecules (n = number of conformers):
1. Conformational optimisation gives n feasible rigid molecular probes representing energetically plausible conformers.
2. MOLPAK generates ~6000 densely packed crystal structures from each rigid molecular probe (run n times).
3. DMAREL optimises the lattice energy of each structure, giving the unit cell volume, density and lattice energy.
4. A restricted number of structures is selected for morphology calculations.
5. The crystal structures and their properties are stored in a database.

Blind Test 2004
The challenge: predict the crystal structure of 2-methyl-4,5-dinitro-phenyl-acetamide. The molecule is flexible, and a potential energy surface scan about the CCNC torsion angle shows a wide range of conformers within a plausible energy range. 8 conformers were chosen and used in the subsequent searches.
Figure: energy difference (kJ mol⁻¹) against CCNC torsion angle (°).

Blind Test 2004: Minima in the Lattice Energy for Different Conformations
Figure: lattice energy + intramolecular energy (kJ mol⁻¹) against volume per molecule, V/Z (Å³ molecule⁻¹), for conformers a, b, c, 10, 20, -5, -10 and -20.
Figure (expanded view of the lowest minima): the best structures, within a 10 kJ mol⁻¹ window.
Properties of the best crystal structures, such as growth rates, must also be considered to decide which are more likely to be observed.

Results
The observed crystal structure (revealed on completion of the blind test) contains a higher-energy conformer than any of those considered! When only the observed conformer is used as the rigid probe in the search, the observed structure is found as the global minimum in lattice energy.
Figure: predicted and observed crystal structures.

Summary
High-energy gas-phase conformers may be stabilised by packing within a lattice in the solid state. As many conformers as possible need to be considered, both to maximise the chance of predicting crystal structures correctly and to explore the range of structures that are energetically feasible as polymorphs. A fast, distributed e-Science application is being developed to enable routine crystal structure prediction for large numbers of conformers; this is essential for developing computational methods that predict the possible polymorphs of pharmaceutical molecules.

Predicting Morphologies
The shape, or morphology, of a crystal plays an important role in manufacturing: considerable problems arise if the morphology changes because of impurities or a change of solvent, or when the process is scaled up for high-volume manufacture. Understanding the factors that influence crystal morphology will help us understand how the crystallisation process can be controlled, for example through the use of solvents or additives.
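The two morphology models listed next both assign each face (hkl) a relative growth rate, and that rate fixes the face's distance from the centre of the Wulff plot. In BFDH theory the rate is taken as proportional to 1/d_hkl. In the AE model it is taken as proportional to the magnitude of the attachment energy. The Python sketch below is a minimal illustration under stated assumptions: it uses an orthorhombic cell (so d_hkl has a simple closed form), ignores systematic absences, and relies on hypothetical faces and attachment energies rather than results from this work. It is not the GDIS or DMAREL code, and the function names are illustrative only.

```python
import math


def d_spacing_orthorhombic(hkl, a, b, c):
    """Interplanar spacing d_hkl for an orthorhombic cell with edges a, b, c (angstroms)."""
    h, k, l = hkl
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def normalise(values):
    """Scale a {face: value} mapping so that the smallest value becomes 1."""
    smallest = min(values.values())
    return {face: v / smallest for face, v in values.items()}


def bfdh_distances(faces, a, b, c):
    """BFDH: relative growth rate (and Wulff distance) taken as proportional to 1/d_hkl."""
    return normalise({hkl: 1.0 / d_spacing_orthorhombic(hkl, a, b, c) for hkl in faces})


def ae_distances(attachment_energies):
    """AE model: relative growth rate (and Wulff distance) taken as proportional to |E_att|."""
    return normalise({hkl: abs(e_att) for hkl, e_att in attachment_energies.items()})


if __name__ == "__main__":
    faces = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
    # Hypothetical cubic cell, a = b = c = 10 angstroms (a special case of orthorhombic).
    print(bfdh_distances(faces, 10.0, 10.0, 10.0))
    # Hypothetical attachment energies in kJ/mol per molecule (negative = favourable).
    print(ae_distances({(1, 0, 0): -40.0, (1, 1, 0): -60.0, (1, 1, 1): -80.0}))
```

In either result, the face with the smallest relative distance grows slowest and so dominates the Wulff shape. Building the shape and integrating its volume (the growth volume) would be the next step and is not shown.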
• BFDH (Bravais-Friedel-Donnay-Harker) theory – based on geometrical factors
• AE (attachment energy) model – based on energetic factors

Scheme for Morphology Calculations
1. Take the minimised structure from DMAREL.
2. Choose the faces to study (~15-20) from a BFDH calculation in GDIS.
3. For each face, calculate the valid shifts and the AE, converging the regions (polar faces are excluded).
4. Calculate the relative growth rates and draw the morphology (Wulff plot) for each crystal's set of faces.
5. Calculate the relative volume of the Wulff shape, a new property.

Morphologies
The calculated morphology can be visualised using a Wulff plot. In this plot, the relative distances of the planes from the centre of the crystal, measured along the surface normals, are determined by the interplanar spacings, the attachment energies or the surface energies.
Figure: observed and predicted morphology of form 1 of paracetamol.

Growth Volume
A new property, the 'growth volume', is obtained by numerical integration of the volume within the Wulff shape. It indicates whether one face dominates.
Figure: pyridine. Relative growth volume and AE (kJ mol⁻¹ per molecule) for the predicted structures, ordered by decreasing stability and including forms I and II (form I: Z' = 4).
These results prompted an experimental search for more polymorphs. There are many low-energy structures, and the newly observed form 2 was predicted to grow fast.

e-Science Issues to Address
• simulations take too long to run
• data are distributed across many sites and systems
• no catalogue system
• output in legacy text files, different for each program
• few tools to access, manage and transfer data
• workflow management is manual
• licensing within a distributed environment

Fortran Web Services
1. Expose the Fortran binary as a distributed Web Service, defining an XML interface to the computation in WSDL (Web Service Description Language).
Figure: XML messages mapped by XSL to Fortran input, and Fortran output mapped back to XML, around the Fortran binary.
To make the binary "talk" XML, there are two options. Either change the Fortran code so that its input and output use XML, or use parsers and XSLT conversion documents to map the fixed-format input/output files to and from XML.

Distributed Workflow
2. Orchestrate the Web Services with a workflow service: a BPEL (Business Process Execution Language) script coordinates the WS-wrapped Fortran binaries. The workflow service is itself exposed to the outside world as a Web Service.

Data Representation
Fortran programs use many different formats to represent the same thing (for example, the same CH4 molecule). Since we provide new WSDL interfaces for each application, we have an ideal opportunity to adopt a standard representation for chemical structures. The XML standard in chemistry is CML (Chemical Markup Language):
P. Murray-Rust, "Development of chemical markup language (CML) as a system for handling complex chemical content", New Journal of Chemistry, 2001, 25, 618-634.

Integration with Existing Infrastructure
A prototype (BPEL) workflow has been successfully deployed. However, existing grid infrastructure does not integrate easily with Web Services:
• policy on the compute clusters is enforced by the Sun Grid Engine batch system;
• other users of the clusters submit jobs via this control software;
• building a WSDL binding over the Sun Grid Engine protocol is difficult.
The smooth transition from existing infrastructure to Web Services was riskier than expected.
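The second option under Fortran Web Services maps a program's fixed-format output to XML, which can then be expressed in CML. A minimal Python sketch of that mapping follows. In place of the parser-plus-XSLT route the poster describes, it uses a direct parser from the standard library. The input layout (one atom per line: label, element, x, y, z) and the function names are hypothetical. The CML namespace and the atom attributes `elementType`, `x3`, `y3` and `z3` are standard CML. This is not the converter used in the project.

```python
import xml.etree.ElementTree as ET

# Namespace of Chemical Markup Language (CML).
CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # serialise CML as the default namespace


def cml_tag(name):
    """Qualify an element name with the CML namespace, e.g. '{...}atom'."""
    return f"{{{CML_NS}}}{name}"


def fixed_format_to_cml(text, molecule_id="m1"):
    """Convert a hypothetical program output listing into a CML <molecule>.

    Each coordinate line is assumed to hold five whitespace-separated fields:
    atom label, element symbol, x, y, z (Cartesian, in angstroms).
    Lines with any other number of fields (headers, blanks) are skipped.
    """
    molecule = ET.Element(cml_tag("molecule"), {"id": molecule_id})
    atom_array = ET.SubElement(molecule, cml_tag("atomArray"))
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        label, element, x, y, z = fields
        ET.SubElement(atom_array, cml_tag("atom"), {
            "id": label,
            "elementType": element,
            # float() rejects malformed numbers before they reach the XML.
            "x3": f"{float(x):.6f}",
            "y3": f"{float(y):.6f}",
            "z3": f"{float(z):.6f}",
        })
    return ET.tostring(molecule, encoding="unicode")


if __name__ == "__main__":
    listing = """FINAL COORDINATES
C1  C   0.000000  0.000000  0.000000
H1  H   0.629118  0.629118  0.629118
"""
    print(fixed_format_to_cml(listing, molecule_id="methane_fragment"))
```

Because each application's WSDL interface would then exchange the same `<molecule>` element, every program's output reaches the workflow in a single representation, whatever its native format.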
Data Management at CCLRC
• file storage at CCLRC
• distributed file access via the Storage Resource Broker (SRB, from SDSC)
• catalogue of files using metadata in a relational database
• web interface to metadata and files via the Data Portal
• metadata editor in the browser

Storage Resource Broker
Data files from the simulations are stored in the Storage Resource Broker.

Data Portal
Search for studies in the materials sciences, and download the associated data, using the CCLRC Data Portal.

Ongoing and Future Work
• upload files to the SRB as part of the workflow
• generate metadata
• upload data extracted from files

Acid Sites in Zeolites
• Determine the extra-framework cation positions within the zeolite framework.
• Explore which proton sites are involved in catalysis, and then characterise the active sites.
• Produce a database of structural models and associated vibrational modes for a range of Si/Al ratios.
• Improve understanding of the role of the Si/Al ratio in zeolite chemistry.

MC/EM
A combined Monte Carlo (MC) and energy minimisation (EM) approach has been developed to model zeolitic materials with low and medium Si/Al ratios. First, Al is inserted into a siliceous unit cell, and then a charge-compensating cation is added. The zeolite mordenite, which has a one-dimensional channel system, has been studied with a simulation cell containing two unit cells. This gives 296 atoms, with 96 Si centres (referred to as T sites).
Figure: total energy (eV) and cell volume for 100 configurations, with 5-point moving averages.
Two distinct regions can be seen, from -12079 eV to -12076 eV and from -12075 eV to -12073 eV, but there is no obvious correlation between total energy and cell volume.
Figure: total energy (eV) and cell volume for 10,000 configurations, with 200-point moving averages.
However, when 10,000 structures are considered, it is clear that the most stable structures correspond to cation placements that do not cause the cell to expand. This requires the cations to sit in the large channel.

Comparison of Regions
Figure: representative structures at -12079.5 eV and -12075.04 eV.

What Next
Once the lowest-energy Al positions are confirmed, the cation is exchanged for a proton and the structure is energy minimised again. This method will allow us to construct realistic models of low- and medium-Si/Al zeolites. Such structures can be used for further simulations and will aid the interpretation of experimental data.

Condor
Extensive use has been made of Condor pools (UCL: 950 nodes in teaching pools). 48 CPU-years of previously unused compute resource have been used in this study, and close collaboration with the NERC e-minerals project has given access to this resource. 50,000 calculations have been performed, each with 488 particles per simulation box. This amounts to roughly 24 million particles (50,000 × 488 = 24,400,000) in our simulations to date.

Achievements To Date
1. First use of a CML schema for defining Web Service port types.
2. Calculation of 50,000 configurations of the zeolite mordenite (about 24 million particles), giving insight into its structure when a realistic ratio of Al substitution is included in the model.
3. Fortran codes successfully exposed as OGSI Web Services; a prototype application has been deployed on 80 nodes. The prototype computational polymorph application is being ported to a larger production machine.
4. First use of the BPEL standard for orchestrating Web Services in a Grid application.
5. Open-source BPEL implementation in development, enabling late binding and dynamic deployment of large computational processes.
6. Integration of OGSI and BPEL with Sun Grid Engine.
7. Development of a graphical user interface for the polymorph application, which connects to the relational database via an EJB interface.
8. Infrastructure for metadata and data management.
9. The SRB and the Data Portal are already being used to hold datasets and to transfer data between different scientists and computer applications.
10. Implementation of a Condor pool at the Ri.

Key Achievement
We are now doing science that was not possible before the advances made within e-Science.
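The scale behind this claim can be checked with a little arithmetic, using the figures quoted for the Condor runs (50,000 calculations, 488 particles per simulation box, 48 CPU-years). The average time per calculation assumes that all 48 CPU-years went to those 50,000 calculations. That attribution is an illustrative assumption, not something the text states.

```python
# Scale figures quoted in the Condor section (inputs taken from the text).
CALCULATIONS = 50_000
PARTICLES_PER_BOX = 488
CPU_YEARS = 48
HOURS_PER_YEAR = 365.25 * 24  # Julian year

# Total number of particles simulated across all calculations.
total_particles = CALCULATIONS * PARTICLES_PER_BOX

# Assumption for illustration only: every CPU-year was spent on these 50,000 runs.
cpu_hours_per_calc = CPU_YEARS * HOURS_PER_YEAR / CALCULATIONS

print(f"total particles: {total_particles:,}")
print(f"average CPU time per calculation: {cpu_hours_per_calc:.1f} h")
```

This gives 24,400,000 particles (the rounded "24 million") and, under the stated assumption, roughly 8.4 CPU-hours per calculation.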