e-Science Technologies in the Simulation of Complex Materials
L. Blanshard, R. Tyer, K. Kleese
S. A. French, D. S. Coombes, C. R. A. Catlow
eMaterials
B. Butchart, W. Emmerich (CS)
H. Nowell, S. L. Price (Chem)

Polymorphism
[Figure: molecular structure diagram]
Prediction of polymorphs: a drug substance may exist as two or more crystalline phases in which the molecules are packed differently.

Combinatorial Computational Catalysis
Explore which sites are involved in catalysis: used in diverse industries including petroleum, chemical, polymers, agrochemicals, and environmental.

Acid Sites in Zeolites

e-Science Issues to Address
• simulations take too long to run
• data are distributed across many sites and systems
• no catalogue system
• output in legacy text files, different for each program
• few tools to access, manage and transfer data
• workflow management is manual
• licensing within distributed environment

Acid Sites in Zeolites
• Determine the extra-framework cation position within the zeolite framework.
• Explore which proton sites are involved in catalysis and then characterise the active sites.
• Produce a database with structural models and associated vibrational modes for Si/Al ratios.
• Improve understanding of the role of the Si/Al ratio in zeolite chemistry.
Chabazite: 1 T site, 12 Si centres per unit cell, 8-membered ring channels (3.8 Å × 3.8 Å).

The Problem
Si/Al ratio    Number of structures
11             4
5              160
3              5,760
2              184,320
When substitution of a second Al is considered there are now 4 × (10 × 4) = 160 possible structures, as symmetry has been broken.
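The counts on this slide follow a simple product rule, which can be checked with a short sketch. The function names and the `per_al` parameter are illustrative assumptions; the factor of 4 is taken directly from the slide, which does not spell out its physical meaning, and the per-calculation cost is left as a parameter.

```python
def structure_count(n_al: int, t_sites: int = 12, per_al: int = 4) -> int:
    """Number of structures for n_al Al substitutions, following the slide's rule.

    The first Al contributes a factor of `per_al` (chabazite has a single
    T site, so all starting sites are equivalent). Each further Al multiplies
    the count by (Si centres left after that substitution) * per_al, which
    reproduces the slide's 4 * (10 * 4) for the second Al.
    """
    if not 1 <= n_al < t_sites:
        raise ValueError("n_al must be between 1 and t_sites - 1")
    count = per_al
    for k in range(2, n_al + 1):
        si_left = t_sites - k  # Si centres remaining after the k-th Al
        count *= si_left * per_al
    return count


def cpu_days(n_calculations: int, seconds_each: float) -> float:
    """Serial CPU time in days for n_calculations of equal cost."""
    return n_calculations * seconds_each / 86400.0


for n_al in range(1, 5):
    ratio = (12 - n_al) // n_al  # Si/Al for 12 T sites: 11, 5, 3, 2
    print(f"Si/Al = {ratio}: {structure_count(n_al):,} structures")
```

With a cost of about 100 seconds per calculation, `cpu_days(structure_count(4), 100)` gives roughly 213 days, which is why the enumeration stops being practical at realistic Si/Al ratios.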
Note that this is for a very simple zeolite with 36 ions per unit cell; materials of interest have 296. The number of calculations quickly becomes an issue when realistic Si/Al ratios are considered. A Si/Al ratio of 2 would require 184,320 calculations at ~100 seconds each = 5,120 hours = 213 days of CPU time.

MC/EM
[Figure: initial structures and final structures, each plotted against lattice energy (eV)]
A combined Monte Carlo (MC) and energy minimisation (EM) approach has been developed to model zeolitic materials with low and medium Si/Al ratios. First Al is inserted into a siliceous unit cell, which is then charge-compensated with cations.

RI Condor Pool
Name               OpSys    Arch   State      Activity  LoadAv  Mem   ActvtyTime
vm1-8@faraday.r    IRIX65   SGI    Owner      Idle      1.192   128   3+03:01:02
vm1-14@tyndall.r   IRIX65   SGI    Unclaimed  Idle      0.000   507   0+00:15:09
ising2.ri.ac.      LINUX    INTEL  Unclaimed  Idle      0.200   501   [?????]
vm1-16@strutt1-4   OSF1     ALPHA  Owner      Idle      1.113   1024  0+0:26:46
xp2.ri.ac.uk       OSF1     ALPHA  Owner      Idle      1.113   256   49+12:26:46
xp3.ri.ac.uk       OSF1     ALPHA  Unclaimed  Idle      0.000   256   0+00:55:00
d8.ri.ac.uk        WINNT40  INTEL  Unclaimed  Idle      0.000   255   0+02:09:45
ATLANTIC           WINNT51  INTEL  Unclaimed  Idle      0.008   256   0+01:02:30
BABBLE.ri.ac.      WINNT51  INTEL  Unclaimed  Idle      0.252   512   0+00:22:57
D500.ri.ac.uk      WINNT51  INTEL  Owner      Idle      0.533   254   0+05:26:06
PCDAVIDC.ri.a      WINNT51  INTEL  Unclaimed  Idle      0.000   504   0+03:51:26
e-sam.ri.ac.u      WINNT51  INTEL  Unclaimed  Idle      0.001   512   0+03:16:39
pcalexey.ri.a      WINNT51  INTEL  Unclaimed  Idle      0.002   256   0+00:35:53
[Summary table: rows ALPHA/OSF1, INTEL/LINUX, INTEL/WINNT40, INTEL/WINNT51, SGI/IRIX65, Total; columns Machines, Owner, Claimed, Unclaimed, Matched, Preempting. Cell order was lost in extraction; values as extracted: 18 1 1 0 1 0 14 1 22 15 56 17 0 1 0 0 0 0 0 0 1 1 5 7 0 0 0 0 0 0 0 0 0 15 0 0]
We have set up and tested a Condor pool at the RI with 50+ heterogeneous nodes, ranging from desktop PCs and machines controlling instruments to the main servers of the DFRL.
But where is PC-CRAC???

Level of Optimisation
[Figure: total energy, TE (eV), of configurations at each level of optimisation (single, 5, 10, 20, 50, 100, full); annotation: 50 eV]

Level of Optimisation
[Figure: the same plot on a wider energy scale; annotation: 240 eV]

MOR
Mordenite
• 1-dimensional channel system
• simulation cell contains two unit cells
• 296 atoms, with 96 Si centres (referred to as T sites)
• substituting 8 T sites with 8 Na cations

Workflow
[Diagram labels: MC_subs, Gulp files, Gulp, WinXP Perl script, MS Excel, SRB]

Workflow II
[Diagram labels: C++, Si-zeo structure, interatomic pots, MC_subs, input file, script auto batch sub, f90, script for cleaning dirs, batch of labelled Gulp files, Gulp files, Gulp, WinXP Perl script, subset of data in formatted file, Scommands, MS Excel, SRB]

Condor Stats
Extensive use of Condor pools (UCL: ~950 nodes in teaching pools). ~150 CPU-years of previously unused compute resource have been utilised in this study.
Close collaboration with the NERC e-minerals project has allowed access to this resource. 150,000 calculations have been performed, each with a varying number of particles per simulation box, which means a total of ~75,000,000 particles have been included in our simulations of Mordenite to date.

Condor Specifics
Jobs are submitted in batches of 1,000 because of stability issues. Shadows: not my game, but a pain when the Condor master dies because too many jobs hit the queue (guilty feeling, as the master was not solely running the pool but was also being used for science by the pool administrator). Maximum number of jobs in queue.

Condor Specifics
Handling of data and analysis becomes the RDS (rate-determining step). However, keeping the pool full of jobs is also a tedious step when jobs are short, which is the ideal for the UCL pool (re: turning off the pool once a day): drip feeding. Thought in application design is key: many applications run on the UCL pool are TOTALLY unsuitable for the UCL Condor pool.

100 Configurations
[Figure: total energy (eV) and cell volume for 100 configurations (series full_TE and full_Vol, each with a 5-point moving average); annotation: 20 eV]
It can be seen that there are two distinct regions, -12079 eV to -12076 eV and -12075 eV to -12073 eV, but there is no obvious correlation between total energy and cell volume.

10000 Configurations
[Figure: total energy (TE) and cell volume (VOL) for 10,000 configurations, each with a 200-point moving average; annotation: 25 eV]
However, when 10,000 structures are considered it is clear that the most stable structures correspond to cation placements that do not cause the cell to expand. This requires that the cations sit in the large channel.
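The plots above overlay 5-point and 200-point moving averages on the energy and volume series. A minimal sketch of that smoothing step follows, assuming a trailing window (the way spreadsheet moving-average trendlines are usually drawn, which is an assumption about the original plots). The function name and the six sample configurations are invented for illustration and are not data from the study.

```python
from typing import List, Optional, Sequence


def moving_average(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing moving average.

    Entry i is the mean of values[i - window + 1 .. i]; the first
    window - 1 entries are None because the window is not yet full.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


# Hypothetical (total energy in eV, cell volume in cubic angstroms) pairs,
# made up purely to exercise the function.
configs = [
    (-12074.1, 5462.0), (-12078.9, 5401.5), (-12076.3, 5440.2),
    (-12079.4, 5398.0), (-12073.2, 5471.8), (-12077.5, 5410.9),
]
configs.sort(key=lambda c: c[0])  # most stable (lowest energy) first
smooth_te = moving_average([e for e, _ in configs], 3)
smooth_vol = moving_average([v for _, v in configs], 3)
```

Sorting by energy before smoothing is what lets a trend in the volume curve be read as a dependence on stability; with only 100 points and a 5-point window the noise can easily hide it.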
10000 Configurations
[Figure: 10,000 configurations plotted against Energy_eV]

Comparison of Regions
[Figure: structures at -12079.5 eV and -12075.04 eV]

Analysis
MySQL allows input from a text file, a C/C++ program, or the mysql command line and GUI.
Properties: total energy, cell volume, lattice parameters, T-O distances, T-O-T bond angles, cation-framework oxygen distances, coordination of user-specified species, etc.

Workflow III
[Diagram labels: MC_subs, Gulp files, Gulp, WinXP, mysql db, SRB]

Building an Ensemble
Property                       Good        Bad
Lattice energy (eV)            < -12070    > -12068
Al-Na average distance (Å)     > 3.6       < 3.4
Cell volume (Å³)               < 5420      > 5475
Average cation-oxygen (Å)      > 2.75      < 2.65

Validation
Comparison with experiment is very promising, showing a large difference in the quality of the fit between the 'good' set and the 'bad' set.

Drip Feeding and Interactive Steering using Relational Databases
[Diagram labels: Monitor; Distributed Computing; Portal; User Input: structural model, Si/Al, cation types, [H2O] etc.; Model/Configuration Generator; Jobs db; Analysis; Improve generation / model strategy; Steering db (geometry, energy, fit); Analysis db; User Input: diffraction data, chemical analysis, building units, Si/Al, cation types, [H2O] etc.]
D. Lewis, R. Coates, S. French (UCL Chem / RI)

Workflow IV
The workflow service needs to be exposed to the outside world as a web service.
[Diagram labels: SSH, CML]
Since we require new WSDL interfaces for each application, it is a perfect opportunity to employ a standard representation for chemical structures. The XML standard in chemistry is CML (Chemical Markup Language).

Key Achievement
We are now doing science that was not possible before the advancements made within e-Science.

FER
Ferrierite
• 2-dimensional channel system
• simulation cell contains 115 atoms
• substituting at 4 T sites with 4 Na cations

100 Configurations
[Figure: total energy (TE, eV) and volume (Vol) for 100 configurations, each with a 5-point moving average; annotation: 14 eV]
Again there are steps in total energy, and again no correlation with volume for the low number of configurations.
Only 75 out of 100 configurations optimise.

10000 Configurations
[Figure: total energy (TE, eV) and volume (Vol) for 10,000 configurations, each with a 200-point moving average; annotation: 15 eV]
However, this time when 10,000 structures are considered there are no clear steps in the volume. The volume still increases with decreasing stability, but this is due to cell expansion caused by Al-to-Al interactions.
Only 7,500 out of 10,000 optimise.

Comparison of Regions
[Figures]

MFI
ZSM-5
• 3-dimensional channel system
• simulation cell contains 292 atoms
• substituting at 4 sites with 4 Na cations

10000 Configurations
[Figure: total energy (TE, eV) and cell volume against configuration number, each with a 200-point moving average; annotation: 10 eV]
There is a step in total energy, but this time only one, and from then on the trend is smooth.

What Next
Once the lowest-energy positions of Al are confirmed, the cation is exchanged for a proton and the structure is energy minimised again. This method will allow us to construct realistic models of low and medium Si/Al zeolites. Such structures can be used for further simulations and to aid the interpretation of experimental data.

Solid Solutions
[Figures: BaTiO3, BaSrTiO3, SrTiO3]

Ongoing and Future Work
• upload files as part of workflow to SRB
• generate metadata
• upload extracted data from files
• more extensive use of CML

Achievements To Date
1. First use of CML schema for defining Web Service port types.
2. Calculation of 50,000 configurations of zeolite Mordenite (24,000,000 particles) to gain insight into structure when a realistic ratio of Al substitution is included in the model.
3. Successfully exposed Fortran codes as OGSI Web Services; prototype application deployed on 80 nodes. The prototype computational polymorph application is being ported to a larger production machine.
4. First use of BPEL standard for orchestrating web services in a Grid application.
5. Open Source BPEL implementation in development, enabling late binding and dynamic deployment of large computational processes.
6. Integration of OGSI and BPEL with Sun Grid Engine.
7. Development of a graphical user interface for the polymorph application, which connects to a relational database via an EJB interface.
8. Infrastructure for metadata and data management.
9. SRB and dataportal are already being used to hold datasets and to transfer the data between different scientists and computer applications.
10. Implementation of Condor pool at RI.

Polymorph Prediction
Different crystal structures of a molecule are called polymorphs. Polymorphs may have considerably different properties (e.g. bioavailability, solubility, morphology).
Polymorph prediction is of great importance to the pharmaceutical industry, where the discovery of a new polymorph during production or storage of a drug may be disastrous.
[Figure: molecular structure diagram]
Drug molecules are often flexible and this makes the polymorph prediction process more challenging…

Polymorph Prediction Workflow
• For flexible molecules: conformational optimisation gives n feasible rigid molecular probes representing energetically plausible conformers (n = number of conformers).
• MOLPAK: generation of ~6000 densely packed crystal structures using a rigid molecular probe (n times).
• Morphology.
• DMAREL: lattice energy optimisation. Data: unit cell volume, density, lattice energy.
• Restricted number of structures selected.
• Crystal structures and properties stored in database.

Storage Resource Broker
Store data files from simulations in the Storage Resource Broker.
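The control flow of the polymorph prediction workflow (one packing search per conformer, an optimisation of every packing, then selection of a restricted set) can be sketched as below. Everything here is a hypothetical stand-in: `generate_packings` and `optimise` are placeholders passed in by the caller, not the real MOLPAK or DMAREL interfaces, and ranking by lowest lattice energy is an assumed selection rule, since the slide only says that a restricted number of structures is selected.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CrystalStructure:
    conformer: str
    packing_id: int
    lattice_energy: float  # arbitrary units in this toy example
    density: float
    cell_volume: float


def predict_polymorphs(
    conformers: Iterable[str],
    generate_packings: Callable[[str], Iterable[int]],    # stand-in for the packing search
    optimise: Callable[[str, int], CrystalStructure],     # stand-in for lattice energy optimisation
    keep: int,
) -> List[CrystalStructure]:
    """Run the packing search once per conformer, optimise every packing,
    and return the `keep` structures with the lowest lattice energy
    (the ranking rule is an assumption of this sketch)."""
    candidates = [
        optimise(conformer, packing_id)
        for conformer in conformers                  # "n times", once per rigid probe
        for packing_id in generate_packings(conformer)
    ]
    candidates.sort(key=lambda s: s.lattice_energy)
    return candidates[:keep]


# Toy stand-ins so the sketch runs; the real search yields ~6000 packings.
def toy_packings(conformer: str) -> Iterable[int]:
    return range(5)


def toy_optimise(conformer: str, packing_id: int) -> CrystalStructure:
    energy = -100.0 + len(conformer) - 2.0 * packing_id  # made-up energy model
    return CrystalStructure(conformer, packing_id, energy, density=1.5, cell_volume=500.0)


selected = predict_polymorphs(["A", "BB"], toy_packings, toy_optimise, keep=3)
```

The returned list is what would be written to the database, with the raw simulation files going to the Storage Resource Broker. Passing the two stages in as callables keeps the orchestration independent of how each code is actually invoked.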