e-Science Technologies in the Simulation of Complex Materials eMaterials

L. Blanshard, R. Tyer, K. Kleese
S. A. French, D. S. Coombes, C. R. A. Catlow
eMaterials
B. Butchart, W. Emmerich – CS
H. Nowell, S. L. Price – Chem
[Figure: molecular structure diagram]
Polymorphism
Prediction of polymorphs: a drug substance may exist as two or more crystalline phases in which the molecules are packed differently.
Combinatorial Computational Catalysis
Explore which sites are involved in catalysis (used in diverse industries including petroleum, chemical, polymers, agrochemicals, and environmental).
Acid Sites in Zeolites
e-Science Issues to Address
• simulations take too long to run
• data are distributed across many sites and systems
• no catalogue system
• output in legacy text files, different for each program
• few tools to access, manage and transfer data
• workflow management is manual
• licensing within distributed environment
Acid Sites in Zeolites
• Determine the extra-framework cation position within the zeolite framework.
• Explore which proton sites are involved in catalysis and then characterise the active sites.
• Produce a database with structural models and associated vibrational modes for Si/Al ratios.
• Improve understanding of the role of the Si/Al ratio in zeolite chemistry.
Chabazite: 1 T site, 12 Si centres per unit cell, 8-membered ring channels (3.8 Å × 3.8 Å).
The Problem
Si/Al ratio    Possible structures
11             4
5              160
3              5,760
2              184,320
When substitution of a second Al is considered there are now 4 * (10 * 4) = 160 possible structures, as symmetry has been broken.
Note this is for a very simple zeolite with 36 ions per unit cell; materials of interest have 296.
The number of calculations quickly becomes an issue when realistic Si/Al ratios are considered.
A Si/Al ratio of 2 would require 184,320 calculations at ~100 seconds each
= 5120.0 hours = 213 days of CPU time.
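The counts and the CPU estimate above can be reproduced with a short calculation. This is a minimal sketch that assumes the pattern implied by the slide's numbers: the first Al gives 4 structures, and each further Al multiplies the count by (available sites × 4), with 10 sites available to the second Al, 9 to the third, and so on. The function name and its parameters are illustrative only.

```python
def count_structures(n_al, sites_for_second_al=10, positions_per_al=4):
    """Number of candidate structures after substituting n_al Al atoms.

    Assumption (inferred from the figures on the slide): the first Al
    contributes 4 structures; the k-th Al (k >= 2) contributes a factor of
    (sites_for_second_al - (k - 2)) * positions_per_al.
    """
    count = positions_per_al
    for k in range(2, n_al + 1):
        count *= (sites_for_second_al - (k - 2)) * positions_per_al
    return count


T_SITES = 12           # Si centres per chabazite unit cell (from the slide)
SECONDS_PER_RUN = 100  # approximate cost of one calculation (from the slide)

for n_al in range(1, 5):
    si_al = (T_SITES - n_al) / n_al
    print(f"Si/Al = {si_al:g}: {count_structures(n_al):,} structures")

runs = count_structures(4)
cpu_hours = runs * SECONDS_PER_RUN / 3600
cpu_days = cpu_hours / 24
print(f"{runs:,} runs = {cpu_hours:.1f} hours = {cpu_days:.0f} days of CPU time")
```

Each extra Al multiplies the workload by roughly 30 to 40, which is why exhaustive enumeration stops being practical at realistic Si/Al ratios.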
MC/EM
[Figure: initial and final structures labelled with their lattice energies (eV); the labelled values range from -12290.38 to -12295.32]
A combined Monte Carlo (MC) and energy minimisation (EM) approach has been developed to model zeolitic materials with low and medium Si/Al ratios. First, Al is inserted into a siliceous unit cell; the charge is then compensated with cations.
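The first stage of this approach, random insertion of Al followed by charge compensation, can be sketched as below. This is an illustration only and not the code used in the study: the integer site labels, the function name `random_substitution` and the idea of drawing cation positions from a fixed list of candidate sites are all assumptions made for the example. Each generated configuration would then be handed to an energy minimisation code.

```python
import random


def random_substitution(n_t_sites, n_al, cation_sites, rng):
    """Pick n_al distinct T sites for Al and one cation site per Al.

    Hypothetical helper: real codes work on 3D coordinates, whereas this
    sketch only tracks integer site labels.
    """
    al_sites = sorted(rng.sample(range(n_t_sites), n_al))
    # One monovalent cation per Al keeps the cell charge neutral.
    cations = sorted(rng.sample(cation_sites, n_al))
    return {"al_sites": al_sites, "cation_sites": cations}


rng = random.Random(42)  # fixed seed so the run is reproducible
candidate_cation_sites = list(range(100, 140))  # assumed site labels
configs = [
    random_substitution(n_t_sites=12, n_al=2,
                        cation_sites=candidate_cation_sites, rng=rng)
    for _ in range(5)
]
for config in configs:
    print(config)
```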
RI Condor Pool
Name               OpSys    Arch   State      Activity  LoadAv  Mem   ActvtyTime
vm1-8@faraday.r    IRIX65   SGI    Owner      Idle      1.192   128   3+03:01:02
vm1-14@tyndall.r   IRIX65   SGI    Unclaimed  Idle      0.000   507   0+00:15:09
ising2.ri.ac.      LINUX    INTEL  Unclaimed  Idle      0.200   501   [?????]
vm1-16@strutt1-4   OSF1     ALPHA  Owner      Idle      1.113   1024  0+0:26:46
xp2.ri.ac.uk       OSF1     ALPHA  Owner      Idle      1.113   256   49+12:26:46
xp3.ri.ac.uk       OSF1     ALPHA  Unclaimed  Idle      0.000   256   0+00:55:00
d8.ri.ac.uk        WINNT40  INTEL  Unclaimed  Idle      0.000   255   0+02:09:45
ATLANTIC           WINNT51  INTEL  Unclaimed  Idle      0.008   256   0+01:02:30
BABBLE.ri.ac.      WINNT51  INTEL  Unclaimed  Idle      0.252   512   0+00:22:57
D500.ri.ac.uk      WINNT51  INTEL  Owner      Idle      0.533   254   0+05:26:06
PCDAVIDC.ri.a      WINNT51  INTEL  Unclaimed  Idle      0.000   504   0+03:51:26
e-sam.ri.ac.u      WINNT51  INTEL  Unclaimed  Idle      0.001   512   0+03:16:39
pcalexey.ri.a      WINNT51  INTEL  Unclaimed  Idle      0.002   256   0+00:35:53

               Machines  Owner
ALPHA/OSF1           18      1
INTEL/LINUX           1      0
INTEL/WINNT40         1      0
INTEL/WINNT51        14      1
SGI/IRIX65           22     15
Total                56     17
[Claimed, Unclaimed, Matched and Preempting columns: values not recoverable]
We have set up and tested a Condor pool at the RI, which has 50+ heterogeneous nodes ranging from desktop PCs and machines controlling instruments to the main servers of the DFRL.
But where is PC-CRAC???
Level of Optimisation
[Figure: total energy, TE (eV), of the configurations at each level of optimisation (single, 5, 10, 20, 50, 100, full); annotated energy range 50 eV]
Level of Optimisation
[Figure: as above on a wider energy scale (-12090 to -11850 eV); annotated energy range 240 eV]
MOR
Mordenite –
• 1 dimensional channel system
• simulation cell contains two unit cells
• 296 atoms, with 96 Si centres
(referred to as T sites).
• Substituting 8 T sites with 8 Na cations
Workflow
[Workflow diagram: MC_subs, Gulp files, Gulp on WinXP, Perl script, MS Excel, SRB]
Workflow II
[Workflow diagram: Si-zeo structure and interatomic pots as input; MC_subs (C++) with input file; script for automatic batch submission; f90; script for cleaning dirs; BatchGulp of labelled Gulp files; Gulp on WinXP; Perl script producing a subset of data in a formatted file; MS Excel; Scommands to SRB]
Condor Stats
Extensive use of Condor pools (UCL ~950 nodes in teaching
pools). ~150 cpu-years of previously unused compute resource
have been utilised in this study. Close collaboration with the
NERC e-minerals project has allowed access to this resource.
150,000 calculations have been performed each with varying
numbers of particles per simulation box, which means a total of
~75,000,000 particles have been included in our simulations of
Mordenite to date.
Condor Specifics
Jobs submitted in 1,000-job batches: an issue of stability.
Shadows: not my game, but a pain when the Condor Master dies due to too many jobs hitting the queue (a guilty feeling, as the Master was not solely running the pool but was also being used for science by the pool administrator).
Maximum number of jobs in queue.
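Splitting a large job list into fixed-size batches, as described above, might look like the following. The sketch only prepares the batches; how each batch is handed to Condor is left out, and the helper name `batched` and the file-name pattern are hypothetical.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical input names for 150,000 calculations.
jobs = [f"config_{i:06d}.gin" for i in range(150_000)]
batches = list(batched(jobs, batch_size=1000))
print(len(batches), "batches; first batch starts with", batches[0][0])
```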
Condor Specifics
Handling of data and analysis becomes the rate-determining step (RDS).
However, keeping the pool full of jobs is also a tedious step when jobs are short, which is the ideal for the UCL pool (re: turning off pool once a day): drip feeding.
Thought in application design is key: many applications run on the UCL pool are TOTALLY unsuitable for the UCL Condor Pool.
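Drip feeding can be automated with a loop that tops the queue up whenever it falls below a target length. The sketch below simulates this without calling Condor at all: `queue_length` and `submit` are placeholder callables standing in for whatever site-specific commands report and extend the queue, and the target and drain rate are arbitrary.

```python
from collections import deque


def drip_feed(pending, queue_length, submit, target=1000, max_cycles=10_000):
    """Submit jobs from pending whenever the queue drops below target.

    pending      -- deque of jobs not yet submitted
    queue_length -- callable returning the current number of queued jobs
    submit       -- callable taking a list of jobs to add to the queue
    Returns the number of polling cycles used. A real loop would sleep
    between polls; that is omitted here.
    """
    cycles = 0
    while pending and cycles < max_cycles:
        room = target - queue_length()
        if room > 0:
            n = min(room, len(pending))
            submit([pending.popleft() for _ in range(n)])
        cycles += 1
    return cycles


# Toy stand-in for a pool: on each poll, up to 300 queued jobs have finished.
queue = []


def fake_queue_length():
    del queue[:300]
    return len(queue)


pending = deque(range(5000))
cycles = drip_feed(pending, fake_queue_length, queue.extend, target=1000)
print("all jobs submitted after", cycles, "polling cycles")
```

Capping the queue at a target length keeps the load on the central manager bounded, at the cost of needing a process that stays alive to do the topping up.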
100 Configurations
[Figure: total energy (full_TE, eV) and cell volume (full_Vol) for 100 configurations, each with a 5-period moving average; annotated energy range 20 eV]
It can be seen that there are two distinct regions, -12079 eV to -12076 eV and -12075 eV to -12073 eV, but there is no obvious correlation between total energy and cell volume.
10000 Configurations
[Figure: total energy (TE, eV) and cell volume (VOL) for 10,000 configurations, each with a 200-period moving average; annotated energy range 25 eV]
However, when 10,000 structures are considered it is clear that
the most stable structures correspond to cation placements that
do not cause the cell to expand. This requires that the cations
sit in the large channel.
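The moving averages used in these plots (5-period for 100 configurations, 200-period for 10,000) smooth out configuration-to-configuration noise so that trends in energy and volume stand out. A minimal sketch of a trailing moving average follows; the sample values are made up for illustration and are not data from this study.

```python
def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new point, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Made-up total energies (eV), ordered from most to least stable.
energies = [-12079.0, -12078.0, -12077.0, -12075.0, -12074.0, -12073.0]
print(moving_average(energies, window=5))
```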
10000 Configurations
[Figure: cell volume against energy (Energy_eV) for 10,000 configurations]
Comparison of Regions
[Figure: structures compared at -12079.5 eV and -12075.04 eV]
Analysis
MySQL allows input from a text file, a C/C++ program, or the mysql command line and GUI.
Properties: Total energy, cell volume,
lattice parameters, T-O distances, T-O-T
bond angles, cation-framework oxygen
distances, coordination of user specified
species etc.
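Storing per-configuration properties in a relational table makes this kind of analysis a matter of writing queries. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a database server; the slides name mysql as the database actually used. The table name, column names and the three rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE configurations (
           id INTEGER PRIMARY KEY,
           total_energy_ev REAL,
           cell_volume_a3 REAL
       )"""
)
rows = [  # invented values, not results from the study
    (1, -12079.2, 5395.0),
    (2, -12074.1, 5460.0),
    (3, -12066.5, 5510.0),
]
conn.executemany("INSERT INTO configurations VALUES (?, ?, ?)", rows)

# Most stable configurations first, restricted to compact cells.
query = """SELECT id, total_energy_ev, cell_volume_a3
           FROM configurations
           WHERE cell_volume_a3 < ?
           ORDER BY total_energy_ev ASC"""
stable = conn.execute(query, (5475.0,)).fetchall()
print(stable)
conn.close()
```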
Workflow III
[Workflow diagram: MC_subs, Gulp files, Gulp on WinXP, mysql db, SRB]
Building an Ensemble
Property                             Good       Bad
Lattice energy (eV)                  < -12070   > -12068
Al-Na average distance (Å)           > 3.6      < 3.4
Cell volume (Å³)                     < 5420     > 5475
Average cation-oxygen distance (Å)   > 2.75     < 2.65
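The thresholds in the table can be applied mechanically to sort configurations into 'good' and 'bad' sets. The sketch below is one possible reading, in which a configuration must satisfy all four 'good' (or all four 'bad') criteria to be assigned; the slide does not say how the criteria are combined, so that rule, the field names and the example values are assumptions.

```python
# (good_test, bad_test) per property, using the thresholds from the table.
CRITERIA = {
    "lattice_energy_ev": (lambda v: v < -12070, lambda v: v > -12068),
    "al_na_distance_a": (lambda v: v > 3.6, lambda v: v < 3.4),
    "cell_volume_a3": (lambda v: v < 5420, lambda v: v > 5475),
    "cation_oxygen_a": (lambda v: v > 2.75, lambda v: v < 2.65),
}


def classify(config):
    """Return 'good', 'bad' or 'unassigned' for one configuration dict."""
    if all(good(config[name]) for name, (good, _) in CRITERIA.items()):
        return "good"
    if all(bad(config[name]) for name, (_, bad) in CRITERIA.items()):
        return "bad"
    return "unassigned"


examples = [  # invented values for illustration
    {"lattice_energy_ev": -12078.0, "al_na_distance_a": 3.7,
     "cell_volume_a3": 5400.0, "cation_oxygen_a": 2.80},
    {"lattice_energy_ev": -12066.0, "al_na_distance_a": 3.3,
     "cell_volume_a3": 5490.0, "cation_oxygen_a": 2.60},
    {"lattice_energy_ev": -12069.0, "al_na_distance_a": 3.5,
     "cell_volume_a3": 5450.0, "cation_oxygen_a": 2.70},
]
labels = [classify(c) for c in examples]
print(labels)
```

Leaving a gap between the 'good' and 'bad' thresholds means borderline configurations fall into neither set, which keeps the two ensembles clearly separated.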
Validation
Comparison with experiment is very promising, showing a large difference in the quality of the fit between the ‘good’ set and the ‘bad’ set.
Drip Feeding and Interactive Steering using Relational Databases
[Diagram components: Portal; Model/Configuration Generator; Jobs db; Distributed Computing; Monitor; Analysis; Steering; db (geometry, energy, fit); Analysis db; feedback to improve generation / model strategy]
User Input: Structural model, Si/Al, cation types, [H2O] etc.
User Input: Diffraction data, chemical analysis, building units, Si/Al, cation types, [H2O] etc.
D. Lewis, R. Coates, S. French (UCL Chem / RI)
Workflow IV
Workflow service needs to be exposed to the outside world as a web service.
[Diagram labels: SSH, CML]
Since we require new WSDL interfaces for each application, it is a perfect opportunity to employ a standard representation for chemical structures. The XML standard in chemistry is CML (Chemical Markup Language).
Key Achievement
We are now doing science that was not possible
before the advancements made within e-Science.
FER
Ferrierite –
• 2 dimensional channel system
• simulation cell contains 115 atoms.
• substituting at 4 T sites with 4 Na cations
100 Configurations
[Figure: total energy (TE, eV) and cell volume (Vol) for 100 configurations, each with a 5-period moving average; annotated energy range 14 eV]
Again there are steps in total energy, and again no correlation with volume for this low number of configurations.
Only 75 out of 100 configurations optimise.
10000 Configurations
[Figure: total energy (TE, eV) and cell volume (Vol) against configuration number, each with a 200-period moving average; annotated energy range 15 eV]
However, this time when 10,000 structures are considered there
are no clear steps in the volume. The volume still increases with
decreasing stability but this is due to cell expansion caused by
Al to Al interactions.
Only 7500 out of 10000 optimise
Comparison of Regions
MFI
ZSM-5 –
• 3 dimensional channel system
• simulation cell contains 292 atoms
• substituting at 4 sites with 4 Na cations
10000 Configurations
[Figure: total energy (TE, eV) and cell volume (Vol) against configuration number, each with a 200-period moving average; annotated energy range 10 eV]
There is a step in total energy, but this time only one, and from then on the trend is smooth.
What Next
Once the lowest-energy positions of Al are confirmed, the cation is exchanged for a proton and the structure is energy minimised again.
This method will allow us to construct
realistic models of low and medium
Si/Al zeolites. Such structures can be
used for further simulations and aid the
interpretation of experimental data.
Solid Solutions
[Figures: BaTiO3, BaSrTiO3, SrTiO3]
Ongoing and Future Work
• upload files as part of workflow to SRB
• generate metadata
• upload extracted data from files
• more extensive use of CML
Achievements To Date
1. First use of CML schema for defining Web Service port types.
2. Calculation of 50,000 configurations of zeolite Mordenite
(24,000,000 particles) to gain insight into structure when a realistic
ratio of Al substitution is included in model.
3. Successfully exposed Fortran codes as OGSI Web Services; a prototype application has been deployed on 80 nodes. The prototype computational polymorph application is being ported to a larger production machine.
4. First use of BPEL standard for orchestrating web services in a Grid
application.
5. Open Source BPEL implementation in development enabling late
binding and dynamic deployment of large computational processes.
6. Integration of OGSI and BPEL with Sun Grid Engine.
7. Development of a graphical user interface for the polymorph application, which connects to a relational database via an EJB interface.
8. Infrastructure for metadata and data management.
9. SRB and the dataportal are already being used to hold datasets and to transfer data between different scientists and computer applications.
10. Implementation of a Condor pool at the RI.
Polymorph Prediction
Different crystal structures of a molecule are called polymorphs.
Polymorphs may have considerably different properties
(e.g. bioavailability, solubility, morphology)
Polymorph prediction is of great importance to the pharmaceutical industry, where the discovery of a new polymorph during production or storage of a drug may be disastrous.
[Figure: molecular structure diagram]
Drug molecules are often flexible and this
makes the polymorph prediction process
more challenging…
Polymorph Prediction Workflow
• For flexible molecules: conformational optimisation, giving n feasible rigid molecular probes representing energetically plausible conformers (n = number of conformers)
• MOLPAK: generation of ~6000 densely packed crystal structures using a rigid molecular probe (n times)
• DMAREL: lattice energy optimisation
• Data: unit cell volume, density, lattice energy
• Restricted number of structures selected
• Morphology
• Crystal structures and properties stored in database
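The overall shape of this workflow, n conformers each producing ~6000 packings that are then optimised and filtered, can be sketched as a nested loop. MOLPAK and DMAREL are external programs and are not called here: `generate_packings` and `optimise_lattice_energy` are hypothetical stand-ins (the latter returns random numbers), and the selection rule of keeping the lowest-energy structures is an assumption.

```python
import random

PACKINGS_PER_CONFORMER = 6000  # "~6000" on the slide


def generate_packings(conformer):
    """Stand-in for the packing step: returns placeholder structure ids."""
    return [(conformer, i) for i in range(PACKINGS_PER_CONFORMER)]


def optimise_lattice_energy(structure, rng):
    """Stand-in for lattice energy optimisation: returns a fake energy."""
    return rng.uniform(-120.0, -80.0)


def run_workflow(conformers, keep, seed=0):
    """Return (number of structures optimised, the `keep` lowest in energy)."""
    rng = random.Random(seed)
    results = []
    for conformer in conformers:  # repeated n times, once per conformer
        for structure in generate_packings(conformer):
            energy = optimise_lattice_energy(structure, rng)
            results.append((energy, structure))
    results.sort()  # lowest lattice energy first
    return len(results), results[:keep]


total, selected = run_workflow(["conf_A", "conf_B", "conf_C"], keep=50)
print(total, "structures optimised;", len(selected), "kept for the database")
```

The cost scales linearly with n, so each additional conformer adds roughly 6000 independent optimisations, which is the kind of workload that distributes naturally over many nodes.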
Storage Resource Broker
Store data files from simulations in the
Storage Resource Broker