FROM EGEE TO EGI: THE ROLE OF VIRTUAL RESEARCH COMMUNITIES IN MOLECULAR AND MATERIALS SCIENCE

Antonio Laganà*
Department of Chemistry, University of Perugia, Italy
* With the collaboration of several members of the COMPCHEM Virtual Organization

SUMMARY
• THE EGEE GRID AND ITS IMPLICATIONS FOR COMPUTATIONAL MOLECULAR AND MATERIALS SCIENTISTS
• PAVING THE WAY TO EGI
• FROM SIMBEX (SIMULATOR OF MOLECULAR BEAM EXPERIMENTS) TO GEMS (GRID EMPOWERED MOLECULAR SIMULATOR)
• FROM COMPCHEM TO CMST
• GRIDIFICATION APPROACHES
• FORWARD LOOKING

1 - THE EGEE GRID AND ITS IMPLICATIONS FOR COMPUTATIONAL MOLECULAR AND MATERIALS SCIENTISTS
The seminal European implementation of the Grid and the assembling of the COMPCHEM Virtual Organization

The Grid: from dreams to reality
“A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities.”
Ian Foster, The Grid: Blueprint for a New Computing Infrastructure (1999)

THE PERVASIVENESS OF THE EGEE PRODUCTION GRID

THE EGEE PRODUCTION GRID
• EGEE is a European project aimed at developing a European grid infrastructure for science, with links to grids in the US, Latin America, India and China.
• In the first biennium little support was given to chemistry (NA4 activity: Application Identification and Support).
• From the second biennium on, the molecular beam simulator (SIMBEX) was produced and the chemistry virtual organization (VO) COMPCHEM was admitted as an unfunded VO.
• In the third biennium a prototype version of the Grid Empowered Molecular Simulator (GEMS) was designed and implemented.

THE COST EFFECTIVENESS OF THE EGEE PRODUCTION GRID
• Runs on public networks
• Off-the-shelf technology (from PCs to supercomputers)
• Evolutionary approach
• Aggregated local nodes (the Perugia case)

The initial Beowulf-Mosix “GRID”
front-end + 15 nodes, 2 proc.
PIII 1.0 GHz, 2 GB RAM, NIC Intel e1000 Gigabit Ethernet
Switch: 3Com Gigabit Ethernet, 16 ports
Hybrid architecture: Beowulf, MOSIX

The additional cluster “GRID”
front-end + 40 nodes, proc. Intel Xeon Quadcore X3210 2.13 GHz, 164 GB RAM, 8 MB L2 cache (2x6), RJ 45 Ethernet
Switch: 3Com, 2 Gigabit Ethernet switches, 48 ports

FURTHER EXPANSION OF THE PERUGIA NODE
• Coordination of the original nucleus of scientists from Computer Science and Chem-dynamics with those of the local section of INFN, CNR, Chem-electronics and Drug design.
• Gathering together the related hardware (different Tier3) and software tools, and experimenting with new ones (such as GPUs, workflows and frameworks).
• Assembling the specific packages of the different scientific areas.
• Widening the service area in grid porting, training and education.

FURTHER EXPANSION OF COMPCHEM
• Increase in the number of users.
• Increase in the number of programs.
• Improved support to users (registration, porting, training (2 schools), …).
• Connection with other VOs and application to INFRA2010 as part of the ROSCOE application.
[Figure labels: Applications, Astrophysics, Bioinformatics, Computational Chemistry, Geophysics, Earth observation, Problem Solving, Portals, Programming tools, Libraries, HP Components, Middleware, Resource Management, Security, Monitoring, Cost models, Communications, Networks, High performance nets, Fiber optics]

THE DEPENDABILITY OF THE EGEE PRODUCTION GRID

THE CONSISTENCY AND DEPENDABILITY OF THE EGEE PRODUCTION GRID
• NO ADEQUATE BANDWIDTH AND RELIABILITY of public networks
• NO STANDARD MIDDLEWARE (gLite, ARC, UNICORE)
• NO EFFICIENT PARALLELIZATION TOOLS (MPI libraries), PORTALS, WORKFLOWS
• NO ESTABLISHED DATA AND PACKAGE MODELS AND STANDARDS

2 - PAVING THE WAY TO THE EUROPEAN GRID INITIATIVE (EGI)
The structuring of a new, truly pan-European grid infrastructure

MISSION and STRUCTURE
• Support international research teams and projects by means of an international infrastructure for sharing data (knowledge) and compute resources
• Common infrastructure
– national funding of computing research infrastructures via NGI platforms
– coordination through EGI.ORG
– steering by User Communities

EGI Basic Elements
• EGI ORGANIZATION
– EGI.ORG, a light coordination body
  • Central location + decentralized bodies
  • Synergy for EU-level added value
  • Coordination activities
  • Links with external bodies (Consortia, ..)
– NGIs, stakeholders of EGI.ORG
  • national funding
  • own agenda and tasks

EGI Stakeholders
[Figure labels: Research Teams, Research Institutes, Resource Centres, NGI1, NGI2, …, NGIn, NGIs, EGI.org]

NGI User Community Tasks
1. VO Registration and VO Database
2. Site Validation Tests
3. Core VO Service Provision
4. Help Desk and User Technical Support
5. Documentation
6. Help Desk for Application Porting
7. Case Studies
8. Consulting
9. Application Database
10. Development of Services (Grid Planning)

NGI User Community Tasks (continued, items 11 to 19)
• Integration of Domain’s Resources
• Feedback
• Dissemination
• Community-Specific Gateways and Help Desk
• Validation of Site Resources/Services
• Coordination
• User Conference – User Forum Events
• Technical Coordination
• Grid Planning
• Regional Coordination

EGI User Community Goals
1. Gathering requirements from the user communities.
2. Carrying out a review process to integrate useful “external” software.
3. Establishing Science Gateways that expose common tools and services to user communities in the various disciplines (specialized support centers, SSCs).
4. Establishing technical collaborations with the large ERI projects.
5. Providing “umbrella” services for collaborating projects (e.g. maintenance of repositories, FAQs, wikis, etc.).
6. Maintaining a European Grid Application Database that allows applications to be “registered”.
7. Organising European events such as the User Forum meetings and topical meetings.
8. Providing services for new communities.
9. Ensuring high quality documentation and training services.

OTHER ACTORS
• Associate members: EIROs (CERN, ESA, EBI, ..), which supplement NGIs for services and resources in specific sectors
• Partners: middleware consortia (gLite, UNICORE, ARC), which provide the OS middleware

EGI Management/Governance
[Figure labels: EGI Council; Members (NGI1, NGI2, NGI3, … NGIn); Associate Members; Non-voting Representatives (e.g. EIROforum member, …, extra-EU NGIs, Chair of UFSC, …); User Forum (UF); User Forum Steering Committee (UFSC); EGI.org; EGI Director; Advisory Committees (e.g. Middleware Coordination Board, MCB); UCO, User Coordination, User Community Services; CTO, Middleware Maintenance, Middleware Unit; CAO, Admin & PR, Administration & PR Unit; COO, Operations, Operations Unit]

FROM EGEE to EGI
• January 20th 2009: Vote for approval of the EGI Blueprint by the EGI_DS Policy Board; first list of NGIs subscribing to the principles of EGI.
• March 2nd 2009: Catania Workshop – Approval of AMSTERDAM as the EGI location; common work plan with EGEE on the transition scenario.
• Spring 2009: Transition team in place with authority to prepare key tasks and to negotiate with the EU; work on calls for EC funding.
• Summer 2009: The core of the EGI project transition team is agreed and confirmed by the Policy Board; latest date for formal establishment of EGI, including location.
• Autumn 2009: The EGI project proposal is prepared and submitted to the EC for approval.
• January 1st, 2010: EGI is operational, with all key personnel appointed (although some may not yet be working for EGI, e.g. still working for EGEE-III or another project).
• April 2010: EGI takes over from EGEE-III.

3 – FROM SIMBEX (SIMULATOR OF MOLECULAR BEAM EXPERIMENTS) TO GEMS (GRID EMPOWERED MOLECULAR SIMULATOR)
A systematic grid approach to molecular and materials science simulations
- O. Gervasi, A. Laganà, SIMBEX: a portal for the a priori simulation of crossed beam experiments, Future Generation Computer Systems, 20(5), 703-716 (2004).
- O. Gervasi, C. Dittamo, A. Laganà, A Grid Molecular simulator for E-science, Lecture Notes in Computer Science 3470, 16-22 (2005).

RESEARCH PROJECTS: CHEMISTRY COMPUTING ON THE NETWORK
• EU: Data grid, Digital libraries, ……; COST actions: D23 (1999) METACHEM, metalaboratories (virtual laboratories made of geographically dispersed laboratories) for complex computational chemistry applications; D37 (2004) GRIDCHEM, computational chemistry applications for Grid computing.
• NATIONAL: analogous projects funded with national resources.

THE CROSSED BEAM EXPERIMENT
MEASURABLES
- Angular and time-of-flight product distributions
INFORMATION OBTAINABLE
- Primary reaction products
- Reaction mechanisms
- Structure and lifetime of transients
- Internal energy distribution of products
- Key features of the potential of Perugia

The concurrent TRAJECTORY kernel
TRAJ (flow chart, closing with a return to the caller)
- Define quantities of general use
- Iterate over initial conditions the integration of individual trajectories (ABCTRAJ, etc.)
- Collect individual trajectory results

VIRTUAL MONITORS FOR COMPUTED PRODUCT ANGULAR DISTRIBUTIONS OF THE VARIOUS CHANNELS
[Figure panels: H + ICl → Cl + HI; H + ICl → H + ICl; H + ICl → HCl + I]

KNOWLEDGE FLOW OF GEMS, A GRID EMPOWERED MOLECULAR SIMULATOR
System input → Interaction → Dynamics → Statistics → Virtual Monitors

The INTERACTION module
[Flow chart: START, INTERACTION. Decision nodes (YES/NO): “Is there a suitable PES?”, “Are ab initio calculations available?”, “Are ab initio calculations feasible?”, “Are dynamics calculations direct?”. Action nodes: SUPSIM, “Import the PES routine”, FITTING, “Take a database force field”, DYNAMICS]

SUPSIM: the concurrent ab initio approach
SUPSIM
- Define the characteristics of the ab initio calculation, the coordinates used and the intervals of the variables
- Iterate over the system geometries the call of ab initio suites of codes (GAMESS, GAUSSIAN, MOLPRO, etc.)
- Collect the energy of each single molecular geometry
- return
L. Storchi, F. Tarantelli, A. Laganà, Computing molecular energy surfaces on the grid, Lecture Notes in Computer Science 3980, 675-683 (2006).

AB INITIO CALCULATIONS
• Methods
- wavefunction quantum approaches (MRCI)
- density functional theory (DFT)
• Programs: often standard packages
- ACADEMIC, like GAMESS-US
- COMMERCIAL, like GAUSSIAN

The FITTING module
[Flow chart: FITTING. Decision nodes (YES/NO): “Are asymptotic values accurate?”, “Are remaining values inaccurate?”, “Do ab initio values have the proper symmetry?”. Action nodes: “Modify asymptotic values”, “Modify short and long range values”, “Enforce the proper symmetry”, “Application using fitting programs to generate a PES routine”, Return]

The DYNAMICS module
DYNAMICS (output: OBSERVABLES)
- Exact quantum calculations? YES: QDYN, integration of the exact quantum dynamics equations
- NO: Approximate quantum calculations? YES: APPRQDYN, integration of the approximate or mixed QM and QC dynamics equations
- NO: Semiclassical calculations?
YES: SEMICLASSICAL, integration of the classical equations of motion and of the associated classical action; NO: CLASSICAL, integration of the classical equations of motion

The QDYN PROCEDURES
QUANTUM DYNAMICS (output: OBSERVABLES)
- Single initial quantum state? YES: TD, single initial state atom-diatom S matrix elements for several energies
- NO: Multiple initial quantum states? YES: TI, single energy atom-diatom S matrix elements for all initial states
- NO: State specific (summed over final states) or fully averaged: MCTDH, reactive flux, flux correlation function method; CRP, cumulative reaction probabilities and Transition State theory

The concurrent time dependent approach
TD
- Define quantities of general use
- Iterate over initial conditions the time propagation (RWAVEPR, CYLHYP, etc.)
- Collect single initial state S matrix elements
- return

The concurrent time independent approach
TI
- Define quantities of general use, including the integration bed
- Iterate over the reaction coordinate to build the interaction matrix
- Collect coupling matrix elements
- Broadcast the coupling matrix
- Iterate over total energy values the integration of the scattering equations
- Collect state-to-state S matrix elements
- return

The CLASSICAL PROCEDURES
CLASSICAL DYNAMICS (output: OBSERVABLES)
- Few single body problem? YES: VENUS, few body trajectory calculations
- NO: Few large body problem? YES: DL_POLY, GROMACS, various ensembles calculations
- NO: Many small body problem (fully averaged)? YES: DL_POLY, GROMACS, reduced degrees of freedom or simplified approaches

Using history files to rationalize mechanisms
RECROSSING IN OH + HCl → H2O + Cl

DIATOM-DIATOM REACTIVE PROCESSES

4 – FROM THE COMPCHEM VO TO CMST SSC
• Global approaches prompt collaboration, know-how sharing and service provision
• Collaboration prompts an evaluation of the commitment (including environmental care and social fairness) and of the productivity, as well as the establishment of an economy
A. Laganà, A.
Riganelli, O. Gervasi, On the structuring of the computational chemistry virtual organization COMPCHEM, Lecture Notes in Computer Science 3980, 665-674 (2006).

• COMPCHEM VO (http://compchem.unipg.it) is a virtual organization coordinated by the University of Perugia, running on the EGEE production Grid since the end of 2004
- 80 users (system, development, application)
- 8000 CPUs (~8% of the EGEE resources)
- Strong ties with two COST actions: D23 (METACHEM, 1999) and D37 (GRIDCHEM, 2005)
- Tight connections with other VOs of the Computational Chemistry cluster (e.g. GAUSSIAN)

• COMPCHEM ITALIAN Support sites
se.grid.unipg.it (UNI-Perugia)
se-01.grid.sissa.it (SISSA-Trieste)
gridsrm.ts.infn.it (INFN-Trieste)
prod-se-01.pf.infn.it, prod-se-01.pf.infn.it (INFN-Padova)
grid-e0-engine04.esrin.esa.int (ESA-esrin)
cmsdcache.pi.infn.it, gridse.pi.infn.it (INFN-Pisa)
grids.sns.it (SNS-Pisa)
aliserv1.ct.infn.it (INFN-Catania)
egse.frascati.enea.it, egse.cresco.portici.enea.it (GRISU.ENEA.Grid)
spacin-wn03.dna.unina.it (GRSU-SPACI-Napoli)
t2-dpm-01.na.infn.it (INFN-Napoli-Atlas)
grid2.fe.infn.it (INFN-Ferrara)
grid003.ca.infn.it (INFN-Cagliari)

• COMPCHEM EUROPEAN Support sites
plethon.grid.ucy.ac.cy (CY-01-Kimon)
grid05.lal.in2p3.fr, polgrid4.in2p3.fr (GRIF)
se02.marie.hellasgrid.gr, se01.marie.hellasgrid.gr (GR-06-iasa)
se01.grid.uoi.gr (GR-10-uoi)
se01.isabella.grnet.gr (HG-01-grnet)
se01.afroditi.hellasgrid.gr (HG-03-auth)
se01.kallisto.hellasgrid.gr (HG-04.cti-ceid)
se01.ariagni.hellasgrid.gr (HG-05.forth)
se01.athena.hellasgrid.gr (HG-06.ekt)
gridstore.cs.tcd.ie (csTCDie)
se.reef.man.poznan.pl (PSNC)
se2.egee.cesga.es (CESGA-EGEE)
se2.ppgrid1.rhu1.ac.uk (UKI-lt2-rhul)

COMPCHEM Applications
• COLUMBUS, Vienna (Austria): high-level ab initio molecular electronic structure calculations
• GAMESS-US, Catania (Italy): high-level ab initio molecular quantum chemistry
• ABC, Perugia (Italy), Budapest (Hungary): quantum time-independent reactive dynamics
• RWAVEPR, Perugia (Italy), Vitoria (Spain): quantum time-dependent reactive dynamics
• MCTDH, Barcelona (Spain): multi-configurational time-dependent Hartree method
• FLUSS, Barcelona (Spain): Lanczos iterative diagonalisation of the thermal flux operator
• DIFF REAL WAVE, Melbourne (Australia): quantum differential cross-section (work in progress)
• VENUS, Vitoria (Spain): classical mechanics cross sections and rate coefficients
• DL_POLY, Iraklion (Greece), Perugia (Italy): molecular dynamics simulation of complex systems
• CHIMERE, Perugia (Italy): chemistry and transport Eulerian model for air quality simulations

Consumption in millions of CPU hours
From the EGEE Accounting Portal at the Centro de Supercomputación de Galicia: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.html
The share of COMPCHEM

THE COMPCHEM MEMBERSHIP
1. USER
   PASSIVE: runs others’ programs
   ACTIVE: implements at least one program for personal usage
2. SW PROVIDER (from this level on one can earn credits)
   PASSIVE: implements at least one program for others’ usage
   ACTIVE: manages at least one implemented program for cooperative usage
3. HW PROVIDER
   PASSIVE: confers to the infrastructure at least a small cluster of processors
   ACTIVE: contributes to deploying and managing the structure
4. MANAGER (STAKEHOLDER): takes part in the development and the management of the virtual organization
• Further information at http://compchem.unipg.it

THE PLANNED SSC CMST
1. GATHER EXISTING VOs IN CHEMISTRY AND MATERIALS SCIENCE AND TECHNOLOGIES (COMPCHEM, GAUSSIAN, ….)
IN A SINGLE SSC (CMST)
2. ATTRACT NEW RESEARCH GROUPS AND LABORATORIES ACTIVE IN THE FIELD
3. REPRESENT THE RELATED VOs AT EGI USER FORUM AND STEERING COMMITTEE LEVEL
4. INTERACT WITH THE OPERATIONAL AND USER SUPPORT UNITS OF EGI
5. DESIGN A DEVELOPMENT STRATEGY FOR THE VOs OF THE AREA
6. PROVIDE TRAINING OPPORTUNITIES AND COORDINATE DISSEMINATION ACTIVITIES

5 – FURTHER GRIDIFICATION ACTIVITIES
APPLY THE DECOMPOSITION METHODS TO OTHER PROGRAMS AND USE GRID PORTALS

Recent papers in Lecture Notes in Computer Science
- A. Costantini, N. Faginas Lago, A. Laganà, F. Huarte, A Grid Implementation of Direct Semiclassical Calculations of Rate Coefficients, 5592, 93 (2009).
- A. Costantini, N. Faginas Lago, A. Laganà, F. Huarte, A Grid Implementation of Direct Quantum Calculations of Rate Coefficients, 5592, 104 (2009).
- A. Laganà, St. Crocchianti, Alessandro Costantini, Monica Angelucci, Marco Vecchiocattivi, A Grid Implementation of Chimere: Ozone Production in Central Italy, 5592, 115 (2009).
- A. Costantini, E. Gutierrez, J. Lope Cacheiro, A. Rodriguez, O. Gervasi, A. Laganà, Porting of the GROMACS package into the Grid Environment: testing of a new distribution strategy, 6019, 1-12 (2010).
- S. Rampino, F. Pirani, A. Laganà, E. Garcia, Accurate quantum dynamics on platforms: some effects of long range interactions on N+N2 reactivity, 6019, 41-52 (2010).

THE MCTDH METHOD
• Diagonalisation of the thermal flux operator, defined on a dividing surface, to build a reduced Krylov subspace (iterative diagonalisation by consecutive application of the thermal flux operator to a trial wave function). The outcome is a set of eigenvalues and eigenstates of the thermal flux operator.
• Time propagation of the thermal flux eigenstates employing MCTDH.
• Calculation of observables: k(T), N(E).
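The first step of the method (consecutive application of an operator to a trial vector, followed by diagonalisation in the resulting Krylov subspace) can be illustrated with a minimal Lanczos sketch in Python. This is not the FLUSS code. The operator is a hypothetical small symmetric matrix, chosen here with eigenvalues in ± pairs, standing in for the thermal flux operator. The names lanczos and toy_operator are illustrative assumptions, and full reorthogonalisation is used only to keep the sketch numerically simple.

```python
import numpy as np


def toy_operator(eigenvalues, seed=0):
    """Hypothetical stand-in operator: a real symmetric matrix with the
    given eigenvalues and a random orthonormal eigenbasis."""
    rng = np.random.default_rng(seed)
    n = len(eigenvalues)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q @ np.diag(eigenvalues) @ q.T


def lanczos(apply_op, trial, n_steps, tol=1e-12):
    """Build a Krylov subspace by consecutive application of a symmetric
    operator to a trial vector, then diagonalise the operator in it.

    Returns (ritz_values, ritz_vectors); ritz_vectors[i] approximates the
    eigenvector belonging to ritz_values[i].
    """
    q = np.asarray(trial, dtype=float)
    q = q / np.linalg.norm(q)
    basis, alphas, betas = [], [], []
    for step in range(n_steps):
        basis.append(q)
        w = apply_op(q)                  # one more application of the operator
        alphas.append(float(q @ w))      # diagonal element of the projected operator
        b = np.array(basis)
        w = w - b.T @ (b @ w)            # full reorthogonalisation against the basis
        beta = float(np.linalg.norm(w))
        if step == n_steps - 1 or beta < tol:
            break                        # subspace complete or exhausted
        betas.append(beta)
        q = w / beta
    # Tridiagonal representation of the operator in the Krylov basis.
    t = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    ritz_values, coeffs = np.linalg.eigh(t)
    ritz_vectors = coeffs.T @ np.array(basis)
    return ritz_values, ritz_vectors


if __name__ == "__main__":
    pairs = [-2.0, -0.5, -0.1, 0.1, 0.5, 2.0]   # illustrative values only
    matrix = toy_operator(pairs)
    values, vectors = lanczos(lambda v: matrix @ v, np.ones(len(pairs)), n_steps=6)
    print(values)
```

Each returned vector is independent of the others, so in a scheme of this kind the subsequent propagations could be run as separate jobs; how the actual programs organise this is not shown here.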
THE FLUSS PROGRAM: calculate the individual eigenfunctions
TIME INTEGRATION: distribute the individual propagations

FURTHER GENERALIZATION OF QUANTUM DYNAMICS
• Broaden the offering of cooperating/competing packages as web services
• Avoid electron-nuclei separation (Born-Oppenheimer) and generalize coordinates to N-body problems
• Introduce easy ways of composing packages

GENERALIZE GEMS WORKFLOWS
• Inter-job workflows
- Wrap the jobs
- Treat the jobs as objects
- Define composition rules and data links
• Intra-job workflows
- Define tools as for inter-job workflows via directives to be inserted inside the jobs

P-GRADE ABC workflow
Gridification of ABC (classical command line interface) with P-GRADE Grid Portal 2.7
- Generator: generates input files with different parameters
- Executor: executed in parallel as many times as there are parameter sets generated by the Generator
- Collector: collects all output files into a single TAR file

Performance Results of ABC
[Bar chart: Time (min) for ABC, “Time grid” versus “Time local”]
Execution of 4 ABC parameter study jobs for the F + HD reaction, varying jmax and rmax, on
- a local machine (P4 3.4 GHz, 1 GB RAM)
- 4 WMS-selected clusters that support the COMPCHEM VO
Better speed-up can be achieved with more parameter jobs

Performance Results of ABC
[Bar chart: Time (min) for ABC, “Time grid” versus “Time local”]
Execution of 500 parameter study jobs for the F + HD reaction on
- a local machine (P4 3.4 GHz, 1 GB RAM)
- WMS-selected clusters that support the COMPCHEM VO

6 – FORWARD LOOKING
DEVELOP A (COLLABORATIVE) GRID ECONOMY
• Service oriented approaches
• QoS and QoU
• Credit system and cost of services

CGW’09 Krakow (PL) – October 12-14, 2009
GriF: a collaborative tool for grid empowering of computational applications
• GriF is meant to make grid applications black-box like and to push grid computing to a higher level of transparency (Cloud Computing), in which better memory usage, reduced CPU and wall time consumption as well as an
optimized distribution of tasks over the grid are automatically achieved.
• GriF is a collaborative Java Service Oriented Architecture (SOA) framework which provides grid services aimed at exploiting the articulation of computational applications in sequential, concurrent or alternative paths on the EGI Grid by adopting SOA and Web Service standard technologies.
• GriF improves the grid by providing the VO or SSC users with standard operational modalities based on user-friendly, user-driven services. Moreover, GriF creates collaborations that add value for all parties involved, also by working with service providers, which can offer applications to users by composing one or more services without knowing their implementation details.
C. Manuali – A. Laganà, University of Perugia (IT)

GriF in the Grid scenario
The SOA organization consists essentially of two Java servers and one Java client. The two servers are YR (Yet a Registry, used to drive the initial discovery of the Web Services offered by the VO or the SSC) and YP (Yet a Provider, used to hold the VO or SSC Web Services). The client is YC (Yet a Consumer, used to interact with GriF in Wizard/Expert mode). In the top part of the figure, phases 1 and 2 show the service discovery and phases 3-7 show a typical program execution on the EGI Grid, in which the selected YP takes care of running the job on the associated User Interface (UI). The bottom part of the figure shows the grid proxy management and its interactions with YC.

GriF @ Work (Wizard Mode)
1 - Using the “Framework Management” tab to create the Grid Proxy and check the GriF status
2 - Using the “Wizard Mode” to start the Grid Job (Parametric Jobs on EGI for the ABC program), check the job’s status and retrieve the results

AIR POLLUTION SIMULATION
CPM10 concentration from CHIMERE-aerosols

Gas hydrates (clathrates): water hydrogen-bonded structures caging gas molecules
• Cl2
• H2S
• CO2
• CH4
• H2
• etc.

HYDROGEN HYDRATE

ACKNOWLEDGEMENTS
• CDK group, Dept. Chemistry, Perugia (Crocchianti, Faginas, Pacifici, Skouteris, Costantini, Rampino, Manuali)
• HPC group, Dept. Math&Inf, Perugia (Gervasi, Tasso)
• Qdyn group, COST D37 (Garcia, Huarte, Lendvay, Nyman, Balint-Kurti, Farantos)
• Other groups of COST D37
• COST-ESF, EU-FP7, MIUR (It), ESA funding

THANKS FOR YOUR ATTENTION