Enabling Grids for E-sciencE
User communities and applications
David Fergusson
28th February
www.eu-egee.org
INFSO-RI-508833

• What is the EGEE community?
  – Researchers in eScience (applications NA4)
  – eResearch
  – European community
  – World grid community
  – Industry (industry forum)
• What is not the EGEE community?

eScience/eResearch
• EGEE's initial focus is on specific scientific communities
  – High Energy Physics (Large Hadron Collider)
  – Biomedical
  – Geology
  – Chemistry
  – Astrophysics
• Collaborating with other EU projects in other areas
  – For example, digital libraries (DILIGENT)

Applications in EGEE
• Production service supporting multiple VOs with different requirements
  – Data
      Volume
      Location: distributed?
      Write once or update?
      Metadata archives?
      Controlled or open access?
  – Computation
      High throughput (~ current LCG)
      High performance, supercomputing
  – Number of sites, scientists, …
• Establish a viable general process to bring other scientific communities on board

An EGEE community
• EGEE communities are based around the idea of Virtual Organisations.
• A Virtual Organisation:
  – Owns shared computing resources
  – Authorises and authenticates its members' access to resources
  – Manages its own resources

EGEE: adding a VO
• EGEE has a formal procedure for adding selected new user communities (Virtual Organisations):
• Negotiation with one of the Regional Operations Centres
• Seek balance between the resources contributed by a VO and those that it consumes.
• Resource allocation will be made at the VO level.
• Many resources need to be available to multiple VOs: shared use of resources is fundamental to a Grid

The role of the pilot applications: HEP and Biomedicine
• Initial area of focus to establish a strong user base on which to build a broad EGEE user community
• Provide early feedback to the infrastructure activities on their experience with application deployment and VO management
• Act as guinea pigs and provide early feedback to the middleware developers on their experience with new services

EGEE pilot application: Large Hadron Collider
• Data Challenge:
  – 10 Petabytes/year of data!
  – 20 million CDs each year!
• Simulation, reconstruction, analysis:
  – LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors!
• Operational challenges
  – Reliable and scalable through a project lifetime of decades
[Figure labels: Mont Blanc (4810 m); Downtown Geneva]

The characteristics of pilot HEP applications
• Very large scale from project day 1
• Virtual Organizations were already set up at project day 1
• Very centralized: jobs are sent in a very organized way
• Multi-grid: data challenges are deployed on several grids
  – ALICE: LCG, Alien
  – ATLAS: LCG, US Grid2003, Nordugrid
  – CMS: LCG, US Grid2003
  – LHCb: LCG, Dirac

The Large Hadron Collider
http://www.cern.ch
[Figure labels: LHC; ~9 km; SPS; CERN]

The LHC Experiments
[Figure not preserved]

Overview of experiences with LHC data challenges
• There was continual evolution throughout 2004, with LCG and the experiments gaining more experience in the development and use of an expanding LCG grid
• All experiments had excellent relations with LCG-EIS support, a model for the future support of VOs
• Global job efficiencies ranged from 60% to 80% as experience developed; they must reach 90% or more for user analysis, which calls for new middleware developments and tighter operational procedures
• Sources of problems and losses
  – Site configuration, management and stability
  – Data management (especially metadata handling)
  – Difficult to monitor running jobs and causes of failure
• D0 in early 2005 showed that one can run with good efficiency on a set of well controlled sites

EGEE pilot application: BioMedical
• BioMedical
  – Bioinformatics (gene/proteome databases distributions)
  – Medical applications (screening, epidemiology, image databases distribution, etc.)
  – Interactive application (human supervision or simulation)
  – Security/privacy constraints
      Heterogeneous data formats
      Frequent data updates
      Complex data sets
      Long term archiving
http://egee-na4.ct.infn.it/biomed/applications.html

The characteristics of biomedical pilot applications
• Prototype level at project day 1
• VO was created after the project kicked off
• Very decentralized: application developers use the grid at their own pace
• Very demanding on services
  – Compute intensive applications
  – Applications requiring large numbers of short jobs
  – Need for interactivity or guaranteed response time
• Resources were focused on the deployment of large scale applications on LCG-2
  – Integration of Biomed VO used to identify issues relevant to all VOs to be deployed during EGEE lifetime
  – Decentralized usage of the infrastructure highlights different weaknesses from the more centralized HEP data challenges

Status of Biomedical VO
• RLS, VO LDAP Server: CC-IN2P3
• 4 RBs: CNAF, IFAE, LAPP, UPV
• 15 resource centres
• 17 CEs (>750 CPUs)
• 16 SEs
• 4 RBs
• 1 RLS
• 1 LDAP Server
[Map labels: PADOVA; BARI]

Biomedical VO: production jobs on EGEE
[Figure not preserved]

Biomedical applications
  – 3 batch-oriented applications ported to LCG-2
      SiMRI3D: medical image simulation
      xmipp_MLRefine: molecular structure analysis
      GATE: radiotherapy planning
  – 3 high throughput applications ported to LCG-2
      CDSS: clinical decision support system
      GPS@: bioinformatics portal (multiple short jobs)
      gPTM3D: radiology images analysis (interactivity)
  – New applications to join in the near future
      Especially in the field of drug discovery

EGEE pilot application: BioMedical
• BioMedical
  – Bioinformatics (gene/proteome databases distributions)
  – Medical applications (screening, epidemiology, image databases distribution, etc.)
  – Interactive application (human supervision or simulation)
  – Security/privacy constraints
      Heterogeneous data formats
      Frequent data updates
      Complex data sets
      Long term archiving
• BioMed applications deployed
  – GATE: Geant4 Application for Tomographic Emission
  – GPS@: genomic web portal
  – CDSS: Clinical Decision Support System

12 Biomed applications
• GATE: Geant4 Application for Tomographic Emission (LPC)
• Docking platform for tropical diseases: grid-enabled docking platform for in silico drug discovery (LPC)
• CDSS: Clinical Decision Support System (UPV)
• GPS@: Grid genomic web portal (IBCP)
• SiMRI 3D: Magnetic Resonance Image simulator (CREATIS)
• gPTM 3D: Interactive radiological image visualization and processing tool (LRI)
• xmipp_ML_refine: Macromolecular 3D structure analysis (CNB)
• xmipp_multiple_CTFs: Electron microscopic images CTF calculation (CNB)
• GridGRAMM: Molecular Docking web (CNB)
• GROCK: Mass screenings of molecular interaction (CNB)
• Mammogrid: Mammograms analysis (EU project)
• SPLATCHE: Genome evolution modeling (U. Berne/WHO)

...and more to come
• SPLATCHE: first application being migrated from GILDA to biomed VO
• Pharmacokinetics in MRI (UPV)
  – MRI registration for contrast agent diffusion study
• Some progress on biological sequences analysis (M. Lexa)
• ...

BLAST: comparing DNA or protein sequences
• BLAST is the first step in analysing new sequences: it compares DNA or protein sequences to others stored in personal or public databases. Ideal as a grid application.
  – Requires resources to store databases and run algorithms
  – Can compare one or several sequences against a database in parallel
  – Large user community

Bio-medicine applications
• Bio-informatics
  – Phylogenetics
  – Search for primers
  – Statistical genetics
  – Bio-informatics web portal
  – Parasitology
  – Data-mining on DNA chips
  – Geometrical protein comparison
• Medical imaging
  – MR image simulation
  – Medical data and metadata management
  – Mammography analysis
  – Simulation platform for PET/SPECT
Legend: Applications deployed; Applications tested; Applications under preparation
Image retrieval workflow:
1. Query the medical image database and retrieve a patient image
2. Compute similarity measures over the database images (submit 1 job per image)
3. Retrieve most similar cases
[Figure labels: Exam image patient key ACL ...; Medical Metadata images; Similar images; Low score images]

gPTM3D: Grid-Enabling Interactive Medical Analysis
[Figure labels: Interaction; Acquire; Explore; Analyse; Interpret; Render]

Use case
Planning percutaneous nephrolithotomy

Evolution of biomedical applications
• Growing interest of the biomedical community
  – Partners involved proposing new applications
  – New application proposals (in various health-related areas)
  – Enlargement of the biomedical community (drug discovery)
• Growing scale of the applications
  – Progressive migration from prototypes to pre-production services for some applications
  – Increase in scale (volume of data and number of CPU hours)
• Towards pre-production
  – Several initiatives to build user-friendly portals and interfaces to existing applications in order to open them to an end-user community

A look at the future: the HealthGrid vision
[Figure: HealthGRID diagram. Levels: Public Health; Patient; Tissue, organ; Cell; Molecule. Other labels: Association; Modelling; Computation; Patient related data; Databases; Computational recommendation; INDIVIDUALISED HEALTHCARE; MOLECULAR MEDICINE]
In this context "Health" does not involve only clinical practice but covers the whole range of information from the molecular level (genetic and proteomic information) through cells and tissues, to the individual and finally the population level (social healthcare).
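
The three-step image retrieval workflow from the bio-medicine applications slide (query a patient image, compute one similarity job per database image, retrieve the most similar cases) can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory `DATABASE`, the pixel values, the `similarity` score, and the use of local threads in place of grid jobs are all hypothetical and are not taken from the EGEE applications themselves.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the medical image database: image id -> flat pixel list.
DATABASE = {
    "case-a": [0, 10, 20, 30],
    "case-b": [0, 11, 19, 31],
    "case-c": [90, 80, 70, 60],
}


def similarity(query, candidate):
    """Illustrative score in (0, 1]: 1 / (1 + mean squared pixel difference)."""
    mse = sum((q - c) ** 2 for q, c in zip(query, candidate)) / len(query)
    return 1.0 / (1.0 + mse)


def score_job(job):
    """One independent job per database image ("submit 1 job per image")."""
    image_id, query, candidate = job
    return image_id, similarity(query, candidate)


def most_similar(query, database, top_k=2, workers=4):
    # Step 2: build one job per database image; threads stand in for grid jobs.
    jobs = [(image_id, query, pixels) for image_id, pixels in database.items()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score_job, jobs))
    # Step 3: rank by score and keep the most similar cases.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Step 1: the patient image returned by the database query (hypothetical values).
patient_image = [0, 10, 20, 31]
ranked = most_similar(patient_image, DATABASE)
print(ranked)
```

Because each score depends only on the query and one database image, the jobs are independent, which is what makes this kind of workload a natural fit for one-job-per-image submission.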
Earth Sciences in EGEE
• Research
  – Earth observations by satellite (ESA(IT), KNMI(NL), IPSL(FR), UTV(IT), RIVM(NL), SRON(NL))
  – Climate: DKRZ(GE), IPSL(FR)
  – Solid Earth Physics: IPGP (FR)
  – Hydrology: Neuchâtel University (CH)
• Industry
  – CGG: Geophysics Company (FR)

Climate Applications in EGEE
Model: Atmosphere, Ocean, Hydrology, Atmospheric and Marine chemistry…
Goal: Comparison of model outputs from different runs and/or institutes
Large volume of data (TB) from different model outputs, and experimental data
Run made on supercomputer => Link the EGEE infrastructure with supercomputer Grids (DEISA)
EXAMPLE: For the IPCC Assessment reports many experiments are performed with different models (different spatial resolution, different timestep, different "physics" ...) and at various sites. The generated data need to be compared in a comprehensive and "unified" way.

Geophysics Applications
Seismic processing Generic Platform:
- Based on Geocluster, an industrial application, to be a starter of the core member VO.
- Includes several standard tools for signal processing, simulation and inversion.
- Open: any user can write new algorithms in new modules (shared or not)
- Free for academic research
- Controlled by license keys (opportunity to explore license issues at a grid level)
- Initial partners: F, CH, UK, Russia, Norway

Flood simulation
[Figure labels: Sample Vah river; Computer vision; Geographical Information Systems; Results: flow + water depths]

Computational Chemistry: molecular simulator
1. SURFACE: Construction of the Potential Energy Surface
2. DYNAMICS: Dynamical properties calculation
3. PROPERTIES: Calculation of averaged quantities
4. Good results? (no: loop back; yes: end)
[Figure label: Ar - Benzene]

The MAGIC telescope
• Largest Imaging Air Cherenkov Telescope (17 m mirror dish)
• Located on the Canary Island of La Palma (@ 2200 m asl)
• Lowest energy threshold ever obtained with a Cherenkov telescope
Aim: detect γ-ray sources in the unexplored energy range: 30 (10) -> 300 GeV

The MAGIC Physics Program
• Pulsars
• AGNs
• Origin of Cosmic Rays
• SNRs
• Cosmological γ-Ray Horizon
• Tests of Quantum Gravity effects
• GRBs
• Cold Dark Matter

Feedback to LCG-2 middleware developers and infrastructure
• From HEP applications
  – Experiment Integration Support group and Grid Applications Group produced documents summarizing problems encountered in use of LCG-2
• From Biomed applications
  – Very significant exchanges related to the set-up of the biomed VO and the deployment of relevant services
  – Request to use MPI

Engineering applications
[Figures not preserved]

Grid Applications: art
• The Thomson flat scanner, developed in 1990
• 140,000 photo-archives digitised at 6,000 dots x 8,000 lines in 5 years (1996-2001)
• Books are being scanned at 767 MB per page: 1/2 Terabyte for the Gutenberg Bible
• Paintings are being scanned at 30 GB each in the EU CRISATEL Project
• Museo Virtual de Artes El Pais (MUVA): http://www3.diarioelpais.com/muva/

Who else can benefit from EGEE?
• EGEE Generic Applications Advisory Panel:
  – For new applications
• EU projects: MammoGrid, Diligent, SEEGRID …
• Expression of interest: Planck/Gaia (astroparticle), SimDat (drug discovery)
http://agenda.cern.ch/age?a042351
Next meeting at EGEE conference (November)

New communities identification
• Through training, dissemination and outreach, communities already using advanced computing and keen to use the EGEE infrastructure are identified
• These communities are encouraged to prepare a document describing their interest in using EGEE
• A scientific advisory panel (EGAAP) assesses the interested communities and selects those that seem most ready to deploy their applications on EGEE

GILDA, an infrastructure for dissemination and demonstration
• Goals
  – Demonstration of grid operation for tutorials and outreach
  – Initial deployment of new applications for testing purposes
• Key features
  – Initiative of the INFN Grid Project using LCG-2 middleware
  – On request, anyone can quickly receive a grid certificate and a VO membership allowing them to use the infrastructure for 2 weeks
  – Certificate expires after two weeks but can be renewed
  – Use of a friendly interface: Genius grid portal
• Very important for the first steps of new user communities onto the grid infrastructure

GILDA numbers
• 14 sites on 2 continents
• >1200 certificates issued, 10% renewed at least once
• >35 tutorials and demos performed in 10 months
• >25 jobs/day on average
• Job success rate above 96%
• >320,000 hits on the web site from tens of different countries
• >200 copies of the UI live CD distributed around the world

NA4 Applications and GILDA
• 7 Virtual Organizations supported:
  – Biomed
  – Earth Science Academy (ESR)
  – Earth Science Industry (CGG)
  – Astroparticle Physics (MAGIC)
  – Computational Chemistry (GEMS)
  – Grid Search Engines (GRACE)
  – Astrophysics (PLANCK)
• Development of complete interfaces with GENIUS for 3 Biomed Applications: GATE, hadronTherapy, and Friction/Arlecore
• Development of complete interfaces with GENIUS for 4 Generic Applications: EGEODE (CGG), MAGIC, GEMS, and CODESA-3D (ESR) (see demos!)
• Development of complete interfaces with GENIUS for 16 demonstrative applications available on the GILDA Grid Demonstrator (https://grid-demo.ct.infn.it)

Summary
• EGEE and grids: not just physics
• For communities to benefit, they need to know what grids can do for them (dissemination)
• Many communities are beginning to adopt the grid
• EGEE has a mechanism for assisting communities onto the grid

Practical URLs
• homepages.nesc.ac.uk/~gcw
• grid-demo.ct.infn.it