Particle Physics Grids - a help or a hindrance for other areas of eScience?
NeSC Opening, Edinburgh, 25 April 2002
David Williams, IT Division, CERN (also President, TERENA)
mailto:David.O.Williams@cern.ch
Homepage: cern.ch/David.Williams/
Slides: cern.ch/David.Williams/public/NeSCTalk.ppt

Not my own work
While I may generate some ideas, I am not responsible in any way for CERN's work on grids. It is a pleasure to acknowledge the people who do the real work and the people who let me use their slides, including: Fabrizio Gagliardi, Bob Jones, Neil Geddes, Ian Foster, Hans Hoffmann, Tony Hey, Olivier Martin (and anyone else that I unintentionally forgot).

Outline
– A bit about CERN and particle physics
– For PP, there is no alternative (to Grids)
– EU DataGrid (EDG)
– The EDG Testbed
– Other European projects
– Other non-European projects

ABOUT CERN AND THE GLOBAL PARTICLE PHYSICS COMMUNITY

This is (y)our laboratory
[Aerial views of the CERN site near Genève in 1954 and 2000, with the PS and the LHC experiments Atlas, CMS, Alice and LHCb marked.]

CERN's (Human) Network in the World
– 267 institutes in Europe, 4603 users
– 208 institutes elsewhere, 1632 users
[World map; some points = several institutes.]

Introduction
The Mission of CERN (1954): "The Organization shall provide for collaboration among European States in nuclear research of a pure scientific and fundamental character, and in research essentially related thereto. The Organization shall have no concern with work for military requirements and the results of its experimental and theoretical work shall be published or otherwise made generally available."
20 Member States (2000).

The ATLAS detector
[Cutaway view: 26 m long by 20 m in diameter.]

The ATLAS Collaboration – a "virtual" large science laboratory
– Objective: study proton-proton collisions at c.m. energies of 14 000 GeV
– Milestones: R&D experiments 1988-1995; Letter of Intent 1992; Technical Proposal 1994; Approval 1996; Start of construction 1998; Installation 2003-06; Exploitation 2007 - (~)2017
– Open, collaborative culture on a world scale

ATLAS (2)
– Scale: 2000 scientists, 150 institutes, 35 countries
– Cost: 475 MCHF material cost (1995 prices); CERN contribution < 15%
– Methodology: all collaborating institutes supply deliverables, namely software packages, detectors, electronics, simulation tasks, equipment of all kinds, engineering tasks, ...
– Some numbers: 150 M detector channels, 10 M lines of code, 1 M control points

Changing technology!
[Photos: the Computer Room in 1985 versus PC farms in 2000.]

Data coming out of our ears
When they start operating in 2007, each LHC experiment will generate several Petabytes of (already highly selected and compressed) data per year. There are four experiments. This is roughly 500x more data than we had to deal with from LEP in 2000. (A back-of-envelope sketch of this scale follows at the end of this section.)

Users from all over the world
– The users are from all over Europe, and all over the world
– They need (and like) to spend as much time as possible in their home institutes; that is especially true for the youngest researchers
– CERN cannot possibly afford to pay for "all" LHC computing and data storage
– If users are to make extra investments in computing and data storage facilities, they tend to prefer to make that investment "back home"

(Personal) Conclusion
Politics (and technological flexibility/uncertainty) force you to a globally distributed solution for handling your data, and for computing on it. You cannot operate such a system reliably at that scale without reliable grid middleware [i.e. you might be able to do it with 10-20 very smart graduate students per experiment as a "production team", but they would rise in revolt (or implement a Grid!)]. There is no alternative.
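To make the scale argument above concrete, here is a back-of-envelope sketch. The inputs (a few Petabytes per experiment per year, four experiments, roughly 500x LEP) come from the slides; the 3 PB figure and everything derived from it are illustrative assumptions, not quoted numbers.

```python
# Back-of-envelope sketch of the LHC data-volume argument above.
# Inputs from the talk: "several" PB per experiment per year, 4 experiments,
# roughly 500x the LEP data of 2000.  The value 3 PB is an assumption.

PB = 10**15                                # bytes in a (decimal) petabyte
experiments = 4
pb_per_experiment_per_year = 3             # assumed reading of "several"
seconds_per_year = 365 * 24 * 3600

total_bytes = experiments * pb_per_experiment_per_year * PB
avg_rate_mb_per_s = total_bytes / seconds_per_year / 10**6

print(f"Stored per year      : {total_bytes / PB:.0f} PB across the four experiments")
print(f"Average ingest rate  : ~{avg_rate_mb_per_s:.0f} MB/s sustained, before any replication")
print(f"LEP-scale comparison : roughly {total_bytes / 500 / 10**12:.0f} TB (1/500 of the above)")
```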
EDG

EU DataGrid (EDG) Project Objectives
– To build on the emerging Grid technology to develop a sustainable computing model for effective sharing of computing resources and data
– Specific project objectives: middleware for fabric & Grid management (mostly funded by the EU); a large-scale testbed (mostly funded by the partners); production-quality demonstrations (partially funded by the EU)
– To collaborate with and complement other European and US projects
– Test and demonstrator of Géant
– Contribute to open standards and international bodies: co-founder of the Global Grid Forum and host of GGF1 and GGF3; Industry and Research Forum for dissemination of project results

Objectives for the first year of the project
– Collect requirements for middleware, taking into account the requirements of the application groups
– Survey current technology, for all middleware areas
– Core services testbed – Testbed 0: Globus only (no EDG middleware)
– First Grid testbed release – Testbed 1: first release of the EDG middleware

Middleware work packages
– WP1: workload management – job resource specification & scheduling (a toy sketch of the matchmaking idea follows at the end of this section)
– WP2: data management – data access, migration & replication
– WP3: grid monitoring services – monitoring infrastructure, directories & presentation tools
– WP4: fabric management – framework for fabric configuration management & automatic software installation
– WP5: mass storage management – common interface for mass storage systems
– WP7: network services – network services and monitoring

Main Partners
– CERN – international (Switzerland/France)
– CNRS – France
– ESA/ESRIN – international (Italy)
– INFN – Italy
– NIKHEF – The Netherlands
– PPARC – UK

Assistant Partners
Industrial partners:
– Datamat (Italy)
– IBM-UK (UK)
– CS-SI (France)
Research and academic institutes:
– CESNET (Czech Republic)
– Commissariat à l'énergie atomique (CEA) – France
– Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
– Consiglio Nazionale delle Ricerche (Italy)
– Helsinki Institute of Physics – Finland
– Institut de Fisica d'Altes Energies (IFAE) – Spain
– Istituto Trentino di Cultura (IRST) – Italy
– Konrad-Zuse-Zentrum für Informationstechnik Berlin – Germany
– Royal Netherlands Meteorological Institute (KNMI)
– Ruprecht-Karls-Universität Heidelberg – Germany
– Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands
– Swedish Research Council – Sweden

Project scope
– 9.8 M Euros of EU funding over 3 years
– 90% for middleware and applications (HEP, EO and biomedical)
– ~50 "funded" staff (paid from EU funds) and ~70 "unfunded" FTEs
– Three-year phased developments & demos (2001-2003)
– Extensions (time and funds) on the basis of first successful results: DataTAG (2002-2003), CrossGrid (2002-2004), GridStart (2002-2004)

EDG TESTBED

Project Schedule
– Project started on 1/1/2001
– TestBed 0 (early 2001): international testbed 0 infrastructure deployed – Globus 1 only, no EDG middleware
– TestBed 1 (end 2001/early 2002): first release of EU DataGrid software to defined users within the project – HEP experiments, Earth Observation, biomedical applications
– Project successfully reviewed by the EU on 1 March 2002
– TestBed 2 (September-October 2002): builds on TestBed 1 to extend the facilities of DataGrid
– TestBed 3 (March 2003) & TestBed 4 (September 2003)
– Project completion expected by end 2003

TestBed 1 Sites Status
[Web interface showing the status of ~400 servers at the Testbed 1 sites.]
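To make the WP1 "job resource specification & scheduling" idea concrete, here is a minimal, purely illustrative sketch of the matchmaking a resource broker performs: a user describes a job and its requirements, sites publish what they offer, and the broker ranks the sites that qualify. The Python names below (Job, Site, match) are invented for illustration; this is not the EDG middleware or its actual job description language.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    """Information a testbed site might publish to the information service."""
    name: str
    free_cpus: int
    runtime_env: set          # e.g. installed experiment software tags
    close_datasets: set       # logical file names replicated nearby

@dataclass
class Job:
    """A user's job description: what to run and what it needs."""
    executable: str
    requirements_env: set
    input_data: set = field(default_factory=set)

def match(job: Job, sites: list) -> list:
    """Return candidate sites, best first: keep those that satisfy the
    requirements, then rank by how much input data is already close by,
    then by free CPUs (a crude stand-in for a real ranking expression)."""
    ok = [s for s in sites
          if s.free_cpus > 0 and job.requirements_env <= s.runtime_env]
    return sorted(ok,
                  key=lambda s: (len(job.input_data & s.close_datasets), s.free_cpus),
                  reverse=True)

if __name__ == "__main__":
    sites = [
        Site("cern", free_cpus=40, runtime_env={"ATLAS-sim"}, close_datasets={"lfn:run017"}),
        Site("ral",  free_cpus=10, runtime_env={"ATLAS-sim"}, close_datasets=set()),
        Site("lyon", free_cpus=0,  runtime_env={"ATLAS-sim"}, close_datasets={"lfn:run017"}),
    ]
    job = Job("atlas-sim.sh", requirements_env={"ATLAS-sim"}, input_data={"lfn:run017"})
    print([s.name for s in match(job, sites)])   # -> ['cern', 'ral']
```

In the real testbed the job descriptions, the published site information and the ranking expressions are of course far richer, but the match-then-rank pattern is the essential idea.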
Summary (Jones)
– Application groups' requirements defined and analysed
– Extensive survey of relevant technologies completed and used as a basis for EDG developments
– First release of the testbed successfully deployed
– Excellent collaborative environment developed with key players in the Grid arena
– The project can be judged by: the level of "buy-in" by the application groups; widespread usage of the EDG software; the number and quality of EDG software releases; positive influence on the development of GGF standards & the Globus toolkit

(SOME) RELATED EUROPEAN WORK

GRIDs – EU IST projects (~36 M Euro)
[Diagram: the EU IST Grid projects – EGSO, CROSSGRID, GRIP, EUROGRID, DAMIEN, GRIDSTART, GRIA, GRIDLAB, DATAGRID and DATATAG – arranged across applications, middleware & tools and underlying infrastructures, spanning industry/business and science.]

DataTAG project
[Map: the transatlantic DataTAG link between CERN/GEANT in Europe (UK SuperJANET4, IT GARR-B, NL SURFnet) and STAR-LIGHT/STAR-TAP in the US (Abilene, ESNET, MREN, New York).]

Some international relations
– BaBarGrid – UK deployment
– GridPP
To convince yourself that this is starting to be real, try to look at some of the GridPP demos today.

(SOME) RELATED NON-EUROPEAN WORK

Overview of GriPhyN Project
– GriPhyN basics: $11.9M (NSF) + $1.6M (matching); 5-year effort, started October 2000; 4 frontier physics experiments: ATLAS, CMS, LIGO, SDSS; over 40 active participants
– GriPhyN is funded primarily as an IT research project: 2/3 CS + 1/3 physics

GriPhyN Approach
– Virtual Data: tracking the derivation of experiment data with high fidelity; transparency with respect to location and materialization (a toy illustration follows after the iVDGL slides below)
– Automated grid request planning: advanced, policy-driven scheduling
– Achieve this at peta-scale magnitude

GriPhyN CMS SC2001 Demo
http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm
[Diagram: "Bandwidth-greedy Grid-enabled object collection analysis for particle physics" – full event databases of ~40,000 and ~100,000 large objects and a "tag" database of ~140,000 small objects, connected by requests and parallel tuned GSI FTP transfers.]
Work of Koen Holtman, J.J. Bunn, H. Newman, & others.

PPDG
DoE funding intended (primarily) for the hardware needed to exploit GriPhyN.

iVDGL: A Global Grid Laboratory
"We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science." (From the NSF proposal, 2001)
The International Virtual-Data Grid Laboratory is:
– a global Grid laboratory (US, Europe, Asia, South America, …)
– a place to conduct Data Grid tests "at scale"
– a mechanism to create common Grid infrastructure
– a laboratory for other disciplines to perform Data Grid tests
– a focus of outreach efforts to small institutions
The U.S. part is funded by NSF (2001-2006): $13.7M (NSF) + $2M (matching).

iVDGL Components
– Computing resources: 2 Tier1 laboratory sites (funded elsewhere); 7 Tier2 university sites (software integration); 3 Tier3 university sites (outreach effort)
– Networks: USA (TeraGrid, Internet2, ESNET), Europe (Géant, …); transatlantic (DataTAG), transpacific, AMPATH?, …
– Grid Operations Center (GOC): joint work with TeraGrid on GOC development
– Computer science support teams: support, test and upgrade the GriPhyN Virtual Data Toolkit
– Education and outreach
– Coordination, management
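A toy illustration of the "virtual data" idea from the GriPhyN Approach slide: a catalogue records how each data product can be derived, so a request is served either from an existing replica or by re-running the recorded transformation ("transparency with respect to location and materialization"). This is only a conceptual sketch in Python; it is not the GriPhyN Virtual Data Toolkit or its interfaces, and all names and locations are invented.

```python
from dataclasses import dataclass

@dataclass
class Derivation:
    """How a (virtual) data product can be produced: a transformation plus its inputs."""
    transformation: str        # e.g. name/version of the program that makes it
    inputs: tuple              # logical names of the input products
    parameters: dict

class VirtualDataCatalog:
    """Toy catalogue: remembers derivations and which products are materialized where."""
    def __init__(self):
        self.derivations = {}    # logical name -> Derivation
        self.replicas = {}       # logical name -> list of physical locations

    def register(self, name, derivation):
        self.derivations[name] = derivation

    def add_replica(self, name, location):
        self.replicas.setdefault(name, []).append(location)

    def request(self, name):
        """Serve from an existing replica if possible, otherwise return a
        plan to re-derive the product from the recorded derivation."""
        if self.replicas.get(name):
            return ("fetch", self.replicas[name][0])
        d = self.derivations[name]
        return ("derive", d.transformation, d.inputs, d.parameters)

# usage sketch (invented logical and physical names)
vdc = VirtualDataCatalog()
vdc.register("lfn:higgs-candidates-v3",
             Derivation("select-events/2.1", ("lfn:run017-esd",), {"cut": "2gamma"}))
vdc.add_replica("lfn:run017-esd", "gsiftp://cern.ch/data/run017-esd")
print(vdc.request("lfn:higgs-candidates-v3"))   # -> ('derive', 'select-events/2.1', ...)
print(vdc.request("lfn:run017-esd"))            # -> ('fetch', 'gsiftp://cern.ch/...')
```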
iVDGL Components (cont.)
– High level of coordination with DataTAG, the transatlantic research network (2.5 Gb/s) connecting the EU & US
– Current partners: TeraGrid, EU DataGrid, EU projects, Japan, Australia
– Experiments/labs requesting participation: ALICE, CMS-HI, D0, BaBar, BTEV, PDC (Sweden)

RELATIONS WITH OTHER BRANCHES OF e-SCIENCE

EU DataGrid is not just Particle Physics – biomedical applications
– Data mining on genomic databases (exponential growth)
– Indexing of medical databases (TB/hospital/year)
– Collaborative framework for large-scale experiments (e.g. epidemiological studies)
– Parallel processing for database analysis and complex 3D modelling

Biomedical (cont.)
I was personally very impressed by the serious analysis provided by the biomedical work package (Johan Montagnat at Lyon) at the EDG Review; it is accessible via http://www.edg.org. [Also by the security requirements analysis, Kelsey et al.]

Earth Observations
ESA missions:
– about 100 Gbytes of data per day (ERS 1/2)
– 500 Gbytes for the next ENVISAT mission (launched 1 March)
EO requirements for the Grid:
– enhance the ability to access high-level products
– allow reprocessing of large historical archives
– improve Earth science complex applications (data fusion, data mining, modelling, …)

Data structures
– Particle physics is actually a rather extensive user of databases (see the SC2001 demo slide)
– The 3 main issues are "performance, performance, performance"
– Each experiment agrees on its own unified data model, so we don't (typically) have to constantly federate separately curated databases in the way that is common to bioinformatics and many other branches of research
– But the experiment's model does evolve, on a roughly annual basis
– And the "end user" often wants to get hold of highly selected data on a "personal" machine and explore it (a small sketch of this access pattern appears further below)
– A pity that initial Grid developments assumed a very simplistic view of data structures
– Good that the UK is pushing ahead in this area; particle physics looks forward to using (and helping test) this work

Test beds
– The biggest pressure on the EDG testbed is to turn it into a bigger facility, offering more resources for "Data Challenges" and operating on a 24x7 basis
– It is potentially useful for other people wanting to run tests at scale (and is already being used for this)
– I suggest that this idea (a really large, configurable testbed) should become part of an FP6 Integrated Project
– PP will really push for production-quality middleware and software

TODAY'S KEY ISSUES
– Databases and/or data management integrated with grid middleware
– Security
– Reliability
– [Networking – everywhere]

Some "Large" Grid Issues (Geddes)
– Consistent transaction management
– Query (task completion time) estimation
– Queuing and co-scheduling strategies
– Load balancing (e.g. Self-Organizing Neural Network)
– Error recovery: fallback and redirection strategies
– Strategy for the use of tapes
– Extraction, transport and caching of physicists' object collections; Grid/database integration
– Policy-driven strategies for resource sharing among sites and activities; policy/capability tradeoffs
– Network performance and problem handling; monitoring and response to bottlenecks
– Configuration and use of new-technology networks, e.g. dynamic wavelength scheduling or switching
– Fault tolerance and performance of the Grid services architecture
(H. Newman, 13/3/02)

Creation and support of E-Science centres in Europe
[Diagram: Infrastructure – modulable testbeds; R&D agenda – Semantic Grid, databases, security; deployment with IT industry – software hardening; Globus, EuroGrid, GridLab etc.]
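A small sketch of the access pattern described under "Data structures" and in the SC2001 demo slide: scan a compact "tag" database of small per-event summaries, select the interesting events, and fetch only those (much larger) event objects to a personal machine. Purely illustrative Python; the names, cuts and the fetch step are hypothetical, not an EDG or GriPhyN interface.

```python
# Sketch of tag-based selection: scan small per-event "tag" records,
# pick the interesting events, then fetch only those full event objects.
# Hypothetical names and values throughout.

from dataclasses import dataclass

@dataclass
class Tag:
    event_id: int
    n_jets: int
    missing_et: float   # GeV

def select(tags, min_jets=2, min_missing_et=50.0):
    """Cheap selection over the small tag records (the 'tag' database)."""
    return [t.event_id for t in tags
            if t.n_jets >= min_jets and t.missing_et >= min_missing_et]

def fetch_events(event_ids, source="gsiftp://some-tier2/fullevents"):
    """Stand-in for the expensive step: pull only the selected large event
    objects (e.g. via parallel file transfer) to the user's own machine."""
    return {eid: f"{source}/{eid}" for eid in event_ids}

tags = [Tag(1, 3, 62.0), Tag(2, 1, 80.0), Tag(3, 4, 55.5), Tag(4, 2, 10.0)]
wanted = select(tags)                 # -> [1, 3]
local_copies = fetch_events(wanted)   # only the selected full events move over the network
print(wanted, local_copies)
```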
[Diagram: the proposed EGEE Integrated Project, "Enabling Grids for E-science in Europe", linking the national eScience centres and EIROforum; science outreach (applications in other sciences, consulting, prototyping, deployment) on one side and industry outreach (industry applications, consulting, training courses, a dissemination forum, SMEs developing Grid-enabled applications) on the other, built around tools and service development.]

and finally
Best wishes to NeSC. The UK should feel very pleased at the level of research-development-industry interaction that has been obtained via the eScience Programme.