Enabling Grids for E-sciencE
What are grid computing and e-Science?
Mike Mineter
mjm@nesc.ac.uk
www.eu-egee.org
INFSO-RI-508833
10 March 2005, NeSC

Acknowledgements
• This talk was prepared by Mike Mineter of NeSC and includes slides from previous tutorials and talks delivered by:
  – Dave Berry, Richard Hopkins (National e-Science Centre)
  – the EDG training team
  – Ian Foster, Argonne National Laboratory
  – Jeffrey Grethe, SDSC
  – EGEE colleagues

Goals of this module
• To introduce the concepts of e-Science and Grid computing, assuming no previous knowledge

Contents
• Definitions of e-Science and “a grid”
• Exploring the definitions
• Why now?!
• Some examples
• Current status of grids
• Are grids for you?!

What is…
• What is e-Science? Collaborative science that is made possible by the sharing across the Internet of resources (data, instruments, computation, people’s expertise...)
  – Often very compute intensive
  – Often very data intensive (both creating new data and accessing very large data collections)
  – Crosses organisational boundaries
• What is a Grid? “An infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources” (Ian Foster and Carl Kesselman)
UK e-Science Programme: http://www.rcuk.ac.uk/escience/

The Grid Metaphor
[Figure: Grid middleware connecting mobile access, workstations, visualising, supercomputers and PC clusters, data storage, sensors and experiments, over the Internet and other networks]

What is Grid computing?
• “A grid by any other name”, The Economist print edition, Dec 2nd 2004
  – “The next big thing in computing”
  – “No-one can agree what it is”
• Sometimes in industry: “grids” = clusters
  – Motivations: better use of resources; scope for commercial services
• Also used to refer to the harvesting of unused compute cycles (SETI@home, Climateprediction.net)
• In e-Research:
  – The grid vision is of “virtual computing” (+ information services and brokers to locate computation, storage resources). Cf. the web: “virtual documents” (+ search engine to locate them)
  – MOTIVATION: collaboration through sharing resources (and expertise) to expand horizons of research (and knowledge curation, discovery, and education)

Before Grids
• Researchers in many locations need to share resources
• FTP, telnet, blood, sweat and tears… and little support for collaboration
• Scientific instruments, data stores and computers in many locations

The Grid Vision
• Researchers in many locations need to share resources
• Resources connect to “The Grid”
• Scientific instruments, data stores and computers in many locations
Slide derived from EDG / LCG tutorials

A computer
• The Operating System enables easy use of
  – Input devices
  – Processor
  – Disks
  – Display

Resources on a LAN
Middleware runs on each computer:
• To allow sharing of disks and printers (using, e.g., Samba)
• To share processors for computation (e.g. Condor)
• User just perceives “shared resources”, with no regard to location in the building:
  – Authenticated by username / password
  – Authorised to use own files, …

Grid vision: Resources on the Internet
• Grid middleware runs on each collaborating resource
• Controlled by services of
  – Authentication
  – Authorisation
• User just perceives “shared resources”, with no regard to location or owning organisation
• Single sign-on
[Figure label: INTERNET]

Typical current grid
• Grid middleware runs on each shared resource
  – Data storage
  – (Usually) batch jobs on pools of processors
• Users join VOs (virtual organisations)
• Virtual organisation negotiates with sites to agree access to resources
• Distributed services (both people and middleware) enable the grid, allow single sign-on
[Figure label: INTERNET]
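
The relationship described for a typical current grid (users join VOs, and each VO negotiates access with sites) can be illustrated with a toy model. This is a minimal sketch only: the names `Grid`, `join_vo`, `agree_access` and `may_use`, and the example users, VO and site, are invented for illustration and do not correspond to any real grid middleware.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    """Toy model: users join VOs, and VOs negotiate access with sites."""
    vo_members: dict[str, set[str]] = field(default_factory=dict)  # VO -> users
    site_vos: dict[str, set[str]] = field(default_factory=dict)    # site -> VOs

    def join_vo(self, user: str, vo: str) -> None:
        """Record that a user has joined a virtual organisation."""
        self.vo_members.setdefault(vo, set()).add(user)

    def agree_access(self, vo: str, site: str) -> None:
        """Record that a site has agreed to give a VO access."""
        self.site_vos.setdefault(site, set()).add(vo)

    def may_use(self, user: str, site: str) -> bool:
        """A user may use a site if any VO admitted by the site lists the user."""
        return any(user in self.vo_members.get(vo, set())
                   for vo in self.site_vos.get(site, set()))


# Hypothetical example data
grid = Grid()
grid.join_vo("alice", "biomed")
grid.agree_access("biomed", "site-a")
print(grid.may_use("alice", "site-a"))
print(grid.may_use("bob", "site-a"))
```

The point of the design is that sites never list individual users: access follows from VO membership plus the VO's agreement with the site.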

Grid projects
Many Grid development efforts, all over the world

• UK – OGSA-DAI, RealityGrid, GeoDise, Comb-e-Chem, DiscoveryNet, DAME, AstroGrid, GridPP, MyGrid, GOLD, eDiamond, Integrative Biology, …
• Netherlands – VLAM, PolderGrid
• Germany – UNICORE, Grid proposal
• France – Grid funding approved
• Italy – INFN Grid
• Eire – Grid proposals
• Switzerland – Network/Grid proposal
• Hungary – DemoGrid, Grid proposal
• Norway, Sweden – NorduGrid

• NASA Information Power Grid
• DOE Science Grid
• NSF National Virtual Observatory
• NSF GriPhyN
• DOE Particle Physics Data Grid
• NSF TeraGrid
• DOE ASCI Grid
• DOE Earth Systems Grid
• DARPA CoABS Grid
• NEESGrid
• DOH BIRN
• NSF iVDGL

• DataGrid (CERN, ...)
• EuroGrid (Unicore)
• DataTag (CERN, …)
• Astrophysical Virtual Observatory
• GRIP (Globus/Unicore)
• GRIA (Industrial applications)
• GridLab (Cactus Toolkit)
• CrossGrid (Infrastructure Components)
• EGSO (Solar Physics)

Exponential Growth: Performance per Dollar Spent
[Figure: performance per dollar spent against number of years (0 to 5), with curves labelled by doubling time in months]
• Optical fibre (bits per second): doubling time 9 months, Gilder’s Law (32X in 4 yrs)
• Data storage (bits per sq. inch): doubling time 12 months, Storage Law (16X in 4 yrs)
• Chip capacity (# transistors): doubling time 18 months, Moore’s Law (5X in 4 yrs)
Source: Triumph of Light – Scientific American. George Stix, January 2001
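
The link between a doubling time and a growth factor is a single line of arithmetic: a quantity that doubles every T months grows by a factor of 2^(48/T) in four years. The sketch below assumes ideal exponential growth and takes the doubling times of 9, 12 and 18 months from the figure; the function name `growth_factor` is illustrative. The idealised results are of the same order as the factors labelled on the figure, which remain the source's own values.

```python
def growth_factor(doubling_months: float, years: float = 4.0) -> float:
    """Growth over `years` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (12.0 * years / doubling_months)


# Doubling times as read from the figure (assumed mapping of curve to value)
for label, months in [("optical fibre", 9),
                      ("data storage", 12),
                      ("chip capacity", 18)]:
    factor = growth_factor(months)
    print(f"{label}: doubling every {months} months -> {factor:.1f}X in 4 years")
```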

How Different 2005 is from 1995
• Enormous quantities of data: Petabytes
  – For an increasing number of communities
  – Constraint is not collection but analysis
• Ubiquitous Internet: >100 million hosts
  – Security and Trust are crucial issues
• Ultra-high-speed networks: >10 Gb/s
  – Global optical networks
  – Bottlenecks: last kilometre & firewalls
• Huge quantities of computing: >100 Top/s
  – Moore’s law gives us all supercomputers
  – Organising their effective use is the challenge
• Moore’s law everywhere
  – Instruments, detectors, sensors, scanners, …
  – Organising their effective use is the challenge

Global Drivers of e-Science
• Collaboration: enabling people to work together
  – With security and flexibility for otherwise unattainable benefits
  – For example: to share instruments, databases, or computation; to serve occasional peaks of high demand for computation (especially trivially parallelisable ones) from collaborators
• Digital technology: exponential growth
• “Data deluge”
• Consequent research investment
  – UK e-Science programme
  – EU e-Infrastructure
  – USA cyberinfrastructure
• Industry investment: potential for dynamic accountable use of resources

What is e-Infrastructure – Political view
• A shared resource
  – That enables science, research, engineering, medicine, industry, …
  – It will improve UK / European / … productivity (Lisbon Accord 2000; E-Science Vision SR2000 – John Taylor)
  – Commitment by UK government (Sections 2.23-2.25)
  – Always there (cf. telephones, transport, power, internet)

Example: Astronomy
No. & sizes of data sets as of mid-2002, grouped by wavelength
• 12 waveband coverage of large areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins University

Example: Earth Observation
ESA missions:
• 100’s of Gbytes of data per day
Grid contribution to EO:
• Enhance the ability to access high level products
• Allow reprocessing of large historical archives
• Improve Earth science complex applications (data fusion, data mining, modelling …)
Federico.Carminati, EU review presentation, 1 March 2002. Derived from: L. Fusco, June 2001

Example: Wearable Devices
[Figure labels: sensor bus, GPS aerial]
• Sensors
• Wireless connection
• Positioning information from GPS
• Mobile medical technologies
• Environmental sensing (air pollution)

Connecting people: Access Grid
[Figure labels: cameras, microphones]

DAME: Grid based tools and Infrastructure for Aero-Engine Diagnosis and Prognosis
[Figure labels: Engine flight data; London Airport; Airline office; New York Airport; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center; XTO; Engine Model; Case Based Reasoning; Signal Data Explorer]
• “A significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DS&S. Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed”, DS&S 2004
Companies: Rolls-Royce, DS&S, Cybula
Universities: York, Leeds, Sheffield, Oxford

BLAST – comparing DNA or protein sequences
• BLAST is the first step for analysing new sequences: it compares DNA or protein sequences with others stored in personal or public databases.
• Ideal as a grid application
  – Trivial to parallelise as independent concurrent jobs
  – Requires resources to store databases and run algorithms
  – Large user community

The LCG
• LHC Computing Grid (LHC = Large Hadron Collider)
• Largest current grid – some of its middleware being included in the NGS (the UK National Grid Service)
• One of the initial applications for EGEE and for one of its predecessors, the European DataGrid
• 1000’s of scientists sharing resources to provide the computation / storage needed for the LHC from 2007
• Sustainable, dependable service is vital

The CERN Large Hadron Collider
http://www.cern.ch
[Figure labels: LHC, SPS, CERN; scale ~9 km]

The LHC Experiments
[Figure labels: ATLAS, CMS, LHCb]
• ~10-15 PetaBytes/year
• ~10^8 events/year
• ~10^3 batch and interactive users

Orders of magnitude…
Per experiment:
• 40 million collisions per second
• After filtering, 100 collisions of interest per second
• A Megabyte of digitised information for each collision = recording rate of 0.1-1 Gigabytes/sec
• 1 billion collisions recorded = 1-3 Petabyte/year
Total: ~10,000,000,000,000,000 bytes/year (about 10 PB) = 1% of world annual information production

Scale:
• 1 Megabyte (1 MB): a digital photo
• 1 Gigabyte (1 GB) = 1000 MB: a DVD movie
• 1 Terabyte (1 TB) = 1000 GB: world annual book production
• 1 Petabyte (1 PB) = 1000 TB: 10% of the annual production by LHC experiments
• 1 Exabyte (1 EB) = 1000 PB: world annual information production
[Figure labels: CMS, LHCb, ATLAS, ALICE]

Computing Resources: Feb 2005
[Map legend: country providing resources; country anticipating joining]
In LCG-2:
• 113 sites, 30 countries
• >10,000 CPUs
• ~5 PB storage
Includes non-EGEE sites:
• 9 countries
• 18 sites

Current status
• Many key concepts identified and known
• Many grid projects have tested these
• Major efforts now on:
  – Establishing standards (a slow process)
  – Establishing production Grids (too urgent to wait for standards, and standards need to emerge from experience). “Production” = reliable, sustainable, with commitments to quality of service
  – Establishing new user communities: need to prove this is technology with widespread potential, much more than for the few disciplines that helped to create it
  – Whilst research & development continues
• In Europe, EGEE
• In UK, NGS (interoperable, at least, with EGEE)
• In US, TeraGrid

Grid security and trust - 1
• Providers of resources (computers, databases, ...) need risks to be controlled: they are asked to trust users they do not know
• Users need single sign-on: log on to a machine that can pass the user’s identity to other resources
• Build middleware on a layer providing:
  – Authentication: know who wants to use resource
  – Authorisation: know what the user is allowed to do
  – Security: reduce vulnerability, e.g. from outside the firewall
  – Non-repudiation: knowing who did what
• “GSI” (Grid Security Infrastructure) from the Globus toolkit does this for NGS

Grid security and trust - 2
• Currently, achieved by certification:
  – User’s identity has to be certified by (mutually recognized) national Certification Authorities (CAs)
  – Resources (node machines) have to be certified by CAs
  – Digital certificate installed on the machine accessed by the user: basis of AA (authentication and authorisation)
  – Identity passed to other resources you use, where it is mapped to a local account; the mapping is maintained by the VO
• User joins VOs
• Common agreed policies establish rights for a Virtual Organization to use resources
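
The certification steps above can be sketched as a small lookup: a resource accepts an identity only if it was certified by a CA the resource recognises, and then maps that identity to a local account. This is a toy illustration, not GSI itself: no cryptography is performed, and the names `Certificate`, `TRUSTED_CAS`, `ACCOUNT_MAP` and `map_to_local_account`, along with the example identities and accounts, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Certificate:
    subject: str  # the certified identity of the user
    issuer: str   # the Certification Authority that certified it


# Hypothetical data held by a resource: recognised CAs, and the
# identity -> local account mapping (maintained on behalf of the VO).
TRUSTED_CAS = {"Example National CA"}
ACCOUNT_MAP = {"/O=ExampleOrg/CN=Alice Example": "vo_user01"}


def map_to_local_account(cert: Certificate) -> Optional[str]:
    """Return the local account for a certified identity, or None if refused."""
    if cert.issuer not in TRUSTED_CAS:
        # Authentication fails: the certifying CA is not recognised here.
        return None
    # Authorisation: only identities present in the mapping get an account.
    return ACCOUNT_MAP.get(cert.subject)


alice = Certificate("/O=ExampleOrg/CN=Alice Example", "Example National CA")
print(map_to_local_account(alice))
```

A real deployment would verify the certificate's signature chain before trusting either field; the sketch only shows where the two decisions (who is this, and what may they use) sit.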

1997 - Present: Globus
• A software toolkit addressing certain technical problems in the development of Grid enabled tools, services, and applications
  – Offers a modular “bag of technologies”
  – Made available under liberal open source license
• Not turnkey solutions, but building blocks and tools for application developers and system integrators
• Tools built on GSI include:
  – Job submission (GRAM): run a job on a remote computer
  – Information services: so I know which computer to use
  – File transfer (GridFTP): so large data files can be transferred
  – Replica management: so I can have multiple copies of a file “close” to the computers where I want to run jobs
• http://www.globus.org/

Current status – continued
• Grid technology is developing!
• Non-trivial for new users and new application areas to start!
• Hence need major commitments
  – to training
  – to supporting new user communities
  – to establishing procedures for new VOs
• Social as well as technical issues
  – Commitments to collaboration
  – From systems admin as well as researchers
  – Negotiation with operations and VO managers (see later)

The key for new VOs
[Layer diagram, top to bottom: Application; Application toolkits, standards; Middleware: “collective services”; Basic Grid services: AA, job submission, info, …]
• Application development environment
• Insulate applications from changing middleware
• Build distributed applications from components
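
One way to read “insulate applications from changing middleware” is as an interface that application code depends on, with the middleware-specific part hidden behind it. The sketch below is an assumption about how such a layer might look, not a description of any real toolkit: `JobService`, `MiddlewareA`, `MiddlewareB` and `run_analysis` are invented names, and the “middleware” classes only return strings rather than submitting anything.

```python
from typing import Protocol


class JobService(Protocol):
    """What the application needs from any middleware (hypothetical interface)."""

    def submit(self, command: str) -> str:
        ...


class MiddlewareA:
    """Stand-in for one middleware release; returns a fake job handle."""

    def submit(self, command: str) -> str:
        return f"A:{command}"


class MiddlewareB:
    """Stand-in for a different or later middleware; same interface."""

    def submit(self, command: str) -> str:
        return f"B:{command}"


def run_analysis(service: JobService, dataset: str) -> str:
    # Application code: unchanged when the middleware underneath is swapped.
    return service.submit(f"analyse {dataset}")


print(run_analysis(MiddlewareA(), "run42"))
print(run_analysis(MiddlewareB(), "run42"))
```

Only the small adapter classes would need rewriting when the middleware changes; `run_analysis` and anything built on it stay as they are.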

Are Grids for you?! - 1
• IF a community effort is vital to achieving goals, by sharing services of data and computation,
• AND that effort crosses organisation boundaries
• THEN yes!
• In the UK, negotiate to join the NGS!
• OR if you wish to use computation/storage/data services provided on a Grid – then YES!

Are Grids for you? - 2
• Suggestions for research disciplines not already engaged with grid computing
  – Identify your “Virtual Organisations”
    What are the drivers for collaboration?
    What are the VO characteristics?
    • Fixed relationships?
    • Short-lived or constant requirements for resources?
    • Sharing results or “just” sharing resources?
    What services (data, computation) would you want to share?!
  – What remote resources (computers, databases, instruments…) do you need to access?

Summary
• Collaboration across multiple organisations
• Single sign-on to resources in multiple organisations
• Need for people-services as well as middleware services to enable this, e.g. to run
  – Enabling services (e.g. info service)
  – Certification authority for AA
  – VO management – to negotiate with sites
  – Helpdesk, …
• The drive is towards
  – Production services: in the UK, the NGS; in Europe, EGEE
  – Standards – (tomorrow)
  – “e-Infrastructure” ~ integration of networking and middleware to support collaboration