Enabling Grids for E-sciencE
What is Grid Computing?
Richard Hopkins, rph@nesc.ac.uk
NGS Induction – Rutherford Appleton Laboratory, 2nd / 3rd November 2005
www.eu-egee.org
INFSO-RI-508833

Acknowledgements
• This talk was prepared by Richard Hopkins of NeSC and includes slides from previous tutorials and talks delivered by:
– Dave Berry, Mike Mineter, Guy Warner (National e-Science Centre)
– the EDG training team
– Ian Foster, Argonne National Laboratory
– Jeffrey Grethe, SDSC
– EGEE colleagues
– Mark Baker, The Distributed Systems Group, University of Portsmouth, http://dsg.port.ac.uk/mab
• Talks at 3rd EGEE conference by
– Kyriakos Baxevanidis, Deputy Head, Unit of Research Infrastructures, European Commission, DG INFSO
– Dr Spyros Konidaris, European Commission – DG INFSO
INFSO-RI-508833 NGS Induction, RAL Nov 2nd/3rd 2005 - What is Grid Computing – Richard Hopkins

Goals and Content
Goal - To introduce the concepts of Grid computing assuming no previous knowledge
• What is “a grid”?
• Drivers of grid computing
• Current status of grids

The Grid Metaphor
[Figure labels: Mobile Access; Workstation; GRID MIDDLEWARE; Supercomputer, PC-Cluster; Data-storage, Sensors, Experiments; Visualising; Internet, networks]

The grid vision
• The grid vision is of “Virtual computing” (+ information services to locate computation, storage resources)
– Compare: The web: “virtual documents” (+ search engine to locate them)
• MOTIVATION: collaboration through sharing resources (and expertise) to expand horizons of
– Research
– Commerce – engineering, … “the knowledge economy”
– Public service – health, environment, …

“A grid”
• The initial vision: “The Grid”
• The present reality: Many “grids”
• Each grid is an infrastructure enabling one or more “virtual organisations” (VOs) to share computing resources
• What’s a VO?
– People in different organisations seeking to cooperate and share resources across their organisational boundaries
• Why establish a Grid?
– Share data
– Share computers
– Share instruments
– Collaborate
[Figure labels: VO; Institute A; Institute B; Institute C; Institute D]

Single Computer
• The Operating System enables easy use of
– Input/Output devices
– Processor
– Disks
– Display
– Instruments
[Figure: layers, top to bottom: Application Software; Operating System; Disks, Processor, Memory, …]

Local Area Network
• User just perceives “shared resources”, with no regard to location in the organisation
• LAN resources act like a single virtual computer
• Middleware (LAN O/S) presents that image
[Figure: layers, top to bottom: Application Software; Middleware for sharing computers, servers, printers, …; Operating System on each computer; Resources connected by a LAN]

A grid
• Users join VOs
• Virtual organisation negotiates with sites to agree access to resources
• Distributed services (both people and middleware) enable the grid
[Figure label: INTERNET]

Grid
• Grid middleware creates the image of the Grid being a single virtual computer (ideally)
Issues
• Heterogeneity – hardware, software, culture
• Scalability
• Reliability – tolerate permanent partial failure
• Viable computing model: batch processing
• Access control
– Authentication
– Authorisation
– Single sign on
[Figure: layers, top to bottom: Application Software; Interface between app. and grid; Grid Middleware: “collective services”; Grid Middleware on each resource; Operating System on each resource; Resources connected by internet]

What characterises a grid?
• Co-ordinated resource sharing
– No centralised point of control
– Different administrative domains
• Standard, open, general-purpose protocols and interfaces
– NOT specific to an application
– EGEE, NGS support multiple VOs
• Delivering non-trivial qualities of service
– Co-ordinated to deliver combined services, greater than sum of the individual components
• http://www.gridtoday.com/02/0722/100136.html

The components of a Grid
• Resources – networking, computers, storage, data, instruments, …
• Grid Middleware – the “operating system of the grid”
• Operations infrastructure
– Run enabling services (people + software)
• Virtual Organization management
– Procedures for gaining access to resources

DRIVERS OF GRID COMPUTING

The first driver: e-Science
• What is e-Science? Collaborative science that is made possible by the sharing across the Internet of resources (data, instruments, computation, people’s expertise, ...)
– Often very compute intensive
– Often very data intensive (both creating new data and accessing very large data collections) – data deluges from new technologies
– Crosses administrative boundaries
• Examples…

Astronomy
No. & sizes of data sets as of mid-2002, grouped by wavelength
• 12 waveband coverage of large areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1 billion objects
Data and images courtesy Alex Szalay, Johns Hopkins University

Large Hadron Collider at CERN
• Data Challenge:
– 10 Petabytes/year of data!
– 20 million CDs each year!
• Simulation, reconstruction, analysis:
– LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors!
• Operational challenges
– Reliable and scalable through project lifetime of decades
[Figure labels: Mont Blanc (4810 m); Downtown Geneva]

BLAST gridification
[Figure labels: Input file (sequences Seq1, Seq2, … Seqn); UI; Computing element; BLAST; DB; RESULT]
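The BLAST gridification figure appears to show one input file of many sequences being divided among several computing elements, each holding the database, with the partial outputs merged into a single result. A minimal Python sketch of that split-and-merge pattern follows. Everything in it is an assumption made for illustration: the function names are hypothetical, the input uses the common FASTA convention of a `>name` header line, and `fake_search` is a toy stand-in for BLAST (it only counts database entries containing the first four letters of each query). No real grid middleware or BLAST call is involved.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence letters)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-style text ('>name' header lines) into records."""
    records: List[Record] = []
    name = None
    parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            name = line[1:].strip()
            parts = []
        elif name is not None:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))
    return records


def split_round_robin(records: List[Record], n_jobs: int) -> List[List[Record]]:
    """Deal the records out into at most n_jobs non-empty sub-jobs."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    jobs = [records[i::n_jobs] for i in range(n_jobs)]
    return [job for job in jobs if job]


def fake_search(job: List[Record], database: List[str]) -> Dict[str, int]:
    """Toy stand-in for BLAST on one computing element (hypothetical)."""
    return {
        name: sum(1 for entry in database if seq[:4] in entry)
        for name, seq in job
    }


def run_gridified(text: str, database: List[str], n_jobs: int) -> Dict[str, int]:
    """Split the input, search each sub-job independently, merge the results."""
    merged: Dict[str, int] = {}
    for job in split_round_robin(parse_fasta(text), n_jobs):
        merged.update(fake_search(job, database))
    return merged
```

Because each sequence is searched independently, the merged result does not depend on how many sub-jobs the input is split into. That independence is what makes this kind of workload a natural fit for the batch-processing model described earlier in the talk.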
DAME: Grid based tools and Infrastructure for Aero-Engine Diagnosis and Prognosis
• “A significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DS&S. Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed”, DS&S 2004
[Figure labels: Engine flight data; London Airport; Airline office; New York Airport; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center; XTO; Engine Model; Case Based Reasoning]
Companies: Rolls-Royce, DS&S, Cybula
Universities: York, Leeds, Sheffield, Oxford

Academic drivers: not only e-science!
The impact of grids when they support…
[Figure labels: Curation, discovery, reuse of knowledge; e-Research; e-Science]

Academic drivers
Derived from a slide by the UK’s JISC
• E-research
• Digital libraries
• Centrality of curation, preservation
• Under-recognised by many researchers
• Virtual Digital Data Libraries needed for research as well as learning
• E-learning
• AAA Services
• e-Infrastructure

Political drivers
• Entering the “knowledge society” from the “industrial society”
– Industrial society = Transportation infrastructure
– Knowledge society = Communications infrastructure
• Lisbon strategy: Research and Innovation will be the most important factors in determining Europe’s success through the next decades
• THE GOAL: “UNLEASH CREATIVITY” – by investment in
– Human skills
– Infrastructures
• Growth of e-infrastructure (= networks + grid + operations)
– phase 1: mainly academia, some in industry: “an elite, privileged to do this job”
– phase 2: ordinary people doing distributed work; SMEs adopt, adapt and use
– phase 3: the next generations
  Will transform e-infrastructure and its uses
  We don’t know how others will use what we devise
  Just as current use of WWW was not predictable by its initiators

EGEE – building e-infrastructure
EGEE is building a large-scale production grid service to:
• Underpin research, technology and public service
• Link with and build on national, regional and international initiatives
• Foster international cooperation both in the creation and the use of the e-infrastructure
[Figure labels: Pan-European Grid; Operations, Support and training; Collaboration; Network infrastructure & Resource centres]

CURRENT STATUS OF GRIDS

If “The Grid” vision leads us here…
… then where are we now?
Grid projects
Many Grid development efforts, all over the world
• UK – OGSA-DAI, RealityGrid, GeoDise, Comb-e-Chem, DiscoveryNet, DAME, AstroGrid, GridPP, MyGrid, GOLD, eDiamond, Integrative Biology, …
• Netherlands – VLAM, PolderGrid
• Germany – UNICORE, Grid proposal
• France – Grid funding approved
• Italy – INFN Grid
• Eire – Grid proposals
• Switzerland – Network/Grid proposal
• Hungary – DemoGrid, Grid proposal
• Norway, Sweden – NorduGrid
• NASA Information Power Grid
• DOE Science Grid
• NSF National Virtual Observatory
• NSF GriPhyN
• DOE Particle Physics Data Grid
• NSF TeraGrid
• DOE ASCI Grid
• DOE Earth Systems Grid
• DARPA CoABS Grid
• NEESGrid
• DOH BIRN
• NSF iVDGL
• DataGrid (CERN, ...)
• EuroGrid (Unicore)
• DataTag (CERN, …)
• Astrophysical Virtual Observatory
• GRIP (Globus/Unicore)
• GRIA (Industrial applications)
• GridLab (Cactus Toolkit)
• CrossGrid (Infrastructure Components)
• EGSO (Solar Physics)

Grids: where are we now?
• Many key concepts identified and known
• Many grid projects have tested, and benefit from, these
• Major efforts now on establishing:
– Standards (a slow process) (e.g. Global Grid Forum, http://www.gridforum.org/)
– Production Grids for multiple VOs
  “Production” = Reliable, sustainable, with commitments to quality of service
  • In Europe, EGEE
  • In UK, National Grid Service
  • In US, TeraGrid
  One stack of middleware that serves many research (and other!) communities
  Operational procedures and services (people!, policy, ...)
– New user communities
• … whilst research & development continues

The key for new VOs
[Figure: layers, top to bottom: Application; Application toolkits, standards; Middleware: “collective services”; Basic Grid services: AA, job submission, info, …]
• The tools, services used by the VO’s applications
• Application development environment, portals, semantics
• Insulate applications from changing middleware

The vision of 2001: convergence of Web Services and Grids
[Figure labels: Open Grid Services Architecture; Web services; World-wide web; OGSI; Grid prototypes; High-end computing; High-throughput computing; Massively parallel computing; INTERNET]

Key concepts
• Virtual organisation:
– people and resources collaborating across admin, organisational boundaries
– Individual joins VO
– VO negotiations with resource providers
• Grid middleware
– running on each resource to interface it to the Grid
– providing specific services
• Single Virtual Computer
– User just perceives “shared resources” with no concern for location or owning organisation
– Issues: Heterogeneity, Scalability, Reliability, Computing model, Access control

Key Concepts
• Drives are towards
– Production services (reliable, sustainable, … against which research projects can plan with confidence)
  In Europe, EGEE
  In UK, National Grid Service
– Standards & convergence with WWW mainstream
– Empowering new user communities
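
The virtual organisation concept summarised in the key concepts (an individual joins a VO, the VO negotiates with resource providers, and access control decides who may use what) can be illustrated with a toy model. The Python sketch below is only an assumption-laden illustration: the class and method names are hypothetical, it does not reflect any particular grid middleware, and it covers authorisation only, assuming that authentication (for example through single sign-on) has already established who the user is.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class VirtualOrganisation:
    """A VO: people from different organisations who agree to share resources."""
    name: str
    members: Set[str] = field(default_factory=set)

    def join(self, user: str) -> None:
        """An individual joins the VO."""
        self.members.add(user)


@dataclass
class Site:
    """A resource provider that grants access per VO, not per individual."""
    name: str
    # VO name -> resources that VO has negotiated access to at this site
    agreements: Dict[str, Set[str]] = field(default_factory=dict)

    def agree_access(self, vo: VirtualOrganisation, resources: Set[str]) -> None:
        """Record the outcome of a negotiation between the VO and this site."""
        self.agreements.setdefault(vo.name, set()).update(resources)

    def authorise(self, user: str, vo: VirtualOrganisation, resource: str) -> bool:
        """Allow use only if the user is in the VO and the VO has an agreement."""
        return user in vo.members and resource in self.agreements.get(vo.name, set())
```

The design point the sketch tries to capture is that a site never needs to know each user individually: it negotiates once with the VO, and membership of the VO carries the access rights.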