Enabling Grids for E-sciencE
Concepts of grid computing
Mike Mineter
mjm@nesc.ac.uk
www.eu-egee.org
INFSO-RI-508833

Acknowledgements
• This talk was prepared by Mike Mineter of NeSC and includes slides from previous tutorials and talks delivered by:
  – Dave Berry, Richard Hopkins, Guy Warner (National e-Science Centre)
  – the EDG training team
  – Ian Foster, Argonne National Laboratory
  – Jeffrey Grethe, SDSC
  – EGEE colleagues
  – Mark Baker, The Distributed Systems Group, University of Portsmouth, http://dsg.port.ac.uk/mab
• Talks at the 3rd EGEE conference by
  – Kyriakos Baxevanidis, Deputy Head, Unit of Research Infrastructures, European Commission, DG INFSO
  – Dr Spyros Konidaris, European Commission, DG INFSO

Goals of this module
• To introduce the concepts of Grid computing, assuming no previous knowledge

Contents
• “The Grid” vision
• What is “a grid”?
• Drivers of grid computing
• Current status of grids
• The basis: authentication, authorisation, security

The Grid Metaphor
[Figure labels: grid middleware; mobile access; workstation; supercomputer, PC cluster; data storage, sensors, experiments; visualising; Internet, networks]

The grid vision
• The grid vision is of “virtual computing” (+ information services to locate computation and storage resources)
  – Compare the web: “virtual documents” (+ search engines to locate them)
• MOTIVATION: collaboration through sharing resources (and expertise) to expand the horizons of
  – Research
  – Commerce: engineering, … “the knowledge economy”
  – Public service: health, environment, …

Contents
• “The Grid” vision
• What is “a grid”?

“A grid”
• The initial vision: “The Grid”
• The present reality: many “grids”
• Each grid is an infrastructure enabling one or more “virtual organisations” (VOs) to share computing resources
• What’s a VO?
  – People in different organisations seeking to cooperate and share resources across their organisational boundaries
• Why establish a grid?
  – Share data
  – Pool computers
  – Collaborate
[Figure labels: VO; Institute A; Institute B; Institute C; Institute D]

The Single Computer
• The operating system enables easy use of
  – Input devices
  – Processor
  – Disks
  – Display
  – Any other attached devices
[Figure: layers, top to bottom: Application Software; Operating System; Disks, Processor, Memory, …]

Resources on a Local Area Network
• The user just perceives “shared resources”, with no regard to their location in the organisation:
  – Authenticated by username / password
  – Authorised to use own files, …
[Figure: layers, top to bottom: Application Software; Middleware for sharing computers, servers, printers, …; Operating System on each computer; Resources connected by a LAN]

Resources on a grid
[Figure: layers, top to bottom: Application Software; Interface between application and grid; Grid Middleware: “collective services”; Grid Middleware on each resource; Operating System on each resource; Resources connected by the Internet]

A grid
• Grid middleware runs on each shared resource
  – Data storage
  – (Usually) batch jobs on pools of processors
• Users join VOs
• Each virtual organisation negotiates with sites to agree access to resources
• Distributed services (both people and middleware) enable the grid
[Figure label: Internet]

What characterises a grid?
• Co-ordinated resource sharing
  – No centralised point of control
  – Different administrative domains
• Standard, open, general-purpose protocols and interfaces
  – NOT specific to an application
  – EGEE and NGS support multiple VOs
• Delivering non-trivial qualities of service
  – Co-ordinated to deliver combined services greater than the sum of the individual components
• http://www.gridtoday.com/02/0722/100136.html

The components of a Grid
• Resources
  – networking, computers, storage, data, instruments, …
• Grid middleware
  – the “operating system of the grid”
• Operations infrastructure
  – Runs the enabling services (people + software)
• Virtual Organization management
  – Procedures for gaining access to resources

Key concepts
• Virtual organisation: people and resources collaborating across administrative and organisational boundaries
• Single sign-on
  – I connect to one machine; some sort of “digital credential” is passed on to any other resource I use. This is the basis of:
  – Authentication: how do I identify myself to a resource without a username/password for each resource I use?
  – Authorisation: what can I do? Determined by
    • My membership of a VO
    • VO negotiations with resource providers
• Grid middleware runs on each resource
• The user just perceives “shared resources”, with no concern for location or owning organisation

Contents
• “The Grid” vision
• What is “a grid”?
• Drivers of grid computing

The first driver: e-Science
• What is e-Science? Collaborative science that is made possible by the sharing across the Internet of resources (data, instruments, computation, people’s expertise, ...)
  – Often very compute intensive
  – Often very data intensive (both creating new data and accessing very large data collections): data deluges from new technologies
  – Crosses organisational boundaries
• Examples…

Astronomy
Number and sizes of data sets as of mid-2002, grouped by wavelength
• 12-waveband coverage of large areas of the sky
• Total about 200 TB of data
• Doubling every 12 months
• Largest catalogues near 1 billion objects
Data and images courtesy of Alex Szalay, Johns Hopkins University

Large Hadron Collider at CERN
• Data challenge:
  – 10 petabytes/year of data!
  – 20 million CDs each year!
• Simulation, reconstruction, analysis:
  – LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors!
• Operational challenges
  – Reliable and scalable through a project lifetime of decades
[Figure labels: Mont Blanc (4810 m); downtown Geneva]

BLAST gridification
[Figure labels: input file (Seq1, Seq2, … Seqn); UI; computing elements; BLAST; DB; RESULT]

DAME: Grid-based tools and infrastructure for aero-engine diagnosis and prognosis
• “A Significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DS&S. Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed”, DS&S 2004
• Companies: Rolls-Royce, DS&S, Cybula
• Universities: York, Leeds, Sheffield, Oxford
[Figure labels: engine flight data; London Airport; New York Airport; airline office; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center; XTO; Engine Model; Case Based Reasoning]

Academic drivers: not only e-science!!
The impact of grids when they support…
[Figure labels: e-Science; e-Research; curation, discovery, reuse of knowledge]

Academic drivers
Derived from a slide by the UK’s JISC
• E-research
• Digital libraries
• Centrality of curation, preservation
• Under-recognised by many researchers
• Virtual Digital Data Libraries needed for research as well as learning
• E-learning
• AAA Services
• e-Infrastructure

Political drivers
• Entering the “knowledge society” from the “industrial society”
  – The industrial society was also enabled by communications infrastructure
• Lisbon strategy: research and innovation will be the most important factors in determining Europe’s success through the next decades
• THE GOAL: “UNLEASH CREATIVITY” by investment in
  – Human skills
  – Infrastructures
• Growth of e-infrastructure (= networks + grid + operations)
  – Phase 1: mainly academia, some in industry: “an elite, privileged to do this job”
  – Phase 2: ordinary people doing distributed work; SMEs adopt, adapt and use
  – Phase 3: the next generations will transform e-infrastructure and its uses
• We don’t know how others will use what we devise

EGEE – building e-infrastructure
EGEE is building a large-scale production grid service to:
• Underpin research, technology and public service
• Link with and build on national, regional and international initiatives
• Foster international cooperation both in the creation and the use of the e-infrastructure
[Figure labels: pan-European grid; operations, support and training; collaboration; network infrastructure and resource centres]

Contents
• “The Grid” vision
• What is “a grid”?
• Drivers of grid computing
• Some examples
• Current status of grids

If “The Grid” vision leads us here… then where are we now?

Grid projects
Many Grid development efforts, all over the world

• NASA Information Power Grid
• DOE Science Grid
• NSF National Virtual Observatory
• NSF GriPhyN
• DOE Particle Physics Data Grid
• NSF TeraGrid
• DOE ASCI Grid
• DOE Earth Systems Grid
• DARPA CoABS Grid
• NEESGrid
• DOH BIRN
• NSF iVDGL

• UK – OGSA-DAI, RealityGrid, GeoDise, Comb-e-Chem, DiscoveryNet, DAME, AstroGrid, GridPP, MyGrid, GOLD, eDiamond, Integrative Biology, …
• Netherlands – VLAM, PolderGrid
• Germany – UNICORE, Grid proposal
• France – Grid funding approved
• Italy – INFN Grid
• Eire – Grid proposals
• Switzerland – Network/Grid proposal
• Hungary – DemoGrid, Grid proposal
• Norway, Sweden – NorduGrid

• DataGrid (CERN, ...)
• EuroGrid (Unicore)
• DataTag (CERN, …)
• Astrophysical Virtual Observatory
• GRIP (Globus/Unicore)
• GRIA (Industrial applications)
• GridLab (Cactus Toolkit)
• CrossGrid (Infrastructure Components)
• EGSO (Solar Physics)

Grids: where are we now?
• Many key concepts identified and known
• Many grid projects have tested, and benefit from, these
• Major efforts now on establishing:
  – Standards (a slow process) (e.g. Global Grid Forum, http://www.gridforum.org/)
  – Production grids for multiple VOs
    “Production” = reliable, sustainable, with commitments to quality of service
    • In Europe, EGEE
    • In UK, National Grid Service
    • In US, TeraGrid
    One stack of middleware that serves many research (and other!) communities
    Operational procedures and services (people!, policy, ...)
  – New user communities
• … whilst research & development continues

The key for new VOs
• The tools and services used by the VO’s applications
• Application development environment, portals, semantics
• Insulate applications from changing middleware
[Figure: layers, top to bottom: Application; Application toolkits, standards; Middleware: “collective services”; Basic Grid services: AA, job submission, info, …]

The vision of 2001: convergence of Web Services and Grids
[Figure labels: Open Grid Services Architecture; Web services; World-wide web; OGSI; Grid prototypes; High-end computing; High-throughput computing; Massively parallel computing; Internet]

Contents
• “The Grid” vision
• What is “a grid”?
• Drivers of grid computing
• Current status of grids
• The basis: authentication, authorisation, security

Grid security and trust - 1
• Providers of resources (computers, databases, ...) need risks to be controlled: they are asked to trust users they do not know
  – They trust a VO
  – The VO trusts its users
• Users need
  – Single sign-on: to be able to log on to a machine that can pass the user’s identity to other resources
  – To trust the owners of the resources they are using
• Build middleware on a layer providing:
  – Authentication: who wants to use/provide a resource
  – Authorisation: what the user is allowed to do
  – Security: reduce vulnerability, e.g. from outside the firewall
  – Non-repudiation: knowing who did what
• Digital credentials and the “Grid Security Infrastructure” (GSI) middleware are the basis of production grids

Grid security and trust - 2
• Currently achieved by certification:
  – A user’s identity has to be certified by one of the mutually recognised national Certification Authorities (CAs): see http://www.gridpma.org/; for the EU, go from there to http://marianne.in2p3.fr/datagrid/ca/catable-ca.html to find your CA
    • E.g. in the UK go to http://www.grid-support.ac.uk/ca/ralist.htm
  – Resources are also certified by CAs
• User
  – User joins a VO
  – Digital certificate is the basis of AA (authentication and authorisation)
  – Identity is passed to the other resources you use, where it is mapped to a local account; the mapping is maintained by the VO
• Commonly agreed policies establish rights for a Virtual Organization to use resources

Grid security and trust - 3
• Certification and GSI provide
  – Authentication
    • Resource can trust user
    • User can trust the resource provider
    • … so long as certificates are protected: they are your grid identity
  – A basis for authorisation, so a VO can manage access to resources
    • Resource providers trust the VO
    • The VO trusts the user
  – A mechanism for checking message integrity
    • Messages are passed between machines
    • Public/private key pairs protect message integrity as well as providing authentication
    • Messages are not (usually) encrypted, but message integrity is checked

Summary of grid computing concepts
• Flexible collaboration across multiple administrative domains
  – sharing data, computers, instruments, application software, ...
• Single sign-on to resources in multiple organisations
  – Authorisation, authentication
• Need for people-services as well as middleware services
  – credential authorities, VO managers, support
• The drive is towards
  – Production services (reliable, sustainable, …) against which research projects can plan with confidence
    • In Europe, EGEE
    • In UK, National Grid Service
  – Standards
  – Empowering new user communities
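
Illustration: the trust chain in code
The trust relationships described in the security slides (a resource trusts a CA and a VO, the VO trusts its users, and the user's identity is mapped to a local account) can be sketched as a toy model. This is only an illustration under stated assumptions, not how GSI or any real grid middleware is implemented: the class names, fields, CA name and account names are all hypothetical, and real certificates are verified cryptographically rather than by comparing issuer strings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Certificate:
    """Stand-in for a CA-issued digital certificate (hypothetical fields)."""
    subject: str  # the user's certified identity
    issuer: str   # the Certification Authority that certified it


@dataclass
class VirtualOrganisation:
    """A VO keeps its member list and the identity-to-account mapping."""
    name: str
    members: set = field(default_factory=set)
    account_map: dict = field(default_factory=dict)

    def add_member(self, cert, local_account):
        # The VO vouches for this user and records which account they map to.
        self.members.add(cert.subject)
        self.account_map[cert.subject] = local_account


@dataclass
class Resource:
    """A site resource that recognises some CAs and has agreements with VOs."""
    name: str
    trusted_cas: set
    agreed_vos: dict = field(default_factory=dict)  # VO name -> VirtualOrganisation

    def authenticate(self, cert):
        # Authentication: is the identity certified by a CA this site recognises?
        # (Assumption: a string comparison stands in for signature checking.)
        return cert.issuer in self.trusted_cas

    def authorise(self, cert, vo_name):
        # Authorisation: the site has an agreement with the VO,
        # and the VO lists the user as a member.
        vo = self.agreed_vos.get(vo_name)
        if vo is None or cert.subject not in vo.members:
            return None
        return vo.account_map[cert.subject]

    def access(self, cert, vo_name):
        """Return the local account the user is mapped to, or None if refused."""
        if not self.authenticate(cert):
            return None
        return self.authorise(cert, vo_name)


if __name__ == "__main__":
    # Hypothetical names throughout.
    alice = Certificate(subject="alice", issuer="Example CA")
    vo = VirtualOrganisation("example-vo")
    vo.add_member(alice, "examplevo001")
    site = Resource("cluster", trusted_cas={"Example CA"},
                    agreed_vos={"example-vo": vo})
    print(site.access(alice, "example-vo"))
```

Keeping authentication and authorisation as separate steps mirrors the slides: the resource never needs to know the user in advance, only the CA that certified them and the VO that vouches for them.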