Enabling Grids for E-sciencE What are grid computing and e-Science? Mike Mineter mjm@nesc.ac.uk www.eu-egee.org INFSO-RI-508833 Acknowledgements Enabling Grids for E-sciencE • This talk was prepared by Mike Mineter of NeSC and includes slides from previous tutorials and talks delivered by: – – – – – Dave Berry, Richard Hopkins (National e-Science Centre) the EDG training team Ian Foster, Argonne National Laboratories Jeffrey Grethe, SDSC EGEE colleagues INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 3 Goals of this module Enabling Grids for E-sciencE • To introduce the concepts of e-Science and Grid computing assuming no previous knowledge INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 4 Contents Enabling Grids for E-sciencE • • • • • • “The Grid” vision What is “a grid” ? Drivers of grid computing Some examples Current status of grids Are grids for you?! INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 5 The Grid Metaphor Enabling Grids for E-sciencE Mobile Access G R I D Workstation M I D D L E W A R E Supercomputer, PC-Cluster Data-storage, Sensors, Experiments Visualising Internet, networks INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 6 The grid vision Enabling Grids for E-sciencE • The grid vision is of “Virtual computing” (+ information services to locate computation, storage resources) – Compare: The web: “virtual documents” (+ search engine to locate them) • MOTIVATION: collaboration through sharing resources (and expertise) to expand horizons of – Research – Commerce – engineering, … “the knowledge economy” – Public service – health, environment,… INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 7 Before Grids Enabling Grids for E-sciencE Researchers in many locations need to share resources FTP, telnet, blood, sweat and tears… and little support for collaboration Scientific instruments, data stores and computers in many locations INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 8 The Grid Vision Enabling Grids for E-sciencE Researchers in many locations need to share resources Resources connect to “The Grid” Scientific instruments, data stores and computers in many locations INFSO-RI-508833 What are Grid Computing and e-Science? March tutorials 2005, NeSC Slide derived from EDG 29 / LCG 9 Grid projects Enabling Grids for E-sciencE Many Grid development efforts — all over the world •UK – OGSA-DAI, RealityGrid, GeoDise, •NASA Information Power Grid Comb-e-Chem, DiscoveryNet, DAME, •DOE Science Grid AstroGrid, GridPP, MyGrid, GOLD, eDiamond, Integrative Biology, … •NSF National Virtual Observatory •Netherlands – VLAM, PolderGrid •NSF GriPhyN •Germany – UNICORE, Grid proposal •DOE Particle Physics Data Grid •France – Grid funding approved •NSF TeraGrid •Italy – INFN Grid •DOE ASCI Grid •Eire – Grid proposals •DOE Earth Systems Grid •Switzerland - Network/Grid proposal •DARPA CoABS Grid •DataGrid (CERN, ...) •Hungary – DemoGrid, Grid proposal •NEESGrid •EuroGrid (Unicore) •Norway, Sweden - NorduGrid •DataTag (CERN,…) •DOH BIRN •Astrophysical Virtual Observatory •NSF iVDGL •GRIP (Globus/Unicore) •GRIA (Industrial applications) •GridLab (Cactus Toolkit) •CrossGrid (Infrastructure Components) •EGSO (Solar Physics) INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 10 Contents Enabling Grids for E-sciencE • • • • • • “The Grid” vision What is “a grid” ? Drivers of grid computing Some examples Current status of grids Are grids for you?! INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 11 “A grid” Enabling Grids for E-sciencE • The initial vision: “The Grid” • The present reality: Many “grids” • Each grid is an infrastructure enabling one or more “virtual organisations” to share computing resources • What’s a VO? – People in different organisations seeking to cooperate and share resources across their organisational boundaries VO Institute Desktop • Why establish a Grid? – Share data – Pool computers – Collaborate INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 12 A computer Enabling Grids for E-sciencE • The Operating System enables easy use of – – – – Input devices Processor Disks Display INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 13 An institute’s resources on a LAN Enabling Grids for E-sciencE Middleware runs on each computer: • To allow sharing of disks and printers (using, e.g. Samba) • To share processors for computation (e.g. Condor) • User just perceives “shared resources”, with no regard to location in the building: – Authenticated by username / password – Authorised to use own files,… INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 14 Typical current grid Enabling Grids for E-sciencE • Grid middleware runs on each shared resource – Data storage – (Usually) batch jobs on pools of processors • Users join VO’s • Virtual organisation negotiates with sites to agree access to resources INTERNET • Distributed services (both people and middleware) enable the grid, allow single sign-on INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 15 What is a Grid? Enabling Grids for E-sciencE • “An infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources” Ian Foster and Carl Kesselman INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 16 Key concepts Enabling Grids for E-sciencE • Virtual organisation: people and resources collaborating - crosses admin, organisational boundaries • Single sign-on – I connect to one machine – some sort of “digital credential” is passed on to any other resource I use – Authentication: How do I identify myself to a resource without username/password for each resource I use? – Authorisation: what can I do? Determined by My role in a VO (role-based in near future) VO negotiations with resource providers • Grid middleware runs on each resource • User just perceives “shared resources”, with no awareness of location or owning organisation INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 17 What is Grid computing not? Enabling Grids for E-sciencE • “Grid computing” is a trendy phrase! • It’s therefore also a misused term! – Sometimes in Industry : “Grids” = clusters Motivations: better use of resources; scope for commercial services – Also used to refer to the harvesting of unused compute cycles, e.g. SETI@home, Climateprediction.net INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 18 Contents Enabling Grids for E-sciencE • • • • • • “The Grid” vision What is “a grid” ? Drivers of grid computing Some examples Current status of grids Are grids for you?! INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 19 The first driver: e-Science Enabling Grids for E-sciencE • What is e-Science? Collaborative science that is made possible by the sharing across the Internet of resources (data, instruments, computation, people’s expertise...) – Often very compute intensive – Often very data intensive (both creating new data and accessing very large data collections) – data deluges from new technologies – Crosses organisational boundaries INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 20 Other major drivers for grids Enabling Grids for E-sciencE e-Research – not just e-Science – Also curation and digital data libraries (DILIGENT, DELOS, GRACE) • Commerce – not just academia!! • Politics: the “knowledge economy” • “e-Infrastructure”: A shared resource Collaboration Grid – That enables science, research, engineering, medicine, industry, … – It will improve UK / European / … productivity Operations, Support and training • Network infrastructure linking resource centres INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 21 Contents Enabling Grids for E-sciencE • • • • • • “The Grid” vision What is “a grid” ? Drivers of grid computing Some examples Current status of grids Are grids for you?! INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 22 Astronomy Enabling Grids for E-sciencE No. & sizes of data sets as of mid-2002, grouped by wavelength • 12 waveband coverage of large areas of the sky • Total about 200 TB data • Doubling every 12 months • Largest catalogues near 1B objects INFSO-RI-508833 Data and images courtesy Alex Szalay, John Hopkins University What are Grid Computing and e-Science? 29 March 2005, NeSC 23 Earth Observation Enabling Grids for E-sciencE ESA missions: • 100’s of Gbytes of data per day Grid contribution to EO: • Enhance the ability to access high level products • Allow reprocessing of large historical archives • Improve Earth science complex applications (data fusion, data mining, modelling …) Federico.Carminati , EU review presentation, 1 March 2002 INFSO-RI-508833 Derived from: L. Fusco, June 2001 What are Grid Computing and e-Science? 29 March 2005, NeSC 24 Enabling Grids for E-sciencE DAME: Grid based tools and Inferstructure for Aero-Engine Diagnosis and Prognosis Engine flight data London Airport Airline office New York Airport •“A Significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DS&S. Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed”, DS&S 2004 Grid Diagnostics Centre Maintenance Centre American data center European data center XTO Companies: Rolls-Royce DS&S Cybula INFSO-RI-508833 Universities: York, Leeds, Sheffield, Oxford Engine Model Case Based Reasoning What are Grid Computing andData e-Science? Signal Explorer 29 March 2005, NeSC 25 Large Hadron Collider at CERN Enabling Grids for E-sciencE • Data Challenge: – 10 Petabytes/year of data !!! – 20 million CDs each year! • Simulation, reconstruction, analysis: – LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors! • Operational challenges – Reliable and scalable through project lifetime of decades INFSO-RI-508833 What are Grid Computing and e-Science? Mont Blanc (4810 m) Downtown Geneva 29 March 2005, NeSC 26 Contents Enabling Grids for E-sciencE • • • • • • “The Grid” vision What is “a grid” ? Drivers of grid computing Some examples Current status of grids Are grids for you?! INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 27 Enabling Grids for E-sciencE If “The Grid” vision leads us here… … then where are we now? INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 28 Current status Enabling Grids for E-sciencE • Many key concepts identified and known • Many grid projects have tested these • Major efforts now on establishing: – Standards (a slow process) (Global Grid Forum, http://www.gridforum.org/ ) – Production Grids for multiple VO’s “Production” = Reliable, sustainable, with commitments to quality of service – New user communities – … whilst research & development continues • In Europe, EGEE • In UK, NGS • In US, Teragrid INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 29 1997- Present: Globus Enabling Grids for E-sciencE • A software toolkit addressing certain technical problems in the development of Grid enabled tools, services, and applications – Offers a modular “bag of technologies” – Made available under liberal open source license • Not turnkey solutions, but building blocks and tools for application developers and system integrators • Tools built on Grid Security Infrstructure to include: – – – – Job submission (GRAM) : run a job on a remote computer Information services: So I know which computer to use File transfer (GridFTP): so large data files can be transferred Replica management: so I can have multiple versions of a file “close” to the computers where I want to run jobs • Production grids are (currently) based on release 2 • http://www.globus.org/ INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 30 Computing Resources: Feb 2005 Enabling Grids for E-sciencE Country providing resources Country anticipating joining In LCG-2: 113 sites, 30 countries >10,000 cpu ~5 PB storage Includes non-EGEE sites: • 9 countries • 18 sites INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 31 Grid security and trust -1 Enabling Grids for E-sciencE • Providers of resources (computers, databases,..) need risks to be controlled: they are asked to trust users they do not know • User’s need single sign-on: logon to a machine that can pass the user’s identity to other resources • Build middleware on layer providing: – Authentication: know who wants to use resource – Authorisation: know what the user is allowed to do – Security: reduce vulnerability, e.g. from outside the firewall – Non-repudiation: knowing who did what • GSI from the Globus toolkit does this for NGS INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 32 Grid security and trust -2 Enabling Grids for E-sciencE • Currently, achieved by Certification: – User’s identity has to be certified by one of the national Certification Authorities (CAs) mutually recognized http://www.gridpma.org/ In UK go to http://www.grid-support.ac.uk/ca/ralist.htm – Resources (node machines) have to be certified by CAs – Digital certificate installed on the machine accessed by user: basis of AA – User joins a VO – Identity passed to other resources you use, where it is mapped to a local account – the mapping is maintained by the VO • Common agreed policies establish rights for a Virtual Organization to use resources INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 33 The key for new VO’s Enabling Grids for E-sciencE Application Application toolkits, standards Middleware: “collective services” Basic Grid services: AA, job submission, info, … INFSO-RI-508833 • Application development environment, portals • Insulate applications from changing middleware What are Grid Computing and e-Science? 29 March 2005, NeSC 34 Contents Enabling Grids for E-sciencE • • • • • • Definitions of e-Science and “a grid” Exploring the definitions Why now?! Some examples Current status of grids Are grids for you?! INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 35 Are Grids for you?! Enabling Grids for E-sciencE • IF a community effort is vital to achieving goals, by sharing services of data and computation, • AND that effort crosses organisation boundaries • THEN yes! • In the UK, plan to join the NGS! • OR if you wish to use computation/storage/data services provided on a Grid – then YES! INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 36 Summary Enabling Grids for E-sciencE • Collaboration across multiple organisations • Single sign-on to resources in multiple organisations • Need for people-services as well as middleware services to enable this: e.g. to run – – – – Enabling services (e.g. info service) Certification authority for AA VO management – to negotiate with sites Helpdesk, … • Drives are towards – Production services In the UK, the NGS In Europe, EGEE – Standards – (tomorrow) – “e-Infrastructure” ~ integration of networking and middleware to support collaboration INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 37 Additional slides Enabling Grids for E-sciencE INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 38 Exponential Growth Performance per Dollar Spent Enabling Grids for E-sciencE Optical Fibre Doubling Time 9 12 Gilder’s Law (32X in 4 yrs) (bits per second) (months) 18 Data Storage Storage Law (16X in 4yrs) (bits per sq. inch) Chip capacity (# transistors) 0 1 2 Moore’s Law (5X in 4yrs) 3 4 5 Number of Years Triumph of Light – Scientific American. George Stix, January 2001 INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 39 How Different 2005 is from 1995 Enabling Grids for E-sciencE • Enormous quantities of data: Petabytes – For an increasing number of communities – Constraint is not collection but analysis • Ubiquitous Internet: – >100 million hosts – Security and Trust are crucial issues • Ultra-high-speed networks: >10 Gb/s – Global optical networks – Bottlenecks: last kilometre & firewalls • Huge quantities of computing: >100 Top/s – Moore’s law gives us all supercomputers – Organising their effective use is the challenge • Moore’s law everywhere – Instruments, detectors, sensors, scanners, … – Organising their effective use is the challenge INFSO-RI-508833 What are Grid Computing and e-Science? 29 March 2005, NeSC 40