Supercomputing, Visualization & eScience, Manchester Computing

What is e-Science & What is the Grid?
W T Hewitt
Tuesday, 28 October 2003, UCISA Meeting, Edinburgh

Agenda
– What are e-Science and the Grid?
– The global programme
– The UK e-Science programme
– Impacts

What is e-Science & the Grid?

Why Grids?
Large-scale science and engineering are done through:
– the interaction of people,
– heterogeneous computing resources, information systems, and instruments,
– all of which are geographically and organizationally dispersed.
The overall motivation for "Grids" is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.
(From Bill Johnston, 27 July 2001)

The Grid…
"…is the web on steroids."
"…is Napster for Scientists." [of data grids]
"…is the solution to all your problems."
"…is evil." [a system manager, of Globus]
"…is distributed computing re-badged."
"…is distributed computing across multiple administrative domains." – Dave Snelling, senior architect of UNICORE

[…provides] "flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources"
– from "The Anatomy of the Grid: Enabling Scalable Virtual Organizations"
"…enables communities ('virtual organizations') to share geographically distributed resources as they pursue common goals – assuming the absence of central location, central control, omniscience, and existing trust relationships."

CERN: Large Hadron Collider (LHC)
– Raw data: 1 Petabyte/sec
– Filtered: 100 MByte/sec = 1 Petabyte/year = 1 million CD-ROMs
(CMS detector)

Why Grids?
– A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour.
– A biologist combines a range of diverse and distributed resources (databases, tools, instruments) to answer complex questions.
– 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data.
– Civil engineers collaborate to design, execute, and analyse shake-table experiments.
(From Steve Tuecke, 12 October 2001)

Why Grids? (contd.)
– Climate scientists visualize, annotate, and analyse terabyte simulation datasets.
– An emergency response team couples real-time data, weather models, and population data.
– A multidisciplinary aerospace analysis couples code and data across four companies.
– A home user invokes architectural design functions at an application service provider.
(From Steve Tuecke, 12 October 2001)

Broader Context
"Grid computing" has much in common with major industrial thrusts:
– business-to-business, peer-to-peer, application service providers, storage service providers, distributed computing, internet computing…
Sharing issues are not adequately addressed by existing technologies:
– Complicated requirements: "run program X at site Y subject to community policy P, providing access to data at Z according to policy Q" (see the sketch below).
– High performance: the unique demands of advanced and high-performance systems.
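To make the "complicated requirements" bullet concrete, here is a minimal, purely illustrative Python sketch of a broker checking such a request against both site policies and a community (virtual organization) policy. All names and the policy model are invented for illustration; real Grid deployments express this with certificate-based authorization infrastructure, not in-memory tables.

```python
from dataclasses import dataclass

@dataclass
class Request:
    user: str           # identity; in practice a certificate DN
    program: str        # "program X"
    compute_site: str   # "site Y"
    data_site: str      # "site Z"

# Hypothetical policies: each site says which VOs may run jobs or read
# data there; the community policy P says who may run which programs.
SITE_POLICY = {
    "site-Y": {"run": {"vo-bio"}},
    "site-Z": {"read": {"vo-bio"}},
}
VO_MEMBERS = {"vo-bio": {"alice", "bob"}}
VO_POLICY = {"vo-bio": {"screen-compounds"}}   # programs the VO endorses

def authorized(req: Request, vo: str) -> bool:
    """A request succeeds only if the user, the community, and both sites agree."""
    if req.user not in VO_MEMBERS.get(vo, set()):
        return False                                           # not a VO member
    if req.program not in VO_POLICY.get(vo, set()):
        return False                                           # community policy P
    if vo not in SITE_POLICY.get(req.compute_site, {}).get("run", set()):
        return False                                           # site Y's local policy
    if vo not in SITE_POLICY.get(req.data_site, {}).get("read", set()):
        return False                                           # site Z's policy Q
    return True

print(authorized(Request("alice", "screen-compounds", "site-Y", "site-Z"), "vo-bio"))
```

The point of the sketch is that no single party decides: the resource owners and the community each retain a veto, which is exactly the multi-domain sharing problem the Grid addresses.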
What is the Grid?
"Grid computing [is] distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation… we review the 'Grid problem', which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources – what we refer to as virtual organizations."
– from "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke

New Book
(Slide showed the cover of a new Grid book.)

What is the Grid?
– Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
– On-demand, ubiquitous access to computing, data, and services of all kinds
– New capabilities constructed dynamically and transparently from distributed services
– No central location, no central control, no existing trust relationships, little predetermination
– Uniformity; pooling of resources

e-Science and the Grid
"e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."
"e-Science will change the dynamic of the way science is undertaken."
– John Taylor, Director General of Research Councils, Office of Science and Technology

Why GRID? (VERY, VERY IMPORTANT)
The Grid is one way to realise the e-Science vision. WE ARE TRYING TO DO E-SCIENCE!

Grid Middleware

Grid Middleware
Layered view: diverse global services, over Grid services, over the local OS.
Common principles:
– Single sign-on, often implying a Public Key Infrastructure (PKI)
– Standard protocols and services
– Respect for the autonomy of the resource owner
– Layered architectures: higher-level infrastructures hide the heterogeneity of lower levels
– Interoperability is paramount

Grid Middleware (a survey)
– Middleware: Globus; UNICORE; Legion and Avaki
– Data: Storage Resource Broker (SRB); replica management; OGSA-DAI
– Scheduling: Sun Grid Engine; Load Sharing Facility (LSF), from Platform Computing; OpenPBS and PBS Pro, from Veridian; Maui scheduler; Condor (which could also go under middleware)
– Web services (WSDL, SOAP, UDDI): IBM WebSphere; Microsoft .NET; Sun Open Net Environment (Sun ONE)
– PC Grids; peer-to-peer computing

Data-oriented Grids

Data-oriented middleware
– Wide-area distributed file systems (e.g. AFS)
– Storage Resource Broker (SRB)
  – UCSD and SDSC
  – Provides transparent access to data storage
  – Centralised architecture
  – Motivated by the experiences of HPC users, not database users
  – Little enthusiasm from the UK e-Science programme
– OGSA-DAI
  – Database Access and Integration
  – Strategic contribution of the UK e-Science programme
  – Universities of Edinburgh, Manchester, Newcastle; IBM, Oracle
  – Alpha release January 2003
– Globus replica management software: next up! (See the sketch below.)
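As a flavour of what replica management middleware does, here is a toy Python sketch of a replica catalogue: a mapping from a logical file name to the physical copies registered at different sites, with a naive "best copy" selection. This illustrates the concept only; it is not the Globus Replica Management API, and all names are invented.

```python
class ReplicaCatalogue:
    """Toy replica catalogue: logical file name -> physical locations."""

    def __init__(self):
        self._replicas = {}   # logical name -> list of (site, URL) pairs

    def register(self, logical_name, site, url):
        """Record that a physical copy of logical_name exists at site."""
        self._replicas.setdefault(logical_name, []).append((site, url))

    def locate(self, logical_name):
        """Return every known physical copy of logical_name."""
        return list(self._replicas.get(logical_name, []))

    def best_copy(self, logical_name, bandwidth_to_site):
        """Pick the replica at the site with the highest measured bandwidth.
        Real systems would also weigh load, cost, and policy."""
        replicas = self.locate(logical_name)
        if not replicas:
            raise KeyError(logical_name)
        return max(replicas, key=lambda r: bandwidth_to_site.get(r[0], 0.0))

rc = ReplicaCatalogue()
rc.register("lfn:cms/run42/events.dat", "cern", "gsiftp://cern.ch/data/run42")
rc.register("lfn:cms/run42/events.dat", "ral", "gsiftp://ral.ac.uk/data/run42")
# With these (made-up) bandwidth measurements, the RAL copy wins.
print(rc.best_copy("lfn:cms/run42/events.dat", {"cern": 10.0, "ral": 80.0}))
```

The design point is indirection: applications name data logically, and the middleware decides which physical copy to serve, which is what makes transparent replication possible at all.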
Data Grids for High Energy Physics
(The LHC tiered computing model, reconstructed from the slide's diagram:)
– Online system: ~PBytes/sec off the detector. There is a "bunch crossing" every 25 nsecs, there are 100 "triggers" per second, and each triggered event is ~1 MByte in size.
– Tier 0, the CERN Computer Centre: ~100 MBytes/sec from the online system into an offline processor farm of ~20 TIPS (1 TIPS is approximately 25,000 SpecInt95 equivalents).
– Tier 1, regional centres (France, Germany, Italy, FermiLab at ~4 TIPS): fed at ~622 Mbits/sec, or by air freight (deprecated).
– Tier 2 centres (e.g. Caltech, ~1 TIPS each): connected at ~622 Mbits/sec.
– Tier 3, institutes (~0.25 TIPS) with physics data caches; Tier 4, physicist workstations, at ~1 MBytes/sec.
– Physicists work on analysis "channels". Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.

Data-Intensive Issues Include…
– Harnessing [potentially large numbers of] data, storage, and network resources located in distinct administrative domains
– Respecting local and global policies governing what can be used for what
– Scheduling resources efficiently, again subject to local and global constraints
– Achieving high performance, with respect to both speed and reliability
– Cataloguing software and virtual data

Desired Data Grid Functionality
– High-speed, reliable access to remote data
– Automated discovery of the "best" copy of data
– Managed replication to improve performance
– Co-scheduling of compute, storage and network
– "Transparency" with respect to delivered performance
– Enforced access control on data
– Representation of "global" resource allocation policies

Grid Standards
Grid standards bodies:
– IETF: home of the network infrastructure standards
– W3C: home of the Web
– GGF: home of the Grid
The GGF defines the Open Grid Services Architecture (OGSA):
– OGSI is the infrastructure part of OGSA
– The OGSI public comment draft was submitted on 14 February 2003
Key OGSA areas of standards development:
– Job management interfaces
– Resources and discovery
– Security
– Grid economy and brokering

What is OGSA?
"Web Services with attitude!" Also known as the Open Grid Services Architecture.

Aside: What are Web Services?
– Loosely coupled distributed computing: think Java RMI or C remote procedure calls
– Text-based serialization: XML, a "human-readable" serialization of objects
– IBM and Microsoft lead: the Web Services Description Language (WSDL), with W3C standardization
– Three parts: messages (SOAP), definition (WSDL), discovery (UDDI)

Web Services in Action
(Reconstructed from the slide's diagram:) a client (Java, C, or a browser) publishes to and searches a UDDI registry via WSDL, then speaks https/SOAP to a Web Services platform (InterStage, WebSphere, J2EE, GLUE, SunOne, .NET, …), which in turn talks any protocol to legacy back-ends such as enterprise applications and databases. A minimal SOAP call is sketched below.

Enter Grid Services
Experience of Grid computing (and of business process integration) suggests similar extensions to Web Services:
– State: the Service Data model
– Persistence and naming: two-level naming (GSH, GSR), which allows dynamic migration and QoS adaptation (second sketch below)
– Lifetime management: self-healing and "soft" garbage collection
– Standard portTypes: a guarantee of a minimal level of service; beyond P2P is federation through mediation
– Explicit semantics: Grid Services specify semantics on top of Web Service syntax; portType inheritance
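To ground the "Web Services in Action" picture above: a client's SOAP call is just an XML envelope POSTed over HTTP(S). The Python sketch below builds and sends one by hand; the endpoint, namespace, and operation are hypothetical, and a real client would normally be generated from the service's WSDL rather than written like this.

```python
import urllib.request

# Hypothetical service endpoint and operation -- illustrative only.
ENDPOINT = "https://example.org/ws/CompoundScreener"
SOAP_BODY = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <screenCompound xmlns="urn:example:screener">
      <compoundId>CHEM-100001</compoundId>
    </screenCompound>
  </soap:Body>
</soap:Envelope>"""

request = urllib.request.Request(
    ENDPOINT,
    data=SOAP_BODY.encode("utf-8"),
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        # SOAP 1.1 identifies the intended operation via this header.
        "SOAPAction": "urn:example:screener#screenCompound",
    },
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))   # the XML response envelope
```

Because the wire format is plain XML over plain HTTP, the same call works from Java, C, or a browser, which is exactly the loose coupling the slide is selling.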
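And to illustrate the two-level naming idea from the Grid Services slide: a Grid Service Handle (GSH) is a permanent name, while the Grid Service Reference (GSR) it resolves to can change when the service migrates or adapts. A toy Python resolver, with every name invented for illustration:

```python
class HandleResolver:
    """Toy two-level naming: a permanent handle (GSH) resolves to a
    current, possibly changing reference (GSR)."""

    def __init__(self):
        self._table = {}      # GSH -> current GSR (endpoint description)

    def bind(self, gsh, gsr):
        self._table[gsh] = gsr   # called at creation and again after migration

    def resolve(self, gsh):
        return self._table[gsh]  # clients re-resolve rather than caching GSRs

resolver = HandleResolver()
gsh = "gsh://grid.example.org/services/jobmanager-1234"   # never changes
resolver.bind(gsh, "https://nodeA.example.org:8443/ogsi/jobmanager")
print(resolver.resolve(gsh))

# The service migrates (or is re-provisioned for QoS); the handle stays valid.
resolver.bind(gsh, "https://nodeB.example.org:8443/ogsi/jobmanager")
print(resolver.resolve(gsh))
```

The indirection is what makes dynamic migration and soft-state lifetime management workable: clients hold the stable name, not the transient address.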
If one Grid is good, then many Grids must be better

US Grid Projects
– NASA Information Power Grid
– DOE Science Grid
– NSF National Virtual Observatory
– NSF GriPhyN
– DOE Particle Physics Data Grid
– NSF DTF TeraGrid
– DOE ASCI DISCOM Grid
– DOE Earth Systems Grid
– DOE FusionGrid
– NEESGrid
– NIH BIRN
– NSF iVDGL

National Grid Projects
– Japan: Grid Data Farm, ITBL
– Netherlands: VLAM, DutchGrid
– Germany: UNICORE, Grid proposal
– France: Grid funding approved
– Italy: INFN Grid
– Ireland: Grid-Ireland
– Poland: PIONIER Grid
– Switzerland: Grid proposal
– Hungary: DemoGrid, Grid proposal
– ApGrid: Asia-Pacific Grid proposal

EU Grid Projects
– DataGrid (CERN, …)
– EuroGrid (UNICORE)
– DataTag (TTT…)
– Astrophysical Virtual Observatory
– GRIP (Globus/UNICORE interoperability)
– GRIA (industrial applications)
– GridLab (Cactus toolkit)
– CrossGrid (infrastructure components)
– EGSO (solar physics)
– COG (Semantic Grid)

UK e-Science Programme

UK e-Science Programme (structure and funding)
– DG Research Councils; e-Science Steering Committee; Grid TAG; Director.
– Director's awareness and co-ordination role: the Academic Application Support Programme, funded by the Research Councils (£74m) and DTI (£5m) – PPARC £26m, EPSRC £17m, BBSRC £8m, MRC £8m, NERC £7m, ESRC £3m, CLRC £5m; some £80m in total.
– Director's management role: Generic Challenges, EPSRC (£15m) and DTI (£15m); collaborative projects; industrial collaboration (£40m).
(From Tony Hey, 27 July 2001)

Key Elements (Core Programme)
– Development of generic Grid middleware
– Network of Grid e-Science Centres: the National Centre (http://www.nesc.ac.uk) and regional centres (e.g. http://www.esnw.ac.uk/)
– Grid IRC Grand Challenge project
– Support for e-Science pilots
– Short-term funding for e-Science demonstrators
– Grid Network Team, Grid Engineering Team, Grid Support Centre
– Task forces: Database (led by Norman Paton) and Architecture (led by Malcolm Atkinson)
– International involvement
(Adapted from Tony Hey, 27 July 2001)

National & Regional Centres
Centres donate equipment to make a Grid: Edinburgh, Glasgow, Newcastle, Belfast, DL, Manchester, Oxford, Cardiff, RAL, Cambridge, Hinxton, London, Southampton.

e-Science Demonstrators
– Dynamic Brain Atlas
– Biodiversity
– Chemical Structures
– Mouse Genes
– Robotic Astronomy
– Collaborative Visualisation
– Climateprediction.com
– Medical Imaging/VR

Grid Middleware R&D
– £16M of funding available for industrial collaborative projects
– £11M allocated to Centres' projects, plus £5M for 'Open Call' projects
– Task forces set up: Database, Architecture and Security
Grid Network Team
– An expert group to identify end-to-end network bottlenecks and other network issues, e.g. problems with multicast for Access Grid
– Identifies e-Science projects' network requirements
– Funding a £0.5M traffic engineering/QoS project with PPARC, UKERNA and Cisco, investigating MPLS on the SuperJANET network
– Funding a DataGrid extension project with PPARC, investigating bandwidth scheduling
– Proposal for a 'UKLight' lambda connection to Chicago and Amsterdam

UK e-Science Pilot Projects
– GRIDPP (PPARC)
– ASTROGRID (PPARC)
– Comb-e-Chem (EPSRC)
– DAME (EPSRC)
– DiscoveryNet (EPSRC)
– GEODISE (EPSRC)
– myGrid (EPSRC)
– RealityGrid (EPSRC)
– Climateprediction.com (NERC)
– Oceanographic Grid (NERC)
– Molecular Environmental Grid (NERC)
– NERC DataGrid (+ OST-CP)
– Biomolecular Grid (BBSRC)
– Proteome Annotation Pipeline (BBSRC)
– High-Throughput Structural Biology (BBSRC)
– Global Biodiversity (BBSRC)

e-Science Centres of Excellence
– Birmingham/Warwick: modelling
– Bristol: media
– UCL: networking
– White Rose Grid: Leeds, York, Sheffield
– Lancaster: social science
– Leicester: astronomy
– Reading: environment

UK e-Science Grid
(Map of sites:) Edinburgh, Glasgow, Newcastle, Belfast, Manchester, DL, Oxford, Cardiff, Cambridge, RL, London, Hinxton, Southampton.

UK e-Science Funding
First phase, 2001–2004:
– Application projects: £74M, covering all areas of science and engineering
– Core Programme: £15M plus £20M (DTI) for collaborative industrial projects
Second phase, 2003–2006:
– Application projects: £96M, covering all areas of science and engineering
– Core Programme: £16M for core Grid middleware; DTI follow-on?

Further Research Council activity
– EPSRC: Computer Science for e-Science, £9M, 18 projects so far
– ESRC: National e-Social Science Centre plus 3 hubs, ~£6M
– PPARC, MRC, BBSRC

Core Programme: Phase 2
– UK e-Science Grid/Centres and the e-Science Institute
– Grid Operation Centre and network monitoring
– Core middleware engineering
– National Data Curation Centre
– e-Science exemplars and new opportunities
– Outreach and international involvement

Other Activities
– Security Task Force: jointly funding key security projects with EPSRC and JCSR, with effort coordinated with the NSF NMI and Internet2 projects; a JCSR £2M call is in preparation
– UK Digital Curation Centre: £3M, core e-Science plus JCSR
– JCSR: £3M per annum

SR2004 – e-Science Infrastructure
– Persistent UK e-Science Research Grid
– Grid Operations Centre
– UK Open Middleware Infrastructure Institute
– National e-Science Institute
– UK Digital Curation Centre
– AccessGrid Support Service
– e-Science/Grid collaboratories
– Legal service
– International standards activity

Conclusions

Today's Grid
A single system image:
– transparent wide-area access to large data banks
– transparent wide-area access to applications on heterogeneous platforms
– transparent wide-area access to processing resources
Plus:
– Security, certification, single sign-on authentication, AAA
– Data access, transfer and replication: Grid Security Infrastructure, GridFTP, Giggle (reliability sketch below)
– Computational resource discovery, allocation and process creation: GRAM, UNICORE, Condor-G
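Tools like GridFTP earn the "transparent wide-area access" claim partly by making transfers authenticated, restartable, and verifiable. As an illustration of the reliability idea only (this is not the GridFTP protocol or any real client API), here is a generic Python sketch that retries a download and checks its integrity; the URL and checksum are placeholders.

```python
import hashlib
import time
import urllib.request

def reliable_fetch(url, expected_sha256, attempts=3, backoff_seconds=2.0):
    """Fetch url and verify its integrity, retrying on failure.
    Real data-grid tools layer authentication, parallel streams,
    and partial-transfer restart on top of this basic idea."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url) as response:
                data = response.read()
            digest = hashlib.sha256(data).hexdigest()
            if digest == expected_sha256:
                return data                               # verified copy
            raise IOError(f"checksum mismatch on attempt {attempt}")
        except Exception:
            if attempt == attempts:
                raise                                     # give up, surface the error
            time.sleep(backoff_seconds * attempt)         # simple backoff, then retry

# Placeholder values -- illustrative only:
# data = reliable_fetch("https://example.org/run42/events.dat", "0123...abcd")
```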
Reality Checks!!
"The technology is ready."
– Not true: it is emerging. We are building middleware, advancing standards, developing dependability, and building demonstrators. The computational Grid is in advance of the data-intensive middleware, and integration and curation are probably the obstacles. But it doesn't have to be all there to be useful!
"We know how we will use Grid services."
– No: this is a disruptive technology. Lower the barriers to entry.

Grid Evolution
1st generation Grid:
– Computationally intensive; file access and transfer
– A bag of various heterogeneous protocols and toolkits
– Recognises the internet, ignores the Web
– Academic teams
2nd generation Grid (we are here!):
– Data-intensive, moving towards knowledge-intensive
– Services-based architecture
– Recognises the Web and Web services
– Global Grid Forum
– Industry participation

Impacts
It's all about interoperability, really.
– Web and Grid services are creating a new marketplace for components.
– If you're concerned with systems integration or internet delivery of services, embrace Web Services technologies now; you'll be ready for Grid Services when they're ready for you.
– If you're a developer, get Web Services on your CV.
– If you're an IT manager, collect Web Services expertise through hiring or training.
– Software licence models must adapt.

I don't want to share! Do I need a grid?

In Conclusion
– The Grid is not, and will not be, free: we must pay for resources.
– What have we to show for £250M?

Acknowledgements
– Carole Goble, Stephen Pickles, Paul Jeffreys
– University of Manchester; academic collaborators; industrial collaborators
– Funding agencies: DTI, EPSRC, NERC, ESRC, PPARC

SVE @ Manchester Computing
World-leading supercomputing service, support and research: bringing science and supercomputers together.
www.man.ac.uk/sve – sve@man.ac.uk