Enabling Grids for E-sciencE

An overview of e-science and Grid technologies
Dave Berry
Deputy Director, Research & E-Infrastructure Development
National e-Science Centre
daveb@nesc.ac.uk
9th March 2006
www.eu-egee.org
INFSO-RI-508833

Contents
• Introduction to E-Science
• E-Infrastructure & Grids
• E-Infrastructure - Where are we now?
• Summary
– Enabling the research & business of the future
– and for early adopters… the present!

‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
John Taylor, Director General of Research Councils, Office of Science and Technology

A new way of doing science
• networking, grids, instrumentation, computation, data curation…
• value added of distributed collaborative research (virtual organisations)
• Application pull, technology push
• a new way for all scientists to work on research challenges that would otherwise be difficult to address
Mário Campolargo, DG INFSO F3, Pisa, 24th October 2005

EBank
[Slide from Jeremy Frey]

Biomedical Research Informatics Delivered by Grid Enabled Services
[Diagram: the CFG Virtual Organisation. Labels: Portal; DATA HUB; Synteny Grid Service; publicly curated data (Ensembl, OMIM, SWISS-PROT, MGI, HUGO, RGD, …); private data at Glasgow, Edinburgh, Oxford, Leicester, London and the Netherlands]
http://www.brc.dcs.gla.ac.uk/projects/bridges/

eDiaMoND: Screening for Breast Cancer
[Diagram: eDiaMoND Grid. Labels: Patients; Screening; Letters; Radiology reporting systems; Electronic Patient Records; Epidemiology; Assessment/Symptomatic; Biopsy; Case Information; 2ndary Capture or FFD; X-Rays and Case Information; Other Modalities (MRI, PET, Ultrasound); Symptomatic/Assessment Information; Case and Reading Information; 1 Trust to Many Trusts; Collaborative Working; Audit capability]
[Diagram, continued. Labels: Better access to case information and digital tools; SMF; CAD; Temporal Comparison; Digital Reading; 3D Images; Training; Manage Training Cases; Perform Training; Case and Reading Information; Supplement mentoring with access to digital training cases and sharing of information across clinics]
Provided by eDiamond project: Prof. Sir Mike Brady et al.

climateprediction.net and GENIE
• Largest climate model ensemble
• >45,000 users, >1,000,000 model years
[Figure: Response of Atlantic circulation to freshwater forcing; labels 2K and 10K]

UK Grid for Particle Physics (2003)
[Figure labels: GridPP; CMS; LHCb; ATLAS]
www.gridpp.ac.uk

What is e-science?
• Collaborative research that is made possible by the sharing across the Internet of resources (data, instruments, computation, people’s expertise...)
– Crosses organisational boundaries
– Often very compute intensive
– Often very data intensive
– Sometimes large-scale collaboration
• Began with focus in the “big sciences”
– Spreading to new user communities (social science, arts, humanities…)
• Technologies also relevant in industry, government, public services

DAME: Grid based tools and Infrastructure for Aero-Engine Diagnosis and Prognosis
• “A significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DS&S. Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed”, DS&S 2004
[Diagram labels: Engine flight data; London Airport; New York Airport; Airline office; Grid Diagnostics Centre; Maintenance Centre; American data center; European data center; XTO; Engine Model; Case Based Reasoning; Signal Data Explorer]
Companies: Rolls-Royce, DS&S, Cybula
Universities: York, Leeds, Sheffield, Oxford

Healthcare @ Home
[Diagram: participants linked by referrals around a VARIABLES ACCESS MATRIX CASE]
• GP: home-mobile-clinic via PDA-laptop-PC-paper
• Diabetician: home-mobile-clinic via PDA-laptop-PC-paper
• Patient: home-mobile-clinic via TV-PDA-laptop-PC-paper
• Dietitian, Biochemist, Diabetes Specialist / Other Specialist Nurses: home-mobile-clinic via TV-PDA-laptop-PC-paper
• Community Nurses / Health Visitors
• Various Clinical Specialists (distributed), e.g. Ophthalmologist, Podiatrist, Vascular Surgeons, Renal Specialists, Wound clinic, Foot care clinic, Neurologists, Cardiologists

Contents
• Introduction to E-Science
• E-Infrastructure and Grids
• E-Infrastructure - Where are we now?
• Summary
– Enabling the research & business of the future

What is E-Infrastructure? – Political view
• A shared resource
– That enables science, research, engineering, medicine, industry, …
– It will improve UK / European / … productivity
  Lisbon Accord 2000
  E-Science Vision SR2000 – John Taylor
– Commitment by UK government
  Sections 2.23-2.25
– Always there, c.f. telephones, transport, power, internet

What is Grid Computing?
• The grid vision is of “Virtual computing” (+ information services to locate computation, storage resources)
– Compare: The web: “virtual documents” (+ search engine to locate them)
• MOTIVATION: collaboration through sharing resources (and expertise) to expand horizons of
– Research
– Commerce – engineering, …
– Public service – health, environment,…

The Grid Metaphor
[Diagram: GRID MIDDLEWARE connecting Mobile Access; Workstation; Supercomputer, PC-Cluster; Data-storage, Sensors, Experiments; Visualising; Internet, networks]

Computing as a Commodity
• From hand-built research computers to PCs on every desktop
• From individual computers to the Internet and the WWW
• From specialist supercomputers to clusters and cycle-scavenging
• From proprietary formats to standards and ontologies
• From room-sized computers to PDAs and mobile phones
• From individual servers to virtualised, dynamically provisioned blade farms
• From applications to services
• From ownership to computing-on-demand

What is E-Infrastructure?
• Grids permit resource sharing across administrative domains
– Dynamic allocation & configuration
• Networks permit communication across geographical distance
• Supporting organisations
– Operations for grids, networks
• Resources
– Computers
– Digital libraries
– Research data
– Instruments
• Middleware
– Authentication, Authorisation
– Registries, search engines
– Toolkits, environments
– Shared vocabularies
– Provenance
[Diagram layers: Collaboration; Grid Operations, Support and training; Network infrastructure & Resources]

Typical current science Grid
• Grid middleware runs on each shared resource
– Data storage
– (Usually) batch jobs on pools of processors
• Users join Virtual organisation (VO)
• VO negotiates with sites to agree access to resources
• Distributed services (both people and software) enable the grid
[Diagram label: INTERNET]

Typical current science grid
[Diagram components: User/Grid interface; File Replica Catalogue; Information Service; Resource Broker; Logging & Book-keeping; Author. & Authen.; Storage Resource; Computing Resource = batch queue]
[Diagram interactions: Input files; Output files; Datasets info; Publish; Job Submit; Job Query; Job Status; Event]

Empowering users
• VO-specific developments:
– Portals
– Virtual Research Environments
– Semantics, ontologies
– Workflow
– Registries of VO services
• Layer stack: Application; Application toolkits, standards; Middleware: “collective services”; Basic Grid services: AA, job submission, info, …
• Production grids provide these services. Develop above these to empower ordinary users!
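
The job flow on the “Typical current science grid” slide (sites publish to an Information Service, a Resource Broker matches a submitted job to a computing resource, and Logging & Book-keeping records each event) can be sketched roughly as below. This is a toy model, not the EGEE middleware: every class name, the VO membership table and the “most free slots” ranking rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComputingResource:
    """A batch queue at one site (hypothetical model)."""
    name: str
    free_slots: int
    supported_vos: frozenset


@dataclass
class Job:
    owner: str
    vo: str
    status: str = "SUBMITTED"
    site: str = ""


class InformationService:
    """Sites publish their state here; the broker queries it."""

    def __init__(self):
        self._resources = {}

    def publish(self, resource):
        self._resources[resource.name] = resource

    def query(self, vo):
        # Only resources that accept this VO and have a free slot.
        return [r for r in self._resources.values()
                if vo in r.supported_vos and r.free_slots > 0]


class ResourceBroker:
    def __init__(self, info_service, vo_members):
        self.info_service = info_service
        self.vo_members = vo_members   # assumed table: VO name -> member names
        self.bookkeeping = []          # stands in for Logging & Book-keeping

    def _log(self, job, event):
        job.status = event
        self.bookkeeping.append((job.owner, event))

    def submit(self, job):
        # Authorisation: sites trust the VO, and the VO vouches for its members.
        if job.owner not in self.vo_members.get(job.vo, set()):
            self._log(job, "REJECTED")
            return job
        candidates = self.info_service.query(job.vo)
        if not candidates:
            self._log(job, "WAITING")
            return job
        # Assumed ranking rule: the resource with the most free slots wins.
        best = max(candidates, key=lambda r: r.free_slots)
        best.free_slots -= 1
        job.site = best.name
        self._log(job, "SCHEDULED")
        return job


info = InformationService()
info.publish(ComputingResource("site-a", 2, frozenset({"biomed"})))
info.publish(ComputingResource("site-b", 8, frozenset({"biomed", "physics"})))
broker = ResourceBroker(info, {"biomed": {"alice"}})
print(broker.submit(Job("alice", "biomed")).site)      # site-b
print(broker.submit(Job("mallory", "biomed")).status)  # REJECTED
```

A real broker would also consult the File Replica Catalogue to place jobs near their data and would authenticate users by certificate; both are left out here to keep the matching step visible.
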
Example Portal - BRIDGES
• End user: authenticate at BRIDGES web portal with username and password only
• BRIDGES web portal: get user authorisations from NeSC machine with PERMIS authorisation service (GT3.3)
• Job request is passed on securely with username
• NeSC grid server with host credentials: make host proxy, authenticate with NGS and submit job
• NGS clusters: Leeds, Oxford, RAL, Manchester
Slide by Micha Bayer, NeSC

Workflow example
• Taverna in MyGrid http://www.mygrid.org.uk/
• “allows the e-Scientist to describe and enact their experimental processes in a structured, repeatable and verifiable way”
• GUI
• Workflow language
• Enactment engine

Notification
• Pub/Sub for laboratory data using a broker and ultimately delivered over GPRS
Comb-e-chem: Jeremy Frey
[Figure label: Workshop]

The many scales of grids
• International grid (EGEE)
• National grids (e.g. National Grid Service)
• Regional grids (e.g. White Rose Grid)
• Campus grids
• Condor pools
[Diagram labels: International instruments,..; National datacentres, HPC, instruments; Institutes’ data; Wider collaboration, greater resources]

Security and trust - 1
• Providers of resources (computers, databases,..) need risks to be controlled: they are asked to trust users they do not know
– They trust a VO
– The VO trusts its members
• Users need
– Single sign-on: to be able to log on to a machine that can pass the user’s authorisation to other resources
– To trust owners of the resources they are using
• Build middleware on layer providing:
– Authentication: know who wants to use resource
– Authorisation: know what the user is allowed to do
– Security: reduce vulnerability, e.g. from outside the firewall
– Non-repudiation: knowing who did what

Security and trust - 2
• Achieved by Certification:
– User’s identity has to be certified by one of the mutually recognized national Certification Authorities (CAs), http://www.gridpma.org/. E.g. in UK go to http://www.grid-support.ac.uk/ca/ralist.htm
– Resources are also certified by CAs
• User
– User joins a VO
– Digital certificate is basis of AA
– Identity passed to resources you use, where it is mapped to a local account
• Policies express the rights for a Virtual Organization to use resources

If “The Grid” vision leads us here…
… then where are we now?

Grids: where are we now?
• Many key concepts identified and known
• Major efforts now on establishing:
– Standards (a slow process) (e.g. Global Grid Forum, http://www.gridforum.org/ )
– Production Grids for multiple VOs
  “Production” = Reliable, sustainable, with commitments to quality of service
  • In Europe, EGEE
  • In UK, National Grid Service
  • In US, Teragrid and OSG
  One stack of middleware that serves many research communities
  Establishing operational procedures and organisation
• “Service orientation” - “the way to build grids”

Where are we now?
– user’s view
[Diagram stages: Research; Pilot projects; Early adopters; Routine production; Unimagined possibilities]
[Diagram labels: Networks; Grids; Web; Arts; Humanities; Sciences, engineering; e-Soc-Sci]
Early production grids: UK – National Grid Service; International - EGEE

National Grid initiatives now include…
[Figure: only the label CroGrid is recoverable]

Service-Oriented Architecture
[Diagram: Client, Registry, Service; interactions: Registration, Discovery, Invocation]

Service orientation – software components that are…
• Accessible across a network
• Loosely coupled, defined by the messages they receive / send
• Interoperable: each service has a description that is accessible and can be used to create software to invoke that service
• Based on standards
• Developed in anticipation of new uses
[Diagram: a Client, a Registry and several Services]

Grid and Web Services
• Web Services
– Commerce
– Standards
– Tools
• Grid Technology
– Research driven
– Data-intensive
– Compute intensive
– Collaboration – sharing of resources
• Grids based on Web Services

Contents
• Introduction to E-Science
• E-Infrastructure and Grids
• E-Infrastructure - Where are we now?
• Summary
– Enabling the research & business of the future

Summary - 1: it’s about collaboration!!
(As well as resource utilisation!)
• Grids: collaboration across administrative domains
• Networks: collaboration across geographical distance
• Semantics, ontologies: collaboration across disciplines
• Storage, (“curation”): collaboration across time
[Diagram layers: Collaboration; Grid Operations, Support and training; Network infrastructure & Resource centres]

Summary - 2
• Ask not what “the Grid” can do for you
• BUT
• With whom do you collaborate?
• What resources / services can you provide?
• What resources would empower your research?
[Diagram labels: People; Data; Computation]
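
For readers asking what “providing a service” might look like in practice, the register / discover / invoke pattern from the Service-Oriented Architecture slide can be sketched in a few lines. This is an in-process illustration only: the `Registry` class, the `word_count_service` function and the dictionary message format are hypothetical, and a real grid would use networked services with standard descriptions rather than Python callables.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Service = Callable[[Message], Message]


class Registry:
    """Holds a description and an endpoint for each registered service."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, Service]] = {}

    def register(self, name: str, description: str, service: Service) -> None:
        # Registration: the provider publishes a description of the service.
        self._entries[name] = (description, service)

    def discover(self, keyword: str) -> List[str]:
        # Discovery: the client searches descriptions, not implementations.
        key = keyword.lower()
        return sorted(name for name, (desc, _) in self._entries.items()
                      if key in desc.lower())

    def bind(self, name: str) -> Service:
        # Returns the endpoint the client will invoke.
        return self._entries[name][1]


def word_count_service(message: Message) -> Message:
    """Hypothetical service, defined only by the messages it receives and sends."""
    text = str(message["text"])
    return {"status": "ok", "words": len(text.split())}


registry = Registry()
registry.register("word-count", "Counts the words in a text", word_count_service)

names = registry.discover("words")
reply = registry.bind(names[0])({"text": "grids enable collaboration"})
print(names, reply)  # ['word-count'] {'status': 'ok', 'words': 3}
```

The client depends only on the registry entry and the message format, which is the loose coupling the service-orientation slide describes.
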