Learning from e-Science: Grids and e-Infrastructure
at GridKa School, Karlsruhe
Malcolm Atkinson, Director
www.nesc.ac.uk
26th September 2005

Contents
• History & Motivation for e-Science & Grids
  – Licklider to Foster & Kesselman to …
• What are Grids & Web Services
  – Disruptive technology advances
  – Rallying cries
• Example Grids & Projects
  – National Grids, Open Science Grid, BIRN, GEON, …
  – Data and Information – Biomedicine, Engineering & Finance
• Example Middleware Project
  – OGSA-DAI – international use and collaboration
• Reaping the Benefit of Grids
  – What is the impact on your research
• Future e-Infrastructure
  – Ambient Knowledge-based Computational Infrastructure

What is e-Science?
Goal: to enable better research in all disciplines
Method: Invention and exploitation of advanced computational methods
• to generate, curate and analyse research data
  – From experiments, observations and simulations
  – Quality management, preservation and reliable evidence
• to develop and explore models and simulations
  – Computation and data at extreme scales
  – Trustworthy, economic, timely and relevant results
• to enable dynamic distributed virtual organisations
  – Facilitating collaboration with information and resource sharing
  – Security, reliability, accountability, manageability and agility

Distributed Systems to Grids
[Timeline figure, 1960 to 2000; labels follow]
• Bespoke pioneering distributed systems, e.g.
SABRE & SAGE
• Licklider proposes shared multi-site computing
• ARPA net
• IBM CICS
• Ethernet
• TCP
• Dozens of academic networks
• Two dominant protocols: CORBA & DCOM
• IP-based Internet, Academic & Research
• WWW
(Time axis: 1960, 1970, 1980, 1990, 2000)

Distributed Systems to Grids
[Timeline figure, 1960 to 2000, with Grid developments added; labels follow]
• Bespoke pioneering distributed systems
• Licklider proposes shared multi-site computing
• ARPA net
• IBM CICS
• Dozens of academic networks
• Two dominant protocols: CORBA & DCOM
• IP-based Internet, Academic & Research
• WWW
• Collaboration via shared bio/chem/medical “DBs”
• Condor
• I-way
• Globus starts
• Unicore
• Web Services
• Many research grids, using many protocols & M/W stacks
• D-Grid starts
(Time axis: 1960, 1970, 1980, 1990, 2000)

Why use / build Grids?
Research Arguments
• Enables new ways of working
• New distributed & collaborative research
• Unprecedented scale and resources
Economic Arguments
• Reduced system management costs
• Shared resources ⇒ better utilisation
• Pooled resources ⇒ increased capacity
• Load sharing & utility computing
• Cheaper disaster recovery
Computer Science Arguments
• New attempt at an old hard problem
• Frustrating ignorance of existing results
• New scale, new dynamics, new scope
Engineering Arguments
• Enable autonomous organisations to
  – Write complementary software components
  – Set up, run & use complementary services
  – Share operational responsibility
Political & Management Arguments
• Stimulate innovation
• Promote intra-organisation collaboration
• Promote inter-enterprise collaboration

What is e-Infrastructure – Political view
• A shared resource
  – That enables science, research, engineering, medicine, industry, …
  – It will improve UK / European / … productivity
    – Lisbon Accord 2000
    – E-Science Vision SR2000 – John Taylor
• Commitment by UK government
  – Sections 2.23-2.25
• Always there
  – c.f.
telephones, transport, power

UK e-Science Budget (2001-2006)
Total: £213M + £100M via JISC
+ Industrial Contributions £25M
By research council:
  – MRC: £21.1M (10%)
  – BBSRC: £18M (8%)
  – EPSRC: £77.7M (37%)
  – NERC: £15M (7%)
  – PPARC: £57.6M (27%)
  – CLRC: £10M (5%)
  – ESRC: £13.6M (6%)
EPSRC Breakdown:
  – HPC: £11.5M (15%)
  – Applied: £35M (45%)
  – Core: £31.2M (40%)
Staff costs; Grid Resources, Computers & Network funded separately
Source: Science Budget 2003/4 – 2005/6, DTI(OST)
Slide from Steve Newhouse

Grids: a foundation for e-Science
• e-Science methodologies will rapidly transform science, engineering, medicine and business
  – driven by exponential growth (×1000/decade)
  – enabling a whole-system approach
[Diagram derived from Ian Foster’s slide: a Grid linking computers, software, sensor nets, instruments, colleagues and shared data archives]

Service-Oriented Architecture
[Diagram labels: Registry, Client, Service; Registration, Discovery, Invocation]

Grid Construction Principles
• Dynamic coupling based on SOA
• Respect Autonomy – local policies
  – Independent construction & provision
• Requires adherence to agreed protocols
  – Interoperability
  – Preferably with widely adopted standards
• Algorithms & Information structures
  – Must scale well
  – Must be tolerant to partial failure
  – Must be tolerant to partial system change
• Mechanisms to build trust are essential
• Much more than just providing services
  – Providing mutually consistent services with compatible policies

Why are distributed systems hard
• They are necessary to our modern way of life
  – telephone, airline booking,
internet, web, financial trading
    – Access to remotely curated data and remote / mobile resources
    – Integration of company-wide IT support for R&D
• Global, always on, single-purpose systems
  – Hard to build, finance and manage
  – Often based on sharing a common (may be replicated) database
• Global, always on, multi-purpose systems
  – Never been done – can it be done?
  – Only by a huge international collaborative effort

Challenges versus Goals
Challenge                         Goal
Heterogeneity & Variety           Simple operational model
Complex platform behaviour        Simple application model
Partial failures                  Simple user model
Partial failures + large tasks    Minimal resource wastage
Autonomy – owner’s rights         Stability & uniformity
Independent provision             Simple resource access
Scale, costs & latency            Good performance
Vulnerable to misuse              Dependable protection
Diverse & Evolving                Flexible & agile
Valuable assets                   IPR & assets well protected
Valuable assets: Reputation, equipment, teams, data, algorithms, working practices

The Primary Requirement …
Building people grids
Enabling People to Work Together on Challenging Projects: Science, Engineering & Medicine

The e-Science Centres
[Map labels]
• Globus Alliance
• National Centre for e-Social Science
• Open Middleware Infrastructure Institute
• e-Science Institute
• Grid Operations Support Centre
• Digital Curation Centre
• CeSC (Cambridge)
• EGEE
• National Institute for Environmental e-Science

How Grid Software Works: NSF Network for Earthquake Engineering Simulation (NEES)
Transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes
Ian Foster

[Figure labels: UCSF, UIUC]
From Klaus Schulten, Center for Biomolecular Modelling and Bioinformatics, Urbana-Champaign

Grid2003: An Operational Grid
• 28 sites (2100-2800 CPUs) & growing
• 400-1300 concurrent jobs
• 8 substantial applications + CS experiments
• Running since October 2003
[Site map; label: Korea]
http://www.ivdgl.org/grid2003

Database Growth
[Figure: PDB Content Growth]

BRIDGES
[Diagram labels]
• CFG Virtual Organisation
• Publicly Curated Data: Ensembl, OMIM, SWISS-PROT, MGI, HUGO, RGD, …
• Private data: Glasgow, Edinburgh, Oxford, Leicester, Netherlands, London
• VO Authorisation
• DATA HUB
• OGSA-DAI
• Information Integrator
• Synteny Grid Service
• blast

Mammography
A prototype of a national database of mammographic images in support of the UK breast screening programme
[Image panels: Temporal mammography; Standard Mammo Format; Computer Aided Detection; 3D View]
Mammograms have different appearances, depending on image settings and acquisition systems

eDiaMoND: Screening for Breast Cancer
[Diagram labels]
• Patients; Screening; Assessment/Symptomatic; Biopsy
• Electronic Patient Records; Radiology reporting systems; Letters; Case Information
• 2ndary Capture or FFD; X-Rays and Case Information
• Other Modalities: MRI, PET, Ultrasound
• Symptomatic/Assessment Information
• eDiaMoND Grid
• Case and Reading Information; Digital Reading
• SMF; CAD; 3D Images; Temporal Comparison
• Training; Manage Training Cases; Perform Training
• 1 Trust → Many Trusts; Collaborative Working; Audit capability; Epidemiology
• Better access to case information and digital tools
• Supplement mentoring with access to digital training cases and sharing of information across clinics
Provided by eDiaMoND project: Prof. Sir Mike Brady et al.
Virtual Observatories
• Observations made across entire electromagnetic spectrum
  – ROSAT ~keV, DSS Optical, 2MASS 2µ, IRAS 25µ, IRAS 100µ, GB 6cm, NVSS 20cm, WENSS 92cm
  ⇒ e.g. different views of a local galaxy
• Need all of them to understand physics fully
• Databases are located throughout the world
Peter Clarke

Data Services: challenges
• Scale
  – Many sites, large collections, many uses
• Longevity
  – Research requirements outlive technical decisions
• Diversity
  – No “one size fits all” solutions will work
  – Primary Data, Data Products, Meta Data, Administrative data, …
• Many Data Resources
  – Independently owned & managed
    – No common goals
    – No common design
    – Work hard for agreements on foundation types and ontologies
    – Autonomous decisions change data, structure, policy, …
  – Geographically distributed
• and I haven’t even mentioned security yet!
Slide from Neil Chue Hong

OGSA-DAI In One Slide
• An extensible framework for data access and integration.
• Expose heterogeneous data resources to a grid through web services.
• Interact with data resources:
  – Queries and updates.
  – Data transformation / compression
  – Data delivery.
• Customise for your project using
  – Additional Activities
  – Client Toolkit APIs
  – Data Resource handlers
• A base for higher-level services
  – federation, mining, visualisation,…
Slide from Neil Chue Hong

Core features of OGSA-DAI – I
• A framework for building applications
  – Supports data access, insert and update
    – Relational: MySQL, Oracle, DB2, SQL Server, Postgres
    – XML: Xindice, eXist
    – Files – CSV, BinX, EMBL, OMIM, SWISSPROT,…
  – Supports data delivery
    – SOAP over HTTP
    – FTP; GridFTP
    – E-mail
    – Inter-service
  – Supports data transformation
    – XSLT
    – ZIP; GZIP
  – Supports security
    – X.509 certificate based security
Slide from Neil Chue Hong

Core features of OGSA-DAI – II
• A framework for building data clients
  – Client toolkit library for application developers
• A framework for developing functionality
  – Extend existing activities, or implement your own
  – Mix and match activities to provide functionality you need
• Highly-extensible
  – Customise our out-of-the-box product
  – Provide your own services, client-side support and data-related functionality
• Comprehensive documentation and tutorials
• Latest release supports GT3.2 (to be deprecated), GT4.0, and Axis 1.2 / OMII_2 using Java 1.4
Slide from Neil Chue Hong

International Collaboration & Use
USA:
  o Globus Alliance
  o IBM Corporation
  o caBIG
  o BIRN
  o Indiana University
  o GridSphere
  o GEON
  o LEAD
  o MCS
  o NCSA
  o Secure Data Grid
  o UNC
UK:
  o OMII
  o OMII-UK
  o NGS
  o NCeSS
  o NIeeS
  o AstroGrid
  o BioSimGrid
  o BRIDGES
  o CancerGrid
  o ConvertGrid
  o eDiaMoND
  o EDINA
  o First Group plc
  o Fujitsu Labs Europe
  o GEDDM
  o GeneGrid
  o Genomic Technology and Informatics
  o GOLD
  o Human Genetics Unit
  o IBM UK
  o myGrid
  o Oracle UK
Europe:
  o CERN
  o DataMiningGrid
  o GridMiner
  o GridSphere
  o inteligrid
  o N2Grid
  o OntoGrid
  o Provenance
  o SIMDAT
  o OMII-EU
China:
  o CAS
  o ChinaGrid
  o cnGrid
  o INWA
  o OMII-China
Japan:
  o AIST
  o BioGrid
  o NAREGI
South Korea:
  o KISTI
Australia:
  o Curtin Business School
  o INWA
Tutorials: Boston, Cambridge, CERN, Chicago, Edinburgh, London, San Francisco, Seattle, Seoul, Singapore, Tokyo, ISSGC 03 to 05
DIALOGUE workshops: Chicago, Columbus, Edinburgh, Indiana, Manchester, San Diego, Vienna
1485 registered users
5250+ downloads
[Pie chart by country: China 40%, Others 20%, United Kingdom 15%, United States 11%, Japan 5%, France 3%, Germany 3%, Italy 2%, Austria 1%]

Meeting User Requirements
[Project labels: FirstDIG, ConvertGrid, eDiaMoND, GeneGrid, BRIDGES, LEAD, Grid Miner, OGSA WebDB, OGSA-DQP, caBIG]

It’s a Bonanza – can you keep your head?
• Your insights in your subject
  – Foundation step in discovery
  – Directs your line of work
  – Selecting well is still a hard challenge
• A Jungle of Opportunities
  – Retain judgement & sense of direction
  – Don’t be distracted by technology, …
• Be a Smart Tool User
  – Understand and exploit tools
  – Demand good & persistent tools
  – Early adopters may win or be mired in a techno-bog

Be a Smart Data User
• Choosing data sources
  – How do you find them?
  – How do they describe and advertise them?
  – Is the equivalent of Google possible?
  – You’re an innovator
    ∴ Your model ≠ their model
• Obtaining access to that data
  – Overcoming administrative barriers
  – Overcoming technical barriers
  ⇒ Negotiation & patience needed from both sides
• Understanding that data
  – The parts you care about for your research
• Extracting nuggets from multiple sources
  – Pieces of your jigsaw puzzle
• Combining them using sophisticated models
  – The picture of reality in your head
• Analysis on scales required by statistics
  – Coupling data access with computation
• Repeated Processes
  – Examining variations, covering a set of candidates
  – Monitoring the emerging details
  – Coupling with scientific workflows

Be a Smart Collaborator
• Understand & Negotiate
  – Position, Role & Responsibilities
• Deliver on your promises
  – On time
• Record and State Provenance of Work
• Build international & national collaboration
  – Investment in communication and learning
  – Expect good collaboration from others

CS: Take Home Message
• There are plenty of Research Challenges
  – Workflow & DB integration, co-optimised
  – Distributed Queries on a global scale
  – Heterogeneity on a global scale
  – Dynamic variability
    – Authorisation, Resources, Data & Schema
    – Performance
  – Some Massive Data
  – Metadata for discovery, automation, repetition, …
  – Catalogues
  – Optimising Data replication & Data movement
  – Provenance tracking
• Grasp the theoretical & practical challenges
  – Working in Open & Dynamic systems
  – Incorporate all computation

E-Infrastructure Nirvana
[Layered diagram; labels follow]
• Users: Individual & Organisations
• User Adaptation Layer
• DIK Adaptation Layer
• CSC Adaptation Layer
• Integration, Sharing, Trust building, Resource Managing Distributed System Infrastructure
• Compute, Storage & Communications Providers
• Data, Information & Knowledge Providers

E-Infrastructure: the path to Nirvana 1
[Layered diagram; labels follow]
• Users: Individual & Organisations
• User Adaptation Layer
• Integration, Sharing, Trust building, Resource Managing Distributed System Infrastructure
• Data, Information & Knowledge Providers
• Compute, Storage & Communications Providers
Annotations:
  – Stability
  – Extended with Improved Abstractions
  – New technology
  – Discipline Specific Toolsets, Languages & Portals
  – User Mobility

E-Infrastructure: the path to Nirvana 2
[Layered diagram; labels follow]
• Users: Individual & Organisations
• User Adaptation Layer
• Integration, Sharing, Trust building, Resource Managing Distributed System Infrastructure
• Compute, Storage & Communications Providers
• Data, Information & Knowledge Providers
Annotations:
  – Stability + Higher-order Operations + Improved (semantic) Meta data

E-Infrastructure: the path to Nirvana 3
[Layered diagram; labels follow]
• Users: Individual & Organisations
• User Adaptation Layer
• Integration, Sharing, Trust building, Resource Managing Distributed System Infrastructure
• Data, Information & Knowledge Providers
• Compute, Storage & Communications Providers
Annotations:
  – Stability
  – Increasing Self-organisation & Autonomic Behaviour
  – Automated Generation of Adapters & Translators

Summary: Take home message
• E-Infrastructure is arriving
  – Built on Grids & Web Services
• Data and Information grow in importance
• There is a dramatic rate of change
• An opportunity for everyone