UK e-Science Grid Infrastructure meets Biological Research Challenges
Malcolm Atkinson, Director of National e-Science Centre
www.nesc.ac.uk
2nd October 2002
The UK Biological Grid: Data and Computation
The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire

Overview
  UK e-Science: reminder of investment and infrastructure
  International e-Science: examples and collaboration
  Data Access and Integration: Lego bricks for scientific application developers
  A Computer Scientist's View of Biology: diversity and opportunity
  The Way Ahead

e-Science
  Fundamentally about collaboration
  Sharing: ideas, thought processes and stimuli, effort, resources
  Requires: communication, common understanding & framework, mechanisms for sharing fairly, organisation and infrastructure
  Scientists (biologists) have done this for centuries

e-Science (take 2)
  Fundamentally about collaboration
  Sharing: ideas, thought processes and stimuli, effort, resources
  Requires: communication, common understanding & framework, mechanisms for sharing fairly, organisation and infrastructure
  Text, digital media, structured, organised & curated data, computable models, visualisation, shared instruments, shared systems, shared administration, …
  Nationally & internationally distributed, …
  Routine, daily, automated, …
  That requires very significant investment in digital systems and their support

e-Science (take 3)
  Fundamentally about collaboration
  Sharing: ideas, thought processes and stimuli, effort, resources
  Requires: communication, common understanding & framework, mechanisms for sharing fairly, organisation and infrastructure
  Digital networks, digital work-places, digital instruments, …
  Metadata, ontologies, standards, shared curated data, shared codes, …
  Common platforms, shared software, shared training, …
  Authentication, authorisation, accounting, provenance, policies, …
  Shared provision of platform
  The Grid SHOULD make this much easier by providing a common, supported high level of
  software and organisational infrastructure.

Grid Expectations
  Persistence: always there, always working, always supported
  Stability: you can build on foundations that don't move
  Trustworthy & predictable: honours commitments
    Digital policies, digital contracts, security, …
    Data integrity, longevity and accessibility
  Performance
  High-level & extensible: the capabilities you need are already there
  Ubiquitous: your collaborators use it

Grid Reality
  Persistence: political, economic & technical issues to solve
  Stability: early days, but Open Grid Services link with Web Services + GGF standardisation
  Trustworthy & predictable: not yet, but a very substantial global effort to achieve this
  Performance; high-level & extensible: good basis for extension; commitment to basic functionality; WS + community effort
  Ubiquitous: global & industrial rallying cry; must work with Web Services

UK Grid Network
  [Map: National e-Science Centre, HPC(x), Edinburgh, Glasgow, Newcastle, Belfast, Daresbury Lab, Manchester, Cambridge, Hinxton, Oxford, Cardiff, RAL, London, Southampton; Access Grid always-on video walls]

National e-Science Centre
  Events: workshops, research meetings, international meetings
  History of events: GGF5, HPDC11, summer school; > 50 workshops held; > 1000 people in total; many return often
  Planned events: 25 workshops; conferences to 2005
  Visitors: 3 arrived, 4 arranged
  International collaboration, visits & visitors: China, Argonne National Lab, SDSC, NCSA, …
  Centre projects: pilot projects, regional support, research projects (EPSRC, MRC, WT, SHEFC)

A day in the life of NeSC

Online Access to Scientific Instruments
  [Figure: Advanced Photon Source with real-time collection, tomographic reconstruction, archival storage, wide-area dissemination, and desktop & VR clients with shared controls]
  DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
  From Steve Tuecke, 12 Oct. 01

[Figure: UCSF and UIUC]
  From Klaus Schulten, Center for Biomolecular Modeling and Bioinformatics, Urbana-Champaign

DataGrid Testbed
  Testbed sites (>40), HEP sites and ESA sites: Dubna, Lund, RAL, Estec, KNMI, IPSL, Paris, Santander, Lisboa, CERN, Moscow, Berlin, Prague, Brno, Lyon, Grenoble, Milano, PD-LNL, Torino, Madrid, Marseille, Pisa, BO-CNAF, ESRIN, Barcelona, Roma, Valencia, Catania
  Francois.Etienne@in2p3.fr, Antonia.Ghiselli@cnaf.infn.it

A Simplified Grid Anatomy (top to bottom)
  Scientific users
  Scientific application (application developers)
  Monitoring, diagnosis, logging, scheduling, accounting, authorisation
  Grid plumbing & security infrastructure (operations team)
  Distributed data & compute resources (owners)

A Biological Grid Anatomy (top to bottom)
  Biological users
  Scientific application
  Monitoring, diagnosis, scheduling, accounting, logging, authorisation
  Data integration
  Data access
  Grid plumbing & security infrastructure
  Distributed data & compute resources; structured data

Database Growth
  [Chart: growth in the number of PDB protein structures]

Scientific Data
  Deluge of data: exponential growth
  Doubling times: astronomy 12 months; bio-sequences 9 months; functional genomics 6 months; bytes/dollar 12 to 18 months
  Not how big it is, but what you do with it: sharing, curation, metadata, automated movement, access & integration, computational access
  Not how big it is, but how you embrace & manage change
    The database is a knowledge chest
    The database is a communication hub
    Autonomously managed (curated) change
    An essential part of e-BioMedical science

Wellcome Trust: Cardiovascular Functional Genomics
  [Figure: Glasgow, Edinburgh, Leicester, Oxford, London and the Netherlands, linked by shared data and public curated data]

Data Access & Integration
  Central to e-Science, especially earth sciences, ecology, biology & medicine
  Collaboration: shared databases, curated knowledge, accumulated observations, accumulated simulations
  Computation: data mining, input to models, calibration of models
  Presentation: publication of results, visualisation

GGF DAIS WG
  Chairs: Norman Paton (Manchester Uni.), Leanne Guy (CERN), Dave Pearson (Oracle UK)
  Activity: BoF at GGF4, Toronto; WG meeting at GGF5, Edinburgh; papers for GGF6; workshops & mail lists
  Goals: agree standards for database access & integration; freely available reference implementations (OGSA-DAI); one source & focus for discussions
  Norman Paton, Inderpal Narang, Leanne Guy, Susan Maliaka, Greg Ricardi, …

OGSA-DAI project
  Lego kit for data access & integration: components for e-Science applications
  Accelerated application development
  Multiple data models
  Distributed data access via Grid & proxies
  Integration, translation & transformation
  Open source reference implementation for DAIS-WG standard
  Trigger for component construction
  Start a community

OGSA-DAI Partners
  IBM USA, IBM UK, IBM Hursley, Oracle, EPCC & NeSC, Manchester e-SC, Newcastle e-SC
  [Map: Glasgow, Newcastle, Belfast, Daresbury Lab, Manchester, Oxford, Cambridge, Hinxton, RAL, Cardiff, London, Southampton]
  £3 million, 18 months, started February 2002

Primary Components
  [Diagram: client, GDSF (Grid Data Service Factory), GDS (Grid Data Service), GDSR (Grid Data Service Registry), DB, consumer]

Advanced Components
  [Diagram: client, GDS:PerformScript, GDS over a DB, translation, GDT (Grid Data Transport), consumer]

Composed Components
  [Diagram: client and consumer linked by a chain of GDS:performScript, translation and GDT components]

Distributed Query
  [Diagram: registry, factory, client, DQP, evaluators, GDSs, GDTs and GDTVs exchanging queries, translations and progress notifications in numbered steps 1 to 8]
  DQP: Distributed Query Processor
  GDT: Grid Data Transport
  T: Translation
  Q: Query
  GDTV: Grid Data Transport Vehicle
  F: Factory
  R: Registry
  QPM: Query Progress Monitor
  PNM: Progress Notification Message
  AM: Application Metadata
  CRM: Computational Resource Metadata
  NS: Notification Sink

OGSA-DAI Time Line
  Milestones: WS + GSI UK support (> 100 downloads); XML + OGSA prototypes for early adopters; design documents & demos for DAIS WG @ GGF5; XML + OGSA prototype available; RDB + GT2 / OGSA prototypes available; GGF6 WG papers & prototypes; ship alpha release for GT3 integration; presentation & beta @ GGF7; productisation, RAMPS & extension
  Axis: Feb '02, May '02, Jul '02, Sep '02, Dec '02, Feb '03, May '03, Sep '03, with "Phase 1 starts" and "Phase 2 starts" marked

OGSA-DAI Summary
  On schedule & going well
    Contributions via DAIS-WG @ GGF5 & 6
    Releases with GT3
    Releases scheduled
  Status: early days
    Released prototypes
    Tested architectural design using OGSA
    Working with early adopter pilot projects: AstroGrid & MyGrid
  Influence OGSA-DAI direction via DAIS-WG & direct messages to us

Biomedical e-Scientists
  Is this one species?
  Understanding bird energy
  Understanding a river / ocean interaction
  Understanding a biochemical pathway
  Understanding a cell
  Understanding a heart or brain
  Understanding Rhododendra
  Understanding evolution
  …
  No one-size-fits-all solutions, but shareable, re-usable components

Opportunities
  Many, many … more than we can address: compute needs, data management needs, data integration needs, …
  Must choose some pioneers
    To meet a range of common requirements
    To provoke a rich & high-level platform
    To generate re-usable components
  A long-term commitment is needed

Advancing Biological Grid (top to bottom)
  Biological users
  Scientific application
  Biomedical (Grid) application component library
  Monitoring, diagnosis, scheduling, accounting, logging, authorisation
  Data integration
  Data access
  Grid plumbing & security infrastructure
  Distributed data & compute resources; structured data

Summary
  e-Science: data as well as compute challenges, which need to be put together; need ubiquitous, supported, consistent platforms
  Grid: a (potentially) invaluable platform; the only show in town
  Data integration: hard ⇒ develop & use a standard kit of parts; we have started to build the kit
  Opportunities: no one-size-fits-all, but re-usable subsystems; invest in a wider range of problem-driven pioneering; strategic choices needed
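The "Simplified Grid Anatomy" slide places monitoring, logging, accounting and authorisation between the scientific application and the data and compute resources. The sketch below illustrates only that layering, under stated assumptions: the classes `SequenceStore`, `Middleware` and `AccessDenied`, the per-user policy table, and the toy record are all hypothetical and do not correspond to any Grid toolkit API.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")


class AccessDenied(Exception):
    """Raised when the (hypothetical) authorisation layer rejects a request."""


class SequenceStore:
    """Hypothetical data resource at the bottom of the anatomy."""

    def __init__(self, records):
        self._records = dict(records)

    def lookup(self, accession):
        return self._records.get(accession)


class Middleware:
    """Hypothetical service layer: every request passes authorisation,
    accounting and logging before it reaches the resource."""

    def __init__(self, resource, policy):
        self.resource = resource
        self.policy = policy        # assumed format: user -> set of operation names
        self.usage = Counter()      # accounting: authorised requests per user
        self.log = logging.getLogger("grid.middleware")

    def request(self, user, operation, *args):
        # Authorisation: consult the policy table.
        if operation not in self.policy.get(user, set()):
            self.log.warning("denied %s -> %s", user, operation)
            raise AccessDenied(f"{user} may not call {operation}")
        # Accounting and logging are applied only to authorised requests.
        self.usage[user] += 1
        self.log.info("%s -> %s%s", user, operation, args)
        return getattr(self.resource, operation)(*args)


if __name__ == "__main__":
    grid = Middleware(SequenceStore({"seq-001": "ACGTACGT"}), {"alice": {"lookup"}})
    print(grid.request("alice", "lookup", "seq-001"))
    try:
        grid.request("bob", "lookup", "seq-001")
    except AccessDenied as err:
        print("refused:", err)
    print(dict(grid.usage))
```

Keeping these concerns in one shared layer, rather than in each application, is what lets the anatomy offer them once to many scientific applications.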
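The doubling times on the "Scientific Data" slide can be turned into a rough cost comparison. Assuming pure exponential growth, a quantity with doubling time T grows by a factor of 2^(t/T) after t months. If storage spending scales as data volume divided by bytes per dollar, the cost factor is the ratio of the two growth factors. The five-year horizon and the function names below are illustrative assumptions.

```python
# Doubling times quoted on the "Scientific Data" slide (months).
DOUBLING_MONTHS = {
    "astronomy": 12,
    "bio-sequences": 9,
    "functional genomics": 6,
}
BYTES_PER_DOLLAR_MONTHS = (12, 18)  # range quoted for bytes/dollar


def growth_factor(months, doubling_months):
    """Multiplicative growth after `months`, assuming exponential growth."""
    return 2 ** (months / doubling_months)


def budget_factor(months, data_doubling, storage_doubling):
    """Growth in spending needed just to store the data, assuming
    cost = data volume / (bytes per dollar)."""
    return growth_factor(months, data_doubling) / growth_factor(months, storage_doubling)


if __name__ == "__main__":
    horizon = 60  # five years; an illustrative choice
    for field, t in DOUBLING_MONTHS.items():
        # 12-month storage doubling gives the lower cost bound, 18-month the upper.
        lo, hi = (budget_factor(horizon, t, s) for s in BYTES_PER_DOLLAR_MONTHS)
        print(f"{field:20s} data x{growth_factor(horizon, t):7.0f}  "
              f"cost x{lo:6.1f} to x{hi:6.1f}")
```

For functional genomics, a 6-month data doubling against a 12-month bytes/dollar doubling gives 2^10 / 2^5 = 32, so under these assumptions the cost of simply storing the data rises 32-fold over five years. This supports the slide's point that managing and using the data matters more than its raw size.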
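The "Primary Components" slide names a client, a factory (GDSF), a Grid Data Service (GDS), a registry (GDSR), a database and a consumer. The sketch below is a minimal, hypothetical illustration of that factory/registry pattern in plain Python with an in-memory SQLite database. It is not the OGSA-DAI API. The class and function names are assumptions, and a request here is just an SQL string rather than a perform document.

```python
import sqlite3


class GridDataService:
    """Hypothetical GDS: a service instance bound to one data resource."""

    def __init__(self, conn):
        self._conn = conn

    def perform(self, sql, params=()):
        # Assumption: a request is an SQL string with parameters.
        return self._conn.execute(sql, params).fetchall()


class GridDataServiceFactory:
    """Hypothetical GDSF: creates GDS instances for the resource it fronts."""

    def __init__(self, name, conn):
        self.name = name
        self._conn = conn

    def create_service(self):
        return GridDataService(self._conn)


class GridDataServiceRegistry:
    """Hypothetical GDSR: clients discover factories by keyword."""

    def __init__(self):
        self._entries = []

    def register(self, factory, description):
        self._entries.append((factory, description))

    def find(self, keyword):
        return [f for f, d in self._entries if keyword in d]


def client(registry, keyword, sql, consumer):
    """Registry lookup -> factory -> new GDS -> results delivered to consumer."""
    factory = registry.find(keyword)[0]
    service = factory.create_service()
    consumer.extend(service.perform(sql))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE structures (id TEXT, resolution REAL)")
    conn.executemany("INSERT INTO structures VALUES (?, ?)",
                     [("s1", 1.8), ("s2", 2.5), ("s3", 3.1)])
    registry = GridDataServiceRegistry()
    registry.register(GridDataServiceFactory("structdb", conn), "protein structures")
    results = []
    client(registry, "protein",
           "SELECT id FROM structures WHERE resolution < 3.0 ORDER BY id", results)
    print(results)
```

Separating discovery (registry), creation (factory) and use (service) means a client needs only a keyword to reach a database it has never seen. That is the "Lego brick" property the slides emphasise.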
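The "Composed Components" slide chains data-service output through translation (T) and transport (GDT) steps to a consumer. One way to sketch that composition, under stated assumptions, is a pipeline of Python generators. The function names are hypothetical, FASTA is just an example target format, and treating transport as fixed-size batching is an assumed stand-in for how a GDT moves data.

```python
from typing import Callable, Iterable, Iterator, List


def data_service(rows: Iterable[tuple]) -> Iterator[tuple]:
    """Stand-in for a GDS producing result rows (hypothetical)."""
    yield from rows


def translation(items: Iterable, fn: Callable) -> Iterator:
    """Translation step: applies a format conversion to each item."""
    for item in items:
        yield fn(item)


def transport(items: Iterable, batch_size: int) -> Iterator[List]:
    """Transport step: groups items into fixed-size batches
    (an assumed stand-in for a GDT moving data in units)."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def to_fasta(row: tuple) -> str:
    """Example translation: (identifier, sequence) -> FASTA record."""
    ident, seq = row
    return f">{ident}\n{seq}"


if __name__ == "__main__":
    rows = [("g1", "ATGC"), ("g2", "GGCA"), ("g3", "TTAG")]
    pipeline = transport(translation(data_service(rows), to_fasta), batch_size=2)
    for batch in pipeline:  # the consumer
        print(batch)
```

Because each stage is a generator, items flow through lazily. Stages can be recombined freely without any stage holding the whole result set in memory.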
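The "Distributed Query" slide shows a DQP coordinating several evaluators, with progress notification messages (PNMs) sent to a notification sink (NS). The sketch below shows only that shape: two selections run in parallel at hypothetical evaluators, each reports progress to a sink, and a coordinator hash-joins the partial results. The real DQP design is not reproduced here. `distributed_join`, `evaluate`, and the gene and expression tables are illustrative assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class NotificationSink:
    """NS: collects progress notification messages from evaluators."""

    def __init__(self):
        self.messages = []
        self._lock = threading.Lock()

    def notify(self, evaluator, text):
        with self._lock:  # evaluators run concurrently
            self.messages.append((evaluator, text))


def evaluate(name, table, predicate, sink):
    """Hypothetical evaluator: runs a selection next to its data source."""
    sink.notify(name, "started")
    rows = [row for row in table if predicate(row)]
    sink.notify(name, f"finished with {len(rows)} rows")
    return rows


def distributed_join(genes, expression, sink):
    """Hypothetical coordinator: parallel selections, then a hash join on gene id."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(evaluate, "evaluator-1", genes,
                           lambda g: g["chromosome"] == "7", sink)
        right = pool.submit(evaluate, "evaluator-2", expression,
                            lambda e: e["tissue"] == "heart", sink)
        gene_rows, expr_rows = left.result(), right.result()
    by_gene = {}
    for e in expr_rows:
        by_gene.setdefault(e["gene"], []).append(e)
    return [(g["gene"], e["level"])
            for g in gene_rows for e in by_gene.get(g["gene"], [])]


if __name__ == "__main__":
    genes = [{"gene": "gA", "chromosome": "7"},
             {"gene": "gB", "chromosome": "3"},
             {"gene": "gC", "chromosome": "7"}]
    expression = [{"gene": "gA", "tissue": "heart", "level": 4.2},
                  {"gene": "gB", "tissue": "heart", "level": 1.0},
                  {"gene": "gC", "tissue": "brain", "level": 2.7}]
    sink = NotificationSink()
    print(distributed_join(genes, expression, sink))
    print(len(sink.messages), "progress messages")
```

Pushing the selections to the evaluators means only the filtered rows travel to the coordinator. The notification sink gives the client a view of progress while the query runs.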