Data Dominated e-Science
  Dr. Dave Berry
  Research Manager
  www.nesc.ac.uk
  Visit of Clinton Foster
  20th August 2003

Outline
  What is e-Science?
  3 models & 2 examples
  Delivering e-Science
  Open Grid Services Architecture
  UK e-Science
  UK e-Science: Roles and Resources
  Grid Infrastructure

Foundation for e-Science
  e-Science methodologies will rapidly transform science, engineering, medicine and business, driven by exponential growth (×1000/decade)
  [Diagram: computers, software, Grid, sensor nets, instruments, colleagues, shared data archives]

Focus for Three Modes of Thought
  Computing Science: Systems, Notations & Formal Foundation → Process & Trust
  Experiment & Advanced Data Collection → Shared Data
  Models & Simulations → Shared Data
  Results

Virtual Organisations
  Multi-national, Multi-discipline, Computer-enabled Consortia, Cultures & Societies
  Requires Much Engineering, Much Innovation
  Changes Culture, New Mores, New Behaviours
  New Opportunities, New Results, New Rewards

Global in-flight engine diagnostics
  [Diagram: in-flight data; airline; global network, e.g. SITA; ground station; DS&S Engine Health Center; internet, e-mail, pager; maintenance centre; data centre]
  Distributed Aircraft Maintenance Environment: Universities of Leeds, Oxford, Sheffield & York

Biology & Medicine
  Extensive Research Community
    >1000 per research university
  Extensive Applications
    Health, Food, Environment
  Interacts with virtually every discipline
    Physics, Chemistry, Nanoengineering, …
  450 Databases relevant to bioinformatics
    Heterogeneity, Interdependence, Complexity, Change, …
  Wonderful Scientific Questions
    How does a cell work?
    How does a brain work?
    How does an organism develop?
    Why is the biosphere so stable?
    What happens to the biosphere when the earth warms up?
    …

Database Growth
  39,856,567,747
  [Chart: PDB Content Growth]

Database-mediated Communication
  [Diagram labels: Experimentation Community; Simulation Community; Analysis & Theory Community; Curated Shared Database; Data; Results; Knowledge; Carries knowledge]

Organisational & Cultural Changes
  Access to Computation & Data must be simple
    All use a computational, semantic, data-rich web
    i.e. it's invisible; the portal / browser lets you do more
  Responsibility of data publishers
    Cost, dependability, trustworthiness, capability, flexibility, …
  Shared contributions compose indefinitely
    Knowledge accumulation and interdependence
    Contributor recognition and IPR
  Complexity and management of infrastructure
    Always on
    Must be sustained
    Paid for
    Hidden

TeraBytes → PetaBytes
                        TeraByte                PetaByte
  RAM time to move      15 minutes              2 months
  1 Gb/s WAN move time  10 hours ($1000)        14 months ($1 million)
  Disk Cost             7 disks, $5000 (SCSI)   6800 disks + 490 units + 32 racks, $7 million
  Disk Power            100 Watts               100 Kilowatts
  Disk Weight           5.6 kg                  33 Tonnes
  Disk Footprint        Inside machine          60 m²
  May 2003, Approximately Correct
  See also: Distributed Computing Economics, Jim Gray, Microsoft Research, MSR-TR-2003-24

Open Grid Services Architecture
  Web Services
  Grid Technology
  Grid Services

Web Services
  Independence
    Client from Service
    Service from Client
  Description
    Web Services DL …
  Separation
    Function from Delivery
  Tools & Platforms
    Java ONE
    Visual .NET
    WebSphere
    Oracle
  Commercial Buy in
  www.w3.org/TR/SOAP or TR/wsdl

Grid Technology
  Distribution
    Various Protocols
    FTP
  Security
    Single Sign in
  Resource Sharing
    Discovery
    Process Creation
    Scheduling
  Portability
    APIs
  Government Agency Buy in

Foster, I., Kesselman, C. and Tuecke, S., The Anatomy of the Grid: Enabling Virtual Organisations, Intl. J.
Supercomputer Applications, 15(3), 2001

OGSA Features
  Description, Discovery: WSDL + WSIL
  Life Time Management
    Factories
    Transient & Persistent GS
    GS Handles
    GS Records
    Soft State
  Notification
  Tools & Platforms: Apache Axis …
  Invocation: SOAP RPC …
  Representations: XML + Schema
  Authentication: Certificates + Delegation
  Change Management
  Platform

Foster, I., Kesselman, C., Nick, J. and Tuecke, S., The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration

UK e-Science Funding
  First Phase: 2001–2004
    Application Projects £74M
    >60 Projects
    340 at first All Hands Meeting
    Core Programme £35M
    Collaborative industrial projects
    ~80 Companies
  Second Phase: 2003–2006
    Application Projects £96M
    Core Programme £16M + £25M (?)
    Core Grid Middleware

All areas of science and engineering
  Research Council        2001-4          2004-6
  Medical                 £8M             £13.1M
  Biological              £8M             £10M
  Environmental           £7M             £8M
  Eng & Phys              £17M            £18M
  HPC                     £9M             £2.5M
  Core Prog.              £15M + £20M     £16.2M + £DTI
  Particle Phys & Astro   £26M            £31.6M
  Economic & Social       £3M             £10.6M
  Central Labs            £5M             £5M

NeSC in the UK
  [Map of the UK with a "You are here" marker]
  Sites: Glasgow, Edinburgh, Newcastle, Belfast, Manchester, Daresbury Lab, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, Southampton
  Activities and groups:
    HPC(x)
    Directors' Forum
    Architecture Task Force
    UK Adoption of OGSA
    OGSA Grid Market
    Workflow Management
    Database Task Force
    OGSA-DAI
    GGF DAIS-WG
    Engineering Task Force
    e-Science Institute: training, coordination, community building, workshops, pioneering
    Grid Support Centre
    GridNet
    e-Storm

UK Grid: Operational
  Currently based on Globus Toolkit 2
  Transition to OGSI/OGSA over the next year
  Heterogeneous
    Many architectures and operating systems
    Many organisations
  Many issues still to be resolved, e.g.
    OGSA definition / delivery
    Portals
    Combinations of Services supported
    Account management and accounting

ODD-Genes
  [Diagram: database engine 1, registry, database engine 2, PSE]

www.nesc.ac.uk
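
Worked check for the "TeraBytes → PetaBytes" slide. The slide describes its figures as approximately correct, so the following is only a rough sketch of the WAN arithmetic, not part of the original talk. It assumes a decimal terabyte (10^12 bytes), an average month of 730 hours, and a simple ×1000 scaling from terabyte to petabyte; the function and variable names are illustrative.

```python
# Rough check of the WAN row on the "TeraBytes → PetaBytes" slide.
# Assumptions (not from the slides): decimal terabyte, 730-hour average month.

HOURS_PER_MONTH = 730          # 365 * 24 / 12
TERABYTE = 10**12              # bytes
LINK_BITS_PER_SECOND = 10**9   # the slide's 1 Gb/s WAN


def scale_hours_to_months(hours_at_terabyte: float, factor: int = 1000) -> float:
    """Scale a terabyte-level duration by `factor` and express it in months."""
    return hours_at_terabyte * factor / HOURS_PER_MONTH


def ideal_transfer_hours(num_bytes: float, bits_per_second: float) -> float:
    """Transfer time at full line rate, ignoring all protocol overhead."""
    return num_bytes * 8 / bits_per_second / 3600


# Slide figure: 10 hours to move one terabyte over the WAN.
wan_months = scale_hours_to_months(10)

# Best case for one terabyte if the link ran at its full nominal rate.
ideal_hours = ideal_transfer_hours(TERABYTE, LINK_BITS_PER_SECOND)

# Effective throughput implied by the slide's 10-hour figure, in Mb/s.
implied_mbps = TERABYTE * 8 / (10 * 3600) / 10**6

print(f"1 PB over the WAN: about {wan_months:.1f} months")
print(f"1 TB at full line rate: about {ideal_hours:.1f} hours")
print(f"Throughput implied by 10 hours per TB: about {implied_mbps:.0f} Mb/s")
```

Under these assumptions the petabyte figure comes out near the slide's 14 months, and the 10-hour terabyte figure corresponds to an effective throughput well below the nominal 1 Gb/s.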