e-Science and the Grid - Towards a BioGrid?
Tony Hey
Director of UK e-Science Core Program
Tony.Hey@epsrc.ac.uk

e-Science and the Grid
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
John Taylor
Director General of Research Councils
Office of Science and Technology

UK e-Science Initiative
• £120M Programme over 3 years
• £75M is for Grid Applications in all areas of science and engineering
• £35M ‘Core Program’ to encourage development of generic ‘industrial strength’ Grid middleware
  → Require £20M additional ‘matching’ funds from industry

UK e-Science Projects
• £75M for e-Science application ‘pilots’ - span all sciences and engineering
• Particle Physics and Astronomy (PPARC) - £17M GridPP and £5M AstroGrid
• Engineering and Physical Sciences (EPSRC) - funding 6 projects at around £3M each
• Biology, Medical and Environmental Science - funding projects with total value of £23M

GridPP
Presentation to PPARC Grid Steering Committee, 26 July 2001
Steve Lloyd, Tony Doyle, John Gordon

Data Handling and Computation for Physics Analysis
[Diagram (CERN, les.robertson@cern.ch). Labels: detector; event filter (selection & reconstruction); raw data; reconstruction; event reprocessing; event simulation; processed data; event summary data; batch physics analysis; analysis; analysis objects (extracted by physics topic); simulation; interactive physics analysis]

Estimated Physics Computation Capacity at CERN
[Chart: capacity in M SI95 (axis 0.0 to 6.0) by year, 2000 to 2010. Annotation: Moore’s law: capacity growth with a fixed CPU count or a fixed annual budget]

CERN's Users in the World
Europe: 267 institutes, 4603 users
Elsewhere: 208 institutes, 1632 users

Powering the Virtual Universe
http://www.astrogrid.ac.uk
(Edinburgh, Belfast, Cambridge, Leicester, London, Manchester, RAL)
Multi-wavelength images showing the jet in M87: from top to bottom – Chandra X-ray, HST optical, Gemini mid-IR, VLA radio.
AstroGrid will provide advanced, Grid-based federation and data mining tools to facilitate better and faster scientific output.
Picture credits: “NASA / Chandra X-ray Observatory / Herman Marshall (MIT)”, “NASA/HST/Eric Perlman (UMBC)”, “Gemini Observatory/OSCIR”, “VLA/NSF/Eric Perlman (UMBC)/Fang Zhou, Biretta (STScI)/F Owen (NRA)”
Printed: 27/06/2002

UK ‘BioGrid’ Projects
EPSRC Projects
• Comb-e-Chem (EPSRC)
• myGrid (EPSRC)
• DiscoveryNet (EPSRC)
BBSRC Projects
• Biomolecular Grid (BBSRC)
• Proteome Annotation Pipeline (BBSRC)
• High-Throughput Structural Biology (BBSRC)
• Global Biodiversity (BBSRC)

UK ‘BioGrid’ Projects
MRC Projects
• Biology of Ageing (BBSRC + MRC)
• Sequence and Structure Data (MRC)
• Molecular Genetics (MRC)
• Cancer Management (MRC + PPARC)
• Clinical e-Science Framework – CLEF (MRC)
• Neuroinformatics Modeling Tools (MRC)

The Comb-e-Chem Project
• Goal is to integrate simulated and experimental data within a knowledge environment
  - Accumulate and model data using new combinatorial methods
  - Automate metadata annotation for provenance
• Southampton, Bristol, Cambridge Crystallographic Data Centre, Pfizer, IBM

Comb-e-Chem Architecture
[Diagram labels: Video; Simulation; Diffractometer; Properties; Analysis; Structures Database; Globus; X-Ray e-Lab; Properties e-Lab]

The myGrid Project
• Goal is to develop a ‘workbench’ to support:
  – Experimental process of data accumulation
  – Use of community information
• Provide facilities for resource selection, data management and process enactment
  – Functional genomics, pattern database annotation
• Manchester, EBI, Newcastle, Nottingham, Sheffield, Southampton, GSK, AstraZeneca, Merck, IBM, Sun, …

Functional Genomics Data
• Imminent ‘deluge’ of data
• Highly heterogeneous
• Highly complex and inter-related
• Convergence of data and literature archives

myGrid Generic Technologies
1. Database access from the Grid
2. Process enactment on the Grid
3. Personalisation services
4. Metadata services
5. Development of Agent Services
Grid Services + Ontologies
→ Towards the ‘Semantic Grid’

The Discovery Net Project
• Data issues: Calibration
  – Diversity of resource: normalisation
  – Diversity of quality: cleaning
• Information issues: Integration
  – Information structuring (XML/Schema)
  – Information abstraction
• Knowledge issues: Assimilation
  – Validation & reference: knowledge schema
  – Management: discovery process
[Diagram labels: Discovery Deployment; Discovery Component; Active Report; Discovery Process Markup Language; Batch processing; Discovery Service]

GRAB - Biodiversity and the Grid
• Federated catalogue of life
• Species data from Edinburgh
• Specialist in Cardiff
• Climate data from York
• Images from Cambridge
• Application control at Southampton

GRAB - Biodiversity and the Grid
[Screenshot: GRAB Results, with panels for Species, Locations and Graphics]

e-Science ‘Core Program’
1. Network of e-Science Centres
   → UK e-Science Grid and AccessGrid
2. Generic/Industrial Grid Middleware
3. e-Health Grid ‘Grand Challenge’
4. Support for e-Science Applications
5. Outreach/International Activities
6. Grid Network Issues

UK e-Science Grid
[Map of centres: Edinburgh, Glasgow, DL, Belfast, Newcastle, Manchester, Cambridge, Oxford, Cardiff, RAL, London, Southampton, Hinxton]

Access Grid

‘Grid Computing is one of the three next big things for Sun and our customers’
Ed Zander, COO, Sun

‘The alignment of OGSA with XML Web services is important because it will make Internet-scale, distributed Grid Computing possible’
Robert Wahbe, General Manager of Web Services, Microsoft

Timescales for Exploitation?
• IBM see ‘early adopters’ of Grid technology coming from pharmaceutical, engineering and petrochemical sectors
  → UK program confirms this picture (AstraZeneca, GSK, Merck, Pfizer, Rolls-Royce, BAE Systems, Schlumberger)
• IBM see Grid middleware being adopted by mainstream commerce and industry in 2003/2004 timeframe

Collaborative Industrial Grid Projects
• Grid Application Projects have more than £8M industrial input - mostly major pharmaceutical and engineering companies
• Around £15M allocated for collaborative industrial projects for middleware/tools - at present £5M allocated with matching industrial funding

Databases in the Grid
[Diagram axes: Data Complexity; Computational Complexity]

OGSA – Data Access and Integration Project
- Key middleware area for UK Program
- Develop high-quality data-centric middleware capability
- Total Budget $5M (CP $2M)
- Three Centres: Edinburgh, Manchester and Newcastle
- Industrial partners: IBM US, IBM Hursley and Oracle UK

e-Health ‘Grand Challenge’
• Equator: Technological innovation in physical and digital life
• AKT: Advanced Knowledge Technologies
• DIRC: Dependability of Computer-Based Systems
• MIAS: From Medical Images and Signals to Clinical Information

e-Health Grid Projects
• MIAS-Grid
  – Annotated Database of digitized mammographic data for epidemiology studies and diagnosis support
• Grid-Enabled Knowledge Services for Medical Informatics
  – Triple Assessment in Breast Cancer: Fusion of Clinical, Radiological and Cytological data
• Grid-based Medical Devices for Everyday Health
  – Patient sensors, mobile wireless communication

Support for e-Science Projects
• Grid Support Centre
  - Support Grid middleware for users
  - Grid Certification Authority
• National e-Science Institute for UK
  - Research Seminar and Training Program – see www.nesc.ac.uk
• Grid Network Team
  - QoS project with Cisco on MPLS
  - Advise on end-to-end e-Science issues

SuperJanet4, June 2002
[Network map. Link-speed legend: 20Gbps, 10Gbps, 2.5Gbps, 622Mbps, 155Mbps. Labels: Scotland via Edinburgh; Scotland via Glasgow; NNW; WorldCom Glasgow; WorldCom Edinburgh; NorMAN; YHMAN; Northern Ireland; MidMAN; WorldCom Manchester; WorldCom Leeds; EMMAN; WorldCom Reading; WorldCom London; EastNet; TVN; South Wales MAN; SWAN & BWEMAN; WorldCom Bristol; External Links; WorldCom Portsmouth; LMN; LeNSE; Kentish MAN]

e-Science and the Grid
‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor