Progress with UK e-Science
BCS, Anglia Ruskin University, Chelmsford
Malcolm Atkinson
Director, e-Science Institute & e-Science Envoy
www.nesc.ac.uk
20th February 2007

Overview
- History of e-Science in UK
- Three Significant Strengths
  - Established Projects
  - e-Infrastructure
  - Communities & Breadth
- ESFRI, EGEE, et al. thriving in Europe
- e-Science & Cyberinfrastructure everywhere
- e-Science definition & history
- Propose an e-Science Framework
- Test drive framework on 3 UK projects
- The framework in today’s technical context

Defining e-Science
- e-Science: Systematic Support for Collaborative Research
  - Multi-disciplinary, Multi-Site & Multi-National
  - All disciplines contribute & benefit
  - Enabling wider engagement
- Building with and demanding advances in Computing Science
- Using advances in computing to support research, design, diagnosis
  - Dates back 50 years
  - Prevalent in branches of biology for 20 years
  - Prevalent in Engineering for >40 years

UK e-Science

e-Science and the Grid
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor, Director General of Research Councils, Office of Science and Technology
From presentation by Tony Hey

UK e-Science Budget (2001-2006)
Total: £213M + £100M via JISC
+ Industrial Contributions £25M

By Research Council:
  EPSRC   £77.7M   37%
  PPARC   £57.6M   27%
  MRC     £21.1M   10%
  BBSRC   £18M      8%
  NERC    £15M      7%
  ESRC    £13.6M    6%
  CLRC    £10M      5%

EPSRC Breakdown:
  Applied  £35M     45%
  Core     £31.2M   40%
  HPC      £11.5M   15%

Staff costs. Grid Resources, Computers & Network funded separately.
Source: Science Budget 2003/4 – 2005/6, DTI(OST)
Slide from Steve Newhouse

UK e-Science
- Diversity
- Thriving Community
- All disciplines & all Research Councils
- Industry & Academia
- Many universities & research institutes
- UK e-Science All Hands Meetings
- Productive collaboration

e-Infrastructure
- A shared resource
- That enables science, research, engineering, medicine, industry, …
- It
will improve UK / European / … productivity
- Lisbon Accord 2000
- E-Science Vision SR2000 – John Taylor
- Commitment by UK government
  - Sections 2.23-2.25
- Always there
  - c.f. telephones, transport, power
- OSI report: www.nesc.ac.uk/documents/OSI/index.html

National Centre for e-Social Science
- University of Manchester
- University of Essex
Map locations: Aberdeen, Lancaster, Manchester, Leeds, Nottingham, Oxford, Bristol, Colchester, London, Edinburgh

National Grid Service and partners
- CCLRC Rutherford Appleton Laboratory
Map locations: Edinburgh, Lancaster, Manchester, Leeds, York, Sheffield, Cardiff, Didcot, Westminster, Bristol
Slide: Neil Geddes

e-Science Centres in the UK
Coordinated by: Directors’ Forum
Centres:
- Digital Curation Centre & NeSC
- White Rose Grid
- Access Grid Support Centre
- National Centre for Text Mining
- National Centre for e-Social Science
- National Institute for Environmental e-Science
- National Grid Service
- Open Middleware Infrastructure Institute
- CCLRC Daresbury
- CCLRC RAL
- LeSC
- UCL
Map locations: Edinburgh, Glasgow, Newcastle, Lancaster, Manchester, Leicester, Belfast, Leeds, York, Sheffield, Cambridge, Birmingham, Oxford, Cardiff, Bristol, Reading, Southampton
+2 years

OMII-UK nodes
- EPCC & National e-Science Centre (Edinburgh)
- School of Computer Science, University of Manchester (Manchester)
- School of Electronics and Computer Science, University of Southampton (Southampton)

Digital Curation Centre and partners
- Humanities Advanced Technology and Information Institute
- Database Research Group, School of Informatics
- AHRC Research Centre for Studies in Intellectual Property and Technology Law
- EDINA
- National e-Science Centre
- Rutherford Appleton (Didcot) and Daresbury (Warrington) Laboratories
- UKOLN (formerly UK Office for Library Networking)
Map locations: Edinburgh, Glasgow, Warrington, Didcot, Bath

Achieving the CI Vision requires synergy between 3 types of Foundation wide activities
- Transformative Application - to enhance discovery & learning
- Provisioning - Creation, deployment and operation of advanced CI
- R&D to enhance technical and social dimensions of future CI systems
Office of Cyberinfrastructure, D. E. Atkins

Framework for e-Science
- Motivation for collaboration
  - Socio-economic value identified
  - Impediments recognised
  - All participants agree & cooperate
- Challenge and Insights
  - Articulated & demanding challenge
  - Creative new approach
  - Potentially feasible
- Technical advances
  - New models, new methods, collaboration support
  - Economic changes - e.g. shared computing
  - Cultural changes - e.g. shared information

The NERC Success
Professor Robert Gurney
Director, Environmental Systems Science Centre, Reading

The NERC e-Science experience
- 11 papers in Nature
- Enthusiastic uptake of ensemble methods

climateprediction.net
Predicting Climate Change Through Volunteer Computing
University of Oxford, Department of Atmospheric Physics
Slide: Robert Gurney

climateprediction.net Users Worldwide
- >300,000 users total (90% MS Windows): >60,000 active
- ~17 million model-years simulated (as of September '06)
- ~180,000 completed simulations
- Impact:
  - New Science
  - Understanding of science
  - Engaging schools
  - BBC follow on
The world's largest climate modelling supercomputer!
(NB: a black dot is one or more computers running climateprediction.net)
Slide: Robert Gurney

Climateprediction.net – Volunteer computing – Myles Allen, Atmospheric Physics
- More than 10 Million models calculated
- Uses BOINC – portal for broader community
- Used in schools
- Interesting distributed data analysis problems

Framework for e-Science
- Motivation for collaboration
  - Socio-economic value?
    - Better global warming prediction
    - public understanding of GW
  - Impediments?
    - Reaching enough participants
    - Gaining attention & resources
  - Participants cooperate?
    - Volunteers “buy in”
    - BOINC culture helps
    - Good PR, media interest, BBC involved
    - more incentives
    - motivated by cause, by visualisation and by wiki
    - Global net of data collection centres needed - storage & compute!
    - Why should they contribute?

Framework for e-Science
- Challenge and Insights
  - Challenge?
    - Explore effects of uncertainty in models & physics of climate
    - Infeasible amounts of supercomputing time
  - New approach?
    - Run simpler model
    - Use ensemble computing - Monte Carlo parameter exploration
    - Analyses and integration over all results
  - Feasible?
    - BOINC from SETI suggests computation resource feasible
    - But large volumes of data per model run
    - Needs to be stored and later analysed
http://climateprediction.net

Framework for e-Science
- Technical advances
  - New model?
    - Simplified Hadley + …
  - New method?
    - Ensemble methods
    - Distributed using BOINC
    - Distributed data collection
    - Distributed data integration and analysis
    - http://www.allhands.org.uk/2006/proceedings/papers/595.pdf
  - Collaboration support?
    - Built on BOINC collaboration support
    - Improved visualisation
  - Economic change?
    - Free model runs > 21 million model hours
    - How were the data centres financed?
  - Cultural change?
    - Explicit use of media
    - NERC support for community integration

NERC centres
- National Institute for Environmental e-Science, University of Cambridge
- University of Reading
Map locations: Cambridge, Swindon, Reading
6th September 2006

In silico biology
http://www.mygrid.org.uk
- Construct in silico experiments, find and adapt others, manage the experiment lifecycle
- Taverna Workflow workbench, OGSA-DQP, Semantic Technologies
- Williams-Beuren Syndrome, Graves’ Disease, Trypanosomiasis in cattle
- OMII-UK Node, GRIMOIRE Registry, Taverna Workflow workbench
- 12000+ Downloads of Taverna
- Wide transfer to BBSRC (e-Fungi, ISPIDER, ComparaGrid) & MRC projects (PsyGrid, CLEF, CLEFS)
- Semantic Grid pioneer
- WBS gene identification
- Outstanding international links
- Great deal of open source s/w
- Links into BOSC & HGMP
- KT to BT, ComparaGrid, OntoGrid, BBSRC Systems Biology Centre, MIASGrid, Rice Institute etc
Middleware for data intensive in silico biology by bioinformaticians
• Carole Goble (Comp Sci, Manchester)
• 7 Universities and institutes (incl.
EBI)
• 8 Companies
Slide: Carole Goble & Jim Fleming

Framework for e-Science
- Motivation for collaboration
  - Socio-economic value?
  - Impediments?
  - Participants cooperate?
- Challenge and Insights
  - Challenge?
  - New approach?
  - Feasible?
- Technical advances
  - New model?
  - New method?
  - Collaboration support?
  - Economic change?
  - Cultural change?

Taverna Workflow Workbench
Carole Goble
David De Roure
Slide: Dave De Roure & Jeremy Frey

CombeChem Semantic Datagrid
Diagram labels: Video, Simulation, Diffractometer, Properties, Analysis, Structures Database, X-Ray e-Lab, Properties e-Lab, Grid Middleware
Slide: Dave De Roure & Jeremy Frey

The scholarly knowledge cycle. Liz Lyon, Ariadne, July 2003.
Diagram labels:
- Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
- Data analysis, transformation, mining, modelling
- Research & e-Science workflows
- Learning & Teaching workflows
- Learning object creation, re-use
- Repositories: institutional, e-prints, subject, data, learning objects
- Aggregator services: national, commercial
- Presentation services: subject, media-specific, data, commercial portals
- Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
- Peer-reviewed publications: journals, conference proceedings
- Quality assurance bodies
- Resource discovery, linking, embedding
- Searching, harvesting, embedding
- Harvesting metadata
- Deposit / self-archiving
- Validation
- Publication
© Liz Lyon (UKOLN, University of Bath), 2003. This work is licensed under a Creative Commons License Attribution-ShareAlike 2.0

Framework for e-Science
- Motivation for collaboration
  - Socio-economic value?
  - Impediments?
  - Participants cooperate?
- Challenge and Insights
  - Challenge?
  - New approach?
  - Feasible?
- Technical advances
  - New model?
  - New method?
  - Collaboration support?
  - Economic change?
  - Cultural change?
Data capture
Slide: Dave De Roure & Jeremy Frey
CombeChem Semantic Datagrid

Ingredient List
  Fluorinated biphenyl   0.9 g
  Br11OCB                1.59 g
  Potassium Carbonate    2.07 g
  Butanone               40 ml

Plan (To Do List)
- Dissolve 4-fluorinated biphenyl in butanone
- Add K2CO3 powder
- Heat at reflux for 1.5 hours
- Cool and add Br11OCB
- Heat at reflux until completion
- Cool and add water (30ml)
- Extract with DCM (3x40ml)
- Combine organics, dry over MgSO4 & filter
- Remove solvent in vacuo
- Fuse compound to silica & column in ether/petrol
Process labels in the plan: Add, Reflux, Cool, Liquid-liquid extraction, Dry, Filter, Remove Solvent by Rotary Evaporation, Fuse, Column Chromatography

Process Record (steps numbered 1 to 14 in the diagram)
Process labels: Weigh, Measure, Add, Reflux, Cool, Liquid-liquid extraction, Dry, Filter (Buchner), Remove Solvent by Rotary Evaporation, Fuse, Column Chromatography, Annotate
Materials: Sample of 4-fluorinated biphenyl, Sample of K2CO3 Powder, Sample of Br11OCB, Butanone, Water, DCM, MgSO4, Silica, Ether/Petrol Ratio (image)
Recorded values: 0.9031 g, 2.0719 g, 1.5918 g, 40 ml, 30 ml, 3 of 40 ml, excess
Annotations (text):
- Butanone dried via silica column and measured into 100ml RB flask. Used 1ml extra solvent to wash out container.
- Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45min, next step 14:15.
- Inorganics dissolve 2 layers. Added brine ~20ml.
- Organics are yellow solution
- Washed MgSO4 with DCM ~ 50ml

Key: Process, Input, Literal, Observation
Observation Types
- weight - grammes
- measure - ml, drops
- annotate - text
- temperature - K, °C

Combechem 30 January 2004 gvh, hrm, gms

Future Questions
- Whether to have many subclasses of processes or fewer with annotations
- How to depict destructive processes
- How to depict taking lots of samples
- What is the observation/process boundary? e.g.
MRI scan
Slide: Dave De Roure & Jeremy Frey

ecrystals.chem.soton.ac.uk
Slide: Dave De Roure & Jeremy Frey

Timeline
“Wellbeing” the global-scale killer app., Sir Robin Saxby Oct. 2006
- Grunts and body language: 500,000 years
- Speech: 300,000 years
- Writing: 5,000 years
- Printing: 600 years
- Telecommunications: 170 years
- Broadcasting: 100 years
- Home Computers; Internet and WWW; Mobile phones; Grid and Web 2.0; Web 3.0 and Ubiquitous connected devices: 30 years
- Today

Healthcare @ Home
Diagram labels:
- Patient: Home-mobile-clinic via TV-PDA-laptop-PC-Paper
- GP: Home-mobile-clinic via PDA-laptop-PC-Paper
- Diabetician: Home-mobile-clinic via PDA-laptop-PC-Paper
- Various Clinical Specialists (Distributed), e.g. Ophthalmologist, Podiatrist, Vascular Surgeons, Renal Specialists, Wound clinic, Foot care clinic, Neurologists, Cardiologists
- Dietitian, Biochemist, Diabetes Specialist / Other Specialist Nurses: Home-mobile-clinic via TV-PDA-laptop-PC-Paper
- Community Nurses / Health Visitors
- REFERRAL
- VARIABLES ACCESS MATRIX
- CASE

DAME
http://www.cs.york.ac.uk/dame/
- Aims to manage >1Tb per year of Aero Engine vibration and maintenance data.
- Interlinks with search and reasoning services.
- Defined and evaluated a distributed search system.
- GSI enabled secure engine performance simulation
- CBR advisor for diagnostic engineer
- A data architecture defined based on Globus and SRB.
- BROADEN DTI Project (£3.9M)
- Spun out technology exploited through Cybula Ltd., Oxford Biosignals and DS&S.
- Successful mid-term demonstrator well received by Rolls Royce
- White Rose Grid: experience of building & using production Grids
- In Grid Blue Print 2 edition 2
Aircraft healthcare diagnosis
• Jim Austin (Comp Sci, York)
• 4 Universities and institutes
• 3 Companies
Slide: Carole Goble, Jim Fleming & Jim Austin

Timeline (years ago)
- Up to 1,500,000: Homo habilis existed between 2.4 and 1.5 million years ago and the species’ brain shape shows evidence that some speech had developed.
- 300,000: Homo erectus lived between 1.8 million and 300,000 years ago. It was a successful species for over a million years. The brain grew steadily during its reign. The species definitely had speech.
- 50,000: Arrival of ‘modern man’.
- 5,000: First ‘writing’ system developed in ancient Sumeria (cuneiform).
- 600: Johannes Gutenberg invented the first printing press in 1440.
- 170: The first commercial electrical telegraph was constructed and opened on 9 April 1839.
- 100: In the US, Charles Herrold sent out broadcasts as early as April 1909. In the UK, the first experimental broadcasts from Marconi’s factory began in 1920.
- 30: Home Computers; Internet and WWW; Mobile phones; Grid and Web 2.0; Web 3.0 and Ubiquitous connected devices

The Semantic Web layer cake
Diagram labels: User Interface and Applications, Trust, Proof, Rules, OWL, SPARQL (queries), RDF Schema, RDF, XML + Namespaces, URI, Unicode, Signature, Encryption, Attribution, Explanation, Ontologies + Inference, Metadata, Standard syntax, Identity

S-OGSA Model
Semantic Grid
Diagram entities: Grid Entity, Grid Resource, Grid Service, Semantic Entity, Knowledge Entity, Knowledge Resource, Knowledge Service, Knowledge, Semantic Binding, Semantic Provisioning Service, Semantic Binding Provisioning Service, Semantic aware Grid Service, Annotation Tool/Service, Ontology Service, Reasoning Service, Metadata Store/Service, VO Manager, Ontology, Rule set, Policy, File mgt, Intelligent Monitoring, Satellite Image File, JSDL file
Relation labels: Is-a, produce, consume; cardinalities 1..m and 0..m
Carole Goble

What’s Web 2.0?
“Web 2.0, a phrase coined by O'Reilly Media in 2004, refers to a supposed second-generation of Internet-based services such as social networking sites, wikis, communication tools, and folksonomies that let people collaborate and share information online in previously unavailable ways.” Wikipedia
Pamela Fox

So what’s a mashup anyway?
“A mashup is a website or application that combines content from more than one source into an integrated experience. Content used in mashups is typically sourced from a third party via a public interface or API. Other methods of sourcing content for mashups include Web feeds (e.g. RSS or Atom) and JavaScript.” – Wikipedia
A mashup is the ultimate user-generated content: user likes data source A, data source B, & puts them together how they like.
* There are also music & video mashups
Pamela Fox

Amazon Web Services

Web 2.0 APIs
- http://www.programmableweb.com/apis currently (Jan 10 2007) 356 Web 2.0 APIs, with GoogleMaps the most used in Mashups
- This site acts as a “UDDI” for Web 2.0
Geoffrey Fox

Take Home
- UK e-Science investment built 3 interdependent strengths:
  - Communities & collaboration
  - Projects delivering & demanding
  - e-Infrastructure: organisation, support & technology
- Three success factors for projects
  - Engagement & value for all participants
  - Creativity & insight addressing a well-posed challenge
  - Technology adoption and innovation
    - Research, design or diagnosis is the driver
    - Integrate whatever technology you need
    - Invent new technology only if you have to
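
Backup sketch: the mashup definition quoted earlier (content from more than one source combined into one integrated view) can be made concrete in a few lines of Python. This is only an illustration under stated assumptions. The two in-memory "sources" stand in for third-party APIs or feeds, and the names `GEOCODER`, `DIRECTORY`, `SitePin` and `mashup` are hypothetical. The coordinates are approximate. Only the three OMII-UK node names come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SitePin:
    """One merged record: a city plus what both sources say about it."""
    city: str
    lat: float
    lon: float
    centres: tuple[str, ...]


# Hypothetical source A: a geocoding feed (city -> approximate lat, lon).
GEOCODER = {
    "Edinburgh": (55.95, -3.19),
    "Manchester": (53.48, -2.24),
    "Southampton": (50.91, -1.40),
}

# Hypothetical source B: a directory feed listing the OMII-UK nodes from the slides.
DIRECTORY = [
    {"centre": "EPCC & National e-Science Centre", "city": "Edinburgh"},
    {"centre": "School of Computer Science, University of Manchester",
     "city": "Manchester"},
    {"centre": "School of Electronics and Computer Science, University of Southampton",
     "city": "Southampton"},
]


def mashup(directory, geocoder):
    """Join the two sources on city name and return one pin per city."""
    by_city: dict[str, list[str]] = {}
    for entry in directory:
        by_city.setdefault(entry["city"], []).append(entry["centre"])

    pins = []
    for city in sorted(by_city):
        if city not in geocoder:
            # Source A has no coordinates for this city: skip it, do not guess.
            continue
        lat, lon = geocoder[city]
        pins.append(SitePin(city, lat, lon, tuple(by_city[city])))
    return pins


if __name__ == "__main__":
    for pin in mashup(DIRECTORY, GEOCODER):
        print(f"{pin.city} ({pin.lat}, {pin.lon}): {', '.join(pin.centres)}")
```

In a real mashup the two dictionaries would be replaced by calls to public interfaces such as a mapping API and a Web feed. The join on a shared key (here the city name) is the part that carries over.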