CS5038 The Electronic Society
Lecture 10: e-Science

Lecture Outline
• Background: “Big Science”
• Grid Computing
• Standards for Grid Computing
• e-Science – what is it
• e-Science Examples:
  - Social Simulations – modelling land-use change
  - Particle Physics (LHC), Astronomy (VirtualObservatory)
  - Environmental Sciences – Climate Change
  - Engineering - Aircraft Maintenance
  - Economics – Predicting Markets
  - Bio-informatics – Simulated Biology
  - Healthcare - Cancer Diagnosis

e-Science - Background: “Big Science”
• During the early part of the 20th century, science became crucial in warfare
  - World War II: scientists developed new weapons and tools (proximity fuse, radar, atomic bomb, cryptography)
• Led to a new form of research facility: the government-sponsored laboratory
  - Thousands of technicians and scientists, managed by universities
  - Enabled hitherto impossible scientific projects
  - Heavy investment by government and industrial interests blurred the line between public and private research
• Criticisms:
  - Undermines basic principles of the scientific method: results are difficult to verify
  - Access to facilities is limited to those who are already accomplished -> elitism
  - Increased government funding often implies a military agenda
  - Subverts the Enlightenment-era ideal of science as a quest for knowledge
  - Increased administrative overhead, e.g. filling out grant requests
  - Connections between academic, governmental and industrial interests raise concern about scientists’ objectivity (e.g. pharmaceutical industry)
• The Internet and the Web were born from “Big Science”
  - August 1991, CERN (Switzerland): the new World Wide Web project

Grid Computing
• Grid computing evolved from the computational needs of “Big Science”
• “Grid computing uses the resources of many separate computers connected by a network (usually the internet) to solve large-scale computation problems.”
• A conceptual framework rather than a physical resource: flexible computational provisioning beyond the local administrative domain.
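As a rough illustration of the definition quoted above, the sketch below splits one large computation into independent work units, hands them to a pool of workers and combines the partial results. It is a single-machine stand-in only: a local thread pool plays the role of remote grid nodes, and the names `count_primes`, `make_work_units` and `run_on_grid` are hypothetical, not part of any grid toolkit.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(bounds):
    """Work unit: count the primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # n is prime if no d in 2..sqrt(n) divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def make_work_units(start, stop, size):
    """Split [start, stop) into independent chunks of at most `size` numbers."""
    return [(lo, min(lo + size, stop)) for lo in range(start, stop, size)]


def run_on_grid(units, workers=4):
    """Hand each work unit to a worker and sum the partial results.

    Assumption: the thread pool stands in for remote machines; a real grid
    would also need scheduling, data transfer and security.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, units))


units = make_work_units(0, 10_000, 1_000)
total = run_on_grid(units)
print(len(units), "work units,", total, "primes below 10,000")
```

The key property is that the work units do not depend on each other, so they can run anywhere and in any order. This is the same property that volunteer projects such as SETI@home rely on.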
• Involves sharing computing power across heterogeneous resources (different platforms, hardware/software architectures and computer languages), located in different places and belonging to different administrative domains, using open standards.
• Requires security, so that remote users can be allowed to control computing resources.
• Special-purpose grid, example: SETI@home project
• General-purpose grid, example: Parabon Computation (commercial)
• In terms of function, three types of grid:
  - Computational grids: computationally intensive operations
  - Data grids: sharing and management of large amounts of distributed data
  - Equipment grids: control equipment remotely and analyse the data produced, e.g. controlling a telescope

Grid Standards - Globus
• The Globus Alliance is an association, mainly of universities (e.g. Chicago, Edinburgh, Southern California)
  - Developing the fundamental technologies needed to build grid computing infrastructures
• Most grids in Europe and North America use the Globus Toolkit as their core middleware.
• Globus software provides (e.g.):
  - Resource management: Grid Resource Allocation & Management Protocol (GRAM)
  - Information services: Monitoring and Discovery Service (MDS)
  - Security services: Grid Security Infrastructure (GSI)
  - Data movement and management: Global Access to Secondary Storage (GASS) and GridFTP
• XML-based web services allow access to services/applications
  - Grid computing and web services converge: the Grid Service
• Open Grid Services Architecture (OGSA): the vision is to describe and build a well-defined suite of standard interfaces and behaviours that serve as a common framework for all Grid-enabled systems and applications.

e-Science
What is e-Science?
• Science enabled by electronic infrastructure
• Computationally intensive
• Uses highly distributed network environments
• Requires access to immense data sets
• May require Grid Computing
• High-performance visualisation delivered back to the individual user scientists
• Examples:
  - Social Simulations – modelling land-use change
  - Particle Physics (LHC), Astronomy (VirtualObservatory)
  - Environmental Sciences – Climate Change
  - Engineering - Aircraft Maintenance
  - Economics – Predicting Markets
  - Bio-informatics – Simulated Biology
  - Healthcare - Cancer Diagnosis
• Middleware: data communication, data integration
• Organisations: requires large and complex infrastructure
  - Research labs, large universities, governments (e.g. UK)

e-Science Examples: Particle Physics
• Large Hadron Collider (LHC) at CERN
  - Currently the most developed e-Science infrastructure
  - LHC due to start generating data in 2007-2009 (start date uncertain)
  - Massive amount of data generated: estimated at 10 petabytes each year (peta = 10^15)
  - Thousands of researchers across the world will be involved in the LHC experiments and in analysing the results.
• GridPP: the UK’s contribution to analysing this data deluge
  - Six-year, £33m project
  - Collaboration of around 100 researchers in 19 UK university particle physics groups, CCLRC and CERN
  - More than 100,000 PCs at one hundred institutions across the world
• Three main areas of work:
  - Applications to allow physicists to submit data to the Grid for analysis
  - Middleware to manage the distribution of computing jobs around the Grid and deal with security
  - Deploying computing infrastructure at sites across the UK, to build a prototype Grid

e-Science Examples: Astronomy
• Astrogrid
  - £10M project to build a data grid for UK astronomy
  - Forms the UK’s contribution to a global VirtualObservatory (VO)
• Three main strands to the VirtualObservatory:
  1. International standards for astronomical data, metadata and software interoperability.
  2. New software infrastructure using emerging technology: web services and the Grid.
  3. Science user tools to exploit the new infrastructure, bringing the VO to the astronomer’s desktop.
• Goals of Astrogrid (mainly strand 2):
  - A data grid for key UK databases
  - Data-mining facilities for interrogating those databases, e.g. searching for ‘cloaked’ objects
  - A uniform archive query and data-mining interface
  - A facility for users to upload code to run their own algorithms on the data-mining machines
  - An exploration of techniques for open-ended resource discovery

e-Science Examples: Climate Change
• Climateprediction.net
  - Aims to address the enormous variation in current climate predictions
  - Existing climate models have to include the effects of small-scale physical processes (such as clouds) through simplifications (parameterisations)
  - Results can be out by an order of magnitude
• Experimental objective: ensemble forecasting
  - Run thousands of climate models with slightly different physics in order to represent the whole range of uncertainties in all the parameterisations (parameters are varied within their current range of uncertainty)
  - The project has already recruited 37,000 users
• Project goal: to make the first fully probability-based fifty-year forecast of human-induced climate change using a full-scale 3-D atmosphere-ocean climate simulation model.

e-Science Example: Aircraft Maintenance
• DAME project
  - £3.2 million, 3 years, commenced Jan 2002
  - 4 universities: York, Sheffield, Oxford, Leeds
  - Industrial partners: Rolls-Royce, Data Systems, Cybula Ltd
• Aim: aerospace diagnostics
  - Remote, secure access to flight data and other operational data and resources
  - Rapid data mining and analysis of fault data
  - Distributed search on massive data collections, using scalable, neural-network-type methods to compare new data with archived fleet engine data.
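The idea of comparing new engine data with archived fleet data can be pictured with a much simpler technique than the neural-network-type methods named above: a brute-force cosine-similarity search over stored signatures. The sketch below is that simpler stand-in, not DAME's method. The archive contents, the engine labels and the function names are all invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, archive, top_k=2):
    """Return the labels of the top_k archived signatures closest to the query."""
    scored = [(cosine_similarity(query, sig), label) for label, sig in archive.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:top_k]]


# Hypothetical archive: each signature is a tiny made-up vibration spectrum.
archive = {
    "engine-A normal": [1.0, 0.1, 0.0, 0.0],
    "engine-B normal": [0.9, 0.2, 0.1, 0.0],
    "engine-C bearing fault": [0.2, 0.1, 0.9, 0.8],
}

new_flight = [0.3, 0.0, 1.0, 0.7]
matches = most_similar(new_flight, archive, top_k=1)
print(matches)
```

A linear scan like this does not scale to a fleet-wide archive, which is one reason a project of this kind distributes the search across grid nodes.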
  - Each flight could produce up to 1GB of vibration data
• The DAME workbench (portal)
  - Analysis tools for the engine diagnosis process
  - Central control point for automated workflows
  - Manages the distributed diagnosis team and virtual organisations
  - Manages issues of security and user roles

e-Science Example: Aircraft Maintenance
[Figure: DAME deployment diagram. Labels: engine flight data, London airport, New York airport, airline office, Grid, diagnostics centre, maintenance centre, American data centre, European data centre]

e-Science Example: Predicting Markets
• The INWA Grid project (Innovation Node: Western Australia): investigating the suitability of existing Grid technologies for secure, commercial data mining.
• The three-continent Grid:
  - Edinburgh Parallel Computing Centre (EPCC)
  - Curtin University in Western Australia (WA)
  - Chinese Academy of Sciences in Beijing
• Data mining to predict customer trends, develop new products and better meet customer needs.
  - Samples drawn from a region, combined with publicly available data -> build a clearer picture of regional behaviour within the economy
  - But: needs a distributed-aggregated approach to preserve anonymity
• Resources
  - UK mortgage data + UK property data
  - Australian telco data + Australian property data
  - Compute power at EPCC + Curtin
• Scenario
  - A bank wants to predict whether home owners are likely to move house within 5 years of taking out a mortgage to buy the house
  - The bank wants to use its own data and publicly available data to help improve the prediction

e-Science Example: Simulated Biology
• BioSimGrid project
  - Aim: to make the results of large-scale computer simulations of biomolecules more accessible to the biological community.
  - Simulations of the motions of proteins are a key component in understanding how the structure of a protein is related to its dynamic function.
• Data distributed between the University of California, San Diego and Oxford.
  - Simulations were run using different programs and protocols
  - Data in very different formats.
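One way to picture the problem of data in very different formats is a thin adapter layer that converts each format into one common record type, so that analysis code never sees the original layout. The sketch below is purely illustrative: the two "formats" (a comma-separated text layout and a JSON layout), the `Frame` type and the reader functions are invented here and are not BioSimGrid's actual schema or API.

```python
import json
from dataclasses import dataclass


@dataclass
class Frame:
    """Common record: one simulation snapshot."""
    time_ps: float
    coords: list  # flat list of x, y, z values, one triple per atom


def read_text_format(text):
    """Hypothetical format A: one frame per line, 'time, x, y, z, ...'."""
    frames = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split(",")]
        frames.append(Frame(time_ps=values[0], coords=values[1:]))
    return frames


def read_json_format(text):
    """Hypothetical format B: JSON list of {"t": ..., "xyz": [[x, y, z], ...]}."""
    frames = []
    for item in json.loads(text):
        flat = [float(c) for atom in item["xyz"] for c in atom]
        frames.append(Frame(time_ps=float(item["t"]), coords=flat))
    return frames


# Registry of adapters; a new format only needs a new reader function.
READERS = {"text": read_text_format, "json": read_json_format}


def load_trajectory(text, fmt):
    """Single entry point that hides the underlying storage format."""
    return READERS[fmt](text)


def mean_x(frames):
    """Generic analysis: average of all x coordinates across all frames."""
    xs = [c for f in frames for c in f.coords[0::3]]
    return sum(xs) / len(xs)


site_a = "0.0, 1.0, 0.0, 0.0\n1.0, 3.0, 0.0, 0.0"
site_b = '[{"t": 0, "xyz": [[1.0, 0.0, 0.0]]}, {"t": 1, "xyz": [[3.0, 0.0, 0.0]]}]'

a = load_trajectory(site_a, "text")
b = load_trajectory(site_b, "json")
print(mean_x(a), mean_x(b))
```

Because both readers return the same `Frame` records, the same analysis function runs unchanged on data from either site.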
• Software tools for interrogation and data-mining
• Generic analysis tools (Python), visualisation (VMD)
• Annotation of simulation data
• Readily modifiable simple example scripts
• Underlying data storage structure hidden

e-Science Examples: Cancer Diagnosis
• Telemedicine on the Grid
  - Multi-site videoconferencing
  - Real-time delivery of microscope imagery
  - Communication and archiving of radiological images
  - Supports multi-disciplinary meetings for the review of cancer diagnoses and treatment
• Remote access to computational medical simulations of tumours and other cancer-related problems
• Data-mining of patient record databases
• Improved clinical decision making
  - Currently clinicians travel large distances
  - Grid technology can provide access to appropriate clinical information and images across the network