Edinburgh - at the Frontiers of e-Science
Richard Kenway

discovery science

e-science = searching for the unknown in vast amounts of data
electronic ‘needle in a haystack’
• to find the Higgs boson
  – and explain where mass comes from
and … are not enough
• you need to build a Grid

LHC computing challenge
(assumes PC = ~25 SpecInt95)
• one bunch crossing per 25 ns
• 100 triggers per second
• each event is ~1 MByte
[diagram: data flow from the detector through the computing tiers]
  – ~PByte/sec into the Online System
  – ~100 MByte/sec to the Offline Farm (~20,000 PCs)
  – ~100 MByte/sec to Tier 0: CERN Computer Centre (>20,000 PCs)
  – ~Gbit/sec or air freight to Tier 1: US, Italian and French Regional Centres, RAL Regional Centre
  – ~Gbit/sec to Tier 2: ScotGRID++ (~1000 PCs) and other Tier2 Centres (~1000 PCs each)
  – Tier 3: Institutes (~200 PCs) with physics data cache
  – 100 - 1000 Mbit/sec to Tier 4: workstations
physicists work on analysis “channels”
each institute has ~10 physicists working on one or more channels
data for these channels is cached by the institute server

the web on steroids
• 1989: Tim Berners-Lee invented the web
  – so physicists around the world could share documents
• 1999: Grids add to the web
  – computing power
  – data management
  – big instruments
  – (eventually) sensors

a new global infrastructure
• information on demand - like power from a socket
[diagram labels: software, computers, sensor nets, instruments, colleagues, data archives]
• the Grid is an emergent infrastructure to deliver dependable, pervasive and uniform access to globally distributed, dynamic and heterogeneous resources
• problems of scalability, interoperability, fault tolerance, resource management and security

underpinning technology

why now?
• for 50 years, we have been riding the crest of an IT wave
  – building vast untapped global resources
  – hundreds of millions of (mostly) idle PCs
[image caption: 3.5 million users, 22 teraflops]
• and big science is facing a data tsunami

increase in MIPS per chip
microprocessor speeds double every 18 months (Moore’s Law)
[chart: MIPS/chip on a log scale from 1 to 1,000,000 against year, 1985 to 2010; processors plotted: 286*, 386*, 486*, Pentium*, Pentium Pro, P7 (Merced), P8, P12]
MIPS - Millions of instructions per second
*Pentium, 286, 386 and 486 are registered trademarks of Intel Corp.

internet hosts (million), actual and projected
network capacity doubles every 9 months
Jul-95: 8.2
Jul-96: 16.7
Jul-97: 26.1
Jul-98: 36.7
Jul-99: 56.2
Jul-00: 85
Jul-01: 120
Jul-02: 150
Jul-03: 180
Source: ITU “Challenges to the Network: Internet for Development, 1999”; Internet Software Consortium (www.isc.org); RIPE (www.ripe.net)

fixed lines, mobile phones & internet users
[chart: millions of users, 0 to 1,200, for 1995 to 2003; series: fixed-line telephones, mobile phones, estimated Internet users]
note: columns show actual and projected users at end of year
source: ITU

Quality of Service on the internet
• aim to distinguish types of traffic
  – high priority fast lanes
  – low priority slow lanes
• hard to configure
• intersim simulation tool
  – detailed model of network
  – understand and validate configurations
EPCC + Cisco Systems

Grid applications

whole-system simulations
• wing models
  – lift capabilities
  – drag capabilities
  – responsiveness
• airframe models
• stabilizer models
  – deflection capabilities
  – responsiveness
• human models: crew capabilities
  – accuracy
  – perception
  – stamina
  – reaction times
  – SOPs
• engine models
  – thrust performance
  – reverse thrust performance
  – responsiveness
  – fuel consumption
• landing gear models
  – braking performance
  – steering capabilities
  – traction
  – dampening capabilities
NASA Information Power Grid: coupling all sub-system simulations

global in-flight engine diagnostics
diagram labels: in-flight data, global
network (eg SITA), airline, ground station, DS&S Engine Health Center, internet / e-mail / pager, maintenance centre, data centre
Distributed Aircraft Maintenance Environment: Universities of Leeds, Oxford, Sheffield & York

National Airspace Simulation Environment
22,000 commercial US flights a day
[diagram: sub-system models and simulation runs feeding the Virtual National Air Space (VNAS); sites shown: GRC, ARC, LaRC]
• wing models: 44,000 wing runs
• engine models: 50,000 engine runs
• stabilizer models: 66,000 stabilizer runs
• human models: 48,000 human crew runs
• airframe models: 22,000 airframe impact runs
• landing gear models: 132,000 landing/take-off gear runs
simulation drivers
• FAA ops data
• weather data
• airline schedule data
• digital flight data
• radar tracks
• terrain data
• surface data
NASA Information Power Grid: aircraft, flight paths, airport operations and the environment are combined to get a virtual national airspace

from genome to function
• gene expression as an embryo develops
EPCC MouseGrid: optical tomography image reconstruction in real time

digital radiology on the Grid
• 28 petabytes/year for 2000 hospitals
• must satisfy privacy laws
University of Pennsylvania

emergency response teams
• bring sensors, data, simulations and experts together
  – wildfire: predict movement of fire & direct fire-fighters
  – also earthquakes, peacekeeping forces, battlefields,…
National Earthquake Simulation Grid
Los Alamos National Laboratory: wildfire

Earth observation
• ENVISAT
  – € 3.5 billion
  – 400 terabytes/year
  – 700 users
• ground deformation prior to a volcano

Grid development

data, information and knowledge
• virtual data …from the grid
  – from a database somewhere
  – computed on request
  – measured on request
• automated knowledge …from computer science
  – data: un-interpreted bits and bytes
  – information: data equipped with meaning
  – knowledge: information applied to solve a problem

three layer Grid abstraction
[diagram: three layers, top to bottom]
• Knowledge Grid
• Information Grid
• Computation / Data Grid
[diagram labels: Data to Knowledge, Control]

the Grid as an evolving concept
• enabler for transient
‘virtual organisations’
• anatomy: a software infrastructure that enables flexible, secure, co-ordinated resource sharing among dynamic collections of individuals, institutions and resources
  – Foster, Kesselman & Tuecke (2001)
• evolution of and integration with web services
• physiology: everything is a Grid service, ie a service that conforms to a set of conventions for management and exchanging messages
  – Foster, Kesselman, Nick & Tuecke (2002)
• Global Grid Forum: define a standard Grid architecture
  – big business and big science working together

e-science in Scotland

UK e-Science programme
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor, Director General of Research Councils, Office of Science and Technology

UK e-Science funding
[organisation chart]
• DG Research Councils
• E-Science Steering Committee
• Grid TAG
• Director
  – Director’s Awareness and Co-ordination Role
  – Director’s Management Role
• Generic Challenges: EPSRC (£15m), DTI (£15m)
• Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
  – PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
  – £80m collaborative projects
• Industrial Collaboration (£40m)

UK e-science centres
[map: Edinburgh, Glasgow, DL, Belfast, Newcastle, Manchester, Oxford, Cardiff, RAL, Soton, Cambridge, London, Hinxton]
AccessGrid always-on video walls

National e-Science Centre
• Edinburgh + Glasgow Universities
  – Physics & Astronomy 2
  – Informatics, Computing Science
  – EPCC
• £6M EPSRC/DTI + £2M SHEFC over 3 years
• e-Science Institute
  – visitors, workshops, co-ordination, outreach
• middleware development
  – 50 : 50 industry : academia
• ‘last-mile’ networking
www.nesc.ac.uk

data, data everywhere…
• globally distributed heterogeneous databases are growing very fast
  – science is at the frontier
  – commerce, healthcare, entertainment are not far behind
• Scottish e-Data Information &
Knowledge Transformation Centre (eDIKT)
  – proposal to SHEFC for a centre to develop scalable database tools
  – astronomy, bioinformatics, geophysics, particle physics & commerce

Scotland at the frontier… leading
• UK core e-science
  – data integration
  – linked to US Globus
• UK AstroGrid
  – virtual observatory
  – linked to EU AVO
• UK GridPP + ScotGrid
  – particle physics data analysis
  – linked to EU DataGrid
• EU ENACTS + GRIDSTART
  – supercomputer centres
  – EU grid projects

Scotland at the frontier… participating
• EU DataGrid: particle physics, biology & medical imaging, Earth observation
• US DARPA Control of Agent-Based Systems Grid: multinational military operations
• UK RealityGrid: interactively couple experiments, simulations and visualisation
over 100 scientists engaged in grid development by the end of 2002

imagine a political party reception…
the leader enters…
a rumour is started…
and propagates across the room

from little acorns…
“It is worth noting that an essential feature of the type of theory which has been described in this note is the prediction of incomplete multiplets of scalar and vector bosons.”
Peter Higgs (1964)
“… a billion people interacting with a million e-businesses with a trillion intelligent devices interconnected”
Lou Gerstner, IBM (2000)

another technological revolution is underway
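
back-of-envelope check
The figures on the "LHC computing challenge" slide and the doubling times on the "why now?" charts can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original talk. The function names are illustrative, and the figure of about 10^7 seconds of data taking per year is an assumption that does not appear in the slides.

```python
"""Back-of-envelope arithmetic for figures quoted in the slides."""

# Figures quoted on the "LHC computing challenge" slide.
BUNCH_CROSSING_INTERVAL_NS = 25   # one bunch crossing per 25 ns
TRIGGERS_PER_SECOND = 100         # 100 triggers per second
EVENT_SIZE_MBYTE = 1              # each event is ~1 MByte

# Doubling times quoted on the "why now?" charts.
CPU_DOUBLING_MONTHS = 18          # microprocessor speeds (Moore's Law)
NETWORK_DOUBLING_MONTHS = 9       # network capacity

# ASSUMPTION (not in the slides): roughly 1e7 seconds of data taking per year.
ASSUMED_LIVE_SECONDS_PER_YEAR = 1e7


def crossings_per_second(interval_ns):
    """Bunch crossings per second for a given spacing in nanoseconds."""
    return 1e9 / interval_ns


def recorded_rate_mbyte_per_s(triggers_per_s, event_size_mbyte):
    """Data rate written out after the trigger, in MByte/sec."""
    return triggers_per_s * event_size_mbyte


def annual_volume_pbyte(rate_mbyte_per_s, live_seconds):
    """Yearly data volume in PByte, using decimal units (1 PByte = 1e9 MByte)."""
    return rate_mbyte_per_s * live_seconds / 1e9


def growth_factor(elapsed_months, doubling_months):
    """Multiplicative growth after elapsed_months at a fixed doubling time."""
    return 2 ** (elapsed_months / doubling_months)


if __name__ == "__main__":
    crossings = crossings_per_second(BUNCH_CROSSING_INTERVAL_NS)
    rate = recorded_rate_mbyte_per_s(TRIGGERS_PER_SECOND, EVENT_SIZE_MBYTE)
    volume = annual_volume_pbyte(rate, ASSUMED_LIVE_SECONDS_PER_YEAR)
    print(f"bunch crossings per second: {crossings:.3g}")
    print(f"crossings per recorded event: {crossings / TRIGGERS_PER_SECOND:.3g}")
    print(f"recorded data rate: {rate} MByte/sec")
    print(f"yearly volume (assumed live time): {volume:.3g} PByte")

    five_years = 60  # months
    cpu = growth_factor(five_years, CPU_DOUBLING_MONTHS)
    net = growth_factor(five_years, NETWORK_DOUBLING_MONTHS)
    print(f"5-year growth, 18-month doubling: x{cpu:.1f}")
    print(f"5-year growth, 9-month doubling: x{net:.1f}")
```

With the slide figures, 100 triggers per second at ~1 MByte per event gives the ~100 MByte/sec shown leaving the Online System, and a 25 ns spacing corresponds to 40 million bunch crossings per second. The yearly volume depends entirely on the assumed live time.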