Edinburgh - at the Frontiers of e-Science
Richard Kenway

discovery science
e-science = searching for the unknown in vast amounts of data
electronic ‘needle in a haystack’
• to find the Higgs boson, and explain where mass comes from
and … are not enough
• you need to build a Grid

LHC computing challenge
[diagram: LHC tiered computing model; assumes PC = ~25 SpecInt95]
• one bunch crossing per 25 ns
• 100 triggers per second
• each event is ~1 MByte
• Online System: receives ~PByte/sec and passes ~100 MByte/sec to the Offline Farm (~20,000 PCs)
• Tier 0: CERN Computer Centre (>20,000 PCs), fed at ~100 MByte/sec
• Tier 1: US, Italian, French and RAL Regional Centres, linked to Tier 0 at ~Gbit/sec or by air freight
• Tier 2: ScotGRID++ and other Tier 2 Centres (~1000 PCs each), linked to Tier 3 at ~Gbit/sec
• Tier 3: institutes (~200 PCs) with a physics data cache, linked to Tier 4 at 100 - 1000 Mbit/sec
• Tier 4: workstations
• physicists work on analysis “channels”
  – each institute has ~10 physicists working on one or more channels
  – data for these channels is cached by the institute server

the web on steroids
• 1989: Tim Berners-Lee invented the web, so that physicists around the world could share documents
• 1999: Grids add to the web
  – computing power
  – data management
  – big instruments
  – (eventually) sensors

a new global infrastructure
• information on demand, like power from a socket
[diagram: software, computers, sensor nets, instruments, colleagues, data archives]
• the Grid is an emergent infrastructure to deliver dependable, pervasive and uniform access to globally distributed, dynamic and heterogeneous resources
• problems of scalability, interoperability, fault tolerance, resource management and security

underpinning technology

why now?
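One way to see the force of the “why now?” argument is to compound the two growth rates quoted on this slide’s charts: microprocessor speed doubling every 18 months (Moore’s Law) and network capacity doubling every 9 months. The sketch below is an illustration, not part of the original talk. It assumes both quantities grow as clean exponentials with exactly those doubling periods, which the charts only approximate, and the function names `growth_factor` and `compare` are hypothetical.

```python
# Doubling periods quoted on the "why now?" charts, in months.
# Assumption: both quantities follow clean exponential growth.
MOORE_DOUBLING_MONTHS = 18    # microprocessor speed (Moore's Law)
NETWORK_DOUBLING_MONTHS = 9   # network capacity


def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


def compare(years: int) -> tuple[float, float]:
    """Return (processor growth, network growth) over `years` under the exponential assumption."""
    months = 12 * years
    return (growth_factor(months, MOORE_DOUBLING_MONTHS),
            growth_factor(months, NETWORK_DOUBLING_MONTHS))


if __name__ == "__main__":
    # Print the growth of each quantity over a few horizons.
    for years in (1, 5, 10):
        cpu, net = compare(years)
        print(f"{years:>2} years: processors x{cpu:,.1f}, network x{net:,.1f}")
```

Because the network doubling period is half the processor one, network capacity pulls ahead of processor speed by exactly the processor growth factor: over five years this gives roughly 10× for processors against roughly 100× for the network.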
• for 50 years, we have been riding the crest of an IT wave
  – building vast untapped global resources
  – hundreds of millions of (mostly) idle PCs and …
[image: 3.5 million users, 22 teraflops]
• big science is facing a data tsunami

[chart: increase in MIPS (millions of instructions per second) per chip, 1985 to 2010, from the 286 through the 386, 486, Pentium, Pentium Pro, P7 (Merced) and P8; microprocessor speeds double every 18 months (Moore’s Law)]

[chart: internet hosts (million), actual and projected, Jul-95 to Jul-03, rising from 8.2 million to a projected 180 million; network capacity doubles every 9 months. Sources: ITU “Challenges to the Network: Internet for Development, 1999”; Internet Software Consortium (www.isc.org); RIPE (www.ripe.net)]

[chart: fixed lines, mobile phones & internet users (millions), 1995 to 2002: fixed-line telephones, mobile phones, estimated Internet users; columns show actual and projected users at end of year. Source: ITU 2003]

Quality of Service on the internet
• aim to distinguish types of traffic
  – high-priority fast lanes
  – low-priority slow lanes
• hard to configure
• intersim simulation tool
  – detailed model of the network
  – understand and validate configurations
EPCC + Cisco Systems

Grid applications

whole-system simulations
[diagram: NASA Information Power Grid: coupling all sub-system simulations]
• wing models: lift capabilities, drag capabilities, responsiveness
• stabilizer models: deflection capabilities, responsiveness
• airframe models
• human models: crew capabilities (accuracy, perception, stamina, reaction times, SOPs)
• engine models: thrust performance, reverse thrust performance, responsiveness, fuel consumption
• landing gear models: braking performance, steering capabilities, traction, dampening capabilities

global in-flight engine diagnostics
[diagram: in-flight data, airline global network (e.g. SITA), ground station, DS&S Engine Health Center, internet, e-mail, pager, maintenance centre, data centre]
Distributed Aircraft Maintenance Environment: Universities of Leeds, Oxford, Sheffield & York

National Airspace Simulation Environment
[diagram: NASA Information Power Grid: aircraft, flight paths, airport operations and the environment are combined to get a virtual national airspace]
• 22,000 commercial US flights a day
• simulation drivers: FAA ops data, weather data, airline schedule data, digital flight data, radar tracks, terrain data, surface data
• Virtual National Air Space (VNAS), run at GRC, ARC and LaRC:
  – 44,000 wing runs
  – 50,000 engine runs
  – 66,000 stabilizer runs
  – 48,000 human crew runs
  – 22,000 airframe impact runs
  – 132,000 landing/take-off gear runs

from genome to function
• gene expression as an embryo develops
EPCC MouseGrid: optical tomography image reconstruction in real time

digital radiology on the Grid
• 28 petabytes/year for 2000 hospitals
• must satisfy privacy laws
University of Pennsylvania

emergency response teams
• bring sensors, data, simulations and experts together
  – wildfire: predict movement of fire & direct fire-fighters
  – also earthquakes, peacekeeping forces, battlefields, …
[images: National Earthquake Simulation Grid; Los Alamos National Laboratory: wildfire]

Earth observation
• ENVISAT
  – €3.5 billion
  – 400 terabytes/year
  – 700 users
• ground deformation prior to a volcanic eruption

Grid development

data, information and knowledge
• virtual data … from the Grid
  – from a database somewhere
  – computed on request
  – measured on request
• automated knowledge … from computer science
  – data: uninterpreted bits and bytes
  – information: data equipped with meaning
  – knowledge: information applied to solve a problem

three-layer Grid abstraction
[diagram: Knowledge Grid over Information Grid over Computation/Data Grid; “data to knowledge” flows up the layers, “control” flows down]

the Grid as an evolving concept
• enabler for transient ‘virtual organisations’
• anatomy: a software infrastructure that enables flexible, secure, co-ordinated resource sharing among dynamic collections of individuals, institutions and resources
  – Foster, Kesselman & Tuecke (2001)
• evolution of and integration with web services
• physiology: everything is a Grid service, i.e. a service that conforms to a set of conventions for management and exchanging messages
  – Foster, Kesselman, Nick & Tuecke (2002)
• Global Grid Forum: define a standard Grid architecture
  – big business and big science working together

e-science in Scotland

UK e-Science programme
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor, Director General of Research Councils, Office of Science and Technology

UK e-Science funding
[organisation chart]
• DG Research Councils
• E-Science Steering Committee
• Grid TAG
• Director: Director’s Awareness and Co-ordination Role; Director’s Management Role
• Academic Application Support Programme: Research Councils (£74m), DTI (£5m); £80m
  – PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
• Generic Challenges: EPSRC (£15m), DTI (£15m)
• Collaborative projects: Industrial Collaboration (£40m)

UK e-science centres
[map: Edinburgh, Glasgow, Newcastle, Belfast, Manchester, DL, Cambridge, Hinxton, Oxford, RAL, Cardiff, London, Soton; AccessGrid always-on video walls]

National e-Science Centre
• Edinburgh + Glasgow Universities
  – Physics & Astronomy × 2
  – Informatics, Computing Science
  – EPCC
• £6M EPSRC/DTI + £2M SHEFC over 3 years
• e-Science Institute
  – visitors, workshops, co-ordination, outreach
• middleware development
  – 50 : 50 industry : academia
• ‘last-mile’ networking
www.nesc.ac.uk

data, data everywhere…
• globally distributed heterogeneous databases are growing very fast
  – science is at the frontier
  – commerce, healthcare and entertainment are not far behind
• Scottish e-Data Information & Knowledge Transformation Centre (eDIKT)
  – proposal to SHEFC for a centre to develop scalable database tools
  – astronomy, bioinformatics, geophysics, particle physics & commerce

Scotland at the frontier… leading
• UK core e-science
  – data integration
  – linked to US Globus
• UK AstroGrid
  – virtual observatory
  – linked to EU AVO
• UK GridPP + ScotGrid
  – particle physics data analysis
  – linked to EU DataGrid
• EU enacts + GRIDSTART
  – supercomputer centres
  – EU grid projects

Scotland at the frontier… participating
• EU DataGrid: particle physics, biology & medical imaging, Earth observation
• US DARPA Control of Agent-Based Systems Grid: multinational military operations
• UK RealityGrid: interactively couple experiments, simulations and visualisation
over 100 scientists engaged in grid development by the end of 2002

imagine a political party reception…
the leader enters…
a rumour is started…
and propagates across the room

from little acorns…
“It is worth noting that an essential feature of the type of theory which has been described in this note is the prediction of incomplete multiplets of scalar and vector bosons.”
Peter Higgs (1964)

“… a billion people interacting with a million e-businesses with a trillion intelligent devices interconnected”
Lou Gerstner, IBM (2000)

another technological revolution is underway