Edinburgh - at the Frontiers of e-Science Richard Kenway

discovery science
e-science = searching for the unknown
in vast amounts of data
electronic ‘needle in a haystack’
• to find the Higgs boson
– and explain where mass comes from
and … are not enough
• you need to build a Grid
LHC computing challenge
[diagram: tiered LHC computing model; assumes PC = ~25 SpecInt95]
• one bunch crossing per 25 ns
• 100 triggers per second
• each event is ~1 MByte
• ~PByte/sec into the Online System, then ~100 MByte/sec to the Offline Farm (~20,000 PCs) and ~100 MByte/sec on to the CERN Computer Centre (>20,000 PCs)
• Tier 0: CERN Computer Centre
• Tier 1: US, Italian, French and RAL Regional Centres, fed at ~Gbit/sec or by air freight
• Tier 2: ScotGRID++ and other Tier2 Centres, ~1000 PCs each
• Tier 3: institutes (~200 PCs) with a physics data cache, linked at ~Gbit/sec
• Tier 4: workstations, linked at 100-1000 Mbit/sec
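The trigger figures above can be checked with a few lines of arithmetic. This is a back-of-envelope sketch: the crossing rate and the trigger's rejection factor follow directly from the slide's numbers, while the 10^7 seconds of data-taking per year is an assumption added here (it is not on the slide), used only to turn a rate into a yearly volume.

```python
# Back-of-envelope LHC data rates from the figures on the slide.
BUNCH_SPACING_NS = 25           # one bunch crossing per 25 ns
TRIGGERS_PER_S = 100            # 100 triggers per second
EVENT_SIZE_BYTES = 1_000_000    # each event is ~1 MByte

# ASSUMPTION (not on the slide): ~10^7 seconds of data-taking per year.
RUNNING_SECONDS_PER_YEAR = 10_000_000

crossing_rate_hz = 1_000_000_000 // BUNCH_SPACING_NS     # crossings per second
rejection_factor = crossing_rate_hz // TRIGGERS_PER_S    # crossings per triggered event
recorded_bytes_per_s = TRIGGERS_PER_S * EVENT_SIZE_BYTES
yearly_petabytes = recorded_bytes_per_s * RUNNING_SECONDS_PER_YEAR / 10**15

print(f"bunch crossings per second: {crossing_rate_hz:,}")
print(f"trigger keeps 1 crossing in {rejection_factor:,}")
print(f"recorded rate: {recorded_bytes_per_s / 10**6:.0f} MByte/sec")
print(f"yearly volume: ~{yearly_petabytes:.0f} PByte")
```

The 100 MByte/sec result matches the rate shown flowing into the Offline Farm; the trigger discards all but one crossing in 400,000.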
physicists work on analysis “channels”
each institute has ~10 physicists working on
one or more channels
data for these channels is cached by the
institute server
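Institute-level caching of channel data can be sketched as a least-recently-used (LRU) cache. The class name, the capacity and the `fetch` callback (standing in for a transfer from a higher tier) are hypothetical; the slide does not say which eviction policy institute servers use, and LRU is simply a common choice.

```python
from collections import OrderedDict
from typing import Callable, List


class ChannelCache:
    """Hypothetical institute-server cache for analysis-channel data (LRU)."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch = fetch                  # stands in for a transfer from a higher tier
        self.remote_fetches = 0
        self._store: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, channel: str) -> bytes:
        if channel in self._store:
            self._store.move_to_end(channel)   # hit: mark as most recently used
            return self._store[channel]
        data = self.fetch(channel)             # miss: fetch from the Grid
        self.remote_fetches += 1
        self._store[channel] = data
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)    # evict the least recently used channel
        return data

    def cached_channels(self) -> List[str]:
        return list(self._store)


# Usage: an institute server with room for two channels (hypothetical names).
cache = ChannelCache(capacity=2, fetch=lambda ch: f"events for {ch}".encode())
for channel in ["H->gamma gamma", "H->ZZ", "H->gamma gamma", "top", "H->ZZ"]:
    cache.get(channel)

print(cache.remote_fetches)     # 4
print(cache.cached_channels())  # ['top', 'H->ZZ']
```

Repeated requests for a channel the ~10 local physicists are already working on are served locally; only misses cross the wide-area network.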
the web on steroids
• 1989: Tim Berners-Lee invented the web
– so physicists around the world could share documents
• 1999: Grids add to the web
– computing power
– data management
– big instruments
– (eventually) sensors
a new global infrastructure
• information on demand, like power from a socket
[diagram: software, computers, sensor nets, instruments, colleagues, data archives]
• the Grid is an emergent infrastructure to deliver
dependable, pervasive and uniform access to globally
distributed, dynamic and heterogeneous resources
• problems of scalability, interoperability, fault tolerance,
resource management and security
underpinning technology
why now?
• for 50 years, we have been riding the crest of an IT wave
– building vast untapped global resources
– hundreds of millions of (mostly) idle PCs
[figure: 3.5 million users; 22 teraflops]
and
• big science is facing a data tsunami
[chart: increase in MIPS per chip, 1985 to 2010, log scale from 1 to 1,000,000 MIPS (millions of instructions per second); processors shown: 286, 386, 486, Pentium, Pentium Pro, P7 (Merced), P8, P12]
• microprocessor speeds double every 18 months (Moore’s Law)
[chart: internet hosts (million), actual and projected, July 1995 to July 2003; values shown: 8.2, 16.7, 26.1, 36.7, 56.2, 85, 120, 150, 180]
• network capacity doubles every 9 months
Source: ITU “Challenges to the Network: Internet for Development, 1999”; Internet Software Consortium (www.isc.org); RIPE (www.ripe.net)
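The two doubling times quoted on these slides compound very differently. The sketch below simply evaluates 2^(t/T) for each doubling time T; it assumes steady exponential growth, which the charts only approximate.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Growth factor after `years` for a quantity that doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


CPU_DOUBLING_MONTHS = 18      # microprocessor speeds (slide figure)
NETWORK_DOUBLING_MONTHS = 9   # network capacity (slide figure)

for years in (1.5, 3, 5):
    cpu = growth_factor(years, CPU_DOUBLING_MONTHS)
    net = growth_factor(years, NETWORK_DOUBLING_MONTHS)
    print(f"after {years} years: processors x{cpu:.1f}, network x{net:.1f}")
```

Because the network doubles twice as often, its growth factor is the square of the processors': after five years roughly x10 for processors against roughly x100 for network capacity.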
[chart: fixed lines, mobile phones & internet users (millions, scale 0 to 1,200), 1995 to 2003; series: fixed-line telephones, mobile phones, estimated Internet users; columns show actual and projected users at end of year]
source: ITU
Quality of Service on the internet
• aim to distinguish
types of traffic
– high priority fast lanes
– low priority slow lanes
• hard to configure
• intersim simulation
tool
– detailed model of
network
– understand and
validate configurations
EPCC + Cisco Systems
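The fast-lane/slow-lane idea can be illustrated with the simplest possible scheduler, strict priority queuing. This is a generic sketch, not how intersim or Cisco equipment model QoS; the class names and the two priority levels are assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

HIGH, LOW = 0, 1   # lower value = higher priority ("fast lane" vs "slow lane")


@dataclass(order=True)
class Packet:
    priority: int
    seq: int                              # arrival order: FIFO within each lane
    payload: str = field(compare=False)


class StrictPriorityLink:
    """Hypothetical output link: always sends queued high-priority traffic first."""

    def __init__(self) -> None:
        self._queue: List[Packet] = []
        self._arrivals = count()

    def enqueue(self, payload: str, priority: int) -> None:
        heapq.heappush(self._queue, Packet(priority, next(self._arrivals), payload))

    def transmit(self) -> Optional[str]:
        """Send the next packet, or return None if the link is idle."""
        if not self._queue:
            return None
        return heapq.heappop(self._queue).payload


link = StrictPriorityLink()
link.enqueue("bulk-1", LOW)
link.enqueue("video-1", HIGH)
link.enqueue("bulk-2", LOW)
link.enqueue("video-2", HIGH)
sent = [link.transmit() for _ in range(4)]
print(sent)   # ['video-1', 'video-2', 'bulk-1', 'bulk-2']
```

Strict priority is easy to reason about but can starve the slow lane under sustained high-priority load; schemes such as weighted fair queuing trade some of that strictness for fairness.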
Grid applications
whole-system simulations
[diagram: coupled aircraft sub-system models]
• wing models: lift capabilities, drag capabilities, responsiveness
• stabilizer models: deflection capabilities, responsiveness
• airframe models
• human models (crew capabilities): accuracy, perception, stamina, reaction times, SOPs
• landing gear models: braking performance, steering capabilities, traction, dampening capabilities
• engine models: thrust performance, reverse thrust performance, responsiveness, fuel consumption
NASA Information Power Grid: coupling all sub-system simulations
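Coupling sub-system simulations can be shown with two toy models advanced in lock-step, where the output of one becomes the input of the other at every time step. Everything here is hypothetical (the model classes, the first-order engine lag, the quadratic drag law and all numbers), and the sketch runs in one process, so it shows only the coupling pattern, not the distribution across sites that the Information Power Grid provides.

```python
from dataclasses import dataclass


@dataclass
class EngineModel:
    """Toy engine: thrust moves a fixed fraction of the way to its target each step."""
    max_thrust_n: float
    response: float            # fraction of the gap to the target closed per step
    thrust_n: float = 0.0

    def step(self, throttle: float) -> float:
        target = throttle * self.max_thrust_n
        self.thrust_n += self.response * (target - self.thrust_n)
        return self.thrust_n


@dataclass
class AirframeModel:
    """Toy airframe: point mass with drag = drag_coeff * speed**2."""
    mass_kg: float
    drag_coeff: float
    speed_ms: float = 0.0

    def step(self, thrust_n: float, dt_s: float) -> float:
        drag_n = self.drag_coeff * self.speed_ms ** 2
        self.speed_ms += (thrust_n - drag_n) / self.mass_kg * dt_s   # explicit Euler
        return self.speed_ms


def run_coupled(engine: EngineModel, airframe: AirframeModel,
                throttle: float, steps: int, dt_s: float) -> float:
    """Advance both models in lock-step, feeding engine thrust into the airframe."""
    speed = airframe.speed_ms
    for _ in range(steps):
        thrust = engine.step(throttle)
        speed = airframe.step(thrust, dt_s)
    return speed


engine = EngineModel(max_thrust_n=100_000.0, response=0.2)
airframe = AirframeModel(mass_kg=50_000.0, drag_coeff=10.0)
final_speed = run_coupled(engine, airframe, throttle=1.0, steps=2000, dt_s=1.0)
print(f"speed after coupling: {final_speed:.1f} m/s")
```

The coupled system settles where thrust balances drag (here 100 m/s); real coupled codes also have to agree on time steps, units and data formats between models written by different groups.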
global in-flight engine diagnostics
[diagram: in-flight data; global network (e.g. SITA); airline ground station; DS&S Engine Health Center; internet, e-mail, pager; maintenance centre; data centre]
Distributed Aircraft Maintenance Environment: Universities of Leeds, Oxford, Sheffield & York
National Airspace Simulation Environment
[diagram: Virtual National Air Space (VNAS) built from sub-system models and simulation drivers; sites shown: GRC, ARC, LaRC]
• 44,000 wing runs; 50,000 engine runs; 66,000 stabilizer runs; 48,000 human crew runs; 22,000 airframe impact runs; 132,000 landing/take-off gear runs
• 22,000 commercial US flights a day
• simulation drivers: FAA ops data, weather data, airline schedule data, digital flight data, radar tracks, terrain data, surface data
NASA Information Power Grid: aircraft, flight paths, airport operations and the environment are combined to get a virtual national airspace
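Adding up the run counts on the slide gives a sense of the scale of the workload. The dictionary restates the slide's figures; the runs-per-flight ratio is computed only for scale, since the slide does not say how runs map onto flights.

```python
# Sub-system simulation runs shown on the VNAS slide.
runs = {
    "wing": 44_000,
    "engine": 50_000,
    "stabilizer": 66_000,
    "human crew": 48_000,
    "airframe impact": 22_000,
    "landing/take-off gear": 132_000,
}
US_COMMERCIAL_FLIGHTS_PER_DAY = 22_000

total_runs = sum(runs.values())
print(f"total sub-system runs: {total_runs:,}")
for model, n in runs.items():
    ratio = n / US_COMMERCIAL_FLIGHTS_PER_DAY
    print(f"{model:>22}: {n:>7,} runs ({ratio:.1f} per daily flight)")
```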
from genome to function
• gene expression as an embryo develops
EPCC MouseGrid: optical tomography image reconstruction in real time
digital radiology on the Grid
• 28 petabytes/year for 2000 hospitals
• must satisfy privacy laws
University of Pennsylvania
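Spreading the quoted 28 petabytes/year evenly over 2000 hospitals gives a per-site figure. This assumes decimal units (1 PB = 10^15 bytes) and an even split across hospitals and days, neither of which the slide states.

```python
PB, TB, GB = 10**15, 10**12, 10**9

total_bytes_per_year = 28 * PB   # 28 petabytes/year (slide)
hospitals = 2000                 # 2000 hospitals (slide)

# ASSUMPTION: an even split across hospitals and across the 365 days of a year.
per_hospital_per_year = total_bytes_per_year / hospitals
per_hospital_per_day = per_hospital_per_year / 365

print(f"{per_hospital_per_year / TB:.0f} TB per hospital per year")
print(f"{per_hospital_per_day / GB:.0f} GB per hospital per day")
```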
emergency response teams
• bring sensors, data,
simulations and experts
together
– wildfire: predict movement
of fire & direct fire-fighters
– also earthquakes,
peacekeeping forces,
battlefields,…
National Earthquake Simulation Grid
Los Alamos National Laboratory: wildfire
Earth observation
• ENVISAT
– €3.5 billion
– 400 terabytes/year
– 700 users
• ground deformation prior to a volcanic eruption
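The ENVISAT figure of 400 terabytes/year translates into a sustained data rate as follows; the sketch assumes decimal units and a constant rate over a 365-day year.

```python
TB = 10**12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

yearly_bytes = 400 * TB   # 400 terabytes/year (slide)

# ASSUMPTION: data arrives at a constant rate all year round.
per_day_tb = yearly_bytes / 365 / TB
average_mbyte_per_s = yearly_bytes / SECONDS_PER_YEAR / 10**6

print(f"~{per_day_tb:.1f} TB per day")
print(f"~{average_mbyte_per_s:.1f} MByte/sec on average")
```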
Grid development
data, information and knowledge
• virtual data …from the grid
– from a database somewhere
– computed on request
– measured on request
• automated knowledge …from computer science
– data: un-interpreted bits and bytes
– information: data equipped with meaning
– knowledge: information applied to solve a problem
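The three routes to virtual data can be sketched as a resolver that hides where a data product comes from. The class, its lookup order (stored copy first, then a computation, then a measurement) and the decision to keep every result for reuse are assumptions for illustration; a real system would also need to decide when a stored measurement is too old to reuse.

```python
from typing import Any, Callable, Dict, List, Tuple


class VirtualDataCatalogue:
    """Hypothetical resolver: callers ask for a data product by name and do not
    need to know whether it is stored, computed or measured."""

    def __init__(self) -> None:
        self.stored: Dict[str, Any] = {}                   # "from a database somewhere"
        self.recipes: Dict[str, Callable[[], Any]] = {}    # "computed on request"
        self.sensors: Dict[str, Callable[[], Any]] = {}    # "measured on request"
        self.log: List[Tuple[str, str]] = []               # (name, where it came from)

    def get(self, name: str) -> Any:
        if name in self.stored:
            self.log.append((name, "database"))
            return self.stored[name]
        for source, producers in (("computed", self.recipes), ("measured", self.sensors)):
            if name in producers:
                value = producers[name]()
                self.stored[name] = value      # keep the result so later requests are lookups
                self.log.append((name, source))
                return value
        raise KeyError(f"no way to obtain {name!r}")


catalogue = VirtualDataCatalogue()
catalogue.stored["calibration"] = [2.0, 4.0]
catalogue.recipes["mean_calibration"] = lambda: sum(catalogue.get("calibration")) / 2
catalogue.sensors["temperature"] = lambda: 21.5

print(catalogue.get("mean_calibration"))   # 3.0, computed on request
print(catalogue.get("mean_calibration"))   # 3.0 again, now served from the store
print(catalogue.get("temperature"))        # 21.5, measured on request
```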
three-layer Grid abstraction
[diagram, top to bottom: Knowledge Grid, Information Grid, Computation/Data Grid; arrows labelled ‘Data to Knowledge’ and ‘Control’]
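The distinction between data, information and knowledge, and the ‘Data to Knowledge’ direction in the layered picture, can be made concrete with a toy example. The sensor, the byte layout and the threshold rule are all hypothetical; the point is only that the same four bytes become information once their meaning is known, and knowledge once that information is used to make a decision.

```python
import struct

# Data: un-interpreted bits and bytes, e.g. four bytes received from a sensor.
raw = struct.pack(">f", 38.5)

# Information: the same bytes equipped with meaning. We are told (hypothetically)
# that they hold a big-endian 32-bit float giving a body temperature in deg C.
(value,) = struct.unpack(">f", raw)
reading = {"quantity": "body temperature", "unit": "degC", "value": value}

# Knowledge: information applied to solve a problem, here via a hypothetical rule.
FEVER_THRESHOLD_C = 38.0
decision = "refer to clinician" if reading["value"] >= FEVER_THRESHOLD_C else "no action"

print(raw.hex())   # 421a0000
print(reading)
print(decision)    # refer to clinician
```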
the Grid as an evolving concept
• enabler for transient ‘virtual organisations’
• anatomy: a software infrastructure that enables flexible,
secure, co-ordinated resource sharing among dynamic
collections of individuals, institutions and resources
– Foster, Kesselman & Tuecke (2001)
• evolution of and integration with web services
• physiology: everything is a Grid service, i.e. a service that
conforms to a set of conventions for management and
exchanging messages
– Foster, Kesselman, Nick & Tuecke (2002)
• Global Grid Forum: define a standard Grid architecture
– big business and big science working together
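The ‘physiology’ view, in which every Grid service follows the same conventions for management and for exchanging messages, can be sketched as an abstract base class that all services inherit from. The method names, the lifetime-based management and the dictionary message format are hypothetical and are not taken from any Grid standard; they only show how shared conventions let a client treat very different services uniformly.

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class GridService(ABC):
    """Hypothetical base class: every service shares the same management and
    messaging conventions, whatever it actually does."""

    def __init__(self, lifetime_s: float) -> None:
        self.expires_at = time.time() + lifetime_s

    # Shared management conventions.
    def describe(self) -> Dict[str, Any]:
        return {"type": type(self).__name__, "expires_at": self.expires_at}

    def extend_lifetime(self, extra_s: float) -> None:
        self.expires_at += extra_s

    def is_alive(self, now: Optional[float] = None) -> bool:
        return (time.time() if now is None else now) < self.expires_at

    # Shared messaging convention: a dict in, a dict with a status out.
    def handle(self, message: Dict[str, Any]) -> Dict[str, Any]:
        if not self.is_alive():
            return {"status": "error", "reason": "service lifetime expired"}
        return {"status": "ok", "result": self.process(message)}

    @abstractmethod
    def process(self, message: Dict[str, Any]) -> Any:
        """Service-specific behaviour, supplied by each subclass."""


class ReplicaLocator(GridService):
    """Hypothetical example service: which sites hold copies of a file?"""

    def __init__(self, lifetime_s: float, replicas: Dict[str, List[str]]) -> None:
        super().__init__(lifetime_s)
        self.replicas = replicas

    def process(self, message: Dict[str, Any]) -> Any:
        return self.replicas.get(message["file"], [])


locator = ReplicaLocator(lifetime_s=60, replicas={"run42.dat": ["RAL", "CERN"]})
print(locator.handle({"file": "run42.dat"}))   # {'status': 'ok', 'result': ['RAL', 'CERN']}

expired = ReplicaLocator(lifetime_s=-1, replicas={})
print(expired.handle({"file": "run42.dat"}))   # {'status': 'error', 'reason': 'service lifetime expired'}
```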
e-science in Scotland
UK e-Science programme
‘e-Science is about global collaboration in key
areas of science, and the next generation of
infrastructure that will enable it.’
‘e-Science will change the dynamic of the way
science is undertaken.’
John Taylor
Director General of Research Councils
Office of Science and Technology
UK e-Science funding
[chart: organisation and funding of the UK e-Science programme]
• DG Research Councils; E-Science Steering Committee; Grid TAG; Director (Director’s Awareness and Co-ordination Role; Director’s Management Role)
• Generic Challenges: EPSRC (£15m), DTI (£15m)
• Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
– PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
• £80m Collaborative projects
• Industrial Collaboration (£40m)
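The figures on the funding chart can be cross-checked: the seven individual Research Council allocations add up to the £74m shown for the Research Councils, and the two Generic Challenges contributions total £30m. The grouping follows the chart's labels; the slide does not spell out how the £80m for collaborative projects is made up, so the sketch does not try to reproduce it.

```python
# Allocations from the UK e-Science funding chart, in £ million.
generic_challenges = {"EPSRC": 15, "DTI": 15}
research_council_applications = {
    "PPARC": 26, "BBSRC": 8, "MRC": 8, "NERC": 7,
    "ESRC": 3, "EPSRC": 17, "CLRC": 5,
}
RESEARCH_COUNCILS_TOTAL = 74     # "Research Councils (£74m)" on the chart
industrial_collaboration = 40    # "Industrial Collaboration (£40m)"

councils_sum = sum(research_council_applications.values())
assert councils_sum == RESEARCH_COUNCILS_TOTAL, councils_sum

print(f"Research Council applications: £{councils_sum}m")
print(f"Generic Challenges: £{sum(generic_challenges.values())}m")
print(f"Industrial Collaboration: £{industrial_collaboration}m")
```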
UK e-science centres
[map: Edinburgh, Glasgow, Newcastle, Belfast, Manchester, DL, Cambridge, Hinxton, Oxford, RAL, Cardiff, London, Southampton]
AccessGrid: always-on video walls
National e-Science Centre
• Edinburgh + Glasgow Universities
– Physics & Astronomy (×2)
– Informatics, Computing Science
– EPCC
• £6M EPSRC/DTI + £2M SHEFC
over 3 years
• e-Science Institute
– visitors, workshops, co-ordination,
outreach
• middleware development
– 50 : 50 industry : academia
• ‘last-mile’ networking
www.nesc.ac.uk
data, data everywhere…
• globally distributed heterogeneous databases are
growing very fast
– science is at the frontier
– commerce, healthcare, entertainment are not far behind
• Scottish e-Data Information & Knowledge
Transformation Centre (eDIKT)
– proposal to SHEFC for a centre to develop scalable
database tools
– astronomy, bioinformatics, geophysics, particle physics
& commerce
Scotland at the frontier… leading
• UK core e-science
– data integration
– linked to US Globus
• UK AstroGrid
– virtual observatory
– linked to EU AVO
• UK GridPP + ScotGrid
– particle physics data
analysis
– linked to EU DataGrid
• EU ENACTS + GRIDSTART
– supercomputer centres
– EU grid projects
Scotland at the frontier… participating
• EU DataGrid: particle physics,
biology & medical imaging, Earth
observation
• US DARPA Control of Agent-Based Systems Grid:
multinational military operations
• UK RealityGrid: interactively
couple experiments, simulations
and visualisation
over 100 scientists engaged in grid development
by the end of 2002
imagine a political party reception…
the leader enters…
a rumour is started…
and propagates across the room
from little acorns…
“It is worth noting that an essential feature of
the type of theory which has been described
in this note is the prediction of incomplete
multiplets of scalar and vector bosons.”
Peter Higgs (1964)
“… a billion people interacting with a million
e-businesses with a trillion intelligent devices
interconnected”
Lou Gerstner, IBM (2000)
another technological revolution is underway