Edinburgh - at the Frontiers of e-Science Richard Kenway

discovery science
e-science = searching for the unknown
in vast amounts of data
electronic ‘needle in a haystack’
• to find the Higgs boson
– and explain where mass comes from
and
… are not enough
• you need to build a Grid
LHC computing challenge
(assumes PC = ~25 SpecInt95)
• one bunch crossing per 25 ns
• 100 triggers per second
• each event is ~1 MByte
• into the Online System: ~PByte/sec
• Online System to Offline Farm (~20,000 PCs): ~100 MByte/sec
• Offline Farm to Tier 0, the CERN Computer Centre (>20,000 PCs): ~100 MByte/sec
• Tier 0 to Tier 1 (US, Italian, French and RAL Regional Centres): ~Gbit/sec or Air Freight
• Tier 1 to Tier 2 (Tier2 Centres of ~1000 PCs each, including ScotGRID++): ~Gbit/sec
• Tier 3: Institutes (~200 PCs) with a physics data cache
• Tier 3 to Tier 4 (Workstations): 100 - 1000 Mbit/sec
physicists work on analysis “channels”
each institute has ~10 physicists working on
one or more channels
data for these channels is cached by the
institute server
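The rates quoted on this slide can be cross-checked with a little arithmetic. The following is a minimal Python sketch, not part of the original talk: the function names are illustrative, and the inputs are the slide's own figures (25 ns bunch spacing, 100 triggers per second, ~1 MByte per event).

```python
def crossings_per_second(spacing_ns: float) -> float:
    """Bunch crossings per second for a given spacing in nanoseconds."""
    return 1e9 / spacing_ns


def recorded_rate_mbyte_per_s(triggers_per_s: float, event_mbyte: float) -> float:
    """Data rate written out after the trigger, in MByte/sec."""
    return triggers_per_s * event_mbyte


def kept_fraction(triggers_per_s: float, spacing_ns: float) -> float:
    """Fraction of bunch crossings that survive the trigger."""
    return triggers_per_s / crossings_per_second(spacing_ns)


if __name__ == "__main__":
    # Figures taken from the slide; everything else is derived.
    print("crossings/sec:", crossings_per_second(25))
    print("recorded MByte/sec:", recorded_rate_mbyte_per_s(100, 1.0))
    print("fraction kept:", kept_fraction(100, 25))
```

With the slide's numbers this gives 40 million crossings per second and a recorded rate of 100 MByte/sec, which matches the ~100 MByte/sec link out of the Online System; only about one crossing in 400,000 is kept.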
the web on steroids
• 1989: Tim Berners-Lee
invented the web
– so physicists around the world
could share documents
• 1999: Grids add to the
web
– computing power
– data management
– big instruments
– (eventually) sensors
a new global infrastructure
• information on demand - like power from a socket
software
computers
sensor
nets
instruments
colleagues
data archives
• the Grid is an emergent infrastructure to deliver
dependable, pervasive and uniform access to globally
distributed, dynamic and heterogeneous resources
• problems of scalability, interoperability, fault tolerance,
resource management and security
underpinning technology
why now?
• for 50 years, we have been riding the crest of an IT wave
– building vast untapped global resources
– hundreds of millions of (mostly) idle PCs
3.5 million users, 22 teraflops
and
• big science is facing a data tsunami
increase in MIPS per chip
• microprocessor speeds double every 18 months (Moore’s Law)
[chart: MIPS/chip (log scale, 1 to 1,000,000) against year (1985 to 2010), showing the 286*, 386*, 486*, Pentium*, Pentium Pro, P7 (Merced), P8 and P12]
MIPS - millions of instructions per second
*Pentium, 286, 386 and 486 are registered trademarks of Intel Corp.
internet hosts (million), actual and projected
• network capacity doubles every 9 months
Jul-95: 8.2
Jul-96: 16.7
Jul-97: 26.1
Jul-98: 36.7
Jul-99: 56.2
Jul-00: 85
Jul-01: 120
Jul-02: 150
Jul-03: 180
Source: ITU “Challenges to the Network: Internet for Development, 1999”; Internet Software Consortium (www.isc.org); RIPE (www.ripe.net)
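The two doubling times quoted in the talk (18 months for microprocessor speed, 9 months for network capacity) compound very differently. The sketch below is an illustration added here, not part of the original slides; `growth_factor` is a hypothetical helper that assumes clean exponential growth at the stated doubling time.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years`, given a fixed doubling time in months."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    for years in (3, 6):
        cpu = growth_factor(years, 18)      # Moore's Law figure from the talk
        network = growth_factor(years, 9)   # network-capacity figure from the talk
        print(f"{years} years: processors x{cpu:g}, network x{network:g}")
```

Under these assumptions, three years bring a factor of 4 in processor speed but a factor of 16 in network capacity, so moving data to remote computers becomes relatively cheaper over time.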
fixed lines, mobile phones & internet users
[chart: fixed-line telephones, mobile phones and estimated Internet users, in millions (0 to 1,200), for 1995 to 2003]
note: columns show actual and projected users at end of year
source: ITU
Quality of Service on the internet
• aim to distinguish
types of traffic
– high priority fast lanes
– low priority slow lanes
• hard to configure
• intersim simulation
tool
– detailed model of
network
– understand and
validate configurations
EPCC + Cisco Systems
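To make the idea of fast and slow lanes concrete, here is a minimal Python sketch of a strict-priority queue. It is an illustration only: it does not describe how the intersim tool or any Cisco product works, and the two-class scheme, the class name `PriorityLanes` and the packet labels are all assumptions made for the example.

```python
import heapq
from itertools import count

HIGH, LOW = 0, 1  # smaller number = served first (assumed two-class scheme)


class PriorityLanes:
    """Strict-priority queue: high-priority packets always leave first."""

    def __init__(self) -> None:
        self._heap = []
        self._order = count()  # tie-breaker keeps arrival order within a lane

    def enqueue(self, packet: str, lane: int) -> None:
        heapq.heappush(self._heap, (lane, next(self._order), packet))

    def dequeue(self) -> str:
        _lane, _seq, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self) -> int:
        return len(self._heap)


def drain(queue: PriorityLanes) -> list:
    """Remove and return every queued packet in service order."""
    served = []
    while len(queue):
        served.append(queue.dequeue())
    return served


if __name__ == "__main__":
    q = PriorityLanes()
    q.enqueue("bulk-1", LOW)
    q.enqueue("voice-1", HIGH)
    q.enqueue("bulk-2", LOW)
    q.enqueue("voice-2", HIGH)
    print(drain(q))
```

A strict scheme like this can starve the slow lane whenever the fast lane stays busy, which is one reason real configurations are hard to get right and worth simulating first.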
Grid applications
whole-system simulations
• wing models: lift capabilities, drag capabilities, responsiveness
• stabilizer models: deflection capabilities, responsiveness
• airframe models
• human models (crew capabilities): accuracy, perception, stamina, reaction times, SOP’s
• engine models: thrust performance, reverse thrust performance, responsiveness, fuel consumption
• landing gear models: braking performance, steering capabilities, traction, dampening capabilities
NASA Information Power Grid: coupling all sub-system simulations
global in-flight engine diagnostics
in-flight data
airline
global network
eg SITA
ground
station
DS&S Engine Health Center
internet, e-mail, pager
maintenance centre
data centre
Distributed Aircraft Maintenance Environment: Universities of Leeds, Oxford, Sheffield & York
National Airspace Simulation Environment
• 22,000 commercial US flights a day
• simulation drivers: FAA ops data, weather data, airline schedule data, digital flight data, radar tracks, terrain data, surface data
• wing models: 44,000 wing runs
• engine models: 50,000 engine runs
• stabilizer models: 66,000 stabilizer runs
• human models: 48,000 human crew runs
• airframe models: 22,000 airframe impact runs
• landing gear models: 132,000 landing/take-off gear runs
• Virtual National Air Space (VNAS)
• GRC, ARC, LaRC
NASA Information Power Grid: aircraft, flight paths, airport operations and the environment
are combined to get a virtual national airspace
from genome to function
• gene expression as an embryo develops
EPCC MouseGrid: optical tomography image reconstruction in real time
digital radiology on the Grid
• 28 petabytes/year for 2000 hospitals
• must satisfy privacy laws
University of Pennsylvania
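The headline figure can be broken down per hospital with simple arithmetic. This is a rough sketch added for illustration: it assumes decimal (SI) units, an even split across hospitals and a 365-day year, none of which is stated on the slide, and the function names are hypothetical.

```python
BYTES_PER_PETABYTE = 10 ** 15  # decimal (SI) units assumed
BYTES_PER_TERABYTE = 10 ** 12
BYTES_PER_MEGABYTE = 10 ** 6
SECONDS_PER_YEAR = 365 * 24 * 3600


def per_hospital_tb_per_year(total_pb_per_year: float, hospitals: int) -> float:
    """Average yearly volume per hospital in terabytes, assuming an even split."""
    total_bytes = total_pb_per_year * BYTES_PER_PETABYTE
    return total_bytes / hospitals / BYTES_PER_TERABYTE


def average_mbyte_per_s(tb_per_year: float) -> float:
    """Sustained average rate in MByte/sec for a given yearly volume."""
    return tb_per_year * BYTES_PER_TERABYTE / SECONDS_PER_YEAR / BYTES_PER_MEGABYTE


if __name__ == "__main__":
    tb = per_hospital_tb_per_year(28, 2000)  # figures from the slide
    print("TB per hospital per year:", tb)
    print("average MByte/sec per hospital:", round(average_mbyte_per_s(tb), 2))
```

Under these assumptions each hospital produces about 14 terabytes a year, a sustained average of a little under half a megabyte per second.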
emergency response teams
• bring sensors, data,
simulations and experts
together
– wildfire: predict movement
of fire & direct fire-fighters
– also earthquakes,
peacekeeping forces,
battlefields,…
National Earthquake Simulation Grid
Los Alamos National Laboratory: wildfire
Earth observation
• ENVISAT
– € 3.5 billion
– 400 terabytes/year
– 700 users
• ground deformation
prior to a volcano
Grid development
data, information and knowledge
• virtual data …from the grid
– from a database somewhere
– computed on request
– measured on request
• automated knowledge …from computer science
– data: un-interpreted bits and bytes
– information: data equipped with meaning
– knowledge: information applied to solve a problem
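The notion of virtual data, where the user asks for a result without knowing whether it is fetched, computed or measured, can be sketched in a few lines of Python. This is a toy illustration and not an actual Grid API: the class `VirtualData`, its attributes and the order in which sources are tried are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple


class VirtualData:
    """Resolve a named data product from the first source able to supply it."""

    def __init__(self) -> None:
        self.stored: Dict[str, float] = {}                     # "from a database somewhere"
        self.recipes: Dict[str, Callable[[], float]] = {}      # "computed on request"
        self.instruments: Dict[str, Callable[[], float]] = {}  # "measured on request"

    def get(self, name: str) -> Tuple[float, str]:
        """Return (value, origin); the caller does not choose the origin."""
        if name in self.stored:
            return self.stored[name], "database"
        if name in self.recipes:
            value = self.recipes[name]()
            self.stored[name] = value  # keep the result so it is not recomputed
            return value, "computed"
        if name in self.instruments:
            return self.instruments[name](), "measured"
        raise KeyError(name)


if __name__ == "__main__":
    vd = VirtualData()
    vd.stored["archived_value"] = 1.0
    vd.recipes["derived_value"] = lambda: 2.0
    vd.instruments["live_reading"] = lambda: 3.0
    for key in ("archived_value", "derived_value", "derived_value", "live_reading"):
        print(key, vd.get(key))
```

In this sketch a computed result is stored after its first request, so a repeated request is answered from the store, while measurements are always taken afresh; a real system would need its own policy for both.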
three layer Grid abstraction
• Knowledge Grid
• Information Grid
• Computation/Data Grid
(arrows between layers: Data to Knowledge; Control)
the Grid as an evolving concept
• enabler for transient ‘virtual organisations’
• anatomy: a software infrastructure that enables flexible,
secure, co-ordinated resource sharing among dynamic
collections of individuals, institutions and resources
– Foster, Kesselman & Tuecke (2001)
• evolution of and integration with web services
• physiology: everything is a Grid service ie a service that
conforms to a set of conventions for management and
exchanging messages
– Foster, Kesselman, Nick & Tuecke (2002)
• Global Grid Forum: define a standard Grid architecture
– big business and big science working together
e-science in Scotland
UK e-Science programme
‘e-Science is about global collaboration in key
areas of science, and the next generation of
infrastructure that will enable it.’
‘e-Science will change the dynamic of the way
science is undertaken.’
John Taylor
Director General of Research Councils
Office of Science and Technology
UK e-Science funding
DG Research Councils
E-Science
Steering Committee
Director’s Awareness and Co-ordination Role
Academic Application Support Programme
Research Councils (£74m), DTI (£5m)
PPARC (£26m)
BBSRC (£8m)
MRC (£8m)
NERC (£7m)
£80m
ESRC (£3m)
EPSRC (£17m)
CLRC (£5m)
Grid TAG
Director
Director’s Management Role
Generic Challenges
EPSRC (£15m), DTI (£15m)
Collaborative projects
Industrial Collaboration (£40m)
UK e-science centres
• Edinburgh, Glasgow, DL, Belfast, Newcastle, Manchester, Oxford, Cardiff, RAL, Soton, Cambridge, London, Hinxton
• AccessGrid always-on video walls
National e-Science Centre
• Edinburgh + Glasgow Universities
– Physics & Astronomy × 2
– Informatics, Computing Science
– EPCC
• £6M EPSRC/DTI + £2M SHEFC
over 3 years
• e-Science Institute
– visitors, workshops, co-ordination,
outreach
• middleware development
– 50 : 50 industry : academia
• ‘last-mile’ networking
www.nesc.ac.uk
data, data everywhere…
• globally distributed heterogeneous databases are
growing very fast
– science is at the frontier
– commerce, healthcare, entertainment are not far behind
• Scottish e-Data Information & Knowledge
Transformation Centre (eDIKT)
– proposal to SHEFC for a centre to develop scalable
database tools
– astronomy, bioinformatics, geophysics, particle physics
& commerce
Scotland at the frontier… leading
• UK core e-science
– data integration
– linked to US Globus
• UK AstroGrid
– virtual observatory
– linked to EU AVO
• UK GridPP + ScotGrid
– particle physics data
analysis
– linked to EU DataGrid
• EU ENACTS + GRIDSTART
– supercomputer centres
– EU grid projects
Scotland at the frontier… participating
• EU DataGrid: particle physics,
biology & medical imaging, Earth
observation
• US DARPA Control of Agent-Based Systems Grid:
multinational military operations
• UK RealityGrid: interactively
couple experiments, simulations
and visualisation
over 100 scientists engaged in grid development
by the end of 2002
imagine a political party reception…
the leader enters…
a rumour is started…
and propagates across the room
from little acorns…
“ It is worth noting that an essential feature of
the type of theory which has been described
in this note is the prediction of incomplete
multiplets of scalar and vector bosons. ”
Peter Higgs (1964)
“ … a billion people interacting with a million
e-businesses with a trillion intelligent devices
interconnected ”
Lou Gerstner, IBM (2000)
another technological revolution is underway