What is Grid Computing? Richard Hopkins Enabling Grids for E-sciencE

Enabling Grids for E-sciencE
What is Grid Computing?
Richard Hopkins
rph@nesc.ac.uk
NGS Induction – Rutherford Appleton Laboratory,
2nd / 3rd November 2005
www.eu-egee.org
INFSO-RI-508833
Acknowledgements
• This talk was prepared by Richard Hopkins of NeSC and includes
slides from previous tutorials and talks delivered by:
– Dave Berry, Mike Mineter, Guy Warner (National e-Science Centre)
– the EDG training team
– Ian Foster, Argonne National Laboratory
– Jeffrey Grethe, SDSC
– EGEE colleagues
– Mark Baker, The Distributed Systems Group, University of Portsmouth,
http://dsg.port.ac.uk/mab
• Talks at 3rd EGEE conference by
– Kyriakos Baxevanidis, Deputy Head, Unit of Research
Infrastructures, European Commission, DG INFSO
– Dr Spyros Konidaris, European Commission – DG INFSO
NGS Induction, RAL Nov 2nd/3rd 2005 - What is Grid Computing – Richard Hopkins
Goals and Content
Goal - To introduce the concepts of Grid computing
assuming no previous knowledge
• What is “a grid” ?
• Drivers of grid computing
• Current status of grids
The Grid Metaphor
[Figure: GRID MIDDLEWARE linking mobile access, workstations, supercomputers and PC clusters, data storage, sensors and experiments, and visualisation, over the Internet and other networks]
The grid vision
• The grid vision is of “Virtual
computing” (+ information
services to locate computation,
storage resources)
– Compare: The web: “virtual
documents” (+ search engine
to locate them)
• MOTIVATION: collaboration
through sharing resources
(and expertise) to expand
horizons of
– Research
– Commerce – engineering, …
“the knowledge economy”
– Public service – health,
environment,…
“A grid”
• The initial vision: “The Grid”
• The present reality: Many
“grids”
• Each grid is an infrastructure
enabling one or more “virtual
organisations” (VOs) to share
computing resources
• What’s a VO?
– People in different
organisations seeking to
cooperate and share
resources across their
organisational boundaries
• Why establish a Grid?
– Share data
– Share computers
– Share instruments
– Collaborate
[Figure: a VO spanning Institute A, Institute B, Institute C and Institute D]
Single Computer
• The Operating System
enables easy use of
– Input/Output devices
– Processor
– Disks
– Display
– Instruments
[Figure: layered stack, top to bottom: Application Software; Operating System; Disks, Processor, Memory, …]
Local Area Network
User just perceives “shared
resources”, with no regard to
location in the organisation
LAN resources act like a single
virtual computer
Middleware (LAN O/S) presents
that image
Application Software
Middleware for sharing
computers, servers, printers, …
Operating System on each
computer
Resources connected by a LAN
A grid
• Users join VOs
• Virtual organisation
negotiates with
sites to agree
access to resources
• Distributed services
(both people and
middleware) enable
the grid
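The relationship between users, VOs and sites described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class VirtualOrganisation, the function may_use, and the names "biomed", "alice" and "bob" are hypothetical and do not correspond to any real grid middleware.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganisation:
    """Hypothetical model of a VO: its members and the sites it has agreements with."""
    name: str
    members: set = field(default_factory=set)
    agreed_sites: set = field(default_factory=set)

    def join(self, user: str) -> None:
        # "Users join VOs"
        self.members.add(user)

    def negotiate_access(self, site: str) -> None:
        # "Virtual organisation negotiates with sites to agree access to resources"
        self.agreed_sites.add(site)


def may_use(user: str, site: str, vos: list) -> bool:
    """A user may use a site if a VO they belong to has an agreement with that site."""
    return any(user in vo.members and site in vo.agreed_sites for vo in vos)


biomed = VirtualOrganisation("biomed")
biomed.join("alice")
biomed.negotiate_access("Institute A")

print(may_use("alice", "Institute A", [biomed]))  # member of a VO with an agreement
print(may_use("alice", "Institute B", [biomed]))  # no agreement with this site
print(may_use("bob", "Institute A", [biomed]))    # not a member of the VO
```

The point of the sketch is that access is granted to the VO as a whole, so a site does not need to know each individual user in advance.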
Grid
• Grid middleware creates
the image of the Grid being
a single virtual computer
(Ideally)
Issues
• Heterogeneity – hardware,
software, culture
• Scalability
• Reliability – tolerate
permanent partial failure
• Viable computing model – batch processing
• Access control
– Authentication
– Authorisation
– Single sign on
Application Software
Interface between app. and grid
Grid Middleware: “collective services”
Grid Middleware on each
resource
Operating System on each
resource
Resources connected by internet
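One way to read "tolerate permanent partial failure" in a batch-processing model is that a job is simply tried on another resource when one is unavailable. The following is a minimal sketch of that idea, not a description of how any particular middleware schedules jobs; run_on, submit and the site names are hypothetical, and failures are simulated with a set of unavailable resources.

```python
def run_on(resource: str, job: str, failed: set) -> str:
    """Simulate running a batch job; resources listed in `failed` are unavailable."""
    if resource in failed:
        raise RuntimeError(f"{resource} unavailable")
    return f"{job} done on {resource}"


def submit(job: str, resources: list, failed: set) -> str:
    """Try each resource in turn, so one failed site does not stop the job."""
    for resource in resources:
        try:
            return run_on(resource, job, failed)
        except RuntimeError:
            continue  # move on to the next resource
    raise RuntimeError(f"no resource could run {job}")


sites = ["site-a", "site-b", "site-c"]
print(submit("job-1", sites, failed={"site-a"}))
```

With site-a marked as failed, the job completes on site-b; the user never needs to know which site was down.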
What characterises a grid?
• Co-ordinated resource sharing
– No centralised point of control
– Different administrative domains.
• Standard, open, general-purpose protocols and
interfaces
– NOT specific to an application
– EGEE, NGS support multiple VOs
• Delivering non-trivial qualities of service
– Co-ordinated to deliver combined services,
greater than sum of the individual components
• http://www.gridtoday.com/02/0722/100136.html
The components of a Grid
• Resources
– networking, computers, storage, data, instruments, …
• Grid Middleware
– the “operating system of the grid”
• Operations infrastructure
– Run enabling services (people + software)
• Virtual Organization management
– Procedures for gaining access to resources
DRIVERS OF GRID COMPUTING
Goal - To introduce the concepts of Grid computing assuming
no previous knowledge
• What is “a grid” ?
• Drivers of grid computing
• Current status of grids
The first driver: e-Science
• What is e-Science?
Collaborative science that is made possible by the
sharing across the Internet of resources (data,
instruments, computation, people’s expertise...)
– Often very compute intensive
– Often very data intensive (both creating new data and accessing
very large data collections) – data deluges from new
technologies
– Crosses administrative boundaries
• Examples….
Astronomy
No. & sizes of data sets as of mid-2002,
grouped by wavelength
• 12 waveband coverage of large
areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1 billion objects
Data and images courtesy Alex Szalay, Johns Hopkins University
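To make the growth rate concrete, the two figures on this slide (about 200 TB in mid-2002, doubling every 12 months) can be turned into a simple projection. This is a sketch only: it assumes the doubling rate stays constant, which the slide does not claim, and the function name projected_tb is made up for the example.

```python
def projected_tb(years_after_2002: float,
                 start_tb: float = 200.0,
                 doubling_years: float = 1.0) -> float:
    """Data volume in TB, assuming steady doubling from the mid-2002 figure."""
    return start_tb * 2 ** (years_after_2002 / doubling_years)


for years in range(4):
    print(2002 + years, projected_tb(years), "TB")
```

Under that assumption the total would pass one petabyte within three years (200, 400, 800, 1600 TB).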
Large Hadron Collider at CERN
• Data Challenge:
– 10 Petabytes/year of data !!!
– 20 million CDs each year!
• Simulation, reconstruction,
analysis:
– LHC data handling requires computing
power equivalent to ~100,000 of today's
fastest PC processors!
• Operational challenges
– Reliable and scalable through project
lifetime of decades
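The "20 million CDs" comparison can be checked with a line of arithmetic. The sketch below assumes decimal units (1 petabyte = 10^15 bytes); the variable names are chosen for the example only.

```python
PETABYTE = 10**15                      # decimal petabyte, an assumption

data_per_year = 10 * PETABYTE          # "10 Petabytes/year of data"
cds_per_year = 20 * 10**6              # "20 million CDs each year"

bytes_per_cd = data_per_year // cds_per_year
print(bytes_per_cd / 10**6, "MB per CD")  # 500.0 MB per CD
```

The implied 500 MB per disc is of the same order as the 650 to 700 MB capacity of a standard CD-ROM, so the slide's comparison is a reasonable round figure.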
[Figure labels: Mont Blanc (4810 m); Downtown Geneva]
BLAST gridification
[Figure: an input file of sequences (Seq1, Seq2, …, Seqn) is submitted from the UI; computing elements each run BLAST against a database (DB); the outputs form a single RESULT]
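The pattern in the BLAST figure (split the input sequences, search each batch on a separate computing element, combine the output) can be sketched as follows. This is an illustration under several assumptions: the input is taken to be FASTA-style text, the round-robin split is one possible choice rather than the method used in the talk, and fake_search is a hypothetical stand-in for a real BLAST run that just reports sequence lengths.

```python
def read_fasta(text: str) -> list:
    """Parse FASTA-style text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_round_robin(records: list, n_jobs: int) -> list:
    """Deal the records out into n_jobs batches, one per computing element."""
    return [records[i::n_jobs] for i in range(n_jobs)]


def fake_search(batch: list) -> list:
    """Hypothetical stand-in for BLAST on one computing element."""
    return [(header, len(seq)) for header, seq in batch]


def run_gridified(text: str, n_jobs: int) -> list:
    """Split the input, process each batch independently, merge into one result."""
    results = []
    for batch in split_round_robin(read_fasta(text), n_jobs):
        results.extend(fake_search(batch))  # on a grid these would run in parallel
    return sorted(results)


example = ">Seq1\nACGT\nAC\n>Seq2\nGG\n>Seq3\nTTT\n"
print(run_gridified(example, n_jobs=2))
```

Because each sequence is searched independently, the batches need no communication with each other, which is what makes this kind of workload a good fit for batch processing on a grid.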
DAME: Grid-based tools and Infrastructure for Aero-Engine Diagnosis
and Prognosis
• “A significant factor in the success of the Rolls-Royce
campaign to power the Boeing 7E7 with the Trent 1000
was the emphasis on the new aftermarket support service
for the engines provided via DS&S. Boeing personnel
were shown DAME as an example of the new ways of
gathering and processing the large amounts of data that
could be retrieved from an advanced aircraft such as the
7E7, and they were very impressed”, DS&S 2004
• Companies: Rolls-Royce, DS&S, Cybula
• Universities: York, Leeds, Sheffield, Oxford
[Figure labels: Engine flight data; London Airport; New York Airport; Airline office; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center; XTO; Engine Model; Case Based Reasoning]
Academic drivers: not only e-science!!
The impact of grids when they support…
Curation, discovery, reuse of knowledge
e-Research
e-Science
Academic drivers
Derived from a slide by
the UK’s JISC
• E-research
• Digital libraries
• Centrality of
curation,
preservation
• Under-recognised by
many researchers
• Virtual Digital Data
Libraries needed for
research as well as
learning
• E-learning
• AAA Services
• e-Infrastructure
Political drivers
• Entering the “knowledge society” from the “industrial society”
Industrial society = Transportation Infrastructure
Knowledge society = Communications infrastructure
• Lisbon strategy: Research and Innovation will be the most
important factors in determining Europe’s success through the
next decades
• THE GOAL: “UNLEASH CREATIVITY”- by investment in
– Human skills
– Infrastructures
• Growth of e-infrastructure (= networks + grid + operations)
– phase 1: mainly academia, some in industry: “an elite, privileged to do
this job”
– phase 2: ordinary people doing distributed work; SMEs, adopt, adapt
and use
– phase 3: the next generations
▪ Will transform e-infrastructure and its uses
▪ We don’t know how others will use what we devise
▪ Just as current use of WWW not predictable by its initiators
EGEE – building e-infrastructure
EGEE is building a large-scale
production grid service to:
• Underpin research,
technology and public service
• Link with and build on
national, regional and
international initiatives
• Foster international
cooperation both in the
creation and the use of the e-infrastructure
[Figure labels: Pan-European Grid; Operations, Support and training; Collaboration; Network infrastructure & Resource centres]
CURRENT STATUS OF GRIDS
Goal - To introduce the concepts of Grid computing assuming
no previous knowledge
• What is “a grid” ?
• Drivers of grid computing
• Current status of grids
If “The Grid”
vision leads us
here…
… then where are
we now?
Grid projects
Many Grid development efforts — all over the world
• UK – OGSA-DAI, RealityGrid, GeoDise,
Comb-e-Chem, DiscoveryNet, DAME,
AstroGrid, GridPP, MyGrid, GOLD,
eDiamond, Integrative Biology, …
• Netherlands – VLAM, PolderGrid
• Germany – UNICORE, Grid proposal
• France – Grid funding approved
• Italy – INFN Grid
• Eire – Grid proposals
• Switzerland – Network/Grid proposal
• Hungary – DemoGrid, Grid proposal
• Norway, Sweden – NorduGrid
• NASA Information Power Grid
• DOE Science Grid
• NSF National Virtual Observatory
• NSF GriPhyN
• DOE Particle Physics Data Grid
• NSF TeraGrid
• DOE ASCI Grid
• DOE Earth Systems Grid
• DARPA CoABS Grid
• NEESGrid
• DOH BIRN
• NSF iVDGL
• DataGrid (CERN, ...)
• EuroGrid (Unicore)
• DataTag (CERN, …)
• Astrophysical Virtual Observatory
• GRIP (Globus/Unicore)
• GRIA (Industrial applications)
• GridLab (Cactus Toolkit)
• CrossGrid (Infrastructure Components)
• EGSO (Solar Physics)
Grids: where are we now?
• Many key concepts identified and known
• Many grid projects have tested, and benefit from, these
• Major efforts now on establishing:
– Standards (a slow process)
(e.g. Global Grid Forum, http://www.gridforum.org/ )
– Production Grids for multiple VOs
▪ “Production” = Reliable, sustainable, with commitments to quality of
service
• In Europe, EGEE
• In UK, National Grid Service
• In US, TeraGrid
▪ One stack of middleware that serves many research (and other!!!)
communities
▪ Operational procedures and services (people!, policy, ...)
– New user communities
• … whilst research & development continues
The key for new VOs
Application
Application
toolkits, standards
Middleware:
“collective services”
Basic Grid services:
AA, job submission, info, …
• The tools, services used
by the VO’s applications
• Application development
environment, portals,
semantics
• Insulate applications
from changing
middleware
The vision of 2001: convergence of
Web Services and Grids
Open Grid
Services
Architecture
Web services
World-wide web
OGSI
Grid prototypes
High-end computing
High throughput-computing
INTERNET
Massively parallel
computing
Key concepts
• Virtual organisation:
– people and resources collaborating - across admin, organisational
boundaries
▪ Individual joins VO
▪ VO negotiations with resource providers
• Grid middleware
– running on each resource to interface it to the Grid
– providing specific services
• Single Virtual Computer
– User just perceives “shared resources” with no concern for location or
owning organisation
– Issues
▪ Heterogeneity
▪ Scalability
▪ Reliability
▪ Computing model
▪ Access control
Key Concepts
• Drives are towards
– Production services (reliable, sustainable,… – against which
research projects can plan with confidence)
▪ In Europe, EGEE
▪ In UK, National Grid Service
– Standards & convergence with WWW mainstream
– Empowering new user communities