Concepts of grid computing Mike Mineter Enabling Grids for E-sciencE

Enabling Grids for E-sciencE
Concepts of grid computing
Mike Mineter
mjm@nesc.ac.uk
www.eu-egee.org
INFSO-RI-508833
Acknowledgements
• This talk was prepared by Mike Mineter of NeSC and includes slides from previous tutorials and talks delivered by:
– Dave Berry, Richard Hopkins, Guy Warner (National e-Science Centre)
– the EDG training team
– Ian Foster, Argonne National Laboratory
– Jeffrey Grethe, SDSC
– EGEE colleagues
– Mark Baker, The Distributed Systems Group, University of Portsmouth, http://dsg.port.ac.uk/mab
• Talks at 3rd EGEE conference by
– Kyriakos Baxevanidis, Deputy Head, Unit of Research Infrastructures, European Commission, DG INFSO
– Dr Spyros Konidaris, European Commission – DG INFSO

Goals of this module
• To introduce the concepts of Grid computing assuming
no previous knowledge

Contents
• “The Grid” vision
• What is “a grid”?
• Drivers of grid computing
• Current status of grids
• The basis: authentication, authorisation, security

The Grid Metaphor
[Figure: “Grid middleware” at the centre, connecting: mobile access; workstation; visualising; supercomputer, PC-cluster; data-storage, sensors, experiments; Internet, networks]

The grid vision
• The grid vision is of “Virtual
computing” (+ information
services to locate computation,
storage resources)
– Compare: The web: “virtual
documents” (+ search engine
to locate them)
• MOTIVATION: collaboration
through sharing resources
(and expertise) to expand
horizons of
– Research
– Commerce – engineering, …
“the knowledge economy”
– Public service – health,
environment,…

Contents
• “The Grid” vision
• What is “a grid” ?

“A grid”
• The initial vision: “The Grid”
• The present reality: Many
“grids”
• Each grid is an infrastructure
enabling one or more “virtual
organisations” to share
computing resources
• What’s a VO?
– People in different
organisations seeking to
cooperate and share
resources across their
organisational boundaries
• Why establish a Grid?
– Share data
– Pool computers
– Collaborate
[Figure: a VO spanning Institute A, Institute B, Institute C and Institute D]

The Single Computer
• The Operating System enables easy use of
– Input devices
– Processor
– Disks
– Display
– Any other attached devices
[Figure: layers, top to bottom: Application Software; Operating System; Disks, Processor, Memory, …]

Resources on a Local Area Network
User just perceives “shared resources”, with no regard to location in the organisation:
- Authenticated by username / password
- Authorised to use own files,…
[Figure: layers, top to bottom: Application Software; Middleware for sharing computers, servers, printers, …; Operating System on each computer; Resources connected by a LAN]

Resources on a grid
[Figure: layers, top to bottom: Application Software; Interface between app. and grid; Grid Middleware: “collective services”; Grid Middleware on each resource; Operating System on each resource; Resources connected by internet]

A grid
• Grid middleware runs on each shared resource
– Data storage
– (Usually) batch jobs on pools of processors
• Users join VO’s
• Virtual organisation negotiates with sites to agree access to resources
• Distributed services (both people and middleware) enable the grid
[Figure label: INTERNET]

What characterises a grid?
• Co-ordinated resource sharing
– No centralised point of control
– Different administrative domains.
• Standard, open, general-purpose protocols and
interfaces
– NOT specific to an application
– EGEE, NGS support multiple VO’s
• Delivering non-trivial qualities of service
– Co-ordinated to deliver combined services,
greater than sum of the individual components
• http://www.gridtoday.com/02/0722/100136.html

The components of a Grid
• Resources
– networking, computers, storage, data, instruments, …
• Grid Middleware
– the “operating system of the grid”
• Operations infrastructure
– Run enabling services (people + software)
• Virtual Organization management
– Procedures for gaining access to resources

Key concepts
• Virtual organisation: people and resources
collaborating - across admin, organisational
boundaries
• Single sign-on
– I connect to one machine – some sort of “digital credential” is passed on to any other resource I use, basis of:
▪ Authentication: How do I identify myself to a resource without username/password for each resource I use?
▪ Authorisation: what can I do? Determined by
• My membership of VO
• VO negotiations with resource providers
• Grid middleware runs on each resource
• User just perceives “shared resources” with no
concern for location or owning organisation
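The single sign-on idea above can be illustrated with a minimal Python sketch. It assumes a credential that carries the user's identity and VO memberships, and a resource that records which actions each VO has negotiated with its provider. The class names, VO names and action names are hypothetical and are not taken from any grid middleware.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Credential:
    """Hypothetical digital credential presented once and passed on to each resource."""
    subject: str          # who the user is (authentication)
    vos: FrozenSet[str]   # virtual organisations the user belongs to


@dataclass
class Resource:
    """Hypothetical shared resource with per-VO rights agreed by its provider."""
    name: str
    vo_rights: Dict[str, Set[str]]

    def allowed_actions(self, cred: Credential) -> Set[str]:
        # Authorisation follows from VO membership plus what each VO negotiated.
        actions: Set[str] = set()
        for vo in cred.vos:
            actions |= self.vo_rights.get(vo, set())
        return actions

    def authorise(self, cred: Credential, action: str) -> bool:
        return action in self.allowed_actions(cred)


# Usage: the same credential is shown to every resource; no per-resource password.
alice = Credential("alice", frozenset({"biomed"}))
cluster = Resource("cluster", {"biomed": {"submit_job"},
                               "atlas": {"submit_job", "store_data"}})
print(cluster.authorise(alice, "submit_job"))   # biomed negotiated job submission
print(cluster.authorise(alice, "store_data"))   # only atlas negotiated storage
```

The design choice worth noting is that the resource never stores per-user rights: it only knows what each VO is allowed to do, which is what lets providers trust a VO rather than individual users.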

Contents
• “The Grid” vision
• What is “a grid” ?
• Drivers of grid computing

The first driver: e-Science
• What is e-Science?
Collaborative science that is made possible by the
sharing across the Internet of resources (data,
instruments, computation, people’s expertise...)
– Often very compute intensive
– Often very data intensive (both creating new data and accessing
very large data collections) – data deluges from new
technologies
– Crosses organisational boundaries
• Examples….

Astronomy
No. & sizes of data sets as of mid-2002,
grouped by wavelength
• 12 waveband coverage of large
areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins University
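To make the growth figures above concrete, here is a small Python sketch of what "about 200 TB, doubling every 12 months" implies. It is an extrapolation only: it assumes the doubling rate stays constant after mid-2002, which the slide does not claim, and the function and parameter names are illustrative.

```python
def projected_volume_tb(years_after_2002: float,
                        start_tb: float = 200.0,
                        doubling_months: float = 12.0) -> float:
    """Data volume in TB, assuming steady doubling from the mid-2002 total."""
    doublings = years_after_2002 * 12.0 / doubling_months
    return start_tb * 2 ** doublings


for years in (0, 1, 3, 5):
    print(years, "years:", projected_volume_tb(years), "TB")
```

Under that assumption the total passes a petabyte within three years (1600 TB) and reaches 6400 TB after five.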

Large Hadron Collider at CERN
• Data Challenge:
– 10 Petabytes/year of data !!!
– 20 million CDs each year!
• Simulation, reconstruction,
analysis:
– LHC data handling requires computing
power equivalent to ~100,000 of today's
fastest PC processors!
• Operational challenges
– Reliable and scalable through project lifetime of decades
[Figure labels: Mont Blanc (4810 m); Downtown Geneva]
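The "20 million CDs" comparison in the data challenge can be checked with a few lines of Python. This sketch uses decimal units (1 PB = 10^15 bytes) and, for the second calculation, assumes a nominal 700 MB disc; neither assumption is stated on the slide.

```python
PETABYTE = 10**15
MEGABYTE = 10**6


def implied_cd_size_mb(bytes_per_year: int, cds_per_year: int) -> float:
    """Disc capacity implied by the slide's two figures."""
    return bytes_per_year / cds_per_year / MEGABYTE


def cds_needed(bytes_per_year: int, cd_capacity_mb: float) -> float:
    """Number of discs per year for an assumed disc capacity."""
    return bytes_per_year / (cd_capacity_mb * MEGABYTE)


yearly = 10 * PETABYTE
print(implied_cd_size_mb(yearly, 20_000_000))  # 500.0 MB per disc
print(round(cds_needed(yearly, 700)))          # about 14 million discs
```

The slide's figure therefore corresponds to roughly 500 MB per disc, so it is best read as an order-of-magnitude illustration rather than an exact count.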

BLAST gridification
[Figure: labels recovered from the diagram: Input file (Seq1, Seq2, …, Seqn); UI; Computing element (two shown); BLAST; DB; RESULT]
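One plausible reading of the BLAST gridification diagram is a split, search and merge pattern: the input sequences are divided among computing elements, each searches its share against the database, and the partial outputs are combined into one result. The Python sketch below illustrates that pattern only. It assumes FASTA-style input (a ">" header line per sequence), and `fake_search` is a hypothetical stand-in that does plain substring matching; it is not BLAST and does not call any grid middleware.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (sequence name, residues)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-style text into (name, sequence) pairs."""
    records: List[Record] = []
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:].strip(), []
        elif name is not None:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))
    return records


def split_round_robin(records: List[Record], n_jobs: int) -> List[List[Record]]:
    """Divide the sequences into at most n_jobs non-empty chunks, one per job."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    chunks = [records[i::n_jobs] for i in range(n_jobs)]
    return [chunk for chunk in chunks if chunk]


def fake_search(chunk: List[Record], database: List[Record]) -> Dict[str, List[str]]:
    """Stand-in for the per-job search: names of database entries containing each query."""
    return {name: [db_name for db_name, db_seq in database if seq in db_seq]
            for name, seq in chunk}


def merge(results: Iterable[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Combine the per-job outputs into a single result."""
    merged: Dict[str, List[str]] = {}
    for partial in results:
        merged.update(partial)
    return merged


queries = parse_fasta(">Seq1\nACGT\n>Seq2\nGGCC\nTT\n>Seq3\nTTAA\n")
database = [("dbA", "AACGTT"), ("dbB", "TTAACC")]
jobs = split_round_robin(queries, 2)  # each chunk would go to one computing element
print(merge(fake_search(chunk, database) for chunk in jobs))
```

Because each query is searched independently, the chunks need no communication with each other, which is what makes this kind of workload a natural fit for batch jobs on a grid.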

DAME: Grid based tools and Infrastructure for Aero-Engine Diagnosis and Prognosis
• “A Significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DS&S. Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed”, DS&S 2004
Companies: Rolls-Royce, DS&S, Cybula
Universities: York, Leeds, Sheffield, Oxford
[Figure labels: Engine flight data; London Airport; Airline office; New York Airport; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center; XTO; Engine Model; Case Based Reasoning]

Academic drivers: not only e-science!!
The impact of grids when they support…
Curation, discovery, reuse of knowledge
e-Research
e-Science

Academic drivers
Derived from a slide by
the UK’s JISC
• E-research
• Digital libraries
• Centrality of
curation,
preservation
• Under-recognised by
many researchers
• Virtual Digital Data
Libraries needed for
research as well as
learning
• E-learning
• AAA Services
• e-Infrastructure

Political drivers
• Entering the “knowledge society” from the “industrial society”
– industrial society: also enabled by communications infrastructure
• Lisbon strategy: Research and Innovation will be the most
important factors in determining Europe’s success through the
next decades
• THE GOAL: “UNLEASH CREATIVITY”- by investment in
– Human skills
– Infrastructures
• Growth of e-infrastructure (= networks + grid + operations)
– phase 1: mainly academia, some in industry: “an elite, privileged to do
this job”
– phase 2: ordinary people doing distributed work; SMEs, adopt, adapt
and use
– phase 3: the next generations
▪ will transform e-infrastructure and its uses
▪ We don’t know how others will use what we devise

EGEE – building e-infrastructure
EGEE is building a large-scale
production grid service to:
• Underpin research,
technology and public service
• Link with and build on
national, regional and
international initiatives
• Foster international cooperation both in the creation and the use of the e-infrastructure
[Figure labels: Pan-European Grid; Operations, Support and training; Collaboration; Network infrastructure & Resource centres]

Contents
• “The Grid” vision
• What is “a grid”?
• Drivers of grid computing
• Some examples
• Current status of grids

If “The Grid” vision leads us here…
… then where are we now?

Grid projects
Many Grid development efforts — all over the world
• UK – OGSA-DAI, RealityGrid, GeoDise, Comb-e-Chem, DiscoveryNet, DAME, AstroGrid, GridPP, MyGrid, GOLD, eDiamond, Integrative Biology, …
• Netherlands – VLAM, PolderGrid
• Germany – UNICORE, Grid proposal
• France – Grid funding approved
• Italy – INFN Grid
• Eire – Grid proposals
• Switzerland – Network/Grid proposal
• Hungary – DemoGrid, Grid proposal
• Norway, Sweden – NorduGrid
• NASA Information Power Grid
• DOE Science Grid
• NSF National Virtual Observatory
• NSF GriPhyN
• DOE Particle Physics Data Grid
• NSF TeraGrid
• DOE ASCI Grid
• DOE Earth Systems Grid
• DARPA CoABS Grid
• NEESGrid
• DOH BIRN
• NSF iVDGL
• DataGrid (CERN, ...)
• EuroGrid (Unicore)
• DataTag (CERN,…)
• Astrophysical Virtual Observatory
• GRIP (Globus/Unicore)
• GRIA (Industrial applications)
• GridLab (Cactus Toolkit)
• CrossGrid (Infrastructure Components)
• EGSO (Solar Physics)

Grids: where are we now?
• Many key concepts identified and known
• Many grid projects have tested, and benefit from, these
• Major efforts now on establishing:
– Standards (a slow process)
(e.g. Global Grid Forum, http://www.gridforum.org/ )
– Production Grids for multiple VO’s
▪ “Production” = Reliable, sustainable, with commitments to quality of service
• In Europe, EGEE
• In UK, National Grid Service
• In US, TeraGrid
▪ One stack of middleware that serves many research (and other!!!) communities
▪ Operational procedures and services (people!, policy,..)
– New user communities
• … whilst research & development continues

The key for new VO’s
[Figure: layers, top to bottom: Application; Application toolkits, standards; Middleware: “collective services”; Basic Grid services: AA, job submission, info, …]
• The tools, services used
by the VO’s applications
• Application development
environment, portals,
semantics
• Insulate applications
from changing
middleware

The vision of 2001: convergence of Web Services and Grids
[Figure labels: Open Grid Services Architecture; Web services; World-wide web; OGSI; Grid prototypes; High-end computing; High-throughput computing; Massively parallel computing; INTERNET]

Contents
• “The Grid” vision
• What is “a grid”?
• Drivers of grid computing
• Current status of grids
• The basis: authentication, authorisation, security

Grid security and trust -1
• Providers of resources (computers, databases,..) need risks to
be controlled: they are asked to trust users they do not know
– They trust a VO
– The VO trusts its users
• Users need
– single sign-on: to be able to log on to a machine that can pass the user’s identity to other resources
– To trust owners of the resources they are using
• Build middleware on layer providing:
– Authentication: who wants to use/provide resource
– Authorisation: what the user is allowed to do
– Security: reduce vulnerability, e.g. from outside the firewall
– Non-repudiation: knowing who did what
• Digital credentials and the “Grid Security Infrastructure”
middleware are the basis of production grids

Grid security and trust -2
• Currently, achieved by Certification:
– User’s identity has to be certified by one of the national Certification Authorities (CAs)
▪ CAs are mutually recognized (see http://www.gridpma.org/); for the EU, go via there to http://marianne.in2p3.fr/datagrid/ca/catable-ca.html to find your CA
• E.g. in the UK go to http://www.grid-support.ac.uk/ca/ralist.htm
– Resources are also certified by CAs
• User
– User joins a VO
– Digital certificate is basis of AA
– Identity passed to other resources you use, where it is
mapped to a local account – the mapping is maintained by
the VO
• Common agreed policies establish rights for a
Virtual Organization to use resources
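The flow described above (a certified identity, passed to a resource and mapped to a local account) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a certificate is reduced to a subject and an issuer name, trust in a CA is a set-membership check rather than cryptographic verification, and all subject names, CA names and account names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Certificate:
    """Hypothetical, heavily simplified stand-in for a user's digital certificate."""
    subject: str   # the identity being certified
    issuer: str    # the Certification Authority that certified it


def local_account_for(cert: Certificate,
                      trusted_cas: Set[str],
                      account_map: Dict[str, str]) -> Optional[str]:
    """Return the local account for a certified identity, or None if access is refused."""
    if cert.issuer not in trusted_cas:
        return None                       # identity not certified by a recognised CA
    return account_map.get(cert.subject)  # None if no mapping exists for this user


trusted = {"Example UK CA"}
mapping = {"/C=UK/O=Example/CN=alice": "biomed001"}
alice = Certificate("/C=UK/O=Example/CN=alice", "Example UK CA")
print(local_account_for(alice, trusted, mapping))
```

The user never needs a password on the resource: the certificate establishes who they are, and the mapping decides which local account their jobs run under.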

Grid security and trust -3
• Certification and GSI provide
– Authentication
▪ Resource can trust user
▪ User can trust the resource provider
▪ … so long as certificates are protected: they are your grid identity
– A basis for Authorisation
▪ so a VO can manage access to resources
▪ Resource providers trust the VO
▪ The VO trusts the user
– Mechanism for checking message integrity
▪ Messages are passed between machines
▪ Public/private key pairs support message-integrity checking as well as authentication
• Not (usually) encrypted but message-integrity is checked
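The idea of checking message integrity with a public/private key pair, without encrypting the message, can be shown with a toy digital signature in Python. This is a teaching sketch only: it uses unpadded textbook RSA with a small key built from two known Mersenne primes, which is not secure, and it is not how GSI itself is implemented. All names are illustrative.

```python
import hashlib

# Toy key pair from two known Mersenne primes. Far too weak for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: sign the digest with the private key. The message itself stays readable."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: recompute the digest and compare it with the signature opened by the public key."""
    return pow(signature, E, N) == digest(message)


message = b"job 42: run analysis"
signature = sign(message)
print(verify(message, signature))                  # unchanged message is accepted
print(verify(b"job 42: delete data", signature))   # altered message is rejected
```

The message travels in the clear alongside its signature, so anyone can read it, but any change to it in transit makes verification fail.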

Summary of grid computing concepts
• Flexible collaboration across multiple administrative
domains – sharing data, computers, instruments,
application software,..
• Single sign-on to resources in multiple organisations
– Authorisation, authentication
• Need for people-services as well as middleware
services
– credential authorities, VO managers, support
• The drive is towards
– Production services (reliable, sustainable,… – against which research projects can plan with confidence)
▪ In Europe, EGEE
▪ In UK, National Grid Service
– Standards
– Empowering new user communities