What are grid computing and e-Science? Mike Mineter

advertisement
Enabling Grids for E-sciencE
What are grid computing and
e-Science?
Mike Mineter
mjm@nesc.ac.uk
www.eu-egee.org
INFSO-RI-508833
Acknowledgements
Enabling Grids for E-sciencE
• This talk was prepared by Mike Mineter of NeSC and
includes slides from previous tutorials and talks
delivered by:
–
–
–
–
–
Dave Berry, Richard Hopkins (National e-Science Centre)
the EDG training team
Ian Foster, Argonne National Laboratories
Jeffrey Grethe, SDSC
EGEE colleagues
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
3
Goals of this module
Enabling Grids for E-sciencE
• To introduce the concepts of e-Science and Grid
computing assuming no previous knowledge
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
4
Contents
Enabling Grids for E-sciencE
•
•
•
•
•
•
“The Grid” vision
What is “a grid” ?
Drivers of grid computing
Some examples
Current status of grids
Are grids for you?!
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
5
The Grid Metaphor
Enabling Grids for E-sciencE
Mobile Access
G
R
I
D
Workstation
M
I
D
D
L
E
W
A
R
E
Supercomputer, PC-Cluster
Data-storage, Sensors, Experiments
Visualising
Internet, networks
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
6
The grid vision
Enabling Grids for E-sciencE
• The grid vision is of “Virtual computing” (+ information
services to locate computation, storage resources)
– Compare: The web: “virtual documents” (+ search engine to
locate them)
• MOTIVATION: collaboration through sharing resources
(and expertise) to expand horizons of
– Research
– Commerce – engineering, … “the knowledge economy”
– Public service – health, environment,…
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
7
Before Grids
Enabling Grids for E-sciencE
Researchers in many
locations need to share
resources
FTP, telnet, blood, sweat and tears…
and little support for collaboration
Scientific instruments, data
stores and computers in
many locations
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
8
The Grid Vision
Enabling Grids for E-sciencE
Researchers in many
locations need to share
resources
Resources connect to “The
Grid”
Scientific instruments, data
stores and computers in
many locations
INFSO-RI-508833
What are Grid
Computing
and e-Science?
March tutorials
2005, NeSC
Slide
derived
from EDG 29
/ LCG
9
Grid projects
Enabling Grids for E-sciencE
Many Grid development efforts — all over the world
•UK – OGSA-DAI, RealityGrid, GeoDise,
•NASA Information Power Grid
Comb-e-Chem, DiscoveryNet, DAME,
•DOE Science Grid
AstroGrid, GridPP, MyGrid, GOLD,
eDiamond, Integrative Biology, …
•NSF National Virtual Observatory
•Netherlands – VLAM, PolderGrid
•NSF GriPhyN
•Germany – UNICORE, Grid proposal
•DOE Particle Physics Data Grid
•France – Grid funding approved
•NSF TeraGrid
•Italy – INFN Grid
•DOE ASCI Grid
•Eire – Grid proposals
•DOE Earth Systems Grid
•Switzerland - Network/Grid proposal
•DARPA CoABS Grid
•DataGrid (CERN, ...)
•Hungary – DemoGrid, Grid proposal
•NEESGrid
•EuroGrid (Unicore)
•Norway, Sweden - NorduGrid
•DataTag (CERN,…)
•DOH BIRN
•Astrophysical Virtual Observatory
•NSF iVDGL
•GRIP (Globus/Unicore)
•GRIA (Industrial applications)
•GridLab (Cactus Toolkit)
•CrossGrid (Infrastructure Components)
•EGSO (Solar Physics)
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
10
Contents
Enabling Grids for E-sciencE
•
•
•
•
•
•
“The Grid” vision
What is “a grid” ?
Drivers of grid computing
Some examples
Current status of grids
Are grids for you?!
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
11
“A grid”
Enabling Grids for E-sciencE
• The initial vision: “The Grid”
• The present reality: Many
“grids”
• Each grid is an infrastructure
enabling one or more “virtual
organisations” to share
computing resources
• What’s a VO?
– People in different
organisations seeking to
cooperate and share
resources across their
organisational boundaries
VO
Institute
Desktop
• Why establish a Grid?
– Share data
– Pool computers
– Collaborate
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
12
A computer
Enabling Grids for E-sciencE
• The Operating System
enables easy use of
–
–
–
–
Input devices
Processor
Disks
Display
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
13
An institute’s resources on a LAN
Enabling Grids for E-sciencE
Middleware runs on each
computer:
• To allow sharing of disks and
printers (using, e.g. Samba)
• To share processors for
computation (e.g. Condor)
• User just perceives “shared
resources”, with no regard to
location in the building:
– Authenticated by
username / password
– Authorised to use own files,…
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
14
Typical current grid
Enabling Grids for E-sciencE
• Grid middleware
runs on each
shared resource
– Data storage
– (Usually) batch
jobs on pools of
processors
• Users join VO’s
• Virtual organisation
negotiates with
sites to agree
access to resources
INTERNET
• Distributed services
(both people and
middleware) enable
the grid, allow
single sign-on
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
15
What is a Grid?
Enabling Grids for E-sciencE
• “An infrastructure that enables flexible, secure,
coordinated resource sharing among dynamic collections
of individuals, institutions and resources”
Ian Foster and Carl Kesselman
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
16
Key concepts
Enabling Grids for E-sciencE
• Virtual organisation: people and resources
collaborating - crosses admin, organisational
boundaries
• Single sign-on
– I connect to one machine – some sort of “digital credential” is
passed on to any other resource I use
– Authentication: How do I identify myself to a resource without
username/password for each resource I use?
– Authorisation: what can I do? Determined by
 My role in a VO (role-based in near future)
 VO negotiations with resource providers
• Grid middleware runs on each resource
• User just perceives “shared resources”, with no
awareness of location or owning organisation
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
17
What is Grid computing not?
Enabling Grids for E-sciencE
• “Grid computing” is a trendy phrase!
• It’s therefore also a misused term!
– Sometimes in Industry : “Grids” = clusters
 Motivations: better use of resources; scope for commercial services
– Also used to refer to the harvesting of unused compute cycles,
e.g.
 SETI@home, Climateprediction.net
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
18
Contents
Enabling Grids for E-sciencE
•
•
•
•
•
•
“The Grid” vision
What is “a grid” ?
Drivers of grid computing
Some examples
Current status of grids
Are grids for you?!
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
19
The first driver: e-Science
Enabling Grids for E-sciencE
• What is e-Science?
Collaborative science that is made possible by the
sharing across the Internet of resources (data,
instruments, computation, people’s expertise...)
– Often very compute intensive
– Often very data intensive (both creating new data and accessing
very large data collections) – data deluges from new
technologies
– Crosses organisational boundaries
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
20
Other major drivers for grids
Enabling Grids for E-sciencE
e-Research – not just e-Science
– Also curation and digital data
libraries (DILIGENT, DELOS,
GRACE)
•
Commerce – not just academia!!
•
Politics: the “knowledge
economy”
•
“e-Infrastructure”: A shared
resource
Collaboration
Grid
– That enables science, research,
engineering, medicine, industry,
…
– It will improve UK / European / …
productivity
Operations, Support and
training
•
Network infrastructure
linking resource centres
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
21
Contents
Enabling Grids for E-sciencE
•
•
•
•
•
•
“The Grid” vision
What is “a grid” ?
Drivers of grid computing
Some examples
Current status of grids
Are grids for you?!
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
22
Astronomy
Enabling Grids for E-sciencE
No. & sizes of data sets as of mid-2002,
grouped by wavelength
• 12 waveband coverage of large
areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
INFSO-RI-508833
Data and images courtesy Alex Szalay, John Hopkins University
What are Grid Computing and e-Science?
29 March 2005, NeSC
23
Earth Observation
Enabling Grids for E-sciencE
ESA missions:
• 100’s of Gbytes of data per day
Grid contribution to EO:
• Enhance the ability to access high level
products
• Allow reprocessing of large historical
archives
• Improve Earth science complex applications
(data fusion, data mining, modelling …)
Federico.Carminati , EU review presentation, 1 March 2002
INFSO-RI-508833
Derived from: L. Fusco, June 2001
What are Grid Computing and e-Science?
29 March 2005, NeSC
24
Enabling Grids for E-sciencE
DAME: Grid based tools and Inferstructure for Aero-Engine Diagnosis
and Prognosis
Engine flight data
London Airport
Airline
office
New York Airport
•“A Significant factor in the success of the Rolls-Royce
campaign to power the Boeing 7E7 with the Trent 1000
was the emphasis on the new aftermarket support service
for the engines provided via DS&S. Boeing personnel
were shown DAME as an example of the new ways of
gathering and processing the large amounts of data that
could be retrieved from an advanced aircraft such as the
7E7, and they were very impressed”, DS&S 2004
Grid
Diagnostics Centre
Maintenance Centre
American data center
European data center
XTO
Companies:
Rolls-Royce
DS&S
Cybula
INFSO-RI-508833
Universities:
York,
Leeds,
Sheffield, Oxford
Engine Model
Case Based Reasoning
What are Grid Computing
andData
e-Science?
Signal
Explorer 29 March 2005, NeSC
25
Large Hadron Collider at CERN
Enabling Grids for E-sciencE
• Data Challenge:
– 10 Petabytes/year of data !!!
– 20 million CDs each year!
• Simulation, reconstruction,
analysis:
– LHC data handling requires computing
power equivalent to ~100,000 of today's
fastest PC processors!
• Operational challenges
– Reliable and scalable through project
lifetime of decades
INFSO-RI-508833
What are Grid Computing and e-Science?
Mont Blanc
(4810 m)
Downtown Geneva
29 March 2005, NeSC
26
Contents
Enabling Grids for E-sciencE
•
•
•
•
•
•
“The Grid” vision
What is “a grid” ?
Drivers of grid computing
Some examples
Current status of grids
Are grids for you?!
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
27
Enabling Grids for E-sciencE
If “The Grid”
vision leads us
here…
… then where are
we now?
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
28
Current status
Enabling Grids for E-sciencE
• Many key concepts identified and known
• Many grid projects have tested these
• Major efforts now on establishing:
– Standards (a slow process)
(Global Grid Forum, http://www.gridforum.org/ )
– Production Grids for multiple VO’s
 “Production” = Reliable, sustainable, with commitments to quality of
service
– New user communities
– … whilst research & development continues
• In Europe, EGEE
• In UK, NGS
• In US, Teragrid
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
29
1997- Present: Globus
Enabling Grids for E-sciencE
• A software toolkit addressing certain technical
problems in the development of Grid enabled tools,
services, and applications
– Offers a modular “bag of technologies”
– Made available under liberal open source license
• Not turnkey solutions, but building blocks and tools for
application developers and system integrators
• Tools built on Grid Security Infrstructure to include:
–
–
–
–
Job submission (GRAM) : run a job on a remote computer
Information services: So I know which computer to use
File transfer (GridFTP): so large data files can be transferred
Replica management: so I can have multiple versions of a file
“close” to the computers where I want to run jobs
• Production grids are (currently) based on release 2
• http://www.globus.org/
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
30
Computing Resources: Feb 2005
Enabling Grids for E-sciencE
Country providing resources
Country anticipating joining
In LCG-2:
 113 sites, 30 countries
 >10,000 cpu
 ~5 PB storage
Includes non-EGEE sites:
• 9 countries
• 18 sites
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
31
Grid security and trust -1
Enabling Grids for E-sciencE
• Providers of resources (computers, databases,..)
need risks to be controlled: they are asked to trust
users they do not know
• User’s need single sign-on: logon to a machine that
can pass the user’s identity to other resources
• Build middleware on layer providing:
– Authentication: know who wants to use resource
– Authorisation: know what the user is allowed to do
– Security: reduce vulnerability, e.g. from outside the firewall
– Non-repudiation: knowing who did what
• GSI from the Globus toolkit does this for NGS
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
32
Grid security and trust -2
Enabling Grids for E-sciencE
• Currently, achieved by Certification:
– User’s identity has to be certified by one of the national
Certification Authorities (CAs)
 mutually recognized http://www.gridpma.org/
 In UK go to http://www.grid-support.ac.uk/ca/ralist.htm
– Resources (node machines) have to be certified by CAs
– Digital certificate installed on the machine accessed by user:
basis of AA
– User joins a VO
– Identity passed to other resources you use, where it is
mapped to a local account – the mapping is maintained by
the VO
• Common agreed policies establish rights for a
Virtual Organization to use resources
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
33
The key for new VO’s
Enabling Grids for E-sciencE
Application
Application
toolkits, standards
Middleware:
“collective services”
Basic Grid services:
AA, job submission, info, …
INFSO-RI-508833
• Application development
environment, portals
• Insulate applications
from changing
middleware
What are Grid Computing and e-Science?
29 March 2005, NeSC
34
Contents
Enabling Grids for E-sciencE
•
•
•
•
•
•
Definitions of e-Science and “a grid”
Exploring the definitions
Why now?!
Some examples
Current status of grids
Are grids for you?!
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
35
Are Grids for you?!
Enabling Grids for E-sciencE
• IF a community effort is vital to achieving goals, by
sharing services of data and computation,
• AND that effort crosses organisation boundaries
• THEN yes!
•
In the UK, plan to join the NGS!
• OR if you wish to use computation/storage/data
services provided on a Grid – then YES!
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
36
Summary
Enabling Grids for E-sciencE
• Collaboration across multiple organisations
• Single sign-on to resources in multiple organisations
• Need for people-services as well as middleware
services to enable this: e.g. to run
–
–
–
–
Enabling services (e.g. info service)
Certification authority for AA
VO management – to negotiate with sites
Helpdesk, …
• Drives are towards
– Production services
 In the UK, the NGS
 In Europe, EGEE
– Standards – (tomorrow)
– “e-Infrastructure” ~ integration of networking and middleware to
support collaboration
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
37
Additional slides
Enabling Grids for E-sciencE
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
38
Exponential Growth
Performance per Dollar Spent
Enabling Grids for E-sciencE
Optical Fibre
Doubling Time
9 12
Gilder’s Law
(32X in 4 yrs)
(bits per second)
(months)
18
Data Storage
Storage Law
(16X in 4yrs)
(bits per sq. inch)
Chip capacity
(# transistors)
0
1
2
Moore’s Law
(5X in 4yrs)
3
4
5
Number of Years
Triumph of Light – Scientific American. George Stix, January 2001
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
39
How Different 2005 is from 1995
Enabling Grids for E-sciencE
• Enormous quantities of data: Petabytes
– For an increasing number of communities
– Constraint is not collection but analysis
• Ubiquitous Internet:
– >100 million hosts
– Security and Trust are crucial issues
• Ultra-high-speed networks: >10 Gb/s
– Global optical networks
– Bottlenecks: last kilometre & firewalls
• Huge quantities of computing: >100 Top/s
– Moore’s law gives us all supercomputers
– Organising their effective use is the challenge
• Moore’s law everywhere
– Instruments, detectors, sensors, scanners, …
– Organising their effective use is the challenge
INFSO-RI-508833
What are Grid Computing and e-Science?
29 March 2005, NeSC
40
Download