What are grid computing and e-Science? Mike Mineter

Enabling Grids for E-sciencE
What are grid computing and e-Science?
Mike Mineter
mjm@nesc.ac.uk
www.eu-egee.org
INFSO-RI-508833
Acknowledgements
• This talk was prepared by Mike Mineter of NeSC and includes slides from previous tutorials and talks delivered by:
– Dave Berry, Richard Hopkins (National e-Science Centre)
– the EDG training team
– Ian Foster, Argonne National Laboratory
– Jeffrey Grethe, SDSC
– EGEE colleagues
INFSO-RI-508833
What are Grid Computing and e-Science?
10 March 2005, NeSC
3
Goals of this module
• To introduce the concepts of e-Science and Grid computing, assuming no previous knowledge

Contents
• Definitions of e-Science and “a grid”
• Exploring the definitions
• Why now?!
• Some examples
• Current status of grids
• Are grids for you?!

What is….
• What is e-Science?
Collaborative science that is made possible by the sharing across the Internet of resources (data, instruments, computation, people’s expertise...)
– Often very compute intensive
– Often very data intensive (both creating new data and accessing very large data collections)
– Crosses organisational boundaries
• What is a Grid?
“An infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources”
Ian Foster and Carl Kesselman
UK e-Science Programme: http://www.rcuk.ac.uk/escience/

The Grid Metaphor
[Figure labels: Mobile Access; Workstation; Visualising; GRID MIDDLEWARE; Supercomputer, PC-Cluster; Data-storage, Sensors, Experiments; Internet, networks]

What is Grid computing?
• “A grid by any other name”, The Economist print edition, Dec 2nd 2004
– “The next big thing in computing”
– “No-one can agree what it is”
▪ Sometimes in industry: “Grids” = clusters (motivations: better use of resources; scope for commercial services)
▪ Also used to refer to the harvesting of unused compute cycles (SETI@home, Climateprediction.net)
• In e-Research:
– The grid vision is of “Virtual computing” (+ information services and brokers to locate computation, storage resources)
▪ Cf: The web: “virtual documents” (+ search engine to locate them)
– MOTIVATION: collaboration through sharing resources (and expertise) to expand horizons of research (and knowledge curation, discovery, and education)

Before Grids
Researchers in many locations need to share resources
FTP, telnet, blood, sweat and tears… and little support for collaboration
Scientific instruments, data stores and computers in many locations

The Grid Vision
Researchers in many locations need to share resources
Resources connect to “The Grid”
Scientific instruments, data stores and computers in many locations
Slide derived from EDG / LCG tutorials

A computer
• The Operating System enables easy use of
– Input devices
– Processor
– Disks
– Display

Resources on a LAN
Middleware runs on each computer:
• To allow sharing of disks and printers (using, e.g., Samba)
• To share processors for computation (e.g. Condor)
• User just perceives “shared resources”, with no regard to location in the building:
– Authenticated by username / password
– Authorised to use own files, …

Grid vision: Resources on the Internet
• Grid middleware runs on each collaborating resource
• Controlled by services of
– Authentication
– Authorisation
• User just perceives “shared resources”, with no regard to location or owning organisation
• Single sign-on
[Figure label: INTERNET]

Typical current grid
• Grid middleware runs on each shared resource
– Data storage
– (Usually) batch jobs on pools of processors
• Users join VOs (virtual organisations)
• Virtual organisation negotiates with sites to agree access to resources
• Distributed services (both people and middleware) enable the grid, allow single sign-on
[Figure label: INTERNET]

Grid projects
Many Grid development efforts, all over the world

•UK – OGSA-DAI, RealityGrid, GeoDise, Comb-e-Chem, DiscoveryNet, DAME, AstroGrid, GridPP, MyGrid, GOLD, eDiamond, Integrative Biology, …
•Netherlands – VLAM, PolderGrid
•Germany – UNICORE, Grid proposal
•France – Grid funding approved
•Italy – INFN Grid
•Eire – Grid proposals
•Switzerland – Network/Grid proposal
•Hungary – DemoGrid, Grid proposal
•Norway, Sweden – NorduGrid

•NASA Information Power Grid
•DOE Science Grid
•NSF National Virtual Observatory
•NSF GriPhyN
•DOE Particle Physics Data Grid
•NSF TeraGrid
•DOE ASCI Grid
•DOE Earth Systems Grid
•DARPA CoABS Grid
•NEESGrid
•DOH BIRN
•NSF iVDGL

•DataGrid (CERN, ...)
•EuroGrid (Unicore)
•DataTag (CERN, …)
•Astrophysical Virtual Observatory
•GRIP (Globus/Unicore)
•GRIA (Industrial applications)
•GridLab (Cactus Toolkit)
•CrossGrid (Infrastructure Components)
•EGSO (Solar Physics)

Exponential Growth
[Figure: performance per dollar spent against number of years (0 to 5), with doubling times in months]
• Optical fibre (bits per second): doubling time 9 months, Gilder’s Law (32X in 4 yrs)
• Data storage (bits per sq. inch): doubling time 12 months, Storage Law (16X in 4 yrs)
• Chip capacity (# transistors): doubling time 18 months, Moore’s Law (5X in 4 yrs)
Triumph of Light – Scientific American. George Stix, January 2001

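The relation between a doubling time and growth over a fixed period can be sketched in a few lines of Python. This is a minimal illustration, assuming pure exponential growth and assuming the doubling times of 9, 12 and 18 months belong to optical fibre, data storage and chip capacity respectively; the function and dictionary names are hypothetical and not part of the talk.

```python
def growth_factor(doubling_months: float, years: float) -> float:
    """Growth multiple after `years`, assuming one doubling every `doubling_months`."""
    return 2 ** (12 * years / doubling_months)

# Assumed mapping of the chart's doubling times (in months) to technologies.
DOUBLING_MONTHS = {
    "optical fibre": 9,
    "data storage": 12,
    "chip capacity": 18,
}

for name, months in DOUBLING_MONTHS.items():
    print(f"{name}: about {growth_factor(months, 4):.1f}X in 4 years")
```

Under this simple formula only the 12-month case reproduces the chart's label exactly (16X). The 9-month and 18-month cases come out at roughly 40X and 6X, so the chart's 32X and 5X were presumably rounded or derived slightly differently.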
How Different 2005 is from 1995
• Enormous quantities of data: Petabytes
– For an increasing number of communities
– Constraint is not collection but analysis
• Ubiquitous Internet:
– >100 million hosts
– Security and Trust are crucial issues
• Ultra-high-speed networks: >10 Gb/s
– Global optical networks
– Bottlenecks: last kilometre & firewalls
• Huge quantities of computing: >100 Top/s
– Moore’s law gives us all supercomputers
– Organising their effective use is the challenge
• Moore’s law everywhere
– Instruments, detectors, sensors, scanners, …
– Organising their effective use is the challenge
Global Drivers of e-Science
• Collaboration - Enabling People to Work Together
– With security and flexibility for otherwise unattainable benefits
– For example:
▪ to share instruments, databases, or computation
▪ to serve occasional peaks of high demand for computation (especially trivially parallelisable ones) from collaborators
• Digital technology – exponential growth
• “Data deluge”
• Consequent Research Investment
– UK e-Science programme
– EU e-Infrastructure
– USA cyberinfrastructure
• Industry investment – potential for dynamic accountable use of resources

What is e-Infrastructure – Political view
• A shared resource
– That enables science, research, engineering, medicine, industry, …
– It will improve UK / European / … productivity
▪ Lisbon Accord 2000
▪ E-Science Vision SR2000 – John Taylor
– Commitment by UK government
▪ Sections 2.23-2.25
– Always there
▪ c.f. telephones, transport, power, internet

Example: Astronomy
No. & sizes of data sets as of mid-2002, grouped by wavelength
• 12 waveband coverage of large areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins University

Example: Earth Observation
ESA missions:
• 100s of Gbytes of data per day
Grid contribution to EO:
• Enhance the ability to access high level products
• Allow reprocessing of large historical archives
• Improve Earth science complex applications (data fusion, data mining, modelling …)
Federico.Carminati, EU review presentation, 1 March 2002
Derived from: L. Fusco, June 2001

Example: Wearable Devices
• Sensors
• Wireless connection
• Positioning information from GPS
• Mobile medical technologies
• Environmental sensing (air pollution)
[Figure labels: Sensor bus; GPS aerial]

Connecting people: Access Grid
[Figure labels: Cameras; Microphones]

DAME: Grid based tools and Infrastructure for Aero-Engine Diagnosis and Prognosis
• “A significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DS&S. Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed”, DS&S 2004
[Figure labels: Engine flight data; London Airport; New York Airport; Airline office; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center; XTO; Engine Model; Case Based Reasoning; Signal Data Explorer]
Companies: Rolls-Royce, DS&S, Cybula
Universities: York, Leeds, Sheffield, Oxford

BLAST – comparing DNA or protein sequences
• BLAST is the first step for analysing new sequences: to compare DNA or protein sequences to other ones stored in personal or public databases.
• Ideal as a grid application – trivial to parallelise as independent concurrent jobs.
– Requires resources to store databases and run algorithms
– Large user community

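Why sequence comparison parallelises so easily can be sketched as follows. This is a minimal illustration, not the BLAST algorithm: the similarity score simply counts shared three-letter substrings, local threads stand in for jobs on grid resources, and the sequences and all names (`best_match`, `run_jobs`, `DATABASE`, `QUERIES`) are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def kmers(seq: str, k: int = 3) -> set:
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_match(query: str, database: dict, k: int = 3) -> tuple:
    """Toy comparison (NOT BLAST): the database entry sharing most k-mers with the query."""
    q = kmers(query, k)
    scores = {name: len(q & kmers(seq, k)) for name, seq in database.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

def run_jobs(queries: list, database: dict, workers: int = 4) -> list:
    """Each query is an independent job: no job needs another job's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: best_match(q, database), queries))

# Hypothetical data for illustration only.
DATABASE = {"seqA": "ACGTACGTGA", "seqB": "TTGACCATGC"}
QUERIES = ["ACGTAC", "GACCAT"]

results = run_jobs(QUERIES, DATABASE)
print(results)
```

Because the jobs share nothing except read access to the database, the concurrent run gives the same answers as a serial loop, which is the property that makes this kind of workload a natural fit for pools of batch processors.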
The LCG
• Large Hadron Collider (LHC) Computing Grid
• Largest current grid – some of its middleware being included in the NGS
• One of the initial applications for EGEE and for one of its predecessors, the European DataGrid
• 1000s of scientists sharing resources to provide the computation / storage needed for the LHC from 2007
• Sustainable, dependable service is vital

The CERN Large Hadron Collider
http://www.cern.ch
[Figure labels: LHC; ~9 km; SPS; CERN]

The LHC Experiments
[Figure labels: ATLAS; CMS; LHCb]
~10-15 PetaBytes/year
~10^8 events/year
~10^3 batch and interactive users

Orders of magnitude…
• 1 Megabyte (1MB): a digital photo
• 1 Gigabyte (1GB) = 1000MB: a DVD movie
• 1 Terabyte (1TB) = 1000GB: world annual book production
• 1 Petabyte (1PB) = 1000TB: 10% of the annual production by LHC experiments
• 1 Exabyte (1EB) = 1000 PB: world annual information production
Per experiment:
• 40 million collisions per second
• After filtering, 100 collisions of interest per second
• A Megabyte of digitised information for each collision = recording rate of 0.1-1 Gigabytes/sec
• 1 billion collisions recorded = 1-3 Petabyte/year
Total: ~10.000.000.000.000.000 bytes/year = 1% of
[Figure labels: ATLAS; CMS; LHCb; ALICE]

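The per-experiment figures can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes) and uses the low end of each quoted range; the variable names are illustrative only.

```python
MB = 10**6
GB = 10**9
PB = 10**15

collisions_per_second = 40_000_000   # before filtering
kept_per_second = 100                # collisions of interest after filtering
bytes_per_collision = 1 * MB         # about a Megabyte of digitised information each

# Filtering keeps one collision out of every `rejection_ratio`.
rejection_ratio = collisions_per_second // kept_per_second

# Recording rate at the low end of the quoted 0.1-1 Gigabytes/sec.
recording_rate_gb_per_s = kept_per_second * bytes_per_collision / GB

# One billion recorded collisions at one Megabyte each.
recorded_per_year = 10**9
yearly_volume_pb = recorded_per_year * bytes_per_collision // PB

print(rejection_ratio, recording_rate_gb_per_s, yearly_volume_pb)
```

With these inputs the filter keeps one collision in 400,000, the recording rate is 0.1 Gigabytes/sec, and a billion recorded collisions amount to 1 Petabyte per year, which matches the lower bounds quoted for each experiment.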
Computing Resources: Feb 2005
[Map legend: Country providing resources; Country anticipating joining]
In LCG-2:
▪ 113 sites, 30 countries
▪ >10,000 cpu
▪ ~5 PB storage
Includes non-EGEE sites:
• 9 countries
• 18 sites

Current status
• Many key concepts identified and known
• Many grid projects have tested these
• Major efforts now on:
– Establishing standards (a slow process)
– Establishing production Grids (too urgent to wait for standards, and standards need to emerge from experience)
“Production” = reliable, sustainable, with commitments to quality of service
– Establishing new user communities
▪ Need to prove this is technology with widespread potential
▪ Much more than for the few disciplines that helped to create it
– Whilst research & development continues
• In Europe, EGEE
• In UK, NGS (interoperable - at least - with EGEE)
• In US, TeraGrid

Grid security and trust -1
• Providers of resources (computers, databases, ...) need risks to be controlled: they are asked to trust users they do not know
• Users need single sign-on: log on to a machine that can pass the user’s identity to other resources
• Build middleware on layer providing:
– Authentication: know who wants to use resource
– Authorisation: know what the user is allowed to do
– Security: reduce vulnerability, e.g. from outside the firewall
– Non-repudiation: knowing who did what
• “GSI” from the Globus toolkit does this for NGS

Grid security and trust -2
• Currently, achieved by Certification:
– User’s identity has to be certified by (mutually recognized) national Certification Authorities (CAs)
– Resources (node machines) have to be certified by CAs
– Digital certificate installed on the machine accessed by user: basis of AA (authentication and authorisation)
– Identity passed to other resources you use, where it is mapped to a local account – the mapping is maintained by the VO
• User joins VOs
• Common agreed policies establish rights for a Virtual Organization to use resources

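The two steps described here, checking that an identity was certified by a recognised CA and then mapping it to a local account, can be sketched as below. This is a simplified illustration and not the GSI implementation: real middleware verifies cryptographic signatures on certificates, which this sketch replaces with a plain issuer lookup. The CA name, subject names, account names and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Certificate:
    subject: str  # identity of the user (or machine) being certified
    issuer: str   # the Certification Authority that certified it

# Hypothetical list of mutually recognised CAs.
TRUSTED_CAS = {"/C=UK/O=ExampleCA"}

# Hypothetical mapping, maintained by the VO, from certified identity to local account.
VO_ACCOUNT_MAP = {"/C=UK/O=ExampleCA/CN=Alice Example": "vo_user01"}

def authenticate(cert: Certificate) -> bool:
    """Authentication: was this identity certified by a recognised CA?
    (A stand-in for real signature verification.)"""
    return cert.issuer in TRUSTED_CAS

def map_to_local_account(cert: Certificate) -> str:
    """Authorisation: map an authenticated identity to a local account."""
    if not authenticate(cert):
        raise PermissionError(f"untrusted issuer: {cert.issuer}")
    try:
        return VO_ACCOUNT_MAP[cert.subject]
    except KeyError:
        raise PermissionError(f"no local account for: {cert.subject}") from None

alice = Certificate(subject="/C=UK/O=ExampleCA/CN=Alice Example",
                    issuer="/C=UK/O=ExampleCA")
print(map_to_local_account(alice))
```

Keeping the two checks separate mirrors the slide's distinction: the CA vouches for who the user is, while the VO's mapping decides what that user may do on a given resource.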
1997- Present: Globus
• A software toolkit addressing certain technical problems in the development of Grid enabled tools, services, and applications
– Offers a modular “bag of technologies”
– Made available under liberal open source license
• Not turnkey solutions, but building blocks and tools for application developers and system integrators
• Tools built on GSI include:
– Job submission (GRAM): run a job on a remote computer
– Information services: so I know which computer to use
– File transfer (GridFTP): so large data files can be transferred
– Replica management: so I can have multiple versions of a file “close” to the computers where I want to run jobs
• http://www.globus.org/

Current status – continued
• Grid technology is developing!
• Non-trivial for new users and new application areas to start!
• Hence need major commitments
– to training
– to supporting new user communities
– to establishing procedures for new VOs
• Social as well as technical issues
– Commitments to collaboration
– From systems admin as well as researchers
▪ Negotiation with operations and VO managers (see later)

The key for new VOs
[Figure layers: Application; Application toolkits, standards; Middleware: “collective services”; Basic Grid services: AA, job submission, info, …]
• Application development environment
• Insulate applications from changing middleware
• Build distributed applications from components

Are Grids for you?! -1
• IF a community effort is vital to achieving goals, by sharing services of data and computation,
• AND that effort crosses organisation boundaries
• THEN yes!
• In the UK, negotiate to join the NGS!
• OR if you wish to use computation/storage/data services provided on a Grid – then YES!

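The IF / AND / OR rule on this slide can be written as a single boolean expression. The function below is only an illustrative restatement; its name and parameter names are hypothetical.

```python
def grid_is_for_you(community_effort_vital: bool,
                    crosses_organisations: bool,
                    wants_grid_services: bool) -> bool:
    """Yes if a vital community effort crosses organisation boundaries,
    or if you simply want to use services provided on a Grid."""
    return (community_effort_vital and crosses_organisations) or wants_grid_services

print(grid_is_for_you(True, True, False))
```

Written this way, it is clear that either route alone is enough: a cross-organisation collaboration, or the wish to consume existing grid services.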
Are Grids for you? -2
• Suggestions for research disciplines not already engaged with grid computing
– Identify your “Virtual Organisations”
▪ What are the drivers for collaboration?
▪ What are the VO characteristics? (Fixed relationships? Short-lived or constant requirements for resources? Sharing results or “just” sharing resources?)
▪ What services (data, computation) would you want to share?!
– What remote resources (computers, databases, instruments…) do you need to access?

Summary
• Collaboration across multiple organisations
• Single sign-on to resources in multiple organisations
• Need for people-services as well as middleware services to enable this: e.g. to run
– Enabling services (e.g. info service)
– Certification authority for AA
– VO management – to negotiate with sites
– Helpdesk, …
• Drives are towards
– Production services
▪ In the UK, the NGS
▪ In Europe, EGEE
– Standards – (tomorrow)
– “e-Infrastructure” ~ integration of networking and middleware to support collaboration