Grid Computing
(Special Topics in Computer Engineering)
Veera Muangsin
23 January 2004
Outline
• High-Performance Computing
• Grid Computing
• Grid Applications
• Grid Architecture
• Grid Middleware
• Grid Services
High-Performance Computing
World’s Fastest Computers: The Top 5
mega = 10^6 (million), giga = 10^9 (billion), tera = 10^12 (trillion), peta = 10^15 (quadrillion)
#1 Japan’s Earth Simulator
Specifications
Total number of processors      5,120
Peak performance / processor    8 Gflops
Total number of nodes           640
Peak performance / node         64 Gflops
Total peak performance          40 Tflops
Shared memory / node            16 GB
Total main memory               10 TB
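A quick arithmetic cross-check (added here, not on the original slide) shows how these figures fit together; a minimal Python sketch:

    # Earth Simulator figures, cross-checked (illustrative only)
    nodes = 640
    procs_per_node = 5120 // nodes                    # 8 processors per node
    peak_per_node = procs_per_node * 8                # 8 Gflops each -> 64 Gflops/node
    total_peak_tflops = nodes * peak_per_node / 1000  # 40.96, quoted as 40 Tflops
    total_memory_tb = nodes * 16 / 1024               # 640 nodes x 16 GB = 10 TB
    print(peak_per_node, total_peak_tflops, total_memory_tb)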
Processor Cabinets
6
Earth Simulator does climate modeling
[Figure: parallel decomposition of the atmospheric model. The longitude dimension (I = 3840) is transformed between grid space and spectral space by forward and inverse FFTs, while the latitude rows (J = 1920) are distributed across processor nodes PN01 ... PN320. Grid points: 3840 × 1920 × 96.]
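As a rough illustration of this scheme (a numpy sketch of my own under simplifying assumptions, not the Earth Simulator's actual code), the latitude rows are split across nodes and each node runs its longitude FFTs independently:

    import numpy as np

    I, J, NODES = 3840, 1920, 320           # longitudes, latitudes, nodes PN01..PN320
    grid = np.random.rand(J, I)             # one vertical level of the model grid
    blocks = np.split(grid, NODES, axis=0)  # 6 latitude rows per node

    # Each node transforms its own rows to spectral space, fully in parallel.
    spectral = [np.fft.rfft(b, axis=1) for b in blocks]

    # The inverse FFT brings each block back to grid space.
    restored = np.vstack([np.fft.irfft(s, n=I, axis=1) for s in spectral])
    assert np.allclose(restored, grid)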
IBM Blue Gene
• Being constructed by IBM
• To be completed in 2006
• Expected performance: 1 PetaFLOPS, to be no. 1 in the TOP500 list (in 2003 the aggregated performance of all TOP500 machines was 528 TFlops)
• Applications: molecular dynamics, protein folding, drug-protein interaction (docking)
Clusters
• The most common architecture in the TOP500
  – 7 of the top 10
  – 208 of the 500 systems
#2 LANL’s ASCI Q
• 13.88 TFlops
• 8,192-processor cluster of HP AlphaServer nodes (1.25 GHz)
• LANL (Los Alamos National Laboratory)
• Used to analyze and predict the performance, safety, and reliability of nuclear weapons
#3 Virginia Tech’s System X
• 10.28 TFlops
• 1,100-node cluster of Apple G5s: dual PowerPC 970 2 GHz, 4 GB memory, and 160 GB disk per node (176 TB total), running Mac OS X (FreeBSD-based UNIX)
• $5.2 million
System X’s Applications
• Nanoscale Electronics
• Quantum Chemistry
• Computational Chemistry/Biochemistry
• Computational Fluid Dynamics
• Computational Acoustics
• Computational Electromagnetics
• Wireless Systems Modeling
• Large-scale Network Emulation
#4 NCSA’s Tungsten
• 9.81 TFlops
• 1,450-node cluster of dual-processor Dell PowerEdge 1750s, Intel Xeon 3.06 GHz
• NCSA (National Center for Supercomputing Applications)
#5 PNNL’s MPP2
• 8.63 TFlops
• 980-node cluster of HP Longs Peak nodes, dual Intel Itanium-2 1.5 GHz
• PNNL (Pacific Northwest National Laboratory)
• Application: molecular science
The Real No. 1: 68.06 TFlops!!!
                                 Total                   Last 24 Hours
Users                            4,848,584               1,457 (new users)
Results received                 1,213,258,391           1,507,691
Total CPU time                   1,783,547.603 years     1,324.293 years
Floating Point Operations        4.315893e+21            5.879995e+18 (68.06 TeraFLOPs/sec)
Average CPU time per work unit   12 hr 52 min 39.4 sec   7 hr 41 min 39.9 sec

Last updated: Fri Jan 23 01:33:45 2004
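The 68.06 TeraFLOPs/sec figure is just the last-24-hours operation count averaged over the day; a one-line check (added here):

    # 5.879995e+18 floating point operations over 24 hours
    print(5.879995e18 / (24 * 3600) / 1e12)   # ~68.06 TFlops/sec sustained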
Science at Home
Evaluate AIDS drugs at home
• 9,020 users (12 Jan 2004)
• AutoDock: predicts how drug candidates might bind to a receptor protein of HIV
Scientific Applications
• Always push computer technology to its limits
• Grand Challenge applications
  – Applications that cannot be completed with sufficient accuracy and timeliness to be of interest, due to limitations such as speed and memory in current computing systems
• Next challenge: large-scale collaborative problems
E-Science: a new way to do science
• Pre-electronic science
  – Theorize and/or experiment, in small teams
• Post-electronic science
  – Construct and mine very large databases
  – Develop computer simulations & analyses
  – Access specialized devices remotely
  – Exchange information within distributed multidisciplinary teams
Data Intensive Science: 2000-2015
• Scientific discovery increasingly driven by IT
  – Computationally intensive analyses
  – Massive data collections
  – Data distributed across networks of varying capability
  – Geographically distributed collaboration
• Dominant factor: data growth
  – 2000: ~0.5 Petabyte
  – 2005: ~10 Petabytes
  – 2010: ~100 Petabytes
  – 2015: ~1000 Petabytes?
• Storage density doubles every 12 months
• Transforming entire disciplines in physical and biological sciences
Network
• Network vs. computer performance
– Computer speed doubles every 18 months
– Network speed doubles every 9 months
– Difference = order of magnitude per 5 years
• 1986 to 2000
– Computers: x 500
– Networks: x 340,000
• 2001 to 2010
– Computers: x 60
– Networks: x 4000
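The "order of magnitude per 5 years" gap follows directly from the two doubling times; a worked check (added, not from the slide):

    # Growth over 5 years (60 months) at the quoted doubling times
    computers = 2 ** (60 / 18)    # ~10x (doubling every 18 months)
    networks = 2 ** (60 / 9)      # ~102x (doubling every 9 months)
    print(networks / computers)   # ~10x: one order of magnitude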
E-Science Infrastructure
[Figure: e-Science infrastructure linking software, computers, sensor nets, instruments, colleagues, and data archives]
Online Access to Scientific Instruments
[Figure: Advanced Photon Source pipeline: real-time collection at the instrument, archival storage, tomographic reconstruction, and wide-area dissemination to desktop & VR clients with shared controls]
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
Data Intensive Physical Sciences
• High energy & nuclear physics
  – Including new experiments at CERN
• Astronomy: digital sky surveys
• Time-dependent 3-D systems (simulation, data)
  – Earth observation, climate modeling
  – Geophysics, earthquake modeling
  – Fluids, aerodynamic design
  – Pollutant dispersal scenarios
Data Intensive Biology and Medicine
• Medical data
– X-Ray
– Digitizing patient records
• X-ray crystallography
• Molecular genomics and related disciplines
– Human Genome, other genome databases
– Proteomics (protein structure, activities, …)
– Protein interactions, drug delivery
• 3-D Brain scans
Grid Computing
What is Grid?
Google Search (Jan 2004)
“grid computing”              >600,000 hits
“grid computing” AND hype     >20,000 hits
(hype = propaganda)
From Web to Grid
• 1989: Tim Berners-Lee invented the web
  – so physicists around the world could share documents
• 1999: Grids add to the web
  – computing power, data management, instruments
• E-Science
• Commerce is not far behind
The Grid Opportunity:
e-Science and e-Business
• Physicists worldwide pool resources for peta-op
analyses of petabytes of data
• Engineers collaborate to design buildings, cars
• An insurance company mines data from partner
hospitals for fraud detection
• An enterprise configures internal & external
resources to support e-Business workload
Grid
• “We will give you access to some of our
computers and instruments if you give us
access to some of yours.”
• “Resource sharing & coordinated problem
solving in dynamic, multi-institutional
virtual organizations”
Grid
• Grid provides the infrastructure
  – to dynamically manage:
    • compute resources
    • data sources (static and live)
    • scientific instruments (wind tunnels, telescopes, microscopes, simulators, etc.)
  – to build large-scale collaborative problem-solving environments that are:
    • cost-effective
    • secure
Grid Applications
Life Sciences
[Figure: imaging instruments (data acquisition) and large databases feed computational resources (processing, analysis) and advanced visualization, all connected by the network]
Biomedical applications
• Data mining on genomic databases (exponential growth)
• Indexing of medical databases (TB/hospital/year)
• Collaborative framework for large-scale experiments
• Parallel processing for
  – database analysis
  – complex 3D modelling
Digital Radiology on the Grid
• 28 petabytes/year for 2000 hospitals
• must satisfy privacy laws
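Per hospital that volume is modest; a quick division (added here, consistent with the TB/hospital/year figure above):

    # 28 petabytes/year spread over 2,000 hospitals
    print(28e15 / 2000 / 1e12)   # = 14.0 TB per hospital per year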
University of Pennsylvania: Brain Imaging
• Biomedical Informatics Research Network (BIRN): a reference set of brains provides essential data for developing therapies for neurological disorders (multiple sclerosis, Alzheimer’s disease)
• Pre-BIRN:
  – One lab, small patient base
  – 4 TB collection
• With TeraGrid:
  – Tens of collaborating labs
  – Larger population sample
  – 400 TB data collection: more brains, higher resolution
  – Multiple-scale data integration, analysis
Earth Observations
ESA missions:
• about 100 Gbytes of data per day (ERS 1/2)
• 500 Gbytes for the next ENVISAT mission
Particle Physics
• Simulate and reconstruct complex physics phenomena millions of times
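Each simulated event is independent of the others, which is what makes this workload well suited to a grid. A toy task-farm sketch (my illustration; the function and counts are invented, not from any real experiment):

    # Toy event-simulation farm: independent runs can be scattered
    # across whatever processors (or grid sites) are available.
    from concurrent.futures import ProcessPoolExecutor
    import random

    def simulate_event(seed):
        rng = random.Random(seed)
        # stand-in for a full detector simulation
        return sum(rng.gauss(0, 1) for _ in range(1000))

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(simulate_event, range(10_000)))
        print(len(results), "events simulated")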
Whole-system Simulations
[Figure: coupled sub-system models of an aircraft:
• wing models: lift capabilities, drag capabilities, responsiveness
• stabilizer models: deflection capabilities, responsiveness
• engine models: thrust performance, reverse thrust performance, responsiveness, fuel consumption
• airframe models
• landing gear models: braking performance, steering capabilities, traction, dampening capabilities
• human (crew) models: accuracy, perception, stamina, reaction times, SOPs]
NASA Information Power Grid: coupling all sub-system simulations
National Airspace Simulation Environment
[Figure: simulation drivers (FAA ops data, weather data, airline schedule data, digital flight data, radar tracks, terrain data, surface data) feed a Virtual National Air Space (VNAS) covering 22,000 commercial US flights a day. Sub-system runs are distributed across NASA centers (GRC, ARC, LaRC): 44,000 wing runs, 50,000 engine runs, 66,000 stabilizer runs, 48,000 human crew runs, 22,000 airframe impact runs, and 132,000 landing/take-off gear runs]
NASA Information Power Grid: aircraft, flight paths, airport operations and the environment are combined to get a virtual national airspace.
Global In-flight Engine Diagnostics
[Figure: in-flight data flows from the airline over a global network (e.g. SITA) to a ground station, then to the DS&S Engine Health Center and its data centre; the maintenance centre is alerted via internet, e-mail, and pager]
Distributed Aircraft Maintenance Environment: Universities of Leeds, Oxford, Sheffield & York
Emergency Response Teams
• Bring sensors, data, simulations and experts together
  – wildfire: predict movement of fire & direct fire-fighters
  – also earthquakes, peacekeeping forces, battlefields, …
National Earthquake Simulation Grid
Los Alamos National Laboratory: wildfire
Grid Computing Today
[Figure: world map of Grid deployments: DISCOM, SinRG, APGrid, IPG, …]
Selected Major Grid Projects
• Access Grid (www.mcs.anl.gov/FL/accessgrid; DOE, NSF): create & deploy group collaboration systems using commodity technologies
• BlueGrid (IBM): Grid testbed linking IBM laboratories
• DISCOM (www.cs.sandia.gov/discom; DOE Defense Programs): create an operational Grid providing access to resources at three U.S. DOE weapons laboratories
• DOE Science Grid (sciencegrid.org; DOE Office of Science): create an operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities
• Earth System Grid (ESG) (earthsystemgrid.org; DOE Office of Science): delivery and analysis of large climate model datasets for the climate research community
• European Union (EU) DataGrid (eu-datagrid.org; European Union): create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics
Selected Major Grid Projects
• EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): create technology for remote access to supercomputing resources & simulation codes; in GRIP, integrate with the Globus Toolkit™
• Fusion Collaboratory (fusiongrid.org; DOE Office of Science): create a national computational collaboratory for fusion research
• Globus Project™ (globus.org; DARPA, DOE, NSF, NASA, Microsoft): research on Grid technologies; development and support of the Globus Toolkit™; application and deployment
• GridLab (gridlab.org; European Union): Grid technologies and applications
• GridPP (gridpp.ac.uk; U.K. eScience): create & apply an operational grid within the U.K. for particle physics research
• Grid Research Integration Dev. & Support Center (grids-center.org; NSF): integration, deployment, support of the NSF Middleware Infrastructure for research & education
Selected Major Grid Projects
• Grid Application Dev. Software (hipersoft.rice.edu/grads; NSF): research into program development technologies for Grid applications
• Grid Physics Network (griphyn.org; NSF): technology R&D for data analysis in physics experiments: ATLAS, CMS, LIGO, SDSS
• Information Power Grid (ipg.nasa.gov; NASA): create and apply a production Grid for aerosciences and other NASA missions
• International Virtual Data Grid Laboratory (ivdgl.org; NSF): create an international Data Grid to enable large-scale experimentation on Grid technologies & applications
• Network for Earthquake Eng. Simulation Grid (neesgrid.org; NSF): create and apply a production Grid for earthquake engineering
• Particle Physics Data Grid (ppdg.net; DOE Science): create and apply production Grids for data analysis in high energy and nuclear physics experiments
Selected Major Grid Projects
• TeraGrid (teragrid.org; NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s
• UK Grid Support Center (grid-support.ac.uk; U.K. eScience): support center for Grid projects within the U.K.
• Unicore (BMBFT): technologies for remote access to supercomputers
Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS. See also www.gridforum.org
TeraGrid
• 13.6 trillion calculations per second
• Over 600 trillion bytes of immediately
accessible data
• 40 gigabit per second network speed
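At that speed even the full data store can be moved in reasonable time; a back-of-envelope figure (added here, ignoring protocol overhead):

    # Time to move 600 terabytes over a 40 Gb/s link
    seconds = 600e12 * 8 / 40e9   # = 120,000 s
    print(seconds / 3600)         # ~33.3 hours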
TeraGrid
European DataGrid
Testbed Sites (>40): Lund, RAL, Paris, Santander, Lisboa, ESTEC, KNMI, IPSL, Lyon, Grenoble, Berlin, Prague, CERN, Brno, Milano, PD-LNL, Torino, Madrid, Marseille, Pisa, BO-CNAF, Barcelona, ESRIN, Roma, Valencia, Catania
UK e-Science Grid
e-Science Centers: Edinburgh, Glasgow, Newcastle, Belfast, DL (Daresbury Laboratory), Manchester, Oxford, Cardiff, RAL, Cambridge, London, Hinxton, Soton (Southampton)
Asia-Pacific Grid (APGrid)
APAN members: Japan, Australia, USA, Canada, Korea, Thailand, Taiwan, Singapore, Malaysia
Grid goes to business
• IBM, HP, Oracle, Sun, …
• www.ibm.com/grid
• www.hp.com/techservers/grid
• www.oracle.com/technologies/grid
• www.sun.com/grid
For More Information
• Globus Project™
– www.globus.org
• Grid Forum
– www.gridforum.org
• Book (Morgan Kaufmann)
  – www.mkp.com/grids