Grid Computing - Mona Institute of Applied Sciences

Grid Computing
Mr. Tim Stitt
Dept. Computer Science and Mathematics
University of the West Indies
Talk Outline
1. The Data Challenge
2. Solutions to the Data Challenge
3. What is the Grid ?
4. How will the Grid Work ?
5. Grid Applications
The Data Challenge
After fifty years of innovation:
The raw speed of individual computers has increased
by a factor of around one million:
yet they are still too slow for many challenging
scientific problems.
Grand Challenge Problem
Detectors at the Large Hadron
Collider at CERN, Geneva

will be producing several Petabytes of data per year
- a million times the storage capacity of the average desktop computer;
- accounts for nearly 10% of all the information produced by humans each
year.
Performing the most rudimentary analysis of the LHC
data will require close to 20 TeraFlops ( one TeraFlop is a
trillion floating-point operations per second ).
The fastest contemporary supercomputer just
manages close to 13 TeraFlops of computing
power.
ASC Q
• 4,096 Alpha EV-68 processors ( 1.25 GHz each )
• 33 TB of memory
• 664 TB of disk space
• 13 Teraflops (peak)
• Footprint: > football field
• Power: > 1 MW
• Cost: > $100M
Supplier: IBM
Client: USA Department of Energy
Main Application: Simulated Testing of Nuclear Weapons Stockpile
Solutions to the Data Challenge ?
Nearly every organization is sitting on top of enormous
unused computing capacity:



- Mainframes are idle 40% of the time;
- Unix servers are actually “serving” less than 10% of the time;
- Most PCs do nothing for 95% of a typical day.
Consider a hotel with 95% vacancy or an airline with
90% of its fleet on the ground.
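To put the idle percentages above into numbers, here is a rough Python sketch. Only the idle fractions come from the slide; the machine counts and per-machine peak speeds are hypothetical, chosen purely for illustration, and the Unix server figure uses 90% as a lower bound on idle time.

```python
# Idle fractions quoted on the slide (Unix servers: at least 90% idle).
IDLE_FRACTION = {"mainframe": 0.40, "unix_server": 0.90, "pc": 0.95}


def idle_capacity_gflops(inventory, peak_gflops):
    """Return the average unused capacity (in GFlops) of a machine inventory.

    inventory   -- machine class -> number of machines (hypothetical)
    peak_gflops -- machine class -> peak GFlops per machine (hypothetical)
    """
    return sum(
        count * peak_gflops[kind] * IDLE_FRACTION[kind]
        for kind, count in inventory.items()
    )


# Hypothetical organisation: these counts and speeds are assumptions only.
inventory = {"mainframe": 1, "unix_server": 20, "pc": 1000}
peak_gflops = {"mainframe": 50.0, "unix_server": 5.0, "pc": 2.0}

idle = idle_capacity_gflops(inventory, peak_gflops)
print(f"Average idle capacity: {idle:.0f} GFlops")
```

Under these assumed numbers almost all of the unused capacity sits in the desktop PCs, which is the resource the approaches in the following slides try to harness.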
Can we use this to our advantage ?
Answer: Yes
Large Hadron Collider,
CERN, Geneva
Several approaches to harnessing idle compute time
and resources have been implemented over the last
decade:
1. Distributed Computing
   Linking computers over a network
2. Metacomputing
   Linking supercomputers with high-speed networks
3. Cluster Computing
   Overcome the need for expensive commercial supercomputers by
   linking together commodity PCs (good scalability).
4. Peer-to-Peer Computing
   Individuals download software to their local machines and make it
   public to other users, bypassing a central server, e.g. Kazaa.
5. Internet Computing ( or cycle-scavenging )
   Different parts of a problem can be worked on simultaneously by users
   who download chunks of data over the Internet, process the data and
   return their results. Based on goodwill and hence not a viable strategy
   for all tasks, e.g. the SETI project.
6. The Grid
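The Internet Computing ( cycle-scavenging ) model described above can be sketched in a few lines of Python. This is a toy illustration only: worker threads stand in for volunteer machines, and the function names and the sum-of-squares "analysis" are assumptions made for the example, not part of any real project's software.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Cut the full data set into independent work units."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunk(chunk):
    """Stand-in for the volunteer's computation: here, a sum of squares."""
    return sum(x * x for x in chunk)


def run_project(data, chunk_size, volunteers):
    """Hand chunks to volunteers, then combine the partial results."""
    chunks = split_into_chunks(data, chunk_size)
    # Each worker thread plays the role of one volunteer machine.
    with ThreadPoolExecutor(max_workers=volunteers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


total = run_project(list(range(1000)), chunk_size=100, volunteers=4)
print(total)
```

The approach works here because every chunk can be processed independently; problems whose parts must communicate often while running fit this model poorly.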
NorduGrid
Scalable Computing
[Figure: Performance + QoS increasing across the scale of systems: Personal Device, SMPs or SuperComputers, Local Cluster, Enterprise Cluster/Grid, Global Grid, Inter Planet Grid.]
What is the Grid ?
Internet Computing, clustering, meta-computing etc.
are all special cases of something much more
powerful which includes:
the ability for communities to share resources to
tackle common goals.
Science and business today are increasingly:
- collaborative and multidisciplinary in nature;
- spread across institutions, states, countries and continents.
Email and the Web are basic mechanisms to allow
such groups to work together but what could
happen if we could link:
1. Data and Databases
2. Computers
3. Sensors
4. Handhelds
5. Radio Telescopes …
into a single virtual laboratory.
So what is the Grid ?
In a paragraph or two:
Whereas the Web is a service for sharing information over
the Internet, the Grid is a service for sharing computer
power and data storage capacity over the Internet.
The Grid goes well beyond simple communication between
computers, and aims ultimately to turn the global network
of computers into one vast computational resource.
Another Definition:
A grid is a type of parallel and distributed system that
enables the sharing, selection, and aggregation of
geographically distributed "autonomous" resources
dynamically at runtime depending on their availability,
capability, performance, cost, and users' quality-of-service
requirements.
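As a rough illustration of the definition above, the Python sketch below selects a resource at runtime from its availability, capability, performance and cost, measured against a user's requirements. The class names, fields and example sites are all hypothetical; real grid middleware uses far richer resource descriptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """One autonomous resource offered to the grid (illustrative fields)."""
    name: str
    available: bool       # availability at the time of the request
    cpus: int             # capability
    gflops: float         # performance
    cost_per_hour: float  # cost


@dataclass
class Request:
    """A user's quality-of-service requirements (hypothetical fields)."""
    min_cpus: int
    min_gflops: float
    max_cost_per_hour: float


def select_resource(resources: List[Resource], request: Request) -> Optional[Resource]:
    """Return the cheapest available resource that meets the request, or None."""
    candidates = [
        r for r in resources
        if r.available
        and r.cpus >= request.min_cpus
        and r.gflops >= request.min_gflops
        and r.cost_per_hour <= request.max_cost_per_hour
    ]
    return min(candidates, key=lambda r: r.cost_per_hour, default=None)


# Hypothetical sites, for illustration only.
resources = [
    Resource("cluster-a", True, 64, 120.0, 8.0),
    Resource("cluster-b", False, 256, 500.0, 5.0),
    Resource("cluster-c", True, 128, 300.0, 6.5),
    Resource("desktop-pool", True, 16, 20.0, 1.0),
]
request = Request(min_cpus=32, min_gflops=100.0, max_cost_per_hour=10.0)
chosen = select_resource(resources, request)
print(chosen.name if chosen else "no suitable resource")
```

Picking the cheapest qualifying resource is just one possible policy; a scheduler could equally rank candidates by performance or by expected completion time.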
The Grid
The term “Grid” was chosen by analogy with the electric
power grid.
“The Grid would let users tap processing power off the
internet as easily as electric power can be drawn
from a wall socket”
The Power Grid
Electrical Power Grid:
- When you plug an appliance into a socket, you don’t care where the
  power came from, e.g. wind, coal or a nuclear plant.
- The infrastructure is called the “power grid”. It links together
  power plants of different kinds with your home through transmission
  stations, power stations, transformers, powerlines etc.
The “Grid”:
- When you sit in front of your computer to solve a problem, you
  know that whatever computer you plug into the Internet, you will get
  the computing power and storage capacity you need to complete the job.
- The infrastructure is called the “Grid”. It links together computing
  resources such as PCs, workstations, servers, storage elements etc.
The Power Grid
Is Transparent:
- Electrical Power Grid: No need to worry about how and where the
  electricity is generated.
- The Grid: You don’t need to know which computer processes your
  request or where the data it needs is held.
Is Pervasive:
- Electrical Power Grid: Electricity is available essentially
  everywhere and can be accessed simply through a standard wall socket.
- The Grid: Remote computing resources will be accessible from different
  platforms, including desktops, laptops, PDAs and mobile phones,
  through a web browser ( a portal ).
Is a Utility:
- Electrical Power Grid: You ask for electricity, you get it and pay for it.
- The Grid: You ask for computer power or storage and you get it. You
  also pay for what you get.
This idea of a computing power grid is not new:
“The time-sharing computer system can unite a group
of investigators …. one can conceive of such a facility
as an … intellectual public utility.”
- Fernando Corbato and Robert Fano, 1966
“We will perhaps see the spread of ‘computer utilities’,
which, like present electric and telephone utilities, will
service individual homes and offices across the
country.”
- Len Kleinrock, 1967
It should be noted that reference to grid computing can
presently lead to some confusion:


The reality, now and for some time to come, is that there
is not one single “Grid” ( and may never be ! ).
Instead there are many grids ( or virtual
organizations ) evolving:
- National Grids
> couple high-end resources across a nation, e.g. the e-Science program
national Grid in the U.K.
- Private Grids
> Local grids for use in institutions such as hospitals, corporations etc.
- Project Grids
> Grids developed to meet the needs of multi-institutional research groups
and multi-company virtual teams.
NASA’s Information Power Grid
- GoodWill Grids
> Anyone owning a computer at home can
donate some computer capacity to a good cause
- Consumer Grids
> Resources are shared on a commercial basis rather than goodwill or
mutual self-interest. Companies or organisations rent distributed
resources from the owners.
Gusto
How will it work ?
Grid development relies on advanced software, called
Middleware, which:
- Ensures seamless communication between different computers
  in different parts of the world;
- Provides a grid search engine which will find not only the data
  the user requires but also the data processing techniques and the
  computing power to carry them out;
- Distributes the computing task to wherever in the world there is
  spare capacity and then sends the results back to the user.
The Grid Middleware ( in more detail ):
1. Finds convenient places for computing tasks to be run
2. Discovers and optimises use of the widely dispersed resources
3. Organises efficient access to scientific data
4. Deals with authentication to the different sites that the user will
   be using
5. Interfaces to local site authorisation and resource allocation
   policies
6. Runs the jobs
7. Monitors progress
8. Recovers from problems
9. Tells the user when the work is completed and transfers back
   the results.
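Several of the middleware duties listed above ( authentication, running the job, recovering from problems, reporting back ) can be sketched as a toy Python submission loop. Everything here is hypothetical: the site dictionaries, the `SiteFailure` exception and the `submit` function are invented for illustration and do not correspond to the API of any real grid toolkit.

```python
class SiteFailure(Exception):
    """Raised when a site cannot finish a job (hypothetical error type)."""


def authenticate(user, site):
    """Check the user against the site's local authorisation policy."""
    return user in site["authorised_users"]


def run_job(job, site):
    """Run the job at the site, or fail if the site is unhealthy."""
    if not site["healthy"]:
        raise SiteFailure(site["name"])
    return job(site["name"])


def submit(user, job, sites):
    """Try each candidate site in turn; return (result, progress log)."""
    log = []
    for site in sites:
        if not authenticate(user, site):
            log.append(f"{site['name']}: access denied")
            continue
        try:
            result = run_job(job, site)
        except SiteFailure:
            # Recover from the problem by moving on to the next site.
            log.append(f"{site['name']}: failed, trying next site")
            continue
        log.append(f"{site['name']}: completed")
        return result, log
    return None, log


# Hypothetical sites and user, for illustration only.
sites = [
    {"name": "site-a", "authorised_users": {"alice"}, "healthy": False},
    {"name": "site-b", "authorised_users": {"bob"}, "healthy": True},
    {"name": "site-c", "authorised_users": {"alice", "bob"}, "healthy": True},
]
result, log = submit("alice", lambda site_name: f"ran on {site_name}", sites)
print(result)
for line in log:
    print(line)
```

In this run the first site fails, the second refuses access, and the third completes the job, so the user receives a result without having to know which site produced it.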
The Five Big Ideas

Resource Sharing
- Challenge: resources owned by many different people

Secure Access
- 3 A’s: Access, Authentication, Authorization

Resource Use
- Make use of resources efficiently

Death of Distance
- High speed networking technology makes Grid possible

Open Standards
- Coordinated by Global Grid Forum
- Agreement on core technologies such as the Globus Toolkit
Grid Layered Architecture
[Figure: layered Grid architecture. Layers shown: Grid Applications ( e.g. Natural Language Engineering, Molecular Docking, High Energy Physics, Portfolio Analysis, Brain Activity Analysis, GAMESS Chemistry ); Application Toolkits / User-Level Middleware ( Grid Tools ); Grid Services / Core Grid Middleware; Grid Fabric. Components labelled in the figure include GlobusView, DUROC, MPI, Condor-G, Nimrod/G, HPC++, globusrun, Testbed Status, GRAM, GSI, GSI-FTP, MDS, Nexus, I/O, HBM, GASS, Condor, LSF, PBS, NQE, TCP, UDP, DiffServ, Linux, NT and Solaris.]
Current Grid Applications
SF-Express distributed interactive simulation.
100K vehicles (2002 goal) using 13 computers,
1386 nodes, 9 sites.
[Figure: participating machines include NCSA Origin, Caltech Exemplar, CEWES SP and Maui SP.]
Distributed Molecular Docking
Involves screening millions of chemical compounds (molecules)
in the Chemical DataBase (CDB) to identify those having
potential to serve as drug candidates.
[Figure labels: Molecules; Protein; Chemical Databases (legacy, in .MOL2 format)]
Medical Healthcare Applications
• Digital image archives
• Collaborative virtual environments
• On-line clinical conferences
“The Grid will enable a
standardized, distributed digital
mammography resource for
improving diagnostic
confidence”
“The Grid makes it possible to use
large collections of images in new,
dynamic ways, including medical
diagnosis.”
“The ability to visualise 3D
medical images is key to
the diagnosis of
pathologies and presurgical planning”
Nanotechnology
• New and 'better' materials
• Benefits in pharmaceuticals, agrochemicals, food production,
electronics manufacture from the faster, cheaper discovery of new
catalysts, metals, polymers, organic and inorganic materials
“The Grid has the potential
to store and analyze data on
a scale that will support
faster, cheaper synthesis of
a whole range of new
materials.”
Natural Resources/Environments
• Modeling and prediction of earthquakes
• Climate change studies and weather forecasting
• Pollution control
• Socio-economic growth planning, financial modeling and
performance optimization
“Federations of
heterogeneous databases
can be exploited through
the Grid to solve complex
questions about global
issues such as
biodiversity.”
myGrid Project - Bioinformatics

- Imminent ‘deluge’ of genomics data
- Highly heterogeneous, highly complex and interrelated
- Convergence of data and literature archives
- Database access from the Grid
- Process enactment on the Grid
- Personalisation services
- Metadata services
National Virtual Observatory
http://virtualsky.org/
from Caltech CACR, Caltech Astronomy and Microsoft Research
Virtual Sky has 140,000,000 tiles ( 140 Gbyte )
[Figure: Virtual Sky views of the Coma cluster, showing a change of scale and a change of theme between Optical (DPOSS) and Xray (ROSAT).]
Belle Particle Physics Experiment
- A running experiment based at the KEK B-Factory, Japan
- Investigating fundamental violation of symmetry in nature
  (Charge Parity) which may help explain the universal
  matter-antimatter imbalance.
- Collaboration of 400 people from 50 institutes
- Hundreds of TB of data currently
- UoM School of Physics is an active participant and has led
  the Grid-enabling of the Belle data analysis framework.
Famous Predictions on Technology
“The world will only need five computers”
- Thomas J. Watson, IBM
“There is absolutely no need for a computer in the home.”
- Ken Olsen, DEC
“640 kilobytes is all the memory you will ever need.”
- Bill Gates, Microsoft
Announcement
"High-Performance Computing, Parallelism and
Applications" Summer School
July 5-10, 2004
Centre Commun de Calcul Intensif (C3I) Fouillole
Campus de l'Université des Antilles et de la Guyane.