Introduction to Grid Computing
© 2010 B. Wilkinson/Clayton Ferner. Modification date: Jan 13, 2010
1-1.1
Grid Computing
• Using geographically distributed and interconnected computers together for computing and for resource sharing.
“The grid virtualizes heterogeneous geographically disperse resources”
from "Introduction to Grid Computing with Globus," IBM Redbooks
“Grid”
• It is common practice to use the word Grid as a proper noun (i.e., with a capital G), although it does not refer to one universal Grid.
• There are many Grid infrastructures.
• We have set up one for this course.
• In this course you will learn how that was done, along with the technicalities involved.
Need to harness computers
The original driving force behind Grid computing was the same as that behind the early development of the networks that became the Internet:
– Connecting computers at distributed sites for high performance computing.
However, Grid computing is about collaborating and resource sharing as much as it is about high performance computing.
Virtual Organizations
Grid computing offers the potential of virtual organizations:
– groups of people, both geographically and organizationally distributed, working together on a problem, sharing computers AND other resources such as databases and experimental equipment.
Different organizations can supply resources and personnel.
The concept has many benefits, including:
• Problems important to humanity that could not previously be solved because of limited computing resources can now be tackled. Examples:
– Understanding the human genome
– Searching for new drugs …
• Users can have access to far greater computing
resources and expertise than available locally.
• Inter-disciplinary teams can be formed across
different institutions and organizations to tackle
problems that require expertise of multiple
disciplines.
• Specialized localized experimental equipment can
be accessed remotely and collectively.
• Large collective databases can be created to hold
vast amounts of data.
• Unused compute cycles can be harnessed at
remote sites, achieving more efficient use of
computers.
• Business processes can be re-implemented using Grid technology for dramatic cost savings.
Crossing multiple administrative domains
• Another hallmark of larger Grid computing projects.
• The resources being shared are owned either by members of the virtual organization or donated by others.
• Introduces challenging technical and socio-political issues.
• Requires true collaboration.
• Some key features we regard as indicative
of Grid computing:
– Shared multi-owner computing resources
– Uses Grid computing software, with security and
cross-management mechanisms in place
– Tools to bring together geographically distributed
computers owned by others.
Shared Resources
Can share much more than just computers:
• Storage
• Sensors for experiments at particular sites
• Application software
• Databases
• Network capacity, …
Interconnections and Protocols
The focus now is on using standard Internet protocols and technology, e.g., HTTP, SOAP, web services, etc.
History of distributed computing
Certainly one can go back a long way to trace the
history of distributed computing.
Types of distributed computing existed in the 1960s.
Many people were interested in connecting computers together for high performance computing.
Connecting processors/computers together locally began in earnest in the 1960s and 1970s; distributed computing now extends to connecting computers that are geographically distant, which is Grid computing.
Distributed computing technologies that
underpin Grid computing developed
concurrently and rely upon each other.
Three concurrent interrelated paths:
• Networks
• Computing platforms
• Software techniques
Networks
1960s - Development of packet switched networks.
1969 - ARPANET network became operational.
4 nodes, Univ. of California at Los Angeles, Stanford
Research Institute, Univ. of California at Santa
Barbara, and Univ. of Utah. Design speed of 50
Kbits/sec.
1974 - TCP (Transmission Control Protocol)
1978 - TCP/IP (Transmission Control Protocol/Internet
Protocol).
TCP is a protocol for reliable communication; IP handles network routing. IP addresses identify hosts on the Internet, and ports identify end points (processes) for communication purposes.
Early 1970s - Ethernet for interconnecting computers on
local networks.
Early 1980s - Internet. Uses the TCP/IP protocol.
1990s - World-Wide Web developed on top of the Internet. Browser and HTML markup language introduced.
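The TCP/IP roles noted in this timeline (IP addresses identify hosts, ports identify the communicating processes, TCP provides reliable delivery) can be seen in a minimal Python sketch using only the standard `socket` module. It is an illustration, not part of any Grid software: both ends run on one machine, 127.0.0.1 (loopback) and port 0 (let the OS choose a free port) are choices made only so the example runs locally, and the function names are hypothetical.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one TCP connection and send back every byte received on it."""
    conn, _addr = server_sock.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:              # empty read: client has finished sending
                break
            chunks.append(data)
        conn.sendall(b"".join(chunks))


def tcp_round_trip(message: bytes) -> bytes:
    # The listening endpoint is the pair (IP address, port).
    # 127.0.0.1 is the loopback address; port 0 lets the OS choose a free port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    host, port = server.getsockname()

    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()

    # The client addresses the server by the same (IP address, port) pair.
    # TCP delivers the bytes reliably and in order over the connection.
    with socket.create_connection((host, port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)   # tell the server the request is complete
        reply = b""
        while True:
            data = client.recv(1024)
            if not data:                  # server has closed its side
                break
            reply += data

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(tcp_round_trip(b"hello grid"))   # b'hello grid'
```

In a real Grid the two endpoints would be on different hosts, so the client would connect to the remote host's IP address and a known service port instead.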
Computing Platforms
1960 onwards - It was recognized that increased speed could potentially be obtained by having more than one processor inside a single computer system. The term parallel computer was coined to describe such systems.
1970s and 1980s - Many parallel computer projects, especially with the advent of low-cost microprocessors.
1990s - Cluster computing: a group of computers interconnected through a network switch to form a computing platform. Commodity computers (PCs) provided a cost-effective solution.
Typical cluster computing configuration
Fig. 1.1
Programming clusters
Message passing programming: messages between processes are specified by the programmer using message-passing routines:
Late 1980s - early 1990s - PVM (Parallel Virtual Machine)
Mid 1990s - MPI (Message Passing Interface)
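The explicit send/receive style of PVM and MPI can be sketched in a few lines. This is not MPI: to keep the example self-contained it uses Python threads and standard-library `queue.Queue` mailboxes as stand-ins for processes and MPI-style send/receive calls, whereas real MPI ranks are separate processes, usually on different computers of a cluster. The names `worker` and `message_passing_sum` are hypothetical.

```python
import queue
import threading


def worker(rank: int, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receive a chunk of numbers, compute its sum, and send the result back."""
    chunk = inbox.get()                 # blocking receive from the master
    outbox.put((rank, sum(chunk)))      # send the partial result to the master


def message_passing_sum(data: list, nworkers: int = 4) -> int:
    """Master/worker sum in which workers communicate only through messages."""
    inboxes = [queue.Queue() for _ in range(nworkers)]
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(nworkers)
    ]
    for t in threads:
        t.start()

    # Scatter: worker `rank` is sent every nworkers-th element starting at rank.
    for rank in range(nworkers):
        inboxes[rank].put(data[rank::nworkers])

    # Gather: receive exactly one (rank, partial_sum) message per worker.
    total = sum(results.get()[1] for _ in range(nworkers))

    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(message_passing_sum(list(range(1, 101))))   # 5050
```

The scatter/compute/gather structure is the pattern that MPI programs typically express with point-to-point or collective routines; only the transport differs here.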
Late 1980s onwards - Condor
To harness “unused” cycles of networked computers for
high performance computing.
A collection of computers could be given over to remote
access automatically when not being used locally.
Widely used as a job scheduler for clusters in addition to
its original purpose of using laboratory computers
collectively.
We will consider Condor in the light of Grid computing.
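The cycle-harvesting idea behind Condor can be illustrated with a toy scheduler. This is not Condor's actual matchmaking mechanism or interface: `Machine` and `schedule` are hypothetical names, and the only policy modeled is the one stated above, namely that a machine accepts remote jobs only while it is not being used locally.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    owner_active: bool                 # True while the local user is working on it
    running: list = field(default_factory=list)


def schedule(jobs: list, machines: list) -> tuple:
    """Place queued jobs on machines whose owners are not using them.

    Returns (placements, still_queued), where placements is a list of
    (job, machine name) pairs. Machines in local use, or already running a
    remote job, receive nothing, so local users keep priority.
    """
    idle = [m for m in machines if not m.owner_active and not m.running]
    placements = []
    for job, machine in zip(jobs, idle):
        machine.running.append(job)
        placements.append((job, machine.name))
    still_queued = jobs[len(placements):]   # jobs beyond the idle capacity wait
    return placements, still_queued


if __name__ == "__main__":
    lab = [
        Machine("lab-pc-1", owner_active=True),
        Machine("lab-pc-2", owner_active=False),
        Machine("lab-pc-3", owner_active=False),
    ]
    placed, queued = schedule(["job1", "job2", "job3"], lab)
    print(placed)   # [('job1', 'lab-pc-2'), ('job2', 'lab-pc-3')]
    print(queued)   # ['job3']
```

A real system must also handle the owner returning while a remote job is running (for example by suspending or migrating the job), which this sketch deliberately omits.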
Software Techniques
Mid 1980s - Remote procedure call (RPC) for invoking a
procedure on a remote computer.
Service registry - introduced with RPC to locate remote
services.
1990s - Object-oriented versions of RPC:
CORBA (Common Object Request Broker Architecture)
Java Remote Method Invocation (RMI)
2000 - Web services
Provide remote actions as with RPC, but invoked through standard protocols and Internet addressing.
Use XML (eXtensible Markup Language), standardized by the W3C in 1998.
Web services and XML were adopted into Grid computing soon after their introduction.
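To make the progression from RPC to XML-based web services concrete, the following sketch uses Python's standard-library XML-RPC modules, in which a remote procedure call is encoded in XML and carried over HTTP to a URL. XML-RPC is a simpler precursor of SOAP-based web services rather than a full web service (there is no WSDL description or service registry here); the function names are hypothetical, and the server runs on the local machine only for demonstration.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a: int, b: int) -> int:
    """The 'remote procedure' exposed by the server."""
    return a + b


def call_remote_add(a: int, b: int) -> int:
    # Server side: register the procedure under a name clients can call.
    # Port 0 lets the OS pick a free port on the loopback interface.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    host, port = server.server_address

    # Serve exactly one request in the background.
    worker = threading.Thread(target=server.handle_request)
    worker.start()

    # Client side: the call looks local, but the arguments are encoded as
    # XML, sent in an HTTP POST to the URL, and the XML reply is decoded.
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        result = proxy.add(a, b)

    worker.join()
    server.server_close()
    return result


if __name__ == "__main__":
    print(call_remote_add(2, 3))   # 5
```

The client only needs the URL and the procedure name, which is the "standard protocols and Internet addressing" point made above.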
Grid Computing History
• Began in mid 1990s with experiments
using computers at geographically
dispersed sites.
• Seminal experiment: the “I-WAY” experiment at the 1995 Supercomputing conference (SC’95), using 17 sites across the US and running:
– 60+ applications
– Existing networks (10 networks)
Globus Project
Led by Ian Foster, a co-developer of the I-WAY demonstration and a founder of the Grid computing concept.
Globus: a middleware Grid computing toolkit.
It has evolved through several implementation versions, although the basic structural components have remained essentially the same:
• Security
• Data management
• Execution management
• Information services
• Run time environment
We will describe Globus in detail later.
Other Grid computing middleware software
Although Globus is widely adopted and is the basis of this course, there are other software infrastructure projects.
1993 - Legion project
Software development started in 1996
Used object-based approach to Grid computing.
First public release at Supercomputing 97 in Nov. 1997.
Led to Avaki company/software, taken over by Sybase Inc.
1990s - UNICORE (UNiform Interface to COmputing
REsources)
European grid computing project.
Initially funded by German Ministry for Education and
Research. Continued with other European funding.
Basis of several European efforts in Grid computing and
elsewhere.
Many similarities to Globus.
Key concepts in the history of Grid computing
Fig. 1.2
Applications
• Originally e-Science applications
– Computationally intensive
• Traditional high performance computing addressing large problems
• Not necessarily one big problem, but a problem that has to be solved repeatedly with different parameters
– Data intensive
• Computational, but with emphasis on large amounts of data to store and process
– Experimental collaborative projects
• Now also e-Business applications
– To improve business models and practices
– Sharing corporate computing resources and databases
– On-demand Grid computing, which indirectly led to cloud computing
Grid Computing versus Cluster Computing
• It is important not to think of Grid computing simply as a large cluster, because the potential and the challenges are different.
• Courses on Grid computing and on cluster computing are quite different.
Cluster computing course
• One learns about:
– Message passing programming using tools such as MPI, and
– Shared memory programming using threads and OpenMP, given that most computers in a cluster today are multi-core shared memory systems.
– Parallel algorithms (lots)
• Network security is not a big issue.
– Usually an ssh connection to the front node of the cluster is sufficient.
– The user logs onto a single compute resource.
• Computers are connected together locally under one administrative domain.
Grid computing course
• Learn about running jobs on remote machines, scheduling jobs, and distributed workflow
• Learn in detail about the underlying Grid infrastructure
• How Internet technologies are applied to Grid computing
• Grid computing software and standards
• Security is an issue.
Grid Computing versus Cluster Computing
• Of course, there are things in common:
• Both courses are hands-on, with programming experience.
• Both use multiple computers.
• Both require a job scheduler to place jobs.
Cloud computing
• A lot of hype about cloud computing at the moment.
• A business model in which services are provided on servers that can be accessed through the Internet.
• The lineage of cloud computing can be traced back to on-demand Grid computing in the early 2000s.
Cloud computing using virtualized resources
Fig. 1.3
• A common thread between Grid computing and cloud computing is the use of the Internet to access resources.
• Cloud computing is driven by the widespread access that the Internet provides and by Internet technologies.
• However, cloud computing is quite distinct from the original purpose of Grid computing.
Grid Computing versus Cloud Computing
• Whereas Grid computing focuses on collaborative and distributed shared resources, cloud computing concentrates on providing services that users pay to use.
• Technology for cloud computing emphasizes:
– use of software as a service (SaaS)
– virtualization (the process of separating a particular user’s software environment from the underlying hardware).
Ian Foster’s checklist
Ian Foster is credited with the development of Grid computing and is sometimes called the father of Grid computing.
He proposed a simple checklist of aspects that are common to most true Grids:
• No centralized control
• Standard open protocols
• Non-trivial quality of service (QoS)
Computational Grid Applications
• Biomedical research
• Industrial research
• Engineering research
• Studies in physics and chemistry
• …
Sample Grid Computing Projects
• Enterprise Grids: Grids formed within an organization for collaboration
– Might still cross the administrative domains of departments, requiring departments to share their resources
– Example: campus Grids
Example: University of Virginia Campus Grid
• Partner Grids: Grids between collaborating organizations
• These make the most use of the potential of Grid computing and collaboration
Environment/Earth
NSF Network for Earthquake Engineering Simulation (NEES)
Goal: transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes.
(From I. Foster)
SCOOP Project
Southeastern Coastal Ocean Observing and
Prediction Program
http://scoop.sura.org/
• Integrating data from regional observing systems for real-time coastal forecasts in the Southeast
• Coastal modelers working with computer scientists to couple models, provide data solutions, deploy ensembles of models on the Grid, and assemble real-time results with GIS technologies.
From: "Urgent Computing for Hurricane Forecasts," Gabrielle Allen, Urgent Computing Workshop, Argonne National Laboratory, April 25th to 26th, 2007, http://scoop.sura.org/documents/UrgentComputing_April2007.pdf
SCOOP Prototype Distributed Laboratory
2005/2006 SCOOP Implementation Team
Funded by ONR & NOAA
• Bedford Institute of Oceanography
• Gulf of Maine Ocean Observing System
• Louisiana State University
• Texas A&M
• University of Alabama, Huntsville
• University of Florida
• University of North Carolina
• Virginia Institute of Marine Science
• External resources, e.g., SURAgrid regional grid infrastructure, www.sura.org/suragrid
• Renaissance Computing Institute
• MCNC
• Southeastern Universities Research Association
From: Dr. Philip Bogden, "Designing a Collaborative Cyberinfrastructure for Event-Driven Coastal Modeling," Supercomputing 2006, Nov 2006, Tampa, Fl.
DOE Earth System Grid
Goal: address technical obstacles to the sharing and analysis of high-volume data from advanced earth system models
www.earthsystemgrid.org
Earth System Grid II
http://www.csm.ornl.gov/Highlights/esg.html
Medicine/Biology
http://www.ediamond.ox.ac.uk/
Project period: 2002-2005
Project period: 2002-2005…
http://www.openmolgrid.org/
Physics
CERN LHC Computing Grid (LCG)
Large Hadron Collider (LHC): experimental facility for complex particle experiments at CERN (European Organization for Nuclear Research, near Geneva, Switzerland).
Started in 2002. Expected to be operational in 2008.
http://public.web.cern.ch/public/en/LHC/LHC-en.html
CERN LHC Computing Grid (LCG)
LCG depends on two major science grid infrastructures:
• EGEE - Enabling Grids for E-Science
• OSG - US Open Science Grid
From: LCG Overview - May 2007 - Les Robertson, http://lcg.web.cern.ch/LCG/dissemination.html
Grid computing infrastructure projects
Not tied to one specific application
Grid Networks
Grid networks for collaborative Grid computing projects
Grids have been set up at the local, national, and international levels throughout the world to promote Grid computing.
TeraGrid
Funded by NSF in 2001, initially to link five supercomputer centers. Hubs were established at Chicago and Los Angeles, and each of the five centers was connected to one hub:
• Argonne National Laboratory (ANL) (Chicago hub)
• National Center for Supercomputing Applications (NCSA) (Chicago hub)
• Pittsburgh Supercomputing Center (PSC) (Chicago hub)
• San Diego Supercomputer Center (SDSC) (LA hub)
• Caltech (LA hub)
Hubs at Chicago and Los Angeles were interconnected using a 40 Gigabit/sec optical backplane network.
Each of the five centers was connected to one hub using 30 Gigabit/sec connections.
State-of-the-art optical lines could reach 10 Gigabit/sec in the early 2000s, so:
• Four lines were used to achieve 40 Gigabit/sec.
• Three lines were used to achieve 30 Gigabit/sec.
TeraGrid circa 2004
TeraGrid was further funded by NSF for the period 2005-2010.
It has developed into a platform for a wide range of Grid applications and is described as:
“the world’s largest, most comprehensive distributed cyberinfrastructure for open scientific research.”
http://www.teragrid.org/about/
TeraGrid as of 2008
Open Science Grid (OSG)
Started around 2005 and received $30 million in funding from NSF and DOE in 2006:
• Boston University
• Brookhaven National Laboratory
• California Institute of Technology
• Columbia University
• Cornell University
• Fermi National Accelerator Laboratory
• Indiana University
• Lawrence Berkeley National Laboratory
• Stanford Linear Accelerator Center
• University of California, San Diego
• University of Chicago
• University of Florida
• University of Iowa
• University of North Carolina/RENCI
• University of Wisconsin-Madison
Current status July 200
SURAgrid as of 2009
Southeastern Universities Research Association
Fig. 1.4
National Grids
Many countries have embraced Grid computing and set up Grid computing infrastructure:
• UK e-Science grid
• Grid-Ireland
• NorduGrid
• DutchGrid
• PIONIER grid (Poland)
• ACI grid (France)
• Japanese grid
• etc.
UK e-Science Grid
Early 2000s
UK National Grid Service
• Follow-up to the UK e-Science Grid
• Founded in 2004 to provide distributed
access to computational and database
resources, with four core sites:
– Universities of Manchester, Oxford and Leeds, and
Rutherford Appleton Laboratory
• By 2008, it had grown to 16 sites.
• Access free to any academic with a
legitimate need.
Multi-national Grids
• From 2000 to 2005, there were several efforts to create Grids spanning many countries.
Multi-national Grid example
ApGrid
• A partnership in the Asia Pacific region involving:
– Australia, Canada, China, Hong Kong, India, Japan, Malaysia, New Zealand, Philippines, Singapore, South Korea, Taiwan, Thailand, USA, and Vietnam.
European-centered multi-national Grids
• Several initiatives, funded by European programs, have brought European countries together to form Grid-like infrastructures that share compute resources.
European-centered multi-national Grid example
DEISA (Distributed European Infrastructure for Supercomputing Applications)
DEISA-1 project from 2004 to 2008.
DEISA-2 started in 2008, to extend to 2011.
DEISA (Distributed European Infrastructure for Supercomputing Applications)
As of 2008
DEISA-2 partners
• Barcelona Supercomputing Centre Spain (BSC)
• Consorzio Interuniversitario per il Calcolo Automatico Italy (CINECA)
• Finnish Information Technology Centre for Science Finland (CSC)
• University of Edinburgh and CCLRC UK (EPCC)
• European Centre for Medium-Range Weather Forecasts UK (ECMWF)
• Research Centre Juelich Germany (FZJ)
• High Performance Computing Centre Stuttgart Germany (HLRS)
• Institut du Développement et des Ressources en Informatique Scientifique - CNRS France (IDRIS)
• Leibniz Rechenzentrum Munich Germany (LRZ)
• Rechenzentrum Garching of the Max Planck Society Germany (RZG)
• Dutch National High Performance Computing Netherlands (SARA)
• Kungliga Tekniska Högskolan Sweden (KTH)
• Swiss National Supercomputing Centre Switzerland (CSCS)
• Joint Supercomputer Center of the Russian Academy of Sciences Russia (JSCC)
The vision of a single universal international Grid, like the Internet/World Wide Web, may never be achieved.
More likely, Grids will connect to other Grids but will maintain their own identity.
Questions
There will be multiple-choice quizzes in the
course (on-line through Moodle).
Quiz
Question: What is a virtual organization?
(a) An imaginary company.
(b) A web-based organization.
(c) A group of people geographically distributed that
come together from different organizations to
work on a Grid project.
(d) A group of people that come together to work on
a virtual reality Grid project.
Question: What is meant by the term cloud
computing?
(a) Atmospheric Computing
(b) Computing using geographically
distributed computers
(c) A facility providing services and software
applications
(d) A secure CIA computing facility
Question: In addition to computers, which of
the following resources can be shared on
a Grid?
(a) Storage
(b) Application Software
(c) Specialized equipment (such as
sensors)
(d) Databases
(e) All of the above
Questions
Grid Computing is using ______________
______________ and interconnected
computers together for computing and
resource ______________.
Questions
The original driving force behind Grid
Computing was ______________
______________ ______________.
Questions
However, Grid Computing is as much about ______________ and ______________ ______________ as it is about high performance computing.
Questions
Another important component of Grid
Computing is ______________
______________, groups of people, both
geographically and organizationally
distributed, working together on a problem,
sharing computers AND other resources.
Questions
Other models of computing that are similar to but different from Grid Computing are
______________ Computing and
______________ Computing.
Questions
Ian Foster's checklist for determining whether a grid is a Grid:
a)______________________
b)______________________
c)______________________