Grid Computing (2) (Special Topics in Computer Engineering) Veera Muangsin 30 January 2004

Outline
• High-Performance Computing
• Grid Computing
• Grid Applications
• Grid Architecture
– Parallel Computer Architectures
– Cluster Architecture
– Grid Architecture
• Grid Middleware
• Grid Services
Parallel Computer Architectures
Parallel Architecture Taxonomy
• Single Instruction Single Data (SISD)
• Multiple Instruction Single Data (MISD)
• Single Instruction Multiple Data (SIMD)
• Multiple Instruction Multiple Data (MIMD)
– Shared Memory MIMD
– Distributed Memory MIMD
SISD: A Conventional Computer
[Figure: Instructions and Data Input feed a single Processor, which produces Data Output]
Speed is limited by the rate at which the computer can transfer information internally.
Ex: PC, Macintosh, Workstations
The MISD Architecture
[Figure: Instruction Streams A, B, and C drive Processors A, B, and C; one Data Input Stream enters the processors and one Data Output Stream leaves them]
More of an intellectual exercise than a practical configuration. A few have been built, but none are commercially available.
SIMD Architecture
[Figure: a single Instruction Stream drives Processors A, B, and C; each processor has its own Data Input stream and Data Output stream]
Ci <= Ai * Bi
Ex: CRAY machine vector processing
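The operation Ci <= Ai * Bi can be sketched in a few lines. This is only a model of the SIMD idea (one instruction, many data elements), not vector hardware: Python runs the loop sequentially, and the function name `simd_multiply` is an illustrative assumption.

```python
def simd_multiply(a, b):
    """Apply the same multiply instruction to every pair (a[i], b[i])."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # One operation (multiply), many data elements.
    return [x * y for x, y in zip(a, b)]


A = [1.0, 2.0, 3.0]
B = [4.0, 5.0, 6.0]
C = simd_multiply(A, B)
print(C)  # [4.0, 10.0, 18.0]
```

On a real vector machine the element-wise multiplies would be issued as one vector instruction rather than as a loop.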
MIMD Architecture
[Figure: Instruction Streams A, B, and C drive Processors A, B, and C respectively; each processor has its own Data Input stream and Data Output stream]
Unlike SISD and MISD machines, an MIMD computer works asynchronously.
• Shared memory (tightly coupled) MIMD
• Distributed memory (loosely coupled) MIMD
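A rough way to picture shared-memory MIMD is a set of threads that each run a different function on different data and write into one shared structure. The sketch below assumes Python threads as stand-ins for processors; the worker functions and the `results` dictionary are made up for illustration.

```python
import threading


def total(data, out):
    out["sum"] = sum(data)            # instruction stream A


def largest(data, out):
    out["max"] = max(data)            # instruction stream B


def count_words(text, out):
    out["words"] = len(text.split())  # instruction stream C


def run_mimd():
    results = {}  # shared memory visible to every worker
    workers = [
        threading.Thread(target=total, args=([1, 2, 3, 4], results)),
        threading.Thread(target=largest, args=([7, 3, 9], results)),
        threading.Thread(target=count_words, args=("grid computing is fun", results)),
    ]
    for w in workers:
        w.start()   # workers proceed asynchronously
    for w in workers:
        w.join()    # wait for all of them to finish
    return results


print(run_mimd())
```

Each worker writes a distinct key, so no lock is needed here. In a distributed-memory MIMD machine the workers would not share `results` and would have to exchange messages instead.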
Clusters
• Distributed Memory MIMD
• The most common architecture in the TOP500
Top 2-5 Clusters
• #2 LANL’s ASCI Q
– 13.88 TFlops
– 8192-node cluster, HP AlphaServer 1.25 GHz
• #3 Virginia Tech’s System X
– 10.28 TFlops
– 1,100-node cluster, Apple G5
• #4 NCSA’s Tungsten
– 9.81 TFlops
– 1,450-node cluster, dual-processor Dell PowerEdge 1750
• #5 PNNL’s MPP2
– 8.63 TFlops
– 980-node cluster, HP Longs Peak, dual Intel Itanium-2 1.5 GHz
Our Parallel Computers
Apollo Cluster
• 6-node cluster
• Athlon XP 2000+ processor, 512 MB memory
• Linux + MPI + PBS (batch scheduler system) + Globus (Grid middleware)
Zeus and Athena
• Two 4-processor Sun Enterprise 420R multiprocessor computers
• 450 MHz UltraSPARC II processors, 1 GB memory
• Solaris + Pthread + MPI
Cluster Architecture
Cluster Middleware
• Resides between the OS and applications and offers an infrastructure for supporting:
– Single System Image (SSI)
– System Availability (SA)
• SSI makes the collection of nodes appear as a single machine
• SA relies on checkpointing and process migration
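Checkpointing can be illustrated with a toy computation that saves its state after every step and resumes from the saved state after a simulated failure. This is a minimal sketch under stated assumptions: the JSON state file, the `stop_after` parameter, and the function names are hypothetical, and real cluster middleware checkpoints whole processes rather than a small dictionary.

```python
import json
import os
import tempfile


def run_with_checkpoint(path, n_steps, stop_after=None):
    """Sum 0..n_steps-1, saving progress to `path` after every step.

    `stop_after` simulates a failure after that many steps in this run.
    """
    state = {"step": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)      # resume from the last checkpoint
    done_this_run = 0
    while state["step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None               # simulated crash, no result yet
        state["total"] += state["step"]
        state["step"] += 1
        done_this_run += 1
        with open(path, "w") as f:
            json.dump(state, f)       # write the checkpoint
    return state["total"]


def demo_checkpoint():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.json")
        first = run_with_checkpoint(path, 10, stop_after=4)   # fails part way
        second = run_with_checkpoint(path, 10)                # resumes
        return first, second


print(demo_checkpoint())  # (None, 45)
```

If the state file sat on a shared file system, the second call could just as well run on a different node, which is the basic idea behind process migration.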
Single System Image Components
• NFS (Network File System)
• NIS (Network Information System)
• NTP (Network Time Protocol)
[Figure: one server connected to three clients]
Programming Environments
• Threads (Cluster of SMPs)
– POSIX Threads
– Java Threads
• Message Passing
– MPI
– PVM
• Virtual Shared Memory
• Batch Scheduling
– PBS, Condor, etc.
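The message-passing model listed above can be sketched without MPI or PVM. The example below is an emulation only: it assumes Python threads as "processes" and `queue.Queue` objects as communication channels, so `put` and `get` play the roles of send and receive. It does not use the MPI API.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers until a None sentinel arrives, then send back their sum."""
    total = 0
    while True:
        msg = inbox.get()        # blocking receive
        if msg is None:
            break
        total += msg
    outbox.put(total)            # send the result back


def run_message_passing(values):
    to_worker = queue.Queue()
    from_worker = queue.Queue()
    t = threading.Thread(target=worker, args=(to_worker, from_worker))
    t.start()
    for v in values:
        to_worker.put(v)         # send one message per value
    to_worker.put(None)          # end-of-data marker
    result = from_worker.get()   # wait for the reply
    t.join()
    return result


print(run_message_passing([1, 2, 3, 4]))  # 10
```

The two sides share no variables and cooperate only through explicit messages, which is the property that lets the same style of program run across the nodes of a cluster.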
Batch Scheduling
• Process distribution
• Load balancing
• Job scheduling
• PBS, Condor, Sun Grid Engine, IBM LoadLeveler, LSF, DQS, …
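As a rough illustration of process distribution and load balancing, the sketch below assigns jobs to the least-loaded node, longest job first. It is a simple greedy heuristic chosen for illustration, not the policy of PBS, Condor, or any other scheduler named above; the job names, runtimes, and node names are made up.

```python
import heapq


def schedule(jobs, node_names):
    """Greedy load balancing.

    jobs: dict mapping job name -> estimated runtime.
    Returns (assignment, loads): jobs per node and total runtime per node.
    """
    heap = [(0, name) for name in node_names]   # (current load, node)
    heapq.heapify(heap)
    assignment = {name: [] for name in node_names}
    # Longest jobs first, ties broken by name for a deterministic result.
    for job, runtime in sorted(jobs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)        # least-loaded node
        assignment[node].append(job)
        heapq.heappush(heap, (load + runtime, node))
    loads = {node: load for load, node in heap}
    return assignment, loads


assignment, loads = schedule({"a": 4, "b": 3, "c": 2, "d": 1}, ["n1", "n2"])
print(assignment)  # {'n1': ['a', 'd'], 'n2': ['b', 'c']}
print(loads)
```

Production batch systems add queues, priorities, reservations, and fairness policies on top of this kind of placement decision.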
Cluster Applications
• Sequential
• Parallel / Distributed (Cluster-aware app.)
– Grand Challenge applications
• Weather Forecasting
• Quantum Chemistry
• Molecular Biology Modeling
• Engineering Analysis (CAD/CAM)
• …
– Web servers, data-mining
Grid Architecture
What is a Grid?
• An infrastructure that dynamically couples
– Computers (PCs, workstations, clusters, traditional supercomputers, and even laptops, notebooks, mobile computers, PDAs, and so on)
– Software (e.g., renting special purpose applications on demand)
– Databases (e.g., transparent access to human genome database)
– Special Instruments (e.g., radio)
– People
• across local/wide-area networks (enterprise, organisations, or Internet) and presents them as a unified resource or problem solving environment.
Grid Infrastructure
TeraGrid
Grid Applications
• Old and new applications getting Grid-enabled via coupling of computers, databases, instruments, people, etc:
– (distributed) Supercomputing
– Collaborative engineering
– high-throughput computing
• large scale simulation & parameter studies
– Remote software access / Renting Software
– Data-intensive computing
– On-demand computing
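A parameter study, as mentioned under high-throughput computing, usually means running the same program once per combination of input values. The sketch below only generates the list of independent jobs; the parameter names and values are invented for illustration, and submitting the jobs to a grid is out of scope.

```python
import itertools


def parameter_sweep(parameters):
    """Yield one job description (a dict) per combination of parameter values."""
    names = sorted(parameters)
    for values in itertools.product(*(parameters[n] for n in names)):
        yield dict(zip(names, values))


jobs = list(parameter_sweep({
    "pressure": [1, 2],
    "temperature": [300, 350, 400],
}))
print(len(jobs))  # 6
print(jobs[0])    # {'pressure': 1, 'temperature': 300}
```

Because the jobs do not depend on one another, they can be farmed out to whatever machines are free, which is why this workload suits a grid well.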
Conceptual view of the Grid
How can the Grid help me?
• Provide access to a global distributed computing environment
– via authentication, authorisation, negotiation, security
• Identify and allocate appropriate resources
– interrogate information services -> resource discovery
– enquire current status/loading via monitoring tools
– decide strategy, e.g. move data or move application
– (co-)allocate resources -> process flow
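The four resource steps above (discover, check status, decide strategy, allocate) can be sketched as a small pipeline. Everything here is a hypothetical model: the `Resource` fields, the in-memory catalogue standing in for an information service, and the rule that compares data size with application size are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    cpus: int
    load: float      # fraction of CPUs busy, 0.0 to 1.0
    has_data: bool   # True if the input data is already stored at this site


def discover(catalogue, min_cpus):
    """Step 1: ask the information service for large-enough resources."""
    return [r for r in catalogue if r.cpus >= min_cpus]


def free_cpus(resource):
    """Step 2: turn the monitored load into an estimate of idle CPUs."""
    return resource.cpus * (1.0 - resource.load)


def plan(catalogue, min_cpus, data_mb, app_mb):
    """Steps 3 and 4: pick a resource and decide what to ship to it."""
    candidates = discover(catalogue, min_cpus)
    if not candidates:
        return None
    best = max(candidates, key=free_cpus)
    if best.has_data:
        return best.name, "move application"
    data_sites = [r for r in candidates if r.has_data]
    if data_sites and data_mb > app_mb:
        # Cheaper to send the small application to where the data lives.
        return max(data_sites, key=free_cpus).name, "move application"
    return best.name, "move data"


CATALOGUE = [
    Resource("A", cpus=8, load=0.5, has_data=True),
    Resource("B", cpus=16, load=0.25, has_data=False),
    Resource("C", cpus=2, load=0.0, has_data=False),
]

print(plan(CATALOGUE, 4, data_mb=5000, app_mb=20))  # ('A', 'move application')
print(plan(CATALOGUE, 4, data_mb=10, app_mb=20))    # ('B', 'move data')
```

A real broker would also weigh network bandwidth, queue wait times, and access policy before allocating anything.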
How can the Grid help me? (2)
• Schedule tasks and analyse results
– ensure required application code is available on remote machine
– transfer or replicate data and update catalogues
– monitor execution and resolve problems as they occur
– retrieve and analyse results, e.g. using local visualization
To make this happen you need …
• agreed protocols (Grid protocols)
• defined application programming interfaces (APIs)
• distributed data management
• availability of current status of resources
• monitoring tools
• accepted authentication procedures and policies
• network traffic management
Grid Components
[Figure: layered stack, top to bottom]
• Grid Apps.: Applications and Portals
– Scientific, Engineering, Collaboration, Prob. Solving Env., Web enabled Apps, …
• Grid Tools: Development Environments and Tools
– Languages, Libraries, Debuggers, Monitoring, Resource Brokers, Web tools, …
• Grid Middleware: Distributed Resources Coupling Services
– Comm., Sign on & Security, Information, Process, Data Access, QoS, …
• Grid Fabric: Local Resource Managers
– Operating Systems, Queuing Systems, Libraries & App Kernels, TCP/IP & UDP, …
• Grid Fabric: Networked Resources across Organisations
– Computers, Clusters, Storage Systems, Data Sources, Scientific Instruments, …
Before the Grid
[Figure: User and Application connect over the Network to Site A and Site B]
The User is responsible for resolving the complexities of the environment
• independent sites
• independent hardware and software
• independent user ids
• security policy requiring local connection to the machine
First Step to the Grid
Metacenter
[Figure: User and Application connect over the Network, through a Centralized Scheduler and file staging, to Site A and Site B]
A layer of abstraction is added that hides some of the complexities associated with running jobs in a distributed computing environment; however, limitations exist.
• Two or more resources connected in a controlled user environment
Constraints
• common architecture
• single name space
• common scheduler
The Grid Today
1 Request info from the grid
2 Get response
3 Make selection and submit job
[Figure: User and Application send steps 1 to 3 through the Grid Middleware Infrastructure and the Network to Site A and Site B]
The underlying infrastructure is abstracted into defined APIs, thereby simplifying developer and user access to resources; however, this layer is not intelligent.
Common Middleware
– abstracts independent hardware, software, and user ids into a service layer with defined APIs
– comprehensive security
– allows for site autonomy
– provides a common infrastructure based on middleware
The Near Future Grid
[Figure: User and Application reach Site A and Site B over the Network through two layers: Intelligent, Customized Middleware on top of Grid Middleware - Infrastructure APIs (service oriented)]
Resources are accessed via various intelligent services that access infrastructure APIs.
The result: the Scientist and Application Developer can focus on science and not on systems management.
Customizable Grid Services built on defined Infrastructure APIs
• automatic selection of resources
• information products tailored to users
• accountless processing
• flexible interface: web based, command line, APIs
How the User Sees a Grid
• A set of grid functions that are available as
– Application programmer interfaces (APIs)
– Command-line functions
• After authentication, functions can be used to
– Spawn jobs on different processors with a single command
– Access data on remote systems
– Move data from one processor to another
– Support the communication between programs executing on different processors
– Discover the properties of computational resources available on the grid using the grid information service
– Use a broker to select the best place for a job to run and then negotiate the reservation and execution (coming soon).
Tom Hinke
Many GRID Projects and Initiatives
• Public Forums
– Computing Portals
– Grid Forum
– European Grid Forum
– IEEE TFCC!
– GRID’2000 and more.
• Australia
– Nimrod/G
– EcoGrid and GRACE
– DISCWorld
• Europe
– UNICORE
– MOL
– METODIS
– Globe
– Poznan Metacomputing
– CERN Data Grid
– MetaMPI
– DAS
– JaWS
– and many more...
• Public Grid Initiatives
– Distributed.net
– SETI@Home
– Compute Power Grid
• USA
– Globus
– Legion
– JAVELIN
– AppLeS
– NASA IPG
– Condor
– Harness
– NetSolve
– NCSA Workbench
– WebFlow
– EveryWhere
– and many more...
• Japan
– Ninf
– Bricks
– and many more...
http://www.gridcomputing.com/
NetSolve
Client/Server/Agent-Based Computing
Easy-to-use tool to provide efficient and uniform access to a variety of scientific packages on UNIX platforms
• Client-Server design
• Network-enabled solvers
• Seamless access to resources
• Non-hierarchical system
• Load Balancing
• Fault Tolerance
• Interfaces to Fortran, C, Java, Matlab, more
[Figure: NetSolve Client, NetSolve Agent, Network Resources, and Software Repository, linked by request, choice, and reply arrows]
Software is available: www.cs.utk.edu/netsolve/
Nimrod - A Job Management System
http://www.dgs.monash.edu.au/~davida/nimrod.html
Job processing with Nimrod
Nimrod/G Architecture
[Figure: several Nimrod/G Clients connect to the Nimrod Engine, which works with the Schedule Advisor, Trading Manager (TM), Persistent Store, Dispatcher, and Grid Explorer (GE); Middleware Services and Grid Information Services (GIS) link these to resources on the GUSTO Test Bed, each with an RM & TS]
RM: Local Resource Manager, TS: Trade Server
Compute Power Market
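[Figure: on the User side, an Application submits to a Resource Broker made up of a Job Control Agent, Schedule Advisor, Grid Explorer, Trade Manager, and Deployment Agent. The broker consults the Grid Information Server and trades with the Trade Server of a Resource Domain, which provides Charging Alg., Accounting, Resource Reservation, Resource Allocation, and other services for resources R1, R2, …, Rn]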
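The trading idea behind a compute power market can be sketched as picking the cheapest resource that still meets the user's deadline and budget. This is an illustrative assumption about how a schedule advisor and trade manager might cooperate, not the actual Nimrod/G algorithm; the offer format, prices, and speeds are invented.

```python
def pick_resource(offers, job_cpu_hours, deadline_hours, budget):
    """Return the cheapest feasible offer name, or None if nothing qualifies.

    offers: list of (name, price_per_cpu_hour, speed), where speed is the
    number of CPU-hours of work the resource completes per wall-clock hour.
    """
    feasible = []
    for name, price, speed in offers:
        wall_hours = job_cpu_hours / speed
        cost = job_cpu_hours * price
        if wall_hours <= deadline_hours and cost <= budget:
            feasible.append((cost, wall_hours, name))
    # Cheapest first; ties go to the faster resource.
    return min(feasible)[2] if feasible else None


OFFERS = [
    ("R1", 1.0, 2.0),    # mid price, mid speed
    ("R2", 0.5, 1.0),    # cheap but slow
    ("R3", 3.0, 10.0),   # fast but expensive
]

print(pick_resource(OFFERS, 10, deadline_hours=6, budget=20))   # R1
print(pick_resource(OFFERS, 10, deadline_hours=12, budget=20))  # R2
print(pick_resource(OFFERS, 10, deadline_hours=2, budget=20))   # None
```

Relaxing the deadline lets the cheaper resource win, while a tight deadline with a small budget leaves no feasible offer.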
Globus Toolkit
• Grid computing middleware
– Software between the hardware and high-level services
– Basic libraries, services, command-line programs
• Most common middleware used in grids
• Integrated with Web Services
Globus Software Architecture
[Figure: four services layered on a common security infrastructure]
• Grid SSH (built on SSH)
– login
– execute commands
– copy files
• Monitoring and Discovery Service (MDS) (built on LDAP, a distributed directory service)
– information about resources and services
• Grid FTP
– get and put files
– 3rd party copy
– interactive file management
– parallel transfers
• Globus Resource Allocation Manager (GRAM) (on top of job management systems: PBS, LSF, fork/exec)
– execute remote applications
– stage executable, stdin, stdout, stderr
• Grid Security Infrastructure (GSI) (X.509 Certificates and SSL/TLS; credentials for users, services, hosts)
– authentication
– secure communication
– single sign on
– delegation of credentials
– authorization
Globus Deployment Architecture
[Figure: Users work on a Globus client system, either directly or through a Web portal or application/tool, using GRAM, Grid SSH, MDS, and Grid FTP clients. Clients are programs and libraries. An MDS server system runs the MDS GIIS. Each Globus server system runs a GRAM Server, Grid SSH Server, Grid FTP Server, and MDS GRIS on top of a local job manager such as PBS or LSF]
For More Information
• Globus Project™
– www.globus.org
• Grid Forum
– www.gridforum.org
• Book (Morgan Kaufmann)
– www.mkp.com/grids