Parallel, Cluster and Grid
Computing
By
P.S.Dhekne, BARC
[email protected]
August 23, 2006
Talk at SASTRA
1
High Performance Computing
• Branch of computing that deals with
extremely powerful computers and the
applications that use them
• Supercomputers: Fastest computer at any
given point of time
• HPC Applications: Applications that cannot
be solved by conventional computers in a
reasonable amount of time
Supercomputers
• Characterized by very high speed, very large
memory
• Speed measured in terms of number of
floating point operations per second
(FLOPS)
• Fastest Computer in the world: “Earth
Simulator” (NEC, Japan) – 35 Tera Flops
• Memory in the order of hundreds of
gigabytes or terabytes
HPC Technologies
• Different approaches for building
supercomputers
– Traditional : Build faster CPUs
• Special Semiconductor technology for increasing
clock speed
• Advanced CPU architecture: Pipelining, Vector
Processing, Multiple functional units etc.
– Parallel Processing : Harness large number of
ordinary CPUs and divide the job between them
Traditional Supercomputers
• E.g.: CRAY
• Very complex architecture
• Very high clock speed results in very high heat
dissipation and advanced cooling techniques
(Liquid Freon / Liquid Nitrogen)
• Custom built or produced as per order
• Extremely expensive
• Advantage: program development is
conventional and straightforward
Alternative to Supercomputer
• Parallel Computing: the use of multiple
computers or processors working together on a
single problem; harness large number of ordinary
CPUs and divide the job between them
– each processor works on its section of the
problem
– processors are allowed to exchange information with other processors via a fast interconnect path

[Figure: a 10,000-iteration loop run sequentially on one CPU vs. split across 4 CPUs – cpu 1: 1-2500, cpu 2: 2501-5000, cpu 3: 5001-7500, cpu 4: 7501-10000]
• Big advantages of parallel computers:
1. total computing performance is a multiple of the number of processors used
2. total memory is very large, enough to fit very large programs
3. much lower cost, and can be developed in India
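The loop split sketched above can be mimicked with Python's standard multiprocessing module (a toy illustration, not ANUPAM code): summing 10,000 numbers by giving each of 4 worker processes one quarter of the range and combining the partial results.

```python
from multiprocessing import Pool

def partial_sum(bounds):
    """Each CPU works on its own section of the 1..10000 loop."""
    lo, hi = bounds
    return sum(range(lo, hi + 1))

if __name__ == "__main__":
    # The four sections from the figure: cpu 1 gets 1-2500, cpu 2 gets 2501-5000, ...
    chunks = [(1, 2500), (2501, 5000), (5001, 7500), (7501, 10000)]
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, chunks)  # each worker runs independently
    print(sum(partials))  # combining the partial results gives 50005000
```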
Types of Parallel Computers
• Parallel computers are classified as
– shared memory
– distributed memory
• Both shared and distributed memory systems have:
1. processors: now generally commodity processors
2. memory: now generally commodity DRAM/DDR
3. network/interconnect: between the processors or memory
Interconnect Method
There is no single way to connect a bunch of processors
• The manner in which the nodes are connected defines the network & topology
• The best choice would be a fully connected network (every processor to every other), but this is unfeasible for cost and scaling reasons. Instead, processors are arranged in some variation of a grid, torus, tree, bus, mesh or hypercube.
[Figure: 3-D hypercube, 2-D mesh and 2-D torus topologies]
Block Diagrams …
[Figure: a shared memory parallel computer – processors P1-P5 connected through an interconnection network to a single shared memory]
Block Diagrams …
[Figure: a distributed memory parallel computer – processors P1-P5, each with its own local memory M1-M5, connected through an interconnection network]
Performance Measurements
• Speed of a supercomputer is generally
denoted in FLOPS (Floating Point
Operations per second)
– MegaFLOPS (MFLOPS): Million (10^6) FLOPS
– GigaFLOPS (GFLOPS): Billion (10^9) FLOPS
– TeraFLOPS (TFLOPS): Trillion (10^12) FLOPS
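The unit prefixes above translate to plain FLOPS by a simple multiplication; a trivial helper (not from the talk) makes the scales concrete.

```python
UNITS = {"MFLOPS": 10**6, "GFLOPS": 10**9, "TFLOPS": 10**12}

def to_flops(value, unit):
    """Convert a speed quoted in MFLOPS/GFLOPS/TFLOPS to plain FLOPS."""
    return value * UNITS[unit]

# The Earth Simulator's 35 TFLOPS is 35,000,000,000,000 floating point operations per second
print(to_flops(35, "TFLOPS"))
```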
Sequential vs. Parallel
Programming
• Conventional programs are called sequential (or
serial) programs since they run on one cpu only as
in a conventional (or sequential) computer
• Parallel programs are written such that they get
divided into multiple pieces, each running
independently and concurrently on multiple cpus.
• Converting a sequential program to a parallel
program is called parallelization.
Terms and Definitions
• Speedup of a parallel program:
= Time taken on 1 cpu / Time taken on ‘n’ cpus
• Ideally Speedup should be ‘n’
Terms and Definitions
• Efficiency of a parallel program:
= Speedup / No. of processors
• Ideally efficiency should be 1 (100 %)
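The two definitions above translate directly into code; the timings in the example are made-up numbers purely for illustration.

```python
def speedup(t1, tn):
    """Speedup = time taken on 1 cpu / time taken on 'n' cpus."""
    return t1 / tn

def efficiency(t1, tn, n):
    """Efficiency = speedup / number of processors (ideally 1, i.e. 100%)."""
    return speedup(t1, tn) / n

# A job taking 100 s on 1 cpu and 20 s on 8 cpus:
print(speedup(100, 20))        # 5.0 (less than the ideal 8)
print(efficiency(100, 20, 8))  # 0.625 (62.5%)
```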
Problem areas in parallel programs
• Practically, speedup is always less than ‘n’ and
efficiency is always less than 100%
• Reason 1: Some portions of the program cannot be
run in parallel (cannot be split)
• Reason 2: Data needs to be communicated among
the cpus. This involves time for sending the data
and time in waiting for the data
• The challenge in parallel programming is to
suitably split the program into pieces such that
speedup and efficiencies approach the maximum
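Reason 1 is exactly what Amdahl's law (not named on the slide) quantifies: if a fraction s of the run time cannot be split, speedup is capped at 1/s no matter how many CPUs are added. A sketch:

```python
def amdahl_speedup(s, n):
    """Speedup on n cpus when a fraction s of the run time cannot be parallelized."""
    return 1.0 / (s + (1.0 - s) / n)

# Even 5% serial code hurts badly:
print(round(amdahl_speedup(0.05, 16), 2))  # 9.14, not 16
print(amdahl_speedup(0.0, 16))             # 16.0 only in the ideal, fully parallel case
```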
Parallelism
• Property of an algorithm that makes it amenable to parallelization
• Parts of the program that have inherent parallelism can be parallelized (divided into multiple independent pieces that can execute concurrently)
Types of parallelism
• Control parallelism (Algorithmic
parallelism):
– Different portions (or subroutines/functions)
can execute independently and concurrently
• Data parallelism
– Data can be split up into multiple chunks and
processed independently and concurrently
– Most scientific applications exhibit data
parallelism
Parallel Programming Models
• Different approaches are used in the
development of parallel programs
• Shared Variable Model: Best suited for
shared memory parallel computers
• Message Passing Model: Best suited for
distributed memory parallel computers
Message Passing
• Most commonly used method of parallel
programming
• Processes in a parallel program use
messages to transfer data between
themselves
• Also used to synchronize the activities of
processes
• Typically consists of send/receive
operations
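The send/receive pattern can be sketched with Python's standard multiprocessing pipes, standing in for a real message-passing library such as MPI (the squared-sum workload is an arbitrary placeholder):

```python
from multiprocessing import Process, Pipe

def worker(conn):
    """Receive work in a message, compute, send the result back."""
    data = conn.recv()                    # blocking receive
    conn.send(sum(x * x for x in data))   # send the result as a message
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send([1, 2, 3, 4])   # data travels between processes as a message
    print(parent_end.recv())        # prints 30
    p.join()
```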
How we started
In the absence of any standardization, initial parallel machines were designed with varied architectures having different network topologies.

BARC started supercomputing development in 1990-91 to meet the computing demands of in-house users, with the aim of providing inexpensive high-end computing, and has built several models since.
Selection Of Main Components
• Architecture
– Simple
– Scalable
– Processor independent
• Interconnecting Network
– Scalable bandwidth
– Architecture independent
– Cost effective
• Parallel Software Environment
– User friendly
– Portable
– Comprehensive debugging tools
SINGLE CLUSTER OF ANUPAM
[Figure: a single ANUPAM cluster – master Node 0 and slave Nodes 1-15 on a Multibus II backplane; each node an i860/XP at 50 MHz with 128 KB-512 KB cache and 64 MB-256 MB memory; SCSI disks, Ethernet and X.25 connections to terminals and other systems]
64 NODE ANUPAM CONFIGURATION
[Figure: 64-node ANUPAM configuration – four 16-node Multibus II clusters (nodes 0-15 each) interconnected by wide SCSI links along the X and Y directions]
8 NODE ANUPAM CONFIGURATION
ANUPAM APPLICATIONS
[Figures: applications run on the 64-node ANUPAM – finite element analysis, protein structures, and a pressure contour in the LCA duct]
3-D Plasma Simulations
2-D Atmospheric Transport Problem
Estimation of neutron-gamma dose up to 8 km from the source

Problem specification: cylindrical geometry; radius = 8 km; height = 8 km; no. of mesh points 80,000; no. of energy groups 42; SN order 16

Conclusions: use of 10 processors of the BARC computer system reduces the run time by 6 times.

[Figure: CPU time (hours) vs. number of processors (0-20) for simulations on the BARC computer system]
OTHER APPLICATIONS OF ANUPAM SYSTEM
* Protein Structure Optimization
* Ab Initio Electronic Structure Calculations
* Neutron Transport Calculations
* Ab Initio Molecular Dynamics Simulations
* Computational Structure Analysis
* Computational Fluid Dynamics (ADA, LCA)
* Computational Turbulent Flow
* Simulation Studies in Gamma-Ray Astronomy
* Finite Element Analysis of Structures
* Weather Forecasting
Key Benefits
• Simple to use
• ANUPAM uses the familiar Unix environment with
large memory & specially designed parallelizing tools
• No parallel language needed
• PSIM – parallel simulator runs on any Unix based
system
• Scalable and processor independent
Bus based architecture
• Dynamic Interconnection network providing full
connectivity and high speed and TCP/IP support
• Simple and general purpose industry backplane bus
• Easily available off-the-shelf, low cost
• MultiBus, VME Bus, Futurebus … many solutions
Disadvantages
• One communication at a time
• Limited scalability of applications in bus based systems
• Lengthy development cycle for specialized hardware
• i860, Multibus-II reaching end of line, so radical change
in architecture was needed
Typical Computing, Memory & Device Attachment
[Figure: a CPU connected to memory over the memory bus, with a device card attached via the input/output bus]
Memory Hierarchy
[Figure: hierarchy from CPU through cache, local memory and remote memory – moving away from the CPU, speed falls while size grows and cost/bit falls]
Ethernet: The Unibus of the 80s (UART of the 90s)
[Figure: clients and compute, print, file and comm servers sharing a 2 km Ethernet]
Ethernet: The Unibus of the 80s
• Ethernet designed for
– DEC: interconnect VAXen, terminals
– Xerox: enable distributed computing (Sun Micro)
• Ethernet evolved into a hodge-podge of nets and boxes
• Distributed computing was very hard, evolving into
– expensive, asymmetric, hard to maintain
– client-server for a VendorIX
– apps bound to a configuration & VendorIX!
– the network is NOT the computer
• The Internet model is less hierarchical, more democratic
Networks of workstations (NOW)
• New concept in parallel computing and parallel
computers
• Nodes are full-fledged workstations having cpu,
memory, disks, OS etc.
• Interconnection through commodity networks like
Ethernet, ATM, FDDI etc.
• Reduced Development Cycle, mostly restricted to
software
• Switched Network topology
Typical ANUPAM x86 Cluster
[Figure: nodes NODE-01 to NODE-16 and a file server connected over CAT-5 cable to a Fast Ethernet switch with an uplink]
ANUPAM - Alpha
• Each node is a
complete Alpha
workstation with 21164
cpu, 256 MB memory,
Digital UNIX OS etc.
• Interconnection through an ATM switch with fiber optic links @ 155 Mbps
PC Clusters : Multiple PCs
• Over the last few years, the computing power of Intel PCs has gone up considerably (from 100 MHz to 3.2 GHz in 8 years), with fast, cheap networks & disks built in
• Intel processors beating conventional RISC chips in
performance
• PCs are freely available from several vendors
• Emergence of free Linux as a robust, efficient OS with
plenty of applications
• Linux clusters (use of multiple PCs) are now rapidly
gaining popularity in academic/research institutions
because of low cost, high performance and availability of
source code
Trends in Clustering
Clustering is not a new idea, but it has now become affordable and can be built easily (plug & play). Even small colleges have it.
Cluster based Systems
Clustering is replacing all traditional computing platforms and can be configured depending on the method and application area:
• LB Cluster - network load distribution and load balancing
• HA Cluster - increase the availability of systems
• HPC Cluster (Scientific Cluster) - computation-intensive
• Web farms - increase HTTP/sec
• Rendering Cluster - increase graphics speed
HPC: High Performance Computing; HA: High Availability; LB: Load Balancing
Computing Trends
• It is fully expected that the substantial and exponential
increases in performance of IT will continue for the
foreseeable future ( at least next 50 years) in terms of
– CPU Power ( 2X – every 18 months)
– Memory Capacity (2X – every 18 months)
– LAN/WAN speed (2X – every 9 months)
– Disk Capacity (2X – every 12 months)
• It is expected that all computing resources will continue
to become cheaper and faster, though not necessarily
faster than the computing problems we are trying to
solve.
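The doubling periods above compound quickly, and a one-liner shows how unevenly the resources grow (illustrative arithmetic only):

```python
def growth(years, doubling_months):
    """Growth factor of a resource that doubles every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)

# Over 6 years: CPU power and memory (18-month doubling) grow 16x,
# disks (12 months) 64x, but LAN/WAN speed (9 months) 256x -
# one reason interconnects change even faster than processors.
print(growth(6, 18), growth(6, 12), growth(6, 9))  # 16.0 64.0 256.0
```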
Processor Speed Comparison

Sr. No.  Processor                          SPECint_base2000  SPECfp_base2000
1        Pentium-III, 550 MHz               231               191
2        Pentium-IV, 1.7 GHz                574               591
3        Pentium-IV, 2.4 GHz                852               840
4        Alpha, 833 MHz (64 bit)            511               571
5        Alpha, 1 GHz (64 bit)              621               776
6        Intel Itanium-2, 900 MHz (64 bit)  810               1356
Technology Gaps
• Sheer CPU speed is not enough
• Matching processing speed with compiler performance, cache size and speed, memory size and speed, disk size and speed, network speed, and interconnect & topology is also important
• Application and middleware software also add to performance degradation if not well tuned
Interconnect-Related Terms
• Most critical component of HPC still remains to be
interconnect technology and network topology
• Latency:
– Networks: how long does it take to start sending a "message"? Measured in microseconds (startup time)
– Processors: how long does it take to output the result of a pipelined operation such as a floating point add or divide?
• Bandwidth: what data rate can be sustained once the message is started? Measured in Mbytes/sec or Gbytes/sec
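A common first-order model combines the two terms: transfer time = startup latency + message size / bandwidth. With bandwidth in MB/s, size/bandwidth conveniently comes out in microseconds (1 MB/s = 1 byte/µs); the Fast Ethernet figures in the comments are the measured values quoted later in this talk.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_mb_per_s):
    """First-order cost model: startup latency plus serialization time (in us)."""
    return latency_us + message_bytes / bandwidth_mb_per_s

# Fast Ethernet measured at ~88 us latency, ~11 MB/s bandwidth:
small = transfer_time_us(64, 88, 11)         # ~93.8 us: latency dominates
large = transfer_time_us(1_000_000, 88, 11)  # ~91,000 us: bandwidth dominates
print(round(small, 1), round(large))
```

This is why many small messages cost far more than one large message of the same total size.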
High Speed Networking
• Network bandwidth is improving
– LANs now run at 10, 100, 1000, 10000 Mbps
– WANs are based on ATM with 155, 622, 2500 Mbps
• With constant advances in communication and information technology, processors and networks are merging into one infrastructure
• System Area or Storage Area Networks (Myrinet, Craylink, Fibre Channel, SCI etc.): low latency, high bandwidth, scalable to large numbers of nodes
Processor Interconnect Technology

Sr. No.  Communication Technology  Bandwidth (Mbits/sec)  Latency (microseconds)
1        Fast Ethernet             100                    60
2        Gigabit Ethernet          1000                   100
3        SCI based Woulfkit        2500                   1.5-14
4        InfiniBand                10,000                 <400 nsec
5        Quadric Switch            2500                   2-10
6        10-G Ethernet             10,000                 <100
Interconnect Comparison

Feature    Fast Ethernet   Gigabit              SCI
Latency    88.07 µs        44.93 µs (16.88 µs)  5.55 µs (1.61 µs)
Bandwidth  11 Mbytes/sec   90 Mbytes/sec        250 Mbytes/sec

• Plain figures: across machines
• In parentheses: two processes in a given machine
Switched Network Topology
• Interconnection Networks such as ATM, Ethernet etc.
are available as switched networks
• Switch implements a dynamic interconnection network
providing all-to-all connectivity on demand
• Switch allows multiple independent communications
simultaneously
• Full duplex mode of communication
• Disadvantages: single point of failure, finite capacity, limited scalability, and cost at higher node counts
Scalable Coherent Interface (SCI)
• High Bandwidth, Low latency SAN interconnect for
clusters of workstations (IEEE 1596)
• Standard for point to point links between computers
• Various topologies possible: Ring, Tree, Switched
Rings, Torus etc.
• Peak Bandwidth: 667 MB/s, Latency < 5 microseconds
ANUPAM-Xeon Performance
[Figure: ARUNA 2-D torus logical schematic – nodes Aruna01-Aruna64 arranged in a 2-D torus, with Gigabit Ethernet and SCI links along the X and Y directions]

                          Gigabit Ethernet   SCI
Total latency (µs)        44.93              5.55
Latency within node (µs)  16.88              1.61

[Figure: bandwidth (MB/s) vs. packet size (8 bytes to 2 MB) for Gigabit Ethernet at MTUs from 1500 to 9000 and for SCI – SCI sustains up to ~250 MB/s]
ANUPAM P-Xeon Parallel Supercomputer
• No. of nodes: 128
• Compute node: dual Intel Pentium Xeon @ 2.4 GHz, 2 GB memory per node
• File server: dual Intel based with RAID 5, 360 GB
• Interconnection network: 64-bit Scalable Coherent Interface (2-D torus connectivity)
• Software: Linux, MPI, PVM, ANULIB
• Anulib tools: PSIM, FFLOW, SYN, PRE, S_TRACE
• Benchmarked performance: 362 GFLOPS for High Performance Linpack
ANUPAM clusters
• ASHVA - year of introduction: 2001; sustained speed on 84 P-III processors: 15 GFLOPS
• ANU64 - year: 2002; sustained speed on 64 P-IV CPUs: 72 GFLOPS
• ARUNA - year of introduction: 2003; sustained speed on 128 Xeon processors: 365 GFLOPS
ANUPAM series of supercomputers after 1997 (ANUPAM-Pentium)

Date       Model           Node Microprocessor    Inter Comm.   Mflops
Oct/98     4-node PII      Pentium PII/266 MHz    Ethernet/100  248
Mar/99     16-node PII     Pentium PII/333 MHz    Ethernet/100  1300
Mar/00     16-node PIII    Pentium PIII/550 MHz   Gigabit Eth.  3500
May/01     84-node PIII    Pentium PIII/650 MHz   Gigabit Eth.  15000
June/02    64-node PIV     Pentium PIV/1.7 GHz    Giga & SCI    72000
August/03  128-node Xeon   Pentium Xeon 2.4 GHz   Giga & SCI    362000
Table of comparison (with precise (64 bit) computations)

Program name           1 + 4 Anupam Alpha   1 + 8 Anupam Alpha   Cray XMP 216
T-80 (24 Hr forecast)  14 minutes           11 minutes           12.5 minutes

All timings are wall clock times
BARC’s New Super Computing Facility
External View of New Super Computing Facility
512 Node ANUPAM-AMEYA
• BARC’s new Supercomputing facility was inaugurated by the Honorable PM, Dr. Manmohan Singh, on 15th November 2005.
• A 512-node ANUPAM-AMEYA Supercomputer was developed with a speed of 1.7 Teraflops for the HPC benchmark.
• A 1024-node Supercomputer (~5 Teraflops) is planned during 2006-07
• Being used by in-house users
Support equipment
• Terminal servers
– Connect serial consoles from 16 nodes onto a single
ethernet link
– Consoles of each node can be accessed using the terminal
servers and management network
• Power Distribution Units
– Network controlled 8 outlet power distribution unit
– Facilities such as power sequencing, power cycling of each
node possible
– Current monitoring
• Racks
– 14 racks of 42U height, 1000 mm depth, 600 mm width
Software components
• Operating System on each node of the cluster is
Scientific Linux 4.1 for 64 bit architecture
– Fully compatible with Redhat Enterprise Linux
– Kernel version 2.6
• ANUPRO Parallel Programming Environment
• Load Sharing and Queuing System
• Cluster Management
ANUPRO Programming Environment
• ANUPAM supports the following programming interfaces
– MPI
– PVM
– Anulib
– BSD Sockets
• Compilers
– Intel Fortran Compiler
– Portland Fortran Compiler
• Numerical Libraries
– BLAS (ATLAS and MKL implementations)
– LAPACK (Linear Algebra Package)
– Scalapack (Parallel Lapack)
• Program development tools
– MPI performance monitoring tools (Upshot, Nupshot, Jumpshot)
– ANUSOFT tool suite (FFLOW, S_TRACE, ANU2MPI, SYN)
Load Sharing and Queuing System
• Torque based system resource manager
• Keeps track of available nodes in the system
• Allots nodes to jobs
• Maintains job queues with job priority and reservations
• User level commands to submit jobs, delete jobs, find out job status, and find out the number of available nodes
• Administrator level commands to manage nodes, jobs and queues, priorities, reservations
ANUNETRA : Cluster Management System
• Management and Monitoring of one or more clusters from a
single interface
• Monitoring functions:
– Status of each node and different metrics (load, memory, disk space,
processes, processors, traffic, temperature and so on)
– Jobs running on the system
– Alerts to the administrators in case of malfunctions or anomalies
– Archival of monitored data for future use
• Management functions:
– Manage each node or groups of nodes (reboots, power cycling,
online/offline, queuing and so on)
– Job management
Metric View on Ameya
Node View on Ameya
SMART: Self Monitoring And Rectifying Tool
• Service running on each node which keeps
track of things happening in the system
– Hanging jobs
– Services terminated abnormally
• SMART takes corrective action to remedy
the situation and improve availability of the
system
Accounting System
• Maintains database entries for each and every job run on the system
– Job ID, user name, number of nodes
– Queue name, API (mpi, pvm, anulib)
– Submit time, start and end time
– End status (finished, cancelled, terminated)
• Computes system utilization
• User wise, node wise statistics for different
periods of time
Accounting system – Utilization plot
Other tools
• Console logger
– Logs all console messages of each node into a database for
diagnostics purposes
• Sync tool
– Synchronizes important files across nodes
• Automated backup
– Scripts for taking periodic backups of user areas onto the
tape libraries
• Automated installation service
– Non-interactive installation of node and all required software
Parallel File System (PFS)
• PFS gives a different view of the I/O system with its unique architecture, and hence provides an alternative platform for the development of I/O intensive applications
• Data (file) striping in a distributed environment
• Supports collective I/O operations
• Interface as close to a standard Linux interface as possible
• Fast access to file data in a parallel environment irrespective of how and where the file is distributed
• Parallel file scatter and gather operations
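The data-striping idea can be sketched as a round-robin layout function. This is a generic illustration of striping, not BARC's actual PFS format; the stripe size and server count are arbitrary example values.

```python
def stripe_layout(file_size, stripe_size, n_servers):
    """Map each stripe of a file to an I/O server, round-robin.
    Returns (stripe_index, server_index) pairs."""
    n_stripes = -(-file_size // stripe_size)  # ceiling division
    return [(i, i % n_servers) for i in range(n_stripes)]

# A 1 MB file in 64 KB stripes over 4 I/O servers:
layout = stripe_layout(1 << 20, 64 * 1024, 4)
print(len(layout))                     # 16 stripes
print([srv for _, srv in layout[:6]])  # [0, 1, 2, 3, 0, 1] - servers visited in turn
```

Because consecutive stripes live on different servers, a large read or write is spread over all servers and can proceed in parallel.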
Architecture of PFS
[Figure: PFS architecture – the PFS manager (PFS daemon with MIT) and server (with DIT) handle requests, while I/O daemons, each with a local LIT, on nodes 1 to N move data through the I/O manager]
Complete solution to scientific problems by exploiting parallelism for
• Processing (parallelization of computation)
• I/O (parallel file system)
• Visualization (parallelized graphic pipeline / Tile Display Unit)
Domain-specific Automatic Parallelization (DAP)
• Domain: a class of applications, such as FEM applications
• Experts use domain-specific knowledge
• DAP is a combination of an expert system and a parallelizing compiler
• Key features: interactive process, experience-based heuristic techniques, and a visual environment
Operation and Management Tools
• Manual installation of all nodes with O.S., compilers, libraries etc. is not only time consuming but also tedious and error prone
• Constant monitoring of hardware/networks and software is essential to report the health of the system during 24/7 operation
• Debugging and communication measurement tools are needed
• Tools are also needed to measure load, find free CPUs, predict load, checkpoint/restart, replace failed nodes etc.
We have developed all these tools to enrich
ANUPAM software environment
Limitations of Parallel Computing
• Programming so many nodes concurrently remains a major barrier for most applications
- Source code should be known & parallelizable
- Scalable algorithm development is not an easy task
- All resources are allotted to a single job
- The user has to worry about message passing, synchronization and scheduling of his job
- Only 15% of users require these solutions; the rest can manage with normal PCs
Fortunately a lot of free MPI codes and even parallel solvers are now available
Still there is a large gap between technology & usage, as parallel tools are not so user friendly
Evolution in Hardware
• Compute Nodes:
– Intel i860
– Alpha 21x64
– Intel x86
• Interconnection Network:
– Bus : MultiBus-II, Wide SCSI
– Switched Network: ATM, Fast Ethernet, Gigabit
Ethernet
– SAN: Scalable Coherent Interface
Evolution in Software
• Parallel Program Development API:
– ANULIB (Proprietary) to MPI (Standard)
• Runtime environment
– I/O restricted to master only (860) to Full I/O (Alpha
and x86)
– One program at a time (860) to Multiple Programs to
Batch operations
• Applications
– In-house parallel to ready-made parallel applications
– Commercially available parallel software
Issues in building large clusters
• Scalability of interconnection network
• Scalability of software components
– Communication libraries
– I/O subsystem
– Cluster management tools
– Applications
• Installation and Management Procedures
• Troubleshooting Procedures
Other Issues in operating large clusters
• Space Management
– Node form factor
– Layout of the nodes
– Cable routing and weight
• Power Management
• Cooling arrangements
The P2P Computing
• Computing based on the P2P architecture allows distributed resources to be shared with or without the support of a server.
• How do you manage under-utilized resources?
– Utilization of a desktop PC is typically <10%, and this percentage is decreasing even further as PCs become more powerful
– Large organizations may have more than a thousand PCs, each delivering >20 MFlops, and this power is growing with every passing day. The trick is to use them in cycle-stealing mode
– Each PC now has about 20 GB of disk capacity; 80 GB x 1000 = 80 TB of storage space is available: very large file storage
– How do you harness the power of so many PCs in a large organization? The "ownership" hurdle is an issue to be resolved
– Latency & bandwidth of the LAN environment are quite adequate for P2P computing. Space management is no problem; use PCs wherever they are!!
INTERNET COMPUTING
• Today you can’t run your jobs on the Internet
• Internet computing using idle PCs is becoming an important computing platform (SETI@home, Napster, Gnutella, Freenet, KaZaA)
– www is now a promising candidate for the core component of a wide-area distributed computing environment
– Efficient client/server models & protocols
– Transparent networking, navigation & GUI with multimedia access & dissemination for data visualization
– Mechanisms for distributed computing such as CGI, Java
• With improved performance (price/performance) & the availability of Linux, Web Services (SOAP, WSDL, UDDI, WSFL) and COM technology, it is easy to develop loosely coupled distributed applications
Difficulties in present systems
– As technology is constantly changing, there is a need for regular upgrades/enhancements
– Clusters/servers are not fail safe and fault tolerant
– Many systems are dedicated to a single application, and thus sit idle when the application has no load
– Many clusters in the organization remain idle
– For operating a computer centre, 75% of the cost comes from environment upkeep, staffing, operation and maintenance
– Computers, networks, clusters, parallel machines and visual systems are not tightly coupled by software, making them difficult for users to use
Analysis - a very general model
Can we tie all components tightly together by software?
[Figure: PCs, SMPs, clusters, RAID disks and a visual data server linked by a high speed network to a problem solving environment offering a menu of templates, solvers, pre- & post-processing and meshing]
Computer Assisted Science & Engineering (CASE)
GRID CONCEPT
[Figure: a user access point submits work to a resource broker, which dispatches it to grid resources and returns the result]
Are Grids a Solution?
“Grid Computing” means different things to different people.

Goals of Grid Computing       Technology Issues
Reduce computing costs        Clusters
Increase computing resources  Internet infrastructure
Reduce job turnaround time    MPP solver adoption
Enable parametric analyses    Administration of desktop
Reduce complexity to users    Use middleware to automate
Increase productivity         Virtual Computing Centre

“Dependable, consistent, pervasive access to resources”
What is needed?
[Figure: a client (Java GUI, RPC-like request/reply) contacts a gatekeeper backed by a broker, scheduler and database, which chooses among computational resources: ISP, clusters, MPP and workstations running MPI, PVM, Condor, Matlab, Mathematica, C, Fortran, Java, Perl]
Why Migrate Processes?
• Load balancing
– Reduce average response time
– Speed up individual jobs
– Gain higher throughput
• Move process closer to its resources
– Use resources effectively
– Reduce network traffic
• Increase systems reliability
• Move process to a machine holding confidential data
[Figure: process migration between two file-server-backed machines – job processes (PR-JOB1, PR-JOB2, PR-JOB3) and parallel processes (PR-PARL) redistributed across the machines]
What does the Grid do for you?
• You submit your work
• And the Grid
– Finds convenient places for it to be run
– Organises efficient access to your data (caching, migration, replication)
– Deals with authentication to the different sites that you will be using
– Interfaces to local site resource allocation mechanisms, policies
– Runs your jobs, monitors progress, recovers from problems, tells you when your work is complete
• If there is scope for parallelism, it can also decompose your work into convenient execution units based on the available resources and data distribution
Main components
User Interface (UI): the place where users log on to the Grid
Resource Broker (RB): Matches the user requirements with the available
resources on the Grid
Information System: Characteristics and status of CE and SE
(Uses “GLUE schema”)
Computing Element (CE): A batch queue on a site’s computers where
the user’s job is executed
Storage Element (SE): provides (large-scale) storage for files
Typical current grid
• Virtual organisations negotiate with sites to agree access to resources
• Grid middleware runs on each shared resource to provide
– Data services
– Computation services
– Single sign-on
• Distributed services (both people and middleware) enable the grid
E-infrastructure is the key !!!
[Figure: shared resources connected over the INTERNET]
Biomedical applications
Earth sciences applications
• Earth Observations by Satellite
– Ozone profiles
• Solid Earth Physics
– Fast Determination of mechanisms
of important earthquakes
• Hydrology
– Management of water resources
in Mediterranean area (SWIMED)
• Geology
– Geocluster: R&D initiative of the
Compagnie Générale de Géophysique
• A large variety of applications is the key !!!
EGEE tutorial, Seoul
GARUDA
• The Department of Information Technology (DIT), Govt. of India, has funded C-DAC to deploy a computational grid named GARUDA as a Proof of Concept project.
• It will connect 45 institutes in 17 cities in the country at 10/100 Mbps bandwidth.
Other Grids in India
• EU-IndiaGrid (ERNET, C-DAC, BARC,TIFR,SINP,PUNE UNIV, NBCS)
• Coordination with Geant for Education Research
• DAE/DST/ERNET MOU for Tier II LHC Grid (10 Univ)
• BARC MOU with INFN, Italy to setup Grid research Hub
• C-DAC’s GARUDA Grid
• Talk about Bio-Grid and Weather-Grid
Summary
• There have been three generations of ANUPAM, all with
different architectures, hardware and software
• Usage of ANUPAM has increased due to standardization
in programming models and availability of parallel
software
• Parallel processing awareness has increased among users
• Building parallel computers is a learning experience
• Development of Grid Computing is equally challenging
THANK YOU