Parallel Processing: Past, Present and Future

What is a Supercomputer?
Parallel Processing:
Past, Present and Future

Let us run a contest. Who gives the
most updated explanation?
Dr. G. Young
CS 370
Supercomputer
(AllWords.com)
A very fast, powerful mainframe computer, used in advanced military and scientific applications.

Supercomputer
(M-W.com, Merriam-Webster's Collegiate Dictionary)
A large very fast mainframe used especially for scientific computations.
Supercomputer
(Dictionary.com)
A mainframe computer that is among the largest, fastest, or most powerful of those available at a given time.

Supercomputer
(FOLDOC.doc.ic.ac.uk)
A broad term for one of the fastest computers currently available. Such computers are typically used for number crunching including scientific simulations, (animated) graphics, analysis of geological data (e.g. in petrochemical prospecting), structural analysis, computational fluid dynamics, physics, chemistry, electronic design, nuclear energy research and meteorology. Perhaps the best known supercomputer manufacturer is Cray Research.
A less serious definition, reported from about 1990 at The University Of New South Wales, states that a supercomputer is any computer that can outperform IBM's current fastest, thus making it impossible for IBM to ever produce a supercomputer.

Supercomputer
(ComputerUser.com)
A very fast and powerful computer, outperforming most mainframes, and used for intensive calculation, scientific simulations, animated graphics, and other work that requires sophisticated and high-powered computing. Cray Research and Intel are well-known producers of supercomputers.

Supercomputer
(PCWebopaedia.com)
The fastest type of computer. Supercomputers are very expensive and are employed for specialized applications that require immense amounts of mathematical calculations.
For example, weather forecasting requires a supercomputer. Other uses of supercomputers include animated graphics, fluid dynamic calculations, nuclear energy research, and petroleum exploration.
The chief difference between a supercomputer and a mainframe is that a supercomputer channels all its power into executing a few programs as fast as possible, whereas a mainframe uses its power to execute many programs concurrently.
Supercomputer

Supercomputer
(PrenHall.com)

The category that includes the largest
and most powerful computers.
Who is the winner?








AllWords.com
M-W.com, Merriam-Webster's Collegiate Dictionary
Dictionary.com
FOLDOC.doc.ic.ac.uk
ComputerUser.com
PCWebopaedia.com
PrenHall.com
Geek.com
Supercomputer Contest
(Geek.com) This refers to a computer that is able to operate at a speed that places it at or near the top speed of currently produced computers.
Most supercomputers cost millions of dollars, and the traditional model of using one large computer with proprietary hardware is being challenged by using a cluster of cheaper computers with more standard hardware.

Contest Winner
geek.com @ 2001
(Led by Chief Geek - Joel Evans)
Used to tell people all about Geek.
For example, to check out if you're
Beginner Geek,
Intermediate Geek,
Advanced Geek or
Super Geek
Winner Highlight

(Geek.com@2001) This refers to a computer
that is able to operate at a speed that places it
at or near the top speed of currently produced
computers. Most supercomputers cost millions
of dollars, and the traditional model of using
one large computer with proprietary hardware
is being challenged by using
a cluster of cheaper computers with more
standard hardware.
Topics of Discussion







Introduction
Computer Networks
Parallel and Distributed Processing
Affordable Supercomputer
Future Trend and Challenge
Conclusion
Q&A
Introduction








Why do we need Supercomputers?
Supercomputer Vendors
Supercomputer Products
Top Supercomputers
How to evaluate the power of a supercomputer?
Top 10 Supercomputers
Theoretical Implication of Parallel machines
Areas of Research in Supercomputing
Supercomputing Journals
Supercomputer Vendors
Why do we need Supercomputers?
Even though processor speed has increased dramatically, it is still not fast enough for our needs.
Using multiple processors is the way to go.
Areas that need supercomputers generally involve intensive computation:
Aerospace, Weather, Finance, Defense, Energy, Internet, Government, Chemistry, Geophysics, Telecom, Academic, Database, Mechanics, Automotive, Transportation, Electronics, Manufacturing, Fluid Dynamics, Petroleum
Supercomputer Products














Avalon A12
Cambridge Parallel Processing Gamma II Plus
Compaq AlphaServer SC Series
Fujitsu AP3000
Fujitsu VPP5000 series
Hitachi SR8000 system
HP Exemplar V2600
IBM RS/6000 SP
NEC Cenju-4
NEC SX-5
SGI Origin 2000 series
Sun E10000 Starfire
Tera/Cray SV1
Tera/Cray T3E
How to evaluate the power of a supercomputer?

Peak-performance
Theoretical
Run-time
Benchmarks
Linpack benchmark (Top500)
Finding Largest Mersenne Prime Number

Supercomputer vendors use different technologies: processor, OS, connection structure, proprietary hardware and software.
How to evaluate the power of a supercomputer?
Benchmarks
The LINPACK Benchmark (introduced by Jack Dongarra) solves a dense system of linear equations. It is used to rank the Top500 supercomputers.
This performance does not reflect the overall performance of a given system, as no single number ever can.
Since the problem is very regular, the performance achieved is quite high, and the performance numbers give a good correction to the peak performance figure.
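As a rough illustration of what a LINPACK-style measurement involves, the sketch below solves a random dense system with Gaussian elimination and partial pivoting, then divides the conventional operation count (2/3 n^3 + 2 n^2) by the elapsed time. It is not the official benchmark code: the function names `solve_dense` and `linpack_like` are made up for this example, and pure Python will report rates far below what tuned libraries achieve.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def linpack_like(n, seed=0):
    """Return (floating-point operations per second, max error) for one solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # chosen so the exact solution is all ones
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-12)
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2  # conventional LINPACK count
    max_error = max(abs(xi - 1.0) for xi in x)
    return flops / elapsed, max_error
```

Calling `linpack_like(200)` returns a rate and an accuracy check. The rate depends on the machine and interpreter, which is why a single figure is only a partial picture of a system.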



How to evaluate the power of a supercomputer?
Largest known Mersenne Prime Numbers* before 2000

Prime          Digits       Year   Name
2^21701-1      6533         1978   Landon Curt Noll (with Laura Nickel, Ariel Glenn)
2^23209-1      6987         1979   Landon Curt Noll
2^44497-1      13395        1979   David Slowinski (with Harry Nelson)
2^86243-1      25962        1982   David Slowinski
2^132049-1     39751        1983   David Slowinski
2^216091-1     65050        1985   David Slowinski
2^756839-1     227832       1992   David Slowinski, Paul Gage
2^859433-1     258716       1994   David Slowinski, Paul Gage
2^1257787-1    378632       1996   David Slowinski, Paul Gage
2^1398269-1    420921       1997   David Slowinski, Paul Gage
2^2976221-1    895932       1997   David Slowinski, Paul Gage
2^3021377-1    909526       1998   David Slowinski, Paul Gage
2^6972593-1    2098960 #    1999   David Slowinski, Paul Gage

* Mersenne Prime Numbers are Prime Numbers in the form of 2^<Integer> - 1
# 67 pages long if printed on newspaper

How to evaluate the power of a supercomputer?
The current largest known Mersenne Prime Numbers (in the form of 2^n - 1) can be found at
http://www.mersenne.org/
$$$ The Electronic Frontier Foundation is offering a $100,000 award for discovering the next largest (ten million digits) prime number

Prime Number
Greek mathematician Euclid proved that there are an infinite number of prime numbers.
Prime numbers do not occur in a regular sequence, and there is no formula for generating them.
Discovery of new primes requires randomly generating and testing millions of numbers.
Top 10 Supercomputers
How to evaluate the power of a supercomputer?

Finding the Largest Mersenne Prime Number
Slowinski: (SGI, Cray)
"The prime finder program rigorously tests all
elements of a system -- from the logic of the
processors, to the memory, the compiler and the
operating and multitasking systems.
For high performance systems with multiple
processors, this is an excellent test of the system's
ability."
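For readers curious about what a "prime finder" has to compute, the sketch below uses the Lucas-Lehmer test, the standard primality test for numbers of the form 2^p - 1. The slides do not say which program or algorithm Slowinski ran, so this is only an illustrative stand-in; the function names are invented for the example.

```python
def is_small_prime(n):
    """Trial division, adequate for the small exponents tried here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_mersenne_prime(p):
    """Lucas-Lehmer test: True if 2**p - 1 is prime (p must itself be prime)."""
    if not is_small_prime(p):
        return False  # 2**p - 1 is composite whenever p is composite
    if p == 2:
        return True   # 2**2 - 1 = 3
    m = (1 << p) - 1
    s = 4
    # p - 2 rounds of squaring modulo the candidate.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_exponents(limit):
    """All exponents p <= limit for which 2**p - 1 is prime."""
    return [p for p in range(2, limit + 1) if is_mersenne_prime(p)]


def digits(p):
    """Number of decimal digits of 2**p - 1."""
    return len(str((1 << p) - 1))
```

Each test is a long chain of very large multiplications, so one wrong bit anywhere in the arithmetic, memory, or compiler output changes the final answer. That is consistent with the quote's point about such a program exercising the whole system.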
Country
USA
Japan
Spain
India
Germany
France
USA
China
Japan
Germany
Italy
Switzerland
CS 370
2012
(Nov)
5
1
1
2
1
2013
(June)
5
2
2
1
2007
8
2008
6
1
1
1
CS 370
Top 10 Supercomputers
Country
2006
6
2
1
1
1
2
Dr. Young
26
Top Supercomputers
2013
(Nov)
5
1
1
2

Timeline
http://www.top500.org/timeline/
Top #1 System
http://www.top500.org/featured/top-systems/
1
Areas of Research in P&D Computing
Theoretical Implication of Parallel machines



A parallel machine with an unbounded number of processors behaves like a Non-deterministic Machine.
A statement like Guess({S1,S2}) can be added to our familiar deterministic programs: every alternative is followed at once, each on its own processor.
Suddenly, hard problems in NP (e.g. the decision version of the Traveling Salesman Problem) can be solved in polynomial time.
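A small sketch may make the Guess idea concrete. On the idealized machine, each processor would follow one sequence of guesses (here, one candidate tour) and only needs to verify it, which is cheap. The deterministic simulation below has to try the guesses one after another, which is where the exponential cost comes from. The function names and the tiny distance matrix used in testing are assumptions made for illustration.

```python
from itertools import permutations


def tour_cost(dist, tour):
    """Total length of a closed tour that visits the cities in the given order."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def verify(dist, tour, bound):
    """Work done by one processor after its guesses: a cheap polynomial check."""
    return tour_cost(dist, tour) <= bound


def tsp_decision(dist, bound):
    """Is there a tour of length <= bound? Sequential simulation of Guess."""
    n = len(dist)
    # Each permutation stands for one sequence of Guess(...) outcomes.
    # A machine with unbounded processors would check them all at once.
    for rest in permutations(range(1, n)):
        if verify(dist, (0,) + rest, bound):
            return True
    return False
```

With n cities the loop may run (n-1)! times, while each individual `verify` call takes time proportional to n.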















Supercomputing Journals










ACM J. of Experimental Algorithmics
BIT
Cluster Computing
Computing and Visualization in Science
IEEE Trans. on Computers
IEEE Trans. on Parallel and Distributed Systems
International J. of Computer Research
International J. of Computers and Their Applications
International J. of High Performance Computing and Networking
International J. of High Speed Computing
International J. of Parallel Programming
J. of Interconnection Networks
J. of Parallel and Distributed Computing
J. of Performance Evaluation and Modeling of Computer Systems
J. of Supercomputing
J. of Visual Languages & Computing
Parallel Algorithms and Applications
Parallel Computing
Parallel and Distributed Computing Practices
Parallel Processing Letters
SIAM J. of Computing
SIAM J. of Scientific Computing
Areas of Research in P&D Computing
Parallel and Distributed Architectures
Parallel and Distributed Algorithms
Parallel Programming Languages
Scientific Computing
Signal & Image Processing Systems
Special Purpose Processors
VLSI and Configurable Logic Systems
Performance Modeling/Evaluation
Memory Hierarchy Issues in Parallel and Distributed Processing
Programming Environments and Tools for Parallel and Distributed Platforms
Compilers and Optimizations for Parallel and Distributed Processing
Operating System and Runtime Support for Parallel and Distributed Computing
Parallel and Distributed Network Protocols and Implementations
Applications of Parallel and Distributed Computing
Nontraditional Processor Technologies (Optical, Quantum, DNA, etc.)

Topics of Discussion
Introduction
Computer Networks
Parallel and Distributed Processing
Affordable Supercomputer
Future Trend and Challenge
Conclusion
Q&A
Computer Networks
Homogeneity
Same kind of computers
Examples: a network of PCs, a network of Sun workstations, …
Heterogeneity
A mixture of different computers
Example: Internet

Computer Networks
Network/Parallel Computer Architecture
Chain, Tree, Ring, Mesh, Torus, Star, Cube, Hypercube

Computer Networks
Proprietary Parallel Computers
HP Exemplar V2600: Ring
Cambridge Parallel Processing Gamma II Plus: Mesh
Fujitsu AP3000: Torus
Tera/Cray Research Inc. T3E: Torus
SGI Origin series: Hypercube

Topics of Discussion
Introduction
Computer Networks
Parallel and Distributed Processing
Affordable Supercomputer
Future Trend and Challenge
Conclusion
Q&A
Parallel and Distributed Processing
Hardware Structure of Parallel Computers
Architectural Classes
Memory Systems
Distributed Processing
PVM & MPI
Parallel Applications
Task Assignment

Parallel and Distributed Processing
Hardware structure of Parallel Computers


Classification is based on the way instruction and data streams are manipulated.
4 main architectural classes [Flynn, 1972]
Multiple/Single Instruction (MI/SI)
Multiple/Single Data (MD/SD)
M.J. Flynn, Some computer organizations and their effectiveness, IEEE Transactions on Computers, C-21, pp. 948-960, 1972.
Parallel and Distributed Processing
Architectural Classes
SISD machines:
Accommodate one instruction stream that is executed serially.
These are the conventional systems that contain one CPU.
MISD machines:
Multiple instructions act on a single stream of data.
No practical machine of this class.

Parallel and Distributed Processing
Architectural Classes
SIMD machines:
Such systems often have thousands of processing units that execute the same instruction on different data.
Example: Hitachi S3600
MIMD machines:
Execute instruction streams in parallel on different data.
Run many sub-tasks in parallel.
Large variety of MIMD systems.
Parallel and Distributed Processing
Memory Systems
Shared memory systems:
Have multiple CPUs, all of which share the same address space.
Distributed memory systems:
Each CPU has its own associated memory.
Parallel and Distributed Processing
Distributed Processing
Distributed processing takes the DM-MIMD (distributed-memory MIMD) concept one step further.
Instead of many integrated processors in one or several boxes, workstations are connected by (Gigabit) Ethernet, FDDI, or otherwise and set to work concurrently on tasks in the same program.
Communication between processors is often slower by orders of magnitude.
Packages to realize Distributed Processing
PVM (Parallel Virtual Machine) [Geist et al., 1994]
MPI (Message Passing Interface) [Snir et al. and Gropp et al., 1998]

A. Geist, A. Beguelin, J. Dongarra, R. Manchek, W. Jiang, and V. Sunderam, PVM: A Users' Guide and Tutorial for Networked Parallel Computing, MIT Press, Boston, 1994.
M. Snir, S. Otto, S. Huss-Lederman, D. Walker, J. Dongarra, MPI: The Complete Reference, Vol. 1, The MPI Core, MIT Press, Boston, 1998.
W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir, M. Snir, MPI: The Complete Reference, Vol. 2, The MPI Extensions, MIT Press, Boston, 1998.
Parallel and Distributed Processing
PVM & MPI
This style of programming, called the "message passing" model, has been widely accepted.
PVM and MPI have been adopted by virtually all major vendors of distributed-memory MIMD systems, and are even offered on shared-memory MIMD systems for compatibility reasons.
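The message passing model can be sketched without PVM or MPI installed. In the example below, threads stand in for processes and one queue per rank stands in for the network. The `send` and `recv` helpers are hypothetical names for this illustration only and are not the PVM or MPI API. What matters is the pattern: no shared variables, only explicit messages between a master (rank 0) and its workers.

```python
import queue
import threading


def parallel_sum(data, workers=4):
    """Sum a list by scattering slices to workers and gathering partial sums."""
    # One mailbox per rank; rank 0 is the master.
    mailboxes = [queue.Queue() for _ in range(workers + 1)]

    def send(dest, message):
        mailboxes[dest].put(message)

    def recv(rank):
        return mailboxes[rank].get()  # blocks until a message arrives

    def worker(rank):
        chunk = recv(rank)       # receive work from the master
        send(0, sum(chunk))      # send the partial result back

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(1, workers + 1)
    ]
    for t in threads:
        t.start()

    # Master: scatter strided slices, then gather one result per worker.
    for rank in range(1, workers + 1):
        send(rank, data[rank - 1::workers])
    total = sum(recv(0) for _ in range(workers))

    for t in threads:
        t.join()
    return total
```

For example, `parallel_sum(list(range(101)))` distributes 101 numbers over four workers and combines their partial sums. A real MPI program follows the same scatter/compute/gather shape, with processes on different machines instead of threads.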
Parallel and Distributed Processing
Parallel Applications
Parallel Algorithms
Fine grain/Coarse grain
Parallel Programming
ParBegin/ParEnd
PVM/MPI APIs
Overheads for P&D Processing

Parallel and Distributed Processing
Task Assignment
Performance Measures
Throughput (Stone, 1977)
Σ E + Σ ITI + Σ ITC
Execution Time for tasks (E)
Intra-task Interference cost (ITI)
Inter-task Communication cost (ITC)
H. Stone, Multiprocessor Scheduling with the Aid of Network Flow Algorithms, IEEE Transactions on Software Engineering, Vol. 3, No. 1, pp. 83-85, 1977.
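To make the cost expression concrete, the sketch below evaluates Σ E + Σ ITI + Σ ITC for every possible assignment of a few tasks to processors and keeps the cheapest. Several things here are assumptions for illustration: that interference (ITI) is charged when two tasks share a processor, that communication (ITC) is charged when they do not, and the data layout of `E`, `ITI`, and `ITC`. Stone's paper uses network flow algorithms; the brute-force search here is a simple substitute that only works for very small task sets.

```python
from itertools import product


def assignment_cost(assign, E, ITI, ITC):
    """Cost of one assignment, where assign[t] is the processor of task t."""
    # Execution time of each task on the processor it was given.
    cost = sum(E[t][p] for t, p in enumerate(assign))
    # Assumed: interference is paid when two tasks share a processor.
    for (a, b), c in ITI.items():
        if assign[a] == assign[b]:
            cost += c
    # Assumed: communication is paid when two tasks are on different processors.
    for (a, b), c in ITC.items():
        if assign[a] != assign[b]:
            cost += c
    return cost


def best_assignment(E, ITI, ITC, processors=2):
    """Exhaustive search over all assignments (exponential in the task count)."""
    n = len(E)
    return min(
        product(range(processors), repeat=n),
        key=lambda assign: assignment_cost(assign, E, ITI, ITC),
    )
```

With three tasks and two processors there are only eight assignments to compare, but the count doubles with every added task, which is why a polynomial method such as the network flow formulation matters in practice.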
Completion Time
Throughput

Topics of Discussion
Introduction
Computer Networks
Parallel and Distributed Processing
Affordable Supercomputer
Future Trend and Challenge
Conclusion
Q&A
Computer Networks with Off-the-Shelf Hardware
Powered by Parallel and Distributed Processing Tools

Affordable supercomputer
Computer networks with Off-the-Shelf hardware
Powered by Parallel and Distributed Software Tools
Advantages over Conventional Supercomputer
System of Homogeneous Network
A network of PCs with SCSI Link
SPVM
System of Heterogeneous Network
Internet
JMPI
Advantages over Conventional Supercomputer











Decomposable
Reusable
Scale up and down easily
Off-the-shelf
Third World friendly
Economical
Reconfigurable Interconnection Topology
Easy to upgrade – bus, processor, software
Collaborative R&D Environment
General-purpose
Multi-usage
Homogeneous Network
A network of Pentium PCs
Heterogeneous Network

Topics of Discussion
Introduction
Computer Networks
Parallel and Distributed Processing
Affordable Supercomputer
Future Trend and Challenge
Conclusion
Q&A

Future Trend and Challenge
The PVM and MPI community continues to grow
Cheaper and faster processors and interconnections
More employment of Clusters of Workstations for High Performance Computing
More freely available software tools

Future Trend and Challenge
Race between proprietary supercomputers and cluster computers
How fast can a supercomputer go?
How will heterogeneous computing evolve?
Will a cluster of computers over the Internet be the fastest computer in the world?
Processing Power on Demand service?
Processor sharing?
Topics of Discussion
Introduction
Computer Networks
Parallel and Distributed Processing
Affordable Supercomputer
Future Trend and Challenge
Conclusion
Q&A

Conclusion
Practical
Affordable
Educational
Research topics
Knowledge Sharing through Major Forums (e.g. IEEE TFCC, Top500, TopClusters)
One key issue is how to compare/evaluate/rank their performances

Conclusion
Powered by state-of-the-art Parallel and Distributed Processing tools, high-speed computer networks with powerful workstations will become a very attractive, affordable, highly scalable and highly available solution for the High Performance Computing world.

Such an Exciting Area of Research
Build Your Own Supercomputer (Cluster)
Heterogeneous System
Employ new COTS (Commercial Off-the-Shelf)
Classification
Benchmarks
Performance Tracking Tools
System Administration Software
Top 500 Supercomputers Update
The TOP 500 Supercomputer List
http://www.top500.org/
Trend of Cluster Computers Versus Proprietary Supercomputers.

Q&A