14 - High Performance Computing (PPT)

High Performance
Computing and CyberGIS
Keith T. Weber, GISP
GIS Director, ISU
Goal of this presentation
• Introduce you to another world of
computing, analysis, and opportunity
• Encourage you to learn more!
Some Terminology Up-Front
• Supercomputing
• HPC
• HTC
• CI
Acknowledgements
• Much of the material presented here
was originally designed by Henry
Neeman at the University of Oklahoma
and OSCER
What is Supercomputing?
• Supercomputing is the biggest, fastest computing right this
minute.
• Likewise, a supercomputer is one of the biggest, fastest
computers right this minute.
• So, the definition of supercomputing is constantly changing.
• Rule of Thumb: A supercomputer is typically 100 X as
powerful as a PC.
Fastest Supercomputer and
Moore’s Law
(Chart: “Fastest Supercomputer in the World” – speed in GFLOPs vs. year, 1992–2007, with one series for the fastest machine and one for Moore’s Law.)
What is Supercomputing
About?
Size
Speed
Size…
• Many problems that are interesting to
scientists and engineers can’t fit on a PC
– usually because they need more than a few GB
of RAM, or more than a few hundred GB of disk.
Speed…
• Many problems that are interesting to
scientists and engineers would take a
long time to run on a PC.
– months or even years.
– But a problem that would take 1 month on
a PC might take only a few hours on a
supercomputer
What can Supercomputing
be used for? [1]
• Data Mining
• Modeling
• Simulation
• Visualization
What is a Supercomputer?
• A cluster of small computers, each called a node,
hooked together by an interconnection network
(interconnect for short).
• A cluster needs software that allows the nodes to
communicate across the interconnect.
• But what a cluster is … is all of these components
working together as if they’re one big computer ...
a super computer.
For example: Dell Intel Xeon
Linux Cluster
• 1,076 Intel Xeon CPU chips/4288
cores
• 8,800 GB RAM
• ~130 TB globally accessible disk
• QLogic Infiniband
• Force10 Networks Gigabit
Ethernet
• Red Hat Enterprise Linux 5
• Peak speed: 34.5 TFLOPs*
– *TFLOPs: trillion floating point operations (calculations) per second
sooner.oscer.ou.edu
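As a rough back-of-the-envelope check on those figures (this division is derived here, not stated on the slide):

```latex
\frac{34.5\ \text{TFLOPs}}{4288\ \text{cores}} \approx 8\ \text{GFLOPs per core}
```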
Quantifying a Supercomputer
• Number of cores
– Your workstation (4?)
– ISU cluster (800)
– Blue Waters (300,000)
• TeraFlops
How a cluster works together:
PARALLELISM
Parallelism
Parallelism means
doing multiple things
at the same time
(Images: “More fish!” vs. “Less fish …”)
Understanding Parallel Processing
THE JIGSAW PUZZLE
ANALOGY
Serial Computing
• We are very accustomed to serial processing.
It can be compared to building a jigsaw
puzzle by yourself.
• In other words, suppose you want to
complete a jigsaw puzzle that has 1000
pieces.
• We can agree this will take a certain amount
of time…let’s just say, one hour
Shared Memory Parallelism
• If Scott sits across the table from you, then
he can work on his half of the puzzle and
you can work on yours.
• Once in a while, you’ll both reach into the
pile of pieces at the same time (you’ll
contend for the same resource), which will
cause you to slow down.
• And from time to time you’ll have to work
together (communicate) at the interface
between his half and yours. The speedup
will be nearly 2-to-1: together it will take
about 35 minutes instead of 60.
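To make the shared-table idea concrete, here is a minimal shared-memory sketch in C using OpenMP (the compile command, array size, and the trivial summing “work” are illustrative assumptions, not taken from the slides): every thread works on the same array in the same memory, and the reduction clause handles the moment when their partial results must be combined.

```c
/* shared_table.c - a minimal shared-memory parallelism sketch using OpenMP.
 * Compile (assumption): gcc -fopenmp shared_table.c -o shared_table
 */
#include <stdio.h>
#include <omp.h>

#define N 1000000               /* number of "puzzle pieces" (illustrative) */

int main(void)
{
    static double pieces[N];    /* one shared pile of pieces in memory */
    double total = 0.0;

    for (int i = 0; i < N; i++)
        pieces[i] = 1.0;

    /* All threads see the same array; reduction(+:total) combines the
     * partial sums each thread builds up on its share of the loop. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < N; i++)
        total += pieces[i];

    printf("total = %.0f, using up to %d threads\n",
           total, omp_get_max_threads());
    return 0;
}
```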
The More the Merrier?
• Now let’s put Paul and Charlie on the
other two sides of the table.
• Each of you can work on a part of the
puzzle, but there’ll be a lot more
contention for the shared resource (the
pile of puzzle pieces) and a lot more
communication at the interfaces.
• So you will achieve noticeably less
than a 4-to-1 speedup.
• But you’ll still have an improvement,
maybe something like 20 minutes
instead of an hour.
Diminishing Returns
• If we now put Dave, Tom, Horst, and
Brandon at the corners of the table,
there’s going to be much more
contention for the shared resource,
and a lot of communication at the
many interfaces.
• The speedup will be much less than
we’d like; you’ll be lucky to get 5-to-1.
• We can see that adding more and
more workers onto a shared
resource is eventually going to have
a diminishing return.
Amdahl’s Law
(Chart: CPU utilization under Amdahl’s Law.)
Source: http://codeidol.com/java/java-concurrency/Performance-and-Scalability/Amdahls-Law/
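For reference, the standard formula behind the chart (the slide shows only the graph; the formula is added here): if a fraction P of the work can run in parallel and the remaining 1 - P is serial, then the speedup on N processors is

```latex
S(N) = \frac{1}{(1 - P) + \frac{P}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - P}
```

Even with unlimited processors the speedup is capped by the serial fraction, which is exactly the diminishing return seen at the crowded puzzle table.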
Distributed Parallelism
• Let’s try something a little different.
• Let’s set up two tables.
• You will sit at one table and Scott at the other.
• We will put half of the puzzle pieces on your table and the other half of the
pieces on Scott’s.
• Now you can work completely independently, without any contention for a
shared resource.
• BUT, the cost per communication is MUCH higher, and you need the
ability to split up (decompose) the puzzle correctly, which can be tricky.
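Here is the two-tables version as a minimal C sketch using MPI (the compile/run commands and the placeholder workload are assumptions for illustration): each process owns a private chunk of the problem, works with no contention, and the results meet only in one explicit, relatively expensive communication call.

```c
/* two_tables.c - a minimal distributed-parallelism sketch using MPI.
 * Compile/run (assumption): mpicc two_tables.c -o two_tables
 *                           mpirun -np 2 ./two_tables
 */
#include <stdio.h>
#include <mpi.h>

#define N 1000000               /* total pieces of work (illustrative) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which table am I sitting at? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many tables are there?   */

    /* Decompose the problem: each process gets its own private range,
     * so there is no contention for a shared pile of pieces. */
    int chunk = N / size;
    int start = rank * chunk;
    int end   = (rank == size - 1) ? N : start + chunk;

    double partial = 0.0;
    for (int i = start; i < end; i++)
        partial += 1.0;                     /* stand-in for real work */

    /* The one (costly) communication step: combine everyone's result. */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %.0f across %d processes\n", total, size);

    MPI_Finalize();
    return 0;
}
```

Adding more tables is just a larger -np: the decomposition arithmetic changes, but every combined result still costs a message over the interconnect.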
More Distributed
Processors
• It’s easy to add more processors in distributed
parallelism.
• But you must be aware of the need to:
– decompose the problem, and
– communicate among the processors.
• Also, as you add more processors, it may be harder to
load balance the amount of work that each processor
gets.
FYI…Kinds of Parallelism
• Instruction Level Parallelism
• Shared Memory Multithreading
• Distributed Memory Multiprocessing
• GPU Parallelism
• Hybrid Parallelism (Shared + Distributed + GPU)
Why Parallelism Is Good
• The Trees: We like parallelism because, as
the number of processing units working on a
problem grows, we can solve the same
problem in less time.
• The Forest: We like parallelism because, as
the number of processing units working on a
problem grows, we can solve bigger
problems.
Jargon
• Threads are execution sequences that share a single
memory area
• Processes are execution sequences with their own
independent, private memory areas
• Multithreading: parallelism via multiple threads
• Multiprocessing: parallelism via multiple processes
• Shared Memory Parallelism is concerned with threads
• Distributed Parallelism is concerned with processes.
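A small illustrative C program (the variable names and specific calls are our own example, not from the slides) showing the distinction: a thread created with pthread_create writes into the same memory its parent sees, while a child created with fork only modifies its own private copy.

```c
/* threads_vs_processes.c - shared vs. private memory in C.
 * Compile (assumption): gcc threads_vs_processes.c -o tvp -lpthread
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <pthread.h>

int counter = 0;                     /* one variable in the parent's memory */

void *thread_body(void *arg)
{
    counter = 42;                    /* threads share memory: parent sees 42 */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, thread_body, NULL);
    pthread_join(t, NULL);
    printf("after thread:  counter = %d\n", counter);   /* prints 42 */

    counter = 0;
    fflush(stdout);                  /* avoid duplicated output after fork */
    pid_t pid = fork();
    if (pid == 0) {                  /* child process: private copy of memory */
        counter = 42;                /* changes only the child's copy */
        _exit(0);
    }
    wait(NULL);
    printf("after process: counter = %d\n", counter);   /* still prints 0 */
    return 0;
}
```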
Basic Strategies
• Data Parallelism: Each processor does exactly the
same tasks on its unique subset of the data
– jigsaw puzzles or big datasets that need to be processed now!
• Task Parallelism: Each processor does different
tasks on exactly the same set of data
– which algorithm is best?
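A short C/OpenMP sketch contrasting the two strategies (the scaling, mean, and maximum operations are placeholders chosen for illustration): in the data-parallel loop every thread applies the same operation to its own slice of the data, while in the task-parallel sections different threads run different algorithms over the same, whole dataset.

```c
/* strategies.c - data parallelism vs. task parallelism (OpenMP sketch).
 * Compile (assumption): gcc -fopenmp strategies.c -o strategies
 */
#include <stdio.h>
#include <omp.h>

#define N 1000000

static double data[N];

static double mean(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i];
    return s / n;
}

static double maximum(const double *x, int n)
{
    double m = x[0];
    for (int i = 1; i < n; i++) if (x[i] > m) m = x[i];
    return m;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        data[i] = (double) i;

    /* Data parallelism: the SAME task (scaling) on different subsets. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        data[i] *= 2.0;

    /* Task parallelism: DIFFERENT tasks on the same, whole dataset. */
    double avg = 0.0, mx = 0.0;
    #pragma omp parallel sections
    {
        #pragma omp section
        avg = mean(data, N);

        #pragma omp section
        mx = maximum(data, N);
    }

    printf("mean = %.1f, max = %.1f\n", avg, mx);
    return 0;
}
```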
An Example:
Embarrassingly Parallel
• An application is known as embarrassingly
parallel if its parallel implementation:
– Can straightforwardly be broken up into equal
amounts of work per processor, AND
– Has minimal parallel overhead (i.e., communication
among processors)
FYI…Embarrassingly parallel applications are also known as loosely
coupled.
Monte Carlo Methods
• Monte Carlo methods are ways of simulating
or calculating actual phenomena based on
randomness within known error limits.
– In GIS, we use Monte Carlo simulations to calculate
error propagation effects
– How?
• Monte Carlo simulations are typically
embarrassingly parallel applications.
Monte Carlo Methods
• In a Monte Carlo method, you randomly
generate a large number of example cases
(realizations), and then compare the results of
these realizations
• When the average of the realizations converges
(that is, your answer doesn’t change
substantially when new realizations are
generated), the Monte Carlo simulation can stop.
Embarrassingly Parallel
• Monte Carlo simulations are embarrassingly
parallel, because each realization is
independent of all other realizations
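A minimal C/OpenMP sketch of an embarrassingly parallel Monte Carlo run (the specific simulation, estimating pi from random points, is our own stand-in; the slides don't name one): each realization depends only on its own random draws, so threads never need to talk to each other until the final tally.

```c
/* mc_pi.c - an embarrassingly parallel Monte Carlo sketch (OpenMP).
 * Compile (assumption): gcc -fopenmp mc_pi.c -o mc_pi
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define REALIZATIONS 10000000L      /* number of independent realizations */

int main(void)
{
    long hits = 0;

    #pragma omp parallel reduction(+:hits)
    {
        /* Each thread gets its own random stream: no shared state at all. */
        unsigned int seed = 1234u + (unsigned int) omp_get_thread_num();

        #pragma omp for
        for (long i = 0; i < REALIZATIONS; i++) {
            double x = (double) rand_r(&seed) / RAND_MAX;
            double y = (double) rand_r(&seed) / RAND_MAX;
            if (x * x + y * y <= 1.0)       /* landed inside the unit circle? */
                hits++;
        }
    }   /* the only "communication": summing the per-thread hit counts */

    printf("pi is approximately %f\n", 4.0 * (double) hits / REALIZATIONS);
    return 0;
}
```

In practice you would also watch the running estimate and stop adding realizations once it stops changing appreciably, as described above.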
A Quiz…
• Q: Is this an example of Data
Parallelism or Task Parallelism?
• A: Task Parallelism: Each processor does
different tasks on exactly the same set of data
Questions so far?
A bit more to know…
WHAT IS A GPGPU?
OR THANK YOU GAMING INDUSTRY
It’s an Accelerator
No, not this ....
Accelerators
• In HPC, an accelerator is hardware whose role
it is to speed up some aspect of the computing
workload.
– In the olden days (1980s), PCs sometimes had
floating point accelerators (aka, the math
coprocessor)
Why Accelerators are Good
• They make your code
run faster.
Why Accelerators are Bad
Because:
• They’re expensive (or they were)
• They’re harder to program (NVIDIA
CUDA)
• Your code may not be portable to other
accelerators, so the labor you invest in
programming may have a very short life.
The King of the Accelerators
The undisputed king of accelerators is the
graphics processing unit (GPU).
Why GPU?
• Graphics Processing Units (GPUs) were
originally designed to accelerate graphics
tasks like image rendering for gaming.
• They became very popular with gamers
because they produced better and better
images at lightning-fast refresh speeds.
• As a result, prices have become extremely
reasonable, ranging from three figures at the
low end to four figures at the high end.
GPUs Do Arithmetic
• GPUs render images
• This is done through floating point
arithmetic – as it turns out, this is the
same stuff people use supercomputing
for!
Interested? Curious?
• To learn more, or to get involved with
supercomputing, there is a host of
opportunities awaiting you:
– Get to know your Campus Champions
– Visit http://giscenter.isu.edu/research/Techpg/CI/XSEDE/
– Visit https://www.xsede.org/
– Ask about internships (BWUPEP)
– Learn C (not C++, but C) or Fortran
– Learn UNIX
Questions?