cs668-lec1 - Department of Electrical Engineering and

Fall 2008
CS 668
Parallel Computing
Prof. Fred Annexstein
Office Hours: 11-1 MW or by appointment
Tel: 513-556-1807
Lecture 1: Welcome
Goals of this course
Syllabus, policies, grading
Blackboard Resources
LINC Linux cluster
Introduction/Motivation for HPPC
Scope of the Problems in Parallel Computing
• Primary:
– Provide an introduction to the computing systems,
programming approaches, common numerical and
algorithmic methods used for high performance
parallel computing
• Secondary:
– Have a course meeting competency requirements of
– Provide hands-on parallel programming experience
• Official Syllabus Available on Blackboard
• Textbook
Parallel Programming in C with MPI and OpenMP,
Michael J. Quinn
Other Recommended Texts
- Parallel Programming with MPI, Peter Pacheco
- Introduction to Parallel Computing: Design and
Analysis of Algorithms, Ananth Grama, Anshul
Gupta, George Karypis, Vipin Kumar
- Using MPI - 2nd Edition: Portable Parallel
Programming with the Message Passing Interface
by William Gropp
• Exams (1 or 2)
– Graded, 30% of the grade
• Written exercises (3-4)
– May/may not be graded
• Programming Assignments (3-4)
– May be done in groups of at most 2
– MPI programming, performance measurement
• Research papers (1)
– Discuss research questions, strengths, weaknesses,
interesting points, contemporary bibliography
• Final project (1)
– Individual or Group programming project and report
• Missed Exams:
– Missed exams cannot be made up unless pre-approved. Please see the instructor as soon as
possible in the event of a conflict.
• Academic Honesty:
– Plagiarism on assignments, quizzes or exams will not
be tolerated. See your student code of conduct
for more on the consequences of academic
misconduct. There are no “small” offenses.
Syllabus and my contact info
Lecture slides
Assignment handouts
Web resources relevant to the course
Discussion board
What is the Ralph Regula School?
• The Ralph Regula School of Computational Science
is a statewide, virtual school focused on computational
science. It is a collaborative effort of the Ohio Board of
Regents, Ohio Supercomputer Center, Ohio Learning
Network and Ohio's colleges and universities. With
funding from NSF, the school acts as a coordinating
entity for a variety of computational science education
activities aimed at making education in computational
science available to students across Ohio, as well as to
workers seeking continuing education about this
• Website: http://www.rrscs.org
CS LINC Cluster
• Michal Kouril’s links
– http://www.ececs.uc.edu/~kourilm/clusters/
– See README file for instructions on running MPI
code on beowulf.linc.uc.edu
• Accounts
– ECE/CS students should already have an account
– I can request accounts for the non-ECE/CS students
• Access
– Remote access only, the cluster is in the ECE/CS
server/machine room on the 8th floor of Rhodes,
visible through windows in the 890’s hallway
• Who needs a
roomful of computers?
• My PC and XBOX
run at GFLOP rates
(Billion Floating Point
Operations per second)
NCSA TeraGrid IA-64 Linux Cluster
Needed by People who solve
Science and Engineering problems
Materials / Superconductivity
Fluid Flow
Structural Deformation
Genetics / Protein interactions
Many Research Projects in Natural Sciences and
Engineering cannot exist without HPPC
• Videos – Applications in Physics and
• Simulation of Large-Scale Structure of Universe
• Stability Simulation –
• Super Volcano Movie - Show first 1:00 minute
Why are the problems so large?
• 3-Dimensional
– If you want to increase the level of resolution by a factor
of 10, the problem size increases by a factor of 10^3
• Many Length Scales (both time and space)
– If you want to observe the interactions between very
small local phenomenon and larger more global
• The number of relationships between data items
grows quadratically.
– Example: human genome 3.2 G base pairs means
about 5,000,000,000,000,000,000 = 5E18 relations
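The two growth rates above can be checked with a little arithmetic. The following is only an illustrative sketch: the function names `growth_3d` and `pair_count` are made up for this example, and the pair count assumes that "relations" means unordered pairs of items, n(n-1)/2.

```c
#include <assert.h>
#include <stdint.h>

/* Growth in the number of grid cells when every axis of a 3-D grid
   is refined by the same factor: factor * factor * factor. */
uint64_t growth_3d(uint64_t factor)
{
    assert(factor <= 2000000u);          /* keeps the cube inside 64 bits */
    return factor * factor * factor;
}

/* Number of unordered pairs among n items: n * (n - 1) / 2.
   The even one of n and n - 1 is halved first to delay overflow. */
uint64_t pair_count(uint64_t n)
{
    assert(n <= 6000000000ULL);          /* keeps the result inside 64 bits */
    if (n < 2)
        return 0;
    return (n % 2 == 0) ? (n / 2) * (n - 1) : n * ((n - 1) / 2);
}
```

Calling `growth_3d(10)` gives 1000, and `pair_count(3200000000ULL)` gives roughly 5.12E18, consistent with the estimate of about 5E18 relations for the human genome.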
How can you solve these problems?
• Take advantage of parallelism
– Large problems generally have many operations
which can be performed concurrently
• Parallelism can be exploited at many levels by
the computer hardware
– Within the CPU core, multiple functional units,
– Within the Chip, many cores
– On a node, multiple chips
– In a system, many nodes
• Parallelism has overheads
– At the core and chip level the cost is
complexity and money
– Most applications get only a fraction of peak
performance (10%-20%)
– At the chip and node level, the memory bus can
saturate when too many cores share it
– Between nodes, the communication
infrastructure is typically much slower than the
Necessity Yields Modest Success
• Power of CPUs keeps growing
• Parallel programming environments
changing very slowly – much harder than
Two standards have emerged
• MPI library, for processes that do not share memory
• OpenMP directives, for processes that do
share memory
Why MPI?
• MPI = “Message Passing Interface”
• Standard specification for message-passing libraries
• Very Portable
• Libraries available on virtually all parallel
• Free libraries also available for networks
of workstations or commodity clusters
Why OpenMP?
• OpenMP is an application programming
interface (API) for shared-memory
• Based on model of creating and
scheduling multi-threaded computations.
• Supports higher performance parallel
programming of symmetrical
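As a rough illustration of the directive style OpenMP uses, the sketch below sums an array with a single `parallel for` directive. The function name `array_sum` is hypothetical. The code is written so that a compiler without OpenMP support simply ignores the pragma and runs the loop serially.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of a[0..n-1]. The directive asks an OpenMP compiler to split the
   loop iterations among threads and to combine the per-thread partial
   sums at the end. Without OpenMP the pragma is ignored and the loop
   runs serially. */
double array_sum(const double *a, long n)
{
    double sum = 0.0;
    long i;

    assert(a != NULL || n == 0);

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += a[i];

    return sum;
}
```

With GCC, OpenMP is typically enabled with the `-fopenmp` flag. Because floating-point addition is not associative, the threaded result can differ from the serial one in the last digits.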
What are the Costs?
Commercial Parallel Systems
• Relatively costly per processor
• Primitive programming environments
• Scientists looked for alternative
Beowulf Concept circa 1994
• NASA project (written by Sterling and Becker)
• Commodity processors
• Commodity interconnect
• Linux operating system
• Message Passing Interface (MPI) library
• High performance/$ for certain applications
How are they Programmed?
Task Dependence Graph
• Begin with Directed graph
• Vertices = tasks, edges = dependences
• Edges are removed as tasks complete
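One way to picture "edges are removed as tasks complete" is a small serial scheduler that repeatedly picks a task with no remaining incoming edges. This is only a sketch under assumptions: the adjacency-matrix representation, the `MAX_TASKS` limit, and the name `schedule_tasks` are all invented for the example.

```c
#include <assert.h>

#define MAX_TASKS 16

/* dep[u][v] != 0 means there is an edge u -> v: task v depends on u.
   Writes a valid execution order into order[] and returns the number
   of tasks scheduled (fewer than n if the graph contains a cycle). */
int schedule_tasks(int n, int dep[MAX_TASKS][MAX_TASKS], int order[])
{
    int indegree[MAX_TASKS] = {0};
    int done[MAX_TASKS] = {0};
    int scheduled = 0;
    int u, v;

    assert(n >= 0 && n <= MAX_TASKS);

    /* Count the incoming edges of every task. */
    for (u = 0; u < n; u++)
        for (v = 0; v < n; v++)
            if (dep[u][v])
                indegree[v]++;

    while (scheduled < n) {
        /* Find a task whose incoming edges have all been removed. */
        int ready = -1;
        for (v = 0; v < n; v++) {
            if (!done[v] && indegree[v] == 0) {
                ready = v;
                break;
            }
        }
        if (ready < 0)
            break;                      /* cycle: no task is ready */

        order[scheduled++] = ready;
        done[ready] = 1;

        /* Completing the task removes its outgoing edges. */
        for (v = 0; v < n; v++)
            if (dep[ready][v])
                indegree[v]--;
    }
    return scheduled;
}
```

At any point, all tasks with an in-degree of zero are independent of each other, so a parallel scheduler could run them at the same time instead of one by one.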
Data Parallelism
• Independent tasks apply same operation to different elements of a
data set
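A minimal sketch of data parallelism, assuming the data set is an array split into contiguous blocks, one per task. The helper names `block_low`, `block_size`, and `square_block` are chosen for this example. Squaring stands in for "the same operation".

```c
#include <assert.h>

/* First index owned by task id when n elements are split among p tasks. */
long block_low(long id, long p, long n)
{
    assert(p > 0 && id >= 0 && id <= p && n >= 0);
    return id * n / p;
}

/* Number of elements owned by task id (0 <= id < p). */
long block_size(long id, long p, long n)
{
    return block_low(id + 1, p, n) - block_low(id, p, n);
}

/* The work of one task: apply the same operation (here, squaring)
   to every element of its own block and to nothing else. */
void square_block(double *x, long id, long p, long n)
{
    long i;
    for (i = block_low(id, p, n); i < block_low(id + 1, p, n); i++)
        x[i] = x[i] * x[i];
}
```

Because the blocks do not overlap, the p calls `square_block(x, 0, p, n)` through `square_block(x, p - 1, p, n)` are independent and could be executed by p different threads or processes.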
Functional Parallelism
• Independent tasks apply different operations to different data
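Functional parallelism can be sketched with OpenMP `sections`, where each section is a different operation on different data. The function `sum_and_max` and its choice of operations are assumptions made for illustration. Without OpenMP support the two sections simply run one after the other.

```c
#include <assert.h>

/* Two unrelated operations on two different arrays: the sum of a[] and
   the maximum of b[]. Each section is an independent task, so an
   OpenMP compiler may run them on different threads. */
void sum_and_max(const double *a, int na, const double *b, int nb,
                 double *sum_out, double *max_out)
{
    double sum = 0.0;
    double max;

    assert(na >= 0 && nb >= 1);
    max = b[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            int i;                      /* only this section writes sum */
            for (i = 0; i < na; i++)
                sum += a[i];
        }
        #pragma omp section
        {
            int j;                      /* only this section writes max */
            for (j = 1; j < nb; j++)
                if (b[j] > max)
                    max = b[j];
        }
    }

    *sum_out = sum;
    *max_out = max;
}
```

The two sections never touch the same variable, which is what makes them safe to run concurrently.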
• Divide a process into stages
• Produce and consume several items simultaneously
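The benefit of dividing a process into stages can be estimated with a simple count of time steps. The sketch below assumes an idealized pipeline in which every stage takes exactly one time step and each stage works on a different item at the same time. The function names are hypothetical.

```c
#include <assert.h>

/* Time steps when each item passes through all stages before the next
   item starts. */
long serial_steps(long stages, long items)
{
    assert(stages >= 1 && items >= 0);
    return stages * items;
}

/* Time steps with pipelining: the first item needs `stages` steps,
   and each later item finishes one step after the previous one. */
long pipelined_steps(long stages, long items)
{
    assert(stages >= 1 && items >= 0);
    return items == 0 ? 0 : stages + items - 1;
}
```

For 3 stages and 4 items this gives 12 steps without pipelining and 6 with it. As the number of items grows, the pipelined version approaches one finished item per time step.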
Why not just use a Compiler?
• Parallelizing compiler - Detect parallelism in sequential program
• Produce parallel executable program
• Can leverage millions of lines of existing serial programs
• Saves time and labor
• Requires no retraining of programmers
• Sequential programming easier than parallel programming
• Parallelism may be irretrievably lost when programs written in
sequential languages
• Simple example: Compute all partial sums in an array
• Performance of parallelizing compilers on broad range of
applications still up in the air
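The partial-sums example can be made concrete. In the usual sequential loop every iteration needs the result of the previous one, which hides the parallelism from a compiler. The second function below is one possible reformulation, a step-doubling scan, in which the iterations of the inner loop are independent of one another. Both function names and the `MAX_N` limit are assumptions for this sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 1024

/* Sequential form: iteration i needs the result of iteration i - 1. */
void prefix_sums_seq(const int *a, int *out, int n)
{
    int i;

    assert(n >= 0);
    if (n == 0)
        return;
    out[0] = a[0];
    for (i = 1; i < n; i++)
        out[i] = out[i - 1] + a[i];
}

/* Step-doubling form: in each pass, next[] is computed only from the
   previous pass, so the iterations of the inner loop are independent
   and could be shared among threads. */
void prefix_sums_doubling(const int *a, int *out, int n)
{
    int cur[MAX_N], next[MAX_N];
    int i, stride;

    assert(n >= 0 && n <= MAX_N);
    memcpy(cur, a, (size_t)n * sizeof(int));
    for (stride = 1; stride < n; stride *= 2) {
        for (i = 0; i < n; i++)
            next[i] = (i >= stride) ? cur[i] + cur[i - stride] : cur[i];
        memcpy(cur, next, (size_t)n * sizeof(int));
    }
    memcpy(out, cur, (size_t)n * sizeof(int));
}
```

The step-doubling version does more total additions (on the order of n log n instead of n) but needs only about log2(n) passes, which is the kind of trade-off a parallelizing compiler cannot easily discover from the sequential loop.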
Can we Extend Existing Languages?
Programmer can give directives or clues to the
compiler about how to parallelize
• Easiest, quickest, and least expensive
• Allows existing compiler technology to be
• New libraries can be ready soon after new
parallel computers are available
• Lack of compiler support to catch errors
• Easy to write programs that are difficult to debug
Or Create New Parallel Languages?
• Allows programmer to communicate parallelism
to compiler directly
• Improves probability that executable will achieve
high performance
• Requires development of new compilers
• New languages may not become standards
• Programmer resistance
Where are we in 2008?
• Performance makes Low-level approaches
• Augment existing language with low-level
parallel constructs and directives
• MPI and OpenMP are prime examples
• Efficiency
• Portability
• More difficult to program and debug
Programming Assignment #1
• Log into beowulf.linc.uc.edu and run some
simple sample programs.
Reading Assignment #1 on