Toward Using Higher-Level Abstractions to Teach Parallel Computing
Clayton Ferner, University of North Carolina Wilmington
601 S. College Rd., Wilmington, NC 28403 USA
cferner@uncw.edu
Barry Wilkinson, University of North Carolina Charlotte
9201 Univ. City Blvd., Charlotte, NC 28223 USA
abw@uncc.edu
Barbara Heath, East Main Evaluation & Consulting, LLC
P.O. Box 12343, Wilmington, NC 28405 USA
bheath@emeconline.com
May 20, 2013
Outline
 Problem Addressed – Raising the level of abstraction in Parallel Programming courses
 Patterns
 Seeds Framework
 Paraguin Compiler
 Surveys/Results
 Conclusions
 Future Work
Problem Addressed
 Parallel computing has typically been treated as an advanced topic in computer science.
 Most parallel computing classes use low-level tools:
 MPI for distributed-memory systems
 OpenMP for shared-memory systems
 CUDA/OpenCL for high-performance GPU computing
Problem Addressed
 These low-level tools do not give students the skills to tackle larger problems
 They do not build students’ skills in computational thinking for parallel applications
 Students must deal with issues such as deadlock, race conditions, and mutual exclusion
 We need to raise the level of abstraction
Pattern Programming Concept
 Programmer begins with established computational patterns that provide a structure
 Reusable solutions to commonly occurring problems
 Patterns provide a guide to “best practices,” not a final implementation
 Provides a good, scalable design structure
 Makes it easier to reason about programs
Pattern Programming Concept
 Potential for automatic conversion into executable code, avoiding low-level programming
 Particularly useful for the complexities of parallel/distributed computing
What Kind of Patterns Are We Talking About?
 Low-level algorithmic patterns: fork-join, broadcast/scatter/gather.
 Higher-level algorithmic patterns: workpool, pipeline, stencil, map-reduce.
 We concentrate on the higher-level patterns
Patterns (Workpool)
[Diagram of the workpool pattern: a master (source/sink) exchanges tasks and results with compute nodes over two-way connections]
Patterns (Pipeline)
[Diagram of the pipeline pattern: Stage 1, Stage 2, and Stage 3 compute nodes linked in sequence by one-way connections, with a master (source/sink)]
Patterns (Stencil)
[Diagram of the stencil pattern: compute nodes exchanging values with neighbors over two-way connections, with a master (source/sink)]
Patterns (Divide and Conquer)
[Diagram of the divide-and-conquer pattern: a master (source/sink) atop a tree of compute nodes; work is divided on the way down and merged on the way back up]
Patterns (All-to-All)
[Diagram of the all-to-all pattern: every compute node connected to every other by two-way connections, with a master (source/sink)]
Two Approaches
 The first approach uses a new software environment called Seeds
 Developed at UNCC
 Creates a higher level of abstraction for parallel and distributed programming based upon a pattern programming approach
 The second approach uses compiler directives (the Paraguin compiler)
 Developed at UNCW
 Similar to OpenMP but creates MPI code for a distributed-memory system
Seeds Framework
 Programmer selects the pattern and
 specifies the data to be sent to and from the processors/processes
 specifies the computation to be performed by the processors/processes
 The framework will
 automatically distribute tasks across distributed computers and processors
 self-deploy on distributed computers, clusters, and multicore processors, or a combination of distributed- and shared-memory computers
Advantages
 Programmer does not program using low-level message-passing APIs
 Programmer is relieved of concerns about message-passing deadlock
 Programmer has a very simple programming interface
Seeds Framework
 Programmer implements three methods:
 DiffuseData – distributes pieces of data to the workers
 Compute – performs the actual computation
 GatherData – gathers the results
 Plus, the programmer completes a “bootstrap” class to deploy and start the framework with the selected pattern
Example: Monte Carlo Method for Estimating π
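Note: random points are drawn uniformly in the unit square; the fraction landing inside the quarter circle x² + y² ≤ 1 approaches its area, π/4, so π ≈ 4 × (points inside)/(points total).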
package edu.uncc.grid.example.workpool;
... // import statements

public class MonteCarloPiModule extends Workpool {
    private static final long serialVersionUID = 1L;
    private static final int DoubleDataSize = 1000;
    double total;
    int random_samples;
    Random R;
Example: Monte Carlo Method for Estimating π
    public MonteCarloPiModule() {
        R = new Random();
    }

    public void initializeModule(String[] args) {
        total = 0;
        random_samples = 3000; // random samples
    }
Example: Monte Carlo Method for Estimating π
    public Data Compute(Data data) {
        DataMap<String, Object> input = (DataMap<String, Object>) data;
        DataMap<String, Object> output = new DataMap<String, Object>();
        Long seed = (Long) input.get("seed");
        Random r = new Random();
        r.setSeed(seed);
        Long inside = 0L;
        …
Example: Monte Carlo Method for Estimating π
        for (int i = 0; i < DoubleDataSize; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            double dist = x * x + y * y;
            if (dist <= 1.0) ++inside; // point falls inside the quarter circle
        }
        output.put("inside", inside);
        return output;
    }
Example: Monte Carlo Method for Estimating π
    public Data DiffuseData(int segment) {
        DataMap<String, Object> d = new DataMap<String, Object>();
        d.put("seed", R.nextLong());
        return d; // returns a random seed for each job unit
    }
Example: Monte Carlo Method for Estimating π
    public void GatherData(int segment, Data dat) {
        DataMap<String, Object> out = (DataMap<String, Object>) dat;
        Long inside = (Long) out.get("inside");
        total += inside; // aggregate answer from all the worker nodes
    }
Example: Monte Carlo Method for Estimating π
    public double getPi() {
        // fraction of samples inside the quarter circle, times 4
        double pi = (total / (random_samples * DoubleDataSize)) * 4;
        return pi;
    }

    public int getDataCount() {
        return random_samples;
    }
} // end class MonteCarloPiModule
Example: Monte Carlo Method for Estimating π
package edu.uncc.grid.example.workpool;
… // import statements

public class RunMonteCarloPiModule {
    public static void main(String[] args) {
        try {
            MonteCarloPiModule pi = new MonteCarloPiModule();
            Seeds.start("/path-to-seeds-folder", false);
            PipeID id = Seeds.startPattern(new Operand(
                (String[]) null, new Anchor("hostname",
                Types.DataFlowRoll.SINK_SOURCE), pi));
Example: Monte Carlo Method for Estimating π
            System.out.println(id.toString());
            Seeds.waitOnPattern(id);
            System.out.println("The result is: " + pi.getPi());
            Seeds.stop();
        } catch
        … // exceptions
    }
}
Compiler Directive Approach (Paraguin Compiler)
 The Paraguin compiler is being developed at UNCW to produce parallel code
 Suitable for distributed-memory systems
 Uses MPI
 Interface is through directives (#pragmas)
 Similar to OpenMP
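As a point of reference, the loop nest in the matrix multiplication example on the next slides could be parallelized for shared memory with a single OpenMP directive. A minimal sketch (our illustration, assuming i, j, k, N, a, b, and c are declared as on those slides):

/* OpenMP sketch for comparison (shared memory only) */
#pragma omp parallel for private(j, k)
for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        for (k = 0; k < N; k++)
            c[i][j] = c[i][j] + a[i][k] * b[k][j];

The Paraguin directives play the analogous role, but generate MPI message passing for distributed memory.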
Example: Matrix Multiplication
#pragma paraguin begin_parallel
#pragma paraguin bcast a b
#pragma paraguin forall C p i j k \
                 0x0 -1  1 0x0 0x0 \
                 0x0  1 -1 0x0 0x0
#pragma paraguin gather 1 C i j k \
                 0x0 0x0 0x0  1 \
                 0x0 0x0 0x0 -1
Example: Matrix Multiplication
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        for (k = 0; k < N; k++) {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
    }
}
#pragma paraguin end_parallel
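For a sense of what the directives save, here is a minimal hand-written MPI version of the same computation. This is a sketch under our own simplifying assumptions (N divisible by the number of processes; a and b initialized on rank 0 before the broadcasts), not the code the Paraguin compiler actually emits:

#include <mpi.h>

#define N 512
double a[N][N], b[N][N], c[N][N];

int main(int argc, char *argv[]) {
    int p, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &p);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    int rows = N / nprocs;               /* block of rows per process */

    /* corresponds to "bcast a b" */
    MPI_Bcast(a, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(b, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* corresponds to the partitioned forall:
       each process computes its own block of rows of c */
    for (int i = p * rows; i < (p + 1) * rows; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];

    /* corresponds to the gather: collect the row blocks on the master */
    if (p == 0)
        MPI_Gather(MPI_IN_PLACE, rows * N, MPI_DOUBLE,
                   c, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    else
        MPI_Gather(&c[p * rows][0], rows * N, MPI_DOUBLE,
                   NULL, 0, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}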
Surveys
 During the Fall 2012 semester, we administered three surveys (pre-, mid-, and post-course)
 Feedback was collected by the external evaluator
 Students who provided consent and completed all three surveys were entered in a drawing for one of eight $25 Amazon gift cards
 For each survey, 58 invitations were sent to students at both campuses
 The response rates for the three surveys were 36%, 29%, and 28%, respectively
Pre- and Post-test Survey Questions
 The purpose of the pre- and post-semester surveys was to assess the degree to which the students learned the material
 A set of seven items was developed for this purpose
 The items were presented on a six-point Likert scale from “strongly disagree” (1) through “strongly agree” (6)
Pre- and Post-test Survey Questions
Item
I am familiar with the topic of parallel patterns for structured parallel programming.
I am able to use the pattern programming framework to create a parallel implementation of an algorithm.
I am familiar with the CUDA parallel computing architecture.
I am able to use the CUDA parallel computing architecture.
I am able to use MPI to create a parallel implementation of an algorithm.
I am able to use OpenMP to create a parallel implementation of an algorithm.
I am able to use the Paraguin compiler (with compiler directives) to create a parallel implementation of an algorithm.
Results of Likert-Scale Questions
(Mean (sd); Pre: N=21, Post: N=16)

I am familiar with the topic of parallel patterns for structured parallel programming.
    Pre: 2.74 (1.59)    Post: 4.44 (1.09)
I am able to use the pattern programming framework to create a parallel implementation of an algorithm.
    Pre: 2.38 (1.60)    Post: 4.25 (0.86)
I am familiar with the CUDA parallel computing architecture.
    Pre: 2.29 (1.55)    Post: 4.63 (0.72)
I am able to use the CUDA parallel computing architecture.
    Pre: 1.95 (1.43)    Post: 4.44 (0.89)
I am able to use MPI to create a parallel implementation of an algorithm.
    Pre: 2.24 (1.26)    Post: 4.88 (0.81)
I am able to use OpenMP to create a parallel implementation of an algorithm.
    Pre: 2.19 (1.12)    Post: 5.06 (1.24)
I am able to use the Paraguin compiler (with compiler directives) to create a parallel implementation of an algorithm.
    Pre: 1.76 (0.89)    Post: 4.13 (1.15)
Results
 Beginning of the semester: students indicated that they mostly did not feel able to use the tools to implement algorithms in parallel
 End of the semester: students were mostly confident in their ability to implement parallel algorithms using the tools
 Naturally, this is expected
Results
 What was not expected: students indicated greater confidence in using the lower-level parallel tools (MPI, OpenMP, and CUDA) than in using our new approaches (patterns and the Paraguin compiler)
 There are two possible explanations:
 1) the tools need improvement to be easier to use; and
 2) students preferred the flexibility and control of the lower-level tools
 Based upon the next set of data, both explanations appear to be true
Open-Ended Questions
 The students were asked to provide open-ended comments comparing and contrasting the various methods:
 Describe the benefits and drawbacks between the following methods: Pattern Programming (Assignment 1) and MPI (Assignment 2).
 Describe the benefits and drawbacks between the following methods: Pattern Programming (Assignment 1) and Paraguin Compiler Directives (Assignment 3).
 Describe the benefits and drawbacks between the following methods: MPI (Assignment 2) and Paraguin Compiler Directives (Assignment 3).
Results of Open-Ended Questions
 All of the open-ended answers are included in the paper
 This answer is representative of the rest:
 “Using the seeds framework for pattern programming made it easy implement patterns for workflow. However, seeds works at such a high level that I do not understand how it implements the patterns. MPI gave me much more control over how program divides the workflow, but it can often be difficult to write a complex program that requires a lot of message passing between systems.”
Results of Open-Ended Questions
 Computer science students are accustomed to having control
 The Seeds framework takes control away from the user when working within the “basic” layer (which the students were using)
 The framework is constructed in three layers:
 the “basic” layer
 the “advanced” layer
 the “expert” layer
Results of Open-Ended Questions
 All of the open-ended answers are included in the paper
 This answer is representative of the rest:
 “I found Paraguin to be useful because it eliminated the need for me to write the more complex message passing routines in MPI. However, I found it difficult to debug errors in the code and also determine the correct loop dependencies.”
Results of Open-Ended Questions
 Some found the Paraguin compiler easier to use and some did not
 The Paraguin compiler generates MPI code, so programmers can see how their program is implemented and are free to modify it
 We feel that the concerns about Paraguin being complicated are legitimate
 The compiler was designed to provide significant flexibility in partitioning nested loops
 This flexibility overly complicated the partitioning directive
Relative Difficulty
 Students were asked to rate the relative difficulty of using the Seeds framework, MPI, and the Paraguin compiler, from (1) very difficult to (6) very easy
 Students felt that the Seeds framework was the easiest, while the Paraguin compiler was the most difficult

                                 Mean (sd)
Pattern Programming              3.63 (0.89)
MPI                              3.25 (1.13)
Paraguin Compiler Directives     2.56 (1.26)
Conclusions
 Students would benefit from using the “advanced” layer of the Seeds framework
 Students would see how their code is implemented
 Students would feel in “control” of the implementation
 The compiler directives of the Paraguin compiler are being redesigned to make them easier to use (work is almost complete)
Conclusions
 Given the feedback, we feel confident we can overcome the obstacles and achieve our goal
Future Work
 Our course will be offered again in Fall 2013, with changes to address what we have learned
 We will again measure the effectiveness of our methods
 Spring 2013 will serve as a “control”:
 Parallel computing taught using traditional methods at UNCC
 Similar surveys were administered
Future Work
 Paraguin directives have been redesigned and reimplemented to make them easier to use:

#pragma paraguin begin_parallel
#pragma paraguin scatter A B
#pragma paraguin forall
…
#pragma paraguin gather C
#pragma paraguin end_parallel
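(The scatter directive here presumably distributes sections of A and B among the processes, in the spirit of MPI_Scatter, rather than broadcasting complete copies the way bcast does.)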
Future Work
 Paraguin compiler directives to describe patterns will be introduced:

#pragma paraguin pattern(workpool)
#pragma paraguin begin_master
…
#pragma paraguin end_master
#pragma paraguin begin_worker
…
#pragma paraguin end_worker
Questions?
Clayton Ferner
Department of Computer Science
University of North Carolina Wilmington
cferner@uncw.edu
http://people.uncw.edu/cferner/