Final Project

530 Project Fall 2014
Multicore – Manycore – GPU Architecture
This final group project explores parallel trends in computing architecture,
mainly multicore, manycore, and GPU computing. The idea is to get a "feel"
for parallelization: its problems, the available tools, and their limitations.
You will be exposed to parallel computing using modern programming
tools. The recommended tools are Intel's Parallel Studio suite (MPI,
Cilk, TBB) and NVIDIA's CUDA programming environment. Final demos
and the report are due in class as shown in Deliverables. This is your chance to
learn something new; use it wisely. On the assignment menu you will find a
technical report from UC Berkeley on parallel computing. This is required
reading! You can also use Intel tools such as TBB and VTune, or use C# for
parallelization. Suggested test problems: NQueens, Go. Suggested applications:
Quicksort, eigenvalue problems, MapReduce, ambient occlusion, Ocean, heat equation.
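As an illustration of how a test problem like NQueens can be split across cores, here is a minimal C++ sketch. It assumes plain `std::thread` rather than the recommended tools; in TBB or Cilk the manual thread loop would become something like `tbb::parallel_for` or `cilk_for`. Each thread takes a subset of the first-row columns and counts solutions with the usual bitmask backtracking. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Counts NQueens completions from a partial board using bitmasks.
// full: n low bits set; cols: occupied columns; d1/d2: columns attacked
// along the two diagonal directions in the current row.
static long long count_from(std::uint32_t full, std::uint32_t cols,
                            std::uint32_t d1, std::uint32_t d2) {
    if (cols == full) return 1;  // one queen in every column: a full solution
    long long count = 0;
    std::uint32_t free_cols = full & ~(cols | d1 | d2);
    while (free_cols != 0) {
        const std::uint32_t bit = free_cols & (~free_cols + 1u);  // lowest free column
        free_cols ^= bit;
        count += count_from(full, cols | bit, (d1 | bit) << 1, (d2 | bit) >> 1);
    }
    return count;
}

// Hypothetical helper: splits the search by the column of the row-0 queen.
// Thread t handles columns t, t + p, t + 2p, ... (static round-robin).
long long count_nqueens_parallel(int n, unsigned num_threads) {
    assert(n >= 1 && n < 32);
    assert(num_threads >= 1);
    const std::uint32_t full = (1u << n) - 1u;
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&partial, full, n, num_threads, t] {
            long long local = 0;  // accumulate locally to avoid false sharing
            for (unsigned c = t; c < static_cast<unsigned>(n); c += num_threads) {
                const std::uint32_t bit = 1u << c;
                local += count_from(full, bit, bit << 1, bit >> 1);
            }
            partial[t] = local;
        });
    }
    for (auto& w : workers) w.join();
    long long total = 0;
    for (long long p : partial) total += p;  // combine per-thread counts
    return total;
}
```

For example, `count_nqueens_parallel(8, 4)` should return 92, the same count as with one thread; only the running time should change. Round-robin assignment of columns is a simple static schedule. The subtrees differ in size, so a work-stealing scheduler such as TBB's may balance the load better.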
Each group needs a reasonable and distinct application to parallelize for
the project, and must test it by running it with a varying number of cores.
You can use NQueens and matrix multiply as tests and as points of
comparison.
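To make the comparison across core counts concrete, speedup is usually reported as S(p) = T(1) / T(p), where T(p) is the wall-clock time with p threads. The sketch below assumes plain C++ with `std::thread` and uses hypothetical function names. It multiplies two n x n matrices with the rows divided among p threads and computes S(p) for p = 1 up to a chosen maximum.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Row-major n x n matrix stored in a flat vector.
using Matrix = std::vector<double>;

// Hypothetical helper: c = a * b, with the rows of c split into contiguous
// blocks, one block per thread. Threads write disjoint rows, so no locking.
void matmul_threads(const Matrix& a, const Matrix& b, Matrix& c,
                    std::size_t n, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    c.assign(n * n, 0.0);
    const std::size_t rows_per = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * rows_per;
        const std::size_t end = std::min(n, begin + rows_per);
        if (begin >= end) break;  // more threads than rows
        workers.emplace_back([&a, &b, &c, n, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                for (std::size_t k = 0; k < n; ++k) {
                    const double aik = a[i * n + k];
                    for (std::size_t j = 0; j < n; ++j)
                        c[i * n + j] += aik * b[k * n + j];  // i-k-j order: unit stride
                }
        });
    }
    for (auto& w : workers) w.join();
}

// Wall-clock seconds for one multiply with the given thread count.
double time_matmul(const Matrix& a, const Matrix& b, Matrix& c,
                   std::size_t n, unsigned num_threads) {
    const auto start = std::chrono::steady_clock::now();
    matmul_threads(a, b, c, n, num_threads);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Speedup S(p) = T(1) / T(p) for p = 1..max_threads on an n x n problem
// filled with arbitrary deterministic values.
std::vector<double> matmul_speedups(std::size_t n, unsigned max_threads) {
    Matrix a(n * n), b(n * n), c;
    for (std::size_t i = 0; i < n * n; ++i) {
        a[i] = static_cast<double>(i % 7);
        b[i] = static_cast<double>(i % 5);
    }
    const double t1 = time_matmul(a, b, c, n, 1);  // single-thread baseline
    std::vector<double> s;
    for (unsigned p = 1; p <= max_threads; ++p)
        s.push_back(t1 / time_matmul(a, b, c, n, p));
    return s;
}
```

Before trusting any timing, confirm that the product is identical for every thread count. Here each element is accumulated in the same order regardless of p, so the results match exactly. The matrices probably need several hundred rows or more before thread start-up cost stops dominating the measurement.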
Deadlines are chosen to encourage progress on research and application
selection, so that you are "ready" to start the actual parallelization and
carry it through to the end of the semester.
Deliverables:
1. Wed Nov 5 in class: Present initial findings: your programming
environment selection, test bench, and CHOSEN APPLICATION. You must
describe and justify your benchmark approach. Students have written
their application before.
2. Wed Nov 12 in class: Update: confirm or change your selection, report
progress, and present your summary of UC Berkeley Parlab.
3. Wed Dec 10 in class: Final report and presentation.
The following link lists 2010 student projects at UC Berkeley; you can also
explore the latest projects at Berkeley and elsewhere:
http://www.cs.berkeley.edu/~demmel/cs267_Spr11/Syllabus.html
1. Use Intel Parallel Studio tools to implement, analyze, and debug a parallel
application. A selected benchmark can be used for comparison in
your experiments.
a. Must be an application of meaningful size (approved), besides
NQueens and matrix multiply
b. Each group must have a different application besides NQueens
c. Download the 30-day trial version of Parallel Studio
d. Test the program by running it on more than one core, i.e., on a
multicore machine. Vary the number of cores used (1 to 4) and measure
performance relative to a single-core / single-thread baseline.
e. If you're getting the same results for 1, 2, 3, and 4 cores, look again.
You probably have the wrong configuration.
f. You can use the following examples to test your approach and
compare with the final chosen application:
i. NQueens
ii. Write your own
iii. Matrix multiply
2. Use CUDA to implement GPU parallel computing. Be adventurous
and try it. Many people do not understand GPU computing.
a. Must run on the GPU workstation available in the department
b. Must be of meaningful size (approved)
c. Each group must have a separate application
d. Test the program by running it on more than one core, i.e., on a
multicore machine. Vary the number of cores used (1 to 4) and measure
performance relative to a single-core / single-thread baseline.
e. Useful CUDA links:
f. CUDA Programming Guide 3.2:
http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/CUDA_C_Programming_Guide.pdf
g. CUDA Developer Zone:
http://developer.nvidia.com/object/gpucomputing.html
h. Video lectures from UIUC about CUDA and parallel programming:
http://developer.nvidia.com/object/cuda_training.html
3. Useful links
a. Parlab – Parallel computing from Berkeley
http://parlab.eecs.berkeley.edu/2011bootcampagenda
b. CS267
c. http://software.intel.com/sites/products/documentation/studio/studio/en-us/2011/start/index.htm [Parallel Studio getting started]
d. http://software.intel.com/en-us/articles/intel-parallel-studiohome/ [parallel studio home page]
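Item 1.e warns that identical timings for 1, 2, 3, and 4 cores usually point to a configuration problem. One possible diagnostic, sketched under the same plain-C++ assumption with hypothetical names, gives every thread the same fixed amount of CPU-bound work. If the cores are really available, wall time should stay roughly flat as the thread count grows; if the threads are being serialized (for example, on a virtual machine with a single CPU), it grows roughly in proportion to the count.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: a fixed amount of CPU-bound work (64-bit LCG steps).
static std::uint64_t busy_work(std::uint64_t steps) {
    std::uint64_t x = 1;
    for (std::uint64_t i = 0; i < steps; ++i)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x;
}

// Runs p threads that each do the SAME work and records the wall time for
// p = 1..max_threads. Roughly flat times suggest real parallel execution;
// times growing roughly linearly with p suggest the threads share one core.
std::vector<double> concurrency_check(unsigned max_threads, std::uint64_t steps) {
    std::vector<double> times;
    for (unsigned p = 1; p <= max_threads; ++p) {
        std::vector<std::uint64_t> sink(p);  // keeps results so work is not discarded
        std::vector<std::thread> workers;
        const auto start = std::chrono::steady_clock::now();
        for (unsigned t = 0; t < p; ++t)
            workers.emplace_back([&sink, t, steps] { sink[t] = busy_work(steps); });
        for (auto& w : workers) w.join();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        times.push_back(elapsed.count());
    }
    return times;
}
```

`std::thread::hardware_concurrency()` reports how many hardware threads the C++ runtime sees (it may return 0 if this is unknown), which is worth printing alongside these timings.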