530 Project Fall 2014
Multicore – Manycore – GPU Architecture

This final group project covers parallel trends in computing architecture, mainly multicore, manycore, and GPU computing. The idea is to get a "feel" for parallelization: its problems, the available tools, and their limitations. You will be exposed to parallel computing using modern programming tools. The recommended tools are Intel's Parallel Studio suite of tools (MPI, Cilk, TBB) and Nvidia's CUDA programming environment. Final demos and report are due in class as shown in Deliverables. This is your chance to learn something new; use it wisely.

On the assignment menu you will find a technical report from UC Berkeley on parallel computing. This is required reading!

You can use Intel's tools such as TBB and VTune. You can also use C# to generate parallelization.

Problems:
For test: NQueens, GO
For application: Quicksort, eigen, Map-reduce, Ambient Occlusion, Ocean, Heat equation

Each group needs a reasonable application, different from every other group's, to parallelize for the project, and must test it by running it with a varying number of cores. You can use NQueens and Matrix multiply as tests and as a basis for comparison.

Deadlines are chosen to encourage progress on research and application selection, so you're "ready" to start the actual parallelization and carry it through to the end of the semester.

Deliverables:
1. Wed Nov 5 in class: Present initial findings: your programming environment selection, test bench & CHOSEN APPLICATION. You must describe and justify your benchmark approach. Students have written their application before.
2. Wed Nov 12 in class: Update. Confirm your selection or change it, report progress, and present your summary of UC Berkeley Parlab.
3. Wed Dec 10 in class: Final report, presentation

Following is a link to 2010 student projects at UC Berkeley, or explore the latest projects at Berkeley & elsewhere:
http://www.cs.berkeley.edu/~demmel/cs267_Spr11/Syllabus.html

1. Use Intel Parallel Studio tools to implement, analyze, and debug a parallel application.
   A selected benchmark can be used for comparison of your experiments.
   a. Must be a meaningful-size application (approved), besides NQueens & Matrix multiply
   b. Each group must have a different application besides NQueens
   c. Download the 30-day trial version of Parallel Studio
   d. Test the program by running it on more than one core, i.e. on a multicore machine. Vary the number of cores used (1 – 4) and measure performance relative to a single-core / single-thread base.
   e. If you're getting the same results for 1, 2, 3, 4 …. look again. You probably have the wrong configuration.
   f. You can use the following examples to test your approach and compare with the final chosen application:
      i. NQueens
      ii. Write your own
      iii. Matrix multiply

2. Use CUDA to implement GPU parallel computing. Be adventurous and try it. Many people do not understand GPU computing.
   a. Must run on the GPU workstation available in the Dept.
   b. Must be meaningful size (approved)
   c. Each group must have a separate application
   d. Test the program by running it on more than one core, i.e. on a multicore machine. Vary the number of cores used (1 – 4) and measure performance relative to a single-core / single-thread base.
   e. Useful CUDA links:
   f. CUDA Programming Guide 3.2: http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/CUDA_C_Programming_Guide.pdf
   g. CUDA Developers Zone: http://developer.nvidia.com/object/gpucomputing.html
   h. Video lectures from UIUC about CUDA and Parallel Programming: http://developer.nvidia.com/object/cuda_training.html

3. Useful links
   a. Parlab – Parallel computing from Berkeley: http://parlab.eecs.berkeley.edu/2011bootcampagenda
   b. CS267
   c. http://software.intel.com/sites/products/documentation/studio/studio/en-us/2011/start/index.htm [Parallel Studio getting started]
   d. http://software.intel.com/en-us/articles/intel-parallel-studiohome/ [Parallel Studio home page]
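
Items 1d and 2d ask for performance relative to a single-core base as the core count goes from 1 to 4. A minimal sketch of that measurement, using NQueens as the test problem, might look like the following. It is written in Python with the standard library's process pool only to keep it short; Python is not one of the recommended tools, and the function names and the board size N are assumptions made for illustration. The same structure (time the serial base, re-run with 2, 3, and 4 workers, then divide) should carry over to TBB, Cilk, or CUDA.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def count_completions(n, row, cols, diag1, diag2):
    """Count ways to fill rows row..n-1 given the occupied columns and diagonals."""
    if row == n:
        return 1
    total = 0
    for c in range(n):
        if c in cols or (row + c) in diag1 or (row - c) in diag2:
            continue  # square is attacked by an earlier queen
        total += count_completions(
            n, row + 1, cols | {c}, diag1 | {row + c}, diag2 | {row - c}
        )
    return total


def count_with_first_queen(task):
    """One independent task: solutions whose row-0 queen sits in first_col."""
    n, first_col = task
    return count_completions(n, 1, {first_col}, {first_col}, {-first_col})


def count_solutions(n, workers):
    """Count all n-queens solutions, splitting the work by first-row column."""
    tasks = [(n, c) for c in range(n)]
    if workers == 1:
        # Serial base: no pool, so the base time carries no parallel overhead.
        return sum(map(count_with_first_queen, tasks))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_with_first_queen, tasks))


def speedup(base_seconds, seconds):
    """Performance relative to the single-worker base."""
    return base_seconds / seconds


if __name__ == "__main__":
    N = 10  # hypothetical board size; raise it until the serial run takes a few seconds
    base = None
    for workers in (1, 2, 3, 4):
        start = time.perf_counter()
        solutions = count_solutions(N, workers)
        elapsed = time.perf_counter() - start
        base = elapsed if base is None else base
        print(f"{workers} worker(s): {solutions} solutions, "
              f"{elapsed:.3f} s, speedup {speedup(base, elapsed):.2f}x")
```

If the reported speedup stays near 1.00x for every worker count, item 1e applies: check the configuration first, and also consider that the problem may be too small for the parallel overhead to pay off.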