GPU Research Capabilities at Seneca
FSOSS 2012-10-26
Dr. Chris Szalwinski
Professor
School of Information and Communications Technology
Seneca College, Toronto, Canada

A Fresh Initiative
    From Some Personal History To Heterogeneous Computing
    The 80287
    Floating-Point Co-Processor (1985)
    ATI 3D Rage II Co-Processor (1996)
    A Paradigm Shift In Programming

Paradigm Shift
    The Turn Towards Concurrency
    Can still increase transistor density – but it's getting more expensive
    Can't keep increasing processor frequencies – chips remain < 10 GHz
        power consumption – can't melt chips
    The Free Lunch is Over
        we can't just wait for improvement like we did before
        we need new routes to improvement
    Use Different Computational Units For Distinctly Different Tasks

Heterogeneous Computing
    Intel Core i7 (2008), NVIDIA GeForce GTX580 (2010)
    Serial processing + Parallel processing
    NVIDIA many-core GPUs vs Intel multi-core CPUs
        Floating point operations per sec (GFLOP/s)
        Memory bandwidth (GB/s)

Industry Momentum
    STI (Sony + Toshiba + IBM)
        Broadband Cell Processor – CPU + GPU on one chip
    Intel
        Xeon Phi – MIC (Many Integrated Core)
    AMD
        APUs (Fusion) – CPU + GPU on a single chip
        HSA Foundation (2012) – AMD + ARM + TI + Imagination + MediaTek + Samsung + Arteris + MulticoreWare + Apical + Sonics + Symbio + Vivante
        Radeon – Discrete GPUs
    NVIDIA – Discrete GPUs
        GeForce (digital gaming)
        Quadro (engineering workstations – graphics)
        Tesla (scientific computations – double precision)
    Discrete GPUs - Add-in board shipments
    Predictions

Industry Predictions
    Computer Graphics Market 1974-2015
        Traditional processors + low-cost graphics processors enable combinations of science and entertainment
    Embedded Graphics Processors (EGPs) are killing off Integrated Graphics Processors (IGPs)
    Embedded Graphics Processors (EGPs) are no threat to Discrete Graphics

Programming Heterogeneous Computers
    Concurrency-Oriented Programming (COP)
        Core Languages
            Fortran
            C
            C++
        Extensions for COP
            Cilk Plus (Intel)
            OpenCL (Khronos Group – AMD and HSA)
            CUDA C/C++ (NVIDIA)
            Fortran 2008, C-x86 (PGI)
            DirectCompute (Microsoft)
    CUDA Teaching Centers in Ontario
        McMaster University (2010)
            High Performance Parallel Computing on Graphical Processing Units – ECE709 – part of Master's Degree
        University of Toronto (2011)
            Special Topics in Software Engineering: Programming Massively Parallel Graphics Processors – ECE1724H – part of Master's Degree
        Seneca College (2012)
            Introduction to Parallel Programming – Professional Option – GPU610/DPS915 – CPA Diploma and BSD Degree

School of Information and Communications Technology (ICT)
    Our Capabilities and Plans

ICT Facilities
    Fully Equipped Teaching Classroom and Lab
        40 seats
        38 CUDA enabled desktops with GTX480s (480 cores)
    Maximus Workstation
        Quadro 600 for visualization
        Tesla C2075 for computation
    SCI-Net Research Accelerator Research Cluster – research testbed
        8 x [2 Intel Xeon X5550 + 2 NVIDIA Tesla M2070]
    The 80287

ICT Courses
    Introductory Course – Student Skill Set
        Solid tested background in both C and C++
        Profile for computationally intensive code
        Move critical code to the GPU using CUDA
        Optimize to hide memory latency with computations
    Programmer Training Workshops – on demand
    Advanced Course – (in the planning stage)
        Interactive Real-Time Computations + Visualization
        Parallelizing Fortran Applications
        OpenGL, DirectX Graphics Interoperability

ICT Faculty
    Areas of Interest or Domain Expertise
        Big Data – Geocomputation
        Cognition – Cognitive Tutors
        Intrusion Detection – Information Security
        Finite Element Analysis – Soft Matter

ICT Scope
    Areas of Application (source: NVIDIA)
        Image Processing
        Big Data Mining
        Gaming
        Advertising
        Genetics
        Quantum Chemistry
        Mathematics
        Product Design
        Scientific Computing
        Computational Finance
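
Illustration for the Introductory Course skill set ("Move critical code to the GPU using CUDA"). The following is a minimal C++ sketch, not CUDA and not taken from the course material. It assumes SAXPY (y = a*x + y) as a stand-in for a profiled hot spot and shows how a serial loop can be recast as a per-element body indexed by block and thread, which is the general shape a CUDA kernel takes. The function names saxpy_serial, saxpy_element and saxpy_grid, and the block size, are hypothetical; the "grid" here runs sequentially on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference: y[i] = a * x[i] + y[i] for every element.
void saxpy_serial(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Kernel-style body: the work of ONE logical thread.
// block, thread and block_size are hypothetical stand-ins for the
// block index, thread index and block dimension a GPU kernel would see.
void saxpy_element(std::size_t block, std::size_t thread, std::size_t block_size,
                   float a, const std::vector<float>& x, std::vector<float>& y) {
    std::size_t i = block * block_size + thread;  // global element index
    if (i < x.size()) {  // guard: the last block may extend past the data
        y[i] = a * x[i] + y[i];
    }
}

// Host-side stand-in for a kernel launch. On a GPU the iterations of
// these two loops would execute concurrently; here they run one by one.
void saxpy_grid(float a, const std::vector<float>& x, std::vector<float>& y,
                std::size_t block_size) {
    assert(x.size() == y.size());
    assert(block_size > 0);
    std::size_t n = x.size();
    std::size_t blocks = (n + block_size - 1) / block_size;  // round up
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t t = 0; t < block_size; ++t) {
            saxpy_element(b, t, block_size, a, x, y);
        }
    }
}
```

Calling saxpy_grid(2.0f, x, y, 256) on two vectors of equal length should leave y with the same values as saxpy_serial(2.0f, x, y). Because each element is computed independently of the others, the two loops in saxpy_grid are the part a real CUDA port would replace with a kernel launch.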