Scott Finley
ECE 539, Fall 2008
UW-Madison

Modern GPUs have hundreds of "stream processors"
Can now be used for non-graphics computing
  ◦ nVidia CUDA (used for this project)
  ◦ OpenCL

1. Basic Linear Algebra Subprograms (BLAS)
  ◦ CPU-only
2. nVidia's cuBLAS library
  ◦ No explicit GPU use; the library uses the GPU "under the hood"
  ◦ Lots of copies of data from CPU to GPU
3. cuBLAS with CUDA
  ◦ Same cuBLAS use as above; non-BLAS operations done with CUDA

Data from the US Forest Service
Large feature vectors: 54 features
Large number of training samples: 500 per epoch
Two hidden layers
  ◦ Number of neurons per layer varied

[Figure: Time per Epoch (ms), 1 to 10000, vs. Neurons in Hidden Layers, 1 to 100, logarithmic axes. Series: BLAS, cuBLAS, cuBLAS + CUDA]

[Figure: Time per Epoch (ms), 0.1 to 1000, vs. Neurons in Hidden Layers, 1 to 1000, logarithmic axes. Series: BLAS, cuBLAS, cuBLAS + CUDA]

GPU is a very powerful parallel processor
  ◦ Up to two orders of magnitude improvement possible
Much more effective for large computations
Many improvements possible
  ◦ CUDA-only version needed
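
The split between BLAS and non-BLAS work in the three implementations can be illustrated with a minimal sketch of one forward pass. This is not the project's code: it is plain Python with no GPU, and the sigmoid activation, the 16 neurons per hidden layer, the 7 outputs, and the helper names (`gemm`, `sigmoid_inplace`, `forward`) are assumptions made only for illustration. Only the 54 features, the 500 samples per epoch, and the two hidden layers come from the slides.

```python
import math
import random

def gemm(a, b):
    """Plain matrix product C = A * B.
    This is the kind of step a BLAS or cuBLAS GEMM routine would perform."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def sigmoid_inplace(m):
    """Element-wise activation: a non-BLAS operation, i.e. the kind of work
    that approach 3 would move into a custom CUDA kernel (assumed sigmoid)."""
    for row in m:
        for j, v in enumerate(row):
            row[j] = 1.0 / (1.0 + math.exp(-v))
    return m

def random_matrix(rows, cols, rng):
    """Small random values, standing in for real data and weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def forward(batch, weights):
    """Propagate a whole batch through every layer: GEMM, then activation."""
    act = batch
    for w in weights:
        act = sigmoid_inplace(gemm(act, w))
    return act

rng = random.Random(0)
n_features, n_samples = 54, 500   # from the slides
n_hidden, n_outputs = 16, 7       # assumed values for illustration

# Two hidden layers plus an output layer gives three weight matrices.
weights = [
    random_matrix(n_features, n_hidden, rng),
    random_matrix(n_hidden, n_hidden, rng),
    random_matrix(n_hidden, n_outputs, rng),
]
batch = random_matrix(n_samples, n_features, rng)
out = forward(batch, weights)
print(len(out), len(out[0]))
```

Processing all 500 samples as one matrix turns each layer into a single large matrix product, which is the shape of work that benefits most from a GPU. If the activation runs on the CPU between GPU matrix products, each layer's results have to cross the CPU-GPU boundary, which is consistent with the "lots of copies" noted for the cuBLAS-only version.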