Multi-Layer Perceptron On A GPU

Scott Finley
ECE 539 Fall 2008
UW-Madison




Modern GPUs have hundreds of “stream processors”
Can now be used for non-graphics computing
◦ nVidia CUDA (used for this project)
◦ OpenCL
1. Basic Linear Algebra Subprograms (BLAS)
◦ CPU-only
2. nVidia’s cuBLAS library
◦ No explicit GPU use; the library uses the GPU “under the hood”
◦ Lots of copies of data from CPU to GPU
3. cuBLAS with CUDA
◦ Same cuBLAS use as above; non-BLAS operations done with CUDA
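To make the split between BLAS and non-BLAS work concrete, here is a minimal pure-Python sketch of one layer's forward pass. It is not the project's code: `gemm`, `sigmoid_inplace`, and `layer_forward` are hypothetical names, and the sigmoid activation is an assumption, since the slides do not say which activation was used. The matrix product stands in for the call that BLAS or cuBLAS would handle; the element-wise activation is the kind of non-BLAS step that the third version runs in CUDA.

```python
import math


def gemm(a, b):
    """Plain matrix product C = A * B; stands in for a BLAS/cuBLAS GEMM call."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def sigmoid_inplace(m):
    """Element-wise activation (assumed sigmoid); a non-BLAS operation."""
    for row in m:
        for j, v in enumerate(row):
            row[j] = 1.0 / (1.0 + math.exp(-v))
    return m


def layer_forward(weights, inputs):
    """One layer: weights is (neurons x features), inputs is (features x batch)."""
    return sigmoid_inplace(gemm(weights, inputs))
```

In the cuBLAS-only version, a step like `sigmoid_inplace` would run on the CPU, which is one plausible source of the CPU-to-GPU copies noted above.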




Data from the US Forest Service
Large feature vectors: 54 features
Large number of training samples: 500 per epoch
Two hidden layers
◦ Number of neurons per layer varied
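A rough way to see how the workload scales with hidden-layer width is to count the multiply-adds in one epoch's forward pass. The sketch below uses the 54 features and 500 samples per epoch stated above. The function name `forward_macs_per_epoch`, the output count of 7, and the choice to give both hidden layers the same width are assumptions made for illustration; the slides do not state them. Biases and the backward pass are ignored.

```python
def forward_macs_per_epoch(hidden, features=54, samples=500, outputs=7):
    """Multiply-adds in the forward pass for one epoch (biases ignored).

    outputs=7 and equal-width hidden layers are assumptions, not from the slides.
    """
    layer_sizes = [features, hidden, hidden, outputs]
    # One weight per connection between consecutive layers.
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    return samples * weights


for h in (1, 10, 100, 1000):
    print(h, forward_macs_per_epoch(h))
```

Because the hidden-to-hidden weight matrix has `hidden * hidden` entries, the count grows roughly quadratically once the hidden layers are wider than the input.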
[Figure: Time per Epoch (ms) vs. Neurons in Hidden Layers, logarithmic axes, 1 to 100 neurons; series: BLAS, cuBLAS, cuBLAS + CUDA]
[Figure: Time per Epoch (ms) vs. Neurons in Hidden Layers, logarithmic axes, 1 to 1000 neurons; series: BLAS, cuBLAS, cuBLAS + CUDA]

The GPU is a very powerful parallel processor
◦ Up to two orders of magnitude improvement possible
Much more effective for large computations
Many improvements possible
◦ CUDA-only version needed