GPU-accelerated image recognition

Accelerating image recognition
on mobile devices using GPGPU
Miguel Bordallo (1), Henri Nykänen (2), Jari Hannuksela (1), Olli Silvén (1) and Markku Vehviläinen (3)
(1) University of Oulu, Finland
(2) Visidon Ltd., Oulu, Finland
(3) Nokia Research Center, Tampere, Finland
Jari Hannuksela, Olli Silvén
Machine Vision Group, Infotech Oulu
Department of Electrical and Information Engineering
University of Oulu, Finland
MACHINE VISION GROUP
Contents
Introduction
Mobile Image Recognition
• Local Binary Pattern
Graphics processor as a computing engine
GPU accelerated image recognition
• LBP Fragment Shader implementation
• Image preprocessing
Experiments and results
• Speed
• Power consumption
Motivation
• Face detection and recognition is a key
component of future multimodal user interfaces
• Mobile computation power still not harnessed
properly for real-time computer vision
• High demand computations compromise
battery life.
• Need for energy and computationally efficient
solutions
Face analysis using local binary patterns
• Face analysis is one of the major challenges in
computer vision
• LBP method has already been adopted by many
leading scientists
• Excellent results in face recognition and
authentication, face detection, facial expression
recognition, gender classification
Local Binary Pattern
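The figure from this slide did not survive extraction. As a reference point, the basic LBP operator is commonly defined as follows; this is taken from the general LBP literature, not from the slide itself. Here g_c is the gray value of the centre pixel, g_p the gray value of the p-th of P neighbours sampled at radius R, and the symbols are the usual notation rather than anything the deck defines.

```latex
\[
\mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^{p},
\qquad
s(z) =
\begin{cases}
1, & z \ge 0 \\
0, & z < 0
\end{cases}
\]
```

With P = 8 and R = 1 every pixel yields an 8-bit code, so the resulting LBP map fits in a single 8-bit image channel.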
GPU as a computing engine
GPU can be treated as an independent entity
• Newer phones include a GPU chipset
• OpenGL ES as a highly optimized and attractive
accelerator interface
• Emerging platforms (OpenCL EP) will facilitate using
the GPU as a computing resource
• Compatible data formats for graphics and camera subsystems desirable
Fixed pipeline (OpenGL ES 1.1) vs.
programmable pipeline (OpenGL ES 2.0)
Stream processing (OpenGL) vs.
shared memory processing (CUDA)
OpenCL (Embedded Profile)
• Emerging platforms will offer needed flexibility
• OpenCL Embedded Profile is a subset of OpenCL
• Supports data and task parallel programming models
• Code executed concurrently on CPU & GPU (& DSP)
– Other current and future resources are compatible
– Easier programming in a heterogeneous processor
environment
• High parallelization on image processing
computations -> High efficiency
GPU assisted face analysis process
GPU-accelerated image
recognition
• OpenGL ES 2.0:
– Image feature (LBP, ...) extraction
– Image preprocessing
– Image scaling
– Displaying
• C code:
– Camera control
– Classification
LBP fragment shader
implementation
• Two versions:
– Version 1: calculates LBP map in one grayscale channel
– Version 2: calculates 4 LBP maps in RGBA channels
• Access the image via texture lookup
• Fetch the selected picture pixel
• Fetch the neighbours' values
• Compute binary vector
• Multiply by weighting factor
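The per-pixel steps listed above can be sketched on the CPU in plain C. This is a minimal illustration, not the authors' shader: the function names, the clamp-to-edge border handling and the clockwise neighbour ordering are all assumptions, since the slides specify none of them.

```c
#include <stdint.h>
#include <stddef.h>

/* Clamp-to-edge lookup standing in for a texture fetch.
   Border handling is an assumption; the slides do not specify it. */
static uint8_t fetch(const uint8_t *img, int w, int h, int x, int y)
{
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return img[(size_t)y * (size_t)w + (size_t)x];
}

/* 8-neighbour, radius-1 LBP code for one pixel. */
static uint8_t lbp_pixel(const uint8_t *img, int w, int h, int x, int y)
{
    /* Neighbour offsets, clockwise from top-left (ordering is an assumption). */
    static const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    static const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};

    uint8_t center = fetch(img, w, h, x, y);   /* selected pixel */
    unsigned code = 0;

    for (int p = 0; p < 8; ++p) {
        uint8_t n = fetch(img, w, h, x + dx[p], y + dy[p]); /* neighbour */
        unsigned bit = (n >= center) ? 1u : 0u;             /* binary vector */
        code += bit * (1u << p);                            /* weight 2^p */
    }
    return (uint8_t)code;
}

/* Hypothetical driver: fills out[] with the LBP map of a w*h grayscale image. */
void lbp_map(const uint8_t *img, uint8_t *out, int w, int h)
{
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[(size_t)y * (size_t)w + (size_t)x] = lbp_pixel(img, w, h, x, y);
}
```

In a fragment shader the two outer loops disappear, because the GPU invokes the per-pixel body once for every output fragment.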
Preprocessing
• Create quad
• Render each piece in one channel
• Divide texture & convert to grayscale
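One way to read this slide is that the input is converted to grayscale, split into four pieces, and each piece is stored in one channel of a quarter-size RGBA texture, so that the four-channel LBP shader (version 2) can process all pieces at once. The C sketch below shows that packing on the CPU. The function names, the quadrant-to-channel mapping and the luma weights are assumptions; the slides give none of them.

```c
#include <stdint.h>
#include <stddef.h>

/* Integer luma approximation (weights 77/150/29 out of 256).
   The slides do not say which grayscale conversion was used. */
static uint8_t to_gray(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
}

/* rgb:  w*h interleaved RGB pixels, with w and h even.
   rgba: (w/2)*(h/2) interleaved RGBA pixels; channel k receives
         quadrant k, numbered 0 = top-left, 1 = top-right,
         2 = bottom-left, 3 = bottom-right (mapping is an assumption). */
void pack_quadrants(const uint8_t *rgb, uint8_t *rgba, int w, int h)
{
    int hw = w / 2;
    int hh = h / 2;

    for (int y = 0; y < hh; ++y) {
        for (int x = 0; x < hw; ++x) {
            size_t dst = 4 * ((size_t)y * (size_t)hw + (size_t)x);
            for (int k = 0; k < 4; ++k) {
                int sx = x + (k % 2) * hw;   /* source column in quadrant k */
                int sy = y + (k / 2) * hh;   /* source row in quadrant k */
                const uint8_t *p = rgb + 3 * ((size_t)sy * (size_t)w + (size_t)sx);
                rgba[dst + (size_t)k] = to_gray(p[0], p[1], p[2]);
            }
        }
    }
}
```

On the GPU the same result would come from rendering a quad into a quarter-size target and sampling the source texture four times per fragment, once per quadrant.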
Experiments setup
• OMAP 3 family (OMAP3530)
– ARM Cortex A8 CPU
– PowerVR SGX535 GPU
• 3 set-ups:
– BeagleBoard revision 3
– Zoom AM3517EVM (TI Sitara)
– Nokia N900
Processing times: LBP extraction
Size        GPUv1    GPUv2    CPU      CPU & GPUv1   CPU & GPUv2
1024x1024   232 ms   180 ms   100 ms   116 ms        90 ms
512x512     76 ms    46 ms    25 ms    37 ms         23 ms
64x64       2 ms     1.5 ms   0.4 ms   1 ms          0.2 ms
• Computing LBP in four channels (version 2) is faster than computing it in one
• CPU is faster than GPU
• Concurrent execution of algorithms on GPU + CPU increases performance
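As a rough sanity check on the concurrent figures, one can compute the time an ideal split would take if the image were divided so that both processors finish together and no transfer or synchronisation cost existed. This model is an assumption added for illustration and is not from the slides. The helper name `ideal_concurrent_ms` is hypothetical.

```c
/* Ideal wall-clock time when a divisible workload is shared between two
   processors that would take t_a_ms and t_b_ms alone.
   Assumes perfect load balancing and zero transfer overhead. */
double ideal_concurrent_ms(double t_a_ms, double t_b_ms)
{
    return (t_a_ms * t_b_ms) / (t_a_ms + t_b_ms);
}
```

For the 1024x1024 row, `ideal_concurrent_ms(100.0, 180.0)` gives about 64 ms for CPU plus GPUv2, against the 90 ms reported. Under this simplified model the gap would be the cost of transfers and synchronisation.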
Processing times: Preprocessing
Size        GPU      CPU      CPU & GPU
1024x1024   35 ms    100 ms   54 ms
512x512     10 ms    25 ms    15 ms
64x64       0.2 ms   0.4 ms   0.4 ms
• GPU outperforms CPU in simple pixelwise operations
(scaling + interpolation)
• Concurrent execution of algorithms on GPU + CPU is
slower than GPU alone due to data transfers
Speed (II): Preprocessing and LBP extraction
Size        GPU      CPU      GPU preprocessing & CPU LBP extraction
1024x1024   215 ms   205 ms   142 ms
512x512     56 ms    50 ms    40 ms
64x64       1.8 ms   1 ms     0.8 ms
Power and energy consumption
Operation            GPU       CPU
Preprocessing        27 mJ     19 mJ
LBP                  5.3 mJ    10 mJ
Combined algorithm   32.3 mJ   28 mJ
• Power consumption of GPU and CPU is independent
– CPU: 190 mW
– GPU: 110 mW to 130 mW (increases with image size)
• Energy consumption depends on processing time
• GPU uses less energy per operation
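The dependence of energy on processing time follows from E = P x t. The small C helper below makes the unit handling explicit. The function name is hypothetical, and pairing a particular power figure with a particular timing is an assumption, since the slides do not say which image size the energy table refers to.

```c
/* Energy in millijoules from average power in milliwatts and duration in
   milliseconds. mW * ms gives microjoules, hence the division by 1000. */
double energy_mj(double power_mw, double time_ms)
{
    return power_mw * time_ms / 1000.0;
}
```

For example, `energy_mj(190.0, 100.0)` evaluates to 19 mJ, which equals the CPU preprocessing entry in the table if the 190 mW CPU power is combined with the 100 ms CPU preprocessing time measured at 1024x1024.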
Summary
• GPUs can be used as general-purpose processors
• New platforms will offer more efficiency and flexibility
• Non-optimized interfaces introduce excessive overheads
Future directions
• Implementation of classifier
• Implementations in OpenCL
• Multi-scale LBP
• Implementation of other feature extraction
Thank you!
• Any questions?
Thanks to Texas Instruments for the donation of the hardware