Low-Power, Real-Time Object-Recognition Processors for Mobile

“Low-Power, Real-Time Object-Recognition Processors for
Mobile Vision Systems”,
IEEE Micro 2012.
Jinwook Oh ; Gyeonghoon Kim ; Injoon Hong ;
Junyoung Park ; Seungjin Lee ; Joo-Young Kim ;
Jeong-Ho Woo ; Hoi-Jun Yoo
Presenter: Juseong Lee, 2013021037
Outline
• Introduction
• Background
• Main Idea
• Implementation
• Conclusion
• Evaluation
Object Recognition by Juseong Lee
Introduction
Source: MBN News
Introduction
• Object recognition system
– Requires real-time operation
• High performance
• Low power in mobile systems
• How can it be implemented?
– Find a suitable algorithm
• SIFT algorithm
– Hardware optimization
• Algorithm optimization
• Build a dedicated processor
– Parallel computation
• Multi-threading
• NoC
SIFT: Scale-Invariant Feature Transform
NoC: Network-on-Chip
Source: Volvo
Background Knowledge
• What is the SIFT algorithm?
– Scale-Invariant Feature Transform
– The most popular candidate
• for extracting interest points from an object and describing them
– Robust against changes in translation, scaling, and rotation
Figure: Image matching by SIFT
Background Knowledge
• What is the problem with SIFT-based object recognition?
– It consumes a lot of power
• Owing to the heavy computation required in descriptor generation and matching
– Today’s high-resolution image sensors and tight power budgets
• make real-time SIFT implementation on mobile devices even harder
Scarce resources problem
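To make the matching cost concrete, here is a minimal sketch of brute-force nearest-neighbor matching with a ratio test. It is not the method used in the paper's processor. It assumes 128-dimensional descriptors (the standard SIFT size); the function names, the 0.8 ratio, and the random toy data are illustrative assumptions.

```python
import math
import random

DESCRIPTOR_DIM = 128  # standard SIFT descriptor length


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_descriptors(query, database, ratio=0.8):
    """Brute-force matching with a nearest/second-nearest ratio test.

    Returns (matches, distance_evaluations); each match is a pair
    (query_index, database_index).
    """
    matches = []
    evaluations = 0
    for qi, q in enumerate(query):
        best, second, best_index = math.inf, math.inf, -1
        for di, d in enumerate(database):
            dist = squared_distance(q, d)
            evaluations += 1
            if dist < best:
                best, second, best_index = dist, best, di
            elif dist < second:
                second = dist
        # Accept only if the best match is clearly better than the runner-up.
        # Distances are squared, so the ratio is squared as well.
        if best < (ratio ** 2) * second:
            matches.append((qi, best_index))
    return matches, evaluations


def random_descriptor(rng):
    """Hypothetical stand-in for a real descriptor: 128 random values."""
    return [rng.random() for _ in range(DESCRIPTOR_DIM)]


rng = random.Random(0)
database = [random_descriptor(rng) for _ in range(50)]
# Queries are slightly perturbed copies of the first five database entries.
query = [[v + rng.uniform(-0.01, 0.01) for v in database[i]] for i in range(5)]

matches, evaluations = match_descriptors(query, database)
multiply_adds = evaluations * DESCRIPTOR_DIM
print(f"{len(matches)} matches, {evaluations} distance evaluations, "
      f"{multiply_adds} multiply-adds")
```

With N query and M database descriptors, the loop performs N x M distance evaluations of 128 terms each, so the cost grows quickly with image resolution and database size.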
Main Idea
• How can we solve the problem?
– Build an object-recognition processor
• Using an attention-based recognition algorithm
– For energy efficiency
• A heterogeneous multicore architecture
– For data and thread parallelism
• Network-on-Chip (NoC) communication
– For high bandwidth
• The processor determines the Regions of Interest (ROI) in the image
– To minimize unnecessary computation
• Heterogeneous multicore architecture
– Provides several types of parallelism
– Achieves high throughput
– Low power consumption
• A high-bandwidth NoC serves as the communications backbone
Why find ROI?
• Image-processing algorithms process every pixel, with no regard for throughput
Example) Edge detection
Image size: 480 x 360
172,800 computations!
Objects have features!
You can select only part of the image to reduce computation!
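The saving can be illustrated with a small sketch. It counts one computation per pixel, as in the edge-detection example, and compares the full 480 x 360 image with a set of ROI tiles. The 40 x 40 tile size and the number of selected tiles are made-up values for illustration, not figures from the paper.

```python
def full_image_ops(width, height, ops_per_pixel=1):
    """Operations needed when every pixel is processed."""
    return width * height * ops_per_pixel


def roi_ops(tile_size, selected_tiles, ops_per_pixel=1):
    """Operations needed when only the selected ROI tiles are processed."""
    return tile_size * tile_size * selected_tiles * ops_per_pixel


WIDTH, HEIGHT = 480, 360  # image size from the slide
TILE = 40                 # hypothetical tile edge, divides both dimensions
SELECTED = 27             # hypothetical number of tiles marked as ROI

total_tiles = (WIDTH // TILE) * (HEIGHT // TILE)
full = full_image_ops(WIDTH, HEIGHT)
roi = roi_ops(TILE, SELECTED)

print(f"tiles: {SELECTED} of {total_tiles} selected")
print(f"full image: {full:,} computations")
print(f"ROI only:   {roi:,} computations ({roi / full:.0%} of full)")
```

With these assumed values a quarter of the 108 tiles are processed, so the count drops from 172,800 to 43,200.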
Main Idea – BONE-V
Using the conventional method
Using the main idea
Main Idea – Algorithm
• Attention-based object recognition
Main Idea – Architecture
Pixel-level parallelism
Very long instruction word (VLIW)
3-stage task-level pipeline
1.5x↓ power consumption
5-stage fine-grained pipeline
3.45x↑ pipeline throughput
BONE-V5: SMT-enabled heterogeneous multicore processor
• Throughput-optimized SFEC
– Finds ROI tiles for energy efficiency
– Memory locality with high bandwidth utilization
• Latency-optimized FMP
– ROI tiles and the NoC help reduce latency
• Power-optimized MLE
– Dynamically changes the core’s thread allocation, operating voltage, and frequency
SFEC: SMT-enabled Feature Extraction Cluster
FMP: Feature Matching Processor
MLE: Machine Learning Engine
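The benefit of adjusting voltage and frequency together follows from the standard first-order model of CMOS dynamic power, P = C * V^2 * f. The sketch below is not the MLE's control policy. The operating points, the capacitance value, and the selection rule are hypothetical; they only show why a lower operating point saves power when the workload allows it.

```python
def dynamic_power(capacitance, voltage, frequency_hz):
    """First-order CMOS dynamic power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency_hz


# Hypothetical operating points as (voltage in V, frequency in Hz), slowest first.
OPERATING_POINTS = [(0.8, 100e6), (1.0, 150e6), (1.2, 200e6)]
CAPACITANCE = 1e-9  # hypothetical effective switched capacitance in farads


def pick_operating_point(required_hz):
    """Return the slowest operating point that still meets the required frequency."""
    for voltage, frequency in OPERATING_POINTS:
        if frequency >= required_hz:
            return voltage, frequency
    return OPERATING_POINTS[-1]  # saturate at the fastest point


light = pick_operating_point(80e6)
heavy = pick_operating_point(180e6)
p_light = dynamic_power(CAPACITANCE, *light)
p_heavy = dynamic_power(CAPACITANCE, *heavy)
print(f"light load: {light[0]} V, {light[1] / 1e6:.0f} MHz, {p_light * 1e3:.0f} mW")
print(f"heavy load: {heavy[0]} V, {heavy[1] / 1e6:.0f} MHz, {p_heavy * 1e3:.0f} mW")
```

Because voltage enters the model squared, lowering voltage and frequency together reduces power faster than lowering frequency alone.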
Implementation
Implementation - Comparing
Implementation - Comparing
Conclusion
• An energy-efficient system is important for
improving performance
• Algorithm and architecture have to be optimized at
the same time
• BONE-V multicore processors can be applied to real-time object recognition systems
• Future BONE-V processors will further lower
power consumption
Evaluation
• Table 3 should include results comparing the design
with other recognition processors
• For hardware optimization, not only the
overall algorithm but also particular algorithm
blocks need to be optimized
– e.g., CORDIC-based gradient and magnitude computation
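As an illustration of such a block-level optimization, the sketch below computes gradient magnitude and orientation with CORDIC in vectoring mode, which needs only shifts, adds, and a small arctangent table. It is a floating-point model for clarity, not the paper's hardware design; the iteration count and function name are assumptions.

```python
import math

ITERATIONS = 16
# Arctangent lookup table: atan(2^-i) for each iteration.
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
# CORDIC gain accumulated over all iterations; the result is divided by it.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_magnitude_angle(gx, gy):
    """Return (magnitude, angle in radians) of the gradient vector (gx, gy).

    Vectoring mode rotates the vector toward the positive x-axis. In hardware
    the multiplications by 2^-i become right shifts.
    """
    if gx == 0 and gy == 0:
        return 0.0, 0.0  # angle is undefined for the zero vector; return 0
    x, y, angle = float(gx), float(gy), 0.0
    # Pre-rotate vectors in the left half-plane by 180 degrees so that the
    # iterations converge (they cover roughly +/-99.9 degrees).
    if x < 0:
        x, y = -x, -y
        angle = math.pi if gy >= 0 else -math.pi
    for i in range(ITERATIONS):
        shift = 2.0 ** -i
        if y > 0:  # rotate clockwise
            x, y, angle = x + y * shift, y - x * shift, angle + ATAN_TABLE[i]
        else:      # rotate counter-clockwise
            x, y, angle = x - y * shift, y + x * shift, angle - ATAN_TABLE[i]
    return x / GAIN, angle


magnitude, angle = cordic_magnitude_angle(3, 4)
print(magnitude, angle)
```

For the input (3, 4) the result should be close to a magnitude of 5 and an angle of `math.atan2(4, 3)`; more iterations give more accurate results at the cost of more pipeline stages or cycles.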
Thanks for listening!
Juseong_lee@korea.ac.kr