“Low-Power, Real-Time Object Recognition Processors for Mobile Vision Systems”, IEEE Micro, 2012.
Jinwook Oh; Gyeonghoon Kim; Injoon Hong; Junyoung Park; Seungjin Lee; Joo-Young Kim; Jeong-Ho Woo; Hoi-Jun Yoo
Presenter: Juseong Lee, 2013021037

Outline
• Introduction
• Background
• Main Idea
• Implementation
• Conclusion
• Evaluation
Object Recognition by Juseong Lee

Introduction
(Image source: MBN News)

Introduction
• Object recognition systems
  – Require real-time operation
    • High performance
    • Low power in mobile systems
• How can such a system be implemented?
  – Find a suitable algorithm
    • The SIFT algorithm
  – Optimize the hardware
    • Algorithm optimization
    • A dedicated processor
  – Compute in parallel
    • Multithreading
    • NoC
SIFT: Scale-Invariant Feature Transform
NoC: Network-on-Chip
(Image source: VOLVO)

Background Knowledge
• What is the SIFT algorithm?
  – Scale-Invariant Feature Transform
  – The most popular candidate for extracting interest points from an object and describing them
  – Robust against changes in translation, scale, and rotation
(Figure: image matching by SIFT)

Background Knowledge
• What is the problem with SIFT-based object recognition?
  – It consumes a lot of power
    • Owing to the heavy computation required for descriptor generation and matching
  – Today's high-resolution image sensors and tight power budgets
    • Make a real-time SIFT implementation on a mobile device even harder
(Figure: the scarce-resources problem)

Main Idea
• How can we solve the problem?
  – Build a dedicated object-recognition processor that uses
    • An attention-based recognition algorithm, for energy efficiency
    • A heterogeneous multicore architecture, for data and thread parallelism
    • Network-on-Chip (NoC) communication, for high bandwidth
• The processor determines Regions of Interest (ROIs) in the image
  – To minimize unnecessary computation
• The heterogeneous multicore architecture
  – Provides several types of parallelism
  – Achieves high throughput
  – Keeps power consumption low
• A high-bandwidth NoC serves as the communication backbone

Why find ROIs?
• A conventional image-processing algorithm processes the whole image without regard to throughput
  – Example: edge detection on a 480 × 360 image requires 172,800 (= 480 × 360) per-pixel computations
• Objects have distinctive features, so processing only the selected regions reduces computation

Main Idea – BONE-V
(Figure: conventional method vs. main idea)

Main Idea – Algorithm
• Attention-based object recognition

Main Idea – Architecture
(Figure labels: pixel-level parallelism; very long instruction word (VLIW); 3-stage task-level pipeline; 1.5× lower power consumption; 5-stage fine-grained pipeline; 3.45× higher pipeline throughput)

BONE-V5: SMT-enabled heterogeneous multicore processor
• Throughput-optimized SFEC
  – Finds ROI tiles for energy efficiency
  – Exploits memory locality for high bandwidth utilization
• Latency-optimized FMP
  – ROI tiles and the NoC help reduce latency
• Power-optimized MLE
  – Dynamically changes the cores' thread allocation, operating voltage, and frequency
SFEC: SMT-enabled Feature Extraction Cluster
FMP: Feature Matching Processor
MLE: Machine Learning Engine

Implementation

Implementation – Comparison

Conclusion
• An energy-efficient system is essential for improving performance
• Algorithm and architecture must be optimized together
• BONE-V multicore processors can enable real-time object recognition systems
• Future BONE-V processors will further lower power consumption

Evaluation
• Table 3 should include results comparing BONE-V with other recognition processors
• Hardware optimization requires optimizing not only the overall algorithm but also particular algorithm blocks
  – e.g., CORDIC-based gradient and magnitude computation

Thank you for listening!
Juseong_lee@korea.ac.kr
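As an illustration of the Evaluation point about CORDIC-based gradient and magnitude computation, here is a minimal Python sketch. It is not the BONE-V implementation: floating-point arithmetic stands in for the fixed-point shift-and-add datapath a hardware block would use, and the function names, the 16-iteration count, and the sample patch are hypothetical. SIFT descriptor generation needs, for each sample pixel, the gradient magnitude m = sqrt(dx² + dy²) and orientation θ = atan2(dy, dx); CORDIC in vectoring mode obtains both with shifts, additions, and a small arctangent table instead of multipliers, a square root, and a division.

```python
import math


def cordic_vectoring(dx: float, dy: float, iterations: int = 16) -> tuple[float, float]:
    """Hypothetical helper: return (magnitude, orientation in radians) of the
    gradient vector (dx, dy) using CORDIC in vectoring mode."""
    x, y, z = float(dx), float(dy), 0.0
    # Pre-rotate by +/-90 degrees into the right half-plane, where the
    # basic CORDIC iterations converge.
    if x < 0:
        if y >= 0:
            x, y, z = y, -x, math.pi / 2
        else:
            x, y, z = -y, x, -math.pi / 2
    for i in range(iterations):
        shift = 2.0 ** -i        # a right shift by i bits in fixed-point hardware
        step = math.atan(shift)  # would come from a small ROM table in hardware
        if y > 0:  # rotate clockwise to drive y toward zero
            x, y, z = x + y * shift, y - x * shift, z + step
        else:      # rotate counter-clockwise to drive y toward zero
            x, y, z = x - y * shift, y + x * shift, z - step
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); divide out
    # the accumulated gain (a single constant multiply in hardware).
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(iterations))
    return x / gain, z


def central_difference_gradient(patch: list[list[float]], r: int, c: int) -> tuple[float, float]:
    """Hypothetical helper: SIFT-style pixel differences at row r, column c."""
    dx = patch[r][c + 1] - patch[r][c - 1]
    dy = patch[r + 1][c] - patch[r - 1][c]
    return dx, dy


if __name__ == "__main__":
    patch = [[10, 12, 14], [11, 20, 17], [16, 18, 20]]  # hypothetical intensities
    dx, dy = central_difference_gradient(patch, 1, 1)
    mag, theta = cordic_vectoring(dx, dy)
    print(f"CORDIC: m={mag:.4f}, theta={theta:.4f}")
    print(f"math:   m={math.hypot(dx, dy):.4f}, theta={math.atan2(dy, dx):.4f}")
```

Each iteration costs two shifted additions and one table lookup, and precision grows by roughly one bit per iteration, so the iteration count trades accuracy against latency. The final division by the gain is a constant factor that hardware can fold into one multiply.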