Fast 3D Object Recognition in Real-World Environments
Ken Lee, CEO
May 29, 2014
Copyright © 2014 VanGogh Imaging

Company Background
• Founded in 2007
• Located in McLean, VA
• Mission: "Provide real-time 3D computer vision technology for embedded and mobile applications"
• Product: 'Starry Night' 3D-CV middleware
  • Operating systems: Android and Linux
  • 3D sensors: PrimeSense, Kinect, and SoftKinetic
  • Processors: ARM and Xilinx Zynq
• Applications
  • 3D printing, parts inspection, robotics
  • Security, automotive, augmented reality
  • Medical, gaming

Starry Night 3D Middleware

The 'Starry Night' Middleware (Unity Plugin)
• Handles busy real-world environments
• Real-time processing
• Tolerant to noise from low-cost scanners
• Efficient
• Fully automated
• Runs on mobile or portable embedded platforms (ARM & Xilinx Zynq FPGA)
• Released on the Avnet Embedded Software Store: June 2014
Starry Night video: https://www.youtube.com/watch?v=Ro1mv007MHo&feature=youtu.be

The 'Starry Night' Middleware Blocks

The 'Starry Night' Shape-Based Registration
• Reliable: the output is always a fully formed 3D model with known feature points, despite noisy or partial scans
• Easy to use: fully automated process
• Powerful: known data structure for easy analysis and measurement
• Fast: single-step process (not iterative)
Input scan (partial) + reference model = full 3D model

Object Recognition Algorithm

Challenges: Scene
• Busy scene, object orientation, and occlusion

Challenges: Platform
• Mobile and embedded devices: ARM A9 or A15, <1 GB RAM
• Existing libraries were built for laptop/desktop platforms
• GPU processing is not always available
• Therefore, a very efficient algorithm is needed
Previous Approaches Tried
• Texture-based methods
  • Color-based: depends heavily on the lighting and the color of the object
  • Machine learning: robust, but requires training for each object
  • Neither method provides a transform (i.e., orientation)
• 3D methods
  • Hough transform: slow
  • Geometric hashing: even slower
  • Tensor matching: not good for noisy and sparse scenes
• Correspondence-based methods using rigid geometric descriptors
  • The models must have distinctive feature points, which is not true for many models (e.g., a cylinder)

General Concept
• Reference object: build a descriptor from the distances and normals of point pairs
• Scene: take the distance and normals of randomly sampled point pairs
• Compare the two against the match criteria, then fine-tune the resulting transform (orientation and location)

Block Diagram: Example for One Model

Model Descriptor (Pre-processed)
• Sample all point pairs in the model that are separated by the same distance D
• Use the surface normals of the pair to group the pairs into a hash table
• Note: in the bear example, D = 5 cm, which resulted in 1000 pairs
• Note: the keys are angles derived from the normals of the points
• alpha (α): angle of the first normal to the second point
• beta (β): angle of the second normal to the first point
• omega (Ω): angle of the plane between the two points

Key            Point pairs
(α1, β1, Ω1)   (P1, P2), (P3, P4)
(α2, β2, Ω2)   (P5, P6), (P7, P8)
(α3, β3, Ω3)   (P9, P10), (P11, P12), (P13, P14)

Object Recognition of the Model (Real-time)
• Grab the scene
• Sample a point pair with distance D using RANSAC
• Generate a key using the same hash function
• Use the key to retrieve similarly oriented point pairs in the model, along with a rough transform
• Apply the match criteria to find the best match
• Use ICP to refine the transform
• Note: the example scene has around 16K points
• Note: we iterated this sampling process 100 times
• Note: the entire process can be easily parallelized
• Very important: multiple models can be found using a single hash table

Implementation
• Result
Object recognition video: https://www.youtube.com/watch?v=h7whfei0fTw&feature=youtu.be

Performance

Reliability (with the bear model)
• False positives (depend on the scene)
  • Clean scene: <1%
  • Noisy scene: 15%
• Negative results (cannot find the object)
  • Clean scene: <1%
  • Noisy scene: 25% (also takes longer)
• Effect of orientation on the success ratio
  • Model facing front: >99%
  • Model facing backward: >99%
  • Model facing sideways: 65%
(Figure: example of a false positive)

Performance: Mobile
• Performance on a 2 GHz Cortex-A15 ARM (Android mobile)
• Time to find one object
  • Single-thread: 4 seconds
  • Multi-thread + NEON: 1 second
• Time to find two objects
  • Single-thread: 5.2 seconds
  • Multi-thread + NEON: 1.4 seconds

Hardware Acceleration: FPGA (Xilinx Zynq)
• Select functions to be implemented on the Zynq
  • FPGA: matrix operations
  • Dual-core ARM: data management + floating point
• Entire implementation done in C++ (Xilinx Vivado HLS)
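The model-descriptor and real-time recognition steps described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the Starry Night implementation: all function names, the 15° angle quantization, and the distance tolerance are invented for the example, the third key angle Ω is omitted, and the rough-transform estimation and ICP refinement steps are left out.

```python
# Sketch of a pair-distance descriptor hashed by normal angles (pre-processing)
# and RANSAC-style sampling of scene pairs to look up matches (real-time step).
# Binning, tolerances, and names are illustrative assumptions.
import itertools
import math
import random
from collections import defaultdict

BIN = math.radians(15)  # assumed quantization step for the angle keys

def angle(u, v):
    """Angle between two 3D vectors, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def pair_key(p1, n1, p2, n2):
    """Quantized (alpha, beta) key for one point pair.

    alpha: angle of the first normal to the direction toward the second point.
    beta:  angle of the second normal to the direction toward the first point.
    (The slides also use a third angle, omega; omitted here for brevity.)
    """
    d12 = [b - a for a, b in zip(p1, p2)]
    d21 = [-c for c in d12]
    alpha = angle(n1, d12)
    beta = angle(n2, d21)
    return (round(alpha / BIN), round(beta / BIN))

def build_descriptor(points, normals, dist, tol=0.05):
    """Pre-process: hash all model point pairs separated by ~dist."""
    table = defaultdict(list)
    for i, j in itertools.combinations(range(len(points)), 2):
        if abs(math.dist(points[i], points[j]) - dist) <= tol * dist:
            key = pair_key(points[i], normals[i], points[j], normals[j])
            table[key].append((i, j))
    return table

def recognize(table, scene_pts, scene_nrm, dist, iters=100, tol=0.05):
    """Real-time step: sample scene pairs at distance ~dist, look them up."""
    candidates = []
    for _ in range(iters):
        i, j = random.sample(range(len(scene_pts)), 2)
        if abs(math.dist(scene_pts[i], scene_pts[j]) - dist) > tol * dist:
            continue  # RANSAC-style rejection: wrong pair distance
        key = pair_key(scene_pts[i], scene_nrm[i], scene_pts[j], scene_nrm[j])
        candidates.extend(table.get(key, []))  # similarly oriented model pairs
    return candidates  # each hit would seed a rough transform, refined by ICP
```

In the deck's setting, the scene has about 16K points and the sampling loop runs 100 times; since each iteration is independent, the loop parallelizes trivially. Storing a model id alongside each pair in the table is one way to realize the "multiple models in a single hash table" point from the slide.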
Performance: Embedded Using an FPGA
• Note: currently, only 30% of the computationally intensive functions are implemented on the FPGA, with the rest still running on the ARM A9. It should therefore be much faster once most of these are moved to the FPGA.
• Performance on a Xilinx Zynq (800 MHz Cortex-A9 + FPGA)
• Time to find one object
  • Zynq 7020: 6 seconds
  • Zynq 7045 (est.): <1 second
• No test results for two objects yet, but it should scale the same way as on the ARM.

Lessons Learned
• The object recognition implementation is quite reliable
• The algorithm does a great job of recognizing multiple models with minimal penalty
• More improvement is needed for noisy environments and certain object orientations
• Additional performance improvements are needed
  • Algorithm: application-specific parameters (e.g., size of the model descriptor)
  • ARM: NEON and algorithm improvements
  • FPGA: optimize the use of the FPGA core

Summary

Summary
• Key implementation issues
  • Model descriptor
  • Data structure
  • Sampling technique
  • Performance
• Important: both the ARM and the FPGA provide scalability
• Therefore: real-time object recognition was very difficult, but was successfully implemented on both mobile and embedded platforms
• LIVE DEMO AT THE BOOTH!
Resources
• www.vangoghimaging.com
• Android 3D printing: http://www.youtube.com/watch?v=7yCAVCGvvso
• "Challenges and Techniques in Using CPUs and GPUs for Embedded Vision" by Ken Lee, VanGogh Imaging: http://www.embedded-vision.com/platinummembers/vangogh-imaging/embedded-visiontraining/videos/pages/september-2012-embedded-vision-summit
• "Using FPGAs to Accelerate Embedded Vision Applications" by Kamalina Srikant, National Instruments: http://www.embedded-vision.com/platinummembers/national-instruments/embedded-visiontraining/videos/pages/september-2012-embedded-vision-summit
• "Demonstration of Optical Flow algorithm on an FPGA": http://www.embedded-vision.com/platinum-members/bdti/embedded-visiontraining/videos/pages/demonstration-optical-flow-algorithm-fpg
• Reference: "An Efficient RANSAC for 3D Object Recognition in Noisy and Occluded Scenes" by Chavdar Papazov and Darius Burschka, Technische Universitaet Muenchen (TUM), Germany

Thank you