3D Object Recognition Pipeline

Kurt Konolige, Radu Rusu, Victor Eruhimov, Suat Gedikli (Willow Garage)
Stefan Holzer, Stefan Hinterstoisser (TUM)
Morgan Quigley, Stephen Gould (Stanford)
Marius Muja (UBC)

3D and Object Recognition
• Provides more information than visual texture alone
• Good for scale and segmentation
• Good for verification
Need a good device for 3D information.

3D Cameras

Technology        | Examples                                                       | Pro                               | Con
Stereo            | Newcombe & Davison, CVPR 2010 (registration + regularization) | Real-time, good resolution        | Not dense, smearing
Stereo + texture  | WG device                                                      | Dense, real-time, good resolution | Short range
Laser line scan   | STAIR Borg scanner                                             | Dense, most accurate              | Short range, not real-time
Structured light  | PrimeSense                                                     | Dense, real-time, good resolution | Short range; sensitive to ambient light and scene texture
Phase shift       | SR4, PMD, Canesta                                              | Dense, real-time, medium range    | Low resolution, low accuracy, gross errors
Gated reflectance | 3DV, Canesta                                                   | Dense, real-time                  | Low resolution, low accuracy

Requirements for tabletop manipulation:
• Short range
• High resolution
• High range accuracy
• Real-time

WG Projected Texture Stereo Device
• Paint the scene with texture from a projector
• vs. a single camera with structured light
• Advantages:
  • Simple projector
  • Standard stereo algorithms
  • Full frame rates at 640x480
  • Handles dynamic scenes
• Projector:
  • Red LED
  • Eye safe
  • Synchronized to the cameras

3D Fly-thru

Object Recognition Pipeline
Pre-filter → Detect → Verify
• Textured objects via keypoints [Victor Eruhimov, Suat Gedikli]
• Untextured objects via DOT [Stefan Holzer, Stefan Hinterstoisser]
• Simple 3D model matching [Marius Muja]
• STAIR 2D/3D features [Stephen Gould]

MOPED – Textured Object Recognition with Pose
• Model: a stereo view of an object at a known pose
• Extract keypoints and features
• For a new scene, match keypoints to each model
• Run an SfM geometric check to verify the match and recover the pose
[Torres, Romea, Srinivasa, ICRA 2010]
Limitations:
- Needs texture
- Needs a high-resolution camera

Dominant Orientation Templates (DOT)
Stefan Hinterstoisser, Stefan Holzer (TUM; CVPR 2010, ECCV 2010)
● DOT is a template-matching-based approach
- The template is slid over the image to compute a response at each image position
- If the response is above a threshold, it is considered a detection of the template

DOT – Basic Principle
● DOT uses gradients instead of color or gray values
- Gradients are less sensitive to illumination changes
- Gradients have both orientation and magnitude

Offline Learning
● Good learning is necessary to reduce the false-positive rate
● We try to use all available information to segment the object:
  ● The point cloud from narrow stereo is used to detect the table and segment the object's point cloud
  ● The object point cloud is used to create an initial mask
  ● The mask is refined using GrabCut (see OpenCV; a minimal sketch follows this section)

False-Positive Rejection
● Two more precise templates are used for validation:
  ● a more precise, non-discretized gradient template
  ● a disparity template to compare expected with observed disparities

False-Positive Rejection (continued)
● Compute the error between the reference point cloud and the point cloud at the detected position
● Optimize the initial 3D point-cloud pose given by the detection
● This directly gives the object pose if a model is associated with the learned point clouds
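To make the GrabCut refinement step above concrete, here is a minimal sketch using OpenCV's cv2.grabCut, assuming the initial mask comes from projecting the segmented object point cloud into the image. The function name, the erosion-based foreground seeding, and the kernel size are illustrative choices, not the actual TUM / Willow Garage code.

```python
# Sketch: refine a point-cloud-derived object mask with OpenCV GrabCut.
# Illustrative only; not the actual offline-learning implementation.
import cv2
import numpy as np

def refine_mask_with_grabcut(image_bgr, initial_mask, iterations=5):
    """image_bgr: HxWx3 uint8 image; initial_mask: HxW bool array
    (True = pixel covered by the projected object point cloud)."""
    # GrabCut expects a label image of sure/probable foreground and background.
    gc_mask = np.full(initial_mask.shape, cv2.GC_PR_BGD, dtype=np.uint8)
    gc_mask[initial_mask] = cv2.GC_PR_FGD

    # Mark an eroded interior of the initial mask as definite foreground
    # so GrabCut cannot discard the whole object (9x9 kernel is an assumption).
    kernel = np.ones((9, 9), np.uint8)
    sure_fg = cv2.erode(initial_mask.astype(np.uint8), kernel) > 0
    gc_mask[sure_fg] = cv2.GC_FGD

    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, gc_mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)

    # Keep definite and probable foreground as the refined object mask.
    return np.isin(gc_mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
```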
STAIR Vision Library (SVL)
Stanford STAIR project [Andrew Ng, Stephen Gould]
• Initially developed to support the Stanford AI Robot (STAIR) project
• Builds on top of the OpenCV computer vision library and the Eigen matrix library
• Provides a range of software infrastructure for:
  • computer vision
  • machine learning
  • probabilistic graphical models
• Hosted on SourceForge

Object Detection in SVL
• Sliding-window object detector:
  • Features are extracted from a local window
  • A learned boosted decision-tree classifier scores each window
  • The image is scanned at multiple resolutions to detect objects at different scales (see the sketch at the end of these slides)
[Quigley et al., ICRA 2009]

Image Channels
• The image is decomposed into multiple channels
• Depth at each pixel, obtained from a laser scanner, can be treated as an additional channel
[Figure: intensity image, edge map, depth map]
[Quigley et al., ICRA 2009]

Object Detection Features
• Learn a “patch” dictionary over the intensity, edge, and depth channels
  • Patches encode localized templates for matching
  • Depth patches capture shape; intensity and edge patches capture appearance
  • Patch responses (over the entire dictionary) are combined to form the feature vector
[Quigley et al., ICRA 2009]

Results
• 150 images of cluttered indoor scenes
• 5-fold cross-validation
• Depth information provides a significant improvement in area under the precision-recall curve
[Figure: precision-recall curves; reported improvements of 8%, 3%, and 38%]

Conclusions
• Real-time, accurate 3D devices are becoming available
• 3D can help in object detection for untextured objects
  - A combination of visual and 3D features works best
• 3D is useful for verification
• Check out the PR2 Grasping Demo!
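As a supplement to the SVL object-detection slides, here is a minimal sketch of the multi-resolution sliding-window scan referenced there. The score_window callable stands in for SVL's boosted decision-tree classifier over patch responses on the intensity, edge, and depth channels; the window size, stride, scale step, and threshold are illustrative assumptions, not the SVL API.

```python
# Sketch: multi-resolution sliding-window detection loop.
# Illustrative only; parameter choices and names are not from SVL.
import cv2

def sliding_window_detect(image, score_window, win_size=(64, 64),
                          stride=8, scale_step=1.25, threshold=0.5):
    """Scan `image` at multiple resolutions with a fixed-size window.

    score_window: callable that maps a win_h x win_w window to a score
    (stand-in for a boosted classifier over patch-dictionary responses).
    Returns (x, y, w, h, score) detections in original-image coordinates.
    """
    win_w, win_h = win_size
    detections = []
    scale = 1.0
    current = image
    while current.shape[0] >= win_h and current.shape[1] >= win_w:
        for y in range(0, current.shape[0] - win_h + 1, stride):
            for x in range(0, current.shape[1] - win_w + 1, stride):
                window = current[y:y + win_h, x:x + win_w]
                score = score_window(window)
                if score > threshold:
                    # Map the window back to original-image coordinates.
                    detections.append((int(x * scale), int(y * scale),
                                       int(win_w * scale), int(win_h * scale),
                                       score))
        # Downsample so the same window covers larger objects in the original image.
        scale *= scale_step
        current = cv2.resize(image, (int(round(image.shape[1] / scale)),
                                     int(round(image.shape[0] / scale))))
    return detections
```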