Survey of Object Classification
in 3D Range Scans
Allan Zelener
The Graduate Center, CUNY
January 8th, 2015
Overview
1. Introduction
   ◦ Problem definition: Object Recognition, Object Classification, and Semantic Segmentation
   ◦ Problem domains: LiDAR scanners for outdoor scenes and RGB-D sensors for indoor scenes
2. Urban object classification
   ◦ Case study: Vehicle object detection and classification
3. Indoor object classification
   ◦ Cluttered scenes with a large variety of objects
4. Related Works
5. Comparison and Conclusions
   ◦ Criteria for evaluation: classification accuracy, range of classes, use of data
   ◦ Context through structured prediction and learned 3D feature representations
Object Recognition
[Figure: a query object and its database matches in decreasing score order; a model aligned to a scene]
Lai and Fox (IJRR 08)
Mian, Bennamoun, Owens (IJCV 09)
Object Classification
o Segmentation or a sliding template is used to find candidate regions for classification
o Feature-based classification may be invariant to pose and intra-class variation
o More compressed representation than an entire database of object models
o Detection and recognition may still work better in practice for controlled applications
Golovinskiy, Kim, and Funkhouser (ICCV 2009)
Semantic Segmentation
o Every point in the scene is labeled, including both objects of interest and background
o Typically a joint optimization of segmentation and classification
o Formally utilizes context in an MRF/CRF model, where by context we mean nearby regions
Wu, Lenz, and Saxena (RSS 2014)
LiDAR Scans for Outdoor/Urban Scenes
o Long-range sensors for outdoor scenes
o Fast scans at low resolution or slow scans at high resolution, depending on the number of individual sensors
o Moving sensors and registration from multiple scans result in unstructured point cloud data with no adjacency grid
o RGB imagery tends to be low quality, challenging to align, or simply unavailable
RGB-D Images for Indoor Scenes
o Short-range sensors for indoor scenes
o Real-time 30 FPS depth maps based on structured light or time of flight in infrared
o The integrated RGB camera is better aligned and provides better quality under indoor conditions than LiDAR systems
o The RGB-D image grid makes it well suited for traditional 2D computer vision techniques on image frames from a single view
Patterson et al.
• Object Detection from Large-Scale 3D Datasets Using Bottom-up and Top-down Descriptors. Patterson, Mordohai, and Daniilidis. (ECCV 2008)
[Figures: spin image and extended Gaussian image (EGI) descriptors]
Patterson et al.
1. Compute normals for all points and spin images for a subset of sampled points.
2. Classify spin image features as either positive (object) or negative (background) points using a nearest neighbor classifier.
3. Greedy region growing of positively classified points gives an object hypothesis.
4. Compute the EGI and constellation EGI for the object hypothesis and compute alignment and similarity with database model objects.
   • Rotation hypotheses are based on angles subtended by pairs of points
   • Translation is based on the maximum frequency of the Fourier transform of the best rotation hypothesis
   • Similarity is based on the fraction of inliers, defined as query points that are nearby model points with small cosine similarity between normals after alignment
5. If the similarity is above a threshold then the object is positively detected, and points that overlap with the database model after alignment are labeled to obtain a segmentation.
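The spin image descriptor used in step 1 accumulates neighboring points into a 2D histogram around an oriented point. A minimal NumPy sketch; the bin count, support radius, and cylindrical support region are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

def spin_image(p, n, points, bins=8, radius=1.0):
    """Accumulate a spin image for the oriented point (p, n).

    Each neighbor x maps to cylindrical coordinates: alpha, the radial
    distance from the axis through p along n, and beta, the signed
    distance along n. The (alpha, beta) pairs are binned into a 2D
    histogram that is invariant to rotation about the normal.
    """
    d = points - p                      # offsets to the oriented point
    beta = d @ n                        # elevation along the normal
    alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta ** 2, 0.0))
    # keep only points inside the support cylinder
    mask = (alpha <= radius) & (np.abs(beta) <= radius)
    hist, _, _ = np.histogram2d(
        alpha[mask], beta[mask],
        bins=bins, range=[[0.0, radius], [-radius, radius]],
    )
    return hist
```

In the pipeline above, spin images would be computed at a subset of sampled points and fed to the nearest neighbor object/background classifier of step 2.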
Patterson et al.
o Precision 0.92 and Recall 0.74 for chosen inlier threshold parameter
o Computation and comparison of EGIs is slow due to alignment
o Cost of object detection grows linearly in the size of the database
[Figure: precision vs. recall curve]
Huber et al.
• Parts-based 3D Object Classification. Huber, Kapuria, Donamukkala, and Hebert. (CVPR 2004)
Huber et al.
o Vehicles are segmented into front/middle/back parts, and part classes are generated as follows:
o For each part $r_i$, the distance between spin image features in $r_i$ and those in every other part $r_j$, $i \ne j$, is computed to produce $p(m(r_i) = r_j \mid r_i)$, where the event denotes a nearest neighbor match from a feature of part $r_i$ to a feature in part $r_j$
o A symmetric similarity matrix is computed as the average of the matching probabilities between all pairs of parts. Part classes are determined by agglomerative clustering, and the features for each part class are clustered by k-means to produce a class representation.
Huber et al.
o The relationship between object class and part class is determined by Bayes' theorem,

$$p(O_j \mid R_i) = \frac{p(R_i \mid O_j)\, p(O_j)}{\sum_j p(R_i \mid O_j)\, p(O_j)}$$

o $p(R_i \mid O_j)$ is determined empirically from the training data and $p(O_j)$ is assumed uniform
o The object class is determined by maximizing the likelihood over all parts,

$$\arg\max_j \sum_{R_i \in \mathcal{R}} \pi_R(R_i)\, p(O_j \mid R_i)$$

o Here $\pi_R(R_i)$ is determined by matching features between the query part $R_i$ and the set of part classes $\mathcal{R}$ as described in the part class generation stage
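Under the uniform class prior, the object decision reduces to row-normalizing the empirical likelihoods and taking a weighted vote over parts. A hedged NumPy sketch; the array shapes and function name are assumptions of this illustration:

```python
import numpy as np

def classify_object(likelihoods, part_weights):
    """Pick the object class maximizing the part-vote score.

    likelihoods: (num_parts, num_classes) array of p(R_i | O_j),
                 estimated empirically from training data.
    part_weights: (num_parts,) array of matching scores pi_R(R_i).
    Assumes a uniform class prior p(O_j), so Bayes' rule reduces to
    row normalization of the likelihood matrix.
    """
    # p(O_j | R_i) = p(R_i | O_j) / sum_j p(R_i | O_j) under a uniform prior
    posteriors = likelihoods / likelihoods.sum(axis=1, keepdims=True)
    scores = part_weights @ posteriors  # sum_i pi_R(R_i) * p(O_j | R_i)
    return int(np.argmax(scores))
```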
Huber et al.
o Excellent accuracy on simulated scans but lacks experiments on real data
o Consistent part segmentation requires recovery of vehicle pose
o Improvement over a classifier that does not use parts
[Figure: solid line: parts-based; dashed line: object-based]
Golovinskiy et al.
• Shape-based Recognition of 3D Point Clouds. Golovinskiy, Kim, and Funkhouser. (ICCV 2009)
Golovinskiy et al.
o Localization and segmentation are based on a k-NN graph weighted by point distances
o Localization is performed by agglomerative clustering
o Segmentation is performed by min-cut using a virtual background vertex and a background radius parameter
o Contextual features use geolocation alignment with a street map and an occupancy grid of neighboring objects
o Relatively poor classification performance, perhaps due to a lack of local features
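The localization step can be approximated by taking connected components of a distance-thresholded k-NN graph, a crude stand-in for the paper's agglomerative clustering; the parameters, union-find formulation, and brute-force distance matrix are illustrative assumptions:

```python
import numpy as np

def localize(points, k=4, max_dist=0.5):
    """Group points into candidate object clusters.

    Builds a k-NN graph, drops edges longer than max_dist, and returns
    a connected-component label (union-find root) per point. k and
    max_dist are illustrative, not the paper's values.
    """
    n = len(points)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # pairwise distances (brute force; fine for a small sketch)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    for i in range(n):
        for j in np.argsort(d[i])[1:k + 1]:  # k nearest neighbors of i
            if d[i, j] <= max_dist:
                parent[find(i)] = find(int(j))
    return [find(i) for i in range(n)]
```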
Stamos et al.
• Online Algorithms for Classification of Urban Objects in 3D Point Clouds. Stamos, Hadjiliadis, Zhang, and Flynn. (3DIMPVT 2012)
o Online classification of scan lines using HMMs and CUSUM hypothesis testing,

$$S_{n+1} = \max\left(0,\, S_n + x_n - \omega_n\right)$$

o $\omega_n$ is the likelihood of observation $x_n$ under the null-hypothesis HMM
o A change is detected at a large value of $S_n$
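The CUSUM recursion runs online in constant memory. This sketch assumes each observation supplies a score $x_n$ and its null-hypothesis likelihood $\omega_n$ as a pair, and the detection threshold is an illustrative parameter:

```python
def cusum(scores, threshold):
    """Online CUSUM change detection: S_{n+1} = max(0, S_n + x_n - w_n).

    scores: iterable of (x_n, w_n) pairs, where w_n is the observation's
    likelihood under the null-hypothesis HMM (the exact statistic used
    for x_n is an assumption of this sketch).
    Returns the index at which S first exceeds threshold, or None.
    """
    s = 0.0
    for n, (x, w) in enumerate(scores):
        s = max(0.0, s + x - w)  # reset to zero keeps the test one-sided
        if s > threshold:
            return n  # change detected at this observation
    return None
```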
Stamos et al.
o Simple features between points
o Signed angles: $\operatorname{sgn}\!\left(D_{i,k} \cdot D_{i,k-1}\right)\, z^T D_{i,p}$
o Line angles: consistent for collinear points
o A sequence of online classifications is performed to refine from coarse to fine classes
o Each additional classifier incorporates more prior knowledge about the target class, e.g., cars should be on the street at a certain height
Xiong et al.
• 3D Scene Analysis via Sequenced Predictions over Points and Regions. Xiong, Munoz, Bagnell, and Hebert. (ICRA 2011)
[Figure: context is accumulated from neighboring segments, sent down from each segment to its individual points, then averaged over the points and sent back up to the segment]
Xiong et al.
o Multi-Round Stacking (MRS) generates contextual features by using a sequence of weak classifiers to predict the class labels of neighbors
o Two-level hierarchy of regions: segments and points. MRS is run on one level of the hierarchy and then the results are passed on to the other level.
o Sensitive to the quality of labeling in training, particularly if there is a "misc" class
Contextual features for tree-trunk class
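Multi-round stacking can be sketched with any weak learner; here a nearest-class-mean rule plays that role purely for illustration. The real system uses stronger classifiers and trains each round on held-out predictions to avoid overfitting, which this sketch omits:

```python
import numpy as np

def stack_rounds(features, neighbors, labels, rounds=2):
    """Multi-round stacking sketch: each round appends neighbor-averaged
    class predictions from the previous round as contextual features.

    features: (n, d) base features per point/segment.
    neighbors: list of neighbor index lists, one per element.
    labels: (n,) training labels used to fit each weak round.
    The 'weak classifier' is a soft nearest-class-mean rule; names and
    shapes are assumptions of this illustration.
    """
    x = features
    classes = np.unique(labels)
    for _ in range(rounds):
        # fit nearest-class-mean on the current (base + context) features
        means = np.stack([x[labels == c].mean(axis=0) for c in classes])
        dists = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        probs = np.exp(-dists)
        probs /= probs.sum(axis=1, keepdims=True)
        # contextual feature: average predicted distribution over neighbors
        ctx = np.stack([probs[nb].mean(axis=0) for nb in neighbors])
        x = np.hstack([features, ctx])
    return x
```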
Silberman and Fergus
• Indoor Scene Segmentation Using a Structured Light Sensor. Silberman and Fergus. (ICCV 2011)
Silberman and Fergus
o Conditional Random Field,

$$E(y) = \sum_i \phi(x_i, i; \theta) + \sum_{i,j} \psi(y_i, y_j)\, \eta(i, j)$$

o $\phi(\cdot)$ - color/depth features and a location prior
o $\psi(y_i, y_j) = 0$ if $y_i = y_j$, 3 otherwise
o $\eta(\cdot)$ - spatial transition based on the gradient
o The location prior improves performance for classes in consistent configurations with respect to the camera but decreases it otherwise
o E.g., bookshelves in an office vs. a library
[Figure: 3D location priors]
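The energy of a candidate labeling under this CRF can be evaluated directly. A minimal sketch assuming precomputed unary costs and edge weights (the names and data layout are assumptions), with the slide's Potts penalty of 3 hard-coded:

```python
import numpy as np

def crf_energy(unary, labels, edges, eta):
    """Evaluate E(y) = sum_i phi(x_i, i) + sum_(i,j) psi(y_i, y_j) eta(i, j).

    unary: (n, k) unary costs phi for each node and label.
    labels: (n,) candidate labeling y.
    edges: list of (i, j) neighbor pairs.
    eta: dict mapping (i, j) to the gradient-based transition weight.
    Uses the slide's Potts pairwise term: 0 if labels agree, 3 otherwise.
    """
    e = unary[np.arange(len(labels)), labels].sum()
    for (i, j) in edges:
        if labels[i] != labels[j]:
            e += 3.0 * eta[(i, j)]
    return float(e)
```

Inference would then search for the labeling minimizing this energy, e.g. with graph cuts; this function only scores a given labeling.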
Couprie et al.
• Indoor Semantic Segmentation Using Depth Information. Couprie, Farabet, Najman, LeCun. (ICLR 2013)
Couprie et al.
o Simple application of CNN framework
improves accuracy on classes at consistent
depths such as walls and floors but
performance for objects of interest degrades
o Depth gradients alone are not informative
and depth information must be normalized or
interpreted to be invariant to variations
Anand et al.
• Contextually Guided Semantic Labeling and Search for 3D Point Clouds. Anand, Koppula, Joachims, and Saxena. (IJRR 2012)
Anand et al.
o MRF trained by a structured SVM,

$$f_w(x, y) = \sum_{i \in V} \sum_{k=1}^{K} y_i^k \left[ w_n^k \cdot \phi_n(i) \right] + \sum_{(i,j) \in E} \sum_{T_t \in \mathcal{T}} \sum_{(l,k) \in T_t} y_i^l y_j^k \left[ w_t^{lk} \cdot \phi_t(i, j) \right]$$

o $\phi_n(i)$ - unary features
o $\phi_t(i, j)$ - pairwise features, which may be associative or non-associative depending on $T_t$
o Associative - features between neighboring segments of the same class; $T_t$ only has self loops
o Object non-associative - features between related class labels of neighboring segments
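Scoring a candidate labeling under this discriminant function is a sum of unary and pairwise dot products. This sketch collapses the multiple associative/non-associative feature families into a single pairwise family, which is a simplification of the paper's model, and all names and shapes are assumptions:

```python
import numpy as np

def score_labeling(y, phi_n, phi_t, w_n, w_t, edges):
    """Score a candidate labeling under
    f_w(x, y) = sum_i  w_n^{y_i} . phi_n(i)
              + sum_(i,j)  w_t^{y_i y_j} . phi_t(i, j).

    y: (n,) hard labels (the one-hot indicators y_i^k are implicit).
    phi_n: (n, d) unary segment features.
    phi_t: dict mapping edge (i, j) to its pairwise feature vector.
    w_n: (K, d) unary weights per class; w_t: (K, K, d2) pairwise weights.
    """
    s = sum(w_n[y[i]] @ phi_n[i] for i in range(len(y)))
    s += sum(w_t[y[i], y[j]] @ phi_t[(i, j)] for (i, j) in edges)
    return float(s)
```

The structured SVM learns `w_n` and `w_t` so that the ground-truth labeling scores higher than all others by a margin; inference maximizes this score over labelings.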
Anand et al.
[Figure: pairwise geometric relation between neighboring segments with centroids $r_i$, $r_j$ and normals $n_i$, $n_j$: $(r_i - r_j)^T n_i \ge 0$ and $(r_j - r_i)^T n_j \ge 0$]
Anand et al.
o Object part categories better exploit relationships than object categories alone
o Registered 3D scenes provide more coverage and context than single-view scenes
o Common errors include objects that lie on top of other objects, e.g., a book on a table; these are either the result of poor segmentation or a smoothing effect from the pairwise potentials
Related Works
• Unsupervised Feature Learning for RGB-D Based Object Detection. Bo, Ren, and Fox (ISER 2012)
Related Works
• Convolutional-Recursive Deep Learning for 3D Object Classification.
Socher, Huval, Bhat, Manning and Ng. (NIPS 2012)
Related Works
• Kahler and Reid. (ICCV 2013)
• Müller and Behnke. (ICRA 2014)
Related Works
• Sliding Shapes for 3D Object Detection in Depth Images. Song and Xiao. (ECCV 2014)
Related Works
• Instance Segmentation of Indoor Scenes Using a Coverage Loss. Silberman, Sontag, and Fergus. (ECCV 2014)
[Figure panels: input; a perfect semantic segmentation under naïve region growing; the correct instance segmentation]
Related Works
• Hierarchical Semantic Labeling for Task-Relevant RGB-D Perception. Wu, Lenz, and Saxena. (RSS 2014)
NO-CT: Non-overlapping constraints
HR-CT: Hierarchical relation constraints
Related Works
• Classification of Vehicle Parts in Unstructured 3D Point Clouds. Zelener, Mordohai, and Stamos. (3DV 2014)
• Unsupervised segmentation of parts by RANSAC plane fitting
• Structured prediction over parts and object class by HMM and structured perceptron
• Does not require pose estimation, experiments performed using real data
[Figure: chain-structured graphical model with part labels $p_1, p_2, \dots, p_n$, object class $c$, and observed part features $x_1, x_2, \dots, x_n$]
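Structured prediction over a chain of parts, as in the HMM above, can be decoded with the standard Viterbi algorithm. In this sketch the log-scores stand in for the learned structured perceptron weights, and the function name and interface are assumptions:

```python
import numpy as np

def viterbi(emis, trans):
    """Viterbi decoding over a chain of part labels.

    emis: (n, K) per-part log-scores for each of K labels.
    trans: (K, K) log-scores for adjacent label transitions.
    Returns the highest-scoring label sequence by dynamic programming.
    """
    n, K = emis.shape
    score = emis[0].copy()          # best score ending at each label
    back = np.zeros((n, K), dtype=int)
    for t in range(1, n):
        # cand[prev, cur] = score of extending prev-best path with cur
        cand = score[:, None] + trans + emis[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # backtrack from the best final label
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```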
Comparison
o Fine-tuned object recognition methods still appear to work best for specific tasks
o E.g., for car detection in urban scenes
o Indoor scenes have many potential objects of interest; it is difficult to scale the number of classes
o Object classification requires 3D shape features that are discriminative
o Simple accumulators like the spin image are still competitive choices for features
o Learned representations may do better, but how to construct them is a challenge
o Differences in representations between point clouds and RGB-D images
o Errors in segmentation may propagate to classification
o Semantic segmentation jointly optimizes segmentation and classification
o Structured prediction provides useful context-based relationships, but can lead to false assumptions
o Context relationships are also often fixed and manually engineered
Conclusions
o 3D shape and context-based features provide consistent improvements to classification systems
o Learned 3D representations that are aware of the unique properties of 3D shape features may improve over a simple application of 2D techniques
o Structured prediction to model relationships between objects, their parts, and their environment also improves performance
o Sparse or hierarchical structured relationships are desirable for computational efficiency