
Friedrich-Schiller University Jena
Department of Mathematics and Computer Sciences
Institute of Computer Science
Chair for Computer Vision
METHODS IN FIELD OF SEMANTIC
SEGMENTATION
A REVIEW
Author:
Balázs Szántó
BSc student
Consultant:
Erik Rodner
Scientific Associate
January, 2012, Jena
Methods in field of Semantic Segmentation
January - 2012, FSU Jena, Balázs Szántó
1 INTRODUCTION
2
//From the website
The goal of semantic segmentation algorithms is to label each pixel with one of the given
object categories. Due to this very general formulation, there is a wide range of
applications, such as facade recognition, generic object detection, or inferring semantics of
remote sensing data. In our work, we study approaches based on random decision forests.
http://www.inf-cv.uni-jena.de/SSG.html
http://www.inf-cv.unijena.de/Lehrstuhl/Mitarbeiter/Dipl__Inf_+Bj%C3%B6rn+Fr%C3%B6hlich-p54/LabelMeFacade+Database.html
Recently I realized that object class detection and semantic segmentation are two
different ways to solve the recognition task. Although the approaches look very
similar, the methods differ significantly at the higher level (and sometimes at the lower
level too). Let me first state the problem formulations.
// http://computerblindness.blogspot.com/2010/06/object-detection-vs-semantic.html
Semantic segmentation (or pixel classification) assigns one of the predefined class
labels to each pixel. The input image is divided into regions that correspond to the
objects of the scene or to "stuff" (in the terminology of Heitz and Koller (2008)). In the
simplest case, pixels are classified according to their local features, such as colour and/or
texture features (Shotton et al., 2006). Markov Random Fields can be used to incorporate
inter-pixel relations.
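The simplest case described above, classifying each pixel from its local features alone, can be sketched as follows. This is only an illustrative sketch, not the method of Shotton et al.: it assumes the local feature is the raw colour of the pixel and that each class is summarized by one mean colour. The function name `classify_pixels` and the toy class means are hypothetical. No Markov Random Field smoothing is applied, so every pixel is labelled independently of its neighbours.

```python
import numpy as np

def classify_pixels(image, class_means):
    """Label each pixel with the index of the nearest class mean colour.

    image:       (H, W, 3) float array of per-pixel colour features
    class_means: (K, 3) float array, one assumed mean colour per class
    returns:     (H, W) integer label map
    """
    # Squared distance from every pixel to every class mean: shape (H, W, K)
    diff = image[:, :, None, :] - class_means[None, None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # The label of a pixel is the class whose mean colour is closest
    return np.argmin(dist, axis=-1)

# Toy example with two hypothetical classes
class_means = np.array([[0.2, 0.4, 0.9],   # class 0: "sky" (bluish)
                        [0.1, 0.7, 0.2]])  # class 1: "grass" (greenish)
image = np.zeros((4, 4, 3))
image[:2] = [0.25, 0.45, 0.85]   # top half of the image is bluish
image[2:] = [0.15, 0.65, 0.25]   # bottom half of the image is greenish
labels = classify_pixels(image, class_means)
```

Here `labels` is a 4x4 label map whose top two rows are class 0 and whose bottom two rows are class 1.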
Object detection addresses the problem of localizing objects of certain classes.
Minimum bounding rectangles (MBRs) of the objects are the ideal output. The simplest
approach is to slide a window of varying size over the image and classify the sub-image
defined by each window position. Neighbouring windows usually have similar features, so
each object is likely to be detected by several windows. Since multiple or wrong detections
are not desirable, non-maximum suppression (NMS) is used. In the PASCAL VOC contest an
object is considered detected if the intersection of the true and the found rectangles
covers at least half of the area of their union. The Marr Prize winning paper by Desai et al.
(2009) proposes a more intelligent scheme for NMS and for the incorporation of context.
A recent paper by Alexe et al. presents an objectness measure for a sliding window.
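The overlap criterion and non-maximum suppression mentioned above can be sketched as follows. This is a minimal sketch of plain greedy NMS, not the scheme of Desai et al. It assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates. The function names and the 0.5 suppression threshold are illustrative choices. The detection test uses "at least half", as stated in the text.

```python
def iou(a, b):
    """Intersection area divided by union area of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_detected(found, truth):
    """Overlap test as described in the text: intersection >= half of the union."""
    return iou(found, truth) >= 0.5

def greedy_nms(boxes, scores, overlap_threshold=0.5):
    """Visit boxes by decreasing score; keep a box only if it does not
    overlap an already kept box by more than overlap_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap_threshold for j in keep):
            keep.append(i)
    return keep

# Two windows firing on the same object, plus one separate detection
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In this toy input the first two windows overlap heavily, so only the higher-scoring one survives and `kept` holds the indices 0 and 2.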
In theory, the two problems are almost equivalent. Object detection reduces easily to
semantic segmentation: given a segmentation output, we only need to retain the object
classes (i.e. discard the "stuff" classes) and take the MBRs of the remaining regions. The
converse is more difficult. With only a detection output, all the stuff collapses into a
single background class, and every object found within a rectangle still has to be
segmented. This is solvable, since foreground extraction techniques such as GrabCut can
be applied inside each rectangle. So the technical difficulties can be overcome and the
two problems can be considered equivalent; in practice, however, the approaches differ.
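The easy direction, obtaining detections from a segmentation output, can be sketched as follows. This sketch assumes the segmentation is an integer label map and that each 4-connected region of an object class is one object. The function name `boxes_from_segmentation` and the class ids are hypothetical. Touching objects of the same class would be merged into one rectangle, which is a known limitation of this reduction.

```python
from collections import deque
import numpy as np

def boxes_from_segmentation(labels, object_classes):
    """Return (class, x1, y1, x2, y2) for every 4-connected region whose
    class is in object_classes. Coordinates are inclusive pixel indices."""
    h, w = labels.shape
    visited = np.zeros((h, w), dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            cls = int(labels[y, x])
            if visited[y, x] or cls not in object_classes:
                continue  # "stuff" pixels and already processed pixels are skipped
            # Breadth-first flood fill over one region, tracking its extent
            queue = deque([(y, x)])
            visited[y, x] = True
            y1 = y2 = y
            x1 = x2 = x
            while queue:
                cy, cx = queue.popleft()
                y1, y2 = min(y1, cy), max(y2, cy)
                x1, x2 = min(x1, cx), max(x2, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not visited[ny, nx] and labels[ny, nx] == cls):
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((cls, x1, y1, x2, y2))
    return boxes

# Toy label map: 0 is "stuff", 1 and 2 are hypothetical object classes
labels = np.zeros((6, 8), dtype=int)
labels[1:3, 1:4] = 1
labels[4:6, 5:8] = 2
boxes = boxes_from_segmentation(labels, object_classes={1, 2})
```

For the toy label map this yields one rectangle per object region: `(1, 1, 1, 3, 2)` and `(2, 5, 4, 7, 5)`.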
Two questions arise:
1. Which task has more applications? I think we generally do not need to classify the
background into e.g. ground and sky (unless we are programming an autonomous robot);
we are more interested in finding objects. And how often do we need the exact object
boundary?
2. Which task is sufficient for the "retrieval" stage of an intelligent vision system in the
philosophical sense? That is, which task is more suitable for solving the global problem of
exhaustive scene analysis?
Thoughts?
[1] F. Tombari, L. Di Stefano, S. Giardino, "Online Learning for Automatic Segmentation
of 3D Data", IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS '11), 2011
http://vision.deis.unibo.it/fede/kinectDataset.html
Videos:
http://www.youtube.com/watch?v=ibzE9F-Ksf8
http://www.youtube.com/watch?v=s33whmjr6wk
http://research.microsoft.com/apps/video/default.aspx?id=150895
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
3 REFERENCES