Friedrich-Schiller University Jena
Department of Mathematics and Computer Sciences
Institute of Computer Science
Chair for Computer Vision

METHODS IN THE FIELD OF SEMANTIC SEGMENTATION
A REVIEW

Author: Balázs Szántó, BSc student
Consultant: Erik Rodner, Scientific Associate

January 2012, Jena

Methods in field of Semantic Segmentation January - 2012, FSU Jena, Balázs Szántó

1 INTRODUCTION

2

// From the website
The goal of semantic segmentation algorithms is to label each pixel with one of the given object categories. Due to this very general formulation, there is a wide range of applications, such as facade recognition, generic object detection, or inferring semantics of remote sensing data. In our work, we study approaches based on random decision forests.
http://www.inf-cv.uni-jena.de/SSG.html
http://www.inf-cv.uni-jena.de/Lehrstuhl/Mitarbeiter/Dipl__Inf_+Bj%C3%B6rn+Fr%C3%B6hlich-p54/LabelMeFacade+Database.html

Recently I realized that object class detection and semantic segmentation are two different ways to solve the recognition task. Although the approaches look very similar, the methods vary significantly on the higher level (and sometimes on the lower level too). Let me first state the problem formulations.
// http://computerblindness.blogspot.com/2010/06/object-detection-vs-semantic.html

Semantic segmentation (or pixel classification) assigns one of the predefined class labels to each pixel. The input image is divided into regions that correspond to the objects of the scene or to "stuff" (in the terminology of Heitz and Koller (2008)). In the simplest case, pixels are classified with respect to their local features, such as colour and/or texture features (Shotton et al., 2006). Markov Random Fields can be used to incorporate inter-pixel relations.

Object detection addresses the problem of localizing objects of certain classes. Minimum bounding rectangles (MBRs) of the objects are the ideal output.
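
To make the notion of a minimum bounding rectangle concrete, here is a minimal Python sketch that computes the MBR of all pixels carrying one class label in a small label map. The function name `bounding_rectangle`, the list-of-rows layout of the label map and the inclusive (x_min, y_min, x_max, y_max) convention are assumptions made for illustration; they are not taken from any of the cited works. The sketch also treats all pixels of a class as a single object; separating several instances of the same class would need an additional connected-component step.

```python
def bounding_rectangle(label_map, label):
    """Return the inclusive rectangle (x_min, y_min, x_max, y_max) that
    encloses every pixel equal to `label`, or None if the label is absent.

    `label_map` is assumed to be a list of rows, each row a list of
    integer class labels (hypothetical layout, for illustration only).
    """
    xs, ys = [], []
    for y, row in enumerate(label_map):
        for x, value in enumerate(row):
            if value == label:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical 4x5 label map: 0 = background ("stuff"), 1 = an object class.
label_map = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(bounding_rectangle(label_map, 1))  # prints (1, 1, 3, 2)
```

A detector would ideally report exactly this kind of rectangle for each object, without ever producing the per-pixel labels.
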
The simplest approach here is to use a sliding window of varying size and to classify the sub-images defined by the window. Usually, neighbouring windows have similar features, so each object is likely to trigger detections in several windows. Since multiple or wrong detections are not desirable, non-maximum suppression (NMS) is used. In the PASCAL VOC contest, an object is considered detected if the intersection of the true and the found rectangle covers at least half of the area of their union. In the Marr Prize-winning paper by Desai et al. (2009), a more intelligent scheme for NMS and for incorporating context is proposed. In a recent paper by Alexe, an objectness measure for a sliding window is presented.

In theory, the two problems are almost equivalent. Object detection reduces easily to semantic segmentation: if we have a segmentation output, we just need to retain the object classes (or discard the "stuff" classes) and take the MBRs of the regions. The converse is more difficult. With a detection output, all the stuff turns into the background class, and all the objects found within the rectangles still have to be segmented. This is a solvable issue, since foreground extraction techniques like GrabCut could be applied. So there are technical difficulties, but they could be overcome, and the two problems could be considered equivalent. In practice, however, the approaches are different.

Two questions arise:

1. Which task has more applications? I think we generally do not need to classify the background into, e.g., ground and sky (unless we are programming an autonomous robot); we are more interested in finding objects. Do we often need to obtain the exact object boundary?
2. Which task is sufficient for the "retrieval" stage of an intelligent vision system in the philosophical sense? That is, which task is more suitable for solving the global problem of exhaustive scene analysis?

Thoughts?

[1] F. Tombari, L. Di Stefano, S. Giardino, "Online Learning for Automatic Segmentation of 3D Data", IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS '11), 2011
http://vision.deis.unibo.it/fede/kinectDataset.html

Videos:
http://www.youtube.com/watch?v=ibzE9F-Ksf8
http://www.youtube.com/watch?v=s33whmjr6wk
http://research.microsoft.com/apps/video/default.aspx?id=150895

http://pascallin.ecs.soton.ac.uk/challenges/VOC/

3 REFERENCES