Indoor Scene Classification
Anuja Ranjan, Manav Garg
Advisor: Prof. Dr. Amitabha Mukherjee

Problem
• Classifying indoor scenes is challenging because of the large variation across examples within each class and the similarity between different classes.
• Besides spatial properties, accurate indoor classification requires recognizing the objects a scene contains.

Related Work
In recent work by Torralba and Quattoni, regions of interest are extracted from images and compared. They do not use objects in their approach, but they note that some indoor scenes are better classified by the objects they contain.
Img source: Ariadna Quattoni and Antonio Torralba, "Recognizing Indoor Scenes," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

Towards classification via objects
• Training phase: extracting features and building object classifiers
  - Extracted HOG and Gabor features for images of each of the 4 object classes. Images were obtained from the Caltech dataset and Google Images.
  - Used the AdaBoost algorithm to combine these weak classifiers into a strong classifier.
• Testing phase: detecting objects in the test image and predicting the scene probability
  - Used a sliding-window method with 5 window shapes to detect objects in the scene. The method is further improved by using 3D features of the image.
  - From the detections, the confidence files, and the prior probabilities of objects in each scene, we classify the image into a scene (a sketch of this step follows the Results slide).

Results
Example scene probabilities (three test images):
• Image 1: Office 68.26, Hall 14.31, Conference 17.43 (Visual prob: 0.786, 3D prob: 0.915)
• Image 2: Office 37.33, Hall 30.33, Conference 32.34
• Image 3: Office 18.53, Hall 22.84, Conference 58.61
Ref: P. Espinace, T. Kollar, A. Soto, and N. Roy, "Indoor Scene Recognition through Object Detection," IEEE International Conference on Robotics and Automation (ICRA), 2010.
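The slides do not give the exact rule used to turn object detections into scene probabilities (Espinace et al. describe a probabilistic model). The sketch below is only a minimal illustration under that assumption: the `PRIORS` values, the `scene_probabilities` helper, and the confidence-weighted voting rule are all hypothetical stand-ins, not the project's actual method.

```python
import numpy as np

# Hypothetical priors P(object | scene): how likely each object class is to
# appear in each scene type. Real values would be estimated from training data.
PRIORS = {
    "office":     {"monitor": 0.80, "chair": 0.70, "screen": 0.10, "clock": 0.30},
    "hall":       {"monitor": 0.05, "chair": 0.40, "screen": 0.05, "clock": 0.50},
    "conference": {"monitor": 0.20, "chair": 0.90, "screen": 0.60, "clock": 0.40},
}

def scene_probabilities(detections, priors=PRIORS):
    """Score each scene from detected objects (object name -> detector confidence).

    Assumed combination rule: each detection votes for a scene in proportion to
    its confidence and to the prior of that object in the scene; the scores are
    then normalised to sum to 1.
    """
    scores = {}
    for scene, obj_priors in priors.items():
        score = 0.0
        for obj, confidence in detections.items():
            score += confidence * obj_priors.get(obj, 0.01)  # small floor for unseen objects
        scores[scene] = score
    total = sum(scores.values()) or 1.0
    return {scene: s / total for scene, s in scores.items()}

if __name__ == "__main__":
    # Detections from the sliding-window object classifiers.
    detections = {"monitor": 0.79, "chair": 0.55}
    print(scene_probabilities(detections))
```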
Improving the work
• Use Gist features and an SVM to develop the object classifiers (a minimal sketch follows the Appendix).
Why Gist? What's new?
• Gist features give a low-dimensional representation of a scene that requires neither segmentation nor object recognition, relying instead on perceptual features.
• Results obtained with Gist-based classification show a considerable improvement in context-based recognition and classification.
• Gist features have not yet been used in this way for this classification problem.
Source: Oliva, A., & Torralba, A. (2001), "Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope," International Journal of Computer Vision, 42, 145-175.

Comparison of Object Classifiers
• Average overall accuracy of the HOG classifier: ~62%
• Our classifier: ~75%
Example detection confidences (three test images):
• HOG features: Monitor: 0.786, Monitor: 0.052, Screen: 0.621
• Gist features: Monitor: 0.65, Monitor: 0.15, Screen: 0.028

Appendix

HOG features
• Record the occurrence of gradients in localized areas of an image.
• Use the fact that an object can be described by the distribution of its intensity gradients.

Gabor features
• Gabor filters are used for edge detection, texture representation, etc.
• Filters with different frequencies and orientations are used for feature extraction.
• A 2D Gabor filter is a Gaussian kernel function modulated by a sinusoid.
• Also used for sparse object representation.

SR Ranger Method
• The dataset consists of pairs of visual and 3D images. The depth in the 3D image is estimated per pixel by measuring the time difference between the signal sent and received.
• 3D models are used to compute the probability that the geometric properties of a given object (size, height) match the information present in a given window, which improves object classification and takes spatial properties into account.
• To declare an object found, the probabilities from both the visual and the 3D image must be above a threshold.
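As a concrete illustration of the proposed Gist-plus-SVM object classifiers (and of the Gabor filter bank described above), here is a minimal sketch. It is not the project's implementation: the original Gist descriptor of Oliva & Torralba uses a specific pre-filtering step and filter bank, while the `gist_descriptor` and `train_object_classifier` helpers, the chosen frequencies, and the 4x4 grid here are simplified, illustrative assumptions.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.transform import resize
from skimage.filters import gabor
from sklearn.svm import SVC

FREQUENCIES = (0.1, 0.25, 0.4)                 # assumed Gabor frequencies
ORIENTATIONS = np.arange(0, np.pi, np.pi / 4)  # 4 orientations
GRID = 4                                       # 4x4 spatial blocks

def gist_descriptor(image):
    """Gist-like descriptor: Gabor response magnitudes averaged over a 4x4 grid."""
    gray = image if image.ndim == 2 else rgb2gray(image)
    gray = resize(gray, (128, 128), anti_aliasing=True)
    feats = []
    for freq in FREQUENCIES:
        for theta in ORIENTATIONS:
            real, imag = gabor(gray, frequency=freq, theta=theta)
            mag = np.hypot(real, imag)
            h, w = mag.shape
            # Average the response magnitude inside each grid cell.
            for i in range(GRID):
                for j in range(GRID):
                    cell = mag[i * h // GRID:(i + 1) * h // GRID,
                               j * w // GRID:(j + 1) * w // GRID]
                    feats.append(cell.mean())
    return np.asarray(feats)

def train_object_classifier(images, labels):
    """Train a linear SVM on Gist-like descriptors (labels are object class names)."""
    X = np.stack([gist_descriptor(img) for img in images])
    clf = SVC(kernel="linear", probability=True)
    clf.fit(X, labels)
    return clf
```

At test time, the same descriptor would be computed on each sliding window and passed to `clf.predict_proba` to obtain the per-object confidences that feed the scene-classification step.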