The Department of Systems and Control Engineering invites you to a talk entitled:

Wearable Personal Assistant for the Visually Impaired
Computer vision algorithms for scene reconstruction, recognition and segmentation

Speaker: Dr. Michael Sapienza
Date: Friday 18th March 2016
Venue: Engineering Board Room (Rm 411)
Time: 09:30 – 10:30

Abstract:

Motivated by the creation of a wearable personal assistant for the visually impaired, we are developing computer vision algorithms for scene reconstruction, recognition and segmentation, three selected works of which will be presented in this talk.

The segmentation of images into semantic classes such as 'person' and 'bus' plays a central role in scene understanding. Recent approaches use deep learning for image segmentation; however, their ability to delineate objects of interest is limited. To alleviate this issue, we combine the strengths of convolutional neural networks (CNNs) and conditional random fields (CRFs), which allows us to enforce consistency between the image labelling and appearance. By integrating the CRF architecture into a CNN, it becomes possible to train the whole deep network via backpropagation, resulting in higher-quality segmentations; an online demo is available at: http://www.robots.ox.ac.uk/~szheng/crfasrnndemo.

Whilst deep convolutional networks give state-of-the-art performance on 2D image segmentation, they require large amounts of training data and are constrained to the specific set of categories on which the network has been trained. In order to personalise the learning to a user's environment, we designed an interactive system for the geometric reconstruction and online object-class learning and segmentation of 3D scenes. Using our system, a user can walk into a room wearing a consumer depth camera, and both reconstruct the 3D scene and interactively segment it on the fly into object classes which the user has selected, such as 'my wallet', 'my strange telephone' and 'my particular chair'. The code-base has been made available at: www.semantic-paint.com.

It is not always possible to recognise objects from visual cues alone, since different object categories can look visually very similar. However, if visually similar objects are made of different materials, they can be discriminated more easily using auditory cues than visual ones. We therefore present an approach that combines dense visual cues with sparse auditory cues in order to estimate dense object and material labels. Since estimates of object class and material properties are mutually informative, we optimise a multi-label output jointly using a random-field framework. We demonstrate the value of this approach on a new dataset with paired visual and auditory data, made available at: www.robots.ox.ac.uk/~tvg/projects/AudioVisual

Biography

Michael Sapienza received his B.Eng. and M.Sc. in Electrical Engineering from the University of Malta. Following an internship with the PERCEPTION team at INRIA Grenoble, Michael moved to Oxford in 2011 to pursue a doctorate with the Oxford Brookes Vision Group. His research addressed the question of how a machine may automatically identify the parts of videos in which human actions occur, given huge video databases without location annotations. Michael is currently a post-doctoral research assistant with the Torr Vision Group at the University of Oxford. His research interests span computer vision, machine learning, robotic perception and human interaction.
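Note: as an illustrative aside (not taken from the talk materials), the CNN–CRF combination described in the abstract typically corresponds to minimising a dense-CRF energy over the pixel labelling x of the form

E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)

where the unary potentials \psi_u are the per-pixel class costs produced by the CNN and the pairwise potentials \psi_p penalise different labels at pixels with similar colour and position; the exact formulation used by the speaker may differ.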
Please indicate your interest in attending this talk by sending an e-mail to Alexandra Bonnici (alexandra.bonnici@um.edu.mt) by Thursday 10th March.