The Department of Systems and Control Engineering invites you to a talk entitled:
Wearable Personal Assistant for the Visually Impaired
Computer vision algorithms for scene reconstruction,
recognition and segmentation
Speaker:
Dr. Michael Sapienza
Date:
Friday 18th March 2016
Venue:
Engineering Board Room (Rm 411)
Time:
09:30 – 10:30
Abstract:
Motivated by the creation of a wearable personal assistant for the visually impaired, we are
developing computer vision algorithms for scene reconstruction, recognition and segmentation, of
which three selected works will be presented in this talk.
The segmentation of images into semantic classes such as 'person' and 'bus' plays a central role in
scene understanding. Recent approaches use deep learning for image segmentation; however, their
ability to delineate objects of interest is limited. To alleviate this issue, we combine the
strengths of convolutional neural networks (CNNs) and conditional random fields (CRFs), which
allows us to enforce consistency between the image labelling and appearance. By integrating the
CRF architecture into a CNN, it becomes possible to train the whole deep network via
backpropagation, resulting in higher-quality segmentations; an online demo is available at:
http://www.robots.ox.ac.uk/~szheng/crfasrnndemo.
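As a rough illustration of the idea, not the implementation behind the demo, the sketch below
unrolls mean-field inference for a dense CRF into a recurrent computation sitting on top of a
CNN's per-pixel scores; the box filter, the class count and the Potts compatibility matrix are
placeholder assumptions standing in for the learned bilateral filtering used in practice.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_step(unary, q, compat, smooth=3):
    """One mean-field update: message passing -> compatibility -> unaries -> normalise.

    unary  : (L, H, W) per-class costs from the CNN (negative log-scores)
    q      : (L, H, W) current marginal estimates
    compat : (L, L) label-compatibility matrix (e.g. a Potts model)
    """
    n_labels = q.shape[0]
    # 1) Message passing: a simple box filter stands in for the
    #    spatial/bilateral Gaussian filtering used in the real model.
    filtered = np.stack([uniform_filter(q[l], size=smooth) for l in range(n_labels)])
    # 2) Compatibility transform mixes the messages across labels.
    pairwise = np.tensordot(compat, filtered, axes=1)
    # 3) Add the unary costs and renormalise to obtain the new marginals.
    return softmax(-(unary + pairwise), axis=0)

# Unrolling a fixed number of these steps yields a recurrent network whose
# parameters (compatibility matrix, filter weights) can be trained jointly
# with the CNN by backpropagation.
unary = np.random.rand(4, 48, 64)   # hypothetical 4-class cost volume
q = softmax(-unary, axis=0)         # initialise marginals from the unaries
compat = 1.0 - np.eye(4)            # Potts-style compatibility
for _ in range(5):
    q = mean_field_step(unary, q, compat)
labels = q.argmax(axis=0)           # final per-pixel labelling
```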
Whilst deep convolutional networks give state-of-the-art performance on 2D image segmentation,
they require large amounts of training data and are constrained to a specific set of categories on
which the network has been trained. In order to personalise the learning to a user's environment,
we designed an interactive system for the geometric reconstruction and online object-class
learning and segmentation of 3D scenes. Using our system, a user can walk into a room wearing a
consumer depth camera, and both reconstruct the 3D scene and interactively segment it on the fly
into object classes which the user has selected, such as 'my wallet', 'my strange telephone' and 'my
particular chair'. The code-base has been made available at: www.semantic-paint.com.
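The following is only a minimal sketch of the online learning loop described above, not the
SemanticPaint code itself: a streaming classifier is updated from the voxels the user labels
interactively and then predicts a class for every reconstructed voxel. The feature extraction,
the interaction stream and the class list are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical class list chosen by the user while 'painting' the scene.
CLASSES = np.array([0, 1, 2, 3])  # e.g. background, wallet, telephone, chair

clf = SGDClassifier()  # simple streaming linear classifier

def voxel_features(voxels):
    """Placeholder for per-voxel appearance/geometry features."""
    return np.asarray(voxels, dtype=float)

def user_label_stream():
    """Placeholder generator yielding (voxels, labels) from user interactions."""
    rng = np.random.default_rng(0)
    for _ in range(10):
        voxels = rng.normal(size=(32, 6))      # hypothetical 6-D features
        labels = rng.choice(CLASSES, size=32)  # labels painted by the user
        yield voxels, labels

# Online training: each interaction refines the model immediately,
# so the system personalises itself to the user's own objects.
for voxels, labels in user_label_stream():
    clf.partial_fit(voxel_features(voxels), labels, classes=CLASSES)

# Dense labelling of the current reconstruction (random stand-in voxels here).
scene_voxels = np.random.default_rng(1).normal(size=(1000, 6))
scene_labels = clf.predict(voxel_features(scene_voxels))
print(np.bincount(scene_labels, minlength=len(CLASSES)))
```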
It is not always possible to recognise objects from visual cues alone since different object
categories can look visually very similar. However, if the visually similar objects are made of
different materials, then they can be more easily discriminated using auditory cues rather than
visual ones. We therefore present an approach that combines dense visual cues with sparse
auditory cues in order to estimate dense object and material labels. Since estimates of object class
and material properties are mutually informative, we optimise a multi-label output jointly using a
random-field framework. We demonstrate the value of this approach on a new dataset with paired
visual and auditory data made available at: www.robots.ox.ac.uk/~tvg/projects/AudioVisual
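As a rough sketch rather than the exact model used in the talk, the joint optimisation can be
written as minimising a random-field energy in which an object-material compatibility term couples
the two label fields; the symbols below (dense visual unaries, sparse auditory unaries on the
tapped locations A, pairwise smoothness terms over neighbouring pixels E) are assumed notation.

```latex
% Hedged sketch of a joint energy over object labels o_i and material labels m_i:
E(\mathbf{o}, \mathbf{m}) =
    \sum_i \psi^{\mathrm{vis}}_i(o_i)
  + \sum_{i \in \mathcal{A}} \psi^{\mathrm{aud}}_i(m_i)
  + \sum_i \phi(o_i, m_i)
  + \sum_{(i,j) \in \mathcal{E}} \big[ \psi_{ij}(o_i, o_j) + \psi_{ij}(m_i, m_j) \big]
```

The compatibility term \(\phi(o_i, m_i)\) is what makes the object and material estimates mutually
informative: an auditory cue that suggests 'metal' raises the score of object classes that are
typically metallic, and vice versa.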
Biography:
Michael Sapienza received his B.Eng. and M.Sc. in Electrical Engineering from the University of
Malta. Following an internship with the PERCEPTION team at INRIA Grenoble, Michael moved to
Oxford (2011) to pursue a doctorate with the Oxford Brookes Vision Group. His research addressed
the question of how a machine may automatically identify the parts of videos in which human
actions occur, given huge video databases without location annotation. He is currently a
post-doctoral research assistant with the Torr Vision Group at the University of Oxford.
His research interests span computer vision, machine learning, robotic perception and human
interaction.
Please indicate your interest in attending this talk by sending an e-mail
to Alexandra Bonnici: alexandra.bonnici@um.edu.mt by Thursday 10th
March.