Learning video saliency from human gaze using candidate selection

CVPR 2013 Poster
Outline




Introduction
Method
Experiments
Conclusions
Introduction


Predicting where people look in a video is relevant to many applications.
Image vs. video saliency
Introduction



Two observations:
1. Image saliency studies concentrate on a single image stimulus, without any prior.
2. When watching dynamic scenes, people usually follow the action and the characters by shifting their gaze to a new interesting location in the scene.
Introduction

We propose a novel method for video saliency estimation, inspired by the way people watch videos.
Method


Candidate extraction
Modeling gaze dynamics
Candidate extraction




Three types of candidates:
1. Static candidates
2. Motion candidates
3. Semantic candidates
Candidate extraction


1. Static candidates
Calculate the graph-based visual saliency (GBVS); a sketch follows.
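The paper uses GBVS itself; as a rough stand-in, the hypothetical helper below computes OpenCV's spectral-residual static saliency (requires opencv-contrib-python) and takes local maxima of the map as candidate points. The peak window and threshold are illustrative assumptions, not the paper's settings.

```python
import cv2
import numpy as np

def static_candidates(frame, thresh=0.8, peak_win=15):
    """Bottom-up static candidates from a saliency map's peaks.
    Spectral-residual saliency stands in for GBVS here."""
    algo = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, sal = algo.computeSaliency(frame)
    sal = cv2.normalize(sal, None, 0.0, 1.0, cv2.NORM_MINMAX)
    # A pixel is a candidate if it is the maximum of its local
    # window and sufficiently salient (threshold is an assumption).
    dil = cv2.dilate(sal, np.ones((peak_win, peak_win), np.float32))
    ys, xs = np.nonzero((sal == dil) & (sal > thresh))
    return list(zip(xs.tolist(), ys.tolist()))
```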
Candidate extraction



2. Motion candidates
Calculate the optical flow between consecutive frames.
Apply Difference-of-Gaussians (DoG) filtering to the optical flow magnitude (see the sketch below).
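A minimal sketch of this step, assuming Farneback optical flow and hand-picked DoG sigmas (the slide does not give the exact parameters); motion_candidates is a hypothetical helper.

```python
import cv2
import numpy as np

def motion_candidates(prev_gray, gray, thresh=0.5, peak_win=15):
    # Dense optical flow between consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2).astype(np.float32)
    # DoG filtering of the flow magnitude: difference of a narrow
    # and a wide Gaussian blur (sigmas 2 and 6 are assumed values).
    dog = cv2.GaussianBlur(mag, (0, 0), 2) - cv2.GaussianBlur(mag, (0, 0), 6)
    dog = cv2.normalize(dog, None, 0.0, 1.0, cv2.NORM_MINMAX)
    # Local maxima of the DoG response become motion candidates.
    dil = cv2.dilate(dog, np.ones((peak_win, peak_win), np.float32))
    ys, xs = np.nonzero((dog == dil) & (dog > thresh))
    return list(zip(xs.tolist(), ys.tolist()))
```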
Candidate extraction

Static (a) and motion (b) candidates.
Candidate extraction




3. Semantic candidates
Arise from higher-level visual processing.
Three types: center, face, and body.
Candidate extraction





3. Semantic candidates
Small detections: create a single candidate at their center.
Large detections: create several candidates:
four for body detections (head, shoulders, and torso);
three for faces (eyes, and nose with mouth). A sketch follows.
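A sketch of this rule, assuming detections come as (x, y, w, h) boxes (e.g., from an OpenCV face or person detector); the size cutoff and the part offsets are illustrative guesses, not the paper's values.

```python
import numpy as np

def semantic_candidates(detections, frame_diag, kind, small_frac=0.1):
    """Turn face/body detections into candidate points.
    kind is 'face' or 'body'; frame_diag is the frame diagonal."""
    cands = []
    for (x, y, w, h) in detections:
        if np.hypot(w, h) < small_frac * frame_diag:
            # Small detection: one candidate at its center.
            cands.append((x + w / 2, y + h / 2))
        elif kind == 'body':
            # Large body: head, two shoulders, torso (offsets assumed).
            cands += [(x + w / 2, y + 0.1 * h),
                      (x + 0.25 * w, y + 0.25 * h),
                      (x + 0.75 * w, y + 0.25 * h),
                      (x + w / 2, y + 0.6 * h)]
        else:
            # Large face: two eyes, and nose with mouth.
            cands += [(x + 0.3 * w, y + 0.35 * h),
                      (x + 0.7 * w, y + 0.35 * h),
                      (x + w / 2, y + 0.7 * h)]
    return cands
```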
Candidate extraction

Semantic candidates
Modeling gaze dynamics



Features
Gaze transitions for training
Learning transition probability
Features


We create a feature vector for every ordered pair of (source, destination) candidate locations.
The features fall into two sets: destination-frame features and inter-frame features.
Features

As a low-level spatial cue, we use the local contrast of the neighborhood around the candidate location (sketched below).
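A sketch of one such pair feature vector, with local contrast as a destination-frame cue and the source-to-destination distance as one plausible inter-frame cue; the patch size and this tiny feature set are assumptions, and both helpers are hypothetical.

```python
import numpy as np

def local_contrast(gray, x, y, r=16):
    """Std. dev. of intensities in a (2r+1)^2 patch around (x, y):
    the low-level spatial cue above (patch radius r is assumed)."""
    h, w = gray.shape
    patch = gray[max(0, y - r):min(h, y + r + 1),
                 max(0, x - r):min(w, x + r + 1)]
    return float(patch.std())

def pair_features(src, dst, gray_dst):
    """Feature vector for one ordered (source, destination) pair.
    The paper uses a richer set; these two entries are illustrative."""
    (sx, sy), (dx, dy) = src, dst
    return np.array([
        local_contrast(gray_dst, int(dx), int(dy)),  # destination-frame cue
        np.hypot(dx - sx, dy - sy),                  # inter-frame: gaze-shift distance
    ])
```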
Gaze transitions for training


Determine whether a gaze transition occurs from a given source candidate to a given target candidate:
1. Choose relevant pairs of frames (e.g., at a scene cut).
2. Label positive and negative gaze transitions between these frames (one possible labeling rule is sketched below).
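One plausible way to derive such labels from recorded fixations; the matching radius and the rule itself are assumptions, not the paper's protocol, and label_transitions is a hypothetical helper.

```python
import numpy as np

def label_transitions(pairs, fix_t, fix_tk, radius=40):
    """Label each (source, destination) candidate pair: positive when
    some observer fixates near the source in frame t (fix_t) and near
    the destination in frame t+k (fix_tk), observer-by-observer."""
    labels = []
    for (src, dst) in pairs:
        pos = any(np.hypot(*np.subtract(f0, src)) < radius and
                  np.hypot(*np.subtract(f1, dst)) < radius
                  for f0, f1 in zip(fix_t, fix_tk))
        labels.append(1 if pos else 0)
    return labels
```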
Learning transition probability



Decide whether a transition occurs or not:
Train a standard random forest classifier using the normalized feature vectors and their labels.
The trained model classifies every transition between source and destination candidates and provides a confidence value (training sketched below).
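A minimal training sketch using scikit-learn; the hyperparameters are library defaults rather than the paper's settings, and the helper name is hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

def train_transition_classifier(X, y):
    """Train a random forest on the pairwise feature vectors X and
    transition labels y; returns a confidence function per pair."""
    scaler = StandardScaler().fit(X)          # normalize the feature vectors
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(scaler.transform(X), y)

    def confidence(X_new):
        # Class-1 probability = confidence that the transition occurs.
        return clf.predict_proba(scaler.transform(X_new))[:, 1]

    return scaler, clf, confidence
```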
Learning transition probability

Transition probability P(d | s_i): the probability of shifting gaze from source candidate s_i to destination candidate d.
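Assuming P(d | s_i) is obtained by normalizing the classifier confidences over all destination candidates of a source (a plausible reading of the slide, not stated explicitly), a sketch:

```python
import numpy as np

def transition_probabilities(conf):
    """Turn the confidences for all destinations of one source s_i
    into a distribution P(d | s_i); falls back to uniform when every
    confidence is zero. Plain normalization is an assumption."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    return conf / total if total > 0 else np.full_like(conf, 1 / len(conf))
```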
Experiments



Datasets:
DIEM (Dynamic Images and Eye Movements) dataset
CRCNS dataset
Conclusions



The method is substantially different from existing methods and uses a sparse candidate set to model the saliency map.
Using candidates boosts the accuracy of the saliency prediction and speeds up the algorithm.
The proposed method accounts for the temporal dimension of the video by learning the probability of shifting between salient locations.