MediaEval Workshop 2011

Pisa, Italy
1-2 September 2011
Introduction
• Genre Tagging task: given 1727 videos and 26
genre tags, decide which tag goes to which
video.
• Genres included art, health, literature,
technology, sports, blogs, religion, travel, etc.
• Videos were from an online video hosting site
called blip.tv
Introduction (cont.)
• Data given to us: videos, speech transcripts,
metadata and some user-defined tags.
• The videos were divided into two sets:
– Development set: 247 videos for which we were
given the ground truth, so that we could tune
our algorithm.
– Test set: 1727 videos for which no ground truth
was given; we had to submit our results to the
workshop.
TUD-MIR at MediaEval 2011 Genre
Tagging Task: Query
Expansion from a Limited Number
of Labeled Videos
Main Idea
• Information Retrieval approach
• Just used the textual data
• Using a relatively small number of labeled
videos in the development set to mine query
expansion terms that are characteristic of
each genre.
Approach
• Combine all the videos of the same genre in
the development set into a single genre document.
• Apply preprocessing such as stop-word
removal and stemming.
• Weight and rank all the terms in the
development set vocabulary.
• Use the top 20 terms from each genre
document as expanded query terms.
Offer Weighting Formula
OW(i) = r * log( ((r + 0.5) * (N - n - R + r + 0.5)) / ((n - r + 0.5) * (R - r + 0.5)) )
In the formula above, r is the number of videos of
a particular genre in which term t(i) appears, R
is the total number of videos of that genre, N is
the total number of videos in the collection and n
is the number of videos in the collection in which
term t(i) appears.
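As a concrete illustration, the Offer Weight and the top-20 term selection can be sketched in Python; the function and variable names here are my own, not from the paper:

```python
import math

def offer_weight(r, n, R, N):
    """Offer Weight for one term: r videos of the genre contain the term,
    n videos in the whole collection contain it, R is the genre size and
    N the collection size. The 0.5 terms smooth away zero counts."""
    return r * math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                        ((n - r + 0.5) * (R - r + 0.5)))

def top_terms(term_stats, R, N, k=20):
    """Rank a genre's vocabulary by Offer Weight and keep the top k.
    term_stats maps term -> (r, n) counts for that genre."""
    ranked = sorted(term_stats,
                    key=lambda t: offer_weight(*term_stats[t], R, N),
                    reverse=True)
    return ranked[:k]
```

A genre-specific term (high r relative to n) scores high, while a common word appearing everywhere gets a negative weight, which is exactly why the top-ranked terms make good expansion terms.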
Other Query Expansion
Techniques
• They also ran several other query expansions:
PRF, WordNet, Google Sets and YouTube.
• To expand queries via YouTube, they first
downloaded the metadata (e.g. title, description
and tags) of the top-50 ranked videos returned
by YouTube for each genre label, except the
default category, and sampled 20 expansion
terms from it using the Offer Weight
explained earlier.
LIA @ MediaEval 2011 : Compact
Representation of
Heterogeneous Descriptors for
Video Genre Classification
Main Idea
• Classification approach
• A method that extracts a low-dimensional
feature space from text, audio and video
information.
• Late fusion of SVM results for each modality.
Data Collection
• The training data set was collected from the web.
• They first expanded the query terms using
Latent Dirichlet Allocation (LDA) on the
Gigaword corpus and then used the top 10
expanded terms for each genre.
• They queried YouTube and Dailymotion for
the videos (3120 videos in total).
• For textual data they used web pages from
Google (1560 documents/web pages).
Features Extracted
• Features:
– Text: TF-IDF metric.
– Audio: acoustic frames of MFCCs computed every
10 ms over a 20 ms Hamming window.
– Visual: color structure descriptor, dominant
color descriptor, homogeneous texture
descriptor and edge histogram descriptor. Texture
was the best feature according to them.
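The audio framing step (20 ms Hamming windows every 10 ms, which precedes the MFCC computation itself) can be sketched with NumPy; the sample rate and names are my own assumptions:

```python
import numpy as np

def frame_signal(signal, sr=16000, win_ms=20, hop_ms=10):
    """Slice audio into overlapping 20 ms frames every 10 ms and apply
    a Hamming window, as done before computing MFCCs per frame."""
    win = int(sr * win_ms / 1000)   # samples per frame
    hop = int(sr * hop_ms / 1000)   # samples between frame starts
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop : i * hop + win]
                       for i in range(n_frames)])
    return frames * np.hamming(win)
```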
Classification
• Each modality is given separately to an SVM
classifier and the per-modality scores are
combined using linear interpolation.
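The late-fusion step amounts to a weighted average of per-genre scores across modalities; a minimal sketch, with names and weights of my own choosing:

```python
def late_fusion(scores_by_modality, weights):
    """Linearly interpolate per-genre scores from each modality.
    scores_by_modality is a list of {genre: score} dicts (one per
    modality); weights should sum to 1."""
    genres = scores_by_modality[0].keys()
    return {g: sum(w * s[g] for w, s in zip(weights, scores_by_modality))
            for g in genres}
```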
User Name Similarity
• They also tried to use user name similarity
with the training set. They treat the mapping
from user names to genres as a knowledge base
and use it to boost the genre scores.
• So they increase a video's score for a genre
if the user name of that video exists in
the knowledge base (development set).
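The boosting rule can be sketched as follows; the multiplicative boost factor of 1.5 is an illustrative choice, not a value from the paper:

```python
def boost_by_username(scores, username, knowledge_base, boost=1.5):
    """If the uploader appears in the development-set knowledge base,
    multiply the scores of the genres associated with that user name.
    The boost factor is a hypothetical value for illustration."""
    out = dict(scores)
    for genre in knowledge_base.get(username, ()):
        if genre in out:
            out[genre] *= boost
    return out
```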
TUB @ MediaEval 2011 Genre
Tagging Task: Prediction
using Bag-of-(visual)-Words
Approaches
Main Idea
• Classification approach
• Bag-of-words approaches with different
features derived from the visual content and
the associated textual information
Features Extracted
• Mainly textual features:
– They translated foreign-language ASR transcripts
into English using Google Translate.
– Used a bag-of-words (TF-IDF) model for the textual
features.
• For visual features:
– They used local SURF features extracted from each
key frame of the video sequence.
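Turning per-keyframe local descriptors into a fixed-length bag-of-visual-words vector is the quantization step of such approaches; a minimal NumPy sketch, assuming the codebook (e.g. k-means centres over SURF descriptors) has already been trained:

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Assign each local descriptor (rows) to its nearest codebook
    centre and return a normalized visual-word histogram."""
    # squared Euclidean distance from every descriptor to every centre
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resulting histogram plays the same role for images that the TF-IDF vector plays for text, which is what makes early fusion of the two straightforward.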
Classification
• Fusion:
– Early fusion of visual and textual features,
followed by SVM classification.
• Classification:
– Used multi-class SVM, multinomial Naïve Bayes
and nearest neighbor classifiers.
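The textual side of this pipeline (bag-of-words TF-IDF features feeding the three classifiers) can be sketched with scikit-learn; this is a generic stand-in, not the authors' exact configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def build_classifiers():
    """One TF-IDF bag-of-words front end shared by three classifiers:
    multi-class linear SVM, multinomial Naive Bayes, nearest neighbor."""
    return {name: make_pipeline(TfidfVectorizer(), clf)
            for name, clf in [("svm", LinearSVC()),
                              ("nb", MultinomialNB()),
                              ("knn", KNeighborsClassifier(n_neighbors=3))]}
```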
SINAI-Genre tagging of videos
based on information retrieval and
semantic similarity using WordNet
Main Idea
• IR approach
• Query expansion using WordNet
• A similarity measure other than cosine
similarity
Approach
• Query Expansion: Produce a bag of words
using WordNet’s synonyms, hyponyms and
domain terms for each genre term.
• An existing framework, the Terrier IR system, was
used to obtain a measure of relatedness
between the videos and the genre terms.
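Building a genre's bag of words from lexical relations can be sketched as below; the lookup dictionaries are stubs standing in for WordNet's synonym and hyponym relations, and all names are my own:

```python
def expand_genre(term, synonyms, hyponyms):
    """Collect the genre term, its synonyms, its hyponyms and the
    hyponyms' synonyms into one bag of words. The dicts stand in
    for WordNet lookups."""
    bag = {term}
    bag.update(synonyms.get(term, ()))
    for h in hyponyms.get(term, ()):
        bag.add(h)
        bag.update(synonyms.get(h, ()))
    return bag
```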
Second Approach
• They also used a formula proposed by Lin,
which is based on WordNet, to measure the
semantic similarity between the nouns
detected in each test video and the bags of
words generated for each genre category.
• They then kept only the matches whose score
exceeded a threshold of 0.75.
• Finally, the accumulated similarity score has
been divided by the number of words
detected in the video, obtaining the final
semantic similarity score.
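The scoring rule described above (threshold the pairwise similarities, accumulate, then normalize by the number of detected nouns) can be sketched as follows; `sim` abstracts Lin's WordNet-based measure, and the names are illustrative:

```python
def genre_score(video_nouns, genre_bag, sim, threshold=0.75):
    """Accumulate similarities above the threshold between every noun
    detected in the video and the genre's bag of words, then divide
    by the number of detected nouns."""
    total = 0.0
    for noun in video_nouns:
        for word in genre_bag:
            s = sim(noun, word)
            if s > threshold:
                total += s
    return total / len(video_nouns) if video_nouns else 0.0
```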
Results for All Approaches

Work                       MAP Score
TUD-MIR (IR approach)      0.3212
LIA (Classification)       0.1828
TUB (Classification)       0.3049
SINAI (IR approach)        0.1115
Our result (IR approach)   0.1081