MultiMediaIR

Multimedia Information Retrieval
Modern Information Retrieval Course
Computer Engineering Department
Sharif University of Technology
Spring 2006
Outline



Introduction
Text-Based MMIR
Content-Based Retrieval
  Multimedia IR Model
  Image Retrieval
  Audio Retrieval
  Video Retrieval
Conclusions
Sharif University, Modern Information Retrieval
Course, Spring 2006
Support variety of data

Different kinds of media
  Image: graph, …
  Audio: music, speech, …
  Video
MMIR Motivations

Content, content, and more content …
How to get what is needed?
Increasing availability of multimedia information
Difficult to find, select, filter, and manage AV content
More and more situations where it is necessary to have ‘information about the content’
Key Issues in MMIR
Goals


We want to make multimedia content as searchable as text, because the value of content depends on how easy it is to find, filter, manage, and use.
This needs a content description method that goes beyond simple text annotation.
MMIR Approaches


Text Based MMIR
Content Based MMIR
Text-Based Retrieval

Based on text associated with the file
  URL: http://www.host.com/animals/dogs/poodle.gif
  Alt text: <img src=URL alt="picture of poodle">
  Hyperlink text: <a href=URL>Sally the poodle</a>
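The three text sources on the Text-Based Retrieval slide (URL, alt text, hyperlink text) can be gathered into index terms with a few lines of code. The following is a minimal sketch using only the Python standard library; the names `ImageTextCollector`, `url_terms` and `index_terms` are illustrative and do not come from the slides, and a real engine would add stop-word removal and weighting.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageTextCollector(HTMLParser):
    """Collects alt text of <img> tags and anchor text of <a> tags, keyed by URL."""

    def __init__(self):
        super().__init__()
        self.alt_texts = {}    # image URL -> alt text
        self.link_texts = {}   # link target URL -> anchor text
        self._open_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.alt_texts[attrs["src"]] = attrs.get("alt") or ""
        elif tag == "a" and "href" in attrs:
            self._open_href = attrs["href"]
            self.link_texts.setdefault(self._open_href, "")

    def handle_data(self, data):
        # Text seen while an <a> tag is open is that link's anchor text.
        if self._open_href is not None:
            self.link_texts[self._open_href] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._open_href = None


def url_terms(url):
    """Split the path part of a URL into lowercase word tokens."""
    path = urlparse(url).path
    return [t for t in re.split(r"[^a-z0-9]+", path.lower()) if t]


def index_terms(image_url, html):
    """Union of URL tokens, alt-text words and anchor-text words for one image."""
    collector = ImageTextCollector()
    collector.feed(html)
    text = " ".join([collector.alt_texts.get(image_url, ""),
                     collector.link_texts.get(image_url, "")])
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted(set(url_terms(image_url) + words))
```

Applied to the poodle example, the image ends up indexed under words such as "dogs", "poodle" and "sally", none of which were derived from the pixels themselves.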
Text-based Search Engines

Indexing based on text in the containing web page
  http://www.google.com
  http://www.ditto.com
  …
Keyword-based System
[Diagram: the user's information need is expressed as keywords, which are matched against automatic annotation of the video database, including filename, video title, caption, related web page.]
Why does this happen?
Most of these search engines are keyword based
  You have to express your idea in keywords
  These keywords are expected to appear in the filename or the corresponding web page
Image: The Google Approach

How does image search work?
  Google analyzes the text on the page adjacent to the image, the image caption and dozens of other factors to determine the image content. Google also uses sophisticated algorithms to remove duplicates and ensure that the highest quality images are presented first in your results.
Examples
  Campanile tcd
  Cliffs of Moher
Recall may not be great…
Google image search
Problems with Text-Based Retrieval
The text in the ALT tag has to be written manually
  Expensive
  Time consuming
It is incomplete and subjective
Some features, such as texture or object shape, are difficult to describe in text
Therefore……





Unable to handle the semantic meaning of images
Unable to handle visual position
Unable to handle time information
Unable to use images as queries
…
So …

Works better for simple concepts
  e.g. a picture of a giraffe
Does not work for complex queries
  e.g. a picture of a brick home with black shutters and white pillars, with a pickup truck in front of it (image)
Architecture for Multimedia Retrieval
[Diagram: feature extraction (manual / automatic) produces an AV description, which goes to storage; descriptions are encoded for transmission, transmitted, and decoded; a human or machine consumer pulls content via search / query and browse, or has it pushed via a filter; conf. points are marked along the chain.]
Query-retrieval matrix
[Figure: matrix of query types (text, stills, sketch, speech, sound, humming, examples) against document types (text, video, images, speech, music, sketches, multimedia).]
Example cells:
  conventional text retrieval
  you roar and get a wildlife documentary
  type “floods” and get BBC radio news
  hum a tune and get a music piece
Main Components




Feature Extraction & Analysis
Description Schemes
Searching & Filtering
Examples:
  IBM’s Query By Image Content (QBIC)
  Virage’s VIR Image Engine
  Online: http://collage.nhil.com/
Internal representation


Using attributes is not sufficient
Feature
  Information extracted from objects
  A multimedia object is represented as a set of features
  Features can be assigned manually, automatically, or using a hybrid approach
Features for MMIR

High-level features
  words and phrases from text, speech recognition
Medium-level features
  face detector, region classifiers, outdoor etc.
Low-level features
  Fourier transforms, wavelet decomposition, texture histograms, colour histograms, shape primitives, filter primitives
Internal representation


Values of some specific features are assigned to an object by comparing the object with previously classified objects
Feature extraction cannot be precise
  A weight is usually assigned to each feature value, representing the uncertainty of assigning that value to the feature
  e.g. 80% sure that a shape is a square
MMIR Model’s Main Components

Query Language

Indexing and Searching
Query languages

In designing a multimedia query language, two main aspects require attention
  How the user enters his/her request to the system
  Which conditions on multimedia objects can be specified in the user request
Request specification

Interfaces
  Browsing and navigation
  Specifying the conditions the objects of interest must satisfy, by means of queries
Queries can be specified in two different ways
  Using a specific query language
  Query by example
    Using actual data (object example)
Conditions on multimedia data

Query predicates
Attribute predicates
  Concern the attributes for which an exact value is supplied for each object
  Exact-match retrieval
Structural predicates
  Concern the structure of multimedia objects
  Can be answered by metadata and information about the database schema
  “Find all multimedia objects containing at least one image and a video clip”
Conditions on multimedia data

Semantic predicates
  Concern the semantic content of the required data, depending on the features that have been extracted and stored for each multimedia object
  “Find all the red houses”
  Exact match cannot be applied
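The three kinds of predicate (attribute, structural, semantic) behave differently in code: the first two give a yes/no answer, while a semantic predicate can only rank objects by how confident the extracted features are. The sketch below is illustrative only; `MediaObject`, its fields, and the choice to multiply confidences are assumptions, not something the slides prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class MediaObject:
    title: str                                    # attribute with an exact value
    parts: list                                   # structure, e.g. ["image", "video"]
    features: dict = field(default_factory=dict)  # feature -> {value: confidence}


def attribute_predicate(obj, title):
    """Exact match on an attribute value."""
    return obj.title == title


def structural_predicate(obj):
    """'Contains at least one image and a video clip' (answered from structure only)."""
    return "image" in obj.parts and "video" in obj.parts


def semantic_score(obj, feature, value):
    """Confidence that `feature` takes `value`; 0.0 if it was never extracted."""
    return obj.features.get(feature, {}).get(value, 0.0)


def red_houses(objects):
    """Rank objects for 'find all the red houses' instead of exact matching."""
    scored = [(semantic_score(o, "color", "red") * semantic_score(o, "object", "house"), o.title)
              for o in objects]
    return [title for score, title in sorted(scored, reverse=True) if score > 0]


collection = [
    MediaObject("Villa", ["image"], {"color": {"red": 0.9}, "object": {"house": 0.8}}),
    MediaObject("Barn", ["image", "video"], {"color": {"red": 0.6}, "object": {"house": 0.3}}),
    MediaObject("Lake", ["video"], {"color": {"blue": 0.9}}),
]
```

The confidence values play the role of the feature weights mentioned earlier in the deck (for example, "80% sure that a shape is a square").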
Indexing and searching


Searching for similar patterns
Distance function
  Given two objects, O1 and O2, the distance (= dissimilarity) of the two objects is denoted by D(O1, O2)
Similarity queries
  Whole match
  Sub-pattern match
  Nearest neighbors
  All pairs
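Three of the similarity query types listed on the Indexing and searching slide can be written directly in terms of the distance function D(O1, O2). The sketch below assumes objects are already mapped to fixed-length feature vectors and uses Euclidean distance as D, which is one possible choice; it scans the whole collection linearly, and sub-pattern matching is left out.

```python
import math
from itertools import combinations


def distance(o1, o2):
    """D(O1, O2): Euclidean distance between two feature vectors (Python 3.8+)."""
    return math.dist(o1, o2)


def whole_match(db, query, eps):
    """All objects within distance eps of the query."""
    return [name for name, vec in db.items() if distance(vec, query) <= eps]


def nearest_neighbors(db, query, k):
    """The k objects closest to the query, closest first."""
    return sorted(db, key=lambda name: distance(db[name], query))[:k]


def all_pairs(db, eps):
    """All pairs of objects that lie within distance eps of each other."""
    return [(a, b) for a, b in combinations(sorted(db), 2)
            if distance(db[a], db[b]) <= eps]


db = {"a": (0, 0), "b": (3, 4), "c": (0, 1), "d": (10, 10)}
```

A linear scan like this is the baseline that spatial access methods are meant to beat on large collections.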
Spatial access methods


Map objects into points in f-D space, and use multiattribute access methods (also referred to as spatial access methods or SAMs) to cluster them and to search for them
Methods
  R*-trees and the rest of the R-tree family
  Linear quadtrees
  Grid-files
Linear quadtrees and grid files explode exponentially with the dimensionality
R-tree

Represent a spatial object by its minimum bounding rectangle (MBR)
Data rectangles are grouped to form parent nodes (recursively grouped)
The MBR of a parent node completely contains the MBRs of its children
MBRs are allowed to overlap
Nodes of the tree correspond to disk pages
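The R-tree properties listed above (parent MBR contains child MBRs, overlap is allowed, search descends only into overlapping nodes) can be illustrated with a small hand-built tree. This is a sketch of the search idea only, in two dimensions; it does not implement insertion, node splitting or disk paging, and the names `MBR`, `Node`, `parent_of` and `search` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MBR:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, other):
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and self.xmax >= other.xmax and self.ymax >= other.ymax)

    def overlaps(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def enclosing(mbrs):
    """Smallest rectangle that contains every rectangle in `mbrs`."""
    return MBR(min(m.xmin for m in mbrs), min(m.ymin for m in mbrs),
               max(m.xmax for m in mbrs), max(m.ymax for m in mbrs))


@dataclass
class Node:
    mbr: MBR
    children: list = field(default_factory=list)  # empty for a data rectangle
    label: str = ""


def parent_of(children):
    """Group nodes under a parent whose MBR contains all child MBRs."""
    return Node(enclosing([c.mbr for c in children]), children)


def search(node, query):
    """Labels of all data rectangles overlapping `query`; prunes non-overlapping subtrees."""
    if not node.mbr.overlaps(query):
        return []
    if not node.children:
        return [node.label]
    hits = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits


a = Node(MBR(0, 0, 1, 1), label="a")
b = Node(MBR(2, 2, 3, 3), label="b")
c = Node(MBR(8, 8, 9, 9), label="c")
left, right = parent_of([a, b]), parent_of([c])
root = parent_of([left, right])
```

A query rectangle that misses a parent's MBR never touches that parent's children, which is where the savings over a linear scan come from.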
Outline



Introduction
Text-Based MMIR
Content-Based Retrieval





Multimedia IR Model
Image Retrieval
Audio Retrieval
Video Retrieval
Conclusions
Sharif University, Modern Information Retrieval
Course, Spring 2006
37
Visual Features ...
Texture
Colour
Shape
Histograms
Greyscale histogram of image A
Assuming 256 intensity levels: $h_A(l)$, $l = 1, \dots, 256$
$h_A(l) = \#\{(i,j) \mid A(i,j) = l,\ i = 1, \dots, m,\ j = 1, \dots, n\}$
i.e. a count of the number of pixels at each level
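The greyscale histogram definition translates almost directly into code. In this minimal sketch the image is assumed to be a list of rows of integer intensities, and levels are numbered 0 to 255 (rather than 1 to 256) to match Python indexing.

```python
def grey_histogram(image, levels=256):
    """Count how many pixels of `image` fall on each intensity level."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


# A tiny 2 x 3 "image" with intensities in 0..255.
tiny = [[0, 0, 255],
        [1, 0, 255]]
h = grey_histogram(tiny)
```

The histogram always sums to the number of pixels, m × n, which is a handy sanity check.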
Colour Histogram

Describes the colors in an image and their percentages.
$f_c = \{(I_j, P_j) \mid I_j \in \text{ColorValue},\ 0 \le P_j \le 1,\ \sum_{1 \le j \le N} P_j = 1,\ \text{and } 1 \le j \le N\}$
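A colour histogram in the sense of the definition above is a set of (colour, percentage) pairs whose percentages sum to 1. The sketch below quantises RGB pixels into a small number of bins per channel and compares two histograms with histogram intersection; the bin count and the use of intersection as the similarity measure are assumptions made for illustration, not choices stated on the slide.

```python
from collections import Counter


def colour_histogram(pixels, bins_per_channel=4):
    """Map each quantised colour I_j to its share P_j of the pixels."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {colour: n / total for colour, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the colour mass the two histograms share."""
    return sum(min(p, h2.get(colour, 0.0)) for colour, p in h1.items())


mostly_red = colour_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
red_and_green = colour_histogram([(255, 0, 0), (0, 255, 0)])
```

Because the histogram discards pixel positions, two very different images can share the same histogram, a limitation the conclusions of this deck return to.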
Texture Matching




Texture characterizes small-scale regularity
  Color describes pixels, texture describes regions
Described by several types of features
  e.g., smoothness, periodicity, directionality
Perform weighted vector space matching
  Usually in combination with a color histogram
Texture Test Patterns
Image Retrieval using low level features

See IBM demos at:
  http://wwwqbic.almaden.ibm.com/
  http://mp7.watson.ibm.com/ (video)
Hermitage Museum
  www.hermitagemuseum.org
Berkeley Blobworld
But…..
Low-level features do not work in all cases
Solution: Regional Low-level Image Feature


Segmentation into objects
Extract low-level features from each region
Solution: High-level Image Feature




Objects: persons, roads, cars, skies…
Scenes: indoors, outdoors, cityscape, landscape, water, office, factory…
Events: parade, explosion, picnic, playing soccer…
Generated from low-level features
Audio Genres

Important types of audio data
  Speech-centered
    Radio programs
    Telephone conversations
    Recorded meetings
  Music-centered
    Instrumental, vocal
  Other sources
    Alarms, instrumentation, surveillance, …
Speech-based Documents





Radio/TV news retrieval.
Search archival radio/news broadcasts.
Video and audio email.
Knowledge management: transfer of tacit knowledge to others.
Search audio archives of meetings, lectures, etc.
Preamble


Two utterances of the same words by the same person under the same conditions generate very different waveforms.
Loudness, pitch, brightness, bandwidth, harmonicity, and others are all continuous variables; for audio they play the role that color and texture play for images.
Detectable Speech Features

Content
  Phonemes, one-best word recognition, n-best
Identity
  Speaker identification, speaker segmentation
Language
  Language, dialect, accent
Other measurable parameters
  Time, duration, channel, environment
How Speech Recognition Works

Three stages
  What sounds were made?
    Convert from waveform to subword units (phonemes)
  How could the sounds be grouped into words?
    Identify the most probable word segmentation points
  Which of the possible words were spoken?
    Based on likelihood of possible multiword sequences
All three stages are learned from training data
  Using hill climbing (a “Hidden Markov Model”)
Speech Recognition
[Diagram: a pipeline of phoneme detection, word construction and word selection. Labels on the phoneme side: one-best phoneme transcription, N-best phoneme sequences, phoneme n-grams, phoneme lattices. Resources: phoneme transcription dictionary, word n-gram language model. Outputs: words, one-best word transcript.]
Music and audio analysis



Music is a large and extremely variable audio class.
The range of sounds is large, from music genres to animal cries to synthesizer samples.
Any of the above can and will occur in combination.
Audio retrieval-by-content



Requires some measure of audio similarity.
Most approaches to general audio retrieval take a perceptual approach, using measures such as loudness.
A neural net can map a sound clip to a text description; an obvious drawback is the subjective nature of audio description.
Sample system: Muscle fish



Analyzes sound files for a specific set of psychoacoustic features.
This results in a vector of attributes that include loudness, pitch, bandwidth and harmonicity.
Given enough training samples, a Gaussian classifier can be constructed; for retrieval, a Euclidean distance is used as a measure of similarity.
For retrieval, the distance is computed between a given sound example and all other sound examples (about 400 in the demonstration).
Sounds are ranked by distance, with the closer ones being more similar.
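The retrieval step described for Muscle Fish, ranking every other sound by Euclidean distance to a query sound, can be sketched as follows. The feature values are invented for illustration, and the per-feature standardisation is an added assumption (so that pitch in Hz does not swamp loudness); it is not a description of what the actual system does.

```python
import math
from statistics import mean, pstdev

FEATURES = ("loudness", "pitch", "bandwidth", "harmonicity")


def standardise(vectors):
    """Rescale every feature to zero mean and unit spread across the collection."""
    columns = list(zip(*vectors.values()))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return {name: tuple((x - m) / s for x, m, s in zip(vec, mus, sigmas))
            for name, vec in vectors.items()}


def rank_by_similarity(vectors, query_name):
    """All other sounds, ordered from most to least similar to the query."""
    z = standardise(vectors)
    query = z[query_name]
    others = [name for name in z if name != query_name]
    return sorted(others, key=lambda name: math.dist(z[name], query))


# Hypothetical attribute vectors in the order given by FEATURES.
sounds = {
    "laugh1": (0.60, 300.0, 2000.0, 0.70),
    "laugh2": (0.62, 310.0, 2100.0, 0.68),
    "bell":   (0.90, 880.0, 500.0, 0.95),
    "rain":   (0.30, 0.0, 6000.0, 0.05),
}
ranking = rank_by_similarity(sounds, "laugh1")
```

With a collection of about 400 sounds, as in the demonstration, a linear scan like this is perfectly adequate.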
Music and MIDI retrieval


Using archives of MIDI files, which are score-like representations of music intended for musical synthesizers or sequencers.
Given a melodic query, the MIDI files can be searched for similar melodies.
Polyphonic Music Indexing Technique

n-grams
  encode music as text strings using pitch and onsets
  index text words with text search engine
  process query in the same way
  application: e.g., Query by Humming
Monophonic pitch n-gramming
Interval: 0 +7 0 +2 0 -2 0 -2 0
  [0 +7 0 +2] → ZGZB
  [+7 0 +2 0] → GZBZ
  [0 +2 0 -2] → ZBZb
Example: musical strings with interval-only representation
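The interval-to-text encoding in the example above can be reproduced in a few lines. The slide only shows four symbols (0 → Z, +7 → G, +2 → B, -2 → b); the general rule used here, the k-th uppercase letter for an upward interval of k semitones and lowercase for downward, is an assumption that is consistent with those four.

```python
import string


def interval_symbol(interval):
    """Encode one pitch interval (in semitones) as a single character."""
    if interval == 0:
        return "Z"
    if abs(interval) > 25:  # 26 would collide with "Z", the symbol for 0
        raise ValueError("interval too large for this alphabet")
    letter = string.ascii_uppercase[abs(interval) - 1]
    return letter if interval > 0 else letter.lower()


def intervals_from_pitches(pitches):
    """Differences between consecutive MIDI pitch numbers."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def pitch_ngrams(intervals, n=4):
    """Slide a window of n intervals over the melody and emit text 'words'."""
    symbols = "".join(interval_symbol(i) for i in intervals)
    return [symbols[i:i + n] for i in range(len(symbols) - n + 1)]


words = pitch_ngrams([0, 7, 0, 2, 0, -2, 0, -2, 0])
```

The resulting words can be fed to an ordinary text search engine, and a hummed query is processed the same way before lookup.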
Application

Increasing demand for visual information retrieval
  Retrieve useful information from databases
  Sharing and distributing video data through computer networks
Example: BBC
  BBC archive has +500k queries plus 1M new items … per year
From the BBC …
  Police car with blue light flashing
  Government plan to improve reading standards
  Two shot of Kenneth Clarke and William Hague
Video Search
Active Research Area
Video Search: Features
Color
  Robust to background
  Independent of size, orientation
  Color Histogram [Swain & Ballard]
  “Sensitive to noise and sparse”: Cumulative Histograms [Stricker & Orengo]
  Color Moments
  Color Sets: map RGB color space to Hue Saturation Value, and quantize [Smith, Chang]
  Color layout: local color features by dividing image into regions
  Color Autocorrelograms
Texture
  One of the earliest image features [Haralick et al 70s]
  Co-occurrence matrix
    Orientation and distance on grayscale pixels
  Contrast, inverse difference moment, and entropy [Gotlieb & Kreyszig]
  Human visual texture properties: coarseness, contrast, directionality, line-likeness, regularity and roughness [Tamura et al]
  Wavelet Transforms [90s]
    [Smith & Chang] extracted mean and variance from wavelet subbands
  Gabor Filters
  And so on
Region Segmentation
  Partition image into regions
  Strong segmentation: object segmentation is difficult.
  Weak segmentation: region segmentation based on some homogeneity criteria
Scene Segmentation
  Shot detection, scene detection
  Look for changes in color, texture, brightness
  Context based scene segmentation applied to certain categories such as broadcast news
Video Search: Features
Shape
  Outer boundary based vs. region based
  Fourier descriptors
  Moment invariants
  Finite Element Method (stiffness matrix: how each point is connected to others; eigenvectors of matrix)
  Turning function based (similar to Fourier descriptor), convex/concave polygons [Arkin et al]
  Wavelet transforms leverage multiresolution [Chuang & Kuo]
  Chamfer matching for comparing 2 shapes (linear dimension rather than area)
  3-D object representations using similar invariant features
  Well-known edge detection algorithms.
Face
  Face detection is highly reliable
    Neural Networks [Rowley]
    Wavelet based histograms of facial features [Schneiderman]
  Face recognition for video is still a challenging problem.
    EigenFaces: extract eigenvectors and use as feature space
OCR
  OCR is fairly successful technology.
  Accurate, especially with good matching vocabularies.
  Script recognition still an open problem.
ASR
  Automatic speech recognition fairly accurate for medium to large vocabulary broadcast type data
  Large number of available speech vendors.
  Still open for free conversational speech in noisy conditions.
Video Structures

Image structure
  Absolute positioning, relative positioning
Object motion
  Translation, rotation
Camera motion
  Pan, zoom, perspective change
Shot transitions
  Cut, fade, dissolve, …
Typical Retrieval Framework



User: provides query information that represents his information needs
Database: stores a large collection of video data
Goal: find the most relevant shots from the database
  Shots: “paragraphs” in video, typically 20 – 40 seconds, which are the basic unit of video retrieval
Bridging the Gap
[Diagram: user, video database, result.]
Automatically Structure Video Data
The first step for video retrieval: video “programmes” are structured into logical scenes and physical shots
If dealing with text, then the structure is obvious:
  paragraph, section, topic, page, etc.
  All text-based indexing, retrieval, linking, etc. builds upon this structure
Automatic shot boundary detection and selection of representative keyframes is usually the first step
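Shot boundary detection and keyframe selection, named above as the usual first step, are often introduced with a histogram-difference baseline: declare a cut wherever the intensity histogram changes sharply between consecutive frames. The sketch below assumes each frame is a flat, non-empty list of grey values (0 to 255); the bin count, the threshold and the "middle frame as keyframe" rule are illustrative assumptions, and gradual transitions such as fades and dissolves are not handled.

```python
def frame_histogram(frame, levels=4):
    """Normalised grey-level histogram of one frame."""
    step = 256 // levels
    hist = [0] * levels
    for value in frame:
        hist[value // step] += 1
    return [count / len(frame) for count in hist]


def shot_boundaries(frames, threshold=0.5):
    """Indices i where a new shot starts, i.e. frame i differs sharply from frame i-1."""
    hists = [frame_histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        # Half the L1 distance between normalised histograms lies in [0, 1].
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i])) / 2
        if diff > threshold:
            cuts.append(i)
    return cuts


def keyframes(frames, cuts):
    """Pick the middle frame of every shot as its representative keyframe."""
    starts = [0] + cuts
    ends = cuts + [len(frames)]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]


# Three dark frames followed by two bright frames: one hard cut at index 3.
video = [[10] * 8, [12] * 8, [11] * 8, [200] * 8, [210] * 8]
cuts = shot_boundaries(video)
keys = keyframes(video, cuts)
```

The keyframes are what a keyframe browser would display, one thumbnail per shot.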
Typical automatic structuring of video
[Diagram: a video document is split into a set of shots; a keyframe browser is combined with transcript or object-based search.]
Ideal solution
[Diagram: the system understands the semantic meaning of the user's information need and of the structured video database, and retrieves the result.]
Ideal solution
However,
  1. It is hard to express a query in natural language, and hard for a computer to understand it
  2. Computers have no experience
  3. Other representation restrictions, such as position and time
Alternative Solution
[Diagram: the user provides evidence of relevant information (text, image, audio) for the information need; the system matches it against the structured video database and combines the matches into the result.]
Evidence-based Retrieval System


General framework for current video retrieval systems
Video retrieval based on the evidence from both users and database, including
  Text information
  Image information
  Motion information
  Audio information
Return a relevance score for each piece of evidence
Combine the scores
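The last two steps of the evidence-based framework, one relevance score per kind of evidence followed by a combination, can be sketched with a weighted sum. A linear combination with hand-picked weights is only one possible fusion rule and is assumed here for illustration; the shot names, scores and weights are made up.

```python
def combine_scores(evidence, weights):
    """Fuse per-modality relevance scores into one ranking of shots.

    evidence: {modality: {shot_id: score in [0, 1]}}
    weights:  {modality: weight}; modalities without a weight are ignored.
    """
    combined = {}
    for modality, scores in evidence.items():
        weight = weights.get(modality, 0.0)
        for shot, score in scores.items():
            combined[shot] = combined.get(shot, 0.0) + weight * score
    return sorted(combined, key=combined.get, reverse=True)


evidence = {
    "text":  {"shot1": 0.9, "shot2": 0.2},
    "image": {"shot2": 0.8, "shot3": 0.5},
    "audio": {"shot1": 0.1},
}
weights = {"text": 0.5, "image": 0.3, "audio": 0.2}
ranking = combine_scores(evidence, weights)
```

A shot that is missing from one modality simply contributes nothing for it, so partial evidence still yields a ranking.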
Keyword-based System
[Diagram: keywords expressing the user's information need are matched against automatic annotation of the structured video database, including filename, video title, caption, related web page.]
Keyword-based System
[Diagram: as above, with manual annotation added alongside automatic annotation.]
Manual Annotation



Manually creating annotations/keywords for image / video data
Examples: Gettyimage.com (image retrieval)
Pros:
  Represents the semantic meaning of video
Cons:
  Time-consuming, labor-intensive
  Keywords are not enough to represent the information need
Speech and OCR transcription
[Diagram: keyword annotation of the structured video database now also draws on speech transcription and OCR transcription.]
Query using speech/OCR information
Query: Find pictures of Harry Hertz, Director of the National Quality Program, NIST
Speech: We’re looking for people that have a broad range of expertise that have business knowledge that have knowledge on quality management on quality improvement and in particular …
OCR: H,arry Hertz a Director aro 7 wa,i,,ty Program ,Harry Hertz a Director
What do we lack?
[Diagram: the keyword, speech transcription and OCR transcription pipeline, with image information as the missing piece.]
Image-based Retrieval
[Diagram: in addition to keywords matched against text information, query images are matched against image features of the structured video database.]
Image-based Retrieval
[Diagram: as above, with the image feature split into low-level and high-level features.]
More Evidence in Video Retrieval
[Diagram: the user's information need is expressed as keywords, query images, motion and audio, which are matched against text, image, motion and audio information from the structured video database.]
MPEG-7: The Objective

Standardize object-based description tools for various types of audiovisual information, allowing fast and efficient content searching, filtering and identification, and addressing a large range of applications.
New objective for MPEG:
  MPEG-1, -2 and -4 represent the content itself (‘the bits’)
  MPEG-7 should represent information about the content (‘the bits about the bits’)
Scope of MPEG-7
[Diagram: description creation → description → description consumption; only the description is the scope of MPEG-7.]
Not the description creation
Not the description consumption
Just the description!
The goal is to define the minimum that enables interoperability.
MPEG-7 Terminology: Descriptor

Descriptor (D): A Descriptor is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.
Examples:
Feature            Descriptor
Color              Histogram of Y,U,V components
Shape              ART moments
Motion             Motion field, coefficients of a model
Audio frequency    Average frequency components
Title              Text
Annotation         Text
Genre              Text, index in a thesaurus
Conclusions

Simple image retrieval is commercially available
  Color histograms, texture, limited shape information
Segmentation-based retrieval is still in the lab
  Keep an eye on the Berkeley group
Limited audio indexing is practical now
  Audio feature matching, answering machine detection
Conclusions

Multimedia IR
  Text: good solutions exist
  Video, Image, Sound – a lot of work to do.
Conclusions





The goal of content-based video retrieval is to build a more intelligent video retrieval engine based on semantic meaning
Many applications in daily life
Combine evidence from different aspects
Hot research topic, few business systems
State-of-the-art performance is still unacceptable for normal users; there is room to improve
Conclusions

Problems with Content-Based MMIR
  Must have an example image
  Example image is 2-D
    Hence only that view of the object will be returned
  Large amount of image data
  Similar colour histogram does not equal similar image
Usually the best results come from a combination of both text and content searching
Conclusions

Combination of multi-modal results
  Different characteristics of multi-modal information
    Text-based information: better for middle and high level queries
    Image-based information: better for low and middle level queries
  Combination of multi-modal information
Conclusions

Challenging research questions
Draws on
  computer vision,
  audio processing,
  natural language analysis,
  unstructured document analysis,
  information retrieval,
  information visualisation,
  computer human interaction,
  artificial intelligence