INTRODUCTION - SolutioMagister

INTRODUCTION

Multimedia is rapidly growing in several application environments.

Key characteristic: the variety of data it has to support.

Multimedia systems must be able to store, retrieve, transport, and
present data with very heterogeneous characteristics, such as text,
images, graphs, and sound.

More complex than traditional information systems.

Traditional information systems deal only with textual, unstructured
data and do not have metadata information.
DATA MODELLING:
A multimedia IR system should be able to represent and store multimedia objects
in a way that ensures their fast retrieval. The system must therefore be able to
deal with different kinds of media and with semi-structured data.
DATA RETRIEVAL:
The main goal of a multimedia IR system is to perform retrieval efficiently,
based on user requests, exploiting not only data attributes, as in traditional
DBMSs, but also the content of multimedia objects.
Data retrieval relies on the following steps:
Query specification
Query processing and optimization
Query answer
Query iteration
Multimedia IR: Indexing and Searching

In multimedia IR, we have to design fast searching methods that will
search a multimedia database...

Objects can be 2D color images, gray-scale medical images, digitized
voice or music, video clips, etc.

E.g., a typical query would be: 'in a collection of photographs, find
the ones with the same color distribution as a sunset photograph.'

Specific applications include image databases, scientific databases,
DNA/genome databases, etc.

In such databases, typical queries would be: 'find companies whose
stock prices move similarly' or 'find images that look like a sunset'.

To retrieve multimedia objects in this way, the distance between two
objects needs to be quantified. This is done by a distance function D( ).
DEFINITION: Given two objects O1 and O2, the distance (dissimilarity) of the
two objects is denoted by D(O1, O2).
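As an illustration, the sketch below implements one possible D( ): the Euclidean distance between two objects represented as equal-length lists of numbers. The notes do not fix a particular distance function, so both the choice of Euclidean distance and the toy 3-bin color histograms are assumptions made for this example.

```python
import math

def euclidean_distance(o1, o2):
    """Dissimilarity D(O1, O2) between two equal-length numeric vectors."""
    if len(o1) != len(o2):
        raise ValueError("objects must have the same number of components")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(o1, o2)))

# Hypothetical 3-bin color histograms (illustrative values only).
sunset = [0.7, 0.2, 0.1]
beach = [0.4, 0.3, 0.3]

print(euclidean_distance(sunset, sunset))  # identical objects are at distance 0.0
print(euclidean_distance(sunset, beach))   # larger value = more dissimilar
```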
Whole match: Given a collection of N objects O1, O2, O3, . . . ON and a
query object Q, we want to find those data objects that are within a distance e
from Q. Notice that the query and the objects are of the same type. E.g., if the
objects are 512 x 512 images, so is the query.
Partial match: Here the query is allowed to specify only a part of the object.
Given N objects O1, O2, O3, . . . ON, a query object Q, and a tolerance e, we want
to identify the parts of the objects that match the query. E.g., if the objects are
512 x 512 images, the query could be a 16 x 16 sub-pattern.
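The two query types can be sketched with a plain sequential scan. To keep the example short, it uses one-dimensional number sequences rather than images and assumes Euclidean distance; the function names `whole_match` and `partial_match` are hypothetical and chosen only for this illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def whole_match(objects, query, eps):
    """Indices of objects whose distance to the query is at most eps."""
    return [i for i, obj in enumerate(objects) if distance(obj, query) <= eps]

def partial_match(objects, query, eps):
    """(object index, offset) pairs where a window of the object matches the query."""
    hits = []
    w = len(query)
    for i, obj in enumerate(objects):
        # Slide a window of the query's length over the object.
        for start in range(len(obj) - w + 1):
            if distance(obj[start:start + w], query) <= eps:
                hits.append((i, start))
    return hits

series = [[1, 2, 3, 4], [1, 2, 3, 9], [5, 5, 5, 5]]
print(whole_match(series, [1, 2, 3, 5], eps=1.0))  # [0]
print(partial_match(series, [2, 3], eps=0.5))      # [(0, 1), (1, 1)]
```

In the whole-match call the query has the same length as every object; in the partial-match call it is shorter, and the result reports where inside each object the match occurs.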
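For all types of queries, the ideal method should fulfill the following
requirements:
It should be fast: sequentially scanning the collection and computing the distance to every object is too slow for large databases.
It should be correct: it must return all qualifying objects, with no false dismissals. False alarms are acceptable, since a post-processing step can discard them.
It should be dynamic: inserting, deleting, and updating objects should be easy.
It should require a small space overhead.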
BASIC PRINCIPLES USED IN GEMINI (GEneric Multimedia INdexIng)
Spatial access methods: The idea is to use multi-attribute access methods to
search for data. The structures commonly used for multidimensional indexing
are R-trees, R*-trees, etc.
R-tree-based methods tend to be more robust in higher dimensions, provided that
the fan-out of the tree nodes remains greater than 2.
The R-tree represents a spatial object by its minimum bounding rectangle
(MBR). Data rectangles are grouped to form parent nodes, which are in turn
recursively grouped to form a tree hierarchy. The MBR of a parent node
encloses the MBRs of its children. MBRs are allowed to overlap.
A range query specifies a region of interest and asks for all the data regions that
intersect it. To answer this query, we first retrieve a superset of the qualifying
regions: we compute the MBR of the query region and then recursively descend the
R-tree, excluding the branches whose MBRs do not intersect the query MBR. Thus
the R-tree quickly yields the data regions whose MBR intersects the query MBR.
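The pruning step of the range query can be sketched as follows. This is not a full R-tree (there is no insertion or node splitting): the tree is built by hand, and the `Node` class, the `(xmin, ymin, xmax, ymax)` rectangle layout, and the helper names are assumptions made for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)   # empty for a data rectangle
    label: str = ""

def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def enclose(mbrs):
    """Smallest rectangle that contains all the given rectangles."""
    return (min(m[0] for m in mbrs), min(m[1] for m in mbrs),
            max(m[2] for m in mbrs), max(m[3] for m in mbrs))

def parent_of(children):
    """Group child nodes under a parent whose MBR encloses theirs."""
    return Node(mbr=enclose([c.mbr for c in children]), children=children)

def range_query(node, query_mbr):
    """Labels of data rectangles whose MBR intersects the query MBR."""
    if not intersects(node.mbr, query_mbr):
        return []                  # prune this whole branch
    if not node.children:
        return [node.label]        # data rectangle reached
    hits = []
    for child in node.children:
        hits.extend(range_query(child, query_mbr))
    return hits

a = Node((0, 0, 1, 1), label="a")
b = Node((2, 2, 3, 3), label="b")
c = Node((8, 8, 9, 9), label="c")
d = Node((7, 6, 8, 7), label="d")
root = parent_of([parent_of([a, b]), parent_of([c, d])])

print(range_query(root, (0.5, 0.5, 2.5, 2.5)))  # ['a', 'b']
```

The branch holding `c` and `d` is discarded after a single rectangle test, without visiting either child. The result is a candidate set based on MBRs only, so the exact shapes would still need to be checked afterwards.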
A Generic Multimedia Indexing Approach
To illustrate this idea we'll focus on whole match queries. The problem is
defined as follows:
We have a collection of N objects: O1, O2, O3, . . . ON. The distance
(dissimilarity) between two objects Oi and Oj is given by the function D(Oi, Oj),
which can be implemented as a program. The user specifies a query object Q
and a tolerance e.
Our goal is to find the objects in the collection that are within distance e of the
query object. An obvious solution is to apply sequential scanning. But this
may be slow, for two reasons:
(1) The distance computation might be expensive.
(2) The database size N might be huge.
A faster alternative is based on two ideas:
(1) A 'quick and dirty' test to quickly discard the majority of non-qualifying
objects;
(2) The use of spatial access methods.
The idea behind the quick and dirty test is to characterize an object (for
example, a sequence of values) with a single number, such as its average, which
will help us discard many non-qualifying objects. Thus, using a good feature, we
can have a quick test.
If using one feature is good, using two or more features might be even better.
The end result of using f features for each of our objects is that we can map
each object into a point in f-dimensional space.
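A minimal sketch of the two ideas, assuming the objects are equal-length number sequences compared with Euclidean distance and the single feature is the average. For sequences of length n, the Cauchy-Schwarz inequality gives D(x, y) >= sqrt(n) * |avg(x) - avg(y)|, so any answer must have an average within e / sqrt(n) of the query's average. A sorted list searched with `bisect` stands in for the spatial access method here; that substitution and all function names are assumptions of the example.

```python
import bisect
import math

def distance(a, b):
    """Actual (expensive) distance: Euclidean distance between sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average(seq):
    """The single feature used by the quick and dirty test."""
    return sum(seq) / len(seq)

def build_index(sequences):
    """Sorted (feature, position) pairs: a 1-d stand-in for a spatial index."""
    return sorted((average(s), i) for i, s in enumerate(sequences))

def range_search(sequences, index, query, eps):
    """Return (candidates, answers) for a whole match query with tolerance eps."""
    radius = eps / math.sqrt(len(query))   # tolerance translated to feature space
    q_avg = average(query)
    # Step 1: quick test in feature space, answered by the index.
    lo = bisect.bisect_left(index, (q_avg - radius, -1))
    hi = bisect.bisect_right(index, (q_avg + radius, len(sequences)))
    candidates = [i for _, i in index[lo:hi]]
    # Step 2: discard false alarms with the actual distance.
    answers = sorted(i for i in candidates if distance(sequences[i], query) <= eps)
    return candidates, answers

sequences = [[1, 2, 3, 4], [1, 2, 3, 9], [5, 5, 5, 5], [4, 3, 2, 1]]
index = build_index(sequences)
candidates, answers = range_search(sequences, index, [1, 2, 3, 5], eps=1.0)
print(candidates)  # [0, 3]  (sequence 3 is a false alarm)
print(answers)     # [0]
```

Only two of the four sequences reach the expensive distance computation. Sequence 3 has the same average as sequence 0 but a different shape, which is why a second, exact step is still required.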
FEATURE EXTRACTION
Lower-bound the distance and use good features. The next step is to use tools
such as the DFT, Parseval's theorem, the DCT, etc. to show that the distance in
feature space lower-bounds the actual distance.
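The lower-bounding property can be checked numerically for the DFT. With an orthonormal DFT (scaled by 1/sqrt(n)), Parseval's theorem says the Euclidean distance between two sequences equals the distance between their full sets of coefficients, so keeping only the first k coefficients drops non-negative terms and can only shrink the distance. The sketch below is an assumption-laden illustration in pure Python; the helper names and the choice of k are hypothetical.

```python
import cmath
import math

def distance(a, b):
    """Actual distance: Euclidean distance in the time domain."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dft(seq):
    """Orthonormal DFT (scaled by 1/sqrt(n)), so that energy is preserved."""
    n = len(seq)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(seq))
        / math.sqrt(n)
        for f in range(n)
    ]

def features(seq, k):
    """Feature vector: the first k DFT coefficients of the sequence."""
    return dft(seq)[:k]

def feature_distance(fa, fb):
    """Euclidean distance between two complex feature vectors."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fa, fb)))

x = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
y = [2.0, 2.0, 1.0, 5.0, 3.0, 1.0, 1.0, 1.0]

actual = distance(x, y)
for k in (1, 2, 4, len(x)):
    approx = feature_distance(features(x, k), features(y, k))
    # approx never exceeds actual; with k = len(x) the two agree (Parseval).
    print(k, round(approx, 6), round(actual, 6))
```

Because the feature distance never overestimates the actual distance, a range query in feature space with the same tolerance cannot miss a qualifying object; it can only return extra candidates.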