IEEE International Conference on Consumer Electronics
Los Angeles, CA, June 2001
B. S. Manjunath
University of California,
Santa Barbara, USA manj@ece.ucsb.edu
Prof. P. Salembier (for permitting me to use some of his slides from our tutorial at ICIP’2000).
Editors of the MPEG-7 XM and WD.
Dr. L. Cieplinski, Mitsubishi Electric
Dr. A. Divakaran, Mitsubishi
Dr. S. Jeannin, Philips research lab
Prof. W.Y. Kim, Hanyang University
Dr. M. Bober, Mitsubishi Electric
Dr. H. J. Kim, LG Electronics
Dr. S. Park, ETRI
Overview of the visual descriptors
Their capabilities and limitations
Some application examples
Pointers to publicly available documents
Do not expect
Programming and implementation details – these are available in the MPEG-7 eXperimental Model (XM) document.
Binary bitstream syntax – see the MPEG-7 Committee Draft(s).
Objective, goals, requirements and applications
Basic components of the MPEG-7 standard
Description Definition Language
Description Schemes
A/V Descriptors
Color
Shape
Texture
Motion
Face
©Salembier
The multimedia context:
More information is in digital form and is on-line.
AV content covers: still pictures, audio, speech, video, graphics, 3D models, etc.
AV content is available at all bitrates and on all networks.
Increasing number of multimedia applications and services.
Necessity of describing content:
Increasing amount of information.
Growing need for “information about the content”.
Content is difficult to manage (find, select, filter, organize, etc.).
User: human or computational systems.
MPEG-1 : Storage of moving picture and audio on storage media (CD-ROM) 11 / 1992
MPEG-2 : Digital television 11 / 1994
MPEG-4 : Coding of natural and synthetic media objects for multimedia applications v1: 09 / 1998 v2: 11 / 1999
MPEG-7 : Multimedia content description for AV material 08 / 2001
MPEG-21 : Digital audiovisual framework: Integration of multimedia technologies (identification, copyright, protection, etc.) 11 / 2001
Standardize content-based description for various types of audiovisual information
Enable fast and efficient content searching, filtering and identification
Describe several aspects of the content (low-level features, structure, semantics, models, collections, creation, etc.)
Address a large range of applications
(
user preferences)
Types of audiovisual information:
Audio, speech
Moving video, still pictures, graphics, 3D models
Information on how objects are combined in scenes
Descriptions independent of the data support
Existing solutions for textual content or description
Pull Applications: “Search and Browsing”
Example: Internet search engines and databases
Advantage: Queries based on standardized descriptions
Push Applications: “Filtering”
Example: Broadcast video, Interactive television
Advantage: Intelligent agents filter on the basis of standardized descriptions
Universal Multimedia Access:
Adapt delivery to network and terminal characteristics (QoS)
Specialized Professional and Control Applications
Storage and retrieval of audiovisual databases (image, film, radio archives)
Broadcast media selection (radio, TV programs)
Surveillance (traffic control, surface transportation, production chains)
E-commerce and Tele-shopping (searching for clothes / patterns)
Remote sensing (cartography, ecology, natural resources management)
Entertainment (searching for a game, for a karaoke)
Cultural services (museums, art galleries)
Journalism (searching for events, persons)
Personalized news service on Internet (push media filtering)
Intelligent multimedia presentations
Educational applications
Bio-medical applications
Text:
Find AV material with the concepts described by the text
Semantic:
Find AV material corresponding to the specified semantics
Image:
Find an image with a similar characteristic (global or local)
Music:
Play a few notes and search for corresponding musical pieces
Motion:
Find video with specific object motion trajectories
Description may be separated from the content.
Description may be multiplexed with the content (AV / Desc / AV / Desc / ...).
Information about the content: recording date & conditions, title, author, copyright, coding format, classification, etc.
Information present in the content: Combination of low level and high level descriptors
High level description:
Efficient and powerful
Lack of flexibility
Low level description
Generic and flexible
[Figure: indexing (feature extraction) and search / retrieval with high-level vs. low-level descriptions. Both involve a recognition process; the high-level path favors an intelligent, efficient search engine, while the low-level path places no restriction on the search.]
Having a standard will ease the task of fast and efficient identification of content that is of interest to the user by
allowing the same described content to be accessed by more search engines and filters
allowing the same search engine and filters to identify described content from more sources
[Figure: Scope of MPEG-7: description generation → description → description consumption; generation and consumption are left to research and future competition.]
The description generation (feature extraction, indexing process, annotation & authoring tools, ...) and the description consumption (search engine, filtering tool, retrieval process, browsing device, ...) are non-normative parts of MPEG-7.
The goal is to define the minimum that enables interoperability
[Figure: MPEG-7 timeline, 1996 to 2001. Competition phase (individual work; definition of the scope and requirements), followed by a collaboration phase (common work, core experiments, eXperimentation Model, requirements). Milestones: Call for proposals, Working draft, Committee draft, Final committee draft, Draft international standard, International standard.]
[Figure: MPEG-7 processing chain. Feature extraction (manual / automatic) produces the AV description, which is encoded for transmission, transmitted or stored, decoded, and then used by users and computational systems to search / query and browse (pull) or filter (push); conformance points are marked along the chain.]
The content and its description may also be multiplexed.
ISO / IEC 15938 - 1: Systems
ISO / IEC 15938 - 2: Description Definition Language
ISO / IEC 15938 - 3: Visual
ISO / IEC 15938 - 4: Audio
ISO / IEC 15938 - 5: Multimedia Description Schemes
ISO / IEC 15938 - 6: Reference Software
Visual Descriptors
Color:
1. Histogram
• Scalable Color
• Color Structure
• GOF/GOP
2. Dominant Color
3. Color Layout
Texture:
• Texture Browsing
• Homogeneous texture
• Edge Histogram
Shape:
• Contour Shape
• Region Shape
Motion:
• Camera motion
• Motion Trajectory
• Parametric motion
• Motion Activity
Color: about 5400 color images and 50 queries.
See MPEG document M5060 from Melbourne, October 1999.
Texture: various data sets – Brodatz texture, aerial pictures, Corel photos.
Let the number of ground truth images for a query q be NG(q).
Compute NR(q), the number of ground truth items found in the first K(q) retrievals, where K(q) = min(4*NG(q), 2*GTM) and GTM is max{NG(q)} over all queries q of a data set.
Compute MR(q) = NG(q) - NR(q), the number of missed items.
Compute the ranks Rank(k) of the found items, counting the rank of the first retrieved item as one.
A rank of 1.25*K(q) is assigned to each ground truth image that is not in the first K(q) retrievals.
Compute the normalized modified retrieval rank as follows. Note that NMRR(q) always lies in the range [0.0, 1.0].
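Compute AVR(q) for query q as follows:
$$\mathrm{AVR}(q) = \frac{1}{NG(q)} \sum_{k=1}^{NG(q)} \mathrm{Rank}(k)$$
Compute the modified retrieval rank as follows:
$$\mathrm{MRR}(q) = \mathrm{AVR}(q) - 0.5\,\bigl(1 + NG(q)\bigr)$$
Normalized MRR, NMRR(q) = MRR(q)/Norm(q), where Norm(q) = 1.25*K(q) - 0.5 - 0.5*NG(q):
$$\mathrm{NMRR}(q) = \frac{\mathrm{MRR}(q)}{1.25\,K(q) - 0.5\,\bigl(1 + NG(q)\bigr)}$$
ANMRR, the average over all Q queries:
$$\mathrm{ANMRR} = \frac{1}{Q} \sum_{q=1}^{Q} \mathrm{NMRR}(q)$$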
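The retrieval-rank steps above map directly onto a few lines of code. This is a minimal sketch, assuming `retrieved` is the ranked result list returned for one query and `ground_truth` is the set of its relevant item IDs (both names are illustrative, not from the MPEG-7 documents):

```python
from typing import List, Sequence, Set


def nmrr(retrieved: Sequence[str], ground_truth: Set[str], gtm: int) -> float:
    """Normalized modified retrieval rank for one query q."""
    ng = len(ground_truth)                      # NG(q)
    k = min(4 * ng, 2 * gtm)                    # K(q) = min(4*NG(q), 2*GTM)
    first_k = list(retrieved)[:k]
    ranks = []
    for item in ground_truth:
        if item in first_k:
            ranks.append(first_k.index(item) + 1)   # first retrieved item has rank 1
        else:
            ranks.append(1.25 * k)                  # missed items get rank 1.25*K(q)
    avr = sum(ranks) / ng                           # AVR(q)
    mrr = avr - 0.5 * (1 + ng)                      # MRR(q)
    return mrr / (1.25 * k - 0.5 * (1 + ng))        # NMRR(q) = MRR(q)/Norm(q)


def anmrr(nmrr_values: List[float]) -> float:
    """Average NMRR over all Q queries."""
    return sum(nmrr_values) / len(nmrr_values)
```

A perfect result (all ground truth items ranked first) gives NMRR = 0, and a query whose ground truth items all fall outside the first K(q) results gives NMRR = 1.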
Color descriptors
Dominant Color
Scalable Color (HSV space)
Group of frames/pictures histogram
Color Structure (HMMD space)
Color Layout (YCbCr space)
Color Spaces
- YCrCb
- monochrome (Y only)
- RGB
- HSV
- HMMD
The Color Space Descriptor selects the color space to be used in the description; the Color Quantization Descriptor specifies the partitioning of the given color space into discrete bins. These two descriptors are meant to be used in the context of other descriptors, not standalone.
Color spaces and the descriptors that use them: YCrCb (Color Layout), HSV (Scalable Color), HMMD (Color Structure). RGB and an arbitrary 3x3 color transformation matrix are also available.
Y = 0.299*R + 0.587*G + 0.114*B
Cb = -0.169*R - 0.331*G + 0.500*B
Cr = 0.500*R - 0.419*G - 0.081*B
Max = max(R, G, B); Min = min(R, G, B);
Value = Max;
if (Max == 0) Saturation = 0; else Saturation = (Max - Min)/Max;
if (Max == Min) Hue is undefined (achromatic color); otherwise:
  if (Max == R && G >= B) Hue = 60*(G - B)/(Max - Min)
  else if (Max == R && G < B) Hue = 360 + 60*(G - B)/(Max - Min)
  else if (Max == G) Hue = 60*(2.0 + (B - R)/(Max - Min))
  else Hue = 60*(4.0 + (R - G)/(Max - Min))
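The YCbCr equations and HSV rules above can be checked with a small Python sketch. Two assumptions: inputs are plain R, G, B numbers (no normalization is applied), and a pure red input with G == B falls into the first Max == R case, so its hue comes out as 0 rather than 300:

```python
from typing import Optional, Tuple


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """YCbCr with the coefficients listed above (no offset added to Cb/Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b
    cr = 0.500 * r - 0.419 * g - 0.081 * b
    return y, cb, cr


def rgb_to_hsv(r: float, g: float, b: float) -> Tuple[Optional[float], float, float]:
    """Returns (hue in degrees, or None if achromatic; saturation; value)."""
    mx, mn = max(r, g, b), min(r, g, b)
    value = mx
    saturation = 0.0 if mx == 0 else (mx - mn) / mx
    if mx == mn:
        return None, saturation, value          # hue undefined (achromatic)
    diff = mx - mn
    if mx == r and g >= b:
        hue = 60 * (g - b) / diff
    elif mx == r:                               # here g < b
        hue = 360 + 60 * (g - b) / diff
    elif mx == g:
        hue = 60 * (2.0 + (b - r) / diff)
    else:                                       # mx == b
        hue = 60 * (4.0 + (r - g) / diff)
    return hue, saturation, value
```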
Diff = Max - Min
Sum = (Max + Min)/2
Hue as defined for HSV.
[Figure: HMMD color space as a double cone between Black and White: Sum is the intensity axis, Diff the chroma (toward the pure color), Min the whiteness and Max the blackness.]
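The HMMD components are simple functions of Max and Min. A minimal sketch, assuming integer or float R, G, B inputs; the hue reuses the HSV rule, written here with a modulo so that both Max == R cases collapse into one line:

```python
from typing import Optional, Tuple


def rgb_to_hmmd(r: float, g: float, b: float) -> Tuple[Optional[float], float, float, float, float]:
    """Returns (Hue, Max, Min, Diff, Sum); Hue is None for achromatic colors."""
    mx, mn = max(r, g, b), min(r, g, b)
    diff = mx - mn
    total = (mx + mn) / 2                       # 'Sum' as defined above
    if diff == 0:
        hue = None
    elif mx == r:
        hue = (60 * (g - b) / diff) % 360       # both Max == R cases of the HSV rule
    elif mx == g:
        hue = 60 * (2.0 + (b - r) / diff)
    else:
        hue = 60 * (4.0 + (r - g) / diff)
    return hue, mx, mn, diff, total
```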
[Table: number of quantisation levels for the Hue and Sum components in each HMMD component subspace (0 to 4), for 184, 120, 64 and 32 histogram bins.]
This is a color histogram in the HSV space encoded using a Haar transform. The binary representation is scalable in terms of number of bins used and in the number of bits per bin over a wide range of data rates.
Number of bins can range from 16 to 256.
No. of coefficients | 16 | 32 | 64 | 128 | 256
# bins: H | 4 | 8 | 8 | 8 | 16
# bins: S | 2 | 2 | 2 | 4 | 4
# bins: V | 2 | 2 | 4 | 4 | 4
HSV space is uniformly quantized into 256 bins (colors).
The bin values are non-uniformly quantized into 4 bits per bin (11-bit integer truncation followed by a 4-bit non-linear quantization). The quantization is described in the normative part.
The 4-bit values are then Haar transformed.
For matching, one can use the Haar transformed coefficient values instead of reconstructing the histogram values.
However, more accurate results can be expected by reconstructing the histogram bin values.
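The scalability comes from the Haar structure: sums of neighboring bins form a coarser histogram, and the differences refine it. The sketch below is a generic unnormalized 1-D Haar transform, assuming a power-of-two bin count; it does not reproduce the normative SCD bin ordering or coefficient quantization:

```python
from typing import List, Sequence


def haar_forward(values: Sequence[float]) -> List[float]:
    """Unnormalized Haar: [total sum] + differences from coarsest to finest level."""
    data = list(values)
    if not data or len(data) & (len(data) - 1):
        raise ValueError("length must be a power of two")
    details: List[float] = []
    while len(data) > 1:
        sums = [data[i] + data[i + 1] for i in range(0, len(data), 2)]
        diffs = [data[i] - data[i + 1] for i in range(0, len(data), 2)]
        details = diffs + details               # finer details go after coarser ones
        data = sums
    return data + details


def haar_inverse(coeffs: Sequence[float]) -> List[float]:
    """Invert a prefix of length 2**m: yields a 2**m-bin (coarser) histogram."""
    data = [float(coeffs[0])]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(data)]
        data = [v for s, d in zip(data, diffs) for v in ((s + d) / 2, (s - d) / 2)]
        pos += len(diffs)
    return data
```

Keeping only the first 2, 4, ... coefficients and inverting gives a histogram with correspondingly fewer bins, which is why a descriptor truncated to fewer coefficients can still be matched.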
[Figure: ANMRR (0 to 0.6) vs. number of bits (0 to 1024) for 16, 32, 64, 128 and 256 Haar coefficients, and for H-Rec.]
Results with different numbers of Haar coefficients (16 to 256) quantized at different numbers of bits. H-Rec denotes retrieval results after reconstruction of the histogram from Haar coefficients at full bit resolution.
The Group of Frames/Group of Pictures Descriptor (GoF/GoP) extends the Scalable Color Descriptor (SCD) to a collection of images, video segments, or moving regions.
The GoF/GoP descriptor identifies three ways of computing the joint color histogram of the whole collection from the histograms of its individual items: averaging, median filtering, and histogram intersection.
This joint color histogram is then Haar transformed and encoded as in the SCD.
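The three aggregation options differ only in the per-bin statistic. A minimal sketch, assuming each item's histogram is a list of equal length; intersection is taken as the per-bin minimum:

```python
from statistics import median
from typing import List, Sequence


def gop_histogram(histograms: Sequence[Sequence[float]], method: str = "average") -> List[float]:
    """Combine per-item histograms bin by bin into one joint histogram."""
    bins = list(zip(*histograms))               # one tuple of values per bin
    if method == "average":
        return [sum(b) / len(b) for b in bins]
    if method == "median":
        return [float(median(b)) for b in bins]
    if method == "intersection":
        return [float(min(b)) for b in bins]    # keep what all items have in common
    raise ValueError(f"unknown method: {method}")
```

The median and intersection variants are less sensitive to a few atypical frames (for example, a flash or a short occlusion) than plain averaging.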
Similar to a histogram, but an 8x8 structuring element is used to compute the bin values.
The HMMD color space should be used with this descriptor. The quantization of the HMMD space to 32, 64, 128 and 180 bins is specified.
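[Figure: an 8x8 structuring element scanned over the image; each color (C0 to C7) present inside the element increments its COLOR BIN by 1, regardless of how many pixels of that color it covers.]
Distance between two color structure histograms h_A and h_B:
$$\mathrm{dist}(A, B) = \sum_i \left| h_A(i) - h_B(i) \right|$$
Ack: WD 4.0, July 2000
Figure 6: Structuring elements for images with different resolutions: (a) 320x240, (b) 640x480.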
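The counting rule can be sketched as follows, assuming the image is already quantized to color indices. This simplified version uses a fixed structuring element at full resolution; the sub-sampling by image size shown in Figure 6 and the normative bin quantization are omitted:

```python
from typing import List, Sequence


def color_structure_histogram(img: Sequence[Sequence[int]], n_colors: int,
                              se: int = 8) -> List[float]:
    """img holds quantized color indices; returns the fraction of se x se window
    positions in which each color occurs at least once."""
    rows, cols = len(img), len(img[0])
    hist = [0] * n_colors
    positions = 0
    for y in range(rows - se + 1):
        for x in range(cols - se + 1):
            present = {img[y + dy][x + dx] for dy in range(se) for dx in range(se)}
            for color in present:               # +1 per color present, not per pixel
                hist[color] += 1
            positions += 1
    return [h / positions for h in hist]


def l1_distance(h_a: Sequence[float], h_b: Sequence[float]) -> float:
    """dist(A, B) = sum_i |h_A(i) - h_B(i)|."""
    return sum(abs(a - b) for a, b in zip(h_a, h_b))
```

For example, a thin column of color 1 on the left of an 8x9 image covers only 8 of 72 pixels, yet color 1 appears in half of the two window positions, so it gets the value 0.5. This is how the descriptor captures local color structure that a plain histogram ignores.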
The Color Structure Descriptor can be used only with a limited range of Color Quantization settings, such that the total number of bins lies between 32 and 256.
Re-quantization of a Color Structure Descriptor from a fine to a coarse color space quantization can be performed in the HMMD color space using the re-quantization method defined in the WD.
The Color Structure Descriptor is not interoperable with other Color Descriptors.
[Figure: ANMRR vs. descriptor bit-length.]
Best suited for local (object or region) features, where a small number of colors is enough to characterize the color information.
Before feature extraction, images are segmented into regions.
Similar to a color histogram EXCEPT:
color bins are not fixed but depend on the quantization in each region
the number of bins is not fixed; on average only 3 bins per region
To extract the feature:
quantize to a small number of representative colors in each region
calculate the percentages of the quantized colors in the region
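The two extraction steps can be illustrated with a plain k-means (Lloyd) clustering of the region's pixels. This is an assumption for illustration: the initialization and the clustering details of the actual extraction are not given in the text, and variance and spatial coherency are not computed here:

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def _sq_dist(a: Color, b: Color) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dominant_colors(pixels: Sequence[Color], n: int, iters: int = 10) -> List[Tuple[Color, float]]:
    """Quantize a region's pixels to at most n colors; return (color c_i, percentage p_i) pairs."""
    distinct = list(dict.fromkeys(tuple(map(float, p)) for p in pixels))
    centroids = distinct[:n]                    # simple deterministic initialization
    for _ in range(iters):
        clusters: List[List[Color]] = [[] for _ in centroids]
        for p in pixels:
            k = min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))
            clusters[k].append(p)
        centroids = [tuple(sum(ch) / len(c) for ch in zip(*c)) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return [(centroids[j], len(c) / len(pixels)) for j, c in enumerate(clusters) if c]
```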
A given image is described in terms of a set of region labels and the associated color descriptors
Each pixel has a unique region label.
Each region is characterized by a variable bin color histogram
Color feature descriptor for a given region in the image, F:
$$F = \{\{c_i, p_i, v_i\}, s\}, \quad i = 1, 2, \ldots, N$$
where $c_i$ is the i-th dominant color, $p_i$ is its percentage value, and $v_i$ is its color variance. The color variance is an optional field. N is the total number of quantized colors in the region.
The spatial coherency s is a single number that represents the overall spatial homogeneity of the dominant colors in the image.
The number of dominant colors N can vary from image to image and a maximum of 8 dominant colors can be used to represent the region (3 bits).
The percentage values are quantized to 5 bits each.
The color quantization depends on the color space specifications defined for the entire database and need not be specified with each descriptor.
Experiments with 6 bits/color index.
Variance: 3 bits/dominant color.
Spatial coherence: 5 bits.
Similarity Distance Measure
$$D^2(F_1, F_2) = \sum_{i=1}^{N_1} p_{1i}^2 + \sum_{j=1}^{N_2} p_{2j}^2 - \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} 2\, a_{1i,2j}\, p_{1i}\, p_{2j}$$
$a_{k,l}$: similarity coefficient between two colors $c_k$ and $c_l$:
$$a_{k,l} = \begin{cases} 1 - d_{k,l}/d_{\max}, & d_{k,l} \le T_d \\ 0, & d_{k,l} > T_d \end{cases}$$
$d_{k,l} = \lVert c_k - c_l \rVert$: Euclidean distance between two colors $c_k$ and $c_l$
$T_d$: maximum distance for two colors to be considered similar; $d_{\max} = \alpha T_d$, with $\alpha$ values 1.0 to 1.5 and $T_d$ values 10 to 20.
Equivalent to the quadratic color histogram distance measure:
$$D^2(F_1, F_2) = (F_1 - F_2)^T A\, (F_1 - F_2)$$
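The distance can be evaluated directly on two descriptors without building a full histogram. A minimal sketch, assuming each descriptor is a list of (color, percentage) pairs in the same color space; T_d = 15 and α = 1.2 are defaults picked from the ranges quoted above, not normative values:

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def dc_distance_sq(f1: Sequence[Tuple[Color, float]], f2: Sequence[Tuple[Color, float]],
                   t_d: float = 15.0, alpha: float = 1.2) -> float:
    """D^2(F1, F2) for two lists of (dominant color c_i, percentage p_i)."""
    d_max = alpha * t_d

    def a(ck: Color, cl: Color) -> float:
        d = math.dist(ck, cl)                   # Euclidean distance d_kl
        return 1 - d / d_max if d <= t_d else 0.0

    d2 = sum(p * p for _, p in f1) + sum(p * p for _, p in f2)
    d2 -= sum(2 * a(c1, c2) * p1 * p2 for c1, p1 in f1 for c2, p2 in f2)
    return d2
```

Identical single-color descriptors give 0, while two colors farther apart than T_d contribute nothing to the cross term, so the distance reaches its maximum for those percentages.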
Spatial coherency
Color variance: variance of each dominant color.
Color space | #dominant colors (average) | DC: size (bits) | DC: ARR | DC: ANMRR | DC+Variance: size (bits) | DC+Variance: ARR | DC+Variance: ANMRR
RGB | 3 | 69 | 0.6368 | 0.3897 | 78 | 0.7163 | 0.3222
RGB | 5 | 130 | 0.7114 | 0.3214 | 148 | 0.7933 | 0.2295
CIE-LAB | 3 | 67 | 0.7568 | 0.2784 | 76 | 0.8160 | 0.2350
CIE-LAB | 6 | 112 | 0.8083 | 0.2312 | 127 | 0.8951 | 0.1563
[Figure: ANMRR vs. number of bits (0 to 5) for the spatial coherence, comparing a single spatial coherence field with the dominant colors against a spatial coherence value for each dominant color. The ANMRR values shown range from 0.197 to 0.252; without a spatial coherence value, ANMRR = 0.252.]
Merits
Accurate and compact compared to the traditional color histogram
color bins quantized from each image region VS. fixed
3 bins on average VS. 256 or more
Efficient database indexing and search
NO high-dimensional indexing
complexity of the searching depends
only on the desired degree of the similarity of the matching
not directly on the database size
insertion and deletion of database entries do not cause index structure rebuilding
retrieval results accurate and fast compared to the traditional color histogram
Spatial distribution of colors for fast browsing and retrieval.
A very compact representation (63 bits).
Uses YCbCr color space.
Partition the image into 8x8 subimages.
Calculate the “dominant color” of each sub-image.
Compute the DCT of this 8x8 matrix of dominant colors.
Quantize the DCT coefficients.
Zig-zag scan of quantized DCT coefficients.
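The Color Layout steps above can be sketched for one channel (Y, Cb or Cr). Assumptions: the "dominant color" of each sub-image is approximated by its average value, and the DCT is the orthonormal DCT-II; the normative coefficient quantization and the number of coefficients kept per channel are not reproduced:

```python
import math
from typing import List, Sequence


def block_averages(channel: Sequence[Sequence[float]], grid: int = 8) -> List[List[float]]:
    """Average value of each of the grid x grid sub-images (one color channel)."""
    h, w = len(channel), len(channel[0])
    out = []
    for by in range(grid):
        y0, y1 = by * h // grid, (by + 1) * h // grid
        row = []
        for bx in range(grid):
            x0, x1 = bx * w // grid, (bx + 1) * w // grid
            vals = [channel[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def dct2(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Orthonormal 2-D DCT-II of an n x n block."""
    n = len(block)

    def c(k: int) -> float:
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    return [[c(u) * c(v) * sum(block[x][y]
                               * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                               * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                               for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def zigzag(matrix: Sequence[Sequence[float]]) -> List[float]:
    """JPEG-style zig-zag scan (low frequencies first)."""
    n = len(matrix)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [matrix[i][j] for i, j in order]
```

Because the zig-zag scan puts low-frequency coefficients first, truncating the scanned list keeps the coarse spatial color layout, which is what makes the representation so compact.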
[Figure: ANMRR vs. descriptor bit-length.]
Compute the CLD every 0.5 seconds.
Subsample and quantize the CLD sequence:
Let CLD(t) be the CLD at instant t.
Let t=0, i=0. Let CLDV(0)=CLD(0).
STEP 1: t=t+1 (stop at the end of the sequence).
If CLD(t) is similar to CLDV(i), then GOTO STEP 1;
Else i=i+1; CLDV(i)=CLD(t); GOTO STEP 1.
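The subsampling loop keeps a new CLD only when it differs enough from the last kept one. In this sketch "similar" is taken to mean a Euclidean distance below a hypothetical `threshold`; the actual similarity test is not specified in the text above:

```python
import math
from typing import List, Sequence


def subsample_cld(clds: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return the instants t whose CLD(t) is kept as CLDV(i)."""
    kept = [0]                                  # CLDV(0) = CLD(0)
    for t in range(1, len(clds)):
        # hypothetical similarity test: Euclidean distance to CLDV(i)
        if math.dist(clds[t], clds[kept[-1]]) >= threshold:
            kept.append(t)                      # i = i + 1; CLDV(i) = CLD(t)
    return kept
```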
Texture descriptors
Texture browsing
Homogeneous texture
Edge histogram
Compact descriptor for texture browsing: requires only 12 bits. The components provide a higher-level perceptual characterization of texture that is useful for browsing and clustering.
Feature extraction is simple, involving image convolutions with a set of masks. The filters are based on a 2-D Gabor wavelet decomposition.
Image convolutions can be efficiently implemented in hardware and software.
[ v_1 v_2 v_3 v_4 v_5 | SRC components ]
(PBC | SRC)
PBC: Perceptual Browsing Component, PBC = [v_1 v_2 v_3 v_4 v_5], where the v_i (i = 1, ..., 5) are non-negative integers (12 bits total):
v_1 provides a confidence measure on the texture regularity;
v_2, v_4 ∈ O_K give two quantized directions which best capture the regularity;
v_3, v_5 ∈ O_S give two quantized scales which best capture the texture regularity.
Figure 2: Two texture examples: (a) regular pattern, and (b) an unstructured pattern
Some sample textures and corresponding PBC vectors
[1 3 3 1 1] [1 3 3 1 3] [1 4 1 1 1] [1 6 3 1 1] [1 1 5 1 2]
[2 6 2 3 3] [2 2 6 4 1] [2 6 4 1 3] [2 2 4 2 1] [3 1 4 1 4]
[4 1 4 3 3] [4 1 4 4 4] [4 1 4 4 4] [4 2 3 3 2] [4 1 4 3 3]
SRC: Similarity Retrieval Component.
Components are computed by convolving the image with a set of filters tuned to detect image features at different scales and orientations.
The SRC components are the normalized first and second moments of these filtered outputs (computed in the frequency domain).
62 components x 8 bits/component (496 bits).
The components represent mean and standard deviation of the energies in each of the filtered outputs.
Matching can be made rotation and scale invariant.
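[Figure: frequency-domain layout of the feature channels C_i (channel numbers i = 0 to 30), arranged by radial frequency band and orientation.]
Each channel is defined by a Gabor function in polar frequency coordinates:
$$G_{P_{s,r}}(\omega, \theta) = \exp\left[\frac{-(\omega - \omega_s)^2}{2\sigma_s^2}\right] \cdot \exp\left[\frac{-(\theta - \theta_r)^2}{2\sigma_r^2}\right]$$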
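The Gabor function above is separable into a radial Gaussian and an angular Gaussian. This is a minimal sketch, assuming 5 octave-spaced radial centers starting at 3/4 of the normalized maximum frequency and 6 orientations spaced by 30 degrees (hypothetical parameters; check the WD/XM for the exact centers and bandwidths). The angle difference is wrapped so that orientations near 0 and 2π are treated as close:

```python
import math
from typing import List, Tuple


def gabor_polar(omega: float, theta: float, omega_s: float, theta_r: float,
                sigma_omega: float, sigma_theta: float) -> float:
    """G_{P_{s,r}}(omega, theta): Gaussian in radial frequency times Gaussian in angle.
    sigma_omega and sigma_theta play the roles of sigma_s and sigma_r above."""
    dtheta = (theta - theta_r + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
    return (math.exp(-(omega - omega_s) ** 2 / (2 * sigma_omega ** 2))
            * math.exp(-dtheta ** 2 / (2 * sigma_theta ** 2)))


def channel_centers(n_scales: int = 5, n_orient: int = 6,
                    omega_0: float = 0.75) -> List[Tuple[float, float]]:
    """Hypothetical (omega_s, theta_r) channel centers: octave spacing, equal angles."""
    return [(omega_0 * 2.0 ** (-s), math.pi * r / n_orient)
            for s in range(n_scales) for r in range(n_orient)]
```

Summing the filtered spectral energy within each of the 30 channels gives the per-channel energies whose statistics form the SRC components.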
40 large aerial photographs (each is 5K x 5K)
Contain about 280,000 tiles and 6,000 regions
Texture based search (using Gabor texture features)
A pattern thesaurus for image indexing
Fast image segmentation scheme
Provide both tile-based and region-based search capabilities.
Integrated into the UCSB Alexandria Digital Library.
Query Codeword
http://vision.ece.ucsb.edu/
Go to category based image search demo.
[Figure: the image is divided into 4x4 sub-images, (0,0) to (3,3); each sub-image is divided into image-blocks of size block_size, and each image-block into 2x2 sub-blocks numbered 0 to 3. Edge types: (a) vertical, (b) horizontal, (c) 45 degree, (d) 135 degree, (e) non-directional edge.]
Histogram bins | Semantics
Local_Edge[0] | Vertical edge of sub-image at (0,0)
Local_Edge[1] | Horizontal edge of sub-image at (0,0)
Local_Edge[2] | 45 degree edge of sub-image at (0,0)
Local_Edge[3] | 135 degree edge of sub-image at (0,0)
Local_Edge[4] | Non-directional edge of sub-image at (0,0)
Local_Edge[5] | Vertical edge of sub-image at (0,1)
: | :
Local_Edge[74] | Non-directional edge of sub-image at (3,2)
Local_Edge[75] | Vertical edge of sub-image at (3,3)
Local_Edge[76] | Horizontal edge of sub-image at (3,3)
Local_Edge[77] | 45 degree edge of sub-image at (3,3)
Local_Edge[78] | 135 degree edge of sub-image at (3,3)
Local_Edge[79] | Non-directional edge of sub-image at (3,3)
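Each image-block is labeled with one edge type (or none), and the label is counted in the bin of its sub-image. This is a minimal sketch, assuming 2x2 filter coefficients of the kind used in the XM and a hypothetical edge-strength threshold of 11; the bin index follows the table above (5 bins per sub-image, sub-images in raster order):

```python
import math
from typing import Optional

# 2x2 filter coefficients for sub-blocks ordered 0: top-left, 1: top-right,
# 2: bottom-left, 3: bottom-right (assumed, for illustration).
FILTERS = {
    "vertical": (1.0, -1.0, 1.0, -1.0),
    "horizontal": (1.0, 1.0, -1.0, -1.0),
    "45 degree": (math.sqrt(2), 0.0, 0.0, -math.sqrt(2)),
    "135 degree": (0.0, math.sqrt(2), -math.sqrt(2), 0.0),
    "non-directional": (2.0, -2.0, -2.0, 2.0),
}
EDGE_TYPES = list(FILTERS)                       # bin order within a sub-image


def classify_block(a0: float, a1: float, a2: float, a3: float,
                   threshold: float = 11.0) -> Optional[str]:
    """Edge type of one image-block from its four sub-block mean intensities, or None."""
    strengths = {name: abs(f0 * a0 + f1 * a1 + f2 * a2 + f3 * a3)
                 for name, (f0, f1, f2, f3) in FILTERS.items()}
    best = max(EDGE_TYPES, key=lambda name: strengths[name])
    return best if strengths[best] >= threshold else None   # hypothetical threshold


def local_bin_index(row: int, col: int, edge_type: str) -> int:
    """Local_Edge[] index for the sub-image at (row, col)."""
    return 5 * (4 * row + col) + EDGE_TYPES.index(edge_type)
```

Blocks whose strongest response stays below the threshold are treated as containing no edge and are not counted.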
[Figure: grouping of the 4x4 sub-images into 13 semi-global regions (1 to 13). Histogram bins: local (0 to 79), global (0 to 4) and semi-global (0 to 64).]
[Figure and table: ANMRR vs. bits per image (150 to 400) at 2, 3, 4 and 5 bits/bin, with the local histogram only and with local, semi-global and global histograms (proposed). The proposed method gives a lower ANMRR at every bits/bin setting; the reported ANMRR values range from 0.284 to 0.396.]
Shape descriptors
Region shape
Contour shape
Contour-based shape descriptor
Region-based shape descriptor
[Figure: contour evolution at iterations 1, 1000 and 5000, and the resulting Curvature Scale Space (CSS) image, which plots iteration t against the position along the contour.]
20 peaks in the CSS image (position, height):
(0.773, 1), (0.629, 0.8441), (0.460, 0.7510), (0.336, 0.2198), (0.153, 0.1427), (0.669, 0.0806), (0.716, 0.0742), (0.234, 0.0587), (0.960, 0.0575), (0.499, 0.0214), (0.466, 0.0155), (0.957, 0.0143), (0.900, 0.0114), (0.412, 0.0114), (0.886, 0.0107), (0.996, 0.0105), (0.723, 0.0090), (0.357, 0.0076), (0.764, 0.0069), (0.921, 0.0066)
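[Figure: matching a query CSS image against a model CSS image; the query peaks are circularly shifted to align with the model peaks, and unmatched peaks incur a penalty.]
Dissimilarity = Assignment cost + distance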
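The matching idea (circular shift, peak assignment, penalty for unmatched peaks) can be sketched as below. This is a simplified heuristic of our own, not the XM procedure: candidate shifts align the highest query peak with each model peak, peaks are assigned greedily within a hypothetical position tolerance `tol`, matched peaks cost their height difference, and unmatched peaks cost their full height:

```python
from typing import List, Tuple

Peak = Tuple[float, float]   # (normalized position along the contour in [0, 1), height)


def circ_dist(a: float, b: float) -> float:
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def css_dissimilarity(query: List[Peak], model: List[Peak], tol: float = 0.1) -> float:
    if not query or not model:
        return sum(h for _, h in query + model)        # everything unmatched
    q_pos, _ = max(query, key=lambda p: p[1])          # highest query peak
    best = float("inf")
    for m_pos, _ in model:                             # candidate circular shifts
        shift = m_pos - q_pos
        shifted = [((pos + shift) % 1.0, h) for pos, h in query]
        unused = list(model)
        cost = 0.0
        for pos, h in sorted(shifted, key=lambda p: -p[1]):
            near = [m for m in unused if circ_dist(pos, m[0]) <= tol]
            if near:
                m = min(near, key=lambda c: circ_dist(pos, c[0]))
                unused.remove(m)
                cost += abs(h - m[1])                  # matched: height difference
            else:
                cost += h                              # unmatched query peak: penalty
        cost += sum(h for _, h in unused)              # unmatched model peaks: penalty
        best = min(best, cost)
    return best
```

Because positions are shifted circularly before matching, a contour traced from a different starting point (or a rotated shape) gives the same peaks in shifted positions and still matches at low cost.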
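Angular Radial Transform (ART)
[Figure: ART basis functions, real (ART-C) and imaginary (ART-S) parts.]
ART basis function = angular function x radial function
ART coefficients
A rotated image, its ART coefficients, and their relationship with the original: the magnitude of the ART coefficients is rotation invariant.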
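For reference, the ART and the rotation property named above can be written as follows. The notation ($f$, $F_{nm}$, $V_{nm}$, $A_m$, $R_n$, $\alpha$) is ours, following the usual presentation of the MPEG-7 Region Shape descriptor; see the WD for the normative definitions and the range of n and m:

$$F_{nm} = \int_0^{2\pi}\!\!\int_0^1 V_{nm}^*(\rho,\theta)\, f(\rho,\theta)\, \rho\, d\rho\, d\theta, \qquad V_{nm}(\rho,\theta) = A_m(\theta)\, R_n(\rho)$$

$$A_m(\theta) = \frac{1}{2\pi} e^{jm\theta}, \qquad R_n(\rho) = \begin{cases} 1, & n = 0 \\ 2\cos(\pi n \rho), & n \neq 0 \end{cases}$$

For an image rotated by $\alpha$, $f^{\alpha}(\rho,\theta) = f(\rho, \alpha + \theta)$. Substituting $\varphi = \alpha + \theta$ gives $A_m^*(\varphi - \alpha) = A_m^*(\varphi)\, e^{jm\alpha}$, so

$$F^{\alpha}_{nm} = F_{nm}\, e^{jm\alpha}, \qquad \left|F^{\alpha}_{nm}\right| = \left|F_{nm}\right|.$$

Rotation only changes the phase of each coefficient, which is why the descriptor stores coefficient magnitudes.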
Experimental Dataset & Procedure
Dataset 1: 70 classes × 20 variations = 1400 images
CE1-A-1: Scale, CE1-A-2: Rotation, CE1-B: Similarity
Dataset 2: 1100 marine creatures
Dataset 3: 200 Bream fish video sequence
CE2-C: motion and non-rigid deformations
Dataset 4: 3000 trademark images
CE2-A-1: Scale, CE2-A-2: Rotation, CE2-A-3: Scale & rotation, CE2-A-4: Perspective transformation
CE2-B: Similarity
[Figure: two examples of query results without and with perspective normalization.]
[Figure: trademark retrieval application involving the U.S. Patent and Trademark Office, local trademark offices and the World Intellectual Property Organization, serving trademark examining officers, enterprises and graphic designers.]
Motion descriptors
[Figure: motion descriptors. Camera Motion, Motion Activity and Mosaic (Warping Parameters) describe a video segment; Trajectory and Parametric Motion describe a moving region.]
(a) Camera track (left / right), boom (up / down), and dolly (forward / backward) motion modes, (b) camera pan (left / right), tilt (up / down) and roll motion modes.
Need to capture the “pace” or intensity of activity.
For example, draw a distinction between
“High Action” segments such as chase scenes, and
“Low Action” segments such as talking heads.
Emphasize simple extraction and matching.
Use gross motion characteristics, thus avoiding object segmentation, tracking, etc.
Compressed-domain extraction is important.
Attributes of the Motion Activity Descriptor:
Intensity/Magnitude - 3 bits
Spatial Characteristics - 16 bits
Temporal Characteristics - 30 bits
Directional Characteristics - 3 bits
Expresses the “pace” or intensity of action.
Uses a scale of 1 to 5: very low, low, medium, high, very high.
Extracted by suitably quantizing the variance of the motion vector magnitudes.
Successfully tested with subjectively constructed ground truth.
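Intensity extraction reduces to one statistic and a lookup. This sketch computes the variance of the motion vector magnitudes (for example, from the compressed-domain motion vectors of a frame) and maps it to levels 1 to 5. The threshold values are hypothetical placeholders, not the values from the WD:

```python
import bisect
import math
import statistics
from typing import Sequence, Tuple

HYPOTHETICAL_THRESHOLDS = (4.0, 16.0, 64.0, 256.0)   # placeholders, not normative


def activity_intensity(motion_vectors: Sequence[Tuple[float, float]],
                       thresholds: Sequence[float] = HYPOTHETICAL_THRESHOLDS) -> int:
    """Intensity level 1 (very low) to 5 (very high) from motion-vector magnitudes."""
    mags = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    var = statistics.pvariance(mags)
    return bisect.bisect_right(thresholds, var) + 1
```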
[Figure: raster scan of the motion field, with short, medium and long run-lengths marked.]
With smaller, widely spaced moving objects, note that there are more long and medium run-lengths.
Captures the size and number of moving regions in the shot on a frame-by-frame basis.
Enables distinction between shots with one large moving region in the middle, such as talking heads, and shots with several small moving regions, such as aerial soccer shots.
Thus “sparse” shots have many long runs, while “dense” shots do not have many long runs.
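The run-length idea can be sketched on a grid of per-block motion magnitudes. Assumptions (for illustration only): blocks whose magnitude is below the frame average count as "zero", runs of zeros are collected in raster order, and runs are classified with hypothetical limits of one third and two thirds of the frame width:

```python
from typing import Dict, Optional, Sequence


def zero_run_counts(mag: Sequence[Sequence[float]], short_max: Optional[int] = None,
                    medium_max: Optional[int] = None) -> Dict[str, int]:
    rows, cols = len(mag), len(mag[0])
    avg = sum(map(sum, mag)) / (rows * cols)
    short_max = cols // 3 if short_max is None else short_max          # hypothetical
    medium_max = 2 * cols // 3 if medium_max is None else medium_max   # hypothetical
    counts = {"short": 0, "medium": 0, "long": 0}

    def close_run(run: int) -> None:
        if run == 0:
            return
        key = "short" if run <= short_max else "medium" if run <= medium_max else "long"
        counts[key] += 1

    run = 0
    for value in (v for row in mag for v in row):   # raster scan
        if value < avg:                             # below-average motion counts as zero
            run += 1
        else:
            close_run(run)
            run = 0
    close_run(run)
    return counts
```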
Expresses the fraction of the total duration of the shot spent at each level of activity.
Straightforward extension of the intensity of motion activity to the temporal dimension.
For instance, since a talking head is typically exclusively low activity, it would have zero entries for all levels except one.
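Given a per-frame (or per-interval) intensity level, the temporal characteristic is a normalized histogram over the five levels. A minimal sketch, assuming levels are integers from 1 to 5 and every frame has equal duration:

```python
from typing import List, Sequence


def temporal_distribution(levels: Sequence[int], n_levels: int = 5) -> List[float]:
    """Fraction of the shot duration spent at each activity level 1..n_levels."""
    counts = [0] * n_levels
    for level in levels:
        counts[level - 1] += 1                   # levels are 1-based
    return [c / len(levels) for c in counts]
```

A talking-head shot that stays at a single low level yields one entry of 1.0 and zeros elsewhere, as described above.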
Expresses the dominant direction, if definable, as one of a set of eight equally spaced directions.
Extracted by averaging the angle (direction) of each motion vector.
Useful where there is strong directional motion.
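A sketch of the direction attribute. One assumption deserves a flag: instead of a plain arithmetic mean of angles, it uses the circular mean (averaging unit vectors) so that directions such as 359 and 1 degrees average to 0 rather than 180. The result is quantized to 8 directions, 45 degrees apart, with 0 for 0 degrees; y is taken to point up:

```python
import math
from typing import Sequence, Tuple


def dominant_direction(motion_vectors: Sequence[Tuple[float, float]]) -> int:
    """Quantize the mean motion-vector angle to one of 8 directions (0 = 0 deg, 2 = 90 deg, ...)."""
    sx = sum(math.cos(math.atan2(dy, dx)) for dx, dy in motion_vectors)
    sy = sum(math.sin(math.atan2(dy, dx)) for dx, dy in motion_vectors)
    mean_angle = math.degrees(math.atan2(sy, sx)) % 360
    return round(mean_angle / 45) % 8
```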
Extraction of 10 most active segments in a news program
VIDEO BROWSING
RETRIEVAL FROM STORED VIDEO
CONTENT RE-PURPOSING
CONTENT BASED PRESENTATION
SURVEILLANCE
COMPACT DESCRIPTOR
EASY TO EXTRACT AND MATCH
EFFECTIVE BY ITSELF
NUMEROUS APPLICATIONS
EFFECTIVE IN COMBINATION WITH OTHER DESCRIPTORS
DEMO at the end.
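Interpolation between two key points $(t_a, f_a)$ and $(t_b, f_b)$ of a trajectory:
First-order approximation:
$$f(t) = f_a + v_a (t - t_a), \qquad v_a = \frac{f_b - f_a}{t_b - t_a}$$
Second-order approximation:
$$f(t) = f_a + v_a (t - t_a) + \tfrac{1}{2} a_a (t - t_a)^2, \qquad v_a = \frac{f_b - f_a}{t_b - t_a} - \tfrac{1}{2} a_a (t_b - t_a)$$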
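The two interpolation formulas translate directly into code for one coordinate of a trajectory between key points a and b. The function names are illustrative:

```python
def first_order(t: float, t_a: float, t_b: float, f_a: float, f_b: float) -> float:
    """Linear interpolation: f(t) = f_a + v_a (t - t_a)."""
    v_a = (f_b - f_a) / (t_b - t_a)
    return f_a + v_a * (t - t_a)


def second_order(t: float, t_a: float, t_b: float, f_a: float, f_b: float, a_a: float) -> float:
    """Quadratic interpolation with acceleration a_a; still passes through (t_b, f_b)."""
    v_a = (f_b - f_a) / (t_b - t_a) - 0.5 * a_a * (t_b - t_a)
    return f_a + v_a * (t - t_a) + 0.5 * a_a * (t - t_a) ** 2
```

The correction term in $v_a$ is what keeps the second-order curve anchored at both key points; for example, with $f_a = 0$, $f_b = 10$, $t_a = 0$, $t_b = 2$ and $a_a = 1$, the velocity becomes 4 and $f(2) = 8 + 2 = 10$.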
All of the visual descriptors in the MPEG-7 working draft have undergone rigorous testing and evaluation.
They represent the state-of-the-art descriptors in image and video retrieval.
For further information, refer to the MPEG documents (see the next slide).
Are we ready for this leap of faith?
Major MPEG-7 documents are public:
MPEG Home page: http://www.cselt.it/mpeg/
Public documents: http://www.cselt.it/mpeg/working_documents.htm
Also check: http://www.mpeg-7.com
Special issues of journals:
Signal Processing : Image Communications, Vol. 16(1-2), Sept. 2000: http://www.elsevier.com/locate/image
IEEE Trans. on Circuits and Systems for Video Technology (June 2001)
IEEE Trans. Image Processing (Jan. 2000 special issue on content-based retrieval)
IBM MPEG-7 Visual Annotation Tool:
http://www.alphaworks.ibm.com/tech/mpeg-7
Book on MPEG-7: to be published later this year (Manjunath, Salembier and Sikora, Wiley International, 2001).