
IEEE International Conference on Consumer Electronics

Los Angeles, CA, June 2001

MPEG-7 Visual Descriptors

B. S. Manjunath

University of California,

Santa Barbara, USA manj@ece.ucsb.edu

Acknowledgements

Prof. P. Salembier (for permitting me to use some of his slides from our tutorial at ICIP'2000).

Editors of the MPEG-7 XM and WD.

Dr. L. Cieplinski, Mitsubishi Electric

Dr. A. Divakaran, Mitsubishi

Dr. S. Jeannin, Philips research lab

Prof. W.Y. Kim, Hanyang University

Dr. M. Bober, Mitsubishi Electric

Dr. H. J. Kim, LG Electronics

Dr. S. Park, ETRI

What you may (may not) expect..

Overview of the visual descriptors

Their capabilities and limitations

Some application examples

Pointers to publicly available documents

Do not expect

 Programming and implementation details – these are available in the MPEG-7 eXperimental Model (XM) document.

 Binary bit stream syntax – see the MPEG-7 Committee Draft(s).

Audio-Visual Content Description and the MPEG-7 Standard

Objective, goals, requirements and applications

Basic components of the MPEG-7 standard

 Description Definition Language

 Description Schemes

 A/V Descriptors

Low-level Visual Information Description

Color

Shape

Texture

Motion

Face

©Salembier

Motivation

The multimedia context:

 More information is in digital form and is on-line.

 AV content covers: still pictures, audio, speech, video, graphics, 3D models, etc.

 AV content is available at all bitrates and on all networks.

 Increasing number of multimedia applications, services.

Necessity of describing content:

 Increasing amount of information.

Growing need to have “information about the content”.

 Difficult to manage (find, select, filter, organize, etc.) content.

 User: human or computational systems.

©Salembier

MPEG Standards

MPEG-1 : Storage of moving picture and audio on storage media (CD-ROM) 11 / 1992

MPEG-2 : Digital television 11 / 1994

MPEG-4 : Coding of natural and synthetic media objects for multimedia applications v1: 09 / 1998 v2: 11 / 1999

MPEG-7 : Multimedia content description for AV material

08 / 2001

MPEG-21 : Digital audiovisual framework: Integration of multimedia technologies (identification, copyright, protection, etc.) 11 / 2001

©Salembier

Objective of MPEG-7

Standardize content-based description for various types of audiovisual information

 Enable fast and efficient content searching, filtering and identification

 Describe several aspects of the content (low-level features, structure, semantic, models, collections, creation, etc.)

 Address a large range of applications (including user preferences)

Types of audiovisual information:

 Audio, speech

 Moving video, still pictures, graphics, 3D models

 Information on how objects are combined in scenes

Descriptions independent of the data support

Existing solutions for textual content or description

©Salembier

Type of applications

Pull Applications: “Search and Browsing”

 Example: Internet search engines and databases

 Advantage: Queries based on standardized descriptions

Push Applications: “Filtering”

 Example: Broadcast video, interactive television

 Advantage: Intelligent agents filter on the basis of standardized descriptions

Universal Multimedia Access:

 Adapt delivery to network and terminal characteristics (QoS)

Specialized Professional and Control Applications

Example of application areas

©Salembier

Storage and retrieval of audiovisual databases (image, film, radio archives)

Broadcast media selection (radio, TV programs)

Surveillance (traffic control, surface transportation, production chains)

E-commerce and Tele-shopping (searching for clothes / patterns)

Remote sensing (cartography, ecology, natural resources management)

Entertainment (searching for a game, for a karaoke)

Cultural services (museums, art galleries)

Journalism (searching for events, persons)

Personalized news service on Internet (push media filtering)

Intelligent multimedia presentations

Educational applications

Bio-medical applications

©Salembier

Example of queries

Text:

 Find AV material with the concepts described by the text

Semantic:

 Find AV material corresponding to the specified semantic

Image:

 Find an image with a similar characteristic (global or local)

Music:

 Play a few notes and search for corresponding musical pieces

Motion:

 Find video with specific object motion trajectories

Relation content / description

©Salembier

Description may be separated from the content (the description points to the AV material it describes).

Description may be multiplexed with the content (AV data and description interleaved in the same stream).

©Salembier

Type of description

Information about the content: recording date & conditions, title, author, copyright, coding format, classification, etc.

Information present in the content: Combination of low level and high level descriptors

 High level description:

 Efficient and powerful

 Lack of flexibility

 Low level description

 Generic and flexible

 Intelligent / efficient search engine

[Diagram: efficiency trade-off. With a high-level description, the recognition process happens at indexing time (feature extraction), making search and retrieval efficient; with a low-level description, the recognition process happens at search time, so there is no restriction on the search.]

Why do we need a standard ?

©Salembier

Having a standard will ease the task of fast and efficient identification of content that is of interest to the user by

 allowing the same described content to be accessed by more search engines and filters

 allowing the same search engine and filters to identify described content from more sources

©Salembier

Scope of MPEG-7

[Diagram: description generation, description, description consumption; only the description itself is standardized, while generation and consumption remain open to research and future competition.]

The description generation (feature extraction, indexing process, annotation & authoring tools, ...) and the description consumption (search engine, filtering tool, retrieval process, browsing device, ...) are non-normative parts of MPEG-7.

The goal is to define the minimum that enables interoperability.

MPEG-7: The Workplan

1996-1998, Competition: individual work; definition of the scope and requirements; Call for Proposals.

1999-2001, Collaboration: common work; core experiments; eXperimentation Model; requirements. Milestones: Working Draft → Committee Draft → Final Committee Draft → Draft International Standard → International Standard.

©Salembier

Information Flow

[Diagram: information flow. Features are extracted (manually or automatically) to produce the AV description; the description is encoded for transmission or storage and decoded on the consumption side, where users and computational systems search/query, browse (pull) and filter (push). The conformance points sit around the standardized description. The content and its description may also be multiplexed.]

Parts of the MPEG-7 Standard

ISO / IEC 15938 - 1: Systems

ISO / IEC 15938 - 2: Description Definition Language

ISO / IEC 15938 - 3: Visual

ISO / IEC 15938 - 4: Audio

ISO / IEC 15938 - 5: Multimedia Description Schemes

ISO / IEC 15938 - 6: Reference Software

Visual Descriptors

Color Texture Shape Motion

1. Histogram

• Scalable Color

• Color Structure

• GOF/GOP

2. Dominant Color

3. Color Layout

• Texture Browsing

• Homogeneous texture

• Edge Histogram

• Contour Shape

• Region Shape

• Camera motion

• Motion Trajectory

• Parametric motion

• Motion Activity

Color Datasets and Evaluation Criteria

Color: about 5400 color images and 50 queries. See MPEG document M5060 from Melbourne, October 1999.

Texture: various data sets – Brodatz texture, aerial pictures, Corel photos.

Performance evaluation

Let the number of ground truth images for a query q be NG(q)

Compute NR(q), the number of ground truth items found in the first K(q) retrievals, where K(q) = min(4*NG(q), 2*GTM) and GTM is max{NG(q)} over all queries q of the data set.

Compute MR(q) = NG(q) - NR(q), the number of missed items.

Record the ranks Rank(k) of the found items, counting the rank of the first retrieved item as one.

A rank of 1.25*K(q) is assigned to each of the ground truth images that are not in the first K(q) retrievals.

Compute the normalized modified retrieval rank as follows (next slide). Note that NMRR(q) will always be in the range [0.0, 1.0].

Average Retrieval Rate (AVR) and ANMRR

Compute AVR(q) for query q as follows:

  AVR(q) = (1/NG(q)) * Σ_{k=1..NG(q)} Rank(k)

Compute the modified retrieval rank as follows:

  MRR(q) = AVR(q) - 0.5*(1 + NG(q))

Normalized MRR: NMRR(q) = MRR(q) / Norm(q), where Norm(q) = 1.25*K(q) - 0.5 - 0.5*NG(q)

  ANMRR = (1/Q) * Σ_{q=1..Q} NMRR(q)
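To make the metric concrete, here is a minimal Python sketch of the ANMRR computation as defined above; the dictionary layout of ground_truth and retrieved is illustrative, not taken from the XM.

```python
def anmrr(ground_truth, retrieved):
    """ANMRR over a query set, following the NG/K/Rank definitions above.

    ground_truth: dict mapping query id -> set of relevant item ids (the NG(q) items)
    retrieved:    dict mapping query id -> ranked list of item ids (best match first)
    Both layouts are illustrative, not part of the standard.
    """
    gtm = max(len(items) for items in ground_truth.values())  # GTM = max NG(q) over the data set
    total = 0.0
    for q, relevant in ground_truth.items():
        ng = len(relevant)
        k = min(4 * ng, 2 * gtm)
        top_k = retrieved[q][:k]
        # Rank(k): rank of each ground-truth item, first retrieved item has rank 1;
        # items not found in the first K(q) retrievals get the penalty rank 1.25*K(q).
        ranks = [top_k.index(item) + 1 if item in top_k else 1.25 * k for item in relevant]
        avr = sum(ranks) / ng
        mrr = avr - 0.5 * (1 + ng)
        norm = 1.25 * k - 0.5 * (1 + ng)
        total += mrr / norm
    return total / len(ground_truth)
```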

Color Descriptors

Color descriptors:

• Dominant Color
• Scalable Color (HSV space)
• Group of Frames/Group of Pictures histogram
• Color Structure (HMMD space)
• Color Layout (YCbCr space)

Color Spaces

- YCrCb
- monochrome (Y only)
- RGB
- HSV
- HMMD

Color spaces

The Color Space Descriptor allows the selection of a color space to be used in the description; the Color Quantization Descriptor specifies the partitioning of the given color space into discrete bins. These two descriptors are intended to be used in the context of other descriptors, not standalone.

 RGB

 YCrCb → Color Layout

 HSV → Scalable Color

 HMMD → Color Structure

 Arbitrary 3x3 color transformation matrix

RGB color space

HSV color space

RGB to YCrCb

Y = 0.299*R + 0.587*G + 0.114*B

Cb = -0.169*R - 0.331*G + 0.500*B

Cr = 0.500*R - 0.419*G - 0.081*B

RGB to HSV

Max = max(R, G, B); Min = min(R, G, B)

Value = Max

if (Max == 0) then Saturation = 0; else Saturation = (Max - Min)/Max

if (Max == Min), Hue is undefined (achromatic color); otherwise:

  if (Max == R && G >= B)      Hue = 60*(G - B)/(Max - Min)
  else if (Max == R && G < B)  Hue = 360 + 60*(G - B)/(Max - Min)
  else if (Max == G)           Hue = 60*(2.0 + (B - R)/(Max - Min))
  else                         Hue = 60*(4.0 + (R - G)/(Max - Min))

RGB to HMMD

Max = max(R, G, B); Min = min(R, G, B)

Diff = Max - Min

Sum = (Max + Min)/2

Hue as defined for HSV.
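A short Python sketch of the two conversions just described (RGB values assumed to be in 0-255; function names are illustrative):

```python
def rgb_to_hsv(r, g, b):
    """RGB (0-255) to HSV as specified above; Hue in degrees, Saturation/Value in [0, 1]."""
    mx, mn = max(r, g, b), min(r, g, b)
    value = mx / 255.0
    saturation = 0.0 if mx == 0 else (mx - mn) / float(mx)
    if mx == mn:
        hue = 0.0                                   # achromatic: hue undefined, 0 by convention
    elif mx == r:
        hue = (60.0 * (g - b) / (mx - mn)) % 360.0  # folds the G < B case into 300-360 degrees
    elif mx == g:
        hue = 60.0 * (2.0 + (b - r) / (mx - mn))
    else:
        hue = 60.0 * (4.0 + (r - g) / (mx - mn))
    return hue, saturation, value


def rgb_to_hmmd(r, g, b):
    """RGB to HMMD components: Hue (as in HSV), Max, Min, Diff, Sum."""
    mx, mn = max(r, g, b), min(r, g, b)
    hue, _, _ = rgb_to_hsv(r, g, b)
    return hue, mx, mn, mx - mn, (mx + mn) / 2.0
```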

HMMD

[Figure: HMMD color space double cone. The intensity (Sum) axis runs from Black to White, chroma (Diff) increases toward the pure colors, with whiteness (Min) and blackness (Max) as the remaining components.]

HMMD Quantization

[Table: number of Hue and Sum quantization levels for each HMMD subspace (0-4), for histograms with 184, 120, 64, and 32 bins.]

Scalable color histogram

This is a color histogram in the HSV space encoded using a Haar transform. The binary representation is scalable in terms of number of bins used and in the number of bits per bin over a wide range of data rates.

Number of bins can range from 16 to 256.

No. of coefficients: 16, 32, 64, 128, 256
# bins: H:            4,  8,  8,   8,  16
# bins: S:            2,  2,  4,   4,   4
# bins: V:            2,  2,  2,   4,   4

Scalable color descriptor

[Figure: (a) the Haar transform basic unit, forming sums (+) and differences (-) of adjacent histogram bins; (b) the transform applied across the whole histogram.]

Extraction & matching

HSV space is uniformly quantized into 256 bins (colors);

The bin values are non-uniformly quantized into 4 bits per bin (11 bit integer truncation followed by a 4 bit non-linear quantization). The quantization is described in the normative part.

The 4 bit values are then Haar transformed.

For matching, one can use the Haar transformed coefficient values instead of reconstructing the histogram values.

 However, more accurate results can be expected by reconstructing the histogram index values.
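As an illustration of the extraction and matching steps above, the sketch below applies a plain 1-D Haar transform to a 256-bin histogram and matches descriptors with an L1 distance on truncated coefficients; the normative SCD transform and quantization are defined over the HSV bin structure in the standard and are not reproduced here.

```python
import numpy as np

def haar_coefficients(hist):
    """Repeated sum/difference passes over a length-2^n histogram (coarse coefficients first).
    Plain 1-D Haar transform, for illustration only."""
    coeffs = np.asarray(hist, dtype=float).copy()
    n = len(coeffs)
    while n > 1:
        half = n // 2
        pairs = coeffs[:n].reshape(half, 2)
        sums = pairs[:, 0] + pairs[:, 1]      # low-pass part (sums of adjacent bins)
        diffs = pairs[:, 0] - pairs[:, 1]     # high-pass part (differences of adjacent bins)
        coeffs[:half] = sums
        coeffs[half:n] = diffs
        n = half
    return coeffs

def scd_distance(coeffs_a, coeffs_b, num_coeffs=64):
    """L1 distance on a truncated set of Haar coefficients, as suggested above."""
    return float(np.abs(coeffs_a[:num_coeffs] - coeffs_b[:num_coeffs]).sum())
```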

Performance evaluation

[Plot: ANMRR vs. descriptor size in bits (0-1024) for the Scalable Color descriptor.]

Results with different numbers of Haar coefficients (16-256) quantized at different numbers of bits. H-Rec signifies retrieval results after reconstruction of histogram from Haar coefficients at full bit resolution.

GoP/GoF descriptor

The Group of Frames/Group of Pictures Descriptor (GoP) extends the SCD application to a collection of images, video segments, or moving regions.

In the GoP descriptor, three different ways of computing the joint color histogram values for the whole series using the individual histograms from items within the collection are identified:

 averaging, median filtering, and histogram intersection.

 This joint color histogram is then processed as in the SCD using the Haar transform and encoded.

Color structure

Similar to a histogram, but an 8x8 structuring element is used to compute the bin values.

The HMMD color space should be used with this descriptor. The quantization of the HMMD space to 32, 64, 128 and 184 bins is specified.

Color Structure

[Figure: an 8x8 structuring element slides over the image of quantized colors (C0-C7); at each position, every color present inside the element increments its bin by one. Ack: WD 4.0, July 2000.]

  dist(A, B) = Σ_i | h_A(i) - h_B(i) |
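The counting rule can be sketched as follows in Python; the input is assumed to be an image of quantized HMMD color indices, and the sub-sampling of the structuring element for larger images (see the next slide) is omitted.

```python
import numpy as np

def color_structure_histogram(index_image, num_bins, elem=8):
    """Slide an 8x8 structuring element over an image of quantized color indices and,
    at each position, increment the bin of every color present inside the window
    (each color counted at most once per window), following the rule above."""
    h, w = index_image.shape
    hist = np.zeros(num_bins, dtype=np.int64)
    for y in range(h - elem + 1):
        for x in range(w - elem + 1):
            window = index_image[y:y + elem, x:x + elem]
            hist[np.unique(window)] += 1
    return hist

def csd_distance(hist_a, hist_b):
    """L1 distance between (normalized) color structure histograms, as above."""
    a = hist_a / hist_a.sum()
    b = hist_b / hist_b.sum()
    return float(np.abs(a - b).sum())
```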

Structuring element: scalability

[Figure 6: Structuring elements for images with different resolutions: (a) 320x240, (b) 640x480.]

Interoperability

The Color-Structure Descriptor can be used in a limited range of Color Quantization settings, such that the total number of bins lies between 32 and 256.

Re-quantization of a Color Structure Descriptor from a fine color space quantization to a coarse one can be performed in the HMMD color space using the re-quantization method defined in the WD.

The Color-Structure Descriptor is not interoperable with other Color Descriptors.

CSD: Experimental results

[Plot: ANMRR vs. descriptor bit-length for the Color Structure Descriptor.]

Dominant color

Best suited for local (object or region) features:

 a small number of colors is enough to characterize the color information

Before feature extraction, images are segmented into regions.

Similar to a color histogram EXCEPT:

 color bins are not fixed, depending on the quantization in each region

 the number of bins is not fixed; on average only 3 bins per region

To extract the feature:

 quantize each region to a small number of representative colors

 calculate the percentage of each quantized color in the region

Descriptor Definition

A given image is described in terms of a set of region labels and the associated color descriptors

 Each pixel has a unique region label.

 Each region is characterized by a variable bin color histogram

Color feature descriptor for a given region in the image:

  F = { {c_i, p_i, v_i}, s },  i = 1, 2, ..., N

where c_i is the i-th dominant color, p_i is its percentage value, and v_i is its color variance. The color variance is an optional field. N is the total number of quantized (dominant) colors in the region.

 The spatial coherency s is a single number that represents the overall spatial homogeneity of the dominant colors in the image.

Dominant color (contd.)

The number of dominant colors N can vary from image to image and a maximum of 8 dominant colors can be used to represent the region (3 bits).

The percentage values are quantized to 5 bits each.

The color quantization depends on the color space specifications defined for the entire database and need not be specified with each descriptor.

Experiments with 6 bits/color index.

Variance: 3 bits/dominant color.

Spatial coherence: 5 bits.

Similarity Distance Measure

  D^2(F_1, F_2) = Σ_{i=1..N1} p_{1i}^2 + Σ_{j=1..N2} p_{2j}^2 - Σ_{i=1..N1} Σ_{j=1..N2} 2 a_{1i,2j} p_{1i} p_{2j}

a_{k,l}: similarity coefficient between two colors c_k and c_l:

  a_{k,l} = 1 - d_{k,l}/d_max   if d_{k,l} <= T_d
  a_{k,l} = 0                   if d_{k,l} > T_d

d_{k,l}: Euclidean distance between the two colors, d_{k,l} = || c_k - c_l ||.

T_d: maximum distance for two colors to be considered similar; d_max = α·T_d, with α values 1.0-1.5 and T_d values 10-20.

Equivalent to the quadratic color histogram distance measure:

  D^2(F_1, F_2) = (F_1 - F_2)^T A (F_1 - F_2)
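A compact Python version of the distance just defined; the threshold values in the signature fall within the suggested ranges above but are not normative.

```python
import numpy as np

def dominant_color_distance(c1, p1, c2, p2, td=15.0, alpha=1.2):
    """D^2 between two dominant-color sets, following the formula above.

    c1, c2: (N1, 3) and (N2, 3) arrays of dominant colors
    p1, p2: arrays of their percentage values (each summing to 1)
    td, alpha: T_d and the d_max multiplier (illustrative defaults)."""
    d_max = alpha * td
    d = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2)   # pairwise color distances d_{k,l}
    a = np.where(d <= td, 1.0 - d / d_max, 0.0)                   # similarity coefficients a_{k,l}
    return float((p1 ** 2).sum() + (p2 ** 2).sum()
                 - 2.0 * (p1[:, None] * p2[None, :] * a).sum())
```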

Enhancements

Spatial coherency

Color variance: variance of each of the dominant colors.

ANMRR Results for DC

Color space | #dominant colors (avg.) | DC size (bits) | DC ARR | DC ANMRR | DC+Variance size (bits) | ARR | ANMRR
RGB     | 3 |  69 | 0.6368 | 0.3897 |  78 | 0.7163 | 0.3222
RGB     | 5 | 130 | 0.7114 | 0.3214 | 148 | 0.7933 | 0.2295
CIE-LAB | 3 |  67 | 0.7568 | 0.2784 |  76 | 0.8160 | 0.2350
CIE-LAB | 6 | 112 | 0.8083 | 0.2312 | 127 | 0.8951 | 0.1563

With Spatial Coherence field

[Table: ANMRR vs. the number of bits (0-5) used for spatial coherence, comparing a single spatial coherence field used with the dominant colors against a separate spatial coherence value per dominant color. The reported ANMRR values range from 0.197 to 0.252; without the spatial coherence value the ANMRR is 0.252.]

Merits

Accurate and compact compared to the traditional color histogram:

 color bins are quantized from each image region vs. fixed bins

 3 bins on average vs. 256 or more

Efficient database indexing and search:

 NO high-dimensional indexing

 the complexity of the search depends only on the desired degree of similarity of the match, not directly on the database size

 insertion and deletion of database entries do not cause index structure rebuilding

 retrieval results are accurate and fast compared to the traditional color histogram

Color Layout Descriptor (CLD)

Spatial distribution of colors for fast browsing and retrieval.

A very compact representation (63 bits).

Uses YCbCr color space.

 Partition the image into an 8x8 grid of sub-images (64 blocks).

 Calculate the “dominant color” of each sub-image.

 Compute the DCT of this 8x8 matrix of dominant colors.

 Quantize the DCT coefficients.

 Zig-zag scan of quantized DCT coefficients.
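A rough Python sketch of these extraction steps; the representative color per block is taken here as the block average, and quantization and the exact number of coefficients kept per channel are left out, so this only illustrates the pipeline.

```python
import numpy as np
from scipy.fft import dctn

def color_layout(image_ycbcr):
    """Illustrative CLD extraction: average each channel over an 8x8 grid of blocks,
    take the 2-D DCT of each 8x8 matrix of representative colors, and return the
    coefficients in zig-zag order."""
    h, w, _ = image_ycbcr.shape
    ys = np.linspace(0, h, 9, dtype=int)
    xs = np.linspace(0, w, 9, dtype=int)
    grid = np.zeros((8, 8, 3))
    for i in range(8):
        for j in range(8):
            block = image_ycbcr[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            grid[i, j] = block.reshape(-1, 3).mean(axis=0)        # representative (average) color
    # zig-zag scan order of an 8x8 block, low frequencies first
    zigzag = sorted(((i, j) for i in range(8) for j in range(8)),
                    key=lambda ij: (ij[0] + ij[1],
                                    ij[0] if (ij[0] + ij[1]) % 2 else -ij[0]))
    coeffs = []
    for c in range(3):
        dct = dctn(grid[:, :, c], norm='ortho')
        coeffs.append(np.array([dct[i, j] for i, j in zigzag]))
    return coeffs    # one zig-zag scanned coefficient array per channel (Y, Cb, Cr)
```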

Experimental Results

[Plot: ANMRR vs. descriptor bit-length for the Color Layout Descriptor.]

Application to video segment retrieval

Compute the CLD every 0.5 secs.

Subsample and quantize the CLD sequence:

 Let CLD(t) be the CLD at instant t.

 Let t=0, i=0, and CLDV(0)=CLD(0).

 STEP 1: t = t+1.

 If CLD(t) is similar to CLDV(i) then GOTO STEP 1;

 else i = i+1; CLDV(i) = CLD(t); GOTO STEP 1.
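The subsampling loop can be written compactly as below; cld_distance and the similarity threshold are placeholders for whatever CLD matching function and tolerance the application uses (the standard recommends a weighted distance on the DCT coefficients).

```python
import numpy as np

def cld_distance(a, b):
    """Unweighted L2 distance between concatenated CLD coefficient arrays (a simplification)."""
    return float(np.linalg.norm(np.concatenate(a) - np.concatenate(b)))

def subsample_cld_sequence(clds, threshold):
    """Keep a CLD only when it differs noticeably from the last kept one, following the loop above."""
    kept = [clds[0]]                                  # CLDV(0) = CLD(0)
    for cld in clds[1:]:
        if cld_distance(cld, kept[-1]) > threshold:   # not similar: start a new entry
            kept.append(cld)
    return kept
```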

Visual Descriptors

Color Texture Shape Motion

1. Histogram

• Scalable Color

• Color Structure

• GOF/GOP

2. Dominant Color

3. Color Layout

• Texture Browsing

• Homogeneous texture

• Edge Histogram

• Contour Shape

• Region Shape

• Camera motion

• Motion Trajectory

• Parametric motion

• Motion Activity

Texture descriptors

Texture browsing

Homogeneous texture

Edge histogram

Texture Browsing Descriptor

Compact descriptor for texture browsing; it requires only 12 bits. The components provide a higher-level perceptual characterization of texture that is useful for browsing and clustering.

Feature extraction is simple, involving image convolutions with a set of masks. The filters are based on a 2-D Gabor wavelet decomposition.

Image convolutions can be efficiently implemented in hardware and software.

Proposed Texture Descriptor

  T = [ v1 v2 v3 v4 v5 | μ_0 σ_0 μ_1 σ_1 ... μ_M σ_M ]
        (PBC)            (SRC)

PBC: Perceptual Browsing Component. The v_i (i = 1, ..., 5) are non-negative integers (12 bits total):

 v1 provides a confidence measure of the texture regularity.

 v2, v3 give the two quantized directions which best capture the regularity.

 v4, v5 give the two quantized scales which best capture the texture regularity.

Examples

  PBC = [ v1 v2 v3 v4 v5 ]    (6)

Figure 2: Two texture examples: (a) regular pattern, and (b) an unstructured pattern.

Some sample textures and corresponding PBC vectors

[1 3 3 1 1] [1 3 3 1 3] [1 4 1 1 1] [1 6 3 1 1] [1 1 5 1 2]

[2 6 2 3 3] [2 2 6 4 1] [2 6 4 1 3] [2 2 4 2 1] [3 1 4 1 4]

[4 1 4 3 3] [4 1 4 4 4] [4 1 4 4 4] [4 2 3 3 2] [4 1 4 3 3]

Similarity Retrieval: Homogeneous Texture Descriptor

  T = [ v1 v2 v3 v4 v5 | μ_0 σ_0 μ_1 σ_1 ... μ_M σ_M ]
        (PBC)            (SRC)

SRC: Similarity Retrieval Component.

 The components are computed by convolving the image with a set of filters tuned to detect image features at different scales and orientations.

 μ_i and σ_i denote the normalized first and second moments of these filtered outputs (computed in the frequency domain).

SRC

62 Components x 8 bits/component.

The components represent mean and standard deviation of the energies in each of the filtered outputs.

Matching can be made rotation and scale invariant.

Descriptor computation:

Frequency Layout

[Figure: polar frequency-domain layout of the channels C_i (scales indexed radially, orientations angularly) used to compute the descriptor.]

  G_{s,r}(ω, θ) = exp( -(ω - ω_s)^2 / (2 σ_{ω,s}^2) ) · exp( -(θ - θ_r)^2 / (2 σ_{θ,r}^2) )
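A simplified Python sketch of the channel construction and feature computation; the centre frequencies, bandwidths, and normalization chosen here are assumptions for illustration and do not match the normative values.

```python
import numpy as np

def gabor_channel(shape, omega_s, theta_r, sigma_omega, sigma_theta):
    """Frequency-domain Gabor channel G_{s,r}(omega, theta) as written above,
    evaluated on the DFT grid of an image with the given shape."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    omega = np.sqrt(fx ** 2 + fy ** 2)           # radial frequency
    theta = np.arctan2(fy, fx)                   # orientation
    return (np.exp(-(omega - omega_s) ** 2 / (2 * sigma_omega ** 2)) *
            np.exp(-(theta - theta_r) ** 2 / (2 * sigma_theta ** 2)))

def homogeneous_texture_features(image, scales=5, orientations=6):
    """Mean and standard deviation of the energy in each of the 5x6 = 30 channels.
    Only the feature layout is illustrated; the full HTD adds the image mean and
    standard deviation, giving 62 components."""
    img = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(img)
    feats = []
    for s in range(scales):
        omega_s = 0.5 / (2 ** s)                 # octave-spaced centre frequencies (assumed)
        for r in range(orientations):
            theta_r = r * np.pi / orientations
            g = gabor_channel(img.shape, omega_s, theta_r,
                              sigma_omega=omega_s / 2.0, sigma_theta=np.pi / 12.0)
            energy = np.abs(np.fft.ifft2(spectrum * g))
            feats.extend([energy.mean(), energy.std()])
    return np.array(feats)
```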

E.g.: Browsing large aerial photographs

40 large aerial photographs (each is 5K x 5K)

Contain about 280,000 tiles and 6,000 regions

Texture based search (using Gabor texture features)

A pattern thesaurus for image indexing

Fast image segmentation scheme

Provide both tile-based and region-based search capabilities.

Integrated into the UCSB Alexandria Digital Library.

Examples of tile-based search

Query Codeword

Applications - Web Image Search

More details

 http://vision.ece.ucsb.edu/

 Go to category based image search demo.

Edge histogram

[Figure: the image is divided into 4x4 sub-images, indexed (0,0) through (3,3); each sub-image is further divided into square image-blocks, and each image-block into four sub-blocks (0-3) of size block_size/2.]

Edge masks: a) vertical edge, b) horizontal edge, c) 45-degree edge, d) 135-degree edge, e) non-directional edge.

Local edge histogram

Histogram bins and their semantics:

Local_Edge[0]: vertical edge of sub-image (0,0)
Local_Edge[1]: horizontal edge of sub-image (0,0)
Local_Edge[2]: 45-degree edge of sub-image (0,0)
Local_Edge[3]: 135-degree edge of sub-image (0,0)
Local_Edge[4]: non-directional edge of sub-image (0,0)
Local_Edge[5]: vertical edge of sub-image (0,1)
  :
Local_Edge[74]: non-directional edge of sub-image (3,2)
Local_Edge[75]: vertical edge of sub-image (3,3)
Local_Edge[76]: horizontal edge of sub-image (3,3)
Local_Edge[77]: 45-degree edge of sub-image (3,3)
Local_Edge[78]: 135-degree edge of sub-image (3,3)
Local_Edge[79]: non-directional edge of sub-image (3,3)
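A Python sketch of the local edge histogram extraction; the 2x2 filter coefficients and the edge threshold follow common XM practice but should be treated as assumptions rather than the normative values.

```python
import numpy as np

# 2x2 edge filters for the five edge types (coefficients quoted from memory; verify against the XM).
EDGE_FILTERS = [
    np.array([[1.0, -1.0], [1.0, -1.0]]),                    # vertical
    np.array([[1.0, 1.0], [-1.0, -1.0]]),                    # horizontal
    np.array([[np.sqrt(2), 0.0], [0.0, -np.sqrt(2)]]),       # 45 degree
    np.array([[0.0, np.sqrt(2)], [-np.sqrt(2), 0.0]]),       # 135 degree
    np.array([[2.0, -2.0], [-2.0, 2.0]]),                    # non-directional
]

def local_edge_histogram(gray, block_size=8, threshold=11.0):
    """80-bin local edge histogram: split the image into 4x4 sub-images, each sub-image
    into image-blocks, reduce every block to its 2x2 quadrant means, apply the five
    filters, and let the strongest response vote into the matching bin if it exceeds
    the threshold. Bin ordering matches Local_Edge[0..79] above."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    sub_h, sub_w = h // 4, w // 4
    half = block_size // 2
    hist = np.zeros((4, 4, 5))
    for si in range(4):
        for sj in range(4):
            sub = gray[si * sub_h:(si + 1) * sub_h, sj * sub_w:(sj + 1) * sub_w]
            for y in range(0, sub.shape[0] - block_size + 1, block_size):
                for x in range(0, sub.shape[1] - block_size + 1, block_size):
                    block = sub[y:y + block_size, x:x + block_size]
                    means = np.array([[block[:half, :half].mean(), block[:half, half:].mean()],
                                      [block[half:, :half].mean(), block[half:, half:].mean()]])
                    responses = [abs((means * f).sum()) for f in EDGE_FILTERS]
                    k = int(np.argmax(responses))
                    if responses[k] > threshold:          # otherwise: no edge in this block
                        hist[si, sj, k] += 1
    return hist.reshape(80)
```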

Global histogram

[Figure: 13 semi-global groupings of the 16 sub-images.]

Local bins: 0-79; global bins: 0-4; semi-global bins: 0-64.

ANMRR results

[Plot: ANMRR vs. bits/image (roughly 150-400) for the local-only histogram and the proposed method, at 2, 3, 4, and 5 bits/bin.]

Local vs Global histograms

                                               2 bits/bin  3 bits/bin  4 bits/bin  5 bits/bin
with local histogram only                         0.396       0.336       0.318       0.325
with local, semi-global and global (proposed)     0.364       0.296       0.286       0.284

Some example results

Visual Descriptors

Color Texture Shape Motion

• Histogram

• Scalable Color

• Color Structure

• GOF/GOP

• Dominant Color

• Color Layout

• Texture Browsing

• Homogeneous texture

• Edge Histogram

• Contour Shape

• Region Shape

• Camera motion

• Motion Trajectory

• Parametric motion

• Motion Activity

Shape Descriptors

Region shape

Contour shape

Types of Shape Descriptor

Contour-based shape descriptor

Region-based shape descriptor

CSS Descriptor

1/2

[Figure: contour evolution in the curvature scale space (CSS), shown at iterations 1, 1000, and 5000, together with the resulting CSS image (iteration t vs. normalized contour position).]

The 20 peaks in the CSS image (position, height):
(0.773, 1), (0.629, 0.8441), (0.460, 0.7510), (0.336, 0.2198), (0.153, 0.1427), (0.669, 0.0806), (0.716, 0.0742), (0.234, 0.0587), (0.960, 0.0575), (0.499, 0.0214), (0.466, 0.0155), (0.957, 0.0143), (0.900, 0.0114), (0.412, 0.0114), (0.886, 0.0107), (0.996, 0.0105), (0.723, 0.0090), (0.357, 0.0076), (0.764, 0.0069), (0.921, 0.0066)

CSS Descriptor

2/2

[Figure: matching of model and query CSS peak sets; the peak sets are aligned by a circular shift, matched peaks contribute a distance term, and unmatched peaks incur a penalty.]

Dissimilarity = assignment cost + distance

CSS image formation

ART Descriptor

Angular Radial Transform (ART); variants ART-C and ART-S.

The definition of ART:

 ART coefficients: F_nm = ∫_0^2π ∫_0^1 V_nm*(ρ, θ) f(ρ, θ) ρ dρ dθ

 ART basis function: V_nm(ρ, θ) = A_m(θ) R_n(ρ)

 Angular function: A_m(θ) = (1/2π) exp(jmθ)

 Radial function: R_n(ρ) = 1 if n = 0, and 2 cos(πnρ) if n ≠ 0

Rotation invariance of the ART: for a rotated image f(ρ, θ + α), the coefficients become exp(jmα)·F_nm, so the magnitude |F_nm| is rotation invariant.
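A direct numerical Python sketch of the coefficients defined above, computed on a region image mapped to the unit disk; the region shape descriptor additionally quantizes the magnitudes, which is omitted here.

```python
import numpy as np

def art_magnitudes(region, n_max=3, m_max=12):
    """|F_nm| for the ART defined above, for a (binary or grey-level) region image.
    Summing over Cartesian pixels approximates the integral (dx dy = rho drho dtheta)."""
    f = np.asarray(region, dtype=float)
    h, w = f.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rho = np.sqrt(((x - cx) / (w / 2.0)) ** 2 + ((y - cy) / (h / 2.0)) ** 2)
    theta = np.arctan2(y - cy, x - cx)
    f = f * (rho <= 1.0)                                   # keep only pixels inside the unit disk
    coeffs = np.zeros((n_max, m_max), dtype=complex)
    for n in range(n_max):
        radial = np.ones_like(rho) if n == 0 else 2.0 * np.cos(np.pi * n * rho)
        for m in range(m_max):
            angular = np.exp(1j * m * theta) / (2.0 * np.pi)
            basis = angular * radial                       # V_nm(rho, theta)
            coeffs[n, m] = np.sum(np.conj(basis) * f)      # discrete <V_nm, f>
    return np.abs(coeffs)                                  # rotation-invariant magnitudes
```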

Experimental Dataset & Procedure

1/3

Dataset 1 : 70 classes × 20 variations = 1400 images

CE1-A-1: Scale, CE1-A-2: Rotation, CE1-B: Similarity

Experimental Dataset & Procedure

2/3

Dataset 2 : 1100 marine creatures

Dataset 3 : 200 Bream fish video sequence

CE2-C: motion and non-rigid deformations

Experimental Dataset & Procedure

3/3

Dataset 4: 3000 trademark images

CE2-A-1: Scale, CE2-A-2: Rotation, CE2-A-3: Scale & rotation,

CE2-A-4: Perspective transformation

CE2-B: Similarity

Retrieval Example

1/2

Query results without perspective normalization

Query results with perspective normalization

Retrieval Example

2/2

Query results without perspective normalization

Query results with perspective normalization

Ex: Trademark Registration Application

[Diagram: parties involved: the U.S. Patent and Trademark Office, local trademark offices, the World Intellectual Property Organization, trademark examining officers, enterprises, and graphic designers.]

Visual Descriptors

Color Texture Shape Motion

• Histogram

• Scalable Color

• Color Structure

• GOF/GOP

• Dominant Color

• Color Layout

• Texture Browsing

• Homogeneous texture

• Edge Histogram

• Contour Shape

• Region Shape

• Camera motion

• Motion Trajectory

• Parametric motion

• Motion Activity

Motion Descriptors

[Diagram: motion descriptors organized by the entity they describe: video segment (camera motion, motion activity), mosaic (warping parameters), moving region (trajectory, parametric motion, motion activity).]

Camera Motion

[Figure: (a) camera track (left/right), boom (up/down), and dolly (forward/backward) motion modes; (b) camera pan (left/right), tilt (up/down), and roll motion modes.]

Motion Activity: motivation

Need to capture “pace” or Intensity of activity

For example, draw distinction between

“High Action” segments such as chase scenes.

“Low Action” segments such as talking heads

Emphasize simple extraction and matching

Use gross motion characteristics, thus avoiding object segmentation, tracking, etc.

Compressed domain extraction is important

PROPOSED MOTION ACTIVITY DESCRIPTOR

Attributes of Motion Activity Descriptor

 Intensity/Magnitude - 3 bits

 Spatial Characteristics - 16 bits

 Temporal Characteristics - 30 bits

 Directional Characteristics - 3 bits

INTENSITY

Expresses “pace” or Intensity of Action

Uses a scale of 1-5: very low, low, medium, high, very high

Extracted by suitably quantizing the variance of the motion vector magnitudes

Successfully tested against subjectively constructed ground truth
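A minimal sketch of the intensity attribute in Python; the quantization thresholds below are placeholders, since the real levels come from the subjectively constructed ground truth mentioned above.

```python
import numpy as np

def activity_intensity(motion_vectors, thresholds=(4.0, 16.0, 64.0, 256.0)):
    """Quantize the variance of the motion-vector magnitudes of a shot onto the
    1-5 scale above (thresholds are illustrative placeholders)."""
    mags = np.linalg.norm(np.asarray(motion_vectors, dtype=float), axis=-1)
    return int(np.searchsorted(thresholds, mags.var())) + 1   # 1 = very low ... 5 = very high
```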

Spatial Distribution: Using run-lengths to describe moving regions

[Figure: raster scan of the block motion field; runs are classified as short, medium, or long run-lengths.]

With smaller, widely spaced objects, note that there are more long run-lengths and medium run-lengths.

SPATIAL DISTRIBUTION

Captures the size and number of moving regions in the shot on a frame by frame basis

Enables distinction between shots with one large region in the middle such as talking heads and shots with several small moving regions such as aerial soccer shots

Thus “sparse” shots have many long runs while “dense” shots do not have many long runs.
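A sketch of the run-length computation in Python; thresholding at the average magnitude and the width/3 and 2*width/3 cut-offs are assumptions based on the description above, not the normative values.

```python
import numpy as np

def zero_run_lengths(mv_magnitudes):
    """Threshold the per-block motion-vector magnitudes of one frame at their mean and
    count short/medium/long raster-scan runs of inactive blocks."""
    mags = np.asarray(mv_magnitudes, dtype=float)
    active = mags > mags.mean()
    w = mags.shape[1]
    counts = {"short": 0, "medium": 0, "long": 0}
    for row in active:
        run = 0
        for is_active in list(row) + [True]:          # trailing sentinel flushes the last run
            if not is_active:
                run += 1
            elif run > 0:
                if run < w / 3.0:
                    counts["short"] += 1
                elif run < 2.0 * w / 3.0:
                    counts["medium"] += 1
                else:
                    counts["long"] += 1
                run = 0
    return counts
```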

TEMPORAL DISTRIBUTION

Expresses fraction of the duration of each level of activity in the total duration of the shot

Straightforward extension of the intensity of motion activity to the temporal dimension

For instance, since a talking head is typically exclusively low activity it would have zero entries for all levels except one

DIRECTION

Expresses dominant direction if definable as one of a set of eight equally spaced directions

Extracted by using averages of angle (direction) of each motion vector

Useful where there is strong directional motion

APPLICATION TO VIDEO BROWSING

Extraction of the 10 most active segments in a news program

Improve Search by Combining Intensity and Spatial Attributes

APPLICATIONS

VIDEO BROWSING

RETRIEVAL FROM STORED VIDEO

CONTENT RE-PURPOSING

CONTENT BASED PRESENTATION

SURVEILLANCE

Motion activity: conclusions

COMPACT DESCRIPTOR

EASY TO EXTRACT AND MATCH

EFFECTIVE BY ITSELF

NUMEROUS APPLICATIONS

EFFECTIVE IN COMBINATION WITH OTHER DESCRIPTORS

DEMO at the end.

Motion Trajectory

First-order approximation:

  f(t) = f_a + v_a (t - t_a),   with   v_a = (f_b - f_a) / (t_b - t_a)

Second-order approximation:

  f(t) = f_a + v_a (t - t_a) + (1/2) a_a (t - t_a)^2,   with   v_a = (f_b - f_a) / (t_b - t_a) - (1/2) a_a (t_b - t_a)

where (t_a, f_a) and (t_b, f_b) are successive key points of the trajectory and a_a is the acceleration over the interval.
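The interpolation above reduces to a few lines of Python (applied per coordinate); setting a_a = 0 gives the first-order case.

```python
def trajectory_position(t, t_a, t_b, f_a, f_b, a_a=0.0):
    """Position on one trajectory interval [t_a, t_b] using the interpolation above;
    a_a = 0 gives the first-order (linear) case. Apply separately to x, y (and z)."""
    v_a = (f_b - f_a) / (t_b - t_a) - 0.5 * a_a * (t_b - t_a)   # ensures f(t_b) = f_b
    return f_a + v_a * (t - t_a) + 0.5 * a_a * (t - t_a) ** 2
```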

Trajectory (contd.)

Conclusions

All of the visual descriptors in the MPEG-7 working draft have undergone rigorous testing and evaluation.

They represent the state of the art descriptors in image and video retrieval.

For further information, refer to the MPEG documents (see the next slide).

Ds to DS

Are we ready for this leap of faith?

Further information

Major MPEG-7 documents are public:

MPEG Home page: http://www.cselt.it/mpeg/

Public documents: http://www.cselt.it/mpeg/working_documents.htm

Also check: http://www.mpeg-7.com

Special issues of journals:

Signal Processing: Image Communication, Vol. 16(1-2), Sept. 2000: http://www.elsevier.com/locate/image

IEEE Trans. on Circuits and Systems on Video Technology

(June 2001)

 IEEE Trans. Image Processing (Jan 2000 special issue on content based retrieval.)

IBM MPEG-7 Visual Annotation Tool:

 http://www.alphaworks.ibm.com/tech/mpeg-7

Book on MPEG-7: to be published later this year (Manjunath, Salembier and Sikora, Wiley International, 2001).
