MPEG-7
Multimedia Content Description Interface
Presented by: Moustafa A. Hammad

Introduction
More and more digital audio-visual information exists, and the amount keeps increasing.
How quickly and easily can the desired information be made available?
The Internet's popularity keeps increasing.
More audio-visual information processing systems have emerged:
  MPEG-1: “Standard for storage and retrieval”
  MPEG-2: “The digital television standard”
  MPEG-4: “Multimedia production, distribution and content access”; developed a Syntactic Description Language.
“Fine, but where are the semantics?”

Introduction (Cont.)
MPEG-7: “Multimedia content description interface”.
Represents information about the content, but not the content itself.
Satisfies both the database and the signal processing communities.
Goal: make audio-visual material as searchable as text.
What is the standard? (To be finalized mid-2001.)

Topics of Discussion
Scope of the standard
Terminology
Interaction between MPEG-7 and applications
Requirements
Applications
Case study: a proposal for an MPEG-7 Description Definition Language (DDL)

Scope of the Standard
MPEG-7 processing chain:
  feature extraction (analysis)
  the description itself
  the search engines (application)
What is in the standard?
(Figure: Description production → Standard Description → Description consumption)

MPEG-7 Terminology
Data
Feature
Descriptor (D)
Descriptor value
Description Scheme (DS)
Description
Coded Description
Description Definition Language (DDL)
(Figure: UML diagram relating these terms. Recoverable labels: AV Content Item, Data, Feature, Descriptor, Description Scheme, Description Definition Language (DDL), Human or System; relations “defines”, “describes”, “signifies”; multiplicities 1..*, 0..*, *..1.)

Interaction between MPEG-7 and Applications
(Figure: block diagram. Recoverable labels: MM Content, Description Generation, Description Definition Language (DDL), Description Schemes (DS), Descriptor (D), MPEG-7 Description, Encoder, MPEG-7 Coded Description, Decoder, Search/Query Engine, Filter Agents, Human or System (user or data processing system).)

MPEG-7 Requirements
Descriptors: cross-modality, direct data manipulation, data adaptation, language of text-based descriptions, linking, prioritization of related information, unique identification.
Description Schemes: Description Scheme relationships, prioritization of descriptors, hierarchy of descriptors, scalability of descriptors, description of temporal range, data adaptation.
DDL: compositional capabilities, unique identification, primitive/composite data types, multiple media types, relationships between description and data, grammar, Intellectual Property Management and Protection (IPMP), real-time support, etc.
Descriptor requirements:
  General: types of features (N-dimensional spatio-temporal structure, objective, subjective, production, composition, concepts), referencing analogue data, etc.

MPEG-7 Requirements (Cont.)
  Functional: retrieval effectiveness, similarity-based retrieval, etc.
  Coding: efficient representation of descriptions, description extraction, etc.
  Visual specific: types of features (color, texture, sketch, …), visual data formats, etc.
  Audio specific: types of features (frequency contour, harmony, …), auditory data formats, etc.
  Text specific: text retrieval, consistency of text description tools.
System requirements: multiplexing, temporal synchronization, file format, IPMP, etc.
Ref: MPEG Requirements Group, “MPEG-7 Requirements”, Doc. ISO/MPEG N2859, MPEG Vancouver Meeting, July 1999.

MPEG-7 Applications
Pull applications: video retrieval (storage and retrieval of video databases, sound effects libraries, historical speech databases, etc.).
Push applications: video selection and filtering (personalized television services, information access facilities for people with special needs, etc.).
Specialized professional and control applications (remote sensing applications, surveillance applications, etc.).

A Proposal for an MPEG-7 Description Definition Language (DDL)
Reference: [J. Hunter (DSTC)]
The schema is based on several existing schema languages: Resource Description Framework (RDF) Schema, XML Document Type Definitions (DTD), Document Content Description (DCD), and a Schema for Object-Oriented XML (SOX).
Satisfies the DDL requirements.
Consists of classes, properties, and relations between classes.
Uses Dublin Core (DC) attributes (Name, Identifier, Version, Registration Authority, Language, Definition, Obligation, Datatype, Maximum Occurrence, Comment).

The Description Scheme
(Figure: hierarchy of the example description scheme, with Dublin Core and MPEG-7 attribute sets attached to the nodes.)
MM Document
  Audio
    Speech Track: Phoneme List
    Music Track: Scope, MIDI tempo
    SoundFX Track: List of soundFX
  Video
    Sequence1, Sequence2, Sequence3
      Scene1.1, Scene1.2, Scene1.3
        Shot 1.1.1, Shot 1.1.2, Shot 1.1.3
          Frame 1, Frame 120
            Object1, Object2, Object3
Dublin Core attribute sets shown in the figure, one set per line:
  DC.Title, DC.Creator, DC.Subject, DC.Publisher, DC.Description, DC.Contributor, DC.Date, DC.Type, DC.Format, DC.Identifier, DC.Source, DC.Language, DC.Relation.HasPart, DC.Rights
  DC.Subject, DC.Description, DC.Contributor.Presenter, DC.Type, DC.Format.Length, DC.Identifier, DC.Relation.HasPart, DC.Coverage.T.Min, DC.Coverage.T.Max
  DC.Description, DC.Contributor.Presenter, DC.Type, DC.Format.Length, DC.Identifier, DC.Relation.HasPart, DC.Coverage.T.Min, DC.Coverage.T.Max
  DC.Description, DC.Contributor.Presenter, DC.Type, DC.Format.Length, DC.Identifier, DC.Relation.HasPart, DC.Coverage.T.Min, DC.Coverage.T.Max
  DC.Description, DC.Type, DC.Format.Type, DC.Identifier, DC.Relation.HasPart
  DC.Description, DC.Type, DC.Identifier
MPEG-7 attribute sets shown in the figure, one set per line:
  Text, Script, Transcript, EditList, KeyFrame, Locale, Cast, Objects
  Text, Script, Transcript, EditList, KeyFrame, Camera.Dist, Camera.Angle, Camera.Motion, Lighting, OpenTrans, CloseTrans
  Text, Image, Timestamp, Colour, Anno.Text, Anno.Posn
  Text, Position, Shape, Trajectory, Speed, Colour, Texture, Volume, Anno.Text, Anno.Posn

Features of the Proposed MPEG-7 DDL
Namespace declarations:
  <x xmlns:dc="http://purl.org/metadata/dublin_core#">
    <!-- the "dc" prefix is bound to http://purl.org/metadata/dublin_core# for the "x" element and contents -->
    <dc:Title>CNN News</dc:Title>
  </x>
Class type declarations and class hierarchies:
  <class id="MM_Document">
    <property type="#dc_attribs"/>
  </class>
  <class id="Video_Document">
    <subclassof type="#MM_Document"/>
    <property type="duration"/>
  </class>

Features of the Proposed MPEG-7 DDL (Cont.)
Property type declaration:
  <propertyType id="frameNum" datatype="int"/>
  <propertyType id="secs" datatype="float"/>
  <propertyType id="timestamp">
    <Alt>
      <property type="#frameNum"/>
      <property type="#secs"/>
    </Alt>
  </propertyType>
Relationship type declaration:
  <relationType id="contains" direction="uni" inverse="#contained_by">
    <domain type="#MM_Document"/>
    <range type="#MM_Document" occurs="zeroormore" order="Seq"/>
    <constraint type="boolean" value="((range[1].start >= domain.start) && (range[n].end <= domain.end))"/>
  </relationType>
  <class id="scene">
    <subclassof type="#MM_Document"/>
    <property type="#dc_attribs"/>
    <relation type="contains" range="#object"/>
  </class>

Features of the Proposed MPEG-7 DDL (Cont.)
Order and Occurs (Seq, Bag, Alt, Par)
Data typing and user-defined datatypes
Attribute definitions:
  <attributeType id="src" datatype="uri"/>
Synchronization and temporal specification:
  <seq>
    <audio_track src="audio1"/>
    <audio_track begin="5s" src="audio2"/>
  </seq>
  (Figure: timeline showing Audio1, a 5s offset, then Audio2.)
Spatial specification: supports both rectangle and polygon representations, using HTML syntax and semantics.

Example: MPEG-7 Description
  ………………….
  <MM_Document src="http://………./test.mpg">
    <!-- other properties -->
    <contains>
      <Par>
        <Seq id="video_sequences">
          <sequence id="seq1" src="http://……."/>
          <!-- other sequences -->
        </Seq>
        <Seq id="audio_tracks">
          <Audio id="speech" src="test.ra"/>
          <!-- other audios -->
        </Seq>
      </Par>
    </contains>
  <sequence id="seq1" src="http://….."/>
    ………
    <contains>
      <Seq>
        <Scene id="scene1" src="http://……."/>
        <!-- other scenes -->
      </Seq>
    </contains>
  ……………………………………………………..
  <!-- declaration of shots, frames and objects -->
  ……………..……………………………………..
  <Object id="#car" src="http://……………../test.jpg#car">
    ………..
    <DC.Description.text>"A red car which has been severely damaged by the explosion."</DC.Description.text>
    ………...
  </Object>

Conclusion
The proposed DDL satisfies most of the DDL requirements.
Some remarks:
  Lack of provision for push applications (filtering and selection, real-time support).
  No representation for subjective and concept features.
  Only simple representation of, and support for, spatial features.
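
Illustration: the boolean constraint on the "contains" relation
The relationship type declaration earlier in the deck attaches a boolean constraint to "contains": the first contained item must not start before its container, and the last must not end after it. The Python sketch below shows one way such a check could be evaluated. It is only an illustration under stated assumptions: the Segment class, its field names, the function name, and the use of seconds are hypothetical and are not part of the proposed DDL.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Segment:
    """Hypothetical temporal segment; start and end are assumed to be in seconds."""
    name: str
    start: float
    end: float


def satisfies_contains(domain: Segment, ranges: Sequence[Segment]) -> bool:
    """Evaluate range[1].start >= domain.start and range[n].end <= domain.end.

    `ranges` is assumed to be in sequence order (order="Seq" in the declaration).
    An empty sequence is accepted, mirroring a zero-or-more occurrence.
    """
    if not ranges:
        return True
    first, last = ranges[0], ranges[-1]
    return first.start >= domain.start and last.end <= domain.end


# Hypothetical example data: one scene containing two consecutive shots.
scene = Segment("scene1", 0.0, 10.0)
shots = [Segment("shot1.1.1", 0.0, 4.0), Segment("shot1.1.2", 4.0, 10.0)]
overrun = [Segment("shot1.1.3", 2.0, 12.0)]

print(satisfies_contains(scene, shots))
print(satisfies_contains(scene, overrun))
```

Like the constraint on the slide, this sketch inspects only the first and last items of the sequence; it does not verify that intermediate items are ordered or non-overlapping.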