Internet Streaming Media Metadata Interchange with MPEG-7

Eric Rehm
CTO, singingfish.com
Thomson multimedia
4 May 2001, Hong Kong

Overview
• Brief look at Singingfish
• Indexing Internet streaming media
• Automating metadata delivery and processing
• Case Study: Using XSL to transform MSNBC schema to MPEG-7

singingfish.com
• Wholly-owned subsidiary of Thomson Multimedia
• B2B Streaming Media Search Service
• Pay per query business model
• Over 15 M streams indexed
• Live with customers since Jan 2000
  – InfoSpace: Metacrawler, Dogpile
  – Inside Internet AG: Swiss-Search, Austria-Search
• Involved with MPEG-7 standards development since Sept 1999

Service Model

Indexing Streaming Media
• High quality metadata improves relevancy of multimedia search results
• Crawl … or … work directly with multimedia “Content Producers” to acquire quality metadata
• Solution: Implement FTP push/pull of metadata
  – Automated processing upon FTP close
  – Support bulk or incremental operations: add, update, delete, reset
  – Future: SOAP or other W3C XML protocol

Design
[Architecture diagram. Labels: Content Producer Program, Content Producer XML Feed, FTP, Java FTP Servlet, Scheduler, XSL Engine, Metadata Engine, Importer, JDBC, Workflow RDBMS, Promotion, Search Index, Singingfish Search Engine, Query / Response]

Development Goals
• Single metadata schema interface to a database
  – Control development costs
  – Partition engineering and content development
• Adapt to any “content partner” metadata
  – XML, CSV, Excel, Virage VDF, …
  – Transform “content partner” metadata to MPEG-7 via:
    • Custom applications (CSV, Excel) → MPEG-7
    • Proprietary XML schemas → XSL → MPEG-7

Case Study
Create XSL transformation
• From:
  – MSNBC "Partner XML Format"
• To:
  – MPEG-7 Description

Experimental Results
• XSL Stylesheet: 370 lines of lightly commented code

  File                        lines   chars   elements   attrs
  MSNBC Partner XML Example      73    1199         58      16
  MPEG-7 Result                 263    4471        151      74

Discussion
• Basic MPEG-7 Tools
• Semantic Encoding of MSNBC Keywords into MPEG-7 Structured Annotation DS (Who, What, Where, When, Why, How)
• Encoding Controlled Terms using namespaces
• Encoding Streaming Media Validity with the Availability DS
• Extending an MPEG-7 DS

MSNBC Video Distribution Entry
[Screenshot of the MSNBC page for this clip]
  tdy_fletcher_mideast_001023
  Keywords: Israel, palestinian, Yasser Arafat
  Top News Order: 12
  Peace hopes slip farther
  The slim hopes for peace in the Mideast are rapidly fading, NBC’s Martin Fletcher reports Monday from the outskirts of Jerusalem.
  Today’s show
  • Barak, Sharon talk coalition
  • What’s on Today
  • What’s on Weekend Today

MSNBC <article>

  <article storyorder="12" pubdate="10/23/2000 8:02:00 AM" source="Today show" topnews="12">
    <filename>tdy_fletcher_mideast_001023</filename>
    <duration>00:01:09</duration>
    <headline>Peace hopes slip farther</headline>
    <description>The slim hopes for peace in the Mideast are rapidly fading, NBC&amp;#146;s Martin Fletcher reports Monday from the outskirts of Jerusalem.</description>
    <keywords>Israel, palestinian, Yasser Arafat</keywords>
    ...
  </article>

MPEG-7 link to stream

  <MediaInformation>
    <MediaProfile>
      <MediaInstance>
        <MediaLocator>
          <MediaUri>http://www.msnbc.com/news/asx/video/28/tdy_fletcher_mideast_001023.asx</MediaUri>
        </MediaLocator>
      </MediaInstance>
    </MediaProfile>
  </MediaInformation>

<headline>

  <headline>Peace hopes slip farther</headline>

  <CreationInformation>
    <Creation>
      <Title>
        <xsl:value-of select="headline"/>
      </Title>
    </Creation>
  </CreationInformation>

<description>, <keywords>

  <description>The slim hopes for peace in the Mideast ...</description>
  <keywords>Israel, palestinian, Yasser Arafat</keywords>

  <Abstract>
    <FreeTextAnnotation>
      <xsl:value-of select="description"/>
    </FreeTextAnnotation>
    <KeywordAnnotation>
      <Keyword>Israel</Keyword>
      <Keyword>Yasser Arafat</Keyword>
    </KeywordAnnotation>
  </Abstract>

Enhanced <keywords>

  <keywords>Israel, palestinian, Yasser Arafat</keywords>

  <Abstract>
    <Who>
      <Name>Yasser Arafat</Name>
    </Who>
    <WhatObject>
      <Name>palestinian</Name>
    </WhatObject>
    <Where>
      <Name>Israel</Name>
    </Where>
  </Abstract>

Encoding Controlled Terms
1. Singingfish.com Genres are described in one namespace (urn:sf:genre).
2. MSNBC Genres are described in another namespace (urn:msnbc:category).

Encoding Controlled Terms

  <categories>
    <category id="News">
      <topics>
        <topic>International</topic>
      </topics>
    </category>
  </categories>

  <xsl:variable name="sfCategory"
                select="singingfish:mapper.map(string(category[1]/@id))"/>
  <Genre href="urn:sf:{$sfCategory}"/>
  <Genre href="urn:msnbc:category:{category[1]/@id}">
    <Term type="NT" termId="{category[1]/topics/topic[1]}"/>
  </Genre>

Extending an MPEG-7 DS
[UML class diagram]
  mpeg7:UsageInformationType
    +Rights : mpeg7:RightsType
    +FinancialResults : mpeg7:FinancialType
    +Availability : mpeg7:AvailabilityType
    +UsageRecord : UsageRecordType
  sf:UsageInformationType (extends mpeg7:UsageInformationType; 0..1 sf:PublicationType)
  sf:PublicationType
    +Publisher : mpeg7:AgentType
    +PublicationLocation : mpeg7:LocationType
    +Publication : mpeg7:TimeType
    +Rights : mpeg7:RightsType

Extending an MPEG-7 DS

  <complexType name="PublicationType">
    <complexContent>
      <extension base="mpeg7:DSType">
        <sequence>
          <element name="Publisher" type="mpeg7:AgentType" minOccurs="0"/>
          <element name="PublicationLocation" type="mpeg7:PlaceType" minOccurs="0"/>
          <element name="PublicationDate" type="mpeg7:TimeType" minOccurs="0"/>
          <element name="Rights" type="mpeg7:RightsType" minOccurs="0"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>

Extending an MPEG-7 DS

  <complexType name="UsageInformationType">
    <complexContent>
      <extension
          base="mpeg7:UsageInformationType">
        <sequence>
          <element name="Publication" type="sf:PublicationType" minOccurs="0"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>

Extending an MPEG-7 DS

  <UsageInformation xsi:type="sf:UsageInformationType">
    ...
    <Publication>
      <Publisher xsi:type="mpeg7:OrganizationType">
        <NameTerm href="urn:sf:publisher:MSNBC"/>
      </Publisher>
      <PublicationLocation>
        <Country>us</Country>
        <Region>wa</Region>
      </PublicationLocation>
      <PublicationDate>
        <TimePoint>2000-10-23T14:20:00</TimePoint>
      </PublicationDate>
    </Publication>
  </UsageInformation>

Summary
• Quality search depends on quality metadata
  – MPEG-7 standards ease development costs
  – Controlled vocabularies
• MPEG-7 MDS can be used to interoperate
• XML Schema allows controlled extensions

Thank you
singingfish.com

Optional MPEG-7 Background Slides

MPEG-7 Basics
• ISO/IEC 15938 Multimedia Content Description Interface
• Comprehensive set of audiovisual description tools
• Enabled by key Internet standards:
  – W3C: XML, XML Schema
  – IETF standards: URI, URN, URL for resource naming and location
• Harmonized with other emerging metadata standards:
  – Dublin Core, MPEG-21, NewsML, SMPTE Metadata Dictionary, TVAnytime, and more
• Text and compressed binary encodings
  – Both encodings have streaming add, delete, update features for delivery over real-time transports: MPEG-2, MPEG-4, IP, etc.
• International Standard in October 2001
  – Ballot period begins 14 March 2001

[MPEG-7 MDS overview diagram, repeated on the following slides. Boxes: Basic elements (Schema tools, Datatype & structures, Link & media localization, Basic DSs); Content management (Creation & production, Media, Usage); Content description (Structural aspects, Conceptual aspects); Navigation & Access (Summary, Variation); Content organization (Collection & Classification, Model); User Interaction (User preferences, Usage History)]

Basic elements
• Textual Annotation (free text, structured annotation, syntactic dependency, etc.)
• Controlled vocabularies, Agent, Place, Graph, etc.
• Time, Duration, Medialocators

Content Management & Description
• Media: Format, Coding, Instances, Identification, Transcoding Hint, etc. (Several instances)
• Creation & production: Title, Creator, Creation location & date, Purpose, Classification, Genre, etc. (Author generated)
• Usage: Rights holder, Access rights, Usage Record, Financial aspects, etc. (Evolution)

Viewpoint of the structure: Segments
• Spatial / temporal structure
• Audio, video low-level Ds
• Elementary semantic information

Content Management & Description (Conceptual aspects)
Viewpoint of conceptual notions
• Events, objects, abstract concepts, and their relation

Navigation and Access
• Summary: Efficient support of discovery, browsing, navigation, visualization / sonification

Navigation and Access
• Variation: Substitution of the original content
  – Adaptation to terminal, network, or user preferences

Content Organization
• Collection & Classification: Description and organization of collection of documents
• Probability Model: Statistical functions and structures to describe sample of AV content and classes of descriptors
• Analytic model: Definition of cluster, classes and models to associate a semantic label to a set of data

User Interaction
• User identification and preferences: Filtering, search and browsing
• User preferences
• Usage History

MPEG-7 DDL
• XML Schema
• Data type extensions
  – MIME type, ISO country, region, currency codes
  – ISO Character set codes
  – Revised time data types to support arbitrary fractional seconds denominator for per-frame positioning
    • 2001-05-01T15:23:46N11F30 (11th frame @ 30 FPS)
• Type-centric approach using root abstract types
  – Control available global elements
  – Allow extension via name spaces and <extension> mechanism

Basic Derivation of MPEG-7 Types

  <complexType name="Mpeg7RootType" abstract="true">
    <complexContent>
      <restriction base="anyType"/>
    </complexContent>
  </complexType>

  <complexType name="DSType" abstract="true">
    <complexContent>
      <extension base="mpeg7:Mpeg7RootType">
        <sequence>
          <element name="Header" type="mpeg7:HeaderType" minOccurs="0" maxOccurs="unbounded"/>
        </sequence>
        <attribute name="id" type="ID" use="optional"/>
      </extension>
    </complexContent>
  </complexType>

Creation Description Scheme

  <complexType name="CreationType">
    <complexContent>
      <extension base="mpeg7:DSType">
        <sequence>
          <element name="Title" type="mpeg7:TitleType" maxOccurs="unbounded"/>
          …
          <element name="Creator" type="mpeg7:CreatorType" minOccurs="0" maxOccurs="unbounded"/>
          …
        </sequence>
      </extension>
    </complexContent>
  </complexType>
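
Illustrative sketch (not from the original slides)

The case study's field mappings (<filename> to MediaUri, <headline> to Title, <description> to FreeTextAnnotation, <keywords> to Keyword elements) can be sketched outside XSL to make the individual steps explicit. The Python below is a minimal illustration and not the 370-line stylesheet described in the talk. Several things are assumptions: the function name article_to_mpeg7, the uri_prefix parameter (the slides show the final URI but not how its directory part is derived), the Mpeg7Sketch wrapper element, the placement of Abstract directly under that wrapper, and the omission of namespaces. The sample description text is abbreviated.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample based on the MSNBC <article> slide (description shortened).
ARTICLE_XML = """
<article storyorder="12" pubdate="10/23/2000 8:02:00 AM" source="Today show" topnews="12">
  <filename>tdy_fletcher_mideast_001023</filename>
  <duration>00:01:09</duration>
  <headline>Peace hopes slip farther</headline>
  <description>The slim hopes for peace in the Mideast are rapidly fading.</description>
  <keywords>Israel, palestinian, Yasser Arafat</keywords>
</article>
"""


def article_to_mpeg7(article, uri_prefix):
    """Build an MPEG-7-style element tree from one MSNBC <article> element.

    uri_prefix is a hypothetical parameter; the slides do not show how the
    directory part of the stream URI is computed.
    """
    # Hypothetical wrapper element, used only to hold the fragments together.
    root = ET.Element("Mpeg7Sketch")

    # <filename> -> MediaInformation/MediaProfile/MediaInstance/MediaLocator/MediaUri
    parent = root
    for tag in ("MediaInformation", "MediaProfile", "MediaInstance", "MediaLocator"):
        parent = ET.SubElement(parent, tag)
    filename = article.findtext("filename", "").strip()
    ET.SubElement(parent, "MediaUri").text = uri_prefix + filename + ".asx"

    # <headline> -> CreationInformation/Creation/Title
    creation = ET.SubElement(ET.SubElement(root, "CreationInformation"), "Creation")
    ET.SubElement(creation, "Title").text = article.findtext("headline", "").strip()

    # <description> -> Abstract/FreeTextAnnotation
    # (the parent of Abstract is not shown on the slides; placing it here is an assumption)
    abstract = ET.SubElement(root, "Abstract")
    free_text = ET.SubElement(abstract, "FreeTextAnnotation")
    free_text.text = article.findtext("description", "").strip()

    # Comma-separated <keywords> -> one Keyword element per non-empty entry
    keyword_annotation = ET.SubElement(abstract, "KeywordAnnotation")
    for keyword in article.findtext("keywords", "").split(","):
        if keyword.strip():
            ET.SubElement(keyword_annotation, "Keyword").text = keyword.strip()

    return root


article = ET.fromstring(ARTICLE_XML.strip())
description = article_to_mpeg7(article, "http://www.msnbc.com/news/asx/video/28/")
print(ET.tostring(description, encoding="unicode"))
```

The sketch emits every keyword as a plain Keyword element. The "Enhanced <keywords>" slide instead assigns each keyword to Who, WhatObject, or Where, which requires knowing what kind of entity each keyword names, and that classification step is not attempted here.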