Microsoft Interactive Media Manager Metadata Model
White Paper
September 2008
John Devaney
Microsoft Corporation

The information contained in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2008 Microsoft Corporation. All rights reserved.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Applies To

- Microsoft Interactive Media Manager
- Microsoft Office SharePoint® Server 2007

Summary

Interactive Media Manager (IMM) uses Semantic Web technologies to provide enhanced metadata management for media files. The use of the Resource Description Framework (RDF) specification for metadata and information modeling enables IMM to deliver a flexible notation platform, which you can search by using SPARQL Protocol and RDF Query Language (SPARQL).

IMM provides a default ontology to represent objects and concepts. This open World Wide Web Consortium (W3C) metadata description standard is used throughout the IMM Web Parts. The ontology is extensible and enables users to define any object. Ontologies can be extended or developed by using the Microsoft Visual Studio® 2005 or 2008 development system ontology add-on for IMM or the upcoming Metadata Editor Web Part.

Figure 1 shows the position of the RDF Semantic Store in the IMM architecture.

Figure 1 Semantic Metadata Store position in the IMM architecture

Contents

Summary
Contents
Objectives
Overview
    Goals of the Interactive Media Manager Metadata Model
The Semantic Web
    Resource Description Framework Overview
    Web Ontology Language Overview
    SPARQL Protocol and RDF Query Language
Interactive Media Manager Metadata Model
Interactive Media Manager Semantic Metadata Store
IMM Ontology
    Ontology Definition in Interactive Media Manager
    Defining a New Ontology and Namespace
    IMM Metadata Editor
Web Ontology Language
    Working with Visual Studio 2005
Searching the IMM Repository
    IMM Semantic Metadata Service
Conclusion
Appendix 1

Objectives

- Set the business and technical landscape for the IMM metadata model.
- Describe the structure of the metadata model.
- Describe the key components of the IMM metadata model (RDF, OWL, and SPARQL).
- Describe how to use the metadata model and its associated tools.

Overview

This white paper describes the technologies that the IMM metadata model uses to hold, manage, and query IMM asset metadata. IMM is built on Microsoft Office SharePoint® Server 2007 technologies, but it extends them in a number of areas, particularly in its use of RDF to model metadata. The use of RDF triples means that IMM adds a new Semantic Metadata Store database, previously known as the Metadata Repository, to the standard Office SharePoint Server search database. While enterprise search is still available so that Office SharePoint Server 2007 users can find documents, the Semantic Store database has an additional search function that uses SPARQL to mine media metadata.

IMM is developing a rich tool set that uses Web Parts and other Microsoft solutions, such as Visual Studio 2005 and 2008, to support the use and interrogation of the metadata model. The paper describes the use of the Semantic Web, RDF, ontologies, Web Ontology Language, and SPARQL in and with IMM.

Goals of the Interactive Media Manager Metadata Model

IMM is designed to help users who work with rich media to ingest, manage, edit, and distribute content. While IMM is based on Office SharePoint Server 2007 and offers collaborative capabilities, enterprise content management, and search capabilities, IMM is designed specifically to work with media files, and has been modified to integrate with the wider media industry. Key to this is the addition of the IMM Semantic Metadata Store.
IMM uses Semantic Web technologies to express, interchange, store, and search media asset metadata. This is because with RDF and OWL, developers and users can add business-specific and technical information to media assets.

Metadata stored in Digital Asset Management (DAM) systems is often based on a variety of vocabularies and data models. This makes it more difficult to exchange metadata between disparate systems and different organizations. A fixed-schema metadata store can also make it more difficult to change or extend the metadata model. The use of RDF and OWL makes it easier to share information, both internally and with partners, because RDF and OWL are W3C standards.

The resulting metadata is stored in the Semantic Metadata Store. This is a SQL Server 2005 or 2008–based RDF engine. Unlike a more traditional relational database, which implicitly encodes the data semantics, RDF has an open standard for specifying the semantics of the data model. For both technical and business users this means that IMM can integrate with current, proposed, or unknown DAM systems that use RDF or a relational database for metadata.

The Semantic Web

The term Semantic Web describes Tim Berners-Lee’s vision of a World Wide Web in which computers can interpret information found on the Web. Berners-Lee states that: “For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.” (Tim Berners-Lee, James Hendler, and Ora Lassila. “The Semantic Web.” Scientific American Magazine, May 17, 2001.)

IMM uses a Semantic Web paradigm in the definition, interpretation, and interrogation of metadata in its metadata model.
Three of the formal specifications that are widely described as Semantic Web tools are:

- Resource Description Framework (RDF)
- Web Ontology Language (OWL)
- SPARQL Protocol and RDF Query Language (SPARQL)

Resource Description Framework Overview

RDF is a W3C specification for metadata and information modeling. RDF describes resources in triples, which means that each statement about a resource consists of a subject, a predicate, and an object. The subject refers to the resource, usually in the format of a Uniform Resource Identifier (URI), the predicate defines a feature of the resource, and the object specifies a value for the feature. A set of RDF triples is also referred to as an RDF graph, because the data model can be drawn as a graph of relationships between subjects, predicates, and objects.

Web Ontology Language Overview

An ontology defines classes, properties, and rules. An ontology is to RDF what a schema is to a database. In an ontology, you define common things. Ontologies define objects and the types of properties and property values that the objects can have. There might be ontologies for different types of users, such as editors, but the RDF triple structure stays the same.

SPARQL Protocol and RDF Query Language

SPARQL Protocol and RDF Query Language (SPARQL) is an RDF query language designed to work with triples. It can be compared to T-SQL for relational data in a SQL database, except that it is developed to query RDF graph data.

Interactive Media Manager Metadata Model

The IMM metadata model is based on the use of RDF graphs to hold information about the resources within a system. The triple structure means that there is no database schema that needs to be modified if changes occur. New triples can be defined and included as new items are defined by your business.

Figure 2 RDF Graph

In Figure 2, and the tabular representation in Table 1, you can see how the subject can have multiple predicates and objects.
Even so, IMM requires that every triple be unique. The language and type options, which make up elements four and five of a triple, are used to define the language and XML Schema Definition (XSD) type.

Table 1 – RDF Triple with Type
Subject | Predicate | Object
http://litwareinc/imm/resources/item1 | dc:title | Finding Me
http://litwareinc/imm/resources/item1 | dc:description | A young man loses his memory …
http://litwareinc/imm/resources/item2 | Imm.videoFormat | http://www.litwareinc.com/imm/format/uid
http://litwareinc/imm/resources/item2 | dc:title | Another Movie
http://litwareinc/imm/resources/item3 | dc:description | In a state far away …
http://litwareinc/imm/resources/item4 | dc:title | Another Movie II

Interactive Media Manager Semantic Metadata Store

The IMM Semantic Metadata Store is a Microsoft SQL Server 2005 or 2008–based RDF engine. The key difference between the IMM Semantic Metadata Store and a more traditional relational database is the representation of the data model semantics. A relational database uses database features, such as foreign keys and tables, to encode semantics. This means that the client application must understand how to interpret the database semantics, which can make it difficult to import data from other sources because the semantics must be converted to match the current model. In the RDF model, all tables, regardless of the source of the data, have the same format, as shown in Table 1. This means that there is no need for complex changes to schemas to manage integration.

IMM Ontology

The IMM metadata model is based on the MPEG-21 Digital Item Declaration Language (DIDL) Abstract Model (see ISO 21000-2 section 6.2), represented in RDF. The IMM ontology is designed to maximize integration between multiple data sources by providing the majority of facilities for data management. IMM uses the Dublin Core ontology, and the aim is for this ontology to be used across all repository types.
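As an illustration of the uniform triple format described for the Semantic Metadata Store above, the following minimal Python sketch keeps five-element triples in a set, so that data merged from another source needs no schema change and duplicates are rejected. The `Triple` and `TripleStore` names, the `"en"` language tag, and the choice to compare all five elements for uniqueness are assumptions made for this sketch; they are not part of the IMM API.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    language: Optional[str] = None   # element four (assumed optional)
    xsd_type: Optional[str] = None   # element five (assumed optional)


class TripleStore:
    """Hypothetical in-memory store; every row has the same shape."""

    def __init__(self):
        self._triples = set()

    def __len__(self):
        return len(self._triples)

    def add(self, triple):
        """Add a triple; return False if an identical triple already exists."""
        if triple in self._triples:
            return False
        self._triples.add(triple)
        return True

    def merge(self, triples):
        """Import triples from any source; return how many were new."""
        return sum(self.add(t) for t in triples)

    def about(self, subject):
        """Return all triples for one subject, ordered by predicate."""
        matches = [t for t in self._triples if t.subject == subject]
        return sorted(matches, key=lambda t: (t.predicate, t.obj))


ITEM1 = "http://litwareinc/imm/resources/item1"
store = TripleStore()
first = store.add(Triple(ITEM1, "dc:title", "Finding Me", language="en"))
again = store.add(Triple(ITEM1, "dc:title", "Finding Me", language="en"))

# Data from another system arrives in the same shape, so it merges directly.
imported = store.merge([
    Triple(ITEM1, "dc:description", "A young man loses his memory ..."),
    Triple(ITEM1, "dc:title", "Finding Me", language="en"),  # duplicate
])
```

In this sketch the second `add` call and the duplicate inside `merge` are both rejected, which mirrors the uniqueness requirement without any table or key definitions.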
To cater to the majority of media description requirements, IMM also provides the IMM ontologies. IMM applies the DIDL abstract model to base classes in the IMM Core OWL ontology and extends the abstract base classes for use within the IMM framework.

The IMM Software Development Kit (SDK) documentation (available at http://msdn.microsoft.com/en-us/library/bb971663.aspx) contains two versions of the IMM OWL ontology: the IMM Full ontology and the IMM Core ontology. The Full ontology provides all of the metadata and classes that can be used for video content, while the Core ontology provides all of the classes and a minimal set of metadata necessary for IMM Web Part and Workflow operation. The SDK documentation primarily uses the IMM Core ontology in its examples.

The reference to definitive ontology namespaces, such as the Dublin Core, combined with the function-specific ontologies from IMM means that the majority of class and metadata definitions already exist, so that customers do not need to undertake large-scale ontology development. Instead, they can extend the existing ontologies to include any unique items. For a full list of referenced namespaces, see Appendix 1.

In addition to the classes and properties defined in an ontology, a customer can develop rules. Rules can specify the properties, or lack of properties, that make a video item a video item, and different from an audio item. Rules use a reasoned, rather than a defined, approach, so that identifying objects within the core data uses rule-based deduction, rather than loading the data and then handling exception errors. It is possible to extend rule sets or derive from existing rule sets. When a new ontology is created, the developer can code an import at the top of the XML file and refer to an existing file from Microsoft or elsewhere.
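The idea that a rule identifies an item by the properties it has, or lacks, can be sketched in a few lines of Python. This is only an illustration of rule-based deduction over triples, not the IMM rule syntax: the `ex:frameRate` and `ex:sampleRate` predicates, the rule table, and the `classify` function are all hypothetical.

```python
def properties_of(triples, subject):
    """Collect the set of predicates that a subject actually carries."""
    return {p for s, p, _ in triples if s == subject}


# Each hypothetical rule: (class label, required predicates, forbidden predicates)
RULES = [
    ("VideoItem", {"ex:frameRate"}, set()),
    ("AudioItem", {"ex:sampleRate"}, {"ex:frameRate"}),
]


def classify(triples, subject, rules=RULES):
    """Deduce class labels from the presence or absence of properties."""
    present = properties_of(triples, subject)
    return [
        label
        for label, required, forbidden in rules
        if required <= present and not (forbidden & present)
    ]


triples = [
    ("item1", "dc:title", "Finding Me"),
    ("item1", "ex:frameRate", "25"),
    ("item1", "ex:sampleRate", "48000"),
    ("item2", "dc:title", "Theme Music"),
    ("item2", "ex:sampleRate", "44100"),
    ("item3", "dc:title", "Untyped Item"),
]

video = classify(triples, "item1")
audio = classify(triples, "item2")
unknown = classify(triples, "item3")
```

Here an item with a frame rate is deduced to be a video item even though it also has a sample rate, and an item that matches no rule simply receives no label rather than raising an error.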
The same data is interpreted in different ways depending on the entry point (for example, the user type, the customer accessing the data, or how the data is used), which means that multiple rule sets are available. This is particularly useful for workflow development because it is possible to define rules that identify which files are metadata complete and which are not.

Ontology Definition in Interactive Media Manager

IMM metadata model elements can be created from the base Object class, as shown in Figure 3. These might be items, such as VideoItem or ImageItem, or containers, such as folders or projects.

- imm:Object Class – This is the base class that all classes in the model inherit from, and it provides base-level functionality to all classes that are required by the IMM Web Parts. The Object class should be treated as sealed.
- did:Item Class – This is a first-level class derived from Object. An item most often represents a single piece of data, a movie, or an audio file.
- did:Container Class – This is a first-level class derived from Object. It represents a physical or logical collection of objects.
- MediaItem (did:Item) – This is a base class to be used for extending predicates. It should apply globally to all media type items.
- VideoItem (imm:MediaItem) – This is an instance and base class for items that are temporally based. It can exist in either a digital or a physical format.
- AudioItem (imm:MediaItem) – This is an instance and base class for all items that are audio only. It can exist in either a digital or a physical format.
- ImageItem (imm:MediaItem) – This is a base class, although it can be used to describe an instance. It can store global image predicates along with extensions from image vocabularies.
- Folder (did:Container) – This class is a container-type virtual folder that exists only in metadata through the Semantic Metadata Service.

Figure 3 Item and folder classes

Figure 4 shows an item instance.
This can have properties based on the following:

- IMM predicates, or Dublin Core, custom, and domain-specific properties.
- Pointers to other related items, by using the did:ItemCollection predicate.
- Pointer to an instance of type Resource, by using the did:Resource predicate.
- Annotations, by using the IMM:AnnotationCollection predicate. An annotation can contain custom predicates and pointers to Anchors.
- did:Resource class, which points to the physical representation of an item. These are equivalent to the MPEG-21 DID abstract Item Class.
- did:Annotation class, which is a container for notes, digital ink, and other notations about an item. Annotations can be anchored by time or location in an item.
- did:Anchor class, which binds the annotation to its temporal or location point.

Figure 4 Item Instance

All IMM items should be assigned a unique identifier. These identifiers are stored in the Digital Item Identification (DII) (see ISO 21000-3) predicates provided in the IMM Core ontology. The predicates are composed of two properties: identifier and type. In IMM, the type predicate should be a string that identifies the Registration Authority or identification system used. The type is optional if a valid Uniform Resource Name (URN) is used in the identifier.

Defining a New Ontology and Namespace

There are a number of tools that you can use to create new ontologies or edit existing ones. In addition to third-party tools, such as the Altova SemanticWorks Semantic Web tool, IMM version 2.0 will provide the Metadata Editor Web Part (previously called the Ontology Editor Web Part).

IMM Metadata Editor

In IMM version 2.0, it will be possible to create and edit a unique ontology by using the IMM Metadata Editor Web Part. The integration of the Metadata Editor allows IMM users to load an existing ontology from the RDF data store via HTTP upload, and then modify the ontology by adding or deleting classes, fields, and rules.
Users can edit security at the ontology level, which gives users and groups the ability to read, write, and view fields and classes of the ontology.

The Metadata Editor provides access to definitions for fields, classes, rules, and namespaces. In addition, users can set a rollback point and can reset the ontology to the previous version to help manage and standardize change.

The Metadata Editor displays all of the associated classes for an ontology, class descriptions, and namespaces. Users can also add fields from existing classes to the specified class. With the Metadata Editor, a user can create new classes; view, edit, and delete existing classes; and edit class definitions and settings, such as permission and management.

Fields across all classes can be displayed in a Field Library, so that users can create new fields and view or edit existing fields.

Rules associated with an ontology are displayed in a Rules Library. This library enables users to create new rules or copy an existing rule to a new rule. Rules can be filtered by title, description, or active status.

Web Ontology Language

The Web Ontology Language (OWL) is a language with which you can write ontologies. IMM uses the OWL Full implementation, which contains all of the OWL constructs and provides free, unconstrained use of RDF constructs. IMM aims to be OWL Full for repository processing, although the .NET classes are not OWL Full.

Working with Visual Studio 2005

Visual Studio 2005 provides an Ontology Objects add-on that lets a user generate .NET classes from an existing ontology file (see Figure 5). This replaces the Owl.exe tool that was made available for use with earlier implementations of IMM.

Figure 5 Ontology Objects add-on

The Ontology Objects add-on is found in Visual C# Items. By using this add-on, you can create a new ontology based on importing OWL files that are converted to C# classes. You can retrieve Web-based or local files, as you can see in Figure 6.
Figure 6 Ontology file selection

The add-on also parses required imports, such as Dublin Core elements, which you might have included, as shown in Figure 7. The tool can find all of the classes and properties defined in the selected ontology, and provide a URI for each.

Figure 7 Ontology imports selection

When you are satisfied with your selections, the add-on adds a new XML file to your Visual Studio project. The new file provides a parsed RDF file and a .cs file with the generated classes. You can then use these classes within the IMM API.

In addition to the editor options from IMM and Visual Studio, you can use a third-party ontology editor, such as SemanticWorks from Altova.

Searching the IMM Repository

You can use SPARQL to query the IMM Semantic Metadata Store. The structure of a SPARQL query is similar to that of a T-SQL query for a relational database. A SELECT statement defines the variables that are returned as a table. In the WHERE clause of the SELECT statement, you insert triple patterns (subject, predicate, and object). Any of these can be replaced by a variable (written with a leading ?), which acts as a wildcard. The code sample shows a SPARQL query over a graph pattern, or tree collection, of triple patterns.

SPARQL Query Statement

    PREFIX did: <urn:mpeg:mpeg21:2002:02-DIDMODEL-NS#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX imm: <http://schemas.microsoft.com/imm/core/1.0#>
    SELECT ?title
    WHERE {
      <guid:_A101E8DE-C4F6-48a6-8DF3-E21034084282> <did:ItemCollection> ?childId.
      ?childId <dc:title> ?title;
               <rdf:type> <imm:MediaItem>.
    }

Description Notes

- PREFIX lines: namespaces defined.
- SELECT ?title: ? = variable, in this case a variable called title.
- WHERE {: { defines the graph pattern start.
- First pattern: subject, predicate, object.
- Second pattern: subject, predicate, object; the trailing ; means that the next line reuses the same subject.
- Third pattern: predicate and object only; the type is MediaItem.
- }: } defines the graph pattern end.

The prefix for each namespace is declared at the top of a query, which makes it easier to read. SPARQL is optimized for traversing an RDF graph.
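To make the way the WHERE clause is evaluated more concrete, the following Python sketch matches the three triple patterns of the sample query against a small in-memory list of triples. It is a simplified illustration of basic graph pattern matching, not the IMM query engine: the `select` and `match_pattern` functions and the sample data (including the short `item1` to `item3` identifiers) are assumptions, and prefixed names are treated as plain strings.

```python
def is_variable(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend a binding so that the pattern equals the triple."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if is_variable(pattern_term):
            # Bind the variable, or check it against its existing value.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return extended


def select(variables, patterns, triples):
    """Return one row per solution that satisfies every pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings]


PARENT = "guid:_A101E8DE-C4F6-48a6-8DF3-E21034084282"
triples = [
    (PARENT, "did:ItemCollection", "item1"),
    (PARENT, "did:ItemCollection", "item2"),
    ("item1", "dc:title", "Finding Me"),
    ("item1", "rdf:type", "imm:MediaItem"),
    ("item2", "dc:title", "Another Movie"),
    ("item2", "rdf:type", "did:Container"),
    ("item3", "dc:title", "Another Movie II"),
    ("item3", "rdf:type", "imm:MediaItem"),
]

patterns = [
    (PARENT, "did:ItemCollection", "?childId"),
    ("?childId", "dc:title", "?title"),
    ("?childId", "rdf:type", "imm:MediaItem"),
]

rows = select(["?title"], patterns, triples)
```

With this data, only `item1` is both a child of the parent item and of type MediaItem, so `rows` contains the single title "Finding Me". A production engine would use indexes rather than scanning every triple for every pattern.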
You can apply rule sets to a query by applying rules from an ontology within the query. If necessary, you can convert a graph to a data table, for use with more traditional tools, such as SQL, or to manipulate with an object-oriented language.

IMM Semantic Metadata Service

IMM components communicate with the Semantic Metadata Store through the Semantic Metadata Service, previously known as the DAM Web Service. You can use an OLE DB driver to access the database, but the WCF Web Service is preferred because it maintains the Service-Oriented Architecture (SOA) of IMM.

The service interface provides methods for interacting with and retrieving graphs. This means that it is possible to retrieve a graph without having to write SPARQL queries. The major IMM methods available and their functional descriptions are shown in Table 2.

Table 2 – IMM Methods
Method | Description
GetResource | Accesses a resource when passed a subject and predicate(s). The method also requires instruction on whether to get child items, such as Annotations. The method returns RDF/XML, which can be processed on the client by loading the data into an OWL-generated object.
CopyResource | Clones a resource.
UpdateResource | Creates or updates a resource, depending on whether or not the resource already exists.
DeleteResource | Climbs the graph to delete all triples that reference the resource and climbs down to delete child items of the resource, if they are not child items of another object. Deletion is permanent.

An ontology is stored in the Semantic Metadata Store. Ontology methods that are similar to the RDF methods include GetType, UpdateType, and DeleteType.

Conclusion

The IMM metadata model provides the flexibility and extensibility that is essential when working with media assets. The use of Semantic Web technologies, such as RDF and SPARQL, means that IMM is based on popular W3C industry standards.
IMM can integrate with third-party storage and management systems because the IMM SOA exposes well-defined Web service APIs. IMM solutions are built on the highly successful Office SharePoint Server 2007 platform, which provides additional business support for an organization. Because IMM adheres to established Web standards, an organization can be confident that interoperability with new tools should be seamless.

Appendix 1

Namespaces and qualified names are used within IMM.

QName | Namespace | Description
imm | http://schemas.microsoft.com/imm/core/2.0/ | The base IMM framework namespace.
dc | http://purl.org/dc/elements/1.1/ | The Dublin Core namespace.
did | urn:mpeg:mpeg21:2002:02-DIDMODEL-NS# | The MPEG21 DID abstract model namespace.
didl | urn:mpeg:mpeg21:2002:02-DIDL-NS# | The MPEG21 Digital Item Declaration Language namespace. This is the XML schema implementation of the abstract model.
dii | urn:mpeg:mpeg21:2002:01-DII-NS# | The MPEG21 Digital Item Identifier namespace.
mediapro | http://ns.iview-multimedia.com/mediapro/1.0/ | Microsoft Expression Media vocabulary (iView Media Pro v3.1).
mpeg7 | urn:mpeg:mpeg7:schema:2001# | The MPEG7 namespace used to define broadcast metadata industry vocabulary elements.
photoshop | http://ns.adobe.com/photoshop/1.0/ | Adobe XMP Photoshop vocabulary.
xapRights | http://ns.adobe.com/xap/1.0/rights/ | Adobe XMP Rights Management Schema.
xap | http://ns.adobe.com/xap/1.0/ | Adobe XMP Basic Schema.
xmpMM | http://ns.adobe.com/xap/1.0/mm/ | Adobe XMP Media Management Schema.
xmpDM | http://ns.adobe.com/xmp/1.0/DynamicMedia/ | Adobe Dynamic Media Schema.
crs | http://ns.adobe.com/camera-raw-settings/1.0/ | Adobe Camera Raw Schema.
tiff | http://ns.adobe.com/tiff/1.0/ | Adobe EXIF Schema for TIFF.
exif | http://ns.adobe.com/exif/1.0/ | Adobe EXIF Schema for EXIF-specific Properties.
aux | http://ns.adobe.com/exif/1.0/aux/ | Adobe EXIF Schema for Additional EXIF Properties.
Iptc4xmpCore | http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/ | IPTC core vocabulary for XMP.
owl | http://www.w3.org/2002/07/owl# | The Web Ontology Language namespace.
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | The Resource Description Framework (RDF) syntax.
rdfs | http://www.w3.org/2000/01/rdf-schema# | RDF Schema.
xsd | http://www.w3.org/2001/XMLSchema# | XML Schema.
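A qualified name such as dc:title is shorthand for the namespace URI followed by the local name. The following Python sketch shows that expansion for a small subset of the prefixes listed in this appendix, with each namespace written with its terminating / or #, as in the PREFIX declarations of the SPARQL sample. The `expand` and `compact` helper names are hypothetical and are not part of the IMM API.

```python
# Subset of the Appendix 1 prefixes, chosen for illustration only.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "did": "urn:mpeg:mpeg21:2002:02-DIDMODEL-NS#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(qname, prefixes=PREFIXES):
    """Turn a qualified name such as 'dc:title' into a full URI."""
    prefix, separator, local_name = qname.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {qname!r}")
    return prefixes[prefix] + local_name


def compact(uri, prefixes=PREFIXES):
    """Turn a full URI back into a qualified name when a prefix matches."""
    candidates = [p for p, ns in prefixes.items() if uri.startswith(ns)]
    if not candidates:
        return uri  # no known namespace; leave the URI unchanged
    best = max(candidates, key=lambda p: len(prefixes[p]))
    return f"{best}:{uri[len(prefixes[best]):]}"


title_uri = expand("dc:title")
collection_uri = expand("did:ItemCollection")
type_qname = compact("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
```

Storing full URIs and compacting them only for display keeps triples from different sources comparable even when those sources choose different prefixes for the same namespace.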