Semantic Abstractions in the Multimedia Domain

Elina Megalou1,2 & Thanasis Hadzilacos1
1 Computer Technology Institute, and 2 Dept. of Computer Engineering and Informatics, University of Patras, Greece
Kolokotroni 3, GR-26221, Patras, Greece
e-mail: {megalou, thh}

Abstract -- Information searching by exactly matching content is traditionally a strong point of machine searching; it is not, however, how human memory works, and it is rarely satisfactory for advanced retrieval tasks in any domain, multimedia in particular, where the presentational aspects can be as important as the semantic information content of multimedia applications. This paper develops a combined abstraction of the conceptual and presentational characteristics of multimedia applications, leading on the one hand to their conceptual structure (with classic semantics of the real world modeled by entities, relationships and attributes) and on the other to their presentational structure (including media type, logical structure, temporal synchronization, spatial (on the screen) “synchronization”, and interactive behavior). Multimedia applications are construed as consisting of “Presentational Units” (PUs): elementary (with media object, play duration and screen position) and composite (recursive structures of PUs in the temporal, spatial, and logical dimensions). The fundamental concept introduced is that of Semantic Multimedia Abstractions (SMAs): qualitative abstract descriptions of multimedia applications in terms of their conceptual and presentational properties at an adjustable level of abstraction. SMAs, which could be viewed as metadata, form an abstract space to be queried.
A detailed study of possible abstractions (from multimedia applications to SMAs, and SMA-to-SMA), a definition and query language for Semantic Multimedia Abstractions (SMA-L) and the corresponding SMA model (equivalent to extended OMT), as well as an implementation of a system capable of wrapping the presentational structure of XML-based documents, complete this work, whose contribution lies in the classically fruitful boundary between AI, software engineering and database research.

Index Terms -- Multimedia data model, semantic modeling, abstraction, semantic multimedia abstraction, spatio-temporal retrieval, multimedia query language.

1. Introduction

Looking for a piece of information among many, that is, searching, is one of the basic tasks in computer science. The traditional approach to searching makes two assumptions: first, that we know exactly what we are looking for; second, that we can organize the data in which we are looking (the so-called search space). These assumptions do not hold in all searching situations. For instance, when we as humans try to recall information, we may have an approximate, fuzzy or incomplete description of it at a non-uniform level of detail; as for the “search space”, if it does have an organization, it escapes us.

This paper is about looking up multimedia applications; on CDs or on the Web they form a huge, distributed search space, loosely organized if at all, with a wealth of information. “Get me multimedia applications on music teaching for string instruments; I recall seeing one with audio and video (or was it a series of slides?) at the same time; half of the remaining screen was filled with music score and the rest with textual instructions or explanations; it included Paganini’s Moto perpetuo”. This is the type of query we deal with. Why? Because from cognitive science we know that this is how people remember.
But also because (and this has been the original motivation of this work) this is how multimedia applications may be specified, and such specifications, available from the design phase or to be reconstructed, would be a most suitable search space.

The name of the game is abstraction: a fundamental human cognitive process and skill [27], and a basic mathematical and computer science tool for problem solving in general [17], [22] and searching in particular [21]. It implies the transformation of our target objects by shedding some of their properties, those deemed irrelevant for the task at hand. Such a transformation may be so drastic that it changes the domain of discourse: from land parcels to rectilinear two-dimensional figures is the classic abstraction better known as Euclidean geometry. Properties such as color, weight and substance become irrelevant, and objects of the real world are mapped into their classes of equivalence. All this is discussed, as relevant background research, in Section 3.3.1.

From the example query, from our work on the specification of multimedia titles in series (a software engineering methodology developed to facilitate and automate the generation of classes of similar multimedia applications) [16], [37], [38], [53], and from a wealth of research during the past few years [2], [7], [10], [12], [19], [29], [39], [55], it is clear that we need to abstract on the conceptual and presentational characteristics of multimedia applications at the same time. The conceptual structure can be neatly captured with classic semantic models: entities, relationships and attributes are the basic tools, augmented with higher level structuring concepts (such as aggregation, grouping, and classification) for which both theory and tools are reasonably well developed [46], [9], [42]. For the presentational structure of multimedia applications more is needed, although a lot has been done [12], [19], [33], [34], [36], [54].
Our analysis (Sections 2.2.1 and 2.2.2) indicates media type, logical structure, temporal synchronization, spatial (on the screen) “synchronization”, and interactive behavior as being the main aspects. Our contribution to this analysis is the concept of Presentational Unit. A structurally scalable unit, the PU can be elementary (just a media object positioned in time and place within a multimedia presentation, i.e. augmented with playout duration and screen position) or composite (recursively consisting of simpler ones combined logically, synchronized temporally or put together on the screen). This is detailed in Section 2.2.3. The main contribution of the paper regards the analysis of abstractions in the multimedia domain. Semantic Multimedia Abstractions (SMA) are qualitative abstract descriptions of a multimedia application in terms of its conceptual and presentational properties at an adjustable level of abstraction (Section 2.3). For instance, while a multimedia application needs absolute temporal durations for its media objects, an SMA would only retain their relative temporal relationships. SMAs are metadata and they form an abstract space rather suitable for searching. The base abstraction leads from representations of multimedia applications (in XML for example) to SMAs; from then on hierarchies of abstract spaces can be created using SMA-to-SMA transformations which move up abstraction level by relaxing constraints, and wrapping temporal, spatial or logical structures. The admissible types of abstractions (with minor dependencies on the language used) are studied in detail in Sections 3.3 and 3.4 whereas the SMA definition and query language is given in Table 1 in BNF. Although not the subject of this paper, a system has actually been developed to support the SMA model –itself equivalent to OMT suitably extended with compatible definitions of temporal and spatial aggregation and grouping. 
The system has been used to exemplify our ideas using XML as the base document language, and Section 3.5 concludes the paper with an illustrative example.

Some of the most interesting research takes place at the boundaries of traditional computer science areas. This work is a contribution to one such classically fertile boundary, between databases, artificial intelligence and software engineering [9].

2. Semantic Abstractions for specifying, designing and constraining multimedia applications

2.1 Automating the generation of multimedia applications in series: Motivation and the MULTIS systems

We started studying abstraction in the multimedia domain while approaching a software engineering problem: how to specify a set of thematically and structurally “similar” multimedia applications (a multimedia series) in order to design and build a special-purpose authoring environment which facilitates the development of each multimedia application of the series. Towards this end, a production methodology called MULTIS (Multimedia Titles In Series) [16] was proposed, consisting of the following steps: initially, domain knowledge providers and multimedia designers identify the desired common properties of all multimedia applications in the series and produce a generic specification of the multimedia series in terms of them, called the “Model Title Specification”; based on the Model Title Specification, computer engineers build a special-purpose authoring system, called a MULTIS system, that embodies such properties in its own structure; using the MULTIS system, end-users “fill in” the particular properties of each personalized multimedia application and ask for the automatic generation of the application’s source code. Hence, the generic specification of a multimedia series is a fundamental issue in the MULTIS approach.
The Model Title Specification captures both the knowledge of the application domain (concepts and relations) and the presentational and behavioral properties of the multimedia applications, in a way generic enough to represent the whole series while adequately focused to result in an easy-to-use MULTIS system.

[Figure 1: The MULTIS Layered Architecture. Layers as labeled in the figure: Code Generator (Presentation Layer); Title Editor (Control Layer); Application Database (Application Layer); Multimedia Database (Data Layer); Model Title Specification (Specifications Layer).]

For each multimedia series, the corresponding MULTIS system consists of a custom multimedia database for the organization and storage of the domain-specific multimedia data and an editing environment for the definition (instantiation), storage and automatic generation of each multimedia application in the series. From the database point of view, MULTIS systems are special-purpose database systems enhanced with code generation functionality.

TECHNICAL REPORT No. TR99/09/06 3 COMPUTER TECHNOLOGY INSTITUTE 1999

The Model Title Specification guides the development of two database schemata, the schema of the multimedia database and that of the application database, and it determines the design of an application-specific user interface for the editing and instantiation of each personalized multimedia application. Both databases are modeled with object-oriented models; each multimedia application forms a network of objects which reflects its specific structure, behavior and data and is stored as an independent database.
In a MULTIS system, the knowledge about the presentational and structural characteristics of the multimedia applications in the series is embedded in each object class and is used in the code generation process; each object “knows how to present itself”: it produces its code and propagates a pertinent message to the appropriate objects of the network. Figure 1 depicts the MULTIS layered architecture; layers communicate with adjacent ones but operate independently, allowing the separation of the multimedia data from their presentation.

The MULTIS approach was validated in practice within the context of the EEC-funded project “Valmmeth” [53], whose aim was to demonstrate the feasibility and benefits of publishing series of multimedia applications using this technology. Based on the Model Title Specifications of four multimedia series (foreign language training applications, business presentations, point of information systems (POISs) and medical training applications) given by domain experts and multimedia designers, the corresponding four MULTIS systems were developed and tested at pilot sites in Greece, Belgium and the UK.

2.2 Identifying the properties of multimedia applications in a semantic abstract representation

2.2.1 The specification of a multimedia series as an abstraction process

At the specification and design stage of a software system, the goal is to capture the desired “functionality” while ignoring implementation details. Taking the MULTIS example, when developing a Model Title Specification the goal is to specify the functionality of a multimedia series by capturing only the desired common properties of the multimedia applications that the corresponding MULTIS system is able to produce, and ignoring those properties in terms of which these multimedia applications are allowed to differ. In other words, the Model Title Specification constrains the applications to be generated to those considered “identical” in terms of certain properties.
Hence, specifying a multimedia series is an abstraction process; the particular abstraction goal determines both the properties ignored at this stage and the selection of the “right abstraction level”. However, the “functionality” of multimedia applications pertains to their conceptual and presentational structure and behavior: real-world objects and relationships involved, spatio-temporal structure and synchronization during presentation, control flow and behavior in various events, etc. For instance, if the temporal issues of multimedia applications are of particular importance and should be specified in detail, the abstraction process may ignore details such as the multimedia content layout and text formatting and keep only the properties relevant to the temporal dimension. The decision on the "right abstraction level" in the MULTIS example is guided mainly by the desired diversity (or degree of similarity) of the produced multimedia applications and is a trade-off between the complexity of a MULTIS system and the range and diversity of multimedia applications that the system is able to produce: a high abstraction level leads to overly open MULTIS systems, which tend to resemble general-purpose authoring tools. For instance, if the temporal properties of a multimedia scene are specified using exact time instances and time distances from a specific time point (e.g. a video starts at time 5 from the beginning of the presentation, and at time 3 of its playout an image appears for a duration of 5), then only those scenes that conform to these strict temporal constraints are considered “valid”; hence, there is no diversity of scenes in terms of their temporal synchronization. At a higher abstraction level, similar specifications could be given using temporal relations instead of time instances, allowing several multimedia scenes to “fall under” these temporal constraints, e.g. “a video starts some time after the beginning of the scene while two images appear sequentially and in parallel with the video”.
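The step from exact time instances to qualitative temporal relations described above can be illustrated with a minimal sketch. It is not part of the MULTIS system; it assumes closed numeric intervals with start < end, uses Allen-style relation names as in [3], and the function names `allen_relation` and `generalize` are hypothetical. Treating the inverse relations "after" and "met-by" as sequential, and every other relation as parallel, is an assumption of this sketch; the text only states that "meets" and "before" are sequential.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end), assumed to satisfy start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the qualitative relation that holds from interval a to interval b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    if a_start < b_start < a_end < b_end:
        return "overlaps"
    # Only the inverses of before, meets and overlaps remain.
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    return "overlapped-by"


# Assumption of this sketch: inverses of "before" and "meets" count as sequential.
SEQUENTIAL = {"before", "meets", "after", "met-by"}


def generalize(relation: str) -> str:
    """Move one abstraction level up: keep only sequential versus parallel."""
    return "sequential" if relation in SEQUENTIAL else "parallel"


# The scene from the text: the video starts at 5; the image appears at 3 of the
# video's playout for a duration of 5. The video length (20) is an assumed value.
video = (5, 25)
image = (8, 13)
relation = allen_relation(image, video)
print(relation, generalize(relation))
```

Any scene whose image interval lies strictly inside its video interval yields the same qualitative description, which is the diversity the higher abstraction level is meant to admit.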
2.2.2 The Model Title Specification as a Semantic Abstract Description of a set of multimedia applications

In the MULTIS approach the Model Title Specification captures the following:

a. The conceptual structure of multimedia applications (application layer), which consists of the “real-world” objects of the application domain (objects that exist in the real, outside world; we call them conceptual structure objects or conceptual units), their attributes and relationships. For instance, in a multimedia series of tourist guides, a city, a hotel, or a museum are conceptual structure objects; a relationship could be “each city has one or more hotels”.

b. The presentational structure of multimedia applications (presentation layer), which consists of the presentational objects (objects that appear “on the screen” during the execution of the multimedia application; we call them presentational units), their attributes and relationships. The data types of multimedia content (called media objects) that each real-world object “is presented by” are the basic objects for building presentational units. The presentational structure reflects how the conceptual structure is mapped onto a multimedia application. For instance, if the conceptual structure includes that “a company consists of a number of departments”, one possible presentational structure is “a company is presented by a screen whose background is an organizational chart with a number of departments; each department is presented by one introductory screen, accessible through active hotspots on the organizational chart”. Note that the same conceptual structure can be mapped onto many different presentational structures, depending on the way the multimedia content of the conceptual objects is assembled and structured in a multimedia application.
2.2.3 On the Presentational Structure of Multimedia Applications

To define the properties of multimedia applications captured by the presentational structure we first define the concept of the Presentational Unit (PU).

Definition 2.1: An elementary Presentational Unit (PU) is a triplet $pu = (m, \tau, p)$ where
m is a media object,
τ is a time interval, called the “presentational duration” of pu (possibly indefinite, e.g. for a web page that stays on the screen until an outside event occurs),
p is a region on the screen, called the “presentational position” of pu (possibly indefinite, e.g. for an object of type “audio”).

The “presentational duration” of an elementary PU pu, denoted pu.τ, represents the temporal interval during which the pu is active in an execution of a multimedia application (e.g. it appears in a presentation). A temporal interval is defined by two end points or time instances [3], [34]. The “presentational position” of an elementary pu, denoted pu.p, represents the screen portion the pu occupies during an execution of a multimedia application; the domain of presentational position is the set of 2D polygons [13].

Definition 2.2: A composite PU is defined inductively by combining PUs in three orthogonal dimensions or views: Logically, Temporally and Spatially. The presentational duration of a composite PU is a set of temporal intervals representing the presentational durations of its constituent PUs. The presentational position of a composite PU is a set of screen regions representing the presentational positions of its constituent PUs.

The following properties are captured by the Presentational Structure of a PU:

i. The types of the constituent PUs (media objects and composite PUs), disregarding the specific content; e.g. two pictures (PUs) have the same type.

ii. The logical structure of the PU (including constituent PUs), disregarding specific instances; e.g. two slide shows, one of 10 slides and the other of 25, have the same logical structure.

iii. The temporal synchronization of the PU (including constituent PUs), disregarding the specific durations and considering only the qualitative temporal information that is significant and relevant to the specific abstraction goal; e.g. two pieces of synchronized audio-video (PUs), one of duration 5 and the other of duration 10, have the same temporal synchronization. We will refer to this as the “Temporal Structure” of a PU.

iv. The spatial synchronization (relative positioning on the screen) of the PU (including constituent PUs), disregarding e.g. specific sizes and taking into account only the qualitative spatial information that is significant and relevant to the specific goal; e.g. two pairs of non-overlapping photos (PUs) have the same spatial synchronization. We will refer to this as the "Spatial Structure" of a PU. Note that the spatial synchronization of a PU is meaningful only for visual PUs, e.g. sub-scenes, web pages, PUs of type image or video.

v. The interactive behavior of the PU (including constituent PUs), disregarding specific events, conditions and actions and considering only types (classes) of the above; e.g. two buttons, one activated by the "mouse click" event and the other by the "mouse over" event, have the same interactive behavior.

Dropping or under-specifying one of the axes (logical, temporal, spatial) creates a Presentational View. Hence, a PU is a structured composition of simpler PUs which is semantically meaningful under a Presentational View; depending on the Presentational View, a PU can be characterized as a Logical, Temporal or Spatial PU.
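As an illustration of Definitions 2.1 and 2.2, elementary and composite PUs can be sketched as a small recursive data structure. This is a hypothetical rendering rather than the representation used in the authors' system: the class names, the `view` field, and the use of a bounding box in place of a general 2D polygon are all assumptions made for brevity. The `structure` function shows one possible way to drop content, concrete durations, sizes and the number of instances, in the spirit of properties i and ii.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

Interval = Tuple[float, float]               # (start, end) of a presentational duration
Region = Tuple[float, float, float, float]   # assumed bounding box: (x, y, width, height)


@dataclass
class ElementaryPU:
    media_type: str                # type of the media object m, e.g. "image"
    duration: Optional[Interval]   # tau; None models an indefinite duration
    position: Optional[Region]     # p; None models an indefinite position (e.g. audio)


@dataclass
class CompositePU:
    view: str                      # assumed label: "logical", "temporal" or "spatial"
    children: List["PU"] = field(default_factory=list)


PU = Union[ElementaryPU, CompositePU]


def durations(pu: PU) -> List[Interval]:
    """Presentational duration of a PU as the intervals of its constituents."""
    if isinstance(pu, ElementaryPU):
        return [pu.duration] if pu.duration is not None else []
    return [interval for child in pu.children for interval in durations(child)]


def structure(pu: PU):
    """Abstract a PU to its type structure, ignoring content, times, sizes and counts."""
    if isinstance(pu, ElementaryPU):
        return pu.media_type
    distinct = []
    for child in pu.children:
        child_structure = structure(child)
        if child_structure not in distinct:   # repeated instances collapse to one entry
            distinct.append(child_structure)
    return (pu.view, tuple(distinct))


def slide_show(n: int) -> CompositePU:
    """Build a hypothetical slide show of n full-screen images shown one after another."""
    slides = [ElementaryPU("image", (i, i + 1), (0, 0, 640, 480)) for i in range(n)]
    return CompositePU("temporal", slides)


# Two slide shows of 10 and 25 slides have the same abstract structure.
print(structure(slide_show(10)) == structure(slide_show(25)))
```

Collapsing repeated child structures is only one way to disregard specific instances; a fuller treatment would also record the temporal or spatial relation that holds among the children.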
Examples:
a) Let the specification of a multimedia application for company presentations include that “the application starts with the company introductory video and while the video plays, various images appear on the screen; when the video finishes, an image of a company organization chart appears”. The video duration (a temporal interval) can be considered to define a PU whose “meaning” is “introduction”; the duration that the company organization chart stays on the screen defines another PU. The Macromedia Director [35] authoring paradigm is based on temporal PUs: a time frame (a temporal interval) in the score window, or a set of such frames, can be considered a temporal PU. A temporal PU is defined by a temporal interval within which its constituent PUs appear.
b) A space-oriented specification of the above PUs may include: “a company introductory scene contains a video and a slide-show of images; a second scene contains an organization chart”. The two scenes are two spatial PUs. A web page is another example of a spatial PU.
c) A structured web document consisting of a header, one or more author names and a set of paragraphs is an example of a logical PU.

Definition 2.4: The Presentational Structure of a multimedia application consists of the set of its constituent PUs and their relationships during presentation; the relationships among PUs determine the control flow of the application. In many cases, a multimedia application is a single PU.

Examples:
a) In the “company presentation” example given above, the introductory screen and the department screens are PUs linked together via active hotspots on the organizational chart; the relationship can be characterized as a link-oriented relationship between PUs. Link-oriented relationships are also used in web-based applications that consist of a number of hyper-linked spatial PUs (web pages).
b) A multimedia presentation that “plays” in automatic mode is considered a set of PUs with time-oriented relationships.
2.3 Semantic Multimedia Abstractions (SMAs) and the SMA model

Definition 2.5: A Semantic Multimedia Abstraction (SMA) is a qualitative abstract description of a multimedia application in terms of the properties captured by the conceptual and presentational structure of multimedia applications (defined in Sect. 2.2.2 and 2.2.3); we call such properties conceptual and presentational properties of multimedia applications at the semantic level.

A number of models for multimedia information management that address certain aspects of multimedia applications have been developed. Most of them emphasize individual media (images and video) and follow various modeling approaches, e.g. the knowledge-based semantic image model proposed in [10], a four-layered model that uses the hierarchical structure TAH (Type Abstraction Hierarchy) for approximate query answering by image feature and content. Models for multimedia documents address the issues of spatio-temporal synchronization and of structuring of multimedia documents: time intervals and Allen's temporal relationships [3] and 2D spatial relationships [13] are extensively used for modeling the spatio-temporal synchronization of monomedia data and of more complex multimedia structures, and for representing temporal and spatial semantics, e.g. [34], [10], [11], [54]. Language-based models such as SGML [49] and XML [14], and object-oriented approaches [29], [48], have also been developed for modeling multimedia documents.

For the representation of SMAs we need a “model” based on well established concepts and techniques, able to capture in a uniform way the conceptual and presentational properties of multimedia applications at the semantic level. Al-Khatib et al. in [2] review, categorize and compare recent semantic data models for multimedia data at different levels of granularity.
According to this categorization, the model for representing SMAs should include features from both compositional and organizational models for multimedia documents, while it should emphasize multimedia databases and provide abstraction constructs for representing higher level structures. Using for instance a graph-based model for multimedia applications, where nodes represent multimedia objects (simple or composite) and arcs denote the execution flow, a model for SMAs could be created by mapping a detailed graph representing one multimedia application (one instance) onto a less detailed, generic graph whose nodes and arcs represent classes of objects and relationships of the initial graph.

The proposed model, called the “SMA model”, is based on well established semantic data models used in database conceptual modeling and knowledge representation [8], [9], [42]. These models provide structural concepts such as entities (objects), relationships and attributes, as well as forms of data abstraction for relating concepts: classification (grouping objects that share common characteristics into a class), aggregation (treating a collection of component concepts as a single concept), generalization (extracting from a set of category concepts a more general concept and suppressing the detailed differences) and association (treating a collection of similar member concepts as a single set concept) [8], [9], [42]. Our building tools will be these classic forms of abstraction, extended to the temporal and spatial dimensions to capture the presentational properties of multimedia applications.

2.3.1 Representing conceptual properties of multimedia applications at the semantic level (SMA's conceptual structure)

For the representation of the conceptual structure of SMAs, we shall use the structural concepts (entities, attributes and relationships) and forms of abstraction (classification, generalization, aggregation, association) provided by semantic data models.
2.3.2 Representing presentational properties of multimedia applications (SMA's presentational structure)

Semantic data models use two abstraction constructs to allow the recursive formation of complex objects from simpler ones: aggregation and grouping. For the SMA's presentational structure such models should encompass the standard notion of "consists" for representing the logical structure of PUs (e.g. the PU "map" consists of an image and a number of buttons), a "temporal consists" for representing the temporal structure of PUs (e.g. a PU "slide show" consists of a temporal sequence of slides) and a "spatial consists" for representing the spatial structure of PUs (e.g. a PU "scene" consists of two disjoint pictures and a text at the bottom of the screen). By extending these abstraction constructs to the temporal and spatial dimensions, temporal and spatial aggregation and grouping are defined.

Representing Temporal Structure of multimedia applications at the semantic level (SMA's temporal structure)

a. Temporal Abstraction Constructs

Let $U = \{pu_i \mid pu_i \in PU,\ 1 \le i \le n\}$ and let $pu_i.\tau$ be the presentational duration of $pu_i$ for $i = 1 \ldots n$. Let also $R_t(pu.\tau, pu'.\tau)$ denote a temporal relationship $R$ (such as “before” [3]) between pairs of presentational durations.

Temporal Aggregation

Aggregation is “the form of abstraction in which a relationship between component objects is considered as a higher level aggregate object. Every instance of an aggregate object class can be decomposed into instances of the component object classes, which establishes a part-of relationship between objects” [9]. E.g. a car is an aggregate of its parts. Temporal Aggregation is the form of abstraction in which a collection of PUs $pu_i$, $i = 1 \ldots n$, with presentational durations $pu_i.\tau$ forms a higher level PU $pu$ whose presentational duration $pu.\tau$ “temporally consists of” the presentational durations $pu_i.\tau$ of its constituents (see Example 2.1).
The higher level presentational unit $pu$ is called a temporal aggregate of the $pu_i$, and its presentational duration $pu.\tau$ equals the union of the presentational durations $pu_i.\tau$ of its constituents. The important features of a temporal aggregate are: a) it is also a PU with a presentational duration attribute, b) the presentational durations of its constituent PUs are “within” (or play during) the presentational duration of the temporal aggregate, c) the presentational duration $pu.\tau$ of the aggregate PU does not extend before the earliest start time or after the latest end time of its constituent PUs, d) $pu.\tau$ is a single temporal interval without “temporal” holes.

Definition 2.6: Let $pu = A(pu_1, pu_2, \ldots, pu_n)$ be an aggregation of component PUs $pu_1, pu_2, \ldots, pu_n$. Then $pu$ is a temporal aggregation of $pu_1, pu_2, \ldots, pu_n$, noted $A_T$, with presentational duration $A_T.\tau$, if and only if:

$A_T(pu) \Leftrightarrow \forall\, pu_i \in A:\ During(pu_i.\tau, pu.\tau)$ and $Equals(pu.\tau, \bigcup_{i=1}^{n} pu_i.\tau)$

(During and Equals are temporal relationships [3]; see Figure 2.)

Example 2.1: If a scene consists of three PUs of type image, video and audio, with presentational durations $pu_I.\tau$, $pu_V.\tau$ and $pu_A.\tau$ respectively, then $Scene = A_T(pu_I, pu_V, pu_A)$ holds if and only if:

$During(pu_I.\tau, scene.\tau) \wedge During(pu_V.\tau, scene.\tau) \wedge During(pu_A.\tau, scene.\tau) \wedge Equals(scene.\tau,\ pu_I.\tau \cup pu_V.\tau \cup pu_A.\tau)$

[Inline figure: the intervals τscene, τaudio, τvideo and τimage]

Examples of Temporal aggregations in common authoring paradigms

i. Macromedia Director [35] paradigm: Here, a PU (e.g. a multimedia scene) is typically specified as a sequence of time frames each consisting of several elementary PUs i.e.
media objects with presentational duration and presentational position; such a PU is a temporal aggregate of its constituent PUs, with presentational duration equal to the temporal interval defined by the set of time frames, if and only if all the constituent media objects “play within” this temporal interval and for each time frame there is at least one active media object (there are no “temporal” holes). If a PU, e.g. background music, continues to play into the succeeding time frames, the PU is not a temporal aggregate. In this paradigm a multimedia application is typically a set of inter-linked temporal aggregations.

ii. HTML / web authoring paradigm: a PU is usually determined by the media objects in a web page; such a PU can be characterized as a temporal aggregate if none of its constituent media objects plays outside the temporal interval when the web page is active (e.g. by extending to the previous or the next page).

Temporal Grouping

Grouping or Association “is a form of abstraction in which a relationship between member objects is considered as a higher level set object. An instance of a set object class can be decomposed into a set of instances of the member object classes and this establishes a member-of relationship between a member object and a set object” [9]. Temporal Grouping is a form of abstraction in which a collection (group) of similar PUs $pu_i$, $i = 1 \ldots n$ (i.e. PUs with the same presentational structure), with presentational durations $pu_i.\tau$, temporally related with the same (or a similar) temporal relationship $R_t$, forms a higher level PU $pu$ whose presentational duration $pu.\tau$ “is a temporal group of” the member presentational durations $pu_i.\tau$ (see Example 2.2). The higher level PU $pu$ is called a temporal group of the $pu_i$ with temporal relation $R_t$, and its presentational duration $pu.\tau$ equals the minimal cover of the $pu_i.\tau$ of its member PUs. $R_t$ is a temporal constraint on the set.
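The conditions for a temporal aggregate (Definition 2.6) and the minimal cover of a temporal group can be checked mechanically once presentational durations are given as numeric intervals. The following is a minimal sketch under stated assumptions, not the authors' implementation: intervals are closed (start, end) pairs, “within” is read as containment that allows shared end points, the relation of a group is checked between consecutive members in list order, and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]  # closed interval (start, end), assumed start < end


def minimal_cover(intervals: List[Interval]) -> Interval:
    """Smallest single interval that contains all given intervals (non-empty list)."""
    return (min(s for s, _ in intervals), max(e for _, e in intervals))


def has_holes(intervals: List[Interval]) -> bool:
    """True if some instant inside the minimal cover is covered by no interval."""
    reach = None
    for start, end in sorted(intervals):
        if reach is not None and start > reach:
            return True          # gap between the covered part and the next start
        reach = end if reach is None else max(reach, end)
    return False


def is_temporal_aggregate(parent: Interval, parts: List[Interval]) -> bool:
    """Check that all parts lie within parent and that their union equals parent."""
    if not parts:
        return False
    within = all(parent[0] <= s and e <= parent[1] for s, e in parts)
    union_equals_parent = minimal_cover(parts) == parent and not has_holes(parts)
    return within and union_equals_parent


def is_temporal_group(parts: List[Interval],
                      relation: Callable[[Interval, Interval], bool]) -> bool:
    """Check that the relation holds between every pair of consecutive members."""
    return all(relation(a, b) for a, b in zip(parts, parts[1:]))


def overlaps(a: Interval, b: Interval) -> bool:
    """Allen's overlaps: a starts first, b starts before a ends, and b ends last."""
    return a[0] < b[0] < a[1] < b[1]


# A scene with image, video and audio that together fill the scene interval.
scene = (0, 10)
print(is_temporal_aggregate(scene, [(0, 4), (2, 10), (0, 10)]))

# Background music that plays on past the scene breaks the aggregate.
print(is_temporal_aggregate(scene, [(0, 4), (2, 10), (0, 15)]))

# Overlapping slides form a temporal group whose duration is their minimal cover.
slides = [(0, 3), (2, 5), (4, 7)]
print(is_temporal_group(slides, overlaps), minimal_cover(slides))
```

The second call mirrors the background-music case from the Macromedia Director example: one constituent plays beyond the interval, so the PU is not a temporal aggregate.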
A group of PUs, temporally related with “similar” temporal relationships can be considered a temporal grouping if a more generic temporal relationship is used instead. For instance, a group of PUs with temporal relationship either “meets” or “before” can be considered a temporal grouping where the temporal relationship “sequential” holds for all members of the temporal grouping. A temporal grouping without temporal constraints emphasizes the similarity of temporal relationships of the set and ignores the exact relationship (abstraction transformations on abstraction constructs and temporal relations are discussed in section 3.4). Definition 2.7: Let pu G( pu1 , pu2 ,.... pun ) be a grouping of similar PUs pui , i 1n and Rt be a temporal relationship. Then pu is a temporal grouping of pu i , noted GT , with presentational duration GT . if and only if Rt holds between all pairs ( pui . , pui 1 . ) : GT ( pu) pui , pui 1 G, Rt ( pui . , pui 1 . ) . ____________________________________________________________________________________________ TECHNICAL REPORT No. TR99/09/06 10 COMPUTER TECHNOLOGY INSTITUTE 1999 Example 2.2: A sequence of slides where the temporal relationship overlaps τslide show holds between every pair of slides is a temporal grouping if: SlideShow GT ( slide1 , slide 2 ...slide n ){Overlaps} slide i , slide i 1 , Overlaps ( slide i . , slide11 . ) SlideShow. [ slide1 . .start, slide n . .end ] τslide1 τslide2 ..... τslide n Notice that, for this to be presentationally maningful, suitable spatial relationships must hold between successive slides. b. Representing Temporal Relations of multimedia applications at the semantic level Temporal Aggregation and TEMPORAL CONSTRAINTS parallel sequential Grouping defined above are is-a two abstraction constructs certain temporal is-a overlaps before during starts finishes meets equal posing constraints. 
However, the temporal synchronization information of PUs also refers to temporal constraints on such abstraction constructs, within the constituent PUs of a PU and among PUs.

Figure 2: A hierarchy of temporal constraints (temporal constraints generalized through is-a links into parallel and sequential, over overlaps, before, during, starts, finishes, meets and equal)

Allen’s 13 temporal relationships between time intervals [3], namely before, meets, overlaps, during, starts, finishes and equals, together with their inverse relationships, form the basic “vocabulary” for temporal constraints at the semantic level. The SMA model handles these relationships and their combinations, built with logical operators (e.g. before ∨ meets), as temporal integrity constraints. Quantitative values of presentational durations (such as concrete start/end time instants, as well as lengths of presentational durations, e.g. the actual duration of a video) are ignored and abstracted to the corresponding qualitative information. A generalization hierarchy of temporal relations (e.g. the hierarchy in Figure 2) allows a variable precision at this level: less information is given by limiting the set of temporal relationships to sequential and parallel, while more information is provided if qualitative distances (near, far etc.) are captured as well [11].

Representing Spatial Structure of multimedia applications at the semantic level (SMA’s spatial structure)

a. Spatial Abstraction Constructs

In [52] it is noted that when dealing with spatial objects, i.e. those whose position in space matters to the information system, it is often the case that if objects A, B and C constitute object X, then the positions of A, B and C form a subset of the position of X. Thus spatial aggregation and spatial grouping were introduced as simple extensions to modeling primitives for conveying this extra piece of information. In [37] we identified the particular interpretation of spatial aggregation and grouping in the multimedia domain and defined the corresponding abstraction constructs.
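Before the spatial constructs are defined in detail, the temporal machinery described above (Definition 2.7, Allen's relations and the generalization to sequential and parallel) can be sketched in code. This is only an illustrative sketch, not the authors' implementation: presentational durations are assumed to be numeric `(start, end)` pairs, the names `allen_relation`, `GENERALIZES_TO` and `temporal_grouping` are hypothetical, and the assignment of relations to "parallel" is an assumption (the text only states that meets and before generalize to sequential).

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # assumed (start, end) with start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equal"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"


# Assumed generalization links: only the "sequential" ones are stated in the
# text; the "parallel" ones are a guess made for this sketch.
GENERALIZES_TO: Dict[str, str] = {
    "before": "sequential", "meets": "sequential",
    "overlaps": "parallel", "during": "parallel", "starts": "parallel",
    "finishes": "parallel", "equal": "parallel",
}


def temporal_grouping(durations: List[Interval], relation: str) -> Optional[Interval]:
    """Check Definition 2.7 on a non-empty list of durations.

    The relation (or a relation it generalizes) must hold between every
    pair of successive durations. Returns the minimal cover of the member
    durations, or None if the durations do not form such a grouping.
    """
    for a, b in zip(durations, durations[1:]):
        found = allen_relation(a, b)
        if relation not in (found, GENERALIZES_TO.get(found)):
            return None
    return (min(s for s, _ in durations), max(e for _, e in durations))


slides = [(0, 5), (4, 9), (8, 12)]   # each slide overlaps the next one
mixed = [(0, 3), (3, 5), (6, 8)]     # "meets", then "before"
print(temporal_grouping(slides, "overlaps"))
print(temporal_grouping(mixed, "sequential"))
```

The second call illustrates the remark on "similar" relationships: the mixed sequence is not a grouping under meets alone, but it is one under the more generic sequential.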
Let $U = \{pu_i \mid pu_i \in PU,\ 1 \le i \le n\}$ and let $pu_i.p$ be the presentational position of $pu_i$ for $i = 1 \ldots n$. Let also $R_s(pu.p, pu'.p)$ denote a spatial relationship $R$ (such as “disjoint” [13]) between pairs of presentational positions.

Spatial Aggregation

Spatial Aggregation is the form of abstraction in which a collection of PUs $pu_i$, $i = 1 \ldots n$, with presentational positions $pu_i.p$ form a higher level PU $pu$ whose presentational position $pu.p$ “spatially consists of” the presentational positions $pu_i.p$ of its constituents (see Example 2.3). The higher level presentational unit $pu$ is called a spatial aggregate of the $pu_i$, and its presentational position $pu.p$ equals the union of the $pu_i.p$ of its constituents. The important features of a spatial aggregate are: a) it is also a PU with a presentational position attribute, b) the presentational positions of the constituent PUs are “within” (or appear inside) the presentational position of the spatial aggregate, c) the presentational position $pu.p$ of the aggregate PU does not extend beyond the space limits of its constituent PUs, d) $pu.p$ is a region without “spatial” holes.

Definition 2.8: Let $pu = A(pu_1, pu_2, \ldots, pu_n)$ be an aggregation of component PUs $pu_1, pu_2, \ldots, pu_n$. Then $pu$ is a Spatial Aggregation of $pu_1, pu_2, \ldots, pu_n$, noted $A_S$, with presentational position $A_S.p$, if and only if:
$$A_S(pu) \iff \forall\, pu_i \in A:\ Covers(pu_i.p, pu.p)\ \text{and}\ Equal\Big(pu.p,\ \bigcup_{i=1}^{n} pu_i.p\Big)$$
(Covers and Equal are spatial relationships [13], see Figure 3).

Example 2.3: If a multimedia scene consists of three visual PUs of type image, video and text with presentational positions $pu_I.p$, $pu_V.p$ and $pu_T.p$ respectively, then:
$$Scene = A_S(pu_I, pu_V, pu_T) \iff Covers(pu_I.p, scene.p) \wedge Covers(pu_V.p, scene.p) \wedge Covers(pu_T.p, scene.p) \wedge Equal(scene.p,\ pu_I.p \cup pu_V.p \cup pu_T.p)$$
[Illustration of the scene: a background image with video and text regions.]

Examples of Spatial aggregations in common authoring paradigms

i. Macromedia Director [35] paradigm: Within a temporal interval, the visual PUs (e.g. visual media objects) that appear simultaneously on a screen portion form a spatial aggregate; hence, each time frame in the score window defines a spatial aggregate of the PUs that exist in the score channels.
ii. HTML / web authoring paradigm: a web page forms a spatial aggregate of its constituent PUs.

Spatial Grouping

Spatial grouping is a form of abstraction in which a collection (group) of similar PUs $pu_i$, $i = 1 \ldots n$, with presentational positions $pu_i.p$, spatially related with the same spatial relationship $R_s$, form a higher level PU $pu$ whose presentational position $pu.p$ “is a spatial group of” the member presentational positions $pu_i.p$ (see Example 2.4). The higher level PU $pu$ is called a spatial group of the $pu_i$ with spatial relation $R_s$, and its presentational position $pu.p$ equals the minimal cover of the $pu_i.p$. $R_s$ is a spatial constraint on the set.

Definition 2.9: Let $pu = G(pu_1, pu_2, \ldots, pu_n)$ be a grouping of similar PUs $pu_i$, $i = 1 \ldots n$, and let $R_s$ be a spatial relationship. Then $pu$ is a spatial grouping of the $pu_i$, noted $G_S$, with presentational position $G_S.p$, if and only if $R_s$ holds between all pairs $(pu_i.p, pu_{i+1}.p)$:
$$G_S(pu) \iff \forall\, pu_i, pu_{i+1} \in G:\ R_s(pu_i.p, pu_{i+1}.p).$$

Example 2.4: A group of buttons where the spatial relationship meets holds between every pair of successive buttons is considered a spatial grouping if:
$$ButtonBar = G_S(button_1, button_2, \ldots, button_n)\{Meets\} \iff \forall\, button_i, button_{i+1}:\ Meets(button_i.p, button_{i+1}.p)$$
$$ButtonBar.p = [\,button_1.p.(x_1, y_1),\ button_n.p.(x_n, y_n)\,]$$
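Definitions 2.8 and 2.9 can be checked mechanically once presentational positions are given a concrete form. The sketch below is illustrative only and makes several assumptions not found in the text: positions are axis-aligned rectangles `(x1, y1, x2, y2)`, the "minimal cover" of a group is taken to be its bounding rectangle, and the helper names (`within`, `meets`, `is_spatial_aggregate`, `spatial_grouping`) are hypothetical. It also does not distinguish inside from covered_by; both are treated as containment.

```python
from typing import Callable, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2), x1 < x2, y1 < y2


def within(inner: Rect, outer: Rect) -> bool:
    """True if inner lies inside outer (shared boundaries allowed)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def meets(a: Rect, b: Rect) -> bool:
    """True if the boundaries of a and b touch but their interiors are disjoint."""
    touch = a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
    overlap = a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
    return touch and not overlap


def is_spatial_aggregate(parts: List[Rect], whole: Rect) -> bool:
    """Definition 2.8 (sketch): every part lies in whole, and the union of the
    parts equals whole, i.e. whole has no spatial holes."""
    if not all(within(p, whole) for p in parts):
        return False
    # Coordinate compression: every grid cell is either fully covered by a
    # part or not covered by that part at all.
    xs = sorted({whole[0], whole[2], *(v for p in parts for v in (p[0], p[2]))})
    ys = sorted({whole[1], whole[3], *(v for p in parts for v in (p[1], p[3]))})
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            if not any(within((xa, ya, xb, yb), p) for p in parts):
                return False
    return True


def spatial_grouping(positions: List[Rect],
                     relation: Callable[[Rect, Rect], bool]) -> Optional[Rect]:
    """Definition 2.9 (sketch): relation must hold between successive members.
    Returns the bounding rectangle of a non-empty group, or None."""
    if not all(relation(a, b) for a, b in zip(positions, positions[1:])):
        return None
    return (min(p[0] for p in positions), min(p[1] for p in positions),
            max(p[2] for p in positions), max(p[3] for p in positions))


scene = (0, 0, 100, 100)
background, video, text = (0, 0, 100, 100), (0, 0, 50, 50), (50, 0, 100, 50)
buttons = [(0, 0, 10, 5), (10, 0, 20, 5), (20, 0, 30, 5)]
print(is_spatial_aggregate([background, video, text], scene))
print(spatial_grouping(buttons, meets))
```

The button bar mirrors Example 2.4: its position runs from the first button's top-left corner to the last button's bottom-right corner.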
Representing Spatial Relations of multimedia applications at the semantic level

Similarly to the Temporal Dimension, the sixteen 2D topological relations [13], namely disjoint, meets, overlaps, inside, covered_by, covers and equal, form the basic “vocabulary” for spatial constraints of multimedia applications at the semantic level.

Figure 3: A hierarchy of 2D spatial constraints (spatial constraints and generalisation hierarchy: meets(x,y), overlaps(x,y), inside(x,y), covered by(x,y), disjoint(x,y), equal and covers, generalized through is-a links into general overlap, within, boundary_overlap, boundary_disjoint and boundary_meets)

A generalization hierarchy of 2D topological relations (e.g. Figure 3) allows a variable precision at this level: less information is given by limiting the set of topological relationships to general overlap and disjoint, while more information is provided if qualitative distances (near, far etc.) are captured as well [11].

Representing multimedia content at the semantic level

In conceptual modeling, Classification, a form of abstraction in which a collection of objects is considered a higher level object class, is used to classify and describe objects in terms of object classes; hence, it is natural in the SMA model to represent media objects by their corresponding classes (data types of multimedia content). Specific properties of media objects are ignored at this stage. Abstraction hierarchies of multimedia data classes allow a variable precision at this level. For instance, the content data type SELECTOR is a generalization of data types MENU, EVENTER and BUTTON.
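The generalization hierarchies on constraints and content data types amount to is-a chains that can be walked upwards. The following sketch is a hypothetical illustration, not part of the SMA model itself: `IS_A`, `generalizations` and `subsumes` are invented names, and only two links taken from the paper are encoded (SELECTOR generalizing MENU, EVENTER and BUTTON, and within generalizing covered_by and inside, as used in Example 3.1).

```python
from typing import Dict, List

# child -> parent ("is-a") links; only links stated in the paper are listed.
IS_A: Dict[str, str] = {
    "MENU": "SELECTOR", "EVENTER": "SELECTOR", "BUTTON": "SELECTOR",
    "covered_by": "within", "inside": "within",
}


def generalizations(term: str) -> List[str]:
    """Return the term followed by all of its ancestors, most specific first."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def subsumes(general: str, specific: str) -> bool:
    """True if 'general' equals 'specific' or is one of its ancestors."""
    return general in generalizations(specific)


print(generalizations("BUTTON"))
print(subsumes("within", "inside"))
```

Under this reading, a description that asks for a SELECTOR is satisfied by a PU of type BUTTON, while the converse does not hold; this is what "variable precision" means operationally.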
2.4 The SMA model graphical notation (extended-OMT model notation) and the corresponding SMA Definition and Query Language (SMA-L)

2.4.1 The Extended-OMT model graphical notation

The abstraction constructs proposed for representing SMA’s presentational structure are generic and can be used with any semantic model which has the minimal functionality of allowing the construction of complex objects from simpler ones. We illustrate this with the Object Modeling Technique (OMT) [46], resulting in an Extended-OMT model.

Figure 4: Extensions to OMT Object Model graphical notation (extensions to the OMT Aggregation construct: Temporal Aggregation, annotated T{<Temporal Constraints>}, and Spatial Aggregation, annotated S{<Spatial Constraints>}; extensions to the OMT Association construct: Temporal Grouping, annotated T{<Temporal Constraints>}, and Spatial Grouping, annotated S{<Spatial Constraints>})

2.4.2 Semantic Multimedia Abstraction (SMA) Definition and Query Language (SMA-L)

For the representation and manipulation of SMAs, the Semantic Multimedia Abstraction Definition and Query Language (SMA-L) has been defined, the formal syntax of which is given in BNF format (Table 1). SMA-L was built on the Extended-OMT model and thus any SMA modeled using the Extended-OMT model can be represented with SMA-L. SMA-L is a declarative object-oriented language which:
- allows the representation of the conceptual and presentational structure of SMAs (c_units and p_units represent conceptual and presentational units respectively);
- contains predicates corresponding to the temporal and spatial abstraction constructs (aggregation and grouping) defined for SMAs for forming PUs, as well as to the Presentational View of PUs (logical, temporal and spatial), allowing users to formulate queries on complex structures of multimedia applications;
- provides a way of defining PUs, and consequently SMAs, at various abstraction levels in terms of their conceptual and presentational properties at the semantic level, through abstraction hierarchies on abstraction constructs, constraints and multimedia data types.

Syntax of SMA-L

The BNF notation of the SMA-L syntax is given in Table 1. Words in <italics> denote non-terminal elements of the language. Clauses in [ ] are optional arguments. Bold is used to denote reserved words.

Table 1: Semantic Multimedia Abstractions Definition and Query Language (SMA-L)

<SMA> : <unit> | <unit> <unit> ; An SMA is a sequence of conceptual and/or presentational units
<unit> : <c_unit> | <p_unit>
<c_unit> : C_UNIT <unit_name> [TYPE <c_unit_types>] [PRESENTED_BY <p_unit_list>] ; conceptual unit
<unit_name> : identifier
<c_unit_types> : <c_unit_type> | <c_unit_type>, <c_unit_type>
<c_unit_type> : <simple_c_unit> | <composite_c_unit>
<simple_c_unit> : identifier | ABSTRACT | LINK (<source_unit_list>) (<target_unit_list>)
<composite_c_unit> : <abstraction construct> [<{constraint}>]
<p_unit> : P_UNIT <unit_name> [TYPE <p_unit_types>] ; presentational unit
<p_unit_types> : <p_unit_type> | <p_unit_type>, <p_unit_type>
<p_unit_type> : <simple_p_unit> | <composite_p_unit>
<simple_p_unit> : <content data type> | ABSTRACT | LINK (<source_p_unit_list>) (<target_p_unit_list>)
<content data type> : CONTENT | MULTIPLEXED_CONTENT | COMPOSITE | VISUAL_OBJECT | INPUT | OUTPUT | IMAGE | VIDEO | AUDIO | ANIMATION | TEXT | GRAPHICS | PICKER | HOTSPOT | SELECTABLE_CONTENT | STRING | VALUATOR | SELECTOR | MENU | EVENTER | BUTTON | SLIDE_SHOW | INTERACTIVE_IMAGE ; content data types can be extended to new types
<composite_p_unit> : <abstraction construct> [ : <abstraction view> ] [<{constraint}>]
<abstraction construct> : GROUP_OF (<member_unit>) | AGGREGATION_OF (<component_unit_list>) | GENERIC (<category_unit_list>)
<abstraction view> : [<temporal>] [<spatial>] [<logical>] ; the “view” of presentational units
<temporal> : [T] [<{temporal constraint}>] ; temporal view
<spatial> : [S] [<{spatial constraint}>] ; spatial view
<logical> : [<{constraint}>] ; logical view
<p_unit_list> : <p_unit reference> | <p_unit_list>, <p_unit reference>
<member_unit> : <unit reference>
<component_unit_list> : <unit reference> | <component_unit_list>, <unit reference>
<category_unit_list> : <unit reference> | <category_unit_list>, <unit reference>
<source_p_unit_list> : <p_unit reference> [ : <condition>] | <source_p_unit_list>, <p_unit reference> [ : <condition>]
<target_p_unit_list> : <p_unit reference> [ : <action>] | <target_p_unit_list>, <p_unit reference> [ : <action>]
<unit reference> : <c_unit reference> | <p_unit reference>
<c_unit reference> : <unit_name> | <c_unit_type>
<p_unit reference> : <unit_name> | <p_unit_type>
<constraint> : <statement> | not <constraint> | <constraint> and <constraint> | <constraint> or <constraint> | (<constraint>)
<condition> : <statement> | not <condition> | <condition> and <condition> | <condition> or <condition> | (<condition>)
<action> : <statement>
<statement> : string | function
<temporal constraint> : <temporal relation> | not <temporal relation> | <temporal constraint> and <temporal constraint> | <temporal constraint> or <temporal constraint> | (<temporal constraint>)
<temporal relation> : meets | met-by | before | after | during | contains | overlaps | overlapped-by | starts | started-by | finishes | finished-by | equal | sequential | parallel
<spatial constraint> : <spatial relation> | not <spatial relation> | <spatial constraint> and <spatial constraint> | <spatial constraint> or <spatial constraint> | (<spatial constraint>)
<spatial relation> : disjoint | meet | overlap | covered_by | covers | inside | contains | equal | g_overlap | within | b_disjoint | b_meets | b_overlap
<query definition> : SELECT <semantic mm abstraction name> <match statement>
<match statement> : MATCH (<semantic mm abstraction>) | <match statement> and <match statement> | <match statement> or <match statement>
<semantic mm abstraction name> : identifier

2.5 Related Work

The M Model by Dionisio and Cardenas [12] and the ZYX model by Boll and Klas [7] follow a modeling approach similar to the SMA model. The M Model is a synthesis of the Extended ER and Object-Oriented data models, integrating spatial and temporal semantics with general database constructs; the basic construct introduced is the “stream”, an ordered finite sequence of entities or values; substream and multistream (an aggregation of streams that creates new, more complex streams) are the other two basic constructs of the model. In the SMA model, streams, substreams and multistreams are modeled with temporal groupings and temporal aggregations, which are generic extensions of the classic aggregation and grouping constructs. However, the M Model and its MQuery language can also be used for modeling and querying SMAs. The ZYX model introduces new constructs for multimedia document modeling; the model uses a hierarchical organization for the document structure, an extension of Allen’s model (which supports intervals with unknown duration) for modeling temporal synchronization, and a point-based description for modeling spatial layout. The ZYX model is a tree-based model where nodes represent “presentation elements” (a concept similar to our notion of presentational unit) and each node has a binding point that connects it to other elements.
Spatio-temporal synchronization and interactivity are modeled with temporal, spatial or interaction elements (par, seq, loop, temporal-p and spatial-p, link, menu etc.); in the SMA model such relationships are modeled as constraints on presentational units (including temporal/spatial aggregation and grouping), which allows the modeling of the conceptual and presentational structure in a uniform way. As the ZYX model uses a structure similar to language-based models (XML, SMIL), the abstraction process from these models to ZYX is straightforward, and abstraction transformations (discussed in 3.4) can transform ZYX representations to SMA-L ones. Conceptual modeling has been proposed for document information retrieval in [39], where principles from the database area are used in order to enhance retrieval of multimedia documents; the model focuses on multimedia documents and is restricted to their logical, layout and conceptual view. Other object-oriented multimedia query languages, with the appropriate extensions or modifications, can be used for the same purpose as SMA-L, such as: the Multimedia Query Specification Language along with the object-oriented data model for multimedia databases proposed by Hirzalla et al. [19], which allows the description of multimedia segments to be retrieved from a database containing information on media and on spatial and temporal relationships between these media; the query language of the TIGUKAT object management system [43]; and the general purpose multimedia query language MOQL [33], which includes constructs to capture the temporal and spatial relations in multimedia data.

2.6 Application: Validation of the MULTIS production approach

Figure 5 depicts a part of the conceptual database schema of the MULTIS system for a series of Point of Information Systems (POIS), modeled using the extended-OMT model.
Figure 5: Extended-OMT model of MULTIS POIS (object classes: POIS, Interactive Map, Geographic Area, HotSpot, Place, Image, Area View, Landmark, Museum, Castle, Hotel, sub-scene, Video, Text, ButtonList, Button; constraint labels: S{meets}; T{meets}, S{equal}; T{equals}, S{disjoint}; T{equals}, S{meets})

C_UNIT POIS TYPE AGGREGATION_OF (GeographicArea)
C_UNIT GeographicArea TYPE AGGREGATION_OF (GROUP_OF (GeographicArea)), AGGREGATION_OF (GROUP_OF (Place), GROUP_OF (AreaView))
…………..
C_UNIT Landmark TYPE GENERIC (Museum, …. , Hotel)
……………….
C_UNIT Hotel PRESENTED_BY (AGGREGATION_OF (SubScene, Video))
P_UNIT SubScene TYPE AGGREGATION_OF (GROUP_OF (IMAGE): T{meets}, S{equal}, TEXT, ButtonList): T{equal}, S{disjoint}
P_UNIT ButtonList TYPE GROUP_OF (Button): T{equal}, S{meets}

3. Semantic Multimedia Abstractions for Querying Large Multimedia Repositories

3.1 The opportunity of abstraction in multimedia information retrieval

Organized units of interactive multimedia material are rapidly becoming available beyond their original format, namely Compact Disks; the advent of the Web and the appearance of digital libraries enlarge the habitat of such multimedia units, which can now reside anywhere on the Internet, be distributed across local or global networks, or even have a transient and virtual existence: a net-surfing session on the Web is a multimedia application of this kind. Although such collections of applications are not organized as proper databases, they are very large repositories of multimedia information. For large collections of such applications, browsers and query mechanisms addressing the multimedia data alone, while reasonably well developed, are inadequate: we lack techniques for efficient generic retrieval of structured multimedia information. To really tap the information resource we need a different approach for querying and navigating in these repositories, one that would resemble our own way of recalling information from our minds, human remembering [27].
Consider the following query: find multimedia electronic books explaining grammar phenomena of the English language where phrasal verbs are explained through a page with a synchronized video and a piece of text in two languages; the video covers half of the screen and when clicked a translation text appears. This is an abstract specification of (possibly a part of) a multimedia application and regards its conceptual structure (a book has pages with phrasal verbs), its presentational structure including its spatio-temporal synchronization, and its interactivity. It is exactly with respect to these characteristics that we would like to be able to query and navigate through a multimedia repository.

3.1.1 Principles of Human Remembering and the Need for mixed granularity levels

According to cognitive science, we think and remember using abstractions [27]; we build abstraction models of varying granularity that depend on the task at hand as well as on the state of our knowledge in a domain. Moreover, the process of changing representation levels and the multilevel representation of knowledge are fundamental in common sense reasoning [44]. Starting at a high abstraction level (coarse granularity) and moving towards a more detailed one (fine granularity) is a common approach to solving a problem [40]. The technique used in AI to imitate this process is the use of hierarchies of abstraction models, each one at a different abstraction level, and the definition of the relations between the different models in a hierarchy [20], [47], [25]. However, when we think and recall information in our minds, we normally mix granularity levels in a single representation.
For instance, when recalling a place visited, a description may include “an island, where in the harbor there is a castle and a church of the 16th century, and there is a small village named ‘Sigri’”. To answer such a query using maps (a mature form of symbolic representation for complex information) we would need a multi-resolution map with only names for some large cities but including details such as street names and museums for others. Consequently, to allow information retrieval congruent with human remembering we need techniques that support multilevel knowledge representation using various abstraction levels and mixed abstraction levels or resolutions in the same representation (considered either as granularity levels at the same abstraction level or as hierarchies of different abstraction levels). In multimedia applications a number of factors affect the choice of abstraction level in both the design and the retrieval of applications, while an abstract representation may be “more detailed” for one part of it and “more abstract” for another. When specifying MULTIS systems [53] we identified the following:
- the user view: conceptual, logical, temporal, spatial, interactivity or content. E.g. if the temporal synchronization matters most, the abstraction level of the other dimensions is kept high.
- the user’s knowledge and recollection of the multimedia application in any presentational view. E.g. when looking for an application with a slide show, one might or might not remember (or deem important) the slide synchronization.
- the tightness of a constraint in each dimension, implied by the significance of the behavior under consideration.
- the temporal scope, determined by the temporal interval over which the behavior of the application is analyzed. E.g. is one interested in the behavior over the whole application or over a few seconds of it? If a query focuses on the temporal interval of a slideshow in an application, the slide synchronization may be considered important and specified in detail.
- the spatial scope, determined by the area of the screen over which the behavior is analyzed.

Research aims to find methods for creating and using abstract spaces to improve the efficiency of classical searching techniques such as heuristic search (especially state-space search). The basic idea behind these approaches is that instead of directly solving a problem in the original search space, the problem is mapped onto and solved in an abstract search space; then the abstract solution is used to guide the search for a solution in the original space (guided search) [21]. In the multimedia domain, reducing the complexity of the search space and hence improving information retrieval is a “quantitative” objective for introducing abstraction. A repository of SMAs forms an abstract search space, of which we can have a hierarchy (see Sect. 3.4). An abstract answer to a query can be found by searching in the hierarchy of abstract spaces, in principle a computationally easier task.
Then, a method for using the abstract answer to guide the search can be followed: for instance, using the length of the abstract solution as a heuristic estimate of the distance to the goal [45], or using the abstract solution as a skeleton in the search process (the “refinement method” [47] or other variants such as path-marking and alternating opportunism [21]).

3.1.3 Approximate match retrievals - Filtering large multimedia repositories

There are many cases where queries aim to filter out interesting parts of large repositories; in such cases information retrieval is based on the similarity of the repository’s data to the user’s query, and approximate matching techniques are used for query evaluation [26], [1]. In order to filter large multimedia repositories in terms of the conceptual and presentational properties of multimedia applications, the user should be able to pose approximate queries and get a set of approximate answers that match the given query to a certain degree of similarity. Here, the similarity measure should also agree with the human perception of similarity in multimedia applications. Abstraction and abstraction hierarchies seem to have a significant role in filtering large multimedia repositories. An approximate query has the form of an SMA and, given an abstract search space of SMAs, approximate query evaluation is performed as a “normal” search process in a simplified abstract search space. Abstraction hierarchies are used as a basis in the query evaluation process and in the relaxation of queries (see sect. 3.4).

3.1.4 Semantic Multimedia Abstractions and Existing Types of Metadata

One way to query the conceptual and presentational properties of multimedia applications is to capture these properties by using metadata. The most significant works on metadata for digital media are presented in [30].
In [6], the metadata used for multimedia documents are classified according to the type of information captured. The categories include content-descriptive metadata, metadata for the representation of media types, and metadata for document composition; composition-specific metadata are knowledge about the semantics of logical components of multimedia documents, their role as part of a document and the relationships among these components. Metadata in SGML [49] and XML [14] documents are organized in document type definitions (DTDs), which are themselves part of the metadata and contain “element types” of metadata. Additionally, there exist metadata for collections of multimedia documents (DFR standard) [24]. Statistical metadata and metadata for the logical structure of documents are expected to optimize query processing on multimedia documents. In [28] a three-level architecture consisting of the ontology, metadata and data levels is presented to support queries that require correlation of heterogeneous types of information; in this approach, metadata are information about the data in the individual databases and can be seen as an extension of the database schema. Metadata are usually stored either as external text files or along with the original information, while object-relational database systems could also be employed to manage them. In [18], Grosky et al. propose content-based metadata for capturing information about a media object that can be used to infer information regarding its content, and use these metadata to intelligently browse through a collection of media objects; image and video objects are used as surrogates of real-world objects and metadata are modeled as specific classes being part-of an image/video class. SMAs are a type of metadata that capture the conceptual and presentational properties of multimedia applications at the semantic level. Based on existing types of metadata and extending them (e.g.
extending content-based metadata described in [18] from image and video surrogates of real world objects to PUs), SMAs can be viewed as metadata on PUs, capturing both semantic information about the real world objects that a PU represents and information about the presentational properties of the PU. Introducing a new class as part-of any PU as a way to model metadata implies that we should put conceptual and presentational structure as normal class attributes in a metadatabase scheme, which seems rather cumbersome. The approach of DTDs in SGML and XML seems quite appropriate, as it captures structural information of multimedia documents and the XML-based recommendation SMIL [50] allows information on the synchronization of media objects to be captured as well. In a MULTIS system, the knowledge about the presentational and structural characteristics of the multimedia applications in the series is embedded in each object class and is used in the code generation process; each object “knows how to present itself”, produces its code and propagates a pertinent message to the appropriate objects of the network. Figure 1 depicts the MULTIS layered architecture; layers communicate with adjacent ones but operate independently, allowing the separation of the multimedia data and their presentation. The MULTIS approach was validated in practice within the context of the EEC funded project “Valmmeth” [53], whose aim was to demonstrate the feasibility and benefits of publishing series of multimedia applications using this technology. Hierarchies of SMAs allow the definition of higher level metadata (discussed in detail in section 3.3).

3.2 On the Abstract Multimedia Space

Abstraction has been defined [17] as a mapping between two representations of a problem which preserves certain properties. The set of concrete, original representations is the Ground space, while an Abstract space is a set of their abstract representations.
Definition 3.1: A “Ground Multimedia Space” is a set of concrete representations of multimedia applications represented in various models and languages; e.g. a set of HTML documents, a set of XML-based documents [14] or a set of Macromedia Director applications [35] each form a Ground Multimedia Space.

Definition 3.2: An Abstract Multimedia Space is a repository of Semantic Multimedia Abstractions (SMAs); we call this repository a Semantic Multimedia Abstractions Database or simply an SMA space.

Without loss of generality we use the Extended-OMT model / SMA-L for representing the “content” of the SMA space. Hence, an SMA space is a repository of Extended-OMT object models or sets of statements in SMA-L. The SMA space is a set of metadata of distributed multimedia applications. It is a repository of SMAs if metadata are stored along with the multimedia applications of the ground multimedia space.

Figure 6: Ground and Abstract Multimedia Spaces (Ground Multimedia Space: repository of multimedia applications, such as HTML/XML documents and SMIL documents, or a database of multimedia applications with its multimedia database schema; Abstract Multimedia Space (the SMA space): SMAs, metadata and generic specifications (MTS))

Figure 6 shows the two different types of existing repositories: Multimedia Repositories (Ground Multimedia Spaces), which contain multimedia applications too loosely organized to be called a database, and the SMA space (Multimedia Abstract Space), whose instances are SMAs, each representing several multimedia applications. Note that a number of concrete multimedia applications may correspond to the same SMA (Figure 7). Moreover, given the hierarchies of abstraction constructs and constraints (defined in section 3.4), a specific multimedia representation can have many SMAs at various abstraction levels.
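The two observations that close this subsection (several concrete applications share one SMA, and one application has SMAs at several abstraction levels) can be illustrated for the temporal dimension alone. The sketch below is hypothetical and deliberately reduced: a "ground" application is just a list of numeric `(start, end)` durations, level 1 keeps the Allen relation between successive durations, and level 2 keeps only sequential or parallel. The function names and the mapping of relations to parallel are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # assumed (start, end) with start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equal"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"


def coarse(relation: str) -> str:
    """Level-2 generalization (assumed): anything not sequential is parallel."""
    sequential = {"before", "meets", "after", "met-by"}
    return "sequential" if relation in sequential else "parallel"


def abstract(durations: List[Interval], level: int) -> Tuple[str, ...]:
    """Drop quantitative timing; keep qualitative relations at the given level."""
    relations = [allen_relation(a, b) for a, b in zip(durations, durations[1:])]
    return tuple(relations) if level == 1 else tuple(coarse(r) for r in relations)


app_a = [(0, 3), (0, 8), (8, 12)]
app_b = [(1, 2), (1, 9), (9, 10)]   # different timings, same level-1 abstraction
app_c = [(0, 3), (1, 8), (9, 12)]   # differs at level 1, agrees at level 2
print(abstract(app_a, 1), abstract(app_b, 1), abstract(app_c, 1))
print(abstract(app_a, 2), abstract(app_c, 2))
```

Here `app_a` and `app_b` collapse to the same level-1 description, while `app_c` joins them only at the coarser level, which is the sense in which a hierarchy of abstract spaces trades precision for a smaller search space.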
TECHNICAL REPORT No. TR99/09/06, COMPUTER TECHNOLOGY INSTITUTE, 1999

3.2.1 Querying the Abstract Multimedia Space: Requirements from Query Languages

A query on the conceptual and presentational structure of multimedia applications is itself an SMA and corresponds to the key value in the query evaluation process. The search space is the SMA space. The output of the query is the set of SMAs that “match” this key SMA. “Matching” the key SMA means that the retrieved SMAs contain a fragment which can be abstracted to the key SMA if abstraction transformations are applied to it. Hence, the query language should have at least the expressive power of the language used for the representation of the SMA space.

[Figure 7: Semantic Multimedia Abstractions from a concrete multimedia application. Panels: Multimedia Application, Semantic Multimedia Abstraction 1, Semantic Multimedia Abstraction 2. Labels recovered from the figure: scene, Image1, Image2, T{starts}, S{g_overlap}, T{meets}, T{parallel}, T{sequential}.]

Among the requirements for the query language are:
i. to support queries on combinations of conceptual and presentational properties of multimedia applications and to allow queries on complex structures of multimedia applications (by providing predicates corresponding to abstraction constructs).
ii. to support queries on temporal and spatial synchronization of multimedia applications at various abstraction levels.
iii. to allow mixing abstraction levels in a single query and logical combinations of queries.
iv.
Given that the structure of the Abstract Multimedia Space being queried is unknown to the user, the query language should provide mechanisms for efficiently filtering multimedia applications by fuzzy and incomplete queries, which, together with techniques for partial matching in query evaluation and relaxation of queries, would improve the efficiency of the query evaluation process.

Based on the proposed SMA model, a query can be defined:
a) by a set of statements in SMA-L,
b) graphically, by drawing the Extended-OMT model scheme that represents the query, or
c) visually, by an approximate description of the presentational structure of the application.

Towards this visual approach, a software tool has been developed, initially for the purpose of partially specifying the presentational structure of MULTIS systems (part of the MTS). The tool is customizable by application domain, it allows the WYSIWYG definition of spatial PUs and it produces the corresponding statements in SMA-L.

i. Queries on presentational spatial structure of PUs

Example 3.1: “Get multimedia applications containing scenes with two photos, where the first is either covered_by the second or it is within it”. The spatial constraint here can be expressed either by an OR combination of two simple spatial relationships or by a generalized relationship such as “within”.

[Figure: the query of Example 3.1 drawn twice as a scene of Image1 and Image2, with the constraint labels S{covered_by inside} and S{within}.]

ii. Queries on presentational temporal structure of PUs

Example 3.2: “Get multimedia applications containing a slide-show where each slide is synchronized with a piece of audio”

[Figure: SlideShow as a group of “sync slide_audio” PUs with T{meets} over the durations τ1, τ2, τ3, τ4; each “sync slide_audio” aggregates Image and Audio with T{equals}.]

    SELECT mm_applications MATCH (P_UNIT SlideShow)
    P_UNIT SlideShow TYPE = GROUP_OF(sync slide_audio): T{meets}
    P_UNIT sync slide_audio TYPE = AGGREGATION_OF (IMAGE,AUDIO):T{equals}

iii.
Queries on a combination of conceptual structure, data types and presentational structure of multimedia applications

Example 3.3: The Extended-OMT model and SMA-L statements of the example query given in Section 3.1 are:

[Figure: Extended-OMT model of the query. Labels recovered from the figure: English Grammar, Phrasal Verb, Phr_V_page, Video_mode, Text_mode, sync v-t (Video, Text; T{equals}, S{disjoint}) grouped with T{meets}, sync t-t (T{equals}, S{disjoint}) grouped with T{meets}.]

    SELECT Electronic Books MATCH (C_UNIT English Grammar)
    C_UNIT English Grammar TYPE = GROUP_OF(Phrasal Verb)
    C_UNIT Phrasal Verb PRESENTED_BY Phr_V_page
    P_UNIT Phr_V_page TYPE GENERIC(Video_mode, Text_mode)
    P_UNIT Video_mode TYPE GROUP_OF(sync v-t): T{meets}
    P_UNIT sync v-t TYPE AGGREGATION OF(Video, Text):T{equals}, S{disjoint}
    …..
    P_UNIT Video TYPE LINK (Video_mode, Text_mode)

iv. Approximate Queries / Queries with incomplete information on presentational properties of multimedia applications

Example 3.4: Let an SMA in an SMA space contain a spatial aggregation of the PUs: slide show, text and video. A query looking for applications containing a scene with an image and a text will not retrieve this SMA, although it represents a rather similar scene.

[Figure: a Scene in the SMA space with Image, Text and Video, next to the query Scene with Image and Text related by S{disjoint}.]

Considering such cases as queries with incomplete information in terms of the presentational structure of SMAs, the query evaluation process will search for SMAs that match the given query to a certain degree of similarity; e.g. if the similarity measure considers a slide show of images similar to one image, and a scene with image and text similar to a scene with image, text and video, the SMA will match the query. In Section 3.4 we define the appropriate abstraction transformations to allow such approximate queries.

Example 3.5: Consider again the query in Example 3.2.
A user query could be:

    P_UNIT SlideShow TYPE = GROUP_OF (AGGREGATION_OF (IMAGE,AUDIO):T{equals}): T{meets}

The PU “sync slide_audio” does not exist in the query, although it exists in the SMA as an internal PU added for modeling purposes (a direct grouping of aggregation constructs seems awkward in OMT). In this case the query evaluation process returns the above SMA by applying abstraction transformations on the SMA to abstract out the internal PU (provided the remaining SMA structure matches the user query).

v. Query relaxation

An overly restrictive query may result in an empty or irrelevant set of SMAs while the SMA space may contain similar SMAs that the user would like to retrieve. Relaxing the query by abstraction transformations might result in useful output.

3.3 Models and Techniques for Creating and Searching the Abstract Multimedia Space

The creation of the Abstract Multimedia Space from the existing repositories of multimedia applications (Ground Multimedia Space) is the basic prerequisite for using abstraction techniques in searching large multimedia repositories. Interesting as the example of the MTS is, since it implies the availability of abstract representations of sets of multimedia applications, it is not representative. The completeness and consistency of the abstract space, as well as the computational savings gained by using abstractions, are beyond the scope of this paper.

3.3.1 Abstraction in Artificial Intelligence - Related Work

Abstraction is generally defined as a mapping between a ground and an abstract space which preserves certain properties. The abstraction methods vary according to:
- the models used for the representation of the Ground and the Abstract space (formal systems, languages or graphs), which results in different types of abstraction methods: model-based abstraction, graph-based, syntactic abstractions, domain abstractions, abstractions as graph homomorphism, abstraction based on irrelevance reasoning, behavioral abstractions, time-based abstractions etc.,
- whether the languages/models of the Ground and the Abstract space are the same or not,
- the multiplicity of models used for reasoning and the use of hierarchies of abstraction spaces,
- the goal of abstraction: problem solving, planning, search etc.

Giunchiglia and Walsh in [17] present a theory of reasoning with abstraction that unifies most previous work in the area. Abstraction is defined as the process of mapping a representation of a problem (called the Ground representation) onto a new representation (called the Abstract representation), preserving certain desirable properties. For the representation of a problem they use formal systems (a formal description of a theory given as a set of formulae, which represent the statements of the theory, written in a language which is a set of well formed formulae) and abstraction is defined as a mapping between formal systems: “An abstraction, written $f : \Sigma_1 \Rightarrow \Sigma_2$, is a pair of formal systems $(\Sigma_1, \Sigma_2)$ with languages $\Lambda_1$ and $\Lambda_2$ respectively, and an effective total function $f_\Lambda : \Lambda_1 \rightarrow \Lambda_2$.”
$\Sigma_1$ is called the ground space, $\Sigma_2$ is the abstract space and f is the mapping function between them. The theory was used to study the properties of abstraction mappings, to analyze various types of abstractions and to classify previous work. A basic classification studied is whether the set of theorems of the abstract theory is a subset, superset or equal to the set of theorems of the base theory.

Domain abstractions are abstractions which map a domain (constants or function symbols) onto a smaller and simpler domain [17]. Hobbs in [20] suggests a theory of granularity in which different constants in the ground space are mapped onto (not necessarily different) constants in the abstract space according to an indistinguishability relation. Imielinski in [23] proposes domain abstractions where objects in a domain are mapped onto their equivalence classes; a domain in this work is considered to be the domain of a knowledge base and the domain abstraction is defined by means of an equivalence relation on this domain, determined by what is of interest to the user, what features the system should hide or by the degree of the required approximation. A similar approach based on irrelevance reasoning is presented by Levy in [32]; he considers the problem of simplifying a knowledge base and he presents a general schema for automatically generating abstractions tailored for a given set of queries, by deciding which aspects of a representation are irrelevant to the query and removing them from the knowledge base.

Abstraction is a widely studied technique for speeding up search and problem solving. Research in this area investigates methods for the creation of the abstract space and for the use of the abstract solution to guide search.
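The equivalence-class view of domain abstraction described in this section can be illustrated with a minimal Python sketch. The function name `domain_abstraction`, the sample objects and the choice of media type as the indistinguishability criterion are assumptions made for illustration; they are not taken from [20] or [23].

```python
from collections import defaultdict

def domain_abstraction(objects, key):
    """Map each ground object onto its equivalence class under `key`.

    Two objects are treated as indistinguishable when `key` returns the
    same value for both; that value names their abstract representative.
    """
    classes = defaultdict(list)
    for obj in objects:
        classes[key(obj)].append(obj)
    return dict(classes)

# Hypothetical ground objects: (file name, media type, duration in seconds).
ground = [
    ("intro.mpg", "video", 12),
    ("theme.wav", "audio", 12),
    ("score.gif", "image", 0),
    ("outro.mpg", "video", 7),
]

# Assumption: the user only cares about the media type, so the file name
# and the exact duration are hidden by the abstraction.
by_type = domain_abstraction(ground, key=lambda obj: obj[1])

for abstract_constant, members in by_type.items():
    print(abstract_constant, "<-", [name for name, _, _ in members])
```

Choosing a different `key` (for example, a coarse duration bucket) yields a different abstract domain over the same ground objects, which mirrors the remark that the equivalence relation is determined by what is of interest to the user.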
A very common representation of a search space (or a problem space) is the STRIPS notation [15], where nodes (states) represent sets of sentences in a formal language and operators that map one state to another represent relations between states (implicit graph representation). The idea behind guided search is to create from the original graph a “simpler” graph, to find a solution in the simpler graph and to use it to guide search in the original graph. An abstraction is a mapping from one search space to another, and a widely used type of mapping between search spaces is the homomorphism, a many-to-one mapping that preserves behavior. Homomorphism is used in the ALPINE abstraction system [31], where ordered sequences of abstraction spaces are formed by dropping certain terms from the language of a problem space, and in the STAR method of abstraction [22], where the search space is represented by a graph and abstraction is defined as a mapping between graphs based on graph homomorphism. The STAR method creates abstract spaces by selecting a state (the “hub” state) and grouping together its neighbors within a given distance (the radius of abstraction); the process is repeated until all states have been assigned to an abstract space, consequently creating a hierarchy of abstraction spaces.

In [41] Nayak and Levy propose a Semantic Theory of Abstraction as a different approach to the syntactic theory of abstraction. Abstraction is defined as a model-level mapping (i.e. the decision what to abstract is made at the model level, using knowledge about relevant aspects of the domain) from a detailed theory to an abstract one and is viewed as a two-step process where the intended domain model is first abstracted and then a set of abstract formulas is constructed to capture the abstracted domain model.
A special case of this Semantic Theory of Abstraction is presented in [4], where structural and behavioral abstractions are defined for reducing complexity in diagnosis. Structural abstraction is the abstraction where an abstracted component is created by grouping several subcomponents together and whose behavioral description can be derived from the behavioral description of its constituents. Behavioral abstraction is the abstraction where a component is assigned behavioral models at varying levels of precision and whose behavioral model can be automatically derived by using abstraction axioms. In this work, the detailed and the abstract model are both represented in the same language.

To overcome the restrictions of abstractions by dropping sentences, and mainly the dependency of the abstraction’s effectiveness on the representation of the domain (knowledge engineers had to represent a domain in a certain way that is not always feasible), Bergmann and Wilke [5] proposed abstractions with complete change of representation language at the abstract level in which the detail is reduced (e.g. by abstracting the quantitative value expressed in the sentence towards a qualitative representation). The prerequisites of the approach are the definition of the abstract language by the domain expert and a domain abstraction theory in which the set of admissible ways of abstracting states is predefined.

3.3.2 On the creation of the SMA space (Abstract Multimedia Space)

As a basis for the discussion on the creation of the SMA space, we first define the Ground and Abstract Multimedia Space. Let $M_L$ denote the representation of a multimedia application M in a language/model L. Let also $\mathcal{M}_L$ denote a set of representations of multimedia applications in L.
Definition 3.3: If $L_g$ is a language/model for the representation of concrete instances of multimedia applications (e.g. HTML, Macromedia Director [35], XML [14]) then $\mathcal{M}_{L_g}$ is a Ground Multimedia Space. If $L_a$ is a language/model for the representation of SMAs then $\mathcal{M}_{L_a}$ is an SMA space. $\mathcal{M}_{SMA\text{-}L}$ denotes an SMA space where SMA-L is used for the representation of SMAs.

Definition 3.4: An SMA transformation F is a function $F : \mathcal{M}_{L_1} \rightarrow \mathcal{M}_{L_2}$ which preserves the conceptual and presentational properties at the semantic level of all multimedia applications in $\mathcal{M}_{L_1}$, and for which $F(m) \in \mathcal{M}_{L_2}$ is “simpler” than $m \in \mathcal{M}_{L_1}$.

Two special cases of SMA transformations are of particular interest. The first is when $\mathcal{M}_{L_1}$ is a ground multimedia space and $\mathcal{M}_{L_2}$ an SMA space, in which case F is a base SMA transformation denoted $F_{base}$; for instance, $F_{base} : \mathcal{M}_{L_g} \rightarrow \mathcal{M}_{SMA\text{-}L}$. The second is when $L_1 = L_2$, i.e. $\mathcal{M}_{L_1}$ and $\mathcal{M}_{L_2}$ are both SMA spaces with the same language, and F is an SMA-to-SMA transformation; for instance, $F : \mathcal{M}_{SMA\text{-}L} \rightarrow \mathcal{M}_{SMA\text{-}L}$.

For the representation of concrete multimedia applications there exist a variety of languages/models [35], [19], [33], [10] and standards [49], [14], [50]. The base SMA transformation is an abstraction by change of the representation language, an approach similar to that proposed in [5] (briefly described in Sect. 3.3.1); it follows the model-based abstraction proposed in [41] (a two-step process where the first stage is the definition of SMA-L that captures an intended SMA space). To complete the definition of an $F_{base}$ we should define all the admissible abstraction mappings; however, to do this we must use a specific $L_g$ for the ground multimedia space. In Section 3.5 we present a prototype system we developed for the creation of an Abstract Multimedia Space: we take $L_g$ to be XML-based languages (XML DTDs and SMIL documents) and $L_a$ to be SMA-L, and we define the set of admissible abstraction mappings for the $F_{base}$ SMA transformation.
In this specific example, for the complete definition of an $F_{base}$ SMA transformation we considered the following types of admissible abstraction mappings (similar types of abstraction mappings can be defined for other $L_g$s).
1. Domain abstraction mappings, where objects considered "identical" in terms of certain qualities or characteristics are mapped onto their equivalence class, e.g. media objects onto their equivalent <content data types>, real-world objects onto their equivalent <c_unit_type>.
3. Spatial abstraction mappings, where a concrete domain of quantitative spatial values (i.e. exact positions of spatial PUs) is mapped onto a domain of spatial relations (qualitative representation).
4. Structural abstraction mappings, where a) a group of (similar) PUs is mapped onto a higher level PU, their “grouping”, or b) a set of component PUs is mapped onto a higher level PU, their aggregate. Structural abstraction mappings apply to logical, temporal and spatial Presentational Views (see Definition 2.2).

The $F_{base}$ SMA transformation was defined as a mapping from sets of valid statements in an $L_g$ to a set of valid statements representing PUs in SMA-L. A first step towards the complete definition of an $F_{base}$ is to identify PUs in $L_g$. For instance, Elements in an XML DTD or statements within <smil> </smil> tags in SMIL define PUs. A PU is considered the basic unit on which the $F_{base}$ SMA transformation is to be defined, and it corresponds to a sentence in a language-based abstraction approach or to a state in a graph-based approach.

3.4 SMA-to-SMA Transformations and SMA Hierarchies

Creating hierarchies of abstraction spaces by repeating the abstraction process on the generated abstract spaces is a technique used in the field of AI for approximate search and, in general, for reasoning with approximation. In this section, we introduce the basics for generating hierarchies of SMA spaces by applying abstraction transformations (beyond the base SMA transformation $F_{base}$).
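To make the two levels concrete, the Python sketch below first maps exact screen rectangles onto a qualitative spatial relation (in the spirit of the spatial abstraction mappings of Section 3.3.2) and then applies one further generalisation step of the kind this section introduces. The rectangle model, the function names and the boundary conventions are assumptions: the relations follow the usual topological reading for axis-aligned rectangles, not the formal definitions of Section 2, and `GENERALISED` covers only a subset of the relations.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    x1: float  # left edge
    y1: float  # top edge
    x2: float  # right edge, assumed greater than x1
    y2: float  # bottom edge, assumed greater than y1

def spatial_relation(a: Rect, b: Rect) -> str:
    """Qualitative relation of rectangle a with respect to rectangle b."""
    if a == b:
        return "equal"
    # No point in common at all.
    if a.x2 < b.x1 or b.x2 < a.x1 or a.y2 < b.y1 or b.y2 < a.y1:
        return "disjoint"
    # Only boundary points in common: the interiors do not intersect.
    if a.x2 <= b.x1 or b.x2 <= a.x1 or a.y2 <= b.y1 or b.y2 <= a.y1:
        return "meet"
    a_in_b = b.x1 <= a.x1 and a.x2 <= b.x2 and b.y1 <= a.y1 and a.y2 <= b.y2
    b_in_a = a.x1 <= b.x1 and b.x2 <= a.x2 and a.y1 <= b.y1 and b.y2 <= a.y2
    if a_in_b:
        strict = b.x1 < a.x1 and a.x2 < b.x2 and b.y1 < a.y1 and a.y2 < b.y2
        return "inside" if strict else "covered_by"
    if b_in_a:
        strict = a.x1 < b.x1 and b.x2 < a.x2 and a.y1 < b.y1 and b.y2 < a.y2
        return "contains" if strict else "covers"
    return "overlap"

# Assumed subset of a generalisation relation: simple relations that the
# generalized relationship "within" stands for in Example 3.1.
GENERALISED = {"inside": "within", "covered_by": "within"}

def generalise(relation: str) -> str:
    """One further abstraction step on an already qualitative relation."""
    return GENERALISED.get(relation, relation)

screen = Rect(0, 0, 10, 10)
photo = Rect(2, 2, 4, 4)
level_1 = spatial_relation(photo, screen)   # base mapping: exact positions dropped
level_2 = generalise(level_1)               # second abstraction level
print(level_1, level_2)
```

Both levels are expressed with the same vocabulary of relation names, which is the property the rest of this section relies on when it assumes one language for all abstraction levels.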
Although hierarchies of abstraction spaces can be generated by changing the representation language, here we assume the same language $L_a$ for all abstraction levels in an SMA space and exemplify it by SMA-L. The approach of applying transformation rules has also been followed in [26] for the purpose of answering queries in terms of similarity of objects (objects that approximately match a pattern). A related approach for answering and relaxing queries on structured multimedia database systems is presented in [36]. A complete definition of an SMA-to-SMA transformation is given by defining all the admissible abstraction mappings (admissible SMA-to-SMA transformations).

3.4.1 Types of admissible SMA-to-SMA transformations

1. Relaxing Constraints by adding a disjunct or dropping a conjunct: Let C be a set of constraints in SMA-L. The admissible SMA-to-SMA transformations for relaxing constraints $c \in C$ are:
a. Adding a disjunct $F_{dis}$: $\forall c, c' \in C$: $c \vee c'$ is a relaxation of $c$. The transformation rule maps a constraint onto a more generic one according to a generalization relation. SMA-to-SMA transformations by adding a disjunct are defined on temporal and spatial constraints (noted $F^{t}_{dis}$ and $F^{s}_{dis}$ respectively).
b. Dropping a conjunct $F_{drop}$: $\forall c, c' \in C$: $c$ is a relaxation of $c \wedge c'$. This implies also that $\forall c \in C$, $\emptyset$ is a relaxation of $c$. SMA-to-SMA transformations by dropping a conjunct are defined for logical, spatial and temporal constraints (noted $F^{l}_{drop}$, $F^{s}_{drop}$, $F^{t}_{drop}$ respectively).

2. Structural SMA-to-SMA Transformations on abstraction constructs are:
a. Unifying Structural SMA-to-SMA transformations $F_{struct}$: this transformation rule maps a logical, temporal or spatial set of PUs which form an aggregation or a grouping to a single PU.
b. Dropping elements of aggregates $F^{drop}_{struct}$: it is defined for component PUs of aggregation constructs.

3. Relaxing Content Data types by adding a disjunct: Let D be a set of Content Data Types in SMA-L. The admissible SMA-to-SMA transformation for relaxing content data types $d \in D$ is: Adding a disjunct $F^{d}_{dis}$: $\forall d, d' \in D$: $d \vee d'$ is a relaxation of $d$. The transformation rule maps a content data type onto a more generic one according to a generalization relation.

To specify these types of admissible SMA-to-SMA transformations on SMAs, we first define SMA-to-SMA transformations on PUs based on the relation $\succeq$ (“more or equally abstract than”) on presentational properties of PUs. SMA transformations on SMAs are synthesis of SMA transformations on their PUs.

3.4.2 Relaxing Constraints by adding a disjunct: $F_{dis}$

On <temporal constraints>: $F^{t}_{dis}$

Definition 3.5: If T is the domain of temporal constraints, the relation $\succeq$ “more or equally abstract than” on T is defined as follows: $\forall t, t' \in T : t \vee t' \succeq t$.
For instance, if T = {meets, met-by, before, after, during, contains, overlaps, overlapped-by, starts, started-by, finishes, finished-by, equal, sequential, parallel} then:
(1) sequential $\succeq t$ for $t \in$ {meets, met-by, before, after, sequential}, where sequential $=$ meets $\vee$ met_by $\vee$ before $\vee$ after
(2) parallel $\succeq t$ for $t \in$ {during, contains, overlaps, overlapped-by, starts, started-by, finishes, finished-by, equal}, where parallel $=$ during $\vee$ contains $\vee$ overlaps $\vee$ overlapped_by $\vee$ starts $\vee$ started_by $\vee$ finishes $\vee$ finished_by $\vee$ equals
(3) $\forall t \in T$, “not $t$” $\succeq T - \{x \mid t \succeq x\}$ (e.g. “not parallel” $\succeq$ any relationship of the set {meets, met-by, before, after, sequential})
We denote with $\succeq^{*}$ the transitive closure of the relation $\succeq$.

Let $pu$ denote the representation of a PU in SMA-L. Let also $Pu$ denote a set of representations of PUs in SMA-L.

Definition 3.6: Let $pu : T\{t_1, t_2 \ldots t_n\}$ denote a $pu \in Pu$ with a set of <temporal constraints> $t_1, t_2 \ldots t_n$.
$F^{t}_{dis} : Pu \rightarrow Pu$, $pu' : T\{t'_1, t'_2 \ldots t'_n\} = F^{t}_{dis}(pu : T\{t_1, t_2 \ldots t_n\})$ iff $\forall t_i, t'_i \in T$, $t'_i \succeq^{*} t_i$

On <spatial constraints>: $F^{s}_{dis}$

Definition 3.7: If S is the domain of spatial constraints, the relation $\succeq$ “more or equally abstract than” on S is defined as follows: $\forall s, s' \in S : s \vee s' \succeq s$.
For instance, if S = {disjoint, meet, overlap, covered_by, covers, inside, contains, equal, g_overlap, within, b_disjoint, b_meets, b_overlap}, then:
(1) b_overlap $\succeq s$ for $s \in$ {meets, overlap, covered_by, covers, equal, b_overlap}
(2) b_disjoint $\succeq s$ for $s \in$ {disjoint, inside, b_disjoint}
(3) b_meets $\succeq s$ for $s \in$ {meets, covered_by, covers, equal, b_meets}
(4) within $\succeq s$ for $s \in$ {inside, covered_by, covers, equal, within}
(5) g_overlap $\succeq s$ for $s \in$ {b_overlap, within, g_overlap}
(6) $\forall s \in S$, “not $s$” $\succeq S - \{x \mid s \succeq x\}$

Definition 3.8: Let $pu : S\{s_1, s_2 \ldots s_n\}$ denote a $pu \in Pu$ with a set of <spatial constraints> $s_1, s_2 \ldots s_n$.
$F^{s}_{dis} : Pu \rightarrow Pu$, $pu' : S\{s'_1, s'_2 \ldots s'_n\} = F^{s}_{dis}(pu : S\{s_1, s_2 \ldots s_n\})$ iff $\forall s_i, s'_i \in S$, $s'_i \succeq^{*} s_i$

3.4.3 Relaxing Constraints by Dropping a conjunct: $F_{drop}$

Definition 3.9: Let T, S, L be sets of temporal, spatial and logical constraints in SMA-L. The relation $\succeq$ “more or equally abstract than” on T, S, L is defined as follows:
$\forall t \in T : \emptyset \succeq t$ and $\forall t, t' \in T : t \succeq t \wedge t'$
$\forall s \in S : \emptyset \succeq s$ and $\forall s, s' \in S : s \succeq s \wedge s'$
$\forall l \in L : \emptyset \succeq l$ and $\forall l, l' \in L : l \succeq l \wedge l'$

Definition 3.10: Let $pu : \{l\}, T\{t\}, S\{s\}$ denote a $pu \in Pu$ with sets of <logical>, <temporal>, <spatial> constraints, where $t = t_1 \wedge t_2 \wedge \ldots \wedge t_k$, $s = s_1 \wedge s_2 \wedge \ldots \wedge s_n$ and $l = l_1 \wedge l_2 \wedge \ldots \wedge l_m$.
$F^{l,t,s}_{drop} : Pu \rightarrow Pu$, $pu' : \{l'\}, T\{t'\}, S\{s'\} = F^{l,t,s}_{drop}(pu : \{l\}, T\{t\}, S\{s\})$ iff $t' \succeq^{*} t$ and $s' \succeq^{*} s$ and $l' \succeq^{*} l$

3.4.4 Unifying Structural SMA-to-SMA transformations: $F_{struct}$

On Aggregation Constructs: $F^{t\_a}_{struct}$ and $F^{s\_a}_{struct}$

Unifying Structural SMA-to-SMA Transformations on Temporal Aggregation $F^{t\_a}_{struct}$ follow; definitions for Spatial Aggregation $F^{s\_a}_{struct}$ are similar.

Definition 3.11: Let $pu_1, pu_2 \ldots pu_n$ be temporal PUs and $pu_1.\tau, pu_2.\tau \ldots pu_n.\tau$ their presentational durations. The relation $\succeq$ “more or equally abstract than” on temporal PUs is defined as follows: If $\exists pu \in Pu : pu = A^{T}(pu_1, pu_2 \ldots pu_n)$ according to Definition 2.6, then $pu \succeq \langle pu_1, pu_2 \ldots pu_n \rangle$ (where $pu \succeq \langle pu_1, pu_2 \ldots pu_n \rangle$ denotes that $pu$ is “more or equally abstract than” the whole set of PUs $\langle pu_1, pu_2 \ldots pu_n \rangle$).

Definition 3.12: Let $A^{T}_1, A^{T}_2 \ldots A^{T}_n$ be temporal aggregation PUs and $A^{T}_1.\tau, A^{T}_2.\tau \ldots A^{T}_n.\tau$ their presentational durations. If $\exists pu \in Pu : pu = A^{T}(A^{T}_1, A^{T}_2 \ldots A^{T}_n)$ according to Definition 2.6, then $pu \succeq \langle A^{T}_1, A^{T}_2 \ldots A^{T}_n \rangle$.
(A set of temporal aggregations can be abstracted to a single, higher level, temporal aggregation if the set of its constituent temporal aggregations forms a higher level temporal aggregation.)

Definition 3.13: Let $pu = A^{T}(A^{T}_1, A^{T}_2 \ldots A^{T}_n)$ and $pu' = A^{T}(A^{\prime T}_1, A^{\prime T}_2 \ldots A^{\prime T}_m)$ be two PUs, $pu, pu' \in Pu$, where $m \le n$.
$F^{t\_a}_{struct} : Pu \rightarrow Pu$, $pu' = F^{t\_a}_{struct}(pu)$ iff $\forall A^{\prime T}_i \in pu'$, $\exists A^{T}_i \ldots A^{T}_{i+k} \in pu$, $k \ge 0$ : $A^{\prime T}_i \succeq^{*} \langle A^{T}_i \ldots A^{T}_{i+k} \rangle$

On Grouping constructs: $F^{t\_g}_{struct}$ and $F^{s\_g}_{struct}$

Unifying Structural SMA-to-SMA Transformations on Temporal Grouping $F^{t\_g}_{struct}$ follow; definitions for Spatial Grouping $F^{s\_g}_{struct}$ are similar.

Definition 3.14: Let $pu_1, pu_2 \ldots pu_n$ be temporal PUs and $pu_1.\tau, pu_2.\tau \ldots pu_n.\tau$ their presentational durations. The relation $\succeq$ “more or equally abstract than” is defined as follows: If $\exists pu \in Pu : pu = G^{T}(pu_1, pu_2 \ldots pu_n)$ according to Definition 2.7, then $pu \succeq \langle pu_1, pu_2 \ldots pu_n \rangle$.

Definition 3.15: Let $G^{T}_1, G^{T}_2 \ldots G^{T}_n$ be temporal groupings (PUs), $G^{T}_1.\tau, G^{T}_2.\tau \ldots G^{T}_n.\tau$ their presentational durations and $R_{t1}, R_{t2} \ldots R_{tn}$ their temporal relationships. If $\exists pu \in Pu : pu = G^{T}(G^{T}_1, G^{T}_2 \ldots G^{T}_n)$ according to Definition 2.7, then $pu \succeq \langle G^{T}_1, G^{T}_2 \ldots G^{T}_n \rangle$.
(A set of temporal groupings can be abstracted to a single, higher level, temporal grouping if the set of its member temporal groupings forms a higher level temporal grouping; the temporal relationship of the higher level temporal grouping is a generalized temporal relationship of the member temporal groupings.)

Definition 3.16: Let $pu = G^{T}(G^{T}_1, G^{T}_2 \ldots G^{T}_n)$ and $pu' = G^{T}(G^{\prime T}_1, G^{\prime T}_2 \ldots G^{\prime T}_m)$ be two PUs, $pu, pu' \in Pu$, where $m \le n$.
$F^{t\_g}_{struct} : Pu \rightarrow Pu$, $pu' = F^{t\_g}_{struct}(pu)$ iff $\forall G^{\prime T}_i \in pu'$, $\exists G^{T}_i \ldots G^{T}_{i+k} \in pu$, $k \ge 0$ : $G^{\prime T}_i \succeq^{*} \langle G^{T}_i \ldots G^{T}_{i+k} \rangle$

3.4.5 SMA-to-SMA transformations by Dropping elements (components) of aggregate constructs: $F^{drop}_{struct}$

Definition 3.17: Let $U = \{pu_1, pu_2 \ldots pu_n\}$ and $U' = \{pu'_1, pu'_2 \ldots pu'_m\}$ be two sets of PUs, $pu_i, pu'_j \in Pu$, $i = 1 \ldots n$, $j = 1 \ldots m$, $m \le n$. The relation $\succeq$ “more or equally abstract than” on U is defined as follows: If $U' \subseteq U$ then $U' \succeq U$.

Definition 3.18: Let $pu = A(U)$ and $pu' = A(U')$ be two PUs where $pu, pu' \in Pu$ (aggregations with sets of PUs $U$ and $U'$).
$F^{drop}_{struct} : Pu \rightarrow Pu$, $pu' = F^{drop}_{struct}(pu)$ iff $U' \succeq^{*} U$

3.4.6 Relaxing Content Data types by adding a disjunct: $F^{d}_{dis}$

Definition 3.19: If D is the domain of content data types, the relation $\succeq$ “more or equally abstract than” on D is defined as follows: $\forall d, d' \in D : d \vee d' \succeq d$.
For instance, if D = {CONTENT, MULTIPLEXED_CONTENT, INPUT, OUTPUT, VISUAL OBJECT, IMAGE, VIDEO, AUDIO, ANIMATION, TEXT, GRAPHICS, PICKER, HOTSPOT, SELECTABLE_CONTENT, STRING, VALUATOR, SELECTOR, MENU, EVENTER, BUTTON, SLIDESHOW} then:
(1) CONTENT $\succeq d$ for $d \in$ {VISUAL OBJECT, IMAGE, VIDEO, AUDIO, ANIMATION, TEXT, GRAPHICS, CONTENT}
(2) MULTIPLEXED_CONTENT $\succeq d$ for $d \in$ {VIDEO, ANIMATION, SLIDESHOW, MULTIPLEXED_CONTENT}
(3) INPUT $\succeq d$ for $d \in$ {PICKER, HOTSPOT, SELECTABLE_CONTENT, STRING, VALUATOR, SELECTOR, MENU, EVENTER, BUTTON, INPUT}
(4) OUTPUT $\succeq d$ for $d \in$ {CONTENT, VISUAL OBJECT, IMAGE, VIDEO, AUDIO, ANIMATION, TEXT, GRAPHICS, OUTPUT}
(5) PICKER $\succeq d$ for $d \in$ {HOTSPOT, SELECTABLE_CONTENT, PICKER}
(6) SELECTOR $\succeq d$ for $d \in$ {MENU, SELECTOR}
(7) EVENTER $\succeq d$ for $d \in$ {BUTTON, EVENTER}
(8) $\forall d \in D$, “not $d$” $\succeq D - \{x \mid d \succeq x\}$

Definition 3.20: Let $pu\{d_1, d_2 \ldots d_n\}$ denote a $pu \in Pu$ with a set of <content data types> $d_1, d_2 \ldots d_n$ (e.g. $pu = A(d_1, d_2 \ldots d_n)$).
$F^{d}_{dis} : Pu \rightarrow Pu$, $pu'\{d'_1, d'_2 \ldots d'_n\} = F^{d}_{dis}(pu\{d_1, d_2 \ldots d_n\})$ iff $\forall d_i, d'_i \in D$, $d'_i \succeq^{*} d_i$

3.4.7 SMA-to-SMA transformations on SMAs

An SMA is a sequence of presentational units (p_unit) and conceptual units (c_unit). A representation of an SMA in SMA-L, $M_{SMA\text{-}L}$, is a sequence of c_unit and p_unit declarations: $M_{SMA\text{-}L} = \langle cu_1, cu_2 \ldots cu_n, pu_1, pu_2 \ldots pu_m \rangle$.

Definition 3.21: If $F_1, F_2 \ldots F_n$ are SMA-to-SMA transformations, of any of the above types, on $Pu$, then $F = F_1 \circ F_2 \circ \ldots \circ F_n$ is also an SMA-to-SMA transformation on $Pu$ (synthesis of SMA transformations).
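The relation on content data types, its closure and the synthesis of transformations (Definitions 3.19 to 3.21) can be sketched in Python as follows. The table `MORE_ABSTRACT` copies the examples (1) to (7) of Definition 3.19; the representation of a PU as a tuple of content data types and the helper names `abstracts`, `is_relaxation`, `replace_types` and `compose` are assumptions made for this illustration, not part of SMA-L.

```python
# Direct "more abstract than" pairs, copied from examples (1)-(7) of Definition 3.19.
MORE_ABSTRACT = {
    "CONTENT": {"VISUAL OBJECT", "IMAGE", "VIDEO", "AUDIO", "ANIMATION", "TEXT", "GRAPHICS"},
    "MULTIPLEXED_CONTENT": {"VIDEO", "ANIMATION", "SLIDESHOW"},
    "INPUT": {"PICKER", "HOTSPOT", "SELECTABLE_CONTENT", "STRING", "VALUATOR",
              "SELECTOR", "MENU", "EVENTER", "BUTTON"},
    "OUTPUT": {"CONTENT", "VISUAL OBJECT", "IMAGE", "VIDEO", "AUDIO", "ANIMATION",
               "TEXT", "GRAPHICS"},
    "PICKER": {"HOTSPOT", "SELECTABLE_CONTENT"},
    "SELECTOR": {"MENU"},
    "EVENTER": {"BUTTON"},
}

def abstracts(general, specific, relation=MORE_ABSTRACT):
    """True if `general` is more or equally abstract than `specific`
    under the reflexive, transitive closure of `relation`."""
    if general == specific:
        return True
    seen, stack = set(), [general]
    while stack:
        current = stack.pop()
        for child in relation.get(current, ()):
            if child == specific:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False

def is_relaxation(relaxed, original):
    """Condition of Definition 3.20, checked position by position."""
    return len(relaxed) == len(original) and all(
        abstracts(r, o) for r, o in zip(relaxed, original)
    )

def replace_types(mapping):
    """Hypothetical transformation on a PU given as a tuple of data types."""
    return lambda pu: tuple(mapping.get(d, d) for d in pu)

def compose(*transformations):
    """F = F1 o F2 o ... o Fn: the rightmost transformation is applied first."""
    def composed(pu):
        for f in reversed(transformations):
            pu = f(pu)
        return pu
    return composed

image_to_content = replace_types({"IMAGE": "CONTENT"})
content_to_output = replace_types({"CONTENT": "OUTPUT"})
relax = compose(content_to_output, image_to_content)

pu = ("IMAGE", "AUDIO")
relaxed = relax(pu)
print(relaxed, is_relaxation(relaxed, pu))
```

The closure is computed on demand rather than stored, which keeps the table identical to the examples in the text; a precomputed closure would be a reasonable alternative for a large set of data types.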
Definition 3.22: If $M_{SMA\text{-}L} = \langle cu_1, cu_2 \ldots cu_n, pu_1, pu_2 \ldots pu_m \rangle$ then an SMA transformation $F : \mathcal{M}_{SMA\text{-}L} \rightarrow \mathcal{M}_{SMA\text{-}L}$ is:
$F(M_{SMA\text{-}L}) = F\langle cu_1, cu_2 \ldots cu_n, pu_1, pu_2 \ldots pu_m \rangle = \langle F(cu_1), F(cu_2) \ldots F(cu_n), F(pu_1), F(pu_2) \ldots F(pu_m) \rangle$

Definition 3.23: If $M_{SMA\text{-}L}$ and $M'_{SMA\text{-}L}$ are two representations of an SMA, then $M'_{SMA\text{-}L} \succeq M_{SMA\text{-}L}$ iff $\exists F : M'_{SMA\text{-}L} = F(M_{SMA\text{-}L})$

3.4.8 Abstraction Hierarchies in an SMA space

The following hold for the relation $\succeq$ in an SMA space:
i. For all $M_{L_a} \in \mathcal{M}_{L_a}$, $M_{L_a} \succeq M_{L_a}$ (each SMA representation matches itself)
ii. For any $M1_{L_a}, M2_{L_a}, M3_{L_a} \in \mathcal{M}_{L_a}$, if $M1_{L_a} \succeq M2_{L_a}$ and $M2_{L_a} \succeq M3_{L_a}$ then $M1_{L_a} \succeq M3_{L_a}$
iii. For any $M1_{L_a}, M2_{L_a} \in \mathcal{M}_{L_a}$, if $M1_{L_a} \succeq M2_{L_a}$ and $M2_{L_a} \succeq M1_{L_a}$ then $M1_{L_a} = M2_{L_a}$

The relation $\succeq$ “more or equally abstract than” on $\mathcal{M}_{L_a}$ is reflexive (i), transitive (ii) and antisymmetric (iii), so $\succeq$ is a partial ordering on $\mathcal{M}_{L_a}$. If an SMA space $\mathcal{M}_{L_a}$ is a partially ordered set of SMAs with respect to $\succeq$, then an SMA-to-SMA transformation F on $\mathcal{M}_{L_a}$ defines a new partially ordered set with respect to $\succeq$. Note that if an SMA-to-SMA transformation F is not meaningful when applied to an $\mathcal{M}_{L_a}$ then $F(\mathcal{M}_{L_a}) = \mathcal{M}_{L_a}$.

3.4.9 On Searching the abstract multimedia space: Query evaluation

A query to the SMA space $\mathcal{M}_{L_a}$ is an SMA $M_{Query\text{-}L}$ represented in a query language Query-L: $Q = M_{Query\text{-}L}$. Answers to the query Q are all the SMAs $M_{L_a}$ for which there exists an SMA transformation F such that $F(M_{L_a}) = Q$. As a null transformation is an SMA transformation, SMAs that match the query with no SMA transformations are also answers to the query. The query language Query-L could be either a language for the representation of the Multimedia Ground Space $L_g$ or a language for the representation of SMAs such as SMA-L.
i.
If Q M Lg then the query is first abstracted onto an SMA representation M La by applying an Fbase SMA transformation: Fbase (Q) Q , where Q M La . Q is then used as the key value for searching the SMA space M La . Hence, instead of solving the query in the complex Ground Multimedia Space, the query is mapped onto and solved in the Abstract Multimedia Search Space (an SMA space). The solution is either used to guide search in the ground space (guided search) or contain links to the multimedia applications that fall under these SMAs (approximate answers). Relaxation of queries If there is no SMA transformation F on M La such as F ( M La ) Q , then the query can be “relaxed” by applying SMA-to-SMA transformations on it. This can be used for answering approximate queries. Definition 3.26: Similarity distance is the number of SMA-to-SMA transformations applied to a query Q or an SMA M La : F ( M La ) Q. Satisfying the need for mixed abstraction levels and the principles of Human Remembering The proposed abstraction model allows queries that satisfy the parameters listed in 3.2.1 by means of the following: the use of the same language in the hierarchy of abstraction levels within the SMA space which allows users mix the abstraction levels in the same query, the SMA-to-SMA transformations, which allow users decide on the tightness of the constraints, on the complexity of abstraction constructs, on the temporal and spatial grain size etc. the various types of admissible SMA transformations, addressing different Presentational Views of multimedia applications. The presentational View is determined by applying the relevant SMA transformation on those properties of SMAs that are considered of less importance for the specific goal. 
the use of a basic unit (PU) for structuring SMAs and the definition of SMA transformations on PUs; this allows the decision about the PU to determine the desired grain size of the presentational structure in a way conformant to user’s perception of the application’s presentational structure. 3.5 Application: Wrapping presentational structure of XML-based web documents Extensible Markup Language (XML)[14], the standard developed by the W3C as a subset of SGML[49], enables users design their own markup languages and use them to create structured multimedia documents. The formal definition of a particular class of documents (i.e., the markup language for this class of documents) is described in the Document Type Definition (DTD), which is an attribute grammar. XML documents conform to their DTD; ____________________________________________________________________________________________ TECHNICAL REPORT No. TR99/09/06 33 COMPUTER TECHNOLOGY INSTITUTE 1999 document components are defined as XML “Elements”; nested elements in a DTD form a tree structure. Adding semantics for specific domains is achieved via a DTD. Synchronized Multimedia Integration Language (SMIL) [50] allows the integration of a set of independent multimedia objects into a synchronized multimedia presentation. The syntax of SMIL documents is defined by an XML DTD. As mentioned in Sect. 3.3.2 the complete definition of a basic SMA transformation Fbase requires a specific language Lg for the representation of the Ground Multimedia Space. In order to create an Abstract Multimedia Space and experiment with it, we considered XML-based languages for the multimedia ground space (XML DTDs and SMIL documents) and SMA-L for the SMA space, defined the set of admissible abstraction mappings for Fbase , and implemented a prototype system for automatically creating SMAs from XML DTS and SMIL documents [51]. 
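The matching and relaxation scheme of Sect. 3.4.9, which any such abstract space has to support, can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an SMA is encoded as a tuple of unit labels and that each SMA-to-SMA transformation acts on single units and is lifted component-wise as in Definition 3.22. The two transformations and all function names are hypothetical; they are not part of SMA-L or of the prototype in [51]. The similarity distance of Definition 3.26 is taken here as the smallest number of transformations that turns an SMA into the query, with distance 0 corresponding to the null transformation.

```python
from collections import deque
from typing import Callable, Dict, Optional, Tuple

SMA = Tuple[str, ...]            # illustrative encoding: an SMA as a tuple of unit labels
UnitMap = Callable[[str], str]   # a transformation defined on a single unit


def apply_transformation(f: UnitMap, sma: SMA) -> SMA:
    """Lift f to a whole SMA by applying it to every unit (cf. Definition 3.22)."""
    return tuple(f(unit) for unit in sma)


def similarity_distance(sma: SMA, query: SMA,
                        transformations: Dict[str, UnitMap],
                        max_steps: int = 3) -> Optional[int]:
    """Return the fewest transformations that turn sma into query, or None
    if the query is not reached within max_steps (breadth-first search)."""
    frontier = deque([(sma, 0)])
    seen = {sma}
    while frontier:
        current, steps = frontier.popleft()
        if current == query:
            return steps                 # steps == 0 is the null transformation
        if steps == max_steps:
            continue                     # do not relax any further
        for f in transformations.values():
            candidate = apply_transformation(f, current)
            if candidate not in seen:
                seen.add(candidate)
                frontier.append((candidate, steps + 1))
    return None


# Hypothetical abstraction steps, invented for this example only.
def generalize_media(unit: str) -> str:
    return {"video": "visual", "image": "visual"}.get(unit, unit)


def drop_media_type(unit: str) -> str:
    return "media" if unit in {"visual", "audio", "text"} else unit


TRANSFORMATIONS: Dict[str, UnitMap] = {
    "generalize_media": generalize_media,
    "drop_media_type": drop_media_type,
}
```

Under these assumptions, the SMA ("video", "audio") answers the query ("visual", "audio") at distance 1 and the query ("media", "media") at distance 2. The bound `max_steps` is a design choice of the sketch that keeps the relaxation from running indefinitely.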
The system consists of:

a) An “XML DTD-to-SMA” wrapper, which takes as input an XML DTD, parses it, extracts the conceptual structure and part of the presentational structure of multimedia documents and produces an SMA-L representation of them:

$$F : \mathcal{M}_{XML\_DTD} \rightarrow \mathcal{M}_{SMA\text{-}L}$$

where DTD is a set of Element type definitions in XML. Each Element type is mapped onto either a conceptual or a presentational unit in the corresponding SMA, depending on its attribute list.

b) A “SMIL document-to-SMA” wrapper, which takes as input a SMIL document, parses it, extracts from SMIL statements the presentational temporal and spatial structure of multimedia documents (including temporal and spatial constraints) and produces the SMA representation of them:

$$F : \mathcal{M}_{SMIL} \rightarrow \mathcal{M}_{SMA\text{-}L}$$

An example of an XML DTD and a SMIL document and their corresponding representations in SMA-L are given below. Higher abstraction levels can be achieved by applying SMA-to-SMA transformations to the generated SMAs.

XML DTD

    <!ELEMENT landmark (Hotel | Museum | Castle)* >
    <!ELEMENT Hotel (Room+, Hall*, Facilities*)>
    <!ATTLIST Hotel src CDATA #REQUIRED name CDATA #REQUIRED>
    <!ELEMENT Room EMPTY >
    <!ATTLIST Room src CDATA #REQUIRED >
    <!ELEMENT Hall (#PCDATA) >
    <!ELEMENT Facilities (B+ | C ) >
    ……

SMA representation in SMA-L

    C_UNIT landmark TYPE GROUP OF (GENERIC(Hotel, Museum, Castle)):{0 or more}
    C_UNIT Hotel TYPE AGGREGATION OF ((GROUP OF Room):{1 or more}, GROUP OF Hall:{0 or more}, GROUP OF Facilities:{0 or more})
        PRESENTED_BY AGGREGATION OF (CONTENT, name)
    P_UNIT name TYPE TEXT
    C_UNIT Room PRESENTED_BY CONTENT
    C_UNIT Hall PRESENTED_BY TEXT
    C_UNIT Facilities TYPE GENERIC (GROUP OF B:{1 or more}, C)
    ……
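The element-to-unit mapping shown in the DTD example can be approximated by a short Python sketch. It is not the wrapper of [51]: it handles only flat content models, ignores attribute lists (so no PRESENTED_BY clause is produced for elements such as Hotel) and rejects EMPTY elements. The function names and the regular expression are assumptions made for illustration. Choices are mapped to GENERIC, sequences to AGGREGATION OF, and the occurrence indicators + and * to GROUP OF terms, following the wording of the SMA-L example.

```python
import re

# Occurrence indicators of XML content models, worded as in the SMA-L example.
CARDINALITY = {"+": "1 or more", "*": "0 or more"}

# Matches flat declarations such as <!ELEMENT Hotel (Room+, Hall*)>;
# nested groups and the '?' indicator are deliberately not supported.
ELEMENT_DECL = re.compile(r"<!\s*ELEMENT\s+(\w+)\s+\(([^()]+)\)\s*([+*]?)\s*>")


def map_particle(particle: str) -> str:
    """Map a particle such as 'Room+' to a GROUP OF term; others pass through."""
    particle = particle.strip()
    if particle and particle[-1] in CARDINALITY:
        name, indicator = particle[:-1], particle[-1]
        return f"GROUP OF {name}:{{{CARDINALITY[indicator]}}}"
    return particle


def element_to_c_unit(declaration: str) -> str:
    """Translate one flat element declaration into a single C_UNIT line."""
    match = ELEMENT_DECL.search(declaration)
    if match is None:
        raise ValueError(f"unsupported declaration: {declaration!r}")
    name, model, occurrence = match.groups()
    if model.strip() == "#PCDATA":            # character data is presented as text
        return f"C_UNIT {name} PRESENTED_BY TEXT"
    if "|" in model:                          # choice -> generalization
        parts = [map_particle(p) for p in model.split("|")]
        body = "GENERIC(" + ", ".join(parts) + ")"
    else:                                     # sequence -> aggregation
        parts = [map_particle(p) for p in model.split(",")]
        body = "AGGREGATION OF (" + ", ".join(parts) + ")"
    if occurrence in CARDINALITY:             # indicator on the whole group
        body = f"GROUP OF ({body}):{{{CARDINALITY[occurrence]}}}"
    return f"C_UNIT {name} TYPE {body}"
```

Applied to the `landmark` declaration of the example, the sketch yields the first C_UNIT line of the SMA-L listing. A fuller treatment would need a proper DTD parser and the attribute-based choice between conceptual and presentational units described in (a).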
SMIL Document

    <smil>
      <body>
        <seq>
          <par>
            <video src="video1" dur="10s"/>
            <audio src="audio1" dur="8s"/>
          </par>
          <par>
            <text src="text1" begin="6s"/>
            <audio src="audio2"/>
          </par>
        </seq>
      </body>
    </smil>

[Figure: timeline of the presentation showing video1, audio1, text1 (offset 6'') and audio2.]

SMA representation in SMA-L

    P_UNIT xxx TYPE AGGREGATION OF (
        AGGREGATION OF (video, audio): T {starts},
        AGGREGATION OF (text, audio): T {parallel}
    ) : T {meets}

4. Conclusions / Open Issues

A query is equivalent to the specification of its answer. An abstract specification (and all specifications are abstractions of a sort) is equivalent to a query on some abstraction of the items specified. In this paper we developed Semantic Multimedia Abstractions (SMA), which capture the conceptual and presentational properties of multimedia applications. They have been used either as Model Title Specifications or as a model for looking up multimedia applications.

Our main conclusion from this work, supported by the system developed in [51], is that SMAs constitute a feasible approach to the original problem, namely how to search large multimedia repositories for applications matching conceptual and/or presentational characteristics given at a mixed, end-user-defined abstraction level. The SMA definition and query language, or equivalently the SMA model, is quite satisfactory for the representation of multimedia applications at that abstraction level which, being congruent with human memory, is suitable for on-line searching by end users. On the other hand, the use of SMA-L is not mandatory: several of the existing languages, models and standards can be enhanced to produce equivalent representation and querying mechanisms.
Two types of issues are left open by our work: on the one hand are theoretical issues, such as the completeness and consistency of the abstract space; on the other are practical issues, such as the computational savings gained by using abstractions, an assessment of which should take into account the formation of the abstract space, itself a rather complex engineering endeavour.

5. References

[1] S. Adali, P. Bonatti, M. Sapino, V.S. Subrahmanian, “A Multi-Similarity Algebra”, Proc. SIGMOD ’98 Conf. on Management of Data, 1998, pp. 402-413.
[2] W. Al-Khatib, Y. F. Day, A. Ghafoor, P.B. Berra, “Semantic Modeling and Knowledge Representation in Multimedia Databases”, IEEE Trans. Knowledge and Data Eng., vol. 11, no. 1, pp. 64-80, Jan./Feb. 1999.
[3] J. F. Allen, “Maintaining Knowledge about Temporal Intervals”, Communications of the ACM, Vol. 26, No. 11, Nov. 1983.
[4] K. Autio, “Abstraction of behavior and structure in model-based diagnosis”, Proc. DX-95 The 6th Int’l Workshop on Principles of Diagnosis, Goslar, Germany, Oct. 1995.
[5] R. Bergmann and W. Wilke, “Building and Refining Abstract Planning Cases by Change of Representation Language”, Journal of Artificial Intelligence Research 3, 1995, pp. 53-118.
[6] K. Bohm and T. Rakow, “Metadata for Multimedia Documents”, Special issue of SIGMOD Record, Dec. 1994.
[7] S. Boll, W. Klas, “ZYX – A semantic model for multimedia documents and presentations”, Proc. 8th IFIP Conference on Data Semantics (DS-8): “Semantic Issues in Multimedia Systems”, Jan. 5-8, New Zealand, 1999.
[8] A. Borgida, J. Mylopoulos, H. Wong, “Generalization/Specialization as a Basis for Software Specification”, “On Conceptual Modelling”, edited by Brodie M., Mylopoulos J., Schmidt, Springer-Verlag, 1982, pp. 87-117.
[9] M. Brodie, D. Ridjanovic, “On the Design and Specification of Database Transactions”, “On Conceptual Modelling”, edited by Brodie M., Mylopoulos J., Schmidt, Springer-Verlag, 1982, pp. 277-306.
[10] W. W. Chu, C.C. Hsu, A.F. Cardenas, R.K. Taira, “Knowledge-Based Image Retrieval with Spatial and Temporal Constructs”, IEEE Trans. Knowledge and Data Eng., vol. 10, no. 6, Nov./Dec. 1998.
[11] V. Delis, Th. Hadzilacos, “Binary String Relations: A Foundation for Spatio-Temporal Knowledge Representation”, Proc. ACM Conf. on Information and Knowledge Management (CIKM), 1999.
[12] J.D. Dionisio and A. F. Cardenas, “A Unified Model for Representing Multimedia, Timeline and Simulation Data”, IEEE Trans. Knowledge and Data Eng., vol. 10, no. 5, Sept./Oct. 1998.
[13] M. J. Egenhofer and J. R. Herring, “Categorizing Binary Topological Relationships Between Regions, Lines and Points in Geographic Databases”, Tech. Report, Department of Surveying Engineering, Univ. of Maine, 1990.
[14] Extensible Markup Language (XML), W3C Recommendation XML 1.0, Feb. 1998.
[15] R. Fikes and N.J. Nilsson, “STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving”, Artificial Intelligence, Vol. 2, pp. 189-208, 1971.
[16] D. Gardelis, Th. Hadzilacos, P. Kourouniotis, M. Koutlis, E. Megalou, “Automating the generation of multimedia titles”, Proc. 10th Int’l Conf. on Advanced Science and Technology (ICAST’94), Chicago, USA, Mar. 1994.
[17] F. Giunchiglia and T. Walsh, “A theory of Abstraction”, Artificial Intelligence, Vol. 56, No. 2-3, pp. 323-390, 1992.
[18] W. Grosky, F. Fotouhi, I. Sethi, “Using Metadata for Intelligent Browsing of Structured Media Objects”, SIGMOD Record, Vol. 23, No. 4, Dec. 1994.
[19] N. Hirzalla, O. Megzari, A. Karmouch, “An Object-Oriented Data Model and a Query Language for Multimedia Databases”, IEEE ICECS’95, Dec. 1995.
[20] J.R. Hobbs, “Granularity”, Proc. 9th Int’l Joint Conference on Artificial Intelligence (IJCAI), pp. 432-435, 1985.
[21] R.C. Holte, C. Drummond, M.B. Perez, R.M. Zimmer, A.J. MacDonald, “Searching with Abstraction: A unifying Framework and a New High Performance Algorithm”, Proc. 10th Canadian Conf. on AI, pp. 263-270, 1994.
[22] R.C. Holte, T. Mkadmi, “Speeding up Problem Solving by Abstraction: A Graph Oriented Approach”, Special Issue of Artificial Intelligence (spring 1996) on Empirical AI, edited by P. Cohen and B. Porter.
[23] T. Imielinski, “Domain Abstraction and Limited Reasoning”, Proc. 10th Int’l Joint Conf. on AI, 1987, pp. 997-1003.
[24] ISO/IEC 10166-2:1991, Document Filing and Retrieval (DFR) - Part 2: Protocol Specification.
[25] Y. Iwasaki, “Reasoning with Multiple Abstraction Models”, Proc. 4th Int’l Workshop on Qualitative Physics, 1990.
[26] H. Jagadish, A. Mendelzon, T. Milo, “Similarity-Based Queries”, Proc. ACM PODS, San Jose, May 1995, pp. 36-45.
[27] P.N. Johnson-Laird, “Mental Models”, Harvard University Press, 1983.
[28] V. Kashyap, K. Shah, A. Sheth, “Metadata for building the MultiMedia Patch Quilt”, “Multimedia Database Systems: Issues and Research Directions”, S. Jajodia and V.S. Subrahmanian, Eds., Springer-Verlag, 1995.
[29] W. Klas, E.J. Neuhold, and M. Schrefl, “Using an Object-Oriented Approach to Model Multimedia Data”, Computer Comm., Vol. 13, no. 4, pp. 204-216, May 1990.
[30] W. Klas and A. Sheth, “Metadata for Digital Media”, Special issue of SIGMOD Record, Dec. 1994.
[31] C.A. Knoblock, “Automatically Generating Abstractions for Planning”, Artificial Intelligence, Vol. 68, No. 2, 1994.
[32] A. Levy, “Creating Abstractions Using Relevance Reasoning”, Proc. 12th National Conf. on AI, Seattle, Aug. 1994.
[33] J. Z. Li, M.T. Özsu, D. Szafron, V. Oria, “MOQL: A Multimedia Object Query Language”, Proc. 3rd Int’l Workshop on Multimedia Information Systems, Como, Italy, Sept. 1997, pp. 19-28.
[34] T.D.C. Little and A. Ghafoor, “Interval-Based Conceptual Models for Time-Dependent Multimedia Data”, IEEE Trans. on Knowledge and Data Engineering, Vol. 5, No. 4, 1993.
[35] Macromedia Director, Macromedia Inc.
[36] S. Marcus, V.S. Subrahmanian, “Foundations of Multimedia Database Systems”, JACM, vol. 43, no. 3, pp. 474-523, 1996.
[37] E. Megalou and Th. Hadzilacos, “On Conceptual Modeling for Interactive Multimedia Presentations”, Proc. 2nd Int’l Conf. on Multimedia Modeling ’95 (MMM’95), Singapore, pp. 51, World Scientific.
[38] E. Megalou, Th. Hadzilacos, N. Mamoulis, “Conceptual Title Abstractions: Modeling and Querying Very Large Interactive Multimedia Repositories”, Proc. 3rd Int’l Conf. on Multi-Media Modeling (MMM’96), Toulouse, 1996.
[39] C. Meghini, F. Rabitti, C. Thanos, “Conceptual Document Modeling and Retrieval”, Computer Standards & Interfaces, Vol. 11, 1990/91, pp. 195-213.
[40] I. Mozetic, I. Bratko, T. Urbancic, “Varying Levels of Abstraction in Qualitative Modelling”, Machine Intelligence, Vol. 12, Clarendon Press, 1991, pp. 259-280.
[41] P. Nayak, A. Levy, “A Semantic Theory of Abstraction”, Proc. Int’l Joint Conf. on AI, Montreal, Canada, 1995.
[42] S. B. Navathe, “Evolution of Data Modeling for Databases”, Communications of the ACM, vol. 35, no. 9, Sept. 1992.
[43] R.J. Peters, A. Lipka, M. T. Özsu, D. Szafron, “The Query Model and Query Language of TIGUKAT”, Tech. Rep. 93-01, Dept. of CS, Univ. of Alberta, Jan. 1995.
[44] D.A. Plaisted, “Theorem Proving with Abstraction”, Artificial Intelligence, Vol. 16, pp. 47-108, 1981.
[45] A. Prieditis and B. Janakiraman, “Generating Effective Admissible Heuristics by Abstraction and Reconstitution”, Proc. AAAI, 1993, pp. 743-748.
[46] J. Rumbaugh et al., “Object-Oriented Modeling and Design”, Prentice Hall, 1991.
[47] E.D. Sacerdoti, “Planning in a hierarchy of abstraction spaces”, Artificial Intelligence, Vol. 5, 1974, pp. 115-135.
[48] G. Schloss, M. Wynblatt, “Providing definition and temporal structure for multimedia data”, ACM Multimedia Systems, vol. 3, no. 5/6, Nov. 1995.
[49] Standard Generalized Markup Language (SGML), International Standard ISO 8879.
[50] Synchronized Multimedia Integration Language (SMIL), W3C Recommendation SMIL 1.0, 1998.
[51] G. Sygletos, “Abstracting XML-based documents to Semantic Multimedia Abstractions”, Diploma Thesis, Univ. of Patras, Greece, Dept. of Computer Engineering and Informatics, 1999 (in Greek).
[52] N. Tryfona and Th. Hadzilacos, “Geographic Applications Development: Models and Tools for the Conceptual Level”, Proc. 3rd ACM GIS Workshop CIKM’95, Baltimore, Maryland, USA, Dec. 1995.
[53] ValMMeth, “Validation of a Multimedia Title Series Production Methodology”, Innovation Program IN34D, DGXIII, European Commission, CTI-R&D Unit 3, 1997-98.
[54] M. Vazirgiannis, Y. Theodoridis, T. Sellis, “Spatio-Temporal Composition in Multimedia Applications”, Proc. Int’l Workshop on Multimedia Software Development, Berlin, IEEE-ICSE, 1996.
[55] K. Wittenburg, “Introduction: Abstraction in Multimedia”, ACM Workshop on Effective Abstractions in MMedia, 1995.

TECHNICAL REPORT No. TR99/09/06